Protecting vulnerable groups online from harmful content – new (technical) approaches

18 Jun 2024 12:15h - 13:15h

Event report


Disclaimer: This is not an official record of the session. The DiploAI system automatically generates these resources from the audiovisual recording. Resources are presented in their original format, as provided by the AI (e.g. including any spelling mistakes). The accuracy of these resources cannot be guaranteed.

Full session report

Experts Discuss Strategies for Protecting Children from Harmful Online Content

The panel discussion, moderated by Torsten Krause, delved into the critical issue of protecting vulnerable groups, particularly children, from harmful online content. The session brought together a diverse group of experts, including Žydrūnas Tamašauskas from Oxylabs, Anna Rywczyńska from the Polish Safer Internet Center, and Andrew Campling from the Internet Watch Foundation (IWF), who contributed their expertise on technology, internet safety, and child protection.

A central concern highlighted during the discussion was the disturbing rise in self-generated child sexual abuse material (CSAM). Andrew Campling shed light on the gravity of CSAM, which often involves the sexual exploitation of very young children, sometimes leading to suicide. He pointed out the growing trend of self-generated abusive material, which is initially shared voluntarily but subsequently misused, or produced under coercion by organised crime groups, with teenage boys being a particular target for blackmail.

The panel explored the role of AI and machine learning in detecting harmful content. Žydrūnas Tamašauskas introduced an AI product by Oxylabs designed to identify and remove CSAM from the internet. Despite the potential of AI, the challenges of false positives and the necessity for human oversight in AI detection systems were acknowledged, underscoring the need for accuracy and caution to prevent wrongful accusations.

Age verification emerged as a vital tool for online safety. Andrew Campling suggested privacy-preserving implementation methods, such as facial scanning, which do not require disclosing personal information. The discussion around privacy versus the need for effective CSAM countermeasures was nuanced, with privacy advocates voicing concerns about the impact on individual privacy rights.

The need for a multi-stakeholder approach was a recurring theme. Anna Rywczyńska emphasised the importance of collaboration between various parties, including NGOs, law enforcement, and internet platforms, to effectively combat harmful online content. The panel concurred that involving a range of stakeholders, including technical experts, civil society, political authorities, and non-technical experts like psychologists, is crucial in addressing the complexities of online safety for vulnerable groups.

Client-side scanning technology was also discussed, with concerns raised about security risks and the potential for misuse or false accusations. The panel clarified the distinction between client-side scanning and the detection of known CSAM, noting that some entities already use client-side scanning without infringing on privacy.

Inclusivity and accessibility were addressed, recognising the need for solutions that cater to the diverse needs of internet users, including those without valid documents or in vulnerable situations. The concept of a European ID as a long-term solution was proposed, contingent on secure implementation using private blockchains or other secure mechanisms.

The session concluded with a call for ongoing collaboration and innovation in technical solutions and policy-making to ensure the safety of vulnerable internet users. The panellists highlighted the persistent challenges and the need for stakeholder engagement to develop effective strategies for combating harmful online content.

Noteworthy observations from the discussion included the privacy paradox, where increased privacy measures can sometimes lead to decreased safety, and the potential for large, centralised databases to be targets for cyber-attacks. The panel also touched on the importance of data ethics and the need for responsible data management, particularly by state actors, given the private sector's history of data abuse.

Session transcript

Torsten Krause:
a warm welcome here to all the on-site participants in our venue in the lovely old town of Vilnius in Lithuania, but also a warm welcome to all the participants remotely at home or in their workplaces at their devices. My name is Torsten Krause. I’m a political scientist and child rights researcher working at the Digital Opportunities Foundation in Germany. And I will be the moderator of this session and drive you through this next hour. With me-

Moderator:
Sorry, can I just intervene? I thank you for the start. No, it has to be my turn.

Torsten Krause:
Apologies, Maria.

Moderator:
I’m going to report to you just in a few moments, okay? Because I need to go through some rules for our participants, so if you’re okay with that. Okay. First of all, some new things here. If you want to access the photos of the event, you can scan the QR code and you’ll find everything there. I’ll just give you a moment to do that. And then some basic rules for the session. First of all, as you enter the Zoom meeting, please be present with your full name. If you want to ask a question, raise your hand using the Zoom function, then you will be unmuted and you will be able to speak. Once speaking, switch on the video, state your name and affiliation, and finally, do not share the links to the Zoom meetings, even with your colleagues. So now, thank you for giving the time, and now the floor is yours.

Torsten Krause:
Yeah, thanks, Maria, and apologies for starting ahead of you.

Moderator:
Thank you, it’s the heart of my job.

Torsten Krause:
Okay, and then let me shortly introduce the people beside me on the panel. On my left side, we have Žydrūnas Tamašauskas. I hope I pronounced it correctly.

Zydrunas Tamasauskas:
More or less. More or less, yeah. It’s actually Zydrunas Tamasauskas.

Torsten Krause:
Žydrūnas, okay.

Zydrunas Tamasauskas:
In short, it’s fine to call me just Z.

Torsten Krause:
Z, okay. He’s the Chief Technology Officer at Oxylabs, and we will come back later to what you have built and what your task at this table is. Also on my left side, and from your perspective of course the right one, is Anna Rywczyńska. She’s the coordinator of the Polish Safer Internet Center. Happy to have you here at this panel. And on my other side, you can see Andrew Campling. He’s working as a trustee for the IWF, the Internet Watch Foundation. Also in the room, I really give a warm welcome to Irena Guidikova. She’s the head of the Democratic Institutions and Freedom Department of the Council of Europe. She’s over there. And we have also Maciej Gron from NASK Poland with us to bring in his perspective and answer questions if necessary. If you read the description in the wiki, you will recognize that there are several changes here in the panel, but also in the roles. It was not foreseen that I am the moderator; we decided yesterday, for some reasons, to change it. But I will be supported by Sofia and by Francesco as the reporter of this session. Francesco will wrap up the session at the end of this one hour and propose the first messages, and then we can all work collaboratively over the next days to finalize them. Also, we have remote moderation, done by Maria from the support team, and we have Sabrina also online, who supports us with this task. With Irena, we decided and agreed not to repeat her statement, as she gave it previously this morning, but she is here with us to answer your questions and give, if necessary, some insight from her perspective of the Council of Europe. And as mentioned, or let me inform you, this is a kind of continuation workshop. We had a previous session, workshop 1A on child safety online, where we discussed updates on legal issues and proposals or acts coming into force in the UK and proposals by the European Commission to fight child sexual abuse online. We will now go to the next step and discuss more about protecting vulnerable groups online from harmful content and what new technical approaches there are. With all this in mind, I think we can start now. I would like to ask Andrew first to give us the perspective of a civil society organization and what kinds of harms and violations users are facing online. What are we talking about?

Andrew Campling:
Good morning, everyone. As Torsten said, my name is Andrew Campling. I’m a trustee of the Internet Watch Foundation, whose task is to eradicate child sex abuse material online. I’m not technically speaking on behalf of the IWF, since I was only asked in the break to join the panel, so this is my view on the subject matter. So just for clarification. To try and give a balanced view, as best I can, of the perspective from civil society and some of the harms and challenges, in terms of focusing on child sex abuse material: to be blunt, what we’re talking about is effectively the rape and sexual abuse of children, mainly but not exclusively girls, often very young children, so younger than three years old in some cases. The greatest growth at the moment seems to be in so-called self-generated abuse material. That is often either something that is produced voluntarily and shared between friends but then misused, or, increasingly, material that someone is coerced into producing. So tricked by, effectively, organized crime groups who are operating very aggressively online, where typically their tactic is to trick you into sharing some intimate material and then immediately blackmailing you, financially usually, as a consequence of that. And the target, the greatest sort of focus group for that, is teenage boys rather than girls, unusually for that particular type of abuse. And it has very serious consequences. There are instances where children have committed suicide within 30 minutes of the abuse happening. So it is serious and has very direct, often fatal consequences, so it shouldn’t be dismissed as a trivial issue. As for the sorts of technologies that are used, there’s a lot of research now which shows that the abusers will target platforms where children are often present, whether that’s common social media platforms or platforms more overtly targeted at children. They usually will pretend to be young people on the platform, even though typically they’re not, and they will try and move the target as rapidly as they can onto an end-to-end encrypted messaging system in order to undertake the abuse and then the blackmail, in the case of the sextortion blackmail. To give you a sense of scale, some recent research tells us there are about 300 million victims, child victims of child sex abuse, globally. So this is not a small crime. There’s also research which tells us that once someone has viewed CSAM content, there’s a 40% chance that they will immediately seek contact with children. So therefore, there’s a high probability that they will go on themselves to re-offend and grow the scale of victims. And then briefly, in terms of the sort of conflict here, if you will: in civil society, some of the loudest voices come across as very pro-privacy, arguing that the consequence of countering child sex abuse is that it impinges on the privacy of the rest of us. Depending on your point of view, that might be a price worth paying for those 300 million victims. But also from a technical point of view, to try and bring it back to the subject here, you can do things like age verification in a privacy-preserving way. You don’t need to disclose data about who you are to a platform. You can do it either through facial scanning, which is increasingly highly accurate now, or through other means, which we can maybe discuss later. But also for things like client-side scanning, again, you can do that in a privacy-preserving way as well.
The biggest challenge is if you want to get into things like monitoring for the risk of grooming. That is hard, in my view, to do in a privacy-preserving way, because you have to start to scan the content of messages, so you’re going to have a much higher risk of impinging on privacy, and also of creating false positives. But for known CSAM, I don’t see there to be a privacy challenge, and I’ll maybe stop there.

Torsten Krause:
Okay. Thanks, Andrew, very much for this start into the topic, for this overview, and also for building bridges to ongoing discussions. But first, I want to know from Anna, from your perspective at the Safer Internet Center in Poland and regarding your practical experience, which kinds of harmful content make up the biggest part of what you see? What’s your perception?

Anna Rywczynska:
I would relate also to what Andrew said about this self-generated content. Talking about CSAM, so child sexual abuse material, a growing share of those materials is self-generated. So this is something that we see now as a process that is coming. Sometimes it is a consequence of sexting activities, when it goes where it shouldn’t, but then it is also… Yes, sure. And also, I have problems with my throat, forgive me, but I will try. But sometimes it is, of course, a consequence of adult manipulation, and the child generates the content and releases it onto the internet. But apart from CSAM content, apart from this self-generated CSAM content, there is also something that we see especially in our public sphere on the internet. This is a kind of pathological content. I’m not sure if you know what I mean, but it can be called pathostreams, or pathological content videos. What I’m talking about can be a live internet broadcast of some pathological situation, showing alcoholic libations, with content that glorifies violence, that glorifies aggressive behavior, that glorifies sexual abuse as well, because even these kinds of situations are broadcast live to the internet. It can all be broadcast live, but it can also be produced music videos that reach children very easily. These are things that are very catchy, and very often those producers, strange as it sounds, really go into the mainstream, they even dance with the stars, while at the same time producing very dangerous, very aggressive, very violent materials. I think it’s one of the really big challenges, and the government that we have now is also very focused on how to fight this patho-streaming and these pathological videos. That would be something that I would like to maybe focus on now.

Torsten Krause:
Okay, thank you so much, Anna, for this. We’ve heard now about several kinds of violations or harmful content that children and other vulnerable groups face online while using digital services, and we want to talk about new technical approaches to protect users. So we are glad to have Žydrūnas on the panel and, as I mentioned before, he’s the Chief Technology Officer at Oxylabs, and Oxylabs has developed an AI product to detect and remove child sexual abuse material. What was your motivation and how does it work?

Zydrunas Tamasauskas:
All right, first of all, I’ll give the background. So Oxylabs is a leading web intelligence platform. We work with companies from the Global Fortune 500, and we heavily invest into the intelligence part, obviously, that is, collecting publicly available data. So for us the ethical part is at the essence of the company itself, why we’re doing this and making sure we are not going over any red lines or things like that. And, well, we understood that we have quite a lot of experience and expertise within the company, and we decided that we want to take up some governmental opportunities to help fight, not necessarily CSAM content specifically, but anyhow, to help our governments with the data that we can collect. And I think what’s most important is to understand that data really is that easily accessible. And I really feel what you’re saying: it’s really sometimes easy to get and way easier to put online. And for us, it made sense to build some sort of tooling with the help of AI, of course, because too much human effort is required to review this content. There are mental issues, of course, psychological ones; what you see there might damage you in a way. And yeah, how to fight this? We have a few million websites here in Lithuania, right? And we thought, okay, if we have this vast infrastructure to get this data, why not use it to build a tool which scans the internet, searches for CSAM images, and then reports to the Communications Regulatory Authority, as Kristina was presenting before. So yeah, that’s why we are here. We actually found a way to do that. We employed AI technologies, machine learning algorithms. They are not so sophisticated as of now, but we are looking into retraining them and making them more accurate. And it’s not a problem to get this data, as I mentioned, at least for us. But the problem is that even AIs make mistakes. It’s not necessarily CSAM if it finds, let’s say, an image that matches the age, let’s say it’s a child, right, and the image matches nudity content, all right, but it could be various things. It could be family photos. It could be any stuff that shouldn’t be on the internet but somehow is there. And so the system actually scans the internet daily and sends data to the communications authority, and it flags which of the contents are the most likely to actually be CSAM. And then there is a manual effort on the authority side to actually confirm it. Because, as my colleague mentioned as well, there are a few things in fighting this: we should regulate this, as a fact, yes, but the systems are still in their infancy, I would say, and we need to somehow ensure that innocent people will not be prosecuted in a way that they could be. But our experience shows that there actually are quite a lot of such content instances, and together with the authority we managed to at least file a few police cases here. For us, that was really a great achievement. It means that we as a technology company have a lot of potential here and expertise, and we are giving them on a pro bono basis to the government and governmental institutions, and we try to help them.

Torsten Krause:
Okay, thanks for this insight and your contribution. I guess it’s really necessary to hear, or to understand, that there are humans in the loop, because you mentioned also that artificial intelligence does not only make right decisions. It was also a topic of concern yesterday, raised by the young people from YOUthDIG, that we need transparency and that decisions cannot be taken by artificial intelligence alone. To let you know, I would like to ask several questions again to the panel and then open up for a broader conversation. So, if you want to share your thoughts or your concerns, or you want to support any position or ask a question, then you can prepare yourself to be ready for this. Now, we’ve heard that there are really hard issues, there are heinous crimes, we need to protect users online, and we also have the possibilities and opportunities to do this with technical approaches. What seems to be necessary is that the different organizations and perspectives are cooperating. So, I would like to ask Anna: do you have such collaboration agreements between different parties in the space in which you are working in Poland?

Anna Rywczynska:
Yes, just maybe to start with describing the Polish Safer Internet Center, because I’m not sure if you know how those national Safer Internet Centers look. You need to be co-financed by the European Commission to be in the network of those national Safer Internet Centers, and you have to have three components. So, you need to have the awareness activities, you need to have a hotline dealing with illegal content, and you need to have a helpline giving support day to day, 24 hours a day, to children and young people. So, we as NASK, I represent the National Research Institute, at our organization we have awareness activities and we have a hotline, but we also needed to have a helpline and to join forces doing these awareness actions, educational actions. So the main, most important agreement that we have is actually the agreement between us and our consortium partner, which is an NGO, the Empowering Children Foundation. They operate the well-known number 116 111, and we have cooperated with them for over 20 years, which I think is really a lot. It’s a very strong and well-developed cooperation, and it’s absolutely crucial for us, because we need psychologists and therapists to work in the helplines, and we couldn’t do it by ourselves within our institute, which is more technical, with a technical approach. Of course, having a hotline in our organization, we have agreements signed with the police. We renew this agreement every two or three years, it depends, but we have to renew it, and it describes in quite some detail how we operate. Because, as you know, the main role of the hotline is firstly to check on what server, in what country, the illegal content is located, and then we either cooperate locally with the police or we send it to another hotline in the country where the server is located. So how we operate with the police is described in quite some detail. Then we are a trusted flagger for a few VLOPs, so for selected platforms, and of course without that it also wouldn’t be possible to work effectively. And then we have other cooperations, maybe smaller but also very important. We have agreements with schools that have representatives in our youth panel, and also so that we are able to train their educators, their teachers. We have agreements with the institutions that make up our advisory board, because it’s also totally necessary to be surrounded all the time during the program, to have the advisory board to consult with them on all the things that we plan and that we are doing. Then, who do we cooperate with more? Let me think. Ah, okay, we also have another advisory board, a parental advisory board. This is something also very important for us, because in this board we also have parental influencers, but, you know, not those putting photos online like you described, but those who are reasonable and doing the right things, and who also help us actually reach parents, because this is what is most difficult, I would say. We can more easily reach schools and educators, but with the parents it’s a huge problem. It’s strange when I say that, because I think lots of us here are parents, and we would say, oh, it’s not difficult to reach parents, we care, but it’s not that simple. Actually, it’s very hard. So, we have also signed agreements with the people who are on this kind of advisory board. So, as you can see, we have lots of different collaborations.
But also, to sum up, it’s sometimes difficult, because we used to cooperate a lot with the industry, with the platforms, but now, with the DSA, of course, we have to be more cautious about this. We have to take care that everything is transparent, we have to have open consultations, because we, especially as a public institution, now have to somehow deal with this new situation. But we also absolutely have to cooperate with these big platforms, because it also happens on their websites. So we can’t forget about them.

Torsten Krause:
Thank you. Thank you very much for sharing your insights and bringing in your perspective. I guess what we can take away is that building networks and strengthening cooperation is key to working together on fighting such violations and risks online. Andrew, you’ve just touched on some possible solutions for how we can protect users online. Maybe you can give us some more thoughts about this.

Andrew Campling:
Yes, and again, emphasizing that I’m not speaking on behalf of the IWF, but I’ll happily give some comments. So just to expand, again briefly, on a couple of things I touched on earlier. Age verification: you can age verify in a privacy-preserving way. There are two mechanisms you can use for that. One is facial scanning. There is some data published by an organization called Yoti, and also, I think, by others that actually implement the technology. If you do a search, there were some comments in the news in the last few days. They published data to give a view on the accuracy of that, in particular for children aged from around 13, because there were some comments from the UK regulator that facial scanning wasn’t especially accurate at determining whether someone was aged above 13. The reason that matters is because some of the popular social media platforms, if you look at their terms of service, you might be surprised to know, specify a minimum user age of 13. I appreciate in practice that’s maybe not widely enforced. But actually, the data from Yoti suggests that you could just use facial scanning for that purpose. Similarly, there’s some work underway at the moment using the European ID system, where you could disclose to a platform not your identity, not your date of birth, but, if there’s a minimum or maximum age required, it could simply disclose that you are indeed over or under that age and no other data at all. And I think long-term that’s probably a very good solution, but it does require you to have credentials stored on a platform, so that may not always work for children, because they don’t always have those forms of ID. The other main technology, briefly, since it came up in the last session, is client-side scanning. I will maintain that you can absolutely do that in a privacy-preserving manner if it’s for known CSAM, because all you’re doing is looking at an image to see whether it matches a hash of known CSAM. There’s little to no risk of false positives, and certainly the IWF’s experience from 25 years is that we’ve yet to have any false positives. There are some papers out there that suggest there are flaws in client-side scanning. From looking at those papers, I would politely comment that if you implement the technology badly, then there are flaws; but if you don’t implement it badly, then you avoid those flaws. One of the papers comments on some known attacks, but because they’re known attacks, you can implement the technology in ways that don’t suffer from them. The other comment, I guess, and then I’ll finish, is that some of the privacy protections that are widely used, what people maybe don’t realize, is that some of those privacy protections also bypass cybersecurity. So in implementing what you believe are privacy protections, in some cases you’re actually leaving yourself more open to attack. And as you understand, if you don’t have cybersecurity, then you have no privacy. So the worst-case scenario is if you think you have privacy but you don’t have security; then you’ll maybe be far more exposed than if you have concerns about privacy and act differently. I can happily expand on that if there are questions, or offline if that’s easier. But as I say, client-side scanning and age verification, I think, are two important tools, and they can both be done in privacy-preserving ways. Of course.
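For illustration only, here is a minimal sketch of the kind of minimal-disclosure age check described above. It is not the European ID scheme or any speaker's implementation; the function names are invented, and a shared HMAC key stands in for the public-key signatures a real deployment would use. The point is simply that the platform receives a signed boolean, never a name or a date of birth.

```python
import hmac, hashlib, json, secrets
from datetime import date

VERIFIER_KEY = secrets.token_bytes(32)  # hypothetical key held by the age verifier

def issue_age_claim(birthdate: date, min_age: int) -> dict:
    """Verifier side: checks the birthdate privately, returns only a signed boolean."""
    today = date.today()
    age = today.year - birthdate.year - ((today.month, today.day) < (birthdate.month, birthdate.day))
    claim = {"meets_min_age": age >= min_age, "min_age": min_age, "nonce": secrets.token_hex(8)}
    payload = json.dumps(claim, sort_keys=True).encode()
    return {**claim, "sig": hmac.new(VERIFIER_KEY, payload, hashlib.sha256).hexdigest()}

def platform_accepts(claim: dict, required_age: int) -> bool:
    """Platform side: verifies the signature and the boolean, learning nothing else."""
    body = {k: v for k, v in claim.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(VERIFIER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(claim["sig"], expected) and body["min_age"] >= required_age and body["meets_min_age"]

if __name__ == "__main__":
    token = issue_age_claim(date(2010, 3, 14), min_age=13)
    print(platform_accepts(token, required_age=13))  # only "meets the minimum age" is revealed
```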

Torsten Krause:
Thank you very much, Andrew. And I would like to ask, for the moment, a last question to Žydrūnas. We’ve heard from Andrew what tools are in place and what already exists, but what developments or improvements do you foresee for this technology in future, and how will this impact the internet environment?

Zydrunas Tamasauskas:
Yeah. I mean, one of the things we have to understand is that AI technologies are moving quite fast, faster every year, every minute actually. Just look at what we have with OpenAI right now. The AI models get smarter, more sophisticated, more accurate. But, you know, there’s one problem with that: we also need accurate data to train them with. So the lists that were mentioned, like the CSAM ones and all of that, really help. And we are also partnering with the police department and the regulatory authority, the Communications Regulatory Authority, and we are trying to build new kinds of models as a next phase. So we are partnering further to build even more accurate and smarter ways to actually detect the age of the person, so whether a child is under 13 or older. And of course, the new thing that we are exploring, because of this boom of AI, is to analyze the textual information as well. The abuse doesn’t actually start with images; it starts with text before that, with an exchange of some ideas, some keywords that would specify what the content is all about. So with CSAM, as I mentioned, yes, we are now fighting and trying to look for CSAM, which is a visual representation, right? But textual is also important. And yeah, we are looking further, training those models, getting more information, and trying to make it so that the system learns on top of that. As my colleague mentioned, yes, it’s really hard right now to get accurate results, and human operators are also required. But I believe that we will be going there anyway. Would it be possible to use some sort of open technologies for all of that? I think no, because of privacy concerns. But the good part is that if we are to build a very specific, very task-oriented machine learning algorithm, it could be based on large language models, but built privately for this specific purpose. Then I believe we can work wonders with that, because it’s very specified. Yes.

Torsten Krause:
Would you please come to us and use the microphone? It’s recommended to take a seat, introduce yourself, and ask your question or bring up your comment. Thanks.

Audience:
Thank you. First of all, it was really interesting. My question is a little bit technical. I’m interested in what kind of technical strategies you use with your AI to make it more inclusive and linguistically accessible for everyone across the territory in which you use the AI for detection.

Zydrunas Tamasauskas:
Yeah. Okay, so first of all, as mentioned before, we are a web intelligence gathering platform, meaning that we use a lot of proxies, basically. We have at our disposal something like a few hundred million, or a hundred million, proxies, and we can access any location. So the first technical problem is to actually know where such content might be, and this is where we are collaborating with governmental institutions, which signal to us where to look for this content. The second part is to actually go there, physically in the computing sense, try to search for it and confirm whether it’s right or not. If so, we are simply scanning those images, converting them and comparing them to the CSAM hashes. If it matches, then we know what it is, and we basically send it through automated means to the regulatory authority to check. If it doesn’t match, there may be new material; we cannot catch every CSAM instance, so we have to predict somehow, and thus we have a second model. The first model checks the hashes against known CSAM, and the second model predicts, or analyzes, the imagery that we have against known indicators like underage appearance and nudity, and basically sends it to a human operator to confirm.
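As a purely illustrative sketch of that two-stage flow (assumed structure and invented names, not Oxylabs' actual system): stage one compares an image hash against a list of known-CSAM hashes, and stage two applies a classifier whose high-scoring results are queued for a human operator. Real deployments use perceptual hashes such as PhotoDNA or PDQ rather than the plain SHA-256 used here, and a trained model rather than the placeholder scorer.

```python
import hashlib
from dataclasses import dataclass

KNOWN_HASHES: set[str] = set()        # in practice, populated from a hotline-provided hash list

@dataclass
class Finding:
    url: str
    verdict: str                      # "known_match", "needs_review", or "clear"
    score: float = 0.0

def classifier_score(image_bytes: bytes) -> float:
    """Placeholder for the ML model estimating age and nudity signals (0..1)."""
    return 0.0                        # a real model would return a probability here

def assess(url: str, image_bytes: bytes, review_threshold: float = 0.8) -> Finding:
    digest = hashlib.sha256(image_bytes).hexdigest()
    if digest in KNOWN_HASHES:                        # stage 1: match against known hashes
        return Finding(url, "known_match", 1.0)
    score = classifier_score(image_bytes)             # stage 2: predictive model
    if score >= review_threshold:
        return Finding(url, "needs_review", score)    # forwarded to a human operator
    return Finding(url, "clear", score)
```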

Audience:
Can I have another question?

Torsten Krause:
Sure.

Audience:
What are your strategies for exposure management, especially in the case of vulnerable groups, so that their information is not exposed?

Zydrunas Tamasauskas:
What is important to understand is that this solution is contained. It works in a contained environment, and the regulatory authority actually owns this solution. We ourselves provide services, like proxies and software development, but we ourselves are not analyzing it. But on the privacy side, the way we built the solution is that as soon as it finds something, it extracts, analyzes and labels this data, and then we no longer need the data itself. We keep metadata about it: as my colleague mentioned, what the server is, what the location is, exactly where, if there are a thousand pages, on which page it was found, and how it was labeled. So we no longer need the physical image; we just remove it and we don’t keep any records. And I think that if we speak about privacy, and especially about children, we don’t need to hold this information, because of course it’s damaging. I hope that answers the question.
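A minimal sketch of that retention idea, with invented field names (this is not the actual reporting format): once an item has been labeled, only metadata is passed on, and the image bytes themselves are never stored.

```python
def to_report_record(url: str, server_ip: str, page: str, label: str, image_bytes: bytes) -> dict:
    """Build the record sent onward; the image itself is deliberately not included."""
    record = {
        "url": url,              # where the item was found
        "server_ip": server_ip,  # hosting server, used to route the report
        "page": page,            # which page out of many on the site
        "label": label,          # e.g. "known_match" or "needs_review"
    }
    del image_bytes              # drop the only local reference to the image; nothing is persisted
    return record
```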

Audience:
Thank you.

Torsten Krause:
Thank you very much. Thank you very much for your contribution. I saw many hands. First, I saw Desara. Please come to us. And then I would like to ask Maria and Sabrina whether there are also comments or questions from the remote participants, to have a fair discussion from both worlds.

Desara Dushi:
Hello, this is Desara Dushi from the University of Brussels. I have one question and one comment. I’ll start with the comment. I think it’s important to make a distinction, because I had the feeling that this distinction was not made, between client-side scanning technology and the detection of known CSAM. They are two different things, and currently the detection of known CSAM does not use client-side scanning technology. Detection happens in interpersonal communication services or in publicly available content, but not by using client-side scanning. And what is client-side scanning? It’s a technology that is not yet in use, but it is supposed to be used for detecting CSAM if the proposed regulation that we mentioned in the previous workshop is implemented. What it would do, basically, is that there would be a tool, something like an app, installed on all devices, on the phones of every user in the EU, and it would try to detect CSAM before the image gets sent to the other person. So at the moment we decide to send somebody an image, before it gets put into WhatsApp, for example, this is what it is meant to prevent. The aim is not to interfere with end-to-end communication, because once it’s put into WhatsApp, then it’s end-to-end encrypted, which means we cannot access it for privacy reasons. But this of course leads to a sort of circumvention of end-to-end encryption, because we are anyway assessing the material before it gets sent. My question is for Oxylabs: I wanted to ask whether there have been any false positives related to the detection of CSAM, and whether this has resulted in any person being falsely accused and prosecuted.

Zydrunas Tamasauskas:
No. So we do have false positives; that’s why we have a second iteration with the authority, trying to improve the accuracy of the model. As I said, only two to three cases were reported to the police, and those were legitimate cases for accusation. So far. Okay. Thanks.

Torsten Krause:
Thank you very much. Andrew wants to react.

Andrew Campling:
Yeah, I’ll make a comment, but try to be brief. So again, on client-side scanning, two thoughts. One is that some of the existing messaging systems already use client-side scanning, but you don’t know that. For example, if you put a URL into WhatsApp, it will typically generate an image of the page. That’s client-side scanning. It’s not labeled as that, but it is absolutely scanning your message, working out that you’ve put in a URL, and fetching an image of the page. So it’s the same technology, but obviously for a different purpose. In terms of implementation, you could absolutely have a generic application that did it in some way. But, personal view, I would prefer to see WhatsApp, for example, expand its use of client-side scanning and actually do it inside the app, rather than have a generic app on a device. I’ve had a similar discussion with Meredith Whittaker at Signal, who has, in my view, made some technically incorrect assertions that client-side scanning by definition impinges on privacy, and suggested that if her own coders wrote the code within the app, then you don’t have a concern about what the scanning is doing, because you’ve written the code yourself, you know what it does, and you’re in control. So that’s why I’d rather see it inside the app, as opposed to having a generic application provided by somebody else that you have to install on your device. There are different solutions.
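As a rough illustration of that link-preview behaviour (a generic sketch, not WhatsApp's actual code): the client inspects the message being composed, spots a URL, and fetches the page to build a preview. That is the same pattern of on-device content inspection that client-side scanning relies on, applied to a different purpose.

```python
import re
from urllib.request import urlopen

URL_RE = re.compile(r"https?://\S+")

def preview_for_outgoing_message(text: str) -> str | None:
    """On-device: inspect the message being composed and, if it contains a URL, fetch a preview."""
    match = URL_RE.search(text)          # the "scanning" step: the client reads the message content
    if not match:
        return None
    with urlopen(match.group(0), timeout=5) as resp:   # the client fetches the page to build a preview
        html = resp.read(4096).decode("utf-8", errors="ignore")
    title = re.search(r"<title>(.*?)</title>", html, re.S)
    return title.group(1).strip() if title else match.group(0)
```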

Torsten Krause:
Thank you very much. I see several hands. First, I want to ask Maria and Sabrina if there are comments or questions remotely.

Online panelist:
We have several comments. I’m not sure if you want to comment or not. People are sharing thoughts. If you want, I can voice them or take questions. You can decide whether you want to comment on that or not.

Torsten Krause:
Please voice them so that we can all hear. Can you hear everybody, Maria?

Online panelist:
We have the first comment: the age limit for social media is often due to the GDPR. Age verification without disclosing that part of your identity to the platform is possible; however, you disclose to the verification system that you are using this platform, which sometimes might be even worse in terms of privacy. One more comment: it can be done without letting the verification system know which platform you are using. It’s not easy, but it’s possible. The verification system gives you a verification token without knowing where you use it. We have one more: Google already uses client-side scanning.

Zydrunas Tamasauskas:
It’s not only Google. Apple also rolled out nudity detection last year, or at least they say they will do so. Some people got very reactive to that. They said that they will not analyze this data and not send it somewhere, but still, it’s there. In my opinion, actually, and this is my own opinion, not influenced by the company or anyone: there are platforms out there that collect vast information about any user, your email, for example. You actually don’t need age verification when you’re entering social platforms, because the social platforms already know who you are, what age group you are in, and what you do on the internet, like where you visit frequently. Because those systems are very integrated, and they’re used for business, for boosting sales, boosting ads, and all of that. And in my experience, I once had a chance to work with such a system, and it’s very, very intelligent. There’s another area where those are actually good examples: fighting fraud. Because if you’re collecting that much information, of course there’s a question about privacy or non-privacy, but if you can identify fraudsters and then prevent crimes being committed using credit cards, for example, that’s also good. But, you know, there’s a really thin line between when this data should be used and when it should not.

Torsten Krause:
Okay, thank you.

Andrew Campling:
And do you want to actually take the question?

Torsten Krause:
Okay. And also, for gender balance, I would like to invite you here. Please introduce yourself and share your comments or questions with us. And then we have another hand there; okay, maybe you are the second one. Yeah, thanks, everyone, for the discussion.

Audience:
My name is Calum Vogue. I work for the Internet Society, and we’re mainly looking at this topic through the lens of encryption: how things like client-side scanning might interact with encryption and, as you said, the expectations that users might have for privacy when they rely on encryption. So my question is actually about the technical risks when it comes to client-side scanning. For example, how could client-side scanning databases be targets of attack, and are there any specific risks when data is processed on device versus off device? Some more details there would be very helpful. And that brings me to a connected question, which is: when the EU is deciding whether to support or accredit any of these technologies, what security measures should they be looking at? And also, how should they weigh up possibly conflicting rights in this space?

Torsten Krause:
Thank you. So, are you able to answer this question, how we can achieve this?

Zydrunas Tamasauskas:
I can comment a little bit on the first part, on the client side. It turns out that I was hacked as well, through WhatsApp, and actually in the same way my colleague mentioned: sending a URL and then trying to parse this URL with an application. What that application actually did was break into my phone, and it started to get all the data out. Yeah, that was one of the things that raises suspicions, because we know, for example, that there are malicious agencies who are building quite sophisticated software, like Pegasus, for example, if you’ve heard of it, which hacks into basically any device and takes any data that they want. So the client side also has some challenges. First of all, any device can be rooted, so you can do anything with it. You can take this application, dissect it through reverse-engineering efforts, and make it so that it doesn’t serve its purpose, or worse, so that it actively does harm. The second thing is the encryption that’s being used right now for transport. It’s not that robust: you can build your own certificates, you can craft your certificates, you can install those crafted certificates onto the device, and then you have the encryption basically nullified. So those are the biggest risks right now, because we don’t yet have technologically robust encryption methods available everywhere. We talked about the transfer of the scans: even when you are, for example, a mobile user using, let’s say, provider X’s mobile data, it turns out some of the providers can read your data, so it doesn’t really matter if it’s in secure channels or not. So there are a few challenges that we have to solve as well.

Andrew Campling:
In terms of the question about whether it should be on or off device: ironically, one of the issues caused by the additional widespread use of encryption is that it’s a lot harder to do it off-device, because when it leaves the device the data is typically encrypted, so it almost forces the scanning to be done on-device, which is perhaps an unintended consequence of the push for more widespread use of encryption. It’s literally the only place you can actually undertake the scanning. The other comment worth making, again, is that end-to-end encryption is also a very helpful attack vector, because you don’t know when data has been exfiltrated from your device. And unfortunately, some of the blind push to end-to-end encrypt lots of metadata is hiding what are called indicators of compromise, which are really important signals used by the cyber defenses on our systems. If you lose those indicators of compromise, it’s very hard to identify when data is being exfiltrated from your system. And that’s something which I think is becoming an increasing problem, because the cryptologists aren’t cybersecurity specialists. There’s a contradiction between those two areas. I’m working with some cybersecurity companies who are horrified by some of the trends in encryption, as are some of the CISOs in some of the big Fortune 500 companies, because they get fined if their defenses are penetrated and they lose data, and it’s increasingly hard for them to have effective defenses because of the more widespread use of encryption. The sort of fines we’re talking about: recently in the US, financial institutions were each fined $200 million for these compromises. So these are not trivial problems in terms of just the financial consequences, let alone the personal ones. Okay, okay.

Torsten Krause:
Thank you. We are running straight into the end of our workshop, but I guess we have time for you. And maybe I would like to ask then whether there is a last comment or question remotely. Okay. And then Francesco and Sofia have the challenge of wrapping up this discussion for us.

Audience:
Hello, my name is Joao. I’m from YouthLeague. Actually, when we were crafting our messages, we talked about age verification, like a digital wallet. We had this idea and developed it for a bit, but eventually it didn’t make the final cut, because there were two concerns that we couldn’t overcome in that setting. So I just wanted to hear your thoughts on them. The first one was that using a digital wallet, like a centralized digital wallet for the EU, for example, would pose a huge security risk. Even if you just send your token, yes it meets the age or no it doesn’t, you still have to store the information somewhere, and if you centralize it, it will be a huge target. Everyone will want to attack it because it’s so valuable: every European’s information in a single place. So do you think this is a risk, or just a non-problem? And the second one is that we would be excluding people in difficult and fragile situations, people who don’t have email, people who don’t have documents, like illegal migrants, for example. Even if they are not legal citizens of the country, they should not be banned from the internet or from specific services on the internet. So do you think this is something we should consider when talking about age verification and digital wallets?

Torsten Krause:
Okay, thank you very much. Who wants to take this?

Zydrunas Tamasauskas:
I can comment a little bit. I used to work with crypto technologies as well, and I know that governments, especially in Dubai, are experimenting with a thing called ledgers: private crypto chains where you basically store all this information that is private to the citizen, what kind of assets they own, age, banking and financial institution data, all of that. And it’s based on private networks, meaning that it should not be accessible publicly. But, as I said, I don’t want to be the pessimistic person here, but what’s on the internet is basically hackable; it doesn’t really matter if it’s private or not. So if we store this information in tokens and in a blockchain, we have to really consider what it is, because it stays there basically forever, and there are tools to manipulate it, to somehow extract this data, so it’s not complicated. And for those cases that I mentioned, like people who are migrating, well, e-identity, a digital identity, could be a solution, where if a person comes to the EU, they could get a temporary e-identity and then use that to access services. So basically, if we’re moving towards a society that is more oriented to digital services and all of that, it becomes more connected anyway, so it should be simple.

Torsten Krause:
Okay, a quick comment.

Andrew Campling:
Yeah, just a very quick comment. I think it is a valid concern about having a big target. Having said that, there are mechanisms you can use to defend the target, such as not having it literally all in one database, but dispersing it and having protection so that you can’t link the data, say the age bit, to the identity, the name, and so on. So it is a valid concern. It would certainly need to be done, in my view, by a state actor, to give it an appropriate scale of protection. I wouldn’t want it to be done by the private sector, not least because, dare I say it, if you look at the social media platforms, the private sector has a history of abusing data far worse than any government. So yes, there are valid concerns, but you can reduce the risk.

Torsten Krause:
Okay. Thank you so much. Thank you very much. So I see one hand raised up in the remote room. So please, if I see it correctly, Jutta Kroll, please take the floor.

Online panelist:
Yes. Thank you for giving me the floor. I just wanted to address some of the issues that were raised by the next-to-last speaker, I would say, or the people who asked questions. When it comes, firstly, to the vulnerable groups that may not have a European ID or another valid personal document, there are several systems, which we learned about at the Global Age Assurance Standards Summit in Manchester in April, several solutions to address that issue and to help people not to identify themselves, but at least to verify their age in online services without having a valid document. And the second term that I would like to bring into the debate is the double blindness of age assurance systems, which addresses a point that was made by Jörn Erbgut in the chat: that of course, if the service provider knows who has verified or vouched for the age, that might cause privacy issues, as well as if the verifier, the verifying organization, knows which service is used. But this is addressed by the approach of double blindness, where neither the verifier nor the service provider knows of the other; only the person in the middle, who needs an age verification or proof of belonging to a certain age group, has that knowledge, and the other two sides are double blinded. And I do think that could be a very good approach. The German government has already commissioned the development of a demonstrator, and probably in September this year we will know better whether it works or not, but I’m pretty sure it will work. Thank you.
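For illustration only, here is a minimal sketch of that double-blind pattern (invented names, not the German demonstrator or any specific standard): the verifier signs an age-group token that names neither the user nor any service, and the service checks the signature offline against the verifier's published public key, so the verifier never learns where the token is used. It assumes the third-party `cryptography` package.

```python
import json, secrets
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

verifier_key = Ed25519PrivateKey.generate()       # held only by the age-assurance provider
verifier_public_key = verifier_key.public_key()   # published; services need nothing else

def issue_token(age_group: str) -> dict:
    """Verifier side: sees the person's evidence, but not which service the token is for."""
    body = {"age_group": age_group, "nonce": secrets.token_hex(8)}   # no user ID, no service ID
    payload = json.dumps(body, sort_keys=True).encode()
    return {"body": body, "sig": verifier_key.sign(payload).hex()}

def service_accepts(token: dict, required_group: str) -> bool:
    """Service side: verifies offline, so the verifier never learns the token was used here."""
    payload = json.dumps(token["body"], sort_keys=True).encode()
    try:
        verifier_public_key.verify(bytes.fromhex(token["sig"]), payload)
    except InvalidSignature:
        return False
    return token["body"]["age_group"] == required_group

print(service_accepts(issue_token("13_or_over"), "13_or_over"))  # True
```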

Torsten Krause:
Thank you very much, Jutta Kroll, for your information and for sharing your experience from the Global Age Assurance Summit in Manchester. So, we are running out of time. I see several hands up, but I guess the speakers and the attendees are all around here today and maybe also tomorrow, so you can catch each other and continue the conversation bilaterally. I’m really tied to the schedule and want to finish, but before we close this session, we have Francesco and maybe also Sofia wrapping up our discussion. Okay, Sofia told me to start, so yeah. Thank you. You can add whatever you want, whenever you want, it’s always okay.

Reporter:
Hi, I’m Francesco Becchi from Big, I’m the reporter for this workshop. Actually, I already attended the workshop on almost the same topic earlier. And again, I would like to thank all the panelists for the insightful presentations. It was really rich, so it’s really hard for me to try to sum up what has been said, because there are so many different things. I’ll try to focus on what we can all more or less agree upon. I’ll ask you at the end if you agree on sending a message with these main priorities, but then you can all check the shared platform for comments in the next few days, especially after EuroDIG. If you want to add some comments, if you think that something is missing, please do. It’s a collective process, so absolutely feel free to do whatever you want. Okay, I believe that the main points have been three, I’d say, but very complex ones. The first one was about the types of content that are actually important in this issue: first of all, self-generated material and pathological content. Related to this there is also the privacy paradox, that calls for more privacy practices can sometimes lead to a decrease in safety. The second part, which is actually even more complicated, is the part related to new technical approaches that try to be privacy-respectful. We mentioned especially age verification and client-side scanning, but I believe we should all agree that it is really important to differentiate between client-side scanning and the detection of CSAM. They are not exactly the same thing, and some actors actually already use client-side scanning without infringing privacy, especially WhatsApp, Google, Apple, and so on and so forth. There are more concerns, as was said, regarding anti-grooming techniques and the fact that they use both visual and textual data, and we probably still need some development on this, but it is coming. There are of course the typical concerns about AI: biases, the quality of datasets, accuracy, and especially also inclusivity and accessibility concerns in general. We need accurate data, and task-oriented models can actually work pretty well in this sense. And finally, just a couple of things that relate more to the political authorities. Of course, we constantly need checks by the authorities: what companies can do is detect data and signal issues, but then it is up to the authorities to make the final decision. And second, and this is also a topic I’m really into, as my fellow leaders know, apparently a European ID seems a sort of possible long-term solution, not yet, provided that it is based on a blockchain or other kinds of encryption techniques. Finally, and here I wanted to point especially to the contributions of Andrew and Anna, even though it wasn’t a big part of the discussion, no one actually questioned it, so I’m going to propose it to you as well: the relevance of a diversified multi-stakeholder approach, especially in trying to involve technicians on the one side to provide the solutions, but also civil society, political authorities and representatives, and also people who don’t come from technical backgrounds.
I believe you mentioned psychologists, but we can also mention people who work in ethics, cybersecurity experts who don’t always talk with encryption specialists, and so on and so forth. Well, if you all agree with this attempt at summing up the session, okay, please. One little correction: blockchain does not encrypt data. Yes, okay, it should be based on private blockchains.

Torsten Krause:
Okay, thanks, Jörn. And maybe, Sofia, do you want to add something or…? I think that’s it for the wrap-up. Okay, thanks for your wrap-up, and you are all invited to cooperate with us and finalize these messages in the coming days. Thanks also to all participants who shared insights and thoughts with us, and a big thank you to all of you for being with us, promoting our discussion, and being interested in this necessary topic. Thank you very much and have a fine day.

Speakers’ statistics

Andrew Campling (AC): speech speed 165 words per minute; speech length 2,214 words; speech time 806 secs
Anna Rywczynska (AR): speech speed 155 words per minute; speech length 1,075 words; speech time 417 secs
Audience (A): speech speed 188 words per minute; speech length 530 words; speech time 169 secs
Desara Dushi (DD): speech speed 164 words per minute; speech length 328 words; speech time 120 secs
Moderator (M): speech speed 199 words per minute; speech length 209 words; speech time 63 secs
Online panelist (OP): speech speed 132 words per minute; speech length 455 words; speech time 207 secs
Reporter (R): speech speed 168 words per minute; speech length 674 words; speech time 240 secs
Torsten Krause (TK): speech speed 143 words per minute; speech length 1,648 words; speech time 692 secs
Zydrunas Tamasauskas (ZT): speech speed 152 words per minute; speech length 2,404 words; speech time 948 secs