WS #219 Generative AI LLMs in Content Moderation Rights Risks

24 Jun 2025 16:00h - 17:00h

WS #219 Generative AI LLMs in Content Moderation Rights Risks

Session at a glance

Summary

This discussion focused on the human rights implications of using Large Language Models (LLMs) for content moderation on social media platforms and digital services. The panel, featuring experts from the European Center for Nonprofit Law, Digital Trust and Safety Partnership, Center for Democracy and Technology, and Access Now, examined both the potential benefits and significant risks of deploying LLMs in automated content moderation systems.


The conversation highlighted how LLMs represent a concentration of power, with a handful of companies developing foundation models that are then deployed by smaller platforms, creating a cascading effect where moderation decisions made at the foundation level impact content across multiple platforms. While LLMs offer some advantages over traditional automated systems, including better contextual understanding and improved accuracy, they also pose serious risks to human rights, particularly freedom of expression, privacy, and non-discrimination.


A critical issue discussed was the disparity between high-resource languages like English and low-resource languages, with multilingual LLMs performing poorly for languages with limited training data. This creates significant inequities in content moderation, as demonstrated by examples from the Middle East and North Africa region, where Arabic content faces over-moderation while hate speech in Hebrew was under-moderated due to lack of appropriate classifiers.


The panelists shared concerning real-world examples, including the misclassification of Al-Aqsa Mosque as a terrorist organization and the wrongful detention of a Palestinian construction worker based on Facebook’s mistranslation. These cases illustrate how LLM errors can have severe consequences for marginalized communities, particularly during times of crisis when platforms tend to rely more heavily on automation.


The discussion emphasized the need for greater community involvement in LLM development, mandatory human rights impact assessments, and more transparency from platforms about their use of these technologies.


Key points

## Major Discussion Points:


– **Concentration of Power in LLM Development**: The discussion highlighted how a handful of companies (like those behind ChatGPT, Claude, Gemini, Llama) develop foundation models that are then used by smaller platforms, creating a concerning concentration of power where decisions made at the foundation level (such as defining Palestinian content as terrorist content) automatically trickle down to all deploying platforms.


– **Language Inequities and Low-Resource Languages**: A significant focus was placed on how LLMs perform poorly for “low-resource languages” (languages with limited textual data available for training), creating disparities where content moderation works well for English and other high-resource languages but fails for languages like Swahili, Tamil, Quechua, and various Arabic dialects, despite these being spoken by millions of people.


– **Real-World Harms in Content Moderation**: The panel extensively discussed concrete examples of LLM failures, particularly in the Middle East and North Africa region, including cases where Arabic content was over-moderated while Hebrew hate speech was under-moderated, mistranslations leading to false terrorism accusations, and aggressive automated removal of Palestinian content during crises.


– **Technical Limitations and Trade-offs**: The discussion covered inherent technical challenges including the precision vs. recall trade-off (accuracy vs. comprehensive coverage), hallucinations where LLMs confidently provide wrong information, and the particular vulnerability of LLMs when dealing with novel situations not well-represented in their training data.


– **Community Involvement and Alternative Approaches**: The conversation emphasized the need for meaningful community engagement throughout the AI development lifecycle, highlighting emerging community-led initiatives in the Global South that focus on culturally-informed, decentralized models as alternatives to the current concentrated approach.


## Overall Purpose:


The discussion aimed to provide a comprehensive analysis of the human rights implications of using Large Language Models (LLMs) for content moderation on social media platforms. The panel sought to bridge technical understanding with real-world societal impacts, moving beyond AI hype to document actual harms while also exploring potential solutions and alternative approaches that could better respect human rights and community needs.


## Overall Tone:


The discussion maintained a consistently serious and concerned tone throughout, with speakers demonstrating deep expertise while expressing genuine alarm about current practices. The tone was analytical rather than alarmist, with panelists providing concrete evidence and examples to support their concerns. While the conversation acknowledged some potential benefits of LLMs, the overall sentiment was cautionary, emphasizing the urgent need for better oversight, community involvement, and human rights protections. The tone remained constructive, with speakers offering specific recommendations and highlighting promising alternative approaches, suggesting a path forward despite the significant challenges identified.


Speakers

– **Marlene Owizniak**: Leads the technology team at the European Center for Nonprofit Law (ECNL), a human rights and civic space organization; based in San Francisco


– **David Sullivan**: Executive Director of the Digital Trust and Safety Partnership (DTSP), which brings together companies providing digital products and services around trust and safety best practices


– **Dhanaraj Thakur**: Research Director at the Center for Democracy and Technology (CDT), a nonprofit tech policy advocacy group based in Washington, D.C. and Brussels; expertise in content moderation and multilingual AI systems


– **Panelist 1**: Marwa Fatafta, MENA (Middle East and North Africa) policy and advocacy director at Access Now; expertise in regional content moderation issues and human rights impacts


– **Audience**: Multiple audience members including:


– Balthazar from University College London


– Someone from the Internet Architecture Board in the IETF (Internet Engineering Task Force)


– Professor Julia Hornley, Professor of Internet Law at Queen Mary University of London


**Additional speakers:**


None identified beyond those in the speakers list.


Full session report

# Human Rights Implications of Large Language Models in Content Moderation: Panel Discussion Report


## Introduction and Context


This panel discussion examined the intersection of artificial intelligence and human rights in digital content moderation. The conversation featured Marlene Owizniak from the European Center for Nonprofit Law (ECNL), David Sullivan from the Digital Trust and Safety Partnership (DTSP), Dhanaraj Thakur from the Center for Democracy and Technology (CDT), and Marwa Fatafta from Access Now.


The discussion addressed how the deployment of Large Language Models (LLMs) for automated content moderation affects fundamental human rights, particularly freedom of expression, privacy, and non-discrimination. The panelists grounded their analysis in documented real-world harms and systemic inequities already emerging from current LLM implementations.


## The Concentration of Power Problem


### Foundation Models and Cascading Effects


Marlene Owizniak highlighted the unprecedented concentration of power in LLM development, explaining how “a handful of companies” developing foundation models like ChatGPT, Claude, Gemini, and Llama create a cascading effect where “any kind of decision made at the foundation level, let’s say, defining Palestinian content as terrorist content will then also trickle down to the deployer level unless it’s explicitly fine-tuned.”


This structural issue creates what Owizniak described as “even more homogeneity of speech” than previous systems. Unlike traditional platforms where individual companies made independent moderation decisions, the current LLM ecosystem concentrates unprecedented power in foundation model developers.
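
The cascade Owizniak describes can be pictured as a thin deployer wrapper around an upstream classifier. The sketch below is a minimal illustration only, not any vendor's actual API: `foundation_moderate`, its labels, and the banned-phrase list are hypothetical stand-ins for policy choices fixed at the foundation level.

```python
# Illustrative sketch of the foundation-model / deployer cascade described above.
# `foundation_moderate` is a hypothetical stand-in for a hosted foundation model;
# no real vendor API is implied.
from dataclasses import dataclass


@dataclass
class ModerationDecision:
    label: str         # e.g. "allow", "hate_speech", "terrorist_content"
    confidence: float  # model-reported confidence between 0.0 and 1.0


# Policy fixed upstream, at the foundation level (hypothetical example).
UPSTREAM_BANNED_PHRASES = {"example banned phrase"}


def foundation_moderate(text: str) -> ModerationDecision:
    """Toy stand-in for the upstream foundation model's judgement."""
    if any(phrase in text.lower() for phrase in UPSTREAM_BANNED_PHRASES):
        return ModerationDecision("terrorist_content", 0.9)
    return ModerationDecision("allow", 0.95)


def deployer_moderate(text: str, overrides: dict[str, str] | None = None) -> str:
    """A smaller platform's thin wrapper around the upstream model.

    Unless the deployer supplies an explicit override (or fine-tunes the model),
    the foundation-level label propagates unchanged to its users.
    """
    decision = foundation_moderate(text)
    return (overrides or {}).get(decision.label, decision.label)


# A platform with no overrides inherits every upstream policy choice:
print(deployer_moderate("a post with an example banned phrase"))  # terrorist_content
# Only a deliberate override (or fine-tuning) changes the outcome:
print(deployer_moderate("a post with an example banned phrase",
                        overrides={"terrorist_content": "allow"}))  # allow
```

The point is structural rather than technical: whatever definition the upstream model encodes is what users of every downstream platform experience by default.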


### Demystifying AI Technology


Owizniak provided crucial context by demystifying LLM technology, noting that “AI is neither artificial nor intelligent. It uses a lot of infrastructure, a lot of hardware, and it’s mostly guesstimates.” She characterised LLMs as “basically statistics on steroids” rather than divine intelligence, connecting their technical limitations to their concentrated ownership structure.


## Language Inequities and Systematic Discrimination


### The Low-Resource Language Crisis


Dhanaraj Thakur provided extensive analysis of how language inequities create systematic discrimination in LLM-based content moderation. He explained the concept of “low-resource languages” – languages with limited textual data available for training – and how this creates severe disparities in system performance. Despite languages like Swahili, Tamil, and Quechua being spoken by millions of people, they receive inadequate representation in training datasets compared to “high-resource languages” like English.


The consequences are severe: speakers of low-resource languages experience “longer moderation times, unjust content removal, and shadow banning” compared to English-language users. These represent systematic digital discrimination that mirrors and amplifies existing global inequalities.


### Complex Linguistic Challenges


Thakur introduced the concept of “diglossia” – situations where communities use two languages with different social functions, often reflecting colonial power structures. He explained how “in many of these languages, particularly in those that have gone through the colonial experience, there’s a combined use of two languages” where one represents power whilst the other serves mundane functions.


This analysis raised questions about whether LLM development might “replicate or exacerbate this kind of power dynamics between these two languages,” connecting historical colonialism to contemporary AI systems. Additional challenges include code-switching (mixing languages within conversations) and agglutinative language structures that don’t conform to English-based training assumptions.


### Regional Case Studies from MENA


Marwa Fatafta provided concrete examples from the Middle East and North Africa region, documenting systematic disparities where “there were no Hebrew classifiers to moderate hate speech in Hebrew language, but they were such for Arabic.” This had severe consequences during periods of heightened conflict, with Arabic content facing over-moderation whilst Hebrew hate speech remained undetected.


Fatafta described how “removing terrorist content in Arabic got it wrongly 77% of the time,” whilst critical infrastructure like Al-Aqsa Mosque was mislabelled as a terrorist organisation during sensitive periods. She also recounted cases where Facebook’s mistranslation led to false terrorism accusations, including a Palestinian construction worker who was wrongfully detained after the platform mistranslated his Arabic post about “attacking” his work (meaning going to work) as a terrorist threat.


## Crisis Response and Over-Moderation


Fatafta observed that “companies tend to over rely on automation around times of crises” and “are willing to sacrifice accuracy in the decisions, as long as we try to catch as large amounts of content as possible.” She noted that Meta lowered confidence thresholds from 85% to 25% for Arabic content during crisis periods, leading to what she described as “mass censorship of legitimate content.”
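
To make the mechanics concrete, the toy sketch below uses invented scores and labels (it is not Meta's classifier or data) to show how dropping an auto-removal threshold from 0.85 to 0.25 sweeps in far more posts, including legitimate ones.

```python
# Toy illustration of how lowering a classifier's confidence threshold trades
# precision for recall. Scores and labels are invented for illustration.
posts = [
    # (classifier's "violating" score, is the post actually violating?)
    (0.95, True),
    (0.90, True),
    (0.70, False),  # e.g. news reporting that quotes hostile speech
    (0.55, False),  # e.g. political commentary
    (0.40, False),  # e.g. a flag or watermelon emoji
    (0.30, True),
    (0.10, False),
]


def removals(threshold: float) -> tuple[int, int]:
    """Count posts removed at this threshold, and how many were legitimate."""
    removed = [(score, bad) for score, bad in posts if score >= threshold]
    false_positives = sum(1 for _, bad in removed if not bad)
    return len(removed), false_positives


for threshold in (0.85, 0.25):
    total, wrongly_removed = removals(threshold)
    print(f"threshold {threshold:.2f}: {total} posts removed, "
          f"{wrongly_removed} of them legitimate")
# threshold 0.85: 2 posts removed, 0 of them legitimate
# threshold 0.25: 6 posts removed, 3 of them legitimate
```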


This approach is particularly harmful because crises are precisely when marginalised communities most need access to communication platforms for safety, coordination, and documentation of human rights violations.


## Technical Capabilities and Limitations


### Potential Benefits and Applications


David Sullivan provided the industry perspective, identifying areas where LLMs might improve current systems: enhancing risk assessments, improving policy development consultation, and augmenting rather than replacing human review. He noted the potential for “generative AI to improve explainability of content moderation decisions and provide better context to users.”


Sullivan emphasised that effective deployment requires understanding LLMs as tools that “augment human review rather than replace it” and referenced the ROOST (Robust Open Online Safety Tooling) project as an example of collaborative safety initiatives.


### Fundamental Technical Constraints


However, Sullivan was candid about technical limitations, explaining that “models struggle with novel challenges not adequately represented in training data” and highlighting the persistent issue of hallucinations where LLMs “confidently provide wrong information.”


He explained the precision versus recall trade-off that remains central to all content moderation systems: emphasising precision (accuracy) means missing harmful content, whilst emphasising recall (comprehensive coverage) means removing legitimate content. LLMs don’t eliminate this trade-off but may shift where the balance point lies.
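
Stated with the standard definitions, the two metrics are computed from the same error counts, which is why improving one usually costs the other. A minimal worked example with invented numbers:

```python
# Precision vs. recall on a hypothetical batch of moderation decisions.
# All counts are invented for illustration.
true_positives = 80    # harmful posts correctly removed
false_positives = 20   # legitimate posts wrongly removed
false_negatives = 40   # harmful posts the system missed

precision = true_positives / (true_positives + false_positives)  # 0.80
recall = true_positives / (true_positives + false_negatives)     # ~0.67

print(f"precision = {precision:.2f} (how often a removal was correct)")
print(f"recall    = {recall:.2f} (how much harmful content was caught)")

# Tuning the system to catch the 40 missed posts (raising recall) typically
# sweeps in more legitimate posts too, raising false positives and lowering
# precision: the trade-off described above.
```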


### Structural Incompatibilities with Democratic Rights


Owizniak highlighted structural incompatibilities with democratic rights, explaining that “even the best intentioned platforms will make errors just because this is content that falls outside of data sets and the bell curve. It is, by definition, exceptional or contrarian.”


She noted that organisations working on “protests, civic space, assembly and association” deal with content that is “by default, contrarian, minority, anti-power, protest.” Statistical systems trained on mainstream content will systematically struggle with such material.


## Community Engagement and Alternative Approaches


### Current Inadequacies


Thakur noted that “social media companies lack awareness of local LLM developers and researchers,” missing opportunities for partnerships that could improve system performance for underserved languages and communities.


Owizniak outlined ECNL’s work developing “a framework for meaningful engagement” that involves stakeholders “from AI design stage through deployment,” including their current Discord partnership pilot project. She emphasised the need for involvement in reinforcement learning and human feedback processes rather than relying solely on what she termed “Silicon Valley experts.”


### Community-Led Alternatives


Thakur described efforts to develop “culturally-informed, decentralised models as alternatives to the current concentrated approach.” These initiatives focus on community ownership of both data and classification systems, representing a fundamental alternative to the centralised foundation model approach.


## ECNL’s Research Methodology


Owizniak described ECNL’s comprehensive research approach, which involved “reading 200+ computer science papers” and conducting human rights legal analysis to understand the implications of LLM deployment in content moderation. This methodology combines technical understanding with human rights expertise to provide grounded analysis of current harms.


## Regulatory and Governance Solutions


### Human Rights Impact Assessments


Fatafta advocated for “mandatory human rights impact assessments throughout the AI development lifecycle,” moving beyond voluntary corporate initiatives to regulatory requirements. She referenced BSR human rights due diligence findings regarding Meta’s content moderation practices as evidence of the need for systematic evaluation.


### Transparency and Accountability


Owizniak emphasised the need for greater transparency about “when and how LLMs are used in content moderation.” Current opacity makes it impossible for researchers, civil society, and affected communities to assess system performance or advocate for improvements.


### Alternative Moderation Models


Thakur highlighted “different moderation models beyond centralized approaches, including community-based solutions” that could provide more culturally appropriate and contextually sensitive content governance.


## Audience Engagement


The discussion included questions from academic participants, including Professor Julia Hornley from Queen Mary University of London, who raised concerns about whether community-based approaches could handle the sophisticated legal analysis that content moderation often requires. An audience member from University College London asked about government and civil society influence on technical design decisions.


## Conclusion


This panel discussion revealed the complexity and urgency of human rights challenges posed by LLM-based content moderation. While the technology offers some potential benefits over traditional automated systems, it also creates new forms of systematic discrimination and concentrates unprecedented power in foundation model developers.


The conversation moved beyond both uncritical AI optimism and complete technological pessimism to provide nuanced analysis grounded in documented harms. The speakers demonstrated how seemingly technical decisions about training data, confidence thresholds, and language support embed political choices with severe consequences for marginalised communities.


The discussion highlighted the need for meaningful community engagement, mandatory human rights assessments, and greater transparency in how LLMs are deployed for content moderation. All speakers agreed that human involvement remains essential and that current voluntary approaches to addressing these challenges are inadequate.


The path forward requires sustained collaboration between technologists, human rights advocates, affected communities, and policymakers to ensure that content moderation systems serve human rights rather than undermining them.


Session transcript

Marlene Owizniak: It’s great to see you all here in person and welcome to the folks joining us online. My name is Marlene Owizniak. I lead the technology team at the European Center for Nonprofit Law, a human rights and civic space organization, mostly based in Europe, but we operate worldwide and I myself am based in San Francisco. I’m really thrilled to be here today with my esteemed panelists, from right to left, David Sullivan, executive director of the DTSP, what does it stand for again? Digital Trust and Safety Partnership. Yes. It’s a lengthy acronym. And David can share 30 seconds of what it is later. Dhanaraj Thakur from the Center for Democracy and Technology, research director, and to my left, Marwa Fatafta, MENA policy and advocacy director at Access Now. And today we’ll talk about a topic that is really emerging and a lot of acronyms, apologies in advance, GenAI, LLMs. Being based in San Francisco, it’s something that people talk about daily and it seems far-fetched, but we already see that in the world today. So the way that our session will be structured is I’ll share a few key takeaways from our emerging research at ECNL, as well as some human rights impacts. And then we’ll hear from folks on the panel about different use cases, different risks. We’ll hear a regional perspective from Marwa and really try to bridge both the technical and societal aspects of LLMs and placing it today in everything that’s happening in the world, including geopolitical developments. So this topic is really relevant for ECNL. We’ve been working on automated content moderation for the past five, six years, and LLMs have become really an interesting development, and interesting from a human rights perspective means both potential good as well as alarming use cases. And I’d say one of the biggest issues for LLMs, which are a subset of generative AI trained on vast textual data, is that while they promise efficiency and adaptability, they also pose serious risks. Automated content moderation, as many folks in this room probably know, already poses a lot of human rights risks and causes violations, and LLMs are, at least in Silicon Valley, increasingly presented as a silver bullet for solving these issues. However, our research has shown that they can reinforce existing systemic discrimination, censorship, and surveillance. And one of the most pressing issues that we found, and ECNL has conducted research on this topic for the past year, working with hundreds of different folks across civil society, academia, industry, on mapping the human rights impacts of large language models for content moderation. One of the most pressing issues is the concentration of power. So the way that LLMs used for content moderation work today is that there’s a handful of companies that develop foundation models or LLMs. The one that probably folks are most aware of is ChatGPT. There’s also Claude, Gemini, Llama, and a few others. And so this is at the AI developer level. Then you have the deployer level, which are often smaller social media platforms like Discord, Reddit, Slack. They often will not have their own LLMs, but they will use other LLMs like the ones I mentioned and fine-tune them for their own purposes. So what does this mean? Any kind of decision made at the foundation level, let’s say, defining Palestinian content as terrorist content will then also trickle down to the deployer level unless it’s explicitly fine-tuned.
What this means for freedom of expression globally is that content moderation defined at the foundation level will also be replicated on the deployment one and really there’s even more homogeneity of speech than before. However, alternative approaches are emerging and we’ve seen throughout our research that there are community-led initiatives, especially in the global majority, that focus on public interest AI that is culturally informed and really decentralized. These models, though smaller in scale, demonstrate comparable performance with broader LLMs and highlight the potential for more rights-based moderation. Our report, which our friends at IGF can maybe post online, and I encourage you guys to check out, looks at the various human rights impacts from privacy, freedom of expression and information, assembly and association, non-discrimination, participation, and remedy. We’ll talk about some of these later but I encourage you to read it. It’s a thorough analysis of each of these rights and the last part is recommendations, which we’ll also dive into in this session. And just to note that we’ll have ample time for questions. I really want to hear from folks in the room as well as online. I already see a few experts on this topic and also encourage everyone to participate even though it seems like it’s brand new. A lot of the questions around large language models have been around for a long time. Obviously automated content moderation but even offline or human-led moderation, you know, these questions are often the same and they’re just exacerbated and accelerated due to the scale and speed of AI. With that said, I wanted to turn it to David to share a few use cases of LLMs for content moderation. And DTSP together with BSR, Business for Social Responsibility, has led research on the topic. So David, if you could share some of those findings and introduce your work.


David Sullivan: Thank you. Thanks, Marlena. And I’m going to take these off for a moment. It’s great to be here with everyone. I’m David Sullivan. I lead the Digital Trust and Safety Partnership, which brings together companies providing all different types of digital products and services, including some of those companies that are frontier model developers and deployers, like a Google or a Meta, as well as smaller players, such as Discord and Reddit. And our companies come together around a framework of best practices for trust and safety, a framework that is content and technology agnostic. And the idea is basically that companies can come together around the practices that they use to develop their products, enforce, develop the governance and rules for those products, enforce those rules, improve over time, and be transparent with their users and with the public. So it’s about the practices of safety, as opposed to agreeing on what types of content should be favored or disfavored on these kinds of digital services. So what I want to start with, so last year, in 2024, we brought together a working group of our partner companies on best practices for AI and automation in trust and safety. So that looked at the full range of technologies from even the most basic kind of rule-based systems that have been used as part of trust and safety going back for 20 years, dealing with things like spam, to possibilities for use of generative AI as part of trust and safety. of which content moderation is kind of one component. And so we spent the better part of a year looking at what companies were doing and trying to identify some best practices and as well as what we called generative AI possibilities, ways that companies might be experimenting and beginning to use this technology as part of trust and safety, as well as what the limitations and challenges and ways to overcome those challenges. So I’d encourage folks to go to our website, DTSpartnership.org. That report is right on the front page. And as Marlena mentioned, we worked closely with a team at BSR who helped us do that research. Hannah from that team is here and is also really an expert in this space. So I want to just briefly mention a few things. First, as I think I already said, I think that use of trust and safety, use of AI and automation in trust and safety has always been a blended process of human and technology. That’s always been the case and it continues to be the case, even as what that blend looks like may change quite substantially as LLMs and these generative AI technologies get incorporated into trust and safety. The second thing is that perfection when it comes to content moderation is nearly an impossibility. And so we’re always thinking about potential for over-action or under-action when it comes to how companies are enforcing their policies. And we can talk a little bit more about some of the trade-offs there. And I’m sure we’ll talk a lot about that with this group here. So with that in mind, I wanted to just use our framework of these five overarching commitments to talk about five examples of kind of possibilities for the use of generative AI. as part of trust and safety. And then hopefully those will help kick off some discussion. So the first of our commitments that all our company members make is around product development. 
And so one example of a generative AI possibility in product development is the use of generative AI to enhance and inform the kinds of risk assessments that companies do when they are developing and rolling out new products or new features within products. So examples of this could be generative AI could help to analyze emerging patterns of content related abuse. They could identify edge cases that could then become more mainstream. You connect data points between different types of risk factors and could potentially be used as part of red teaming exercises by trust and safety teams, brainstorming attack scenarios and things like that. So that’s on the product development side. The second commitment that all our companies make is to product governance. And so there as part of that commitment, some of the best practices we’ve identified around external consultation, incorporating user perspectives into company policies and consulting with the civil society organizations and other external groups as the part of developing and iterating those policies. So there again, I think LLMs could potentially be leveraged to gather much more data. You have companies currently using kind of surveys and focus groups and getting information from outside experts and all of that may not always be as coherently brought together as it could be. So LLMs could help with that. And one other thing they could potentially do is help to create the kind of feedback loop where those organizations that spend a lot of time telling companies what they should and shouldn’t be doing with their policies. would be able to hear it. This is how your input was used in, you know, the development of this, you know, new content policy around whatever issue. On enforcement, so there I think one of the things that, where I am cautiously optimistic, is about the potential for generative AI to augment human review as opposed to replace it. And we hear a lot these days about AI replacing humans when it comes to content review. But I think the area where there’s the most potential, both in terms of shielding humans from having to review the worst of the worst type of content, but also being able to help provide context to human reviewers that maybe will help them with their decision-making. And also being able to route the things that are easily determined to be content violating away from humans to then make their own work more efficient. And we can talk, of course, about a lot of the challenges there as well. Just quickly on improvement, I think one thing that gen AI can do is sort of enhance the automated evaluation of context around violations. So, you know, it can be hard for companies to be able to let’s see basically the idea is being able to have more information at your disposal in order to figure out how your policies are being are actually being implemented in practice and to incorporate that context into automated actions as well as that sort of information for human reviewers. And then lastly on transparency, there I think there’s also potential for generative AI to improve the explainability of the decisions that companies are taking. 
So I think maybe all of us have at one point or another had an experience of having something you’ve posted finding that it violates some service’s guidelines one way or another, and when you appeal those things you get very little information in return. And so there is, I think, potential for these types of technologies to be able to provide a little bit more information. So the example, for example, would be, you know, if you have posted a video that’s an hour long, it could tell you here’s the two minutes that we found to be violative and you can have a chance to correct that. So those are, I think, some possibilities, some positive use cases. We’re going to talk a lot more about the limitations and challenges. I just wanted to mention just a couple of them. The first is that, and this is where I think the stakes of all of this get very high, is that we don’t know what, you know, kind of tomorrow’s content crises are going to look like. And we know that these models are not good when they’re dealing with novel challenges that are not adequately represented in their training data. That’s when they really go off the deep end. So we need to be aware of that. The second thing is that for all companies that have trust and safety operations that have their content policies, they need to exist in three different forms. There has to be a public facing version for users to understand what’s allowed and not allowed. There has to be the internal detailed version, these are the specifics of how we enforce this policy, which you don’t want to make completely public because bad actors can use that to kind of, you know, game the system. And then you need a version that’s machine readable, that can be used by a company, by LLMs. So that’s a complicated sort of balancing act and one that also complicates these challenges. Lastly, I think there’s just trade-offs when it comes to the metrics that companies use here. And so you can, you know, optimize for precision, which is really the metric about how correct your decisions are, or you can optimize for recall, which is about covering as much content as possible. These are the terms that, you know, kind of AI folks will throw around. They have real consequences when it comes to the kinds of harms that occur through digital services and the impact of those. And so you’re constantly having to balance, you know, we need to make sure we get as much of the really harmful content as possible. Whereas in other situations, you want to worry about false positives. So those are real trade-offs. You can’t just wish them away. And so I think hopefully that maybe helps kick things off and I’ll stop there to give others time.


Marlene Owizniak: Thanks. Thanks, David. And can everybody hear us? Yeah. OK, great. Because the mic situation is a little bit off, but you have to wear the earphones to hear. Next we’ll hear from Dhanaraj about multilingual models in particular. So a lot of this conversation and research is often on English content and some colonial languages. But perhaps or hopefully unsurprisingly to folks in this room, that is not the case across languages. There are a lot of inequities. And CDT really over the past few years has done groundbreaking research on this topic. It has informed our own research as well. So I’m thrilled to have you, Dhanaraj, here. And also shout out to Aliya Bhatia, who is not here with us today, but who has done some of that research.


Dhanaraj Thakur: Yeah, great. Thank you, Marlena. And thanks for the invitation to join this conversation. Yeah, so I’m Dhanaraj Thakur. I’m research director at the Center for Democracy and Technology, based in Washington, D.C. and in Brussels. We’re a nonprofit tech policy advocacy group. We focus on a range of issues, one of which is around content moderation. Great. So, yeah, just to follow up then on what David discussed on, like, the application of large language models in content moderation analysis and trust and safety systems, I can talk a bit more specifically about how those systems and technologies are applied in what we’ll further discuss and explain as low-resource languages. And this is based on some research that, as Marlena mentioned, CDT has done. For example, one report called Large Language Models in Non-English Content Analysis, led by Gabriel Nicholas and Aliya Bhatia, and a forthcoming report based on a series of case studies we’ve been doing on content moderation in the Global South, looking at different low-resource languages, specifically Maghrebi Arabic, Swahili, Tamil and Quechua. So when we talk about multilingual large language models, what we’re essentially focused on is large language models that are trained on text data from several different languages at once. And the logic or the claim that researchers often make with these kinds of models is that they can extend the various multilingual capabilities and benefits, like those that David highlighted, to languages other than English and even to languages for which there’s little or no text data available. And so you can also then see how you can apply some of these kinds of benefits to much of the user-generated content from various kinds of languages around the world. That said, there are several issues and challenges that come up, and that’s what I’ll spend a bit more time talking about, skipping over the potential benefits, which I think David has covered quite well. So a lot of studies also show that multilingual language models struggle to deal with what’s called this wide disparity between languages in how much textual data is available. And so researchers describe or use categories of high-resource and low-resource languages. English has, by multiple orders of magnitude, much more text data available than any other language, and there’s a lot of reasons behind that. You can think of the legacy of British colonialism, American neocolonialism and the subsequent erasure of regional and indigenous languages. Most of the companies that we are discussing now, when we talk about frontier model companies, are based in the U.S. as well, as well as the social media companies, so English becomes a dominant language there. So what we call high-resource languages effectively refers to languages where there are significant amounts of text data available, such as English and many other European and other languages, Chinese, Arabic, for example. On the other hand, on the other side of the spectrum, you have low-resource languages with very little textual data available, but these can still be major languages in terms of number of speakers. So this can include, for example, Swahili, Tamil, or dialects of Arabic, so the Maghrebi Arabic languages I mentioned earlier. There’s also, like, Bahasa Indonesia, which has literally hundreds of millions of speakers.
So what this leads to then is this kind of disparity or inequity in the potential applicability of these technologies to content analysis, particularly in social media, but in other use cases as well. We could have a separate discussion on the terminology of high and low-resource as it applies to these kinds of languages, but we could leave that for another time. So one of the questions that comes up in a lot of our work is not so much the technical capability of these technologies, but also how they’re incorporated into existing trust and safety systems. And so here we come across several different kinds of problems. So with low-resource languages, for example, we have this problem of lack of training data, but often that lack of training data is not just for content in general, but can be for specific domains. So in our research, for example, we spoke to LLM and NLP researchers, natural language processing researchers, working on Quechua, for example. And there is the problem of having enough data in Quechua available generally for developing these models, but also in specific domains such as hate speech. Because if hate speech is a concern, for example, for a particular trust and safety application, then you need particular data in that as well. And that’s also part of the lack of data or the low resource problems, so to speak. Many users, and this is, I think, a well-known fact for many of you, is that people online, when they come to user-generated content, engage in what’s called code switching. So they alternate between two languages, for example, because many people employ multiple languages in daily conversation as well. There are other challenges, such as the agglutinative nature of some of these languages. So by that, I mean that some languages, such as Quechua, Tamil, for example, will build words based on lexical roots and then they add suffixes to create complex meanings that often in other languages require entire sentences or multiple sentences to convey the same thing. How LLMs handle that process is quite different, but it adds challenges to the analysis of text or content in those languages. There’s also the issue of diglossia, which is a linguistic concept. Often in many of these languages, particularly in those that have gone through, like, the colonial experience, there’s a combined use of two languages, which I mentioned, but they’re done in such a way that there’s an asymmetrical relationship between the use of the two languages. So if I use the example of Quechua and its relationship to Spanish, which is a colonial language, one would represent power, status and issues of importance in how people use that language in the same sentence or paragraph with Quechua, whereas Quechua would be used for, for example, a more mundane function. So this concept of, like, words from a certain language are used to represent more power and the other is used to represent less power, and that’s used together.
In our research, what we showed or what we found, I’ll just highlight some of the problems, but these have direct impacts on people, social media users, who post in these languages. For example, what people observed is that often it would take a longer time for the social media companies to moderate content in these languages versus content that was uploaded in high resource or in the colonial languages. And because of the challenges around developing models around this, and or the lack of native speakers on trust institute teams that can handle these languages, it could lead to a longer time to moderate content. There were often in addition reports of unjust content removal, perceived shadow banning and so on. People also highlighted different ways of recognizing these problems, highlighted different tactics of what we call resistance. So, for example, I mentioned code switching. There’s algo speak, you know, using random letters in a word or using different emojis, for example, the watermelon emoji, which is used to refer to Palestine, for example. So using various tactics because they are aware of the failures or the potential weaknesses of these kinds of automated systems, using various tactics to get around them. Yeah, I can stop there for now. Thanks.


Marlene Owizniak: Thanks so much. And this is a perfect segue to talk about some real world regional harms going to the Middle East and North Africa region, which is, you know, especially today, very topical, but the issues around content moderation and censorship more broadly, surveillance are not new to the region. So really, really grateful to have Marwa here. And we’d love to hear from you about your regional perspective.


Panelist 1: Yeah, thank you, Marlena. And yeah, unfortunately, the MENA region is quite rife in examples. But I do want to first thank my co-panelists for laying the ground pretty well for me to provide some specific examples. I want to make my comments around three issues. The first one of which that has already been alluded to, to the question of where do you invest in those systems and in which languages? Arabic is one of I don’t want to call it a minority language. Millions of people speak it. It’s an official UN language. Yet, unfortunately, AI systems used by tech companies and social media platforms more specifically tend to be poorly trained. And I’ll mention some specific examples there. But the issue here is also in some cases where there is a minority language, companies sometimes think it’s not, you know, market. It’s not there is no incentive for them to prioritize that language, even though, for example, in the context of Palestine, Israel in 2021, when there was a surge of violence on the ground and also a surge of online content, protesting or documenting abuses, we noticed that there was an over-moderation of Arabic language under moderation of Hebrew language on META’s platforms more specifically. And after Business for Social Responsibility conducted a human rights due diligence into META’s content moderation of that period, one of the reasons behind such dynamic was the fact that There were no Hebrew classifiers to moderate hate speech in Hebrew language, but they were such for Arabic. One would ask a question here is that why, despite the context was very clear, there were high incitement, so high volume of content inciting to genocide, inciting to violence, pretty direct hate speech where it’s pretty much black and white. But nevertheless, the company did not think that it was a priority at the time to roll out classifiers that would be able to automatically detect and remove such harmful and potentially violative content. When we pushed back, of course, and after the due diligence findings were out, now META has classifiers, but we found out in the new round of violence, unfortunately, I mean, after October 7th, that those classifiers were not even well trained to be able to capture even more, a larger volume of hate speech and incitement to violence and genocide and dehumanization or dehumanizing rhetoric, which leads me to the second issue. That is the under and over enforcement that comes as a direct impact of basically company decisions and investment and where and when to deploy such systems. Let’s now zoom in into the concrete issues of how these systems are not, they’re far from perfect, but the risks and the direct impact of which can be very, very harmful. One of the examples here in terms of over enforcement, I mean, if you talk to, for example, Syrians, Syria has been one of the most sanctioned countries on Earth planet. Thankfully, many of the sanctions are being removed. But the result of that, you know, for example, on counterterrorism legislation or laws, we’ve seen aggressive moderation. and removing terrorist content in Arabic got it wrongly 77% of the time. That’s quite huge. 
And when we talk again about a region that is pretty much at the receiving end of these aggressive counterterrorism measures, the result is this mass scale censorship of activists, of human rights defenders, of journalists, and particularly around peaks of violence and escalations, where people do come to online platforms to share their stories and the realities and to document abuses, and for journalists, of course, to cover what’s happening on the ground. There are other examples that I could mention where AI got things terribly wrong in extremely sensitive and critical moments. One example I can think of is in 2021, when Instagram falsely flagged Al-Aqsa Mosque, which is the third holiest mosque in Islam, as a terrorist organization. And as a result, all hashtags. And that particular time is also quite interesting. It’s interesting because it was when the Israeli army stormed Al-Aqsa Mosque and people were reporting and sharing photos with that hashtag Al-Aqsa, and this is when Instagram decided, or Meta, now is the time to mislabel this mosque as a terrorist organization, and as a result all the content was banned and forcibly removed. During the unfolding genocide in Gaza we had also examples where one famous example was of a person whose Instagram bio was mistranslated. He said, you know, praise be to God, I’m Palestinian, but the system translated it as praise be to God, Palestinian terrorists are fighting for their freedom. Many, many years ago also there was a case of a Palestinian construction worker who was working in Jerusalem, who was arrested by the Israeli police because he was flagged to them that he’s about to conduct a terrorist attack, and they relied on Facebook’s automated translation, which falsely or mistakenly translated the man saying good morning, posting a picture of himself smoking a cigarette and leaning on a Caterpillar, to good morning, I’m going to attack them. And the man was detained for a few hours and interrogated, and then he was released after the Israeli police realized, oh, Facebook made a mistake in translation. The man had shut down his accounts, I do remember, meaning that those types of actions and their consequences can be quite detrimental for people’s ability not only to express themselves, but can also constitute or instill a sense of fear that they might be subject to similar detrimental consequences.
And there one specific example I can mention here is META’s decision to lower the threshold for hate speech classifiers directly in the aftermath of October 7th attack to remove, for these classifiers to detect comments in Arabic language and specifically those coming from Palestine. So lowering the confidence thresholds from, I think it was around 85 or so, all the way down to 25%, meaning that the classifier, you know, at that very low level could remove and hide people’s comments. Because again, the emphasis here on removal versus precision or accuracy in the decisions. Now, what does that mean for the users, for people and their ability to use? Those platforms freely and safely to express themselves. We’ve had situations where people were banned from commenting for days We’ve had people who had Really extremely innocuous. I mean just Palestinian Flags or they watermelon emojis being removed We’ve had even people having receiving warnings Before following particular accounts, you know For instance if you were a journalist known for covering the events in in Palestine and or in Gaza more specifically and The you would get a notification saying are you sure you want to follow this person because they’re known for spreading disinformation I’m talking about credible journalists as you know, professional journalists not influencers or content creators so we’ve had many examples and again where Hundreds if not thousands of people who had their content removed as a result of these types of that tension to which companies tend to tilt towards Again over moderation or aggressive moderation rather than than accuracy Lastly what I want to say is that okay. I’m not an expert on LLMs, but I What concerns me the most is that we are at the cusp yet of another era of new technologies or you know a new iteration of technologies in which there is a lot of promise, but there are yet to be proper human rights impact assessments and that’s something that you Excellently catch in your in your report that we still don’t have access to these systems It’s hard to independently audit them and therefore to understand and also work with the companies What are the the risks and how can be mitigated before they are already rolled out at a scale and then we as civil society? find ourselves in the position of having to Document the harm try to connect the dots and understand Okay, why is it that the certain population at a certain time being subject to censorship? and what could be the catalyst reasons behind it and then provide that as an evidence for platforms for them to correct course and adjust the systems and the policies behind them. And I’ll stop here. Thanks so much.


Marlene Owizniak: And before I open it up to the floor, I just wanted to highlight a few of the key risks that we found, just following up on the speakers’ points. And I really do encourage you to read our report. We distilled it down to 70 pages, which is still quite long. We read over 200 computer science papers and like really brought a human rights legal analysis to it and tried to make it more digestible by having different chapters. So every right is its own chapter. And then there’s also one technical primer. Some of the concepts that David shared, which are very common in the, you know, AI, CS technical world, are less so in policy and vice versa. And then Marwa said we need more human rights impact assessments. So BSR was brought up several times. There’s a handful of orgs and people doing that, but it’s, it’s really concerning that you have so few human rights impact assessments for this big of an impact. And some of the key LLM impacts we found, on the benefits side, because there can be some potential use cases, is that LLMs are typically better at assessing context. So if we are going to use automated content moderation, they typically perform better. The accuracy level is higher than traditional machine learning. And they can be also better for personalized content moderation. So we talk a lot about user empowerment and agency, and if folks want to kind of like adjust their own moderation settings, if someone is comfortable with sensitive content, for example, or gore or nudity, they can choose that versus others can filter that out. And also LLMs can be better at informing users in real time why the content was removed, for example, and what steps they can take to remedy it. There’s such a big gap today with explaining to users why their content was removed and what they can do to appeal that. That said, a few key risks specific to LLMs. One, because there’s so much content. And I often say, I’ve been working in AI for a long time, for those who know me, and AI is neither artificial nor intelligent. It uses a lot of infrastructure, a lot of hardware, and it’s mostly guesstimates. So LLMs, and I should have begun with that, are large language models. It’s basically statistics on steroids. It’s not divine intelligence. It’s just a lot of data with a lot of computing power, which is also one of the reasons why it’s so concentrated. What happens when systems have so much data, often from web scraping, is that they can infer sensitive attributes, much more than traditional ML systems can. And when we think about the relationships between governments and companies today, that really puts minorities at risk of being targeted and increasingly surveilled. Marwa and folks here already talked about over and under enforcement. Unfortunately, marginalized groups are impacted by both false positives and false negatives. That means, for Marwa’s example, Palestinian content is both overly censored and, at the same time, genocidal, hateful content is not removed from the platform. Hallucinations are a very typical GenAI LLM example. When companies rely on LLM-driven content moderation to moderate misinformation, for example, the LLMs can put out really confident-sounding statements that are just wrong. So using that to inform human content moderators or automated removal often leads to just errors and inaccuracy. One last thing I will mention is that our organization, ECNL, we work a lot on protests, civic space, assembly and association.
These are often actions and content that are, by default, contrarian, minority, anti-power, protest. You usually protest something. You protest a powerful institution. And if you think about AI, both traditional machine learning and LLMs, they are statistical bell curves. And minority content falls outside this data set. And Marwa and David, I think, hinted at that when we think about crisis and, quote-unquote, exceptional content. So even the best intentioned platforms will make errors just because this is content that falls outside of data sets and the bell curve. It is, by definition, exceptional or contrarian. So that’s really something to consider when you think about assembly and protests. Yeah, and I’ll just leave it at that. We also have a lot of work on participation, so I encourage you all to check that out and reach out. And I would love to open it up now. One of the things that we’ve been thinking a lot about at ECNL after doing this human rights impact assessment is: now what? What kind of recommendations can we make to AI developers and deployers? What are we still missing in the academic and civil society community? What gaps are there? So if anybody has thoughts on that, I would love to hear from you. Otherwise, any other questions? And folks online, please either write your question in the chat and I’ll bring it to the floor or you can raise your hand. I’ll take a couple questions just because we have limited time. Please raise your hand if you have a question. For now, there’s only one, so please, yes. And if you can introduce yourself, name, and affiliation, that’d be great. Oh, yeah, the mic is over there. I’m sorry. So please line up and go to the mic if you’d like to ask a question.


Audience: Yes, so this is Balthazar from University College London. I'm just wondering if there are any avenues for other actors, in this case government and civil society, to influence the technical design of the LLMs used for content moderation within digital platforms. Or is it largely proprietary to these social media companies, with no way to influence the technical life cycle, so to speak, so that we just rely on tech platforms to decide what the next iteration of the LLM is going to be? Or is there some kind of external human-in-the-loop mechanism so that civil society or government can influence things in a more technical sense, to complement legal and programmatic interventions? Thank you.


Marlene Owizniak: Thanks so much. I have a few thoughts, but I'll hand it over to the panel first.


Dhanaraj Thakur: Sure, thanks, great question. So one piece of feedback that came up a lot in our research is to engage with greater community leadership and participation in the building of LLMs. For example, that can come in the form of building data sets, and ownership of data sets, right, for specific kinds of content. There are many examples around the world of this happening, with local researchers and communities coming together to build LLMs, build models, build language technologies for specific purposes outside of social media. The problem that came up a lot, and I don't know if others have thoughts on this, was that social media companies, the ones who engage, are often not aware of these communities of local LLM developers and researchers, or of these kinds of efforts, which they could benefit a lot from. There's a gap there, a disconnect there. I think that's one area as well. There are also significant opportunities for governments and even industry to invest in these kinds of partnerships and to support those kinds of efforts.


David Sullivan: Just building on that, I do think that there is an opportunity coming up, where there is a lot of enthusiasm and interest within the trust and safety community for open source. In particular, there's a new project called ROOST, Robust Open Online Safety Tooling, where a lot of companies are coming together to open source some of the technologies and tools in this space. That's been a sticky area when it comes to really challenging online safety issues, but I think there is an opportunity there, and there are ways for people to get involved. So I think that's one positive to look at.


Panelist 1: Plus one to involving communities from the get-go, from the start. And yes, I do confirm that I don't think social media companies are connected to local developers or local LLM experts; I certainly don't see that happening in the region. I would also say, in addition to these voluntary multi-stakeholder mechanisms or fora, maybe there should be a space for mandatory human rights impact assessments, also throughout the cycle of development: starting from the very beginning, during the launch of those systems and their rollout, and, of course, following any adjustments or modifications of such systems and their use.


Marlene Owizniak: Yeah, and I'll just add briefly on the AI lifecycle. ECNL has been working with Discord on piloting what we call a framework for meaningful engagement, where from the first stage of the lifecycle, AI design, they're developing machine learning and LLM-driven interventions to moderate content online. So we've been partnering with them and with stakeholders around the world, some of you in this room, on helping them do that. It's a very specific case study, and I can't share more about that. Another example where I think folks can be involved is after the deployment stage. The way that LLMs work is that they're trained, and then there's a whole validation and evaluation phase, often done through reinforcement learning from human feedback. I won't go into details, but it basically requires people to go through the outputs and retrain the model. One thing that we've been advocating for at ECNL is to involve communities at that stage as well. What typically happens is that during this reinforcement learning phase, it's mostly Silicon Valley folks, or experts like probably us in the room, who do that, but not the communities affected. And it's very, very homogeneous: people from elite academic institutions, big-name NGOs, and those based in Silicon Valley. And that's a problem, because it's supposed to, quote-unquote, fix or improve the LLM, but it just ends up perpetuating even more bias. So there are many ways to involve folks, and that is definitely step one, I think, to actually make these systems better. Next question, please.
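As an illustrative sketch of the kind of community involvement described above at the post-deployment evaluation stage, the following Python code shows one hypothetical way to pair model outputs with feedback from reviewers drawn from affected communities; the class and field names are invented for illustration and do not describe ECNL's or Discord's actual workflow.

```python
# Illustrative only: hypothetical structures for looping community reviewers
# into the post-deployment evaluation stage, before any retraining happens.
from dataclasses import dataclass, field

@dataclass
class ModerationOutput:
    post_id: str
    language: str
    model_decision: str      # e.g. "remove", "keep", "escalate"
    model_rationale: str     # explanation generated by the model

@dataclass
class ReviewerFeedback:
    reviewer_community: str  # e.g. "Maghrebi Arabic speakers", "Tamil speakers"
    agrees_with_model: bool
    corrected_decision: str
    notes: str = ""

@dataclass
class FeedbackDataset:
    records: list = field(default_factory=list)

    def add(self, output: ModerationOutput, feedback: ReviewerFeedback):
        # Each record pairs a model output with community feedback, so later
        # fine-tuning or threshold changes reflect affected communities,
        # not only a homogeneous reviewer pool.
        self.records.append((output, feedback))

    def disagreement_rate(self, community: str) -> float:
        rows = [f for _, f in self.records if f.reviewer_community == community]
        if not rows:
            return 0.0
        return sum(not f.agrees_with_model for f in rows) / len(rows)

# Usage: collect feedback, then inspect where the model diverges from a
# given community's judgment before any retraining step.
dataset = FeedbackDataset()
dataset.add(
    ModerationOutput("p1", "ar", "remove", "flagged as incitement"),
    ReviewerFeedback("Maghrebi Arabic speakers", False, "keep",
                     "political commentary, not incitement"),
)
print(dataset.disagreement_rate("Maghrebi Arabic speakers"))
```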


Audience: I'm part of the Internet Architecture Board in the IETF, the Internet Engineering Task Force. I have a very straightforward, or maybe blunt, question. I understood that LLMs will always make mistakes; I think that seems obvious. But do you think there are areas where LLMs can do a good job here? Or do you think there will always be a need for other mechanisms that keep humans involved? Or do you think LLMs are not the right technology at all, and we need other ways to empower users and give them a decision about which content they want to engage with and want to see? So, you know, what's the way forward?


Marlene Owizniak: Do you want to briefly respond?


Dhanaraj Thakur: Sure, I can take a quick one. So I think there are two dimensions to this. How do companies address content moderation? There are actually many different models, not just the kind of centralized model that you see with large social media companies, so we should keep that in mind. You can imagine a subreddit moderator thinking of ways they could use these kinds of tools for their specific use case, and it could be very helpful for community building in that sense. So just keep in mind that there's a range of options available. But when we think of it at a larger scale, human moderators should always be part of the consideration and the calculus of how you address content. I think others, like David and Marwa, mentioned this as well. I think what's important is the flexibility around this. So for some of the particularly low-resource languages, where there's very little data and these kinds of tools may not be as effective, there should be a heavier emphasis on human moderation. And that could evolve over time. But I think there will always be some kind of combination between the two.
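A minimal sketch of the flexible routing described here, assuming an invented confidence threshold and an illustrative list of language codes: content in low-resource languages, or with ambiguous model scores, goes to human moderators, and only high-confidence cases are handled automatically.

```python
# Minimal sketch of flexible routing; thresholds and language codes are
# invented for illustration, not drawn from any platform's policy.

LOW_RESOURCE_LANGUAGES = {"sw", "ta", "qu"}  # e.g. Swahili, Tamil, Quechua

def route(post_language: str, model_confidence: float, model_decision: str) -> str:
    """Decide whether an automated decision can stand or must go to a human."""
    # For low-resource languages, where classifiers are weakest, default to
    # human review regardless of how confident the model claims to be.
    if post_language in LOW_RESOURCE_LANGUAGES:
        return "human_review"
    # Elsewhere, only clearly confident cases are auto-actioned;
    # everything ambiguous is escalated to a person.
    if model_decision == "violates" and model_confidence >= 0.95:
        return "auto_action"
    if model_decision == "ok" and model_confidence >= 0.95:
        return "auto_keep"
    return "human_review"

print(route("sw", 0.99, "violates"))  # -> human_review
print(route("en", 0.97, "violates"))  # -> auto_action
print(route("en", 0.60, "violates"))  # -> human_review
```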


David Sullivan: I would just add that there's an excellent research paper that Google folks put out last year, in 2024, about how LLMs can be leveraged to support human raters of content, which goes into this at a level of technical detail that is beyond me, but which I think might be helpful. One of the opportunities here, cognizant of all the risks of how AI can be misused in content moderation, is that while AI is sometimes an overhyped technology lacking business applications, content moderation and trust and safety is a concrete business application for AI, and one where the developers and deployers are often the same company. So I think there are some opportunities there, but it comes with all the risks we've talked about.
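As a hedged illustration of the "LLM supports the human rater" idea (not the method from the Google paper), the sketch below uses a hypothetical llm_complete stub to generate a brief that a human rater reviews; the final enforcement decision stays with the person.

```python
# Illustrative sketch of "LLM assists the human rater" rather than "LLM decides".
# llm_complete is a hypothetical stand-in, not a specific vendor API.

def llm_complete(prompt: str) -> str:
    # Stub: a real system would call a model API here; the stub keeps the sketch runnable.
    return "[model-generated brief would appear here]"

def build_rater_brief(post_text: str, policy_excerpt: str) -> str:
    """Ask the model for a structured brief that a human rater reviews and can reject."""
    prompt = (
        "Summarize the post below, list the policy clauses it may implicate, and "
        "flag any context (slang, code-switching, satire) a reviewer should check. "
        "Do not issue a final decision.\n\n"
        f"POLICY EXCERPT:\n{policy_excerpt}\n\nPOST:\n{post_text}"
    )
    return llm_complete(prompt)

# The human rater sees the original post alongside the brief, and the final
# decision stays with the human; the brief itself may contain errors.
print(build_rater_brief("example post text", "example policy excerpt"))
```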


Audience: Just a very quick follow-up. You said the humans are part of the chain; I think the challenge is always scaling up and also timely reactions, right? Can you comment on that?


Marlene Owizniak: Excuse me, there are people waiting and we're short on time, but I urge you to read the reports; we also have a large section on recommendations, so that is the next step forward. We only have two minutes, so briefly, please.


Audience: I'm Professor Julia Hornley. I'm a professor of internet law at Queen Mary University of London. I'm an academic and a lawyer, and hence my question. Obviously, for lawyers, it often takes an extremely long qualification for a judge to adjudicate content, right? For lawyers, these are very, very complex decisions, whereas, as I understand it, LLMs and artificial intelligence are based on complex processes of labeling and validation. So I was wondering whether, in addition to LLMs, which, by definition, will always have these problems, which you so often…


Dhanaraj Thakur: So there are models where it's really about engaging communities on what kinds of data, how you classify data, what categories of data are important, and who ultimately owns it and becomes a steward of it. That kind of emphasis is very different from the current models. Having LLM developers partner with these kinds of communities in those contexts is one approach. And it's very similar to the kinds of community-based internet networks that you're talking about. That also introduces different kinds of business models as well.


Marlene Owizniak: Thanks so much. And unfortunately, the session is already wrapping up. There's so much to be said about this topic. I hope one takeaway you have is that it's an emerging field. There's still too little transparency, and if we can urge platforms to share more data, including how and when LLMs are used, which we often don't even know, that is one thing. One of the things that we try to do at ECNL is really document the human rights harms, as opposed to the AI hype; there's a lot of hype in this space, as you probably all know. And at the same time, there is excitement around community-driven models, like Dhanaraj talked about. So not everything is doom and gloom. There is hope for community-driven, public interest, fit-for-purpose models that I think we can explore, and for finding a way to develop AI that respects labor rights, including those of human content moderators and users, and engages stakeholders. And going forward, we will obviously continue to work closely with our partners, many of you in the room, to implement the recommendations with the platforms and test them as well; that work is very much ongoing. You'll see that in the last section of our report. Please reach out if you want to get involved. This conversation is only starting. Who knows if LLMs will even be widely deployed; they're very expensive to run to begin with. That's something we didn't really talk about. But in any case, hearing your voices and concerns is really important. So thank you so much for being here, and happy IGF.


Marlene Owizniak

Speech speed: 168 words per minute
Speech length: 2596 words
Speech time: 926 seconds

LLMs can reinforce existing systemic discrimination, censorship, and surveillance

Explanation

While LLMs are presented as a solution to content moderation issues, research shows they can actually exacerbate existing problems. They perpetuate and amplify discriminatory patterns present in their training data, leading to biased enforcement that disproportionately affects marginalized communities.


Evidence

ECNL conducted research for the past year working with hundreds of different folks across civil society, academia, and industry on mapping human rights impacts of LLMs for content moderation


Major discussion point

Human Rights Impacts and Risks of LLMs in Content Moderation


Topics

Human rights | Legal and regulatory | Sociocultural


Disagreed with

– David Sullivan
– Panelist 1

Disagreed on

Optimism about LLM potential versus focus on current harms


Concentration of power at foundation model level creates homogeneity of speech globally

Explanation

A handful of companies develop foundation models like ChatGPT, Claude, Gemini, and Llama, which are then used by smaller platforms. Any content moderation decisions made at the foundation level automatically trickle down to all deploying platforms unless explicitly fine-tuned, creating unprecedented uniformity in global speech regulation.


Evidence

Example given: defining Palestinian content as terrorist content at foundation level will trickle down to deployer level platforms like Discord, Reddit, Slack unless explicitly fine-tuned


Major discussion point

Human Rights Impacts and Risks of LLMs in Content Moderation


Topics

Human rights | Legal and regulatory | Economic


LLMs can infer sensitive attributes more than traditional ML systems, putting minorities at risk of targeting and surveillance

Explanation

Due to the vast amount of data LLMs are trained on through web scraping, they can deduce sensitive personal characteristics about users far beyond what traditional machine learning systems could detect. This capability, combined with government-company relationships, creates significant surveillance risks for vulnerable populations.


Major discussion point

Human Rights Impacts and Risks of LLMs in Content Moderation


Topics

Human rights | Cybersecurity | Legal and regulatory


Marginalized groups face both over-enforcement and under-enforcement of content moderation

Explanation

Vulnerable communities experience a double burden where their legitimate content is excessively censored (false positives) while harmful content targeting them remains on platforms (false negatives). This creates a situation where they are both silenced and unprotected simultaneously.


Evidence

Palestinian content is both overly censored and at the same time, genocidal, hateful content is not removed from the platform


Major discussion point

Human Rights Impacts and Risks of LLMs in Content Moderation


Topics

Human rights | Sociocultural | Legal and regulatory


Agreed with

– Panelist 1

Agreed on

Crisis periods lead to over-reliance on automation with harmful consequences


Protest and contrarian content falls outside statistical bell curves, making it vulnerable to errors

Explanation

Content related to protests and civic activism is inherently contrarian and anti-establishment, representing minority viewpoints that fall outside the statistical norms that AI systems are trained on. Since LLMs operate on statistical patterns, this exceptional content is systematically misclassified, even by well-intentioned platforms.


Evidence

ECNL works on protests, civic space, assembly and association – actions that are by default contrarian, minority, anti-power; AI systems are statistical bell curves and minority content falls outside datasets


Major discussion point

Human Rights Impacts and Risks of LLMs in Content Moderation


Topics

Human rights | Sociocultural | Legal and regulatory


Dhanaraj Thakur

Speech speed: 179 words per minute
Speech length: 1857 words
Speech time: 619 seconds

Wide disparity exists between high-resource languages like English and low-resource languages in available training data

Explanation

English has orders of magnitude more textual data available than any other language due to historical factors like British colonialism and American technological dominance. This creates a fundamental inequality where major languages with millions of speakers are still considered ‘low-resource’ for AI training purposes.


Evidence

Examples include Swahili, Tamil, Maghrebi Arabic dialects, and Bahasa Indonesian with hundreds of millions of speakers still being low-resource; legacy of British colonialism and American neocolonialism mentioned as contributing factors


Major discussion point

Language Inequities and Multilingual Challenges


Topics

Sociocultural | Human rights | Development


Agreed with

– Panelist 1

Agreed on

Language inequities create systematic discrimination in content moderation


Code switching, agglutinative language structures, and diglossia create additional challenges for LLM analysis

Explanation

Many users naturally alternate between languages in their communications, while some languages build complex meanings through word construction that would require entire sentences in other languages. Additionally, colonial language relationships create power dynamics where different languages within the same text carry different social meanings.


Evidence

Quechua and Tamil mentioned as agglutinative languages; diglossia example of Quechua-Spanish relationship where Spanish represents power/status and Quechua represents mundane functions


Major discussion point

Language Inequities and Multilingual Challenges


Topics

Sociocultural | Human rights | Legal and regulatory


Users experience longer moderation times, unjust content removal, and shadow banning in low-resource languages

Explanation

Due to technical limitations and lack of native speakers on trust and safety teams, content in low-resource languages takes significantly longer to moderate. Users also report widespread unjust removal of legitimate content and perceived shadow banning, leading them to develop resistance tactics.


Evidence

Users employ algo speak, random letters in words, different emojis like watermelon emoji for Palestine, and other tactics to circumvent system weaknesses


Major discussion point

Language Inequities and Multilingual Challenges


Topics

Human rights | Sociocultural | Legal and regulatory


Agreed with

– Panelist 1

Agreed on

Language inequities create systematic discrimination in content moderation


Greater community leadership and participation needed in building LLMs and datasets

Explanation

Local communities should have ownership and stewardship over the data and classification systems used to moderate their content. This approach would ensure cultural context and community values are properly represented in AI systems rather than imposing external standards.


Evidence

Examples of local researchers and communities building LLMs and language technologies for specific purposes outside social media; emphasis on community ownership of datasets


Major discussion point

Community Engagement and Governance Solutions


Topics

Sociocultural | Human rights | Development


Agreed with

– Marlene Owizniak
– Panelist 1

Agreed on

Community engagement and participation is critical for effective AI systems


Social media companies lack awareness of local LLM developers and researchers

Explanation

There is a significant disconnect between social media platforms and local communities of AI researchers and developers who could provide valuable expertise for their specific languages and contexts. This gap prevents companies from benefiting from existing local knowledge and community-driven solutions.


Major discussion point

Community Engagement and Governance Solutions


Topics

Development | Economic | Sociocultural


Agreed with

– Marlene Owizniak
– Panelist 1

Agreed on

Community engagement and participation is critical for effective AI systems


Different moderation models exist beyond centralized approaches, including community-based solutions

Explanation

Content moderation doesn’t have to follow the centralized model of large social media companies. Alternative approaches like subreddit moderation show how LLM tools could be adapted for specific community use cases, potentially being more effective for community building and context-appropriate moderation.


Evidence

Example of subreddit moderator using these tools for their specific use case


Major discussion point

Technical Limitations and Future Considerations


Topics

Sociocultural | Legal and regulatory | Economic


Flexibility needed with heavier emphasis on human moderation for low-resource languages

Explanation

Given the technical limitations of LLMs with low-resource languages, these contexts require a greater reliance on human moderators rather than automated systems. This balance should be flexible and can evolve over time as technology improves, but human oversight remains essential.


Major discussion point

Technical Limitations and Future Considerations


Topics

Human rights | Sociocultural | Development


Agreed with

– David Sullivan
– Audience

Agreed on

Human involvement remains essential in content moderation systems


Disagreed with

– David Sullivan
– Panelist 1

Disagreed on

Role of automation versus human moderation in content decisions


David Sullivan

Speech speed: 170 words per minute
Speech length: 1898 words
Speech time: 666 seconds

LLMs can enhance risk assessments, improve policy development consultation, and augment human review rather than replace it

Explanation

Generative AI can analyze emerging abuse patterns, identify edge cases, connect risk factors, and assist in red teaming exercises. For policy development, LLMs can help gather and synthesize input from surveys, focus groups, and external experts more coherently, while in enforcement they can provide context to human reviewers and route clear violations away from human review.


Evidence

DTSP worked with BSR on research with partner companies; examples include analyzing emerging content abuse patterns, brainstorming attack scenarios, creating feedback loops with civil society organizations


Major discussion point

Technical Applications and Use Cases


Topics

Legal and regulatory | Cybersecurity | Economic


Agreed with

– Dhanaraj Thakur
– Audience

Agreed on

Human involvement remains essential in content moderation systems


Disagreed with

– Dhanaraj Thakur
– Panelist 1

Disagreed on

Role of automation versus human moderation in content decisions


Generative AI can improve explainability of content moderation decisions and provide better context to users

Explanation

Current content moderation appeals provide very little information to users about why their content was removed. LLMs have the potential to offer more detailed explanations and specific guidance on how to correct violations, such as identifying the specific problematic segments in longer content.


Evidence

Example given of hour-long video where system could identify the specific two minutes that were violative


Major discussion point

Technical Applications and Use Cases


Topics

Human rights | Legal and regulatory | Sociocultural


Models struggle with novel challenges not adequately represented in training data

Explanation

LLMs perform poorly when encountering new types of content crises or abuse patterns that weren’t present in their training data. This limitation is particularly concerning given the unpredictable nature of online harms and the high stakes involved in content moderation decisions.


Major discussion point

Technical Applications and Use Cases


Topics

Cybersecurity | Legal and regulatory | Human rights


Trade-offs exist between precision and recall metrics, with real consequences for harmful content detection

Explanation

Companies must constantly balance between precision (accuracy of decisions) and recall (coverage of content). Optimizing for one metric necessarily compromises the other, and these technical trade-offs have direct real-world impacts on both the spread of harmful content and the wrongful removal of legitimate speech.


Major discussion point

Technical Applications and Use Cases


Topics

Legal and regulatory | Human rights | Cybersecurity


Open source initiatives like ROOST provide opportunities for collaborative safety tooling

Explanation

The Robust Open Online Safety Tooling (ROOST) project represents a new approach where companies collaborate to open source trust and safety technologies. This initiative could provide avenues for broader community involvement in developing content moderation tools, though it has historically been challenging in this sensitive area.


Evidence

ROOST (Robust Open Online Safety Tooling) project mentioned as new collaborative effort


Major discussion point

Community Engagement and Governance Solutions


Topics

Legal and regulatory | Development | Economic


Content moderation represents a concrete business application for AI with specific technical opportunities

Explanation

Unlike many overhyped AI applications, content moderation provides a genuine business use case where AI developers and deployers are often the same company. This alignment creates opportunities for more integrated and effective solutions, though it comes with all the associated risks discussed.


Evidence

Reference to Google research paper from 2024 about LLMs supporting human content raters


Major discussion point

Technical Limitations and Future Considerations


Topics

Economic | Legal and regulatory | Cybersecurity


Disagreed with

– Marlene Owizniak
– Panelist 1

Disagreed on

Optimism about LLM potential versus focus on current harms


Panelist 1

Speech speed: 153 words per minute
Speech length: 1676 words
Speech time: 656 seconds

Arabic content is over-moderated while Hebrew content is under-moderated due to classifier availability disparities

Explanation

During the 2021 violence surge, META had Arabic language classifiers for hate speech detection but no Hebrew classifiers, leading to systematic bias in enforcement. Even after Hebrew classifiers were developed following criticism, they remained poorly trained and ineffective at detecting incitement to violence and genocide.


Evidence

Business for Social Responsibility human rights due diligence found no Hebrew classifiers existed in 2021; after October 7th, Hebrew classifiers were still inadequately trained to capture hate speech, incitement to violence and genocide


Major discussion point

Language Inequities and Multilingual Challenges


Topics

Human rights | Legal and regulatory | Sociocultural


Agreed with

– Dhanaraj Thakur

Agreed on

Language inequities create systematic discrimination in content moderation


Translation errors have led to false terrorism accusations and wrongful arrests

Explanation

Automated translation systems have made critical errors with severe real-world consequences, including false terrorism alerts that led to police arrests. These errors demonstrate how technical failures in AI systems can directly harm individuals through interaction with law enforcement and security systems.


Evidence

Palestinian construction worker arrested by Israeli police after Facebook mistranslated ‘good morning’ post with cigarette photo as ‘good morning, I’m going to attack them’; Instagram bio ‘praise to be God, I’m Palestinian’ mistranslated as ‘praise to be God, Palestinian terrorists are fighting for their freedom’


Major discussion point

Language Inequities and Multilingual Challenges


Topics

Human rights | Cybersecurity | Legal and regulatory


Companies over-rely on automation during crises, sacrificing accuracy for speed of content removal

Explanation

During crisis periods, platforms prioritize rapid content removal over accurate decision-making, accepting high error rates to avoid liability. This approach systematically disadvantages affected communities who need platforms most during critical moments to document abuses and share information.


Evidence

After October 7th attack, hundreds of thousands of content was removed using automation; companies feel pressure to remove content quickly to avoid liability


Major discussion point

Crisis Response and Over-Moderation


Topics

Human rights | Legal and regulatory | Cybersecurity


Agreed with

– Marlene Owizniak

Agreed on

Crisis periods lead to over-reliance on automation with harmful consequences


Disagreed with

– David Sullivan
– Dhanaraj Thakur

Disagreed on

Role of automation versus human moderation in content decisions


Confidence thresholds are lowered during crises, leading to mass censorship of legitimate content

Explanation

META lowered hate speech classifier confidence thresholds from around 85% to 25% specifically for Arabic content from Palestine after October 7th. This dramatic reduction meant that classifiers with very low confidence could automatically remove or hide content, leading to widespread censorship of legitimate expression.


Evidence

META lowered confidence thresholds for hate speech classifiers from ~85% to 25% for Arabic language content from Palestine; resulted in people banned from commenting for days, Palestinian flags and watermelon emojis removed


Major discussion point

Crisis Response and Over-Moderation


Topics

Human rights | Legal and regulatory | Sociocultural


Agreed with

– Marlene Owizniak

Agreed on

Crisis periods lead to over-reliance on automation with harmful consequences


Aggressive counter-terrorism content moderation wrongly removes content 77% of the time in Arabic

Explanation

Automated systems designed to detect and remove terrorist content in Arabic language have an extremely high error rate, incorrectly flagging legitimate content as terrorism-related in more than three-quarters of cases. This massive failure rate particularly impacts regions already subject to aggressive counter-terrorism measures.


Evidence

77% error rate specifically mentioned for Arabic content removal related to terrorism


Major discussion point

Crisis Response and Over-Moderation


Topics

Human rights | Cybersecurity | Legal and regulatory


Critical infrastructure like Al-Aqsa Mosque has been mislabeled as terrorist organization during sensitive periods

Explanation

Instagram falsely flagged Al-Aqsa Mosque, the third holiest site in Islam, as a terrorist organization in 2021, causing all related hashtags and content to be banned. This error occurred precisely when the Israeli army stormed the mosque and people were trying to document and report on the events.


Evidence

Al-Aqsa Mosque flagged as terrorist organization by Instagram in 2021 when Israeli army stormed the mosque, resulting in all hashtags and related content being banned


Major discussion point

Crisis Response and Over-Moderation


Topics

Human rights | Sociocultural | Legal and regulatory


Mandatory human rights impact assessments should be required throughout the AI development lifecycle

Explanation

Current voluntary approaches are insufficient to address the scale of human rights harms from AI systems. Comprehensive, mandatory assessments should be conducted from initial development through deployment and any subsequent modifications to ensure human rights considerations are embedded throughout the process.


Major discussion point

Community Engagement and Governance Solutions


Topics

Human rights | Legal and regulatory | Development


Agreed with

– Marlene Owizniak
– Dhanaraj Thakur

Agreed on

Community engagement and participation is critical for effective AI systems


Disagreed with

– David Sullivan
– Marlene Owizniak

Disagreed on

Optimism about LLM potential versus focus on current harms


Audience

Speech speed: 253 words per minute
Speech length: 390 words
Speech time: 92 seconds

LLMs will always make mistakes and require human involvement in content moderation

Explanation

Given the inherent limitations of LLM technology, there will always be errors in automated content moderation systems. The question becomes whether there are areas where LLMs can perform adequately, or if alternative approaches like user empowerment and choice should be prioritized over automated moderation entirely.


Major discussion point

Technical Limitations and Future Considerations


Topics

Legal and regulatory | Human rights | Economic


Agreed with

– David Sullivan
– Dhanaraj Thakur

Agreed on

Human involvement remains essential in content moderation systems


Complex legal decisions require extensive qualification, raising questions about AI’s capability for nuanced judgments

Explanation

Legal professionals undergo extensive training and qualification to make content-related decisions that judges would typically handle in court systems. This raises fundamental questions about whether AI systems, regardless of their sophistication, can adequately handle the nuanced legal and ethical judgments required for content moderation.


Evidence

Reference to judges requiring extensive qualification to adjudicate content and lawyers finding these very complex decisions


Major discussion point

Technical Limitations and Future Considerations


Topics

Legal and regulatory | Human rights | Sociocultural


Agreements

Agreement points

Human involvement remains essential in content moderation systems

Speakers

– David Sullivan
– Dhanaraj Thakur
– Audience

Arguments

LLMs can enhance risk assessments, improve policy development consultation, and augment human review rather than replace it


Flexibility needed with heavier emphasis on human moderation for low-resource languages


LLMs will always make mistakes and require human involvement in content moderation


Summary

All speakers agree that despite technological advances, human oversight and involvement in content moderation remains crucial. LLMs should augment rather than replace human judgment, particularly for low-resource languages and complex decisions.


Topics

Human rights | Legal and regulatory | Sociocultural


Community engagement and participation is critical for effective AI systems

Speakers

– Marlene Owizniak
– Dhanaraj Thakur
– Panelist 1

Arguments

Greater community leadership and participation needed in building LLMs and datasets


Social media companies lack awareness of local LLM developers and researchers


Mandatory human rights impact assessments should be required throughout the AI development lifecycle


Summary

There is strong consensus that meaningful community involvement from the beginning of AI development is essential, including local researchers, affected communities, and comprehensive stakeholder engagement throughout the AI lifecycle.


Topics

Human rights | Development | Sociocultural


Language inequities create systematic discrimination in content moderation

Speakers

– Dhanaraj Thakur
– Panelist 1

Arguments

Wide disparity exists between high-resource languages like English and low-resource languages in available training data


Users experience longer moderation times, unjust content removal, and shadow banning in low-resource languages


Arabic content is over-moderated while Hebrew content is under-moderated due to classifier availability disparities


Summary

Both speakers agree that significant language disparities in AI training data and system development lead to discriminatory outcomes, with non-English and particularly Arabic content facing systematic bias and poor moderation quality.


Topics

Human rights | Sociocultural | Legal and regulatory


Crisis periods lead to over-reliance on automation with harmful consequences

Speakers

– Marlene Owizniak
– Panelist 1

Arguments

Marginalized groups face both over-enforcement and under-enforcement of content moderation


Companies over-rely on automation during crises, sacrificing accuracy for speed of content removal


Confidence thresholds are lowered during crises, leading to mass censorship of legitimate content


Summary

Both speakers identify that during crisis situations, platforms increase automated moderation at the expense of accuracy, disproportionately harming marginalized communities who need platforms most during critical moments.


Topics

Human rights | Legal and regulatory | Cybersecurity


Similar viewpoints

Both speakers highlight how AI systems create disproportionate surveillance and enforcement risks for minority and marginalized communities, with particularly severe impacts on Arabic-speaking populations.

Speakers

– Marlene Owizniak
– Panelist 1

Arguments

LLMs can infer sensitive attributes more than traditional ML systems, putting minorities at risk of targeting and surveillance


Aggressive counter-terrorism content moderation wrongly removes content 77% of the time in Arabic


Topics

Human rights | Cybersecurity | Legal and regulatory


Both speakers see potential in alternative, more collaborative approaches to content moderation that move beyond centralized corporate control toward community-driven and open-source solutions.

Speakers

– David Sullivan
– Dhanaraj Thakur

Arguments

Open source initiatives like ROOST provide opportunities for collaborative safety tooling


Different moderation models exist beyond centralized approaches, including community-based solutions


Topics

Development | Economic | Sociocultural


Both speakers recognize that AI systems inherently struggle with content that falls outside normal patterns, whether it’s protest content or novel challenges, due to their statistical nature.

Speakers

– Marlene Owizniak
– David Sullivan

Arguments

Protest and contrarian content falls outside statistical bell curves, making it vulnerable to errors


Models struggle with novel challenges not adequately represented in training data


Topics

Human rights | Legal and regulatory | Cybersecurity


Unexpected consensus

Industry-civil society collaboration potential

Speakers

– David Sullivan
– Dhanaraj Thakur
– Marlene Owizniak

Arguments

Open source initiatives like ROOST provide opportunities for collaborative safety tooling


Greater community leadership and participation needed in building LLMs and datasets


ECNL has been working with Discord on piloting what we call a framework for meaningful engagement


Explanation

Despite the critical tone toward tech companies throughout the discussion, there was unexpected consensus that meaningful collaboration between industry and civil society is both possible and necessary, with concrete examples of successful partnerships already emerging.


Topics

Development | Legal and regulatory | Economic


Technical limitations acknowledgment across all stakeholders

Speakers

– David Sullivan
– Dhanaraj Thakur
– Panelist 1
– Audience

Arguments

Models struggle with novel challenges not adequately represented in training data


Code switching, agglutinative language structures, and diglossia create additional challenges for LLM analysis


Translation errors have led to false terrorism accusations and wrongful arrests


Complex legal decisions require extensive qualification, raising questions about AI’s capability for nuanced judgments


Explanation

Surprisingly, even the industry representative openly acknowledged significant technical limitations of LLMs, creating consensus across all stakeholders about the fundamental constraints of current AI technology for content moderation.


Topics

Legal and regulatory | Human rights | Sociocultural


Overall assessment

Summary

The discussion revealed strong consensus on key issues: the necessity of human involvement in content moderation, the critical importance of community engagement, the systematic discrimination created by language inequities, and the harmful over-reliance on automation during crises. There was also unexpected agreement on the potential for industry-civil society collaboration and honest acknowledgment of technical limitations.


Consensus level

High level of consensus on fundamental principles and problems, with implications suggesting that despite different perspectives, there is a shared foundation for developing more equitable and effective approaches to AI-driven content moderation. This consensus provides a strong basis for collaborative solutions that prioritize human rights, community involvement, and technical humility.


Differences

Different viewpoints

Role of automation versus human moderation in content decisions

Speakers

– David Sullivan
– Dhanaraj Thakur
– Panelist 1

Arguments

LLMs can enhance risk assessments, improve policy development consultation, and augment human review rather than replace it


Flexibility needed with heavier emphasis on human moderation for low-resource languages


Companies over-rely on automation during crises, sacrificing accuracy for speed of content removal


Summary

David Sullivan emphasizes LLMs augmenting rather than replacing human review and sees potential for AI to improve content moderation processes. Dhanaraj Thakur advocates for heavier human moderation especially for low-resource languages. Panelist 1 criticizes the over-reliance on automation during crises, arguing companies prioritize speed over accuracy.


Topics

Human rights | Legal and regulatory | Sociocultural


Optimism about LLM potential versus focus on current harms

Speakers

– David Sullivan
– Marlene Owizniak
– Panelist 1

Arguments

Content moderation represents a concrete business application for AI with specific technical opportunities


LLMs can reinforce existing systemic discrimination, censorship, and surveillance


Mandatory human rights impact assessments should be required throughout the AI development lifecycle


Summary

David Sullivan expresses cautious optimism about LLMs as concrete business applications with genuine opportunities. Marlene Owizniak and Panelist 1 focus more heavily on documenting current harms and the need for stronger regulatory oversight, with less emphasis on potential benefits.


Topics

Human rights | Legal and regulatory | Economic


Unexpected differences

Degree of technical optimism about LLM capabilities

Speakers

– David Sullivan
– Marlene Owizniak

Arguments

Generative AI can improve explainability of content moderation decisions and provide better context to users


LLMs can infer sensitive attributes more than traditional ML systems, putting minorities at risk of targeting and surveillance


Explanation

Despite both being from organizations that work closely with tech companies, David maintains more optimism about LLM potential for improving user experience and transparency, while Marlene emphasizes how the same capabilities create surveillance risks. This disagreement is unexpected given their similar institutional positions and shared concern for human rights.


Topics

Human rights | Cybersecurity | Legal and regulatory


Overall assessment

Summary

The main areas of disagreement center on the appropriate balance between automation and human oversight, the level of optimism about LLM potential versus focus on current harms, and implementation approaches for community engagement. While all speakers acknowledge both benefits and risks of LLMs, they differ significantly in emphasis and proposed solutions.


Disagreement level

Moderate disagreement with significant implications. The speakers share fundamental concerns about human rights impacts but differ on whether to focus on improving current systems or implementing stronger regulatory oversight. These differences could lead to divergent policy recommendations and advocacy strategies, potentially affecting how the technology develops and is regulated.


Partial agreements

Partial agreements

Similar viewpoints

Both speakers highlight how AI systems create disproportionate surveillance and enforcement risks for minority and marginalized communities, with particularly severe impacts on Arabic-speaking populations.

Speakers

– Marlene Owizniak
– Panelist 1

Arguments

LLMs can infer sensitive attributes more than traditional ML systems, putting minorities at risk of targeting and surveillance


Aggressive counter-terrorism content moderation wrongly removes content 77% of the time in Arabic


Topics

Human rights | Cybersecurity | Legal and regulatory


Both speakers see potential in alternative, more collaborative approaches to content moderation that move beyond centralized corporate control toward community-driven and open-source solutions.

Speakers

– David Sullivan
– Dhanaraj Thakur

Arguments

Open source initiatives like ROOST provide opportunities for collaborative safety tooling


Different moderation models exist beyond centralized approaches, including community-based solutions


Topics

Development | Economic | Sociocultural


Both speakers recognize that AI systems inherently struggle with content that falls outside normal patterns, whether it’s protest content or novel challenges, due to their statistical nature.

Speakers

– Marlene Owizniak
– David Sullivan

Arguments

Protest and contrarian content falls outside statistical bell curves, making it vulnerable to errors


Models struggle with novel challenges not adequately represented in training data


Topics

Human rights | Legal and regulatory | Cybersecurity


Takeaways

Key takeaways

LLMs in content moderation pose significant human rights risks including reinforcing discrimination, censorship, and surveillance while concentrating power among a few foundation model companies


Language inequities are severe – low-resource languages face longer moderation times, higher error rates, and systematic bias compared to high-resource languages like English


During crises, platforms over-rely on automation and lower confidence thresholds, leading to mass censorship of legitimate content while sacrificing accuracy for speed


LLMs perform better than traditional ML at understanding context and can improve user explanations, but they struggle with novel content and hallucinate confidently incorrect information


Marginalized communities face both over-enforcement (false positives) and under-enforcement (false negatives) simultaneously


Protest and contrarian content is inherently vulnerable to AI moderation errors because it falls outside statistical norms by definition


Community-driven, culturally-informed AI models show promise as alternatives to centralized foundation models


Resolutions and action items

Implement mandatory human rights impact assessments throughout the entire AI development lifecycle from design to deployment


Establish frameworks for meaningful engagement that involve affected communities from the AI design stage through deployment


Increase investment in partnerships with local LLM developers and researchers, particularly for low-resource languages


Develop open source safety tooling initiatives like ROOST to enable collaborative approaches


Involve affected communities in reinforcement learning and human feedback processes rather than relying solely on Silicon Valley experts


Require platforms to provide greater transparency about how and when LLMs are used in content moderation


Document human rights harms systematically to counter AI hype with evidence-based analysis


Unresolved issues

How to balance precision versus recall metrics in content moderation without causing systematic harm to marginalized groups


Whether LLMs are fundamentally the right technology for content moderation or if alternative user empowerment approaches should be prioritized


How to scale human involvement in content moderation while maintaining timely responses during crises


How to address the economic sustainability of LLMs given their high operational costs


How to prevent the replication of colonial language hierarchies and power dynamics in multilingual AI systems


How to ensure adequate representation of indigenous and minority languages in AI development


How to create effective oversight mechanisms for proprietary AI systems used by social media companies


Suggested compromises

Implement blended human-AI approaches that augment rather than replace human moderators, with flexibility to emphasize human moderation more heavily for low-resource languages


Use LLMs to enhance human reviewer capabilities by providing better context and routing obviously violative content away from humans rather than fully automating decisions


Develop community-specific moderation models that can be tailored to different contexts (like subreddit moderators) rather than relying solely on centralized approaches


Create tiered systems where confidence thresholds and automation levels can be adjusted based on language resources and cultural context


Establish partnerships between large tech companies and local researchers/communities to combine resources with cultural expertise


Thought provoking comments

Any kind of decision made at the foundation level, let’s say, defining Palestinian content as terrorist content will then also trickle down to the deployer level unless it’s explicitly fine-tuned. What this means for freedom of expression globally is that content moderation defined at the foundation level will also be replicated on the deployment one and really there’s even more homogeneity of speech as before.

Speaker

Marlene Owizniak


Reason

This comment crystallizes one of the most critical structural issues with LLM-based content moderation – the concentration of power and how biases cascade through the entire ecosystem. It moves beyond technical discussions to highlight the systemic implications for global freedom of expression.


Impact

This framing established the power dynamics theme that ran throughout the discussion, setting up the foundation for later speakers to provide concrete examples of how this plays out in practice, particularly Marwa’s examples from the MENA region.


There were no Hebrew classifiers to moderate hate speech in Hebrew language, but they were such for Arabic. One would ask a question here is that why, despite the context was very clear, there were high incitement… the company did not think that it was a priority at the time to roll out classifiers that would be able to automatically detect and remove such harmful and potentially violative content.

Speaker

Marwa Fatafta


Reason

This comment exposes the political dimensions of seemingly technical decisions about language support in AI systems. It reveals how resource allocation decisions by tech companies can systematically disadvantage certain communities while protecting others, even in contexts of clear harm.


Impact

This shifted the discussion from abstract concerns about bias to concrete examples of how technical decisions have real-world consequences for vulnerable populations. It demonstrated how the ‘concentration of power’ issue Marlene introduced manifests in practice.


There’s also the issue of diglossia… Often in many of these languages, particularly in those that have gone through, like, the colonial experience, there’s a combined use of two languages… one would represent power status and issues of importance… whereas [the other] would be used… a more mundane function… to what extent will the development of models around this… replicate or exacerbate this kind of power dynamics between these two languages.

Speaker

Dhanaraj Thakur


Reason

This comment introduces sophisticated linguistic and postcolonial analysis to the technical discussion, showing how LLMs might not just fail to understand languages but actively perpetuate colonial power structures embedded in language use patterns.


Impact

This deepened the conversation by connecting historical colonialism to contemporary AI systems, adding a crucial dimension that moved the discussion beyond technical performance metrics to questions of historical justice and power reproduction.


From my observation… companies tend to over rely on automation around times of crises. And particularly when there are attacks… they feel like under pressure that they need to remove as fast as possible large amounts of content… companies are willing to sacrifice… accuracy in the decisions, as long as we try to catch as large amounts of content as possible.

Speaker

Marwa Fatafta


Reason

This insight reveals how crisis situations create perverse incentives that amplify the worst aspects of automated content moderation, showing how the precision vs. recall trade-off becomes weaponized against marginalized communities during their most vulnerable moments.


Impact

This comment connected the technical discussion of precision vs. recall that David had introduced to real-world crisis scenarios, showing how technical trade-offs become political choices with severe consequences for human rights during critical moments.


AI is neither artificial nor intelligent. It uses a lot of infrastructure, a lot of hardware, and it’s mostly guesstimates… LLMs… is basically statistics on steroids. It’s not divine intelligence. It’s just a lot of data with a lot of computing power, which is also one of the reasons why it’s so concentrated.

Speaker

Marlene Owizniak


Reason

This demystifying comment cuts through AI hype to reveal the material and statistical reality of these systems, directly connecting their technical limitations to their concentrated ownership structure.


Impact

This reframing helped ground the discussion in material reality rather than technological mysticism, providing a foundation for more realistic policy discussions and connecting technical limitations to economic concentration.


Even the best intentioned platforms will make errors just because this is content that falls outside of data sets and the bell curve. It is, by definition, exceptional or contrarian… our organization… we work a lot on protests, civic space, assembly and association. These are often actions and content that are, by default, contrarian, minority, anti-power, protest.

Speaker

Marlene Owizniak


Reason

This comment reveals a fundamental incompatibility between the statistical nature of AI systems and the protection of dissent and protest rights, showing how the technology is structurally biased against the very content that democratic societies most need to protect.


Impact

This insight shifted the conversation from fixable bias problems to fundamental structural incompatibilities, suggesting that some human rights issues with LLMs may be inherent rather than solvable through better training or fine-tuning.


Overall assessment

These key comments transformed what could have been a technical discussion about AI performance into a sophisticated analysis of power, colonialism, and structural inequality. The speakers successfully connected abstract technical concepts to concrete human rights harms, while revealing how seemingly neutral technical decisions embed political choices. The discussion evolved from identifying problems to understanding their systemic nature – moving from ‘LLMs make mistakes’ to ‘LLMs systematically reproduce and amplify existing power structures.’ The comments also demonstrated how crisis situations exploit these structural vulnerabilities, making the stakes clear and urgent. Most importantly, the speakers avoided both uncritical AI hype and complete technological pessimism, instead providing a nuanced analysis that grounds policy recommendations in material reality while maintaining focus on community-driven alternatives.


Follow-up questions

How can social media companies better connect with local LLM developers and researchers in different regions?

Speaker

Dhanaraj Thakur


Explanation

There’s a disconnect between social media companies and local communities developing LLMs, which could benefit content moderation systems but companies are often unaware of these efforts


What are the specific technical details of how LLMs can support human content raters?

Speaker

David Sullivan


Explanation

Sullivan referenced a Google research paper from 2024 about leveraging LLMs to support human raters but noted the technical details were beyond his expertise


How can scaling and timely reactions be addressed when humans are part of the content moderation chain?

Speaker

Audience member from IETF


Explanation

This addresses the fundamental challenge of balancing human oversight with the need for rapid, large-scale content moderation


How can complex legal decisions about content be effectively translated into LLM training and validation processes?

Speaker

Professor Julia Hornley


Explanation

Legal content decisions require extensive qualification and expertise, raising questions about how this complexity can be captured in AI systems


What alternative business models could support community-based LLM development for content moderation?

Speaker

Dhanaraj Thakur


Explanation

Community-driven models require different economic structures than current centralized approaches


Will LLMs actually be widely deployed given their high operational costs?

Speaker

Marlene Owizniak


Explanation

The economic viability of LLMs for content moderation remains uncertain due to expensive computational requirements


How can platforms be urged to share more data about when and how LLMs are used in content moderation?

Speaker

Marlene Owizniak


Explanation

Lack of transparency makes it difficult to assess and improve LLM-based content moderation systems


How can reinforcement learning phases better involve affected communities rather than just Silicon Valley experts?

Speaker

Marlene Owizniak


Explanation

Current human feedback processes are homogenous and may perpetuate bias rather than improve LLM performance


What are the most effective ways for governments and civil society to influence technical LLM design beyond legal interventions?

Speaker

Balthazar from University College London


Explanation

Understanding pathways for external stakeholders to impact proprietary AI systems used by social media companies


How can mandatory human rights impact assessments be implemented throughout the AI development lifecycle?

Speaker

Marwa Fatafta


Explanation

Current voluntary assessments are insufficient given the scale of human rights impacts from LLM-based content moderation


Disclaimer: This is not an official session record. DiploAI generates these resources from audiovisual recordings, and they are presented as-is, including potential errors. Due to logistical challenges, such as discrepancies in audio/video or transcripts, names may be misspelled. We strive for accuracy to the best of our ability.