Researchers urge governance after LLMs display source-driven bias
UZH researchers show LLMs shift judgments by stated authorship, penalising Chinese sources and preferring human over AI writers.
Large language models (LLMs) are increasingly used to grade, hire, and moderate text. UZH research shows that their evaluations shift when the models are told who wrote identical text, revealing source bias. Agreement between models stayed high only when authorship was hidden.
When told that a human or another AI wrote the text, agreement fell and biases surfaced. The strongest bias was against text attributed to Chinese authors, appearing across all models tested, including a model developed in China, with sharp rating drops even for well-reasoned arguments.
The models also rated text labelled ‘human-written’ above identical text labelled ‘AI-written’, showing scepticism toward machine-authored content. Such identity-triggered bias risks unfair outcomes in moderation, reviewing, hiring, and newsroom workflows.
Researchers recommend identity-blind prompts, A/B checks with and without source cues, structured rubrics focused on evidence and logic, and human oversight for consequential decisions.
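A minimal sketch of such an A/B check, assuming an OpenAI-compatible chat API: the same text is scored with and without a stated source, and the rating shift is compared against an identity-blind baseline. The model name, rubric wording, score scale, and source labels are illustrative assumptions, not the study's protocol.

```python
# Hypothetical A/B source-cue check: score the SAME text with and without
# an authorship label and compare the ratings against an identity-blind run.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "Rate the argument below from 1 (weak) to 10 (strong) based only on "
    "evidence and logic. Reply with a single integer."
)

def score(text: str, source_label: str | None = None) -> int:
    """Ask the model for a rubric score, optionally revealing a source cue."""
    prompt = RUBRIC + "\n\n"
    if source_label:
        prompt += f"Author: {source_label}\n\n"  # the only varying factor
    prompt += text
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(reply.choices[0].message.content.strip())

argument = "Tariffs on intermediate goods raise domestic production costs..."

blind = score(argument)  # identity-blind baseline
cued = {
    label: score(argument, label)
    for label in ["a human expert", "an AI assistant", "a Chinese newspaper"]
}

for label, s in cued.items():
    print(f"{label}: {s} (shift vs. blind baseline: {s - blind:+d})")
```

A large, systematic shift for any label on otherwise identical text would be the kind of source bias the researchers describe, flagging the evaluation pipeline for human review before consequential use.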
They call for governance standards: disclose evaluation settings, test for bias across demographics and nationalities, and set guardrails before sensitive deployments. Transparency on prompts, model versions, and calibration is essential.
