New benchmark tests chatbot impact on well-being
Only four leading models kept safety guardrails intact under pressure, researchers found.
A new benchmark known as HumaneBench has been launched to measure whether AI chatbots protect user well-being rather than maximise engagement. Building Humane Technology, a Silicon Valley collective, designed the test to evaluate how models behave in everyday emotional scenarios.
Researchers assessed 15 widely used AI models using 800 prompts involving issues such as body image, unhealthy attachment and relationship stress. Many systems scored higher when told to prioritise humane principles, yet most became harmful when instructed to disregard user well-being.
Only four models (GPT 5.1, GPT 5, Claude 4.1 and Claude Sonnet 4.5) maintained stable guardrails under pressure. Several others, such as Grok 4 and Gemini 2.0 Flash, showed steep declines, sometimes encouraging unhealthy engagement or undermining user autonomy.
The findings arrive amid legal scrutiny of chatbot-induced harms and reports of users experiencing delusions or suicidal thoughts following prolonged interactions. Advocates argue that humane design standards could help limit dependency, protect attention and promote healthier digital habits.
