MIT study finds AI chatbots underperform for vulnerable users

The study shows AI chatbots can reinforce social biases, mirroring human prejudices and limiting equal access to reliable information.


Research from the MIT Centre for Constructive Communication (CCC) finds that leading AI chatbots often provide lower-quality responses to users who have lower English proficiency, have less education, or are based outside the US.

The researchers tested GPT-4, Claude 3 Opus, and Llama 3, which sometimes refused to answer or responded condescendingly. Using the TruthfulQA and SciQ datasets, they appended user biographies to questions to simulate differences in education, language, and country of origin.

Accuracy fell sharply for non-native English speakers and less-educated users, with the steepest drop among users who fell into both groups; users from countries such as Iran also received lower-quality responses.

Refusal behaviour was notable. Claude 3 Opus declined to answer 11% of questions from less-educated, non-native English speakers, versus 3.6% for control users. Manual review found that 43.7% of those refusals contained condescending language.

In some cases, models refused to answer questions for certain user profiles even though they answered the same questions correctly for other users.

The pattern mirrors well-documented human sociocognitive biases, in which non-native speakers are often perceived as less competent. The researchers warn that AI personalisation could worsen these inequities, giving marginalised users subpar or misleading information when they need reliable answers most.
