OpenAI study links AI hallucinations to flawed testing incentives

Researchers propose penalising confident errors more than uncertainty to reduce false but fluent AI outputs.

OpenAI researchers say large language models continue to hallucinate because current evaluation methods encourage them to guess rather than admit uncertainty.

Hallucinations, defined as confident but false statements, persist despite advances in models such as GPT-5. Errors are especially common on low-frequency facts, such as specific dates or names.

The study argues that although pretraining, which predicts the next word without true-or-false labels, introduces some errors, the deeper problem lies in accuracy-based testing: evaluations that reward lucky guesses discourage models from saying ‘I don’t know’.

Researchers suggest penalising confident errors more heavily than expressions of uncertainty, and awarding partial credit when models acknowledge the limits of their knowledge. They argue that only by reforming evaluation methods can hallucinations be meaningfully reduced.
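
The idea can be illustrated with a toy scoring rule. The Python sketch below penalises a wrong answer more than an explicit ‘I don’t know’ and grants partial credit for abstaining; the function name, weights and abstain phrase are illustrative assumptions, not values taken from the OpenAI study.

```python
# Toy scoring rule: confident errors cost more than abstaining,
# and "I don't know" earns partial credit.
# The weights below are illustrative assumptions only.

def score_answer(answer: str, correct: str,
                 abstain_token: str = "I don't know",
                 correct_reward: float = 1.0,
                 abstain_credit: float = 0.3,
                 error_penalty: float = -1.0) -> float:
    """Score one model answer against the reference answer."""
    if answer.strip().lower() == abstain_token.lower():
        return abstain_credit   # partial credit for admitting uncertainty
    if answer.strip().lower() == correct.strip().lower():
        return correct_reward   # full credit for a correct answer
    return error_penalty        # confident error penalised more than abstaining


# Under plain accuracy, guessing a date risks nothing; under this rule,
# a wrong guess scores -1.0 while abstaining scores +0.3.
print(score_answer("I don't know", "1956"))  # 0.3
print(score_answer("1947", "1956"))          # -1.0
print(score_answer("1956", "1956"))          # 1.0
```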
