MIT develops method to detect overconfident AI

A new method compares a model’s output with those of similar LLMs to identify unreliable or hallucinated predictions.

Researchers at MIT have introduced a new method to assess the reliability of large language models more accurately. Many LLMs can produce confident yet incorrect responses, posing risks in high-stakes applications such as healthcare or finance.

The team combined self-consistency checks with an ensemble approach, comparing a model’s outputs with those of similar LLMs. The resulting total uncertainty (TU) metric more accurately identifies overconfident predictions and can flag hallucinations that simpler methods may miss.
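The article does not describe the researchers' actual formulation, but a minimal sketch of how self-consistency and ensemble disagreement could be blended into a single score is shown below. Everything here is an illustrative assumption: the entropy term, the disagreement term, the `weight` parameter, and the `total_uncertainty` helper are not taken from the MIT work.

```python
from collections import Counter
from math import log

def entropy(answers):
    """Shannon entropy over the empirical distribution of sampled answers.
    Higher entropy means the model is less self-consistent."""
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())

def total_uncertainty(own_samples, ensemble_answers, weight=0.5):
    """Illustrative blend (not the published TU metric): combine the model's
    self-inconsistency with how often similar LLMs contradict its majority answer."""
    self_inconsistency = entropy(own_samples)
    majority = Counter(own_samples).most_common(1)[0][0]
    disagreement = sum(a != majority for a in ensemble_answers) / max(len(ensemble_answers), 1)
    return weight * self_inconsistency + (1 - weight) * disagreement

# Example: one model resampled five times, plus answers from three other LLMs.
own = ["Paris", "Paris", "Paris", "Lyon", "Paris"]
others = ["Paris", "Paris", "Marseille"]
print(total_uncertainty(own, others))  # higher values suggest a less reliable answer
```

In this sketch, a confident but wrong answer would score low on self-inconsistency yet high on ensemble disagreement, which is the kind of overconfident prediction the combined metric is meant to surface.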

Experiments on ten common tasks, including question answering, translation, summarisation, and mathematical reasoning, showed that TU outperformed individual uncertainty measures.

The ensemble approach relies on models from different developers to ensure diversity and credibility, offering a practical and energy-efficient way to gauge AI confidence.

Researchers suggest TU could also help reinforce correct answers during training, improving overall model performance. Future developments aim to enhance the metric’s accuracy for open-ended tasks and explore additional forms of uncertainty.
