Researchers believe AI transparency is within reach by 2027

Experts say decoding AI could be key to safe adoption in critical areas.


Leading AI researchers admit they still do not fully understand how generative AI models work. Unlike traditional software, which follows logic written explicitly by programmers, generative AI models learn their behaviour from training data, making it difficult for developers to interpret how they reach their decisions.

Dario Amodei, co-founder of Anthropic, described this lack of understanding as unprecedented in tech history. Mechanistic interpretability — a growing academic field — aims to reverse engineer how gen AI models arrive at outputs.

Experts compare the challenge to understanding the human brain, but note that, unlike biology, every digital ‘neuron’ in AI is visible.
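In practical terms, 'visible' means that a model's internal activations can be read out directly in code. The sketch below is a minimal illustration, assuming PyTorch and a hand-built toy network rather than any lab's real model or tooling: a forward hook records every hidden unit's value produced during a single pass.

```python
# Minimal sketch: reading out every hidden "neuron" in a toy network.
# Assumes PyTorch; the two-layer model and hook names are illustrative only.
import torch
import torch.nn as nn

# Toy network standing in for a much larger generative model.
model = nn.Sequential(
    nn.Linear(8, 16),   # input projection
    nn.ReLU(),          # hidden activations we want to inspect
    nn.Linear(16, 4),   # output head
)

captured = {}

def save_activation(name):
    # Forward hook: stores a layer's output every time the model runs.
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Attach the hook so each hidden unit's value is recorded during a pass.
model[1].register_forward_hook(save_activation("hidden_relu"))

x = torch.randn(1, 8)   # stand-in for an encoded input
_ = model(x)            # one forward pass

# Unlike a biological neuron, every value here is directly observable.
print(captured["hidden_relu"].shape)   # torch.Size([1, 16])
print(captured["hidden_relu"])
```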

Companies like Goodfire are developing tools to map AI reasoning steps and correct errors, helping prevent harmful use or deception. Boston University professor Mark Crovella says interest is surging due to the practical and intellectual appeal of interpreting AI’s inner logic.
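Mapping and correcting internal behaviour typically involves intervening on those activations. The following sketch, again assuming PyTorch and the same kind of toy network (not Goodfire's actual tools), zeroes one hidden unit during a forward pass and compares the result with the unmodified output.

```python
# Minimal sketch of an activation intervention; the model, the chosen unit,
# and the idea that unit 3 encodes an unwanted behaviour are all illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(1, 8)

baseline = model(x)   # output with no intervention

def ablate_unit(module, inputs, output):
    # Zero out one hidden unit, a crude stand-in for switching off a
    # feature linked to an error or unwanted behaviour.
    output = output.clone()
    output[:, 3] = 0.0
    return output      # returned value replaces the layer's output

handle = model[1].register_forward_hook(ablate_unit)
intervened = model(x)  # output with the unit switched off
handle.remove()

# Comparing the two outputs shows how a targeted internal edit shifts behaviour.
print(baseline)
print(intervened)
```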

Researchers believe the ability to reliably detect biases or intentions within AI models could be achieved within a few years.

This transparency could open the door to AI applications in critical fields like security, and give firms a major competitive edge. Understanding how these systems work is increasingly seen as vital for global tech leadership and public safety.
