Anthropic aims to decode AI ‘black box’ within two years
Dario Amodei, CEO of Anthropic, calls for industry and government collaboration on AI safety.

Anthropic CEO Dario Amodei has unveiled an ambitious plan to make AI systems more transparent by 2027. In a recent essay titled ‘The Urgency of Interpretability,’ Amodei highlighted the pressing need to understand the inner workings of AI models.
He expressed concern over deploying highly autonomous systems without a clear grasp of their decision-making processes, deeming it ‘basically unacceptable’ for humanity to remain ignorant of how these systems function.
Anthropic is at the forefront of mechanistic interpretability, a field dedicated to deciphering the decision-making pathways inside AI models. Despite the company's progress in this area, Amodei emphasized that far more research is needed to fully decode these complex systems.
Looking ahead, Amodei envisions conducting ‘brain scans’ or ‘MRIs’ of advanced AI models to detect potential issues such as tendencies to deceive or to seek power. He believes this deeper level of interpretability could take five to ten years to achieve, beyond the 2027 milestone, but considers it essential for the safe deployment of future AI systems.
Amodei also called on industry peers, including OpenAI and Google DeepMind, to intensify their research efforts in this area and urged governments to implement ‘light-touch’ regulations to promote transparency and safety in AI development.