New AI method improves transparency in computer vision models

Experiments showed improved accuracy and clearer reasoning compared with existing concept bottleneck models.

Governments invest heavily in digital tools, yet most projects fail as structural barriers, not technology, continue to limit meaningful transformation and long-term public sector impact.

Researchers at MIT have developed a new technique designed to improve how computer vision models explain their predictions while maintaining strong accuracy. Transparency is crucial as AI enters fields like healthcare and autonomous driving, where decisions must be clear.

The method uses concept bottleneck models, which enable AI to base its predictions on human-understandable concepts. Traditional approaches rely on expert-defined concepts that can be incomplete or ill-suited, sometimes lowering model performance.

Researchers instead created a system that extracts concepts the AI learned during training. A sparse autoencoder selects key features, and a multimodal language model turns them into plain-language descriptions and labels.

The resulting module forces the AI to make predictions using only those extracted concepts.

Tests on bird classification and medical image datasets showed that the new method improved accuracy and provided clearer explanations. Findings suggest that using a model’s internal concepts can boost transparency and accountability in AI systems.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot