New AI method improves transparency in computer vision models

Experiments showed improved accuracy and clearer reasoning compared with existing concept bottleneck models.

A new MIT technique helps computer vision models explain their predictions by extracting concepts directly from their internal knowledge.

Researchers at MIT have developed a new technique designed to improve how computer vision models explain their predictions while maintaining strong accuracy. Transparency matters as AI enters fields like healthcare and autonomous driving, where people need to understand why a model reached a decision.

The method uses concept bottleneck models, which enable AI to base its predictions on human-understandable concepts. Traditional approaches rely on expert-defined concepts that can be incomplete or ill-suited, sometimes lowering model performance.
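The core idea of a concept bottleneck can be illustrated with a toy sketch. This is not the MIT system itself; the feature sizes, weights, and concept names below are hypothetical, chosen only to show the two-stage structure: the label is predicted from concept scores alone, never directly from the raw input.

```python
import numpy as np

# Toy concept bottleneck (hypothetical weights and dimensions):
# Stage 1 maps raw features to concept activations; Stage 2 classifies
# using ONLY the concept vector -- that restriction is the "bottleneck".
rng = np.random.default_rng(0)

def concept_scores(x, W_c):
    # Sigmoid per concept, giving a human-readable activation in [0, 1]
    # (e.g. "wing shape", "beak length" in a bird classifier).
    return 1.0 / (1.0 + np.exp(-(x @ W_c)))

def predict_label(c, W_y):
    # The classifier sees the concept scores, not the raw input.
    logits = c @ W_y
    return int(np.argmax(logits))

x = rng.normal(size=4)            # raw image features (stand-in)
W_c = rng.normal(size=(4, 3))     # 3 concepts
W_y = rng.normal(size=(3, 2))     # 2 output classes

c = concept_scores(x, W_c)
label = predict_label(c, W_y)
```

Because every prediction passes through `c`, an incomplete or poorly chosen concept set caps what the classifier can express, which is the performance risk the article attributes to expert-defined concepts.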

The researchers instead built a system that extracts concepts the model itself learned during training. A sparse autoencoder selects key internal features, and a multimodal language model turns them into plain-language descriptions and labels.
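The sparse-autoencoder step can be sketched as follows. This is a minimal top-k illustration, not the paper's implementation: the activation sizes, encoder weights, and the choice of k are assumptions made for the example. It shows how a wide, mostly zero code picks out a handful of candidate concept units from an internal activation vector.

```python
import numpy as np

# Minimal top-k sparse encoding pass (hypothetical shapes and weights):
# project an internal activation into an overcomplete dictionary, keep
# only the few most active units, and treat those as candidate concepts.
rng = np.random.default_rng(1)

def sparse_encode(h, W_enc, b_enc, k=2):
    z = np.maximum(h @ W_enc + b_enc, 0.0)   # ReLU encoding
    keep = np.argsort(z)[-k:]                # indices of the top-k units
    sparse = np.zeros_like(z)
    sparse[keep] = z[keep]                   # zero out all other units
    return sparse, keep

h = rng.normal(size=8)              # a model's internal activation (stand-in)
W_enc = rng.normal(size=(8, 16))    # overcomplete: 16 candidate concept units
b_enc = np.zeros(16)

code, active = sparse_encode(h, W_enc, b_enc, k=2)
```

In the described pipeline, the surviving units would then be handed to a multimodal language model, which names each one in plain language so the bottleneck's concepts are readable to humans.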

The resulting module forces the AI to make predictions using only those extracted concepts.

Tests on bird classification and medical image datasets showed that the new method improved accuracy and provided clearer explanations. The findings suggest that grounding predictions in a model's own internal concepts can boost transparency and accountability in AI systems.