Large language models mimic human object perception

Using text and image data, language models mimic human object understanding, forming internal representations aligned with brain activity.

Recent research shows that multimodal large language models (LLMs) can develop object representations strikingly similar to human cognition. By analysing how these AI models understand and organise concepts, scientists found patterns in the models that mirror neural activity in the human brain.

The study examined embeddings for 1,854 natural objects, derived from millions of text-image pairings. These embeddings capture relationships between objects and were compared with brain scan data from regions such as the extrastriate body area (EBA), parahippocampal place area (PPA), retrosplenial cortex (RSC) and fusiform face area (FFA).
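One common way to make this kind of model-to-brain comparison is representational similarity analysis (RSA), where the pairwise similarity structure of model embeddings is correlated with the similarity structure of neural responses to the same objects. The sketch below illustrates the idea in Python; the array names, shapes and random placeholder data are assumptions for illustration, not the study's actual pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

# Hypothetical model embeddings: one vector per object (1,854 objects x 512 dims).
model_embeddings = np.random.rand(1854, 512)

# Hypothetical neural dissimilarity matrix for the same objects, e.g. derived
# from fMRI response patterns in a region of interest such as FFA.
neural_rdm = squareform(pdist(np.random.rand(1854, 100), metric="correlation"))

# Build the model's representational dissimilarity matrix (RDM) from
# pairwise correlation distances between object embeddings.
model_rdm = squareform(pdist(model_embeddings, metric="correlation"))

# Compare the upper triangles of the two RDMs with Spearman correlation:
# a higher rho means the model organises objects more like the brain region does.
triu = np.triu_indices(1854, k=1)
rho, p_value = spearmanr(model_rdm[triu], neural_rdm[triu])
print(f"Model-brain RDM similarity (Spearman rho): {rho:.3f}")
```

On real data, the random placeholders would be replaced by the model's object embeddings and by dissimilarity matrices computed from recorded brain activity for the same object set.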

Researchers also discovered that multimodal training, which combines text and image data, enhances a model's ability to form these human-like concepts. The findings suggest that large language models can achieve a more natural understanding of the world, offering potential improvements in human-AI interaction and future model design.
