Research unveils AI overreliance on memorisation

MIT CSAIL research reveals that large language models (LLMs) excel in familiar tasks but struggle in novel scenarios, highlighting their reliance on memorisation rather than true reasoning abilities.


Recent research from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has probed what large language models (LLMs) can actually do. The study found that while LLMs excel in familiar scenarios, they struggle with novel tasks, raising questions about whether their performance reflects genuine reasoning or memorisation of their training data.

The researchers compared LLMs’ performance on common task formulations against counterfactual variants unlikely to appear in their training data. For instance, models such as GPT-4 performed well at base-10 arithmetic but faltered in other number bases, suggesting they had not learnt a generalisable addition procedure. The pattern held across a range of tasks, including spatial reasoning and chess, where models performed no better than random guessing in unfamiliar settings.
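To make the base-swap concrete: the carrying procedure taught for base-10 addition works unchanged in any base, so a model that had genuinely learnt to add should cope when the base changes. The Python sketch below is illustrative only, not code from the study; the function name and digit conventions are our own.

```python
def add_in_base(a: str, b: str, base: int) -> str:
    """Add two non-negative integers written as digit strings in `base`."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    a, b = a[::-1], b[::-1]              # work from the least significant digit
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        da = digits.index(a[i]) if i < len(a) else 0
        db = digits.index(b[i]) if i < len(b) else 0
        carry, d = divmod(da + db + carry, base)   # overflow point depends on base
        out.append(digits[d])
    if carry:
        out.append(digits[carry])
    return "".join(reversed(out))

print(add_in_base("27", "35", 10))  # "62": ordinary base-10 addition
print(add_in_base("27", "35", 9))   # "63": same digits read in base 9 (25 + 32 = 57 decimal)
```

In base 9 the same digit strings denote different quantities, yet the column-by-column procedure is identical; only the overflow threshold moves from 10 to 9. It is this shared procedure, rather than the familiar base-10 surface form, that the models failed to apply.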

Lead author Zhaofeng Wu emphasised the importance of these findings, noting that as AI becomes more integrated into society, it must handle diverse scenarios reliably. The findings are intended to inform the development of more adaptable and robust future LLMs. The team plans to expand the research to more complex and varied tasks, probing AI’s limitations further and improving interpretability.

Supported by the MIT–IBM Watson AI Lab, the MIT Quest for Intelligence, and the National Science Foundation, the study was presented at the 2024 conference of the North American Chapter of the Association for Computational Linguistics (NAACL).