MIT-IBM researchers improve large language models with PaTH Attention

Combining PaTH Attention with selective forgetting mechanisms further enhances AI models’ ability to prioritise relevant information over long sequences.

MIT and IBM scientists created PaTH Attention, a data-aware positional encoding method that improves long-text reasoning and state tracking in large language models.

Researchers at MIT and the MIT-IBM Watson AI Lab have introduced a new attention mechanism designed to enhance the capabilities of large language models (LLMs) in tracking state and reasoning across long texts.

Unlike traditional positional encoding methods, which depend only on how far apart tokens sit in a sequence, PaTH Attention adapts to the content of the words themselves, enabling models to follow complex sequences more effectively.

PaTH Attention models sequences through data-dependent transformations, allowing LLMs to track how meaning changes between words instead of relying solely on relative distance.
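The published method builds these transformations from simple Householder-style reflections whose parameters are derived from the tokens themselves and accumulated along the sequence. The Python sketch below is only a simplified, unoptimised illustration of that idea under those assumptions; the function names, toy dimensions, and exact product convention are ours, not the paper's.

```python
import numpy as np

def householder(w, beta):
    """H = I - beta * w w^T: a rank-one, reflection-like transformation.
    In a PaTH-style encoding, w and beta come from the token itself,
    so each position contributes its own data-dependent transformation."""
    d = w.shape[0]
    return np.eye(d) - beta * np.outer(w, w)

def path_scores(Q, K, W, betas):
    """Toy O(n^2 * d^2) causal attention scores, for illustration only.

    RoPE rotates queries and keys by an angle fixed by the distance i - j.
    Here, by contrast, key j is carried to query i through a product of
    per-token transformations accumulated between the two positions (the
    boundary and ordering convention is simplified here), so the score
    depends on WHAT the intervening tokens are, not just how far apart
    positions i and j sit.
    """
    n, d = Q.shape
    scores = np.full((n, n), -np.inf)          # -inf masks future positions
    for i in range(n):
        acc = np.eye(d)                        # accumulated transformation
        for j in range(i, -1, -1):
            scores[i, j] = Q[i] @ acc @ K[j]
            acc = acc @ householder(W[j], betas[j])
    return scores

# Toy usage with random tensors standing in for learned projections.
rng = np.random.default_rng(0)
n, d = 6, 4
Q, K = rng.standard_normal((2, n, d))
W = rng.standard_normal((n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit directions
betas = rng.uniform(0.0, 2.0, size=n)          # per-token scales
S = path_scores(Q, K, W, betas)                # lower-triangular score matrix
```

The quadratic loop above is purely for exposition; the efficiency claim in the research concerns reorganising this computation into a compact, blockwise form suited to GPUs.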

The approach improves performance on long-context reasoning, multi-step recall, and language modelling benchmarks, while remaining computationally efficient and amenable to fast execution on GPUs.

In tests, PaTH Attention achieved consistently lower perplexity and stronger content-aware behaviour than conventional positional encodings. The team also combined it with FoX, the Forgetting Transformer, whose forget gate down-weights less relevant information, further improving reasoning and long-sequence understanding.
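FoX augments softmax attention with a learned forget gate: roughly, each token emits a gate value between 0 and 1, and the accumulated log-gates between a key and a query are added to the attention logits, so context separated by many "forgetful" tokens is progressively down-weighted. The sketch below illustrates that mechanism under those assumptions; the function name and the toy inputs are ours.

```python
import numpy as np

def forget_gate_bias(gates):
    """Causal additive bias D[i, j] = sum over t = j+1..i of log(gates[t]).

    Adding D to the raw attention logits multiplies each attention weight
    by the product of the forget gates between key j and query i, so
    positions separated by many small gates receive less attention."""
    n = gates.shape[0]
    cum = np.concatenate([[0.0], np.cumsum(np.log(gates))])  # prefix sums of log-gates
    D = np.full((n, n), -np.inf)                             # -inf masks j > i (causal)
    for i in range(n):
        for j in range(i + 1):
            D[i, j] = cum[i + 1] - cum[j + 1]                # sum over t = j+1 .. i
    return D

# Usage: add the bias to ordinary scaled dot-product logits before softmax.
rng = np.random.default_rng(1)
n, d = 6, 4
Q, K = rng.standard_normal((2, n, d))
gates = rng.uniform(0.5, 1.0, size=n)                        # per-token forget gates in (0, 1)
logits = Q @ K.T / np.sqrt(d) + forget_gate_bias(gates)
weights = np.exp(logits - logits.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)                # forget-weighted causal attention
```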

According to senior author Yoon Kim, these advances represent the next step in developing general-purpose building blocks for AI, combining expressivity, scalability, and efficiency for broader applications in structured domains such as biology.
