MIT-IBM researchers improve large language models with PaTH Attention

Combining PaTH Attention with selective forgetting mechanisms further enhances AI models’ ability to prioritise relevant information over long sequences.

MIT and IBM scientists created PaTH Attention, a data-aware positional encoding method that improves long-text reasoning and state tracking in large language models.

Researchers at MIT and the MIT-IBM Watson AI Lab have introduced a new attention mechanism designed to enhance the capabilities of large language models (LLMs) in tracking state and reasoning across long texts.

Unlike traditional positional encoding methods, which depend only on how far apart tokens sit in a sequence, PaTH Attention adapts to the content of the words themselves, enabling models to follow complex sequences more effectively.

PaTH Attention models sequences through data-dependent transformations, allowing LLMs to track how meaning changes between words instead of relying solely on relative distance.
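The published method builds these transformations from simple Householder-style reflections whose parameters are derived from the tokens themselves and accumulated along the sequence. The Python sketch below is only a simplified, unoptimised illustration of that idea under those assumptions; the function names, toy dimensions, and exact product convention are ours, not the paper's.

```python
import numpy as np

def householder(w, beta):
    """H = I - beta * w w^T: a rank-one, reflection-like transformation.
    In a PaTH-style encoding, w and beta come from the token itself,
    so each position contributes its own data-dependent transformation."""
    d = w.shape[0]
    return np.eye(d) - beta * np.outer(w, w)

def path_scores(Q, K, W, betas):
    """Toy O(n^2 * d^2) causal attention scores, for illustration only.

    RoPE rotates queries and keys by an angle fixed by the distance i - j.
    Here, by contrast, key j is carried to query i through a product of
    per-token transformations accumulated between the two positions (the
    boundary and ordering convention is simplified here), so the score
    depends on WHAT the intervening tokens are, not just how far apart
    positions i and j sit.
    """
    n, d = Q.shape
    scores = np.full((n, n), -np.inf)          # -inf masks future positions
    for i in range(n):
        acc = np.eye(d)                        # accumulated transformation
        for j in range(i, -1, -1):
            scores[i, j] = Q[i] @ acc @ K[j]
            acc = acc @ householder(W[j], betas[j])
    return scores

# Toy usage with random tensors standing in for learned projections.
rng = np.random.default_rng(0)
n, d = 6, 4
Q, K = rng.standard_normal((2, n, d))
W = rng.standard_normal((n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit directions
betas = rng.uniform(0.0, 2.0, size=n)          # per-token scales
S = path_scores(Q, K, W, betas)                # lower-triangular score matrix
```

The quadratic loop above is purely for exposition; the efficiency claim in the research concerns reorganising this computation into a compact, blockwise form suited to GPUs.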

The approach improves performance on long-context reasoning, multi-step recall, and language modelling benchmarks, while remaining computationally efficient and amenable to fast execution on GPUs.

In tests, PaTH Attention achieved consistently lower perplexity and stronger content-aware behaviour than conventional positional encodings. The team also combined it with FoX, the Forgetting Transformer, whose forget gate down-weights less relevant information, further improving reasoning and long-sequence understanding.
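FoX augments softmax attention with a learned forget gate: roughly, each token emits a gate value between 0 and 1, and the accumulated log-gates between a key and a query are added to the attention logits, so context separated by many "forgetful" tokens is progressively down-weighted. The sketch below illustrates that mechanism under those assumptions; the function name and the toy inputs are ours.

```python
import numpy as np

def forget_gate_bias(gates):
    """Causal additive bias D[i, j] = sum over t = j+1..i of log(gates[t]).

    Adding D to the raw attention logits multiplies each attention weight
    by the product of the forget gates between key j and query i, so
    positions separated by many small gates receive less attention."""
    n = gates.shape[0]
    cum = np.concatenate([[0.0], np.cumsum(np.log(gates))])  # prefix sums of log-gates
    D = np.full((n, n), -np.inf)                             # -inf masks j > i (causal)
    for i in range(n):
        for j in range(i + 1):
            D[i, j] = cum[i + 1] - cum[j + 1]                # sum over t = j+1 .. i
    return D

# Usage: add the bias to ordinary scaled dot-product logits before softmax.
rng = np.random.default_rng(1)
n, d = 6, 4
Q, K = rng.standard_normal((2, n, d))
gates = rng.uniform(0.5, 1.0, size=n)                        # per-token forget gates in (0, 1)
logits = Q @ K.T / np.sqrt(d) + forget_gate_bias(gates)
weights = np.exp(logits - logits.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)                # forget-weighted causal attention
```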

According to senior author Yoon Kim, these advances represent the next step in developing general-purpose building blocks for AI, combining expressivity, scalability, and efficiency for broader applications in structured domains such as biology.
