Quantum Technology company unveils groundbreaking algorithm for compressing Large Language Models

The algorithm achieved a 35% reduction in parameters for the GPT-2 small model. The compressed model also demonstrated superior text generation capabilities and improved predictive accuracy.


Terra Quantum, a leading quantum technology company, has unveiled TQCompressor, a groundbreaking algorithm specifically designed to compress large language models (LLMs) while maintaining performance. This innovative compression technique addresses the growing demands of generative AI models by significantly reducing the size of datasets required for pre-training on targeted tasks, compared to commonly used compression methods.

In a case study, Terra Quantum demonstrated the effectiveness of TQCompressor by compressing the GPT-2 small model, achieving an impressive 35% reduction in the number of parameters. Despite employing up to 97% less training data, the compressed model exhibited superior text generation capabilities compared to other prevalent compressed versions of GPT-2, a predecessor of ChatGPT.

The researchers at Terra Quantum, in their work titled ‘TQCompressor: Improving Tensor Decomposition Methods in Neural Networks via Permutations,’ compressed the benchmark model GPT-2 small from 117 million parameters to 81 million. They evaluated its performance against other compressed models using various datasets, including a vast collection of Wikipedia articles. The Terra Quantum model consistently produced better results in predicting the next word in a sequence and generating coherent text based on contextual understanding.

Markus Pflitsch, CEO of Terra Quantum, emphasised the significance of the compression algorithm, stating that it can significantly reduce the energy and compute costs associated with LLMs. This advancement paves the way for optimising neural network architecture and streamlining generative AI (GenAI) to meet sustainability goals without compromising performance.

The GPT-2 small model presented in the paper shares the same foundational language architecture as the rest of the GPT-2 family and ChatGPT, with the GPT-2 series comprising models with up to 1.5 billion parameters. Reducing the overall size of these LLMs unlocks numerous new and practical use cases.

TQCompressor utilises a tensor network technique to restructure the connections between neurons while preserving the structural integrity of the model. The result is TQCompressedGPT-2, an advanced neural network model for natural language processing (NLP) tasks that achieves improved efficiency and expressivity compared to GPT-2.
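The paper's exact decomposition is described in the cited work; as an illustrative sketch only, one standard tensor-style compression replaces a weight matrix with a Kronecker product of two much smaller factors, computed via Van Loan's rearrangement and a rank-1 SVD. The function and shapes below are hypothetical examples, not Terra Quantum's code:

```python
import numpy as np

def nearest_kronecker(W, shape_a, shape_b):
    """Best Kronecker-product approximation W ≈ A ⊗ B.

    Van Loan's rearrangement: reshuffle W so that the optimal (A, B)
    pair falls out of a rank-1 SVD of the rearranged matrix.
    """
    (m1, n1), (m2, n2) = shape_a, shape_b
    assert W.shape == (m1 * m2, n1 * n2)
    # Row (i, j) of R is the flattened (i, j)-th block of W.
    R = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return A, B

# A 20x6 weight matrix (120 parameters) approximated by
# a 4x3 and a 5x2 factor (12 + 10 = 22 parameters).
rng = np.random.default_rng(0)
W = rng.standard_normal((20, 6))
A, B = nearest_kronecker(W, (4, 3), (5, 2))
W_hat = np.kron(A, B)  # reconstructed (approximate) weights
```

The paper's stated contribution, per its title, is the permutation step: reordering neurons (rows/columns of the weight matrix) before a decomposition like this so the factorisation loses less information.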

Aleksei Naumov, AI Engineer at Terra Quantum and lead author of the paper, explained that compressing neural networks often leads to a loss of expressivity—the ability to capture and represent complex patterns and relationships in data. However, Terra Quantum’s optimization of the neural network enables a more effective compression process that mitigates expressivity loss, allowing for efficient and effective deployment of the AI model.

TQCompressedGPT-2 also outperforms other compressed GPT-2 models on perplexity, a standard measure of how well a language model predicts text, where lower scores are better. Naumov highlighted that TQCompressedGPT-2 achieved better perplexity than popular compressed models such as DistilGPT-2 across all benchmarking datasets.
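Concretely, perplexity is the exponential of the average negative log-likelihood a model assigns to each token. A minimal sketch (illustrative only, not the paper's evaluation code):

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood per token (lower is better)."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model assigning probability 0.25 to every token scores perplexity 4:
# it is, on average, as uncertain as a uniform choice among 4 tokens.
ppl = perplexity([math.log(0.25)] * 10)  # → 4.0
```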

One of the key motivations behind the development of compression techniques like TQCompressor is the significant time, computation, and energy resources required for training large NLP models.

To put this in perspective, if every Google search incorporated LLM-generated results, the annual electricity consumption could match that of Ireland. Addressing these resource demands requires techniques such as TQCompressor.

Terra Quantum’s researchers believe that TQCompressor has the potential to be applied to larger use cases, such as ChatGPT. They envision that quantum-inspired techniques like TQCompressor can streamline machine learning applications, develop more efficient LLMs, and transform industries across finance, healthcare, education, and beyond. By combining generative AI with quantum computing and tensor network methods, the field of AI and NLP can be further amplified and revolutionised.