AI efficiency technique faces critical limits: quantization may harm performance
Researchers find that a widely used cost-saving technique may degrade the performance of AI models trained on massive datasets.
Quantization, a technique widely used to improve AI efficiency, may have reached its limits, according to recent research. The method reduces the numerical precision of a model's parameters, for example representing weights as 8-bit integers instead of 16- or 32-bit floating-point numbers, which makes models faster and cheaper to run. However, the studies suggest that as models grow larger and are trained on vast datasets, quantization degrades their performance more noticeably.
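To make the idea concrete, here is a minimal sketch of post-training quantization: float32 weights are rounded to 8-bit integers and mapped back, trading a small, permanent rounding error for a fourfold reduction in storage. The function names and the random weight matrix are illustrative placeholders, not drawn from the research.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric post-training quantization of float32 weights to int8."""
    # Scale so the largest-magnitude weight maps to the int8 limit (127).
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float32 weights; the rounding error is permanent.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512)).astype(np.float32)  # stand-in weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"storage: {w.nbytes} bytes -> {q.nbytes} bytes")       # 4x smaller
print(f"mean abs rounding error: {np.abs(w - w_hat).mean():.5f}")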
Researchers from leading institutions found that the more data a model is trained on, the more it suffers when quantized. For AI companies that rely on training massive models and then quantizing them to cut serving costs, this finding raises concerns about the long-term viability of the approach. Quantized versions of Meta's Llama 3, for example, reportedly degrade more than quantized versions of other models, possibly because of the sheer volume of data Llama 3 was trained on.
Efforts to lower AI model costs continue, as inference (running a trained model to generate responses) remains the most significant expense for AI labs. Techniques such as training models in low precision from the start, and hardware that supports ultra-low-bit formats, are being explored. Yet these strategies face diminishing returns, and quality can drop sharply if precision falls too low.
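One common form of low-precision training is mixed precision, where the forward pass runs in a 16-bit format while parameters and gradient updates stay in float32. The sketch below uses PyTorch's autocast on a toy model; the model, data, and hyperparameters are placeholders, and it illustrates the general practice rather than the specific training regimes studied in the research.

```python
import torch

# Toy model and batch, purely illustrative.
model = torch.nn.Linear(256, 10)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(32, 256)
y = torch.randint(0, 10, (32,))

# Run the forward pass in bfloat16: each intermediate value carries
# fewer bits, cutting memory and compute per step.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = torch.nn.functional.cross_entropy(model(x), y)

# Gradients land in float32, matching the parameter dtype, so the
# optimizer update itself is not degraded by the low-precision forward.
loss.backward()
opt.step()
print(f"loss: {loss.item():.4f}")
```

Pushing below 16 bits is where the research's caution applies: at some point the savings stop compensating for the lost fidelity.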
Experts believe a shift toward better data curation and filtering, alongside new architectures optimized for low-precision training, may offer a way forward. These advances could help balance efficiency and performance as AI evolves beyond traditional scaling methods.