Rising DRAM prices push memory to the centre of AI strategy
Memory optimisation is emerging as a central factor in AI infrastructure, with rising DRAM prices and complex caching systems pushing companies to refine orchestration strategies and cut inference costs.
The cost of running AI systems is shifting from compute towards memory, as DRAM prices have risen sharply over the past year. Efficient memory orchestration is becoming a critical factor in keeping inference costs under control, particularly for large-scale deployments.
Analysts such as Doug O’Laughlin and Weka’s Val Bercovici note that prompt caching is turning into a complex field in its own right.
Anthropic has expanded its caching guidance for Claude, with detailed tiers that determine how long cached data stays hot and how much can be saved through careful planning. The structure enables significant efficiency gains, though each additional token risks displacing previously cached content.
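To make the idea concrete, the sketch below shows how prompt caching is typically requested through Anthropic's Messages API: a large, stable block of context is marked as cacheable so repeated queries can reuse it instead of reprocessing the same tokens. The model name, file path and prompt wording are illustrative assumptions rather than details from the article.

```python
# A minimal sketch using the anthropic Python SDK; model name, file path and
# prompt text are assumptions for illustration only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A large, reusable document is loaded once and marked as cacheable, so later
# requests can hit the server-side prefix cache rather than paying full price
# to reprocess the same tokens.
with open("report.txt") as f:          # hypothetical reusable document
    long_report_text = f.read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an analyst answering questions about the attached report.",
        },
        {
            "type": "text",
            "text": long_report_text,
            # Everything up to this breakpoint becomes eligible for caching;
            # changing any earlier token invalidates the cached prefix.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "Summarise the memory-cost findings."}],
)

print(response.content[0].text)
# The usage object reports cache_creation_input_tokens and
# cache_read_input_tokens, which show how much of the prompt was cached.
print(response.usage)
```

In this pattern the savings come entirely from keeping the cached prefix stable across requests, which is why the guidance stresses careful prompt planning.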
The growing complexity reflects a broader shift in AI architecture. Memory is being treated as a valuable and scarce resource, with optimisation required at multiple layers of the stack.
Startups such as Tensormesh are already working on cache optimisation tools, while hyperscalers are examining how best to balance DRAM and high-bandwidth memory across their data centres.
Better orchestration should reduce the number of tokens required for queries, and models are becoming more efficient at processing those tokens. As costs fall, applications that are currently uneconomical may become commercially viable.
