AI200 and AI250 mark a rack-scale inference push from Qualcomm

The near-memory architecture in Qualcomm’s AI250 targets more than tenfold higher effective memory bandwidth while cutting power for cost-sensitive data-centre deployments.

Qualcomm launches AI200 and AI250 to cut data-centre AI costs and power.

Qualcomm has unveiled the AI200 and AI250 data-centre accelerators, aimed at high-throughput, low-TCO generative AI inference. The AI200 targets rack-level deployment, with high performance per pound per watt and 768 GB of LPDDR memory per card to support large models.

The AI250 introduces a near-memory architecture that boosts effective memory bandwidth by more than tenfold while lowering power draw. Qualcomm pitches the design for disaggregated serving, improving hardware utilisation across large fleets.

Both arrive as full racks with direct liquid cooling, PCIe for scale-up, Ethernet for scale-out, and confidential computing. Qualcomm quotes around 160 kW per rack for thermally efficient, dense inference.

A hyperscaler-grade software stack spans applications to system software, with one-click onboarding of Hugging Face models. Support covers leading frameworks, inference engines, and optimisation techniques to simplify secure, scalable deployments.
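For context, a minimal sketch of what onboarding a Hugging Face model for generative inference typically looks like, using the generic transformers pipeline API. This is only an illustration of the workflow the stack is said to automate; Qualcomm’s own one-click tooling is not detailed in the announcement, and the model name here is a placeholder.

```python
# Generic illustration of Hugging Face model onboarding for text generation.
# Uses the standard transformers pipeline API, not Qualcomm's tooling
# (which the announcement does not describe); "gpt2" is a placeholder model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Data-centre inference accelerators are designed to"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(outputs[0]["generated_text"])
```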

Commercial timing splits the roadmap: the AI200 arrives in 2026 and the AI250 in 2027. Qualcomm commits to an annual product cadence for data-centre inference, aiming to lead on performance, energy efficiency, and total cost of ownership.
