NVIDIA platform lifts leading MoE models
NVIDIA's rack-scale design accelerates MoE models by speeding up communication between experts, offering faster inference and greater energy efficiency to developers adopting frontier architectures.
Developers of frontier models are adopting the mixture-of-experts (MoE) architecture as the foundation for their most advanced open-source models. Instead of forcing every parameter to work on each token, these designs route each token to a small set of specialised experts that activate only when needed.
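To make the routing idea concrete, here is a minimal, illustrative sketch of top-k expert routing in plain Python with NumPy. The expert count, router weights and tiny dimensions are invented for the example and do not reflect any particular model's implementation; real frontier models use learned routers and fused GPU kernels.

```python
# Minimal sketch of top-k mixture-of-experts routing (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

num_experts, top_k, d_model = 8, 2, 16
tokens = rng.standard_normal((4, d_model))               # tiny batch of token embeddings
router_w = rng.standard_normal((d_model, num_experts))   # hypothetical router weights

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Each expert is a small feed-forward block; only the selected ones run per token.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]

logits = tokens @ router_w                          # router score per expert, per token
probs = softmax(logits)
top_idx = np.argsort(-probs, axis=-1)[:, :top_k]    # keep only the k highest-scoring experts

outputs = np.zeros_like(tokens)
for t, token in enumerate(tokens):
    for e in top_idx[t]:
        # Weight each active expert's output by its renormalised router probability.
        gate = probs[t, e] / probs[t, top_idx[t]].sum()
        outputs[t] += gate * (token @ experts[e])

print("active experts per token:", top_idx)
```

The point of the sketch is that only `top_k` of the `num_experts` blocks run for any given token, which is why MoE models can grow total parameter count without a proportional rise in per-token compute.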
Leading models such as DeepSeek-R1, Kimi K2 Thinking, and Mistral Large 3 have risen to the top of the Artificial Analysis leaderboard by using this pattern to combine greater capability with lower computational cost.
Scaling the architecture has always been the main obstacle. Expert parallelism requires high-speed memory access and near-instant communication between multiple GPUs, yet traditional systems often create bottlenecks that slow down training and inference.
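The communication problem can be illustrated with a small sketch of the token exchange that expert parallelism implies: when experts are sharded across devices, every routed token may have to cross the interconnect before and after its expert runs. The device, expert and token counts below are invented for illustration and are not a model of any specific system.

```python
# Illustrative sketch of the all-to-all traffic created by expert parallelism:
# each device owns a contiguous slice of the experts, so tokens routed to a
# remote expert must be shipped across the interconnect and back.
import numpy as np

rng = np.random.default_rng(1)

num_devices, experts_per_device, tokens_per_device = 4, 2, 6
num_experts = num_devices * experts_per_device

# Hypothetical routing decision: which expert each token (on each device) goes to.
routes = rng.integers(num_experts, size=(num_devices, tokens_per_device))

# Build the send matrix: entry [src, dst] = tokens that src must ship to dst.
send = np.zeros((num_devices, num_devices), dtype=int)
for src in range(num_devices):
    for expert in routes[src]:
        dst = expert // experts_per_device   # device that owns this expert
        send[src, dst] += 1

print("tokens exchanged between devices (rows = senders, cols = receivers):")
print(send)
print("cross-device tokens:", send.sum() - np.trace(send), "of", send.sum())
```

The off-diagonal entries are the traffic the interconnect has to absorb at every MoE layer, which is why the bandwidth and latency of the fabric linking the GPUs dominate scaling behaviour.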
NVIDIA has shifted toward extreme hardware-software co-design to remove those constraints.
The GB200 NVL72 rack-scale system links seventy-two Blackwell GPUs via fast shared memory and a dense NVLink fabric, enabling experts to exchange information rapidly, rather than relying on slower network layers.
Model developers report significant improvements once they deploy MoE designs on NVL72. Performance leaps of up to ten times have been recorded for frontier systems, improving latency and energy efficiency and lowering the overall cost of running large-scale inference.
Cloud providers integrate the platform to support customers in building agentic workflows and multimodal systems that route tasks between specialised components, rather than duplicating full models for each purpose.
Industry adoption signals a shift toward a future where efficiency and intelligence evolve together. MoE has become the preferred architecture for state-of-the-art reasoning, and NVL72 offers a practical route for enterprises seeking predictable performance gains.
NVIDIA positions its roadmap, including the forthcoming Vera Rubin architecture, as the next step in expanding the scale and capability of frontier AI.
