Why DeepSeek V4 is changing the AI model race

DeepSeek V4 brings long context, open weights and lower costs into focus. Here is why the model is generating so much hype.

DeepSeek V4 shows how the AI race is shifting from raw power to efficiency.

DeepSeek has again placed itself at the centre of the global AI race. After drawing worldwide attention with its R1 reasoning model in early 2025, the Chinese company has recently released DeepSeek V4, a new model designed to compete not only on performance, but also on price, openness and efficiency.

The hype around DeepSeek V4 is not based on a single feature. The model comes with a 1-million-token context window, open weights, two versions for different use cases and a strong focus on agentic workflows such as coding, research, document analysis and long-running tasks. In a market still dominated by expensive closed models, DeepSeek is trying to prove that powerful AI does not need to remain locked behind proprietary systems.

A model built for long memory

The most immediate difference between DeepSeek V4 and other models is context length. Both DeepSeek-V4-Pro and DeepSeek-V4-Flash support a 1-million-token context window, meaning they can process inputs far longer than those of older generations of mainstream models. According to DeepSeek’s official release, one million tokens is now the default across all official DeepSeek services.

For ordinary users, that may sound technical. In practice, it matters because a longer context allows models to work with large documents, long conversations, full codebases, legal materials, research archives or complex project histories without losing track as quickly.
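
A rough back-of-the-envelope estimate gives a feel for the scale. The sketch below assumes about four characters per token, a common heuristic for English text; the exact figure depends on DeepSeek’s own tokenizer.

```python
# Rough sense of scale for a 1-million-token context window.
# Assumes ~4 characters per token for English prose, a common rule of
# thumb; the real ratio depends on DeepSeek's own tokenizer.

CONTEXT_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4          # heuristic, not a tokenizer measurement
CHARS_PER_PAGE = 3_000       # ~500 words per single-spaced page

approx_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN
approx_pages = approx_chars / CHARS_PER_PAGE

print(f"~{approx_chars:,} characters, roughly {approx_pages:,.0f} pages")
# -> ~4,000,000 characters, roughly 1,333 pages
```

By that rough measure, a 1-million-token window holds on the order of a thousand pages of prose in a single prompt, several novels’ worth of text.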

That is why DeepSeek V4 is not just another chatbot release. It is aimed at the next stage of AI use, where models are expected to act less like question-answering tools and more like assistants that can follow long processes over time.

Two models for two different needs

DeepSeek V4 comes in two main versions. DeepSeek-V4-Pro is a larger and more capable model, with 1.6 trillion total parameters and 49 billion active parameters. DeepSeek-V4-Flash is a smaller model, with 284 billion total parameters and 13 billion active parameters, designed for faster and more cost-effective workloads.

That distinction is important. Not every user needs the strongest model for every task. A company summarising documents, routing queries or running basic support may choose Flash. A developer working on complex coding tasks, long-context agents or advanced reasoning may prefer Pro.
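
In practice, that choice often ends up encoded directly in application logic. The sketch below shows the kind of routing a team might write; the model identifiers and thresholds are illustrative placeholders, not official DeepSeek names.

```python
# Illustrative routing between the two V4 tiers. The model IDs and
# thresholds are placeholders, not official DeepSeek identifiers.

def pick_model(task_type: str, prompt_tokens: int) -> str:
    """Route cheap, high-volume work to Flash and hard tasks to Pro."""
    heavy_tasks = {"coding", "agentic", "deep-reasoning"}
    if task_type in heavy_tasks or prompt_tokens > 200_000:
        return "deepseek-v4-pro"      # hypothetical model ID
    return "deepseek-v4-flash"        # hypothetical model ID

print(pick_model("support", 3_000))      # -> deepseek-v4-flash
print(pick_model("coding", 120_000))     # -> deepseek-v4-pro
```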

DeepSeek’s release reflects a broader trend in AI. The best model is no longer always the biggest one. Cost, speed, context size and deployment flexibility are now as important as raw benchmark performance.

Why the price matters

One reason DeepSeek attracts so much attention is its aggressive pricing. DeepSeek’s API page lists V4-Flash at USD 0.14 per 1 million input tokens on a cache miss and USD 0.28 per 1 million output tokens. V4-Pro is listed at USD 1.74 per 1 million input tokens and USD 3.48 per 1 million output tokens before the temporary 75% discount.
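
A quick calculation shows what those rates mean for a long-context job. The sketch below uses the listed prices; the workload sizes are invented for illustration.

```python
# Back-of-the-envelope cost of one long-context job at the listed rates.
# Prices are per million tokens, as quoted on DeepSeek's API page;
# the workload sizes below are made-up examples.

PRICES = {  # (input USD/1M tokens on cache miss, output USD/1M tokens)
    "v4-flash": (0.14, 0.28),
    "v4-pro":   (1.74, 3.48),  # list price, before the 75% discount
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: feed an 800k-token codebase, get a 20k-token answer.
print(f"Flash: ${job_cost('v4-flash', 800_000, 20_000):.4f}")  # $0.1176
print(f"Pro:   ${job_cost('v4-pro',   800_000, 20_000):.4f}")  # $1.4616
```

At list price, the same 800k-token job costs roughly twelve times more on Pro than on Flash, which is exactly the trade-off the two-tier line-up is meant to expose.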

For developers and companies, that changes the calculation. High-performing AI models are useful only if they can be deployed at scale. If every long document, coding session or agentic workflow becomes too expensive, adoption slows down.

DeepSeek’s challenge to the market is therefore not only technical. It is economic. The company is pushing the idea that frontier-level AI should be cheaper to run, easier to access and less dependent on closed ecosystems.

The architecture behind the hype

DeepSeek V4 uses a mixture-of-experts approach, meaning only part of the model is active during each response. That helps explain why the model can be very large on paper, yet still more efficient to run than a dense model of similar overall size.
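
A toy example makes the idea concrete. The NumPy sketch below routes one token through only the top two of eight experts; the sizes are illustrative and bear no relation to DeepSeek’s actual router.

```python
# Minimal mixture-of-experts routing sketch in NumPy. Only the top-k
# experts run per token, which is why a model can have trillions of
# total parameters but far fewer *active* ones. Toy sizes throughout.

import numpy as np

rng = np.random.default_rng(0)
n_experts, k, d = 8, 2, 16            # tiny illustrative dimensions
token = rng.standard_normal(d)        # one token's hidden state
router_w = rng.standard_normal((n_experts, d))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

logits = router_w @ token             # router scores each expert
top = np.argsort(logits)[-k:]         # keep only the k best experts
weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over k

# Output is the weighted sum of just k expert outputs; the other
# n_experts - k experts never execute for this token.
out = sum(w * (experts[i].T @ token) for w, i in zip(weights, top))
print(out.shape)  # (16,)
```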

The more interesting part is how DeepSeek handles long context. NVIDIA’s technical overview explains that DeepSeek V4 uses hybrid attention, combining compression and selective attention techniques to reduce the cost of processing very long prompts. NVIDIA says these changes are designed to cut per-token inference FLOPs by 73% and reduce KV cache memory burden by 90% compared with DeepSeek-V3.2.
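
The KV cache figure is worth unpacking, because at a million tokens the cache, the memory a model keeps for every previous token, becomes a dominant cost. The estimate below uses placeholder architecture numbers, not DeepSeek’s real configuration; it is only meant to show the scale involved.

```python
# Rough KV-cache sizing at a 1-million-token context. All architecture
# numbers here are placeholders, not DeepSeek's actual dimensions;
# the point is the order of magnitude, not the specifics.

def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per_val=2):
    # 2 tensors (K and V) per layer; fp16/bf16 = 2 bytes per value
    return tokens * layers * kv_heads * head_dim * 2 * bytes_per_val / 1e9

baseline = kv_cache_gb(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128)
print(f"baseline cache: {baseline:.0f} GB per sequence")   # ~246 GB
print(f"with 90% cut:   {baseline * 0.1:.0f} GB per sequence")  # ~25 GB
```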

For a non-technical audience, the point is simple. DeepSeek V4 is trying to solve one of the biggest problems in modern AI: how to make models remember and process much more information without becoming too slow or too expensive.

That is where much of the hype comes from. The model is not merely larger. It is designed around the economics of long-context AI.

Why NVIDIA is still in the picture

NVIDIA’s role in the DeepSeek V4 story is especially interesting. DeepSeek is often discussed as part of China’s effort to build a more independent AI ecosystem, yet NVIDIA has moved quickly to support developers who want to build with the model.

In its technical blog, NVIDIA describes DeepSeek V4 as a model family designed for efficient inference of million-token contexts. The company says DeepSeek-V4-Pro and V4-Flash are available through NVIDIA GPU-accelerated endpoints, while developers can also use NVIDIA Blackwell, NIM containers, SGLang and vLLM deployment options.
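
Both vLLM and SGLang expose an OpenAI-compatible HTTP API once a model is served locally, so querying an open-weight deployment looks much like calling a hosted service. In the sketch below, the model identifier and port are assumptions for illustration.

```python
# Querying a locally served open-weight model through vLLM's
# OpenAI-compatible API. The model ID and port are assumptions;
# substitute whatever `vllm serve` was actually started with.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM server address
    api_key="EMPTY",                      # vLLM ignores the key by default
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",  # hypothetical model ID
    messages=[{"role": "user", "content": "Summarise this contract..."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```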

NVIDIA also reports that early tests of DeepSeek-V4-Pro on the GB200 NVL72 platform showed more than 150 tokens per second per user. That matters because long-context models place heavy pressure on memory, as well as on compute and networking infrastructure. The model may be efficient by design, but serving it at scale still requires serious hardware.

So, DeepSeek V4 does not remove NVIDIA from the story – it complicates it. The model is part of a broader push towards more efficient AI, but the infrastructure race remains central.

The chip question behind the model

DeepSeek V4 also arrives at a time when AI infrastructure is becoming just as important as model performance. MIT Technology Review frames the release partly through that lens, noting that DeepSeek’s new model reflects China’s broader attempt to reduce reliance on foreign AI hardware and build a more self-sufficient technology stack.

That detail matters because the AI race is no longer only about who builds the most capable model. It is also about who controls the chips, software frameworks and data centres needed to run it.

Replacing NVIDIA, however, remains difficult. Its advantage lies not just in its chips, but also in the software ecosystem developers have built around its platforms over many years. Moving to alternative hardware means adapting code, rebuilding tools and proving that the new systems are stable enough for serious use.

DeepSeek V4 therefore sits between two realities. It points towards China’s ambition to build a more independent AI stack, while NVIDIA’s rapid support for the model shows that frontier AI still depends heavily on established infrastructure.

Open weights as a strategic move

DeepSeek V4 is also important because the model weights are available through Hugging Face under the MIT License. That gives developers more freedom to inspect, adapt and deploy the model than they would have with a fully closed commercial system.
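
In concrete terms, open weights mean the model can be pulled and loaded with standard tooling. The sketch below shows the usual Hugging Face pattern; the repository ID is an assumption, and a model of this size would need a multi-GPU server rather than a laptop.

```python
# What open weights mean in practice: the checkpoint can be fetched and
# run locally with standard tooling. The repo ID is an assumption, and
# a model this large needs a multi-GPU server to run at all.

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "deepseek-ai/DeepSeek-V4-Flash"   # hypothetical Hugging Face ID
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",       # keep the checkpoint's native precision
    device_map="auto",        # shard across available GPUs
    trust_remote_code=True,   # often needed for DeepSeek checkpoints
)

inputs = tokenizer("Explain KV caching in one sentence.",
                   return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```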

Open-weight models are becoming a major pressure point in the AI race. Closed models may still lead in some areas, especially in polished consumer products, enterprise support and safety layers. However, open models offer something different: flexibility.

For universities, start-ups, smaller companies and developers outside the largest AI ecosystems, that flexibility matters. It means advanced AI can be tested, modified and integrated without relying entirely on a handful of dominant providers.

Benchmarks call for caution

DeepSeek presents V4-Pro as highly competitive across reasoning, coding, long-context and agentic benchmarks. Hugging Face lists results including 80.6 on SWE-bench Verified, 90.1 on GPQA Diamond and 87.5 on MMLU-Pro for DeepSeek-V4-Pro.

Those numbers are impressive, but they should not be treated as the full story. Benchmarks are useful, but they rarely capture every real-world use case. A model can score well on coding tests and still struggle with reliability, factual accuracy, safety or complex multi-step workflows in production.

That caution is important. The AI industry often turns benchmarks into headlines, while real performance depends on deployment, prompting, safety controls and the specific task at hand.

More than just another model release

DeepSeek V4 matters because it combines several trends into one release: long context, lower prices, open weights, agentic workflows and geopolitical competition. It also shows that the AI race is no longer fought only in labs, benchmarks and data centres. Visibility now matters too. Tools such as Diplo’s Digital Footprints show how digital presence shapes the way technology actors and media narratives are discovered, ranked and understood. At this stage, the competition is not only about who has the smartest model. It is also about who can make intelligence cheaper, more available and easier to deploy.

That does not mean DeepSeek has solved every problem. Questions remain around independent benchmarking, safety, data governance, infrastructure and the broader political context of Chinese AI development. Still, the release does show where the market is heading.

The next phase of AI may not be defined solely by the most powerful model. It may be defined by the model that is powerful enough, affordable enough and open enough to change how people build products, services and tools with AI.
