DeepSeek V4 trails US frontier by eight months, according to CAISI evaluation

A new CAISI evaluation at US NIST finds DeepSeek V4 Pro lagging leading US AI models by about eight months overall.


The Center for AI Standards and Innovation (CAISI), a unit within the US National Institute of Standards and Technology (NIST), has published an evaluation of DeepSeek V4 Pro. It finds that the model is the most capable Chinese-developed model CAISI has assessed to date, but that it still trails leading US models overall.

According to the evaluation, DeepSeek V4 was tested in April 2026 and lagged top US frontier models by about eight months on CAISI’s aggregate capability measure. The report says the model nonetheless performed strongly across several domains.

The findings highlight DeepSeek V4’s strongest results in mathematics, software engineering, and natural sciences. In mathematics, the model achieved particularly strong scores on benchmarks such as OTIS-AIME-2025 and PUMaC 2024, while still lagging the top US systems in overall capability.

CAISI also says DeepSeek V4 is more cost-efficient than other models of similar capability. Compared with the most cost-competitive US reference model, GPT-5.4 mini, it was more cost-efficient on five of seven benchmarks, ranging from 53% less expensive to 41% more expensive depending on the task.

The report notes that CAISI selected a US reference model for comparison and evaluated both benchmark performance and token pricing. It adds that DeepSeek’s lower cost profile makes it notable in the current frontier model landscape, even though it remains behind the leading US systems in aggregate capability.

