DeepSeek unveils a powerful new AI model
DeepSeek V3 surpasses rivals in coding contests and code-integration tests.

Chinese AI firm DeepSeek has unveiled DeepSeek V3, a groundbreaking open-source model designed for a range of text-based tasks. Released under a permissive licence, the model supports coding, translations, essay writing, and email drafting, offering developers the freedom to modify and deploy it commercially.
In internal benchmarks, DeepSeek V3 outperformed major competitors, including Meta’s Llama 3.1 and OpenAI’s GPT-4o, particularly in coding contests and integration tests. The model has 671 billion parameters, a scale well beyond many rivals and one that often correlates with stronger performance.
DeepSeek V3 was trained on a dataset of 14.8 trillion tokens in a data centre equipped with Nvidia H800 GPUs. Remarkably, the model was developed in just two months for a reported $5.5 million, far less than the cost of comparable systems. Its size and resource demands, however, make it impractical to run without high-end hardware.
Regulatory constraints shape the model’s responses, particularly on politically sensitive topics. DeepSeek, backed by High-Flyer Capital Management, continues to push AI development forward, striving to compete with leading global firms despite restrictions on access to cutting-edge GPUs.