DeepSeek reveals secrets of low-cost AI model
The Chinese firm used reinforcement learning, rewarding the model for correct answers rather than training it to imitate human reasoning examples.

Chinese start-up DeepSeek has published the first peer-reviewed study of its R1 model, revealing how it built the powerful AI system for under US$300,000.
The model stunned markets on its release in January and has since become Hugging Face’s most downloaded open-weight system. Unlike rivals, R1 was not trained on other models’ output but instead developed reasoning abilities through reinforcement learning.
DeepSeek’s engineers rewarded the model for reaching correct answers, which let it develop its own problem-solving strategies. To improve efficiency, R1 scored its own outputs rather than relying on a separate algorithm to do so.
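As a rough illustration of that idea, the toy Python sketch below rewards sampled answers for being correct and then scores each one against the average of its own group, so no separate scoring model is needed. The article does not describe DeepSeek's actual code; the function names, the exact-match reward and the group normalisation are all assumptions made for this example.

```python
from statistics import mean, pstdev


def correctness_reward(answer: str, reference: str) -> float:
    """Hypothetical rule-based reward: 1.0 for a matching final answer, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_scores(rewards: list[float]) -> list[float]:
    """Score each answer relative to its own group (assumed scheme, not from the article).

    The group mean acts as the baseline, so no separate scoring model is used.
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)
    if spread == 0.0:
        # All answers did equally well or badly: nothing to prefer.
        return [0.0 for _ in rewards]
    return [(r - baseline) / spread for r in rewards]


# Four sampled answers to one question whose reference answer is "42".
sampled_answers = ["42", "41", "42 ", "7"]
rewards = [correctness_reward(a, "42") for a in sampled_answers]
scores = group_relative_scores(rewards)
print(rewards)
print(scores)
```

In a real training run, answers with positive scores would be made more likely and those with negative scores less likely; that update step is omitted here.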
The Nature paper marks the first time a major large language model has undergone peer review. Reviewers said the process increased transparency and should be adopted by other firms as scrutiny of AI risks intensifies.