U.S. export controls on advanced semiconductors were intended to slow China’s AI progress, but they may have inadvertently spurred innovation. Unable to rely solely on the latest hardware, companies like Hangzhou-based DeepSeek have been forced to find creative solutions to do more with less.
What is more, China is pursuing an open-source strategy and emerging as one of the biggest providers of powerful, fully open-source AI models in the world.
This month, DeepSeek released its R1 model, trained with advanced techniques such as pure reinforcement learning. The result is not only among the most formidable reasoning models in the world but also fully open source, meaning anyone in the world can examine, modify, and build upon it.
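R1’s training reportedly leans on reinforcement learning driven by rule-checkable rewards (for example, whether a math answer is exactly correct) rather than human preference labels alone. The toy sketch below is purely illustrative, not DeepSeek’s code; it shows the two core ingredients, a verifiable reward and group-relative scoring of sampled answers, with hypothetical answers and numbers.

```python
# Toy illustration (not DeepSeek's code) of reinforcement learning from
# verifiable rewards: sample several answers to the same problem, score each
# with a simple checkable rule, and weight each sample by how much it beats
# the group average. All names and values here are hypothetical.
import numpy as np

def rule_based_reward(answer: str, correct: str) -> float:
    # The reward comes from a rule that can be checked automatically,
    # not from a learned reward model or human rater.
    return 1.0 if answer.strip() == correct else 0.0

def group_relative_advantages(rewards):
    # Score each sampled answer relative to its own group; positive values
    # would be reinforced by the policy update, negative values discouraged.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Four sampled answers to one math problem whose true answer is 42.
samples = ["42", "41", "42", "40"]
rewards = [rule_based_reward(s, correct="42") for s in samples]
print(group_relative_advantages(rewards))  # correct answers get positive weight
```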
DeepSeek-R1 demonstrates that China is not out of the AI race and, in fact, may yet dominate global AI development with its surprising open-source strategy. By open-sourcing competitive models, Chinese companies can increase their global influence and potentially shape international AI standards and practices. Open-source projects also attract global talent and resources to contribute to Chinese AI development. The strategy further enables China to extend its technological reach into developing countries, potentially embedding its AI systems—and by extension, its values and norms—into global digital infrastructure.
DeepSeek-R1’s performance is comparable to OpenAI’s top reasoning models across a range of tasks, including mathematics, coding, and complex reasoning. For example, on the AIME 2024 mathematics benchmark, DeepSeek-R1 scored 79.8% compared with OpenAI-o1’s 79.2%. On the MATH-500 benchmark, DeepSeek-R1 achieved 97.3% versus o1’s 96.4%. In coding tasks, DeepSeek-R1 reached the 96.3 percentile on Codeforces, while o1 reached the 96.6 percentile, although it’s important to note that benchmark results can be imperfect and should not be overinterpreted.
But what’s most remarkable is that DeepSeek was able to achieve this largely through innovation rather than relying on the latest computer chips.
They introduced MLA (multi-head latent attention), which reduces memory usage to just 5-13% of what the commonly used MHA (multi-head attention) architecture requires. MHA is a technique widely used in AI to process multiple streams of information simultaneously, but it demands a lot of memory.
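As a rough illustration of the idea (a minimal sketch with made-up dimensions, not DeepSeek’s actual architecture): instead of caching full per-head keys and values for every past token, an MLA-style layer caches one small latent vector per token and expands it back into keys and values only when attention is computed. With the illustrative sizes below, the cache holds 512 numbers per token instead of the 8,192 a standard MHA key-value cache would need, about 6 percent.

```python
# Minimal sketch of latent key-value compression, the core idea behind MLA.
# Dimensions are illustrative; causal masking and positional encodings are
# omitted for brevity. This is not DeepSeek's implementation.
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    def __init__(self, d_model=4096, n_heads=32, d_latent=512):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-project each token to a small latent vector; this, not the
        # full keys and values, is what gets cached during generation.
        self.kv_down = nn.Linear(d_model, d_latent)
        self.k_up = nn.Linear(d_latent, d_model)  # expand latent back to keys
        self.v_up = nn.Linear(d_latent, d_model)  # expand latent back to values
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        # Cache grows by d_latent numbers per token, vs. 2 * d_model for MHA.
        new_latent = self.kv_down(x)
        latent = new_latent if latent_cache is None else torch.cat([latent_cache, new_latent], dim=1)

        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)

        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        # Return the latent so the caller can pass it back in as the cache.
        return self.out_proj(out), latent
```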
To make their model even more efficient, DeepSeek created the DeepSeekMoE sparse structure. “MoE” stands for Mixture-of-Experts, which means the model uses only a small subset of its components (or “experts”) for each task, instead of running the entire system. The “sparse” part refers to how only the necessary experts are activated, saving computing power and reducing costs.
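In its simplest form, a sparsely activated mixture-of-experts layer looks something like the sketch below. The sizes, the top-2 routing, and the module itself are illustrative assumptions rather than DeepSeek’s implementation, which reportedly adds refinements such as shared experts, but the routing idea is the same: a small router scores every expert for each token, and only the top-scoring experts actually run.

```python
# Minimal sketch of a sparsely activated mixture-of-experts (MoE) layer.
# Sizes and top-k routing are illustrative; this is not DeepSeek's code.
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, d_model=1024, d_hidden=4096, n_experts=64, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, chosen = scores.topk(self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in chosen[:, slot].unique().tolist():
                mask = chosen[:, slot] == e
                # Only the selected experts run for a given token, so most of
                # the layer's parameters sit idle on any single forward pass.
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```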
DeepSeek-R1’s architecture has 671 billion parameters in total, but only 37 billion (about 5.5 percent) are activated for any given token, demonstrating remarkable computational efficiency. The company has published a comprehensive technical report on GitHub, offering transparency into the model’s architecture and training process. The accompanying open-source code includes the model’s architecture, training pipeline, and related components, enabling researchers to fully understand and replicate its design.
These innovations allow DeepSeek’s model to be both powerful and significantly more affordable than its competitors. This has already triggered an inference price war in China, which will likely spill over to the rest of the world.
DeepSeek charges a small fraction of what OpenAI-o1 costs for API usage. This dramatic reduction in costs could potentially democratize access to advanced AI capabilities, allowing smaller organizations and individual researchers to leverage powerful AI tools that were previously out of reach.
DeepSeek has also pioneered the distillation of its large model’s capabilities into smaller, more efficient models. These distilled models, ranging from 1.5B to 70B parameters, are also open-sourced, providing the research community with powerful, efficient tools for further innovation.
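The reported recipe is simple in outline: the large model writes out worked solutions, and a much smaller model is fine-tuned to reproduce them. The sketch below shows that idea in its barest form; the student model name, data, and training loop are illustrative assumptions, not DeepSeek’s actual pipeline.

```python
# Minimal sketch of sequence-level distillation: fine-tune a small "student"
# model on text generated by a large "teacher" model. Everything here
# (model name, data, hyperparameters) is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# In practice this would be a large set of worked solutions sampled from the teacher.
teacher_outputs = [
    "Problem: 2 + 2 = ?\nReasoning: adding two and two gives four.\nAnswer: 4",
]

student_name = "Qwen/Qwen2.5-1.5B"  # example of a small open base model
tokenizer = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

for text in teacher_outputs:
    batch = tokenizer(text, return_tensors="pt")
    # Ordinary next-token prediction on the teacher's text: the student
    # learns to imitate the larger model's reasoning and answers.
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```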
By making their models freely available for commercial use, distillation, and modification, DeepSeek is building goodwill within the global AI community, and potentially setting new standards for transparency in AI development.
DeepSeek was founded by Liang Wenfeng, 40, one of China’s top quantitative investors. His hedge fund, High-Flyer, finances the company’s AI research.
In a rare interview in China, DeepSeek founder Liang issued a warning to OpenAI: “In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up.”
DeepSeek is part of a growing trend of Chinese companies contributing to the global open-source AI movement, countering perceptions that China’s tech sector is primarily focused on imitation rather than innovation.
In September, China’s Alibaba unveiled over 100 new open-source AI models as part of its Qwen 2.5 family, which supports more than 29 languages. Chinese search giant Baidu has its Ernie series, Zhipu AI its GLM series, and MiniMax its MiniMax-01 family, all offering competitive performance at significantly lower costs than leading U.S. models.
As China continues to invest in and promote open-source AI development, while simultaneously navigating the challenges posed by export controls, the global technology landscape is likely to see further shifts in power dynamics, collaboration patterns, and innovation trajectories. The success of this strategy could position China as a leading force in shaping the future of AI, with far-reaching consequences for technological progress, economic competitiveness, and geopolitical influence.