# Beyond Citations #3 | How China gamed AI hardware restrictions
Yet another take on DeepSeek? No. This is about Tencent's Hunyuan-Large LLM.
Authored by Lokendra Sharma, Beyond Citations grounds vital tech developments in foundational scholarship. Because academic work is deeply relevant beyond citations in the scholarly universe. Delivered to your inbox every Wednesday.
This edition examines how exactly Chinese AI companies, particularly Tencent, gamed US AI export restrictions and developed highly capable LLMs.
This is not yet another take on DeepSeek’s R1 large language model (LLM), which has made waves globally and contributed to the big dip in NVIDIA’s market cap. Much has been written about it already, from the New York Times to Substack newsletters to a famous 12,000-word blog post. The first edition of Beyond Citations covered it as well.
This edition instead focuses on a lesser-known LLM by a well-known Chinese company: Tencent. This is the same company behind WeChat, the all-encompassing social media platform used by a billion-plus people in China, and one of the largest video game developers in the world. Given its constant engagement with graphics processing units, servers, algorithms and everything else associated with the Fourth Industrial Revolution, it is only natural that Tencent made an early leap towards developing LLMs. That Chinese conglomerates like Tencent command a large pool of high-end technical talent only speeds things up for them.
In November 2024, Tencent released the Hunyuan-Large LLM on the public repositories GitHub and Hugging Face. As DeepSeek would do later, Tencent released open weights for the LLM. This was not just any other ‘open-source’ AI model. According to a report in Analytics India Magazine, it outperformed Meta’s open-source ‘Llama 3.1-70B model on several benchmarks in English and Chinese’ and is ‘comparable with Meta’s flagship Llama 3.1-405B model on tasks involving language understanding, coding, maths and logical reasoning.’
But why is the Hunyuan model important when DeepSeek has already hogged so much limelight? Because the fact that Hunyuan performed on par with the best of what the open-source LLM world has to offer, about two to three months before DeepSeek’s arrival, says something about the Chinese AI ecosystem: DeepSeek R1 was not a one-off innovation but reflected a growing tendency among Chinese companies to work around the AI hardware restrictions the US has imposed, in some form, since 2022.
How did Chinese tech companies game the AI hardware restrictions imposed by the US?
Ritwik Gupta, Leah Walker and Andrew W. Reddie provide some answers in their November 2024 pre-print paper (an output of the AI Frontiers Initiative at the Berkeley Risk and Security Lab):
Gupta, R., Walker, L., & Reddie, A. W. (2024). Whack-a-Chip: The Futility of Hardware-Centric Export Controls. arXiv pre-print, https://doi.org/10.48550/arXiv.2411.14425 (complete pdf available)
Long story short, China gamed the restrictions in three ways. The first two are easy to arrive at: stockpiling chips just before US export restrictions kicked in, and illegally acquiring chips through the black market. The third is more difficult but crucial, and it is how Chinese companies deprived of high-end chips have found a way around the restrictions: clever process innovations on the software end to compensate for what the hardware lacks.
Tencent largely used the export-restrictions-compliant NVIDIA H20 chip to train its Hunyuan model. For perspective, the H20 was NVIDIA’s downgraded version of the H100, tailored for the Chinese market. But Tencent was able to leverage techniques such as mixture-of-experts (building an aggregate model that uses less compute than a monolithic model) and quantisation (using less precise number representations, which yields memory savings without significantly reducing the model’s accuracy). IBM offers a simple explanation of what mixture of experts actually is:
Mixture of Experts architectures enable large-scale models, even those comprising many billions of parameters, to greatly reduce computation costs during pre-training and achieve faster performance during inference time. Broadly speaking, it achieves this efficiency through selectively activating only the specific experts needed for a given task, rather than activating the entire neural network for every task.
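The two techniques mentioned above can be sketched in a few lines of NumPy. The toy illustration below is my own, not Tencent’s code: a top-k gate that activates only two of four small ‘experts’ per token (the selective activation the IBM explanation describes), followed by a simple int8 quantise/dequantise round-trip of the kind that underlies quantisation’s memory savings.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Mixture of experts: activate only the top-k experts per token ---
def top_k_gate(x, gate_w, k=2):
    """Score all experts, keep the k best, softmax over just those."""
    logits = x @ gate_w                          # one score per expert
    top = np.argsort(logits)[-k:]                # indices of the k highest scores
    weights = np.exp(logits[top] - logits[top].max())
    return top, weights / weights.sum()

def moe_forward(x, gate_w, experts, k=2):
    """Only the selected experts compute; the rest stay idle."""
    idx, w = top_k_gate(x, gate_w, k)
    return sum(wi * experts[i](x) for i, wi in zip(idx, w))

d, n_experts = 8, 4
gate_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
# Each "expert" is a tiny dense layer; only k of them run per token.
experts = [lambda x, W=W: np.tanh(x @ W) for W in expert_ws]

token = rng.normal(size=d)
out = moe_forward(token, gate_w, experts)        # only 2 of the 4 experts ran

# --- Quantisation: store weights in int8 instead of float32 ---
W = expert_ws[0].astype(np.float32)
scale = np.abs(W).max() / 127.0                  # map the float range onto int8
W_int8 = np.round(W / scale).astype(np.int8)     # 4x smaller in memory
W_restored = W_int8.astype(np.float32) * scale
max_err = np.abs(W - W_restored).max()           # small reconstruction error
```

The gate is why an MoE model with many total parameters costs only a fraction of the compute per token, and the int8 round-trip is why quantised weights take a quarter of the float32 memory at a small, bounded accuracy cost.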
By highlighting innovation on the software side as well as hardware restriction workarounds, the authors are able to demonstrate the futility of the export controls in place in 2024. But they do not stop there. They also predict that any expansion of export controls (as did happen in the final days of the Biden administration in January 2025) would not help. Instead, they argue for a monitoring and verification mechanism to continuously examine Chinese strides in AI. They do not make a strong case for how this would help the US. Nor does the paper attempt to answer whether there is any way of hindering China’s AI progress, thereby projecting a certain inevitability around it. The really difficult question facing the Trump administration is: how can the US maintain its edge in AI (software and hardware) while keeping the engines of profit for US tech companies up and running?
Limitations of the paper notwithstanding, here’s a key (bonus) point that makes the paper valuable: Gupta, Walker and Reddie make a strong case for ‘analyzing representative code signatures in multiple Tencent [or any other AI developer] codebases’ to find out which classes of GPUs a model developer uses. This is particularly useful for determining whether Chinese AI developers are using export-controlled GPUs in one very specific case: when codebases are made public in part or in full, but the classes of GPUs used are not declared explicitly or are left ambiguous by the model developers.
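To give a flavour of what such signature analysis might look like, here is a toy sketch. It is my own illustration, not the authors’ tooling, and the signature patterns below (CUDA compute-capability flags, chip-name strings) are illustrative assumptions rather than a real detection list.

```python
import re

# Hypothetical signature patterns: strings in a public codebase that hint
# at the GPU class a training stack was built for.
GPU_SIGNATURES = {
    "H100-class (export-controlled)": [r"sm_90", r"\bH100\b"],
    "A100-class (export-controlled)": [r"sm_80", r"\bA100\b"],
    "H20-class (export-compliant)":   [r"\bH20\b"],
}

def scan_source(text):
    """Return the GPU classes whose signatures appear in a piece of source."""
    hits = set()
    for gpu_class, patterns in GPU_SIGNATURES.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            hits.add(gpu_class)
    return hits

sample = "kernel compiled with -gencode arch=compute_90,code=sm_90 for Hopper"
print(scan_source(sample))  # → {'H100-class (export-controlled)'}
```

A real analysis would of course scan entire repositories and use far more robust signatures, but the principle is the same: code written for a GPU class tends to leave traces of that class behind.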
But wait, what about India? Should the Indian government promote open-source AI development? Or should it go full throttle on AI sovereignty with a focus on proprietary foundational models? If you are curious about these questions, check this out:
Apply here: https://school.takshashila.org.in/pgp