BEIJING: Chinese artificial intelligence startup DeepSeek this week unveiled a novel approach to training large language models that researchers say could boost performance while lowering computational costs at a time when global AI development faces hardware constraints.
In a technical paper published on 1 January, DeepSeek introduced Manifold-Constrained Hyper-Connections, or mHC, a reworking of the residual connections that underpin most modern neural networks and of hyper-connections, a more recent generalisation of them. The architecture is designed to stabilise how information flows between the layers of a network during training, potentially improving efficiency without significantly increasing resource demands.
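In broad strokes, a residual connection adds a layer's input back to its output, while hyper-connections keep several parallel copies of the hidden state and learn how to mix them between layers. The sketch below illustrates that general idea in PyTorch. It is not DeepSeek's mHC implementation: the class names, the number of streams and the unconstrained mixing matrix are illustrative assumptions, and mHC's stated contribution is precisely to constrain such mixing so the flow of information stays stable.

```python
# Minimal sketch of residual connections vs. the general hyper-connection
# idea. NOT DeepSeek's mHC: names, stream count and the plain learnable
# mixing matrix below are illustrative assumptions.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Standard residual connection: one stream, identity skip."""

    def __init__(self, layer: nn.Module):
        super().__init__()
        self.layer = layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.layer(x)  # output = input + layer transformation


class HyperConnectionBlock(nn.Module):
    """Hyper-connection-style block (assumed form): several parallel
    hidden streams, mixed by learnable weights around the layer."""

    def __init__(self, layer: nn.Module, n_streams: int = 4):
        super().__init__()
        self.layer = layer
        # Learnable weights that read the layer input from the streams...
        self.read = nn.Parameter(torch.ones(n_streams) / n_streams)
        # ...and write the layer output back into each stream.
        self.write = nn.Parameter(torch.ones(n_streams))
        # Learnable stream-to-stream mixing; this is the kind of weight
        # mHC constrains, per the paper's stability goal. Here it is
        # left unconstrained for simplicity.
        self.mix = nn.Parameter(torch.eye(n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, dim)
        x = torch.einsum("s,sbd->bd", self.read, streams)  # combine streams
        out = self.layer(x)                                # layer transform
        mixed = torch.einsum("st,tbd->sbd", self.mix, streams)  # mix streams
        return mixed + self.write[:, None, None] * out     # write back


if __name__ == "__main__":
    dim, n_streams = 16, 4
    block = HyperConnectionBlock(nn.Linear(dim, dim), n_streams)
    streams = torch.randn(n_streams, 2, dim)  # (streams, batch, dim)
    print(block(streams).shape)               # torch.Size([4, 2, 16])
```

In this toy form the mixing matrix is free to amplify or shrink the signal as layers stack; constraining weights of this kind is consistent with the paper's stated aim of stabilising how information flows between layers.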
DeepSeek researchers tested the approach on models with 3 billion, 9 billion and 27 billion parameters, finding performance improvements across several benchmarks when compared with previous hyper-connection techniques, while adding minimal hardware overhead.
Industry analysts who spoke to Business Insider described the work as significant. "The willingness to share important findings with the industry … showcases a newfound confidence in the Chinese AI industry," said Lian Jye Su, a chief analyst at consulting firm Omdia. Another analyst called the method a "striking breakthrough" for scaling models.
DeepSeek's mHC framework arrives as China's tech sector works to build more efficient training systems despite limited access to cutting-edge AI chips. StartupNews noted the company's research reflects Beijing's push for efficiency as local labs compete with U.S. and global AI firms.
While DeepSeek has not tied mHC directly to a product launch, observers say it could inform the development of the company's next flagship models.