Are Silicon Valley's three giants igniting a mass-production revolution, and will China's embodied intelligence take center stage globally?

Introduction: Predefined actions are today’s admission ticket, while generalization ability is tomorrow’s final ticket.

Editor: Jingcheng

Author: Jiang Jing

As the first quarter of 2026 comes to a close, a wave of near-simultaneous moves across the global tech sector marks a historic turning point for the humanoid robot industry.

Three Silicon Valley giants, Google, Amazon, and Tesla, are moving at the same time. Between them they span technical enablement, scenario deployment, and a sprint toward mass production, pushing humanoid robots from the tech showcase into the industrial arena.

China, meanwhile, has made moves of its own. On March 26, the China Academy of Information and Communications Technology, together with more than 40 organizations, released the first industry standard in the field of embodied intelligence. With continued policy support, faster deployment by enterprises, and rising investor enthusiasm, China is moving from following to running alongside, and in several areas is starting to challenge for the lead.

Will China take center stage in a revolution that is set to upend future business rules and the industrial ecosystem?

Global Surge: Silicon Valley Giants Move to Mass Production, Restructuring Future Productivity

No one considers humanoid robots a sci-fi concept anymore.

The recent synchronized moves by the three Silicon Valley giants have made the approach of the mass-production era unmistakable. Each step is aimed at reshaping future productivity, while follow-on moves from global capital and local companies keep heating up the sector.

Google was the first to develop a “smart brain” for robots, launching two new AI models, Gemini Robotics and Gemini Robotics-ER. The former allows robots to understand new situations without specialized training, while the latter can “understand complex and dynamic worlds,” giving robots the technical foundation to operate in real-world scenarios.

Amazon is focusing on real-world deployment, acquiring humanoid robot startup Fauna Robotics and logistics robot company Rivr within a single week. Its strategy goes beyond optimizing package delivery: it aims to build a “robot service capillary” network running from factory assembly lines to living rooms, creating the next generation of labor systems.

Tesla’s mass production push has drawn the most attention. On March 25, its Optimus robot program announced a recruitment drive, stating plainly that Optimus will change the landscape of labor and the manufacturing economy and that the goal is large-scale mass production as soon as possible. This summer, the company is set to launch the world’s first humanoid robot production line with an annual output of one million units, moving mass production into a substantive phase.

Silicon Valley’s push is far from over, and other American companies are also accelerating deployment. On the same day, the Figure03 humanoid robot developed by Figure AI entered the White House, the first American-made humanoid robot to do so. It is capable of multilingual communication and can complete household chores independently. The company raised over $1 billion six months ago with backing from giants such as Nvidia and LG, a sign of global capital’s enthusiasm for the humanoid robot sector.

Yuan Shuai, deputy director of the Investment Department of the Urban Development Research Institute in China, stated that the mass production moves by Silicon Valley giants and the release of China’s industry standard for embodied intelligence jointly mark the humanoid robot industry’s transition from deep-water technical research to a golden period of commercialization. In his view, breakthroughs in core technologies now support manufacturing at scale, while industry standards set technical norms and reduce chaotic competition.

However, Gao Heng, an expert from the China Science and Technology Journalism Society, offered a more cautious assessment. He believes the industry is entering a phase of pre-commercialization and localized deployment rather than a golden period of full-scale commercial takeoff. The core change, he says, is that the various players are starting to jointly verify whether robots can work continuously in real scenarios and whether costs are controllable, rather than simply chasing breakthroughs in technical research.

China’s Breakthrough: Multiple Advantages to Gain a Foothold, Core Shortcomings Urgently Need to Be Addressed

While Silicon Valley giants are surging into mass production, China is not passively following; it laid the groundwork in advance. With advantages in standards, application scenarios, market size, and capital, China has established a foothold in the global embodied intelligence race. Compared to the Silicon Valley giants, however, gaps remain in core technologies and capabilities, and these limit further industrial development.

In terms of advantages, China’s approach shows distinct local characteristics and first-mover effects. First, it has gained a voice in standard-setting. On March 26, the China Academy of Information and Communications Technology, in collaboration with over 40 organizations, released the first industry standard in the field of embodied intelligence, establishing a unified benchmark testing framework and seizing the initiative on standards at an early stage of the industry’s development.

Second, it leads in real-world deployment. Embodied intelligence in China has not stopped at the demonstration phase but has reached practical application. For instance, the Yushu quadruped robot has been deployed in industrial inspection projects including substations in Zhejiang, underground utility corridors in Hangzhou, and the Guangdong petrochemical base.

At the same time, China has a vast market and an active capital environment. In 2025, there were over 140 domestic makers of complete embodied intelligence machines, which released more than 330 humanoid robot products, with shipments of about 17,000 units. The market size for embodied intelligence and for humanoid robots reached 5.295 billion yuan and 8.239 billion yuan, respectively.

On the capital side, Yushu Technology’s IPO application has been accepted, putting it on track to become the first humanoid robot stock on the A-share market. Since the beginning of the year, large financing rounds in the embodied intelligence industry have surged, accelerating the sector’s move into capital markets. Yushu Technology’s sales revenue from quadruped robots and humanoid robots for January to September 2025 grew 182.22% and 6.42 times year-on-year, respectively, a direct demonstration of the market’s potential.

Despite the rapid development momentum, China’s shortcomings in the global competition are equally obvious.

Multiple experts have pointed out that the core gap between Chinese and overseas humanoid robots lies not in hardware manufacturing but in data accumulation, model generalization, and foundational technology, and that it shows up as insufficiently flexible movement and weak generalization.

Yuan Shuai believes that the gap between Chinese and foreign humanoid robots looks like a difference in movement flexibility and generalization ability, but that its root cause lies in underlying technology, data accumulation, and R&D philosophy. Google’s RoboCat, for example, achieves flexible, generalized movement thanks to long-term technological accumulation, particularly in large model algorithms, sensor fusion, and robot dynamics control. Sustained investment in these fields, combined with a vast amount of multi-scenario training data, gives its robots the capacity for autonomous learning and environmental adaptation.

He points out that domestic products are currently mostly at the stage of predefined actions and fixed-scenario reproduction. The primary shortcomings are a lack of high-quality, large-scale training data from real scenarios and insufficient algorithmic generalization; a secondary one is reliance on imported core components such as high-precision servo motors and force sensors, which limits movement accuracy and perception.

Gao Heng adds that the real gap lies in whether data, models, systems engineering, and closed-loop scenario capabilities can be linked together. Leading foreign enterprises aim to create intelligent robots that understand their environment and complete tasks autonomously, fundamentally treating robots as data products that can be iterated on continuously. Generalization is inherently a composite capability. The domestic sector is not merely behind on individual technologies; it lacks the iterative flywheel formed by data and scenarios, so robots can only be tuned for single tasks and struggle to become more intelligent with use.

Gao Chengyuan, a well-known financial writer and director of the Qiaoyuan Influence Research Institute, states that the core gap is concentrated in data accumulation and model generalization ability. Overseas players clearly lead in sim-to-real transfer learning and multi-task general policies, having built cross-scenario data closed loops and foundation model R&D capabilities through long-term investment. Domestic players still focus on predefined actions, essentially because of a lack of high-quality embodied data, along with a generational gap in the computing power and algorithm engineering capabilities required for end-to-end large models.

Yushu Technology also acknowledges that key technologies for large-scale commercial applications in industrial and household scenes remain to be broken through, primarily including “brain” level embodied large model capabilities and the fine durability of “dexterous hands.” The most significant technical challenge is that globally, embodied large models are still in the early development stage, with insufficient generalization capabilities.

Path to Resolution: Multidimensional Paths to Enhance Capabilities, Balancing Current and Long-term Development

With data and scenario accumulation still insufficient, how to improve the flexibility and generalization ability of robot movement has become the core question for domestic enterprises trying to catch up.

Multiple experts have proposed development paths that are both practical and forward-looking based on the current state of the industry, emphasizing that enterprises need to balance short-term implementation with long-term R&D, using predefined actions as an admission ticket and generalization ability as a core barrier.

Wang Peng, a researcher at the Beijing Academy of Social Sciences, suggests that domestic enterprises can catch up through a two-track approach of “scenario anchoring + technology reuse.” On one hand, they should focus on closed data loops in vertical scenarios, first locking in standardized scenarios such as industrial welding and material handling, obtaining proprietary datasets through small-scale deployment, and then training embodied models for specific fields. On the other hand, they should rely on open-source ecosystem collaboration, using the industry standard released by the China Academy of Information and Communications Technology to promote cross-enterprise data sharing and jointly train general models on operational data in a unified format.

Yuan Shuai recommends pursuing several paths in parallel. Enterprises should collaborate with universities and research institutions to generate virtual data through simulation and digital twin technology, complete training there, and then transfer the results to real scenarios. They should also open their interfaces so that scenario owners can join pilot projects, collecting real data to iterate on algorithms. In addition, he calls for sharing anonymized training data among enterprises to break down data silos, and for more in-house R&D investment in core components so that hardware breakthroughs can support flexible robot movement.

Gao Heng offers four practical paths: first, gather data from real scenarios, binding robots closely to factories, warehouses, and similar sites so that data accumulates within real workflows; second, put simulation before real machines, training policies in simulation environments and then fine-tuning in real scenarios to reduce training costs; third, pursue task-level generalization first, generalizing within a single type of task such as picking or handling in order to realize commercial value; fourth, establish a shared data and standards system across the industry to resolve inconsistent interfaces and evaluation systems, enabling iteration at industry scale.

Experts unanimously believe that predefined actions and generalization ability are equally important for enterprise development.

Wang Peng notes that in the short term, robots with predefined actions can already cover the majority of industrial needs at a lower cost than generalized robots. In the long term, however, generalization ability is the core barrier that determines whether an enterprise can ride out industry cycles: as non-standard scenarios such as home services and emergency rescue expand, robots that adapt autonomously to their environment will gradually become mainstream.

Gao Heng agrees that predefined actions are today’s admission ticket, while generalization ability is tomorrow’s final ticket. Enterprises should not abandon long-term investment in generalization simply because predefined actions make money today, yet they also cannot ignore scenarios that can be deployed right now in pursuit of generalization. Securing orders first and then developing intelligence is the more realistic route.

Currently, China’s embodied intelligence market accounts for nearly half of the global market, and practical applications have been realized in industrial and emergency scenarios. Which types of scenarios will become the breakthrough points for large-scale commercial use of China’s embodied intelligent robots?

Gao Chengyuan believes that industrial manufacturing will be the first area where China achieves large-scale commercial use, especially automotive manufacturing, 3C electronics assembly, and warehousing logistics. Identifying scenario demand requires in-depth collaboration with leading manufacturers to build joint laboratories, starting with the replacement of single processes and gradually expanding to full-line automation. The key to integrating technology with scenarios is a reverse-driven mechanism in which “scenarios define technology,” letting real production line demands drive hardware iteration and algorithm optimization rather than developing technology first and looking for scenarios afterward.

To move from “running alongside” to “global leadership,” China still needs to break through core bottlenecks in policy, technology, and the industrial ecosystem.

Yuan Shuai suggests that at the policy level, support and funding should be strengthened and intellectual property protection improved. On technology, the focus should be on large model algorithms and core components, enhancing robots’ autonomous learning and generalization capabilities. Within the industrial ecosystem, upstream and downstream collaboration should be strengthened, the localization of components accelerated, the integration of industry, academia, research, and application deepened, and the commercialization of research results promoted. In addition, China should actively pursue international cooperation and take part in setting global standards to strengthen its voice in the industry, ultimately building a complete embodied intelligence ecosystem and achieving the goal of leadership.
