A Comprehensive Review and Future Perspectives on Embodied AI Large Models

Tingyu YUAN¹,², Kai LIU³, Biaoliang GUAN³, Wen YE²,⁴, Yacui ZHAO⁵, Chaoyang ZHAO¹,⁶, Jinqiao WANG¹,²

Radio Engineering | 2025, 55(11) : 2256 - 2273

• Engineering & Application •

Affiliations

1. Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100083, China

2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100083, China

3. School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China

4. New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100083, China

5. The Hamlyn Centre, Imperial College London, London SW7 2AZ, United Kingdom

6. Objecteye Inc., Beijing 100083, China

Published: 2025-11-05 doi: 10.3969/j.issn.1003-3106.2025.11.014

Abstract

Vision-Language-Action (VLA) models are a core technology for general embodied artificial intelligence: they aim to integrate visual perception, language understanding, and action decision-making within a unified end-to-end framework. This paper systematically reviews the current state and development trajectory of VLA research. It traces the theoretical origins of VLA models and clarifies the paradigm shift from modular designs to unified architectures. Following the evolutionary path of VLA, it presents representative works such as SpatialVLA, TLA, and GR00T N1, with a focus on multimodal fusion and cognitive hierarchies. A detailed taxonomy of VLA models is constructed along two key dimensions: macro architecture and system hierarchy. Key technologies and design principles are analyzed in depth, from pioneering works such as RT-1, through models that introduce large-scale knowledge transfer such as RT-2, OpenVLA, and ECOT, to recent dual-system architectures such as Helix, OpenHelix, DexVLA, and DexGraspVLA. The mainstream simulation environments, core datasets, and benchmarks that support VLA research are surveyed. The current applications and prospects of VLA models in robotic manipulation, autonomous navigation, and industrial automation are explored. Core challenges in current VLA research are analyzed, including generalization and data efficiency, long-horizon task planning, and real-time responsiveness. Future research directions are discussed, including integration with world models and improved data efficiency.

Key words

VLA models / large models / embodied AI / robot learning / multimodal learning

Cite this Article

Tingyu YUAN, Kai LIU, Biaoliang GUAN, Wen YE, Yacui ZHAO, Chaoyang ZHAO, Jinqiao WANG. A Comprehensive Review and Future Perspectives on Embodied AI Large Models[J]. Radio Engineering, 2025, 55(11): 2256-2273. DOI: 10.3969/j.issn.1003-3106.2025.11.014

Article Info

Received: 2025-08-02
Online: 2026-04-17
Published: 2025-11-05

https://castjournals.cast.org.cn/joweb/wxdgc/EN/10.3969/j.issn.1003-3106.2025.11.014
