Agent evolution under the VLA architecture: From mechanistic construction to application expansion


Hui ZHANG¹, Dongjin XIE², Shutong LIANG¹, Mingxuan LI¹, Xiaofeng JIA³,*, Yonglin TIAN⁴, Siji MA⁵, Haoran LI⁴, Yidong LI¹

Science & Technology Review | 2025, 43(20): 48-61


• Exclusive •


Affiliations

¹School of Computer Science and Technology, Beijing Jiaotong University, Beijing 100044, China

²School of Software, Xinjiang University, Urumqi 830046, China

³Beijing Big Data Centre, Beijing 101117, China

⁴Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

⁵Faculty of Innovation Engineering, Macau University of Science and Technology, Macau 999078, China

Published: 2025-10-28 doi: 10.3981/j.issn.1000-7857.2025.10.00077


Abstract


Embodied intelligence represents a new stage in the evolution of artificial intelligence, marking a transition from a "perception-cognition" paradigm to an integrated "perception-cognition-action" paradigm. By unifying visual perception, language understanding, and action generation, the Vision-Language-Action (VLA) model offers a critical technological pathway toward agents that operate autonomously in the real world. This paper systematically reviews the development trajectory and representative achievements of VLA technologies and summarizes their architectural paradigm, which encompasses multi-modal perception, semantic fusion mechanisms, reinforcement and imitation learning, world models, and hierarchical action output. Drawing on application scenarios such as autonomous driving, human-computer interaction, and industrial equipment, we further analyze the core challenges facing VLA development. These include scarce data resources, limited generalization and transferability, insufficient interpretability, and growing computational demands. Finally, we outline future development trends.

Key words

vision-language-action model / multi-modal learning / embodied intelligence / large language model

Cite this Article

Hui ZHANG, Dongjin XIE, Shutong LIANG, Mingxuan LI, Xiaofeng JIA, Yonglin TIAN, Siji MA, Haoran LI, Yidong LI. Agent evolution under the VLA architecture: From mechanistic construction to application expansion[J]. Science & Technology Review, 2025, 43(20): 48-61. DOI: 10.3981/j.issn.1000-7857.2025.10.00077


Article Info

Received: 2025-09-11
Revised: 2025-10-18
Online: 2025-12-29
Published: 2025-10-28


Article URL: https://castjournals.cast.org.cn/joweb/kjdb/EN/10.3981/j.issn.1000-7857.2025.10.00077



Copyright © 2025 China Association for Science and Technology. All rights reserved. For all open access content, the relevant licensing terms apply.
Sponsored by the Office of the Leading Group for Cybersecurity and Informatization of CAST, and supported by Science and Technology Review Publishing House