Agent evolution under the VLA architecture: From mechanistic construction to application expansion
Hui ZHANG1, Dongjin XIE2, Shutong LIANG1, Mingxuan LI1, Xiaofeng JIA3, *, Yonglin TIAN4, Siji MA5, Haoran LI4, Yidong LI1
Science & Technology Review | 2025, 43(20) : 48 - 61
Affiliations
  • 1School of Computer Science and Technology, Beijing Jiaotong University, Beijing 100044, China
  • 2School of Software, Xinjiang University, Urumqi 830046, China
  • 3Beijing Big Data Centre, Beijing 101117, China
  • 4Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • 5Faculty of Innovation Engineering, Macau University of Science and Technology, Macau 999078, China
Published: 2025-10-28 doi: 10.3981/j.issn.1000-7857.2025.10.00077

Embodied intelligence represents a new stage in the evolution of artificial intelligence, marking a transition from "perception-cognition" to an integrated "perception-cognition-action" paradigm. The Vision-Language-Action (VLA) model unifies visual perception, language understanding, and action generation, and thus provides a critical technological pathway for agents to operate autonomously in the real world. This paper systematically reviews the development trajectory and representative achievements of VLA technologies and summarizes their architectural paradigm, which includes multi-modal perception, semantic fusion mechanisms, reinforcement and imitation learning, world models, and hierarchical action output. Drawing on application scenarios such as autonomous driving, human-computer interaction, and industrial equipment, we further analyze the core challenges facing VLA development, including scarce data resources, limited generalization and transferability, insufficient interpretability, and growing computational demands, and we outline future development trends.

vision-language-action model / multi-modal learning / embodied intelligence / large language model
Hui ZHANG, Dongjin XIE, Shutong LIANG, Mingxuan LI, Xiaofeng JIA, Yonglin TIAN, Siji MA, Haoran LI, Yidong LI. Agent evolution under the VLA architecture: From mechanistic construction to application expansion[J]. Science & Technology Review, 2025, 43(20): 48-61. DOI: 10.3981/j.issn.1000-7857.2025.10.00077
Article Info
  • Received: 2025-09-11
  • Revised: 2025-10-18
  • Published: 2025-10-28
  • Online Date: 2025-12-29
https://castjournals.cast.org.cn/joweb/kjdb/EN/10.3981/j.issn.1000-7857.2025.10.00077