Article(id=1220368076957208837, tenantId=1146029695717560320, journalId=1146119989267898375, issueId=1220368072699986922, articleNumber=null, orderNo=null, doi=10.7654/j.issn.2097-1974.20250601, pmid=null, cstr=null, oa=null, hot=null, price=null, onlineType=0, articleFormat=0, articleType=null, articleTypeStr=research-article, receivedDate=1751644800000, receivedDateStr=2025-07-05, revisedDate=1757865600000, revisedDateStr=2025-09-15, acceptedDate=null, acceptedDateStr=null, onlineDate=1768888022889, onlineDateStr=2026-01-20, pubDate=1766592000000, pubDateStr=2025-12-25, doiRegisterDate=null, doiRegisterDateStr=null, onlineIssueDate=1768888022889, onlineIssueDateStr=2026-01-20, onlineJustAcceptDate=null, onlineJustAcceptDateStr=null, onlineFirstDate=null, onlineFirstDateStr=null, sourceXml=null, magXml=null, createTime=1768888022889, creator=13701087609, updateTime=1768888022889, updator=13701087609, issue=Issue{id=1220368072699986922, tenantId=1146029695717560320, journalId=1146119989267898375, year='2025', volume='48', issue='6', pageStart='1', pageEnd='106', issueExtLink='null', onlineDate='null', pubDate='null', beforeIssueId=null, nextIssueId=null, price=null, status=1, issueComplete=1, articleOrder=1, issueType=-1, specialIssue=null, createTime=1768888021873, creator=13701087609, updateTime=1768888582821, updator=13701087609, preIssue=null, nextIssue=null, ext={EN=IssueExt(id=1220370425566118679, tenantId=1146029695717560320, journalId=1146119989267898375, issueId=1220368072699986922, language=EN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=), CN=IssueExt(id=1220370425570312984, tenantId=1146029695717560320, journalId=1146119989267898375, issueId=1220368072699986922, language=CN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=)}, issueFiles=null}, startPage=1, endPage=9, ext={EN=ArticleExt(id=1220368077271781639, articleId=1220368076957208837, tenantId=1146029695717560320, 
journalId=1146119989267898375, language=EN, title=Attitude Control of High-speed Vehicles Based on Improved TD3 Reinforcement Learning, columnId=1154057568293999177, journalTitle=Missiles and Space Vehicles, columnName=Launch Vehicle and Missile, runingTitle=null, highlight=null, articleAbstract=

To address the strong nonlinearity, high uncertainty, and rapidly time-varying parameters that high-speed vehicles face during reentry, and in line with the demands of intelligent spacecraft development, this study proposes an end-to-end intelligent attitude control method based on an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. To overcome the training instability and convergence difficulties of TD3 in attitude control learning, two improvements are introduced. First, within the Markov decision process, a hybrid reward mechanism is designed that combines continuous tracking-error penalties with sparse task-completion rewards to jointly guide the agent toward convergence. Second, prior-knowledge constraints derived from modern control theory are incorporated into training through a behavior cloning-based update strategy for the Actor network, which balances imitation of expert experience against maximization of cumulative reward. Simulation results show that the proposed method accurately tracks the three-channel attitude commands under 14 combinations of parameter deviations.

, correspAuthors=null, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=null, magXml=null, pdfUrl=null, pdf=null, pdfFileSize=null, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=null, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=null, mapNumber=null, authorCompany=null, fund=null, authors=null, authorsList=Weili WANG, Wanwei HUANG, Xiaodong LIU, Kunfeng LU, Chenhui JIA), CN=ArticleExt(id=1220368079935164706, articleId=1220368076957208837, tenantId=1146029695717560320, journalId=1146119989267898375, language=CN, title=基于改进型TD3强化学习的高速飞行器姿态控制, columnId=1154057568814092874, journalTitle=导弹与航天运载技术(中英文), columnName=运载器及导弹总体技术, runingTitle=null, highlight=null, articleAbstract=

针对高速飞行器再入段面临的强非线性、高不确定性以及参数快时变等挑战,结合航天器智能化发展需求,提出了一种改进型的双延迟深度确定性策略梯度(Twin Delayed Deep Deterministic Policy Gradient,TD3)端到端智能姿态控制方法。为解决TD3算法在姿态控制学习过程中存在训练不稳定、收敛困难的问题,在其马尔可夫决策过程中,设计了混合奖励机制,融合连续跟踪误差惩罚和稀疏任务完成奖励,协同引导智能体收敛;在其训练过程中,引入基于现代控制理论的先验知识约束,提出了基于行为克隆的Actor网络优化更新策略,以平衡专家经验模仿与累计回报最大化目标。仿真结果表明,在14种参数偏差组合的工况下,所提方法能够精确跟踪三通道姿态指令。

, correspAuthors=null, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=zoeMds9kydM2qRBy+JBlJg==, magXml=8NJkGaWluWSOikQRUTjrug==, pdfUrl=null, pdf=csrVoAizV5M55Qiwx/B8Qw==, pdfFileSize=2452360, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=Z5yNutYvEhj9ORCqs04PPA==, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=yAguE0gbogfC37Kvv+T9rQ==, mapNumber=null, authorCompany=null, fund=null, authors=

王伟丽(1997—),女,博士研究生,主要研究方向为飞行器智能控制、自适应控制等。

黄万伟(1970—),男,博士,研究员,主要研究方向为飞行器制导与控制、智能控制、自适应控制等。

刘晓东(1987—),男,博士,研究员,主要研究方向为飞行器制导与控制、智能控制、自适应控制等。

路坤锋(1983—),男,博士,研究员,主要研究方向为飞行器制导与控制、智能控制、自适应控制等。

贾晨辉(1985—),男,博士,高级工程师,主要研究方向为飞行器制导与控制、智能控制等。

, authorsList=王伟丽, 黄万伟, 刘晓东, 路坤锋, 贾晨辉)}, authors=[Author(id=1220376972228739344, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, orderNo=0, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1220376972300042514, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, authorId=1220376972228739344, language=EN, stringName=Weili WANG, firstName=Weili, middleName=null, lastName=WANG, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1220376972354568467, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, authorId=1220376972228739344, language=CN, stringName=王伟丽, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854, bio={"content":"

王伟丽(1997—),女,博士研究生,主要研究方向为飞行器智能控制、自适应控制等。

"}, bioImg=null, bioContent=

王伟丽(1997—),女,博士研究生,主要研究方向为飞行器智能控制、自适应控制等。

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1220376972165824780, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, xref=null, ext=[AuthorCompanyExt(id=1220376972174213389, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854), AuthorCompanyExt(id=1220376972178407694, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854)])]), Author(id=1220376972417483029, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, orderNo=1, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1220376972476203287, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, authorId=1220376972417483029, language=EN, stringName=Wanwei HUANG, firstName=Wanwei, middleName=null, lastName=HUANG, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1220376972530729240, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, 
authorId=1220376972417483029, language=CN, stringName=黄万伟, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854, bio={"content":"

黄万伟(1970—),男,博士,研究员,主要研究方向为飞行器制导与控制、智能控制、自适应控制等。

"}, bioImg=null, bioContent=

黄万伟(1970—),男,博士,研究员,主要研究方向为飞行器制导与控制、智能控制、自适应控制等。

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1220376972165824780, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, xref=null, ext=[AuthorCompanyExt(id=1220376972174213389, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854), AuthorCompanyExt(id=1220376972178407694, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854)])]), Author(id=1220376972597838106, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, orderNo=2, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1220376972664946972, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, authorId=1220376972597838106, language=EN, stringName=Xiaodong LIU, firstName=Xiaodong, middleName=null, lastName=LIU, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1220376972732055837, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, 
authorId=1220376972597838106, language=CN, stringName=刘晓东, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854, bio={"content":"

刘晓东(1987—),男,博士,研究员,主要研究方向为飞行器制导与控制、智能控制、自适应控制等。

"}, bioImg=null, bioContent=

刘晓东(1987—),男,博士,研究员,主要研究方向为飞行器制导与控制、智能控制、自适应控制等。

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1220376972165824780, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, xref=null, ext=[AuthorCompanyExt(id=1220376972174213389, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854), AuthorCompanyExt(id=1220376972178407694, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854)])]), Author(id=1220376972786581791, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, orderNo=3, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1220376972862079265, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, authorId=1220376972786581791, language=EN, stringName=Kunfeng LU, firstName=Kunfeng, middleName=null, lastName=LU, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1220376972924993826, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, 
authorId=1220376972786581791, language=CN, stringName=路坤锋, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854, bio={"content":"

路坤锋(1983—),男,博士,研究员,主要研究方向为飞行器制导与控制、智能控制、自适应控制等。

"}, bioImg=null, bioContent=

路坤锋(1983—),男,博士,研究员,主要研究方向为飞行器制导与控制、智能控制、自适应控制等。

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1220376972165824780, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, xref=null, ext=[AuthorCompanyExt(id=1220376972174213389, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854), AuthorCompanyExt(id=1220376972178407694, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854)])]), Author(id=1220376974195867940, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, orderNo=4, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1220376974317502758, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, authorId=1220376974195867940, language=EN, stringName=Chenhui JIA, firstName=Chenhui, middleName=null, lastName=JIA, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1220376974472692007, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, 
authorId=1220376974195867940, language=CN, stringName=贾晨辉, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854, bio={"content":"

贾晨辉(1985—),男,博士,高级工程师,主要研究方向为飞行器制导与控制、智能控制等。

"}, bioImg=null, bioContent=

贾晨辉(1985—),男,博士,高级工程师,主要研究方向为飞行器制导与控制、智能控制等。

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1220376972165824780, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, xref=null, ext=[AuthorCompanyExt(id=1220376972174213389, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854), AuthorCompanyExt(id=1220376972178407694, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854)])])], keywords=[Keyword(id=1220376974623686952, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, orderNo=1, keyword=high-speed vehicles), Keyword(id=1220376974699184425, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, orderNo=2, keyword=attitude control), Keyword(id=1220376974757904682, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, orderNo=3, keyword=deep reinforcement learning), Keyword(id=1220376974833402155, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, orderNo=4, keyword=behavior cloning), Keyword(id=1220376974917288236, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, orderNo=5, keyword=strongly adaptive control), Keyword(id=1220376974976008493, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, orderNo=1, 
keyword=高速飞行器), Keyword(id=1220376975038923054, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, orderNo=2, keyword=姿态控制), Keyword(id=1220376975114420527, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, orderNo=3, keyword=深度强化学习), Keyword(id=1220376975210889520, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, orderNo=4, keyword=行为克隆), Keyword(id=1220376975269609777, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, orderNo=5, keyword=强适应控制)], refs=[Reference(id=1220376977333207373, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2023, volume=6, issue=3, pageStart=39, pageEnd=51, url=null, language=null, rfNumber=[1], rfOrder=0, authorNames=刘双喜, 刘世俊, 李勇, journalName=空天防御, refType=null, unstructuredReference=刘双喜, 刘世俊, 李勇, 等. 国外高超声速飞行器及防御体系发展现状[J]. 空天防御, 2023, 6(3): 39-51., articleTitle=国外高超声速飞行器及防御体系发展现状, refAbstract=null), Reference(id=1220376977391927630, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2023, volume=6, issue=3, pageStart=39, pageEnd=51, url=null, language=null, rfNumber=[1], rfOrder=1, authorNames=LIU Shuangxi, LIU Shijun, LI Yong, journalName=Air and Space Defense, refType=null, unstructuredReference=LIU Shuangxi, LIU Shijun, LI Yong, et al. Current developments in foreign hypersonic vehicles and defense systems[J]. 
Air and Space Defense, 2023, 6(3): 39-51., articleTitle=null, refAbstract=null), Reference(id=1220376978679578959, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2024, volume=null, issue=1, pageStart=64, pageEnd=70, url=null, language=null, rfNumber=[2], rfOrder=2, authorNames=樊轶, 秦昌茂, 董添, journalName=导弹与航天运载技术(中英文), refType=null, unstructuredReference=樊轶, 秦昌茂, 董添, 等. 基于MIMO-ESO的高速飞行器自抗扰控制[J].导弹与航天运载技术(中英文), 2024(1): 64-70., articleTitle=基于MIMO-ESO的高速飞行器自抗扰控制, refAbstract=null), Reference(id=1220376978767659344, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2024, volume=null, issue=1, pageStart=64, pageEnd=70, url=null, language=null, rfNumber=[2], rfOrder=3, authorNames=FAN Yi, QIN Changmao, DONG Tian, journalName=Missiles and Space Vehicles, refType=null, unstructuredReference=FAN Yi, QIN Changmao, DONG Tian, et al. ADRC attitude controller design for hypersonic vehicle based on MIMO-ESO[J]. Missiles and Space Vehicles, 2024(1): 64-70., articleTitle=null, refAbstract=null), Reference(id=1220376978834768209, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2021, volume=42, issue=11, pageStart=8, pageEnd=17, url=null, language=null, rfNumber=[3], rfOrder=4, authorNames=包为民, journalName=航空学报, refType=null, unstructuredReference=包为民. 航天智能控制技术让运载火箭“会学习”[J]. 航空学报, 2021, 42(11): 8-17., articleTitle=航天智能控制技术让运载火箭“会学习”, refAbstract=null), Reference(id=1220376978910265682, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2021, volume=42, issue=11, pageStart=8, pageEnd=17, url=null, language=null, rfNumber=[3], rfOrder=5, authorNames=BAO Weimin, journalName=Acta Aeronautica et Astronautica Sinica, refType=null, unstructuredReference=BAO Weimin. 
Space intelligent control technology enables launch vehicle to “self-learning”[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(11): 8-17., articleTitle=null, refAbstract=null), Reference(id=1220376978977374547, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2022, volume=9, issue=9, pageStart=1561, pageEnd=1573, url=null, language=null, rfNumber=[4], rfOrder=6, authorNames=ZHANG Z Y, MO Z B, CHEN Y T, journalName=IEEE-CAA Journal of Automatica Sinica, refType=null, unstructuredReference=ZHANG Z Y, MO Z B, CHEN Y T, et al. Reinforcement learning behavioral control for nonlinear autonomous system[J]. IEEE-CAA Journal of Automatica Sinica, 2022, 9(9): 1561-1573., articleTitle=Reinforcement learning behavioral control for nonlinear autonomous system, refAbstract=null), Reference(id=1220376979044483412, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2025, volume=null, issue=22, pageStart=8187, pageEnd=8199, url=null, language=null, rfNumber=[5], rfOrder=7, authorNames=LUO B, SUN J Y, TANG R, journalName=IEEE Transactions on Automation Science and Engineering, refType=null, unstructuredReference=LUO B, SUN J Y, TANG R, et al. Reinforcement learning-based 3D trajectory tracking control of hypersonic gliding vehicles with time-varying uncertainties[J]. 
IEEE Transactions on Automation Science and Engineering, 2025(22): 8187-8199., articleTitle=Reinforcement learning-based 3D trajectory tracking control of hypersonic gliding vehicles with time-varying uncertainties, refAbstract=null), Reference(id=1220376979107397973, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2020, volume=null, issue=96, pageStart=105537, pageEnd=null, url=null, language=null, rfNumber=[6], rfOrder=8, authorNames=LIU C, DONG C Y, ZHOU Z J, journalName=Aerospace Science and Technology, refType=null, unstructuredReference=LIU C, DONG C Y, ZHOU Z J, et al. Barrier Lyapunov function based reinforcement learning control for air-breathing hypersonic vehicle with variable geometry inlet[J]. Aerospace Science and Technology, 2020(96): 105537., articleTitle=Barrier Lyapunov function based reinforcement learning control for air-breathing hypersonic vehicle with variable geometry inlet, refAbstract=null), Reference(id=1220376979174506838, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2025, volume=8, issue=2, pageStart=201, pageEnd=209, url=null, language=null, rfNumber=[7], rfOrder=9, authorNames=LU K F, WANG W L, LIU X D, journalName=Advances in Astronautics, refType=null, unstructuredReference=LU K F, WANG W L, LIU X D, et al. Research progress and prospect of high-speed vehicle control technology based on reinforcement learning[J]. 
Advances in Astronautics, 2025, 8(2): 201-209., articleTitle=Research progress and prospect of high-speed vehicle control technology based on reinforcement learning, refAbstract=null), Reference(id=1220376979233227095, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=198, pageStart=606, pageEnd=616, url=null, language=null, rfNumber=[8], rfOrder=10, authorNames=WANG G, AN H, WANG Y, journalName=Acta Astronautica, refType=null, unstructuredReference=WANG G, AN H, WANG Y, et al. Intelligent control of air-breathing hypersonic vehicles subject to path and angle-of-attack constraints[J]. Acta Astronautica, 2022(198): 606-616., articleTitle=Intelligent control of air-breathing hypersonic vehicles subject to path and angle-of-attack constraints, refAbstract=null), Reference(id=1220376979317113176, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2024, volume=31, issue=4, pageStart=577, pageEnd=582, url=null, language=null, rfNumber=[9], rfOrder=11, authorNames=GAO Q, LI X, JI Y, journalName=Control Engineering of China, refType=null, unstructuredReference=GAO Q, LI X, JI Y, et al. Research on active disturbance rejection control of hypersonic vehicle based on Q-learning[J]. Control Engineering of China, 2024, 31(4): 577-582., articleTitle=Research on active disturbance rejection control of hypersonic vehicle based on Q-learning, refAbstract=null), Reference(id=1220376979380027737, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2024, volume=36, issue=8, pageStart=4071, pageEnd=4081, url=null, language=null, rfNumber=[10], rfOrder=12, authorNames=LI X, JI Y H, SONG Y, journalName=Neural Computing and Applications, refType=null, unstructuredReference=LI X, JI Y H, SONG Y, et al. 
Modified deep deterministic policy gradient based on active disturbance rejection control for hypersonic vehicles[J]. Neural Computing and Applications, 2024, 36(8): 4071-4081., articleTitle=Modified deep deterministic policy gradient based on active disturbance rejection control for hypersonic vehicles, refAbstract=null), Reference(id=1220376979451330906, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2024, volume=45, issue=7, pageStart=1100, pageEnd=1110, url=null, language=null, rfNumber=[11], rfOrder=13, authorNames=路坤锋, 贾晨辉, 黄旭, journalName=宇航学报, refType=null, unstructuredReference=路坤锋, 贾晨辉, 黄旭, 等. 面向变构型飞行器的强化学习位置姿态一体化控制方法[J]. 宇航学报, 2024, 45(7): 1100-1110., articleTitle=面向变构型飞行器的强化学习位置姿态一体化控制方法, refAbstract=null), Reference(id=1220376979514245467, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2024, volume=45, issue=7, pageStart=1100, pageEnd=1110, url=null, language=null, rfNumber=[11], rfOrder=14, authorNames=LU Kunfeng, JIA Chenhui, HUANG Xu, journalName=Journal of Astronautics, refType=null, unstructuredReference=LU Kunfeng, JIA Chenhui, HUANG Xu, et al. Reinforcement learning-based integrated position and attitude control method towards morphing flight vehicles[J]. Journal of Astronautics, 2024, 45(7): 1100-1110., articleTitle=null, refAbstract=null), Reference(id=1220376979589742940, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2025, volume=46, issue=8, pageStart=298, pageEnd=313, url=null, language=null, rfNumber=[12], rfOrder=15, authorNames=姜凌峰, 李新凯, 张海, journalName=航空学报, refType=null, unstructuredReference=姜凌峰, 李新凯, 张海, 等. 基于改进TD3算法的无人机动态环境无地图导航[J]. 
航空学报, 2025, 46(8): 298-313., articleTitle=基于改进TD3算法的无人机动态环境无地图导航, refAbstract=null), Reference(id=1220376979652657501, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2025, volume=46, issue=8, pageStart=298, pageEnd=313, url=null, language=null, rfNumber=[12], rfOrder=16, authorNames=JIANG Lingfeng, LI Xinkai, ZHANG Hai, journalName=Acta aeronautica et Astronautica Sinica, refType=null, unstructuredReference=JIANG Lingfeng, LI Xinkai, ZHANG Hai, et al. Mapless navigation of UAVs in dynamic environments based on an improved TD3 algorithm[J]. Acta aeronautica et Astronautica Sinica, 2025, 46(8): 298-313., articleTitle=null, refAbstract=null), Reference(id=1220376979707183454, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2025, volume=34, issue=2, pageStart=61, pageEnd=73, url=null, language=null, rfNumber=[13], rfOrder=17, authorNames=彭博, 王晓波, 魏祥麟, journalName=计算机系统应用, refType=null, unstructuredReference=彭博, 王晓波, 魏祥麟, 等. 基于SPER-TD3的无人机编队三维航迹规划[J]. 计算机系统应用, 2025, 34(2): 61-73., articleTitle=基于SPER-TD3的无人机编队三维航迹规划, refAbstract=null), Reference(id=1220376979765903711, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2025, volume=34, issue=2, pageStart=61, pageEnd=73, url=null, language=null, rfNumber=[13], rfOrder=18, authorNames=PENG Bo, WANG Xiaobo, WEI Xianglin, journalName=Computer Systems & Applications, refType=null, unstructuredReference=PENG Bo, WANG Xiaobo, WEI Xianglin, et al. 3D trajectory planning for unmanned aerial vehicle formation based on SPER-TD3[J]. 
Computer Systems & Applications, 2025, 34(2): 61-73., articleTitle=null, refAbstract=null), Reference(id=1220376979824623968, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2025, volume=42, issue=4, pageStart=1003, pageEnd=1010, url=null, language=null, rfNumber=[14], rfOrder=19, authorNames=闫雷鸣, 刘健, 朱永昕, journalName=计算机应用研究, refType=null, unstructuredReference=闫雷鸣, 刘健, 朱永昕. DPC-DQRL: 动态行为克隆约束的离线-在线双Q值强化学习[J]. 计算机应用研究, 2025, 42(4): 1003-1010., articleTitle=DPC-DQRL: 动态行为克隆约束的离线-在线双Q值强化学习, refAbstract=null), Reference(id=1220376979887538529, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2025, volume=42, issue=4, pageStart=1003, pageEnd=1010, url=null, language=null, rfNumber=[14], rfOrder=20, authorNames=YAN Leiming, LIU Jian, ZHU Yongxin, journalName=Application Research of Computers, refType=null, unstructuredReference=YAN Leiming, LIU Jian, ZHU Yongxin. DPC-DQRL: offline to online double Q value reinforcement learning with dynamic behavior cloning constraints[J]. Application Research of Computers, 2025, 42(4): 1003-1010., articleTitle=null, refAbstract=null), Reference(id=1220376979954647394, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2015, volume=36, issue=8, pageStart=916, pageEnd=922, url=null, language=null, rfNumber=[15], rfOrder=21, authorNames=刘晓东, 黄万伟, 禹春梅, journalName=宇航学报, refType=null, unstructuredReference=刘晓东, 黄万伟, 禹春梅. 含扩张状态观测器的高超声速飞行器动态面姿态控制[J]. 
宇航学报, 2015, 36(8): 916-922., articleTitle=含扩张状态观测器的高超声速飞行器动态面姿态控制, refAbstract=null), Reference(id=1220376980021756259, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2015, volume=36, issue=8, pageStart=916, pageEnd=922, url=null, language=null, rfNumber=[15], rfOrder=22, authorNames=LIU Xiaodong, HUANG Wanwei, YU Chunmei, journalName=Journal of Astronautics, refType=null, unstructuredReference=LIU Xiaodong, HUANG Wanwei, YU Chunmei. Dynamic surface attitude control for hypersonic vehicle containing extended state observer[J]. Journal of Astronautics, 2015, 36(8): 916-922., articleTitle=null, refAbstract=null), Reference(id=1220376980088865124, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2023, volume=44, issue=8, pageStart=1113, pageEnd=1126, url=null, language=null, rfNumber=[16], rfOrder=23, authorNames=黄旭, 柳嘉润, 张远, journalName=宇航学报, refType=null, unstructuredReference=黄旭, 柳嘉润, 张远, 等. 知识与数据混合驱动的高速飞行控制方法综述[J]. 宇航学报, 2023, 44(8): 1113-1126., articleTitle=知识与数据混合驱动的高速飞行控制方法综述, refAbstract=null), Reference(id=1220376980155973989, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2023, volume=44, issue=8, pageStart=1113, pageEnd=1126, url=null, language=null, rfNumber=[16], rfOrder=24, authorNames=HUANG Xu, LIU Jiarun, ZHANG Yuan, journalName=Journal of Astronautics, refType=null, unstructuredReference=HUANG Xu, LIU Jiarun, ZHANG Yuan, et al. Review on knowledge-based and data-driver cooperating control methods of high-speed vehicle[J]. 
Journal of Astronautics, 2023, 44(8): 1113-1126., articleTitle=null, refAbstract=null), Reference(id=1220376980214694246, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, doi=null, pmid=null, pmcid=null, year=2017, volume=null, issue=66, pageStart=164, pageEnd=175, url=null, language=null, rfNumber=[17], rfOrder=25, authorNames=LIU X, HUANG W, DU L, journalName=ISA Transactions, refType=null, unstructuredReference=LIU X, HUANG W, DU L. An integrated guidance and control approach in three-dimensional space for hypersonic missile constrained by impact angles[J]. ISA Transactions, 2017(66): 164-175., articleTitle=An integrated guidance and control approach in three-dimensional space for hypersonic missile constrained by impact angles, refAbstract=null)], funds=[Fund(id=1220376977190601036, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, awardId=U24B20159, language=CN, fundingSource=国家自然科学基金(U24B20159), fundOrder=null, country=null)], companyList=[AuthorCompany(id=1220376972165824780, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, xref=null, ext=[AuthorCompanyExt(id=1220376972174213389, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing AerospaceAutomatic Control Institute, Beijing, 100854), AuthorCompanyExt(id=1220376972178407694, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, companyId=1220376972165824780, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=北京航天自动控制研究所,宇航智能控制技术全国重点实验室,北京,100854)])], figs=[ArticleFig(id=1220376975433187634, 
tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Fig.1, caption=Typical unpowered high-speed vehicle, figureFileSmall=rb3v+lObYXMR8Yo4pmSgmQ==, figureFileBig=VekLGfCfNZqNaGndzZAoaQ==, tableContent=null), ArticleFig(id=1220376975508685107, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=图1, caption=典型的无动力高速飞行器, figureFileSmall=rb3v+lObYXMR8Yo4pmSgmQ==, figureFileBig=VekLGfCfNZqNaGndzZAoaQ==, tableContent=null), ArticleFig(id=1220376975592571188, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Fig.2, caption=Online deployment of neural network model, figureFileSmall=P0ZyBkwZs9seTISyF0fKIA==, figureFileBig=2/KIidyeOQp5DpHvDMB14w==, tableContent=null), ArticleFig(id=1220376975663874357, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=图2, caption=神经网络模型的在线部署, figureFileSmall=P0ZyBkwZs9seTISyF0fKIA==, figureFileBig=2/KIidyeOQp5DpHvDMB14w==, tableContent=null), ArticleFig(id=1220376975726788918, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Fig.3, caption=Typical parameters of standard trajectories, figureFileSmall=geQ6cFh08dKvMW/Tp5IxLQ==, figureFileBig=uwOyyywQomyFnnjMmcU0EQ==, tableContent=null), ArticleFig(id=1220376975781314871, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=图3, caption=标准轨迹的典型参数, figureFileSmall=geQ6cFh08dKvMW/Tp5IxLQ==, figureFileBig=uwOyyywQomyFnnjMmcU0EQ==, tableContent=null), ArticleFig(id=1220376975840035128, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Fig.4, caption=Reward progression during training, figureFileSmall=QWRn34i3EGroLjq9m8WXGw==, figureFileBig=gIAEhwYna4DZ5Mo34oPIcA==, tableContent=null), 
ArticleFig(id=1220376975894561081, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=图4, caption=训练过程中的奖励值, figureFileSmall=QWRn34i3EGroLjq9m8WXGw==, figureFileBig=gIAEhwYna4DZ5Mo34oPIcA==, tableContent=null), ArticleFig(id=1220376975944892730, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Fig.5, caption=Attitude angle tracking effect, figureFileSmall=/dTNg7/rII8/hwnPvhG4CQ==, figureFileBig=WlkYokS99cEXAdVAzcjQAw==, tableContent=null), ArticleFig(id=1220376976024584507, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=图5, caption=姿态角跟踪效果, figureFileSmall=/dTNg7/rII8/hwnPvhG4CQ==, figureFileBig=WlkYokS99cEXAdVAzcjQAw==, tableContent=null), ArticleFig(id=1220376976074916156, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Fig.6, caption=Attitude angle tracking error, figureFileSmall=D4H94agU1Cq21w2pVMunRQ==, figureFileBig=o8qimmT9Vg60eqGmWLsp6g==, tableContent=null), ArticleFig(id=1220376976146219325, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=图6, caption=姿态角跟踪误差, figureFileSmall=D4H94agU1Cq21w2pVMunRQ==, figureFileBig=o8qimmT9Vg60eqGmWLsp6g==, tableContent=null), ArticleFig(id=1220376976225911102, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Fig.7, caption=Three-axis rudder deflection of the KG-TD3 method, figureFileSmall=JHkOozLrZIqjsyrk1oj9Ew==, figureFileBig=SnCP8uWitNY26z+fUcJmgA==, tableContent=null), ArticleFig(id=1220376976276242751, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=图7, caption=基于KG-TD3方法的三轴舵偏, figureFileSmall=JHkOozLrZIqjsyrk1oj9Ew==, figureFileBig=SnCP8uWitNY26z+fUcJmgA==, tableContent=null), 
ArticleFig(id=1220376976339157312, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=null, caption=null, figureFileSmall=null, figureFileBig=null, tableContent=
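1: Randomly initialize the policy network $\mu(s;\theta)$ and the value networks $q_1(s,a;\omega_1)$, $q_2(s,a;\omega_2)$
2: Initialize the target policy network parameters $\theta^- \leftarrow \theta$ and the target value networks $\omega_1^- \leftarrow \omega_1$, $\omega_2^- \leftarrow \omega_2$
3: Initialize the experience replay buffer $R$
4: for episode = 1 to $M$ do
5:   Initialize the noise $\varepsilon$ used for exploration
6:   The policy network receives the initial observed state $s$
7:   for $t = 1$ to $T$ do
8:     Select the action from the current policy and the exploration noise: $a_t = \mu(s_t;\theta) + \varepsilon$
9:     Execute $a_t$, obtain the reward $r_t$, and observe the new state $s_{t+1}$
10:    Experience storage: store $(s_t, a_t, r_t, s_{t+1}, a_{t\_k})$ in $R$
11:    Experience replay: randomly draw $N$ tuples $(s_j, a_j, r_j, s_{j+1}, a_{j\_k})$ from $R$
12:    $\hat{a}_{j+1}^- = \mu(s_{j+1};\theta_o^-) + \xi$
13:    Predictions of the two target value networks: $\hat{q}_{i,j+1}^- = q(s_{j+1}, \hat{a}_{j+1}^-;\omega_{i,o}^-)$, $i = 1, 2$
14:    $\hat{y}_j = r_j + \gamma \min(\hat{q}_{1,j+1}^-, \hat{q}_{2,j+1}^-)$
15:    Predictions of the two value networks: $\hat{q}_{i,j} = q(s_j, a_j;\omega_{i,o})$
16:    Minimize the loss $L(\omega_i) = \frac{1}{N}\sum_j \left[(\hat{q}_{1,j} - \hat{y}_j)^2 + (\hat{q}_{2,j} - \hat{y}_j)^2\right]$ and update the value networks $\omega_{i,e} \leftarrow \omega_{i,o}$
17:    if $t \bmod k$ then
18:      $\hat{a}_j = \mu(s_j;\theta_o)$
19:      Minimize the loss $L(\theta)$ and update the policy network $\theta_e \leftarrow \theta_o$, where
         $L(\theta) = -\mathbb{E}_{s_j\sim\mathcal{D}}\left[q(s_j, \hat{a}_j;\omega_{1,o})\right] + \lambda\,\mathbb{E}_{s_j\sim\mathcal{D}}\left[\lVert \hat{a}_j - a_{j\_k} \rVert^2\right]$
20:      Soft-update the target policy network: $\theta_e^- \leftarrow \tau\theta_o + (1-\tau)\theta_o^-$
21:      Soft-update the target value networks: $\omega_{i,e}^- \leftarrow \tau\omega_{i,o} + (1-\tau)\omega_{i,o}^-$
22:    end if
23:   end for
24: end for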
), ArticleFig(id=1220376976439820609, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=, caption=, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1220376976523706690, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Tab.1, caption=

Aircraft body parameters

, figureFileSmall=null, figureFileBig=null, tableContent=
Parameter | Value
m/kg | 1 000
l/m | 0.7
Jy/(kg·m²) | 3 000
S/m² | 0.45
Jx/(kg·m²) | 200
Jz/(kg·m²) | 2 800
), ArticleFig(id=1220376976590815555, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=表1, caption=

飞行器机体参数

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1220376976645341508, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Tab.2, caption=

Network structure of the KG-TD3 algorithm

, figureFileSmall=null, figureFileBig=null, tableContent=
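Network | Layer type | Number of neurons | Activation function
Actor | Input layer | 6 | None
Actor | Fully connected layer | 128 | ReLU
Actor | Output layer | 3 | Tanh
Critic | Input layer | 9 | None
Critic | Fully connected layer | 512 | ReLU
Critic | Fully connected layer | 256 | ReLU
Critic | Output layer | 1 | Linear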
), ArticleFig(id=1220376976725033285, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=表2, caption=

KG-TD3算法的网络结构

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1220376976775364934, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Tab.3, caption=

KG-TD3 algorithm training process hyper-parameters

, figureFileSmall=null, figureFileBig=null, tableContent=
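Hyperparameter | Value
Exploration noise $\varepsilon$ | $\mathcal{N}(0, 0.1^2)$
Target policy smoothing noise $\xi$ | $\mathcal{CN}(0, 0.2^2, -0.5, -0.5)$
Delayed policy update $k$ | 2
Target network update rate $\tau$ | 0.005
Batch size | 128
Actor learning rate | $1\times10^{-4}$
Critic learning rate | $1\times10^{-4}$
Behavior cloning weight $\lambda$ | 0.1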
), ArticleFig(id=1220376976834085191, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=表3, caption=

KG-TD3算法训练过程超参数

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1220376976892805448, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Tab.4, caption=

Margins of error for parameters

, figureFileSmall=null, figureFileBig=null, tableContent=
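Parameter | Deviation range
$\Delta\alpha_0$/(°) | 1
$\Delta\gamma_{v0}$/(°) | 3
$\Delta S$ | 10%
$\Delta\rho$ | 20%
$\Delta J_y$ | 15%
$\Delta C_L$ | 20%
$\Delta C_{mz}^{\alpha}$ | 20%
$\Delta\beta_0$/(°) | 1
$\Delta m$ | 10%
$\Delta l$ | 10%
$\Delta J_x$ | 15%
$\Delta J_z$ | 15%
$\Delta C_Z$ | 20%
$\Delta C_{mz}^{\beta}$ | 20%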
), ArticleFig(id=1220376976951525705, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=表4, caption=

参数偏差范围

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1220376977010245962, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=EN, label=Tab.5, caption=

Comparison of attitude angle tracking errors

, figureFileSmall=null, figureFileBig=null, tableContent=
Maximum error | TD3 | KG-TD3
MAX_α/(°) | 2.403 | 0.420
MAX_β/(°) | 0.836 | 0.528
MAX_γv/(°) | 34.563 | 0.725
), ArticleFig(id=1220376977073160523, tenantId=1146029695717560320, journalId=1146119989267898375, articleId=1220368076957208837, language=CN, label=表5, caption=

姿态角跟踪误差对比

, figureFileSmall=null, figureFileBig=null, tableContent=
)], attaches=null, journal=Journal(id=1146118917132496903, delFlag=0, nameCn=导弹与航天运载技术(中英文), nameEn=Missiles and Space Vehicles, nameHistory1=null, nameHistory2=null, issn=2097-1974, eissn=, cn=10-1807/V, coden=null, periodic=1, language=CN, oaType=否, ccby=null, superviseOffice=null, ownerOffice=null, pubOffice=null, editorOffice=null, officeType=null, aims=null, clcCode=null, officeProv=null, officeCity=null, officeAddr=null, officeZip=null, officeEmail=null, officePhone=null, editDirector=null, officeDirector=null, officeDirectorPhone=null, officeStaffNum=null, officeEmpNum=null, coverPicUrl=DdfacUcWE1ibGHaCsbhL8w==, journalPrice=null, startedYear=null, abbrevIsoEn=Miss Space Veh, journalRemark=null, publicationField=null, createdTime=null, updatedTime=1753780023753, createdBy=null, updatedBy=13701087609, firstLetterCn=M, firstLetterEn=M, subjectCode=Engineering, subjectName=工程, subjectCodeEn=Engineering, subjectNameEn=null, picCn=DdfacUcWE1ibGHaCsbhL8w==, picEn=jBog3LY3UHfAAYfYEqIBHg==, jcr=null, cjcr=null, exts=[JournalExt(id=1157000535829205111, language=CN, name=导弹与航天运载技术(中英文), nameHistory1=null, nameHistory2=null, managedBy=, sponsoredBy=, publishedBy=, editorOffice=, officeProv=null, officeCity=null, officeAddr=, officeZip=, editDirector=null, officeDirector=null, officePhone=null, coverPicUrl=null, journalRemark=, submitArticleUrl=null, websiteUrl=, createdTime=1753780023772, updatedTime=1753780023772, createdBy=13701087609, updatedBy=13701087609, submissionGuidelinesUrl=null, submissionAuthorUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-ddht-author&redirect_uri=https%3A%2F%2Fddht.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=fefa73ea-30bb-4b68, 
submissionEditorUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-ddht-author&redirect_uri=https%3A%2F%2Fddht.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=fefa73ea-30bb-4b68, submissionReviewUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-ddht-author&redirect_uri=https%3A%2F%2Fddht.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=fefa73ea-30bb-4b68, submissionCeEditorUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-ddht-author&redirect_uri=https%3A%2F%2Fddht.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=fefa73ea-30bb-4b68, submissionAeEditorUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-ddht-author&redirect_uri=https%3A%2F%2Fddht.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=fefa73ea-30bb-4b68, option={"copyright":""}), JournalExt(id=1157000535871148152, language=EN, name=Missiles and Space Vehicles, nameHistory1=null, nameHistory2=null, managedBy=, sponsoredBy=, publishedBy=, editorOffice=, officeProv=null, officeCity=null, officeAddr=, officeZip=, editDirector=null, officeDirector=null, officePhone=null, coverPicUrl=null, journalRemark=, submitArticleUrl=null, websiteUrl=, createdTime=1753780023782, updatedTime=1753780023782, createdBy=13701087609, updatedBy=13701087609, submissionGuidelinesUrl=null, submissionAuthorUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-ddht-author&redirect_uri=https%3A%2F%2Fddht.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=fefa73ea-30bb-4b68, submissionEditorUrl=, submissionReviewUrl=, submissionCeEditorUrl=, submissionAeEditorUrl=, 
option={"copyright":""})], databaseList=null, tenantJournalId=1146119989267898375, websiteList=[Website(id=1148243202378817956, webName=null, webTitle=null, webDomain=null, webCopyrigh=null, webIpcNo=null, seoTitle=null, seoKeywords=null, seoDescription=null, tenantJournalId=null, journalId=1146119989267898375, journalNameCn=null, journalNameEn=null, grayFlag=null, tenantId=1146029695717560320, platformId=null, journalGroupId=null, journalGroupNameCn=null, journalGroupNameEn=null, type=1, domain=https://castjournals.cast.org.cn/joweb/ddyht/CN, language=CN, createTime=1751692112774, createBy=18614031015, updateTime=1753519037132, updateBy=18614031015, name=导弹与航天运载技术-中文站点, tplId=1146099689490845704, title=导弹与航天运载技术(中英文), delFlag=0, indexPage=/home, props=[WebsiteProps(id=1148620442723704855, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1148243202378817956, code=articleTextType, value=kx, createTime=1751782053881, updateTime=1751782053881, creator=18614031015, updator=18614031015), WebsiteProps(id=1148620442694344724, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1148243202378817956, code=banner, value=null, createTime=1751782053874, updateTime=1751782053874, creator=18614031015, updator=18614031015), WebsiteProps(id=1148620442677567507, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1148243202378817956, code=logo, value=https://castjournals.cast.org.cn/joweb/kjdb/CN/file/pic?fileId=gfUyzanfTuxQ2yc+L/MeWA==, createTime=1751782053870, updateTime=1751782053870, creator=18614031015, updator=18614031015), WebsiteProps(id=1148620442711121942, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1148243202378817956, code=picServerUrl, value=https://castjournals.cast.org.cn/joweb/kjdb/CN/file/pic, createTime=1751782053878, updateTime=1751782053878, creator=18614031015, updator=18614031015), WebsiteProps(id=1148620442706927637, tenantId=1146029695717560320, journalId=null, 
journalGroupId=null, siteId=1148243202378817956, code=staticResourcePath, value=https://castjournals.cast.org.cn/joweb/cast_kjdb_cn_619/, createTime=1751782053877, updateTime=1751782053877, creator=18614031015, updator=18614031015)]), Website(id=1155906039850618895, webName=null, webTitle=null, webDomain=null, webCopyrigh=null, webIpcNo=null, seoTitle=null, seoKeywords=null, seoDescription=null, tenantJournalId=null, journalId=1146119989267898375, journalNameCn=null, journalNameEn=null, grayFlag=null, tenantId=1146029695717560320, platformId=null, journalGroupId=null, journalGroupNameCn=null, journalGroupNameEn=null, type=1, domain=https://castjournals.cast.org.cn/joweb/ddyht/EN, language=EN, createTime=1753519075604, createBy=18614031015, updateTime=1753519075604, updateBy=18614031015, name=导弹与航天运载技术-英文站点, tplId=1146101810881728533, title=Missiles and Space Vehicles, delFlag=0, indexPage=/home, props=[WebsiteProps(id=1155907112585777347, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1155906039850618895, code=articleTextType, value=kx, createTime=1753519331356, updateTime=1753519331356, creator=18614031015, updator=18614031015), WebsiteProps(id=1155907112556417216, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1155906039850618895, code=banner, value=null, createTime=1753519331349, updateTime=1753519331349, creator=18614031015, updator=18614031015), WebsiteProps(id=1155907112552222911, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1155906039850618895, code=logo, value=https://castjournals.cast.org.cn/joweb/kjdb/CN/file/pic?fileId=gfUyzanfTuxQ2yc+L/MeWA==, createTime=1753519331348, updateTime=1753519331348, creator=18614031015, updator=18614031015), WebsiteProps(id=1155907112577388738, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1155906039850618895, code=picServerUrl, value=https://castjournals.cast.org.cn/joweb/kjdb/CN/file/pic, createTime=1753519331354, 
updateTime=1753519331354, creator=18614031015, updator=18614031015), WebsiteProps(id=1155907112564805825, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1155906039850618895, code=staticResourcePath, value=https://castjournals.cast.org.cn/joweb/cast_kjdb_cn_619/, createTime=1753519331351, updateTime=1753519331351, creator=18614031015, updator=18614031015)])], journalTitle=导弹与航天运载技术(中英文), weixinUrl=null, journalUrl=null, iacademicId=null, status=0, seqNo=null, journalTitleEn=Missiles and Space Vehicles, journalPhotoCn=DdfacUcWE1ibGHaCsbhL8w==, journalPhotoEn=jBog3LY3UHfAAYfYEqIBHg==, journalFirstLetter=M, journalRecommend=null, journalNew=null, journalCollection=null, jcrJf=null, cjcrJf=null, jcrJfStr=null, cjcrJfStr=null, submissionFirstDecision=null, sciSubjectClassification=null, casSubjectClassification=null, citeScore=null, totalCitationFrequency=null, icpCode=null, psCode=null, advertisingLicenseCode=null, copyrightInformation=null, country=null, option=null, provinceCode=null, provinceName=null, collectFlag=false), detailUrlCn=https://castjournals.cast.org.cn/joweb/ddyht/CN/10.7654/j.issn.2097-1974.20250601, detailUrlEn=https://castjournals.cast.org.cn/joweb/ddyht/EN/10.7654/j.issn.2097-1974.20250601, pdfUrlCn=https://castjournals.cast.org.cn/joweb/ddyht/CN/PDF/10.7654/j.issn.2097-1974.20250601, pdfUrlEn=https://castjournals.cast.org.cn/joweb/ddyht/EN/PDF/10.7654/j.issn.2097-1974.20250601, aliStartDate=null, aliEndDate=null, collectionFlag=false, citedCount=null, citedUrl=null, reference=null)
Attitude Control of High-speed Vehicles Based on Improved TD3 Reinforcement Learning
Weili WANG, Wanwei HUANG, Xiaodong LIU, Kunfeng LU, Chenhui JIA
Missiles and Space Vehicles | Launch Vehicle and Missile, 2025, 48(6): 1-9
Author information
  • National Key Laboratory of Science and Technology on Aerospace Intelligent Control, Beijing Aerospace Automatic Control Institute, Beijing, 100854
  • Weili WANG (born 1997), female, PhD candidate. Research interests: intelligent control and adaptive control of flight vehicles.

    Wanwei HUANG (born 1970), male, PhD, research fellow. Research interests: guidance and control, intelligent control, and adaptive control of flight vehicles.

    Xiaodong LIU (born 1987), male, PhD, research fellow. Research interests: guidance and control, intelligent control, and adaptive control of flight vehicles.

    Kunfeng LU (born 1983), male, PhD, research fellow. Research interests: guidance and control, intelligent control, and adaptive control of flight vehicles.

    Chenhui JIA (born 1985), male, PhD, senior engineer. Research interests: guidance and control and intelligent control of flight vehicles.

Published: 2025-12-25 doi: 10.7654/j.issn.2097-1974.20250601

To address the strong nonlinearity, high uncertainty, and rapidly time-varying parameters that high-speed vehicles face during reentry, and in line with the trend toward intelligent spacecraft, this study proposes an end-to-end intelligent attitude control method based on an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. Two changes target the unstable training and difficult convergence of TD3 in attitude control learning. First, within the Markov decision process, a hybrid reward mechanism combines continuous tracking-error penalties with sparse task-completion rewards to jointly guide the agent toward convergence. Second, prior-knowledge constraints derived from modern control theory are introduced into training through a behavior-cloning-based update strategy for the Actor network, which balances imitation of expert experience against maximization of cumulative return. Simulation results show that the proposed method accurately tracks the three-channel attitude commands under 14 combinations of parameter deviations.

high-speed vehicles  /  attitude control  /  deep reinforcement learning  /  behavior cloning  /  strongly adaptive control
Weili WANG, Wanwei HUANG, Xiaodong LIU, Kunfeng LU, Chenhui JIA. Attitude Control of High-speed Vehicles Based on Improved TD3 Reinforcement Learning[J]. Missiles and Space Vehicles, 2025, 48(6): 1-9. DOI: 10.7654/j.issn.2097-1974.20250601
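High-speed vehicles, with their full-airspace maneuverability, wide-speed-range cruise, and strong penetration capability, have become core equipment in modern long-range precision strike systems [1]. However, the strongly nonlinear dynamics within the flight envelope, the multi-channel coupling, and the rapidly time-varying parameters caused by aerodynamic heating or structural deformation expose the theoretical limits of conventional controller design methods that rely on an accurate mathematical model [2]. In the highly dynamic reentry environment in particular, the vehicle is simultaneously subject to extreme aerodynamic loads, complex disturbances, and model uncertainty, so designing a reliable attitude controller with classical methods based on prior knowledge (such as gain-scheduled PID and robust adaptive control) becomes difficult.
To meet these challenges and respond to the trend toward intelligent spacecraft [3], intelligent flight control (IFC) technology has emerged. Within IFC, data-driven methods are becoming a research focus because they depend little on the model. Deep reinforcement learning (DRL), with its characteristic mechanism of interacting with the environment and optimizing autonomously, shows particular potential for complex control problems [4-6]. Current applications of DRL to high-speed vehicle control fall into three research directions [7]: adaptive tuning of control parameters [8-9], uncertainty compensation control [10], and end-to-end autonomous control [11], which together form new technical breakthrough points in the field.
A DRL-based end-to-end control architecture learns the control policy directly from raw sensor data and generates actuator commands without manual feature extraction or control-law design. It allows the main control law of a high-speed vehicle to be designed with a high degree of autonomy and reduces the dependence on ground-based design models. However, research on applying DRL to three-channel attitude control in the reentry phase is still exploratory. The core challenges stem from two characteristics: first, the rapid time variation of the aerodynamic and inertial parameters during reentry drastically changes the system dynamics; second, the strong coupling among the three channels makes it hard for policy training to converge stably. Existing work addresses these problems along two lines. At the algorithm level, researchers improve convergence by optimizing the network architecture [12] and by improving the experience replay mechanism [13]. At the training-strategy level, behavior cloning (BC) is used to correct policy bias [14].
In summary, for end-to-end attitude control of high-speed vehicles in the reentry phase, this study proposes a Knowledge-Guided Twin Delayed Deep Deterministic Policy Gradient (KG-TD3) algorithm. It fuses prior knowledge from modern control theory with the data-driven nature of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to build a hybrid-driven control architecture. This knowledge-embedded DRL paradigm addresses the training instability of purely data-driven methods in complex dynamic environments while retaining the adaptability of end-to-end control, offering a new technical route to intelligent control of high-speed vehicles.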
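The object of study is a lifting, plane-symmetric, unpowered high-speed vehicle that uses bank-to-turn (BTT) control. The controlled quantities are the angle of attack $\alpha$, the sideslip angle $\beta$, and the velocity bank angle $\gamma_v$. Typical unpowered high-speed vehicles include the HTV-2 and SR-72 of the United States, as shown in Fig. 1.
Given the characteristics of the lifting plane-symmetric vehicle and its reentry flight environment, the following simplifying assumptions are made: the Earth's rotation is neglected, so the launch inertial frame always coincides with the ground frame and the centrifugal and Coriolis inertial forces need not be considered; the Earth is treated as a homogeneous sphere, neglecting its oblateness and the tangential gravitational acceleration; the product of inertia $J_{x_1 y_1}$ is small and is neglected; the vehicle is treated as a rigid body, i.e. elastic effects are not considered.
Following Ref. [15], a six-degree-of-freedom mathematical model of the vehicle is established and written in the affine nonlinear form of Eq. (1) to facilitate control system design.
$$
\begin{aligned}
\dot{x}_\Omega &= F_\Omega + G_\Omega x_\omega + d_\Omega \\
\dot{x}_\omega &= F_\omega + G_\omega u + d_\omega
\end{aligned}
\tag{1}
$$
where $x_\Omega$ is the attitude-loop state, i.e. the angle of attack $\alpha$, sideslip angle $\beta$, and velocity bank angle $\gamma_v$; $x_\omega$ is the angular-rate-loop state, i.e. the roll rate $\omega_x$, yaw rate $\omega_y$, and pitch rate $\omega_z$; $u$ is the control input, i.e. the roll, yaw, and pitch rudder deflections $\delta_x$, $\delta_y$, $\delta_z$; $F_\Omega$ and $F_\omega$ are the lumped dynamics of the plant; $G_\Omega$ and $G_\omega$ are the control gains, which describe how the control signal affects the system dynamics; $d_\Omega = [d_\alpha, d_\beta, d_{\gamma_v}]^T$ and $d_\omega = [d_{\omega_x}, d_{\omega_y}, d_{\omega_z}]^T$ are external disturbances.
In practice, because of large parameter perturbations and severe external disturbances in flight, the vehicle model is written as the affine nonlinear system of Eq. (2):
$$
\begin{aligned}
\dot{x}_\Omega &= \bar{F}_\Omega + \bar{G}_\Omega x_\omega + \varsigma_\Omega \\
\dot{x}_\omega &= \bar{F}_\omega + \bar{G}_\omega u + \varsigma_\omega
\end{aligned}
\tag{2}
$$
where $F_\Omega = \bar{F}_\Omega + \Delta F_\Omega$, $F_\omega = \bar{F}_\omega + \Delta F_\omega$, $G_\Omega = \bar{G}_\Omega + \Delta G_\Omega$, and $G_\omega = \bar{G}_\omega + \Delta G_\omega$. $\bar{F}_\Omega$ and $\bar{G}_\Omega$ are the known nominal models of the attitude loop, and $\bar{F}_\omega$ and $\bar{G}_\omega$ are the known nominal models of the angular-rate loop, as given in Eqs. (3) and (4). $\Delta F_\Omega$, $\Delta G_\Omega$, $\Delta F_\omega$, and $\Delta G_\omega$ are unknown dynamics. $\varsigma_\Omega$ and $\varsigma_\omega$ lump together the unknown terms of the system, such as unmodeled dynamics, parameter perturbations, and external disturbances: $\varsigma_\Omega = \Delta F_\Omega + \Delta G_\Omega x_\omega + d_\Omega$ and $\varsigma_\omega = \Delta F_\omega + \Delta G_\omega u + d_\omega$. These terms are the main cause of performance degradation and call for an adaptive controller with strong disturbance rejection; this paper uses the KG-TD3 intelligent controller.
$$
\begin{aligned}
\dot{x}_\Omega &= [\dot{\alpha},\ \dot{\beta},\ \dot{\gamma}_v]^T, \qquad x_\omega = [\omega_x,\ \omega_y,\ \omega_z]^T \\
\bar{F}_\Omega &= \begin{bmatrix}
-\dfrac{L - mg\cos\theta\cos\gamma_v}{mV\cos\beta} \\[2mm]
\dfrac{Z + mg\cos\theta\sin\gamma_v}{mV} \\[2mm]
\dfrac{L(\tan\beta + \tan\theta\sin\gamma_v) + Z\tan\theta\cos\gamma_v - mg\cos\theta\tan\beta\cos\gamma_v}{mV}
\end{bmatrix} \\
\bar{G}_\Omega &= \begin{bmatrix}
-\cos\alpha\tan\beta & \sin\alpha\tan\beta & 1 \\
\sin\alpha & \cos\alpha & 0 \\
\cos\alpha\sec\beta & -\sin\alpha\sec\beta & 0
\end{bmatrix}
\end{aligned}
\tag{3}
$$
$$
\begin{aligned}
\dot{x}_\omega &= [\dot{\omega}_x,\ \dot{\omega}_y,\ \dot{\omega}_z]^T, \qquad u = [\delta_x,\ \delta_y,\ \delta_z]^T \\
\bar{F}_\omega &= \begin{bmatrix}
\dfrac{J_y - J_z}{J_x}\omega_y\omega_z + \dfrac{(C_{mx}^{\alpha}\alpha + C_{mx}^{\beta}\beta)\,qSl}{J_x} \\[2mm]
\dfrac{J_z - J_x}{J_y}\omega_x\omega_z + \dfrac{C_{my}^{\beta}\beta\,qSl}{J_y} \\[2mm]
\dfrac{J_x - J_y}{J_z}\omega_x\omega_y + \dfrac{C_{mz}^{\alpha}\alpha\,qSl}{J_z}
\end{bmatrix} \\
\bar{G}_\omega &= qSl
\begin{bmatrix}
\dfrac{1}{J_x} & 0 & 0 \\
0 & \dfrac{1}{J_y} & 0 \\
0 & 0 & \dfrac{1}{J_z}
\end{bmatrix}
\begin{bmatrix}
C_{mx}^{\delta_x} & C_{mx}^{\delta_y} & C_{mx}^{\delta_z} \\
C_{my}^{\delta_x} & C_{my}^{\delta_y} & C_{my}^{\delta_z} \\
C_{mz}^{\delta_x} & C_{mz}^{\delta_y} & C_{mz}^{\delta_z}
\end{bmatrix}
\end{aligned}
\tag{4}
$$
where $m$, $V$, and $\theta$ are the mass, speed, and flight-path angle of the vehicle; $L$ and $Z$ are the aerodynamic lift and side force; $g$ is the gravitational acceleration; $M_{x_1}$, $M_{y_1}$, and $M_{z_1}$ are the aerodynamic roll, yaw, and pitch moments; $J_{x_1}$, $J_{y_1}$, and $J_{z_1}$ are the principal moments of inertia of the vehicle (written $J_x$, $J_y$, $J_z$ in Eq. (4)); $q$, $S$, and $l$ are the dynamic pressure, aerodynamic reference area, and reference length; $C_{mx}^{(\cdot)}$, $C_{my}^{(\cdot)}$, and $C_{mz}^{(\cdot)}$ are the roll, yaw, and pitch moment coefficients with respect to $(\cdot)$.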
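As a small numerical illustration of the nominal attitude-loop gain matrix $\bar{G}_\Omega$ in Eq. (3), the sketch below builds the matrix for a given $\alpha$ and $\beta$ and maps body rates to their contribution to the attitude-angle rates. The function names and the operating point are assumptions made for illustration only; they do not come from the paper.

```python
import math


def g_omega_bar(alpha: float, beta: float) -> list[list[float]]:
    """Nominal gain matrix of Eq. (3); alpha and beta are in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    tb, sec_b = math.tan(beta), 1.0 / math.cos(beta)
    return [
        [-ca * tb, sa * tb, 1.0],        # row acting on alpha-dot
        [sa, ca, 0.0],                   # row acting on beta-dot
        [ca * sec_b, -sa * sec_b, 0.0],  # row acting on gamma_v-dot
    ]


def mat_vec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    """Plain matrix-vector product for small dense matrices."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical operating point and body rates [wx, wy, wz] in rad/s.
G = g_omega_bar(math.radians(5.0), math.radians(1.0))
rate_contribution = mat_vec(G, [0.02, -0.01, 0.05])
print(rate_contribution)
```

At $\alpha = \beta = 0$ the matrix reduces to a permutation of the three body rates, which is a convenient sanity check. Its determinant equals $-\sec\beta$, so it stays invertible as long as $\cos\beta \neq 0$.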
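This completes the control-oriented mathematical model of the three-channel attitude motion of the high-speed vehicle. The model is used below to develop an intelligent attitude control method that fuses knowledge and data.
In deep reinforcement learning, TD3 performs well on continuous control tasks but still suffers from low exploration efficiency, instability early in training, and local-optimum traps. This paper therefore introduces a knowledge-based controller as a constraint, namely a dynamic surface control (DSC) controller (referred to here as the "expert controller"), and combines it with behavior cloning to obtain the knowledge-guided TD3 (KG-TD3) algorithm. "Knowledge" and "data" are defined as in Ref. [16].
The dynamic surface controller relies on an accurate, analytical mathematical model of the vehicle, so under the definition of Ref. [16] it is a knowledge-based controller.
For the nonlinear system, define the tracking error $s_\Omega$ and its derivative:
$$
\begin{aligned}
s_\Omega &= x_\Omega - x_{\Omega d} \\
\dot{s}_\Omega &= \bar{F}_\Omega + \bar{G}_\Omega x_\omega + \varsigma_\Omega - \dot{x}_{\Omega d}
\end{aligned}
\tag{5}
$$
where $x_\Omega$ is the actual system state and $x_{\Omega d}$ is the guidance command state.
Choose the virtual control input $x_v$:
$$
x_v = -\bar{G}_\Omega^{-1}\left(\bar{F}_\Omega + W_\Omega s_\Omega - \dot{x}_{\Omega d} + \varsigma_\Omega\right)
\tag{6}
$$
Passing $x_v$ through a first-order filter gives:
$$
\tau\dot{x}_\tau + x_\tau = x_v
\tag{7}
$$
where $\tau$ is the filter coefficient to be designed and $x_\tau$ is the control input after filtering.
Define the tracking error $s_\omega$ and differentiate it:
$$
\begin{aligned}
s_\omega &= x_\omega - x_\tau \\
\dot{s}_\omega &= \bar{F}_\omega + \bar{G}_\omega u + \varsigma_\omega - \dot{x}_\tau
\end{aligned}
\tag{8}
$$
The final controller $u$ is then designed as:
$$
u = -\bar{G}_\omega^{-1}\left(\bar{F}_\omega + W_\omega s_\omega - \dot{x}_\tau + \varsigma_\omega\right)
\tag{9}
$$
In summary, the dynamic surface attitude control law is:
$$
\begin{aligned}
s_\Omega &= x_\Omega - x_{\Omega d} \\
x_v &= -\bar{G}_\Omega^{-1}\left(\bar{F}_\Omega + W_\Omega s_\Omega - \dot{x}_{\Omega d} + \varsigma_\Omega\right) \\
\tau\dot{x}_\tau + x_\tau &= x_v \\
s_\omega &= x_\omega - x_\tau \\
u &= -\bar{G}_\omega^{-1}\left(\bar{F}_\omega + W_\omega s_\omega - \dot{x}_\tau + \varsigma_\omega\right)
\end{aligned}
\tag{10}
$$
where $W_\Omega$ and $W_\omega$ are positive definite matrices.
The output $u$ of the DSC controller, i.e. the roll, yaw, and pitch rudder deflections $\delta_x$, $\delta_y$, $\delta_z$, is treated as the "expert action" that guides the training of the intelligent controller designed below.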
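To make the structure of the dynamic surface control law concrete, the following is a minimal scalar sketch, not the paper's controller. It assumes a toy plant with $\bar{F} = 0$, $\bar{G} = 1$, and no disturbance (so $\dot{x}_\Omega = x_\omega$, $\dot{x}_\omega = u$), a constant command, forward-Euler integration, and hypothetical gains `w_att`, `w_rate` and filter constant `tau`.

```python
def simulate_dsc(w_att: float = 2.0, w_rate: float = 10.0, tau: float = 0.02,
                 dt: float = 0.001, t_end: float = 5.0, x_cmd: float = 1.0) -> float:
    """Run a scalar DSC loop on a double integrator and return the final tracking error."""
    x_att, x_rate, x_tau = 0.0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        s_att = x_att - x_cmd                  # attitude tracking error
        x_v = -(w_att * s_att)                 # virtual control; the command rate is zero here
        x_tau_dot = (x_v - x_tau) / tau        # first-order filter: tau * x_tau' + x_tau = x_v
        s_rate = x_rate - x_tau                # rate-loop tracking error
        u = -(w_rate * s_rate - x_tau_dot)     # final control with the filtered derivative
        # Forward-Euler integration of the toy plant and the filter state.
        x_att += dt * x_rate
        x_rate += dt * u
        x_tau += dt * x_tau_dot
    return x_att - x_cmd


final_error = simulate_dsc()
print(f"final tracking error: {final_error:.2e}")
```

The filter supplies $\dot{x}_\tau$ algebraically, so the virtual control never has to be differentiated analytically; that is the point of the dynamic surface step.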
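Before the intelligent controller is trained, a Markov decision process (MDP) for the reentry flight of the high-speed vehicle must be set up to link the environment and the agent.
a) State space.
The state space contains the useful information that the agent can collect from the environment. This study uses the observable states of the vehicle, giving the state space of Eq. (11).
$$
s_t = [\rho_\Omega e_\Omega,\ \rho_\omega e_\omega]^T = [e_\alpha,\ e_\beta,\ e_\gamma,\ e_{\omega_x},\ e_{\omega_y},\ e_{\omega_z}]^T
\tag{11}
$$
where $e_\Omega$ is the attitude-angle tracking error at the current time; $e_\omega$ is the angular-rate tracking error at the current time; $\rho_\Omega$ and $\rho_\omega$ are positive definite diagonal normalization matrices that keep the state components on roughly the same scale.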
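The construction of the normalized state can be sketched as follows. The diagonal entries of $\rho_\Omega$ and $\rho_\omega$ are not given in the text, so the scale values used here are placeholders chosen only to show the mechanics.

```python
def build_state(att_err: list[float], rate_err: list[float],
                att_scale: list[float], rate_scale: list[float]) -> list[float]:
    """Return the 6-element state: scaled attitude errors followed by scaled rate errors.

    att_scale and rate_scale hold the diagonals of the normalization matrices.
    """
    scaled_att = [rho * e for rho, e in zip(att_scale, att_err)]
    scaled_rate = [rho * e for rho, e in zip(rate_scale, rate_err)]
    return scaled_att + scaled_rate


# Hypothetical errors (deg and deg/s) and hypothetical normalization diagonals.
state = build_state([1.0, -0.5, 2.0], [10.0, 5.0, -20.0],
                    att_scale=[0.1, 0.1, 0.1], rate_scale=[0.01, 0.01, 0.01])
print(state)
```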
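b) Action space.
This study addresses three-channel attitude control of the high-speed vehicle, so the agent learns the control commands directly. The action space is designed as in Eq. (12).
$$
a_t = [\delta_x,\ \delta_y,\ \delta_z]^T
\tag{12}
$$
where $\delta_x$ is the roll rudder deflection, $\delta_y$ is the yaw rudder deflection, and $\delta_z$ is the pitch rudder deflection.
In addition, because the travel of the aerodynamic rudders is physically limited, the rudder deflections must stay within the given bounds:
$$
-35^\circ < \delta_i < 35^\circ, \quad i = x, y, z
\tag{13}
$$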
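c) Reward function.
The reward function is designed as a mixture of continuous and sparse rewards and penalties, with the following components:
1) Penalty on attitude-angle and angular-rate errors.
$$
P_1 = K_1\left(|e_\alpha| + |e_\beta| + |e_\gamma|\right) + K_2\left(|e_{\omega_x}| + |e_{\omega_y}| + |e_{\omega_z}|\right)
\tag{14}
$$
2) Reward on attitude-angle error.
$$
R_1 = K_3\, e^{-\eta_1\left(|e_\alpha| + |e_\beta| + |e_\gamma|\right)}
\tag{15}
$$
3) Penalty on action magnitude and action-rate chattering.
First, to suppress excessively large actions, the sum of the absolute values of the actions is penalized. Second, if the action at the current time is $\delta_{x\_c}, \delta_{y\_c}, \delta_{z\_c}$ and the action at the previous time is $\delta_{x\_l}, \delta_{y\_l}, \delta_{z\_l}$, the action changes are $\Delta\delta_x = \delta_{x\_c} - \delta_{x\_l}$, $\Delta\delta_y = \delta_{y\_c} - \delta_{y\_l}$, and $\Delta\delta_z = \delta_{z\_c} - \delta_{z\_l}$; the change between adjacent time steps is penalized to suppress high-frequency chattering.
$$
P_2 = K_4\, e^{-\eta_2\left(|\delta_x| + |\delta_y| + |\delta_z|\right)} + K_5\, e^{-\eta_3\left(|\Delta\delta_x| + |\Delta\delta_y| + |\Delta\delta_z|\right)}
\tag{16}
$$
4) Action safety penalty.
If an action exceeds the threshold $\delta_M$, the excess is penalized quadratically.
$$
P_3 = K_6 \sum_i \left(|\delta_i| - \delta_M\right)^2, \quad \text{if } |\delta_i| > \delta_M,\ i = x, y, z
\tag{17}
$$
5) High-precision tracking reward.
$$
R_2 = K_7, \quad \text{if } |e_\alpha| + |e_\beta| + |e_\gamma| < M
\tag{18}
$$
The resulting hybrid reward function is:
$$
R = -\sum_{j=1}^{3} P_j + \sum_{r=1}^{2} R_r
\tag{19}
$$
where $K_l \in \mathbb{R}^+$, $l \in \{1, 2, \ldots, 7\}$, and $\eta_1, \eta_2, \eta_3 \in \mathbb{R}^+$ are reward and penalty coefficients, and $M \in \mathbb{R}^+$ is the attitude-angle threshold set by the design requirements.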
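The hybrid reward of Eqs. (14) to (19) can be sketched as a single function. The coefficient values $K_1 \ldots K_7$, $\eta_1 \ldots \eta_3$, $\delta_M$, and $M$ are not reported in the text, so the defaults in `RewardConfig` are placeholders, and the class and function names are hypothetical. The term $P_2$ is implemented exactly as Eq. (16) is printed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    k: tuple = (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)  # K1..K7, placeholder values
    eta: tuple = (1.0, 1.0, 1.0)                     # eta1..eta3, placeholder values
    delta_max: float = 30.0                          # safety threshold delta_M in deg (assumed)
    precision: float = 0.5                           # threshold M in deg (assumed)


def hybrid_reward(att_err, rate_err, action, prev_action, cfg=RewardConfig()):
    """Hybrid reward R = -(P1 + P2 + P3) + (R1 + R2), following Eqs. (14)-(19)."""
    k1, k2, k3, k4, k5, k6, k7 = cfg.k
    eta1, eta2, eta3 = cfg.eta
    abs_att = sum(abs(e) for e in att_err)
    abs_rate = sum(abs(e) for e in rate_err)
    abs_act = sum(abs(d) for d in action)
    abs_change = sum(abs(c - p) for c, p in zip(action, prev_action))

    p1 = k1 * abs_att + k2 * abs_rate                                       # Eq. (14)
    r1 = k3 * math.exp(-eta1 * abs_att)                                     # Eq. (15)
    p2 = k4 * math.exp(-eta2 * abs_act) + k5 * math.exp(-eta3 * abs_change)  # Eq. (16)
    p3 = k6 * sum((abs(d) - cfg.delta_max) ** 2                             # Eq. (17)
                  for d in action if abs(d) > cfg.delta_max)
    r2 = k7 if abs_att < cfg.precision else 0.0                             # Eq. (18), sparse
    return -(p1 + p2 + p3) + (r1 + r2)                                      # Eq. (19)


print(hybrid_reward([0.2, 0.1, 0.1], [1.0, 0.5, 0.2], [5.0, -3.0, 2.0], [4.5, -3.0, 2.5]))
```

The sparse term $R_2$ only switches on once the summed attitude error drops below $M$, while the continuous terms provide a learning signal everywhere else.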
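a) Improved update strategy for the Actor network.
In complex tasks, the Actor network of TD3 may learn a suboptimal policy because of insufficient exploration or an inaccurate Critic early in training, whereas purely imitating the "expert controller" cannot exceed the expert's level. To address this, a behavior-cloning-based constrained optimization method is proposed to improve the update strategy of the Actor network.
The core idea is to embed the prior knowledge of the "expert controller" into the DRL framework as a soft constraint. Specifically, the behavior cloning loss with respect to the "expert action" is added as a regularization term to the policy optimization objective of the Actor network, giving the composite objective of Eq. (20). This keeps the action output by the Actor from deviating too far from the expert action while maximizing the q value assessed by the Critic, and so balances the two goals of imitating the expert and maximizing cumulative return.
$$
L(\theta) = \underbrace{-\mathbb{E}_{s_j\sim\mathcal{D}}\left[q(s_j, \hat{a}_j;\omega_{1,o})\right]}_{\text{reinforcement learning objective}} + \underbrace{\lambda\,\mathbb{E}_{s_j\sim\mathcal{D}}\left[\lVert \hat{a}_j - a_{j\_k} \rVert^2\right]}_{\text{behavior cloning constraint}}
\tag{20}
$$
where $L(\theta)$ is the loss function, and the optimization goal is to minimize it; $\mathbb{E}_{s_j\sim\mathcal{D}}[q(s_j, \hat{a}_j;\omega_{1,o})]$ is the mean, over the data distribution $\mathcal{D}$ (sampled from the experience replay buffer $R$), of the q value that the Critic network with parameters $\omega_{1,o}$ assigns to the action $\hat{a}_j$ generated by the current policy in state $s_j$; $\mathbb{E}_{s_j\sim\mathcal{D}}[\lVert \hat{a}_j - a_{j\_k} \rVert^2]$ is the expectation over $\mathcal{D}$ of the squared error between the action $\hat{a}_j$ generated by the current policy in state $s_j$ and the "expert action" $a_{j\_k}$ at the same time; $\lambda$ is the weight of the behavior cloning term.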
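A minimal sketch of how the composite objective of Eq. (20) is evaluated on a sampled batch is shown below. It works on plain numbers rather than network outputs; the function name is hypothetical, and the behavior cloning term is taken as the squared Euclidean norm averaged over the batch, which is one reading of the equation. The default weight follows the value $\lambda = 0.1$ listed among the hyperparameters.

```python
def actor_loss(q_values: list[float],
               policy_actions: list[list[float]],
               expert_actions: list[list[float]],
               lam: float = 0.1) -> float:
    """Composite Actor loss: -mean(q) + lam * mean(||a_policy - a_expert||^2)."""
    n = len(q_values)
    rl_term = -sum(q_values) / n  # maximizing q is minimizing its negative mean
    bc_term = sum(
        sum((a - e) ** 2 for a, e in zip(a_pol, a_exp))
        for a_pol, a_exp in zip(policy_actions, expert_actions)
    ) / n
    return rl_term + lam * bc_term


# Toy batch of two samples with three-axis actions.
loss = actor_loss(
    q_values=[1.0, 3.0],
    policy_actions=[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    expert_actions=[[0.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
)
print(loss)
```

Setting `lam` to zero recovers the plain TD3 Actor objective, while a very large `lam` reduces the update to pure imitation of the expert.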
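b) KG-TD3 training framework.
KG-TD3 training differs from TD3 in two respects. First, the Actor network is updated with the behavior-cloning-based constrained objective of Eq. (20). Second, to align the intelligent controller with the "expert controller" in time, the "expert action" must also be stored, which is done here by modifying the experience replay buffer. Specifically, after the agent interacts with the environment, its trajectory is organized into five-tuples $(s_t, a_t, r_t, s_{t+1}, a_{t\_k})$: the state $s_t$ at time $t$, the agent's action $a_t$, the reward $r_t$, the state $s_{t+1}$ at time $t+1$, and the action $a_{t\_k}$ of the "expert controller". When the replay buffer is sampled, the "expert action" from the same time step is then matched automatically.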
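The modified replay buffer can be sketched with the standard library alone. The class and field names are hypothetical, and uniform sampling is assumed as in the random draw of the pseudocode; the only change relative to a conventional buffer is the fifth field holding the expert action.

```python
import random
from collections import deque, namedtuple

# Five-tuple (s_t, a_t, r_t, s_{t+1}, a_{t_k}); the last field is the expert action.
Transition = namedtuple("Transition", "state action reward next_state expert_action")


class ExpertReplayBuffer:
    """Fixed-capacity buffer that keeps the expert action next to each transition."""

    def __init__(self, capacity: int, seed=None):
        self._data = deque(maxlen=capacity)  # oldest transitions are dropped first
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state, expert_action):
        self._data.append(Transition(state, action, reward, next_state, expert_action))

    def sample(self, batch_size: int):
        """Uniformly sample a batch; each item carries its time-aligned expert action."""
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


buffer = ExpertReplayBuffer(capacity=1000, seed=0)
for t in range(10):
    # Dummy data: the expert action is tagged with the time index for illustration.
    buffer.store([float(t)], [0.0], 0.0, [float(t + 1)], [float(t)])
batch = buffer.sample(4)
print(batch[0])
```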
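Apart from these two points, the KG-TD3 training procedure is the same as that of TD3 and is not repeated here; see Ref. [12]. The complete pseudocode of the KG-TD3 algorithm is given in the algorithm listing.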
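Two steps that KG-TD3 shares with TD3, the clipped double-Q target and the soft update of the target networks, can be sketched on plain numbers as follows. The function names are hypothetical, the discount factor `gamma` is not reported in the text and is an assumed value, and the default `tau` follows the target network update rate of 0.005 listed among the hyperparameters.

```python
def td_target(reward: float, q1_next: float, q2_next: float, gamma: float = 0.99) -> float:
    """Clipped double-Q target: y = r + gamma * min(q1', q2')."""
    return reward + gamma * min(q1_next, q2_next)


def soft_update(target_params: list[float], online_params: list[float],
                tau: float = 0.005) -> list[float]:
    """Move each target parameter a fraction tau toward the online parameter."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target_params, online_params)]


y = td_target(reward=1.0, q1_next=2.0, q2_next=3.0, gamma=0.5)
new_target = soft_update([0.0, 1.0], [1.0, 1.0])
print(y, new_target)
```

Taking the minimum of the two target critics counters overestimation of the q value, and the small `tau` keeps the targets slowly varying between updates.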
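Training of the agent is complete once the Actor and Critic networks converge. As shown in Fig. 2, the trained Actor network is deployed online as the neural-network control policy and generates the three-channel attitude control commands of the high-speed vehicle end to end. Specifically, the trained Actor network receives the normalized attitude-angle and angular-rate tracking errors, passes them through the fully connected layers, and outputs the three-axis rudder deflections. In Fig. 2, $x_{\Omega d}$ is the attitude-angle guidance command and $x_\Omega$ is the actual attitude state, i.e. the angle of attack $\alpha$, sideslip angle $\beta$, and velocity bank angle $\gamma_v$.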
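The online forward pass can be illustrated with a small pure-Python sketch that follows the Actor layout reported for KG-TD3 (6 inputs, one fully connected layer of 128 ReLU units, 3 Tanh outputs). The weights here are random placeholders rather than trained values, and scaling the Tanh output by the 35° rudder limit is an assumption about how the bounded output maps to deflections.

```python
import math
import random


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Random placeholder weights and zero biases for one dense layer."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Affine map y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def actor_forward(state, params, delta_limit_deg: float = 35.0):
    """6 -> 128 (ReLU) -> 3 (Tanh), scaled to rudder deflections in degrees."""
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, h) for h in dense(state, w1, b1)]
    return [delta_limit_deg * math.tanh(o) for o in dense(hidden, w2, b2)]


rng = random.Random(0)
params = (init_layer(6, 128, rng), init_layer(128, 3, rng))
deflections = actor_forward([0.1, -0.05, 0.2, 0.1, 0.05, -0.2], params)
print(deflections)
```

Because Tanh is bounded, the commanded deflections stay within the rudder limit by construction, without a separate clipping step.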
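The proposed KG-TD3 model was trained and tested in Python 3.12.3 using Visual Studio Code 1.102.3. The hardware platform used for the experiments was: operating system Windows 11, CPU Intel Core i5, 16 GB of memory.
The reentry phase studied lasts 38.4 s, with an initial altitude of 30 km, an initial speed of about Ma = 5, and a dynamic pressure ranging from 27.4 kPa to 588.8 kPa. Fig. 3 shows typical parameters of the standard trajectory: altitude, speed, and dynamic pressure change markedly and rapidly in this phase, accompanied by changes in the model parameters. The vehicle body parameters are listed in Table 1, and the aerodynamic parameters are the public data provided in Ref. [17].
KG-TD3 serves as the three-channel intelligent attitude controller for the reentry phase. Its network structure is given in Table 2 and the training hyperparameters in Table 3.
To verify the adaptability of the proposed KG-TD3 attitude controller, the following deviations are considered: the initial attitude deviations in the three channels, $\Delta\alpha_0$, $\Delta\beta_0$, $\Delta\gamma_{v0}$; the deviations in mass, aerodynamic reference area, reference length, and atmospheric density, $\Delta m$, $\Delta S$, $\Delta l$, $\Delta\rho$; the deviations in the moments of inertia of the three channels, $\Delta J_x$, $\Delta J_y$, $\Delta J_z$; the aerodynamic force coefficient deviations $\Delta C_L$, $\Delta C_Z$; and the pitching moment coefficient deviations $\Delta C_{mz}^{\alpha}$, $\Delta C_{mz}^{\beta}$. Table 4 lists the deviation values. Because wind disturbance does not change the structure of the system's dynamic model, it can be estimated and compensated by adding, for example, a disturbance observer. This study focuses on parameter perturbations that must be handled by the adaptability of the controller itself, so the adaptability analysis covers parameter perturbations only.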
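As an illustration of how nominal, upper-bound, and lower-bound parameter sets can be generated, the sketch below applies the relative deviations of the body parameters (10% for $m$, $S$, $l$ and 15% for $J_x$, $J_y$, $J_z$) to the nominal body parameters. Pushing all parameters to the same bound at once is an assumption; the text does not describe how the deviation combinations are formed, and the names used are hypothetical.

```python
# Nominal body parameters and relative deviation bounds as listed in the paper's tables.
NOMINAL = {"m": 1000.0, "S": 0.45, "l": 0.7, "Jx": 200.0, "Jy": 3000.0, "Jz": 2800.0}
REL_DEV = {"m": 0.10, "S": 0.10, "l": 0.10, "Jx": 0.15, "Jy": 0.15, "Jz": 0.15}


def perturbed(sign: int) -> dict:
    """Return the parameter set for sign = +1 (upper), -1 (lower), or 0 (nominal)."""
    return {name: value * (1.0 + sign * REL_DEV[name]) for name, value in NOMINAL.items()}


cases = {"nominal": perturbed(0), "upper": perturbed(+1), "lower": perturbed(-1)}
for label, params in cases.items():
    print(label, params)
```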
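Fig. 4 shows the per-episode reward return of the agent during training for the proposed KG-TD3 algorithm and for TD3, which reflects how the Actor network learns. With KG-TD3 the reward settles after about 10 episodes without large oscillations, indicating stable training and fast convergence. With TD3 the reward fluctuates considerably during training, and its value is slightly lower than that of KG-TD3.
Compared with TD3, the proposed KG-TD3 algorithm mainly changes the reward mechanism and the update strategy of the Actor network. To assess the effect of these changes on attitude tracking accuracy, KG-TD3 is compared with TD3; the maximum attitude-angle tracking errors (MAX) of the two intelligent controllers are listed in Table 5, evaluated over the flight segment 1~38.4 s. Table 5 shows that KG-TD3 tracks markedly more accurately than TD3: the maximum error falls from 2.403° to 0.420° in $\alpha$, from 0.836° to 0.528° in $\beta$, and from 34.563° to 0.725° in $\gamma_v$.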
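The size of the improvement can be expressed as a relative reduction of the maximum error. The short calculation below is derived arithmetic on the maximum-error values reported for TD3 and KG-TD3, not an additional result from the paper; the names are hypothetical.

```python
# (TD3, KG-TD3) maximum tracking errors in degrees, as reported in the comparison table.
MAX_ERR = {
    "alpha": (2.403, 0.420),
    "beta": (0.836, 0.528),
    "gamma_v": (34.563, 0.725),
}


def reduction_percent(td3_err: float, kg_err: float) -> float:
    """Relative reduction of the maximum error, in percent of the TD3 value."""
    return 100.0 * (td3_err - kg_err) / td3_err


reductions = {name: reduction_percent(*pair) for name, pair in MAX_ERR.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower maximum error")
```

This gives reductions of roughly 82.5% for $\alpha$, 36.8% for $\beta$, and 97.9% for $\gamma_v$, so the largest gain is in the velocity bank angle channel.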
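To verify the adaptability of the proposed control method, simulations were run under three typical conditions: the nominal state, the upper deviation bound, and the lower deviation bound. The attitude-angle tracking performance of the KG-TD3 intelligent controller and the equivalent rudder deflection curves it produces are shown in Figs. 5 to 7. The simulation results show that, across the 14 parameter perturbation cases covering the upper and lower bounds, the steady-state control error stays below 1°.
For the complex control problems of the reentry phase of high-speed vehicles, namely strong nonlinearity, high uncertainty, and rapidly time-varying parameters, this study proposes an end-to-end intelligent attitude control method based on an improved TD3 algorithm. By combining a hybrid reward mechanism with behavior-cloning-based prior-knowledge constraints, it effectively resolves the unstable training and difficult convergence of conventional deep reinforcement learning in attitude control.
  • National Natural Science Foundation of China (U24B20159)
References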
[1]
刘双喜, 刘世俊, 李勇, 等. 国外高超声速飞行器及防御体系发展现状[J]. 空天防御, 2023, 6(3): 39-51.
LIU Shuangxi, LIU Shijun, LI Yong, et al. Current developments in foreign hypersonic vehicles and defense systems[J]. Air and Space Defense, 2023, 6(3): 39-51.
[2]
樊轶, 秦昌茂, 董添, 等. 基于MIMO-ESO的高速飞行器自抗扰控制[J].导弹与航天运载技术(中英文), 2024(1): 64-70.
FAN Yi, QIN Changmao, DONG Tian, et al. ADRC attitude controller design for hypersonic vehicle based on MIMO-ESO[J]. Missiles and Space Vehicles, 2024(1): 64-70.
[3]
包为民. 航天智能控制技术让运载火箭“会学习”[J]. 航空学报, 2021, 42(11): 8-17.
BAO Weimin. Space intelligent control technology enables launch vehicle to “self-learning”[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(11): 8-17.
[4]
ZHANG Z Y, MO Z B, CHEN Y T, et al. Reinforcement learning behavioral control for nonlinear autonomous system[J]. IEEE-CAA Journal of Automatica Sinica, 2022, 9(9): 1561-1573.
[5]
LUO B, SUN J Y, TANG R, et al. Reinforcement learning-based 3D trajectory tracking control of hypersonic gliding vehicles with time-varying uncertainties[J]. IEEE Transactions on Automation Science and Engineering, 2025(22): 8187-8199.
[6]
LIU C, DONG C Y, ZHOU Z J, et al. Barrier Lyapunov function based reinforcement learning control for air-breathing hypersonic vehicle with variable geometry inlet[J]. Aerospace Science and Technology, 2020(96): 105537.
[7]
LU K F, WANG W L, LIU X D, et al. Research progress and prospect of high-speed vehicle control technology based on reinforcement learning[J]. Advances in Astronautics, 2025, 8(2): 201-209.
[8]
WANG G, AN H, WANG Y, et al. Intelligent control of air-breathing hypersonic vehicles subject to path and angle-of-attack constrai-nts[J]. Acta Astronautica, 2022(198): 606-616.
[9]
GAO Q, LI X, JI Y, et al. Research on active disturbance rejection control of hypersonic vehicle based on Q-learning[J]. Control Engineering of China, 2024, 31(4): 577-582.
[10]
LI X, JI Y H, SONG Y, et al. Modified deep deterministic policy gradient based on active disturbance rejection control for hypersonic vehicles[J]. Neural Computing and Applications, 2024, 36(8): 4071-4081.
[11]
路坤锋, 贾晨辉, 黄旭, 等. 面向变构型飞行器的强化学习位置姿态一体化控制方法[J]. 宇航学报, 2024, 45(7): 1100-1110.
LU Kunfeng, JIA Chenhui, HUANG Xu, et al. Reinforcement learning-based integrated position and attitude control method towards morphing flight vehicles[J]. Journal of Astronautics, 2024, 45(7): 1100-1110.
[12]
姜凌峰, 李新凯, 张海, 等. 基于改进TD3算法的无人机动态环境无地图导航[J]. 航空学报, 2025, 46(8): 298-313.
JIANG Lingfeng, LI Xinkai, ZHANG Hai, et al. Mapless navigation of UAVs in dynamic environments based on an improved TD3 algorithm[J]. Acta aeronautica et Astronautica Sinica, 2025, 46(8): 298-313.
[13]
彭博, 王晓波, 魏祥麟, 等. 基于SPER-TD3的无人机编队三维航迹规划[J]. 计算机系统应用, 2025, 34(2): 61-73.
PENG Bo, WANG Xiaobo, WEI Xianglin, et al. 3D trajectory planning for unmanned aerial vehicle formation based on SPER-TD3[J]. Computer Systems & Applications, 2025, 34(2): 61-73.
[14]
闫雷鸣, 刘健, 朱永昕. DPC-DQRL: 动态行为克隆约束的离线-在线双Q值强化学习[J]. 计算机应用研究, 2025, 42(4): 1003-1010.
YAN Leiming, LIU Jian, ZHU Yongxin. DPC-DQRL: offline to online double Q value reinforcement learning with dynamic behavior cloning constraints[J]. Application Research of Computers, 2025, 42(4): 1003-1010.
[15]
刘晓东, 黄万伟, 禹春梅. 含扩张状态观测器的高超声速飞行器动态面姿态控制[J]. 宇航学报, 2015, 36(8): 916-922.
LIU Xiaodong, HUANG Wanwei, YU Chunmei. Dynamic surface attitude control for hypersonic vehicle containing extended state observer[J]. Journal of Astronautics, 2015, 36(8): 916-922.
[16]
黄旭, 柳嘉润, 张远, 等. 知识与数据混合驱动的高速飞行控制方法综述[J]. 宇航学报, 2023, 44(8): 1113-1126.
HUANG Xu, LIU Jiarun, ZHANG Yuan, et al. Review on knowledge-based and data-driver cooperating control methods of high-speed vehicle[J]. Journal of Astronautics, 2023, 44(8): 1113-1126.
[17]
LIU X, HUANG W, DU L. An integrated guidance and control approach in three-dimensional space for hypersonic missile constrained by impact angles[J]. ISA Transactions, 2017(66): 164-175.
2025, Vol. 48, No. 6
doi: 10.7654/j.issn.2097-1974.20250601
  • Received: 2025-07-05
  • First published online: 2026-01-20
  • Publication date: 2025-12-25
  • Revised: 2025-09-15
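Author affiliation
    Beijing Aerospace Automatic Control Institute, National Key Laboratory of Aerospace Intelligent Control Technology, Beijing 100854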