Article(id=1156949464423420645, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1156949362480861758, articleNumber=null, orderNo=null, doi=10.12404/j.issn.1671-1815.2309878, pmid=null, cstr=null, oa=null, hot=null, price=null, onlineType=0, articleFormat=0, articleType=null, articleTypeStr=research-article, receivedDate=1702483200000, receivedDateStr=2023-12-14, revisedDate=1731945600000, revisedDateStr=2024-11-19, acceptedDate=null, acceptedDateStr=null, onlineDate=1753767847400, onlineDateStr=2025-07-29, pubDate=1738944000000, pubDateStr=2025-02-08, doiRegisterDate=null, doiRegisterDateStr=null, onlineIssueDate=1753767847400, onlineIssueDateStr=2025-07-29, onlineJustAcceptDate=null, onlineJustAcceptDateStr=null, onlineFirstDate=null, onlineFirstDateStr=null, sourceXml=null, magXml=null, createTime=1753767847400, creator=13701087609, updateTime=1753767847400, updator=13701087609, issue=Issue{id=1156949362480861758, tenantId=1146029695717560320, journalId=1146123166801305609, year='2025', volume='25', issue='4', pageStart='1312', pageEnd='1751', issueExtLink='null', onlineDate='null', pubDate='null', beforeIssueId=null, nextIssueId=null, price=null, status=1, issueComplete=1, articleOrder=1, issueType=-1, specialIssue=0, createTime=1753767823094, creator=13701087609, updateTime=1755171161273, updator=13701087609, preIssue=null, nextIssue=null, ext={EN=IssueExt(id=1162835389472424814, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1156949362480861758, language=EN, specialIssueTitle=, coverIllustrator=, specialIssueEditor=, specialIssueAbout=), CN=IssueExt(id=1162835389472424815, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1156949362480861758, language=CN, specialIssueTitle=, coverIllustrator=, specialIssueEditor=, specialIssueAbout=)}, issueFiles=null}, startPage=1555, endPage=1562, ext={EN=ArticleExt(id=1156949465933370123, articleId=1156949464423420645, tenantId=1146029695717560320, 
journalId=1146123166801305609, language=EN, title=Improved YOLO Algorithm via Fusing Multilayer Features and Contextual Information, columnId=1156262729162810294, journalTitle=Science Technology and Engineering, columnName=Papers·Automation and Computational Technology, runingTitle=null, highlight=null, articleAbstract=

Remote sensing image target detection is of great significance in military reconnaissance, intelligent agriculture, and other fields, and small target detection in particular has attracted sustained attention. However, small targets in remote sensing images suffer from insufficient feature information and are difficult to detect, which has become the biggest obstacle to the development of remote sensing detection applications. To this end, the you only look once-hybrid feature (YOLO-HF) algorithm is proposed. It introduces a hybrid attention mechanism that combines channel attention and self-attention into the network of the traditional YOLOv7 model to extract deep target features, and fuses shallow and deep features to enrich local features. To further strengthen attention to global information, a global attention mechanism is added for small-scale targets after feature extraction, which enhances global feature expression. Because the traditional loss function is sensitive to positional deviations of small targets, which leads to poor detection performance, a new metric is embedded into the computation of the bounding box loss function, which accelerates the convergence of the loss function and improves the detection accuracy for small targets. The experimental results show that, compared with the traditional YOLOv7 algorithm, the proposed algorithm performs better on both the RSOD and NWPU VHR-10 datasets: the mean average precision improves by 2.90% on the RSOD dataset and by 3.61% on the NWPU VHR-10 dataset.

, correspAuthors=null, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=null, magXml=null, pdfUrl=null, pdf=null, pdfFileSize=null, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=null, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=null, mapNumber=null, authorCompany=null, fund=null, authors=null, authorsList=Xuan FEI, Meng-yao GUO, Si-jia WU, Zi-long JIN, Ding MA), CN=ArticleExt(id=1156949508782379027, articleId=1156949464423420645, tenantId=1146029695717560320, journalId=1146123166801305609, language=CN, title=融合多层特征与上下文信息的YOLO改进算法, columnId=1156262729783567290, journalTitle=科学技术与工程, columnName=论文·自动化技术、计算机技术, runingTitle=null, highlight=null, articleAbstract=

遥感图像目标检测在军事侦察、智慧农业等领域意义重大,特别是小目标检测一直获得持续关注。然而,遥感图像中的小目标面临特征信息不足、检测难度大等问题,成为困扰遥感检测应用发展的最大障碍。为此,提出YOLO-HF(you only look once-hybrid feature)算法,该算法在传统YOLOv7模型的网络中,引入通道注意力和自注意力的混合注意力机制提取目标深层特征,并将浅层特征和深层特征进行融合,增加局部特征的丰富性;为进一步加强对全局信息的关注,在提取特征后为小尺度目标添加全局注意力机制,实现全局特征表达能力的提升;为避免传统损失函数对小目标位置偏差敏感,导致检测效果不佳,选择使用一种新的度量方式,将其嵌入边界框损失函数的计算中,从而加快损失函数的收敛,实现小目标检测精度的提升。实验结果表明:与传统YOLOv7算法相比,所提算法在RSOD和NWPU VHR-10数据集上均表现出优越性,特别地,在RSOD数据集上均值平均精度提升了2.90%,在NWPU VHR-10数据集上均值平均精度实现了3.61%的提升。

, correspAuthors=null, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=mlFQv43dbMM6pr4PVUrfdA==, magXml=PpKhCHMbwxx5mULHc3x1Zw==, pdfUrl=null, pdf=RE8fJJv+qKpiX8Ggj9fHiw==, pdfFileSize=9164058, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=oaX2Kw4ccz52FwnfBFy47A==, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=NTtOSkTY8352h1SOw6u4TA==, mapNumber=null, authorCompany=null, fund=null, authors=

费选(1986—),男,汉族,河南郑州人,博士,副教授。研究方向:高光谱遥感影像分析。E-mail:

, authorsList=费选, 郭梦瑶, 吴思佳, 靳子泷, 马丁)}, authors=[Author(id=1225944421745734117, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, orderNo=0, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=feixuan@haut.edu.cn, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1225944421867368945, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, authorId=1225944421745734117, language=EN, stringName=Xuan FEI, firstName=Xuan, middleName=null, lastName=FEI, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1225944422039335427, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, authorId=1225944421745734117, language=CN, stringName=费选, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=河南工业大学人工智能与大数据学院, 郑州 450001, bio={"content":"

费选(1986—),男,汉族,河南郑州人,博士,副教授。研究方向:高光谱遥感影像分析。E-mail:

"}, bioImg=null, bioContent=

费选(1986—),男,汉族,河南郑州人,博士,副教授。研究方向:高光谱遥感影像分析。E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1225944421590544847, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, xref=null, ext=[AuthorCompanyExt(id=1225944421598933456, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China), AuthorCompanyExt(id=1225944421611516370, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=河南工业大学人工智能与大数据学院, 郑州 450001)])]), Author(id=1225944422194524695, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, orderNo=1, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1225944422332936738, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, authorId=1225944422194524695, language=EN, stringName=Meng-yao GUO, firstName=Meng-yao, middleName=null, lastName=GUO, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1225944422471348785, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, authorId=1225944422194524695, language=CN, stringName=郭梦瑶, firstName=null, middleName=null, 
lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=河南工业大学人工智能与大数据学院, 郑州 450001, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1225944421590544847, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, xref=null, ext=[AuthorCompanyExt(id=1225944421598933456, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China), AuthorCompanyExt(id=1225944421611516370, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=河南工业大学人工智能与大数据学院, 郑州 450001)])]), Author(id=1225944422580400703, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, orderNo=2, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1225944422739784271, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, authorId=1225944422580400703, language=EN, stringName=Si-jia WU, firstName=Si-jia, middleName=null, lastName=WU, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1225944422857224794, 
tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, authorId=1225944422580400703, language=CN, stringName=吴思佳, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=河南工业大学人工智能与大数据学院, 郑州 450001, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1225944421590544847, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, xref=null, ext=[AuthorCompanyExt(id=1225944421598933456, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China), AuthorCompanyExt(id=1225944421611516370, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=河南工业大学人工智能与大数据学院, 郑州 450001)])]), Author(id=1225944423029191274, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, orderNo=3, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1225944423238906508, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, authorId=1225944423029191274, language=EN, stringName=Zi-long JIN, firstName=Zi-long, middleName=null, lastName=JIN, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=School of Artificial 
Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1225944423419261598, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, authorId=1225944423029191274, language=CN, stringName=靳子泷, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=河南工业大学人工智能与大数据学院, 郑州 450001, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1225944421590544847, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, xref=null, ext=[AuthorCompanyExt(id=1225944421598933456, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China), AuthorCompanyExt(id=1225944421611516370, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=河南工业大学人工智能与大数据学院, 郑州 450001)])]), Author(id=1225944424807576236, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, orderNo=4, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1225944424979542718, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, authorId=1225944424807576236, language=EN, stringName=Ding MA, 
firstName=Ding, middleName=null, lastName=MA, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1225944425159897818, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, authorId=1225944424807576236, language=CN, stringName=马丁, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=河南工业大学人工智能与大数据学院, 郑州 450001, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1225944421590544847, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, xref=null, ext=[AuthorCompanyExt(id=1225944421598933456, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China), AuthorCompanyExt(id=1225944421611516370, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=河南工业大学人工智能与大数据学院, 郑州 450001)])])], keywords=[Keyword(id=1225944425466082054, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=EN, orderNo=1, keyword=remote sensing images), Keyword(id=1225944425579328279, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=EN, orderNo=2, keyword=target 
detection), Keyword(id=1225944425856152373, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=EN, orderNo=3, keyword=YOLOv7), Keyword(id=1225944426040701777, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=EN, orderNo=4, keyword=multilayer features), Keyword(id=1225944426179113825, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=EN, orderNo=5, keyword=attention mechanism), Keyword(id=1225944426283971437, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=CN, orderNo=1, keyword=遥感图像), Keyword(id=1225944426397217659, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=CN, orderNo=2, keyword=目标检测), Keyword(id=1225944426564989838, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=CN, orderNo=3, keyword=YOLOv7), Keyword(id=1225944426699207576, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=CN, orderNo=4, keyword=多层特征), Keyword(id=1225944426921505714, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=CN, orderNo=5, keyword=注意力机制)], refs=[Reference(id=1225944432160190754, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2022, volume=49, issue=4, pageStart=49, pageEnd=65, url=null, language=null, rfNumber=[1], rfOrder=0, authorNames=马梁, 苟于涛, 雷涛, journalName=光电工程, refType=null, unstructuredReference=马梁, 苟于涛, 雷涛, 等. 基于多尺度特征融合的遥感图像小目标检测[J]. 
光电工程, 2022, 49(4): 49-65., articleTitle=基于多尺度特征融合的遥感图像小目标检测, refAbstract=null), Reference(id=1225944432327962927, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2022, volume=49, issue=4, pageStart=49, pageEnd=65, url=null, language=null, rfNumber=[1], rfOrder=1, authorNames=Ma Liang, Gou Yutao, Lei Tao, journalName=Photovoltaic Engineering, refType=null, unstructuredReference=Ma Liang, Gou Yutao, Lei Tao, et al. Small target detection in remote sensing images based on multi-scale feature fusion[J]. Photovoltaic Engineering, 2022, 49(4): 49-65., articleTitle=Small target detection in remote sensing images based on multi-scale feature fusion, refAbstract=null), Reference(id=1225944433707888963, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2020, volume=15, issue=6, pageStart=75, pageEnd=76, url=null, language=null, rfNumber=[2], rfOrder=2, authorNames=程塨, 周培诚, 韩军伟, journalName=科学观察, refType=null, unstructuredReference=程塨, 周培诚, 韩军伟. 基于旋转不变卷积神经网络的高分辨率光学遥感图像目标检测[J]. 科学观察, 2020, 15(6): 75-76., articleTitle=基于旋转不变卷积神经网络的高分辨率光学遥感图像目标检测, refAbstract=null), Reference(id=1225944433812746570, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2020, volume=15, issue=6, pageStart=75, pageEnd=76, url=null, language=null, rfNumber=[2], rfOrder=3, authorNames=Cheng Gong, Zhou Peicheng, Han Junwei, journalName=Scientific Observation, refType=null, unstructuredReference=Cheng Gong, Zhou Peicheng, Han Junwei. Target detection in high-resolution optical remote sensing images based on rotationally invariant convolutional neural networks[J]. 
Scientific Observation, 2020, 15(6): 75-76., articleTitle=Target detection in high-resolution optical remote sensing images based on rotationally invariant convolutional neural networks, refAbstract=null), Reference(id=1225944433976324431, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2023, volume=44, issue=3, pageStart=35, pageEnd=41, url=null, language=null, rfNumber=[3], rfOrder=4, authorNames=院老虎, 常玉坤, 刘家夫, journalName=郑州大学学报(工学版), refType=null, unstructuredReference=院老虎, 常玉坤, 刘家夫. 基于改进 YOLOv5s 的雾天场景车辆检测方法[J]. 郑州大学学报(工学版), 2023, 44(3): 35-41., articleTitle=基于改进 YOLOv5s 的雾天场景车辆检测方法, refAbstract=null), Reference(id=1225944434173456730, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2023, volume=44, issue=3, pageStart=35, pageEnd=41, url=null, language=null, rfNumber=[3], rfOrder=5, authorNames=Yuan Laohu, Chang Yukun, Liu Jiafu, journalName=Journal of Zhengzhou University (Engineering Science), refType=null, unstructuredReference=Yuan Laohu, Chang Yukun, Liu Jiafu. Vehicle detection method based on improved YOLOv5s in foggy scene[J]. Journal of Zhengzhou University (Engineering Science), 2023, 44(3): 35-41., articleTitle=Vehicle detection method based on improved YOLOv5s in foggy scene, refAbstract=null), Reference(id=1225944434316063078, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2014, volume=null, issue=null, pageStart=580, pageEnd=587, url=null, language=null, rfNumber=[4], rfOrder=6, authorNames=Girshick R, Donahue J, Darrell T, journalName=Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, refType=null, unstructuredReference=Girshick R, Donahue J, Darrell T, et al. 
Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2014: 580-587., articleTitle=Rich feature hierarchies for accurate object detection and semantic segmentation, refAbstract=null), Reference(id=1225944434513195378, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2016, volume=null, issue=null, pageStart=1440, pageEnd=1448, url=null, language=null, rfNumber=[5], rfOrder=7, authorNames=Girshick R, journalName=IEEE International Conference on Computer Vision(ICCV), refType=null, unstructuredReference=Girshick R. Fast R-CNN[C]// IEEE International Conference on Computer Vision(ICCV). Piscataway: IEEE, 2016: 1440-1448., articleTitle=Fast R-CNN, refAbstract=null), Reference(id=1225944434643218813, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2017, volume=39, issue=6, pageStart=1137, pageEnd=1149, url=null, language=null, rfNumber=[6], rfOrder=8, authorNames=Ren S Q, He K M, Girshick R, journalName=IEEE Transactions on Pattern Analysis and Machine Intelligence, refType=null, unstructuredReference=Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149., articleTitle=Faster R-CNN: towards real-time object detection with region proposal networks, refAbstract=null), Reference(id=1225944434777436551, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2016, volume=null, issue=null, pageStart=779, pageEnd=788, url=null, language=null, rfNumber=[7], rfOrder=9, authorNames=Redmon J, Divvala S, Girshick R, journalName=IEEE Conference on Computer Vision and Pattern Recognition (CVPR), refType=null, unstructuredReference=Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2016: 779-788., articleTitle=You only look once: unified, real-time object detection, refAbstract=null), Reference(id=1225944434907459986, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2017, volume=null, issue=null, pageStart=6517, pageEnd=6525, url=null, language=null, rfNumber=[8], rfOrder=10, authorNames=Redmon J, Farhadi A, journalName=Honolulu, H. I., refType=null, unstructuredReference=Redmon J, Farhadi A. YOLO9000:better, faster, stronger[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, H. 
I.: IEEE, 2017: 6517-6525., articleTitle=YOLO9000:better, faster, stronger, refAbstract=null), Reference(id=1225944434999734684, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2020, volume=null, issue=null, pageStart=1, pageEnd=17, url=null, language=null, rfNumber=[9], rfOrder=11, authorNames=Bochkovskiy A, Wang C Y, Liao H Y M, journalName=Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, refType=null, unstructuredReference=Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: optimal speed and accuracy of object detection[C]// Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1-17., articleTitle=YOLOv4: optimal speed and accuracy of object detection, refAbstract=null), Reference(id=1225944435087815076, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2023, volume=23, issue=4, pageStart=1598, pageEnd=1606, url=null, language=null, rfNumber=[10], rfOrder=12, authorNames=李启明, 阙祖航, journalName=科学技术与工程, refType=null, unstructuredReference=李启明, 阙祖航. 基于改进 YOLOv5 的 X 射线图像危险品检测[J]. 科学技术与工程, 2023, 23(4): 1598-1606., articleTitle=基于改进 YOLOv5 的 X 射线图像危险品检测, refAbstract=null), Reference(id=1225944435222032812, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2023, volume=23, issue=4, pageStart=1598, pageEnd=1606, url=null, language=null, rfNumber=[10], rfOrder=13, authorNames=Li Qiming, Que Zuhang, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Li Qiming, Que Zuhang. Detection of dangerous objects in X-ray images based on improved YOLOv5[J]. 
Science Technology and Engineering, 2023, 23(4): 1598-1606., articleTitle=Detection of dangerous objects in X-ray images based on improved YOLOv5, refAbstract=null), Reference(id=1225944435352056243, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=14, pageStart=5901, pageEnd=5910, url=null, language=null, rfNumber=[11], rfOrder=14, authorNames=周孟然, 王昊男, 高立鹏, journalName=科学技术与工程, refType=null, unstructuredReference=周孟然, 王昊男, 高立鹏, 等. 基于YOLOv5s-FCS的钢材表面缺陷检测[J]. 科学技术与工程, 2024, 24(14): 5901-5910., articleTitle=基于YOLOv5s-FCS的钢材表面缺陷检测, refAbstract=null), Reference(id=1225944435473691063, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=14, pageStart=5901, pageEnd=5910, url=null, language=null, rfNumber=[11], rfOrder=15, authorNames=Zhou Mengran, Wang Haonan, Gao Lipeng, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Zhou Mengran, Wang Haonan, Gao Lipeng, et al. YOLOv5s-FCS based steel surface defect detection study[J]. Science Technology and Engineering, 2024, 24(14): 5901-5910., articleTitle=YOLOv5s-FCS based steel surface defect detection study, refAbstract=null), Reference(id=1225944435591131585, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=9, pageStart=3869, pageEnd=3875, url=null, language=null, rfNumber=[12], rfOrder=16, authorNames=郭华玲, 刘佳帅, 郑宾, journalName=科学技术与工程, refType=null, unstructuredReference=郭华玲, 刘佳帅, 郑宾, 等. 融合RepVGG的YOLOv5交通标志识别算法[J]. 
科学技术与工程, 2024, 24(9): 3869-3875., articleTitle=融合RepVGG的YOLOv5交通标志识别算法, refAbstract=null), Reference(id=1225944435750515144, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=9, pageStart=3869, pageEnd=3875, url=null, language=null, rfNumber=[12], rfOrder=17, authorNames=Guo Hualing, Liu Jiashuai, Zheng Bin, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Guo Hualing, Liu Jiashuai, Zheng Bin, et al. YOLOv5 traffic sign recognition algorithm combined with RepVGG[J]. Science Technology and Engineering, 2024, 24(9): 3869-3875., articleTitle=YOLOv5 traffic sign recognition algorithm combined with RepVGG, refAbstract=null), Reference(id=1225944435884732884, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=16, pageStart=6766, pageEnd=6774, url=null, language=null, rfNumber=[13], rfOrder=18, authorNames=蒋启超, 余成波, 宣以国, journalName=科学技术与工程, refType=null, unstructuredReference=蒋启超, 余成波, 宣以国, 等. 基于轻量级主干的YOLOv5驾驶员疲劳检测算法[J]. 科学技术与工程, 2024, 24(16): 6766-6774., articleTitle=基于轻量级主干的YOLOv5驾驶员疲劳检测算法, refAbstract=null), Reference(id=1225944436044116439, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=16, pageStart=6766, pageEnd=6774, url=null, language=null, rfNumber=[13], rfOrder=19, authorNames=Jiang Qichao, Yu Chengbo, Xuan Yiguo, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Jiang Qichao, Yu Chengbo, Xuan Yiguo, et al. Driver fatigue detection algorithm based on lightweight YOLOv5[J]. 
Science Technology and Engineering, 2024, 24(16): 6766-6774., articleTitle=Driver fatigue detection algorithm based on lightweight YOLOv5, refAbstract=null), Reference(id=1225944436186722786, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2023, volume=15, issue=16, pageStart=3975, pageEnd=null, url=null, language=null, rfNumber=[14], rfOrder=20, authorNames=Li C, Xu R, Lü Y, journalName=Remote Sensing, refType=null, unstructuredReference=Li C, Xu R, Lü Y, et al. Edge realtime object detection and DPU-based hardware implementation for optical remote sensing images[J]. Remote Sensing, 2023, 15(16): 3975., articleTitle=Edge realtime object detection and DPU-based hardware implementation for optical remote sensing images, refAbstract=null), Reference(id=1225944436295774694, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2023, volume=59, issue=8, pageStart=227, pageEnd=238, url=null, language=null, rfNumber=[15], rfOrder=21, authorNames=张朝阳, 张上, 王恒涛, journalName=计算机工程与应用, refType=null, unstructuredReference=张朝阳, 张上, 王恒涛, 等. 多尺度下遥感小目标多头注意力检测[J]. 计算机工程与应用, 2023, 59(8): 227-238., articleTitle=多尺度下遥感小目标多头注意力检测, refAbstract=null), Reference(id=1225944436438381040, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2023, volume=59, issue=8, pageStart=227, pageEnd=238, url=null, language=null, rfNumber=[15], rfOrder=22, authorNames=Zhang Chaoyang, Zhang Shang, Wang Hengtao, journalName=Computer Engineering and Applications, refType=null, unstructuredReference=Zhang Chaoyang, Zhang Shang, Wang Hengtao, et al. Remote sensing of small targets with multiple attention at multiple scales force detection[J]. 
Computer Engineering and Applications, 2023, 59(8): 227-238., articleTitle=Remote sensing of small targets with multiple attention at multiple scales force detection, refAbstract=null), Reference(id=1225944436568404468, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2023, volume=null, issue=null, pageStart=null, pageEnd=null, url=https://arxiv.org/pdf/2303.09030.pdf, language=null, rfNumber=[16], rfOrder=23, authorNames=Li Y, Hou Q, Zheng Z, journalName=arXiv Preprint, refType=null, unstructuredReference=Li Y, Hou Q, Zheng Z, et al. Large selective kernel network for remote sensing object detection[J]. arXiv Preprint, 2023: https://arxiv.org/pdf/2303.09030.pdf., articleTitle=Large selective kernel network for remote sensing object detection, refAbstract=null), Reference(id=1225944436677456381, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2020, volume=12, issue=null, pageStart=1432, pageEnd=null, url=null, language=null, rfNumber=[17], rfOrder=24, authorNames=Rabbi J, Ray N, Schubert M, journalName=Remote Sensing, refType=null, unstructuredReference=Rabbi J, Ray N, Schubert M, et al. Small objectdetection in remote sensing images with end-to-end edge-enhanced GAN and object detector network[J]. Remote Sensing, 2020, 12: 1432., articleTitle=Small objectdetection in remote sensing images with end-to-end edge-enhanced GAN and object detector network, refAbstract=null), Reference(id=1225944436790702595, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2023, volume=61, issue=null, pageStart=1, pageEnd=15, url=null, language=null, rfNumber=[18], rfOrder=25, authorNames=Zhang J, Lei J, Xie W, journalName=IEEE Transactions on Geoscience and Remote Sensing, refType=null, unstructuredReference=Zhang J, Lei J, Xie W, et al. 
SuperYOLO: super resolution assisted object detection in multimodal remote sensing imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1-15., articleTitle=SuperYOLO: super resolution assisted object detection in multimodal remote sensing imagery, refAbstract=null), Reference(id=1225944438199988745, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=null, pageStart=null, pageEnd=null, url=https://arxiv.org/abs/2207.02696, language=null, rfNumber=[19], rfOrder=26, authorNames=Wang C Y, Bochkovskiy A, Liao H Y M, journalName=arXiv Preprint, refType=null, unstructuredReference=Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies setsnew state-of-the-art for realtime object detectors[J]. arXiv Preprint, 2022: https://arxiv.org/abs/2207.02696., articleTitle=YOLOv7: trainable bag-of-freebies setsnew state-of-the-art for realtime object detectors, refAbstract=null), Reference(id=1225944438325817875, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=null, pageStart=null, pageEnd=null, url=https://arxiv.org/pdf/2205.04437.pdf, language=null, rfNumber=[20], rfOrder=27, authorNames=Chen X Y, Wang X T, Zhou J T, journalName=arXiv Preprint, refType=null, unstructuredReference=Chen X Y, Wang X T, Zhou J T, et al. Activating more pixels in image super-resolution transformer[J]. 
arXiv Preprint, 2022: https://arxiv.org/pdf/2205.04437.pdf., articleTitle=Activating more pixels in image super-resolution transformer, refAbstract=null), Reference(id=1225944438426481177, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=null, pageStart=null, pageEnd=null, url=https://arxiv.org/pdf/2112.05561.pdf, language=null, rfNumber=[21], rfOrder=28, authorNames=Liu Y, Shao Z, Hoffmann N, journalName=arXiv Preprint, refType=null, unstructuredReference=Liu Y, Shao Z, Hoffmann N. Global attention mechanism: retain information to enhance channel-spatial interactions[J]. arXiv Preprint, 2021: https://arxiv.org/pdf/2112.05561.pdf., articleTitle=Global attention mechanism: retain information to enhance channel-spatial interactions, refAbstract=null), Reference(id=1225944438573281827, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=null, pageStart=null, pageEnd=null, url=https://arxiv.org/abs/2110.13389, language=null, rfNumber=[22], rfOrder=29, authorNames=Wang J W, Xu C, Yang W, journalName=arXiv Preprint, refType=null, unstructuredReference=Wang J W, Xu C, Yang W, et al. A normalized gaussian wasserstein distance for tiny object detection[J]. arXiv Preprint, 2021: https://arxiv.org/abs/2110.13389., articleTitle=A normalized gaussian wasserstein distance for tiny object detection, refAbstract=null), Reference(id=1225944438690722344, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, doi=null, pmid=null, pmcid=null, year=2016, volume=null, issue=null, pageStart=21, pageEnd=37, url=null, language=null, rfNumber=[23], rfOrder=30, authorNames=Liu W, Anguelov D, Erhan D, journalName=Proceedings of European Conference on Computer Vision, refType=null, unstructuredReference=Liu W, Anguelov D, Erhan D, et al. 
SSD: single shot multi box detector[C]// Proceedings of European Conference on Computer Vision. Cham: Springer, 2016: 21-37., articleTitle=SSD: single shot multi box detector, refAbstract=null)], funds=[Fund(id=1225944431250026713, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, awardId=62006072, language=CN, fundingSource=国家自然科学基金青年科学基金(62006072), fundOrder=null, country=null), Fund(id=1225944431426187498, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, awardId=222102210108, language=CN, fundingSource=河南省重点研发与推广专项(科技攻关)项目(222102210108), fundOrder=null, country=null), Fund(id=1225944431581376758, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, awardId=KFJJ2022013, language=CN, fundingSource=粮食处理与控制教育部重点实验室开放课题(KFJJ2022013), fundOrder=null, country=null), Fund(id=1225944431715594501, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, awardId=2022ZKCJ11, language=CN, fundingSource=河南工业大学创新基金支持计划专项资助(2022ZKCJ11), fundOrder=null, country=null), Fund(id=1225944431845617933, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, awardId=null, language=CN, fundingSource=河南工业大学青年骨干教师培育计划, fundOrder=null, country=null)], companyList=[AuthorCompany(id=1225944421590544847, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, xref=null, ext=[AuthorCompanyExt(id=1225944421598933456, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, companyId=1225944421590544847, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China), AuthorCompanyExt(id=1225944421611516370, tenantId=1146029695717560320, journalId=1146123166801305609, 
articleId=1156949464423420645, companyId=1225944421590544847, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=河南工业大学人工智能与大数据学院, 郑州 450001)])], figs=[ArticleFig(id=1225944427227689940, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=EN, label=Fig.1, caption=Improvement of YOlO-HF overall structure, figureFileSmall=fAFj2Pgfpzs8gvgd7Fyvhw==, figureFileBig=b21pnee9M8uTwxmEvPBBKQ==, tableContent=null), ArticleFig(id=1225944427345130466, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1156949464423420645, language=CN, label=图1, caption=YOLO-HF整体结构图

CBS为卷积归一化和SiLU激活函数模块;ELAN为高效层聚合网络模块;MP为最大池化操作;CBM为卷积归一化和sigmoid激活函数模块;REP为3×3卷积和1×1卷积和残差连接组合的一个卷积层;CBS_x_x中,C为2D卷积,B为批归一化层(batch normalization,BN),S为所使用的激活函数silu;为了同sigmoid区分,用M表示sigmoid激活函数;x_x中,第1个x为卷积核尺寸和步长;虚线框所标出部分为修改部分;backbone为主干特征提取网络;HAT_Block为本文所引入的混合注意力模块;head为检测头;cat为凭借操作;UpSample为上采样操作;SPPCSPC为YOLOv7结构中特殊的卷积层;GAM_ATT为全局注意力块


Fig. 2 HAT_Block structure
Conv is convolution; RHAG is the residual hybrid attention group; cat is concatenation; HAB is the hybrid attention block, whose structure is shown where the HAB arrow points; OCAB is the overlapping cross-attention block; CAB is the channel attention block; W-MSA is window-based multi-head self-attention; OCA is overlapping cross-attention; LayerNorm is layer normalization; MLP is the multilayer perceptron.

Fig. 3 GAM_Att structure
Permutation is the dimension permutation (transposition) operation; MLP is the multilayer perceptron; Reverse Permutation is the inverse permutation; sig is the sigmoid activation function; Conv is convolution.

Fig. 4 mAP variation on the RSOD dataset
Epoch is the training round.

Fig. 5 Comparison of airplane detection results

Table 1 Comparison results of some two-stage and single-stage algorithms trained on the RSOD dataset

| Algorithm | Airplane AP/% | Playground AP/% | Overpass AP/% | Oil tank AP/% | mAP/% |
| --- | --- | --- | --- | --- | --- |
| Faster R-CNN | 63.51 | 99.26 | 88.59 | 92.82 | 86.04 |
| SSD | 45.42 | 99.06 | 68.73 | 91.95 | 76.29 |
| YOLOv5 | 93.09 | 98.15 | 66.26 | 98.15 | 88.70 |
| YOLOv7 | 92.29 | 98.77 | 75.17 | 96.88 | 90.78 |
| Proposed algorithm | 93.59 | 99.10 | 85.15 | 96.87 | 93.68 |

Table 2 Comparison results of some two-stage and single-stage algorithms trained on the NWPU VHR-10 dataset (detection results in %)

| Class | Faster R-CNN | SSD | YOLOv5 | YOLOv7 | Proposed algorithm |
| --- | --- | --- | --- | --- | --- |
| Airplane | 98.31 | 90.40 | 99.95 | 100.00 | 99.99 |
| Baseball field | 99.55 | 89.90 | 97.73 | 98.07 | 98.10 |
| Basketball court | 95.35 | 80.60 | 83.18 | 89.86 | 96.05 |
| Bridge | 86.33 | 76.70 | 75.61 | 71.29 | 87.89 |
| Track field | 99.95 | 98.31 | 100.00 | 99.19 | 99.90 |
| Harbor | 95.75 | 73.40 | 89.39 | 91.35 | 94.56 |
| Ship | 75.60 | 60.90 | 82.44 | 86.64 | 85.40 |
| Oil tank | 65.60 | 79.80 | 98.33 | 93.12 | 95.51 |
| Tennis court | 81.80 | 82.60 | 91.11 | 90.48 | 93.89 |
| Vehicle | 47.82 | 52.10 | 73.93 | 81.97 | 86.83 |
| mAP | 84.61 | 78.40 | 89.17 | 90.20 | 93.81 |

Table 3 Results of ablation experiment

| Algorithm | Improvement 1 (HAT_Block) | Improvement 2 (GAM_Att) | Improvement 3 (NWD metric) | mAP/% |
| --- | --- | --- | --- | --- |
| YOLOv7 | | | | 90.78 |
| HAT_Block | ✓ | | | 91.07 |
| GAM_Att | | ✓ | | 91.02 |
| NWD metric | | | ✓ | 90.90 |
| Proposed algorithm | ✓ | ✓ | ✓ | 93.68 |
Improved YOLO Algorithm via Fusing Multilayer Features and Contextual Information
Xuan FEI, Meng-yao GUO, Si-jia WU, Zi-long JIN, Ding MA
Science Technology and Engineering | Papers·Automation and Computational Technology, 2025, 25(4): 1555-1562
Affiliations
  • School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China
  • Xuan FEI (born 1986), male, Han ethnicity, from Zhengzhou, Henan; Ph.D., associate professor. Research interest: hyperspectral remote sensing image analysis. E-mail:
Published: 2025-02-08 doi: 10.12404/j.issn.1671-1815.2309878

Target detection in remote sensing images is of great significance in military reconnaissance, smart agriculture, and other fields, and small-target detection in particular has received sustained attention. However, small targets in remote sensing images suffer from insufficient feature information and are hard to detect, which has become the biggest obstacle to the development of remote sensing detection applications. To address this, the YOLO-HF (you only look once-hybrid feature) algorithm is proposed. Within the network of the conventional YOLOv7 model, it introduces a hybrid attention mechanism that combines channel attention and self-attention to extract deep target features, and fuses shallow and deep features to enrich local features. To further strengthen attention to global information, a global attention mechanism is added for small-scale targets after feature extraction, improving global feature representation. Because conventional loss functions are sensitive to positional deviations of small targets, which degrades detection, a new metric is embedded into the bounding-box loss computation to accelerate convergence of the loss function and improve small-target detection accuracy. Experimental results show that, compared with the conventional YOLOv7 algorithm, the proposed algorithm performs better on both the RSOD and NWPU VHR-10 datasets: mean average precision (mAP) improves by 2.90% on RSOD and by 3.61% on NWPU VHR-10.

remote sensing images  /  target detection  /  YOLOv7  /  multilayer features  /  attention mechanism
Xuan FEI, Meng-yao GUO, Si-jia WU, Zi-long JIN, Ding MA. Improved YOLO Algorithm via Fusing Multilayer Features and Contextual Information[J]. Science Technology and Engineering, 2025, 25(4): 1555-1562. DOI: 10.12404/j.issn.1671-1815.2309878
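Object detection is an important research direction in computer vision and has attracted wide attention. Unlike object detection in conventional natural images, object detection in remote sensing images has significant theoretical and practical value for environmental monitoring, animal protection, traffic management, national defense, and other fields [1]. Remote sensing images acquired by different sensors are rich in information, diverse in type, and complex in background; in particular, the dataset samples contain many small targets with indistinct features. Moreover, traditional object detection relies on hand-crafted feature extraction [2], which captures too little feature information to discriminate small targets effectively and has hindered the further development and application of small-target detection in remote sensing images.
In recent years, as hardware, software, and computing resources have advanced, many researchers have combined deep learning methods, especially convolutional neural networks (CNNs), with various fields and achieved notable results, drawing wide attention from the object detection community [3]. Girshick et al. [4] applied a CNN to object detection, using it to extract region features from images; the resulting detector is R-CNN (regions with CNN features). In contrast to the traditional sliding-window approach, which examines every region that might contain a target one by one, Girshick et al. [4] proposed first extracting the most likely candidate regions and then using a CNN to extract features from these candidates in order to classify and recognize targets. This idea was highly influential and opened a new line of research. The subsequent Fast R-CNN algorithm [5] built on R-CNN by introducing an RoI (region of interest) pooling layer, which maps candidate regions of various sizes to feature maps of a uniform size. Ren et al. [6] proposed Faster R-CNN, in which a region proposal network (RPN) generates candidate regions quickly; the RPN shares convolutional feature maps with the detector and outputs bounding boxes and confidence scores for the candidate regions. Combining the RPN with Fast R-CNN yields an end-to-end detection system that improves detection accuracy to some extent.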
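At present, two-stage methods represented by the R-CNN family cannot meet real-time requirements, so single-stage methods, which do not generate candidate boxes, have gradually become mainstream. YOLO (you only look once), proposed by Redmon et al. [7], is a leading representative of single-stage methods and has attracted great interest. Unlike earlier detection work, YOLO treats detection as a regression problem over spatially separated bounding boxes and associated class probabilities, and predicts target locations and classes over a grid of cells. Building on this, YOLOv2 [8] uses Darknet-19 as the feature extraction network and, to account for scale diversity, uses anchor boxes to predict targets of different shapes and sizes at different scales, improving detection accuracy. YOLOv3 uses Darknet-53 as its core feature extraction network and embeds a feature pyramid network (FPN) structure on top of it to detect multi-scale targets; the finer-grained anchor boxes improve target localization. YOLOv4 [9] uses CSPDarknet53 as the backbone, with a neck composed mainly of the SPP (spatial pyramid pooling) module, FPN, and the path aggregation network (PAN). YOLOv5 applies Mosaic data augmentation and uses the Focus structure and the CSP (cross-stage partial connections) structure to further improve detection accuracy. Li Qiming et al. [10] improved the YOLOv5 network for dangerous-goods detection in X-ray images, using pruning to shrink the model, a coordinate attention mechanism to focus the network on the targets, and data augmentation to further optimize detection performance. YOLOv7 builds on YOLOv5 by replacing the CSP module with the ELAN module and modifying the pooling operations, further improving detection capability.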
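YOLO has served as the basis for detailed research in many areas. Zhou Mengran et al. [11] built a new convolutional block with FReLU to improve spatial parsing ability and introduced positional attention to address steel defect detection, with good results. Guo Hualing et al. [12] combined RepVGG with YOLOv5 to detect small traffic-sign targets and obtained good results. Jiang Qichao et al. [13] combined a Transformer with YOLO for driver fatigue detection, with advantages in both detection accuracy and model size.
With the introduction and rapid development of deep learning in remote sensing image processing, detection accuracy in remote sensing images has improved considerably. Li et al. [14] replaced the YOLOv4 backbone with MobileNext to reduce the parameter count and added RFB (receptive field block) and ECA (efficient channel attention) structures, validating the model experimentally on remote sensing datasets. Zhang Chaoyang et al. [15] addressed the multi-scale and morphologically diverse nature of remote sensing images by introducing a bidirectional feature pyramid network and fusing the multi-head attention mechanism of Swin Transformer, restructuring the network to optimize YOLO. In addition, because occlusion of small targets commonly causes missed and false detections in remote sensing images, making full use of prior knowledge specific to the targets is important for improving detection. Li et al. [16], observing that the extensive background context in remote sensing images may provide useful information, were the first to explore large-kernel mechanisms in remote sensing object detection and proposed LSKNet (large selective kernel network); it is lighter than other detectors, substantially improves accuracy on several datasets, and greatly reduces the false detection rate. To address unsatisfactory small-target detection in remote sensing images, Rabbi et al. [17] introduced GANs into remote sensing object detection and, in combination with other modules, further improved small-target detection. Zhang et al. [18] improved small-target detection by fusing complementary information from multimodal remote sensing images: they removed certain modules to preserve high-resolution features, extracted information through pixel-level multimodal fusion, and learned high-resolution features through an auxiliary super-resolution branch, distinguishing small objects from large backgrounds with low-resolution input.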
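On this basis, to address the scarcity of feature information for small targets in remote sensing images, YOLO-HF (you only look once-hybrid feature) builds on the YOLOv7 algorithm [19] and improves small-target detection through three modifications. First, the shallow and deep feature extraction modules of the hybrid attention transformer (HAT) model [20] are appended after the features extracted by the YOLOv7 backbone to process them and extract more small-target feature information. Second, the global attention mechanism (GAM) [21] is introduced to strengthen the model's understanding of contextual information and improve detection performance. Third, the normalized Wasserstein distance (NWD) [22] is embedded in the bounding-box loss function, with the corresponding parameters adjusted, to overcome the sensitivity of conventional IoU-based metrics to positional deviations of small targets, assess the similarity between small targets more accurately, and further improve detection accuracy.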
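In the original YOLOv7 network, the backbone is a stack of convolutional layers and does not extract sufficient feature information across levels. Inspired by the way image super-resolution can magnify small targets and enhance their information, HAT_Block is added to the YOLOv7 backbone to extract shallow and deep features, strengthening small-target feature extraction so that later stages have richer features for recognizing and localizing targets. In addition, the global attention mechanism GAM_Att is introduced after the SPPCSPC module to strengthen attention to the contextual information of small targets and further improve small-target detection. The original YOLOv7 uses an IoU (intersection over union) based metric, which is sensitive to positional deviations of small objects and degrades anchor-based detectors. The NWD metric is therefore embedded into the regression loss between predicted and ground-truth boxes, which speeds up convergence of the loss function and improves detection accuracy. The modified overall network is shown in Fig. 1.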
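Inspired by the HAT model for image super-resolution reconstruction, deep target features are extracted, on top of shallow feature extraction, by a hybrid attention mechanism that combines channel attention and self-attention; shallow and deep features are then combined to increase the diversity of local features for small targets. Specifically, shallow features are still extracted by a convolutional layer, while deep features are extracted by the residual hybrid attention group (RHAG) structure. The shallow and deep features are then fused through a residual connection, yielding a result that integrates multilayer features.
The multilayer feature extraction module HAT_Block is shown in Fig. 2. It consists mainly of RHAG modules, a convolution module, and a residual connection. Each RHAG module is composed of hybrid attention blocks (HAB), an overlapping cross-attention block (OCAB), and a convolution module. Because the HAB uses global information when computing channel attention weights, it can activate more pixels and thus strengthens the network's representational power, while the OCAB improves it further by building cross-window connections.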
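To strengthen the network's ability to perceive and capture global context, the GAM attention mechanism is adopted as the context guidance module. Its main idea is to use global context to guide feature weighting and fusion, tying the importance of each feature to the global context; this captures global structure, contextual relationships, and long-range dependencies, and thereby improves the network model. The computation is given by Eq. (1).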
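$F_{3}=M_{s}\left[M_{c}(F_{1})\otimes F_{1}\right]\otimes F_{2}$
In Eq. (1), $F_1$ is the input feature; $M_c(*)$ is the output of applying channel attention to *; $F_2=M_c(F_1)\otimes F_1$ is the intermediate result of combining $M_c(F_1)$ with $F_1$, which also serves as the input to the subsequent spatial attention module; $M_s(*)$ is the output of applying spatial attention to *; and $F_3$ is the output feature.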
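The two-stage gating in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration only: the feature map is flattened to a list of floats, and `m_c` and `m_s` are hypothetical callables standing in for the learned channel and spatial attention sub-networks, which the text does not specify in code form.

```python
from typing import Callable, List, Sequence, Tuple

# An attention function maps a feature vector to per-element weights.
Attention = Callable[[Sequence[float]], Sequence[float]]


def hadamard(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise product, the role played by the ⊗ operator in Eq. (1)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return [x * y for x, y in zip(a, b)]


def gam_forward(f1: Sequence[float], m_c: Attention,
                m_s: Attention) -> Tuple[List[float], List[float]]:
    """Return (F2, F3) with F2 = Mc(F1) ⊗ F1 and F3 = Ms(F2) ⊗ F2."""
    f2 = hadamard(m_c(f1), f1)   # channel attention re-weights the input
    f3 = hadamard(m_s(f2), f2)   # spatial attention re-weights the result
    return f2, f3


# Toy stand-in (an assumption): a constant gate of 0.5 for both stages.
half_gate: Attention = lambda f: [0.5] * len(f)
f2, f3 = gam_forward([1.0, 2.0, 4.0], half_gate, half_gate)
print(f2, f3)
```

With constant gates the output is just the input scaled twice, which makes the sequential structure easy to see; a real module would produce input-dependent weights.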
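The context guidance module GAM_Att is shown in Fig. 3. It is a global attention mechanism that reduces information loss and amplifies global interactive representations, strengthening cross-dimension interaction while preserving channel and spatial information. This helps the detector capture global and long-range information, improving the performance of the deep neural network and the detection results.
The YOLOv7 loss function consists of three terms: objectness confidence loss, class confidence loss, and the regression loss between predicted and ground-truth boxes. The regression loss uses the IoU metric, which is sensitive to positional deviations of small targets and degrades anchor-based detectors. NWD, by contrast, is insensitive to object scale and better suited to measuring similarity between tiny objects, so NWD is used in place of the original IoU metric to obtain better small-target detection.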
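NWD was proposed to reduce the sensitivity of IoU to positional deviations of small objects and thereby improve small-target detection. The main idea is to model each bounding box as a two-dimensional Gaussian distribution, using the box center $(c_x, c_y)$, width $w$, and height $h$. A horizontal box $(c_x, c_y, w, h)$ is mapped to a two-dimensional Gaussian $N(\mu,\Sigma)$, where $\mu$ is the mean and $\Sigma$ is the covariance matrix.
The modeling is defined by Eq. (2).
$\mu =\left[\begin{array}{l}{c}_{x}\\ {c}_{y}\end{array}\right],\Sigma =\left[\begin{array}{ll}\frac{{w}^{2}}{4}& 0\\ 0& \frac{{h}^{2}}{4}\end{array}\right]$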
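As a concrete reading of Eq. (2), the following sketch maps a horizontal box to the mean vector and diagonal covariance matrix of its Gaussian model. The function name `box_to_gaussian` and the tuple-based representation are illustrative assumptions.

```python
from typing import Tuple

Mean = Tuple[float, float]
Covariance = Tuple[Tuple[float, float], Tuple[float, float]]


def box_to_gaussian(cx: float, cy: float, w: float,
                    h: float) -> Tuple[Mean, Covariance]:
    """Map a box (cx, cy, w, h) to (mu, Sigma) as in Eq. (2)."""
    if w <= 0 or h <= 0:
        raise ValueError("width and height must be positive")
    mu = (float(cx), float(cy))
    # Diagonal covariance: the standard deviations are w/2 and h/2.
    sigma = ((w * w / 4.0, 0.0), (0.0, h * h / 4.0))
    return mu, sigma


mu, sigma = box_to_gaussian(10, 20, 4, 6)
print(mu, sigma)
```

Because the covariance is diagonal, its matrix square root is simply diag(w/2, h/2), which is what makes the later simplification possible.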
NWD is then used to measure the similarity of the Gaussian distributions. The Wasserstein distance is computed as
${n}_{1}=N({\mu }_{1},{\Sigma }_{1})$
${n}_{2}=N({\mu }_{2},{\Sigma }_{2})$
$W_{2}^{2}(n_{1},n_{2})=\left\|\mu_{1}-\mu_{2}\right\|_{2}^{2}+\operatorname{tr}\left[\Sigma_{1}+\Sigma_{2}-2\left(\Sigma_{2}^{\frac{1}{2}}\Sigma_{1}\Sigma_{2}^{\frac{1}{2}}\right)^{\frac{1}{2}}\right]$
where $N$ denotes a Gaussian distribution; $n_1$ and $n_2$ both follow Gaussian distributions; $\mu$ is the mean; $\Sigma$ is the covariance matrix; tr is the matrix trace; and $W_2^2$ denotes the squared second-order Wasserstein distance.
Simplifying $W_2^2$ gives Eq. (4), where ${\Sigma }^{\frac{1}{2}}$ is the standard deviation; the trace computation in Eq. (3) is reduced to a Frobenius-norm form.
$W_{2}^{2}(n_{1},n_{2})=\left\|\mu_{1}-\mu_{2}\right\|_{2}^{2}+\left\|\Sigma_{1}^{\frac{1}{2}}-\Sigma_{2}^{\frac{1}{2}}\right\|_{F}^{2}$
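The simplification from the trace form to the Frobenius-norm form can be checked numerically for the diagonal covariances that arise from boxes. The sketch below is an illustrative check, not part of the described method; covariances are passed as their diagonal entries `(var_x, var_y)`, and both function names are assumptions.

```python
import math
from typing import Sequence


def w2_trace_form(mu1: Sequence[float], var1: Sequence[float],
                  mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Squared 2-Wasserstein distance via the trace expression.

    For diagonal covariances the matrix square roots act entry-wise, so
    (S2^1/2 S1 S2^1/2)^1/2 has diagonal entries sqrt(v1 * v2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    trace_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                     for v1, v2 in zip(var1, var2))
    return mean_term + trace_term


def w2_frobenius_form(mu1: Sequence[float], var1: Sequence[float],
                      mu2: Sequence[float], var2: Sequence[float]) -> float:
    """Squared 2-Wasserstein distance via the Frobenius-norm expression."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    frob_term = sum((math.sqrt(v1) - math.sqrt(v2)) ** 2
                    for v1, v2 in zip(var1, var2))
    return mean_term + frob_term


args = ((0.0, 0.0), (4.0, 9.0), (3.0, 4.0), (1.0, 16.0))
print(w2_trace_form(*args), w2_frobenius_form(*args))
```

The two forms agree because v1 + v2 − 2·sqrt(v1·v2) = (sqrt(v1) − sqrt(v2))² for each diagonal entry.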
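On this basis, NWD is computed as a new metric that can replace the commonly used IoU metric in both the loss function and non-maximum suppression. NWD is obtained from the Gaussian models $n_a$ and $n_b$ built for bounding boxes a and b. When embedding it, the parameter iou_ratio adjusts the loss computation to favor small targets: the closer its value is to 0, the better suited the loss is to datasets dominated by small targets. The parameter can therefore be set according to the proportion of small targets in the dataset. Given the proportion of small targets in the datasets used here, it is set to 0.4 in the experiments.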
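One way to read the role of iou_ratio is as a mixing weight between an IoU-based term and an NWD-based term. The sketch below assumes a simple linear blend; the text does not give the exact formula, so the function `blended_box_loss` and its form are illustrative assumptions rather than the authors' implementation.

```python
def blended_box_loss(iou: float, nwd: float, iou_ratio: float = 0.4) -> float:
    """Assumed linear blend of IoU loss and NWD loss for one box pair.

    iou and nwd are similarity scores in [0, 1]. iou_ratio = 1 recovers a
    pure IoU loss; iou_ratio = 0 recovers a pure NWD loss, the setting the
    text describes as most suited to small-target-heavy datasets.
    """
    if not 0.0 <= iou_ratio <= 1.0:
        raise ValueError("iou_ratio must lie in [0, 1]")
    return iou_ratio * (1.0 - iou) + (1.0 - iou_ratio) * (1.0 - nwd)


# With the reported setting of 0.4, NWD carries 60% of the weight.
print(blended_box_loss(iou=0.5, nwd=0.8))
```

Under this reading, lowering iou_ratio shifts the regression signal toward NWD, which is consistent with the statement that values near 0 favor small targets.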
In particular, when bounding boxes a and b are given by center coordinates $(c_x, c_y)$, width $w$, and height $h$, let $n_a$ and $n_b$ denote the Gaussian distributions of a and b. Their Wasserstein distance is computed first, and NWD is then obtained from it:
$W_{2}^{2}(n_{a},n_{b})=\left\|\left[c_{xa},c_{ya},\frac{w_{a}}{2},\frac{h_{a}}{2}\right]^{T}-\left[c_{xb},c_{yb},\frac{w_{b}}{2},\frac{h_{b}}{2}\right]^{T}\right\|_{2}^{2}$
$NWD(n_{a},n_{b})=\exp\left[-\frac{\sqrt{W_{2}^{2}(n_{a},n_{b})}}{C}\right]$
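The two expressions above can be sketched directly in Python. NWD is a decaying exponential of the Wasserstein distance, so identical boxes score 1 and the score falls toward 0 as boxes move apart. The function names are illustrative assumptions, and the default `c=2.0` mirrors the value reported for the datasets in this work.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def wasserstein_sq(box_a: Box, box_b: Box) -> float:
    """Squared 2-Wasserstein distance between the Gaussians of two boxes."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    return ((cxa - cxb) ** 2 + (cya - cyb) ** 2
            + ((wa - wb) / 2.0) ** 2 + ((ha - hb) / 2.0) ** 2)


def nwd(box_a: Box, box_b: Box, c: float = 2.0) -> float:
    """Normalized Wasserstein distance, a similarity score in (0, 1]."""
    if c <= 0:
        raise ValueError("c must be positive")
    return math.exp(-math.sqrt(wasserstein_sq(box_a, box_b)) / c)


a = (10.0, 10.0, 4.0, 4.0)
b = (13.0, 14.0, 4.0, 4.0)   # same size, center shifted by (3, 4)
print(nwd(a, a), nwd(a, b))
```

Unlike IoU, the score stays strictly positive even when the two boxes do not overlap, which is why it still provides a usable signal for tiny objects.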
In Eq. (8), C is a dataset-dependent constant; it is set to 2 for the datasets used here.
2 Experimental Results and Analysis
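2.1 Experimental Datasets and Comparison Algorithms
Both remote sensing datasets used in the experiments contain many small targets and are well suited to validating detection performance. The first is the RSOD dataset, annotated by Wuhan University, with four classes: airplane, playground, overpass, and oil tank. It contains 935 images with 4,993 airplane instances, 191 playgrounds, 180 overpasses, and 1,586 oil tanks. The second is the NWPU VHR-10 dataset, annotated by Northwestern Polytechnical University, with ten classes: airplane, ship, oil tank, baseball field, tennis court, basketball court, track field, harbor, bridge, and vehicle. Its 800 images comprise 150 background images and 650 images that contain targets. Each dataset is split into training, test, and validation sets in a 7:2:1 ratio, that is, 70% for training, 20% for testing, and the remaining 10% for validation. To validate the proposed algorithm, representative two-stage and single-stage algorithms are selected for comparison: Faster R-CNN, which is accurate but slow; SSD (single shot multibox detector) [23], which uses VGGNet as its backbone; and YOLOv5 and YOLOv7 from the YOLO family.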
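Detection quality is judged mainly by the mean average precision (mAP). Other metrics are also used: precision P, recall R, and the per-class average precision AP. They are computed as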
$P=\frac{TP}{TP+FP}$
$R=\frac{TP}{TP+FN}$
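$mAP=\frac{1}{n}\sum _{i=1}^{n}{\int }_{0}^{1}{P}_{i}\left(R\right)dR$
where TP is the number of samples detected as positive that are actually positive; FP is the number of negative samples wrongly detected as positive; FN is the number of positive samples wrongly detected as negative (this case is less common, so its value is usually smaller); $P_i(R)$ is the precision-recall curve of class $i$, whose integral over $[0, 1]$ is the AP of that class; and $n$ is the total number of classes in the dataset.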
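The metrics above can be sketched as plain functions. This is a simplified illustration: the trapezoidal integration of the precision-recall curve is an assumption made for brevity, whereas detection toolkits typically use interpolated curves, and all function names are hypothetical.

```python
def precision(tp, fp):
    """P = TP / (TP + FP); returns 0.0 when there are no detections."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """R = TP / (TP + FN); returns 0.0 when there are no ground-truth positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def average_precision(recalls, precisions):
    """Approximate the integral of P(R) over recall with the trapezoidal rule.

    recalls must be sorted in ascending order and paired with precisions.
    """
    ap = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        ap += width * (precisions[i] + precisions[i - 1]) / 2.0
    return ap

def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Toy example: 8 true positives, 2 false positives, 8 missed targets.
p = precision(8, 2)
r = recall(8, 8)
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.0])
m = mean_average_precision([ap, 0.25])
print(p, r, ap, m)
```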
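The experiments were run on 64-bit Windows 10 with an NVIDIA GeForce RTX 3070Ti (8 GB) GPU, CUDA 11.7, cuDNN 11.0, PyTorch 1.7.1, and Python 3.7.1. Figure 4 shows how mAP (IoU = 0.5) evolves during training: training on this dataset has largely converged by epoch 40 and levels off afterwards.
Table 1 lists the comparative detection results on the RSOD dataset. The mAP of the proposed algorithm is 7.64% higher than that of Faster R-CNN, 17.39% higher than that of SSD, and 4.98% higher than that of YOLOv5. Relative to YOLOv7, the modified algorithm raises mAP on RSOD by 2.9%; the AP for aircraft, playground, and overpass increases by 1.3%, 0.4%, and 10.02%, respectively, while the oil tank result is almost unchanged.
Table 2 lists the comparative detection results on the NWPU VHR-10 dataset. The mAP of the modified algorithm is 9.2% higher than that of Faster R-CNN, 15.41% higher than that of SSD, 4.64% higher than that of YOLOv5, and 3.61% higher than that of YOLOv7. The AP gains are most pronounced for basketball court, bridge, harbor, tennis court, and vehicle.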
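To illustrate the detection results visually, Fig. 5 shows an image from the RSOD dataset processed with identical experimental parameters. Fig. 5(a) shows the YOLOv7 result, in which two aircraft instances are missed; Fig. 5(b) shows the result of the proposed algorithm. Under the same conditions, YOLOv7 misses aircraft instances, whereas the proposed algorithm detects more small targets.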
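To verify that each modification is effective, ablation experiments were carried out on the RSOD dataset; Table 3 reports the results. The mAP gain from adding the HAT_Block module comes mainly from the higher AP on overpasses. Adding only HAT_Block raises mAP by 0.31%; adding only GAM_Att raises it by 0.36%; changing only the metric yields the smallest gain, 0.2%. With all three modifications applied, the result improves by 2.9% over the original YOLOv7, which indicates that the modifications are effective on the RSOD dataset.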
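In remote sensing object detection, small targets carry little information and discriminative features are hard to extract, which lowers overall detection accuracy. To improve small-target detection accuracy, the YOLOv7 framework was modified. The experiments lead to the following conclusions.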
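(1) A hybrid attention mechanism is introduced to extract deep features, which are fused with shallow features to enrich multilayer local features and thereby improve detection accuracy.
(2) A global attention mechanism is added to exploit contextual information, further strengthening global feature representation.
(3) The NWD metric is incorporated into the bounding-box loss computation to mitigate the sensitivity of the original model's IoU metric to positional deviations of small targets, improving detection accuracy.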
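  • Youth Science Fund of the National Natural Science Foundation of China (62006072)
  • Henan Province Key R&D and Promotion Special Project (Science and Technology Research) (222102210108)
  • Open Project of the Key Laboratory of Grain Processing and Control, Ministry of Education (KFJJ2022013)
  • Innovation Fund Support Plan of Henan University of Technology (2022ZKCJ11)
  • Young Backbone Teacher Training Program of Henan University of Technology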
References
[1] Ma Liang, Gou Yutao, Lei Tao, et al. Small target detection in remote sensing images based on multi-scale feature fusion[J]. Opto-Electronic Engineering, 2022, 49(4): 49-65. (in Chinese)
[2] Cheng Gong, Zhou Peicheng, Han Junwei. Target detection in high-resolution optical remote sensing images based on rotationally invariant convolutional neural networks[J]. Scientific Observation, 2020, 15(6): 75-76. (in Chinese)
[3] Yuan Laohu, Chang Yukun, Liu Jiafu. Vehicle detection method based on improved YOLOv5s in foggy scene[J]. Journal of Zhengzhou University (Engineering Science), 2023, 44(3): 35-41. (in Chinese)
[4] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2014: 580-587.
[5] Girshick R. Fast R-CNN[C]// IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2016: 1440-1448.
[6] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[7] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2016: 779-788.
[8] Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI: IEEE, 2017: 6517-6525.
[9] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: optimal speed and accuracy of object detection[C]// Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1-17.
[10] Li Qiming, Que Zuhang. Detection of dangerous objects in X-ray images based on improved YOLOv5[J]. Science Technology and Engineering, 2023, 23(4): 1598-1606. (in Chinese)
[11] Zhou Mengran, Wang Haonan, Gao Lipeng, et al. YOLOv5s-FCS based steel surface defect detection study[J]. Science Technology and Engineering, 2024, 24(14): 5901-5910. (in Chinese)
[12] Guo Hualing, Liu Jiashuai, Zheng Bin, et al. YOLOv5 traffic sign recognition algorithm combined with RepVGG[J]. Science Technology and Engineering, 2024, 24(9): 3869-3875. (in Chinese)
[13] Jiang Qichao, Yu Chengbo, Xuan Yiguo, et al. Driver fatigue detection algorithm based on lightweight YOLOv5[J]. Science Technology and Engineering, 2024, 24(16): 6766-6774. (in Chinese)
[14] Li C, Xu R, Y, et al. Edge real-time object detection and DPU-based hardware implementation for optical remote sensing images[J]. Remote Sensing, 2023, 15(16): 3975.
[15] Zhang Chaoyang, Zhang Shang, Wang Hengtao, et al. Multi-head attention detection of small remote sensing targets at multiple scales[J]. Computer Engineering and Applications, 2023, 59(8): 227-238. (in Chinese)
[16] Li Y, Hou Q, Zheng Z, et al. Large selective kernel network for remote sensing object detection[J]. arXiv Preprint, 2023: https://arxiv.org/pdf/2303.09030.pdf.
[17] Rabbi J, Ray N, Schubert M, et al. Small-object detection in remote sensing images with end-to-end edge-enhanced GAN and object detector network[J]. Remote Sensing, 2020, 12: 1432.
[18] Zhang J, Lei J, Xie W, et al. SuperYOLO: super resolution assisted object detection in multimodal remote sensing imagery[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1-15.
[19] Wang C Y, Bochkovskiy A, Liao H Y M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[J]. arXiv Preprint, 2022: https://arxiv.org/abs/2207.02696.
[20] Chen X Y, Wang X T, Zhou J T, et al. Activating more pixels in image super-resolution transformer[J]. arXiv Preprint, 2022: https://arxiv.org/pdf/2205.04437.pdf.
[21] Liu Y, Shao Z, Hoffmann N. Global attention mechanism: retain information to enhance channel-spatial interactions[J]. arXiv Preprint, 2021: https://arxiv.org/pdf/2112.05561.pdf.
[22] Wang J W, Xu C, Yang W, et al. A normalized Gaussian Wasserstein distance for tiny object detection[J]. arXiv Preprint, 2021: https://arxiv.org/abs/2110.13389.
[23] Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector[C]// Proceedings of European Conference on Computer Vision. Cham: Springer, 2016: 21-37.
2025, Vol. 25, No. 4
Article information
doi: 10.12404/j.issn.1671-1815.2309878
  • Received: 2023-12-14
  • First online: 2025-07-29
  • Published: 2025-02-08
  • Revised: 2024-11-19
Author affiliation: School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001