Article(id=1209816724317278467, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1209811339510411616, articleNumber=null, orderNo=null, doi=10.12404/j.issn.1671-1815.2405983, pmid=null, cstr=null, oa=null, hot=null, price=null, onlineType=0, articleFormat=0, articleType=null, articleTypeStr=research-article, receivedDate=1723132800000, receivedDateStr=2024-08-09, revisedDate=1734883200000, revisedDateStr=2024-12-23, acceptedDate=null, acceptedDateStr=null, onlineDate=1766372384385, onlineDateStr=2025-12-22, pubDate=1751904000000, pubDateStr=2025-07-08, doiRegisterDate=null, doiRegisterDateStr=null, onlineIssueDate=1766372384385, onlineIssueDateStr=2025-12-22, onlineJustAcceptDate=null, onlineJustAcceptDateStr=null, onlineFirstDate=null, onlineFirstDateStr=null, sourceXml=null, magXml=null, createTime=1766372384385, creator=13701087609, updateTime=1766372384385, updator=13701087609, issue=Issue{id=1209811339510411616, tenantId=1146029695717560320, journalId=1146123166801305609, year='2025', volume='25', issue='19', pageStart='7885', pageEnd='8315', issueExtLink='null', onlineDate='null', pubDate='null', beforeIssueId=null, nextIssueId=null, price=null, status=1, issueComplete=1, articleOrder=1, issueType=-1, specialIssue=null, createTime=1766371100547, creator=13701087609, updateTime=1766373228996, updator=13701087609, preIssue=null, nextIssue=null, ext={EN=IssueExt(id=1209820266960654935, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1209811339510411616, language=EN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=), CN=IssueExt(id=1209820266960654936, tenantId=1146029695717560320, journalId=1146123166801305609, issueId=1209811339510411616, language=CN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=)}, issueFiles=null}, startPage=8151, endPage=8157, ext={EN=ArticleExt(id=1209816724766069031, articleId=1209816724317278467, tenantId=1146029695717560320, 
journalId=1146123166801305609, language=EN, title=Improved YOLOv8 Bird Recognition Algorithm Integrates ViT and Multi-scale Attention, columnId=1209816719539966141, journalTitle=Science Technology and Engineering, columnName=Papers∙Automation and Computational Technology, runingTitle=null, highlight=null, articleAbstract=

To address inaccurate recognition of densely packed targets and the difficulty of detecting small targets in bird recognition, a bird recognition algorithm based on an improved YOLOv8 is proposed. First, to improve dense target recognition, the backbone network is replaced with EfficientViT, which uses a multi-scale linear attention mechanism to provide a global receptive field and multi-scale learning, improving both the performance and the efficiency of the model. Second, because small birds are hard to detect and prone to missed detection, an efficient multi-scale attention (EMA) mechanism is introduced; it aggregates features across dimensions through channel recombination to better capture global information, fuse multi-scale features, and reduce the probability of missed detections. Experimental results show that the improved model reaches an mAP50 of 77.1% on the bird recognition benchmark dataset CUB-200-2011 and 88.4% on the self-built dataset birds28, which is 4.5 and 5.4 percentage points higher, respectively, than the original YOLOv8 model, verifying the effectiveness of the improved model.

, correspAuthors=null, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=null, magXml=null, pdfUrl=null, pdf=null, pdfFileSize=null, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=null, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=null, mapNumber=null, authorCompany=null, fund=null, authors=null, authorsList=Qiang ZHANG, Can-zhi ZHANG, Heng CAO, Teng-jiao YUAN), CN=ArticleExt(id=1209816727483978139, articleId=1209816724317278467, tenantId=1146029695717560320, journalId=1146123166801305609, language=CN, title=融合ViT与多尺度注意力的改进YOLOv8飞鸟识别算法, columnId=1209816721498706113, journalTitle=科学技术与工程, columnName=论文∙自动化技术、计算机技术, runingTitle=null, highlight=null, articleAbstract=

针对飞鸟识别中存在密集目标识别不准确、小目标检测困难等问题,提出一种基于改进YOLOv8的飞鸟识别算法。首先,针对密集目标识别难度大的问题,采用多尺度线性注意力机制EfficientViT替换骨干网络,实现全局感受野和多尺度学习,提升模型性能和效率的同时提高密集目标识别效果。然后,针对小目标飞鸟检测困难、容易出现漏检的问题,引入高效多尺度注意力 (efficient multi-scale attention, EMA)机制,通过通道重组实现跨维度聚合特征,从而更好地捕捉全局信息,实现多尺度特征融合,减少漏检概率。实验结果表明,改进模型在鸟类识别基准数据集CUB-200-2011和自制数据集birds28上的mAP50分别达到77.1%和88.4%,较原始YOLOv8模型分别提高了4.5和5.4个百分点,验证了改进模型的有效性。

, correspAuthors=null, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=Gph4PJGXgmIyT8InqDoFfg==, magXml=zqD8l59mdxtJJxb0ZrVoDg==, pdfUrl=null, pdf=LtQw/UmYW4JN2vJ/L/ZSSA==, pdfFileSize=10107090, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=OFsI+vJhwMzIv6EQ1qyNuw==, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=8RZyDPDTPeEfgMLsx/gD7A==, mapNumber=null, authorCompany=null, fund=null, authors=

张强(1986—),男,汉族,四川广安人,博士,教授。研究方向:民航通信导航监视、低空监视。E-mail:

, authorsList=张强, 张灿智, 曹恒, 员腾蛟)}, authors=[Author(id=1209929297222038490, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, orderNo=0, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=271198043@qq.com, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1209929297322701791, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, authorId=1209929297222038490, language=EN, stringName=Qiang ZHANG, firstName=Qiang, middleName=null, lastName=ZHANG, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=College of Air Traffic Management, Civil Aviation Flight University of China, Deyang 618300, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1209929297419170788, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, authorId=1209929297222038490, language=CN, stringName=张强, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=中国民用航空飞行学院空中交通管理学院, 德阳 618300, bio={"content":"

张强(1986—),男,汉族,四川广安人,博士,教授。研究方向:民航通信导航监视、低空监视。E-mail:

"}, bioImg=null, bioContent=

张强(1986—),男,汉族,四川广安人,博士,教授。研究方向:民航通信导航监视、低空监视。E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1209929297108792275, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, xref=null, ext=[AuthorCompanyExt(id=1209929297117180884, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, companyId=1209929297108792275, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=College of Air Traffic Management, Civil Aviation Flight University of China, Deyang 618300, China), AuthorCompanyExt(id=1209929297125569493, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, companyId=1209929297108792275, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=中国民用航空飞行学院空中交通管理学院, 德阳 618300)])]), Author(id=1209929297498862569, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, orderNo=1, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1209929297591137261, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, authorId=1209929297498862569, language=EN, stringName=Can-zhi ZHANG, firstName=Can-zhi, middleName=null, lastName=ZHANG, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=College of Air Traffic Management, Civil Aviation Flight University of China, Deyang 618300, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1209929297675023343, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, authorId=1209929297498862569, language=CN, stringName=张灿智, firstName=null, middleName=null, lastName=null, 
prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=中国民用航空飞行学院空中交通管理学院, 德阳 618300, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1209929297108792275, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, xref=null, ext=[AuthorCompanyExt(id=1209929297117180884, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, companyId=1209929297108792275, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=College of Air Traffic Management, Civil Aviation Flight University of China, Deyang 618300, China), AuthorCompanyExt(id=1209929297125569493, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, companyId=1209929297108792275, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=中国民用航空飞行学院空中交通管理学院, 德阳 618300)])]), Author(id=1209929297767298037, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, orderNo=2, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1209929297918292989, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, authorId=1209929297767298037, language=EN, stringName=Heng CAO, firstName=Heng, middleName=null, lastName=CAO, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=College of Air Traffic Management, Civil Aviation Flight University of China, Deyang 618300, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1209929297997984768, 
tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, authorId=1209929297767298037, language=CN, stringName=曹恒, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=中国民用航空飞行学院空中交通管理学院, 德阳 618300, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1209929297108792275, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, xref=null, ext=[AuthorCompanyExt(id=1209929297117180884, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, companyId=1209929297108792275, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=College of Air Traffic Management, Civil Aviation Flight University of China, Deyang 618300, China), AuthorCompanyExt(id=1209929297125569493, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, companyId=1209929297108792275, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=中国民用航空飞行学院空中交通管理学院, 德阳 618300)])]), Author(id=1209929298111229957, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, orderNo=3, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=null, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1209929298199310345, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, authorId=1209929298111229957, language=EN, stringName=Teng-jiao YUAN, firstName=Teng-jiao, middleName=null, lastName=YUAN, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=College of Air 
Traffic Management, Civil Aviation Flight University of China, Deyang 618300, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1209929298283196429, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, authorId=1209929298111229957, language=CN, stringName=员腾蛟, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=null, address=中国民用航空飞行学院空中交通管理学院, 德阳 618300, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1209929297108792275, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, xref=null, ext=[AuthorCompanyExt(id=1209929297117180884, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, companyId=1209929297108792275, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=College of Air Traffic Management, Civil Aviation Flight University of China, Deyang 618300, China), AuthorCompanyExt(id=1209929297125569493, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, companyId=1209929297108792275, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=中国民用航空飞行学院空中交通管理学院, 德阳 618300)])])], keywords=[Keyword(id=1209929298450968592, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=EN, orderNo=1, keyword=bird recognition), Keyword(id=1209929298547437588, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=EN, orderNo=2, keyword=multi-scale attention), Keyword(id=1209929298635517976, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=EN, orderNo=3, keyword=dense target 
recognition), Keyword(id=1209929299835088919, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=EN, orderNo=4, keyword=YOLOv8), Keyword(id=1209929299935752221, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=EN, orderNo=5, keyword=EfficientViT), Keyword(id=1209929300011249694, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=EN, orderNo=6, keyword=EMA), Keyword(id=1209929300124495906, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=CN, orderNo=1, keyword=飞鸟识别), Keyword(id=1209929300225159204, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=CN, orderNo=2, keyword=多尺度注意力), Keyword(id=1209929300304850985, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=CN, orderNo=3, keyword=密集目标识别), Keyword(id=1209929300413902894, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=CN, orderNo=4, keyword=YOLOv8), Keyword(id=1209929300518760500, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=CN, orderNo=5, keyword=EfficientViT), Keyword(id=1209929300627812407, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, language=CN, orderNo=6, keyword=EMA)], refs=[Reference(id=1209929304536903846, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2023, volume=23, issue=19, pageStart=22867, pageEnd=22879, url=null, language=null, rfNumber=[1], rfOrder=0, authorNames=Zhang C, Shi F, Zhang X, journalName=IEEE Sensors Journal, refType=null, unstructuredReference=Zhang C, Shi F, Zhang X, et al. 
Airport near-altitude flying birds detection based on information compensation multi-scale feature fusion[J]. IEEE Sensors Journal, 2023, 23(19): 22867-22879., articleTitle=Airport near-altitude flying birds detection based on information compensation multi-scale feature fusion, refAbstract=null), Reference(id=1209929304650150056, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2020, volume=7, issue=3, pageStart=26, pageEnd=null, url=null, language=null, rfNumber=[2], rfOrder=1, authorNames=Metz I C, Ellerbroek J, Mühlhausen T, journalName=Aerospace, refType=null, unstructuredReference=Metz I C, Ellerbroek J, Mühlhausen T, et al. The bird strike challenge[J]. Aerospace, 2020, 7(3): 26., articleTitle=The bird strike challenge, refAbstract=null), Reference(id=1209929304734036136, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2020, volume=11, issue=9, pageStart=1072, pageEnd=1085, url=null, language=null, rfNumber=[3], rfOrder=2, authorNames=Ferreira A C, Silva L R, Renna F, journalName=Methods in Ecology and Evolution, refType=null, unstructuredReference=Ferreira A C, Silva L R, Renna F, et al. Deep learning-based methods for individual recognition in small birds[J]. Methods in Ecology and Evolution, 2020, 11(9): 1072-1085., articleTitle=Deep learning-based methods for individual recognition in small birds, refAbstract=null), Reference(id=1209929304822116523, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2023, volume=27, issue=null, pageStart=1677, pageEnd=1690, url=null, language=null, rfNumber=[4], rfOrder=3, authorNames=Liu H, Zhang C, Deng Y, journalName=IEEE Transactions on Multimedia, refType=null, unstructuredReference=Liu H, Zhang C, Deng Y, et al. 
TransIFC: invariant cues-aware feature concentration learning for efficient fine-grained bird image classification[J]. IEEE Transactions on Multimedia, 2023, 27: 1677-1690., articleTitle=TransIFC: invariant cues-aware feature concentration learning for efficient fine-grained bird image classification, refAbstract=null), Reference(id=1209929304893419695, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2023, volume=13, issue=2, pageStart=264, pageEnd=null, url=null, language=null, rfNumber=[5], rfOrder=4, authorNames=Wang K, Yang F, Chen Z, journalName=Animals, refType=null, unstructuredReference=Wang K, Yang F, Chen Z, et al. A fine-grained bird classification method based on attention and decoupled knowledge distillation[J]. Animals, 2023, 13(2): 264., articleTitle=A fine-grained bird classification method based on attention and decoupled knowledge distillation, refAbstract=null), Reference(id=1209929304952139955, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2020, volume=null, issue=null, pageStart=1, pageEnd=6, url=null, language=null, rfNumber=[6], rfOrder=5, authorNames=Chakraborti T, McCane B, Mills S, journalName=null, refType=null, unstructuredReference=Chakraborti T, McCane B, Mills S, et al. CoCoNet:a collaborative convolutional network applied to fine-grained bird species classification[C]// 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ). 
New York: IEEE, 2020: 1-6., articleTitle=CoCoNet:a collaborative convolutional network applied to fine-grained bird species classification, refAbstract=null), Reference(id=1209929305019248823, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=9, pageStart=3727, pageEnd=3736, url=null, language=null, rfNumber=[7], rfOrder=6, authorNames=任书杰, 胡勇, 何文祥, journalName=科学技术与工程, refType=null, unstructuredReference=任书杰, 胡勇, 何文祥, 等. 基于深度学习的砂岩组分显微图像识别[J]. 科学技术与工程, 2024, 24(9): 3727-3736., articleTitle=基于深度学习的砂岩组分显微图像识别, refAbstract=null), Reference(id=1209929305086357687, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=9, pageStart=3727, pageEnd=3736, url=null, language=null, rfNumber=[7], rfOrder=7, authorNames=Ren Shujie, Hu Yong, He Wenxiang, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Ren Shujie, Hu Yong, He Wenxiang, et al. Microscopic image recognition of sandstone components based on deep learning[J]. Science Technology and Engineering, 2024, 24(9): 3727-3736., articleTitle=Microscopic image recognition of sandstone components based on deep learning, refAbstract=null), Reference(id=1209929305153466552, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=17, pageStart=7059, pageEnd=7065, url=null, language=null, rfNumber=[8], rfOrder=8, authorNames=林开颜, 牛程远, 张浩平, journalName=科学技术与工程, refType=null, unstructuredReference=林开颜, 牛程远, 张浩平, 等. 基于深度学习的景观植物颜色特征提取方法[J]. 
科学技术与工程, 2024, 24(17): 7059-7065., articleTitle=基于深度学习的景观植物颜色特征提取方法, refAbstract=null), Reference(id=1209929305266712764, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=17, pageStart=7059, pageEnd=7065, url=null, language=null, rfNumber=[8], rfOrder=9, authorNames=Lin Kaiyan, Niu Chengyuan, Zhang Haoping, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Lin Kaiyan, Niu Chengyuan, Zhang Haoping, et al. A method for extracting color characteristics of landscape plants based on deep learning[J]. Science Technology and Engineering, 2024, 24(17): 7059-7065., articleTitle=A method for extracting color characteristics of landscape plants based on deep learning, refAbstract=null), Reference(id=1209929305367376064, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2024, volume=60, issue=8, pageStart=46, pageEnd=55, url=null, language=null, rfNumber=[9], rfOrder=10, authorNames=谢威宇, 张强, journalName=计算机工程与应用, refType=null, unstructuredReference=谢威宇, 张强. 基于深度学习的图像中无人机与飞鸟检测研究综述[J]. 计算机工程与应用, 2024, 60(8): 46-55., articleTitle=基于深度学习的图像中无人机与飞鸟检测研究综述, refAbstract=null), Reference(id=1209929305438679235, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2024, volume=60, issue=8, pageStart=46, pageEnd=55, url=null, language=null, rfNumber=[9], rfOrder=11, authorNames=Xie Weiyu, Zhang Qiang, journalName=Computer Engineering and Applications, refType=null, unstructuredReference=Xie Weiyu, Zhang Qiang. Review on detection of drones and birds in photoelectric images based on deep learning convolutional neural network[J]. 
Computer Engineering and Applications, 2024, 60(8): 46-55., articleTitle=Review on detection of drones and birds in photoelectric images based on deep learning convolutional neural network, refAbstract=null), Reference(id=1209929305518371015, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2020, volume=8, issue=null, pageStart=116663, pageEnd=116674, url=null, language=null, rfNumber=[10], rfOrder=12, authorNames=Won C S, journalName=IEEE Access, refType=null, unstructuredReference=Won C S. Multi-scale CNN for fine-grained image recognition[J]. IEEE Access, 2020, 8: 116663-116674., articleTitle=Multi-scale CNN for fine-grained image recognition, refAbstract=null), Reference(id=1209929305585479883, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2020, volume=57, issue=3, pageStart=102186, pageEnd=null, url=null, language=null, rfNumber=[11], rfOrder=13, authorNames=Tan M, Zhou J, Peng Z, journalName=Information Processing & Management, refType=null, unstructuredReference=Tan M, Zhou J, Peng Z, et al. Fine-grained image classification with factorized deep user click feature[J]. Information Processing & Management, 2020, 57(3): 102186., articleTitle=Fine-grained image classification with factorized deep user click feature, refAbstract=null), Reference(id=1209929305652588749, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2023, volume=null, issue=null, pageStart=17, pageEnd=34, url=null, language=null, rfNumber=[12], rfOrder=14, authorNames=Bellafkir H, Vogelbacher M, Schneider D, journalName=International Conference on Networked Systems, refType=null, unstructuredReference=Bellafkir H, Vogelbacher M, Schneider D, et al. Edge-based bird species recognition via active learning[C]// International Conference on Networked Systems. 
Cham: Springer Nature Switzerland, 2023: 17-34., articleTitle=Edge-based bird species recognition via active learning, refAbstract=null), Reference(id=1209929305728086224, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=17, pageStart=7205, pageEnd=7213, url=null, language=null, rfNumber=[13], rfOrder=15, authorNames=宣以国, 余成波, 蒋启超, journalName=科学技术与工程, refType=null, unstructuredReference=宣以国, 余成波, 蒋启超, 等. 基于改进 YOLOv7 的道路裂缝和坑洞检测算法[J]. 科学技术与工程, 2024, 24(17): 7205-7213., articleTitle=基于改进 YOLOv7 的道路裂缝和坑洞检测算法, refAbstract=null), Reference(id=1209929305807778003, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2024, volume=24, issue=17, pageStart=7205, pageEnd=7213, url=null, language=null, rfNumber=[13], rfOrder=16, authorNames=Xuan Yiguo, Yu Chengbo, Jiang Qichao, journalName=Science Technology and Engineering, refType=null, unstructuredReference=Xuan Yiguo, Yu Chengbo, Jiang Qichao, et al. Improved YOLOv7 road crack and pothole detection algorithm[J]. Science Technology and Engineering, 2024, 24(17): 7205-7213., articleTitle=Improved YOLOv7 road crack and pothole detection algorithm, refAbstract=null), Reference(id=1209929305883275478, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2019, volume=null, issue=null, pageStart=28, pageEnd=38, url=null, language=null, rfNumber=[14], rfOrder=17, authorNames=Kumar A, Das S D, journalName=Computer Vision Applications, refType=null, unstructuredReference=Kumar A, Das S D. Bird species classification using transfer learning with multistage training[C]// Computer Vision Applications: Third Workshop, WCVA 2018. 
Singapore: Springer, 2019: 28-38., articleTitle=Bird species classification using transfer learning with multistage training, refAbstract=null), Reference(id=1209929305962967256, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2024, volume=44, issue=4, pageStart=1114, pageEnd=1120, url=null, language=null, rfNumber=[15], rfOrder=18, authorNames=陈天华, 朱家煊, 印杰, journalName=计算机应用, refType=null, unstructuredReference=陈天华, 朱家煊, 印杰. 基于注意力机制的鸟类识别算法[J]. 计算机应用, 2024, 44(4): 1114-1120., articleTitle=基于注意力机制的鸟类识别算法, refAbstract=null), Reference(id=1209929306051047641, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2024, volume=44, issue=4, pageStart=1114, pageEnd=1120, url=null, language=null, rfNumber=[15], rfOrder=19, authorNames=Chen Tianhua, Zhu Jiaxuan, Yin Jie, journalName=Journal of Computer Applications, refType=null, unstructuredReference=Chen Tianhua, Zhu Jiaxuan, Yin Jie. Bird recognition algorithm based on attention mechanism[J]. Journal of Computer Applications, 2024, 44(4): 1114-1120., articleTitle=Bird recognition algorithm based on attention mechanism, refAbstract=null), Reference(id=1209929306126545118, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=null, pageStart=2039, pageEnd=2048, url=null, language=null, rfNumber=[16], rfOrder=20, authorNames=Cohen A S, Cont R, Rossier A, journalName=International Conference on Machine Learning, refType=null, unstructuredReference=Cohen A S, Cont R, Rossier A, et al. Scaling properties of deep residual networks[C]// International Conference on Machine Learning. 
New York: PMLR, 2021: 2039-2048., articleTitle=Scaling properties of deep residual networks, refAbstract=null), Reference(id=1209929306210431198, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2023, volume=7, issue=5, pageStart=304, pageEnd=null, url=null, language=null, rfNumber=[17], rfOrder=21, authorNames=Li Y, Fan Q, Huang H, journalName=Drones, refType=null, unstructuredReference=Li Y, Fan Q, Huang H, et al. A modified YOLOv8 detection network for UAV aerial image recognition[J]. Drones, 2023, 7(5): 304., articleTitle=A modified YOLOv8 detection network for UAV aerial image recognition, refAbstract=null), Reference(id=1209929306290122977, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2023, volume=13, issue=24, pageStart=12977, pageEnd=null, url=null, language=null, rfNumber=[18], rfOrder=22, authorNames=Wu T, Dong Y, journalName=Applied Sciences, refType=null, unstructuredReference=Wu T, Dong Y. YOLO-SE: improved YOLOv8 for remote sensing object detection and recognition[J]. Applied Sciences, 2023, 13(24): 12977., articleTitle=YOLO-SE: improved YOLOv8 for remote sensing object detection and recognition, refAbstract=null), Reference(id=1209929306394980580, tenantId=1146029695717560320, journalId=1146123166801305609, articleId=1209816724317278467, doi=null, pmid=null, pmcid=null, year=2020, volume=11929, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=[19], rfOrder=23, authorNames=Dosovitskiy A, Beyer L, Kolesnikov A, journalName=arXiv preprint arXiv: 2010, refType=null, unstructuredReference=Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv: 2010. 
Fig.1 Inter-class and intra-class differences
Fig.2 Schematic diagram of the multi-scale linear attention module
Fig.3 Schematic diagram of the EMA structure (C denotes the number of input channels; H and W denote the spatial dimensions of the input feature)
Fig.4 Overview of the CUB-200-2011 dataset
Fig.5 Overview of the birds28 dataset
Fig.6 Recognition results of the original YOLOv8 model
Fig.7 Recognition results of the improved YOLOv8 model

Table 1 Experimental environment parameters
Parameter       Setting
epoch           300
batch           16
imgsz           640
workers         4
lr0             0.01
momentum        0.937
weight_decay    0.0005

Table 2 Ablation experiment on the CUB-200-2011 dataset
Model                      mAP50   mAP50-95   Parameters (10^6)   GFLOPs
YOLOv8                     0.726   0.62       3.37                9.9
YOLOv8-EMA                 0.742   0.64       3.38                10
YOLOv8-EfficientViT        0.756   0.642      4.37                11.2
YOLOv8-EfficientViT-EMA    0.771   0.654      4.39                11.3

Table 3 Ablation experiment on the birds28 dataset
Model                      mAP50   mAP50-95   Parameters (10^6)   GFLOPs
YOLOv8                     0.830   0.592      3.01                8.2
YOLOv8-EMA                 0.827   0.586      3.02                8.4
YOLOv8-EfficientViT        0.882   0.646      4.01                9.5
YOLOv8-EfficientViT-EMA    0.884   0.655      4.02                9.6

Table 4 Model comparison experiment on the CUB-200-2011 dataset
Model                      mAP50   mAP50-95   Parameters (10^6)   GFLOPs
YOLOv5                     0.687   0.580      2.87                8.9
YOLOv6                     0.666   0.572      4.68                14.0
YOLOv8-ghost               0.716   0.615      2.08                6.8
YOLOv8-ghost-EMA           0.718   0.611      2.68                8.2
YOLOv8-BiFPN               0.749   0.639      3.15                9.9
ResNet18                   0.770   0.642      13.69               36.9
YOLOv8-EfficientViT-EMA    0.771   0.654      4.39                11.3

Table 5 Model comparison experiment on the birds28 dataset
Model                      mAP50   mAP50-95   Parameters (10^6)   GFLOPs
YOLOv5                     0.827   0.577      2.51                7.2
YOLOv6                     0.751   0.523      4.24                11.9
YOLOv8-ghost               0.804   0.574      1.72                5.2
YOLOv8-ghost-EMA           0.808   0.569      1.73                5.2
YOLOv8-BiFPN               0.836   0.598      2.79                8.3
ResNet18                   0.795   0.539      13.33               35.1
YOLOv8-EfficientViT-EMA    0.884   0.655      4.02                9.6
Improved YOLOv8 Bird Recognition Algorithm Integrates ViT and Multi-scale Attention
Qiang ZHANG, Can-zhi ZHANG, Heng CAO, Teng-jiao YUAN
Science Technology and Engineering | Papers∙Automation and Computational Technology, 2025, 25(19): 8151-8157
Affiliations
  • College of Air Traffic Management, Civil Aviation Flight University of China, Deyang 618300, China
  • ZHANG Qiang (born 1986), male, Han nationality, from Guang'an, Sichuan; Ph.D., professor. Research interests: civil aviation communication, navigation and surveillance; low-altitude surveillance. E-mail:
Published: 2025-07-08 doi: 10.12404/j.issn.1671-1815.2405983

To address inaccurate recognition of dense targets and difficult detection of small targets in bird recognition, a bird recognition algorithm based on an improved YOLOv8 is proposed. First, to handle dense targets, the backbone network is replaced with EfficientViT, which is built on multi-scale linear attention, to obtain a global receptive field and multi-scale learning; this improves the performance and efficiency of the model and its recognition of dense targets. Then, because small birds are hard to detect and are easily missed, the efficient multi-scale attention (EMA) mechanism is introduced. It aggregates features across dimensions through channel reshaping, which captures global information better, fuses multi-scale features, and lowers the probability of missed detection. Experimental results show that the improved model reaches an mAP50 of 77.1% on the benchmark bird dataset CUB-200-2011 and 88.4% on the self-built dataset birds28, which is 4.5 and 5.4 percentage points higher than the original YOLOv8 model, respectively, verifying the effectiveness of the improved model.

bird recognition  /  multi-scale attention  /  dense target recognition  /  YOLOv8  /  EfficientViT  /  EMA
Qiang ZHANG, Can-zhi ZHANG, Heng CAO, Teng-jiao YUAN. Improved YOLOv8 Bird Recognition Algorithm Integrates ViT and Multi-scale Attention[J]. Science Technology and Engineering, 2025, 25(19): 8151-8157. DOI: 10.12404/j.issn.1671-1815.2405983
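Bird strikes on aircraft are a worldwide problem for aviation safety[1]. According to the China Civil Aviation Safety Information Statistical Report, jointly issued by the Office of Aviation Safety of the Civil Aviation Administration of China and the China Academy of Civil Aviation Science and Technology, bird strikes are the largest source of civil aviation incidents, accounting for 43.26%. In recent years, as the ecological environment has improved and the number of flights has grown, the risk of bird strikes has been rising. Most bird strikes occur at airports and in nearby airspace, so preventing them requires a thorough understanding of the birds at and around the airport[2]. In bird surveys, however, manual identification is inefficient and difficult. Bird images can instead be captured with airport electro-optical equipment and recognized automatically with deep learning algorithms[3], helping airports carry out bird-strike prevention.
The difficulty of bird recognition with computer vision[4-6] lies in the particular nature of bird features: different subcategories often look alike, while individuals of the same subcategory can differ markedly because of pose and background occlusion. With the development of deep learning and convolutional neural networks (CNN)[7-8], object classification and recognition have made major breakthroughs[9]. With its computing capability and deep feature extraction, deep learning has shown great potential in bird recognition. By automatically learning the complex and subtle feature differences among birds, deep learning models can markedly improve recognition accuracy and reduce repetitive work[10-11].
To deal with occlusion and small inter-class differences, researchers have adopted various methods to improve object detection and bird image recognition. The pose-normalized CNN relies on bounding boxes: it first localizes parts and crops the image, aligns the poses of the different parts, extracts convolutional features for each part, and processes these features for classification. Pose alignment and multi-scale feature fusion reduce the interference of pose and raise recognition accuracy. However, the method depends on annotation, and for targets with complex poses, pose normalization may add computation and model complexity. Bellafkir et al.[12] used a deep learning model to address poor recognition caused by changing environmental conditions and obtained good results by combining reliability scores with ensemble predictions; however, the model uses active learning, relies on feedback from human experts, and needs enough expert feedback examples to perform well.
The YOLO (You Only Look Once) algorithm is widely used in computer vision tasks[13] and is still being iterated rapidly[14]. Chen et al.[15] improved YOLOv5 for bird recognition by introducing bi-level routing attention, which improved the model, but the experiments used only a self-built dataset, so the generalization ability of the improved model could not be verified. YOLOv8 integrates the strengths of the earlier YOLO algorithms and makes extensive use of residual connections[16], giving better performance, but it still has shortcomings in bird recognition. YOLOv8 is therefore improved here to better suit the task. First, the backbone network is replaced with EfficientViT, which is built on multi-scale linear attention and uses lighter, hardware-efficient operations to achieve multi-scale learning and a global receptive field; this improves detection and recognition in images with dense targets while reducing computational cost, so that both performance and speed improve. Second, the efficient multi-scale attention (EMA) module is introduced, which improves parameter utilization by reshaping channels across spatial dimensions and works better for targets at image edges and for small-scale birds.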
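The YOLO family has attracted wide attention since it was proposed. YOLOv8 integrates the strengths of earlier generations, offers faster detection and higher recognition accuracy, and performs well across image processing tasks[17-18]. YOLOv8 comes in five versions to suit different task requirements: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. Its structure consists of a backbone network (Backbone), a neck network (Neck), and a head (Head). The backbone extracts feature information, the neck fuses features, and the head outputs the results. Although YOLOv8 already performs very well, it falls somewhat short for bird recognition: recognition accuracy is limited, small targets are hard to detect, and dense targets are predicted inaccurately. An improved YOLOv8 model is therefore proposed that targets these characteristics of birds.
Bird recognition is harder than general image recognition because birds belong to highly similar subordinate categories, that is, inter-class differences are small, while pose, shooting angle, and similar factors can produce large intra-class differences. As shown in Fig.1, red-rumped swallows look quite different depending on flight pose and shooting angle, whereas the differences among the red-rumped swallow, the white-cheeked starling, and the light-vented bulbul are small. EfficientViT is therefore introduced to learn bird features at multiple scales and improve the model's recognition ability.
In addition, because birds flock, bird datasets often contain densely packed targets. The Darknet-53-based backbone of YOLOv8 integrates multi-scale feature information through residual connections and Split operations, but it cannot handle dense targets effectively. The multi-scale linear attention used by EfficientViT recognizes dense targets effectively, so it is added to the YOLOv8 backbone to strengthen dense-target recognition.
Finally, some birds are small and occupy less than one tenth of the image, which places them in the small-target category; YOLOv8 tends to miss them or recognize them inaccurately. Efficient multi-scale attention (EMA) avoids feature dimensionality reduction and captures more feature information, which benefits small-target recognition. EMA is therefore introduced into the improved model to reduce missed detections and improve recognition of small birds.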
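The Transformer model performs well in natural language processing: its self-attention mechanism lets the model attend to all positions of the input at once and capture global semantic information. Dosovitskiy et al.[19] brought the Transformer to computer vision and proposed ViT (vision Transformer), which embeds an image by splitting it into patches and then processes it the way natural language is processed. After large-scale pre-training, ViT achieves results on image tasks comparable to state-of-the-art convolutional neural networks, but its detection and recognition of dense targets remain poor. Xie et al.[20] showed that handling dense targets requires a global receptive field. Guo et al.[21] obtained a large receptive field with a large-kernel attention mechanism, but large convolution kernels place high demands on hardware, which limits the method. Cai et al.[22] proposed EfficientViT, which further improves model efficiency and performs well on high-resolution dense prediction tasks. EfficientViT uses lightweight linear attention to achieve a global receptive field and multi-scale learning without special hardware requirements, and its effectiveness for dense-target detection was verified experimentally. The core module of EfficientViT is therefore applied in the YOLOv8 architecture to improve the accuracy of dense detection in bird recognition and the overall model performance.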
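The relationship between image resolution, token count, and attention cost can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the 16×16 patch size follows the title of the ViT reference, while the 640×640 input, the head dimension `d = 32`, and the rough multiply-add counts are illustrative assumptions.

```python
def num_patches(height, width, patch=16):
    """Number of tokens when an image is cut into non-overlapping patch x patch blocks."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def softmax_attention_cost(n, d):
    """Rough multiply-adds: Q K^T costs n*n*d, applying the weights to V costs n*n*d."""
    return 2 * n * n * d


def linear_attention_cost(n, d):
    """Rough multiply-adds: K^T V costs n*d*d, multiplying Q by that result costs n*d*d."""
    return 2 * n * d * d


n = num_patches(640, 640)  # assumed input size, matching imgsz in Table 1
d = 32                     # hypothetical per-head dimension
ratio = softmax_attention_cost(n, d) // linear_attention_cost(n, d)
print(n, ratio)
```

Under these assumptions the ratio of the two counts is simply `n / d`, which is why linear attention becomes more attractive as the resolution, and therefore the token count, grows.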
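Fig.2 shows the multi-scale linear attention module. The input passes through a linear projection layer that generates the Q/K/V matrices, then through the ReLU attention module to obtain a global receptive field, and finally through another linear projection layer for feature fusion.
The similarity function of ReLU attention is
$\operatorname{Sim}(\boldsymbol{Q}, \boldsymbol{K})=\operatorname{ReLU}(\boldsymbol{Q})[\operatorname{ReLU}(\boldsymbol{K})]^{\mathrm{T}}$ (1)
For an input $\boldsymbol{x} \in \mathbf{R}^{N \times f}$, ReLU attention can be written as
$\boldsymbol{O}_{i}=\sum_{j=1}^{N} \frac{\operatorname{ReLU}\left(\boldsymbol{Q}_{i}\right)\left[\operatorname{ReLU}\left(\boldsymbol{K}_{j}\right)\right]^{\mathrm{T}}}{\sum_{j=1}^{N} \operatorname{ReLU}\left(\boldsymbol{Q}_{i}\right)\left[\operatorname{ReLU}\left(\boldsymbol{K}_{j}\right)\right]^{\mathrm{T}}} \boldsymbol{V}_{j}$ (2)
where $\boldsymbol{Q}=\boldsymbol{x} \boldsymbol{W}_{Q}$, $\boldsymbol{K}=\boldsymbol{x} \boldsymbol{W}_{K}$, $\boldsymbol{V}=\boldsymbol{x} \boldsymbol{W}_{V}$, and $\boldsymbol{W}_{Q}, \boldsymbol{W}_{K}, \boldsymbol{W}_{V} \in \mathbf{R}^{f \times d}$ are linear projection matrices. Using the associativity of matrix multiplication, Eq. (2) can be rewritten as
$\boldsymbol{O}_{i}=\frac{\operatorname{ReLU}\left(\boldsymbol{Q}_{i}\right)\left\{\sum_{j=1}^{N}\left[\operatorname{ReLU}\left(\boldsymbol{K}_{j}\right)\right]^{\mathrm{T}} \boldsymbol{V}_{j}\right\}}{\operatorname{ReLU}\left(\boldsymbol{Q}_{i}\right)\left\{\sum_{j=1}^{N}\left[\operatorname{ReLU}\left(\boldsymbol{K}_{j}\right)\right]^{\mathrm{T}}\right\}}$ (3)
The terms $\left\{\sum_{j=1}^{N}\left[\operatorname{ReLU}\left(\boldsymbol{K}_{j}\right)\right]^{\mathrm{T}} \boldsymbol{V}_{j}\right\} \in \mathbf{R}^{d \times d}$ and $\left\{\sum_{j=1}^{N}\left[\operatorname{ReLU}\left(\boldsymbol{K}_{j}\right)\right]^{\mathrm{T}}\right\} \in \mathbf{R}^{d \times 1}$ do not depend on $i$, so they are computed once and reused for every query; the cost then grows linearly rather than quadratically with the number of tokens $N$. Introducing this into the backbone lowers the hardware cost and makes the model more efficient.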
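The equivalence of Eq. (2) and Eq. (3) can be checked numerically. The following is a minimal pure-Python sketch rather than the EfficientViT implementation: the function names, the random test data, and the small constant `EPS` added to the denominators (to avoid division by zero when every ReLU output is zero) are assumptions made for illustration.

```python
import random

EPS = 1e-12  # assumption: guards against a zero denominator, not part of Eq. (2) or (3)


def relu(vec):
    return [max(0.0, a) for a in vec]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def relu_attention_quadratic(Q, K, V):
    """Eq. (2): for each query, compute N similarities and average the V_j with them."""
    out = []
    for q in Q:
        rq = relu(q)
        weights = [dot(rq, relu(k)) for k in K]
        denom = sum(weights) + EPS
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / denom
                    for c in range(len(V[0]))])
    return out


def relu_attention_linear(Q, K, V):
    """Eq. (3): build sum_j ReLU(K_j)^T V_j and sum_j ReLU(K_j)^T once, reuse per query."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]   # d x dv accumulator
    ksum = [0.0] * d                      # d accumulator
    for k, v in zip(K, V):
        rk = relu(k)
        for a in range(d):
            ksum[a] += rk[a]
            for c in range(dv):
                kv[a][c] += rk[a] * v[c]
    out = []
    for q in Q:
        rq = relu(q)
        denom = dot(rq, ksum) + EPS
        out.append([sum(rq[a] * kv[a][c] for a in range(d)) / denom
                    for c in range(dv)])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
Q, K, V = (random_matrix(6, 4, rng) for _ in range(3))  # N = 6 tokens, d = 4
slow = relu_attention_quadratic(Q, K, V)
fast = relu_attention_linear(Q, K, V)
max_diff = max(abs(a - b) for ra, rb in zip(slow, fast) for a, b in zip(ra, rb))
print(f"max difference between Eq. (2) and Eq. (3): {max_diff:.2e}")
```

The two functions agree up to floating-point rounding, but the second one touches each key and value only once, which is the reuse described above.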
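Bird datasets often contain targets whose sizes vary widely, which leads to missed detections. To make the model attend better to small birds while keeping good detection and recognition of large targets, efficient multi-scale attention (EMA)[23] is introduced.
As shown in Fig.3, EMA has a flexible structure that can be integrated into CNN backbones. By reshaping channels across spatial dimensions it avoids feature dimensionality reduction and therefore captures more feature information, which makes the module more effective at distinguishing the subtle features of different bird species. EMA uses feature grouping: the input features are divided into G sub-feature groups, each of which learns its own semantics. This allows the model to be deployed across multiple GPUs and also weakens the influence of noise. The module as a whole is parallel, with a 1×1 convolution branch and a 3×3 convolution branch, which avoids greater network depth and improves parameter utilization. The 1×1 branch uses average pooling to encode channel information spatially and assign different importance to the channels. The 3×3 convolution branch captures multi-scale features. Finally, the information is aggregated through a Sigmoid function; the output has the same size as the input, so EMA is easy to insert into convolutional neural network architectures. Cross-spatial information aggregation along different spatial directions yields richer feature aggregation and handles both short-range and long-range dependencies. In contrast to the progressive behavior of a limited receptive field, using 3×3 and 1×1 convolutions in parallel captures more contextual information among intermediate features. Introducing the EMA module can therefore bring a higher performance gain to the model.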
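Two ingredients of the description above, feature grouping and pooling-based gating with a Sigmoid that preserves the input size, can be illustrated in a few lines. This is a deliberately simplified sketch and not the EMA module itself: it omits the 1×1 and 3×3 convolutions, the normalization, and the cross-spatial aggregation step, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def split_groups(channels, groups):
    """Split a list of C channel maps into G groups of C // G channels each."""
    if len(channels) % groups:
        raise ValueError("number of channels must be divisible by the number of groups")
    size = len(channels) // groups
    return [channels[g * size:(g + 1) * size] for g in range(groups)]


def directional_gate(fmap):
    """Gate one H x W channel map with Sigmoid weights from row-wise and column-wise means.

    Simplification: the real module passes the pooled vectors through a shared
    1x1 convolution before the Sigmoid; that step is left out here.
    """
    h, w = len(fmap), len(fmap[0])
    row_mean = [sum(row) / w for row in fmap]                              # pool along width
    col_mean = [sum(fmap[i][j] for i in range(h)) / h for j in range(w)]   # pool along height
    return [[fmap[i][j] * sigmoid(row_mean[i]) * sigmoid(col_mean[j])
             for j in range(w)] for i in range(h)]


feature = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]   # toy 2 x 3 channel map
gated = directional_gate(feature)
print(len(gated), len(gated[0]))               # same shape as the input
```

Because the gate only rescales each position, the output keeps the shape of the input, which is the property that lets such a block be dropped between existing convolutional layers.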
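The most commonly used and authoritative dataset in bird recognition experiments is CUB-200-2011, produced by the California Institute of Technology. It contains 200 bird species and 11,788 images of varying sizes, each with detailed annotations; only the bounding boxes and class labels are used here. To verify the applicability of the model and make the study more practical, a dataset of bird species common in China, birds28, was also built with reference to the Guide to the Prevention of Common Birds at Airports; it contains 28 species and 5,040 images.
Fig.4 and Fig.5 show the CUB-200-2011 and birds28 datasets, respectively. Each CUB-200-2011 image contains a single target, and the number of images per species is fairly uniform. For the self-built dataset, more complex images with dense targets were chosen deliberately to match real conditions, so the number of images varies unevenly across species.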
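The experiments were run on Windows 11 Pro with an Intel(R) Core(TM) i7-12700KF processor at 3.6 GHz, 32 GB of RAM, and a GeForce RTX 3080 GPU. The framework was PyTorch 2.3.1 with CUDA 12.2. The experimental parameter settings are listed in Table 1.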
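The hyperparameters of Table 1 map naturally onto a training call. The sketch below assumes the Ultralytics `ultralytics` package was used, which the article does not state; the model file `yolov8n.yaml`, the dataset file `birds28.yaml`, and the helper `train` are hypothetical placeholders, and the keyword names `epochs` and `imgsz` are the Ultralytics spellings of the table's `epoch` and `Imgsz`.

```python
# Hyperparameters copied from Table 1; pretrained=False reflects the statement
# that no pre-trained weights were used in the experiments.
TRAIN_ARGS = {
    "epochs": 300,
    "batch": 16,
    "imgsz": 640,
    "workers": 4,
    "lr0": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "pretrained": False,
}


def train(model_cfg="yolov8n.yaml", data_cfg="birds28.yaml"):
    """Hypothetical helper: build a model from a YAML definition and train it.

    Both file names are placeholders. The import is local so that this module
    can be loaded without the ultralytics package installed.
    """
    from ultralytics import YOLO

    model = YOLO(model_cfg)  # built from a config file, so weights start untrained
    return model.train(data=data_cfg, **TRAIN_ARGS)


if __name__ == "__main__":
    for name, value in TRAIN_ARGS.items():
        print(f"{name:>12}: {value}")
```

Keeping the settings in one dictionary makes it easy to reuse exactly the same configuration for every model in the ablation and comparison experiments.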
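To verify the effectiveness of multi-scale linear attention for bird recognition, several YOLOv8 variants were compared on both datasets. No pre-trained weights were used in any comparison or ablation experiment. Model quality is measured with two metrics, mAP50 and mAP50-95, while the parameter counts and computational requirements are kept at the same order of magnitude.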
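Both metrics rest on the intersection over union (IoU) between a predicted box and a ground-truth box: mAP50 counts a prediction as correct when IoU is at least 0.5, and mAP50-95 averages the result over IoU thresholds from 0.5 to 0.95 in steps of 0.05. A minimal sketch of these two building blocks follows; the `(x1, y1, x2, y2)` box format and the function names are illustrative assumptions, and the full average-precision computation is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def map_thresholds():
    """IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def is_match(pred_box, gt_box, threshold=0.5):
    """True if the prediction overlaps the ground truth enough at this threshold."""
    return iou(pred_box, gt_box) >= threshold


pred, truth = (0, 0, 10, 10), (2, 0, 12, 10)
print(iou(pred, truth), [is_match(pred, truth, t) for t in map_thresholds()])
```

A box shifted by a fifth of its width still matches at the 0.5 threshold but fails at the stricter ones, which is why mAP50-95 is always the more demanding of the two numbers.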
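To verify the effectiveness of the improvements, ablation experiments were run on CUB-200-2011 and birds28 with the original YOLOv8 model as the reference and the improvements introduced step by step. Under identical experimental conditions, four models are compared: YOLOv8, the model with the multi-scale linear attention module (YOLOv8-EfficientViT), the model with the cross-spatial efficient multi-scale attention module (YOLOv8-EMA), and the model with both modules (YOLOv8-EfficientViT-EMA).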
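As shown in Table 2 and Table 3, introducing the EMA mechanism is intended to improve the model's ability to capture multi-scale variation. On CUB-200-2011, mAP50 and mAP50-95 rise by 1.6 and 2 percentage points, respectively; on birds28, the values differ from the baseline by 0.3 and 0.6 percentage points (0.827 against 0.830 and 0.586 against 0.592 in Table 3). Introducing EfficientViT improves both dense-target detection and overall performance: on CUB-200-2011, mAP50 and mAP50-95 rise by 3 and 2.2 percentage points, and on birds28 by 5.2 and 5.4 percentage points. The ablation experiments confirm the effectiveness of the improvements: compared with the original model, the final model improves mAP50 and mAP50-95 by 4.5 and 3.4 percentage points on CUB-200-2011 and by 5.4 and 6.3 percentage points on birds28. The parameter count and computation (GFLOPs) remain at the same order of magnitude as the original model. Fig.6 and Fig.7 compare the recognition results.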
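The differences quoted above can be recomputed directly from the ablation tables. The short sketch below hard-codes the mAP50 and mAP50-95 values of Table 2 and Table 3 and reports each variant's change relative to the YOLOv8 baseline in percentage points; the dictionary layout and the function name `gains` are illustrative choices.

```python
# (mAP50, mAP50-95) per model, copied from Table 2 and Table 3
CUB = {
    "YOLOv8": (0.726, 0.62),
    "YOLOv8-EMA": (0.742, 0.64),
    "YOLOv8-EfficientViT": (0.756, 0.642),
    "YOLOv8-EfficientViT-EMA": (0.771, 0.654),
}
BIRDS28 = {
    "YOLOv8": (0.830, 0.592),
    "YOLOv8-EMA": (0.827, 0.586),
    "YOLOv8-EfficientViT": (0.882, 0.646),
    "YOLOv8-EfficientViT-EMA": (0.884, 0.655),
}


def gains(table, baseline="YOLOv8"):
    """Change of each model relative to the baseline, in percentage points."""
    base50, base5095 = table[baseline]
    return {
        name: (round((m50 - base50) * 100, 1), round((m5095 - base5095) * 100, 1))
        for name, (m50, m5095) in table.items()
        if name != baseline
    }


for label, table in (("CUB-200-2011", CUB), ("birds28", BIRDS28)):
    for name, (d50, d5095) in gains(table).items():
        print(f"{label:>13} {name:<24} mAP50 {d50:+.1f}  mAP50-95 {d5095:+.1f}")
```

A negative value means the variant scores below the baseline on that dataset; expressing all changes in percentage points avoids confusing them with relative percentages.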
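Fig.6 shows the recognition results of the original YOLOv8 model, where missed detections and low accuracy are visible. Fig.7 shows the results of the improved model, which clearly reduces missed detections and improves recognition accuracy.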
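To further verify the advantages of the improved model, it was compared experimentally with other algorithms on CUB-200-2011 and birds28, using mAP50 and mAP50-95 as the metrics. YOLOv5, YOLOv6, ResNet18, and other YOLOv8 variants were selected for comparison. With no pre-trained weights used for any model, the improved model outperforms all the others. As shown in Table 4 and Table 5, ResNet18 obtains results close to the improved model on CUB-200-2011, but its parameter count and computational requirements are far larger, and it performs poorly on birds28, which indicates weak generalization. At comparable model size, the improved model performs better than the other algorithms.
To address inaccurate recognition caused by dense targets and missed detections caused by large variation in target size in bird recognition, an improved model based on YOLOv8 is proposed. On CUB-200-2011, the improved model raises mAP50 and mAP50-95 by 4.5 and 3.4 percentage points, respectively, verifying its effectiveness; on the self-built dataset birds28, it raises mAP50 and mAP50-95 by 5.4 and 6.3 percentage points, verifying its generalization.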
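  • Fundamental Research Funds for the Central Universities (ZJ2023-007)
  • Ministry of Education Industry-University Cooperative Education Project (202101199029)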
References
[1] Zhang C, Shi F, Zhang X, et al. Airport near-altitude flying birds detection based on information compensation multi-scale feature fusion[J]. IEEE Sensors Journal, 2023, 23(19): 22867-22879.
[2] Metz I C, Ellerbroek J, Mühlhausen T, et al. The bird strike challenge[J]. Aerospace, 2020, 7(3): 26.
[3] Ferreira A C, Silva L R, Renna F, et al. Deep learning-based methods for individual recognition in small birds[J]. Methods in Ecology and Evolution, 2020, 11(9): 1072-1085.
[4] Liu H, Zhang C, Deng Y, et al. TransIFC: invariant cues-aware feature concentration learning for efficient fine-grained bird image classification[J]. IEEE Transactions on Multimedia, 2023, 27: 1677-1690.
[5] Wang K, Yang F, Chen Z, et al. A fine-grained bird classification method based on attention and decoupled knowledge distillation[J]. Animals, 2023, 13(2): 264.
[6] Chakraborti T, McCane B, Mills S, et al. CoCoNet: a collaborative convolutional network applied to fine-grained bird species classification[C]// 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ). New York: IEEE, 2020: 1-6.
[7] Ren Shujie, Hu Yong, He Wenxiang, et al. Microscopic image recognition of sandstone components based on deep learning[J]. Science Technology and Engineering, 2024, 24(9): 3727-3736. (in Chinese)
[8] Lin Kaiyan, Niu Chengyuan, Zhang Haoping, et al. A method for extracting color characteristics of landscape plants based on deep learning[J]. Science Technology and Engineering, 2024, 24(17): 7059-7065. (in Chinese)
[9] Xie Weiyu, Zhang Qiang. Review on detection of drones and birds in photoelectric images based on deep learning convolutional neural network[J]. Computer Engineering and Applications, 2024, 60(8): 46-55. (in Chinese)
[10] Won C S. Multi-scale CNN for fine-grained image recognition[J]. IEEE Access, 2020, 8: 116663-116674.
[11] Tan M, Zhou J, Peng Z, et al. Fine-grained image classification with factorized deep user click feature[J]. Information Processing & Management, 2020, 57(3): 102186.
[12] Bellafkir H, Vogelbacher M, Schneider D, et al. Edge-based bird species recognition via active learning[C]// International Conference on Networked Systems. Cham: Springer Nature Switzerland, 2023: 17-34.
[13] Xuan Yiguo, Yu Chengbo, Jiang Qichao, et al. Improved YOLOv7 road crack and pothole detection algorithm[J]. Science Technology and Engineering, 2024, 24(17): 7205-7213. (in Chinese)
[14] Kumar A, Das S D. Bird species classification using transfer learning with multistage training[C]// Computer Vision Applications: Third Workshop, WCVA 2018. Singapore: Springer, 2019: 28-38.
[15] Chen Tianhua, Zhu Jiaxuan, Yin Jie. Bird recognition algorithm based on attention mechanism[J]. Journal of Computer Applications, 2024, 44(4): 1114-1120. (in Chinese)
[16] Cohen A S, Cont R, Rossier A, et al. Scaling properties of deep residual networks[C]// International Conference on Machine Learning. New York: PMLR, 2021: 2039-2048.
[17] Li Y, Fan Q, Huang H, et al. A modified YOLOv8 detection network for UAV aerial image recognition[J]. Drones, 2023, 7(5): 304.
[18] Wu T, Dong Y. YOLO-SE: improved YOLOv8 for remote sensing object detection and recognition[J]. Applied Sciences, 2023, 13(24): 12977.
[19] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[J]. arXiv preprint arXiv:2010.11929, 2020.
[20] Xie E, Wang W, Yu Z, et al. SegFormer: simple and efficient design for semantic segmentation with transformers[J]. Advances in Neural Information Processing Systems, 2021, 34: 12077-12090.
[21] Guo M H, Lu C Z, Hou Q, et al. Segnext: rethinking convolutional attention design for semantic segmentation[J]. Advances in Neural Information Processing Systems, 2022, 35: 1140-1156.
[22] Cai H, Li J, Hu M, et al. Efficientvit: lightweight multi-scale attention for high-resolution dense prediction[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. New York: IEEE, 2023: 17302-17313.
[23] Ouyang D, He S, Zhang G, et al. Efficient multi-scale attention module with cross-spatial learning[C]// 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). New York: IEEE, 2023: 1-5.
Publication history
  • Received: 2024-08-09
  • Revised: 2024-12-23
  • Published: 2025-07-08
  • First online: 2025-12-22