Article(id=1249044017850294679, tenantId=1146029695717560320, journalId=1249024232475115590, issueId=1249044006114628363, articleNumber=null, orderNo=null, doi=10.11834/jig.240653, pmid=null, cstr=null, oa=null, hot=null, price=null, onlineType=0, articleFormat=0, articleType=null, articleTypeStr=null, receivedDate=1730908800000, receivedDateStr=2024-11-07, revisedDate=1745164800000, revisedDateStr=2025-04-21, acceptedDate=null, acceptedDateStr=null, onlineDate=1775724899970, onlineDateStr=2026-04-09, pubDate=1765814400000, pubDateStr=2025-12-16, doiRegisterDate=null, doiRegisterDateStr=null, onlineIssueDate=1775724899970, onlineIssueDateStr=2026-04-09, onlineJustAcceptDate=null, onlineJustAcceptDateStr=null, onlineFirstDate=null, onlineFirstDateStr=null, sourceXml=null, magXml=null, createTime=1775724899970, creator=13041195026, updateTime=1775724899970, updator=13041195026, issue=Issue{id=1249044006114628363, tenantId=1146029695717560320, journalId=1249024232475115590, year='2025', volume='30', issue='12', pageStart='3707', pageEnd='3968', issueExtLink='null', onlineDate='null', pubDate='null', beforeIssueId=null, nextIssueId=null, price=null, status=1, issueComplete=1, articleOrder=1, issueType=1, specialIssue=null, createTime=1775724897161, creator=13041195026, updateTime=1775726353303, updator=13041195026, preIssue=null, nextIssue=null, ext={EN=IssueExt(id=1249050113662984471, tenantId=1146029695717560320, journalId=1249024232475115590, issueId=1249044006114628363, language=EN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=), CN=IssueExt(id=1249050113667178776, tenantId=1146029695717560320, journalId=1249024232475115590, issueId=1249044006114628363, language=CN, specialIssueTitle=, coverIllustrator=null, specialIssueEditor=, specialIssueAbout=)}, issueFiles=null}, startPage=3838, endPage=3854, ext={EN=ArticleExt(id=1249044022086541735, articleId=1249044017850294679, tenantId=1146029695717560320, 
journalId=1249024232475115590, language=EN, title=Cross-modal feature fusion and detail-enhanced RGB-D salient object detection, columnId=1249044008786400014, journalTitle=Journal of Image and Graphics, columnName=Image Understanding and Computer Vision, runingTitle=null, highlight=null, articleAbstract=
Objective

RGB-D salient object detection (SOD) combines complementary information from RGB and depth images, offering substantially enhanced performance in complex and challenging scenes compared to RGB-only models. This technique has gained considerable attention in the academic community due to its capability to effectively capture salient objects by leveraging visual and spatial information. However, existing RGB-D detection models face several key challenges. First, efficiently utilizing and fusing multi-modal information from RGB and depth inputs remains a difficult task due to the inherent differences between the two modalities. RGB images provide rich color and texture details but lack depth information, whereas depth maps offer spatial cues but are often noisy or of low quality. Second, achieving accurate boundary detection is particularly challenging in cluttered or noisy environments. Noisy depth maps and cluttered backgrounds can obscure object contours, making it difficult to predict sharp and precise boundaries. These challenges highlight the urgent need for a robust model that can effectively integrate RGB and depth information while simultaneously addressing noise and enhancing boundary precision.

Method

To address these challenges, this paper introduces the cross-modal feature fusion and detail-enhanced RGB-D salient object detection network (CFADNet). The network incorporates two modules: the cross-modal attention fusion enhancement module (CAFEM) and the boundary feature extraction module (BFEM). The CAFEM strengthens the integration of RGB and depth features through attention mechanisms that emphasize the most informative aspects of each modality. Specifically, channel attention is applied to the RGB features to suppress noise and enhance critical color and texture details, and spatial attention is applied to the depth features to emphasize the spatial regions relevant to salient object detection. This attention-based fusion lets the model retain global semantic information from the depth map while preserving fine-grained details from the RGB image. Fusion is performed over multiple layers, progressively integrating features at different scales to exploit the complementary strengths of the RGB and depth modalities. The BFEM, in turn, is designed to improve the accuracy of salient object boundaries. Because accurate contour detection is crucial for generating high-quality saliency maps, the BFEM draws on low-level CNN features, which are rich in edge and texture information. These features are refined through channel attention, which filters out noise and irrelevant details and sharpens boundary-related cues. The refined features then guide cross-modal feature decoding so that the final saliency maps exhibit sharp and accurate boundaries. By combining the edge-extraction capability of low-level CNN features with the semantic richness of cross-modal features, the BFEM notably improves boundary precision in RGB-D salient object detection.
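
The attention-based fusion described above can be illustrated with a minimal NumPy sketch. It is not the CFADNet implementation: the abstract does not give the exact formulation, so the squeeze-and-excitation-style channel gate, the parameter-free spatial gate, the element-wise sum used for fusion, and all names (channel_attention, spatial_attention, fuse, w1, w2) are assumptions chosen only to show the idea of gating RGB features per channel and depth features per location before combining them.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Re-weight each channel of a (C, H, W) feature map.

    Assumed squeeze-and-excitation-style gate: global average pooling,
    a two-layer bottleneck with ReLU, then a sigmoid weight per channel.
    """
    squeezed = feat.mean(axis=(1, 2))          # (C,) one descriptor per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)    # (C // r,) bottleneck with ReLU
    weights = sigmoid(w2 @ hidden)             # (C,) gate in (0, 1)
    return feat * weights[:, None, None]


def spatial_attention(feat):
    """Re-weight each spatial location of a (C, H, W) feature map.

    Assumed parameter-free gate: the channel-wise mean and max maps are
    summed and squashed; a learned convolution would normally be used here.
    """
    gate = sigmoid(feat.mean(axis=0) + feat.max(axis=0))   # (H, W)
    return feat * gate[None, :, :]


def fuse(rgb_feat, depth_feat, w1, w2):
    """Channel-gate the RGB features, spatially gate the depth features, then sum."""
    return channel_attention(rgb_feat, w1, w2) + spatial_attention(depth_feat)


# Hypothetical toy inputs: 8 channels on a 4x4 grid, reduction ratio 2.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
rgb = rng.standard_normal((C, H, W))
depth = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))

fused = fuse(rgb, depth, w1, w2)
print(fused.shape)
```

Because both gates lie strictly between 0 and 1, each branch can only attenuate its input, which mirrors the noise-suppression role the abstract assigns to attention. In a multi-scale design the same fusion step would be repeated at each encoder level.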

Result

To evaluate CFADNet, extensive experiments are conducted on four widely used RGB-D salient object detection datasets: NJU2K, NLPR, STERE, and SIP. These datasets cover a wide range of diverse and challenging scenes, which makes them well suited to assessing the generalization capability of the proposed model. CFADNet is compared against 16 state-of-the-art RGB-D salient object detection methods, including DCF, CIRNet, and CAVER, using standard quantitative metrics such as mean absolute error (MAE), F-measure (Fβ), and structural similarity (Sα). CFADNet achieves superior performance across all four datasets and is strongest on the MAE metric, where it outperforms the second-best method by 6.9%, 10.5%, 9.7%, and 2.4% on the NJU2K, NLPR, STERE, and SIP datasets, respectively. These improvements highlight the effectiveness of the attention-based fusion strategy and the edge refinement mechanism. CFADNet also consistently achieves higher F-measure and Sα scores, indicating that the model not only reduces pixel-level errors but also preserves the overall structure and shape of salient objects more accurately than competing methods. In addition to the quantitative evaluation, qualitative comparisons are conducted to visually assess CFADNet in various challenging scenarios. The results show that the proposed method generates saliency maps with sharp and accurate boundaries, even when salient objects have complex edges or are embedded in cluttered and noisy backgrounds. This finding demonstrates the robustness of CFADNet in difficult scenes: it separates salient objects from the background while preserving fine boundary details. The visual results further confirm that CFADNet captures both global semantic information and local detail, yielding accurate identification and clear isolation of salient objects.
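
For readers unfamiliar with the metrics named above, the following sketch shows how MAE and a thresholded F-measure are commonly computed for a saliency map, and how a percentage gain over a second-best method can be derived. Several points are assumptions rather than details from the abstract: the value β² = 0.3 (a common convention in salient object detection), the fixed threshold of 0.5, the reading of the reported gains as relative MAE reductions, and every number in the toy example, none of which comes from the paper.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth, both in [0, 1]."""
    return float(np.mean(np.abs(pred - gt)))


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure of the thresholded prediction; beta_sq = 0.3 is an assumed convention."""
    binary = pred >= threshold
    gt_mask = gt >= 0.5
    tp = np.logical_and(binary, gt_mask).sum()
    precision = tp / max(binary.sum(), 1)      # guard against an empty prediction
    recall = tp / max(gt_mask.sum(), 1)        # guard against an empty ground truth
    denom = beta_sq * precision + recall
    if denom == 0:
        return 0.0
    return float((1 + beta_sq) * precision * recall / denom)


def relative_reduction(ours, baseline):
    """Percentage by which `ours` is lower than `baseline` (lower MAE is better)."""
    return 100.0 * (baseline - ours) / baseline


# Hypothetical 2x4 ground truth and prediction, for illustration only.
gt = np.array([[1.0, 1.0, 0.0, 0.0],
               [1.0, 1.0, 0.0, 0.0]])
pred = np.array([[0.9, 0.8, 0.1, 0.0],
                 [0.7, 0.4, 0.2, 0.0]])

print("MAE:", mae(pred, gt))
print("F-measure:", f_measure(pred, gt))
# Hypothetical MAE values for two methods, not taken from the paper.
print("Relative MAE reduction (%):", relative_reduction(0.027, 0.030))
```

Under this reading, a lower MAE for the proposed method against the runner-up translates directly into the kind of percentage figures quoted in the results.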

Conclusion

This paper presents CFADNet, a cross-modal feature fusion and detail-enhancement network for RGB-D SOD, designed to address two major challenges: effective multimodal feature fusion and accurate boundary detection. CFADNet introduces two novel modules, the CAFEM and the BFEM, which together integrate RGB and depth information effectively while notably enhancing the precision of salient object boundaries. The attention mechanisms used in the CAFEM enable the network to fully leverage the complementary information from the RGB and depth modalities. The BFEM, in turn, focuses on refining edge details, resulting in sharper and more accurate saliency predictions. Extensive experiments on four benchmark datasets demonstrate that CFADNet consistently outperforms existing state-of-the-art methods, achieving superior performance across key evaluation metrics, including MAE, F-measure, and structural similarity. These findings highlight the robustness and strong generalization capability of CFADNet in diverse and challenging environments. By combining attention-based feature fusion with effective edge refinement, CFADNet offers a powerful and reliable solution for RGB-D salient object detection in complex scenarios. Future research could extend this approach to other multi-modal tasks, such as RGB-Thermal or multi-spectral image processing, where challenges related to multi-modal fusion and boundary detection are also prevalent. Optimizing the computational efficiency of CFADNet for real-time deployment is another potential research direction, which would enable its use in time-sensitive applications such as autonomous driving and robotics.

, correspAuthors=Xiaogang Song, authorNote=null, correspAuthorsNote=null, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=null, magXml=null, pdfUrl=null, pdf=null, pdfFileSize=null, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=null, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=null, mapNumber=null, authorCompany=null, fund=null, authors=null, authorsList=Xiaogang Song, Yuping Tan, Fuqiang Guo, Xiaofeng Lu, Xinhong Hei), CN=ArticleExt(id=1249044036313621140, articleId=1249044017850294679, tenantId=1146029695717560320, journalId=1249024232475115590, language=CN, title=跨模态特征融合与细节信息增强的RGB-D显著目标检测, columnId=1249044009054835474, journalTitle=中国图象图形学报, columnName=图像理解和计算机视觉, runingTitle=null, highlight=null, articleAbstract=
目的

RGB-D显著目标检测通过整合RGB图像和深度图像的互补信息,可以提高应对复杂和具有挑战性场景的显著目标检测(salient object detection,SOD)能力,取得了比RGB显著性检测模型更好的性能,受到高度关注。然而,现有RGB-D 检测模型面临如何高效利用输入的多模态信息进行融合以及如何提高显著目标边缘检测精度等问题。为此,提出一种跨模态特征融合与边缘细节增强的RGB-D显著目标检测方法。

方法

通过跨模态注意力融合增强模块(cross-modal attention fusion enhancement module,CAFEM)对不同模态特征进行注意力整合,使RGB图像和深度图像的互补信息充分融合,使模型充分利用多模态特征,从而提高模型的性能。但是两种模态的输入容易出现背景信息混淆、噪声增多、深度图质量低和目标轮廓提取困难的情况。为应对上述问题,提出一种卷积神经网络(convolutional neural network,CNN)低层特征引导的边缘特征提取模块(boundary feature extraction module,BFEM),通过通道注意力对低层特征携带的噪声进行过滤,然后使用低层细节特征引导跨模态融合特征进行聚焦解码以得到更加准确的显著图像。

结果

在4个RGB-D显著目标检测数据集进行实验,与16种代表性方法进行定量和定性实验对比。在平均绝对误差(mean absolute error, MAE)指标上,本文方法相较于排名第2的方法,在4个数据集上分别提升6.9%、10.5%、9.7%和2.4%。结果表明,本文方法在各场景均有优异表现。

结论

提出一种用于RGB-D显著目标检测的跨模态特征融合与细节信息增强网络(cross-modal feature fusion and detail-enhanced network,CFADNet),通过跨模态注意力融合增强模块(CAFEM),较好地实现了RGB特征与深度特征的融合。此外,构建了边缘特征提取模块(BFEM)提取低层细节特征,最终较为准确地定位显著物体并增强了边缘细节的清晰度。

, correspAuthors=宋霄罡, authorNote=null, correspAuthorsNote=
宋霄罡
, copyrightStatement=null, copyrightOwner=null, extLink=null, articleAbsUrl=null, sourceXml=XG0DEImlKXWEZ9Bt3+9MOQ==, magXml=HsYVbuZLWxCfbj4tJpsveA==, pdfUrl=null, pdf=pBhJba9G/MB8ojWaMmeZWw==, pdfFileSize=4906729, pdfExtLink=null, richHtmlUrl=null, mobilePdfUrl=null, reviewReport=null, pdfFirstPage=null, abstractGraph=5b9ITVT5R5PSNWyvQVLaeQ==, abstractGraphContent=null, abstractVideo=null, citation=null, cebUrl=null, magXmlContent=b5hV4OpwcWM9+/2E8lmeFg==, mapNumber=null, authorCompany=null, fund=null, authors=

宋霄罡,男,副教授,主要研究方向为计算机视觉和无人自主导航系统。E-mail:

谭裕平,男,博士研究生,主要研究方向为显著目标检测和多模态融合。E-mail:

郭富强,男,硕士研究生,主要研究方向为人工智能和目标检测。E-mail:

鲁晓锋,男,教授,主要研究方向为模式识别和图像处理。E-mail:

黑新宏,男,教授,主要研究方向为计算机视觉和人工智能。E-mail:

, authorsList=宋霄罡, 谭裕平, 郭富强, 鲁晓锋, 黑新宏)}, authors=[Author(id=1249044037634826956, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, orderNo=0, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=songxg@xaut.edu.cn, emailSecond=null, emailThird=null, correspondingAuthor=1, authorType=1, ext={EN=AuthorExt(id=1249044037949399766, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, authorId=1249044037634826956, language=EN, stringName=Xiaogang Song, firstName=Xiaogang, middleName=null, lastName=Song, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, 2, *, address=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China
2Human Machine Integration Intelligent Robot Shaanxi Provincial University Engineering Research Center, Xi’an710048, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1249044038159114974, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, authorId=1249044037634826956, language=CN, stringName=宋霄罡, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, 2, *, address=1西安理工大学计算机科学与工程学院,西安710048
2人机共融智能机器人陕西省高校工程研究中心,西安710048, bio={"content":"

宋霄罡,男,副教授,主要研究方向为计算机视觉和无人自主导航系统。E-mail:

"}, bioImg=null, bioContent=

宋霄罡,男,副教授,主要研究方向为计算机视觉和无人自主导航系统。E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1249044037068595890, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, xref=1, ext=[AuthorCompanyExt(id=1249044037085373108, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China), AuthorCompanyExt(id=1249044037118927541, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1西安理工大学计算机科学与工程学院,西安710048)]), AuthorCompany(id=1249044037441888960, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, xref=2, ext=[AuthorCompanyExt(id=1249044037467054785, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037441888960, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2Human Machine Integration Intelligent Robot Shaanxi Provincial University Engineering Research Center, Xi’an710048, China), AuthorCompanyExt(id=1249044037488026307, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037441888960, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2人机共融智能机器人陕西省高校工程研究中心,西安710048)])]), Author(id=1249044038356247271, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, orderNo=1, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, 
email=1179961190@qq.com, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1249044038540796653, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, authorId=1249044038356247271, language=EN, stringName=Yuping Tan, firstName=Yuping, middleName=null, lastName=Tan, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1249044040063329012, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, authorId=1249044038356247271, language=CN, stringName=谭裕平, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1西安理工大学计算机科学与工程学院,西安710048, bio={"content":"

谭裕平,男,博士研究生,主要研究方向为显著目标检测和多模态融合。E-mail:

"}, bioImg=null, bioContent=

谭裕平,男,博士研究生,主要研究方向为显著目标检测和多模态融合。E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1249044037068595890, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, xref=1, ext=[AuthorCompanyExt(id=1249044037085373108, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China), AuthorCompanyExt(id=1249044037118927541, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1西安理工大学计算机科学与工程学院,西安710048)])]), Author(id=1249044040344347391, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, orderNo=2, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=3200441274@stu.xaut.edu.cn, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1249044040503730951, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, authorId=1249044040344347391, language=EN, stringName=Fuqiang Guo, firstName=Fuqiang, middleName=null, lastName=Guo, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1249044040608588557, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, authorId=1249044040344347391, language=CN, stringName=郭富强, firstName=null, middleName=null, lastName=null, 
prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, address=1西安理工大学计算机科学与工程学院,西安710048, bio={"content":"

郭富强,男,硕士研究生,主要研究方向为人工智能和目标检测。E-mail:

"}, bioImg=null, bioContent=

郭富强,男,硕士研究生,主要研究方向为人工智能和目标检测。E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1249044037068595890, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, xref=1, ext=[AuthorCompanyExt(id=1249044037085373108, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China), AuthorCompanyExt(id=1249044037118927541, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1西安理工大学计算机科学与工程学院,西安710048)])]), Author(id=1249044040742806294, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, orderNo=3, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, email=luxiaofeng@xaut.edu.cn, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1249044040897995550, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, authorId=1249044040742806294, language=EN, stringName=Xiaofeng Lu, firstName=Xiaofeng, middleName=null, lastName=Lu, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, 2, address=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China
2Human Machine Integration Intelligent Robot Shaanxi Provincial University Engineering Research Center, Xi’an710048, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1249044041028018981, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, authorId=1249044040742806294, language=CN, stringName=鲁晓锋, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, 2, address=1西安理工大学计算机科学与工程学院,西安710048
2人机共融智能机器人陕西省高校工程研究中心,西安710048, bio={"content":"

鲁晓锋,男,教授,主要研究方向为模式识别和图像处理。E-mail:

"}, bioImg=null, bioContent=

鲁晓锋,男,教授,主要研究方向为模式识别和图像处理。E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1249044037068595890, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, xref=1, ext=[AuthorCompanyExt(id=1249044037085373108, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China), AuthorCompanyExt(id=1249044037118927541, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1西安理工大学计算机科学与工程学院,西安710048)]), AuthorCompany(id=1249044037441888960, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, xref=2, ext=[AuthorCompanyExt(id=1249044037467054785, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037441888960, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2Human Machine Integration Intelligent Robot Shaanxi Provincial University Engineering Research Center, Xi’an710048, China), AuthorCompanyExt(id=1249044037488026307, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037441888960, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2人机共融智能机器人陕西省高校工程研究中心,西安710048)])]), Author(id=1249044041145459500, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, orderNo=4, firstName=null, middleName=null, lastName=null, nameCn=null, orcid=null, stid=null, country=null, authorPic=null, dead=0, 
email=heixinhong@xaut.edu.cn, emailSecond=null, emailThird=null, correspondingAuthor=0, authorType=1, ext={EN=AuthorExt(id=1249044041300648757, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, authorId=1249044041145459500, language=EN, stringName=Xinhong Hei, firstName=Xinhong, middleName=null, lastName=Hei, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, 2, address=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China
2Human Machine Integration Intelligent Robot Shaanxi Provincial University Engineering Research Center, Xi’an710048, China, bio=null, bioImg=null, bioContent=null, aboutCorrespAuthor=null), CN=AuthorExt(id=1249044041460032316, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, authorId=1249044041145459500, language=CN, stringName=黑新宏, firstName=null, middleName=null, lastName=null, prefix=null, suffix=null, authorComment=null, nameInitials=null, affiliation=null, department=null, xref=1, 2, address=1西安理工大学计算机科学与工程学院,西安710048
2人机共融智能机器人陕西省高校工程研究中心,西安710048, bio={"content":"

黑新宏,男,教授,主要研究方向为计算机视觉和人工智能。E-mail:

"}, bioImg=null, bioContent=

黑新宏,男,教授,主要研究方向为计算机视觉和人工智能。E-mail:

, aboutCorrespAuthor=null)}, companyList=[AuthorCompany(id=1249044037068595890, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, xref=1, ext=[AuthorCompanyExt(id=1249044037085373108, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China), AuthorCompanyExt(id=1249044037118927541, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1西安理工大学计算机科学与工程学院,西安710048)]), AuthorCompany(id=1249044037441888960, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, xref=2, ext=[AuthorCompanyExt(id=1249044037467054785, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037441888960, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2Human Machine Integration Intelligent Robot Shaanxi Provincial University Engineering Research Center, Xi’an710048, China), AuthorCompanyExt(id=1249044037488026307, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037441888960, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2人机共融智能机器人陕西省高校工程研究中心,西安710048)])])], keywords=[Keyword(id=1249044041690719041, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, orderNo=1, keyword=salient object detection(SOD)), Keyword(id=1249044042026263372, tenantId=1146029695717560320, 
journalId=1249024232475115590, articleId=1249044017850294679, language=EN, orderNo=2, keyword=attention mechanism), Keyword(id=1249044042126926671, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, orderNo=3, keyword=cross-modal), Keyword(id=1249044042307281752, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, orderNo=4, keyword=feature fusion), Keyword(id=1249044042395362141, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, orderNo=5, keyword=edge detail-enhancement), Keyword(id=1249044042470859617, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, orderNo=1, keyword=显著性目标检测(SOD)), Keyword(id=1249044042554745699, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, orderNo=2, keyword=注意力机制), Keyword(id=1249044042676380521, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, orderNo=3, keyword=跨模态), Keyword(id=1249044042869318512, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, orderNo=4, keyword=特征融合), Keyword(id=1249044043032896377, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, orderNo=5, keyword=边缘细节增强)], refs=[Reference(id=1249044050641363060, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2023, volume=136, issue=null, pageStart=null, pageEnd=109194, url=null, language=null, rfNumber=null, rfOrder=0, authorNames=Bi H B, Wu R W, Liu Z Q, Zhu H H, Zhang C, Xiang T Z, journalName=Pattern Recognition, refType=null, unstructuredReference=Bi H BWu R WLiu Z QZhu H HZhang C and Xiang T Z. 2023. 
Cross-modal hierarchical interaction network for RGB-D salient object detection. Pattern Recognition, 136: #109194 [DOI: 10.1016/j.patcog.2022.109194], articleTitle=Cross-modal hierarchical interaction network for RGB-D salient object detection, refAbstract=null), Reference(id=1249044050737832060, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2018a, volume=null, issue=null, pageStart=3051, pageEnd=3060, url=null, language=null, rfNumber=null, rfOrder=1, authorNames=Chen H, Li Y F, journalName=null, refType=null, unstructuredReference=Chen H and Li Y F. 2018a. Progressively complementarity-aware fusion network for RGB-D salient object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3051-3060 [DOI: 10.1109/CVPR.2018.00322], articleTitle=Progressively complementarity-aware fusion network for RGB-D salient object detection, refAbstract=null), Reference(id=1249044050838495364, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=null, rfOrder=2, authorNames=Chen J N, Lu Y Y, Yu Q H, Luo X D, Adeli E, Wang Y, Lu L, Yuille A L, Zhou Y Y, journalName=null, refType=null, unstructuredReference=Chen J N, Lu Y Y, Yu Q H, Luo X D, Adeli E, Wang Y, Lu L, Yuille A L and Zhou Y Y. 2021. TransUNet: transformers make strong encoders for medical image segmentation [EB/OL]. [2024-11-07]. 
https://arxiv.org/pdf/2102.04306.pdf, articleTitle=TransUNet: transformers make strong encoders for medical image segmentation, refAbstract=null), Reference(id=1249044050943352972, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=35, issue=3, pageStart=4309, pageEnd=4323, url=null, language=null, rfNumber=null, rfOrder=3, authorNames=Chen Q, Zhang Z X, Lu Y Y, Fu K R, Zhao Q J, journalName=IEEE Transactions on Neural Networks and Learning Systems, refType=null, unstructuredReference=Chen Q, Zhang Z X, Lu Y Y, Fu K R and Zhao Q J. 2024. 3-D convolutional neural networks for RGB-D salient object detection and beyond. IEEE Transactions on Neural Networks and Learning Systems, 35(3): 4309-4323 [DOI: 10.1109/TNNLS.2022.3202241], articleTitle=3-D convolutional neural networks for RGB-D salient object detection and beyond, refAbstract=null), Reference(id=1249044051027239058, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2018b, volume=null, issue=null, pageStart=236, pageEnd=252, url=null, language=null, rfNumber=null, rfOrder=4, authorNames=Chen S H, Tan X L, Wang B, Hu X L, journalName=null, refType=null, unstructuredReference=Chen S H, Tan X L, Wang B and Hu X L. 2018b. Reverse attention for salient object detection//Proceedings of the 15th European Conference on Computer Vision (ECCV). 
Munich, Germany: Springer: 236-252 [DOI: 10.1007/978-3-030-01240-3_15], articleTitle=Reverse attention for salient object detection, refAbstract=null), Reference(id=1249044051102736534, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2023, volume=522, issue=null, pageStart=152, pageEnd=164, url=null, language=null, rfNumber=null, rfOrder=5, authorNames=Chen T Y, Xiao J, Hu X G, Zhang G F, Wang S J, journalName=Neurocomputing, refType=null, unstructuredReference=Chen T Y, Xiao J, Hu X G, Zhang G F and Wang S J. 2023. Adaptive fusion network for RGB-D salient object detection. Neurocomputing, 522: 152-164 [DOI: 10.1016/j.neucom.2022.12.004], articleTitle=Adaptive fusion network for RGB-D salient object detection, refAbstract=null), Reference(id=1249044051211788445, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2023, volume=25, issue=null, pageStart=4253, pageEnd=4266, url=null, language=null, rfNumber=null, rfOrder=6, authorNames=Cheng X L, Zheng X, Pei J L, Tang H, Lyu Z, Chen C B, journalName=IEEE Transactions on Multimedia, refType=null, unstructuredReference=Cheng X L, Zheng X, Pei J L, Tang H, Lyu Z and Chen C B. 2023. Depth-induced gap-reducing network for RGB-D salient object detection: an interaction, guidance and refinement approach. 
IEEE Transactions on Multimedia, 25: 4253-4266 [DOI: 10.1109/TMM.2022.3172852], articleTitle=Depth-induced gap-reducing network for RGB-D salient object detection: an interaction, guidance and refinement approach, refAbstract=null), Reference(id=1249044051320840358, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2013, volume=null, issue=null, pageStart=#112, pageEnd=null, url=null, language=null, rfNumber=null, rfOrder=7, authorNames=Ciptadi A, Hermans T, Rehg J M, journalName=null, refType=null, unstructuredReference=Ciptadi A, Hermans T and Rehg J M. 2013. An in depth view of saliency//Proceedings of 2013 British Machine Vision Conference (BMVC). Bristol, UK: BMVC: #112 [DOI: 10.5244/C.27.112], articleTitle=An in depth view of saliency, refAbstract=null), Reference(id=1249044051442475182, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2022, volume=31, issue=null, pageStart=6800, pageEnd=6815, url=null, language=null, rfNumber=null, rfOrder=8, authorNames=Cong R M, Lin Q W, Zhang C, Li C Y, Cao X C, Huang Q M, Zhao Y, journalName=IEEE Transactions on Image Processing, refType=null, unstructuredReference=Cong R M, Lin Q W, Zhang C, Li C Y, Cao X C, Huang Q M and Zhao Y. 2022. CIR-Net: cross-modality interaction and refinement for RGB-D salient object detection. 
IEEE Transactions on Image Processing, 31: 6800-6815 [DOI: 10.1109/TIP.2022.3216198], articleTitle=CIR-Net: cross-modality interaction and refinement for RGB-D salient object detection, refAbstract=null), Reference(id=1249044051513778354, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2023, volume=null, issue=null, pageStart=406, pageEnd=416, url=null, language=null, rfNumber=null, rfOrder=9, authorNames=Cong R M, Liu H Y, Zhang C, Zhang W, Zheng F, Song R, Kwong S, journalName=null, refType=null, unstructuredReference=Cong R M, Liu H Y, Zhang C, Zhang W, Zheng F, Song R and Kwong S. 2023. Point-aware interaction and CNN-induced refinement network for RGB-D salient object detection//Proceedings of the 31st ACM International Conference on Multimedia. Ottawa, Canada: ACM: 406-416 [DOI: 10.1145/3581783.3611982], articleTitle=Point-aware interaction and CNN-induced refinement network for RGB-D salient object detection, refAbstract=null), Reference(id=1249044051589275834, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=9, issue=1, pageStart=2624, pageEnd=2635, url=null, language=null, rfNumber=null, rfOrder=10, authorNames=Ding N, Zhang C, Eskandarian A, journalName=IEEE Transactions on Intelligent Vehicles, refType=null, unstructuredReference=Ding N, Zhang C and Eskandarian A. 2024. SalienDet: a saliency-based feature enhancement algorithm for object detection for autonomous driving. 
IEEE Transactions on Intelligent Vehicles, 9(1): 2624-2635 [DOI: 10.1109/TIV.2023.3287359], articleTitle=SalienDet: a saliency-based feature enhancement algorithm for object detection for autonomous driving, refAbstract=null), Reference(id=1249044051673161919, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2019, volume=61, issue=null, pageStart=1, pageEnd=9, url=null, language=null, rfNumber=null, rfOrder=11, authorNames=Ding Y, Liu Z, Huang M K, Shi R, Wang X Y, journalName=Journal of Visual Communication and Image Representation, refType=null, unstructuredReference=Ding Y, Liu Z, Huang M K, Shi R and Wang X Y. 2019. Depth-aware saliency detection using convolutional neural networks. Journal of Visual Communication and Image Representation, 61: 1-9 [DOI: 10.1016/j.jvcir.2019.03.019], articleTitle=Depth-aware saliency detection using convolutional neural networks, refAbstract=null), Reference(id=1249044051786408134, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2021, volume=32, issue=5, pageStart=2075, pageEnd=2089, url=null, language=null, rfNumber=null, rfOrder=12, authorNames=Fan D P, Lin Z, Zhang Z, Zhu M L, Cheng M M, journalName=IEEE Transactions on Neural Networks and Learning Systems, refType=null, unstructuredReference=Fan D P, Lin Z, Zhang Z, Zhu M L and Cheng M M. 2021. Rethinking RGB-D salient object detection: models, data sets, and large-scale benchmarks. 
IEEE Transactions on Neural Networks and Learning Systems, 32(5): 2075-2089 [DOI: 10.1109/TNNLS.2020.2996406], articleTitle=Rethinking RGB-D salient object detection: models, data sets, and large-scale benchmarks, refAbstract=null), Reference(id=1249044051929014477, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=594, issue=null, pageStart=null, pageEnd=127865, url=null, language=null, rfNumber=null, rfOrder=13, authorNames=Fang X, Jiang M F, Zhu J C, Shao X L, Wang H P, journalName=Neurocomputing, refType=null, unstructuredReference=Fang X, Jiang M F, Zhu J C, Shao X L and Wang H P. 2024. GroupTransNet: group transformer network for RGB-D salient object detection. Neurocomputing, 594: #127865 [DOI: 10.1016/j.neucom.2024.127865], articleTitle=GroupTransNet: group transformer network for RGB-D salient object detection, refAbstract=null), Reference(id=1249044053476712660, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2022, volume=44, issue=9, pageStart=5541, pageEnd=5559, url=null, language=null, rfNumber=null, rfOrder=14, authorNames=Fu K P, Fan D P, Ji G P, Zhao Q J, Shen J B, Zhu C, journalName=IEEE Transactions on Pattern Analysis and Machine Intelligence, refType=null, unstructuredReference=Fu K P, Fan D P, Ji G P, Zhao Q J, Shen J B and Zhu C. 2022. Siamese network for RGB-D salient object detection and beyond. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9): 5541-5559 [DOI: 10.1109/TPAMI.2021.3073689], articleTitle=Siamese network for RGB-D salient object detection and beyond, refAbstract=null), Reference(id=1249044053552210138, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2012, volume=21, issue=9, pageStart=4290, pageEnd=4303, url=null, language=null, rfNumber=null, rfOrder=15, authorNames=Gao Y, Wang M, Tao D C, Ji R R, Dai Q H, journalName=IEEE Transactions on Image Processing, refType=null, unstructuredReference=Gao Y, Wang M, Tao D C, Ji R R and Dai Q H. 2012. 3-D object retrieval and recognition with hypergraph analysis. IEEE Transactions on Image Processing, 21(9): 4290-4303 [DOI: 10.1109/TIP.2012.2199502], articleTitle=3-D object retrieval and recognition with hypergraph analysis, refAbstract=null), Reference(id=1249044053640290530, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2018, volume=48, issue=11, pageStart=3171, pageEnd=3183, url=null, language=null, rfNumber=null, rfOrder=16, authorNames=Han J W, Chen H, Liu N, Yan C G, Li X L, journalName=IEEE Transactions on Cybernetics, refType=null, unstructuredReference=Han J W, Chen H, Liu N, Yan C G and Li X L. 2018. CNNs-based RGB-D saliency detection via cross-view transfer and multiview fusion. 
IEEE Transactions on Cybernetics, 48(11): 3171-3183 [DOI: 10.1109/TCYB.2017.2761775], articleTitle=CNNs-based RGB-D saliency detection via cross-view transfer and multiview fusion, refAbstract=null), Reference(id=1249044053711593701, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2019, volume=41, issue=4, pageStart=815, pageEnd=828, url=null, language=null, rfNumber=null, rfOrder=17, authorNames=Hou Q B, Cheng M M, Hu X W, Borji A, Tu Z W, Torr P H S, journalName=IEEE Transactions on Pattern Analysis and Machine Intelligence, refType=null, unstructuredReference=Hou Q B, Cheng M M, Hu X W, Borji A, Tu Z W and Torr P H S. 2019. Deeply supervised salient object detection with short connections. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4): 815-828 [DOI: 10.1109/TPAMI.2018.2815688], articleTitle=Deeply supervised salient object detection with short connections, refAbstract=null), Reference(id=1249044053782896873, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2022, volume=24, issue=null, pageStart=1651, pageEnd=1664, url=null, language=null, rfNumber=null, rfOrder=18, authorNames=Huang N C, Yang Y, Zhang D W, Zhang Q, Han J G, journalName=IEEE Transactions on Multimedia, refType=null, unstructuredReference=Huang N C, Yang Y, Zhang D W, Zhang Q and Han J G. 2022. Employing bilinear fusion and saliency prior information for RGB-D salient object detection. 
IEEE Transactions on Multimedia, 24: 1651-1664 [DOI: 10.1109/TMM.2021.3069297], articleTitle=Employing bilinear fusion and saliency prior information for RGB-D salient object detection, refAbstract=null), Reference(id=1249044053850005742, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2019, volume=23, issue=2, pageStart=509, pageEnd=518, url=null, language=null, rfNumber=null, rfOrder=19, authorNames=Jahanifar M, Tajeddin N Z, Asl B M, Gooya A, journalName=IEEE Journal of Biomedical and Health Informatics, refType=null, unstructuredReference=Jahanifar M, Tajeddin N Z, Asl B M and Gooya A. 2019. Supervised saliency map driven segmentation of lesions in dermoscopic images. IEEE Journal of Biomedical and Health Informatics, 23(2): 509-518 [DOI: 10.1109/JBHI.2018.2839647], articleTitle=Supervised saliency map driven segmentation of lesions in dermoscopic images, refAbstract=null), Reference(id=1249044053942280438, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=null, pageStart=9466, pageEnd=9476, url=null, language=null, rfNumber=null, rfOrder=20, authorNames=Ji W, Li J J, Yu S, Zhang M, Piao Y, Yao S Y, Bi Q, Ma K, Zheng Y F, Lu H C, Cheng L, journalName=null, refType=null, unstructuredReference=Ji W, Li J J, Yu S, Zhang M, Piao Y, Yao S Y, Bi Q, Ma K, Zheng Y F, Lu H C and Cheng L. 2021. Calibrated RGB-D salient object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 
Nashville, USA: IEEE: 9466-9476 [DOI: 10.1109/CVPR46437.2021.00935], articleTitle=Calibrated RGB-D salient object detection, refAbstract=null), Reference(id=1249044054034555132, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2014, volume=null, issue=null, pageStart=1115, pageEnd=1119, url=null, language=null, rfNumber=null, rfOrder=21, authorNames=Ju R, Ge L, Geng W J, Ren T W, Wu G S, journalName=null, refType=null, unstructuredReference=Ju R, Ge L, Geng W J, Ren T W and Wu G S. 2014. Depth saliency based on anisotropic center-surround difference//Proceedings of 2014 IEEE International Conference on Image Processing (ICIP). Paris, France: IEEE: 1115-1119 [DOI: 10.1109/ICIP.2014.7025222], articleTitle=Depth saliency based on anisotropic center-surround difference, refAbstract=null), Reference(id=1249044054135218433, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2015, volume=null, issue=null, pageStart=1, pageEnd=15, url=null, language=null, rfNumber=null, rfOrder=22, authorNames=Kingma D P, Ba J, journalName=null, refType=null, unstructuredReference=Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR: 1-15, articleTitle=Adam: a method for stochastic optimization, refAbstract=null), Reference(id=1249044054214910215, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2022, volume=null, issue=null, pageStart=630, pageEnd=647, url=null, language=null, rfNumber=null, rfOrder=23, authorNames=Lee M, Park C, Cho S, Lee S, journalName=null, refType=null, unstructuredReference=Lee M, Park C, Cho S and Lee S. 2022. 
SPSN: superpixel prototype sampling network for RGB-D salient object detection//Proceedings of the 17th European Conference on Computer Vision (ECCV). Tel Aviv, Israel: Springer: 630-647 [DOI: 10.1007/978-3-031-19818-2_36], articleTitle=SPSN: superpixel prototype sampling network for RGB-D salient object detection, refAbstract=null), Reference(id=1249044054483345677, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2021, volume=30, issue=null, pageStart=6855, pageEnd=6868, url=null, language=null, rfNumber=null, rfOrder=24, authorNames=Li J, Su J M, Xia C Q, Ma M C, Tian Y H, journalName=IEEE Transactions on Image Processing, refType=null, unstructuredReference=Li J, Su J M, Xia C Q, Ma M C and Tian Y H. 2021. Salient object detection with purificatory mechanism and structural similarity loss. IEEE Transactions on Image Processing, 30: 6855-6868 [DOI: 10.1109/TIP.2021.3099405], articleTitle=Salient object detection with purificatory mechanism and structural similarity loss, refAbstract=null), Reference(id=1249044054596591893, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=238, issue=null, pageStart=null, pageEnd=121778, url=null, language=null, rfNumber=null, rfOrder=25, authorNames=Liang B C, Luo H L, journalName=Expert Systems with Applications, refType=null, unstructuredReference=Liang B C and Luo H L. 2024. MEANet: an effective and lightweight solution for salient object detection in optical remote sensing images. 
Expert Systems with Applications, 238: #121778 [DOI: 10.1016/j.eswa.2023.121778], articleTitle=MEANet: an effective and lightweight solution for salient object detection in optical remote sensing images, refAbstract=null), Reference(id=1249044054755975451, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2018, volume=275, issue=null, pageStart=2227, pageEnd=2238, url=null, language=null, rfNumber=null, rfOrder=26, authorNames=Liang F F, Duan L J, Ma W, Qiao Y H, Cai Z, Qing L, journalName=Neurocomputing, refType=null, unstructuredReference=Liang F F, Duan L J, Ma W, Qiao Y H, Cai Z and Qing L. 2018. Stereoscopic saliency model using contrast and depth-guided-background prior. Neurocomputing, 275: 2227-2238 [DOI: 10.1016/j.neucom.2017.10.052], articleTitle=Stereoscopic saliency model using contrast and depth-guided-background prior, refAbstract=null), Reference(id=1249044054852444450, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2022, volume=32, issue=3, pageStart=1378, pageEnd=1389, url=null, language=null, rfNumber=null, rfOrder=27, authorNames=Mei H Y, Liu Y Y, Wei Z Q, Zhou D S, Wei X P, Zhang Q, Yang X, journalName=IEEE Transactions on Circuits and Systems for Video Technology, refType=null, unstructuredReference=Mei H Y, Liu Y Y, Wei Z Q, Zhou D S, Wei X P, Zhang Q and Yang X. 2022. Exploring dense context for salient object detection. 
IEEE Transactions on Circuits and Systems for Video Technology, 32(3): 1378-1389 [DOI: 10.1109/TCSVT.2021.3069848], articleTitle=Exploring dense context for salient object detection, refAbstract=null), Reference(id=1249044054927941928, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2012, volume=null, issue=null, pageStart=454, pageEnd=461, url=null, language=null, rfNumber=null, rfOrder=28, authorNames=Niu Y Z, Geng Y J, Li X Q, Liu F, journalName=null, refType=null, unstructuredReference=Niu Y Z, Geng Y J, Li X Q and Liu F. 2012. Leveraging stereopsis for saliency analysis//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, USA: IEEE: 454-461 [DOI: 10.1109/CVPR.2012.6247708], articleTitle=Leveraging stereopsis for saliency analysis, refAbstract=null), Reference(id=1249044055020216621, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2023, volume=32, issue=null, pageStart=892, pageEnd=904, url=null, language=null, rfNumber=null, rfOrder=29, authorNames=Pang Y W, Zhao X Q, Zhang L H, Lu H C, journalName=IEEE Transactions on Image Processing, refType=null, unstructuredReference=Pang Y W, Zhao X Q, Zhang L H and Lu H C. 2023. CAVER: cross-modal view-mixed transformer for bi-modal salient object detection. 
IEEE Transactions on Image Processing, 32: 892-904 [DOI: 10.1109/TIP.2023.3234702], articleTitle=CAVER: cross-modal view-mixed transformer for bi-modal salient object detection, refAbstract=null), Reference(id=1249044055083131184, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2019, volume=null, issue=null, pageStart=#721, pageEnd=null, url=null, language=null, rfNumber=null, rfOrder=30, authorNames=Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z M, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J J, Chintala S, journalName=null, refType=null, unstructuredReference=Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z M, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J J and Chintala S. 2019. PyTorch: an imperative style, high-performance deep learning library//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: #721, articleTitle=PyTorch, refAbstract=null), Reference(id=1249044055183794487, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2014, volume=null, issue=null, pageStart=92, pageEnd=109, url=null, language=null, rfNumber=null, rfOrder=31, authorNames=Peng H W, Li B, Xiong W H, Hu W M, Ji R R, journalName=null, refType=null, unstructuredReference=Peng H W, Li B, Xiong W H, Hu W M and Ji R R. 2014. RGBD salient object detection: a benchmark and algorithms//Proceedings of the 13th European Conference on Computer Vision (ECCV). 
Zurich, Switzerland: Springer: 92-109 [DOI: 10.1007/978-3-319-10578-9_7], articleTitle=RGBD salient object detection: a benchmark and algorithms, refAbstract=null), Reference(id=1249044055267680573, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2015, volume=null, issue=null, pageStart=25, pageEnd=32, url=null, language=null, rfNumber=null, rfOrder=32, authorNames=Ren J Q, Xiaojin Gong N, Yu L, Wenhui Zhou N, Yang M Y, journalName=null, refType=null, unstructuredReference=Ren J Q, Xiaojin Gong N, Yu L, Wenhui Zhou N and Yang M Y. 2015. Exploiting global priors for RGB-D saliency detection//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Boston, USA: IEEE: 25-32 [DOI: 10.1109/CVPRW.2015.7301391], articleTitle=Exploiting global priors for RGB-D saliency detection, refAbstract=null), Reference(id=1249044055364149569, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2014, volume=24, issue=5, pageStart=769, pageEnd=779, url=null, language=null, rfNumber=null, rfOrder=33, authorNames=Ren Z X, Gao S H, Chia L T, Tsang I W H, journalName=IEEE Transactions on Circuits and Systems for Video Technology, refType=null, unstructuredReference=Ren Z X, Gao S H, Chia L T and Tsang I W H. 2014. Region-based saliency detection and its application in object recognition. 
IEEE Transactions on Circuits and Systems for Video Technology, 24(5): 769-779 [DOI: 10.1109/TCSVT.2013.2280096], articleTitle=Region-based saliency detection and its application in object recognition, refAbstract=null), Reference(id=1249044055448035653, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2015, volume=null, issue=null, pageStart=null, pageEnd=null, url=null, language=null, rfNumber=null, rfOrder=34, authorNames=Simonyan K, Zisserman A, journalName=null, refType=null, unstructuredReference=Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR, articleTitle=Very deep convolutional networks for large-scale image recognition, refAbstract=null), Reference(id=1249044055519338823, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=35, issue=4, pageStart=1899, pageEnd=1913, url=null, language=null, rfNumber=null, rfOrder=35, authorNames=Sun F M, Hu X H, Wu J Y, Sun J, Wang F S, journalName=Journal of Software, refType=null, unstructuredReference=Sun F M, Hu X H, Wu J Y, Sun J and Wang F S. 2024. RGB-D salient object detection based on cross-modal interactive fusion and global awareness. Journal of Software, 35(4): 1899-1913, articleTitle=RGB-D salient object detection based on cross-modal interactive fusion and global awareness, refAbstract=null), Reference(id=1249044055582253387, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=35, issue=4, pageStart=1899, pageEnd=1913, url=null, language=null, rfNumber=null, rfOrder=36, authorNames=孙福明, 胡锡航, 武景宇, 孙静, 王法胜, journalName=软件学报, refType=null, unstructuredReference=孙福明, 胡锡航, 武景宇, 孙静, 王法胜. 2024. 
跨模态交互融合与全局感知的RGB-D显著性目标检测. 软件学报, 35(4): 1899-1913 [DOI: 10.13328/j.cnki.jos.006833], articleTitle=跨模态交互融合与全局感知的RGB-D显著性目标检测, refAbstract=null), Reference(id=1249044055653556559, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2022a, volume=31, issue=null, pageStart=1285, pageEnd=1297, url=null, language=null, rfNumber=null, rfOrder=37, authorNames=Wang F Y, Pan J S, Xu S K, Tang J H, journalName=IEEE Transactions on Image Processing, refType=null, unstructuredReference=Wang F Y, Pan J S, Xu S K and Tang J H. 2022a. Learning discriminative cross-modality features for RGB-D saliency detection. IEEE Transactions on Image Processing, 31: 1285-1297 [DOI: 10.1109/TIP.2022.3140606], articleTitle=Learning discriminative cross-modality features for RGB-D saliency detection, refAbstract=null), Reference(id=1249044055733248339, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2022b, volume=44, issue=6, pageStart=3239, pageEnd=3259, url=null, language=null, rfNumber=null, rfOrder=38, authorNames=Wang W G, Lai Q X, Fu H Z, Shen J B, Ling H B, Yang R G, journalName=IEEE Transactions on Pattern Analysis and Machine Intelligence, refType=null, unstructuredReference=Wang W G, Lai Q X, Fu H Z, Shen J B, Ling H B and Yang R G. 2022b. Salient object detection in the deep learning era: an in-depth survey. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6): 3239-3259 [DOI: 10.1109/TPAMI.2021.3051099], articleTitle=Salient object detection in the deep learning era: an in-depth survey, refAbstract=null), Reference(id=1249044055808745814, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2019, volume=41, issue=7, pageStart=1531, pageEnd=1544, url=null, language=null, rfNumber=null, rfOrder=39, authorNames=Wang W G, Shen J B, Ling H B, journalName=IEEE Transactions on Pattern Analysis and Machine Intelligence, refType=null, unstructuredReference=Wang W G, Shen J B and Ling H B. 2019. A deep network solution for attention and aesthetics aware photo cropping. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7): 1531-1544 [DOI: 10.1109/TPAMI.2018.2840724], articleTitle=A deep network solution for attention and aesthetics aware photo cropping, refAbstract=null), Reference(id=1249044055884243290, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2021, volume=43, issue=7, pageStart=2413, pageEnd=2428, url=null, language=null, rfNumber=null, rfOrder=40, authorNames=Wang W G, Shen J B, Lu X K, Hoi S C H, Ling H B, journalName=IEEE Transactions on Pattern Analysis and Machine Intelligence, refType=null, unstructuredReference=Wang W G, Shen J B, Lu X K, Hoi S C H and Ling H B. 2021. Paying attention to video object pattern understanding. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(7): 2413-2428 [DOI: 10.1109/TPAMI.2020.2966453], articleTitle=Paying attention to video object pattern understanding, refAbstract=null), Reference(id=1249044055959740766, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=46, issue=3, pageStart=1635, pageEnd=1649, url=null, language=null, rfNumber=null, rfOrder=41, authorNames=Wang W G, Sun G L, Van Gool L, journalName=IEEE Transactions on Pattern Analysis and Machine Intelligence, refType=null, unstructuredReference=Wang W G, Sun G L and Van Gool L. 2024. Looking beyond single images for weakly supervised semantic segmentation learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(3): 1635-1649 [DOI: 10.1109/TPAMI.2022.3168530], articleTitle=Looking beyond single images for weakly supervised semantic segmentation learning, refAbstract=null), Reference(id=1249044056064598370, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2022a, volume=195, issue=null, pageStart=null, pageEnd=116614, url=null, language=null, rfNumber=null, rfOrder=42, authorNames=Wu J Y, Sun F M, Xu R, Meng J, Wang F S, journalName=Expert Systems with Applications, refType=null, unstructuredReference=Wu J Y, Sun F M, Xu R, Meng J and Wang F S. 2022a. Aggregate interactive learning for RGB-D salient object detection. 
Expert Systems with Applications, 195: #116614 [DOI: 10.1016/j.eswa.2022.116614], articleTitle=Aggregate interactive learning for RGB-D salient object detection, refAbstract=null), Reference(id=1249044056144290150, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2023, volume=45, issue=11, pageStart=12760, pageEnd=12771, url=null, language=null, rfNumber=null, rfOrder=43, authorNames=Wu Y H, Liu Y, Zhan X, Cheng M M, journalName=IEEE Transactions on Pattern Analysis and Machine Intelligence, refType=null, unstructuredReference=Wu Y H, Liu Y, Zhan X and Cheng M M. 2023. P2T: pyramid pooling transformer for scene understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11): 12760-12771 [DOI: 10.1109/TPAMI.2022.3202765], articleTitle=P2T: pyramid pooling transformer for scene understanding, refAbstract=null), Reference(id=1249044056240759147, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2022b, volume=31, issue=null, pageStart=3125, pageEnd=3136, url=null, language=null, rfNumber=null, rfOrder=44, authorNames=Wu Y H, Liu Y, Zhang L, Cheng M M, Ren B, journalName=IEEE Transactions on Image Processing, refType=null, unstructuredReference=Wu Y H, Liu Y, Zhang L, Cheng M M and Ren B. 2022b. EDN: salient object detection via extremely-downsampled network. 
IEEE Transactions on Image Processing, 31: 3125-3136 [DOI: 10.1109/TIP.2022.3164550], articleTitle=EDN: salient object detection via extremely-downsampled network, refAbstract=null), Reference(id=1249044056316256622, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=26, issue=null, pageStart=2648, pageEnd=2658, url=null, language=null, rfNumber=null, rfOrder=45, authorNames=Xiao F, Pu Z D, Chen J Q, Gao X P, journalName=IEEE Transactions on Multimedia, refType=null, unstructuredReference=Xiao F, Pu Z D, Chen J Q and Gao X P. 2024. DGFNet: depth-guided cross-modality fusion network for RGB-D salient object detection. IEEE Transactions on Multimedia, 26: 2648-2658 [DOI: 10.1109/TMM.2023.3301280], articleTitle=DGFNet: depth-guided cross-modality fusion network for RGB-D salient object detection, refAbstract=null), Reference(id=1249044056391754097, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=29, issue=5, pageStart=1252, pageEnd=1264, url=null, language=null, rfNumber=null, rfOrder=46, authorNames=Ye X Y, Zhu L, Wang W W, Fu Y, journalName=Journal of Image and Graphics, refType=null, unstructuredReference=Ye X Y, Zhu L, Wang W W and Fu Y. 2024. RGB_D salient object detection algorithm based on complementary information interaction. Journal of Image and Graphics, 29(5): 1252-1264, articleTitle=RGB_D salient object detection algorithm based on complementary information interaction, refAbstract=null), Reference(id=1249044057973006708, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=29, issue=5, pageStart=1252, pageEnd=1264, url=null, language=null, rfNumber=null, rfOrder=47, authorNames=叶欣悦, 朱磊, 王文武, 付云, journalName=中国图象图形学报, refType=null, unstructuredReference=叶欣悦, 朱磊, 王文武, 付云. 2024. 
互补特征交互融合的RGB_D实时显著目标检测. 中国图象图形学报, 29(5): 1252-1264 [DOI: 10.11834/jig.230583], articleTitle=互补特征交互融合的RGB_D实时显著目标检测, refAbstract=null), Reference(id=1249044058044309879, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=49, issue=3, pageStart=1259, pageEnd=1290, url=null, language=null, rfNumber=null, rfOrder=48, authorNames=Zhang R, Lyu Y, Zhang Z T, Ren L, Xie J, Zhang A L, Yan Z W, Mi O, journalName=Journal of China Coal Society, refType=null, unstructuredReference=Zhang R, Lyu Y, Zhang Z T, Ren L, Xie J, Zhang A L, Yan Z W and Mi O. 2024. Development and prospect of multidimensional information perception and intelligent construction in deep earth engineering. Journal of China Coal Society, 49(3): 1259-1290, articleTitle=Development and prospect of multidimensional information perception and intelligent construction in deep earth engineering, refAbstract=null), Reference(id=1249044058128195963, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=49, issue=3, pageStart=1259, pageEnd=1290, url=null, language=null, rfNumber=null, rfOrder=49, authorNames=张茹, 吕游, 张泽天, 任利, 谢晶, 张安林, 严志伟, 米欧, journalName=煤炭学报, refType=null, unstructuredReference=张茹, 吕游, 张泽天, 任利, 谢晶, 张安林, 严志伟, 米欧. 2024. 深地工程多维信息感知与智能建造的发展与展望. 煤炭学报, 49(3): 1259-1290 [DOI: 10.13225/j.cnki.jccs.2023.1439], articleTitle=深地工程多维信息感知与智能建造的发展与展望, refAbstract=null), Reference(id=1249044058224664958, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2020, volume=null, issue=null, pageStart=3469, pageEnd=3478, url=null, language=null, rfNumber=null, rfOrder=50, authorNames=Zhang M, Ren W S, Piao Y, Rong Z K, Lu H C, journalName=null, refType=null, unstructuredReference=Zhang M, Ren W S, Piao Y, Rong Z K and Lu H C. 2020. 
Select, supplement and focus for RGB-D saliency detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE:3469-3478 [DOI: 10.1109/CVPR42600.2020.00353], articleTitle=Select, supplement and focus for RGB-D saliency detection, refAbstract=null), Reference(id=1249044058346299777, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2023, volume=25, issue=null, pageStart=5142, pageEnd=5154, url=null, language=null, rfNumber=null, rfOrder=51, authorNames=Zhang M, Yao S Y, Hu B Q, Piao Y, Ji W, journalName=IEEE Transactions on Multimedia, refType=null, unstructuredReference=Zhang MYao S YHu B QPiao Y and Ji W. 2023. C2DFNet: criss-cross dynamic filter network for RGB-D salient object detection. IEEE Transactions on Multimedia25: 5142-5154 [DOI: 10.1109/TMM.2022.3187856], articleTitle=C2DFNet: criss-cross dynamic filter network for RGB-D salient object detection, refAbstract=null), Reference(id=1249044058413408643, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2019, volume=24, issue=7, pageStart=1096, pageEnd=1105, url=null, language=null, rfNumber=null, rfOrder=52, authorNames=Zhang Q, Li Y, Li W J, Lin J J, Xiao M, Chen F Y, journalName=Journal of Image and Graphics, refType=null, unstructuredReference=Zhang QLi YLi W JLin J JXiao M and Chen F Y. 2019. Salient object detection via deep features and multiple kernel boosting learning. 
Journal of Image and Graphics24(7): 1096-1105, articleTitle=Salient object detection via deep features and multiple kernel boosting learning, refAbstract=null), Reference(id=1249044058472128902, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2019, volume=24, issue=7, pageStart=1096, pageEnd=1105, url=null, language=null, rfNumber=null, rfOrder=53, authorNames=张晴, 李云, 李文举, 林家骏, 肖莽, 陈飞云, journalName=中国图象图形学报, refType=null, unstructuredReference=张晴, 李云, 李文举, 林家骏, 肖莽, 陈飞云. 2019. 融合深度特征和多核增强学习的显著目标检测. 中国图象图形学报24(7): 1096-1105 [DOI: 10.11834/jig.180224], articleTitle=融合深度特征和多核增强学习的显著目标检测, refAbstract=null), Reference(id=1249044058535043465, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2024, volume=34, issue=3, pageStart=1493, pageEnd=1507, url=null, language=null, rfNumber=null, rfOrder=54, authorNames=Zhang Q, Qin Q, Yang Y, Jiao Q, Han J G, journalName=IEEE Transactions on Circuits and Systems for Video Technology, refType=null, unstructuredReference=Zhang QQin QYang YJiao Q and Han J G. 2024. Feature calibrating and fusing network for RGB-D salient object detection. IEEE Transactions on Circuits and Systems for Video Technology34(3): 1493-1507 [DOI: 10.1109/TCSVT.2023.3296581], articleTitle=Feature calibrating and fusing network for RGB-D salient object detection, refAbstract=null), Reference(id=1249044058614735245, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2019, volume=null, issue=null, pageStart=8778, pageEnd=8787, url=null, language=null, rfNumber=null, rfOrder=55, authorNames=Zhao J X, Liu J J, Fan D P, Cao Y, Yang J F, Cheng M M, journalName=null, refType=null, unstructuredReference=Zhao J XLiu J JFan D PCao YYang J F and Cheng M M. 2019. 
EGNet: edge guidance network for salient object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE:8778-8787 [DOI: 10.1109/ICCV.2019.00887], articleTitle=EGNet: edge guidance network for salient object detection, refAbstract=null), Reference(id=1249044058686038416, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2021, volume=47, issue=9, pageStart=2120, pageEnd=2131, url=null, language=null, rfNumber=null, rfOrder=56, authorNames=Zhao X K, Li M L, Zhang G, Li N, Li J S, journalName=Acta Automatica Sinica, refType=null, unstructuredReference=Zhao X KLi M LZhang GLi N and Li J S. 2021. Object detection method based on saliency map fusion for UAV-borne thermal images. Acta Automatica Sinica47(9): 2120-2131, articleTitle=Object detection method based on saliency map fusion for UAV-borne thermal images, refAbstract=null), Reference(id=1249044058765730195, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2021, volume=47, issue=9, pageStart=2120, pageEnd=2131, url=null, language=null, rfNumber=null, rfOrder=57, authorNames=赵兴科, 李明磊, 张弓, 黎宁, 李家松, journalName=自动化学报, refType=null, unstructuredReference=赵兴科, 李明磊, 张弓, 黎宁, 李家松. 2021. 基于显著图融合的无人机载热红外图像目标检测方法. 自动化学报47(9): 2120-2131 [DOI: 10.16383/j.aas.c200021], articleTitle=基于显著图融合的无人机载热红外图像目标检测方法, refAbstract=null), Reference(id=1249044058878976407, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2022, volume=6, issue=3, pageStart=593, pageEnd=601, url=null, language=null, rfNumber=null, rfOrder=58, authorNames=Zhou W J, Pan S J, Lei J S, Yu L, journalName=IEEE Transactions on Emerging Topics in Computational Intelligence, refType=null, unstructuredReference=Zhou W JPan S JLei J S and Yu L. 2022. 
TMFNet: three-input multilevel fusion network for detecting salient objects in RGB-D images. IEEE Transactions on Emerging Topics in Computational Intelligence6(3): 593-601 [DOI: 10.1109/TETCI.2021.3097393], articleTitle=TMFNet: three-input multilevel fusion network for detecting salient objects in RGB-D images, refAbstract=null), Reference(id=1249044058946085274, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, doi=null, pmid=null, pmcid=null, year=2021, volume=null, issue=null, pageStart=894, pageEnd=910, url=null, language=null, rfNumber=null, rfOrder=59, authorNames=Zhu X Z, Su W J, Lu L W, Li B, Wang X G, Dai J F, journalName=null, refType=null, unstructuredReference=Zhu X ZSu W JLu L WLi BWang X G and Dai J F. 2021. Deformable DETR: deformable transformers for end-to-end object detection//Proceedings of the 9th International Conference on Learning Representations. [s.l.]: ICLR:894-910, articleTitle=Deformable DETR: deformable transformers for end-to-end object detection, refAbstract=null)], funds=null, companyList=[AuthorCompany(id=1249044037068595890, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, xref=1, ext=[AuthorCompanyExt(id=1249044037085373108, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an710048, China), AuthorCompanyExt(id=1249044037118927541, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037068595890, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=1西安理工大学计算机科学与工程学院,西安710048)]), AuthorCompany(id=1249044037441888960, tenantId=1146029695717560320, 
journalId=1249024232475115590, articleId=1249044017850294679, xref=2, ext=[AuthorCompanyExt(id=1249044037467054785, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037441888960, language=EN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2Human Machine Integration Intelligent Robot Shaanxi Provincial University Engineering Research Center, Xi’an710048, China), AuthorCompanyExt(id=1249044037488026307, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, companyId=1249044037441888960, language=CN, country=null, province=null, city=null, postcode=null, companyName=null, departmentName=null, remark=2人机共融智能机器人陕西省高校工程研究中心,西安710048)])], figs=[ArticleFig(id=1249044044853224328, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Fig.1, caption=Cross-modal feature fusion and detail-enhanced network (CFADNet) architecture, figureFileSmall=b9xoypNJvQAZ/f1KIYZ5Rw==, figureFileBig=5b9ITVT5R5PSNWyvQVLaeQ==, tableContent=null), ArticleFig(id=1249044045054550930, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=图1, caption=跨模态特征融合与细节信息增强网络架构, figureFileSmall=b9xoypNJvQAZ/f1KIYZ5Rw==, figureFileBig=5b9ITVT5R5PSNWyvQVLaeQ==, tableContent=null), ArticleFig(id=1249044045482369958, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Fig.2, caption=Cross-modal attention fusion enhancement module (CAFEM) structure, figureFileSmall=2uhsbBWaG/bxmMtUcqM6Tw==, figureFileBig=z0xKJgosBNVy2igEVr9oYA==, tableContent=null), ArticleFig(id=1249044045608199085, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=图2, caption=跨模态注意力融合增强模块, figureFileSmall=2uhsbBWaG/bxmMtUcqM6Tw==, 
figureFileBig=z0xKJgosBNVy2igEVr9oYA==, tableContent=null), ArticleFig(id=1249044045742416822, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Fig.3, caption=Boundary feature extraction module (BFEM) structure, figureFileSmall=Jw3UhX9+iJ5m08zlf0QYlw==, figureFileBig=NC8dF7yhbgQw5vHFzvhwiQ==, tableContent=null), ArticleFig(id=1249044045876634556, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=图3, caption=边缘特征提取模块, figureFileSmall=Jw3UhX9+iJ5m08zlf0QYlw==, figureFileBig=NC8dF7yhbgQw5vHFzvhwiQ==, tableContent=null), ArticleFig(id=1249044046065378247, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Fig.4, caption=Intermediate feature visualization of BFEM, figureFileSmall=CbyjmQjrRgXz7Y0XCBb5Yw==, figureFileBig=iXODMvZOrlrtfr61nyfFyA==, tableContent=null), ArticleFig(id=1249044046300259280, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=图4, caption=边缘特征提取模块中间特征可视化, figureFileSmall=CbyjmQjrRgXz7Y0XCBb5Yw==, figureFileBig=iXODMvZOrlrtfr61nyfFyA==, tableContent=null), ArticleFig(id=1249044046493197272, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Fig.5, caption=Fβ curves, figureFileSmall=5un3Fwp3R4Q7kgl6tmJN6g==, figureFileBig=wIXYZKjSCZVLoZthmSx/Mg==, tableContent=null), ArticleFig(id=1249044046648386526, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=图5, caption=Fβ曲线, figureFileSmall=5un3Fwp3R4Q7kgl6tmJN6g==, figureFileBig=wIXYZKjSCZVLoZthmSx/Mg==, tableContent=null), ArticleFig(id=1249044046883267559, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Fig.6, caption=Precision-recall curves, figureFileSmall=6lD61WN/Udf/MR+Vx+sGrg==, 
figureFileBig=UtvkU0ax5LOMFVooO3v56A==, tableContent=null), ArticleFig(id=1249044047118148586, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=图6, caption=精度-召回率曲线, figureFileSmall=6lD61WN/Udf/MR+Vx+sGrg==, figureFileBig=UtvkU0ax5LOMFVooO3v56A==, tableContent=null), ArticleFig(id=1249044047277532146, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Fig.7, caption=Visual comparison of our method with the state-of-the-art methods, figureFileSmall=kHVy4BjbYoZni2vKHl6dFQ==, figureFileBig=57fPEvkm4J1o1GMfFrUtSA==, tableContent=null), ArticleFig(id=1249044047462081529, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=图7, caption=本文方法与现有先进方法的直观对比, figureFileSmall=kHVy4BjbYoZni2vKHl6dFQ==, figureFileBig=57fPEvkm4J1o1GMfFrUtSA==, tableContent=null), ArticleFig(id=1249044049022362624, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Fig.8, caption=Comparison of details of prediction maps obtained by different edge feature extraction method, figureFileSmall=/Rk4H27UZDdYl/vwEAyhAg==, figureFileBig=TnNylwhufUu5hE70leHitg==, tableContent=null), ArticleFig(id=1249044049106247687, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=图8, caption=采用不同边缘特征提取方法得到的预测图细节对比, figureFileSmall=/Rk4H27UZDdYl/vwEAyhAg==, figureFileBig=TnNylwhufUu5hE70leHitg==, tableContent=null), ArticleFig(id=1249044049190133769, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Fig.9, caption=Some failure cases of our method, figureFileSmall=klBTHQkna5CgSRztwKM0iQ==, figureFileBig=cuo1Tgp/W8Tw4CmPDG5BqA==, tableContent=null), ArticleFig(id=1249044049311768596, tenantId=1146029695717560320, journalId=1249024232475115590, 
articleId=1249044017850294679, language=CN, label=图9, caption=失败案例, figureFileSmall=klBTHQkna5CgSRztwKM0iQ==, figureFileBig=cuo1Tgp/W8Tw4CmPDG5BqA==, tableContent=null), ArticleFig(id=1249044049429209116, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Tab.1, caption=

Quantitative evaluation

, figureFileSmall=null, figureFileBig=null, tableContent=
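Method | Venue | NJU2K MAE ↓ | NJU2K Fβ | NJU2K Sα | NLPR MAE ↓ | NLPR Fβ | NLPR Sα | STERE MAE ↓ | STERE Fβ | STERE Sα | SIP MAE ↓ | SIP Fβ | SIP Sα
DCF | CVPR 21 | 0.035 | 0.902 | 0.912 | 0.021 | 0.891 | 0.924 | 0.039 | 0.885 | 0.902 | 0.051 | 0.875 | 0.876
CIRNet | TIP 22 | 0.035 | 0.927 | 0.925 | 0.023 | 0.924 | 0.933 | 0.038 | 0.914 | 0.917 | 0.052 | 0.896 | 0.888
CM-LCG | TIP 22 | 0.043 | 0.915 | 0.913 | 0.029 | 0.906 | 0.922 | 0.043 | 0.906 | 0.910 | - | - | -
AILNet | ESWA 22 | 0.045 | 0.876 | 0.898 | 0.029 | 0.857 | 0.912 | 0.038 | 0.880 | 0.908 | 0.050 | 0.866 | 0.889
SPSN | ECCV 22 | 0.032 | 0.920 | 0.918 | 0.023 | 0.910 | 0.923 | 0.035 | 0.900 | 0.907 | 0.042 | 0.899 | 0.892
TMFNet | TETCI 22 | 0.041 | 0.882 | 0.910 | 0.027 | 0.867 | 0.921 | - | - | - | 0.057 | 0.853 | 0.874
AFNet | Nuecom 22 | 0.032 | 0.928 | 0.926 | 0.020 | 0.925 | 0.936 | 0.034 | 0.918 | 0.918 | 0.043 | 0.909 | 0.896
JL-DCF | TPAMI 22 | 0.040 | 0.913 | 0.911 | 0.023 | 0.917 | 0.926 | 0.039 | 0.907 | 0.911 | 0.046 | 0.900 | 0.892
EBFSP | TMM 22 | 0.038 | 0.895 | 0.907 | 0.028 | 0.887 | 0.909 | 0.041 | 0.873 | 0.900 | 0.052 | 0.863 | 0.877
CAVER | TIP 23 | 0.031 | 0.925 | 0.921 | 0.020 | 0.921 | 0.929 | 0.033 | 0.912 | 0.913 | 0.042 | 0.902 | 0.893
HINet | PR 23 | 0.039 | 0.914 | 0.915 | 0.026 | 0.906 | 0.922 | 0.049 | 0.883 | 0.892 | 0.066 | 0.855 | 0.856
C2DFNet | TMM 23 | 0.039 | 0.909 | 0.908 | 0.022 | 0.917 | 0.928 | 0.038 | 0.897 | 0.902 | 0.053 | 0.877 | 0.872
PICRNet | ACMM 23 | 0.029 | 0.931 | 0.927 | 0.019 | 0.928 | 0.935 | 0.031 | 0.920 | 0.921 | 0.053 | 0.883 | 0.872
DGFNet | TMM 24 | 0.032 | 0.914 | 0.921 | 0.021 | 0.902 | 0.928 | 0.035 | 0.896 | 0.911 | 0.048 | 0.879 | 0.883
FCFNet | TCSVT 24 | 0.034 | 0.923 | 0.918 | 0.024 | 0.911 | 0.924 | 0.038 | 0.906 | 0.906 | - | - | -
RD3D | TNNLS 24 | 0.033 | 0.928 | 0.928 | 0.022 | 0.921 | 0.933 | 0.037 | 0.905 | 0.914 | 0.046 | 0.900 | 0.892
CFADNet (ours) | - | 0.027 | 0.933 | 0.930 | 0.017 | 0.934 | 0.939 | 0.028 | 0.923 | 0.925 | 0.041 | 0.910 | 0.897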
), ArticleFig(id=1249044049529872418, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=表1, caption=

定量评估

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1249044049617952811, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Tab.2, caption=

Quantitative comparison in terms of average precision and average recall with other methods

, figureFileSmall=null, figureFileBig=null, tableContent=
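Method | NJU2K Prec_avg | NJU2K Recall_avg | NLPR Prec_avg | NLPR Recall_avg | STERE Prec_avg | STERE Recall_avg | SIP Prec_avg | SIP Recall_avg
DCF | 0.908 | 0.917 | 0.898 | 0.922 | 0.888 | 0.918 | 0.902 | 0.847
SPSN | 0.918 | 0.924 | 0.901 | 0.924 | 0.893 | 0.924 | 0.901 | 0.894
AFNet | 0.916 | 0.925 | 0.907 | 0.924 | 0.900 | 0.931 | 0.913 | 0.874
CAVER | 0.924 | 0.929 | 0.917 | 0.924 | 0.904 | 0.930 | 0.922 | 0.876
HINet | 0.909 | 0.908 | 0.896 | 0.909 | 0.868 | 0.892 | 0.885 | 0.808
PICRNet | 0.900 | 0.894 | 0.911 | 0.921 | 0.869 | 0.873 | 0.887 | 0.864
C2DFNet | 0.910 | 0.898 | 0.911 | 0.919 | 0.887 | 0.913 | 0.887 | 0.858
RD3D | 0.917 | 0.925 | 0.904 | 0.923 | 0.884 | 0.926 | 0.907 | 0.872
CFADNet (ours) | 0.927 | 0.937 | 0.926 | 0.938 | 0.909 | 0.946 | 0.922 | 0.882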
), ArticleFig(id=1249044049727004720, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=表2, caption=

与其他方法的平均精度和平均召回率比较

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1249044049823473720, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Tab.3, caption=

Comparison of computational complexity and parameter size with other methods

, figureFileSmall=null, figureFileBig=null, tableContent=
Method | FLOPs/G | Parameters/M
DCF55.48107.29
CIRNet156.3482.08
TMFNet-266.7
AFNet130.02258.13
HINet389.798.9
C2DFNet22.04747.52
DGFNet74.8942.14
RD3D57.847.14
CFADNet(本文)46.82203.88
), ArticleFig(id=1249044049945108542, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=表3, caption=

与其他方法计算复杂度和参数量对比

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1249044050012217412, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Tab.4, caption=

Comparison experiment of CAFEM with other fusion methods

, figureFileSmall=null, figureFileBig=null, tableContent=
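Method | NJU2K MAE ↓ | NJU2K Fβ | NJU2K Sα | NLPR MAE ↓ | NLPR Fβ | NLPR Sα | STERE MAE ↓ | STERE Fβ | STERE Sα
w/o CAFEM | 0.029 | 0.928 | 0.926 | 0.018 | 0.933 | 0.937 | 0.030 | 0.920 | 0.923
CmPI | 0.030 | 0.929 | 0.925 | 0.018 | 0.931 | 0.938 | 0.032 | 0.915 | 0.919
CAFEM | 0.027 | 0.933 | 0.930 | 0.017 | 0.934 | 0.939 | 0.028 | 0.923 | 0.925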
), ArticleFig(id=1249044050083520587, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=表4, caption=

CAFEM与不同融合方法的对比实验

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1249044050171600978, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Tab.5, caption=

Ablation studies on the BFEM

, figureFileSmall=null, figureFileBig=null, tableContent=
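Method | NJU2K MAE ↓ | NJU2K Fβ | NJU2K Sα | NLPR MAE ↓ | NLPR Fβ | NLPR Sα | STERE MAE ↓ | STERE Fβ | STERE Sα
w/o EE | 0.029 | 0.927 | 0.926 | 0.017 | 0.933 | 0.938 | 0.030 | 0.921 | 0.922
BFEM | 0.027 | 0.933 | 0.930 | 0.017 | 0.934 | 0.939 | 0.028 | 0.923 | 0.925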
), ArticleFig(id=1249044050272264280, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=表5, caption=

BFEM的消融实验

, figureFileSmall=null, figureFileBig=null, tableContent=
), ArticleFig(id=1249044050402287713, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=EN, label=Tab.6, caption=

Ablation studies on the loss function

, figureFileSmall=null, figureFileBig=null, tableContent=
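Method | NJU2K MAE ↓ | NJU2K Fβ | NJU2K Sα | NLPR MAE ↓ | NLPR Fβ | NLPR Sα | STERE MAE ↓ | STERE Fβ | STERE Sα
F_BCE | 0.034 | 0.910 | 0.921 | 0.020 | 0.905 | 0.911 | 0.036 | 0.903 | 0.910
F_BCE + F_SSIM | 0.030 | 0.925 | 0.927 | 0.019 | 0.925 | 0.935 | 0.031 | 0.921 | 0.921
F_hybrid | 0.032 | 0.917 | 0.924 | 0.021 | 0.918 | 0.928 | 0.033 | 0.912 | 0.917
F_BCE + F_SSIM + F_IoU | 0.027 | 0.933 | 0.930 | 0.017 | 0.934 | 0.939 | 0.028 | 0.923 | 0.925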
), ArticleFig(id=1249044050515533930, tenantId=1146029695717560320, journalId=1249024232475115590, articleId=1249044017850294679, language=CN, label=表6, caption=

混合损失函数的消融实验

, figureFileSmall=null, figureFileBig=null, tableContent=
)], attaches=null, journal=Journal(id=1249023527618129992, delFlag=0, nameCn=中国图象图形学报, nameEn=Journal of Image and Graphics, nameHistory1=null, nameHistory2=null, issn=1006-8961, eissn=null, cn=11-3758, coden=CODEN ZTTXFZ, periodic=0, language=CN, oaType=null, ccby=null, superviseOffice=null, ownerOffice=null, pubOffice=null, editorOffice=null, officeType=null, aims=null, clcCode=null, officeProv=null, officeCity=null, officeAddr=null, officeZip=null, officeEmail=null, officePhone=null, editDirector=null, officeDirector=null, officeDirectorPhone=null, officeStaffNum=null, officeEmpNum=null, coverPicUrl=uirXtX858YS3zEpFXZttJA==, journalPrice=null, startedYear=null, abbrevIsoEn=Journal of Image and Graphics, journalRemark=null, publicationField=null, createdTime=1775720014721, updatedTime=1775720337198, createdBy=18614031015, updatedBy=13701087609, firstLetterCn=J, firstLetterEn=J, subjectCode=Engineering, subjectName=null, subjectCodeEn=Engineering, subjectNameEn=null, picCn=uirXtX858YS3zEpFXZttJA==, picEn=bud7qaxfvWHeFsbyBTAiKQ==, jcr=null, cjcr=null, exts=[JournalExt(id=1249024880377786590, language=CN, name=中国图象图形学报, nameHistory1=null, nameHistory2=null, managedBy=, sponsoredBy=, publishedBy=, editorOffice=, officeProv=null, officeCity=null, officeAddr=, officeZip=, editDirector=, officeDirector=null, officePhone=null, coverPicUrl=null, journalRemark=, submitArticleUrl=null, websiteUrl=, createdTime=1775720337242, updatedTime=1775720337242, createdBy=13701087609, updatedBy=13701087609, submissionGuidelinesUrl=, submissionAuthorUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-cjig-author&redirect_uri=https%3A%2F%2Fcjig.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=e6369def-2842-41d8, 
submissionEditorUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-cjig-editor&redirect_uri=https%3A%2F%2Fcjigeditor.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=9ccec05b-6bd, submissionReviewUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-cjig-author&redirect_uri=https%3A%2F%2Fcjig.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=1e8a31c8-5434-4f78, submissionCeEditorUrl=, submissionAeEditorUrl=, option={"copyright":""}), JournalExt(id=1249024880449089759, language=EN, name=Journal of Image and Graphics, nameHistory1=null, nameHistory2=null, managedBy=, sponsoredBy=, publishedBy=, editorOffice=, officeProv=null, officeCity=null, officeAddr=, officeZip=, editDirector=, officeDirector=null, officePhone=null, coverPicUrl=null, journalRemark=, submitArticleUrl=null, websiteUrl=, createdTime=1775720337259, updatedTime=1775720337259, createdBy=13701087609, updatedBy=13701087609, submissionGuidelinesUrl=, submissionAuthorUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-cjig-author&redirect_uri=https%3A%2F%2Fcjig.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=e6369def-2842-41d8, submissionEditorUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-cjig-editor&redirect_uri=https%3A%2F%2Fcjigeditor.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=9ccec05b-6bd, submissionReviewUrl=https://journal.ids.fzyun.cn/auth/realms/journal/protocol/openid-connect/auth?client_id=journal-cjig-author&redirect_uri=https%3A%2F%2Fcjig.portal.founderss.cn%2Foauth%2Fcallback&response_type=code&scope=phone+openid+email+profile&state=1e8a31c8-5434-4f78, submissionCeEditorUrl=, submissionAeEditorUrl=, 
option={"copyright":""})], databaseList=null, tenantJournalId=1249024232475115590, websiteList=[Website(id=1249025782459334881, webName=null, webTitle=null, webDomain=null, webCopyrigh=null, webIpcNo=null, seoTitle=null, seoKeywords=null, seoDescription=null, tenantJournalId=null, journalId=1249024232475115590, journalNameCn=null, journalNameEn=null, grayFlag=null, tenantId=1146029695717560320, platformId=null, journalGroupId=null, journalGroupNameCn=null, journalGroupNameEn=null, type=1, domain=https://castjournals.cast.org.cn/joweb/zgtxtxxb/CN, language=CN, createTime=1775720552315, createBy=18614031015, updateTime=1775720586268, updateBy=18614031015, name=中国图象图形学报-中文, tplId=1146099689490845704, title=中国图象图形学报, delFlag=0, indexPage=/home, props=[WebsiteProps(id=1249026166254928133, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1249025782459334881, code=articleTextType, value=kx, createTime=1775720643819, updateTime=1775720643819, creator=18614031015, updator=18614031015), WebsiteProps(id=1249026166221373698, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1249025782459334881, code=banner, value=null, createTime=1775720643811, updateTime=1775720643811, creator=18614031015, updator=18614031015), WebsiteProps(id=1249026166271705352, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1249025782459334881, code=grayFlag, value=0, createTime=1775720643823, updateTime=1775720643823, creator=18614031015, updator=18614031015), WebsiteProps(id=1249026166212985089, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1249025782459334881, code=logo, value=https://castjournals.cast.org.cn/joweb/zgtxtxxb/CN/file/pic?fileId=TDRjKTHfgAnvFKZaDA70wA==, createTime=1775720643809, updateTime=1775720643809, creator=18614031015, updator=18614031015), WebsiteProps(id=1249026166288482570, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1249025782459334881, 
code=minRunFlag, value=0, createTime=1775720643827, updateTime=1775720643827, creator=18614031015, updator=18614031015), WebsiteProps(id=1249026166246539524, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1249025782459334881, code=picServerUrl, value=https://castjournals.cast.org.cn/joweb/zgtxtxxb/CN/file/pic, createTime=1775720643817, updateTime=1775720643817, creator=18614031015, updator=18614031015), WebsiteProps(id=1249026166280093961, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1249025782459334881, code=silenceFlag, value=0, createTime=1775720643825, updateTime=1775720643825, creator=18614031015, updator=18614031015), WebsiteProps(id=1249026166233956611, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1249025782459334881, code=staticResourcePath, value=https://castjournals.cast.org.cn/joweb/cast_kjdb_cn_619/, createTime=1775720643814, updateTime=1775720643814, creator=18614031015, updator=18614031015), WebsiteProps(id=1249026166259122438, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1249025782459334881, code=themeColor, value=null, createTime=1775720643820, updateTime=1775720643820, creator=18614031015, updator=18614031015), WebsiteProps(id=1249026166267511047, tenantId=1146029695717560320, journalId=null, journalGroupId=null, siteId=1249025782459334881, code=themeStyle, value=null, createTime=1775720643822, updateTime=1775720643822, creator=18614031015, updator=18614031015)]), Website(id=1249025782681633001, webName=null, webTitle=null, webDomain=null, webCopyrigh=null, webIpcNo=null, seoTitle=null, seoKeywords=null, seoDescription=null, tenantJournalId=null, journalId=1249024232475115590, journalNameCn=null, journalNameEn=null, grayFlag=null, tenantId=1146029695717560320, platformId=null, journalGroupId=null, journalGroupNameCn=null, journalGroupNameEn=null, type=1, domain=https://castjournals.cast.org.cn/joweb/zgtxtxxb/EN, language=EN, 
Cross-modal feature fusion and detail-enhanced RGB-D salient object detection
Xiaogang Song 1, 2, *, Yuping Tan 1, Fuqiang Guo 1, Xiaofeng Lu 1, 2, Xinhong Hei 1, 2
Journal of Image and Graphics | Image Understanding and Computer Vision, 2025, 30(12): 3838-3854
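Author information
  • 1 School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
  • 2 Human Machine Integration Intelligent Robot Shaanxi Provincial University Engineering Research Center, Xi’an 710048, China
  • Xiaogang Song, male, associate professor. Research interests: computer vision and unmanned autonomous navigation systems.
    Yuping Tan, male, Ph.D. candidate. Research interests: salient object detection and multi-modal fusion.
    Fuqiang Guo, male, master's student. Research interests: artificial intelligence and object detection.
    Xiaofeng Lu, male, professor. Research interests: pattern recognition and image processing.
    Xinhong Hei, male, professor. Research interests: computer vision and artificial intelligence.

Corresponding author: Xiaogang Song (marked with * in the author list)
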
Cross-modal feature fusion and detail-enhanced RGB-D salient object detection
Xiaogang Song1, 2, *, Yuping Tan1, Fuqiang Guo1, Xiaofeng Lu1, 2, Xinhong Hei1, 2
Affiliations
  • 1School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
  • 2Human Machine Integration Intelligent Robot Shaanxi Provincial University Engineering Research Center, Xi’an 710048, China
Published: 2025-12-16 DOI: 10.11834/jig.240653
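Objective

RGB-D salient object detection (SOD) integrates the complementary information of RGB and depth images. This improves detection in complex and challenging scenes and yields better performance than RGB-only saliency models, so the task has attracted considerable attention. However, existing RGB-D models still face two problems: how to fuse the multi-modal inputs efficiently, and how to improve the accuracy of salient object boundaries. To this end, an RGB-D salient object detection method with cross-modal feature fusion and edge detail enhancement is proposed.

Method

A cross-modal attention fusion enhancement module (CAFEM) integrates the features of the two modalities through attention, so that the complementary information of RGB and depth images is fully fused and exploited and model performance improves. Two-modality input, however, is prone to background confusion, increased noise, low-quality depth maps, and difficulty in extracting object contours. To address these problems, a boundary feature extraction module (BFEM) guided by low-level convolutional neural network (CNN) features is proposed. It filters the noise carried by low-level features with channel attention and then uses the low-level detail features to guide focused decoding of the cross-modal fused features, producing a more accurate saliency map.

Result

Experiments are conducted on four RGB-D salient object detection datasets, with quantitative and qualitative comparisons against 16 representative methods. On the mean absolute error (MAE) metric, the proposed method improves on the second-ranked method by 6.9%, 10.5%, 9.7%, and 2.4% on the four datasets, respectively. The results show that the method performs well across scenes.

Conclusion

A cross-modal feature fusion and detail-enhanced network (CFADNet) for RGB-D salient object detection is proposed. Its cross-modal attention fusion enhancement module (CAFEM) fuses RGB and depth features effectively. In addition, a boundary feature extraction module (BFEM) extracts low-level detail features, so that salient objects are localized accurately and their edge details become clearer.

salient object detection (SOD)  /  attention mechanism  /  cross-modal  /  feature fusion  /  edge detail enhancement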
Objective

RGB-D salient object detection (SOD) combines complementary information from RGB and depth images, offering substantially enhanced performance in complex and challenging scenes compared to RGB-only models. This technique has gained considerable attention in the academic community due to its capability to effectively capture salient objects by leveraging visual and spatial information. However, existing RGB-D detection models face several key challenges. First, efficiently utilizing and fusing multi-modal information from RGB and depth inputs remains a difficult task due to the inherent differences between the two modalities. RGB images provide rich color and texture details but lack depth information, whereas depth maps offer spatial cues but are often noisy or of low quality. Second, achieving accurate boundary detection is particularly challenging in cluttered or noisy environments. Noisy depth maps and cluttered backgrounds can obscure object contours, making it difficult to predict sharp and precise boundaries. These challenges highlight the urgent need for a robust model that can effectively integrate RGB and depth information while simultaneously addressing noise and enhancing boundary precision.

Method

Aiming to address these challenges, a novel method, the cross-modal feature fusion and detail-enhanced RGB-D salient object detection network (CFADNet), is introduced. The proposed network incorporates two innovative modules: the cross-modal attention fusion enhancement module (CAFEM) and the boundary feature extraction module (BFEM). The CAFEM is designed to enhance the integration of RGB and depth features by leveraging attention mechanisms that emphasize the most informative aspects of each modality. Specifically, channel attention is applied to the RGB features to suppress noise and enhance critical color and texture details. Similarly, spatial attention is applied to the depth features to emphasize spatial regions that are relevant for salient object detection. This attention-based fusion mechanism ensures that the model effectively retains global semantic information from the depth map while preserving fine-grained details from the RGB image. The fusion process is structured in multiple layers, progressively integrating features at different scales to fully utilize the complementary strengths of RGB and depth modalities. In contrast, the BFEM is specifically designed to improve the accuracy of salient object boundaries. Accurate contour detection is crucial for generating high-quality saliency maps; thus, BFEM leverages low-level CNN features, which are rich in edge and texture information. These features are refined through channel attention, which filters out noise and irrelevant details, enhancing the clarity of boundary-related cues. The refined features are then used to guide cross-modal feature decoding, ensuring that the final saliency maps exhibit sharp and accurate boundaries. By combining the edge-extraction capabilities of low-level CNN features with the semantic richness of cross-modal features, BFEM notably improves boundary precision in RGB-D salient object detection.

Result

To evaluate the performance of CFADNet, extensive experiments are conducted on four widely used RGB-D salient object detection datasets: NJU2K, NLPR, STERE, and SIP. These datasets cover diverse and challenging scenes, making them suitable for evaluating the generalization capability of the proposed model. CFADNet is compared against 16 state-of-the-art RGB-D salient object detection methods, including DCF, CIRNet, and CAVER, using standard quantitative metrics: mean absolute error (MAE), F-measure (Fβ), and structural similarity (Sα). CFADNet performs best across all datasets, particularly on the MAE metric, where it outperforms the second-best method by 6.9%, 10.5%, 9.7%, and 2.4% on the NJU2K, NLPR, STERE, and SIP datasets, respectively. These improvements highlight the effectiveness of the attention-based fusion strategy and the edge refinement mechanism. CFADNet also achieves consistently higher Fβ and Sα scores, indicating that the model not only reduces pixel-level errors but also preserves the overall structure and shape of salient objects more accurately than competing methods. In addition to quantitative evaluation, qualitative comparisons visually assess CFADNet in various challenging scenarios. The proposed method generates saliency maps with sharp and accurate boundaries, even when salient objects have complex edges or are embedded in cluttered and noisy backgrounds. This demonstrates the robustness of CFADNet in difficult scenes: it separates salient objects from the background while preserving fine boundary details. The visual results further confirm that CFADNet captures both global semantic information and local detail, so salient objects are identified accurately and isolated clearly from the background.

Conclusion

This paper presents CFADNet, a cross-modal feature fusion and detail-enhancement network for RGB-D SOD, designed to address two major challenges: effective multimodal feature fusion and accurate boundary detection. CFADNet introduces two novel modules, the CAFEM and the BFEM, which together integrate RGB and depth information while notably enhancing the precision of salient object boundaries. The attention mechanisms used in the CAFEM enable the network to fully leverage the complementary information from the RGB and depth modalities. The BFEM focuses on refining edge details, resulting in sharper and more accurate saliency predictions. Extensive experiments on four benchmark datasets demonstrate that CFADNet consistently outperforms existing state-of-the-art methods across key evaluation metrics, including MAE, F-measure, and structural similarity. These findings highlight the robustness and strong generalization capability of CFADNet in diverse and challenging environments. By combining attention-based feature fusion with effective edge refinement, CFADNet is a reliable solution for RGB-D salient object detection in complex scenarios. Future research could extend this approach to other multi-modal tasks, such as RGB-Thermal or multi-spectral image processing, where challenges related to multi-modal fusion and boundary detection are also prevalent. Optimizing the computational efficiency of CFADNet for real-time deployment is another potential direction, enabling its use in time-sensitive applications such as autonomous driving and robotics.

salient object detection (SOD)  /  attention mechanism  /  cross-modal  /  feature fusion  /  edge detail enhancement
Xiaogang Song, Yuping Tan, Fuqiang Guo, Xiaofeng Lu, Xinhong Hei. Cross-modal feature fusion and detail-enhanced RGB-D salient object detection[J]. Journal of Image and Graphics, 2025, 30(12): 3838-3854. DOI: 10.11834/jig.240653
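Visual saliency detection uses algorithms to simulate human vision and evaluate how strongly different parts of an image attract attention, a property called saliency, and finally produces a saliency map. Salient object detection (SOD) focuses on segmenting salient objects. Combining SOD with image processing algorithms lets those algorithms be optimized and use resources more efficiently. Across computer vision, many models already use saliency detection as a preprocessing step, for example in image retrieval (Gao et al., 2012), photo cropping (Wang et al., 2019), scene classification (Ren et al., 2014), semantic segmentation (Wang et al., 2024), and video segmentation (Wang et al., 2021). Beyond its auxiliary role in computer vision research and intelligent systems, the technique has been applied successfully in real-world scenarios such as medical image segmentation (Jahanifar et al., 2019; Chen et al., 2021), security surveillance (Zhao et al., 2021), satellite imagery (Liang and Luo, 2024), intelligent driving (Ding et al., 2024), and deep earth engineering (Zhang et al., 2024a).
Since the advent of deep learning (Chen et al., 2018b; Hou et al., 2019; Li et al., 2021), RGB SOD frameworks based on convolutional neural networks (CNNs) (Mei et al., 2022; Wu et al., 2022b; Zhu et al., 2021) have developed rapidly and far surpass methods based on handcrafted features. However, the receptive field of a convolution is limited. In theory a CNN can enlarge its receptive field by stacking more layers, but as the network deepens, repeated downsampling loses more features, and the network can no longer predict complete salient objects with sharp edges. In addition, although RGB input provides texture detail, color, and contrast, it lacks spatial information, and poor lighting blurs RGB images, so fewer texture details are extracted and noise increases. Prediction accuracy drops and the model hits a performance bottleneck. Depth information is therefore introduced to offset degraded RGB quality: the spatial features carried by depth model global relationships and compensate for the global semantic information missing from RGB features.
Various RGB-D SOD models have been proposed over the past few years. Although they introduce depth information, how to exploit multi-modal information and how to strengthen edge features remain open problems. Recent RGB-D SOD models tend to explore efficient ways of fusing multi-modal features. Ji et al. (2021) proposed a learning strategy that calibrates the latent bias in raw depth maps to improve SOD performance, together with a simple cross fusion module for RGB and depth features. Chen and Li (2018a) proposed a complementarity-aware fusion module that integrates cross-modal and cross-level feature representations; it exploits complementary information by explicitly using cross-modal and cross-level connections together with modal-wise and level-wise supervision, which reduces fusion ambiguity. Fang et al. (2024) proposed a modality purification module and a scale unification module to fuse multi-modal features. Chen et al. (2023) proposed a triple-encoder network that adaptively evaluates modality importance through a multi-modal feature interaction module. Cheng et al. (2023) introduced a cross-modal interaction block (CMIB) embedded with ID information. CMIB extracts and fuses features from different modalities, and this interaction helps the deep network learn complementary features and narrow the semantic gap. Sun et al. (2024) designed a CNN-Transformer architecture that uses attention to learn the complementary information between depth and RGB images and feeds the cross-modal fused features into the RGB branch to make full use of both modalities. Cong et al. (2023) constrained the positions of feature interaction with a cross-modality point-aware interaction module and used a CNN-induced refinement unit to alleviate the block effect of Transformers.
However, these methods either feed the depth features or fused features of every level into the RGB feature extraction backbone, or output the fused features of each level separately without interaction with subsequent fused features, and then generate the final saliency map directly with a decoder. This impairs the backbone's extraction of salient features and edge details and lowers the utilization of the fused features. As a result, SOD generalizes poorly to new scenes and performs unsatisfactorily where the background is complex or edge textures are abundant. This paper instead fuses depth and RGB features level by level through attention fusion enhancement, so that the fused features of every level interact fully. The fused features are not fed back into the backbone; they go directly to the decoder. This reduces the negative effect of fused features on the backbone, which can then concentrate on extracting global semantic and local detail information without adjusting its parameters for fused features.
Based on this analysis, this paper proposes an RGB-D SOD method with cross-modal feature fusion and edge detail enhancement. A dual-branch Transformer extracts features, overcoming the insufficient semantic extraction caused by the limited receptive field of CNNs. To fuse multi-modal features better and reduce the negative effect of low-quality depth maps, channel attention filters the fused features along the channel dimension, and spatial attention weights extracted from the depth image filter them spatially. To retain global semantic information, the fused features are concatenated level by level, and self-attention models the dependencies between the semantic information of successive levels. Because a Transformer extracts relatively little local detail, a two-layer CNN feature extractor is added at the end of the network. The CNN extracts low-level detail, which is filtered by channel attention and merged into the multi-modal fused features. This enriches edge details, so the predicted salient objects are complete and have sharper edges.
The main contributions are as follows. 1) A dual-branch Transformer architecture is designed that progressively fuses the multi-modal global semantic features extracted by the Transformer and finally uses a CNN to extract local detail features, enriching the edge details in the fused features and strengthening the representation. 2) A cross-modal attention fusion enhancement module is designed that accounts for both RGB and depth features. Inter-channel information from RGB is fused through channel attention, and the spatial features of depth are weighted through spatial attention. This fuses global semantics, local detail, and inter-channel dependencies while suppressing noise, so the model predicts accurately in complex scenes. 3) A CNN-guided boundary feature extraction module is designed. A CNN extracts low-level detail features, channel attention filters their noise, and the details are progressively fused into the salient features, so the final features are rich in semantics and supplemented with detail. 4) A pretrained pyramid pooling transformer (P2T) serves as the backbone. Experiments on four RGB-D SOD test datasets show that the method uses RGB and depth information effectively and improves detection accuracy by extracting low-level detail.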
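Over the past few years, many traditional RGB-D SOD models based on handcrafted features have been developed (Liang et al., 2018). Early work (Ciptadi et al., 2013) focused on modeling the interaction between layout and shape features generated from RGB images and depth maps. Peng et al. (2014) developed a multi-stage RGB-D model and built the first large-scale RGB-D benchmark dataset, NLPR (National Laboratory of Pattern Recognition). Because handcrafted features have limited expressive power, these traditional methods perform unsatisfactorily. Research therefore turned to deep neural networks for fusing RGB-D features (Zhang et al., 2020). These models learn high-level feature representations and explore the complex correlations between RGB images and depth cues to improve SOD performance. Zhang et al. (2019) suppressed irrelevant background regions in complex images through multi-scale superpixel segmentation, deep feature extraction, and multi-kernel boosting learning. Ji et al. (2021) proposed a learning strategy that calibrates the latent bias in raw depth maps to improve SOD performance, together with a simple yet effective cross fusion module for RGB and depth features. Chen and Li (2018a) proposed a complementarity-aware fusion module that integrates cross-modal and cross-level feature representations; it exploits complementary information by explicitly using cross-modal and cross-level connections together with modal-wise and level-wise supervision, which reduces fusion ambiguity. Lee et al. (2022) proposed a superpixel prototype sampling network that segments the input RGB image and depth map into component superpixels and generates component prototypes, so that the network samples only the prototypes corresponding to salient objects and eliminates the influence of non-salient objects. The cross-modal view-mixed transformer (Pang et al., 2023) introduced the Transformer and reconsidered bi-modal SOD modeling from a sequence-to-sequence perspective, gaining better interpretability. It built a top-down, Transformer-based information propagation path, enhanced by view-mixed attention blocks, to make full use of inter-modal and intra-modal information from the spatial and channel views. Zhang et al. (2023) proposed a two-stage RGB-D SOD model based on calibration followed by fusion, which considers the effect of both low-quality images and foreground-inconsistent images on saliency detection. In the image generation stage, high-quality, foreground-consistent depth images are selected from the original RGB-D pairs as supervision for a pseudo depth image generation network. In the inference stage, unreliable depth information is calibrated, and more cross-modal information is then captured from the RGB-D images for the final prediction.
Early fusion mainly comprises input fusion and early feature fusion. Concatenating the RGB and depth images directly into a four-channel input is called input fusion (Ren et al., 2015). Feeding the RGB and depth images into separate networks, concatenating the extracted low-level features, and passing them to the subsequent network for saliency prediction is called early feature fusion. Late fusion likewise comprises late feature fusion and late result fusion. Using two parallel network streams to learn high-level RGB and depth features separately and concatenating them to produce the final prediction is called late feature fusion (Han et al., 2018). Using two parallel streams to obtain separate saliency maps for the RGB and depth images and then combining the two maps into the final prediction is called late result fusion (Ding et al., 2019). To explore the correlation between RGB and depth images effectively, researchers proposed multi-scale fusion strategies. For example, Cong et al. (2023) introduced a CNN-assisted Transformer architecture. Considering the prior correlation between the RGB and depth modalities, they designed an attention-triggered cross-modality point-aware interaction module to explore position-constrained feature interaction between modalities, and a CNN-induced refinement unit to refine and supplement content. Introducing cross-modal interaction at multiple layers provides extra gradients that strengthen learning in the depth stream and allows the complementarity between low-level and high-level representations to be explored. Zhang et al. (2023) proposed a criss-cross dynamic filter network realized through decoupled dynamic convolution: a dynamic enhancement module strengthens intra-modal features under global context guidance, and a scene-aware dynamic fusion module performs dynamic feature selection between the two modalities. Chen et al. (2023) proposed a triple-encoder network that adaptively evaluates modality importance through a multi-modal feature interaction module. Ye et al. (2024) built a complementary information interaction fusion network by designing new complementary information interaction and cross-modal feature fusion modules. However, there is still no unified solution to the confusion between background and foreground, the increased noise, and the difficulty of extracting object contours that two-modality input causes.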
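Integrating the complementary information of RGB images and depth maps improves SOD in complex and challenging scenes. This paper proposes an RGB-D SOD method with cross-modal fusion and detail enhancement. First, with P2T (Wu et al., 2023) as the backbone, features are extracted separately from the RGB image and the depth image and fused by the cross-modal attention fusion enhancement module (CAFEM), which mines the common and complementary salient features of the two modalities. The fused features are fed to a Transformer decoder. Next, the boundary feature extraction module (BFEM) generates boundary guidance features and refines the edges of the cross-modal fused features at each level, ensuring complete salient object boundaries. Finally, a CNN low-level feature guidance module uses low-level detail features extracted by a CNN to guide focused decoding of the cross-modal fused features and obtain a more accurate saliency map.
The architecture of the proposed cross-modal feature fusion and detail-enhanced network (CFADNet) is shown in Fig. 1. The network consists of dual-branch Transformer feature extraction, cross-modal fusion, boundary feature extraction, low-level detail feature extraction with VGG16 (Visual Geometry Group) (Simonyan and Zisserman, 2015), and focused prediction.
First, a dual-branch pretrained P2T feature extraction network ($\mathrm{P2T}$) extracts features from the input RGB image and depth map:
$$F_r^i = \mathrm{P2T}(I_{rgb}), \qquad F_d^i = \mathrm{P2T}(I_{depth})$$
where $I_{rgb}$ and $I_{depth}$ denote the RGB and depth inputs, and $F_r^i$ and $F_d^i$ denote the RGB and depth features extracted by P2T. RGB and depth features differ by modality: RGB features contain abundant detail and texture, whereas depth features contain more spatial information and localize salient objects well. CAFEM is therefore designed around these characteristics. It fuses the two sets of features with attention weighting and outputs the fused multi-modal features to the Transformer decoder.
At the end of the network, the decoded features and the detail features extracted by VGG are refined by focused localization. This increases the interaction among spatial, semantic, and local detail information, so the model can fully use multi-modal information and edge prediction information and output a complete prediction map with clear contours. The process is
$$
\begin{aligned}
Z_i &= \mathrm{CAFEM}(F_r^i, F_d^i, Z_{i-1}) \\
S_i &= \mathrm{Decoder}(Z_i, S_{i+1}) \\
F_{vgg}^{112}, F_{vgg}^{224} &= \mathrm{VGG}(I_{rgb}) \\
S_{out} &= \mathrm{BFEM}(S_i, F_{vgg}^{112}, F_{vgg}^{224})
\end{aligned}
$$
where $Z_i$ is the fused feature output by CAFEM, $S_i$ is the output of the Transformer decoder, $F_{vgg}^{112}$ and $F_{vgg}^{224}$ are the outputs of the first two VGG feature extraction blocks, with shapes (112, 112, 128) and (224, 224, 64), and $S_{out}$ is the saliency prediction output by BFEM. Finally, the predictions $S_i$ from each stage are resized and the loss is computed.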
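After multi-level encoded features have been extracted from the RGB and depth modalities, achieving thorough fusion and interaction is a key concern of the encoding stage. Depth features contain rich spatial information, which is important for localizing salient objects. RGB features contain rich detail, which determines whether the salient object can be predicted with clear edges. Detail information tends to show up in the correlations between feature channels, so fusion first concatenates the two features along the channel dimension and derives channel attention weights from the concatenated features to select the more important ones. Depth features inherently carry spatial correlations, so spatial attention weights are extracted from the depth features and applied to the channel-filtered features. The weighted features pass through several convolution layers that gradually compress the channel count to the input channel count. The result is then concatenated with the fused features of the previous level, which reduces the loss of useful features, weighted by self-attention, and output as the fused features of the current level. The structure of the proposed cross-modal fusion module is shown in Fig. 2.
In CAFEM, $F_r^i$, $F_d^i$, $Z_{i-1}$, $Z_i \in \mathbb{R}^{N \times C_i}$ denote the RGB input features, the depth input features, the fused features of the previous level, and the fused features output by the current level. $F_{rx}^i$ and $F_{dx}^i \in \mathbb{R}^{H \times W \times C_i}$ denote the reshaped RGB and depth features, where $N = H \times W$, $C_i$ is the number of input and output channels, and $i \in \{2, 3, 4\}$. $F_r^i$ and $F_d^i$ are first reshaped to $(B, C, H, W)$ and concatenated along the channel dimension. Channel attention weights are derived from the concatenated features and applied to its channels:
$$
\begin{aligned}
F &= \mathrm{Concat}(\mathrm{reshape}(F_r^i, F_d^i)) \\
X' &= \sigma(C(\mathrm{Concat}(\mathrm{Mean}(F), \mathrm{Max}(F)))) \\
M &= X' \otimes F
\end{aligned}
$$
where $F \in \mathbb{R}^{H \times W \times C_i}$, reshape changes the feature shape from $(B, N, C)$ to $(B, C, H, W)$, Concat is feature concatenation, $\otimes$ is element-wise multiplication, Mean and Max take the mean and the maximum of each channel, and $X'$ is the channel attention weight. $C$ is a 1 × 1 convolution from 2 channels to 1 channel, and $\sigma$ is the sigmoid activation. Next, spatial attention weights are generated from $F_d^i$:
$$Y' = \sigma(fc(\mathrm{avgpool}(F_d^i)) + fc(\mathrm{maxpool}(F_d^i)))$$
where avgpool is global average pooling, maxpool is max pooling, and $fc$ is a 1 × 1 convolution, a rectified linear unit (ReLU), and another 1 × 1 convolution. With these weights, the channel-filtered features are spatially weighted and fused:
$$M = Y' \otimes F$$
where, following the description above, the $F$ on the right-hand side is the channel-filtered feature from the previous step and the result $M \in \mathbb{R}^{H \times W \times C_i}$. The spatially weighted features are fed to a convolution layer that strengthens the representation and halves the number of channels:
$$T_x = \mathrm{Cbr}(M)$$
where $T_x \in \mathbb{R}^{H \times W \times C_i}$ and Cbr is a 3 × 3 convolution followed by normalization and ReLU. Finally, $T_x$ is flattened and concatenated directly with the output of the previous CAFEM, so that the final features retain as much of the original fused features as possible. Two linear layers adjust the channel count, and self-attention weighting produces the final fused features:
$$Z_i = \mathrm{Att}(\mathrm{Linear}(\mathrm{Concat}(\mathrm{Unfold}(T_x), Z_{i-1})))$$
where $Z_i \in \mathbb{R}^{N \times C_i}$, Unfold flattens the feature along the given dimensions, Concat concatenates along the last dimension, Linear is a linear layer, and Att is the self-attention computation.
Overall, the method fuses RGB and depth features with the type of attention suited to each, learns feature representations at different levels, and keeps fusing them with the features of the previous level, so that semantic and detail information are learned and retained. The effectiveness of CAFEM and the comparison experiments are discussed in the experiments of Section 3.4.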
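The two attention weightings described for CAFEM can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the scalar weights standing in for the 1 × 1 convolution, the hidden width of `fc`, and the choice to pool the depth features across channels (so that the weight map is spatial) are all assumptions. The convolutional compression, the linear layers, and the self-attention step are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w_mean=0.5, w_max=0.5, bias=0.0):
    """f: (B, C, H, W). Returns channel weights of shape (B, C, 1, 1)."""
    mean = f.mean(axis=(2, 3), keepdims=True)  # per-channel mean
    mx = f.max(axis=(2, 3), keepdims=True)     # per-channel maximum
    # A 1x1 convolution from 2 descriptors to 1 is a weighted sum of the two
    # (w_mean, w_max, bias are hypothetical stand-ins for learned parameters).
    return sigmoid(w_mean * mean + w_max * mx + bias)

def spatial_attention(fd, w1, w2):
    """fd: (B, C, H, W) depth features. Returns a map of shape (B, 1, H, W)."""
    avg = fd.mean(axis=1, keepdims=True)  # assumption: pool across channels
    mx = fd.max(axis=1, keepdims=True)

    def fc(x):
        # 1x1 conv (1 -> k), ReLU, 1x1 conv (k -> 1), written with broadcasting
        hidden = np.maximum(x * w1.reshape(1, -1, 1, 1), 0.0)
        return (hidden * w2.reshape(1, -1, 1, 1)).sum(axis=1, keepdims=True)

    return sigmoid(fc(avg) + fc(mx))

def cafem_weighting(fr, fd, w1, w2):
    """Channel-filter the concatenated features, then weight them spatially."""
    f = np.concatenate([fr, fd], axis=1)  # (B, 2C, H, W)
    m = channel_attention(f) * f          # channel attention
    return spatial_attention(fd, w1, w2) * m
```

Because both weights pass through a sigmoid, every output value is a damped copy of the corresponding input value, which is the sense in which the module filters rather than amplifies features.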
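To supplement the edge details of the features extracted by the Transformer, a boundary feature extraction module (BFEM) is proposed. At the end of the network, the RGB image is fed into the first two layers of the VGG feature extraction backbone to obtain edge features, which are fused with the output of the Transformer decoder, as shown in Fig. 3. This section describes the detailed structure of BFEM and how the edge features are obtained and fused. In addition, some intermediate features in BFEM are visualized: for each feature, the mean along the channel dimension is computed to obtain a single-channel map, which is then normalized, as shown in Fig. 4.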
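The visualization step described above (channel mean, then normalization) can be sketched as follows. The text does not say which normalization is used, so min-max scaling to [0, 1] is an assumption here, and `feature_to_gray` is a hypothetical helper name.

```python
import numpy as np

def feature_to_gray(feature, eps=1e-8):
    """Collapse a (C, H, W) feature to a single-channel (H, W) map in [0, 1]."""
    gray = feature.mean(axis=0)  # mean over the channel dimension
    # Min-max normalization (assumed); eps avoids division by zero on flat maps.
    return (gray - gray.min()) / (gray.max() - gray.min() + eps)
```

The resulting array can be passed to any image viewer or plotting routine as a grayscale image.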
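At the output of the Transformer decoder, the main body of the salient object is largely determined, but the large receptive field of the Transformer can leave the saliency map lacking edge detail. A CNN-induced refinement unit is therefore placed at the end of the decoder. A CNN enlarges its receptive field gradually during feature extraction and has a natural advantage in handling local detail. The feature resolution is also high at this stage, where convolution is more reasonable in both parameter count and computational cost. Because the aim of this step is to extract and refine edge details, a full CNN encoder-decoder is unnecessary. The two shallow VGG layers with the richest texture detail are used for edge feature extraction and fusion, denoted $VGG_{112}$ and $VGG_{224}$. Low-level features also carry a large amount of noise and must be filtered by channel attention, so $VGG_{112}$ is passed through channel attention. First, the feature $S_1$ from the last Transformer decoder is reshaped to $(B, C, H, W)$ and upsampled to the resolution of $VGG_{112}$. The $VGG_{112}$ features, filtered by channel attention and compressed in channels, are then concatenated with $S_1$ and fed to a convolution layer for refinement and fusion, giving the first fused feature:
$$
\begin{aligned}
S &= \mathrm{Upsample}(\mathrm{reshape}(S_1)) \\
VGG_{112} &= \mathrm{CA}(VGG_{112}) \\
Q_1 &= \mathrm{Cbr}(\mathrm{Concat}(S, VGG_{112}))
\end{aligned}
$$
where $S_1 \in \mathbb{R}^{N \times C_i}$, $Q_1 \in \mathbb{R}^{H \times W \times C_i}$, reshape restores the feature shape from $(B, N, C)$ to $(B, C, H, W)$, Upsample is 2× upsampling, CA is the channel attention module, Concat is feature concatenation, and Cbr is a 3 × 3 convolution followed by normalization and ReLU. The fused feature $Q_1$ is then fed to the edge extraction layer, and the resulting edge features are fused and upsampled in preparation for the second fusion with the VGG features:
$$
\begin{aligned}
edge &= \mathrm{EdgeExtraction}(Q_1) \\
U &= \mathrm{Upsample}(\mathrm{Cbr}(\mathrm{Concat}(Q_1, edge)))
\end{aligned}
$$
where $edge \in \mathbb{R}^{112 \times 112 \times 128}$, $U \in \mathbb{R}^{224 \times 224 \times C_i}$, Cbr is a 3 × 3 convolution followed by normalization and ReLU, Concat is feature concatenation, Upsample is 2× upsampling, and EdgeExtraction is the edge extraction operation. It extracts multi-scale features with dilated convolutions of different dilation factors, sums them, and concatenates the sum with the input features to extract edge detail thoroughly. A 3 × 3 convolution then adjusts the channel count to the input channel count to give the edge features:
$$
\begin{aligned}
T_i &= \mathrm{DSConv}_{3 \times 3}^{i}(C_{3 \times 3}(Q_j)) \\
T_x &= \sum_{i=1}^{3} T_i \\
edge &= C_{3 \times 3}(\mathrm{Concat}(Q_j, T_x))
\end{aligned}
$$
where $T_i$, $T_x$, $Q_j \in \mathbb{R}^{H \times W \times C_i}$ denote edge features, Concat is feature concatenation, and $\mathrm{DSConv}_{3 \times 3}^{i}$ is a dilated convolution with dilation factors 2, 4, and 8. The upsampled feature $U$ and $VGG_{224}$ are concatenated and fed to a fusion convolution layer, then to a channel attention layer for channel selection and compression. Finally, a 1 × 1 convolution compresses the channel count to 1 to give the final saliency prediction $pre_{224}$:
$$
\begin{aligned}
F_t &= \mathrm{Concat}(U, VGG_{224}) \\
pre_{224} &= C(\mathrm{CA}(\mathrm{Cbr}(F_t)))
\end{aligned}
$$
where $pre_{224} \in \mathbb{R}^{224 \times 224 \times 1}$, Cbr is a 3 × 3 convolution followed by normalization and ReLU, $C$ is a 1 × 1 convolution that compresses the channel count to 1, and CA is the channel attention module. The method introduces a CNN to extract edge features and fuses abundant edge detail while using only low-level features, so the edges of the final prediction are clearer and the details richer. The effectiveness of this module and the comparison experiments are discussed in the experiments of Section 4.3.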
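The multi-scale edge extraction step can be illustrated with a single-channel NumPy sketch. It is a simplification under stated assumptions: real features have many channels and learned kernels, whereas here every kernel is passed in by the caller and the names `dilated_conv3x3` and `edge_extraction` are hypothetical. The 3 × 3 convolution over the concatenation of $Q_j$ and $T_x$ is written as the sum of one convolution per input channel, which is equivalent for a single output channel.

```python
import numpy as np

def dilated_conv3x3(x, kernel, dilation):
    """3x3 dilated cross-correlation on a (H, W) map with zero padding.

    Padding by `dilation` on every side keeps the output the same size as x.
    """
    h, w = x.shape
    padded = np.pad(x, dilation)
    out = np.zeros_like(x, dtype=float)
    for ky in range(3):
        for kx in range(3):
            dy, dx = ky * dilation, kx * dilation
            out += kernel[ky, kx] * padded[dy:dy + h, dx:dx + w]
    return out

def edge_extraction(q, k_pre, k_dils, k_q, k_t, dilations=(2, 4, 8)):
    """Single-channel analogue of T_i, T_x and edge in the equations above."""
    pre = dilated_conv3x3(q, k_pre, 1)  # C_3x3(Q_j)
    # T_x: sum of the dilated responses, one kernel per dilation factor
    t_x = sum(dilated_conv3x3(pre, k, d) for k, d in zip(k_dils, dilations))
    # C_3x3(Concat(Q_j, T_x)) with one output channel
    return dilated_conv3x3(q, k_q, 1) + dilated_conv3x3(t_x, k_t, 1)
```

Larger dilation factors sample the same nine taps over a wider neighborhood, which is how the three branches see edges at different scales without adding parameters.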
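Finally, in the training stage, a hybrid loss is used to obtain high-quality saliency maps with clear boundaries. It comprises the commonly used binary cross-entropy (BCE) loss, the structure similarity index measure (SSIM) loss, and the intersection over union (IoU) loss (Cong et al., 2023). The sum of the BCE, SSIM, and IoU losses over all stages is taken as the total loss:
$$
\begin{aligned}
L_{sal}(P, G) &= L_{bce}(P, G) + L_{ssim}(P, G) + L_{iou}(P, G) \\
L_{total} &= \sum_{i=1}^{4} \frac{1}{2^{i}} L_{sal}(S_i, G_i) + L_{sal}(S_{out}, G)
\end{aligned}
$$
where $G$ is the ground truth and $S_i$ is the saliency prediction of each stage.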
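The stage weighting in the total loss can be sketched in plain Python. This is an illustration under assumptions rather than the training code: predictions and ground truths are flat lists of values in [0, 1], the IoU term uses a common soft formulation, and the SSIM term is left as an optional callable (`ssim_loss`, a hypothetical parameter) because it needs windowed image statistics that are out of scope here.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and labels."""
    total = 0.0
    for p, g in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return total / len(pred)

def iou_loss(pred, target, eps=1e-7):
    """Soft IoU loss: 1 - intersection / union."""
    inter = sum(p * g for p, g in zip(pred, target))
    union = sum(p + g - p * g for p, g in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)

def saliency_loss(pred, target, ssim_loss=None):
    """L_sal = L_bce + L_ssim + L_iou (SSIM term skipped when not supplied)."""
    extra = ssim_loss(pred, target) if ssim_loss is not None else 0.0
    return bce_loss(pred, target) + extra + iou_loss(pred, target)

def total_loss(stage_preds, stage_targets, final_pred, final_target, ssim_loss=None):
    """L_total = sum_i L_sal(S_i, G_i) / 2**i + L_sal(S_out, G)."""
    loss = saliency_loss(final_pred, final_target, ssim_loss)
    for i, (s, g) in enumerate(zip(stage_preds, stage_targets), start=1):
        loss += saliency_loss(s, g, ssim_loss) / (2 ** i)
    return loss
```

With four stages the weights are 1/2, 1/4, 1/8, and 1/16, so the final BFEM prediction, which has weight 1, dominates the objective.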
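The training set consists of 1 485 image pairs from NJU2K (Nanjing University 2K) (Ju et al., 2014) and 700 image pairs from NLPR (Peng et al., 2014). The NJU2K images not used for training serve as the validation set. Testing uses four widely used public benchmark datasets: NJU2K, NLPR, SIP (salient person) (Fan et al., 2021), and STERE (stereo dataset) (Niu et al., 2012). Three common metrics are used for evaluation: 1) mean absolute error (MAE); 2) $F_\beta$, which evaluates the precision and recall of the model, with $\beta^2$ set to 0.3 throughout (Wang et al., 2022b) to emphasize precision; 3) $S_\alpha$, which evaluates the region-aware and object-aware structural similarity of the saliency map, with $\alpha$ set to 0.5 throughout (Zhao et al., 2019).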
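Two of the three metrics are simple enough to sketch directly. The code below works on flat lists, with `pred` in [0, 1] and `gt` binary. The fixed binarization threshold of 0.5 is an assumption made for illustration, since benchmark protocols often sweep or adapt the threshold. $S_\alpha$ is not shown because it requires region-level and object-level structure terms.

```python
def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)

def f_beta(pred, gt, threshold=0.5, beta_sq=0.3, eps=1e-8):
    """F-measure with beta^2 = 0.3, which weights precision above recall."""
    binary = [1 if p >= threshold else 0 for p in pred]
    tp = sum(b * g for b, g in zip(binary, gt))
    precision = tp / (sum(binary) + eps)
    recall = tp / (sum(gt) + eps)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall + eps)
```

Lower MAE is better, while higher $F_\beta$ is better, so the two metrics move in opposite directions as a model improves.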
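The proposed method is implemented with the PyTorch library (Paszke et al., 2019) and trained and tested on an Nvidia RTX 3090 GPU (24 GB memory). The P2T version is P2T-base and the VGG version is VGG-16, of which only the first two layers are used and fine-tuned. Training uses the Adam optimizer (Kingma and Ba, 2015) with default hyperparameters. To standardize the input, images are resized to 224 × 224 pixels. All depth images are normalized and replicated to 3 channels to match the input size, and all images are augmented with random rotation and horizontal flipping. The network is trained for 200 epochs with a batch size of 16. The learning rate is set to $10^{-4}$ and decays to 1/5 of its value every 40 epochs. During testing, the outputs of all stages are resized back to the original size with bilinear interpolation.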
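The learning-rate rule described above is a step decay, which can be written as a one-line function. The sketch assumes zero-based epoch numbering and that the first decay takes effect at epoch 40; the text does not state either detail, and `learning_rate` is a hypothetical helper name.

```python
def learning_rate(epoch, base_lr=1e-4, step=40, gamma=0.2):
    """Step decay: multiply the rate by gamma (1/5) every `step` epochs."""
    return base_lr * gamma ** (epoch // step)

# Rates at the first epoch of each 40-epoch block of a 200-epoch run
schedule = [learning_rate(e) for e in range(0, 200, 40)]
```

In PyTorch the same rule is commonly expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.2)`, although the source does not say which scheduler was used.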
To verify the effectiveness of CFADNet, it is evaluated quantitatively on the NJU2K, NLPR, STERE, and SIP datasets and compared with 16 advanced RGB-D salient object detection methods. The results are shown in Table 1. The compared methods are DCF (depth calibration and fusion) (Ji et al., 2021), CIRNet (cross-modality interaction and refinement network) (Cong et al., 2022), CM-LCG (cross-modality long-range context information gathering) (Wang et al., 2022a), AILNet (aggregate interactive learning network) (Wu et al., 2022a), SPSN (superpixel prototype sampling network) (Lee et al., 2022), TMFNet (three-input multilevel fusion network) (Zhou et al., 2022), AFNet, JL-DCF (joint learning and densely cooperative fusion) (Fu et al., 2022), EBFSP (employing bilinear fusion and saliency prior) (Huang et al., 2022), CAVER (cross-modal view-mixed transformer) (Pang et al., 2023), HINet (hierarchical interaction network) (Bi et al., 2023), C2DFNet (criss-cross dynamic filter network) (Zhang et al., 2023), PICRNet (point-aware interaction and CNN-induced refinement network) (Cong et al., 2023), DGFNet (depth-guided cross-modality fusion network) (Xiao et al., 2024), FCFNet (feature calibrating and fusing network) (Zhang et al., 2024), and RD3D (Chen et al., 2024).
Among all models, CFADNet achieves excellent results on all three metrics on the NJU2K, NLPR, STERE, and SIP datasets. In particular, compared with the second-ranked method, CFADNet reduces MAE by 6.9%, 10.5%, 9.7%, and 2.4%, respectively, and its $S_\alpha$ and $F_\beta$ are the best on all four datasets.
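One plausible reading of the percentages above, given here as an assumption rather than a formula confirmed by the source, is the relative reduction in MAE with respect to the second-ranked method:

```latex
\Delta_{\mathrm{MAE}} =
  \frac{\mathrm{MAE}_{\text{second}} - \mathrm{MAE}_{\text{ours}}}
       {\mathrm{MAE}_{\text{second}}} \times 100\%
```

Because lower MAE is better, a positive value means the proposed method has the smaller error.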
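The $F_\beta$ curves and PR curves are shown in Fig. 5 and Fig. 6. On all four datasets, the $F_\beta$ curves of the proposed method are flatter than those of the other models, indicating that its results are closer to binary predictions and insensitive to threshold changes. The PR curves show that the method reaches higher precision on all four datasets.
Table 2 shows the average precision and average recall of the different methods over all datasets. Table 3 compares the computational complexity and parameter count of the proposed method with those of the open-source compared methods. Although the complex network structure leads to a relatively high parameter count, the floating point operations (FLOPs) stay at a relatively low 46.82 G. The method therefore still balances computational efficiency and model performance well when computing resources are limited. In addition, it maintains high precision on all datasets while keeping recall high. This means it identifies more positives correctly, misjudges fewer negatives as positives, and finds as many salient objects as possible.
In summary, CFADNet is highly competitive. Compared with current advanced methods, it achieves the best or second-best result on every metric across the four benchmark datasets, which demonstrates its performance as well as its accuracy and stability in predicting salient objects in different scenes.
Beyond quantitative evaluation, several challenging scenes were selected from the four test datasets for a visual comparison of detail. The prediction maps of the compared methods were either generated from the open-source code provided by their authors or taken from the test images in their repositories. The comparison is shown in Fig. 7. CFADNet maintains prediction accuracy across complex scenes. Rows 1, 3, 4, and 9 show images that are rich in detail and whose salient objects have complex, irregular edges. Compared with the other methods, CFADNet segments the salient region correctly and predicts clear, sharp contours, showing that it captures local detail effectively; with this detail, the network identifies object contours accurately. Rows 2, 7, and 8 show images in which the main body of the target is discontinuous or parts of it occupy a small area. The proposed method segments the complete salient object precisely, showing that CFADNet captures global semantic information and integrates and filters it to predict the complete object. Rows 5 and 6 show salient objects that resemble the background or are embedded in it. The proposed method distinguishes foreground from background correctly and segments the object out. In summary, CFADNet produces high-accuracy saliency detection results in these challenging scenes.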
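Three commonly used, challenging test datasets, NJU2K, NLPR, and STERE, are used for evaluation. The effectiveness of CAFEM and BFEM in cross-modal feature fusion, edge extraction, and noise suppression is discussed using the three metrics and visual comparison. The effect of edge features obtained by different edge extraction methods on salient object prediction is also studied.
To find the best design for the cross-modal fusion module and verify the proposed CAFEM, two groups of controlled experiments were run. The first removes the channel and spatial attention in CAFEM, concatenates the features of the two modalities directly for convolutional fusion, applies self-attention to model the dependencies between semantic and detail information in the features, and outputs the result; the remaining network is trained as the control. The second replaces CAFEM with the CmPI (cross-modality point-aware interaction) fusion method proposed in PICRNet (Cong et al., 2023) to fuse the features of the two modalities.
To ensure accurate and fair results, all experiments run in the same environment. Specifically, the RGB and depth features extracted at each stage are no longer fused by CAFEM but are either stacked directly and fed to the convolution module or fed directly to the CmPI fusion module.
The results are shown in Table 4. Across all metrics on the three test datasets, introducing CAFEM clearly improves prediction accuracy. RGB and depth features are fused and used thoroughly, which helps the network extract richer multi-scale detail and semantic information. Fusing semantic and detail information also suppresses the introduction of noise, so the final accuracy improves markedly. This shows that the proposed cross-modal attention fusion enhancement module is effective.
To demonstrate the effectiveness of the proposed boundary feature extraction module, the edge extraction and fusion part of BFEM is removed and the remaining network is trained to study the effect of edge refinement. That is, dilated convolutions are not used to extract multi-scale edge features to supplement the salient features. Instead, the low-level features extracted by VGG are concatenated with the salient features, fused by convolution, and used directly for prediction, without filtering the noise in the VGG low-level features. For fairness and accuracy, the training environment and hyperparameters are identical across experiments. The results are shown in Table 5: introducing edge extraction and fusion clearly improves performance. The predictions with and without edge feature extraction and fusion are also compared visually in Fig. 8, where S1 is the result without edge extraction or noise filtering and S2 is the result with BFEM edge extraction and noise filtering. S1_F and S2_F visualize the feature $F_t$ of the layer before the final output in BFEM for the two variants. After channel attention filters the noise in the low-level features and dilated convolution extracts edge features, fusing them into the salient features yields prediction maps with clear boundaries and clearly fewer mispredicted regions. This indicates that BFEM filters the noise in the VGG low-level features effectively and fuses local detail features thoroughly into the salient features to give accurate predictions. In summary, with the edge refinement module, CFADNet fuses multiple kinds of features better and extracts multi-scale detail of the target accurately, which gives better predictions.
To verify the effectiveness of the hybrid loss function, four groups of controlled experiments were designed. The results are shown in Table 6. With the BCE loss alone, the model reaches only baseline performance on the three datasets. Adding the SSIM loss improves accuracy, indicating that the structural similarity loss helps preserve the overall structure of the foreground and suppress background noise. Adding the IoU loss improves performance further, showing that the IoU loss strengthens coverage of the target region and improves boundary consistency. Finally, the multi-stage supervision strategy is replaced by a single-stage strategy that supervises only the final prediction. Even with the hybrid loss, accuracy under the single-stage strategy drops markedly relative to multi-stage supervision. This comparison supports the soundness and effectiveness of the training strategy.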
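Although the proposed model achieves satisfactory results, it still fails in some challenging scenes. This section analyzes failure cases on the four test datasets. Fig. 9 shows six failure cases of the proposed model. In cases 1 and 3, the model wrongly identifies the person behind the stone statue as a salient object, and only half of that person, and also misjudges the lower part connected to the salient object as salient. The model may be affected by the features extracted from the depth image, whose distinct depth layers lead to the misjudgment. In cases 2 and 4, part of the salient object is thin and inconspicuous or lies away from the visual center, so the model easily judges that part as non-salient. In cases 5 and 6, the transition between the salient and non-salient objects is overly smooth and the two are connected, so the model wrongly identifies the non-salient part as salient.
These failure cases show that the method performs poorly when several similar salient objects appear in the background and the depth image has distinct layers, so considerable room for improvement remains. One possible remedy is to judge the quality of the depth image and train a set of weights. When the depth image is of high quality with distinct layers, it can be used to determine the location of the salient object. When the depth quality is low and the salient object cannot be distinguished, the weights should favor the RGB branch so that prediction relies more on RGB features. For salient objects that resemble and adhere to the background, enlarging the training data for similar scenes may help. These problems will be studied in future work.
This paper proposes CFADNet, a network for RGB-D salient object detection. To use multi-modal features efficiently, a cross-modal attention fusion enhancement module (CAFEM) is designed. The module accounts for the characteristics of RGB and depth features: it uses the inter-channel information carried by RGB and performs selection and fusion through channel attention. It also uses the spatial information carried by depth and applies spatial attention to weight the fused features spatially, so that the fused features contain global semantic information, local detail information, and inter-channel dependencies, and the introduced noise is suppressed to some extent. This helps the model maintain prediction accuracy in complex scenes. In addition, a boundary feature extraction module (BFEM) is built. A CNN extracts low-level detail features, channel attention filters their noise, and the details are progressively fused into the salient features, so the final features contain semantic information supplemented with detail. Extensive evaluation on RGB-D salient object detection benchmark datasets shows that CFADNet fuses and uses multi-modal features well, extracts rich semantic and detail features, localizes salient objects accurately, and predicts relatively clear edges.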
References
Bi H B, Wu R W, Liu Z Q, Zhu H H, Zhang C and Xiang T Z. 2023. Cross-modal hierarchical interaction network for RGB-D salient object detection. Pattern Recognition, 136: #109194 [DOI: 10.1016/j.patcog.2022.109194]
Chen H and Li Y F. 2018a. Progressively complementarity-aware fusion network for RGB-D salient object detection//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3051-3060 [DOI: 10.1109/CVPR.2018.00322]
Chen J N, Lu Y Y, Yu Q H, Luo X D, Adeli E, Wang Y, Lu L, Yuille A L and Zhou Y Y. 2021. TransUNet: transformers make strong encoders for medical image segmentation [EB/OL]. [2024-11-07]. https://arxiv.org/pdf/2102.04306.pdf
Chen Q, Zhang Z X, Lu Y Y, Fu K R and Zhao Q J. 2024. 3-D convolutional neural networks for RGB-D salient object detection and beyond. IEEE Transactions on Neural Networks and Learning Systems, 35(3): 4309-4323 [DOI: 10.1109/TNNLS.2022.3202241]
Chen S H, Tan X L, Wang B and Hu X L. 2018b. Reverse attention for salient object detection//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 236-252 [DOI: 10.1007/978-3-030-01240-3_15]
Chen T Y, Xiao J, Hu X G, Zhang G F and Wang S J. 2023. Adaptive fusion network for RGB-D salient object detection. Neurocomputing, 522: 152-164 [DOI: 10.1016/j.neucom.2022.12.004]
Cheng X L, Zheng X, Pei J L, Tang H, Lyu Z and Chen C B. 2023. Depth-induced gap-reducing network for RGB-D salient object detection: an interaction, guidance and refinement approach. IEEE Transactions on Multimedia, 25: 4253-4266 [DOI: 10.1109/TMM.2022.3172852]
Ciptadi A, Hermans T and Rehg J M. 2013. An in depth view of saliency//Proceedings of 2013 British Machine Vision Conference (BMVC). Bristol, UK: BMVC: #112 [DOI: 10.5244/C.27.112]
Cong R M, Lin Q W, Zhang C, Li C Y, Cao X C, Huang Q M and Zhao Y. 2022. CIR-Net: cross-modality interaction and refinement for RGB-D salient object detection. IEEE Transactions on Image Processing, 31: 6800-6815 [DOI: 10.1109/TIP.2022.3216198]
Cong R M, Liu H Y, Zhang C, Zhang W, Zheng F, Song R and Kwong S. 2023. Point-aware interaction and CNN-induced refinement network for RGB-D salient object detection//Proceedings of the 31st ACM International Conference on Multimedia. Ottawa, Canada: ACM: 406-416 [DOI: 10.1145/3581783.3611982]
Ding N, Zhang C and Eskandarian A. 2024. SalienDet: a saliency-based feature enhancement algorithm for object detection for autonomous driving. IEEE Transactions on Intelligent Vehicles, 9(1): 2624-2635 [DOI: 10.1109/TIV.2023.3287359]
Ding Y, Liu Z, Huang M K, Shi R and Wang X Y. 2019. Depth-aware saliency detection using convolutional neural networks. Journal of Visual Communication and Image Representation, 61: 1-9 [DOI: 10.1016/j.jvcir.2019.03.019]
Fan D P, Lin Z, Zhang Z, Zhu M L and Cheng M M. 2021. Rethinking RGB-D salient object detection: models, data sets, and large-scale benchmarks. IEEE Transactions on Neural Networks and Learning Systems, 32(5): 2075-2089 [DOI: 10.1109/TNNLS.2020.2996406]
Fang X, Jiang M F, Zhu J C, Shao X L and Wang H P. 2024. GroupTransNet: group transformer network for RGB-D salient object detection. Neurocomputing, 594: #127865 [DOI: 10.1016/j.neucom.2024.127865]
Fu K P, Fan D P, Ji G P, Zhao Q J, Shen J B and Zhu C. 2022. Siamese network for RGB-D salient object detection and beyond. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9): 5541-5559 [DOI: 10.1109/TPAMI.2021.3073689]
Gao Y, Wang M, Tao D C, Ji R R and Dai Q H. 2012. 3-D object retrieval and recognition with hypergraph analysis. IEEE Transactions on Image Processing, 21(9): 4290-4303 [DOI: 10.1109/TIP.2012.2199502]
Han J W, Chen H, Liu N, Yan C G and Li X L. 2018. CNNs-based RGB-D saliency detection via cross-view transfer and multiview fusion. IEEE Transactions on Cybernetics, 48(11): 3171-3183 [DOI: 10.1109/TCYB.2017.2761775]
Hou Q B, Cheng M M, Hu X W, Borji A, Tu Z W and Torr P H S. 2019. Deeply supervised salient object detection with short connections. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4): 815-828 [DOI: 10.1109/TPAMI.2018.2815688]
Huang N C, Yang Y, Zhang D W, Zhang Q and Han J G. 2022. Employing bilinear fusion and saliency prior information for RGB-D salient object detection. IEEE Transactions on Multimedia, 24: 1651-1664 [DOI: 10.1109/TMM.2021.3069297]
Jahanifar M, Tajeddin N Z, Asl B M and Gooya A. 2019. Supervised saliency map driven segmentation of lesions in dermoscopic images. IEEE Journal of Biomedical and Health Informatics, 23(2): 509-518 [DOI: 10.1109/JBHI.2018.2839647]
Ji W, Li J J, Yu S, Zhang M, Piao Y, Yao S Y, Bi Q, Ma K, Zheng Y F, Lu H C and Cheng L. 2021. Calibrated RGB-D salient object detection//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 9466-9476 [DOI: 10.1109/CVPR46437.2021.00935]
Ju R, Ge L, Geng W J, Ren T W and Wu G S. 2014. Depth saliency based on anisotropic center-surround difference//Proceedings of 2014 IEEE International Conference on Image Processing (ICIP). Paris, France: IEEE: 1115-1119 [DOI: 10.1109/ICIP.2014.7025222]
Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR: 1-15
Lee M, Park C, Cho S and Lee S. 2022. SPSN: superpixel prototype sampling network for RGB-D salient object detection//Proceedings of the 17th European Conference on Computer Vision (ECCV). Tel Aviv, Israel: Springer: 630-647 [DOI: 10.1007/978-3-031-19818-2_36]
Li J, Su J M, Xia C Q, Ma M C and Tian Y H. 2021. Salient object detection with purificatory mechanism and structural similarity loss. IEEE Transactions on Image Processing, 30: 6855-6868 [DOI: 10.1109/TIP.2021.3099405]
Liang B C and Luo H L. 2024. MEANet: an effective and lightweight solution for salient object detection in optical remote sensing images. Expert Systems with Applications, 238: #121778 [DOI: 10.1016/j.eswa.2023.121778]
Liang F F, Duan L J, Ma W, Qiao Y H, Cai Z and Qing L. 2018. Stereoscopic saliency model using contrast and depth-guided-background prior. Neurocomputing, 275: 2227-2238 [DOI: 10.1016/j.neucom.2017.10.052]
Mei H Y, Liu Y Y, Wei Z Q, Zhou D S, Wei X P, Zhang Q and Yang X. 2022. Exploring dense context for salient object detection. IEEE Transactions on Circuits and Systems for Video Technology, 32(3): 1378-1389 [DOI: 10.1109/TCSVT.2021.3069848]
Niu Y Z, Geng Y J, Li X Q and Liu F. 2012. Leveraging stereopsis for saliency analysis//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, USA: IEEE: 454-461 [DOI: 10.1109/CVPR.2012.6247708]
Pang Y W, Zhao X Q, Zhang L H and Lu H C. 2023. CAVER: cross-modal view-mixed transformer for bi-modal salient object detection. IEEE Transactions on Image Processing, 32: 892-904 [DOI: 10.1109/TIP.2023.3234702]
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z M, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J J and Chintala S. 2019. PyTorch: an imperative style, high-performance deep learning library//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: #721
Peng H W, Li B, Xiong W H, Hu W M and Ji R R. 2014. RGBD salient object detection: a benchmark and algorithms//Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer: 92-109 [DOI: 10.1007/978-3-319-10578-9_7]
Ren J Q, Gong X J, Yu L, Zhou W H and Yang M Y. 2015. Exploiting global priors for RGB-D saliency detection//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Boston, USA: IEEE: 25-32 [DOI: 10.1109/CVPRW.2015.7301391]
Ren Z X, Gao S H, Chia L T and Tsang I W H. 2014. Region-based saliency detection and its application in object recognition. IEEE Transactions on Circuits and Systems for Video Technology, 24(5): 769-779 [DOI: 10.1109/TCSVT.2013.2280096]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Sun F M, Hu X H, Wu J Y, Sun J and Wang F S. 2024. RGB-D salient object detection based on cross-modal interactive fusion and global awareness. Journal of Software, 35(4): 1899-1913 [DOI: 10.13328/j.cnki.jos.006833]
Wang F Y, Pan J S, Xu S K and Tang J H. 2022a. Learning discriminative cross-modality features for RGB-D saliency detection. IEEE Transactions on Image Processing, 31: 1285-1297 [DOI: 10.1109/TIP.2022.3140606]
Wang W G, Lai Q X, Fu H Z, Shen J B, Ling H B and Yang R G. 2022b. Salient object detection in the deep learning era: an in-depth survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6): 3239-3259 [DOI: 10.1109/TPAMI.2021.3051099]
Wang W G, Shen J B and Ling H B. 2019. A deep network solution for attention and aesthetics aware photo cropping. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7): 1531-1544 [DOI: 10.1109/TPAMI.2018.2840724]
Wang W G, Shen J B, Lu X K, Hoi S C H and Ling H B. 2021. Paying attention to video object pattern understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(7): 2413-2428 [DOI: 10.1109/TPAMI.2020.2966453]
Wang W G, Sun G L and Van Gool L. 2024. Looking beyond single images for weakly supervised semantic segmentation learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(3): 1635-1649 [DOI: 10.1109/TPAMI.2022.3168530]
Wu J Y, Sun F M, Xu R, Meng J and Wang F S. 2022a. Aggregate interactive learning for RGB-D salient object detection. Expert Systems with Applications, 195: #116614 [DOI: 10.1016/j.eswa.2022.116614]
Wu Y H, Liu Y, Zhan X and Cheng M M. 2023. P2T: pyramid pooling transformer for scene understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11): 12760-12771 [DOI: 10.1109/TPAMI.2022.3202765]
Wu Y H, Liu Y, Zhang L, Cheng M M and Ren B. 2022b. EDN: salient object detection via extremely-downsampled network. IEEE Transactions on Image Processing, 31: 3125-3136 [DOI: 10.1109/TIP.2022.3164550]
Xiao F, Pu Z D, Chen J Q and Gao X P. 2024. DGFNet: depth-guided cross-modality fusion network for RGB-D salient object detection. IEEE Transactions on Multimedia, 26: 2648-2658 [DOI: 10.1109/TMM.2023.3301280]
Ye X YZhu LWang W W and Fu Y. 2024. RGB_D salient object detection algorithm based on complementary information interaction. Journal of Image and Graphics29(5): 1252-1264
叶欣悦, 朱磊, 王文武, 付云. 2024. 互补特征交互融合的RGB_D实时显著目标检测. 中国图象图形学报29(5): 1252-1264 [DOI: 10.11834/jig.230583
Zhang RLyu YZhang Z TRen LXie JZhang A LYan Z W and Mi O. 2024. Development and prospect of multidimensional information perception and intelligent construction in deep earth engineering. Journal of China Coal Society49(3): 1259-1290
张茹, 吕游, 张泽天, 任利, 谢晶, 张安林, 严志伟, 米欧. 2024. 深地工程多维信息感知与智能建造的发展与展望. 煤炭学报49(3): 1259-1290 [DOI: 10.13225/j.cnki.jccs.2023.1439
Zhang MRen W SPiao YRong Z K and Lu H C. 2020. Select, supplement and focus for RGB-D saliency detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE:3469-3478 [DOI: 10.1109/CVPR42600.2020.00353
Zhang MYao S YHu B QPiao Y and Ji W. 2023. C2DFNet: criss-cross dynamic filter network for RGB-D salient object detection. IEEE Transactions on Multimedia25: 5142-5154 [DOI: 10.1109/TMM.2022.3187856
Zhang QLi YLi W JLin J JXiao M and Chen F Y. 2019. Salient object detection via deep features and multiple kernel boosting learning. Journal of Image and Graphics24(7): 1096-1105
张晴, 李云, 李文举, 林家骏, 肖莽, 陈飞云. 2019. 融合深度特征和多核增强学习的显著目标检测. 中国图象图形学报24(7): 1096-1105 [DOI: 10.11834/jig.180224
Zhang QQin QYang YJiao Q and Han J G. 2024. Feature calibrating and fusing network for RGB-D salient object detection. IEEE Transactions on Circuits and Systems for Video Technology34(3): 1493-1507 [DOI: 10.1109/TCSVT.2023.3296581
Zhao J XLiu J JFan D PCao YYang J F and Cheng M M. 2019. EGNet: edge guidance network for salient object detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE:8778-8787 [DOI: 10.1109/ICCV.2019.00887
Zhao X KLi M LZhang GLi N and Li J S. 2021. Object detection method based on saliency map fusion for UAV-borne thermal images. Acta Automatica Sinica47(9): 2120-2131
赵兴科, 李明磊, 张弓, 黎宁, 李家松. 2021. 基于显著图融合的无人机载热红外图像目标检测方法. 自动化学报47(9): 2120-2131 [DOI: 10.16383/j.aas.c200021
Zhou W JPan S JLei J S and Yu L. 2022. TMFNet: three-input multilevel fusion network for detecting salient objects in RGB-D images. IEEE Transactions on Emerging Topics in Computational Intelligence6(3): 593-601 [DOI: 10.1109/TETCI.2021.3097393
Zhu X ZSu W JLu L WLi BWang X G and Dai J F. 2021. Deformable DETR: deformable transformers for end-to-end object detection//Proceedings of the 9th International Conference on Learning Representations. [s.l.]: ICLR:894-910
Volume 30, Issue 12, 2025
Article information
doi: 10.11834/jig.240653
  • Received: 2024-11-07
  • First published online: 2026-04-09
  • Published: 2025-12-16
Publication history
  • Received: 2024-11-07
  • Revised: 2025-04-21
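Author information
    1 School of Computer Science and Engineering, Xi'an University of Technology, Xi'an 710048
    2 Human-Machine Integration Intelligent Robot Shaanxi University Engineering Research Center, Xi'an 710048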
