Lightweight pyramid cross-attention network for orbital image defect detection

Sixu Guo¹,²,*, Huizheng Geng¹, Li Su¹, Shen He¹, Xinyue Zhang¹

Journal of Image and Graphics | 2025, 30(12): 3824-3837

• Image Analysis and Recognition •

Affiliations

¹China Mobile Research Institute, Beijing 100032, China

²Vehicle Engineering College, Dalian Jiaotong University, Dalian 116021, China

Published: 2025-12-16 doi: 10.11834/jig.240547

Abstract

Objective

Most existing vision-based rail defect detection methods suffer from high parameter counts, high computational complexity, slow detection speed, and limited accuracy. To overcome these limitations, this paper introduces a lightweight pyramid cross-attention network (LPCANet) for rail image defect detection that uses both RGB images and depth images.

Method

LPCANet adopts MobileNetV2 as its backbone network to extract multiscale feature maps from RGB images. In parallel, a lightweight pyramid module (LPM) extracts feature maps of matching sizes from depth images. Each LPM stage consists of max pooling, a 3 × 3 convolutional layer, batch normalization, and ReLU activation, which allows depth features to be extracted efficiently. By combining deep learning, RGB-D sensing, and salient object detection, LPCANet obtains multiscale feature representations from both modalities: the backbone captures detailed pyramid features from the RGB images, and the LPM handles the depth images. A cross-attention mechanism (CAM) then fuses the feature maps of the two modalities, focusing the network on relevant defect regions. A spatial feature extractor (SFE) is further introduced to improve defect detection performance. Finally, a “pixel shuffle” operation restores the output to the original image resolution.
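The data flow described above can be illustrated for a single pyramid level with a small NumPy sketch. This is not the authors' implementation. The 2 × 2 pooling window, the batch-of-one normalization, and all channel counts are assumptions. `cross_attention` is a hypothetical stand-in built from generic scaled dot-product cross-attention, because the abstract does not specify the exact design of the CAM. The SFE is omitted since its internals are not described. `pixel_shuffle` follows the channel-to-space ordering used by `torch.nn.PixelShuffle`.

```python
import numpy as np

# Toy (C, H, W) arrays for one image; there is no batch dimension.


def max_pool2x2(x):
    """2 x 2 max pooling with stride 2 (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def conv3x3(x, weight, bias):
    """3 x 3 convolution (cross-correlation), stride 1, zero padding 1.
    weight: (C_out, C_in, 3, 3), bias: (C_out,)."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oi,ihw->ohw", weight[:, :, dy, dx],
                             xp[:, dy:dy + h, dx:dx + w])
    return out + bias[:, None, None]


def batch_norm(x, gamma, beta, eps=1e-5):
    """Per-channel normalization using the statistics of a batch of one image."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return gamma[:, None, None] * (x - mean) / np.sqrt(var + eps) + beta[:, None, None]


def lpm_stage(x, weight, bias, gamma, beta):
    """One LPM stage in the order the abstract lists: max pool -> 3x3 conv -> BN -> ReLU."""
    y = batch_norm(conv3x3(max_pool2x2(x), weight, bias), gamma, beta)
    return np.maximum(y, 0.0)


def cross_attention(rgb, depth, wq, wk, wv):
    """Hypothetical CAM stand-in: RGB tokens attend to depth tokens
    (scaled dot-product attention), and the result is added back as a residual."""
    c, h, w = rgb.shape
    q = rgb.reshape(c, -1).T @ wq            # (HW, d) queries from RGB
    k = depth.reshape(c, -1).T @ wk          # (HW, d) keys from depth
    v = depth.reshape(c, -1).T @ wv          # (HW, C) values from depth
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over depth tokens
    return rgb + (attn @ v).T.reshape(c, h, w)


def pixel_shuffle(x, r):
    """(C*r*r, H, W) -> (C, H*r, W*r), same ordering as torch.nn.PixelShuffle."""
    crr, h, w = x.shape
    c = crr // (r * r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 16, 16))   # stand-in for one backbone stage output
depth = rng.standard_normal((1, 32, 32))      # stand-in for a single-channel depth map
depth_feat = lpm_stage(depth, 0.1 * rng.standard_normal((8, 1, 3, 3)),
                       np.zeros(8), np.ones(8), np.zeros(8))
fused = cross_attention(rgb_feat, depth_feat, rng.standard_normal((8, 4)),
                        rng.standard_normal((8, 4)), rng.standard_normal((8, 8)))
# Hypothetical 1x1 projection to 4 = 1 * 2 * 2 channels, then upsample by a factor of 2.
out = pixel_shuffle(np.einsum("oc,chw->ohw", rng.standard_normal((4, 8)), fused), 2)
print(depth_feat.shape, fused.shape, out.shape)  # (8, 16, 16) (8, 16, 16) (1, 32, 32)
```

Pixel shuffle upsamples without interpolation. Each group of r² channels at a low-resolution position becomes an r × r block of output pixels, so the upsampling is effectively learned by whichever layer produces those channels.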

Result

The proposed scheme was implemented in PyTorch and evaluated on an NVIDIA RTX 3090 GPU alongside several benchmark models. Three publicly available unsupervised RGB-D rail datasets were used to evaluate LPCANet: NEU-RSDDS-AUG, RSDD-TYPE1, and RSDD-TYPE2. On NEU-RSDDS-AUG, LPCANet has 9.90 million parameters, a computational complexity of 2.50 G, and a model size of 37.95 MB, and it runs at 162.60 frames per second. Compared with 18 existing rail defect detection schemes, LPCANet is superior in lightness. Compared with CSEPNet, the current best-performing model, LPCANet improves on several evaluation metrics: +1.48% in $S_\alpha$, +0.86% in intersection over union (IoU), +0.14% in $F_\beta^{\max}$, +0.03% in mean average precision (mAP), and +1.77% in mean absolute error (MAE).

An ablation study of four upsampling methods (interpolation, transposed convolution, patch merging, and “pixel shuffle”) within the LPCANet framework showed that “pixel shuffle” has clear advantages and is the most suitable choice for LPCANet. Further ablation studies on four components (backbone network, LPM, SFE, and CAM) showed that CAM and SFE notably improve the detection performance of LPCANet. An in-depth analysis of various backbone networks confirmed that LPCANet is compatible with existing backbone networks and consistently achieves superior detection results with them.

To evaluate generalization beyond rail datasets, experiments were also conducted on three non-rail defect datasets: DAGM2007, MT, and Kolektor-SDD2. LPCANet delivered improved performance on three key metrics (mAP, MAE, and IoU), demonstrating its potential for general-purpose defect detection tasks.
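Three of the reported metrics can be made concrete for a single predicted saliency map with the NumPy sketch below. It assumes the definitions that are common in salient object detection work: β² = 0.3, a 256-step threshold sweep for $F_\beta^{\max}$, and a fixed 0.5 threshold for IoU. The abstract does not state the paper's exact evaluation protocol, and $S_\alpha$ and mAP are omitted here.

```python
import numpy as np


def mae(pred, gt):
    """Mean absolute error between a [0, 1] prediction map and a binary mask (lower is better)."""
    return float(np.abs(pred - gt).mean())


def iou(pred, gt, threshold=0.5):
    """Intersection over union of the thresholded prediction and the mask (assumed threshold 0.5)."""
    p, g = pred >= threshold, gt > 0.5
    union = np.logical_or(p, g).sum()
    return float(np.logical_and(p, g).sum() / union) if union else 1.0


def max_f_beta(pred, gt, beta2=0.3, steps=256):
    """Maximum F-measure over a sweep of binarization thresholds in [0, 1]."""
    g = gt > 0.5
    best = 0.0
    for t in np.linspace(0.0, 1.0, steps):
        p = pred >= t
        tp = np.logical_and(p, g).sum()
        precision = tp / p.sum() if p.sum() else 0.0
        recall = tp / g.sum() if g.sum() else 0.0
        if precision + recall > 0:
            f = (1 + beta2) * precision * recall / (beta2 * precision + recall)
            best = max(best, f)
    return float(best)


# Toy 4 x 4 example: a 2 x 2 defect in the top-left corner.
gt = np.zeros((4, 4))
gt[:2, :2] = 1.0
pred = np.zeros((4, 4))
pred[:2, :2] = 0.9
pred[1, 1] = 0.3      # weakly detected defect pixel
pred[0, 2] = 0.7      # false alarm next to the defect
print(f"MAE={mae(pred, gt):.4f}  IoU={iou(pred, gt):.4f}  maxF={max_f_beta(pred, gt):.4f}")
```

MAE measures error, so a lower value is better. IoU and $F_\beta^{\max}$ are scores for which a higher value is better.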

Conclusion

The proposed LPCANet model effectively combines the advantages of traditional and deep learning approaches and shows strong practical value for rail defect image processing. Future work will focus on further reducing the model size to increase detection speed while continuing to improve detection performance.

Key words

track defect detection / deep learning / RGB-D / salient object detection / spatial feature extractor (SFE)

Cite this Article

Sixu Guo, Huizheng Geng, Li Su, Shen He, Xinyue Zhang. Lightweight pyramid cross-attention network for orbital image defect detection[J]. Journal of Image and Graphics, 2025, 30(12): 3824-3837. DOI: 10.11834/jig.240547

Article Info

Received: 2024-09-24
Revised: 2025-05-20
Published: 2025-12-16
Online: 2026-04-09


