Lightweight pyramid cross-attention network for orbital image defect detection
Sixu Guo1, 2, *, Huizheng Geng1, Li Su1, Shen He1, Xinyue Zhang1
Journal of Image and Graphics | 2025, 30(12): 3824-3837
Image Analysis and Recognition
Affiliations
  • 1 China Mobile Research Institute, Beijing 100032, China
  • 2 Vehicle Engineering College, Dalian Jiaotong University, Dalian 116021, China
Published: 2025-12-16 | DOI: 10.11834/jig.240547
Objective

Most existing vision-based rail defect detection methods suffer from large parameter counts, high computational complexity, slow detection, and limited accuracy. To overcome these limitations, this paper introduces a lightweight pyramid cross-attention network (LPCANet) for orbital image defect detection using RGB and depth images.

Method

LPCANet builds on deep learning, RGB-D imaging, and salient object detection to extract multiscale feature representations from RGB and depth data. It adopts MobileNetV2 as its backbone network to extract detailed multiscale (pyramid) feature maps from RGB images. In parallel, a lightweight pyramid module (LPM) extracts feature maps of matching sizes from depth images. Each stage of the LPM applies max pooling, a 3 × 3 convolutional layer, batch normalization, and ReLU activation in sequence, which keeps depth feature extraction efficient. A cross-attention mechanism (CAM) then integrates the feature maps from both modalities, strengthening the network’s focus on relevant defect regions. A spatial feature extractor (SFE) is introduced to further improve defect detection performance. Finally, a “pixel shuffle” operation restores the output to the original image resolution.
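
The “pixel shuffle” step mentioned above can be illustrated with a minimal pure-Python sketch. It trades channels for spatial resolution: a tensor of shape (C·r², H, W) becomes (C, H·r, W·r). The function name `pixel_shuffle`, the nested-list representation, and the channel ordering (the convention used by PyTorch's `nn.PixelShuffle`) are assumptions for illustration; the abstract does not state which implementation or upscale factor LPCANet uses.

```python
def pixel_shuffle(x, r):
    """Rearrange a nested list of shape (C*r*r, H, W) into (C, H*r, W*r).

    Illustrative only: channel ordering follows the common convention in
    which input channel c*r*r + i*r + j fills output offset (i, j).
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])

    # Allocate the upsampled output, initialised to zero.
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]

    for c in range(c_out):
        for i in range(r):          # vertical sub-pixel offset
            for j in range(r):      # horizontal sub-pixel offset
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 channels become one 2x2 channel.
low_res = [[[k]] for k in range(4)]
high_res = pixel_shuffle(low_res, 2)
print(high_res)  # [[[0, 1], [2, 3]]]
```

Because the rearrangement only moves values, it adds no parameters of its own; any learning happens in the convolution that produces the C·r² channels beforehand.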

Result

The proposed scheme was evaluated using the PyTorch library on a machine equipped with an NVIDIA 3090 GPU, alongside several benchmark models for comparison. Three publicly available unsupervised RGB-D rail datasets were used to evaluate LPCANet: NEU-RSDDS-AUG, RSDD-TYPE1, and RSDD-TYPE2. Experimental results on the NEU-RSDDS-AUG dataset indicate that LPCANet is highly efficient, with 9.90 million parameters, a computational complexity of 2.50 G, a model size of 37.95 MB, and a running speed of 162.60 frames per second. Compared with 18 existing rail defect detection schemes, LPCANet is more lightweight. In particular, compared with CSEPNet, the current best-performing model, LPCANet achieves improvements across several evaluation metrics: +1.48% in $S_\alpha$, +0.86% in intersection over union (IOU), +0.14% in $F_\beta^{\max}$, +0.03% in mean average precision (mAP), and +1.77% in mean absolute error (MAE). An ablation study on four upsampling methods (interpolation, transposed convolution, patch merging, and “pixel shuffle”) evaluated their effectiveness within the LPCANet framework. Among these, “pixel shuffle” showed clear advantages and was found to be the most suitable for the LPCANet model. Further ablation studies were conducted on four components (backbone network, LPM, SFE, and CAM). The results indicate that CAM and SFE notably enhance the detection performance of LPCANet. An in-depth analysis of various backbone networks confirmed that the LPCANet model is not only compatible with existing backbone networks but also consistently achieves superior detection results. To evaluate the model’s generalization capability beyond rail datasets, experiments were also conducted on three non-rail defect datasets: DAGM2007, MT, and Kolektor-SDD2. The results show that LPCANet delivers improved performance across three key metrics (mAP, MAE, and IOU), demonstrating its potential for general-purpose defect detection tasks.
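
Two of the metrics reported above, MAE and IOU, can be sketched in a few lines for a single predicted saliency map and its ground-truth mask. This is a minimal illustration rather than the paper's evaluation code: the function names, the nested-list inputs with values in [0, 1], and the 0.5 binarization threshold are assumptions. Note that lower MAE is better, whereas higher IOU is better.

```python
def mean_absolute_error(pred, gt):
    """Average per-pixel absolute difference between two same-sized maps."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


def intersection_over_union(pred, gt, threshold=0.5):
    """IoU of the two maps after binarizing both at `threshold` (assumed 0.5)."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p_on = p >= threshold
            g_on = g >= threshold
            intersection += int(p_on and g_on)
            union += int(p_on or g_on)
    # Two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x2 example: one true defect pixel, one false positive.
prediction = [[0.9, 0.2],
              [0.6, 0.1]]
ground_truth = [[1.0, 0.0],
                [0.0, 0.0]]

print(mean_absolute_error(prediction, ground_truth))
print(intersection_over_union(prediction, ground_truth))
```

In a full evaluation these per-image values would typically be averaged over every image in a dataset.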

Conclusion

The LPCANet model proposed in this study effectively combines the advantages of traditional and deep learning approaches, demonstrating strong practical value in rail defect image processing. Future work will focus on further reducing the model size to speed up detection while continuing to improve detection quality.

track defect detection / deep learning / RGB-D / salient object detection / spatial feature extractor (SFE)
Sixu Guo, Huizheng Geng, Li Su, Shen He, Xinyue Zhang. Lightweight pyramid cross-attention network for orbital image defect detection[J]. Journal of Image and Graphics, 2025, 30(12): 3824-3837. DOI: 10.11834/jig.240547
Article Info
  • Received: 2024-09-24
  • Revised: 2025-05-20
  • Published: 2025-12-16
  • Online: 2026-04-09
https://castjournals.cast.org.cn/joweb/zgtxtxxb/EN/10.11834/jig.240547