Enhanced attention-based joint semantic instance segmentation network for point clouds

Wen Hao¹,²,*, Zhanbin Zuo¹, Hansen Lu¹, Wei Liang¹,², Haiyan Jin¹,², Zhenghao Shi¹,²

Journal of Image and Graphics | 2025, 30(12): 3914-3926

• Computer Graphics •

Affiliations

¹Department of Computer Science, Xi’an University of Technology, Xi’an 710048, China

²Shaanxi Key Laboratory for Network Computing and Security Technology, Xi’an 710048, China

Published: 2025-12-16 doi: 10.11834/jig.240445

Abstract

Objective

With the rapid advancement of 3D sensing technologies such as LiDAR (light detection and ranging) and depth cameras, large-scale 3D point clouds have become a key data source for applications including autonomous driving, robotic navigation, augmented reality, and urban scene reconstruction. Compared to 2D images, point clouds offer precise spatial geometry and represent the environment without perspective distortion. They are also robust to variations in lighting and texture. Point cloud segmentation plays a central role in scene analysis and interpretation, and falls into three categories: semantic segmentation, instance segmentation, and joint semantic-instance segmentation. Semantic segmentation partitions a 3D scene into meaningful regions and assigns each region to a specific class. Instance segmentation identifies and separates individual objects at the point level, including objects that belong to the same semantic category. In recent years, researchers have increasingly combined the two tasks to obtain more consistent and informative scene-level interpretations. Joint semantic-instance segmentation exploits the intrinsic correlation between semantic and instance-level segmentation, so that the two tasks complement and reinforce each other. For 3D point clouds, this joint approach substantially improves a system's ability to understand complex environments and provides technical support for intelligent systems, which has made it an area of active research. However, most existing methods for joint semantic-instance segmentation rely on simplistic feature fusion strategies, which limits how well they capture the relationship between semantic and instance features.
To address this limitation, an enhanced attention-based joint semantic-instance segmentation network is proposed. The network is designed to model and exploit the correlation between semantic and instance information.

Method

The enhanced attention-based joint semantic-instance segmentation neural network (EAJS-Net) incorporates a semantic feature extraction module based on an attention mechanism. This module focuses on the local neighborhood of each point and dynamically adjusts attention weights to emphasize key information, thereby enhancing the extraction of semantic features across points. In addition, an attention-enhanced semantic/instance feature fusion module is introduced, which adaptively learns the similarity between central and adjacent features. This design reinforces key characteristics and captures the correlation between instance and semantic segmentation, improving overall segmentation accuracy. EAJS-Net uses PointNet++ and PointConv as its backbone and comprises three main components: a point feature enhancement module, an encoder-decoder module, and an enhanced attention-based joint segmentation module. The input to EAJS-Net is an N × 9 point cloud, where N is the number of points and the nine dimensions comprise coordinates (XYZ), color (RGB), and normalized coordinates. The attention-based semantic feature extraction module first captures local contextual information between points. The enhanced features it produces are then fed into the encoder, which consists of four encoding modules: one attention pooling-based set abstraction layer adapted from PointNet++ and three feature encoding layers derived from PointConv. The corresponding decoder consists of four decoding modules: three deep feature decoding layers derived from PointConv and one feature propagation layer from PointNet++. The attention pooling-based set abstraction layer allows the network to capture spatial geometric relationships among features.
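As a rough illustration of attention pooling over a local neighborhood, the following NumPy sketch builds an N × 9 input and aggregates each point's k nearest neighbors with softmax attention weights. It is not the authors' implementation: the brute-force neighbor search, the single scoring matrix `w_score`, the min-max normalization of coordinates, and all function names are assumptions made for this example, and the weights are random rather than learned.

```python
import numpy as np


def knn_indices(xyz, k):
    """Indices of the k nearest neighbours of every point (the point itself included)."""
    diff = xyz[:, None, :] - xyz[None, :, :]          # (N, N, 3)
    dist2 = np.sum(diff * diff, axis=-1)              # squared distances, (N, N)
    return np.argsort(dist2, axis=1)[:, :k]           # (N, k)


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def attention_pool(features, xyz, k, w_score):
    """Aggregate neighbour features with per-channel attention weights.

    features: (N, C), xyz: (N, 3), w_score: (C, C) -> returns (N, C).
    """
    idx = knn_indices(xyz, k)                         # (N, k)
    neigh = features[idx]                             # (N, k, C)
    scores = neigh @ w_score                          # (N, k, C)
    weights = softmax(scores, axis=1)                 # normalise over the k neighbours
    return np.sum(weights * neigh, axis=1)            # (N, C)


rng = np.random.default_rng(0)
n_points, k = 128, 16
xyz = rng.uniform(0.0, 1.0, size=(n_points, 3))       # synthetic coordinates
rgb = rng.uniform(0.0, 1.0, size=(n_points, 3))       # synthetic colours
# Assumption: "normalized coordinates" means position rescaled to [0, 1] per axis.
norm_xyz = (xyz - xyz.min(axis=0)) / (xyz.max(axis=0) - xyz.min(axis=0))
points = np.concatenate([xyz, rgb, norm_xyz], axis=1)  # (N, 9) network input

w_score = rng.normal(scale=0.1, size=(9, 9))          # stand-in for learned weights
pooled = attention_pool(points, xyz, k, w_score)
print(points.shape, pooled.shape)
```

Because the weights for each channel sum to one over the neighborhood, every pooled value is a convex combination of neighbor values; in a trained network the scoring weights would be learned jointly with the rest of the model.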
Together, the encoder and decoder extract the initial semantic and instance features of the point cloud, laying the foundation for accurate joint segmentation. An enhanced attention module is designed to adaptively learn the similarity between central and neighboring features through two attention mechanisms, each of which dynamically computes attention weights. The two sets of attention weights are summed and applied to the initial semantic features, yielding enhanced semantic representations. This module is embedded in the semantic branch of the joint segmentation module, enabling more effective integration of semantic and instance features and improving joint segmentation accuracy. The encoded features are upsampled through two parallel decoder branches to generate an instance feature matrix and a semantic feature matrix, which serve as inputs to the joint segmentation module. Within this module, the semantic and instance branches are integrated using the enhanced attention mechanism. The final output comprises instance embeddings and semantic predictions.
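The idea of summing two sets of attention weights and applying them to the semantic features can be sketched as follows. The abstract does not specify the two attention forms, so this example assumes one is a scaled dot-product similarity and the other a negative squared feature distance, and it assumes neighbors are selected in the instance embedding space; `enhanced_attention` and its arguments are hypothetical names, not the paper's API.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def enhanced_attention(sem, ins, k):
    """Fuse semantic features using two summed attention weight sets.

    sem: (N, C) initial semantic features.
    ins: (N, D) instance embeddings, used here only to pick neighbours (assumption).
    Returns enhanced semantic features of shape (N, C).
    """
    # k nearest neighbours in the instance embedding space (self included).
    d2 = np.sum((ins[:, None, :] - ins[None, :, :]) ** 2, axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]                # (N, k)

    neigh = sem[idx]                                   # (N, k, C)
    centre = sem[:, None, :]                           # (N, 1, C)

    # Attention 1 (assumed form): scaled dot-product similarity.
    dot = np.sum(centre * neigh, axis=-1) / np.sqrt(sem.shape[1])
    a1 = softmax(dot, axis=1)                          # (N, k)

    # Attention 2 (assumed form): closer features get larger weights.
    neg_dist = -np.sum((centre - neigh) ** 2, axis=-1)
    a2 = softmax(neg_dist, axis=1)                     # (N, k)

    weights = a1 + a2                                  # summed dual attention weights
    return np.sum(weights[:, :, None] * neigh, axis=1)


rng = np.random.default_rng(1)
sem = rng.normal(size=(64, 8))     # synthetic semantic features
ins = rng.normal(size=(64, 5))     # synthetic instance embeddings
enhanced = enhanced_attention(sem, ins, k=8)
print(enhanced.shape)
```

Since each softmax sums to one, the summed weights in a neighborhood total two; a real network would typically follow this aggregation with learned layers or a normalization step, which is omitted here.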

Result

The proposed network is evaluated on the Stanford large-scale 3D indoor spaces (S3DIS) dataset and ScanNet V2 to assess its performance on point cloud segmentation tasks. Six-fold cross-validation is performed on the S3DIS dataset, and the results of EAJS-Net are compared with those of state-of-the-art (SOTA) methods. For semantic segmentation on S3DIS, EAJS-Net achieves a mean intersection over union (mIoU) of 65.9%, an overall accuracy (oAcc) of 89.1%, and a mean accuracy (mAcc) of 76.0%. Compared to JSNet++, these results represent improvements of 3.5% (mIoU), 0.4% (oAcc), and 3.2% (mAcc). For instance segmentation, EAJS-Net reaches a weighted coverage rate of 61.1%, outperforming JSNet++ by 4.1% (mean weighted coverage, mWCov), 4.6% (mean coverage, mCov), and 1.2% (mean recall, mRec). On the ScanNet dataset, EAJS-Net improves the mIoU for semantic segmentation by 3.2% and the weighted coverage rate for instance segmentation by 2.8% compared to JSNet. Visual comparisons between EAJS-Net and other SOTA methods are also presented, showing that EAJS-Net achieves superior segmentation results even in complex indoor scenes. In addition, ablation experiments validate the effectiveness of the individual modules. The enhanced attention-based joint segmentation module dynamically adjusts attention weights to capture varied features and integrates semantic and instance features into the semantic feature space, which notably improves semantic segmentation performance.
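For readers unfamiliar with the reported metrics, the sketch below computes mIoU, oAcc, and mAcc from a confusion matrix, and coverage (Cov) and weighted coverage (WCov) from instance labels, using their commonly used definitions. It is a minimal illustration on toy labels, not the paper's evaluation code: the function names are hypothetical, and the per-class and cross-scene averaging that turns Cov and WCov into mCov and mWCov is omitted.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    gt = np.asarray(gt, dtype=np.int64)
    pred = np.asarray(pred, dtype=np.int64)
    flat = gt * num_classes + pred
    counts = np.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def semantic_metrics(gt, pred, num_classes):
    """Return mIoU, oAcc and mAcc over classes that actually occur."""
    cm = confusion_matrix(gt, pred, num_classes).astype(float)
    tp = np.diag(cm)
    gt_count = cm.sum(axis=1)
    pred_count = cm.sum(axis=0)
    union = gt_count + pred_count - tp
    iou = tp[union > 0] / union[union > 0]
    acc = tp[gt_count > 0] / gt_count[gt_count > 0]
    return {"mIoU": iou.mean(), "oAcc": tp.sum() / cm.sum(), "mAcc": acc.mean()}


def coverage(gt_inst, pred_inst):
    """Return (Cov, WCov): best-match IoU per ground-truth instance,
    averaged uniformly (Cov) or weighted by instance size (WCov)."""
    gt_inst = np.asarray(gt_inst)
    pred_inst = np.asarray(pred_inst)
    gt_ids = np.unique(gt_inst)
    pred_ids = np.unique(pred_inst)
    cov, wcov = 0.0, 0.0
    for g in gt_ids:
        g_mask = gt_inst == g
        best = 0.0
        for p in pred_ids:
            p_mask = pred_inst == p
            inter = np.sum(g_mask & p_mask)
            union = np.sum(g_mask | p_mask)
            best = max(best, inter / union)
        cov += best
        wcov += best * g_mask.sum() / gt_inst.size
    return cov / len(gt_ids), wcov


# Toy semantic labels for six points and three classes.
sem_scores = semantic_metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], num_classes=3)
# Toy instance labels for six points.
cov, wcov = coverage([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1])
print(sem_scores, cov, wcov)
```

WCov differs from Cov only in weighting each ground-truth instance by its share of points, so large instances influence the score more.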

Conclusion

To address the limitations of existing feature fusion strategies, which fail to fully capture the correlations between semantic and instance features, this paper proposes EAJS-Net, a novel joint semantic-instance segmentation network based on an enhanced attention mechanism. A new semantic feature extraction module is designed to capture contextual relationships among points. In addition, an enhanced attention module is introduced to aggregate instance features into the semantic feature space. This improved feature fusion strategy boosts the performance of joint semantic-instance segmentation. Experimental results demonstrate that EAJS-Net effectively integrates semantic and instance features, substantially improving the accuracy of both segmentation tasks compared to SOTA methods.

Key words

deep learning / point cloud / semantic segmentation / instance segmentation / enhanced attention-based mechanism

Cite this Article

Wen Hao, Zhanbin Zuo, Hansen Lu, Wei Liang, Haiyan Jin, Zhenghao Shi. Enhanced attention-based joint semantic instance segmentation network for point clouds[J]. Journal of Image and Graphics, 2025, 30(12): 3914-3926. DOI: 10.11834/jig.240445

Article Info

Received: 2024-08-12
Revised: 2025-05-15
Online: 2026-04-09
Published: 2025-12-16

https://castjournals.cast.org.cn/joweb/zgtxtxxb/EN/10.11834/jig.240445
