Deep fusion of multimodal features for few-shot class-incremental 3D point cloud classification


Chenxi ZHU¹, Yinan LU¹, Tieru WU², Wenyong GONG³, Rui MA²

Journal of Graphics | 2026, 47(1): 78-89


• Image Processing and Computer Vision •


Affiliations

¹ College of Computer Science and Technology, Jilin University, Changchun Jilin 130012, China

² School of Artificial Intelligence, Jilin University, Changchun Jilin 130012, China

³ College of Information Science and Technology, Jinan University, Guangzhou Guangdong 510632, China

Published: 2026-02-28. DOI: 10.11996/JG.j.2095-302X.2026010078


Abstract


Traditional 3D point cloud classification methods tend to suffer from insufficient generalization and catastrophic forgetting in few-shot class-incremental learning (FSCIL) scenarios. The pretrained vision-language model CLIP (Contrastive Language-Image Pre-training), which contains rich 2D shape priors, has been shown to enhance 3D FSCIL performance. However, existing CLIP-based frameworks still lack flexibility and adaptability in multimodal feature extraction and fusion, which limits classification accuracy during incremental stages. To address these shortcomings, a 3D FSCIL approach with deeply fused multimodal features was proposed. An adaptive adapter based on gated units and residual blocks was introduced to achieve multi-scale feature alignment and redundancy suppression, and a multimodal global feature dynamic fusion module with self-attention was designed to adjust the weights of the different feature streams according to sample characteristics, thereby obtaining more consistent and complementary fused representations. Specifically, point clouds were rendered into multi-view depth maps, and features were extracted using both the original CLIP visual encoder and a CLIP encoder pretrained on depth maps; these were then combined with geometric features of the point cloud. After processing by the adaptive adapter, the features were fed into the attention-based fusion module and aligned with semantic features extracted by the CLIP text encoder for classification. In addition, a contrastive learning loss, data augmentation strategies based on multi-view and geometric perturbations, and a memory-replay mechanism were incorporated to mitigate overfitting and forgetting under few-shot conditions. Experiments on ShapeNet, ModelNet, and CO3D demonstrated that the proposed method consistently achieved higher accuracy across incremental stages than existing 3D FSCIL approaches, while significantly reducing both the relative accuracy drop rate and the maximum stage fluctuation.
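
The abstract gives no equations for the adapter or the fusion module, so the following is only a minimal sketch of the described data flow under stated assumptions, not the authors' implementation. It assumes each stream (CLIP visual, depth-pretrained CLIP, point cloud geometry) has already been reduced to a global feature vector of equal length. The gated residual form `x + sigmoid(W_gate x) * relu(W_feat x)`, the identity query/key/value projections, the mean pooling, the adapter weights shared across streams, and all function and parameter names are illustrative assumptions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]


def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def gated_residual_adapter(x, W_feat, W_gate):
    """Assumed adapter form: y = x + sigmoid(W_gate x) * relu(W_feat x)."""
    h = [max(0.0, v) for v in matvec(W_feat, x)]   # transformed features
    g = [sigmoid(v) for v in matvec(W_gate, x)]    # per-dimension gate in (0, 1)
    return [xi + gi * hi for xi, gi, hi in zip(x, g, h)]


def attention_fuse(streams):
    """Scaled dot-product self-attention over the stream vectors
    (identity Q/K/V projections, an assumption), followed by mean pooling."""
    d = len(streams[0])
    scale = math.sqrt(d)
    attended = []
    for q in streams:
        weights = softmax([dot(q, k) / scale for k in streams])
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, streams)) for j in range(d)]
        )
    return [sum(t[j] for t in attended) / len(attended) for j in range(d)]


def classify(fused, text_features):
    """Pick the class whose text feature has the highest cosine similarity."""
    f = l2_normalize(fused)
    scores = [dot(f, l2_normalize(t)) for t in text_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy run with random stand-ins for encoder outputs and learned weights.
random.seed(0)
dim, num_classes = 4, 5
streams = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
W_feat = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
W_gate = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
adapted = [gated_residual_adapter(s, W_feat, W_gate) for s in streams]
fused = attention_fuse(adapted)
text_features = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_classes)]
demo_pred, demo_scores = classify(fused, text_features)
print(demo_pred, [round(s, 3) for s in demo_scores])
```

In a trained model the adapter weights and attention projections would be learned; here they are random stand-ins so that the flow from per-stream features to a text-aligned prediction is easy to trace.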

Key words

3D point cloud / incremental learning / few-shot learning / 3D classification / pre-trained model

Cite this Article

Chenxi ZHU, Yinan LU, Tieru WU, Wenyong GONG, Rui MA. Deep fusion of multimodal features for few-shot class-incremental 3D point cloud classification[J]. Journal of Graphics, 2026, 47(1): 78-89. DOI: 10.11996/JG.j.2095-302X.2026010078

Funding


National Natural Science Foundation of China (62202199)


Article Info


Received: 2025-06-30
Online: 2026-05-19
Published: 2026-02-28


History

Received: 2025-06-30
Accepted: 2025-08-23


Corresponding author: Rui MA, E-mail: ruim@jlu.edu.cn


https://castjournals.cast.org.cn/joweb/txxb/EN/10.11996/JG.j.2095-302X.2026010078


