Weakly supervised semantic segmentation (WSSS) aims to reduce the cost of annotating “strong” pixel-level labels by using “weak” labels such as points, bounding boxes, scribbles, and image-level class labels. Among these, image-level class labels are the most cost-effective and readily available; however, they carry no precise positional or shape-related information, which makes accurate segmentation difficult. A widely used WSSS approach based on image-level class labels comprises three steps: 1) training a neural network for image classification using the class labels; 2) using the trained network to generate class activation maps (CAMs), which serve as seed regions for the segmentation task; and 3) refining these CAMs into pseudo-labels, which are then used as the ground truth to supervise a segmentation network. Existing approaches fall into two broad categories: multistage methods, which carry out these steps separately, and single-stage methods, which integrate them into a single collaborative stage. In either case, the quality of the pseudo-labels is crucial to overall performance: high-quality pseudo-labels yield better segmentation, whereas noisy or inaccurate pseudo-labels hinder the model from learning meaningful features. Although single-stage methods are more efficient and simplify the training pipeline, they often produce less accurate pseudo-labels because their CAMs undergo only limited refinement, and the resulting imprecise supervision signals degrade segmentation performance.
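Steps 2 and 3 of the pipeline described above can be illustrated with a minimal NumPy sketch. It assumes a classifier that ends in global average pooling followed by a linear layer, so that a CAM is the classifier-weighted sum of the final feature maps. The function names, the array shapes, and the 0.3 background threshold are illustrative assumptions, not details taken from this work.

```python
import numpy as np

def class_activation_maps(features, fc_weights):
    """features: (C, H, W) final feature maps; fc_weights: (K, C) classifier weights.
    Returns (K, H, W) CAMs, clipped at zero and scaled by each class's peak value."""
    cams = np.einsum("kc,chw->khw", fc_weights, features)
    cams = np.maximum(cams, 0.0)  # keep only positive evidence
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)[:, None, None]
    return cams / np.maximum(peak, 1e-8)

def seed_labels(cams, image_labels, bg_threshold=0.3):
    """cams: (K, H, W); image_labels: (K,) binary image-level labels.
    Returns an (H, W) integer map: 0 = background, k + 1 = class k."""
    present = np.asarray(image_labels, dtype=float)[:, None, None]
    cams = cams * present          # suppress classes absent from the image
    best = cams.argmax(axis=0)     # most activated class per pixel
    score = cams.max(axis=0)
    return np.where(score >= bg_threshold, best + 1, 0)
```

Pixels whose strongest activation falls below the threshold are assigned to the background; a fixed threshold of this kind is one source of the noisy seeds that later refinement has to correct.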
To alleviate these limitations, a simple yet novel single-stage WSSS framework that incorporates knowledge distillation is introduced to enhance pseudo-label quality without relying on any additional external supervision. The framework improves feature learning within the teacher-student network through a dual-stage knowledge distillation module. This module allows the student network to acquire more dynamic and informative knowledge from the teacher network while preserving key features, thereby making the student model more robust. To further improve segmentation accuracy, a pseudo-label correction module based on a Gaussian mixture model (GMM) is introduced. This module refines the pseudo-labels by modeling the distribution of the CAMs, yielding more accurate and reliable supervision signals. Together, the dual-stage knowledge distillation module and the Gaussian correction module mitigate the impact of noise during training and improve the accuracy of the generated pseudo-labels, leading to better segmentation results even when only image-level labels are available.
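The text does not spell out the losses used by the dual-stage distillation module, so the sketch below shows only the standard temperature-softened distillation loss as a generic illustration of how a student can be trained to match a teacher. The function names, the `(N, K)` logit layout (N pixels, K classes), and the default temperature are assumptions made for this example.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over rows of (N, K) logits,
    computed on temperature-softened class distributions."""
    p_teacher = softmax(teacher_logits / temperature)
    p_student = softmax(student_logits / temperature)
    log_ratio = np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)
    kl = (p_teacher * log_ratio).sum(axis=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return float(kl.mean() * temperature ** 2)
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge.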
A novel weakly supervised semantic segmentation method is proposed to address the challenges posed by noisy data and weak supervision. First, a dual-stage knowledge interaction module is introduced to enhance the feature learning of the teacher and student networks. By enabling effective knowledge exchange between the two networks, the approach reduces the impact of noise during training and leads to more robust feature extraction. Second, a Gaussian correction module is proposed to improve the quality of the pseudo-labels. This module models the distribution of the class activation maps; by fitting this distribution more accurately, it corrects potential errors in the pseudo-labels so that the model learns from refined labels. As a result, the method is more robust to noise and achieves higher segmentation accuracy, providing a promising solution for weakly supervised segmentation tasks.
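One way to picture distribution-based pseudo-label correction is to fit a two-component Gaussian mixture to the activation scores of a single CAM and treat the higher-mean component as foreground. The sketch below does this with a small expectation-maximization loop in NumPy. It is an assumption-laden illustration, not the Gaussian correction module of this work: the two-component choice, the initialization, and all function names are hypothetical.

```python
import numpy as np

def _responsibilities(x, weights, means, variances):
    """Posterior probability of each component for every value in x (last axis = component)."""
    x = np.asarray(x, dtype=float)[..., None]
    log_p = (np.log(weights) - 0.5 * np.log(2.0 * np.pi * variances)
             - 0.5 * (x - means) ** 2 / variances)
    log_p -= log_p.max(axis=-1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=-1, keepdims=True)

def fit_gmm_1d(scores, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture. Returns (weights, means, variances)."""
    x = np.asarray(scores, dtype=float).ravel()
    weights = np.array([0.5, 0.5])
    means = np.array([x.min(), x.max()])   # start the components at the extremes
    variances = np.full(2, x.var() + 1e-6)
    for _ in range(n_iter):
        resp = _responsibilities(x, weights, means, variances)   # E-step
        nk = resp.sum(axis=0) + 1e-12                            # M-step
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances

def foreground_posterior(cam, params):
    """Per-pixel probability of belonging to the higher-mean (foreground) component."""
    weights, means, variances = params
    resp = _responsibilities(cam, weights, means, variances)
    return resp[..., np.argmax(means)]
```

Thresholding the returned posterior (for example at 0.5) gives a foreground mask that adapts to each map's score distribution instead of relying on one fixed cutoff.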
The method achieved mIoU values of 74.8% and 42.3% on the PASCAL VOC 2012 and MS COCO 2014 datasets, respectively, surpassing the compared methods. Specifically, on PASCAL VOC 2012, it improved mIoU by 3.7 percentage points over ToCo, 8.8 over AFA, 7.5 over TSCD, and 1.1 over BECO. On MS COCO 2014, it improved mIoU by 2.2 percentage points over TSCD, 3.4 over AFA, and 5.3 over AuxSegNet+. Per-category IoU values were also compared on the PASCAL VOC 2012 validation set, where the method outperformed the competing methods in 16 categories. Notably, for the background class, the method achieved an IoU of 92.4%, the highest among all methods evaluated. This result suggests that the Gaussian correction module reduces the misclassification of background regions, thereby improving segmentation performance. The method also achieved notable improvements in categories such as bird, bottle, car, chair, and cow, further demonstrating its effectiveness.
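For reference, the mIoU metric used in these comparisons is the mean, over classes, of the intersection-over-union between predicted and ground-truth masks. A minimal NumPy sketch follows; the function names are illustrative, and the choice to ignore classes absent from both prediction and ground truth is an assumption of this example.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """IoU for each class from flat or image-shaped integer label arrays.
    Classes absent from both prediction and ground truth yield NaN."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    return np.where(union > 0, tp / np.maximum(union, 1), np.nan)

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in the prediction or the ground truth."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))
```

Because the score is averaged per class rather than per pixel, a gain on a large class such as background counts the same as a gain on a small object class.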
The proposed method effectively mitigates the impact of noise during training and addresses the issue of incomplete pseudo-label generation through the integration of a dual-stage knowledge distillation module and a Gaussian correction module. This approach achieves remarkable performance improvements compared to existing methods. Overall, the results demonstrate notable advantages in end-to-end weakly supervised semantic segmentation and hold considerable research value.
| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |