Human pose estimation aims to locate skeletal keypoints of individuals in a given image. As a fundamental task in computer vision, human pose estimation has wide applications in human activity recognition, person re-identification, pose tracking, and related fields. Two main approaches for human pose estimation are available: top-down and bottom-up. Top-down methods first detect human bodies in the image, crop out each person, and then estimate the keypoint coordinates. While effective, these methods perform poorly in cases of occlusion, and their computation cost increases with the number of people in the image. In contrast, bottom-up methods detect all identity-independent keypoints simultaneously and then group them into individual poses. These methods are typically lightweight and fast but must handle varying human scales. Bottom-up human pose estimation methods commonly use 2D Gaussian kernels to generate keypoint heatmaps as regression targets because they provide rich spatial information. However, conventional approaches apply Gaussian kernels with a fixed variance across all keypoints, resulting in uniform heatmap structures. This uniformity is problematic given the existing scale variability in bottom-up methods. On the one hand, different keypoints cover different pixel areas in images, and using large Gaussian kernels may introduce semantic ambiguity, particularly for small joints. On the other hand, differences in keypoint scale imply different levels of annotation uncertainty, which the heatmap variance should ideally reflect. The variance of the Gaussian kernel represents uncertainty; thus, it should be proportional to the scale and ambiguity associated with each keypoint. Aiming to address these issues, an adaptive heatmap generation network (AHGNet) for bottom-up human pose estimation is proposed. AHGNet estimates the appropriate radius of the Gaussian kernel for each keypoint by integrating inherent scale information and geometric relationships. 
Through formula derivation, the relationship between the radius and the Gaussian kernel variance is established, enabling the creation of customized, scale-adaptive ground-truth heatmaps. This approach improves localization accuracy by effectively aligning the heatmap structure with the spatial characteristics of each keypoint.
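
The difference between a fixed-variance kernel and a scale-adaptive one can be illustrated with a minimal sketch. The derived radius-to-variance relationship is not given here, so the mapping in `sigma_from_radius` (the kernel decays to a chosen `floor` value at the radius) is only an assumed stand-in. The function names, the `floor` parameter, and the example radius are hypothetical. In AHGNet the radius is estimated by the network rather than set by hand.

```python
import math


def gaussian_heatmap(width, height, center, sigma):
    """Render one 2D Gaussian keypoint heatmap as a list of rows (heatmap[y][x])."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def sigma_from_radius(radius, floor=0.01):
    """Assumed mapping (not the paper's formula): the kernel falls to `floor` at `radius`.

    exp(-r^2 / (2 * sigma^2)) = floor  =>  sigma = r / sqrt(-2 * ln(floor))
    """
    return radius / math.sqrt(-2.0 * math.log(floor))


# Conventional target: the same sigma for every keypoint.
fixed = gaussian_heatmap(32, 32, center=(16, 16), sigma=2.0)

# Scale-adaptive target: a small joint with an (illustrative) radius of 3 pixels.
adaptive = gaussian_heatmap(32, 32, center=(16, 16), sigma=sigma_from_radius(3.0))
```

Under these assumptions, the adaptive kernel for a small joint is narrower than the fixed one, so pixels three or more columns away from the keypoint contribute almost nothing to the regression target.
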
First, an adaptive heatmap generation module is introduced. The scale of a keypoint is defined by its semantic coverage area in the image. In real scenes, however, annotating the exact pixel area occupied by each keypoint is almost impossible, and the relationship between a Gaussian kernel and that coverage area is difficult to determine directly. The area occupied by a keypoint is nevertheless found to be related to its geometric distance from adjacent keypoints. The module therefore combines the geometric relationship between adjacent keypoints with the inherent scale information from image features to constrain the coverage area of each kernel and to generate kernel scale maps for the keypoints. Second, a local probabilistic consistency loss is presented to measure the distance between the predicted and ground-truth heatmaps both globally and locally. Most heatmap regression methods use L2 loss for supervision. However, L2 loss treats each pixel as independent and overlooks local structural correlation, so it describes the probability distribution of a heatmap poorly. A keypoint heatmap is a probability distribution over the pixels that may belong to a given joint; thus, KL divergence is added to enforce local probability consistency. Moreover, samples with large prediction errors are hard to predict, so their weight should be increased, while the weight of easily detected samples should be reduced. Inspired by focal loss, which lets the model focus on hard-to-detect samples, a dynamic weight is added to balance the contribution of different samples.
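
The three ingredients described above (a global L2 term, a local KL divergence term, and a focal-style dynamic weight) can be combined as in the following sketch. The exact formulation is not stated here, so the window size `half`, the balance factor `lam`, the exponent `gamma`, and the use of the maximum absolute error as the difficulty measure are all assumptions made for illustration. The function names are hypothetical.

```python
import math


def gaussian(width, height, center, sigma):
    """Helper: 2D Gaussian heatmap as a list of rows (heatmap[y][x])."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)] for y in range(height)]


def l2_term(pred, target):
    """Global term: mean squared error over all pixels."""
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def local_kl_term(pred, target, center, half=2, eps=1e-8):
    """Local term: KL(target || pred) inside a window around the keypoint.

    Both patches are clamped to be non-negative and normalized to sum to 1,
    so they can be compared as probability distributions.
    """
    cx, cy = center
    rows = range(max(cy - half, 0), min(cy + half + 1, len(target)))
    cols = range(max(cx - half, 0), min(cx + half + 1, len(target[0])))
    p = [max(pred[y][x], 0.0) + eps for y in rows for x in cols]
    q = [max(target[y][x], 0.0) + eps for y in rows for x in cols]
    sp, sq = sum(p), sum(q)
    return sum((qi / sq) * math.log((qi / sq) / (pi / sp)) for pi, qi in zip(p, q))


def dynamic_weight(pred, target, gamma=2.0):
    """Focal-style weight (assumed form): larger errors give larger weights."""
    err = max(abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return min(err, 1.0) ** gamma


def consistency_loss(pred, target, center, lam=1.0, gamma=2.0):
    """Weighted sum of the global L2 term and the local KL term."""
    base = l2_term(pred, target) + lam * local_kl_term(pred, target, center)
    return dynamic_weight(pred, target, gamma) * base


target = gaussian(17, 17, center=(8, 8), sigma=1.5)
pred_close = gaussian(17, 17, center=(8, 8), sigma=1.6)   # easy sample
pred_far = gaussian(17, 17, center=(11, 8), sigma=1.5)    # hard sample
loss_close = consistency_loss(pred_close, target, center=(8, 8))
loss_far = consistency_loss(pred_far, target, center=(8, 8))
```

In this sketch the prediction shifted by three pixels receives both a larger base loss and a larger weight than the nearly correct one, which is the qualitative behavior the dynamic weight is meant to produce.
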
HrHRNet is used as the baseline to build AHGNet for bottom-up human pose estimation. The model is evaluated on two public datasets: MS COCO and CrowdPose. Experimental results show that AHGNet surpasses HrHRNet in average precision (AP), achieving 72.1% AP on COCO test-dev and 74.1% AP on CrowdPose, improvements of 1.6% AP and 6.5% AP, respectively. The larger gain on CrowdPose, which contains crowded scenes, indicates that AHGNet helps alleviate the problem of human scale variation in complex crowded scenes. The ablation experiments further verify the effectiveness of the proposed method.
AHGNet leverages geometric features between adjacent keypoints and inherent scale information within the image to generate adaptive heatmaps as ground truth. The network further employs a local probabilistic consistency loss to address the challenges posed by varying human scales, improving the accuracy of bottom-up human pose estimation. AHGNet provides a new paradigm for optimizing supervision signals in bottom-up pose estimation: by dynamically adjusting the Gaussian kernel scale and enforcing local probability constraints, it reduces multiscale ambiguity in complex scenarios.

| Family | Number of genera | Number of species | Percentage of total species (%) | Genus | Number of species | Percentage of total species (%) |
|---|---|---|---|---|---|---|
| Amanitaceae | 2 | 11 | 5.26 | Amanita | 10 | 4.78 |
| Mycenaceae | 2 | 12 | 5.74 | Inocybe | 5 | 2.39 |
| Polyporaceae | 8 | 14 | 6.70 | Laccaria | 5 | 2.39 |
| Russulaceae | 3 | 23 | 11.00 | Marasmius | 6 | 2.87 |
| | | | | Mycena | 11 | 5.26 |
| | | | | Pluteus | 5 | 2.39 |
| | | | | Russula | 17 | 8.13 |
| | | | | Trametes | 5 | 2.39 |