Adaptive ground-truth heatmap generation for bottom-up human pose estimation

Ling Jiang¹, Zhuocheng Liu², Yuan Xiong², Wei Wu², Kaige Li²^,^*

Journal of Image and Graphics | 2025, 30(12): 3870-3883

• Image Understanding and Computer Vision •

Affiliations

¹School of Computer Science and Engineering, Anhui University of Science & Technology, Huainan 232001, China

²State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China

Published: 2025-12-16 doi: 10.11834/jig.240615

Abstract

Objective

Human pose estimation aims to locate the skeletal keypoints of individuals in a given image. As a fundamental task in computer vision, it has wide applications in human activity recognition, person re-identification, pose tracking, and related fields. There are two main approaches to human pose estimation: top-down and bottom-up. Top-down methods first detect human bodies in the image, crop out each person, and then estimate the keypoint coordinates. While effective, these methods perform poorly under occlusion, and their computational cost increases with the number of people in the image. In contrast, bottom-up methods detect all identity-independent keypoints simultaneously and then group them into individual poses. These methods are typically lightweight and fast but must handle large variations in human scale. Bottom-up methods commonly use 2D Gaussian kernels to generate keypoint heatmaps as regression targets because such heatmaps provide rich spatial information. However, conventional approaches apply Gaussian kernels with a fixed variance to all keypoints, resulting in uniform heatmap structures. This uniformity is problematic given the scale variability inherent in bottom-up methods. On the one hand, different keypoints cover different pixel areas in an image, and a large Gaussian kernel may introduce semantic ambiguity, particularly for small joints. On the other hand, differences in keypoint scale imply different levels of annotation uncertainty, which the heatmap variance should ideally reflect. Because the variance of the Gaussian kernel represents uncertainty, it should be proportional to the scale and ambiguity associated with each keypoint. To address these issues, an adaptive heatmap generation network (AHGNet) for bottom-up human pose estimation is proposed. AHGNet estimates an appropriate Gaussian kernel radius for each keypoint by integrating inherent scale information and the geometric relationships between keypoints.
Through a formula derivation, the relationship between the radius and the Gaussian kernel variance is established, enabling the creation of customized, scale-adaptive ground-truth heatmaps. This approach improves localization accuracy by aligning the heatmap structure with the spatial characteristics of each keypoint.
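The abstract does not give AHGNet's actual radius-to-variance formula or its radius estimator, so the following is only a minimal NumPy sketch under stated assumptions. It assumes, hypothetically, that the radius r is the distance at which the unnormalized Gaussian exp(-d²/(2σ²)) decays to a threshold τ, which gives σ = r / sqrt(-2 ln τ), and it approximates the geometric cue by taking a fraction `alpha` of each joint's distance to its nearest skeleton neighbour. The names `sigma_from_radius`, `geometric_radii`, `render_heatmaps`, `tau`, `alpha`, and the optional `scale` factor (a stand-in for the scale information learned from image features) are illustrative, not the paper's.

```python
import numpy as np

def sigma_from_radius(radius, tau=0.1):
    """Hypothetical mapping: pick sigma so that exp(-radius**2 / (2 * sigma**2)) == tau,
    i.e. the unnormalized Gaussian has decayed to tau at the kernel radius."""
    return np.asarray(radius, dtype=float) / np.sqrt(-2.0 * np.log(tau))

def geometric_radii(keypoints, edges, alpha=0.5, scale=None, r_min=1.0, r_max=10.0):
    """Hypothetical radius estimate: alpha * distance to the nearest skeleton neighbour,
    optionally multiplied by a positive per-keypoint `scale` (stand-in for learned
    scale cues), then clipped to [r_min, r_max]."""
    keypoints = np.asarray(keypoints, dtype=float)
    nearest = np.full(len(keypoints), np.inf)
    for a, b in edges:
        d = np.linalg.norm(keypoints[a] - keypoints[b])
        nearest[a] = min(nearest[a], d)
        nearest[b] = min(nearest[b], d)
    # Joints with no neighbour in `edges` keep inf and are clipped to r_max below.
    radii = alpha * nearest
    if scale is not None:
        radii = radii * np.asarray(scale, dtype=float)
    return np.clip(radii, r_min, r_max)

def render_heatmaps(keypoints, sigmas, height, width):
    """One unnormalized Gaussian heatmap per keypoint, each with its own sigma."""
    ys, xs = np.mgrid[0:height, 0:width]  # row (y) and column (x) coordinates
    maps = np.zeros((len(keypoints), height, width))
    for j, ((x, y), s) in enumerate(zip(keypoints, sigmas)):
        maps[j] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * s ** 2))
    return maps

# Toy arm: shoulder, elbow, wrist as (x, y) on a 64 x 64 heatmap grid.
kps = np.array([[20.0, 20.0], [32.0, 20.0], [38.0, 20.0]])
edges = [(0, 1), (1, 2)]
radii = geometric_radii(kps, edges)   # [6.0, 3.0, 3.0]
sigmas = sigma_from_radius(radii)
hm = render_heatmaps(kps, sigmas, 64, 64)
```

In this toy arm the elbow and wrist are only 6 px apart, so both receive a narrower kernel than the shoulder, which is the kind of per-keypoint adaptation a fixed-variance heatmap cannot express.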

Method

First, an adaptive heatmap generation module is introduced to generate kernel scale maps for keypoints. The scale of a keypoint is defined by the semantic area it covers in the image. However, in real scenes it is almost impossible to determine accurately the pixel area occupied by each keypoint, and the relationship between the Gaussian kernel and this coverage area is difficult to establish. Notably, the area occupied by a keypoint is found to be related to its geometric distance from adjacent keypoints. The module therefore combines the geometric relationships between adjacent keypoints with the inherent scale information in image features to constrain the coverage area of each kernel.

Second, a local probabilistic consistency loss is proposed to measure the distance between the predicted and ground-truth heatmaps both globally and locally. Most heatmap-regression methods use the L₂ loss for supervised learning. However, the L₂ loss treats each pixel as independent and overlooks local structural correlation, which makes it poorly suited to describing the probability distribution that a heatmap represents. Because a keypoint heatmap is a probability distribution over the pixels belonging to a given joint, a Kullback-Leibler (KL) divergence term is added to enforce local probability consistency. Moreover, samples with large prediction errors are hard to predict, so their weight should be increased, whereas the weight of easily detected samples should be reduced. Inspired by focal loss, which lets the model focus on hard-to-detect samples, this paper applies dynamic weights that reduce the contribution of easily detected samples and enhance the contribution of hard-to-detect samples.
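The loss described above can be sketched as follows. This is a hypothetical NumPy illustration, not the paper's formulation: the window half-width `half`, the KL weight `lam`, the exponent `gamma`, and the choice to derive each keypoint's dynamic weight from its L₂ error (normalized to a mean of 1) are all assumptions. The L₂ term is computed over the whole heatmap (global), while the KL term compares ground truth and prediction only inside a small window around the ground-truth peak (local), after normalizing each patch into a probability distribution.

```python
import numpy as np

def local_kl(gt, pred, center, half=3, eps=1e-8):
    """KL(gt || pred) over a (2*half+1)^2 window centred on `center` (row, col),
    after turning both patches into distributions that sum to 1."""
    r, c = center
    h, w = gt.shape
    r0, r1 = max(0, r - half), min(h, r + half + 1)
    c0, c1 = max(0, c - half), min(w, c + half + 1)
    p = gt[r0:r1, c0:c1] + eps
    q = np.clip(pred[r0:r1, c0:c1], 0.0, None) + eps  # predictions may be negative
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def lpc_loss(gt_maps, pred_maps, lam=1.0, gamma=2.0, half=3):
    """Hypothetical local probability consistency loss over K heatmaps of shape (K, H, W):
    per-keypoint L2 (global term) + lam * local KL, with focal-style dynamic weights."""
    k = gt_maps.shape[0]
    l2 = ((gt_maps - pred_maps) ** 2).reshape(k, -1).mean(axis=1)
    kl = np.empty(k)
    for j in range(k):
        peak = np.unravel_index(np.argmax(gt_maps[j]), gt_maps[j].shape)
        kl[j] = local_kl(gt_maps[j], pred_maps[j], peak, half)
    # Dynamic weights grow with the L2 error (hard keypoints) and shrink for easy ones,
    # normalized to mean 1. In a real training loop they would be detached from the graph.
    raw = (l2 + 1e-12) ** gamma
    weights = raw / raw.mean()
    return float(np.mean(weights * (l2 + lam * kl))), weights

# Toy example: joint 0 is predicted exactly, joint 1 is shifted 3 px to the right.
ys, xs = np.mgrid[0:32, 0:32]

def gauss(cx, cy, s=2.0):
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * s ** 2))

gt = np.stack([gauss(10, 10), gauss(20, 20)])
pred = np.stack([gauss(10, 10), gauss(23, 20)])
loss, w = lpc_loss(gt, pred)
```

With one exact and one shifted prediction, the exact joint contributes zero L₂ and zero KL and receives a near-zero weight, so the loss is dominated by the hard joint, mirroring the focal-loss intuition.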

Result

HrHRNet is used as the baseline on which AHGNet is built for bottom-up human pose estimation. The model is evaluated on two public datasets: MS COCO and CrowdPose. Experimental results show that AHGNet outperforms HrHRNet in average precision (AP), achieving 72.1% AP on COCO test-dev and 74.1% AP on CrowdPose, improvements of +1.6% AP and +6.5% AP, respectively. The substantial improvement on CrowdPose, which consists of crowded scenes, indicates that AHGNet helps alleviate the problem of human scale variation in complex crowded scenes. Ablation experiments further verify the effectiveness of the proposed method.

Conclusion

AHGNet leverages the geometric features between adjacent keypoints and the inherent scale information within the image to generate adaptive heatmaps as ground truth. It further employs a local probability consistency loss to address the challenges posed by varying human scales, effectively improving the accuracy of bottom-up human pose estimation. AHGNet provides a new paradigm for optimizing supervision signals in bottom-up pose estimation: by dynamically adjusting the Gaussian kernel scale and enforcing local probability constraints, it reduces multiscale ambiguity in complex scenarios.

Key words

human pose estimation / adaptive scale / bottom-up / heatmap regression / dynamic weight

Cite this Article

Ling Jiang, Zhuocheng Liu, Yuan Xiong, Wei Wu, Kaige Li. Adaptive ground-truth heatmap generation for bottom-up human pose estimation[J]. Journal of Image and Graphics, 2025, 30(12): 3870-3883. DOI: 10.11834/jig.240615

Article Info

Received: 2024-10-21
Revised: 2025-05-22
Online: 2026-04-09


