Learning to Acquire the Quality of Human Pose Estimation

Abstract:

For human poses to serve high-level computer vision tasks such as action recognition, recognizing the quality of estimated poses is critically important. Conventionally, most human pose estimation frameworks use the mean keypoint confidence as the pose quality. However, different types of keypoints differ in visibility and size, so weighting them equally produces biased quality scores. In this paper, we propose end-to-end human pose quality learning, which adds a quality prediction block alongside pose regression. The proposed block shares the pose features with heatmap regression and learns the object keypoint similarity (OKS) between the estimated pose and its corresponding ground truth. The predicted OKS correlates well with pose quality, which makes selecting reliable poses straightforward. Moreover, using the learned quality as the pose score improves performance in COCO AP evaluation, because it ranks more accurate poses higher among all pose detections. We conduct extensive experiments on the three most popular human pose estimation frameworks: Hourglass, SimpleBaseline, and HRNet. Adding the proposed quality learning block consistently brings nearly 1 percent AP improvement on all three frameworks.
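
The abstract does not spell out how OKS is computed. As a rough illustration of the quantity the quality block is trained to predict, the sketch below follows the COCO keypoint evaluation convention, in which each labeled keypoint contributes a Gaussian-like similarity whose tolerance depends on the object area and a per-keypoint constant. The function name `object_keypoint_similarity`, its argument layout, and the constants in the usage note are assumptions made for illustration and are not taken from the paper.

```python
import math


def object_keypoint_similarity(pred, gt, visibility, area, k):
    """COCO-style OKS between a predicted pose and its ground truth.

    pred, gt   : lists of (x, y) keypoint coordinates, same length
    visibility : list of ints; > 0 means the ground-truth keypoint is labeled
    area       : object area in squared pixels (plays the role of s**2)
    k          : per-keypoint falloff constants (larger = more tolerant)
    """
    total, labeled = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visibility, k):
        if v <= 0:
            continue  # unlabeled keypoints are excluded from the average
        d2 = (px - gx) ** 2 + (py - gy) ** 2  # squared pixel distance
        total += math.exp(-d2 / (2.0 * area * ki ** 2))
        labeled += 1
    # Average over labeled keypoints; define OKS as 0 when none are labeled.
    return total / labeled if labeled else 0.0
```

For example, with two labeled keypoints, `area=100.0`, and hypothetical constants `k=[0.1, 0.1]`, a perfect prediction yields an OKS of 1.0, and the score decays toward 0 as keypoints drift from the ground truth. Because the per-keypoint constants differ across keypoint types, OKS weights localization errors unequally, which is the property a plain mean of keypoint confidences lacks.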