Abstract:
In this paper, we present a method (Hybrid-Pose) to improve human pose estimation in images. We adopt Stacked Hourglass Networks to design two convolutional neural network models: RNet (Refinement Network) for pose refinement and CNet (Correction Network) for pose correction. CNet guides RNet to correct the joint locations before the final pose is generated. Each model is composed of four hourglasses, and each hourglass generates a group of detection heatmaps for the joints. All RNet hourglasses share the same structure, whereas CNet uses hourglasses of different structures to provide pose guidance. Since pose estimation in RGB images is very sensitive to the image scene, our approach generates multiple outputs of detection heatmaps to broaden the search scope for the correct joint locations. We use RNet to refine the joint locations horizontally, stage by stage across its hourglasses; the heatmaps of each stage are then fused vertically, in a hybrid manner, with the heatmaps of all CNet hourglasses. Our method achieves results competitive with existing state-of-the-art approaches on the MPII and FLIC benchmark datasets. Although our method focuses on improving single-person pose estimation, we also show the influence of this improvement on multi-person pose estimation by detecting multiple people with the SSD detector and then estimating the pose of each person individually.
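To make the horizontal/vertical fusion idea more concrete, the following Python/NumPy sketch shows one possible reading of it; it is an illustration under stated assumptions, not the Hybrid-Pose implementation. The abstract does not specify the fusion operator, so the sketch assumes an element-wise mean: each RNet stage's heatmaps are averaged with the heatmaps of all CNet stages, the fused candidates are averaged into a final heatmap set, and each joint is read off as the arg-max peak of its heatmap. The function names (`fuse_heatmaps`, `decode_joints`, `hybrid_pose`) and the array layout `(num_joints, H, W)` are hypothetical.

```python
from typing import Sequence

import numpy as np


def fuse_heatmaps(rnet_stage: np.ndarray, cnet_stages: Sequence[np.ndarray]) -> np.ndarray:
    """Fuse one RNet stage with all CNet stages (each shaped (num_joints, H, W)).

    ASSUMPTION: fusion is an element-wise mean; the paper's actual
    fusion operator is not stated in the abstract.
    """
    stacked = np.stack([rnet_stage, *cnet_stages], axis=0)
    return stacked.mean(axis=0)


def decode_joints(heatmaps: np.ndarray) -> np.ndarray:
    """Return a (num_joints, 2) array of (row, col) heatmap peak locations."""
    num_joints, h, w = heatmaps.shape
    # Index of the maximum response in each joint's flattened heatmap.
    flat_idx = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (h, w))
    return np.stack([rows, cols], axis=1)


def hybrid_pose(rnet_stages: Sequence[np.ndarray], cnet_stages: Sequence[np.ndarray]) -> np.ndarray:
    """Hypothetical end-to-end fusion: one fused candidate per RNet stage,
    then (ASSUMPTION) a plain average of the candidates before decoding."""
    fused = [fuse_heatmaps(r, cnet_stages) for r in rnet_stages]
    final = np.mean(fused, axis=0)
    return decode_joints(final)


if __name__ == "__main__":
    # Synthetic example: 4 RNet and 4 CNet stages, 16 joints, 64x64 heatmaps
    # (sizes chosen for illustration only).
    rng = np.random.default_rng(0)
    rnet = [rng.random((16, 64, 64)) for _ in range(4)]
    cnet = [rng.random((16, 64, 64)) for _ in range(4)]
    print(hybrid_pose(rnet, cnet).shape)
```

Averaging is used here only because it is the simplest operator that combines several heatmap hypotheses while keeping their spatial layout; a learned weighting or concatenation followed by convolution would fit the same description equally well.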