Abstract:
The multi-person pose tracking task aims to estimate and track person keypoints in videos. Most previous methods follow the general tracking-by-detection strategy, which ignores consistent pose information throughout the framework. As a result, they often suffer from missing detections or inaccurate human association in challenging scenes with motion blur or person occlusion. To handle these problems, we propose a pose-guided tracking-by-detection framework that fuses pose information into both the video human detection and the human association procedures. In the video human detection stage, we adopt pose-guided person location prediction, which exploits temporal information to compensate for missing detections. Specifically, pose heatmaps are used to cope with person-specific intra-class distractors. In the human association stage, we propose an appearance discriminative model based on hierarchical pose-guided graph convolutional networks (PoseGCN). The PoseGCN-based model exploits human structural relations to boost person representation. Extensive experiments show the superiority of our method on the challenging pose tracking benchmark. As of the submission date of this paper (22-Aug-2019), our proposed method ranks first on the PoseTrack leaderboard (http://posetrack.net/leaderboard.php). Our code has been made publicly available.