Abstract:
In this article, we present POISON, an end-to-end deep neural network (DNN) method that estimates human pose in real time under varying lighting conditions. A lightweight version of the open-source library OpenPose extracts human keypoints from RGB and infrared (IR) images. We propose a method to fuse these two sources of information: a fusion strategy followed by a refinement-stage DNN that aims to identify complex relationships between the extracted keypoints. The combined information is then used to infer the human pose. Experimental results show that POISON improves the overall performance of conventional single-camera methods by a factor of 1.79. On a custom data set of paired RGB/IR images captured in challenging low-light conditions, POISON correctly detects up to 86% of human keypoints, whereas conventional single-camera methods detect up to 48%. We also compare POISON against current state-of-the-art DNN-based image fusion methods. Extensive quantitative evaluation shows that POISON outperforms existing approaches at estimating human pose in challenging lighting environments. POISON runs at 17 frames/s on an Intel Core i7 CPU with an Nvidia GeForce GTX 1080 GPU dedicated to DNN operations, making it suitable for real-time use.
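
To make the idea of keypoint-level fusion concrete, the following is a minimal sketch only. The abstract does not specify the fusion strategy or the refinement network, so the confidence-weighted rule, the `min_conf` threshold, and the names `fuse_keypoints` and `detection_rate` are all assumptions introduced for illustration and are not POISON's actual method. Each keypoint is assumed to be an `(x, y, confidence)` triple, or `None` when the detector missed the joint.

```python
from typing import List, Optional, Tuple

# (x, y, confidence), or None when the joint was not detected.
Keypoint = Optional[Tuple[float, float, float]]


def fuse_keypoints(rgb: List[Keypoint], ir: List[Keypoint],
                   min_conf: float = 0.1) -> List[Keypoint]:
    """Hypothetical per-joint fusion rule (NOT the paper's method).

    Keeps whichever modality detected the joint; when both did,
    returns the confidence-weighted average of the two positions.
    """
    if len(rgb) != len(ir):
        raise ValueError("both inputs must list the same joints in the same order")
    fused: List[Keypoint] = []
    for a, b in zip(rgb, ir):
        # Treat detections below the assumed threshold as misses.
        a = a if a is not None and a[2] >= min_conf else None
        b = b if b is not None and b[2] >= min_conf else None
        if a is None or b is None:
            # At most one modality saw the joint (result may still be None).
            fused.append(a if a is not None else b)
            continue
        w = a[2] + b[2]
        fused.append(((a[0] * a[2] + b[0] * b[2]) / w,
                      (a[1] * a[2] + b[1] * b[2]) / w,
                      max(a[2], b[2])))
    return fused


def detection_rate(keypoints: List[Keypoint]) -> float:
    """Fraction of joints that have a detection."""
    return sum(k is not None for k in keypoints) / len(keypoints)


if __name__ == "__main__":
    # Made-up example values: RGB misses joint 1, IR misses joint 2.
    rgb = [(10.0, 20.0, 0.9), None, (5.0, 5.0, 0.6)]
    ir = [(12.0, 22.0, 0.3), (30.0, 40.0, 0.8), None]
    fused = fuse_keypoints(rgb, ir)
    print(fused, detection_rate(fused))
```

In this toy rule, a joint visible to only one camera is still recovered, which is the intuition behind combining RGB and IR in low light. A learned refinement stage, as described above, would replace or follow such a hand-written rule.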