Abstract:
Although recent emotion recognition methods based on facial expression cues achieve excellent performance in controlled scenarios, recognizing emotion in the wild remains challenging because of occlusion, large head poses, illumination variations, and similar factors. Recent advances in deep learning show that an ensemble of deep learning models can considerably outperform a single model on challenging recognition problems. This paper presents a novel ensemble deep learning method, the “deep convolutional neural network (DCNN) ensemble classifier”, for improved facial expression recognition (FER) in the wild. The proposed DCNN ensemble classifier is novel in two respects: (1) finding the ensemble weights for combining DCNN decision outputs is formulated as a stochastic optimization problem, solved via simulated annealing, in which the energy to be minimized represents the generalized (test) classification error of the DCNN ensemble; and (2) to create the DCNN ensemble members, we propose combining different types of face representations with bagging (T. G. Dietterich, 2000), which helps increase the diversity of the DCNN ensemble. Extensive comparative experiments on three in-the-wild FER datasets, namely FER2013, SFEW2.0, and RAF-DB, show that the proposed DCNN ensemble classifier achieves FER performance competitive with other recently developed methods: 76.69%, 58.68%, and 87.13% accuracy under the FER2013, SFEW2.0, and RAF-DB evaluation protocols, respectively.
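To make the weight-search idea in aspect (1) concrete, the following is a minimal sketch of simulated annealing over ensemble weights, not the implementation used in this paper. Everything specific in it is an assumption made for illustration: the function names `ensemble_error` and `anneal_weights`, the single-weight perturbation, the geometric cooling schedule, the default hyperparameters, and the use of a held-out labelled set as a proxy for the generalized classification error. The member outputs are assumed to be per-class scores, stored as `member_probs[m][i][c]` for member `m`, sample `i`, and class `c`.

```python
import math
import random


def ensemble_error(weights, member_probs, labels):
    """Fraction of samples misclassified by the weighted sum of member scores."""
    n_classes = len(member_probs[0][0])
    errors = 0
    for i, label in enumerate(labels):
        # Weighted sum of every member's score for each class.
        scores = [
            sum(w * probs[i][c] for w, probs in zip(weights, member_probs))
            for c in range(n_classes)
        ]
        predicted = max(range(n_classes), key=scores.__getitem__)
        if predicted != label:
            errors += 1
    return errors / len(labels)


def anneal_weights(member_probs, labels, steps=2000, t0=1.0,
                   cooling=0.995, step_size=0.1, seed=0):
    """Search for ensemble weights with simulated annealing.

    The energy is the classification error on the supplied (held-out) data.
    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_members = len(member_probs)
    current = [1.0 / n_members] * n_members  # start from uniform weights
    current_err = ensemble_error(current, member_probs, labels)
    best, best_err = current[:], current_err
    temperature = t0

    for _ in range(steps):
        # Propose a neighbour: perturb one weight, clip at zero, renormalise.
        candidate = current[:]
        k = rng.randrange(n_members)
        candidate[k] = max(0.0, candidate[k] + rng.uniform(-step_size, step_size))
        total = sum(candidate)
        if total > 0.0:
            candidate = [w / total for w in candidate]
            candidate_err = ensemble_error(candidate, member_probs, labels)
            delta = candidate_err - current_err
            # Metropolis rule: always accept improvements, and accept a
            # worse candidate with probability exp(-delta / temperature).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_err = candidate, candidate_err
                if current_err < best_err:
                    best, best_err = current[:], current_err
        temperature *= cooling  # geometric cooling schedule

    return best, best_err
```

Because the search starts from uniform weights and keeps the best state seen, the returned error on the supplied data is never worse than that of simple averaging. Whether that gain carries over to unseen data depends on how representative the held-out set is.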