Abstract:
Facial emotions are expressed through combinations of facial muscle movements, known as Facial Action Units (FAUs). FAU intensity estimation aims to estimate the intensity of a set of structurally dependent FAUs. In contrast to existing work, which focuses on improving FAU intensity estimation performance, this study investigates how knowledge distillation (KD) incorporated into model training can improve the efficiency of FAU intensity estimation while maintaining a comparable level of performance. Given the intrinsic structural characteristics of FAUs, it is desirable to distill deep structural relationships; our approach, DSR-FAU, does so using heatmap regression. Our methodology is as follows. First, a feature map-level distillation loss is applied to ensure that the student network and the teacher network share similar feature distributions. Second, region-wise and channel-wise relationship distillation loss functions are introduced to penalize differences in structural relationships between the two networks. Specifically, the region-wise relationship is represented by the structural correlations across facial features, whereas the channel-wise relationship is represented by the implicit FAU co-occurrence dependencies. Third, we compare the performance of DSR-FAU with state-of-the-art models on two benchmark datasets. The results show that our model achieves comparable performance with fewer model parameters and lower computational complexity.
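The three distillation terms described above can be illustrated with a minimal, framework-free sketch. It is not the DSR-FAU implementation: it assumes that student and teacher feature maps have already been flattened to the same [channels][positions] shape, that the feature-level term is a mean squared error, and that each relationship is summarized by a pairwise cosine-similarity matrix (between channels for the channel-wise term, between spatial positions for the region-wise term). The function names and the weights `w_feat`, `w_region`, and `w_channel` are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared error between two equally shaped 2-D lists."""
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def cosine_relation(rows):
    """Pairwise cosine similarity between the rows of a 2-D list."""
    # A zero-norm row is left as all zeros (its norm is replaced by 1.0).
    norms = [math.sqrt(sum(x * x for x in r)) or 1.0 for r in rows]
    unit = [[x / n for x in r] for r, n in zip(rows, norms)]
    return [[sum(x * y for x, y in zip(u, v)) for v in unit] for u in unit]


def transpose(m):
    """Swap the channel and position axes of a 2-D list."""
    return [list(col) for col in zip(*m)]


def distillation_terms(student, teacher):
    """Return (feature, region, channel) loss terms.

    Assumption: both inputs are feature maps flattened to
    [channels][positions] with identical shapes.
    """
    feature = mse(student, teacher)
    # Channel-wise: compare channel-to-channel similarity matrices.
    channel = mse(cosine_relation(student), cosine_relation(teacher))
    # Region-wise: compare position-to-position similarity matrices.
    region = mse(cosine_relation(transpose(student)),
                 cosine_relation(transpose(teacher)))
    return feature, region, channel


def distillation_loss(student, teacher, w_feat=1.0, w_region=1.0, w_channel=1.0):
    """Weighted sum of the three terms (the weights are hypothetical)."""
    feature, region, channel = distillation_terms(student, teacher)
    return w_feat * feature + w_region * region + w_channel * channel


if __name__ == "__main__":
    teacher = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # 2 channels, 3 positions
    student = [[0.5, 0.1, 1.5], [0.2, 0.9, 1.2]]
    print(distillation_terms(student, teacher))
    print(distillation_loss(student, teacher))
```

Because cosine similarity ignores scale, the two relationship terms in this sketch penalize only differences in structure, while the feature-level term penalizes differences in magnitude. In practice the student and teacher feature maps may differ in shape, in which case an adaptation step would be needed before these terms could be computed.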