In this work, we propose a novel cascaded feature pyramid network with non-backward propagation (CFPN-NBP) for facial expression recognition (FER). It uses the Hilbert-Schmidt independence criterion (HSIC) bottleneck to address the problems inherent in training with the traditional backward propagation (BP) algorithm. The proposed algorithm is developed at two levels. At the first level, the HSIC bottleneck training method is adopted as an alternative to traditional BP optimization: the dependence between the output of each hidden layer and the input, and the dependence between that output and the label, are calculated in order to reduce redundant information, so that as little information as possible is used to predict the result. At the second level, a novel architecture is designed for feature extraction. Convolutional layers with the same resolution are densely connected and combined with an attention mechanism, so that the model can focus on the more important information. Convolutional layers with different resolutions are combined by three cascaded pyramid networks; in this way, shallow and deep features are further fused, and therefore both semantic information and content information are preserved. To further reduce the number of parameters, separable convolution is used instead of traditional convolution. Experiments on the challenging FER2013 dataset show that the proposed CFPN-NBP algorithm improves FER accuracy and outperforms related state-of-the-art methods.
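The layer-wise objective of the HSIC bottleneck can be illustrated with a small pure-Python sketch. It assumes the biased empirical HSIC estimator of Gretton et al., HSIC = tr(K H L H) / (m - 1)², with Gaussian kernels and the centering matrix H = I - (1/m)11ᵀ, and a trade-off weight `beta`. For one hidden layer's output Z, the loss is HSIC(Z, X) - beta · HSIC(Z, Y): small dependence on the input, large dependence on the label. The function names, kernel widths, and `beta` value here are hypothetical illustration choices, not the settings used in CFPN-NBP.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def gaussian_gram(samples: Sequence[Vector], sigma: float) -> Matrix:
    """Gram matrix K[i][j] = exp(-||s_i - s_j||^2 / (2 sigma^2))."""
    n = len(samples)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            gram[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return gram


def double_center(gram: Matrix) -> Matrix:
    """Compute H K H (H = I - 11^T / m) by removing row and column means."""
    n = len(gram)
    row_mean = [sum(r) / n for r in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - col_mean[j] + grand for j in range(n)]
            for i in range(n)]


def hsic(x: Sequence[Vector], y: Sequence[Vector],
         sigma_x: float = 1.0, sigma_y: float = 1.0) -> float:
    """Biased empirical HSIC estimate: tr(K H L H) / (m - 1)^2."""
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("x and y need the same number (>= 2) of samples")
    kc = double_center(gaussian_gram(x, sigma_x))
    l = gaussian_gram(y, sigma_y)
    # tr(K H L H) = tr((H K H) L) = sum_ij (HKH)_ij * L_ij, since L is symmetric.
    total = sum(kc[i][j] * l[i][j] for i in range(m) for j in range(m))
    return total / (m - 1) ** 2


def hsic_bottleneck_loss(z, x, y, beta: float, sigma: float = 1.0) -> float:
    """Hypothetical layer-local loss: HSIC(Z, X) - beta * HSIC(Z, Y)."""
    return hsic(z, x, sigma, sigma) - beta * hsic(z, y, sigma, sigma)


if __name__ == "__main__":
    x = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]  # toy inputs
    y = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # toy one-hot labels
    z = [[0.9], [0.1], [0.2], [0.8]]                      # toy hidden-layer output
    print(hsic_bottleneck_loss(z, x, y, beta=100.0))
```

Because the loss depends only on a layer's own output, the input, and the labels, each hidden layer can in principle be optimized against it without an error signal propagated back from the network output.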
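The abstract does not detail how the three cascaded pyramid networks fuse shallow and deep features. As an assumption, the sketch below shows the top-down pathway of a standard feature pyramid network (FPN): a deeper, coarser map is upsampled by nearest-neighbour interpolation and added element-wise to the next shallower, finer map. Maps are single-channel plain lists, the 1×1 lateral and 3×3 smoothing convolutions of FPN are omitted, and all function names are hypothetical; the actual CFPN-NBP cascade may differ.

```python
from typing import List

FeatureMap = List[List[float]]  # a single-channel H x W map


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Double height and width by repeating each value in a 2x2 block."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def top_down_merge(shallow: FeatureMap, deep: FeatureMap) -> FeatureMap:
    """Upsample the deep map and add it element-wise to the shallow map."""
    up = upsample_nearest_2x(deep)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("deep map must be exactly half the shallow map's size")
    return [[s + d for s, d in zip(rs, rd)] for rs, rd in zip(shallow, up)]


def pyramid_top_down(levels: List[FeatureMap]) -> List[FeatureMap]:
    """Fuse levels ordered from finest (shallowest) to coarsest (deepest)."""
    merged = [levels[-1]]
    for shallow in reversed(levels[:-1]):
        # Each finer level receives the already-fused coarser level.
        merged.insert(0, top_down_merge(shallow, merged[0]))
    return merged
```

Each output level mixes its own fine-grained content with the semantic signal carried down from every coarser level, which is the kind of shallow/deep fusion the abstract describes.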
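The parameter saving from separable convolution can be quantified directly. Assuming "separable" means depthwise separable convolution (a k×k depthwise filter per input channel followed by a 1×1 pointwise convolution), the weight count drops from k²·C_in·C_out to k²·C_in + C_in·C_out, a ratio of 1/C_out + 1/k². The layer size below is hypothetical and not taken from CFPN-NBP.

```python
def standard_conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Weights of a k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def separable_conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # hypothetical layer size
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(std, sep, round(sep / std, 4))  # prints: 73728 8768 0.1189
```

For a 3×3 kernel the saving is dominated by the 1/k² term, so wider layers gain close to a ninefold reduction.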