Regularization Effect of Random Node Fault/Noise on Gradient Descent Learning Algorithm

Abstract:

For decades, adding fault/noise during training by gradient descent has been a technique for making a neural network (NN) tolerant of persistent fault/noise or for improving its generalization. In recent years, this technique has been re-advocated in deep learning to avoid overfitting. Yet, the objective function of such fault/noise-injection learning has been misinterpreted as the desired measure (i.e., the expected mean squared error (MSE) over the training samples) of the NN with the same fault/noise. The aims of this article are: 1) to clarify this misconception and 2) to investigate the actual regularization effect of adding node fault/noise when training by gradient descent. Based on previous works on adding fault/noise during training, we speculate on why the misconception arises. In the sequel, it is shown that the learning objective of adding random node fault during gradient descent learning (GDL) for a multilayer perceptron (MLP) is identical to the desired measure of the MLP with the same fault. If additive (resp. multiplicative) node noise is added during GDL for an MLP, the learning objective is not identical to the desired measure of the MLP with such noise. For radial basis function (RBF) networks, it is shown that the learning objective is identical to the corresponding desired measure under all three fault/noise conditions. Empirical evidence is presented to support these theoretical results and, hence, to clarify the misconception: the objective function of fault/noise-injection learning might not be interpretable as the desired measure of the NN with the same fault/noise. Afterward, the regularization effect of adding node fault/noise during training is revealed for the case of RBF networks. Notably, it is shown that adding additive node noise or multiplicative node noise (MNN) while training an RBF network has the regularization effect of reducing network complexity. When dropout regularization is applied to an RBF network, its effect is the same as adding MNN during training.
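To make the training scheme discussed above concrete, the following is a minimal sketch (not the paper's implementation) of gradient descent learning for an RBF network with multiplicative node noise (MNN) injected into the hidden-node outputs at every step. The toy data, the fixed center layout, the noise level, and the learning rate are all illustrative assumptions; the trained network is then evaluated noise-free.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumed for illustration): y = sin(2*pi*x) + noise.
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)

# RBF layer: fixed Gaussian centers (hypothetical layout), trainable weights w.
centers = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
width = 0.2

def phi(X):
    """Hidden-node activations: Gaussian basis functions."""
    d2 = (X - centers.T) ** 2              # squared distances, shape (N, 20)
    return np.exp(-d2 / (2 * width ** 2))

w = np.zeros(centers.shape[0])
lr = 0.05
noise_sd = 0.3                             # assumed MNN standard deviation

for step in range(2000):
    H = phi(X)
    # Multiplicative node noise: each hidden node's output is scaled by an
    # independent factor (1 + eps), eps ~ N(0, noise_sd^2), every step.
    H_noisy = H * (1.0 + noise_sd * rng.standard_normal(H.shape))
    err = H_noisy @ w - y
    grad = H_noisy.T @ err / len(y)        # gradient of 0.5 * mean squared error
    w -= lr * grad

# Desired measure is evaluated on the noise-free network after training.
mse = np.mean((phi(X) @ w - y) ** 2)
print(f"noise-free training mse: {mse:.4f}")
```

Scaling the hidden outputs by a random factor here plays the same role as dropout's random masking, which is consistent with the abstract's claim that dropout in RBF networks acts like MNN injection.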