Abstract:
Sentiment classification is a fundamental task in many natural language processing applications. Neural networks have achieved great success on this task in recent years, as recurrent neural networks and long short-term memory (LSTM) networks can handle sequences of varying length and capture contextual semantic information. However, the effectiveness of these methods is limited when extracting contextual information from relatively long texts. In our model, we therefore apply bidirectional gated recurrent units to capture as much contextual information as possible when learning word representations, which may effectively reduce noise compared with other methods. We also propose a novel loss function, drop loss (DL), which makes the model focus on hard examples (examples that are easily misclassified) in order to improve the model's accuracy. We experiment on four commonly used datasets, and the results show that the proposed method performs well on all four while requiring fewer parameters than recent benchmarks such as CoVe, ULMFiT, embeddings from language models (ELMo), and bidirectional encoder representations from transformers (BERT). Furthermore, we demonstrate that the classification performance of existing shallow network models can be significantly improved by using DL. In particular, the accuracy of the CNN+LSTM model improves by 9% on the IMDB-10 dataset.