Abstract:
Deep neural networks have been applied to a wide range of tasks such as image recognition, data analysis, and natural language processing. Many companies regularly release mobile applications that embed such networks to app stores. App stores typically impose a size limit: applications above it cannot be downloaded over a cellular connection until the smartphone is on Wi-Fi, and an application containing deep neural networks can easily exceed this limit. An efficient compression approach is therefore highly desirable, and an efficient encoding of the compressed network is critical for reducing its storage cost. Most previous studies focus on sparse matrix encoding rather than dense matrix encoding; dense encodings have received little attention, yet sparse encodings cannot attain a high compression rate when applied to dense matrices. To this end, we propose an alternative compression algorithm that linearizes the weights of each layer and stores the linearized weights together with the critical information needed for reconstruction. For a dense matrix, our algorithm achieves a higher compression rate than an existing sparse matrix encoding. Experimental results indicate that our piecewise linearization scheme yields compression rates of at least two on VGG-16, Resnet152, and Densenet169. In other words, the network parameters can be stored in 40–50% of the original space.
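To make the idea of piecewise linearization concrete, the following is a minimal sketch of one way per-layer weights could be approximated by line segments and stored as a small set of parameters. This is an illustration under assumptions, not the paper's algorithm: the fixed segment length, the least-squares fits, and the names `piecewise_linearize` and `reconstruct` are introduced here for exposition, and a faithful encoder would also keep the "critical information" mentioned in the abstract to control reconstruction error.

```python
import numpy as np

def piecewise_linearize(weights, segment_len=64):
    """Approximate a flattened weight vector with per-segment line fits.

    Hypothetical illustration: each fixed-length segment of the flattened
    weights is replaced by the slope and intercept of a least-squares line,
    so only two floats per segment (plus its length) need to be stored.
    """
    w = weights.ravel().astype(np.float64)
    segments = []
    for start in range(0, len(w), segment_len):
        chunk = w[start:start + segment_len]
        x = np.arange(len(chunk))
        # Degree-1 least-squares fit: returns (slope, intercept).
        slope, intercept = np.polyfit(x, chunk, 1)
        segments.append((slope, intercept, len(chunk)))
    return segments

def reconstruct(segments):
    """Rebuild an approximate weight vector from the stored line parameters."""
    parts = [slope * np.arange(n) + intercept for slope, intercept, n in segments]
    return np.concatenate(parts)

# Example: encode a random "layer" of 4096 weights.
layer = np.random.randn(4096).astype(np.float32)
enc = piecewise_linearize(layer, segment_len=64)
approx = reconstruct(enc)
print("stored floats:", 2 * len(enc), "original floats:", layer.size)
print("mean abs error:", np.abs(layer - approx).mean())
```

In practice, the segment length would trade storage against approximation error, and additional side information (the "critical information" above) would be kept so that accuracy-sensitive weights can be recovered more precisely.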