Data Balanced Classification Model for Mobile Encrypted Traffic in Big Data Environment

Abstract:

With the widespread use of mobile technologies and the Internet, traffic in mobile networks continues to grow, which makes traffic classification an important element of data security and network management. However, because traffic in modern networks is encrypted, it is difficult to classify with traditional methods. This study proposes a new deep learning-based model for classifying encrypted mobile traffic. The proposed model, called RFSE-GRU, combines Gated Recurrent Units (GRU) with feature selection and data balancing. The Random Forest algorithm is used to select the features that are most meaningful for classification. In addition, the Synthetic Minority Oversampling Technique (SMOTE) oversampling algorithm and the Edited Nearest Neighbor (ENN) undersampling algorithm are used together to reduce the negative impact of data imbalance on classification performance. The study was carried out on the Apache Spark big data platform in the Google Colab environment, using the ISCX VPN-Non VPN and UTMobileNet2021 datasets. Binary and multiclass classification were performed on the ISCX VPN-Non VPN dataset, and multiclass classification was performed on the UTMobileNet2021 dataset. The proposed model was compared with eleven different algorithms and hybrid methods, and the effect of data balancing and feature selection on classification performance was also examined. The proposed model achieved accuracy rates of 93.91% and 82.68% for binary and multiclass classification on ISCX VPN-Non VPN, respectively, and 96.83% for multiclass classification on UTMobileNet2021.
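To make the data balancing step concrete, the following is a minimal plain-Python sketch of the general idea behind combining SMOTE with ENN: first synthesize minority samples by interpolating between nearby minority points, then remove samples whose label disagrees with their neighborhood. It is not the implementation used in the study, which ran on Apache Spark. The function names, the values of `k`, the majority-vote rule applied to all classes, and the toy two-dimensional data are all assumptions made for illustration, and the sketch assumes the minority class has at least two samples.

```python
import math
import random
from collections import Counter


def knn_indices(points, query_idx, k, candidates):
    """Return the k candidate indices nearest to points[query_idx], excluding itself."""
    q = points[query_idx]
    others = [i for i in candidates if i != query_idx]
    others.sort(key=lambda i: math.dist(q, points[i]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Oversample `minority` with interpolated points until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    min_idx = [i for i, label in enumerate(y) if label == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(target - counts[minority]):
        i = rng.choice(min_idx)                        # random minority sample
        j = rng.choice(knn_indices(X, i, k, min_idx))  # one of its minority neighbors
        gap = rng.random()
        # The synthetic point lies on the segment between sample i and neighbor j.
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Keep only samples whose label matches the most common label among their k nearest neighbors."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        votes = Counter(y[j] for j in knn_indices(X, i, k, all_idx))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): 8 majority samples (label 0) and 3 minority samples (label 1).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
     (0.2, 0.4), (0.4, 0.1), (0.1, 0.1), (0.3, 0.4),
     (2.0, 2.0), (2.2, 2.1), (2.1, 2.3)]
y = [0] * 8 + [1] * 3

X_bal, y_bal = smote(X, y, minority=1, k=2)   # oversampling step
X_clean, y_clean = enn(X_bal, y_bal, k=3)     # cleaning step
print(Counter(y), Counter(y_bal), Counter(y_clean))
```

Running the oversampling step before the cleaning step mirrors the order in which the two techniques are usually combined: SMOTE evens out the class counts, and ENN then discards samples, original or synthetic, that sit in a neighborhood dominated by another class.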