Abstract:
The number of internet services and users is growing rapidly, and internet traffic is rising with it. Classifying IP traffic is therefore essential for internet service providers (ISPs), as well as for government and private organizations, to manage and secure their networks. IP traffic classification identifies user activity from the network traffic flowing through a system, which also helps improve network performance. The use of traditional classification mechanisms, which rely on inspecting packet payloads and port numbers, has declined sharply for two reasons: many internet applications now use dynamic port numbers rather than well-known ones, and widespread encryption hinders payload inspection. Machine learning techniques are now commonly used to classify IP traffic instead. However, little research has addressed IP traffic classification for 4G networks. In this work, we built a new dataset by capturing packets of real-time internet traffic on a 4G network with Wireshark. We then extracted inferred features from the captured packets using a Python script and applied five machine learning models to classify the traffic: Decision Tree, Support Vector Machine, K-Nearest Neighbours, Random Forest, and Naive Bayes. Random Forest achieved the best accuracy, approximately 87%.
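
To make the feature-extraction step concrete, the following is a minimal sketch of how per-flow features might be derived from captured packet records. It is not the script used in this work: the record layout, the sample values in `PACKETS`, the function name `flow_features`, and the chosen features (packet count, byte total, mean size, duration, mean inter-arrival time) are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical packet records, as they might look after exporting a capture:
# (timestamp_s, src_ip, dst_ip, src_port, dst_port, protocol, length_bytes).
# The values are made up for illustration only.
PACKETS = [
    (0.00, "10.0.0.2", "93.184.216.34", 50512, 443, "TCP", 120),
    (0.05, "10.0.0.2", "93.184.216.34", 50512, 443, "TCP", 1400),
    (0.20, "10.0.0.2", "93.184.216.34", 50512, 443, "TCP", 1400),
    (0.10, "10.0.0.2", "8.8.8.8", 40000, 53, "UDP", 80),
]


def flow_features(packets):
    """Group packets by 5-tuple and compute simple per-flow statistics."""
    flows = defaultdict(list)
    for ts, src, dst, sport, dport, proto, length in packets:
        flows[(src, dst, sport, dport, proto)].append((ts, length))

    features = {}
    for key, pkts in flows.items():
        pkts.sort()  # order packets within the flow by timestamp
        times = [t for t, _ in pkts]
        sizes = [s for _, s in pkts]
        # Gaps between consecutive packets; empty for single-packet flows.
        gaps = [b - a for a, b in zip(times, times[1:])]
        features[key] = {
            "packet_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_size": mean(sizes),
            "duration": times[-1] - times[0],
            "mean_interarrival": mean(gaps) if gaps else 0.0,
        }
    return features


if __name__ == "__main__":
    for key, feats in flow_features(PACKETS).items():
        print(key, feats)
```

Features of this kind depend only on packet headers and timing, not on payload contents or well-known port numbers, so they remain available when traffic is encrypted. Each resulting feature row, paired with an application label, could then serve as one training example for the classifiers listed above.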