Evaluation of Machine Learning Algorithms for Intrusion Detection System in Python

Abstract:

As the Internet continues to grow, so does the risk of malicious users trying to harm others. An intrusion detection system (IDS) can alert the appropriate entities when potentially dangerous operations occur within a host or set of hosts. Modern networks require a system that can process large amounts of network data both quickly and accurately. Most state-of-the-art IDSs apply traditional machine learning algorithms to classify whether a packet is part of an attack. However, these algorithms are typically not implemented on a big data platform. In this research, we use Apache Spark, a big data processing tool known for handling tasks at high speed, to process network packet data. We use the Spark libraries to implement Random Forest, Support Vector Machines (SVMs), Logistic Regression, Naïve Bayes, and Gradient Boosted Trees. We also implement a Deep Multilayer Perceptron, a deep learning algorithm available in Spark, and compare its results with those of the traditional machine learning algorithms. Our results show that the deep learning algorithm achieves favorable accuracy, precision, and recall, but takes longer to analyze the data than the classical machine learning algorithms.
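The comparison above rests on accuracy, precision, and recall. As a minimal sketch of how these three metrics can be computed for a binary attack/normal labeling, the plain-Python function below may be useful. It is illustrative only and is not the Spark-based evaluation used in this paper; the function name `binary_metrics` and the sample labels are assumptions introduced for the example.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels.

    `positive` is the label treated as "attack" (an assumption of this sketch).
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted attack: true positive if correct, false positive if not.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted normal: false negative if an attack was missed.
            counts["fn" if truth == positive else "tn"] += 1

    total = sum(counts.values())
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]

    # Guard against division by zero when a class is absent.
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total if total else 0.0,
        "precision": counts["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": counts["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical labels: 1 = attack, 0 = normal traffic (not data from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision penalizes false alarms (normal packets flagged as attacks), while recall penalizes missed attacks, which is why an IDS evaluation typically reports both alongside accuracy.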