Performance evaluation of intrusion detection based on machine learning using Apache Spark

Abstract:

Network intrusion is considered one of the major concerns in network communications. Network intrusion detection systems therefore aim to identify attacks or malicious activities in a network environment. Various methods have already been proposed to detect and prevent intrusions effectively and efficiently, thereby ensuring network security and privacy. Machine learning is an effective analysis framework for detecting anomalous events occurring in the network traffic flow. Based on this framework, this paper evaluates the performance of four well-known classification algorithms (SVM, Naïve Bayes, Decision Tree and Random Forest) for intrusion detection in network traffic using Apache Spark, a big data processing tool. The classifiers are compared in terms of detection accuracy, building time and prediction time. Experimental results on UNSW-NB15, a recent public dataset for network intrusion detection, show that Random Forest outperforms the other classifiers in detection accuracy and prediction time when the complete dataset with all 42 features is used.
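To make the three evaluation criteria concrete, the following is a minimal sketch of how detection accuracy, building time and prediction time could be measured for a single classifier. It is not the paper's Spark implementation: `MajorityClassifier`, `evaluate`, and the toy data are hypothetical names and values introduced only for illustration, and a real experiment would substitute a Spark classifier and the UNSW-NB15 records.

```python
import time
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# One record: (feature vector, label), where 0 = normal and 1 = attack (assumed encoding).
Row = Tuple[Sequence[float], int]


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, rows: List[Row]) -> "MajorityClassifier":
        self.label_ = Counter(label for _, label in rows).most_common(1)[0][0]
        return self

    def predict(self, features: Sequence[float]) -> int:
        return self.label_


def evaluate(make_model: Callable[[], MajorityClassifier],
             train: List[Row], test: List[Row]) -> Dict[str, float]:
    """Return detection accuracy, building time and prediction time for one model."""
    start = time.perf_counter()
    model = make_model().fit(train)                      # building (training) phase
    build_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [model.predict(x) for x, _ in test]    # prediction phase
    predict_time = time.perf_counter() - start

    correct = sum(p == y for p, (_, y) in zip(predictions, test))
    return {
        "accuracy": correct / len(test),
        "build_time_s": build_time,
        "predict_time_s": predict_time,
    }


# Toy data, invented for illustration only (not taken from UNSW-NB15).
train = [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.7, 0.9], 1)]
test = [([0.2, 0.1], 0), ([0.8, 0.7], 1), ([0.6, 0.9], 1), ([0.9, 0.9], 1)]
result = evaluate(MajorityClassifier, train, test)
print(result["accuracy"])
```

Timing the two phases separately matters because a classifier that is slow to build may still be fast at prediction, and the reverse. Note that Spark evaluates transformations lazily, so when timing a Spark model an action has to be triggered on the predictions for the measured prediction time to be meaningful.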