Abstract:
Traffic analysis is the process of monitoring network activities, discovering specific patterns, and gleaning valuable information from network traffic. It can be applied in various fields such as network assert probing and anomaly detection. With the advent of network traffic encryption, however, traffic analysis becomes an arduous task. Due to the invisibility of packet payload, traditional traffic analysis methods relying on capturing valuable information from plaintext payload are likely to lose efficacy. Machine learning has been emerging as a powerful tool to extract informative features without getting access to payload, and thus is widely employed in encrypted traffic analysis. In this paper, we present a comprehensive survey on recent achievements in machine learning-powered encrypted traffic analysis. To begin with, we review the literature in this area and summarize the analysis goals that serve as the basis for literature classification. Then, we abstract the workflow of encrypted traffic analysis with machine learning tools, including traffic collection, traffic representation, traffic analysis method, and performance evaluation. For the surveyed studies, the requirements of classification granularity and information timeliness may vary a lot for different analysis goals. Hence, in terms of the goal of traffic analysis, we present a comprehensive review on existing studies according to four categories: network asset identification, network characterization, privacy leakage detection, and anomaly detection. Finally, we discuss the challenges and directions for future research on encrypted traffic analysis.