Abstract:
Identifying the nature of data flows can help improve network service and security. Most existing solutions reduce traffic classification to protocol and application identification, relying on uniqueness assumptions. In the real world, however, these assumptions do not always hold because of the abuse of multiplexing techniques. In this work, we propose a new scheme from a different perspective: it aims to identify the content inside a data flow directly, without considering the external protocols and applications. We use a wavelet transform to obtain the time-scale signals of each data flow and develop a new hidden Markov tree (HMT) with an embedded deep neural network (DNN) to model these signals. Each hidden state of the HMT represents a specific signal generation pattern. Transitions between hidden states describe the time-scale context of the signal patterns. The DNN describes the probabilistic relationship between the implicit patterns and the observed time-scale signals. We derive new algorithms for the model and create an instance of it for each type of traffic. These instances project the data flows into a multi-dimensional decision space, in which a classifier identifies their content. Numerical experiments on real datasets validate the proposed scheme. Performance-related issues and comparisons with related work are also discussed.
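To make the notion of time-scale signals more concrete, the following minimal sketch decomposes a short traffic series into wavelet coefficients arranged as a binary tree, which is the kind of structure a hidden Markov tree is usually defined over. It rests on several assumptions that the abstract does not confirm: the Haar wavelet is chosen only for simplicity, the input is taken to be per-interval byte counts, and the names `haar_dwt` and `children` as well as the sample values are hypothetical. The HMT and DNN themselves are not implemented here.

```python
import math


def haar_dwt(signal):
    """Full Haar decomposition of a sequence whose length is a power of two.

    Returns (approx, details): approx is the single coarsest scaling
    coefficient, and details[0] is the coarsest detail level while
    details[-1] is the finest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    approx = [float(x) for x in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficients capture the local change at this scale.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        # Approximation coefficients carry the smoothed signal to the next scale.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    details.reverse()  # store the coarsest level first
    return approx[0], details


def children(level, index):
    """Positions of the two finer-scale coefficients under details[level][index]."""
    return (level + 1, 2 * index), (level + 1, 2 * index + 1)


# Hypothetical byte counts per time interval for one flow (illustrative only).
flow = [4, 4, 8, 8, 2, 6, 0, 0]
approx, details = haar_dwt(flow)

for level, coeffs in enumerate(details):
    print(level, [round(c, 3) for c in coeffs])
```

Each coefficient at one level has two children at the next finer level, so the coefficients form a binary tree over time and scale. In a model of the kind the abstract describes, a hidden state would be attached to each node of such a tree, with parent-to-child transitions capturing the time-scale context.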