Obfuscated Tor Traffic Identification Based on Sliding Window
Tor is an anonymous communication network used to hide the identities of both parties in communication. Apart from those who want to browse the web anonymously using Tor for a benign purpose, criminals can use Tor for criminal activities. It is recognized that Tor is easily intercepted by the censorship mechanism, so it uses a series of obfuscation mechanisms to avoid censorship, such as Meek, Format-Transforming Encryption (FTE), and Obfs4. In order to detect Tor traffic, we collect three kinds of obfuscated Tor traffic and then use a sliding window to extract 12 features from the stream according to the five-tuple, including the packet length, packet arrival time interval, and the proportion of the number of bytes sent and received. And finally, we use XGBoost, Random Forest, and other machine learning algorithms to identify obfuscated Tor traffic and its types. Our work provides a feasible method for countering obfuscated Tor network, which can identify the three kinds of obfuscated Tor traffic and achieve about 99% precision rate and recall rate.