Classification Modelling for Encrypted Network Traffic Captured in Air

Publication Type: Thesis
Issue Date: 2023
End-to-end encrypted traffic now predominates on the Internet, for reasons of privacy and security. However, encryption makes it difficult for traditional network surveillance to uncover harmful content and detect malicious activities. To address the problem of surveilling encrypted network traffic captured in air, this thesis leverages deep learning models to identify the content of encrypted network traffic, covering classification, novelty detection, and few-shot open-set recognition. First, it remains challenging to identify the content of encrypted network traffic passively sniffed in air (at the Data Link Layer), because packet header information from the upper protocol layers is unavailable. Therefore, I evaluate the feasibility of classifying the content of encrypted traffic from raw data and frame-level features using a lightweight deep learning model. Second, to be usable in practice, a classification model should be able to identify test samples as either target traffic (i.e., inliers) or background traffic (i.e., outliers). I propose a Calibrated Reconstruction Based Adversarial AutoEncoder model (CRAAE) to implement location-agnostic outlier detection. The key idea is to integrate implicit and explicit confidence calibration strategies into a reconstruction-based model. I leverage the category information disentangled from the feature space to calibrate the decision metric (i.e., the reconstruction error) constructed in the original data space, so as to build a more accurate decision boundary. CRAAE also adds Uniform or Dirichlet noise to the artificial outlier generation process to represent various outliers. Finally, I propose a task-adaptive Siamese Neural Network (SNN) for open-set recognition to meet more demanding practical requirements, such as adapting to dynamically changing task scenarios.
My contributions are three-fold: i) introducing generated positive and negative pairs into the SNN training process, through bidirectional dropout data augmentation, to shape a more precise similarity boundary; ii) using a Dirichlet Process Gaussian Mixture Model (DPGMM) to fit the similarity scores of the negative pairs constructed from the support set of each query task, and creating a new open-set recognition metric from that fit; iii) constructing a hierarchical cross-entropy loss that leverages features extracted at coarse-grained and fine-grained levels to improve the confidence of the similarity score. This thesis starts from application research and extends to more comprehensive scenarios. The proposed methods are shown to solve the problems in the network traffic domain and to be transferable to other fields.
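As a rough illustration of the open-set idea in contribution ii), the sketch below rejects a query as "unknown" when its best similarity to the support set does not stand out from the similarity scores of the support set's own negative pairs. It is a simplified stand-in, not the method of the thesis: a single Gaussian (mean and standard deviation) replaces the DPGMM, cosine similarity on hand-written feature vectors replaces the learned SNN similarity, and the function names, the `z_threshold` parameter, and the toy support set are all hypothetical.

```python
import math
import statistics


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def negative_pair_scores(support):
    """Similarity scores of all support pairs whose labels differ."""
    items = [(label, vec) for label, vecs in support.items() for vec in vecs]
    scores = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i][0] != items[j][0]:
                scores.append(cosine_similarity(items[i][1], items[j][1]))
    return scores


def classify_open_set(query, support, z_threshold=2.0):
    """Return the best-matching support label, or None for an unknown class.

    Assumption: one Gaussian fitted to the negative-pair scores stands in
    for the DPGMM described in the abstract.
    """
    neg = negative_pair_scores(support)
    mu = statistics.mean(neg)
    sigma = max(statistics.stdev(neg), 1e-9)  # needs at least two negative pairs

    best_label, best_score = None, -math.inf
    for label, vecs in support.items():
        score = max(cosine_similarity(query, v) for v in vecs)
        if score > best_score:
            best_label, best_score = label, score

    # Accept only if the best score lies well above the negative-pair scores.
    z = (best_score - mu) / sigma
    return best_label if z > z_threshold else None


# Hypothetical 2-way, 2-shot support set of already-extracted features.
support = {
    "video": [[1.0, 0.2, 0.0, 0.0], [0.9, 0.0, 0.2, 0.0]],
    "voice": [[0.0, 0.2, 1.0, 0.0], [0.2, 0.0, 0.9, 0.0]],
}
known = classify_open_set([1.0, 0.1, 0.1, 0.0], support)
unknown = classify_open_set([0.0, 0.0, 0.0, 1.0], support)
```

Because the threshold is derived from each task's own support set, the decision boundary moves with the task rather than being fixed in advance, which is the task-adaptive property the abstract emphasises. A mixture model such as the DPGMM would additionally handle negative-pair scores that form several clusters, which a single Gaussian cannot capture.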