Classification Modelling for Encrypted Network Traffic Captured in Air

Huang, Yi

Publication Type: Thesis
Issue Date: 2023

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Huang, Yi
dc.date.accessioned	2024-04-03T00:34:21Z
dc.date.available	2024-04-03T00:34:21Z
dc.date.issued	2023
dc.identifier.uri	http://hdl.handle.net/10453/177451
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US.UTF-8
dc.format	Thesis (PhD)
dc.language.iso	en_US	en_US.UTF-8
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/177451/1/thesis.pdf
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	© 2023 Yi Huang
dc.rights	au.edu.uts.lib/cph
dc.title	Classification Modelling for Encrypted Network Traffic Captured in Air	en_US.UTF-8
dc.type	Thesis
utslib.copyright.status	open_access	*

Abstract:

End-to-end encrypted traffic now dominates Internet traffic, driven by privacy and security needs. However, encryption makes it difficult for traditional network surveillance to unveil harmful content and detect malicious activities. To address the problem of surveilling encrypted network traffic captured in air, this thesis leverages deep learning models to identify the content of encrypted network traffic through three tasks: classification, novelty detection, and few-shot open-set recognition. First, identifying the content of encrypted network traffic passively sniffed in air (Data Link Layer) remains challenging because the packet header information of upper protocol layers is unavailable. Therefore, I evaluate the feasibility of classifying the content of encrypted traffic from raw data and frame-level features using a lightweight deep learning model. Second, to be usable in practice, a classification model should be able to identify each test sample as target traffic (i.e., an inlier) or background traffic (i.e., an outlier). I propose a Calibrated Reconstruction Based Adversarial AutoEncoder model (CRAAE) to implement location-agnostic outlier detection. The key idea is to integrate implicit and explicit confidence calibration strategies into a reconstruction-based model. I leverage the category information disentangled from the feature space to calibrate the decision metric (i.e., the reconstruction error) constructed in the original data space, which yields a more accurate decision boundary. CRAAE also adds Uniform or Dirichlet noise to the artificial outlier generation process to represent various outliers. Finally, I propose a task-adaptive Siamese Neural Network (SNN) for open-set recognition to meet higher practical requirements, such as adapting to dynamically changing task scenarios.
My contributions to this model are three-fold: i) introducing generated positive and negative pairs into the SNN training process, through bidirectional dropout data augmentation, to shape a more precise similarity boundary; ii) using a Dirichlet Process Gaussian Mixture Model (DPGMM) to fit the distribution of the similarity scores of the negative pairs constructed from the support set of each query task, and creating a new open-set recognition metric from it; iii) constructing a hierarchical cross entropy loss that leverages the features extracted at coarse- and fine-grained levels to improve the confidence of the similarity score. This thesis started with application research and went on to explore more comprehensive scenarios. The proposed methods are shown to solve the problems in the network traffic domain and to be transferable to other fields.
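
The open-set step in contribution ii) can be illustrated with a minimal sketch. It is not the thesis's implementation: the embeddings below are hand-made toy vectors rather than SNN outputs, cosine similarity stands in for the learned similarity score, and a single Gaussian (mean plus k standard deviations) replaces the DPGMM fit. All function names, labels, and the value of k are hypothetical.

```python
from statistics import mean, stdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


def negative_pair_scores(support):
    """Similarity scores of all cross-class (negative) pairs in the support set."""
    items = [(label, vec) for label, vecs in support.items() for vec in vecs]
    return [
        cosine(v1, v2)
        for i, (l1, v1) in enumerate(items)
        for l2, v2 in items[i + 1:]
        if l1 != l2
    ]


def open_set_predict(query, support, k=3.0):
    """Return the best-matching known class, or "unknown" if the match is weak.

    Assumption: a single Gaussian over the negative-pair scores is used here
    as a simple stand-in for the DPGMM described in the abstract.
    """
    neg = negative_pair_scores(support)
    threshold = mean(neg) + k * stdev(neg)
    best_label, best_score = max(
        ((label, max(cosine(query, v) for v in vecs))
         for label, vecs in support.items()),
        key=lambda pair: pair[1],
    )
    return best_label if best_score > threshold else "unknown"


# Hypothetical toy support set: two known traffic classes, two samples each.
support = {
    "video": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
    "chat": [[0.0, 0.1, 1.0], [0.1, 0.2, 0.9]],
}

print(open_set_predict([0.95, 0.15, 0.0], support))  # close to the "video" samples
print(open_set_predict([0.0, 1.0, 0.0], support))    # unlike either known class
```

Because the threshold is recomputed from the support set of each task, the rejection boundary adapts when the set of known classes changes, which is the property the task-adaptive design aims for.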

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/177451