Adversarial Machine Learning on AI Model Attacks
- Publication Type: Thesis
- Issue Date: 2022
This item is open access.
Deep Neural Networks (DNNs) have achieved great success in multiple domains, ranging from Computer Vision (CV) to Natural Language Processing (NLP). However, recent studies have demonstrated that DNNs are extremely vulnerable to adversarial examples: original inputs with small perturbations added. These perturbations are usually imperceptible to humans but mislead well-trained DNNs into erroneous outputs with high confidence. This phenomenon raises serious concerns about the robustness of DNNs in security-critical applications such as traffic sign recognition and sentiment analysis. In this research, we focus on adversarial attacks, which are an effective strategy for understanding DNN behavior and promoting robust performance. Firstly, we propose a Targeted Attention Attack (TAA) strategy to investigate the robustness of traffic sign recognition systems. TAA takes advantage of a soft attention map to reduce the attack cost and to generate more natural perturbations that fit real-world situations. Secondly, we design the Bigram and Unigram based Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models in text classification. BU-SPO attacks text documents not only at the unigram (word) level but also at the bigram level to avoid producing meaningless sentences, and its Semantic Preservation Optimization (SPO) component is designed to reduce the modification cost and improve semantic consistency. Thirdly, we present a BERT-based Simulated Annealing (BESA) algorithm to craft fluent textual adversarial examples. BESA employs the BERT Masked Language Model to generate context-aware word substitutions and adopts Simulated Annealing to approach the globally optimal solution under a reasonable objective function.
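The abstract describes the three attack strategies only at a high level. As an illustration of the first idea, the sketch below shows how a soft attention map can confine a targeted, iterative gradient attack to the salient region of an image. It is a minimal sketch, not the thesis's actual TAA implementation: `model`, `attn_map`, and the hyperparameters are hypothetical placeholders assumed for the example.

```python
import torch
import torch.nn.functional as F

def attention_masked_targeted_attack(model, x, target, attn_map,
                                     eps=0.03, alpha=0.005, steps=10):
    """Sketch of a targeted attack whose perturbation is scaled by a soft attention map.

    model    -- callable mapping an image batch to class logits (hypothetical)
    x        -- clean images, shape (N, C, H, W), values in [0, 1]
    target   -- desired (wrong) class labels, shape (N,)
    attn_map -- per-pixel weights in [0, 1] marking the salient region, broadcastable to x
    """
    x_clean = x.detach()
    x_adv = x_clean.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Targeted objective: make the model confident in the target class,
        # i.e. minimise the cross-entropy with respect to the target labels.
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        # Descend the loss, weighting each pixel's step by the attention map so
        # pixels outside the attended region are barely perturbed (lower attack cost).
        x_adv = x_adv.detach() - alpha * attn_map * grad.sign()
        # Project back into the eps-ball around the clean input and the valid pixel range.
        x_adv = x_clean + torch.clamp(x_adv - x_clean, -eps, eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()
    return x_adv
```

The same masking idea generalizes: any per-pixel weight map (saliency, segmentation, or attention) can play the role of `attn_map` to keep the perturbation concentrated on the object of interest rather than the whole image.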