Adversarial Machine Learning on AI Model Attacks

Deep Neural Networks (DNNs) have achieved great success in multiple domains, ranging from Computer Vision (CV) to Natural Language Processing (NLP). However, recent studies have demonstrated that DNNs are extremely vulnerable to adversarial examples, which are original inputs with small added perturbations. These perturbations are usually imperceptible to humans, yet they mislead well-trained DNNs into erroneous outputs with high confidence. This phenomenon raises serious concerns about the robustness of DNNs in security-critical applications, such as traffic sign recognition and sentiment analysis. In this research, we focus on adversarial attacks, which are an effective means of understanding DNN behavior and improving their robustness. First, we propose a Targeted Attention Attack (TAA) strategy to investigate the robustness of traffic sign recognition systems. TAA takes advantage of a soft attention map to reduce the attack cost and generates more natural perturbations that fit real-world situations. Second, we design the Bigram and Unigram based Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models in text classification. BU-SPO attacks text documents not only at the unigram (single-word) level but also at the bigram level to avoid producing meaningless sentences, and its Semantic Preservation Optimization (SPO) component is designed to reduce the modification cost and improve semantic consistency. Third, we present a BERT-based Simulated Annealing (BESA) algorithm to craft fluent textual adversarial examples. BESA employs the BERT Masked Language Model to generate context-aware word substitutions and adopts Simulated Annealing to approach the globally optimal solution under a suitably designed objective function.
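To make the search idea behind a simulated-annealing word-substitution attack such as BESA more concrete, the following is a minimal sketch under stated assumptions. It is not the author's implementation. The victim model is a hypothetical bag-of-words logistic scorer (`WORD_WEIGHTS`, `positive_prob`). The hard-coded `CANDIDATES` table is a hypothetical stand-in for the context-aware substitutions that BERT's Masked Language Model would propose. The objective (probability of the wrong class minus an assumed penalty `LAMBDA` per modified word) is an illustrative stand-in for the objective function used in the thesis.

```python
import math
import random

# Hypothetical toy victim model: a bag-of-words logistic scorer standing in
# for a real DNN text classifier. Words not listed contribute 0.
WORD_WEIGHTS = {
    "great": 2.0, "good": 1.2, "fine": 0.2, "decent": 0.1,
    "fun": 1.0, "okay": 0.0, "watchable": -0.2,
}

# Hypothetical substitution candidates; in BESA these would be generated by
# the BERT Masked Language Model from the surrounding context.
CANDIDATES = {
    "great": ["good", "fine", "decent"],
    "fun": ["okay", "watchable"],
}

LAMBDA = 0.05  # assumed penalty per modified word (modification cost)


def positive_prob(tokens):
    """Probability of the 'positive' class under the toy logistic model."""
    score = sum(WORD_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def objective(tokens, original):
    """Assumed objective: probability of the wrong (negative) class minus
    the edit cost. The original sentence is assumed to be labelled positive."""
    changes = sum(a != b for a, b in zip(tokens, original))
    return (1.0 - positive_prob(tokens)) - LAMBDA * changes


def simulated_annealing_attack(original, steps=300, t0=1.0, cooling=0.98, seed=0):
    rng = random.Random(seed)
    positions = [i for i, t in enumerate(original) if t in CANDIDATES]
    current = list(original)
    cur_obj = objective(current, original)
    best, best_obj = list(current), cur_obj
    temp = t0
    for _ in range(steps):
        if not positions:
            break
        # Propose: pick one substitutable position and a word for it
        # (one of its candidates, or the original word to allow undoing).
        i = rng.choice(positions)
        proposal = list(current)
        proposal[i] = rng.choice(CANDIDATES[original[i]] + [original[i]])
        new_obj = objective(proposal, original)
        delta = new_obj - cur_obj
        # Metropolis rule: always accept improvements; accept worse moves
        # with probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_obj = proposal, new_obj
            if cur_obj > best_obj:
                best, best_obj = list(current), cur_obj
        temp = max(temp * cooling, 1e-6)  # geometric cooling schedule
    return best


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "great", "and", "fun"]
    adv = simulated_annealing_attack(sentence)
    print(" ".join(adv), round(positive_prob(sentence), 3), round(positive_prob(adv), 3))
```

Occasionally accepting a worse proposal, with probability exp(Δ/T), lets the search escape local optima early on. As the temperature T cools, the search behaves increasingly like greedy hill climbing, which is why simulated annealing can approach a global rather than a merely local optimum. The per-word penalty reflects the goal of keeping modifications few so that the adversarial text stays close to the original.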