Unmasking vulnerabilities: adversarial attacks via word-level manipulation on NLP models

Publication Type: Thesis
Issue Date: 2024
Natural language processing (NLP) models have advanced significantly and are widely used in applications such as sentiment analysis, machine translation, and chatbots. However, they remain vulnerable to adversarial attacks, which threaten their reliability and real-world adoption. This thesis examines the vulnerabilities of sequence-to-sequence and classification models and introduces techniques for crafting effective yet imperceptible adversarial examples. The Hybrid Attentive Attack (HAA) crafts subtle adversarial examples for neural machine translation by targeting semantically relevant words. The Fraud's Bargain Attack (FBA) uses randomization to improve the selection of adversarial examples against classifiers, combining the Word Manipulation Process (WMP) with a Metropolis-Hastings sampler. Two further algorithms, the Reversible Jump Attack (RJA) and Metropolis-Hastings Modification Reduction (MMR), expand the search space and balance the number of modifications against attack success. Extensive experiments demonstrate the effectiveness of the proposed methods.
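To illustrate the sampling idea behind FBA, the sketch below shows a generic Metropolis-Hastings acceptance step applied to word-level candidate edits. It is a minimal, hypothetical illustration only: the functions `propose` and `score` stand in for the thesis's Word Manipulation Process and adversarial objective (which are not specified in this abstract), and all names and parameters here are assumptions, not the thesis's actual implementation.

```python
import math
import random


def mh_accept(candidate_score: float, current_score: float,
              proposal_ratio: float = 1.0) -> bool:
    """Metropolis-Hastings acceptance test on unnormalized log-densities.

    candidate_score / current_score: log-density of the candidate and
    current adversarial texts under a target distribution that rewards
    misclassification and semantic similarity (assumed objective).
    proposal_ratio: q(current | candidate) / q(candidate | current).
    """
    log_alpha = candidate_score - current_score + math.log(proposal_ratio)
    return math.log(random.random()) < min(0.0, log_alpha)


def mh_word_attack(text: str, propose, score, steps: int = 200) -> str:
    """Random-walk search over word-level manipulations.

    propose(text) -> (candidate_text, proposal_ratio): one word-level
    edit (e.g. insert / replace / delete), a placeholder for the WMP.
    score(text) -> float: unnormalized log-density of the adversarial
    objective, a placeholder for the thesis's attack criterion.
    """
    current, best = text, text
    for _ in range(steps):
        candidate, ratio = propose(current)
        if mh_accept(score(candidate), score(current), ratio):
            current = candidate
            if score(current) > score(best):
                best = current
    return best
```

The accept/reject step lets the search occasionally take locally worse edits, which is what distinguishes this randomized selection from a purely greedy word-substitution attack.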