Adversarial machine learning in generative models

Publication Type: Thesis
Issue Date: 2025
Abstract:
Generative artificial intelligence (AI) models in natural language processing (NLP) and computer vision (CV) have made remarkable progress in recent years, enabling significant advances in applications such as question answering, summarization, and image captioning. However, recent studies highlight the vulnerability of these models to adversarial attacks, raising critical concerns about the reliability of AI systems and the challenges of deploying them in the real world. This thesis investigates the vulnerabilities of three widely used classes of models, namely question answering, summarization, and image captioning, and introduces techniques for generating highly effective yet imperceptible adversarial examples.

The research presents three novel attack methodologies. First, the Paraphrasing-Based Summarization Attack (SAP) targets abstractive summarization models by ranking input sentences according to their importance to the summarization outcome (see the first sketch below); it then applies paraphrasing to craft adversarial examples that preserve semantic coherence while inducing incorrect summaries. Second, AICAttack (Attention-Based Image Captioning Attack) is a black-box adversarial attack on image captioning models: an attention-based mechanism identifies critical image regions, and a customized differential evolution algorithm optimizes pixel perturbations (see the second sketch below), yielding highly effective adversarial captions with minimal visual change. Finally, QA-Attack is a comprehensive approach to crafting practical adversarial examples for question answering models, handling both yes/no (Boolean) and informative questions. It adopts a Hybrid Ranking Fusion algorithm that combines attention-based and removal-based ranking mechanisms (see the third sketch below), achieving high success rates across diverse question answering systems.

Through these three attack methodologies, the thesis advances the field of AI security: each attack demonstrates superior effectiveness while remaining imperceptible, and together they reveal critical vulnerabilities in today's generative AI landscape. These approaches not only expose fundamental limitations across diverse AI applications but also establish a foundation for developing more robust and interpretable next-generation systems, ultimately fostering greater trust in AI technologies increasingly deployed in sensitive real-world contexts.
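The sentence-ranking idea behind SAP can be illustrated with a short removal-based sketch. Everything below is an assumption for illustration rather than the thesis's implementation: the facebook/bart-large-cnn summarizer is an arbitrary stand-in for the victim model, and difflib's string similarity is a crude proxy for how far the summary drifts when a sentence is deleted.

```python
# Minimal sketch of removal-based sentence ranking for a summarization
# attack. Models, generation lengths, and the similarity proxy are
# illustrative assumptions, not the thesis's actual SAP components.
from difflib import SequenceMatcher

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize(text: str) -> str:
    return summarizer(text, max_length=60, min_length=10)[0]["summary_text"]

def rank_sentences(sentences: list[str]) -> list[tuple[int, float]]:
    """Score each sentence by how much deleting it changes the summary."""
    base = summarize(" ".join(sentences))
    scores = []
    for i in range(len(sentences)):
        ablated = " ".join(s for j, s in enumerate(sentences) if j != i)
        # Lower similarity to the original summary => more influential sentence.
        similarity = SequenceMatcher(None, base, summarize(ablated)).ratio()
        scores.append((i, 1.0 - similarity))
    # Most influential sentences first: prime targets for paraphrasing.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

The top-ranked sentences would then be paraphrased and the perturbed document re-summarized to check whether the output summary has been corrupted.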
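The second sketch shows a black-box pixel-perturbation search in the spirit of AICAttack. SciPy's stock differential_evolution stands in for the thesis's customized variant, the region coordinates would come from an attention map in the real attack, and score_fn is a placeholder for a caption-similarity objective queried from the victim captioner.

```python
# Sketch of differential evolution over (x, y, R, G, B) genes confined to
# an attention-selected region. All components here are assumed stand-ins.
import numpy as np
from scipy.optimize import differential_evolution

def attack(image: np.ndarray, region: tuple, score_fn, n_pixels: int = 3):
    """image: HxWx3 uint8 array; region: (x0, y0, x1, y1) from an attention
    map; score_fn(img) -> float, lower meaning the generated caption has
    drifted further from the original."""
    x0, y0, x1, y1 = region
    # Each perturbed pixel contributes five genes: position and RGB color.
    bounds = [(x0, x1), (y0, y1), (0, 255), (0, 255), (0, 255)] * n_pixels

    def fitness(genes):
        candidate = image.copy()
        for k in range(n_pixels):
            x, y, r, g, b = genes[5 * k: 5 * k + 5]
            candidate[int(y), int(x)] = (int(r), int(g), int(b))
        return score_fn(candidate)  # minimized by the evolutionary search

    result = differential_evolution(fitness, bounds, maxiter=20,
                                    popsize=10, seed=0)
    return result.x, result.fun

# Toy usage with a dummy objective (a real attack would query a captioner).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
best_genes, best_score = attack(img, (10, 10, 30, 30), lambda im: float(im.mean()))
```

Restricting the search to attention-selected regions keeps the gene space small, which is what makes a query-only evolutionary search feasible against a black-box model.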
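The third sketch illustrates how two importance rankings might be fused, since the abstract names attention-based and removal-based signals without detailing the fusion rule. The rank-sum combination and the toy scores below are assumptions; the thesis's Hybrid Ranking Fusion algorithm may combine the signals differently.

```python
# Sketch of hybrid ranking fusion over candidate words: attention-based and
# removal-based importance scores are fused by summing their per-list ranks.
def hybrid_rank(words, attention_scores, removal_scores):
    """Return words ordered by fused importance, most important first."""
    def ranks(scores):
        order = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)
        rank_of = [0] * len(words)
        for rank, i in enumerate(order):
            rank_of[i] = rank
        return rank_of

    attn_ranks = ranks(attention_scores)
    removal_ranks = ranks(removal_scores)
    fused = sorted(range(len(words)), key=lambda i: attn_ranks[i] + removal_ranks[i])
    return [words[i] for i in fused]

words = ["the", "capital", "of", "France"]
attention = [0.05, 0.40, 0.05, 0.50]  # hypothetical attention weights
removal = [0.01, 0.35, 0.02, 0.45]    # hypothetical output change on removal
print(hybrid_rank(words, attention, removal))  # ['France', 'capital', 'the', 'of']
```

The words ranked most important would be the first candidates for perturbation when attacking the question answering model.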