Adversarial machine learning in generative models

Publication Type: Thesis
Issue Date: 2025
Abstract:
Generative artificial intelligence (AI) models in natural language processing (NLP) and computer vision (CV) have made remarkable progress in recent years, enabling significant advances in applications such as question answering, summarization, and image captioning. However, recent studies highlight the vulnerability of these models to adversarial attacks, raising critical concerns about the reliability of AI systems and the challenges of deploying them in the real world. This thesis investigates the vulnerabilities of three widely used classes of models, namely question answering, summarization, and image captioning, and introduces techniques for generating highly effective yet imperceptible adversarial examples.

The research presents three novel attack methodologies. First, the Paraphrasing-Based Summarization Attack (SAP) targets abstractive summarization models by ranking input sentences according to their importance to the summarization outcome (see the first sketch below); it then applies paraphrasing to craft adversarial examples that preserve semantic coherence while inducing incorrect summaries. Second, AICAttack (Attention-Based Image Captioning Attack) is a black-box adversarial attack on image captioning models: an attention-based mechanism identifies critical image regions, and a customized differential evolution algorithm optimizes pixel perturbations (see the second sketch below), yielding highly effective adversarial captions with minimal visual change. Finally, QA-Attack is a comprehensive approach to crafting practical adversarial examples for question answering models, handling both yes/no (Boolean) and informative questions. It adopts a Hybrid Ranking Fusion algorithm that combines attention-based and removal-based ranking mechanisms (see the third sketch below), achieving high success rates across diverse question answering systems.

Through these three attack methodologies, the thesis advances the field of AI security: each attack demonstrates superior effectiveness while remaining imperceptible, and together they reveal critical vulnerabilities in today's generative AI landscape. These approaches not only expose fundamental limitations across diverse AI applications but also establish a foundation for developing more robust and interpretable next-generation systems, ultimately fostering greater trust in AI technologies increasingly deployed in sensitive real-world contexts.
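The sentence-ranking idea behind SAP can be illustrated with a short removal-based sketch. Everything below is an assumption for illustration rather than the thesis's implementation: the facebook/bart-large-cnn summarizer is an arbitrary stand-in for the victim model, and difflib's string similarity is a crude proxy for how far the summary drifts when a sentence is deleted.

```python
# Minimal sketch of removal-based sentence ranking for a summarization
# attack. Models, generation lengths, and the similarity proxy are
# illustrative assumptions, not the thesis's actual SAP components.
from difflib import SequenceMatcher

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def summarize(text: str) -> str:
    return summarizer(text, max_length=60, min_length=10)[0]["summary_text"]

def rank_sentences(sentences: list[str]) -> list[tuple[int, float]]:
    """Score each sentence by how much deleting it changes the summary."""
    base = summarize(" ".join(sentences))
    scores = []
    for i in range(len(sentences)):
        ablated = " ".join(s for j, s in enumerate(sentences) if j != i)
        # Lower similarity to the original summary => more influential sentence.
        similarity = SequenceMatcher(None, base, summarize(ablated)).ratio()
        scores.append((i, 1.0 - similarity))
    # Most influential sentences first: prime targets for paraphrasing.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

The top-ranked sentences would then be paraphrased and the perturbed document re-summarized to check whether the output summary has been corrupted.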
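The second sketch shows a black-box pixel-perturbation search in the spirit of AICAttack. SciPy's stock differential_evolution stands in for the thesis's customized variant, the region coordinates would come from an attention map in the real attack, and score_fn is a placeholder for a caption-similarity objective queried from the victim captioner.

```python
# Sketch of differential evolution over (x, y, R, G, B) genes confined to
# an attention-selected region. All components here are assumed stand-ins.
import numpy as np
from scipy.optimize import differential_evolution

def attack(image: np.ndarray, region: tuple, score_fn, n_pixels: int = 3):
    """image: HxWx3 uint8 array; region: (x0, y0, x1, y1) from an attention
    map; score_fn(img) -> float, lower meaning the generated caption has
    drifted further from the original."""
    x0, y0, x1, y1 = region
    # Each perturbed pixel contributes five genes: position and RGB color.
    bounds = [(x0, x1), (y0, y1), (0, 255), (0, 255), (0, 255)] * n_pixels

    def fitness(genes):
        candidate = image.copy()
        for k in range(n_pixels):
            x, y, r, g, b = genes[5 * k: 5 * k + 5]
            candidate[int(y), int(x)] = (int(r), int(g), int(b))
        return score_fn(candidate)  # minimized by the evolutionary search

    result = differential_evolution(fitness, bounds, maxiter=20,
                                    popsize=10, seed=0)
    return result.x, result.fun

# Toy usage with a dummy objective (a real attack would query a captioner).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
best_genes, best_score = attack(img, (10, 10, 30, 30), lambda im: float(im.mean()))
```

Restricting the search to attention-selected regions keeps the gene space small, which is what makes a query-only evolutionary search feasible against a black-box model.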
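The third sketch illustrates how two importance rankings might be fused, since the abstract names attention-based and removal-based signals without detailing the fusion rule. The rank-sum combination and the toy scores below are assumptions; the thesis's Hybrid Ranking Fusion algorithm may combine the signals differently.

```python
# Sketch of hybrid ranking fusion over candidate words: attention-based and
# removal-based importance scores are fused by summing their per-list ranks.
def hybrid_rank(words, attention_scores, removal_scores):
    """Return words ordered by fused importance, most important first."""
    def ranks(scores):
        order = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)
        rank_of = [0] * len(words)
        for rank, i in enumerate(order):
            rank_of[i] = rank
        return rank_of

    attn_ranks = ranks(attention_scores)
    removal_ranks = ranks(removal_scores)
    fused = sorted(range(len(words)), key=lambda i: attn_ranks[i] + removal_ranks[i])
    return [words[i] for i in fused]

words = ["the", "capital", "of", "France"]
attention = [0.05, 0.40, 0.05, 0.50]  # hypothetical attention weights
removal = [0.01, 0.35, 0.02, 0.45]    # hypothetical output change on removal
print(hybrid_rank(words, attention, removal))  # ['France', 'capital', 'the', 'of']
```

The words ranked most important would be the first candidates for perturbation when attacking the question answering model.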