A Study on Learning Knowledge-based Security in Reinforcement Learning
- Publication Type: Thesis
- Issue Date: 2024
Open Access
This item is open access.
Reinforcement learning (RL) is a crucial branch of Artificial Intelligence (AI) that focuses on agents interacting with their environments to learn optimal policies through trial and error. However, this approach typically requires substantial data and time for learning. Moreover, in some cases, there is neither the opportunity nor sufficient time to interact with the environment to collect adequate data. To improve learning efficiency and expand the learning dataset, a common strategy is to enable an RL agent to acquire training knowledge from additional resources.
This thesis focuses on knowledge-based security in reinforcement learning, encompassing both performance and robustness. It explores two distinct knowledge-resource scenarios and presents novel methods to tackle the associated challenges.
First, we investigate a scenario in which an RL agent obtains training knowledge from other RL agents in a multi-agent context. Second, we consider Large Language Models (LLMs) as a knowledge resource. Our experimental results indicate that the proposed approaches significantly surpass baseline methods in RL performance.
Specifically, the contributions of this thesis can be summarized as follows:
1. We propose a novel advising approach for simultaneous-learning environments that eliminates the need for pre-trained teachers and allows each agent to seek advice from multiple teachers. The approach also accounts for the agents' connection structure by using a graph neural network (GNN) to aggregate the advice and learn a weight for each piece of it (see the first sketch after this list).
2. We propose the Federated Advisory Teacher-Student (FATS) framework, which addresses a more complex setting in which agents learn simultaneously and must handle multiple pieces of advice in a deep reinforcement learning context (see the second sketch after this list).
3. We introduce a novel online internal advice-poisoning attack in multi-agent reinforcement learning (MARL). The attack targets the internal dynamics of the multi-agent system; internal attacks affect the training process more directly and often have greater destructive potential on the learned policy than external attacks. The method also reduces the information the malicious agent requires (see the third sketch after this list).
4. We introduce a novel data-augmentation approach that leverages LLMs and accounts for critical samples. LLMs have demonstrated exceptional capabilities and can therefore act as dependable sources of training data for RL environments. We also propose a left-right limit construction method to handle the neglect of critical samples through unlearning; this method further fine-tunes the policy model, enhancing its overall performance (see the fourth sketch after this list).
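Contribution 1 centres on aggregating neighbours' advice with learned weights. The first sketch below shows one such aggregation layer, amounting to a single graph-attention-style step over an agent's neighbourhood; all names and dimensions (AdviceAggregator, state_dim, advice_dim) are illustrative assumptions rather than the thesis's actual architecture.

```python
# Minimal sketch of attention-weighted advice aggregation over an agent's
# neighbourhood. Hypothetical names; not the thesis's implementation.
import torch
import torch.nn as nn

class AdviceAggregator(nn.Module):
    def __init__(self, state_dim: int, advice_dim: int, hidden: int = 64):
        super().__init__()
        # Scores one piece of advice given the student's own state.
        self.scorer = nn.Sequential(
            nn.Linear(state_dim + advice_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, advice: torch.Tensor) -> torch.Tensor:
        # state:  (state_dim,)                 student's observation
        # advice: (num_teachers, advice_dim)   one row per neighbouring teacher
        expanded = state.unsqueeze(0).expand(advice.size(0), -1)
        logits = self.scorer(torch.cat([expanded, advice], dim=-1))  # (T, 1)
        weights = torch.softmax(logits, dim=0)   # learned weight per teacher
        return (weights * advice).sum(dim=0)     # aggregated advice vector

# Usage: aggregate Q-value advice from three neighbouring teachers.
agg = AdviceAggregator(state_dim=8, advice_dim=4)
state = torch.randn(8)
advice = torch.randn(3, 4)   # e.g. each teacher's Q-estimates for 4 actions
fused = agg(state, advice)   # (4,) weighted combination
```

In a full system this layer would sit inside each student's update loop, with the scorer trained end to end alongside the policy.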
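For the FATS framework of contribution 2, one plausible building block is a federated aggregation step over the agents' advisory networks. The second sketch uses a FedAvg-style parameter mean, which is our assumption; the thesis's actual aggregation rule may differ.

```python
# Sketch of a server-side aggregation step a FATS-style framework might
# use: FedAvg-style averaging of advisory-network parameters across
# simultaneously learning agents. The FedAvg rule is an assumption.
import torch
import torch.nn as nn

def federated_average(state_dicts: list) -> dict:
    """Element-wise mean of several models' parameters."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Usage: fuse three agents' advisory networks into one shared advisor.
agents = [nn.Linear(4, 2) for _ in range(3)]
shared = federated_average([a.state_dict() for a in agents])
advisor = nn.Linear(4, 2)
advisor.load_state_dict(shared)
```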
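Contribution 3's internal attack can be illustrated with a toy poisoning rule in the third sketch: the malicious teacher answers advice queries with inverted Q-estimates, needing only its own local values, which mirrors the reduced information requirement noted above. The epsilon knob is hypothetical, not a parameter from the thesis.

```python
# Illustrative online internal advice-poisoning step: honest Q-advice is
# blended with its negation so the student is nudged toward low-value
# actions. epsilon is a hypothetical attack-strength knob.
import numpy as np

def poisoned_advice(q_values: np.ndarray, epsilon: float = 1.0) -> np.ndarray:
    """Blend honest advice with its negation; epsilon=1 fully inverts it."""
    return (1.0 - epsilon) * q_values + epsilon * (-q_values)

honest = np.array([0.1, 0.9, -0.3, 0.4])   # action 1 is genuinely best
print(np.argmax(poisoned_advice(honest)))  # -> 2: the worst action now wins
```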
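Contribution 4 is illustrated in the fourth sketch under a loose reading of the left-right limit construction: synthesising transitions just below and just above each critical state so the critical region stays covered after unlearning. Both this reading and the query_llm placeholder are our assumptions, not details from the thesis.

```python
# Hedged sketch of LLM-driven augmentation around a critical sample.
# query_llm() is a hypothetical stand-in for a real LLM call.
import random

def query_llm(prompt: str) -> float:
    # Placeholder: a real system would ask an LLM for a plausible
    # perturbation magnitude; here we just sample a small offset.
    return random.uniform(0.01, 0.05)

def left_right_augment(sample: dict) -> list:
    """Synthesise transitions on both sides of a critical state value."""
    delta = query_llm(f"small perturbation for state {sample['state']}")
    left = {**sample, "state": sample["state"] - delta}
    right = {**sample, "state": sample["state"] + delta}
    return [left, right]

# Usage: pad the replay buffer so the critical region is covered from
# both sides before any unlearning step can erase it.
replay_buffer = []
critical = {"state": 0.73, "action": 1, "reward": -1.0}
replay_buffer.extend(left_right_augment(critical))
```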