A Study on Learning Knowledge-based Security in Reinforcement Learning
- Publication Type: Thesis
- Issue Date: 2024
Open Access
This item is open access.
Reinforcement learning (RL) is a crucial branch of Artificial Intelligence (AI) that focuses on agents interacting with their environments to learn optimal policies through trial and error. However, this approach typically requires substantial data and time for learning. Moreover, in some cases, there is neither the opportunity nor sufficient time to interact with the environment to collect adequate data. To improve learning efficiency and expand the learning dataset, a common strategy is to enable an RL agent to acquire training knowledge from additional resources.
This thesis focuses on knowledge-based security in reinforcement learning, encompassing both performance and robustness. It explores two distinct knowledge-resource scenarios and presents novel methods to tackle the associated challenges.
First, we investigate a scenario in which an RL agent obtains training knowledge from other RL agents in a multi-agent context. Second, we consider Large Language Models (LLMs) as a knowledge resource. Our experimental results indicate that the proposed approaches significantly surpass baseline methods in RL performance.
Specifically, the contributions of this thesis can be summarized as follows:
1. We propose a novel advising approach for simultaneous-learning environments that eliminates the need for pre-trained teachers and allows each agent to seek advice from multiple teachers. The approach also accounts for the agents' connection structure by using a graph neural network (GNN) to aggregate the advice and learn a weight for each piece of it (see the first sketch after this list).
2. We propose the Federated Advisory Teacher-Student (FATS) framework, which addresses a more complex setting in which agents learn simultaneously and must handle multiple pieces of advice in a deep reinforcement learning context (see the second sketch after this list).
3. We introduce a novel online internal advice-poisoning attack in multi-agent reinforcement learning (MARL). The attack targets the internal dynamics of the multi-agent system; internal attacks affect the training process more directly and often have greater destructive potential on the learned policy than external attacks. The method also reduces the information the malicious agent requires (see the third sketch after this list).
4. We introduce a novel data-augmentation approach that leverages LLMs and accounts for critical samples. LLMs have demonstrated exceptional capabilities and can therefore act as dependable sources of training data for RL environments. We also propose a left-right limit construction method to handle the neglect of critical samples through unlearning; this method further fine-tunes the policy model, enhancing its overall performance (see the fourth sketch after this list).
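Contribution 1 centres on aggregating neighbours' advice with learned weights. The first sketch below shows one such aggregation layer, amounting to a single graph-attention-style step over an agent's neighbourhood; all names and dimensions (AdviceAggregator, state_dim, advice_dim) are illustrative assumptions rather than the thesis's actual architecture.

```python
# Minimal sketch of attention-weighted advice aggregation over an agent's
# neighbourhood. Hypothetical names; not the thesis's implementation.
import torch
import torch.nn as nn

class AdviceAggregator(nn.Module):
    def __init__(self, state_dim: int, advice_dim: int, hidden: int = 64):
        super().__init__()
        # Scores one piece of advice given the student's own state.
        self.scorer = nn.Sequential(
            nn.Linear(state_dim + advice_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, advice: torch.Tensor) -> torch.Tensor:
        # state:  (state_dim,)                 student's observation
        # advice: (num_teachers, advice_dim)   one row per neighbouring teacher
        expanded = state.unsqueeze(0).expand(advice.size(0), -1)
        logits = self.scorer(torch.cat([expanded, advice], dim=-1))  # (T, 1)
        weights = torch.softmax(logits, dim=0)   # learned weight per teacher
        return (weights * advice).sum(dim=0)     # aggregated advice vector

# Usage: aggregate Q-value advice from three neighbouring teachers.
agg = AdviceAggregator(state_dim=8, advice_dim=4)
state = torch.randn(8)
advice = torch.randn(3, 4)   # e.g. each teacher's Q-estimates for 4 actions
fused = agg(state, advice)   # (4,) weighted combination
```

In a full system this layer would sit inside each student's update loop, with the scorer trained end to end alongside the policy.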
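For the FATS framework of contribution 2, one plausible building block is a federated aggregation step over the agents' advisory networks. The second sketch uses a FedAvg-style parameter mean, which is our assumption; the thesis's actual aggregation rule may differ.

```python
# Sketch of a server-side aggregation step a FATS-style framework might
# use: FedAvg-style averaging of advisory-network parameters across
# simultaneously learning agents. The FedAvg rule is an assumption.
import torch
import torch.nn as nn

def federated_average(state_dicts: list) -> dict:
    """Element-wise mean of several models' parameters."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Usage: fuse three agents' advisory networks into one shared advisor.
agents = [nn.Linear(4, 2) for _ in range(3)]
shared = federated_average([a.state_dict() for a in agents])
advisor = nn.Linear(4, 2)
advisor.load_state_dict(shared)
```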
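Contribution 3's internal attack can be illustrated with a toy poisoning rule in the third sketch: the malicious teacher answers advice queries with inverted Q-estimates, needing only its own local values, which mirrors the reduced information requirement noted above. The epsilon knob is hypothetical, not a parameter from the thesis.

```python
# Illustrative online internal advice-poisoning step: honest Q-advice is
# blended with its negation so the student is nudged toward low-value
# actions. epsilon is a hypothetical attack-strength knob.
import numpy as np

def poisoned_advice(q_values: np.ndarray, epsilon: float = 1.0) -> np.ndarray:
    """Blend honest advice with its negation; epsilon=1 fully inverts it."""
    return (1.0 - epsilon) * q_values + epsilon * (-q_values)

honest = np.array([0.1, 0.9, -0.3, 0.4])   # action 1 is genuinely best
print(np.argmax(poisoned_advice(honest)))  # -> 2: the worst action now wins
```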
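Contribution 4 is illustrated in the fourth sketch under a loose reading of the left-right limit construction: synthesising transitions just below and just above each critical state so the critical region stays covered after unlearning. Both this reading and the query_llm placeholder are our assumptions, not details from the thesis.

```python
# Hedged sketch of LLM-driven augmentation around a critical sample.
# query_llm() is a hypothetical stand-in for a real LLM call.
import random

def query_llm(prompt: str) -> float:
    # Placeholder: a real system would ask an LLM for a plausible
    # perturbation magnitude; here we just sample a small offset.
    return random.uniform(0.01, 0.05)

def left_right_augment(sample: dict) -> list:
    """Synthesise transitions on both sides of a critical state value."""
    delta = query_llm(f"small perturbation for state {sample['state']}")
    left = {**sample, "state": sample["state"] - delta}
    right = {**sample, "state": sample["state"] + delta}
    return [left, right]

# Usage: pad the replay buffer so the critical region is covered from
# both sides before any unlearning step can erase it.
replay_buffer = []
critical = {"state": 0.73, "action": 1, "reward": -1.0}
replay_buffer.extend(left_right_augment(critical))
```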