Building Reliable Autonomous Agents: A Causal Perspective
- Publication Type: Thesis
- Issue Date: 2024
This item is open access.
Recent advances in reinforcement learning (RL) and large language models (LLMs) have enabled the development of highly capable autonomous agents. These agents are designed to complete complex tasks and make sound decisions in a variety of domains, such as robotics, personal assistance, finance, and healthcare. Ensuring the reliability of these agents is crucial for their safe deployment in real-world settings. Current approaches often overlook the causal relationships underlying the data-generation and decision-making processes of autonomous agents, leading to suboptimal performance and unintended consequences. For example, in offline RL, where the agent cannot interact with the environment to collect new feedback, false correlations between uncertainty and decision-making may mislead the agent into taking suboptimal actions that yield high returns only by chance. Moreover, in decision-making problems involving sensitive attributes such as race and gender, agents that fail to capture the causal relationships between these attributes and the outcomes may exhibit biased and discriminatory behavior, perpetuating unfairness. Furthermore, language-based agents are often criticized for generating harmful or inappropriate outputs, raising concerns about their transparency and safety and calling for reliable methods to interpret the causes of their behavior.
To overcome these challenges, this research explores how integrating causality into autonomous agents can enhance their reliability. By explicitly modeling causal relationships, this work aims to improve the robustness, fairness, and transparency of autonomous agents, making them more reliable and trustworthy. Specifically, 1) we propose the SCORE algorithm, which mitigates false correlations in offline RL, leading to improved robustness and provably efficient policy learning; 2) we introduce dynamics fairness in RL, using causal mediation analysis to evaluate the fairness of environmental dynamics in sequential decision-making problems, which allows agents to perceive and promote long-term fairness; and 3) we develop the CARE approach, which enhances the transparency of language-based agents by incorporating matched-pair trials into representation engineering, offering a principled way to establish a causal link between neural activities and model behaviors and enabling more reliable interpretation and control of the agent. Theoretical analysis and extensive empirical evaluations demonstrate the effectiveness of the proposed approaches, highlighting the potential and importance of addressing reliability challenges in autonomous agents through a causal perspective.
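The abstract does not spell out how SCORE counters such false correlations. As a rough, non-authoritative sketch, pessimistic offline value estimation of the kind commonly used for this problem penalizes learning targets by the disagreement of a small Q-ensemble, so that actions whose apparent value rests on chance patterns in the fixed dataset are discounted. The ensemble size, penalty weight `beta`, and the `QNetwork` class below are illustrative assumptions, not the thesis's implementation.

```python
# Illustrative sketch (not the thesis's SCORE algorithm): pessimistic
# Q-learning on a fixed offline batch, where targets are penalized by the
# standard deviation of a small Q-ensemble to discount actions whose high
# estimated value may be due to chance correlations in the data.
import torch
import torch.nn as nn

class QNetwork(nn.Module):  # hypothetical tiny Q-network for discrete actions
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return self.net(obs)

def pessimistic_q_update(ensemble, optimizers, batch, gamma=0.99, beta=1.0):
    """One gradient step per ensemble member with an uncertainty penalty.

    batch: dict of tensors with keys obs, action (long), reward, next_obs, done.
    beta:  weight of the ensemble-disagreement penalty (assumed value).
    """
    obs, act = batch["obs"], batch["action"]
    rew, nxt, done = batch["reward"], batch["next_obs"], batch["done"]

    with torch.no_grad():
        # Stack next-state Q-values from all members: (K, B, n_actions)
        next_qs = torch.stack([q(nxt) for q in ensemble])
        mean_q, std_q = next_qs.mean(0), next_qs.std(0)
        # Pessimistic value: mean minus a multiple of the disagreement
        pessimistic = (mean_q - beta * std_q).max(dim=1).values
        target = rew + gamma * (1.0 - done) * pessimistic

    for q, opt in zip(ensemble, optimizers):
        pred = q(obs).gather(1, act.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(pred, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```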
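Likewise, the matched-pair idea behind CARE can be pictured as comparing hidden activations on prompt pairs that are identical except for the property under study, so that within-pair differences isolate that property's contribution. The helper `get_hidden_activations`, the example prompts, and the steering coefficient `alpha` mentioned below are hypothetical placeholders, not the thesis's actual interface.

```python
# Illustrative sketch of a matched-pair probe (not the thesis's CARE method):
# each pair of prompts differs only in the property under study, so the
# within-pair difference of hidden activations isolates that property.
import numpy as np

def get_hidden_activations(prompt: str) -> np.ndarray:
    """Hypothetical stand-in for reading a model's hidden state for a prompt."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.normal(size=256)

def matched_pair_direction(pairs):
    """Average within-pair activation difference (treated vs. control prompt)."""
    diffs = [get_hidden_activations(treated) - get_hidden_activations(control)
             for treated, control in pairs]
    direction = np.mean(diffs, axis=0)
    return direction / (np.linalg.norm(direction) + 1e-8)

# Matched pairs: same content, differing only in the attribute of interest.
pairs = [
    ("The assistant answers honestly.", "The assistant answers deceptively."),
    ("She reports the result truthfully.", "She reports the result falsely."),
]
direction = matched_pair_direction(pairs)

# A steered activation could then be formed as h + alpha * direction, where
# alpha is a (hypothetical) control strength chosen by the experimenter.
```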