Interactive Explanations

Summary

Interactive explanations are an approach to improving communication between reinforcement learning agents and their users. The method uses natural language templates to create a two-way communication channel: the agent explains its decision-making process, and the user provides feedback and corrections in return. Because the agent's decision process is made transparent, users can diagnose problems and suggest modifications to its behavior, including specific actions, goals, and the reasoning behind them. The approach has been tested in a video game environment, where it proved effective at diagnosing and repairing agent behaviors. Interactive explanations thus offer a promising way to align agent behavior with user preferences, bridging the gap between complex reinforcement learning models and human understanding.
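
To make the two-way loop concrete, the Python sketch below shows one way a template-based exchange could be wired up. Everything here is an illustrative assumption rather than the published system: the Agent class, its q_values table, the explanation template, and the apply_feedback hook are hypothetical names invented for this example.

```python
# A minimal sketch of a template-based interactive explanation loop.
# All names (Agent, explain, apply_feedback, q_values) are illustrative
# assumptions, not the interface of any published system.

from dataclasses import dataclass, field

# Natural-language template that exposes the agent's internals to the user.
EXPLANATION_TEMPLATE = (
    "I chose action '{action}' in state '{state}' because it advances "
    "my current goal '{goal}' (estimated value: {value:.2f})."
)

@dataclass
class Agent:
    goal: str
    # Hypothetical learned values for (state, action) pairs.
    q_values: dict = field(default_factory=dict)

    def act(self, state: str) -> str:
        # Pick the highest-valued action available in this state.
        actions = {a: v for (s, a), v in self.q_values.items() if s == state}
        return max(actions, key=actions.get)

    def explain(self, state: str, action: str) -> str:
        # Fill the template with the agent's current goal and value estimate,
        # making its decision procedure transparent to the user.
        return EXPLANATION_TEMPLATE.format(
            action=action, state=state, goal=self.goal,
            value=self.q_values[(state, action)],
        )

    def apply_feedback(self, feedback: dict) -> None:
        # The user can correct the goal or override action values,
        # closing the second direction of the communication channel.
        if "goal" in feedback:
            self.goal = feedback["goal"]
        for (state, action), value in feedback.get("values", {}).items():
            self.q_values[(state, action)] = value

agent = Agent(goal="reach the exit",
              q_values={("room_1", "go_left"): 0.9,
                        ("room_1", "go_right"): 0.4})
action = agent.act("room_1")
print(agent.explain("room_1", action))

# The user disagrees with the agent's reasoning and corrects both its goal
# and the value it assigns to an alternative action.
agent.apply_feedback({"goal": "pick up the key",
                      "values": {("room_1", "go_right"): 1.0}})
print(agent.explain("room_1", agent.act("room_1")))
```

In this sketch the diagnosis step is the user reading the templated explanation, and the repair step is the structured feedback dictionary, mirroring the action-, goal-, and reasoning-level corrections described above.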

Research Papers