Learning from Human Feedback
Summary
Learning from Human Feedback is a core area of AI alignment research: it aims to build AI systems that understand and fulfill human intentions even when those intentions are hard to formalize. Agents are trained on human-provided feedback, which can take forms such as binary evaluations, visual explanations, or physical corrections. Recent work has examined how feedback depends on the agent's current policy, how nonverbal robot feedback can improve human teaching, and how to account for misspecification in human objective spaces or teaching styles. Competitions such as MineRL BASALT and algorithms such as COACH, EXPAND, and ReQueST tackle sample efficiency, safety during learning, and scaling to complex 3D environments. Together, these efforts aim to produce adaptable, safe, and aligned AI systems that learn effectively from human guidance in diverse contexts.
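To make the policy-dependent feedback idea concrete, below is a minimal sketch in the spirit of COACH (the algorithm from "Interactive Learning from Policy-Dependent Human Feedback"): the human's scalar feedback on the action just taken is interpreted as an advantage estimate and plugged directly into a policy-gradient update. The environment size, learning rate, preferred-action table, and simulated "human" are illustrative assumptions, not details from the papers listed here.

```python
# Sketch of a COACH-style update: human feedback f_t is treated as the
# advantage of action a_t in state s_t, giving the update
#   theta <- theta + alpha * f_t * grad log pi(a_t | s_t).
# The tabular softmax policy and the scripted feedback below are
# assumptions made for illustration only.
import numpy as np

N_STATES, N_ACTIONS = 5, 3
theta = np.zeros((N_STATES, N_ACTIONS))   # policy parameters (preferences)
alpha = 0.1                               # learning rate

def policy(state):
    """Softmax action probabilities for one state."""
    prefs = theta[state]
    exp = np.exp(prefs - prefs.max())
    return exp / exp.sum()

def coach_update(state, action, feedback):
    """One COACH-style step: feedback plays the role of the advantage."""
    probs = policy(state)
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0            # gradient of log softmax
    theta[state] += alpha * feedback * grad_log_pi

# Toy interaction loop with a stand-in "human": feedback is +1 when the
# agent picks the (hypothetical) preferred action for a state, else -1.
preferred = np.array([0, 2, 1, 0, 2])
rng = np.random.default_rng(0)
for step in range(2000):
    s = rng.integers(N_STATES)
    a = rng.choice(N_ACTIONS, p=policy(s))
    f = 1.0 if a == preferred[s] else -1.0
    coach_update(s, a, f)

print([int(np.argmax(policy(s))) for s in range(N_STATES)])  # ~ preferred
```

Because the feedback is applied relative to the agent's current policy rather than as a fixed reward label, the same action can receive different effective credit as the policy improves, which is the property the policy-dependent feedback literature studies.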
Research Papers
- The MineRL BASALT Competition on Learning from Human Feedback
- Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation
- Interactive Learning from Policy-Dependent Human Feedback
- Nonverbal Robot Feedback for Human Teachers
- Learning under Misspecified Objective Spaces
- Parenting: Safe Reinforcement Learning from Human Input
- Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning
- Safe Deep RL in 3D Environments using Human Feedback