Defining Human Values
Summary
Defining human values is a central challenge in AI alignment research: it requires capturing and encoding the complex moral principles, ethical judgments, and social norms that guide human behavior. Researchers have approached the task in several ways, including building datasets of ethical scenarios and judgments (such as the ETHICS dataset and the Commonsense Norm Bank), training models to predict moral judgments (such as Delphi), and exploring philosophical frameworks for handling normative uncertainty. The difficulty lies not only in identifying universal human values but also in accounting for personal values, competing moral frameworks, and the contextual nature of ethical decision-making. Any attempt to define human values must also grapple with trade-offs between conflicting values, the vagueness inherent in moral concepts, and the need for a well-defined procedure to resolve ontological crises as an AI system's model of the world changes. Ultimately, the goal is AI systems that can make ethically aligned decisions in diverse real-world situations, reflecting the nuanced and sometimes conflicting nature of human values.
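
To make the dataset-plus-model approach concrete, the sketch below frames moral-judgment prediction as supervised text classification, which is roughly the setup behind ETHICS-style benchmarks. The scenarios, labels, and bag-of-words baseline are illustrative assumptions only, not the actual benchmark data or the published systems (which fine-tune large pretrained language models).

```python
# Minimal sketch: predicting whether a described action is judged acceptable,
# treated as binary text classification over (scenario, label) pairs.
# The examples below are hypothetical placeholders, not real benchmark data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for crowd-annotated scenario/judgment pairs; real datasets
# contain tens of thousands of such examples.
scenarios = [
    "I told my friend the truth even though it was uncomfortable.",
    "I returned the wallet I found to its owner.",
    "I read my coworker's private messages without asking.",
    "I took credit for someone else's work in the meeting.",
]
labels = [1, 1, 0, 0]  # 1 = judged acceptable, 0 = judged unacceptable

# A simple bag-of-words baseline; the prediction task has the same shape
# whether the model is a linear classifier or a fine-tuned language model.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(scenarios, labels)

print(classifier.predict(["I lied to get out of helping my neighbor."]))
```

Even this toy setup surfaces the issues raised above: the labels encode one annotator population's norms, and the classifier has no way to represent context, competing values, or genuinely contested judgments.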
Research Papers
- Aligning AI With Shared Human Values
- Delphi: Towards Machine Ethics and Norms
- Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments
- Ontological Crises in Artificial Agents’ Value Systems
- AI Safety and Reproducibility: Establishing Robust Foundations for the Neuropsychology of Human Values
- Towards a Theory of Justice for Artificial Intelligence
- What Would Jiminy Cricket Do? Towards Agents That Behave Morally
- A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents
- What are you optimizing for? Aligning Recommender Systems with Human Values
- Building Ethics into Artificial Intelligence
- Artificial Intelligence, Values, and Alignment
- Alignment of Language Agents
- Friendly Artificial Intelligence: the Physics Challenge