Dynamic Adversarial Training

Summary

Dynamic Adversarial Training, particularly Dynamic Adversarial Data Collection (DADC), is a promising approach for building robust machine learning models by generating diverse and challenging training data. In each round, human annotators craft examples designed to fool the current model; the model is then retrained on those examples, and annotation continues against the improved model. Research has shown that extended DADC, conducted over many rounds (e.g., 20 rounds), can significantly enhance model performance and generalization. Models trained on DADC examples show lower error rates on expert-curated test sets than models trained on non-adversarial data. DADC's advantages include harder, more lexically and syntactically diverse examples with fewer annotation artifacts, ultimately yielding better model robustness across a wide range of test inputs.
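The round-based loop described above can be sketched in code. This is a toy illustration, not any published DADC implementation: a 1D threshold classifier stands in for the model, a known labeling rule stands in for ground truth, and random probing for misclassified inputs stands in for human annotators. All function names (`train`, `annotate_adversarial`, `dadc`) are hypothetical.

```python
import random

def label_fn(x):
    # Ground-truth rule the simulated "annotators" know: positive iff x >= 0.5.
    return 1 if x >= 0.5 else 0

def train(dataset):
    # Fit a toy 1D model: predict 1 when x >= threshold.
    # Pick the threshold (from observed points) minimizing training error.
    candidates = [x for x, _ in dataset]
    return min(candidates, key=lambda t: sum(
        (1 if x >= t else 0) != y for x, y in dataset))

def annotate_adversarial(threshold, n_tries=200, rng=None):
    # Simulated annotators probe the current model and keep only the
    # examples it misclassifies -- the "fooling" examples of this round.
    rng = rng or random.Random(0)
    fooling = []
    for _ in range(n_tries):
        x = rng.uniform(0, 1)
        pred = 1 if x >= threshold else 0
        if pred != label_fn(x):
            fooling.append((x, label_fn(x)))
    return fooling

def dadc(rounds=5, seed=0):
    rng = random.Random(seed)
    # Seed round: a small non-adversarial dataset.
    dataset = [(x, label_fn(x)) for x in (rng.uniform(0, 1) for _ in range(10))]
    threshold = train(dataset)
    for _ in range(rounds):
        new_examples = annotate_adversarial(threshold, rng=rng)
        if not new_examples:
            break  # model no longer fooled at this probing budget
        dataset.extend(new_examples)          # accumulate adversarial data
        threshold = train(dataset)            # retrain before the next round
    return threshold
```

Each round concentrates new training data exactly where the current model fails, so the learned threshold drifts toward the true decision boundary at 0.5 -- a miniature version of the robustness gains the summary attributes to multi-round DADC.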

Research Papers