Adversarial Machine Learning
Summary
Adversarial machine learning studies vulnerabilities in ML systems and develops techniques to make them more robust to adversarial attacks. Key aspects include:
- Identifying and generating adversarial examples: inputs that are minimally perturbed yet cause misclassification. This includes targeted and untargeted attacks under various threat models (white-box, black-box, etc.); a minimal attack sketch follows the summary.
- Developing defenses and training techniques to improve model robustness, such as adversarial training, certified defenses, and randomized smoothing; a prediction sketch for smoothing also follows the summary.
- Evaluating robustness across different perturbation types, sizes, and out-of-distribution scenarios.
- Studying the transferability of attacks and defenses across models and domains.
- Analyzing the geometry and manifold structure of adversarial examples.
- Exploring adversarial vulnerabilities in real-world applications such as medical imaging.
- Developing formal verification methods to provide robustness guarantees; a single-layer bound-propagation sketch closes the summary.
- Investigating the interplay between adversarial robustness and other desirable properties such as accuracy and generalization.
Overall, this is an active area of research aiming to improve the security and reliability of ML systems against adversarial threats.
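The attack setting in the first bullet is easiest to see in code. Below is a minimal sketch of an untargeted white-box attack using the fast gradient sign method (FGSM), assuming a PyTorch image classifier with pixel values in [0, 1]; the function name fgsm_attack and the budget eps are illustrative, not taken from any of the listed papers.

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                eps: float = 8 / 255) -> torch.Tensor:
    """Untargeted white-box FGSM: one signed-gradient step of size eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each pixel in the direction that increases the loss,
    # then clamp back to the valid image range.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

Iterating this step with projection back onto the eps-ball yields PGD, the attack underlying several of the papers below.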
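Randomized smoothing (see "Certified Adversarial Robustness via Randomized Smoothing" below) turns any base classifier into a smoothed one by majority vote over Gaussian-noised copies of the input; the certified radius then follows from the vote margin. Here is a minimal prediction-only sketch, again assuming a PyTorch classifier; smoothed_predict, sigma, and n are illustrative, and a full implementation also needs the paper's statistical certification test.

```python
import torch

@torch.no_grad()
def smoothed_predict(model, x: torch.Tensor, sigma: float = 0.25,
                     n: int = 100) -> int:
    """Majority vote of the base classifier over n Gaussian-noised copies of x."""
    noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
    votes = model(noisy).argmax(dim=1)   # one predicted class per noisy copy
    return votes.mode().values.item()    # most common class wins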
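Formal verification methods such as interval bound propagation (used in the symbol-substitution paper below) push box constraints on the input through the network layer by layer. A sketch for a single linear layer, under the assumption of elementwise input bounds; ibp_linear is an illustrative name. A ReLU is then handled by clamping both output bounds at zero.

```python
import torch

def ibp_linear(W: torch.Tensor, b: torch.Tensor,
               lower: torch.Tensor, upper: torch.Tensor):
    """Propagate elementwise bounds [lower, upper] through y = W @ x + b."""
    center = (upper + lower) / 2
    radius = (upper - lower) / 2
    y_center = W @ center + b
    y_radius = W.abs() @ radius  # worst-case deviation over the input box
    return y_center - y_radius, y_center + y_radius
```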
Research Papers
- Quantifying Perceptual Distortion of Adversarial Examples
- Fooling the primate brain with minimal, targeted image manipulation
- Transfer of Adversarial Robustness Between Perturbation Types
- Pyramid Adversarial Training Improves ViT Performance
- Adversarial Logit Pairing
- Certified Adversarial Robustness via Randomized Smoothing
- Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
- Certified Defenses against Adversarial Examples
- An Alternative Surrogate Loss for PGD-based Adversarial Testing
- Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations
- CEB Improves Model Robustness
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT
- Testing Robustness Against Unforeseen Adversaries
- A Marauder’s Map of Security and Privacy in Machine Learning
- Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines
- Certified Patch Robustness via Smoothed Vision Transformers
- Sufficient Conditions for Idealised Models to Have No Adversarial Examples: a Theoretical and Empirical Study with Bayesian Neural Networks
- Natural Adversarial Examples
- On the Geometry of Adversarial Examples
- The LogBarrier adversarial attack: making effective use of decision boundary information
- Discrete Representations Strengthen Vision Transformer Robustness
- Adversarial Attacks Against Medical Deep Learning Systems
- Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation
- Playing the Game of Universal Adversarial Perturbations