Certified Adversarial Robustness via Randomized Smoothing

http://arxiv.org/abs/1902.02918v2

Abstract

We show how to turn any classifier that classifies well under Gaussian noise into a new classifier that is certifiably robust to adversarial perturbations under the ℓ2 norm. This "randomized smoothing" technique has been proposed recently in the literature, but existing guarantees are loose. We prove a tight robustness guarantee in ℓ2 norm for smoothing with Gaussian noise. We use randomized smoothing to obtain an ImageNet classifier with e.g. a certified top-1 accuracy of 49% under adversarial perturbations with ℓ2 norm less than 0.5 (= 127/255). No certified defense has been shown feasible on ImageNet except for smoothing. On smaller-scale datasets where competing approaches to certified ℓ2 robustness are viable, smoothing delivers higher certified accuracies. Our strong empirical results suggest that randomized smoothing is a promising direction for future research into adversarially robust classification. Code and models are available at http://github.com/locuslab/smoothing.
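
As a rough illustration of the core idea (not the authors' reference implementation, which is in the linked repository), the smoothed classifier g(x) = argmax_c P[f(x + ε) = c], ε ~ N(0, σ²I), can be approximated by Monte Carlo sampling: classify many noisy copies of the input with the base classifier f and take the majority vote. The names `base_classifier`, `sigma`, and `num_samples` below are illustrative assumptions, and the paper's actual certification procedure additionally uses statistical confidence bounds and may abstain.

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma=0.5, num_samples=100):
    """Monte Carlo sketch of the smoothed classifier
    g(x) = argmax_c P[ f(x + eps) = c ],  eps ~ N(0, sigma^2 I).

    `base_classifier` is any callable mapping an input array to a class
    label (a hypothetical interface; see the repository for the real code).
    """
    counts = {}
    for _ in range(num_samples):
        # Perturb the input with isotropic Gaussian noise of std sigma.
        noisy = x + np.random.normal(scale=sigma, size=x.shape)
        label = base_classifier(noisy)
        counts[label] = counts.get(label, 0) + 1
    # The majority class over noisy samples approximates argmax_c P[f(x+eps)=c].
    return max(counts, key=counts.get)
```

A larger `num_samples` tightens the Monte Carlo estimate at proportional compute cost, and `sigma` trades off certified radius against clean accuracy.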