Distributional Generalization
Summary
Distributional Generalization is a concept in machine learning that extends the classical notion of generalization. Rather than comparing only average error, it asserts that a classifier's outputs at training time and at test time are close as entire distributions. A concrete illustration: if a certain fraction of training examples is mislabeled, an interpolating classifier tends to reproduce a similar fraction and pattern of mislabeling on the test set. Classical generalization concerns the average error alone; distributional generalization additionally accounts for how errors are distributed across the input domain.

The authors present formal conjectures predicting the form that distributional generalization takes as a function of model architecture, training procedure, sample size, and data distribution. These conjectures are supported by empirical evidence from diverse areas of machine learning, including neural networks, kernel machines, and decision trees, providing new insight into the behavior of interpolating classifiers and advancing our understanding of generalization.
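The mislabeling example above can be sketched in a small simulation. This is a minimal illustration, not code from the paper: it uses a 1-nearest-neighbor classifier as the interpolating model and a synthetic 1-D threshold problem (both assumptions chosen for simplicity). Because 1-NN fits every training label exactly, including the flipped ones, its test error ends up close to the training noise rate, matching the distributional-generalization intuition.

```python
import random

random.seed(0)
p_noise = 0.2  # assumed fraction of training labels flipped


def true_label(x):
    # Ground-truth rule for the synthetic task: threshold at 0.5.
    return 1 if x > 0.5 else 0


# Build a noisy training set; an interpolating classifier
# will fit these (possibly flipped) labels exactly.
train = []
for _ in range(2000):
    x = random.random()
    y = true_label(x)
    if random.random() < p_noise:
        y = 1 - y  # label noise
    train.append((x, y))


def predict_1nn(x):
    # 1-NN interpolates: it returns the stored label of the
    # nearest training point, noise and all.
    return min(train, key=lambda t: abs(t[0] - x))[1]


# Test error = fraction of test points where the prediction
# disagrees with the TRUE (clean) label.
test_xs = [random.random() for _ in range(2000)]
errors = sum(predict_1nn(x) != true_label(x) for x in test_xs)
test_error = errors / len(test_xs)
print(f"test error: {test_error:.3f}")  # close to the 20% training noise rate
```

Despite fitting the noisy training data perfectly (zero training error on the flipped labels), the classifier's test errors mirror the training-set mislabeling rate rather than vanishing or exploding, which is the pattern distributional generalization describes.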