Learning Curve Theory

http://arxiv.org/abs/2102.04074v1

Abstract

Recently a number of empirical 'universal' scaling-law papers have been published, most notably by OpenAI. 'Scaling laws' refers to power-law decreases of training or test error w.r.t. more data, larger neural networks, and/or more compute. In this work we focus on scaling w.r.t. data size. Theoretical understanding of this phenomenon is largely lacking, except in finite-dimensional models for which error typically decreases with $n^{-1/2}$ or $n^{-1}$, where $n$ is the sample size. We develop and theoretically analyse the simplest possible (toy) model that can exhibit $n^{-\beta}$ learning curves for arbitrary power $\beta>0$, and determine whether power laws are universal or depend on the data distribution.
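As a minimal illustration of the kind of scaling behaviour the abstract describes (not code from the paper), the sketch below generates a synthetic learning curve of the assumed form $\mathrm{error}(n) \approx c\, n^{-\beta}$ and recovers the exponent $\beta$ by a least-squares line fit in log-log space. All names and parameter values are hypothetical.

```python
import numpy as np

# Minimal sketch (not from the paper): estimate the power-law exponent beta
# from an empirical learning curve, assuming error(n) ~ c * n^{-beta}.

rng = np.random.default_rng(0)

# Hypothetical sample sizes and synthetic errors following a beta = 0.5 power
# law with mild multiplicative noise.
true_beta, c = 0.5, 2.0
n = np.logspace(1, 5, 20)
error = c * n ** (-true_beta) * np.exp(0.05 * rng.standard_normal(n.size))

# In log-log space the power law is linear: log error = log c - beta * log n,
# so an ordinary least-squares line fit recovers the exponent.
slope, intercept = np.polyfit(np.log(n), np.log(error), 1)
print(f"estimated beta ~= {-slope:.3f}, estimated c ~= {np.exp(intercept):.3f}")
```

On a log-log plot such a curve appears as a straight line with slope $-\beta$, which is how power-law scaling is typically diagnosed empirically.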