Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping

Rich Caruana, Steve Lawrence, Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

171 Citations (Scopus)

Abstract

The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity. 2) Regardless of size, nets learn task subcomponents in similar sequence. Big nets pass through stages similar to those learned by smaller nets. Early stopping can stop training the large net when it generalizes comparably to a smaller net. We also show that conjugate gradient can yield worse generalization because it overfits regions of low non-linearity when learning to fit regions of high non-linearity.
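To make the training procedure discussed in the abstract concrete, the following is a minimal sketch (not taken from the paper) of early stopping with a backprop-trained net that has excess hidden capacity. The toy regression task, network size, learning rate, and patience value are illustrative assumptions, not the authors' experimental setup.

```python
# Minimal early-stopping sketch: over-sized one-hidden-layer net trained with
# plain batch gradient descent (backprop), halted on validation error.
# All hyperparameters here are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Toy 1-D regression: smooth target plus noise.
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(3 * x) + 0.1 * rng.normal(size=(n, 1))
    return x, y

x_tr, y_tr = make_data(200)
x_va, y_va = make_data(200)

# Deliberately over-sized hidden layer ("excess capacity").
h = 100
W1 = rng.normal(scale=0.5, size=(1, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.5, size=(h, 1)); b2 = np.zeros(1)

def forward(x):
    a = np.tanh(x @ W1 + b1)        # hidden activations
    return a, a @ W2 + b2           # hidden layer, network output

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

lr, patience = 0.05, 200
best_val, best_params, wait = np.inf, None, 0

for epoch in range(20000):
    # Forward and backward pass (backprop with mean-squared error).
    a, out = forward(x_tr)
    err = 2 * (out - y_tr) / len(x_tr)      # dL/d_out
    gW2 = a.T @ err
    gb2 = err.sum(0)
    da = (err @ W2.T) * (1 - a ** 2)        # backprop through tanh
    gW1 = x_tr.T @ da
    gb1 = da.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

    # Early stopping: remember the weights with the lowest validation error
    # and halt once it has not improved for `patience` epochs.
    val = mse(forward(x_va)[1], y_va)
    if val < best_val:
        best_val, wait = val, 0
        best_params = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
    else:
        wait += 1
        if wait > patience:
            break

W1, b1, W2, b2 = best_params
print(f"stopped at epoch {epoch}, best validation MSE {best_val:.4f}")
```

The sketch only illustrates the abstract's central point that an over-sized net trained with backprop plus validation-based early stopping can still generalize; the paper's comparison with conjugate gradient training is not reproduced here.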

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 13 - Proceedings of the 2000 Conference, NIPS 2000
Publisher: Neural Information Processing Systems Foundation
ISBN (Print): 0262122413, 9780262122412
State: Published - 2001
Event: 14th Annual Neural Information Processing Systems Conference, NIPS 2000 - Denver, CO, United States
Duration: Nov 27, 2000 – Dec 2, 2000

Other

Other: 14th Annual Neural Information Processing Systems Conference, NIPS 2000
Country: United States
City: Denver, CO
Period: 11/27/00 – 12/2/00

Fingerprint

  • Backpropagation
  • Neural networks
  • Experiments

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Caruana, R., Lawrence, S., & Giles, L. (2001). Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In Advances in Neural Information Processing Systems 13 - Proceedings of the 2000 Conference, NIPS 2000. Neural Information Processing Systems Foundation.