Local minima and generalization

Steve Lawrence, Ah Chung Tsoi, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

6 Scopus citations

Abstract

We consider a number of popular beliefs within the neural network community about the training and generalization behavior of multi-layer perceptrons and, to some extent, recurrent networks: (a) 'the solution found is often close to the global minimum in terms of the magnitude of the error', (b) 'smaller networks generalize better than larger networks', (c) 'the number of parameters in the network should be less than the number of data points in order to provide good generalization'. For the tasks and methodology we consider, we show that (a) the solution found is often significantly worse than the global minimum, (b) oversize networks can provide improved generalization due to their ability to find better solutions, and (c) the optimal number of parameters with respect to generalization error can be much larger than the number of data points.
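
As a rough illustration of the comparison behind belief (c), the sketch below counts the trainable parameters of a fully connected multi-layer perceptron and sets that count against the number of training points. The layer sizes and the data set size are hypothetical values chosen only for illustration; they are not taken from the paper, and the helper name mlp_parameter_count is likewise an assumption.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights plus biases in a fully connected MLP.

    layer_sizes lists the number of units per layer, input first,
    e.g. [20, 10, 1] for 20 inputs, 10 hidden units, 1 output.
    """
    # Each unit in a layer has one weight per unit in the previous
    # layer, plus one bias term.
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical setup, not from the paper: 200 training points, 20 inputs.
n_data_points = 200
for n_hidden in (5, 10, 50):
    n_params = mlp_parameter_count([20, n_hidden, 1])
    relation = "more" if n_params > n_data_points else "no more"
    print(f"{n_hidden} hidden units: {n_params} parameters, "
          f"{relation} than the {n_data_points} data points")
```

Under belief (c), only the smallest of these networks would be considered acceptable; the abstract's finding is that, for the tasks studied, the best generalization can occur in the regime where the parameter count exceeds the number of data points.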

Original language: English (US)
Title of host publication: IEEE International Conference on Neural Networks - Conference Proceedings
Publisher: IEEE
Pages: 371-376
Number of pages: 6
Volume: 1
State: Published - 1996
Event: Proceedings of the 1996 IEEE International Conference on Neural Networks, ICNN. Part 1 (of 4) - Washington, DC, USA
Duration: Jun 3 1996 to Jun 6 1996

Other

Other: Proceedings of the 1996 IEEE International Conference on Neural Networks, ICNN. Part 1 (of 4)
City: Washington, DC, USA
Period: 6/3/96 to 6/6/96

All Science Journal Classification (ASJC) codes

  • Software
