Abstract
We consider a number of popular beliefs within the neural network community on the training and generalization behavior of multi-layer perceptrons, and to some extent recurrent networks: (a) 'the solution found is often close to the global minimum in terms of the magnitude of the error', (b) 'smaller networks generalize better than larger networks', and (c) 'the number of parameters in the network should be less than the number of data points in order to provide good generalization'. For the tasks and methodology we consider, we show that (a) the solution found is often significantly worse than the global minimum, (b) oversize networks can provide improved generalization due to their ability to find better solutions, and (c) the optimal number of parameters with respect to generalization error can be much larger than the number of data points.
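The following is a minimal illustrative sketch, not the paper's experimental setup: it trains scikit-learn MLPs of increasing hidden-layer size on a hypothetical noisy sine-regression task and compares their test error, showing how one might probe the claim that an oversize network (more parameters than data points) need not generalize worse. The task, sizes, and solver choice are all assumptions for illustration.

```python
# Illustrative sketch (assumed toy task, not the paper's methodology):
# compare test error of small vs. oversize MLPs on a synthetic problem.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))                     # 200 data points
y = np.sin(4.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)    # noisy target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)                      # 100 training points

for hidden in (2, 10, 50, 200):                                # small to oversize nets
    net = MLPRegressor(hidden_layer_sizes=(hidden,),
                       solver="lbfgs", max_iter=5000, random_state=0)
    net.fit(X_train, y_train)
    mse = mean_squared_error(y_test, net.predict(X_test))
    # With hidden=200 the net has 601 parameters, far more than the
    # 100 training points, yet its test MSE can still be competitive.
    print(f"hidden units: {hidden:4d}  test MSE: {mse:.4f}")
```

On such a toy task the largest network does not necessarily show the worst test error, which is consistent in spirit with the abstract's point (b), though the actual evidence comes from the experiments reported in the paper itself.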
| Original language | English (US) |
| --- | --- |
| Title of host publication | IEEE International Conference on Neural Networks - Conference Proceedings |
| Publisher | IEEE |
| Pages | 371-376 |
| Number of pages | 6 |
| Volume | 1 |
| State | Published - 1996 |
| Event | Proceedings of the 1996 IEEE International Conference on Neural Networks, ICNN. Part 1 (of 4) - Washington, DC, USA; Duration: Jun 3, 1996 → Jun 6, 1996 |
Other

| Other | Proceedings of the 1996 IEEE International Conference on Neural Networks, ICNN. Part 1 (of 4) |
| --- | --- |
| City | Washington, DC, USA |
| Period | 6/3/96 → 6/6/96 |
All Science Journal Classification (ASJC) codes
- Software