Lessons in neural network training: overfitting may be harder than expected

Steve Lawrence, C. Lee Giles, Ah Chung Tsoi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

124 Citations (Scopus)

Abstract

For many reasons, neural networks have become very popular AI machine learning models. Two of the most important aspects of machine learning models are how well the model generalizes to unseen data, and how well the model scales with problem complexity. Using a controlled task with known optimal training error, we investigate the convergence of the backpropagation (BP) algorithm. We find that the optimal solution is typically not found. Furthermore, we observe that networks larger than might be expected can result in lower training and generalization error. This result is supported by another real-world example. We further investigate the training behavior by analyzing the weights in trained networks (excess degrees of freedom are seen to do little harm and to aid convergence), and by contrasting the interpolation characteristics of multi-layer perceptron (MLP) neural networks and polynomial models (the overfitting behavior is very different: the MLP is often biased towards smoother solutions). Finally, we analyze relevant theory outlining the reasons for the significant practical differences. These results call into question common beliefs about neural network training regarding convergence and optimal network size, suggest alternate guidelines for practical use (less fear of excess degrees of freedom), and help to direct future work (e.g., methods for creating more parsimonious solutions, the importance of the MLP/BP bias, and the possibly worse performance of 'improved' training algorithms).
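The controlled-task setup the abstract describes is straightforward to reproduce in miniature. The sketch below is an illustrative assumption rather than the authors' code: data is generated by a small "teacher" MLP, so the optimal training error is zero by construction, and plain batch backpropagation is then run on "student" MLPs of increasing hidden-layer width. All names and hyperparameters (widths, learning rate, step count) are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out):
    # Small random single-hidden-layer tanh network.
    return {"W1": rng.normal(0, 1.0, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0, 1.0, (n_hidden, n_out)), "b2": np.zeros(n_out)}

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

# Teacher with 4 hidden units: any student with >= 4 units can represent it
# exactly, so the optimal training MSE is 0 by construction.
teacher = init_mlp(1, 4, 1)
X = rng.uniform(-3, 3, (64, 1))
_, y = forward(teacher, X)

def train(n_hidden, steps=20000, lr=0.05):
    # Plain batch backpropagation: gradient descent on mean squared error.
    p = init_mlp(1, n_hidden, 1)
    n = len(X)
    for _ in range(steps):
        h, out = forward(p, X)
        err = out - y
        dh = (err @ p["W2"].T) * (1.0 - h**2)  # tanh derivative
        grads = {"W2": h.T @ err / n, "b2": err.mean(0),
                 "W1": X.T @ dh / n, "b1": dh.mean(0)}
        for k in p:
            p[k] -= lr * grads[k]
    _, out = forward(p, X)
    return float(np.mean((out - y) ** 2))

for width in (4, 8, 16, 32):
    print(f"hidden={width:3d}  final training MSE = {train(width):.2e}")

In runs of this kind, the minimally sized student (here, width 4) often stalls at a training error well above the known optimum, while the overparameterized students converge further, matching the pattern the abstract reports. A companion experiment using numpy.polyfit on the same points illustrates the interpolation contrast: an exactly interpolating polynomial oscillates between samples, where the trained MLP tends toward a smoother fit.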

Original language: English (US)
Title of host publication: Proceedings of the National Conference on Artificial Intelligence
Editors: Anon
Publisher: AAAI
Pages: 540-545
Number of pages: 6
State: Published - 1997
Event: 14th National Conference on Artificial Intelligence, AAAI 97 - Providence, RI, USA
Duration: Jul 27, 1997 - Jul 31, 1997


Fingerprint

  • Neural networks
  • Learning systems
  • Backpropagation algorithms
  • Multilayer neural networks
  • Backpropagation
  • Interpolation
  • Statistical models

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Lawrence, S., Giles, C. L., & Tsoi, A. C. (1997). Lessons in neural network training: overfitting may be harder than expected. In Anon (Ed.), Proceedings of the National Conference on Artificial Intelligence (pp. 540-545). AAAI.
@inproceedings{74e7076364d44b9fad4735996944615e,
  title     = "Lessons in neural network training: overfitting may be harder than expected",
  author    = "Steve Lawrence and Giles, {C. Lee} and Tsoi, {Ah Chung}",
  editor    = "Anon",
  booktitle = "Proceedings of the National Conference on Artificial Intelligence",
  publisher = "AAAI",
  pages     = "540--545",
  year      = "1997",
  language  = "English (US)",
}
