There has been much interest in applying noise to feedforward neural networks in order to observe their effect on network performance. We extend these results by introducing and analyzing various methods of injecting synaptic noise into dynamically driven recurrent networks during training. We present theoretical results which show that applying a controlled amount of noise during training may improve convergence and generalization performance. In addition, we analyze the effects of various noise parameters (additive versus multiplicative, cumulative versus noncumulative, per time step vs. per string) and predict that best overall performance can be achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function which can be viewed as an anticipatory agent to aid convergence. This term appears to find promising regions of weight space in the beginning stages of training when the training error is large and should improve convergence on error surfaces with local minima. The first-order term can be interpreted as a regularization term that can improve generalization. Specifically, this term can encourage internal representations where the state nodes operate in the saturated regions of the sigmoid discriminant function. While this effect can improve performance on automata inference problems with binary inputs and target outputs, it is unclear what effect it will have on other types of problems. To substantiate these predictions, we present simulations on learning the dual parity grammar from temporal strings for all noise models, and present simulations on learning a randomly generated six-state grammar using the predicted best noise model.
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Computer Networks and Communications
- Artificial Intelligence