International Journal of Control, Vol. 62, No. 6, pp. 1391-1407, 1995
Overtraining, Regularization and Searching for a Minimum, with Application to Neural Networks
In this paper we discuss the role of criterion minimization as a means for parameter estimation. Most traditional methods, such as maximum likelihood and prediction error identification, are based on this principle. However, somewhat surprisingly, it turns out that it is not always 'optimal' to try to find the absolute minimum point of the criterion. The reason is that 'stopped minimization' (where the iterations are terminated before the absolute minimum has been reached) has essentially the same properties as regularization (adding a parametric penalty term to the criterion). Regularization is known to have a beneficial effect on the variance of the parameter estimates, and it reduces the 'variance contribution' to the misfit. This also explains the concept of 'overtraining' in neural networks. How, then, does one know when to terminate the iterations? A common rule is to stop the iterations when the criterion, evaluated on a validation data set, no longer decreases. In this paper, however, we show that applying this rule too extensively may produce an estimate that is, in effect, an unregularized estimate for the combined data set (estimation plus validation data).
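As a concrete illustration of the connection discussed above, the following minimal sketch (not from the paper; the toy data, step size, penalty weight and stopping rule are assumptions made purely for illustration) compares gradient descent on a quadratic fit criterion, terminated when the validation criterion stops decreasing, with exact minimization of the same criterion plus a quadratic penalty delta*||theta||^2, i.e. ridge regression:

```python
# A minimal sketch (not from the paper) of stopped minimization versus
# regularization. All names and the toy data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy overparameterized linear regression: y = X @ theta_true + noise.
n, p = 40, 20
X = rng.normal(size=(n, p))
theta_true = np.zeros(p)
theta_true[:3] = [2.0, -1.0, 0.5]
y = X @ theta_true + 0.5 * rng.normal(size=n)

# Split into estimation and validation data.
Xe, ye, Xv, yv = X[:30], y[:30], X[30:], y[30:]

def criterion(theta, X, y):
    """Quadratic fit criterion: mean squared residual."""
    r = X @ theta - y
    return r @ r / len(y)

# (a) Stopped minimization: plain gradient descent from theta = 0,
#     terminated when the validation criterion no longer decreases.
theta = np.zeros(p)
lr = 0.01
best_val, best_theta = np.inf, theta.copy()
for it in range(10_000):
    grad = 2.0 * Xe.T @ (Xe @ theta - ye) / len(ye)
    theta -= lr * grad
    val = criterion(theta, Xv, yv)
    if val < best_val:
        best_val, best_theta = val, theta.copy()
    elif it > 100:  # validation criterion has stopped decreasing
        break

# (b) Regularized estimate: minimize the criterion plus a parametric
#     penalty delta * ||theta||^2 exactly (ridge regression); delta is
#     picked by hand here.
delta = 0.1
theta_reg = np.linalg.solve(Xe.T @ Xe / len(ye) + delta * np.eye(p),
                            Xe.T @ ye / len(ye))

print("||theta_stopped - theta_reg|| =",
      np.linalg.norm(best_theta - theta_reg))
print("val loss (stopped):    ", criterion(best_theta, Xv, yv))
print("val loss (regularized):", criterion(theta_reg, Xv, yv))
```

With a suitable choice of delta, the early-stopped iterate and the regularized estimate land close to one another, which is the correspondence the paper analyzes; the number of iterations before stopping plays a role analogous to the inverse of the penalty weight.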