We present an analytical study of gradient descent algorithms applied to a classification problem in machine learning based on artificial neural networks. Our approach is based on entropy–entropy dissipation estimates that yield explicit rates. Specifically, as long as the neural nets remain within a set of “good classifiers” we establish a striking feature of the algorithm: it mathematically diverges as the number of gradient descent iterations (“time”) goes to infinity but this divergence is only logarithmic, while the loss function vanishes polynomially. As a consequence, this algorithm still yields a classifier that exhibits good numerical performance and may even appear to converge.
All Science Journal Classification (ASJC) codes