IEEE Transactions on Automatic Control, Vol. 64, No. 11, pp. 4818-4824, 2019
Revisiting Normalized Gradient Descent: Fast Evasion of Saddle Points
The paper considers normalized gradient descent (NGD), a natural modification of classical gradient descent (GD) for optimization problems. A serious shortcoming of GD in nonconvex problems is that it can take arbitrarily long to escape the neighborhood of a saddle point; in practice, this can significantly slow convergence, particularly in high-dimensional nonconvex problems. The paper shows that, in contrast to GD, NGD escapes saddle points "quickly." Focusing on continuous-time dynamics, it is shown that (1) NGD "almost never" converges to saddle points and (2) the time required for NGD to escape a ball of radius r about a saddle point x* is at most 5√κ r, where κ is the condition number of the Hessian of f at x*. As a simple application of these results, a global convergence-time bound is established for NGD under mild assumptions.
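For intuition, the following is a minimal numerical sketch, not taken from the paper (which analyzes the continuous-time flow): a forward-Euler discretization of GD, ẋ = -∇f(x), and of NGD, ẋ = -∇f(x)/‖∇f(x)‖, applied to the simple quadratic saddle f(x, y) = (x² - y²)/2. The function escape_time, the step size dt, and the starting point are illustrative assumptions; near the stable manifold of the saddle, the simulated GD escape time grows without bound while the NGD escape time stays bounded, consistent with the 5√κ r bound (here κ = 1).

```python
# Minimal sketch (not from the paper): forward-Euler discretizations of
#   GD:  x' = -grad f(x)
#   NGD: x' = -grad f(x) / ||grad f(x)||
# on the quadratic saddle f(x, y) = (x^2 - y^2)/2, whose Hessian has
# condition number kappa = 1 and whose only critical point (the origin)
# is a saddle. Step size, starting point, and names are illustrative.
import numpy as np

def grad_f(z):
    """Gradient of f(x, y) = (x^2 - y^2) / 2."""
    x, y = z
    return np.array([x, -y])

def escape_time(normalized, z0, r=1.0, dt=1e-3, max_steps=200_000):
    """Simulated time to leave the ball of radius r around the saddle at the
    origin, or None if the trajectory never leaves within max_steps."""
    z = np.array(z0, dtype=float)
    for k in range(max_steps):
        if np.linalg.norm(z) > r:
            return k * dt
        g = grad_f(z)
        gnorm = np.linalg.norm(g)
        if gnorm < 1e-12:          # exactly at a critical point: no motion
            return None
        z -= dt * (g / gnorm if normalized else g)
    return None

# Start near the saddle's stable manifold (the x-axis). GD's escape time
# grows like log(1/|y0|) as y0 -> 0, while NGD's stays bounded.
z0 = (0.5, 1e-4)
print("GD escape time :", escape_time(False, z0))   # roughly 9.2
print("NGD escape time:", escape_time(True, z0))    # roughly 1.5
```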