Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance

In this paper, a general stochastic optimization procedure is studied,
unifying several variants of stochastic gradient descent, such as the
stochastic heavy ball method, the Stochastic Nesterov Accelerated
Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm
is seen as a noisy Euler discretization of a non-autonomous ordinary
differential equation, recently introduced by Belotto da Silva and Gazeau,
which is analyzed in depth. Assuming that the objective function is non-convex
and differentiable, the stability and the almost sure convergence of the
iterates to the set of critical points are established. A noteworthy special
case is the convergence proof of S-NAG in a non-convex setting. Under some
assumptions, the convergence rate is provided in the form of a Central Limit
Theorem. Finally, the non-convergence of the algorithm to undesired critical
points, such as local maxima or saddle points, is established. Here, the main
ingredient is a new avoidance-of-traps result for non-autonomous settings,
which is of independent interest.
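
To make the unified viewpoint concrete, the sketch below shows how a single momentum update with optional adaptive rescaling recovers the stochastic heavy ball and an Adam-like scheme as special cases of a noisy Euler-type step. The function name, parameter values, and the quadratic test objective are illustrative assumptions for this sketch, not the paper's exact formulation of the algorithm or of the underlying ODE.

```python
import numpy as np

def unified_momentum_step(x, m, v, grad, lr, beta1, beta2, eps, adaptive):
    """One noisy Euler-type step of a generic momentum scheme (illustrative).

    beta1 controls the momentum (heavy-ball) averaging of the gradient;
    beta2 and `adaptive` switch on an Adam-like second-moment rescaling.
    """
    m = beta1 * m + (1.0 - beta1) * grad           # first moment / momentum
    if adaptive:
        v = beta2 * v + (1.0 - beta2) * grad ** 2  # second moment (Adam-like)
        x = x - lr * m / (np.sqrt(v) + eps)        # rescaled update
    else:
        x = x - lr * m                             # stochastic heavy ball
    return x, m, v

# Toy example: minimize f(x) = 0.5 * ||x||^2 from noisy gradient evaluations.
rng = np.random.default_rng(0)
x, m, v = np.ones(2), np.zeros(2), np.zeros(2)
for t in range(2000):
    grad = x + 0.1 * rng.standard_normal(2)        # stochastic gradient
    x, m, v = unified_momentum_step(x, m, v, grad,
                                    lr=0.05, beta1=0.9, beta2=0.999,
                                    eps=1e-8, adaptive=True)
print(x)  # iterates settle near the unique critical point at the origin
```

Setting `adaptive=False` reduces the update to a plain heavy-ball iteration, while time-varying choices of the coefficients would correspond to discretizing a non-autonomous ODE of the kind studied in the paper.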