TY - JOUR

T1 - Asynchronous variance-reduced block schemes for composite non-convex stochastic optimization

T2 - block-specific steplengths and adapted batch-sizes

AU - Lei, Jinlong

AU - Shanbhag, Uday V.

N1 - Publisher Copyright:
© 2020, © 2020 Informa UK Limited, trading as Taylor & Francis Group.

PY - 2020

Y1 - 2020

N2 - This work considers the minimization of the sum of an expectation-valued, coordinate-wise smooth nonconvex function and a nonsmooth, block-separable convex regularizer. We propose an asynchronous variance-reduced algorithm in which, at each iteration, a single block is randomly chosen to update its estimates via a proximal variable sample-size stochastic gradient scheme, while the remaining blocks are kept invariant. Notably, each block employs a steplength relying on its block-specific Lipschitz constant, while batch-sizes are updated as a function of the number of times that block is selected. We show that every limit point is a stationary point and establish the ergodic non-asymptotic rate O(1/K). Iteration and oracle complexity to obtain an ε-stationary point are shown to be O(1/ε) and O(1/ε²), respectively. Furthermore, under a proximal Polyak–Łojasiewicz condition with batch-sizes increasing at a geometric rate, we prove that the suboptimality diminishes at a geometric rate, the optimal deterministic rate, while iteration and oracle complexity to obtain an ε-optimal solution are O(ln(1/ε)) and O((1/ε)^(1+c)) with c ≥ 0, respectively. In the single-block setting, we obtain the optimal oracle complexity (Formula presented.). Finally, preliminary numerics suggest that the schemes compare well with competitors reliant on global Lipschitz constants.

AB - This work considers the minimization of the sum of an expectation-valued, coordinate-wise smooth nonconvex function and a nonsmooth, block-separable convex regularizer. We propose an asynchronous variance-reduced algorithm in which, at each iteration, a single block is randomly chosen to update its estimates via a proximal variable sample-size stochastic gradient scheme, while the remaining blocks are kept invariant. Notably, each block employs a steplength relying on its block-specific Lipschitz constant, while batch-sizes are updated as a function of the number of times that block is selected. We show that every limit point is a stationary point and establish the ergodic non-asymptotic rate O(1/K). Iteration and oracle complexity to obtain an ε-stationary point are shown to be O(1/ε) and O(1/ε²), respectively. Furthermore, under a proximal Polyak–Łojasiewicz condition with batch-sizes increasing at a geometric rate, we prove that the suboptimality diminishes at a geometric rate, the optimal deterministic rate, while iteration and oracle complexity to obtain an ε-optimal solution are O(ln(1/ε)) and O((1/ε)^(1+c)) with c ≥ 0, respectively. In the single-block setting, we obtain the optimal oracle complexity (Formula presented.). Finally, preliminary numerics suggest that the schemes compare well with competitors reliant on global Lipschitz constants.

UR - http://www.scopus.com/inward/record.url?scp=85083551489&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85083551489&partnerID=8YFLogxK

U2 - 10.1080/10556788.2020.1746963

DO - 10.1080/10556788.2020.1746963

M3 - Article

AN - SCOPUS:85083551489

JO - Optimization Methods and Software

JF - Optimization Methods and Software

SN - 1055-6788

ER -