TY - JOUR

T1 - DISTRIBUTED OPTIMIZATION BASED ON GRADIENT TRACKING REVISITED

T2 - ENHANCING CONVERGENCE RATE VIA SURROGATION

AU - Sun, Ying

AU - Scutari, Gesualdo

AU - Daneshmand, Amir

N1 - Funding Information:
\ast Received by the editors May 10, 2019; accepted for publication (in revised form) January 9, 2022; published electronically April 5, 2022. https://doi.org/10.1137/19M1259973 Funding: This work has been supported by the National Science Foundation under grants CIF 1719205 and CMMI 1832688, by the Army Research Office under grant W911NF1810238, and by the Office of Naval Research under grant N00014-21-1-2673. \dagger School of Electrical Engineering and Computer Science, The Pennsylvania State University, State College, PA 16801 USA (ysun@psu.edu). \ddagger School of Industrial Engineering, Purdue University, West Lafayette, IN 47907 USA (gscutari @purdue.edu, adaneshm@purdue.edu).
Publisher Copyright:
© 2022 Society for Industrial and Applied Mathematics

PY - 2022

Y1 - 2022

N2 - We study distributed multiagent optimization over graphs. We consider the minimization of F + G subject to convex constraints, where F is the smooth strongly convex sum of the agent's losses and G is a nonsmooth convex function. We build on the SONATA algorithm: the algorithm employs the use of surrogate objective functions in the agents' subproblems (thus going beyond linearization, such as proximal-gradient) coupled with a perturbed consensus mechanism that aims to locally track the gradient of F. SONATA achieves precision \epsilon > 0 on the objective value in \scrO (\kappa g log(1/\epsilon )) gradient computations at each node and \scrO \~\bigl(\kappa g(1 - \rho ) -1/2 log(1/\epsilon )\bigr) communication steps, where \kappa g is the condition number of F and \rho characterizes the connectivity of the network. This is the first linear rate result for distributed composite optimization; it also improves on existing (nonaccelerated) schemes just minimizing F, whose rate depends on much larger quantities than \kappa g. When the loss functions of the agents are similar, due to statistical data similarity or otherwise, SONATA employing high-order surrogates achieves precision \epsilon > 0 in \scrO \bigl((\beta /\mu ) log(1/\epsilon )\bigr) iterations and \scrO \~\bigl((\beta /\mu )(1 - \rho ) -1/2 log(1/\epsilon )\bigr) communication steps, where \beta measures the degree of similarity of agents' losses and \mu is the strong convexity constant of F. Therefore, when \beta /\mu < \kappa g, the use of high-order surrogates yields provably faster rates than those achievable by first-order models; this is without exchanging any Hessian matrix over the network.

AB - We study distributed multiagent optimization over graphs. We consider the minimization of F + G subject to convex constraints, where F is the smooth strongly convex sum of the agent's losses and G is a nonsmooth convex function. We build on the SONATA algorithm: the algorithm employs the use of surrogate objective functions in the agents' subproblems (thus going beyond linearization, such as proximal-gradient) coupled with a perturbed consensus mechanism that aims to locally track the gradient of F. SONATA achieves precision \epsilon > 0 on the objective value in \scrO (\kappa g log(1/\epsilon )) gradient computations at each node and \scrO \~\bigl(\kappa g(1 - \rho ) -1/2 log(1/\epsilon )\bigr) communication steps, where \kappa g is the condition number of F and \rho characterizes the connectivity of the network. This is the first linear rate result for distributed composite optimization; it also improves on existing (nonaccelerated) schemes just minimizing F, whose rate depends on much larger quantities than \kappa g. When the loss functions of the agents are similar, due to statistical data similarity or otherwise, SONATA employing high-order surrogates achieves precision \epsilon > 0 in \scrO \bigl((\beta /\mu ) log(1/\epsilon )\bigr) iterations and \scrO \~\bigl((\beta /\mu )(1 - \rho ) -1/2 log(1/\epsilon )\bigr) communication steps, where \beta measures the degree of similarity of agents' losses and \mu is the strong convexity constant of F. Therefore, when \beta /\mu < \kappa g, the use of high-order surrogates yields provably faster rates than those achievable by first-order models; this is without exchanging any Hessian matrix over the network.

UR - http://www.scopus.com/inward/record.url?scp=85130634943&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85130634943&partnerID=8YFLogxK

U2 - 10.1137/19M1259973

DO - 10.1137/19M1259973

M3 - Article

AN - SCOPUS:85130634943

SN - 1052-6234

VL - 32

SP - 354

EP - 385

JO - SIAM Journal on Optimization

JF - SIAM Journal on Optimization

IS - 2

ER -