Saddle point (SP) problems arise in many key settings in today's big-data world. Examples include, but are not limited to, unsupervised and supervised learning problems such as regression or classification, where the aim is to learn a predictive model from data subject to possible constraints on the model, as well as resource allocation problems and robust optimization. For solving these SP problems, first-order (FO) methods that rely on gradient information have been very popular in practice due to their favorable scalability properties, but they come with a number of challenges. First, the gradients often contain stochastic errors, e.g., when the gradient is estimated from random draws of the data, as in stochastic SP algorithms, or when noise is injected into the gradients on purpose to protect the privacy of user data. The techniques for accelerating deterministic FO methods, such as momentum averaging, amplify the errors in the gradients and are less robust to gradient errors unless the stepsize and the momentum parameters are very carefully tuned to the problem at hand, and there is a lack of principled strategies for tuning these parameters. Second, for general SP problems, existing methods are either not optimal in terms of the number of iterations required or are not single-loop methods, except in the special case of bilinearly coupled SP problems. This is a significant disadvantage because having a single loop usually makes it easy to handle online streaming data and to implement line search, and leads to better computational complexity in practice. Third, efficient stochastic line search techniques that can exploit the structure of SP problems are missing in the literature. We propose a framework that allows trading off the convergence rate against the robustness to gradient noise of momentum-based SP algorithms in a systematic fashion for strongly convex/strongly concave (SCSC) SP problems. 
If successful, the proposed approach will enable the parameters of existing momentum-based SP algorithms to be tuned in a principled fashion, allowing them to be both fast and robust to gradient noise simultaneously. We also propose to study the Accelerated Primal Dual (APD) method and its stochastic version (SAPD) for solving SP problems, and we conjecture that it can achieve optimal complexity if its parameters are optimized with our robustness-rate framework. Our preliminary results show that SAPD is very efficient in practice and has state-of-the-art performance. We plan to further investigate its theoretical convergence properties for both convex/concave and structured non-convex/non-concave SP problems. We also propose to investigate stochastic line search techniques for SP problems that can exploit their structure. Finally, we propose to extend our approach to obtain efficient distributed SP algorithms in multi-agent settings where the data can be physically distributed and private.
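The tension between convergence rate and robustness to gradient noise described above can be illustrated on a toy problem. The following is a minimal sketch only: it uses plain stochastic gradient descent-ascent, not the APD/SAPD methods or the proposed framework. The objective f(x, y) = x²/2 + xy − y²/2, the function names `sgda` and `tail_mean`, and the stepsize and noise values are all assumptions chosen for illustration.

```python
import random


def sgda(step, sigma, iters, seed=0, x0=1.0, y0=1.0):
    """Stochastic gradient descent-ascent on the toy SCSC problem
    f(x, y) = x^2/2 + x*y - y^2/2, whose saddle point is (0, 0).

    Returns the squared distance to the saddle point after each iteration.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    dists = []
    for _ in range(iters):
        # Exact partial derivatives plus additive Gaussian gradient noise.
        gx = x + y + rng.gauss(0.0, sigma)  # noisy df/dx
        gy = x - y + rng.gauss(0.0, sigma)  # noisy df/dy
        # Descent step in x, ascent step in y (simultaneous update).
        x, y = x - step * gx, y + step * gy
        dists.append(x * x + y * y)
    return dists


def tail_mean(values, n):
    """Average of the last n entries, used as a steady-state error estimate."""
    return sum(values[-n:]) / n


if __name__ == "__main__":
    for step in (0.5, 0.05):
        noiseless = sgda(step, sigma=0.0, iters=20)[-1]
        noisy = tail_mean(sgda(step, sigma=1.0, iters=5000), 2000)
        print(f"step={step}: error after 20 noiseless iterations = {noiseless:.2e}, "
              f"steady-state error under noise = {noisy:.3f}")
```

In this toy setting, the larger stepsize contracts faster when the gradients are exact but settles at a larger error once noise is present, which is the kind of trade-off a principled parameter-tuning strategy would have to balance.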
|Effective start/end date||4/1/21 → 4/1/21|
- Office of Naval Research: $326,000.00