Stochastic gradient descent with only one projection

Mehrdad Mahdavi, Tianbao Yang, Rong Jin, Shenghuo Zhu, Jinfeng Yi

Research output: Chapter in Book/Report/Conference proceedingConference contribution

21 Scopus citations

Abstract

Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections. Instead, only one projection at the last iteration is needed to obtain a feasible solution in the given domain. Our theoretical analysis shows that with a high probability, the proposed algorithms achieve an O(1/√T) convergence rate for general convex optimization, and an O (ln T/T) rate for strongly convex optimization under mild conditions about the domain and the objective function.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 25
Subtitle of host publication26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
Pages494-502
Number of pages9
StatePublished - Dec 1 2012
Event26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012 - Lake Tahoe, NV, United States
Duration: Dec 3 2012Dec 6 2012

Publication series

NameAdvances in Neural Information Processing Systems
Volume1
ISSN (Print)1049-5258

Other

Other26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
CountryUnited States
CityLake Tahoe, NV
Period12/3/1212/6/12

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint Dive into the research topics of 'Stochastic gradient descent with only one projection'. Together they form a unique fingerprint.

Cite this