Learning in mean-field oscillator games

Huibing Yin, Prashant G. Mehta, Sean P. Meyn, Vinayak V. Shanbhag

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

6 Citations (Scopus)

Abstract

This research concerns a noncooperative dynamic game with a large number of oscillators. The states are interpreted as the phase angles of a collection of non-homogeneous oscillators, and in this way the model may be regarded as an extension of the classical coupled oscillator model of Kuramoto. We introduce approximate dynamic programming (ADP) techniques for learning approximately optimal control laws for this model. Two types of parameterizations are considered, each based on analysis of the deterministic PDE model introduced in our prior research. In an offline setting, a Galerkin procedure is introduced to choose the optimal parameters. In an online setting, a steepest descent stochastic approximation algorithm is proposed. We provide a detailed analysis of the optimal parameter values as well as the Bellman error for both the Galerkin approximation and the online algorithm. Finally, a phase transition result is described for the large-population limit when each oscillator uses the approximately optimal control law. A critical value of the control cost parameter is identified: above this value, the oscillators are incoherent; below this value (when control is sufficiently cheap), the oscillators synchronize. These conclusions are illustrated with results from numerical experiments.
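The paper's ADP learning scheme is not reproduced here, but the classical Kuramoto mean-field model that the abstract builds on, and the synchronization/incoherence transition it describes, can be sketched in a few lines. This is an illustrative simulation under standard assumptions (Euler integration, Gaussian natural frequencies), not the authors' code; the coupling strength below plays a role loosely analogous to the control cost parameter in the paper, with strong coupling corresponding to cheap control.

```python
import numpy as np

def order_parameter(theta):
    """Magnitude r of the complex order parameter (1/N) * sum(exp(i*theta_k)).
    r is near 1 when the phases are synchronized and near 0 when incoherent."""
    return abs(np.exp(1j * theta).mean())

def simulate_kuramoto(n=200, coupling=2.0, dt=0.01, steps=5000, seed=0):
    """Euler simulation of the classical Kuramoto model with non-homogeneous
    natural frequencies drawn from a standard normal distribution."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0, n)           # heterogeneous natural frequencies
    theta = rng.uniform(0.0, 2 * np.pi, n)    # random initial phases
    for _ in range(steps):
        # Mean-field coupling: each oscillator is pulled toward the
        # population's mean phase, with strength proportional to r.
        z = np.exp(1j * theta).mean()
        theta += dt * (omega + coupling * np.abs(z) * np.sin(np.angle(z) - theta))
    return order_parameter(theta)
```

Running this with coupling well above the mean-field critical value yields an order parameter close to 1 (synchrony), while weak coupling leaves the population incoherent with an order parameter near 0, mirroring the two regimes separated by the critical control cost in the paper.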

Original language: English (US)
Title of host publication: 2010 49th IEEE Conference on Decision and Control, CDC 2010
Pages: 3125-3132
Number of pages: 8
DOI: 10.1109/CDC.2010.5717142
State: Published - Dec 1 2010
Event: 2010 49th IEEE Conference on Decision and Control, CDC 2010 - Atlanta, GA, United States
Duration: Dec 15 2010 - Dec 17 2010

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0191-2216

Other

Other: 2010 49th IEEE Conference on Decision and Control, CDC 2010
Country: United States
City: Atlanta, GA
Period: 12/15/10 - 12/17/10

Fingerprint

  • Mean Field
  • Game
  • Optimal Parameter
  • Optimal Control
  • Approximate Dynamic Programming
  • Approximation algorithms
  • Learning Control
  • Parameterization
  • Dynamic programming
  • Dynamic Games
  • Non-cooperative Game
  • Steepest Descent
  • Stochastic Approximation
  • Galerkin Approximation
  • Stochastic Algorithms
  • Online Algorithms
  • Coupled Oscillators
  • Galerkin
  • Model
  • Phase transitions

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization

Cite this

Yin, H., Mehta, P. G., Meyn, S. P., & Shanbhag, V. V. (2010). Learning in mean-field oscillator games. In 2010 49th IEEE Conference on Decision and Control, CDC 2010 (pp. 3125-3132). [5717142] (Proceedings of the IEEE Conference on Decision and Control). https://doi.org/10.1109/CDC.2010.5717142
@inproceedings{eb06145f74274e6b9b5a2263656cc70d,
title = "Learning in mean-field oscillator games",
abstract = "This research concerns a noncooperative dynamic game with a large number of oscillators. The states are interpreted as the phase angles of a collection of non-homogeneous oscillators, and in this way the model may be regarded as an extension of the classical coupled oscillator model of Kuramoto. We introduce approximate dynamic programming (ADP) techniques for learning approximately optimal control laws for this model. Two types of parameterizations are considered, each based on analysis of the deterministic PDE model introduced in our prior research. In an offline setting, a Galerkin procedure is introduced to choose the optimal parameters. In an online setting, a steepest descent stochastic approximation algorithm is proposed. We provide a detailed analysis of the optimal parameter values as well as the Bellman error for both the Galerkin approximation and the online algorithm. Finally, a phase transition result is described for the large-population limit when each oscillator uses the approximately optimal control law. A critical value of the control cost parameter is identified: above this value, the oscillators are incoherent; below this value (when control is sufficiently cheap), the oscillators synchronize. These conclusions are illustrated with results from numerical experiments.",
author = "Yin, Huibing and Mehta, {Prashant G.} and Meyn, {Sean P.} and Shanbhag, {Vinayak V.}",
year = "2010",
month = "12",
day = "1",
doi = "10.1109/CDC.2010.5717142",
language = "English (US)",
isbn = "9781424477456",
series = "Proceedings of the IEEE Conference on Decision and Control",
pages = "3125--3132",
booktitle = "2010 49th IEEE Conference on Decision and Control, CDC 2010",

}


