A model for system uncertainty in reinforcement learning

Ryan William Murray, Michele Palladino

Research output: Contribution to journal › Article

Abstract

This work provides a rigorous framework for studying continuous-time control problems in uncertain environments. The framework models uncertainty in state dynamics as a probability measure on the space of functions. Such a probability measure is permitted to change over time as agents learn about their environment. This model can be seen as a variant of either Bayesian reinforcement learning (RL) or adaptive optimal control. We study conditions for locally optimal trajectories within this model, in particular deriving an appropriate dynamic programming principle and Hamilton–Jacobi equations. Some discussion of variants of the model is also provided, including one potential framework for studying the tradeoff between exploration and exploitation in RL.
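For orientation, the following is a minimal sketch of how a model of this kind is commonly written down. The notation (f, \mu_t, \mathcal{F}, \ell, g, V) is illustrative and not taken from the paper: the unknown dynamics are drawn from a time-varying probability measure on a function space, and the value function averages the control cost over that measure.

% Illustrative sketch only: the notation below is hypothetical and not taken from the
% paper; it records one standard way such a framework is often set up.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

% Uncertain dynamics: the vector field f is not known exactly, but is distributed
% according to a time-varying probability measure \mu_t on a function space \mathcal{F}.
\[
  \dot{x}(s) = f\bigl(x(s), u(s)\bigr), \qquad f \sim \mu_t \in \mathcal{P}(\mathcal{F}).
\]

% Value function: the running cost \ell and terminal cost g are averaged over \mu_t,
% so the value depends on the state x and on the current belief \mu_t.
\[
  V(t, x, \mu_t) \;=\; \inf_{u(\cdot)} \;
    \mathbb{E}_{f \sim \mu_t}\!\left[ \int_t^T \ell\bigl(x_f(s), u(s)\bigr)\, ds
      \;+\; g\bigl(x_f(T)\bigr) \right],
\]

% where x_f denotes the trajectory generated by a fixed realization f. As the agent
% observes its environment, \mu_t is updated (for instance by a Bayesian rule), and a
% dynamic programming principle for V formally leads to a Hamilton--Jacobi-type
% equation whose Hamiltonian is averaged against \mu_t.

\end{document}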

Original language: English (US)
Pages (from-to): 24-31
Number of pages: 8
Journal: Systems and Control Letters
Volume: 122
DOIs: https://doi.org/10.1016/j.sysconle.2018.09.011
State: Published - Dec 1 2018

Fingerprint

  • Reinforcement learning
  • Dynamic programming
  • Trajectories
  • Uncertainty

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Computer Science (all)
  • Mechanical Engineering
  • Electrical and Electronic Engineering

Cite this

@article{d6c3fae4bd2b4fd896d8d8f2c9182061,
title = "A model for system uncertainty in reinforcement learning",
abstract = "This work provides a rigorous framework for studying continuous-time control problems in uncertain environments. The framework models uncertainty in state dynamics as a probability measure on the space of functions. Such a probability measure is permitted to change over time as agents learn about their environment. This model can be seen as a variant of either Bayesian reinforcement learning (RL) or adaptive optimal control. We study conditions for locally optimal trajectories within this model, in particular deriving an appropriate dynamic programming principle and Hamilton–Jacobi equations. Some discussion of variants of the model are also provided, including one potential framework for studying the tradeoff between exploration and exploitation in RL.",
author = "Murray, {Ryan William} and Michele Palladino",
year = "2018",
month = "12",
day = "1",
doi = "10.1016/j.sysconle.2018.09.011",
language = "English (US)",
volume = "122",
pages = "24--31",
journal = "Systems and Control Letters",
issn = "0167-6911",
publisher = "Elsevier",

}

A model for system uncertainty in reinforcement learning. / Murray, Ryan William; Palladino, Michele.

In: Systems and Control Letters, Vol. 122, 01.12.2018, p. 24-31.

Research output: Contribution to journal › Article

TY - JOUR

T1 - A model for system uncertainty in reinforcement learning

AU - Murray, Ryan William

AU - Palladino, Michele

PY - 2018/12/1

Y1 - 2018/12/1

N2 - This work provides a rigorous framework for studying continuous-time control problems in uncertain environments. The framework models uncertainty in state dynamics as a probability measure on the space of functions. Such a probability measure is permitted to change over time as agents learn about their environment. This model can be seen as a variant of either Bayesian reinforcement learning (RL) or adaptive optimal control. We study conditions for locally optimal trajectories within this model, in particular deriving an appropriate dynamic programming principle and Hamilton–Jacobi equations. Some discussion of variants of the model is also provided, including one potential framework for studying the tradeoff between exploration and exploitation in RL.

AB - This work provides a rigorous framework for studying continuous-time control problems in uncertain environments. The framework models uncertainty in state dynamics as a probability measure on the space of functions. Such a probability measure is permitted to change over time as agents learn about their environment. This model can be seen as a variant of either Bayesian reinforcement learning (RL) or adaptive optimal control. We study conditions for locally optimal trajectories within this model, in particular deriving an appropriate dynamic programming principle and Hamilton–Jacobi equations. Some discussion of variants of the model is also provided, including one potential framework for studying the tradeoff between exploration and exploitation in RL.

UR - http://www.scopus.com/inward/record.url?scp=85055500109&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055500109&partnerID=8YFLogxK

U2 - 10.1016/j.sysconle.2018.09.011

DO - 10.1016/j.sysconle.2018.09.011

M3 - Article

VL - 122

SP - 24

EP - 31

JO - Systems and Control Letters

JF - Systems and Control Letters

SN - 0167-6911

ER -