A method for model selection using reinforcement learning when viewing design as a sequential decision process

Jaskanwal P.S. Chhabra, Gordon Patrick Warn

Research output: Contribution to journal, Article

1 Citation (Scopus)

Abstract

In an emerging paradigm, design is viewed as a sequential decision process (SDP) in which mathematical models of increasing fidelity are used in sequence to systematically contract sets of design alternatives. The key idea behind SDP is to sequence models of increasing fidelity so that they provide sequentially tighter bounds on the decision criteria, thereby removing inefficient designs from the tradespace with the guarantee that the antecedent model removes only design solutions that are dominated when analyzed using the more detailed, high-fidelity model. In general, efficiency in the SDP is achieved by using less expensive (low-fidelity) models early in the design process, before using high-fidelity models later in the process. However, the set of multi-fidelity models and the discrete decision states give rise to a combinatorial number of possible model sequences, some of which require significantly fewer model evaluations than others. Unfortunately, the optimal modeling policy cannot be determined at the outset of the SDP because the computational costs of executing all models on all designs and the discriminatory power of the resulting bounds are unknown. In this paper, the model selection problem is formulated as a finite Markov decision process (MDP), and an online reinforcement learning (RL) algorithm, namely Q-learning, is used to obtain and follow an approximately optimal modeling policy, thereby overcoming this limitation of the current SDP. The outcome is a Reinforcement Learning-based Design (RL-D) methodology able to learn efficient sequencing of models from sample estimates of the computational cost and discriminatory power of different models while analyzing design alternatives in the tradespace throughout the design process. Through application to two different design examples, RL-D is shown to (1) effectively identify the approximately optimal modeling policy and (2) efficiently converge upon a choice set.
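The bound-based contraction of the design set described in the abstract can be illustrated with a minimal sketch. It assumes a single cost-like criterion to be minimized and interval bounds per design; the design names, the bound values, and the helper `prune_dominated` are hypothetical and are not taken from the paper, which may use multiple criteria and a different dominance test.

```python
from typing import Dict, Tuple

# (lower, upper) bound on one criterion to minimize; an assumed simplification.
Bounds = Tuple[float, float]


def prune_dominated(bounds: Dict[str, Bounds]) -> Dict[str, Bounds]:
    """Keep only designs whose lower bound does not exceed the smallest upper bound.

    A design whose best case is worse than another design's worst case
    cannot be optimal, so it can be removed safely.
    """
    if not bounds:
        return {}
    best_upper = min(upper for _, upper in bounds.values())
    return {name: b for name, b in bounds.items() if b[0] <= best_upper}


# Hypothetical loose bounds from a cheap, low-fidelity model.
low_fidelity = {"A": (1.0, 4.0), "B": (3.0, 6.0), "C": (5.0, 9.0)}
after_low = prune_dominated(low_fidelity)

# Hypothetical tighter bounds from a high-fidelity model, run only on survivors.
high_fidelity = {"A": (2.0, 2.5), "B": (3.0, 3.5)}
after_high = prune_dominated({name: high_fidelity[name] for name in after_low})

print(sorted(after_low), sorted(after_high))
```

In this toy case the cheap model already discards one design, so the expensive model is evaluated on fewer alternatives, which is the source of efficiency the abstract refers to.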

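The model selection problem, cast as a finite MDP and solved with Q-learning, can be sketched on a toy problem as follows. Everything specific here is an assumption made for illustration and not the paper's formulation: the state is the number of remaining designs, the two actions are a hypothetical low- and high-fidelity model, the reward is the negative evaluation cost, and the transition rule in `step` is invented.

```python
import random
from typing import Dict, List, Tuple

ACTIONS = ("low", "high")                      # hypothetical models
COST_PER_DESIGN = {"low": 1.0, "high": 10.0}   # assumed cost per evaluated design


def step(remaining: int, action: str) -> Tuple[int, float]:
    """Toy transition: evaluate every remaining design with the chosen model."""
    reward = -COST_PER_DESIGN[action] * remaining
    if action == "high":
        return 1, reward                       # tight bounds isolate one design
    return max(2, remaining // 2), reward      # loose bounds cannot split the last two


def q_learning(start: int = 8, episodes: int = 2000, alpha: float = 0.5,
               gamma: float = 1.0, epsilon: float = 0.2,
               seed: int = 0) -> Dict[Tuple[int, str], float]:
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q: Dict[Tuple[int, str], float] = {}
    for _ in range(episodes):
        state = start
        for _ in range(100):                   # safety cap on episode length
            if state == 1:                     # terminal: a single design remains
                break
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward = step(state, action)
            future = 0.0 if nxt == 1 else max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = nxt
    return q


def greedy_policy(q: Dict[Tuple[int, str], float], start: int = 8) -> List[str]:
    """Follow the learned Q-values greedily from the start state."""
    state, plan = start, []
    while state != 1 and len(plan) < 10:
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        plan.append(action)
        state, _ = step(state, action)
    return plan


q_table = q_learning()
plan = greedy_policy(q_table)
print(plan)
```

In this toy setting the learned policy uses the cheap model while it still discriminates between designs and switches to the expensive model only when the cheap one stops making progress. The costs and transitions are known to the simulator here, whereas the abstract stresses that in practice they must be estimated from samples during the design process.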
Original language: English (US)
Pages (from-to): 1521-1542
Number of pages: 22
Journal: Structural and Multidisciplinary Optimization
Volume: 59
Issue number: 5
DOI: 10.1007/s00158-018-2145-6
State: Published - May 15 2019

Fingerprint

Reinforcement learning
Model Selection
Fidelity
Model
Modeling
Design Process
Computational Cost
Design
Model Evaluation
Q-learning
Online Learning
Alternatives
Markov Decision Process
Sequencing
Design Methodology
Learning Algorithm
Paradigm
Costs

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
  • Control and Optimization

Cite this

@article{7192f7e32b734f9ca7f732d6efe202f0,
title = "A method for model selection using reinforcement learning when viewing design as a sequential decision process",
author = "Chhabra, {Jaskanwal P.S.} and Warn, {Gordon Patrick}",
year = "2019",
month = "5",
day = "15",
doi = "10.1007/s00158-018-2145-6",
language = "English (US)",
volume = "59",
pages = "1521--1542",
journal = "Structural and Multidisciplinary Optimization",
issn = "1615-147X",
publisher = "Springer Verlag",
number = "5",
}

A method for model selection using reinforcement learning when viewing design as a sequential decision process. / Chhabra, Jaskanwal P.S.; Warn, Gordon Patrick.

In: Structural and Multidisciplinary Optimization, Vol. 59, No. 5, 15.05.2019, p. 1521-1542.

TY - JOUR
T1 - A method for model selection using reinforcement learning when viewing design as a sequential decision process
AU - Chhabra, Jaskanwal P.S.
AU - Warn, Gordon Patrick
PY - 2019/5/15
Y1 - 2019/5/15
UR - http://www.scopus.com/inward/record.url?scp=85058488790&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058488790&partnerID=8YFLogxK
U2 - 10.1007/s00158-018-2145-6
DO - 10.1007/s00158-018-2145-6
M3 - Article
VL - 59
SP - 1521
EP - 1542
JO - Structural and Multidisciplinary Optimization
JF - Structural and Multidisciplinary Optimization
SN - 1615-147X
IS - 5
ER -