Optimal habits can develop spontaneously through sensitivity to local cost

Theresa M. Desrochers, Dezhe Z. Jin, Noah D. Goodman, Ann M. Graybiel

Research output: Contribution to journal (Article)

21 Citations (Scopus)

Abstract

Habits and rituals are expressed universally across animal species. These behaviors are advantageous in allowing sequential behaviors to be performed without cognitive overload, and appear to rely on neural circuits that are relatively benign but vulnerable to takeover by extreme contexts, neuropsychiatric sequelae, and processes leading to addiction. Reinforcement learning (RL) is thought to underlie the formation of optimal habits. However, this theoretic formulation has principally been tested experimentally in simple stimulus-response tasks with relatively few available responses. We asked whether RL could also account for the emergence of habitual action sequences in realistically complex situations in which no repetitive stimulus-response links were present and in which many response options were present. We exposed naïve macaque monkeys to such experimental conditions by introducing a unique free saccade scan task. Despite the highly uncertain conditions and no instruction, the monkeys developed a succession of stereotypical, self-chosen saccade sequence patterns. Remarkably, these continued to morph for months, long after session-averaged reward and cost (eye movement distance) reached asymptote. Prima facie, these continued behavioral changes appeared to challenge RL. However, trial-by-trial analysis showed that pattern changes on adjacent trials were predicted by lowered cost, and RL simulations that reduced the cost reproduced the monkeys' behavior. Ultimately, the patterns settled into stereotypical saccade sequences that minimized the cost of obtaining the reward on average. These findings suggest that brain mechanisms underlying the emergence of habits, and perhaps unwanted repetitive behaviors in clinical disorders, could follow RL algorithms capturing extremely local explore/exploit tradeoffs.

Original language: English (US)
Pages (from-to): 20512-20517
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 107
Issue number: 47
DOI: 10.1073/pnas.1013470107
State: Published - Nov 23 2010

Fingerprint

Habits
Learning
Saccades
Costs and Cost Analysis
Haplorhini
Reward
Ceremonial Behavior
Macaca
Eye Movements
Reinforcement (Psychology)
Brain

All Science Journal Classification (ASJC) codes

  • General

Cite this

@article{801c1e19c82044dca104627ffb8e6c23,
title = "Optimal habits can develop spontaneously through sensitivity to local cost",
abstract = "Habits and rituals are expressed universally across animal species. These behaviors are advantageous in allowing sequential behaviors to be performed without cognitive overload, and appear to rely on neural circuits that are relatively benign but vulnerable to takeover by extreme contexts, neuropsychiatric sequelae, and processes leading to addiction. Reinforcement learning (RL) is thought to underlie the formation of optimal habits. However, this theoretic formulation has principally been tested experimentally in simple stimulus-response tasks with relatively few available responses. We asked whether RL could also account for the emergence of habitual action sequences in realistically complex situations in which no repetitive stimulus-response links were present and in which many response options were present. We exposed na{\"i}ve macaque monkeys to such experimental conditions by introducing a unique free saccade scan task. Despite the highly uncertain conditions and no instruction, the monkeys developed a succession of stereotypical, self-chosen saccade sequence patterns. Remarkably, these continued to morph for months, long after session-averaged reward and cost (eye movement distance) reached asymptote. Prima facie, these continued behavioral changes appeared to challenge RL. However, trial-by-trial analysis showed that pattern changes on adjacent trials were predicted by lowered cost, and RL simulations that reduced the cost reproduced the monkeys' behavior. Ultimately, the patterns settled into stereotypical saccade sequences that minimized the cost of obtaining the reward on average. These findings suggest that brain mechanisms underlying the emergence of habits, and perhaps unwanted repetitive behaviors in clinical disorders, could follow RL algorithms capturing extremely local explore/exploit tradeoffs.",
author = "Desrochers, {Theresa M.} and Jin, {Dezhe Z.} and Goodman, {Noah D.} and Graybiel, {Ann M.}",
year = "2010",
month = "11",
day = "23",
doi = "10.1073/pnas.1013470107",
language = "English (US)",
volume = "107",
pages = "20512--20517",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "47",

}

Optimal habits can develop spontaneously through sensitivity to local cost. / Desrochers, Theresa M.; Jin, Dezhe Z.; Goodman, Noah D.; Graybiel, Ann M.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 107, No. 47, 23.11.2010, p. 20512-20517.

TY - JOUR

T1 - Optimal habits can develop spontaneously through sensitivity to local cost

AU - Desrochers, Theresa M.

AU - Jin, Dezhe Z.

AU - Goodman, Noah D.

AU - Graybiel, Ann M.

PY - 2010/11/23

Y1 - 2010/11/23

AB - Habits and rituals are expressed universally across animal species. These behaviors are advantageous in allowing sequential behaviors to be performed without cognitive overload, and appear to rely on neural circuits that are relatively benign but vulnerable to takeover by extreme contexts, neuropsychiatric sequelae, and processes leading to addiction. Reinforcement learning (RL) is thought to underlie the formation of optimal habits. However, this theoretic formulation has principally been tested experimentally in simple stimulus-response tasks with relatively few available responses. We asked whether RL could also account for the emergence of habitual action sequences in realistically complex situations in which no repetitive stimulus-response links were present and in which many response options were present. We exposed naïve macaque monkeys to such experimental conditions by introducing a unique free saccade scan task. Despite the highly uncertain conditions and no instruction, the monkeys developed a succession of stereotypical, self-chosen saccade sequence patterns. Remarkably, these continued to morph for months, long after session-averaged reward and cost (eye movement distance) reached asymptote. Prima facie, these continued behavioral changes appeared to challenge RL. However, trial-by-trial analysis showed that pattern changes on adjacent trials were predicted by lowered cost, and RL simulations that reduced the cost reproduced the monkeys' behavior. Ultimately, the patterns settled into stereotypical saccade sequences that minimized the cost of obtaining the reward on average. These findings suggest that brain mechanisms underlying the emergence of habits, and perhaps unwanted repetitive behaviors in clinical disorders, could follow RL algorithms capturing extremely local explore/exploit tradeoffs.

UR - http://www.scopus.com/inward/record.url?scp=78650557305&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650557305&partnerID=8YFLogxK

U2 - 10.1073/pnas.1013470107

DO - 10.1073/pnas.1013470107

M3 - Article

C2 - 20974967

AN - SCOPUS:78650557305

VL - 107

SP - 20512

EP - 20517

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 47

ER -