Prior work on prosocial and self-serving behavior in human economic exchanges has shown that counterparts’ high social reputations bias striatal reward signals and elicit cooperation, even when such cooperation is disadvantageous. This phenomenon suggests that the human striatum is modulated by the other’s social value, a signal that is insensitive to the individual’s own choices to cooperate or defect. We tested an alternative hypothesis: when people learn from their interactions with others, they encode prediction error updates with respect to their own policy. Under this policy update account, striatal signals would reflect positive prediction errors when the individual’s choices correctly anticipated not only the counterpart’s cooperation but also the counterpart’s defection. We examined behavior in three samples using reinforcement learning and model-free analyses, and performed an fMRI study of striatal learning signals. To uncover the dynamics of goal-directed learning, we introduced reversals in the counterpart’s behavior and provided counterfactual (would-be) feedback when the individual chose not to engage with the counterpart. Behavioral data and model-derived prediction error maps (in both whole-brain and a priori striatal region-of-interest analyses) supported the policy update model. Thus, as people continually adjust their rate of cooperation based on experience, their behavior and striatal learning signals reveal a self-centered instrumental process corresponding to reciprocal altruism.
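The difference between the two accounts can be illustrated with a minimal sketch. The function names, the +1/-1 outcome coding, the learning rate `alpha`, and the use of a simple Rescorla-Wagner (delta-rule) update are assumptions for illustration only, not the study's fitted reinforcement learning models. The diagnostic case is a trial on which the individual withholds engagement and the counterfactual feedback shows that the counterpart would have defected. A social-value signal registers a negative prediction error, whereas under the policy update account the prediction error is positive because the choice correctly anticipated defection.

```python
# Hypothetical illustration of two prediction-error (PE) accounts.
# Outcome coding (+1/-1), function names, and alpha are assumptions,
# not the study's actual reinforcement learning models.


def social_value_pe(expected: float, cooperated: bool) -> float:
    """PE driven by the counterpart's behavior alone.

    The individual's own choice plays no role: cooperation is coded +1
    and defection -1, regardless of whether the individual engaged.
    """
    outcome = 1.0 if cooperated else -1.0
    return outcome - expected


def policy_update_pe(expected: float, engaged: bool, cooperated: bool) -> float:
    """PE signed relative to the individual's own choice (policy).

    A choice that correctly anticipated the counterpart (engage when it
    cooperates, withhold when it defects) is coded +1; otherwise -1.
    When the individual withholds, `cooperated` stands for the
    counterfactual (would-be) feedback about the counterpart's behavior.
    """
    choice_was_correct = engaged == cooperated
    outcome = 1.0 if choice_was_correct else -1.0
    return outcome - expected


def delta_rule_update(expected: float, pe: float, alpha: float = 0.3) -> float:
    """Rescorla-Wagner style update: move the expectation toward the outcome."""
    return expected + alpha * pe


if __name__ == "__main__":
    # Individual withholds; counterfactual feedback shows the counterpart defected.
    expected = 0.0
    print("social value PE:", social_value_pe(expected, cooperated=False))
    print("policy update PE:", policy_update_pe(expected, engaged=False, cooperated=False))
```

Running the script prints a prediction error of -1.0 under the social value account and 1.0 under the policy update account for the same trial. For simplicity both accounts share a single scalar expectation here; in a fuller model the social value account would track a property of the counterpart, while the policy update account would track the expected value of the individual's chosen action.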