Daniel Acuña,
Department of Computer Science and Engineering
University of Minnesota.
Structure Learning in Human Sequential Decision-Making
Authors: Daniel Ernesto Acuña, Paul Schrater
Year: 2010
Abstract: Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that has perfect knowledge of the model of how rewards and events are generated in the environment. Rather than concluding that humans are suboptimal, we argue that the learning problem they face is more complex, because it also involves learning the structure of reward generation in the environment. We formulate the problem of structure learning in sequential decision tasks using Bayesian reinforcement learning, and show that learning the generative model for rewards qualitatively changes the behavior of an optimal learning agent. To test whether people perform structure learning, we ran experiments involving a mixture of one-armed and two-armed bandit reward models, where structure learning produces many of the qualitative behaviors deemed suboptimal in previous studies. Our results demonstrate that humans can perform structure learning in a near-optimal manner.
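One ingredient of a learner that infers reward structure is Bayesian model comparison between candidate structures. The sketch below is a minimal illustration under our own assumptions, not the authors' model. It compares two hypothetical structures for a two-option task: an "independent" structure in which each option has its own unknown reward rate (a two-armed bandit), and a "coupled" structure with a single unknown rate theta, where option A pays off with probability theta and option B with probability 1 - theta. The coupled structure is only a stand-in for a model with one free parameter; the paper's one-armed model may be parameterized differently. Both structures use uniform Beta(1, 1) priors, so their marginal likelihoods have closed forms in terms of the Beta function. All function names are hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence_independent(s_a: int, f_a: int, s_b: int, f_b: int) -> float:
    """Log marginal likelihood if each option has its own unknown rate,
    each with a uniform Beta(1, 1) prior (hypothetical 'two-armed' structure)."""
    # B(1, 1) = 1, so the prior normaliser contributes log 1 = 0.
    return log_beta(1 + s_a, 1 + f_a) + log_beta(1 + s_b, 1 + f_b)


def log_evidence_coupled(s_a: int, f_a: int, s_b: int, f_b: int) -> float:
    """Log marginal likelihood if one rate theta drives both options:
    A pays with probability theta, B with 1 - theta (assumed coupled structure)."""
    # Successes on A and failures on B are both evidence for a large theta.
    return log_beta(1 + s_a + f_b, 1 + f_a + s_b)


def posterior_coupled(s_a: int, f_a: int, s_b: int, f_b: int,
                      prior_coupled: float = 0.5) -> float:
    """Posterior probability of the coupled structure given reward counts.
    prior_coupled must lie strictly between 0 and 1."""
    log_c = math.log(prior_coupled) + log_evidence_coupled(s_a, f_a, s_b, f_b)
    log_i = math.log(1 - prior_coupled) + log_evidence_independent(s_a, f_a, s_b, f_b)
    # Normalise in log space for numerical stability.
    m = max(log_c, log_i)
    w_c, w_i = math.exp(log_c - m), math.exp(log_i - m)
    return w_c / (w_c + w_i)


if __name__ == "__main__":
    # Anti-correlated outcomes (A mostly pays, B mostly fails) favour coupling.
    print(posterior_coupled(8, 2, 2, 8))
    # Two mostly-rewarding options favour independent rates.
    print(posterior_coupled(8, 2, 8, 2))
```

Because each structure's evidence depends only on four counts, the posterior over structures can be recomputed after every trial, which is one way an agent's beliefs about structure could shift as experience accumulates.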
Author summary: Every decision-making experiment has a structure that determines how rewards are obtained, and this structure is usually explained to the subject at the beginning. Yet humans frequently fail to act as if they understand it, even in tasks as simple as deciding which of two biased coins to choose in order to maximize the number of trials that produce “heads”. We hypothesize that subjects' behavior is not driven by top-down instructions; rather, subjects must learn for themselves, through experience, how the rewards are generated. This study demonstrates that behavior often taken as evidence that humans are error-prone, suboptimal decision makers can result from an optimal learning approach. In fact, the idea that subjects try to learn the structure of a task provides a compelling new family of rational hypotheses for behavior previously deemed irrational, including under- and over-exploration, limited memory, and probability matching. We formalize the problem of optimal learning in sequential decision-making using a fully rational, optimal Bayesian reinforcement learning approach. In an experimental test of structure learning in humans, we further show that humans learn reward structure from experience in a near-optimal manner. These results are of interest to a wide audience, as variations of the sequential binary choice task have been employed and studied by researchers not just in experimental psychology but also in neuroscience, economics, and computer science.
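To make "optimal Bayesian reinforcement learning" concrete in the simplest two-coin setting, the sketch below computes the Bayes-optimal policy for a finite-horizon two-armed Bernoulli bandit by dynamic programming over belief states. It assumes independent uniform Beta(1, 1) priors on each coin's probability of heads and a known number of remaining trials; it does not include structure learning, so it is an illustration of the general approach rather than the authors' model. Function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bayes_value(s_a: int, f_a: int, s_b: int, f_b: int, horizon: int) -> float:
    """Expected total reward of the Bayes-optimal policy over `horizon`
    remaining trials, given heads/tails counts for coins A and B and
    independent Beta(1, 1) priors (assumed)."""
    if horizon == 0:
        return 0.0
    return max(q_value(s_a, f_a, s_b, f_b, horizon, arm) for arm in ("A", "B"))


def q_value(s_a: int, f_a: int, s_b: int, f_b: int, horizon: int, arm: str) -> float:
    """Value of flipping `arm` now, then acting Bayes-optimally afterwards."""
    if arm == "A":
        p = (1 + s_a) / (2 + s_a + f_a)  # posterior mean P(heads) for A
        win = bayes_value(s_a + 1, f_a, s_b, f_b, horizon - 1)
        lose = bayes_value(s_a, f_a + 1, s_b, f_b, horizon - 1)
    else:
        p = (1 + s_b) / (2 + s_b + f_b)  # posterior mean P(heads) for B
        win = bayes_value(s_a, f_a, s_b + 1, f_b, horizon - 1)
        lose = bayes_value(s_a, f_a, s_b, f_b + 1, horizon - 1)
    # Immediate expected reward plus expected future value under updated beliefs.
    return p * (1.0 + win) + (1.0 - p) * lose


def best_arm(s_a: int, f_a: int, s_b: int, f_b: int, horizon: int) -> str:
    """Bayes-optimal choice given current counts (ties go to 'A')."""
    qa = q_value(s_a, f_a, s_b, f_b, horizon, "A")
    qb = q_value(s_a, f_a, s_b, f_b, horizon, "B")
    return "A" if qa >= qb else "B"


if __name__ == "__main__":
    # Expected heads over 10 trials starting from the uniform priors.
    print(bayes_value(0, 0, 0, 0, 10))
    # Equal posterior means (0.5), but coin B is untried.
    print(best_arm(2, 2, 0, 0, 2))
```

For example, `best_arm(2, 2, 0, 0, 2)` returns `"B"`: both coins have posterior mean 0.5, but with two trials left the untried coin is worth sampling because what is learned can be exploited on the final trial. With one trial left (`horizon=1`) the two options tie, so a purely greedy chooser sees no difference between them.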