Normalized Reinforcement Learning

Learning is widely modeled in psychology, neuroscience, and computer science by prediction error-guided reinforcement learning (RL) algorithms. While standard RL assumes linear reward functions, empirical evidence overwhelmingly shows that reward-related neural activity is a saturating, nonlinear function of reward; the computational and behavioral implications of such nonlinear RL, however, remain unknown. This project examines a novel nonlinear RL algorithm incorporating the canonical divisive normalization computation, which is widely found in sensory and cognitive processing. Preliminary work shows that normalized RL introduces an intrinsic and tunable asymmetry in prediction error coding. At the behavioral level, this asymmetry explains empirical variability in risk preferences typically attributed to asymmetric learning rates. At the neural level, diversity in asymmetries provides a computational mechanism for recently proposed theories of distributional RL, allowing the brain to learn the full probability distribution of future rewards. This behavioral and computational flexibility argues for incorporating biologically realistic value functions into computational models of learning and decision-making.
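As a rough illustration of the idea (a minimal sketch of our own, not the project's actual algorithm; the saturating function, learning rate, and parameter names are assumptions), a TD-style learner whose prediction errors are computed on divisively normalized rewards might look like this:

```python
import numpy as np

def normalized_rl(rewards, alpha=0.1, sigma=1.0):
    """Illustrative sketch: RL on a divisively normalized reward,
    u(r) = r / (sigma + r), a saturating nonlinear value function.
    The exact functional form and parameters are assumptions."""
    v = 0.0
    values = []
    for r in rewards:
        u = r / (sigma + r)   # saturating utility of the raw reward
        delta = u - v         # prediction error in normalized units
        v += alpha * delta
        values.append(v)
    return np.array(values)
```

Because u is concave, equal-sized reward increases and decreases around a baseline produce unequal prediction errors, which is one way an intrinsic asymmetry of the kind described above can arise.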


Assessing Stellate Treatment of Post-Traumatic Stress Disorder

Post-traumatic stress disorder (PTSD) is a complex, persistent pathological anxiety condition arising in response to extreme stress, with significant impact on mental and physical function. PTSD severity is exacerbated by frequently occurring comorbid conditions including mood and anxiety disorders, impulsive and dangerous behavior, and substance abuse. Given its prevalence, chronic and recurring nature, and associated comorbid conditions, PTSD produces a high total disease burden. Recent research has highlighted differences in neural activity in individuals with PTSD, specifically in brain regions responsible for psychological processes affected in PTSD such as emotional processing, fear learning and extinction, and memory. Most studies have examined neural differences between control subjects and those with PTSD, making it difficult to determine: (1) whether these differences reflect pre-existing vulnerabilities or acquired pathophysiology, (2) whether changes in specific brain activity patterns in PTSD are correlated with improvements in PTSD symptoms after treatment, and (3) which brain regions – and their associated psychological processes – offer the most promise as targets for intervention. Over the past decade, however, a growing number of case reports have indicated that simultaneous blockade of the stellate ganglion at spinal levels C4 and C6 (stellate ganglion block, SGB) can elicit rapid and long-lasting remission from PTSD. SGB therefore provides a clinical intervention for PTSD that allows pre- and post-intervention fMRI neuroimaging on a timescale matching the changes in PTSD symptoms. In this study, subjects with PTSD symptoms will be randomized into SGB treatment and control arms to examine how SGB might modulate the neural circuitry of emotion processing, fear learning and extinction, and decision making.
Identifying what specific changes in neural activity correspond to symptomatic improvement in PTSD will offer an increased understanding in the pathology of PTSD and could set the stage for the development of novel treatments for anxiety disorders, specifically PTSD.


Divisive Normalization in the Human Brain Network

Divisive normalization is a canonical computation well documented in perception, attention, and value-based decision-making. However, most of the neural evidence comes from animal studies using single-electrode recordings. In this project, we examine the neural mechanism of divisive normalization in the human brain using functional magnetic resonance imaging (fMRI). This project will help us understand normalized value coding in the human brain at the network level, beyond single-site computations.


Local Disinhibition Decision Model (LDDM)

Normalized value coding and winner-take-all choice are two important features of decision-making uncovered in the brain. Computational neuroscience has used different neural circuit models to explain each of these features in isolation; it remains unknown whether both can be explained by a single decision circuit. In this project, we examine an integrated circuit model incorporating neuronal types recently identified with optogenetics, including excitatory, inhibitory, and disinhibitory neurons. The integrated circuit implements both normalized value coding and winner-take-all choice dynamics, controlled by a top-down disinhibition signal. In addition to capturing empirical psychometric and chronometric data, the model produces persistent activity consistent with working memory. Furthermore, disinhibition provides a simple mechanism for flexible top-down control of network states, enabling the circuit to capture diverse task-dependent neural dynamics. Top-down controlled disinhibition may also play a role in managing the tradeoff between decision speed and accuracy. Ongoing work is exploring further properties of this biologically grounded circuit.


Quantifying Risk of Transitioning from Occasional Opioid Use to Opioid Addiction

The central aim of this study is to identify the risk factors that predispose an individual to transition from occasional opioid use to addiction as defined by the DSM-5. We seek to understand how i) overall life quality, ii) hedonic experience during initial opioid use and subsequent craving, iii) genetic predisposition, and iv) individual psychological traits may facilitate a transition to addiction during opiate use. We hypothesize that these four main risk factor categories can together reliably differentiate a cohort of subjects who have developed opioid use disorder (OUD) from a matched cohort of subjects with similar initial exposure to opiates who did not develop OUD. We also propose to examine these traits in a matched control group that differs from the other groups in being opioid naïve. A sufficiently predictive quantitative model would allow patient-specific and dosage-specific risk assessment, could meaningfully impact prescribing plans, and might even be used to assess the likelihood of addiction recovery.


Efficient Coding and Divisive Normalization

Divisive normalization is a widespread neural computation that can explain many violations of rationality such as context dependence, and is often seen as an implementation of the Efficient Coding Hypothesis. This theory project derives precise conditions under which divisive normalization is an efficient computation given constraints on information processing. We find that divisive normalization is optimal in stimulus environments described by Pareto distributions, whose power-law characteristics and statistical dependencies reflect many naturalistic stimulus environments.
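For concreteness, the canonical divisive normalization computation referred to throughout this page has the form R_i = v_i / (sigma + sum_j v_j). The sketch below (parameter values and the Pareto shape parameter are our own illustrative assumptions) shows how it maps heavy-tailed, Pareto-distributed inputs into bounded responses:

```python
import numpy as np

def divisive_normalization(values, sigma=1.0):
    """Canonical divisive normalization: R_i = v_i / (sigma + sum_j v_j)."""
    values = np.asarray(values, dtype=float)
    return values / (sigma + values.sum())

# Heavy-tailed inputs reminiscent of a Pareto stimulus environment
rng = np.random.default_rng(0)
stimuli = rng.pareto(a=2.0, size=3) + 1.0   # Pareto samples with minimum 1
responses = divisive_normalization(stimuli)  # each response lies in [0, 1)
```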


Computational Foundations of Behavioral Biases

Insights into the choice process have made it possible to study the neural and computational causes of irrational choice behavior. This project investigates behavioral biases such as base-rate neglect through the lens of the drift-diffusion model (DDM) and related choice-process models. Psychometric experiments allow us to reveal the evolving decision variable in behavior, providing empirical constraints on the latent decision variable above and beyond choices and response times alone. Using this methodology, we find that evidence accumulates independently of base rates, which suggests a simple mechanism underlying base-rate neglect. Our methodology is more broadly applicable and promises detailed insights into the neural and computational foundations of irrational choice behavior more generally.
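A minimal simulation of the drift-diffusion model mentioned above can make the framework concrete (parameter values and names here are our illustrative choices, not the project's fitted parameters):

```python
import numpy as np

def simulate_ddm(drift, bound=1.0, noise=1.0, dt=0.001, start=0.0, rng=None):
    """One DDM trial: noisy evidence accumulates from `start` until it
    crosses +bound (choice 1) or -bound (choice 0); returns (choice, RT).
    Base rates are often modeled as a shift in `start`; the finding
    described above is that accumulation itself proceeds independently
    of base rates."""
    rng = rng if rng is not None else np.random.default_rng()
    x, t = start, 0.0
    while abs(x) < bound:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (1 if x > 0 else 0), t
```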


Choosing Well: Testing the Efficiency of Neural Representation

Divisive normalization is often viewed as a "canonical brain encoding mechanism" (in the words of Heeger). However, the divisive normalization encoding function is efficient only for specific types of input stimuli. Using behavioral paradigms and computational modeling, in this project we test whether the brain uses the same output firing-rate function of normalized value for all types of input stimuli, or whether the encoding mechanism varies across choice environments. In other words, we ask whether our choices are constrained by one physiological encoding mechanism, at the expense of efficiency, or whether encoding adapts to different types of choice environments.


Neurocomputational Framework for Strategic Interactions

Seminal work in social neuroscience identified the role of the temporo-parietal junction (TPJ) in reasoning about the mental states, actions, and goals of others, a process usually referred to as “mentalizing”. Nonetheless, the study of mentalizing behavior still lacks a strictly computational approach, which could lay the foundation for a neuronal-mechanistic account of social interactions. In this project, we develop a novel framework to examine the cognitive basis of strategic social interactions. Our goal is to investigate how the brain decomposes social interactions into subclasses of cognitive functions associated with different cortical modules. Leveraging modeling architectures from game theory, alongside experimental design concepts from computational psychology, we identify orthogonal cognitive modules that we hypothesize play a role in mentalizing.


Decision-Making in Addiction

Decision-making is strongly affected by drugs of abuse. The ability to exert self-control and resist temptation and craving is severely impaired in addiction, and this can give rise to impulsivity and risk-seeking behavior. These changes, however, may be malleable, and existing pharmacological and psychosocial treatments appear to restore these decision processes to a healthier state. The goal of this project is to study how craving and treatment for addiction dynamically affect the neural computations underlying these types of decision-making. Using fMRI, computational modeling, and choice experiments in patients with substance use disorder, we aim to design a set of robust screening tools that can be used in the clinic to make accurate predictions about an individual's recovery.


Dynamics of Divisive Normalization in Decision-Related Processing

Dynamical models are a common tool for modeling neural circuits (e.g., integrate-and-fire models) and for modeling the decision-making process (e.g., drift-diffusion models). We have recently developed a first-generation dynamic model of the decision-making process based on assumptions about network connectivity that implement divisive normalization. Recent recordings in the lateral intraparietal area (LIP) suggest that this model qualitatively predicts activity dynamics. Mathematical analysis of the model suggests a network mechanism for history dependence. Currently, we are extending this model to one that operates on multiple timescales and captures both long- and short-term history dependence effects.


Divisive Normalization and Value Coding in Decision Circuits

The neural code governing the representation of value information is critical to the decision process, guiding the choice between potential options and bridging the gap between sensation and action. We have recently shown that value coding in parietal cortex is relative rather than absolute, imparting an intrinsic context-dependence to value representation. This relative representation is governed by divisive normalization, a computational algorithm widely described in sensory cortices, suggesting a common cortical mechanism for contextual processing. Ongoing work extends these findings to the temporal domain, examining the potential role of normalization in adaptive value coding under different local distributions of received rewards.

Louie K., Grattan L.E., & Glimcher, P.W. (2011). Reward value-based gain control: Divisive normalization in parietal cortex. Journal of Neuroscience, 31(29): 10627-10639 [pdf]


Context-Dependent Choice: Modeling, Prediction, and Remediation

Normalized value coding provides insight into both the physiological mechanism of decision-making and the efficiency of choice behavior. Using computational modeling and human choice experiments, we are exploring how normalization and stochastic variability in neural value coding can explain many context-dependent violations of rational choice theory. Ongoing projects include the algorithmic modeling of context-dependent choice behavior, the characterization of novel forms of contextual choice inefficiencies in human subjects, the development of normalization-based compensatory behavioral strategies, and the examination of context-dependent value coding using neuroimaging techniques.

Louie, K.L. & Glimcher, P.W. (2012). Efficient coding and the neural representation of value. Ann. N.Y. Acad. Sci., (1251): 13-32. [pdf]


Fooling the System: Reassigning Value Through Exogenous Dopamine Activation

This project extends work on the response properties of dopamine neurons by Schultz et al. (1997), Nakahara et al. (2004), and Bayer & Glimcher (2005). Those experiments clearly demonstrated that dopamine neurons increase their firing rate to an unexpected reward in a way consistent with reinforcement learning theory, suggesting that dopamine neurons encode a reward prediction error. This error term teaches subjects the value of objects in the environment around them. In our experiment, non-human primates perform a dynamic learning task in which they choose between two stimuli, one associated with a smaller reward than the other. We electrically stimulate the ventral tegmental area (thereby activating dopamine neurons and creating a positive reward prediction error) immediately following reward delivery for the smaller-reward stimulus. We are able to show that the subjects’ preferences switch to the smaller-reward stimulus in a way that can be modeled with reinforcement learning theory. Our hope is that this experiment helps describe how psychostimulant drugs work to hijack the reward circuitry.
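The learning dynamic can be sketched with a simple Rescorla-Wagner-style update in which stimulation is modeled as an added positive prediction error. The reward magnitudes, learning rate, and `stim_bonus` term below are our illustrative assumptions, not the experimental parameters:

```python
def rw_update(v, reward, alpha=0.2, stim_bonus=0.0):
    """Rescorla-Wagner / TD(0)-style value update; `stim_bonus` stands in
    for the exogenous positive prediction error from VTA stimulation."""
    delta = (reward + stim_bonus) - v   # reward prediction error
    return v + alpha * delta

# Hypothetical magnitudes: the smaller reward is paired with stimulation
v_small, v_large = 0.0, 0.0
for _ in range(200):
    v_small = rw_update(v_small, reward=0.4, stim_bonus=0.8)
    v_large = rw_update(v_large, reward=1.0)
# The learned value of the stimulated, smaller-reward option now exceeds
# that of the larger-reward option, reversing the initial preference.
```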


Can Deep Brain Stimulation Reduce Preference for Cocaine?

One possible cause of addiction is that drugs of abuse inflate the value of actions above their natural value – it is not so much that addicts like drugs as that they want them. Many drugs of abuse, like cocaine, disrupt the normal functioning of the midbrain dopamine system, which is thought to be responsible for how the values of actions are learned and updated. It may be possible to inhibit the midbrain dopamine system at the precise moment at which these values are updated, and thereby reduce or negate the effects of cocaine. In this line of research, we use Herrnstein's magnitude matching task to assess a thirsty monkey's relative preference for cocaine over water on a trial-by-trial basis. Then, we use deep brain stimulation of the lateral habenula to inhibit the dopamine system precisely when the monkey chooses cocaine. We predict that the monkey will reduce its preference for cocaine, or even come to prefer the water.


Neural Random Utility: A Theoretical Framework Linking Neuroscience to Stochastic Behavior

An important step in our research is creating a theoretical framework that can bridge the detailed constraints and processes at the level of neuroscience and the more abstract (but flexible) modeling of choice behavior at the level of economics. One project, the development of the Neural Random Utility Model, attempts to lay out such a framework, relating the stochasticity in networks of spiking neurons to the stochasticity found in choice behavior. Related projects impose more direct neural constraints on the model, exploring how (evolutionarily adaptive) information processing constraints can lead to seemingly sub-optimal choice behavior and the formation of a reference point.

Webb, R., Glimcher, P.W., Levy, I., Lazzarr, S., & Rutledge, R. (2012). Neural Random Utility. Social Science Research Network. [pdf]
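The core mapping from neural stochasticity to choice stochasticity can be illustrated with a generic random utility simulation. The Gaussian noise and parameter values here are our own simplifying assumptions, not the model's specific neural constraints:

```python
import numpy as np

def choice_prob(u1, u2, noise_sd=1.0, n=100_000, seed=0):
    """Estimate the probability of choosing option 1 when each option's
    utility is corrupted by independent Gaussian 'neural' noise on every
    trial, and the option with the larger realized utility is chosen."""
    rng = np.random.default_rng(seed)
    r1 = u1 + noise_sd * rng.standard_normal(n)
    r2 = u2 + noise_sd * rng.standard_normal(n)
    return float(np.mean(r1 > r2))
```

With Gaussian noise this produces probit-like choice curves; other noise distributions yield other familiar stochastic choice models.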


Decision-Making Across the Life Span

Characterizing behavioral changes in decision-making across the life span, and understanding why they occur, has significant implications for behavioral problems associated with poor decision-making at different stages of life – such as careless driving in adolescents and disadvantageous medical or financial decision-making in the elderly. Scientists in many disciplines have long observed that age seems to be a significant determinant of decision-making under risk and ambiguity. There has, however, been significant disagreement about how and why preferences toward risk and ambiguity change with age. Research on risk attitudes and age has so far focused separately on either adult or minor populations, making it difficult to compare the risk attitudes of adolescents or older adults with those of other age groups. The most important result of this controversy has been the reliance, by policy makers, on a set of stylized facts about the decision-making behavior of mid-life representative agents. But whether those stylized facts about the representative agent are robustly true, what they mean for decision-makers of different ages, and how those representative mid-life agents relate to individual decision-makers have never been exhaustively examined. The goal of this project is to provide a comprehensive study of decision-making in a population ranging in age from 12 to 90 years old, using both behavioral and fMRI research techniques.

Additionally, in cooperation with the Museum of the National Academy of Sciences in Washington, we designed an interactive exhibit in which museum visitors can estimate their own risk and ambiguity attitudes by participating in a simple, incentivized choice task. Over the next several years, we will use this data set to assess the impact of (1) age and other individual-specific factors: marital status, birth order, birth cohort, gender or culture and (2) individual-independent factors such as macroeconomic shocks, weather or context, on attitudes toward known and unknown risks.

Tymula, A., Rosenberg Belmaker, L.A., Roy, A.K., Ruderman, L., Manson, K., Glimcher, P.W., & Levy, I. (2012). Adolescents' Risk Taking Behavior is Driven by Tolerance to Ambiguity. Proceedings of the National Academy of Sciences of the United States of America, 109(42): 17135-17140. [pdf]


Temporal Context Effects in Decision-Making

People make thousands of choices every day, and reference-dependent preferences have a large and often detrimental economic impact: recent price history and related phenomena lead to increases in crime, distortions of the labor market, and inefficiencies in investor behavior. To date, no model has been developed that captures these effects completely. The goal of this project is to advance our understanding of why and how a decision maker's evaluation of outcome quality changes when the benchmark against which these outcomes are compared, the reference point, changes. As a final product of this research project we propose to construct a novel model of reference-dependent preferences, based on temporal normalization models from neuroscience that describe the biophysical computations hypothesized to underlie the physical instantiation of the reference point. This model should account for a wide range of choice inefficiencies/irrationalities observed in everyday life, as well as offer a mechanism for suggesting and modeling remedies. The unique strength of this project is its joint application of tools from economics and neural science to build a complete model of the reference-dependent preferences that plague both consumer and investor choice behavior.


Wealth Effects in Decision-Making Under Risk

Standard economic techniques allow us to evaluate human risk attitudes, although it has been technically difficult to relate these measurements to the overall wealth levels that standard models employ as a critical variable. Previous work has, however, applied these techniques to animals to answer two questions: 1) Do our close evolutionary relatives share both our risk attitudes and our economic rationality? 2) How does satiety state (or wealth level, in the language of economics) change risk attitudes? Previous studies have provided conflicting answers to these questions. To address these issues, we employed standard techniques from human experimental economics to measure monkey risk attitudes (utility function curvature) for water rewards in captive rhesus macaques as a function of blood osmolality (an objective measure of water wealth). Overall, our monkey subjects were slightly risk-averse in a manner reminiscent of human choosers, but only after significant training. Monkeys consistently violated expected utility theory (violating first-order stochastic dominance) early in training, indicating that traditional economic models cannot be used to describe their behavior at that stage. Once these choosers were rational, measured risk attitudes were thirst-dependent. But unexpectedly, as the animals became thirstier, risk aversion actually increased, a finding that may be incompatible with some standard economic models.


Flexible Valuations for Consumer Goods as Measured by the Becker-DeGroot-Marschak Mechanism

In this project we experimentally investigate whether valuations elicited by the commonly used Becker-DeGroot-Marschak (BDM) procedure depend on the distribution of prices used in the elicitation mechanism. To answer this question we created a novel within-subject design that allowed us to observe an individual's bid for a given product repeatedly while varying the price distribution. Our data clearly show that subjects do not consistently bid the same amount for a given good on each offer, as expected utility (EU) theory would predict, but rather show a mass-seeking bias. This bias is strongest when the mass of the price distribution is close to the average bid that the subject places on the good; bids are influenced to a lesser extent when the mass of the distribution is further from the mean bid. We characterize preference structures that are consistent with the observed behavior.
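For readers unfamiliar with the mechanism, a single BDM trial works as follows (the uniform price distribution below is just an example; the actual experimental distributions varied by condition):

```python
import random

def bdm_trial(bid, price_draw):
    """Becker-DeGroot-Marschak mechanism: if the randomly drawn price is
    at or below the stated bid, the subject buys the good at the drawn
    price; otherwise there is no transaction. Under EU theory, bidding
    one's true value is optimal regardless of the price distribution."""
    if price_draw <= bid:
        return ("buy", price_draw)
    return ("no_sale", None)

# One simulated trial with a uniform price distribution over $0-$10
price = random.uniform(0.0, 10.0)
outcome = bdm_trial(bid=5.0, price_draw=price)
```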