Controllability and resource-rational planning

Falk Lieder Noah D Goodman Quentin JM Huys

Abstract Learned helplessness experiments involving controllable vs. uncontrollable stressors have shown that the perceived ability to control events has profound consequences for decision making. Normative models of decision making, however, do not naturally incorporate knowledge about controllability, and previous approaches to incorporating it have led to solutions with biologically implausible computational demands [1, 2]. Intuitively, controllability bounds the differential rewards for choosing one strategy over another, and therefore believing that the environment is uncontrollable should reduce one’s willingness to invest time and effort into choosing between options. Here, we offer a normative, resource-rational account of the role of controllability in trading mental effort for expected gain. In this view, the brain not only faces the task of solving Markov decision problems (MDPs), but it also has to optimally allocate its finite computational resources to solve them efficiently. This joint problem can itself be cast as an MDP [3], and its optimal solution respects computational constraints by design. First, we give an analytic characterisation of the influence of controllability on the use of computational resources. Second, we replicate previous results on the effects of controllability on the differential value of exploration vs. exploitation, showing that these are also seen in a cognitively plausible regime of computational complexity. Third, we find that controllability makes computation valuable, so that it is worth investing more mental effort the higher the subjective controllability. Fourth, we show that in this model the perceived lack of control (helplessness) reproduces the empirical finding [4] that patients with major depressive disorder are less likely to repeat a choice that led to a reward or to avoid a choice that led to a loss. Finally, the model makes empirically testable predictions about the relationship between reaction time and helplessness.

Additional Detail Our first aim is to better understand the normative reasons for tracking controllability when making decisions. We build on classical descriptions of controllability as a belief about the entropy of action outcomes and extend previous work showing that controllability is a crucial determinant of the differential value of exploration vs. exploitation [1, 4]. We revisit the sequential decision-making task of [4], in which subjects face a series of slot machines with unknown outcome probabilities. Each slot machine yields discrete outcomes from 0 to 9. The multinomial outcome distributions are independently drawn from Dirichlet priors. In this scenario, the subject can exert control by adaptively choosing slot machines that have yielded a high outcome and appear to have a low outcome entropy. We formulate this task as an MDP with an augmented state space encompassing beliefs about transition probabilities. Like most MDPs of interest, such belief-state MDPs are too computationally expensive for standard solution approaches. Recently, Monte Carlo methods, which approximate the full evaluation of a tree by sampling, have proven very useful in these scenarios [5]. Monte Carlo methods highlight a critical feature of real-world decision making: in addition to choosing amongst actions in the world, agents also have to decide whether to spend further computational resources to improve their estimates of the value of actions. Here, the problem of which outcome distributions to sample from and when to stop sampling was itself formalized as a meta-level MDP [3] and solved near-optimally by an extension of the analytical results in [6]. Specifically, the states of the meta-level MDP were the mean and precision parameters of Gaussian beliefs about the Q-values of playing the k slot machines: $S_{\text{meta}} = \{(\mu_i^t, \tau_i^t)\}_{1 \le i \le k}$, with $P(Q(s, a_i)) = \mathcal{N}(\mu_i^t, \tau_i^t)$.
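The slot-machine setting described above can be illustrated with a minimal sketch. It assumes a symmetric Dirichlet prior whose concentration parameter `alpha` stands in for the controllability belief (small `alpha` gives peaked, low-entropy outcome distributions). The function names and the specific `alpha` values are illustrative assumptions, not taken from the original model.

```python
import math
import random

N_OUTCOMES = 10  # each slot machine yields a discrete outcome from 0 to 9


def sample_slot_machine(alpha, rng):
    """Draw outcome probabilities from a symmetric Dirichlet(alpha) prior.

    A Dirichlet sample is obtained by normalising independent Gamma draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(N_OUTCOMES)]
    total = sum(draws)
    return [d / total for d in draws]


def outcome_entropy(probs):
    """Shannon entropy (in bits) of an outcome distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def posterior_mean(alpha, counts):
    """Posterior mean outcome probabilities given observed outcome counts.

    The belief state of the augmented MDP can be summarised by these counts.
    """
    total = N_OUTCOMES * alpha + sum(counts)
    return [(alpha + c) / total for c in counts]


def mean_entropy(alpha, n_machines, seed):
    """Average outcome entropy over machines drawn from the prior."""
    rng = random.Random(seed)
    machines = [sample_slot_machine(alpha, rng) for _ in range(n_machines)]
    return sum(outcome_entropy(m) for m in machines) / n_machines


if __name__ == "__main__":
    # Hypothetical settings: 0.2 for a "controllable" prior, 5.0 for an "uncontrollable" one.
    print("mean entropy, alpha=0.2:", mean_entropy(0.2, 200, seed=0))
    print("mean entropy, alpha=5.0:", mean_entropy(5.0, 200, seed=0))
```

Under these assumptions, machines drawn with a small `alpha` have more predictable outcomes, so a previously rewarded machine is more likely to pay off again.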
The meta-level actions comprise the decision to stop planning and a set of computations, each of which samples from one action’s cumulative reward distribution by simulating taking the action and then following the optimal policy according to a modified version of the algorithm by [5]. The meta-level transition distribution was defined by Bayesian learning from a sample drawn from the normal distribution $\mathcal{N}(Q(s, a_i), \tau_i^{\text{sample}})$ centered on the Q-value of the simulated action $a_i$. The reward function returns the negative time cost of computation c for computations and, for the decision to stop sampling, the cumulative reward of the best action expected under the current meta-level belief. Therefore, the meta-level MDP’s objective function is the expected cumulative reward of the action that will be chosen minus the time cost of computation. Based on this formulation we derived lower and upper bounds on the number of computations n chosen by the optimal meta-level policy:

$$\frac{1}{\max_i \tau_i^{\text{sample}}} \cdot \left( \frac{1}{c \cdot \sqrt{2\pi}} - \max_i \left\{ \tau_i^0 + \tau_i^{\text{sample}} \right\} \right) \;\le\; n \;\le\; \frac{k}{\min_i \tau_i^{\text{sample}}} \cdot \left( \frac{1}{c \cdot \sqrt{2\pi}} - \min_i \left\{ \tau_i^0 + \tau_i^{\text{sample}} \right\} \right). \tag{1}$$

The optimal number of computations n is determined by subjective controllability via the precision parameters $\tau_i^0$ and $\tau_i^{\text{sample}}$ and by the cost of computation c. Figure 1 shows that 200 samples are sufficient to closely approximate the normative effect of controllability on the optimal exploration-exploitation tradeoff in sequential decision making (cf. [4]). Figure 2a shows that the optimal number of computations increases with subjective controllability. While this provides a plausible explanation of why depressed patients might invest less mental effort into planning, depression is also associated with a reduced speed of information processing, and Equation 1 also shows that the optimal number of mental simulations decreases with the time cost of computation (see Figure 2b). Finally, Figure 3 compares the relative frequency with which our model decided to repeat an action, as a function of its outcome, between prior beliefs expressing high and low controllability. This qualitatively replicates the findings of [4]. Overall, our results suggest that the importance of controllability for decision making is closely connected to the rational management of computational resources.
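The meta-level dynamics described above can be sketched as follows. The belief update is the standard conjugate update for a Gaussian mean with known sample precision. The computation-selection rule, in contrast, is a placeholder assumption (sample the least precisely known action for a fixed number of steps) and is not the near-optimal policy derived from [3] and [6]. All function and parameter names are hypothetical.

```python
import math
import random


def update_belief(mu, tau, sample, tau_sample):
    """Conjugate Gaussian update of a belief N(mu, tau) about a Q-value.

    tau and tau_sample are precisions (inverse variances).
    """
    tau_new = tau + tau_sample
    mu_new = (tau * mu + tau_sample * sample) / tau_new
    return mu_new, tau_new


def run_meta_level_episode(true_q, mu0, tau0, tau_sample, cost, n_computations, rng):
    """Perform n_computations simulations, then stop and pick the best action.

    Returns (index of the chosen action, meta-level return), where the return
    is the believed value of the chosen action minus the total computation cost.
    """
    k = len(true_q)
    mu = [mu0] * k
    tau = [tau0] * k
    total = 0.0
    for _ in range(n_computations):
        # Placeholder policy (assumption): refine the least precise estimate.
        i = min(range(k), key=lambda j: tau[j])
        # random.gauss takes a standard deviation, so convert the precision.
        sample = rng.gauss(true_q[i], 1.0 / math.sqrt(tau_sample))
        mu[i], tau[i] = update_belief(mu[i], tau[i], sample, tau_sample)
        total -= cost  # each computation incurs the time cost c
    best = max(range(k), key=lambda j: mu[j])
    total += mu[best]  # stopping reward: expected value of the best action
    return best, total


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical Q-values for three slot machines.
    print(run_meta_level_episode([1.0, 2.0, 4.0], 0.0, 1.0, 1.0, 0.05, 12, rng))
```

The sketch makes the tradeoff concrete: each extra simulation sharpens one belief at cost c, and stopping pays out the value of the action that currently looks best.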

Figure 1: Sample-based approximation to the difference in the differential Q-value of exploitation between a controllable and an uncontrollable belief-space MDP.
Figure 2: The optimal speed-accuracy tradeoff as a function of controllability and time cost, respectively.
Figure 3: Simulated repeat modulation in the eight-stage decision-making task from [4].

References
[1] QJM Huys and P Dayan. Cognition, 113(3):314–328, 2009.
[2] A Guez, A Silver, and P Dayan. In NIPS, volume 24, December 2012.
[3] NJ Hay, S Russell, D Tolpin, and SE Shimony. AUAI Press, P.O. Box 866 Corvallis, Oregon 97339 USA, August 2012.
[4] QJM Huys, JT Vogelstein, and P Dayan. In Advances in Neural Information Processing Systems, volume 21, December 2009.
[5] M Kearns, Y Mansour, and AY Ng. Machine Learning, 49(2):193–208, 2002.
[6] NJ Hay and S Russell. Technical Report UCB/EECS-2011-119, EECS Department, University of California, Berkeley, 2011.
