Delegated Experimentation

Umberto Garfagnini∗

October 19, 2011

Abstract

This paper studies situations in which a principal can only acquire information for a mutually relevant decision through a biased agent. The principal has no commitment power and transfers are impossible. We base our analysis on an exponential bandit model with one "risky" action and one "safe" action. We first characterize the unique Markov-perfect equilibrium when the agent's effort choice is observable, and thus learning is symmetric. The chance to generate positive information about the risky action is balanced against the risk of producing negative information that makes the principal lean toward the undesirable safe action. In equilibrium, this tension produces a delay in information acquisition. However, when effort is unobservable, this delay can actually hurt the agent. Thus, when the agent's cost of experimentation is low, the agent ends up implementing the principal's optimal experimentation policy.

1 Introduction

I am deeply indebted to my advisors, Alp Atakan, Marco Ottaviani, and Bruno Strulovici, for their constant encouragement, guidance, and insightful conversations. I am also very grateful to David Besanko, Wiola Dziuda, Julio González-Díaz, Brett Green, Peter Klibanoff, Bård Harstad, Jin Li, Kane Sweeney, Mauricio Varela, and seminar participants at the University of Bristol, University of Essex, ITAM School of Business, Kellogg School of Management, University of Miami, Royal Holloway, and University of Warwick. Any remaining errors are my own.
∗ ITAM School of Business, Río Hondo No. 1, Col. Progreso Tizapán, 01080 México, Distrito Federal. Email: [email protected].

In many situations, information can only be acquired through an agent who has biased incentives. Consider the case of a financially distressed bank. A regulator has to decide whether to liquidate the bank or allow the management to operate it. The optimal decision depends on the underlying quality of the bank's assets, which initially might be unknown


to both the regulator and the bank management. For instance, the assets' quality may depend on the state of the economy. In addition, the regulator's decision depends on how much effort the bank management will exert to keep the bank operating, for example, by restructuring the bank. If the assets' quality is high, then effort may be able to restore the bank's profitability, which removes the need for the regulator's intervention. Otherwise, effort is pointless. Furthermore, even if the bank's assets are high quality, any restructuring effort will require time to restore the bank's profitability. How does the separation of roles in the restructuring phase affect the bank managers' incentives and, thus, the speed of information acquisition? Does the regulator benefit from monitoring the managers closely? How does the quality of information generated change when the regulator cares about the externality resulting from his intervention?

This paper studies the delay in information acquisition when a principal must delegate experimentation to a biased agent. The principal has no commitment power, and monetary transfers are impossible. We conduct our analysis using an exponential bandit model with one "risky" action and one "safe" action. The risky action could be either "good" or "bad" for the principal, while the agent strictly prefers the risky action over the safe action. At each instant, the principal must decide between the risky action (e.g., allowing the bank to operate) and the alternative safe action (e.g., liquidating the bank), which yields a deterministic payoff to both players. The agent, in turn, chooses an effort level. In a good state, effort determines the arrival rate of a shock that, once it occurs, makes the risky action dominant for the principal (effort is pointless in a bad state). In the context of the introductory banking example, the shock corresponds to the bank's return to solvency. In addition, it is important to keep in mind that the bank managers (i.e., the agent) are in charge of the restructuring and receive a salary as long as the bank is not liquidated, which generates a bias.

We analyze the model under two different monitoring assumptions: in the baseline scenario, effort is observable; in the second scenario, effort is unobservable. When effort is observable, we restrict attention to Markov-perfect equilibria, with the posterior belief about the risky action as a state variable. This restriction allows for a tractable analysis and a unique equilibrium. The equilibrium concept allows for the principal's lack of commitment power, since the principal cannot commit to accept only high effort levels. In addition, the restriction to Markov strategies implies that both players' decisions can only be based on the payoff-relevant information about the history of play, which is captured completely by the posterior belief about the state of the risky action. We find this requirement to be reasonable in many applications of the model.

In equilibrium, the agent balances the chance to generate a shock with the risk of facing

a more pessimistic principal later on. By exerting effort, the agent increases the chance of a shock occurring, which ends the need for experimentation. However, when no shock occurs, high effort also increases the principal's pessimism about the good state (i.e., the true state of the bank's assets). Thus, the agent prefers to create an information delay in the unique Markovian equilibrium of our game, although this is not what the principal wants. Delay is a consequence of the principal's lack of commitment power to terminate the relationship with the agent. The principal can only make decisions that are sequentially rational, given the publicly available information and the expectation about the continuation of play. In the introductory banking example, procrastination allows the management to retain independent control of the bank for a longer period of time than would be the case if full effort were exerted initially. Given observability, the regulator will link the delay in the occurrence of a shock to lack of effort by the bank managers rather than to low asset quality.

The agent increases effort over time until either a shock occurs or the principal stops experimentation by switching to the safe action. The monotonicity of the equilibrium level of effort follows from two observations. First, the principal can exert pressure on the agent by threatening to implement the safe action. This threat determines the minimal level of effort that the agent must exert in order for the experimentation phase to continue. Second, the agent chooses exactly that level of effort in every period, which makes the principal indifferent between the two actions. Since no news is bad news, the principal becomes more pessimistic as time goes by. The principal's growing pessimism increases his commitment power to terminate the agent, unless the agent exerts more effort. In our banking example, the managers will have to try harder to keep control of the bank when the bank's financial situation fails to improve.

The cost of experimentation determines how much information the agent is willing to generate through experimentation. Without taking discounting into consideration, the agent generates the principal's desired level of information only if the cost is low. The principal always maintains control over which action is chosen in any period. However, the value of the risky action for the principal is linked to both the posterior belief and how much effort the agent exerts. If the agent exerts no effort, the risky action has no value for the principal, and the principal is forced to switch to the safe action. This bleak scenario occurs when the cost of effort is high. The agent's effort must increase over time when no shock arrives, so at some point the marginal cost of an increase in effort exceeds the marginal benefit to the agent. Experimentation then ends prematurely, because the principal would still be sufficiently optimistic to allow experimentation, even though the agent's level of effort is less than maximal. When the cost is low, the situation is reversed, and the agent would be

willing to experiment for a longer period of time than the principal desires. In equilibrium, the principal stops experimentation at exactly the posterior belief at which he would optimally stop anyway.

Even when discounting is considered, delegating experimentation to the agent may result in more information being generated, as compared to the case in which the principal engages directly in experimentation (were it possible). In our banking example, this scenario corresponds to the case in which the regulator takes control of the bank (instead of liquidating it). Delegation to the agent generates more information than direct experimentation when the prior belief about the good state lies in an intermediate range. The agent has more to lose than the principal if the safe action is imposed forever. Thus, the agent is willing to sustain current losses for a longer period of time in order to prolong the experimentation phase. However, when the prior belief is sufficiently optimistic and the principal cannot experiment directly, the principal is willing to tolerate lower effort levels by the agent. Consequently, the delay induced by the agent reduces the quality of the information she generates.

The equilibrium experimentation policy may also exhibit over-experimentation when compared to the utilitarian optimum. Over-experimentation occurs when: i) social welfare under the safe action is lower than the agent's utility from implementing the risky action at all times; and ii) the cost of experimentation is high. In this case, the social gain from implementing the risky action more than compensates for the social loss incurred by the principal in the case of a bad state. Thus, the utilitarian experimentation policy dictates the implementation of the risky action, whether or not the shock occurs. However, experimentation stops at a posterior belief that is strictly higher than the equilibrium cutoff belief. In fact, the agent's lack of control over the course of action in the delegation case induces her to over-experiment in order to prolong the experimentation phase. Since the principal fails to internalize the social gain resulting from the implementation of the risky action, the agent is forced to exert effort to keep the principal from switching to the safe action. Instead, when the cost of experimentation is low, or the social gain from the risky action is not enough to compensate for the principal's payoff loss, there will be under-experimentation in equilibrium compared to the social optimum. Regardless of the specific scenario, however, the equilibrium level of effort is always less than the socially optimal level.

When effort is unobservable, the delay in information acquisition vanishes in equilibrium: for small enough cost parameters, the value of monitoring for the principal is negative in our model. Monitoring, together with the principal's lack of commitment power to terminate the agent, exposes the principal to manipulations that occur through the control the agent

has on the rate of learning. When effort is unobservable, the principal's decision to continue or stop experimentation depends only on his expectation about the agent's effort choices and the occurrence of the shock. Thus, an unobservable deviation to higher effort has the benefit of increasing the probability of the shock without any negative effect on the principal's posterior belief. In equilibrium, the principal expects the agent's higher effort level and responds accordingly. However, the principal's optimal experimentation policy can only be implemented when experimentation is cheap for the agent. When experimentation is expensive, moral hazard leads to under-experimentation in equilibrium, because the principal lacks instruments to discipline the agent other than the threat of implementing the safe action.

Finally, we extend the model by analyzing a scenario in which the principal cares about the externality induced on the agent by his choice of action. We then perform comparative statics on how much the principal cares about the externality. In equilibrium, when effort is observable, we show that delay increases when the principal cares more about the externality. This is because the principal (partially) internalizes the negative effect on the agent that results from a switch to the safe action. The principal becomes more lenient, and the agent takes advantage of the additional leeway to slow down the rate of learning. The (undiscounted) amount of information generated in equilibrium thus depends on the cost of experimentation. When the cost is low, the agent provides more information, compared to the baseline scenario in which the principal disregards the externality. When the cost is high, the level of information is unaffected by the principal's increased interest in the agent, because experimentation is still too expensive for the agent.

This paper contributes to several strands of the literature. First, we build on the recent literature on strategic experimentation following Bolton and Harris (1999) and, in particular, Keller, Rady, and Cripps (2005). However, we depart from the literature by analyzing a delegation problem in which the principal decides which action is chosen in any period, while the agent controls the rate of learning. Thus, the paper ignores free-riding considerations and instead focuses on how the conflict of interest between the principal and the agent shapes incentives to generate or delay information when the principal has no commitment power and limited instruments to discipline the agent. Strulovici (2010) considers a similar environment in which multiple agents decide collectively which action is going to be implemented, focusing on the effect of voting rules on group decisions. This paper differs from that approach in that the players have different roles during the experimentation phase. Bonatti and Hörner (2011b) focus on moral hazard in teams and show that a free-riding problem leads to underprovision of effort. In contrast, we show how delay arises from the conflict of interest

between the principal and the agent.1 Finally, Hirsch (2011) analyzes experimentation in a principal-agent framework in which transfers are not feasible, but with a focus on the disagreement between the principal and the agent over the best policy to implement in pursuit of a common goal, rather than on a conflict of interest, as is the case in this paper.

The literature on financing experimentation is also related to this paper. The papers most similar to this one are Bergemann and Hege (2005) and Hörner and Samuelson (2009), which study the optimal, dynamic provision of incentives to an entrepreneur who needs funding to complete a project. In contrast, we focus on a different trade-off. Our agent considers the generation of the shock to be a burden, rather than a positive event, because the agent does not directly benefit from experimentation and effort is costly. Thus, when monitored, the agent prefers to exert the least possible amount of effort. Furthermore, the agent is willing to terminate the relationship prematurely if the cost of effort is sufficiently high and the posterior belief about the good state is low, while the principal prefers experimentation to continue. Besanko and Wu (2010) study the effect of subsidies on the development of R&D projects, while Gerardi and Maestri (2009) investigate the optimal, long-term contract that induces the agent to acquire and reveal information that is useful for the principal. In contrast, this paper examines environments in which transfers are not possible.

The impossibility of transfers links our paper to the literature on optimal delegation, following Holmström (1984).2 However, we relax two main assumptions from that paper. First, we endogenize the principal's commitment power through a (possibly) infinitely repeated relationship with the agent. In this regard, our paper is also related to Alonso and Matouschek (2007), but with an important difference, which leads to our next assumption. Second, we introduce learning about a payoff-relevant state of the world. Although learning is symmetric in our baseline model, the principal still faces the problem of providing the agent with the right incentives to experiment.

A parallel line of research investigates the so-called "games of persuasion".3 This paper is most closely related to Brocas and Carrillo (2007), which studies a game in which a leader (i.e., agent) controls the flow of public information, while a follower (i.e., principal) uses the information generated to make a decision. In contrast, in this paper the principal chooses

1 Exponential bandit models have also been employed by Malueg and Tsutsui (1997) and Décamps and Mariotti (2004). More recently, Klein (2010) employed exponential bandits in a principal-agent model in which the principal pays the agent to prove a certain hypothesis, but the agent can replicate the results using a known technology.
2 Other relevant papers in the optimal delegation literature include Alonso and Matouschek (2008), Szalay (2005), and Armstrong and Vickers (2010).
3 Early contributions include Matthews and Postlewaite (1985), Milgrom and Roberts (1986), and Kamenica and Gentzkow (2010).


which action is implemented in every period, and the value of learning is linked to that choice. In addition, the effort exerted by our agent has the dual effect of revealing the state of the risky action and directly affecting the principal's preferences.

Finally, our paper relates to the literature on the possible disadvantages of monitoring in agency problems. Crémer (1995) and Bergemann and Hege (2005) show that the principal can benefit from less information about the agent's actions, because of the resulting increase in the principal's commitment power. A similar observation holds in our framework. The lack of monitoring implies that the only way the agent can induce the principal to choose the risky action is by generating the shock. In this scenario, delay can only hurt the agent, so the agent experiments more than when she is more closely monitored. In this regard, our result recalls the "signal-jamming" problem in the career-concerns model of Holmström (1999).4

2 The Baseline Model: Observable Effort

We present a model of delegated experimentation between a risk-neutral principal ("he") and a risk-neutral agent ("she"), based on Keller, Rady, and Cripps (2005). Time t ∈ [0, ∞) is continuous, and payoffs are discounted at a common interest rate, r > 0. At each instant, t, the principal must decide between taking a "risky" action, R, such as letting a financially distressed bank continue to operate, and a "safe" action, S, such as liquidating the bank. The safe action yields a flow payoff, s > 0, per unit of time to both players.5 The agent must continually decide how much observable effort, k ∈ [0, 1], to exert.6

The risky action yields lump-sum payoffs of size h to the agent at random times that correspond to the jumping times of a Poisson process with intensity normalized to 1. Over an interval of time, [t, t + dt), if the agent exerts effort, kt ∈ [0, 1], her expected payoff is (h − ckt) dt, with c > 0. We assume that h > s, so it is commonly known that the agent strictly prefers R to S. The principal can prevent the agent from receiving the flow payoff, h, only by implementing S. There are no monetary transfers, and, as in the introductory banking example, the managers of the bank continue to receive their salaries as long as the bank is not liquidated.

The principal's instantaneous payoff from R depends on: i) a state of the world, θ ∈

4 See Bonatti and Hörner (2011a) for a recent reexamination of Holmström (1999).
5 The assumption that the flow payoff under the safe action is the same for both players is just for convenience.
6 Section 3 relaxes the assumption that the agent's choice of effort is observable.


{Good, Bad}, which is unknown to both players, such as the quality of a bank's assets; and ii) the effort, k, chosen by the agent. If θ is bad, R generates no payoffs for the principal, regardless of effort. If θ is good, then the principal receives no payoffs until the occurrence of a shock at a random time, τ, which corresponds to a lump-sum payoff of size h for the principal. If no shock has occurred by time t, then the probability of observing the shock over an interval of time, [t, t + dt), is kt dt, conditional on θ = Good.7 After the occurrence of the shock, the strategic part of the game ends. From τ onwards, the principal's future lump-sum payoffs, still of size h, arrive at random times corresponding to the jumping times of a Poisson process with constant intensity 1, but independently of the agent's lump-sum payoffs.8

A few remarks are in order. First, the agent controls the arrival time of the shock in the good state through her choice of effort. Second, the observation of the first lump-sum payoff has the dual effect of revealing the state (i.e., an informational effect) and affecting the principal's payoff structure (i.e., a direct payoff effect). After the shock, the agent's effort has no additional effect on the principal's payoff distribution. In the introductory banking example, the shock may be interpreted as the bank's return to solvency, which not only proves the assets' high quality but also carries payoff-relevant consequences.9 Third, the assumption that lump-sum payoffs arrive stochastically after the observation of the shock is simply a convenient way to model the game, while simultaneously allowing for a direct comparison with the standard exponential bandit model.

The principal and the agent share an initial common belief, p0 < 1, that θ is good. Since all the information is public, the posterior belief, pt = Prob(θ = Good | Ft), is common as well, where Ft denotes all the information available up to time t. We can then write the principal's instantaneous, expected payoff from R as pt kt h dt, if the agent exerts effort, kt, over [t, t + dt) and no shock occurs before time t. The assumption that the principal and the agent share a common prior belief at the start of the game is a strong requirement. For instance, in our introductory example, if the quality of the bank's assets depends on the state of the economy, it might be reasonable to assume that the bank management and the regulator are equally well informed. However, in some applications, the principal and the

7 The assumption that learning cannot occur when k = 0 is just for convenience. The qualitative results of this paper would remain the same if we instead assumed that the instantaneous probability of the shock is (kt + ε) dt, with ε positive but small.
8 The assumption that the size of the principal's lump-sum payoffs is the same as the agent's is just for economy of notation.
9 Alternatively, we can assume that the shock reduces the agent's cost of experimentation, which we can normalize to zero. In this case, the agent will simply exert maximal effort after the shock to induce the principal to continue implementing the risky action. For example, the bank's return to solvency induces the return to "business as usual".


agent may be asymmetrically informed.

[Figure 1: Heuristic timeline over an interval of time, [t, t + dt). The agent chooses effort k; the principal observes k and chooses R or S; payoffs are realized.]

Given the agent's effort choices, {kt}, it follows from Bayes' rule that the posterior belief, pt, evolves according to the dynamic equation10

$$\dot{p}_t = -k_t\, p_t\,(1 - p_t), \qquad (1)$$

provided that the principal continues to choose the risky action. The posterior jumps to 1 as soon as the shock is observed and stays constant thereafter. If the principal chooses the safe action, then no learning occurs about the risky action, regardless of effort. More importantly, the effort chosen by the agent controls the rate of learning, because higher levels of effort induce larger drops in the posterior belief following no shock.

Before proceeding with the explanation of the equilibrium concept, consider Figure 1, which contains a heuristic timeline over an infinitesimal interval of time, [t, t + dt). The agent chooses a level of effort, kt, for the considered interval of time. The principal observes the agent's choice of effort and decides whether to choose R or S. If the principal chooses R, both players observe whether or not the shock occurs. Otherwise, both players receive s over that interval of time.11

An experimentation policy is a stochastic process, Z = {Zt}, adapted to the filtration generated by the arrival of the principal's lump-sum payoffs, which takes values in the space [0, 1] × {0, 1}. At any time t, Zt = (kt, Dt), where kt is the effort chosen at time t, and Dt is the decision to implement R (Dt = 1) or S (Dt = 0).

10 Equation (1) can be easily derived from Bayes' rule as follows:
$$p_t + dp_t = \frac{p_t(1 - k_t\, dt)}{(1 - p_t) + p_t(1 - k_t\, dt)}.$$
Taking pt to the right-hand side, dividing by dt, and taking the limit as dt → 0, we obtain the desired law of motion.
11 The reason for the choice of timing will be explained after we introduce the solution concept.

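To make the law of motion concrete, the following minimal numerical sketch (ours, not part of the paper; the parameter values are purely illustrative) integrates equation (1) under a constant effort path and checks the result against the closed form implied by Bayes' rule:

```python
import numpy as np

p0, k, dt, T = 0.9, 0.6, 1e-4, 5.0   # illustrative prior, effort, step, horizon
ts = np.arange(0.0, T, dt)

# Euler integration of equation (1): p' = -k * p * (1 - p)
p = np.empty_like(ts)
p[0] = p0
for i in range(1, len(ts)):
    p[i] = p[i - 1] - k * p[i - 1] * (1.0 - p[i - 1]) * dt

# Closed form from Bayes' rule with cumulative effort K_t = k * t:
# p_t = p0 e^{-K_t} / (p0 e^{-K_t} + (1 - p0))
K = k * ts
closed = p0 * np.exp(-K) / (p0 * np.exp(-K) + (1.0 - p0))

print(f"max deviation: {np.max(np.abs(p - closed)):.1e}")  # small: Euler error only
```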

Given any experimentation policy, Z, the agent's value function induced by Z is

$$V_{A,t}^{Z} = E\left[\int_t^{+\infty} e^{-ru}\big[(1 - D_u)\,s + D_u(h - ck_u)\big]\, du\right], \qquad (2)$$

while the principal's value function is given by

$$V_{P,t}^{Z} = E\left[\int_t^{\tau(Z)} e^{-ru}(1 - D_u)\,s\, du + e^{-r\tau(Z)}\left(\frac{h}{r} + h\right)\right], \qquad (3)$$

where τ(Z) is the (possibly infinite) random arrival time of the first lump-sum payoff for the principal, which is induced by the experimentation policy Z.12

A Markovian decision strategy for the principal is a function, D : [0, 1]² → {0, 1}, which maps the posterior belief and the effort level chosen by the agent into an action. Similarly, a Markovian effort strategy for the agent is a function, k : [0, 1] → [0, 1]. A Markovian experimentation policy, Z, is an experimentation policy such that both {kt} and {Dt} are Markovian. Under the Markovian restriction, the value function of the principal solves the following Hamilton-Jacobi-Bellman (HJB) equation

$$r V_P^Z(p) = \max\left\{\, p k(p) h + p k(p)\left[\frac{h}{r} - V_P^Z(p)\right] - k(p)\, p(1-p)\,\frac{dV_P^Z(p)}{dp},\; s \,\right\}, \qquad (4)$$

while the value function of the agent solves

$$r V_A^Z(p) = \max_{\tilde{k}\in[0,1]}\left\{ D(p,\tilde{k})\left[h - c\tilde{k} + p\tilde{k}\left(\frac{h}{r} - V_A^Z(p)\right) - \tilde{k}\, p(1-p)\,\frac{dV_A^Z(p)}{dp}\right] + \big(1 - D(p,\tilde{k})\big)\, s \right\}. \qquad (5)$$

The principal's value function reflects the value associated with each action. Clearly, the principal's payoff cannot fall below what he can obtain by simply choosing S, which is represented by the second term of the maximand in (4). The first term represents the payoff associated with R. By choosing R, the principal receives an expected flow payoff equal to pkh, where k is the level of effort exerted by the agent. In addition, the value function changes as information generated by experimentation is released. The continuation payoff can be decomposed into two elements: i) the jump of the value function following the shock; and ii) the change in the value function when no shock occurs, which follows from Bayesian updating.

12 The definition of V_P^Z already incorporates the observation that choosing R is a dominant strategy for the principal after the shock.


The agent's value function is obtained in a similar manner. However, it is important to note that the agent's continuation value also depends on the posterior belief, for two reasons: i) the principal retains the power to discontinue the risky action in the future; and ii) the posterior belief affects the agent's value of experimentation. Next, we define the equilibrium concept.13

Definition 1. A Markovian experimentation policy, Z, is a Markov-perfect equilibrium if it solves (4) and (5) simultaneously for almost all p.

The Markovian restriction rules out strategies that are history-dependent, but it yields a tractable analysis. Equation (4) captures the principal's lack of commitment power, which results from his inability to reject effort paths that yield a payoff from R higher than what he could obtain by choosing the safe action. Finally, note that the principal's level of commitment power evolves with the posterior belief.14

2.1 The Principal's Optimal Experimentation Policy

We begin the analysis by establishing a useful benchmark. We focus on the optimal policy from the principal's point of view.15 What would the outcome of experimentation be if the principal could impose experimentation on the agent? This is a single-agent decision problem. Let V_O(p) denote the value function of the principal, which must solve the following HJB equation:16

$$r V_O(p) = \max\left\{\, \max_{k}\left\{ p k h + p k\left[\frac{h}{r} - V_O(p)\right] - k\, p(1-p)\, V_O'(p) \right\},\; s \,\right\}. \qquad (6)$$

13 The equilibrium concept involves a backward induction argument similar to the one employed by Strulovici (2010), but in this paper the principal's value function depends on the agent's choice of effort.
14 To complete the description of the model, it is important to understand why we assume that the agent's choice of effort is observable before the principal makes his own decision, rather than both choices occurring simultaneously. Suppose that both strategies depend only on the posterior belief. In particular, consider a time t at which the posterior belief, pt, is such that Dt = D(pt) = 1, so that the principal chooses the risky action at t. However, if the agent deviates to zero effort, the posterior belief will remain unchanged, because no learning has occurred following the deviation. The principal would then keep implementing the risky action even after the deviation. Thus, Markovian strategies would not be appropriate to describe equilibrium behavior, which would require the use of non-Markovian strategies. While restrictive, we believe that the assumption that the agent chooses her effort level before the principal implements his choice of action gains tractability without affecting the economic insights derived from our analysis.
15 The reason for this choice of benchmark will become clear when we analyze the experimentation game in the absence of monitoring in Section 3.
16 Equation (6) does not mention the cost of effort because the agent pays this cost.


The linearity in k of the maximand in (6) clearly shows the optimality of a cutoff strategy. That is, the principal sets effort k = 1 above a cutoff belief p^O (yet to be determined). Below p^O, the principal exerts no effort and, thus, V_O(p) = s/r for any p ≤ p^O. The cutoff belief can be obtained from the smooth-pasting condition, which implies that the derivative of the value function vanishes at the cutoff belief (i.e., V_O'(p^O) = 0), giving

$$p^O = \frac{rs}{rh + (h - s)}. \qquad (7)$$
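As a quick illustration (our arithmetic, using the parameter values that appear in Figure 2 below: h = 2, s = 1, r = 1/6):

$$p^O = \frac{rs}{rh + (h - s)} = \frac{(1/6)\cdot 1}{(1/6)\cdot 2 + (2 - 1)} = \frac{1/6}{4/3} = \frac{1}{8}.$$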

The cutoff belief is well-defined, provided that h > s. We thus obtain the immediate result given in Proposition 1.

Proposition 1. If the principal can impose experimentation on the agent, then the principal chooses R for any posterior belief above p^O and imposes maximal effort, k^O(p) = 1, on the agent; otherwise, the principal chooses S, and the agent exerts zero effort.

Proposition 1 is quite intuitive. The principal has an outside option and thus wants to learn whether or not he should switch to the safe action as soon as possible. Since delay generates an opportunity cost for the principal, he imposes maximal effort on the agent to speed up the rate of learning.

2.2 Equilibrium Analysis

More realistically, the principal can hardly expect the agent to follow his instructions for conducting experimentation. In this case, Theorem 1 shows that there exists a unique Markov-perfect equilibrium in the experimentation game. The equilibrium policy depends crucially on the magnitude of the cost of experimentation, c. If c is "low," then experimentation continues until either a shock occurs or the posterior belief reaches p^O, the principal's optimal cutoff belief. If the cost is "high," then there is under-experimentation in equilibrium; that is, the agent prefers to let S be imposed before the posterior belief reaches p^O, even if she could still convince the principal to continue with experimentation. In both scenarios, experimentation is delayed over time. This delay is a direct consequence of the principal's lack of commitment power.

Definition 2. Experimentation has a high cost if c > c* ≡ (1+r)h(h−s)/(rh+(h−s)), and a low cost otherwise.
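The same illustrative parameter values as above (h = 2, s = 1, r = 1/6; our arithmetic) give the threshold quoted in the caption of Figure 2:

$$c^* = \frac{(1+r)h(h-s)}{rh + (h-s)} = \frac{(7/6)\cdot 2\cdot 1}{4/3} = \frac{7}{4} = 1.75.$$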

The agent faces a trade-off in her choice of effort. From the agent's HJB equation, (5), it is immediately evident that the effort, k, affects the agent's flow payoff through the cost

only, but it affects her continuation payoff in two ways. First, higher effort increases the probability of a shock, which, once it occurs, makes both the principal's and the agent's continuation values jump to h/r. At that point, the principal faces exactly the same incentives as the agent, and he will never question R again. Second, if no shock occurs, higher effort leads to a larger decrease in the posterior belief. Faster learning, then, has the adverse effect (for the agent) of prompting an early end to the experimentation phase. Theorem 1 shows that the second effect always dominates.

Theorem 1. Suppose c < (1+r)h(h−s)/(rs). The experimentation game has a unique Markov-perfect equilibrium, such that the principal chooses the risky action, R, for posterior beliefs above max{p*(c), p^O} and the agent exerts effort, k*(p) = p^O/p < 1, where p^O is the principal's optimal stopping cutoff, given by (7), while

$$p^*(c) = \frac{rcs}{(1+r)h(h-s)} \qquad (8)$$

is the agent's optimal stopping cutoff. Otherwise, the principal chooses the safe action, S, and the agent exerts zero effort. The agent's value function, V_A, is nondecreasing in p, while the principal's value function is constant at s/r. Also, p*(c) < p^O if and only if the cost of experimentation is low. Finally, if c > (1+r)h(h−s)/(rs), then there is no experimentation in equilibrium, and S is implemented for any posterior belief.

When the cost is too high, it is impossible to motivate the agent to conduct any experimentation; otherwise, the agent exerts the minimal level of effort that the principal is willing to accept in order to choose R. The agent prefers to experiment in order to prompt the shock for the principal, thereby securing R forever, without the need for further experimentation. However, if the pace of experimentation is too fast and no shock occurs, the agent faces a more pessimistic principal. Since experimentation is conducted with the sole purpose of learning whether or not R is good for the principal, who has the power to discontinue the risky action, it is optimal for the agent to exert as little effort as possible. In fact, even if the cost of experimentation were zero, the agent would generate delay. Thus, the equilibrium choice of effort displays extreme pessimism. The agent draws out the experimentation phase, rather than hastening it and thereby increasing the likelihood of the shock.

Suppose the effort is fixed at k > 0. By the dynamic equation (1), the time it takes for the posterior


belief to drop from p to q < p is simply

$$T = \frac{1}{k}\,\ln\left[\frac{p(1-q)}{(1-p)q}\right]. \qquad (9)$$

[Figure 2: Equilibrium effort over time with observable effort; h = 2, s = 1, p0 = 0.9, r = 1/6. Panel (a): low cost, c = 0.5 < 1.75 = c*; effort rises until T*. Panel (b): high cost, c = 3 > c*; effort rises until T*(c).]
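The low-cost panel can be reproduced directly. The sketch below (ours; it uses only Theorem 1's effort rule, k*(p) = p^O/p, equation (1), and the Figure 2(a) parameters) integrates the equilibrium path and recovers the stopping date T*; the closed form in the final line is our derivation, obtained by substituting k* into (1), which gives 1 − p_t = (1 − p0)e^{p^O t}:

```python
import numpy as np

h, s, r, p0 = 2.0, 1.0, 1.0 / 6.0, 0.9
pO = r * s / (r * h + (h - s))             # = 1/8, equation (7)

dt, t, p = 1e-4, 0.0, p0
while pO / p < 1.0:                        # experimentation region: p > p^O
    k = pO / p                             # minimal effort keeping the principal indifferent
    p -= k * p * (1.0 - p) * dt            # belief update, equation (1)
    t += dt

print(f"effort rises from {pO / p0:.3f} toward 1.0; simulated T* = {t:.2f}")
print(f"closed-form T* = {np.log((1 - pO) / (1 - p0)) / pO:.2f}")   # ~17.35
```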

It is immediately apparent that as k decreases, T increases. Therefore, the equilibrium effort path maximizes T in case no shock occurs, subject to the principal's participation constraint.

The equilibrium effort level monotonically decreases in the posterior belief. This is because the principal becomes more pessimistic when no shock arrives, which corresponds to a reduction in the posterior belief. Thus, the minimal level of effort that keeps the principal indifferent must necessarily increase as the posterior belief drops. The monotonicity is directly linked to the principal's commitment power to discontinue the risky action. In the absence of a shock, this commitment power increases over time.

Physical constraints dictate that experimentation cannot be sustained below p^O, because the minimal effort level required by the principal would then exceed the upper bound of 1. Thus, the cutoff belief, p*(c), is the optimal stopping threshold for the agent if she faces no constraints on the level of effort at each instant of time. The threshold cost level, c*, coincides with the level that equates p*(c) with p^O. When the cost is low, p*(c) < p^O. In this case, the agent has more incentives than the principal to experiment, and thus she is willing to extend experimentation until the posterior belief reaches p*(c). For the principal, the price of this action comes in the form of delay, since the agent experiments at a much slower pace than the principal wants. In equilibrium, when the cost is high, the agent under-experiments, as compared to the principal's optimal level (i.e., p*(c) > p^O). At the stopping


threshold, the marginal cost of additional experimentation exceeds the marginal benefit for the agent, even if she is still able to convince the principal to go on. In this case, increasing the upper bound on the maximal, per-period effort level has no effect on the equilibrium outcome.

Theorem 1 implies the following observations.

Corollary 1 (Comparative Statics). In the unique equilibrium characterized in Theorem 1:
1. The equilibrium level of effort is an increasing function of time.
2. The equilibrium level of effort increases in the interest rate, r, and the flow payoff, s, while it decreases in the flow payoff, h.
3. In the absence of a shock, the duration of the experimentation phase decreases in the interest rate, r, and the flow payoff, s, while it increases in the flow payoff, h.

Increased impatience speeds up the rate of learning, because it is costlier for the principal to wait for the shock to occur. This puts more pressure on the agent, who is forced to intensify her efforts. A similar situation occurs when the outside option, s, increases, because the principal has more commitment power due to the increased attractiveness of the safe action. In contrast, an increase in the flow payoff, h, associated with a good risky action makes experimentation more attractive for the principal, and thus the agent can decrease her effort. It is also important to note that the above comparative statics arguments remain valid if we allow for heterogeneity in the payoffs to the principal and the agent. This is because the equilibrium level of effort is derived solely from the principal's indifference condition. However, heterogeneity would affect the equilibrium cost cutoff, c*. Figure 3 shows how the path of equilibrium effort changes as players become more impatient. A diametrically opposed argument holds for the duration of the experimentation phase.
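To spell out the indifference condition behind the equilibrium effort rule (the algebra is ours, but it only restates the text): in equilibrium V_P(p) = s/r on the experimentation region, so V_P'(p) = 0, and equating the R-branch of the HJB equation (4) to s gives

$$pkh + pk\left(\frac{h}{r} - \frac{s}{r}\right) = s \;\Longleftrightarrow\; pk\,\frac{rh + (h-s)}{r} = s \;\Longleftrightarrow\; k = \frac{rs}{p\,[rh + (h-s)]} = \frac{p^O}{p},$$

which is exactly the equilibrium effort of Theorem 1.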

2.3 The Informational Content of Delegation

In order to evaluate the effect of delegation on information acquisition, we need to measure how much information is generated through experimentation. Since, by equation (1), effort controls the rate of learning about the state of the risky action, it seems natural to employ the aggregate amount of effort as a proxy for the total amount of information generated. We thus introduce the two measures given in Definition 3.


[Figure 3: Comparative statics of the equilibrium effort path over time for interest rates r = 0.5, 1.5, and 3; c < c*, h = 2, s = 1, p0 = 0.9.]

Definition 3. Let T denote the calendar time at which experimentation ends if no shock occurs. Then:
(i) the amount of experimentation is given by $\int_0^T k_t\, dt$;
(ii) the discounted amount of experimentation is given by $\int_0^T e^{-rt} k_t\, dt$.

The amount of experimentation defines a measure for comparing experimentation policies. This measure is particularly easy to compute, because it depends solely on the prior belief, p0, and the belief, q < p0, at which experimentation ends, in the following way:17

$$\int_0^T k_t\, dt = \ln \Omega(q) - \ln \Omega(p_0), \qquad (10)$$
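Equation (10) follows from (1) in one line (our algebra; see also Lemma 3.1 in Keller, Rady, and Cripps (2005)): since Ω(p) = (1 − p)/p,

$$\frac{d}{dt}\ln \Omega(p_t) = -\frac{\dot{p}_t}{1 - p_t} - \frac{\dot{p}_t}{p_t} = -\frac{\dot{p}_t}{p_t(1 - p_t)} = k_t,$$

and integrating from 0 to T yields (10).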

where Ω(p) = (1 − p)/p. However, some caution must be exercised, because it is an undiscounted measure. For example, it is unclear whether or not an experimentation policy that induces experimentation until the posterior belief drops to a level, q, is necessarily better than another policy in which experimentation terminates at q′ > q. The first policy generates a higher amount of experimentation than the second, but the latter may be characterized by delay in information acquisition through a lower effort level throughout. Nonetheless, (10) is useful and easy to compute because of its independence from the effort path.

Given that the present paper presents a theory of delay in information acquisition, we need to factor in discounting. The discounted amount of experimentation exactly captures

17 See Lemma 3.1 in Keller, Rady, and Cripps (2005).


the idea that information produced far in the future is worth less than information produced today. We can now compare the amount of information generated in equilibrium with the optimal level for the principal.

Proposition 2. Suppose c < c*. For any p0 > p^O, the equilibrium amount of experimentation is equal to the optimal amount, but the equilibrium discounted amount of experimentation is strictly less than the optimal discounted amount.

When the agent's cost of experimentation is low, she produces exactly the same amount of information that the principal requires. In equilibrium, though, the agent generates delay in information acquisition in order to slow down the principal's rate of learning.18 The agent benefits from this delay because she can save on the costs of experimentation and also because she can induce the principal to experiment for a longer period of time. Thus, even though the information provided is the same, the delay significantly reduces the quality of that information. When the cost of experimentation is high, the agent generates less information than is optimal for the principal, even in undiscounted terms.

A more interesting comparison involves the amount of information that the principal would generate if he were conducting experimentation directly, and thus paying the cost.19 In the introductory banking example, this would only occur if the regulator took over the management of the bank.20 The optimal solution to the principal's problem with direct experimentation can be derived by following the same steps given in subsection 2.1. The principal experiments at maximal effort until either a shock occurs or the posterior belief reaches the cutoff, p^D(c) = r(s + c)/(rh + (h − s)), which is higher than p^O because the principal now has to internalize the cost of experimentation.

Definition 4. For any prior belief, p0, the informational content of delegation is the difference between the equilibrium discounted amount of experimentation and the discounted amount of experimentation under direct experimentation.

Theorem 2 states the main result of this section.

18 Even though learning is symmetric in this baseline model, the agent does not directly benefit from knowledge about the state.
19 In this situation, we can either assume that the agent has been fired or that the agent is still enjoying the higher flow payoff associated with the risky action but has no power to conduct any experimentation.
20 For instance, in the United States, the Federal Deposit Insurance Corporation (FDIC) can take control of a bank when the bank is deemed to be sufficiently undercapitalized.


Theorem 2 (Delegation vs Direct Experimentation). Fix c < (1+r)h(h−s)/(rs). The equilibrium amount of experimentation is always higher than the amount under direct experimentation. The informational content of delegation is zero starting from low prior beliefs, positive for intermediate prior beliefs, and negative for high prior beliefs.

Theorem 2 shows that, starting from an intermediate prior belief, delegating experimentation to the agent maximizes the discounted amount of information generated. The agent has more at stake than the principal from having the safe action imposed forever. Thus delegation is valuable from an informational point of view, because it exploits the agent's stronger incentives to generate information. For high prior beliefs, however, direct experimentation generates more discounted information than delegation. The main reason for the reversal is the rate of learning: the agent generates information too slowly, so the greater equilibrium amount of experimentation is not enough to compensate for the delay created by the agent. Figure 4 provides a graphic illustration.

In particular, note that the equilibrium discounted amount of experimentation may fail to be monotonic in the prior belief. Perhaps surprisingly, the discounted amount of experimentation generated in equilibrium may be higher starting from intermediate prior beliefs than when the players start from high prior beliefs. To gain some intuition, recall that a reduction in the prior belief has two opposite effects. On the one hand, it reduces the aggregate amount of effort exerted during the experimentation phase. On the other hand, it increases the level of effort the agent exerts at each instant (in calendar time), because the principal is now more pessimistic initially.21 If the second effect is stronger than the first, the discounted amount of experimentation will increase after a reduction in the prior belief. This explains the non-monotonicity in the equilibrium discounted amount of experimentation.

Delegation can also lead to higher social welfare, as we see in Corollary 2, which is an immediate consequence of Theorem 2.

Corollary 2 (Value of Delegation). There exists an intermediate range of prior beliefs such that social welfare is higher with delegation than with direct experimentation.

The simple intuition behind the result lies in the differing experimentation incentives of the principal and the agent. The principal has less incentive to experiment than the agent, because the principal has an outside option that might be more valuable than the risky action.

21 See (27) in the Appendix.


[Figure 4: Discounted amount of experimentation as a function of the prior belief, under delegated and direct experimentation; h = 2, s = 1, c = 0.2, r = 0.5, and c* = 1.5.]

In contrast, the agent is a sure loser if the safe action is implemented. Thus the agent is always willing to experiment for longer (i.e., down to a lower stopping cutoff) than the principal. This difference in incentives is reflected in a higher social welfare for posterior beliefs that lie in an intermediate range, where the principal is either unwilling to experiment or about to stop experimentation.
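The crossing in Figure 4 can be reproduced numerically. The sketch below (ours; it combines only objects defined in the text: the equilibrium effort k*(p) = p^O/p with stopping cutoff max{p*(c), p^O} from Theorem 1, direct experimentation at k = 1 down to p^D(c), and the Figure 4 parameters) computes both discounted amounts for an intermediate and a high prior:

```python
import numpy as np

h, s, c, r = 2.0, 1.0, 0.2, 0.5                 # Figure 4 parameters
pO = r * s / (r * h + (h - s))                  # 0.25, equation (7)
pstar = r * c * s / ((1 + r) * h * (h - s))     # ~0.033, equation (8)
pD = r * (s + c) / (r * h + (h - s))            # 0.30, direct-experimentation cutoff

def discounted_amount(p0, effort, cutoff, dt=1e-4):
    """Integrate e^{-rt} k_t dt along equation (1) until the belief hits cutoff."""
    t, p, total = 0.0, p0, 0.0
    while p > cutoff:
        k = effort(p)
        total += np.exp(-r * t) * k * dt
        p -= k * p * (1.0 - p) * dt             # equation (1)
        t += dt
    return total

for p0 in (0.35, 0.8):                          # intermediate and high priors
    delegated = discounted_amount(p0, lambda p: pO / p, max(pstar, pO))
    direct = discounted_amount(p0, lambda p: 1.0, pD)
    print(f"p0 = {p0}: delegated {delegated:.2f} vs direct {direct:.2f}")
```

In line with Theorem 2, delegation should dominate at the intermediate prior and lose at the high one.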

2.4 Utilitarian Policy

What would a utilitarian social planner do in this scenario? The social planner's problem is similar to the problem we have already analyzed in subsection 2.1 but also includes the agent's utility. Clearly, the social planner experiments at maximal effort, and the socially efficient cutoff is22

$$p^U(c) = r\,\frac{2s - h + c}{rh + 2(h - s)}. \qquad (11)$$

Proposition 3. If the cost of experimentation is low, the utilitarian stopping cutoff is lower than the equilibrium cutoff. If the cost of experimentation is high, the utilitarian stopping cutoff is lower than the equilibrium cutoff if and only if h − s < s; that is, when the agent's loss from having the safe action chosen forever is less than the principal's loss from having a bad risky action chosen forever.

22 The cutoff, p^U(c), can be obtained from (15) with α = 1.
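Spelling out footnote 22 (our substitution): setting α = 1 in (15) gives

$$p_1(c) = \frac{r[\,2s - h + c\,]}{rh + 2(h - s)} = p^U(c),$$

which is exactly the utilitarian cutoff in (11).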


When the cost of experimentation is low, the social planner experiments for longer than the agent does in equilibrium, because the social planner has higher incentives than both the principal and the agent. In particular, the principal will never allow experimentation below the optimal cutoff, p^O, because he would suffer a loss, since the outside option exceeds the payoff from experimentation. However, the social planner is willing to push experimentation further, because switching to the safe action would reduce welfare due to the loss incurred by the agent.

When the cost is high, the comparison depends on the difference, (h − s) − s, between the agent's loss from implementing the safe action and the principal's loss from implementing a bad risky action. If s > h − s, the risky action is attractive from a social point of view only if there is some experimentation. In this case, the social planner once again experiments for longer than in equilibrium. Instead, if h − s > s, the social planner always sticks to the risky action, regardless of the occurrence of the shock. However, the social planner experiments less than the agent does in equilibrium; thus, in equilibrium, there is more experimentation than is socially optimal. The reason lies in the agent's bias toward the risky action. The agent is willing to suffer current losses in order to continue experimentation, in the hope of prompting the shock. Thus the lack of control over the principal's actions induces the agent to over-experiment, because the principal will eventually switch to the safe action unless the shock occurs.

3 Unobservable Effort

This section relaxes the assumption that the agent's choice of effort is observable to the principal. We retain the assumption that the occurrence of the shock is public information.23 If the principal cannot observe how much experimentation the agent is performing at each instant, the principal cannot tell whether the delay in the occurrence of the shock is attributable to a bad state or to the agent's shirking. Thus, the principal and the agent share a common belief only on the equilibrium path. Off the equilibrium path, the principal's and the agent's posterior beliefs about the good state might differ.

Instead of expressing the strategies in terms of beliefs, it is easier to use time as a state variable.24 Thus, if the shock has not yet occurred, the principal's strategy can be expressed

23 The assumption that the shock is public information is without loss of generality, because the agent has no incentive to hide such information from the principal.
24 This assumption is without loss of generality, because the only event observable to the principal is whether or not the shock has occurred.


as a measurable function, D : R+ → {0, 1}, where Dt represents the principal's decision to choose either the risky action (Dt = 1) or the safe action (Dt = 0) at time t. We retain the assumption that the principal has no commitment power. Otherwise, the principal could easily improve on the Markov-perfect equilibrium identified in Theorem 1 by simply committing to stop experimentation after a certain deadline.

We start the analysis by looking at the agent's expected payoff associated with an arbitrary, measurable effort strategy, K : R+ → [0, 1]. To this end, we denote by T the induced stopping time at which the principal switches to the safe action in the absence of a shock, given some arbitrary decision strategy, D(·), and some arbitrary belief about the agent's chosen effort path. Let τ(K) denote the (possibly infinite) arrival time of the shock, given effort strategy K. Thus we can write the expected, discounted payoff of the agent as

$$E\left[\int_0^{\tau(K)\wedge T} e^{-rt}(h - ck_t)\,dt + e^{-r\tau(K)}\frac{h}{r}\,\mathbf{1}_{\{\tau(K)\le T\}} + e^{-rT}\frac{s}{r}\,\mathbf{1}_{\{\tau(K)>T\}}\right] = \frac{h}{r} - \frac{h-s}{r}\,e^{-rT}\big[1 - \mathrm{Prob}(\tau(K)\le T)\big] - E\left[\int_0^{\tau(K)\wedge T} ck_t\, e^{-rt}\,dt\right]. \qquad (12)$$
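As a sanity check on the identity in (12), here is a minimal Monte Carlo sketch (ours, not from the paper; the constant effort level kbar, the deadline T, and all parameter values are hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
h, s, c, r, p0 = 2.0, 1.0, 0.2, 0.5, 0.9       # illustrative parameters
kbar, T = 1.0, 2.0                              # hypothetical constant effort and deadline

n = 1_000_000
good = rng.random(n) < p0                       # theta = Good with prior probability p0
# Conditional on Good, the shock time tau is exponential with rate kbar;
# in the Bad state the shock never arrives.
tau = np.where(good, rng.exponential(1.0 / kbar, n), np.inf)
stop = np.minimum(tau, T)                       # experimentation ends at tau or T

# Left-hand side of (12): flow h - c*kbar until stop, then continuation h/r or s/r.
flow = (h - c * kbar) * (1.0 - np.exp(-r * stop)) / r
cont = np.where(tau <= T, np.exp(-r * stop) * h / r, np.exp(-r * T) * s / r)
lhs = np.mean(flow + cont)

# Right-hand side of (12): h/r minus the termination correction minus expected costs.
prob_shock = p0 * (1.0 - np.exp(-kbar * T))     # Prob(tau <= T)
exp_cost = c * kbar * np.mean(1.0 - np.exp(-r * stop)) / r
rhs = h / r - (h - s) / r * np.exp(-r * T) * (1.0 - prob_shock) - exp_cost

print(f"LHS of (12): {lhs:.3f}   RHS of (12): {rhs:.3f}")  # agree up to MC error
```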

The agent's expected payoff is comprised of three components: i) the discounted flow payoff, h/r, that the agent obtains when the risky action is implemented at all times; ii) a correction term, which takes into account the possibility that the safe action will be implemented if no shock occurs before the induced stopping time, T; and iii) the expected discounted cost associated with the agent's allocation of effort during the experimentation phase. Similarly, for a given effort strategy for the agent, the principal's discounted, expected payoff from allowing experimentation until a date, T, is given by25

$$E\left[e^{-r\tau(K)}\left(\frac{h}{r} + h\right)\mathbf{1}_{\{\tau(K)\le T\}} + e^{-rT}\frac{s}{r}\,\mathbf{1}_{\{\tau(K)>T\}}\right] = \left(\frac{h}{r} + h\right)E\left[e^{-r\tau(K)}\,\mathbf{1}_{\{\tau(K)\le T\}}\right] + \frac{s}{r}\,e^{-rT}\,\mathrm{Prob}(\tau(K)>T). \qquad (13)$$

The first component includes the lump-sum payoff, h, that the principal receives at the time of the shock, if the shock occurs before the proposed stopping time, T. In addition, the principal's continuation payoff jumps to h/r. The second component deals with the possibility that the shock may not occur before the stopping time, T, is reached. In that case, the principal obtains the discounted payoff from the safe action, s/r, only at T and receives

25 For expositional purposes, we suppress the dependence of the effort strategy, K, on the induced stopping time, T.


nothing in the meantime.

But suppose that the cost of experimentation for the agent is c = 0. Then, the agent would be willing to exert maximal effort until any stopping time, because the agent only cares about maximizing the probability of the shock hitting before T. Instead, when effort is observable, Theorem 1 shows that the agent prefers to slow down the rate of learning even when experimentation costs nothing. The intuition is that observability gives the agent the opportunity to exploit the principal's lack of commitment power and thus to extend the length of the experimentation phase to a date, T*, during which the agent enjoys a higher flow payoff. The lack of monitoring removes the possibility for the principal to condition on the actual experimentation path, which is unobservable. The agent then focuses entirely on prompting the shock, as this is her only way of guaranteeing that the risky action will be chosen in the future. On the equilibrium path, the principal shares a common belief with the agent. Thus, the principal will switch to the safe action as soon as the posterior belief drops to p^O, which is the principal's optimal cutoff. In calendar time, experimentation ends at T** = ln[p0(1 − p^O)/((1 − p0)p^O)], which is strictly less than T*.

More generally, given a stopping time, an arbitrary effort strategy is always at least weakly dominated by a "bang-bang" strategy that involves shirking until a (possibly positive) intermediate calendar time, followed by maximal effort until the stopping time is reached. The probability of a shock occurring before the stopping time does not depend on how effort is allocated over time, but only on the aggregate amount of effort. Therefore, a bang-bang strategy is optimal because it postpones effort to the end of the experimentation phase, when effort is cheaper for the agent.26 The main result of this section is given in Theorem 3.

Theorem 3. Fix a prior belief, p0 > p^O. There exists a cutoff for the cost of experimentation, c̃*(p0) > 0, such that if c ≤ c̃*(p0), then there exists an equilibrium in which the agent implements the principal's optimal experimentation policy, as defined in Proposition 1. Experimentation lasts until a date, T**(p0) = ln[p0(1 − p^O)/((1 − p0)p^O)], if no shock occurs before then. If c ∈ (c̃*(p0), p0(h − s)/r), then there exists an equilibrium in which the principal chooses the risky action only until time M*(c, p0) < T**(p0), and the agent exerts maximal effort on the equilibrium path until either the shock occurs or M*(c, p0) is reached. If c ≥ p0(h − s)/r, then the principal always chooses the safe action.

26 This result hinges on the assumption of a linear cost of effort. When the cost is convex in the level of effort, the agent may prefer to smooth effort over time, rather than postponing it until the end of the experimentation phase. However, the main result of this section remains valid as long as the cost function is mildly convex and the optimal length of the experimentation phase for the principal is not too long.

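As an illustration (our arithmetic, reusing the Figure 2(a) parameters p0 = 0.9 and p^O = 1/8):

$$T^{**}(0.9) = \ln\left[\frac{0.9\,(1 - 1/8)}{(1 - 0.9)\cdot(1/8)}\right] = \ln 63 \approx 4.14,$$

roughly a quarter of the delayed horizon T* ≈ 17.4 implied by Theorem 1 for the same parameters (see the simulation sketch after Figure 2).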

Perhaps surprisingly, the impossibility of observing the agent's choice of effort allows the principal to avoid delay in information acquisition. When the cost of experimentation is sufficiently low, the principal is better off if he can commit not to monitor the agent, provided that the principal is capable of that type of commitment. When experimentation is cheap, the agent has more incentives to experiment than the principal. Thus the equilibrium amount of experimentation is at the principal's optimal level, as in the observable-effort case. But in this scenario the discounted amount of experimentation is also at the optimal level, whereas it was lower in the observable case.

We already argued that the agent employs only strategies that involve initial shirking and then maximal effort. In this instance, stopping experimentation at T**(p0) gives the agent the incentive to work at maximal effort until that stopping time is reached, because the principal keeps updating the belief downward even if the agent might have shirked. In fact, if at any time t < T**(p0) the total amount of effort exerted by the agent is less than it should have been, then the agent will be more optimistic than the principal. In turn, that higher level of optimism induces the agent to catch up. However, the maximal level of effort is bounded above by 1, and since the principal has been expecting maximal effort, the agent will never have enough time to make up for the time wasted, even if she exerts maximal effort. If the stopping time arrives without the occurrence of the shock, then the principal no longer has incentives to continue experimentation, because his posterior belief is already pessimistic enough to make halting experimentation optimal.

When experimentation is expensive, the situation is more complex. If the principal allows the risky action until the same stopping time that he would set with a low cost, the agent will shirk initially until some intermediate time and then (possibly) switch to maximal effort. Since the initial shirking phase has no value for the principal, he halts experimentation at an earlier date to provide the agent with sufficient incentive to exert maximal effort. Thus the principal's optimal experimentation policy is once again unattainable.

Even though there is no delay in equilibrium, as compared with the observable case, the equilibrium with unobservable effort identified in Theorem 3 may induce a lower amount of experimentation than the corresponding Markov-perfect equilibrium with observable effort. Indeed, there may even be cases in which there is no experimentation in the equilibrium with unobservable effort, while the agent would have worked if her effort had been observable.27 However, as long as there is some experimentation in equilibrium, the principal is always better off when there is no delay.

27 More formally, it can be shown that min_{p0 ≥ p^O} c̃*(p0) = p^O(h − s)/r < c*. Thus there exist prior beliefs, p0 > p^O, and cost levels, c < c*, that do not induce experimentation when there is a lack of monitoring.

23

The next result, given in Corollary 3, shows that higher prior beliefs require lower cost levels in order to sustain the principal’s optimal experimentation policy. Corollary 3 (Comparative Statics). The equilibrium cutoff, c˜∗ (p0 ), is decreasing in p0 . Also, M ∗ (c, p0 ) is increasing in p0 . An increase in the prior belief increases the principal’s optimal stopping time–assuming that the agent is exerting maximal effort–more than it increases the agent’s willingness to exert effort. Thus the cost of experimentation must be low enough to induce the agent to implement the principal’s optimal experimentation policy for higher levels of the prior belief. Instead, when the optimal experimentation policy is unattainable, an increase in the prior belief leads to a longer experimentation phase because the agent is willing to exert effort for a longer period of time, given her higher optimism.

4 4.1

Comparative Statics Observable effort

This section extends the baseline model by analyzing the case in which the principal takes into consideration the externality induced on the agent by his choice of action. Although this is an obvious simplification, we assume that the principal’s total expected, discounted payoff is now given by Wα (p) = VP (p) + αVA (p), given an arbitrary posterior belief p. The coefficient, α ∈ [0, 1], measures how much the principal cares for the agent. The case, α = 0, corresponds to the baseline model, while α = 1 represents the case in which the principal’s objective function coincides with that of a utilitarian social planner. To make the analysis interesting, we impose the Assumption 1 for all the results in this section. Assumption 1. s > h − s. Assumption 1 states that, absent any experimentation, the safe action generates a higher social welfare than the risky action. The share of social welfare generated by the agent under the risky action is not enough to cover for the potential loss incurred by the principal, in case of a bad risky action. Thus experimenting with the risky action is socially valuable only if paired with enough effort by the agent. As such, Assumption 1 eliminates the uninteresting case in which the principal prefers the risky action to the safe action for any level of α and the agent never exerts any effort. 24

Proposition 4 shows that there exists a unique, Markov-perfect equilibrium for any fixed level of α. Similar to the baseline model, the agent prefers to slow down the rate of learning as much as possible in order to maximize the length of the experimentation phase. This is accomplished by experimenting at the minimal rate which prevents the principal from implementing the safe action. h(h−s) Proposition 4. Suppose c < 1+r . For any α ∈ (0, 1], there exists a unique, Markovr s perfect equilibrium, such that the principal chooses the risky action for posterior beliefs above max{p∗ (c), pα (c)}, and the agent exerts effort

kα∗ (p) ≡

r[(1 + α)s − αh] ≤1 p [rh + (1 + α)(h − s)] − rαc

(14)

where p∗ (c) is the agent’s optimal stopping cutoff given by (8), and pα (c) =

r[(1 + α)s − αh + αc] rh + (1 + α)(h − s)

(15)

is the principal’s optimal stopping cutoff. Otherwise, the principal chooses the safe action, and the agent exerts zero effort. Finally, p∗ (c) < pα (c) if and only if c < c∗ . When the cost of experimentation is high, the agent stops experimenting at the same cutoff belief, p∗ (c), regardless of α. This is because a positive value of α has no effect on the cost of experimentation for the agent. Also, the agent’s incentives to slow down the speed of learning are unaffected because her utility function has not changed. However, the equilibrium level of effort changes because it is derived from the indifference condition of the principal, whose utility depends on α. Next we perform some comparative statics with the coefficient α. Proposition 5 (Comparative Statics). For any fixed belief, p, the equilibrium level of effort is a decreasing function of α. The equilibrium cutoff belief decreases in α when the cost of experimentation is low but is independent of α, otherwise. When the cost of experimentation is low, the more the principal pays attention to the agent’s well-being (i.e., higher α), the higher the equilibrium amount of experimentation (i.e., lower pα (c)) and the greater the delay will be (i.e., lower kα∗ (p)). What do these opposite effects mean? As α increases, the principal cares more about the negative effect that implementing the safe action has on the agent. Therefore, the principal becomes more 25

2.5

2.5

2

2 EQUILIBRIUM EFFORT

EQUILIBRIUM EFFORT

α=0.3

1.5 α=0.9 α=0.6 1

α=0.3

0.5

0

α=0.6 α=0.3

1.5

1

0.5

0

BELIEF

(a) c = 1 <= 1.65 = c∗ .

BELIEF

(b) c = 1.9 > c∗

Figure 5: Comparative Statics for α: h = 2, s = 1.2, r = 1/6. lenient and, as a consequence, authorizes experimentation for a wider range of posterior beliefs. This leniency allows the agent more leeway which, in turn, allows her to delay learning even more. The agent is completely “exploiting” the principal by extending the experimentation phase, while also reducing her flow cost. In equilibrium, even when the cost is high, the agent continues to under-experiment. Since a change in α leaves the cost unaffected, the amount of experimentation in equilibrium stays constant even as the delay increases. Figure 5 provides a graphic illustration of this. In Theorem 4, we provide a comparative statics result on the discounted amount of experimentation, as a function of α. Theorem 4 (Discounted Amount of Experimentation). Fix c < sider α > α0 . The following is true:

1+r h(h−s) r s

and con-

1. If the cost of experimentation is high, then the discounted amount of experimentation under α is lower than that under α0 , for any prior belief p0 > p∗ (c). 2. If the cost of experimentation is low, the discounted amount of experimentation under α is higher than that under α0 for intermediate prior beliefs and lower for high prior beliefs. Theorem 4 shows that the effect of an increase in α on the discounted amount of experimentation depends on the level of the cost of experimentation. Suppose first that c > c∗ , so that experimentation ends at p∗ (c), independent of the level of α. Next, an increase in α reduces the equilibrium level of effort for any level of the posterior belief, by Proposition 26

1.4 DISOOUNTED AMOUNT OF EXPERIMENTATION

DISCOUNTED AMOUNT OF EXPERIMENTATION

1.4 α=0.3 1.2 α=0.6 1 α=0.9 0.8

0.6

0.4

0.2

0

α=0.6 1 α=0.9 0.8

0.6

0.4

0.2

0

BELIEF

(a) c = 1 <= 1.65 = c∗ .

α=0.3

1.2

BELIEF

(b) c = 1.9 > c∗

Figure 6: Comparative statics for the discounted amount of experimentation: h = 2, s = 1.2, r = 1/6. 5. However, the aggregate amount of effort remains constant because the amount of experimentation is unaffected (for a fixed prior belief). Thus an increase in α induces the agent to back-load effort, which, in turn, reduces the discounted amount of experimentation in equilibrium. But suppose instead that c < c∗ . Now an increase in α also increases the amount of experimentation by shifting the cutoff, pα (c), by Proposition 5. Thus the aggregate amount of effort is greater. In this case, the effect on the discounted amount of experimentation depends on the prior belief. For high prior beliefs, the negative effect derived from backloading effort dominates the positive effect induced by a greater aggregate amount of effort. Therefore, increasing α reduces the quality of information. For low prior beliefs, however, the conclusion is reversed. The increase in aggregate effort is the dominant factor, and thus the discounted amount of experimentation rises.

4.2

Unobservable effort

Finally, we turn our attention to the case of unobservable effort. It is immediately evident that the agent’s incentives to postpone effort to the end of the experimentation phase are unaffected by the level of α. Thus in case no shock occurs, any changes in α will only affect the time at which the principal halts experimentation. i h p0 (1−pα (c)) denote the stopping Proposition 6. Fix p0 > pα (c) and let T ∗∗ (p0 , pα (c)) = ln (1−p α 0 )p (c) time induced by the principal’s optimal experimentation policy under α. Then, an increase in α increases T ∗∗ (p0 , pα (c)) if and only if c < c∗ . 27

Intuitively, the more the principal cares about the agent, the longer the principal should be willing to wait before stopping experimentation. However, this intuition is correct only when the cost of experimentation is low. Since an increase in α makes the principal more sensitive to the cost of experimentation, the principal is less willing to experiment in the presence of a high cost. Since the principal’s optimal stopping time depends on the cost parameter, c, whenever α > 0, implementing the principal’s optimal experimentation policy requires comparing the cost of experimentation with a cutoff function that also depends on c. Thus the analysis is less tractable than it was in the case of observable effort. Nevertheless, the next result contains some comparative statics for low-cost parameters. Proposition 7. Fix α0 > α and p0 > pO . There exist values of the cost parameter, c < c∗ , such that the principal’s α-optimal experimentation policy can be implemented under some equilibrium, while the α0 -optimal experimentation policy cannot be implemented under any equilibrium. When experimentation is cheap, an increase in α induces the principal to request more experimentation from the agent, which requires a longer stopping time. Since changes in α have no effect on the cost the agent faces from experimentation, the principal’s new optimal experimentation policy can be implemented under some equilibrium only if the cost parameter is sufficiently low. Otherwise, giving the agent more time will only result in initial shirking.

5

Conclusion

This paper investigates a theory of strategic delay in information acquisition when a principal must delegate experimentation to a biased agent. The agent delays the generation of information to maximize the length of the experimentation phase, during which time, the principal implements the agent’s preferred action. The delay arises from the principal’s lack of commitment power, which allows the agent to slow down the rate of learning. This problem is somewhat mitigated over time because the principal’s commitment power increases with the accumulation of information. Thus the agent is forced to speed up learning. The equilibrium experimentation policy is also affected by the degree to which the principal cares about the agent’s utility and the principal’s monitoring technology. Furthermore, when effort is observable and the common prior belief lies is neither too optimistic nor too pessimistic,

28

delegating experimentation to the agent allows more information to be generated than would be under direct experimentation by the principal. This paper can be extended in several ways. For example, this paper considered only the possibility of positive shocks, but negative shocks are equally relevant. In the introductory example, we might expect the bank managers to exert effort in order to avoid the possibility of sudden insolvency. The managers would then perform some form of risk management. In the context of the same application, the regulator might also have incomplete information about the initial probability of the good state or even the cost of experimentation, since this private information is available only to the agent. These extensions may lead to interesting policy implications. This paper uses bank regulation as the main illustrative example, but this analysis can be applied to other areas, like product safety regulation, medical drug recalls, etc. More generally, it is our hope that this paper provides insights that are useful in understanding the trade-offs involved when a regulator cannot gain direct and easy access to information that is required to make a socially relevant decision.

Appendix Proof of Theorem 1. The agent must guarantee that the principal’s expected payoff from R never drops below s or else the principal implements S. Let’s first solve for the effort path that makes the principal just indifferent between the two actions. From (4), we obtain the following equation pO s ∗ = (16) k (p) = p p[h + h−s ] r where the derivative of the value function is 0 at any belief level. k ∗ (p) is the lower bound on any effort path that induces the principal to choose R. We proceed by constructing a candidate solution. First, we can rewrite the HJB equation of the agent as follows 

      h 0 rVA = max max h + k −c + p − VA − p(1 − p)VA , s k≥k∗ (p) r

(17)

Since VA (p) ≤ hr for any p, this implies that the term in square brackets has to be negative. Therefore, the agent optimally sets the effort at the lowest level that induces the principal to allow experimentation, provided that her continuation value exceeds s. By smooth pasting, the cutoff belief at which the agent is indifferent between continuing experimentation at 29

effort level k ∗ and stopping solves  h−s h + k (p) −c + p =s r ∗



(18)

The LHS is strictly increasing in p, it tends to −∞ as p → 0, and it exceeds s for p = 1, h(h−s) because we assumed c < 1+r and h > s. The unique solution is given by (8). However, r s ∗ it might be the case that k (p) exceeds 1 before the posterior belief reaches p∗ (c). First, notice that the unique solution to the equation k ∗ (p) = 1 is pO . Then, p∗ (c) < pO ⇐⇒ c <

(1 + r)h(h − s) ≡ c∗ rh + (h − s)

(19)

Thus, the cutoff belief that determines the end of the experimentation phase depends on the magnitude of the flow cost c. Next, the definition of k ∗ (p) implies that the value function of the principal is constant at rs for any p. On the other hand, the value function of the agent solves, for any p ≥ p˜(c) ≡ max{pO , p∗ (c)}, h p(1 − p)VA0 + ξpVA = pξ − c (20) r which admits the solution   Z p h h−s 1 1 VA (p) = − +c dq (1 − p)ξ (21) ξ+1 r r (1 − p˜(c))ξ q(1 − q) p˜(c) where ξ ≡ (1 + r) hs , and the constant of integration has been obtained from the boundary condition VA (˜ p(c)) = rs . Lemma 1. VA (p) is increasing in p, for any p ≥ p˜(c).

30

Proof. Let’s differentiate with respect to p, VA0 (p)

 Z p h−s c 1 1 = ξ(1 − p) (22) +c dq − ξ ξ+1 r (1 − p˜(c)) p(1 − p) p˜(c) q(1 − q)   Z 1 1 c p c ξ−1 h − s ≥ ξ(1 − p) + dq − ξ ξ+1 r (1 − p˜(c)) p p˜(c) (1 − q) p(1 − p)    1 1 c c 1 ξ−1 h − s = ξ(1 − p) − + − ξ ξ ξ r (1 − p˜(c)) p ξ(1 − p) ξ(1 − p˜(c)) p(1 − p)   ξ−1 (1 − p) h−s c = ξ − ξ (1 − p˜(c)) r p   ξ−1 (1 − p) h−s c ≥ ξ − ∗ (1 − p˜(c))ξ r p (c) ξ−1



where the last inequality follows from p ≥ p˜(c) ≥ p∗ (c). Finally, it is easy to verify that the last term in square brackets is 0. Next, we need to show that VP (p) = rs , for any p, is the unique solution to equation (4) with effort level k ∗ (p). After substitution, we arrive at (1 −

p)VP0

 +

 1+r r + 1 VP = h O p r

(23)

which admits the general solution VP (p) =

s O + M (1 − p)r/p +1 r

(24)

From the boundary condition VP (˜ p(c)) = rs , we obtain that M = 0. Finally, we complete the proof by showing that the coefficient of k in (17) is indeed negative. By (21),    Z p 1 1 h 0 ξ h−s − VA − p(1 − p)VA = p(1 − p) +c dq (1 − ξ) −c + p ξ+1 r r (1 − p˜(c))ξ p˜(c) q(1 − q) 

(25) which is negative because ξ > 1, from h > s. Proof of Corollary 1. Let’s first substitute for k ∗ (p) into the dynamic equation (1). The evolution of the posterior belief reduces to p˙t = −pO (1 − pt ), which can be easily integrated

31

out to obtain the following solution pt = 1 − (1 − p0 )ep

Ot

(26)

where the constant of integration has been derived from the restriction that the posterior must equal the prior p0 at t = 0. We thus obtain directly that kt∗ =

pO 1 − (1 − p0 )epO t

(27)

and the length of the experimentation phase is given by 1 T (p0 ) ≡ O ln p ∗



1 − max{pO , p∗ (c)} 1 − p0

 (28)

for any prior p0 ≥ max{pO , p∗ (c)}. Next, differentiating (27) with respect to t, we obtain O

dkt∗ ep t = (pO )2 (1 − p0 )  2 > 0 dt 1 − (1 − p0 )epO t

(29)

Finally, suppose first that c ≤ c∗ , then     ∂T ∗ ∂pO 1 1 − pO 1 =− ln + O ∂z ∂z (pO )2 1 − p0 p (1 − pO )

(30)

    1 1 1 − p∗ ∂pO 1 ∂p∗ ∂T ∗ = − O O ln + ∂z p p 1 − p0 ∂z 1 − p∗ ∂z

(31)

If instead c > c∗ , the

O



O



Since dpdr , dp , dpds , dp > 0 and dr ds proof.

dpO dp∗ , dh dh

< 0, substituting into (30) and (31) completes the

Proof of Proposition 2. When c < c∗ , then p∗ (c) < pO by Theorem 1 and experimentation ends at pO . Thus, the equilibrium amount of experimentation coincides with the optimal amount by an immediate application of Definition 3. In turn, this implies that the aggregate amount of effort is the same under both policies. Since effort is less than 1 in equilibrium, this means that most of the effort is exerted towards the end of the experimentation phase in equilibrium. Therefore, the discounted amount of experimentation in equilibrium 32

is less than under direct experimentation, due to discounting. Proof of Theorem 2. We first start with a preliminary result, Lemma 2, which computes the discounted amount of experimentation induced by the principal’s optimal direct ˆ as well as for the equilibrium policy characterized in Theorem experimentation policy (D), ˆ 1 (B). Lemma 2. Fix p0 > pD (c). The optimal discounted amount of experimentation starting from a prior belief p0 is given by   r  Ω(p0 ) 1 ˆ 1− D(p0 ) = r Ω(pD (c))

(32)

Fix p0 > p˜(c) ≡ max{pO , p∗ (c)}. The equilibrium discounted amount of experimentation is given by Z p0 1 r/˜ p(c) ˆ B(p0 ) = (1 − p0 ) dp (33) r/˜ p(c)+1 p˜(c) p(1 − p) Proof. Equation (33) is a special case of equation (58) with α = 0 and it thus proven in Lemma 7. Next, let T D (c) denote the time at which experimentation ends under the optimal experimentation policy for the principal identified in Proposition 1. From (9) with k = 1, we have Z

T D (c) −rt

e 0

r dp p(1 − p0 ) dt = (1 − p)p0 p(1 − p) pD (c)  r  r p0 1 − p0 1 p = r 1 − p D p0 Z

p0



(34)

p (c)

which gives (32). h Notice that p∗ (c) ≤ pO < pD (c) for any c ≤ c∗ . Next, let c˜ ≡ 1+r (h − s) > cˆ ≡ r s 1+r ∗ D ∗ (h − s) > c . If cˆ ≤ c < c˜, then p (c) ≥ 1 > p (c). So, suppose c ∈ (c∗ , cˆ). Notice r that p∗ (c∗ ) = pO < pD (c∗ ) and p∗ (ˆ c) < 1 = pD (ˆ c). Since p∗ (c) and pD (c) are linear functions of c, it follows that they cannot intersect over the interval (c∗ , cˆ). Thus, p∗ (c) < pD (c) for any c ∈ [c∗ , c˜).28 The first part of the result then follows from the observation that p˜(c) ≡ max{pO , p∗ (c)} < pD (c), for any c < c˜, together with the monotonicity of the amount of experimentation from Definition 3. 28

We assume that pD (c) = 1 over (ˆ c, c˜).

33

Let’s then consider the discounted amount of experimentation. We need to show that there exists pˆ(c) ∈ (˜ p(c), 1] such that the equilibrium discounted amount of experimentation is higher than the discounted amount under direct experimentation if and only if p0 ∈ (˜ p(c), pˆ(c)). It is immediate to notice that the discounted amount of experimentation under ˆ direct experimentation, D(·), is given by (32) with pD (c) in place of pO . If c ∈ (ˆ c, c˜), the principal doesn’t experiment directly, thus the result is obvious with pˆ(c) = 1. So, suppose ˆ 0 ) > 0 = D(p ˆ 0 ) for any 0 < c < cˆ. Since p˜(c) < pD (c) in this case, it follows that B(p O ˆ ˆ ˆ 0 ) and p0 ∈ (˜ p(c), pD (c)). Also, D(1) = 1r > pr = B(1). Next, we need to show that D(p ˆ 0 ) intersect only once. Let q > p˜(c) be such that D(q) ˆ ˆ B(p = B(q), then ˆ dBˆ dD − dp0 dp0

p0 =q

 r Z q 1 1 r Ω(q) 1 r/˜ p(c)−1 = − (1 − q) dp − r/˜ p(c)+1 q(1 − q) p˜(c) q(1 − q) Ω(˜ p(c)) p˜(c) p(1 − p)    r  1 1 1 Ω(q) − = −1 (35) 1 − q p˜(c) q Ω(˜ p(c))

ˆ ˆ which is negative. Thus, B(p) can only intersect D(p) from above, which proves the uniqueness of the point of intersection pˆ(c) ∈ (˜ p(c), 1). Proof of Proposition 3. Notice that pU (c) < pO ⇐⇒ c < c∗ . In addition, pU (c) < 1 ⇐⇒ c < c˜ = 2

1+r (h−s) r

and

p∗ (c) < 1 ⇐⇒ c < cˆ =

1+rh (h−s) (36) r s

Thus, c˜ > cˆ ⇐⇒ 2s > h. Since the cutoff functions are linear in c and pU (c∗ ) = pO = p∗ (c∗ ), it follows that when c > c∗ then pU (c) < p∗ (c) ⇐⇒ 2s > h. Proof of Theorem 3. We first provide a more convenient representation for the agent’s expected discounted payoff given an arbitrary effort strategy K and an arbitrary stopping

34

time T ,29 (Z

) s h E e−rt [h − ckt ] dt + e−rτ (K) 1{τ (K)≤T } + e−rT 1{τ (K)>T } r r 0 "Z # τ (K)∧T h h − s −rT = − e [1 − Prob (τ (K) ≤ T )] − E ckt e−rt dt r r 0 Z T Z T Rt R h h − s −rT h − s −rT − 0t pu ku du ckt e− 0 (pu ku +r) du dt dt − e + e pt kt e = − r r r 0 0 Z T Z T h h − s −rT h − s −rT 1 − p0 1 − p0 −rt = − e + e pt kt dt − ckt e dt r r r 1 − pt 1 − pt 0 0 τ (K)∧T

(37)

Fix T > 0 and p0 > pO . The result is proved through a sequence of lemmas. The following lemma is a consequence of the linear cost structure. Lemma 3. For any experimentation strategy K, there exists another experimentation strategy Kbang such that ( 0 if t ≤ TKbang ktbang = 1 if TKbang ≤ t ≤ T  for some TKbang ∈ [0, T ], with Prob τ (Kbang ) ≤ T = Prob (τ (K) ≤ T ), and which induces a weakly lower total expected discounted cost than that associated with K. Proof. Let p¯ = Prob (τ (K) ≤ T ). The probability of a shock with an arbitrary bang-bang strategy with cutoff M is given by  G(M ) = Prob τ (Kbang (M )) ≤ T Z T R t Kbang bang − M pu du = pK e dt t M Z T −M 1 − p0 bang = pK bang dt t 1 − pK 0 t 1 − p0 =1− bang 1 − pK T −M

(38)

where we made use of equation (1). Notice that G(0) ≥ p¯, while G(T ) = 0 < p¯. Also, 29

In the sequel, we will make frequent use of the following fact   Z t 1 − p0 ln =− pu ku du 1 − pt 0

for any arbitrary {ku }, which follows from the dynamic equation (1).

35

˜ ˜ G(M ) is continuous and  strictly decreasing in M . Thus, there exists T = T (K) such that Prob τ (Kbang (T˜)) ≤ T = p¯, where Kbang (T˜) is the bang-bang strategy with switching point at T˜. Similarly, 1 − p0 p¯ = 1 − (39) 1 − pT R T −T˜ RT bang dt = Thus, it must necessarily be pT = pK k dt = . But then, it must also be t 0 0 T −T˜ T − T˜, by Definition 3. That is, both experimentation policies induce the same aggregate amount of effort. However, the effort is postponed at the end of the experimentation phase under Kbang (T˜), where it costs less due to discounting. Thus, the cost cannot be greater under K than under Kbang (T˜). This completes the proof. Lemma 3 implies that we can restrict attention to “bang-bang” experimentation strategies in the agent’s maximization problem. Thus, the agent’s problem reduces to the choice of a cutoff T˜ that maximizes (37). We now turn to the analysis of the agent’s best-response to a stopping time T . Lemma 4. Suppose the principal stops experimentation at T , and the prior is p0 . Then, (h − s) such that, if: i) c < c1 , then T˜∗ = 0; ii) there exist constants c1 (p0 , T ) < c2 (p0 ) < 1+r r c > c2 , then T˜∗ = T ; iii) c1 < c < c2 , then T˜∗ = T − M ∗ (c, p0 ) ∈ (0, T ) where " # h−s c − 1 1 r 1+r ln M ∗ (c, p0 ) = r 0 1+r c 1−p + 1+r p

(40)

0

Furthermore, M ∗ (c, p0 ) is independent of T, decreasing in c and increasing in p0 . Proof. The agent’s expected payoff from a bang-bang strategy with cutoff T˜ is h − s −rT H(T˜) =A + e r

T

T

1 − p0 −rt e dt 1 − pt T˜ T˜ Z Z T −T˜ −rt ˜ h − s −rT T −T 1 − p0 e −rT˜ =A + e pt dt − c(1 − p0 )e dt r 1 − pt 1 − pt 0 0 ( ) Z Z T −T˜ −rt ˜ h − s −rT T −T pt e −rT˜ =A + (1 − p0 ) e dt − ce dt r 1 − pt 1 − pt 0 0

where A =

h r



h−s −rT e . r

Z

1 − pT˜ pt dt − 1 − pt

Z

c

(41)

Differentiating with respect to T˜ gives

(

" #) Z T −T˜ −rt −rT p h − s e e ˜ ˜ T −T H 0 (T˜) = (1 − p0 ) − e−rT + c + re−rT dt (42) r 1 − pT −T˜ 1 − pT −T˜ 1 − pt 0 36

Let’s look for an interior solution of the FOC   −rM Z M −rt e h − s pM e −rM − e +r dt = −c r 1 − pM 1 − pM 1 − pt 0 | {z } | {z } LHS

(43)

RHS

where M = T − T˜. Since the agent is exerting maximal effort, we can solve for the posterior belief from (1). Thus, h − s p0 −(r+1)M e (44) LHS = − r 1 − p0 Similarly, 

r p0 1 p0 −(r+1)M RHS = −c 1 + + e r + 1 1 − p0 r + 1 1 − p0

 (45)

Notice that H 0 (T ) < 0 ⇐⇒ c < p0 h−s ≡ c2 (p0 ) regardless of T , and r 0

H (0) > 0 ⇐⇒ c >

1−p0 p0

h−s −(r+1)T e r r 1 −(r+1)T + 1+r + 1+r e

≡ c1 (p0 , T )

(46)

Furthermore, H (T˜) = − 00



c h−s − r 1+r



  p0 −(1+r)T T˜ r p0 −rT˜ e e − rce 1+ 1 − p0 1 + r 1 − p0

(47)

(h − s) ≡ c3 . As functions of p0 , c2 (p0 ) > c1 (p0 , T ) and which is negative for any c < 1+r r c3 > c2 (p0 ), for any p0 ∈ (0, 1].30 So, if c2 ≤ c ≤ c3 , then T˜∗ = T because H 0 (T˜) > 0 for any T˜ ∈ [0, T ]. If c ≤ c1 , then T˜∗ = 0 because H 0 (T˜) < 0 for any T˜ ∈ [0, T ]. If, instead, c ∈ (c1 , c2 ), then there exists a unique interior solution, which is given by T˜∗ = T −M ∗ (c, p0 ), with M ∗ (c, p0 ) given by (40). If instead c > c3 , the second derivative might be positive over some range of times, but the first derivative can still be zero only at one point. Since min{H 0 (0), H 0 (T )} > 0, the first derivative is always nonnegative. Thus, once again T˜∗ = T . Finally, M ∗ (c, p0 ) is independent of T and decreasing in c. This completes the proof. Let’s now consider the problem from the point of view of the principal. At any time t, the only information the principal has is whether the shock has occurred or not. However, the principal also has a belief about the effort strategy chosen by the agent. In particular, ˆ the principal’s posterior belief if the principal thinks the agent is using an effort strategy K, 30 2

It can be shown that, for any T > 0, c1 (0, T ) = c2 (0), c1 (1, T ) < c2 (1),

∂ c1 ∂(p0 )2

> 0. Furthermore,

∂c1 ∂T

< 0.

37



∂c1 ∂p0

< p0 =0



dc2 dp0

and p0 =0

at time t is uniquely defined and given by pˆt =

p0 e−

Rt 0

ˆu du k

1 − p0 + p0 e−

Rt 0

ˆu du k

(48)

The principal’s discounted expected payoff if he decides to stop experimentation at some ˆ for the agent, is given by date T , given a strategy K 

  h −rT s + h 1{τ (K)≤T 1 ˆ ˆ }+e r r {τ (K)>T }  h i s h  i ˆ −rτ (K) −rT ˆ +h E e 1{τ (K)≤T 1 − Prob τ (K) ≤ T ˆ } + e r  Z T  Z T R R s −rT ˆu du ˆu du −rt ˆ − 0t pˆu k − 0t pˆu k ˆ dt + e e pˆt kt e dt +h 1− pˆt kt e r 0 0 Z T 1 − p0 s 1 − p0 e−rt pˆt kˆt +h dt + e−rT 1 − pˆt r 1 − pˆT 0

ˆ −rτ (K)

E e  h = r  h = r  h = r



(49)

By Lemma 3, we can restrict attention to bang-bang strategies and thus the principal’s payoff function reduces to Z T 1 − p0 s h 1 − p0 e−rt pˆt +h dt + e−rT r 1 − pˆt r 1 − pˆT −Tˆ Tˆ   Z T −Tˆ h pˆt s 1 − p0 ˆ + h (1 − p0 )e−rT e−rt dt + e−rT = r 1 − pˆt r 1 − pˆT −Tˆ 0   i −(1+r)(T −Tˆ) h s −rT h −rTˆ 1 − e −(T −Tˆ) = + h p0 e + e (1 − p0 ) + p0 e r 1+r r



(50)

Unlike the agent, the principal cares about the exact timing of the shock before date T is reached. The next lemma shows the conditions under which the optimal experimentation policy can be sustained as an equilibrium. i h p0 (1−pO ) Lemma 5. Let c˜∗ (p0 ) = c1 (p0 , T ∗∗ (p0 )), where T ∗∗ (p0 ) = ln (1−p . If c ≤ c˜∗ (p0 ), then O 0 )p there exists an equilibrium in which the principal chooses R until T ∗∗ and, on the equilibrium path, the agent exerts maximal effort until either the shock occurs or else the stopping time is reached.  If c ∈ c˜∗ (p0 ), p0 h−s , then there exists an equilibrium in which the principal sets a r ∗ stopping time at M (c, p0 ), given by (40) and, on the equilibrium path, the agent exerts maximal effort until until either the shock occurs or else the stopping time is reached. 38

If c ≥ p0 h−s , then the principal chooses S forever. r Proof. Let’s first consider when it is possible for the principal to induce the agent to exert maximal effort until T ∗∗ (p0 ), so that at the time of switching the posterior belief is exactly pO , the principal’s optimal cutoff. Let’s consider the case that c ≤ c1 (p0 , T ∗∗ (p0 )) ≡ c˜∗ (p0 ). Suppose that the principal allows experimentation until T ∗∗ (p0 ), and that he thinks that the agent will exert maximal effort until T ∗∗ (p0 ). From Lemma 4 and our assumption about the cost level, it follows immediately that the agent will indeed exert maximal effort until the proposed date. Similarly, if the agent is exerting maximal effort until T ∗∗ (p0 ), then the principal has no incentive to deviate either, because the corresponding outcome coincides with the principal’s optimal experimentation outcome (in the absence of a shock). Since the principal’s beliefs are indeed correct on the equilibrium path, the candidate strategies form an equilibrium. . Then, Lemma 4 shows that the agent would simply shirk Next, suppose that c ≥ p0 h−s r until any stopping time. Since the principal’s expected payoff then equals rs e−rT from (50), the principal always choosing S and the agent always exerting zero effort is an equilibrium.  Finally, suppose c ∈ c˜∗ (p0 ), p0 h−s . In this case, the agent would never exert maximal r ∗∗ effort until T (p0 ), but she would shirk until T ∗∗ (p0 ) − M ∗ (c, p0 ), and then work at maximal effort. So, let’s consider the candidate equilibrium in which the agent is using the strategy described in Lemma 4 and the principal is allowing experimentation until M ∗ (c, p0 ), under the belief that the agent is exerting maximal effort at all times. By construction, the agent has no incentive to deviate from maximal effort until M ∗ (c, p0 ). So, we only need to check whether the principal has any incentive to deviate. Clearly, the principal would never stop experimentation before M ∗ (c, p0 ). Such a deviation would reduce the principal’s expected payoff as the posterior belief at the time of stopping is already strictly higher than his optimal stopping cutoff pO , given that M ∗ (c, p0 ) < T ∗∗ (p0 ), and in addition the agent is exerting maximal effort. So, suppose that we just reached the stopping time. Does the principal have any incentive to allow experimentation for a little longer? The answer depends on what the agent is going to do. However, again by construction, the agent has no incentive to exert any additional effort after M ∗ (c, p0 ), if he exerted maximal effort until that date. Thus, the principal’s continuation payoff is strictly below rs from (49), because the principal gets a zero flow payoff for as long as he keeps choosing R. Since the principal could get rs for sure by choosing the safe action instead, we have thus shown that the principal has no incentive to deviate either. This completes the proof. This completes the proof. 39

Proof of Corollary 3. From total differentiation of c˜∗ (p0 ) = c1 (p0 , T ∗∗ (p0 )), we obtain ∗

d˜ c (p0 ) = − dp0 1−p0 p0

h−s (p0 )2 (1−p0 )

+

r 1+r

+

h

(1−p0 )pO p0 (1−pO )

1 1+r

h

i1+r

(1−p0 )pO p0 (1−pO )

i1+r 2 < 0

(51)

Proof of Proposition 4. Fix α ∈ [0, 1]. The value function of the principal must then solve the following HJB equation 





   (1 + α)h 0 − Wα (p) − p(1 − p)Wα (p) , (1 + α)s rWα (p) = max αh + k ph − αc + p r (52) ∗ Let kα (p) denote the level of the effort that makes the principal indifferent between the two actions at the posterior belief p. From (52), we obtain (14), which is decreasing in p and kα∗ (1) > 0, by Assumption 1. Also, kα∗ (p) converges to +∞ if p → pαc , where pαc ≡ rαc . It can be checked directly that kα∗ (p) ≤ 1 ⇐⇒ p ≥ pα (c), where pα (c) is given rh+(1+α)(h−s) by (15). Since the effort level of the agent is bounded above by 1, any posterior belief below pα (c) is too low to sustain experimentation. Next, in the continuation region, the value function of the agent solves     h 0 rVA = h + max k −c + p − VA − p(1 − p)VA ∗ (p) k≥kα r

(53)

Since VA ≤ hr , the term in curly brackets must necessarily be negative. Thus, the agent is willing to invest but at the lowest possible rate that induces the principal to experiment, that is, kα∗ (p). By smooth pasting, the belief at which the agent is indifferent between experimenting and stopping solves   h−s ∗ h + kα (p) −c + p =s (54) r The LHS is strictly increasing in p, it tend to −∞ as p → pαc and it exceeds s for p = 1, by Assumption 1. Using (14), we obtain p∗ (c) as the unique solution. Once again, the agent cannot invest below pα (c) because the necessary effort level exceeds 1. Thus, experimentation continues provided that p ≥ max{p∗ (c), pα (c)}. Lemma 6. For any α, pα (c) > p∗ (c) if and only if c < c∗ . Proof. From (54), notice that kα∗ (p∗ (c)) = kα∗ 0 (p∗ (c)), for any α 6= α0 . In particular, kα∗ (p∗ (c)) = 40

k0∗ (p∗ (c)) > 1 = k0∗ (pO ) = k ∗ (pO ) = kα∗ (pα (c)) if and only if c < c∗ . Let’s solve the problem backwards. Starting from p˜α (c) ≡ max{pα (c), p∗ (c)}, let the effort level be given by kα∗ (p). The value function of the agent must then solve h p(1 − p)VA0 + (βp − γ)VA = (βp − γ) − c r where β=

(1 + r)h , (1 + α)s − αh

and γ =

(55)

rαc (1 + α)s − αh

(56)

This differential equation can be integrated out with the boundary condition VA (˜ pα (c)) = rs . Finally, the verification that the candidate policy is indeed an equilibrium follows the same lines as in the proof of Theorem 1. Proof of Proposition 5. Let’s differentiate (14) with respect to α dkα∗ p(h − s)(1 + r)h − rcs = −r dα [p [rh + (1 + α)(h − s)] − rαc]2

(57)

which is negative if and only if p > p∗ (c). This establishes the first part of the proposition. Next, suppose that c < c∗ , and consider α > α0 . From (54), we have that kα∗ (p∗ (c)) = ∗ α kα∗ 0 (p∗ (c)) > 1. Since dk < 0 for posterior beliefs above p∗ (c), then the function kα (·) must dα 0 cross 1 before the function kα0 (·), that is, pα (c) < pα (c). Finally, the result is obvious when c ≥ c∗ , because the cutoff belief is p∗ (c), which is independent of α. Proof of Theorem 4. We start with a preliminary result that directly computes the discounted amount of experimentation for any level of α. Lemma 7. Fix 0 ≤ α ≤ 1. The discounted amount of experimentation is given by (1 − p0 )rφα Bˆα (p0 ) = Ω(p0 )rνα where φα =

Z

p0

max{pα (c),p∗ (c)}

rh + (1 + α)(h − s) , r[(1 + α)s − αh]

and

Ω(p)rνα dp p(1 − p)rφα +1

να =

αc (1 + α)s − αh

Proof. We can first rewrite the equilibrium effort as kα∗ (p) =

41

1 . pφα −να

(58)

(59)

Substituting into (1),

and integrating out gives the following equation  Tα (q) = (φα − να ) ln

1−q 1 − p0



 + να ln

q p0

 (60)

for the length of the experimentation phase starting from a prior belief p0 and terminating at a posterior belief q. Next, let Tα∗ (c) denote the calendar time at which experimentation stops under the equilibrium policy identified in Proposition 4. Then, using (60), we arrive at Z

Tα∗ (c)

∗ e−rt kα,t

Z

p0



dt = max{pα (c),p∗ (c)}

0

1 − p0 1−p

r(φα −να ) 

p0 p

rνα

dp p(1 − p)

(61)

which gives (58) after rearranging. When c ≥ c∗ , the equilibrium amount of experimentation is constant in α. This implies that the aggregate amount of effort is constant as well. Since an increase in α induces the agent to postpone effort by Proposition 5, the discounted amount of experimentation is necessarily lower for higher levels of α. 0 Next, if c < c∗ , Proposition 5 also implies that α > α0 induces pα (c) < pα (c). Thus, ∗ ∗ k (1) 0 0 Bˆα (pα (c)) > 0 = Bˆα0 (pα (c)). In addition, Bˆα (1) = kαr(1) < α0r = Bˆα0 (1). Next, let q be such that Bˆα (q) = Bˆα0 (q), then dBˆα dBˆα0 − dp0 dp0



p0 =q

Next,

and

=⇒ Thus,

dBˆα dp0





dBˆα0 dp0

p0 =q

να 0 − να (φα0 − να0 ) − (φα − να ) +r = r 1−q q



Bˆα (q)

(62)

dνβ cs = >0 dβ [(1 + β)s − βh]2

(63)

dφβ (1 + r)h(h − s) >0 = dβ r[(1 + β)s − βh]2

(64)

dφβ dνβ 1+rh − > 0 ⇐⇒ c < (h − s) dβ dβ r s

(65)

< 0, which implies that Bˆα (p0 ) and Bˆα0 (p0 ) are going to cross exactly

once and with Bˆα (·) crossing Bˆα0 (·) from above. Thus, there exists a unique cutoff pˆα,α0 (c) ∈ (pα (c), 1) such that Bˆα (p0 ) > Bˆα0 (p0 ) for any p0 ∈ (pα (c), pˆα,α0 (c)), while Bˆα (p0 ) < Bˆα0 (p0 ), for any p0 > pˆα,α0 (c). This completes the proof. 42

Proof of Proposition 6. From equation (52), it is by now immediate to see that the optimal experimentation policy of the principal, for any level of α, involves maximal effort until either the shock occurs or else the posterior belief drops to pα (c). Differentiating pα (c) α with respect to α, we obtain that dpdα(c) < 0 if and only if c < c∗ . Since T ∗∗ (p0 , pα (c)) is decreasing in pα (c), the result follows. Proof of Proposition 7. We will restrict attention to cost parameters below c∗ . From Lemma 4, let c1 (p0 , T ∗∗ (p0 , pα (c))) denote the cutoff for the cost level such that if c < c1 (p0 , T ∗∗ (p0 , pα (c))), then there exists an equilibrium in which the agent would work at maximal effort until either the shock occurs or else the principal stops experimenta0 tion at T ∗∗ (p0 , pα (c)). By Proposition 6, T ∗∗ (p0 , pα (c)) > T ∗∗ (p0 , pα (c)), which implies 0 that c1 (p0 , T ∗∗ (p0 , pα (c))) > c1 (p0 , T ∗∗ (p0 , pα (c))) by footnote 30, for any c < c∗ . Also, 0 0 c1 (p0 , T ∗∗ (p0 , pα (c∗ ))) = c1 (p0 , T ∗∗ (p0 , pα (c∗ ))), because pα (c∗ ) = p∗ (c∗ ) = pα (c∗ ). Thus, 0 we are left to show that there exist cost levels such that c1 (p0 , T ∗∗ (p0 , pα (c))) < c < c1 (p0 , T ∗∗ (p0 , pα (c))). Since p∗ (c∗ ) = pO , it is sufficient to show that c1 (p0 , T ∗∗ (p0 , pO )) = c˜(p0 ) < c∗ , because then the 45 degrees line is going to cross both c1 (p0 , T ∗∗ (p0 , pα (·))) and 0 c1 (p0 , T ∗∗ (p0 , pα (·))) as functions of c from below before reaching c∗ . By Corollary 3, it is s(h−s) = rh+(h−s) , which is indeed less then enough to show that c˜(pO ) < c∗ . Next, c˜(pO ) = pO h−s r (1+r)h(h−s) ∗ than c = rh+(h−s) given that h > s.

References Alonso, R., and N. Matouschek, 2007, “Relational Delegation,” RAND Journal of Economics, 38, 1070–1089. , 2008, “Optimal Delegation,” Review of Economic Studies, 75, 259–293. Armstrong, M., and J. Vickers, 2010, “A Model of Delegated Project Choice,” Econometrica, 78, 213–244. Bergemann, D., and U. Hege, 2005, “The Financing of Innovation: Learning and Stopping,” RAND Journal of Economics, 36, 719–752. Besanko, D., and J. Wu, 2010, “Government Subsidies for Research Programs Facing “If” and “When” Uncertainty,” mimeo. Bolton, P., and C. Harris, 1999, “Strategic Experimentation,” Econometrica, 67, 349–374. 43

Bonatti, A., and J. H¨orner, 2011a, “Career Patterns and Career Concerns,” mimeo, MIT and Yale Univerisity. , 2011b, “Collaborating,” American Economic Review, 101, 632–663. Brocas, I., and J. Carrillo, 2007, “Influence through Ignorance,” RAND Journal of Economics, 38, 931–947. Cr´emer, J., 1995, “Arm’s Length Relationships,” Quarterly Journal of Economics, 110, 275– 295. D´ecamps, J.-P., and T. Mariotti, 2004, “Investment Timing and Learning Externalities,” Journal of Economic Theory, 118, 80–102. Gerardi, D., and L. Maestri, 2009, “A Principal-Agent Model of Sequential Testing,” mimeo. Hirsch, A., 2011, “Experimentation and Persuation in Political Organizations,” mimeo. Holmstr¨om, B., 1984, “On the Theory of Delegation,” in Bayesian Models in Economic Theory,ed. by M. Boyer, and R. Kihlstrom. North Holland, New York, NY. , 1999, “Mangerial Incentive Problems: A Dynamic Perspective,” Review of Economic Studies, 66, 169–182. H¨orner, J., and L. Samuelson, 2009, “Incentives for Experimenting Agents,” mimeo. Kamenica, E., and M. Gentzkow, 2010, “Bayesian Persuasion,” American Economic Review, p. forthcoming. Keller, G., S. Rady, and M. Cripps, 2005, “Strategic Experimentation with Exponential Bandits,” Econometrica, 73, 39–68. Klein, N., 2010, “The Importance of Being Honest,” mimeo. Malueg, D., and S. Tsutsui, 1997, “Dynamic R&D Competition with Learning,” RAND Journal of Economics, 28, 751–772. Matthews, S., and A. Postlewaite, 1985, “Quality Testing and Disclosure,” RAND Journal of Economics, 16, 328–340. Milgrom, P., and J. Roberts, 1986, “Relying on the Information of Interested Parties,” RAND Journal of Economics, 17, 18–32. 44

Strulovici, B., 2010, “Learning While Voting: Determinants of Collective Experimentation,” Econometrica, 78, 933–971. Szalay, D., 2005, “The Economics of Clear Advice and Extreme Options,” Review of Economic Studies, 74, 1173–1198.

45

Delegated Experimentation

19 Oct 2011 - Mauricio Varela, and seminar participants at the University of Bristol, University of Essex, ITAM School of. Business, Kellogg School of Management, University of Miami, Royal Holloway, and University of Warwick. Any remaining errors are my own. .... to complete a project. In contrast, we focus on a different ...

582KB Sizes 3 Downloads 240 Views

Recommend Documents

160518-delegated-regulation_en.pdf
On 23 April 2014, the Commission services sent a. formal request for technical advice (the "Mandate") to ESMA on the contents of the delegated.

"Financial Intermediation and Delegated Monitoring".
Mar 25, 2008 - http://www.jstor.org/about/terms.html. ... of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed.

"Financial Intermediation and Delegated Monitoring".
Mar 25, 2008 - visit your library's website or contact a librarian to learn about options for remote ... 2, Papers and Proceedings of the Fortieth Annual Meeting of.

Contracts for Experimentation
Jan 13, 2016 - †Graduate School of Business, Columbia University and Department of Economics, ... D Supplementary Appendix for Online Publication Only.

Auditing the Use of Delegated Administrative Rights - Netwrix
SEO Search Optimization keywords: Active Directory auditing; Active. Directory change management; administrative activity auditing, delegated administration ...

Strategic Experimentation in Queues
Nov 10, 2015 - σ∗(q,N,M), the queue length at the beginning of the arrival stage of .... by the “renewal” effect of the uninformed first in line reneging after N unsuccessful ... values of N and M are chosen for clarity of illustration and are

Strategic Experimentation in Queues
May 9, 2018 - ... length that perfectly reveals the state to new arrivals, even if the first in line knows that the server is good. ... Section 3 we introduce two concepts in the context of two auxiliary individual optimization. 3 ...... Springer-Ver

Strategic Experimentation with Congestion
Jun 4, 2018 - stage n to stage n + 1, and the new incumbency state yn+1. For any admissible sequence of control profiles {kt} t≥tn let τi n = min{t ≥ tn : ki.