Abstract We study a decision-maker who must choose the stopping time for a project of unknown quality when she is concerned both about social welfare and public beliefs about her ability. The decision-maker gets private information over time about whether the project will succeed or fail. Intuition suggests that in this setting the decision-maker will continue with the project for too long, both because persisting signals positive private information and in the hope of a last-minute success. We show, however, that exact efficiency can be achieved in equilibrium for many information structures. Surprisingly, increasing the informational asymmetry by improving the decision-maker’s private information improves efficiency. When efficiency cannot be achieved, we examine the nature of the inefficiencies.

JEL Classification Number: C72, C73, D82, B83, P16. Keywords: Career Concerns, Dynamic Signalling, Political Economy, Poisson Bandits, Strategic Experimentation

∗ Department of Economics, University of Texas at Austin. Email: [email protected]

Thanks to Yiman Sun for research assistance. I thank Elchanan Ben-Porath, V. Bhaskar, Joyee Deb, Laura Doval, Florian Ederer, Mark Feldman, Godfrey Keller, Christian Krestel, Georg Nöldeke, Larry Samuelson, Max Stinchcombe, Jeroen Swinkels, and Tom Wiseman for useful comments, as well as seminar audiences at Arizona State University’s “Topics in Contracting” conference 2014, Econometric Society World Congress, ESSET, Games 2016, National University of Singapore, SAET, Seoul National University, Southern Methodist University, Stanford, ThReD Conference, Duke, Université de Montréal, Yale. I am grateful to the Cowles Foundation for its hospitality. This paper was previously circulated under the title “Reputational Concerns and Policy Intransigence”.


1 Introduction

Many unsuccessful political projects persist for longer than necessary, given the information available to their proponents. Governments proposing these projects often advocate them even more strongly when faced with evidence of their failure. In response to the unpopularity of the poll tax introduced in the UK in 1989-90, Margaret Thatcher became more resolute in supporting it, declaring it her flagship policy. Mao pushed on with the Great Leap Forward in China, censoring numerous reports by local administrators of famine in many parts of the country.1 Similar concerns arise in the financial sector. Managers believing their bank to be insolvent may conceal this from its shareholders and the regulator, and continue operating, perhaps even doubling down with further risky investments, hoping for a lucky outcome. In the recent financial crisis, a number of agents, institutions and countries have notoriously gambled for resurrection.2 Career concerns are a leading explanation for policy-makers’ reluctance to abandon policy experiments. Political leaders care not only about social welfare, but also about re-election. A CEO cares not only about her stock options and incentive pay, but also about her future job prospects. If a policy’s merit is informative about the policy-maker’s competence, these two objectives may conflict. When the policy-maker is known to receive private information regarding the viability of her policy, repealing it might make her look incompetent. This causes a bias towards inefficient continuation. The policy-maker might resist repealing the policy even when she is privately convinced of its worthlessness. Additionally, the project might take a turn for the better, and gambling for resurrection might rescue the policy-maker’s reputation. This paper shows that the validity of this argument depends on how private information is modelled. 
In a dynamic game where private information accrues gradually over time, we show that reputational concerns do not automatically lead to inefficiencies. Indeed, although intuition suggests that private information and misaligned incentives should lead to inefficient overexperimentation, exact efficiency can in fact be achieved in equilibrium when the private information is acquired gradually on the path of play, is sufficiently informative, and there are no constraints on the timing of the agent’s actions. Our findings have implications for organisation and information design. Even when it is not feasible to align the decision-maker’s objectives with those of the public through explicit incentives, improving the quality of the decision-maker’s private information can improve welfare. To the best of our knowledge, our setup is the first instance of a dynamic signalling game where types change over the course of play.3

1 Dikötter (2010).
2 Baldursson and Portes (2013), Milne (2013), Patrick (2013), Senate (2013). See also Dewatripont, Tirole, et al. (1994).
3 Noldeke and Van Damme (1990) and Swinkels (1999) analyse a dynamic signalling game where types are fixed once and for all at the beginning of the game.


We consider a continuous-time game between a decision-maker and an observer. The decision-maker’s competence-level is uncertain and initially unknown to both herself and the observer. She is endowed with a project which can be good or bad. The project’s quality is uncertain and also unknown to both herself and the observer. Hence, information is symmetric ex-ante. Her competence and the project’s quality are correlated: competent decision-makers are more likely to bring good projects. Consequently, there is a one-to-one map between beliefs about the project quality and beliefs about the decision-maker’s competence, and it suffices to focus attention on the former.

The project’s quality drives two distinct stochastic processes: a publicly observed payoff process and a private news process. A good project will eventually succeed, as long as the decision-maker continues to experiment. The success arrives at an exponentially distributed random time. Similarly, if the decision-maker experiments then a bad project eventually fails, again at an exponentially distributed random time. Success and failure both end the game. They are publicly observed and conclusively reveal the project quality, thereby resolving all uncertainty. Depending on which is expected to occur faster – the success of a good project or the failure of a bad one – the absence of a public success or failure may in itself be good or bad news about the project’s quality.

Meanwhile, the decision-maker gradually receives private, inconclusive information about the project’s quality. Specifically, one piece of news arrives at each jumping time of a Poisson process whose intensity depends on the project’s quality. Again, private news may be either good or bad depending on whether it is assumed to arrive more frequently when the project is good, or vice-versa. Since these pieces of news are privately observed, the private information she has accumulated so far determines the decision-maker’s type.
Over the course of the game, the decision-maker continually learns about the project’s quality. Therefore her type is stochastic. At each instant, the decision-maker chooses whether to maintain her new project, or repeal it and revert to the status quo ante. Repealing the project also ends the game. Experimenting has a cost: it is socially worthwhile if the project is good but not if it is bad. The decision-maker aims to maximise a weighted average of the expected social payoff and her expected reputation, modelled as the observer’s belief about the project’s quality at the end of the game – and hence about her ability. The decision-maker’s reputation is based only on the publicly available information and, in equilibrium, on her strategy. Not repealing her project when it is socially efficient to do so may boost her reputation by convincing the observer that she has privately received encouraging information about the project’s quality. In this model, the decision-maker cares about the value of the reputation when the project ends because, say, her re-election or reappointment takes place after this. In particular, the decision-maker is not concerned with the flow value of her reputation. This means that she has no preferences about the timing with which bad news is revealed to the observer and so she has no intrinsic preference for delaying the revelation of bad news.

The novel combination of public and private news that can each be good or bad gives rise to a rich set of possibilities. We begin by assuming that a good project is expected to succeed faster than a bad project is expected to fail, so that the absence of a public success or failure is in itself bad news.4 It is useful to keep in mind the benchmark of policy experiments or the development of new commercial products, where projects rarely have disastrous consequences and bad projects simply fail to succeed. The analysis then varies depending on the state-dependent intensities of the private news process. We find that under all but one information structure, if the decision-maker’s reputational concerns are not too strong, the social welfare maximising policy can be adopted in equilibrium. This is the case, for example, when private news is good news. This may be plausible for technological reasons – the success of the project may depend upon several milestones being reached. It may also arise for organisational reasons – the decision-maker’s subordinates may be “yes-men” who only bring good news, while censoring bad news.5 In this case the absence of news, whether public or private, is bad, so that the decision-maker’s private belief drifts down toward the planner stopping region, with the arrival of private news causing upward jumps. A deviation from the planner policy must delay stopping by a fixed interval of time in order to bring reputational gains. It therefore necessitates strict social losses and is not profitable overall if the intensity of the decision-maker’s reputational concerns is low. When reputational concerns are too strong, we construct an equilibrium in which the project is abandoned inefficiently late if there is initial good news, while a decision-maker with no news stops before the social planner would.
This behaviour is constrained efficient given subsequent distortions.6 Only the most efficient equilibrium survives refinement according to the D1 criterion of Cho and Kreps (1987), which requires that off-path beliefs only put weight on the type with the strongest incentives to deviate from that equilibrium.7

Only one information structure proves problematic for our efficiency result. Here, private news is bad but not very informative relative to public news, so that the absence of a public event is more informative about the project’s quality than the absence of private news. As a result, the motion of the decision-maker’s belief is consistently down towards the planner stopping region, jumping down following the arrival of a piece of news and drifting down in the

4 In Section 6.2 we analyse the case where a bad project fails faster than a good project succeeds.
5 See Prendergast (1993). The bad news scenario is more plausible when news is difficult to suppress, as in the more objective environment of medical services or scientific R&D.
6 The phenomenon of early stopping due to career concerns is novel in the literature on career concerns.
7 Applying the D1 criterion in a signalling game with stochastically changing types is not straightforward.


absence of news. The reputation obtained from stopping under the social planner policy has periodic discontinuous surges, making local deviations profitable. It follows that the decision-maker cannot implement the socially optimal decision rule, no matter how small her reputational concern. We show that equilibrium strategies necessarily feature inefficient local pooling.

One surprising finding is that improving the decision-maker’s private information, thereby increasing the informational asymmetry between her and the observer along the path of play, improves equilibrium outcomes. In fact, for any intensity of the decision-maker’s reputational concerns, there exists an information structure under which she implements the socially efficient decision-rule. This suggests that regulation aimed at diminishing informational asymmetries may in fact cause inefficient distortions. According to our model, a decision-maker with better private information – access to a well-funded think-tank – is preferable. Interestingly, the better private information is not used to extract a better reputation. Instead it encourages maximising social welfare. This last result mirrors the insights of the recent literature on dynamic mechanism design.8 Unlike initial private information, subsequent private information does not lead either to informational rents or allocative inefficiency.

In addition, we show that our efficiency results rely on the assumption that the decision-maker is free to repeal the project at any date. If she is restricted to act only at given pre-set dates then reputational concerns always result in excessive continuation of the project, compared to the social optimum subject to identical constraints on timing.
This implies that a decision-maker should be given full discretion with regard to the decision to cancel her pet project, without requiring ratification by the party conference in the case of a political leader, or by the board of directors in the case of a CEO.

Our results underline that career concerns are not sufficient to explain the persistence of inefficient policies. The information structure also matters. In our model, even a thoroughly career-concerned decision-maker will choose the efficient policy if her private information is sufficiently good. Conversely, there exist information structures under which even the mildest career concerns cause the decision-maker to behave inefficiently. This suggests two substitutable instruments for executive compensation schemes aimed at reducing inefficiencies due to career concerns. One is to explicitly increase the decision-maker’s stake in the company – striking the right balance between salary, bonus and shares. Alternatively, the drive towards inefficiency caused by career concerns can be overcome with better private information.

A sizeable literature has attempted to explain the persistence of inefficient policies.9 The tension between career concerns and the desire to serve the public interest has been advanced

8 Courty and Li (2000), Eső and Szentes (2007), Eső and Szentes (2016).
9 Fernandez and Rodrik (1991), Alesina and Drazen (1991).


as one explanation in economics,10 political science (Downs and Rocke (1994), Canes-Wrone, Herron, and Shotts (2001)) and finance (Rajan (1994)). The misalignment of incentives has been interpreted as ego-rents (Rogoff (1990)), political bias (Jackson and Morelli (2007)), seeking re-election, empire-building (Jensen (1986)) or simply the divergence of short and long-term objectives. In particular, a good reputation might be instrumental in securing payoff gains in future interactions (Benabou and Laroque (1992), Boot and Thakor (1993), Morris (2001), Ottaviani and Sørensen (2006)). The unifying theme is that when decision-makers hold private information, deviating from the socially efficient policy sends a reputation-enhancing signal to the market. The incentive to pool with the better type leads to inefficient distortions.

In Dur (2001), as in our model, the political leader is ex-ante uncertain about her ability to identify and implement socially valuable policies. After choosing a project, she privately learns its quality. Repealing a bad project is socially optimal, but reduces the leader’s chances of re-election in the second period. In Majumdar and Mukand (2004), the leader knows her ability prior to choosing a policy project, allowing a high-ability leader with perfect information to only select good projects, while the project selected by a low-ability leader with noisy private information will sometimes be bad. A subsequent noisy public signal is therefore uninformative to the able type, who always persists with the chosen project. Repealing her project after a low public signal therefore reveals the low-ability type, imposing a reputational cost. In Prendergast and Stole (1996) an agent, who wants to acquire a reputation for quickly learning the correct course of action, makes investment decisions over a finite number of periods. In each period the agent may base her choice on her observation of a private signal about the investment’s profitability.
The high-ability agent receives more precise signals than her low-ability counterpart and is able to learn about and respond to the economic environment more quickly. Exaggerating the information of her signal in early periods, and dampening it in later periods lets the low-ability agent pool with the high-ability type.

In Dur (2001), as in our setup, there are no information asymmetries ex-ante, and the agent’s type reflects her private information hitherto. However, as in the other two papers, the agent’s type is binary and known to her before she signals to the market. Our model departs from this assumption, as information asymmetries arise gradually, on the path of play,11 as the agent receives noisy private news, stochastically over time. These private signals are only partially informative, and the agent retains some uncertainty about her ability throughout the game. Only the arrival of a payoff resolves the agent’s uncertainty, but we assume that this is publicly observed, and ends the game. In our setup, the agent’s type is the private information she has

10 Dur (2001), Majumdar and Mukand (2004), Prendergast and Stole (1996), Ben-Porath, Dekel, Lipman, et al. (2014).
11 In Holmström (1999) information is symmetric ex-ante and informational asymmetries can only arise off path.


accrued so far, and it evolves over time. This gives rise to a dynamic signalling game with stochastic types. This may be contrasted with the pioneering work on dynamic signalling in Noldeke and Van Damme (1990) and Swinkels (1999), where types are fixed.

The information structures we consider generalise those frequently encountered in the strategic experimentation literature.12 Most of that literature assumes a single news process. In Keller, Rady, and Cripps (2005) and Keller and Rady (2010), the publicly observed arrival of a payoff is good news, corresponding to the success of a product or a research idea, whereas in Keller and Rady (2015), it is bad news, for instance an industrial accident. (In these papers the focus is on information as a public good.) Our model allows for either, and in fact for a combination of the two: the public arrival of a payoff can constitute good or bad news and is conclusive, while news privately observed by the decision-maker is inconclusive, and may also be either good or bad. The resultant optimisation generalises that commonly encountered in the strategic experimentation literature. In particular, the planner solutions in Propositions 1 and 7 are new. The only work we are aware of in which private information accrues gradually over time in the context of dynamic signalling is Halac and Kremer (2016), which was written subsequent to this paper. Section 6.2 discusses the version of our model closest to theirs.

In the next section, we introduce the model. Section 3 summarises the main results of the paper. Section 4 treats the case of good news in detail, as a representative for information structures under which exact efficiency is achievable in equilibrium. (The others are discussed briefly in Section 6.) Section 5 treats the case of bad news with downward drift, the unique information structure under which efficiency cannot be achieved in equilibrium. Section 7 concludes. All omitted proofs are in the Appendix.13

2 The model

We consider a game between a decision-maker – henceforth DM – and a population of observers. Since the observers are passive, and form beliefs about the DM that depend on public information, we may treat them as a single player.14 Time is continuous, the horizon is infinite and the observer’s payoffs are discounted at rate ρ > 0. At the beginning of the game, nature chooses the DM’s competence-level from the set {C, I}, i.e. either competent or incompetent. The DM’s competence-level is fixed once chosen. The prior probability that the DM is competent equals

12 Dong (2016) studies a game of strategic experimentation with initial private information. There too, longer experimentation signals greater optimism. Private information also accrues exogenously, on the path of play, in Krestel and Thomas (2014), Heidhues, Rady, and Strack (2015) or Das (2015).
13 Together with more formal statements of the propositions, when required.
14 Depending on the interpretation of the model, the observer may be a group of individuals (an electorate, the shareholders of a company) who have common interests and access to the same public information.


p0 ∈ (0, 1). Since neither the DM nor the observer knows the realisation of her competence, the DM has no private information at the beginning of the game. The DM is endowed with a project of unknown quality θ which can be either good (θ = G) or bad (θ = B). A project is more likely to be good if it is undertaken by a competent DM. This implies that there is an increasing affine map between the belief about the project’s quality and the belief about the DM’s competence. For simplicity, we assume that this map is the identity, i.e. we assume that the project is good if and only if the DM is competent.15 The project’s quality determines the observer’s payoff: a good project strictly improves upon the status quo ante, whereas a bad project is strictly worse. It also governs the accrual of the DM’s private information. This is modelled as follows.

Public Success or Failure: Success and failure of the project are mutually exclusive and public events. A good project yields no payoff until it succeeds at random time τG ∈ [0, ∞), and thereafter yields a constant flow payoff with present value g. The random time τG follows an exponential distribution with commonly known parameter ηG ≥ 0. Conversely, a bad project yields no payoff until it fails at random time τB ∈ [0, ∞), and thereafter yields a constant flow payoff with present value ℓ < g. The random time τB follows an exponential distribution with commonly known parameter ηB ≥ 0. A good project never fails and a bad project never succeeds. Let ∆η := ηG − ηB. Except for Section 6.2, we will assume that ∆η > 0, with the interpretation that a good project succeeds faster than a bad project fails. Thus for most of the paper, public news is a “good-news” process, implying that the absence of success or failure induces beliefs to drift down over time.
Let γG := ηG g/(ηG + ρ) denote the expected discounted payoff of keeping a good project active until it succeeds, and γB := ηB ℓ/(ηB + ρ) denote the expected discounted payoff of keeping a bad project active until it fails. Finally, define the right-continuous process (S(t))_{t≥0}, with S(0) = 0 and S(t) = 1{τG ≤ t} − 1{τB ≤ t}, that tracks whether at any time t ≥ 0 the project has as yet succeeded or failed. The project’s success or failure is publicly observed by the DM and the observer. Either event perfectly reveals the underlying quality of the project, and ends the game. We shall say that the DM “experiments” if she keeps active a project that has not yet succeeded or failed. Repealing the project, thereby reverting to the status quo ante, also ends the game, and yields a payoff with discounted present value s. We assume that s ∈ (γB, γG) so that good projects should be pursued and bad ones should not.
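As a brief check of the definitions of γG and γB, the closed forms can be recovered from the exponential distribution of the arrival times. The derivation below is a sketch that assumes only what the text states: τG is exponential with parameter ηG, and g is the present value of the post-success flow evaluated at the date of success.

```latex
\begin{align*}
\gamma_G
  &= \mathbb{E}\!\left[ e^{-\rho \tau_G}\, g \,\middle|\, \theta = G \right]
   = g \int_0^\infty e^{-\rho t}\, \eta_G e^{-\eta_G t}\, dt
   = \frac{\eta_G\, g}{\eta_G + \rho}, \\
\gamma_B
  &= \mathbb{E}\!\left[ e^{-\rho \tau_B}\, \ell \,\middle|\, \theta = B \right]
   = \ell \int_0^\infty e^{-\rho t}\, \eta_B e^{-\eta_B t}\, dt
   = \frac{\eta_B\, \ell}{\eta_B + \rho}.
\end{align*}
```

The assumption s ∈ (γB, γG) then says that never repealing beats the status quo when the project is known to be good, and that the status quo beats never repealing when the project is known to be bad.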

15 Since the map is affine, and since we shall assume that the DM cares about the expected belief about her competence, it is straightforward to verify that our results extend to the case where DM competence and project quality are imperfectly correlated.


Private News: As long as the project is active, the DM privately observes a stream of private signals, or “pieces of news”. Given the quality of the project θ ∈ {G, B}, one piece of news arrives at each jumping time of a standard Poisson process with intensity λθ ≥ 0 that depends on the project’s quality θ. Let ∆λ := λG − λB. If ∆λ > 0, the private news process is a “good-news” process, and news events can be thought of as “breakthroughs” that indicate that the project is more likely to succeed. If ∆λ < 0, we have a private “bad-news” process, where news events are “breakdowns” that are more likely to occur if the project is bad. This paper analyses both cases.16 The arrival of a news event does not in itself generate a payoff.17 Let the random variable N(t) taking values in the non-negative integers be the number of news events observed by the DM up to date t. Observe that this summarises the DM’s private information or type at date t. For n ∈ {1, 2, . . . }, let τn denote the arrival time of the nth piece of private news. Conditional on the project’s quality θ, the processes (N(t)) and (S(t)) are assumed to be independent. Let (X(t)) = (N(t), S(t)), and let F_t^X = σ{X(s) | 0 ≤ s ≤ t} reflect the information available to the DM at date t.18 A policy for the DM is a stopping time, T ∈ [0, ∞], with respect to the filtration {F_t^X}_{t≥0}.

Beliefs: Two posterior beliefs about the project’s quality are important. The first, p(t) := P_{p0}(θ = G | F_t^X), is the DM’s private belief. The second, µ(t), is the observer’s posterior belief about the project’s quality, based only on the publicly available information: the DM’s actions up to date t and whether the project has as yet succeeded or failed.19 It is also the observer’s posterior belief about the DM’s competence level, and equivalently, the DM’s reputation. Even though the state N(t) of the private news process is not observed by the observer, it may be partially inferred from the DM’s equilibrium policy and her actions up to date t. The DM’s private posterior belief satisfies Bayesian updating:

\[
p(t) = \frac{\frac{p_0}{1-p_0}\,\phi(t)}{1 + \frac{p_0}{1-p_0}\,\phi(t)}, \tag{1}
\]

16 If ∆λ = 0, news events provide no information about the project’s quality.
17 This is without loss of generality.
18 Formally, let θ be an independent Bernoulli random variable with parameter p0. For θ ∈ {G, B} let (N_θ(t)) be a standard Poisson process with intensity λθ, and let S_θ(t) = 1{τG ≤ t} − 1{τB ≤ t}, where τ_{¬θ} = +∞ and τ_θ is a random time that follows an exponential distribution with parameter ηθ. Finally let (X_θ(t)) = (N_θ(t), S_θ(t)). The process (X(t)) is defined to be (X_θ(t)). Let P_{p0} denote the probability measure over the space of realised paths that corresponds to this description. From now on all the expectations are taken under the probability measure P_{p0}.
19 The process (p(t))_{t≥0} is adapted to the filtration {F_t^X}_{t≥0}. The process (µ(t))_{t≥0} is adapted to the natural filtration associated with (Y(t))_{t≥0} := (S(t), a(t))_{t≥0}, where a(t) := 1{t ≤ T} is a random variable taking the value 0 if at date t the DM has previously repealed the project and reverted to the status quo ante, and 1 otherwise. The stochastic process (a(t))_{t≥0} is the publicly observable action path.


where

\[
\phi(t) = e^{-(\Delta\eta + \Delta\lambda)t} \left( \frac{\lambda_G}{\lambda_B} \right)^{N(t)}.
\]

If over the time interval [t, t + dt) there is no private news and no public event even though the project is active, the DM’s posterior belief evolves continuously according to the law of motion

\[
dp = -p(1-p)(\Delta\eta + \Delta\lambda)\,dt. \tag{2}
\]

If a private news event occurs at time t > 0, the DM’s posterior belief jumps from p(t−) (the limit of her posterior beliefs before the news event) to p(t) = j(p(t−)), where

\[
j(p) := \frac{p\,\lambda_G}{\lambda(p)}, \tag{3}
\]

and λ(p) := pλG + (1 − p)λB. Let γ(p) := pγG + (1 − p)γB denote the expected value of the project given belief p. Whenever min{λG, λB} > 0, private news is inconclusive and j(p) ∈ (0, 1) for every p ∈ (0, 1). A public event – success or failure – proves either that the project is good or that it is bad, and the public and private beliefs together jump to one or zero correspondingly. Since this ends the game, our discussion henceforth conditions throughout on the absence of a public event.

Now consider the evolution of the DM’s private belief according to (2) and (3). Since this depends on the arrival rates of private and public news, its drift can be in the same or opposite direction of the jumps that arise due to private news. We can distinguish three cases20 when ∆η > 0:

• Good private news: When ∆λ > 0, then ∆η + ∆λ > 0 and p(t) continuously drifts down and jumps up, as illustrated in Figure 1.21
• Bad private news with downward drift: When ∆λ < 0 and ∆η + ∆λ > 0 the belief jumps down and continuously drifts down, as illustrated in Figure 2.
• Bad private news with upward drift: When ∆λ < 0 and ∆η + ∆λ < 0 the belief jumps down and continuously drifts up, as illustrated in Figure 3.

The paths p_t^n := Pr(θ = G | N(t) = n, S(t) = 0) are deterministic functions of t, and are horizontal translations of one another.
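The updating rules (1) and (3) and the three-way classification above can be checked numerically. The following is a minimal sketch, not part of the paper: the function names and the parameter values in the usage note are illustrative assumptions, and it assumes λB > 0 so that private news is inconclusive.

```python
import math

def likelihood_ratio(t, n, eta_g, eta_b, lam_g, lam_b):
    """phi(t): likelihood ratio of good vs. bad quality after n private
    news events and no public event by date t (requires lam_b > 0)."""
    d_eta = eta_g - eta_b
    d_lam = lam_g - lam_b
    return math.exp(-(d_eta + d_lam) * t) * (lam_g / lam_b) ** n

def posterior(t, n, p0, eta_g, eta_b, lam_g, lam_b):
    """Equation (1): private belief given N(t) = n and S(t) = 0."""
    odds = p0 / (1.0 - p0) * likelihood_ratio(t, n, eta_g, eta_b, lam_g, lam_b)
    return odds / (1.0 + odds)

def jump(p, lam_g, lam_b):
    """Equation (3): belief immediately after a private news event."""
    return p * lam_g / (p * lam_g + (1.0 - p) * lam_b)

def classify(eta_g, eta_b, lam_g, lam_b):
    """Name the case from the signs of d_lam and d_eta + d_lam (d_eta > 0)."""
    d_eta, d_lam = eta_g - eta_b, lam_g - lam_b
    if d_eta <= 0:
        raise ValueError("this sketch covers only the case d_eta > 0")
    if d_lam > 0:
        return "good private news"
    if d_lam < 0 and d_eta + d_lam > 0:
        return "bad private news, downward drift"
    if d_lam < 0 and d_eta + d_lam < 0:
        return "bad private news, upward drift"
    return "boundary case"
```

For instance, with the hypothetical values p0 = 0.5, ηG = 1, ηB = 0.2, λG = 2, λB = 1, applying `jump` to `posterior(1.0, 0, ...)` returns the same number as `posterior(1.0, 1, ...)`, which is the sense in which (3) is consistent with (1).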

20 If we assume that a bad project fails faster than a good project succeeds (∆η < 0), then there are three further cases. These are discussed in Section 6.2.
21 Note that the figures illustrate a sample path where θ = G. If instead we had θ = B, the posterior belief p(t) would jump to 0 instead of 1 at the arrival of the public event, which would be a failure.


Figure 1: Sample path of the belief p(t), in the private good news case (∆λ > 0). [Plot omitted: p(t) against t, with the paths p_t^0, p_t^1 and p_t^2.]

Figure 2: Sample path of the belief p(t), in the private bad news case with downward drift (∆λ < 0, ∆η + ∆λ > 0). [Plot omitted: p(t) against t, with the paths p_t^0, p_t^1 and p_t^2.]

Figure 3: Sample path of the belief p(t), in the private bad news case with upward drift (∆λ < 0, ∆η + ∆λ < 0). [Plot omitted: p(t) against t, with the paths p_t^0, p_t^1 and p_t^2.]
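Sample paths of the kind shown in Figures 1 to 3 can be generated with a short simulation. The sketch below is illustrative only: the function name, the time grid and the seed are assumptions, the true quality is fixed by the caller, and the path is conditioned on no public event, which is legitimate because N and S are independent given θ. It requires λG > 0 and λB > 0.

```python
import math
import random

def simulate_belief_path(p0, eta_g, eta_b, lam_g, lam_b, good,
                         horizon, dt, seed=0):
    """Return a list of (t, N(t), p(t)) on a grid of step dt, conditional on
    no success or failure up to `horizon`. `good` fixes the true quality,
    which determines the intensity of the private news process."""
    rng = random.Random(seed)
    lam = lam_g if good else lam_b
    drift = (eta_g - eta_b) + (lam_g - lam_b)   # d_eta + d_lam
    log_odds0 = math.log(p0 / (1.0 - p0))
    log_jump = math.log(lam_g / lam_b)          # log-odds shift per news event
    next_news = rng.expovariate(lam)            # first Poisson arrival time
    n, path = 0, []
    steps = int(round(horizon / dt))
    for k in range(steps + 1):
        t = k * dt
        while next_news <= t:                   # count arrivals up to date t
            n += 1
            next_news += rng.expovariate(lam)
        log_odds = log_odds0 - drift * t + n * log_jump  # log of (1)'s odds
        path.append((t, n, 1.0 / (1.0 + math.exp(-log_odds))))
    return path
```

With hypothetical parameters such as `simulate_belief_path(0.5, 1.0, 0.2, 2.0, 1.0, True, 2.0, 0.01)`, the belief falls between news events and rises at each event, as in the good private news case.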


Policy: For every value of N(t), the number of pieces of private news, the DM’s posterior belief is a deterministic function of time. Thus a stopping time T with respect to the filtration {F_t^X}_{t≥0} can be described by the sequence of times {T_n}_{n≥0}.22 Equivalently, it can be described by a sequence of threshold beliefs p̂ := {p̂_n}_{n≥0}, where p̂_n = p^n_{T_n} is the threshold belief at which the DM’s type with n pieces of news stops. The policy T is then the stopping time

\[
T(\hat{p}) = \inf\{\, t : N(t) = n,\ p(t) \le \hat{p}_n \,\}.
\]

Payoffs: A policy T determines a social payoff and a reputation for the DM. If the project succeeds before the DM repeals it, the social payoff is e^{−ρτG} g and the DM’s reputation is µ(τG) = 1. If the project fails before the DM repeals it, the social payoff is e^{−ρτB} ℓ and the DM’s reputation is µ(τB) = 0. If the DM repeals the project first, the social payoff is e^{−ρT} s, while the DM’s reputation is µ(T), determined in equilibrium. The DM’s payoff is defined as a convex combination of the social payoff and her reputation.23 Let α ∈ [0, 1) parametrise the intensity of the DM’s reputational concern, and let

\[
W_t^T := \mathbf{1}\{\tau_G < T\}\, e^{-\rho(\tau_G - t)}\, g
       + \mathbf{1}\{\tau_B < T\}\, e^{-\rho(\tau_B - t)}\, \ell
       + \mathbf{1}\{\tau_G \wedge \tau_B \ge T\}\, e^{-\rho(T - t)}\, s
\]

be the observer’s payoff under the policy T at date t < (T ∧ τG ∧ τB). We shall also refer to it as the social payoff. The DM’s expected payoff from a policy T is

\[
V_t^{\alpha,T} = \mathbb{E}\!\left[ (1-\alpha)\, W_t^T + \alpha\, \mu(T \wedge \tau_G \wedge \tau_B) \,\middle|\, \mathcal{F}_t^X \right].
\]

Observe that it is the undiscounted terminal value of her reputation that the DM cares about. It can be thought of as a sufficient summary statistic for the DM’s payoff in a continuation game. For instance, her reputation at the end of a project determines the likelihood with which a CEO is re-hired, or a political leader deposed. In this model, the DM has no intrinsic preferences about the time at which she receives information or reveals it through her actions.
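The correspondence between stopping times T_n and threshold beliefs, and the realised social payoff, can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's construction: the function names are hypothetical, the conversion inverts equation (1) in log-odds and so needs ∆η + ∆λ ≠ 0 and λG, λB > 0, and an event that never occurs is encoded as `math.inf`.

```python
import math

def stopping_time_for_threshold(p_hat, n, p0, eta_g, eta_b, lam_g, lam_b):
    """Date at which the belief path with n news events equals p_hat,
    obtained by inverting equation (1) in log-odds. A negative value means
    that this path does not pass through p_hat at any date t >= 0."""
    drift = (eta_g - eta_b) + (lam_g - lam_b)
    log_odds0 = math.log(p0 / (1.0 - p0))
    target = math.log(p_hat / (1.0 - p_hat))
    return (log_odds0 + n * math.log(lam_g / lam_b) - target) / drift

def social_payoff(tau_g, tau_b, T, rho, g, ell, s, t=0.0):
    """Realised W_t^T: discounted payoff from whichever of success,
    failure or repeal comes first, evaluated at date t."""
    if tau_g < T:
        return math.exp(-rho * (tau_g - t)) * g
    if tau_b < T:
        return math.exp(-rho * (tau_b - t)) * ell
    return math.exp(-rho * (T - t)) * s
```

For example, with the hypothetical values p0 = 0.5, ηG = 1, ηB = 0.2, λG = 2, λB = 1, the type with no news reaches the threshold 0.3 at the date returned by `stopping_time_for_threshold(0.3, 0, 0.5, 1.0, 0.2, 2.0, 1.0)`, and each additional piece of good news delays that date by log(λG/λB)/(∆η + ∆λ).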

At the initial moment, the DM’s type who has observed n = 0 pieces of private news chooses a deterministic

time T0 ∈ [0, ∞] with the interpretation that she cancels the project at date T0 if up until then there has been no private news (τ1 ≥ T0 ) and no success or failure (τG ∧ τB ≥ T0 ). Thus, T0 determines the DM’s control up to the moment of the first jump of the two dimensional process (X(t)), that is, either until a success or a failure, or until the first piece of news arrives, whichever comes first. If at the random time τ1 < T0 ∧ τG ∧ τB a news event occurs, a new time T1 ∈ [τ1 , ∞] is chosen for the type who has observed n = 1 pieces of private news. And so on for further news events. The game ends when the DM repeals the project, or at the random time τG ∧ τB when a success or failure occurs and the quality θ of the project is conclusively revealed. See Presman and Sonin (1990). 23 The linearity of the payoff function, in conjunction with the martingale property of beliefs, implies that there is no inbuilt bias towards over- or under-experimentation due to non-linear preferences. These are present, for example, in Kamenica and Gentzkow (2011).
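The payoff definitions above can be illustrated with a short numerical sketch. It evaluates the realised (ex-post) social payoff $W_t^T$ and the term inside the expectation defining $V_t^{\alpha,T}$ for one given outcome; it does not compute the expectation itself. The function names are hypothetical, and the sketch assumes $\rho > 0$ and that at most one of $\tau_G$, $\tau_B$ is finite (the other is passed as infinity).

```python
import math


def social_payoff(t, T, tau_G, tau_B, rho, g, ell, s):
    """Realised social payoff W_t^T, discounted back to date t.

    Assumption of this sketch: pass math.inf for the success or failure
    date that never occurs, so at most one of tau_G, tau_B is finite.
    """
    if tau_G < T and tau_G <= tau_B:      # success arrives before repeal
        return math.exp(-rho * (tau_G - t)) * g
    if tau_B < T and tau_B < tau_G:       # failure arrives before repeal
        return math.exp(-rho * (tau_B - t)) * ell
    if math.isinf(T):                     # never repealed and no event: payoff discounted to zero
        return 0.0
    return math.exp(-rho * (T - t)) * s   # DM repeals at T


def terminal_reputation(T, tau_G, tau_B, mu_T):
    """Reputation at T ^ tau_G ^ tau_B; mu_T is the equilibrium belief after a repeal at T."""
    if tau_G < T and tau_G <= tau_B:
        return 1.0                        # a success reveals a good project
    if tau_B < T and tau_B < tau_G:
        return 0.0                        # a failure reveals a bad project
    return mu_T


def dm_realised_payoff(alpha, t, T, tau_G, tau_B, rho, g, ell, s, mu_T):
    """Convex combination (1 - alpha) * W + alpha * mu for one realised path."""
    w = social_payoff(t, T, tau_G, tau_B, rho, g, ell, s)
    mu = terminal_reputation(T, tau_G, tau_B, mu_T)
    return (1.0 - alpha) * w + alpha * mu
```

Setting `alpha = 0` recovers the social payoff, which matches the statement that the DM's objective then coincides with the observer's.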


Social planner problem: When $\alpha = 0$, the DM has no reputational concerns, and her objective coincides with that of the observer. We refer to the DM with preference parameter $\alpha = 0$ as the social planner. The policy maximising $V_t^{0,T} = W_t^T$ is the planner policy.

Dynamic signalling game with dynamic types: When $\alpha > 0$, the DM’s and observer’s objectives are misaligned, as the DM also cares about her reputation. She is then engaged in a dynamic signalling game with the observer. Although the information is symmetric ex-ante, the DM acquires private information over time. By equation (1), there is a one-to-one mapping between the DM’s private information $N(t)$, and her posterior belief $p(t)$ at date $t$. Without loss of generality, we can therefore let $p(t)$ denote the DM’s type at date $t$. The set of possible types at date $t$ is $\Theta(t) := \{p^n_t\}_{n=0}^{\infty}$.

Equilibrium: Our solution concept is the perfect Bayesian equilibrium (henceforth: “equilibrium”). It consists of a policy $T$, and a public belief process $(\mu(t))_{t \ge 0}$ such that:

(i) $\mu(t)$ is measurable with respect to the observer’s information at $t$ and consistent with the policy $T$. In particular, suppose the DM repeals the project at $t$. If $t$ is in the support of the policy $T$, then $\mu(t)$ is defined by Bayes’ rule. Otherwise $\mu(t)$ may be chosen arbitrarily.

(ii) $T$ maximises $V_t^{\alpha,T}$ given $(\mu(t))_{t \ge 0}$.

Interpretations: It is useful to keep in mind a number of interpretations of the model.

1. The observer is the electorate and the DM a political leader undertaking a policy experiment. The leader cares directly about the social value of the policy experiment, but also about her own future electoral prospects. Those depend on the electorate’s assessment of her competence. At each point in time the political leader may cancel the experiment and revert to a known status quo ante. She can base her decision on the continuous evaluation of the policy by a government think-tank, though the think-tank’s reports are not available to the general public. The public does, however, perceive improvements in its standard of living.

2. The DM is a CEO seeking to rescue a distressed company and the observer is the company’s board of directors. While the board can observe a company’s recovery (or decline) once it is reflected in a better (or worse) financial position, it is unable to independently assess the effectiveness of the stabilisation and recovery measures instigated by the CEO.

3. The DM is a venture-capitalist overseeing a project on behalf of investors. She makes the decision to keep on funding or stop, requiring sufficient evidence of performance to justify continued funding. She has more information as to how the new venture is proceeding than her investors, who only perceive whether the investment eventually pays off or is a loss.

3 Summary of Results

Is it possible for a DM to behave in the social interest in spite of her reputational concerns? In other words, does there exist an equilibrium of the signalling game in which a DM with a given intensity $\alpha \in (0, 1)$ of reputational concerns adopts the social planner’s optimal policy? The answer to that question depends on the information structure, parametrised by $\lambda_G$, $\lambda_B$ and $\Delta\eta > 0$. We show that there exists a threshold, $\bar{\alpha} \equiv \bar{\alpha}(\lambda_B, \lambda_G, \Delta\eta)$, such that the social planner policy is an equilibrium strategy of the signalling game if and only if the intensity $\alpha$ of the DM’s reputational concerns belongs to the interval $[0, \bar{\alpha}]$. The threshold $\bar{\alpha}$ depends on the information structure, and admits two qualitatively different regimes, as illustrated in Figure 4.

In the scenario where the DM’s private belief jumps down towards zero following a news event, and also drifts down in the absence of news,24 we show in Section 5 that $\bar{\alpha} = 0$. There is no equilibrium of the signalling game in which the DM adopts the efficient policy, if she is at all concerned with her reputation. This stark result is driven by two features arising together only in the bad news case with downward drift. First, under the planner policy, the private belief may enter the stopping region discontinuously on a downward jump. Second, the public belief is a discontinuous function of time, and admits upward discontinuities at regular intervals. Deviations from the planner policy that exploit these frequent “surges” in her reputation are profitable for the DM if she has any career concerns.

For all remaining information structures,25 there exists an equilibrium in which the DM adopts the socially efficient policy, provided she is not too concerned with her reputation. That is, $\bar{\alpha} > 0$. In fact, we show that as the quality of the DM’s private information improves the threshold $\bar{\alpha}$ becomes arbitrarily close to one.26

24 The evolution of the DM’s posterior belief in this case is illustrated in Figure 2.

25 In Section 6.2 we argue that this is also the case for all information structures with $\Delta\eta < 0$.

26 Section 4.4 establishes this in the context of the “good news” case, where $\Delta\lambda > 0$. A similar result holds in the bad news case with upward drift, although the proof is omitted in this paper.


Figure 4: The threshold $\bar{\alpha} \equiv \bar{\alpha}(\lambda_B, \lambda_G, \Delta\eta)$ as a function of the information structure, parametrised by $\lambda_B$, $\lambda_G$ and $\Delta\eta > 0$. In each case, we specify the motion of the private posterior belief $p(t)$.

4 Achieving Exact Efficiency under Good Private News

4.1 Social Planner

We begin with the case of good private news. The evolution of the DM’s private belief in this case is illustrated in Figure 1. The next proposition describes the planner policy and the resulting social value. Let $U^0(p(t)) := \sup_T V_t^{0,T}$ be the common value function for the DM and the observer, and let $\Omega(p) := (1 - p)/p$ denote the inverse likelihood ratio.

The planner’s threshold belief $p^*$ is constant over time. Conditional on no public failure, the DM’s posterior belief may only enter the planner stopping region $(0, p^*]$ on its continuous downward motion. It cannot jump into the stopping region. Let $t^*_n$ be the date $t$ at which $p^n_t = p^*$. It is the date at which the planner repeals the project if she observes $n$ pieces of news prior to her posterior belief falling below the threshold $p^*$. Observe that since the planner only stops at dates $\{t^*_n\}_{n \ge 0}$, there are unreached information sets under the planner solution, namely the intervals $(t^*_n, t^*_{n+1})$, $n \ge 0$. Figure 5 illustrates these features of the planner solution.

Proposition 1. The planner’s optimal policy is to stop at the first time $t$ such that $p(t) \le p^*$. The planner threshold belief satisfies

$$p^* = \frac{\nu\,(s - \gamma_B)}{(\nu + 1)\,(\gamma_G - s) + \nu\,(s - \gamma_B)}, \tag{4}$$

where $\nu > 0$ is the positive solution to

$$\eta_B + \lambda_B + \rho - \nu\,(\Delta\eta + \Delta\lambda) = \lambda_B \left(\frac{\lambda_B}{\lambda_G}\right)^{\nu}. \tag{5}$$

The social payoff under the optimal policy is

$$U^0(p) = \begin{cases} u^0(p) & \text{if } p(t) > p^*, \\ s & \text{if } p(t) \le p^*, \end{cases} \tag{6}$$

where

$$u^0(p) := \gamma(p) + \frac{1 - p}{1 - p^*}\,\bigl(s - \gamma(p^*)\bigr) \left(\frac{\Omega(p)}{\Omega(p^*)}\right)^{\nu}. \tag{7}$$
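As an illustration of how equations (4), (5) and (7) fit together, the following sketch solves (5) for $\nu$ by bisection and then evaluates the threshold and the value function. It rests on assumptions that this section does not spell out: $\Delta\eta = \eta_G - \eta_B$, $\Delta\lambda = \lambda_G - \lambda_B$, a linear $\gamma(p) = p\gamma_G + (1-p)\gamma_B$, good news ($\lambda_G > \lambda_B \ge 0$) and $\Delta\eta + \Delta\lambda > 0$. Function names and parameter values are hypothetical.

```python
import math


def solve_nu(eta_G, eta_B, lam_G, lam_B, rho, iterations=200):
    """Positive root nu of equation (5), found by bisection.

    Assumes Delta_eta = eta_G - eta_B, Delta_lam = lam_G - lam_B,
    lam_G > lam_B >= 0 and Delta_eta + Delta_lam > 0, so that the
    residual is positive at nu = 0 and eventually negative.
    """
    d = (eta_G - eta_B) + (lam_G - lam_B)

    def residual(nu):
        jump_term = lam_B * (lam_B / lam_G) ** nu if lam_B > 0 else 0.0
        return eta_B + lam_B + rho - nu * d - jump_term

    lo, hi = 0.0, 1.0
    while residual(hi) > 0:          # expand the bracket until the sign flips
        hi *= 2.0
    for _ in range(iterations):      # fixed number of halvings
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def planner_threshold(nu, s, gamma_G, gamma_B):
    """Threshold belief p* from equation (4)."""
    num = nu * (s - gamma_B)
    return num / ((nu + 1.0) * (gamma_G - s) + num)


def u0(p, p_star, nu, s, gamma_G, gamma_B):
    """Continuation value from equation (7), with gamma(p) assumed linear in p."""
    gamma = lambda q: q * gamma_G + (1.0 - q) * gamma_B
    omega = lambda q: (1.0 - q) / q  # inverse likelihood ratio Omega
    scale = (1.0 - p) / (1.0 - p_star)
    return gamma(p) + scale * (s - gamma(p_star)) * (omega(p) / omega(p_star)) ** nu


if __name__ == "__main__":
    nu = solve_nu(eta_G=1.0, eta_B=0.5, lam_G=2.0, lam_B=1.0, rho=0.1)
    p_star = planner_threshold(nu, s=0.0, gamma_G=1.0, gamma_B=-1.0)
    print(nu, p_star)
```

Under these assumptions, $u^0(p^*) = s$ holds by construction, and a finite-difference slope of `u0` at `p_star` is numerically zero, which is what value matching and smooth pasting at $p^*$ would require.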

Figure 5: The planner threshold in the private good news case.

4.2 Adopting the Planner Policy in an Equilibrium

Let us now consider the signalling game played by a DM who is concerned both about social welfare and about the observer’s belief about her ability. We show that if the DM’s career concerns are sufficiently mild, there exists an equilibrium in which she adopts the planner policy.

Proposition 2. There exists $\bar{\alpha} \in (0, 1)$ such that the planner policy is an equilibrium policy if and only if the DM’s reputational concern has intensity $\alpha \le \bar{\alpha}$. This equilibrium is supported by the off-path reputation $\mu(t) = p^0_t$ for $t \in (0, t^*_0)$ and $\mu(t) = p^n_t$ for $t \in (t^*_n, t^*_{n+1})$ for each integer $n > 0$.

The intuition for this result is simple. For a deviation to bring a strict reputational benefit, a DM whose posterior belief enters the stopping region at date $t^*_n$ for some $n \ge 0$ must delay stopping until $t^*_{n+1}$. Since $t^*_{n+1} - t^*_n > 0$, this deviation requires a strict social loss. The social loss only outweighs the reputational benefit if the intensity $\alpha$ of the DM’s reputational concerns is sufficiently low. If that intensity is $\bar{\alpha}$, the DM is just indifferent.

The threshold intensity $\bar{\alpha}$ need not be small. In fact, we will see in Section 4.4 that $\bar{\alpha}$ depends on the information structure and can be arbitrarily close to one.27 At this equilibrium, the types of the DM separate: the DM stops at $t^*_n$ if and only if she has observed $n$ pieces of news up until that date. The observer is therefore able to infer that her type is $p(t^*_n) = p^*$.

Fix $\alpha < \bar{\alpha}$ and suppose the DM adopts the planner policy. To prove Proposition 2, we first show that the DM cannot profitably deviate from the planner policy by not repealing the project when it would be efficient to do so. To this end, we define the following deviation.

Definition 1. Fix $n \ge 0$, and $t' > t^*_n$. The deviation $D_n^{p^*}(t^*_n, t')$ replaces the planner policy with the policy $T(\hat{p})$ characterised by the sequence of threshold beliefs $\hat{p} := \{\hat{p}_i\}_{i \ge 0}$ such that $\hat{p}_n = p^n_{t'}$ and $\hat{p}_k = p^*$ for every $k \ne n$.

Under this deviation, all types of the DM adhere to the planner policy, except the type of the DM with belief $p(t^*_n) = p^*$, i.e. the DM who has observed $n$ pieces of news up to date $t^*_n$. She deviates from the planner policy by keeping the project active over the time interval $[t^*_n, t')$, and resuming the planner policy at $t'$. Loosely, this deviation delays stopping for the type of the DM with $n$ pieces of news.

When $t' = t^*_{n+1}$, the type of the DM who has observed $n$ pieces of news up to $t^*_n$ pools with higher types. Observing at least one piece of private news on $[t^*_n, t^*_{n+1})$ makes her posterior jump up, and at $t^*_{n+1}$ her belief is $p(t^*_{n+1}) \ge p^*$, so that the DM is back on path when she resumes the planner policy. The deviation $D_n^{p^*}(t^*_n, t^*_{n+1})$ can therefore be interpreted as the type of the DM who has observed $n$ pieces of news gambling for resurrection. Going forward, the DM’s behaviour is indistinguishable from that of a type who had at least $n + 1$ pieces of news at $t^*_n$. She repeals the project whenever her posterior belief $p(t)$ hits the planner threshold $p^*$, and the observer correctly infers her private belief, so that the public belief also equals $p^*$.

Conversely, observing no private news on $[t^*_n, t^*_{n+1})$ makes the DM’s private belief drift further into the planner stopping region. At $t^*_{n+1}$ her belief is $p(t^*_{n+1}) = j^{-1}(p^*)$, which is strictly below $p^*$. The DM then stops at $t^*_{n+1}$, thereby pooling with the type who has observed $n + 1$ pieces of news up to $t^*_n$ and no private news on $[t^*_n, t^*_{n+1})$. Indeed, the observer wrongly infers that the DM’s private belief at $t^*_{n+1}$ must be $p^*$, so that the public belief is $\mu(t^*_{n+1}) = p^*$, resulting in a net reputational gain for the DM.

While the deviation $D_n^{p^*}(t^*_n, t^*_{n+1})$ always induces a net expected social loss, it succeeds in convincing the observer that the DM had more good news at $t^*_n$ than was in fact the case, and thus generates a net expected reputational gain for the DM. It is profitable for a DM with reputational concern $\alpha > 0$ whenever its expected reputational benefit is large enough to outweigh the associated expected loss in social welfare.

27 We show that this result holds whether the effect of the better information structure is to lengthen $t^*_{n+1} - t^*_n$, or to shorten it.


The net payoff from this deviation is independent of $n$. Consequently there exists an intensity $\bar{\alpha} > 0$ of reputational concerns such that the deviation $D_n^{p^*}(t^*_n, t^*_{n+1})$ is unprofitable for every $n \ge 0$ if and only if the intensity of the DM’s reputational concern is $\alpha \le \bar{\alpha}$. For these values of $\alpha$, longer deviations $D_n^{p^*}(t^*_n, t^*_{n+k})$ with $k > 1$ are not profitable a fortiori.

What about shorter deviations? Fix $t' \in (t^*_n, t^*_{n+1})$. The reputation which the DM obtains if stopping at $t'$ is not pinned down by Bayes’ rule. The off-path public belief specified under Proposition 2 ensures that the deviation $D_n^{p^*}(t^*_n, t')$ has no reputational benefit or loss. Since it does have a social cost, it is necessarily less profitable than $D_n^{p^*}(t^*_n, t^*_{n+1})$. Finally, repealing the project when it is socially optimal to continue experimenting induces reputational as well as social losses, and is therefore not profitable. Proposition 2 follows.
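The indifference condition that pins down the threshold can be written out explicitly. As a sketch, let $L > 0$ denote the expected social loss and $G > 0$ the expected reputational gain from the deviation $D_n^{p^*}(t^*_n, t^*_{n+1})$. Both symbols are shorthand introduced here for illustration, not notation from the paper; by the argument above neither depends on $n$. Since the DM's payoff puts weight $1 - \alpha$ on the social payoff and weight $\alpha$ on her reputation:

```latex
\begin{align*}
\text{net gain from deviating} &= -(1-\alpha)\,L + \alpha\,G, \\
-(1-\alpha)\,L + \alpha\,G \le 0
  &\iff \alpha \le \frac{L}{L+G}, \\
\text{so that}\quad \bar{\alpha} &= \frac{L}{L+G} \in (0,1).
\end{align*}
```

In this shorthand, $\bar{\alpha}$ is close to one when $G$ is small relative to $L$, and close to zero when $L$ is small relative to $G$.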

4.3 Equilibrium Refinement

When the efficient equilibrium exists, it is the only equilibrium satisfying the D1 criterion of Cho and Kreps (1987). Given a fixed equilibrium of a signalling game, this refinement requires that the beliefs of the receiver upon observing a message that no type of the sender sends on path should place all weight on the type(s) of the sender with the strongest incentive to deviate to that message. More precisely, the beliefs should place no weight on a type of the sender if there exists another type who has a strict incentive to deviate whenever the first type has a strict or weak incentive to deviate.

Although the next proposition is similar in flavour, it does not follow from Cho and Sobel (1990) as the conditions of that paper are not satisfied in our dynamic setup. Indeed, when applying D1, some care is needed to address the complication that the DM’s type changes stochastically. Suppose that at some date $t$, a DM whose type should stop at $t$ deviates and continues experimenting until date $t' > t$. If she observes news on the interval $[t, t')$ her type will be different at the completion of her deviation. Our proof exploits the fact that the DM’s type $N(t)$ can only increase over time.

Proposition 3. Suppose that the intensity of the DM’s reputational concern is $\alpha \le \bar{\alpha}$. (a) The efficient equilibrium satisfies the D1 criterion. (b) No inefficient equilibrium satisfies the D1 criterion.

We illustrate some features of this result. Fix the DM’s reputational concern $\alpha \le \bar{\alpha}$ and consider the efficient equilibrium. For every $t' \in (t^*_n, t^*_{n+1})$, the off-path reputations $\mu(t')$ satisfying D1 are characterised in the next lemma and illustrated in Figure 6.

Lemma 1. For every intensity $\alpha \le \bar{\alpha}$ of the DM’s reputational concerns there exists a belief $p^{\mu}(\alpha)$ and associated dates $t^{\mu}_n(\alpha) \in (t^*_n, t^*_{n+1}]$ satisfying $p^n_{t^{\mu}_n(\alpha)} = p^{\mu}(\alpha)$ for each $n \ge 0$, such that the off-path reputations satisfying D1 are $\mu(t) = p^0_t$ for $t < t^*_0$, and for each $t > t^*_0$,

$$\mu(t) \in M^{D1}(t) := \begin{cases} \{p^n_t\} & \text{if } t \in (t^*_n, t^{\mu}_n(\alpha)), \\ \{p^{n+1}_t\} & \text{if } t \in (t^{\mu}_n(\alpha), t^*_{n+1}), \\ \left[p^n_t, p^{n+1}_t\right] & \text{if } t = t^{\mu}_n(\alpha). \end{cases} \tag{8}$$

Figure 6: Off-path reputation, $\mu(t)$, satisfying D1.

It is striking that under D1 a DM repealing the project at $t' \in (t^{\mu}_n(\alpha), t^*_{n+1})$ obtains a reputation $\mu(t') > p^*$. Stopping the project at $t'$ convinces the observer that continued experimentation would have been optimal. Divergences in the private and public beliefs can be interpreted using the terminology of Canes-Wrone, Herron, and Shotts (2001). If stopping when $\mu(t') > p^* > p(t')$, the DM follows her private belief in spite of public opinion, thereby exerting true leadership. Conversely, stopping when $\mu(t') < p^* < p(t')$ amounts to pandering, as the DM is privately convinced that the project is still worth pursuing, but yields to public opinion despite her private reservations. However, neither occurs on the equilibrium path.

To see why $\mu$ may take values above the threshold $p^*$, let us vary $t' \in (t^*_n, t^*_{n+1})$ and compare the incentives of the type with belief $p(t') = p^{n+1}_{t'}$ to preempt the planner policy and repeal the project at $t'$, with the incentives of the type with belief $p(t^*_n) = p^*$ to engage in the deviation $D_n^{p^*}(t^*_n, t')$ and delay repealing the project until $t'$.28 Both deviations impose a net social loss, which is increasing in $t'$ for the delaying type and decreasing in $t'$ for the preempting type.

Suppose that for every $t'$ we chose the off-path reputation $\mu^n_{t'}$ such that the expected reputational benefit of $D_n^{p^*}(t^*_n, t')$ just offsets its expected social loss, making the delaying type indifferent between delaying and staying on path. The reputation $\mu^n_{t'}$ must increase with $t'$. For $t' \in (t^*_n, t^{\mu}_n(\alpha))$ that reputation is insufficient to incite the type $p(t') = p^{n+1}_{t'}$ to preempt, and D1 prescribes that the public belief should put all weight on type $p(t') = p^n_{t'}$. Conversely, for $t' \in (t^{\mu}_n(\alpha), t^*_{n+1})$, the reputation $\mu^n_{t'}$ is far enough above $p^{n+1}_{t'}$ to strictly outweigh the lower cost of repealing the project inefficiently early, and D1 prescribes that the public belief should put all weight on type $p(t') = p^{n+1}_{t'}$. Our argument also explains why the off-path beliefs selected by D1 are not high enough to make deviations from the planner policy profitable, and Proposition 3 (a) follows.

As the intensity $\alpha$ of the DM’s reputational concerns increases on $(0, \bar{\alpha}]$, $p^{\mu}(\alpha)$ decreases. Notably, $p^{\mu}(\bar{\alpha}) = j^{-1}(p^*)$. To see why, recall that when $\alpha = \bar{\alpha}$, the delaying type with belief $p(t^*_n) = p^*$ is indifferent between adhering to the planner policy and the deviation $D_n^{p^*}(t^*_n, t^*_{n+1})$ where the reputation obtained from stopping at $t^*_{n+1}$ is $p^*$. Consequently, setting $\mu^n_{t'} = p^*$ for every $t' \in (t^*_n, t^*_{n+1})$ makes $D_n^{p^*}(t^*_n, t')$ strictly profitable for the delaying type, and stopping at $t'$ strictly unprofitable for the preempting type $p(t') = p^{n+1}_{t'}$.

28 We show that these are the two types with the strongest incentives to stop at $t'$, thereby allowing us to eliminate all other types under D1.

4.4 Changing the Information Structure

Fix $\Delta\eta > 0$ and let the threshold from Proposition 2 explicitly depend on the private news technology, that is, on the pair $(\lambda_B, \lambda_G)$. The DM with reputational concern $\bar{\alpha}(\lambda_B, \lambda_G)$ is just indifferent between adhering to the planner policy and the deviation $D_n^{p^*}(t^*_n, t^*_{n+1})$. Proposition 4 describes the effect of a change in the parameters $\lambda_B$ and $\lambda_G$ on the equilibrium, and on the threshold $\bar{\alpha}(\lambda_B, \lambda_G)$ in particular. We show that $\bar{\alpha}(\lambda_B, \lambda_G)$ can take any value in $[0, 1]$.

A more informative private news process (through more frequent news when the project is good, less frequent news when the project is bad, or both) strengthens the ability of a DM to adhere to the socially optimal policy in equilibrium, given her level of career concerns. A less informative news process makes deviating more attractive. One important implication of Proposition 4 is that, for any given intensity $\alpha \in (0, 1)$ of reputational concerns, there exists an information structure under which the planner policy constitutes an equilibrium strategy of the game between the reputationally concerned DM and the observer, and another, distinct information structure under which the DM’s equilibrium behaviour is necessarily inefficient. Equivalently, it implies that even an intensely reputationally concerned DM can adopt the first-best decision-rule if her private information is sufficiently good. Conversely, even a thoroughly socially minded DM will behave inefficiently if her private information is too poor.

Proposition 4.

(a) Fix $\lambda_G > 0$. Then $\bar{\alpha}(\lambda_B, \lambda_G)$ tends to one as $\lambda_B$ tends to zero.

(b) Fix $\lambda_B \ge 0$. Then $\bar{\alpha}(\lambda_B, \lambda_G)$ tends to one as $\lambda_G$ tends to infinity.

(c) For any $\lambda_B > 0$, $\bar{\alpha}(\lambda_B, \lambda_G)$ tends to zero as the difference $\Delta\lambda$ tends to zero.

Suppose as in (a) that news events only occur if the project is good. Deviating from the planner solution is never profitable for a DM, unless she is only concerned with her reputation. The intuition for this result can be obtained from the case when $\lambda_B = 0$ so that the first arrival of a piece of news conclusively reveals to the DM that the project is good. Thus, at every $t > 0$, there are effectively two types of the DM: the “informed” DM, who has observed at least one piece of news and for whom $p(t) = 1$, and the uninformed DM who has not yet observed any news and for whom $p(t) = p^0_t$. Under the planner policy, the uninformed type repeals the project at $t^*$ satisfying $p^0_{t^*} = p^*$, while the informed type continues to experiment until the project succeeds or fails.29 For the uninformed type at $t^*$, pooling with higher types requires committing to never repealing the project, which generates a large social loss, but no reputational benefit.30

Now fix $\lambda_B \ge 0$ as in (b) and suppose that news events occur with increasing frequency when the project is good ($\lambda_G \to \infty$). Then the planner policy is adopted in the equilibrium of the signalling game satisfying D1, provided the DM is not exclusively concerned with her reputation. The intuition is as follows. As news arrives with increasing frequency when the project is good, the length $t^*_{n+1} - t^*_n$ of the deviation required for the DM with $n$ pieces of news to pool with higher types shrinks to zero, as does the social cost of that deviation. However, since the planner threshold tends to zero, the expected reputational benefit from the deviation also shrinks to zero, and at a faster rate than the social cost.

Finally, suppose as in (c) that news arrives also when the project is bad ($\lambda_B > 0$), but that news events carry a vanishing amount of information about the underlying quality of the project ($\Delta\lambda \to 0$). In this case also, the duration $t^*_{n+1} - t^*_n$ shrinks to zero, as does the reputational benefit from the deviation. However, it is now the expected social cost that tends to zero faster, and deviating to pool with higher types is profitable, unless the DM has no reputational concerns.

From the discussion above it is clear that the relationship between $\bar{\alpha}$ and $t^*_{n+1} - t^*_n$ is not monotonic. It is important to emphasise that Proposition 4 is not equivalent to the notion that, under a better information structure, pooling with the next-best type requires a longer deviation and is therefore too costly. Indeed, the proposition also holds when a better information structure makes the required deviation shorter, because of the effect on the planner threshold belief.

29 The threshold $p^*$ is socially optimal when $\lambda_B = 0$ so that a piece of private news suffices to reveal that the project is good. See Corollary 1 in Appendix A.1.2.

30 A similar argument holds when $\Delta\lambda = 0$. In this case, news events are uninformative, so that $p^n_t = p^0_t$ for every $n > 0$, $t > 0$, and the DM never acquires any private information. Thus $\mu(t) = p^0_t$ for every $t < \tau_G \wedge \tau_B$. Under the planner policy, the DM repeals the project at date $\bar{t}^*$ satisfying $p^0_{\bar{t}^*} = \bar{p}^*$, unless the project has previously succeeded or failed. Any deviation from this policy generates a strict social loss and no reputational benefit. Here, $\bar{p}^*$ denotes the upper bound on $p^*$, achieved when $\Delta\lambda = 0$ so that private news is uninformative. See Corollary 1 in Appendix A.1.2.


4.5 Other Equilibria

Fix $\alpha > \bar{\alpha}$, so that the planner policy is not an equilibrium policy. We show that there exists an inefficient separating equilibrium in which the types of the DM who have observed at least one piece of news repeal the project inefficiently late, in line with the intuition that reputational concerns should cause delays.

Proposition 5. Fix the intensity $\alpha \in (\bar{\alpha}, 1)$ of the DM’s reputational concern. There exists a separating equilibrium such that: (i) the threshold belief $\hat{p}_0$ for the DM with no news is above the planner threshold $p^*$, (ii) every type of the DM with at least one piece of news adopts the constant threshold belief $q$, which lies below the planner threshold.

For the types of the DM with at least one piece of news, a policy with constant threshold $q < p^*$ under which stopping is delayed relative to the planner policy increases the social cost and reduces the expected reputational gain from deviating so as to pool with higher types. When $q$ equals the planner threshold $p^*$, we know from Proposition 2 that for any intensity $\alpha > \bar{\alpha}$ of reputational concerns the social cost of the deviation $D_n^{p^*}(t^*_n, t^*_{n+1})$ is too small to dissuade the DM from pursuing its expected reputational gain. Lowering $q$ below $p^*$ both increases the inefficiency caused by the deviation and reduces its expected reputational gain, as both the size of the gain and the probability of obtaining it decrease. This lessens the appeal of the deviation for the DM.

In this equilibrium, the type of the DM who has observed no news repeals the project inefficiently early. For that type, stopping before the date $\hat{t}_0$ satisfying $p^0_{\hat{t}_0} = \hat{p}_0$ cannot be punished with a reputational penalty, as $\mu(t) = p^0_t$ is the lowest feasible reputation at $t \in (0, \hat{t}_0)$. In equilibrium, $\hat{p}_0$ must be sufficiently high that this deviation has no social benefit. Indeed, there exists an equilibrium in which the DM with no news chooses $\hat{p}_0$ so as to maximise the social welfare, subject to the constraint that higher types use the inefficiently low threshold $q$. Observe that the constrained efficient $\hat{p}_0$ must be strictly greater than the planner threshold $p^*$: for the DM with no news, a small inefficiency from stopping slightly too early is preferable to the possibility of receiving a piece of news and being committed to a continuation policy with the inefficiently low threshold belief $q$. We conclude that, as in standard signalling games,31 there exists an equilibrium in which the lowest type of the DM chooses the efficient policy, although in our signalling game with changing types, that policy is constrained by the inefficient behaviour of higher types.

31 See Spence (1973).


4.6 Constrained Stopping Times

Finally, suppose the DM may only make decisions at certain pre-set dates. Consider an extreme case and assume the DM may decide at the exogenously given date $t_1 > 0$ whether to repeal the project or keep it active until the exogenously given final date $t_2 > t_1$. If the project does not succeed by $t_2$ it is automatically repealed. The DM optimally uses a threshold policy: she repeals the project if and only if $p(t_1) \le \tilde{p}_{\alpha}$, where the index stands for the intensity of the DM’s reputational concern. The social planner’s threshold $\tilde{p}_0$ subject to this constraint on timing must be such that continuing at $t_1$ generates no gain or loss in social welfare when compared with stopping at $t_1$. It therefore satisfies

$$E\left[\, W_{t_1}^{t_2} \,\middle|\, p(t_1) = \tilde{p}_0 \right] = s. \tag{9}$$

The next proposition states that for an intensity $\alpha > 0$ of reputational concerns, an equilibrium policy cannot maximise the social welfare because the DM’s reputational concern biases her in favour of experimentation. Her bias increases with the intensity $\alpha$ of her reputational concerns, and an utterly career concerned DM ($\alpha = 1$) never stops at $t_1$.

The intuition is simple. Fix $\alpha \in (0, 1)$. In equilibrium, if the DM stops at $t_1$, the observer infers that her private belief $p(t_1)$ must be at most $\tilde{p}_{\alpha}$. The public belief $\mu(t_1)$ is therefore strictly below $\tilde{p}_{\alpha}$. As a result, by stopping at $t_1$ the marginal type of the DM for whom $p(t_1) = \tilde{p}_{\alpha}$ pools with lower types for whom $p(t_1) < \tilde{p}_{\alpha}$. She can strictly improve her reputation by deviating and keeping the project active at $t_1$, thereby pooling with higher types with $p(t_1) > \tilde{p}_{\alpha}$. To compensate, in equilibrium she must be sufficiently pessimistic at $t_1$ that pooling with higher types imposes a strict loss in expected social welfare. A DM with more intense reputational concerns requires a greater expected social loss for any given boost to her reputation, and the threshold $\tilde{p}_{\alpha}$ decreases with $\alpha$.

Proposition 6. In equilibrium, the threshold belief $\tilde{p}_{\alpha}$ of a reputationally concerned DM is inefficiently low, and she might proceed with a project even though it would be efficient to repeal it. The threshold $\tilde{p}_{\alpha}$ decreases as the intensity $\alpha$ of the DM’s reputational concern increases, and tends to zero when $\alpha$ tends to 1.

This simple example illustrates that social welfare suffers if the timing of the DM’s actions is constrained. In this case, the intuition that reputational concerns cause inefficient delays is valid. Shorter period lengths allow for the complete separation of the DM’s types, mitigating her reputational concerns, and in continuous time, inefficiencies can be eliminated altogether. In contrast, with longer periods, inefficiencies necessarily arise. These observations echo the results of Nöldeke and van Damme (1990).
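The marginal type's trade-off in this two-date example can be sketched as an indifference condition. The notation $R(p)$ for the expected terminal reputation from continuing at $t_1$ with belief $p$ is introduced here for illustration only, and the sketch assumes that the expected social payoff from continuing is increasing in $p$:

```latex
\begin{align*}
\underbrace{(1-\alpha)\,s + \alpha\,\mu(t_1)}_{\text{stop at } t_1}
  &= \underbrace{(1-\alpha)\,E\!\left[W_{t_1}^{t_2} \,\middle|\, p(t_1) = \tilde{p}_{\alpha}\right]
     + \alpha\,R(\tilde{p}_{\alpha})}_{\text{continue until } t_2}, \\[4pt]
E\!\left[W_{t_1}^{t_2} \,\middle|\, p(t_1) = \tilde{p}_{\alpha}\right]
  &= s - \frac{\alpha}{1-\alpha}\,\bigl(R(\tilde{p}_{\alpha}) - \mu(t_1)\bigr) \;<\; s
  \qquad \text{whenever } R(\tilde{p}_{\alpha}) > \mu(t_1).
\end{align*}
```

Compared with equation (9), the marginal type's expected social payoff from continuing falls short of $s$ by a wedge proportional to $\alpha/(1-\alpha)$, so under the monotonicity assumption $\tilde{p}_{\alpha} < \tilde{p}_0$. The wedge vanishes at $\alpha = 0$ and grows without bound as $\alpha$ tends to 1.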


5 No Exact Efficiency: Bad Private News with Downward Drift

5.1 Planner Policy

Now consider the case of bad news with downward drift. Figure 2 illustrates a sample path of the DM’s private belief in this case. Proposition 7 describes the planner policy. Observe that the planner threshold belief $p^{\flat}$ in (10) does not depend on the parameters of the news process. Here is a heuristic argument explaining why. At the threshold belief, the planner is indifferent between stopping immediately, and committing to keeping the project active for a short time interval of length $dt$. If over that time interval the project succeeds (fails), the social payoff $g$ ($\ell$) accrues. In all other events, the planner repeals the project after the duration $dt$ has elapsed, regardless of how many pieces of news she observed during that time, as her posterior belief would only have moved further into the planner stopping region $(0, p^{\flat}]$. For an infinitesimal $dt$, we obtain the indifference condition:

$$s = p\,\eta_G\,dt\,g + (1 - p)\,\eta_B\,dt\,\ell + \bigl(1 - \eta(p)\,dt - \rho\,dt\bigr)\,s,$$

where $\eta(p) := p\,\eta_G + (1 - p)\,\eta_B$.

Let $V^0$ denote the planner value function. It admits different expressions on each interval $[p^{\flat}, j^{-1}(p^{\flat}))$ and $[j^{-n}(p^{\flat}), j^{-(n+1)}(p^{\flat}))$, $n \ge 1$. The next proposition only gives an explicit expression on the first interval.32 As the posterior belief can discontinuously jump into the stopping region $(0, p^{\flat}]$, or continuously drift into it, the planner value function is $C^1$ everywhere, including at $p^{\flat}$, and “smooth-pasting” is satisfied.33

32 In Appendix A.8 we give an explicit expression for the remaining intervals.

33 In Keller and Rady (2015) the planner value function does not satisfy the smooth-pasting condition at the optimal stopping threshold, because the belief continuously drifts away from and jumps towards the stopping region. As a consequence, in their model, the posterior belief can only enter the stopping region following a jump. The case with upward drift analysed in Section 6.1 is closer to their setup in that respect.

Proposition 7. The planner’s optimal policy is to stop at the first time $t$ such that $p(t) \le p^{\flat}$. The planner threshold belief satisfies

$$p^{\flat} = \frac{\eta_B\,(s - \ell) + \rho\,s}{\eta_B\,(s - \ell) + \eta_G\,(g - s)}. \tag{10}$$

On the interval $[0, j^{-1}(p^{\flat}))$, the social payoff under the optimal policy is

$$V^0(p) = \begin{cases} s & \text{if } p < p^{\flat}, \\ v^0(p) & \text{if } p^{\flat} \le p < j^{-1}(p^{\flat}), \end{cases} \tag{11}$$

where

$$v^0(p) := u(p, 1) + \bigl(s - u(p^{\flat}, 1)\bigr)\, \frac{1 - p}{1 - p^{\flat}} \left(\frac{\Omega(p)}{\Omega(p^{\flat})}\right)^{\frac{\eta_B + \lambda_B + \rho}{\Delta\eta + \Delta\lambda}}, \tag{12}$$

and where, for every $q \in [0, 1]$,

$$u(q, 1) = q\,\frac{\eta_G\,g + \lambda_G\,s}{\eta_G + \lambda_G + \rho} + (1 - q)\,\frac{\eta_B\,\ell + \lambda_B\,s}{\eta_B + \lambda_B + \rho}.$$

On the interval $[j^{-1}(p^{\flat}), 1]$, $V^0(p)$ is defined recursively.
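The threshold in (10) and the value function in (12) can be checked numerically. The sketch below evaluates both and is meant to illustrate value matching and smooth pasting at the threshold. It assumes $\Delta\eta = \eta_G - \eta_B$ and $\Delta\lambda = \lambda_G - \lambda_B$, uses $\Omega(p) = (1-p)/p$ as defined in Section 4.1, and picks hypothetical parameter values with bad news ($\lambda_B > \lambda_G$) and downward drift ($\Delta\eta + \Delta\lambda > 0$). All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    eta_G: float   # success rate of a good project
    eta_B: float   # failure rate of a bad project
    lam_G: float   # private news rate if the project is good
    lam_B: float   # private news rate if the project is bad
    rho: float     # discount rate
    g: float       # social payoff from a success
    ell: float     # social payoff from a failure
    s: float       # social payoff from repealing


def p_flat(m):
    """Planner threshold from equation (10); independent of the news rates."""
    num = m.eta_B * (m.s - m.ell) + m.rho * m.s
    return num / (m.eta_B * (m.s - m.ell) + m.eta_G * (m.g - m.s))


def u_first_event(q, m):
    """u(q, 1): value of continuing until the first success, failure or news event."""
    good = (m.eta_G * m.g + m.lam_G * m.s) / (m.eta_G + m.lam_G + m.rho)
    bad = (m.eta_B * m.ell + m.lam_B * m.s) / (m.eta_B + m.lam_B + m.rho)
    return q * good + (1.0 - q) * bad


def v0(p, m):
    """Equation (12), valid on the first interval above the threshold."""
    pf = p_flat(m)
    d = (m.eta_G - m.eta_B) + (m.lam_G - m.lam_B)   # assumed Delta_eta + Delta_lam
    expo = (m.eta_B + m.lam_B + m.rho) / d
    omega = lambda q: (1.0 - q) / q                 # inverse likelihood ratio
    scale = (1.0 - p) / (1.0 - pf)
    correction = (m.s - u_first_event(pf, m)) * scale * (omega(p) / omega(pf)) ** expo
    return u_first_event(p, m) + correction


# Hypothetical parameters: bad news (lam_B > lam_G) with downward drift.
EXAMPLE = Params(eta_G=2.0, eta_B=0.5, lam_G=0.5, lam_B=1.0,
                 rho=0.1, g=1.0, ell=-1.0, s=0.2)
```

With these values, `p_flat(EXAMPLE)` solves the heuristic indifference condition, `v0` equals `s` at the threshold, and a finite-difference slope of `v0` there is numerically zero, in line with the smooth-pasting property stated above.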

Figure 7: Planner value function, $V^0$, in the bad news case with downward drift.

There are two important qualitative differences with the good-news case. Let $t^{\flat}_n$ satisfy $p^n_{t^{\flat}_n} = p^{\flat}$ for each $n \ge 0$. First, $t^{\flat}_{n+1} < t^{\flat}_n$, so that the DM stops earlier the more news she observes. At any date $t \ge 0$ the DM with no news is the most optimistic type. Second, since it is possible for the DM’s private belief to jump into the stopping region, stopping on the open interval $(t^{\flat}_{n+1}, t^{\flat}_n)$ for each $n \ge 1$ with $j^{n-1}(p_0) \ge p^{\flat}$ occurs with strictly positive probability under the planner solution. Stopping after $t^{\flat}_0$ never occurs. As a consequence, the DM’s reputation under the planner policy is pinned down by Bayes’ rule for every $t \le t^{\flat}_0$, as discussed in the next section.

5.2 Adopting the Planner Policy in an Equilibrium

Now consider the signalling game between the DM with reputational concerns $\alpha \in (0,1)$ and the observer. In the bad news case with downward drift, the DM’s reputation under the planner policy is pinned down by Bayes’ rule for every $t \in (0, t^\flat_0]$. Indeed, if the planner policy is played in equilibrium, then if the DM stops at any $t \in (0, t^\flat_0]$, the observer concludes that her posterior belief just entered the stopping region $(0, p^\flat]$. For every $n \geq 1$ with $j^{n-1}(p_0) \geq p^\flat$, we therefore have the public belief
\[ \mu(t) = p^n_t, \qquad t \in \bigl[\, 0 \wedge t^\flat_n,\; t^\flat_{n-1} \bigr), \tag{13} \]
as illustrated in Figure 8. Observe that at every date $t^\flat_n$, the observer concludes that the DM must have observed no more than $n$ pieces of news, and eliminates the most pessimistic type from the set of types to which she assigns positive probability. As a result, the DM’s reputation has discontinuous “surges” at every $t^\flat_n$.

Figure 8: Reputation pinned down by Bayes’ rule under the planner solution in the private bad news case with downward drift.

If the DM’s posterior belief jumps into the planner stopping region at a date sufficiently close to the next surge in the public belief $\mu(t)$, then waiting until that date entails a social cost that is negligible compared with the resulting reputational gain to the DM. Therefore, the planner policy always admits a profitable deviation. The next proposition shows that a reputationally concerned DM cannot adopt the planner policy in equilibrium. This result is strikingly different from the good news case. It arises because of the discontinuous surges in the observer’s belief $\mu(t)$ at every $t^\flat_n > 0$, which occur only in the bad news case with downward drift.

Another force for increased experimentation is absent here, making our result especially striking. Recall that in the good news case, the DM is able to gamble for resurrection: continued experimentation might result in the arrival of good news, providing an incentive to experiment further than under the planner policy. This incentive is absent in the bad news case with downward drift, as additional news only pushes the DM’s posterior belief further into the planner stopping region, increasing inefficiencies. The profitability of local deviations due to reputational surges alone suffices to break the equilibrium.

Proposition 8. For every intensity $\alpha > 0$ of the DM’s reputational concerns, the social planner policy cannot be an equilibrium policy.


5.3 Other Equilibria

Fix $\alpha > 0$, so that in the bad news case with downward drift the planner policy is not an equilibrium policy. We show that an equilibrium policy necessarily pools some types of the DM. This immediately excludes the planner policy.

A policy $T(\hat p)$ for the DM is characterised by the function $\hat p : \mathbb{R}_+ \to [0,1]$ such that the DM stops at the first date $t$ at which $p(t) \leq \hat p(t)$. The planner policy has $\hat p(t) = p^\flat$ for every $t > 0$, and fully separates the types of the DM: whenever the DM stops, the observer perfectly infers her posterior belief.34 More generally, whenever $\hat p(t)$ is strictly positive and continuous on $\mathbb{R}_+$, it fully separates the types of the DM. Proposition 9 shows that an equilibrium policy necessarily has a local pooling structure, as illustrated in Example 1.

Example 1. Fix an integer $K > 0$ and a sequence of reals $\{\hat t_k\}_{k=0}^K$ with $\hat t_{k+1} < \hat t_k$ and $\hat t_K > 0$. Consider the policy $T(\hat p)$ characterised by the function $\hat p$ such that
\[ \begin{cases} \hat p(t) = 0 & \forall t \in \mathbb{R}_+ \setminus \{\hat t_k\}_{k=0}^K, \\ \hat p(t) > 0 & \forall t \in \{\hat t_k\}_{k=0}^K. \end{cases} \tag{14} \]
Under $T(\hat p)$, the reputation of a DM who repeals the project at date $\hat t_k$, as determined by Bayes’ rule, is
\[ \mu(\hat t_k) = E\bigl[\, p(\hat t_k) \,\big|\, p(\hat t_k) \leq \hat p(\hat t_k),\; p(\hat t_{k+1}) > \hat p(\hat t_{k+1}) \,\bigr] \]
for every $k < K$, and $\mu(\hat t_K) = E\bigl[\, p(\hat t_K) \,\big|\, p(\hat t_K) \leq \hat p(\hat t_K) \,\bigr]$. For every $t \in \mathbb{R}_+ \setminus \{\hat t_k\}_{k=0}^K$, we have $\mu(t) = 0$.35

The policy in Example 1 has types pooling locally in the following sense. At each $\hat t_k$ all types must stop whose posterior belief fell below $\hat p(\hat t_k)$ on the interval $(\hat t_{k+1}, \hat t_k)$ but was above $\hat p(\hat t_{k+1})$ at date $\hat t_{k+1}$. For some of these, the private posterior belief $p(\hat t_k)$ is below the public belief $\mu(\hat t_k)$, and stopping entails a reputational penalty. The reverse holds for others.36 On the open interval $(\hat t_{k+1}, \hat t_k)$, no type stops. This policy is clearly inefficient, since the planner policy would prescribe stopping at the first date at which the posterior belief falls below $p^\flat$ without restricting these dates to the set $\{\hat t_k\}_{k=0}^K$.

Observe that the support of $\hat p$ has a gap prior to every upward discontinuity in the public belief $\mu$. These inefficient gaps are necessary for equilibrium. Under a separating policy, such as the planner policy, the DM can benefit from an upward discontinuity in the public belief by engaging in a local deviation that slightly delays stopping at a negligible social cost. The gap in the support of $\hat p$ under a pooling policy ensures that the social cost of delaying stopping must be large. Conversely, the career concerns of a DM whose posterior belief falls below $p^\flat$ at some date $t'$ belonging to the interval $(\hat t_{k+1}, \hat t_k)$ must be sufficiently strong that she is willing to accept the social cost of continuing to experiment until $\hat t_k$ in exchange for the reputation $\mu(\hat t_k) > 0$, rather than efficiently stopping at $t'$ and being punished with the lowest possible reputation $\mu(t') = 0$. Consequently, for lower intensities $\alpha$ of reputational concerns a greater density in the support of $\hat p$ is required for equilibrium. In particular, a pooling policy cannot be an equilibrium if $\alpha = 0$. Finally, in the appendix we show that the planner policy is an $\varepsilon$-equilibrium when the intensities $\lambda_G$ and $\lambda_B$ of the news processes grow large while keeping the difference $\Delta\lambda$ fixed.

Proposition 9. Fix the intensity $\alpha > 0$ of the DM’s reputational concern. The following conditions are necessary for a policy $T(\hat p)$ to be an equilibrium policy:

(i) In equilibrium, no type of the DM continues experimenting forever. In particular, there exists a date $t_0 \geq 0$ at which the type of the DM with no news stops.

(ii) The threshold function $\hat p$ has discontinuities on $(0, t_0]$. Some of these are upward jumps.

(iii) If the threshold function $\hat p$ has an upward discontinuity at $\hat t \in (0, t_0]$, then (a) no type of the DM stops on an open time interval just preceding $\hat t$; (b) on an open interval just following $\hat t$, only those types of the DM stop who have observed at least as many pieces of news as a DM whose private belief at $\hat t$ equals the public belief at $\hat t$.

34 Indeed, the DM’s reputation is determined by Bayes’ rule for every $t \in (0, t^\flat_0)$ and given by (13).

35 Strictly speaking, this is an off-path public belief, as the private belief $p(t)$ is strictly positive at every $t > 0$. However, the incentive to stop at $t \in \mathbb{R}_+ \setminus \{\hat t_k\}_{k=0}^K$ given the strategy described increases with $N(t)$. Consequently, D1 would require putting all weight on the type with the most pieces of news at $t$. Because $\lim_{n \to \infty} p^n_t = 0$ for every $t > 0$, we use this limit value for the public belief at $t$.

36 Here too, the two phenomena can be interpreted as “true leadership” or “pandering”, as in Canes-Wrone, Herron, and Shotts (2001).

6 Achieving Exact Efficiency: Other Information Structures

6.1 Bad Private News with Upward Drift

Consider the bad news case with upward drift. The evolution of the DM’s belief is illustrated in Figure 3. Even though the dynamics in this case are different from those of the good news case analysed in Section 4, we obtain the same results concerning efficiency: exact efficiency can be achieved provided the DM’s career concerns are not too intense, and better private information improves efficiency. Proposition 10 describes the planner policy. Let $W^0$ denote the planner value function and $p^\dagger$ denote the planner threshold belief. Because the posterior belief can jump into the planner stopping region $(0, p^\dagger)$, while its continuous motion is away from it, the planner value function is $C^1$ everywhere except at the planner threshold $p^\dagger$, where it is $C^0$, and “smooth-pasting” does not hold.

Proposition 10. (Adapted from Keller and Rady (2015)) There exists a unique $p^\dagger \in (0,1)$ such that the planner’s optimal policy is to stop at the first time $t$ such that $p(t) < p^\dagger$. The social payoff under the optimal policy is
\[ W^0(p) = \begin{cases} s & \text{if } p < p^\dagger, \\ w^0(p) & \text{if } p \geq p^\dagger. \end{cases} \tag{15} \]
It is bounded from above by the full-information value $p\gamma_G + (1-p)s$, and from below by
\[ w^0(p) := \max\Bigl\{ s,\; \max_{n \geq 1} u(p,n) \Bigr\}, \tag{16} \]
where $u(p,n)$ is defined in (A.46).

Figure 9: Reputation pinned down by Bayes’ rule under the planner solution in the bad news case with upward drift.

In the bad news case with upward drift, stopping is possible at every $t > 0$ under the planner policy. Consequently, the DM’s reputation under the planner policy is pinned down by Bayes’ rule at every $t > 0$. Let $t^\dagger_n$ satisfy $p^n_{t^\dagger_n} = p^\dagger$. Then, for every $n$ such that $p^{n-1}_0 < p^\dagger$,
\[ \mu(t) = p^n_t, \qquad t \in \bigl[\, 0 \wedge t^\dagger_{n-1},\; t^\dagger_n \bigr). \]
The feature that the DM’s private belief may jump into the planner stopping region is shared with the case of bad news with downward drift analysed in Section 5. Contrary to that case, here the drift of the private belief $p(t)$ is upwards towards one. As a result, the public belief $\mu(t)$ has only downward discontinuities, as illustrated in Figure 9, and the discontinuous reputational surges from Section 5 do not occur. As a consequence, in a dynamic signalling game between her and the observer, a DM with an intensity $\alpha > 0$ of reputational concerns perceives net costs and benefits from local deviations that are of the same order of magnitude, yielding the following result.

Proposition 11. There exists $\bar\alpha \in (0,1)$ such that the planner policy is an equilibrium policy if and only if the DM’s reputational concerns have intensity $\alpha \leq \bar\alpha$.

As in Section 4, one can show that $\bar\alpha$ increases towards 1 as the informativeness of the DM’s private news process improves. Furthermore, when $\alpha > \bar\alpha$, a similar departure from the social welfare maximising policy is required to satisfy the DM’s incentives.

6.2 Failures More Rapid than Successes

Suppose now that $\Delta\eta < 0$, with the interpretation that a bad project fails faster than a good project succeeds. The absence of a public event (success or failure) on its own constitutes good news, making the DM’s belief drift up. Again we distinguish three cases, according to the directions of the jumps and the drift of the private belief process $p(t)$. In all three cases, exact efficiency can be achieved. In particular, there is no information structure under which the complications of Section 5 arise.

At one extreme, we have a good-news case ($\Delta\lambda > 0$) in which the news process is informative enough that the absence of news over a short time interval is more informative about the project’s quality than the absence of public success or failure, so that the drift of the private belief process is downwards ($\Delta\eta + \Delta\lambda > 0$). The evolution of the private belief process is therefore as illustrated in Figure 1. Moreover, the planner policy is characterised by a constant threshold belief, and prescribes that the DM repeal the project the first time her posterior belief falls below that threshold. Consequently, the qualitative results from Section 4 go through. In particular, there exists an intensity $\bar\alpha > 0$ of reputational concerns such that the DM can adopt the planner policy in an equilibrium of the signalling game if $\alpha \leq \bar\alpha$, that is, if her reputational concerns are not too severe.

At the other extreme, we have a bad-news case with upward drift ($\Delta\lambda < 0$), in which the evolution of the posterior belief process is as in Figure 3. In this case, the qualitative results from Section 6.1 go through.

As a third possibility, we have a good-news case with upward drift, in which the posterior belief both jumps up and drifts up ($\Delta\lambda > 0$ and $\Delta\eta + \Delta\lambda < 0$). In this case, the planner solution is “bang-bang” as in Proposition 1 of Keller and Rady (2015), and characterised by a constant threshold belief. If the prior $p_0$ is below the threshold, the DM repeals the project immediately.37 Otherwise, she continues experimenting until either a public success or a public failure ends the game. Since in the signalling game the prior is common knowledge between the DM and the observer, the observer knows whether the DM should immediately repeal the project or not. Thus, a deviation is immediately detected, and can be punished with a reputation $\mu(t) = p_0$ for every $t > 0$. It follows that the DM has no incentive to deviate from the planner policy as long as $\alpha < 1$. In a slight abuse of notation, we write $\bar\alpha = 1$.

37 Formally, the DM repeals the project at some date $\varepsilon > 0$ provided that $p(\varepsilon)$ is strictly below the planner threshold, and this policy is approximately optimal as $\varepsilon \to 0$.

Figure 10: The threshold $\bar\alpha \equiv \bar\alpha(\lambda_B, \lambda_G, \Delta\eta)$ as a function of the information structure, parametrised by $\lambda_B$, $\lambda_G$ and $\Delta\eta < 0$.

The scenario where $\eta_G = 0 < \eta_B$ and $\lambda_G = 0 < \lambda_B$ is closest to Halac and Kremer (2016). The DM’s private belief drifts up in the absence of private news, while the first piece of private news reveals that the project is bad and will surely fail. Thus, there are only two possible types of the DM. In this model there are no reputational gains from deviating and we have that $\bar\alpha = 1$. In Halac and Kremer (2016), because she receives a flow payoff proportional to the observer’s belief, which increases as long as she keeps the project active, the DM has an additional bias towards experimentation. Moreover, the observer does not observe the project’s failure, so that the DM’s actions are the only source of public information. The DM might therefore keep the project active even after having learned that it is bad.

7 Conclusion

The innovation of this paper is to consider a dynamic signalling model where the sender’s private information is acquired gradually over time. This model helps clarify the role of private information when the agent has reputational concerns. There are two main insights. First, regardless of the intensity of her reputational concerns, the social welfare maximising action plan is chosen in equilibrium by a career-concerned agent, provided her private signal is sufficiently informative, and she is not exogenously constrained in her ability to act at any point in time. Thus, increasing the agent’s stake in the social welfare, and improving the quality of her private information are both valid policy instruments for regulators seeking to mitigate the ill effects of career concerns. Second, when the quality of the agent’s private information is not sufficient to override her career concerns, any equilibrium may require inefficient delays, or inefficient discretising of the policy support, depending on the information structure.

Our analysis also suggests that reputational concerns may not be the best explanation for the tendency of political or business leaders to persist with their pet projects, such as the Great Leap Forward or the poll tax in the UK. More likely, hubris and a misplaced belief in their own infallibility made Mao or Thatcher discount the information they received.


References

Alesina, A., and A. Drazen (1991): “Why are Stabilizations Delayed?,” The American Economic Review, 81(5), 1170–1188.

Baldursson, F. M., and R. Portes (2013): “Gambling for resurrection in Iceland: the rise and fall of the banks,” Available at SSRN 2361098.

Ben-Porath, E., E. Dekel, B. L. Lipman, et al. (2014): “Disclosure and Choice,” Discussion paper, Mimeo.

Benabou, R., and G. Laroque (1992): “Using privileged information to manipulate markets: Insiders, gurus, and credibility,” The Quarterly Journal of Economics, 107(3), 921–958.

Boot, A. W., and A. V. Thakor (1993): “Self-interested bank regulation,” The American Economic Review, pp. 206–212.

Canes-Wrone, B., M. C. Herron, and K. W. Shotts (2001): “Leadership and pandering: A theory of executive policymaking,” American Journal of Political Science, pp. 532–550.

Cho, I.-K., and D. M. Kreps (1987): “Signaling games and stable equilibria,” The Quarterly Journal of Economics, pp. 179–221.

Cho, I.-K., and J. Sobel (1990): “Strategic stability and uniqueness in signaling games,” Journal of Economic Theory, 50(2), 381–413.

Courty, P., and H. Li (2000): “Sequential screening,” The Review of Economic Studies, 67(4), 697–717.

Das, K. (2015): “Strategic Experimentation with Competition and Private Arrival of Information,” Discussion paper, Exeter University, Department of Economics.

Dewatripont, M., J. Tirole, et al. (1994): The prudential regulation of banks. MIT Press.

Dikötter, F. (2010): Mao’s Great Famine: The History of China’s Most Devastating Catastrophe, 1958-1962. Walker.

Dong, M. (2016): “Strategic Experimentation with Asymmetric Information,” Working Paper.

Downs, G. W., and D. M. Rocke (1994): “Conflict, agency, and gambling for resurrection: The principal-agent problem goes to war,” American Journal of Political Science, pp. 362–380.

Dur, R. A. (2001): “Why do policy makers stick to inefficient decisions?,” Public Choice, 107(3-4), 221–234.

Eső, P., and B. Szentes (2007): “Optimal information disclosure in auctions and the handicap auction,” The Review of Economic Studies, 74(3), 705–731.

Eső, P., and B. Szentes (2016): “Dynamic Contracting: An Irrelevance Result,” Theoretical Economics, Forthcoming.

Fernandez, R., and D. Rodrik (1991): “Resistance to reform: Status quo bias in the presence of individual-specific uncertainty,” The American Economic Review, pp. 1146–1155.

Halac, M., and I. Kremer (2016): “Experimenting with Career Concerns,” Discussion paper, mimeo.

Heidhues, P., S. Rady, and P. Strack (2015): “Strategic experimentation with private payoffs,” Journal of Economic Theory, 159, 531–551.

Holmström, B. (1999): “Managerial incentive problems: A dynamic perspective,” The Review of Economic Studies, 66(1), 169–182.

Jackson, M. O., and M. Morelli (2007): “Political Bias and War,” The American Economic Review, 97(4), 1353–1373.

Jensen, M. C. (1986): “Agency cost of free cash flow, corporate finance, and takeovers,” American Economic Review, 76(2).

Kamenica, E., and M. Gentzkow (2011): “Bayesian Persuasion,” American Economic Review, 101(6), 2590–2615.

Keller, G., and S. Rady (2010): “Strategic experimentation with Poisson bandits,” Theoretical Economics, 5(2), 275–311.

Keller, G., and S. Rady (2015): “Breakdowns,” Theoretical Economics, 10(1), 175–202.

Keller, G., S. Rady, and M. Cripps (2005): “Strategic experimentation with exponential bandits,” Econometrica, pp. 39–68.

Krestel, C., and C. Thomas (2014): “Strategic Experimentation with Congestion - Private Monitoring,” Working Paper, The University of Texas at Austin.

Majumdar, S., and S. W. Mukand (2004): “Policy gambles,” The American Economic Review, 94(4), 1207–1222.

Milne, R. (2013): “Iceland bank pair jailed for five years,” Financial Times, 12 December 2013.

Morris, S. (2001): “Political correctness,” Journal of Political Economy, 109(2), 231–265.

Nöldeke, G., and E. Van Damme (1990): “Signalling in a dynamic labour market,” The Review of Economic Studies, 57(1), 1–23.

Ottaviani, M., and P. N. Sørensen (2006): “Reputational cheap talk,” The Rand Journal of Economics, 37(1), 155–175.

Patrick, M. (2013): “Barclays Qatar Dealings Hit,” Wall Street Journal, 16 September 2013.

Prendergast, C. (1993): “A theory of “yes men”,” The American Economic Review, pp. 757–770.

Prendergast, C., and L. Stole (1996): “Impetuous youngsters and jaded old-timers: Acquiring a reputation for learning,” Journal of Political Economy, pp. 1105–1134.

Presman, E. L., and I. N. Sonin (1990): Sequential control with incomplete information: the Bayesian Approach to multi-armed bandit problems. Academic Press.

Rajan, R. G. (1994): “Why bank credit policies fluctuate: A theory and some evidence,” The Quarterly Journal of Economics, pp. 399–441.

Rogoff, K. (1990): “Equilibrium Political Budget Cycles,” The American Economic Review, pp. 21–36.

Senate, U. S. (2013): “JP Morgan Chase Whale Trades: A Case History of Derivatives Risks and Abuses,” Permanent Subcommittee on Investigations, Committee on Homeland Security and Governmental Affairs, March 15.

Spence, M. (1973): “Job market signaling,” The Quarterly Journal of Economics, pp. 355–374.

Swinkels, J. M. (1999): “Education signalling with preemptive offers,” The Review of Economic Studies, 66(4), 949–970.


A Appendix

A.1 Proof of Proposition 1

Proof. Fix $0 < \eta_B < \eta_G$. The common value function for the DM and the observer, $U^0(p(t)) := \sup_T V_t^{0,T}$, is convex and continuous. Convexity reflects a nonnegative value of information, and implies continuity in the open unit interval.38 The function $U^0$ solves the Bellman equation
\[ u(p) = \max\Bigl\{ s \;;\; \bigl[ b_S(p,u) + b_F(p,u) + b_N(p,u) - d(p,u) \bigr] / \rho \Bigr\}, \tag{A.1} \]
where $b_S(p,u) = p\,\eta_G\,[g - u(p)]$ is the expected benefit from a success, $b_F(p,u) = (1-p)\,\eta_B\,[\ell - u(p)]$ is the expected loss from a failure, $b_N(p,u) = \lambda(p)\,[u(j(p)) - u(p)]$ is the expected benefit from a piece of news, and $d(p,u) = (\Delta\eta + \Delta\lambda)\,p(1-p)\,u'(p)$ measures the deterioration in the DM’s outlook when she experiments without observing any event (success, failure, or news). Since, in the good-news case, infinitesimal changes in the belief are always downward, we say that a continuous function $u$ solves the Bellman equation if its left-hand derivative exists on $(0,1]$ and (A.1) holds on $(0,1)$ when this left-hand derivative is used to compute $d(p,u)$. The planner value function $U^0$ is the unique solution satisfying the boundary conditions $u(0) = s$ and $u(1) = \gamma_G$.

Over values of $p \in (0,1)$ at which experimentation is optimal, $U^0$ solves39 the following ordinary differential difference equation40, where $\eta(p) := p\,\eta_G + (1-p)\,\eta_B$:
\[ u(p)\,\bigl[\eta(p) + \rho\bigr] + \lambda(p)\,\bigl[u(p) - u(j(p))\bigr] + p(1-p)(\Delta\eta + \Delta\lambda)\,u'(p) = p\,\eta_G\,g + (1-p)\,\eta_B\,\ell. \tag{A.2} \]
Let $p \mapsto u^0(p)$, defined on $(0,1)$, denote the solution to this differential equation. It is easy to verify that the function $p \mapsto p\gamma_G + (1-p)\gamma_B$ constitutes a particular solution to (A.2). The function $p \mapsto (1-p)\,\Omega(p)^{\nu}$, $\nu > 0$, captures the option-value of being able to repeal the project41, and constitutes a solution to the homogeneous version of (A.2) if and only if $\nu$ satisfies equation (5).

38 Continuity at the boundaries follows from the fact that $U^0$ is bounded above by the full information payoff $p\gamma_G + (1-p)s$, and bounded below by the payoff $s \vee \gamma(p)$ of a DM whose policy $T$ may only take values in $\{0, \infty\}$, with the interpretation that the DM only has a choice between immediately repealing the project, or experimenting forever. Both these payoffs converge to $\gamma_G$ as $p \to 1$ and to $s$ as $p \to 0$.

39 Whenever the drift of the posterior belief is downward, we say that a continuous function solves the ODDE (A.2) if its left-hand derivative exists and (A.2) holds when this left-hand derivative is used to compute $V^{0\prime}(p)$.

40 This ODDE bears some resemblance to, but is different from, equation (1) in Keller and Rady (2010). The difference is that in their model the arrival of news and the accrual of payoff always coincide. In our context they may be separated, as private news events do not have direct payoff consequences (except through their effect on the DM’s belief). The similarity is that the publicly observable arrival of payoff in the form of a success or failure remains informative about the underlying project quality, as are the payoff-relevant “breakthroughs” in Keller and Rady (2010).

41 The assumption of exponential discounting dictates our guess of the form of the option-value.


There are two solutions to (5), one negative, the other positive. We let $\nu(\lambda_G, \lambda_B)$ denote the positive solution. (We suppress the dependence on $(\eta_G, \eta_B)$.) Observe that
\[ \nu(\lambda_G, \lambda_B) \in \left[ \frac{\rho + \eta_B}{\Delta\eta + \Delta\lambda},\; \frac{\rho + \eta_B + \lambda_B}{\Delta\eta + \Delta\lambda} \right], \qquad \frac{\partial \nu}{\partial \lambda_G}(\lambda_G, \lambda_B) < 0, \qquad \frac{\partial \nu}{\partial \lambda_B}(\lambda_G, \lambda_B) > 0. \]

The solution to (A.2) is therefore given by the family of functions $u_{C_0}(p) = \gamma(p) + C_0\,(1-p)\,\Omega(p)^{\nu(\lambda_G, \lambda_B)}$, where $C_0 \in \mathbb{R}$ is a constant of integration. Let $p^*$ denote the planner’s optimal threshold belief. It satisfies the boundary condition (value-matching): $u^0(p^*) = s$. Solving for the constant $C_0$, we obtain the expression in (7) for $u^0(p)$, which for every $p \in [0,1]$ is a continuous function of $p^*$ on $[0,1]$. There exists a unique belief $p^* \in (0,1)$ that maximises the resulting expression for every $p \in (0,1)$ (the solution is interior and smooth-pasting is satisfied). It is given by (4). In Appendix A.1.1 we verify that $U^0$ solves the Bellman equation (A.1) with the maximum being achieved under the planner policy.
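To make the value-matching step explicit, here is a sketch of how the constant of integration is pinned down, taking the general solution $u_{C_0}$ stated above as given and writing $\nu$ for $\nu(\lambda_G, \lambda_B)$. The resulting expression should coincide with (7), which is not reproduced in this section, so the comparison is left as a check.

```latex
\begin{align*}
s &= \gamma(p^*) + C_0\,(1-p^*)\,\Omega(p^*)^{\nu}
  && \text{(value-matching at $p^*$)} \\
C_0 &= \frac{s - \gamma(p^*)}{(1-p^*)\,\Omega(p^*)^{\nu}} \\
u^0(p) &= \gamma(p) + \bigl[s - \gamma(p^*)\bigr]\,\frac{1-p}{1-p^*}
          \left(\frac{\Omega(p)}{\Omega(p^*)}\right)^{\nu}
  && \text{(substitute $C_0$ into $u_{C_0}$)}
\end{align*}
```

The last line has the same structure as $v^0$ in (12): a payoff from never stopping plus an option-value term that vanishes as $p \to 1$ and restores the value $s$ at the threshold.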

A.1.1 Verification

Proof. For $p \geq p^*$, $\bigl[ b_S(p,U^0) + b_F(p,U^0) + b_N(p,U^0) - d(p,U^0) \bigr]/\rho = U^0(p)$, which is strictly greater than $s$ for every $p > p^*$, and equals $s$ when $p = p^*$.

For $p \leq j^{-1}(p^*)$, $b_S(p,U^0) + b_F(p,U^0) + b_N(p,U^0) - d(p,U^0) = p\,\eta_G(g-s) + (1-p)\,\eta_B(\ell - s)$. This is strictly less than $s\rho$ if and only if $p < \bar p^*$ (defined in Corollary 1). It is therefore strictly less than $s\rho$ for every $p \leq j^{-1}(p^*)$.

For $j^{-1}(p^*) < p < p^*$,
\begin{align*}
& b_S(p,U^0) + b_F(p,U^0) + b_N(p,U^0) - d(p,U^0) \\
&\quad = p\,\eta_G(g-s) + (1-p)\,\eta_B(\ell-s) + \lambda(p)\bigl(U^0(j(p)) - s\bigr) \\
&\quad < p\,\eta_G(g-s) + (1-p)\,\eta_B(\ell-s) + \lambda(p)\bigl(U^0(j(p^*)) - s\bigr) \\
&\quad = p\,\eta_G(g-s) + (1-p)\,\eta_B(\ell-s) + \frac{\lambda(p)}{\lambda(p^*)} \left[ p^*\gamma_G\lambda_G + (1-p^*)\gamma_B\lambda_B + \bigl(s - \gamma(p^*)\bigr)\lambda_B \left(\frac{\lambda_B}{\lambda_G}\right)^{\nu(\lambda_G,\lambda_B)} \right] - \lambda(p)\,s \\
&\quad = \frac{1}{\lambda(p^*)} \Bigl[ (\gamma_G - s)\,\lambda_B\,(\eta_G + \rho) + (s - \gamma_B)\,\lambda_G\,(\eta_B + \rho) \Bigr] (p - p^*) + s\rho \\
&\quad < s\rho.
\end{align*}
The first inequality follows from the monotonicity in $p$ of $U^0(j(p))$ on $(j^{-1}(p^*), 1)$. The expression at the penultimate line is obtained from the previous line via a substitution from (5) and after some rearranging. The term in square brackets is strictly positive, so that this expression is linearly increasing in $p$ on $(j^{-1}(p^*), p^*)$ and takes the value $s\rho$ when $p = p^*$. The last inequality follows, establishing the result.

A.1.2 Comparative Statics

Clearly, $p^*$ and $U^0(p)$ depend on the information structure. Suppose $\lambda_G$ increases, $\lambda_B$ decreases, or both, so that the difference $\Delta\lambda$ increases, with the interpretation that news events are more informative. Elementary calculations show that $p^*$ decreases, and $U^0(p)$ increases uniformly for all $p \in (p^*, 1)$: the social value of experimentation increases in the presence of more informative news about the project’s quality.


The next corollary bounds $p^*$ and describes the limit cases. It is illustrated in Figure 11. Fix $\lambda_G > 0$. When news is completely uninformative ($\lambda_B = \lambda_G$), $p^*$ attains its upper bound, $\bar p^*$.42 When $\lambda_B = 0$ and one piece of news suffices to conclusively reveal that the project is good, $p^*$ attains its lower bound, $\underline{p}^*$. In both cases there is a unique date at which the planner stops under the optimal policy. When $\lambda_B = \lambda_G$, $p(t)$ is not affected by the arrival of news and continuously drifts down. The planner stops at the date $\bar t^*$ satisfying $p(\bar t^*) = \bar p^*$, conditional on no success or failure on $[0, \bar t^*)$. When $\lambda_B = 0$, $p(t)$ drifts down continuously in the absence of news. The first piece of news conclusively reveals that the project is good, and the planner’s posterior belief jumps to one, making it optimal to never stop experimenting. The planner stops at the date $\underline{t}^*$ satisfying $p(\underline{t}^*) = \underline{p}^*$, conditional on no success, failure or news on $[0, \underline{t}^*)$.

Now fix $\lambda_B \geq 0$. Letting $\lambda_G \to \infty$ approximates the full-information case. If the project is good, conclusive news arrives instantly. The absence of news immediately reveals that the project is bad. Repealing the project at time $\varepsilon$ unless she observes news is an almost-optimal policy for the planner when $\varepsilon \to 0$. The social payoff under this policy tends to $p_0\gamma_G + (1-p_0)s$.

Corollary 1.

(a) Fix $\lambda_G > 0$. If $\lambda_B = \lambda_G$, then $\nu(\lambda_G, \lambda_B) = \dfrac{\rho + \eta_B}{\Delta\eta}$, and
\[ p^* = \bar p^* := \frac{\rho s + \eta_B (s - \ell)}{\eta_G (g - s) + \eta_B (s - \ell)}. \]

(b) Fix $\lambda_G > 0$. If $\lambda_B = 0$, then $\nu(\lambda_G, \lambda_B) = \dfrac{\rho + \eta_B}{\Delta\eta + \lambda_G}$, and
\[ p^* = \underline{p}^* := \frac{\rho s + \eta_B (s - \ell)}{\eta_G (g - s) + \eta_B (s - \ell) + \lambda_G (\gamma_G - s)}. \]

(c) For every $\lambda_G > \lambda_B > 0$, $p^* \in (\underline{p}^*, \bar p^*)$.

(d) Fix $\lambda_B \geq 0$. If $\lambda_G \to \infty$, then $\nu(\lambda_G, \lambda_B) \to 0$, and $p^* \to 0$.

Proof. (a) This follows from setting $\lambda_B = \lambda_G$ in (5). For (b) and (d), recall from the proof of Proposition 1 that
\[ \nu(\lambda_G, \lambda_B) \in \left[ \frac{\rho + \eta_B}{\Delta\eta + \Delta\lambda},\; \frac{\rho + \eta_B + \lambda_B}{\Delta\eta + \Delta\lambda} \right], \]
and observe that when $\lambda_B = 0$ this interval is the point $(\rho + \eta_B)/(\Delta\eta + \lambda_G)$, and that when $\lambda_G \to \infty$ both endpoints converge to 0. Finally, (c) follows from the monotonicity of $\nu(\lambda_G, \lambda_B)$ in both its arguments.
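The two bounds in Corollary 1 are simple enough to evaluate directly. The sketch below codes the closed forms in parts (a) and (b). The numerical values are hypothetical, and `gamma_G` is passed in as a free parameter because its definition lies outside this section; the only property used is that it exceeds `s`.

```python
def p_upper(s, g, ell, rho, eta_G, eta_B):
    """Threshold when news is uninformative (lam_B == lam_G), Corollary 1(a)."""
    numerator = rho * s + eta_B * (s - ell)
    return numerator / (eta_G * (g - s) + eta_B * (s - ell))

def p_lower(s, g, ell, rho, eta_G, eta_B, lam_G, gamma_G):
    """Threshold when one piece of news is conclusive (lam_B == 0), Corollary 1(b)."""
    numerator = rho * s + eta_B * (s - ell)
    return numerator / (eta_G * (g - s) + eta_B * (s - ell) + lam_G * (gamma_G - s))

# Hypothetical payoffs and intensities, chosen so that both bounds lie in (0, 1).
BASE = dict(s=1.0, g=3.0, ell=0.0, rho=0.5, eta_G=1.0, eta_B=0.2)

upper = p_upper(**BASE)
lower = p_lower(**BASE, lam_G=2.0, gamma_G=2.0)
```

With these numbers the lower bound sits strictly below the upper bound, and the gap widens as `lam_G` grows, in line with the comparative statics of Appendix A.1.2. Note that `p_upper` is the same expression as the threshold in (10).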

Figure 11: The function $u^0(p)$ and the planner threshold for the cases (from lowest to greatest value achieved under the planner solution): $0 < \lambda_B = \lambda_G$; $0 < \lambda_B < \lambda_G$; $0 = \lambda_B < \lambda_G$; $0 \leq \lambda_B$, $\lambda_G \to \infty$.

42 If in addition $\eta_B = 0$, then the planner problem is equivalent to the exponential bandit decision problem of Keller, Rady, and Cripps (2005). (See Proposition 3.1 in Keller, Rady, and Cripps (2005).)


A.2 Proof of Proposition 2

Proof. We begin by showing that the DM cannot profitably deviate from the planner policy by continuing to experiment when $p(t) \leq p^*$.

Lemma A.1. The deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$ is strictly profitable if and only if
\[ \alpha \bigl[ p^* - j^{-1}(p^*) \bigr] + (1-\alpha)\, e^{-\rho (t_{n+1}^* - t_n^*)} \bigl[ s - u^0(j^{-1}(p^*)) \bigr] > 0. \tag{A.3} \]
The first bracket is positive and measures the expected net reputational benefit from deviating. The second bracket is negative and measures the expected net social cost of deviating.
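Since condition (A.3) is linear in $\alpha$, it can be rearranged into a cutoff. The shorthand $B$ and $C$ below is introduced here only for this sketch and does not appear in the original.

```latex
\begin{align*}
B &:= p^* - j^{-1}(p^*) > 0, \qquad
C := e^{-\rho (t_{n+1}^* - t_n^*)} \bigl[ u^0(j^{-1}(p^*)) - s \bigr] > 0, \\
\text{(A.3)} \iff \alpha B - (1-\alpha)\, C &> 0
  \iff \alpha > \frac{C}{B + C}.
\end{align*}
```

Under this rearrangement the deviation in Lemma A.1 is unprofitable exactly when $\alpha \leq C/(B+C)$, a number strictly between 0 and 1. Whether this cutoff is the threshold $\bar\alpha$ of the proposition depends on the remaining deviations treated in the rest of the proof.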

Proof. Let us first evaluate the net reputational benefit entailed by the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$. If the DM adopts the planner policy and stops at $t_n^*$, she reveals her private information and the observer learns that her private belief is the planner threshold: $\mu(t_n^*) = p^*$.

Assume that instead the DM follows the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$. If in the time interval $[t_n^*, t_{n+1}^*)$ the project succeeds, the DM’s reputation jumps to 1 and $\mu(t_{n+1}^*) = p(t_{n+1}^*) = 1$. If it fails, the DM’s reputation jumps to 0 and $\mu(t_{n+1}^*) = p(t_{n+1}^*) = 0$. If there is no public event, i.e. no success or failure, but the DM observes $k \geq 1$ pieces of news, then either the project succeeds (at random time $\tau_G > t_{n+1}^*$) or fails (at random time $\tau_B > t_{n+1}^*$) before the DM’s belief reaches $p^*$ (at stopping time $T > t_{n+1}^*$), or vice-versa. In each case, the observer learns the DM’s private information and we have $\mu(\tau_G \wedge \tau_B \wedge T) = p(\tau_G \wedge \tau_B \wedge T)$. Finally, if in the time interval $[t_n^*, t_{n+1}^*)$ there is no public event and the DM observes 0 pieces of news, she stops at $t_{n+1}^*$. In this case, the observer’s belief is defined by Bayes’ rule and satisfies $\mu(t_{n+1}^*) = p^*$, whereas the DM’s private belief is $p(t_{n+1}^*) = j^{-1}(p^*) < p^*$. By the above argument, we have that
\[ E\bigl[\mu(\tau_G \wedge \tau_B \wedge T) \,\big|\, p(t_n^*) = p^*\bigr] - E\bigl[p(\tau_G \wedge \tau_B \wedge T) \,\big|\, p(t_n^*) = p^*\bigr] = \pi(0, p^*, t_{n+1}^* - t_n^*)\, \bigl[ p^* - j^{-1}(p^*) \bigr], \tag{A.4} \]
where we let
\[ \pi(0, p^*, t_{n+1}^* - t_n^*) := p^*\, e^{-(\eta_G + \lambda_G)(t_{n+1}^* - t_n^*)} + (1 - p^*)\, e^{-(\eta_B + \lambda_B)(t_{n+1}^* - t_n^*)} \]
denote the probability that no news or public event arrives over the time interval $[t_n^*, t_{n+1}^*)$, given the current belief $p(t_n^*) = p^*$. Since the posterior belief process is an $\mathcal{F}^X$-martingale, $E[p(\tau_G \wedge \tau_B \wedge T) \mid p(t_n^*) = p^*] = p^*$ and the left-hand side of (A.4) defines the net reputational benefit from the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$. The right-hand side is clearly positive.

Let us now evaluate the net social loss from the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$. We begin with a few observations on $u^0(p)$, defined in (7). It is a convex function of $p$, reaching its minimum $s$ when $p = p^*$, and with $u^0(1) = \gamma_G$ and $\lim_{p \to 0} u^0(p) = +\infty$. For all $p^n_t \in (0,1)$ and for all $\Delta t > 0$, it is easy to verify that
\[ u^0(p^n_t) = p^n_t \bigl(1 - e^{-(\eta_G + \rho)\Delta t}\bigr)\gamma_G + (1 - p^n_t)\bigl(1 - e^{-(\eta_B + \rho)\Delta t}\bigr)\gamma_B + \sum_{k=0}^{\infty} \pi(k, p^n_t, \Delta t)\, e^{-\rho \Delta t}\, u^0\bigl(p^{n+k}_{t+\Delta t}\bigr), \tag{A.5} \]
where
\[ \pi(k, p^n_t, \Delta t) := p^n_t\, \frac{(\lambda_G \Delta t)^k}{k!}\, e^{-(\eta_G + \lambda_G)\Delta t} + (1 - p^n_t)\, \frac{(\lambda_B \Delta t)^k}{k!}\, e^{-(\eta_B + \lambda_B)\Delta t} \tag{A.6} \]
denotes the probability that $S(t + \Delta t) - S(t) = 0$ and $N(t + \Delta t) - N(t) = k$, given $S(t) = 0$ and $N(t) = n$. Equivalently, it is the probability that, given that the DM’s belief at $t$ is $p(t) = p^n_t$, there is no public success or failure, and she observes $k = 0, 1, 2, \ldots$ pieces of news over the time interval $[t, t + \Delta t)$. Her resulting posterior belief in this case is $p^{n+k}_{t+\Delta t}$.
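The probability in (A.6) is a mixture of two Poisson terms, which makes it easy to sanity-check numerically. The sketch below codes (A.6) and the implied Bayes update, then sums over the number of news items. The parameter values and function names are hypothetical, chosen for a good-news case with `lam_G > lam_B`.

```python
import math

def pi(k, p, dt, eta_G, eta_B, lam_G, lam_B):
    """Probability of no public event and exactly k pieces of news over a span dt, as in (A.6)."""
    good = p * (lam_G * dt) ** k / math.factorial(k) * math.exp(-(eta_G + lam_G) * dt)
    bad = (1.0 - p) * (lam_B * dt) ** k / math.factorial(k) * math.exp(-(eta_B + lam_B) * dt)
    return good + bad

def posterior_after(k, p, dt, eta_G, eta_B, lam_G, lam_B):
    """Bayes update of the belief p after k pieces of news and no public event over dt."""
    good = p * (lam_G * dt) ** k / math.factorial(k) * math.exp(-(eta_G + lam_G) * dt)
    return good / pi(k, p, dt, eta_G, eta_B, lam_G, lam_B)

# Hypothetical good-news parameters.
PARAMS = dict(eta_G=1.0, eta_B=0.2, lam_G=2.0, lam_B=0.5)
P, DT = 0.4, 0.5

# Summing over k removes the news count and leaves the no-public-event probability.
total = sum(pi(k, P, DT, **PARAMS) for k in range(60))
no_public_event = P * math.exp(-PARAMS["eta_G"] * DT) + (1.0 - P) * math.exp(-PARAMS["eta_B"] * DT)
```

The sum over `k` collapses to the probability of no success or failure, because the Poisson weights on the news count sum to one in each state. In this good-news parametrisation each additional piece of news raises the posterior.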

If, given her current belief $p(t_n^*) = p_{t_n^*}^n = p^*$, the DM follows the deviation $D_n^{p^*}(t_n^*, t_n^* + \Delta t)$, for $0 < \Delta t \leq t_{n+1}^* - t_n^*$, the expected social payoff is

$$U^0_{D_n^{p^*}}(t_n^*, t_n^* + \Delta t) := p_{t_n^*}^n\left(1 - e^{-(\eta_G + \rho)\Delta t}\right)\gamma_G + \left(1 - p_{t_n^*}^n\right)\left(1 - e^{-(\eta_B + \rho)\Delta t}\right)\gamma_B + \sum_{k=1}^{\infty} \pi\!\left(k, p_{t_n^*}^n, \Delta t\right) e^{-\rho \Delta t}\, U^0\!\left(p_{t_n^* + \Delta t}^{n+k}\right) + \pi\!\left(0, p_{t_n^*}^n, \Delta t\right) e^{-\rho \Delta t}\, s. \tag{A.7}$$

If the project succeeds over the time interval $[t_n^*, t_n^* + \Delta t)$, a payoff with present discounted value $g$ accrues at random time $\tau_G \in [t_n^*, t_n^* + \Delta t)$. If the project fails over the time interval $[t_n^*, t_n^* + \Delta t)$, a payoff with present discounted value $\ell$ accrues at random time $\tau_B \in [t_n^*, t_n^* + \Delta t)$. If there is no public event but $k \geq 1$ pieces of news arrive over the time interval $[t_n^*, t_n^* + \Delta t)$, the resulting belief for the DM is $p_{t_n^* + \Delta t}^{n+k}$. It exceeds $p^*$, so that the planner policy prescribes that the DM continue experimenting at $t_n^* + \Delta t$, which generates a social payoff of $U^0(p_{t_n^* + \Delta t}^{n+k})$. Finally, if over the time interval $[t_n^*, t_n^* + \Delta t)$ no public event occurs and no private news arrives, the DM's belief, $p_{t_n^* + \Delta t}^n$, is strictly below $p^*$ and the planner policy prescribes that the DM repeals the project at $t_n^* + \Delta t$, which generates a social payoff of $s$.

For $\Delta t = t_{n+1}^* - t_n^*$ we have that, for all $k \geq 1$, $p_{t_n^* + \Delta t}^{n+k} = p_{t_{n+1}^*}^{n+k} \geq p^*$ so that $U^0(p_{t_n^* + \Delta t}^{n+k}) = u^0(p_{t_n^* + \Delta t}^{n+k})$. Replacing the first two lines in (A.7) using (A.5), we have that

$$U^0_{D_n^{p^*}}(t_n^*, t_{n+1}^*) = U^0\!\left(p_{t_n^*}^n\right) + \pi\!\left(0, p_{t_n^*}^n, t_{n+1}^* - t_n^*\right) e^{-\rho(t_{n+1}^* - t_n^*)}\left[s - u^0\!\left(p_{t_{n+1}^*}^n\right)\right].$$

Observe that, since $p_{t_{n+1}^*}^n < p^*$, the value $u^0(p_{t_{n+1}^*}^n)$ is strictly greater than $s$, and the term in square brackets in the expression above is strictly negative. Thus, the expected net social payoff from the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$,

$$U^0_{D_n^{p^*}}(t_n^*, t_{n+1}^*) - U^0\!\left(p_{t_n^*}^n\right) = \pi\!\left(0, p_{t_n^*}^n, t_{n+1}^* - t_n^*\right) e^{-\rho(t_{n+1}^* - t_n^*)}\left[s - u^0\!\left(p_{t_{n+1}^*}^n\right)\right],$$

is a strict loss. This was to be expected: the planner policy maximises the social payoff, therefore any deviation from it generates a net social loss. Since $\pi(0, p_{t_n^*}^n, t_{n+1}^* - t_n^*) > 0$, the lemma follows.

Observe that the duration $t_{n+1}^* - t_n^*$ is independent of $n$. Therefore, for any given $\alpha > 0$, Lemma A.1 holds for some $n \geq 0$ if and only if it holds for every $n \geq 0$. Consequently there exists an intensity $\bar{\alpha} > 0$ of the DM's reputational concern such that for all $\alpha < \bar{\alpha}$, the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$ is not profitable for every $n \geq 0$. For these values of $\alpha$, longer deviations $D_n^{p^*}(t_n^*, t_{n+k}^*)$ with $k > 1$ are not profitable a fortiori. Lemma A.2 establishes that shorter deviations are not profitable either. Thus, $D_n^{p^*}(t_n^*, t_{n+1}^*)$ is the best possible deviation. Finally, repealing the project when $p(t) > p^*$ induces reputational as well as social losses, and is therefore not profitable. Proposition 2 follows.

Lemma A.2. For every $t' \in (t_n^*, t_{n+1}^*)$ the deviation $D_n^{p^*}(t_n^*, t')$ is less profitable than $D_n^{p^*}(t_n^*, t_{n+1}^*)$.

Proof. Given $t' \in (t_n^*, t_{n+1}^*)$, consider the deviation $D_n^{p^*}(t_n^*, t')$. If no public event or news arrives on $(t_n^*, t')$, the DM stops at $t'$. Observe that $\mu(t')$ is not determined by Bayes' rule. Consider the following off-path beliefs: $\mu(t') = p_{t'}^n$. (This reputation can be motivated as follows: It is the belief the observer attributes to the DM when assuming that a random event has occurred that prevented the DM from acting for a short duration $t' - t_n^*$.) Then, after all histories, $\mu(\tau_G \wedge \tau_B \wedge T) = p(\tau_G \wedge \tau_B \wedge T)$. Therefore, by the martingale property, the DM's expected reputational benefit is zero, and cannot outweigh the expected social loss.

A.3

Proof of Proposition 3 (a)

The proof of Proposition 3 (a) is organised as follows. Consider the efficient equilibrium from Proposition 2. Fix an off-path date $t' \in (t_k^*, t_{k+1}^*)$, and suppose the DM stops at that date. The reputations $\mu(t')$ that satisfy D1 are described in Lemma 1, proved in Appendix A.3.1. Lemma A.3, stated and proved in Appendix A.3.2, establishes that these reputations are uniquely defined for almost every $t' \in (t_k^*, t_{k+1}^*)$, given $\alpha$. Appendix A.3.3 concludes the proof by showing that the efficient equilibrium is supported by the off-path reputation satisfying D1.

A.3.1

Proof of Lemma 1

Proof. Fix $k \geq 0$ and $t' \in (t_k^*, t_{k+1}^*)$. The set of reputations that could be offered to the DM who stops at $t'$ is $[p_{t'}^0, 1)$ because the set of possible types at $t'$ is $\{p_{t'}^n\}_{n=0}^{k} \cup \{p_{t'}^n\}_{n=k+1}^{\infty}$. For the types in the first subset, a deviation to $t'$ requires delaying stopping when compared with the equilibrium. For those in the second subset it requires anticipating the equilibrium stopping date.

• First, we show that D1 eliminates all types $p_{t'}^n < p_{t'}^k$.

Suppose the DM stops at date $t' \in (t_k^*, t_{k+1}^*)$. Conditional on $p(t') < p^*$, her type could be $p_{t'}^n \in \{p_{t'}^0, p_{t'}^1, \ldots, p_{t'}^k\}$. If $p_{t'}^n = p_{t'}^0$, the DM must have been type $p_{t_0^*}^0$ at date $t_0^*$ and had no public success or failure and no private news on $[t_0^*, t')$. If $p_{t'}^n = p_{t'}^1$, the DM could either have been type $p_{t_0^*}^0$ at date $t_0^*$ and observed no public event and one piece of news on $[t_0^*, t')$, or she could have been type $p_{t_1^*}^1$ at date $t_1^*$ and observed no public event and no news on $[t_1^*, t')$. And so on. If we eliminate deviations up to date $t'$ for types $p_{t_0^*}^0, p_{t_1^*}^1, \ldots, p_{t_{k-1}^*}^{k-1}$, in that order, we can conclude that, conditional on $p_{t'}^n < p^*$, the type stopping at $t'$ must be $p_{t'}^k$.

For $t < t'$, let

$$V_t^T(p(t), \mu(t')) := E\left[(1 - \alpha)\, W_t^T + \alpha\, \mu(\tau_G \wedge \tau_B \wedge T) \mid p(t)\right]$$

be type $p(t)$'s expected payoff at date $t$ under the arbitrary policy $T$, if the reputation from stopping at $t'$ is $\mu(t') \in [p_{t'}^0, 1)$, and is as in the efficient equilibrium for every $t \neq t'$.

Observation 1. If under $T$ the DM expects to stop at date $t'$ with strictly positive probability, then $V_t^T(p(t), \mu(t'))$ is a strictly increasing, continuous function of $\mu(t')$.

Observation 2. The payoff $V_t^T(p(t), \mu(t'))$ is a strictly increasing, continuous function of $p(t)$.

Let $\bar{T}_n(\mu(t'))$ be the policy that maximises $V_{t_n^*}^T(p_{t_n^*}^n, \mu(t'))$. Under that policy, the DM only stops at dates in $\{t_j^*\}_{j=n}^{\infty} \cup \{t'\}$, since amongst all strategies that never stop at $t'$, the planner policy, which we denote $T^*$, maximises her expected payoff. In other words, the DM can only improve upon the planner policy by adding $t'$ to the support of $T^*$. Finally, we let $M_n(t')$ be the set of reputations $\mu(t') \in [p_{t'}^0, 1)$ that are such that $\bar{T}_n(\mu(t'))$ prescribes that for each $t \in [t_n^*, t')$, type $p_t^n$ continues to experiment, and type $p_{t'}^n$ stops at $t'$. Accordingly, reputations in $M_n(t')$ (if that set is not empty) will persuade the DM who holds the belief $p_{t_n^*}^n$ at date $t_n^*$ to continue experimenting on $[t_n^*, t')$ even in the absence of news, conditional on no public events. If $M_n(t') = \emptyset$, then stopping at date $t'$ can be excluded for type $p_{t'}^n$ by equilibrium dominance. Now suppose that $M_n(t') \neq \emptyset$ and fix $\mu(t') \in M_n(t')$. By definition, for every $j = n, \ldots, k$ it must be (at least weakly) preferable to continue experimenting with belief $p_{t_j^*}^n$ for each $t_j^* \in [t_n^*, t')$, rather than stopping:

$$V_{t_j^*}^{\bar{T}_n(\mu(t'))}\!\left(p_{t_j^*}^n, \mu(t')\right) \geq (1 - \alpha)s + \alpha p^*, \qquad j = n, \ldots, k.$$

Moreover, by Observation 1, for every $\tilde{\mu}(t') > \mu(t')$,

$$V_{t_j^*}^{\bar{T}_n(\mu(t'))}\!\left(p_{t_j^*}^n, \tilde{\mu}(t')\right) > V_{t_j^*}^{\bar{T}_n(\mu(t'))}\!\left(p_{t_j^*}^n, \mu(t')\right), \qquad j = n, \ldots, k.$$

Hence there exists a reputation $\mu_{t'}^n$ such that for every $t_j^* \in [t_n^*, t')$,

$$V_{t_j^*}^{\bar{T}_n(\mu(t'))}\!\left(p_{t_j^*}^n, \mu_{t'}^n\right) \geq (1 - \alpha)s + \alpha p^*, \qquad j = n, \ldots, k,$$

and there exists a date $t_j^* \in [t_n^*, t')$ for which the relation holds with equality. Thus, for each $n \leq k$, $M_n(t')$ is the interval $[\mu_{t'}^n, 1) \subseteq [p_{t'}^0, 1)$.

Now consider type $p_{t_0^*}^0$ at date $t_0^*$. If $\mu_{t'}^0 < 1$, then we have

$$V_{t_j^*}^{\bar{T}_0(\mu_{t'}^0)}\!\left(p_{t_j^*}^0, \mu_{t'}^0\right) \geq (1 - \alpha)s + \alpha p^*, \qquad j = 0, \ldots, k.$$

For every $t_j^* \in [t_1^*, t')$, the policy $\bar{T}_0(\mu_{t'}^0)$ is available to type $p_{t_j^*}^1$ at $t_j^*$. Moreover, by Observation 2 we have:

$$V_{t_j^*}^{\bar{T}_0(\mu_{t'}^0)}\!\left(p_{t_j^*}^1, \mu_{t'}^0\right) > V_{t_j^*}^{\bar{T}_0(\mu_{t'}^0)}\!\left(p_{t_j^*}^0, \mu_{t'}^0\right), \qquad j = 1, \ldots, k,$$

so that

$$V_{t_j^*}^{\bar{T}_0(\mu_{t'}^0)}\!\left(p_{t_j^*}^1, \mu_{t'}^0\right) > (1 - \alpha)s + \alpha p^*, \qquad j = 1, \ldots, k.$$

Therefore, the optimal policy $\bar{T}_1(\mu_{t'}^0)$ prescribes that for each $t \in [t_1^*, t')$, type $p_t^1$ continues to experiment, and that type $p_{t'}^1$ stops at $t'$. Consequently we must have $\mu_{t'}^1 < \mu_{t'}^0$. Stopping at date $t'$ can therefore be excluded for type $p_{t'}^0$ under the D1 criterion.$^{43}$ Finally, we proceed by induction and eliminate all types $p_{t'}^n < p_{t'}^k$.

• Second, we show that D1 eliminates all types $p_{t'}^n > p_{t'}^{k+1}$.

Consider type $p_{t'}^n > p_{t'}^{k+1}$ at date $t'$. Her payoff on the equilibrium path is

$$(1 - \alpha)\, U^0(p_{t'}^n) + \alpha\, p_{t'}^n,$$

where $U^0$ is defined in (6). The payoff from deviating and stopping at date $t'$ is $(1 - \alpha)\, s + \alpha\, \mu(t')$. Stopping at date $t'$ is a weakly profitable deviation for $p_{t'}^n$ if and only if

$$\mu(t') \geq \mu_{t'}^n := p_{t'}^n + \frac{1 - \alpha}{\alpha}\left[U^0(p_{t'}^n) - s\right]. \tag{A.8}$$

If $\mu_{t'}^n \geq 1$ then stopping at date $t'$ can be excluded for type $p_{t'}^n$ by equilibrium dominance. If $\mu_{t'}^n < 1$, the set of reputations, $\mu(t')$, giving type $p_{t'}^n$ a strict incentive to stop at $t'$ is $(\mu_{t'}^n, 1)$, while at $\mu(t') = \mu_{t'}^n$ the incentive is weak. From (A.8) it is easy to see that $\mu_{t'}^n$ is a strictly increasing function of $n$, so that $\mu_{t'}^n < \mu_{t'}^{n+1}$ for all $n \geq k$. We can therefore eliminate the deviation $t'$ for all types $p_{t'}^n > p_{t'}^{k+1}$.

$^{43}$ According to the D1 criterion, we can eliminate the deviation $t'$ for type $p_{t'}^n$ if there exists another type $p_{t'}^{n'}$ such that $[\mu_{t'}^n, 1) \subset (\mu_{t'}^{n'}, 1)$, or, equivalently, if there exists a type $p_{t'}^{n'}$ for whom $\mu_{t'}^n > \mu_{t'}^{n'}$. Using the notation of Cho and Kreps (1987) (p. 205), $D_{p_{t'}^n} \equiv (\mu_{t'}^n, 1)$, $D_{p_{t'}^{n'}} \equiv (\mu_{t'}^{n'}, 1)$ and $D^0_{p_{t'}^n} \equiv \mu_{t'}^n$. In words: we can eliminate type $p_{t'}^n$ if there exists a type $p_{t'}^{n'}$ who has a strict incentive to deviate whenever type $p_{t'}^n$ has a strict or weak incentive to deviate. By this elimination process, the type who has the strongest incentives to deviate to $t'$ remains.

• We are left with the possible types $p_{t'}^k$ and $p_{t'}^{k+1}$. We now show that at almost every $t' \in (t_k^*, t_{k+1}^*]$ there exists an off-path reputation $\mu(t') \in [p_{t'}^k, p_{t'}^{k+1}]$ that satisfies the D1 criterion.

For $p_{t_k^*}^k$ to deviate and adopt the threshold belief $p_{t'}^k$ instead of $p^*$, we need

$$(1 - \alpha)\, e^{-\rho(t' - t_k^*)}\left[s - u^0(p_{t'}^k)\right] + \alpha\left[\mu(t') - p_{t'}^k\right] \geq 0,$$

where

$$e^{-\rho(t' - t_k^*)} = \left(\frac{1 - p^*}{p^*}\, \frac{p_{t'}^k}{1 - p_{t'}^k}\right)^{\frac{\rho}{\Delta\eta + \Delta\lambda}}$$

and $u^0$ is defined in (7). Equivalently:

$$\mu(t') \geq \mu_{t'}^k(\alpha) := p_{t'}^k + \frac{1 - \alpha}{\alpha}\left(\frac{1 - p^*}{p^*}\, \frac{p_{t'}^k}{1 - p_{t'}^k}\right)^{\frac{\rho}{\Delta\eta + \Delta\lambda}}\left[u^0(p_{t'}^k) - s\right]. \tag{A.9}$$

For $p_{t'}^{k+1}$ to deviate and adopt the threshold belief $p_{t'}^{k+1}$ instead of $p^*$, we need

$$(1 - \alpha)\left[s - U^0(p_{t'}^{k+1})\right] + \alpha\left[\mu(t') - p_{t'}^{k+1}\right] \geq 0,$$

or equivalently:

$$\mu(t') \geq \mu_{t'}^{k+1}(\alpha) := j(p_{t'}^k) + \frac{1 - \alpha}{\alpha}\left[U^0(j(p_{t'}^k)) - s\right]. \tag{A.10}$$

The figure below illustrates $\mu_{t'}^{k+1}(\alpha)$ and $\mu_{t'}^k(\alpha)$ for some $\alpha \leq \bar{\alpha}$. (For $\alpha > \bar{\alpha}$ the two thresholds intersect at $p < j^{-1}(p^*)$, and we have $\mu_{t'}^k(\alpha) < \mu_{t'}^{k+1}(\alpha)$ for every $p \in (j^{-1}(p^*), p^*)$.)

It follows that, under D1, we eliminate $p_{t'}^{k+1}$ for every $t' \in (t_k^*, t_k^{\mu}(\alpha))$, and we eliminate $p_{t'}^k$ for every $t' \in (t_k^{\mu}(\alpha), t_{k+1}^*)$. At $t_k^{\mu}(\alpha)$, both types, $p_{t'}^k$ and $p_{t'}^{k+1}$, are possible. This establishes Equation (8) in Lemma 1.

A.3.2

Lemma A.3

The next lemma characterises the threshold $t^{\mu}(\alpha)$ used in Lemma 1.

Lemma A.3. For each $\alpha \in (0, \bar{\alpha}]$ there exists a unique date $t_k^{\mu}(\alpha) \in (t_k^*, t_{k+1}^*]$ satisfying

$$\mu_{t'}^k(\alpha) = \mu_{t'}^{k+1}(\alpha) \iff t' = t_k^{\mu}(\alpha). \tag{A.11}$$

Moreover, for every $t' \in (t_k^*, t_k^{\mu}(\alpha))$, we have $\mu_{t'}^k(\alpha) < \mu_{t'}^{k+1}(\alpha)$, while for every $t' \in (t_k^{\mu}(\alpha), t_{k+1}^*)$, we have $\mu_{t'}^k(\alpha) > \mu_{t'}^{k+1}(\alpha)$. When $\alpha = \bar{\alpha}$, $t_k^{\mu}(\alpha) = t_{k+1}^*$. When $\alpha \to 0$, $t_k^{\mu}(\alpha)$ converges to a date, $t_A$, in the interior of $(t_k^*, t_{k+1}^*)$.

Proof. To prove this result, it is more convenient to re-define all variables so that they depend on $p \equiv p_{t'}^k$ rather than $t'$. Thus, for each $p \in [j^{-1}(p^*), p^*]$ and for all $\alpha \in (0, \bar{\alpha}]$, we let

$$\mu^k(p, \alpha) := p + \frac{1 - \alpha}{\alpha}\, f(p), \qquad \text{where } f(p) := \left(\frac{1 - p^*}{p^*}\, \frac{p}{1 - p}\right)^{\frac{\rho}{\Delta\eta + \Delta\lambda}}\left[u^0(p) - s\right];$$

and we let

$$\mu^{k+1}(p, \alpha) := j(p) + \frac{1 - \alpha}{\alpha}\, g(p), \qquad \text{where } g(p) := U^0(j(p)) - s.$$

Let

$$h(p, \alpha) := \mu^k(p, \alpha) - \mu^{k+1}(p, \alpha) = \underbrace{\frac{1 - \alpha}{\alpha}\left[f(p) - g(p)\right]}_{A(p, \alpha)} - \underbrace{\left(j(p) - p\right)}_{B(p)}.$$

First, we show that $A(p, \alpha)$ is strictly decreasing on $[j^{-1}(p^*), p^*]$ and has a unique root, $p_A \in (j^{-1}(p^*), p^*)$. The function $g(p)$ is strictly increasing on $[j^{-1}(p^*), p^*]$. Conversely,

$$f'(p) = \left(\frac{1 - p^*}{p^*}\, \frac{p}{1 - p}\right)^{\frac{\rho}{\Delta\eta + \Delta\lambda}} \frac{1}{p(1 - p)(\Delta\eta + \Delta\lambda)}\left[\rho\left(u^0(p) - s\right) + p(1 - p)(\Delta\eta + \Delta\lambda)\, u^{0\prime}(p)\right].$$

Replacing the term in square brackets using (A.2) gives

$$f'(p) = \left(\frac{1 - p^*}{p^*}\, \frac{p}{1 - p}\right)^{\frac{\rho}{\Delta\eta + \Delta\lambda}} \frac{p\,\eta_G\, g + (1 - p)\,\eta_B\, \ell + \lambda(p)\, u^0(j(p)) - \rho s - (\eta(p) + \lambda(p))\, u^0(p)}{p(1 - p)(\Delta\eta + \Delta\lambda)}. \tag{A.12}$$

At every $p < p^*$, the DM strictly prefers stopping rather than experimenting over a very small interval of time. This yields the local condition:

$$p\,\eta_G\, g + (1 - p)\,\eta_B\, \ell + \lambda(p)\, u^0(j(p)) - \rho s < (\eta(p) + \lambda(p))\, u^0(p).$$

When $p = p^*$, the condition holds with equality. Using these results to evaluate the sign of the numerator in (A.12), which is strictly negative for $p < p^*$ by the local condition, we have

$$f'(p) < 0 \text{ for all } p < p^*, \text{ and } f'(p^*) = 0. \tag{A.13}$$

We conclude that $A(p, \alpha)$ is strictly decreasing on $[j^{-1}(p^*), p^*]$. Moreover, for every $\alpha \leq \bar{\alpha}$, $A(j^{-1}(p^*), \alpha) \geq A(j^{-1}(p^*), \bar{\alpha}) = p^* - j^{-1}(p^*) > 0$. Finally, $A(p^*, \alpha) = \frac{1 - \alpha}{\alpha}\left[s - U^0(j(p^*))\right] < 0$. Consequently, by the intermediate value theorem, $A(p, \alpha)$ admits a unique root $p_A \in (j^{-1}(p^*), p^*)$.

The function $B(p)$ is strictly concave on $(0, 1)$ and is maximised at the unique solution, denoted $p_B$, to $p = 1 - j(p)$. We now consider two cases.

$p_A \leq p_B$: In this case $B(p)$ is strictly increasing on $[j^{-1}(p^*), p_A]$. Moreover, $B(j^{-1}(p^*)) = p^* - j^{-1}(p^*) \leq A(j^{-1}(p^*), \alpha)$ for every $\alpha \leq \bar{\alpha}$. Thus, there exists a unique belief $p^{\mu}(\alpha) \in [j^{-1}(p^*), p_A]$ such that

$$\begin{cases} h(p, \alpha) > 0 & \forall p \in [j^{-1}(p^*), p^{\mu}(\alpha)), \\ h(p, \alpha) < 0 & \forall p \in (p^{\mu}(\alpha), p_A], \\ h(p, \alpha) = 0 & \iff p = p^{\mu}(\alpha). \end{cases} \tag{A.14}$$

$p_A > p_B$: In this case, $A(p, \alpha)$ is strictly convex on $[j^{-1}(p^*), p_A]$. Then $h(p, \alpha)$ is also strictly convex on $[j^{-1}(p^*), p_A]$. Moreover, $h(p_A, \alpha) = -B(p_A) < 0$ is invariant to $\alpha$, while for every $\alpha < \bar{\alpha}$, $h(j^{-1}(p^*), \alpha) > h(j^{-1}(p^*), \bar{\alpha}) = 0$. Consequently, there exists a unique belief $p^{\mu}(\alpha) \in [j^{-1}(p^*), p_A]$ satisfying (A.14).

In both cases, for every $p \in [j^{-1}(p^*), p_A]$ and $\alpha < \bar{\alpha}$, $A(p, \alpha) > A(p, \bar{\alpha})$. Therefore $p^{\mu}(\alpha)$ is strictly decreasing with $\alpha$, taking values between $\lim_{\alpha \to 0} p^{\mu}(\alpha) = p_A$ and $p^{\mu}(\bar{\alpha}) = j^{-1}(p^*)$. We define $t_A$ to be the date satisfying $p_{t_A}^k = p_A$.

A.3.3

Concluding the Proof of Proposition 3(a)

Finally, we show that the efficient equilibrium is supported by some off-path reputation in $M^{D1}(t')$. For every $t' \in (t_k^*, t_k^{\mu}(\alpha))$, $\mu(t') = p_{t'}^k$, and our equilibrium is supported. It is also supported at date $t' = t_k^{\mu}(\alpha)$ if we choose $\mu(t_k^{\mu}(\alpha)) = p_{t_k^{\mu}(\alpha)}^k$.

For every $t' \in (t_k^{\mu}(\alpha), t_{k+1}^*)$, $\mu(t') = p_{t'}^{k+1}$. Moreover, on this interval, we have

$$\mu_{t'}^k(\alpha) > \mu_{t'}^{k+1}(\alpha) > p_{t'}^{k+1},$$

where the first inequality follows from Lemma A.3 and the second one from (A.10). Consequently, the reputation $\mu(t') = p_{t'}^{k+1}$ obtained by stopping at $t'$ is not sufficiently high to make a deviation to $t'$ profitable, both for type $p_{t_k^*}^k$ and for type $p_{t'}^{k+1}$.

We conclude that the following off-path reputation satisfies D1 and supports the efficient equilibrium:

$$\mu(t') = \begin{cases} p_{t'}^k & \text{if } t' \in (t_k^*, t_k^{\mu}(\alpha)], \\ p_{t'}^{k+1} & \text{if } t' \in (t_k^{\mu}(\alpha), t_{k+1}^*). \end{cases} \tag{A.15}$$

We have therefore established Proposition 3(a).

A.4

Proof of Proposition 3 (b)

First, we show that equilibria with inefficient delay do not satisfy D1. Then we show that this is also the case for equilibria with inefficient preemption, establishing the result.

A.4.1

D1 does not support equilibria with inefficient delay

Fix $\alpha \leq \bar{\alpha}$. Let $T(p^*)$ denote the planner policy, and $\{t_n^*\}_{n \geq 0}$ denote the associated stopping dates. The social payoff achieved under the planner policy is given by the function $U^0$, defined in (6).

Consider the policy $T(\hat{p})$, characterised by the sequence of threshold beliefs $\hat{p} := \{\hat{p}_n\}_{n \geq 0}$ such that there exists an integer $k \geq 0$ for which $\hat{p}_k < p^*$, and with associated stopping dates $\{\hat{t}_n\}_{n \geq 0}$. Type $p(t)$'s expectation at date $t$ of the social payoff achieved under $T(\hat{p})$ is given by

$$V_t^{0, T(\hat{p})}(p(t)) := E\left[W_t^{T(\hat{p})} \mid p(t)\right]. \tag{A.16}$$

Under this policy, the type of the DM who has observed $k$ pieces of news repeals the project inefficiently late: $t_k^* < \hat{t}_k$. Therefore, for every $t \leq t_k^*$, $V_t^{0, T(\hat{p})}(p_t^k) < U^0(p_t^k)$.

Suppose that $T(\hat{p})$ is an equilibrium policy. We show that there are no off-path beliefs satisfying D1 that support this equilibrium. We begin by excluding equilibria in which the DM with $k$ pieces of news repeals the project inefficiently late, and the DM with at least $k + 1$ pieces of news adopts the planner policy.

Lemma A.4. Suppose there exists an integer $k \geq 0$ such that $\hat{p}_k < p^*$ and $\hat{p}_n = p^*$ for every $n > k$. Then the policy $T(\hat{p})$ cannot be played in an equilibrium that satisfies D1.

Proof. Fix $t' \in [t_k^*, \hat{t}_k)$. Adapting the argument from Section A.3, we can exclude types with at most $k - 1$ pieces of news and at least $k + 2$ pieces of news, so that $M^{D1}(t') \subset \{p_{t'}^k, p_{t'}^{k+1}\}$. Deviating from $T(\hat{p})$ and stopping at $t'$ is profitable for type $p_{t'}^{k+1}$ if and only if

$$\mu(t') \geq \mu_{t'}^{k+1} := p_{t'}^{k+1} + \frac{1 - \alpha}{\alpha}\left[U^0(p_{t'}^{k+1}) - s\right].$$

Because $T(\hat{p})$ is an equilibrium policy, $t' < t_{k+1}^*$. Consequently, $U^0(p_{t'}^{k+1}) > s$, and therefore $\mu_{t'}^{k+1} > p_{t'}^{k+1}$. If $\mu(t') = p_{t'}^k$, then type $p_{t'}^k$ has no reputational loss or gain from deviating from $T(\hat{p})$ and stopping at $t'$. However this deviation entails a net social gain. Therefore $\mu_{t'}^k < p_{t'}^k$. Hence, for every $t' \in [t_k^*, \hat{t}_k)$, the off-path reputation satisfies D1 if and only if $\mu(t') = p_{t'}^k$. But then, type $p_{t'}^k$ benefits from deviating from $T(\hat{p})$ and stopping at $t'$, establishing the result.

We now exclude any remaining equilibrium in which the DM with $k$ pieces of news repeals the project inefficiently late.

Lemma A.5. Suppose there exists an integer $k \geq 0$ such that $\hat{p}_k < p^*$, and an integer $m > k$ such that $\hat{p}_m \neq p^*$. Then the policy $T(\hat{p})$ cannot be played in an equilibrium that satisfies D1.

Proof. Adapting the argument from Section A.3, $M^{D1}(t') \subset \{p_{t'}^{k-1}, p_{t'}^k\}$ for every $t' \in (\hat{t}_{k-1}, \hat{t}_k)$. Let $t' = \hat{t}_k - \epsilon$ for some small real $\epsilon > 0$. Deviating from $T(\hat{p})$ and stopping at $t'$ is profitable for type $p_{t'}^k$ if and only if

$$\mu(t') \geq \mu_{t'}^k := p_{t'}^k + \frac{1 - \alpha}{\alpha}\left[V_t^{0, T(\hat{p})}(p_{t'}^k) - s\right].$$

For $t' \in [t_k^*, \hat{t}_k)$, $V_t^{0, T(\hat{p})}(p_{t'}^k) < s$, and therefore $\mu_{t'}^k < p_{t'}^k$, with

$$\lim_{\epsilon \to 0} \mu_{t'}^k = \hat{p}_k + \frac{1 - \alpha}{\alpha}\left[V_t^{0, T(\hat{p})}(\hat{p}_k) - s\right] < \hat{p}_k. \tag{A.17}$$

Now consider type $\hat{p}_{k-1}$ at $\hat{t}_{k-1}$. Her net payoff from engaging the deviation $D_{k-1}^{\hat{p}}(\hat{t}_{k-1}, \hat{t}_k)$ is

$$\alpha\, \pi(0, \hat{p}_{k-1}, \hat{t}_k - \hat{t}_{k-1})\left[\hat{p}_k - j^{-1}(\hat{p}_k)\right] + (1 - \alpha)\left[V_{\hat{t}_{k-1}}^{0, T(\tilde{p}(\hat{t}_k))}(\hat{p}_{k-1}) - s\right], \tag{A.18}$$

where for any given $t' \in (\hat{t}_{k-1}, \hat{t}_k]$, $\tilde{p}(t') := \{\tilde{p}_n\}_{n \geq 0}$ has $\tilde{p}_{k-1} = p_{t'}^{k-1}$ and $\tilde{p}_n = \hat{p}_n$ for every $n \geq k$. For $T(\hat{p})$ to be an equilibrium policy, the expression in (A.18) must be non-positive.

If $\mu(t') = \mu_{t'}^k$, her net payoff from engaging the deviation $D_{k-1}^{\hat{p}}(\hat{t}_{k-1}, t')$ is

$$\alpha\, \pi(0, \hat{p}_{k-1}, t' - \hat{t}_{k-1})\left[\mu_{t'}^k - p_{t'}^{k-1}\right] + (1 - \alpha)\left[V_{\hat{t}_{k-1}}^{0, T(\tilde{p}(t'))}(\hat{p}_{k-1}) - s\right]. \tag{A.19}$$

As $\epsilon \to 0$, the expression above tends to

$$\alpha\, \pi(0, \hat{p}_{k-1}, \hat{t}_k - \hat{t}_{k-1})\left[\lim_{\epsilon \to 0} \mu_{t'}^k - j^{-1}(\hat{p}_k)\right] + (1 - \alpha)\left[V_{\hat{t}_{k-1}}^{0, T(\tilde{p}(\hat{t}_k))}(\hat{p}_{k-1}) - s\right],$$

which, by (A.17) is strictly less than (A.18), and therefore strictly negative. This implies that there exists a small real $\epsilon > 0$ such that for every $t' \in (\hat{t}_k - \epsilon, \hat{t}_k)$, $\mu_{t'}^k < \mu_{t'}^{k-1}$, so that under D1 we must have $\mu(t') = p_{t'}^k$. But then, for $t' \in (\hat{t}_k - \epsilon, \hat{t}_k)$ type $p_{t'}^k$ benefits from deviating from $T(\hat{p})$ and stopping at $t'$, establishing the result.

A.4.2

D1 does not support equilibria with inefficient preemption

Consider the policy $T(\hat{p})$, characterised by the sequence of threshold beliefs $\hat{p} := \{\hat{p}_n\}_{n \geq 0}$ with $\hat{p}_n \geq p^*$ for every $n \geq 0$, and with associated stopping dates $\{\hat{t}_n\}_{n \geq 0}$. Under this policy, the DM repeals the project inefficiently early.

We begin with an observation. Fix $q > p^*$ and $k \geq 0$, and suppose that for every $n > k$, $\hat{p}_n = q$, with the interpretation that all types of the DM who have observed at least $k + 1$ pieces of news use a policy with constant threshold belief $q$. The next lemma, proved in Section A.4.3, describes the policy maximising the social payoff for the type of the DM who has observed $k$ pieces of news.

Lemma A.6. If higher types play a policy with constant threshold belief $q > p^*$, the type of the DM who has observed $k$ pieces of news maximises the social payoff by using the stopping time $T_k := \inf\{t : N(t) = k,\ p(t) \leq r^*(q)\}$. For every $q > p^*$, the optimal threshold belief, $r^*(q)$, belongs to $(p^*, q)$.

Two opposing forces are at work. By continuing to experiment at $t$, a DM with posterior belief $p_t^k > p^*$ avoids the inefficiency of stopping too early, conditional on having observed $k$ pieces of news. This effect pushes $r^*(q)$ below $q$. At the same time, continued experimentation increases the likelihood of observing a piece of news, and adopting the inefficiently high threshold $q$ in the future. This second effect keeps $r^*(q)$ strictly above $p^*$. Indeed, for $p_t^k$ sufficiently close to $p^*$, the inefficiency caused by stopping too early is small, compared with the expected inefficiency incurred by continuing to experiment, making stopping optimal.

Now consider the policy $T(\hat{p})$, characterised by the sequence of threshold beliefs $\hat{p} := \{\hat{p}_n\}_{n \geq 0}$ with $\hat{p}_n \geq p^*$ for every $n \geq 0$. Suppose that $T(\hat{p})$ is an equilibrium policy. We show that there are no off-path beliefs satisfying D1 that support this equilibrium. We begin by showing that, under D1, $\hat{p}$ must be an increasing sequence.

Lemma A.7. Suppose there exists a finite integer $n \geq 0$ such that $\hat{p}_n > \hat{p}_{n+1}$. Then the policy $T(\hat{p})$ cannot be played in an equilibrium that satisfies D1.

Proof. Assume, by way of contradiction, that it is not. Then there exists a finite integer $\bar{n} \geq 0$ such that $\hat{p}_{\bar{n}} = \sup_n \hat{p}_n$. Choose $q = \sup_{m > \bar{n}} \hat{p}_m$.
Our assumption implies that $q \leq \hat{p}_{\bar{n}}$. Consider the policy $T(\check{p})$, characterised by the sequence of threshold beliefs $\check{p}$ such that $\check{p}_{\bar{n}} = \hat{p}_{\bar{n}}$ and $\check{p}_n = q$ for $n > \bar{n}$, and with associated stopping dates $\{\check{t}_n\}_{n \geq 0}$. Adapting the argument from Section A.3, we have that $M^{D1}(t') \subset \{p_{t'}^{\bar{n}}, p_{t'}^{\bar{n}+1}\}$ for every $t' \in (\check{t}_{\bar{n}}, \check{t}_{\bar{n}+1})$. Fix $t'$ such that $p_{t'}^{\bar{n}} \in [r^*(q), \hat{p}_{\bar{n}})$. Deviating from $T(\check{p})$ and stopping at $t'$ is profitable for type $p_{t'}^{\bar{n}+1}$ if and only if

$$\mu(t') \geq \mu_{t'}^{\bar{n}+1} := p_{t'}^{\bar{n}+1} + \frac{1 - \alpha}{\alpha}\left[u_q(p_{t'}^{\bar{n}+1}) - s\right],$$

where, adapting (7), for a DM with current belief $p \in [q, 1)$, the payoff from adopting a continuation policy with constant threshold belief $q$ is

$$u_q(p) = \gamma(p) + \frac{1 - p}{1 - q}\,(s - \gamma(q))\left(\frac{\Omega(p)}{\Omega(q)}\right)^{\nu(\lambda_G, \lambda_B)}. \tag{A.20}$$

Since $p_{t'}^{\bar{n}+1} > q$, we have $u_q(p_{t'}^{\bar{n}+1}) > s$, and therefore $\mu_{t'}^{\bar{n}+1} > p_{t'}^{\bar{n}+1}$. If $\mu(t') = p_{t'}^{\bar{n}}$, then type $p_{t'}^{\bar{n}}$ has no reputational loss or gain from engaging in the deviation $D_{\bar{n}}^{\check{p}}(\hat{t}_{\bar{n}}, t')$. However this deviation entails a net social gain. Therefore $\mu_{t'}^{\bar{n}} < p_{t'}^{\bar{n}}$. Hence, for every $t'$ such that $p_{t'}^{\bar{n}} \in [r^*(q), \hat{p}_{\bar{n}})$, the off-path reputation satisfies D1 if and only if $\mu(t') = p_{t'}^{\bar{n}}$. But then, type $\hat{p}_{\bar{n}}$ benefits from engaging the deviation $D_{\bar{n}}^{\check{p}}(\hat{t}_{\bar{n}}, t')$.

Let $r_{\bar{n}}(\hat{p})$ denote the threshold belief that maximises the social payoff for the type of the DM who has observed $\bar{n}$ pieces of news, if higher types adhere to $T(\hat{p})$. Because there is more (inefficient) preemption under the policy $T(\check{p})$ than under the policy $T(\hat{p})$, we have $r_{\bar{n}}(\hat{p}) < r^*(q)$. Thus, the argument just made for $\check{p}$ remains valid under $\hat{p}$, establishing the contradiction.

Finally, if $\hat{p}$ is an increasing sequence on $[p^*, 1)$, it must converge to an upper bound, $q \in (p^*, 1]$. Then for every $\delta \in (0, q - r^*(q))$ there exists an integer $m(\delta) > 0$ such that for every $n > m(\delta)$, $\hat{p}_n > q - \delta > r^*(q)$. Fix $\delta \in (0, q - r^*(q))$ and $n > m(\delta)$ and consider $t' > \hat{t}_n$ such that $p_{t'}^n = q - \delta$, and the deviation $D_n^{\hat{p}}(\hat{t}_n, t')$. Under D1, we must have $\mu(t') = p_{t'}^n$. (The proof is similar to that of Lemma A.6 and we omit it.) Thus, the policy $T(\hat{p})$ is not supported by beliefs satisfying D1.

A.4.3

Proof of Lemma A.6

Proof. Consider a DM for whom $\alpha = 0$ and fix $k \geq 0$. Suppose all types of the DM who have observed at least $k + 1$ pieces of news use a policy with constant threshold belief $q$, which we allow to differ from $p^*$. Expecting this, what is the payoff-maximising policy for the type of the DM who has observed $k$ pieces of news? We let $r^*(q)$ denote the threshold belief employed under the optimal policy, and let $\varphi_q$ denote the corresponding value function for the DM, with the interpretation that $\varphi_q(p)$ is the value under that policy for a DM who has observed $k$ pieces of news and holds the posterior belief $p_t^k = p$. The value $\varphi_q$ solves the Bellman equation:

$$u(p) = \max\left\{ s \; ; \; \left[b_S(p, u) + b_F(p, u) + b_N(p, u) - d(p, u)\right] / \rho \right\}, \tag{A.21}$$

where $b_S(p, u) = p\,\eta_G\,[g - u(p)]$ is the expected benefit from a success, $b_F(p, u) = (1 - p)\,\eta_B\,[\ell - u(p)]$ is the expected loss from a failure, $b_N(p, u) = \lambda(p)\,[u_q(j(p)) - u(p)]$ is the expected benefit from a piece of news, with

$$u_q(x) := \begin{cases} \gamma(x) + \dfrac{1 - x}{1 - q}\,(s - \gamma(q))\left(\dfrac{\Omega(x)}{\Omega(q)}\right)^{\nu(\lambda_G, \lambda_B)}, & x \geq q, \\[2ex] s, & x < q, \end{cases}$$

giving the expected social payoff to a DM with posterior belief $x \in (0, 1)$ under the policy with constant threshold belief $q$; and finally $d(p, u) = (\Delta\eta + \Delta\lambda)\, p(1 - p)\, u'(p)$ measures the deterioration in the DM's outlook when she experiments without observing any event (success, failure, or news). Over values of $p \in (0, 1)$ at which experimentation is optimal, $\varphi_q$ solves the following ordinary differential difference equation:

$$(\eta(p) + \lambda(p) + \rho)\, \varphi_q(p) + p(1 - p)(\Delta\eta + \Delta\lambda)\, \varphi_q'(p) = p\,\eta_G\, g + (1 - p)\,\eta_B\, \ell + \lambda(p)\, u_q(j(p)). \tag{A.22}$$

The solution to this ODE is given by the family of functions $\varphi_q(p, r)$, parametrised by the threshold belief $r$ employed conditional on no news and no public event:

$$\varphi_q(p, r) = \Phi_q(p) + (s - \Phi_q(r))\, \frac{1 - p}{1 - r}\left(\frac{\Omega(p)}{\Omega(r)}\right)^{\frac{\eta_B + \lambda_B + \rho}{\Delta\eta + \Delta\lambda}},$$

where

$$\Phi_q(p) = p\, \frac{\eta_G\, g + \lambda_G\, \gamma_G}{\eta_G + \lambda_G + \rho} + (1 - p)\, \frac{\eta_B\, \ell + \lambda_B\, \gamma_B}{\eta_B + \lambda_B + \rho} + \left(s - (q\gamma_G + (1 - q)\gamma_B)\right) \frac{1 - p}{1 - q}\left(\frac{\Omega(p)}{\Omega(q)}\right)^{\nu}.$$

We let $r^*(q)$ denote the threshold belief that maximises the payoff $\varphi_q(p, r)$. Then, the value function $\varphi_q(p)$ is defined to be $\varphi_q(p, r)$ evaluated at $r = r^*(q)$. The threshold $r^*(q)$ is the unique solution to:

$$f_1(r^*(q)) = f_2(q), \tag{A.23}$$

where

$$f_1(r) := \left[r(s - \gamma_G)\beta_G + (1 - r)(s - \gamma_B)\beta_B\right] \frac{1}{1 - r}\left(\frac{r}{1 - r}\right)^{\nu}, \tag{A.24}$$

$$f_2(q) := \left[(s - \gamma_B) - q(\gamma_G - \gamma_B)\right]\left(\beta_B - \nu(\beta_G - \beta_B)\right) \frac{1}{1 - q}\left(\frac{q}{1 - q}\right)^{\nu}, \tag{A.25}$$

and $\beta_\theta := \eta_\theta + \lambda_\theta + \rho$, $\theta \in \{G, B\}$.

First, we show that, for every $r > p^*$, $f_1(r)$ is a strictly decreasing function of $r$. From

$$f_1'(r) = \left\{\left[(s - \gamma_G)\beta_G - (s - \gamma_B)\beta_B\right] \frac{1 + \nu}{1 - r} + (s - \gamma_B)\beta_B\, \frac{r + \nu}{r(1 - r)}\right\} \frac{1}{1 - r}\left(\frac{r}{1 - r}\right)^{\nu},$$

$f_1'(r) < 0$ if and only if the term in curly brackets is strictly negative, which is the case whenever

$$r > \frac{\nu(s - \gamma_B)}{\nu(s - \gamma_B) + (1 + \nu)(\gamma_G - s)\frac{\beta_G}{\beta_B}}.$$

Since $\beta_G > \beta_B$, the right-hand side above is strictly below $p^*$.

Second, we show that, for every $q > p^*$, $f_2(q)$ is a strictly decreasing function of $q$. From

$$f_2'(q) = \left\{(\gamma_B - \gamma_G)\, \frac{1 + \nu}{1 - q} + (s - \gamma_B)\, \frac{q + \nu}{q(1 - q)}\right\} \lambda_B\left(\frac{\lambda_B}{\lambda_G}\right)^{\nu} \frac{1}{1 - q}\left(\frac{q}{1 - q}\right)^{\nu},$$

$f_2'(q) < 0$ if and only if the term in curly brackets is strictly negative, which is the case whenever $q > p^*$.

From the last two observations, we have that $r^{*\prime}(q) > 0$ for every $q > p^*$. Moreover, observe that $r^*(p^*) = p^*$. Hence, $r^*(q) > p^*$ for every $q > p^*$. Finally, $f_1(q) < f_2(q)$ if and only if $q > p^*$. Hence, the solution, $r = q$, to $f_1(r) = f_1(q)$ is strictly greater than the solution, $r = r^*(q)$, to $f_1(r) = f_2(q)$. Hence, $r^*(q) < q$ for every $q > p^*$.
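The characterisation of $r^*(q)$ as the solution to $f_1(r) = f_2(q)$ lends itself to a numerical illustration. The sketch below is not the paper's computation: it reads (A.24) and (A.25) with the bracket in $f_2$ equal to $s - \gamma(q) = (s - \gamma_B) - q(\gamma_G - \gamma_B)$, treats $\nu$ and $\beta_\theta$ as free hypothetical numbers rather than deriving them from primitives, and uses as a stand-in for $p^*$ the belief `p0` at which $f_1$ and $f_2$ coincide. The function names and the bisection routine are introduced here for illustration.

```python
def weight(x, nu):
    """Common factor (1 / (1 - x)) * (x / (1 - x)) ** nu in f1 and f2."""
    return (1.0 / (1.0 - x)) * (x / (1.0 - x)) ** nu


def f1(r, s, gam_g, gam_b, beta_g, beta_b, nu):
    return (r * (s - gam_g) * beta_g + (1 - r) * (s - gam_b) * beta_b) * weight(r, nu)


def f2(q, s, gam_g, gam_b, beta_g, beta_b, nu):
    return ((s - gam_b) - q * (gam_g - gam_b)) * (beta_b - nu * (beta_g - beta_b)) * weight(q, nu)


def solve_threshold(q, lo, **kw):
    """Bisection for f1(r) = f2(q) on (lo, q), where f1 is decreasing."""
    target = f2(q, **kw)
    hi = q
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f1(mid, **kw) > target:
            lo = mid  # f1 still too high: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values with gam_b < s < gam_g and beta_g > beta_b.
kw = dict(s=1.0, gam_g=2.0, gam_b=0.0, beta_g=3.0, beta_b=2.0, nu=0.5)
nu, s, gam_g, gam_b = kw["nu"], kw["s"], kw["gam_g"], kw["gam_b"]
# Belief at which f1 and f2 coincide; used here as a stand-in for p*.
p0 = nu * (s - gam_b) / (nu * (s - gam_b) + (1 + nu) * (gam_g - s))
q = 0.6
r_star = solve_threshold(q, p0, **kw)
```

For these values the computed threshold lies strictly between `p0` and `q`, in line with the statement of Lemma A.6.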

A.5

Proof of Proposition 4

We begin with a formal statement of the proposition.

Proposition 4.

(a) Fix $\lambda_G > 0$. Then $\lim_{\lambda_B \to 0} \bar{\alpha}(\lambda_B, \lambda_G) = \bar{\alpha}(0, \lambda_G) = 1$.

(b) Fix $\lambda_B \geq 0$. Then $\lim_{\lambda_G \to \infty} \bar{\alpha}(\lambda_B, \lambda_G) = 1$.

(c) For any $\lambda_B > 0$, $\lim_{\Delta\lambda \to 0} \bar{\alpha}(\lambda_B, \lambda_G) = 0$.

Proof. Fix $\Delta\eta > 0$. Consider the DM whose belief at date $t_n^*$ equals the planner threshold $p^*$. We have argued that the DM adopts the planner solution if and only if the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$ is not profitable. We wish to describe how the DM's incentive to deviate varies with $(\lambda_B, \lambda_G)$.

By Lemma A.1, for a DM with reputational concern $\alpha$, the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$ is profitable if and only if

$$\alpha\left[p^* - j^{-1}(p^*)\right] + (1 - \alpha)\, e^{-\rho(t_{n+1}^* - t_n^*)}\left[s - u^0(j^{-1}(p^*))\right] > 0. \tag{A.26}$$

Observe that

$$e^{-(\Delta\eta + \Delta\lambda)(t_{n+1}^* - t_n^*)} = \frac{\lambda_B}{\lambda_G}.$$
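The relation between the gap $t_{n+1}^* - t_n^*$ and the ratio $\lambda_B / \lambda_G$ can be illustrated numerically. The following is a minimal sketch under stated assumptions: $\Delta\eta := \eta_G - \eta_B$ and $\Delta\lambda := \lambda_G - \lambda_B$, the jump map $j$ is taken to be the Bayes update after one piece of news, and the value of `p_star` and all parameters are hypothetical. The helper names are introduced here for illustration.

```python
import math


def jump(p, lam_g, lam_b):
    """Posterior after one piece of news (Bayes' rule; assumed form of j)."""
    return p * lam_g / (p * lam_g + (1 - p) * lam_b)


def jump_inverse(p, lam_g, lam_b):
    """Belief that a single piece of news would move up to p."""
    return p * lam_b / (p * lam_b + (1 - p) * lam_g)


def drift(p, dt, eta_g, eta_b, lam_g, lam_b):
    """Posterior after a duration dt with no news and no public event."""
    good = p * math.exp(-(eta_g + lam_g) * dt)
    bad = (1 - p) * math.exp(-(eta_b + lam_b) * dt)
    return good / (good + bad)


# Hypothetical parameter values and threshold.
eta_g, eta_b, lam_g, lam_b = 1.0, 0.5, 2.0, 0.8
p_star = 0.35

# Gap between consecutive stopping dates implied by the displayed relation.
gap = math.log(lam_g / lam_b) / ((eta_g - eta_b) + (lam_g - lam_b))
p_end = drift(p_star, gap, eta_g, eta_b, lam_g, lam_b)
```

Starting from `p_star` and drifting for `gap` without any event lands exactly on `jump_inverse(p_star, ...)`, so that one piece of news at that point would restore the belief to `p_star`. The gap does not depend on `p_star`, which is why the duration is independent of $n$.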

For the proof of (a) it is more convenient to consider inequality (A.26) prior to dividing by $\pi(0, p^*, t_{n+1}^* - t_n^*)$, the probability of no public event and no private news over the time interval $[t_n^*, t_{n+1}^*)$:

$$\alpha\, \pi(0, p^*, t_{n+1}^* - t_n^*)\left[p^* - j^{-1}(p^*)\right] + (1 - \alpha)\, \pi(0, p^*, t_{n+1}^* - t_n^*)\, e^{-\rho(t_{n+1}^* - t_n^*)}\left[s - u^0(j^{-1}(p^*))\right] > 0. \tag{A.27}$$

As $\lambda_B \to 0$, the length $t_{n+1}^* - t_n^*$ of the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$ increases without bound. Moreover, $\lim_{\lambda_B \to 0} p^* = p^*$ and $\lim_{\lambda_B \to 0} j^{-1}(p^*) = 0$. The expected net social loss from deviating is bounded below by $\gamma(p^*) - s$, the expected net social payoff from an infinite deviation. As $\lambda_B \to 0$, $\pi(0, p^*, t_{n+1}^* - t_n^*) \to 0$ and

$$\pi(0, p^*, t_{n+1}^* - t_n^*)\, e^{-\rho(t_{n+1}^* - t_n^*)}\left[s - u^0(j^{-1}(p^*))\right] \to \gamma(p^*) - s,$$

while $\pi(0, p^*, t_{n+1}^* - t_n^*)\left[p^* - j^{-1}(p^*)\right] \to 0$. Therefore, $\bar{\alpha}(\lambda_B, \lambda_G) \to 1$.

The planner solution when $\lambda_B = 0$ is described in Corollary 1. There is a unique date $t^*$ satisfying $p(t^*) = p^*$ at which the planner might stop on path. The analogue of the deviation $D_0^{p^*}(t_0^*, t_1^*)$ for a DM who has not observed any news by date $t^* = t_0^*$ is an infinite deviation under which the DM never repeals the project. This deviation does not improve the DM's reputation, since it gives her an expected reputation of $p^* \times 1 + (1 - p^*) \times 0 = p^*$. However, the net social payoff from this deviation is $\gamma(p^*) - s < 0$. Thus, such a deviation cannot be profitable for a DM unless she puts no weight at all on its social cost, so that $\bar{\alpha}(0, \lambda_G) = 1$.

We now prove (b). Consider equation (A.26). Since $\lim_{\lambda_G \to \infty} \nu(\lambda_G, \lambda_B) = 0$ we have that $\lim_{\lambda_G \to \infty} p^* = 0$. Since $p^* > j^{-1}(p^*)$, we also have that $\lim_{\lambda_G \to \infty} j^{-1}(p^*) = 0$. Hence, the net reputational benefit from the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$ tends to zero: $\lim_{\lambda_G \to \infty}\left[p^* - j^{-1}(p^*)\right] = 0$. However, as $\lambda_G \to \infty$, the time interval $t_{n+1}^* - t_n^*$ tends to zero and $e^{-\rho(t_{n+1}^* - t_n^*)}\left[s - u^0(j^{-1}(p^*))\right]$, the social cost of the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$, also tends to zero. We therefore rewrite equation (A.26) as

$$\frac{\alpha}{1 - \alpha} > -\frac{e^{-\rho(t_{n+1}^* - t_n^*)}\left[s - u^0(j^{-1}(p^*))\right]}{p^* - j^{-1}(p^*)}. \tag{A.28}$$

The right-hand side equals

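$$e^{-\rho(t_{n+1}^* - t_n^*)}\left[(s - \gamma_G)B + (s - \gamma_B)A\right], \tag{A.29}$$

where

$$B := \frac{p^*\, \frac{1 - j^{-1}(p^*)}{1 - p^*}\left(\frac{\lambda_G}{\lambda_B}\right)^{\nu(\lambda_B, \lambda_G)} - j^{-1}(p^*)}{p^* - j^{-1}(p^*)}, \qquad A := \frac{(1 - p^*)\, \frac{1 - j^{-1}(p^*)}{1 - p^*}\left(\frac{\lambda_G}{\lambda_B}\right)^{\nu(\lambda_B, \lambda_G)} - (1 - j^{-1}(p^*))}{p^* - j^{-1}(p^*)}.$$

Simplifying $B$, we have

$$B = \frac{1}{1 - p^*}\, \frac{\left(\frac{\lambda_G}{\lambda_B}\right)^{1 + \nu(\lambda_G, \lambda_B)} - 1}{\frac{\lambda_G}{\lambda_B} - 1}. \tag{A.30}$$

Since $\lim_{\lambda_G \to \infty}\left(\frac{\lambda_G}{\lambda_B}\right)^{\nu(\lambda_G, \lambda_B)} = 1$ and $\lim_{\lambda_G \to \infty} p^* = 0$ we have

$$\lim_{\lambda_G \to \infty} B = \lim_{\lambda_G \to \infty} \frac{\frac{\lambda_G}{\lambda_B} - 1}{\frac{\lambda_G}{\lambda_B} - 1} = 1.$$

Simplifying $A$, we have

$$A = \frac{1}{p^*}\, \frac{\lambda_G}{\lambda_B}\, \frac{1 - \left(\frac{\lambda_G}{\lambda_B}\right)^{\nu(\lambda_G, \lambda_B)}}{1 - \frac{\lambda_G}{\lambda_B}}. \tag{A.31}$$

As $\lambda_G \to \infty$ the interval $\left[\frac{\rho}{\eta + \lambda_G - \lambda_B}, \frac{\rho + \lambda_B}{\eta + \lambda_G - \lambda_B}\right]$ containing $\nu(\lambda_G, \lambda_B)$ converges to the point $1/\lambda_G$. Therefore,

$$\lim_{\lambda_G \to \infty} A = \lim_{\lambda_G \to \infty} -\frac{1 - \lambda_G^{1/\lambda_G}}{1/\lambda_G}.$$

Both the numerator and the denominator of the expression above tend to 0 as $\lambda_G \to \infty$. Applying l'Hôpital's rule, we have

$$\lim_{\lambda_G \to \infty} A = \lim_{\lambda_G \to \infty} \lambda_G^{1/\lambda_G}(\ln \lambda_G - 1) = +\infty.$$

Hence,

$$\lim_{\lambda_G \to \infty} e^{-\rho(t_{n+1}^* - t_n^*)}\left[(s - \gamma_G)B + (s - \gamma_B)A\right] = +\infty.$$

Therefore, for every $\alpha < 1$ there exists an intensity $\lambda_G > 0$ for the news process associated with a good project such that condition (A.28) is violated, and the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$ is not profitable, establishing (b).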
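The divergence obtained through l'Hôpital's rule can be illustrated numerically. This is a minimal sketch, not part of the proof: it evaluates the ratio $-(1 - \lambda_G^{1/\lambda_G}) / (1/\lambda_G)$, rewritten as $\lambda_G(\lambda_G^{1/\lambda_G} - 1)$, for increasing hypothetical values of $\lambda_G$, alongside the expression $\lambda_G^{1/\lambda_G}(\ln \lambda_G - 1)$ that appears after differentiation. The function names are introduced here for illustration.

```python
import math


def ratio(lam):
    """-(1 - lam**(1/lam)) / (1/lam), computed stably via expm1."""
    return lam * math.expm1(math.log(lam) / lam)


def lhopital_form(lam):
    """lam**(1/lam) * (ln(lam) - 1), the expression after l'Hopital's rule."""
    return lam ** (1.0 / lam) * (math.log(lam) - 1.0)


# Hypothetical grid of intensities.
grid = [10.0 ** e for e in (1, 3, 6, 9)]
values = [ratio(lam) for lam in grid]
derived = [lhopital_form(lam) for lam in grid]
```

Both sequences grow without bound at a logarithmic rate, and `ratio(lam)` is bounded below by `math.log(lam)` because $e^x - 1 \geq x$.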

We now prove (c). If $\Delta\lambda \to 0$ then $t_{n+1}^* - t_n^* \to 0$, so that the net social cost of the deviation $D_n^{p^*}(t_n^*, t_{n+1}^*)$ tends to zero. Moreover, as a piece of private news becomes almost completely uninformative, we have $p^* - j^{-1}(p^*) \to 0$. We therefore consider equation (A.28) in lieu of equation (A.26). As $\Delta\lambda \to 0$, $\nu(\lambda_G, \lambda_B) \to \frac{\eta_B + \rho}{\Delta\eta}$. From (A.30) and (A.31) we have that

$$\lim_{\Delta\lambda \to 0} B = \frac{1}{1 - p^*}\,\frac{\eta_G + \rho}{\Delta\eta}, \qquad \lim_{\Delta\lambda \to 0} A = \frac{1}{p^*}\,\frac{\eta_B + \rho}{\Delta\eta}.$$

Using the expression for $p^*$ from Corollary 1, we have that

$$\lim_{\Delta\lambda \to 0} e^{-\rho(t_{n+1}^* - t_n^*)}\,\big[(s - \gamma_G)B + (s - \gamma_B)A\big] = 0,$$

establishing the result.
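The two limits used in the proof of (c) can be illustrated numerically. The sketch below rests on assumptions that should be read as ours: it takes the simplified forms $B = \frac{1}{1-p^*}\frac{R^{1+\nu}-1}{R-1}$ and $A = \frac{1}{p^*}\frac{R(R^{\nu}-1)}{R-1}$ with $R = \lambda_G/\lambda_B$, holds $\nu$ fixed at its limiting value $(\eta_B+\rho)/\Delta\eta$, and uses arbitrary illustrative parameter values.

```python
import math


def B(eps: float, nu: float, p_star: float) -> float:
    """Assumed simplified form of B, evaluated at R = 1 + eps."""
    log_r = math.log1p(eps)
    num = math.expm1((1.0 + nu) * log_r)  # R**(1 + nu) - 1
    return num / (eps * (1.0 - p_star))


def A(eps: float, nu: float, p_star: float) -> float:
    """Assumed simplified form of A, evaluated at R = 1 + eps."""
    log_r = math.log1p(eps)
    num = (1.0 + eps) * math.expm1(nu * log_r)  # R * (R**nu - 1)
    return num / (eps * p_star)


# Hypothetical parameter values, chosen only for illustration.
ETA_G, ETA_B, RHO, P_STAR = 2.0, 1.0, 0.5, 0.3
D_ETA = ETA_G - ETA_B
NU = (ETA_B + RHO) / D_ETA  # limiting value of nu as Delta lambda -> 0

if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-8):
        print(eps, B(eps, NU, P_STAR), A(eps, NU, P_STAR))
    # Stated limits: (eta_G + rho) / ((1 - p*) D_eta) and (eta_B + rho) / (p* D_eta)
    print((ETA_G + RHO) / ((1 - P_STAR) * D_ETA), (ETA_B + RHO) / (P_STAR * D_ETA))
```

As `eps` shrinks, the two columns approach the limits stated in the proof, which is all the sketch is intended to show.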

A.6 Proof of Proposition 5

We begin with a formal statement of the proposition.

Proposition 5. Fix $\alpha \in (\bar\alpha, 1)$. There exists a separating equilibrium with threshold beliefs $\hat p := \{\hat p_n\}_{n \ge 0}$ and associated stopping dates $\{\hat t_n\}_{n \ge 0}$ satisfying: (i) $\hat p_0 > p^*$; (ii) for each $n > 0$, $\hat p_n = q$, where $q \in (0, p^*)$. This equilibrium is supported by the off-path reputation $\mu(t) = p_t^0$ for each $t \in (0, \hat t_0)$, and $\mu(t) = p_t^n$ for each $t \in (\hat t_n, \hat t_{n+1})$ and for each $n > 0$.

Proof. Fix $\alpha > \bar\alpha$. In this appendix, we henceforth omit the dependence on $\alpha$, to lighten notation. We prove the proposition assuming that the primitives $(\ell, g, \lambda_G, \lambda_B, \eta_G, \eta_B)$ of the model are such that $(j^{-1})'(p^*) \le 1$.$^{44}$ Let $T(\hat p)$ denote the policy characterised by the sequence of threshold beliefs $\hat p$ defined in Proposition 5. The proposition follows from two lemmas, A.8 and A.9.

44 A version of this proposition also holds for the remaining parameter values.

Lemma A.8. There exists an interval $[q_{PRE}, q_{DE}] \subseteq [0, p^*]$ of beliefs such that, for every $q \in [q_{PRE}, q_{DE}]$, the DM with $n > 0$ pieces of news has no incentives to deviate from the policy $T(\hat p)$.

Proof. Fix $n > 0$ and consider the DM with belief $q$ at date $\hat t_n$. Under the policy $T(\hat p)$, the DM should stop. Consider the deviation $D_n^{\hat p}(\hat t_n, \hat t_{n+1})$. It must not be profitable. Equivalently, its net payoff must be non-positive:

$$\alpha\,\big[q - j^{-1}(q)\big] + (1 - \alpha)\,e^{-\rho(\hat t_{n+1} - \hat t_n)}\,\big[s - u_q(j^{-1}(q))\big] \le 0, \tag{A.32}$$


where $u_q : [0, 1] \to \mathbb{R}$, defined in (A.20), is the payoff from adopting a policy with constant threshold belief $q \in (0, 1)$ when the DM's belief is $p > q$. For the parameter values chosen, the net payoff from $D_n^{\hat p}(\hat t_n, \hat t_{n+1})$ is a strictly increasing function of $q$ on $(0, p^*)$. Since for $\alpha > \bar\alpha$ that net payoff is strictly positive when $q = p^*$, by continuity there exists a belief $q_{DE}$ setting the left-hand side of (A.32) equal to zero. Consequently, (A.32) holds for every $q \le q_{DE}$.

Now fix $t' \in (\hat t_{n-1}, \hat t_n)$ and consider the DM with belief $p_{t'}^n > q$ at date $t'$. On path, the DM should continue experimenting. That is, stopping at $t'$ should not be profitable. Equivalently, for every $p \in (q, j(q))$, the net payoff from this deviation must be non-positive:

$$f(q, p) := \alpha\,\big[j^{-1}(p) - p\big] + (1 - \alpha)\,\big[s - u_q(p)\big] \le 0, \qquad \forall p \in (q, j(q)). \tag{A.33}$$

We now show that (A.33) holds if and only if $q \ge q_{PRE}$, where $q_{PRE} < p^*$ is defined in (A.36). First, observe that $\frac{\partial}{\partial p} f(q, p) \ge 0$ if and only if

$$\alpha\,\big[(j^{-1})'(p) - 1\big] \ge (1 - \alpha)\,u_q'(p), \tag{A.34}$$

where each side of (A.34) is a strictly increasing function of $p$ on $(0, 1)$, since for every $(p, q) \in (0, 1)^2$, $j^{-1}$ and $u_q$ are strictly convex functions of $p$. Furthermore, $\alpha\,[(j^{-1})'(0) - 1] = \alpha\,\big[\frac{\lambda_B}{\lambda_G} - 1\big] < 0$ and we assume that $p^*$ is sufficiently low that $(j^{-1})'(p^*) \le 1$. Conversely, for every $q \in (0, p^*)$, $\lim_{p \to 0} u_q'(p) = -\infty$. Since, for every $p \in (0, 1)$, $u_q'(p)$ is a strictly increasing function of $q \in (0, p^*)$, and because $u_q'(p^*) = 0$ when $q = p^*$, we have that $u_q'(p^*) > 0$ for every $q \in (0, p^*)$. Finally, $(j^{-1})'$ is a strictly convex function of $p$ on $(0, 1)$, while we can show that $u_q'$ is strictly concave for $p < (2 + \nu)/3$, and is uniformly bounded from above by $(j^{-1})'((2 + \nu)/3)$ for $p \in ((2 + \nu)/3, 1)$ when $(2 + \nu)/3 < 1$. Therefore, by the intermediate value theorem, there exists a unique $p_{max}(q)$ satisfying $f_p(q, p) = 0 \Leftrightarrow p = p_{max}(q)$, and maximising $f(q, p)$ on $(0, p^*)$, given $q \in (0, p^*)$. Observe that $p_{max}(q)$ is a strictly increasing function of $q \in (0, p^*)$, with $\lim_{q \to 0} p_{max}(q) = 0$ and $\lim_{q \to p^*} p_{max}(q) = \underline{p}$, where $\underline{p} < p^*$ whenever $(j^{-1})'(p^*) < 1$. By the Envelope theorem,

$$\frac{\partial}{\partial q} f(q, p_{max}(q)) = \frac{\partial}{\partial q} f(q, p)\Big|_{p = p_{max}(q)} = \Big\{ -(1 - \alpha)\,\frac{\partial}{\partial q} u_q(p) \Big\}\Big|_{p = p_{max}(q)} < 0. \tag{A.35}$$

Moreover,

$$\lim_{q \to 0} f(q, p_{max}(q)) = (1 - \alpha)\,[s - \gamma_B] > 0,$$

$$\lim_{q \to p^*} f(q, p_{max}(q)) = \alpha\,\big[(j^{-1})'(p^*) - p^*\big] - (1 - \alpha)\,\big[s - u^0(\underline{p})\big] < 0,$$

where the last inequality follows from the fact that $\underline{p} \le p^*$. It follows that there exists a unique $q_{PRE} \in (0, p^*)$ such that

$$f(q_{PRE}, p_{max}(q_{PRE})) = 0, \tag{A.36}$$

and, from the definition of $p_{max}(q)$, that (A.33) is satisfied for every $p \in [q_{PRE}, j(q_{PRE})]$. Finally, by (A.35), (A.33) holds strictly on $[q, j(q)]$ for every $q > q_{PRE}$. We can show that $f(q_{DE}, p_{max}(q_{DE})) < 0$, and that therefore $q_{PRE} < q_{DE}$. The lemma follows.

We now establish the second lemma:

Lemma A.9. For every $q < q_{DE}$ there exists an interval $[r^*(q), \hat p_{DE}(q)] \subseteq [p^*, 1]$ such that, for every $\hat p_0 \in [r^*(q), \hat p_{DE}(q)]$, the DM with $n = 0$ pieces of news has no incentives to deviate from the policy $T(\hat p)$.


Proof. Fix $q \in [q_{PRE}, q_{DE}]$, and consider the DM with belief $\hat p_0$ at date $\hat t_0$. On path, the DM should stop. Consider the deviation $D_0^{\hat p}(\hat t_0, \hat t_1)$. It must not be profitable. Equivalently, its net payoff must be non-positive:

$$g(q, \hat p_0) := \alpha\,\pi(0, \hat p_0, \Delta)\,\big[q - j^{-1}(q)\big] + (1 - \alpha)\,\Big[u_q(\hat p_0) - s + \pi(0, \hat p_0, \Delta)\,e^{-\rho\Delta}\,\big(s - u_q(j^{-1}(q))\big)\Big] \le 0, \tag{A.37}$$

where $\Delta := \hat t_1 - \hat t_0$, and $e^{-\rho\Delta} = \left(\frac{\Omega(\hat p_0)}{\Omega(q)}\right)^{\frac{\rho}{\Delta\eta + \Delta\lambda}}$.

We now show that (A.37) provides an upper bound on $\hat p_0$, which we denote $\hat p_{DE}(q)$. Proceeding as in the proof of Lemma A.8, we can show that for every $q \in (0, p^*]$ there exists a unique $p_{min}(q) \in (0, 1)$ such that $g(q, p)$ is minimised when $p = p_{min}(q)$. Second, for every $p \in (0, 1)$, $g(q, p_{min}(q))$ strictly increases with $q$ in $(0, p^*]$, with $\lim_{q \to 0} g(q, p_{min}(q)) = -\infty$ and $g(p^*, p_{min}(p^*)) > 0$. Thus, there exists a unique $q_0 \in (0, p^*]$ such that $g(q_0, p_{min}(q_0)) = 0$. Moreover, $q_0 < q_{DE}$. Consequently, $g(q, p_{min}(q)) < 0$ for every $q < q_{DE}$. Observing that $g(q_0, p^*) = 0$, we conclude that for every $q < q_0$, $g(q, p)$ admits a (unique) root in $(p^*, 1)$, which we denote $\hat p_{DE}(q)$. It follows that, for every $q < q_{DE}$, (A.37) holds whenever $\hat p_0 \le \hat p_{DE}(q)$. Finally, observe that $\hat p_{DE}(q)$ is a strictly decreasing function of $q \in (0, q_{DE}]$.

Now fix $t' \in (0, \hat t_0)$ and consider the DM with belief $p_{t'}^0 > q$ at date $t'$. On path, the DM should continue experimenting, and stopping at $t'$ should not be profitable. If the DM stops at $t'$, she cannot be punished with a reputation below her belief, since $\mu(t) = p_t^0$ for each $t \in (0, \hat t_0)$. As a result, there is no reputational loss or gain from such preemption. Thus, $\hat p_0$ must be such that the DM with no news cannot increase the social welfare by stopping before $\hat t_0$. The DM therefore solves the problem of choosing a threshold belief $r^*(q)$ so as to maximise the social payoff, given a continuation policy with the constant (inefficient) threshold belief $q$. This was analysed in Lemma A.6 for the case where $q > p^*$. For $q < p^*$, we similarly find that $r^*(q) > p^*$. Preemption is therefore not optimal for every $\hat p_0 > r^*(q)$. The lower bound $r^*(q)$ is also a strictly decreasing function of $q \in (0, p^*)$, with $r^*(p^*) = p^*$. Finally, observe that $\lim_{q \to 0} r^*(q) < \lim_{q \to 0} \hat p_{DE}(q)$. We conclude that for every $q < q_{DE}$, $r^*(q) < \hat p_{DE}(q)$. The lemma follows.

A.7 Proof of Proposition 6

We begin with a formal statement of the proposition.

Proposition 6. Fix $0 < t_1 < t_2$ and the intensity $\alpha \in (0, 1)$ of the DM's reputational concern. Then, $\tilde p_0 > \tilde p_\alpha > 0$, $\tilde p_\alpha$ decreases with $\alpha$, and $\lim_{\alpha \to 1} \tilde p_\alpha = 0$.

Proof. The threshold policy with threshold belief $\tilde p_\alpha$ is an equilibrium policy if the DM with reputational concern $\alpha$ and belief $p(t_1) = \tilde p_\alpha$ is indifferent between repealing the project at date $t_1$, and keeping it active. We assume for now that $\alpha$ is sufficiently small that $\tilde p_\alpha \ge p_{t_1}^0$. Her reputation, conditional on stopping at $t_1$, is determined by Bayes' rule and satisfies $\mu(t_1) = E[p(t_1) \mid p(t_1) \le \tilde p_\alpha]$. If she continues at date $t_1$, her reputation at $t_2$ is 1 if the project succeeds on the interval $[t_1, t_2)$, 0 if it fails on that interval, and $E[p(t_2) \mid S(t_2) = 0, p(t_1) > \tilde p_\alpha]$ otherwise. The threshold belief $\tilde p_\alpha$ is therefore pinned down by the indifference condition:

$$\alpha\,\Big[E[\mu(t_2) \mid p(t_1) = \tilde p_\alpha] - \mu(t_1)\Big] + (1 - \alpha)\,\Big[E[W_{t_1}^{t_2} \mid p(t_1) = \tilde p_\alpha] - s\Big] = 0, \tag{A.38}$$

where the first term is the DM's net reputational payoff from keeping the project active at $t_1$, and the second term measures the net social payoff.


We now show that for every $\tilde p_\alpha \ge p_{t_1}^0$, the first term is strictly positive. If she stops at date $t_1$, the DM's reputation is $\mu(t_1) = E[p(t_1) \mid p(t_1) \le \tilde p_\alpha] \le \tilde p_\alpha$. For every $p \in [0, 1]$, the DM's expectation at $t_1$ of her reputation at $t_2$ is

$$E[\mu(t_2) \mid p(t_1) = p] = p\,\big(1 - e^{-\eta_G(t_2 - t_1)}\big) + \Big[p\,e^{-\eta_G(t_2 - t_1)} + (1 - p)\,e^{-\eta_B(t_2 - t_1)}\Big]\,E[p(t_2) \mid S(t_2) = 0, p(t_1) > \tilde p_\alpha].$$

It is strictly increasing in $p$. Therefore, for every $p > \tilde p_\alpha$,

$$E[\mu(t_2) \mid p(t_1) = p] > E[\mu(t_2) \mid p(t_1) = \tilde p_\alpha]. \tag{A.39}$$

Moreover, the DM's reputation at $t_2$ is bounded from below:

$$E[p(t_2) \mid S(t_2) = 0, p(t_1) > \tilde p_\alpha] = E\big[p(t_2) \mid p(t_2) \in (p_{t_2}^{\tilde n_\alpha}, 1)\big] > p_{t_2}^{\tilde n_\alpha}, \tag{A.40}$$

where $\tilde n_\alpha$ satisfies $p_{t_1}^{\tilde n_\alpha} = \tilde p_\alpha$, so that

$$E[\mu(t_2) \mid p(t_1) = \tilde p_\alpha] > \tilde p_\alpha\,\big(1 - e^{-\eta_G(t_2 - t_1)}\big) + \Big[\tilde p_\alpha\,e^{-\eta_G(t_2 - t_1)} + (1 - \tilde p_\alpha)\,e^{-\eta_B(t_2 - t_1)}\Big]\,p_{t_2}^{\tilde n_\alpha} = \tilde p_\alpha,$$

where the last equality follows from Bayesian updating. It follows that the first term in (A.38) is strictly positive. For the DM to be indifferent, it must therefore be that $E[W_{t_1}^{t_2} \mid p(t_1) = \tilde p_\alpha] < s$. By (9) we therefore have $\tilde p_\alpha < \tilde p_0$.

When $\alpha$ is sufficiently close to 0, a solution $\tilde p_\alpha \ge p_{t_1}^0$ to (A.38) exists. However, as $\alpha$ increases, we have $\tilde p_\alpha \to 0$. In these cases $\tilde p_\alpha < p_{t_1}^0$, and no type $p(t_1)$ of the DM repeals the project at $t_1$ in equilibrium.

A.8 Proof of Proposition 7

Proof. When $\alpha = 0$, the common value function for the DM and the observer, $V^0(p(t)) := \sup_T V_t^{0,T}$, is convex and continuous. The function $V^0$ solves the Bellman equation (A.1) (where, in the bad news context, $b_N(p, u)$ reflects the expected loss from a piece of news) and is the unique solution satisfying the boundary conditions $u(0) = s$ and $u(1) = \gamma_G$. It is equal to $s$ on $[0, p^\flat]$ and solves the ordinary differential difference equation (A.2) in $p$ on $(p^\flat, 1)$ for some $p^\flat$ to be found.

We begin by considering equation (A.2) on $[b, 1)$ with some $b \in (0, 1)$ given and fixed. Observe that the function $p \mapsto j(p)$ is decreasing and find points $b := b_0 < b_1 < b_2 < \dots$ such that $j(b_n) = b_{n-1}$ for $n \ge 1$. Let $I_n := [b_{n-1}, b_n)$ for $n \ge 1$. Consider equation (A.2) on $I_1$ after setting $u(p) = s$ for $p \in [j(b), b)$:

$$u(p)\,\big[\eta(p) + \rho\big] + \lambda(p)\,[u(p) - s] + p(1 - p)(\Delta\eta + \Delta\lambda)\,u'(p) = p\,\eta_G\,g + (1 - p)\,\eta_B\,\ell. \tag{A.41}$$

This is a first-order differential equation which can be solved explicitly. The function $p \mapsto u(p, 1)$ with

$$u(p, 1) = \int_0^\infty \Big[p\,e^{-(\eta_G + \lambda_G + \rho)t}\,(\eta_G g + \lambda_G s) + (1 - p)\,e^{-(\eta_B + \lambda_B + \rho)t}\,(\eta_B \ell + \lambda_B s)\Big]\,dt = p\,\frac{\eta_G g + \lambda_G s}{\eta_G + \lambda_G + \rho} + (1 - p)\,\frac{\eta_B \ell + \lambda_B s}{\eta_B + \lambda_B + \rho} \tag{A.42}$$

constitutes a particular solution to (A.41). It is the payoff to the following policy, given current belief $p$: repeal the project as soon as a piece of news arrives, provided this occurs before the project succeeds or fails; otherwise keep the project active. The function $p \mapsto (1 - p)\,\Omega(p)^{\frac{\eta_B + \lambda_B + \rho}{\Delta\eta + \Delta\lambda}}$ constitutes a solution to the homogeneous version of (A.41). The solution to (A.41) on $I_1$ is therefore given by the family of functions

$$v_{c_1}(p) = u(p, 1) + c_1\,(1 - p)\,\Omega(p)^{\frac{\eta_B + \lambda_B + \rho}{\Delta\eta + \Delta\lambda}},$$

where $c_1 \in \mathbb{R}$ is a constant of integration. Imposing the continuity condition $v_{c_1}(b) = s$ at $b$ (in agreement with $V^0(p^\flat) = s$), we solve for $c_1$ and obtain a unique solution $p \mapsto v(p, b)$ on $I_1$ where

$$v(p, b) := u(p, 1) + \big(s - u(b, 1)\big)\,\frac{1 - p}{1 - b}\left(\frac{\Omega(p)}{\Omega(b)}\right)^{\frac{\eta_B + \lambda_B + \rho}{\Delta\eta + \Delta\lambda}}. \tag{A.43}$$
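The claim that (A.43) solves (A.41) with $v(b, b) = s$ can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $\Omega(p) = (1 - p)/p$, $\eta(p) = p\eta_G + (1-p)\eta_B$, $\lambda(p) = p\lambda_G + (1-p)\lambda_B$, $\Delta\eta + \Delta\lambda = (\eta_G + \lambda_G) - (\eta_B + \lambda_B)$, and parameter values that are purely hypothetical. The derivative $u'(p)$ is approximated by a central difference.

```python
# Hypothetical bad-news parameters (lambda_B > lambda_G), for illustration only.
ETA_G, ETA_B = 1.0, 1.5
LAM_G, LAM_B = 1.0, 2.0
RHO, G, ELL, S = 0.5, 3.0, -1.0, 0.5

K_G = ETA_G + LAM_G + RHO
K_B = ETA_B + LAM_B + RHO
DELTA = K_G - K_B  # assumed to equal Delta eta + Delta lambda


def u1(p: float) -> float:
    """Closed form of u(p, 1) in (A.42)."""
    return p * (ETA_G * G + LAM_G * S) / K_G + (1 - p) * (ETA_B * ELL + LAM_B * S) / K_B


def omega(p: float) -> float:
    """Assumed odds ratio Omega(p) = (1 - p) / p."""
    return (1 - p) / p


def v(p: float, b: float) -> float:
    """Candidate solution (A.43) on the first interval."""
    return u1(p) + (S - u1(b)) * (1 - p) / (1 - b) * (omega(p) / omega(b)) ** (K_B / DELTA)


def residual(p: float, b: float, h: float = 1e-6) -> float:
    """Left-hand side minus right-hand side of (A.41), evaluated at v(., b)."""
    dv = (v(p + h, b) - v(p - h, b)) / (2 * h)  # central difference for u'(p)
    eta_p = p * ETA_G + (1 - p) * ETA_B
    lam_p = p * LAM_G + (1 - p) * LAM_B
    lhs = v(p, b) * (eta_p + RHO) + lam_p * (v(p, b) - S) + p * (1 - p) * DELTA * dv
    rhs = p * ETA_G * G + (1 - p) * ETA_B * ELL
    return lhs - rhs


if __name__ == "__main__":
    b = 0.4
    print("v(b, b) - s =", v(b, b) - S)
    for p in (0.5, 0.6, 0.8):
        print(p, residual(p, b))
```

The residuals are zero up to finite-difference error, which is consistent with $u(p,1)$ being a particular solution and $(1-p)\Omega(p)^{(\eta_B+\lambda_B+\rho)/(\Delta\eta+\Delta\lambda)}$ a homogeneous one.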

Now consider equation (A.2) on $I_2$. Setting $u(p) = v(p, b)$ for $p \in I_1$, we once more have a first-order differential equation which can be solved explicitly. Imposing a continuity condition over $I_1 \cup I_2$ at $b_1$, we again obtain a unique solution $p \mapsto v(p, b)$ on $I_2$. Continuing this process by induction, we obtain the following expression, for $p \in I_n$:

$$v(p, b) = u(p, n) + \sum_{k=0}^{n-1} V_k\,\pi\big(n - 1 - k,\, p,\, x(p, b_{n-1})\big)\,e^{-\rho\,x(p, b_{n-1})}, \tag{A.44}$$

where the function $\pi$ is defined in (A.6) and where for every $p > b_{n-1}$, $x(p, b_{n-1}) > 0$ satisfies

$$e^{-x(p, b_{n-1})} = \left(\frac{\Omega(p)}{\Omega(b_{n-1})}\right)^{\frac{1}{\Delta\eta + \Delta\lambda}} = \left(\frac{\Omega(p)}{\Omega(b_k)}\left(\frac{\lambda_B}{\lambda_G}\right)^{n-1-k}\right)^{\frac{1}{\Delta\eta + \Delta\lambda}};$$

and where $V_0 := u(b, 0) - u(b, 1)$ and $V_1, \dots, V_{n-1}$ are constants satisfying the recurrence relation:

$$V_k = u(b_k, k) - u(b_k, k + 1) + \sum_{i=0}^{k-1} V_i\,\pi\big(k - 1 - i,\, b_k,\, x(b_k, b_{k-1})\big)\,e^{-\rho\,x(b_k, b_{k-1})}, \tag{A.45}$$

with

$$u(p, n) := p\,A_G^n + (1 - p)\,A_B^n, \tag{A.46}$$

and

$$A_\theta^n := (a_\theta)^n\,s + \sum_{i=0}^{n-1} (a_\theta)^i\,\zeta_\theta,$$

with

$$a_\theta = \frac{\lambda_\theta}{\eta_\theta + \lambda_\theta + \rho}, \qquad \zeta_G = \frac{\eta_G\,g}{\eta_G + \lambda_G + \rho}, \qquad \zeta_B = \frac{\eta_B\,\ell}{\eta_B + \lambda_B + \rho}.$$
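The payoff $u(p, n)$ in (A.46) is easy to tabulate. The sketch below is an illustration only: the parameter values are hypothetical, and $\gamma_\theta = \eta_\theta x_\theta / (\eta_\theta + \rho)$ (with $x_G = g$, $x_B = \ell$) is our assumption for the payoff from never stopping. It checks that $n = 1$ reproduces the closed form in (A.42) and that $u(p, n)$ approaches $\gamma(p)$ as $n$ grows.

```python
# Hypothetical parameters, for illustration only.
ETA = {"G": 1.0, "B": 1.5}
LAM = {"G": 1.0, "B": 2.0}
PAYOFF = {"G": 3.0, "B": -1.0}  # g for a good project, ell for a bad one
RHO, S = 0.5, 0.5


def a(theta: str) -> float:
    return LAM[theta] / (ETA[theta] + LAM[theta] + RHO)


def zeta(theta: str) -> float:
    return ETA[theta] * PAYOFF[theta] / (ETA[theta] + LAM[theta] + RHO)


def A_n(theta: str, n: int) -> float:
    """A_theta^n = a^n s + sum_{i<n} a^i zeta, as in the text below (A.46)."""
    return a(theta) ** n * S + sum(a(theta) ** i * zeta(theta) for i in range(n))


def u(p: float, n: int) -> float:
    """Payoff (A.46) from stopping at the n-th piece of news."""
    return p * A_n("G", n) + (1 - p) * A_n("B", n)


def gamma(theta: str) -> float:
    """Assumed payoff from never stopping, conditional on the project type."""
    return ETA[theta] * PAYOFF[theta] / (ETA[theta] + RHO)


def threshold(n: int) -> float:
    """Belief solving u(p, n) = s."""
    return (S - A_n("B", n)) / (A_n("G", n) - A_n("B", n))


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 50):
        print(n, round(u(0.3, n), 6), round(threshold(n), 6))
    print("limit:", 0.3 * gamma("G") + 0.7 * gamma("B"),
          (S - gamma("B")) / (gamma("G") - gamma("B")))
```

For these values the thresholds solving $u(p, n) = s$ increase with $n$ towards $(s - \gamma_B)/(\gamma_G - \gamma_B)$.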

We obtain the expression in (10) for $p^\flat$, by choosing $b \in (0, 1)$ so as to maximise $v(p, b)$ for every $p \in (0, 1)$. This effectively amounts to maximising the expression in (A.43). The solution is interior and smooth-pasting is satisfied at $p^\flat$. The expression in (12) is obtained by setting $V^0(p) := v(p, p^\flat)$ for $p > p^\flat$, where $v^0(p)$ is the function $v(p, p^\flat)$ restricted to the interval $I_1$. The usual verification argument shows that $V^0$ solves the Bellman equation (A.1) with the maximum being achieved under the planner policy.

A.9 Proof of Proposition 8

Proof. We prove the proposition by constructing a profitable deviation from the planner policy. Consider the type of the DM with $N(t_{n+1}^\flat) = n$ who receives her $(n+1)$th piece of news at $t' \in (t_{n+1}^\flat, t_n^\flat)$, and let us vary $t'$. Consider that DM's net payoff from following the deviation $D_{n+1}^{p^\flat}(t', t_n^\flat)$ from the planner policy. Since it delays stopping the project relative to the social optimum, its expected net social payoff is negative. Longer delays cause greater expected social losses. When $t' \to t_n^\flat$, the expected social loss tends to zero.

Conversely, the deviation generates a strictly positive expected reputational gain. On path, the DM stops at $t'$ and her reputation is $\mu(t') = p_{t'}^{n+1}$. Under $D_{n+1}^{p^\flat}(t', t_n^\flat)$, unless the project has succeeded or failed over the interval $[t', t_n^\flat)$, the DM repeals the project at $t_n^\flat$. Her reputation is then $\mu(t_n^\flat) = p^\flat$. The expected reputational gain from $D_{n+1}^{p^\flat}(t', t_n^\flat)$ is therefore

$$\sum_{k=0}^{\infty} \pi\big(k,\, p_{t'}^{n+1},\, t_n^\flat - t'\big)\,\big[p^\flat - p_{t'}^{n+1+k}\big],$$

which is bounded from below by

$$\Big[p_{t'}^{n+1}\,e^{-\eta_G(t_n^\flat - t')} + (1 - p_{t'}^{n+1})\,e^{-\eta_B(t_n^\flat - t')}\Big]\,\big[p^\flat - j(p^\flat)\big].$$

This bound strictly increases with $t'$ on $(t_{n+1}^\flat, t_n^\flat)$, and tends to $p^\flat - j(p^\flat) > 0$ as $t' \to t_n^\flat$.

The expected social loss and reputational gain are continuous in $t'$. Therefore there exists a date $t'' \in (t_{n+1}^\flat, t_n^\flat)$ such that for all $t' \in (t'', t_n^\flat)$, the deviation $D_{n+1}^{p^\flat}(t', t_n^\flat)$ is strictly profitable.

Observation 3. Fix $\Delta\lambda \in (-\Delta\eta, 0)$ and let $\lambda_G = \lambda$ and $\lambda_B = \lambda + \Delta\lambda$, for some $\lambda > 0$. We are interested in the payoff to the deviation $D_{n+1}^{p^\flat}(t', t_n^\flat)$ from the planner policy when $\lambda$ grows large. Recall that for every $n \ge 0$, $t_n^\flat$ is defined by $p_{t_n^\flat}^n = p^\flat$. By (1) and (10), $t_0^\flat$ is not affected by changes in $\lambda$. In contrast, for every $n > 0$, $t_n^\flat$ increases with $\lambda$. Specifically, the difference

$$\Delta t^\flat := t_{n+1}^\flat - t_n^\flat = -\frac{1}{\Delta\eta + \Delta\lambda}\,\ln\frac{\lambda_G}{\lambda_B} = \frac{1}{\Delta\eta + \Delta\lambda}\,\ln\Big(1 + \frac{\Delta\lambda}{\lambda}\Big)$$

is independent of $n \ge 0$, decreases with $\lambda$ and shrinks to zero as $\lambda$ tends to infinity. Similarly, the magnitude of the jump

$$p^\flat - j(p^\flat) = p^\flat - \frac{p^\flat}{p^\flat + (1 - p^\flat)\big(1 + \frac{\Delta\lambda}{\lambda}\big)}$$

decreases with $\lambda$ and shrinks to zero as $\lambda$ tends to infinity.

As a consequence, both the social cost and the reputational benefit of the deviation $D_{n+1}^{p^\flat}(t', t_n^\flat)$ shrink to zero for every $t' \in (t_{n+1}^\flat, t_n^\flat)$. Thus, for every $\lambda > 0$ there exists a real $\varepsilon > 0$ such that the social planner policy constitutes an $\varepsilon$-equilibrium. Finally, $\varepsilon$ decreases with $\lambda$.
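The two vanishing quantities in Observation 3 can be tabulated directly from the displayed formulas. The sketch below is an illustration with hypothetical values for $\Delta\eta$, $\Delta\lambda$ and $p^\flat$ (in the model $p^\flat$ itself depends on the parameters; it is held fixed here for simplicity), and it reports magnitudes, since only the size of the gap and of the jump matters for the argument.

```python
import math


def delta_t(lam: float, d_eta: float, d_lam: float) -> float:
    """Gap between consecutive stopping dates, ln(1 + d_lam/lam) / (d_eta + d_lam)."""
    return math.log1p(d_lam / lam) / (d_eta + d_lam)


def delta_t_alt(lam: float, d_eta: float, d_lam: float) -> float:
    """Same gap written as -ln(lam_G / lam_B) / (d_eta + d_lam)."""
    lam_g, lam_b = lam, lam + d_lam
    return -math.log(lam_g / lam_b) / (d_eta + d_lam)


def jump(p: float, lam: float, d_lam: float) -> float:
    """p - j(p), with j(p) = p / (p + (1 - p)(1 + d_lam/lam))."""
    return p - p / (p + (1 - p) * (1 + d_lam / lam))


# Hypothetical values with d_lam in (-d_eta, 0).
D_ETA, D_LAM, P_FLAT = 1.0, -0.5, 0.4

if __name__ == "__main__":
    for lam in (1.0, 10.0, 100.0, 1e4):
        print(lam, abs(delta_t(lam, D_ETA, D_LAM)), abs(jump(P_FLAT, lam, D_LAM)))
```

Both columns shrink roughly in proportion to $1/\lambda$, in line with the statement that the planner policy is an $\varepsilon$-equilibrium with $\varepsilon$ decreasing in $\lambda$.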

A.10 Proof of Proposition 9

We begin with a formal statement of the proposition.

Proposition 9. Fix $\alpha > 0$. The following conditions are necessary for a policy $T(\hat p)$ to be an equilibrium policy: (i) $\exists t_0 \ge 0$ such that $\hat p(t_0) \ge p_{t_0}^0$; (ii) $\hat p(t)$ has discontinuities on $(0, t_0]$. Some of these are upward jumps; (iii) if at $\hat t \in (0, t_0]$, $\lim_{t \nearrow \hat t} \hat p(t) < \hat p(\hat t)$, then (a) $\exists \varepsilon > 0$ such that $\hat p(t) = 0$, $\forall t \in (\hat t - \varepsilon, \hat t)$; (b) $\exists \delta > 0$ such that $\forall t \in (\hat t, \hat t + \delta)$, $\hat p(t) \le p_t^{\hat n(\hat t)}$, where $\hat n(\hat t) := \min\{n \ge 0 \mid p_{\hat t}^n \le \mu(\hat t)\}$ and $\mu$ is determined by Bayes' rule under $T(\hat p)$.

Proof. (i) This point states that, in equilibrium, no type of the DM continues experimenting forever. Suppose, by way of contradiction, that there exists an equilibrium at which, for every $t \ge 0$, $\hat p(t) < p_t^0$. At such an equilibrium, for every $t \ge 0$, a DM who has seen no news and no success before $t$ must continue experimenting. In other words, for every $t > 0$, stopping at date $t$ must be worse than continuing to experiment for a DM with belief $p_t^0$:

$$\alpha\,\mu(t) + (1 - \alpha)\,s \le \alpha\,E\big[\mu(T \wedge \tau_G \wedge \tau_B) \mid p_t^0\big] + (1 - \alpha)\,E\big[W_t^T \mid p_t^0\big].$$

Observe that, under Bayes' rule, $\mu(t) \le p_t^0$, so that, as $t$ grows without bound and $p_t^0 \to 0$, the left-hand side above tends to $(1 - \alpha)s$, while the right-hand side above tends to $(1 - \alpha)\,E[e^{-\rho\tau_B}\ell \mid \theta = B] < (1 - \alpha)s$, a contradiction.

(ii) Point (ii) states that, in equilibrium, $\hat p(t)$ cannot be continuous at every $t \ge 0$. Suppose, by way of contradiction, that in equilibrium, $\hat p(t)$ is continuous at every $t \ge 0$. By Proposition 9 (i), there exists a date $t_0$ such that $\hat p(t_0) \ge p_{t_0}^0$. Moreover, suppose that the prior $p_0$ is high enough that, in equilibrium, $\hat p(0) < p_0$. Then, there exists an integer $m \ge 0$, a date $\hat t_m \ge 0$ and $\varepsilon_m > 0$ such that $\hat p(\hat t_m) = p_{\hat t_m}^m$ and $\hat p(t) = p_t^{m+1}$ for each $t \in (\hat t_m - \varepsilon_m, \hat t_m)$. (Indeed, observe that the policies $T(\hat p)$ with $\hat p(t) = p_t^{m+1}$ and $T(\tilde p)$ with $\tilde p(t) \in (p_t^{m+1}, p_t^m)$ are equivalent at $t$.) In this case, the DM's reputation from stopping at $t \in (\hat t_m - \varepsilon_m, \hat t_m]$ is

$$\mu(t) = \begin{cases} p_t^{m+1} & \forall t \in (\hat t_m - \varepsilon_m, \hat t_m); \\ p_t^m & \text{if } t = \hat t_m. \end{cases}$$

Consequently, for a DM whose belief enters the stopping region at date $t' \in (\hat t_m - \varepsilon_m, \hat t_m)$, the deviation $D_{m+1}^{\hat p}(t', \hat t_m)$ generates a strictly positive expected net reputational benefit that is increasing in $t'$. The expected net social cost from such a deviation is strictly decreasing in $t'$, and tends to zero as $t' \to \hat t_m$. Thus, there exists a date $t'' < \hat t_m$ such that, for every $t'' < t' < \hat t_m$, the deviation $D_{m+1}^{\hat p}(t', \hat t_m)$ is strictly profitable. A contradiction.

We now show that some of the discontinuities of $\hat p$ must be upward jumps. Suppose by way of contradiction that $\hat p(t) > 0$ for every $t \in (0, t_0)$, and that all discontinuities of $\hat p$ are downward jumps. It is then impossible to satisfy $\hat p(0) < p_0$ and $\hat p(t_0) \ge p_{t_0}^0$ without having an $m$ and a date $t_m$ at which $\hat p(t_m) = p_{t_m}^m$, and the previous argument excludes such a policy profile as an equilibrium, establishing the contradiction.

(iii) a) Suppose by way of contradiction that $\exists \varepsilon > 0$ such that $\hat p(t) > 0$ for each $t \in (\hat t - \varepsilon, \hat t)$. The same argument as for (ii) establishes a contradiction.

(iii) b) Suppose by way of contradiction that $\exists \varepsilon > 0$ such that $\hat p(t) > \mu(\hat t)$ for each $t \in (\hat t, \hat t + \varepsilon)$. Then there exists $\delta \in (0, \varepsilon)$ such that $\mu(t) > \mu(\hat t)$ for every $t \in (\hat t, \hat t + \delta)$, and for every $n$ such that $p_{\hat t}^n \le \hat p(\hat t)$, the deviation $D_n^{\hat p}(\hat t, t)$ is profitable for some $t \in (\hat t, \hat t + \delta)$. A contradiction.

Thus, our candidate equilibrium policies may have a support which is a finite collection of dates $\{\hat t_k\}_{k=0}^K$ at which $\hat p(t) > 0$, as illustrated in Example 1. Alternatively, the support may include a closed interval $[\hat t_k, \hat t_k + \varepsilon_k]$ where $\varepsilon_k \in (0, \hat t_{k+1} - \hat t_k)$, as long as (iii)b) is satisfied on that interval.


Figure 12: Sketch of an admissible equilibrium policy with threshold beliefs given by pˆ, and resulting reputation given by µ(t).

A.11 Proof of Proposition 10

Proof. Let $W^0(p(t)) := \sup_T V_t^{0,T}$. The function $W^0$ solves$^{45}$ the Bellman equation (A.1), and is the unique solution satisfying the boundary conditions $u(0) = s$ and $u(1) = \gamma_G$. It is equal to $s$ on $[0, p^\dagger]$ and solves the ordinary differential difference equation (A.2) in $p$ on $(p^\dagger, 1)$$^{46}$ for some $p^\dagger$ to be found. The function $W^0$ is bounded above by the full information benchmark $p\gamma_G + (1 - p)s$. We now derive a lower bound.

Take some $b \in (0, 1)$ given and fixed. Consider the policy that experiments until the arrival of the $n$th piece of news, $n \ge 1$, for initial beliefs $p_0 \in [b, 1)$ and immediately repeals the project for $p_0 \in (0, b)$. On $(0, b)$ its payoff is $s$, and on $[b, 1)$ it is given by $u(p, n)$ defined in (A.46). For each $n \ge 1$ the unique threshold $p^\dagger(n)$ solving $u(p^\dagger(n), n) = s$ satisfies $p^\dagger(n) < p^\dagger(n + 1)$. Moreover, as $n$ grows without bound, this policy approximates the policy of never stopping, given initial belief $p_0 \in [b, 1)$, so that $\lim_{n \to \infty} u(p, n) = \gamma(p)$ and $\lim_{n \to \infty} p^\dagger(n) = (s - \gamma_B)/(\gamma_G - \gamma_B)$. Choosing $b$ and for each $p \ge b$ choosing $n$ so as to maximise the payoff from this policy, we obtain the payoff

$$w^0(p) := \max\Big\{s,\; \max_{n \ge 1} u(p, n)\Big\},$$

and the optimal value of $b$:

$$p^\ddagger = p^\dagger(1) = \frac{s - A_B^1}{A_G^1 - A_B^1} \in \Big(0,\; \frac{s - \gamma_B}{\gamma_G - \gamma_B}\Big). \tag{A.47}$$

Obviously, this policy is suboptimal in the planner problem. Therefore, $w^0(p)$ constitutes a lower bound on the planner value $W^0(p)$.

45 The planner problem is similar to the cooperative problem with inconclusive breakdowns in Keller and Rady (2015). Their Proposition 4 establishes the optimality of a threshold policy with a uniquely defined threshold belief. We do not repeat the argument here, but concentrate on describing the planner value function. It is not possible to obtain an analytical expression for this function. Nevertheless, we are able to perform the equilibrium analysis by relying on bounds on $W^0$ and $p^\dagger$ derived here.

46 As the posterior belief always drifts up, we say that a continuous function solves the following ODDE if its right-hand derivative exists and (A.2) holds when this right-hand derivative is used to compute $(W^0)'(p)$.


A.12 Proof of Proposition 11

Proof. Fix $n$ such that $p_0^{n-1} < p^\dagger$, and a realised path of $p(t)$ such that under the planner policy, the DM repeals the project at $t' \in [t_{n-1}^\dagger, t_n^\dagger)$, where for each $k \ge 0$, $t_k^\dagger$ is defined by $p_{t_k^\dagger}^k = p^\dagger$.

Fix a short duration $dt > 0$ such that $p_{t'+dt}^n < p^\dagger$. Let $dt \to 0$ and consider the following local deviation. The DM with posterior belief $p_{t'}^n$ does not repeal the project at $t'$, but continues experimenting for the duration $dt$, then resumes the planner policy. This means that, conditional on no public success or failure on $[t', t' + dt)$, the DM repeals the project at $t' + dt$, regardless of how many pieces of news she observes on $[t', t' + dt)$. Using the shorthand $p \equiv p_{t'}^n$, the expected net reputational benefit from this deviation is given by

$$p\,\eta_G\,dt + (1 - \eta(p)\,dt)\,\mu(t' + dt) - p = -p(1 - p)\,\Delta\lambda\,dt, \tag{A.48}$$

where the equality is obtained by substituting $\mu(t' + dt) = p_{t'+dt}^n = p - (\Delta\eta + \Delta\lambda)\,p(1 - p)\,dt$, ignoring terms in $o(dt)$, and simplifying. Since in the bad news case, $\Delta\lambda < 0$, the expression in (A.48) is strictly positive and the DM expects a strict reputational benefit from this deviation. Conversely, the expected net social cost from this deviation is given by

$$p\,\eta_G\,dt\,g + (1 - p)\,\eta_B\,dt\,\ell + (1 - \eta(p)\,dt - \rho\,dt)\,s - s = \Big[p(\eta_G + \rho)[\gamma_G - s] + (1 - p)(\eta_B + \rho)[\gamma_B - s]\Big]\,dt, \tag{A.49}$$

which is strictly negative for every $p < p^\dagger$. (This follows from Proposition 10.) The local deviation is not profitable if and only if

$$\alpha\,\Big[-p(1 - p)\,\Delta\lambda\Big] + (1 - \alpha)\,\Big[p(\eta_G + \rho)[\gamma_G - s] + (1 - p)(\eta_B + \rho)[\gamma_B - s]\Big] \le 0. \tag{A.50}$$

Now set $t' = t_n^\dagger - dt$, so that letting $dt \to 0$ amounts to letting $t' \to t_n^\dagger$, and consider the local deviation such that, at the issue of the deviation, conditional on no public success or failure on the interval $[t', t' + dt)$, the DM's posterior belief is $p_{t'+dt}^n = p^\dagger$. Resuming the planner policy then means continuing to experiment at $t' + dt$. Since the planner policy separates all types of the DM, her expected reputation under that policy is

$$E\big[\mu(\tau_G \wedge \tau_B \wedge \tau) \mid p(t' + dt) = p^\dagger\big] = E\big[p(\tau_G \wedge \tau_B \wedge \tau) \mid p(t' + dt) = p^\dagger\big] = p^\dagger,$$

and (A.48) gives the expected net reputational benefit from the local deviation in question. Furthermore, for every $p < p^\dagger$, there is no expected learning benefit from experimenting on $[t', t' + dt)$, so that $W^0(p) = s$ and $(W^0)'(p) = 0$ for every $p < p^\dagger$. Thus (A.49) gives the expected net social cost.

Consequently, the left-hand side of (A.50) gives the expected net payoff from a local deviation for any $t' \in [t_{n-1}^\dagger, t_n^\dagger)$, or equivalently, any $p_{t'}^n \in [j(p^\dagger), p^\dagger)$. It is strictly increasing in $\alpha$. Thus, the deviation is unprofitable for every $\alpha \le \bar\alpha_1(p)$, where $\bar\alpha_1(p)$ is defined to be the unique value satisfying (A.50) with equality. The function $\bar\alpha_1$ is continuous, and is strictly decreasing in $p$ on $(0, 1)$. This follows from

$$A'(p) = \frac{p^2(\eta_G + \rho)[\gamma_G - s] + (1 - p)^2(\eta_B + \rho)[s - \gamma_B]}{\Delta\lambda\,[p(1 - p)]^2} < 0, \tag{A.51}$$

where $A(p) := \bar\alpha_1(p)/(1 - \bar\alpha_1(p))$. Consider the threshold belief $p^\ddagger$ defined in (A.47). It satisfies

$$p^\ddagger\,\frac{\eta_G + \rho}{\eta_G + \lambda_G + \rho}\,[\gamma_G - s] + (1 - p^\ddagger)\,\frac{\eta_B + \rho}{\eta_B + \lambda_B + \rho}\,[\gamma_B - s] = 0.$$

Multiplying both sides of the equation above by $(\eta_G + \lambda_G + \rho)$, and observing that since $\Delta\eta + \Delta\lambda < 0$, $(\eta_G + \lambda_G + \rho)/(\eta_B + \lambda_B + \rho) < 1$, we obtain that $p^\ddagger(\eta_G + \rho)[\gamma_G - s] + (1 - p^\ddagger)(\eta_B + \rho)[\gamma_B - s] < 0$, implying that $\bar\alpha_1(p^\ddagger) > 0$. Moreover, it is easy to see that $\bar\alpha_1(0) = 1$. It follows that, for every $p \in [j(p^\dagger), p^\dagger)$,

$$1 > \bar\alpha_1(j(p^\dagger)) > \bar\alpha_1(p) > \bar\alpha_1(p^\dagger) > 0.$$


We conclude that, when $\alpha \le \bar\alpha := \bar\alpha_1(p^\dagger)$, continuing to experiment when the planner policy prescribes repealing the project is not profitable, for every realisation $t'$ of the planner's stopping time. Repealing the project when the planner policy prescribes experimenting is never profitable, as it generates a social loss, and a reputational benefit no greater than zero. The proposition follows.
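The cutoff $\bar\alpha_1(p)$ implied by (A.50) can be computed directly. The sketch below is an illustration under hypothetical bad-news parameter values ($\Delta\lambda < 0$, $\gamma_G > s > \gamma_B$): it solves (A.50) with equality for $A(p) = \bar\alpha_1(p)/(1 - \bar\alpha_1(p))$ and compares a finite-difference derivative with the closed form (A.51). It is only meaningful for beliefs at which the social term in (A.49) is negative.

```python
# Hypothetical bad-news parameters, for illustration only.
ETA_G, ETA_B, RHO = 1.0, 1.5, 0.5
GAMMA_G, GAMMA_B, S = 2.0, -0.75, 0.5
D_LAM = -1.0  # Delta lambda < 0 in the bad news case


def social(p: float) -> float:
    """Bracketed social term in (A.49) and (A.50), per unit of dt."""
    return p * (ETA_G + RHO) * (GAMMA_G - S) + (1 - p) * (ETA_B + RHO) * (GAMMA_B - S)


def A(p: float) -> float:
    """alpha / (1 - alpha) solving (A.50) with equality."""
    return social(p) / (p * (1 - p) * D_LAM)


def A_prime(p: float) -> float:
    """Closed-form derivative (A.51)."""
    num = p ** 2 * (ETA_G + RHO) * (GAMMA_G - S) + (1 - p) ** 2 * (ETA_B + RHO) * (S - GAMMA_B)
    return num / (D_LAM * (p * (1 - p)) ** 2)


def alpha_bar_1(p: float) -> float:
    """Cutoff weight on reputation; valid where social(p) < 0."""
    x = A(p)
    return x / (1 + x)


if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3, 0.4):
        fd = (A(p + 1e-6) - A(p - 1e-6)) / 2e-6
        print(p, round(alpha_bar_1(p), 4), round(A_prime(p), 4), round(fd, 4))
```

For these values the cutoff falls as $p$ rises, so the binding constraint is at the highest belief in the relevant range, which is why the proof evaluates $\bar\alpha_1$ at $p^\dagger$.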
