Divide and Conquer Dynamic Moral Hazard

Katsuhiko Aiba∗

August 2015

Abstract

It is often said that if one has a big problem to solve or prove, it may be better to adopt the "divide and conquer" strategy: first divide the problem into easier subproblems and then solve them sequentially, which may make it easier to maintain one's motivation and effort. This paper characterizes when this is true. Specifically, we consider a dynamic principal-agent model in which the principal decides, before the interaction starts, how many subproblems the whole research problem is divided into, and then in each period pays the research grant in advance to the agent and has the agent solve the subproblems sequentially.

Keywords: Divide and conquer strategy, bandit problem, encouragement effect, direct effect, dynamic agency cost, continuous-time principal-agent model

JEL codes: C73, D82, D83

1. Introduction

Suppose that an entrepreneur has an R&D project which requires external financial support from a venture capital fund. Typically, an R&D process requires a new technology or innovative idea whose effectiveness is ascertained through a sequence of trials and failures, and how many trials are needed to achieve a success is inherently uncertain: success might come after the 1st trial or the 10,000th. The R&D for a new drug in the pharmaceutical industry or a new semiconductor in the electronics industry are good examples. At the same time, in order to eventually succeed, the entrepreneur must continue to exert R&D effort and spend the funded money properly on the project even while facing a sequence of failures. Moreover, the venture capital fund is not able to monitor these behaviors of the entrepreneur. Both the uncertainty of the time of success and the imperfection of monitoring create an agency problem between the entrepreneur and the venture capital fund.

∗ Institute for Microeconomics, University of Bonn, Adenauerallee 24-42, 53113 Bonn, Germany. E-mail: [email protected]

The venture capital fund cannot distinguish whether a sequence of failures is due to the entrepreneur's laziness or to the unfortunate outcomes of his best effort, and the entrepreneur might try to explain away his failures as unlucky draws from diligent work while diverting the funded money toward his private needs. Hence, in order to curb this temptation, the venture capital fund has to pay the entrepreneur an additional rent and thereby induce proper effort.

In the above argument we implicitly suppose that the entrepreneur attacks the whole project at once. As the entrepreneur struggles with the project and observes sequences of failures, he might despair of the possibility of success and give up completely. One might then recommend that the entrepreneur try the "divide and conquer" strategy: first divide the project into easier sub-projects and then attack them sequentially. By adopting this strategy, it may become easier to obtain a small success at each sub-project, which keeps up the entrepreneur's motivation and effort. This familiar intuition from daily life suggests that the divide and conquer strategy might allow the venture capital fund to earn more profit, because the entrepreneur can more easily be encouraged to exert effort by observing a sequence of small successes, and hence his effort can be induced by smaller payments. In this paper we characterize the conditions under which this is true. Although the idea of the divide and conquer strategy is familiar, to our knowledge there is no economic theory examining it.

The applicability of the divide and conquer strategy to mitigating the above agency problem covers a variety of economic situations that have in common that a researcher or an entrepreneur requires external funding. For example, an academic researcher has a research problem to solve or prove, and he needs a grant for working on the problem from a research sponsor such as a government, a university or a research foundation. By dividing the whole problem into subproblems, it might be less costly for the sponsor to induce the researcher to exert proper effort.

Specifically, we consider a stylized model of a dynamic principal-agent relationship in which the principal decides, before the interaction starts, how many subproblems the whole research problem is divided into, and then in each period pays the research grant in advance to the agent and has the agent solve the subproblems sequentially. Unbeknownst to the principal, the agent can either shirk by diverting the grant to his private needs, or work on the subproblem by investing the grant in a method (bandit) whose effectiveness for solving the subproblem is inherently uncertain. Once one subproblem is solved, the agent moves on to the next subproblem, until the final one is solved.


In each period, if the method is effective and the agent spends the grant on the project, then the subproblem is successfully solved with positive probability; otherwise, no success occurs. As the agent works on the subproblem but observes no success, the belief in the method's effectiveness gradually decreases by Bayesian learning. The interaction ends if the final subproblem is solved, or if the belief becomes pessimistic enough, in light of a string of failures, for the project to be abandoned. While the number of subproblems is fixed throughout the interaction, we assume that the principal has no commitment ability regarding future contract terms.

In the model, there is a basic tension between the principal and the agent. By pocketing the grant for his private ends, the agent can always guarantee himself the chance of future success, because the agent knows that nothing happens in the current period if he does not work. On the other hand, an immediate success terminates the funding for the subproblem that the agent is currently working on, so that he can no longer extract further streams of future rents, that is, the option values of diverting the grant in future experiments. Hence, in order to have the agent work in the current period, the principal has to compensate the agent for the loss of these rents, which we call the dynamic agency cost.

Before stating our result, we argue that the divide and conquer strategy increases the dynamic agency cost. Upon observing a failure to solve an easy subproblem, the principal becomes more pessimistic about the effectiveness of the method than after observing a failure to solve the whole, harder problem at once. Thus, with the divide and conquer strategy the principal would be more inclined to say, "the agent could not even solve that easy subproblem; maybe his method is wrong." Then, in the next period the principal will have to pay the agent more in order to induce effort, because the principal thinks that the agent is also disappointed by the method. This increases the dynamic agency cost, that is, the agent's option value of diverting the grant and obtaining a failure with certainty, compared to the case of solving the whole problem at once.

We first consider a two-period model in which the principal decides whether to have the agent solve the whole problem all at once, or to divide the problem into halves and have the agent solve them sequentially. Suppose that the research project only needs a small grant and that the principal is sufficiently confident about the effectiveness of the method. Then the divide and conquer strategy is optimal, since it increases both the principal's and the agent's continuation payoffs by enough to cover the increase in the dynamic agency cost. The divide and conquer strategy increases the principal's continuation value because updating a high prior belief about the method's effectiveness after a one-time failure does not lead to a significant difference in the posteriors between the two alternative strategies (i.e., the divide and conquer strategy and the solving-the-whole-problem strategy).


That is, the speed of learning is slow when the parties are optimistic, and hence the continuation values after a failure do not differ much. On the other hand, given the optimistic belief about the method, the principal thinks that the whole problem is likely to be solved in the current period, leaving no continuation payoff for the next period, while with the divide and conquer strategy the current subproblem is also likely to be solved but the next subproblem still remains, so that the continuation payoff after the current success is positive. This is why the divide and conquer strategy increases the principal's continuation payoff, which we call the direct effect. A similar intuition holds for the agent, and this increase in the agent's continuation payoff relaxes his incentive constraint, so that the principal can save on payments to the agent. We call this the encouragement effect. Both the direct effect and the encouragement effect are so strong that they dominate the increase in the dynamic agency cost.

Continue to consider the case in which the research project only needs a small grant, but now suppose that the principal is pessimistic about the method's effectiveness. Then it is better to solve the whole problem at once, since the divide and conquer strategy is risky in that the agent might get stuck in an early subproblem. In contrast with the previous case, with a pessimistic belief about the method's effectiveness, a one-time failure makes a large difference in the degree of pessimism between the divide and conquer strategy and the solving-the-whole-problem strategy. Hence the continuation payoff after a failure is much lower with the divide and conquer strategy, and the principal fears ending up in such a pessimistic situation. This makes the direct effect work against the principal's profit. Combined with the increase in the dynamic agency cost, the principal can earn more by having the agent solve the whole problem at once.

Finally, suppose that the project needs a large grant. In this case the principal must be sufficiently confident about the method's effectiveness in order for the project to be undertaken at all, and hence the same intuition as in the optimistic case above applies, so that the divide and conquer strategy is always optimal.

We then extend the model from two periods to an infinite horizon. We consider a continuous-time environment, as it is hard to obtain a clear characterization of an equilibrium in a discrete-time version of our model. Our attention is restricted to a subclass of weak perfect Bayesian equilibria in which players' strategies on the equilibrium path depend on two state variables: the public belief held by the principal and the agent about the method's effectiveness, and the number of subproblems that remain to be solved. Although the resulting value function of the principal is complex, we show by numerical examples when the divide and conquer strategy is optimal, and confirm that an intuition similar to the one applying to the two-period model continues to hold.


1.1 Related Literature

The term "divide and conquer" is used in politics and in computer science. In politics, the strategy has been adopted for ruling subjects under a sovereign or an empire: it involves creating or encouraging divisions among the subjects in order to prevent alliances that could challenge the sovereign. In computer science, it refers to algorithms that break a problem down into two or more sub-problems until these become simple enough to be solved directly. Our usage of the divide and conquer strategy is close to the latter.

There are other economic contexts in which the term "divide and conquer" is used. Segal (2003) considers a contracting problem with externalities among agents. Suppose that the principal wants to implement the outcome in which every agent trades with him, and that there is an externality in that if one agent trades with the principal, then the others will want to trade with him more. Then the principal's optimal strategy is a divide and conquer one: the principal first offers a price just low enough to induce the first agent to trade, and then he can induce the second agent to trade by offering a lower price because of the externality, and so on. Ishiguro (2004) studies how the principal can prevent collusive shirking among agents when he can only evaluate them based on the relative ranking of their performance. He shows that collusion can be deterred by offering one agent a higher-powered scheme than the other agent, even when they are symmetric. Jullien (2011) analyzes competition among platforms in a multi-sided market, and shows that a platform can make more profit by dividing and conquering users: setting a low price for some users and serving the other users at a high price, which is made possible by the externality of users' participation in the platform. The usage of the divide and conquer strategy in these studies is close to the one in politics.

In this paper we adopt a bandit problem for modeling the agent who is experimenting on each subproblem by using the risky method. The multi-armed bandit problem is originally a statistical decision model in which a player decides which arm of different slot machines to pull in a sequence of trials so as to maximize his total expected payoff. This problem creates an intriguing trade-off between experimentation on each arm to find the best one, and exploitation by playing the arm currently believed to be best. The model was first applied in economics by Rothschild (1974) to the problem of a single firm's pricing policy for learning market demand. Since then the model has been embedded in more strategically intricate interactions among economic agents. Bolton and Harris (1999), Keller, Rady and Cripps (2005), Keller and Rady (2010), and Klein and Rady (2011) investigate two-armed bandit problems in which players learn about a risky arm whose outcome and their choices are publicly observable; hence, free-riding arising from the information externality plays a key role in these studies.

Rosenberg, Solan and Vieille (2007), Murto and Välimäki (2010), and Hopenhayn and Squintani (2011) work on the case in which players' actions are observable but their outcomes are not, while Bonatti and Hörner (2011) study the opposite case with unobservable actions and observable outcomes. Akcigit and Qingmin (2011) consider an R&D competition game where players' actions are unobservable and only success is observable.

The papers most closely related to ours are Bergemann and Hege (2005) and Hörner and Samuelson (2012). Bergemann and Hege (2005) investigate a dynamic principal-agent model that is similar to ours, but there the agent has the bargaining power to offer a contract, while the principal makes the offer in our paper. They focus on whether the observability of the agent's effort is better in terms of social efficiency, and show that unobservability of effort is always good for efficiency. While Bergemann and Hege (2005) focus on a particular Markovian equilibrium which may not even exist, Hörner and Samuelson (2012) investigate a continuous-time infinite horizon model, introduce a weaker notion of recursive Markovian equilibrium which always exists, and characterize the full set of these equilibria. They also consider non-Markovian equilibria and find that there exist more efficient equilibria than any of the recursive Markovian ones. What is significantly different from our paper is that neither of these papers asks about the divide and conquer strategy, so that in their models the agent has to solve the whole problem at once. Finally, Mason and Välimäki (2011) consider a continuous-time moral hazard model in which the bandit is known to be good and the agent incurs a convex cost of effort instead of the principal paying a research grant. They compare the cases with and without the principal's ability to commit, and investigate what the payment schedule looks like in each case.

The paper is organized as follows. Section 2 studies a simple two-period model and completely characterizes the conditions under which the divide and conquer strategy is optimal. We see how the intuition of our result discussed above holds more specifically. In Section 3 we investigate an infinite horizon model where equilibrium payoffs and behaviors are characterized. We show by numerical examples when the divide and conquer strategy is optimal and confirm that an intuition similar to the one discussed for the two-period model continues to hold. Finally, Section 4 gives possible extensions and concluding remarks.

2. Two-Period Model

In this section, we introduce a simple two-period model with t ∈ {0, 1} and completely characterize the conditions under which the divide and conquer strategy is optimal.

En route, basic insights are provided. We consider a two-period interaction between a principal and an agent. The principal has a research project of fixed size V on which to have the agent work. For the project, the principal is also prepared to pay the agent a research grant in the amount of g ∈ (0, 1/2) in each period. In order to solve the research problem, the agent tries some risky method, which we call a risky bandit, that might be either good or bad for solving the problem. If the risky bandit is good, then the agent can solve the whole problem V with probability λ in each period. When the agent successfully solves the problem, the output V is distributed between the principal and the agent. The bad bandit always yields nothing. There is also another risky bandit available for solving the problem. With this bandit, even if it is good, the agent can only solve half the problem, V/2, but with probability 2λ in each period. The trade-off here is that each subproblem V/2 is more likely to be solved than the whole problem V because 2λ > λ. Again, if the agent succeeds, then the output V/2 is distributed, and the bandit yields nothing if it is bad. Let us call the bandit with (V, λ) the full bandit, and the bandit with (V/2, 2λ) the half bandit. We assume that neither the principal nor the agent knows whether these risky bandits are good or bad, and that they share a common prior p_0 < 1 that each kind of bandit is good. Note that both the full bandit and the half bandit initially carry the same level of uncertainty p_0.

Now there are two ways to attack the research project: either (i) the principal lets the agent use the full bandit over the two periods to solve the whole problem V at once, or (ii) the principal divides the whole problem into two subproblems (V/2, V/2) and has the agent use the half bandit to solve them sequentially. Before the initial period 0, the principal chooses which way to go. We assume that Vλ = 1 just as a normalization and consider a research problem V big enough to ensure that 2λ < 1. Both the principal and the agent discount future payoffs with δ < 1. The game ends either once the total amount of output reaches V, or at the end of period t = 1. That is, there are the following ways in which the game ends:

(i) Suppose that the principal decides to have the agent solve the whole problem at once with the full bandit. If the agent successfully solves the problem V in period 0, then the game ends. If the agent fails in period 0, then he continues to use the same full bandit in the next period to try to solve the whole problem. But now both the principal and the agent revise their common prior p_0 to the posterior

    p_1 = \frac{p_0(1-λ)}{1-λp_0} < p_0.

That is, upon the failure in period 0, they become more pessimistic about the effectiveness of the full bandit for solving the problem. The game ends at the end of period 1 whether the agent gets a breakthrough or not.

(ii) Suppose that the principal divides the whole problem into two subproblems and has the agent solve each subproblem sequentially with the half bandit. Then, if the agent achieves a breakthrough V/2 in period 0, the agent adopts a new half bandit with common prior p_0 in period 1 to solve the next subproblem. If the agent fails in period 0, then he continues to use the same half bandit with the more pessimistic posterior

    \tilde{p}_1 = \frac{p_0(1-2λ)}{1-2λp_0}.

The game ends at the end of period 1 whether the agent gets a breakthrough or not.

The stage game in each period t develops as follows. At the beginning of each period t, the principal can offer a contract specifying a share s_t ∈ [0, 1] of the output together with the research grant g. The share s_t means that if the agent successfully produces the output V/2 or V, then he receives the share s_t of the output and the principal receives the remaining share 1 − s_t. We can without loss of generality restrict attention to share contracts because of the binary (success or failure) nature of the outcome. After the offer is made, the agent chooses whether to accept it. If the agent accepts, he can either spend the grant on the risky bandit to work on the problem, or shirk by diverting the grant to his private needs, so that a larger grant makes shirking more tempting. The principal cannot observe whether the agent is working or shirking. If no offer is made by the principal, or the offer is rejected by the agent, then nothing happens in the current period and the principal can make a new offer in the next period unless the game has ended.

We remark here that the principal commits from the outset to the use of a full bandit or a half bandit over the two periods. That is, the principal cannot change the bandit's size during the game. Moreover, at the outset, the principal cannot commit to an output share schedule for the second period. In this paper, we put this commitment issue aside for simplicity. Finally, we assume that the prior satisfies

    (1)    p_0 ≥ 2g.

Otherwise, there would be no investment in the research project in equilibrium, as we will see in the following subsections. Note that the posteriors p_1 and \tilde{p}_1 may be greater or less than 2g, depending on the probability of success λ and the prior p_0.
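To make the learning dynamics concrete, here is a minimal numerical sketch of the Bayesian updating above; the parameter values are illustrative and not taken from the paper:

    def posterior_after_failure(p, success_prob):
        """Bayes update of the belief that the bandit is good, after one failure.

        A good bandit succeeds with probability `success_prob` per period and a
        bad bandit never succeeds, so a failure is evidence that the bandit is bad.
        """
        return p * (1 - success_prob) / (1 - p * success_prob)

    p0, lam = 0.9, 0.3
    p1_full = posterior_after_failure(p0, lam)      # full bandit (V, lambda)
    p1_half = posterior_after_failure(p0, 2 * lam)  # half bandit (V/2, 2*lambda)
    print(p1_full, p1_half)  # ~0.863 vs ~0.783: the half bandit learns faster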

2.1 Social Surplus

In this subsection, we suppose that the principal spends the grant and solves the problem by himself, and we consider the total social surplus over the two periods. The socially efficient policy is to invest the grant in the problem as long as the current expected return of the research project is positive:

    Vλp_t − g = \frac{V}{2}2λp_t − g = p_t − g > 0,

where the first term is the output V or V/2 multiplied by the probability of success λp_t or 2λp_t, and the second term is the cost g of the research project. Both bandits yield the same expected payoff in the current period. We can easily check that our assumption (1) implies p_1, \tilde{p}_1 ≥ g, so that the socially efficient policy implements the research project in both periods whichever bandit is adopted. Hence, the social surplus is

    S^F(p_0) := Vλp_0 − g + δ(1 − λp_0)(Vλp_1 − g)

when the full bandit is adopted to solve the whole problem at once. The second term is the continuation value when the principal fails in period 0, which happens with probability 1 − λp_0, and the principal continues to use the bandit under the more pessimistic belief p_1. When the problem is divided into two subproblems and the half bandit is used, the social surplus is

    S^H(p_0) := \frac{V}{2}2λp_0 − g + δ\left[(1 − 2λp_0)\left(\frac{V}{2}2λ\tilde{p}_1 − g\right) + 2λp_0\left(\frac{V}{2}2λp_0 − g\right)\right].

The second term is the continuation value, consisting of the payoff \frac{V}{2}2λ\tilde{p}_1 − g when the principal fails, with probability 1 − 2λp_0, and the payoff \frac{V}{2}2λp_0 − g when he succeeds, with probability 2λp_0, and tries to solve the new subproblem in period 1. The social surplus is S(p_0) := max{S^F(p_0), S^H(p_0)} when both bandits are available. Since it turns out that

    S^H(p_0) ≥ S^F(p_0)  if and only if  p_0 ≥ p^* := \frac{1+g}{2},

the availability of the divide and conquer strategy increases the social surplus when p_0 is large, compared to the case when the principal can only solve the whole problem at once. To understand this, note that comparing the surpluses S^F(p_0) and S^H(p_0) is equivalent to comparing two lotteries,

    full bandit: (1 − λp_0) ∘ (p_1 − g) + λp_0 ∘ 0,
    half bandit: (1 − 2λp_0) ∘ (\tilde{p}_1 − g) + 2λp_0 ∘ (p_0 − g),

where \tilde{p}_1 − g < p_1 − g < p_0 − g. First, note that

    \frac{p_1}{\tilde{p}_1} = \left(\frac{1 − 2λp_0}{1 − λp_0}\right)\left(\frac{1 − λ}{1 − 2λ}\right)

is decreasing in p_0, so that the relative difference between \tilde{p}_1 − g and p_1 − g is small when p_0 is large, and vice versa. This is because when the principal is confident about the bandit's effectiveness, the speed of learning is so slow that he does not become much more pessimistic upon observing a failure. Thus, with large p_0, the principal does not care about the difference between the pessimistic situations p_1 − g and \tilde{p}_1 − g that he might end up in when using the full bandit and the half bandit respectively. On the other hand, under the divide and conquer strategy he is likely to solve a subproblem in period 0 and obtain the high surplus p_0 − g in period 1 with high probability 2λp_0, so that dividing and conquering is more attractive. Conversely, with small p_0, the difference between \tilde{p}_1 − g and p_1 − g is so large that the principal fears getting stuck at the initial subproblem when adopting the divide and conquer strategy, and hence he prefers solving the whole problem at once.
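As a check on the threshold p^*, here is a short computation (not spelled out in the text); it uses Vλ = 1 and the identities (1 − λp_0)p_1 = p_0(1 − λ) and (1 − 2λp_0)\tilde{p}_1 = p_0(1 − 2λ), which follow from Bayes' rule:

    S^H(p_0) − S^F(p_0) = δ\Big[p_0(1−2λ) − (1−2λp_0)g + 2λp_0(p_0−g)\Big] − δ\Big[p_0(1−λ) − (1−λp_0)g\Big]
                        = δλp_0\big(2p_0 − 1 − g\big),

which is nonnegative exactly when p_0 ≥ (1+g)/2 = p^*.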

2.2 Solving the Whole Problem at Once

In this subsection we suppose that the principal adopts the full bandit to have the agent solve the whole problem at once. In the finite horizon model, we can solve for the equilibrium by backward induction. Suppose that the problem fails to be solved in period 0, and that the prior has been updated to the common posterior p_1. We denote by s^F_t the share in the contract when the principal adopts the full bandit. Given the principal's offer s^F_1, the agent accepts it and exerts effort on the problem only if the agent's incentive compatibility constraint holds:

    s^F_1 Vλp_1 ≥ g.

That is, the expected return from solving the problem is not smaller than the payoff from diverting the grant to the agent's private needs. Moreover, the principal invests the grant in having the agent solve the problem only if the participation constraint is satisfied:

    (1 − s^F_1)Vλp_1 − g ≥ 0.


Hence, these constraints imply that the research project is undertaken only if p_1 ≥ 2g, which is in stark contrast with the socially efficient policy under which the project is undertaken whenever p_1 ≥ g. Thus, in equilibrium the principal gives up the project too early, because the principal has to pay the information rent g in order to give the agent an incentive to work. When p_1 ≥ 2g, the principal maximizes his profit by offering the contract s^F_1 = g/p_1, so that the agent's incentive constraint holds with equality and the agent receives payoff g. When p_1 < 2g, the principal does not make an offer and nothing happens. Thus, we denote by

    U^F(p_1) := \begin{cases} g & \text{if } p_1 ≥ 2g, \\ 0 & \text{otherwise,} \end{cases}

and

    W^F(p_1) := max\{(1 − s^F_1)Vλp_1 − g,\, 0\} = max\{p_1 − 2g,\, 0\},

the agent's value function and the principal's value function in period 1, respectively, where the superscript F indicates that the agent is using the full bandit.

Now we proceed with our backward induction and consider the initial period 0. The agent is induced to work on the problem only if the agent's incentive constraint is satisfied:

    (2)    s^F_0 Vλp_0 + δ(1 − λp_0)U^F(p_1) ≥ g + δU^F(p_0, p_1),

where we write U^F(p_0, p_1) for the agent's continuation value when he shirks in period 0. The left-hand side is the discounted payoff from working on the problem in period 0. With probability 1 − λp_0, the agent fails and the game moves to period 1 with the agent's continuation value U^F(p_1). The right-hand side is the discounted value of shirking. The agent appropriates the grant g in period 0 and receives the continuation value U^F(p_0, p_1) with certainty, as shirking ensures that the problem cannot be solved in period 0. Note that the history after the agent shirks is off the equilibrium path, where the beliefs of the parties diverge. That is, on the one hand, the agent's posterior remains at p_0, unbeknownst to the principal, because the agent knows he did not try the problem. On the other hand, the principal's posterior is updated to p_1 even though the agent did not exert effort in period 0. Hence, the principal offers s^F_1 = g/p_1 in the final period 1 if p_1 ≥ 2g and does not offer anything otherwise. Then, the agent's continuation value is

    U^F(p_0, p_1) = \begin{cases} s^F_1 Vλp_0 = \dfrac{p_0}{p_1}g = \left(\dfrac{1 − λp_0}{1 − λ}\right)g & \text{if } p_1 ≥ 2g, \\ 0 & \text{otherwise.} \end{cases}

The expression (1 − λp_0)/(1 − λ) is increasing in λ and decreasing in p_0. Using the bandit with larger λ, the principal becomes more pessimistic upon failure about the effectiveness of the bandit and offers a larger share to the agent in period 1, which increases the agent's information rent. Note also that the agent has the option of diverting the grant in period 0 and guaranteeing himself the chance to solve the problem in the next period. This option is more attractive to the agent when the probability of success p_0 is small. We call the term δU^F(p_0, p_1) the dynamic agency cost.

The principal offers the contract s^F_0 in period 0 only if the following participation constraint holds:

    (1 − s^F_0)Vλp_0 − g + δ(1 − λp_0)W^F(p_1) ≥ δW^F(p_0).

The left-hand side is the total discounted payoff from investing in the research project over the two periods, while the right-hand side is the discounted payoff from delaying the offer in period 0 and investing in the project in period 1. If the principal offers the contract, he sets s^F_0 such that the agent's incentive constraint (2) holds with equality. After some calculation we have the following characterization of when the principal offers the contract in period 0.

Lemma 2.1. The principal offers the contract s^F_0 in period 0 if and only if

    p_0 ≥ p^F := \frac{2g + \frac{λ}{1−λ}δg}{1 − δλ(1 − g) + \frac{λ}{1−λ}δg} ≥ 2g.
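The calculation behind Lemma 2.1 can be sketched as follows, assuming the interior case p_1 ≥ 2g (so that U^F(p_1) = g, W^F(p_1) = p_1 − 2g and W^F(p_0) = p_0 − 2g; recall Vλ = 1). The binding incentive constraint (2) gives the expected payment

    s^F_0 p_0 = g + δ\frac{1 − λp_0}{1 − λ}g − δ(1 − λp_0)g,

and substituting this into the participation constraint, using (1 − λp_0)p_1 = p_0(1 − λ), yields

    p_0 − s^F_0 p_0 − g + δ\big[p_0(1 − λ) − 2g(1 − λp_0)\big] ≥ δ(p_0 − 2g),

which rearranges to p_0\big[1 − δλ(1 − g) + \frac{λ}{1−λ}δg\big] ≥ 2g + \frac{λ}{1−λ}δg, i.e., p_0 ≥ p^F.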

Thus, if p_0 ∈ [2g, p^F), then the principal delays the offer in period 0 and the project is undertaken only in period 1. When the probability of success p_0 is low enough, the dynamic agency cost makes it too costly for the principal to induce the agent to work on the problem in period 0. Now we write the principal's total profit as

    (3)    W^F(p_0) := \begin{cases} (1 − s^F_0)Vλp_0 − g + δ(1 − λp_0)W^F(p_1) & \text{if } p_0 ≥ p^F, \\ δW^F(p_0) & \text{otherwise.} \end{cases}

2.3 Dividing and Conquering the Problem

In this subsection we suppose that the principal adopts the divide and conquer strategy, i.e., dividing the problem V into halves and using the half bandit to solve them. By backward induction, we begin by considering the final period 1. There are two kinds of histories in which the game reaches the final period 1: one in which the subproblem is solved in period 0 and the agent is about to use the new half bandit on the new subproblem, and the other in which the subproblem fails to be solved in period 0 and the agent continues to use the same half bandit on the same subproblem as before. However, in either case the agent's incentive compatibility constraint and the principal's problem in period 1 are the same as in the full bandit case, because this is the final period. Hence we denote by

    U^H(p) = \begin{cases} g & \text{if } p ≥ 2g, \\ 0 & \text{otherwise,} \end{cases}

and

    W^H(p) = max\{p − 2g,\, 0\}

the agent's and the principal's value functions in period 1, respectively, where the superscript H indicates that the agent is using the half bandit. Now going back to period 0, the agent is induced to work on the problem in period 0 if the following incentive compatibility condition holds:

    (4)    s^H_0 \frac{V}{2}2λp_0 + δ\left\{(1 − 2λp_0)U^H(\tilde{p}_1) + 2λp_0 U^H(p_0)\right\} ≥ g + δU^H(p_0, \tilde{p}_1),

where s^H_0 is the share in the contract when the principal adopts the divide and conquer strategy. The second term on the left-hand side is the expected continuation value, consisting of the payoff U^H(\tilde{p}_1) after the subproblem fails to be solved in period 0 and the payoff U^H(p_0) after it is successfully solved, weighted by the probabilities 1 − 2λp_0 and 2λp_0 respectively. The payoff U^H(p_0, \tilde{p}_1) on the right-hand side is the continuation value after the agent shirks in period 0, and as in the previous argument,

    U^H(p_0, \tilde{p}_1) = \begin{cases} \left(\dfrac{1 − 2λp_0}{1 − 2λ}\right)g & \text{if } \tilde{p}_1 ≥ 2g, \\ 0 & \text{otherwise.} \end{cases}

The term δU^H(p_0, \tilde{p}_1) is the dynamic agency cost when using the half bandit. Again, the principal offers the contract s^H_0 in period 0 only if the following participation constraint holds:

    (5)    (1 − s^H_0)\frac{V}{2}2λp_0 − g + δ\left\{(1 − 2λp_0)W^H(\tilde{p}_1) + 2λp_0 W^H(p_0)\right\} ≥ δW^H(p_0),

where s^H_0 is set so that inequality (4) holds with equality. As in the full bandit case, there may be a delay of the offer in equilibrium.

Lemma 2.2. There exists p^H ∈ [2g, 1) such that the principal offers the contract s^H_0 in period 0 if and only if p_0 ≥ p^H.

Proof. See Appendix.

The expression of p^H in terms of the parameters is complicated because p^H solves the quadratic equation obtained from (5), and so we abstain from showing it. It is shown in the next subsection that p^F can be greater or less than p^H depending on the parameters. Now we write the principal's profit as

    (6)    W^H(p_0) := \begin{cases} (1 − s^H_0)\frac{V}{2}2λp_0 − g + δ\left\{(1 − 2λp_0)W^H(\tilde{p}_1) + 2λp_0 W^H(p_0)\right\} & \text{if } p_0 ≥ p^H, \\ δW^H(p_0) & \text{otherwise.} \end{cases}

2.4 When Is the Divide and Conquer Strategy Best?

In this subsection we compare the principal's total profits (3) and (6) to characterize when the principal optimally chooses the divide and conquer strategy. From these functions, it is not hard to obtain the exact characterization. For convenience, we let Z^F(p_0) denote the "if p_0 ≥ p^F" branch of the function W^F(p_0), and similarly we let Z^H(p_0) denote the "if p_0 ≥ p^H" branch of the function W^H(p_0). First note that both functions W^F(p_0) and W^H(p_0) coincide with the same linear function δW^F(p_0) = δW^H(p_0) as long as p_0 ≤ min{p^F, p^H}. Once p_0 exceeds min{p^F, p^H}, one of these functions departs from the other and follows Z^F(p_0) or Z^H(p_0). The other function also follows Z^F(p_0) or Z^H(p_0) once p_0 ≥ max{p^F, p^H}. Next, it is easy to show that the quadratic and increasing function Z^H(p_0) can cross the linear and increasing function Z^F(p_0) from below at most once for p_0 ∈ [2g, 1]. Finally, it is also easy to see that Z^H(1) > Z^F(1). Therefore, we have the following exact characterization.

Proposition 2.3. If p^F ≥ p^H, then the divide and conquer strategy is always optimal. Conversely, if p^F < p^H, then there exists p ∈ (2g, 1) such that the divide and conquer strategy is optimal if and only if p_0 ≥ p.


The expression determining whether p^F ≥ p^H holds in terms of the underlying parameters is so complicated that it seems hard to extract an economic intuition from it. Moreover, the above does not establish that there exists a parameter configuration under which p^F ≥ p^H or p^F < p^H actually holds, i.e., one of the statements in Proposition 2.3 might be vacuously true. However, we can give the following sufficient conditions for p^F ≥ p^H and p^F < p^H, respectively.

Proposition 2.4. There exists \bar{g} ∈ (0, 1/2) such that if g ≥ \bar{g}, then p^F ≥ p^H and the divide and conquer strategy is always optimal.

Proposition 2.5. There exists \underline{g} ∈ (0, 1/2) such that if g ≤ \underline{g}, then p^F < p^H and there exists \hat{p} ∈ (2g, 1) such that the divide and conquer strategy is optimal if and only if p_0 ≥ \hat{p}.

Proof. See Appendix.

Thus, if the amount g of the grant is sufficiently large (i.e., g ≥ \bar{g}), then the principal can adopt the divide and conquer strategy to make more profit for any prior p_0 ∈ (2g, 1). If the grant g is sufficiently small and the prior p_0 is sufficiently large (i.e., g ≤ \underline{g} and p_0 ≥ \hat{p}), then the principal can adopt the divide and conquer strategy to make more profit. But when the parties are pessimistic (i.e., g ≤ \underline{g} and p_0 < \hat{p}), it is optimal to solve the whole problem at once. Moreover, the above two propositions show that there exist environments in which p^F ≥ p^H and p^F < p^H hold respectively.

What is the underlying force that explains these results? To simplify the argument, we suppose here that the research project is undertaken in both periods whichever strategy the principal adopts. Since the shares s^F_0 and s^H_0 are determined so that the incentive constraints (2) and (4) hold with equality, the principal's expected payments are:

    (7)    s^F_0 p_0 = g + δU^F(p_0, p_1) − δ(1 − λp_0)U^F(p_1),
           s^H_0 p_0 = g + δU^H(p_0, \tilde{p}_1) − δ\left[(1 − 2λp_0)U^H(\tilde{p}_1) + 2λp_0 U^H(p_0)\right].

First note that the difference between the second terms in equations (7) is

    U^H(p_0, \tilde{p}_1) − U^F(p_0, p_1) = \frac{λ(1 − p_0)}{(1 − λ)(1 − 2λ)}g > 0,

so that adopting the divide and conquer strategy increases the dynamic agency cost. Thus, by using the half bandit with success probability 2λ rather than the full bandit with λ, the principal becomes more pessimistic after no success and has to pay a higher information rent to the agent, which makes the incentive constraint (4) tighter relative to constraint (2).

But this difference is decreasing in p_0 because, as p_0 increases, the difference in the speeds of learning between the full bandit and the half bandit becomes less significant, as we discussed in Subsection 2.1.

Second, the difference between the third terms in equations (7) is

    \left[(1 − 2λp_0)U^H(\tilde{p}_1) + 2λp_0 U^H(p_0)\right] − (1 − λp_0)U^F(p_1) = λgp_0 ≥ 0,

which we call the encouragement effect. Thus, dividing the problem and giving the agent the opportunity to solve another subproblem in the next period always encourages him to work and relaxes the incentive constraint (4) relative to (2). This difference is increasing in p_0 because with large p_0 the whole problem is likely to be solved in period 0, so that the future expected value of solving the whole problem at once is less attractive. Now from these arguments it follows that

    s^F_0 p_0 ≥ s^H_0 p_0  if and only if  p_0 ≥ \frac{1}{1 + (1 − λ)(1 − 2λ)}.
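This threshold follows directly by combining the two differences just computed; both presuppose the interior case in which the relevant posteriors stay above 2g, so that the period-1 rents equal g:

    s^F_0 p_0 − s^H_0 p_0 = δλgp_0 − δ\frac{λ(1 − p_0)}{(1 − λ)(1 − 2λ)}g ≥ 0
    ⟺ p_0(1 − λ)(1 − 2λ) ≥ 1 − p_0 ⟺ p_0 ≥ \frac{1}{1 + (1 − λ)(1 − 2λ)}.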

Thus, if the prior is sufficiently large, then the encouragement effect dominates the increase in the dynamic agency cost, so that the principal can save on payments to the agent by adopting the divide and conquer strategy.

Finally, looking at the principal's profit functions (3) and (6), there is a direct effect of the divide and conquer strategy, namely the difference between the principal's expected continuation values:

    \left[(1 − 2λp_0)W^H(\tilde{p}_1) + 2λp_0 W^H(p_0)\right] − (1 − λp_0)W^F(p_1).

Comparing these values is almost equivalent to comparing the lotteries of social surplus in Subsection 2.1, and the same intuition applies: with large p_0, the principal prefers to adopt the divide and conquer strategy to seek the high profit W^H(p_0) rather than risk getting stuck in a pessimistic situation with W^H(\tilde{p}_1), and vice versa. We call this term the direct effect.

Substituting the above expressions for the expected payments s^F_0 p_0 and s^H_0 p_0 into the profit functions (3) and (6), we can see that

    W^H(p_0) − W^F(p_0) = S^H(p_0) − S^F(p_0) − δ\left[U^H(p_0, \tilde{p}_1) − U^F(p_0, p_1)\right],

which reflects a standard result in the dynamic mechanism design literature: the principal's profit can be expressed as the expected dynamic virtual surplus S^k(p_0) − g − δU^k(p_0, ·), k = F, H. In our context, the difference S^H(p_0) − S^F(p_0) of the social surpluses is the sum of the encouragement effect and the direct effect. As we have already seen in Subsection 2.1, it is positive if p_0 ≥ p^* = \frac{1+g}{2} and negative otherwise. Note that if 2g ≥ \frac{1+g}{2}, that is, g ≥ \frac{1}{3}, then the sum of these effects is always positive for any prior p_0 ∈ [2g, 1), simply because in order to implement a project with high cost g the initial belief p_0 must be sufficiently high.

Given the decomposition of the profit into the two effects and the dynamic agency cost, we can summarize when the divide and conquer strategy is effective for the principal.

• Suppose that the grant g which the agent can pocket by shirking is large, so that g ≥ \bar{g}. Then the encouragement effect and the direct effect are strong enough to dominate the increase in the dynamic agency cost. Hence, the divide and conquer strategy is optimal for any prior p_0 ∈ [2g, 1).

• Suppose that the grant g is sufficiently small, so that g ≤ \underline{g}. If the principal is optimistic enough about the bandit's effectiveness, so that p_0 ≥ \hat{p}, then the encouragement effect and the direct effect are so strong that they dominate the increase in the dynamic agency cost, and hence it is optimal to adopt the divide and conquer strategy. If the principal is pessimistic enough about the bandit, so that p_0 < \hat{p}, then the sum of the encouragement effect and the direct effect is negative, or those positive effects are so small that the increase in the dynamic agency cost dominates them. In this case it is optimal to solve the whole problem at once.
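To illustrate Propositions 2.3-2.5, the following is a minimal numerical sketch of the two-period comparison (not the authors' code); it assumes the interior case in which the project is undertaken in both periods under either strategy, and the parameter values are chosen purely for illustration:

    def profits(p0, lam, g, delta):
        """Two-period profits W^F and W^H, assuming both posteriors stay above 2g,
        so the period-1 rent is g and the period-1 profit is p - 2g (uses V*lam = 1)."""
        p1 = p0 * (1 - lam) / (1 - lam * p0)           # posterior, full bandit
        p1h = p0 * (1 - 2 * lam) / (1 - 2 * lam * p0)  # posterior, half bandit

        # full bandit: binding incentive constraint (2) pins down the payment s0*p0
        pay_f = g + delta * (1 - lam * p0) / (1 - lam) * g - delta * (1 - lam * p0) * g
        w_f = p0 - pay_f - g + delta * (1 - lam * p0) * (p1 - 2 * g)

        # half bandit: binding incentive constraint (4)
        pay_h = (g + delta * (1 - 2 * lam * p0) / (1 - 2 * lam) * g
                 - delta * ((1 - 2 * lam * p0) * g + 2 * lam * p0 * g))
        w_h = (p0 - pay_h - g
               + delta * ((1 - 2 * lam * p0) * (p1h - 2 * g)
                          + 2 * lam * p0 * (p0 - 2 * g)))
        return w_f, w_h

    for p0 in (0.5, 0.7, 0.9):
        wf, wh = profits(p0, lam=0.2, g=0.05, delta=0.9)
        print(p0, "divide and conquer" if wh > wf else "solve at once")

With these illustrative values, only the low prior p_0 = 0.5 favors solving the whole problem at once, consistent with the threshold characterization above.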

3. Infinite Horizon Model

In this section, we consider the infinite horizon model in continuous time, starting at time t = 0. First, we consider the long-term interaction between the principal and the agent as a family of dynamic games (Γ_Δ) indexed by a time length Δ > 0. Each game Γ_Δ can be viewed as a continuous-time game in which players are subject to inertia. This means that if the principal last made an offer at time t, then he cannot make another offer until t + Δ, which guarantees that the model is well defined. We characterize the limit behaviors and payoffs of the limit game Γ_0 by letting Δ → 0. This also helps us derive, in a heuristic way, the system of differential equations that characterizes equilibrium payoffs.

The game Γ_Δ develops as follows. Before the game starts, the principal chooses N ∈ ℕ, the number of subproblems the whole problem is divided into. The agent must then solve each subproblem by using the bandit (\frac{V}{N}, NλΔ). This is more general than the two-period model, where the principal can only choose either not to divide at all or to divide the problem into halves. Suppose that a time length Δ has elapsed at time t since the last time the principal made an offer. As in the two-period model, the principal can offer a contract specifying a share s_t ∈ [0, 1] of the output together with the research grant gΔ. If the principal makes an offer, then the agent decides whether to use the risky bandit (\frac{V}{N}, NλΔ) to work on the problem or to shirk by appropriating the grant gΔ. Again, if the bandit is good and the agent works, then the subproblem is solved with probability NλΔ; otherwise, nothing happens. If the subproblem is successfully solved, then the output \frac{V}{N} is distributed among the parties and they move on to the next subproblem with prior p_0 if an unsolved subproblem remains. When the final subproblem is solved, the game ends. Let r be the common instantaneous interest rate of the principal and the agent, that is, they discount payoffs with e^{−rΔ} in the game Γ_Δ. We maintain the normalization Vλ = 1 as in the two-period model.

A time t history h_t is a complete record of the players' action profiles and the number of remaining subproblems from time 0 up to, but not including, time t. If one knows h_t, then one knows, for each time t′ < t, whether the principal made an offer or not ("No offer"), what share s ∈ [0, 1] was offered at time t′, whether the agent shirked or worked at time t′ ("Shirk" or "Work"), and how many subproblems n ∈ {1, 2, . . . , N} remained to be solved at time t′. Formally, a time t history is a mapping

    h_t : [0, t) → {{[0, 1] × {"Shirk", "Work"}} ∪ {"No offer"}} × {1, . . . , N},

such that for some n′, n′′ ∈ {1, . . . , N}, h_t(t′) = "No offer" × n′ if h_t(t′′) ∈ [0, 1] × n′′ and t′ ∈ (t′′, t′′ + Δ), and such that for every t′′ < t′ < t, if the last components of h_t(t′′) and h_t(t′) specify n′′ and n′ respectively, then n′′ ≥ n′. The first qualification represents the inertia of offers. The second means that the number of remaining subproblems decreases as time passes. A time t public history \tilde{h}_t is the component of a time t history h_t commonly observed by both the principal and the agent, that is, a mapping \tilde{h}_t : [0, t) → {[0, 1] ∪ {"No offer"}} × {1, . . . , N} with the same qualifications as above.

Now we define H_t and H_∞ as the sets of mappings h_t and of paths h_∞ of play, both satisfying the above qualifications. We endow H_∞ with the product σ-algebra 𝓗^∞, and H_t with the sub-σ-algebra 𝓗^t generated by the cylinder sets of H_t. Similarly we define (\tilde{H}_t, \tilde{𝓗}^t) and (\tilde{H}_∞, \tilde{𝓗}^∞) for public histories. Then a behavioral strategy for the principal is a progressively measurable process

    σ_P : \tilde{H}_∞ × [0, ∞) → Δ{[0, 1] ∪ {"No offer"}}

with respect to {\tilde{𝓗}^t}_{t>0}, such that σ_P(·, t) = "No offer" if σ_P(·, t′) ∈ [0, 1] and t ∈ (t′, t′ + Δ).

By our assumption of inertia, the principal has to wait until t′ + Δ to make another offer if he offered at time t′. When he can offer at time t, given a time t public history \tilde{h}_t he chooses a mixed action over [0, 1] ∪ {"No offer"} prescribed by his strategy σ_P(\tilde{h}_t, t). Similarly, the agent's strategy is defined as a progressively measurable process

    σ_A : H_∞ × {[0, 1] ∪ {"No offer"}} × [0, ∞) → Δ{"Shirk", "Work"} ∪ {"Nothing"}

with respect to {𝓗^t ⊗ 𝓑_{[0,1] ∪ {"No offer"}}}_{t>0}, such that the strategy prescribes no action if the principal makes no offer, i.e., σ_A(·, "No offer", t) = "Nothing". Given a time t private history h_t and an outstanding offer s ∈ [0, 1], the strategy σ_A(h_t, s, t) prescribes a mixed action over {"Shirk", "Work"}.

As seen above, the players' strategies depend on the private history h_t for the agent and the public history \tilde{h}_t for the principal. In general, the agent and the principal form a private belief and a public belief, respectively, about the bandit being good given those histories, and equilibrium play depends on those variables. However, in this paper we focus on a special class of weak perfect Bayesian equilibria in which play on the equilibrium path depends on two state variables: the current public belief p_t about the bandit being good, and the number n of remaining subproblems. We write U_N(p, n) and W_N(p, n) for the agent's and the principal's value functions, respectively, when the principal decides to divide the problem into N subproblems and adopts (\frac{V}{N}, NλΔ). The state transition is determined as follows: for n > 1, if the agent solves a subproblem at state (p_t, n), then the state transits to (p_{t+}, n − 1) with p_{t+} = p_0, since the agent tries the new subproblem with common prior p_0. The game ends at time t whenever the agent solves the final subproblem at state (p_t, 1).

When using the bandit (\frac{V}{N}, NλΔ), by Bayes' rule the posterior belief evolves according to

    (8)    p_{t+Δ} = \frac{(1 − NλΔ)p_t}{1 − NλΔp_t}

upon observing a failure to solve a subproblem. We denote by s_N(p, n) the principal's share offer as a function of the state variables (p, n). Now, since the discount factor satisfies e^{−rΔ} ≈ 1 − rΔ, the agent's Bellman equation can be written, up to second order terms, as

    U_N(p_t, n) = s_N(p_t, n)\frac{V}{N}NλΔp_t + (1 − rΔ)\left[(1 − NλΔp_t)U_N(p_{t+Δ}, n) + NλΔp_t U_N(p_0, n − 1)\right]
                = gΔ + (1 − rΔ)\frac{p_t}{p_{t+Δ}}U_N(p_{t+Δ}, n),

for n > 1, and

    U_N(p_t, 1) = s_N(p_t, 1)\frac{V}{N}NλΔp_t + (1 − rΔ)(1 − NλΔp_t)U_N(p_{t+Δ}, 1)
                = gΔ + (1 − rΔ)\frac{p_t}{p_{t+Δ}}U_N(p_{t+Δ}, 1)

for n = 1. The second equality in both Bellman equations is the incentive compatibility constraint, which holds with equality here. Although the above equations need little explanation in light of the incentive constraints (2) and (4) in the two-period model, it is worth discussing why the continuation value after shirking is given by \frac{p_t}{p_{t+Δ}}U_N(p_{t+Δ}, n). The intuition is as follows. Off the equilibrium path on which the agent shirked, the beliefs of the principal and the agent diverge, and the principal makes subsequent offers based on the more pessimistic belief p_{t+Δ}, which would bring the agent the continuation value U_N(p_{t+Δ}, n). But the agent knows that he did not try the problem at time t, so he remains optimistic with belief p_t. This is why we obtain the agent's true continuation value by multiplying U_N(p_{t+Δ}, n) by the factor \frac{p_t}{p_{t+Δ}}. Formally, we first observe that the agent's value function, whether on or off the equilibrium path, has the following form of discounted sum:

    U_N(p_{t+Δ}, n) = s_{t+Δ}p_{t+Δ}Δ + \sum_{k=1}^{∞}(1 − rΔ)^k\left[\prod_{l=1}^{k}(1 − NλΔp_{t+lΔ})\right]s_{t+(k+1)Δ}\,p_{t+(k+1)Δ}Δ
                    + \sum_{k=1}^{∞}(1 − rΔ)^k\left[\prod_{l=1}^{k−1}(1 − NλΔp_{t+lΔ})\right]NλΔp_{t+kΔ}\,U_N(p_0, n − 1).

The second term is the discounted sum of the expected payoffs from getting the share of the output after the agent fails k times and succeeds on the (k+1)th attempt. The third term is the discounted sum of the continuation values from moving on to the next subproblem after the agent fails k − 1 times and succeeds on the kth attempt. Now, substituting

    p_{t+(k+1)Δ} = \frac{(1 − NλΔ)p_{t+kΔ}}{1 − NλΔp_{t+kΔ}}

into the above expression repeatedly, we get

    U_N(p_{t+Δ}, n) = p_{t+Δ}Δ\left\{s_{t+Δ} + \sum_{k=1}^{∞}(1 − rΔ)^k(1 − NλΔ)^k s_{t+(k+1)Δ} + \sum_{k=1}^{∞}(1 − rΔ)^k(1 − NλΔ)^{k−1}Nλ\,U_N(p_0, n − 1)\right\},

which involves only p_{t+Δ} and the sequence of shares {s_{t+kΔ}}_{k∈ℤ_+}. Note that the value U_N(p_0, n − 1) is independent of beliefs held in the past. The sequence of shares is offered by the principal based on the belief p_{t+Δ}, while the agent privately knows that the true probability is p_t rather than p_{t+Δ}. Hence, the true continuation value to the agent is given by \frac{p_t}{p_{t+Δ}}U_N(p_{t+Δ}, n).

The principal's Bellman equation is

    W_N(p_t, n) = (1 − s_N(p_t, n))\frac{V}{N}NλΔp_t − gΔ + (1 − rΔ)\left[(1 − NλΔp_t)W_N(p_{t+Δ}, n) + NλΔp_t W_N(p_0, n − 1)\right]

for n > 1, and

    W_N(p_t, 1) = (1 − s_N(p_t, 1))\frac{V}{N}NλΔp_t − gΔ + (1 − rΔ)(1 − NλΔp_t)W_N(p_{t+Δ}, 1)

for n = 1. Again, these equations require little explanation given the profit functions (3) and (6) in the two-period model. Now letting Δ → 0 in the above equations, we first see that the evolution of the belief follows

    (9)    \dot{p}_t = −Nλp_t(1 − p_t).
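Although the paper works directly with (9), it may help to note that it integrates in closed form: the likelihood ratio of the good state decays exponentially,

    \frac{p_t}{1 − p_t} = \frac{p_0}{1 − p_0}e^{−Nλt},  i.e.,  p_t = \frac{p_0 e^{−Nλt}}{1 − p_0 + p_0 e^{−Nλt}},

so dividing the problem more finely (larger N) speeds up the downward drift of the belief after failures.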

We also have the following system of differential equations: the agent's value function satisfies the first-order ODE

    (10)    Nλp_t(1 − p_t)U_N′(p_t, n) + (r + Nλp_t)U_N(p_t, n) = s_N(p_t, n)p_t + Nλp_t U_N(p_0, n − 1) = g + NλU_N(p_t, n),

and the principal's value function satisfies

    (11)    Nλp_t(1 − p_t)W_N′(p_t, n) + (r + Nλp_t)W_N(p_t, n) = (1 − s_N(p_t, n))p_t − g + Nλp_t W_N(p_0, n − 1),

where we define U_N(p, 0) = W_N(p, 0) = 0 for any N and p. From the second equality of (10), we see that the expected payment to the agent is

    s_N(p_t, n)p_t = g + NλU_N(p_t, n) − Nλp_t U_N(p_0, n − 1).

The second term, NλU_N(p_t, n), is the dynamic agency cost, as in the two-period model. The agent always has the option of shirking to postpone the success and guarantee himself the continuation payoff U_N(p_t, n), so an increase in U_N(p_t, n) increases the expected payment.

However, the third term, −Nλp_t U_N(p_0, n − 1), which corresponds to the encouragement effect in the two-period model, decreases the expected payment. The future opportunity of solving the next subproblem increases the agent's continuation payoff and encourages the agent to work. The more likely the current subproblem is to be solved, i.e., the larger p_t, the more the agent is encouraged. Finally, the third term Nλp_t W_N(p_0, n − 1) on the right-hand side of (11) corresponds to the direct effect in the two-period model. Again, the future opportunity of solving the next subproblem increases the principal's continuation payoff, and this effect is increasing in the current belief p_t.

As long as only failures are observed, the posterior continues to decrease according to (9), and at some point p^N_n the principal stops offering and gives up on having the agent solve the subproblem. Then the agent's continuation value is zero, and hence U_N(p^N_n, n) = 0. Substituting this boundary condition into (10), we have

    s_N(p^N_n, n)p^N_n = g − Nλp^N_n U_N(p_0, n − 1).

The principal also receives nothing once the posterior reaches p^N_n, i.e., the value matching condition W_N(p^N_n, n) = 0 holds. Since the principal optimally chooses when to stop, the smooth pasting condition W_N′(p^N_n, n) = 0 also holds. Substituting these conditions and the above expression for s_N(p^N_n, n)p^N_n into (11), we identify the threshold belief

    (12)    p^N_n = \frac{2g}{1 + Nλ\left[U_N(p_0, n − 1) + W_N(p_0, n − 1)\right]}.

It is easy to see that U_N(p_0, n) ≥ U_N(p_0, n − 1) and W_N(p_0, n) ≥ W_N(p_0, n − 1), because solving the remaining n subproblems is more valuable than solving n − 1 subproblems. Hence, we have

    p^N_N ≤ p^N_{N−1} ≤ · · · ≤ p^N_1 = 2g.

Thus, the principal is more willing to continue the project at its earlier stages, because the future promise of solving the subproblems makes the principal persevere even with a pessimistic belief about the current subproblem. Now we solve the system of (10) and (11) to obtain the following characterization of an equilibrium.


Proposition 3.1. The agent's payoff and the principal's payoff in an equilibrium with prior p_0 are recursively characterized by the following functional equations: for n = 1, . . . , N,

    U_N(p_t, n) = \frac{g}{r − Nλ}\left[\frac{r − Nλp_t}{r} − \frac{p_t(r − Nλp^N_n)}{p^N_n\, r}\left(\frac{p^N_n(1 − p_t)}{p_t(1 − p^N_n)}\right)^{\frac{r}{Nλ}}\right]

and

    W_N(p_t, n) = \frac{Nλp_t}{Nλ + r}\left[1 − \left(\frac{p^N_n(1 − p_t)}{p_t(1 − p^N_n)}\right)^{1+\frac{r}{Nλ}}\right]\left(U_N(p_0, n − 1) + W_N(p_0, n − 1)\right) + F_N(p_t, n),

where F_N(p_t, n) := S_N(p_t, n) − U_N(p_t, n), with

    S_N(p_t, n) = \frac{(r + Nλg)p_t − (Nλ + r)g}{r(Nλ + r)} − \frac{(r + Nλg)p^N_n − (Nλ + r)g}{r(Nλ + r)}\left(\frac{p^N_n}{p_t}\right)^{\frac{r}{Nλ}}\left(\frac{1 − p_t}{1 − p^N_n}\right)^{1+\frac{r}{Nλ}}

the dynamic social surplus of the current subproblem characterized in Subsection 3.1, and p^N_n is defined recursively according to (12). In an equilibrium starting with prior p_0, the principal offers a contract and the agent works as long as the belief exceeds p^N_n when there remain n subproblems.

Proof. See Appendix.

We recall that our model allows the principal to delay an offer. Now consider the principal's value W_N(p, 1) of solving the last subproblem. Since it is easily shown that W_N(p, 1) has at most one inflection point, if W_N(1, 1) > 0 and W_N′′(p^N_1, 1) > 0 (recall W_N′(p^N_1, 1) = 0), then W_N(p, 1) ≥ 0 for any p ∈ [0, 1], and in equilibrium the principal always makes an offer and the agent always works until the belief reaches p^N_1. Once these conditions hold for W_N(p, 1), the same conditions hold for the values W_N(p, n), n = 2, . . . , N, because W_N(p, n) ≥ W_N(p, n − 1); after all, the principal always makes an offer for each of the N subproblems until the belief reaches p^N_n, n = 1, . . . , N. But conversely, if either W_N(1, 1) < 0 or W_N′′(p^N_1, 1) < 0, then the value W_N(p, 1) is negative for some beliefs. In this case, as argued in Hörner and Samuelson (2012), there is an equilibrium in which the principal delays his offer. In this paper we do not consider this kind of equilibrium because, given the above characterization, it seems very cumbersome to identify when a delay starts; hence we focus on "no delay" equilibria, which are ensured by the following condition.


Lemma 3.2. There is no delay in an equilibrium if

    N < \frac{1 − 2g}{g}\cdot\frac{r}{λ}  and  g < \frac{1}{4}.

Proof. See Appendix.

The optimal N for the principal would be characterized by comparing the values W_N(p_0, N) across different N. Unfortunately, the expression for the principal's value function above is too complex for such a comparison, and so we cannot give a general statement about the optimal N in terms of the parameters, as we did in the two-period model. Hence, in the following subsection we examine through numerical examples how the optimal N is determined.
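The recursive structure lends itself to direct computation. The following is a minimal numerical sketch (not the authors' code) that integrates the ODEs (10)-(11) upward from the stopping boundary (12) for each n, under the no-delay condition of Lemma 3.2; the grid size and the Euler scheme are ad hoc choices:

    import numpy as np

    def value_at_prior(N, p0, lam=0.3, r=0.22, g=0.12, steps=20000):
        """Integrate the ODEs (10)-(11) from the stopping belief p^N_n up to p0,
        recursively over the remaining subproblems n = 1, ..., N.
        Returns (U_N(p0, N), W_N(p0, N)) in a no-delay equilibrium."""
        U_prev, W_prev = 0.0, 0.0          # U_N(p, 0) = W_N(p, 0) = 0
        for n in range(1, N + 1):
            p_stop = 2 * g / (1 + N * lam * (U_prev + W_prev))  # threshold (12)
            ps = np.linspace(p_stop, p0, steps)
            dp = ps[1] - ps[0]
            U, W = 0.0, 0.0                # boundary: U = W = 0 at p_stop
            for p in ps[:-1]:
                speed = N * lam * p * (1 - p)                   # -dp/dt in (9)
                dU = (g + (N * lam - r - N * lam * p) * U) / speed
                dW = (p - 2 * g - N * lam * U
                      + N * lam * p * (U_prev + W_prev)
                      - (r + N * lam * p) * W) / speed
                U, W = U + dU * dp, W + dW * dp
            U_prev, W_prev = U, W
        return U_prev, W_prev

    # per Figure 1 below, N = 2 should overtake N = 1 only for priors near 1
    for p0 in (0.6, 0.85, 0.95):
        best = max((1, 2), key=lambda N: value_at_prior(N, p0)[1])
        print(p0, "optimal N =", best)

The derivative dW uses the binding incentive constraint to substitute the expected payment s_N(p, n)p = g + NλU − NλpU_N(p_0, n − 1) into (11), so only the two value functions need to be tracked.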

3.1 Numerical Examples

Before we examine when the divide and conquer strategy is better, we derive another expression for the principal's value function in Proposition 3.1. We consider the dynamic social surplus of solving the nth subproblem, ignoring the value of the subsequent subproblems:

    S_N(p_0, n) := E\left[\left(e^{−rT}\frac{V}{N} − \int_0^T e^{−rt}g\,dt\right)\mathbf{1}_{\{p_T ≥ p^N_n\}}\right],

where T is the random time at which the nth subproblem is solved. Since the probability that a success has not occurred by time t is given by \exp\{−\int_0^t Nλp_s\,ds\}, we see that

    S_N(p_0, n) = \int_0^{∞} e^{−rt − \int_0^t Nλp_s\,ds}(p_t − g)\mathbf{1}_{\{p_t ≥ p^N_n\}}\,dt,

and hence $S_N(p,n)$ is a solution of the ODE
$$
N\lambda p(1-p)\,S'_N(p,n) + (r + N\lambda p)\,S_N(p,n) = p - g
$$
with the boundary condition $S_N(p_n^N, n) = 0$. From equations (10) and (11), it is easy to see that $S_N(p,n) = U_N(p,n) + F_N(p,n)$. Then, substituting this into $W_N(p_0, N)$ in Proposition 3.1, we have
$$
W_N(p_0, N) = \left[1 - \left(\frac{p_N^N(1-p_0)}{p_0(1-p_N^N)}\right)^{1+\frac{r}{N\lambda}}\right]\frac{N\lambda p_0}{N\lambda + r}\Big[U_N(p_0, N-1) + W_N(p_0, N-1)\Big] + S_N(p_0, N) - U_N(p_0, N).
$$
Thus, how the optimal N is determined depends on the encouragement effect $N\lambda p_0 U_N(p_0, N-1)$, the direct effect $N\lambda p_0 W_N(p_0, N-1)$, the social value of the current subproblem $S_N(p_0, N)$, and the agent's payoff $U_N(p_0, N)$, which is proportional to the dynamic agency cost $N\lambda U_N(p_0, N)$. In the following we use this decomposition to see, through numerical examples, when the divide and conquer strategy is better.

3.1.1 N = 1 or 2 with Low Cost g

Here, as in the two period model, we focus on whether the principal should divide the problem into halves (N = 2) or not at all (N = 1). The parameter values are λ = 0.3, r = 0.22 and g = 0.12, which satisfy the conditions for a no delay equilibrium in Lemma 3.2. Thus we suppose that the research project needs only a relatively small grant g = 0.12. Figure 1 shows the principal's value functions $W_1(p_0,1)$ and $W_2(p_0,2)$. If the prior $p_0$ is greater than about 0.9, then the divide and conquer strategy (N = 2) is better; otherwise, solving the whole problem at once (N = 1) is better. Figure 2 shows the social values $S_1(p_0,1)$ and $S_2(p_0,2)$, which correspond to the lines "N = 1" and "N = 2 (Surplus)" respectively, and the total of the social value $S_2(p_0,2)$, the encouragement effect and the direct effect, which corresponds to the curve "N = 2 (Total)". It is not surprising that the social value $S_2(p_0,2)$ of solving the first subproblem is lower than the social value $S_1(p_0,1)$ of solving the whole problem. But the divide and conquer strategy generates the additional values of the encouragement and direct effects, so that the curve "N = 2 (Total)" crosses the line "N = 1" from below. Thus, if the prior $p_0$ is sufficiently high, then the total value of the divide and conquer strategy exceeds that of solving the whole problem at once, and vice versa. This relation does not change even after subtracting the agent's payoff shown in Figure 3.

3.1.2 N = 1 or 2 with High Cost g

Again we focus on whether to choose N = 1 or 2, but now the research project needs a relatively large grant. We only increase the cost parameter from g = 0.12 to g = 2, keeping the other parameters at the same values. Figure 5 shows that the divide and conquer strategy is better for any prior. Similarly to the low cost case, the "N = 1" line is above the "N = 2 (Total)" line for low values of the prior, but in this case the difference is so small that, after subtracting the agent's payoff shown in Figure 6, the principal's profit for N = 2 is greater than for N = 1 even for low priors. Thus, the encouragement and direct effects remain strong even for low priors, while the dynamic agency cost for N = 2, which is proportional to the agent's payoff, is not much bigger than for N = 1. Combining these effects, the divide and conquer strategy is more attractive for any prior. We can summarize from these two numerical examples that the same intuition as in the two period model seems to be inherited by the infinite horizon model.

3.1.3 More Than N = 2

Our characterization allows us to consider no delay equilibria with more than N = 2 as long as the condition in Lemma 3.2 is met. Here we adopt the same parameter values as in the low cost case; given these values we can consider no delay equilibria with up to N = 4. Figure 7 shows that if the prior is lower than about 0.9, then N = 1 is best, and if the prior is greater than about 0.9, then N = 2 is best. But if the prior is very close to 1, then N = 4 seems to be best.
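Figures of this kind can be reproduced numerically. The sketch below (added for illustration; function names are ours) integrates the ODEs for $U_N(p,1)$ and $W_N(p,1)$ of the last subproblem upward from the stopping belief $p_1^N = 2g$ derived in Appendix E. The thresholds $p_n^N$ for $n \ge 2$ follow recursion (12), which is not reproduced here, so the sketch handles only the last subproblem.

```python
# Integrate the last-subproblem ODEs from Appendices D/E upward from p = 2g:
#   N*lam*p(1-p) U' + (r - (1-p)N*lam) U = g,
#   N*lam*p(1-p) W' = p - 2g - N*lam*U - (r + N*lam*p) W,
# with U = W = 0 at p = 2g (value matching and smooth pasting).
import numpy as np
from scipy.integrate import solve_ivp

lam, r, g = 0.3, 0.22, 0.12  # low-cost example of Subsection 3.1.1

def last_subproblem_values(N, p_grid):
    a = N * lam
    def rhs(p, y):
        U, W = y
        dU = (g - (r - (1 - p) * a) * U) / (a * p * (1 - p))
        dW = (p - 2 * g - a * U - (r + a * p) * W) / (a * p * (1 - p))
        return [dU, dW]
    sol = solve_ivp(rhs, (2 * g, p_grid[-1]), [0.0, 0.0],
                    t_eval=p_grid, rtol=1e-10, atol=1e-12)
    return sol.y[0], sol.y[1]

p = np.linspace(2 * g, 0.999, 400)
U1, W1 = last_subproblem_values(1, p)
# For N = 1 the last subproblem is the whole problem; W should stay positive
# (no delay) and approach (1-g)/(N*lam+r) - g/r near p = 1, as in Appendix E.
print(W1[-1], (1 - g) / (lam + r) - g / r)
```

For $n \ge 2$ one would feed $U_N(p_0, n-1) + W_N(p_0, n-1)$ back into the ODE for $W_N(p, n)$ together with the recursion (12) for $p_n^N$; we omit this step since (12) is not reproduced above.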

4. Conclusion

We considered a dynamic principal-agent model in which the principal decides how many subproblems the whole research problem is divided into before the interaction starts, and then in each period the agent can work on the subproblem by trying a risky bandit whose effectiveness for solving the subproblem is uncertain. We characterized completely when the divide and conquer strategy is optimal in the two period model. We also characterized the equilibrium behavior and payoffs for the continuous time infinite horizon model, and confirmed by numerical examples that a similar intuition to the one discussed for the two period model continues to hold.

In this paper we focused on the no delay Markovian equilibria for simplicity, and hence we do not have a characterization of all Markovian equilibria. The next step will be characterizing delay equilibria and non-Markovian equilibria. Another restriction in this paper is the assumption that the principal can only divide the whole problem into subproblems of equal size, i.e., each subproblem constitutes 1/N of the whole problem. We also assume that the principal can commit to the bandit size during the interaction, but the principal might want to change the size of the bandits; for instance, the principal might want to have the agent solve the whole problem at once after observing successive failures to solve the current subproblem. It might be interesting for future research to consider these issues.


A. Appendix: Proof of Lemma 2.2

Suppose that the principal offers the contract $s_0$ such that inequality (4) holds with equality, so that
$$
s_0 = \frac{g}{1-2\lambda}\left(1 + \delta 2\lambda\,\frac{1-p_0}{p_0}\right).
$$
Then it is easy to see that if $p_0 = 1$, the left-hand side is greater than the right-hand side in inequality (5). Also, if $p_0 = 2g$, then the left-hand side is less than the right-hand side. Since the left-hand side is quadratic in $p_0$, there exists $p^H \in [2g, 1)$ such that inequality (5) holds with equality. □

B. Appendix: Proof of Proposition 2.4

Proof. We focus on $Z^F(p_0)$ and $Z^H(p_0)$, the "if $p_0 \ge p^F$" part and the "if $p_0 \ge p^H$" part of the profit functions $W^F(p_0)$ and $W^H(p_0)$. Since $s_0^F$ and $s_0^H$ are determined such that the incentive constraints (2) and (4) hold with equality, we see from the argument in Subsection 2.4 that
$$
Z^H(p_0) - Z^F(p_0) = \delta\Big[S^H(p_0) - S^F(p_0) - \big(U^H(p_0, \tilde p_1) - U^F(p_0, p_1)\big)\Big] = \delta\lambda\left\{p_0(2p_0 - 1 - g) - \frac{(1-p_0)g}{(1-\lambda)(1-2\lambda)}\right\}.
$$
We naively extend these functions to the domain $[2g, 1]$. A sufficient condition for $p^F \ge p^H$ is that $Z^H(p_0)$ is always above $Z^F(p_0)$ for $p_0 \in [2g, 1]$. First, note that $Z^H(1) - Z^F(1) = \delta\lambda(1-g) > 0$. Next we need
$$
Z^H(2g) - Z^F(2g) = \delta\lambda g\left\{2(3g - 1) - \frac{1-2g}{(1-\lambda)(1-2\lambda)}\right\} \ge 0,
$$
which is equivalent to
$$
g \ge \bar g := \frac{2(1-\lambda)(1-2\lambda) + 1}{6(1-\lambda)(1-2\lambda) + 2}.
$$
Since $0 < \lambda < \frac{1}{2}$, $\bar g \in \left(\frac{3}{8}, \frac{1}{2}\right)$. Moreover, when $g \ge \bar g$, which implies that $p_0 \ge 2g > \frac{3}{4}$,
$$
\frac{d}{dp_0}\Big[Z^H(p_0) - Z^F(p_0)\Big] = \delta\lambda\left\{4p_0 - 1 - g + \frac{g}{(1-\lambda)(1-2\lambda)}\right\} > 0.
$$
Therefore, $Z^H(p_0)$ never intersects $Z^F(p_0)$ for $p_0 \in (2g, 1]$. Conversely, if $g < \bar g$, then $Z^H(p_0)$ crosses $Z^F(p_0)$ only once, from below. This verifies the statement before Proposition 2.3 that $Z^H(p_0)$ crosses $Z^F(p_0)$ at most once, from below. □
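As a quick numeric spot check (added here, with illustrative parameter values), the sign of $Z^H - Z^F$ at $p_0 = 2g$ flips exactly at $\bar g$:

```python
# Spot check of the threshold g_bar in Appendix B: with D = (1-lam)(1-2*lam),
#   Z^H - Z^F = delta*lam*(p0*(2*p0 - 1 - g) - (1 - p0)*g/D)
# evaluated at p0 = 2g changes sign at g_bar = (2D + 1)/(6D + 2).
lam, delta = 0.2, 0.9  # illustrative values with 0 < lam < 1/2
D = (1 - lam) * (1 - 2 * lam)
g_bar = (2 * D + 1) / (6 * D + 2)

def ZH_minus_ZF(p0, g):
    return delta * lam * (p0 * (2 * p0 - 1 - g) - (1 - p0) * g / D)

for g in (g_bar - 0.01, g_bar, g_bar + 0.01):
    print(round(g, 4), ZH_minus_ZF(2 * g, g))  # negative, ~0, positive
```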

C. Appendix: Proof of Proposition 2.5

Note again that
$$
Z^H(p_0) - Z^F(p_0) = \delta\Big[S^H(p_0) - S^F(p_0) - \big(U^H(p_0, \tilde p_1) - U^F(p_0, p_1)\big)\Big].
$$
Since $U^H(p_0, \tilde p_1) - U^F(p_0, p_1) > 0$, the point $p_0$ where $Z^H(p_0)$ crosses $Z^F(p_0)$, if any, is greater than $p^* = \frac{1+g}{2}$. Hence, a sufficient condition for $p^F < p^H$ is
$$
p^F = \frac{2g + \frac{\lambda}{1-\lambda}\delta g}{1 - \delta\lambda(1-g) + \frac{\lambda}{1-\lambda}\delta g} < \frac{1+g}{2},
$$
which is equivalent to
$$
2\left(2 + \frac{\lambda}{1-\lambda}\delta\right) < \left(1 - \delta\lambda(1-g) + \frac{\lambda}{1-\lambda}\delta g\right)\left(1 + \frac{1}{g}\right).
$$
Since the right-hand side diverges as $g$ goes to zero, there exists $\underline g$ such that if $g \le \underline g$, then the above inequality holds. □
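A quick check (added here, with illustrative parameters) that the rearrangement above is indeed an equivalence:

```python
# Verify numerically that, with k = lam/(1-lam),
#   p^F < (1+g)/2  iff  2*(2 + k*delta) < (1 - delta*lam*(1-g) + k*delta*g)*(1 + 1/g).
lam, delta, g = 0.3, 0.8, 0.05  # illustrative values
k = lam / (1 - lam)
pF = (2 * g + k * delta * g) / (1 - delta * lam * (1 - g) + k * delta * g)
left = pF < (1 + g) / 2
right = 2 * (2 + k * delta) < (1 - delta * lam * (1 - g) + k * delta * g) * (1 + 1 / g)
print(left, right)  # identical truth values
```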

D. Appendix: Proof of Proposition 3.1

We solve the differential equation (10) for $U_N(p,n)$. From the second equality of (10),
$$
U'_N(p,n) + \frac{r - (1-p)N\lambda}{N\lambda p(1-p)}\,U_N(p,n) = \frac{g}{N\lambda p(1-p)}.
$$
Hence, an integrating factor is
$$
e^{\int \frac{r-(1-p)N\lambda}{N\lambda p(1-p)}\,dp} = e^{\frac{1}{N\lambda}\int\left(\frac{r}{1-p} - \frac{N\lambda - r}{p}\right)dp} = p^{-1+\frac{r}{N\lambda}}(1-p)^{-\frac{r}{N\lambda}},
$$
so that a general solution is
$$
U_N(p,n) = p^{1-\frac{r}{N\lambda}}(1-p)^{\frac{r}{N\lambda}}\int \frac{g}{N\lambda q(1-q)}\,q^{-1+\frac{r}{N\lambda}}(1-q)^{-\frac{r}{N\lambda}}\,dq + p^{1-\frac{r}{N\lambda}}(1-p)^{\frac{r}{N\lambda}}\,C_U
= \frac{r - N\lambda p}{r - N\lambda}\,\frac{g}{r} + p^{1-\frac{r}{N\lambda}}(1-p)^{\frac{r}{N\lambda}}\,C_U,
$$
where the constant $C_U$ is determined by the boundary condition $U_N(p_n^N, n) = 0$, and so
$$
C_U = -\frac{r - N\lambda p_n^N}{r - N\lambda}\,\frac{g}{r}\,(1-p_n^N)^{-\frac{r}{N\lambda}}\,(p_n^N)^{\frac{r}{N\lambda}-1}.
$$
Therefore, substituting this into the above, we get the expression of the solution.
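As a consistency check (added here, not part of the original proof), sympy confirms that this closed form solves the ODE for any constant, writing $a := N\lambda$:

```python
# Symbolic check that U(p) = (r - a*p)/(r - a)*g/r + p**(1 - r/a)*(1 - p)**(r/a)*C
# solves a*p*(1-p)*U' + (r - (1-p)*a)*U = g, where a := N*lambda.
import sympy as sp

p, g, r, a, C = sp.symbols('p g r a C', positive=True)
U = (r - a * p) / (r - a) * g / r + p ** (1 - r / a) * (1 - p) ** (r / a) * C
residual = sp.simplify(a * p * (1 - p) * sp.diff(U, p) + (r - (1 - p) * a) * U - g)
print(residual)  # -> 0
```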

Next, we solve differential equation (11), which is equivalent to
$$
W'_N(p,n) + \frac{r + N\lambda p}{N\lambda p(1-p)}\,W_N(p,n) = \frac{(1 - s_N(p,n))p - g + N\lambda p\,W_N(p_0, n-1)}{N\lambda p(1-p)},
$$
so that an integrating factor is
$$
e^{\int \frac{r+N\lambda p}{N\lambda p(1-p)}\,dp} = e^{\frac{1}{N\lambda}\int\left(\frac{r}{p} + \frac{N\lambda + r}{1-p}\right)dp} = p^{\frac{r}{N\lambda}}(1-p)^{-\left(1+\frac{r}{N\lambda}\right)}.
$$
Now from the first equality of (10) and the above solution $U_N(p,n)$,
$$
s_N(p,n)\,p = N\lambda p(1-p)\,U'_N(p,n) + (r + N\lambda p)\,U_N(p,n) - N\lambda p\,U_N(p_0, n-1)
= \frac{(N\lambda)^2 p - r^2}{N\lambda - r}\,\frac{g}{r} + N\lambda\,p^{1-\frac{r}{N\lambda}}(1-p)^{\frac{r}{N\lambda}}\,C_U - N\lambda p\,U_N(p_0, n-1).
$$
Substituting this into the above differential equation, after some algebra a general solution turns out to be
$$
W_N(p,n) = p^{-\frac{r}{N\lambda}}(1-p)^{1+\frac{r}{N\lambda}}\int q^{\frac{r}{N\lambda}}(1-q)^{-\left(1+\frac{r}{N\lambda}\right)}\,\frac{(1 - s_N(q,n))q - g + N\lambda q\,W_N(p_0, n-1)}{N\lambda q(1-q)}\,dq + p^{-\frac{r}{N\lambda}}(1-p)^{1+\frac{r}{N\lambda}}\,C_W
$$
$$
= \frac{N\lambda p}{N\lambda + r}\Big[U_N(p_0, n-1) + W_N(p_0, n-1)\Big] + \frac{p}{N\lambda + r} + \frac{2r^2 - (N\lambda)^2 + N\lambda r(1 - 2p)}{r(N\lambda - r)(N\lambda + r)}\,g + \Big[C_W(1-p) - C_U\Big]\left(\frac{1-p}{p}\right)^{\frac{r}{N\lambda}}
$$
$$
= \frac{N\lambda p}{N\lambda + r}\Big[U_N(p_0, n-1) + W_N(p_0, n-1)\Big] + f(p) + \Big[C_W(1-p) - C_U\Big]\left(\frac{1-p}{p}\right)^{\frac{r}{N\lambda}},
$$
where $f(p)$ is defined as
$$
f(p) := \frac{p}{N\lambda + r} + \frac{2r^2 - (N\lambda)^2 + N\lambda r(1-2p)}{(N\lambda - r)(N\lambda + r)}\,\frac{g}{r}
= \frac{p}{N\lambda + r} - \left[\frac{N\lambda}{N\lambda + r} - \frac{2r(r - N\lambda p)}{(N\lambda)^2 - r^2}\right]\frac{g}{r}
= \left[\frac{2r(r - N\lambda p)}{(N\lambda)^2 - r^2} - \frac{gN\lambda - pr}{g(N\lambda + r)}\right]\frac{g}{r}.
$$
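The three displayed forms of $f(p)$ can be checked to coincide symbolically (a check added here, with $a := N\lambda$):

```python
# Confirm that the three forms of f(p) above are algebraically identical.
import sympy as sp

p, g, r, a = sp.symbols('p g r a', positive=True)
f1 = p / (a + r) + (2*r**2 - a**2 + a*r*(1 - 2*p)) / ((a - r) * (a + r)) * g / r
f2 = p / (a + r) - (a / (a + r) - 2*r*(r - a*p) / (a**2 - r**2)) * g / r
f3 = (2*r*(r - a*p) / (a**2 - r**2) - (g*a - p*r) / (g*(a + r))) * g / r
print(sp.simplify(f1 - f2), sp.simplify(f1 - f3))  # -> 0 0
```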

Now the constant $C_W$ is determined by the boundary condition $W_N(p_n^N, n) = 0$ ($C_U$ has already been determined above). Then the boundary condition is equivalent to
$$
C_W(1 - p_n^N)\left(\frac{1-p_n^N}{p_n^N}\right)^{\frac{r}{N\lambda}} = C_U\left(\frac{1-p_n^N}{p_n^N}\right)^{\frac{r}{N\lambda}} - f(p_n^N) - \frac{N\lambda p_n^N}{N\lambda + r}\Big[U_N(p_0, n-1) + W_N(p_0, n-1)\Big],
$$
which is equivalent to
$$
C_W(1-p) = C_U\,\frac{1-p}{1-p_n^N} - \left[f(p_n^N) + \frac{N\lambda p_n^N}{N\lambda + r}\Big(U_N(p_0, n-1) + W_N(p_0, n-1)\Big)\right]\frac{1-p}{1-p_n^N}\left(\frac{1-p_n^N}{p_n^N}\right)^{-\frac{r}{N\lambda}}.
$$
It is further equivalent to (writing $p_t$ for $p$)
$$
\Big[C_W(1-p_t) - C_U\Big]\left(\frac{1-p_t}{p_t}\right)^{\frac{r}{N\lambda}} = C_U\left[\frac{1-p_t}{1-p_n^N} - 1\right]\left(\frac{1-p_t}{p_t}\right)^{\frac{r}{N\lambda}} - f(p_n^N)\,\frac{1-p_t}{1-p_n^N}\left(\frac{p_n^N(1-p_t)}{p_t(1-p_n^N)}\right)^{\frac{r}{N\lambda}} - \frac{N\lambda p_t}{N\lambda + r}\Big[U_N(p_0, n-1) + W_N(p_0, n-1)\Big]\left(\frac{p_n^N(1-p_t)}{p_t(1-p_n^N)}\right)^{1+\frac{r}{N\lambda}}.
$$
Substituting the expression of $C_U$, the first two terms of the right-hand side in the above equation turn out, after some algebra, to be
$$
\left[1 - \frac{(1-p_t)\,p_n^N\, r(1-g)}{(1-p_n^N)(N\lambda + r)g} - \frac{r\,p_t(1-p_n^N)}{p_n^N(N\lambda - r)}\right]\left(\frac{p_n^N(1-p_t)}{p_t(1-p_n^N)}\right)^{\frac{r}{N\lambda}}\frac{g}{r}.
$$
Substituting this into the general solution, we get the expression shown in the proposition. □
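As a cross-check of the algebra (added here, not part of the original proof), the closed form $F_N(p,1)$ implied by Proposition 3.1 for the last subproblem (zero continuation value) can be compared against a direct numerical integration of the ODE, using $p_1^N = 2g$ from Appendix E:

```python
# Compare the reconstructed closed form F_N(p,1) with the ODE solution W_N(p,1)
# of the last subproblem (no continuation value), with p_1^N = 2g.
from scipy.integrate import solve_ivp

N, lam, r, g = 1, 0.3, 0.22, 0.12
a, p1 = N * lam, 2 * g

def F_closed(p):
    ratio = p1 * (1 - p) / (p * (1 - p1))
    bracket = (1 - (1 - p) * p1 * r * (1 - g) / ((1 - p1) * (a + r) * g)
                 - r * p * (1 - p1) / (p1 * (a - r)))
    f_part = (2 * r * (r - a * p) / (a**2 - r**2)
              - (g * a - p * r) / (g * (a + r)))
    return (ratio ** (r / a) * bracket + f_part) * g / r

def rhs(p, y):
    U, W = y
    dU = (g - (r - (1 - p) * a) * U) / (a * p * (1 - p))
    dW = (p - 2 * g - a * U - (r + a * p) * W) / (a * p * (1 - p))
    return [dU, dW]

p_end = 0.9
sol = solve_ivp(rhs, (p1, p_end), [0.0, 0.0], rtol=1e-10, atol=1e-12)
print(sol.y[1][-1], F_closed(p_end))  # should agree up to integration error
```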

E. Appendix: Proof of Lemma 3.2

As argued in the text, we just seek conditions for $W_N(1,1) > 0$ and $W''_N(p_1^N, 1) > 0$. From Proposition 3.1, the former is equivalent to
$$
W_N(1,1) = \left[\frac{2r(r - N\lambda)}{(N\lambda)^2 - r^2} - \frac{gN\lambda - r}{g(N\lambda + r)}\right]\frac{g}{r} = \frac{1-g}{N\lambda + r} - \frac{g}{r} > 0,
$$
so that one of the conditions in the lemma is obtained. By substituting the second equality of (10) into (11), we see
$$
W'_N(p,1) = \frac{p - 2g - N\lambda U_N(p,1) - (r + N\lambda p)\,W_N(p,1)}{N\lambda p(1-p)}.
$$
Differentiating once more and using the facts $p_1^N = 2g$ and $U_N(p_1^N,1) = W_N(p_1^N,1) = W'_N(p_1^N,1) = 0$, we end up with
$$
W''_N(p_1^N, 1) = \frac{1 - N\lambda U'_N(p_1^N,1)}{N\lambda\, p_1^N(1 - p_1^N)}.
$$
From (10) and $U_N(p_1^N,1) = 0$,
$$
N\lambda U'_N(p_1^N,1) = \frac{g}{p_1^N(1 - p_1^N)},
$$
and substituting this into the above and again noting $p_1^N = 2g$,
$$
W''_N(p_1^N, 1) = \frac{(1 - 4g)\,g}{N\lambda\left(p_1^N(1 - p_1^N)\right)^2},
$$
so that we obtain the condition $g < \frac{1}{4}$. □
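A one-line numeric spot check (added here) of the last display:

```python
# Check W'' = (1 - N*lam*U'(p1))/(N*lam*p1*(1-p1)) with N*lam*U'(p1) = g/(p1*(1-p1))
# equals (1-4g)*g / (N*lam*(p1*(1-p1))**2) at p1 = 2g.
N, lam, g = 1, 0.3, 0.12
a, p1 = N * lam, 2 * g
lhs = (1 - g / (p1 * (1 - p1))) / (a * p1 * (1 - p1))
rhs = (1 - 4 * g) * g / (a * (p1 * (1 - p1)) ** 2)
print(lhs, rhs)  # equal; positive exactly when g < 1/4
```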

References

[1] Akcigit, U., and Q. Liu (2011): "The Role of Information in Competitive Experimentation," mimeo.

[2] Bergemann, D., and U. Hege (2005): "The Financing of Innovation: Learning and Stopping," RAND Journal of Economics, 36, 719-752.

[3] Bolton, P., and C. Harris (1999): "Strategic Experimentation," Econometrica, 67, 349-374.

[4] Bonatti, A., and J. Hörner (2011): "Collaborating," American Economic Review, 101, 632-663.

[5] Hopenhayn, H., and F. Squintani (2011): "Preemption Games with Private Information," Review of Economic Studies, 78, 667-692.

[6] Hörner, J., and L. Samuelson (2012): "Incentives for Experimenting Agents," mimeo.

[7] Ishiguro, S. (2004): "Collusion and Discrimination in Organizations," Journal of Economic Theory, 116, 357-369.

[8] Jullien, B. (2011): "Competition in Multi-Sided Markets: Divide and Conquer," American Economic Journal: Microeconomics, 3, 186-219.

[9] Keller, G., and S. Rady (2010): "Strategic Experimentation with Poisson Bandits," Theoretical Economics, 5, 275-311.

[10] Keller, G., S. Rady, and M. Cripps (2005): "Strategic Experimentation with Exponential Bandits," Econometrica, 73, 39-68.

[11] Klein, N., and S. Rady (2011): "Negatively Correlated Bandits," Review of Economic Studies, 78, 693-732.

[12] Mason, R., and J. Välimäki (2011): "Dynamic Moral Hazard and Stopping," mimeo.

[13] Murto, P., and J. Välimäki (2010): "Learning and Information Aggregation in an Exit Game," mimeo.

[14] Rosenberg, D., E. Solan, and N. Vieille (2007): "Social Learning in One-Arm Bandit Problems," Econometrica, 75, 1591-1611.

[15] Rothschild, M. (1974): "A Two-Armed Bandit Theory of Market Pricing," Journal of Economic Theory, 9, 185-202.

[16] Segal, I. (2003): "Coordination and Discrimination in Contracting with Externalities: Divide and Conquer?," Journal of Economic Theory, 113, 147-181.


[Figure 1: Principal's Payoff. W(p) against the prior p, for N = 1 and N = 2.]

[Figure 2: Dynamic surplus. S(p) for "N = 1", "N = 2 (Surplus)", and "N = 2 (Total)".]

[Figure 3: Agent's Payoff. U(p) for N = 1 and N = 2.]

[Figure 4: Principal's Payoff. W(p) for N = 1 and N = 2, high cost case.]

[Figure 5: Dynamic surplus. S(p) for "N = 1", "N = 2 (Surplus)", and "N = 2 (Total)", high cost case.]

[Figure 6: Agent's Payoff. U(p) for N = 1 and N = 2, high cost case.]

[Figure 7: More generally optimal N. Value against the prior p, for N = 1, 2, 3, 4.]
