Frequent Actions with Infrequent Coordination

David Rahman∗
University of Minnesota

September 23, 2013

Abstract. I study repeated games with mediated communication and frequent actions. I derive a Folk Theorem with imperfect monitoring assuming a conditional form of individual identifiability. Even in the limit, when noise is driven by Brownian motion and actions are arbitrarily frequent, as long as players are sufficiently patient they can attain virtually efficient equilibrium outcomes, in two ways: secret monitoring and infrequent coordination. Players follow private strategies over discrete blocks of time. A mediator constructs latent Brownian motions to score players on the basis of others' secret monitoring, and gives incentives with these variables at the end of each block, to improve information aggregation. This brings together the work on repeated games in discrete and continuous time in that, despite actions being continuous, strategic coordination is endogenously discrete. As an application, I show how (unconditional) individual identifiability is necessary and sufficient for the Folk Theorem in the Prisoners' Dilemma.

JEL Classification: D21, D23, D82.
Keywords: mutual cooperation, mediated communication, frequent actions.



∗I owe many thanks to David Levine and Joe Ostroy for invaluable comments, the European University Institute for their hospitality during the Fall of 2012 and the National Science Foundation for financial support through Grant No. SES 09-22253.

Contents

1 Introduction
2 Prisoners' Dilemma
  2.1 Secret Monitoring
  2.2 Infrequent Coordination
3 Assumptions
  3.1 Payoffs
  3.2 Probabilities
  3.3 Drifts
4 Incentive Compatibility
  4.1 Scoring Rules
  4.2 Punishment Schemes
  4.3 Reward Schemes
5 Folk Theorem
  5.1 Preliminaries
  5.2 Main Result
6 Discussion
  6.1 Private Monitoring and Private Information
  6.2 Social Incentives
  6.3 Other Signal Structures
  6.4 Discrete Time
  6.5 Dispensing with the Mediator
7 Conclusion
A Proofs
References
Supplementary Material
B Proof of Lemma 1
C Communication Equilibrium
  C.1 Public versus Private Equilibrium
  C.2 T-Public Equilibrium

1 Introduction

Firms, partners and household members reach complex, dynamic, often informal agreements regarding behavior, coordination and incentives. A basic facet of these relationships involves managing information amongst interested parties. For instance, firms sometimes form trade associations, ostensibly to foster collaboration through regular meetings, as well as standardization. On numerous occasions, these associations have also played the role of information management institutions.1

This paper studies how such institutions can facilitate mutual cooperation in terms of two canonical channels: (i) they allow players to secretly monitor each other, hence only occasionally, which can yield substantial reductions in monitoring costs, and (ii) when incentives require efficiency losses beyond monitoring itself, players can aggregate information better by coordinating infrequently.

To better understand such institutions and practices, I consider mediated communication and frequent actions. Players follow mediated strategies (Forges, 1986; Myerson, 1986), a plausible generalization of private strategies that accommodates unlimited communication possibilities. By the revelation principle, a mediated strategy can be summarized as follows. Players engage in confidential, non-binding communication with a so-called mediator,2 who makes behavioral recommendations that may depend on previous observations. Everyone follows the mediator's recommendations in communication equilibrium.

Actions are also frequent, where a player's information converges to Brownian motion, for two reasons. First, as Sannikov and Skrzypacz (2010, p. 871) argued, abstracting from the friction of a fixed period length "uncover[s] fundamental principles of [. . .] repeated interactions." In the model, this abstraction disciplines significantly how information can be reliably aggregated.3 Second, to underscore the value of communication, the impossibility results of Sannikov and Skrzypacz (2007, 2010) are overturned simply and intuitively.

Communication is not new to repeated games (e.g., Compte, 1998; Kandori and Matsushima, 1998; Kandori, 2003; Aoyagi, 2005; Obara, 2009; Tomala, 2009). The equilibria from all of these papers fail with frequent actions. Similarly motivated, Harrington and Skrzypacz (2011) derive collusive equilibria with private monitoring that fail with frequent actions, too, and, being in pure strategies, cannot explain secret monitoring or infrequent coordination. Others study communication via actions (Ely et al., 2005; Hörner and Olszewski, 2006; Kandori and Obara, 2006; Sugaya, 2010), but these also fail with frequent actions.

1. A recent discussion of such institutions as well as a detailed description of their specific information management role can be found in Marshall and Marx (2012, Chapter 6), for instance.
2. In principle, this disinterested party can be a machine, a not-necessarily-public randomization device.
3. Rahman (2013) illustrates this point in a simpler context. It shows that the information aggregation approach of Abreu et al. (1990a), hence of Compte (1998), too, fails with frequent actions. Moreover, a version of Kandori and Matsushima (1998) yields limited success, generally bounded away from efficiency.


The results of this paper are important not just intrinsically, but also because they qualify recent work on games with frequent actions (Sannikov, 2007; Sannikov and Skrzypacz, 2007, 2010; Faingold and Sannikov, 2011; Fudenberg and Levine, 2007, 2009) with a wide range of applications. A key result of this literature is that when information is driven by Brownian motion, "value-burning" is not feasible, and can collapse equilibrium outcomes to the repetition of static equilibria. Perhaps most notably, Sannikov and Skrzypacz (2007) argued that collusion is impossible in oligopoly with flexible production. These papers restrict attention to Nash equilibria in public strategies.4 Although there is precedent for this solution concept,5 to classify channels (i) and (ii), in this paper I broaden the notion of equilibrium towards dynamic mechanism design in two steps (see Table 1 below): (i) from public Nash equilibrium to public communication equilibrium, and (ii) from public communication equilibrium to private communication equilibrium.

In public Nash equilibrium, strategies only depend on public information. After a mixed strategy, continuation play cannot vary with the mixture's realization: incentives must be independent of actual behavior. In public communication equilibrium, the mediator can keep others' recommendations secret for one period, however brief, and condition continuation play on these recommendations as well as any other relevant information. Now incentives can depend on whether a player was secretly monitored. As a result, players can be monitored only seldom, hence at lower monitoring costs. As Rahman (2012b) shows, this restores full collusion in Sannikov and Skrzypacz's (2007) model of oligopoly.

              Nash          Communication
   Public     Literature    Lemma 1
   Private                  Lemma 2

          Table 1: Equilibrium with Frequent Actions

In public communication equilibrium, coordination is so frequent that nobody's private information is useful for more than a period. This may be socially costly, especially in games with frequent actions where a period can be as brief as a nanosecond. Although public communication equilibrium helps to reduce monitoring costs, it can stop short of avoiding the additional costs of value-burning: if, after a deviation, it is impossible to (statistically) identify the culprit, discouraging it requires punishing everyone. With frequent actions and Brownian information, a deviation's noise-to-signal ratio explodes, rendering this punishment infeasible. In private equilibrium, though, even if actions vary frequently, players can temper the noise-to-signal ratio by aggregating private information and synchronizing their histories infrequently, say once a week instead of every nanosecond.

4. Recall that a strategy is public if it only depends on public histories, otherwise it is private.
5. E.g., Abreu et al. (1990b) equate their payoffs to those from pure strategy sequential equilibria.


The mediator can manage information as follows. First, he decides on the length of time c during which players' histories may diverge. Each period, he makes secret recommendations, records public signal realizations, and constructs a latent variable Yit for each player i that follows an independent Brownian motion. The drift of Yi depends on the mediator's recommendations as well as the public signal. Players do not observe the mediator's recommendations to others except at the end of each block, when these are publicly announced together with the latent variables, and another block of length c begins afresh.

Assuming conditional identifiability, these latent variables are driftless when players are obedient, but drift downwards when someone disobeys. Everyone is subject to a cutoff that grows linearly in c (so longer blocks have less likely cutoffs). If Yic is below the cutoff, player i is punished with a loss of continuation value. (I also construct reward schemes with the opposite effect.) A simple cutoff is the lowest possible drift arising from a deviation. This guarantees convexity of punishments, which simplifies the analysis of incentives. However, other cutoffs can motivate cooperation. Punishments turn out to be (approximately) linear in players' discount rate r > 0, so they satisfy local self-decomposability, which easily yields a Folk Theorem. In the Prisoners' Dilemma, I use this construction to show that it is always possible to mediate cooperation, except when the signal's drift is completely uninformative.

The main result of the paper is a "Nash-threats" Folk Theorem (except that threats are actually correlated equilibria), which argues that information management can sustain cooperation with frequent actions even with value-burning, as players become unboundedly patient. To this end, I make mostly technical assumptions to easily extend the algorithm of Fudenberg and Levine (1994) to "T-public" communication equilibria.
I make assumptions on payoffs for a Folk Theorem as in Compte (1998, Theorem 2) without pairwise identifiability, but with important departures. One key difference is that the scoring rule of Abreu et al. (1990a), on which he relies, fails with frequent actions (see Rahman, 2013). Their scoring rule is too lenient, because it does not trigger punishment sufficiently often. On the other hand, if punishment is too likely, too much value-burning might occur. I adopt an intermediate approach. In terms of probabilities, I assume a conditional form of individual identifiability (not in the private monitoring case, see Section 6.1), entirely consistent with the possibility that identifying the deviator is statistically impossible. To overturn the result of Sannikov and Skrzypacz (2010), I only give incentives with individual punishments and rewards, that is, I only allow for incentives with value-burning.

I make three more claims. First, players can manage information by mediating their coordination using equilibria with strict incentives, unlike belief-free equilibria. This overcomes Bhaskar's (2000) critique. Conversely, it is possible to dispense with a mediator if players can communicate their intended behavior publicly and with delay (Section 6.5), albeit at the cost of strict incentives, since now players' intentions must be solicited.

Second, even if players interact frequently, they can still interact discretely in terms of strategic coordination. Thus, for all intents and purposes, the game is effectively discrete. Finally, I find a new ambiguity in the continuous-time limit of a family of repeated games in discrete time, complementing Fudenberg and Levine (2007). In public equilibrium, they showed that limiting payoffs depend on how information converges to Brownian motion. To address their issue, I focus on binomial random walks, which has costs and benefits. On the cost side, it becomes more difficult for the mediator to detect deviations than, say, if there were more signals every period. On the other hand, it is now more difficult for players to infer what others played, vindicating information delay.

2 Prisoners' Dilemma

Example 1. Two-player repeated Prisoners' Dilemma with imperfect public monitoring.

        Payoffs                    Drifts
          C       D                  C     D
   C    1, 1   −1, 2          C     x2    x1
   D   2, −1    0, 0          D     x1    x0

The left bi-matrix shows flow payoffs as a function of players' actions. The right matrix shows the drift of a publicly observed Brownian motion X, with law dXt = x(at)dt + dWt, where at is the action profile at time t and W is a Wiener process. Because volatility is effectively observable, and in line with the literature, assume that actions do not affect the volatility of X. Moreover, assume that the drift of X only depends on the number of cooperators, not on a cooperator's identity.6

Consider discrete-time approximations of this game, where the time between interactions is ∆t > 0, and players have a common discount factor δ = e^{−r∆t} with r > 0. In each period, players observe a random walk that converges to the Brownian motion above as ∆t → 0:

   Xt = Xt−∆t + √∆t  with probability p(at) = (1/2)[1 + x(at)√∆t],
   Xt = Xt−∆t − √∆t  with probability q(at) = 1 − p(at).

The choice of a binomial random walk is not for simplicity: binary per-period signals make incentive provision more challenging in discrete time (Fudenberg et al., 1994) and in the continuous-time limit (Fudenberg and Levine, 2009). So much so, in fact, that welfare is bounded above by 1 in public Nash equilibrium, as Sannikov and Skrzypacz (2010) show. Contrariwise, I estimate the set of communication equilibrium payoffs of this game as ∆t → 0, and then as r → 0, with the following necessary and sufficient condition for a Folk Theorem, meaning that every non-negative payoff is virtually attainable in equilibrium.

6. This is not for simplicity, but for non-triviality: otherwise, deviators can be statistically identified (i.e., there is pairwise identifiability), yielding a Folk Theorem (Sannikov and Skrzypacz, 2010, Proposition 1).
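The convergence claim behind this approximation can be sanity-checked with a small simulation. The sketch below is not part of the paper; parameter values (drift 1, horizon 1, ∆t = 0.01) are illustrative. It simulates the binomial random walk and checks that the terminal mean and variance match the Brownian limit, x·t and t.

```python
import random

def simulate_X(x_drift, dt, horizon, seed=0):
    """Simulate the binomial random-walk approximation of dX = x dt + dW:
    each period X jumps by +sqrt(dt) w.p. (1 + x*sqrt(dt))/2, else by -sqrt(dt)."""
    rng = random.Random(seed)
    sd = dt ** 0.5
    p = 0.5 * (1.0 + x_drift * sd)
    X, steps = 0.0, round(horizon / dt)
    for _ in range(steps):
        X += sd if rng.random() < p else -sd
    return X

# With drift x = 1 over horizon t = 1, the limit gives E[X_1] = 1, Var[X_1] ~ 1.
paths = [simulate_X(1.0, 0.01, 1.0, seed=s) for s in range(1000)]
mean_X = sum(paths) / len(paths)
var_X = sum((v - mean_X) ** 2 for v in paths) / len(paths)
```

Each period contributes mean x·∆t and variance approximately ∆t, which is why the per-period jump size must scale as √∆t.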


Theorem 1. The Folk Theorem fails if and only if x0 = x1 = x2.

Theorem 1 gives very weak conditions for a Folk Theorem: as long as x0 ≠ x1 or x1 ≠ x2, it is possible to motivate mutual cooperation as players become patient. Intuitively, this condition says simply that for every unilateral deviation there exists an action profile that makes it statistically detectable, even if the identity of the deviator is not. First of all, necessity is immediate: if x0 = x1 = x2 then clearly there is no hope for cooperation, since defecting is completely undetectable, no matter what anybody does. To argue sufficiency, consider two parametric cases corresponding to channels (i) and (ii) above. First, assume that the drift x is monotone in the number of cooperators, to obtain a Folk Theorem in public communication equilibrium that underscores the gains from secret monitoring (Lemma 1). Next, a Folk Theorem in private communication equilibrium is derived when monotonicity fails that highlights the value of infrequent coordination (Lemma 2).

2.1 Secret Monitoring

If x is increasing in the number of cooperators, the secret principal contract of Rahman and Obara (2010) delivers a Folk Theorem with public communication equilibria, as follows.

Lemma 1. The Folk Theorem holds in public communication equilibrium if x2 ≥ x1 > x0.7

Proof Sketch. A complete proof appears in Appendix B. Here, I argue that virtually full cooperation is sustainable. Given µ ∈ (0, 1), with probability 1 − µ a mediator secretly recommends cooperation (and defection with probability µ) independently to each player. Recommendations are publicly announced at the end of each period, after X is realized. Current-period expected payoff is 1 − µ. By Theorem 4.1 of Fudenberg et al. (1994), it suffices to show that this µ is enforceable with respect to tangent hyperplanes, i.e., there exist budget-balanced transfers that make µ incentive compatible. Let continuation values change with the recommendation profile (row: player 1, column: player 2) depending on whether X jumps up (labeled +1) or down (labeled −1) as follows:

   After +1:                    After −1:
          C        D                   C        D
   C    0, 0    +w, −w          C    0, 0    −w, +w
   D   −w, +w    0, 0           D   +w, −w    0, 0

        Change in continuation values after +1 and −1

If the mediator asks a player to cooperate, incentive compatibility requires that

   (1 − δ)(1 − 2µ) + δµw(p1 − q1) ≥ (1 − δ)2(1 − µ) + δµw(p0 − q0).

Rearranging and substituting for p and q yields, equivalently, δµw(x1 − x0)√∆t ≥ 1 − δ.

7. Contrast this with Sannikov and Skrzypacz (2010, Example 1), where (x0, x1, x2) = (1, 5, 8).


Since δ = e^{−r∆t} and 1 − δ ≤ r∆t, this follows from the inequality

   δµw ≥ r√∆t / (x1 − x0).    (1)
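For concreteness, both incentive constraints can be verified numerically. The sketch below is not from the paper and the parameter values are illustrative; it sets the transfer w so that (1) binds and checks the constraints for a player told to cooperate and a player told to defect.

```python
import math

# Illustrative parameters (assumed, not from the paper): monotone drifts.
x0, x1, x2 = 0.0, 1.0, 2.0   # x2 >= x1 > x0
r, dt, mu = 0.1, 0.01, 0.05

delta = math.exp(-r * dt)
sd = math.sqrt(dt)
p = lambda x: 0.5 * (1 + x * sd)
q = lambda x: 1 - p(x)
p0, p1, p2 = p(x0), p(x1), p(x2)
q0, q1, q2 = q(x0), q(x1), q(x2)

# Transfer making (1) bind: delta*mu*w = r*sqrt(dt)/(x1 - x0).
w = r * sd / (delta * mu * (x1 - x0))

# IC for a player asked to cooperate: obey (lhs) vs defect (rhs).
lhs_C = (1 - delta) * (1 - 2 * mu) + delta * mu * w * (p1 - q1)
rhs_C = (1 - delta) * 2 * (1 - mu) + delta * mu * w * (p0 - q0)

# IC for a player asked to defect: obey (lhs) vs cooperate (rhs).
lhs_D = (1 - delta) * 2 * (1 - mu) + delta * w * (1 - mu) * (q1 - p1)
rhs_D = (1 - delta) * (1 - 2 * mu) + delta * w * (1 - mu) * (q2 - p2)
```

Since 1 − δ ≤ r∆t, the cooperation constraint holds with a small slack exactly as the derivation of (1) predicts, while the defection constraint is slack whenever x2 > x1.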

Similarly, for a player asked to defect, incentive compatibility requires that

   (1 − δ)2(1 − µ) + δw(1 − µ)(q1 − p1) ≥ (1 − δ)(1 − 2µ) + δw(1 − µ)(q2 − p2),

which, rearranging as in the previous derivation, follows from w ≥ 0 and hence is implied by (1). Intuitively, cooperating when asked to defect lowers utility this period and the probability of a down jump next period, since x2 > x1. Choose w such that (1) holds with equality, so w is increasing in both r and ∆t, (almost) linear in r, and the correlated strategy above is enforceable in the welfare direction (1, 1). Given a smooth set in the interior of the feasible, individually rational payoffs, its boundary point with tangent vector (1, 1) exhibits local self-decomposability (LSD) for some (r, ∆t) by Theorem 4.1 of Fudenberg et al. (1994). If w yields LSD at (r, ∆) as above then so does w√(∆t/∆) at (r, ∆t) for any ∆t < ∆. Finally, w → 0 at rate r as r → 0; the Folk Theorem only requires convergence faster than √r. See Figure 1 below for geometric intuition. □

[Figure 1: Frequent-actions versus discrete-time Folk Theorem. The figure sketches a smooth, locally self-decomposable set with boundary proportional to r∆t, plotting current-period payoff against expected next-period payoff and average lifetime utility, with transfers of size w√(∆t/∆) and scales (1 − δ)/δ ≈ r∆ and Cr√∆.]

In the proof of Lemma 1, x2 ≥ x1 > x0 is used to establish enforceability with respect to every tangent hyperplane. Reversing the roles of up and down jumps, the same proof follows verbatim if x0 > x1 ≥ x2. This monotonicity corresponds to first-order stochastic dominance in the literature on oligopoly with noisy prices, from Green and Porter (1984) and Abreu et al. (1986) to Sannikov and Skrzypacz (2007) and Harrington and Skrzypacz (2011). Monotonicity permits identifying obedient agents (Rahman and Obara, 2010), and assigning continuation values that avoid "value-burning" (Sannikov and Skrzypacz, 2010). To see why, let x2 ≥ x1 ≯ x0 (but not x0 = x1 = x2) and consider the following profile of deviations: a player asked to cooperate defects with probability α, whereas if asked to defect

he cooperates with probability β such that 0 ≤ α, β ≤ 1 and β/α = (p0 −p1 )/(p2 −p1 ).8 For any correlated strategy, it is impossible to identify the deviator after a unilateral deviation.9 As a result, Lemma 1 generally fails: since it could have been anyone, every player must be punished on-path. This leads to inefficiency, even as players become patient. In other words, value-burning is unavoidable without monotonicity.
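The indistinguishability claim above can be checked directly. The sketch below is not from the paper; it uses the footnote's example x2 = x0 = 1 and x1 = 0 (so α = β = 1), and an illustrative ∆t, to confirm that at the profile (C, D) both players' deviations shift the signal distribution identically.

```python
import math

# Footnote's example: x2 = x0 = 1, x1 = 0; dt is an illustrative value.
dt = 0.01
sd = math.sqrt(dt)
p = lambda x: 0.5 * (1 + x * sd)
x_of = {0: 1.0, 1: 0.0, 2: 1.0}   # drift by number of cooperators
p0, p1, p2 = p(x_of[0]), p(x_of[1]), p(x_of[2])

alpha = 1.0                             # defect when told C
beta = alpha * (p0 - p1) / (p2 - p1)    # cooperate when told D

# Change in Pr(+1) at recommendation profile (C, D):
change_player1 = alpha * (p0 - p1)   # player 1 (told C) defects -> (D, D)
change_player2 = beta * (p2 - p1)    # player 2 (told D) cooperates -> (C, C)
```

Since the two changes coincide at every recommendation profile, no statistic of the public signal can reveal which player deviated.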

2.2 Infrequent Coordination

Such inefficiency will now be overcome by manipulating the arrival of recommendations, yielding a Folk Theorem in private communication equilibrium assuming that x satisfies

   x2 − x1 ≠ x1 − x0,  or, equivalently,  x0 + x2 ≠ 2x1.    (2)

8. For instance, if x2 = x0 = 1 and x1 = 0, let α = 1 and β = 1: disobedience with probability one.
9. Since deviations are identical, a deviator cannot be identified from (C, C) or (D, D). Given (C, D), if player 1 defects then the distribution of signals changes by α[(p0, q0) − (p1, q1)], just as for player 2, since β[(p2, q2) − (p1, q1)] = α[(p0, q0) − (p1, q1)]. By symmetry, the claim now follows.

The only vectors x excluded by (2) exhibit x2 − x1 = x1 − x0. If x2 − x1 = x1 − x0 > 0 then x2 > x1 > x0, and by Lemma 1 a Folk Theorem obtains. Similarly, a Folk Theorem holds if x2 − x1 = x1 − x0 < 0 by reversing the roles of up and down jumps. Hence, the Folk Theorem fails only if x2 − x1 = x1 − x0 = 0, proving Theorem 1.

Let us construct T-period blocks such that T = ⌊c/∆t⌋, where c is a constant representing the length of calendar time of each block. To sustain a nearly efficient payoff, players face a punishment scheme that depends on both the mediator's secret recommendations over the T-period block and the public signals. At the end of the T-period block all previous recommendations are made public to solve for T-public communication equilibria. This delays the arrival of inefficient punishments and increases the number of signals contingent on which to trigger said punishments, which tempers the noise-to-signal ratio.

Lemma 2. The Folk Theorem also holds if x2 − x1 ≠ x1 − x0.

Proof Sketch. This result follows from Theorem 2 below, proved in Appendix A. Here I argue that virtually full cooperation is sustainable. Without loss, let x0 + x2 > 2x1. The case x0 + x2 < 2x1 is handled similarly by reversing the roles of up and down jumps. The mediator generates a latent variable Yi for each player i that follows a random walk representation of Brownian motion, with drift determined by the mediator's recommendations and the public signals. Players do not observe these latent variables except every c units of time. The mediator secretly recommends cooperation to each player with probability 1 − µ and defection with probability µ, for some small µ > 0. Players observe only their own recommendations. Recommendations are IID across players and time throughout each block. At time t, the mediator makes a profile of recommendations at, each player takes a

possibly different action, and the publicly observed variable Xt realizes. Let ωt = +1 if Xt jumps up and −1 if it jumps down. Take the first block of length c. The latent variable Yi starts at Yi0 = 0. After (at, ωt) realizes, the mediator performs a secret Bernoulli trial with failure probability ξi(at, ωt). The law of motion of Yi is

   Yit = Yit−∆t + √∆t  with probability ζi(at, ωt) = 1 − ξi(at, ωt),
   Yit = Yit−∆t − √∆t  with probability ξi(at, ωt),

where ξ is a scoring rule that determines the failure rate of Y such that deviations increase failure rates, even after obedience failure is still possible, and ξ implements the correlated strategy defined by µ. The following scoring rule works: ξi(a, ω) = 1/2 for all (i, a, ω) except

   ξi(Ci, C−i, +1) = (1/2)[1 − (µ/(1 − µ))(p1/p2)]  and  ξi(Ci, D−i, +1) = 1,

where µ > 0 and ∆t > 0 are small enough that ξi(a, ω) ∈ [0, 1] is a probability. Observe that ξi(Ci, D−i, +1) > 1/2 > ξi(Ci, C−i, +1): an increase in Xt increases the failure rate if i's opponent was recommended to defect and lowers it otherwise. Notice that, on the path of play, player i's failure rate conditional on his information equals 1/2, so the latent variable Yi follows a random walk without drift. Indeed, after i was recommended Di or X jumped down this is clear by construction of ξ. Moreover, the probability that Yi jumps down given that X jumped up, i was recommended Ci and he obeyed equals

   {(1 − µ)(1/2)[1 − (µ/(1 − µ))(p1/p2)]p2 + µp1} / [(1 − µ)p2 + µp1] = (1/2)[(1 − µ)p2 + µp1] / [(1 − µ)p2 + µp1] = 1/2.

If player i defected when asked to cooperate, this conditional probability increases to

   π∗∗ = {(1 − µ)(1/2)[1 − (µ/(1 − µ))(p1/p2)]p1 + µp0} / [(1 − µ)p1 + µp0] = (1/2)[1 + µ((p0p2 − p1²)/p2) / ((1 − µ)p1 + µp0)].

As ∆t → 0, since p2 ≈ (1 − µ)p1 + µp0 ≈ 1/2, this conditional probability is close to (1/2)[1 + µ(x0 + x2 − 2x1)√∆t]. Hence, the largest possible conditional failure drift is

   z∗∗ = µ(x0 + x2 − 2x1).
Unconditionally, that is, before the realization of X, the probability of failure after such a deviation equals

   π∗ = (1 − µ)(1/2){[1 − (µ/(1 − µ))(p1/p2)]p1 + q1} + µ(p0 + (1/2)q0) = (1/2)[1 + µ(p0p2 − p1²)/p2].

Hence, the unconditional, or prior, failure drift is similarly derived from π∗ to be

   z∗ = (1/2)µ(x0 + x2 − 2x1).
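Two key properties of the scoring rule, a driftless score on the path of play and the closed form for π∗∗, can be verified numerically. The sketch below is not from the paper; the parameters are illustrative (any x with x0 + x2 > 2x1 works).

```python
import math

# Illustrative parameters with x0 + x2 > 2*x1, as assumed in Lemma 2.
x0, x1, x2 = 1.0, 0.0, 1.5
mu, dt = 0.1, 0.0001
sd = math.sqrt(dt)
p = lambda x: 0.5 * (1 + x * sd)
p0, p1, p2 = p(x0), p(x1), p(x2)

# Scoring rule: xi = 1/2 everywhere except the two cells below.
xi_CC_up = 0.5 * (1 - (mu / (1 - mu)) * (p1 / p2))
xi_CD_up = 1.0

# Failure rate given X jumped up, i was recommended C and obeyed:
on_path = ((1 - mu) * p2 * xi_CC_up + mu * p1 * xi_CD_up) \
          / ((1 - mu) * p2 + mu * p1)

# Failure rate if i defected when told C, and its closed form:
pi_ss = ((1 - mu) * p1 * xi_CC_up + mu * p0 * xi_CD_up) \
        / ((1 - mu) * p1 + mu * p0)
closed_form = 0.5 * (1 + mu * ((p0 * p2 - p1 ** 2) / p2)
                     / ((1 - mu) * p1 + mu * p0))
```

On path the numerator collapses to half the denominator, so the score is driftless; under a deviation the failure rate strictly exceeds 1/2 whenever p0p2 > p1².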

At the end of the block, that is, at time c, the mediator publicly reveals the entire path of recommendations as well as each Yi, and the next block begins afresh. If the value of Yi is below some threshold Y∗∗ < 0 at time c, then player i pays a penalty w of continuation value. For fixed ∆t > 0, if Yi exhibited t∗∗ = ⌊(1 − π∗∗)(T − 1)⌋ successes or fewer along the block then punishment ensues. The threshold Y∗∗ is related to t∗∗ through the random walk representation.10 As ∆t → 0, the threshold is linear in the block length c:

   Y∗∗ → −z∗∗c  as  ∆t → 0.
[Figure 2: Linear threshold (red) versus latent variable (blue) over time]

In equilibrium, Yi is a driftless Brownian motion, so Yic − Yi0 = Yic ∼ N(0, c) and i is punished with probability 1 − Φ(z∗∗√c), where Φ is the standard normal CDF. Player i only affects the drift of Yi by deviating, hence the mean of Yic. If player i is asked to cooperate in the first period but defects instead, he is punished with probability 1 − Φ((z∗∗c − z∗∆t)/√c). To discourage this single deviation, punishment costs must outweigh deviation gains:

   1 − e^{−r∆t} ≤ e^{−rc} w [Φ(z∗∗√c) − Φ((z∗∗c − z∗∆t)/√c)].    (3)

Since 1 − e^{−x} ≤ x for x ≥ 0, the left-hand side is estimated by r∆t, and since Φ(z) is concave for z ≥ 0, the right-hand side is estimated by e^{−rc} w φ(z∗∗√c) z∗∆t/√c. Substituting these estimates, multiplying both sides by c/∆t and rearranging, (3) follows from

   w ≥ rce^{rc} / [φ(z∗∗√c) z∗√c].

If w makes this inequality bind then every deviation is discouraged. To see this, pick any partial history of behavior and observations, h_i^t. Relative to obedience, the mean of Yic may have decreased by some amount, say θ. By construction of ξ, however, θ cannot exceed z∗∗c, the latter being c times the largest possible conditional failure drift. With the same logic as for (3), a single deviation after h_i^t is discouraged if w ≥ rce^{rc} / [φ((z∗∗c − θ)/√c) z∗√c]. This clearly holds since φ(z∗∗√c) ≤ φ((z∗∗c − θ)/√c) for all θ ∈ [0, z∗∗c], so at any partial history, every one-step deviation is unprofitable. Hence, every deviation is discouraged.

Since w is approximately linear in r for every c, it diminishes faster than √r as r → 0, so punishments are self-decomposable if players are patient. Each player's lifetime expected utility solves v = (1 − e^{−rc})(1 − µ) + e^{−rc}[Φ(z∗∗√c)v + (1 − Φ(z∗∗√c))(v − w)], so

   v = 1 − µ − [(1 − Φ(z∗∗√c)) / (1 − e^{−rc})] · rc / [φ(z∗∗√c) z∗√c].

Finally, as r → 0, this holds for every c, and rc/(1 − e^{−rc}) → 1. As c → ∞, the normal hazard rate explodes linearly, so v → 1 − µ, and, since µ > 0 was arbitrary, v → 1. □

Mutual cooperation requires so little in Theorem 1 because players just have two actions. With more actions, more is required for a Folk Theorem, but the strategic flavor is similar. On the other hand, with private monitoring (and full support) the requirements for a Folk Theorem are much weaker, reducing to individual identifiability. See Section 6.

The threshold Y∗∗ above is clearly not the only one that can induce mutual cooperation. I chose it to yield a particularly transparent derivation of incentive compatibility. Finding optimal thresholds and strategies is an interesting open question for further research.

The mediator may be dispensable. Instead of taking recommendations, players may simply report their intended action before playing it. This is reminiscent of but not identical to Kandori's (2003) contracts. To reveal his intentions, a player must be indifferent over all reports. Nevertheless, Section 6 shows that reporting intentions can sometimes, yet decidedly not always, be a good substitute for a mediator. Finally, even if a mediator cannot be dispensed with, it may be possible to decentralize it with plain conversation, as argued by Forges (1986, 1990). She required four or more players, though.

10. Roughly, Yic equals Y∗∗ after t∗∗ jumps up and T − t∗∗ jumps down, each jump having length √∆t. Thus, Y∗∗ ≈ t∗∗√∆t − (T − t∗∗)√∆t ≈ (1 − 2π∗∗)T√∆t.
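The closing limit argument can be illustrated numerically. The sketch below is not from the paper: it evaluates the lifetime-utility formula with the factor rc/(1 − e^{−rc}) replaced by its r → 0 limit of 1, under illustrative parameter values, and shows that longer blocks push v toward 1 − µ.

```python
import math

# Illustrative parameters (assumed): drifts with x0 + x2 > 2*x1.
x0, x1, x2 = 1.0, 0.0, 1.5
mu = 0.05
z_ss = mu * (x0 + x2 - 2 * x1)        # largest conditional failure drift z**
z_s = 0.5 * mu * (x0 + x2 - 2 * x1)   # prior failure drift z*

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def tail(z):
    """1 - Phi(z), via erfc to avoid cancellation deep in the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def v_patient(c):
    """Lifetime utility in the r -> 0 limit, where rc/(1 - e^{-rc}) -> 1:
    v = 1 - mu - (1 - Phi(z** sqrt(c))) / (phi(z** sqrt(c)) * z* * sqrt(c))."""
    sc = math.sqrt(c)
    return 1 - mu - tail(z_ss * sc) / (phi(z_ss * sc) * z_s * sc)

v_short, v_long = v_patient(10.0), v_patient(10000.0)
```

Because the normal hazard rate grows linearly in z∗∗√c, the efficiency loss from punishment vanishes as the block length c grows, as the proof asserts.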

3 Assumptions

3.1 Payoffs

Consider a repeated game with public monitoring in discrete time. Each stage of the game is indexed by τ ∈ {1, 2, . . .}, with ∆t > 0 the length of time between stages. The calendar date of each stage is t ∈ {∆t, 2∆t, . . .}. The stage game is repeated every period; it consists of a finite set I = {1, . . . , n} of players, a finite set Ai of actions for each player i ∈ I, where A = Πi Ai, and a function u : I × A → R, where ui(a) denotes the utility flow to player i from action profile a. Players have a common discount factor δ = e^{−r∆t}, where r > 0. Given a sequence of action profiles a∞ = (a1, a2, . . .), the overall utility to player i is

   (1 − δ) Σ_{τ=1}^{∞} δ^{τ−1} ui(aτ).

Let U = {u(µ) = Σa u(a)µ(a) : µ ∈ ∆(A)} be the convex hull of stage-game payoff vectors. Given a correlated strategy µ ∈ ∆(A), let

   ūi(µ) = max_{σi : Ai → ∆(Ai)} Σ_{(a, bi)} ui(bi, a−i) µ(a) σi(bi|ai),

and write ū(µ) = (ū1(µ), . . . , ūn(µ)). Player i's correlated minmax value is given by

   u̲i = min_{µ ∈ ∆(A)} ūi(µ).

Write u̲ = (u̲1, . . . , u̲n) for the vector of such values across all players. The set of feasible, individually rational payoffs is denoted by U∗ = {u ∈ U : u ≥ u̲}. I make the following standard assumptions on payoffs.

Assumption 1 (Payoffs). (a) The set U∗ has dimension n. (b) The stage game has a strictly inefficient correlated equilibrium, with payoff profile u0. Let U0 = {u ∈ U : u ≥ u0}.

This is standard. Assumption 1(a) says that in principle players can be punished individually and 1(b) that there is room for everyone's improvement beyond static equilibrium.
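The correlated minmax can be approximated by brute force over a grid on ∆(A). The sketch below is not from the paper; it does this for the Prisoners' Dilemma of Example 1, where defecting guarantees at least 0 and the minmax value is exactly 0 (attained at mass on (D, D)).

```python
from itertools import product

# Stage game of Example 1; u[(a1, a2)] = (u1, u2). The grid over the
# 4-point simplex is a brute-force approximation of min over Delta(A).
u = {('C', 'C'): (1, 1), ('C', 'D'): (-1, 2),
     ('D', 'C'): (2, -1), ('D', 'D'): (0, 0)}
actions = ('C', 'D')
profiles = list(u.keys())

def ubar(i, mu):
    """Best deviation payoff ubar_i(mu): since sigma_i(.|a_i) can be chosen
    separately per recommendation a_i, the max is attained pointwise."""
    total = 0.0
    for ai in actions:
        total += max(
            sum(mu[(ai, aj) if i == 0 else (aj, ai)]
                * u[(bi, aj) if i == 0 else (aj, bi)][i]
                for aj in actions)
            for bi in actions)
    return total

grid = [k / 20 for k in range(21)]
minmax0 = min(
    ubar(0, dict(zip(profiles, w)))
    for w in product(grid, repeat=4)
    if abs(sum(w) - 1.0) < 1e-9)
```

For larger games one would replace the grid with a linear program, since ūi(µ) is the value of a finite maximization that is linear in µ.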

3.2 Probabilities

Let Ω = {−1, +1} be the set of publicly observed signals.11 In any given period and for every action profile a ∈ A, let Pr(ω|a) be the probability that ω ∈ Ω is publicly observed at the end of the period when a was played in that same period. The following detectability condition is necessary for the results of the paper.

Definition 1. The matrix Pr exhibits conditional identifiability (CI-Pr) if

   Pr(ω|ai, ·) ∉ cone{Pr(ω|bi, ·) : bi ≠ ai}  for all (i, ai, ω).

Conditional identifiability is stronger than individual identifiability, which requires only that Pr(·|ai, ·) ∉ cone{Pr(·|bi, ·) : bi ≠ ai} for all (i, ai).12 Nevertheless, it is equally silent about the identity of a deviator. It is consistent with some strongly symmetric conditional distributions, unlike pairwise identifiability, for instance.

Example 2. To illustrate, consider signal probabilities in the Prisoners' Dilemma. Let q2 = Pr(−1|C, C), q1 = Pr(−1|C, D) = Pr(−1|D, C) and q0 = Pr(−1|D, D). Assume that 0 < q2 < q1 < 1. Conditional identifiability holds whenever q1 ≥ q0. Indeed, by definition it fails if and only if either q2/q1 = q1/q0 or (1 − q2)/(1 − q1) = (1 − q1)/(1 − q0) or both, which clearly requires that q1 < q0, since q2 < q1.

11. More general signal structures can be easily accommodated here. I discuss them briefly in Section 6.3.
12. This is still significantly stronger than convex independence, rather than the conic version above. Convex independence is necessary but not sufficient for the Folk Theorem.

11
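To make the ratio conditions of Example 2 concrete, here is a small numerical sketch (the function name and sample probabilities are mine, not the paper's) that tests CI-Pr for Prisoners' Dilemma signal probabilities:

```python
# Hypothetical helper testing conditional identifiability (CI-Pr) in the
# Prisoners' Dilemma of Example 2. By that example, CI-Pr fails if and
# only if q2/q1 == q1/q0 or (1-q2)/(1-q1) == (1-q1)/(1-q0) (or both).

def ci_pr_holds(q2, q1, q0, tol=1e-12):
    """q2 = Pr(-1|C,C), q1 = Pr(-1|C,D) = Pr(-1|D,C), q0 = Pr(-1|D,D)."""
    fails_minus = abs(q2 * q0 - q1 * q1) < tol            # q2/q1 == q1/q0
    fails_plus = abs((1 - q2) * (1 - q0) - (1 - q1) ** 2) < tol
    return not (fails_minus or fails_plus)

# q1 >= q0: conditional identifiability holds, as claimed in Example 2.
assert ci_pr_holds(q2=0.2, q1=0.5, q0=0.4)
# A geometric sequence q2/q1 == q1/q0 (which requires q1 < q0): CI-Pr fails.
assert not ci_pr_holds(q2=0.2, q1=0.4, q0=0.8)
```

The ratio tests are cross-multiplied to avoid division and to handle probabilities near zero gracefully.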

For interpretation, pick any player $i$, profile $a \in A$ of mediator recommendations and mixed strategy $\sigma_i \in \Delta(A_i)$ with $\sigma_i(a_i) < 1$. The ratio $\sigma_i(a_i)\Pr(\omega|a)/\Pr(\omega|\sigma_i, a_{-i})$, where $\Pr(\omega|\sigma_i, a_{-i}) = \sum_{b_i} \sigma_i(b_i)\Pr(\omega|b_i, a_{-i})$, equals the posterior probability that player $i$ obeys the mediator given the mediator's information: his recommendations and the public signal. Conditional identifiability holds if and only if for every signal $\omega$ there exist two profiles, $a_{-i}$ and $b_{-i}$, that give the mediator different posterior beliefs of whether or not $i$ played $a_i$, that is,
$$\frac{\sigma_i(a_i)\Pr(\omega|a)}{\Pr(\omega|\sigma_i, a_{-i})} \ne \frac{\sigma_i(a_i)\Pr(\omega|a_i, b_{-i})}{\Pr(\omega|\sigma_i, b_{-i})}.$$
Otherwise, player $i$ could infer the mediator's beliefs. Therefore, if player $i$ does not observe whether $a_{-i}$ or $b_{-i}$ was recommended, the mediator may sustain $i$'s belief that punishment is possible even when the mediator knows it is not.

Lemma 3. CI-Pr fails if and only if there exists $(i, a_i, \sigma_i, \omega)$ such that $\sigma_i(a_i) < 1$ and
$$\frac{\Pr(\omega|a)}{\Pr(\omega|\sigma_i, a_{-i})} = \frac{\Pr(\omega|a_i, b_{-i})}{\Pr(\omega|\sigma_i, b_{-i})} \quad \forall (a_{-i}, b_{-i}).$$

3.3 Drifts

Let us specify a discrete-time model of imperfect monitoring whose noise converges to Brownian motion as actions become arbitrarily frequent. Following Example 1, for every action profile $a$ and $\Delta t > 0$ sufficiently small, given a function $x : A \to \mathbb{R}$, let
$$\Pr(+1|a) = \tfrac12[1 + x(a)\sqrt{\Delta t}] =: p(a) \quad \text{and} \quad \Pr(-1|a) = \tfrac12[1 - x(a)\sqrt{\Delta t}] =: q(a) = 1 - p(a).$$

Consider the following binomial random walk $X$ starting at $X_0 = 0$. For all $t > 0$, let $X_t = X_{t-\Delta t} + \varepsilon_t$, where $\{\varepsilon_t\}$ is a sequence of independent Bernoulli trials with success probability $p(a_t)$. Success means that $\varepsilon_t = \sqrt{\Delta t}$, so $X$ jumps up, whereas failure implies $\varepsilon_t = -\sqrt{\Delta t}$, so $X$ jumps down. As $\Delta t \to 0$, $X$ converges to a Brownian motion with law $dX_t = x(a_t)\,dt + dW_t$, where $W$ is a Wiener process. Thus, $x$ describes the drift of $X$. Since the probability matrix above depends on $\Delta t$, it is convenient to find a condition on the drift function $x$ that guarantees conditional identifiability for small $\Delta t$.

Definition 2. The drift function $x$ exhibits conditional identifiability (CI-$x$) if
$$x(a_i, \cdot) \notin \operatorname{conv}\{x(b_i, \cdot) : b_i \ne a_i\} + L_1 \quad \forall (i, a_i), \tag{4}$$
where "conv" stands for convex hull and $L_1 = \{\alpha(1, \ldots, 1) \in \mathbb{R}^{A_{-i}} : \alpha \in \mathbb{R}\}$. I will write conditional identifiability "in probabilities" versus "in drifts" when necessary.
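The random walk just described is easy to simulate; the sketch below (all names illustrative, assuming a constant action profile and hence constant drift) checks that $X_1$ has approximately mean $x$ and unit variance, consistent with the limit $dX_t = x\,dt + dW_t$:

```python
# Simulate the binomial random walk X with steps +/- sqrt(dt), where
# Pr(+1|a) = (1 + x(a) sqrt(dt))/2 as in the text. With constant drift x,
# X_1 should be approximately N(x, 1) for small dt.
import random

def simulate_X1(x_drift, dt, rng):
    p_up = 0.5 * (1 + x_drift * dt ** 0.5)   # success probability p(a)
    steps = int(1.0 / dt)
    X = 0.0
    for _ in range(steps):
        X += dt ** 0.5 if rng.random() < p_up else -dt ** 0.5
    return X

rng = random.Random(0)
dt, drift = 1e-3, 1.0
paths = [simulate_X1(drift, dt, rng) for _ in range(4000)]
mean = sum(paths) / len(paths)
var = sum((v - mean) ** 2 for v in paths) / len(paths)
assert abs(mean - drift) < 0.1   # E[X_1] ~ x
assert abs(var - 1.0) < 0.15     # Var[X_1] ~ 1
```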

Lemma 4. If $x$ exhibits conditional identifiability then for some $\bar\Delta > 0$ and all $\Delta t \in (0, \bar\Delta)$, the probability matrix that parametrizes $X$ at $\Delta t$ exhibits conditional identifiability.

Lemma 4 shows that conditional identifiability of the limiting stochastic process implies conditional identifiability in probabilities of its random walk representations sufficiently close to it. For intuition, recall the Prisoners' Dilemma of Example 1. Figure 3 below shows that (4) holds when $(x_0, x_1, x_2) = (0, -1, 1)$. However, (4) is sufficient but not necessary for conditional identifiability in probabilities at all small $\Delta t$, as the next example shows.

Figure 3: Conditional identifiability in Example 1 when $(x_0, x_1, x_2) = (0, -1, 1)$

Example 3. With reference to Example 1, let $x_1 = \frac12(x_0 + x_2)$ but $x_0 \ne x_2$. Clearly, (4) fails, yet conditional identifiability holds for all $\Delta t$. Indeed, conditional identifiability requires that both $p_2/p_1 \ne p_1/p_0$ and $q_2/q_1 \ne q_1/q_0$. This is easily seen to be equivalent to
$$|2x_1 - (x_0 + x_2)| \ne |x_1^2 - x_0 x_2|\sqrt{\Delta t}.$$
By hypothesis, the left-hand side of this inequality equals zero and the right-hand side does not, so conditional identifiability holds for all $\Delta t > 0$ even though (4) fails. On the other hand, the right-hand side tends to zero as $\Delta t \to 0$, so conditional identifiability subsides. I now characterize CI-$x$ in terms of this decay, to help contrast with CI-Pr in Lemma 3.

Proposition 1. CI-$x$ fails if and only if there exists $(i, a_i, \sigma_i, \omega)$ with $\sigma_i(a_i) < 1$ and
$$\frac{1}{\sqrt{\Delta t}}\left[\frac{\Pr(\omega|a)}{\Pr(\omega|\sigma_i, a_{-i})} - \frac{\Pr(\omega|a_i, b_{-i})}{\Pr(\omega|\sigma_i, b_{-i})}\right] \to 0 \quad \forall (a_{-i}, b_{-i}).$$

Intuitively, Proposition 1 says that CI-$x$ fails whenever the mediator's informational advantage—in terms of having several possible posterior beliefs of whether player $i$ obeyed the mediator given the mediator's information—deteriorates faster than $\sqrt{\Delta t}$. This is important because, although, as Example 3 shows, conditional identifiability in drifts is not necessary for conditional identifiability in probabilities at small $\Delta t$, the Folk Theorem below relies on the full strength of CI-$x$ over and above CI-Pr for sufficiently small $\Delta t$.

Assumption 2 (Drifts). The drift function $x$ exhibits conditional identifiability.
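Example 3's cancellation can also be verified numerically. In the sketch below (values illustrative), the gap $p_2 p_0 - p_1^2$, whose vanishing is exactly $p_2/p_1 = p_1/p_0$, is nonzero at every fixed $\Delta t$ yet decays linearly in $\Delta t$, faster than the $\sqrt{\Delta t}$ rate singled out by Proposition 1:

```python
# Example 3 numerically: x1 = (x0 + x2)/2 with x0 != x2, so (4) fails,
# yet for each fixed dt > 0 the ratio condition p2/p1 != p1/p0 holds.
x0, x1, x2 = 0.0, 0.5, 1.0       # x1 = (x0 + x2)/2, x0 != x2

def gap(dt):
    """p2*p0 - p1**2; this vanishes exactly when p2/p1 == p1/p0."""
    p = [0.5 * (1 + x * dt ** 0.5) for x in (x0, x1, x2)]
    return p[2] * p[0] - p[1] ** 2

for dt in (1e-2, 1e-4, 1e-6):
    assert gap(dt) != 0.0                       # CI-Pr holds at fixed dt
# With these numbers the gap equals (x0*x2 - x1**2) * dt / 4 = -0.0625 * dt.
assert abs(gap(1e-2) + 0.0625 * 1e-2) < 1e-12
assert abs(gap(1e-6)) < abs(gap(1e-2))          # ...and vanishes as dt -> 0
```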

4 Incentive Compatibility

In this section I construct incentive schemes via punishments and rewards to obtain the Folk Theorem. From now on, fix a stage-game correlated strategy µ ∈ ∆(A).

4.1 Scoring Rules

Definition 3. A scoring rule is a pair of functions $\xi, \zeta : I \times A \times \Omega \to [0,1]$ of failure and success probabilities, respectively, with $\zeta = 1 - \xi$. Call $\xi$ and $\zeta$ proper[13] if:

1. Belief stability: failure is equally likely with obedience, more likely without it.
$$\pi_i \sum_{a_{-i}} \Pr(\omega|a_{-i}, b_i)\,\mu(a) \le \sum_{a_{-i}} \xi_i(a, \omega)\Pr(\omega|a_{-i}, b_i)\,\mu(a) \quad \forall (i, a_i, b_i, \omega), \tag{5}$$
where $\pi_i = \sum_{(a,\omega)} \xi_i(a, \omega)\Pr(\omega|a)\mu(a)$ for every player $i$.

2. Implementability: some $\gamma_i > 0$ yields incentive compatibility for each player $i$.
$$\gamma_i \sum_{a_{-i}} \Delta u_i(a, b_i)\,\mu(a) \le \sum_{(a_{-i},\omega)} \xi_i(a, \omega)\,\omega\,\Delta x(a, b_i)\,\mu(a) \quad \forall (i, a_i, b_i), \tag{6}$$
where $\Delta u_i(a, b_i) = u_i(a_{-i}, b_i) - u_i(a)$ and $\Delta x(a, b_i) = x(a_{-i}, b_i) - x(a)$.

A proper scoring rule consists of failure probabilities $\xi$ (with punishment $\frac12\sqrt{\Delta t}/\gamma_i$) such that (i) the probability of failure given a player's private history always exceeds the belief the player would have if he planned to obey the mediator's recommendations, and (ii) obeying the mediator's recommendations can be incentive compatible. By (i), whenever a player obeys a recommendation, his posterior beliefs about the probability of failure conditional on $a_i$ and $\omega$ always equal his prior $\pi_i$.

Lemma 5. If $\xi$ is a proper scoring rule then conditional failure probabilities are constant and equal to $\pi_i$ on the equilibrium path:
$$\sum_{a_{-i}} \xi_i(a, \omega)\Pr(\omega|a)\mu(a) = \pi_i \sum_{a_{-i}} \Pr(\omega|a)\mu(a) \quad \forall (i, a_i, \omega).$$

Proper scoring rules can have any prior failure probability.

Lemma 6. If a proper scoring rule exists then for any vector $\alpha \in (0,1)^n$ there exists another proper scoring rule with prior failure probability for player $i$ equal to $\alpha_i$.

Proposition 2. If Pr exhibits conditional identifiability then the set of proper scoring rules for $\mu$ is not empty as long as $\mu$ is a completely mixed correlated strategy.

[13] The label "proper" is borrowed from statistics (see, e.g., Gneiting and Raftery, 2007). Formally, belief stability implies properness, as its constraints are ex post, so a better label might be "very proper."
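The affine transformation behind Lemma 6 can be sketched numerically. In the toy example below (all numbers illustrative; the "scoring rule" is just a vector over recommendation-signal cells weighted by stand-ins for $\mu(a)\Pr(\omega|a)$), rescaling $\xi' = \alpha + b(\xi - \pi_i)$ with small $b > 0$ moves the prior to $\alpha$; each belief-stability constraint in (5) scales by $b$ and so keeps its sign:

```python
# Sketch of the affine rescaling behind Lemma 6: xi' = alpha + b*(xi - pi)
# has prior failure probability alpha, stays in [0, 1] for small b, and
# scales every belief-stability constraint (5) by b > 0.
import random

rng = random.Random(1)
weights = [rng.random() for _ in range(8)]   # stand-ins for mu(a) * Pr(w|a)
xi = [rng.random() for _ in range(8)]        # a toy failure-probability rule
total = sum(weights)
pi = sum(x * w for x, w in zip(xi, weights)) / total   # prior failure prob.

alpha, b = 0.25, 0.1                         # target prior, small slope
xi_new = [alpha + b * (x - pi) for x in xi]
pi_new = sum(x * w for x, w in zip(xi_new, weights)) / total

assert abs(pi_new - alpha) < 1e-12           # prior moved exactly to alpha
assert all(0.0 <= x <= 1.0 for x in xi_new)  # still valid probabilities
```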

Notation. If $\sigma_i \in M(A_i) = \{A_i \to \Delta(A_i)\}$ is a stage-game deviation and $f : A \to \mathbb{R}$ depends on actions, not recommendations, write $f(a, \sigma_i) = \sum_{b_i} f(a_{-i}, b_i)\sigma_i(b_i|a_i)$ for the convolution of $f$ with $\sigma_i$, $\Delta f(a, \sigma_i) = f(a, \sigma_i) - f(a)$ for the effect of $\sigma_i$ on $f$ at $a$, and $\Delta f(\mu, \sigma_i) = \sum_a \Delta f(a, \sigma_i)\mu(a)$, or simply $\Delta f(\sigma_i)$, for the a priori effect.

Fix a proper scoring rule $\xi$. Let $\pi_i(\sigma_i) = \sum_{(a,\omega)} \xi_i(a, \omega)\Pr(\omega|a_{-i}, \sigma_i)\mu(a)$ be the prior failure probability from $\sigma_i$ and $\Delta\pi_i(\sigma_i) = \pi_i(\sigma_i) - \pi_i$. Let $z_i(\sigma_i) = \Delta\pi_i(\sigma_i)/(\frac12\sqrt{\Delta t})$ be the prior failure drift from $\sigma_i$. The incentive cost of $\sigma_i$ from $\xi_i$ at $\mu$ equals the ratio $C_i(\sigma_i) = \Delta u_i(\sigma_i)/z_i(\sigma_i)$, where I assume that $0/0 = 0$. Thus $C_i(\sigma_i)$ is the ratio of utility changes to drift changes. The conditional failure probability from $\sigma_i$ given $(a_i, \omega)$ is written as
$$\pi_i(\sigma_i|a_i, \omega) = \frac{\sum_{a_{-i}} \xi_i(a, \omega)\Pr(\omega|a_{-i}, \sigma_i)\mu(a)}{\sum_{a_{-i}} \Pr(\omega|a_{-i}, \sigma_i)\mu(a)}.$$
Similarly, $\Delta\pi_i(\sigma_i|a_i, \omega) = \pi_i(\sigma_i|a_i, \omega) - \pi_i$ and $z_i(\sigma_i|a_i, \omega) = \Delta\pi_i(\sigma_i|a_i, \omega)/(\frac12\sqrt{\Delta t})$.

Definition 4. A cost-maximizing deviation for player $i$ is any $\sigma_i^* \in \arg\max_{\sigma_i} C_i(\sigma_i)$. Let $\Delta u_i^* = \Delta u_i(\sigma_i^*)$, $z_i^* = z_i(\sigma_i^*)$ and $C_i^* = C_i(\sigma_i^*)$. A failure-maximizing conditional deviation for player $i$ given $(a_i, \omega)$ is any $\sigma_i^{**}(a_i, \omega) \in \arg\max_{\sigma_i} \pi_i(\sigma_i|a_i, \omega)$. Player $i$'s maximum conditional failure probability equals $\pi_i^{**} = \max_{(a_i, \omega)} \pi_i(\sigma_i^{**}(a_i, \omega)|a_i, \omega)$, with conditional failure drift $z_i^{**} = \Delta\pi_i^{**}/(\frac12\sqrt{\Delta t})$, where $\Delta\pi_i^{**} = \pi_i^{**} - \pi_i$.

By implementability (6), $0 \le C_i^* < \infty$, so, since $M(A_i)$ is compact, $\sigma_i^*$ and $\sigma_i^{**}$ exist. If $C_i^* = 0$ then $i$ needs no incentives, so without loss $C_i^* > 0$. Hence, $\pi_i^{**} > \pi_i$. By the Maximum Theorem, $\arg\max C_i$ and $\arg\max z_i$ have a continuous selection.

Lemma 7. Let $\xi$ be a proper scoring rule and $\mu$ completely mixed. Without loss, $\sigma_i^*$ and $\sigma_i^{**}$ are continuous with respect to $\xi$ and $\mu$. So are $\Delta u_i^*$, $z_i^*$, $C_i^*$, $\pi_i^{**}$ and $z_i^{**}$.

Finally, consider convergence of scoring rules. If $\xi \to \bar\xi$ for some scoring rule $\bar\xi$ then, by Lemma 7, $\sigma_i^* \to \bar\sigma_i^*$ and $\sigma_i^{**} \to \bar\sigma_i^{**}$ for some $(\bar\sigma_i^*, \bar\sigma_i^{**})$. Hence, $(\Delta u_i^*, z_i^*, C_i^*)$ converges to some $(\Delta\bar u_i^*, \bar z_i^*, \bar C_i^*)$ and $(\pi_i^{**}, z_i^{**})$ to some $(\bar\pi_i^{**}, \bar z_i^{**})$. By Proposition 2, if Pr exhibits conditional identifiability for all small $\Delta t > 0$ then every completely mixed correlated strategy has a proper scoring rule $\xi$ such that $C_i^* < \infty$. Since $0 \le \xi \le 1$, without loss $\xi$ converges to some scoring rule $\bar\xi$ as $\Delta t \to 0$.

Proposition 3. CI-$x$ implies that there is a family of proper scoring rules $\xi$ with $\xi \to \bar\xi$ as $\Delta t \to 0$, where $\bar\xi$ is a proper scoring rule at $\Delta t = 0$. Hence, $\bar\pi_i \in (0,1)$ and $\bar C_i^* < \infty$.

4.2 Punishment Schemes

Definition 5. A punishment scheme is a triple $(\mu, \xi, w)$ with $\xi$ a proper scoring rule for $\mu \in \Delta(A)$ and $w \in \mathbb{R}^n_+$ a vector of punishments. It is implemented as follows.

1. Over a $T$-period block, the mediator secretly recommends players to play the action profile $a^T = (a_1, \ldots, a_T)$ with probability $\prod_\tau \mu(a_\tau)$. Players only observe their own recommendations throughout the $T$-period block, and these recommendations are independent and identically distributed, generated by $\mu$.

2. Player $i$'s score for the block is determined by the following secret process.
(a) For any history $(a^T, \omega^T)$, where $a^T$ is the vector of recommendations by the mediator and $\omega^T$ is the vector of realized public signals, the mediator performs $T$ independent Bernoulli trials, called scoring trials. The trial at time $t$ has failure probability $\xi_i(a_t, \omega_t)$ if $a_t$ was recommended and $\omega_t$ realized.
(b) Finally, the score equals the number of successes in the $T$ scoring trials.

3. Punishment for player $i$ ensues if his score at the end of the block does not exceed $\tau_i^{**} = \lfloor (1 - \pi_i^{**})(T-1) \rfloor$, where $\sigma_i^{**}$ is a failure-maximizing conditional deviation and $\pi_i^{**}$ is player $i$'s maximum conditional failure probability from Definition 4.

4. Punishment to player $i$ entails subtracting $w_i$ from $i$'s continuation value. Otherwise, "no punishment" entails no change to player $i$'s continuation value.

The probability that $i$ is punished if everyone obeys the mediator is given by
$$\Pi_{i0} = \sum_{\tau=0}^{\tau_i^{**}} \binom{T}{\tau} \pi_i^{T-\tau} (1 - \pi_i)^\tau,$$
where $\pi_i = \sum_{(a,\omega)} \xi_i(a, \omega)\Pr(\omega|a)\mu(a)$ is the equilibrium prior failure probability. Therefore, the average lifetime utility to player $i$ is given by
$$v_i = (1 - \delta^T)u_i(\mu) + \delta^T[(1 - \Pi_{i0})v_i + \Pi_{i0}(v_i - w_i)],$$
where $u_i(\mu) = \sum_a u_i(a)\mu(a)$. Rearranging yields
$$v_i = u_i(\mu) - \frac{\delta^T}{1 - \delta^T}\,\Pi_{i0}\,w_i.$$
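The on-path punishment probability $\Pi_{i0}$ is just a Binomial tail; the sketch below (parameter values illustrative, not from the paper) computes it together with the resulting lifetime utility $v_i$:

```python
# Pi_i0 = Pr(score <= tau_i**) when each of T scoring trials fails
# i.i.d. with probability pi_i, and the induced lifetime utility
# v_i = u_i(mu) - delta^T / (1 - delta^T) * Pi_i0 * w_i.
from math import comb, floor

def punishment_prob(T, pi_i, pi_star):
    tau_star = floor((1 - pi_star) * (T - 1))   # cutoff from step 3 above
    # tau successes occur with probability C(T,tau) pi^(T-tau) (1-pi)^tau.
    return sum(comb(T, t) * pi_i ** (T - t) * (1 - pi_i) ** t
               for t in range(tau_star + 1))

T, pi_i, pi_star, delta, u_mu, w_i = 100, 0.5, 0.6, 0.99, 2.0, 1.0
Pi0 = punishment_prob(T, pi_i, pi_star)
v_i = u_mu - delta ** T / (1 - delta ** T) * Pi0 * w_i
assert 0.0 < Pi0 < 1.0
assert v_i < u_mu                    # on-path punishment risk costs utility
# Raising the failure probability raises the punishment probability.
assert punishment_prob(T, 0.6, pi_star) > Pi0
```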

Having defined a punishment scheme, let us argue its incentive compatibility. First, I find minimal punishments that discourage player $i$ from deviating in the first period of a block. Then I show that this punishment discourages every deviation. If player $i$ plans to deviate only in the first period by playing $\sigma_i \in M(A_i)$ instead of obeying the mediator's recommendation, his utility gain is $(1-\delta)\Delta u_i(\sigma_i)$. On the other hand, the additional cost from deviating is given by the present value of punishment times the change in punishment probability. Let $\Pi_{i1}(\sigma_i)$ be the punishment probability if player $i$ disobeys only in the first period, and does so according to $\sigma_i$. Discouraging $\sigma_i$ requires
$$(1 - \delta)\Delta u_i(\sigma_i) \le e^{-rc} w_i [\Pi_{i1}(\sigma_i) - \Pi_{i0}]. \tag{7}$$

From period 2 onwards, player $i$ will obey the mediator, so his failure probability equals $\pi_i$ thereafter. Hence, if $F_i^T(\tau)$ stands for the CDF of a Binomial random variable with failure probability $\pi_i$ and $T$ trials, then
$$\Pi_{i1}(\sigma_i) = \pi_i(\sigma_i) F_i^{T-1}(\tau_i^{**}) + (1 - \pi_i(\sigma_i)) F_i^{T-1}(\tau_i^{**} - 1).$$
Therefore, letting $f_i^T(\tau)$ stand for the probability mass function obtained from $F_i^T$, it follows that $f_i^{T-1}(\tau_i^{**}) = F_i^{T-1}(\tau_i^{**}) - F_i^{T-1}(\tau_i^{**} - 1)$, so (7) can be rewritten as
$$(1 - \delta)\Delta u_i(\sigma_i) \le e^{-rc} w_i\,\Delta\pi_i(\sigma_i)\,f_i^{T-1}(\tau_i^{**}).$$

Lemma 8. The punishment scheme above discourages every first-period deviation if
$$rce^{rc}\,\frac{\Delta u_i^*}{\Delta\pi_i^*} \le w_i \binom{T}{\tau_i^{**}} \pi_i^{T - \tau_i^{**}} (1 - \pi_i)^{\tau_i^{**}}. \tag{8}$$

I will now argue that every deviation is discouraged by a punishment scheme satisfying (8). I will show that for any partial history of deviations and observations $h_i^\tau$, every one-step deviation in period $\tau \le T$ followed by obedience henceforth on the part of player $i$ is unprofitable. This clearly renders every dynamic deviation unprofitable, since every such dynamic deviation must have a history after which its last one-step deviation takes place. Indeed, given $h_i^\tau$ and a one-step deviation $\sigma_i \in M(A_i)$ at time $\tau$, let $F_i^T(\tau_i^{**}|h_i^\tau, \sigma_i)$ be the probability that player $i$ is punished if he deviates according to $\sigma_i$ and let $F_i^T(\tau_i^{**}|h_i^\tau)$ be the probability that he is punished if he chooses not to deviate after $h_i^\tau$. The utility gain from $\sigma_i$ given $h_i^\tau$ is $(1-\delta)\Delta u_i(\sigma_i)$. On the other hand, the deviation costs at least $e^{-rc} w_i [F_i^T(\tau_i^{**}|h_i^\tau, \sigma_i) - F_i^T(\tau_i^{**}|h_i^\tau)]$. Let $F_i^{T-1}(\cdot|h_i^\tau)$ be the CDF of the number of failures given $h_i^\tau$ during all periods except $\tau$, assuming a failure probability of $\pi_i$ for all periods after $\tau$. Letting $f_i^{T-1}(\cdot|h_i^\tau)$ be the probability mass function induced by $F_i^{T-1}(\cdot|h_i^\tau)$, it follows just as before that $\sigma_i$ is discouraged if
$$(1 - \delta)\Delta u_i(\sigma_i) \le e^{-rc} w_i\,\Delta\pi_i(\sigma_i)\,f_i^{T-1}(\tau_i^{**}|h_i^\tau).$$
By belief stability (5), no partial history of deviations and observations can decrease the probability of failure below $\pi_i$ in any period. Using this observation, the next result shows that $f_i^{T-1}(\tau_i^{**}|h_i^\tau) \ge f_i^{T-1}(\tau_i^{**})$, that is, the probability of $\tau_i^{**}$ successes is smaller whenever player $i$ obeyed the mediator's recommendations. Together with Lemma 8, this implies that discouraging every single one-step deviation discourages the last one-step deviation associated with any arbitrarily complex dynamic deviation. Proceeding by induction, this ultimately discourages every dynamic deviation.

Lemma 9. For every history $h_i^\tau$, it is the case that $f_i^{T-1}(\tau_i^{**}|h_i^\tau) \ge f_i^{T-1}(\tau_i^{**})$; therefore every dynamic deviation is unprofitable whenever $w_i$ satisfies (8).

Let us now turn to convergence. I first take the limit as $\Delta t \to 0$ and then, afterwards, the limit as $r \to 0$. Consider a family $\xi^0$ of proper scoring rules indexed by $\Delta t \ge 0$. By Proposition 3, it is possible to construct such a family satisfying $\xi^0 \to \bar\xi^0$ as $\Delta t \to 0$, where $\bar\xi^0$ is a proper scoring rule at $\Delta t = 0$. To ease notation later, apply an affine transformation (Lemma 6) to each scoring rule in the family to obtain a new family of scoring rules $\xi$ such that $\pi_i = \frac12$ for every $\Delta t \ge 0$. I will fix this family $\xi$ for the remaining discussion of punishment schemes.

Proposition 4.
1. For every $\varepsilon > 0$ there exists $\bar\Delta > 0$ such that if
$$w_i \ge (1 - \varepsilon)\,\frac{rce^{rc}\,\Delta u_i^*}{\varphi(\bar z_i^{**}\sqrt{c})\,\bar z_i^*\sqrt{c}} \tag{9}$$
then every deviation is unprofitable for all $\Delta t \in (0, \bar\Delta)$.
2. If $w_i$ is chosen so that (9) holds with equality then lifetime utility satisfies
$$v_i \to u_i(\mu) - \frac{rc\,\Delta u_i^*}{1 - e^{-rc}}\cdot\frac{1 - \Phi(\bar z_i^{**}\sqrt{c})}{\varphi(\bar z_i^{**}\sqrt{c})\,\bar z_i^*\sqrt{c}} \quad \text{as } \Delta t \to 0,$$
where $\Phi$ and $\varphi$ denote the standard normal CDF and density.
3. Finally, letting $c \to \infty$ but $rc \to 0$ as $r \to 0$,
$$\lim_{r \to 0}\lim_{\Delta t \to 0} v_i = u_i(\mu).$$
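The vanishing of the utility gap in Proposition 4.3 can be illustrated numerically: taking $c = 1/\sqrt{r}$ (so that $c \to \infty$ and $rc \to 0$), the limiting gap from part 2 shrinks as $r \to 0$. The sketch below evaluates that gap for illustrative values of $\Delta u_i^*$, $\bar z_i^*$ and $\bar z_i^{**}$ (all chosen by me, not taken from the paper):

```python
# Evaluate the limiting utility gap of Proposition 4.2,
#   rc*Du/(1 - e^{-rc}) * (1 - Phi(z** sqrt(c))) / (phi(z** sqrt(c)) z* sqrt(c)),
# along c = 1/sqrt(r). It should vanish as r -> 0 (Proposition 4.3).
from math import erf, exp, pi, sqrt

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))    # standard normal CDF
phi = lambda x: exp(-x * x / 2) / sqrt(2 * pi)  # standard normal density

def utility_gap(r, du=1.0, z_star=1.0, z_dstar=1.0):
    c = 1 / sqrt(r)                             # c -> infinity, rc -> 0
    rc = r * c
    return (rc * du / (1 - exp(-rc))
            * (1 - Phi(z_dstar * sqrt(c)))
            / (phi(z_dstar * sqrt(c)) * z_star * sqrt(c)))

gaps = [utility_gap(r) for r in (1e-1, 1e-2, 1e-4)]
assert gaps[0] > gaps[1] > gaps[2] >= 0.0       # gap shrinks toward zero
```

By the Mills-ratio approximation $1 - \Phi(x) \approx \varphi(x)/x$, the normal term behaves like $1/c$, which is what drives the gap to zero.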

4.3 Reward Schemes

Definition 6. A reward scheme is a triple $(\mu, \zeta, w)$ such that $\xi = 1 - \zeta$ is a proper scoring rule for $\mu$ and $w \in \mathbb{R}^n_+$ is a vector of rewards. It is implemented as follows.

1. Recommendations follow the same rule as for punishment schemes.

2. The score of the block is determined as follows.
(a) For any history $(a^T, \omega^T)$, where $a^T$ is the vector of recommendations by the mediator and $\omega^T$ is the vector of realized public signals, the mediator performs $T$ independent Bernoulli trials, called scoring trials. The trial at time $t$ has success probability $\zeta_i(a_t, \omega_t)$ if $a_t$ was recommended and $\omega_t$ realized.
(b) Finally, player $i$'s score equals the number of failures in the $T$ scoring trials.

3. Reward for $i$ ensues if his score at the end of the block is less than or equal to $\tau_i^* = \lfloor \pi_i(T-1) \rfloor - 1$, where $\pi_i = 1 - \sum_{(a,\omega)} \zeta_i(a, \omega)\Pr(\omega|a)\mu(a)$.

4. Reward to player $i$ entails adding $w_i$ to $i$'s continuation value.

Reward schemes are only slightly different from punishment schemes. Of course, reward schemes involve the possibility of increasing continuation payoffs, rather than lowering them. Since a player could deviate every period, the reward $w_i$ must compensate for at least these deviation gains. Another notable difference is that the cut-off proportion of successes that induces reward is roughly the prior reward probability assuming obedient behavior. This is in stark contrast with punishment schemes, where the cut-off is the prior punishment probability assuming disobedient behavior. This yields concave deviation costs, so the most profitable way to deviate is to do so (almost) every period. The lifetime utility of player $i$ is given by
$$v_i = u_i(\mu) + \frac{\delta^T}{1 - \delta^T}\,R_{i0}\,w_i,$$
where $R_{i0}$, the probability of reward given obedience, equals
$$R_{i0} = \sum_{\tau=0}^{\tau_i^*} \binom{T}{\tau} \pi_i^\tau (1 - \pi_i)^{T-\tau}.$$
Let $\sigma_i^T$ be a dynamic deviation for a $T$-period block. Discouraging $\sigma_i^T$ entails
$$(1 - \delta)\,\mathbb{E}\sum_{\tau=1}^T \delta^{\tau-1}\Delta u_i(\sigma_i^\tau) \le \delta^T w_i\,\mathbb{E}[R_{i0} - R_i^T(\sigma_i^T)],$$
where $R_{i0}$ and $R_i^T(\sigma_i^T)$ are reward probabilities after obedience and $\sigma_i^T$ respectively, and $\mathbb{E}$, of course, computes expectations. Let us find a useful bound on the reward $w_i$ for incentive compatibility. For any terminal history of the block, $h_i^T$, let $R_i^T(\sigma_i^T|h_i^T)$ be the reward probability from playing $\sigma_i^T$ conditional on $h_i^T$. By belief stability (5), the failure probability $\pi_i^\tau$ of each period's scoring trial must lie between $\pi_i$ and $\pi_i^{**}$. Therefore, Hoeffding's (1956) inequality implies that
$$R_i^T(\sigma_i^T|h_i^T) \le \sum_{\tau=0}^{\tau_i^*} \binom{T}{\tau} \hat\pi_i^\tau (1 - \hat\pi_i)^{T-\tau} =: R_i(\hat\pi_i), \tag{10}$$
where $\hat\pi_i = \frac1T \sum_\tau \pi_i^\tau$.

Lemma 10. $R_i(\hat\pi_i)$ is strictly decreasing and strictly convex on $[\pi_i, \pi_i^{**}]$.

Lemma 10 shows that—all else equal—player $i$ prefers more variation in average failure probabilities. It also shows how reward schemes contrast with punishment schemes. In Lemma 9, the choice of cut-off helped to establish that discouraging one deviation discouraged them all. Here, a different choice of cut-off yields the opposite conclusion: deviating only once is worse than deviating more often. Figure 4 (see Rahman, 2013, for details) illustrates this point as $\Delta t \to 0$. Now, to bound $w_i$, let $\mathbb{E}[\Delta\hat u_i(\sigma_i^T)] = \frac1T \sum_\tau \max\{\mathbb{E}[\Delta u_i(\sigma_i^\tau)], 0\}$. The next result is obvious, so a proof is omitted.
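The monotonicity and convexity claimed in Lemma 10 can be checked numerically for a concrete Binomial tail; in the sketch below (parameters illustrative) $R_i(\hat\pi_i)$ is evaluated on a grid over $[\pi_i, \pi_i^{**}]$ and tested for strict decrease and discrete convexity:

```python
# R(pi_hat) = Pr(at most tau* failures) with i.i.d. failure prob pi_hat,
# tau* = floor(pi_i (T-1)) - 1 as in Definition 6. Lemma 10 says this is
# strictly decreasing and strictly convex in pi_hat on [pi_i, pi_i**].
from math import comb, floor

def R(pi_hat, T, pi_i):
    tau_star = floor(pi_i * (T - 1)) - 1
    return sum(comb(T, t) * pi_hat ** t * (1 - pi_hat) ** (T - t)
               for t in range(tau_star + 1))

T, pi_i, pi_dstar = 60, 0.5, 0.7
grid = [pi_i + k * (pi_dstar - pi_i) / 20 for k in range(21)]
vals = [R(p, T, pi_i) for p in grid]
assert all(a > b for a, b in zip(vals, vals[1:]))        # strictly decreasing
assert all(vals[k - 1] - 2 * vals[k] + vals[k + 1] > 0   # discretely convex
           for k in range(1, 20))
```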

Figure 4: Concavity and convexity of punishments and rewards

Proposition 5. The reward scheme above discourages every deviation if
$$w_i \ge rce^{rc}\,\max_{\sigma_i^T}\,\frac{\mathbb{E}[\Delta\hat u_i(\sigma_i^T)]}{\mathbb{E}[R_i(\pi_i) - R_i(\hat\pi_i)]}.$$

Let us now turn to convergence for reward schemes. Just as with punishment schemes, I first take the limit as $\Delta t \to 0$ and then the limit as $r \to 0$. Let $\xi^0$ be a family of proper scoring rules indexed by $\Delta t \ge 0$. Again, $\xi^0 \to \bar\xi^0$ as $\Delta t \to 0$, where $\bar\xi^0$ is a proper scoring rule at $\Delta t = 0$. Finally, apply an affine transformation (Lemma 6) to each scoring rule in the family to obtain a new family of scoring rules $\xi$ such that $\pi_i = \frac12$ for every $\Delta t \ge 0$. Fix this family $\xi$ for the rest of this section. Given $c$ and $T = \lfloor c/\Delta t \rfloor$, let
$$D_i = \sup_T \max_{\sigma_i^T}\,\frac{\mathbb{E}[\Delta\hat u_i(\sigma_i^T)]}{\mathbb{E}[R_i(\pi_i) - R_i(\hat\pi_i)]}.$$

Proposition 6.
1. A reward scheme with $w_i \ge rce^{rc} D_i$ discourages every deviation for all $\Delta t \in (0, c]$.
2. If $w_i = rce^{rc} D_i$ then player $i$'s lifetime utility satisfies
$$v_i \to u_i(\mu) + \frac{rc}{1 - e^{-rc}}\cdot\frac12\,D_i \quad \text{as } \Delta t \to 0.$$
3. Finally, $D_i < \infty$, and letting $c \to \infty$ but $rc \to 0$ as $r \to 0$,
$$\lim_{r \to 0}\lim_{\Delta t \to 0} v_i = u_i(\mu) + \Delta u_i^d,$$
where $\Delta u_i^d = \max_{\sigma_i} \Delta u_i(\sigma_i)$.

This result explains how incentives are provided with reward schemes in the long run, as players become patient. To allocate rewards more accurately, players increase the length of their block and receive more signals with which to aggregate information. In the limit, reward schemes are efficient in the sense that players are compensated for their best deviation gains (which I already argued was unavoidable) and no more.

5 Folk Theorem

I will state the main result of the paper, Theorem 2. First, I introduce a restriction on the space of utility profiles that permits consistent use of individual punishments and rewards. This restriction forces me to give incentives by burning value. Then I state the Folk Theorem and describe an outline for the proof, which can be found in Appendix A.

5.1 Preliminaries

Given a correlated strategy $\mu \in \Delta(A)$, recall that
$$u_i^d(\mu) = \max_{\sigma_i} \sum_{(a, b_i)} u_i(b_i, a_{-i})\,\mu(a)\,\sigma_i(b_i|a_i).$$
For any subset of players $J$ and any $\varepsilon_1, \varepsilon_2 > 0$, let
$$V_J^{\varepsilon_1, \varepsilon_2} = \{v \in U : \exists \mu \in \Delta^{\varepsilon_1}(A) \text{ s.t. } v_i \ge u_i^d(\mu) + \varepsilon_2 \ \forall i \notin J, \ v_i \le u_i(\mu) - \varepsilon_2 \ \forall i \in J\},$$
where $\Delta^\varepsilon(A) = \{\mu \in \Delta(A) : \mu(a) \ge \varepsilon \ \forall a\}$. Let
$$V_*^{\varepsilon_1, \varepsilon_2} = \bigcap_{J \subset I} V_J^{\varepsilon_1, \varepsilon_2} \quad \text{and} \quad V_* = \bigcup_{\varepsilon_1, \varepsilon_2 > 0} V_*^{\varepsilon_1, \varepsilon_2}.$$

The set $V_*$ corresponds approximately to the set $W^{**}$ of Compte (1998), but it is easy to see that $W^{**} \subset V_*$. This is because Compte (1998) only allows players outside of some set $J$ to deviate from pure strategy profiles, as opposed to the correlated strategies with full support above. $V_*$ is used for the Folk Theorem below. For example, it is easy to see that $V_* = \operatorname{int} U_0$ in the Prisoners' Dilemma. Basically, $V_*$ is the set of payoff profiles $v$ such that for every subset $J$ of players there is a full-support correlated strategy whose payoff strictly exceeds $v_i$ for every player $i \in J$, while every player $i \notin J$ receives at $v$ strictly more than his best deviation payoff against that correlated strategy. According to Compte (1998, Proposition 3), the requirement that $V_*$ include a sequence of payoff profiles converging to the Pareto efficient frontier is generic in the space of payoffs.

5.2 Main Result

Now we can state the paper's main result, namely that an "equilibrium threats" Folk Theorem holds as players become unboundedly patient, even with frequent actions. Two properties of the result below are worth emphasizing. First, I allow players to engage in mediated communication. Although actions are arbitrarily frequent, public communication rounds occur in discrete time; thus, with respect to the privacy of players' strategies, time is endogenously discrete. Secondly, asymptotic efficiency is achieved with value-burning. The reason this is possible, unlike in Sannikov and Skrzypacz (2010), is that the burning of value is delayed and conditioned on richer information. Intuitively, this prevents the noise-to-signal ratio from exploding.

Theorem 2 (Folk Theorem). Under Assumptions 1 and 2, for any payoff profile $u$ in the interior of $V_*$, there is a discount rate $\bar r > 0$ and a time step $\bar\Delta > 0$ such that for all $r \le \bar r$ and all $\Delta t \le \bar\Delta$ there exists a communication equilibrium of the repeated game with discount rate $r$ and time step $\Delta t$ whose payoff profile equals $u$.

Communication equilibrium is defined formally in the supplementary material, and the proof of Theorem 2 can be found in Appendix A. I offer a brief outline below. First, I establish that every smooth subset $W$ of $V_*$ is locally self-decomposable with respect to $T$-public equilibria of some appropriate calendar length $c$, and I derive a uniform bound on $c$ to self-decompose $W$. Next, since I give players individual punishments and rewards, local self-decomposability is straightforward for payoff vectors $v$ belonging to the boundary of $W$ whose outward normal vectors are regular, that is, not coordinate vectors. This is because, for regular vectors, punishments and rewards from $v$ lie inside $W$ for sufficiently patient players. On the other hand, for coordinate vectors there exist players for whom individual punishments and rewards necessarily displace continuation values outside of $W$, violating self-decomposability. To correct this issue, I shift players' continuation payoffs inside of $W$ by an amount proportional to $r$. Now, incentive-providing payments are linear in $r$. However, $W$ is smooth, so the scope of feasible continuation payoffs is proportional to $\sqrt{r}$. Following Theorem 4.1 of Fudenberg et al. (1994), there exists $\bar r > 0$ sufficiently small that the incentives provided above lie inside $W$. Finally, the incentive schemes designed in the paper hold even more strongly as $\Delta t$ and $r$ are lowered, for fixed $c$.

6 Discussion

Next I discuss the model’s limitations and possible extensions.

6.1 Private Monitoring and Private Information

The model above assumed public monitoring, partly to make the results more difficult to obtain. With private monitoring it is arguably easier to obtain the results of the paper, because the mediator then has more information at his disposal to manage. As usual, assume full support. Players observe a private signal every period that may be imperfectly correlated with others' signals. The information management protocol becomes slightly more sophisticated than before. Now, the mediator not only makes confidential (still non-binding) recommendations to players, but also asks players to confidentially report their observations. The mediator then makes recommendations contingent not only on what he told players, but also on what players told him. The mediator can delay the arrival of two pieces of information: (i) others' recommendations, and (ii) others' reports. As such, the detectability requirements for the construction of this paper to work become substantially weaker: individual identifiability—every deviation being detectable—suffices in games with private monitoring and full support. To see this, notice that the key step is to obtain a version of Proposition 2. With private monitoring, the linear programming problem in the proof of Lemma 2 must include incentive constraints that ensure players are willing to report their information honestly, and the belief stability constraints can integrate out others' signals. From this linear program, it is easy to see that the identifiability condition that suffices for a private-monitoring version of Proposition 2 is
$$\Pr(s_i, \cdot|a_i, \cdot) \notin \operatorname{cone}\{\Pr(r_i, \cdot|b_i, \cdot) : (r_i, b_i) \ne (s_i, a_i)\} \quad \forall (i, a_i, s_i),$$
where $\Pr(s|a)$ is the probability of a profile $(s_1, \ldots, s_n)$ of private signals for each player given that everyone played the action profile $(a_1, \ldots, a_n)$. It is easy to see that this condition is equivalent to every deviation being detectable, that is, "unconditional" identifiability. The rest of the construction of this paper follows without any changes. Incidentally, if players have payoff-relevant private information, as in reputation models, the same logic applies and can easily be accommodated, although additional assumptions on the set $V_*$ will be necessary.

6.2 Social Incentives

Theorem 2 assumes that incentives can only be given individually. In fact, the detectability assumptions there are the weakest sufficient conditions consistent with this restriction. However, it is natural to ask when it is possible to give social incentives, with schemes that correlate continuation payoffs across players. For instance, in Example 1 it was convenient and intuitive to do so when $x_2 \ge x_1 > x_0$, even if sometimes conditional identifiability held. At the same time, we obtained a Folk Theorem under weak conditions there. In general, sufficient conditions for a Folk Theorem will be stronger than convex independence.

Definition 7. Given a vector of welfare weights $\lambda$, a proper $\lambda$-balanced scoring rule consists of failure probabilities $\xi$ and a payment scheme $\beta : I \times A \times \Omega \to \mathbb{R}$ such that

1. Belief stability:
$$0 \le \sum_{a_{-i}} (\xi_i(a, \omega) - \pi)\Pr(\omega|a_{-i}, b_i)\,\mu(a) \quad \forall (i, a_i, b_i, \omega), \tag{11}$$
where $\pi = \sum_{(a,\omega)} \xi_i(a, \omega)\Pr(\omega|a)\mu(a)$ for every player $i$.

2. Implementability: there exists $\gamma \in \mathbb{R}_+$ that yields incentive compatibility, i.e.,
$$\gamma \sum_{a_{-i}} \Delta u_i(a, b_i)\,\mu(a) \le \sum_{(a_{-i},\omega)} \left[\tfrac{\lambda_i}{|\lambda_i|}\,\xi_i(a, \omega) + \beta_i(a, \omega)\right]\omega\,\Delta x(a, b_i)\,\mu(a) \quad \forall (i, a_i, b_i);[14] \tag{12}$$

3. Budget balance: the payment scheme $\beta$ is welfare neutral, i.e.,
$$\sum_{i=1}^n \lambda_i \beta_i(a, \omega) = 0 \quad \forall (a, \omega).$$

[14] I interpret $\lambda_i/|\lambda_i| = 0$ when $\lambda_i = 0$.

A balanced scoring rule has two new properties: (i) it includes a payment scheme $\beta$, where payments accrue each period with probability one but are only paid out at the end of the block, and (ii) the scoring rule is oriented around the welfare weights: if $\lambda_i > 0$ then $\xi_i$ defines a punishment scheme, whereas if $\lambda_i < 0$ then $\xi_i$ defines a reward scheme. If $\lambda_i = 0$ then player $i$ is not given incentives with a scoring rule, but only with payments $\beta_i$. Moreover, the payment scheme is budget-balanced around the vector of welfare weights $\lambda$.

A deviation $\sigma_i \in M(A_i)$ is conditionally irreversible if
$$\Pr(\omega|a_i, \cdot) = y_i(a_i, \sigma_i, \omega)\Pr(\omega|\sigma_i(a_i), \cdot) + \sum_{b_i} y_i(a_i, b_i, \omega)\Pr(\omega|b_i, \cdot)$$
for some $y \ge 0$ and $(i, a_i, \omega)$ implies that $y_i(a_i, \sigma_i, \omega) = 0$. It is clear from the proof of Proposition 2 that what is necessary for it to hold is that every profitable deviation is conditionally irreversible with respect to Pr. Regarding drifts, we obtain a similar condition for Proposition 3. For Proposition 6 we qualify utility gains to be non-negative. We seek identifiability conditions that guarantee incentive compatibility. A profile of deviations $\sigma$ is $\lambda$-unattributable (Rahman and Obara, 2010) if there is a vector $\eta \in \mathbb{R}^{A \times \Omega}$ such that $\Delta\Pr(\omega|a, \sigma_i) = \lambda_i \eta(a, \omega)$ for every player $i$. The profile $\sigma$ is conditionally irreversible with respect to $\lambda$ if $\frac{\lambda_i}{|\lambda_i|}\sigma_i$ is conditionally irreversible for each $i$. Finally, $\sigma$ is profitable if the sum of unilateral deviation gains across individuals is positive:
$$\sum_{i=1}^n \Delta u_i(\sigma_i, \mu) > 0.$$

Proposition 7. Let $\mu$ be a completely mixed correlated strategy and $\lambda$ a regular[15] vector of welfare weights. If every profitable, $\lambda$-unattributable deviation profile is conditionally irreversible with respect to $\lambda$ then a proper $\lambda$-balanced scoring rule exists.

If a profile of deviations is unattributable then—statistically speaking—as far as the mediator knows anyone could have been the deviator. If $\eta = 0$ then the deviations are undetectable and there is nothing that the mediator can do to prevent them, but if $\eta \ne 0$ then in order to provide incentives the mediator cannot punish some and reward others simultaneously, which generally leads to value-burning. To apply the punishment and reward schemes of Section 4, it is enough that (i) every individually profitable deviation is detectable, and (ii) profitable, unattributable deviations are conditionally irreversible.

[15] Regular means that at least two elements of $\lambda$ are non-zero, so $\lambda$ is not a coordinate vector.

I should mention that Proposition 7 also applies in the limiting case with conditional irreversibility in drifts rather than probabilities, as in Section 4, as the proof suggests. From Proposition 7, a somewhat weaker Folk Theorem than Theorem 2 obtains. For regular λ, punishment and reward schemes are designed as in Section 4, except that now, at the end of a T-period block, if the mediator's history was (a^T, ω^T), then each player i pays the amount $\sum_\tau \beta_i(a_\tau,\omega_\tau)$ in units of continuation value in period T. The utility to player i over a T-period block includes the transfers β_i:

$$(1-\delta^T)\,U_i'(a^T,\omega^T) = (1-\delta)\sum_{\tau=1}^T \big[\delta^{\tau-1}u_{i\tau}(a_\tau) + \delta^T\beta_i(a_\tau,\omega_\tau)\big],$$

where $U_i'(a^T,\omega^T)$ is player i's utility net of payments β_i. Now suppose that λ is a coordinate vector. If λ = −1_i then we can decompose a boundary payoff with a correlated equilibrium, as usual, whereas if λ = 1_i assume that there is an enforceable correlated strategy that maximizes i's expected payoff. In this case, i needs no incentives. If c is small relative to 1/r, players' net utility gains from a deviation will be bounded, so the incentive schemes of Section 4 apply and we can obtain a Folk Theorem in the same way as for Theorem 2, except that the detectability assumptions are the weaker ones of Proposition 7. The detectability condition of Proposition 7 reconciles the characterization of a Folk Theorem in the Prisoners' Dilemma of Example 1. There, it was necessary and sufficient that every profitable deviation was detectable. Indeed, since players only have two actions, a profitable deviation simply cannot be reversed, hence the result for two-by-two games.
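As a quick sanity check on this accounting, the block-utility identity above can be verified numerically. All numbers below (discount rate, period length, payoffs, transfers) are hypothetical placeholders, not values from the paper; with constant stage payoffs and transfers the displayed sum collapses to a simple closed form.

```python
import math

# Hypothetical parameters for illustration only
r, dt, T = 0.05, 0.1, 20           # discount rate, period length, block length
delta = math.exp(-r * dt)          # per-period discount factor
u = [1.0] * T                      # stage payoffs u_{i,tau}(a_tau), constant here
beta = [-0.3] * T                  # end-of-block transfers beta_i(a_tau, omega_tau)

# Normalized block utility: flows discounted at delta^(tau-1), transfers all paid
# in units of continuation value at the end of the block, hence weighted delta^T.
U = (1 - delta) * sum(delta ** (tau - 1) * u[tau - 1] + delta ** T * beta[tau - 1]
                      for tau in range(1, T + 1))

# With constant u and beta this reduces to
# (1 - delta^T) * u + (1 - delta) * T * delta^T * beta.
closed_form = (1 - delta ** T) * u[0] + (1 - delta) * T * delta ** T * beta[0]
assert abs(U - closed_form) < 1e-12
```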

6.3 Other Signal Structures

Relatedly, part of the reason why the condition of Proposition 4 is so strong is that the information structure used to reach the continuous-time limit is arguably sparse. As shown by Fudenberg and Levine (2007, 2009), the kind of information structure used to reach the continuous-time limit can have crucial consequences. In this paper I took the extreme case where it was impossible to obtain positive results with the approach they suggested: I assumed a binomial random walk, where actions affect the drift but not the volatility of the limit diffusion process. This is important because it is well known that volatility is effectively observable, since it can be immediately inferred from a Brownian motion (Fudenberg and Levine, 2007, 2009, and references therein). On the other hand, if the public signals are richer than just binary, then it may be more difficult to satisfy conditional identifiability for every signal. Nevertheless, for a general signaling structure, Theorem 2 holds as long as the condition in the consequent of Proposition 1 holds.


6.4 Discrete Time

Theorem 2 is, of course, related to the Folk Theorem in discrete time. By Theorem 2, the order of limits between r → 0 and Δt → 0 does not matter for approaching V*. Moreover, for fixed Δt > 0, schemes similar to those derived above deliver a Folk Theorem, although some of the details need amending. In particular, τ_i** must be slightly different when Δt is fixed but T is variable for punishment schemes to be incentive compatible.

6.5 Dispensing with the Mediator

To some extent, the mediator can be dispensed with easily. This requires resorting to mixed strategy profiles rather than correlated strategies. Also, instead of taking recommendations, players may report their intended action before actually playing it. (Relatedly, Kandori (2003) suggested that players report what they played to each other; Rahman (2012a) shows that, generally, it is better for players to report intentions than actual behavior.) The scoring rules derived above for recommendations ensure that players follow through on any intentions they report. One way to induce the right intentions is to solicit them at the beginning of every period, charge players in units of continuation value for their reports in order to keep them indifferent over reports, and subject them to punishments and rewards as defined above so that actual behavior agrees with the profile of reported intentions. To illustrate, assume that every player will be given incentives via punishments. Let µ_i be a mixed strategy for player i and $\mu_{-i} = \prod_{j\neq i}\mu_j$ the product of others' mixed strategies. Let $u_{i*} = \min_{a_i}\sum_{a_{-i}} u_i(a)\mu_{-i}(a_{-i})$ and $u_{i*}(a_i) = \sum_{a_{-i}} u_i(a)\mu_{-i}(a_{-i}) - u_{i*}$. Given an initial block of length c, if player i reports the intention to play a_{it} at date t ∈ [0, c], he will be charged $w_{it}(a_{it}) = (1-e^{-r\Delta t})e^{-rt}u_{i*}(a_{it})$ units of continuation value during subsequent blocks. In addition, player i will face a punishment scheme as defined in Section 4.2, where the mediator's recommendations are replaced with players' intentions. Given any intended action, a punishment scheme that satisfies incentive compatibility as described in Section 4.2 immediately satisfies incentive compatibility in this setting. Given that every player follows their reported intentions, the probability of ultimate punishment is independent of the actual reported intentions. Therefore, if a player abides by his intentions, different reported intentions only affect a player's payoff through the direct charge w_{it}(a_{it}). But this is constructed to keep a player indifferent over possible reports in this case. Hence:

Proposition 8. An incentive compatible punishment or reward scheme remains incentive compatible when recommendations are replaced with intentions as described above.

However, intentions are subject to Bhaskar's (2000) critique. For a player to reveal his intentions, he must be indifferent over reports and, playing mixed strategies, must also be

indifferent over the pure actions in their support, too. With a mediator, these ties can be broken easily and robustly. Moreover, for some games detectability requires perfectly correlated behavior by some players that is crucially kept secret from others. In such games, it is unlikely that the mediator can be dispensed with. See Rahman (2012a) for details. Even if the mediator cannot be dispensed with, it may be possible to decentralize it with plain conversation, as argued by Forges (1986, 1990); she required four or more players, though. If players communicated through actions instead of reporting intentions as above, any information communicated by players might be subject to additional incentive constraints.
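The report-indifference construction above can be sketched numerically. The 2x2 stage game and parameters below are hypothetical placeholders; the point is only that, once the premium u_{i*}(a_i) is charged, every report yields the same net flow payoff u_{i*}, so a player has no reason to misreport intentions.

```python
import math

# Hypothetical stage payoffs for player i against opponent actions; mu_{-i} mixes 50/50.
u_i = {('C', 'C'): 2.0, ('C', 'D'): -1.0, ('D', 'C'): 3.0, ('D', 'D'): 0.0}
mu_minus_i = {'C': 0.5, 'D': 0.5}

# Expected stage payoff of each own action against mu_{-i}.
exp_u = {ai: sum(u_i[(ai, aj)] * p for aj, p in mu_minus_i.items())
         for ai in ('C', 'D')}
u_star = min(exp_u.values())                         # u_{i*}
premium = {ai: exp_u[ai] - u_star for ai in exp_u}   # u_{i*}(a_i)

# Charge for reporting intention a_i at date t, in continuation-value units:
# w_it(a_i) = (1 - e^{-r dt}) e^{-rt} u_{i*}(a_i).
r, dt, t = 0.05, 0.1, 1.0
w = {ai: (1 - math.exp(-r * dt)) * math.exp(-r * t) * premium[ai] for ai in premium}

# Net flow payoff from reporting and then playing a_i is exp_u minus the premium,
# which equals u_star for every report: the player is indifferent across reports.
net = {ai: exp_u[ai] - premium[ai] for ai in exp_u}
assert all(abs(v - u_star) < 1e-12 for v in net.values())
```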

7 Conclusion

This paper studies repeated games with frequent actions, secret monitoring and infrequent coordination, showing how to sustain dynamic equilibria when imperfect monitoring converges to Brownian motion. The approach developed above relies on the use of mediated strategies, a plausible generalization of private strategies, which manage the delay and dissemination of endogenous strategic information. These mediated strategies may be thought of as dynamic information management institutions. These institutions form latent variables for each player that are revealed at regular intervals. The incentive schemes in this paper rely on empirical likelihood tests of obedience that apply not only to discrete-time problems, but also to continuous-time problems. The results, despite having been derived for the case of public monitoring, apply much more generally, including to environments with private monitoring and private payoff-relevant information. It is possible that continuous-time games may be useful for pointed analysis of outcomes with fixed discount rates. It would be interesting in the future to use the continuous-time limit to characterize the best forms of dynamic incentives for such fixed discount rates.

A Proofs

Proof of Lemma 3. If conditional identifiability fails then for some (i, a_i, ω) there is a vector y_i ≥ 0 such that $\Pr(\omega|a) = \sum_{b_i\neq a_i} y_i(b_i)\Pr(\omega|b_i,a_{-i})$ for all a_{-i}. Let $\hat\sigma_i = y_i/\sum_{b_i} y_i(b_i)$, and for any π ∈ (0, 1), define σ_i according to σ_i(a_i) = 1 − π and $\sigma_i(b_i) = \pi\hat\sigma_i(b_i)$ for b_i ≠ a_i. Write $\Pr(\omega|\hat\sigma_i,a_{-i}) = \sum_{b_i\neq a_i}\hat\sigma_i(b_i)\Pr(\omega|b_i,a_{-i})$. Failure of conditional identifiability is equivalent to the existence of (i, a_i, ω) and $\hat\sigma_i$ such that

$$\frac{\Pr(\omega|\hat\sigma_i,a_{-i})}{\Pr(\omega|a)} = \frac{\Pr(\omega|\hat\sigma_i,b_{-i})}{\Pr(\omega|a_i,b_{-i})} \qquad \forall (a_{-i},b_{-i}).$$

Letting $\Pr(a,\omega) = \sum_{b_i}\sigma_i(b_i)\Pr(\omega|b_i,a_{-i})$, this is equivalent to

$$\frac{(1-\pi)\Pr(\omega|a)/\Pr(a,\omega)}{\pi\Pr(\omega|\hat\sigma_i,a_{-i})/\Pr(a,\omega)} = \frac{(1-\pi)\Pr(\omega|a_i,b_{-i})/\Pr(a_i,b_{-i},\omega)}{\pi\Pr(\omega|\hat\sigma_i,b_{-i})/\Pr(a_i,b_{-i},\omega)} \qquad \forall (a_{-i},b_{-i}).$$

But by definition of conditional probability, this is the same as

$$\frac{\Pr(a_i|a,\omega)}{1-\Pr(a_i|a,\omega)} = \frac{\Pr(a_i|a_i,b_{-i},\omega)}{1-\Pr(a_i|a_i,b_{-i},\omega)} \qquad \forall (a_{-i},b_{-i}),$$

where Pr(a_i|a, ω) is the probability that i played a_i given that a was recommended and ω realized. By monotonicity of x/(1 − x) with respect to x ∈ (0, 1), this is equivalent to

$$\Pr(a_i|a,\omega) = \Pr(a_i|a_i,b_{-i},\omega) \qquad \forall (a_{-i},b_{-i}).$$

The result now follows because Pr(a_i|a, ω) = Π_i(a, ω). ∎

Proof of Lemma 4. First argue by contraposition. Suppose that for all Δ > 0 there exists Δt ∈ (0, Δ) such that conic independence fails: either p(a_i, ·) ∈ conv{p(b_i, ·) : b_i ≠ a_i} or q(a_i, ·) ∈ conv{q(b_i, ·) : b_i ≠ a_i} for some (i, a_i). Without loss, the second inclusion holds infinitely often at (i, a_i) as Δ → 0, so there exists $y_i^{\Delta t}(a_i,\cdot) \ge 0$ such that for every a_{-i},

$$q(a) = \sum_{b_i\neq a_i} y_i^{\Delta t}(a_i,b_i)\,p(b_i,a_{-i}) \quad\Leftrightarrow \tag{13}$$
$$\tfrac12\big[1 - x(a)\sqrt{\Delta t}\big] = \tfrac12\sum_{b_i\neq a_i} y_i^{\Delta t}(a_i,b_i)\big[1 - x(b_i,a_{-i})\sqrt{\Delta t}\big] \quad\Leftrightarrow \tag{14}$$
$$x(a) = \sum_{b_i\neq a_i} y_i^{\Delta t}(a_i,b_i)\,x(b_i,a_{-i}) + \Big(1 - \sum_{b_i\neq a_i} y_i^{\Delta t}(a_i,b_i)\Big)\Big/\sqrt{\Delta t}. \tag{15}$$

For any sequence {Δt_m > 0} decreasing to 0, consider the corresponding sequence $\{y_i^m = y_i^{\Delta t_m}\}$. Let $\bar y_i^m(a_i) = \sum_{b_i\neq a_i} y_i^m(a_i,b_i)$. Without loss assume that $\{\bar y_i^m(a_i)\}$ is a monotone sequence, so it has a (possibly infinite) limit, $\bar y_i(a_i)$. If $\bar y_i(a_i) = 0$ then obviously (14) fails. If $\bar y_i(a_i) = \infty$ then divide (15) by $\bar y_i^m(a_i)$, rearrange terms, and write $\pi_i^m(a_i) \in \Delta(A_i\setminus\{a_i\})$ for $\pi_i^m(a_i,b_i) = y_i^m(a_i,b_i)/\bar y_i^m(a_i)$ to obtain

$$0 = \sum_{b_i\neq a_i} \pi_i^m(a_i,b_i)\,x(b_i,a_{-i}) - \frac{x(a)}{\bar y_i^m(a_i)} + \frac{\bar y_i^m(a_i)-1}{\bar y_i^m(a_i)}\Big/\sqrt{\Delta t_m}. \tag{16}$$

But the right-hand side of (16) explodes, a contradiction. To see this, note that the first term, $\sum_{b_i\neq a_i}\pi_i^m(a_i,b_i)x(b_i,\cdot)$, lies in the bounded set conv{x(b_i, ·) : b_i ≠ a_i}, so is bounded, too. The second term, $x(a_i,\cdot)/\bar y_i^m(a_i)$, clearly converges to zero because $\bar y_i^m(a_i)\to\infty$. The third term explodes, too, since $[\bar y_i^m(a_i)-1]/\bar y_i^m(a_i)\to 1$ and $1/\sqrt{\Delta t_m}\to\infty$. If $\bar y_i(a_i)\in\mathbb{R}_+$ is different from 1 then again $[1-\bar y_i^m(a_i)]/\sqrt{\Delta t_m}$ explodes, leading to another contradiction of (16). Finally, suppose that $\bar y_i(a_i) = 1$. Since y ≥ 0, it follows that $y_i^m(a_i,b_i)$ is a bounded sequence, hence has a convergent subsequence. Taking subsequences of subsequences if necessary, there is a subsequence such that all $y_i^m(a_i,b_i)$ converge together to some limit $y_i(a_i,b_i)$. Depending on the rate at which $\bar y_i^m(a_i)\to 1$ relative to $\Delta t_m\to 0$, the term $(1-\bar y_i^m(a_i))/\sqrt{\Delta t_m}$ can converge to any real number, independently of a_{-i}. The first claim now follows from (15) because it implies that (4) does not hold.

For the converse result, suppose that (4) fails for some (i, a_i); in other words, there exists a vector y ≥ 0 with $\sum_{b_i\neq a_i} y(b_i) = 1$ and a scalar α such that $x(a_i,\cdot) = \sum_{b_i\neq a_i} y(b_i)x(b_i,\cdot) + \alpha$. By the definitions of p(a) and q(a), it follows that $\Pr(\omega|a_i,\cdot) - \sum_{b_i\neq a_i} y(b_i)\Pr(\omega|b_i,\cdot)$ equals

$$\tfrac12\Big[1 - \sum_{b_i\neq a_i} y(b_i)\Big] \pm \tfrac12\Big[x(a_i,\cdot) - \sum_{b_i\neq a_i} y(b_i)x(b_i,\cdot)\Big]\sqrt{\Delta t} = \pm\tfrac12\,\alpha\sqrt{\Delta t} \to 0 \quad\text{as } \Delta t\to 0,$$

as was claimed. ∎
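The converse direction rests on an exact cancellation that is easy to check numerically: with binary signals $p(a) = \tfrac12[1+x(a)\sqrt{\Delta t}]$, a convex combination of drifts that misses x(a_i, ·) by a constant α produces a signal gap of exactly $\tfrac12\alpha\sqrt{\Delta t}$. The drifts and weights below are hypothetical placeholders.

```python
import math

# Hypothetical drifts x(b_i, a_-i) for the other actions of player i (fixed a_-i),
# chosen so that x(a_i, .) = sum_b y(b) x(b, .) + alpha with sum_b y(b) = 1.
x_b = {'b1': 0.4, 'b2': -0.2}     # drifts of the other actions
y = {'b1': 0.3, 'b2': 0.7}        # convex weights, summing to 1
alpha = 0.25
x_a = sum(y[b] * x_b[b] for b in y) + alpha   # drift of a_i

def p(x, dt):
    """Probability of the 'up' signal: p = (1 + x*sqrt(dt))/2."""
    return 0.5 * (1 + x * math.sqrt(dt))

# Pr(omega|a_i,.) - sum_b y(b) Pr(omega|b_i,.) equals alpha*sqrt(dt)/2 exactly,
# so it vanishes as dt -> 0: the deviation becomes undetectable in the limit.
for dt in (0.1, 0.01, 0.0001):
    gap = p(x_a, dt) - sum(y[b] * p(x_b[b], dt) for b in y)
    assert abs(gap - 0.5 * alpha * math.sqrt(dt)) < 1e-12
```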

Proof of Proposition 1. First notice that, by definition of x and p,

$$\frac{\Pr(\omega|a)}{\Pr(\omega|\sigma_i,a_{-i})} - \frac{\Pr(\omega|a_i,b_{-i})}{\Pr(\omega|\sigma_i,b_{-i})} = 1 + \frac{\Pr(\omega|a)}{\Pr(\omega|\sigma_i,a_{-i})} - 1 - \frac{\Pr(\omega|a_i,b_{-i})}{\Pr(\omega|\sigma_i,b_{-i})} = \frac{\Pr(\omega|a) - \Pr(\omega|\sigma_i,a_{-i})}{\Pr(\omega|\sigma_i,a_{-i})} - \frac{\Pr(\omega|a_i,b_{-i}) - \Pr(\omega|\sigma_i,b_{-i})}{\Pr(\omega|\sigma_i,b_{-i})}.$$

If x fails conditional identifiability then there is a mixed strategy $\hat\sigma_i$ and a scalar α_i with

$$x(a) - \sum_{b_i\neq a_i}\hat\sigma_i(b_i)x(b_i,a_{-i}) = \alpha_i \qquad\forall a_{-i}.$$

Applying the previous formula to $\hat\sigma_i$ yields

$$\frac{\Pr(\omega|a) - \Pr(\omega|\hat\sigma_i,a_{-i})}{\Pr(\omega|\hat\sigma_i,a_{-i})} - \frac{\Pr(\omega|a_i,b_{-i}) - \Pr(\omega|\hat\sigma_i,b_{-i})}{\Pr(\omega|\hat\sigma_i,b_{-i})} = \pm\Bigg[\frac{\tfrac12\alpha_i\sqrt{\Delta t}}{\Pr(\omega|\hat\sigma_i,a_{-i})} - \frac{\tfrac12\alpha_i\sqrt{\Delta t}}{\Pr(\omega|\hat\sigma_i,b_{-i})}\Bigg].$$

Therefore,

$$\frac{1}{\sqrt{\Delta t}}\Bigg[\frac{\Pr(\omega|a)}{\Pr(\omega|\hat\sigma_i,a_{-i})} - \frac{\Pr(\omega|a_i,b_{-i})}{\Pr(\omega|\hat\sigma_i,b_{-i})}\Bigg] = \pm\tfrac12\,\frac{\alpha_i\big[\Pr(\omega|\hat\sigma_i,b_{-i}) - \Pr(\omega|\hat\sigma_i,a_{-i})\big]}{\Pr(\omega|\hat\sigma_i,a_{-i})\Pr(\omega|\hat\sigma_i,b_{-i})} \to 0 \quad\text{as } \Delta t\to 0.$$

Conversely, if x satisfies conditional identifiability then for every mixed strategy σ_i such that σ_i(a_i) < 1 there exist (a_{-i}, α_i) and (b_{-i}, β_i) such that α_i ≠ β_i and both

$$x(a) - \sum_{b_i}\sigma_i(b_i)x(b_i,a_{-i}) = \alpha_i \quad\text{and}\quad x(a_i,b_{-i}) - \sum_{b_i}\sigma_i(b_i)x(b_i,b_{-i}) = \beta_i.$$

Following the previous steps,

$$\frac{1}{\sqrt{\Delta t}}\Bigg[\frac{\Pr(\omega|a)}{\Pr(\omega|\hat\sigma_i,a_{-i})} - \frac{\Pr(\omega|a_i,b_{-i})}{\Pr(\omega|\hat\sigma_i,b_{-i})}\Bigg] = \pm\tfrac12\,\frac{\alpha_i\Pr(\omega|\hat\sigma_i,b_{-i}) - \beta_i\Pr(\omega|\hat\sigma_i,a_{-i})}{\Pr(\omega|\hat\sigma_i,a_{-i})\Pr(\omega|\hat\sigma_i,b_{-i})} \to \pm(\alpha_i-\beta_i) \quad\text{as } \Delta t\to 0,$$

since each of the probabilities above converges to $\tfrac12$. ∎
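The limit just computed can be illustrated numerically: the $\sqrt{\Delta t}$-scaled difference of likelihood ratios converges to $\pm(\alpha_i-\beta_i)$, so the deviation remains statistically identifiable whenever α_i ≠ β_i. The drifts and mixture below are hypothetical placeholders.

```python
import math

# Hypothetical drifts at two opponent profiles a_-i ('a-') and b_-i ('b-').
x = {('a', 'a-'): 0.5, ('b1', 'a-'): 0.1, ('b2', 'a-'): -0.3,
     ('a', 'b-'): 0.2, ('b1', 'b-'): 0.4, ('b2', 'b-'): 0.0}
sigma = {'b1': 0.6, 'b2': 0.4}    # the mixed deviation sigma_hat_i

def p(drift, dt):
    """Probability of the 'up' signal: p = (1 + drift*sqrt(dt))/2."""
    return 0.5 * (1 + drift * math.sqrt(dt))

x_sig_a = sum(sigma[b] * x[(b, 'a-')] for b in sigma)
x_sig_b = sum(sigma[b] * x[(b, 'b-')] for b in sigma)
alpha = x[('a', 'a-')] - x_sig_a  # alpha_i at a_-i
beta = x[('a', 'b-')] - x_sig_b   # beta_i  at b_-i

# The scaled difference of likelihood ratios converges to alpha - beta as dt -> 0.
for dt in (1e-2, 1e-4, 1e-6):
    scaled = (p(x[('a', 'a-')], dt) / p(x_sig_a, dt)
              - p(x[('a', 'b-')], dt) / p(x_sig_b, dt)) / math.sqrt(dt)
    assert abs(scaled - (alpha - beta)) < 2 * math.sqrt(dt)
```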

Proof of Lemma 5. By feasibility, $\sum_{a_{-i}}\xi_i(a,\omega)\Pr(\omega|a)\mu(a) \ge \pi_i\sum_{a_{-i}}\Pr(\omega|a)\mu(a)$ for all (i, a_i, ω). Summing with respect to (a_i, ω) yields

$$\sum_{(a,\omega)}\xi_i(a,\omega)\Pr(\omega|a)\mu(a) \ge \pi_i.$$

However, feasibility requires that $\pi_i = \sum_{(a,\omega)}\xi_i(a,\omega)\Pr(\omega|a)\mu(a)$, which is violated if there exists (i, a_i, ω) such that $\sum_{a_{-i}}\xi_i(a,\omega)\Pr(\omega|a)\mu(a) > \pi_i\sum_{a_{-i}}\Pr(\omega|a)\mu(a)$. ∎

Proof of Lemma 6. Let ξ be any proper scoring rule with prior punishment probability vector π. If π_i = α_i for all i then we are done. Otherwise, if π_i > α_i then simply scale down ξ_i to obtain $\xi_i'(a,\omega) = \alpha_i\xi_i(a,\omega)/\pi_i$. It follows that

$$\pi_i' = \sum_{(a,\omega)}\xi_i'(a,\omega)\Pr(\omega|a)\mu(a) = \frac{\alpha_i}{\pi_i}\sum_{(a,\omega)}\xi_i(a,\omega)\Pr(\omega|a)\mu(a) = \alpha_i\pi_i/\pi_i = \alpha_i.$$

Finally, if π_i < α_i then pick $\beta_i = (\alpha_i-\pi_i)/(1-\pi_i)$ and $\xi_i'(a,\omega) = \beta_i + (1-\beta_i)\xi_i(a,\omega)$. Notice that β_i ∈ (0, 1), so ξ_i' ∈ [0, 1] and still satisfies (6). Now, similarly to the previous case where π_i > α_i, we obtain that $\pi_i' = \beta_i + (1-\beta_i)\pi_i = \alpha_i$. Since the transformations applied to ξ were affine and monotone, it follows that ξ' is still a proper scoring rule. ∎

Proof of Proposition 2. The proof is a simple application of duality. Consider the following linear program:

$$V(\mu,\Delta t) = \sup_{\gamma,\,\xi\ge 0,\,\pi}\ \gamma \quad\text{s.t.}\quad \xi_i(a,\omega)\mu(a) \le \mu(a) \quad\forall (i,a,\omega),$$
$$\gamma\sum_{a_{-i}}\Delta u_i(b_i,a_{-i})\mu(a) \le \sum_{(a_{-i},\omega)}\xi_i(a,\omega)\,\Delta\Pr(\omega|a,b_i)\mu(a) \quad\forall (i,a_i,b_i),$$
$$\sum_{a_{-i}}\Pr(\omega|a_{-i},b_i)\mu(a)\,\pi_i \le \sum_{a_{-i}}\xi_i(a,\omega)\Pr(\omega|a_{-i},b_i)\mu(a) \quad\forall (i,a_i,b_i,\omega),$$
$$\pi_i \ge \sum_{(a,\omega)}\xi_i(a,\omega)\Pr(\omega|a)\mu(a) \quad\forall i.$$

If a proper scoring rule ξ exists then V(µ, Δt) > 0. (Since $\Delta\Pr(\omega|a,b_i) = \omega\,\Delta x(a,b_i)\tfrac12\sqrt{\Delta t}$, apart from the constant multiple $\tfrac12\sqrt{\Delta t}$, the second family of constraints above coincides with implementability (6) for proper scoring rules.) The dual of this problem is given by

$$V(\mu,\Delta t) = \inf_{\eta,\sigma,y\ge 0}\ \sum_{(i,a,\omega)}\eta_i(a,\omega)\mu(a) \quad\text{s.t.}\quad \sum_i \Delta u_i(\mu,\sigma_i) \ge 1,$$
$$\eta_i(a,\omega) \ge \omega\,\Delta x(a,\sigma_i) + \sum_{b_i} y_i(a_i,b_i,\omega)\Pr(\omega|a_{-i},b_i) - \hat y_i\Pr(\omega|a) \quad\forall (i,a,\omega),$$
$$\hat y_i = \sum_{(a,b_i,\omega)} y_i(a_i,b_i,\omega)\Pr(\omega|a_{-i},b_i)\mu(a) \quad\forall i.$$

Suppose V(µ, Δt) = 0. Since η ≥ 0, necessarily η_i(a, ω) = 0 for all (i, a, ω). Substituting for $\hat y_i$, it follows that $\sum_{(a,\omega)}\big[\sum_{b_i} y_i(a_i,b_i,\omega)\Pr(\omega|a_{-i},b_i)\mu(a) - \hat y_i\Pr(\omega|a)\mu(a)\big] = 0$. Since, in addition, $\sum_\omega\Delta\Pr(\omega|a,\sigma_i) = 0$ for each a, the right-hand side of the second dual inequality above adds up to zero with respect to (a, ω). So, if there exists (a, ω) such that this right-hand side is negative then there must exist another (a, ω) for which this right-hand side is positive, contradicting the hypothesis that η_i(a, ω) = 0 for all (a, ω). Therefore, this right-hand side must equal zero for all (a, ω). Rearranging this equation and dividing by µ(a) > 0 yields

$$\sum_{b_i}\Pr(\omega|a_{-i},b_i)\big[\sigma_i(b_i|a_i) + y_i(a_i,b_i,\omega)\big] - \Pr(\omega|a)\Big[\sum_{b_i}\sigma_i(b_i|a_i) + \hat y_i\Big] = 0 \quad\forall (i,a,\omega).$$

Finally, dividing by $\sum_{b_i}\sigma_i(b_i|a_i) + \hat y_i$ we obtain that Pr(ω|a) is a positive linear combination of Pr(ω|a_{-i}, b_i), with weights that depend on (a_i, b_i, ω), but not on a_{-i}. By the first dual inequality above, σ_i is (proportional to) a profitable deviation, therefore Pr fails to exhibit conditional identifiability for that player. ∎

Proof of Lemma 7. This follows immediately from the Maximum Theorem and the fact that, while ξ is a proper scoring rule, $C_i^* < \infty$. The fact that µ is a completely mixed correlated strategy is used only to obtain continuity for $\pi_i^{**}$, since it is defined in terms of conditional probabilities obtained from µ. ∎

Proof of Proposition 3. Let $W(\mu,\Delta t) = V(\mu,\Delta t)/\sqrt{\Delta t}$ for all Δt > 0, where V(µ, Δt) was defined in the proof of Proposition 2, so

$$W(\mu,\Delta t) = \sup_{\lambda,\xi,\pi}\ \lambda \quad\text{s.t.}\quad 0 \le \xi_i(a,\omega) \le 1 \quad\forall (i,a,\omega),$$
$$\lambda\sum_{a_{-i}}\Delta u_i(b_i,a_{-i})\mu(a) \le \tfrac12\sum_{(a_{-i},\omega)}\xi_i(a,\omega)\,\omega\,\Delta x(a,b_i)\mu(a) \quad\forall (i,a_i,b_i),$$
$$\sum_{a_{-i}}\Pr(\omega|a_{-i},b_i)\mu(a)\,\pi_i \le \sum_{a_{-i}}\xi_i(a,\omega)\Pr(\omega|a_{-i},b_i)\mu(a) \quad\forall (i,a_i,b_i,\omega),$$
$$\pi_i = \sum_{(a,\omega)}\xi_i(a,\omega)\Pr(\omega|a)\mu(a) \quad\forall i.$$

By Lemma 4, W(µ, Δt) > 0 for all small Δt > 0. (Note that W may still be unbounded.) I will show that $W(\mu,0) = \lim_{\Delta t\to 0} W(\mu,\Delta t) > 0$. This clearly yields $\overline C_i^* > 0$ and $\overline\pi_i \in (0,1)$, since $\overline\pi_i = 0$ or 1 implies that the $\overline\xi_i(a,\omega)$ all equal 0 or 1 by virtue of the fact that $\overline\pi_i = \tfrac12\sum_{(a,\omega)}\overline\xi_i(a,\omega)\mu(a)$. By the incentive constraint above, this in turn implies λ = 0,


contradicting W(µ, 0) > 0. The dual of the problem above equals

$$W(\mu,\Delta t) = \inf_{\eta,\sigma,y\ge 0}\ \sum_{(i,a,\omega)}\eta_i(a,\omega)\mu(a) \quad\text{s.t.}\quad \sum_i \Delta u_i(\mu,\sigma_i) \ge 1,$$
$$\eta_i(a,\omega) \ge \omega\,\Delta x(a,\sigma_i) + \sum_{b_i} y_i(a_i,b_i,\omega)\Pr(\omega|a_{-i},b_i) - \hat y_i\Pr(\omega|a) \quad\forall (i,a,\omega),$$
$$\hat y_i = \sum_{(a,b_i,\omega)} y_i(a_i,b_i,\omega)\Pr(\omega|a_{-i},b_i)\mu(a) \quad\forall i.$$

For a contradiction, suppose that W(µ, Δt) → 0, and assume that (η, σ, y) solve the dual at Δt > 0. By the Maximum Theorem, W(µ, 0) = 0. Taking a subsequence if necessary, as Δt → 0 the solution (η, σ, y) converges to $(\overline\eta,\overline\sigma,\overline y)$, say, which must satisfy the dual constraints evaluated at Δt = 0 and $\overline\eta = 0$, since η ≥ 0:

$$\sum_i \Delta u_i(\mu,\overline\sigma_i) \ge 1,$$
$$0 \ge \pm\Delta x_i(a,\overline\sigma_i) + \tfrac12\Big[\sum_{b_i}\overline y_i^{\pm}(a_i,b_i) - \hat{\overline y}_i\Big] \quad\forall (i,a), \tag{17}$$
$$\hat{\overline y}_i = \tfrac12\sum_{(a,b_i)}\mu(a)\big[\overline y_i^{+}(a_i,b_i) + \overline y_i^{-}(a_i,b_i)\big] \quad\forall i. \tag{18}$$

Substituting for $\hat{\overline y}_i$ from (18) and adding all the inequalities in (17) weighted by µ yields

$$0 \ge \sum_{a\in A}\mu(a)\big[\Delta x_i(a,\overline\sigma_i) - \Delta x_i(a,\overline\sigma_i)\big] + \sum_{a\in A}\mu(a)\Big[\tfrac12\sum_{b_i}\big(\overline y_i^{+}(a_i,b_i) + \overline y_i^{-}(a_i,b_i)\big) - \hat{\overline y}_i\Big] = 0.$$

Hence, every right-hand side in (17) equals zero. This clearly contradicts Assumption 2. ∎

Proof of Lemma 8. We already established that $(1-\delta)\Delta u_i(\sigma_i) \le e^{-rc}w_i\,\Delta\pi_i(\sigma_i)f_i^{T-1}(\tau_i^{**})$. Since 1 − δ ≤ rΔt, it follows that $r\Delta t\,\Delta u_i(\sigma_i) \le e^{-rc}w_i\,\Delta\pi_i(\sigma_i)f_i^{T-1}(\tau_i^{**})$ implies (7). Since T ≤ c/Δt, this is implied by

$$rce^{rc}\,\frac{\Delta u_i(\sigma_i)}{\Delta\pi_i(\sigma_i)} \le w_i\,T f_i^{T-1}(\tau_i^{**})$$

after some rearrangement. Maximizing the left-hand side with respect to σ_i and substituting for f_i yields

$$rce^{rc}\,\frac{\Delta u_i^*}{\Delta\pi_i^*} \le w_i\,T\binom{T-1}{\tau_i^{**}}\pi_i^{T-1-\tau_i^{**}}(1-\pi_i)^{\tau_i^{**}}.$$

But

$$\frac{T}{\pi_i}\binom{T-1}{\tau_i^{**}} = \frac{T-\tau_i^{**}}{\pi_i}\binom{T}{\tau_i^{**}} \ge \frac{1+\pi_i(T-1)}{\pi_i}\binom{T}{\tau_i^{**}} \ge T\binom{T}{\tau_i^{**}},$$

from which the claim follows. ∎
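The combinatorial chain at the end of this proof is elementary to verify. The sketch below, with hypothetical π_i and T and with π_i** set equal to π_i for simplicity, checks the identity T·C(T−1, k) = (T−k)·C(T, k) and the bound T − k ≥ 1 + π_i(T−1) > π_i T for k = ⌊(1−π_i**)(T−1)⌋.

```python
from math import comb

# Hypothetical failure probability; tau_i** = floor((1 - pi**)(T-1)) with pi** >= pi,
# here taken at the boundary case pi** = pi.
pi = 0.5
for T in (10, 50, 200):
    k = int((1 - pi) * (T - 1))
    # Identity used to pass from C(T-1, k) to C(T, k):
    assert T * comb(T - 1, k) == (T - k) * comb(T, k)
    # Bound used to absorb the 1/pi factor:
    assert T - k >= 1 + pi * (T - 1) > pi * T
```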

Proof of Lemma 9. The probability mass functions $f_i^{T-1}(\cdot|h_i^\tau)$ and $f_i^{T-1}(\cdot)$ are obtained from the convolution of independent Bernoulli trials. By belief stability (5), each of the Bernoulli trials that generate $f_i^{T-1}(\cdot|h_i^\tau)$ has a failure probability that is greater than or equal to that of the corresponding Bernoulli trial generating $f_i^{T-1}(\cdot)$, whose failure rate is π_i. At the same time, these failure probabilities cannot be larger than $\pi_i^{**}$ by definition of $\pi_i^{**}$. Let us simplify notation for the purpose of the proof: I will use f instead of the more cumbersome $f_i^{T-1}$. I will use the following classic observation due to Samuels (1965) regarding the mode of a sum of independent Bernoulli trials. (The notation is slightly different because π_i corresponds to the probability of failure, not success.)

Lemma 11 (Samuels, 1965, Theorem 1). If k is an integer such that $k \le (1-\overline\pi_i)(T-1) \le k+1$, where $\overline\pi_i = \sum_{\tau=1}^{T-1}\pi_{i\tau}/(T-1)$ and $\pi_i^{\vee} = \max_\tau\{\pi_{i\tau}\}$, then $f^{\vee}(k) \ge f^{\vee}(k-1)$, where $f^{\vee}$ is the probability mass function that excludes a trial with failure probability $\pi_i^{\vee}$.

Thus, the mode of a Binomial with failure probability p and n trials is ⌊(n+1)p⌋. Pick any sequence of failure probabilities in order from highest to lowest: $\{\pi_{i\tau}^{1}\}$. Let f be the probability mass function of successes from independent Bernoulli trials with failure probabilities $\{\pi_{i\tau}^{1}\}$. Let f₁ be defined by $f_1(n) = f^{\vee}(n-1) + \pi_i\,\Delta f^{\vee}(n)$, where $\Delta f^{\vee}(n) = f^{\vee}(n) - f^{\vee}(n-1)$. Of course, f(n) has the same expression except for $\pi_i^{1\vee}$ replacing π_i. If k₁ satisfies the conditions of Lemma 11 above for $\{\pi_{i\tau}^{1}\}$ then $f_1(k_1) \le f(k_1)$, since $\pi_i \le \pi_i^{1\vee}$ and $\Delta f^{\vee}(k_1) \ge 0$.

Now let $\{\pi_{i\tau}^{2}\}$ be the sequence of failure probabilities obtained by first replacing $\pi_i^{1\vee}$ with π_i and then reordering the probabilities from highest to lowest. Let f₂ be defined by $f_2(n) = f_1^{\vee}(n-1) + \pi_i\,\Delta f_1^{\vee}(n)$. Similarly, if k₂ satisfies the conditions of Lemma 11 above for $\{\pi_{i\tau}^{2}\}$ then $f_2(k_2) \le f_1(k_2)$. But since $\pi_{i\tau}^{2} \le \pi_{i\tau}^{1}$ for all τ, it follows that $k_2 \le k_1$, therefore $f_2(k_1) \le f_1(k_1) \le f(k_1)$. Proceeding inductively, it follows that $\pi_{i\tau}^{T-1} = \pi_i$ for all τ and $f_{T-1}(k_1) \le f(k_1)$. In other words, the probability of k₁ successes with failure probability π_i in every period is less than or equal to that with arbitrary failure probabilities $\{\pi_{i\tau}^{1}\}$.

Finally, the largest possible value of k₁ is associated with $\pi_{i\tau}^{1} = \pi_i^{**}$ for all τ, rendering $k_1 = \tau_i^{**}$. Notice that, by unimodality, for any other sequence of $\pi_{i\tau}^{1}$'s between π_i and $\pi_i^{**}$, $k_1 \le \tau_i^{**}$, therefore $f^{\vee}(k_1) \ge f^{\vee}(k_1-1)$ implies that $f^{\vee}(\tau_i^{**}) \ge f^{\vee}(\tau_i^{**}-1)$. This finally implies the claimed result. ∎

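Both the mode formula and the dominance used above can be checked by direct convolution of Bernoulli trials. The parameters below are hypothetical; the comparison is run for the extreme case in which every heterogeneous failure probability equals π_i**.

```python
from math import floor

def poisson_binomial(fail_probs):
    """PMF of the number of successes when trial t fails with probability fail_probs[t]."""
    pmf = [1.0]
    for q in fail_probs:  # q = failure probability, 1 - q = success probability
        pmf = [q * (pmf[n] if n < len(pmf) else 0.0)
               + (1 - q) * (pmf[n - 1] if n >= 1 else 0.0)
               for n in range(len(pmf) + 1)]
    return pmf

pi, pi_star, T = 0.5, 0.6, 21      # hypothetical pi_i, pi_i** and block length

# Mode of a Binomial with n = T-1 trials and failure probability p is floor((n+1)(1-p))
# successes, i.e. floor(T * (1 - pi)) here.
binom = poisson_binomial([pi] * (T - 1))
assert binom.index(max(binom)) == floor(T * (1 - pi))

# Dominance behind Lemma 9: at k1 = tau_i**, the all-pi_i pmf is no larger than
# the pmf generated by the (larger) failure probabilities pi_i**.
hetero = poisson_binomial([pi_star] * (T - 1))
k1 = floor((1 - pi_star) * (T - 1))
assert binom[k1] <= hetero[k1]
```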
Proof of Proposition 4. The first claim follows by a simple application of the de Moivre-Laplace Theorem. Recall the condition from Lemma 8:

$$rce^{rc}\,\frac{\Delta u_i^*}{\Delta\pi_i^*} \le w_i\,T\binom{T}{\tau_i^{**}}\pi_i^{T-\tau_i^{**}}(1-\pi_i)^{\tau_i^{**}}.$$

Rearranging T ≤ c/Δt and writing $\Delta\pi_i^* = z_i^*\tfrac12\sqrt{\Delta t}$ yields

$$rce^{rc}\,\frac{\Delta u_i^*}{z_i^*\sqrt{c}} \le w_i\,\tfrac12\sqrt{T}\binom{T}{\tau_i^{**}}\pi_i^{T-\tau_i^{**}}(1-\pi_i)^{\tau_i^{**}}.$$

Since $\tau_i^{**} = \lfloor(1-\pi_i^{**})(T-1)\rfloor$ and by assumption $\pi_i = \tfrac12$ along the sequence of scoring rules as Δt → 0, by the de Moivre-Laplace Theorem,

$$\tfrac12\sqrt{T}\binom{T}{\tau_i^{**}}\pi_i^{T-\tau_i^{**}}(1-\pi_i)^{\tau_i^{**}} \approx \tfrac12\sqrt{T}\,\frac{e^{-\frac{(\tau_i^{**}-(1-\pi_i)T)^2}{2T\pi_i(1-\pi_i)}}}{\sqrt{2\pi T\pi_i(1-\pi_i)}} \approx \frac{1}{\sqrt{2\pi}}\,e^{-\frac{[(\pi_i^{**}-\pi_i)\sqrt{T}]^2}{2\pi_i(1-\pi_i)}} \approx \frac{1}{\sqrt{2\pi}}\,e^{-\frac{(z_i^{**}\sqrt{c})^2}{2}} = \varphi(z_i^{**}\sqrt{c}).$$

Again by the de Moivre-Laplace Theorem, the left-hand side converges to the right-hand side, meaning that the ratio of the left- and right-hand sides above converges to 1. For the second claim, choose w_i so that inequality (9) holds with equality. Substituting this expression into the lifetime utility for player i yields

$$v_i \to u_i(\mu) - \frac{rc}{1-e^{-rc}}\cdot\frac{\Delta\overline u_i^*\,\overline\Pi_{i0}}{\varphi(\overline z_i^{**}\sqrt{c})\,\overline z_i^*\sqrt{c}} \quad\text{as } \Delta t\to 0.$$

By the Central Limit Theorem, a similar calculation to the previous one yields $\overline\Pi_{i0} \to 1 - \Phi(\overline z_i^{**}\sqrt{c})$. Moreover, recent improvements to the Berry-Esseen Theorem (see http://en.wikipedia.org/wiki/Berry-Esseen_theorem) establish uniform convergence in the Central Limit Theorem of order $\tfrac12\sqrt{\Delta t/c}$. For the third claim, recall that the hazard rate of the normal distribution explodes faster than linearly. Therefore, its inverse implodes:

$$\frac{1-\Phi(\overline z_i^{**}\sqrt{c})}{\varphi(\overline z_i^{**}\sqrt{c})\,\overline z_i^*\sqrt{c}} \to 0 \quad\text{as } c\to\infty.$$

Finally, for the last claim, if c → ∞ but rc → 0 as r → 0 (e.g., $c = r^{\varepsilon-1}$ with 0 < ε < 1) then rc/(1 − e^{−rc}) → 1, therefore

$$\lim_{r\to 0}\lim_{\Delta t\to 0} v_i = u_i(\mu),$$

as claimed. ∎

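The de Moivre-Laplace approximation at the heart of this proof can be verified numerically: the normalized binomial term approaches φ(z**√c) as Δt → 0 (equivalently, T → ∞ with c fixed). The drift and block length below are hypothetical; the computation is done in log space to avoid floating-point underflow at large T.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def log_binom(n, k):
    """log C(n, k) via lgamma, to stay in log space."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

z_star, c = 0.8, 4.0               # hypothetical failure drift z** and block length c
for T in (1000, 100000):
    dt = c / T
    pi_star = 0.5 * (1 + z_star * math.sqrt(dt))   # pi** = (1 + z** sqrt(dt))/2
    tau = math.floor((1 - pi_star) * (T - 1))      # tau** = floor((1 - pi**)(T-1))
    # log of (1/2) sqrt(T) C(T, tau) (1/2)^T, the term approximated in the proof
    log_lhs = math.log(0.5 * math.sqrt(T)) + log_binom(T, tau) - T * math.log(2)
    ratio = math.exp(log_lhs) / phi(z_star * math.sqrt(c))
    assert abs(ratio - 1) < 0.1    # ratio approaches 1 as T grows
```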
Proof of Lemma 10. The proof is straightforward. The first derivative of $\hat R_i$ equals

$$\hat R_i'(\hat\pi_i) = \sum_{\tau=0}^{\tau_i^*}\binom{T}{\tau}\big[\tau\hat\pi_i^{\tau-1}(1-\hat\pi_i)^{T-\tau} - (T-\tau)\hat\pi_i^{\tau}(1-\hat\pi_i)^{T-1-\tau}\big] = T\sum_{\tau=0}^{\tau_i^*}\bigg[\binom{T-1}{\tau-1}\hat\pi_i^{\tau-1}(1-\hat\pi_i)^{T-\tau} - \binom{T-1}{\tau}\hat\pi_i^{\tau}(1-\hat\pi_i)^{T-1-\tau}\bigg] = -T\binom{T-1}{\tau_i^*}\hat\pi_i^{\tau_i^*}(1-\hat\pi_i)^{T-1-\tau_i^*} < 0,$$

where the last equality holds because the sum telescopes; therefore $\hat R_i$ is a decreasing function of $\hat\pi_i$. Similarly, the second derivative equals

$$\hat R_i''(\hat\pi_i) = -T\binom{T-1}{\tau_i^*}\big[\tau_i^*\hat\pi_i^{\tau_i^*-1}(1-\hat\pi_i)^{T-1-\tau_i^*} - (T-1-\tau_i^*)\hat\pi_i^{\tau_i^*}(1-\hat\pi_i)^{T-2-\tau_i^*}\big]$$
$$= -T\binom{T-1}{\tau_i^*}\hat\pi_i^{\tau_i^*-1}(1-\hat\pi_i)^{T-2-\tau_i^*}\big[\tau_i^*(1-\hat\pi_i) - (T-1-\tau_i^*)\hat\pi_i\big]$$
$$= -T\binom{T-1}{\tau_i^*}\hat\pi_i^{\tau_i^*-1}(1-\hat\pi_i)^{T-2-\tau_i^*}\big[\tau_i^* - (T-1)\hat\pi_i\big]$$
$$= -T\binom{T-1}{\tau_i^*}\hat\pi_i^{\tau_i^*-1}(1-\hat\pi_i)^{T-2-\tau_i^*}\big[\lfloor\pi_i(T-1)\rfloor - 1 - (T-1)\hat\pi_i\big]$$
$$\ge -T\binom{T-1}{\tau_i^*}\hat\pi_i^{\tau_i^*-1}(1-\hat\pi_i)^{T-2-\tau_i^*}\big[\pi_i(T-1) - 1 - (T-1)\hat\pi_i\big]$$
$$= T\binom{T-1}{\tau_i^*}\hat\pi_i^{\tau_i^*-1}(1-\hat\pi_i)^{T-2-\tau_i^*}\big[(T-1)(\hat\pi_i-\pi_i) + 1\big] > 0.$$

Therefore, $\hat R_i$ is strictly convex on [π_i, 1]. ∎
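Both monotonicity and convexity of $\hat R_i$ are easy to confirm by finite differences, since $\hat R_i(\hat\pi_i)$ is the probability of at most τ_i* failures in T trials with failure probability π̂_i. The parameters below are hypothetical.

```python
from math import comb

def R_hat(p_hat, T, tau_star):
    """Probability of at most tau_star failures in T trials, failure prob p_hat."""
    return sum(comb(T, t) * p_hat ** t * (1 - p_hat) ** (T - t)
               for t in range(tau_star + 1))

pi, T = 0.5, 40                     # hypothetical pi_i and block length
tau_star = int(pi * (T - 1)) - 1    # tau_i* = floor(pi_i (T-1)) - 1, as in the proof

# Check monotonicity and convexity of R_hat on [pi, 1] by finite differences.
grid = [pi + k * (1 - pi) / 100 for k in range(101)]
vals = [R_hat(p, T, tau_star) for p in grid]
assert all(vals[k + 1] < vals[k] for k in range(100))    # strictly decreasing
diffs = [vals[k + 1] - vals[k] for k in range(100)]
assert all(diffs[k + 1] > diffs[k] for k in range(99))   # strictly convex
```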
Proof of Proposition 6. The first two claims follow immediately from Proposition 5 above, since $R_{i0} = \tfrac12$. It remains to establish the third claim, which, since rc/(1 − e^{−rc}) → 1 as rc → 0, boils down to showing that $\overline D_i < \infty$ and $\tfrac12\overline D_i \to \Delta u_i^d$. To see that $\overline D_i < \infty$, writing $E[\Delta\hat u_i(\sigma_i^T)]$, defined just after Lemma 10, as just $E[\Delta\hat u_i]$, by convexity

$$D_i^T(\sigma_i^T) := \frac{E[\Delta\hat u_i]}{E[R_i(\pi_i) - R_i(\hat\pi_i)]} = \frac{E[\Delta\hat u_i]}{E[\Delta\hat\pi_i]}\cdot\frac{E[\Delta\hat\pi_i]}{E[R_i(\pi_i) - R_i(\hat\pi_i)]} \le \frac{E[\Delta\hat u_i]}{E[\Delta\hat\pi_i]}\,E\!\left[\frac{\Delta\hat\pi_i}{R_i(\pi_i) - R_i(\hat\pi_i)}\right].$$

Notice that $E[\Delta\hat u_i]/E[\Delta\hat\pi_i] \le \Delta u_i^*/\Delta\pi_i^* < \infty$. By Lemma 10, the slope $[R_i(\pi_i) - R_i(\hat\pi_i)]/(\hat\pi_i - \pi_i)$ is a decreasing function of $\hat\pi_i$, with minimum at $\pi_i^{**}$. Therefore, $\Delta\hat\pi_i/[R_i(\pi_i) - R_i(\hat\pi_i)] \le \Delta\pi_i^{**}/[R_i(\pi_i) - R_i(\pi_i^{**})] < \infty$. Moreover, by Lemma 7 and Proposition 3, $\overline C_i^* = \lim_{\Delta t\to 0} C_i^* < \infty$, and

$$\frac{z_i^{**}}{R_i(\pi_i) - R_i(\pi_i^{**})} \to \frac{\overline z_i^{**}}{\Phi(0) - \Phi(-\overline z_i^{**}\sqrt{c})} \quad\text{as } \Delta t\to 0$$

by the central limit theorem, so

$$\lim_{\Delta t\to 0} D_i^T(\sigma_i^T) \le \frac{\Delta\overline u_i^*}{\overline z_i^*}\cdot\frac{\overline z_i^{**}}{\Phi(0) - \Phi(-\overline z_i^{**}\sqrt{c})} < \infty$$

unless $\overline z_i^{**} = 0$, which would imply that µ is implementable with constant transfers, that is, there is no incentive problem. This last inequality implies that $\overline D_i < \infty$.

For the last claim, notice first that $\overline D_i \ge \Delta u_i^d/[\Phi(0) - \Phi(-z_i^d\sqrt{c})]$ by having player i always deviate to $\sigma_i^d$. It follows that $\lim_{c\to\infty}\overline D_i \ge 2\Delta u_i^d$. For the converse inequality, let

$$\tilde\sigma_i^T \in \arg\max_{\sigma_i^T}\ \frac{E[\Delta u_i(\sigma_i^T)]}{E[R_i(\pi_i) - R_i(\hat\pi_i)]}$$

be an optimal dynamic deviation for a T-period block. There exists $\pi_i^{\vee} \in [\pi_i,\pi_i^{**}]$ such that $E[R_i(\hat\pi_i)] = R_i(\pi_i^{\vee})$, therefore $E[R_i(\pi_i) - R_i(\hat\pi_i)] = R_i(\pi_i) - R_i(\pi_i^{\vee})$. By convexity of R_i, $\pi_i^{\vee} \le E[\hat\pi_i]$, but since R_i is strictly decreasing, $\pi_i^{\vee} = \pi_i$ only if $E[\Delta u_i(\sigma_i^T)] \le 0$. Let $\tilde\pi_i$ be the average failure probability generated by $\tilde\sigma_i$, that is, $\tilde\pi_i = \tfrac1T E[\sum_\tau \pi_{i\tau}]$, where $\pi_{i\tau}$ is the (random) failure probability associated with $\tilde\sigma_i^T$ in period τ. Let $\tilde z_i$ be the failure drift associated with $\tilde\pi_i$. Notice that $E[\Delta\hat u_i] \le \Delta\hat u_i^*(\tilde z_i)$, where $\Delta u_i^+(\sigma_i) = \sum_{(a_i,b_i)}\sigma_i(b_i|a_i)\max\{\sum_{a_{-i}}\mu(a)\Delta u_i(a,b_i),\,0\}$ and

$$\Delta\hat u_i^*(\tilde z_i) := \max_{\sigma_i\ge 0}\Big\{\Delta u_i^+(\sigma_i) : \tilde z_i \ge \sum_{(a,b_i,\omega)}\xi_i(a,\omega)\,\omega\,\Delta x(a,b_i)\sigma_i(b_i|a_i)\mu(a),\ \sum_{b_i}\sigma_i(b_i|a_i) \le 1\ \forall a_i\Big\}.$$

This follows simply because $E[\Delta\hat u_i]$ is constructed as a deterministic average of T random utility payoffs (one for each time period) that can be incorporated into σ_i. The function $\Delta\hat u_i^*(\tilde z_i)$ is defined as the value of a linear program. Its dual is given by

$$\min_{\lambda,\kappa_i\ge 0}\Big\{\lambda\tilde z_i + \sum_{a_i}\kappa_i(a_i) : \Delta u_i^+(a_i,b_i) \le \lambda\sum_{(a_{-i},\omega)}\xi_i(a,\omega)\,\omega\,\Delta x(a,b_i)\mu(a) + \kappa_i(a_i)\Big\},$$

where $\Delta u_i^+(a_i,b_i) = \max\{\sum_{a_{-i}}\mu(a)\Delta u_i(a,b_i),\,0\}$. To estimate $\overline D_i$, let us first bound the directional derivative of $\Delta\hat u_i^*$ at 0. By duality, this is bounded above by λ*, where (λ*, κ_i*) solve the dual above. Since the dual minimizes its objective, this λ* is bounded above by

$$C_i^+ = \sup_{\sigma_i\in M(A_i)}\ \frac{\sum_{(a_i,b_i)}\Delta u_i^+(a_i,b_i)\sigma_i(b_i|a_i)}{\sum_{(a,b_i,\omega)}\xi_i(a,\omega)\,\omega\,\Delta x(a,b_i)\sigma_i(b_i|a_i)\mu(a)}.$$

This follows because $C_i^+$, together with κ_i ≡ 0, is clearly a feasible solution for λ in the dual. By Proposition 3, there exists a proper scoring rule for $\Delta u_i^+$, so without loss we may assume that ξ_i is such a rule. Therefore, $C_i^+ = C_i^* < \infty$ and $\overline C_i^* < \infty$.

As Δt → 0, $R_i(\pi_i) - R_i(\pi_i^{\vee}) \to \Phi(0) - \Phi(-\overline z_i^{\vee}\sqrt{c})$, where $\overline z_i^{\vee} \in [0, z_i^{**}]$ is the limit of failure drifts corresponding to $\pi_i^{\vee}$ as Δt → 0. If $\overline z_i^{\vee} = 0$ then by l'Hopital's Rule

$$\lim_{\Delta t\to 0}\max_{\sigma^T}\frac{E[\Delta u_i(\sigma_i^T)]}{E[R_i(\pi_i) - R_i(\hat\pi_i)]} \le \frac{\Delta u_i^{+*}}{\varphi(0)\,\overline z_i^*\sqrt{c}} = \frac{\overline C_i^*}{\varphi(0)\sqrt{c}}.$$

By convexity of Φ(z) for z ≤ 0 (see Figure 4), it follows that

$$\varphi(0) \ge \frac{\Phi(0) - \Phi(-\overline z_i^*\sqrt{c})}{\overline z_i^*\sqrt{c}}.$$

Therefore,

$$\frac{\Delta u_i^{+*}}{\varphi(0)\,\overline z_i^*\sqrt{c}} \le \frac{\Delta u_i^{+*}}{\Phi(0) - \Phi(-\overline z_i^*\sqrt{c})}.$$

Taking the limit as c → ∞ yields $2\Delta u_i^{+*}$, where $\Delta u_i^{+*} \le \Delta u_i^d$ by definition of $\Delta u_i^d$. Taking a subsequence if necessary, $\overline z_i^{\vee}$ converges to some failure drift in $[0, z_i^{**}]$ as c → ∞. If this limit is different from zero then $E[\Delta u_i] \ge 0$ and $\Phi(0) - \Phi(-\overline z_i^{\vee}\sqrt{c}) \to \tfrac12$, so $\lim_{c\to\infty}\overline D_i \le 2\Delta u_i^d$. On the other hand, now suppose instead that this limit is zero. If $\overline z_i^{\vee} \to 0$ as fast as $1/\sqrt{c}$ or slower then trivially $\lim_{c\to\infty}\overline D_i = 0$, as the numerator tends to zero but not the denominator (recall that $E[\Delta\hat u_i] \le \Delta\hat u_i^*$ defined above), which contradicts the previously derived lower bound on $\overline D_i$. If convergence is faster, again by l'Hopital's rule

$$\frac{E[\Delta\hat u_i]}{E[R_i(\pi_i) - R_i(\hat\pi_i)]} \approx \frac{\Delta u_i^{+*}}{\varphi(0)\,\overline z_i^*\sqrt{c}} \to 0.$$

This again contradicts our previously derived lower bound. The result now follows. ∎
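This proof, like the proof of Proposition 4, uses the fact that the inverse hazard rate of the standard normal vanishes as the block length c grows. A direct numerical check, with a hypothetical failure drift:

```python
import math

def Phi(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def normal_sf(z):
    """Survival function 1 - Phi(z), computed without cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def phi(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# (1 - Phi(z sqrt(c))) / (phi(z sqrt(c)) z sqrt(c)) -> 0 as c -> infinity:
# the inverse hazard rate 'implodes', driving the vanishing efficiency loss.
z = 0.8                            # hypothetical failure drift z_i**
prev = float('inf')
for c in (1.0, 10.0, 100.0, 1000.0):
    s = z * math.sqrt(c)
    val = normal_sf(s) / (phi(s) * s)
    assert val < prev              # strictly decreasing along this grid
    prev = val
assert prev < 1e-2
```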
Proof of Theorem 2. The proof below is a simple application of self-generation due to Abreu et al. (1990b), using the insights of Fudenberg et al. (1994). To begin, let W be any smooth subset of the interior of U₀; by definition (Fudenberg et al., 1994, p. 1012), W is smooth if (i) it is closed and convex, (ii) it has nonempty interior, and (iii) its boundary is a C² submanifold of $\mathbb{R}^n$. I will eventually show that W is locally self-decomposable. Let B(W, r, Δt, c) be the set of all payoff vectors v ∈ U such that, for T = ⌊c/Δt⌋, there is a T-public mediated strategy $\tilde\mu$ and continuation payoffs $w : (A\times\Omega)^T \to W$ that enforce v, i.e., (i) v_i equals the expected lifetime utility for player i from playing $\tilde\mu$ for the first T-period block followed by the expected continuation payoff E[w_i], and (ii) the mechanism $(\tilde\mu, w)$ is incentive compatible in the sense that every $\sigma_i^T \in M_i^T$ is unprofitable. The set B(W, r, Δt, c) of so-called decomposable payoffs is indexed by the set of feasible continuation payoffs W, the discount rate r, the time step Δt and the length of calendar time c of the T-period blocks during which equilibrium strategies are private. This leads to the following version of local self-decomposability.

Definition 8. Given ∆t > 0 and c > 0, W ⊂ Rn is called self-decomposable if W ⊂ B(W, r, ∆t, c) for some r > 0. W is locally self-decomposable if for each v ∈ W there exists r > 0 and an open set O containing v such that O ∩ W ⊂ B(W, r, ∆t, c). This standard definition of local self-decomposability leads to the following useful lemma, which is proved in Lemma 4.2 of Fudenberg et al. (1994). Lemma 12. Fix ∆t > 0 and c > 0. If W ⊂ Rn is compact, convex and locally selfdecomposable then there exists r > 0 such that W ⊂ E(r, ∆t, c) for all r ∈ (0, r], where E(r, ∆t, c) is the set of T -public communication equilibrium payoffs when the time interval has length ∆t, T = bc/∆tc and players’ common discount rate is r > 0. The key step towards proving Theorem 2 is the following. Lemma 13. Every smooth subset of the interior of V∗ is locally self-decomposable. Proof. Let W be any such smooth subset of the interior of V∗ and pick any v ∈ W . If v ∈ int W (as in Fudenberg et al., 1994), choose an open ball O containing v whose closure belongs to the interior of W . Then there exists δ < 1 such that for each u ∈ O there exists u0 ∈ W such that u = (1 − δ T )u(µ0 ) + δ T u0 , where µ0 is a correlated equilibrium of the stage game. Now let rO (c) solve δ = e−rc , where c is determined later. Next, suppose that v ∈ ∂W .19 Since W is in the interior of V∗ , there exists ε1 , ε2 > 0 such that W belongs to the interior of V∗ε1 ,ε2 ∩ U0 , too. Therefore, we may use our previously derived punishment and reward schemes consistently. Let Λ = {λ ∈ Rn : kλk = 1} be the set of directions. Let J = {i ∈ I : λi ≥ 0}, and choose µ ∈ ∆ε1 (A) such that vi ≥ udi (µ)+ε2 for all i 6∈ J and ui (µ) ≥ vi + ε2 for all i ∈ J, therefore u(µ) ∈ int V εJ1 ,ε2 . Players in J face punishment schemes and the rest reward schemes. The mediator makes recommendations according to the correlated strategy µ independently during every period of a T -period block, where T = bc/∆tc will be chosen later. 
Given µ, the scoring rule is determined in two steps. First, find a proper scoring rule ξ 0 that solves the linear program defined in the proof of Proposition 3 with value W (µ, ∆t). By Assumption 2, W (µ, ∆t) > 0. Next, apply an affine transformation to ξ 0 to obtain a proper scoring rule ξ such that πi = 21 (Lemma 6). Since ∆ε1 (A) is compact and W (µ, ∆t) is continuous on that set, every µ ∈ ∆ε1 (A) has a well-defined proper scoring rule for each ∆t > 0, and these converge to their corresponding limiting proper scoring rules as ∆t → 0. In each period, a scoring trial is implemented, and players’ scores are calculated according to the rules of Section 4. Consider two kinds of direction λ in trying to decompose v with respect to some µ. First, say λi 6= 0 for every i. If λi > 0, by Proposition 4 and its proof δ T /(1−δ T )Πi0 wi = ui (µ)−vi 19

19. The $\partial$ notation stands for boundary; thus $\partial W$ is the boundary of $W$.


and for every $\varepsilon_3 > 0$ there exists $\Delta > 0$ such that
$$\frac{\delta^T}{1-\delta^T}\,\Pi_{i0} w_i \;>\; \frac{\Delta u_i^*\,\big[1 - \Phi(z_i^{**}\sqrt{c}) + \sqrt{\Delta/c}\,\big]}{z_i^*\,\sqrt{c}\,\varphi(z_i^{**}\sqrt{c})\,(1-\varepsilon_3)}. \tag{19}$$

I will now derive a uniform bound on the right-hand side above. First of all, let
$$C^* = \max_{i,\;\mu \in \Delta^{\varepsilon_1}(A)} \Delta u_i^* / z_i^*.$$



By the Maximum Theorem, $C_i^*(\mu) = \Delta u_i^*/z_i^*$ is continuous on $\Delta^{\varepsilon_1}(A)$, a compact set, so its maximum is attained and this maximization is well defined. Given $\varepsilon_3 > 0$, by continuity of $z^*$ and $z^{**}$ with respect to $\mu$, the same choice of $\Delta$ satisfies (19) in a neighborhood of $\mu$. Repeating this exercise for every $\mu$ yields an open cover of $\Delta^{\varepsilon_1}(A)$ indexed by $\mu$, with each neighborhood in the cover having its own associated $\Delta > 0$. Since $\Delta^{\varepsilon_1}(A)$ is compact, there exists a finite subcover. Let $\Delta_0 > 0$ be the minimum of the $\Delta$'s in the finite subcover. Define the highest possible failure rate by $\bar z_0 = \max_{(i,\mu)}\{z_i^{**} : \mu \in \Delta^{\varepsilon_1}(A)\}$, which is clearly finite. The tightest failure rate, for each $\Delta$ and $c$, is defined by
$$\bar z^* \in \arg\max_{z \in [0,\bar z_0]} \frac{1 - \Phi(z\sqrt c) + \sqrt{\Delta/c}}{\varphi(z\sqrt c)\,\sqrt c}.$$
Substituting the value recursion expression above into the incentive constraint, and recognizing that $u_i(\mu) - v_i \ge \varepsilon_2$, yields a sufficient condition, uniform in $i$ and $\mu$, for incentive compatibility of punishment schemes:
$$\varepsilon_2 \;>\; C^*\,\frac{1 - \Phi(\bar z^*\sqrt c) + \sqrt{\Delta/c}}{\varphi(\bar z^*\sqrt c)\,\sqrt c\,(1-\varepsilon_3)},$$
where $\varepsilon_3 \in (0,1)$ is arbitrary. As $\Delta \to 0$ and $c \to \infty$, the right-hand side tends to zero, and as $c \to 0$ it tends to $\infty$. Therefore, there exists $c_0 > 0$ that satisfies this inequality, hence implies incentive compatibility for our punishment schemes based on any $\mu \in \Delta^{\varepsilon_1}(A)$. If $\lambda_i < 0$ then, by Proposition 6, incentive compatibility is implied by
$$\frac{\delta^T}{1-\delta^T}\,R_{i0} w_i \ge \tfrac12 D_i,$$
where value recursion for rewards yields $\delta^T/(1-\delta^T)\,R_{i0} w_i = v_i - u_i(\mu)$. Recognizing that $v_i - u_i(\mu) \ge \Delta u_i^d + \varepsilon_2$ yields the sufficient condition $\varepsilon_2 > \tfrac12 D_i - \Delta u_i^d$. By the proof of Proposition 6, there exist $\hat\Delta$ small enough and $\hat c$ large enough to satisfy this inequality. Now fix $\Delta = \min\{\Delta_0, \hat\Delta\}$ and $c = \max\{c_0, \hat c\}$.
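The limiting behavior of this bound is easy to check numerically. The sketch below is illustrative only: the failure rate $z$ and the coupling $\Delta = e^{-c}$ are hypothetical choices, not taken from the text. It evaluates the right-hand side of the sufficient condition, up to the constants $C^*$ and $1-\varepsilon_3$, and confirms that it is large for small $c$ and vanishes as $\Delta \to 0$ and $c \to \infty$:

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def rhs(Delta, c, z):
    """Right-hand side of the sufficient condition, up to C* and (1 - eps3)."""
    x = z * math.sqrt(c)
    return (1 - Phi(x) + math.sqrt(Delta / c)) / (phi(x) * math.sqrt(c))

z = 0.5  # hypothetical failure-rate cutoff
for c in (0.01, 1.0, 10.0, 100.0):
    print(c, rhs(math.exp(-c), c, z))  # shrinks toward 0 as c grows with Delta -> 0
```

Any joint path with $\Delta \to 0$ fast enough relative to $c \to \infty$ gives the same conclusion; $\Delta = e^{-c}$ is just one convenient choice.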

Recall that $\delta^T/(1-\delta^T)\,\Pi_{i0} w_i = u_i(\mu) - v_i$ for $i$ with $\lambda_i > 0$ and $\delta^T/(1-\delta^T)\,R_{i0} w_i = v_i - u_i(\mu)$ for $i$ with $\lambda_i < 0$. Let $w_i = \alpha[u_i(\mu) - v_i]/\Pi_{i0}$ when $\lambda_i > 0$ and $w_i = \alpha[v_i - u_i(\mu)]/R_{i0}$ when $\lambda_i < 0$, for some $\alpha > 0$. Let $v_i' = v_i - w_i$ if $\lambda_i > 0$ and $v_i' = v_i + w_i$ if $\lambda_i < 0$. Now choose $\alpha$ such that $v' \in \operatorname{int} W$. Finally, choose $r_v > 0$ to solve $e^{-rc}/(1-e^{-rc}) = 1/\alpha$. It now follows that $v$ is decomposable with respect to $\mu$, $W$ and $(r_v, \Delta, c)$. Moreover, a small enough perturbation of $v$ within $W$, with corresponding changes in $w_i$ to preserve value recursion, maintains decomposability, $v' \in \operatorname{int} W$, and incentive compatibility (since the sufficient incentive inequalities above were strict) for the same $r_v$. Therefore, there is an open set $O$ containing $v$ such that $O \cap W \subset B(W, r_v, \Delta, c)$.

It remains to argue the case where $\lambda_i = 0$ for some $i$. For self-generation, we must amend other players' continuation values (see Figure 5 for geometric intuition). If $\lambda_i > 0$, he will now face a punishment scheme where his contingent payoffs are $v_i - \lambda_i\varepsilon_2(1-e^{-rc})/e^{-rc}$ if no punishment ensues and $v_i - \lambda_i\varepsilon_2(1-e^{-rc})/e^{-rc} - w_i$ if punishment ensues. Similarly, if $\lambda_i < 0$, he will now face a reward scheme where his contingent payoffs are $v_i + \lambda_i\varepsilon_2(1-e^{-rc})/e^{-rc}$ if no reward ensues and $v_i + \lambda_i\varepsilon_2(1-e^{-rc})/e^{-rc} + w_i$ if reward ensues. If $\lambda_i = 0$ then player $i$ faces a punishment scheme shifted by the amount $\Pi_{i0} w_i$. That is, if punishment ensues (which happens with probability $\Pi_{i0}$), player $i$'s continuation payoff becomes $v_i + \Pi_{i0} w_i - w_i$; otherwise it becomes $v_i + \Pi_{i0} w_i$. Hence, player $i$'s expected continuation payoff equals $v_i$. The incentive compatibility constraints derived above still hold relative to $J = \{i : \lambda_i \ge 0\}$.

Figure 5: Local self-decomposability in every direction

Now, in order for the continuation values to be self-generated, we choose $w$ and $r > 0$ as follows. By value recursion, $e^{-rc}/(1-e^{-rc})\,\Pi_{i0} w_i = u_i(\mu) - v_i - \tfrac12\lambda_i\varepsilon_2$ when $\lambda_i > 0$, $e^{-rc}/(1-e^{-rc})\,R_{i0} w_i = v_i - u_i(\mu) + \tfrac12\lambda_i\varepsilon_2$ when $\lambda_i < 0$, and $v_i = u_i(\mu)$ for $\lambda_i = 0$. However, for $\lambda_i = 0$, the punishments and rewards for player $i$ relative to $v_i$ are less than or equal to $w_i$ in magnitude. The incentive constraint for this player $i$ with $\lambda_i = 0$ is given by
$$\frac{e^{-rc}}{1-e^{-rc}}\,\Pi_{i0} w_i \;>\; C^*\,\frac{1 - \Phi(\bar z^*\sqrt c) + \sqrt{\Delta/c}}{\varphi(\bar z^*\sqrt c)\,\sqrt c\,(1-\varepsilon_3)} \;=:\; K.$$

Therefore, set $e^{-rc}/(1-e^{-rc})\,w_i = K/\Pi_{i0}$. For $\lambda_i = 0$, choose $w_i = \alpha K/\Pi_{i0}$. For $\lambda_i > 0$, choose $w_i = \alpha[u_i(\mu) - v_i - \tfrac12\lambda_i\varepsilon_2]/\Pi_{i0}$. For $\lambda_i < 0$, choose $w_i = \alpha[v_i - u_i(\mu) + \tfrac12\lambda_i\varepsilon_2]/R_{i0}$. It remains to show that for some small $\alpha > 0$, the continuation values $v'$ belong to $\operatorname{int} W$. For $\lambda_i \ne 0$ this is clear. For $\lambda_i = 0$, notice that the ratio of contingent payments to player $i$ relative to transfers to everyone else (which, since $\lambda$ is a unit vector, have length $\tfrac12\varepsilon_2$) equals $K/(\Pi_{i0}\tfrac12\varepsilon_2) < \infty$. Since $W$ is smooth, a second-order Taylor series expansion of $\partial W$ shows that, following the proof of Theorem 4.1 in Fudenberg et al. (1994), since
$$w_i = (K/\Pi_{i0})\,(1-e^{-rc})/e^{-rc} \;<\; \tfrac12\varepsilon_2\sqrt{(1-e^{-rc})/e^{-rc}}$$
for sufficiently small $r > 0$, the change in continuation value induced by $w_i$ is insufficient to escape from $W$. Therefore, it is possible to pick $\alpha$, hence $r$, such that every $v'$ belongs to $\operatorname{int} W$. Let $r_v > 0$ be such an interest rate. Therefore, $v$ is decomposable with respect to $\mu$, $W$ and $(r_v, \Delta, c)$. Finally, as before, a small perturbation of $v$ is still decomposable with the same parameters, so there is an open set $O$ containing $v$ such that $O \cap W \subset B(W, r_v, \Delta, c)$. □

For each $v \in W$, we just argued that there is an open set $O_v$ and $(r, \Delta t, c)$ such that $O_v \cap W \subset B(W, r, \Delta t, c)$. By compactness, the open cover $\{O_v\}$ of $W$ has a finite subcover, $\{O_k : k \in \{1,\ldots,m\}\}$. Let $\bar\Delta = \min\{\Delta t_k\} > 0$, let $\bar c = \max\{c_k\} < \infty$ and finally define $\bar r = \min\{r_k c_k\}/\bar c \le \min\{r_k\}$. By construction, $\bar r \bar c \le r_k c_k$ for every $k$. Hence, $W \subset E(\bar r, \bar\Delta, \bar c)$. By construction of the open cover above and Propositions 4(1) and 6(1), $W \subset E(r, \Delta t, \bar c)$ for all $(r, \Delta t) \le (\bar r, \bar\Delta)$. This finally proves Theorem 2. □

Proof of Proposition 7. I will prove the claim for conditional irreversibility both in probabilities and drifts. Assuming $\lambda$ is regular, consider the following linear program:
$$V_\lambda(\mu,\Delta t) = \sup_{\gamma,\;\xi\ge 0,\;\beta,\;\pi}\; \gamma \quad\text{s.t.}$$
$$\xi_i(a,\omega)\,\mu(a) \le \mu(a) \quad\forall(i,a,\omega),$$
$$\gamma\,\frac{\lambda_i}{|\lambda_i|}\sum_{a_{-i}}\Delta u_i(b_i,a_{-i})\,\mu(a) \;\le\; \sum_{(a_{-i},\omega)}\big[\xi_i(a,\omega) + \beta_i(a,\omega)\big]\,\Delta\Pr(\omega|a,b_i)\,\mu(a) \quad\forall(i,a_i,b_i),$$
$$0 \le \sum_{a_{-i}}\big(\xi_i(a,\omega) - \pi\big)\Pr(\omega|a_{-i},b_i)\,\mu(a) \quad\forall(i,a_i,b_i,\omega),$$
$$\pi \ge \sum_{(i,a,\omega)}\xi_i(a,\omega)\Pr(\omega|a)\,\mu(a),$$
$$\sum_{i=1}^n \lambda_i\,\beta_i(a,\omega) = 0 \quad\forall(a,\omega).$$

If a proper $\lambda$-balanced scoring rule exists then $V_\lambda(\mu,\Delta t) > 0$. The dual of this problem is
$$V_\lambda(\mu,\Delta t) = \inf_{\eta,\;\sigma,\;y\ge 0}\; \sum_{(i,a,\omega)}\eta_i(a,\omega)\,\mu(a) \quad\text{s.t.}$$
$$\sum_{i=1}^n \Delta u_i(\mu,\sigma_i) \ge 1,$$
$$\eta_i(a,\omega) \;\ge\; \frac{\lambda_i}{|\lambda_i|}\,\Delta\Pr(\omega|a,\sigma_i) + \sum_{b_i} y_i(a_i,b_i,\omega)\Pr(\omega|a_{-i},b_i) - \hat y\,\Pr(\omega|a) \quad\forall(i,a,\omega),$$
$$\hat y = \sum_{(i,a,b_i,\omega)} y_i(a_i,b_i,\omega)\,\Pr(\omega|a_{-i},b_i)\,\mu(a),$$
$$\Delta\Pr(\omega|a,\sigma_i) = \lambda_i\,\hat\eta(a,\omega) \quad\forall(i,a,\omega).$$
With the same argument as in Proposition 2, $V_\lambda = 0$ implies that
$$\frac{\lambda_i}{|\lambda_i|}\,\Delta\Pr(\omega|a,\sigma_i) + \sum_{b_i} y_i(a_i,b_i,\omega)\Pr(\omega|a_{-i},b_i) - \hat y\,\Pr(\omega|a) = 0 \quad\forall(i,a,\omega).$$
The first dual constraint requires that the deviation profile $\sigma$ be profitable. The last one requires that $\sigma$ be $\lambda$-unattributable. Finally, the equation above requires that $\sigma$ be conditionally reversible with respect to $\lambda$. The proof for conditional irreversibility in drifts is similar; just replace $\frac{\lambda_i}{|\lambda_i|}\Delta\Pr(\omega|a,\sigma_i)$ in the dual with $\frac{\lambda_i}{|\lambda_i|}\Delta x(a,\sigma_i)$. □

References

Abreu, D., P. Milgrom, and D. G. Pearce (1990a): "Information and Timing in Repeated Partnerships," Econometrica, 59, 1713–1733.

Abreu, D., D. Pearce, and E. Stacchetti (1986): "Optimal Cartel Equilibria with Imperfect Monitoring," Journal of Economic Theory, 39, 251–269.

Abreu, D., D. G. Pearce, and E. Stacchetti (1990b): "Toward a Theory of Discounted Repeated Games with Imperfect Monitoring," Econometrica, 58, 1041–1063.

Aoyagi, M. (2005): "Collusion Through Mediated Communication in Repeated Games with Imperfect Private Monitoring," Economic Theory, 25, 455–475.

Aumann, R. (1987): "Correlated Equilibrium as an Expression of Bayesian Rationality," Econometrica, 55, 1–18.

Ben-Porath, E. and M. Kahneman (1996): "Communication in Repeated Games with Private Monitoring," Journal of Economic Theory, 70, 281–297.

——— (2003): "Communication in Repeated Games with Costly Monitoring," Games and Economic Behavior, 44, 227–250.

Bhaskar, V. (2000): "The Robustness of Repeated Game Equilibria to Incomplete Payoff Information," University of Essex Working Paper.

Bhaskar, V., G. J. Mailath, and S. Morris (2008): "Purification in the Infinitely-Repeated Prisoners' Dilemma," Review of Economic Dynamics, 11, 515–528.

Cherry, J. and L. Smith (2010): "Unattainable Payoffs for Repeated Games of Private Monitoring," Working paper.

European Commission (2001): "Commission Decision of 7 June 2000 (Case COMP/36.545/F3 — Amino Acids)," Official Journal of the European Communities, 24–72.

Compte, O. (1998): "Communication in Repeated Games with Imperfect Private Monitoring," Econometrica, 66, 597–626.

Ely, J. C., J. Hörner, and W. Olszewski (2005): "Belief-Free Equilibria in Repeated Games," Econometrica, 73, 377–415.

Faingold, E. and Y. Sannikov (2011): "Reputation in Continuous-Time Games," Econometrica, 79, 773–876.

Fong, K., O. Gossner, J. Hörner, and Y. Sannikov (2007): "Efficiency in the Repeated Prisoners' Dilemma with Private Monitoring," Mimeo.

Forges, F. (1986): "An Approach to Communication Equilibria," Econometrica, 54, 1375–1385.

——— (1990): "Universal Mechanisms," Econometrica, 1341–1364.

Fudenberg, D. and D. Levine (1994): "Efficiency and Observability with Long-Run and Short-Run Players," Journal of Economic Theory, 62, 103–135.

——— (2007): "Continuous Time Limits of Repeated Games with Imperfect Public Monitoring," Review of Economic Dynamics, 10, 173–192.

——— (2009): "Repeated Games with Frequent Signals," The Quarterly Journal of Economics, 124, 233–265.

Fudenberg, D., D. Levine, and E. Maskin (1994): "The Folk Theorem with Imperfect Public Information," Econometrica, 62, 997–1039.

Gneiting, T. and A. Raftery (2007): "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, 102, 359–378.

Green, E. and R. Porter (1984): "Noncooperative Collusion under Imperfect Price Information," Econometrica, 87–100.

Harrington, Jr., J. E. and A. Skrzypacz (2011): "Private Monitoring and Communication in Cartels: Explaining Recent Collusive Practices," American Economic Review, 101, 2425–2449.

Hexner, E. (1943): The International Steel Cartel, Chapel Hill.

Hoeffding, W. (1956): "On the Distribution of the Number of Successes in Independent Trials," The Annals of Mathematical Statistics, 27, 713–721.

Hörner, J. and W. Olszewski (2006): "The Folk Theorem for Games with Private Almost-Perfect Monitoring," Econometrica, 74, 1499–1544.

Kandori, M. (2003): "Randomization, Communication, and Efficiency in Repeated Games with Imperfect Public Monitoring," Econometrica, 71, 345–353.

Kandori, M. and H. Matsushima (1998): "Private Observation, Communication, and Collusion," Econometrica, 66, 627–652.

Kandori, M. and I. Obara (2006): "Efficiency in Repeated Games Revisited: The Role of Private Strategies," Econometrica, 74, 499–519.

Lehrer, E. (1991): "Internal Correlation in Repeated Games," International Journal of Game Theory, 19, 431–456.

Mailath, G., S. Matthews, and T. Sekiguchi (2002): "Private Strategies in Finitely Repeated Games with Imperfect Public Monitoring," The BE Journal of Theoretical Economics, 2, 2.

Marshall, R. C. and L. M. Marx (2012): The Economics of Collusion: Cartels and Bidding Rings, MIT Press.

Miyagawa, E., Y. Miyahara, and T. Sekiguchi (2008): "The Folk Theorem for Repeated Games with Observation Costs," Journal of Economic Theory, 139, 192–221.

——— (2009): "Repeated Games with Costly Imperfect Monitoring," Kyoto University Discussion Paper No. 043.

Myerson, R. (1986): "Multistage Games with Communication," Econometrica, 54, 323–358.

Obara, I. (2009): "Folk Theorem with Communication," Journal of Economic Theory, 144, 120–134.

Rahman, D. (2012a): "But Who Will Monitor the Monitor?" American Economic Review, 102, 2267–2297.

——— (2012b): "Mediating Collusion with Flexible Production," Mimeo.

——— (2013): "Information Delay in Games with Frequent Actions," Working paper.

Rahman, D. and I. Obara (2010): "Mediated Partnerships," Econometrica, 78, 285–308.

Renault, J. and T. Tomala (2004): "Communication Equilibrium Payoffs in Repeated Games with Imperfect Monitoring," Games and Economic Behavior, 49, 313–344.

Samuels, S. (1965): "On the Number of Successes in Independent Trials," The Annals of Mathematical Statistics, 36, 1272–1278.

Sannikov, Y. (2007): "Games with Imperfectly Observable Actions in Continuous Time," Econometrica, 75, 1285–1329.

Sannikov, Y. and A. Skrzypacz (2007): "Impossibility of Collusion under Imperfect Monitoring with Flexible Production," The American Economic Review, 97, 1794–1823.

——— (2010): "The Role of Information in Repeated Games with Frequent Actions," Econometrica, 78, 847–882.

Sekiguchi, T. (1997): "Efficiency in Repeated Prisoner's Dilemma with Private Monitoring," Journal of Economic Theory, 76, 345–361.

Sugaya, T. (2010): "Folk Theorem in Repeated Games with Private Monitoring," Mimeo.

Tomala, T. (2009): "Perfect Communication Equilibria in Repeated Games with Imperfect Monitoring," Games and Economic Behavior, 67, 682–694.

Yamamoto, Y. (2007): "Efficiency Results in N Player Games with Imperfect Private Monitoring," Journal of Economic Theory, 135, 382–413.

Supplementary Material

B Proof of Lemma 1

A correlated strategy $\mu$ is enforceable with respect to $W \subset \mathbb{R}^2$ and $\delta$ if there exists a vector $v$ and a function $w : A \times \Omega \to W$ such that
$$v_i = (1-\delta)u_i(\mu) + \delta\sum_{(a,\omega)}\Pr(\omega|a)\,w_i(a,\omega)\,\mu(a) \quad\forall i, \quad\text{and}$$
$$(1-\delta)\sum_{a_{-i}}\Delta u_i(a,b_i)\,\mu(a) + \delta\sum_{(a_{-i},\omega)}\Delta\Pr(\omega|a,b_i)\,w_i(a,\omega)\,\mu(a) \le 0 \quad\forall(i,a_i,b_i).$$

The first family of equations above describes value recursion. The second family describes incentive compatibility: discouraging recommendation-contingent deviations. Following Fudenberg et al. (1994), if $\mu$ is enforceable with respect to $W$ and $\delta$ with the pair $(v,w)$, we will say that $w$ enforces $\mu$ with respect to $v$ and $\delta$, and that $v$ is decomposable with respect to $\mu$, $W$ and $\delta$. If $\mu$ is enforceable with respect to some $W$ and $\delta$, call it simply enforceable. Let $B(W, r, \Delta t)$ be the set of all decomposable payoff vectors as we vary $\mu$ with respect to fixed $W$, $r$ and $\Delta t$. $W$ is self-decomposable if $W \subset B(W, r, \Delta t)$ for some $r$. A smooth$^{20}$ subset $W \subset U$ is decomposable on tangent hyperplanes if for every point $v$ on the boundary of $W$ there exists a correlated strategy $\mu$ with finite support such that (i) $u(\mu)$ is separated from $W$ by the (unique) hyperplane $P_v$ that is tangent to $W$ at $v$, and (ii) there exists a continuation payoff function $w : A \times \Omega \to P_v$ that enforces $\mu$. I will now argue decomposability on tangent hyperplanes, for the following reason.

Lemma 14 (Fudenberg et al., 1994, Theorem 4.1). If a smooth set $W \subset U_+$ is decomposable on tangent hyperplanes then there exists $\bar r > 0$ with $W \subset E(r, \Delta t)$ for all $r < \bar r$, where $E(r, \Delta t)$ is the set of public communication equilibrium payoffs.

Let $W$ be a smooth subset of the interior of the feasible, individually rational set $U$. Let $\lambda$ be the outward unit normal vector to $W$ at $v$. For decomposability on tangent hyperplanes, I must show that given $\lambda$ there is a correlated strategy $\mu$ such that (i) $\sum_i \lambda_i v_i < \sum_i \lambda_i u_i(\mu)$, and (ii) there exist continuation values $w$ that enforce $\mu$ with $\sum_i \lambda_i w_i(a,\omega) = 0$ for all $(a,\omega)$. To decompose $v$, let $w_i^+(a)$ be the payment to player $i$ after an up jump if the mediator recommended $a$, and similarly write $w_i^-(a)$ after a down jump. For simplicity, I assume that $w_i^+(a) + w_i^-(a) = 0$ for all $a$, and write $w_i(a) = 2w_i^+(a) = -2w_i^-(a)$.
Let $w_i(D_i, D_{-i}) = 0$ and write $w_i(C_i, C_{-i}) = w_i^2$, $w_i(C_i, D_{-i}) = -w_i(D_i, C_{-i}) = w_i^1$.

20. "Smooth" means (i) closed and convex, (ii) with nonempty interior, and (iii) with a boundary that is a $C^2$-submanifold (Fudenberg et al., 1994, Definition 4.3).
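The enforceability conditions above are a finite system of linear (in)equalities, so whether a given triple $(\mu, w, \delta)$ works can be checked mechanically. The sketch below does this for a hypothetical prisoners' dilemma with a binary public signal; the payoffs, signal probabilities and all names are illustrative, not taken from the text:

```python
# Illustrative only: payoffs u and signal technology p_good are hypothetical.
A = ['C', 'D']
u = {('C', 'C'): (1, 1), ('C', 'D'): (-1, 2), ('D', 'C'): (2, -1), ('D', 'D'): (0, 0)}
p_good = {('C', 'C'): 0.9, ('C', 'D'): 0.5, ('D', 'C'): 0.5, ('D', 'D'): 0.2}

def pr(sig, a):
    """Pr(omega | a) for the binary signal omega in {'g', 'b'}."""
    return p_good[a] if sig == 'g' else 1 - p_good[a]

def enforce(mu, w, delta):
    """Check the two displayed conditions: value recursion and incentive
    compatibility of every recommendation-contingent deviation a_i -> b_i."""
    v = [0.0, 0.0]
    for a, m in mu.items():
        for i in range(2):
            v[i] += (1 - delta) * u[a][i] * m
            for sig in ('g', 'b'):
                v[i] += delta * pr(sig, a) * w[a, sig][i] * m
    ok = True
    for i in range(2):
        for ai in A:
            for bi in A:
                lhs = 0.0
                for aj in A:  # opponent's recommendation
                    a = (ai, aj) if i == 0 else (aj, ai)
                    b = (bi, aj) if i == 0 else (aj, bi)
                    lhs += (1 - delta) * (u[b][i] - u[a][i]) * mu.get(a, 0.0)
                    for sig in ('g', 'b'):
                        lhs += delta * (pr(sig, b) - pr(sig, a)) * w[a, sig][i] * mu.get(a, 0.0)
                ok = ok and lhs <= 1e-12
    return v, ok

# Static defection with zero continuation transfers is trivially enforceable.
w0 = {(a, sig): (0.0, 0.0) for a in u for sig in ('g', 'b')}
print(enforce({('D', 'D'): 1.0}, w0, delta=0.9))  # ([0.0, 0.0], True)
```

By contrast, `enforce({('C','C'): 1.0}, w0, 0.9)` fails the incentive check, which is exactly why nontrivial continuation values $w$ are needed to sustain cooperation.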

If $\lambda \le 0$, this is easy: $v$ is decomposable into the pure strategy profile $(D_1, D_2)$ and $w \equiv 0$. If $\lambda_i = 0$ and $\lambda_j > 0$, decompose $v$ into the pure strategy profile $(C_i, D_j)$ and continuation values $w$ as follows. First, player $j$ needs no incentives to defect, so let $w_j \equiv 0$. On the other hand, enforceability for player $i$ requires that $1-\delta \le \delta w_i^1 (p_1 - p_0)$. Since $1-\delta \le r\Delta t$ and $p_1 - p_0 = \tfrac12(x_1 - x_0)\sqrt{\Delta t}$, it follows that $\delta w_i^1 \ge 2r\sqrt{\Delta t}/(x_1 - x_0)$ implies enforceability. For the remaining cases of $\lambda$, choose $\mu$ as follows:
$$\begin{array}{c|cc} & C & D \\ \hline C & \mu_0 & \mu_2 \\ D & \mu_1 & \\ \end{array}$$
Let $\mu_0 + \mu_1 + \mu_2 = 1$ and every entry in the table above be strictly positive. Since $W$ is closed and in the interior of $U$, there exists $\varepsilon > 0$ such that for every vector $\lambda \not\le 0$ there exists $\mu = (\mu_0, \mu_1, \mu_2) \ge \varepsilon$ such that $\sum_i \lambda_i u_i(\mu) > \sum_i \lambda_i v_i$. Indeed, if $\lambda_i > 0 \ge \lambda_j$ choose $\mu_0 = \mu_j = \varepsilon$ and $\mu_i = 1 - 2\varepsilon$, whereas if $\lambda \gg 0$ choose $\mu_0 = 1 - 2\varepsilon$ and $\mu_1 = \mu_2 = \varepsilon$. With this notation, recommending cooperation is incentive compatible if
$$(1-\delta)(\mu_0 + \mu_j) \le \delta\big[\mu_0(p_2 - p_1)w_i^2 + \mu_j(p_1 - p_0)w_i^1\big].$$
Similarly, recommending defection requires the inequality $-(1-\delta) \le \delta(p_2 - p_1)w_i^1$. If $\lambda_i > 0 > \lambda_j$ and $x_2 > x_1$ then let $w_i^1 = w_j^1 = 0$, $\delta w^2 = 2r\sqrt{\Delta t}/[\varepsilon(x_2 - x_1)]$ and $\delta w_i^2 = \delta w^2/|\lambda_i|$. Since $|\lambda_i| < 1$, incentive compatibility follows. Moreover, by construction, $\lambda_i w_i^2 + \lambda_j w_j^2 = 0$, too. If $x_2 = x_1$, let $w_i^2 = 0$ for all $i$, choose $\delta w^1 = 2r\sqrt{\Delta t}/[\varepsilon(x_1 - x_0)]$ and let $\delta w_i^1 = \delta w^1/\lambda_i$. Incentive compatibility and budget balance again follow. Notice that these continuation values also yield incentive compatibility and budget balance if $\lambda \gg 0$. Therefore, $W$ is decomposable on tangent hyperplanes, as claimed. Now it remains to show that for $\Delta > 0$ sufficiently small, if $W \subset E(r, \Delta)$ then $W \subset E(r, \Delta t)$ for $\Delta t < \Delta$. Following Fudenberg et al. (1994, proof of Theorem 4.1, p. 1035), choose any $v \in W$. Let $\mu$ and $w$ decompose $v$ as above.
Change the coordinate system so that $v$ is the origin, the first axis is the line connecting $v$ to $u(\mu)$ and the remaining axes lie in $P_v$. For any vector $x$, write $x = (x^0, x^1)$, where $x^0$ is the component on the first axis and $x^1$ is the component in $P_v$. Since $W$ is smooth, by Taylor's Theorem there exist $\delta^* < 1$, a constant $\hat C > 0$ and a neighborhood $O$ of the origin such that, for all $\delta > \delta^*$, if $x \in O$ then $\|x^1\| < \hat C\sqrt{(1-\delta)/\delta}$ and $x^0 \le -\|u(\mu)\|(1-\delta)/\delta$ imply that $x$ belongs to the interior of $W$. By decomposability on tangent hyperplanes, there exist (i) $r > 0$ such that $e^{-r\Delta} > \delta^*$, and (ii) $w \in O$ (after the coordinate change) that enforces $\mu$ with $\|w\| < \tfrac12\hat C\sqrt{(1-\delta)/\delta}$.

Think of the vector $w$, belonging to $P_v$, as $x^1$ above, and $x^0 = -\|u(\mu)\|(1-\delta)/\delta$; therefore $(x^0, w)$ belongs to the interior of $W$. Now consider any $\Delta t < \Delta$. Define $\underline w = w\sqrt{\Delta t/\Delta}$ to be continuation values for $(r, \Delta t)$, and let $x^0 = -\|u(\mu)\|(1-\delta)/\delta$ with $\delta = e^{-r\Delta t}$. The scaled vector $(x^0, \underline w)$ enforces $\mu$ at $(r, \Delta t)$ and remains in the interior of $W$ if $r > 0$ is small. That $\underline w$ enforces $\mu$ follows from the derivation of decomposability on tangent hyperplanes above. That $(x^0, \underline w)$ belongs to the interior of $W$ for small $r$ follows because
$$\|\underline w\| = \|w\|\sqrt{\frac{\Delta t}{\Delta}} \;<\; \tfrac12\hat C\sqrt{\frac{r\Delta t}{r\Delta}\cdot\frac{1-e^{-r\Delta}}{e^{-r\Delta}}} \;\le\; \hat C\sqrt{r\Delta t} \;<\; \hat C\sqrt{\frac{1-e^{-r\Delta t}}{e^{-r\Delta t}}},$$
since $(1-e^{-r\Delta})/e^{-r\Delta} < 4r\Delta$ for small enough $r > 0$ and $r\Delta t < (1-e^{-r\Delta t})/e^{-r\Delta t}$. Therefore, $\underline w$ belongs to the interior of $W$ for all $\Delta t < \Delta$. This completes the proof of Lemma 1.
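The two elementary inequalities used in the final chain can be spot-checked numerically. The sketch below verifies that $(1-e^{-x})/e^{-x} = e^x - 1 < 4x$ on a grid of small $x = r\Delta$, and that $x \le e^x - 1$ throughout:

```python
import math

def ratio(x):
    """(1 - e^{-x}) / e^{-x}, which equals e^x - 1."""
    return (1 - math.exp(-x)) / math.exp(-x)

# Check e^x - 1 < 4x for small x = r*Delta, and x <= e^x - 1 on the same grid.
for k in range(1, 201):
    x = 0.01 * k  # grid over (0, 2]
    assert ratio(x) < 4 * x
    assert x <= ratio(x)
print("both inequalities hold on (0, 2]")
```

The first bound eventually fails for large $x$, which is why the argument requires $r > 0$ small; the second holds for all $x \ge 0$.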

C Communication Equilibrium

I assume that players behave according to communication equilibrium in the sense of Myerson (1986). Intuitively, a disinterested mediator sends non-binding messages to players, who use the messages to make inferences about others' behavior and choose their best response. To facilitate classification of equilibria, I decompose the mediator's messages into private and public ones. Let $A_i$ be the set of private recommendations that the mediator can send to player $i$, and $\mathcal A$ the finite set of possible public announcements. The timing of information flow is this: at the beginning of any period $\tau$, the mediator makes a private, non-binding recommendation to every player $i$ to play $a_{i\tau} \in A_i$. Afterwards, a public signal $\omega_\tau \in \Omega$ realizes, depending stochastically on what players actually played. Finally, the mediator sends a public announcement $\alpha_\tau \in \mathcal A$ to everyone. Let $H_0^\tau = (A \times \Omega \times \mathcal A)^{\tau-1}$ be the set of partial histories for the mediator (thus $H_0^1 = \{\emptyset\}$), consisting of recommendations, public signals and announcements. The set of all such partial histories is $H_0 = \bigcup_\tau H_0^\tau$.

Definition 9. A communication mechanism, or mediated strategy, is a pair of functions $\tilde\mu = (\tilde\mu_1, \tilde\mu_2)$, where $\tilde\mu_1 : H_0 \to \Delta(A)$ and $\tilde\mu_2 : A \times \Omega \times H_0 \to \Delta(\mathcal A)$. Here, $\tilde\mu_1(a_\tau|a^{\tau-1}, \omega^{\tau-1}, \alpha^{\tau-1})$ stands for the probability that the mediator privately recommends $a_{i\tau}$ to every player $i$ given $(a^{\tau-1}, \omega^{\tau-1}, \alpha^{\tau-1})$, and $\tilde\mu_2(\alpha_\tau|a^\tau, \omega^\tau, \alpha^{\tau-1})$ stands for the conditional probability that the subsequent public announcement is $\alpha_\tau$.

Notation. I will write $\tilde\mu$ to describe mediated strategies of the repeated game, and write $\mu \in \Delta(A)$ for a correlated strategy of the stage game. A mediated strategy $\tilde\mu$ induces a probability distribution on each $H_0^\tau$ as follows:
$$\Pr(a^\tau, \omega^\tau, \alpha^\tau|\tilde\mu) = \prod_{s=1}^{\tau} \tilde\mu_1(a_s|a^{s-1}, \omega^{s-1}, \alpha^{s-1})\,\tilde\mu_2(\alpha_s|a^s, \omega^s, \alpha^{s-1})\,\Pr(\omega_s|a_s).$$


The utility to player $i$ from the mediated strategy $\tilde\mu$ is therefore given by
$$U_i(\tilde\mu) = (1-\delta)\sum_{(\tau, a^\tau, \omega^\tau, \alpha^\tau)} \delta^{\tau-1} u_i(a_\tau)\,\Pr(a^\tau, \omega^\tau, \alpha^\tau|\tilde\mu).$$
Let $H_i^\tau = (A_i \times A_i \times \Omega \times \mathcal A)^{\tau-1}$ be the set of partial histories for player $i$, with typical element $h_i^\tau = (a_i^{\tau-1}, b_i^{\tau-1}, \omega^{\tau-1}, \alpha^{\tau-1})$, where $a_i^{\tau-1}$ is the vector of recommendations to player $i$ from periods 1 to $\tau-1$, $b_i^{\tau-1}$ is the vector of actions taken by player $i$, $\omega^{\tau-1}$ is the vector of signal realizations and $\alpha^{\tau-1}$ is the vector of public announcements by the mediator. Let $H_i = \bigcup_\tau H_i^\tau$ be the set of all partial histories for player $i$. For any player $i$, a unilateral deviation from $\tilde\mu$ is a function $\sigma_i : H_i \to M(A_i)$, where $M(A_i) = \{A_i \to \Delta(A_i)\}$ is the set of recommendation-contingent mixed strategies and $\sigma_i(b_{i\tau}|a_{i\tau}, h_i^\tau)$ is interpreted as the probability that player $i$ plays $b_{i\tau}$ if $a_{i\tau}$ is recommended in period $\tau$ when his private history is $h_i^\tau$. The induced probability that $(a^\tau, \omega^\tau, \alpha^\tau, b_i^\tau)$ occurs if everyone else is honest and obedient except for $i$, who plays $b_i^\tau$ instead of the recommended $a_i^\tau$, is given by
$$\Pr(a^\tau, \omega^\tau, \alpha^\tau, b_i^\tau|\tilde\mu, \sigma_i) = \Pr(a^\tau, \omega^\tau, \alpha^\tau|\tilde\mu)\prod_{s=1}^{\tau}\sigma_i(b_{is}|a_i^s, b_i^{s-1}, \omega^{s-1}, \alpha^{s-1})\,\frac{\Pr(\omega_s|b_{is}, a_{-is})}{\Pr(\omega_s|a_s)}.$$
Therefore, the utility to player $i$ from a unilateral deviation $\sigma_i$ can be written as
$$U_i(\tilde\mu|\sigma_i) = (1-\delta)\sum_{(\tau, a^\tau, b_i^\tau, \omega^\tau, \alpha^\tau)}\delta^{\tau-1}u_i(b_{i\tau}, a_{-i\tau})\,\Pr(a^\tau, \omega^\tau, \alpha^\tau, b_i^\tau|\tilde\mu, \sigma_i).$$
Definition 10. A mediated strategy $\tilde\mu$ is called a communication equilibrium, or just equilibrium, if every unilateral deviation from $\tilde\mu$ is unprofitable: $U_i(\tilde\mu) \ge U_i(\tilde\mu|\sigma_i)$ for all $(i, \sigma_i)$.
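The product formula for $\Pr(a^\tau, \omega^\tau, \alpha^\tau|\tilde\mu)$ can be implemented directly as a recursion. The sketch below uses a hypothetical toy mediator (the action labels, signal technology and announcement rule are all made up for illustration, not taken from the text) and checks that the induced probabilities sum to one over all two-period histories:

```python
import itertools

# Hypothetical toy: action-profile labels A = {0, 1}, signals Omega = {0, 1},
# announcements Alpha = {0, 1}.
A, Omega, Alpha = [0, 1], [0, 1], [0, 1]

def mu1(a, hist):      # recommendation kernel; ignores history (i.i.d.)
    return 0.6 if a == 0 else 0.4

def mu2(alpha, a, w):  # announcement kernel; announces the signal truthfully
    return 1.0 if alpha == w else 0.0

def pr_w(w, a):        # signal distribution given the recommended profile
    return 0.7 if w == a else 0.3

def prob(history):
    """Pr(a^tau, w^tau, alpha^tau | mu~) via the product formula."""
    p, past = 1.0, []
    for (a, w, alpha) in history:
        p *= mu1(a, past) * pr_w(w, a) * mu2(alpha, a, w)
        past.append((a, w, alpha))
    return p

total = sum(prob(list(h)) for h in
            itertools.product(itertools.product(A, Omega, Alpha), repeat=2))
print(total)  # sums to 1 (up to rounding) over all two-period histories
```

Because each kernel sums to one conditional on its arguments, the product measure is a genuine probability distribution over histories of any fixed length.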

C.1 Public versus Private Equilibrium

Definition 10 is close to Myerson's standard definition of communication equilibrium, where the mediator uses a canonical communication system consisting only of secret recommendations of what actions to take, without public announcements. I added public announcements to distinguish public from private communication equilibria, as the former enjoy a tractable recursive structure. Nevertheless, Myerson's definition is a special case of Definition 10 that I will label "private" communication equilibrium. It corresponds to assuming that $\mathcal A$ is a singleton or that $(\tilde\mu_1, \tilde\mu_2)$ is independent of $\alpha$.


To see this, let $\tilde H_0^\tau = (A \times \Omega)^\tau$ and $\tilde H_0 = \bigcup_\tau \tilde H_0^\tau$ be the mediator's histories without public announcements. A private mediated strategy is just a function $\tilde\nu : \tilde H_0 \to \Delta(A)$. Letting $\Pr(a^\tau, \omega^\tau|\tilde\nu) = \prod_{s=1}^\tau \tilde\nu(a_s|a^{s-1}, \omega^{s-1})\Pr(\omega_s|a_s)$, player $i$'s payoff from $\tilde\nu$ is
$$U_i(\tilde\nu) = (1-\delta)\sum_{(\tau, a^\tau, \omega^\tau)}\delta^{\tau-1}u_i(a_\tau)\,\Pr(a^\tau, \omega^\tau|\tilde\nu).$$
Since $\tilde\nu$ involves no public announcements, player $i$'s histories are defined by the sets $\tilde H_i^\tau = (A_i \times A_i \times \Omega)^{\tau-1}$ and $\tilde H_i = \bigcup_\tau \tilde H_i^\tau$. A deviation from $\tilde\nu$ is a map $\tilde\sigma_i : \tilde H_i \to M(A_i)$, and $\tilde\nu$ is a communication equilibrium if $U_i(\tilde\nu) \ge U_i(\tilde\nu|\tilde\sigma_i)$ for all $(i, \tilde\sigma_i)$, where
$$U_i(\tilde\nu|\tilde\sigma_i) = (1-\delta)\sum_{(\tau, a^\tau, b_i^\tau, \omega^\tau)}\delta^{\tau-1}u_i(b_{i\tau}, a_{-i\tau})\,\Pr(a^\tau, \omega^\tau, b_i^\tau|\tilde\nu, \tilde\sigma_i), \quad\text{and}$$
$$\Pr(a^\tau, \omega^\tau, b_i^\tau|\tilde\nu, \tilde\sigma_i) = \Pr(a^\tau, \omega^\tau|\tilde\nu)\prod_{s=1}^\tau\tilde\sigma_i(b_{is}|a_i^s, b_i^{s-1}, \omega^{s-1})\,\frac{\Pr(\omega_s|b_{is}, a_{-is})}{\Pr(\omega_s|a_s)}.$$
A private mediated strategy that is also a communication equilibrium is called a private communication equilibrium, or just a private equilibrium. A mediated strategy $\tilde\mu$ induces a private mediated strategy $\tilde\nu$ by integrating out $\mathcal A$:
$$\tilde\nu(a_{\tau+1}|a^\tau, \omega^\tau) = \frac{\sum_{\alpha^\tau}\tilde\mu_1(a_{\tau+1}|a^\tau, \omega^\tau, \alpha^\tau)\prod_s\tilde\mu_1(a_s|a^{s-1}, \omega^{s-1}, \alpha^{s-1})\,\tilde\mu_2(\alpha_s|a^s, \omega^s, \alpha^{s-1})}{\sum_{\alpha^\tau}\prod_s\tilde\mu_1(a_s|a^{s-1}, \omega^{s-1}, \alpha^{s-1})\,\tilde\mu_2(\alpha_s|a^s, \omega^s, \alpha^{s-1})}.$$
It follows that $\Pr(a^\tau, \omega^\tau|\tilde\nu) = \Pr(a^\tau, \omega^\tau|\tilde\mu)$ for every $(a^\tau, \omega^\tau)$, therefore $U_i(\tilde\nu) = U_i(\tilde\mu)$.

Lemma 15. If $\tilde\mu$ is an equilibrium then $\tilde\nu$ is a private equilibrium.

Proof. Every deviation $\tilde\sigma_i$ from $\tilde\nu$ corresponds to a deviation from $\tilde\mu$ that does not depend on public announcements, so, integrating out $\alpha$, it follows that
$$U_i(\tilde\nu|\tilde\sigma_i) = (1-\delta)\sum_{(\tau, a^\tau, b_i^\tau, \omega^\tau, \alpha^\tau)}\delta^{\tau-1}u_i(b_{i\tau}, a_{-i\tau})\,\Pr(a^\tau, \omega^\tau, \alpha^\tau, b_i^\tau|\tilde\mu, \tilde\sigma_i)$$
$$= (1-\delta)\sum_{(\tau, a^\tau, b_i^\tau, \omega^\tau)}\delta^{\tau-1}u_i(b_{i\tau}, a_{-i\tau})\,\Pr(a^\tau, \omega^\tau, b_i^\tau|\tilde\mu, \tilde\sigma_i) = U_i(\tilde\mu|\tilde\sigma_i) \le U_i(\tilde\mu) = U_i(\tilde\nu).$$

Therefore, $\tilde\nu$ is a private equilibrium. □

Private equilibria are, in some sense, the most general, because they enjoy the fewest incentive constraints. On the other hand, they are difficult to analyze. This motivates the study of public equilibria. Let $H_p = \bigcup_\tau H_p^\tau$ be the set of partial public histories, where $H_p^\tau = (\Omega \times \mathcal A)^{\tau-1}$ collects the public information available to players up to period $\tau$. A public deviation from $\tilde\mu$ is a map $\bar\sigma_i : H_p \to M(A_i)$. In words, a public deviation depends only on public information: a player's mapping from current recommendations to mixed strategies depends only on public information. Say that a mediated strategy $\tilde\mu$ discourages public deviations if every public deviation is unprofitable, i.e., $U_i(\tilde\mu) \ge U_i(\tilde\mu|\bar\sigma_i)$ for all $(i, \bar\sigma_i)$.

Definition 11. A mediated strategy $\tilde\mu = (\tilde\mu_1, \tilde\mu_2)$ is public if it publicly announces all previous recommendations: $\mathcal A = A$ and $\tilde\mu_2(a_\tau|a^\tau, \omega^\tau, \alpha^{\tau-1}) = 1$ for all $(\tau, a^\tau, \omega^\tau, \alpha^{\tau-1})$. Therefore, without loss we will use $\tilde\mu$ and $\tilde\mu_1$ interchangeably in this case. A public mediated strategy that is also an equilibrium is a public equilibrium.$^{21}$

Proposition 9. A public mediated strategy that discourages public deviations is an equilibrium, hence a public equilibrium.

Proof. Let $\mu$ be a public mediated strategy that discourages public deviations. Let $\sigma_i$ be any deviation from $\mu$, not necessarily public. The probability of $(a^\tau, \omega^\tau, b_i^\tau)$ equals
$$\Pr(a^\tau, \omega^\tau, b_i^\tau|\mu, \sigma_i) = \prod_{\rho=1}^{\tau}\mu(a_\rho|a^{\rho-1}, \omega^{\rho-1})\,\Pr(\omega_\rho|b_{i\rho}, a_{-i\rho})\,\sigma_i(b_{i\rho}|a_{i\rho}, b_i^{\rho-1}, \omega^{\rho-1}, a^{\rho-1})$$
as usual. Since $\mu$ is public, for $s \le \tau$, the probability conditional on $(a^s, \omega^s, b_i^s)$ equals $\Pr(a^\tau, \omega^\tau, b_i^\tau|a^s, \omega^s, b_i^s, \mu, \sigma_i) = \Pr(a^\tau, \omega^\tau, b_i^\tau|\mu, \sigma_i)/\Pr(a^s, \omega^s, b_i^s|\mu, \sigma_i)$. Let $s+1$ be the first period where $\sigma_i$ depends on $b_i^s$. Decompose $U_i(\mu|\sigma_i)$ as follows:
$$U_i(\mu|\sigma_i) = (1-\delta^s)U_i^s(\mu|\sigma_i) + \delta^s\sum_{(a^s, \omega^s, b_i^s)}U_i^s(\mu|a^s, \omega^s, b_i^s, \sigma_i)\,\Pr(a^s, \omega^s, b_i^s|\mu, \sigma_i),$$
where
$$U_i^s(\mu|\sigma_i) = \frac{1-\delta}{1-\delta^s}\sum_{\tau=1}^{s}\sum_{(a^\tau, \omega^\tau, b_i^\tau)}\delta^{\tau-1}u_i(a_\tau)\,\Pr(a^\tau, \omega^\tau, b_i^\tau|\mu, \sigma_i) \quad\text{and}$$
$$U_i^s(\mu|h_i^{s+1}, \sigma_i) = (1-\delta)\sum_{h_i^{\tau+1} > h_i^{s+1}}\delta^{\tau-s-1}u_i(a_\tau)\,\Pr(h_i^\tau|h_i^{s+1}, \mu, \sigma_i(h_i^{s+1})).$$
Let $b_i^{s*}(h_0^{s+1}, b_i^s) \in \arg\max_{\hat b_i^s} U_i^s(\mu|h_0^{s+1}, \hat b_i^s, \sigma_i(h_0^{s+1}, \hat b_i^s))$. It is easy to see that $b_i^{s*}(h_0^{s+1}, b_i^s) = b_i^{s*}(h_0^{s+1})$ does not depend on $b_i^s$. Indeed, letting $b_{is+1}^\tau = (b_{is+1}, \ldots, b_{i\tau})$,
$$U_i^s(\mu|h_0^{s+1}, \hat b_i^s, \sigma_i(h_0^{s+1}, \hat b_i^s)) = (1-\delta)\sum_{h_i^{\tau+1} > h_i^{s+1}}\delta^{\tau-s-1}u_i(a_\tau)\,\Pr(h_i^\tau|h_0^{s+1}, \hat b_i^s, \mu, \sigma_i(h_0^{s+1}, \hat b_i^s))$$
$$= (1-\delta)\sum_{h_i^{\tau+1} > h_i^{s+1}}\delta^{\tau-s-1}u_i(a_\tau)\prod_{\rho=s+1}^{\tau}\mu(a_\rho|a^{\rho-1}, \omega^{\rho-1})\,\Pr(\omega_\rho|b_{i\rho}, a_{-i\rho})\,\sigma_i(b_{i\rho}|a_{i\rho}, b_{is+1}^{\rho-1}, \hat b_i^s, \omega^{\rho-1}, a^{\rho-1})$$
$$= U_i^s(\mu|h_0^{s+1}, \sigma_i(h_0^{s+1}, \hat b_i^s)).$$

21. This notion of public equilibrium is comparable to the recursive communication equilibrium introduced by Tomala (2009) and Rahman and Obara (2010) for games in discrete time.


Therefore, a deviation is optimal regardless of the private part of a player's history. Letting $\sigma_i^s = \sigma_i$ for $\tau \le s$ and $\sigma_i^s(h_i^{\tau+1}) = \sigma_i(h_0^{\tau+1}, b_{is+1}^\tau, b_i^{s*}(h_0^{s+1}))$, it follows that $\sigma_i^s(h_i^{s+1}) = \sigma_i^s(h_0^{s+1})$ is a public deviation up to and including period $s+1$, and moreover $U_i^s(\mu|h_i^{s+1}, \sigma_i) \le U_i^s(\mu|h_i^{s+1}, \sigma_i^s)$. Now let $s'+1$ be the next period where $\sigma_i^s$ is not public, and repeat the algorithm above to obtain $\sigma_i^{s'}$. Proceeding inductively, we obtain a limiting deviation that is public and whose value exceeds that of $\sigma_i$. □

Proposition 9 allows us to establish that public equilibria enjoy a tractably recursive structure, as usual. Intuitively, a player's past deviations do not affect his beliefs about opponents' future behavior. Formally, if $\tilde\mu$ is a public mediated strategy and $h_p^\tau \in H_p^\tau$ a partial public history then we may rewrite a player $i$'s payoffs as follows:
$$v_i(h_p^\tau) = (1-\delta)\sum_{a_\tau}u_i(a_\tau)\,\tilde\mu(a_\tau|h_p^\tau) + \delta\sum_{(a_\tau, \omega_\tau)}v_i(a_\tau, \omega_\tau, h_p^\tau)\,\Pr(\omega_\tau|a_\tau)\,\tilde\mu(a_\tau|h_p^\tau).$$

The public mediated strategy $\tilde\mu$ is therefore a public equilibrium if for every player $i$, public history $h_p^\tau$ and one-shot deviation $\sigma_{i\tau} \in M(A_i)$,
$$v_i(h_p^\tau) \ge \sum_{(a_\tau, b_{i\tau}, \omega_\tau)}\big[(1-\delta)u_i(b_{i\tau}, a_{-i\tau}) + \delta v_i(a_\tau, \omega_\tau, h_p^\tau)\big]\Pr(\omega_\tau|b_{i\tau}, a_{-i\tau})\,\sigma_{i\tau}(b_{i\tau}|a_{i\tau})\,\tilde\mu(a_\tau|h_p^\tau).$$

C.2 T-Public Equilibrium

Let us now define $T$-public communication equilibrium. Given a block length $T \in \mathbb N$, the mediated strategy $\tilde\mu = (\tilde\mu_1, \tilde\mu_2)$ is called $T$-public if $\mathcal A = A^T \cup \{0\}$ and
$$\tilde\mu_2(\alpha_\tau|a^\tau, \omega^\tau, \alpha^{\tau-1}) = 1 \quad\text{if}\quad \begin{cases} \alpha_\tau = 0 & \text{and } \tau \ne kT \text{ for all } k \in \mathbb N, \\ \alpha_\tau = a_{\tau-T+1}^\tau & \text{and } \tau = kT \text{ for some } k \in \mathbb N, \end{cases}$$
where $a_{\tau-T+1}^\tau = (a_{\tau-T+1}, \ldots, a_\tau)$ lists the recommendation profiles in the most recent $T$-period block. In words, a $T$-public mediated strategy publicly announces all of the mediator's recommendations every $T$ periods. Again, to economize on notation, I identify $\tilde\mu$ with $\tilde\mu_1$ and understand $\tilde\mu_2$ implicitly as just defined. A $T$-public equilibrium is a $T$-public mediated strategy that is also an equilibrium. An analogue of Proposition 9 applies to $T$-public equilibria as follows. The set of $T$-public histories is denoted by $H_p^T = \bigcup_\tau H_p^{T\lfloor\tau/T\rfloor}$, where $H_p^{Tk} = (\Omega^T \times A^T)^k$ is the set of partial public histories in any period $\tau$ such that $k = \lfloor\tau/T\rfloor$. Let $M_i^T = \prod_{\tau=1}^T \{H_i^\tau \to \Delta(A_i)\}$ be the set of private-history-contingent mixed strategies within a $T$-period block. They represent a player's plan to privately deviate along a block. A $T$-public deviation from $\tilde\mu$ is a map $\sigma_i : H_p^T \to M_i^T$. Thus, $\sigma_i$ describes how a player plans to privately deviate at the beginning of each $T$-period block.
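The announcement rule $\tilde\mu_2$ is deterministic and easy to implement. The following sketch (the function name and history encoding are hypothetical) reveals the last block of recommendation profiles exactly at the end of each $T$-period block and announces 0 otherwise:

```python
def announce(tau, T, recommendations):
    """T-public announcement rule: reveal the most recent block of
    recommendation profiles at the end of each T-period block, else 0.
    `recommendations` lists the profiles a_1, ..., a_tau observed so far."""
    if tau >= T and tau % T == 0:
        return tuple(recommendations[tau - T:tau])  # a_{tau-T+1}, ..., a_tau
    return 0

recs = ['a1', 'a2', 'a3', 'a4', 'a5', 'a6']
print([announce(t, 3, recs[:t]) for t in range(1, 7)])
# -> [0, 0, ('a1', 'a2', 'a3'), 0, 0, ('a4', 'a5', 'a6')]
```

Between block ends the mediator reveals nothing, which is exactly what makes within-block play private and coordination infrequent.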

By Proposition 9 applied to $T$-period blocks, the $T$-public mediated strategy $\tilde\mu$ is a $T$-public equilibrium if it discourages $T$-public deviations; that is, for every player $i$, stage $k \in \mathbb N$, public $kT$-period history $h_p^{Tk} \in H_p^{Tk}$ and $T$-public deviation $\sigma_i$,
$$v_i(h_p^{Tk}) \ge \sum_{h_i^{T(k+1)}}\Big[(1-\delta)\sum_{\tau=Tk+1}^{T(k+1)}\delta^{\tau-Tk-1}u_i(b_{i\tau}, a_{-i\tau}) + \delta^T v_i(a^T, \omega^T, h_p^{Tk})\Big]\prod_{\tau=Tk+1}^{T(k+1)}\Pr(\omega_\tau|b_{i\tau}, a_{-i\tau})\,\sigma_{i\tau}(b_{i\tau}|a_{i\tau}, h_i^{\tau,Tk}, h_p^{Tk})\,\tilde\mu(a_\tau|h_p^\tau),$$
where $h_i^{\tau,Tk} \in H_i^\tau$ is a partial private history for player $i$ in the $k$th $T$-period block.


services including emergency room,. homeless shelters, detox centers,. and jails. September 2016. Information on individuals referenced in the following case studies were compiled by: 1) New Hampshire Coalition to End. Homelessness; 2) Mental Health