A Folk Theorem for Stochastic Games with Private Almost-Perfect Monitoring ∗ Katsuhiko Aiba† October, 2013

Abstract We prove a folk theorem for stochastic games with private, almost-perfect monitoring and observable states when the limit set of feasible and individually rational payoffs is independent of the state. This asymptotic state independence holds, for example, for irreducible stochastic games. Our result establishes that the sophisticated construction of Horner and Olszewski ¨ (2006) for repeated games can be adapted to stochastic games, reinforcing our conviction that much knowledge and intuition about repeated games carries over to the analysis of irreducible stochastic games. Keywords: Stochastic games, private monitoring, folk theorem JEL codes: C72, C73, D82

1. Introduction The class of stochastic games includes models in which persistent shocks, stock variables representing human or natural resources, technological innovations, or capital play an important role. Stochastic games are extensively used in economics since they capture dynamic interactions in rich and changing environments. It is sometimes natural to assume that players cannot observe others’ actions in these dynamic interactions. For example, consider an oligopolistic market in which firms now set prices with their customers bilaterally in each period (Stigler (1964)). The firms cannot observe competitors’ price offers, but they obtain some information about these from their own sales that are unobservable to competitors. This is an example of imperfect private monitoring where players cannot directly observe others’ actions, but receive some private signals, which are imperfect indicators of the action taken in the current period. Such imperfect monitoring is extensively studied in repeated games, which one can view as stochastic games whose state variable is fixed. In this paper, we prove a folk theorem for stochastic games with private almostperfect monitoring and observable states under the assumption that the limit set of feasible and ∗

I am grateful to Johannes Horner, Antonio Penta, Marzena Rostek, Bill Sandholm, Tadashi Sekiguchi, Ricardo ¨ Serrano-Padial, Lones Smith, referees, and seminar audiences at the Midwest Economic Theory Meeting Fall 2010 and University of Wisconsin-Madison for valuable comments and discussions. I also thank Peter Wagner for proof reading. † Institute for Microeconomics, University of Bonn, Adenauerallee 24-42, 53113 Bonn, Germany. e-mail: [email protected]

individually rational payoffs is independent of the state (we call the assumption the asymptotic state independence). In general, when we analyze a repeated game with private monitoring in which there is no available public signal for players to coordinate on, there is no obvious recursive structure available because players’ beliefs about opponents’ histories become increasingly complex as time proceeds. To avoid these obstacles, the literature has focused on belief-free equilibria, introduced by Piccione (2002), and simplified and extended by Ely and V¨alim¨aki (2002) and Ely, Horner and Olszewski ¨ (2005). In this class of equilibria any continuation play is optimal whatever private histories other players might have observed, so that players need not form beliefs about opponents’ histories in order to compute their optimal behaviors. For illustration, suppose that John and Susie repeatedly play the following prisoners’ dilemma.

C D

C 2, 2 3, −1

D −1, 3 0, 0

Note that Susie can ensure that John receives at least 2 when she plays C, and that he receives at most 0 when she plays D. Then John might have an incentive to play C today if Susie is more likely to play C tomorrow when she receives a sufficiently informative signal that John played C today. By suitably choosing the probability with which she plays C tomorrow as a function of her private information today, Susie can ensure that John is indifferent between C and D. In turn, because he is indifferent between C and D, it is optimal for him to condition his play tomorrow on his private information so as to make Susie indifferent between C and D. Strategy profiles of this sort constitute sequential equilibria with the belief-free property. Ely and V¨alim¨aki (2002) use this belief-free approach to prove the folk theorem in repeated prisoners’ dilemmas. But Ely, Horner ¨ and Olszewski (2005) show that Ely and V¨alim¨aki’s construction does not extend to general stage games. Horner and Olszewski (2006) (hereafter HO) exploit the essential feature of belief-free equi¨ librium to prove the folk theorem for general stage games. To do so, they divide the repeated game into a sequence of T-period block games. Provided that T is sufficiently large, the payoff structure of prisoners’ dilemma can be recovered from any stage game, using two T-period block strategies for each player: a ”good” strategy and a ”bad” strategy, which correspond to C and D in the prisoners’ dilemma above. John might prefer to play the good strategy if Susie is more likely to play her good strategy in the next block when she observes a sufficiently informative signal that John played his good strategy in the current block. By suitably choosing the probability with which she plays the good strategy in the next block as a function of her current block’s private histories, Susie can ensure that John is indifferent between the good and bad strategy at the beginning of each block regardless of what she observed before, and can also ensure the sequential rationality of his strategy during each block. It is the latter requirement that creates additional complications in HO’s construction relative to the construction of belief-free equilibrium. –2–

1.1 Our Construction The heart in this paper lies in the adaptation of HO’s construction from repeated games to stochastic games. To begin, we divide the entire stochastic game into consecutive T-period block games, for which we define a ”good” strategy and a ”bad” strategy. Then, we introduce of Lperiod Markov strategies that are repeatedly played within the block. Although strategies’ payoffs in stochastic games can be heavily dependent on the initial state, it is shown that the payoffs generated by the L-period Markov strategies depend little on the initial and final state during the L-period game when the asymptotic state independence holds.1 Hence, an L-period game in which these L-period Markov strategies are used is an ”almost invariant stage game”, and the repeated play of these almost invariant stage games can be regarded as ”almost repeated game”. By using these L-period strategies of the ”almost repeated game”, we are able to construct ”good” and ”bad” strategies as in HO. We ensure that these strategies exhibit the belief-free property in the beginning of each block and sequential rationality during each block by defining carefully the transition probabilities between the good and bad strategies as one block ends and the next begins. Another but minor difficulty presented by the stochastic game environment is that altering current actions affects not only the distribution of private signals, but also the distribution of future states. This implies that a player may have an incentive to deviate solely to ensure an advantageous distribution of future states. For example, if, during a play of a strategy profile, a player’s continuation payoffs are higher from state ωt+1 than from state ω˜ t+1 , then in period t the player has an incentive to choose an action that makes ωt+1 more likely. To contend with this issue, one can try to ensure that when an opponent’s observations suggest that a player has chosen this kind of advantageous action, the probability that the opponent plays the bad strategy in the next block goes up. By carefully conditioning the transition probabilities on the realized states, we are able to maintain both the belief-free property and the within-block sequential rationality. The folk theorem for stochastic games has already been proved under both perfect monitoring and imperfect public monitoring. In the case of perfect monitoring, Dutta (1995) proved the folk theorem for stochastic games when the asymptotic state independence holds. More recently, Horner, Sugaya, Takahashi and Vieille (2011) and Fudenberg and Yamamoto (2011) independently ¨ proved the folk theorem for stochastic games with imperfect public monitoring under a similar assumption. Section 2 describes the model and result. In Section 3 we provide the sketch of the proof for the two player case and describe the adaptation of HO’s construction from repeated games to stochastic games. Finally, the paper is concluded in Section 4. All the details of the proof and cases with more than two players are left to our working paper (Aiba (2013)). 1

The proof follows from some results in Dutta (1995).

–3–

2. Model and Result This paper considers n-player stochastic games with private monitoring, where the stage game is played in each period over an infinite horizon.

2.1 The Stochastic Game The stage game in each period proceeds as follows; (i) (ii) (iii) (iv)

Each player i publicly observes state ω from the finite set Ω. Each player i independently chooses an action ai from a finite set Ai . Each player i receives a stochastic private signal σi from a finite set Σi . The game moves stochastically to next period’s state ω0 ∈ Ω.

The action ai and the signal σi are private information that only player i can observe. The transition probability from current state ω to next period’s state ω0 is denoted by p(ω0 | ω, a) that depends on current action profile a = (a1 , . . . , an ). A profile σ = (σ1 , . . . , σn ) of private signals is drawn with probability m(σ | ω, a) given current state ω and action profile a. Let mi (σi | ω, a) denote the marginal probability that player i observes the signal σi . After choosing ai and observing σi , each player i collects a realized payoff u˜ i (ω, ai , σi ), so that the ex-ante payoff in the stage game is P given by ui (ω, a) = σi ∈Σi u˜ i (ω, ai , σi )mi (σi | ω, a). The payoff function ui is naturally extended from pure action profile a ∈ A := A1 × · · · × An to mixed action profile α ∈ ∆A := ∆A1 × · · · × ∆An . In this paper we study the case in which private signals are sufficiently informative that cooperation is possible. For this purpose and simplifying the analysis we assume the following about the monitoring technology. Assumption 2.1 (Canonical Signal Space). Σi = A−i for each i. Assumption 2.2 (ε-Perfect Monitoring). For each ε ≥ 0, each player i, each state ω ∈ Ω, and each action profile a ∈ A, mi (σi = a−i | ω, a) ≥ 1 − ε. Assumption 2.3 (Full Support Monitoring). m(σ | ω, a) > 0 for all ω ∈ Ω, a ∈ A, and σ ∈ Σ. Note that the random variables ω0 and σ are conditionally independent given (ω, a), so that players do not update their beliefs about their opponents’ signals after publicly observing state ω0 . Moreover, as  → 0, the monitoring distribution m changes but the state transition probability p remains fixed. We denote by ωt , ati , and σti the realized state, player i’s realized action, and player i’s observed signal in period t, respectively. Player i’s private history up to and including period t is then hti := (ω1 , a1i , σ1i , . . . , ωt , ati , σti ) ∈ Hit := (Ω × Ai × Σi )t . Let h0i = ∅ denote the null history before the initial state is drawn. A (behavior) strategy for player i is a mapping si : Hi × Ω → ∆Ai , where S t t−1 and a current realized state ωt Hi = ∞ t=0 Hi , that is, a mapping taking a past private history hi

–4–

to a mixed action. We denote by Si the set of all strategies for player i. Note that Si includes the (stationary) Markov strategies, which are functions of the current state only. Given a strategy profile s = (s1 , . . . , sn ) ∈ S := S1 × · · · × Sn and initial state w, let Ui (ω, s; δ) denote ; δ) denote the discounted average payoff for player i with discount factor δ and let Ui (ωt , s | ht−1 i , ωt ), given player i’s consistent belief about the expected continuation payoff after observing (ht−1 i . Then strategy profile s is a sequential equilibrium2 of the stochastic opponents’ private histories ht−1 −i ; δ) ≥ game if no player has a profitable deviation after any private history, that is, Ui (ωt , s | ht−1 i t 0 t−1 t 0 t−1 Ui (ω , si , s−i | hi ; δ) for all (hi , ω ) and all strategies si of every player i.

2.2 Feasible and Individually Rational Payoffs The set of feasible payoffs for repeated games is defined in terms of the stage game payoffs since any feasible payoff is a convex combination of the extreme points attained by repeatedly playing a constant action in the stage game. But the stage game can change from period to period in stochastic games. Hence, the set of feasible payoffs is defined in terms of discounted average expected payoffs as V(ω; δ) = co {(U1 (ω, s; δ), . . . , Un (ω, s; δ)) for some s ∈ S} given initial state ω and discount factor δ. Note that unlike in repeated games, V(ω; δ) varies with the discount factor and initial state. Since the folk theorem is an asymptotic result, it is natural to focus on the limit set of feasible discounted average payoffs V(ω; δ) as δ approaches 1. Dutta (1995) shows that this limit can be expressed relatively simply as the set V(ω) of the feasible limit-average payoffs3 : Lemma 2.4. As δ → 1, V(ω; δ) → V(ω) in the Hausdorff metric for all states ω. We define the set of individually rational payoffs. In the stochastic game with initial state ω, player i’s min-max payoff is given by v∗i (ω; δ) = mins−i ∈S−i maxsi ∈Si Ui (ω, si , s−i ; δ). From Neyman and Sorin (2003), for discount factors close to 1, the min-max payoff v∗i (ω; δ) is approximately equal to the limit-average min-max payoff v∗i (ω)4 : Lemma 2.5. As δ → 1, v∗i (ω; δ) → v∗i (ω) for all states ω and players i. We call a payoff vector v = (v1 , . . . , vn ) individually rational in terms of the discounted average criterion if vi > v∗i (ω; δ), and individually rational in terms of the limit-average criterion if vi > v∗i (ω). Stochastic games can generate incentive problems that are absent from repeated games. For example, suppose that there is an absorbing state. Then a low individually rational payoff for player i starting from another state might not be supportable as an equilibrium if player i can 2

In our finite setting, a stationary Markov perfect equilibrium always exists (Sobel (1971, Theorem 1)). Hence, the existence of a sequential equilibrium is guaranteed, since a Markov perfect equilibrium is also a sequential equilibrium. 3 The limit-average payoff for initial state ω, strategy profile s, and time sequence {Tk }∞ with 0 < T1 < T2 < T3 < . . . k=1   h PT i 1 t t k is defined as Ui ω, s, {Tk }∞ = lim E u (ω , a ) , provided that the limit exists for all players i. We denote T →∞ ω,s i k Tk t=1 k=1 by T (ω, s) the set of sequences {Tk }∞ such that the limit of the averages exists for initial state ω and strategy profile s. n  o   k=1  ∞ ∞ Then, V(ω) = co U1 ω, s, {Tk }∞ , . . . , U ω, s, {T } for some s ∈ S and {T } ∈ T (ω, s) . n k k=1 k k=1 k=1   4 The precise definiton is v∗ (ω) = min max inf ∞ U ω, s , s , {T }∞ . i

s−i ∈S−i

si ∈Si

{Tk }k=1 ∈T (ω,si ,s−i )

–5–

i

i

−i

k k=1

guide play to the absorbing state in which her payoffs are high. Alternatively, suppose that there is no absorbing state, but that player i can force play to remain in a certain state ω. Again, a low individually rational payoff for player i starting from another state might not be supportable as an equilibrium, since player i might deviate in a way that leads play to state ω, and then earn high payoffs by maintaining state ω forever. In order to rule out these possibilities, which would make the folk theorem fail, the following assumptions are made. Assumption 2.6 (Asymptotic state independence). We assume : (i) The set of limit-average feasible payoffs V(ω) is independent of initial state ω. (ii) The limit-average min-max payoff v∗i (ω) is independent of initial state ω for all i. This state independence assumption is satisfied for irreducible stochastic games in which, for all players i and all pairs of states ω and ω0 , there is some finite sequence of action profiles of players −i that moves the state from ω to ω0 with positive probability, independent of player i’s actions5 . Examples of irreducible stochastic games are very common in economics, see e.g. Rotemberg and Saloner’s (1986) and Besanko, Doraszelski, Kryukov, and Satterthwaite (2010). By Assumption 2.6 we can let V(ω) = V and v∗i (ω) = v∗i for all states ω. We write V ∗ (ω; δ) and V ∗ for the interior of the set of feasible and individually rational payoffs in terms of the discounted average criterion and the limit average criterion respectively: V ∗ (ω; δ) = int{v ∈ V(ω, δ) | vi > v∗i (ω, δ), ∀i}

and

V ∗ = int{v ∈ V | vi > v∗i , ∀i}.

It follows from Lemmas 2.4 and 2.5 that as δ goes to 1, V ∗ (ω; δ) converges to V ∗ for all states ω.

2.3 Main Result While in general the set of equilibria of a stochastic game with imperfect monitoring depends on the exact specification of the monitoring structure, our main conclusion does not. We denote by E(ω, δ, ε) the set of discounted average payoff vectors in the stochastic game with initial state ω that are sequential equilibrium payoff vectors for all ε-perfect monitoring structures. Our main result is: Theorem 2.7 (The Folk Theorem). For any v ∈ V ∗ and any ε-perfect monitoring structure, there exists a sequential equilibrium in which player i obtains discounted average payoff vi , so long as the discount factor δ is sufficiently close to 1 and the noise level ε is sufficiently close to 0. That is, ∀v ∈ V ∗ , ∃δ < 1, ε > 0, ∀δ ∈ (δ, 1), ∀ε ∈ (0, ε), ∀ω ∈ Ω, v ∈ E(ω, δ, ε). 5

The intuition why irreducibility implies Assumption 2.6 is as follows. Given the finiteness of the state space, the assumption of irreducibility ensures that the probability of a transition from any initial state to any other state in finite time is one. Since payoffs during a finite number of periods are negligible in terms of the limit-average criterion, the payoff attained from any state can also be attained from any other state.

–6–

Theorem 2.7 shows that in stochastic games any payoff in V ∗ can be supported in a sequential equilibrium if players are sufficiently patient and receive noisy but highly precise private information about opponents’ behavior. In particular, sufficiently patient players are able to achieve approximately Pareto-efficient payoffs. The remainder of the paper is devoted to the sketch of proof of Theorem 2.7 for the two players case. A reader might refer to Aiba (2013) for the details of the proof and a discussion of the cases with more than two players.

3. Proof for Two Players In this section, we sketch the proof of Theorem 2.7 for the two player case. Before that, we briefly review HO’s argument for repeated games with private monitoring. HO divide the repeated game into consecutive T-period blocks. They construct two T-period strategies sBi , sG i such that, in equilibrium, player i is indifferent between two strategies and weakly prefers them to all others, no matter what each player’s private history before the current block is, if player −i . This can be done by a suitable choice of the probability with uses either of two strategies sB−i , sG −i which player −i uses one of the two strategies within the current block (the transition probability) as a function of her recent history and of her recent strategy within the previous block. This notable idea makes sure that beliefs about private histories are irrelevant at the beginning of each block and that strategies sBi , sG depend only on recent histories within the current block. This approach i makes the analysis tractable and actually enables them to prove the folk theorem in the repeated game with general stage game for almost-perfect monitoring. Note that when T = 1, sBi = D, and sG = C, this argument reduces to a belief-free equilibrium in repeated prisoners’ dilemma by Ely i and V¨alim¨aki (2002). Hence, HO’s idea is the extension of belief-free concept. In the stochastic game considered in this paper, we modify the above construction in two respects. First, HO use constant or deterministic sequences of stage game actions within a block to achieve payoffs surrounding a target payoff, and they derive strategies sBi , sG from those. With i stochastic games, the stage game payoffs can be very different depending on the state. Therefore, we introduce L−period stochastic game using L−period Markov strategies, whose payoff matrices are almost identical regardless of the initial state. Hence this L−period stochastic game can be regarded as an ”almost invariant stage game” and its repeated play as an ”almost repeated game”. Then we use a finite sequence of these L−period Markov strategies within a block to approximate payoffs surrounding a target payoff and derive strategies sBi , sG from those. During the construction i we also use a finite sequence of L−period (not necessarily Markov) strategies to approximate minmax payoffs. Second, in stochastic games current actions affect both the distribution of private signals and the distribution of future states. This implies that a player may deviate to make the state distribution advantageous for her. For example, suppose that, during a play of a strategy profile, the continuation payoffs are higher for player i from state ωt+1 than from state ω˜ t+1 . Then player i has an incentive in period t to choose a current action which makes ωt+1 more likely. Hence, if we make sure that, given a state, player −i’s observations suggesting that player i is choosing –7–

this kind of advantageous action are more likely to trigger a punishment (bad) strategy in the next block, then we will be able to make player i indifferent between sBi , sG , as desired. We accomplish i this by carefully specifying the above transition probability as function not only of private signals but also of the realized states. In what follows, we will consider the finite T-period (or L-period) stochastic game. Corresponding history sets and strategy sets are denoted HiT and STi , and t (≤ T) refers to a time period in the T-period stochastic game. The discounted average payoffs in this game with initial state ω is denoted UiT (ω, s; δ), where s ∈ ST := ST1 ×ST2 . Moreover we will consider the finite T-period stochasT → R at the end of the last period. Player i’s tic game with payoffs augmented by a transfer πi : H−i payoff in this auxiliary scenario is defined as UiA (ω, s, πi ; δ) := UiT (ω, s; δ) + (1 − δ)δT E{πi | ω, s}.

3.1 Target payoff and ”almost invariant stage games”

V∗

wBG wGG v2 v v2 wBB

wGB

v∗1 v∗2

v1

v1

Figure 1: Payoffs

We fix a target payoff vector v ∈ V ∗ throughout. Consider payoff vectors {wXY }X,Y∈{G,B} surrounding v such that wGG > vi > wBB i , i wGB 1

> v1 >

wBG 1

i = 1, 2 and

GB wBG 2 > v2 > w2 .

See Figure 1. Therefore, there exist vi and vi such that v∗i < vi < vi < vi and   [v1 , v1 ] × [v2 , v2 ] ⊂ int co{wGG , wGB , wBG , wBB } . In the following lemma, under the asymptotic state independence we construct finite L−period strategies with payoffs arbitrarily close to wXY or minmax payoffs v∗ to generate ”almost invariant

–8–

stage game”. Lemma 3.1. Consider the perfect monitoring case, i.e.,  = 0. Then, for any η > 0, there exist L and δ such that for every L ≥ L and δ ≥ δ, the following holds: (i) For each XY ∈ {G, B}2 , there exists an L-period pure Markov strategy gXY such that < η, UiL (ω, gXY ; δ) − wXY i

∀ω ∈ Ω

∀i,

(ii) For each player i, there exists an L-period strategy gi−i for player −i such that for any L-period strategy si of player i, UiL (ω, si , gi−i ; δ) < v∗i + η,

∀ω ∈ Ω.

The lemma tells us that the L−period payoff matrix generated by the L−period Markov strategy profiles {gGG , gGB , gBG , gBB } is arbitrarily close to payoff matrix {wGG , wGB , wBG , wBB }, whatever the initial state is. So an almost repeated game consists of the infinite repetition of these L−period stochastic games that can be regarded as an almost invariant stage games. g−i is used as punishment i for player −i in almost invariant stage game. In the following we apply HO’s technique to our almost repeated game, accounting for the effect of actions on state transitions.

3.2 Block games, and ”good” and ”bad” strategies Now we divide the stochastic game into consecutive T-period stochastic games, each of which we call a block game. Take T such that T − 1 is a multiple of L. The block game consists of the following two rounds. (i)

Coordination Round (t = 1) : Let G and B be a partition of Ai . Player i sends good message Mi = G if she chooses an action from G; otherwise player i sends bad message Mi = B.

(ii) Main Round (t = 2, . . . , T) : If player i observes a message profile (M1 , M2 ) ∈ {G, B}2 in the 2 M1 coordination round, he plays a L-period pure Markov strategy gM repeatedly. However, i if player i sent bad message, and observed a signal indicating that player −i unilaterally 2 M1 deviated from gM , then player i plays the L-period minmax strategy g−i repeatedly in i −i the remaining period of the block game. In this description of the block game, we denote by sG ∈ STi (sBi ∈ STi ) the good (bad) strategy in i which player i sends a good (bad) message in the coordination round and follows the prescribed 2 M1 Markov strategy gM in the main round. i Consider the perfect monitoring case6 with  = 0. Observe that if player −i uses strategy sB−i , no matter what player i does in the block game, she earns strictly less than vi in every L-period interval of the main round except at most one interval in which player i deviates from gM2 M1 . 6

In perfect monitoring, players surely receive a correct message profile in the coordination round and precisely observe opponent’s unilateral deviation in the main round.

–9–

Hence, if T is sufficiently large, UiT (ω, si , sB−i ; δ) in the block game cannot exceed vi for any si ∈ STi and any initial state ω. Similarly, if T is sufficiently large, UiT (ω, si , sG ; δ) cannot fall below vi for −i G B any si ∈ {si , si } and any initial state ω. Hence, for such T and all ω ∈ Ω, T B min UiT (ω, si , sG −i ; δ) > vi > vi > vi > max Ui (ω, si , s−i ; δ).

si ∈{sBi ,sG } i

si ∈STi

Now consider the imperfect private monitoring case. If  > 0 is sufficiently small given T, an error event is very unlikely and hence the influence to T-period’s payoffs is so negligible that the above inequalities still hold.

3.3 Perfect Monitoring In this subsection we provide a sketch of the proof for the perfect monitoring case, in which i’s signal is equal to −i’s actual action and hence hti equals ht−i up to the ordering of signals and actions. We abuse the notation to denote it by ht−i = hti . As in HO, we shall define the transfers T → R such that, in the block game, player i is indifferent between strategies sB , sG and πBi , πG : H−i i i i weakly prefers them to all others if player −i uses either of two strategies sB−i , sG . −i More specifically, by backward induction from period T to 1, we define θB (ht−1 , ωt , ati ) as −i difference of the continuation payoffs from (ht−1 , ωt ) between some strategy rBi ∈ STi and another −i strategy of playing ati in period t and reverting to rBi afterwards, provided that opponent −i plays a bad strategy sB−i . Now let (1)

πBi (hT−i ) :=

T 1 X t−1 B t−1 t t δ θ (h−i , ω , ai ). δT t=1

Then the transfer (1), when added at the end of the block, guarantees that player i is indifferent across all her strategies in T-period stochastic game. In stochastic games, altering current actions affects not only the distribution of private signals, but also the distribution of future states. Transfer θB accounts for that effect. If, for example, a change in player i’s action from what rBi prescribes lowers her continuation payoff through a change in the distribution of future states, then the transfer compensates her so that she keeps indifferent among all actions. Letting the transfer θB depend on the realized state enables us to keep the indifference, which is how we adapt the construction of HO to deal with stochastic games. Similarly, define (2)

T πG i (h−i ) :=

T 1 X t−1 G t−1 t t δ θ (h−i , ω , ai ), δT t=1

where θG is defined7 so that it adjusts the continuation payoffs for strategies sBi , sG and other i T strategies si in Si to make sure that, when the transfer (2) is added at the end of the block, player 7

We omit the details, which are more complicated than those of θB . See the appendix in Aiba (2013).

–10–

i is indifferent between sBi and sG , and prefers them to all other strategies in T-period stochastic i game, provided that opponent −i plays good strategy sG . By adjusting the transfers a little further −i and enlarging, if necessary, [v1 , v1 ] × [v2 , v2 ], we may assume without loss of generality that for all ω ∈ Ω, UiA (ω, si , sG −i , πi ; δ) = vi

for all si ∈ {sBi , sG i },

and

UiA (ω, si , sB−i , πi ; δ) = vi

for all si ∈ STi .

Note that player −i determines player i’s payoff in the auxiliary scenario, depending on whether she plays sB−i or sG . −i Given the target payoff v, the equilibrium strategies are described as follows: in the first Tperiod block player −i picks a strategy sG with probability q and strategy sB−i otherwise, where −i q ∈ [0, 1] solves vi = qvi + (1 − q)vi . After initial randomization, player −i sticks to the resulting strategy sG or sB−i throughout the block. If player −i plays sB−i and observes history hT−i , then the −i new target payoff from the next block on becomes v0 = vi + (1 − δ)πBi (hT−i ). If player −i plays sG and −i G T T 0 observes history h−i , then the new target payoff becomes v = vi + (1 − δ)πi (h−i ). Now in the second T-period block player −i randomizes two strategies with probability q0 such that v0i = q0 vi +(1−q0 )vi , and so on.8 Notice that if players follow this strategy, player i’s total average payoffs when player respectively are, for si ∈ {sBi , sG }, −i plays sB−i and sG i −i h i (1 − δT )UiT (ω, si , sB−i ; δ) + δT vi + (1 − δ)πBi (hT−i ) = vi , and i h G T T (1 − δT )UiT (ω, si , sG −i ; δ) + δ vi + +(1 − δ)πi (h−i ) = vi , respectively. Thus, with initial randomization probability q, the target payoff v is actually achieved. Now we claim that these strategies form a sequential equilibrium. From the transfer scheme (1), we can see that for each private history in each block, player i is indifferent among all actions when player −i plays sB−i . On the other hand, from the transfer scheme (2), for each private history } when player −i plays sG . in each block, it is optimal for player i to follow any strategy in {sBi , sG −i i G B Hence, given player −i’s strategy, any strategy that player i plays si ∈ {si , si } in each block is a best reply. So the strategies described constitute a sequential equilibrium.

3.4 Imperfect Private Monitoring In this subsection, we consider the private almost-perfect monitoring, constructing block equilibria similar to those from the perfect monitoring case. We would like to design the transfers that give the desired incentives. While ht−i = hti in the perfect monitoring case, players may form non-trivial beliefs Pr{ht−1 | ht−1 , ωt } in the private monitoring case. Thus, we construct transfers (1) −i i and (2) such that in the block game, player i is indifferent between {sBi , sG } and it is sequentially i t−1 t−1 t optimal to follow them given his beliefs Pr{h−i | hi , ω } if player −i uses either of two strategies 8

The transfers shall be defined so that v0 ∈ [vi , vi ].

–11–

, ωt ) ∈ Hit−1 × Ω, t ≥ 1 that cannot be sB−i , sG .9 For that purpose, let HiE be the set of all histories (ht−1 i −i reached by any strategy profile in {sB1 , sG } × {sB2 , sG } when there is no error ( = 0), i.e., monitoring 2 1 E T−1 R is perfect. Hi := (Hi × Ω) \ Hi is the set of histories that can be reached by some strategy profile in {sB1 , sG } × {sB2 , sG }. We call the former the set of erroneous histories and the latter the set of regular 2 1 histories. Note that in the private monitoring case ( > 0) an erroneous history can be reached with positive probability. We denote by si | HiR and si | HiE strategies on regular histories and erroneous histories respectively. , wt ) ∈ HiR , it is not difficult to give the desired incentives using the For regular history (ht−1 i , wt } → 1 as  → 0. However, if | ht−1 = ht−1 transfers as in the perfect monitoring case since Pr{ht−1 i i −i player i reaches an erroneous history, he knows that someone has observed an erroneous signal. , ωt−1 ) ∈ HiR , it cannot happen on a is such that, given (ht−2 For example, suppose that signal σt−1 i i regular history. Then player i thinks that either player i might have observed an erroneous signal in the current period or opponent −i might have observed an erroneous signal in the previous period even though opponent −i followed the prescribed strategy in both cases. Which event is likely to occur depends on the likelihood on the monitoring probability m. Moreover, there is another case that is absent in repeated games since the public state variable may give additional information about players’ actions together with private signals.10 Suppose that ht−1 can be reached on a i t regular history but ω is such that it cannot happen on a regular history. Thus, ωt contradicts σt−1 i so that player i believes that signal σt−1 is erroneous. In this case player i thinks that opponent −i i observed an erroneous signal in the previous period and chose an action that is different from the one he should pick if he observed a correct signal. Again, which signal was likely to be observed and which action was likely to be chosen depends on the fine details of the transition probability p and the monitoring probability m, which makes it hard to specify optimal strategies on erroneous histories under generic assumptions. Hence, as in HO, we leave sBi | HiE and sG | HiE unspecified and i jointly determine transfers (πBi , πG ) and strategies (sBi | HiE , sG | HiE ) by the fixed point argument, still i i keeping incentives to follow sBi | HiR and sG | HiR on regular histories. i Now that the transfers (πBi , πG ) and block strategies (sBi , sG ) are determined, the equilibrium i i strategy is constructed as in the perfect monitoring case, which we leave to our working paper (Aiba (2013)). Moreover, in the private monitoring case it is important that transfer πBi makes sure that if player −i plays sB−i , player i is indifferent across all her actions, conditional on any history within the block. Because of this property, player i may assume, for the sake of computing her best replies, that her opponent is playing sG−i , independently of her own private history. Otherwise player i would have to form her belief about −i’s current strategy and hence i’s strategy would have to depend on her own history in the entire stochastic game before the current block, which makes the analysis untractable. 
10 Because of this, one might be concerned that if ωt is strongly correlated with σt−1 and ωt suggests that σt−1 is −i −i t erroneous, then player i may revise his beliefs about σt−1 based on ω . However, this is not an issue here because of the −i conditional independence between the transition probability and the monitoring probability. 9

–12–

4. Conclusion We prove the folk theorem for stochastic games with private almost-perfect monitoring when the asymptotic state independence holds. In particular, our result shows that players can cooperate in a broader class of economic environments than those described by repeated games. Moreover, our result establishes that the folk theorem proven by Dutta (1995) for perfect monitoring case is robust to perturbation from perfect monitoring toward private monitoring, and that the sophisticated construction of Horner and Olszewski (2006) for repeated games can be adapted to ¨ stochastic games, reinforcing our conviction that much knowledge and intuition about repeated games carries over to the analysis of stochastic games.

References [1] Aiba, K. (2013): ”A Folk Theorem for Stochastic Games with Private Almost Perfect Monitoring”, mimeo. [2] Besanko, D., U. Doraszelski, Y. Kryukov, and M. Satterthwaite (2010): ”Learning-by-Doing, Organizational Forgetting, and Industry Dynamics”, Econometrica, 78, 453-508. [3] Dutta, P. K. (1995): ”A Folk Theorem for Stochastic Games,” Journal of Economic Theory, 66, 1-32. [4] Ely, J., J. Horner, and W. Olszewski (2005): ”Belief-Free Equilibria in Repeated Games,” Econo¨ metrica, 73, 377-415. [5] Ely, J., and J. V¨alim¨aki (2002): ”A Robust Folk Theorem for the Prisoner’s Dilemma,” Journal of Economic Theory, 102, 84-105. [6] Fudenberg. D., and Y. Yamamoto (2011): ”The Folk Theorem for Irreducible Stochastic Games with Imperfect Public Monitoring,” Journal of Economic Theory, 146, 1664-1683. [7] Horner, J., and W. Olszewski (2006): ”The Folk Theorem for Games with Private Almost-perfect ¨ Monitoring,” Econometrica, 74, 1499-1544. [8] Horner, J., S. Sugaya, S. Takahashi, and N. Vieille (2011): ”Recursive Methods in Discounted ¨ Stochastic Games: an Algorithm for δ → 1 and a Folk Theorem,” Econometrica, 79, 1277-1318. [9] Neyman, A., and S. Sorin (eds) (2003): ”Stochastic Games and Applications,” NATO ASI series. [10] Piccione, M. (2002): ”The Repeated Prisoner’s Dilemma with Imperfect Private Monitoring,” Journal of Economic Theory, 102, 70-83. [11] Sobel, M. (1971): ”Noncooperative Stochastic Games,” The Annals of Mathematical Statistics, 42, 1930-1935. [12] Stigler, G.J. (1964): ”A Theory of Oligopoly,” Journal of Political Economy, 72, 44-61.

–13–

A. Appendix : Two Players We introduce further notations before describing the details of the proof. First, we write and the ∈ ∆Ai for the mixed action prescribed by strategy si when the past history is ht−1 i t t−1 t t−1 current state is ω . The probability assigned to any action ai by si [hi , ω ] is denoted si [hi , ωt ](ai ). We also denote by si [ωt ] the mixed action prescribed by player i’s Markov strategy in state ωt .

, ωt ] si [ht−1 i

A.1 Perfect Monitoring In this subsection we consider the perfect monitoring case. As in the body text, we first take payoff vectors {wGG , wGB , wBG , wBB } surrounding the target payoff v. Then, in order to apply HO’s technique to our stochastic games, we construct L-period Markov strategies approximating payoff matrix {wGG , wGB , wBG , wBB } and minmax payoffs v∗ . Lemma A.1 (Lemma 3.1). For any η > 0, there exist L and δ such that for every L ≥ L and δ ≥ δ, the following holds: (i) For each XY ∈ {G, B}2 , there exists a L-period pure Markov strategy gXY such that UiL (ω, gXY ; δ) − wXY < η, i

∀ω ∈ Ω ∀i,

(ii) For each player i, there exists −i’s L-period strategy gi−i such that for any i’s L-period strategy si , UiL (ω, si , gi−i ; δ) < v∗i + η,

∀ω ∈ Ω.

Proof. The proof follows from some results in Dutta (1995). By Dutta (1995, Lemma 1), wXY ∈ V P can be expressed as a convex combination wXY = kj=1 λ j w j , where w j is the limit-average payoff of some pure Markov strategy q j . Consider the following strategy : play the strategy q1 for L1 P periods, followed by the strategy q2 for L2 periods, and so on. Letting L = kj=1 L j , choose L j such that L j /L is arbitrary close to λ j and  Lj  X   E  1 ui ωt , q j [ωt ]   L j t=1

 η  1 j  ω = ω − w < ,   i 3

for all ω ∈ Ω and player i. Then there exists δ < 1 such that for any δ > δ, any j ∈ {1, . . . , k}, and L j j j any player i, Ui (ω, q ; δ) − wi < 2η/3, which implies (i). By Dutta (1995, Proposition 3), there exists L > 0 and −i’s strategy gi−i such that for any ∞ ≥ L0 ≥ L and any i’s strategy si ,  L0    1 X t i t−1 t  E  0 ui ωt , si [ht−1 , ω ], g [h , ω ] i −i −i L t=1

  1 ∗ η  ω = ω < vi + , 2

–14–

∀ω ∈ Ω.

Pick such L0 . Now if L defined in the previous paragraph is smaller than L0 , then readjust L to ensure L = L0 ; otherwise pick L0 equal to L. Finally, take δ > δ such that UiL (ω, si , gi−i ; δ) < v∗i + η holds for all ω ∈ Ω.  In the above lemma, take η such that v∗i + η < vi and   Bη [v1 , v1 ] × [v2 , v2 ] ⊂ interior of co{wGG , wGB , wBG , wBB }, where Bη (C) means the η-neighborhood of the set C. And then given this η, take L ≥ L and δ ≥ δ satisfying Lemma A.1. Hereafter, we omit δ from the notation Ui (· , · ; δ), UiT (· , · ; δ) and UiA (· , · ; δ). A.1.1 Block games and the set of strategies Si Now we divide the entire stochastic game into consecutive T-period block games, where T − 1 is a multiple of L. In the following, we specify good startegies sG and bad strategies sBi formally. i First, we define the set of strategies Si ; si ∈ STi is in Si ⊂ STi if and only if M2 M1 t [ωt ] si [ht−1 i , ω ] = gi

for t ≥ 1,

if hit−1 = (ω1 , a1 , . . . , ωt−1 , at−1 ) such that a1 ∈ Mi × G and aτ = gM2 M1 [ωτ ] for all τ ∈ {2, . . . , t − 1}. That is, if opponent −i sends a good message in the communication round, then player i plays a pure 2 M1 Markov strategy gM as long as both played gM2 M1 in the previous periods. If opponent sends a i bad message or someone deviated from gM2 M1 before, then player i may play any action. For each (hit−1 , wt ) ∈ Hit−1 × Ω, define n o t t−1 t Ai (ht−1 , w ) = a ∈ A : ∃s ∈ S , s [h , w ](a ) > 0 . i i i i i i i i Since gM2 M1 is a pure Markov strategy, Ai (ht−1 , wt ), the set of actions prescribed by Si , is either Ai or i o n MM a singleton gi 2 1 [ωt ] . Finally, define for ρ > 0 n o ρ t t−1 t t−1 t Si = si ∈ Si : ∀(ht−1 i , w ), ∀ai ∈ Ai (hi , w ), si [hi , w ](ai ) ≥ ρ . ρ

So a strategy in Si prescribes completely mixing over actions prescribed by Si at each history. Now we provide the precise definitions of regular histories and erroneous histories. HiE,t−1/2 is the set of histories (ht−1 , ωt ) ∈ Hit−1 × Ω that are off the equilibrium path for some (actually every) strategy i ρ ρ profile in S1 × S2 . HiR,t−1/2 := (Hit−1 × Ω) \ HiE,t−1/2 is the set of histories (ht−1 , ωt ) ∈ Hit−1 × Ω that is i ρ ρ reached with positive probability under some (actually every) strategy profile in S1 × S2 . Finally, S S let HiR := t≤T HiR,t−1/2 and HiE := t≤T HiE,t−1/2 . We call the former the set of regular histories and the latter the set of erroneous histories.

–15–

A.1.2 The strategies sG , sBi ∈ Si and T i g

We first define a good strategy sG and a bad strategy sBi in two steps. First, define si ∈ Si as i g

si [h0i , ω1 ] ∈ ∆G, g

M2 M1 t si [ht−1 [ωt ] i , ω ] = gi

for t ≥ 1,

if hit−1 = (ω1 , a1 , . . . , ωt−1 , at−1 ) such that a1 ∈ Mi × M−i and aτ = gM2 M1 [ωτ ] for all τ ∈ {2, . . . , t − 1}. On all other histories define it arbitrarily. Similarly, define sbi ∈ Si as sbi [h0i , ω1 ] ∈ ∆B, M2 M1 t sbi [ht−1 [ωt ] i , ω ] = gi

for t ≥ 1,

if hit−1 = (ω1 , a1 , . . . , ωt−1 , at−1 ) such that a1 ∈ Mi × M−i and aτ = gM2 M1 [ωτ ] for all τ ∈ {2, . . . , t − 1}. Moreover, define for k ≤ t, e hki as the recent k length history of hti , and then let t −i ek−1 t sbi [ht−1 i , ω ] = gi [hi , ω ]

for t = 1 + nL + k, 1 ≤ k ≤ L, n ∈ Z+ ,

2 M1 if hit−1 = (ω1 , a1 , . . . , ωt−1 , at−1 ) such that a1 ∈ B × M−i and aτ−i , gM [ωτ ] for some τ ∈ {2, . . . , t − 1}. −i So, this strategy prescribes player i sending a bad message, and if player −i unilaterally deviates from gM2 M1 , then player i plays the L-period minmax strategy g−i repeatedly in the remaining i −i period of the block game. Note the minmax strategy gi depends on the recent (k − 1)-length history e hk−1 i . On all other histories, define it arbitrarily. Observe that if player −i uses strategy sb−i , no matter what player i does in the block game, she earns strictly less than vi in every L-period interval of the main round but at most one, the interval in which player i deviates from gM2 M1 . Hence, if T is sufficiently large, UiT (ω, si , sb−i ) in the block game cannot exceed vi for any si ∈ STi and any initial state ω. Similarly, if T is sufficiently large, g UiT (ω, si , s−i ) cannot fall below vi for any si ∈ Si and any initial state ω. Take such T. Now pick ρ g small enough ρ > 0 and perturb si , sbi slightly to get a pair of strategies sG , sBi in Si such that for all i ω ∈ Ω,

T B min UiT (ω, si , sG −i ) > vi > vi > vi > max Ui (ω, si , s−i ). si ∈Si

si ∈STi

A.1.3 The result We first construct transfers πBi , πG that give the desired incentives (see (1) and (2).). Define i G ri ∈ Si as a strategy such that, for every (ht−1 , ωt ) ∈ Hit−1 × Ω, the continuation strategy rG | t−1 i i (hi ,ωt ) G yields the lowest continuation payoff against s−i among all strategies si ∈ Si . Similarly, define rBi ∈ STi as a strategy such that, for every (ht−1 , ωt ) ∈ Hit−1 × Ω, the continuation strategy rBi |(ht−1 ,ωt ) i i yields the highest continuation payoff against sB−i among all strategies si ∈ STi .

–16–

, ωt ) , ωt )) be a normalized discounted continuation payoff given history (ht−1 Let UiT (· , · | (ht−1 i i , ωt )) be its unnormalized version. Then, we define πB as in T-period block game. Let WiT (· , · | (ht−1 i t−1 × Ω, , ωt ) ∈ H−i in (1) where for each t ∈ {1, . . . , T} and (ht−1 −i     t t T 1 B B t−1 t T 1 B t B t−1 t θB (ht−1 , ω , a ) := W ω , r , s (h , ω ) − W ω , r /a , s (h , ω ) , −i i i i −i i i i i −i i where rBi /ati is a strategy playing ati in the current period t, followed by reversion to rBi from the next up to the ordering equals ht−1 period t + 1 on. Note that since we assume perfect monitoring, ht−1 −i i B , ωt , ati ) ≥ 0 of signals and actions, so the above equation is well defined. By definition of ri , θB (ht−1 −i t−1 , ωt , at ). We can see that the transfer (1) gives the desired incentives, i.e., when added for any (h−i i at the end of the block, it guarantees that player i is indifferent across all her strategies in T-period stochastic game. t−1 × Ω, define , ωt ) ∈ H−i Similarly, for each t ∈ {1, . . . , T} and (ht−1 −i     t t T 1 G G t−1 t T 1 G t G t−1 t , a ω , r , s (h ) − W ω , r /a (h ) . θ(ht−1 , ω ) := W , ω , s , ω −i i i i i i −i i i −i i By definition of rG , we have θ(ht−1 , ωt , ati ) ≤ 0 whenever ati ∈ Ai (ht−1 , ωt ). Define θG (ht−1 , ωt , ati ) = i −i −i i min{0, θ(ht−1 , ωt , ati )} ≤ 0. Given θG , we define transfer πG as in (2). Again, we can see that the −i i transfer (2) gives the desired incentive, i.e., when added at the end of the block, it guarantees that player i is indifferent across all her strategies within Si and prefers them to all other strategies in T-period stochastic game. By enlarging, if necessary, [v1 , v1 ] × [v2 , v2 ], we may assume without loss of generality that G min UiT (ω, rG i , s−i ) = vi ω∈Ω

and

max UiT (ω, rBi , sB−i ) = vi . ω∈Ω

Since T is fixed, we can take δ close enough to 1 so that for any ω and hT−i ,  1 − δT  T B B v − U (ω, r , s ) i i i −i < vi − vi , δT  1 − δT  T T G G (1 − δ)πG (h ) + v − U (ω, r , s ) i −i i i i −i > vi − vi . δT (1 − δ)πBi (hT−i ) +

Proof of Theorem 2.7 (n = 2, Perfect Monitoring): We describe equilibrium strategies by an automata, which revises automaton states and actions at the beginning of every block. An action of the automaton is the T-period stochastic game strategy to be used by the player in the block. (i) Automaton state space: The automaton state (u, ω) of player −i’s automaton is an element of [vi , vi ] × Ω, a pair of player i’s continuation payoff and the initial state in the T-period stochastic game. (ii) Initial automaton state: Player −i starts in automaton state (u, ω) = (vi , ω1 ), a pair of the payoff to be achieved and the initial state that was drawn. (iii) Actions: In automaton state (u, ω), player −i picks strategy sG with probability q and −i –17–

strategy sB−i otherwise, where q ∈ [0, 1] solves u = qvi + (1 − q)vi . Thus, in each block, player −i performs an initial randomization and then sticks to the resulting strategy sG or sB−i −i throughout the block. (iv) Transitions: If the automaton state is (u, ω), the action of the automaton is sB−i and player −i’s history is hT−i , then, at the end of the block, player −i transits to automaton state vi + (1 −

δ)πBi (hT−i )

!  1 − δT  T B B 0 + vi − Ui (ω, ri , s−i ) , ω , δT

where ω0 is the state drawn at the end of the block game according to the probability distribution p. Note that the first term is in [vi , vi ]. Similarly, if the automaton state is (u, ω), the action of the automaton is sG and player −i’s history is hT−i , then, at the end of the block, −i player −i transits to automaton state vi + (1 −

T δ)πG i (h−i )

!  1 − δT  T G G 0 + vi − Ui (ω, ri , s−i ) , ω . δT

Note again that the first term is in [vi , vi ], so the automaton is well defined11 . We claim that these strategies form a sequential equilibrium. From the transfer scheme (1), we can see that in each block game, for any (ht−1 , ωt ), t ≥ 1, player i is indifferent among all actions i when player −i plays sB−i . On the other hand, from the transfer scheme (2), we can see that in each block game, for any (ht−1 , ωt ), t ≥ 1, it is optimal for player i to choose an action from Ai (ht−1 , ωt ) i i when player −i plays sG . Hence, it follows from the one-shot deviation property that, given player −i −i’s strategy, any strategy that player i plays si ∈ Si in each block game is a best reply. So the strategies described by the automaton constitute a sequential equilibrium. Note also that player i’s payoff is equal to the weighted average of the payoff of playing rG against sG and the payoff of i −i playing rBi against sB−i , with respective weights q and 1 − q. The sum of the average payoff within the block and the continuation payoff from playing rG against sG is i −i (1 − δ

T

G )UiT (ω, rG i , s−i )

# "  1 − δT  T G G vi − Ui (ω, ri , s−i ) = vi . + δ vi + δT T

Similarly, the sum of the average payoff within the block and the continuation payoff from playing rBi against sB−i is (1 − δ

T

)UiT (ω, rBi , sB−i )

" #  1 − δT  T B B vi − Ui (ω, ri , s−i ) = vi . + δ vi + δT T

Thus, at the beginning of a block, player i’s payoff when player −i’s state is (u, ω) is qvi +(1−q)vi = u, 11 Actually, we can make only i’s continuation payoff u the automaton state because the information of ω is included in hT−i . However, we adopt the two-dimensional automaton state (u, ω) in order to make explicit that we are showing v ∈ E(ω, δ, ε).

–18–

as desired 

A.2 Imperfect Private Monitoring In this subsection, we consider the almost-perfect monitoring, constructing block equilibria similar to those from the perfect monitoring case. Note that under the canonical signal structure, ρ the definitions of Si and Si are still valid, and the definitions of regular and erroneous histories do not change, though now erroneous histories are reached with positive probability. As argued by HO, it is important that player i may assume, for the sake of computing her best replies, that her opponent is playing sG , independently of her own private history, because −i otherwise player i would have to form her belief about −i’s current strategy and hence i’s strategy would have to depend on her own history in the entire stochastic game before the current block, which makes the analysis untractable. This difficulty can be overcome, as in the perfect monitoring case, by defining transfers πBi such that if player −i plays sB−i , player i is indifferent across all her actions, conditional on any history within the block. But again transfers will have to depend on realized state to achieve this required property. Given (ω, s−i , πi ), let Bi (ω, s−i , πi ) be the set of player i’s best responses in the auxiliary scenario with initial state ω. Lemma A.2. For every strategy profile s | HE , there exists ε > 0 such that, for all ε < ε, there exists a T → R such that, for all ω ∈ Ω, nonnegative transfer πBi : H−i + (3)

STi = Bi (ω, sB−i , πBi ),

R = sB | H R and sB | H E = s | H E , and for every s ∈ B (ω, sB , πB ), where sB−i | H−i −i i i −i −i −i −i −i −i i

(4)

lim UiA (ω, si , sB−i , πBi ) = max UiT (ω, s˜i , sB−i ),

ε→0

s˜i ∈STi

where UiT (· , · ) is player i’s discounted average payoff under perfect monitoring. Proof. Given a history hT−i , let (ht−1 , ωt , σt−i ) denote the truncation of hT−i to ht−1 and the observed −i −i t t state ω ∈ Ω and the private signal σ−i ∈ Σ−i = Ai obtained by player −i in the period t. The transfer is defined as  T  X  1 t t   , πBi (hT−i ) = T  δt−1 θ(ht−1 −i , ω , σ−i )  δ t=1 for some function θ(· , · , · ) to be specified by backward induction. T−1 , ωT ], define transfers θ(hT−1 , ωT , σT ) First, start with t = T. Given (hT−1 , ωT ) and αT−i := sB−i [h−i −i −i −i

–19–

such that for all aTi ∈ Ai , ui (ωT , aTi , αT−i ) +

X

T−1 m−i (σT−i | ωT , (aTi , αT−i ))θ(h−i , ωT , σT−i ) = ui (ωT , aTi , αT−i ),

σT−i ∈Σ−i

where aTi := argmaxaT ∈Ai gi (ωT , aTi , αT−i ). To see that this system has a solution, consider a row vector i {m−i (σT−i | ωT , (aTi , αT−i ))}σT ∈Σ−i for each aTi ∈ Ai , and construct a matrix DT by stacking these row −i T−1 , ωT , σT ) , ωT , σT−i )}σT ∈Σ−i the column vector of unknowns θ(h−i vectors. Moreover, let θT := {θ(hT−1 −i −i −i and uT := {ui (ωT , aTi , α−i ) − ui (ωT , aTi , αT−i )}aT ∈Ai the column vector of the payoff difference. Then as i ε → 0, DT tends to the identity matrix, so the above system, which can be rewritten as DT θT = uT , has a solution when the monitoring is almost perfect. Next, consider period t < T. Suppose that all transfers θ(hτ−i , ωτ+1 , στ+1 ) for τ ≥ t are defined, −i , ωt ) and so that player i is indifferent across all his strategies from t + 1 on. Then given (ht−1 −i αt−i := sB−i [ht−1 , ωt ], define transfers θ(ht−1 , ωt , σt−i ) such that for all ati ∈ Ai , −i −i ui (ωt , ati , αt−i ) +

X

h i t t t t+1 t−1 t t t m−i (σt−i | ωt , (ati , αt−i ))θ(ht−1 , σ ) h , (a , ω ) + δE W (h , ω , ω , α ) i −i −i −i −i i −i

σt−i ∈Σ−i

h i t t t , (a , ω , α ) = ui (ωt , ati , αt−i ) + δE Wi (ht−i , ωt+1 ) ht−1 −i i −i , where Wi (ht−i , ωt+1 ) is player i’s auxiliary-scenario unnormalized continuation payoff from period t + 1 on, defined as  T h i  X s s , σ δs−t−1 ui (ωs , as ) + θ(hs−1 , ω ) E  −i −i s=t+1

  t t+1   h−i , ω  ,

n io h t , (at , αt ) . Note that W (ht , ωt+1 ) and ati := argmaxat ∈Ai ui (ωt , ati , αt−i ) + δE Wi (ht−i , ωt+1 ) ht−1 , ω i −i −i i −i i does not depend on her strategy from period t + 1 on by the induction hypothesis. To see that this system has a solution, consider a row vector {m−i (σt−i | ωt , (ati , αt−i ))}σt ∈Σ−i for each ati ∈ Ai , −i and construct a matrix Dt by stacking these row vectors. Moreover, define column vectors θt := t−1 , ωt , σt )} t {θ(h−i and −i σ ∈Σ−i −i

( h i t t t u : = ui (ωt , ati , αt−i ) + δE Wi (ht−i , ωt+1 ) ht−1 , ω , (a , α ) −i i −i t

− ui (ω

t

, ati , αt−i )

) h i t t+1 t−1 t t t − δE Wi (h−i , ω ) h−i , ω , (ai , α−i )

.

ati ∈Ai

Then, by the same argument as the above, the system, which can be rewritten as Dt θt = ut , has a solution when the monitoring is almost perfect. In this way, (3) is achieved when reaching (h0−i , ω1 ) in the backward induction. Take sufficiently small ε > 0 such that for all ε < ε, the system Dt θt = ut has a solution for all t ∈ {1, . . . , T}. Moreover, –20–

since each Dt tends to the identity matrix and ut ≥ 0, each θt must be nonnegative as ε → 0. Hence , ωt , σt−i ) nonnegative by adding to all of them the for each ε < ε, we can make all transfers θ(ht−1 −i positive constant t t min θ(ht−1 , ω , σ ) −i −i , ,ωt ,σt−i ht−1 −i which tends to 0 as ε → 0 since θ(ht−1 , ωt , ati ) tends to 0. −i Finally consider player i’s strategy rBi such that for each (ht−1 , ωt ) ∈ HiT−1 × Ω, rBi [ht−1 , ωt ] = ati , i i i h , ωt , (ati , αt−i ) over ati ∈ Ai , provided that where ati maximizes ui (ωt , ati , αt−i ) + δE Wi (ht−i , ωt+1 ) ht−1 −i t−1 equals ht−1 up to the ordering of signals and actions. Then as the monitoring becomes perfect, h−i i (a) by the construction from the previous paragraph, πBi (hT−i ) tends to 0 if players play rBi and sB−i respectively, and (b) by the definition of rBi , the strategy rBi approaches to the strategy rBi considered in the perfect monitoring case. So (a) and (b) establish (4).  Given ω ∈ Ω, HiE ⊂ HiT , (si , s−i ) ∈ STi ×ST−i , and πi , Bi (ω, s−i , πi | si ) denotes the set of strategies that maximizes i’s auxiliary-scenario payoff (with initial state ω) against (s−i , πi ) among all strategies si ∈ STi such that si | HiE = si | HiE . Lemma A.3. For every strategy profile s | HE , there exists ε > 0 such that for ε < ε, there exists a T → R such that for every ω ∈ Ω, nonpositive transfer πG : H−i − i (5)

o n G si ∈ STi : si | HiR = s˜i | HiR for some s˜i ∈ Si and si | HiE = si | HiE ⊂ Bi (ω, sG −i , πi | si ),

R = sG | H R and sG | H E = s | H E , and for every s ∈ B (ω, sG , πG | s ), where sG | H−i −i i i i −i −i −i −i i −i −i −i

(6)

G T ˜ G lim UiA (ω, si , sG −i , πi ) = min Ui (ω, si , s−i ),

ε→0

s˜i ∈Si

where UiT (· , · ) is player i’s discounted average payoff under perfect monitoring; furthermore, πG depends i continuously on s and is bounded away from −∞ by π. Proof. Again the transfer has the form T πG i (h−i ) =

T 1 X t−1 t−1 t t δ θ(h−i , ω , σ−i ), δT t=1

for some function θ(· , · , · ) to be specified by backward induction. Let rG ∈ STi a strategy such that for i R,t−1/2 every (hit−1 , ωt ) ∈ Hi , rG | t−1 yields the lowest payoff against sG in the T-period stochastic i (hi ,ωt ) −i T R R game, among all strategies of {si ∈ Si : si | Hi = s˜i | Hi for some s˜i ∈ Si and si | HiE = si | HiE }. To achieve (5), it is enough to pick θ(ht−1 , ωt , σt−i ) that satisfies the following properties12 , −i 12

Here we do not need to consider the sequential rationality at the erroneous histories since we fix the strategy on the erroneous histories as si . The sequential rationality on them will be considered in a fixed point argument later.

–21–

, ωt ) = Ai , player i is indifferent under the auxiliary , ωt ) ∈ HiR,t−1/2 such that Ai (ht−1 For (ht−1 i i scenario between playing all actions ai ∈ Ai , each followed by switching to rG from period i t + 1 on. , ωt ) = {gGG , ωt ) ∈ HiR,t−1/2 such that Ai (ht−1 [ωt ]} or {gBG (ii) For (ht−1 [ωt ]} if i = 2, or {gGB [ωt ]} i i i i i , ωt ) (weakly) if i = 1, the payoff of player i under the auxiliary scenario to playing Ai (ht−1 i exceeds the payoff playing any other action, both followed by switching to rG from period i t + 1 on. , ωT ) ∈ HiR,T−1/2 and aTi ∈ Ai , we consider three cases; Start with t = T. For (hT−1 i , ωT ) = Ai , (a) Ai (hT−1 i , ωT ) = {gGG (b) Ai (hT−1 [ωT ]} or {gBG [ωT ]} if i = 2, or {gGB [ωT ]} if i = 1, i h  i i ii h  i   T−1 , ωT ], sG [hT−1 , ωT ] (hT−1 , ωT ) ≤ E u ωT , aT , sG [hT−1 , ωT ] (hT−1 , ωT ) , and E ui ωT , rG [h i i i i i i −i −i −i −i (c) otherwise. To achieve (i) and (ii), it is enough to define transfers θ(hT−1 , ωT , σT−i ) such that for all (hiT−1 , ωT ) ∈ −i HiR,T−1/2 and aTi ∈ Ai , (i)

 i h  T−1 T (hT−1 , ωT ) E ui ωT , aTi , sG i −i [h−i , ω ] X X   T−1 T T T T G T−1 T T−1 Pr(hT−1 | h , ω )m σ ω , (a , + s [h ]) , ω θ(h−i , ωT , σT−i ) −i −i i −i i −i −i {z } T−1 ∈HT−1 σT ∈Σ | h−i −i −i −i (∗)  i h  T−1 T T−1 (hiT−1 , ωT ) , = E ui ωT , rG , ωT ], sG −i [h−i , ω ] i [hi if (a) or (b) holds, and X

X

  T−1 T T T T G T−1 T T−1 Pr(hT−1 | h , ω )m σ ω , (a , s [h , ω ]) θ(h−i , ωT , σT−i ) = 0, −i −i i −i i −i −i {z } T−1 σT ∈Σ | hT−1 ∈H−i −i −i −i (∗) if (c) holds, where Pr(hT−1 | hT−1 , ωT ) is the probability assigned by player i, conditional on history −i i hiT−1 . To see that this system has a solution, for each (hiT−1 , ωT ) ∈ HiR,T−1/2 and aTi ∈ Ai , con T−1 × Ω × Σ row vector that consists of the probabilities (∗) in the above equasider an 1 × H−i −i T−1 × {ωT } × Σ tion for the columns corresponding to H−i −i and zeros otherwise. Construct a R,T−1/2 T−1 T Hi × Ai × H−i × Ω × Σ−i matrix D by stacking these row vectors. Moreover, let θT := T−1 , ωT , σ )} T−1 T T {θ(h−i −i (hT−1 ,ωT ,σT )∈HT−1 ×Ω×Σ−i the column vector of unknowns θ(h−i , ω , σ−i ) and define −i −in −i o the column vector uT := u(hT−1 , ωT , aTi ) T−1 T T R,T−1/2 as i (hi

(7)

,ω ,ai )∈Hi

u(hT−1 , ωT , aTi ) = i

–22–

×Ai

 i h  i   h   T , rG [hT−1 , ωT ], sG [hT−1 , ωT ] (hT−1 , ωT ) − E u ωT , aT , sG [hT−1 , ωT ] (hT−1 , ωT )  E u ω  i i  i i i i −i −i −i −i i     if (a) or (b) holds,       0 if (c) holds. T−1 | hT−1 , ωT ) , ωT ) ∈ HiR,T−1/2 and small enough ε/ρ, the probability Pr(h−i By Bayes’ theorem, for (hT−1 i i R,T−1 T−1 up to the ordering of actions and signals.13 Hence, ∈ H that equals h is close to 1 for hT−1 i −i −i as ε → 0 and ε/ρ → 0, DT has full row rank so that DT θT = uT has a unique solution when we set T−1 , ωT , σT ) = 0 for all (hT−1 , ωT , σT ) ∈ H E,T−1/2 × Σ . θ(h−i −i −i −i −i −i τ Next consider period t < T, where all transfers θ(h−i , ωτ+1 , στ+1 ), ∀τ ≥ t are defined. Fix ωt ∈ Ω. −i Given hit−1 and sG , player i’s auxiliary-scenario unnormalized continuation payoff from period t −i on is  T  T X X t−1 t G  s−t s s t−1 t t s−t s−1 s s  E  δ ui (ω , a ) + θ(h−i , ω , σ−i ) + δ θ(h−i , ω , σ−i ) (hi , ω ), s−i  , s=t

s=t+1

which of course depends on player i’s continuation strategy. For notational convenience, given player i’s strategy si , let ViT (ht−1 , ωt , si ) denote the conditional expected value of the first term and i WiT (hit−1 , ωt , si ) the conditional expected value of the third term. Then for (ht−1 , ωt ) ∈ HiR,t−1/2 and i ati ∈ Ai we consider three cases; (a0 ) Ai (ht−1 , ωt ) = Ai , i (b0 ) Ai (ht−1 , ωt ) = {gGG [ωt ]} or {gBG [ωt ]} if i = 2, or {gGB [ωt ]} if i = 1, i i i i and ViT (ht−1 , ωt , rG ) ≤ ViT (ht−1 , ωt , rG /ati ), i i i i (c0 ) otherwise. To achieve (i) and (ii), it is enough to define transfers θ(ht−1 , ωt , σt−i ) such that for all (ht−1 , ωt ) ∈ −i i HiR,t−1/2 and ati ∈ Ai , t G t T t−1 t G t ViT (ht−1 i , ω , ri /ai ) + Wi (hi , ω , ri /ai ) X X   t t t t−1 t t t t G t−1 σ ω , (a , s [h , ω ]) θ(ht−1 + Pr(ht−1 | h , ω )m −i −i , ω , σ−i ) −i i −i −i −i i t−1 σt ∈Σ ht−1 ∈H−i −i −i −i

t G T t−1 t G = ViT (ht−1 i , ω , ri ) + Wi (hi , ω , ri ), 13

By Bayes’ rule, T−1 Pr(hT−1 , ωT ) = P −i | hi

p(ωT | ωT−1 , aT−1 , aT−1 )m(σT−1 , σT−1 | ωT−1 , aT−1 , aT−1 )sG−i [hT−2 , ωT−1 ](aT−1 )Pr(hT−2 | hT−2 , ωT−1 ) i −i i −i i −i −i −i −i i

e σ−i ,e a−i ,e hT−2 −i

T−2 T−1 ](e p(ωT | ωT−1 , aT−1 ,e a−i )m(σT−1 ,e σ−i | ωT−1 , aT−1 ,e a−i )sG−i [e hT−2 a−i )Pr(e h−i | hT−2 , ωT−1 ) −i , ω i i i i

Since we are focusing on regular histories, ωT occurs with positive probability under perfect monitoring. This and εperfect monitoring imply that as ε → 0, m(σiT−1 = aT−1 , σT−1 = aT−1 | ωT−1 , aT−1 , aT−1 ) remains positive while m((σT−1 , σT−1 ), −i −i i i −i i −i T−1 T−1 T−1 T−1 T−1 T−2 T−2 T−1 T−1 T−1 T (a−i , ai ) | ω , ai , a−i ) goes to zero. So by induction of Pr(h−i | hi , ω ), Pr(h−i | hi , ω ) converges to one for R,T−1 hT−1 ∈ H−i that equals hT−1 up to the ordering of actions and signals. −i i

–23–

.

if (a0 ) or (b0 ) holds, and t G t WiT (ht−1 i , ω , ri /ai ) X X   t−1 t t t t t t t G t−1 s + Pr(ht−1 [h | h , ω ]) θ(ht−1 , ω )m σ ω , (a , −i −i i −i i −i −i −i , ω , σ−i ) t−1 σt ∈Σ ht−1 ∈H−i −i −i −i

t G = WiT (ht−1 i , ω , ri ),

if (c0 ) holds, where rG /ati is a strategy playing ati in the current period t, followed by reversion to rG i i from the next period t + 1 on. To see that this system has a solution, similarly to the above, construct a HiR,t−1/2 × Ai × t−1 × Ω × Σ matrix Dt and the column vector θt := {θ(ht−1 , ωt , σt )} H−i t−1 ×Ω×Σ . Define −i ,ωt σ−i )∈H−i −i −i (ht−1 −i −i o n t t t t−1 as the column vector u := u(hi , ω , ai ) t−1 t t R,t−1/2 (hi ,ω ,ai )∈Hi

(8)

×Ai

t t u(ht−1 i , ω , ai ) =  n o  T (ht−1 , ωt , rG ) + W T (ht−1 , ωt , rG ) − V T (ht−1 , ωt , rG /at ) + W T (ht−1 , ωt , rG /at )  V   i i i i i i i i i i i i i i    0 0  if (a ) or (b ) holds,      W T (ht−1 , ωt , rG ) − W T (ht−1 , ωt , rG /at ) if (c0 ) holds.  i

i

i

i

i

i

i

Similarly to the above, since Dt has full row rank as ε → 0 and ε/ρ → 0, the system Dt θt = ut has E,t−1/2 a unique solution when we set θ(ht−1 , ωt , σt−i ) = 0 for all (ht−1 , ωt , σt−i ) ∈ H−i × Σ−i . −i −i In this way (5) is achieved. Take sufficiently small ε > 0 such that for all ε < ε, the system t [ht−1 , ωt ]) = 0, D θt = ut has a unique solution for all t ∈ {1, . . . , T}. Now from (7) and (8), u(ht−1 , ωt , rG i i i so it follows from the system Dt θt = ut that for every t and every (ht−1 , ωt ) ∈ HiR,t−1/2 , i t G t−1 t 0 = u(ht−1 i , ω , ri [hi , ω ]) h i t t t−1 t G t−1 t = E θ(ht−1 −i , ω , σ−i ) (hi , ω , ri [hi , ω ]) t G t−1 t → θ(ht−1 −i , ω , ri [hi , ω ])

as ε → 0,

t−1 in the last line equals ht−1 up to the ordering of actions and signals. Hence, for all t, where h−i i T (ht−1 , ωt , rG /at ) → 0 as ε → 0. Thus from (7), (8), and the definition WiT (hit−1 , ωt , rG ) → 0 and W i i i i i t must be nonpositive as ε → 0. Therefore, for each ε < ε, we can make all transfers of rG , u i t−1 , ωt , σt ) nonpositive by subtracting from all of them the constant θ(h−i −i t t max θ(ht−1 −i , ω , σ−i ).

ht−1 ,ωt ,σt−i −i

Also, since we argued that WiT (h0i , ω1 , rG ) → 0 as ε → 0 and by construction UiA (ω, rG , sG , πG )= i i −i i G G G G A Ui (ω, si , s−i , πi ) for every si ∈ Bi (ω, s−i , πi | si ), we can see that (6) holds. Obviously, the system Dt θt = ut continuously depends on s and so is its solution. –24–

Finally, to show the boundedness of πG , consider the perfect monitoring case, in which from i t t t t t t−1 the system D θ = u , θ(h−i , ω , σ−i ) is at least as small as   −B := − T max ui (ω, a) − min ui (ω, a) ω,a ω,a   X   s−1 s s s−1 s s  −  max θ(h−i , ω , σ−i ) − min θ(h−i , ω , σ−i ) . s−1 s s s−1 s s s>t

h−i ,ω ,σ−i

h−i ,ω ,σ−i

, ωt , σt−i ) exceeds −2B. By Thus, for small enough ε > 0, we can assume that all the values θ(ht−1 −i compactness of the set of strategies s and a backward induction argument14 , θ(· ) is bounded below from some θ independent of s, which establishes the result.  , and πBi Proof of Theorem 2.7 (n = 2, Imperfect Private Monitoring): First, we define sFP | HiE , πG i i for i = 1, 2. Consider a correspondence from the set of all strategies s1 | H1E , s2 | H2E and nonpositive transfers π1 , π2 into itself, defined by n o F(s1 | H1E , s2 | H2E , π1 , π2 ) = (s01 | H1E , s02 | H2E , π01 , π02 ) , where s0i | HiE is such that s0i ∈ Bi (ω, s−i , πi ) for every ω ∈ Ω, where s−i | HiR = sG | HiR and s−i | HiE = −i whose existence is established in Lemma A.3 s−i | HiE . The transfer π0i is defined as transfer πG i E E E E and the set of all transfers for s | H = (s1 | H1 , s2 | H2 ). Now both the set of strategies s−i | H−i {πi | π ≤ πi ≤ 0} can be identified with a compact, convex subset of a finite dimensional Euclidean space. Moreover, the best response correspondence Bi (ω, · , · ) is upper hemicontinuous and has nonempty and convex values. π0i is single-valued, independent of πi , and continuous with respect to s | HE by Lemma A.3. Hence F is a nonempty, compact, convex-valued, and upper hemicontinuous correspondence, which establishes that there exists a fixed point and sBi such that sX (sFP | H1E , sFP | H2E , πG , πG ) ∈ F(sFP | H1E , sFP | H2E , πG , πG ). Redefine sG | HiR = sX | HiR 2 2 2 2 1 1 i i 1 1 i and sX | HiE = sFP | HiE for X = G, B. i i Let πBi , i = 1, 2 be defined by Lemma A.2 with the strategies on erroneous histories given n | H1E , sFP | H2E ). Notice that from (3) and (5), any strategy from the set si ∈ STi : si | HiR = by (sFP 2 1 o E is a best response against both (sG , πG ) and (sB , πB ). s˜i | HiR for some s˜i ∈ Si and si | HiE = sFP | H i i −i i −i i G G Let rBi ∈ Bi (ω, sB−i , πBi ) and rG ∈ B (ω, s , π ) as defined in the proof of Lemma A.2 and A.3 rei i −i i spectively. Then from (4) and (6), for small enough ε and every ω ∈ Ω, both payoffs, given by UiA (ω, rG , sG , πG ) and UiA (ω, rBi , sB−i , πBi ), are close to mins˜i ∈Si UiT (ω, s˜i , sG ) > vi and maxs˜i ∈ST UiT (ω, s˜i , sB−i ) < i −i i −i i vi respectively. By enlarging, if necessary, [v1 , v1 ] × [v2 , v2 ], we may assume without loss of generality that G G min UiA (ω, rG i , s−i , πi ) = vi ω∈Ω

and

max UiA (ω, rBi , sB−i , πBi ) = vi . ω∈Ω

−B, the lower bound of θ(ht−1 , ωt , σt−i ), depends on θ(hs−1 , ωs , σs−i ), s > t, so we need to invoke backward induction −i −i to argue that θ is bounded. 14

–25–

Since T is fixed, we can take δ close enough to 1 so that for any ω and hT−i ,  1 − δT  A B B B v − U (ω, r , s , π ) < vi − vi , i i i −i i δT  1 − δT  T G G (1 − δ)πG (h ) + vi − UiA (ω, rG −i i i , s−i , πi ) > vi − vi . T δ (1 − δ)πBi (hT−i ) +

Similarly to the perfect monitoring case, the equilibrium strategies are described by automata, which revise automaton states and actions at the beginning of every block. An action of the automaton is the T-period stochastic game strategy to be used by the player in the block. (i) Automaton state space: The automaton state (u, ω) of player −i’s automaton is an element of [vi , vi ] × Ω, a pair of player i’s continuation payoff and the initial state in the T-period stochastic game. (ii) Initial automaton state: Player −i starts in automaton state (u, ω) = (vi , ω1 ), a pair of the payoff to be achieved and the initial state that was drawn. with probability q and (iii) Actions: In automaton state (u, ω), player −i picks strategy sG −i B strategy s−i otherwise, where q ∈ [0, 1] solves u = qvi + (1 − q)vi . Thus, in each block, player −i performs an initial randomization and then sticks to the resulting strategy sG or sB−i −i throughout the block. (iv) Transitions: If the automaton state is (u, ω), the action of the automaton is sB−i and player −i’s history is hT−i , then, at the end of the block, player −i transits to automaton state vi + (1 −

δ)πBi (hT−i )

!  1 − δT  A B B B 0 vi − Ui (ω, ri , s−i , πi ) , ω , + δT

where ω0 is the state drawn at the end of the block game according to the probability distribution p. Note that the first coordinate is in [vi , vi ]. Similarly, if the automaton state is (u, ω), the action of the automaton is sG and player −i’s history is hT−i , then, at the end of −i the block, player −i transits to automaton state vi + (1 −

T δ)πG i (h−i )

!  1 − δT  A G G G 0 + vi − Ui (ω, ri , s−i , πi ) , ω . δT

Note again that the first coordinate is in [vi , vi ], so the automaton is well defined. We claim that these strategies form a sequential equilibrium. It follows from Lemma A.2 and A.3, and the one-shot deviation property that, given player −i’s strategy, any strategy of player i n such that in each block game player i plays a strategy from the set si ∈ STi : si | HiR = s˜i | HiR for some o s˜i ∈ Si and si | HiE = sFP | HiE is a best reply. So this strategy constitutes a sequential equilibrium. i Note also that player i’s payoff is equal to the weighted average of the payoff of playing rG against i G B B s−i and the payoff of playing ri against s−i , with respective weights q and 1 − q. The sum of the

–26–

average payoff within the block and the continuation payoff from playing rG against sG is i −i G (1 − δT )UiT (ω, rG i , s−i ) # " h i 1 − δT   G T G G A G G G T vi − Ui (ω, ri , s−i , πi ) + δ vi + E (1 − δ)πi (h−i ) (ri , s−i ) + δT

= vi . Similarly, the sum of the average payoff within the block and the continuation payoff from playing rBi against sB−i is (1 − δT )UiT (ω, rBi , sB−i ) # " h i 1 − δT   B T B B A B B B T vi − Ui (ω, ri , s−i , πi ) + δ vi + E (1 − δ)πi (h−i ) (ri , s−i ) + δT = vi . Thus, at the beginning of a block, player i’s payoff when player −i’s state is (u, ω) is qvi +(1−q)vi = u, as desired. 

B. More Players We start with the brief review of HO’s argument for cases with more than two players. The notable points of HO’s argument are; (I) At the beginning of each block, player i uses one of two continuation strategies (sBi and sG ) i to determine the continuation payoff only of player i + 1, whom we call her successor (and we say that i is the predecessor of i + 1). (II) Within a block game, after the coordination round, each player i is given the opportunity to report (through her actions) what signal she observed about all her opponents’ messages. Transfers are defined such that if player i’s reports are different from their messages, then player i + 1 is indifferent across all her strategies in the block game. (III) Within a block game, after all players report, each player i + 1 is given the opportunity to report (through her actions) what signal she observed about player i’s reports. Transfers are defined to ensure that if (i + 1)’s reports are different from i’s actual reports, then player i + 1 is indifferent across all her strategies in the block game. These points enable us to avoid the coordination issue that is specific to more than two player case; that is, player j needs to form a belief about how player k thinks about player l’s continuation payoff or current strategy, where j , k and l , j, k, which makes the analysis intractable because it can depend on i’s private history in the entire stochastic game. First, (I) guarantees that only player i is in charge of determining player (i + 1)’s continuation payoff at the beginning of each block. As in the two player case, we will make sure that player –27–

i + 1 may assume, for the sake of computing best replies, that player i plays sG . This changes the i above coordination issue into the less severe problem that player i + 1 need only focus on forming a belief about how player i thinks about −{i, i + 1}’s current strategies. But by (II), player i + 1 does not need to be worried that player i’s belief about −{i, i + 1}’s current strategies is incorrect because in that case i + 1 would be indifferent across all her strategies. Finally, by (III), player i + 1 does not need to be worried that this belief of player i is different from what player i + 1 believes about her opponents’ strategy profile. Therefore, (I) − (III) ensure that player i + 1 may act as if she knew her opponents’ strategy profile for the sake of computing best replies, which makes her block strategy depend only on her recent histories. What is new in this paper is again that we will have to preserve the above property in the construction while considering the effect of a current action choice on the distribution of the future state. We will achieve it by carefully defining transfers that depend not only on private signals but also on realized states. A strategy profile in T-block stochastic game is denoted s ∈ ST := ST1 ×· · ·×STn . We will consider again the finite T-period stochastic game with payoffs augmented by a transfer πi+1 : HiT → R at the end of the last period (identifying player 1 with player n + 1). Notice from the above argument that players make the transfer to the next higher player.

B.1 Perfect Monitoring In this subsection we consider the perfect monitoring case, in which σi = a−i for all i, and hence private histories hti are always the same among all players up to the ordering of actions and signals. B.1.1 Payoffs and strategies We fix v ∈ V ∗ throughout. Pick 2n payoff vectors wM , where M = (M1 , . . . , Mn ) and Mi ∈ {G, B} such that wM i > vi

if

Mi = G;

wM i < vi

if

Mi = B.

Then there exists vi and vi such that v∗i < vi < vi < vi and [v1 , v1 ] × · · · × [vn , vn ] ⊂ interior of co{wM : M ∈ {G, B}n }. Take η such that v∗i + η < vi and   Bη [v1 , v1 ] × · · · × [vn , vn ] ⊂ interior of co{wM : M ∈ {G, B}n }, where Bη (C) means the η-neighborhood of the set C. And then given this η, take L ≥ L, δ ≥ δ and strategies gM , gi−i satisfying the n player version of Lemma A.1, the proof of which is obvious and so omitted.

–28–

B.1.2 Block games and the set of strategies Si We divide the play in the block game into five phases. (i) Phase 1 (t = 1) : Let G and B be a partition of Ai . Player i sends good message Mi = G if she chooses an action from G; otherwise player i sends bad message Mi = B. In this and subsequent phases, each player sends message Mi by uniformly randomizing over all actions in the corresponding element of the partition, G or B. (ii) Phase 2 (t = 2, . . . , n(n − 1) + 1) : Players alternatively report what she observed about other players’ messages in Phase 1; first, player 1 reports (n − 1)-tuple signals observed in phase 1, while other players uniformly randomize over all the actions; next, player 2 reports her signals while others uniformly randomize over all their actions, and so forth. Specifically, player i reports M−i = (M1 , . . . , Mi−1 , Mi+1 , . . . , Mn ), M j ∈ {G, B} in the n − 1 periods (i − 1)(n − 1) + 2, . . . , i(n − 1) + 1 while she randomizes uniformly over all their actions in other periods. (iii) Phase 3 (t = n(n − 1) + 2, . . . , 2n(n − 1) + 1) : Players repeat alternatively their predecessors’ report in Phase 2; first player 2 repeats what she observed about player 1’s report in Phase 2, while other players uniformly randomize over all the actions; next, player 3 repeats what she observed about player 2’s report in Phase 2, while other players uniformly randomize over all the actions, and so forth. (iv) Phase 4 (t = 2n(n − 1) + 2, . . . , T − k) : The play in this phase depends on the message profile sent in Phase 1. If players send a message profile (M1 , . . . , Mn ) ∈ {G, B}n in Phase 1, they e e = (Mn , M1 , M2 , . . . , Mn−1 ). play a L-period pure Markov strategy gM repeatedly, where M However, if player i unilaterally deviates from this behavior, players −i play the L-period minmax strategy g−i repeatedly in the remaining period of Phase 4. The period length of i this phase is set to a multiple of L. Details are defined formally below. (v) Phase 5 (t = T − k + 1, . . . , T) : In period T − k + 1, player i sends the same message Mi as she sent in Phase 1. From T − k + 2 on, player i reports (a) his signals about M−i sent in e Phase 1, (b) whether there exists an unilateral deviation from gM in Phase 4 according to her private signals, and (c) if so, who first deviated and when this first deviation happened. The period length k of this phase can be set to of order log T as argued in HO. Now we specify this strategy more formally. As with n = 2 case, we first define the set of strategies Si ; si+1 ∈ STi+1 is in Si+1 ⊂ STi+1 if and only if the following conditions hold: (i) In Phase 3, si+1 repeats truthfully what she observed about player i’s report from Phase 2. (ii) Suppose player i + 1 observed M about the message profile sent in Phase 1. If Mi = G, then e si+1 plays gM in Phase 4 as long as, according to player (i + 1)’s private history, players i+1 have played gM in all previous periods t < t of Phase 4. For each (hit−1 , wt ), define e

n o t t−1 t Ai (ht−1 , w ) = a ∈ A : ∃s ∈ S , s [h , w ](a ) > 0 . i i i i i i i i

–29–

, wt ), the set of actions prescribed by Si , coincides with Ai ; in Phase In Phase 1, 2, and 5, Ai (ht−1 i 4, Ai (hit−1 , wt ) is either Ai or a singleton {gM [ωt ]} since gM is a pure Markov strategy; in Phase 3, i Ai (hit−1 , wt ) coincides with G or B when player i repeats her predecessor’s report in period t and ρ coincides with Ai otherwise. Finally, Si , HiE,t−1/2 , HiR,t−1/2 , HiR and HiE are defined similarly as in the two player case. e

e

B.1.3 The strategies sG , sBi ∈ Si and T i g

We define a good strategy sG and a bad strategy sBi in two steps. First, define si , sbi ∈ Si as follows. i In Phase 1, g

si [h0i , ω1 ] ∈ ∆G

and

sbi [h0i , ω1 ] ∈ ∆B.

In Phase 2 (respectively, Phase 3), both strategies specify that player i reports what she observed in Phase 1 (respectively, repeats what she observed in Phase 2) as described in Subsection B.1.2. In Phase 4, given the massage profile M = (M1 , M2 , . . . , Mn−1 , Mn ) observed in Phase 1, both strategies specify g

t b t−1 t M t si [ht−1 i , ω ] = si [hi , ω ] = gi [ω ], e

e ˜ = (Mn , M1 , M2 , . . . , Mn−1 ), until the first unilateral deviation from gM where M is observed accorde ing to i’s private history. If it is observed that a player j , i unilaterally deviates from gM , then both strategies specify g

j

t b t−1 t ek−1 t si [ht−1 i , ω ] = si [hi , ω ] = gi [hi , ω ]

for t = 1 + nL + k, 1 ≤ k ≤ L, n ∈ Z+ .

j Note the minmax strategy gi depends on the recent (k − 1)-length history e hik−1 . On all other histories in Phase 4 define them arbitrarily. In Phase 5, both strategies specify that player i sends the same message as she sent in Phase 1 and then truthfully reports what she is suppose to report as described before. g g For large enough T and δ, the average discounted payoffs against sbi and si (and s j ∈ {sbj , s j } for j , i, i + 1) are approximately equal to the average payoffs in Phase 4. If player i uses strategy sbi , no matter what player i + 1 does in the block game, she earns strictly less than vi+1 in every

L-period interval of Phase 4 but at most one; the interval in which player i + 1 deviates from gM . T (ω, s b Hence, if T is sufficiently large, Ui+1 i+1 , si , s−{i,i+1} ) in the block game cannot exceed vi+1 for g any si+1 ∈ STi+1 , s j ∈ {sbj , s j }, j , i, i + 1, and any initial state ω. Similarly, if T is sufficiently large, e

g

g

T (ω, s b Ui+1 i+1 , si , s−{i,i+1} ) cannot fall below vi+1 for any si+1 ∈ Si+1 , s j ∈ {s j , s j }, j , i, i + 1, and any g

initial state ω. Take such T. Now pick small enough ρ > 0 and perturb si , sbi sightly to get a pair

–30–

ρ

of strategies sG , sBi in Si such that for all ω ∈ Ω and s j ∈ {sGj , sBj }, j , i, i + 1, i T T B (ω, si+1 , sG min Ui+1 i , s−{i,i+1} ) > vi+1 > vi+1 > vi+1 > max Ui+1 (ω, si+1 , si , s−{i,i+1} ). si+1 ∈STi+1

si+1 ∈Si+1

B.1.4 The result For s j ∈ {sGj , sBj }, j , i, i + 1, define rG , ωt ) ∈ (s ) ∈ Si+1 as a strategy such that for every (ht−1 i+1 i+1 −{i,i+1} t−1 ×Ω, the continuation strategy rG (s Hi+1 )| t−1 yields the lowest payoff against sG and s−{i,i+1} i+1 −{i,i+1} (hi+1 ,ωt ) i among all strategies si+1 ∈ Si+1 . Similarly, define rBi+1 (s−{i,i+1} ) ∈ STi+1 as a strategy such that for every t−1 , ωt ) ∈ H t−1 × Ω, the continuation strategy rB (s )| ,ωt ) yields the highest payoff against (hi+1 i+1 −{i,i+1} (ht−1 i+1 i+1 T B si and s−{i,i+1} among all strategies si+1 ∈ Si+1 . T (· , · | (ht−1 , ωt )) be the unnormalized discounted continuation Similarly to the n = 2 case, let Wi+1 i+1 T (· , · | (ht−1 , ωt )). For each t ∈ {1, . . . , T} and (ht−1 , ωt ) ∈ H t−1 × Ω, define payoff of Ui+1 i i i+1   t t T−k+1 T 1 B B t−1 t , a ) =W ω , r ) θB (ht−1 , ω , a (s ), s , s (h , ω i i+1 i+1 i+1 −{i,i+1} i −{i,i+1} i+1   T 1 B t B t − Wi+1 ω , ri+1 (s−{i,i+1} )/ai+1 , si , s−{i,i+1} (ht−1 i+1 , ω ) , Mj

where s j = s j , j , i, i + 1, and M j is the massage identified from player j’s action aT−k+1 in the first j B t t period of Phase 5. Again ri+1 (s−{i,i+1} )/ai+1 is a strategy playing ai+1 in the current period t, followed by reversion to rBi+1 (s−{i,i+1} ) from the next period t + 1 on. Note that since we assume the perfect monitoring, ht−1 equals ht−1 up to the ordering of signals and actions. By definition of rBi+1 (s−{i,i+1} ), i+1 i θB (hit−1 , ωt , ati+1 , aT−k+1 ) ≥ 0 for any (ht−1 , ωt , ati+1 , aT−k+1 ). Now define i πBi+1 (hTi ) =

T 1 X t−1 B t−1 t t δ θ (hi , ω , ai+1 , aT−k+1 ). δT t=1

Similarly to the n = 2 case, this transfer makes player i + 1 indifferent across all strategies si+1 ∈ STi+1 in the block game. For the transfer πG , we consider three cases. i+1 (i) Case 1: The message vector sent by player i + 1 in Phase 3 differs from the message vector reported by player i in Phase 2. (ii) Case 2: The message vector sent by player i + 1 in Phase 3 coincides with the message vector reported by player i in Phase 2, but the message vector reported by player i in Phase 2 differs from the massage vector sent in the first period of Phase 5. (iii) Case 3: The message vector sent by player i + 1 in Phase 3 coincides with the message vector reported by player i in Phase 2, and the message vector reported by player i in Phase 2 coincides with the massage vector sent in the first period of Phase 5. Next, for each t ∈ {1, . . . , T} and (ht−1 , ωt ) ∈ Hit−1 × Ω, define i   t t T−k+1 T 1 G G t−1 t θ(ht−1 , ω , a , a ) =W ω , r (s ), s , s (h , ω ) −{i,i+1} −{i,i+1} i i+1 i+1 i+1 i+1 i –31–

  T t G t−1 t − Wi+1 ω1 , rG i+1 (s−{i,i+1} )/ai+1 , si , s−{i,i+1} (hi+1 , ω ) , Mj

in where s j = s j , j , i, i + 1, and M j is the massage identified from player j’s action aT−k+1 j , ωt , ati+1 , aT−k+1 ) ≤ 0 whenever the first period of Phase 5. By definition of rG (s ), θ(ht−1 i i+1 −{i,i+1} , ωt , ati+1 , aT−k+1 )} ≤ 0. Finally, , ωt , ati+1 , aT−k+1 ) = min{0, θ(ht−1 , ωt ). Define θG (ht−1 ati+1 ∈ Ai+1 (ht−1 i i i+1 set   T B   if Case 1 or 2 happens, πi+1 (hi ) G T πi+1 (hi ) =  P    δ1T Tt=1 δt−1 θG (ht−1 if Case 3 happens. , ωt , ati , aT−k+1 ) i This transfer makes player i + 1 indifferent across all strategies si+1 ∈ STi+1 if Case 1 or 2 happens while it makes playing si+1 ∈ Si+1 a best response in the T-block game if Case 3 happens. By enlarging, if necessary, [v1 , v1 ] × · · · × [vn , vn ], we may assume without loss of generality that min

  T G Ui+1 ω, rG (s ), s , s = vi+1 , −{i,i+1} −{i,i+1} i+1 i

max

  T Ui+1 ω, rBi+1 (s−{i,i+1} ), sBi , s−{i,i+1} = vi+1 .

ω∈Ω,s−{i,i+1} :s j ∈{sBj ,sGj },j,i,i+1 ω∈Ω,s−{i,i+1} :s j ∈{sBj ,sGj },j,i,i+1

Moreover, pick ζ > 0 such that vi+1 − ζ ∈ (vi+1 , vi+1 ). ζ is a fine that is imposed on player (i + 1)’s continuation payoff if he misreports about his messages in Phase 3 and the first period of Phase 5, . That is, player i + 1 receives lower payoff vi+1 − ζ if Case 1 or 2 happens. provided player i plays sG i Hence, this fine gives player i + 1 incentives to report truthfully as prescribed by a strategy in Si . Since T is fixed, we can take δ close enough to 1 so that for any ω and s j ∈ {sGj , sBj }, j , i, i + 1, 1 − δT δT 1 − δT T (h (1 − δ)πG ) + i+1 i δT

(1 − δ)πBi+1 (hTi ) +

h  i T vi+1 − Ui+1 ω, rBi+1 (s−{i,i+1} ), sBi , s−{i,i+1} < vi+1 − vi+1 , h  i T G ω, rG (s ), s , s > vi+1 − (vi+1 − ζ). vi+1 − ζ − Ui+1 −{i,i+1} −{i,i+1} i+1 i

Proof of Theorem 2.7 (n ≥ 2, Perfect Monitoring): The equilibrium strategies are described by automata, which revises automaton states and actions at the beginning of every block. An action of the automaton is the T-period stochastic game strategy to be used by the player in the block. (i) Automaton state space: The automaton state (u, ω) of player i’s automaton is an element of [vi+1 , vi+1 ] × Ω, a pair of player (i + 1)’s continuation payoff and the initial state in the T-period stochastic game. (ii) Initial automaton state: Player i starts in automaton state (u, ω) = (vi+1 , ω1 ), a pair of the payoff to be achieved and the initial state that was drawn. (iii) Actions: In automaton state (u, ω), player i picks strategy sG with probability q and strategy i B si otherwise, where q ∈ [0, 1] solves u = qvi+1 + (1 − q)vi+1 . Thus, in each block, player i performs an initial randomization and then sticks to the resulting strategy sG or sBi throughout i –32–

the block. We refer to the outcome of this randomization as player i’s intention. (iv) Transitions: If the automaton state is (u, ω), the action (intention) of the automaton is sBi and player i’s history is hTi , then, at the end of the block, player i transits to automaton state vi+1 + (1 −

δ)πBi (hTi )

! i  1 − δT h 0 T B B + vi+1 − Ui+1 ω, ri+1 (s−{i,i+1} ), si , s−{i,i+1} , ω , δT

where ω0 is the state drawn at the end of the block game according to the probability distribution p. Note that the first term is in [vi+1 , vi+1 ]. Similarly, if the automaton state is (u, ω), the action (intention) of the automaton is sG and player i’s history is hTi in which i Case 1 or 2 happens, then, at the end of the block, player i transits to automaton state vi+1 − ζ + (1 −

T δ)πG i+1 (hi )

!  i 1 − δT h T G G 0 + vi+1 − ζ − Ui+1 ω, ri+1 (s−{i,i+1} ), si , s−{i,i+1} , ω . δT

Note that the first term is in [vi+1 , vi+1 ]. If the automaton state is (u, ω), the action (intention) of the automaton is sG and player i’s history is hTi in which Case 3 happens, then, at the end i of the block, player i transits to automaton state vi+1 + (1 −

T δ)πG i+1 (hi )

!  i 1 − δT h T G G 0 + vi+1 − Ui+1 ω, ri+1 (s−{i,i+1} ), si , s−{i,i+1} , ω . δT

Note again that the first term is in [vi+1 , vi+1 ], so the automaton is well defined. By construction of the transfer scheme and the one-shot deviation property, given the strategy of player (i + 1)’s opponents, any strategy that player i + 1 plays si+1 ∈ Si+1 in each block game is a best reply. So this strategy constitutes a sequential equilibrium. Note also that player (i+1)’s payoff is equal to the weighted average of the payoff of playing rG (s ) against sG and s j ∈ {sGj , sBj }, i+1 −{i,i+1} i j , i, i+1, and the payoff of playing rBi+1 (s−{i,i+1} ) against sBi and s j ∈ {sGj , sBj }, j , i, i+1, with respective weights q and 1 − q. The sum of the average payoff within the block and the continuation payoff from playing rG (s ) against sG and s j ∈ {sGj , sBj }, j , i, i + 1, is i+1 −{i,i+1} i   T G (1 − δT )Ui+1 ω1 , rG (s ), s , s −{i,i+1} i+1 −{i,i+1} i ) ( T  i h 1−δ T G T G vi+1 − Ui+1 ω, ri+1 (s−{i,i+1} ), si , s−{i,i+1} = vi+1 . + δ vi+1 + δT Similarly, the sum of the average payoff within the block and the continuation payoff from playing rBi+1 (s−{i,i+1} ) against sBi and s j ∈ {sGj , sBj }, j , i, i + 1, is   T (1 − δT )Ui+1 ω1 , rBi+1 (s−{i,i+1} ), sBi , s−{i,i+1} ( )  i 1 − δT h T T B B + δ vi+1 + vi+1 − Ui+1 ω, ri+1 (s−{i,i+1} ), si , s−{i,i+1} = vi+1 . δT

–33–

Thus, at the beginning of a block, player (i + 1)’s payoff when player i’s state is (u, ω) is qvi+1 + (1 − q)vi+1 = u, as desired  Under the perfect monitoring, Case 1 and 2 does not need to be considered because actions and signals always coincides. But the idea that player i + 1 becomes indifferent across her all strategies if Case 1 or 2 happens is important under imperfect monitoring because it allows player i + 1 to play as if she knew player −{i, i + 1}’s intentions, which allows her block strategy to depend only on recent histories when combined with the fact that player i + 1 is indifferent across all strategies when player i intends to play sBi (so player i + 1 can play as if she knows player i intends to play sG ). i

B.2 Imperfect Private Monitoring In this subsection we consider the almost perfect monitoring case. Note that condition (c) in the following lemma corresponds to (II) and (III) stated in the introduction of Section B. Lemma B.1. For every strategy profile s | HE , there exists ε > 0 such that for all ε < ε, (a) There exists a nonnegative transfer πBi+1 : HiT → R+ such that for every ω ∈ Ω and every M = (M1 , . . . , Mn ) ∈ {G, B}n with Mi = B, (9)

, πBi+1 ), STi+1 = Bi+1 (ω, sM −{i+1} Mj

| HRj = s j | HRj and sM | HEj = s j | HEj , and for every si+1 ∈ Bi+1 (ω, sM where sM , πBi+1 ), j j −{i+1} (10)

A lim Ui+1 (ω, si+1 , sM , πBi+1 ) < vi+1 . −{i+1}

ε→0

: HiT → R− such that for every ω ∈ Ω and every M = (b) There exists a nonpositive transfer πG i+1 (M1 , . . . , Mn ) ∈ {G, B}n with Mi = G, (11)

n o R R E E si+1 ∈ STi+1 : si+1 | Hi+1 = s˜i+1 | Hi+1 for some s˜i+1 ∈ Si+1 and si+1 | Hi+1 = si+1 | Hi+1 ⊂ Bi+1 (ω, sM , πG i+1 | si+1 ), −{i+1} Mj

where sM | HRj = s j | HRj and sM | HEj = s j | HEj , and for every si+1 ∈ Bi+1 (ω, sM , πG | s ), j j i+1 i+1 −{i+1} (12)

A lim Ui+1 (ω, si+1 , sM , πG i+1 ) > vi+1 ; −{i+1}

ε→0

πG depends continuously on s and is bounded away from −∞ by π. i+1 (c) Moreover, every strategy in Phase 4 yields the same payoff to player i + 1 conditional on each of the following two events; (i) The message profile reported by player i in Phase 2 does not coincide with the intention profile of players other than i + 1. –34–

(ii) The message profile reported by player i in Phase 2 coincides with the intention profile of players other than i + 1, but the message profile sent by player i + 1 in Phase 3 does not coincide with the message profile reported by player i in Phase 2. Proof. See Appendix C.  0 Now for every M = (M1 , M2 , . . . , Mn ) ∈ {G, B}n with Mi = B and the corresponding J−(i+1) 0 defined in the first paragraph of Appendix C, define the transfer φBi+1 (ω1 , J−(i+1) ) satisfying 0 A )= Ui+1 , πBi+1 ) + φBi+1 (ω1 , J−(i+1) (ω1 , si+1 , sM −{i+1}

max

M∈{G,B}n :Mi =B

A Ui+1 (ω1 , si+1 , sM , πBi+1 ), −{i+1}

and from φBi+1 define φBi+1 (ω1 , ωT−k+1 , aˆT−k+1 ) that is irrelevant to (i + 1)’s maximization problem in Phase 5 as in the Appendix C. Similarly, for every M = (M1 , M2 , . . . , Mn ) ∈ {G, B}n with Mi = G and 0 1 0 the corresponding J−(i+1) , define the transfer φG i+1 (ω , J−(i+1) ) satisfying A 1 0 (ω1 , si+1 , sM Ui+1 , πBi+1 ) + φG i+1 (ω , J−(i+1) ) = −{i+1}

min

M∈{G,B}n :Mi =G

A Ui+1 (ω1 , si+1 , sM , πG i+1 ), −{i+1}

G 1 T−k+1 , aˆT−k+1 ) that is irrelevant to (i + 1)’s maximization problem in and from φG i+1 define φi+1 (ω , ω Phase 5 as in the Appendix C.

, and πBi+1 Proof of Theorem 2.7 (n ≥ 2, Imperfect Private Monitoring): First, we define sFP | HiE , πG i i+1 for i = 1, 2, . . . , n. Consider a correspondence from the set of all strategies si | HiE and nonpositive transfers πi+1 , i = 1, 2, . . . , n, into itself, defined by n o F(si | H1E , πi+1 , i = 1, 2, . . . , n) = (s0i | HiE , π0i+1 , i = 1, 2, . . . , n) , E is a strategy of player i + 1 such that s0 | where s0i+1 | Hi+1 is a best response to her opponents’ i+1 (ht−1 ,ωt ) i+1

E,t−1 E R | H−(i+1) ) for every (ht−1 , ωt ) ∈ Hi+1 strategy (sM | H−(i+1) , sM × Ω in the auxiliary scenario i+1 −(i+1) −(i+1) with πi+1 , where M is the message profile player i + 1 observed about i’s report in Phase 2. Hence E,t−1 player i + 1 knows M at every ht−1 ∈ Hi+1 since there is no erroneous history in Phase 1 and i+1 0 E 2, so the best response si+1 | Hi+1 is well defined. The transfer π0i+1 is defined as transfer πG i+1 whose existence is established in Lemma B.1 (b) for s | HE = (si | HiE , i = 1, 2, . . . , n). Now both the set of strategies si | HiE and the set of all transfers {πi+1 | π ≤ πi+1 ≤ 0} can be identified with a compact, convex subset of finite dimensional Euclidean space. By the same argument as in the two player case, F is a nonempty, compact, convex-valued, and upper hemicontinuous correspondence, which establishes that there exists a fixed point (sFP | HiE , πG , i = 1, 2, . . . , n) ∈ F(sFP | HiE , πG ,i = i i i+1 i+1 1, 2, . . . , n). For the intention profile M, define sM such that sM | HiR = sM | HiR and sM | HiE = sFP | HiE . Let i i i i i πBi+1 , i = 1, 2, . . . , n be defined by Lemma B.1 (a) with the strategies on erroneous histories given R = by (sFP | HiE , i = 1, 2, . . . , n). Notice that from (9) and (11), any strategy si+1 such that si+1 | Hi+1 i M G R for some s˜ E FP E s˜i+1 | Hi+1 i+1 ∈ Si+1 and si+1 | Hi+1 = si+1 | Hi+1 is a best response against (s−(i+1) , πi+1 )

–35–

with Mi = G, and (sM , πBi+1 ) with Mi = B. Then from (10) and (12), for such a strategy si+1 , −(i+1) UiA (ω, si+1 , sM , πG , πBi+1 ) < vi when Mi = B, for small ) > vi when Mi = G, and UiA (ω, si+1 , sM i+1 −(i+1) −(i+1) enough ε and every ω ∈ Ω. Q By enlarging, if necessary, ni=1 [vi , vi ], we may assume without loss of generality that min

A Ui+1 (ω, si+1 , sM , πG i+1 ) = vi+1 , −{i+1}

max

A Ui+1 (ω, si+1 , sM , πBi+1 ) = vi+1 . −{i+1}

ω∈Ω,M∈{G,B}n :Mi =G ω∈Ω,M∈{G,B}n :Mi =B

Moreover, define ζBi+1 (ω, hTi ) := φBi+1 (ω, ωT−k+1 , aˆT−k+1 ) + vi+1 − T ζG i+1 (ω, hi )

:=

T−k+1 T−k+1 φG , aˆ ) i+1 (ω, ω

+ vi+1 −

max

M∈{G,B}n :M

i =B

min

M∈{G,B}n :Mi =G

A , πBi+1 ), Ui+1 (ω, si+1 , sM −{i+1} A Ui+1 (ω, si+1 , sM , πG i+1 ). −{i+1}

Given that T is fixed, we take δ close enough to 1 so that for any ω and hTi , (1 − δ)πBi+1 (hTi ) +

1 − δT B ζ (ω, hTi ) < vi+1 − vi+1 , δT i+1

T (1 − δ)πG i+1 (hi ) +

1 − δT G ζ (ω, hTi ) > vi+1 − vi+1 . δT i+1

and

Similarly to the perfect monitoring case, the equilibrium strategies are described by automata, which revises automaton states and actions at the beginning of every block. An action of the automaton is the T-period stochastic game strategy to be used by the player in the block. (i) Automaton state space: The automaton state (u, ω) of player i’s automaton is an element of [vi+1 , vi+1 ] × Ω, a pair of player (i + 1)’s continuation payoff and the initial state in the T-period stochastic game. (ii) Initial automaton state: Player i starts in automaton state (u, ω) = (vi+1 , ω1 ), a pair of the payoff to be achieved and the initial state that was drawn. (iii) Actions: In automaton state (u, ω), player i picks strategy sG with probability q and strategy i B si otherwise, where q ∈ [0, 1] solves u = qvi+1 + (1 − q)vi+1 . Thus, in each block, player i performs an initial randomization and then sticks to the resulting strategy sG or sBi throughout i the block. (iv) Transitions: If the automaton state is (u, ω), the action of the automaton is sBi and player i’s history is hTi , then, at the end of the block, player i transits to automaton state vi+1 + (1 −

δ)πBi+1 (hTi )

! 1 − δT B T 0 + ζ (ω, hi ), ω , δT i+1

–36–

where ω0 is the state drawn at the end of the block game according to the probability distribution p. Note that the first term is in [vi+1 , vi∗1 ]. Similarly, if the automaton state is (u, ω), the action of the automaton is sG and player i’s history is hTi , then, at the end of the i block, player i transits to automaton state vi+1 + (1 −

T δ)πG i+1 (hi )

! 1 − δT G T 0 + ζ (ω, hi ), ω . δT i+1

Note again that the first term is in [vi+1 , vi+1 ], so the automaton is well defined. It follows from Lemma B.1 and the one-shot deviation property that, given player i’s strategy, any strategy of player i + 1 such that in each block game player i + 1 plays a strategy from the set o n E R R = s˜ E FP si+1 ∈ STi+1 : si+1 | Hi+1 i+1 | Hi+1 for some s˜i+1 ∈ Si+1 and si+1 | Hi+1 = si+1 | Hi+1 is a best reply. So this strategy constitutes a sequential equilibrium. The sum of the average payoff within the block and the continuation payoff of player i + 1 against sM with Mi = G is −(i+1) T (ω, si+1 , sM (1 − δT )Ui+1 −(i+1) ) ( " #) 1 − δT G T G T M T + δ vi+1 + E (1 − δ)πi+1 (hi ) + ζ (ω, hi ) (si+1 , s−(i+1) ) δT i+1

= vi+1 . Similarly, the sum of the average payoff within the block and the continuation payoff of player with Mi = B is i + 1 against sM −(i+1) T (1 − δT )Ui+1 (ω, si+1 , sM −(i+1) ) " #) ( T 1 − δ B T T B T M + δ vi+1 + E (1 − δ)πi+1 (hi ) + ζ (ω, hi ) (si+1 , s−(i+1) ) δT i+1

= vi+1 . Thus, at the beginning of a block, player (i + 1)’s payoff when player i’s state is (u, ω) is qvi+1 + (1 − q)vi+1 = u, as desired. 

C. Appendix : Proof of Lemma B.1 Let J−(i+1) denote the information held by players other than i + 1; (1) the intentions of players other than i+1 to be reported in the first period of Phase 5, (2) the signals observed by those players about the message profile M sent in Phase 1, (3) whether there exists an unilateral deviation from e gM in Phase 4 according to their private signals, and (4) if so, who first deviated and when this first deviation happened. Let aˆt := (ati , σti ) ∈ Ai × Σi a pair of player i’s action and signal in period t. Player i receives the information about J−(i+1) through (aˆT−k+1 , aˆT−k+2 , . . . , aˆT ) in Phase 5. –37–

t Furthermore, given a period t of Phase 4, we denote by J−(i+1) the component of J−(i+1) that contains the information of all players other than i + 1 up to period t; thus it contains the information of (1), (2), whether a unilateral deviation was observed in some period τ < t of Phase 4, and if so, who 0 1 first deviated and when this first deviation happened. Let J−(i+1) contain (1) and (2), and let J−(i+1) t t contain only (1). We write J−(i+1) ∈ J−(i+1) when the information of all players other than i + 1 up

to period t coincides and according to their private histories, either all players played gM up to e period t or player i + 1 unilaterally first deviated from gM in some period τ < t. Note that player i’s information about J−(i+1) is imperfect in two respects: First, all players take all actions with probability at least ρ in Phase 5 and hence players other than i + 1 send a wrong action with positive probability; second, since players report through actions and monitoring is imperfect, the signals received may be erroneous. However, as both ε → 0 and ρ → 0, player i’s information about J−(i+1) coincides with J−(i+1) with probability tending to 1. e

C.1 Proof of (a) The transfer will have the form πBi+1 (hTi )

T 1 X t−1 δ θ(ωt , aˆt , ωT−k+1 , aˆT−k+1 , . . . , ωT , aˆT ), = T δ t=1

for some function θ to be defined by backward induction. Note that θ depends on not only the sequence of player i’s actions and signals (aˆT−k+1 , . . . , aˆT ) in Phase 5 but also the sequences of states (ωT−K+1 , . . . , ωT ) in Phase 5. [Phase 5] : Start with the periods of Phase 5. Consider period t. Fix ωt . In this phase, θ will depend only on (ωt , aˆt ). Suppose that θ(ωτ , aˆτ ), τ > t, make player i + 1 indifferent across all action profiles aτ , in each period τ > t. To make player i + 1 indifferent across all action profiles at in period t, for each ati it suffices to pick θ(ωt , aˆt ) satisfying ui+1 (ωt , ati , at−i ) +

X

mi (σti | ωt , ati , at−i )θ(ωt , ati , σti ) =

σti ∈Σi

max ui+1 (ωt , at )

ωt ∈Ω,at ∈A

∀at−i ∈ A−i .

This system has |A−i | × |Σi | coefficient matrix tending to the identity matrix as ε → 0 and hence has a unique solution. By construction, these values θ(ωτ , aˆτ ), τ ≥ t make player i + 1 indifferent across all action profiles aτ in each period τ ≥ t. Thus, player i + 1 is indifferent across all her continuation strategies from t on. Moreover, these values θ(ωt , aˆt ) may be chosen to be positive for small enough ε by the similar argument as in the proof of Lemma A.2. Finally by construction the auxiliary continuation payoff in Phase 5 is maxω,a ui+1 (ω, a), which is independent of the outcome of the previous phases. Hence (i + 1)’s actions in the previous phases does not affect the auxiliary continuation payoff in Phase 5, which is an important consideration for stochastic game. [Phase 4] : We move on to Phase 4. Define θ(ωt , aˆt , ωT−k+1 , aˆT−k+1 , . . . , ωT , aˆT ) such that the

–38–

following conditions hold. (iii) Player i + 1 is indifferent across all her strategies from period t on (until the end of Phase 4) S t t t conditional on every ωt ∈ Ω and J−(i+1) ∈ J−(i+1) (J−(i+1) )c . (iv) Player (i + 1)’s discounted payoff from period t on, augmented by the transfers θ assigned t t from period t on (until the end of Phase 4), conditional on every ωt and J−(i+1) ∈ J−(i+1) , converges as ε → 0 to the maximum of her payoffs over all continuation strategies under t perfect monitoring (until the end of Phase 4), conditional on the same ωt and J−(i+1) . (v) Player (i + 1)’s discounted payoff from the initial period of Phase 4 on, augmented by the transfers θ assigned from the initial period of Phase 4 on (until the end of Phase 4), is independent of the initial state of Phase 4. (vi) The transfers θ assigned in the periods of Phase 4 is irrelevant to player (i+1)’s maximization problem in the periods τ ∈ {T − k + 1, . . . , T} of Phase 5. Condition (iii) guarantees that player i + 1 is indifferent across all her strategies in Phase 4, which is enough for the condition (c) in the case of Mi = B. Given T is large enough, condition (iv) guarantees that (10) is satisfied. Note that by (v) the auxiliary continuation payoff from Phase 4 on is independent of the outcome of Phase 2 and 3. This consideration is important for stochastic game because player i + 1 no longer needs to worry about her current action choice changing the distribution of state during Phase 4 and 5. Finally, by (vi), player i + 1 does not need to consider the effect of her choice of actions in Phase 5 on the transfers assigned in the periods of Phase 4 as argued in Yamamoto (2009).15 t As first step, we will define transfer θ0 (ωt , aˆt , J−(i+1) ) that directly depends on J−(i+1) instead of t t T−k+1 T−k+1 T T t θ(ω , aˆ , ω , aˆ , . . . , ω , aˆ ), where we can assume as if player i knew J−(i+1) . Observe that M t t determines the (mixed) actions in period t of players other than through ω and s−(i+1) each J−(i+1) t t i + 1. This means that given ω and J−(i+1) , (i + 1)’s stage-game payoff in period t and the probability t+1 t+1 distribution over ω and J−(i+1) are determined by player (i + 1)’s action in period t. Now we use the backward induction argument. Start with the last period T − k of Phase 4. Fix T−k T−k as given. From ωT−k and J T−k , αT−k ω and J−(i+1) is determined as argued. To achieve (iii) −(i+1) −(i+1) 0 T−k T−k T−k ), and (iv), it suffices to pick θ (ω , aˆ , J−(i+1) ) such that for each aT−k ∈ support(α−(i+1) −(i+1) T−k ui+1 (ωt , aT−k i+1 , a−(i+1) ) +

X

T−k 0 T−k T−k T−k mi (σiT−k | ωT−k , aT−k , ai , σi , J−(i+1) ) i+1 , a−(i+1) )θ (ω

σT−k ∈Σi i T−k = max ui+1 (ωT−k , aT−k i+1 , a−(i+1) ) aT−k ∈Ai+1 i+1

∀aT−k i+1 ∈ Ai+1 .

This system has |Ai+1 | × |Σi | coefficient matrix DT−k tending to be full row rank as ε → 0 and hence T−k has a unique solution by setting θ0 (ωT−k , aiT−k , σiT−k , J−(i+1) ) = 0 if σT−k is not equal to a−{i,i+1} on the i part of i’s signal about −(i + 1)’s actions. 15 Yamamoto, Y. (2009): ”A Limit Characterization of Belief-Free Equilibrium Payoffs in Repeated Games,” Journal of Economic Theory, 144, 802-824.

–39–

t Consider period t < T − k. Fix ωt and J−(i+1) . Suppose that θ0 (ωτ , aˆτ , J−(i+1) ), τ > t, makes player t+1 . As i + 1 indifferent across all her strategies from t + 1 on, conditional on every ωt+1 and J−(i+1) , Jt determines αt−(i+1) . To achieve (iii) and (iv), it suffices to argued before, given ωt and sM −(i+1) −(i+1) pick θ0 (ωt , aˆt , J−(i+1) ) such that for each at−(i+1) ∈ support(αt−(i+1) ),

i h 4 t+1 ui+1 (ωt , ati+1 , at−(i+1) ) + δE Wi+1 (ωt+1 , J−(i+1) ) ωt , ati+1 , at−(i+1) X + mi (σti | ωt , ati+1 , at−(i+1) )θ0 (ωt , ati , σti , J−(i+1) ) σti ∈Σi

= max

ati+1 ∈Ai+1

n h io 4 t+1 ui+1 (ωt , ati+1 , at−(i+1) ) + δE Wi+1 (ωt+1 , J−(i+1) ) ωt , ati+1 , at−(i+1) ,

∀ati+1 ∈ Ai+1 ,

4 (ωt+1 , J t+1 ) is player (i + 1)’s unnormalized discounted payoff from period t + 1 on, where Wi+1 −(i+1) augmented by the transfers θ0 from period t + 1 on (until the end of Phase 4). This system has |Ai+1 | × |Σi | coefficient matrix Dt tending to be full row rank as ε → 0 and hence has a unique solution by setting θ0 (ωt , ati , σti , J−(i+1) ) = 0 if σti is not equal to at−{i,i+1} on the part of i’s signal about −(i + 1)’s actions. In this way the transfers θ0 achieve (iii) and (iv). Moreover, they may be chosen to be positive for small enough ε by the similar argument as in the proof of Lemma A.2. Finally 4 (ω0 , J 1 4 (ω, J 1 we can ensure (v) by adding the difference maxω0 Wi+1 ) − Wi+1 ) to the transfers −(i+1) −(i+1) 0 θ assigned for the initial period of Phase 4. However, θ(ωt , aˆt , ωT−k+1 , aˆT−k+1 , . . . , ωT , aˆT ) cannot directly depend on J−(i+1) . We will define θ t such that given (ωt , aˆt , J−(i+1) ), the expected value of θ(ωt , aˆt , ωT−k+1 , aˆT−k+1 , . . . , ωT , aˆT ) is equal to θ0 (ωt , aˆt , J−(i+1) ), which implies that θ defined in this way achieves (iii), (iv), and (v). Moreover, the realization of (aˆT−k+1 , ωT−k+2 , aˆT−k+2 , . . . , ωT , aˆT ) is affected not only by J−(i+1) but also by a sequence of player (i + 1)’s actions in Phase 5 through the distribution p. Hence, we will need to design θ to ensure (vi). First, define θ”(ωt , aˆt , aT−k+1 , aT−k+2 , . . . , aT−(i+1) ), which does not depend on (ωT−k+1 , . . . , ωT ), such −(i+1) −(i+1) that, for all ωt ,aˆt ,and J−(i+1) ,

(13)

θ0 (ωt , aˆt , J−(i+1) ) =

X

T−k+1 T−k+2 Pr(a−(i+1) , a−(i+1) , . . . , aT−(i+1) | J−(i+1) )

T−k+1 ,aT−k+2 ,...,aT (a−(i+1) ) −(i+1) −(i+1)

T−k+2 T × θ”(ωt , aˆt , aT−k+1 −(i+1) , a−(i+1) , . . . , a−(i+1) ),

where Pr(aT−k+1 , aT−k+2 , . . . , aT−(i+1) | J−(i+1) ) is the probability that players −(i + 1), following sM , −(i+1) −(i+1) −(i+1) T−k+1 T−k+2 T play a sequence of actions (a−(i+1) , a−(i+1) , . . . , a−(i+1) ) in Phase 5 conditional on J−(i+1) . Note that are linearly independent row vectors {Pr(aT−k+1 , aT−k+2 , . . . , aT−(i+1) | J−(i+1) )}(aT−k+1 ,aT−k+2 ,...,aT )∈Ak −(i+1) −(i+1) −(i+1) −(i+1) −(i+1) −(i+1) for different J−(i+1) as ρ → 0. Hence, the above system can be solved for θ”. In particular, we can make θ” nonnegative as ε → 0 since the coefficient matrix is positive and θ0 can be made nonnegative as ε → 0. Next, define θ(ωt , aˆt , ωT−k+1 , aˆT−k+1 , . . . , ωT , aˆT ) such that for all ωt ,aˆt , and

–40–

T−k+1 , aT−k+2 , . . . , aT (a−(i+1) ), −(i+1) −(i+1)

(14)

T−k+2 T θ”(ωt , aˆt , aT−k+1 −(i+1) , a−(i+1) , . . . , a−(i+1) ) X X X X X ··· = σiT−k+1 ∈Σi ωT−k+2 ∈Ω σiT−k+2 ∈Σi ωT−k+3 ∈Ω

X

p(ωT−k+2 | ωT−k+1 , aT−k+1 )

σTi ∈Σi ωT+1 ∈Ω

× mi (σT−k+1 | ωT−k+1 , aT−k+1 ) i × p(ωT−k+3 | ωT−k+2 , aT−k+2 )mi (σiT−k+2 | ωT−k+2 , aT−k+2 ) × · · · × p(ωT+1 | ωT , aT )mi (σTi | ωT , aT ) × θ(ωt , aˆt , ωT−k+1 , aˆT−k+1 , ωT−k+2 , aˆT−k+2 , ωT−k+3 , . . . , ωT , aˆT ), for a given ωT−k+1 . Row vectors {p(ωT−k+2 | ωT−k+1 , aT−k+1 )mi (σiT−k+1 | ωT−k+1 , aT−k+1 )×· · ·×p(ωT+1 | ωT , aT ) T−k+1 , aT−k+2 , . . . , aT ) ×mi (σTi | ωT , aT )}(σT−k+1 ,ωT−k+2 ,...,σT ,ωT+1 ) are linearly independent for different (a−(i+1) −(i+1) −(i+1) i as ε → 0, so the above system can be solved for θ. In particular, we can make θ nonnegative as ε → 0 since θ” can be made nonnegative as ε → 0. Note that the expected value of θ conditional on (ωt , aˆt , J−(i+1) ) is equal to θ0 (ωt , aˆt , J−(i+1) ). Hence, these transfers θ achieve (iii), (iv), and (v). It remains to show that these transfers θ achieve (vi). First, consider period T − k + 1. Note that θ is chosen so that the right-hand side of (14) is independent of a sequence of player i + 1’s actions T−k+1 , . . . , aT ) in Phase 5, which implies that conditional on any (ωt , aˆt , ωT−k+1 ), player (i + 1)’s (ai+1 i+1 strategy choice from T − k + 1 on does not affect the expected value of θ. Next, consider period τ > T − k + 1. Rearranging (14), T−k+2 T θ”(ωt , aˆt , aT−k+1 −(i+1) , a−(i+1) , . . . , a−(i+1) ) X X X X X p(ωT−k+2 | ωT−k+1 , aT−k+1 )mi (σiT−k+1 | ωT−k+1 , aT−k+1 ) = ··· σiT−k+1 ∈Σi ωT−k+2 ∈Ω σiT−k+2 ∈Σi

∈Σi ωτ ∈Ω στ−1 i

× · · · × p(ωτ | ωτ−1 , aτ−1 )mi (στ−1 | ωτ−1 , aτ−1 ) i h i τ × E θ(ωt , aˆt , ωT−k+1 , aˆT−k+1 , ωT−k+2 , aˆT−k+2 , . . . , ωT , aˆT ) (σiT−k+1 , ωT−k+2 , . . . , στ−1 , ω ) . i h i τ ) , which is Similarly to the above, this system can be solved for E θ (σiT−k+1 , ωT−k+2 , . . . , στ−1 , ω i independent of a sequence of player (i + 1)’s actions from period τ on since θ” and the coefficient matrix of the system does not depend on them, as desired. [Phase 2 and 3] : By the same argument as in Phase 5, we pick the values θ(ωt , aˆt ) for all the periods of Phase 2 and 3 such that player i + 1 is indifferent across all her strategies in two phases. Note that payer i + 1 can disregard the auxiliary continuation payoff from Phase 4 on, which is independent of the outcome of Phase 2 and 3. [Phase 1] : We define the transfer θ(ω1 , aˆ1 , ωT−k+1 , aˆT−k+1 , . . . , ωT , aˆT ) in Phase 1 such that player i + 1 is indifferent across all her actions in Phase 1 conditional on every ω1 ∈ Ω and every S 1 1 1 J−(i+1) ∈ J−(i+1) (J−(i+1) )c and the transfer assigned for Phase 1 is irrelevant to player (i + 1)’s maximization problem in each period of Phase 5. Note that player (i + 1)’s action choice in Phase 1 4 (J 1 4 (ω, J 1 1 affects J−(i+1) , on which the payoff Wi+1 ) := maxω Wi+1 ) in Phase 4 depends. −(i+1) −(i+1) –41–

0 1 1 Consider first θ0 (ω1 , aˆ1 , J−(i+1) ) directly depending on J−(i+1) . Note that J−(i+1) determines α1−(i+1) . 0 For each ω1 and J−(i+1) , it suffices to pick θ0 such that for all a1i+1 ∈ Ai+1 ,

ui+1 (ω1 , a1i+1 , a1−(i+1) ) +

X

1 Pr(J−(i+1) | ω1 a1i+1 , a1−(i+1) )

1 J−(i+1)

   X    1 4 1 ×  mi (σ1i | ω1 , a1i+1 , a1−(i+1) )θ0 (ω1 , a1i , σ1i , J−(i+1) ) + δ2n(n−1)+1 Wi+1 (J−(i+1) )   σ1i ∈Σi

    X   1 4 1 = max ui+1 (ω1 , a1i+1 , a1−(i+1) ) + δ2n(n−1)+1 Pr(J−(i+1) | ω1 , a1i+1 , a1−(i+1) )Wi+1 (J−(i+1) ) ,  a1 ∈Ai+1   1 J−(i+1)

i+1

0 1 for every a1−(i+1) in the support of α1−(i+1) determined by J−(i+1) , where Pr(J−(i+1) | ω1 , a1 ) is the proba1 bility of J−(i+1) given a1 . This system has a coefficient matrix tending to be full row rank as ε → 0 and ρ → 0, and hence can be solved for θ0 . These θ0 make player i + 1 indifferent across her actions in Phase 1 as desired. Applying the similar argument to the above, we can define θ such that the transfer assigned in Phase 1 is irrelevant to player (i + 1)’s maximization problem in each periods of Phase 5.

C.2 Proof of (b) and (c)

Recall that $J^0_{-(i+1)}$ denotes the intentions of all players other than $i+1$. The transfer will have the form

$$
\pi^G_{i+1}(h_i^T) = \frac{1}{\delta^T} \sum_{t=1}^{T} \delta^{t-1}\, \theta\big(h_i^{t-1}, \omega^t, \hat a^t, \omega^{T-k+1}, \hat a^{T-k+1}\big),
$$

for some function $\theta$ to be defined by backward induction.

[Phase 5] : By the same argument as in (a), we can make player $i+1$ indifferent across all her strategies in Phase 5, with her auxiliary continuation payoff in Phase 5 independent of the initial state of Phase 5.

[Phase 4] : As in (a), we first define $\theta'(h_i^{t-1}, \omega^t, \hat a^t, J^0_{-(i+1)})$ directly depending on $J^0_{-(i+1)}$ such that (11), (12), (i), and (ii) from (c) hold. Next we will define $\theta(h_i^{t-1}, \omega^t, \hat a^t, \omega^{T-k+1}, \hat a^{T-k+1})$ so that (vi) from the proof of (a) additionally holds. But here we do not ensure that (v) from the proof of (a) holds, so that in defining transfers in previous phases we will have to consider the effect of action choices in those phases on the distribution of the initial state of Phase 4.

[Phase 4 : Case (i)] : First consider (i) from (c), and consider period $t$ of Phase 4. In this case, $J^0_{-(i+1)}$ does not coincide with $h_i^{t-1}$ on the report of player $i$ from Phase 2. By backward induction, suppose that the transfers $\theta'(h_i^{\tau-1}, \omega^{\tau}, \hat a^{\tau}, J^0_{-(i+1)})$, $\tau > t$, make player $i+1$ indifferent across all action profiles $a^{\tau}$ in each period $\tau > t$ when $J^0_{-(i+1)}$ does not coincide with $h_i^{\tau-1}$ on the report of player $i$ from Phase 2. Then it suffices to pick $\theta'(h_i^{t-1}, \omega^t, a^t, J^0_{-(i+1)})$ such that, for every $\omega^t$, $h_i^{t-1}$, and $J^0_{-(i+1)}$ for which $J^0_{-(i+1)}$ does not coincide with $h_i^{t-1}$ on the report of player $i$ from Phase 2,

$$
u_{i+1}(\omega^t, a^t) + \sum_{\sigma_i^t \in \Sigma_i} m_i(\sigma_i^t \mid \omega^t, a^t)\, \theta'\big(h_i^{t-1}, \omega^t, a_i^t, \sigma_i^t, J^0_{-(i+1)}\big) + \delta E\Big[ W^{4(i)}_{i+1}\big(h_i^t, \omega^{t+1}, J^0_{-(i+1)}\big) \,\Big|\, \omega^t, a^t \Big]
\tag{15}
$$
$$
= u_{i+1}(\omega^t, \bar a^t) + \sum_{\sigma_i^t \in \Sigma_i} m_i(\sigma_i^t \mid \omega^t, \bar a^t)\, \theta'\big(h_i^{t-1}, \omega^t, \bar a_i^t, \sigma_i^t, J^0_{-(i+1)}\big) + \delta E\Big[ W^{4(i)}_{i+1}\big(h_i^t, \omega^{t+1}, J^0_{-(i+1)}\big) \,\Big|\, \omega^t, \bar a^t \Big],
$$

for all $a^t$ except some $\bar a^t$, and

$$
\theta'\big(h_i^{t-1}, \omega^t, \bar a^t, J^0_{-(i+1)}\big) = c^t,
\tag{16}
$$

where $W^{4(i)}_{i+1}(h_i^t, \omega^{t+1}, J^0_{-(i+1)})$ is the unnormalized discounted payoff from period $t+1$ on, augmented by the transfers $\theta'$ from period $t+1$ on (until the end of Phase 4), and $c^t$ is recursively set to a negative number satisfying

$$
c^t < - T \Big( \max_{\omega,a} u_{i+1}(\omega,a) - \min_{\omega,a} u_{i+1}(\omega,a) \Big)
- \sum_{s=t+1}^{T-k} \Big( \max_{h_i^{s-1}, \omega^s, \hat a^s} \theta'\big(h_i^{s-1}, \omega^s, \hat a^s, J^0_{-(i+1)}\big) - \min_{h_i^{s-1}, \omega^s, \hat a^s} \theta'\big(h_i^{s-1}, \omega^s, \hat a^s, J^0_{-(i+1)}\big) \Big).
$$

The rows of the coefficient matrix of the system formed by (15) and (16) are linearly independent as $\varepsilon \to 0$, so the system can be solved for $\theta'$. Moreover, from (15), as $\varepsilon \to 0$,

$$
\theta'\big(h_i^{t-1}, \omega^t, a^t, J^0_{-(i+1)}\big)
\;\to\; \theta'\big(h_i^{t-1}, \omega^t, \bar a^t, J^0_{-(i+1)}\big) + u_{i+1}(\omega^t, \bar a^t) - u_{i+1}(\omega^t, a^t)
+ \delta E\Big[ W^{4(i)}_{i+1}\big(h_i^t, \omega^{t+1}, J^0_{-(i+1)}\big) \,\Big|\, \omega^t, \bar a^t \Big] - \delta E\Big[ W^{4(i)}_{i+1}\big(h_i^t, \omega^{t+1}, J^0_{-(i+1)}\big) \,\Big|\, \omega^t, a^t \Big],
$$

so by the definition of $c^t$, $\theta'(h_i^{t-1}, \omega^t, a^t, J^0_{-(i+1)})$ becomes negative as $\varepsilon \to 0$. In this way condition (i) is achieved.
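To spell out the last step under the reconstruction above (a one-line check rather than part of the formal argument): as $\varepsilon \to 0$,

$$
\theta'\big(h_i^{t-1},\omega^t,a^t,J^0_{-(i+1)}\big)
\;\approx\; c^t
\;+\; \underbrace{\big[u_{i+1}(\omega^t,\bar a^t)-u_{i+1}(\omega^t,a^t)\big]}_{\le\, \max_{\omega,a} u_{i+1}-\min_{\omega,a} u_{i+1}}
\;+\; \underbrace{\delta\Big(E\big[W^{4(i)}_{i+1}\mid \omega^t,\bar a^t\big]-E\big[W^{4(i)}_{i+1}\mid \omega^t,a^t\big]\Big)}_{\text{bounded by the flow payoffs and transfers from }t+1\text{ to }T-k},
$$

and the two braced terms together are no larger than $T(\max u_{i+1}-\min u_{i+1}) + \sum_{s=t+1}^{T-k}(\max\theta' - \min\theta')$, which is precisely the bound that $-c^t$ is chosen to exceed; hence the right-hand side is negative.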

We can ensure that $W^{4(i)}_{i+1}(h_i^{2n(n-1)+1}, \omega^{2n(n-1)+2}, J^0_{-(i+1)})$ is independent of the initial state $\omega^{2n(n-1)+2}$ of Phase 4 (denote it by $W^{4(i)}_{i+1}(h_i^{2n(n-1)+1}, J^0_{-(i+1)})$) by adding the difference $\min_{\omega'} W^{4(i)}_{i+1}(h_i^{2n(n-1)+1}, \omega', J^0_{-(i+1)}) - W^{4(i)}_{i+1}(h_i^{2n(n-1)+1}, \omega, J^0_{-(i+1)})$ to the transfers $\theta'$ in the initial period of Phase 4. Note that, from (15) and (16), $W^{4(i)}_{i+1}(h_i^{2n(n-1)+1}, J^0_{-(i+1)})$ has the same value for any $h_i^{2n(n-1)+1}$ satisfying (i), so finally denote it by $W^{4(i)}_{i+1}(J^0_{-(i+1)})$.

Of course, $\theta(h_i^{t-1}, \omega^t, \hat a^t, \omega^{T-k+1}, \hat a^{T-k+1})$ cannot depend directly on $J^0_{-(i+1)}$. Moreover, player $(i+1)$'s action in the initial period of Phase 5 can affect $\theta(h_i^{t-1}, \omega^t, \hat a^t, \omega^{T-k+1}, \hat a^{T-k+1})$. Hence, as in (13) and (14), define $\theta$ such that

$$
\theta'\big(h_i^{t-1}, \omega^t, \hat a^t, J^0_{-(i+1)}\big)
= \sum_{a_{-(i+1)}^{T-k+1} \in A_{-(i+1)}} \Pr\big(a_{-(i+1)}^{T-k+1} \mid J^0_{-(i+1)}\big) \sum_{\sigma_i^{T-k+1} \in \Sigma_i} m_i\big(\sigma_i^{T-k+1} \mid \omega^{T-k+1}, a^{T-k+1}\big)\, \theta\big(h_i^{t-1}, \omega^t, \hat a^t, \omega^{T-k+1}, \hat a^{T-k+1}\big),
\tag{17}
$$

which ensures (vi). The coefficient matrix $D_1$ of this system tends to the identity matrix as $\varepsilon \to 0$ and $\rho \to 0$, so the system on $\theta$, obtained from the system on $\theta'$ by replacing the vector of $\theta'$ with $D_1$ multiplied by the vector of $\theta$, still has linearly independent equations.

[Phase 4 : Case (ii)] : A similar argument guarantees condition (ii). First, consider $\theta'(h_i^{t-1}, \omega^t, \hat a^t, J^0_{-(i+1)})$ directly depending on $J^0_{-(i+1)}$, where $h_i^{t-1}$ is such that player $i$'s signals about player $(i+1)$'s messages in Phase 3 are precisely equal to the actual message profile sent by player $i+1$ in Phase 3 (i.e., the corresponding part of $\sigma_i^{\tau}$ about $(i+1)$'s action is equal to her actual action $a_{i+1}^{\tau}$). Then, to define $\theta'$, apply the same argument as in case (i) for any $h_i^{t-1}$ in which the actual message profile sent by player $i+1$ in Phase 3 differs from the report of player $i$ in Phase 2, and for any $J^0_{-(i+1)}$ such that $J^0_{-(i+1)}$ coincides with $h_i^{t-1}$ on the report of player $i$ from Phase 2. As a result, we obtain $W^{4(ii)}_{i+1}(h_i^{2n(n-1)+1}, J^0_{-(i+1)})$, player $(i+1)$'s unnormalized discounted payoff in Phase 4, augmented by the transfers $\theta'$ in Phase 4, which is independent of the initial state of Phase 4. Note that, as in case (i), the payoff $W^{4(ii)}_{i+1}(h_i^{2n(n-1)+1}, J^0_{-(i+1)})$ has the same value for any $h_i^{2n(n-1)+1}$ satisfying (ii), so denote it by $W^{4(ii)}_{i+1}(J^0_{-(i+1)})$.

Finally, define $\theta(h_i^{t-1}, \omega^t, \hat a^t, \omega^{T-k+1}, \hat a^{T-k+1})$ such that

$$
\theta'\big(h_i^{t-1}, \omega^t, \hat a^t, J^0_{-(i+1)}\big)
= \sum_{\tilde h_i^{t-1} \in H_i^{t-1}} \Pr\big(\tilde h_i^{t-1} \mid h_i^{t-1}\big) \sum_{a_{-(i+1)}^{T-k+1} \in A_{-(i+1)}} \Pr\big(a_{-(i+1)}^{T-k+1} \mid J^0_{-(i+1)}\big) \sum_{\sigma_i^{T-k+1} \in \Sigma_i} m_i\big(\sigma_i^{T-k+1} \mid \omega^{T-k+1}, a^{T-k+1}\big)\, \theta\big(\tilde h_i^{t-1}, \omega^t, \hat a^t, \omega^{T-k+1}, \hat a^{T-k+1}\big),
$$

where $\Pr(\tilde h_i^{t-1} \mid h_i^{t-1})$ is the probability of player $i$'s signals about $(i+1)$'s report in Phase 3 given the actual messages sent by player $i+1$ in Phase 3, and hence can be positive only for $\tilde h_i^{t-1}$ that may differ from $h_i^{t-1}$ only in the components corresponding to the periods in which player $i+1$ is supposed to repeat $i$'s report in Phase 3. The coefficient matrix $D_2$ of this system tends to the identity matrix as $\varepsilon \to 0$ and $\rho \to 0$, so the system on $\theta$, obtained from the system on $\theta'$ by replacing the vector of $\theta'$ with $D_2$ multiplied by the vector of $\theta$, still has linearly independent equations.

[Phase 4 : Case (∗)] : Now consider the case in which player $i$'s report in Phase 2 coincides with both the message profile sent by player $i+1$ in Phase 3 and $J^0_{-(i+1)}$. We label this case as (∗). As usual, consider $\theta'(h_i^{t-1}, \omega^t, \hat a^t, J^0_{-(i+1)})$ directly depending on $J^0_{-(i+1)}$, and, as in the proof of Lemma A.3, define a strategy $r^G_{i+1}(s^M_{-\{i,i+1\}}) \in S_{i+1}$, where $M$ is the message profile sent by player $i$ in Phase 2. Then we can make $\theta'(h_i^{t-1}, \omega^t, \hat a^t, J^0_{-(i+1)})$ guarantee (11) and (12) in a manner similar to the proof of Lemma A.3. Denote by $W^{4(*)}_{i+1}(h_{i+1}^{2n(n-1)+1}, \omega^{2n(n-1)+2}, J^0_{-(i+1)})$ player $(i+1)$'s unnormalized discounted payoff in Phase 4, augmented by the transfers in Phase 4. Finally, as usual, define $\theta(h_i^{t-1}, \omega^t, \hat a^t, \omega^{T-k+1}, \hat a^{T-k+1})$ as in (17). The resulting system on $\theta$ has linearly independent equations.

As in HO, the coefficient matrix of the systems in the cases (i), (ii), and (∗), even when combined together, still has linearly independent rows as $\varepsilon \to 0$ and $\rho \to 0$, so the system can be solved simultaneously for $\theta$. To summarize, $(i+1)$'s discounted payoff in Phase 4, augmented by the transfers in Phase 4, is

$$
W^4_{i+1}\big(h_{i+1}^{2n(n-1)+1}, \omega^{2n(n-1)+2}, J^0_{-(i+1)}\big) :=
\begin{cases}
W^{4(i)}_{i+1}\big(J^0_{-(i+1)}\big) & \text{if (i) holds},\\[2pt]
W^{4(ii)}_{i+1}\big(J^0_{-(i+1)}\big) & \text{if (ii) holds},\\[2pt]
W^{4(*)}_{i+1}\big(h_{i+1}^{2n(n-1)+1}, \omega^{2n(n-1)+2}, J^0_{-(i+1)}\big) & \text{if }(*)\text{ holds}.
\end{cases}
$$

By setting $\bar a^t = r^G_{i+1}(s^M_{-(i+1)})[h_{i+1}^{t-1}, \omega^t]$, where $h_{i+1}^{t-1}$ is equal to $h_i^{t-1}$ up to the ordering of actions and signals, in the construction of the transfers (15) and (16) in the above cases (i) and (ii), we can ensure

$$
W^{4(i)}_{i+1}\big(J^0_{-(i+1)}\big) = W^{4(ii)}_{i+1}\big(J^0_{-(i+1)}\big) < W^{4(*)}_{i+1}\big(h_{i+1}^{2n(n-1)+1}, \omega^{2n(n-1)+2}, J^0_{-(i+1)}\big),
$$

for all $h_{i+1}^{2n(n-1)+1}$, $\omega^{2n(n-1)+2}$, and $J^0_{-(i+1)}$.

[Phase 3] : We show that player $i+1$ strictly prefers to repeat truthfully her signals about $i$'s report in Phase 2 in the periods in which she is supposed to do so. First, for the periods in which player $i+1$ is supposed to uniformly randomize over all her actions, we make player $i+1$ indifferent across all her actions by setting transfers $\theta'(h_i^{t-1}, \omega^t, \hat a^t, J^0_{-(i+1)})$ as follows: for all $(h_{i+1}^{t-1}, \omega^t) \in H_{i+1}^{R,t-1/2}$ and $a^t \in A$ in such a period $t$,

$$
u_{i+1}(\omega^t, a^t) + \sum_{h_i^{t-1} \in H_i^{t-1}} \sum_{\sigma_i^t \in \Sigma_i} \Pr\big(h_i^{t-1} \mid h_{i+1}^{t-1}, \omega^t\big)\, m_i\big(\sigma_i^t \mid \omega^t, a^t\big)\, \theta'\big(h_i^{t-1}, \omega^t, a_i^t, \sigma_i^t, J^0_{-(i+1)}\big)
+ \delta E\Big[ W_{i+1}\big(h_{i+1}^t, \omega^{t+1}, J^0_{-(i+1)}\big) \,\Big|\, h_{i+1}^{t-1}, \omega^t, a^t \Big]
\tag{18}
$$
$$
= \max_{a^t \in A} \Big\{ u_{i+1}(\omega^t, a^t) + \delta E\Big[ W_{i+1}\big(h_{i+1}^t, \omega^{t+1}, J^0_{-(i+1)}\big) \,\Big|\, h_{i+1}^{t-1}, \omega^t, a^t \Big] \Big\},
$$

where $\Pr(h_i^{t-1} \mid h_{i+1}^{t-1}, \omega^t)$ is player $(i+1)$'s belief about $h_i^{t-1}$ given $(h_{i+1}^{t-1}, \omega^t)$, and $W_{i+1}(h_{i+1}^t, \omega^{t+1}, J^0_{-(i+1)})$ is player $(i+1)$'s auxiliary unnormalized continuation payoff from period $t+1$ on (until the end of Phase 5). This system has a coefficient matrix with rows indexed by $H_{i+1}^{R,t-1/2} \times A$ and columns indexed by $H_i^{t-1} \times \Omega \times A_i \times \Sigma_i$, tending to full row rank as $\varepsilon \to 0$, $\rho \to 0$, and $\varepsilon/\rho \to 0$; hence it has a unique solution for $\theta'$ once we set $\theta'(h_i^{t-1}, \omega^t, \hat a_i^t, J^0_{-(i+1)}) = 0$ for all $(h_i^{t-1}, \omega^t) \in H_i^{E,t-1/2}$. Finally, define $\theta$ as above.

Second, for a period $t$ in which player $i+1$ is supposed to truthfully repeat, let $\theta(h_i^{t-1}, \omega^t, \hat a^t) = 0$ when player $i+1$ correctly repeats the corresponding action of player $i$'s report in Phase 2. Otherwise let $\theta(h_i^{t-1}, \omega^t, \hat a^t) = c$, where $c < -\big(\max_{\omega,a} u_{i+1}(\omega,a) - \min_{\omega,a} u_{i+1}(\omega,a)\big)$. Then player $i+1$ has an incentive to repeat truthfully: for small enough $\varepsilon$, player $i+1$ cannot gain from incorrectly repeating in terms of her flow payoff (i.e., $u_{i+1}$ plus the transfer). Moreover, for large enough $T$, the auxiliary unnormalized continuation payoff $W_{i+1}$ is close to $W^4_{i+1}$, so it is higher when truthfully repeating than when incorrectly repeating.¹⁶

¹⁶ As argued in HO, in order to make $(i+1)$'s belief that case (∗) obtains close to 1 for each regular history $h_{i+1}^{t-1} \in H_{i+1}^{R,t-1}$ in Phase 3, it is important to assume (i) the perturbation $\rho$ with $\varepsilon/\rho \to 0$, (ii) $\Sigma_i = A_{-i}$, and (iii) sequential repetition in Phase 3.
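For the truthful-repetition periods, the flow comparison behind this incentive can be written out explicitly (a sketch, using the constant $c$ defined above). Since $c < -(\max_{\omega,a} u_{i+1}(\omega,a) - \min_{\omega,a} u_{i+1}(\omega,a))$, an incorrect repetition that triggers the penalty yields a flow payoff

$$
u_{i+1}(\omega^t,a^t) + c \;<\; u_{i+1}(\omega^t,a^t) - \Big(\max_{\omega,a} u_{i+1}(\omega,a)-\min_{\omega,a} u_{i+1}(\omega,a)\Big) \;\le\; \min_{\omega,a} u_{i+1}(\omega,a) \;\le\; u_{i+1}(\omega^t,\tilde a^t),
$$

for any profile $\tilde a^t$ in which player $i+1$ repeats correctly (and so receives the transfer $0$ rather than $c$). For $\varepsilon$ small the penalty is triggered with probability close to one after an incorrect repetition, so truthful repetition is strictly better in flow terms; and, roughly, an incorrect repetition pushes the history towards case (ii), whose Phase 4 payoff $W^{4(ii)}_{i+1}$ lies strictly below $W^{4(*)}_{i+1}$, so for $T$ large the continuation comparison points the same way.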


[Phase 1 and 2] : By the same argument as in (18), we can make player $i+1$ indifferent across all her actions in each period of these phases. Finally, the continuity and boundedness of $\pi^G_{i+1}$ can be proven by an argument similar to that in Lemma A.3. ∎
