Repeated Games with General Time Preference∗

Ichiro Obara, University of California, Los Angeles

Jaeok Park, Yonsei University

February 25, 2017

Abstract

We introduce general time preferences into repeated games with perfect monitoring. A profile of strategies is called an agent subgame perfect equilibrium if all one-shot deviation constraints are satisfied at every history. We present codes of behavior that prescribe worst equilibrium punishment paths depending on the current action profile. We provide a scenario in which present-biased players cannot be minmaxed. We develop a recursive characterization of agent subgame perfect equilibrium payoffs. We extend codes of behavior and the recursive characterization to dynamic games where a state variable affects stage-game payoffs and time preferences. Lastly, we discuss various equilibrium concepts for players with time-inconsistent preferences.

JEL Classification: C73

Keywords: Agent Subgame Perfect Equilibrium, Perfect Monitoring, Recursive Preference, Repeated Games, Semi-Hyperbolic Discounting, Time Inconsistency, Time Preference.

1 Introduction

For any dynamic model in economics, a good understanding of the trade-offs between current cost (benefit) and future benefit (cost) is essential. A repeated or dynamic game, which is the topic of this paper, is no exception. Each agent’s incentive constraint is essentially a comparison of the deviation gain today against the loss of future payoffs. To evaluate such trade-offs, we need to introduce some time preference into the model. We usually rely on a particular model of time preference: the discounted utility (DU) model, where each agent’s period utilities are discounted at some constant rate and then aggregated across periods. There is a good reason for using the DU model. It is simple, so easy to use

∗ Obara gratefully acknowledges support from National Science Foundation grant SES-1135711. Park gratefully acknowledges support from the Yonsei University Future-leading Research Initiative of 2015 (2015-220157).


and easy to interpret, and it is time consistent. It also seems to describe the behavior of people reasonably well. Nonetheless it has been observed many times, empirically and experimentally, that the DU model is at odds with the actual behavior of people in a variety of circumstances. Not surprisingly, a plethora of time-preference models, such as hyperbolic discounting and reference-dependent preferences (e.g., habit formation), have been proposed by psychologists and behavioral economists. Some of them have been successful in explaining many “anomalies” that cannot be explained by the standard DU model (Frederick, Loewenstein and O’Donoghue [5]). Although these time preferences usually take simple parametric forms, the analysis of a dynamic model with a long horizon can quickly become very complex under them. One reason for this is the time inconsistency of those preferences. This is probably partly why the most popular class of non-DU time preferences is quasi-hyperbolic discounting (Phelps and Pollak [15]; Laibson [8, 9]; and O’Donoghue and Rabin [11, 12]), which is a minimal departure from the DU model. Therefore it is important to develop machinery to analyze models with more general time preferences and to establish a mapping between various preferences and possible outcomes in different decision problems and games. By doing so, we can examine the robustness of known results under those preferences and may be able to find a new class of useful non-DU time preferences. This paper introduces general time preferences into repeated games with perfect information. We allow time preferences to be nonseparable across time and time inconsistent. Later we also introduce a state variable to allow past behavior to affect both intertemporal preference and within-period preference (i.e., stage-game payoffs). In particular, preferences can be of a habit-formation type. When players are time inconsistent, there is a conflict of interest between the current self and future selves. Then, as is well known, the notion of an optimal dynamic plan becomes unclear even for single-person decision problems. A player’s optimal decision in the current period depends on his beliefs about the behavior of his future selves. In the context of games, this means that the set of possible deviations at each point in time is ambiguous. In this paper, we take the multiple-selves approach and assume that each player believes that all future selves stick to the equilibrium strategy whenever he contemplates a deviation. That is, we apply Strotz-Pollak equilibrium (Peleg and Yaari [14]) to repeated games, as in Chade, Prokopovych and Smith [4]. We call it agent subgame perfect equilibrium (ASPE) because in this notion each player can be regarded as a collection of agents who take an action only once, at a particular history.1

1 By definition, we only need to check one-shot deviations for ASPE. With geometric discounting, it is without loss of generality to focus on one-shot deviations. If all one-shot deviation constraints are satisfied, then all incentive constraints are satisfied in this case. This “one-shot deviation principle” does not hold with time-inconsistent preferences.


We first examine the structure of agent subgame perfect equilibrium. With the standard DU model, it is without loss of generality to use the harshest punishment path after any deviation for any subgame perfect equilibrium in a repeated game with perfect information. Hence there exists a profile of punishment paths (one for each player) that can be used to support any equilibrium path and is therefore self-sustaining (called an optimal penal code in Abreu [1]). We derive a similar result for repeated games with general time preferences. We show that there is a code of behavior for each player that assigns a punishment path to the player as a function of the current action profile upon his unilateral deviation. This code of behavior is universal in the sense that it can be used to assign a punishment path at any history for any equilibrium, like an optimal penal code in Abreu [1]. Next, to illustrate that allowing for general time preferences can generate a surprising but intuitive result, we assume a specific preference, namely quasi-hyperbolic discounting, and show that a present-biased player cannot be minmaxed. Then we provide a recursive characterization of equilibrium payoffs. This result is an extension of Abreu, Pearce and Stacchetti’s [2] characterization to the case of general time preferences with perfect information. A strength of Abreu, Pearce and Stacchetti’s [2] results lies in providing an algorithm to compute the entire equilibrium payoff set. Our results provide a similar algorithm, which is implementable when the stage game is a finite game (with finite actions and states). Lastly, we extend our results on codes of behavior and the recursive characterization to dynamic games, where a state variable affects both payoffs within each period and intertemporal preferences. The paper that is most closely related to our work is the one by Chade, Prokopovych and Smith [4], where the authors provide a recursive characterization of Strotz-Pollak equilibrium (agent subgame perfect equilibrium) with quasi-hyperbolic discounting and examine monotonicity properties of the equilibrium payoff set in the context of repeated games with perfect information. Our recursive characterization extends their characterization to repeated and dynamic games with more general time preferences. In a companion paper (Obara and Park [10]), we study strongly symmetric ASPE of repeated symmetric games with general discounting and apply the recursive characterization developed in this paper to a particular setting. There are some recent papers that introduce non-DU, time-consistent time preferences into repeated games. Kochov and Song [6] study repeated games with endogenous discounting, obtaining a folk theorem and analyzing the structure of efficient paths. Sekiguchi and Wakai [16] examine a repeated Cournot duopoly game with a recursive preference under which firms discount gains and losses in an asymmetric way, and they investigate the effect of the gain/loss asymmetry on the best and worst equilibria. We introduce the model in the next section. We present our results on the equilibrium


structure in Section 3 and the recursive characterization in Section 4. We extend our results to dynamic games in Section 5. We discuss various equilibrium notions in Section 6. All the proofs are provided in the Appendix.

2 The Model

2.1 Repeated Games with General Time Preference

Let N = {1, . . . , n} be the set of n players. Player i's action set A_i is a compact metric space for every i ∈ N. Player i's stage-game payoff function is given by g_i : A → R, where A = A_1 × · · · × A_n. It is assumed that each g_i is continuous. This defines the stage game Γ = (N, A, g), where g = (g_1, . . . , g_n). We assume that there exists a pure strategy Nash equilibrium of Γ. Time is discrete and denoted by t = 1, 2, . . .. In each period, players choose actions simultaneously given complete knowledge of past actions. Let a^t denote the action profile chosen in period t, for t = 1, 2, . . .. A history at the beginning of period t ≥ 2 is h^t = (a^1, . . . , a^{t−1}) ∈ H^t := A^{t−1}, while the initial history is written as h^1 = ∅. Let H = ∪_{t=1}^∞ H^t with H^1 := {∅} be the set of all histories. We only consider pure strategies. Thus player i's strategy s_i is a mapping from H to A_i. Let S_i be the set of player i's strategies. For any s_i ∈ S_i and h ∈ H, s_i|_h ∈ S_i is player i's continuation strategy after history h. Each strategy profile s ∈ S := S_1 × · · · × S_n generates a sequence of action profiles, or a path, in A^∞. A generic element of A^∞ is denoted by a^∞ = (a^1, a^2, . . .). The sequence of action profiles generated by s ∈ S is denoted by a(s) = (a^1(s), a^2(s), . . .) = (s(h^1), s(s(h^1)), s((s(h^1), s(s(h^1)))), . . .), where s(h) = (s_1(h), . . . , s_n(h)) for any h ∈ H. For any given sequence x^∞ = (x^1, x^2, . . .) in A^∞ or R^∞, we use x^{t,∞} to denote (x^t, x^{t+1}, . . .) and x^{t,t+k} to denote (x^t, x^{t+1}, . . . , x^{t+k}) for any t, k = 1, 2, . . .. We assume that player i's time preference at any history h^t ∈ H over sequences of future stage-game payoffs (g_i(a^t), g_i(a^{t+1}), . . .) ∈ R^∞ is represented by a mapping W_i(·|h^t) : R^∞ → R. To simplify notation, we often denote player i's repeated game payoff given a path a^∞ = (a^1, a^2, . . .) ∈ A^∞ by W_i(a^∞|h) instead of W_i(g_i(a^1), g_i(a^2), . . . |h). Given a strategy profile s ∈ S and history h ∈ H, player i's repeated game payoff is given by W_i(g_i(a^1(s)), g_i(a^2(s)), . . . |h), which we denote simply by W_i(a(s)|h) or W_i(s|h). A stage game Γ and a profile of time preferences W = (W_1, . . . , W_n) define a repeated game, which is denoted by (Γ, W).
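To fix ideas, the following minimal sketch shows how a pure strategy profile s : H → A generates the path a(s) = (s(h^1), s(s(h^1)), . . .) by repeatedly feeding the realized history back into s. The two-player game, the action labels, and all function names are illustrative assumptions of ours and do not appear in the paper.

```python
from typing import Callable, Tuple

Action = str
Profile = Tuple[Action, ...]             # one action per player
History = Tuple[Profile, ...]            # h^1 = () is the empty history
Strategy = Callable[[History], Profile]  # a pure strategy profile s : H -> A

def path(s: Strategy, length: int) -> Tuple[Profile, ...]:
    """Generate the first `length` action profiles of a(s)."""
    history: History = ()
    profiles = []
    for _ in range(length):
        a = s(history)             # a^t = s(h^t)
        profiles.append(a)
        history = history + (a,)   # h^{t+1} = (h^t, a^t)
    return tuple(profiles)

# Example: grim trigger in a two-player game with actions "C" and "D".
def grim_trigger(h: History) -> Profile:
    defected = any(a != ("C", "C") for a in h)
    return ("D", "D") if defected else ("C", "C")

print(path(grim_trigger, 5))  # all ('C', 'C'), since no one ever defects on this path
```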

2.2 Various Assumptions on Time Preferences

In this subsection, we present the list of assumptions on time preferences that we use. We first assume that a player's time preference is stationary in the sense that it does not depend on the past history.

(A1: Stationarity) For any g^∞ ∈ R^∞ and h, h′ ∈ H, W_i(g^∞|h) = W_i(g^∞|h′).

This assumption means that a player evaluates a sequence of future payoffs in exactly the same way at any time at any history. Given this assumption, we can denote player i's stationary preference simply by W_i(·), dropping its dependence on histories. This does not mean that we do not consider any possibility of past behavior affecting current preference. We will allow for such possibilities by introducing state variables in Section 5. We impose two other assumptions as follows.2

(A2: Continuity at Infinity) For any g^∞(k) → g^∞ ∈ R^∞, T(k) → ∞, and x ∈ R, lim_{k→∞} W_i(g^{1,T(k)}(k), x, x, . . .) = W_i(g^∞).

(A3: Monotonicity) For any g^∞, g̃^∞ ∈ R^∞, W_i(g̃^∞) ≥ (>) W_i(g^∞) if g̃^∞ ≥ (>) g^∞.3

The second assumption captures the idea that payoffs in the far future are not so important. Hence limit-of-means type preferences are excluded. The third assumption guarantees that playing a stage-game Nash equilibrium at every history is an equilibrium of the repeated game (Γ, W). The above three assumptions are always maintained for every player throughout the paper and will not be explicitly mentioned again. If W_i satisfies the following condition in addition to (A1)–(A3), we call it recursive.

(A4: Recursivity) There exists F_i : R^2 → R such that, for any g^∞ ∈ R^∞, W_i(g^∞) = F_i(g^1, W_i(g^{2,∞})).

We only need the following weaker version of recursivity for our recursive characterization of the equilibrium payoff set.

(A5: K-Recursivity) There exists Ŵ_i : R^∞ → R that is recursive and G_i : R^{K+1} → R such that, for any g^∞ ∈ R^∞, W_i(g^∞) = G_i(g^{1,K}, Ŵ_i(g^{K+1,∞})).

Intuitively, this means that the first K action profiles can affect player i's total payoff in an arbitrary way, but action profiles from the (K + 1)th period on affect his payoff through some summary statistic that is recursive.4 We refer to the recursive component Ŵ_i as the score to distinguish it from the payoff W_i. As is the case with W_i, we sometimes use action profiles instead of stage-game payoffs in the arguments of F_i and G_i. The class of time preferences that satisfy (A1)–(A3) and (A5) is large. In particular, we do not assume time separability, and we allow for time-inconsistent preferences. Clearly the standard geometric discounting model is a special case. Another special case, with K = 1 and W_i = Ŵ_i, corresponds to the standard recursive preference axiomatized by Koopmans [7].

2 Every product space is equipped with the product topology.
3 For any x^∞, y^∞ ∈ R^∞, x^∞ ≥ y^∞ means x^t ≥ y^t for all t, and x^∞ > y^∞ means x^t ≥ y^t for all t with strict inequality for some t.
4 Note that F_i and G_i must be increasing in their arguments.


Here is one example of preferences that satisfy all the above assumptions, which would be useful for many applications:

W_i(g^\infty) = \sum_{k=1}^{K} \frac{1}{k}\, g^k + \frac{1}{K+1} \sum_{k=K+1}^{\infty} \delta^{k-K} g^k.

This preference discounts payoffs in the first K periods as a hyperbolic discounting preference does, and then discounts payoffs in the further future at the constant rate δ ∈ (0, 1), as standard geometric discounting does.5,6 This preference reduces to geometric discounting when K = 0, to β–δ discounting with β = 1/2 when K = 1, and approximates genuine hyperbolic discounting when K is large.
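For concreteness, setting K = 1 in the displayed formula gives

W_i(g^\infty) \;=\; g^1 + \frac{1}{2}\sum_{k=2}^{\infty} \delta^{k-1} g^k \;=\; g^1 + \beta \sum_{k=2}^{\infty} \delta^{k-1} g^k \quad\text{with } \beta = \tfrac{1}{2},

which is quasi-hyperbolic (β–δ) discounting with β = 1/2 (up to the usual normalization of the weights), while for K = 0 the first sum is empty and only the geometric sum \sum_{k=1}^{\infty} \delta^{k} g^k remains. This is a direct specialization of the formula above; no additional assumptions are involved.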

2.3 Agent Subgame Perfect Equilibrium

In standard subgame perfect equilibrium, each player can use any possible continuation strategy at any history as a deviation. Without time consistency, however, it is not obvious which types of deviations a player would think possible, since it may be the case that some deviation is profitable from the viewpoint of the current self, but not from the viewpoint of future selves. A player in the standard notion of subgame perfect equilibrium would be “naive” in the sense that he believes that all the future selves follow his deviation plan. In this paper, we make the opposite extreme assumption: we assume that each player believes that future selves (as well as all the other players) would stick to the equilibrium strategy if he deviates in the current period. One interpretation of this assumption is that each player is a collection of players (one for each history) who share the same preference and act only once.7 This type of equilibrium notion is standard and used in numerous papers such as Phelps and Pollak [15] and Peleg and Yaari [14].8 We adopt this notion of equilibrium and call it agent subgame perfect equilibrium. The formal definition is as follows.

Definition 1 (Agent Subgame Perfect Equilibrium). A strategy profile s ∈ S is an agent subgame perfect equilibrium (ASPE) of the repeated game (Γ, W) if for any h^t ∈ H,

W_i(a(s|_{h^t})) ≥ W_i((a_i', a^1_{-i}(s|_{h^t})), a(s|_{(h^t, (a_i', a^1_{-i}(s|_{h^t})))}))

5 We cannot have pure hyperbolic discounting in an infinite horizon model because the first sum would explode as K → ∞. We need to taper off future payoffs in some way to guarantee convergence to a finite value.
6 This preference belongs to the class of semi-hyperbolic preferences proposed by Olea and Strzalecki [13] in that it is time separable and becomes geometric after some period.
7 In fact, this equilibrium is just a standard subgame perfect equilibrium if agents at different histories are different players.
8 See also O’Donoghue and Rabin [11]. See Section 6 for a discussion of other (less extreme) equilibrium concepts.


for any i ∈ N and a_i' ∈ A_i.9 We know that the one-shot deviation principle applies with geometric discounting (or more generally with recursive preferences and continuity), and hence there is no gap between ASPE and subgame perfect equilibrium. In our setting with more general time preferences, the one-shot deviation principle fails. In fact, the set of ASPE is strictly larger than that of subgame perfect equilibria.10 Indeed, there can be an ASPE that is not even a Nash equilibrium. As mentioned before, there exists an agent subgame perfect equilibrium of (Γ, W) because we assume the existence of a pure strategy Nash equilibrium of Γ and the monotonicity of W. We denote the set of ASPE payoffs by E ⊂ R^n and the set of ASPE paths by A^E ⊂ A^∞.
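As a concrete illustration of Definition 1, the sketch below numerically checks the one-shot deviation constraints for a grim-trigger profile in a prisoner's dilemma under the semi-hyperbolic preference of Section 2.2. The payoff numbers, the truncation horizon, and all names are our own choices for illustration, not objects from the paper; the infinite tail is truncated, which is harmless for δ well below 1. By symmetry and stationarity (A1), these two checks for player 1 cover all the one-shot deviation constraints of this particular profile.

```python
# Prisoner's dilemma stage payoffs for player 1 (the game is symmetric).
g = {("C", "C"): 2.0, ("C", "D"): -1.0, ("D", "C"): 3.0, ("D", "D"): 0.0}

K, delta, T = 3, 0.9, 400  # semi-hyperbolic parameters and truncation horizon

def W(payoffs):
    """Semi-hyperbolic preference of Section 2.2, truncated at T periods:
       sum_{k<=K} (1/k) g^k + (1/(K+1)) sum_{k>K} delta^(k-K) g^k."""
    total = 0.0
    for k, gk in enumerate(payoffs, start=1):
        total += gk / k if k <= K else (delta ** (k - K)) * gk / (K + 1)
    return total

def stream(first, rest):
    """Player 1's payoff stream: profile `first` today, then profile `rest` forever."""
    return [g[first]] + [g[rest]] * (T - 1)

# On-path check: (C,C) forever vs. defect once, then (D,D) forever.
on_path = W([g[("C", "C")]] * T)
deviate = W(stream(("D", "C"), ("D", "D")))
print("cooperation sustainable:", on_path >= deviate)

# Punishment-phase check: (D,D) forever vs. play C once, then (D,D) forever.
punish = W([g[("D", "D")]] * T)
dev_pun = W(stream(("C", "D"), ("D", "D")))
print("punishment self-enforcing:", punish >= dev_pun)
```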

3 Structure of Agent Subgame Perfect Equilibrium

3.1 Structure of Worst Equilibria: Generalization of Optimal Penal Codes

In repeated games with perfect information, one punishment path for each player, associated with his worst equilibrium payoff, can be used to deter any deviation by him from any equilibrium path. That is, it is without loss of generality (in terms of characterizing the set of all equilibrium paths and payoffs) to use the same punishment path and the same punishment strategy after any unilateral deviation by a given player. So there is a punishment path for each player that is invariant across different histories and different equilibria. Hence every equilibrium can be described by n + 1 paths of action profiles: one for the equilibrium path and one punishment path for each player (Abreu [1]). We generalize this result to our repeated game setting with general time preference. Our result provides a foundation for developing further results. Note that, with general time preference, the current action profile may affect how future payoffs are evaluated due to non-separability. Hence there is no well-defined worst equilibrium in our setting: which equilibrium is worst depends on the choice of current actions. This means that, at a minimum, different punishments must be used to deter different unilateral deviations. We can show, however, that there exists a worst punishment for each player that depends only on the current action profile and is otherwise universal. A code of behavior for player i is a mapping c_i : A → A^∞. It assigns a punishment path when player i unilaterally deviates from any equilibrium path and, as a result, a ∈ A is played in the current period. For any path a(0) ∈ A^∞ and codes of behavior (c_1, ..., c_n), let s(a(0), c_1, ..., c_n) be a profile of strategies where (1) the players follow a(0) initially as long as there is no unilateral deviation and (2) the players start playing c_i(a) when player i unilaterally deviates from a(0) or from the path prescribed by the current code of behavior and the realized current action profile is a. When each c_i is a constant function, i.e., c_i(a) = a(i) for some path a(i) ∈ A^∞ for every a ∈ A, the strategy profile s(a(0), c_1, ..., c_n) reduces to a simple strategy in Abreu [1].11 Our theorem shows that there exists a code of behavior that can be used as a punishment at any history in any ASPE without loss of generality.

Theorem 1. There exists a profile of codes of behavior (c_1^*, . . . , c_n^*) such that

1. s(c_i^*(a), c_1^*, . . . , c_n^*) is an agent subgame perfect equilibrium for every i ∈ N and every a ∈ A.

2. s(a^∞, c_1^*, . . . , c_n^*) is an agent subgame perfect equilibrium if a^∞ ∈ A^∞ is the limit of a sequence of ASPE paths.

Proof. This is a corollary of Theorem 4.

The second part of the theorem and continuity imply the following corollary.

Corollary 1. The set of ASPE paths A^E and the set of ASPE payoffs E are both compact.

This theorem is not difficult to prove when the stage game is a finite game. As was the case in Abreu [1], proving the existence of such codes of behavior becomes nontrivial when A is not finite, which is of course a very important case for many applications.

9 We follow the convention of using the subscript −i to denote a profile without player i's component.
10 Chade, Prokopovych and Smith [4] show that the sets of equilibrium payoffs for these two notions of equilibrium still coincide for present-biased β–δ discounting.
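To make the construction s(a(0), c_1, ..., c_n) concrete, here is a minimal sketch of how such a profile can be executed: it tracks the currently prescribed path and, after a unilateral deviation by player i resulting in action profile a, restarts play from c_i(a). Representing paths as Python functions and all names here are illustrative assumptions of ours, not part of the paper's formal construction.

```python
from typing import Callable, List, Tuple

Profile = Tuple[str, ...]
Path = Callable[[int], Profile]      # an infinite path, indexed by t = 0, 1, 2, ...
Code = Callable[[Profile], Path]     # a code of behavior c_i : A -> A^infinity

def prescriptions(initial_path: Path, codes: List[Code],
                  realized: List[Profile]) -> List[Profile]:
    """Replay a sequence of realized action profiles and return, period by period,
    the profile prescribed by s(a(0), c_1, ..., c_n)."""
    current, pos, out = initial_path, 0, []
    for a in realized:
        target = current(pos)
        out.append(target)
        deviators = [i for i, (ai, ti) in enumerate(zip(a, target)) if ai != ti]
        if len(deviators) == 1:
            # Unilateral deviation by player i: play the punishment path c_i(a) next period.
            current, pos = codes[deviators[0]](a), 0
        else:
            # Conformity, or a joint deviation: the current path simply continues.
            pos += 1
    return out

# Example with constant paths (the simple-strategy case of Abreu [1]):
coop: Path = lambda t: ("C", "C")
punish: Path = lambda t: ("D", "D")
codes: List[Code] = [lambda a: punish, lambda a: punish]
print(prescriptions(coop, codes, [("C", "C"), ("D", "C"), ("D", "D")]))
# -> [('C', 'C'), ('C', 'C'), ('D', 'D')]
```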

3.2 Present-Biased Players Cannot be Minmaxed

Suppose that each player's time preference is given by β–δ discounting. Normalize every player's pure strategy minmax payoff in the stage game to 0. Let \underline{v}_i be the worst equilibrium continuation payoff for player i.12 The following theorem shows that a player's continuation payoff cannot be reduced to his minmax level in any ASPE when the player is present-biased (i.e., β < 1).

Theorem 2. Assume β–δ discounting with β < 1. Suppose that there is a unique pure minmax action profile for player i ∈ N, denoted by a^i ∈ A. Suppose further that there is some player j ≠ i such that g_j(a^i) < 0. Then \underline{v}_i > 0.

Proof. See the Appendix.

11 This is in fact the case when each W_i is time separable or recursive because then the current action profile does not affect the rankings of continuation paths.
12 Since β–δ discounting is time separable, the worst equilibrium continuation payoff does not depend on the current action profile, as mentioned in footnote 11. Also, since β–δ discounting is K-recursive with K = 1, the worst equilibrium continuation payoff is equal to the worst equilibrium score.


The intuition for this result is as follows. It is not difficult to show that \underline{v}_i ≥ 0, as usual. To achieve 0 as player i's worst equilibrium continuation payoff, a^i_{−i} must be played by the other players in the first period of the worst punishment equilibrium. Suppose that g_i(a) < 0 in the first period, that is, a_i is not a best response to a_{−i} = a^i_{−i}. Then the incentive constraint implies that player i must be given a large enough “reward” R as the continuation payoff for playing a_i now so that his total payoff is nonnegative, i.e., (1 − δ)g_i(a) + βδR ≥ 0. Now go back one period and evaluate this sequence of punishment payoffs from the viewpoint of a player who contemplates a deviation expecting that his deviation is followed by such a punishment. Then the value of (g_i(a), R) is (1 − δ)g_i(a) + δR, which is strictly positive. The present bias at the beginning of the punishment requires the reward R to be very large, in fact large enough to make this punishment less harsh and less effective from the viewpoint of one period earlier. If a_i is a best response to a^i_{−i}, then there is another possibility for reducing player i's payoff to 0: play a^i in every period. But this is impossible by the assumption that g_j(a^i) < 0 for some j ≠ i. What does this theorem mean? In one sense, a player's present bias works in his favor because it pushes up a lower bound on his equilibrium continuation payoff. This effect would be especially important in settings such as the principal-agent model. On the other hand, present bias may work against the player in other environments because a less effective punishment may imply weaker incentives. Lastly, we note that a player's equilibrium continuation payoff can be reduced below his minmax level (\underline{v}_i < 0) when he has the opposite bias and puts less weight on the current payoff (β > 1).13
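To spell out the arithmetic behind this intuition: the incentive constraint (1 − δ)g_i(a) + βδR ≥ 0 gives δR ≥ −(1 − δ)g_i(a)/β, so the value of the punishment evaluated one period earlier satisfies

(1-\delta)g_i(a) + \delta R \;\ge\; (1-\delta)g_i(a)\Bigl(1 - \frac{1}{\beta}\Bigr) \;>\; 0,

since g_i(a) < 0 and 1 − 1/β < 0 when β < 1. This is simply a rearrangement of the constraint displayed above and involves no further assumptions.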

4 Recursive Characterization of Agent Subgame Perfect Equilibrium

Suppose that each W_i satisfies K-recursivity (A5) (in addition to (A1)–(A3)). In this section, we provide a recursive characterization of ASPE payoffs in this case. In the standard case with geometric discounting, the set of action profiles that can be supported in the current period depends on the set of possible continuation payoffs. If a set of possible continuation payoffs from the next period is given, then we can derive a set of discounted average payoffs from the current period that can be sustained subject to all one-shot deviation constraints, by the recursivity of geometric discounting. This defines a mapping from a set of possible continuation payoffs from the next period to a set of incentive compatible payoffs from the current period. The maximal fixed point of this mapping is the set of equilibrium payoffs. This is the key logic of Abreu, Pearce and Stacchetti [2].

13 Obara and Park [10] show that the worst equilibrium continuation payoff can be greater (resp. smaller) than the minmax payoff with present (resp. future) bias in another setting.
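For comparison, recall the standard construction that this section generalizes (this restates the decomposition of Abreu, Pearce and Stacchetti [2] under geometric discounting and introduces nothing new): a payoff vector v is generated from a set W ⊂ R^n of continuation payoffs by an action profile a and a selection w : A → W if, for every i ∈ N and a_i' ∈ A_i,

v_i = (1-\delta) g_i(a) + \delta w_i(a) \qquad\text{and}\qquad (1-\delta) g_i(a) + \delta w_i(a) \;\ge\; (1-\delta) g_i(a_i', a_{-i}) + \delta w_i(a_i', a_{-i}),

and the set of equilibrium payoffs is the largest bounded set that is contained in the set of payoffs it generates.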


Our approach is similar, but employs a fixed point argument in a larger space. To illustrate our approach, assume that K = 2, which is a simple but still interesting case. When K = 2, player i's preference is given by W_i(g^∞) = G_i(g^1, g^2, Ŵ_i(g^{3,∞})). Hence the set of action profiles that can be supported in the current period is determined not by the set of continuation payoffs from the 2nd period, but by the set of possible pairs of an action profile in the 2nd period and a continuation score from the 3rd period. Denote this set of possible pairs by V ⊂ A × R^n, where (a, v) ∈ V means that v can be supported as a continuation score from the 3rd period when a is played in the 2nd period. Our approach is to look for a self-generating mapping in the space of sets of such pairs. Notice that, if we are given V, then we can derive a set of possible pairs of a current action profile and a continuation score from the 2nd period subject to one-shot deviation constraints for the current action profile. This defines a new set V′. As one might expect, the maximal fixed point V* of this mapping provides the set of all possible pairs of an action profile and a continuation score in equilibrium. Once we know V*, it is easy to characterize all the ASPE payoffs. This idea generalizes to the case with K > 2. In general, we consider a set of pairs of K − 1 action profiles (from the 2nd period to the Kth period) and a continuation score from the (K + 1)th period. Given any such set, we can derive a set of possible pairs of K − 1 action profiles from the current period and a continuation score from the Kth period subject to the incentive constraints in the current period. Now we introduce formal notation to state our result. Let \bar{g} = max_{i,a} g_i(a) and \underline{g} = min_{i,a} g_i(a). Let \bar{W}_i = Ŵ_i(\bar{g}, \bar{g}, . . .) and \underline{W}_i = Ŵ_i(\underline{g}, \underline{g}, . . .) for i = 1, . . . , n, and let V† = ∏_{i=1}^n [\underline{W}_i, \bar{W}_i]. Since W_i is monotone, every feasible profile of continuation scores is included in V†. Let V be a set in A^{K−1} × V† and let \mathcal{V} be the collection of all such sets. Let f be a function from A to A^{K−1} and r be a function from A to R^n. Given any current action profile a, the continuation play in the next K − 1 periods is given by f(a) ∈ A^{K−1}, while the continuation score from the (K + 1)th period is given by r(a) ∈ R^n. We say that (a*, f, r) is admissible with respect to V if the following conditions are satisfied: (i) (f(a), r(a)) ∈ V for every a ∈ A, and (ii) G_i((a*, f(a*)), r_i(a*)) ≥ G_i(((a_i', a*_{−i}), f(a_i', a*_{−i})), r_i(a_i', a*_{−i})) for every i ∈ N and a_i' ∈ A_i. That is, if (a*, f, r) is admissible with respect to V, the continuation play and score (f(a), r(a)) are drawn from V for any a ∈ A, and all the one-shot deviation constraints for playing a* are satisfied given the continuation plan (f, r). Then, given any set V, we can generate a new set of pairs of K − 1 action profiles (including the current action profile) and continuation scores from the Kth period using admissibility.


Denote this operator by B. The formal definition of B is as follows:

B(V) := { ((a*, f(a*)^{1,K−2}), v) ∈ A^{K−1} × V† : ∃ (a*, f, r) admissible w.r.t. V and v_i = F_i(f(a*)^{K−1}, r_i(a*)) ∀ i }.

We say that a set V is self-generating if V ⊂ B(V). Let

V* = { (a^{1,K−1}, Ŵ(a^{K,∞})) ∈ A^{K−1} × V† : a^∞ ∈ A^E }.

That is, V* is the set of pairs of K − 1 action profiles and a continuation score following them that can be achieved by some ASPE. Then our result can be stated as follows.

Theorem 3. Suppose that each W_i satisfies (A5). Then

(a) If V ⊂ A^{K−1} × V† is self-generating, then B(V) ⊂ V*.

(b) V* is the largest fixed point of the operator B in \mathcal{V}. The set of ASPE payoffs, E, is given by

E = { v ∈ R^n : ∃ (a*, f, r) admissible w.r.t. V* and v_i = G_i((a*, f(a*)), r_i(a*)) ∀ i }.

Proof. This is a corollary of Theorem 5.

As in the case of geometric discounting, the operator B is monotone, and we can obtain the set V* by applying B repeatedly starting from A^{K−1} × V†.
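As a rough illustration of how this iteration can be implemented for a finite stage game, the following sketch specializes to the simplest case K = 1 with β–δ discounting, where the score is the ordinary geometric continuation value (the case studied by Chade, Prokopovych and Smith [4]), and iterates a crude discretized version of B on a grid of score vectors for a two-player prisoner's dilemma. All numerical choices (payoffs, β, δ, grid, tolerance) and the use of the lowest available score as the off-path punishment are our own simplifications for illustration; this approximates the operator in the text rather than implementing it exactly.

```python
import itertools
import numpy as np

# Stage game: a 2x2 prisoner's dilemma; g[a] = (g_1(a), g_2(a)).
A_i = ["C", "D"]
g = {("C", "C"): (2.0, 2.0), ("C", "D"): (-1.0, 3.0),
     ("D", "C"): (3.0, -1.0), ("D", "D"): (0.0, 0.0)}
beta, delta, tol = 0.6, 0.9, 0.11   # beta-delta parameters; tol ~ half the grid spacing

def deviations(a, i):
    """Unilateral deviations of player i from profile a."""
    return [tuple(ai if j == i else a[j] for j in range(2)) for ai in A_i if ai != a[i]]

def B(W):
    """One application of a discretized B for K = 1 with beta-delta discounting:
    keep the score vectors in W that are (approximately) generated by some action
    profile a and an on-path continuation score w in W, punishing a deviator i
    with the lowest i-th coordinate available in W."""
    if len(W) == 0:
        return W
    W = np.asarray(W)
    worst = W.min(axis=0)
    generated = []
    for a, w in itertools.product(g, W):
        ok = all(
            (1 - delta) * g[a][i] + beta * delta * w[i]
            >= max((1 - delta) * g[d][i] + beta * delta * worst[i] for d in deviations(a, i))
            for i in range(2)
        )
        if ok:
            generated.append([(1 - delta) * g[a][i] + delta * w[i] for i in range(2)])
    generated = np.asarray(generated)
    if generated.size == 0:
        return np.empty((0, 2))
    return np.asarray([v for v in W
                       if np.min(np.max(np.abs(generated - v), axis=1)) <= tol])

# Iterate B from a grid covering the feasible score range until nothing more is removed.
W = np.array(list(itertools.product(np.linspace(-1.0, 3.0, 21), repeat=2)))
while True:
    W_next = B(W)
    if W_next.shape == W.shape:
        break
    W = W_next
print(len(W), "approximate equilibrium score vectors remain")
```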

5 Extension to Dynamic Games

In this section, we generalize our previous results (Theorems 1 and 3) to dynamic games where a state variable affects stage game payoffs and time preferences.

5.1 Dynamic Games with General Time Preference

Let Θ be a state space, which is a compact metric space. An initial state is given, and a transition rule T : Θ × A → Θ determines the next period state given the current state and action profile. We assume that T is continuous. We still assume perfect monitoring, and in each period players choose actions after observing the current state. Let θt denote the state in period t = 1, 2, . . .. Then the initial history is given by the initial state, i.e., h1 = θ1 , while a period-t history is given by ht = (θ1 , a1 , . . . , θt−1 , at−1 , θt ) for t ≥ 2. Player i’s stage game payoff can depend on the current state and is written as gi (θ, a). We assume that gi is continuous. Player i’s time preference at history ht is written as Wi (·|ht ). Note that ht


includes θ^t. We will assume that W_i(·|h^t) depends on h^t only through θ^t. The dynamic game with the initial state θ is denoted by (Γ, W, θ). This formulation allows for habit-formation type preferences as a special case. For example, suppose that player i's stage game payoff in period t is affected by the most recent L past action profiles as well as by the current action profile. Then player i's period-t payoff can be written as g_i(a^{t−L,t}). Such payoffs can be cast in the above framework by regarding a^{t−L,t−1} as a state variable and defining the transition function by T(a^{t−L,t−1}, a) = (a^{t−L+1,t−1}, a). Given an initial state θ ∈ Θ, let H(θ) be the set of all possible histories, i.e., those that begin with θ and are consistent with the transition rule. Player i's strategy s_i in the dynamic game (Γ, W, θ) is a mapping from H(θ) to A_i. Let S_i(θ) be the set of player i's strategies in the game (Γ, W, θ), and let S(θ) = S_1(θ) × · · · × S_n(θ). As before, given s_i ∈ S_i(θ), s_i|_{h^t} denotes player i's continuation strategy after history h^t ∈ H(θ). Note that s_i|_{h^t} is a strategy in the game (Γ, W, θ^t), where θ^t is the last component of h^t. As before, given a strategy profile s ∈ S(θ), we use a(s) to denote the path generated by s starting from the initial state θ. Since state transition is deterministic, if we are given a current state and a sequence of action profiles from the current period, we can determine the future states. Using this observation, we introduce some simplifying notation. For t = 1, 2, . . ., let T^t(θ, a^{1,t}) be the state reached after t periods when the current state is θ and the action profiles a^{1,t} are played for the t periods. Formally, we can define T^t(θ, a^{1,t}) inductively by setting T^1(θ, a^1) = T(θ, a^1) and T^k(θ, a^{1,k}) = T(T^{k−1}(θ, a^{1,k−1}), a^k) for k = 2, 3, . . .. We use (θ, a^{1,t−1}) to represent the history h^t = (θ^1, a^1, . . . , θ^{t−1}, a^{t−1}, θ^t) where θ^1 = θ and θ^k = T^{k−1}(θ, a^{1,k−1}) for k = 2, . . . , t. For a given state θ and a sequence of action profiles a^∞, we denote a sequence of stage game payoffs up to period t by g(θ, a^{1,t}) = (g(θ, a^1), g(T(θ, a^1), a^2), . . . , g(T^{t−1}(θ, a^{1,t−1}), a^t)), where g(θ, a) = (g_1(θ, a), . . . , g_n(θ, a)). Similarly, we use g(θ, a^∞) to mean (g(θ, a^1), g(T(θ, a^1), a^2), . . .). We still assume a version of stationarity and continuity of players' time preferences as follows.14

(A1′: Stationarity) For any g^∞ ∈ R^∞ and h, h′ ∈ ∪_{θ∈Θ} H(θ) with the same last component, W_i(g^∞|h) = W_i(g^∞|h′).

Due to this assumption, in the following we write W_i(·|θ) instead of W_i(·|h) where h ends with θ.

(A2′: Continuity at Infinity) For any (θ(k), g^∞(k)) → (θ, g^∞) ∈ Θ × R^∞, T(k) → ∞, and x ∈ R, lim_{k→∞} W_i(g^{1,T(k)}(k), x, x, . . . |θ(k)) = W_i(g^∞|θ).

We modify the definition of K-recursivity as follows.

14 In the repeated game setting, we impose monotonicity to guarantee the existence of ASPE. In the dynamic game setting, we assume existence directly.


(A5′: K-Recursivity) There exists Ŵ_i : R^∞ → R that is recursive and G_i : R^{K+1} × Θ → R such that, for any g^∞ ∈ R^∞ and θ ∈ Θ, W_i(g^∞|θ) = G_i(g^{1,K}, Ŵ_i(g^{K+1,∞})|θ).

Since state transition is deterministic, we can use action profiles instead of stage-game payoffs in the arguments of W_i and G_i, as before. An agent subgame perfect equilibrium of a dynamic game can be defined analogously as follows.

Definition 2 (Agent Subgame Perfect Equilibrium of a Dynamic Game). A strategy profile s ∈ S(θ) is an agent subgame perfect equilibrium (ASPE) of the dynamic game (Γ, W, θ) if for any h^t ∈ H(θ) (ending with θ^t),

W_i(a(s|_{h^t}) | θ^t) ≥ W_i((a_i', a^1_{-i}(s|_{h^t})), a(s|_{(h^t, (a_i', a^1_{-i}(s|_{h^t})), T(θ^t, (a_i', a^1_{-i}(s|_{h^t}))))}) | θ^t)

for any i ∈ N and a′i ∈ Ai . Let E(θ) ⊂ Rn be the set of ASPE payoffs and AE (θ) ⊂ A∞ be the set of ASPE paths when θ is the initial state. We assume that there exists an agent subgame perfect equilibrium given any initial state θ.15
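For the habit-formation example above, the state and the transition rule are easy to make explicit. The following sketch, with an arbitrary memory length L and illustrative names of our own, just encodes θ as the tuple of the last L action profiles and composes T as in the definition of T^t.

```python
from typing import Tuple

Profile = Tuple[str, ...]
State = Tuple[Profile, ...]   # theta = (a^{t-L}, ..., a^{t-1}): the last L action profiles

def T(theta: State, a: Profile) -> State:
    """Habit-formation transition: drop the oldest profile, append the current one."""
    return theta[1:] + (a,)

def T_t(theta: State, profiles: Tuple[Profile, ...]) -> State:
    """Composed transition T^t(theta, a^{1,t}), defined inductively as in Section 5.1."""
    for a in profiles:
        theta = T(theta, a)
    return theta

L = 2
theta0: State = (("C", "C"),) * L   # start with L periods of mutual cooperation as the habit
print(T_t(theta0, (("C", "D"), ("D", "D"))))  # -> (('C', 'D'), ('D', 'D'))
```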

5.2 Codes of Behavior

First we extend Theorem 1 to the setting of dynamic games. In this setting, the current state and the current action profile affect how a player evaluates future payoffs through two channels. First, the current state affects a player's time preference directly. Second, the current state and the current action profile determine the current payoff, which affects a player's preference over future payoffs because of non-separability. Also note that the current state and the current action profile determine the next state, which determines the set of possible continuation ASPE. Therefore, the worst punishment for a player (from the current self's viewpoint) needs to depend at least on the current state and the current action profile. We can show, however, that the worst punishment does not need to depend on anything other than these. We show that there exists a punishment code for each player that depends only on the current state and the current action profile. A code of behavior for player i is now defined as a mapping c_i : Θ × A → A^∞. It assigns a punishment path when player i unilaterally deviates from any equilibrium path, the current state is θ, and, as a result, a ∈ A is played in the current period. For any path a(0) ∈ A^∞ and codes of behavior (c_1, ..., c_n), let s(a(0), c_1, ..., c_n) be the following profile of strategies: (1) players follow a(0) or a path prescribed by a code of behavior, (2) the initial path of play is a(0), (3) the current path continues to be played as

15 Existence is not a trivial issue for dynamic games. If this assumption is not satisfied, we can restrict our attention to a subset of Θ on which the set of ASPE is not empty. There exists such a subset if there exists a state with at least one ASPE.


long as there is no deviation from it, (4) if there is any unilateral deviation by player i from the current path, θ is the current state, and a is the realized action profile in the current period, then the path c_i(θ, a) is played from the next period, and (5) if there is any joint deviation from the current path, θ is the current state, and a is the realized action profile in the current period, then the path c_1(θ, a) is played from the next period. This strategy profile is similar to the previous one for repeated games except for (5). In a repeated game, we do not need (5), and we can let the current path continue after any joint deviation. However, in a dynamic game, we need (5) because a joint deviation affects the next period state. So continuing the current path may no longer be an equilibrium. A version of Theorem 1 and Corollary 1 for dynamic games can be stated as follows.

Theorem 4. Suppose that each W_i satisfies (A1′) and (A2′). Then there exists a profile of codes of behavior (c_1^*, . . . , c_n^*) such that

1. s(c_i^*(θ, a), c_1^*, . . . , c_n^*) is an agent subgame perfect equilibrium given the initial state T(θ, a) for every i ∈ N, every θ ∈ Θ and every a ∈ A.

2. s(a^∞, c_1^*, . . . , c_n^*) is an agent subgame perfect equilibrium given the initial state θ if (θ, a^∞) ∈ Θ × A^∞ is the limit of a sequence {(θ(k), a^∞(k))}_{k=1}^∞ such that a^∞(k) is

an ASPE path given the initial state θ(k) for all k = 1, 2, . . ..

Proof. See the Appendix.

Corollary 2. The correspondences A^E(·) and E(·) are upper hemicontinuous in θ. In particular, A^E(θ) and E(θ) are compact for every θ ∈ Θ.

In addition to Corollary 2, Theorem 4(2) implies the following:

• For every i ∈ N and θ ∈ Θ, there exists an ASPE path a*(i, θ) ∈ A^∞ that achieves the best/worst ASPE payoff for player i given the initial state θ.

• For any ASPE s given the initial state θ, s(a(s), c_1^*, . . . , c_n^*) is an ASPE given the initial state θ that generates the same path a(s) ∈ A^∞.

In Abreu [1], the punishment path a(i) for player i is constructed as the limit of a sequence of equilibrium paths a(i, k), k = 1, 2, . . ., that approximate the infimum of player i's equilibrium payoffs. Consider the following simple strategy: (1) initially play according to a(i) as long as there is no unilateral deviation from it, (2) switch to a(j) if there is any unilateral deviation from a(i) by player j. Apply the same rule to whatever path a(j) is currently followed. The main step of the proof is to show that this profile of strategies is a subgame perfect equilibrium. This can be done as follows. As a(i, k) is an equilibrium path, one-shot deviation constraints along the path are satisfied if player j's unilateral deviation is followed by the punishment

path a(j). Taking the limit as k → ∞, all the one-shot deviation constraints for the path a(i) are satisfied in the limit. This means that the profile of paths a(i), i = 1, . . . , n, is self-sustaining and the above simple strategy constitutes a subgame perfect equilibrium. If we take the same approach, we may define, for each i and each (θ, a), the worst continuation path c_i^*(θ, a) (from the viewpoint of the current self) by taking the limit of continuation ASPE paths that approximate the infimum \underline{v}_i(θ, a) of player i's equilibrium punishment payoffs given current state θ ∈ Θ and current action profile a ∈ A (i.e., \underline{v}_i(θ, a) := inf_{a^∞ ∈ A^E(T(θ,a))} W_i(a, a^∞|θ)). Then we can define a “simple” strategy using these paths c_i^*(θ, a), i = 1, . . . , n, exactly as before.

The problem with this approach is that we do not know whether \underline{v}_i(θ, a) is continuous in (θ, a) or not. To be precise, we need it to be lower semicontinuous. The lack of continuity of \underline{v}_i causes trouble because one-shot deviation constraints may not be preserved in the limit for a sequence of ASPE paths that approximate c_i^*(θ, a) given (θ, a). To see this, take such a sequence of ASPE paths given (θ, a). As in the standard case, the continuation path can be replaced by the above punishment path at every history for any ASPE along this sequence. But if the punishment payoff \underline{v}_i(θ, a) is not (lower semi)continuous, then this punishment may not be severe enough to deter a deviation in the limit because it may jump up in the limit. To avoid this problem, we define \underline{v}_i(θ, a) in a different way (which turns out to be the same object as the above \underline{v}_i in the end). Consider the graph of all (θ, a, v) where v is a profile of payoffs when θ is the current state, a is played in the current period, and some ASPE given the state T(θ, a) is played from the next period on. Then we take the closure of all such points in Θ × A × R^n. If we regard this graph as a correspondence from Θ × A to R^n, it is upper hemicontinuous. Then we define \underline{v}_i(θ, a) as the “lower envelope” of this correspondence from player i's viewpoint. This guarantees the lower semicontinuity of \underline{v}_i(θ, a), which is exactly what we need to show that (c_1^*, . . . , c_n^*) is self-sustaining (hence \underline{v}_i(θ, a) can indeed be achieved by an ASPE path following (θ, a)).

5.3 Recursive Characterization

Next we describe how to extend Theorem 3 to the dynamic game setting. Given the current state θ^1, the set of current action profiles a^1 that can be supported is determined by the set of possible pairs of a sequence of states and action profiles in the next K − 1 periods, (θ^{2,K}, a^{2,K}), and a continuation score from the (K + 1)th period. Since a history and a history of action profiles are in one-to-one correspondence given the initial state, we can define V simply as a correspondence from a current state to a set of pairs of K − 1 action profiles and a continuation score from the Kth period. That is, V : Θ ⇒ A^{K−1} × V†, where V† is defined as before with \bar{g} = max_{i,θ,a} g_i(θ, a) and \underline{g} = min_{i,θ,a} g_i(θ, a). Let \mathcal{V} be the set of all correspondences from Θ to A^{K−1} × V†. Let f be a function from A to A^{K−1} and r be a function from A to R^n, as before.


We say that (a*, f, r) is admissible with respect to V given θ if the following conditions are satisfied: (i) (f(a), r(a)) ∈ V(T(θ, a)) for every a ∈ A, and (ii) G_i((a*, f(a*)), r_i(a*)|θ) ≥ G_i(((a_i', a*_{−i}), f(a_i', a*_{−i})), r_i(a_i', a*_{−i})|θ) for every i ∈ N and a_i' ∈ A_i. Using the notion of admissibility, we can define an operator B that maps a correspondence V : Θ ⇒ A^{K−1} × V† into another one as follows:

B(V)(θ) := { ((a*, f(a*)^{1,K−2}), v) ∈ A^{K−1} × V† : ∃ (a*, f, r) admissible w.r.t. V given θ and v_i = F_i(g_i(T^{K−1}(θ, (a*, f(a*)^{1,K−2})), f(a*)^{K−1}), r_i(a*)) ∀ i }.

A correspondence V is self-generating if V(θ) ⊂ B(V)(θ) for all θ ∈ Θ. For every θ ∈ Θ, let

V*(θ) = { (a^{1,K−1}, Ŵ(g(T^{K−1}(θ, a^{1,K−1}), a^{K,∞}))) : a^∞ ∈ A^E(θ) }.

That is, V*(θ) is the set of pairs of K − 1 action profiles and a continuation score following them that can be achieved by some ASPE given the initial state θ. The following theorem is an extension of Theorem 3 to dynamic games.

Theorem 5. Suppose that each W_i satisfies (A5′) in addition to (A1′) and (A2′). Then

(a) If V : Θ ⇒ A^{K−1} × V† is self-generating, then B(V)(θ) ⊂ V*(θ) for all θ ∈ Θ.

(b) V* is the largest fixed point of the operator B in \mathcal{V}. The set of ASPE payoffs given the initial state θ, E(θ), is given by

E(θ) = { v ∈ R^n : ∃ (a*, f, r) admissible w.r.t. V* given θ and v_i = G_i((a*, f(a*)), r_i(a*)|θ) ∀ i }.     (1)

Proof. See the Appendix. We can obtain V ∗ by applying the operator B repeatedly starting from V such that V (θ) = AK−1 × V † for all θ.

6 Various Equilibrium Concepts for Time-Inconsistent Players

In this section, we discuss other concepts of equilibrium. For ease of discussion, we focus on the repeated game setting, although the concepts can be applied to dynamic games as well.


6.1 Various Kinds of Deviations and Equilibrium Concepts

When players have time-inconsistent preferences, equilibrium concepts depend on what kinds of deviations each player perceives as possible. The equilibrium concept used in this paper, ASPE, can be defined as follows.

Definition 3 (ASPE). Given s_{−i} ∈ S_{−i}, s_i' ∈ S_i is a profitable one-shot deviation for player i from s_i ∈ S_i at history h^t ∈ H if s_i'(h) = s_i(h^t, h) for all histories h ≠ ∅ and W_i(a(s_i', s_{−i}|_{h^t})) > W_i(a(s|_{h^t})). A strategy profile s ∈ S is an agent subgame perfect equilibrium (ASPE) if there is no profitable one-shot deviation for any player i from s_i at any history given s_{−i}.

The usual equilibrium concept for repeated games with perfect monitoring, SPE, can be defined as follows.

Definition 4 (SPE). Given s_{−i} ∈ S_{−i}, s_i' ∈ S_i is a naively profitable multi-period deviation for player i from s_i ∈ S_i at history h^t ∈ H if W_i(a(s_i', s_{−i}|_{h^t})) > W_i(a(s|_{h^t})). A strategy profile s ∈ S is a subgame perfect equilibrium (SPE) if there is no naively profitable multi-period deviation for any player i from s_i at any history given s_{−i}.

In ASPE the current self believes he cannot affect the behavior of future selves at all, whereas in SPE the current self naively believes he can affect the behavior of future selves in a way that he desires, without worrying about whether future selves have an incentive to follow his instructions. The class of profitable one-shot deviations is restrictive, and thus one may argue that ASPE is too permissive. At the same time, the class of naively profitable multi-period deviations is (unreasonably) large, and thus one may argue that SPE is too strong under time inconsistency. In the following, we propose several classes of profitable deviations in between, which lead to equilibrium concepts stronger than ASPE and weaker than SPE. Consider the following scenario: when a player plans a deviation at some history, he persuades his future selves to execute his deviation plan. Future selves will follow his suggestion if the proposed deviation path yields a payoff to them at least as high as the payoff from the continuation path under the original strategy. This leads to the following equilibrium concept.

Definition 5 (IPE). Given s_{−i} ∈ S_{−i}, s_i' ∈ S_i is a weakly profitable multi-period deviation for player i from s_i ∈ S_i at history h^t ∈ H if

W_i(a(s_i', s_{−i}|_{h^t})) > W_i(a(s|_{h^t}))

and W_i(a(s_i'|_h, s_{−i}|_{(h^t,h)})) ≥ W_i(a(s|_{(h^t,h)})) for all h ∈ H. A strategy profile s ∈ S is an intrapersonal-coalition-proof perfect equilibrium (IPE) if there is no weakly profitable multi-period deviation for any player i from s_i at any history given s_{−i}.

When defining weakly profitable multi-period deviations, it is without loss of generality to assume that deviations in actions occur only along the deviation path. Each future self compares the payoffs from the proposed deviation path and the original continuation path, and follows the deviation plan if it is as good as the original strategy. The reason that we compare the deviation plan only with the original strategy, not with all possible strategies, is that we check whether the original strategy is optimal for the future self at history h when h is taken as the current history. Hence, if s is an IPE, the equilibrium continuation path from h is optimal for the self at h, and thus he has an incentive to follow a deviation plan only when he is indifferent between the deviation plan and the equilibrium strategy. Fix a strategy profile s ∈ S, and suppose that player i at history h^t plans a deviation s_i' ≠ s_i|_{h^t}. If player i deviates to s_i' at history h^t, the induced continuation path is a(s_i', s_{−i}|_{h^t}). Let H(s_i', s_{−i}, h^t) be the set of future continuation histories that are induced by the deviation plan s_i' at history h^t, i.e.,

H(s_i', s_{−i}, h^t) = { a^{1,τ}(s_i', s_{−i}|_{h^t}) : τ = 1, 2, . . . }.

Let H̃(s_i', s_{−i}, h^t) be the set of future continuation histories on the path at which s_i' specifies a different action for player i from the one specified by s_i|_{h^t}, i.e.,

H̃(s_i', s_{−i}, h^t) = { h ∈ H(s_i', s_{−i}, h^t) : s_i'(h) ≠ s_i|_{h^t}(h) }.

That is, H̃(s_i', s_{−i}, h^t) is the set of future continuation histories on the path at which player i deviates. Now we focus on the class of deviations in which there are a finite number of deviating selves on the path (i.e., H̃(s_i', s_{−i}, h^t) is finite).

Definition 6 (FIPE). Given s_{−i} ∈ S_{−i}, s_i' ∈ S_i is a weakly profitable finite-period deviation for player i from s_i ∈ S_i at history h^t ∈ H if s_i' is a weakly profitable multi-period deviation for player i from s_i at history h^t and H̃(s_i', s_{−i}, h^t) is finite. A strategy profile s ∈ S is a finite intrapersonal-coalition-proof perfect equilibrium (FIPE) if there is no weakly profitable finite-period deviation for any player i from s_i at any history given s_{−i}.

We can narrow the class of profitable deviations further by assuming that deviating future selves follow the proposed deviation plan only when they gain strictly relative to the original

strategy.

Definition 7 (WFIPE). Given s_{−i} ∈ S_{−i}, s_i' ∈ S_i is a strongly profitable finite-period deviation for player i from s_i ∈ S_i at history h^t ∈ H if H̃(s_i', s_{−i}, h^t) is finite,

W_i(a(s_i', s_{−i}|_{h^t})) > W_i(a(s|_{h^t})),

W_i(a(s_i'|_h, s_{−i}|_{(h^t,h)})) > W_i(a(s|_{(h^t,h)})) for all h ∈ H̃(s_i', s_{−i}, h^t), and

W_i(a(s_i'|_h, s_{−i}|_{(h^t,h)})) ≥ W_i(a(s|_{(h^t,h)})) for all h ∈ H(s_i', s_{−i}, h^t) \ H̃(s_i', s_{−i}, h^t).

A strategy profile s ∈ S is a weak finite intrapersonal-coalition-proof perfect equilibrium (WFIPE) if there is no strongly profitable finite-period deviation for any player i from s_i at any history given s_{−i}.

Suppose that H̃(s_i', s_{−i}, h^t) is nonempty and finite. Let h̃ be the longest history in H̃(s_i', s_{−i}, h^t). Let H_1 be the set of all histories in H(s_i', s_{−i}, h^t) \ H̃(s_i', s_{−i}, h^t) shorter than h̃, and let H_2 be the set of those longer than h̃. Then the set of future continuation histories, H(s_i', s_{−i}, h^t), can be partitioned into three sets, H_1, H_2, and H̃(s_i', s_{−i}, h^t) (two sets if H_1 is empty). For any future self at h ∈ H_2, the incentive constraint W_i(a(s_i'|_h, s_{−i}|_{(h^t,h)})) ≥ W_i(a(s|_{(h^t,h)})) is satisfied trivially since the two paths, a(s_i'|_h, s_{−i}|_{(h^t,h)}) and a(s|_{(h^t,h)}), coincide. Hence, the only relevant incentive constraints are those for histories in H_1 and H̃(s_i', s_{−i}, h^t).

If s_i' is a strongly profitable finite-period deviation, we can obtain a profitable one-shot deviation by taking the longest deviation history h̃ as the current history. (If H̃(s_i', s_{−i}, h^t) is empty, then we already have a profitable one-shot deviation.) This implies that WFIPE is equivalent to ASPE. Note that this equivalence holds no matter what requirement we impose on the selves in H_1. For example, we can alternatively require that the selves in H_1 do not have a profitable one-shot deviation from the proposed deviation path, or that they gain strictly over the original continuation path as the deviating selves do. To summarize the relationship among the various equilibrium concepts discussed so far, we have SPE ⊆ IPE ⊆ FIPE ⊆ WFIPE = ASPE. Suppose that we focus on finite-period deviations. If it suffices for the current self to provide weak incentives in order to persuade his future selves (i.e., the current self can break the indifference of future selves in his favor, as in Caplin and Leahy [3]), we obtain an equilibrium concept stronger than ASPE. However, if the current self needs to provide strict incentives in order to persuade his future selves to take a different action, restricting attention to one-shot deviations is without loss of generality, and this is true regardless of what we require of non-deviating future selves before the longest history at which a deviation occurs.


6.2 Intrapersonal Pareto Efficiency and Renegotiation-Proofness

A path a^∞ is intrapersonally Pareto dominated for player i by another path ã^∞ if W_i(ã^{t,∞}) ≥ W_i(a^{t,∞}) for all t, with strict inequality for some t. A strategy profile s ∈ S is intrapersonally Pareto efficient if there is no i ∈ N and s_i' ∈ S_i such that a(s) is intrapersonally Pareto dominated for player i by a(s_i', s_{−i}). A strategy profile s ∈ S is intrapersonally renegotiation-proof if s|_h is intrapersonally Pareto efficient for all h ∈ H. We can define weakly profitable multi-period deviations alternatively as follows to obtain the same concept of IPE. Given s_{−i} ∈ S_{−i}, s_i' ∈ S_i is a weakly profitable multi-period deviation for player i from s_i ∈ S_i at history h^t ∈ H if W_i(a(s_i'|_h, s_{−i}|_{(h^t,h)})) ≥ W_i(a(s|_{(h^t,h)})) for all h ∈ {∅} ∪ H(s_i', s_{−i}, h^t), with strict inequality for some h ∈ {∅} ∪ H(s_i', s_{−i}, h^t). The statement that a(s_i', s_{−i}|_{h^t}) intrapersonally Pareto dominates a(s|_{h^t}) for player i may look similar to the statement that s_i' is a weakly profitable multi-period deviation for player i from s_i ∈ S_i at history h^t in the above sense, but there is a key difference regarding the reference payoff. Consider the future self after τ periods on the path, at history (h^t, a^{1,τ−1}(s_i', s_{−i}|_{h^t})). For weakly profitable multi-period deviations, the future self compares the proposed path a^{τ,∞}(s_i', s_{−i}|_{h^t}) with the continuation path under the original strategy starting from (h^t, a^{1,τ−1}(s_i', s_{−i}|_{h^t})), i.e., a(s|_{(h^t, a^{1,τ−1}(s_i', s_{−i}|_{h^t}))}). In contrast, for intrapersonal Pareto efficiency, the future self compares the proposed path a^{τ,∞}(s_i', s_{−i}|_{h^t}) with the continuation path under the original strategy starting from h^t, i.e., a^{τ,∞}(s|_{h^t}). In other words, we treat a player in period t at different histories h^t as distinct selves for equilibrium, whereas we treat a player in period t at different histories h^t as a single self for Pareto efficiency. Hence, we can regard intrapersonal Pareto efficiency and intrapersonal renegotiation-proofness as criteria separate from equilibrium.

6.3 Naive Players

Now suppose that players are naive in the sense that they believe time inconsistency does not arise. We develop equilibrium concepts for such naive players. Consider any player i at history h^t. Fix s_{−i} so that the situation is as in a single-agent decision problem. For each action profile a taken at history h^t, player i has a (possibly incorrect) perception of the continuation strategy s̃_i(h^t, a) ∈ S_i used by his future selves. First, we require that player i


behave optimally at h^t given his perception:

W_i((s_i(h^t), s_{−i}(h^t)), a(s̃_i(h^t, (s_i(h^t), s_{−i}(h^t))), s_{−i}|_{(h^t, (s_i(h^t), s_{−i}(h^t)))})) ≥ W_i((a_i, s_{−i}(h^t)), a(s̃_i(h^t, (a_i, s_{−i}(h^t))), s_{−i}|_{(h^t, (a_i, s_{−i}(h^t)))}))

for all a_i ∈ A_i. Moreover, player i at history h^t should be able to justify his perception s̃_i(h^t, a) based on his naive belief. For example, consider history (h^t, â). Player i at h^t perceives that his next-period self will choose s̃_i(h^t, â)(∅), mistakenly believing that his next-period self has payoff function W_i(â, a^∞). Hence, we require that

W_i(â, a(s̃_i(h^t, â), s_{−i}|_{(h^t, â)})) ≥ W_i(â, (a_i, s_{−i}(h^t, â)), a(s̃_i(h^t, â)|_{(a_i, s_{−i}(h^t, â))}, s_{−i}|_{(h^t, â, (a_i, s_{−i}(h^t, â)))}))

for all a_i ∈ A_i. Similar inequalities must hold for longer histories as well. Consider history (h^t, â^{1,τ}) where τ ≥ 2. Then

W_i(â^{1,τ}, a(s̃_i(h^t, â^1)|_{â^{2,τ}}, s_{−i}|_{(h^t, â^{1,τ})})) ≥ W_i(â^{1,τ}, (a_i, s_{−i}(h^t, â^{1,τ})), a(s̃_i(h^t, â^1)|_{(â^{2,τ}, (a_i, s_{−i}(h^t, â^{1,τ})))}, s_{−i}|_{(h^t, â^{1,τ}, (a_i, s_{−i}(h^t, â^{1,τ})))}))

for all a_i ∈ A_i. If W_i is K-recursive, the inequalities for τ ≥ K can be taken care of by requiring that s̃_i(h^t, â^1)|_{â^{2,K}} is a (A)SPE strategy for player i given s_{−i}|_{(h^t, â^{1,K})}, using the recursive score function Ŵ_i as the payoff function, for all â^{1,K}. The above can be considered as a naive version of ASPE in that we consider only one-shot deviations. We can think of a naive version of IPE as well. Roughly speaking, s is a naive IPE if for each history h ∈ H and for each player i, s_i(h) can be justified by his belief about the continuation play. This can be formalized using a recursive approach as in Caplin and Leahy [3]. Suppose that each player i has K-recursive preferences, i.e., W_i(g^∞) = G_i(g^{1,K}, Ŵ_i(g^{K+1,∞})). Let Σ^{SPE} ⊂ S be the set of all SPE strategy profiles when players' payoff functions are given by the score functions Ŵ_i. Then only strategies in Σ^{SPE} can be justified from period K + 1 from the perspective of a naive player in period 1. Consider history a^{1,K−1} in period K. Find a* ∈ A and f(a) ∈ Σ^{SPE} for all a ∈ A such that, for every i, (a*_i, f_i(a*)) solves

max_{(a_i, s_i)} G_i(a^{1,K−1}, (a_i, a*_{−i}), Ŵ_i(a(s_i, f_{−i}(a_i, a*_{−i}))))
s.t. a_i ∈ A_i, (s_i, f_{−i}(a_i, a*_{−i})) ∈ Σ^{SPE}.

Note that (a*, f(a)) can be considered as a strategy profile s where s(∅) = a* and s|_a = f(a)

for all a ∈ A. So let ΣK (a1,K−1 ) be the set of strategy profiles (a∗ , f (a)) satisfying the above property. Naive players in the initial period see any s ∈ ΣK (a1,K−1 ) as a possible continuation strategy profile at history a1,K−1 , since no future self at that history can gain in terms of the misperceived payoff functions by a multi-period deviation. Consider history a1,K−2 in period K − 1. Given ΣK (a1,K−1 ), we find a∗ ∈ A and f (a) ∈ ΣK (a1,K−2 , a) for all a ∈ A such that, for every i, (a∗i , fi (a∗ )) solves max Wi (a1,K−2 , (ai , a∗−i ), a(si , f−i (ai , a∗−i )))

(ai ,si )

s.t. ai ∈ Ai , (si , f−i (ai , a∗−i )) ∈ ΣK (a1,K−2 , (ai , a∗−i )). Let ΣK−1 (a1,K−2 ) be the set of strategy profiles (a∗ , f (a)) satisfying the above property. Working backwards in this way, we can obtain Σ1 at the end. For any (a∗ , f (a)) ∈ Σ1 , players choose a∗ optimally given their possible incorrect belief about the continuation strategy f (a) in the next period. Let Π ⊂ A be the set of all action profiles a∗ constituting a pair (a∗ , f (a)) ∈ Σ1 . Then we can define s ∈ S as a naive IPE if s(h) ∈ Π for every h ∈ H.16
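To make the backward step above concrete, the following is a minimal Python sketch of the selection problem at a single stage: given, for each current action profile, a finite set of admissible continuation profiles (standing in for a set such as $\Sigma^K(a^{1,K-2}, a)$) and a black-box payoff evaluation (standing in for $G_i$ or $W_i$ with the fixed history prefix absorbed), it enumerates the pairs $(a^*, f)$ such that each player's prescribed action and own continuation solve his constrained maximization. The data structures, the function names, and the brute-force enumeration are illustrative assumptions, not the paper's notation; in the paper the action and strategy sets need not be finite.

```python
from itertools import product

def backward_step(action_sets, allowed, payoff):
    """One schematic backward-induction step (illustrative sketch only).

    action_sets : list of finite action sets, one per player.
    allowed     : dict mapping each action profile (a tuple) to a finite set of
                  admissible continuation profiles; a continuation profile is a
                  tuple with one opaque per-player component.
    payoff      : payoff[i](a, cont) -> float, player i's evaluation of playing
                  profile a today followed by continuation profile cont.

    Returns all pairs (a_star, f), where f assigns an allowed continuation to
    every action profile and, for each player i, the prescribed pair
    (a_star[i], f[a_star][i]) is optimal among i's feasible one-period action
    and continuation choices, holding the other players fixed.
    """
    n = len(action_sets)
    profiles = list(product(*action_sets))

    def best_feasible_value(i, a_star, f):
        # Best value player i can obtain by choosing an action a_i and his own
        # continuation, with the others playing a_star_{-i} and f_{-i}.
        best = float("-inf")
        for ai in action_sets[i]:
            dev = tuple(ai if j == i else a_star[j] for j in range(n))
            base = f[dev]  # the others' continuations are fixed at f_{-i}(dev)
            for cont in allowed[dev]:
                if all(cont[j] == base[j] for j in range(n) if j != i):
                    best = max(best, payoff[i](dev, cont))
        return best

    results = []
    for a_star in profiles:
        # enumerate all selections f: action profile -> allowed continuation
        for choice in product(*(list(allowed[a]) for a in profiles)):
            f = dict(zip(profiles, choice))
            if all(payoff[i](a_star, f[a_star])
                   >= best_feasible_value(i, a_star, f) - 1e-12
                   for i in range(n)):
                results.append((a_star, f))
    return results
```

For the period-$K$ step the evaluation would be $G_i$ composed with $\widehat{W}_i$; for earlier steps, $W_i$; and the variant in Section 6.4 additionally restricts the pair $(a^*, f(a))$ itself to lie in $\Sigma$. The exhaustive enumeration is only meant to make the structure of the recursion explicit on small finite examples.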

6.4 Recursive Approach to FIPE

Suppose that the current self can plan a deviation for up to $T$ periods from the current period. Given $s_{-i} \in S_{-i}$, $s_i' \in S_i$ is a weakly profitable $T$-period deviation for player $i$ from $s_i \in S_i$ at history $h^t \in H$ if $s_i'$ is a weakly profitable finite-period deviation for player $i$ from $s_i$ at history $h^t$ and the longest history in $\widetilde{H}(s_i, s_i', s_{-i}, h^t)$ has length at most $T-1$. A strategy profile $s \in S$ is a $T$-period intrapersonal-coalition-proof perfect equilibrium ($T$-FIPE) if there is no weakly profitable $T$-period deviation for any player $i$ from $s_i$ at any history, given $s_{-i}$.

We outline a recursive approach to $T$-FIPE. Fix a subset $\Sigma$ of $S$, which we take as the set of candidate equilibrium strategies. Consider period $T$. Players in period 1 have no influence on their future selves from period $T+1$ on, and thus take their continuation strategies in period $T+1$, chosen from $\Sigma$, as given. Given $\Sigma \subset S$, we find $a^* \in A$ and $f(a) \in \Sigma$ for all $a \in A$ such that $(a^*, f(a)) \in \Sigma$ and, for every $i$,
$$W_i\big(a^*, a(f(a^*))\big) \geq W_i\big((a_i, a_{-i}^*), a(f(a_i, a_{-i}^*))\big) \quad \text{for all } a_i \in A_i.$$

Let $\Sigma^T$ be the set of strategy profiles $(a^*, f(a))$ satisfying the above property. Then, for any $s \in \Sigma^T$, no player has an incentive to deviate in period $T$ when $s$ is the continuation strategy profile in period $T$.

Now consider period $T-1$. Given $\Sigma$ and $\Sigma^T$, we find $a^* \in A$ and $f(a) \in \Sigma^T$ for all $a \in A$ such that $(a^*, f(a)) \in \Sigma$ and, for every $i$, $(a_i^*, f_i(a^*))$ solves
$$\max_{(a_i, s_i)} \; W_i\big((a_i, a_{-i}^*), a(s_i, f_{-i}(a_i, a_{-i}^*))\big) \quad \text{s.t. } a_i \in A_i,\ (s_i, f_{-i}(a_i, a_{-i}^*)) \in \Sigma^T.$$
Let $\Sigma^{T-1}$ be the set of strategy profiles $(a^*, f(a))$ satisfying the above property. Then, for any $s \in \Sigma^{T-1}$, no player has an incentive to deviate in periods $T-1$ and $T$ when $s$ is the continuation strategy profile in period $T-1$. Working backwards in this way, we obtain $\Sigma^1$ in period 1. For any $s \in \Sigma^1$, no player has a weakly profitable $T$-period deviation at the initial history, while the continuation strategy profile at any history belongs to $\Sigma$.

Let $B$ be the operator that yields $\Sigma^1$ from $\Sigma$. Note that $\Sigma^1 = B(\Sigma) \subseteq \Sigma$. Also, if we denote the set of all $T$-FIPE by $\Sigma^*$, it satisfies $\Sigma^* = B(\Sigma^*)$, i.e., $\Sigma^*$ is a fixed point of $B$. In practice, we may not know $\Sigma^*$ at the outset, but we may know the set of ASPE, denoted by $\Sigma^{ASPE}$. Then, starting from $\Sigma^{ASPE}$, we can apply the operation above iteratively and may obtain $\Sigma^*$ in the limit.
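As a schematic companion to this last remark, the sketch below iterates an abstract set-valued operator $B$ satisfying $B(\Sigma) \subseteq \Sigma$, starting from an initial candidate set such as an encoding of $\Sigma^{ASPE}$, until the set stops shrinking. How strategy profiles and $B$ itself are represented is left abstract; the names are illustrative assumptions.

```python
def iterate_operator(B, sigma0, max_rounds=1000):
    """Iterate a shrinking set operator B (with B(S) a subset of S) from sigma0.

    B      : callable taking a frozenset of candidate profiles and returning the
             subset that survives one round of the backward-induction check.
    sigma0 : initial candidate set (e.g., an encoding of the ASPE set).

    Returns the first iterate with B(S) == S (a fixed point) or, failing that,
    the last iterate computed.
    """
    sigma = frozenset(sigma0)
    for _ in range(max_rounds):
        nxt = frozenset(B(sigma))
        if nxt == sigma:  # fixed point: all remaining candidates survive the check
            return sigma
        sigma = nxt
    return sigma
```

With finitely many candidates the iteration terminates, since each round that is not yet a fixed point removes at least one profile; in the paper's setting the corresponding limit is taken over an infinite iteration.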

7 Appendix: Proofs

Proof of Theorem 2

Proof. Let $\underline{v}_i = (1-\delta)g_i(a) + \delta v_i$ be the worst equilibrium continuation payoff for player $i$. The incentive constraint is
$$(1-\Delta)g_i(a) + \Delta v_i \geq (1-\Delta)\max_{a_i} g_i(a_i, a_{-i}) + \Delta \underline{v}_i,$$
where $\Delta = \beta\delta/(1-\delta+\beta\delta)$. Since $g_i(a) \leq \max_{a_i} g_i(a_i, a_{-i})$, $v_i \geq \underline{v}_i$, and $\Delta < \delta$, the above incentive constraint implies
$$\underline{v}_i = (1-\delta)g_i(a) + \delta v_i \geq (1-\delta)\max_{a_i} g_i(a_i, a_{-i}) + \delta \underline{v}_i. \tag{2}$$
Hence $\underline{v}_i \geq \max_{a_i} g_i(a_i, a_{-i}) \geq 0$. If $\max_{a_i} g_i(a_i, a_{-i}) > 0$ (i.e., $a_{-i} \neq a^i_{-i}$), then $\underline{v}_i \geq \max_{a_i} g_i(a_i, a_{-i}) > 0$, and we are done. Suppose that $\max_{a_i} g_i(a_i, a_{-i}) = 0$ (i.e., $a_{-i} = a^i_{-i}$). If $g_i(a) < 0$, then the inequality in (2) holds strictly, which implies $\underline{v}_i > \max_{a_i} g_i(a_i, a_{-i}) = 0$. We show that it is impossible to have $g_i(a) = 0$. Suppose to the contrary that $g_i(a) = 0$. Since the minmax action profile is unique, we have $a = a^i$. We show that $v_i = \underline{v}_i = 0$ and that $\underline{v}_i = 0$ is achieved only when $a^i$ is played in every period. If $v_i > \underline{v}_i$, we can reduce $\underline{v}_i = (1-\delta)g_i(a) + \delta v_i$ by replacing the continuation play with the equilibrium generating $\underline{v}_i$. Thus $v_i = \underline{v}_i$. Since $\underline{v}_i = (1-\delta)g_i(a) + \delta v_i = \delta \underline{v}_i$, we have $v_i = \underline{v}_i = 0$.

Now consider the decomposition of the continuation payoff 0 by $(a', v_i')$, i.e., $0 = (1-\delta)g_i(a') + \delta v_i'$. Since $v_i' \geq \underline{v}_i \geq 0$, we must have $g_i(a') \leq 0$. If $g_i(a') < 0$, then $v_i' > 0$, and $(1-\delta)g_i(a') + \delta v_i' = 0$ implies $(1-\Delta)g_i(a') + \Delta v_i' < 0$. The incentive constraint for player $i$ to play $a_i'$ is
$$(1-\Delta)g_i(a') + \Delta v_i' \geq (1-\Delta)\max_{a_i} g_i(a_i, a_{-i}') + \Delta \underline{v}_i.$$
Since $\max_{a_i} g_i(a_i, a_{-i}') \geq 0$ and $\underline{v}_i \geq 0$, we obtain $(1-\Delta)g_i(a') + \Delta v_i' \geq 0$, which is a contradiction. Thus, we must have $g_i(a') = v_i' = 0$. This implies that $(a', v_i') = (a^i, 0)$ is the unique decomposition of the continuation payoff 0. However, $a^i$ cannot be played in every period at equilibrium because player $j$'s payoff becomes negative, which is below his minmax level.
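For the reader's convenience, here is a brief sketch of where the weight $\Delta = \beta\delta/(1-\delta+\beta\delta)$ and the comparison $\Delta < \delta$ come from under quasi-hyperbolic ($\beta$-$\delta$) discounting; the normalization is an assumption chosen to match the formulas used in the proof above, not a statement quoted from the paper. The current self's unnormalized payoff is $u^1 + \beta\sum_{t \geq 2}\delta^{t-1}u^t$. Writing $v = (1-\delta)\sum_{t \geq 2}\delta^{t-2}u^t$ for the normalized continuation value, this equals $u^1 + \frac{\beta\delta}{1-\delta}v$, and dividing by $1 + \frac{\beta\delta}{1-\delta} = \frac{1-\delta+\beta\delta}{1-\delta}$ gives the normalized evaluation
$$(1-\Delta)u^1 + \Delta v, \qquad \Delta = \frac{\beta\delta}{1-\delta+\beta\delta}, \qquad 1-\Delta = \frac{1-\delta}{1-\delta+\beta\delta}.$$
Moreover, for $\beta \in (0,1)$ and $\delta \in (0,1)$,
$$\Delta < \delta \;\Longleftrightarrow\; \beta\delta < \delta(1-\delta+\beta\delta) \;\Longleftrightarrow\; \beta(1-\delta) < 1-\delta \;\Longleftrightarrow\; \beta < 1,$$
which is one of the facts used to pass from the $\Delta$-weighted incentive constraint to inequality (2).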

Proof of Theorem 4

Proof. (1) We first define $c_i^*$. For each $\theta \in \Theta$ and $a \in A$, let
$$Y(\theta, a) = \{W(a, a^\infty|\theta) : a^\infty \in A^E(T(\theta,a))\}.$$
Note that $Y(\theta,a)$ is nonempty for every $\theta$ and $a$ by assumption. Let $Z = \operatorname{cl}\{(\theta, a, v) \in \Theta \times A \times \mathbb{R}^n : v \in Y(\theta,a)\}$. That is, $Z$ is the closure of the graph of the correspondence $Y$. Since $Z$ is a closed subset of a compact metric space, it is compact. So we can define
$$\underline{v}_i(\theta, a) = \min\{v_i : \exists v_{-i} \text{ s.t. } (\theta, a, (v_i, v_{-i})) \in Z\}$$
for each $i \in N$, $\theta \in \Theta$ and $a \in A$.

Choose any $i \in N$, $\theta \in \Theta$ and $a \in A$. Then we can find sequences $\theta(k) \in \Theta$, $a(k) \in A$, $s(i,k) \in S(T(\theta(k), a(k)))$, and $a^\infty(i,k) \in A^\infty$, $k = 1, 2, \ldots$, such that (i) for each $k$, $s(i,k)$ is an ASPE given the initial state $T(\theta(k), a(k))$, and $a^\infty(i,k)$ is the path associated with $s(i,k)$, and (ii) as $k \to \infty$, $(\theta(k), a(k)) \to (\theta, a)$ and $W_i(a(k), a^\infty(i,k)|\theta(k)) \to \underline{v}_i(\theta, a)$. By invoking a diagonal argument, we can find a subsequence, which we take as the original sequence without loss of generality, such that $a^\infty(i,k)$ converges to some path $a^\infty(i, \theta, a)$ in $A^\infty$. Then we define $c_i^*(\theta, a) := a^\infty(i, \theta, a)$. Note that, by continuity of $g$ and $T$, $g(\theta(k), (a(k), a^\infty(i,k)))$ converges to $g(\theta, (a, a^\infty(i, \theta, a)))$ in the product topology as $(\theta(k), a(k), a^\infty(i,k))$ converges to $(\theta, a, a^\infty(i,\theta,a))$ in the product topology. Hence, by (A2$'$), $W_i(a(k), a^\infty(i,k)|\theta(k)) \to W_i(a, c_i^*(\theta,a)|\theta)$, and thus $\underline{v}_i(\theta, a) = W_i(a, c_i^*(\theta,a)|\theta)$.

Choose any $\tilde{i} \in N$, $\tilde\theta \in \Theta$, and $\tilde{a} \in A$. We want to show that $s(c^*_{\tilde i}(\tilde\theta, \tilde a), c_1^*, \ldots, c_n^*)$ is an ASPE given the initial state $T(\tilde\theta, \tilde a)$. Note that, given the strategy profile $s(c^*_{\tilde i}(\tilde\theta, \tilde a), c_1^*, \ldots, c_n^*)$, the continuation path at any history takes the form $[c_i^*(\theta,a)]^{t,\infty}$ for some $i$, $\theta$, $a$, and $t$. Choose an arbitrary history $h \in H(T(\tilde\theta, \tilde a))$. Let $[c_i^*(\theta,a)]^{t,\infty}$ be the path prescribed by $s(c^*_{\tilde i}(\tilde\theta, \tilde a), c_1^*, \ldots, c_n^*)$ following $h$. That is, $\theta$ and $a$ are the state and the action profile that initiated the path $c_i^*(\theta,a)$, $(t-1)$ periods ago. Thus, the state at history $h$, denoted by $\theta(h)$, is given by $T^{t-1}(T(\theta,a), [c_i^*(\theta,a)]^{1,t-1})$.

Consider the continuation path $a^{t,\infty}(i,k)$ of the ASPE $s(i,k)$ given the initial state $T(\theta(k), a(k))$. Let $\theta^t(k)$ be the state where the path $a^{t,\infty}(i,k)$ begins, i.e., $\theta^t(k) = T^{t-1}(T(\theta(k), a(k)), a^{1,t-1}(i,k))$. Since $a^{t,\infty}(i,k)$ is an ASPE path given the state $\theta^t(k)$, we have, for every $k$, $j$ and $a_j'$,
$$W_j(a^{t,\infty}(i,k)|\theta^t(k)) \geq W_j\big((a_j', a^t_{-j}(i,k)),\, a^\infty(k)\,\big|\,\theta^t(k)\big), \tag{3}$$
where
$$a^\infty(k) = a\big(s(i,k)|_{(T(\theta(k),a(k)),\, a^{1,t-1}(i,k),\, (a_j', a^t_{-j}(i,k)))}\big). \tag{4}$$
On the right-hand side of (4), we used an initial state and a finite sequence of action profiles instead of a history.

Fix any $a_j' \neq [c_i^*(\theta,a)]^t_j$, and take limits as $k \to \infty$. Since the state transition function $T$ is continuous, we have $\theta^t(k) \to \theta(h)$. Without loss of generality, we can assume that $a^\infty(k)$ converges to some path $\tilde{a}^\infty \in A^\infty$ as $k \to \infty$. By continuity, the left-hand side of inequality (3) converges to $W_j([c_i^*(\theta,a)]^{t,\infty}|\theta(h))$, while the right-hand side converges to $W_j((a_j', [c_i^*(\theta,a)]^t_{-j}), \tilde{a}^\infty|\theta(h))$. Note that
$$\big(\theta^t(k),\, (a_j', a^t_{-j}(i,k)),\, W((a_j', a^t_{-j}(i,k)), a^\infty(k)|\theta^t(k))\big) \in Z$$
for all $k$. Since $Z$ is closed, the limit $\big(\theta(h),\, (a_j', [c_i^*(\theta,a)]^t_{-j}),\, W((a_j', [c_i^*(\theta,a)]^t_{-j}), \tilde{a}^\infty|\theta(h))\big)$ is also in $Z$. By the definition of $\underline{v}_i(\theta, a)$, we have $W_j((a_j', [c_i^*(\theta,a)]^t_{-j}), \tilde{a}^\infty|\theta(h)) \geq \underline{v}_j(\theta(h), (a_j', [c_i^*(\theta,a)]^t_{-j}))$. Hence we have
$$W_j([c_i^*(\theta,a)]^{t,\infty}|\theta(h)) \geq \underline{v}_j\big(\theta(h), (a_j', [c_i^*(\theta,a)]^t_{-j})\big) = W_j\big((a_j', [c_i^*(\theta,a)]^t_{-j}),\, c_j^*(\theta(h), (a_j', [c_i^*(\theta,a)]^t_{-j}))\,\big|\,\theta(h)\big)$$
for every $h \in H(T(\tilde\theta, \tilde a))$, $j \in N$ and $a_j' \neq [c_i^*(\theta,a)]^t_j$. Observe that if these conditions are satisfied, then no one-shot deviation is profitable. Moreover, the state corresponding to the initial history, $\theta(h^1)$, is given by $T(\tilde\theta, \tilde a)$. This proves that $s(c^*_{\tilde i}(\tilde\theta, \tilde a), c_1^*, \ldots, c_n^*)$ is an ASPE given the initial state $T(\tilde\theta, \tilde a)$.

(2) For each $k = 1, 2, \ldots$, let $s(k) \in S(\theta(k))$ be an ASPE given the initial state $\theta(k)$ that generates the path $a^\infty(k)$, so that $a^\infty(k) = a(s(k))$. We want to show that $s(a^\infty, c_1^*, \ldots, c_n^*)$ is an ASPE given the initial state $\theta$. Every one-shot deviation occurring after an earlier deviation is unprofitable by Theorem 4(1), and thus it suffices to consider one-shot deviations when the play is along the path $a^\infty$. Since $s(k)$ is an ASPE and $a^\infty(k)$ is an ASPE path given the initial state $\theta(k)$, for every $k$ and for every $t$, $j$ and $a_j'$,
$$W_j(a^{t,\infty}(k)|\theta^t(k)) \geq W_j\big((a_j', a^t_{-j}(k)),\, a(s(k)|_{(\theta(k),\, a^{1,t-1}(k),\, (a_j', a^t_{-j}(k)))})\,\big|\,\theta^t(k)\big),$$
where $\theta^t(k) = T^{t-1}(\theta(k), a^{1,t-1}(k))$ for $t \geq 2$ and $\theta^1(k) = \theta(k)$. By an argument similar to that used in the proof of Theorem 4(1), we can show that, for every $t$, $j$ and $a_j' \neq a^t_j$,
$$W_j(a^{t,\infty}|\theta^t) \geq W_j\big((a_j', a^t_{-j}),\, c_j^*(\theta^t, (a_j', a^t_{-j}))\,\big|\,\theta^t\big),$$
where $\theta^t = T^{t-1}(\theta, a^{1,t-1})$ for $t \geq 2$ and $\theta^1 = \theta$. This establishes that there is no profitable one-shot deviation along the path $a^\infty$ given the initial state $\theta$, and thus $s(a^\infty, c_1^*, \ldots, c_n^*)$ is an ASPE given $\theta$.

Proof of Theorem 5

Proof. For notational simplicity, we use $g(\theta, s)$ and $G_i(a^{1,K}, r|\theta)$ to mean $g(\theta, a(s))$ and $G_i(g_i(\theta, a^{1,K}), r|\theta)$, respectively.

(a) Choose any self-generating correspondence $V: \Theta \rightrightarrows A^{K-1} \times V^\dagger$ and any $\theta^1 \in \Theta$. Take any $(\tilde{a}^{1,K-1}, v) \in B(V)(\theta^1)$. We define $(a^*_{h^t}, f_{h^t}, r_{h^t})$ for each $h^t \in H(\theta^1)$ inductively. We start from $h^1 = \theta^1$. By the definition of $B(V)(\theta^1)$, there exists $(a^*_{h^1}, f_{h^1}, r_{h^1})$ admissible w.r.t. $V$ given $\theta^1$ such that $(a^*_{h^1}, f_{h^1}(a^*_{h^1})^{1,K-2}) = \tilde{a}^{1,K-1}$ and, for all $i$,

$$v_i = F_i\big(g_i\big(T^{K-1}(\theta^1, (a^*_{h^1}, f_{h^1}(a^*_{h^1})^{1,K-2})),\, f_{h^1}(a^*_{h^1})^{K-1}\big),\, r_{h^1,i}(a^*_{h^1})\big).$$
For each $a^1 \in A$, we obtain $h^2 = (\theta^1, a^1, \theta^2) \in H(\theta^1)$ where $\theta^2 = T(\theta^1, a^1)$, and we have $(f_{h^1}(a^1), r_{h^1}(a^1)) \in V(\theta^2) \subset B(V)(\theta^2)$. Thus, there exists $(a^*_{h^2}, f_{h^2}, r_{h^2})$ admissible w.r.t. $V$ given $\theta^2$ such that $(a^*_{h^2}, f_{h^2}(a^*_{h^2})^{1,K-2}) = f_{h^1}(a^1)$ and, for all $i$,
$$r_{h^1,i}(a^1) = F_i\big(g_i\big(T^{K-1}(\theta^2, (a^*_{h^2}, f_{h^2}(a^*_{h^2})^{1,K-2})),\, f_{h^2}(a^*_{h^2})^{K-1}\big),\, r_{h^2,i}(a^*_{h^2})\big).$$
Suppose that we have worked up to $h^t$, for some $t = 2, 3, \ldots$. For each $a^t \in A$, we obtain $h^{t+1} = (h^t, a^t, \theta^{t+1}) \in H(\theta^1)$ where $\theta^{t+1} = T(\theta^t, a^t)$, and we have $(f_{h^t}(a^t), r_{h^t}(a^t)) \in V(\theta^{t+1}) \subset B(V)(\theta^{t+1})$. Thus, there exists $(a^*_{h^{t+1}}, f_{h^{t+1}}, r_{h^{t+1}})$ admissible w.r.t. $V$ given $\theta^{t+1}$ such that $(a^*_{h^{t+1}}, f_{h^{t+1}}(a^*_{h^{t+1}})^{1,K-2}) = f_{h^t}(a^t)$ and, for all $i$,
$$r_{h^t,i}(a^t) = F_i\big(g_i\big(T^{K-1}(\theta^{t+1}, (a^*_{h^{t+1}}, f_{h^{t+1}}(a^*_{h^{t+1}})^{1,K-2})),\, f_{h^{t+1}}(a^*_{h^{t+1}})^{K-1}\big),\, r_{h^{t+1},i}(a^*_{h^{t+1}})\big).$$
Once we are done with this process, define a strategy profile $s^* \in S(\theta^1)$ by setting $s^*(h^t) = a^*_{h^t}$ for all $h^t \in H(\theta^1)$. We show that $s^*$ is an ASPE given $\theta^1$ whose path starts with $\tilde{a}^{1,K-1}$ and that it generates $v$ as the continuation score from the $K$th period.

Note that, by construction, for any $h^t \in H(\theta^1)$ and $h^{t+1} = (h^t, a^*_{h^t}, T(\theta^t, a^*_{h^t}))$, we have

$f_{h^t}(a^*_{h^t}) = a^{2,K}(s^*|_{h^t})$ and $a(s^*|_{h^{t+1}}) = a^{2,\infty}(s^*|_{h^t})$. Hence, for any $h^t \in H(\theta^1)$ and $a^t \in A$, we have
$$\begin{aligned}
r_{h^t,i}(a^t) &= F_i\big(g_i\big(T^{K-1}(\theta^{t+1}, (a^*_{h^{t+1}}, f_{h^{t+1}}(a^*_{h^{t+1}})^{1,K-2})),\, f_{h^{t+1}}(a^*_{h^{t+1}})^{K-1}\big),\, r_{h^{t+1},i}(a^*_{h^{t+1}})\big) \\
&= F_i\big(g_i(\theta^{t+1}, s^*|_{h^{t+1}})^K,\, r_{h^{t+1},i}(a^*_{h^{t+1}})\big) \\
&= F_i\big(g_i(\theta^{t+1}, s^*|_{h^{t+1}})^K,\, F_i\big(g_i(\theta^{t+2}, s^*|_{h^{t+2}})^K,\, r_{h^{t+2},i}(a^*_{h^{t+2}})\big)\big) \\
&= F_i\big(g_i(\theta^{t+1}, s^*|_{h^{t+1}})^K,\, F_i\big(g_i(\theta^{t+1}, s^*|_{h^{t+1}})^{K+1},\, r_{h^{t+2},i}(a^*_{h^{t+2}})\big)\big) \\
&= \cdots,
\end{aligned}$$
where $\theta^{t+1} = T(\theta^t, a^t)$, $h^{t+1} = (h^t, a^t, \theta^{t+1})$, $\theta^{t+k+1} = T(\theta^{t+k}, a^*_{h^{t+k}})$, and $h^{t+k+1} = (h^{t+k}, a^*_{h^{t+k}}, \theta^{t+k+1})$ for $k = 1, 2, \ldots$. Since $\widehat{W}_i$ is monotone and $r_{h^\tau}(a) \in V^\dagger$ for all $h^\tau$ and $a$, we have
$$\widehat{W}_i\big(g_i(\theta^{t+1}, s^*|_{h^{t+1}})^{K,K+k}, \underline{g}, \underline{g}, \ldots\big) \leq r_{h^t,i}(a^t) \leq \widehat{W}_i\big(g_i(\theta^{t+1}, s^*|_{h^{t+1}})^{K,K+k}, \bar{g}, \bar{g}, \ldots\big)$$
for all $k = 1, 2, \ldots$. Since the left-hand and right-hand sides of the above inequality converge to $\widehat{W}_i(g_i(\theta^{t+1}, s^*|_{h^{t+1}})^{K,\infty})$ as $k$ goes to infinity by continuity, we have $r_{h^t,i}(a^t) = \widehat{W}_i(g_i(\theta^{t+1}, s^*|_{h^{t+1}})^{K,\infty})$. This implies that, for each $h^t \in H(\theta^1)$, (ii) in the admissibility conditions becomes
$$W_i\big(a^*_{h^t},\, a(s^*|_{(h^t,\, a^*_{h^t},\, T(\theta^t, a^*_{h^t}))})\,\big|\,\theta^t\big) \geq W_i\big((a_i', a^*_{h^t,-i}),\, a(s^*|_{(h^t,\, (a_i', a^*_{h^t,-i}),\, T(\theta^t, (a_i', a^*_{h^t,-i})))})\,\big|\,\theta^t\big)$$
for every $i \in N$ and $a_i' \in A_i$. Hence all the one-shot deviation constraints are satisfied at every $h^t \in H(\theta^1)$, and $s^*$ is an ASPE given $\theta^1$. It also implies that $v_i = \widehat{W}_i(g_i(\theta^1, s^*)^{K,\infty})$ for all $i$. By construction, $a^{1,K-1}(s^*) = \tilde{a}^{1,K-1}$. Therefore, $(\tilde{a}^{1,K-1}, v) \in V^*(\theta^1)$.

(b) Note that $V^*$ is a self-generating correspondence with its image in $A^{K-1} \times V^\dagger$. Hence $B(V^*)(\theta) \subset V^*(\theta)$ for all $\theta \in \Theta$ by (a). Therefore $V^*(\theta) = B(V^*)(\theta)$ for all $\theta \in \Theta$. Let $V$ be a fixed point of $B$ in $\mathcal{V}$. Then, by (a), $V(\theta) = B(V)(\theta) \subset V^*(\theta)$ for all $\theta \in \Theta$. Hence $V^*$ is the largest fixed point in $\mathcal{V}$.

Take any $\theta \in \Theta$ and $v \in E(\theta)$, and let $s^* \in S(\theta)$ be an ASPE given the initial state $\theta$ such that $W(a(s^*)|\theta) = v$. Define $a^* = s^*(\theta)$, $f(a) = a(s^*|_{(\theta, a, T(\theta,a))})^{1,K-1}$ and $r_i(a) = \widehat{W}_i(g_i(T(\theta,a), s^*|_{(\theta,a,T(\theta,a))})^{K,\infty})$, $i = 1, \ldots, n$, for each $a \in A$. Then it can be shown that $(a^*, f, r)$ is admissible w.r.t. $V^*$ given $\theta$ and $v_i = G_i((a^*, f(a^*)), r_i(a^*)|\theta)$ for all $i$. To show the other inclusion, take any $v$ in the set shown on the right-hand side of (1). Then there exists $(a^*, f, r)$ such that $(a^*, f, r)$ is admissible w.r.t. $V^*$ given $\theta$ and $v_i = G_i((a^*, f(a^*)), r_i(a^*)|\theta)$ for all $i$. Since $V^*$ is self-generating, we can construct an ASPE $s^*$ given the initial state $\theta$ such that $W(a(s^*)|\theta) = v$, as in the proof of (a).
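As an illustration of the telescoping-and-squeezing step in part (a) of the proof of Theorem 5, consider the special case of standard discounting, $F_i(g, r) = (1-\delta)g + \delta r$; this specification is assumed here purely for illustration, since the argument itself uses only monotonicity and continuity of $\widehat{W}_i$. Writing $g^m$ for $g_i(\theta^{t+1}, s^*|_{h^{t+1}})^m$, iterating the recursion $k$ times gives
$$r_{h^t,i}(a^t) = (1-\delta)\sum_{m=0}^{k-1}\delta^m g^{K+m} + \delta^k\, r_{h^{t+k},i}(a^*_{h^{t+k}}),$$
and the two bounds in the displayed inequality become
$$(1-\delta)\sum_{m=0}^{k}\delta^m g^{K+m} + \delta^{k+1}\underline{g} \;\leq\; r_{h^t,i}(a^t) \;\leq\; (1-\delta)\sum_{m=0}^{k}\delta^m g^{K+m} + \delta^{k+1}\bar{g}.$$
As $k \to \infty$, both bounds converge to the tail value $\widehat{W}_i(g^{K,\infty}) = (1-\delta)\sum_{m \geq 0}\delta^m g^{K+m}$, which is the conclusion reached above; the boundedness of $V^\dagger$ is what controls the remainder term $\delta^k r_{h^{t+k},i}(a^*_{h^{t+k}})$ in this special case and, more generally, what makes the squeezing argument work.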

References

[1] D. Abreu, “On the Theory of Infinitely Repeated Games with Discounting,” Econometrica 56 (1988) 383–396.

[2] D. Abreu, D. Pearce, and E. Stacchetti, “Toward a Theory of Discounted Repeated Games with Imperfect Monitoring,” Econometrica 58 (1990) 1041–1063.

[3] A. Caplin and J. Leahy, “The Recursive Approach to Time Inconsistency,” Journal of Economic Theory 131 (2006) 134–156.

[4] H. Chade, P. Prokopovych, and L. Smith, “Repeated Games with Present-Biased Preferences,” Journal of Economic Theory 139 (2008) 157–175.

[5] S. Frederick, G. Loewenstein, and T. O’Donoghue, “Time Discounting and Time Preference: A Critical Review,” Journal of Economic Literature 40 (2002) 351–401.

[6] A. Kochov and Y. Song, “Repeated Games with Endogenous Discounting,” working paper (2015).

[7] T. Koopmans, “Stationary Ordinal Utility and Impatience,” Econometrica 28 (1960) 287–309.

[8] D. Laibson, “Golden Eggs and Hyperbolic Discounting,” Quarterly Journal of Economics 112 (1997) 443–477.

[9] D. Laibson, “Life Cycle Consumption and Hyperbolic Discount Functions,” European Economic Review 42 (1998) 861–871.

[10] I. Obara and J. Park, “Repeated Games with General Discounting,” working paper (2016).

[11] T. O’Donoghue and M. Rabin, “Doing it Now or Later,” American Economic Review 89 (1999) 103–124.

[12] T. O’Donoghue and M. Rabin, “Choice and Procrastination,” Quarterly Journal of Economics 116 (2001) 121–160.

[13] J. L. M. Olea and T. Strzalecki, “Axiomatization and Measurement of Quasi-Hyperbolic Discounting,” Quarterly Journal of Economics 129 (2014) 1449–1499.

[14] B. Peleg and M. Yaari, “The Existence of a Consistent Course of Action when Tastes are Changing,” Review of Economic Studies 40 (1973) 391–401.


[15] E. Phelps and R. Pollak, “On Second-Best National Saving and Game-Equilibrium Growth,” Review of Economic Studies 35 (1968) 185–199.

[16] T. Sekiguchi and K. Wakai, “Repeated Games with Recursive Utility: Cournot Duopoly under Gain/Loss Asymmetry,” Kyoto University Discussion Paper No. e-16-006 (2016).

