Waiting Times in Evolutionary Dynamics with Time-Decreasing Noise

Katsuhiko Aiba∗

February 2014

Abstract. Expected waiting times for equilibrium selection increase exponentially as the noise level goes to zero in evolutionary models with time-constant noise, raising the question of whether the history-independent prediction of equilibrium selection is relevant in economic and social studies. However, using theoretical results on simulated annealing, we show that expected waiting times in models with time-decreasing noise need not tend toward infinity in the small-noise limit. Our model thus describes conditions under which the waiting-time critique of the predictions of stochastic stability theory may have less force.

Keywords: Evolutionary game theory, waiting times, simulated annealing
JEL codes: C61, C72, C73

1. Introduction

Kandori, Mailath, and Rob (1993) (henceforth KMR) and Young (1993) show that adding a small amount of noise to an adaptive adjustment process can help select between multiple strict Nash equilibria. While a Markov chain induced by the basic adjustment process has multiple steady states, the perturbed chain with a small noise level has a unique stationary distribution whose mass may become concentrated on a single strict Nash equilibrium, which is said to be stochastically stable. The noise in these models ensures that the chain escapes from steady states of the unperturbed adjustment process that are not stochastically stable. However, when the noise level is small, it may take an extremely long time to escape from such steady states; indeed, the expected time required to escape from the basin of attraction of a steady state increases exponentially as the noise level goes to zero (Ellison (1993, 2000), Beggs (2005)). This leads to the critique that the history-independent prediction obtained from stochastic stability analysis may not be relevant in economic and social studies: even with a moderate amount of noise, infinite-horizon predictions may carry little force within any relevant time span.

∗ Institute for Microeconomics, University of Bonn, Adenauerallee 24-42, 53113 Bonn, Germany. e-mail: [email protected]

In these models the noise level is assumed to be constant over time. An alternative plausible assumption is that the noise level declines over time. As the game is repeated, strategic uncertainty declines: players learn more about their environments and their opponents' play, and hence make fewer mistakes or experiment less often over time. If mutations come from these mistakes or experiments, then it may be preferable to assume that the noise level decreases over time.

In this paper we examine evolutionary dynamics with time-decreasing noise and estimate the expected waiting times until equilibrium selection. The Markov chain induced by time-decreasing noise is no longer time-homogeneous, so the techniques employed in the papers described above are not applicable here. Fortunately, our time-inhomogeneous Markov chain can be analyzed within the general framework of simulated annealing. Simulated annealing algorithms are procedures used to find the global minimizer of a function which may have multiple local minimizers. To avoid becoming trapped at a local minimizer, the algorithm employs large noise levels during an initial phase. After this, it seeks the global optimum using a sequence of ever smaller noise levels, which decreases slowly enough that the process is not trapped at local optima. By applying theoretical results on simulated annealing due to Catoni (1991) and Trouvé (1996), we derive conditions for equilibrium selection in evolutionary models with time-decreasing noise, and we estimate the expected waiting times required before these selection results become relevant.

We show that the stochastic evolutionary model with time-decreasing noise has the same limiting distribution as the model with time-constant noise if and only if the noise level decreases slowly enough. Trouvé (1996) provides the result that a state's escape from a basin of attraction obeys time-inhomogeneous exponential laws. We apply his result to derive an estimate of the expected waiting times before a state leaves the basins of attractors that are stochastically unstable, and show that there is a noise schedule that makes the expected waiting times bounded. Thus, one need not accept that expected waiting times tend toward infinity for equilibrium selection: declining noise levels bring faster equilibrium selection. Our result shows that the waiting-time critique of the predictions of stochastic stability theory may have less force than previously believed.

Stochastic evolutionary models with time-decreasing noise have been considered previously by Robles (1998), Sandholm and Pauzner (1998), Chen and Chow (2001), and Pak (2008). Relative to what these papers accomplish, the present work makes progress on two fronts. First, while the earlier papers only establish conditions for equilibrium selection results, we obtain precise estimates of the expected waiting times needed for these selection results to become relevant. Second, except for Robles (1998) and Pak (2008), the previous papers only consider the case in which the mutation rate is independent of payoffs, as in KMR. Our framework is broad enough to allow the mutation rate to depend on payoffs, as in the logit choice model considered in Blume (1993).

The paper proceeds as follows. Section 2 introduces a model of stochastic evolution with time-decreasing noise, which corresponds to the general framework of simulated annealing, and shows through two examples how evolutionary dynamics from the literature fit into this framework.
Section 3 provides estimates of the expected waiting times in evolutionary dynamics with time-decreasing noise. The paper concludes in Section 4.

2. The Model

This section introduces a rather abstract model of stochastic evolution with time-decreasing noise, which can be analyzed within the general framework of simulated annealing following Catoni (1991, 1992) and Trouvé (1996). We then show that this framework subsumes well-known stochastic evolutionary dynamics considered in the literature.

2.1 Evolution with Time-decreasing Noise

Suppose that there are N agents in a population. They play a symmetric game with strategy set S = {1, . . . , n}. The state x = (x_1, . . . , x_n) represents the vector of fractions of agents currently choosing each strategy, and the (finite) state space is E = {x ∈ R^n_+ | ∑_{i∈S} x_i = 1, Nx ∈ Z^n}, where Nx is the vector of the numbers of players playing each strategy. Let F_i(x) be the payoff to strategy i when the aggregate distribution of play is x, and let F(x) = (F_1(x), . . . , F_n(x)) be the payoff vector.

In the model of stochastic evolution with a time-constant noise level η usually considered in the literature, the state follows a stochastic evolutionary process: a time-homogeneous Markov chain (X_t)_{t∈N} on E induced by a transition probability function p_η, that is, P(X_t = y | X_{t−1} = x) = p_η(x, y). The Markov chain induced by p_0 = lim_{η→0} p_η describes some basic evolutionary process such as the best response dynamics, while the chain induced by p_η represents the perturbed process obtained by introducing mistakes or experiments.

In our model the transition probability function p_η is assumed to satisfy the following. First, we define an energy function U : E → R_+ with a unique global minimizer^1 such that min_{x∈E} U(x) = 0. Second, we define an irreducible transition probability function q : E × E → [0, 1] such that

∑_{y∈E} q(x, y) = 1 for all x ∈ E

and

sup_{t∈N} q^t(x, y) > 0 for all x, y ∈ E,

where q^t is the t-th step transition probability. Finally, given (E, q, U), we assume that the family {p_η}_{η∈R_{++}} of transition probabilities indexed by η ∈ R_{++} satisfies the following: there exists κ ≥ 1 such that for any η ∈ R_{++},

(1)  (1/κ) q(x, y) e^{−[U(y)−U(x)]_+/η} ≤ p_η(x, y) ≤ κ q(x, y) e^{−[U(y)−U(x)]_+/η} for all x, y ∈ E,

where [x]_+ := max{x, 0}, and p_0 = lim_{η→0} p_η is assumed to exist. U and q are determined by the specific evolutionary dynamics chosen by the modeler. The function U captures how the basic evolutionary process is perturbed. To see this, suppose that q(x, y) > 0 for now. If U(y) ≤ U(x), then p_0(x, y) > 0 from (1), and hence an immediate transition from x to y is possible in the basic process. If U(y) > U(x), then the transition is impossible, i.e., p_0(x, y) = 0, and as seen from (1) the difference U(y) − U(x) describes the rate of decay at which the transition probability in the perturbed process tends to zero as η approaches 0. We present two specific examples in the following subsection.

In this paper we consider a model of stochastic evolution in which the noise level (η_t)_{t∈N} decreases over time. Now the Markov chain (X_t)_{t∈N} on E induced by (p_{η_t})_{t∈N}, whose transition probability is given by P(X_t = y | X_{t−1} = x) = p_{η_t}(x, y), is time-inhomogeneous. When (p_{η_t})_{t∈N} satisfies equation (1), our time-inhomogeneous Markov chain with time-decreasing noise can be analyzed within the general framework of simulated annealing.

Simulated annealing algorithms aim to find the global minimizer of the energy function U on the finite state space E. The algorithm generates a time-inhomogeneous Markov chain with transition probability satisfying (1) and time-decreasing noise (η_t)_{t∈N}. With an appropriately decreasing noise level, it can be shown that the chain converges in distribution to the global minimizer of U wherever the chain starts. To understand the intuition behind the algorithm, think of an optimization procedure that starts at an arbitrary initial point and seeks the global minimizer (with energy 0) via allowable paths according to q (i.e., q(x, y) > 0 if x and y are successive points along the path).^2 In general, such a procedure leads to a local minimizer and gets stuck there forever. To avoid this problem, the procedure must occasionally go uphill even when it has the option to go downhill. Of course, since we usually do not have a priori knowledge of the entire shape of U, we do not know when experiments would be most beneficial. In the absence of such knowledge, the algorithm randomly goes up the hill from x to y with probability e^{−[U(y)−U(x)]_+/η}. Note that if η is large, this experiment probability is large, and if η is close to 0, this experiment probability is close to 0, so that the algorithm is almost the same as the steepest descent strategy. Hence, the time-decreasing noise schedule (η_t) enables the algorithm to escape from a local minimizer in the initial phase through experimentation, and to adopt the efficient descent procedure afterwards in order to reach the global minimizer.^3

^1 To handle the case of multiple global minimizers, we would have to introduce more tedious definitions. Since this is not the typical situation in evolutionary models, we assume a unique global minimizer throughout this paper.
^2 One such procedure is the steepest descent procedure: from point x, one moves to y if y is the minimizer of U for which q(x, y) > 0 and U(x) ≥ U(y), and one stays at x if there is no such y.
^3 But we should be careful about the speed at which the noise level decreases, because the algorithm might be trapped at local optima even in later periods if the noise level decreases to zero too rapidly. See Proposition 3.3.
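To make the abstract definition concrete, here is a minimal sketch (ours, not the paper's; the primitives E, q, U and all names are illustrative assumptions) of one step of a chain whose transition probability has form (1): a move is proposed according to q, and an uphill move is accepted with probability e^{−[U(y)−U(x)]_+/η}, so that for y ≠ x we get p_η(x, y) = q(x, y) e^{−[U(y)−U(x)]_+/η}, which satisfies (1) with κ = 1 on off-diagonal transitions.

# Sketch only: E, q, U are assumed primitives in the sense of Section 2.1.
import math
import random

def anneal_step(x, states, q, U, eta, rng=random):
    """One transition of p_eta: propose y ~ q(x, .), accept an uphill move
    with probability exp(-[U(y) - U(x)]_+ / eta); otherwise stay at x."""
    y = rng.choices(states, weights=[q(x, z) for z in states])[0]
    uphill = max(U(y) - U(x), 0.0)
    return y if rng.random() < math.exp(-uphill / eta) else x

def simulate(x0, states, q, U, schedule, horizon, rng=random):
    """Time-inhomogeneous chain with a decreasing noise schedule (eta_t)."""
    x = x0
    for t in range(1, horizon + 1):
        x = anneal_step(x, states, q, U, schedule(t), rng)
    return x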

2.2 Examples of Evolutionary Dynamics

In this subsection we consider evolutionary models studied in the literature that fit into the general framework introduced above.


Example 2.1. Best response with mutations. Suppose that N agents are randomly matched to play the 2 × 2 symmetric coordination game below, where a > c and d > b. While the state space is E = {(x_1, x_2) ∈ R^2_+ | x_1 + x_2 = 1, Nx_i ∈ Z_+}, it is convenient to identify x with x_2, the mass of agents playing strategy 2. Let x∗ = (a − c)/(a − c + d − b) be the proportion of agents playing strategy 2 in the mixed Nash equilibrium. We assume that x∗ ≥ 1/2, so that strategy 1 is risk dominant.

         s1      s2
  s1    a, a    b, c
  s2    c, b    d, d

In this game, agents are never matched with themselves. Hence, expected payoffs are

F_1(x) = [N(1−x)−1]/(N−1) · a + Nx/(N−1) · b   for x ∈ {0, . . . , (N−1)/N},
F_2(x) = N(1−x)/(N−1) · c + (Nx−1)/(N−1) · d   for x ∈ {1/N, . . . , 1}.

We assume that agents follow the clever decision rule (Sandholm (1998)): when comparing strategies' payoffs, agents account for the fact that their switch from strategy i to strategy j when the state is x changes the state to x + (1/N)(j − i); that is, agents compare their current payoff F_i(x) to F_j(x + (1/N)(j − i)) rather than to F_j(x). In this coordination game, a clever agent prefers strategy 1 if less than a fraction x∗ of his opponents play strategy 2, and prefers strategy 2 if more than a fraction x∗ of his opponents play strategy 2. That is,

sgn( F_2((N−1)/N · y + 1/N) − F_1((N−1)/N · y) ) = sgn(y − x∗)   for y ∈ {0, 1/(N−1), 2/(N−1), . . . , 1},

where sgn(d) equals 1, 0, and −1 when d is positive, zero, and negative, respectively. Letting x = (N−1)/N · y, this is equivalent to

sgn( F_2(x + 1/N) − F_1(x) ) = sgn( x − (N−1)/N · x∗ )   for x ∈ {0, 1/N, 2/N, . . . , (N−1)/N}.

Finally, we consider the following dynamics: in each period, one player is randomly picked to revise his strategy, and he chooses the optimal strategy with probability 1 − ε; otherwise he chooses the suboptimal strategy. Before describing the transition probability of these dynamics, we consider the energy function

(2)  U(x) = − ∑_{j=1}^{Nx} sgn( F_2(j/N) − F_1((j−1)/N) ).

Suppose that (N−1)/N · x∗ ∉ E, and define ⌈y⌉ = min{z ∈ Z | y ≤ z}. Note that when x < ⌈(N−1)x∗⌉/N,

U(x + 1/N) − U(x) = −sgn( F_2(x + 1/N) − F_1(x) ) = 1   and
U(x − 1/N) − U(x) = sgn( F_2(x) − F_1(x − 1/N) ) = −1.

For the other cases, x > ⌈(N−1)x∗⌉/N and x = ⌈(N−1)x∗⌉/N, the differences of the function U are computed similarly. See Figure 1 for the shape of U. Observe that the energy function U counts the number of mutations needed to escape the basins of attraction, because it increases as the state moves against the flow of the best response dynamics and decreases as the state moves toward an attractor when it resides in the attractor's basin. The transition probability of these dynamics is given by

p_ε(x, x + 1/N) = (1 − x) ε^{[U(x+1/N)−U(x)]_+} (1 − ε)^{[U(x+1/N)−U(x)]_−}   for x ∈ {0, 1/N, . . . , 1 − 1/N},
p_ε(x, x − 1/N) = x ε^{[U(x−1/N)−U(x)]_+} (1 − ε)^{[U(x−1/N)−U(x)]_−}   for x ∈ {1/N, . . . , 1 − 1/N, 1},
p_ε(x, x) = 1 − p_ε(x, x − 1/N) − p_ε(x, x + 1/N)   for x ∈ {0, 1/N, . . . , 1},
p_ε(x, y) = 0   otherwise,

where [x]_− = |min{x, 0}| and p_ε(0, −1/N) = p_ε(1, 1 + 1/N) = 0. Letting ε = e^{−1/η}, we see that the transition probability has form (1). Note that as η goes to zero, the dynamics approximate the best response dynamics, so η can be interpreted as a noise level. Finally, if (N−1)/N · x∗ ≥ 1/2, then U(0) = 0 (< U(1)) attains the global minimum of the energy function. §
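The pieces of this example translate directly into code. The following sketch (ours; the payoff values and the population size are illustrative assumptions satisfying a > c, d > b, and x∗ ≥ 1/2) computes the clever payoffs and the energy function (2) on the grid {0, 1/N, . . . , 1}.

# Sketch of Example 2.1 with illustrative parameters.
N = 100
a, b, c, d = 3.0, 0.0, 0.0, 2.0
x_star = (a - c) / (a - c + d - b)            # = 0.6 here, so strategy 1 is risk dominant

def F1(x):  # expected payoff to strategy 1 when a fraction x plays strategy 2
    return (N * (1 - x) - 1) / (N - 1) * a + N * x / (N - 1) * b

def F2(x):  # expected payoff to strategy 2
    return N * (1 - x) / (N - 1) * c + (N * x - 1) / (N - 1) * d

def sgn(v):
    return (v > 0) - (v < 0)

def U(k):
    """Energy (2) at state x = k/N: partial sums of -sgn(F2(j/N) - F1((j-1)/N))."""
    return -sum(sgn(F2(j / N) - F1((j - 1) / N)) for j in range(1, k + 1))

assert U(0) == 0 and U(0) < U(N)              # x = 0 attains the global minimum here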

[Figure 1: Energy function U. The figure plots U against the state x ∈ [0, 1], marking the points 0, (⌈(N−1)x∗⌉−1)/N, (N−1)x∗/N, ⌈(N−1)x∗⌉/N, and 1 on the horizontal axis.]

Example 2.2. Logit choice in a potential game. In a potential game, payoff differences are equal to first differences of a potential function. These games were first introduced by Monderer and Shapley (1996) and have been studied vigorously in evolutionary game theory. We call a game F : E → R^n a potential game if it admits a potential function U : E → R such that min_{x∈E} U(x) = 0 and

F_j(x + (1/N)(e_j − e_i)) − F_i(x) = U(x) − U(x + (1/N)(e_j − e_i))   for all x ∈ E and i, j ∈ S,

where e_i is the vector with one in the i-th element and zeros otherwise.^4 It is easy to see that a Nash equilibrium of the game corresponds to a local minimizer of U.

^4 Note that we take the opposite sign of the standard definition of a potential function in order to make it correspond to the definition of an energy function.

In each period, one player is picked at random to revise his strategy. He randomly picks a strategy to compare to his current strategy. Although the revising player tries to choose the better strategy, he makes mistakes, which are represented by the pairwise logit choice model. That is, the probability of the revising player choosing strategy j over strategy i is given by

e^{π_j/η} / (e^{π_i/η} + e^{π_j/η}),

where π_i is the payoff of strategy i. With two strategies (n = 2), this rule coincides with the standard logit choice model considered in Blume (1993). We assume the pairwise comparison case here to make it easy to check that the rule satisfies (1), though it can be shown that standard logit choice also satisfies (1). The transition probability for the pairwise logit choice model is

p_η(x, y) = q(x, y) · e^{F_j(y)/η} / (e^{F_i(x)/η} + e^{F_j(y)/η})   if y = x + (1/N)(e_j − e_i), i, j ∈ S, i ≠ j,
p_η(x, y) = 1 − ∑_{z≠x} p_η(x, z)   if x = y,
p_η(x, y) = 0   otherwise,

where q(x, y) = (1/n) x_i for y = x + (1/N)(e_j − e_i), which is the probability that an agent playing strategy i is picked to revise his strategy and considers switching to strategy j. Note that as η decreases, the probability of a mistake goes to zero, so we can again interpret η as the noise level. Indeed, the chain induced by p_0 describes a best response dynamic with zero noise: an agent getting the opportunity to switch her strategy does so only when the other strategy is the best response. Observe that

q(x, y) · e^{F_j(y)/η} / (e^{F_i(x)/η} + e^{F_j(y)/η})
  = q(x, y) · e^{−(F_i(x)−F_j(y))/η} / (1 + e^{−(F_i(x)−F_j(y))/η})
  = q(x, y) · ( e^{[U(y)−U(x)]_−/η} / (1 + e^{−(U(y)−U(x))/η}) ) · e^{−[U(y)−U(x)]_+/η}
  = ( q(x, y) / (e^{−[U(y)−U(x)]_−/η} + e^{−[U(y)−U(x)]_+/η}) ) · e^{−[U(y)−U(x)]_+/η},

where the second equality uses the potential property F_i(x) − F_j(y) = U(y) − U(x).

Now since 1 ≤ e^{−[U(y)−U(x)]_−/η} + e^{−[U(y)−U(x)]_+/η} ≤ 2 for any η > 0, we can see that the transition probability has form (1). §

As a remark, note that both examples are potential games, since the symmetric 2 × 2 game in Example 2.1 is known to have a potential function. However, as is clear from equation (2), the energy function U in Example 2.1 is not the potential function,^5 while U coincides with the potential function in Example 2.2. Hence, as we mentioned before, the shape of the energy function U depends on which dynamic is considered.

^5 But U is an ordinal transformation of the potential function.
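As an illustration of Example 2.2's revision protocol, the following sketch (ours, not the paper's; the payoff function F and the tuple state encoding are assumptions) implements one pairwise-logit revision with a numerically safe form of the choice probability.

# Sketch of one pairwise-logit revision; F(x) is assumed to return the payoff
# vector at state x, with states encoded as tuples of fractions summing to 1.
import math
import random

def logit_prob(pi_i, pi_j, eta):
    """P(choose j over i) = e^{pi_j/eta} / (e^{pi_i/eta} + e^{pi_j/eta}),
    computed by subtracting the larger payoff to avoid overflow."""
    m = max(pi_i, pi_j)
    return math.exp((pi_j - m) / eta) / (
        math.exp((pi_i - m) / eta) + math.exp((pi_j - m) / eta))

def revise(x, F, N, eta, rng=random):
    """q(x, y) = (1/n) x_i: pick a strategy-i player and a candidate j, then
    move to y = x + (e_j - e_i)/N with the logit probability (clever payoffs:
    F_j is evaluated at the post-switch state y, as in the text)."""
    n = len(x)
    i = rng.choices(range(n), weights=x)[0]
    j = rng.randrange(n)
    if j == i:
        return x
    y = tuple(v + (1.0 if k == j else -1.0 if k == i else 0.0) / N
              for k, v in enumerate(x))
    return y if rng.random() < logit_prob(F(x)[i], F(y)[j], eta) else x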


2.3 Communication Subclasses and Depth

Let us introduce some definitions capturing the landscape defined by the triple (E, q, U). We introduce the contour set E_λ of U defined by E_λ = U^{−1}([0, λ]) and define an equivalence relation R_λ on E by

(x, y) ∈ R_λ ⇔ sup_{n∈Z_+} (q_λ)^n(x, y) > 0,

where

q_λ(x, y) = q(x, y) if (x, y) ∈ E_λ × E_λ, and q_λ(x, y) = 0 otherwise.

We adopt the convention that q_λ^0 is the identity matrix, so that (x, x) ∈ R_λ for any x ∈ E. The definition says that (x, y) ∈ R_λ if there is a path from x to y which does not go through any energy level above λ. We denote by C_λ(E, q, U) the set of all equivalence classes of R_λ. We define the set of (communication) subclasses^6 of (E, q, U) by

C(E, q, U) = ⋃_{λ∈R_+} C_λ(E, q, U).

For an arbitrary set C ⊂ E, the boundary of C is B(C) := {y ∈ E − C | ∃x ∈ C, q(x, y) > 0}, and the energy of C is U(C) := min_{x∈C} U(x). Finally, we define the depth of C as

H(C) = min_{y∈B(C)} [U(y) − U(C)]_+.

That is, the depth of C is the smallest increase^7 in energy sufficient to escape from "the bottom" of C.

^6 These are usually called cycles in the simulated annealing literature.
^7 The reason [·]_+ is needed here is that min_{y∈B(C)} {U(y) − U(C)} might be negative.

Example 2.3. Best response with mutations once more. Given the energy function (2), two of the nontrivial subclasses are C_1 = {0, . . . , (⌈(N−1)x∗⌉−1)/N} and C_2 = {⌈(N−1)x∗⌉/N, . . . , 1}. Both are the basins of attraction of the steady states (x = 0, 1) of the best response dynamics. The depths of the two sets are H(C_1) = U(⌈(N−1)x∗⌉/N) − U(0) and H(C_2) = U((⌈(N−1)x∗⌉−1)/N) − U(1), respectively. §
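These definitions are easy to operationalize. The sketch below (ours; E, q, and U are assumed to be given as a list and two functions) computes B(C) and H(C) directly from the definitions; applied to the birth-death chain of Example 2.1 with C = C_1 or C_2, it reproduces the depths stated above.

# Sketch: boundary and depth of a set C from the primitives (E, q, U).
def boundary(C, E, q):
    """B(C) = { y outside C reachable in one q-step from some x in C }."""
    Cset = set(C)
    return {y for y in E if y not in Cset and any(q(x, y) > 0 for x in C)}

def depth(C, E, q, U):
    """H(C) = min over y in B(C) of [U(y) - U(C)]_+, where U(C) = min U on C."""
    U_C = min(U(x) for x in C)
    return min(max(U(y) - U_C, 0.0) for y in boundary(C, E, q))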

3. Waiting Times with Time-Decreasing Noise

This section provides an estimate of the expected waiting times in stochastic evolution with time-decreasing noise, using theoretical results on the simulated annealing algorithm. To make the issue clear, we first show the trade-off between equilibrium selection and expected waiting times in the time-constant noise case, which is familiar from the evolutionary game literature. In Subsection 3.2 we show that this familiar trade-off can be ameliorated by introducing time-decreasing noise.

3.1 Waiting Times in the Time-Constant Noise Case

First, we suppose that the noise level η is constant through time. This case, which is called the Metropolis algorithm in the optimization literature, is well known in evolutionary game theory.


Since q is irreducible, p_η is irreducible, so there exists a unique stationary distribution µ_η. Moreover, if p_η is aperiodic, which is usually satisfied for the Markov chains considered in evolutionary models (including Examples 2.1 and 2.2), then we have the standard result that lim_{t→∞} P_η(X_t = x) = µ_η(x) for any initial distribution, where P_η is the probability measure induced by p_η. Now define µ∗ = lim_{η→0} µ_η. Then, from Catoni (1995), which is based on Freidlin and Wentzell (1998), we have

Proposition 3.1. For any x ∈ E,

(3)  lim_{η→0} −η ln µ_η(x) = U(x).

In particular, µ∗(x) = 0 if x ≠ argmin_{y∈E} U(y).

Although the chain generated by p_0 may have multiple stationary distributions, the perturbed chain generated by p_η has a unique stationary distribution. If we let the noise level go to zero, then the probability mass of µ∗ = lim_{η→0} µ_η concentrates on the global minimizer of the energy function. However, we should be careful about how relevant these results are. Even though µ∗ predicts that a unique state is selected, it may take an exceedingly long time for the perturbed chain to reach it for small η if there are other limit points of the unperturbed dynamics p_0 and the chain starts in their neighborhood. Define, for a set A ⊂ E, τ(A) = inf{t ≥ 0 | X_t ∈ A}, that is, the first time the set A is reached. The characterization of the expected waiting time by Freidlin and Wentzell (1998)'s graph argument leads to:

Proposition 3.2. For any subclass^8 C ∈ C(E, q, U) with C ≠ E, and C̄ = E \ C, there exists a constant b > 0 such that

(4)  b e^{H(C)/η} ≤ sup_{x∈C} E_η{τ(C̄) | X_0 = x} ≤ b^{−1} e^{H(C)/η},

where the expectation E_η is taken relative to P_η.

Proof. See Appendix. ∎

Consider Example 2.1. Suppose that strategy 1 is the risk-dominant equilibrium, so the state x = 0 is selected by µ∗ by Proposition 3.1. Recall that C_2 = {⌈(N−1)x∗⌉/N, . . . , 1} is the basin of attraction of the other steady state x = 1. Then Proposition 3.2 says that the expected amount of time needed to escape from C_2 increases exponentially as the noise level η goes to zero. This has called into question whether history-independent, infinite-horizon predictions are relevant in economic applications. In this paper we especially emphasize the following: if we want to make sure that the mass of the limiting distribution concentrates on states selected by µ∗, then we have to let the noise level go to zero, as seen from equation (3). But then, from (4), it seems that we have to accept that the expected waiting times become unbounded. In the following subsection, we see that time-decreasing noise may resolve this trade-off.

^8 Actually we can obtain the same bound for any set D ⊂ E, but then we would have to introduce more definitions in order to define a depth H(D) for a general set D. So here we focus on subclasses.
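The exponential blow-up in (4) is easy to verify numerically: for fixed η, the expected escape times h(x) = E_η[τ(C̄) | X_0 = x] solve a linear system. A sketch (ours; the transition-matrix constructor p_matrix(eta) and the index set C2_indices are assumed to come from Example 2.1):

# Numerical check of (4): on C, h = 1 + P_CC h, with h = 0 off C.
import numpy as np

def expected_escape_times(P, C_indices):
    """Expected hitting times of the complement of C, for each start in C."""
    idx = sorted(C_indices)
    P_CC = P[np.ix_(idx, idx)]                  # chain restricted to C
    h = np.linalg.solve(np.eye(len(idx)) - P_CC, np.ones(len(idx)))
    return dict(zip(idx, h))

# Shrinking eta should make the maximal escape time grow like e^{H(C2)/eta}:
# for eta in (0.4, 0.2, 0.1):
#     print(eta, max(expected_escape_times(p_matrix(eta), C2_indices).values()))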

3.2 Waiting Times in the Time-Decreasing Noise Case

Suppose that the noise level η is represented by a sequence (η_t)_{t∈N} that is decreasing over time. Now the evolutionary process is represented by a time-inhomogeneous Markov chain (X_t)_{t∈N} induced by (p_{η_t})_{t∈N}. Again, in the optimization literature, (E, q, U, (p_{η_t})_{t∈N}, (X_t)_{t∈N}) is called a simulated annealing algorithm, and it is expected to speed up the convergence to an optimal state in comparison to the Metropolis algorithm.^9 In this subsection we use theoretical results on simulated annealing algorithms to derive necessary and sufficient conditions for convergence to µ∗ as well as a bound on the waiting time.

Trouvé (1996) gives a necessary and sufficient condition for the simulated annealing chain to reach the global minimizer of the energy function.^10 Let P_{(η·)} denote the probability measure induced by (p_{η_t})_{t∈N}.

Proposition 3.3 (Trouvé (1996, Theorem 5.2)). Suppose lim_{t→∞} η_t = 0. For any initial distribution,

(5)  lim_{t→∞} P_{(η·)}(X_t = x) = µ∗(x)

if and only if the schedule (η_t)_{t∈N} satisfies

(6)  ∑_{t=1}^{∞} e^{−H∗/η_t} = ∞,

where H∗ = max{H(C) | C ∈ C(E, q, U), C contains a state that is a local but not a global minimum of U}.

H∗ is the least upper bound on the altitude the chain must overcome, no matter where it starts and no matter which path it chooses, in order to reach the global minimum of U. That is, H∗ is roughly the depth of "the second-deepest valley" of U. The result imposes a restriction on the speed at which the noise decreases if the chain is to converge to the stochastically stable state. Condition (6) means that the non-increasing schedule (η_t)_{t∈N} should not go to zero so fast that the chain cannot escape from the second-deepest valley; otherwise the chain would get stuck in a local but not global minimum with positive probability.

^9 In the optimization problem the designer is allowed to choose optimal (possibly variable) step sizes. But in our evolutionary model the state space E is a finite subset of an n-dimensional grid. Moreover, it is standard to assume a single mutation in each period: we consider a period as a time interval short enough that the probability of more than one individual changing his action in the same period is negligible, as in the construction of Poisson processes. Hence, in our model the step size is exogenously fixed.
^10 Hajek (1988) derives the above necessary and sufficient condition for a more restrictive setup.
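Condition (6) is straightforward to probe numerically. Under η_t = h/ln(t + 1), the terms e^{−H∗/η_t} = (t + 1)^{−H∗/h} form a divergent series exactly when h ≥ H∗; the sketch below (ours, with illustrative numbers) contrasts the two regimes via partial sums.

# Sketch: partial sums of sum_t exp(-H_star / eta_t) under logarithmic cooling.
def partial_sum(h, H_star, T):
    # With eta_t = h / ln(t + 1), each term equals (t + 1) ** (-H_star / h).
    return sum((t + 1) ** (-H_star / h) for t in range(1, T + 1))

# partial_sum(h=2.0, H_star=2.0, T=10**6) keeps growing like log T, so (6)
# holds, while partial_sum(h=1.0, H_star=2.0, T=10**6) approaches a finite
# limit, so the chain may get stuck in a non-global minimum.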


Proposition 3.3 encompasses the results obtained in the earlier papers dealing with time-decreasing noise. Let ε_t = e^{−1/η_t}, as considered in Example 2.1 of best response with mutations. Then condition (6) becomes ∑_{t=1}^{∞} ε_t^{H∗} = ∞. It is easily seen that H∗ is equal to the minimum coradius considered in Robles (1998) when the energy function is given by (2). That is, condition (6) becomes identical to the condition of Robles (1998). Hence his result can be understood as the simulated annealing algorithm being applied to a KMR-type model. Similarly, suppose that η_t takes the parametric form η_t = h / ln(t + 1).

Then condition (6), and hence (5), holds if and only if h ≥ H∗. Again, letting ε_t = e^{−1/η_t} and substituting the above parametric form for η_t gives ε_t = (t + 1)^{−1/h}, which is identical to the noise-decreasing schedule for the selection result considered in Sandholm and Pauzner (1998) when the energy function is given by (2). In their symmetric 2 × 2 coordination game, H∗ then corresponds to the size of the basin of the attractor that is not the risk-dominant equilibrium.^11 They show that if the noise ε_t decreases more slowly than (t + 1)^{−1/H∗}, then the equilibrium selection result is obtained, which is equivalent to Proposition 3.3. Hence, their result can be interpreted as applying simulated annealing to a KMR-type model with time-decreasing noise of the above parametric form.

Although the above papers and others consider conditions for convergence in stochastic evolution with time-decreasing noise, no paper provides an estimate of the expected waiting time. A bound on the expected waiting time of a simulated annealing algorithm is not explicitly calculated in the literature. However, we can calculate it from estimates due to Trouvé (1996). Define, for a set A and a time s ∈ N, τ(A, s) = inf{t > s | X_t ∈ A}, that is, the first time the set A is reached after time s.

Lemma 3.4 (Trouvé (1996, Theorem 4.5)^12). There exists a schedule (η_t)_{t∈N} such that for a subclass C ∈ C(E, q, U) and any x ∈ C, there exist positive constants (a, b, c, d) such that for any s, t ∈ Z_+,

a ∏_{k=s+1}^{t} (1 − b e^{−H(C)/η_k}) ≤ P_{(η·)}{τ(C̄, s) > t | X_s = x} ≤ c ∏_{k=s+1}^{t} (1 − d e^{−H(C)/η_k}).

The main message here is that the probability P_{(η·)}{τ(C̄, s) > t | X_s = x} of the chain staying in C between times s and t is of order ∏_{k=s+1}^{t} (1 − a e^{−H(C)/η_k}), so roughly speaking the jump out of C can be thought of as obeying time-inhomogeneous exponential laws.^13 From this, using the formula E(τ) = ∑_t P(τ > t), we obtain the main result of this paper:

^11 Specifically, −1/H∗ is equal to U(M, x∗) in their model.
^12 Catoni (1991, Lemma 5.21) first proves the above statement for a more restrictive model, and Trouvé (1996, Theorem 4.5) proves it for the general simulated annealing algorithm.
^13 The motivation for introducing the definition of a subclass is that at a low noise level η, the probability of escaping from a subclass C is approximately given by e^{−H(C)/η}, irrespective of the starting point within C.

Proposition 3.5. There exists a schedule (η_t)_{t∈N} such that for a subclass C ∈ C(E, q, U) and any x ∈ C, there exist positive constants (a, b, c, d) such that

(7)  a ∑_{t=0}^{∞} ∏_{k=1}^{t} (1 − b e^{−H(C)/η_k}) ≤ E_{(η·)}{τ(C̄) | X_0 = x} ≤ c ∑_{t=0}^{∞} ∏_{k=1}^{t} (1 − d e^{−H(C)/η_k}).

Note that if we assume a constant noise level, then equation (7) reduces to equation (4).^14 From Propositions 3.3 and 3.5, we get the following corollary.

Corollary 3.6. Take any C ∈ C(E, q, U) that contains a state that is a local but not a global minimum of U, and any x ∈ C. Then one schedule that both ensures (5) and makes the series in (7) convergent is

η_k = h / ln(k + 1)   for h > H∗.

Proof. From the argument after Proposition 3.3, the schedule satisfies condition (6) and hence ensures (5). Let r := H(C)/h < 1. Then, since 1 − x ≤ e^{−x}, the right-hand side of (7) satisfies

c ∑_{t=0}^{∞} ∏_{k=1}^{t} (1 − d e^{−H(C)/η_k}) ≤ c ∑_{t=0}^{∞} e^{−d ∑_{k=1}^{t} e^{−H(C)/η_k}} = c ∑_{t=0}^{∞} e^{−d ∑_{k=1}^{t} e^{−r ln(k+1)}}.

Then, since

∑_{k=1}^{t+1} (1/k)^r ≥ ∫_1^{t+2} (1/x)^r dx = (1/(1−r)) [(t+2)^{1−r} − 1],

we have

e^{−d ∑_{k=1}^{t} e^{−r ln(k+1)}} = e^{−d ∑_{k=1}^{t} (1/(k+1))^r} = e^d e^{−d ∑_{k=1}^{t+1} (1/k)^r} ≤ e^d e^{−(d/(1−r))[(t+2)^{1−r} − 1]} = e^{d(2−r)/(1−r)} e^{−(d/(1−r))(t+2)^{1−r}}.

By an application of L'Hôpital's rule, we can easily see that there exists K > 0 such that e^{−(d/(1−r))(t+2)^{1−r}} ≤ (1/(t+2))^2 for all t ≥ K. So we can conclude from the comparison test for series convergence that the series c ∑_{t=0}^{∞} ∏_{k=1}^{t} (1 − d e^{−H(C)/η_k}) is convergent. ∎
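A quick numerical illustration of the corollary (our sketch; c, d, and the depth H(C) are illustrative constants): under η_k = h/ln(k + 1), the factor for step k is 1 − d(k + 1)^{−H(C)/h}, and the partial sums of the upper bound in (7) level off when h > H(C).

# Sketch: partial sums of the upper bound in (7) under eta_k = h / ln(k + 1).
def bound_partial_sum(H_C, h, c=1.0, d=1.0, T=200_000):
    total, prod = 0.0, 1.0
    for t in range(T):
        total += c * prod                                   # term: prod over k <= t
        prod *= 1.0 - min(d * (t + 2) ** (-H_C / h), 1.0)   # factor for k = t + 1
    return total

# bound_partial_sum(H_C=2.0, h=3.0) stabilizes (bounded expected waiting time),
# whereas under constant noise eta the corresponding sum grows like e^{H_C/eta}
# as eta -> 0.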

Hence, time-decreasing noise enables us to obtain a limiting distribution whose mass concentrates on the minimizer of the energy function without accepting unbounded expected waiting times. Recall Example 2.1, where the state x = 0 corresponding to the risk-dominant equilibrium is selected by µ∗. By Proposition 3.5, if the probability of mistakes in choosing the best response decreases over time according to ε_t = e^{−1/η_t} = (t + 1)^{−1/h}, then the expected time needed to escape from C_2 = {⌈(N−1)x∗⌉/N, . . . , 1}, which is the basin of attraction of the other steady state x = 1, is bounded. This point is in stark contrast with the time-constant noise case. It is natural to assume that agents in a society learn more about their environments and make fewer mistakes or experiments over time. Hence, if analysts take this into consideration, it might be legitimate to make history-independent, infinite-horizon predictions even in economic and social studies.

Finally, Robles (1998), Sandholm and Pauzner (1998), Chen and Chow (2001), and Pak (2008) also consider time-decreasing noise, but there are two differences between their results and ours. First, they only show conditions for convergence to µ∗, although Sandholm and Pauzner (1998) derive the stronger statement that the stochastic evolutionary process almost surely converges if ε_t = O(t^{−1/H∗}), which is not derived in this paper. But none of them calculates the expected waiting time. Second, except for Robles (1998) and Pak (2008), they only consider the case in which the mutation rate ε = e^{−1/η} is independent of payoffs, as in Example 2.1. But our general framework allows the mutation rate e^{−[U(y)−U(x)]_+/η_k} to depend on payoffs, as in Example 2.2.^15

^14 Letting η_k = η for all k and then letting x = 1 − b e^{−H(C)/η}, we have ∑_{t=0}^{∞} ∏_{k=1}^{t} (1 − b e^{−H(C)/η}) = ∑_{t=0}^{∞} x^t = 1/(1−x) = b^{−1} e^{H(C)/η}.
^15 Robles (1998) derives a condition for ergodicity in the case of state-dependent mutations, as in Bergin and Lipman (1996). Pak (2008) considers more general mutations, as in Ellison (2000), and allows a limit set under the base dynamics to have a periodic cycle, which the previous papers exclude.
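As a complement, the following simulation sketch (ours; all parameter values are illustrative) estimates the first exit time from C_2 in Example 2.1 when the mutation rate decays as ε_t = (t + 1)^{−1/h}. With N = 20 and the payoffs below, H(C_2) = 7 = H∗, so h = 8 satisfies h > H∗ as in Corollary 3.6, and sample escape times remain moderate rather than exploding as they would under small constant noise.

# Simulation sketch for Example 2.1 under decreasing noise (illustrative
# parameters with a > c, d > b, so x* = 0.6 and strategy 1 is risk dominant).
import math
import random

def escape_time_from_C2(N=20, a=3.0, b=0.0, c=0.0, d=2.0, h=8.0,
                        rng=random, t_max=10**7):
    F1 = lambda x: (N * (1 - x) - 1) / (N - 1) * a + N * x / (N - 1) * b
    F2 = lambda x: N * (1 - x) / (N - 1) * c + (N * x - 1) / (N - 1) * d
    sgn = lambda v: (v > 0) - (v < 0)
    U = [0] * (N + 1)                                 # energy (2) on the grid k/N
    for j in range(1, N + 1):
        U[j] = U[j - 1] - sgn(F2(j / N) - F1((j - 1) / N))
    x_star = (a - c) / (a - c + d - b)
    k_exit = math.ceil((N - 1) * x_star) - 1          # first state below C2
    # here H(C2) = U[k_exit] - U[N] = 7, and h = 8 > H(C2) as in Corollary 3.6
    k = N                                             # start at x = 1
    for t in range(1, t_max + 1):
        eps = (t + 1) ** (-1.0 / h)                   # decreasing mutation rate
        x = k / N
        p_down = x * (eps if U[k - 1] > U[k] else 1 - eps)
        p_up = (1 - x) * (eps if U[k + 1] > U[k] else 1 - eps) if k < N else 0.0
        u = rng.random()
        k = k - 1 if u < p_down else k + 1 if u < p_down + p_up else k
        if k <= k_exit:
            return t                                  # first exit time from C2
    return None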

4. Conclusion

We apply the general framework of the simulated annealing algorithm to evolutionary dynamic models with noise that decreases over time. We show that the expected waiting times in models with time-decreasing noise do not have to be unbounded in order for the mass of the limiting distribution of the evolutionary process to be concentrated on stochastically stable states. Our model thus describes conditions under which the waiting-time critique of the predictions of stochastic stability theory may have less force.

However, we should be aware of the limitations of our results. We only show that the expected waiting time is bounded; it would be interesting to see through simulations how large it is under realistic parameter values. Moreover, our results hold only for a particular sequence of noise levels. Hence, extending the results to understand whether they apply robustly to a broader class of sequences is an important question for future research.

Acknowledgments. We are grateful to Bill Sandholm, an Associate Editor, and anonymous referees for their comments and suggestions.

Appendix. Proof of Proposition 3.2

We write x ↦ y if there is a directed link from state x to y. For any subset W ⊂ E, we let G(W) be the set of oriented graphs g = {(x, y) ∈ E × E | x ↦ y} such that (i) every point x ∈ W̄ = E \ W is the initial point of exactly one arrow (no arrow starts from W), and (ii) there are no closed cycles in the graph. Let G_{x,y}(W) denote the set of graphs g ∈ G(W) which lead from x ∈ E to y ∈ W, where, by convention, G_{y,y}(W) = G(W) for y ∈ W.^16

^16 The definition of G_{y,y}(W) is used in equation (8). Note also that G_{x,y}(W) = ∅ for x ∈ W \ {y}, since no arrow starts from x.

Catoni (1995, Lemma 3.4), based on Freidlin


and Wentzell (1998, Lemma 6.3.4), gives the formula

(8)  E[τ(C̄) | X_0 = x] = ( ∑_{y∈C} ∑_{g∈G_{x,y}(C̄∪{y})} p(g) ) ( ∑_{g∈G(C̄)} p(g) )^{−1}   for x ∈ C,

where p(g) := ∏_{(x,y)∈g} p_η(x, y) is the product of transition probabilities over the graph g. The transition probability (1) has the form

a e^{−V(x,y)/η} ≤ p_η(x, y) ≤ a^{−1} e^{−V(x,y)/η}   for all x ≠ y and some a > 0,

where V(x, y) := [U(y) − U(x)]_+ if q(x, y) > 0, and V(x, y) := ∞ otherwise. Let V(g) := ∑_{(x,y)∈g} V(x, y), and let g∗ be the graph solving min_{y∈C} min_{g∈G_{x,y}(C̄∪{y})} V(g). Then the numerator of (8) is bounded as follows:

a^{|g∗|} e^{−min_{y∈C} min_{g∈G_{x,y}(C̄∪{y})} V(g)/η} ≤ ∑_{y∈C} ∑_{g∈G_{x,y}(C̄∪{y})} a^{|g|} e^{−V(g)/η}
  ≤ ∑_{y∈C} ∑_{g∈G_{x,y}(C̄∪{y})} p(g)
  ≤ ∑_{y∈C} ∑_{g∈G_{x,y}(C̄∪{y})} a^{−|g|} e^{−V(g)/η}
  ≤ B_1 a^{−|g∗|} e^{−min_{y∈C} min_{g∈G_{x,y}(C̄∪{y})} V(g)/η}

for some constant B_1 > 0, where |g| is the number of elements in the graph g. The last inequality follows because the summation is over a finite number of terms, each of which is positive and bounded. Similarly, letting g∗∗ ∈ argmin_{g∈G(C̄)} V(g), the denominator of (8) is bounded as follows:

a^{|g∗∗|} e^{−min_{g∈G(C̄)} V(g)/η} ≤ ∑_{g∈G(C̄)} p(g) ≤ B_2 a^{−|g∗∗|} e^{−min_{g∈G(C̄)} V(g)/η}

for some constant B_2 > 0. Therefore, we conclude that there exists b > 0 such that b e^{H_C(x)/η} ≤ E[τ(C̄) | X_0 = x] ≤ b^{−1} e^{H_C(x)/η}, where H_C(x) := min_{g∈G(C̄)} V(g) − min_{y∈C} min_{g∈G_{x,y}(C̄∪{y})} V(g). Now we are done if we show that max_{x∈C} H_C(x) = H(C). But this follows from Proposition 4.15 of Catoni (1995). ∎

References

[1] Beggs, A. (2005): "Waiting Times and Equilibrium Selection," Economic Theory, 25, 599-628.
[2] Bergin, J., and B. Lipman (1996): "Evolution with State-dependent Mutations," Econometrica, 64, 943-956.
[3] Blume, L. (1993): "The Statistical Mechanics of Strategic Interaction," Games and Economic Behavior, 5, 387-424.
[4] Catoni, O. (1991): "Sharp Large Deviation Estimates for Simulated Annealing Algorithms," Annales de l'Institut Henri Poincaré, 27, 291-383.
[5] Catoni, O. (1992): "Rough Large Deviation Estimates for Simulated Annealing: Application to Exponential Schedules," The Annals of Probability, 20, 1109-1146.
[6] Catoni, O. (1995): "Simulated Annealing Algorithms and Markov Chains with Rare Transitions," Lecture Notes.
[7] Chen, H., and Y. Chow (2001): "On the Convergence of Evolution Processes with Time-varying Mutations and Local Interaction," Journal of Applied Probability, 38, 301-323.
[8] Cot, C., and O. Catoni (1998): "Piecewise Constant Triangular Cooling Schedules for Generalized Simulated Annealing Algorithms," The Annals of Applied Probability, 8, 375-396.
[9] Ellison, G. (1993): "Learning, Local Interaction, and Coordination," Econometrica, 61, 1047-1071.
[10] Ellison, G. (2000): "Basins of Attraction, Long-Run Stochastic Stability, and the Speed of Step-by-Step Evolution," Review of Economic Studies, 67, 17-45.
[11] Freidlin, M., and A. Wentzell (1998): Random Perturbations of Dynamical Systems, Second edition, Springer-Verlag.
[12] Hajek, B. (1988): "Cooling Schedules for Optimal Annealing," Mathematics of Operations Research, 13, 311-329.
[13] Kandori, M., G. J. Mailath, and R. Rob (1993): "Learning, Mutation, and Long Run Equilibria in Games," Econometrica, 61, 29-56.
[14] Monderer, D., and L. S. Shapley (1996): "Potential Games," Games and Economic Behavior, 14, 124-143.
[15] Pak, M. (2008): "Stochastic Stability and Time-dependent Mutations," Games and Economic Behavior, 64, 650-665.
[16] Robles, J. (1998): "Evolution with Changing Mutation Rates," Journal of Economic Theory, 79, 207-223.
[17] Sandholm, W. (1998): "Simple and Clever Decision Rules in a Model of Evolution," Economics Letters, 61, 165-170.
[18] Sandholm, W., and A. Pauzner (1998): "Evolution, Population Growth, and History Independence," Games and Economic Behavior, 22, 84-120.
[19] Trouvé, A. (1996): "Rough Large Deviation Estimates for the Optimal Convergence Speed Exponent of Generalized Simulated Annealing Algorithms," Annales de l'Institut Henri Poincaré, 32, 299-348.
[20] Young, P. (1993): "The Evolution of Conventions," Econometrica, 61, 57-84.
