Abstract

Taste shocks result in non-degenerate choice probabilities, smooth policy functions, continuous demand correspondences, and reduced computational errors. They also cause significant computational cost when the number of choices is large. However, I show that in many economic models, a numerically equivalent approximation may be obtained extremely efficiently. If the objective function has increasing differences (a condition closely tied to policy function monotonicity) or is concave in a discrete sense, the proposed algorithms are $O(n \log n)$ for $n$ states and $n$ choices, a drastic improvement over the naive algorithm’s $O(n^2)$ cost. If both hold, the cost can be further reduced to $O(n)$. Additionally, with increasing differences in two state variables, I propose an algorithm that is $O(n^2)$ even without concavity (in contrast to the $O(n^3)$ naive algorithm). I illustrate the usefulness of the proposed approach in an incomplete markets economy and a long-term sovereign debt model, the latter requiring taste shocks for convergence. For grid sizes of 500 points, the algorithms are up to 200 times faster than the naive approach.

Keywords: Computation, Monotonicity, Grid Search, Discrete Choice, Sovereign Default

JEL Codes: C61, C63, E32, F34

∗ Grey Gordon, Indiana University, Department of Economics, [email protected]. This paper builds on previous work with Shi Qiu, whom I thank. I also thank Amanda Michaud, David Wiczer, and participants at the IU Macro lunch workshop for helpful comments.


1 Introduction

Discrete choice models with taste shocks have been widely used in economics. In applied micro, following Luce (1959), McFadden (1974), and Rust (1987), taste shocks have been used to give nondegenerate choice probabilities for constructing likelihood functions. Bruins, Duffy, Keane, and Smith (2015) show taste shocks may be used to facilitate indirect inference estimation of discrete choice models. Iskhakov, Jørgensen, Rust, and Schjerning (2017) show how discrete choice can introduce significant nonconvexities, show that taste shocks reduce the degree of nonconvexity, and establish a homotopy result in the limit as taste shocks disappear. In heterogeneous agent macro models such as Chatterjee, Corbae, Dempsey, and Ríos-Rull (2015), taste shocks are used to facilitate computation by giving continuity. Additionally, I show in this paper that they decrease Euler equation errors by orders of magnitude in an incomplete markets model.

While taste shocks are in general useful, they can also be essential whenever discontinuous policy functions create problems. I illustrate one such example by considering a long-term sovereign default model where convergence only obtains with taste shocks, but additional examples arise in marriage markets (Santos and Weiss, 2016), credit markets (Chatterjee et al., 2015), dynamic migration models (Childers, 2018; Gordon and Guerron-Quintana, 2018), and quantal response equilibria (McKelvey and Palfrey, 1995).

But for all their advantages, taste shocks come with a significant computational burden. Specifically, to construct the choice probabilities, one must generally know the value associated with every possible choice. Consequently, taste shocks imply that computational cost grows quadratically in the number of states and choices. However, for small taste shocks, many choices occur with extremely low probability.
Consequently, if one knew in advance where those choices were, one could exclude them while still obtaining a numerically equivalent solution. In this paper, I show there is a way to determine where these regions of the choice space are in many economic models, and then I propose and analyze algorithms for exploiting this. The key requirement is that the objective function must exhibit increasing differences and/or concavity (in a discrete sense). In the absence of taste shocks, increasing differences implies (under mild conditions) that any optimal policy is monotone. A partial converse also holds in that monotonicity of policy functions implies increasing differences on the graph of the optimal choice correspondence. Building on algorithms proposed in Gordon and Qiu (2018), I develop numerically equivalent solutions whose cost grows linearly or almost linearly for small taste shocks. With $n$ states and $n$ choices, the algorithms are $O(n \log_2 n)$ for sufficiently small taste shocks if one has monotonicity or concavity, or $O(n)$ if one has both. Compared to the naive case, which has cost $O(n^2)$, the improvement is dramatic. With a two-dimensional state space, the proposed algorithm (for a restricted class of problems) is $O(n^2)$ even when only exploiting monotonicity, another drastic improvement over the naive $O(n^3)$ algorithm.

I demonstrate the numeric efficiency of the algorithms in two models. The first is a standard incomplete markets (SIM) model developed by Laitner (1979), Bewley (1983), Huggett (1993), Aiyagari (1994), and others. This model exhibits monotonicity in two state variables and concavity, allowing all the algorithms to be evaluated within one model. I show taste shocks are useful in this model for reducing the numerical errors arising from a discrete choice space and for making excess demand continuous. Further, I show that the optimal taste shock size (in the sense of minimizing average Euler equation errors) is decreasing in grid size. This suggests the theoretical cost bounds, which require small taste shocks, are the relevant ones.

In the SIM model, taste shocks are useful but, practically speaking, not necessary. The second model I use, a long-term sovereign default model nearly indistinguishable from Hatchondo and Martinez (2009) and Chatterjee and Eyigungor (2012), requires taste shocks for convergence. The model further serves as a useful benchmark because the objective function does not exhibit increasing differences globally but does have a monotone policy function. Consequently, the partial converse of monotonicity implying increasing differences on the graph of the optimal choice correspondence can be investigated numerically. Using a guess-and-verify approach, the algorithm exploiting monotonicity works without flaw and is extremely quick.

There are few algorithms for speeding computation with taste shocks. Partly this is because taste shocks require a discrete number of choices, which precludes the use of derivative-based approaches. One clever approach is that of Chiong, Galichon, and Shum (2016). They show there is a dual representation for discrete choice models. By exploiting it, they can estimate the model they consider five times faster; in contrast, the approach here can be hundreds of times faster. The discrete continuous endogenous gridpoints method (DC-EGM) introduced by Iskhakov et al. (2017) has taste shocks for the discrete choice while using first order conditions for the continuous choice.
Chatterjee and Eyigungor (2016) show models with quasi-geometric discounting do not have Markov perfect equilibria having continuous decision rules, provided there is a positive lower bound on wealth, and that decision makers prefer lotteries. They show how lotteries may be implemented efficiently using convex hull algorithms. While their method for computing the convex hull is efficient, it is problem specific and is not capable of “concavifying” the long-term debt model considered here. In contrast, taste shocks provide an alternative and very fast way of doing lotteries.

2 Taste shock properties

2.1 Overview

Fix a state $i \in \{1, \ldots, n\}$ and let $U(i, i')$ be the utility associated with choice $i' \in \{1, \ldots, n'\}$. Consider the maximization problem

\[
\max_{i' \in \{1,\ldots,n'\}} U(i, i') + \sigma \epsilon_{i'}. \tag{1}
\]

McFadden (1974) pointed out that if the $\{\epsilon_{i'}\}$ are i.i.d. and all distributed Type-I extreme value,$^1$ then the choice probabilities have a closed form expression given by

\[
\mathbb{P}(i'|i) = \frac{\exp(U(i, i')/\sigma)}{\sum_{j'=1}^{n'} \exp(U(i, j')/\sigma)}. \tag{2}
\]

The expected value of the maximum is given by

\[
\mathbb{E}\left[ \max_{i' \in \{1,\ldots,n'\}} U(i, i') + \sigma \epsilon_{i'} \right] = \sigma \log\left( \sum_{i'=1}^{n'} \exp(U(i, i')/\sigma) \right) \tag{3}
\]

(Rust, 1987, p. 1012). In the limiting case as $\sigma$ goes to 0, the optimal policy correspondence $G(i)$ is given by $\arg\max_{i' \in \{1,\ldots,n'\}} U(i, i')$. A sufficient condition for the optimal policy (if unique) to be monotone (or, in the non-unique case, for the optimal correspondence to be ascending) is for $U$ to have increasing differences. In the differentiable case, this essentially requires that the cross-derivative $U_{i,i'}$ be non-negative. However, differentiability is not required, and Gordon and Qiu (2018) collect many sufficient conditions. In what follows, I will exploit this increasing differences property to drastically save on computation cost.
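Equations (2) and (3) can be evaluated for a single state with a few lines of code. The following is a minimal sketch rather than the paper's implementation: the function names and the three utility values are illustrative assumptions, and the maximum is subtracted before exponentiating only to avoid overflow when $\sigma$ is small.

```python
import numpy as np

def choice_probabilities(U_row, sigma):
    """Logit choice probabilities, equation (2), for one state i."""
    z = (U_row - np.max(U_row)) / sigma    # subtracting the max avoids overflow
    w = np.exp(z)
    return w / w.sum()

def expected_max(U_row, sigma):
    """Expected value of the maximum, equation (3), via a stable log-sum-exp."""
    m = np.max(U_row)
    return m + sigma * np.log(np.sum(np.exp((U_row - m) / sigma)))

U_row = np.array([1.0, 1.2, 0.7])          # illustrative utilities, not from the paper
p = choice_probabilities(U_row, sigma=0.1)
v = expected_max(U_row, sigma=0.1)
print(p, v)
```

As $\sigma$ shrinks, `expected_max` approaches the largest entry of `U_row` and the probabilities concentrate on the maximizer, which is the limiting case described above.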

2.2 Numerical example: choice probability structures

At face value, it appears that computing $\mathbb{P}(i'|i)$ requires evaluating $U(i, i')$ at each combination because doing so is necessary for computing the denominator. However, the proposed method circumvents this by using a particular property of $U$ that occurs in many economic models and by exploiting properties of numerical precision.

To see the structure of the choice probabilities, consider the following numerical example. Let $U(i, i') = \log(b_i - b_{i'}/2) + \log(b_{i'})$, corresponding to a two-period consumption-savings problem where first-period wealth is $b_i$ and second-period wealth is $b_{i'}$. (In this example, $U$ has increasing differences.) For $\sigma > 0$, every feasible choice is chosen with positive probability, but the probabilities can vary dramatically. This is illustrated in Figure 1’s contour plot, which gives the log10 choice probabilities $\log_{10}(\mathbb{P}(i'|i))$ for $\sigma = 0.01$ and $\sigma = 0.0001$. The plotted contour levels correspond (roughly) to machine epsilon in double precision ($10^{-16}$) and single precision ($10^{-8}$), as well as a single basis point ($10^{-4}$). For $\sigma = 0.01$, 45% of the $(i, i')$ are chosen with at least $10^{-16}$ probability, which falls to 24% for at least $10^{-4}$ probability. For $\sigma = 0.0001$, only 6% are chosen with at least $10^{-16}$ probability, and this falls to 3% for $10^{-4}$.

By exploiting increasing differences, the proposed algorithm will result in almost only the colored region being evaluated. That is, instead of evaluating $U(i, i')$ at each of the $100^2$ possible $(i, i')$ pairs (as a naive algorithm requires), $U$ will only need to be computed for a relatively small number of $(i, i')$ pairs. For large $\sigma$, this will bring some benefit (being anywhere from 2 to 4 times faster in this example); but for small $\sigma$, it will be much faster (16 to 33 times faster in this example). This is particularly useful because researchers would often like the taste shocks to be as small as possible (while still yielding their beneficial properties).

[Figure 1: Choice probabilities for different taste shock sizes. Two contour panels with both axes running from 1 to 100 and contour levels at −16, −8, and −4.]

$^1$ The Gumbel distribution, or generalized extreme value distribution Type-I, has two parameters, $\mu$ and $\beta$, with pdf $\beta^{-1} e^{-(z + e^{-z})}$ where $z = (x - \mu)/\beta$. This has mean $\mu + \beta\gamma$ for $\gamma$ the Euler-Mascheroni constant. Rust (1987) focuses on the special case where $\mu = -\gamma$ and $\beta = 1$, resulting in an (unconditional) mean zero process. The formulas in this paper also assume this special case.
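The shares of numerically relevant $(i, i')$ pairs in an example of this kind can be computed directly. The sketch below is illustrative only: the wealth grid `b` is a hypothetical choice (the text does not state the grid behind Figure 1), so the printed shares need not match the percentages reported above. Infeasible choices, where $b_i - b_{i'}/2 \le 0$, are assigned utility $-\infty$ and hence probability zero.

```python
import numpy as np

def utility_matrix(b):
    """U[i, k] = log(b_i - b_k / 2) + log(b_k), with -inf where infeasible."""
    c1 = b[:, None] - b[None, :] / 2.0             # first-period consumption
    b_next = np.broadcast_to(b[None, :], c1.shape)
    U = np.full(c1.shape, -np.inf)
    ok = c1 > 0
    U[ok] = np.log(c1[ok]) + np.log(b_next[ok])
    return U

def choice_probabilities(U, sigma):
    """Row-wise logit probabilities, equation (2), computed stably."""
    z = (U - U.max(axis=1, keepdims=True)) / sigma
    w = np.exp(z)                                   # exp(-inf) = 0 for infeasible choices
    return w / w.sum(axis=1, keepdims=True)

b = np.linspace(0.1, 10.0, 100)                     # hypothetical grid, not the paper's
U = utility_matrix(b)
shares = {}
for sigma in (0.01, 0.0001):
    P = choice_probabilities(U, sigma)
    for eps in (1e-16, 1e-8, 1e-4):
        shares[(sigma, eps)] = float(np.mean(P >= eps))
        print(f"sigma={sigma:g}, eps={eps:g}: relevant share = {shares[(sigma, eps)]:.3f}")
```

The reciprocal of each share gives a rough upper bound on the speedup available from skipping irrelevant pairs, which is how the "2 to 4" and "16 to 33" ranges in the text relate to the reported percentages.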

2.3 Monotonicity of relative choice probabilities

I now show that, with increasing differences, each of the contour lines in Figure 1 must essentially be monotone increasing. Define the log odds ratio as

\[
L_\sigma(j', i'|i) := \sigma \log\left( \frac{\mathbb{P}(i'|i)}{\mathbb{P}(j'|i)} \right) = U(i, i') - U(i, j'). \tag{4}
\]

Further, define

\[
U^*(i) = \max_{i' \in \{1,\ldots,n'\}} U(i, i') \tag{5}
\]

\[
L^*_\sigma(i'|i) := \sigma \log\left( \frac{\mathbb{P}(i'|i)}{\mathbb{P}(g(i)|i)} \right) = U(i, i') - U^*(i) \tag{6}
\]

where $g(i)$ is any value in $G(i)$, and note that $L^*_\sigma(i'|i) \le 0$. Suppose that for $\mathbb{P}(i'|i)$ to be numerically distinguishable from 0, one requires that $\mathbb{P}(i'|i) \ge \varepsilon$ (analogous to a given contour line in Figure 1). Then

\[
\mathbb{P}(i'|i) \ge \varepsilon \;\Rightarrow\; L_\sigma(g(i), i'|i) = \sigma \log\left( \frac{\mathbb{P}(i'|i)}{\mathbb{P}(g(i)|i)} \right) \ge \sigma \log\left( \frac{\varepsilon}{\mathbb{P}(g(i)|i)} \right) \ge \sigma \log(\varepsilon). \tag{7}
\]

Define $\underline{L}_\sigma := \sigma \log(\varepsilon)$. To be numerically relevant (i.e., to have $\mathbb{P}(i'|i) \ge \varepsilon$) for a given $i$, $L^*_\sigma(i'|i)$ must be at least $\underline{L}_\sigma$.

By definition, $U$ satisfies increasing differences if and only if $j' > i'$ and $j > i$ imply $U(i, j') - U(i, i') \le U(j, j') - U(j, i')$. Note that this implies the log odds ratio satisfies the following monotonicity conditions:

\[
L_\sigma(j', i'|\cdot) \text{ is } \begin{cases} \text{decreasing} & \text{if } j' > i' \\ \text{increasing} & \text{if } j' < i' \end{cases}. \tag{8}
\]

Using these monotonicity properties of $L_\sigma(j', i'|\cdot)$, one can construct bounds on which choices are possibly relevant and which are not. Specifically, suppose that $L^*_\sigma(i'|i) < \underline{L}_\sigma$ for some $i' < g(i)$. Since $L^*_\sigma(i'|i) = L_\sigma(g(i), i'|i)$ and $L_\sigma(g(i), i'|\cdot)$ is decreasing, for $j > i$ one has $L_\sigma(g(i), i'|j) < \underline{L}_\sigma$ as well. This implies

\[
L^*_\sigma(i'|j) = L_\sigma(g(j), i'|j) = U(j, i') - U(j, g(j)) \le U(j, i') - U(j, g(i)) = L_\sigma(g(i), i'|j) < \underline{L}_\sigma \tag{9}
\]

by the optimality of $g(j)$. Consequently, if $i' < g(i)$ is not numerically relevant at $i$, then it is not numerically relevant at $j > i$ either. Similarly, if $i' > g(j)$ and $i'$ is not numerically relevant at $j$, then for any $i < j$ one will not have $i'$ relevant either.
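The argument above implies that, under increasing differences, the smallest and largest numerically relevant choices are both nondecreasing in the state. The following sketch checks this numerically for the example utility of Section 2.2. The grid is a hypothetical one chosen so that every choice is feasible, indices are zero-based, and relevance is defined by $L^*_\sigma(i'|i) \ge \underline{L}_\sigma$.

```python
import numpy as np

def has_increasing_differences(U, tol=1e-12):
    """Check U[j, k'] - U[j, k] >= U[i, k'] - U[i, k] for j > i and k' > k.
    On a full grid it suffices to check adjacent states and adjacent choices."""
    across_choices = np.diff(U, axis=1)
    return bool(np.all(np.diff(across_choices, axis=0) >= -tol))

def relevance_bounds(U, sigma, eps):
    """Smallest and largest choice index with L*(i'|i) >= sigma * log(eps), per state."""
    L_star = U - U.max(axis=1, keepdims=True)
    relevant = L_star >= sigma * np.log(eps)
    l = relevant.argmax(axis=1)                               # first relevant index
    h = U.shape[1] - 1 - relevant[:, ::-1].argmax(axis=1)     # last relevant index
    return l, h

b = np.linspace(1.0, 1.9, 50)    # hypothetical grid on which every choice is feasible
U = np.log(b[:, None] - b[None, :] / 2.0) + np.log(b[None, :])
l, h = relevance_bounds(U, sigma=0.01, eps=1e-8)
print(has_increasing_differences(U), l[:5], h[:5])
```

Monotone `l` and `h` are what allow the bounds found at one state to restrict the search at another.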

2.4 Concavity of relative choice probabilities

With “concavity” of the objective function, further gains may be obtained. In particular, suppose that $U(i, \cdot)$ is strictly quasi-concave in that, for any $i$ and any level $\bar{U}$, $\{i' \mid U(i, i') \ge \bar{U}\}$ is a sequential list of integers with no gaps (i.e., convex in an integer sense). Then $L^*_\sigma(i'|i) = U(i, i') - U(i, g(i))$ immediately inherits this property. Consequently, if one finds an $i'$ such that $L^*_\sigma(i'|i) \ge \underline{L}_\sigma$ and $L^*_\sigma(i' + 1|i) < \underline{L}_\sigma$, one immediately knows that $L^*_\sigma(j'|i) < \underline{L}_\sigma$ for all $j' > i'$. In other words, if $i'$ is numerically relevant and $i' + 1$ is not, then all $j' > i'$ are irrelevant. Similarly, if one finds an $i'$ such that $i'$ is relevant and $i' - 1$ is not, then all $j' < i'$ are irrelevant. In Figure 1, the choice probabilities exhibit concavity. For any $i$, the probabilities increase then decrease moving from $i' = 1$ to $n'$.
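The "no gaps" property can be checked for a single state by testing whether the numerically relevant choices form one contiguous block of indices. This is a small illustrative sketch; the function name and the two test rows are assumptions made for the example.

```python
import numpy as np

def relevant_set_is_interval(U_row, sigma, eps):
    """True if {i' : L*(i'|i) >= sigma * log(eps)} is a gap-free run of integers."""
    L_star = U_row - U_row.max()
    relevant = np.flatnonzero(L_star >= sigma * np.log(eps))
    return bool(relevant[-1] - relevant[0] + 1 == relevant.size)

k = np.arange(1, 21)
concave_row = -(k - 7.5) ** 2 / 10.0          # single-peaked in the choice
two_peaks = np.array([0.0, -10.0, 0.0])       # not quasi-concave

print(relevant_set_is_interval(concave_row, sigma=1.0, eps=1e-2))
print(relevant_set_is_interval(two_peaks, sigma=1.0, eps=1e-2))
```

When the check holds, walking outward from the maximizer until the first irrelevant choice is enough to find the whole relevant set.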

2.5 Monotone policies versus increasing differences

Increasing differences essentially implies monotonicity.$^2$ The reason is not difficult to see. Consider $j > i$ and $g(j)$, $g(i)$ (not necessarily with $g(j) \ge g(i)$). Then by optimality

\[
U(i, g(j)) - U(i, g(i)) \le 0 \le U(j, g(j)) - U(j, g(i)). \tag{10}
\]

With increasing differences, this is only possible if $g(j) \ge g(i)$, which gives monotonicity.

The partial converse of this result is the following. Suppose $g$ is known to be monotone. Then taking $g(j) \ge g(i)$ and (10) gives that $U$ exhibits increasing differences on $\{(i, i') \mid i \in I, i' \in G(i)\} \subset I \times I'$. That is, monotonicity implies the objective function exhibits increasing differences at all states and relevant choices, i.e., choices that are optimal for some state. In this respect, monotonicity is a weaker condition that implies increasing differences “around” optimal choices.

Later, a sovereign default model will be given that has a monotone policy function but that does not necessarily have increasing differences. However, it will be seen that for moderate to small levels of $\sigma$ in the numerically relevant set (i.e., for $i'$ such that $\mathbb{P}(i'|i) \ge \varepsilon$), the model behaves as if it had increasing differences and the monotonicity algorithm delivers a correct result.

$^2$ If the feasible choice correspondence is ascending, then increasing differences implies the optimal choice correspondence is ascending. See Gordon and Qiu (2018) for details.

3 Algorithms for exploiting choice probability structures

3.1 Exploiting monotonicity

The algorithm for exploiting monotonicity of $L^*_\sigma$ (i.e., increasing differences of $U$) is as follows:

Algorithm 1: Binary monotonicity with taste shocks

Parameters: A $\sigma > 0$, an $\varepsilon > 0$ such that $\mathbb{P}(i'|i) < \varepsilon$ will be treated as zero, and $\underline{L}_\sigma := \sigma \log(\varepsilon)$.
Input: None
Output: $U(i, i')$ for all $(i, i')$ having $\mathbb{P}(i'|i) \ge \varepsilon$.

1. Let $\underline{i} := 1$ and $\overline{i} := n$.
2. Solve for $U(\underline{i}, i')$ for each $i'$. Solve for $L^*_\sigma(i'|\underline{i})$ for each $i'$. Find the smallest and largest $i'$ such that $L^*_\sigma(i'|\underline{i}) \ge \underline{L}_\sigma$ and save them as $l(\underline{i})$ and $h(\underline{i})$, respectively.
3. Solve for $U(\overline{i}, i')$ for $i' = l(\underline{i}), \ldots, n'$. Solve for $L^*_\sigma(i'|\overline{i})$ for each $i' = l(\underline{i}), \ldots, n'$ assuming that $U(\overline{i}, i') = -\infty$ for $i' = 1, \ldots, l(\underline{i}) - 1$. Find the smallest and largest $i'$ in $\{l(\underline{i}), \ldots, n'\}$ such that $L^*_\sigma(i'|\overline{i}) \ge \underline{L}_\sigma$ and save them as $l(\overline{i})$ and $h(\overline{i})$, respectively.
4. Main loop:
   (a) If $\overline{i} \le \underline{i} + 1$, STOP. Otherwise, define $m = \lfloor (\underline{i} + \overline{i})/2 \rfloor$.
   (b) Solve for $U(m, i')$ for $i' = l(\underline{i}), \ldots, h(\overline{i})$. Solve for $L^*_\sigma(i'|m)$ for each $i' = l(\underline{i}), \ldots, h(\overline{i})$ assuming that $U(m, i') = -\infty$ for $i' = 1, \ldots, l(\underline{i}) - 1, h(\overline{i}) + 1, \ldots, n'$. Find the smallest and largest $i'$ in $\{l(\underline{i}), \ldots, h(\overline{i})\}$ such that $L^*_\sigma(i'|m) \ge \underline{L}_\sigma$ and save them as $l(m)$ and $h(m)$, respectively.
   (c) Go to step 4(a) twice, once redefining $(\underline{i}, \overline{i}) := (\underline{i}, m)$ and once redefining $(\underline{i}, \overline{i}) := (m, \overline{i})$.

At Step 4 of the algorithm, it is always the case (provided $U$ has increasing differences) that $L^*_\sigma(i'|m) < \underline{L}_\sigma$ for all $i' < l(\underline{i})$ or $i' > h(\overline{i})$. Hence, the algorithm’s treatment of $U(m, i') = -\infty$ for these choices, while theoretically biased, is numerically correct as long as $\varepsilon$ is small. Larger $\varepsilon$ values will result in numerical bias but may still be useful in earlier stages of the computation or if the speed/accuracy tradeoff favors speed.
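The steps of Algorithm 1 can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not the paper's code: `U` is any callable with one-based indices, the recursion in step 4 is unrolled into an explicit stack, and the example grid is a hypothetical one on which every choice is feasible.

```python
import math

def binary_monotonicity(U, n, n_choices, sigma, eps):
    """Sketch of Algorithm 1. U(i, k) is a callable with 1-based indices.
    Returns l, h (dicts keyed by state) and the utilities evaluated per state."""
    L_bar = sigma * math.log(eps)            # relevance threshold, negative for eps < 1
    l, h, values = {}, {}, {}

    def solve(i, lo, hi):
        # Evaluate U(i, .) only on {lo, ..., hi}; choices outside are treated as -inf.
        row = {k: U(i, k) for k in range(lo, hi + 1)}
        u_star = max(row.values())
        relevant = [k for k, u in row.items() if u - u_star >= L_bar]
        l[i], h[i], values[i] = relevant[0], relevant[-1], row

    solve(1, 1, n_choices)                   # step 2
    if n > 1:
        solve(n, l[1], n_choices)            # step 3
    stack = [(1, n)]                         # step 4, recursion unrolled into a stack
    while stack:
        i_lo, i_hi = stack.pop()
        if i_hi <= i_lo + 1:                 # no interior state left
            continue
        m = (i_lo + i_hi) // 2
        solve(m, l[i_lo], h[i_hi])
        stack.extend([(i_lo, m), (m, i_hi)])
    return l, h, values

# Example utility from Section 2.2 on a hypothetical grid (all choices feasible).
n = 35
grid = [1.0 + 0.9 * k / (n - 1) for k in range(n)]
U = lambda i, k: math.log(grid[i - 1] - grid[k - 1] / 2.0) + math.log(grid[k - 1])
l, h, values = binary_monotonicity(U, n, n, sigma=1e-3, eps=1e-16)
evaluations = sum(len(row) for row in values.values())
print(evaluations, "evaluations versus", n * n, "for the naive approach")
```

The choice probabilities for state `i` then follow from applying equation (2) to `values[i]`, with all other choices assigned probability zero.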


The algorithm as applied to the $U(i, i') = \log(b_i - b_{i'}/2) + \log(b_{i'})$ example is depicted graphically in Figure 2. The top panel shows the algorithm when $n = n' = 35$. The blue dots give the computed $l(\cdot)$ and $h(\cdot)$ bounds on numerical relevance. The gray empty circles show where $U$ ends up being evaluated. For $i = 1$, nothing is known and so one “must” evaluate $U(1, \cdot)$ at all $i'$ (though I will show that with concavity this is not actually necessary). This corresponds to step 2. In step 3, the algorithm moves to $i = 35$. There, $U(35, \cdot)$ is evaluated at $\{l(1), \ldots, n'\}$, but since $l(1) = 1$ for this example, $U$ is evaluated everywhere. In step 4, the algorithm goes to the midpoint of 1 and 35, which is 18. Because $l(1) = 1$ and $h(35) = 35$, $U(18, \cdot)$ must be evaluated at all $i'$ again. So far, the algorithm has gained nothing. However, when the divide and conquer occurs, the algorithm goes to 9 ($\lfloor (1 + 18)/2 \rfloor$) and 26 ($\lfloor (18 + 35)/2 \rfloor$). Consider what happens at 9. There, $U$ must be evaluated at $\{l(1), \ldots, h(18)\} = \{1, \ldots, 21\}$, an improvement over the 35 in the naive algorithm. As the divide and conquer steps progress, the search space is increasingly narrowed. For instance, at $i = 4$ (arriving at it via $\lfloor (3 + 5)/2 \rfloor$), $U(4, \cdot)$ is evaluated at $\{l(3), \ldots, h(5)\} = \{3, 4, 5\}$. This is an order of magnitude fewer evaluations than in the naive approach.

Increasing the number of points to 250, as is done in the bottom panel of Figure 2, shows the algorithm wastes very few evaluations. In particular, everywhere in between the blue lines must be evaluated to construct the choice probabilities, and so the algorithm evaluates all those spots. The gray area outside the blue lines is the “wasted” evaluations that give $U$ at not numerically relevant $i'$ choices, but this is only a small amount of the overall space. Note that smaller $\sigma$ tend to shrink the area between the blue lines, which reduces the overall algorithm cost.

[Figure 2: Illustration of Algorithm 1 evaluations for two grid sizes. Top panel: $n = n' = 35$. Bottom panel: $n = n' = 250$.]

3.2 Exploiting concavity

Using this concavity property, the following algorithm identifies the set of relevant $i'$ values for a given $i$.

Algorithm 2: Binary concavity with taste shocks

Parameters: A $\sigma > 0$, an $\varepsilon > 0$ such that $\mathbb{P}(i'|i) < \varepsilon$ will be treated as zero, and $\underline{L}_\sigma := \sigma \log(\varepsilon)$.
Input: An $i \in \{1, \ldots, n\}$ and $a, b$ such that $L^*_\sigma(i'|i) < \underline{L}_\sigma$ for all $i' < a$ and $i' > b$.
Output: $U^*(i)$, $l(i)$, $h(i)$, and $g(i)$ such that $L^*_\sigma(i'|i) \ge \underline{L}_\sigma$ if and only if $i' \in \{l(i), \ldots, h(i)\}$ and $g(i) \in G(i)$.

1. Solve for any maximizer in $\arg\max_{i' \in \{a,\ldots,b\}} U(i, i')$ using Heer and Maußner (2005)’s algorithm as described in Gordon and Qiu (2018). Call the selected maximizer $g(i)$. Note that by virtue of $L^*_\sigma(i'|i) < \underline{L}_\sigma$ for all $i' < a$ and $i' > b$, $G(i) = \arg\max_{i' \in \{a,\ldots,b\}} U(i, i')$ and $U^*(i) = U(i, g(i))$. Consequently, one may now obtain $L^*_\sigma(i'|i)$ for any $i'$ as $U(i, i') - U^*(i)$.
2. Define $i' = g(i)$.
   (a) If $i' = a$, store $i'$ as $l(i)$ and proceed to step 3.
   (b) If $L^*_\sigma(i' - 1|i) < \underline{L}_\sigma$, store $i'$ as $l(i)$ and proceed to step 3.
   (c) Decrement $i'$ by 1 and go to 2(a).
3. Define $i' = g(i)$.
   (a) If $i' = b$, store $i'$ as $h(i)$ and STOP.
   (b) If $L^*_\sigma(i' + 1|i) < \underline{L}_\sigma$, store $i'$ as $h(i)$ and STOP.
   (c) Increment $i'$ by 1 and go to 3(a).

Steps 2 and 3 are efficient if preference shocks are small. Since the worst case bounds I obtain are conditioned on small preference shocks, I use the small shock case as the benchmark algorithm. However, for larger taste shocks, one should prefer a divide-and-conquer search in steps 2 and 3 to obtain the smallest $i' \in \{a, \ldots, g(i)\}$ such that $L^*_\sigma(i'|i) \ge \underline{L}_\sigma$ (and the largest $i' \in \{g(i), \ldots, b\}$ such that $L^*_\sigma(i'|i) \ge \underline{L}_\sigma$).
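A compact sketch of Algorithm 2 for a single state is given below. One substitution should be flagged plainly: step 1 uses a simple bisection on the sign of the forward difference rather than the Heer and Maußner (2005) routine, which assumes $U(i, \cdot)$ is strictly increasing up to its peak and strictly decreasing after it. The function name, the caching helper, and the quadratic test utility are all illustrative assumptions.

```python
import math

def binary_concavity_bounds(U_i, a, b, sigma, eps):
    """Sketch of Algorithm 2 for one state. U_i(k) gives utility at choice k.
    Returns (u_star, l, h, g, number_of_evaluations)."""
    L_bar = sigma * math.log(eps)
    cache = {}

    def u(k):                        # evaluate each choice at most once
        if k not in cache:
            cache[k] = U_i(k)
        return cache[k]

    lo, hi = a, b                    # step 1: bisection for the maximizer
    while lo < hi:
        m = (lo + hi) // 2
        if u(m) < u(m + 1):
            lo = m + 1               # still on the increasing part
        else:
            hi = m                   # at or past the peak
    g = lo
    u_star = u(g)

    l = g                            # step 2: walk left while choices stay relevant
    while l > a and u(l - 1) - u_star >= L_bar:
        l -= 1
    h = g                            # step 3: walk right while choices stay relevant
    while h < b and u(h + 1) - u_star >= L_bar:
        h += 1
    return u_star, l, h, g, len(cache)

U_i = lambda k: -(k - 37.3) ** 2     # illustrative single-peaked utility
u_star, l, h, g, n_evals = binary_concavity_bounds(U_i, 1, 100, sigma=0.5, eps=1e-4)
print(g, l, h, n_evals)
```

In this example the maximizer is 37 and the relevant set is {36, ..., 39}, found with far fewer than the 100 evaluations a full scan would need.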

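3.3 Exploiting monotonicity and concavity

I now combine Algorithms 1 and 2 to obtain an efficient monotonicity and concavity algorithm.

Algorithm 3: Binary monotonicity and binary concavity with taste shocks

Parameters: A $\sigma > 0$, an $\varepsilon > 0$ such that $\mathbb{P}(i'|i) < \varepsilon$ will be treated as zero, and $\underline{L}_\sigma := \sigma \log(\varepsilon)$.
Input: None
Output: $U(i, i')$ for all $(i, i')$ having $\mathbb{P}(i'|i) \ge \varepsilon$.

1. Let $\underline{i} := 1$ and $\overline{i} := n$.
2. Use Algorithm 2 with $(i, a, b) = (\underline{i}, 1, n')$ to solve for $l(\underline{i})$, $h(\underline{i})$, $U^*(\underline{i})$, and $g(\underline{i})$. Solve for $L^*_\sigma(i'|\underline{i})$ for each $i' \in \{l(\underline{i}), \ldots, g(\underline{i}) - 1, g(\underline{i}) + 1, \ldots, h(\underline{i})\}$ (note $L^*_\sigma(g(\underline{i})|\underline{i}) = 0$). Use $L^*_\sigma(i'|\underline{i})$ to recover $\mathbb{P}(i'|\underline{i})$ for all $i'$ assuming $\mathbb{P}(i'|\underline{i}) = 0$ for $i' < l(\underline{i})$ and $i' > h(\underline{i})$. Using $L^*_\sigma(i'|\underline{i}) = U(\underline{i}, i') - U^*(\underline{i})$, recover $U(\underline{i}, \cdot)$ for all $i'$ (which requires only addition).
3. Use Algorithm 2 with $(i, a, b) = (\overline{i}, l(\underline{i}), n')$ to solve for $l(\overline{i})$, $h(\overline{i})$, $U^*(\overline{i})$, and $g(\overline{i})$. Solve for $L^*_\sigma(i'|\overline{i})$ for each $i' \in \{l(\overline{i}), \ldots, g(\overline{i}) - 1, g(\overline{i}) + 1, \ldots, h(\overline{i})\}$ (note $L^*_\sigma(g(\overline{i})|\overline{i}) = 0$). Use $L^*_\sigma(i'|\overline{i})$ to recover $\mathbb{P}(i'|\overline{i})$ for all $i'$ assuming $\mathbb{P}(i'|\overline{i}) = 0$ for $i' < l(\overline{i})$ and $i' > h(\overline{i})$.
4. Main loop:
   (a) If $\overline{i} \le \underline{i} + 1$, STOP. Otherwise, define $m = \lfloor (\underline{i} + \overline{i})/2 \rfloor$.
   (b) Use Algorithm 2 with $(i, a, b) = (m, l(\underline{i}), h(\overline{i}))$ to solve for $l(m)$, $h(m)$, $U^*(m)$, and $g(m)$. Solve for $L^*_\sigma(i'|m)$ for each $i' \in \{l(m), \ldots, g(m) - 1, g(m) + 1, \ldots, h(m)\}$ (note $L^*_\sigma(g(m)|m) = 0$). Use $L^*_\sigma(i'|m)$ to recover $\mathbb{P}(i'|m)$ for all $i'$ assuming $\mathbb{P}(i'|m) = 0$ for $i' < l(m)$ and $i' > h(m)$.
   (c) Go to step 4(a) twice, once redefining $(\underline{i}, \overline{i}) := (\underline{i}, m)$ and once redefining $(\underline{i}, \overline{i}) := (m, \overline{i})$.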
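The combined procedure can be sketched as below and compared against brute-force logit probabilities. As with the earlier sketches, this is an illustration under assumptions rather than the paper's code: the maximizer search is a plain bisection (valid when $U(i, \cdot)$ is strictly single-peaked), the recursion is unrolled into a stack, indices are one-based, and the grid and parameter values are hypothetical.

```python
import math

def solve_state(U, i, a, b, sigma, L_bar, counter):
    """Algorithm 2 style step for state i on {a, ..., b}; returns l, h, probabilities."""
    cache = {}
    def u(k):
        if k not in cache:
            cache[k] = U(i, k)
            counter[0] += 1          # count utility evaluations
        return cache[k]
    lo, hi = a, b
    while lo < hi:                   # bisection for the maximizer
        m = (lo + hi) // 2
        if u(m) < u(m + 1):
            lo = m + 1
        else:
            hi = m
    g = lo
    l = g
    while l > a and u(l - 1) - u(g) >= L_bar:
        l -= 1
    h = g
    while h < b and u(h + 1) - u(g) >= L_bar:
        h += 1
    w = {k: math.exp((u(k) - u(g)) / sigma) for k in range(l, h + 1)}
    total = sum(w.values())
    return l, h, {k: x / total for k, x in w.items()}

def monotone_concave(U, n, n_choices, sigma, eps):
    """Sketch of Algorithm 3: monotone bounds across states, concavity within a state."""
    L_bar = sigma * math.log(eps)
    counter, l, h, P = [0], {}, {}, {}
    def solve(i, a, b):
        l[i], h[i], P[i] = solve_state(U, i, a, b, sigma, L_bar, counter)
    solve(1, 1, n_choices)
    if n > 1:
        solve(n, l[1], n_choices)
    stack = [(1, n)]
    while stack:
        i_lo, i_hi = stack.pop()
        if i_hi <= i_lo + 1:
            continue
        m = (i_lo + i_hi) // 2
        solve(m, l[i_lo], h[i_hi])
        stack.extend([(i_lo, m), (m, i_hi)])
    return P, counter[0]

n, sigma, eps = 200, 1e-5, 1e-12
grid = [1.0 + 0.9 * k / (n - 1) for k in range(n)]       # hypothetical grid
U = lambda i, k: math.log(grid[i - 1] - grid[k - 1] / 2.0) + math.log(grid[k - 1])
P, evaluations = monotone_concave(U, n, n, sigma, eps)

# Brute-force comparison: full logit probabilities for every state.
max_gap = 0.0
for i in range(1, n + 1):
    row = [U(i, k) for k in range(1, n + 1)]
    top = max(row)
    w = [math.exp((x - top) / sigma) for x in row]
    s = sum(w)
    for k in range(1, n + 1):
        max_gap = max(max_gap, abs(w[k - 1] / s - P[i].get(k, 0.0)))
print(evaluations, "evaluations; largest probability gap", max_gap)
```

The gap relative to the brute-force probabilities is bounded by the mass assigned to excluded choices, which is on the order of $n' \varepsilon$ at most.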

4 Algorithm efficiency

4.1 Theoretical cost bounds

For all the algorithms, the worst case behavior can be very bad for two reasons. First, if $U(i, \cdot)$ is a constant, then $L^*_\sigma(i'|i) = 0$ for all $i, i'$. Consequently, $l(i) = 1$ and $h(i) = n'$ for all $i$ and $U$ must be evaluated everywhere. Second, if $\sigma$ is large enough, every choice will be chosen with virtually equal probability, which requires evaluating utility from all the choices. However, if $\sigma$ is small and $U(i, \cdot)$ has a unique maximizer for each $i$ (i.e., there is a unique optimal policy $g$ in the $\sigma = 0$ case), then one can obtain the theoretical cost bounds stated in Propositions 1, 2, and 3.

Proposition 1. Consider Algorithm 1. Suppose $U(i, \cdot)$ has a unique maximizer for each $i$. Then for any $\varepsilon > 0$ and any $n, n'$, there is a $\sigma(n, n') > 0$ such that $U$ is evaluated at most $n' \log_2(n) + 3n' + 2n$ times and, fixing $n' = n$, the algorithm is $O(n \log n)$ with a hidden constant of 1.

Proposition 2. Consider Algorithm 2. Suppose $U(i, \cdot)$ has a unique maximizer for each $i$. Then for any $\varepsilon > 0$ and any $n, n'$, there is a $\sigma(n, n') > 0$ such that $U$ is evaluated at most $2n \log_2(n') + 3n$ times if $n' \ge 3$ and, fixing $n' = n$, the algorithm is $O(n \log n)$ with a hidden constant of 2.

Proposition 3. Consider Algorithm 3. Suppose $U(i, \cdot)$ has a unique maximizer for each $i$. Then for any $\varepsilon > 0$ and any $(n, n')$, there is a $\sigma(n, n') > 0$ such that $U$ is evaluated fewer than $8n + 8n' + 2\log_2(n')$ times and, fixing $n' = n$, the algorithm is $O(n)$ with a hidden constant of 16.
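To get a feel for the magnitudes, the bounds in Propositions 1 to 3 can be evaluated for a given grid size and compared with the $n \cdot n'$ evaluations of the naive approach. The sketch below simply codes the three expressions; the function names and the choice $n = n' = 500$ are illustrative.

```python
import math

def bound_algorithm_1(n, n_choices):
    """Proposition 1: n' log2(n) + 3n' + 2n evaluations."""
    return n_choices * math.log2(n) + 3 * n_choices + 2 * n

def bound_algorithm_2(n, n_choices):
    """Proposition 2: 2n log2(n') + 3n evaluations (for n' >= 3)."""
    return 2 * n * math.log2(n_choices) + 3 * n

def bound_algorithm_3(n, n_choices):
    """Proposition 3: 8n + 8n' + 2 log2(n') evaluations."""
    return 8 * n + 8 * n_choices + 2 * math.log2(n_choices)

n = 500
naive = n * n
bounds = {
    "Algorithm 1": bound_algorithm_1(n, n),
    "Algorithm 2": bound_algorithm_2(n, n),
    "Algorithm 3": bound_algorithm_3(n, n),
}
for name, value in bounds.items():
    print(f"{name}: at most {value:.0f} evaluations, naive/bound = {naive / value:.1f}")
```

These are worst-case bounds for sufficiently small $\sigma$, so at moderate grid sizes the constants matter and the linear bound need not be the smallest of the three.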

4.2 Numerical example #1: Bewley-Huggett-Aiyagari

The previous subsection established very efficient worst case bounds for the case of small taste shocks. I now explore the algorithms’ efficiency properties numerically for moderately sized taste shocks, quantifying the tradeoff between taste shock size and speed.

4.2.1 Model description

The incomplete markets model I consider is from Aiyagari (1994) with a zero borrowing limit. The household problem can be written

\[
\begin{aligned}
V(a, e) = \max_{a' \ge 0} \; & u(c) + \beta \mathbb{E}_{e'|e} V(a', e') \\
\text{s.t. } & c + a' = we + (1 + r)a.
\end{aligned} \tag{11}
\]

The equilibrium prices $w$ and $r$ are determined by the optimality conditions of a competitive firm and given as $r = F_K(K, N) - \delta$ and $w = F_N(K, N)$. Aggregate labor and capital are given by $N = \int e \, d\mu$ and $K = \int a \, d\mu$, respectively.

4.2.2 The tradeoff between speed and taste shock size

Figure 3 shows the algorithm performance for commonly used grid sizes at differing taste shock sizes. The blue solid line shows what one would guess from the theory: Using binary monotonicity and concavity, the speedup is linear. For the other algorithms, the speedup is almost linear, differing only by a log factor.

[Figure 3: Algorithm speedup by taste shock magnitude and grid size. Panels plot speedup against grid sizes from 100 to 500.]

As the taste shock size increases, the performance seems to asymptote. For $\sigma = 0.01$, the algorithm is around 10 times faster for a 500 point grid size whereas for $\sigma \approx 0$ the algorithm is around 135 times faster.

4.2.3 Optimal taste shock levels

In the previous subsection, one saw that comparatively large taste shock sizes hamper the asymptotic performance of the algorithm. I now show that in general there is an optimal shock size and that this value tends to zero as grid sizes increase.

First, it is worth examining how taste shocks change the optimal policy. Figure 4 does this, revealing the classic saw-tooth pattern of consumption that is familiar to anyone who has worked with these models. Using taste shocks smooths out “the” consumption policy; here, I am treating the consumption policy as the expected consumption associated with a given asset and income level (integrating out the taste shock values).

[Figure 4: Policy and Euler error comparison.]

The effect on the Euler equation errors can be large. At medium levels of asset holdings, the errors drop by as much as 2 orders of magnitude. However, at the upper bound and lower bound of the grid, there is not necessarily any improvement. The reason is that the consumption policy is being treated as $\int c(a, e, \epsilon) \, dF(\epsilon)$, and so the calculation does not account for the points where the borrowing constraint (or saving constraint) is binding for only some $\epsilon$. This can be seen for the high-earnings policy in which the borrowing constraint is not binding and the Euler error is small.

Figure 5 shows a clear tradeoff between shock size and the average Euler errors. At small values of taste shocks, the comparatively large errors are driven by the discrete grid. At large taste shock sizes, the errors are instead driven by “mistakes” coming from agents purposely choosing positions that have comparatively low utility. The optimal value for this calibration and grid size is around $10^{-2.6} \approx 0.0025$, which is the value used in Figure 4.

[Figure 5: Taste shocks’ impact on Euler errors.]

Clearly, as grid sizes increase, the first type of error falls. The second type of error, however, does not tend to change. The import of this is that the optimal taste shock size (in the sense of minimizing average Euler equation errors) is diminishing in grid size. This is revealed in Figure 6, which plots average Euler equation errors for differing grid and taste shock sizes. For grids of 100, $\sigma = 0.1$ is optimal. This decreases to 0.01 for grid sizes of 250. For grids of 1000, the optimal size falls by another order of magnitude to 0.001. The difference between an optimal taste shock size and an arbitrary size is generally orders of magnitude. Additionally, a 250 point grid with an optimally chosen $\sigma$ is far better (with average errors smaller by a factor of around $10^{0.6} \approx 4$) than a 1000 point grid with no taste shocks.

[Figure 6: Euler errors by taste shock magnitude and grid size.]

More importantly, the fact that the optimal $\sigma$ is decreasing in grid size implies that the algorithms are not constrained by the asymptotes at large $\sigma$ seen in Figure 3.

4.2.4 Market clearing

One advantage of having preference shocks is that, provided the utility function $U$ moves continuously in the underlying object, the policy functions (averaged across choice probabilities) do as well. This has many benefits, one of which can be seen in Figure 7, a reproduction of the classic market clearing figure in Aiyagari (1994). In the top panels, it seems that as the interest rate $r$ increases, the implied aggregate asset holdings labeled $Ea(r)$ increase continuously. However, upon further inspection in the bottom panels, aggregate assets actually move discontinuously for $\sigma = 0$. For some parameterizations, this is more of an issue than for others. Here, $K(r)$ falls close to $Ea(r)$ for $r \approx 0.014$. However, the model rules out the possibility that capital would be between 7.7 and 8, which could be an issue in calibration. In contrast, for $\sigma > 0.01$, demand moves continuously.

[Figure 7: Market clearing in Aiyagari (1994).]

4.3 Numerical example #2: Sovereign default

Hatchondo and Martinez (2009) and Chatterjee and Eyigungor (2012) consider sovereign default models with long-term bonds. As emphasized in Chatterjee and Eyigungor (2012), these models suffer from convergence issues. I will show that this is the case and how it can be eliminated through the use of taste shocks. I will then characterize the speedup for taste shocks admitting convergence.

4.3.1 Model description

The model has a sovereign with a total stock of debt $-b$ and output $y$. If the sovereign decides to default on its debt, the country moves to autarky with output reduced by $\phi(y)$. The country returns from autarky with probability $\xi$, in which case all its outstanding debt has gone away. When not defaulting and not in autarky, the sovereign has to pay back $(\lambda + (1-\lambda)z)(-b)$, reflecting that a fraction $\lambda$ of the debt matures and that, for the fraction not maturing, a coupon $z$ must be paid. Additionally, the sovereign issues (or buys back) $-b' + (1-\lambda)b$ units of debt at price $q(b', y)$. The price depends only on the next-period total stock of debt and current output because they are sufficient statistics for determining repayment rates. The sovereign's problem may be written

\[
V(b, y) = \max_{d \in \{0,1\}} (1-d) V^r(b, y) + d V^d(y),
\]

where the value of repaying is

\[
V^r(b, y) = \max_{b' \in B} u(c) + \beta E_{y'|y} V(b', y')
\]
\[
\text{s.t. } c = y - q(b', y)(b' - (1-\lambda)b) + (\lambda + (1-\lambda)z)b
\]

and the value of default is

\[
V^d(y) = u(y - \phi(y)) + E_{y'|y}\left[ (1-\xi) V^d(y) + \xi V(0, y) \right].
\]

Let the optimal bond policy for the $V^r$ problem be denoted $a(b, y)$. Using the sufficient conditions in Gordon and Qiu (2018), the optimal policy $a$ is monotone in $b$. The price schedule is a solution to $q = T \circ q$ where

\[
(T \circ q)(b', y) = (1 + r^*)^{-1} E_{y'|y} (1 - d(b', y')) \left[ \lambda + (1-\lambda)\left(z + q(a(b', y'), y')\right) \right].
\]

The price schedule reveals a fundamental problem with using discrete choice spaces in this model: small changes in $q$ can cause discrete changes in $a$, which then cause discrete changes in $T \circ q$ that inhibit or prevent convergence. Hatchondo and Martinez (2009) use continuous choice spaces with splines and seem to not have convergence troubles. Chatterjee and Eyigungor (2012) incorporate a continuously distributed i.i.d. shock to facilitate convergence. Taste shocks are, however, a simpler option that accomplishes the same task.

In terms of discretization, the fastest way to solve this problem with taste shocks is by solving for $V$ directly rather than recovering $V^r$ first.³ Specifically, we solve the problem

\[
V(i, y) = \max_{i' \in \{1, \ldots, 1 + \#B\}} u(c) + \beta E_{y'|y}\left[ (1-d) V(i'-1, y') + d V^d(y') \right] \tag{12}
\]

where $b = B_i$,

\[
(d, b') = \begin{cases} (0, B_{i'-1}) & \text{if } i' > 1 \\ (1, \text{arbitrary}) & \text{if } i' = 1 \end{cases}
\]

and

\[
c = \begin{cases} y - q(b', y)(b' - (1-\lambda)b) + (\lambda + (1-\lambda)z)b & \text{if } d = 0 \\ y - \phi(y) & \text{if } d = 1. \end{cases}
\]
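Once taste shocks are added to (12), default is a single option in a choice set that also contains every repayment choice, so its logit probability depends on how many repayment choices are close to the maximum. The following is a minimal sketch of that arithmetic, not the paper's code: the function name `choice_probs` and the numerical values are hypothetical, and infeasible choices are represented by a value of negative infinity.

```python
import numpy as np

def choice_probs(values, sigma):
    """Logit choice probabilities exp(v/sigma) / sum_j exp(v_j/sigma)."""
    v = np.asarray(values, dtype=float)
    z = (v - v.max()) / sigma      # subtract the max so exp() cannot overflow
    w = np.exp(z)                  # exp(-inf) = 0 for infeasible choices
    return w / w.sum()

# Hypothetical values: index 0 is the default option, indices 1-3 are
# repayment choices that tie with it, and the rest are infeasible.
values = [-1.0, -1.0, -1.0, -1.0, -np.inf, -np.inf]
p = choice_probs(values, sigma=1e-3)
print(p[0])  # 0.25: default gets 1 / (1 + 3) of the mass, not 1/2
```

With k repayment choices tied with default, the non-nested logit assigns default a probability of 1/(1 + k), regardless of σ.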

The implied optimal policy $g(i, y)$ is monotone for each $y$. (Figure 9 will show explicitly what the optimal policy looks like.) Taste shocks are added to the formulation in (12), which brings up a common issue in logit-style models, namely, the red bus/blue bus paradox discussed in Rust (1994) and elsewhere. Here, the issue can be stated as follows. If $V^d = V^r$, one would expect the probability of default to be 50%. But here the probability of default will generally be less than that. For example, if $V^r$ has 3 $b'$ choices that attain the maximum with all the rest infeasible, then the probability of default will be 25%. There are different ways to deal with this, such as a nested logit (Rust, 1994). However, since the focus here is on small σ, where this is less of an issue, I will just use a non-nested logit.

4.3.2

Convergence only with preference shocks

Figure 8 shows the lack of convergence for σ = 0, even after 2500 iterations with a relaxation parameter above 0.95 and with 100 income states. In contrast, with $\sigma = 10^{-3}$, convergence happens without any relaxation. For the intermediate case of $\sigma = 10^{-5}$, convergence does not occur with a relaxation parameter of 0, but as soon as the parameter begins increasing, the sup norms for $V$ and $q$ trend downward.

4.3.3

Monotone policies, but not necessarily increasing differences

By Lemma 2(f) in Gordon and Qiu (2018), increasing differences of $u \circ c$ holds if $c(b, b') := y - q(b', y)(b' - (1-\lambda)b) + (\lambda + (1-\lambda)z)b$ is increasing in $b$, decreasing in $b'$, and has increasing differences. All of these hold except for possibly $c$ decreasing in $b'$. For $b_2' > b_1'$, this holds if and only if

\[
q(b_2', y) b_2' - q(b_1', y) b_1' \ge (1-\lambda)\left(q(b_2', y) - q(b_1', y)\right)(-b). \tag{13}
\]

³ Conditional on repayment ($d = 0$), at high debt levels $-b$ and low output levels $y$, the price schedule may be identically equal to 0 for a large range of $b'$ choices, and the continuation utility $\beta E_{y'|y} V(b', y')$ may not vary in $b'$ because default is expected next period. This causes inefficiency in the algorithm because many $b'$ choices can then be optimal. However, the default decision $d(b, y)$ is decreasing in $b$ and $y$. So, by adding the constraint $b' = B_1$ if $d = 1$ and ordering the choices $(d, b')$ as $(1, B_1), (0, B_1), (0, B_2), \ldots, (0, B_n)$, the implied optimal policy will be monotone in $b$ and will not have these "flat spots."

Figure 8: Convergence and lack of it with long-term debt
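The role of the relaxation parameter can be illustrated with a toy fixed-point problem. The sketch below is only an illustration under stated assumptions: it takes the relaxation parameter θ to be the weight on the previous iterate (so θ = 0 means no relaxation), and the map `T` is a hypothetical steep logistic function standing in for an operator with sharp policy responses. It is not the sovereign debt operator.

```python
import numpy as np

def damped_iteration(T, x0, theta, tol=1e-10, max_iter=2500):
    """Iterate x <- theta * x + (1 - theta) * T(x) until the sup norm of the
    update falls below tol. Returns (x, iterations used, converged flag)."""
    x = np.asarray(x0, dtype=float)
    for it in range(1, max_iter + 1):
        x_new = theta * x + (1.0 - theta) * T(x)
        err = np.max(np.abs(x_new - x))   # sup norm of the update
        x = x_new
        if err < tol:
            return x, it, True
    return x, max_iter, False

def T(x):
    # Hypothetical toy map with fixed point 0.5 and slope -2.5 there, so
    # undamped iteration overshoots and settles into a two-cycle.
    return 1.0 / (1.0 + np.exp((x - 0.5) / 0.1))

x_plain, it_plain, ok_plain = damped_iteration(T, np.array([0.0]), theta=0.0)
x_damped, it_damped, ok_damped = damped_iteration(T, np.array([0.0]), theta=0.8)
print(ok_plain, ok_damped, it_damped)
```

In this toy case the undamped iteration never converges, while θ = 0.8 turns the update into a contraction. Whether a given θ suffices in the actual model depends on how sharply the policy responds to the price schedule.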

Even with short-term debt ($\lambda = 1$), the above inequality does not necessarily hold for $b_1' < 0$: $q(b_2', y) b_2' - q(b_1', y) b_1' = q(b_2', y)(b_2' - b_1') + (q(b_2', y) - q(b_1', y)) b_1'$. Moreover, with long-term debt ($\lambda < 1$), there is some incentive to dilute existing debt (captured by the right-hand side of the above inequality), which is increasing in existing debt $-b$ and gives the inequality additional reasons for not holding.

Analogously to Figure 2, Figure 9 plots boundaries on choice probabilities for an intermediate level of output computed without any monotonicity assumptions. Both the double precision cutoff of $\mathbb{P}(i'|i) \ge 10^{-16}$ and the single precision cutoff of $\mathbb{P}(i'|i) \ge 10^{-8}$ are monotone (the one red plus at $i = 95$ corresponds to the upper bound). While the absence of increasing differences globally means this behavior is not guaranteed, it does follow intuitively from increasing differences on the graph of the optimal policy correspondence for σ = 0. The importance of this result is that, at either of these cutoffs, Algorithm 1 will work correctly.

Figure 9: Choice probabilities

Figure 10 examines whether the objective function $U(i, i'; y)$ exhibits increasing differences locally, i.e., whether $U(i, i'+1; y) - U(i, i'; y) \le U(i+1, i'+1; y) - U(i+1, i'; y)$. The top panel reveals that increasing differences does not seem to hold over a large part of the choice space. However, the horizontal regions of non-increasing differences are driven by $b'$ choices that are on the wrong side of the debt Laffer curve. These choices result in very low utility and so are not chosen optimally with any numerically relevant probability. When excluding $(b, y)$ that result in default with probability greater than $1 - 10^{-16}$, one arrives at the bottom panel's contour lines. There are still some regions where increasing differences is violated, namely, high debt states and low debt choices. However, these regions are never visited in equilibrium with significant probability, as can be seen by comparing with the circled regions (which represent $i'$ having $\mathbb{P}(i'|i; y) > 10^{-16}$ for some $(i, y)$).

Figure 10: Locally increasing differences and equilibrium choices
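The local check behind Figure 10 is straightforward to run on any tabulated objective. Here is a minimal sketch, assuming the objective for a fixed $y$ is stored as a matrix `U[i, ip]` with states in rows and choices in columns. The function name and the two test objectives are hypothetical and are not taken from the default model.

```python
import numpy as np

def local_increasing_differences(U):
    """Return a boolean array of shape (n - 1, n' - 1) whose [i, ip] entry is
    True when U[i, ip+1] - U[i, ip] <= U[i+1, ip+1] - U[i+1, ip]."""
    U = np.asarray(U, dtype=float)
    d_choice = np.diff(U, axis=1)              # U[i, ip+1] - U[i, ip]
    return d_choice[:-1, :] <= d_choice[1:, :]

i = np.arange(6)[:, None]    # states
ip = np.arange(8)[None, :]   # choices

ok = local_increasing_differences(-(i - ip) ** 2)    # complements in (i, ip)
bad = local_increasing_differences(-(i + ip) ** 2)   # substitutes in (i, ip)
print(ok.all(), bad.any())
```

Regions where the returned array is False correspond to the violations that a plot like Figure 10 would highlight. Masking those entries by whether the choice has numerically relevant probability gives the comparison made in the bottom panel.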

4.3.4

Empirical speedup

With these results, we consider a guess-and-verify approach to exploiting monotonicity: compute the optimal policy assuming monotonicity and then check, with brute force, whether the computed $V$ and $q$ functions constitute an equilibrium. The algorithm's speedups are essentially identical to those in the incomplete markets model. This can be seen in Figure 11, where the speedups, measured by evaluation counts, grow almost linearly provided σ is close to zero. (Here, we do not look at run times, both to avoid a very costly computation of the naive algorithm and to allow parallelization to be used.) However, there is a tradeoff in that small σ values require more iterations for convergence (along with larger relaxation parameters). Note that an iteration count of 2500 means the solution did not converge by the 2500th iteration. The consequence in terms of overall run times is that even though larger values of σ are less efficient per iteration, they require far fewer iterations and have an overall lower run time. For these results, it is important to keep in mind that the optimal taste shock size σ is decreasing in grid sizes. Consequently, the speedup improves as $n$ increases and σ decreases.

Figure 11: Default model speedups and iterations to convergence

5

Exploiting monotonicity in two states

So far, I have focused on the one-dimensional case. But in many cases there can be monotonicity in more than one state variable, such as in the incomplete markets model, where the savings policy is monotone in bonds and earnings.


5.1

The algorithm

Consider now a problem with a two-dimensional state space $(i_1, i_2) \in \{1, \ldots, n_1\} \times \{1, \ldots, n_2\}$ with $n := n_1 n_2$ states overall. Defining everything analogously to section 2, e.g.,

\[
\max_{i' \in \{1, \ldots, n'\}} U(i_1, i_2, i') + \sigma \epsilon_{i'} \tag{14}
\]

and

\[
\mathbb{P}(i'|i_1, i_2) = \frac{\exp(U(i_1, i_2, i')/\sigma)}{\sum_{j'=1}^{n'} \exp(U(i_1, i_2, j')/\sigma)}, \tag{15}
\]

all the preceding results for the one-dimensional case go through holding the other state fixed. For instance, the monotonicity result in (8) becomes

\[
L_\sigma(j', i'|(i_1, i_2)) \text{ is } \begin{cases} \text{decreasing in } i_1 \text{ and } i_2 & \text{if } j' > i' \\ \text{increasing in } i_1 \text{ and } i_2 & \text{if } j' < i'. \end{cases} \tag{16}
\]

The algorithm below exploits this monotonicity in an efficient way. Because the logic of the algorithm is simple but its details are complicated, I present a simplified version here and the full version in the appendix.

Algorithm 4: Binary monotonicity in two states with taste shocks (simplified)

Parameters: A $\sigma > 0$, an $\varepsilon > 0$ such that $\mathbb{P}(i'|i) < \varepsilon$ will be treated as zero, and $L_\sigma := \sigma \log(\varepsilon)$.
Input: None
Output: $U(i_1, i_2, i')$ for all $(i_1, i_2, i')$ having $\mathbb{P}(i'|i_1, i_2) \ge \varepsilon$.

1. Solve for $U(\cdot, 1, i')$ using Algorithm 1 or 3. Save the $l$ function from it as $l(\cdot, 1)$.
2. Solve for $U(\cdot, n_2, i')$ using Algorithm 1 or 3 while only checking $i' \ge l(\cdot, 1)$. Save the $h$ function from it as $h(\cdot, n_2)$. Set $(\underline{i}_2, \bar{i}_2) := (1, n_2)$.
3. Main loop:
   (a) If $\bar{i}_2 \le \underline{i}_2 + 1$, STOP. Otherwise, define $m_2 = \lfloor (\underline{i}_2 + \bar{i}_2)/2 \rfloor$.
   (b) Solve for $U(\cdot, m_2, i')$ using Algorithm 1 or 3 but restricting the search space additionally to $l(\cdot, \underline{i}_2)$ and $h(\cdot, \bar{i}_2)$. Save the $l$ and $h$ coming from the one-state algorithm as $l(\cdot, m_2)$ and $h(\cdot, m_2)$, respectively.
   (c) Go to step 3(a) twice, once redefining $(\underline{i}_2, \bar{i}_2) := (\underline{i}_2, m_2)$ and once redefining $(\underline{i}_2, \bar{i}_2) := (m_2, \bar{i}_2)$.
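The divide-and-conquer structure over the second state can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's implementation: `solve_one_state` is a hypothetical brute-force stand-in for Algorithm 1 or 3 that searches only within the supplied bounds, indices are zero-based, the recursion is written with an explicit stack, both $l$ and $h$ are stored for every solved column, and `U_toy` is a made-up objective with increasing differences in $(i_1, i')$ and $(i_2, i')$.

```python
import math
import numpy as np

def solve_one_state(U, i2, lo, hi, cutoff):
    """Stand-in for Algorithm 1 or 3: for each i1, evaluate U only on
    lo[i1]..hi[i1] and return the smallest and largest choices whose gap
    to the best value is at least cutoff, plus the evaluation count."""
    n1 = len(lo)
    l = np.empty(n1, dtype=int)
    h = np.empty(n1, dtype=int)
    evals = 0
    for i1 in range(n1):
        cand = np.arange(lo[i1], hi[i1] + 1)
        vals = np.array([U(i1, i2, ip) for ip in cand])
        evals += len(cand)
        keep = cand[vals - vals.max() >= cutoff]
        l[i1], h[i1] = keep[0], keep[-1]
    return l, h, evals

def solve_two_state(U, n1, n2, n_choices, sigma, eps):
    cutoff = sigma * math.log(eps)                 # L_sigma < 0
    full_lo = np.zeros(n1, dtype=int)
    full_hi = np.full(n1, n_choices - 1, dtype=int)
    l, h = {}, {}
    # Step 1: first value of i2, unrestricted search.
    l[0], h[0], evals = solve_one_state(U, 0, full_lo, full_hi, cutoff)
    # Step 2: last value of i2, only checking choices at or above l(., first).
    l[n2 - 1], h[n2 - 1], e = solve_one_state(U, n2 - 1, l[0], full_hi, cutoff)
    evals += e
    # Step 3: bisect on i2, bounding the search by the neighbors' l and h.
    stack = [(0, n2 - 1)]
    while stack:
        a, b = stack.pop()
        if b <= a + 1:
            continue
        m = (a + b) // 2
        l[m], h[m], e = solve_one_state(U, m, l[a], h[b], cutoff)
        evals += e
        stack.extend([(a, m), (m, b)])
    return l, h, evals

def U_toy(i1, i2, ip):
    # Hypothetical objective; the maximizer rises with both i1 and i2.
    return -(ip - 0.5 * (i1 + i2)) ** 2

n1, n2, n_choices = 20, 20, 40
l, h, evals = solve_two_state(U_toy, n1, n2, n_choices, sigma=1e-3, eps=1e-16)
print(evals, n1 * n2 * n_choices)   # evaluations used vs. brute force
```

Replacing the brute-force inner solver with a one-state divide-and-conquer routine is what would deliver the cost bounds discussed next. The sketch only shows how the bounds from neighboring values of $i_2$ shrink the search.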

5.2

Theoretical cost bounds

Proposition 4. Consider Algorithm 4 for exploiting monotonicity only (i.e., using Algorithm 1 instead of 3). Suppose that, for each $i_1, i_2$, $U(i_1, i_2, \cdot)$ has a unique maximizer $g(i_1, i_2)$ that is monotone in both arguments, and that $n_1, n_2, n' \ge 4$. Let $\lambda \in (0, 1]$ be such that for every $j \in \{2, \ldots, n_2 - 1\}$, one has $g(n_1, j) - g(1, j) + 1 \le \lambda (g(n_1, j+1) - g(1, j-1) + 1)$. Then for any $\varepsilon > 0$ and any $(n_1, n_2, n')$ there is a $\sigma(n_1, n_2, n') > 0$ such that $U$ is evaluated fewer than

\[
(1 + \lambda^{-1}) \log_2(n_1)\, n' n_2^{\kappa} + 3(1 + \lambda^{-1})\, n' n_2^{\kappa} + 4 n_1 n_2 + 2 n' \log_2(n_1) + 6 n'
\]

times, where $\kappa = \log_2(1 + \lambda)$. Moreover, for $n_1 = n' =: n$ and $n_1/n_2 = \rho$ with ρ a constant, the cost is $O(n_1 n_2)$ with a hidden constant of 4 if $(g(n_1, j) - g(1, j) + 1)/(g(n_1, j+1) - g(1, j-1) + 1)$ is bounded away from 1 for large $n$.
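To see how the bound in Proposition 4 behaves, it helps to evaluate it per state for growing grids. The sketch below simply codes the expression above. The function name and the particular values λ = 0.5 and ρ = 10 are illustrative assumptions rather than values from the paper.

```python
import math

def prop4_bound(n1, n2, n_choices, lam):
    """Upper bound on evaluations of U from Proposition 4 (monotonicity only)."""
    kappa = math.log2(1.0 + lam)
    a = 1.0 + 1.0 / lam
    return (a * math.log2(n1) * n_choices * n2 ** kappa
            + 3.0 * a * n_choices * n2 ** kappa
            + 4.0 * n1 * n2
            + 2.0 * n_choices * math.log2(n1)
            + 6.0 * n_choices)

# Bound per state with n1 = n' = n and n1 / n2 = rho = 10 (assumed values).
per_state = []
for n in (500, 5_000, 50_000, 10 ** 9):
    n2 = n // 10
    per_state.append(prop4_bound(n, n2, n, lam=0.5) / (n * n2))
print(per_state)
```

With λ < 1 the exponent κ is below one, so the first two terms grow more slowly than $n_1 n_2$ and the per-state bound drifts down toward the constant 4, although slowly. With λ = 1, κ equals one and the per-state bound instead grows with $\log_2(n_1)$.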

5.3

Empirical speedup

Figure 12 reveals the evaluation counts per state for the fastest methods for two levels of σ. As suggested (though not guaranteed for all $U$) by Proposition 4, the two-state monotonicity algorithm's evaluation counts fall and seem to level off at slightly more than 3.0 by the time #A = 500 and #E = 50 are reached. This is around a 167-fold improvement on brute force, even without a concavity assumption. When also exploiting concavity, the evaluation counts fall to around 2.4, a 208-fold improvement on brute force. With larger σ, the evaluation counts continue to rise in grid sizes, resulting in less dramatic speedups. For instance, at the largest grid size, the two-state algorithms and monotonicity with concavity all have evaluation counts of around 17. While this is not the dramatic 200-fold improvement on brute force, it is still 29 times better.

Figure 13 directly shows the speedups in terms of evaluation counts and run times. The speedups measured by evaluation counts are as expected from the previous graph. Using a two-state algorithm is particularly advantageous when exploiting monotonicity only, as the speedups go from less than 50 to more than 150. The advantages in terms of run times for $\sigma = 10^{-16}$ are above 40 when using the two-state algorithm. The speedup tapers off at larger grid sizes because the cost of other parts of the solution method, such as computing continuation utilities, grows faster than that of the $O(n_1 n_2)$ methods.
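The fold improvements quoted above follow from simple division, assuming (as the quoted figures imply) that brute force evaluates all #A = 500 choices at every state. A short check, with hypothetical variable names:

```python
n_choices = 500   # #A at the largest grid; brute force evaluates all of them

# Evaluations per state: two-state monotonicity, with concavity, larger sigma.
evals_per_state = (3.0, 2.4, 17.0)
speedups = {e: n_choices / e for e in evals_per_state}
for e, s in speedups.items():
    print(e, round(s))   # 167, 208, and 29, respectively
```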

6

Conclusion

Taste shocks have a large number of advantages but, with naive algorithms, imply a large computational cost. However, if the objective function exhibits increasing differences or is concave, the proposed algorithms deliver drastic theoretical speedups for small taste shocks. Empirically, the algorithms exhibit a speedup that asymptotes if taste shocks are moderate or large. While the speedup is still large in absolute terms (greater than 16 for the examples considered), the benefit of the algorithms is limited in that case. However, I showed that the optimal size of taste shocks, measured by average Euler errors, goes to zero as grid sizes increase. Consequently, one should decrease taste shocks to arbitrarily small levels as grid sizes increase, allowing the theoretical results to apply empirically. With all the significant theoretical advantages of taste shocks, the algorithms proposed in this paper should enable the computation of many new and challenging models.

Figure 12: Evaluations per state

Figure 13: Two-state algorithm speedup by taste shock magnitude and grid size

References

S. R. Aiyagari. Uninsured idiosyncratic risk and aggregate savings. Quarterly Journal of Economics, 109(3):659–684, 1994.

T. Bewley. A difficulty with the optimum quantity of money. Econometrica, 51(5):1485–1504, 1983.

M. Bruins, J. A. Duffy, M. P. Keane, and A. A. Smith, Jr. Generalized indirect inference for discrete choice models. Mimeo, July 2015.

S. Chatterjee and B. Eyigungor. Maturity, indebtedness, and default risk. American Economic Review, 102(6):2674–2699, 2012.

S. Chatterjee and B. Eyigungor. Continuous Markov equilibria with quasi-geometric discounting. Journal of Economic Theory, 163:467–494, 2016.

S. Chatterjee, D. Corbae, K. Dempsey, and J.-V. Ríos-Rull. A theory of credit scoring and competitive pricing of default risk. Mimeo, August 2015.

D. Childers. Solution of rational expectations models with function valued states. Mimeo, 2018.

K. X. Chiong, A. Galichon, and M. Shum. Duality in dynamic discrete-choice models. Quantitative Economics, 1:83–115, 2016.

G. Gordon and P. A. Guerron-Quintana. On regional migration, borrowing, and default. Mimeo, 2018.

G. Gordon and S. Qiu. A divide and conquer algorithm for exploiting policy function monotonicity. Quantitative Economics, 2018. Forthcoming.

J. C. Hatchondo and L. Martinez. Long-duration bonds and sovereign defaults. Journal of International Economics, 79(1):117–125, 2009.

B. Heer and A. Maußner. Dynamic General Equilibrium Modeling: Computational Methods and Applications. Springer, Berlin, Germany, 2005.

M. Huggett. The risk-free rate in heterogeneous-agent incomplete-insurance economies. Journal of Economic Dynamics and Control, 17(5-6):953–969, 1993.

F. Iskhakov, T. H. Jørgensen, J. Rust, and B. Schjerning. The endogenous grid method for discrete-continuous dynamic choice models with (or without) taste shocks. Quantitative Economics, 8(2):317–365, 2017.

J. Laitner. Household bequest behavior and the national distribution of wealth. The Review of Economic Studies, 46(3):467–483, 1979.

R. D. Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, New York, 1959.

D. McFadden. Conditional logit analysis of qualitative choice behavior. In P. Zarembka, editor, Frontiers in Econometrics, chapter 4. Academic Press, 1974.

R. D. McKelvey and T. R. Palfrey. Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6–38, 1995.

J. Rust. Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 55(5):999–1033, 1987.

J. Rust. Structural estimation of Markov decision processes. In R. F. Engle and D. L. McFadden, editors, Handbook of Econometrics, volume IV, chapter 51. Elsevier Science B.V., 1994.

C. Santos and D. Weiss. "Why not settle down already?" A quantitative analysis of the delay in marriage. International Economic Review, 57(2):425–452, 2016.

A

Omitted proofs [Not for publication]

Proof of Proposition 1. Note that $L^*_\sigma(i'|i) = U(i, i') - U(i, g(i))$ is invariant to σ. Additionally, $L^*_\sigma(i'|i) \le 0$ with equality if and only if $i' = g(i)$. The cutoff $L_\sigma = \sigma \log(\varepsilon) < 0$ can be made arbitrarily close to zero by decreasing σ. Consequently, for a small enough σ, the condition $L^*_\sigma(i'|i) \ge L_\sigma$ is satisfied if and only if $(i', i) = (g(i), i)$. In this case, $l(i)$ and $h(i)$ equal $g(i)$. For $l$ and $h$ equal to $g$, step 4 evaluates $U(m, i')$ only at $\{g(\underline{i}), \ldots, g(\bar{i})\}$. This is the same as in Step 3 of Gordon and Qiu (2018). Similarly, step 2 (3) evaluates $U(1, i')$ ($U(n, i')$) only at $\{1, \ldots, n'\}$ ($\{g(1), \ldots, n'\}$). This is the same as in step 1 (2) of Gordon and Qiu (2018). Since the same divide and conquer step is used, the total number of evaluations of $U$ does not exceed $(n'-1)\log_2(n-1) + 3n' + 2n - 4$ (Proposition 1 in Gordon and Qiu, 2018). Simplifying this expression by making it slightly looser gives the desired result.

Proof of Proposition 2. As in the proof of Proposition 1, we note that for small enough σ one has $l(i)$ and $h(i)$ equal to $g(i)$. In this case, the algorithm reduces to binary concavity as stated in Gordon and Qiu (2018) except for steps 3 and 4 of Algorithm 2. For small enough preference shocks, each of these requires 1 additional evaluation (specifically $U(i, g(i)-1)$ and $U(i, g(i)+1)$, respectively), which results in $2n$ extra evaluations overall. Gordon and Qiu (2018) show binary concavity for a fixed $i$ requires at most $2\lceil \log_2(n') \rceil - 1$ evaluations if $n' \ge 3$ and $n$ evaluations for $n' \le 2$. Consequently, for $n' \ge 3$ there are at most $n \cdot (2 + 2\lceil \log_2(n') \rceil - 1)$ evaluations for Proposition 2. Simplifying, there are fewer than $2n\log_2(n') + 3n$ evaluations required.

Proof of Proposition 3. As in the proof of Proposition 1, we note that for small enough σ one has $l(i)$ and $h(i)$ equal to $g(i)$. In this case, the algorithm reduces to binary monotonicity with binary concavity except for steps 3 and 4 of Algorithm 2. For small enough preference shocks, each of these requires 1 additional evaluation (specifically $U(i, g(i)-1)$ and $U(i, g(i)+1)$, respectively), which results in $2n$ extra evaluations overall. Consequently, the $6n + 8n' + 2\log_2(n'-1) - 15$ bound in Proposition 1 of Gordon and Qiu (2018) becomes $8n + 8n' + 2\log_2(n'-1) - 15$. Simplifying slightly by making the bounds looser gives the desired result.

Proof of Proposition 4. As in the proofs of Propositions 1 and 3, we note that for small enough σ, one has $l$ and $h$ equal to $g$. Consequently, when using Algorithm 1 at each step, the algorithm reduces to the two-state binary monotonicity algorithm in Gordon and Qiu (2018). Proposition 2 of Gordon and Qiu (2018) then gives the desired upper bound and $O(n_1 n_2)$ behavior.
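The step shared by all four proofs, that the set of numerically relevant choices collapses to the maximizer as σ shrinks, can be seen in a small numerical sketch. The objective row `U_row` and the function name below are hypothetical; the only ingredients taken from the text are the gap $L^*_\sigma(i'|i) = U(i, i') - U(i, g(i))$ and the cutoff $L_\sigma = \sigma \log(\varepsilon)$.

```python
import math
import numpy as np

def relevant_choices(U_row, sigma, eps=1e-16):
    """Indices i' whose gap to the best choice is at least sigma * log(eps)."""
    cutoff = sigma * math.log(eps)       # L_sigma, negative and -> 0 with sigma
    gap = U_row - U_row.max()            # L*, does not depend on sigma
    return np.flatnonzero(gap >= cutoff)

# Hypothetical objective for one state, with a unique maximizer at i' = 20.
U_row = -(np.arange(50) - 20.0) ** 2 / 100.0

sets = {s: relevant_choices(U_row, s) for s in (1e-1, 1e-2, 1e-3, 1e-4)}
for s, idx in sets.items():
    print(s, idx.min(), idx.max(), len(idx))
```

Because the gap is fixed while the cutoff moves toward zero, the retained set can only shrink as σ falls, and for small enough σ it contains only the maximizer, so $l$ and $h$ coincide with $g$.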
