A Divide and Conquer Algorithm for Exploiting Policy Function Monotonicity Grey Gordon and Shi Qiu∗ April 10, 2017

Abstract

A divide and conquer algorithm for exploiting policy function monotonicity is proposed and analyzed. To solve a discrete problem with n states and n choices, the algorithm requires at most n log2(n) + 5n objective function evaluations. In contrast, existing methods for non-concave problems require n^2 evaluations in the worst case. For concave problems, the solution technique can be combined with a method exploiting concavity to reduce evaluations to 14n + 2 log2(n). A version of the algorithm exploiting monotonicity in two state variables allows for even more efficient solutions. The algorithm can also be efficiently employed in a common class of problems that do not have monotone policies, including problems with many state and choice variables. In the sovereign default model of Arellano (2008) and the real business cycle model, the algorithm reduces run times by an order of magnitude for moderate grid sizes and orders of magnitude for larger ones. Sufficient conditions for monotonicity are provided.

Keywords: Computation, Monotonicity, Grid Search, Sovereign Default
JEL Codes: C61, C63, E32, F34



Grey Gordon (corresponding author), Indiana University, Department of Economics, [email protected]. Shi Qiu, Indiana University, Department of Economics, [email protected]. The authors thank Kartik Athreya, Bob Becker, Alexandros Fakos, Filomena Garcia, Bulent Guler, Daniel Harenberg, Juan Carlos Hatchondo, Aaron Hedlund, Chaojun Li, Lilia Maliar, Serguei Maliar, Amanda Michaud, Jim Nason, Julia Thomas, Nora Traum, Todd Walker, Xin Wei, and David Wiczer, as well as three anonymous referees and participants at the Econometric Society World Congress 2015 and the Midwest Macro Meetings 2015. Fortran code is provided at https://sites.google.com/site/greygordon/code. All mistakes are our own.


1

Introduction

Many optimal control problems in economics are either naturally discrete or can be discretized. However, solving these problems can be costly. For instance, consider a simple growth model where the state k and choice k' are restricted to lie in {1, . . . , n}. One way to find the optimal policy g(k) is, for every k, to evaluate lifetime utility at every k'. The n^2 cost of this “brute force” approach grows quickly in n.

In this paper, we propose a divide and conquer algorithm that drastically reduces this cost for problems having monotone policy functions. When applied to the growth model, the algorithm first solves for g(1) and g(n). It then solves for g(n/2) by only evaluating utility at k' weakly greater than g(1) and weakly less than g(n), since monotonicity of g gives g(n/2) ∈ {g(1), . . . , g(n)}. Similarly, once g(n/2) is known, the algorithm uses it as an upper bound in solving for g(n/4) and a lower bound in solving for g(3n/4). Continuing this subdivision recursively leads to further improvements, and we show this method—which we refer to as “binary monotonicity”—requires no more than n log2(n) + 5n objective function evaluations (lifetime utility evaluations in this example) in the worst case.

When the objective function is concave, further improvements can be made. To see this, note that binary monotonicity restricts the search space but does not say, within a search space, how the optimum should be found. E.g., binary monotonicity gives that g(n/2) is in {g(1), . . . , g(n)}, but the maximum within this search space can be found in different ways. One way is brute force, i.e., checking every value. However, if the objective function is concave, one can check g(1), g(1) + 1, . . . sequentially and stop as soon as the objective function decreases.
We refer to this approach as “simple concavity.”1 An alternative is Heer and Maußner (2005)'s method, which we will refer to as “binary concavity.” This method employs a divide and conquer step, evaluating the objective function in the middle of the search space and repeatedly discarding half of it.2 When binary monotonicity and binary concavity are combined, we prove the optimum can be found in at most 14n + 2 log2(n) function evaluations.

For problems with non-concave objectives like the Arellano (2008) sovereign default model, binary monotonicity vastly outperforms existing methods both theoretically and quantitatively. Beyond brute force, we are aware of only one commonly-used existing method that exploits monotonicity. This method, which we refer to as “simple monotonicity,” restricts the search space for g(k) to {g(k − 1), . . . , n}.3 Like brute force, simple monotonicity requires n^2 evaluations in the worst case, which happens when g(·) = 1. Consequently, assuming worst-case behavior for all the methods, simple monotonicity is 8.7, 35.9, and 66.9 times slower for n equal to 100, 500, and 1000, respectively. While this worst-case behavior could be misleading, in practice we find it is not. For instance, in the Arellano (2008) model, we find that binary monotonicity is 5.1, 21.4, and 40.1 times faster than simple monotonicity for grid sizes of 100, 500, and 1000, respectively. Similar results hold in a real business cycle (RBC) model when not exploiting concavity. Binary monotonicity vastly outperforms simple monotonicity because the latter is only about twice as fast as brute force, requiring roughly .5n^2 evaluations to compute the optimal policy.

To test binary monotonicity's performance when paired with methods exploiting concavity, we use the RBC model. There we find, despite its good theoretical properties, that binary monotonicity with binary concavity is only the second fastest combination of the nine possible pairings of monotonicity and concavity techniques. Specifically, simple monotonicity with simple concavity is around 20% faster, requiring only 3.0 objective function evaluations per state.4 In contrast, binary monotonicity with binary concavity requires 3.7 evaluations. While somewhat slower for the RBC model, the combination of binary monotonicity with binary concavity has guaranteed O(n) performance that may prove useful in different setups.

While our benchmark algorithm exploits monotonicity in only one state variable, some models have monotonicity in multiple state variables. For instance, under certain conditions, the RBC model's capital policy is monotone in both capital and productivity. To take advantage of this, we extend our divide and conquer approach to a “two-state binary monotonicity” algorithm. In the RBC model, this algorithm reduces evaluation counts per state to 2.1 or less for commonly-used grid sizes even without exploiting concavity. Consequently, the two-state binary monotonicity algorithm ends up being several times faster than the one-dimensional algorithm. We also show for a class of optimal policies that the algorithm asymptotically requires at most 4 evaluations per state.

1 This formulation is described in Judd (1998). The basic idea of this procedure dates back to at least Christiano (1990), who started from a guess on the optimal policy and then went either left or right, using a decrease in utility as the stopping criterion.
2 Their algorithm in its original form can be found on p. 26 of Heer and Maußner (2005). Our implementation, given in Appendix A, differs slightly. Imrohoroğlu, Imrohoroğlu, and Joines (1993) employ an adaptive grid method that is conceptually very similar to binary concavity. Specifically, it progresses from coarse to fine search spaces recursively assuming concavity.
3 This approach dates back at least to Christiano (1990).
As knowing the true policy and simply recovering the value function using it would require one evaluation per state, this algorithm delivers very good performance.

We also show binary monotonicity can be used in a class of models that does not have monotone policies. Specifically, it can be used for problems of the form max_{i'} u(z(i) − w(i')) + W(i') where i and i' are indices in possibly multi-dimensional grids and u is concave, increasing, and differentiable. If z and W are increasing, the sufficient conditions we provide can be used to show the optimal policy is monotone. However, even if they are not increasing, sorting z and W transforms the problem into one that does have a monotone policy and so allows binary monotonicity to be used. After establishing that this type of algorithm is O(n log n) inclusive of sorting costs, we show it is several times faster than existing grid search methods in solving a combined Arellano (2008) and RBC model featuring portfolio choice.

While we have described our method in terms of discrete choices, we show it applies in the continuous case as well.5 When applied to the Arellano (2008) model, a continuous choice space with binary monotonicity reduces Euler equation errors by up to three orders of magnitude and increases run times by only 70-90%. We also show this and other applications allow for a non-heuristic maximization method that, to numerical precision, recovers a global maximum of the approximate problem. This method is only around 15% more costly than the heuristic method of a global grid search followed by a local continuous search when using binary monotonicity.

Lastly, we present sufficient conditions for policy functions to be monotone and show how they can be applied in the RBC and Arellano (2008) models. Most of the results are collected from various parts of the literature, especially Topkis (1978), but we also give a new one. In particular, a key condition required by Topkis (1978) does not, in general, hold for the Arellano (2008) model or when using binary monotonicity with the sorting algorithm. Our sufficient condition circumvents the violated requirement by assuming continuation utility is increasing.

4 The reason is that the optimal policy very nearly satisfies g(i) = g(i − 1) + 1. When this is the case, computing the optimal policy for a given i requires only three evaluations: the algorithm evaluates the objective function at g(i − 1), g(i − 1) + 1, and g(i − 1) + 2 and stops since g(i − 1) + 2 is inferior to the optimal choice, g(i − 1) + 1.
5 The algorithm's logic also extends to continuous state spaces. However, when using a continuous state space, one must still necessarily recover the value and policy functions at a finite number of points, and so assuming discrete state spaces is essentially without loss of generality. For instance, if one is using a cubic spline to represent the value function, one must obtain its values at the spline's knots.

1.1

Related literature

For problems exhibiting global concavity, many attractive solution methods exist. These include fast and globally accurate methods such as projection and Carroll (2006)'s endogenous gridpoints method (EGM), as well as fast and locally accurate methods like linearization and higher-order perturbations. Judd (1998) and Schmedders and Judd (2014) provide useful descriptions of these methods, which are, in general, superior to grid search in terms of speed and usually accuracy, although not necessarily robustness (for a comparison in the RBC context, see Aruoba, Fernández-Villaverde, and Rubio-Ramírez, 2006).

In contrast, few computational methods are available for problems with non-concavities, such as problems commonly arising in discrete choice models.6 The possibility of multiple local maxima makes working with any method requiring first order conditions (FOCs) perilous. This discourages the use of the methods mentioned above and often leads researchers to use value function iteration with discrete state space methods. As binary monotonicity is many times faster than the best existing grid search methods for non-concave problems, its use in these models seems promising.

However, recent research has looked at ways of solving non-concave problems using FOCs. Fella (2014) generalizes Carroll (2006)'s EGM for non-concave problems by finding all points satisfying the FOCs and then using a value function iteration step to distinguish global maxima from local maxima. Iskhakov et al. (2016) show how to modify discrete choice models using i.i.d. taste shocks to facilitate their solution using EGM. Because these approaches work directly with FOCs, they will almost surely be more accurate than binary monotonicity with a discrete choice space. However, binary monotonicity solves the problem without adding shocks, does not require any derivative computation, and is simple.7 Moreover, accuracy can be improved by using a continuous search space without drastically affecting computation time, and we show this can reduce average Euler equation errors by up to three orders of magnitude for moderate grid sizes.

Puterman (1994) and Judd (1998) discuss a number of existing methods (beyond what we have mentioned thus far) that can be used in discrete optimal control problems. Some of these methods, such as policy iteration, modified policy iteration, multi-grids, and action elimination procedures, are complementary to the binary monotonicity algorithm. Others, such as sweeping methods like the Gauss-Seidel methods or splitting methods, could not naturally be employed while simultaneously using binary monotonicity.8 In contrast to binary monotonicity, most of these methods try to produce good value function guesses and so only apply in the infinite-horizon case.

To the best of our knowledge, binary monotonicity is novel. However, the general concept that monotonicity can be exploited beyond its simple formulation has been recognized. In fact, Judd (1998) points out that the “order in which we solve the various [maximization] problems is important in exploiting” monotonicity and concavity (p. 414). As we show, binary monotonicity uses a particular order that is efficient and has guaranteed performance. While the divide and conquer principle it uses is of course well known, nowhere have we seen it applied to exploit monotonicity in the way we do.

The remainder of this paper is organized as follows. Section 2 lays out the algorithm and characterizes its performance theoretically and quantitatively. Section 3 extends the algorithm to exploit monotonicity in two state variables. Section 4 shows how the algorithm can be applied to a class of problems with non-monotone policies. Section 5 shows how the method may be used with continuous choice spaces. Section 6 provides sufficient conditions for policy function monotonicity. Section 7 concludes.

6 In discrete choice models, the value function is the upper envelope of a finite number of “subsidiary” value functions. Even if the subsidiary value functions are concave, the upper envelope is generally not. For instance, in the Arellano (2008) model we consider in this paper, the value function is V(b, y) = max{V^d(y), V^n(b, y)} where b is a sovereign's bond position, y is output, V^d(y) is the value conditional on default, and V^n(b, y) is the value conditional on repaying. In this case, even if the subsidiary value function V^n is concave in b, V is not. Iskhakov, Jørgensen, Rust, and Schjerning (2016) discuss how these non-concavities propagate in dynamic programming problems.
7 While an imperfect measure of complexity, the 16 lines needed to code binary monotonicity with brute force grid search (excluding declarations and comments) gives evidence of its simplicity.

2

Binary monotonicity

This section formalizes the algorithm and analyzes its properties.

2.1

The algorithm and a simple example

Our focus is on the problem of solving
$$\Pi(i) = \max_{i' \in \{1,\dots,n'\}} \pi(i, i') \qquad (1)$$
for i ∈ {1, . . . , n} with an optimal policy g (note the number of choices n' may differ from n). Economic problems of the form given in (1) abound, and we show how the RBC model and Arellano (2008)'s model can be cast in this form. We say g is monotone (increasing) if g(i) ≤ g(j) whenever

8 The operations research literature has also developed algorithms that approximately solve dynamic programming problems. Papadaki and Powell (2002, 2003) and Jiang and Powell (2015) aim to preserve monotonicity of value and policy functions while receiving noisy updates of the value function at simulated states. While they exploit monotonicity in a non-traditional order, they do so in a random one and do not compute an exact solution.


i ≤ j.9 For a concrete example, consider the problem of optimally choosing next period capital k' given a current capital stock k where both of these lie in a grid K = {k_1, . . . , k_n} having k_j < k_{j+1} for all j. For u a period utility function, F a production function, δ depreciation, β the time discount factor, and V_0 a guess on the value function, the Bellman update can be written
$$V(k_i) = \max_{i' \in \{1,\dots,n\}} u(-k_{i'} + F(k_i) + (1-\delta)k_i) + \beta V_0(k_{i'}) \qquad (2)$$

with the optimal policy g(i) given in terms of indices. Then (2) fits the form of (1). Of course, not every choice is necessarily feasible, there may be multiple states and choice variables, and the choice space may be continuous; we will deal with all these issues. Under mild conditions—specifically, F increasing, δ ≤ 1, and u' > 0, u'' < 0—one can show the optimal policy g(i) is monotone using the sufficient conditions we provide. Note in particular that V_0 need not be concave.

Our algorithm computes the optimal policy g and the optimal value Π using divide-and-conquer. The algorithm is as follows:

1. Initialization: Compute g(1) and Π(1) by searching over {1, . . . , n'}. If n = 1, STOP. Compute g(n) and Π(n) by searching over {g(1), . . . , n'}. Let $\underline{i} = 1$ and $\bar{i} = n$. If n = 2, STOP.

2. At this step, $(g(\underline{i}), \Pi(\underline{i}))$ and $(g(\bar{i}), \Pi(\bar{i}))$ are known. Find an optimal policy and value for all $i \in \{\underline{i}, \dots, \bar{i}\}$ as follows:

   (a) If $\bar{i} = \underline{i} + 1$, STOP: for all $i \in \{\underline{i}, \dots, \bar{i}\} = \{\underline{i}, \bar{i}\}$, g(i) and Π(i) are known.

   (b) For the midpoint $m = \lfloor (\underline{i} + \bar{i})/2 \rfloor$, compute g(m) and Π(m) by searching over $\{g(\underline{i}), \dots, g(\bar{i})\}$.

   (c) Divide and conquer: Go to (2) twice, first computing the optimum for $i \in \{\underline{i}, \dots, m\}$ and second computing the optimum for $i \in \{m, \dots, \bar{i}\}$. In the first case, redefine $\bar{i} := m$; in the second case, redefine $\underline{i} := m$.

Figure 1 provides a graphical illustration of binary monotonicity and, for comparison, simple monotonicity. The blue dots represent the optimal policy of (2) and the black empty circles represent, for a given i, the search space when solving for g(i), i.e., the range of i' that might possibly be optimal. With simple monotonicity, the search for i > 1 is restricted to {g(i − 1), . . . , n'}. Because the policy function is nearly linear, this results in the nearly triangular search space seen in Figure 1. For binary monotonicity, the search is restricted to $\{g(\underline{i}), \dots, g(\bar{i})\}$ where $\underline{i}$ and $\bar{i}$ are far apart for the first iterations but become close together rapidly (with the distance roughly halved at each iteration). This results in an irregularly shaped search space that is large at the first iterations i = 1, n, and n/2 but much smaller at later ones. For this example, the average search space for simple monotonicity, i.e., the average size of {g(i − 1), . . . , n'}, is 10.6 when n = n' = 20 and grows to 51.8 when n = n' = 100. Since the search space for brute force would be 20 and 100, respectively, this represents roughly a 50% improvement on brute force. In contrast, the average size of binary monotonicity's search space is 7.0 when n = n' = 20 (34% smaller than simple monotonicity's) and 9.5 when n = n' = 100 (82% smaller), a large improvement.

9 For most of the paper, we will talk about the optimal policy as if it were unique. In general, there may be more than one optimal policy even if the objective function is consistent with strict concavity because of the discrete choice set (e.g., take π(i, i') = −(i' − 1.5)^2). The sufficient conditions we provide handle this possibility by establishing that the optimal policy correspondence is “ascending” (a notion defined in Section 6). When this is the case, binary monotonicity selects an optimal policy, which we prove in Appendix C.
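The steps above can be written compactly. The following Python sketch is our own illustration (the paper's reference code is in Fortran); it implements binary monotonicity with brute-force search inside each restricted range, using 0-based indices, for any objective `pi(i, j)` whose optimal policy is monotone increasing.

```python
def binary_monotonicity(pi, n, n_choice):
    """Solve Pi(i) = max_j pi(i, j) for i = 0, ..., n-1, assuming the
    optimal policy g is monotone increasing (0-based indices)."""
    g = [0] * n       # optimal policy (indices into the choice grid)
    val = [None] * n  # optimal values

    def solve(i, lo, hi):
        # brute-force search over the restricted range {lo, ..., hi}
        best_j, best = lo, pi(i, lo)
        for j in range(lo + 1, hi + 1):
            v = pi(i, j)
            if v > best:
                best_j, best = j, v
        g[i], val[i] = best_j, best

    def divide(i_lo, i_hi):
        # (g, val) are known at i_lo and i_hi; fill in everything between
        if i_hi - i_lo <= 1:
            return
        m = (i_lo + i_hi) // 2
        # monotonicity: g(m) must lie in {g(i_lo), ..., g(i_hi)}
        solve(m, g[i_lo], g[i_hi])
        divide(i_lo, m)
        divide(m, i_hi)

    solve(0, 0, n_choice - 1)             # step 1: lowest state
    if n > 1:
        solve(n - 1, g[0], n_choice - 1)  # step 1: highest state
        divide(0, n - 1)                  # step 2: divide and conquer
    return g, val
```

In the growth example (2), `pi(i, j)` would be u(−k_j + F(k_i) + (1−δ)k_i) + βV_0(k_j); any search method, such as a concavity technique, can replace the brute-force loop in `solve`.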

Figure 1: A graphical illustration of simple and binary monotonicity

While binary monotonicity restricts the search space, it does not say how one should find a maximum within the search space, and there can be multiple ways of doing this. To be precise, consider solving $\max_{i' \in \{a,\dots,b\}} \pi(i, i')$ for a given i, a, and b. Brute force finds the maximum by evaluating π(i, i') at all i' ∈ {a, . . . , b}. In Figure 1, this amounts to evaluating π at every dot. However, if the objective function is concave in i', then a concavity technique can be used. Simple concavity economizes on brute force by proceeding sequentially from i' = a to i' = b but stopping whenever π(i, i' − 1) > π(i, i'). In Figure 1, this amounts to evaluating π(i, i') from the lowest dot in each column to one above the blue dot. Lastly, Heer and Maußner (2005)'s binary concavity algorithm uses the ordering of π(i, m) and π(i, m+1) where $m = \lfloor (a+b)/2 \rfloor$ to narrow the search space: if π(i, m) < π(i, m+1), an optimal choice must be in {m+1, . . . , b}; if π(i, m) ≥ π(i, m+1), an optimal choice must be in {a, . . . , m}. The search then proceeds recursively, redefining the search space accordingly until there is only one choice left. In Figure 1, this amounts to always evaluating π(i, i') at adjacent dots (m and m+1) within a column with the adjacent dots progressing geometrically towards the blue dot.

For ease of reference, Table 1 summarizes the monotonicity and concavity techniques.

Simple Monotonicity: Exploits monotonicity by only searching {g(i − 1), . . . , n'} when i > 1.

Binary Monotonicity: The approach in this paper. Uses divide-and-conquer to exploit monotonicity, restricting the search space to $\{g(\underline{i}), \dots, g(\bar{i})\}$.

Simple Concavity: Exploits concavity of π(i, i') in i'. To find the maximum in {a, . . . , b}, iterates through i' = a, a + 1, . . . and stops once π(i, i') decreases or b is reached.

Binary Concavity: The approach in Heer and Maußner (2005). Uses divide-and-conquer to exploit concavity of π(i, i') in i'. To find a maximum in {a, . . . , b}, checks $i' = \lfloor (a+b)/2 \rfloor$ and $i' = \lfloor (a+b)/2 \rfloor + 1$, then restricts the search to $\{a, \dots, \lfloor (a+b)/2 \rfloor\}$ or $\{\lfloor (a+b)/2 \rfloor + 1, \dots, b\}$.

X Monotonicity with Y Concavity: Uses X monotonicity to restrict the search space for i' to {a, . . . , b} and then Y concavity to find the maximum within it.

Table 1: Description of grid search techniques
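The two concavity techniques can likewise be sketched in a few lines. The Python below is our illustration (Heer and Maußner (2005)'s original appears on p. 26 of their book and differs in details); both functions maximize f over the integers {a, . . . , b} under the assumption that f is concave (single-peaked) on that range.

```python
def simple_concavity(f, a, b):
    """Scan i' = a, a+1, ... and stop at the first decrease of f."""
    best_j, best = a, f(a)
    for j in range(a + 1, b + 1):
        v = f(j)
        if v < best:          # f has started decreasing: the peak was found
            break
        best_j, best = j, v
    return best_j, best

def binary_concavity(f, a, b):
    """Compare f at adjacent midpoints and discard half the range."""
    while b - a > 1:
        m = (a + b) // 2
        if f(m) < f(m + 1):   # an optimum must lie in {m+1, ..., b}
            a = m + 1
        else:                 # an optimum must lie in {a, ..., m}
            b = m
    return (a, f(a)) if f(a) >= f(b) else (b, f(b))
```

In the combined algorithms, f(·) plays the role of π(i, ·) and {a, . . . , b} is the range supplied by the monotonicity technique.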

2.2

Theoretical results

We now assess binary monotonicity's theoretical properties. For our theoretical measure of efficiency, we will use the number of times that π must be evaluated to recover g and Π in the worst case. Clearly, this will depend on the method used to solve
$$\max_{i' \in \{a,\dots,a+\gamma-1\}} \pi(i, i'). \qquad (3)$$
While brute force requires γ evaluations of π(i, ·), binary concavity can find the solution in no more than $2\lceil \log_2(\gamma) \rceil - 1$ for γ ≥ 3 and γ for γ ≤ 2.10 We allow for many maximization techniques by characterizing the algorithm's properties conditional on a monotonically increasing σ : Z++ → Z+ that bounds the evaluation count required to solve (3). Because of the recursive nature of the algorithm, the π evaluation bound for general σ is also naturally recursive. The following definition and Proposition 1 provide such a bound.

Definition 1. For any σ : Z++ → Z+, define M_σ : {2, 3, . . .} × Z+ → Z+ recursively by M_σ(z, γ) = 0 if z = 2 or γ = 0 and
$$M_\sigma(z, \gamma) = \sigma(\gamma) + \max_{\gamma' \in \{1,\dots,\gamma\}} \left\{ M_\sigma(\lfloor z/2 \rfloor + 1, \gamma') + M_\sigma(\lfloor z/2 \rfloor + 1, \gamma - \gamma' + 1) \right\}$$
for z > 2 and γ > 0.

10 We prove this in Lemma 6 in Appendix C.
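Definition 1's recursion is straightforward to tabulate numerically. The Python sketch below (our code, not the authors') memoizes M_σ and evaluates the bound 2σ(n′) + M_σ(n, n′) of Proposition 1 for the brute-force case σ(γ) = γ, which can then be compared with the simplified n log2(n) + 5n bound quoted in the abstract.

```python
from functools import lru_cache
from math import log2

def make_M(sigma):
    """Memoized M_sigma from Definition 1."""
    @lru_cache(maxsize=None)
    def M(z, gamma):
        if z == 2 or gamma == 0:
            return 0
        return sigma(gamma) + max(
            M(z // 2 + 1, gp) + M(z // 2 + 1, gamma - gp + 1)
            for gp in range(1, gamma + 1)
        )
    return M

# Brute force needs sigma(gamma) = gamma evaluations on a range of size gamma.
M = make_M(lambda gamma: gamma)
n = 256
bound = 2 * n + M(n, n)               # Proposition 1 with n' = n
assert bound <= n * log2(n) + 5 * n   # simplified bound from the abstract
```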


Proposition 1. Let σ : Z++ → Z+ be an upper bound on the number of π evaluations required to solve (3). Then, the algorithm requires at most 2σ(n') + M_σ(n, n') evaluations for n ≥ 2 and n' ≥ 1.

Proposition 1 gives a fairly tight bound. However, it is also unwieldy because of the discrete nature of the problem. By bounding σ, whose domain is Z++, with a σ̄ whose domain is [1, ∞), a more convenient bound can be found. This bound is given in Lemma 1.

Lemma 1. Suppose σ̄ : [1, ∞) → R+ is either the identity map (σ̄(γ) = γ) or is a strictly increasing, strictly concave, and differentiable function. If σ̄(γ) ≥ σ(γ) for all γ ∈ Z++, then an upper bound on function evaluations is
$$3\bar{\sigma}(n') + \sum_{j=1}^{I-2} 2^j \bar{\sigma}\left(2^{-j}(n' - 1) + 1\right)$$
if I > 2 where $I = \lceil \log_2(n - 1) \rceil + 1$. An upper bound for I ≤ 2 is 3σ̄(n').

The main theoretical result of the paper is given in Proposition 2. It applies Lemma 1 with σ̄ bounds corresponding to brute force and binary concavity.

Proposition 2. Suppose n ≥ 4 and n' ≥ 3. If brute force grid search is used, then binary monotonicity requires no more than (n' − 1) log2(n − 1) + 3n' + 2n − 4 evaluations of π. Consequently, fixing n = n', the algorithm's worst-case behavior is O(n log2 n) with a hidden constant of one. If binary concavity is used with binary monotonicity, then no more than 6n + 8n' + 2 log2(n' − 1) − 15 evaluations are required. Consequently, fixing n = n', the algorithm's worst-case behavior is O(n) with a hidden constant of 14.

Note the bounds stated in the abstract and introduction, n log2(n) + 5n for brute force and 14n + 2 log2(n) for binary concavity, are simplified versions of these for the case n = n'. These worst-case bounds show binary monotonicity is very powerful. To see this, note that even if one knew the true solution, recovering Π(i) would still require n evaluations of π (specifically, evaluating π(i, g(i)) for i = 1, . . . , n). Hence, relative to knowing the true solution, binary monotonicity is only asymptotically slower by a log factor for n = n'. Moreover, when paired with binary concavity, the algorithm is only asymptotically slower by a factor of 14.
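These bounds can also be checked empirically. The sketch below (our code) instruments binary monotonicity with an evaluation counter, using brute-force inner search and a monotone test policy, and verifies that the count stays below the (n′ − 1) log2(n − 1) + 3n′ + 2n − 4 bound for brute-force search.

```python
from math import log2

def count_evaluations(pi, n, n_choice):
    """Run binary monotonicity with brute-force inner search,
    returning (policy, number of pi evaluations)."""
    g = [0] * n
    count = 0

    def solve(i, lo, hi):
        nonlocal count
        best, best_j = None, lo
        for j in range(lo, hi + 1):
            count += 1
            v = pi(i, j)
            if best is None or v > best:
                best, best_j = v, j
        g[i] = best_j

    def divide(i_lo, i_hi):
        if i_hi - i_lo <= 1:
            return
        m = (i_lo + i_hi) // 2
        solve(m, g[i_lo], g[i_hi])  # search restricted by monotonicity
        divide(i_lo, m)
        divide(m, i_hi)

    solve(0, 0, n_choice - 1)
    if n > 1:
        solve(n - 1, g[0], n_choice - 1)
        divide(0, n - 1)
    return g, count

n = 1024
# a monotone policy spanning the whole choice grid
g, evals = count_evaluations(lambda i, j: -((j - i) ** 2), n, n)
assert g == list(range(n))
assert evals <= (n - 1) * log2(n - 1) + 3 * n + 2 * n - 4
```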

2.3

Quantitative results

We now turn to assessing the algorithm's performance in economically relevant examples. Because (1) assumes that every choice is feasible, we first discuss an equivalent formulation that does not assume this and show how the RBC and Arellano (2008) models can be mapped into it. Then we use the Arellano (2008) model to compare our method with existing techniques that only assume monotonicity. Last, we use the RBC model to compare our method with existing techniques that assume monotonicity and concavity. We will not conduct any error analysis here since all the techniques deliver identical solutions. The calibrations are given in Appendix B.

2.3.1

A more general, but equivalent, formulation

In both the RBC and Arellano (2008) models, there is no guarantee that every choice is feasible. Consequently, one cannot directly apply the binary monotonicity algorithm because the maximization problems do not directly fit (1). To handle this issue in a general way, suppose that the feasible choice set is I'(i) ⊂ {1, . . . , n'}, which may be empty, and that the objective function is given by some π̃(i, i') only defined for (i, i') such that i' ∈ I'(i). For every i such that I'(i) is nonempty, we define
$$\tilde{\Pi}(i) = \max_{i' \in I'(i)} \tilde{\pi}(i, i'). \qquad (4)$$

In Appendix C, we prove that if I'(i) is monotonically increasing, then an equivalence between (1) and (4) exists. Namely, letting $\underline{\pi}$ denote a lower bound on π̃ and defining
$$\pi(i, i') = \begin{cases} \tilde{\pi}(i, i') & \text{if } I'(i) \neq \emptyset \text{ and } i' \in I'(i) \\ \underline{\pi} & \text{if } I'(i) \neq \emptyset \text{ and } i' \notin I'(i) \\ \mathbf{1}[i' = 1] & \text{if } I'(i) = \emptyset, \end{cases} \qquad (5)$$
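The padding in (5) is mechanical to implement. In the Python sketch below (our illustration; the names `pi_raw`, `feasible`, and `PI_LOW` are ours), a raw objective defined only on the feasible set is extended to a total objective on which binary monotonicity can run unchanged; `PI_LOW` plays the role of the lower bound $\underline{\pi}$ and, in practice, can be the most negative machine-representable number.

```python
PI_LOW = -1e300  # computational stand-in for the lower bound on pi-tilde

def pad_objective(pi_raw, feasible, n_choice):
    """Extend pi_raw(i, j), defined only where feasible(i, j) is True,
    to the total objective pi(i, j) of equation (5), 0-based."""
    def pi(i, j):
        # note: scanning for feasibility at every call is wasteful but
        # keeps the sketch short; in practice one would cache this
        if not any(feasible(i, jp) for jp in range(n_choice)):
            return 1.0 if j == 0 else 0.0  # the 1[i' = 1] case
        return pi_raw(i, j) if feasible(i, j) else PI_LOW
    return pi
```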

we show the two problems have the same solutions whenever I'(i) is nonempty.11 Further, we show the monotonicity techniques applied to (1) deliver a correct solution to (4) and that, if I'(i) = {1, . . . , n̄'(i)} for some increasing function n̄'(i), the concavity techniques do as well.12

To see how the RBC model can be cast into (4), consider the problem's Bellman update. This is given by
$$V(k, z) = \max_{c \geq 0, k' \in K} u(c) + \beta E_{z'|z} V_0(k', z') \quad \text{s.t. } c + k' = zF(k) + (1-\delta)k \qquad (6)$$
for k ∈ K, z ∈ Z where K = {k_1, . . . , k_n} with the k_i increasing.13 Now consider creating a separate problem for each z and writing
$$\Pi_z(i) = \max_{i' \in I'_z(i)} \pi_z(i, i') \qquad (7)$$
where π_z and I'_z are defined as
$$\pi_z(i, i') := u(-k_{i'} + zF(k_i) + (1-\delta)k_i) + \beta E_{z'|z} V_0(k_{i'}, z'), \quad I'_z(i) := \{i' \in \{1, \dots, n\} \mid k_{i'} \leq zF(k_i) + (1-\delta)k_i\}. \qquad (8)$$

Evidently, (7) is just (6) with k and k' given by grid indices. Because I'_z(i) is increasing, the problem is equivalent to (1). Moreover, because I'_z(i) has the form {1, . . . , n̄'_z(i)} for an increasing function n̄'_z(i), a concavity algorithm can be used.

The Arellano (2008) model can be mapped into (4) in the same fashion. The main computational difficulty in that model is solving the sovereign's problem conditional on not defaulting. Specifically, the problem is to solve, for each b ∈ B and y ∈ Y,
$$V^n(b, y) = \max_{c \geq 0, b' \in B} u(c) + \beta E_{y'|y} \max\{V^n(b', y'), V^d(y')\} \quad \text{s.t. } c + q(b', y)b' = b + y \qquad (9)$$
where b is the sovereign's outstanding bonds, q(b', y) is the bond price, y is output, and V^d is the value of defaulting. (For completeness, the full model may be found in Appendix B.) Taking B as {b_1, . . . , b_n} and creating a separate problem for each y, one has
$$\Pi_y(i) = \max_{i' \in I'_y(i)} \pi_y(i, i') \qquad (10)$$
where π_y and I'_y are defined as
$$\pi_y(i, i') := u(-q(b_{i'}, y)b_{i'} + b_i + y) + \beta E_{y'|y} \max\{V^n(b_{i'}, y'), V^d(y')\}, \quad I'_y(i) := \{i' \in \{1, \dots, n\} \mid -q(b_{i'}, y)b_{i'} + b_i + y \geq 0\}. \qquad (11)$$
Because I'_y is monotonically increasing, (10) is equivalent to (1).

11 A theoretical lower bound on π̃ is $\min_i \min_{i' \in I'(i)} \tilde{\pi}(i, i')$. While this particular bound is not practically useful (because computing it would be very costly), the smallest machine-representable number serves as a lower bound for computational purposes.
12 For a proof, see Proposition 7 of Appendix C. Of course, the solution techniques only work if monotonicity or concavity obtain as required by the solution technique. For monotonicity, the optimal policy correspondence and choice set in (4) must be ascending as defined in Section 6. For concavity, π̃ must be concave in the discrete sense defined in Appendix C, which essentially requires quasi-concavity.
13 Here we consider the case of inelastic labor supply, which is what we will use in the computation. However, elastic labor supply can be incorporated by replacing the period utility function with an indirect utility function as discussed in Section 6.

2.3.2

Exploiting monotonicity only

Figure 2 compares the time required for the Arellano (2008) model to converge when using brute force, simple monotonicity, and binary monotonicity.14 The ratio of brute force's run time to binary monotonicity's grows virtually linearly, increasing from 10.7 times faster for a grid size of 100 to 223 times faster for a grid size of 2500. Similarly, binary monotonicity improves over simple monotonicity, with a speedup of 5.1 (95) for a grid size of 100 (2500). In contrast, the advantage of simple monotonicity relative to brute force is flat and ranges from 2.1 to 2.4.

Figure 2: Convergence time comparison for methods only exploiting monotonicity

Table 2 confirms and extends these findings by providing run times (which are sensitive to programming) and π evaluation counts (which are not) for a wide range of grid sizes. Simple monotonicity requires roughly 50% fewer evaluations than brute force and so exhibits the same quadratic growth in run times and evaluation counts. The run times and evaluation counts for binary monotonicity are smaller initially and grow much more slowly in n. Regardless of whether binary monotonicity's speedup relative to simple monotonicity is measured in run times or evaluation counts, the speedup grows almost linearly. However, for a given grid size, the speedup is twice as large when measured in evaluation counts. The reason is that, when using binary monotonicity, other parts of the solution (such as computing conditional expectations) take a non-negligible amount of time.

Binary monotonicity is faster than simple monotonicity by a factor of 5 to 20 for grids of a few hundred points, which is a large improvement. While these are the grid sizes that have been commonly used in the literature to date (e.g., Arellano, 2008, uses 200), the cost of using several thousand points is relatively small when using binary monotonicity. For instance, in roughly the same time simple monotonicity needs to solve the 250-point case, binary monotonicity can solve the 2500-point case. At these larger grid sizes, binary monotonicity can be hundreds of times faster than simple monotonicity.

14 Run times are for the Intel Fortran compiler version 17.0.0 with the flags -mkl=sequential -debug minimal -g -traceback -xHost -O3 -no-wrap-margin -qopenmp-stubs on an Intel Xeon E5-2650 processor with a clock speed of 2.60GHz. The singly-threaded jobs were run in parallel using GNU Parallel as described in Tange (2011). When the run times were less than two minutes, we repeatedly solved the model until two minutes had passed and then computed the average time per solution.

2.3.3

Exploiting monotonicity and concavity

When the problem is concave, binary monotonicity can be combined with various concavity techniques to deliver potentially much faster performance. To test how much faster, we use the RBC model. Table 3 examines the run times and evaluation counts for all nine combinations of the monotonicity and concavity techniques. Perhaps surprisingly, the fastest combination is simple monotonicity with simple concavity. This pair has the smallest run times for both values of n, and the time increases linearly (in fact, slightly sub-linearly). For this combination, solving for the optimal policy requires, on average, only three evaluations of π per state. The reason for this is that the capital policy very nearly satisfies g(i) = g(i − 1) + 1. In this case, simple monotonicity with simple concavity requires only three π evaluations to compute g(i): Evaluating π(i, g(i − 1)), π(i, g(i − 1) + 1), and π(i, g(i − 1) + 2), the algorithm finds π(i, g(i − 1) + 1) > π(i, g(i − 1) + 2) and stops. While the second fastest combination, binary monotonicity with binary concavity, exhibits a similar linear (in fact, slightly sub-linear) time increase, it fares somewhat worse in absolute terms. In particular, it requires on average 3.7 evaluations of π per state, a small number, but not quite as small as the 3.0 required by simple monotonicity with simple concavity. All the other combinations are slower and exhibit greater run time and evaluation count growth.

              Run Time (Seconds)           Evaluations/n                Speedup
Grid size (n)  None     Simple   Binary    None      Simple    Binary   Time    Eval.
100            0.55     0.26     0.05      100       49.1      5.9      5.1     8.3
250            3.51     1.48     0.13      250       121       6.6      11.5    18.3
500            12.9     5.80     0.27      500       241       7.1      21.4    33.9
1000           50.8     22.9     0.57      1000      481       7.6      40.1    63.1
2500           335      142      1.51      2500      1202      8.3      94.5    144
5000           1253     569      3.18      5000      2402      8.8      179     272
10000          4685*    2275     6.88      10000*    4803      9.3      331     514
25000          26769*   14203*   18.6      25000*    12004*    10.0     764*    1200*
50000          100052*  56761*   39.3      50000*    24003*    10.5     1445*   2287*
100000         373953*  226837*  83.0      100000*   47995*    11.0     2733*   4364*

Note: * means the value is estimated; the speedup measure gives the ratio of simple monotonicity to binary monotonicity.

Table 2: Run times and evaluation counts for methods only exploiting monotonicity

                            n = 250            n = 500
Monotonicity  Concavity   Eval./n   Time    Eval./n   Time    Increase
None          None        250.0     8.69    500.0     29.51   3.4
Simple        None        127.4     3.43    253.4     13.29   3.9
Binary        None        10.7      0.40    11.7      0.85    2.1
None          Simple      125.5     4.39    249.6     17.28   3.9
Simple        Simple      3.0       0.14    3.0       0.26    1.9
Binary        Simple      6.8       0.30    7.3       0.60    2.0
None          Binary      13.9      0.58    15.9      1.27    2.2
Simple        Binary      12.6      0.43    14.6      0.95    2.2
Binary        Binary      3.7       0.20    3.7       0.36    1.8

Note: Time is measured in seconds; the last column gives the run time increase in going from n = 250 to 500.

Table 3: Run times and evaluation counts for all monotonicity and concavity techniques

Although Table 3 only reports the performance for two grid sizes, it is representative. This can be seen in Figure 3, which plots the average number of π evaluations per state required for four of the methods. Simple monotonicity with simple concavity and binary monotonicity with binary concavity both appear to be O(n) (with the latter guaranteed to be), but the hidden constant is smaller for the former. The other methods all appear to be O(n log n).

Figure 3: Empirical O(n) behavior

While simple monotonicity with simple concavity dominates binary monotonicity with binary concavity for this model, calibration, and computational setup, the performance of the latter is guaranteed for all models, calibrations, and computational setups. Additionally, while binary monotonicity with binary concavity is not the fastest combination, it is still very fast. As Table 3 shows, for n = 500 it is roughly 100 times faster than brute force and can solve a standard RBC model in less than a second.
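The three-evaluation logic described above, starting the search for g(i) at g(i − 1) and stopping at the first decrease of the objective, can be sketched as follows. This is a minimal illustration with hypothetical names, not the paper's Fortran code.

```python
# Sketch of simple monotonicity combined with simple concavity: for each
# state i, start at the lower bound g(i-1) implied by monotonicity and walk
# upward, stopping at the first non-improvement (valid under concavity).

def solve_simple_simple(pi, n, n_choices):
    g = [0] * n
    V = [0.0] * n
    lo = 0  # monotonicity: g(i) >= g(i-1)
    for i in range(n):
        best, best_val = lo, pi(i, lo)
        for ip in range(lo + 1, n_choices):
            v = pi(i, ip)
            if v <= best_val:   # concavity: first decrease means we are done
                break
            best, best_val = ip, v
        g[i], V[i] = best, best_val
        lo = best
    return g, V
```

When the policy satisfies g(i) = g(i − 1) + 1, as it nearly does for the RBC capital policy, each state costs exactly three evaluations: π(i, g(i − 1)), an improvement at g(i − 1) + 1, and a decrease at g(i − 1) + 2.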

3  Monotonicity in two states

In the previous section, we demonstrated binary monotonicity's performance when exploiting monotonicity in one state variable. However, some models have policy functions that are monotone in more than one state. For instance, under certain conditions, the RBC model's capital policy k′(k, z) is monotone in both k and z. In this section, we show how to exploit this property using what we will call a two-state binary monotonicity algorithm. Our canonical problem is to solve

    Π(i, j) = max_{i′ ∈ {1, . . . , n′}} π(i, j, i′)          (12)

for i ∈ {1, . . . , n1} and j ∈ {1, . . . , n2}, where the optimal policy g(i, j) is increasing in both arguments.

The algorithm we propose below uses the one-state binary monotonicity algorithm to recover g(·, j) for each j. But instead of sequentially iterating on j from 1 to n2, it uses a divide and conquer approach. Specifically, with g(·, j̲) and g(·, j̄) known, it solves for g(·, ⌊(j̲ + j̄)/2⌋) using g(·, j̲) and g(·, j̄) as additional bounds on the search space: Instead of just restricting the search space for j = ⌊(j̲ + j̄)/2⌋ to integers in [g(i̲, j), g(ī, j)] as in the one-dimensional case, monotonicity in j is used to further restrict it to integers in [g(i̲, j), g(ī, j)] ∩ [g(i, j̲), g(i, j̄)]. While this concept is straightforward, its details are somewhat complicated. The algorithm is as follows:

1. Solve for g(·, 1) using binary monotonicity.15 Define l(·) := g(·, 1), u(·) := n′, j̲ := 1, and j̄ := n2.

2. Solve for g(·, j̄) as follows:
   (a) Solve for g(1, j̄) on the search space {l(1), . . . , u(1)}. Define g̲ := g(1, j̄) and i̲ := 1.
   (b) Solve for g(n1, j̄) on the search space {max{g̲, l(n1)}, . . . , u(n1)}. Define ḡ := g(n1, j̄) and ī := n1.
   (c) Consider the state as (i̲, ī, g̲, ḡ). Solve for g(i, j̄) for all i ∈ {i̲, . . . , ī} as follows.
       i. If ī = i̲ + 1, then this is done, so go to Step 3. Otherwise, continue.
       ii. Let m1 = ⌊(i̲ + ī)/2⌋ and compute g(m1, j̄) by searching {max{g̲, l(m1)}, . . . , min{ḡ, u(m1)}}.
       iii. Divide and conquer: Go to (c) twice, once redefining (i̲, g̲) := (m1, g(m1, j̄)) and once redefining (ī, ḡ) := (m1, g(m1, j̄)).

3. Here, g(·, j̲) and g(·, j̄) are known. Redefine l(·) := g(·, j̲) and u(·) := g(·, j̄). Compute g(·, j) for all j ∈ {j̲, . . . , j̄} as follows.
   (a) If j̄ = j̲ + 1, STOP: g(·, j) is known for all j ∈ {j̲, . . . , j̄}.
   (b) Define m2 := ⌊(j̲ + j̄)/2⌋. Solve for g(·, m2) by essentially repeating the same steps as in 2 (but everywhere replacing j̄ with m2). Explicitly,
       i. Solve for g(1, m2) on the search space {l(1), . . . , u(1)}. Define g̲ := g(1, m2) and i̲ := 1.
       ii. Solve for g(n1, m2) on the search space {max{g̲, l(n1)}, . . . , u(n1)}. Define ḡ := g(n1, m2) and ī := n1.
       iii. Consider the state as (i̲, ī, g̲, ḡ). Solve for g(i, m2) for all i ∈ {i̲, . . . , ī} as follows.
           A. If ī = i̲ + 1, then this is done, so go to Step 3 part (c). Otherwise, continue.
           B. Let m1 = ⌊(i̲ + ī)/2⌋ and compute g(m1, m2) by searching {max{g̲, l(m1)}, . . . , min{ḡ, u(m1)}}.
           C. Divide and conquer: Go to (iii) twice, once redefining (i̲, g̲) := (m1, g(m1, m2)) and once redefining (ī, ḡ) := (m1, g(m1, m2)).
   (c) Go to Step 3 twice, once redefining j̲ := m2 and once redefining j̄ := m2.

15 To make the algorithm less verbose, we omitted references to Π, but it should be solved for at the same time as g.

Note that one could "transpose" the algorithm, i.e., solve for g(1, ·) in Step 1 rather than g(·, 1) and so on. However, we found this has no effect. The reason is that the search space for a given (i, j) is restricted to [g(i̲, j), g(ī, j)] ∩ [g(i, j̲), g(i, j̄)]. In the transposed algorithm, the search for the same (i, j) pair would be restricted to [g(i, j̲), g(i, j̄)] ∩ [g(i̲, j), g(ī, j)]. Evidently, these search spaces are the same as long as i̲, ī, j̲, j̄ are the same in both the original and transposed algorithm. This is the case because—in both the original and transposed algorithm—each i is reached after subdividing {1, . . . , n1} in an order that does not depend on g, and similarly for j. Consequently, the search space for each (i, j) pair is identical in each algorithm, and so—irrespective of the maximization procedure used, be it brute force, simple concavity, or binary concavity—each algorithm requires the same number of π evaluations.

Figure 4 illustrates how the two-state binary monotonicity algorithm works when applied to the RBC model using capital as the first dimension.
The figure is analogous to Figure 1, but the presence of red lines indicates bounds on the search space coming from previous solutions of g(·, j̲) and g(·, j̄). Consider first the top two panels, which correspond to n1 = n2 = n′ = 25. At j = n2 (the top left panel), g(·, 1) has already been solved and hence provides a lower bound on the search space, as indicated by the red, monotonically increasing line. However, there is no upper bound on the search space other than n′. At j = ⌊(n2 + 1)/2⌋ (the top right panel), both g(·, 1) and g(·, n2) are known and so provide lower and upper bounds, respectively. The drastic search space reduction in this case is driven in part by the coarse grids, but for finer grids there is a similar percent reduction. This can be seen in the bottom two panels, where n1 = n′ = 100 and n2 = 25.

Figure 4: Example of binary monotonicity in two states

Table 4 reports the evaluation counts and run times for the RBC model when using the two-state binary monotonicity algorithm with the various concavity techniques. We again treat the first dimension as capital, so n1 = n′ but n1 does not necessarily equal n2. For ease of reference, the table also reports these measures when exploiting monotonicity only in k.16 The speedup of the two-state algorithm relative to the one-state algorithm ranges from 1.4 to 5.1 when considering evaluation counts. The run-time improvements are smaller, ranging from 1.0 (i.e., no improvement) to 2.7. In part, this is because time spent maximizing is already small, which reduces the benefit of further reductions to it. But it is also caused by a more complicated algorithm that makes optimization difficult for the compiler. The greatest improvements come when not exploiting concavity, which results in run-time speedups of at least 2.0 and evaluation count speedups of at least 3.2. Exploiting monotonicity in k and z, even without an assumption on concavity, can bring evaluation counts down to 2.1, which occurs for (n1, n2) = (250, 42). This is 119 times better than brute force (where the evaluation count per state would be 250) and only 50% worse than knowing the true solution (which would require an evaluation count of 1). Additionally, for (n1, n2) = (250, 21), the 2.9 count also outperforms the "simple-simple" combination that, as can be seen in Table 3, had a 3.0 count for these grid sizes. The extremely small evaluation count also means that brute force is nearly as fast as, and sometimes faster than, binary concavity. That it can be faster again reflects the important role played by compiler optimization.17 Table 4 also shows that how these measures vary with grid sizes is non-trivial. For instance, when doubling only the capital grid size (specifically going from (n1, n2) = (250, 21) to (n1, n2) = (500, 21)), evaluation counts per state uniformly increase for both the one-state and two-state algorithms.
However, when doubling only the productivity grid size (specifically going from (n1, n2) = (250, 21) to (n1, n2) = (250, 42)), evaluation counts per state do not change for the one-state algorithm but fall for the two-state algorithm. Doubling all the grid sizes increases evaluation counts per state for the one-state algorithm but decreases them for the two-state algorithm except when using binary concavity.

16 The times need not agree with the ones in Table 3 because they are from a different run.
17 Because these optimizations matter, a commonly-used improvement to quicksort (which has an O(n log n) average case) is to switch to insertion sort (which is O(n^2)) when n is very small (Press, Flannery, Teukolsky, and Vetterling, 1989). While we neglect such optimizations in this paper, they should prove useful in practice.

n1 = n′ = 250, n2 = 21
Monotonicity   Concavity   Eval   Time   Eval Speedup   Time Speedup
k only         None        10.7   0.42   −              −
k only         Simple      6.8    0.29   −              −
k only         Binary      3.7    0.19   −              −
k and z        None        2.9    0.18   3.7            2.4
k and z        Simple      2.4    0.16   2.8            1.8
k and z        Binary      2.2    0.16   1.7            1.2

n1 = n′ = 250, n2 = 42
Monotonicity   Concavity   Eval   Time   Eval Speedup   Time Speedup
k only         None        10.7   0.81   −              −
k only         Simple      6.8    0.71   −              −
k only         Binary      3.7    0.39   −              −
k and z        None        2.1    0.30   5.1            2.7
k and z        Simple      1.9    0.28   3.6            2.5
k and z        Binary      1.8    0.31   2.1            1.3

n1 = n′ = 500, n2 = 21
Monotonicity   Concavity   Eval   Time   Eval Speedup   Time Speedup
k only         None        11.7   0.85   −              −
k only         Simple      7.3    0.60   −              −
k only         Binary      3.7    0.36   −              −
k and z        None        3.7    0.41   3.2            2.0
k and z        Simple      3.0    0.37   2.4            1.6
k and z        Binary      2.7    0.37   1.4            1.0

n1 = n′ = 500, n2 = 42
Monotonicity   Concavity   Eval   Time   Eval Speedup   Time Speedup
k only         None        11.7   1.71   −              −
k only         Simple      7.3    1.24   −              −
k only         Binary      3.7    0.77   −              −
k and z        None        2.6    0.71   4.4            2.4
k and z        Simple      2.3    0.67   3.1            1.9
k and z        Binary      2.2    0.70   1.7            1.1

Note: Time is measured in seconds; "Eval Speedup" and "Time Speedup" give the evaluation count and run time, respectively, for exploiting monotonicity only in k relative to exploiting it in k and z.

Table 4: Run times and evaluation counts for one-state and two-state binary monotonicity

Figure 5 helps visualize the algorithm's efficiency by plotting evaluation counts per state as grid sizes grow while fixing the ratio n1/n2 and forcing n1 = n′ (for this figure, concavity is not exploited). The horizontal axis gives the number of states in log2 so that a unit increment means the number of states is doubled. In all cases, the evaluation counts fall and appear to asymptote as grid sizes increase, which suggests the algorithm is O(n1 n2). However, the overall performance depends on the n1/n2 ratio, with larger ratios implying less efficiency. In Table 4, this ratio was between 6 and 24, resulting in an evaluation count per state in the 2.1 to 3.7 range.

Figure 5: Two-state algorithm's empirical O(n1 n2) behavior for n1 = n′ and n1/n2 in a fixed ratio

The two-state algorithm restricts the search space more than the alternative of looping over j values and using the one-state algorithm to recover g(·, j) sequentially for each j. Consequently, the two-state algorithm's worst case is weakly better than n2(n′ log2(n1) + 3n′ + 2n1) when using brute force and n2(8n′ + 6n1 + 2 log2(n′)) when using binary concavity (by Proposition 2). Since knowing the true policy requires n1 n2 evaluations—specifically, recovering Π(i, j) by evaluating π(i, j, g(i, j)) for all i, j—the only significant theoretical improvement that can be made is eliminating the log2(n1) term when using brute force. While Figure 5 suggests the two-state algorithm is in fact O(n1 n2) as n1, n2, and n′ increase (holding their proportions fixed), we have not been able to prove this is the case for all monotone policies. However, we have when restricting the class of optimal policies. The specific restriction we require is that for every j ∈ {2, . . . , n2 − 1} one has g(n1, j) − g(1, j) + 1 ≤ λ(g(n1, j + 1) − g(1, j − 1) + 1). For any function monotone in i and j, λ = 1 satisfies this constraint, but there may be smaller values of λ that do as well. The bound we derived in light of this restriction is stated in Proposition 3.18

Proposition 3. Suppose the number of states in the first (second) dimension is n1 (n2) and the number of choices is n′ with n1, n2, n′ ≥ 4. Further, let λ ∈ (0, 1] be such that for every j ∈ {2, . . . , n2 − 1} one has g(n1, j) − g(1, j) + 1 ≤ λ(g(n1, j + 1) − g(1, j − 1) + 1). Then, the two-state binary monotonicity algorithm requires no more than

    (1 + λ^−1) log2(n1) n′ n2^κ + 3(1 + λ^−1) n′ n2^κ + 4 n1 n2 + 2 n′ log2(n1) + 6 n′

evaluations of π, where κ = log2(1 + λ). For n1 = n′ =: n and n1/n2 = ρ with ρ a constant, the cost is O(n1 n2) with a hidden constant of 4 if (g(n, j) − g(1, j) + 1)/(g(n, j + 1) − g(1, j − 1) + 1) is bounded away from 1 for large n.

18 In the proposition, we abuse notation by writing O(n1 n2) in place of O(n^2/ρ). We do this to emphasize that the algorithm asymptotically requires only 4 evaluations per state, and in general there are n1 n2 states.


For λ = 1, the bound essentially coincides with just looping sequentially through j values and using the one-state binary monotonicity algorithm. However, for λ < 1, asymptotically the two-state algorithm—without exploiting concavity—is only 4 times slower than knowing the exact solution. In the RBC example, we found that (g(n, j) − g(1, j) + 1)/(g(n, j + 1) − g(1, j − 1) + 1) was less than 0.67 for small grid sizes but grew and seemed to approach one.19 If it does limit to one, the asymptotic performance is not guaranteed. However, the two-state algorithm has shown itself to be very efficient for quantitatively-relevant grid sizes.

4  Extensions to a class of non-monotone problems

In this section, we extend binary monotonicity's usage to a class of problems having potentially non-monotone policies. Our canonical problem is to solve, for i = 1, . . . , n,

    V(i) = max_{c≥0, i′∈{1,...,n′}} u(c) + W(i′)
    s.t. c = z(i) − w(i′)          (13)

where u′ > 0 and u″ < 0, with an associated optimal policy g(i). Here, as before, i and i′ can be thought of as indices in possibly multi-dimensional grids. While this is a far narrower class of problems than (1), it still encompasses a large number of problems, including the RBC and Arellano (2008) models. If z and W are weakly increasing, then one can use our sufficient conditions to show g is monotone. Our main insight is that binary monotonicity can be applied even when z and W are non-monotone by creating a new problem in which their values have been sorted. Specifically, letting z̃ and W̃ be the sorted values of z and W, respectively, and letting w̃ be w rearranged in the same order as W̃, the transformed problem is to solve, for each j = 1, . . . , n,

    Ṽ(j) = max_{c≥0, j′∈{1,...,n′}} u(c) + W̃(j′)
    s.t. c = z̃(j) − w̃(j′)          (14)

with associated policy function g̃. Because W̃ and z̃ are increasing, binary monotonicity can be used to obtain g̃ and Ṽ. Then, one may recover g and V by undoing the sorting.20 Evidently, this class of problems allows for a cash-at-hand reformulation with z as a state variable, so the real novelty of this approach lies in the sorting of W.

For a concrete example, consider a combination of the Arellano (2008) and RBC models. (While this example will have multi-dimensional grids, there are also one-dimensional grid examples.)21 While we fully describe the model in Appendix B, the relevant part is the sovereign's problem conditional on not defaulting. This may be written

    V n (b, k, a) = max_{c≥0, b′, k′} u(c) + βEa′|a max{V n (b′, k′, a′), V d (k′, a′)}
    s.t. c + q(b′, k′, a)b′ + k′ = ak^α + (1 − δ)k + b,          (15)

where a is productivity, other variables are as before, and the associated optimal policies are b′(b, k, a) and k′(b, k, a). The optimal policies for the problem are generally not monotone because of the non-trivial portfolio choice problem: An increase in bonds b or capital k may cause a substitution from b′ into k′ or vice versa. This is true even if one uses a cash-at-hand formulation. Nevertheless, binary monotonicity can be used to solve this problem by mapping it into (13) and then sorting to arrive at (14). Specifically, suppose the states (b, k) and choices (b′, k′) both lie in a set X = {(bi, ki)} having cardinality n. In most cases, the grid would be a tensor product, but that is not required. Creating a separate problem for each a, one may then write

    Va^n (i) = max_{c≥0, i′∈{1,...,n}} u(c) + Wa (i′)
    s.t. c = za (i) − wa (i′)          (16)

where Wa (i′) := βEa′|a max{V n (bi′, ki′, a′), V d (ki′, a′)}, za (i) := a ki^α + (1 − δ)ki + bi, and wa (i′) := q(bi′, ki′, a)bi′ + ki′. (Note that although the choice variable is now one-dimensional, the portfolio problem is still present, as each i′ gives a distinct (bi′, ki′) pair.) Generally, za and Wa will not be increasing because that would require a very particular ordering of the elements of X. However, by sorting the z and W values, one can solve the model using binary monotonicity. In this case, the portfolio problem is being simplified by using the restriction that continuation utility—evaluated at optimal choices—must increase as cash-at-hand increases.

Basic considerations show that this "binary monotonicity with sorting" algorithm is asymptotically efficient when an efficient sorting algorithm is used. Specifically, using a sorting algorithm like heapsort or mergesort, the required manipulations for z can be done in O(n log n) operations and for W and w in O(n′ log n′) operations.22 Additionally, the mapping from the transformed problem back to the original has linear cost. Then, since binary monotonicity is O(n′ log n) + O(n) as either n or n′ grows, the entire algorithm is O(n log n) + O(n′ log n′) as either n or n′ grows. Fixing n′ = n, this is the same O(n log n) cost seen in Proposition 2. Note that if only one of n or n′ grows, the sorting costs asymptotically dominate, which is a point we will revisit below.

While the cost depends only on the total numbers of points n and n′, in the case of tensor grids with a fixed number of points m along each dimension, n = m^d and n′ = m^d′ grow quickly in m when d and d′ (the dimensionality of states and choices, respectively) are bigger than 1. The cost in these terms is O(m^max{d,d′} log m) for binary monotonicity and O(m^(d+d′)) for brute force.23 While theoretically this results in a massive improvement when d = d′ > 1, the extreme cost of using brute force in this case means one would almost surely reformulate the problem in terms of cash-at-hand, effectively reducing d to 1.

Table 5 reports the run times and evaluation counts for brute force, simple monotonicity, and binary monotonicity for different grid sizes m in the combined Arellano (2008) and RBC model. In the top panel, the cash-at-hand reformulation has not been used, and so binary monotonicity vastly outperforms the other methods. For 50 points in each dimension, binary monotonicity is already 14 times faster than simple monotonicity and 39 times faster than brute force. For 1000 points, simple monotonicity's estimated run time is two months while binary monotonicity's actual run time is only 21 minutes. A doubling of the grid sizes makes the speedup increase by roughly a factor of 4, which agrees with binary monotonicity being O(m^2 log m) and brute force being O(m^4). The speedups measured in evaluations are even more dramatic as they exclude time spent sorting.

Reformulating the problem using cash-at-hand (with m points for the cash-at-hand state variable) makes brute force an O(m^3) algorithm but does not change binary monotonicity's asymptotics (which are still O(m^2 log m)). Consequently, binary monotonicity still has an advantage, but it is much smaller. This can be seen in a comparison of the top and bottom panels of Table 5. With the cash-at-hand formulation, brute force and simple monotonicity are faster by a factor of roughly m, but binary monotonicity is only twice as fast. Overall, binary monotonicity still outperforms simple monotonicity, but by a more modest factor of 1.4 to 9.2 for run times. The better evaluation count speedups, which are in the 3.4 to 42.5 range, show that sorting costs are playing a non-trivial role.

19 For this measure, we are taking the minimum over treating k as the first dimension and treating z as the first dimension. As argued above, whether one "transposes" the algorithm or not results in the same search spaces and so, for the asymptotics, it is enough that one of these be bounded away from one.
20 To be precise, let the sorting of z and W be denoted by ζ and Ω, respectively, so that z(i) = z̃(ζ(i)), W(i′) = W̃(Ω(i′)), and w(i′) = w̃(Ω(i′)) for all i, i′. Further, let the inverse of Ω be denoted Ω−1. Then g(i) = Ω−1(g̃(ζ(i))) and V(i) = Ṽ(ζ(i)).
21 E.g., with Medicaid, a person's eligibility depends on their not having too many assets. In a model with z(i) giving cash-at-hand from asset levels indexed by i and Medicaid transfers, z would be increasing up to the eligibility threshold, discontinuously drop, and then continue increasing. The continuation utility W would inherit this non-monotonicity.
22 We use quicksort in the numerical results because it is generally faster (despite quadratic worst-case behavior).
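The sorting transformation in (14), together with the inverse mapping described in footnote 20, can be sketched as follows. The names are hypothetical, and solve_monotone stands in for any routine (such as binary monotonicity) that solves the sorted problem with a monotone policy.

```python
# Sketch of "binary monotonicity with sorting": sort z and W (carrying w
# along with W), solve the transformed monotone problem, undo the sorting.

def solve_with_sorting(u, z, W, w, solve_monotone):
    """Return (g, V) for V(i) = max_{i'} u(z(i) - w(i')) + W(i'), c >= 0."""
    n, nc = len(z), len(W)
    zeta = sorted(range(n), key=lambda i: z[i])    # z[zeta[k]] is increasing
    omega = sorted(range(nc), key=lambda i: W[i])  # W[omega[k]] is increasing
    z_s = [z[i] for i in zeta]
    W_s = [W[i] for i in omega]
    w_s = [w[i] for i in omega]                    # w rearranged like W

    # Solve the transformed (monotone) problem.
    g_s, V_s = solve_monotone(u, z_s, W_s, w_s)

    # Undo the sorting: g(i) = omega[g~(zeta^{-1}(i))], V(i) = V~(zeta^{-1}(i)).
    zeta_inv = [0] * n
    for k, i in enumerate(zeta):
        zeta_inv[i] = k
    g = [omega[g_s[zeta_inv[i]]] for i in range(n)]
    V = [V_s[zeta_inv[i]] for i in range(n)]
    return g, V
```

The linear-cost inverse mapping at the end is why, as noted above, the O(n log n) and O(n′ log n′) sorting steps are the asymptotically binding costs when only one of n or n′ grows.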
As pointed out above, when only n or n′ grows, sorting costs dominate, and here that is essentially the case because n′ = m^2 grows much faster than n = m. However, it should be pointed out that the speedups measured against brute force—which may be a better benchmark since the sorting of continuation utilities is, to our knowledge, novel—are roughly twice as large.

23 As discussed above, binary monotonicity with sorting is O(n log n) + O(n′ log n′). Plugging in n = m^d and n′ = m^d′ gives O(m^d log m) + O(m^d′ log m). So, the cost is O(m^max{d,d′} log m).

Original formulation

                      Run Times                           Evaluations per State
Points (m)   Brute       Simple      Binary     Brute      Simple     Binary   Speedup
50           1.73 (m)    37.1 (s)    2.64 (s)   2500       1273       14       14.1
100          26.4 (m)    9.48 (m)    9.35 (s)   10000      5100       16       60.8
250          16.1* (h)   5.83* (h)   1.10 (m)   62500*     31936*     19       317.9*
500          10.3* (d)   3.72* (d)   5.02 (m)   250000*    127919*    21       1066.8*
1000         157* (d)    57.0* (d)   21.1 (m)   1000000*   512379*    23       3890.5*

Cash-at-hand formulation

                      Run Times                           Evaluations per State
Points (m)   Brute       Simple      Binary     Brute      Simple     Binary   Speedup
50           3.14 (s)    1.21 (s)    0.85 (s)   2500       1304       379      1.4
100          17.5 (s)    8.03 (s)    4.16 (s)   10000      5127       860      1.9
250          4.12 (m)    1.76 (m)    31.0 (s)   62500      31723      2467     3.4
500          32.3 (m)    12.8 (m)    2.27 (m)   250000     126354     5432     5.6
1000         4.23* (h)   1.55* (h)   10.1 (m)   1000000*   503277*    11855    9.2*

Note: Run times are in seconds (s), minutes (m), hours (h), or days (d). The last column gives the run time for simple over the run time for binary; an * means the value is estimated; times and average evaluations are over the first 200 value function iterations as convergence was obtained only for some grid sizes.

Table 5: Run times and evaluations for the combined Arellano (2008) and RBC model

Before closing this section, we would like to point out that in some cases additive separability may only occur when conditioning on some variables. For instance, in our example, adding capital adjustment costs transforms the budget constraint to

    c + q(b′, k′, a)b′ + k′ + ξ(k′ − k)^2 = ak^α + (1 − δ)k + b,          (17)

which can be written as c = za (b, k, k′) − wa (b′, k, k′) where za (b, k, k′) := ak^α + (1 − δ)k + b and wa (b′, k, k′) := q(b′, k′, a)b′ + k′ + ξ(k′ − k)^2. In this case, the problem has the form

    Va^n (i, j) = max_{c≥0, i′, j′} u(c) + Wa (i′, j′)
    s.t. c = za (i, j, j′) − wa (i′, j, j′),          (18)

so that there is additive separability between the i and i′ variables conditional on a (j, j′) pair. Binary monotonicity can be used to solve (18) by breaking it into a two-stage problem where j′ is chosen in the first stage and i′ in the second:

    Va^n (i, j) = max_{j′} Ṽa^n (i, j, j′)
    Ṽa^n (i, j, j′) = max_{c≥0, i′} u(c) + Wa (i′, j′)          (19)
    s.t. c = za (i, j, j′) − wa (i′, j, j′).

For each (j, j′) combination, one can sort z(·, j, j′) and W(·, j′) so that the optimal policy of the second-stage problem is monotone in i. For grids of size m in each dimension, binary monotonicity with sorting can be used to solve for Ṽa^n in O(m^3 log m) operations and then Va^n in O(m^3) operations. So, the total cost is O(m^3 log m), which compares favorably with brute force's O(m^4).

5  Application to continuous problems

While grid search is a simple and robust method for value function optimization, its accuracy is typically low relative to methods that assume a continuous choice space. In this section, we show how binary monotonicity can be used with a continuous choice space using the Arellano (2008) model as an example. (We use this model as its non-concave objective function requires a global optimization technique.) We focus on the problem of solving, for each b ∈ B with B a discrete grid, V (b) =

max

u(c) + W (b0 )

c≥0,b0 ∈[b,b]

(20)

c = z(b) − w(b0 ) for z and W increasing, u0 > 0, u00 < 0, and b := min B, b := max B.24 Note the choice set is now an interval rather than a grid. Using the sufficient conditions given in the next section, one can show the policy function g(b) will be monotone. A common approach for solving this type of problem is to identify the best b0 ∈ B using grid search and then searching locally about it (e.g., Hatchondo, Martinez, and Sapriza, 2010, do this). Typically, the local search is done via golden section search or a Newton-like optimizer. Since W (and in some cases w) is generally only known on the discrete grid B, some approximation method is used to obtain W (b0 ) for b0 ∈ / B. Binary monotonicity can be used to accelerate this procedure, as in the discrete case, by facilitating the grid search. This commonly-used approach is heuristic in that it is not guaranteed to find a global maximum without additional assumptions. However, when W and w off the B grid are given by linear interpolants, the exact (to numerical precision) global maximum of the approximate problem can be found.25 Specifically, a global optimum in [b, b] must either be on the grid B or in one of the 24

If z and W are not increasing, binary monotonicity can still be employed by using the sorting algorithm described in Section 4. However, the resulting solution will have an unconventional interpretation. 25 Linear interpolation is convenient because it preserves monotonicity (and concavity) and has a simple formula.

25

intervals defined by it. In the first case, the optimum can be found via grid search. In the second case, the optimum must satisfy a FOC that admits a closed-form expression when using linear interpolation.26 Practically speaking, solving the continuous problem with this method only doubles the number of choices relative to the discrete case: In addition to checking b′ ∈ B as in the discrete case, one also checks a b′ for each interval defined by B.

Table 6 reports run times, speedups, evaluation counts, and average Euler equation errors (in log10 units) for pure grid search (columns labeled D), the heuristic of grid search followed by local optimization (columns labeled CL), and the global optimization method (columns labeled CG), with the exact implementations given in Appendix A.27 As previously seen in the discrete case, the times and binary monotonicity's speedup relative to simple monotonicity grow nearly linearly in grid size. While the continuous solutions tend to be around 70-90% slower than the discrete solution, the average Euler equation errors are much lower. This is especially true for grid sizes of 500 and above, where the errors can be up to three orders of magnitude smaller.28 While the heuristic approach seems to work well—having nearly, though not exactly, the same average Euler error—the guaranteed accuracy of the global approach may be had at a cost of around 15%.

Binary monotonicity makes the global approach CG relatively attractive compared to the heuristic approach CL because of the way the continuous choice space is added. With CG, the choice set is effectively doubled relative to D. For both binary and simple monotonicity, this causes evaluation counts per state to also roughly double, and so speedups are virtually unchanged. In contrast, with CL the evaluation counts do not scale up proportionally but instead increase by around 1.6 in levels. When using simple monotonicity, this increases evaluation counts by a few percent or less. However, with binary monotonicity this causes a percent increase of around 20-30%. Consequently, the cost of using CG instead of CL is relatively small for binary monotonicity.

25 (continued) Cai and Judd (2012)'s method also preserves shape using a slightly more complicated formula and is a better choice in terms of accuracy.
26 Suppose the interval is (l, h). For b′ ∈ (l, h) to be optimal, it must satisfy u′(z(b) − w(l) − (b′ − l)ω) = Ω/ω, where Ω (ω) denotes the slope of W (w). Inverting marginal utility and solving for b′ gives b′ = ω⁻¹(z(b) − w(l) − u′⁻¹(Ω/ω)) + l. Note this can only possibly be optimal if the implied b′ is in (l, h). Also, since w and W are increasing, the case of ω = 0 implies h is weakly better than any element of (l, h), and so there is no need to divide by zero.
27 The times are not directly comparable to the ones reported in Table 2 because different subroutines are used. For the evaluation count measure, we include all the evaluations required by grid search plus one "evaluation" for each interval checked by the continuous procedures. Also, because the price schedule q is a step function and so only piecewise differentiable, we include an observation in the Euler error measure only if the schedule is differentiable at the optimal choice. We say the schedule is differentiable if the two-sided numerical derivative of q(b′, y)b′ (the expenditure schedule) is less than 10⁻³ from the numerical left (i.e., backwards) and right (i.e., forwards) derivatives. This ends up discarding about half of the points, consistent with Athreya, Tam, and Young (2012)'s finding for consumer default that "almost all borrowers . . . locate at the edge of 'cliffs' [i.e., points of non-differentiability] in the pricing function" (p. 166).
28 The Euler errors are not necessarily diminishing in grid size, in part because the average is computed with respect to the invariant distribution. The distribution of Euler errors is bimodal, with one mode having high errors, corresponding to optimal choices on the grid, and the other mode having low errors, corresponding to optimal choices between grid points. Errors at the 75th, 90th, 95th, and 99th percentiles all fall monotonically.
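The closed form in footnote 26 is straightforward to implement. The sketch below assumes CRRA utility u(c) = c^(1−σ)/(1−σ), so that u′(c) = c^(−σ) and u′⁻¹(v) = v^(−1/σ); the function name and argument names are illustrative, not the paper's code:

```python
def interior_candidate(z_b, l, w_l, omega, Omega, sigma):
    """Interior FOC candidate b' on the interval (l, h) under linear interpolation.

    Assumes CRRA utility u(c) = c**(1 - sigma) / (1 - sigma), so that
    u'(c) = c**(-sigma) and u'^{-1}(v) = v**(-1/sigma).  omega and Omega
    are the slopes of w and W on the interval; all names are illustrative.
    """
    if omega <= 0.0 or Omega <= 0.0:
        # w flat (or W flat): no interior FOC to solve; an endpoint dominates
        return None
    c_star = (Omega / omega) ** (-1.0 / sigma)    # solves u'(c*) = Omega/omega
    return (z_b - w_l - c_star) / omega + l       # invert w(b') = w(l) + (b' - l)*omega
```

The returned candidate is only valid if it lies inside (l, h); otherwise the interval contributes no interior optimum, exactly as the footnote notes.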


              Binary time                     Mean EEE
Grid size    D      CL     CG           D        CL       CG
100         0.04   0.07   0.08       -2.156   -3.502   -3.504
250         0.11   0.19   0.21       -2.467   -3.973   -3.966
500         0.23   0.38   0.43       -3.370   -8.148   -8.167
1000        0.46   0.76   0.88       -3.760   -7.881   -7.881
2500        1.21   1.96   2.28       -4.235   -6.739   -6.739

              Time speedup                Evaluations per state
Grid size    D      CL     CG           D      CL      CG
100          3.5    2.4    3.4         5.9    7.5    10.9
250          7.7    4.9    7.3         6.6    8.2    12.2
500         14.5    9.1   13.9         7.1    8.7    13.3
1000        28.1   17.5   26.8         7.6    9.2    14.3
2500        66.5   41.4   63.5         8.3    9.9    15.7

Note: Times are in seconds; the speedup is simple monotonicity's time over binary's; Mean EEE is the mean Euler equation error (in log10) conditional on points where the left and right hand derivatives of the budget constraint are nearly equal; evaluations include actual objective function evaluations plus one "evaluation" for each interval checked by the optimization procedure.

Table 6: Run time and accuracy comparison across optimization techniques


6  Sufficient conditions

Since we have shown binary monotonicity is a powerful solution technique, it is useful to have sufficient conditions that guarantee policy functions are monotone. In this section, we collect some sufficient conditions from the literature, as well as provide new ones, and show how they can be used to establish the monotonicity results that have been used in the paper.

Topkis (1978) provides general sufficient conditions for establishing monotonicity. One of the key requirements is that the objective function must have increasing differences. We formalize this notion now.29

Definition 2. Let S ⊂ R². Then f : S → R has weakly (strictly) increasing differences on S if f(x₂, y) − f(x₁, y) is weakly (strictly) increasing in y for x₂ > x₁ whenever (x₁, y), (x₂, y) ∈ S.

Note that increasing differences holds if and only if f(x, y₂) − f(x, y₁) is increasing in x for y₂ > y₁, so either condition may be used in the definition.

Before stating the main sufficient conditions, we need to formalize what monotonicity means in cases where there may be more than one optimal policy. One possibility is that every possible optimal policy must necessarily be monotone. This corresponds to the optimal choice correspondence being strongly ascending in the sense defined by Topkis (1978). However, for simple and binary monotonicity to correctly deliver optimal policies, it will be enough for the correspondence to satisfy the weaker condition of being ascending. This condition ensures a monotone policy function exists, but it allows for some non-monotone policies to exist as well. These notions are defined now.

Definition 3. Let I, I′ ⊂ R with G : I → P(I′) where P denotes the power set. Let a, b ∈ I with a < b. G is ascending on I if g₁ ∈ G(a), g₂ ∈ G(b) implies min{g₁, g₂} ∈ G(a) and max{g₁, g₂} ∈ G(b). G is strongly ascending on I if g₁ ∈ G(a), g₂ ∈ G(b) implies g₁ ≤ g₂.

Since binary monotonicity restricts the search space for g(m) to {g(i̲), . . . , g(ī)} (for m the midpoint of i̲ and ī), it could fail to deliver an optimal policy if {g(i̲), . . . , g(ī)} did not contain an optimal choice. However, if the optimal choice correspondence is ascending, this is impossible and binary monotonicity correctly delivers an optimal policy.30 Now we can state Topkis (1978)'s sufficient condition in the context of our simplified framework:

Proposition 4 (Topkis, 1978). Let I, I′ ⊂ R, I′ : I → P(I′), and π : I × I′ → R. If I′ is ascending on I and π has increasing differences on I × I′, then G defined by G(i) := arg max_{i′∈I′(i)} π(i, i′) is ascending on {i ∈ I | G(i) ≠ ∅}. If π has strictly increasing differences, then G is strongly ascending.

There are two main difficulties in applying Proposition 4. First, one must establish the objective function has increasing differences. Second, one must establish the choice correspondence is ascending. One way to ensure the choice correspondence is ascending is for every choice to be feasible. The following lemma, which relies on increasing differences, provides an alternative.31

Lemma 2. Let I, I′ ⊂ R and I′ : I → P(I′). Suppose I′(i) = {i′ ∈ I′ | h_m(i, i′) ≥ 0 for all m ∈ M} with M arbitrary. If I′ is increasing, h_m is decreasing in i′, and h_m has increasing differences on I × I′ (for all m), then I′ is ascending on I.

For an example application of Lemma 2, consider the RBC model and define c(k′, k, z) := −k′ + zF(k) + (1 − δ)k. Since c is increasing in k and decreasing in k′, the budget constraint {k′ ∈ K | c(k′, k, z) ≥ 0} will be ascending on K for each z as long as c has increasing differences in k and k′.

As Proposition 4 and Lemma 2 suggest, monotonicity is intimately connected to functions having increasing differences. The following lemma provides a number of sufficient conditions for establishing that this is the case.32

Lemma 3. For I, I′ ⊂ R and S ⊂ I × I′, f : S → R has increasing differences on S if any of the following hold:

(a) f(i, i′) = p(i) + q(i′) for arbitrary p and q.

(b) f(i, i′) = p(i) + q(i′) + r(i)s(i′) for arbitrary p and q with r and s both increasing or both decreasing.

(c) f(i, i′) agrees with g : L ⊂ R² → R, a C² function having g₁₂ ≥ 0 and L a hypercube with S ⊂ L.

(d) f(i, i′) is a non-negative linear combination (i.e., f = Σ_k α_k f_k with α_k ≥ 0) of functions having increasing differences.

(e) f(i, i′) = h(g(i, i′)) for h an increasing, convex, C² function and g increasing (in i and i′) and having increasing differences.

(f) f(i, i′) = h(g(i, i′)) for h an increasing, concave, C² function and g increasing in i, decreasing in i′, and having increasing differences.

(g) f(i, i′) = ∫_E g(h(i, ε), i′) dF(ε) with g having increasing differences on {(ĥ, i′) | ĥ = h(i, ε), ε ∈ E, (i, i′) ∈ S} and h increasing in i.

(h) f(i, i′) = max_{x∈Γ(i,i′)} g(i, i′, x) exists for all (i, i′) ∈ S, S is a lattice, Γ : S → P(X), X ⊂ R, the graph of Γ is a lattice, and g has increasing differences in i, i′ and i, x and i′, x on I × I′, I × X, and I′ × X, respectively (for all i, i′, x).

29 For the general definitions, see Topkis (1978). We present the definitions and results in a simplified context corresponding to what binary monotonicity can solve.
30 To see why, let G(i) := arg max_{i′} π(i, i′). Then for any ĝ ∈ G(m), one also has min{g(ī), max{g(i̲), ĝ}} ∈ G(m). Consequently, G(m) ∩ {g(i̲), . . . , g(ī)} ≠ ∅ (noting that the finite choice set implies G(m) ≠ ∅). We formalize this argument in Appendix C.
31 Another alternative is to establish that the graph of the choice correspondence is a lattice. Under our assumptions, this is actually necessary and sufficient, which we prove in Appendix C.
32 Some additional sufficient conditions can be found in Topkis (1978) and Simchi-Levi, Chen, and Bramel (2014). Many of these results hold more generally (e.g., for supermodular functions on lattices) and have strict versions.
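On small discrete problems, the ascending property of Definition 3 can be checked numerically. The following sketch brute-forces it for a toy objective with increasing differences (the function names and the particular π are ours, not the paper's):

```python
import itertools

def argmax_set(pi, i, choices):
    """All maximizers of pi(i, .) over the choice set."""
    vals = [pi(i, ip) for ip in choices]
    m = max(vals)
    return {ip for ip, v in zip(choices, vals) if v == m}

def is_ascending(pi, states, choices):
    """Brute-force check of Definition 3's ascending property."""
    for a, b in itertools.combinations(states, 2):  # pairs with a < b
        Ga, Gb = argmax_set(pi, a, choices), argmax_set(pi, b, choices)
        if any(min(g1, g2) not in Ga or max(g1, g2) not in Gb
               for g1 in Ga for g2 in Gb):
            return False
    return True

# The 10*i*i' term makes pi have increasing differences (Lemma 3 part (b)),
# so by Proposition 4 the argmax correspondence should be ascending.
pi = lambda i, ip: 10 * i * ip - 6 * ip * ip
```

With this π, `argmax_set(pi, 3, range(5))` is the two-element set {2, 3}, so the check genuinely exercises the correspondence (not just function) case.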


By repeated use of Lemma 3 followed by an application of Proposition 4, monotonicity can be shown in many economic models. For instance, again consider the RBC model where we defined c(k′, k, z) = −k′ + zF(k) + (1 − δ)k. By Lemma 3 part (a), c has increasing differences in k, k′. Then, u(c(k′, k, z)) and u(c(k′, k, z)) + βE_{z′|z}V₀(k′, z′) have increasing differences by parts (f) and (d), respectively. Consequently, an application of Proposition 4 (noting the budget constraint is ascending by Lemma 2) gives that arg max_{k′} u(c(k′, k, z)) + βE_{z′|z}V₀(k′, z′) is ascending on K. Hence, binary monotonicity can be used to compute an optimal policy k′(k, z) that is monotone in k.

With additional assumptions, one can also use these results to establish that k′ is monotone in z, although doing so is more complicated. Identical arguments to the above give the optimal choice correspondence as ascending in z if E_{z′|z}V₀(k′, z′) has increasing differences in k′, z. By part (g), this will hold as long as z′ is increasing in z and V₀(k, z) has increasing differences in k, z. To ensure this is the case at each step of the Bellman update (assuming the initial guess has increasing differences), one can use part (h) if the graph of the choice correspondence is a lattice and u ∘ c + βE_{z′|z}V₀ has increasing differences in k, z.33 A sufficient condition for the first requirement is that every choice is feasible. A sufficient condition for the latter, by part (c), is that ∂²u(c)/(∂k∂z) ≥ 0, which is the same condition required in Hopenhayn and Prescott (1992).

It should be kept in mind that these conditions are not, in general, necessary. For instance, in standard calibrations, the condition ∂²u(c)/(∂k∂z) ≥ 0 will not be satisfied because it requires relative risk aversion less than 1.34 However, just because one cannot prove a policy function is monotone, it does not mean binary monotonicity cannot be used. But it does mean one should check ex post that the true policy function is monotone. For instance, we check using brute force that the policy function for the RBC model is monotone in z (and it is).

Since our method primarily exploits monotonicity of one choice variable, it will be convenient in some cases to construct an indirect utility function over the other choices and then establish that the indirect utility function has increasing differences. For instance, in the RBC model with elastic labor supply, the maximization problem for a given z can be written as max_{k′} U(k, k′) + βE_{z′|z}V(k′, z′) where U(k, k′) = max_{l∈[0,1]} u(c(l, k′, k), l) with c giving consumption conditional on an (l, k, k′) combination. If U has increasing differences in k, k′, then by Proposition 4 optimal k′ will be ascending in k. Lemma 3 part (h) gives one sufficient condition for this, but its conditions can be difficult to guarantee unless every choice is feasible.35 Proposition 5 provides an alternative sufficient condition that may be easier to verify.

33 Part (h) also requires u ∘ c + βE_{z′|z}V₀ to have increasing differences in k′, k and k′, z, conditions that hold as discussed above.
34 After differentiation and some rearranging, one has this condition as equivalent to −u″/u′ ≤ F′/((1 + zF′ − δ)F). For constant relative risk aversion of σ, multiplying both sides by c and manipulating gives this as equivalent to σ ≤ (r/R)(c/y) where r := zF′, R := 1 + r − δ, and y := zF. For positive investment levels, c/y < 1, and so this generally requires σ < 1.
35 To see why the conditions can be difficult to guarantee, consider a parameterization where zF(k, l) + (1 − δ)k = l + k. Then the budget constraint graph, Γ̄ = {(l, k, k′) ∈ R³₊ | 0 ≤ l + k − k′}, is not a lattice when equipped with the standard ordering: Both (1, 0, 1) and (0, 1, 1) are in Γ̄, but their meet (0, 0, 1) is not. Mirman and Ruble (2008) discuss many of the issues and provide some ways of producing lattices that involve non-standard orderings. According to Chen, Hu, and He (2013), "relaxing the lattice requirement has been a significant challenge" (p. 1166).
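As the text suggests, monotonicity should be verified ex post when it cannot be proven. A minimal brute-force check, with a hypothetical policy stored as rows of argmax indices (one row per capital state, one column per shock state), might look like:

```python
def is_monotone(seq):
    """True if the sequence of argmax indices is nondecreasing."""
    return all(a <= b for a, b in zip(seq, seq[1:]))

def policy_monotone(g):
    """Ex post monotonicity check for a computed policy g[i][j].

    g[i][j] holds the argmax index at state (k_i, z_j); we verify
    monotonicity in k (down each column) and in z (across each row)
    by brute force, as done for the RBC model in the text.
    """
    in_k = all(is_monotone([row[j] for row in g]) for j in range(len(g[0])))
    in_z = all(is_monotone(row) for row in g)
    return in_k, in_z
```

If either flag comes back False, binary monotonicity is not guaranteed to deliver an optimal policy along that dimension and a fallback (e.g., the sorting variant) is warranted.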


Proposition 5. Let S ⊂ R² be open and convex. Let f : S → R be defined by f(i, i′) = max_{x∈X} u(g(x, i, i′), x) where u is differentiable, increasing, and concave in its first argument and X is arbitrary. Then f has increasing differences on S if

1. f is well-defined and C¹ in i on the closure of S, and

2. for any optimal policy x* and any (i, i′) ∈ S, g₂(x*(i, i′), i, i′) exists, is positive, and is increasing in i′, and g(x*(i, i′), i, i′) is decreasing in i′.

To apply this result to the RBC model with elastic labor supply, write c(l, k, k′) = max{0, −k′ + zF(k, l) + (1 − δ)k}. If U is differentiable and solutions are interior—i.e., l* ∈ (0, 1) and c(l*, k, k′) > 0—then c₂(l*, k, k′) = zF_k(k, l*) + 1 − δ exists, is positive, and is increasing (weakly) in k′. If in addition consumption is a normal good, so that c(l*, k, k′) is decreasing in k′, then Proposition 5 gives U as having increasing differences.

To establish monotonicity in the RBC model, Lemma 3 part (f) was used to show the composition u(c(i, i′)) had increasing differences. For this to be the case, c had to have increasing differences, be increasing in i, and be decreasing in i′. This last assumption—that c is decreasing in i′—proves to be too strong in the Arellano (2008) model and for solving the sorted problem (14).36 E.g., consumption in the Arellano (2008) model is not necessarily decreasing in b′ because q(b′, y)b′ is generally not monotone (p. 699). Similarly, in (14), there is no guarantee w̃ is increasing. Nevertheless, monotonicity can still be established in these cases provided that continuation utility is increasing. This claim is formalized in the following proposition.

Proposition 6. Let I, I′ ⊂ R. Define G(i) = arg max_{i′∈I′, c(i,i′)≥0} u(c(i, i′)) + W(i′) where u is differentiable, increasing, and concave. If c is increasing in i and has increasing differences and W is increasing in i′, then G is ascending (on i such that an optimal choice exists). If, in addition, c is strictly increasing in i and W is strictly increasing in i′, or if c has strictly increasing differences, then G is strongly ascending.

Because one can show continuation utility in the Arellano (2008) model is increasing, Proposition 6 gives that the policy function b′(b, y) is monotone in b. Even if it were not, this proposition shows binary monotonicity with sorting works, and—since the Arellano (2008) model has additive separability between states and controls—the sorting algorithm could be used.

Our goal in this section has been to establish sufficient conditions that can be verified easily and solved with our computational method, not to establish a comprehensive list. Milgrom and Shannon (1994) provide necessary and sufficient conditions for optimal policies to be monotone. These relax the supermodularity and increasing differences assumptions of Topkis (1978) to what they term quasisupermodularity and a single crossing property. Hopenhayn and Prescott (1992) apply the Topkis (1978) results in a dynamic programming context, characterizing the behavior

36 The requirement in Lemma 3 part (f) that c be decreasing in i′ is necessary. To see this, take u(x) = x^{1/2} and c(i, i′) = i + i′. Then this satisfies all the conditions of part (f) except the requirement that c be decreasing in i′. Because the cross partial of u ∘ c is −(i + i′)^{−3/2}/4 < 0, u ∘ c has strictly decreasing differences.


of policy functions as well as invariant distributions in a general framework. Quah (2007) complements the results of these papers by using assumptions of concavity and convexity to relax lattice requirements. The large literature on turnpike theorems, e.g., Majumdar and Zilcha (1987), Mitra and Nyarko (1991), and Joshi (1997), has monotonicity results of a type we have not exploited but that could prove useful. E.g., Theorem 4 of Majumdar and Zilcha (1987) gives that, under certain conditions, k′(k, z) ≥ k′₀(k, z) where k′₀ is the optimal policy from the next period. Hence, information from a previous capital policy solution could provide a lower bound on the search space. More sufficient conditions can be found in Stokey and Lucas Jr. (1989), Puterman (1994), Topkis (1998), Athey (2002), Smith and McCardle (2002), Mirman and Ruble (2008), Quah and Strulovici (2009), Strulovici and Weber (2010), and Jiang and Powell (2015).

7  Conclusion

We have shown binary monotonicity is a powerful grid search technique. Without any assumption of concavity, the algorithm delivers O(n log n) performance and is an order of magnitude faster than simple monotonicity for commonly-used grid sizes in the Arellano (2008) and RBC models. Moreover, when paired with binary concavity, binary monotonicity delivers O(n) performance. While simple monotonicity with simple concavity was faster for the RBC model, binary monotonicity with binary concavity offers guaranteed performance.

Binary monotonicity has also proven useful in two extensions. In the first extension, a modification to exploit monotonicity in two states delivered performance that was several times better than exploiting monotonicity in one state only and was O(n₁n₂) for a class of optimal policies. In the second extension, a sorting algorithm allowed binary monotonicity to be used in a class of problems with non-monotone policies. This algorithm delivered O(n log n) performance and, in a combination of the Arellano (2008) and RBC models, proved to be several times more efficient than the best applicable grid search methods. Binary monotonicity also proved useful in the context of a continuous choice space, allowing a global maximization method to be used at relatively low cost.

For concave problems, a myriad of solution techniques exist. Binary monotonicity complements these methods by providing a simple and—relative to existing grid search methods—efficient solution technique. In contrast, few computational methods are available for non-concave problems like those arising naturally in discrete choice models. Binary monotonicity and its variants should greatly facilitate computation in this challenging class of problems.

References

C. Arellano. Default risk and income fluctuations in emerging economies. American Economic Review, 98(3):690–712, 2008.

S. B. Aruoba, J. Fernández-Villaverde, and J. F. Rubio-Ramírez. Comparing solution methods for dynamic general equilibrium economies. Journal of Economic Dynamics and Control, 30(12):2477–2508, 2006.

S. Athey. Monotone comparative statics under uncertainty. The Quarterly Journal of Economics, 117(1):187–223, 2002.

K. B. Athreya, X. S. Tam, and E. R. Young. A quantitative theory of information and unsecured credit. American Economic Journal: Macroeconomics, 4(3):153–183, 2012.

Y. Bai and J. Zhang. Financial integration and international risk sharing. Journal of International Economics, 86(1):17–32, 2012.

Y. Cai and K. L. Judd. Dynamic programming with shape-preserving rational spline Hermite interpolation. Economics Letters, 117(1):161–164, 2012.

C. D. Carroll. The method of endogenous gridpoints for solving dynamic stochastic optimization problems. Economics Letters, 91(3):312–320, 2006.

X. Chen, P. Hu, and S. He. Technical note—Preservation of supermodularity in parametric optimization problems with nonlattice structures. Operations Research, 61(5):1166–1173, 2013.

L. J. Christiano. Solving the stochastic growth model by linear-quadratic approximation and by value-function iteration. Journal of Business & Economic Statistics, 8(1):23–26, 1990.

G. Fella. A generalized endogenous grid method for non-smooth and non-concave problems. Review of Economic Dynamics, 17(2):329–344, 2014.

J. C. Hatchondo, L. Martinez, and H. Sapriza. Quantitative properties of sovereign default models: Solution methods matter. Review of Economic Dynamics, 13(4):919–933, 2010.

B. Heer and A. Maußner. Dynamic General Equilibrium Modeling: Computational Methods and Applications. Springer, Berlin, Germany, 2005.

H. A. Hopenhayn and E. C. Prescott. Stochastic monotonicity and stationary distributions for dynamic economies. Econometrica, 60(6):1387–1406, 1992.

A. İmrohoroğlu, S. İmrohoroğlu, and D. H. Joines. A numerical algorithm for solving models with incomplete markets. International Journal of High Performance Computing Applications, 7(3):212–230, 1993.

F. Iskhakov, T. H. Jørgensen, J. Rust, and B. Schjerning. The endogenous grid method for discrete-continuous dynamic choice models with (or without) taste shocks. Quantitative Economics, forthcoming, 2016.

D. R. Jiang and W. B. Powell. An approximate dynamic programming algorithm for monotone value functions. Operations Research, 63(6):1489–1511, 2015.

S. Joshi. Turnpike theorems in nonconvex nonstationary environments. International Economic Review, 38(1):225–248, 1997.

K. L. Judd. Numerical Methods in Economics. Massachusetts Institute of Technology, Cambridge, Massachusetts, 1998.

M. Majumdar and I. Zilcha. Optimal growth in a stochastic environment: Some sensitivity and turnpike results. Journal of Economic Theory, 43(1):116–133, 1987.

P. Milgrom and I. Segal. Envelope theorems for arbitrary choice sets. Econometrica, 70(2):583–601, 2002.

P. Milgrom and C. Shannon. Monotone comparative statics. Econometrica, 62(1):157–180, 1994.

L. J. Mirman and R. Ruble. Lattice theory and the consumer's problem. Mathematics of Operations Research, 33(2):301–314, 2008.

T. Mitra and Y. Nyarko. On the existence of optimal processes in non-stationary environments. Journal of Economics, 53(3):245–270, 1991.

K. P. Papadaki and W. B. Powell. Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem. European Journal of Operational Research, 142(1):108–127, 2002.

K. P. Papadaki and W. B. Powell. A discrete online monotone estimation algorithm. Working Paper LSEOR 03.73, Operational Research Working Papers, 2003.

W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes (Fortran Version). Cambridge University Press, Cambridge, MA, 1989.

M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York, 1994.

J. K. Quah. The comparative statics of constrained optimization problems. Econometrica, 75(2):401–431, 2007.

J. K. Quah and B. Strulovici. Comparative statics, informativeness, and the interval dominance order. Econometrica, 77(6):1949–1992, 2009.

K. Schmedders and K. Judd. Handbook of Computational Economics, volume 3. Elsevier Science, 2014.

D. Simchi-Levi, X. Chen, and J. Bramel. The Logic of Logistics: Theory, Algorithms, and Applications for Logistics Management (Third Edition). Springer, New York, 2014.

J. E. Smith and K. F. McCardle. Structural properties of stochastic dynamic programs. Operations Research, 50(5):796–809, 2002.

N. L. Stokey and R. E. Lucas Jr. Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, Massachusetts and London, England, 1989.

B. H. Strulovici and T. A. Weber. Generalized monotonicity analysis. Economic Theory, 43(3):377–406, 2010.

O. Tange. GNU Parallel - the command-line power tool. ;login: The USENIX Magazine, 36(1):42–47, 2011.

G. Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20(2):177–181, 1986.

D. M. Topkis. Minimizing a submodular function on a lattice. Operations Research, 26(2):305–321, 1978.

D. M. Topkis. Supermodularity and Complementarity. Princeton University Press, Princeton, N.J., 1998.

A  Algorithms

This appendix contains a non-recursive version of binary monotonicity, our implementation of binary concavity, and our precise implementations of the continuous choice set algorithms.

A.1  A non-recursive formulation of binary monotonicity

Rather than directly implementing binary monotonicity using recursion, it can be computationally more efficient to eliminate the recursion. The following algorithm does this:

1. Initialization: Compute g(1) maximizing over {1, . . . , n′} and compute g(n) maximizing over {g(1), . . . , n′}. Allocate an array of size (⌈log₂(n − 1)⌉ + 1) × 2 that will hold the list below. Fill the first row of the array with (l₁, u₁) where l₁ := 1, u₁ := n. Define k := 1.

2. Expand the list from k rows to k̃ rows by appending the rows

   (l_{k+1}, u_{k+1}) := (l_k, ⌊(l_k + u_k)/2⌋),
   (l_{k+2}, u_{k+2}) := (l_k, ⌊(l_k + u_{k+1})/2⌋),
   . . .,
   (l_{k̃}, u_{k̃}) := (l_k, ⌊(l_k + u_{k̃−1})/2⌋),

   stopping when u_{k̃} ≤ l_{k̃} + 1 (corresponding to step 2(a) in the algorithm). At this step, k ≥ 1 and g(l_j) and g(u_j) are known for all j ≤ k. For j = k + 1, . . . , k̃, compute g(u_j) by maximizing over the interval g(l_{j−1}), . . . , g(u_{j−1}). Taking each row as specifying an interval and subdividing it into two intervals, the following row gives the leftmost subinterval. At this point, g is known for every l and u in the list. Moreover, the interval corresponding to k̃ has exactly two elements, l_{k̃} and u_{k̃}, which implies the policy g(l_{k̃}), . . . , g(u_{k̃}) is known. Define k := k̃. Go to step 3.

3. If k = 1, STOP. If u_k = u_{k−1}, then eliminate the last row of the list by setting k := k − 1 and repeat this step. Otherwise, go to the next step.

4. If here, then u_k < u_{k−1}. Set (l_k, u_k) := (u_k, u_{k−1}). This step corresponds to moving to the right subinterval of the interval corresponding to k − 1. Go to step 2.

To better understand the logic of the algorithm, consider how it works when n = 5. The list's progression is traced out in (21). A marked number i means that g(i) is solved for in that step. The initialization step solves for the left and right bound. Step 2 repeatedly partitions [1 5], which one should think of as {1, . . . , 5}, until it has only two elements, in this case corresponding to {1, 2}. Each time it partitions, it solves for exactly one value of g, namely, the midpoint of the smallest current interval. So, for instance, when step 2 is first encountered, the list is just [1 5], and so the smallest interval is {1, . . . , 5}. It then solves for g(3) and makes the smallest interval {1, 2, 3}. It then bisects again, solving for g(2) and making the smallest interval {1, 2}. Step 2 always moves "left" when it bisects. For instance, when it divides {1, . . . , 5} in half, it moves to {1, 2, 3} rather than {3, 4, 5}.

    [1* 5*]  ⇒  [1 5]   ⇒  [1 5]  ⇒  [1 5]  ⇒  [1 5]  ⇒  [1 5]   ⇒  [1 5]  ⇒  [1 5]     (21)
                [1 3*]      [1 3]      [1 3]      [3 5]      [3 5]       [3 5]
                [1 2*]      [2 3]                            [3 4*]      [4 5]
     Step 1     Step 2      Step 4     Step 3     Step 4     Step 2      Step 4     Step 3

(An asterisk marks each entry i whose policy g(i) is solved for in that step.)

While step 2 always checks the left subinterval, step 4 checks the right subinterval. For instance, the first time step 4 is encountered, it replaces the bottom row [1 2] with [2 3]. This moves the algorithm from solving for the optimal policy in {1, 2}—the left partition of {1, 2, 3}—to solving for the optimal policy in {2, 3}—the right partition. The next time it is encountered, it moves the bottom row from [1 3] to [3 5]. Note that the algorithm works from left to right: Step 2 checks left subintervals, but when there are no left subintervals remaining, step 4 moves to the right. Step 3 determines when it is time to move "up" and to the right. For instance, after the first time step 4 is encountered in (21), g is known for every value in {1, . . . , 3}, but not every value in {3, . . . , 5}. Step 3 signals it is time to move from {1, . . . , 3} to {3, . . . , 5} by deleting the fine partition {2, 3}. Step 4, which always comes after step 3, then moves to the right by replacing [1 3] with [3 5]. When there are no more right partitions, the algorithm terminates.
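For readers who prefer code to steps, here is one way to sketch the non-recursive idea in Python, using an explicit stack of (l, u) rows in place of the array bookkeeping above; pi and the brute-force inner search are illustrative stand-ins, not the paper's Fortran:

```python
def binary_monotone_policy(pi, n, n_choice):
    """Solve g(i) = argmax_{i'} pi(i, i') for all i, exploiting monotonicity.

    Non-recursive divide and conquer: an explicit stack of intervals whose
    endpoint policies are known replaces recursion.  The inner maximization
    is plain brute force restricted to the bracketed range.
    """
    def solve(i, lo, hi):
        # brute-force search over {lo, ..., hi} only
        return max(range(lo, hi + 1), key=lambda ip: pi(i, ip))

    g = [None] * n
    g[0] = solve(0, 0, n_choice - 1)
    g[n - 1] = solve(n - 1, g[0], n_choice - 1)
    stack = [(0, n - 1)]                  # intervals with both endpoints solved
    while stack:
        l, u = stack.pop()
        if u - l <= 1:
            continue
        m = (l + u) // 2
        g[m] = solve(m, g[l], g[u])       # monotonicity brackets the search
        stack.append((l, m))
        stack.append((m, u))
    return g
```

The stack discipline visits the same midpoints as steps 1-4 above; only the traversal order of subintervals differs, which does not affect the result.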

A.2  Binary concavity

Below is our implementation of Heer and Maußner (2005)'s algorithm for solving max_{i′ ∈ {a,...,b}} π(i, i′). Throughout, n refers to b − a + 1.

1. Initialization: If n = 1, compute the maximum, π(i, a), and STOP. Otherwise, set the flags 1_a = 0 and 1_b = 0. These flags indicate whether the values of π(i, a) and π(i, b) are known, respectively.

2. If n > 2, go to 3. Otherwise, n = 2. Compute π(i, a) if 1_a = 0 and compute π(i, b) if 1_b = 0. The optimum is the best of a, b.

3. If n > 3, go to 4. Otherwise, n = 3. If max{1_a, 1_b} = 0, compute π(i, a) and set 1_a = 1. Define m = (a + b)/2 and compute π(i, m).

   (a) If 1_a = 1, check whether π(i, a) > π(i, m). If so, the maximum is a. Otherwise, the maximum is either m or b; redefine a = m, set 1_a = 1, and go to 2.

   (b) If 1_b = 1, check whether π(i, b) > π(i, m). If so, the maximum is b. Otherwise, the maximum is either a or m; redefine b = m, set 1_b = 1, and go to 2.

4. Here, n ≥ 4. Define m = ⌊(a + b)/2⌋ and compute π(i, m) and π(i, m + 1). If π(i, m) < π(i, m + 1), a maximum is in {m + 1, . . . , b}; redefine a = m + 1, set 1_a = 1, and go to 2. Otherwise, a maximum is in {a, . . . , m}; redefine b = m, set 1_b = 1, and go to 2.37
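A Python transcription of these steps might look as follows; the cached endpoint values play the role of the flags 1_a and 1_b, and pi_i is a hypothetical single-state objective:

```python
def binary_concave_max(pi_i, a, b):
    """Maximize pi_i (concave in i') over the integers {a, ..., b}.

    Cached endpoint values va, vb play the role of the flags 1_a, 1_b,
    so each bisection costs at most two new function evaluations.
    Returns (argmax, max value).
    """
    va = vb = None
    while True:
        n = b - a + 1
        if n == 1:
            return a, (va if va is not None else pi_i(a))
        if n == 2:
            va = pi_i(a) if va is None else va
            vb = pi_i(b) if vb is None else vb
            return (a, va) if va >= vb else (b, vb)
        if n == 3:
            if va is None and vb is None:
                va = pi_i(a)
            m = (a + b) // 2
            vm = pi_i(m)
            if va is not None:
                if va > vm:
                    return a, va
                a, va = m, vm          # maximum is m or b
            else:
                if vb > vm:
                    return b, vb
                b, vb = m, vm          # maximum is a or m
            continue
        m = (a + b) // 2               # n >= 4: compare the two middle points
        vm, vm1 = pi_i(m), pi_i(m + 1)
        if vm < vm1:
            a, va = m + 1, vm1         # maximum is in {m+1, ..., b}
        else:
            b, vb = m, vm              # maximum is in {a, ..., m}
```

Because each loop pass roughly halves the interval at the cost of at most two evaluations, the total evaluation count is logarithmic in n, in line with the O(log n) behavior described in the main text.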

A.3  Continuous algorithms

The continuous optimization algorithms are as follows. Let B = {b₁, . . . , bₙ}. At any given maximization step, the state is (i̲, i, ī) with i̲ < i < ī.38 The goal is to solve for g(b_i) where g(b_i̲) and g(b_ī) are potentially already known. If i̲ (ī) is in {1, . . . , n}, then g(b_i̲) (g(b_ī)) is known. When i̲ (ī) is not in {1, . . . , n}, the policy is unknown and treated as if g(b_i̲) = b₁ (g(b_ī) = bₙ). Define h : {1, . . . , n} → {1, . . . , n} as follows: If g(b_i) is on the grid, then g(b_i) = b_{h(i)}; otherwise, g(b_i) ∈ (b_{h(i)}, b_{h(i)+1}). The algorithms first find bounds on the grid search. Let these be given in terms of indices, with the lower and upper bounds on the grid search denoted δ̲ and δ̄, respectively. The algorithms set these as follows:

1. If g(b_i̲) (g(b_ī)) is on the grid, then δ̲ = h(i̲) (δ̄ = h(ī)).

37 Note that in the case of indifference, π(i, m) = π(i, m + 1), the algorithm proceeds to {a, . . . , m}.
38 For binary monotonicity, (i̲, i, ī) corresponds to (i̲, m, ī) at step 2 of the algorithm.


2. If g(b_i̲) (g(b_ī)) is not on the grid, then δ̲ = h(i̲) + 1 (δ̄ = h(ī)).39

Brute force is then used to find the best choice on {δ̲, . . . , δ̄}. Let i′* denote the best index found via grid search. If δ̄ < δ̲, which can happen if g(b_i̲) and g(b_ī) are in the same interval and not on the grid, define i′* := δ̄ (this only matters for the local search). The algorithms also find bounds on the continuous search. Let these also be given in terms of indices so that when the bounds are (κ̲, κ̄) the intervals checked will be {(b_i, b_{i+1}) | i ≥ κ̲, i + 1 ≤ κ̄}. For the global algorithm, these are set as follows:

1. If g(b_i̲) (g(b_ī)) is on the grid, then κ̲ = h(i̲) (κ̄ = h(ī)).

2. If g(b_i̲) (g(b_ī)) is not on the grid, then κ̲ = h(i̲) (κ̄ = h(ī) + 1).

For the local algorithm, they are instead set as κ̲ = max{1, i′* − 1} and κ̄ = min{n, i′* + 1}. With the bounds on the intervals defined, the potential optimum within each interval is then found as described in the main text.
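The bound-setting rules can be summarized compactly. In the sketch below, each bracketing policy is described by its interval index h and an on-grid flag; all names are ours, not the paper's code:

```python
def search_bounds(h_lo, lo_on_grid, h_hi, hi_on_grid):
    """Grid-search bounds and global continuous-search interval bounds.

    (h_lo, lo_on_grid) summarize the lower bracketing policy: its interval
    index h and whether it sits exactly on the grid; similarly for the
    upper bracket.  Returns ((d_lo, d_hi), (k_lo, k_hi)).
    """
    d_lo = h_lo if lo_on_grid else h_lo + 1   # skip b_h when g is strictly inside it
    d_hi = h_hi                               # upper grid bound in both cases
    k_lo = h_lo                               # first interval to check
    k_hi = h_hi if hi_on_grid else h_hi + 1   # last interval endpoint
    return (d_lo, d_hi), (k_lo, k_hi)

def local_interval_bounds(i_star, n):
    """Local heuristic: only the intervals adjacent to the grid optimum."""
    return max(1, i_star - 1), min(n, i_star + 1)
```

This makes the trade-off discussed in the main text visible: the global rule checks every interval between the brackets, while the local rule checks at most two around the grid optimum.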

B Additional model details and calibrations

This appendix gives additional details on the models and lists the calibrations.

B.1 The Arellano (2008) model

In the Arellano (2008) model, a sovereign has output y following a Markov process, chooses discount bond holdings b′ from a set B that contains zero, and makes a default decision d. If the sovereign defaults, output falls to h(y) ≤ y. In equilibrium, the discount bond price q(b′, y) satisfies q(b′, y) = (1 + r)^{−1} E_{y′|y}(1 − d(b′, y′)) (where r is an exogenous risk-free rate). The sovereign's problem reduces to

V(b, y) = max_{d∈{0,1}} d V^d(y) + (1 − d) V^n(b, y),   (22)

where the value of repaying is

V^n(b, y) = max_{c≥0, b′∈B} u(c) + β E_{y′|y} V(b′, y′)   (23)
    s.t. c + q(b′, y) b′ = b + y

and the value of defaulting is

V^d(y) = u(h(y)) + β E_{y′|y} (θ V^d(y′) + (1 − θ) V^n(0, y′)).   (24)

The parameter θ governs how long the sovereign remains in default.
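The way the default rule and the bond price fit together can be illustrated with a minimal sketch. It assumes current guesses V_n (on the B × Y grid) and V_d (on Y), a Markov matrix P for y, and a risk-free rate r; the function name and array layout are our own, not the authors' code.

```python
import numpy as np

# Sketch of one update of the default rule and bond price in B.1:
#   d(b', y') = 1{ V^d(y') > V^n(b', y') }
#   q(b', y)  = E_{y'|y}[1 - d(b', y')] / (1 + r)
# V_n has shape (nB, nY); V_d has shape (nY,); P[y, y'] is the transition.

def default_and_price(V_n, V_d, P, r):
    d = (V_d[None, :] > V_n).astype(float)   # (nB, nY): default indicator
    q = (1.0 - d) @ P.T / (1.0 + r)          # (nB, nY): price given today's y
    return d, q
```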

³⁹ Setting δ̲ = h(i̲) + 1 is justified because h(i̲) is such that b_{h(i̲)} < g(b_{i̲}), so by monotonicity there is no reason to check it. A similar argument applies to setting δ̄ = h(ī).

B.2 The Arellano (2008) and RBC combined model

This section gives an extension of the Arellano (2008) model that has endogenous output and capital accumulation like the RBC model. The model is similar to Bai and Zhang (2012) but lacks capital adjustment costs. A sovereign has total factor productivity a that is Markov and chooses bonds b′ and capital k′ from sets B and K, respectively. (We also assume 0 ∈ B.) If the sovereign defaults, output ak^α falls by a fraction κ. In equilibrium, the discount bond price q(b′, k′, a) satisfies q(b′, k′, a) = (1 + r)^{−1} E_{a′|a}(1 − d(b′, k′, a′)) (where r is an exogenous risk-free rate). The sovereign's problem is to solve

V(b, k, a) = max_{d∈{0,1}} d V^d(k, a) + (1 − d) V^n(b, k, a),   (25)

where the value of repaying is

V^n(b, k, a) = max_{c≥0, b′∈B, k′∈K} u(c) + β E_{a′|a} V(b′, k′, a′)   (26)
    s.t. c + q(b′, k′, a) b′ + k′ = a k^α + (1 − δ)k + b

and the value of defaulting is

V^d(k, a) = max_{c≥0, k′∈K} u(c) + β E_{a′|a} (θ V^d(k′, a′) + (1 − θ) V^n(0, k′, a′))   (27)
    s.t. c + k′ = (1 − κ) a k^α + (1 − δ)k.

B.3 Calibrations

For the RBC model, we take β = .99, δ = .025, and F(k, l) = k^{.36}. We discretize the TFP process log z = .95 log z_{−1} + .007ε with ε ∼ N(0, 1) using Tauchen (1986)'s method with 21 points (unless explicitly stated otherwise) spaced evenly over ±3 unconditional standard deviations. The capital grid is linearly spaced over ±20% of the steady state capital stock.

For the Arellano (2008) model, we adopt the same calibration as the original Arellano (2008) paper and refer the reader to that paper for details. For n bond grid points, 70% are linearly spaced from −.35 to 0 and the rest from 0 to .15. We discretize the exogenous output process log y = .945 log y_{−1} + .025ε with ε ∼ N(0, 1) using Tauchen (1986)'s method with 21 points spaced evenly over ±3 unconditional standard deviations.

For the combined Arellano (2008) and RBC model, we use capital grids that are linearly spaced from .5k* to 1.5k*, where k* is the steady state capital in the closed economy. The capital share and depreciation are as in the RBC calibration. We scale the bond grid from Arellano (2008) by the steady state output (k*)^{.36}. Productivity follows an AR(1) with a persistence parameter of .945 and a standard deviation of .025, and we use 21 points spaced evenly over ±3 unconditional standard deviations (as before). The default cost κ is set to .05, which puts the risky borrowing region in equilibrium roughly in the middle of the bond grid for k close to k*.
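All of the calibrations above discretize an AR(1) by Tauchen (1986)'s method, with points spaced evenly over ±3 unconditional standard deviations. A standard, self-contained sketch of that method (not taken from the authors' code; the function name is ours):

```python
import math
import numpy as np

# Tauchen (1986) discretization of log x' = rho * log x + sigma * eps,
# eps ~ N(0, 1), with n points over +/- m unconditional std. deviations.

def tauchen(rho, sigma, n=21, m=3.0):
    std_uncond = sigma / math.sqrt(1.0 - rho**2)
    grid = np.linspace(-m * std_uncond, m * std_uncond, n)
    step = grid[1] - grid[0]
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # N(0,1) cdf
    P = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            lo = (grid[j] - rho * grid[i] - step / 2) / sigma
            hi = (grid[j] - rho * grid[i] + step / 2) / sigma
            if j == 0:
                P[i, j] = Phi(hi)            # mass below the first midpoint
            elif j == n - 1:
                P[i, j] = 1.0 - Phi(lo)      # mass above the last midpoint
            else:
                P[i, j] = Phi(hi) - Phi(lo)
    return grid, P
```

Calling tauchen(0.945, 0.025) reproduces the grid layout used for the output and productivity processes above.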


C Omitted proofs and lemmas [Not for Publication]

This section contains omitted proofs and lemmas. They are broken into three subsections. Section C.1 gives the proofs for the sufficient conditions for monotonicity. Section C.2 examines the properties of the binary monotonicity algorithm and, to a lesser extent, the binary concavity algorithm. Section C.3 states in what sense (1) and (4) are equivalent and proves their equivalence.

C.1 Sufficient conditions

In this subsection, we will at times refer to lattices. Throughout, we will only use lattices that consist of subsets of Euclidean space with the component-wise ordering, i.e., x ≤ y for x, y ∈ R^n if x_j ≤ y_j for all j, where z_j denotes the jth component of z. In this context, the join operation ∨ gives the component-wise maximum, namely, x ∨ y = (max{x_1, y_1}, ..., max{x_n, y_n}). Likewise, the meet operation ∧ gives the component-wise minimum, x ∧ y = (min{x_1, y_1}, ..., min{x_n, y_n}). A lattice consists of a set X ⊂ R^n with the component-wise ordering such that x, y ∈ X implies x ∨ y, x ∧ y ∈ X. Note that if X ⊂ R, it constitutes a lattice with our ordering: x, y ∈ X implies min{x, y}, max{x, y} ∈ X. A function f : X → R, where X is a lattice, is said to be supermodular (submodular) if f(x) + f(y) ≤ (≥) f(x ∧ y) + f(x ∨ y) for all x, y ∈ X. If the inequality is strict for all x and y that cannot be ordered, then the function is strictly supermodular (submodular).

For part of Lemma 3, we will need a slightly broader definition of increasing differences than what was given in the main text. We define it now.

Definition 4. Let X ⊂ R^n. Use the notation (x_{−ij}, y_i, y_j) to denote the vector x but with the ith and jth components replaced with the ith and jth components of y, respectively. A function f : X → R has increasing differences on X if for all i, j with i ≠ j and for all y_i, y_j having x_i ≤ y_i and x_j ≤ y_j (such that (x_{−ij}, y_i, x_j), (x_{−ij}, x_i, x_j), (x_{−ij}, y_i, y_j), (x_{−ij}, x_i, y_j) ∈ X) one has

f(x_{−ij}, y_i, x_j) − f(x_{−ij}, x_i, x_j) ≤ f(x_{−ij}, y_i, y_j) − f(x_{−ij}, x_i, y_j).

The function f has decreasing differences if

f(x_{−ij}, y_i, x_j) − f(x_{−ij}, x_i, x_j) ≥ f(x_{−ij}, y_i, y_j) − f(x_{−ij}, x_i, y_j).

The differences are strict if the inequality holds strictly (whenever x_i < y_i and x_j < y_j). Note that this is equivalent to having increasing differences—as defined in the main text—for all pairs of components.
Note also that any univariate function has increasing differences because the condition requires i ≠ j. We will also need to appeal to the following partial equivalence between increasing differences and supermodularity.

Lemma 4. Suppose X ⊂ R^n is a lattice. If f is (strictly) supermodular on X, then f has (strictly) increasing differences on X. If X = ∏_{i=1}^n X_i with X_i ⊂ R for all i and f has (strictly) increasing differences on X, then f is (strictly) supermodular on X.


Proof. Let × denote the direct product (a generalization of the Cartesian product).⁴⁰ Then Theorem 2.6.1 of Topkis (1998) gives that if X_α is a lattice for each α in a set A, X is a sublattice of ×_{α∈A} X_α, and f(x) is (strictly) supermodular on X, then f(x) has (strictly) increasing differences on X. Taking A = {1, ..., n} and X_α = R for all α ∈ A gives ×_{α∈A} X_α = R^n (which is Example 2.2.1, part (c), of Topkis, 1998, p. 12). Consequently, X is a sublattice of R^n, and the theorem applies to show that the (strict) supermodularity of f on X implies (strictly) increasing differences of f on X.

Corollary 2.6.1 of Topkis (1998) gives that if X_i is a chain (by definition, a partially ordered set that contains no unordered pairs of elements) for i = 1, ..., n and f has (strictly) increasing differences on ×_{i=1}^n X_i, then f is (strictly) supermodular on ×_{i=1}^n X_i. Since X_i ⊂ R, it is a chain, and the direct product ×_{i=1}^n X_i is just the Cartesian product ∏_{i=1}^n X_i. Hence, (strictly) increasing differences on X implies (strict) supermodularity on X.
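When applying Lemma 4 in practice, increasing differences on a finite product grid can be verified by brute force. The following checker is purely illustrative and not part of the paper:

```python
import itertools

# Brute-force check of Definition 4's increasing-differences condition on
# a finite product grid X = X_1 x ... x X_n (the setting of Lemma 4).

def has_increasing_differences(f, grids, tol=1e-12):
    """f maps a tuple to a real; grids is a list of sorted 1-D grids."""
    n = len(grids)
    for i, j in itertools.combinations(range(n), 2):
        for x in itertools.product(*grids):
            for yi in grids[i]:
                for yj in grids[j]:
                    if yi <= x[i] or yj <= x[j]:
                        continue  # only strict raises are informative
                    hi = list(x); hi[i] = yi       # raise component i
                    hj = list(x); hj[j] = yj       # raise component j
                    hij = list(hi); hij[j] = yj    # raise both
                    # Require f(yi, xj) - f(xi, xj) <= f(yi, yj) - f(xi, yj).
                    if f(tuple(hi)) - f(x) > f(tuple(hij)) - f(tuple(hj)) + tol:
                        return False
    return True
```

For instance, f(x) = x_1 x_2 passes the check (it is supermodular), while f(x) = −x_1 x_2 fails it.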

The rest of this subsection gives the omitted proofs and results.

Proof of Proposition 4

Proof. Because I′ ⊂ R, it is a lattice. Additionally, −π(i, i′) has decreasing differences and is trivially submodular in i′ (as well as supermodular). So Theorem 6.1 of Topkis (1978) gives that arg min_{i′∈I′(i)} −π(i, i′) is ascending on the set of i such that a minimizer exists. Theorem 6.3 strengthens this to strongly ascending when −π(i, i′) has strictly decreasing differences. Noting G(i) := arg max_{i′∈I′(i)} π(i, i′) = arg min_{i′∈I′(i)} −π(i, i′) then gives the result.

Lemma 5. Let I, I′ ⊂ R and I′ : I → P(I′). Then I′ is ascending on I if and only if the graph of I′ is a lattice.

Proof. Let Γ denote the graph of I′. Let i1, i2 ∈ I and without loss of generality take i1 ≤ i2. Then for any a, b ∈ I′, (i1, a), (i2, b) ∈ Γ if and only if a ∈ I′(i1), b ∈ I′(i2). So, consider any (i1, i′1), (i2, i′2) ∈ I × I′. Then

(i1, i′1) ∧ (i2, i′2), (i1, i′1) ∨ (i2, i′2) ∈ Γ
 ⇔ (i1, i′1 ∧ i′2), (i2, i′1 ∨ i′2) ∈ Γ
 ⇔ i′1 ∧ i′2 ∈ I′(i1), i′1 ∨ i′2 ∈ I′(i2)
 ⇔ min{i′1, i′2} ∈ I′(i1), max{i′1, i′2} ∈ I′(i2).   (*)

⁴⁰ Topkis (1998) defines the direct product, which he denotes by ×, in this way: "If X_α is a set for each α in a set A, then the direct product of these sets X_α is the product set ×_{α∈A} X_α = {x = (x_α : α ∈ A) : x_α ∈ X_α for each α ∈ A}" (p. 12). The notation (x_α : α ∈ A) gives a vector that "consists of a component x_α for each α ∈ A" (p. 12). In words, ×_{α∈A} X_α is the set of vectors that can be formed under the restriction that each α component has to lie in X_α.

So, suppose Γ is a lattice. Let i′1 ∈ I′(i1), i′2 ∈ I′(i2) with i1 ≤ i2. We need to show min{i′1, i′2} ∈ I′(i1), max{i′1, i′2} ∈ I′(i2). Since (i1, i′1), (i2, i′2) ∈ Γ and Γ is a lattice, (i1, i′1) ∧ (i2, i′2), (i1, i′1) ∨ (i2, i′2) ∈ Γ. Hence, (*) gives min{i′1, i′2} ∈ I′(i1), max{i′1, i′2} ∈ I′(i2), which shows I′ is ascending. Now, suppose I′ is ascending. Let (i1, i′1), (i2, i′2) ∈ Γ with i1 ≤ i2. So, i′1 ∈ I′(i1), i′2 ∈ I′(i2). Since I′ is ascending, min{i′1, i′2} ∈ I′(i1), max{i′1, i′2} ∈ I′(i2). Hence, (*) gives (i1, i′1) ∧ (i2, i′2), (i1, i′1) ∨ (i2, i′2) ∈ Γ, which shows Γ is a lattice.

Proof of Lemma 2

Proof. To have I′ ascending on I, one needs i1 < i2, i′1 ∈ I′(i1), and i′2 ∈ I′(i2) to imply min{i′1, i′2} ∈ I′(i1) and max{i′1, i′2} ∈ I′(i2). Since I′ is increasing, i′1 ∈ I′(i2) and so max{i′1, i′2} ∈ I′(i2). If i′2 ≥ i′1, then one has min{i′1, i′2} = i′1 ∈ I′(i1). So, take i′2 < i′1. We need to show that i′2 ∈ I′(i1) for i1 < i2 and i′2 < i′1. Pick an arbitrary m and suppress dependence on it. Then

h(i2, i′2) − h(i1, i′2) ≤ h(i2, i′1) − h(i1, i′1)
 ≤ h(i2, i′1)
 ≤ h(i2, i′2),

where the first line follows from increasing differences, the second from i′1 ∈ I′(i1) so that h(i1, i′1) ≥ 0, and the third from h being decreasing in i′. Consequently, −h(i1, i′2) ≤ 0, which gives h(i1, i′2) ≥ 0. Since m was arbitrary, this holds for all m, and so i′2 ∈ I′(i1). Thus, min{i′1, i′2} ∈ I′(i1), and so I′ is ascending on I.

Proof of Lemma 3

Proof. For (a) and (b), we prove (b), which implies (a). Let (i1, i′1), (i2, i′2) ∈ S with i1 < i2 and i′1 < i′2 but otherwise arbitrary. Then

f(i2, i′j) − f(i1, i′j) = p(i2) + q(i′j) + r(i2)s(i′j) − p(i1) − q(i′j) − r(i1)s(i′j)
 = p(i2) − p(i1) + (r(i2) − r(i1))s(i′j).

So, f(i2, i′1) − f(i1, i′1) ≤ f(i2, i′2) − f(i1, i′2) if and only if

p(i2) − p(i1) + (r(i2) − r(i1))s(i′1) ≤ p(i2) − p(i1) + (r(i2) − r(i1))s(i′2)
 ⇔ 0 ≤ (r(i2) − r(i1))(s(i′2) − s(i′1)),

which holds because r and s are either both increasing or both decreasing.

For (c), let (i1, i′1), (i2, i′2) ∈ S with i1 < i2 and i′1 < i′2 but otherwise arbitrary. Then f has increasing differences if and only if

g(i2, i′1) − g(i1, i′1) ≤ g(i2, i′2) − g(i1, i′2)

since g agrees with f on S. Because g is C², this holds if

∫_{[i1,i2]} g_1(θ, i′1) dθ ≤ ∫_{[i1,i2]} g_1(θ, i′2) dθ ⇔ 0 ≤ ∫_{[i1,i2]} (g_1(θ, i′2) − g_1(θ, i′1)) dθ.

Again, by g being C², this holds if

0 ≤ ∫_{[i1,i2]} ∫_{[i′1,i′2]} g_{12}(θ, θ′) dθ′ dθ.

Because L is assumed to be a hypercube containing S and g_{12} ≥ 0 on L, this holds.

For (d), let (i1, i′1), (i2, i′2) ∈ S with i1 < i2 and i′1 < i′2 but otherwise arbitrary. Then f has increasing differences if and only if

Σ_k α_k f_k(i2, i′1) − Σ_k α_k f_k(i1, i′1) ≤ Σ_k α_k f_k(i2, i′2) − Σ_k α_k f_k(i1, i′2)
 ⇔ Σ_k α_k (f_k(i2, i′1) − f_k(i1, i′1)) ≤ Σ_k α_k (f_k(i2, i′2) − f_k(i1, i′2)).
A sufficient condition for this is that, for all k, f_k has increasing differences so that f_k(i2, i′1) − f_k(i1, i′1) ≤ f_k(i2, i′2) − f_k(i1, i′2).

For (e) and (f), let (i1, i′1), (i2, i′2) ∈ S with i1 < i2 and i′1 < i′2 but otherwise arbitrary. The composition h ∘ g has increasing differences on S if and only if

h(g(i2, i′1)) − h(g(i1, i′1)) ≤ h(g(i2, i′2)) − h(g(i1, i′2)).   (28)

Because g is increasing in i, g(i2, i′) − g(i1, i′) ≥ 0. Then because h is C², (28) is equivalent to

∫_0^{g(i2,i′1)−g(i1,i′1)} h′(g(i1, i′1) + θ) dθ ≤ ∫_0^{g(i2,i′2)−g(i1,i′2)} h′(g(i1, i′2) + θ) dθ.

Because g has increasing differences, g(i2, i′1) − g(i1, i′1) ≤ g(i2, i′2) − g(i1, i′2). Hence, this is equivalent to

∫_0^{g(i2,i′1)−g(i1,i′1)} (h′(g(i1, i′1) + θ) − h′(g(i1, i′2) + θ)) dθ ≤ ∫_{g(i2,i′1)−g(i1,i′1)}^{g(i2,i′2)−g(i1,i′2)} h′(g(i1, i′2) + θ) dθ.

Because h′ > 0, the right-hand side is positive. So, a sufficient condition for this to hold is that the left-hand side be negative, which is true if h′(g(i1, i′1) + θ) ≤ h′(g(i1, i′2) + θ) for all positive θ. In (f), g is increasing in its second argument, so g(i1, i′1) + θ ≤ g(i1, i′2) + θ, and, because h″ > 0, this holds. In (g), g is decreasing in its second argument, so g(i1, i′1) + θ ≥ g(i1, i′2) + θ, and, because h″ < 0, this holds. So, h ∘ g has increasing differences.

For (g), let (i1, i′1), (i2, i′2) ∈ S with i1 < i2 and i′1 < i′2 but otherwise arbitrary. Then because


h is increasing in i and because g has increasing differences,

g(h(i2, ε), i′1) − g(h(i1, ε), i′1) ≤ g(h(i2, ε), i′2) − g(h(i1, ε), i′2)

for any ε ∈ E. Integrating,

∫_E (g(h(i2, ε), i′1) − g(h(i1, ε), i′1)) dF(ε) ≤ ∫_E (g(h(i2, ε), i′2) − g(h(i1, ε), i′2)) dF(ε).

From f(i, i′) = ∫_E g(h(i, ε), i′) dF(ε), this says f(i2, i′1) − f(i1, i′1) ≤ f(i2, i′2) − f(i1, i′2), which establishes that f has increasing differences.

For (h), note that the pairwise increasing differences of g in (i, i′), (i, x), and (i′, x) gives, by definition, that g has increasing differences on I × I′ × X. So, g is supermodular on the lattice I × I′ × X by Lemma 4. Since the graph of Γ is assumed to be a lattice, it is a sublattice of S × X. Last, because g is supermodular, −g is submodular. Consequently, Theorem 4.3 of Topkis (1978) applies to show that min_{x∈Γ(i,i′)} −g(i, i′, x) is submodular. Therefore, −min_{x∈Γ(i,i′)} −g(i, i′, x) is supermodular, and this equals max_{x∈Γ(i,i′)} g(i, i′, x), which by definition is f(i, i′). So, f is supermodular on the lattice S, which implies f has increasing differences on S by Lemma 4.

Proof of Proposition 5

Proof. Because f is assumed to be differentiable (left and right) on the closure of S, Theorem 1 of Milgrom and Segal (2002) gives that f_i(i, i′) = u_1(g(x*(i, i′), i, i′)) g_2(x*(i, i′), i, i′) on S. To show increasing differences, we need to establish that f(i2, i′1) − f(i1, i′1) ≤ f(i2, i′2) − f(i1, i′2) for (i1, i′1), (i2, i′2) ∈ S with i1 ≤ i2 and i′1 ≤ i′2. Because f is assumed to be C¹ in i, this is equivalent to

0 ≤ ∫_{i1}^{i2} (f_i(θ, i′2) − f_i(θ, i′1)) dθ.

Hence, if f_i is increasing in i′, then increasing differences holds. Defining x*_j := x*(i, i′j), f_i is increasing in i′ if

0 ≤ u_1(g(x*_2, i, i′2)) g_2(x*_2, i, i′2) − u_1(g(x*_1, i, i′1)) g_2(x*_1, i, i′1)
 = u_1(g(x*_2, i, i′2)) (g_2(x*_2, i, i′2) − g_2(x*_1, i, i′1)) + (u_1(g(x*_2, i, i′2)) − u_1(g(x*_1, i, i′1))) g_2(x*_1, i, i′1).

Since u_1 ≥ 0 and g_2(x*(i, i′), i, i′) is increasing in i′, the first term is positive. Since u_{11} ≤ 0, g_2(x*(i, i′), i, i′) ≥ 0, and g(x*(i, i′), i, i′) is decreasing in i′, the second term is positive. So, f has increasing differences.

Proof of Proposition 6

Proof. Let a < b with a, b ∈ I. Define the feasible set as Γ(i) := {i′ ∈ I′ | c(i, i′) ≥ 0}. Let g1 ∈ G(a) and g2 ∈ G(b). To establish the first claim, that G is ascending, it is sufficient to show that g1 > g2 implies g1 ∈ G(b) and g2 ∈ G(a). To establish the second claim, that G is strongly ascending, it is sufficient to show that g1 > g2 gives a contradiction. So, suppose g1 > g2. Unless explicitly stated, we only assume c is weakly increasing in i, has weakly increasing differences, and that W is weakly increasing.

First, we will show c(a, g2) ≥ c(a, g1) ≥ 0 and c(b, g2) ≥ c(b, g1) ≥ 0 for W weakly increasing, and c(a, g2) > c(a, g1) ≥ 0 and c(b, g2) > c(b, g1) ≥ 0 for W strictly increasing. To see this, note that Γ is increasing. Consequently, g1 ∈ Γ(b) (so c(b, g1) ≥ 0) and hence g2 ∈ G(b) implies

u(c(b, g2)) + W(g2) ≥ u(c(b, g1)) + W(g1).   (29)

Because g2 < g1 and W is weakly (strictly) increasing, this implies c(b, g2) ≥ (>) c(b, g1). So, exploiting weakly increasing differences, 0 ≥ (>) c(b, g1) − c(b, g2) ≥ c(a, g1) − c(a, g2) for W weakly (strictly) increasing. Also using g1 ∈ Γ(a), c(a, g2) ≥ (>) c(a, g1) ≥ 0 for W weakly (strictly) increasing. For use below, note also that because g1 ∈ G(a) and g2 ∈ Γ(a),

u(c(a, g1)) + W(g1) ≥ u(c(a, g2)) + W(g2).   (30)

Combining (29) and (30),

u(c(b, g2)) + W(g2) − u(c(b, g1)) − W(g1) ≥ 0 ≥ u(c(a, g2)) + W(g2) − u(c(a, g1)) − W(g1),

which implies

u(c(b, g2)) − u(c(b, g1)) ≥ u(c(a, g2)) − u(c(a, g1)).   (31)

As established above, c(b, g2) ≥ c(b, g1) and c(a, g2) ≥ c(a, g1). Using this and the differentiability of u, (31) is equivalent to

∫_0^{c(b,g2)−c(b,g1)} u′(c(b, g1) + θ) dθ ≥ ∫_0^{c(a,g2)−c(a,g1)} u′(c(a, g1) + θ) dθ.

Because of weakly increasing differences, c(a, g1) − c(a, g2) ≤ c(b, g1) − c(b, g2) or, equivalently, c(b, g2) − c(b, g1) ≤ c(a, g2) − c(a, g1). Moreover, c(b, g2) ≥ c(b, g1). So, the above inequality is equivalent to

0 ≥ ∫_{c(b,g2)−c(b,g1)}^{c(a,g2)−c(a,g1)} u′(c(a, g1) + θ) dθ + ∫_0^{c(b,g2)−c(b,g1)} (u′(c(a, g1) + θ) − u′(c(b, g1) + θ)) dθ.

Because c is weakly increasing in i and u is concave, the second integral is positive. The first must also be positive. Hence, the inequality holds if and only if

c(a, g2) − c(a, g1) = c(b, g2) − c(b, g1)   (C1)

AND

c(b, g2) = c(b, g1) or c(a, g1) = c(b, g1).   (C2)

Now, consider the claims again. For the second claim, we seek a contradiction. A contradiction obtains if c has strictly increasing differences, as then (C1) is violated. Alternatively, if W is strictly increasing and c is strictly increasing in i, (C2) will be violated because c(b, g2) > c(b, g1) and c(b, g1) > c(a, g1).

For the first claim, we want to show g1 ∈ G(b) and g2 ∈ G(a). (C2) implies either c(b, g2) = c(b, g1) and/or c(a, g1) = c(b, g1). Consider the cases separately, with c(b, g2) = c(b, g1) first. Then (C1) gives c(a, g2) − c(a, g1) = c(b, g2) − c(b, g1). So, c(a, g2) = c(a, g1). Hence, the two choices give the same consumption at a and at b. So, the continuation utility must be the same: (29) implies W(g2) ≥ W(g1) and (30) implies W(g1) ≥ W(g2). So, with the same consumption and choice utilities, g1 ∈ G(a) gives g2 ∈ G(a) and g2 ∈ G(b) gives g1 ∈ G(b).

Now consider the second case, where c(a, g1) = c(b, g1). Because (C1) gives c(a, g2) − c(a, g1) = c(b, g2) − c(b, g1), replacing c(a, g1) with c(b, g1) gives c(a, g2) − c(b, g1) = c(b, g2) − c(b, g1), or c(a, g2) = c(b, g2). Then

u(c(a, g1)) + W(g1) ≥ u(c(a, g2)) + W(g2) ⇔ u(c(b, g1)) + W(g1) ≥ u(c(b, g2)) + W(g2),   (32)

where the first line follows from the optimality of g1 ∈ G(a) and the second from c(a, g1) = c(b, g1) and c(a, g2) = c(b, g2). Consequently, since g2 ∈ G(b) and (32) shows g1 delivers weakly higher utility at b, g1 ∈ G(b). To establish g2 ∈ G(a), the argument is similar. We have c(a, g1) = c(b, g1) and c(a, g2) = c(b, g2). Because we have shown g1 ∈ G(b), u(c(b, g1)) + W(g1) = u(c(b, g2)) + W(g2). Replacing c(b, g1) with c(a, g1) and c(b, g2) with c(a, g2), this becomes u(c(a, g1)) + W(g1) = u(c(a, g2)) + W(g2). Consequently, g1 ∈ G(a) implies g2 ∈ G(a).


C.2 Algorithm properties

As in the main text, σ is always assumed to be monotonically increasing.

Lemma 6. Consider the problem max_{i′∈{a,...,a+n−1}} π(i, i′) for any a and any i. For any n ∈ Z₊₊, binary concavity requires no more than 2⌈log₂(n)⌉ − 1 evaluations if n ≥ 3 and no more than n evaluations if n ≤ 2.

Proof. We will show that σ(n) = 2⌈log₂(n)⌉ − 1 for n ≥ 3 and σ(n) = n for n ≤ 2 is an upper bound on the number of evaluations that binary concavity requires. For n = 1, the algorithm computes π(i, a) and stops, so one evaluation is required. This agrees with σ(1) = 1. For n = 2, two evaluations are required (π(i, a) and π(i, a + 1)). This agrees with σ(2) = 2. For n = 3, step 3 requires π(i, m) to be computed and may require π(i, a) to be computed. Then step 3(a) or step 3(b) either stops with no additional function evaluations or goes to step 2 with max{1_a, 1_b} = 1 where, in that case, at most one additional function evaluation is required. Consequently, n = 3 requires at most three function evaluations, which agrees with σ(3) = 2⌈log₂(3)⌉ − 1 = 3. So, the statement of the lemma holds for 1 ≤ n ≤ 3.

Now consider each n ∈ {4, 5, 6, 7} for any 1_a, 1_b flags. Since n ≥ 4, the algorithm is in (or goes to) step 4. Consequently, two evaluations are required. Since the new interval is either {a, ..., m} or {m + 1, ..., b} and π(i, m) and π(i, m + 1) are computed in step 4, the next step has max{1_a, 1_b} = 1. Now, if n = 4, the next step must be step 2, which requires at most one additional evaluation (since max{1_a, 1_b} = 1). Hence, the total evaluations are less than or equal to 3 (two for step 4 and one for step 2). If n = 5, the next step is either step 2, requiring one evaluation, or step 3, requiring two evaluations. So, the total evaluations are not more than four. If n = 6, the next step is step 3, and so four evaluations are required. Lastly, for n = 7, the next step is either step 3, requiring two evaluations, or step 4 (with n = 4), requiring at most three evaluations.
So, the evaluations are weakly less than 5 = 2 + max{2, 3}. Hence, for every n = 4, 5, 6, and 7, the required evaluations are no more than 3, 4, 4, and 5, respectively. One can then verify that the evaluations are no more than 2⌈log₂(n)⌉ − 1 for these values of n.

Now, suppose n ≥ 4. We shall prove that the required number of evaluations is no more than 2⌈log₂(n)⌉ − 1 (i.e., is at most σ(n)). The proof is by induction. We have already verified the hypothesis holds for n ∈ {4, 5, 6, 7}, so consider some n ≥ 8 and suppose the hypothesis holds for all m ∈ {4, ..., n − 1}. Let i be such that n ∈ [2^i + 1, 2^{i+1}]. Then note that two things are true: ⌈log₂(n)⌉ = i + 1 and ⌈log₂(⌊(n+1)/2⌋)⌉ = i.⁴¹ Since n ≥ 4, the algorithm is in (or proceeds to) step 4, which requires two evaluations, and then proceeds with a new interval to step 4 (again). If n is even, the new interval has size n/2. If n is odd, the new interval has a size of either (n + 1)/2 or (n − 1)/2. So, if n is even, no more than 2 + σ(n/2) evaluations are required; if n is odd, no more than 2 + max{σ((n + 1)/2), σ((n − 1)/2)} = 2 + σ((n + 1)/2) evaluations are required. The even

⁴¹ The proof is as follows. Both ⌈log₂(·)⌉ and ⌈log₂(⌊·⌋)⌉ are weakly increasing functions. So n ∈ [2^i + 1, 2^{i+1}] implies ⌈log₂(n)⌉ ∈ [⌈log₂(2^i + 1)⌉, ⌈log₂(2^{i+1})⌉] = [i + 1, i + 1]. Likewise, ⌈log₂(⌊(n+1)/2⌋)⌉ ∈ [⌈log₂(⌊(2^i + 1 + 1)/2⌋)⌉, ⌈log₂(⌊(2^{i+1} + 1)/2⌋)⌉] = [i, i].


and odd cases can then be handled simultaneously with the bound 2 + σ(⌊(n+1)/2⌋). Manipulating this expression using the previous observation that ⌈log₂(n)⌉ = i + 1 and ⌈log₂(⌊(n+1)/2⌋)⌉ = i,

2 + σ(⌊(n+1)/2⌋) = 2 + 2⌈log₂(⌊(n+1)/2⌋)⌉ − 1
 = 2 + 2i − 1
 = 2(i + 1) − 1
 = 2⌈log₂(n)⌉ − 1.

Hence, the proof by induction is complete.

Lemma 7. For any σ, M_σ(z, γ) is weakly increasing in z and γ.

Proof. Fix a σ and suppress dependence on it. First, we will show M(z, γ) is weakly increasing in γ for every z. The proof is by induction. For z = 2, M(2, ·) = 0. For z = 3, M(3, γ) = σ(γ), which is weakly increasing in γ. Now consider some z > 3 and suppose M(y, ·) is weakly increasing for all y ≤ z − 1. For γ2 > γ1,

M(z, γ1) = σ(γ1) + max_{γ′∈{1,...,γ1}} { M(⌊z/2⌋ + 1, γ′) + M(⌊z/2⌋ + 1, γ1 − γ′ + 1) }
 ≤ σ(γ2) + max_{γ′∈{1,...,γ2}} { M(⌊z/2⌋ + 1, γ′) + M(⌊z/2⌋ + 1, γ1 − γ′ + 1) }
 ≤ σ(γ2) + max_{γ′∈{1,...,γ2}} { M(⌊z/2⌋ + 1, γ′) + M(⌊z/2⌋ + 1, γ2 − γ′ + 1) }
 = M(z, γ2),

where the second inequality is justified by the induction hypothesis giving M(⌊z/2⌋ + 1, ·) as an increasing function (note ⌊z/2⌋ + 1 ≤ z − 1 for all z > 3).

Now we will show M(z, γ) is increasing in z for every γ. The proof is by induction. First, note that M(2, γ) = 0 ≤ σ(γ) = M(3, γ) for all γ > 0 and M(2, γ) = 0 = M(3, γ) for γ = 0. Now, consider some k > 3 and suppose that for any z1, z2 ≤ k − 1 with z1 ≤ z2, M(z1, γ) ≤ M(z2, γ) for all γ. The goal is to show that for any z1, z2 ≤ k with z1 ≤ z2, M(z1, γ) ≤ M(z2, γ) for all γ. So, consider such z1, z2 ≤ k with z1 ≤ z2. If γ = 0, then M(z1, γ) = 0 = M(z2, γ), so take γ > 0. Then

M(z1, γ) = σ(γ) + max_{γ′∈{1,...,γ}} { M(⌊z1/2⌋ + 1, γ′) + M(⌊z1/2⌋ + 1, γ − γ′ + 1) }
 ≤ σ(γ) + max_{γ′∈{1,...,γ}} { M(⌊z2/2⌋ + 1, γ′) + M(⌊z2/2⌋ + 1, γ − γ′ + 1) }
 = M(z2, γ).

The inequality obtains since ⌊z_i/2⌋ + 1 ≤ k − 1 for all i (which is true since even if z_i = k, one has ⌊k/2⌋ + 1 ≤ k − 1 by virtue of k > 3). So, the induction hypothesis gives M(⌊z1/2⌋ + 1, ·) ≤ M(⌊z2/2⌋ + 1, ·), and the proof by induction is complete.
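The recursion defining M can be computed directly by memoization, which also allows a numerical check of Lemma 7. Below, σ(γ) = γ (a brute-force grid search) is assumed purely for illustration:

```python
from functools import lru_cache

# Memoized M_sigma: M(2, .) = 0, M(3, g) = sigma(g), and for z > 3
# M(z, g) = sigma(g) + max_{g'} { M(floor(z/2)+1, g') + M(floor(z/2)+1, g-g'+1) }.

def make_M(sigma):
    @lru_cache(maxsize=None)
    def M(z, gamma):
        if gamma == 0 or z == 2:
            return 0
        if z == 3:
            return sigma(gamma)
        return sigma(gamma) + max(
            M(z // 2 + 1, gp) + M(z // 2 + 1, gamma - gp + 1)
            for gp in range(1, gamma + 1)
        )
    return M
```

Tabulating M over a grid of (z, γ) pairs confirms the weak monotonicity in both arguments that the lemma asserts.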

Proof of Proposition 1.

Proof. Since g is the policy function associated with (1), g : {1, ..., n} → {1, ..., n′}. By monotonicity, g is weakly increasing. Define N : {1, ..., n}² → Z₊ by N(a, b) = M(b − a + 1, g(b) − g(a) + 1), noting that this is well-defined (based on the definition of M) whenever b > a. Additionally, define a sequence of sets I_k for k = 1, ..., n − 1 by I_k := {(i̲, ī) | ī = i̲ + k and i̲, ī ∈ {1, ..., n}}. Note that for any k ∈ {1, ..., n − 1}, I_k is nonempty and N(a, b) is well-defined for any (a, b) ∈ I_k.

We shall now prove that for any k ∈ {1, ..., n − 1}, (i̲, ī) ∈ I_k implies N(i̲, ī) is an upper bound on the number of evaluations of π required by the algorithm in order to compute the optimal policy for all i ∈ {i̲, ..., ī} when g(i̲) and g(ī) are known. If true, then beginning at step 2 in the algorithm (which assumes g(i̲) and g(ī) are known) with (i̲, ī) ∈ I_k, N(i̲, ī) is an upper bound on the number of π evaluations.

The argument is by induction. First, consider k = 1. For any (a, b) ∈ I_1, the algorithm terminates at step 2(a). Consequently, the number of required π evaluations is zero, which is the same as N(a, b) = M(b − a + 1, g(b) − g(a) + 1) = M(2, g(b) − g(a) + 1) = 0 (recall M(2, ·) = 0).

Now, consider some k ∈ {2, ..., n − 1} and suppose the induction hypothesis holds for all j in 1, ..., k − 1. That is, assume for all j in 1, ..., k − 1 that (i̲, ī) ∈ I_j implies N(i̲, ī) is an upper bound on the number of required π evaluations when g(i̲) and g(ī) are known. We shall show it holds for k as well. Consider any (i̲, ī) ∈ I_k with g(i̲) and g(ī) known. Since ī > i̲ + 1, the algorithm does not terminate at step 2(a). In step 2(b), to compute g(m) (where m := ⌊(i̲ + ī)/2⌋), one must find the maximum within the range g(i̲), ..., g(ī), which requires at most σ(g(ī) − g(i̲) + 1) evaluations of π. In step 2(c), the space is then divided into {i̲, ..., m} and {m, ..., ī}.

If k is even, then m equals (i̲ + ī)/2. Since (i̲, m) ∈ I_{k/2} and g(i̲) and g(m) are known, the induction hypothesis gives N(i̲, m) as an upper bound on the number of π evaluations needed to compute g(i̲), ..., g(m). Similarly, since (m, ī) ∈ I_{k/2} and g(m) and g(ī) are known, N(m, ī) provides an upper bound on the number of π evaluations needed to compute g(m), ..., g(ī). Therefore, to compute g(i̲), ..., g(ī), at most σ(g(ī) − g(i̲) + 1) + N(i̲, m) + N(m, ī) evaluations are required. Defining γ = g(ī) − g(i̲) + 1 and γ′ = g(m) − g(i̲) + 1 and using the definition of m and N, we have


that the number of required evaluations is less than

σ(γ) + M(m − i̲ + 1, g(m) − g(i̲) + 1) + M(ī − m + 1, g(ī) − g(m) + 1)
 = σ(γ) + M((i̲ + ī)/2 − i̲ + 1, γ′) + M(ī − (i̲ + ī)/2 + 1, g(ī) − g(m) + 1)
 = σ(γ) + M((ī − i̲)/2 + 1, γ′) + M((ī − i̲)/2 + 1, g(ī) − g(m) + γ′ − γ′ + 1)
 = σ(γ) + M((ī − i̲)/2 + 1, γ′) + M((ī − i̲)/2 + 1, g(ī) − g(m) + g(m) − g(i̲) + 1 − γ′ + 1)
 = σ(γ) + M((ī − i̲)/2 + 1, γ′) + M((ī − i̲)/2 + 1, γ − γ′ + 1).

By virtue of monotonicity, g(m) ∈ {g(i̲), ..., g(ī)} and so g(m) − g(i̲) + 1 ∈ {1, ..., g(ī) − g(i̲) + 1} or, equivalently, γ′ ∈ {1, ..., γ}. Consequently, the number of function evaluations is not greater than

σ(γ) + max_{γ′∈{1,...,γ}} { M((ī − i̲)/2 + 1, γ′) + M((ī − i̲)/2 + 1, γ − γ′ + 1) }.

The case of k odd is very similar, but the divide-and-conquer algorithm splits the space unequally. If k is odd, then m equals (i̲ + ī − 1)/2. In this case, (i̲, m) ∈ I_{(k−1)/2} and (m, ī) ∈ I_{(k−1)/2+1}.⁴² Consequently, computing the policy for i̲, ..., ī takes no more than σ(g(ī) − g(i̲) + 1) + N(i̲, m) + N(m, ī) π evaluations. Defining γ and γ′ the same as before and using the definition of m and N, we have that the required number of evaluations is less than

σ(γ) + M(m − i̲ + 1, γ′) + M(ī − m + 1, γ − γ′ + 1)
 = σ(γ) + M((i̲ + ī − 1)/2 − i̲ + 1, γ′) + M(ī − (i̲ + ī − 1)/2 + 1, γ − γ′ + 1)
 = σ(γ) + M((ī − i̲ + 1)/2, γ′) + M((ī − i̲ + 1)/2 + 1, γ − γ′ + 1).

Because M is increasing in the first argument, this is less than

σ(γ) + max_{γ′∈{1,...,γ}} { M((ī − i̲ + 1)/2 + 1, γ′) + M((ī − i̲ + 1)/2 + 1, γ − γ′ + 1) }.

Combining the bounds for k even and odd, the required number of π evaluations is less than

σ(γ) + max_{γ′∈{1,...,γ}} { M(⌊(ī − i̲ + 1)/2⌋ + 1, γ′) + M(⌊(ī − i̲ + 1)/2⌋ + 1, γ − γ′ + 1) }   (33)

because if k is even, then ⌊(ī − i̲ + 1)/2⌋ = (ī − i̲)/2. Consequently, (33) gives an upper bound for any (i̲, ī) ∈ I_k

for k ≥ 1 when g(i̲) and g(ī) are known. If N(i̲, ī) is at least this value, then the proof by induction is

⁴² To see this, note that (i̲, ī) ∈ I_k implies k = ī − i̲. To have (i̲, m) ∈ I_{(k−1)/2}, it must be that m = i̲ + (k−1)/2. This holds: i̲ + (k−1)/2 = i̲ + (ī − i̲ − 1)/2 = (i̲ + ī − 1)/2 = m. Similarly, to have (m, ī) ∈ I_{(k−1)/2+1}, one must have ī = m + (k−1)/2 + 1. This also obtains: m + (k−1)/2 + 1 = (i̲ + ī − 1)/2 + (ī − i̲ − 1)/2 + 1 = (2ī − 2)/2 + 1 = ī.

complete. Since N(i̲, ī) is defined as M(ī − i̲ + 1, g(ī) − g(i̲) + 1), using the definitions of N and M shows

N(i̲, ī) = M(ī − i̲ + 1, g(ī) − g(i̲) + 1) = M(ī − i̲ + 1, γ)
 = σ(γ) + max_{γ′∈{1,...,γ}} { M(⌊(ī − i̲ + 1)/2⌋ + 1, γ′) + M(⌊(ī − i̲ + 1)/2⌋ + 1, γ − γ′ + 1) }.

Consequently, N(i̲, ī) exactly equals the value in (33), and the proof by induction is complete.

Step 1 of the algorithm requires at most 2σ(n′) evaluations to compute g(1) and g(n). If n = 2, step 2 is never reached. Since M(n, n′) = 0 in this case, 2σ(n′) + M(n, n′) provides an upper bound. If n > 2, then since (1, n) ∈ I_{n−1} and g(1) and g(n) are known, only N(1, n) additional evaluations are required. Therefore, to compute g(i) for each i ∈ {1, ..., n}, no more than 2σ(n′) + N(1, n) = 2σ(n′) + M(n, g(n) − g(1) + 1) function evaluations are needed. Lemma 7 then gives that this is less than 2σ(n′) + M(n, n′) since g(n) − g(1) + 1 ≤ n′ − 1 + 1 = n′.

Lemma 8. Define a sequence {m_i}_{i=1}^∞ by m_1 = 2 and m_i = 2m_{i−1} − 1 for i ≥ 2. Then m_i = 2^{i−1} + 1 and log₂(m_i − 1) = i − 1 for all i ≥ 1.

Proof. The proof of m_i = 2^{i−1} + 1 for all i ≥ 1 is by induction. For i = 1, m_1 is defined as 2, which equals 2^{1−1} + 1. For i > 1, suppose it holds for i − 1. Then m_i = 2m_{i−1} − 1 = 2[2^{i−2} + 1] − 1 = 2^{i−1} + 1. The second claim then follows immediately: log₂(m_i − 1) = log₂(2^{i−1}) = i − 1.

Lemma 9. Consider any $z \geq 2$. Then there exists a unique sequence $\{n_i\}_{i=1}^{I}$ such that $n_1 = 2$, $n_I = z$, $\lfloor \frac{n_i}{2} \rfloor + 1 = n_{i-1}$, and $n_i > 2$ for all $i > 1$. Moreover, $I = \lceil \log_2(z-1) \rceil + 1$.

Proof. The proof that a unique sequence exists is by construction. Let $z \geq 2$ be fixed. Define an infinite sequence $\{z_i\}_{i=1}^{\infty}$ recursively by $z_i = T_i(z)$ for all $i \geq 1$ with $T_1(z) := z$ and $T_{i+1}(z) := \lfloor \frac{T_i(z)}{2} \rfloor + 1$. We now establish all of the following: $T_i(z) \geq 2$, $T_i(z) \geq T_{i+1}(z)$, and $T_i(z) > T_{i+1}(z)$ whenever $T_i(z) > 2$. As an immediate consequence, for any $z \geq 2$, there exists a unique $I(z) \geq 1$ such that $T_{I(z)}(z) = 2$ and, for all $i < I(z)$, $T_i(z) > 2$. We also show for later use that $T_i(z)$ is weakly increasing in $z$ for every $i$.

To show $T_i(z) \geq 2$, the proof is by induction. We have $T_1(z) = z$ and $z \geq 2$. Now, consider some $i > 1$ and suppose it holds for $i-1$. Then $T_i(z) = \lfloor \frac{T_{i-1}(z)}{2} \rfloor + 1 \geq \lfloor \frac{2}{2} \rfloor + 1 = 2$.

To show $T_i(z) > T_{i+1}(z)$ whenever $T_i(z) > 2$, consider two cases. First, consider $T_i(z)$ even. Then $T_{i+1}(z) = \lfloor \frac{T_i(z)}{2} \rfloor + 1 = \frac{T_i(z)}{2} + 1$, and so $T_{i+1}(z) < T_i(z)$ as long as $T_i(z) > 2$. Second, consider $T_i(z)$ odd. Then $T_{i+1}(z) = \lfloor \frac{T_i(z)}{2} \rfloor + 1 = \frac{T_i(z)-1}{2} + 1$, and so $T_{i+1}(z) < T_i(z)$ as long as $T_i(z) > 1$.

To show that $T_i(z) \geq T_{i+1}(z)$, all we need to show now is that $T_{i+1}(z) = 2$ when $T_i(z) = 2$ (since the inequality is strict if $T_i(z) > 2$ and $T_i(z) \geq 2$ for all $i$). If $T_i(z) = 2$, then $T_{i+1}(z) = \lfloor \frac{2}{2} \rfloor + 1 = 2$.

To establish that $T_i(z)$ is weakly increasing in $z$ for every $i$, the proof is by induction. For $a \leq b$, $T_1(a) = a \leq b = T_1(b)$. Now consider some $i > 1$ and suppose the induction hypothesis holds for $i-1$. Then $T_i(a) = \lfloor T_{i-1}(a)/2 \rfloor + 1 \leq \lfloor T_{i-1}(b)/2 \rfloor + 1 = T_i(b)$.

The sequence $\{n_j\}_{j=1}^{I(z)}$ defined by $n_j = T_{I(z)-j+1}(z)$—i.e., an inverted version of the sequence $\{T_i(z)\}_{i=1}^{I(z)}$—satisfies $n_{I(z)} = T_1(z) = z$, $n_1 = T_{I(z)}(z) = 2$, and $n_{i-1} = T_{I(z)-(i-1)+1}(z) = \lfloor \frac{T_{I(z)-(i-1)}(z)}{2} \rfloor + 1 = \lfloor \frac{n_i}{2} \rfloor + 1$. Also, by the definition of $I(z)$, $T_i(z) > 2$ for any $i < I(z)$. So, if we can show that $I(z) = \lceil \log_2(z-1) \rceil + 1$, the proof is complete.

The proof of $I(z) = \lceil \log_2(z-1) \rceil + 1$ is as follows. Note that for $z = 2$, the sequence $\{z_i\}$ is simply $z_i = 2$ for all $i$, which implies $I(2) = 1$. Since $\lceil \log_2(2-1) \rceil + 1 = 1$, the relationship holds for $z = 2$. So, now consider $z > 2$. The proof proceeds in the following steps. First, for the special $\{m_i\}$ sequence defined in Lemma 8, we show $T_j(m_i) = m_{i+1-j}$ for any $i \geq 1$ and any $j \leq i$. Second, we use this to show that $I(m_i) = i$ for all $i \geq 1$. Third, we show that $z \in (m_{i-1}, m_i]$ implies $I(z) = i$ by showing $I(m_{i-1}) < I(z) \leq I(m_i)$. Fourth, we show that the $i$ such that $z \in (m_{i-1}, m_i]$ is given by $\lceil \log_2(z-1) \rceil + 1$. This then gives $I(z) = \lceil \log_2(z-1) \rceil + 1$ since $I(z) = I(m_i) = i = \lceil \log_2(z-1) \rceil + 1$.

First, we show $T_j(m_i) = m_{i+1-j}$ for any $i \geq 1$ and any $j \leq i$. Fix some $i \geq 1$. The proof is by induction. For $j = 1$, $T_1(m_i) = m_i = m_{i+1-1}$. Now consider some $j$ having $2 \leq j \leq i$ and suppose the induction hypothesis holds for $j-1$. Then

$$\begin{aligned}
T_j(m_i) &= \left\lfloor \frac{T_{j-1}(m_i)}{2} \right\rfloor + 1 = \left\lfloor \frac{m_{i+1-(j-1)}}{2} \right\rfloor + 1 = \left\lfloor \frac{m_{i+2-j}}{2} \right\rfloor + 1 \\
&= \left\lfloor \frac{2m_{i+1-j}-1}{2} \right\rfloor + 1 = m_{i+1-j} + \left\lfloor -\tfrac{1}{2} \right\rfloor + 1 = m_{i+1-j} - 1 + 1 = m_{i+1-j},
\end{aligned}$$

which proves $T_j(m_i) = m_{i+1-j}$ for $j \leq i$. The fourth equality follows from the definition of $\{m_i\}$ in Lemma 8.

Second, we show $I(m_i) = i$ for all $i \geq 1$. Fix any $i \geq 1$. We just showed $T_j(m_i) = m_{i+1-j}$. Hence, $T_i(m_i) = m_1 = 2$ and $T_{i-1}(m_i) = m_2 = 3$. Consequently, the definition of $I$—which for a given $z$ is the smallest $i \geq 1$ such that $T_i(z) = 2$—gives $I(m_i) = i$ (recall $T_j$ is decreasing in $j$).

Third, we show that $z \in (m_{i-1}, m_i]$ implies $I(z) = i$ by showing $I(m_{i-1}) < I(z) \leq I(m_i)$. Note that, since $z > 2$ (having taken care of the $z = 2$ case already), there is some $i \geq 2$ such that $z \in (m_{i-1}, m_i]$ (since $m_1 = 2$). To see $I(z) \leq I(m_i)$, suppose not, that $I(z) > I(m_i)$. But then $2 = T_{I(z)}(z) < T_{I(m_i)}(z) \leq T_{I(m_i)}(m_i) = 2$, which is a contradiction.^43 Therefore, $I(z) \leq I(m_i)$.

To see $I(m_{i-1}) < I(z)$, we begin by showing $T_j(m_{i-1}) < T_j(m_{i-1}+\varepsilon)$ for any $\varepsilon > 0$ and any $j \leq i-1$. Since $T_j(m_{i-1}) = m_{i-j}$, it is equivalent to show that $m_{i-j} < T_j(m_{i-1}+\varepsilon)$, which we show by induction. Clearly, for $j = 1$, we have $m_{i-1} < m_{i-1}+\varepsilon = T_1(m_{i-1}+\varepsilon)$. Now consider $j > 1$ and suppose it is true for $j-1$. Then

$$\begin{aligned}
T_j(m_{i-1}+\varepsilon) &= \left\lfloor \frac{T_{j-1}(m_{i-1}+\varepsilon)}{2} \right\rfloor + 1 \\
&= \left\lfloor \frac{T_{j-1}(m_{i-1}+\varepsilon) - m_{i-j+1} + m_{i-j+1}}{2} \right\rfloor + 1 \\
&= \left\lfloor \frac{T_{j-1}(m_{i-1}+\varepsilon) - m_{i-j+1} + 2m_{i-j} - 1}{2} \right\rfloor + 1 \\
&= \left\lfloor \frac{T_{j-1}(m_{i-1}+\varepsilon) - m_{i-j+1} - 1}{2} \right\rfloor + m_{i-j} + 1.
\end{aligned}$$

Now, since the induction hypothesis of $T_{j-1}(m_{i-1}+\varepsilon) > m_{i-j+1}$ gives $T_{j-1}(m_{i-1}+\varepsilon) - m_{i-j+1} - 1 \geq 0$, one has

$$T_j(m_{i-1}+\varepsilon) \geq \left\lfloor \frac{0}{2} \right\rfloor + m_{i-j} + 1 = m_{i-j} + 1 > m_{i-j}.$$

Hence the proof by induction is complete. Now, having established $T_j(m_{i-1}) < T_j(m_{i-1}+\varepsilon)$ for any $\varepsilon > 0$ and any $j \leq i-1$, we show $I(m_{i-1}) < I(z)$. Suppose not, that $I(m_{i-1}) \geq I(z)$. Then since $z > m_{i-1}$, taking $\varepsilon = z - m_{i-1}$ we have $2 = T_{I(m_{i-1})}(m_{i-1}) < T_{I(m_{i-1})}(m_{i-1}+\varepsilon) = T_{I(m_{i-1})}(z) \leq T_{I(z)}(z) = 2$, which is a contradiction.

Lastly, we now show that the $i$ such that $z \in (m_{i-1}, m_i]$ is given by $\lceil \log_2(z-1) \rceil + 1$. That this holds can be seen as follows. Note that $z \in (m_{i-1}, m_i]$ implies $\log_2(z-1)+1 \in (\log_2(m_{i-1}-1)+1, \log_2(m_i-1)+1]$. Then, since $\log_2(m_j-1)+1 = j$ for all $j \geq 1$ (Lemma 8), we have $\log_2(z-1)+1 \in (i-1, i]$. Then, by the definition of $\lceil \cdot \rceil$, one has $\lceil \log_2(z-1)+1 \rceil = i$, which of course is equivalent to $\lceil \log_2(z-1) \rceil + 1 = i$. We established the $i$ such that $z \in (m_{i-1}, m_i]$ is $i = \lceil \log_2(z-1) \rceil + 1$. Also we showed $i-1 = I(m_{i-1}) < I(z) \leq I(m_i) = i$. Hence $I(z) = \lceil \log_2(z-1) \rceil + 1$, which completes the proof.
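The characterization $I(z) = \lceil \log_2(z-1) \rceil + 1$ can be confirmed numerically. The snippet below is an illustrative check of ours, not part of the paper: it iterates $T(z) = \lfloor z/2 \rfloor + 1$ and counts the steps until reaching 2.

```python
# Check Lemma 9: iterating T(z) = floor(z/2) + 1 from z reaches 2 in exactly
# I(z) = ceil(log2(z - 1)) + 1 steps (with I(2) = 1, consistent with the formula).
from math import ceil, log2

def I_of(z):
    i = 1
    while z > 2:
        z = z // 2 + 1
        i += 1
    return i

for z in range(2, 5000):
    assert I_of(z) == ceil(log2(z - 1)) + 1
```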

^43 The second inequality uses that $T_i(z)$ is weakly increasing in $z$ for every $i$, as established above.

Proof of Lemma 1. Proof. In keeping with the notation of the other proofs, let $z$ and $\gamma$ correspond to $n$ and $n'$, respectively. Fix some arbitrary $z \geq 2$. By Lemma 9, there is a strictly monotone increasing sequence $\{z_i\}_{i=1}^{I}$ with $z_I = z$, $z_i = \lfloor \frac{z_{i+1}}{2} \rfloor + 1$ for $i < I$, and with $I = \lceil \log_2(z-1) \rceil + 1$ (and having $z_1 = 2$). For $i > 1$ and any $\gamma \geq 1$, define

$$W(z_i,\gamma) := \max_{\gamma' \in \{1,\dots,\gamma\}} M(z_{i-1},\gamma') + M(z_{i-1},\gamma-\gamma'+1).$$

For $i = 1$, define $W(z_i,\cdot) = 0$. The definition of $M$ gives $M(z_i,\gamma) = \sigma(\gamma) + W(z_i,\gamma)$ for any $i > 1$ with $M(z_1,\cdot) = 0$. Note that $W(z_2,\gamma) = W(z_1,\gamma) = 0$.

Define $\bar{W}$—which we will demonstrate is an upper bound and continuous version of $W$—as

$$\bar{W}(z_i,\gamma) := \bar{\sigma}^*(\gamma) + \max_{\gamma' \in [1,\gamma]} \bar{W}(z_{i-1},\gamma') + \bar{W}(z_{i-1},\gamma-\gamma'+1)$$

for $i > 2$ with

$$\bar{\sigma}^*(\gamma) := \max_{\gamma' \in [1,\gamma]} \bar{\sigma}(\gamma') + \bar{\sigma}(\gamma-\gamma'+1). \tag{34}$$

For $i = 1$ or 2, define $\bar{W}(z_i,\gamma) = 0$. Then $W(z_i,\gamma) \leq \bar{W}(z_i,\gamma)$ for all $i \geq 1$ and all $\gamma \in \mathbb{Z}_{++}$. The proof is by induction. They are equal for $i = 1$ and $i = 2$. Now consider an $i > 2$ and suppose it holds for $i-1$. Then,

$$\begin{aligned}
W(z_i,\gamma) &= \max_{\gamma' \in \{1,\dots,\gamma\}} M(z_{i-1},\gamma') + M(z_{i-1},\gamma-\gamma'+1) \\
&= \max_{\gamma' \in \{1,\dots,\gamma\}} \sigma(\gamma') + \sigma(\gamma-\gamma'+1) + W(z_{i-1},\gamma') + W(z_{i-1},\gamma-\gamma'+1) \\
&\leq \max_{\gamma' \in \{1,\dots,\gamma\}} \left[\sigma(\gamma') + \sigma(\gamma-\gamma'+1)\right] + \max_{\gamma' \in \{1,\dots,\gamma\}} \left[W(z_{i-1},\gamma') + W(z_{i-1},\gamma-\gamma'+1)\right] \\
&\leq \max_{\gamma' \in \{1,\dots,\gamma\}} \left[\bar{\sigma}(\gamma') + \bar{\sigma}(\gamma-\gamma'+1)\right] + \max_{\gamma' \in \{1,\dots,\gamma\}} \left[\bar{W}(z_{i-1},\gamma') + \bar{W}(z_{i-1},\gamma-\gamma'+1)\right] \\
&\leq \max_{\gamma' \in [1,\gamma]} \left[\bar{\sigma}(\gamma') + \bar{\sigma}(\gamma-\gamma'+1)\right] + \max_{\gamma' \in [1,\gamma]} \left[\bar{W}(z_{i-1},\gamma') + \bar{W}(z_{i-1},\gamma-\gamma'+1)\right] \\
&\leq \bar{\sigma}^*(\gamma) + \max_{\gamma' \in [1,\gamma]} \bar{W}(z_{i-1},\gamma') + \bar{W}(z_{i-1},\gamma-\gamma'+1) \\
&= \bar{W}(z_i,\gamma).
\end{aligned}$$

If $\bar{\sigma}(x) = x$ for all $x$, then $\bar{\sigma}(\gamma') + \bar{\sigma}(\gamma-\gamma'+1) = \gamma+1$, which does not depend on $\gamma'$. So, $\bar{\sigma}^*(\gamma) = \gamma+1 = 2\bar{\sigma}(\frac{\gamma+1}{2})$. If the $\bar{\sigma}$ function is strictly increasing, strictly concave, and differentiable, then the first order condition of the $\bar{\sigma}^*(\gamma)$ problem yields $\bar{\sigma}'(\gamma') = \bar{\sigma}'(\gamma-\gamma'+1)$. The derivative is invertible (by strict concavity) and so $\gamma' = \frac{\gamma+1}{2}$.^44 So, $\bar{\sigma}^*(\gamma) = \bar{\sigma}(\frac{\gamma+1}{2}) + \bar{\sigma}(\gamma - \frac{\gamma+1}{2} + 1)$, which gives $\bar{\sigma}^*(\gamma) = 2\bar{\sigma}(\frac{\gamma+1}{2})$, the same condition as in the linear case. Hence,

$$\bar{\sigma}^*(\gamma) = 2\bar{\sigma}\left(\frac{\gamma+1}{2}\right). \tag{35}$$

So, for $i > 2$,

$$\bar{W}(z_i,\gamma) = 2\bar{\sigma}\left(\frac{\gamma+1}{2}\right) + \max_{\gamma' \in [1,\gamma]} \bar{W}(z_{i-1},\gamma') + \bar{W}(z_{i-1},\gamma-\gamma'+1). \tag{36}$$

We will now show for $i > 2$ that $\bar{W}(z_i,\gamma) = 2\bar{\sigma}(\frac{\gamma+1}{2}) + 2\bar{W}(z_{i-1},\frac{\gamma+1}{2})$, which gives a simple recursive relationship for the upper bound (for $i = 1$ or 2, $\bar{W}(z_i,\gamma) = 0$). First note that $2\bar{W}(z_i,\frac{\gamma+1}{2})$ is just $\bar{W}(z_i,\gamma') + \bar{W}(z_i,\gamma-\gamma'+1)$ evaluated at $\gamma' = \frac{\gamma+1}{2}$, and so $\max_{\gamma' \in [1,\gamma]} \bar{W}(z_i,\gamma') + \bar{W}(z_i,\gamma-\gamma'+1) \geq 2\bar{W}(z_i,\frac{\gamma+1}{2})$. So, it is sufficient to show $\max_{\gamma' \in [1,\gamma]} \bar{W}(z_i,\gamma') + \bar{W}(z_i,\gamma-\gamma'+1) \leq 2\bar{W}(z_i,\frac{\gamma+1}{2})$. Now,

$$\begin{aligned}
\max_{\gamma' \in [1,\gamma]} &\bar{W}(z_i,\gamma') + \bar{W}(z_i,\gamma-\gamma'+1) \\
&= \max_{\gamma' \in [1,\gamma]} 2\bar{\sigma}\left(\tfrac{\gamma'+1}{2}\right) + 2\bar{W}\left(z_{i-1},\tfrac{\gamma'+1}{2}\right) + 2\bar{\sigma}\left(\tfrac{(\gamma-\gamma'+1)+1}{2}\right) + 2\bar{W}\left(z_{i-1},\tfrac{(\gamma-\gamma'+1)+1}{2}\right) \\
&= 2\max_{\gamma' \in [1,\gamma]} \bar{\sigma}\left(\tfrac{\gamma'+1}{2}\right) + \bar{\sigma}\left(\tfrac{\gamma+1}{2} - \tfrac{\gamma'+1}{2} + 1\right) + \bar{W}\left(z_{i-1},\tfrac{\gamma'+1}{2}\right) + \bar{W}\left(z_{i-1},\tfrac{\gamma+1}{2} - \tfrac{\gamma'+1}{2} + 1\right) \\
&= 2\max_{\tilde{\gamma}' \in [1,\frac{\gamma+1}{2}]} \bar{\sigma}(\tilde{\gamma}') + \bar{\sigma}\left(\tfrac{\gamma+1}{2} - \tilde{\gamma}' + 1\right) + \bar{W}(z_{i-1},\tilde{\gamma}') + \bar{W}\left(z_{i-1},\tfrac{\gamma+1}{2} - \tilde{\gamma}' + 1\right) \\
&\leq 2\max_{\tilde{\gamma}' \in [1,\frac{\gamma+1}{2}]} \left[\bar{\sigma}(\tilde{\gamma}') + \bar{\sigma}\left(\tfrac{\gamma+1}{2} - \tilde{\gamma}' + 1\right)\right] + 2\max_{\tilde{\gamma}' \in [1,\frac{\gamma+1}{2}]} \left[\bar{W}(z_{i-1},\tilde{\gamma}') + \bar{W}\left(z_{i-1},\tfrac{\gamma+1}{2} - \tilde{\gamma}' + 1\right)\right] \\
&= 2\max_{\tilde{\gamma}' \in [1,\frac{\gamma+1}{2}]} \left[\bar{\sigma}(\tilde{\gamma}') + \bar{\sigma}\left(\tfrac{\gamma+1}{2} - \tilde{\gamma}' + 1\right)\right] + 2\left(\bar{W}\left(z_i,\tfrac{\gamma+1}{2}\right) - 2\bar{\sigma}\left(\tfrac{\frac{\gamma+1}{2}+1}{2}\right)\right) \\
&= 2\bar{\sigma}^*\left(\tfrac{\gamma+1}{2}\right) + 2\bar{W}\left(z_i,\tfrac{\gamma+1}{2}\right) - 4\bar{\sigma}\left(\tfrac{\frac{\gamma+1}{2}+1}{2}\right) \\
&= 4\bar{\sigma}\left(\tfrac{\frac{\gamma+1}{2}+1}{2}\right) + 2\bar{W}\left(z_i,\tfrac{\gamma+1}{2}\right) - 4\bar{\sigma}\left(\tfrac{\frac{\gamma+1}{2}+1}{2}\right) \\
&= 2\bar{W}\left(z_i,\tfrac{\gamma+1}{2}\right).
\end{aligned}$$

The first equality follows from (36). The second equality follows from algebra. The third equality is just a change of variables where $\tilde{\gamma}' = (\gamma'+1)/2$. The inequality follows from $\max(f+g) \leq \max f + \max g$ for any $f,g$. The fourth equality follows from evaluating (36) at $(\gamma+1)/2$ and manipulation. The fifth equality follows from the definition of $\bar{\sigma}^*$ in (34). The sixth equality follows from (35). The last equality simplifies. So, $\max_{\gamma' \in [1,\gamma]} \bar{W}(z_i,\gamma') + \bar{W}(z_i,\gamma-\gamma'+1) = 2\bar{W}(z_i,\frac{\gamma+1}{2})$.

Using this equality to replace the max in (36), one has (for $i > 2$) that

$$\bar{W}(z_i,\gamma) = 2\bar{\sigma}\left(\frac{\gamma+1}{2}\right) + 2\bar{W}\left(z_{i-1},\frac{\gamma+1}{2}\right). \tag{37}$$

Now, fix any $\gamma \geq 1$ and define $\gamma_I := \gamma$ and $\gamma_i = \frac{\gamma_{i+1}+1}{2}$. Then $\gamma_i = 2^{i-I}(\gamma_I-1)+1$.^45 Then, for $i > 2$, (37) becomes

$$\bar{W}(z_i,\gamma_i) = 2\bar{\sigma}(\gamma_{i-1}) + 2\bar{W}(z_{i-1},\gamma_{i-1}).$$

So, if $I > 2$, one can repeatedly expand the above to find a value for $\bar{W}(z_I,\gamma_I)$:

$$\begin{aligned}
\bar{W}(z_I,\gamma_I) &= 2\bar{\sigma}(\gamma_{I-1}) + 2\bar{W}(z_{I-1},\gamma_{I-1}) \\
&= 2\bar{\sigma}(\gamma_{I-1}) + 2^2\bar{\sigma}(\gamma_{I-2}) + 2^2\bar{W}(z_{I-2},\gamma_{I-2}) \\
&= 2\bar{\sigma}(\gamma_{I-1}) + \dots + 2^{I-2}\bar{\sigma}(\gamma_2) + 2^{I-2}\bar{W}(z_2,\gamma_2) \\
&= 2\bar{\sigma}(\gamma_{I-1}) + \dots + 2^{I-2}\bar{\sigma}(\gamma_2) \\
&= \sum_{j=1}^{I-2} 2^j \bar{\sigma}(\gamma_{I-j}) \\
&= \sum_{j=1}^{I-2} 2^j \bar{\sigma}(2^{I-j-I}(\gamma_I-1)+1) \\
&= \sum_{j=1}^{I-2} 2^j \bar{\sigma}(2^{-j}(\gamma_I-1)+1).
\end{aligned}$$

The first equalities are algebra, the fourth uses $\bar{W}(z_2,\cdot) = 0$, and the rest are algebra. If $I \leq 2$, then $\bar{W}(z_I,\gamma_I) = 0$.

Proposition 1 shows the number of required evaluations is less than or equal to $2\sigma(\gamma_I) + M(z_I,\gamma_I)$ for $z_I \geq 2$ and $\gamma_I \geq 1$. Since $M(z_i,\gamma) = \sigma(\gamma) + W(z_i,\gamma)$ for any $i > 1$ (with $M(z_1,\gamma) = 0$) and $W(z_i,\gamma) \leq \bar{W}(z_i,\gamma)$, the required function evaluations are weakly less than $3\sigma(\gamma_I) + \bar{W}(z_I,\gamma_I)$ for any $I$ (recalling $\bar{W}(z_I,\gamma_I) = 0$ for $I \leq 2$). Hence, if $I > 2$, then an upper bound is

$$3\sigma(\gamma_I) + \sum_{j=1}^{I-2} 2^j \bar{\sigma}(2^{-j}(\gamma_I-1)+1)$$

and if $I \leq 2$ an upper bound is $3\sigma(\gamma_I)$.

^44 Since this is an interior solution and the problem is concave, the constraint $\gamma' \in [1,\gamma]$ is not binding.

^45 The proof is by induction. For $i = I$, $\gamma_I = 2^{I-I}(\gamma_I-1)+1 = \gamma_I$, which holds. Consider then some $i < I$, and suppose it holds for $i+1$. Then $\gamma_i = \frac{\gamma_{i+1}+1}{2} = \frac{(2^{i+1-I}(\gamma_I-1)+1)+1}{2} = 2^{i-I}(\gamma_I-1)+1$.
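The recursion (37) and its unrolled sum can be cross-checked numerically. The sketch below is ours and purely illustrative: it compares the recursive and closed-form expressions for the upper bound, for both the linear $\bar{\sigma}$ and the concave $\bar{\sigma}$ used later in the proof of Proposition 2.

```python
# Check that unrolling W(i, gamma) = 2*sbar((gamma+1)/2) + 2*W(i-1, (gamma+1)/2),
# with W = 0 for i <= 2, reproduces sum_{j=1}^{I-2} 2^j * sbar(2^(-j)(gamma-1)+1).
from math import log2

def W_bar(i, gamma, sbar):
    if i <= 2:
        return 0.0
    return 2 * sbar((gamma + 1) / 2) + 2 * W_bar(i - 1, (gamma + 1) / 2, sbar)

def W_bar_closed(I, gamma, sbar):
    return sum(2 ** j * sbar(2 ** (-j) * (gamma - 1) + 1) for j in range(1, I - 1))

for sbar in (lambda x: x, lambda x: 2 * log2(x) + 1):   # linear and concave cases
    for I in range(3, 12):
        for gamma in (1, 5, 33, 100):
            assert abs(W_bar(I, gamma, sbar) - W_bar_closed(I, gamma, sbar)) < 1e-9
```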

Lemma 10. For $r \neq 1$, $\sum_{j=a}^{b} r^j = \frac{r^a - r^{b+1}}{1-r}$ and $\sum_{j=a}^{b} j r^j = \frac{a r^a - b r^{b+1}}{1-r} + \frac{r^{a+1} - r^{b+1}}{(1-r)^2}$. For $r = 2$, $\sum_{j=a}^{b} 2^j = 2^{b+1} - 2^a$ and $\sum_{j=a}^{b} j 2^j = 2^a(2-a) + 2^{b+1}(b-1)$.

Proof. The first sum, $\sum_{j=a}^{b} r^j$, is the standard geometric series and its sum can be compactly written as $(r^a - r^{b+1})/(1-r)$ for $r \neq 1$. For $r = 2$, this is $2^{b+1} - 2^a$.

The second sum, $\sum_{j=a}^{b} j r^j$, a sort of weighted geometric series, has no commonly known formula, so we derive it. Define $S := \sum_{j=a}^{b} j r^j$. Then for $r \neq 1$,

$$\begin{aligned}
(1-r)S &= \sum_{j=a}^{b} j r^j - \sum_{j=a}^{b} j r^{j+1} \\
&= \sum_{j=a}^{b} j r^j - \sum_{j=a+1}^{b+1} (j-1) r^j \\
&= \sum_{j=a}^{b} j r^j - \sum_{j=a+1}^{b+1} j r^j + \sum_{j=a+1}^{b+1} r^j \\
&= \left( a r^a + \sum_{j=a+1}^{b} j r^j \right) - \left( \sum_{j=a+1}^{b} j r^j + (b+1) r^{b+1} \right) + \frac{r^{a+1} - r^{b+2}}{1-r}.
\end{aligned}$$

The first line is algebra, the second a change of indices, the third algebra, and the fourth separates out a term from each of the first two summations (and uses the geometric series formula to replace the third). Canceling the remaining summations, one then has

$$\begin{aligned}
(1-r)S &= a r^a - (b+1) r^{b+1} + \frac{r^{a+1} - r^{b+2}}{1-r} \\
&= a r^a - b r^{b+1} - r^{b+1} + \frac{r^{a+1} - r^{b+2}}{1-r} \\
&= a r^a - b r^{b+1} - \frac{(1-r) r^{b+1}}{1-r} + \frac{r^{a+1} - r^{b+2}}{1-r} \\
&= a r^a - b r^{b+1} + \frac{r^{b+2} - r^{b+1}}{1-r} + \frac{r^{a+1} - r^{b+2}}{1-r} \\
&= a r^a - b r^{b+1} + \frac{r^{a+1} - r^{b+1}}{1-r} \\
\Leftrightarrow S &= \frac{a r^a - b r^{b+1}}{1-r} + \frac{r^{a+1} - r^{b+1}}{(1-r)^2}.
\end{aligned}$$

Plugging in $r = 2$ gives $S = b 2^{b+1} - a 2^a + 2^{a+1} - 2^{b+1} = 2^a(2-a) + 2^{b+1}(b-1)$.
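Both $r = 2$ formulas admit a quick exhaustive spot-check; the snippet below is our illustration, not part of the paper.

```python
# Spot-check Lemma 10's r = 2 formulas over a small grid of (a, b):
#   sum 2^j       = 2^(b+1) - 2^a
#   sum j * 2^j   = 2^a * (2 - a) + 2^(b+1) * (b - 1)
for a in range(0, 6):
    for b in range(a, 12):
        assert sum(2 ** j for j in range(a, b + 1)) == 2 ** (b + 1) - 2 ** a
        assert (sum(j * 2 ** j for j in range(a, b + 1))
                == 2 ** a * (2 - a) + 2 ** (b + 1) * (b - 1))
```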

Proof of Proposition 2. Proof. From Lemma 1, the number of evaluations required by the algorithm is not greater than $3\bar{\sigma}(n') + \bar{W}(n,n')$ where $\bar{W}(n,n') := \sum_{j=1}^{I-2} 2^j \bar{\sigma}(2^{-j}(n'-1)+1)$ and $I = \lceil \log_2(n-1) \rceil + 1$. In the case of brute force, $\bar{\sigma}(\gamma) = \gamma$ is a valid upper bound on $\sigma(\gamma) = \gamma$. Plugging this into the expression for $\bar{W}(n,n')$, one has

$$\begin{aligned}
\bar{W}(n,n') &= (I-2)(n'-1) + \sum_{j=1}^{I-2} 2^j \\
&= (I-2)(n'-1) + 2^{I-1} - 2 \\
&= (I-2)(n'-1) + 2^{\lceil \log_2(n-1) \rceil} - 2 \\
&\leq (I-2)(n'-1) + 2^{\log_2(n-1)+1} - 2 \\
&\leq (I-2)(n'-1) + 2(n-1) - 2 \\
&= (I-2)(n'-1) + 2n - 4 \\
&= (\lceil \log_2(n-1) \rceil - 1)(n'-1) + 2n - 4 \\
&\leq (n'-1)\log_2(n-1) + 2n - 4,
\end{aligned}$$

where we have used Lemma 10 to arrive at the second line. So, no more than $(n'-1)\log_2(n-1) + 3n' + 2n - 4$ evaluations are required.

In the case of binary concavity, Lemma 6 shows $\sigma(\gamma) = 2\lceil \log_2(\gamma) \rceil - 1$ for $\gamma \geq 3$ and $\sigma(\gamma) = \gamma$ for $\gamma \leq 3$ is an upper bound. Now consider $\bar{\sigma}(\gamma) = 2\log_2(\gamma) + 1$. It is a strictly increasing, strictly concave, and differentiable function. For $\gamma = 1$ or 2, one can plug in values to find $\sigma(\gamma) \leq \bar{\sigma}(\gamma)$. Additionally, for $\gamma \geq 3$,

$$\sigma(\gamma) \leq 2\lceil \log_2(\gamma) \rceil - 1 \leq 2(1 + \log_2(\gamma)) - 1 = \bar{\sigma}(\gamma).$$

So, $\bar{\sigma}$ satisfies all the conditions of Lemma 1. Plugging this $\bar{\sigma}$ into the bound, one finds

$$\bar{W}(n,n') = \sum_{j=1}^{I-2} 2^j \left[1 + 2\log_2(2^{-j}(n'-1)+1)\right] = (2^{I-1}-2) + 2\sum_{j=1}^{I-2} 2^j \log_2(2^{-j}(n'-1)+1)$$

(using Lemma 10). To handle the $\log_2(2^{-j}(n'-1)+1)$ term, we break the summation into two parts, one with $2^{-j}(n'-1) < 1$ and one with $2^{-j}(n'-1) \geq 1$. We do this to exploit the following fact: for $x \geq 1$, $\log_2(x+1) \leq \log_2(x) + 1$ since they are equal at $x = 1$ and the right hand side grows more quickly in $x$ (i.e., the derivative of $\log_2(x+1)$ is less than the derivative of $\log_2(x)+1$). Let $J$ be such that $j > J$ implies $2^{-j}(n'-1) < 1$ and $j \leq J$ implies $2^{-j}(n'-1) \geq 1$. Then since $2^{-j}(n'-1) = 1$ for $j = \log_2(n'-1)$, $J$ is given by $\lfloor \log_2(n'-1) \rfloor$. Recall that in the statement of the proposition we assumed that $n' \geq 3$. So, $J \geq 1$. Then

$$\begin{aligned}
\bar{W}(n,n') &= (2^{I-1}-2) + 2\sum_{j=J+1}^{I-2} 2^j \log_2(\underbrace{2^{-j}(n'-1)}_{<1}+1) + 2\sum_{j=1}^{J} 2^j \log_2(\underbrace{2^{-j}(n'-1)}_{\geq 1}+1) \\
&\leq (2^{I-1}-2) + 2\sum_{j=J+1}^{I-2} 2^j + 2\sum_{j=1}^{J} 2^j \left(1 + \log_2(2^{-j}(n'-1))\right) \\
&= (2^{I-1}-2) + 2\sum_{j=J+1}^{I-2} 2^j + 2\sum_{j=1}^{J} 2^j + 2\sum_{j=1}^{J} 2^j \log_2(2^{-j}(n'-1)) \\
&= 3(2^{I-1}-2) + 2\sum_{j=1}^{J} 2^j \log_2(2^{-j}(n'-1)) \\
&= 3(2^{I-1}-2) + 2\sum_{j=1}^{J} 2^j \left(-j + \log_2(n'-1)\right) \\
&= 3(2^{I-1}-2) - 2\sum_{j=1}^{J} j 2^j + 2\log_2(n'-1)\sum_{j=1}^{J} 2^j.
\end{aligned}$$

The first line follows from the definition of $J$; the second from $\log_2(x+1) \leq 1 + \log_2(x)$ for $x \geq 1$; the third from algebra; the fourth from the standard geometric series formula; and the fifth and sixth from algebra. Then, using the weighted geometric sum found in Lemma 10, i.e., $\sum_{j=a}^{b} j 2^j = 2^a(2-a) + 2^{b+1}(b-1)$, the bound becomes

$$\begin{aligned}
&\phantom{=}\; 3(2^{I-1}-2) - 2(2^1(2-1) + 2^{J+1}(J-1)) + 2\log_2(n'-1)(2^{J+1}-2) \\
&= 3(2^{I-1}-2) - 4 - 2^{J+2}(J-1) + 2\log_2(n'-1)(2^{J+1}-2) \\
&\leq 3(2^{I-1}-2) - 4 - 2^{J+2}(J-1) + 2(J+1)(2^{J+1}-2) \\
&= 3(2^{I-1}-2) - 4 - 2^{J+2}(J-1) + (J+1)2^{J+2} - 4(J+1) \\
&= 3(2^{I-1}-2) - 4 - J 2^{J+2} + 2^{J+2} + J 2^{J+2} + 2^{J+2} - 4(J+1) \\
&= 3(2^{I-1}-2) - 4 + 2^{J+3} - 4J - 4 \\
&= 3 \cdot 2^{I-1} + 2^{J+3} - 4J - 14.
\end{aligned}$$

The first line applies the weighted geometric sum formula, the second simplifies, the third uses $\log_2(n'-1) \leq J+1$ (i.e., $\log_2(n'-1) \leq \lfloor \log_2(n'-1) \rfloor + 1$), and the remaining lines use algebra.

Now, substituting the expressions for $I$ and $J$,

$$\begin{aligned}
\bar{W}(n,n') &\leq 3 \cdot 2^{\lceil \log_2(n-1) \rceil + 1 - 1} + 2^{\lfloor \log_2(n'-1) \rfloor + 3} - 4\lfloor \log_2(n'-1) \rfloor - 14 \\
&\leq 3 \cdot 2^{1+\log_2(n-1)} + 2^{\log_2(n'-1)+3} - 4\lfloor \log_2(n'-1) \rfloor - 14 \\
&= 6(n-1) + 8(n'-1) - 4\lfloor \log_2(n'-1) \rfloor - 14 \\
&= 6n + 8n' - 4\lfloor \log_2(n'-1) \rfloor - 28 \\
&\leq 6n + 8n' - 4(\log_2(n'-1)-1) - 28 \\
&= 6n + 8n' - 4\log_2(n'-1) - 24.
\end{aligned}$$

The above expression provides a bound for $\bar{W}(n,n')$. Hence, the total number of evaluations—which must be less than $3\bar{\sigma}(n') + \bar{W}(n,n')$—cannot exceed

$$\begin{aligned}
3(2\log_2(n')+1) + 6n + 8n' - 4\log_2(n'-1) - 24 &= 6n + 8n' + 6\log_2(n') - 4\log_2(n'-1) - 21 \\
&\leq 6n + 8n' + 6(\log_2(n'-1)+1) - 4\log_2(n'-1) - 21 \\
&= 6n + 8n' + 2\log_2(n'-1) - 15.
\end{aligned}$$

The first line is algebra, the second again uses $\log_2(x+1) \leq 1 + \log_2(x)$ for $x \geq 1$ (note $n' \geq 3$ in the statement of the proposition), and the last simplifies.
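Both Proposition 2 bounds can be spot-checked by evaluating the exact sum for the upper bound directly. The following check is ours and not exhaustive; it confirms the brute-force and binary-concavity bounds on a small grid of $(n, n')$.

```python
# Numerical check of the two Proposition 2 bounds.
from math import ceil, log2

def W_bar(n, nprime, sbar):
    # exact evaluation of sum_{j=1}^{I-2} 2^j * sbar(2^(-j)(n'-1)+1), I = ceil(log2(n-1))+1
    I = ceil(log2(n - 1)) + 1
    return sum(2 ** j * sbar(2 ** (-j) * (nprime - 1) + 1) for j in range(1, I - 1))

for n in (4, 10, 50, 1000):
    for nprime in (3, 10, 50, 1000):
        # brute force: sbar(x) = x; bound is (n'-1)log2(n-1) + 3n' + 2n - 4
        total_bf = 3 * nprime + W_bar(n, nprime, lambda x: x)
        assert total_bf <= (nprime - 1) * log2(n - 1) + 3 * nprime + 2 * n - 4 + 1e-9
        # binary concavity: sbar(x) = 2 log2(x) + 1; bound is 6n + 8n' + 2 log2(n'-1) - 15
        sbar = lambda x: 2 * log2(x) + 1
        total_bc = 3 * sbar(nprime) + W_bar(n, nprime, sbar)
        assert total_bc <= 6 * n + 8 * nprime + 2 * log2(nprime - 1) - 15 + 1e-9
```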

Lemma 11. Define $m(a,b) = \lfloor \frac{a+b}{2} \rfloor$ for $b > a$ with $a,b \in \mathbb{Z}$. Then $m(a,b) - a + 1 \leq \lfloor \frac{b-a+1}{2} \rfloor + 1$ and $b - m(a,b) + 1 \leq \lfloor \frac{b-a+1}{2} \rfloor + 1$.

Proof. If exactly one of $a,b$ is odd, then $m = \frac{b+a-1}{2}$; otherwise, $m = \frac{b+a}{2}$. So,

$$m - a + 1 \leq \frac{b+a}{2} - a + 1 = \frac{b-a+1}{2} + \frac{1}{2} \leq \left\lfloor \frac{b-a+1}{2} \right\rfloor + 1.$$

Now, take the case of exactly one of $a,b$ being odd. Then $b-a$ is odd and $b-a+1$ is even. In this case,

$$b - m + 1 = b - \frac{b+a-1}{2} + 1 = \frac{b-a+1}{2} + 1 = \left\lfloor \frac{b-a+1}{2} \right\rfloor + 1.$$

If on the other hand $a,b$ are either both even or both odd, then $b-a+1$ is odd. In this case,

$$b - m + 1 = b - \frac{b+a}{2} + 1 = \frac{b-a+1}{2} + \frac{1}{2} = \left\lfloor \frac{b-a+1}{2} \right\rfloor + 1.$$
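Lemma 11 can be verified exhaustively over a small range; the check below is our illustration (note that Python's floor division matches $\lfloor \cdot \rfloor$ even for negative values).

```python
# Exhaustive check of Lemma 11 on a small range: with m(a,b) = floor((a+b)/2),
# both m - a + 1 and b - m + 1 are at most floor((b-a+1)/2) + 1.
for a in range(-10, 11):
    for b in range(a + 1, a + 30):
        m = (a + b) // 2
        half = (b - a + 1) // 2 + 1
        assert m - a + 1 <= half and b - m + 1 <= half
```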

Proof of Proposition 3. Proof. Denote by $C(z,\gamma)$ an upper bound on the cost of the usual binary monotonicity algorithm given $z$ states and $\gamma$ choices. Then note that the cost of solving for $g(\cdot,1)$ is less than $C(n_1,n')$, as is the cost of solving for $g(\cdot,n_2)$. Let $g(\cdot,\cdot;\pi)$ give the policy selected by the algorithm when the objective function is $\pi$. Let $T^{\text{exact}}(\underline{j},\overline{j};\pi)$ denote the exact cost of the algorithm for recovering $g(\cdot,j;\pi)$ for all $j \in \{\underline{j}+1,\dots,\overline{j}-1\}$ when the objective function is $\pi$ and $g(\cdot,\underline{j};\pi)$ and $g(\cdot,\overline{j};\pi)$ are known. (That is, the cost from Step 3 onward.) Define for $z > 2$ and $\gamma \geq 1$

$$T^*(z,\gamma) = \sup_{\underline{j},\overline{j},\pi} \; C(n_1,\gamma) + T^{\text{exact}}(\underline{j},m(\underline{j},\overline{j});\pi) + T^{\text{exact}}(m(\underline{j},\overline{j}),\overline{j};\pi)$$
$$\text{s.t. } \underline{j},\overline{j} \in \{1,\dots,n_2\}, \quad \overline{j}-\underline{j}+1 \leq z, \quad g(n_1,\overline{j};\pi) - g(1,\underline{j};\pi) + 1 \leq \gamma,$$

and define $T^*(2,\gamma) = 0$. Note that $T^*$ is weakly increasing in both arguments. Fix some $(\underline{j},\overline{j},\pi)$ with $\underline{j},\overline{j} \in \{1,\dots,n_2\}$ and $\underline{j} < \overline{j}$, and note that $(\underline{j},\overline{j},\pi)$ is in the choice set of the $T^*(\overline{j}-\underline{j}+1, g(n_1,\overline{j};\pi)-g(1,\underline{j};\pi)+1)$ problem. We now show $T^{\text{exact}}$ is bounded by $T^*$ in that $T^{\text{exact}}(\underline{j},\overline{j};\pi) \leq T^*(\overline{j}-\underline{j}+1, g(n_1,\overline{j};\pi)-g(1,\underline{j};\pi)+1)$. To see this, note that if $\overline{j}-\underline{j}+1 = 2$ then $T^{\text{exact}}(\underline{j},\overline{j};\pi) = 0$, in which case $T^*(\overline{j}-\underline{j}+1,\cdot) = 0$ by definition. On the other hand, if $\overline{j}-\underline{j}+1 > 2$, then

$$T^{\text{exact}}(\underline{j},\overline{j};\pi) \leq C(n_1, g(n_1,\overline{j};\pi)-g(1,\underline{j};\pi)+1) + T^{\text{exact}}(\underline{j},m(\underline{j},\overline{j});\pi) + T^{\text{exact}}(m(\underline{j},\overline{j}),\overline{j};\pi)$$

because $C$ bounds the cost of the one-dimensional algorithm. Comparing with the definition of $T^*$, $T^*(\overline{j}-\underline{j}+1, g(n_1,\overline{j};\pi)-g(1,\underline{j};\pi)+1)$ is necessarily larger because $(\underline{j},\overline{j},\pi)$ is in its choice set.

Now, using this bound and the definition of $T^*$ gives

$$\begin{aligned}
T^*(z,\gamma) &\leq \sup_{\underline{j},\overline{j},\pi} C(n_1,\gamma) + T^*(m-\underline{j}+1, \gamma'(\pi,\underline{j},\overline{j})) + T^*(\overline{j}-m+1, g(n_1,\overline{j};\pi) - g(1,m;\pi) + 1) \\
&= \sup_{\underline{j},\overline{j},\pi} C(n_1,\gamma) + T^*(m-\underline{j}+1, \gamma'(\pi,\underline{j},\overline{j})) + T^*\!\left(\overline{j}-m+1,\; \begin{aligned}&g(n_1,\overline{j};\pi) - g(1,\underline{j};\pi) + 1 \\ &\quad + g(1,\underline{j};\pi) - g(1,m;\pi)\end{aligned}\right) \\
&\leq \sup_{\underline{j},\overline{j},\pi} C(n_1,\gamma) + T^*(m-\underline{j}+1, \gamma'(\pi,\underline{j},\overline{j})) + T^*(\overline{j}-m+1, \gamma + g(1,\underline{j};\pi) - g(1,m;\pi)) \\
&= \sup_{\underline{j},\overline{j},\pi} C(n_1,\gamma) + T^*(m-\underline{j}+1, \gamma'(\pi,\underline{j},\overline{j})) \\
&\qquad\qquad + T^*(\overline{j}-m+1, \gamma + g(n_1,m;\pi) - g(1,m;\pi) + 1 - \gamma'(\pi,\underline{j},\overline{j}))
\end{aligned}$$

where $m = m(\underline{j},\overline{j})$ and we have defined $\gamma'(\pi,\underline{j},\overline{j}) := g(n_1,m;\pi) - g(1,\underline{j};\pi) + 1$. The first relation follows from $T^{\text{exact}}$ being less than $T^*$, the second from adding and subtracting $g(1,\underline{j};\pi)$, the third from the constraint that $g(n_1,\overline{j};\pi) - g(1,\underline{j};\pi) + 1 \leq \gamma$ and $T^*$ being increasing in its second argument, and the last from adding and subtracting $\gamma'$ and manipulation. With this definition of $\gamma'$, the constraint $g(n_1,\overline{j};\pi) - g(1,\underline{j};\pi) + 1 \leq \gamma$ is equivalent to $\gamma'(\pi,\underline{j},\overline{j}) \leq \gamma$.^46 So, $\gamma'(\pi,\underline{j},\overline{j}) \in [1,\gamma]$. Using our $\lambda$-based restriction, we have as an implication of it that $g(n_1,m;\pi) - g(1,m;\pi) + 1 \leq \lambda(g(n_1,\overline{j};\pi) - g(1,\underline{j};\pi) + 1) \leq \lambda\gamma$. So, because $T^*$ is increasing in its second argument,

$$T^*(z,\gamma) \leq \sup_{\underline{j},\overline{j},\pi} C(n_1,\gamma) + T^*(m-\underline{j}+1, \gamma'(\pi,\underline{j},\overline{j})) + T^*(\overline{j}-m+1, (1+\lambda)\gamma - \gamma'(\pi,\underline{j},\overline{j})).$$

Note now that $\pi$ only shows up in $\gamma'$. Since the choice set implies $\gamma' \in [1,\gamma]$, we can drop $\pi$ and allow $\gamma' \in [1,\gamma]$ to be chosen directly:

$$T^*(z,\gamma) \leq \sup_{\underline{j},\overline{j},\gamma' \in [1,\gamma]} C(n_1,\gamma) + T^*(m-\underline{j}+1, \gamma') + T^*(\overline{j}-m+1, (1+\lambda)\gamma - \gamma').$$

By Lemma 11, $m-\underline{j}+1$ and $\overline{j}-m+1$ are no greater than $\lfloor \frac{\overline{j}-\underline{j}+1}{2} \rfloor + 1$, which—because of the constraint $\overline{j}-\underline{j}+1 \leq z$—must be no greater than $\lfloor \frac{z}{2} \rfloor + 1$. So,

$$T^*(z,\gamma) \leq \sup_{\underline{j},\overline{j},\gamma' \in [1,\gamma]} C(n_1,\gamma) + T^*\!\left(\left\lfloor \frac{z}{2} \right\rfloor + 1, \gamma'\right) + T^*\!\left(\left\lfloor \frac{z}{2} \right\rfloor + 1, (1+\lambda)\gamma - \gamma'\right),$$

which no longer depends on $\underline{j}$ or $\overline{j}$. Hence, dropping these from the choice set, one has

$$T^*(z,\gamma) \leq C(n_1,\gamma) + \sup_{\gamma' \in [1,\gamma]} T^*\!\left(\left\lfloor \frac{z}{2} \right\rfloor + 1, \gamma'\right) + T^*\!\left(\left\lfloor \frac{z}{2} \right\rfloor + 1, (1+\lambda)\gamma - \gamma'\right).$$

Defining $T(z,\gamma) := C(n_1,\gamma) + \max_{\gamma' \in [1,\gamma]} T(\lfloor \frac{z}{2} \rfloor + 1, \gamma') + T(\lfloor \frac{z}{2} \rfloor + 1, (1+\lambda)\gamma - \gamma')$ for $z > 2$ and 0 otherwise, we then have $T(z,\gamma)$ as an upper bound for $T^*(z,\gamma)$ (by induction with $T^*(2,\gamma) = T(2,\gamma) = 0$ as the base case). Now, by Lemma 9, there is a unique sequence $\{z_i\}_{i=1}^{I}$ with $z_I = z$, $\lfloor \frac{z_i}{2} \rfloor + 1 = z_{i-1}$, $z_1 = 2$, and $z_i > 2$ for all $i > 1$, and $I = \lceil \log_2(z-1) \rceil + 1$. Then,

$$T(z_i,\gamma) = C(n_1,\gamma) + \sup_{\gamma' \in [1,\gamma]} T(z_{i-1},\gamma') + T(z_{i-1}, (1+\lambda)\gamma - \gamma')$$

for all $i$. For binary monotonicity with brute force grid search, the upper bound on function counts in Proposition 2 is linear and increasing in $\gamma$. So suppose $C$ is linear and increasing in $\gamma$. In that case, we will show $T(z_i,\gamma)$ is linear and increasing in $\gamma$ for all $i \geq 2$ and that

$$T(z_i,\gamma) = C(n_1,\gamma) + 2T\!\left(z_{i-1}, \gamma\frac{1+\lambda}{2}\right).$$

First note this is trivially the case for $i = 2$ since in that case $T(z_{i-1},\cdot) = T(z_1,\cdot) = T(2,\cdot) = 0$. Now, suppose that it holds for $i-1$. Then continuity gives that the maximum is attained, so sup can be replaced with max. Because of linearity, $T(z_{i-1},\gamma') + T(z_{i-1},(1+\lambda)\gamma-\gamma')$ is independent of $\gamma'$. Consequently, $\max_{\gamma'} T(z_{i-1},\gamma') + T(z_{i-1},(1+\lambda)\gamma-\gamma') = 2T(z_{i-1},\gamma\frac{1+\lambda}{2})$. Hence, $T(z_i,\gamma)$ will also be linearly increasing and satisfy the recursive formulation above. Now, expanding the recursive formulation and defining $c := (1+\lambda)/2$,

$$\begin{aligned}
T(z_I,\gamma) &= C(n_1,\gamma) + 2T(z_{I-1},\gamma c) \\
&= C(n_1,\gamma) + 2(C(n_1,\gamma c) + 2T(z_{I-2},(\gamma c)c)) \\
&= C(n_1,\gamma) + 2C(n_1,\gamma c) + 2^2 T(z_{I-2},\gamma c^2) \\
&= C(n_1,\gamma) + 2C(n_1,\gamma c) + \dots + 2^{I-2}C(n_1,\gamma c^{I-2}) + 2^{I-1}T(z_{I-(I-1)},\gamma c^{I-1}) \\
&= C(n_1,\gamma) + 2C(n_1,\gamma c) + \dots + 2^{I-2}C(n_1,\gamma c^{I-2}) + 2^{I-1}T(2,\gamma c^{I-1}) \\
&= C(n_1,\gamma) + 2C(n_1,\gamma c) + \dots + 2^{I-2}C(n_1,\gamma c^{I-2}) \\
&= \sum_{i=0}^{I-2} 2^i C(n_1,\gamma c^i).
\end{aligned}$$

Plugging in $c = (1+\lambda)/2$, $z_I = n_2$ (corresponding to the first time Step 3 is reached), and $\gamma = n'$ (corresponding to Step 3 being reached with the worst case $g(\cdot,1) = 1$ and $g(\cdot,n_2) = n'$),

$$T(n_2,n') = \sum_{i=0}^{I-2} 2^i C(n_1, 2^{-i}(1+\lambda)^i n')$$

where $I = \lceil \log_2(n_2-1) \rceil + 1$. With brute force grid search, a valid $C$ is $C(n_1,\gamma) = 2n_1 + \gamma(\log_2(n_1)+3)$. Then

$$\begin{aligned}
T(n_2,n') &= \sum_{i=0}^{I-2} 2^i \left[2n_1 + 2^{-i}(1+\lambda)^i n'(\log_2(n_1)+3)\right] \\
&= n'(\log_2(n_1)+3)\sum_{i=0}^{I-2}(1+\lambda)^i + 2n_1\sum_{i=0}^{I-2}2^i \\
&= n'(\log_2(n_1)+3)\frac{1-(1+\lambda)^{I-1}}{-\lambda} + 2n_1(2^{I-1}-1) \\
&= n'(\log_2(n_1)+3)\frac{(1+\lambda)^{I-1}-1}{\lambda} + 2n_1(2^{I-1}-1)
\end{aligned}$$

using the formulas in Lemma 10. Now, $I = \lceil \log_2(n_2-1) \rceil + 1$ implies $I-1 \leq \log_2(n_2-1)+1$. So, defining $\kappa := \log_2(1+\lambda)$ so that $1+\lambda = 2^{\kappa}$,

$$\begin{aligned}
T(n_2,n') &= n'(\log_2(n_1)+3)\frac{2^{\kappa(I-1)}-1}{\lambda} + 2n_1(2^{I-1}-1) \\
&\leq n'(\log_2(n_1)+3)\lambda^{-1}2^{\kappa(I-1)} + 2n_1(2^{I-1}-1) \\
&\leq n'(\log_2(n_1)+3)\lambda^{-1}2^{\kappa(\log_2(n_2-1)+1)} + 2n_1(2^{\log_2(n_2-1)+1}-1) \\
&= n'(\log_2(n_1)+3)\lambda^{-1}(2^{\log_2(n_2-1)})^{\kappa}2^{\kappa} + 2n_1(2(n_2-1)-1) \\
&= n'(\log_2(n_1)+3)\lambda^{-1}(n_2-1)^{\kappa}2^{\kappa} + 4n_1n_2 - 6n_1 \\
&\leq n'(\log_2(n_1)+3)\lambda^{-1}n_2^{\kappa}(1+\lambda) + 4n_1n_2 - 6n_1 \\
&= (1+\lambda)\lambda^{-1}\log_2(n_1)n'n_2^{\kappa} + 3(1+\lambda)\lambda^{-1}n'n_2^{\kappa} + 4n_1n_2 - 6n_1 \\
&= (\lambda^{-1}+1)\log_2(n_1)n'n_2^{\kappa} + 3(\lambda^{-1}+1)n'n_2^{\kappa} + 4n_1n_2 - 6n_1.
\end{aligned}$$

This bound does not include the cost of solving for $g(\cdot,1)$ and $g(\cdot,n_2)$ using the standard binary algorithm. Including this cost, the algorithm's total cost is less than

$$\begin{aligned}
2C(n_1,n') + T(n_2,n') &\leq 2n'(\log_2(n_1)+3) + 4n_1 + (1+\lambda^{-1})\log_2(n_1)n'n_2^{\kappa} + 3(1+\lambda^{-1})n'n_2^{\kappa} + 4n_1n_2 - 6n_1 \\
&\leq (1+\lambda^{-1})\log_2(n_1)n'n_2^{\kappa} + 3(1+\lambda^{-1})n'n_2^{\kappa} + 4n_1n_2 + 2n'\log_2(n_1) + 6n',
\end{aligned}$$

which is the bound stated in the proposition.

Now, suppose that $\lambda < 1$ provides a uniform bound on $(g(n_1,j)-g(1,j)+1)/(g(n_1,j+1)-g(1,j-1)+1)$ (this also implies $\lambda > 0$). So, $\kappa \in (0,1)$ (since $\kappa = \log_2(1+\lambda)$). To characterize the algorithm's $O(n_1n_2)$ behavior when $n_1 = n' =: n$ and $n_2 = \rho^{-1}n$, divide the bound by $n_1n_2 = n^2/\rho$ to arrive at

$$c\,\frac{\log_2(n)\,n^{1+\kappa}}{n^2} + 4 + \frac{o(n^2)}{n^2}$$

where $c$ is a positive constant. Then it is enough to show that $\log(n)n^{1+\kappa}$ (using that the natural log has the same asymptotics as $\log_2$) grows more slowly than $n^2$. The ratio $\log(n)n^{1+\kappa}/n^2$ equals $\log(n)/n^{1-\kappa}$. Using l'Hôpital's rule as $n \to \infty$, if the limit exists it is the same as the limit of $(1/n)/((1-\kappa)n^{-\kappa}) = n^{-1+\kappa}/(1-\kappa)$. This equals 0 since $\kappa \in (0,1)$. Consequently, the cost is $O(n_1n_2)$ with a hidden constant of 4.

^46 As an aside, note the important difference here from the one-dimensional case. In the one-dimensional case, evaluating $g$ at the midpoint provided a lower bound and upper bound that essentially split the search space in half. Here, that is the case if $g(\cdot,m)$ is a constant, in which case $g(n_1,m)-g(1,m)$ is zero and the bound on function counts is analogous to the $M_{\sigma}(z,\gamma)$ function defined in the main text. However, if $g(\cdot,m)$ is strictly ascending, and in particular if $g(1,m) = g(1,\underline{j})$ and $g(n_1,m) = g(n_1,\overline{j})$, then the $g(n_1,m)-g(1,m)$ term is essentially $\gamma-1$, an extra term not present in the one-dimensional case.
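The chain of inequalities above can be verified numerically for particular parameter values. The sketch below is ours (the function names and test values are illustrative): it evaluates the unrolled sum for $T(n_2,n')$ using the brute-force $C$ from Proposition 2 and confirms the proposition's final bound.

```python
# Numerical check of the Proposition 3 total-cost bound for sample parameters.
from math import ceil, log2

def prop3_bound_holds(n1, n2, nprime, lam):
    C = lambda gamma: 2 * n1 + gamma * (log2(n1) + 3)   # brute-force C(n1, gamma)
    I = ceil(log2(n2 - 1)) + 1
    c = (1 + lam) / 2
    # unrolled recursion: T(n2, n') = sum_{i=0}^{I-2} 2^i C(n1, c^i n')
    T = sum(2 ** i * C(c ** i * nprime) for i in range(I - 1))
    total = 2 * C(nprime) + T                           # plus cost of g(., 1), g(., n2)
    kappa = log2(1 + lam)
    bound = ((1 + 1 / lam) * log2(n1) * nprime * n2 ** kappa
             + 3 * (1 + 1 / lam) * nprime * n2 ** kappa
             + 4 * n1 * n2 + 2 * nprime * log2(n1) + 6 * nprime)
    return total <= bound

for lam in (0.1, 0.5, 0.9):
    for (n1, n2, nprime) in [(10, 50, 30), (100, 200, 100), (500, 1000, 500)]:
        assert prop3_bound_holds(n1, n2, nprime, lam)
```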

C.3 Equivalence between the two problems

This subsection establishes the equivalence between (1) and (4). The notation for the $\tilde{\pi}$ and $\pi$ problems is reversed from the main text. Suppose $I'(i) \subset \{1,\dots,n'\}$ may be empty, but is monotonically increasing. Let $I = \{i \mid i \in \{1,\dots,n\},\ I'(i) \neq \emptyset\}$ so that $i \in I$ has a feasible solution. For all $i \in I$, let

$$\Pi(i) := \max_{i' \in I'(i)} \pi(i,i')$$

and let $G(i) := \arg\max_{i' \in I'(i)} \pi(i,i')$. Let $\mathcal{G}$ be the set of optimal policies, i.e., for every $g \in \mathcal{G}$ and every $i \in I$, $g(i) \in G(i)$. Further, let $\underline{\pi}$ be such that $\pi(i,i') > \underline{\pi}$ for all $i \in I$ and $i' \in I'(i)$, for instance, $\underline{\pi} = \min_{i \in I}\left(\min_{i' \in I'(i)} \pi(i,i')\right) - 1$. Define

$$\tilde{\Pi}(i) = \max_{i' \in \{1,\dots,n'\}} \tilde{\pi}(i,i')$$

and

$$\tilde{\pi}(i,i') = \begin{cases} \pi(i,i') & \text{if } I'(i) \neq \emptyset \text{ and } i' \in I'(i) \\ \underline{\pi} & \text{if } I'(i) \neq \emptyset \text{ and } i' \notin I'(i) \\ \mathbf{1}[i'=1] & \text{if } I'(i) = \emptyset. \end{cases}$$

Note that by construction, $i' = 1$ is optimal whenever there is no feasible choice and $i' \in I'(i)$ is always preferable to $i' \notin I'(i)$ when a feasible choice exists. Let $\tilde{G}(i) := \arg\max_{i' \in \{1,\dots,n'\}} \tilde{\pi}(i,i')$. Further, let $\tilde{\mathcal{G}}$ be the set of optimal policies, i.e., for every $\tilde{g} \in \tilde{\mathcal{G}}$ and every $i \in \{1,\dots,n\}$, $\tilde{g}(i) \in \tilde{G}(i)$.

Lemma 12. All of the following are true:

1. $\tilde{\Pi}(i) = \Pi(i)$ for all $i \in I$.
2. $\tilde{G}(i) = G(i)$ for all $i \in I$.
3. If $G$ is ascending on $I$, then $\tilde{G}$ is ascending on $\{1,\dots,n\}$.

Proof. For the proof of the first claim, let $i \in I$. Then $I'(i) \neq \emptyset$. Therefore,

$$\tilde{\Pi}(i) = \max_{i' \in \{1,\dots,n'\}} \tilde{\pi}(i,i') = \max\left\{\underline{\pi},\ \max_{i' \in I'(i)} \pi(i,i')\right\} = \max_{i' \in I'(i)} \pi(i,i') = \Pi(i)$$

where the second equality follows from the definition of $\tilde{\pi}$ and the third equality is justified by $\pi(i,i') > \underline{\pi}$ for all feasible $i'$.

For the proof of the second claim, again let $i \in I$. Then $I'(i) \neq \emptyset$. As before, infeasible choices, i.e., $i' \in \{1,\dots,n'\} \setminus I'(i)$, are strictly suboptimal because any feasible choice $j' \in I'(i)$ delivers $\tilde{\pi}(i,j') > \underline{\pi} = \tilde{\pi}(i,i')$. Hence,

$$\tilde{G}(i) = \arg\max_{i' \in \{1,\dots,n'\}} \tilde{\pi}(i,i') = \arg\max_{i' \in I'(i)} \tilde{\pi}(i,i') = \arg\max_{i' \in I'(i)} \pi(i,i') = G(i)$$

(where the third equality follows from the definition of $\tilde{\pi}$).

To show the third claim, let $G$ be ascending on $I$. Now, let $i_1 \leq i_2$ and $g_1 \in \tilde{G}(i_1)$ and $g_2 \in \tilde{G}(i_2)$. We want to show that $\min\{g_1,g_2\} \in \tilde{G}(i_1)$ and $\max\{g_1,g_2\} \in \tilde{G}(i_2)$. Clearly this is the case if $g_1 \leq g_2$, so take $g_1 > g_2$. Then since $\tilde{G}(i) = \{1\}$ for all $i \notin I$ and $g_1 > g_2 \geq 1$, it must be that $i_1 \in I$ (otherwise, $g_1$ would have to be 1). Then, because $I'(i)$ is increasing, $i_2 \in I$. Hence, $\tilde{G}(i_1) = G(i_1)$ and $\tilde{G}(i_2) = G(i_2)$. So, $G$ ascending gives the desired result.
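The construction of $\tilde{\pi}$ can be illustrated concretely. In the sketch below (the problem instance is ours), infeasible choices receive a penalty value below every feasible payoff and a state with no feasible choice gets an objective maximized at $i' = 1$, so the transformed policy agrees with the original where feasible and is monotone on the whole state space, as Lemma 12 asserts.

```python
# Small illustration of the Lemma 12 transformation on a hypothetical problem.
n, nprime = 5, 4
feasible = {1: [], 2: [1, 2], 3: [1, 2, 3], 4: [2, 3, 4], 5: [2, 3, 4]}  # monotone I'(i)
pi = lambda i, ip: -(ip - i) ** 2                       # payoff on feasible pairs
pi_low = min(pi(i, ip) for i in range(1, n + 1) for ip in feasible[i]) - 1

def pi_tilde(i, ip):
    if not feasible[i]:
        return 1 if ip == 1 else 0                      # no feasible choice: i' = 1 wins
    return pi(i, ip) if ip in feasible[i] else pi_low   # infeasible: strictly suboptimal

# max(..., key=...) returns the leftmost maximizer, matching the algorithms'
# convention of preferring the left point when indifferent
g_tilde = {i: max(range(1, nprime + 1), key=lambda ip: pi_tilde(i, ip))
           for i in range(1, n + 1)}
assert all(g_tilde[i] <= g_tilde[i + 1] for i in range(1, n))   # ascending everywhere
for i in range(2, n + 1):
    assert g_tilde[i] in feasible[i]                    # agrees with original where feasible
```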

Definition 5. We say the problem is concave if for all i ∈ I, I 0 (i) = {1, . . . , n0 (i)} for some monotone increasing function n0 (i) and π(i, ·) is either first strictly increasing and then weakly decreasing; or is always weakly decreasing; or is always strictly increasing (where defined). Note that this formulation allows for multiple maxima but ensures that the argmax is an integer interval. The strictly increasing is necessary because the algorithms, when indifferent between two points, assume there is a maximum to the left of the two points. Note that if there is some strictly concave or strictly quasi-concave function f : R → R such that π(i, i0 ) = f (i0 ) for all i0 ∈ I 0 (i), then π satisfies the conditions. We now establish that applying any of combination of the algorithms (e.g., binary monotonicity with simple concavity) to the transformed problem delivers a correct solution (provided the underlying assumptions for using them are met). We do this through a series of lemmas culminating in Proposition 7. Lemma 13. If the problem is concave and π ˜ (i, j) ≥ π ˜ (i, j+1) for some j, then π ˜ (i, j) = maxi0 ∈{j,...,n0 } π ˜ (i, i0 ) (j is as least as good as anything to the right of it). If π ˜ (i, k − 1) < π ˜ (i, k) for some k, then π ˜ (i, k) = maxi0 ∈{1,...,k} π ˜ (i, i0 ) = maxi0 ∈{1,...,k} π(i, i0 ) (k is as least as good as anything to the left of it). Proof. First, we consider π ˜ (i, j) ≥ π ˜ (i, j + 1). Consider two cases. First, suppose i ∈ / I. Then I 0 (i) = ∅ and π ˜ (i, i0 ) = 1[i0 = 1]. Consequently, for any j, π ˜ (i, j) = maxi0 ∈{j,...,n0 } π ˜ (i, i0 ). In other words, j is weakly better than any value to the 66

right of it. Now, suppose i ∈ I. If j > n0 (i), then π ˜ (i, j) = π = maxi0 ∈{j,...,n0 } π ˜ (i, i0 ). If j = n0 (i), then π ˜ (i, j) > π = maxi0 ∈{j+1,...,n0 } π ˜ (i, i0 ) implying π ˜ (i, j) = maxi0 ∈{j,...,n0 } π ˜ (i, i0 ). If j < n0 (i), then max

i0 ∈{j,...,n0 }

π ˜ (i, i0 ) = max{

max

π ˜ (i, i0 ),

= max{

max

π ˜ (i, i0 ), π}

i0 ∈{j,...,n0 (i)} i0 ∈{j,...,n0 (i)}

= =

max

π ˜ (i, i0 )

max

π(i, i0 ).

i0 ∈{j,...,n0 (i)} i0 ∈{j,...,n0 (i)}

max

i0 ∈{n0 (i)+1,...,n0 }

π ˜ (i, i0 )}

All that remains to be shown for this case is π ˜ (i, j) = maxi0 ∈{j,...,n0 (i)} π(i, i0 ). Since π ˜ (i, j) = π(i, j) and π ˜ (i, j + 1) = π(i, j + 1), the hypothesis gives π(i, j) ≥ π(i, j + 1). Because the problem is concave and π(i, ·) is weakly decreasing from j to j + 1, it must be weakly decreasing from j to n0 (i). Hence, π(i, j) = maxi0 ∈{j,...,n0 (i)} π(i, i0 ). So, π ˜ (i, j) = π(i, j) = maxi0 ∈{j,...,n0 (i)} π(i, i0 ) = maxi0 ∈{j,...,n0 } π ˜ (i, i0 ). Now, we prove the case for π ˜ (i, k − 1) < π ˜ (i, k). In this case, k − 1 and k must both be feasible, i.e. k ≤ n0 (i), because (1) if they were both infeasible, then π ˜ (i, k − 1) = π = π ˜ (i, k) and (2) if only k were infeasible, then π ˜ (i, k − 1) > π = π ˜ (i, k). Given that k − 1 and k are feasible, π ˜ (i, k − 1) = π(i, k − 1) and π ˜ (i, k) = π(i, k). Since π(i, ·) is strictly increasing until it switches to weakly decreasing, π(i, 1) < . . . < π(i, k − 1) < π(i, k). Hence π(i, k) = maxi0 ∈{1,...,k} π(i, i0 ). Since all of 1, . . . , k are feasible, π ˜ (i, k) = maxi0 ∈{1,...,k} π ˜ (i, i0 ) = maxi0 ∈{1,...,k} π(i, i0 ).

Lemma 14. Suppose it is known that G̃(i) ∩ {a, …, b} is nonempty. Then brute force applied to max_{i′∈{a,…,b}} π̃(i, i′) delivers an optimal solution, i.e., letting ĝ be the choice the algorithm delivers, ĝ ∈ G̃(i). Additionally, if the problem is concave, then the simple concavity and binary concavity algorithms also deliver an optimal solution.

Proof. First, suppose i ∉ I so that π̃(i, i′) = 1[i′ = 1]. Then it must be that a = 1 since G̃(i) = {1}. Brute force clearly finds the optimum since it checks every value of i′. Simple concavity will compare i′ = a = 1 against i′ = a+1 = 2 and find i′ = 2 is strictly worse. So, it stops and gives ĝ = 1, implying ĝ ∈ G̃(i) = {1}. Binary concavity first checks whether b − a + 1 ≤ 2. If so, it is the same as brute force. If not, it checks whether b − a + 1 ≤ 3. If so, then b − a + 1 = 3 and it does a comparison of either (1) a and m = (a+b)/2, in which case it correctly identifies the maximum as a, or (2) m and b, in which case it drops b from the search space and does a brute force comparison of a and a+1 (when it goes to step 2). If b − a + 1 > 3, it will evaluate the midpoint m = ⌊(a+b)/2⌋ and m+1 and find π̃(i, m) = π̃(i, m+1) = 0. It will then proceed to step 2, searching for the

optimum in {1, …, m} with a = 1 and b = m in the next iteration of the recursive algorithm. This proceeds until b − a + 1 ≤ 3, where it then correctly identifies the maximum (as was just discussed). Therefore, binary concavity finds a correct solution, ĝ ∈ G̃(i).

Now, suppose i ∈ I. Because brute force will evaluate π̃(i, ·) at every i′ ∈ {a, …, b}, it finds ĝ ∈ G̃(i). Now, suppose the problem is concave. The simple concavity algorithm evaluates π̃(i, i′) at i′ ∈ {a, …, b} sequentially until it reaches an x ∈ {a+1, …, b} such that π̃(i, x−1) ≥ π̃(i, x). If this stopping rule is not triggered, then simple concavity is identical to brute force and so finds an optimal solution. So, it suffices to consider otherwise. In this case, x−1 satisfies the conditions for “j” in Lemma 13 and hence π̃(i, x−1) = max_{i′∈{x−1,…,n′}} π̃(i, i′). By virtue of not having stopped until x−1, π̃(i, x−1) ≥ max_{i′∈{a,…,x−1}} π̃(i, i′). Consequently, π̃(i, x−1) ≥ max_{i′∈{a,…,x−1}∪{x−1,…,n′}} π̃(i, i′) = max_{i′∈{a,…,n′}} π̃(i, i′). Since a maximum is known to be in {a, …, b},

    Π̃(i) = max_{i′∈{a,…,b}} π̃(i, i′) ≤ max_{i′∈{a,…,n′}} π̃(i, i′) ≤ π̃(i, x−1) ≤ Π̃(i).

So, π̃(i, x−1) = Π̃(i), giving x−1 ∈ G̃(i).

Now consider the binary concavity algorithm. If b ≤ a+1 (so that the size of the search space, b − a + 1, is 1 or 2), the algorithm is the same as brute force and so finds a maximum. If b = a+2 (a search space of size 3), the algorithm goes to either step 3(a) or step 3(b). In step 3(a), it stops if π̃(i, a) > π̃(i, m) (where m = (a+b)/2), taking the maximum as a, and otherwise does the same as brute force. So, suppose the stopping condition is satisfied. A maximum is a as long as π̃(i, a) = max_{i′∈{a,…,b}} π̃(i, i′), which it is since a satisfies the conditions for “j” in Lemma 13. In step 3(b), it stops if π̃(i, b) > π̃(i, m), taking the maximum as b, and otherwise does the same as brute force. So, suppose the stopping condition is satisfied. A maximum is b as long as π̃(i, b) = max_{i′∈{a,…,b}} π̃(i, i′), which is true since b satisfies all the conditions for “k” in Lemma 13. If b ≥ a+3 (a search space of 4 or more), binary concavity goes to step 4 of the algorithm. In this case, it evaluates at two points, m = ⌊(a+b)/2⌋ and m+1. If π̃(i, m) ≥ π̃(i, m+1), it assumes a maximum is in {a, …, m}. Since m satisfies the conditions for “j” in Lemma 13, π̃(i, m) ≥ max_{i′∈{m,…,b}} π̃(i, i′), which justifies this assumption. If π̃(i, m) < π̃(i, m+1), it instead assumes a maximum is in {m+1, …, b}. This again is justified since m+1 satisfies all the conditions for “k” in Lemma 13 and so m+1 is better than any value of i′ < m+1. The algorithm repeatedly divides {a, …, b} into either {a, …, m} or {m+1, …, b} until the size of the search space is either two or three. Since we have already shown the algorithm correctly identifies a maximum when the search space is of size two or three (i.e., b = a+1 or b = a+2), the algorithm correctly finds the maximum for larger search spaces as long as this subdivision stops in a finite number of iterations (since then induction can be applied). Lemma 6 shows the required number of function evaluations is finite, and so this holds.
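The bisection step just analyzed can be sketched as follows (hypothetical Python; this simplified version re-evaluates the payoff at midpoints rather than reusing endpoint evaluations as the paper's steps 2–3 do, so it is correct but not evaluation-minimal):

```python
def argmax_binary_concavity(pi, a, b):
    """Find a maximizer of a concave payoff pi on {a, ..., b} by bisection.

    Each iteration compares the midpoint m with m + 1; by Lemma 13, the
    half that is retained must contain a maximum, so the search space
    halves until at most two points remain.
    """
    while b - a + 1 > 2:
        m = (a + b) // 2
        if pi(m) >= pi(m + 1):
            b = m       # m is at least as good as anything to its right
        else:
            a = m + 1   # m + 1 is at least as good as anything to its left
    return a if pi(a) >= pi(b) else b
```

The `>=` comparison implements the convention in Definition 5: when indifferent between two points, the algorithm keeps the left one, which is why strict increase on the rising segment is required.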

Proposition 7. Any of brute force, simple, or binary monotonicity combined with any of brute force, simple, or binary concavity delivers an optimal solution provided G is ascending and the problem is concave as required by the algorithm choices. That is, letting ĝ be the policy function the algorithm finds, ĝ ∈ G̃.

Proof. Each of the brute force, simple, and binary monotonicity algorithms can be thought of as iterating through states i (in some order that, in the case of binary monotonicity, depends on π̃) with a search space {a, …, b}. If every state is visited and an optimal choice is found at each state, then an optimal solution is found. So, it suffices to show that each of the brute force, simple, and binary monotonicity algorithms explores every state i ∈ {1, …, n} and that at each state, the following conditions are met so that Lemma 14 can be applied: (1) {a, …, b} ⊂ {1, …, n′}; (2) a ≤ b; and (3) G̃(i) ∩ {a, …, b} ≠ ∅. An application of Lemma 14 then gives ĝ(i) ∈ G̃(i) (provided an appropriate concavity algorithm is used).

Brute force monotonicity trivially explores all states i ∈ {1, …, n} sequentially. At each i, a = 1 and b = n′. Consequently, G̃(i) ∩ {a, …, b} ≠ ∅ and Lemma 14 can be applied.

Now, we prove simple monotonicity and binary monotonicity deliver a correct solution when G is ascending, which, by Lemma 12, gives that G̃ is ascending.

The simple monotonicity algorithm explores all states i ∈ {1, …, n} sequentially, always with b = n′ (and so a ≤ b). For i = 1, a = 1 and so G̃(1) ∩ {a, …, b} ≠ ∅. Consequently, Lemma 14 gives that ĝ(1) ∈ G̃(1). Now, consider some i > 1 and suppose for induction that ĝ(i−1) ∈ G̃(i−1). Because G̃ is ascending and ĝ(i−1) ∈ G̃(i−1), any ġ ∈ G̃(i) implies max{ĝ(i−1), ġ} ∈ G̃(i). So, G̃(i) ∩ {ĝ(i−1), …, n′} ≠ ∅. Hence, Lemma 14 applies, and ĝ(i) ∈ G̃(i), completing the induction argument.

Now consider the binary monotonicity algorithm. If n = 1 or n = 2, the algorithm is the same as simple monotonicity and so delivers a correct solution.
If n > 2, then the algorithm first correctly identifies ĝ(1) (by brute force) and ĝ(n) (using the same argument as simple monotonicity). It then defines i̲ = 1 and ī = n and maintains the assumption that ĝ(i̲) ∈ G̃(i̲) and ĝ(ī) ∈ G̃(ī). The goal of step 2 is to find the optimal solution for all i ∈ {i̲, …, ī}. The algorithm stops at step 2(a) if ī ≤ i̲ + 1, in which case this objective is clearly met since {i̲, …, ī} = {i̲, ī}. If the algorithm does not stop, then it computes ĝ(m) for m = ⌊(i̲ + ī)/2⌋ using the search space {ĝ(i̲), …, ĝ(ī)}. By Lemma 14, an optimum is found as long as G̃(m) ∩ {ĝ(i̲), …, ĝ(ī)} ≠ ∅. If ĝ(i̲) ∈ G̃(i̲) and ĝ(ī) ∈ G̃(ī), then for any ġ ∈ G̃(m), G̃ ascending gives min{ĝ(ī), max{ĝ(i̲), ġ}} ∈ G̃(m). So, G̃(m) ∩ {ĝ(i̲), …, ĝ(ī)} ≠ ∅ if ĝ(i̲) ∈ G̃(i̲) and ĝ(ī) ∈ G̃(ī). This holds because of the algorithm's maintained assumptions.⁴⁷ So, if every i ∈ {2, …, n−1} is the midpoint of some (i̲, ī) after iterating some number of times, the proof is complete. In other words, since the algorithm only solves for the optimal policy at midpoints once it reaches step 2, we need to prove every state (except for i = 1 and i = n) is eventually a midpoint.

To show that every i ∈ {2, …, n−1} is a midpoint of some interval reached in the recursion, fix an arbitrary such i and suppose not. Define (i̲_1, ī_1) = (1, n). When step 2 is first reached,

⁴⁷ Formally, it can be shown through induction. It is true at the first instance of step 2. Since in step 2(c) the algorithm then divides into {i̲, …, m} and {m, …, ī}, it is also true at the next iteration. Consequently, induction gives that a maximum is found for every midpoint.


i ∈ {i̲_1 + 1, …, ī_1 − 1}. Now, uniquely and recursively define (i̲_k, ī_k) to be the one of (i̲_{k−1}, m) and (m, ī_{k−1}), with m = ⌊(i̲_{k−1} + ī_{k−1})/2⌋, such that i ∈ {i̲_k + 1, …, ī_k − 1} (because i is assumed to never be a midpoint, this is well-defined). Now, consider the cardinality of {i̲_k, …, ī_k}, defining it as N_k = ī_k − i̲_k + 1. By construction, i ∈ {i̲_k + 1, …, ī_k − 1} for each k. So, a contradiction is reached if {i̲_k + 1, …, ī_k − 1} = ∅, which is equivalent to N_k ≤ 2. So, it must be that N_k ≥ 3 for all k. If N_{k−1} is odd, then N_k = (N_{k−1} + 1)/2. If N_{k−1} is even, N_k ≤ N_{k−1}/2 + 1. So, in either case, N_k ≤ N_{k−1}/2 + 1. Defining M_k recursively by M_1 = N_1 and M_k = M_{k−1}/2 + 1, one can show by induction that N_k ≤ M_k for all k. Because N_k ≥ 3 for all k, M_k ≥ 3 for all k. Hence M_k − M_{k−1} = 1 − M_{k−1}/2 ≤ 1 − 3/2 = −1/2. Hence M_k ≤ M_{k−1} − 1/2. Therefore, M_k will be less than three in a finite number of iterations, which gives a contradiction.
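Putting the pieces together, the binary monotonicity recursion just analyzed can be sketched as follows (hypothetical Python, not the authors' Fortran code; brute force is used within each restricted search space, and the calls solve(lo, m) and solve(m, hi) mirror the division into {i̲, …, m} and {m, …, ī} in step 2(c)):

```python
def binary_monotonicity(pi, n, n_choices):
    """Solve g(i) = argmax_{i'} pi(i, i') for all i in {1, ..., n},
    exploiting monotonicity of g to restrict each search space."""
    def argmax(i, a, b):  # brute force over the restricted space {a, ..., b}
        return max(range(a, b + 1), key=lambda ip: pi(i, ip))

    g = {1: argmax(1, 1, n_choices)}  # step 1: solve the endpoints first
    if n == 1:
        return g
    g[n] = argmax(n, g[1], n_choices)  # monotonicity: g(n) >= g(1)

    def solve(lo, hi):  # the optimal policy at lo and hi is already known
        if hi - lo + 1 <= 2:  # step 2(a): no interior states remain
            return
        m = (lo + hi) // 2
        g[m] = argmax(m, g[lo], g[hi])  # g(m) lies in {g(lo), ..., g(hi)}
        solve(lo, m)  # step 2(c): recurse on both halves
        solve(m, hi)

    solve(1, n)
    return g

# Growth-model-style payoff whose optimal policy is g(i) = i:
g = binary_monotonicity(lambda i, ip: -(ip - i) ** 2, 8, 8)
```

Replacing the inner brute-force argmax with a concavity-exploiting search (when the problem is concave) yields the combined algorithms covered by Proposition 7.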
