A Divide and Conquer Algorithm for Exploiting Policy Function Monotonicity Grey Gordon and Shi Qiu∗ July 29, 2017

Abstract

A divide and conquer algorithm for exploiting policy function monotonicity is proposed and analyzed. To solve a discrete problem with n states and n choices, the algorithm requires at most n log₂(n) + 5n objective function evaluations. In contrast, existing methods for non-concave problems require n² evaluations in the worst case. For concave problems, the solution technique can be combined with a method exploiting concavity to reduce evaluations to 14n + 2 log₂(n). A version of the algorithm exploiting monotonicity in two state variables allows for even more efficient solutions. The algorithm can also be efficiently employed in a common class of problems that do not have monotone policies, including problems with many state and choice variables. In the sovereign default model of Arellano (2008) and in the real business cycle model, the algorithm reduces run times by an order of magnitude for moderate grid sizes and orders of magnitude for larger ones. Sufficient conditions for monotonicity and code are provided.

Keywords: Computation, Monotonicity, Grid Search, Discrete Choice, Sovereign Default
JEL Codes: C61, C63, E32, F34



Grey Gordon (corresponding author), Indiana University, Department of Economics, [email protected]. Shi Qiu, Indiana University, Department of Economics, [email protected]. The authors thank Kartik Athreya, Bob Becker, Alexandros Fakos, Filomena Garcia, Bulent Guler, Daniel Harenberg, Juan Carlos Hatchondo, Aaron Hedlund, Chaojun Li, Lilia Maliar, Serguei Maliar, Amanda Michaud, Jim Nason, Julia Thomas, Nora Traum, Todd Walker, Xin Wei, and David Wiczer, as well as the editor and three anonymous referees. We also thank participants at the Econometric Society World Congress 2015 and the Midwest Macro Meetings 2015. Code is provided at https://sites.google.com/site/greygordon/code. Any mistakes are our own.


1 Introduction

Many optimal control problems in economics are either naturally discrete or can be discretized. However, solving these problems can be costly. For instance, consider a simple growth model where the state k and choice k′ are restricted to lie in {1, . . . , n}. One way to find the optimal policy g is to evaluate lifetime utility at every k′ for every k. The n² cost of this brute force approach grows quickly in n. In this paper, we propose a divide and conquer algorithm that drastically reduces this cost for problems with monotone policy functions. When applied to the growth model, the algorithm first solves for g(1) and g(n). It then solves for g(n/2) by only evaluating utility at k′ greater than g(1) and less than g(n) since monotonicity of g gives g(n/2) ∈ {g(1), . . . , g(n)}. Similarly, once g(n/2) is known, the algorithm uses it as an upper bound in solving for g(n/4) and a lower bound in solving for g(3n/4). Continuing this subdivision recursively leads to further improvements, and we show this method—which we refer to as binary monotonicity—requires no more than n log₂(n) + 5n objective function evaluations in the worst case.

When the objective function is concave, further improvements can be made. For example, binary monotonicity gives that g(n/2) is in {g(1), . . . , g(n)}, but the maximum within this search space can be found in different ways. One way is brute force, i.e., checking every value. However, with concavity, one may check g(1), g(1) + 1, . . . sequentially and stop as soon as the objective function decreases. We refer to this approach as simple concavity. An alternative, Heer and Maußner's (2005) method, repeatedly discards half of the search space. We prove that this approach, which we will refer to as binary concavity, combined with binary monotonicity computes the optimum in at most 14n + 2 log₂(n) function evaluations.

For problems with non-concave objectives like the Arellano (2008) sovereign default model, binary monotonicity vastly outperforms brute force and simple monotonicity both theoretically and quantitatively. Theoretically, simple monotonicity requires n² evaluations in the worst case (the same as brute force). Consequently, assuming worst-case behavior for all the methods, simple monotonicity is 8.7, 35.9, and 66.9 times slower than binary monotonicity for n equal to 100, 500, and 1000, respectively. While this worst-case behavior could be misleading, in practice we find it is not. For instance, in the Arellano (2008) model, we find binary monotonicity is 5.1, 21.4, and 40.1 times faster than simple monotonicity for grid sizes of 100, 500, and 1000, respectively. Similar results hold in a real business cycle (RBC) model when not exploiting concavity. Binary monotonicity vastly outperforms simple monotonicity because the latter is only about twice as fast as brute force.

For problems with concave objectives like the RBC model, we find, despite its good theoretical properties, that binary monotonicity with binary concavity is only the second fastest combination of the nine possible pairings of monotonicity and concavity techniques. Specifically, simple monotonicity with simple concavity is around 20% faster, requiring only 3.0 objective function evaluations per state compared to 3.7 for binary monotonicity with binary concavity. While somewhat slower for the RBC model, binary monotonicity with binary concavity has guaranteed O(n) performance that may prove useful in different setups.

So far we have described binary monotonicity as it applies in the one-state-variable case, but it can also be used to exploit monotonicity in two state variables. Quantitatively, we show a two-state binary monotonicity algorithm further reduces evaluation counts per state (to 2.9 with brute force and 2.2 with binary concavity in the RBC example above), which makes it several times faster than the one-state algorithm. Theoretically, we show for a class of optimal policies that the two-state algorithm—without any assumption of concavity—requires at most four evaluations per state asymptotically. As knowing the true policy and simply recovering the value function using it would require one evaluation per state, this algorithm delivers very good performance.

We also show binary monotonicity can be used in a class of problems that does not have monotone policies. Specifically, it can be used for problems of the form max_{i′} u(z(i) − w(i′)) + W(i′) where i and i′ are indices in possibly multidimensional grids and u is concave, increasing, and differentiable. If z and W are increasing, one can show—using sufficient conditions we provide—that the optimal policy is monotone. However, even if they are not increasing, sorting z and W transforms the problem into one that does have a monotone policy and so allows binary monotonicity to be used. We establish this type of algorithm is O(n log n) inclusive of sorting costs and show it is several times faster than existing grid search methods in solving a sovereign default model with capital that features portfolio choice.

For problems exhibiting global concavity, many attractive solution methods exist. These include fast and globally accurate methods such as projection (including Smolyak sparse grid and cluster grid methods), Carroll's (2006) endogenous gridpoints method (EGM), and Maliar and Maliar's (2013) envelope condition method (ECM), as well as fast and locally accurate methods such as linearization and higher-order perturbations. Judd (1998) and Schmedders and Judd (2014) provide useful descriptions of these methods that are, in general, superior to grid search in terms of speed and usually accuracy, although not necessarily robustness.1

In contrast, few computational methods are available for problems with non-concavities, such as problems commonly arising in discrete choice models. The possibility of multiple local maxima makes working with any method requiring first order conditions (FOCs) perilous, which discourages the use of the methods mentioned above. As a result, researchers often resort to value function iteration with grid search. As binary monotonicity is many times faster than simple monotonicity when not exploiting concavity, its use in discrete choice models and other models with non-concavities seems particularly promising.

However, recent research has looked at ways of solving non-concave problems using FOCs. Fella (2014) lays out a generalized EGM algorithm (GEGM) for non-concave problems that finds all points satisfying the FOCs and uses a value function iteration step to distinguish global maxima from local maxima. The algorithm in Iskhakov, Jørgensen, Rust, and Schjerning (2016) is qualitatively similar, but they identify the global maxima in a different way and show that adding i.i.d. taste shocks facilitates computation.

1 For a comparison in the RBC context, see Aruoba, Fernández-Villaverde, and Rubio-Ramírez (2006). Maliar and Maliar (2014) and a 2011 special issue of the Journal of Economic Dynamics and Control (see Den Haan, Judd, and Juillard, 2011) evaluate many of these methods and their variations in the context of a large-scale multi-country RBC model.


Maliar and Maliar's (2013) ECM does not necessitate the use of FOCs, and Arellano, Maliar, Maliar, and Tsyrennikov (2016) use it to solve the Arellano (2008) model. However, because of convergence issues, they use ECM only as a refinement to a solution computed by grid search (Arellano et al., 2016, p. 454). While these derivative-based methods will generally be more accurate than a purely grid-search-based method, binary monotonicity solves the problem without adding shocks, requires no derivative computation, and is simple. We also show in the working paper Gordon and Qiu (2017) that binary monotonicity can be used with continuous choice spaces to significantly improve accuracy. Additionally, some of these methods either require (such as GEGM) or benefit from (such as ECM) a grid search component, which allows binary monotonicity to be useful even as part of a more complicated algorithm.

Puterman (1994) and Judd (1998) discuss a number of existing methods (beyond what we have mentioned thus far) that can be used in discrete optimal control problems. Some of these methods, such as policy iteration, multigrids, and action elimination procedures, are complementary to binary monotonicity. Others, such as the Gauss-Seidel methods or splitting methods, could not naturally be employed while simultaneously using binary monotonicity.2 In contrast to binary monotonicity, most of these methods try to produce good value function guesses and so only apply in the infinite-horizon case.

Exploiting monotonicity and concavity is not a new idea, and—as Judd (1998) pointed out—the "order in which we solve the various [maximization] problems is important in exploiting" monotonicity and concavity (p. 414). The quintessential ideas behind exploiting monotonicity and concavity date back to at least Christiano (1990); simple monotonicity and concavity as used here are from Judd (1998); and Heer and Maußner (2005) proposed binary concavity, which is qualitatively similar to an adaptive grid method proposed by Imrohoroğlu, Imrohoroğlu, and Joines (1993). What is new here is that we exploit monotonicity in a novel and efficient way in both one and two dimensions, provide theoretical cost bounds, and show binary monotonicity's excellent quantitative performance. Additionally, we provide code and sufficient conditions for policy function monotonicity.

The rest of the paper is organized as follows. Section 2 lays out the algorithm for exploiting monotonicity in one state variable and characterizes its performance theoretically and quantitatively. Section 3 extends the algorithm to exploit monotonicity in two state variables. Section 4 shows how the algorithm can be applied to the class of problems with non-monotone policies. Section 5 gives sufficient conditions for policy function monotonicity. Section 6 concludes. The appendices provide additional algorithm, calibration, computation, and performance details, as well as examples and proofs.

2 The operations research literature has also developed algorithms that approximately solve dynamic programming problems. Papadaki and Powell (2002, 2003) and Jiang and Powell (2015) aim to preserve monotonicity of value and policy functions while receiving noisy updates of the value function at simulated states. While they exploit monotonicity in a nontraditional order, they do so in a random one and do not compute an exact solution.


2 Binary monotonicity in one state

This section formalizes the binary monotonicity algorithm for exploiting monotonicity in one state variable, illustrates how it and the existing grid search algorithms work, and analyzes its properties theoretically and quantitatively.

2.1 Binary monotonicity, existing algorithms, and a simple example

Our focus is on solving
$$\Pi(i) = \max_{i' \in \{1, \dots, n'\}} \pi(i, i') \qquad (1)$$

for i ∈ {1, . . . , n} with an optimal policy g. We say g is monotone (increasing) if g(i) ≤ g(j) whenever i ≤ j. For a concrete example, consider the problem of optimally choosing next period capital k′ given a current capital stock k where both of these lie in a grid K = {k₁, . . . , kₙ} having kⱼ < kⱼ₊₁ for all j. For a period utility function u, a production function F, a depreciation rate δ, a time discount factor β, and a guess on the value function V₀, the Bellman update can be written
$$V(k_i) = \max_{i' \in \{1, \dots, n\}} u(-k_{i'} + F(k_i) + (1 - \delta)k_i) + \beta V_0(k_{i'}) \qquad (2)$$

with the optimal policy g(i) given in terms of indices. Then (2) fits the form of (1). Of course, not every choice is necessarily feasible; there may be multiple optimal policies; there may be multiple state and choice variables; and the choice space may be continuous. However, binary monotonicity handles or can be adapted to handle all these issues.3

Our algorithm computes the optimal policy g and the optimal value Π using divide and conquer. The algorithm is as follows:

1. Initialization: Compute g(1) and Π(1) by searching over {1, . . . , n′}. If n = 1, STOP. Compute g(n) and Π(n) by searching over {g(1), . . . , n′}. Let $\underline{i} = 1$ and $\bar{i} = n$. If n = 2, STOP.

2. At this step, $(g(\underline{i}), \Pi(\underline{i}))$ and $(g(\bar{i}), \Pi(\bar{i}))$ are known. Find an optimal policy and value for all $i \in \{\underline{i}, \dots, \bar{i}\}$ as follows:

   (a) If $\bar{i} = \underline{i} + 1$, STOP: For all $i \in \{\underline{i}, \dots, \bar{i}\} = \{\underline{i}, \bar{i}\}$, g(i) and Π(i) are known.

   (b) For the midpoint $m = \lfloor (\underline{i} + \bar{i})/2 \rfloor$, compute g(m) and Π(m) by searching over $\{g(\underline{i}), \dots, g(\bar{i})\}$.

   (c) Divide and conquer: Go to (2) twice, first computing the optimum for $i \in \{\underline{i}, \dots, m\}$ and then for $i \in \{m, \dots, \bar{i}\}$. In the first case, redefine $\bar{i} := m$; in the second, redefine $\underline{i} := m$.

3 Beyond the two-state binary monotonicity algorithm in Section 3, Appendix E shows how multiple state or choice variables can be implicit in the i and i′ and exploits that fact. Appendix D (E) shows how additional state (choice) variables, for which one does not want to or cannot exploit monotonicity, may be handled. Appendix D also shows that, under mild conditions, binary monotonicity will correctly deliver an optimal policy (1) even if there are multiple optimal policies and (2) even if there are nonfeasible choices. For the latter result, π(i, i′) must be assigned a sufficiently large negative number when an (i, i′) combination is not feasible. Moreover, continuous choice spaces can be handled without issue as discussed in the working paper Gordon and Qiu (2017).
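For reference, the recursion above can be written compactly. The following is a minimal Python sketch (our illustration, not the paper's production code); `pi` is the objective π, indices are 0-based, and the within-bracket search is brute force, though a concavity-exploiting search could be substituted.

```python
def solve_binary_monotonicity(pi, n, n_choice):
    """Minimal sketch of one-state binary monotonicity (Section 2.1).

    pi(i, ip) is the objective; states i and choices ip are 0-based indices.
    Returns g (optimal choice indices) and Pi (optimal values).
    """
    g = [0] * n
    Pi = [float("-inf")] * n

    def argmax_on(i, lo, hi):
        # Brute force over {lo, ..., hi}; binary concavity could replace this.
        best_ip, best_val = lo, pi(i, lo)
        for ip in range(lo + 1, hi + 1):
            val = pi(i, ip)
            if val > best_val:
                best_ip, best_val = ip, val
        return best_ip, best_val

    # Initialization: solve the two endpoint states.
    g[0], Pi[0] = argmax_on(0, 0, n_choice - 1)
    if n == 1:
        return g, Pi
    g[n - 1], Pi[n - 1] = argmax_on(n - 1, g[0], n_choice - 1)

    def divide(i_lo, i_hi):
        # g and Pi are known at i_lo and i_hi; fill everything in between.
        if i_hi <= i_lo + 1:
            return
        m = (i_lo + i_hi) // 2
        g[m], Pi[m] = argmax_on(m, g[i_lo], g[i_hi])  # monotone bounds
        divide(i_lo, m)
        divide(m, i_hi)

    divide(0, n - 1)
    return g, Pi
```

For instance, with `pi = lambda i, ip: -(ip - i) ** 2` and `n == n_choice`, the recovered policy is g(i) = i.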


Figure 1 illustrates how binary and simple monotonicity work. The blue dots represent the optimal policy of (2), and the empty circles represent the search spaces implied by the respective algorithms. With simple monotonicity, the search for i > 1 is restricted to {g(i − 1), . . . , n0 }, which results in a nearly triangular search space. For binary monotonicity, the search is restricted to {g(i), . . . , g(i)} where i and i are far apart for the first iterations but rapidly approach each other. This results in an irregularly shaped search space that is large at the first iterations i = 1, n, and n/2 but much smaller at later ones. For this example, the average search space for simple monotonicity, i.e., the average size of {g(i − 1), . . . , n0 }, grows from 10.6 for n = n0 = 20 to 51.8 for n = n0 = 100. This is, roughly, a 50% improvement on brute force. In contrast, the average size of binary monotonicity’s search space is 7.0 when n = n0 = 20 (34% smaller than simple monotonicity’s) and 9.5 when n = n0 = 100 (82% smaller), a large improvement.

Figure 1: A graphical illustration of simple and binary monotonicity

Binary monotonicity restricts the search space but does not say how one should find a maximum within it. In solving max_{i′∈{a,...,b}} π(i, i′) for a given i, a, and b, one can use brute force, evaluating π(i, i′) at all i′ ∈ {a, . . . , b}. In Figure 1, this amounts to evaluating π at every empty circle and blue dot. However, if the problem is concave, simple or binary concavity may be used. Simple concavity proceeds sequentially from i′ = a to i′ = b but stops whenever π(i, i′ − 1) > π(i, i′). In Figure 1, this amounts to evaluating π(i, i′) from the lowest empty circle in each column to one above the blue dot. In contrast, binary concavity uses the ordering of π(i, m) and π(i, m + 1) where m = ⌊(a + b)/2⌋ to narrow the search space: If π(i, m) < π(i, m + 1), an optimal choice must be in {m + 1, . . . , b}; if π(i, m) ≥ π(i, m + 1), an optimal choice must be in {a, . . . , m}. The search then proceeds recursively, redefining the search space accordingly until there is only one choice left. In

Figure 1, this amounts to always evaluating π(i, i0 ) at adjacent circles (m and m + 1) within a column with the adjacent circles progressing geometrically towards the blue dot. For our precise implementation of binary concavity, see Appendix A.

2.2 Theoretical cost bounds

We now characterize binary monotonicity's theoretical performance by providing bounds on the number of times π must be evaluated to solve for g and Π. Clearly, this depends on the method used to solve
$$\max_{i' \in \{a, \dots, a + \gamma - 1\}} \pi(i, i'). \qquad (3)$$

While brute force requires γ evaluations of π(i, ·) to solve (3), binary concavity requires at most 2⌈log₂(γ)⌉ evaluations, which we prove in Appendix F. Proposition 1 gives the main theoretical result of the paper:

Proposition 1. Suppose n ≥ 4 and n′ ≥ 3. If brute force grid search is used, then binary monotonicity requires no more than (n′ − 1) log₂(n − 1) + 3n′ + 2n − 4 evaluations of π. Consequently, fixing n = n′, the algorithm's worst case behavior is O(n log₂ n) with a hidden constant of one. If binary concavity is used with binary monotonicity, then no more than 6n + 8n′ + 2 log₂(n′ − 1) − 15 evaluations are required. Consequently, fixing n = n′, the algorithm's worst case behavior is O(n) with a hidden constant of 14.

Note the bounds stated in the abstract and introduction, n log₂(n) + 5n for brute force and 14n + 2 log₂(n) for binary concavity, are simplified versions of these for the case n = n′. These worst-case bounds show binary monotonicity is very powerful. To see this, note that even if one knew the optimal policy, recovering Π would still require n evaluations of π (specifically, evaluating π(i, g(i)) for i = 1, . . . , n). Hence, relative to knowing the true solution, binary monotonicity is only asymptotically slower by a log factor when n = n′. Moreover, when paired with binary concavity, the algorithm is only asymptotically slower by a factor of 14.
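To make these bounds concrete, a back-of-the-envelope calculation (ours, using the simplified n = n′ bounds) for n = 1000:

```latex
% Approximate worst-case evaluation counts at n = n' = 1000
\begin{align*}
\text{brute force / simple monotonicity:}\quad & n^2 = 1{,}000{,}000,\\
\text{binary monotonicity (brute force search):}\quad & n\log_2(n) + 5n \approx 9{,}966 + 5{,}000 \approx 14{,}966,\\
\text{binary monotonicity with binary concavity:}\quad & 14n + 2\log_2(n) \approx 14{,}020.
\end{align*}
```

The first ratio, 1,000,000/14,966 ≈ 67, is essentially the 66.9 figure reported in the introduction, which uses the exact bound from Proposition 1.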

2.3 Quantitative performance in the Arellano (2008) and RBC models

We now turn to assessing the algorithm’s performance in the Arellano (2008) and RBC models. First, we use the Arellano (2008) model to compare our method with existing techniques that only assume monotonicity. Second, we use the RBC model to compare our method with existing techniques that assume monotonicity and concavity. We will not conduct any error analysis here since all the techniques deliver identical solutions. The calibrations and additional computation details are given in Appendix B. For a description of how the Arellano (2008) and RBC models can be mapped into (1), see Appendix D.


2.3.1 Exploiting monotonicity in the Arellano (2008) model

Figure 2 compares the run times and average π-evaluation counts necessary to obtain convergence in the Arellano (2008) model when using brute force, simple monotonicity, and binary monotonicity. The ratio of simple monotonicity’s run time to binary monotonicity’s grows virtually linearly, increasing from 5.1 for a grid size of 100 to 95 for a grid size of 2500. The speedup of binary monotonicity relative to brute force also grows linearly but is around twice as large in levels. This latter fact reflects that simple monotonicity is only about twice as fast as brute force irrespective of grid size. For evaluation counts, the patterns are similar, but binary monotonicity’s speedups relative to simple monotonicity and brute force are around 50% larger.

Figure 2: Cost comparison for methods only exploiting monotonicity

Binary monotonicity is faster than simple monotonicity by a factor of 5 to 20 for grids of a few hundred points, which is a large improvement. While these are the grid sizes that have been commonly used in the literature to date (e.g., Arellano, 2008, uses 200), the cost of using several thousand points is relatively small when using binary monotonicity. For instance, in roughly the same time simple monotonicity needs to solve the 250-point case (1.48 seconds), binary monotonicity can solve the 2500-point case (which takes 1.51 seconds). At these larger grid sizes, binary monotonicity can be hundreds of times faster than simple monotonicity.


2.3.2 Exploiting monotonicity and concavity in the RBC model

To assess binary monotonicity's performance when paired with a concavity technique, we turn to the RBC model. Table 1 examines the run times and evaluation counts for all nine possible combinations of the monotonicity and concavity techniques.

                            |      n = 250       |           n = 500
Monotonicity   Concavity    | Eval./n   Time (s) | Eval./n   Time (s)   Increase
None           None         | 250.0     8.69     | 500.0     29.51      3.4
Simple         None         | 127.4     3.43     | 253.4     13.29      3.9
Binary         None         | 10.7      0.40     | 11.7      0.85       2.1
None           Simple       | 125.5     4.39     | 249.6     17.28      3.9
Simple         Simple       | 3.0       0.14     | 3.0       0.26       1.9
Binary         Simple       | 6.8       0.30     | 7.3       0.60       2.0
None           Binary       | 13.9      0.58     | 15.9      1.27       2.2
Simple         Binary       | 12.6      0.43     | 14.6      0.95       2.2
Binary         Binary       | 3.7       0.20     | 3.7       0.36       1.8

Note: The last column gives the run time increase from n = 250 to 500.

Table 1: Run times and evaluation counts for all monotonicity and concavity techniques

Perhaps surprisingly, the fastest combination is simple monotonicity with simple concavity. This pair has the smallest run times for both values of n and the time increases linearly (in fact, slightly sublinearly). For this combination, solving for the optimal policy requires, on average, only three evaluations of π per state. The reason for this is that the capital policy very nearly satisfies g(i) = g(i − 1) + 1. When this is the case, simple monotonicity evaluates π(i, g(i − 1)), π(i, g(i − 1) + 1), and π(i, g(i − 1) + 2); finds π(i, g(i − 1) + 1) > π(i, g(i − 1) + 2); and stops. The second fastest combination, binary monotonicity with binary concavity, exhibits a similar linear (in fact, slightly sublinear) time increase. However, it fares worse in absolute terms, requiring 3.7 evaluations of π per state. All the other combinations are slower and exhibit greater run time and evaluation count growth.

While Table 1 only reports the performance for two grid sizes, it is representative. This can be seen in Figure 3, which plots the average number of π evaluations per state required for the most efficient methods. Simple monotonicity with simple concavity and binary monotonicity with binary concavity both appear to be O(n) (with the latter guaranteed to be), but the hidden constant is smaller for the former. The other methods all appear to be O(n log n).

Figure 3: Empirical O(n) behavior

3 Binary monotonicity in two states

In the previous section, we demonstrated binary monotonicity's performance when exploiting monotonicity in one state variable. However, some models have policy functions that are monotone in more than one state. For instance, under certain conditions, the RBC model's capital policy k′(k, z) is monotone in both k and z. In this section, we show how to exploit this property.

3.1 The two-state algorithm and an example

Our canonical problem is to solve
$$\Pi(i, j) = \max_{i' \in \{1, \dots, n'\}} \pi(i, j, i') \qquad (4)$$

for i ∈ {1, . . . , n₁} and j ∈ {1, . . . , n₂}, where the optimal policy g(i, j) is increasing in both arguments. The two-state binary monotonicity algorithm first solves for g(·, 1) using the one-state algorithm. It then recovers g(·, n₂) using the one-state algorithm but with g(·, 1) serving as an additional lower bound. The core of the two-state algorithm assumes $g(\cdot, \underline{j})$ and $g(\cdot, \bar{j})$ are known and uses them as additional bounds in computing g(·, j) for $j = \lfloor (\underline{j} + \bar{j})/2 \rfloor$. Specifically, in solving for g(i, j), the search space is restricted to integers in $[g(i, \underline{j}), g(i, \bar{j})] \cap [g(\underline{i}, j), g(\bar{i}, j)]$ instead of just $[g(\underline{i}, j), g(\bar{i}, j)]$ as the one-state algorithm would. Appendix A gives the algorithm in full detail.

Figure 4 illustrates how the two-state binary monotonicity algorithm works when applied to the RBC model using capital as the first dimension. The figure is analogous to Figure 1, but the red lines indicate bounds on the search space coming from the previous solutions $g(\cdot, \underline{j})$ and $g(\cdot, \bar{j})$. At j = n₂ (the left panel), g(·, 1) has been solved for and hence provides a lower bound on the search space as indicated by the red, monotonically increasing line. However, there is no upper bound on the search space other than n′. At j = ⌊(n₂ + 1)/2⌋ (the right panel), g(·, 1) and g(·, n₂) are known, and the bounds they provide drastically narrow the search space.

Figure 4: Example of binary monotonicity in two states

3.2 Quantitative performance in the RBC model

Table 2 reports the evaluation counts and run times for the RBC model when using the two-state algorithm with the various concavity techniques. We again treat the first dimension as capital, so n₁ = n′ but n₁ does not necessarily equal n₂. For ease of reference, the table also reports these measures when exploiting monotonicity only in k. The speedup of the two-state algorithm relative to the one-state algorithm ranges from 1.7 to 3.7 when considering evaluation counts and 1.2 to 2.4 when considering run times. Exploiting monotonicity in k and z, even without an assumption on concavity, brings evaluation counts down to 2.9. This is marginally better than the "simple-simple" combination seen in Table 1, which had a 3.0 count for these grid sizes. However, when combining two-state binary monotonicity with binary concavity, evaluation counts drop to 2.2. This significantly improves on the simple-simple 3.0 evaluation count and is only around twice as slow as knowing the true solution.

Monotonicity   Concavity   Eval   Time   Eval Speedup   Time Speedup
k only         None        10.7   0.42   −              −
k only         Simple      6.8    0.29   −              −
k only         Binary      3.7    0.19   −              −
k and z        None        2.9    0.18   3.7            2.4
k and z        Simple      2.4    0.16   2.8            1.8
k and z        Binary      2.2    0.16   1.7            1.2

Note: Time is in seconds; the speedups give the two-state algorithm improvement relative to the one-state; grid sizes of n₁ = n′ = 250 and n₂ = 21 are used.

Table 2: Run times and evaluation counts for one-state and two-state binary monotonicity


How the two-state algorithm's cost varies in grid sizes can be seen in Figure 5, which plots evaluation counts per state as grid sizes grow while fixing the ratio n₁/n₂ and forcing n₁ = n′ (for this figure, concavity is not exploited). The horizontal axis gives the number of states in log₂ so that a unit increment means the number of states is doubled. In absolute terms, the evaluations per state are below 3 for a wide range of grid sizes. While the overall performance depends on the n₁/n₂ ratio with larger ratios implying less efficiency, in all cases the evaluation counts fall and appear to asymptote as grid sizes increase.

Figure 5: Two-state algorithm’s empirical O(n1 n2 ) behavior for n1 = n0 and n1 /n2 in a fixed ratio

3.3 Theoretical cost bounds

While Figure 5 suggests the two-state algorithm is in fact O(n₁n₂) as n₁, n₂ and n′ increase (holding their proportions fixed), we have only been able to prove this is the case for a restricted class of optimal policies:4

Proposition 2. Suppose the number of states in the first (second) dimension is n₁ (n₂) and the number of choices is n′ with n₁, n₂, n′ ≥ 4. Further, let λ ∈ (0, 1] be such that for every j ∈ {2, . . . , n₂ − 1} one has g(n₁, j) − g(1, j) + 1 ≤ λ(g(n₁, j + 1) − g(1, j − 1) + 1). Then, the two-state binary monotonicity algorithm requires no more than $(1 + \lambda^{-1})\log_2(n_1)\,n'\,n_2^{\kappa} + 3(1 + \lambda^{-1})\,n'\,n_2^{\kappa} + 4n_1 n_2 + 2n'\log_2(n_1) + 6n'$ evaluations of π where κ = log₂(1 + λ).

4 We abuse notation in writing O(n₁n₂) to emphasize the algorithm's performance in terms of the total number of states, n₁n₂. Formally, for n₁/n₂ = ρ and n₁ = n′ =: n, O(n₁n₂) should be O(n²/ρ).


For n₁ = n′ =: n and n₁/n₂ = ρ with ρ a constant, the cost is O(n₁n₂) with a hidden constant of 4 if (g(n, j) − g(1, j) + 1)/(g(n, j + 1) − g(1, j − 1) + 1) is bounded away from 1 for large n. For any function monotone in i and j, the restriction g(n₁, j) − g(1, j) + 1 ≤ λ(g(n₁, j + 1) − g(1, j − 1) + 1) is satisfied for λ = 1. However, for the algorithm's asymptotic cost to be O(n₁n₂), we require, essentially, λ < 1. When this is the case, the two-state algorithm—without exploiting concavity—is only four times slower asymptotically than when the optimal policy is known. In the RBC example, we found that (g(n, j) − g(1, j) + 1)/(g(n, j + 1) − g(1, j − 1) + 1) was small initially but grew and seemed to approach one. If it does limit to one, the asymptotic performance is not guaranteed. However, the two-state algorithm has shown itself to be very efficient for quantitatively relevant grid sizes.

4 Extension to a class of non-monotone problems

In this section, we briefly describe how the method can be applied to a class of problems with potentially non-monotone policies. Our canonical problem is to solve, for i = 1, . . . , n,
$$V(i) = \max_{c \ge 0,\; i' \in \{1, \dots, n'\}} u(c) + W(i') \quad \text{s.t. } c = z(i) - w(i'), \qquad (5)$$
where u′ > 0, u″ < 0 with an associated optimal policy g. Here, as before, i and i′ can be thought of as indices in possibly multidimensional grids. While this is a far narrower class of problems than (1), it is broad enough to apply in the Arellano (2008) and RBC models, as well as many others.

If z and W are weakly increasing, then one can show (using the sufficient conditions given in the next section) that g is monotone. Our main insight is that binary monotonicity can be applied even when z and W are not monotone by creating a new problem where their values have been sorted. Specifically, letting z̃ and W̃ be the sorted values of z and W, respectively, and letting w̃ be rearranged in the same order as W̃, the transformed problem is to solve, for each j = 1, . . . , n,
$$\tilde{V}(j) = \max_{c \ge 0,\; j' \in \{1, \dots, n'\}} u(c) + \tilde{W}(j') \quad \text{s.t. } c = \tilde{z}(j) - \tilde{w}(j') \qquad (6)$$
with an associated policy function g̃. Because W̃ and z̃ are increasing, binary monotonicity can be used to obtain g̃ and Ṽ. Then, one may recover g and V by undoing the sorting. Evidently, this class of problems allows for a cash-at-hand reformulation with z as a state variable, so the real novelty of this approach lies in the sorting of W.

Theoretically, this algorithm is asymptotically efficient when an efficient sorting algorithm is used. Specifically, the sorting for z can be done in O(n log n) operations and the sorting for W and w in O(n′ log n′) operations. Since binary monotonicity is O(n′ log n) + O(n) as either n or n′ grows, the entire algorithm is O(n log n) + O(n′ log n′) as either n or n′ grows. Fixing n′ = n, this is the

same O(n log n) cost seen in Proposition 1. Additionally, Appendix E shows these good asymptotic properties carry over to commonly-used grid sizes in a sovereign default model with bonds and capital that entails a nontrivial portfolio choice problem.
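A minimal sketch of the sorting step follows. It is our illustration under stated assumptions: `solve_binary_monotonicity` is the hypothetical one-state solver sketched in Section 2.1, infeasible choices (c < 0) are padded with a large negative number as in Appendix D, and `u`, `z`, `w`, `W` are placeholders for the model's primitives.

```python
import numpy as np

def solve_sorted_problem(u, z, w, W):
    """Sketch of Section 4: sort z and W, solve the monotone problem, unsort.

    z has length n (states); w and W have length n' (choices); u is concave.
    """
    n, n_choice = len(z), len(W)
    z_order = np.argsort(z)           # permutation that sorts z
    W_order = np.argsort(W)           # permutation that sorts W
    z_s = np.asarray(z)[z_order]      # \tilde z (increasing)
    W_s = np.asarray(W)[W_order]      # \tilde W (increasing)
    w_s = np.asarray(w)[W_order]      # \tilde w: w rearranged in W's order

    def pi(j, jp):
        c = z_s[j] - w_s[jp]
        # Infeasible consumption gets a large negative value (Appendix D).
        return u(c) + W_s[jp] if c >= 0 else -np.inf

    g_s, V_s = solve_binary_monotonicity(pi, n, n_choice)

    # Undo the sorting: map back to the original state and choice indices.
    g = np.empty(n, dtype=int)
    V = np.empty(n)
    g[z_order] = W_order[np.asarray(g_s)]
    V[z_order] = np.asarray(V_s)
    return g, V
```

Because z̃ and W̃ are increasing by construction, the policy g̃ of the transformed problem is monotone, which is what allows the binary monotonicity solver to be applied.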

5 Sufficient conditions for monotonicity

In this section, we provide two sufficient conditions for policy function monotonicity. The first, due to Topkis (1978), is from the vast literature on monotonicity.5 The second is novel (although an implication of the necessary and sufficient conditions in Milgrom and Shannon, 1994) and applies to show monotonicity in the Arellano (2008) model and sorted problem (6), whereas the Topkis (1978) result does not.

The Topkis (1978) sufficient condition has two main requirements. First, the objective function must have increasing differences. In our simplified context, this may be defined as follows:

Definition 1. Let S ⊂ R². Then f : S → R has weakly (strictly) increasing differences on S if f(x₂, y) − f(x₁, y) is weakly (strictly) increasing in y for x₂ > x₁ whenever (x₁, y), (x₂, y) ∈ S.

For smooth functions, increasing differences essentially requires that the cross-derivative f₁₂ be non-negative. However, smoothness is not necessary, and we include many sufficient conditions for increasing differences in Appendix C. The second requirement is that the feasible choice correspondence must be ascending. For our purposes, this may be defined as follows:

Definition 2. Let I, I′ ⊂ R with G : I → P(I′) where P denotes the power set. Let a, b ∈ I with a < b. G is ascending on I if g₁ ∈ G(a), g₂ ∈ G(b) implies min{g₁, g₂} ∈ G(a) and max{g₁, g₂} ∈ G(b). G is strongly ascending on I if g₁ ∈ G(a), g₂ ∈ G(b) implies g₁ ≤ g₂.

One way for the choice set to be ascending is for every choice to be feasible. An alternative we establish in Appendix C is for feasibility to be determined by inequality constraints such as h(i, i′) ≥ 0 with h increasing in i, decreasing in i′, and having increasing differences. Now we can state Topkis's (1978) sufficient condition as it applies in our simplified framework:

Proposition 3 (Topkis, 1978). Let I, I′ ⊂ R, I′ : I → P(I′), and π : I × I′ → R. If I′ is ascending on I and π has increasing differences on I × I′, then G defined by G(i) := arg max_{i′∈I′(i)} π(i, i′) is ascending on {i ∈ I | G(i) ≠ ∅}. If π has strictly increasing differences, then G is strongly ascending.

5 This literature includes Athey (2002); Hopenhayn and Prescott (1992); Huggett (2003); Jiang and Powell (2015); Joshi (1997); Majumdar and Zilcha (1987); Milgrom and Shannon (1994); Mirman and Ruble (2008); Mitra and Nyarko (1991); Puterman (1994); Quah (2007); Quah and Strulovici (2009); Smith and McCardle (2002); Stokey and Lucas Jr. (1989); Strulovici and Weber (2010); Topkis (1998); and others. The working paper Gordon and Qiu (2017) gives additional discussion on these papers and how they relate to the sufficient conditions here.


Note that a requirement is that the feasible choice correspondence is ascending while the result is that the optimal choice correspondence is ascending. If the optimal choice correspondence is strongly ascending, then every optimal policy is necessarily monotone. However, for simple and binary monotonicity to correctly deliver optimal policies, the weaker result that the optimal choice correspondence is ascending is enough, which we prove in Appendix D.

While the Topkis (1978) result is general, it cannot be used to show monotonicity in the Arellano (2008) model or the sorted problem because their objective functions typically do not have increasing differences.6 Proposition 4 gives an alternative sufficient condition that does apply for these problems:

Proposition 4. Let I, I′ ⊂ R. Define G(i) := arg max_{i′∈I′, c(i,i′)≥0} u(c(i, i′)) + W(i′) where u is differentiable, increasing, and concave. If c is increasing in i and has increasing differences and W is increasing in i′, then G is ascending (on i such that an optimal choice exists). If, in addition, c is strictly increasing in i and W is strictly increasing in i′, or if c has strictly increasing differences, then G is strongly ascending.

While the objective's functional form is restrictive, it still applies in many dynamic programming contexts with u ∘ c as flow utility and W as continuation utility. Appendix C shows how Propositions 3 and 4 may be used to establish monotonicity in the RBC and Arellano (2008) models, respectively.

6 For instance, in the sorted problem, the objective function u(z̃(j) − w̃(j′)) + W̃(j′) does not necessarily have increasing differences because, loosely speaking, the cross-derivative u″ z̃_j w̃_{j′} is negative if w̃ is strictly decreasing.

6 Conclusion

Binary monotonicity is a powerful grid search technique. The one-state algorithm is O(n log n) and an order of magnitude faster than simple monotonicity in the Arellano (2008) and RBC models. Moreover, combining it with binary concavity guarantees O(n) performance. The two-state algorithm is even more efficient than the one-state and, for a class of optimal policies, gives O(n1 n2 ) performance without any assumption of concavity. Binary monotonicity is also widely applicable and can even be used in a class of non-monotone problems. While binary monotonicity should prove useful for concave and non-concave problems alike, its use in the latter—where few solution techniques exist—seems especially promising.

References

C. Arellano. Default risk and income fluctuations in emerging economies. American Economic Review, 98(3):690–712, 2008.

C. Arellano, L. Maliar, S. Maliar, and V. Tsyrennikov. Envelope condition method with an application to default risk models. Journal of Economic Dynamics and Control, 69:436–459, 2016.

S. B. Aruoba, J. Fernández-Villaverde, and J. F. Rubio-Ramírez. Comparing solution methods for dynamic general equilibrium economies. Journal of Economic Dynamics and Control, 30(12):2477–2508, 2006.

S. Athey. Monotone comparative statics under uncertainty. The Quarterly Journal of Economics, 117(1):187, 2002.

Y. Bai and J. Zhang. Financial integration and international risk sharing. Journal of International Economics, 86(1):17–32, 2012.

C. D. Carroll. The method of endogenous gridpoints for solving dynamic stochastic optimization problems. Economics Letters, 91(3):312–320, 2006.

L. J. Christiano. Solving the stochastic growth model by linear-quadratic approximation and by value-function iteration. Journal of Business & Economic Statistics, 8(1):23–26, 1990.

W. J. Den Haan, K. L. Judd, and M. Juillard. Computational suite of models with heterogeneous agents II: Multi-country real business cycle models. Journal of Economic Dynamics and Control, 35(2):175–177, 2011.

G. Fella. A generalized endogenous grid method for non-smooth and non-concave problems. Review of Economic Dynamics, 17(2):329–344, 2014.

G. Gordon and S. Qiu. A divide and conquer algorithm for exploiting policy function monotonicity. CAEPR Working Paper 2017-006, Indiana University, 2017. URL https://ssrn.com/abstract=2995636.

B. Heer and A. Maußner. Dynamic General Equilibrium Modeling: Computational Methods and Applications. Springer, Berlin, Germany, 2005.

H. A. Hopenhayn and E. C. Prescott. Stochastic monotonicity and stationary distributions for dynamic economies. Econometrica, 60(6):1387–1406, 1992.

M. Huggett. When are comparative dynamics monotone? Review of Economic Dynamics, 6(1):1–11, 2003.

A. Imrohoroğlu, S. Imrohoroğlu, and D. H. Joines. A numerical algorithm for solving models with incomplete markets. International Journal of High Performance Computing Applications, 7(3):212–230, 1993.

F. Iskhakov, T. H. Jørgensen, J. Rust, and B. Schjerning. The endogenous grid method for discrete-continuous dynamic choice models with (or without) taste shocks. Quantitative Economics, forthcoming, 2016.

D. R. Jiang and W. B. Powell. An approximate dynamic programming algorithm for monotone value functions. Operations Research, 63(6):1489–1511, 2015.

S. Joshi. Turnpike theorems in nonconvex nonstationary environments. International Economic Review, 38(1):225–248, 1997.

K. L. Judd. Numerical Methods in Economics. Massachusetts Institute of Technology, Cambridge, Massachusetts, 1998.

M. Majumdar and I. Zilcha. Optimal growth in a stochastic environment: Some sensitivity and turnpike results. Journal of Economic Theory, 43(1):116–133, 1987.

L. Maliar and S. Maliar. Envelope condition method versus endogenous grid method for solving dynamic programming problems. Economics Letters, 120:262–266, 2013.

L. Maliar and S. Maliar. Numerical methods for large-scale dynamic economic models. In K. Schmedders and K. Judd, editors, Handbook of Computational Economics, volume 3, chapter 7. Elsevier Science, 2014.

P. Milgrom and I. Segal. Envelope theorems for arbitrary choice sets. Econometrica, 70(2):583–601, 2002.

P. Milgrom and C. Shannon. Monotone comparative statics. Econometrica, 62(1):157–180, 1994.

L. J. Mirman and R. Ruble. Lattice theory and the consumer's problem. Mathematics of Operations Research, 33(2):301–314, 2008.

T. Mitra and Y. Nyarko. On the existence of optimal processes in non-stationary environments. Journal of Economics, 53(3):245–270, 1991.

K. P. Papadaki and W. B. Powell. Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem. European Journal of Operational Research, 142(1):108–127, 2002.

K. P. Papadaki and W. B. Powell. A discrete online monotone estimation algorithm. Working Paper LSEOR 03.73, Operational Research Working Papers, 2003.

M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York, 1994.

J. K. Quah. The comparative statics of constrained optimization problems. Econometrica, 75(2):401–431, 2007.

J. K. Quah and B. Strulovici. Comparative statics, informativeness, and the interval dominance order. Econometrica, 77(6):1949–1992, 2009.

K. Schmedders and K. Judd. Handbook of Computational Economics, volume 3. Elsevier Science, 2014.

D. Simchi-Levi, X. Chen, and J. Bramel. The Logic of Logistics: Theory, Algorithms, and Applications for Logistics Management (Third Edition). Springer, New York, 2014.

J. E. Smith and K. F. McCardle. Structural properties of stochastic dynamic programs. Operations Research, 50(5):796–809, 2002.

N. L. Stokey and R. E. Lucas Jr. Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, Massachusetts and London, England, 1989.

B. H. Strulovici and T. A. Weber. Generalized monotonicity analysis. Economic Theory, 43(3):377–406, 2010.

O. Tange. GNU Parallel - the command-line power tool. ;login: The USENIX Magazine, 36(1):42–47, 2011.

G. Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20(2):177–181, 1986.

D. M. Topkis. Minimizing a submodular function on a lattice. Operations Research, 26(2):305–321, 1978.

D. M. Topkis. Supermodularity and complementarity. Princeton University Press, Princeton, N.J., 1998.

A Additional algorithm details

This appendix gives our implementation of binary concavity and the two-state algorithm. The working paper Gordon and Qiu (2017) contains a non-recursive implementation of the one-state binary monotonicity algorithm.

A.1 Binary concavity

Below is our implementation of Heer and Maußner's (2005) algorithm for solving max_{i′∈{a,...,b}} π(i, i′). Throughout, n refers to b − a + 1.

1. Initialization: If n = 1, compute the maximum, π(i, a), and STOP. Otherwise, set the flags 1_a = 0 and 1_b = 0. These flags indicate whether the values of π(i, a) and π(i, b) are known, respectively.

2. If n > 2, go to 3. Otherwise, n = 2. Compute π(i, a) if 1_a = 0 and compute π(i, b) if 1_b = 0. The optimum is the best of a, b.

3. If n > 3, go to 4. Otherwise, n = 3. If max{1_a, 1_b} = 0, compute π(i, a) and set 1_a = 1. Define m = (a + b)/2, and compute π(i, m).

   (a) If 1_a = 1, check whether π(i, a) > π(i, m). If so, the maximum is a. Otherwise, the maximum is either m or b; redefine a = m, set 1_a = 1, and go to 2.

   (b) If 1_b = 1, check whether π(i, b) > π(i, m). If so, the maximum is b. Otherwise, the maximum is either a or m; redefine b = m, set 1_b = 1, and go to 2.

4. Here, n ≥ 4. Define m = ⌊(a + b)/2⌋ and compute π(i, m) and π(i, m + 1). If π(i, m) < π(i, m + 1), a maximum is in {m + 1, . . . , b}; redefine a = m + 1, set 1_a = 1, and go to 2. Otherwise, a maximum is in {a, . . . , m}; redefine b = m, set 1_b = 1, and go to 2.
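A minimal Python sketch of the same search follows (our illustration, not the paper's code). It omits the n = 2 and n = 3 special cases above that economize on evaluations; `pi(i, ip)` is a placeholder objective assumed to satisfy the concavity notion of Definition 3 on {a, . . . , b}.

```python
def argmax_binary_concavity(pi, i, a, b):
    """Find a maximizer of pi(i, .) on the integers {a, ..., b} by repeatedly
    discarding the half of the bracket that cannot contain a maximum."""
    val_a = val_b = None            # cached values of pi(i, a) and pi(i, b)
    while b - a + 1 > 3:
        m = (a + b) // 2
        vm, vm1 = pi(i, m), pi(i, m + 1)
        if vm < vm1:                # a maximum must lie in {m + 1, ..., b}
            a, val_a = m + 1, vm1
        else:                       # a maximum must lie in {a, ..., m}
            b, val_b = m, vm
    # At most three candidates remain; compare them directly, reusing any
    # cached endpoint values.
    best_ip, best_val = a, val_a if val_a is not None else pi(i, a)
    for ip in range(a + 1, b + 1):
        val = val_b if (ip == b and val_b is not None) else pi(i, ip)
        if val > best_val:
            best_ip, best_val = ip, val
    return best_ip, best_val
```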

A.2 The two-state binary monotonicity algorithm

We now give the two-state binary monotonicity algorithm in full detail. To make it less verbose, we omit references to Π, but it should be solved for at the same time as g.

1. Solve for g(·, 1) using the one-state binary monotonicity algorithm. Define l(·) := g(·, 1), u(·) := n′, $\underline{j} := 1$, and $\bar{j} := n_2$.

2. Solve for $g(\cdot, \bar{j})$ as follows:

   (a) Solve for $g(1, \bar{j})$ on the search space {l(1), . . . , u(1)}. Define $\underline{g} := g(1, \bar{j})$ and $\underline{i} := 1$.

   (b) Solve for $g(n_1, \bar{j})$ on the search space $\{\max\{\underline{g}, l(n_1)\}, \dots, u(n_1)\}$. Define $\bar{g} := g(n_1, \bar{j})$ and $\bar{i} := n_1$.

   (c) Consider the state as $(\underline{i}, \bar{i}, \underline{g}, \bar{g})$. Solve for $g(i, \bar{j})$ for all $i \in \{\underline{i}, \dots, \bar{i}\}$ as follows.

      i. If $\bar{i} = \underline{i} + 1$, then this is done, so go to Step 3. Otherwise, continue.

      ii. Let $m_1 = \lfloor (\underline{i} + \bar{i})/2 \rfloor$ and compute $g(m_1, \bar{j})$ by searching $\{\max\{\underline{g}, l(m_1)\}, \dots, \min\{\bar{g}, u(m_1)\}\}$.

      iii. Divide and conquer: Go to (c) twice, once redefining $(\bar{i}, \bar{g}) := (m_1, g(m_1, \bar{j}))$ and once redefining $(\underline{i}, \underline{g}) := (m_1, g(m_1, \bar{j}))$.

3. Here, $g(\cdot, \underline{j})$ and $g(\cdot, \bar{j})$ are known. Redefine $l(\cdot) := g(\cdot, \underline{j})$ and $u(\cdot) := g(\cdot, \bar{j})$. Compute g(·, j) for all $j \in \{\underline{j}, \dots, \bar{j}\}$ as follows.

   (a) If $\bar{j} = \underline{j} + 1$, STOP: g(·, j) is known for all $j \in \{\underline{j}, \dots, \bar{j}\}$.

   (b) Define $m_2 := \lfloor (\underline{j} + \bar{j})/2 \rfloor$. Solve for g(·, m₂) by essentially repeating the same steps as in 2 (but everywhere replacing $\bar{j}$ with m₂). Explicitly,

      i. Solve for g(1, m₂) on the search space {l(1), . . . , u(1)}. Define $\underline{g} := g(1, m_2)$ and $\underline{i} := 1$.

      ii. Solve for g(n₁, m₂) on the search space $\{\max\{\underline{g}, l(n_1)\}, \dots, u(n_1)\}$. Define $\bar{g} := g(n_1, m_2)$ and $\bar{i} := n_1$.

      iii. Consider the state as $(\underline{i}, \bar{i}, \underline{g}, \bar{g})$. Solve for g(i, m₂) for all $i \in \{\underline{i}, \dots, \bar{i}\}$ as follows.

         A. If $\bar{i} = \underline{i} + 1$, then this is done, so go to Step 3 part (c). Otherwise, continue.

         B. Let $m_1 = \lfloor (\underline{i} + \bar{i})/2 \rfloor$ and compute g(m₁, m₂) by searching $\{\max\{\underline{g}, l(m_1)\}, \dots, \min\{\bar{g}, u(m_1)\}\}$.

         C. Divide and conquer: Go to (iii) twice, once redefining $(\bar{i}, \bar{g}) := (m_1, g(m_1, m_2))$ and once redefining $(\underline{i}, \underline{g}) := (m_1, g(m_1, m_2))$.

   (c) Go to Step 3 twice, once redefining $\bar{j} := m_2$ and once redefining $\underline{j} := m_2$.

While one could "transpose" the algorithm, i.e., solve for g(1, ·) in Step 1 rather than g(·, 1) and so on, this has no effect. The reason is that the search space for a given (i, j) is restricted to $[g(i, \underline{j}), g(i, \bar{j})] \cap [g(\underline{i}, j), g(\bar{i}, j)]$. In the transposed algorithm, the search for the same (i, j) pair would be restricted to $[g(\underline{i}, j), g(\bar{i}, j)] \cap [g(i, \underline{j}), g(i, \bar{j})]$. Evidently, these search spaces are the same as long as $\underline{i}, \bar{i}, \underline{j}, \bar{j}$ are the same in both the original and transposed algorithm. This is the case because—in both the original and transposed algorithm—i is reached after subdividing {1, . . . , n₁} in an order that does not depend on g and similarly for j.
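A condensed Python sketch of the recursion above follows (our illustration under stated assumptions). The helper `argmax_on(i, j, lo, hi)` is a hypothetical routine that maximizes π(i, j, ·) over {lo, . . . , hi} (brute force or binary concavity) and returns the maximizing index; Π is omitted as in the text and indices are 0-based.

```python
def solve_two_state(argmax_on, n1, n2, n_choice):
    """Sketch of Appendix A.2: exploit monotonicity of g(i, j) in i and j."""
    g = [[None] * n2 for _ in range(n1)]

    def solve_column(j, lower, upper):
        # One-state binary monotonicity in i, with extra per-i bounds
        # lower(i) <= g(i, j) <= upper(i) inherited from neighboring columns.
        g[0][j] = argmax_on(0, j, lower(0), upper(0))
        g[n1 - 1][j] = argmax_on(n1 - 1, j,
                                 max(g[0][j], lower(n1 - 1)), upper(n1 - 1))

        def divide(i_lo, i_hi):
            if i_hi <= i_lo + 1:
                return
            m1 = (i_lo + i_hi) // 2
            g[m1][j] = argmax_on(m1, j, max(g[i_lo][j], lower(m1)),
                                 min(g[i_hi][j], upper(m1)))
            divide(i_lo, m1)
            divide(m1, i_hi)

        divide(0, n1 - 1)

    # Step 1: first column with trivial bounds; Step 2: last column,
    # bounded below by the first column.
    solve_column(0, lambda i: 0, lambda i: n_choice - 1)
    solve_column(n2 - 1, lambda i: g[i][0], lambda i: n_choice - 1)

    def divide_j(j_lo, j_hi):
        # Step 3: columns j_lo and j_hi are known and bracket the columns between.
        if j_hi <= j_lo + 1:
            return
        m2 = (j_lo + j_hi) // 2
        solve_column(m2, lambda i: g[i][j_lo], lambda i: g[i][j_hi])
        divide_j(j_lo, m2)
        divide_j(m2, j_hi)

    divide_j(0, n2 - 1)
    return g
```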

B Calibration and computation details

This appendix gives additional calibration and computation details. For the growth and RBC model, we use u(c) = c^(1−σ)/(1 − σ) with σ = 2, a time discount factor β = .99, a depreciation rate δ = .025, and a production function zF(k) = zk^.36. The RBC model's TFP process, log z = .95 log z₋₁ + .007ε with ε ∼ N(0, 1), is discretized using Tauchen (1986)'s method with 21 points spaced evenly over ±3 unconditional standard deviations. The capital grid is linearly spaced over ±20% of the steady state capital stock. The growth model's TFP is constant and equal to 1, and its capital grid is {1, . . . , n}. A full description of the RBC model may be found in Aruoba et al. (2006).

For the Arellano (2008) model, we adopt the same calibration as in her paper. For n bond grid points, 70% are linearly spaced from −.35 to 0 and the rest from 0 to .15. We discretize the exogenous output process log y = .945 log y₋₁ + .025ε with ε ∼ N(0, 1) using Tauchen (1986)'s method with 21 points spaced evenly over ±3 unconditional standard deviations.

All run times are for the Intel Fortran compiler version 17.0.0 with the flags -mkl=sequential -debug minimal -g -traceback -xHost -O3 -no-wrap-margin -qopenmp-stubs on an Intel Xeon E5-2650 processor with a clock speed of 2.60GHz. The singly-threaded jobs were run in parallel using GNU Parallel as described in Tange (2011). When the run times were less than two minutes, the model was repeatedly solved until two minutes had passed and then the average time per solution was computed.
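For completeness, a minimal Python sketch of the Tauchen (1986) discretization used for the TFP and output processes (our illustration, not the paper's code); `scipy.stats.norm` supplies the normal CDF, and the default parameters mirror the text (21 points, ±3 unconditional standard deviations).

```python
import numpy as np
from scipy.stats import norm

def tauchen(rho, sigma, n=21, m=3.0):
    """Sketch of Tauchen (1986): discretize log x' = rho*log x + sigma*eps, eps ~ N(0,1).

    Returns the grid for log x and the n-by-n transition matrix P.
    """
    std_uncond = sigma / np.sqrt(1.0 - rho**2)          # unconditional std of log x
    grid = np.linspace(-m * std_uncond, m * std_uncond, n)
    step = grid[1] - grid[0]
    P = np.empty((n, n))
    for i in range(n):
        mean = rho * grid[i]
        # CDF evaluated at the upper midpoints; differencing assigns the mass
        # between midpoints to each grid point, with the tails absorbed at the ends.
        cdf = norm.cdf((grid - mean + step / 2) / sigma)
        P[i, :] = np.diff(np.concatenate(([0.0], cdf[:-1], [1.0])))
    return grid, P

# Example: the RBC TFP process used above.
# grid, P = tauchen(0.95, 0.007)
```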

C Additional sufficient conditions and applications [Not for Publication]

This appendix expands on the discussion of monotonicity sufficient conditions in Section 5 in two ways. First, Section C.1 gives sufficient conditions for choice correspondences to be ascending and for functions to exhibit increasing differences. Second, Section C.2 applies these sufficient conditions and the ones from the main text to establish monotonicity in the RBC model, the Arellano (2008) model, and the sorted problem (6). Appendix F gives the proofs.


C.1 Ascending correspondences and increasing differences

Proposition 3 requires the choice correspondence to be ascending. One way to ensure this is for every choice to be feasible. The following lemma, which relies on increasing differences, provides an alternative.

Lemma 1. Let I, I′ ⊂ R and I′ : I → P(I′). Suppose I′(i) = {i′ ∈ I′ | h_m(i, i′) ≥ 0 for all m ∈ M} with M arbitrary. If I′ is increasing, h_m is decreasing in i′, and h_m has increasing differences on I × I′ (for all m), then I′ is ascending on I.

For an example application of Lemma 1, consider the RBC model and define c(k′, k, z) := −k′ + zF(k) + (1 − δ)k. Since c is increasing in k and decreasing in k′, the budget constraint {k′ ∈ K | c(k, k′, z) ≥ 0} will be ascending on K for each z as long as c has increasing differences in k, k′.

To apply the sufficient conditions Proposition 3 and Proposition 4 or to apply Lemma 1, one must establish functions have increasing differences. The following lemma provides a number of sufficient conditions for establishing that this is the case. Some additional conditions may be found in Topkis (1978) and Simchi-Levi, Chen, and Bramel (2014).

Lemma 2. For I, I′ ⊂ R and S ⊂ I × I′, f : S → R has increasing differences on S if any of the following hold:

(a) f(i, i′) = p(i) + q(i′) for arbitrary p and q.

(b) f(i, i′) = p(i) + q(i′) + r(i)s(i′) for arbitrary p and q with r and s both increasing or decreasing.

(c) f(i, i′) agrees with g : L ⊂ R² → R, a C² function having g₁₂ ≥ 0 and L a hypercube with S ⊂ L.

(d) f(i, i′) is a non-negative linear combination (i.e., $f = \sum_k \alpha_k f_k$ with α_k ≥ 0) of functions having increasing differences.

(e) f(i, i′) = h(g(i, i′)) for h an increasing, convex, C² function and g increasing (in i and i′) and having increasing differences.

(f) f(i, i′) = h(g(i, i′)) for h an increasing, concave, C² function and g increasing in i, decreasing in i′, and having increasing differences.

(g) $f(i, i') = \int_E g(h(i, \varepsilon), i')\,dF(\varepsilon)$ with g having increasing differences on $\{(\hat{h}, i') \mid \hat{h} = h(i, \varepsilon),\ \varepsilon \in E,\ (i, i') \in S\}$ and h increasing in i.

(h) f(i, i′) = max_{x∈Γ(i,i′)} g(i, i′, x) exists for all (i, i′) ∈ S, S is a lattice, Γ : S → P(X), X ⊂ R, the graph of Γ is a lattice, and g has increasing differences in i, i′ and i, x and i′, x on I × I′, I × X, and I′ × X, respectively (for all i, i′, x).

Since our method primarily exploits monotonicity of one choice variable, it will be convenient in some cases to construct an indirect utility function over the other choices and then establish

that the indirect utility function has increasing differences. Lemma 2 part (h) gives one sufficient condition for this, but its conditions can be difficult to guarantee unless every choice is feasible. Proposition 5 provides an alternative sufficient condition that may be easier to verify. Proposition 5. Let S ⊂ R2 be open and convex. Let f : S → R be defined by f (i, i0 ) = maxx∈X u(g(x, i, i0 ), x) where u is differentiable, increasing, and concave in its first argument and X is arbitrary. Then f has increasing differences on S if 1. f is well-defined and C 1 in i on the closure of S, and 2. for any optimal policy x∗ and any (i, i0 ) ∈ S, g2 (x∗ (i, i0 ), i, i0 ) exists, is positive, and is increasing in i0 and g(x∗ (i, i0 ), i, i0 ) is decreasing in i0 .

C.2 Applying the sufficient conditions

We now show how the preceding results can be applied to establish monotonicity in the RBC model, Arellano (2008), and the sorted problem (6).

C.2.1 Monotonicity in k in the RBC model

Consider the RBC model where we defined c(k′, k, z) = −k′ + zF(k) + (1 − δ)k. By Lemma 2 part (a), c has increasing differences in k, k′. Then, u(c(k′, k, z)) and u(c(k′, k, z)) + βE_{z′|z}V₀(k′, z′) have increasing differences by parts (f) and (d), respectively. Consequently, an application of Proposition 3 (noting the budget constraint is ascending by Lemma 1) gives that arg max_{k′} u(c(k′, k, z)) + βE_{z′|z}V₀(k′, z′) is ascending on K. As stated in the main text and proven in Appendix D, this means binary monotonicity can be used to compute an optimal policy k′(k, z) that is monotone in k.

C.2.2 Monotonicity in z in the RBC model

With additional assumptions, one can also use these results to establish k′ is monotone in z, although doing so is more complicated. Identical arguments to the above give the optimal choice correspondence as ascending in z if E_{z′|z}V₀(k′, z′) has increasing differences in k′, z. By Lemma 2 part (g), this will hold as long as z′ is increasing in z and V₀(k, z) has increasing differences in k, z. To ensure this is the case at each step of the Bellman update (assuming the initial guess has increasing differences), one can use Lemma 2 part (h) if the graph of the choice correspondence is a lattice and u ∘ c + βE_{z′|z}V₀ has increasing differences in k, z. A sufficient condition for the former is that every choice is feasible. A sufficient condition for the latter, by Lemma 2 part (c), is that ∂²u(c)/(∂k∂z) ≥ 0, which is the same condition required in Hopenhayn and Prescott (1992).

C.2.3 Monotonicity in k in the RBC model with elastic labor supply

One can establish monotonicity of k′(k, z) in k for the RBC model with elastic labor supply by using Proposition 5 to establish increasing differences of the indirect utility function followed by an application of Proposition 3. Specifically, the maximization problem for a given z can be written as max_{k′} U(k, k′) + βE_{z′|z}V(k′, z′) where U(k, k′) := max_{l∈[0,1]} u(c(l, k, k′), l) and c(l, k, k′) := max{0, −k′ + zF(k, l) + (1 − δ)k}. If U is differentiable and solutions are interior—i.e., l∗ ∈ (0, 1) and c(l∗, k, k′) > 0—then c₂(l∗, k, k′) = zF_k(k, l∗) + 1 − δ exists, is positive, and is weakly increasing in k′. If in addition consumption is a normal good so that c(l∗, k, k′) is decreasing in k′, then Proposition 5 gives U as having increasing differences. In this case, Proposition 3 gives monotonicity of k′(k, z) in k.

C.2.4 Monotonicity in the Arellano (2008) model and the sorted problem

Proposition 4 may be used to establish monotonicity in the Arellano (2008) model and the sorted problem (6). For the Arellano (2008) model, the budget constraint may be written as c(b, b0 ; y) = b + y − q(b0 , y)b0 , which is increasing in b. By Lemma 2 part (a), c has increasing differences in b, b0 . Moreover, the continuation utility W (b0 ; y) := βEy0 |y V (b0 , y) (where V is the upper envelope of the repayment and default value functions) is weakly increasing in b0 . Consequently, Proposition 4 applies. An identical argument may be used for the sorted problem.

D Algorithm correctness for a more general problem [Not for Publication]

This appendix shows binary monotonicity and the other algorithms deliver correct solutions in a more general formulation of (1). Section D.1 gives the more general formulation and states the correctness result. Section D.2 shows how the result applies in the RBC and Arellano (2008) models while incidentally showing how additional state variables, for which one does not want to exploit monotonicity, may be handled.

D.1 A more general formulation

In both the RBC and Arellano (2008) models, there is no guarantee that every choice is feasible. Consequently, one cannot directly use binary monotonicity because the maximization problems do not directly fit (1). To handle this issue in a general way, suppose that the feasible choice set is I′(i) ⊂ {1, . . . , n′}, which may be empty, and that the objective function is given by some π̃(i, i′) defined only for (i, i′) such that i′ ∈ I′(i). For every i such that I′(i) is nonempty, define

Π̃(i) = max_{i′ ∈ I′(i)} π̃(i, i′). (7)

Let π̲ denote a lower bound on π̃,^7 and define

π(i, i′) = π̃(i, i′) if I′(i) ≠ ∅ and i′ ∈ I′(i);
π(i, i′) = π̲ if I′(i) ≠ ∅ and i′ ∉ I′(i);
π(i, i′) = 1[i′ = 1] if I′(i) = ∅. (8)

Further, formalize the notion of concavity in the following way:

Definition 3. The problem is concave if, for all i such that I′(i) ≠ ∅, I′(i) = {1, . . . , n′(i)} for some monotone increasing function n′(i) and π(i, ·) is either first strictly increasing and then weakly decreasing; or is always weakly decreasing; or is always strictly increasing (where defined).

Now we can state the correctness result:

Proposition 6. If I′ is increasing, then any of brute force, simple, or binary monotonicity combined with any of brute force, simple, or binary concavity applied to (1) with the objective function defined as in (8) delivers an optimal solution to (7), provided arg max_{i′ ∈ I′(i)} π̃(i, i′) is ascending and the problem is concave as required by the algorithm choices. For the proof, see Appendix F.

^7 A theoretical lower bound on π̃ is −1 + min_i min_{i′ ∈ I′(i)} π̃(i, i′). While this particular bound is not practically useful (because computing it would be very costly), the smallest machine-representable number serves as a lower bound for computational purposes.
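As an illustration of the padded objective in (8), here is a minimal Python sketch. The names pi_tilde, n_feas, and pi_low are hypothetical stand-ins for π̃, the feasible-set size, and the lower bound π̲, and 0-based indexing replaces the 1-based convention of the text; the feasible set is assumed to have the simple form {0, . . . , n_feas(i) − 1} purely for illustration.

# Minimal sketch of the padded objective in (8). The i_choice == 0 branch
# mirrors the 1[i' = 1] convention in the text when no choice is feasible.

pi_low = -1e300          # stands in for the smallest machine-representable number

def padded_objective(pi_tilde, n_feas, i, i_choice):
    """pi(i, i') from (8): defined for every choice index, feasible or not."""
    if n_feas(i) == 0:               # I'(i) empty: make the first index optimal
        return 1.0 if i_choice == 0 else 0.0
    if i_choice < n_feas(i):         # feasible choice: use the true objective
        return pi_tilde(i, i_choice)
    return pi_low                    # nonempty feasible set, infeasible choice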

D.2 Examples

To see how this result may be applied and how the RBC model can be cast into (7), consider the problem's Bellman update,

V(k, z) = max_{c ≥ 0, k′ ∈ K} u(c) + βE_{z′|z} V0(k′, z′)  s.t. c + k′ = zF(k) + (1 − δ)k (9)

for k ∈ K, z ∈ Z, where K = {k₁, . . . , kₙ} with the kᵢ increasing. (While here we have used inelastic labor supply, elastic labor supply can be incorporated by replacing the period utility function with an indirect utility function as we discuss in Appendix C.) Now, create a separate problem for each z and write

Π̃_z(i) = max_{i′ ∈ I′_z(i)} π̃_z(i, i′) (10)

where π̃_z and I′_z are defined as

π̃_z(i, i′) := u(−k_{i′} + zF(kᵢ) + (1 − δ)kᵢ) + βE_{z′|z} V0(k_{i′}, z′),
I′_z(i) := {i′ ∈ {1, . . . , n} | k_{i′} ≤ zF(kᵢ) + (1 − δ)kᵢ}. (11)

Then (10) is just (9) with k and k′ given by grid indices. Moreover, (10) has the same form as (7). Further, because I′_z has the form {1, . . . , n̄′_z(i)} for an increasing function n̄′_z(i), Proposition 6 shows the monotonicity and concavity algorithms will deliver correct solutions.

The Arellano (2008) model can be mapped into (7) in the same fashion. The main computational difficulty in that model is solving the sovereign's problem conditional on not defaulting. Specifically, the problem is to solve, for each b ∈ B and y ∈ Y,

V^n(b, y) = max_{c ≥ 0, b′ ∈ B} u(c) + βE_{y′|y} max{V^n(b′, y′), V^d(y′)}  s.t. c + q(b′, y)b′ = b + y (12)

where b is the sovereign's outstanding bonds, q(b′, y) is the bond price, y is output, and V^d is the value of defaulting. Taking B as {b₁, . . . , bₙ} with the bᵢ increasing and creating a separate problem for each y, one has

Π̃_y(i) = max_{i′ ∈ I′_y(i)} π̃_y(i, i′) (13)

where π̃_y and I′_y are defined as

π̃_y(i, i′) := u(−q(b_{i′}, y)b_{i′} + bᵢ + y) + βE_{y′|y} max{V^n(b_{i′}, y′), V^d(y′)},
I′_y(i) := {i′ ∈ {1, . . . , n} | −q(b_{i′}, y)b_{i′} + bᵢ + y ≥ 0}. (14)

Then (13) has the same form as (7) and I′_y is increasing. Consequently, the monotonicity algorithms will deliver correct solutions.
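The divide and conquer recursion that exploits the resulting monotonicity can be sketched as follows. This is an illustration in Python with a brute-force inner maximization, not the authors' provided code; pi(i, ip) can be any padded objective as in (8), for instance π̃_z from (11) padded with a large negative number.

# Minimal sketch of binary monotonicity for an index problem such as (10) or (13).

def solve_state(pi, i, lo, hi):
    """Brute-force argmax of pi(i, .) over choice indices lo..hi (inclusive)."""
    best_ip, best = lo, pi(i, lo)
    for ip in range(lo + 1, hi + 1):
        v = pi(i, ip)
        if v > best:
            best_ip, best = ip, v
    return best_ip, best

def binary_monotonicity(pi, n, n_choice):
    """Policies g and values V for states 0..n-1 when the policy is monotone."""
    g, V = [0] * n, [0.0] * n
    g[0], V[0] = solve_state(pi, 0, 0, n_choice - 1)
    if n == 1:
        return g, V
    g[n - 1], V[n - 1] = solve_state(pi, n - 1, g[0], n_choice - 1)

    def divide(i_lo, i_hi):
        if i_hi - i_lo <= 1:
            return
        m = (i_lo + i_hi) // 2
        # monotonicity: the optimum at m lies between g[i_lo] and g[i_hi]
        g[m], V[m] = solve_state(pi, m, g[i_lo], g[i_hi])
        divide(i_lo, m)
        divide(m, i_hi)

    divide(0, n - 1)
    return g, V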

E Additional results for the class of non-monotone problems [Not for Publication]

This appendix builds on Section 4 by testing the quantitative performance of binary monotonicity with sorting. The application, a sovereign default model with endogenous capital accumulation, is described in Section E.1. Section E.2 assesses the algorithm’s performance inclusive of sorting costs. Last, Section E.3 shows how binary monotonicity with sorting can be used to solve some problems that do not fit (6) by transforming them into two-stage problems. It also illustrates how choice variables for which one does not want to (or cannot) exploit monotonicity may be handled.

E.1 A sovereign default model with capital

The model is similar to Bai and Zhang (2012) but lacks capital adjustment costs (in Section E.3 we use adjustment costs to illustrate how two-stage reformulations allow binary monotonicity to be used). A sovereign has total factor productivity a that is Markov and chooses bonds b′ and capital k′ from sets B and K, respectively, with 0 ∈ B. If the sovereign defaults, output ak^α falls by a fraction κ. In equilibrium, the discount bond price q satisfies

q(b′, k′, a) = (1 + r)⁻¹ E_{a′|a}(1 − d(b′, k′, a′))

where r is an exogenous risk-free rate and d gives the default decision. The sovereign's problem is to solve

V(b, k, a) = max_{d ∈ {0,1}} dV^d(k, a) + (1 − d)V^n(b, k, a), (15)

where the value of defaulting is

V^d(k, a) = max_{c ≥ 0, k′ ∈ K} u(c) + βE_{a′|a}(θV^d(k′, a′) + (1 − θ)V^n(0, k′, a′))  s.t. c + k′ = (1 − κ)ak^α + (1 − δ)k (16)

and the value of repaying is

V^n(b, k, a) = max_{c ≥ 0, b′ ∈ B, k′ ∈ K} u(c) + βE_{a′|a} V(b′, k′, a′)  s.t. c + q(b′, k′, a)b′ + k′ = ak^α + (1 − δ)k + b. (17)

The most difficult part of computing this model is solving for V^n. Note the optimal policies for the problem are generally not monotone: an increase in bonds b or capital k may cause a substitution from b′ into k′ or vice-versa. This is true even if one uses a cash-at-hand formulation. Nevertheless, binary monotonicity can be used to solve this problem by mapping it into (5) and then sorting to arrive at (6). Specifically, suppose the states (b, k) and choices (b′, k′) both lie in a set X = {(bᵢ, kᵢ)} having cardinality n. Creating a separate problem for each a, one may then write

V^n_a(i) = max_{c ≥ 0, i′ ∈ {1,...,n}} u(c) + W_a(i′)  s.t. c = z_a(i) − w_a(i′), (18)

where W_a(i′) := βE_{a′|a} V(b_{i′}, k_{i′}, a′), z_a(i) := ak_i^α + (1 − δ)kᵢ + bᵢ, and w_a(i′) := q(b_{i′}, k_{i′}, a)b_{i′} + k_{i′}. Generally, z_a and W_a will not be increasing. However, by sorting them, one can solve the model using binary monotonicity.
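A minimal sketch of the sorting step for (18), reusing the binary_monotonicity sketch above: states are reindexed so that z_a is increasing and choices so that W_a is increasing, after which the policy of the reindexed problem is monotone. The inputs u, z_res, w_cost, and W_cont are hypothetical stand-ins for u, z_a, w_a, and W_a; this is one way to implement the sort described in the text, not the authors' code.

import numpy as np

def solve_sorted(u, z_res, w_cost, W_cont, binary_monotonicity):
    state_order = np.argsort(z_res)           # sort states by available resources
    choice_order = np.argsort(W_cont)         # sort choices by continuation utility
    z_s = z_res[state_order]
    w_s, W_s = w_cost[choice_order], W_cont[choice_order]

    def pi(i, ip):                            # padded objective on sorted indices
        c = z_s[i] - w_s[ip]
        return u(c) + W_s[ip] if c >= 0 else -1e300

    g_sorted, V_sorted = binary_monotonicity(pi, len(z_s), len(w_s))

    # map the sorted solution back to the original state and choice indexing
    g = np.empty(len(z_s), dtype=int)
    V = np.empty(len(z_s))
    g[state_order] = choice_order[np.asarray(g_sorted)]
    V[state_order] = np.asarray(V_sorted)
    return g, V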

E.2 Performance

As stated in the main text, binary monotonicity with sorting is O(n log n) + O(n′ log n′) as either n or n′ grows. While the cost depends only on the total numbers of points n and n′, in the case of tensor grids with a fixed number of points m along each dimension, n = m^d and n′ = m^{d′} grow quickly in m when d and d′ (the dimensionality of states and choices, respectively) are bigger than 1. The cost in these terms is O(m^{max{d,d′}} log m) for binary monotonicity and O(m^{d+d′}) for brute force. While theoretically this results in a massive improvement when d = d′ > 1, the extreme cost of using brute force in this case means one would almost surely reformulate the problem in terms of cash-at-hand, effectively reducing d to 1.

Table 3 reports the run times and evaluation counts for brute force, simple monotonicity, and binary monotonicity for different grid sizes m.^8

^8 For this example, productivity follows an AR(1) with a persistence parameter of .945 and standard deviation of .025, the default cost κ is .05, the risk-free rate r is .017, the discount factor β is .952, and the capital share, risk aversion, and depreciation rate are as in the RBC calibration.

Original formulation

Points (m)   Run times                               Evaluations per state               Speedup
             Brute       Simple      Binary          Brute       Simple      Binary
50           1.73 (m)    37.1 (s)    2.64 (s)        2500        1273        14          14.1
100          26.4 (m)    9.48 (m)    9.35 (s)        10000       5100        16          60.8
250          16.1∗ (h)   5.83∗ (h)   1.10 (m)        62500∗      31936∗      19          317.9∗
500          10.3∗ (d)   3.72∗ (d)   5.02 (m)        250000∗     127919∗     21          1066.8∗
1000         157∗ (d)    57.0∗ (d)   21.1 (m)        1000000∗    512379∗     23          3890.5∗

Cash-at-hand formulation

Points (m)   Run times                               Evaluations per state               Speedup
             Brute       Simple      Binary          Brute       Simple      Binary
50           3.14 (s)    1.21 (s)    0.85 (s)        2500        1304        379         1.4
100          17.5 (s)    8.03 (s)    4.16 (s)        10000       5127        860         1.9
250          4.12 (m)    1.76 (m)    31.0 (s)        62500       31723       2467        3.4
500          32.3 (m)    12.8 (m)    2.27 (m)        250000      126354      5432        5.6
1000         4.23∗ (h)   1.55∗ (h)   10.1 (m)        1000000∗    503277∗     11855       9.2∗

Note: Run times are in seconds (s), minutes (m), hours (h), or days (d). The last column gives the run time for simple relative to binary; an ∗ means the value is estimated; times and average evaluations are over the first 200 value function iterations.

Table 3: Run times and evaluations for the combined Arellano (2008) and RBC model

In the top panel, the cash-at-hand reformulation has not been used, and so binary monotonicity vastly outperforms the other methods. For 50 points in each dimension, binary monotonicity is already 14 times faster than simple monotonicity and 39 times faster than brute force. For 1000 points, simple monotonicity's estimated run time is two months while binary monotonicity's actual run time is only 21 minutes. A doubling of the grid sizes makes the speedup increase by roughly a factor of 4, which agrees with binary monotonicity being O(m² log m) and brute force being O(m⁴). The speedups measured in evaluation counts are even more dramatic as they exclude time spent sorting.

Reformulating the problem using cash-at-hand (with m points for the cash-at-hand state variable) makes brute force an O(m³) algorithm but leaves binary monotonicity's asymptotics unchanged (they are still O(m² log m)). Consequently, binary monotonicity still has an advantage, but it is much smaller. This can be seen in a comparison of the top and bottom panels of Table 3. With the cash-at-hand formulation, brute force and simple monotonicity are faster by a factor of roughly m, but binary monotonicity is only about twice as fast. Overall, binary monotonicity still outperforms simple monotonicity, but by a more modest factor of 1.4 to 9.2 for run times. The better evaluation-count speedups, which are in the 3.4 to 42.5 range, show that sorting costs play a non-trivial role. When only n or n′ grows, sorting costs dominate, and here that is essentially the case because n′ = m² grows much faster than n = m. However, the speedups measured against brute force (which may be a better benchmark since the sorting of continuation utilities is, to our knowledge, novel) are roughly twice as large.

E.3 Two-stage reformulations

Some models, such as models with adjustment costs, do not directly have the additive separability in the budget constraint of (6). However, they might when breaking the maximization problem into two stages. For instance, adding capital adjustment costs to our example results in a budget constraint

c + q(b′, k′, a)b′ + k′ + ξ(k′ − k)² = ak^α + (1 − δ)k + b, (19)

which can be written as c = z_a(b, k, k′) − w_a(b′, k, k′) where z_a(b, k, k′) := ak^α + (1 − δ)k + b and w_a(b′, k, k′) := q(b′, k′, a)b′ + k′ + ξ(k′ − k)². Consequently, the problem has the form

V^n_a(i, j) = max_{c ≥ 0, i′, j′} u(c) + W_a(i′, j′)  s.t. c = z_a(i, j, j′) − w_a(i′, j, j′), (20)

so that there is additive separability between the i and i′ variables conditional on a j, j′ pair. Binary monotonicity can be used to solve (20) by breaking it into a two-stage problem where j′ is chosen in the first stage and i′ in the second:

V^n_a(i, j) = max_{j′} Ṽ^n_a(i, j, j′)
Ṽ^n_a(i, j, j′) = max_{c ≥ 0, i′} u(c) + W_a(i′, j′)  s.t. c = z_a(i, j, j′) − w_a(i′, j, j′). (21)

For each j, j′ combination, one can sort z(·, j, j′) and W(·, j′) so that the optimal policy of the second-stage problem is monotone in i. For grids of size m in each dimension, binary monotonicity with sorting can be used to solve for Ṽ^n_a in O(m³ log m) operations and then V^n_a in O(m³) operations. So, the total cost is O(m³ log m), which compares favorably with brute force's O(m⁴). In general, if one wants to use binary monotonicity for one choice variable (say i′) but not another (say j′), a two-stage reformulation must be done. The above example illustrates one way to do this: first choose j′ and make it a state variable when choosing i′. The other way is to first choose i′ and make it a state variable when choosing j′. The RBC model with elastic labor supply provides an example of this latter approach, and one may consult Section C.2.3 of Appendix C for more details.
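A minimal sketch of the two-stage loop in (21), reusing the solve_sorted and binary_monotonicity sketches above: the outer stage fixes j′ (capital, say), the inner stage chooses i′ (bonds) by sorted binary monotonicity, and the value is the upper envelope over j′. Here z_fun(i, j, jp), w_fun(ip, j, jp), the array W_cont[ip, jp], and u are hypothetical stand-ins for z_a, w_a, W_a, and the period utility; the plain loop over j′ is illustrative rather than the authors' implementation.

import numpy as np

def two_stage_solve(u, z_fun, w_fun, W_cont, n_i, n_j, solve_sorted, binary_monotonicity):
    V = np.full((n_i, n_j), -np.inf)
    for j in range(n_j):
        for jp in range(n_j):                                # first stage: fix j'
            z_res = np.array([z_fun(i, j, jp) for i in range(n_i)])
            w_cost = np.array([w_fun(ip, j, jp) for ip in range(n_i)])
            # second stage: choose i' given (j, j'); W_cont[:, jp] is W_a(., j')
            _, V_pair = solve_sorted(u, z_res, w_cost, W_cont[:, jp],
                                     binary_monotonicity)
            V[:, j] = np.maximum(V[:, j], V_pair)            # envelope over j' as in (21)
    return V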

F Omitted proofs and lemmas [Not for Publication]

This appendix gives omitted proofs and lemmas. Section F.1 gives the results for the monotonicity-related sufficient conditions. Section F.2 gives proofs and lemmas showing binary monotonicity and the other algorithms work correctly. Section F.3 gives the proofs and lemmas for the cost bounds established in Propositions 1 and 2 in the main text.

F.1 Monotonicity sufficient condition proofs and lemmas

To give the omitted sufficient condition proofs, we first give some definitions and a lemma.

F.1.1 Definitions and a lemma

Lattices are general mathematical structures. For our purposes, we need only lattices consisting of subsets of Euclidean space with the component-wise ordering, i.e., x ≤ y for x, y ∈ Rn if xj ≤ yj for all j where zj denotes the jth component of z. In this context, the join operation ∨ gives the component-wise maximum, namely, x ∨ y = (max{x1 , y1 }, . . . , max{xn , yn }). Likewise, the meet operation ∧ gives the component-wise minimum, x ∧ y = (min{x1 , y1 }, . . . , min{xn , yn }). A lattice consists of a set X ⊂ Rn with the component-wise ordering such that x, y ∈ X implies x ∨ y, x ∧ y ∈ X. Note that if X ⊂ R, it constitutes a lattice with our ordering: x, y ∈ X implies min{x, y}, max{x, y} ∈ X. A function f : X → R where X is a lattice is said to be supermodular (submodular) if f (x) + f (y) ≤ (≥)f (x ∧ y) + f (x ∨ y) for all x, y ∈ X; if the inequality is strict for all x and y that cannot be ordered, then the function is strictly supermodular (submodular).


For part of Lemma 2, we will need a slightly broader definition of increasing differences than what was given in the main text:

Definition 4. Let X ⊂ ℝⁿ. Use the notation (x₋ᵢⱼ, yᵢ, yⱼ) to denote the vector x but with the ith and jth components replaced with the ith and jth components of y, respectively. A function f : X → ℝ has increasing differences on X if for all i, j with i ≠ j and for all yᵢ, yⱼ having xᵢ ≤ yᵢ and xⱼ ≤ yⱼ (such that (x₋ᵢⱼ, yᵢ, xⱼ), (x₋ᵢⱼ, xᵢ, xⱼ), (x₋ᵢⱼ, yᵢ, yⱼ), (x₋ᵢⱼ, xᵢ, yⱼ) ∈ X) one has f(x₋ᵢⱼ, yᵢ, xⱼ) − f(x₋ᵢⱼ, xᵢ, xⱼ) ≤ f(x₋ᵢⱼ, yᵢ, yⱼ) − f(x₋ᵢⱼ, xᵢ, yⱼ). The function f has decreasing differences if f(x₋ᵢⱼ, yᵢ, xⱼ) − f(x₋ᵢⱼ, xᵢ, xⱼ) ≥ f(x₋ᵢⱼ, yᵢ, yⱼ) − f(x₋ᵢⱼ, xᵢ, yⱼ). The differences are strict if the inequality holds strictly (whenever xᵢ < yᵢ and xⱼ < yⱼ).

Note that this is equivalent to having increasing differences, as defined in the main text, for all pairs of components. Any univariate function has increasing differences because the condition requires i ≠ j. We will also need to appeal to the following partial equivalence between increasing differences and supermodularity.

Lemma 3. Suppose X ⊂ ℝⁿ is a lattice. If f is (strictly) supermodular on X, then f has (strictly) increasing differences on X. If X = ∏_{i=1}^n Xᵢ with Xᵢ ⊂ ℝ for all i and f has (strictly) increasing differences on X, then f is (strictly) supermodular on X.

Proof. Let × denote the direct product (a generalization of the Cartesian product).^9 Then Theorem 2.6.1 of Topkis (1998) gives that if X_α is a lattice for each α in a set A, X is a sublattice of ×_{α∈A} X_α, and f(x) is (strictly) supermodular on X, then f(x) has (strictly) increasing differences on X. Taking A = {1, . . . , n} and X_α = ℝ for all α ∈ A gives ×_{α∈A} X_α = ℝⁿ (which is example 2.2.1 part c in Topkis, 1998, p. 12). Consequently, X is a sublattice of ℝⁿ and the theorem applies to show the (strict) supermodularity of f on X implies (strictly) increasing differences of f on X. Corollary 2.6.1 of Topkis (1998) gives that if Xᵢ is a chain (by definition, a partially ordered set that contains no unordered pairs of elements) for i = 1, . . . , n and f has (strictly) increasing differences on ×_{i=1}^n Xᵢ, then f is (strictly) supermodular on ×_{i=1}^n Xᵢ. Since Xᵢ ⊂ ℝ, it is a chain, and the direct product ×_{i=1}^n Xᵢ is just the Cartesian product ∏_{i=1}^n Xᵢ. Hence (strictly) increasing differences on X implies (strict) supermodularity on X.

F.1.2 Proofs

^9 Topkis (1998) defines the direct product, which he denotes by ×, in this way: "If X_α is a set for each α in a set A, then the direct product of these sets X_α is the product set ×_{α∈A} X_α = {x = (x_α : α ∈ A) : x_α ∈ X_α for each α ∈ A}" (p. 12). The notation (x_α : α ∈ A) gives a vector that "consists of a component x_α for each α ∈ A" (p. 12). In words, ×_{α∈A} X_α is the set of vectors that can be formed under the restriction that each α component has to lie in X_α.

Proof of Proposition 3

Proof. Because I′ ⊂ ℝ, it is a lattice. Additionally, −π(i, i′) has decreasing differences and is trivially submodular in i′ (as well as supermodular). So Theorem 6.1 of Topkis (1978) gives that arg min_{i′∈I′(i)} −π(i, i′) is ascending on the set of i such that a solution exists. Theorem 6.3 strengthens this to strongly ascending when −π(i, i′) has strictly decreasing differences. Noting G(i) := arg max_{i′∈I′(i)} π(i, i′) = arg min_{i′∈I′(i)} −π(i, i′) then gives the result.

Proof of Lemma 1

Proof. To have I′ ascending on I one needs i₁ < i₂, i′₁ ∈ I′(i₁), and i′₂ ∈ I′(i₂) to imply min{i′₁, i′₂} ∈ I′(i₁) and max{i′₁, i′₂} ∈ I′(i₂). Since I′ is increasing, i′₁ ∈ I′(i₂) and so max{i′₁, i′₂} ∈ I′(i₂). If i′₂ ≥ i′₁, then one has min{i′₁, i′₂} = i′₁ ∈ I′(i₁). So, take i′₂ < i′₁. We need to show that i′₂ ∈ I′(i₁) for i₁ < i₂ and i′₂ < i′₁. Pick an arbitrary m and suppress dependence on it. Then

h(i₂, i′₂) − h(i₁, i′₂) ≤ h(i₂, i′₁) − h(i₁, i′₁) ≤ h(i₂, i′₁) ≤ h(i₂, i′₂),

where the first inequality follows from increasing differences, the second from i′₁ ∈ I′(i₁) so that h(i₁, i′₁) ≥ 0, and the third from h being decreasing in i′. Consequently −h(i₁, i′₂) ≤ 0, which gives h(i₁, i′₂) ≥ 0. Since m was arbitrary, this holds for all m and so i′₂ ∈ I′(i₁). Thus, min{i′₁, i′₂} ∈ I′(i₁) and so I′ is ascending on I.

Proof of Lemma 2

Proof. For (a) and (b), we prove (b), which implies (a). Let (i₁, i′₁), (i₂, i′₂) ∈ S with i₁ < i₂ and i′₁ < i′₂ but otherwise arbitrary. Then

f(i₂, i′ⱼ) − f(i₁, i′ⱼ) = p(i₂) + q(i′ⱼ) + r(i₂)s(i′ⱼ) − p(i₁) − q(i′ⱼ) − r(i₁)s(i′ⱼ) = p(i₂) − p(i₁) + (r(i₂) − r(i₁))s(i′ⱼ).

So, f(i₂, i′₁) − f(i₁, i′₁) ≤ f(i₂, i′₂) − f(i₁, i′₂) if and only if

p(i₂) − p(i₁) + (r(i₂) − r(i₁))s(i′₁) ≤ p(i₂) − p(i₁) + (r(i₂) − r(i₁))s(i′₂) ⇔ 0 ≤ (r(i₂) − r(i₁))(s(i′₂) − s(i′₁)),

which holds because r and s are either both increasing or both decreasing.

For (c), let (i₁, i′₁), (i₂, i′₂) ∈ S with i₁ < i₂ and i′₁ < i′₂ but otherwise arbitrary. Then f has increasing differences if and only if

g(i₂, i′₁) − g(i₁, i′₁) ≤ g(i₂, i′₂) − g(i₁, i′₂),

since g agrees with f on S. Because g is C², this holds if

∫_{[i₁,i₂]} g₁(θ, i′₁) dθ ≤ ∫_{[i₁,i₂]} g₁(θ, i′₂) dθ ⇔ 0 ≤ ∫_{[i₁,i₂]} (g₁(θ, i′₂) − g₁(θ, i′₁)) dθ.

Again, by g being C², this holds if

0 ≤ ∫_{[i₁,i₂]} ∫_{[i′₁,i′₂]} g₁₂(θ, θ′) dθ′ dθ.

Because L is assumed to be a hypercube containing S and g₁₂ ≥ 0 on L, this holds.

For (d), let (i₁, i′₁), (i₂, i′₂) ∈ S with i₁ < i₂ and i′₁ < i′₂ but otherwise arbitrary. Then f has increasing differences if and only if

∑_k α_k f_k(i₂, i′₁) − ∑_k α_k f_k(i₁, i′₁) ≤ ∑_k α_k f_k(i₂, i′₂) − ∑_k α_k f_k(i₁, i′₂)
⇔ ∑_k α_k (f_k(i₂, i′₁) − f_k(i₁, i′₁)) ≤ ∑_k α_k (f_k(i₂, i′₂) − f_k(i₁, i′₂)).

A sufficient condition for this is that, for all k, f_k has increasing differences so that f_k(i₂, i′₁) − f_k(i₁, i′₁) ≤ f_k(i₂, i′₂) − f_k(i₁, i′₂).

For (e) and (f), let (i₁, i′₁), (i₂, i′₂) ∈ S with i₁ < i₂ and i′₁ < i′₂ but otherwise arbitrary. The composition h ∘ g has increasing differences on S if and only if

h(g(i₂, i′₁)) − h(g(i₁, i′₁)) ≤ h(g(i₂, i′₂)) − h(g(i₁, i′₂)). (22)

Because g is increasing in i, g(i₂, i′) − g(i₁, i′) ≥ 0. Then because h is C², (22) is equivalent to

∫₀^{g(i₂,i′₁)−g(i₁,i′₁)} h′(g(i₁, i′₁) + θ) dθ ≤ ∫₀^{g(i₂,i′₂)−g(i₁,i′₂)} h′(g(i₁, i′₂) + θ) dθ.

Because g has increasing differences, g(i₂, i′₁) − g(i₁, i′₁) ≤ g(i₂, i′₂) − g(i₁, i′₂). Hence, this is equivalent to

∫₀^{g(i₂,i′₁)−g(i₁,i′₁)} (h′(g(i₁, i′₁) + θ) − h′(g(i₁, i′₂) + θ)) dθ ≤ ∫_{g(i₂,i′₁)−g(i₁,i′₁)}^{g(i₂,i′₂)−g(i₁,i′₂)} h′(g(i₁, i′₂) + θ) dθ.

Because h′ > 0, the right-hand side is positive. So, a sufficient condition for this to hold is that the left-hand side be negative, which is true if h′(g(i₁, i′₁) + θ) ≤ h′(g(i₁, i′₂) + θ) for all positive θ. In (e), g is increasing in its second argument, so g(i₁, i′₁) + θ ≤ g(i₁, i′₂) + θ, and, because h″ > 0, this holds. In (f), g is decreasing in its second argument, so g(i₁, i′₁) + θ ≥ g(i₁, i′₂) + θ, and, because h″ < 0, this holds. So, h ∘ g has increasing differences.

For (g), let (i₁, i′₁), (i₂, i′₂) ∈ S with i₁ < i₂ and i′₁ < i′₂ but otherwise arbitrary. Then because h is increasing in i and because g has increasing differences,

g(h(i₂, ε), i′₁) − g(h(i₁, ε), i′₁) ≤ g(h(i₂, ε), i′₂) − g(h(i₁, ε), i′₂)

for any ε ∈ E. Integrating,

∫_E (g(h(i₂, ε), i′₁) − g(h(i₁, ε), i′₁)) dF(ε) ≤ ∫_E (g(h(i₂, ε), i′₂) − g(h(i₁, ε), i′₂)) dF(ε).

From f(i, i′) = ∫_E g(h(i, ε), i′) dF(ε), this says

f(i₂, i′₁) − f(i₁, i′₁) ≤ f(i₂, i′₂) − f(i₁, i′₂),

which establishes that f has increasing differences.

For (h), note that the pairwise increasing differences of g in i, i′ and i, x and i′, x gives, by definition, that g has increasing differences on I × I′ × X. So, g is supermodular on the lattice I × I′ × X by Lemma 3. Since the graph of Γ is assumed to be a lattice, it is a sublattice of S × X. Last, because g is supermodular, −g is submodular. Consequently, Theorem 4.3 of Topkis (1978) applies to show that min_{x∈Γ(i,i′)} −g(i, i′, x) is submodular. Therefore, − min_{x∈Γ(i,i′)} −g(i, i′, x) is supermodular, and this equals max_{x∈Γ(i,i′)} g(i, i′, x), which by definition is f(i, i′). So, f is supermodular on the lattice S, which implies f has increasing differences on S by Lemma 3.

Proof of Proposition 5

Proof. Because f is assumed to be differentiable (left and right) on the closure of S, Theorem 1 of Milgrom and Segal (2002) gives that f_i(i, i′) = u₁(g(x*(i, i′), i, i′)) g₂(x*(i, i′), i, i′) on S. To show increasing differences, we need to establish that f(i₂, i′₁) − f(i₁, i′₁) ≤ f(i₂, i′₂) − f(i₁, i′₂) for (i₁, i′₁), (i₂, i′₂) ∈ S with i₁ ≤ i₂ and i′₁ ≤ i′₂. Since f is C¹ in i, this is equivalent to

0 ≤ ∫_{i₁}^{i₂} (f_i(θ, i′₂) − f_i(θ, i′₁)) dθ.

Hence, if f_i is increasing in i′, then increasing differences hold. Defining x*_j := x*(i, i′_j), f_i is increasing in i′ if

0 ≤ u₁(g(x*₂, i, i′₂)) g₂(x*₂, i, i′₂) − u₁(g(x*₁, i, i′₁)) g₂(x*₁, i, i′₁)
 = u₁(g(x*₂, i, i′₂)) (g₂(x*₂, i, i′₂) − g₂(x*₁, i, i′₁)) + (u₁(g(x*₂, i, i′₂)) − u₁(g(x*₁, i, i′₁))) g₂(x*₁, i, i′₁).

Since u₁ ≥ 0 and g₂(x*(i, i′), i, i′) is increasing in i′, the first term is positive. Since u₁₁ ≤ 0, g₂(x*(i, i′), i, i′) ≥ 0, and g(x*(i, i′), i, i′) is decreasing in i′, the second term is positive. So, f has increasing differences.

Proof of Proposition 4

Proof. Let a < b with a, b ∈ I. Define the feasible set as Γ(i) := {i′ ∈ I′ | c(i, i′) ≥ 0}. Let g₁ ∈ G(a) and g₂ ∈ G(b). To establish that G is ascending, it is sufficient to show that g₁ > g₂ implies g₁ ∈ G(b) and g₂ ∈ G(a). To establish that G is strongly ascending, it is sufficient to show that g₁ > g₂ gives a contradiction. So, suppose g₁ > g₂. Unless explicitly stated, we only assume c is weakly increasing in i, has weakly increasing differences, and that W is weakly increasing.

First, we will show c(a, g₂) ≥ c(a, g₁) ≥ 0 and c(b, g₂) ≥ c(b, g₁) ≥ 0 for W weakly increasing, and c(a, g₂) > c(a, g₁) ≥ 0 and c(b, g₂) > c(b, g₁) ≥ 0 for W strictly increasing. To see this, note that Γ is increasing. Consequently, g₁ ∈ Γ(b) (so c(b, g₁) ≥ 0) and hence g₂ ∈ G(b) implies

u(c(b, g₂)) + W(g₂) ≥ u(c(b, g₁)) + W(g₁). (23)

Because g₂ < g₁ and W is weakly (strictly) increasing, this implies c(b, g₂) ≥ (>) c(b, g₁). So, exploiting weakly increasing differences, 0 ≥ (>) c(b, g₁) − c(b, g₂) ≥ c(a, g₁) − c(a, g₂) for W weakly (strictly) increasing. Also using g₁ ∈ Γ(a), c(a, g₂) ≥ (>) c(a, g₁) ≥ 0 for W weakly (strictly) increasing. For use below, note also that because g₁ ∈ G(a) and g₂ ∈ Γ(a),

u(c(a, g₁)) + W(g₁) ≥ u(c(a, g₂)) + W(g₂). (24)

Combining (23) and (24), u(c(b, g₂)) + W(g₂) − u(c(b, g₁)) − W(g₁) ≥ 0 ≥ u(c(a, g₂)) + W(g₂) − u(c(a, g₁)) − W(g₁), which implies

u(c(b, g₂)) − u(c(b, g₁)) ≥ u(c(a, g₂)) − u(c(a, g₁)). (25)

As established above, c(b, g₂) ≥ c(b, g₁) and c(a, g₂) ≥ c(a, g₁). Using this and the differentiability of u, (25) is equivalent to

∫₀^{c(b,g₂)−c(b,g₁)} u′(c(b, g₁) + θ) dθ ≥ ∫₀^{c(a,g₂)−c(a,g₁)} u′(c(a, g₁) + θ) dθ.

Because of weakly increasing differences, c(a, g₁) − c(a, g₂) ≤ c(b, g₁) − c(b, g₂) or, equivalently, c(b, g₂) − c(b, g₁) ≤ c(a, g₂) − c(a, g₁). Moreover, c(b, g₂) ≥ c(b, g₁). So, the above inequality is equivalent to

0 ≥ ∫_{c(b,g₂)−c(b,g₁)}^{c(a,g₂)−c(a,g₁)} u′(c(a, g₁) + θ) dθ + ∫₀^{c(b,g₂)−c(b,g₁)} (u′(c(a, g₁) + θ) − u′(c(b, g₁) + θ)) dθ.

Because c is weakly increasing in i and u is concave, the second integral is positive. The first must also be positive. Hence, the inequality holds if and only if

c(a, g₂) − c(a, g₁) = c(b, g₂) − c(b, g₁) (C1)

and

c(b, g₂) = c(b, g₁) or c(a, g₁) = c(b, g₁). (C2)

Now, consider the claims again. For the second claim, we seek a contradiction. A contradiction obtains if c has strictly increasing differences, as then (C1) is violated. Alternatively, if W is strictly increasing and c is strictly increasing in i, (C2) will be violated because c(b, g₂) > c(b, g₁) and c(a, g₁) < c(b, g₁).

For the first claim, we want to show g₁ ∈ G(b) and g₂ ∈ G(a). (C2) implies either c(b, g₂) = c(b, g₁) and/or c(a, g₁) = c(b, g₁). Consider the cases separately with c(b, g₂) = c(b, g₁) first. Then (C1) gives c(a, g₂) − c(a, g₁) = c(b, g₂) − c(b, g₁). So, c(a, g₂) = c(a, g₁). Hence, the choices give the same consumption at a and b. So, the continuation utility must be the same: (23) implies W(g₂) ≥ W(g₁) and (24) implies W(g₁) ≥ W(g₂). So, with the same consumption and continuation utilities, g₁ ∈ G(a) gives g₂ ∈ G(a) and g₂ ∈ G(b) gives g₁ ∈ G(b).

Now consider the second case where c(a, g₁) = c(b, g₁). Because (C1) gives c(a, g₂) − c(a, g₁) = c(b, g₂) − c(b, g₁), replacing c(a, g₁) with c(b, g₁) gives c(a, g₂) − c(b, g₁) = c(b, g₂) − c(b, g₁) or c(a, g₂) = c(b, g₂). Then

u(c(a, g₁)) + W(g₁) ≥ u(c(a, g₂)) + W(g₂) ⇔ u(c(b, g₁)) + W(g₁) ≥ u(c(b, g₂)) + W(g₂) (26)

where the first inequality follows from the optimality of g₁ ∈ G(a) and the equivalence from c(a, g₁) = c(b, g₁) and c(a, g₂) = c(b, g₂). Consequently, since g₂ ∈ G(b) and (26) shows g₁ delivers weakly higher utility at b, g₁ ∈ G(b). To establish g₂ ∈ G(a), the argument is similar. We have c(a, g₁) = c(b, g₁) and c(a, g₂) = c(b, g₂). Because we have shown g₁ ∈ G(b), u(c(b, g₁)) + W(g₁) = u(c(b, g₂)) + W(g₂). Replacing c(b, g₁) with c(a, g₁) and c(b, g₂) with c(a, g₂), this becomes u(c(a, g₁)) + W(g₁) = u(c(a, g₂)) + W(g₂). Consequently, g₁ ∈ G(a) implies g₂ ∈ G(a).

F.2 Algorithm correctness proofs and lemmas

We now give the omitted proofs and lemmas pertaining to algorithm correctness. Let the objective function π̃(i, i′), the maximum Π̃(i), and the feasible choice set I′(i) be as in (7). Assume that I′(i) ⊂ {1, . . . , n′} (which may be empty) is monotonically increasing, and define I := {i ∈ {1, . . . , n} | I′(i) ≠ ∅} so that i ∈ I has a feasible solution. For all i ∈ I, define G̃(i) := arg max_{i′∈I′(i)} π̃(i, i′). Let π be defined via (8), where π̲ is such that π̃(i, i′) > π̲ for all i ∈ I and i′ ∈ I′(i), with Π(i) := max_{i′∈{1,...,n′}} π(i, i′) and G(i) := arg max_{i′∈{1,...,n′}} π(i, i′). Note that by construction, i′ = 1 is optimal whenever there is no feasible choice and i′ ∈ I′(i) is always preferable to i′ ∉ I′(i) when a feasible choice exists. Lemma 4 establishes the mathematical equivalence of these problems (but does not say that the algorithms applied to (1) deliver a correct solution to (7)).

Lemma 4. All of the following are true:
1. Π(i) = Π̃(i) for all i ∈ I.
2. G(i) = G̃(i) for all i ∈ I.
3. If G̃ is ascending on I, then G is ascending on {1, . . . , n}.

Proof. The first claim is an implication of the second claim. For the proof of the second claim, let i ∈ I. Then I′(i) ≠ ∅. Infeasible choices, i.e., i′ ∈ {1, . . . , n′} \ I′(i), are strictly suboptimal in the Π problem (1) because any feasible choice j′ ∈ I′(i) delivers π(i, j′) > π̲ = π(i, i′). Hence,

G(i) = arg max_{i′∈{1,...,n′}} π(i, i′) = arg max_{i′∈I′(i)} π(i, i′) = arg max_{i′∈I′(i)} π̃(i, i′) = G̃(i)

(where the third equality follows from the definition of π).

To show the third claim, let G̃ be ascending on I. Now, let i₁ ≤ i₂ and g₁ ∈ G(i₁) and g₂ ∈ G(i₂). We want to show that min{g₁, g₂} ∈ G(i₁) and max{g₁, g₂} ∈ G(i₂). Clearly this is the case if g₁ ≤ g₂, so take g₁ > g₂. Then since G(i) = {1} for all i ∉ I and g₁ > g₂ ≥ 1, it must be that i₁ ∈ I (otherwise, g₁ would have to be 1). Then, because I′(i) is increasing, i₂ ∈ I. Hence, G(i₁) = G̃(i₁) and G(i₂) = G̃(i₂). So, G̃ ascending gives the desired result.

Lemmas 5 and 6 establish that, for concave problems, simple and binary concavity deliver an optimal choice provided there is one in the search space.

Lemma 5. If the problem is concave and π(i, j) ≥ π(i, j+1) for some j, then π(i, j) = max_{i′∈{j,...,n′}} π(i, i′) (j is at least as good as anything to the right of it). If π(i, k − 1) < π(i, k) for some k, then π(i, k) = max_{i′∈{1,...,k}} π(i, i′) = max_{i′∈{1,...,k}} π̃(i, i′) (k is at least as good as anything to the left of it).

Proof. Note that by the problem being concave (as given in Definition 3), there is some increasing function n′(i) such that I′(i) = {1, . . . , n′(i)} for i ∈ I. To prove π(i, j) ≥ π(i, j + 1) implies π(i, j) = max_{i′∈{j,...,n′}} π(i, i′), consider two cases. First, suppose i ∉ I. Then I′(i) = ∅ and π(i, i′) = 1[i′ = 1]. Consequently, for any j, π(i, j) = max_{i′∈{j,...,n′}} π(i, i′). In other words, j is weakly better than any value to the right of it.

Second, suppose i ∈ I. If j > n′(i), then π(i, j) = π̲ = max_{i′∈{j,...,n′}} π(i, i′). If j = n′(i), then π(i, j) > π̲ = max_{i′∈{j+1,...,n′}} π(i, i′), implying π(i, j) = max_{i′∈{j,...,n′}} π(i, i′). If j < n′(i), then

max_{i′∈{j,...,n′}} π(i, i′) = max{ max_{i′∈{j,...,n′(i)}} π(i, i′), max_{i′∈{n′(i)+1,...,n′}} π(i, i′) } = max{ max_{i′∈{j,...,n′(i)}} π(i, i′), π̲ } = max_{i′∈{j,...,n′(i)}} π(i, i′) = max_{i′∈{j,...,n′(i)}} π̃(i, i′).

i0 ∈{a,...,b}

π(i, i0 )

delivers an optimal solution, i.e., letting gˆ be the choice the algorithm delivers, gˆ ∈ G(i). Additionally, if the problem is concave, then the simple concavity and binary concavity algorithms also deliver an optimal solution. Proof. First, suppose i ∈ / I so that π(i, i0 ) = 1[i0 = 1]. Then it must be that a = 1 since G(i) = {1}. Brute force clearly finds the optimum since it checks every value of i0 . Simple concavity will compare i0 = a = 1 against i0 = a + 1 = 2 and find i0 = 2 is strictly worse. So, it stops and gives gˆ = 1, implying gˆ ∈ G(i) = {1}. Binary concavity first checks whether b − a + 1 ≤ 2. If so, it is the same as brute force. If not, it checks whether b − a + 1 ≤ 3. If so, then b − a + 1 = 3 and it does a comparison of either (1) a and m = (b + a)/2, in which case it correctly identifies the maximum as a or (2) m and b in which case it drops b from the search space and does a brute force comparison of a and a + 1 (when it goes to step 2). If b − a + 1 > 3, it will evaluate the midpoint m = b(a + b)/2c and m + 1 and find π(i, m) = π(i, m + 1) = 0. It will then proceed to step 2, searching for the optimum in {1, . . . , m} with a = 1 and b = m in the next iteration of the recursive algorithm. This proceeds until b − a + 1 ≤ 3, where it then correctly identifies the maximum (as was just discussed). Therefore, binary concavity finds a correct solution, gˆ ∈ G(i). 37

Now, suppose i ∈ I. Because brute force will evaluate π(i, ·) at every i0 ∈ {a, . . . , b}, it finds gˆ ∈ G(i). Now, suppose the problem is concave. The simple concavity algorithm evaluates π(i, i0 ) at i0 ∈ {a, . . . , b} sequentially until it reaches an x ∈ {a+1, . . . , b} that π(i, x−1) ≥ π(i, x). If this stopping rule is not triggered, then simple concavity is identical to brute force and so finds an optimal solution. So, it suffices to consider otherwise. In this case, x − 1 satisfies the conditions for “j” in Lemma 5 and hence π(i, x−1) = maxi0 ∈{x−1,...,n0 } π(i, i0 ). By virtue of not having stopped until x−1, π(i, x − 1) ≥ maxi0 ∈{a,...,x−1} π(i, i0 ). Consequently, π(i, x − 1) ≥ maxi0 ∈{a,...,x−1}∪{x−1,...,n0 } π(i, i0 ) = maxi0 ∈{a,...,n0 } π(i, i0 ). Since a maximum is known to be in {a, . . . , b}, Π(i) =

max

i0 ∈{a,...,b}

π(i, i0 ) ≤

max

i0 ∈{a,...,n0 }

π(i, i0 ) ≤ π(i, x − 1) ≤ Π(i).

So, π(i, x − 1) = Π(i) giving x − 1 ∈ G(i). Now consider the binary concavity algorithm. If b ≤ a + 1 (so that the size of the search space, b − a + 1, is 1 or 2), the algorithm is the same as brute force and so finds a maximum. If b = a + 2 (a search space of size 3), the algorithm goes to either step 3(a) or step 3(b). In step 3(a), it stops if π(i, a) > π(i, m) (where m = (a + b)/2) taking the maximum as a and otherwise does the same as brute force. So, suppose the stopping condition is satisfied. A maximum is a as long as π(i, a) = maxi0 ∈{a,...,b} π(i, i0 ), which it is since a satisfies the conditions for “j” in Lemma 5. In step 3(b), it stops if π(i, b) > π(i, m) taking the maximum as b and otherwise does the same as brute force. So, suppose the stopping condition is satisfied. A maximum is b as long as π(i, b) = maxi0 ∈{a,...,b} π(i, i0 ), which is true since b satisfies all the conditions for “k” in Lemma 5. If b ≥ a + 3 (a search space of 4 or more), binary concavity goes to step 4 of the algorithm. In this case, it evaluates at two points m = b(a + b)/2c and m + 1. If π(i, m) ≥ π(i, m + 1), it assumes a maximum is in {a, . . . , m}. Since m satisfies the conditions for “j” in Lemma 5, π(i, m) ≥ maxi0 ∈{m,...,b} π(i, i0 ), which justifies this assumption. If π(i, m) < π(i, m + 1), it instead assumes a maximum is in {m+1, . . . , b}. This again is justified since m+1 satisfies all the conditions for “k” in Lemma 5 and so m + 1 is better than any value of i0 < m + 1. The algorithm repeatedly divides {a, . . . , b} into either {a, . . . , m} or {m + 1, . . . , b} until the size of the search space is either two or three. Since we have already shown the algorithm correctly identifies a maximum when the search space is of size two or three (i.e., b = a + 1 or b = a + 2), the algorithm correctly finds the maximum for larger search spaces as long as this subdivision stops in a finite number of iterations (since then induction can be applied). Lemma 7 shows the required number of function evaluations is finite, and so this holds. We now give the proof of Proposition 6, which establishes that the monotonicity and concavity algorithms deliver an optimal policy. Proof of Proposition 6 Proof. We will show, letting gˆ be the policy function the algorithm finds, that gˆ(i) ∈ G(i) for all ˜ i, which implies gˆ(i) ∈ G(i) for all i ∈ I by Lemma 4. Each of the brute force, simple, and binary 38

monotonicity algorithms can be thought of as iterating through states i (in some order that, in the case of binary monotonicity, depends on π) with a search space {a, . . . , b}. If every state is visited and optimal choice is found at each state, then an optimal solution is found. So, it suffices to show that each of the brute force, simple, and binary monotonicity algorithms explore every state i ∈ {1, . . . , n} and at each state, the following conditions are met so that Lemma 6 can be applied: (1) {a, . . . , b} ⊂ {1, . . . , n0 }; (2) a ≤ b; and (3) G(i) ∩ {a, . . . , b} = 6 ∅. An application of Lemma 6 then gives gˆ(i) ∈ G(i) (provided an appropriate concavity algorithm is used). Brute force monotonicity trivially explores all states i ∈ {1, . . . , n} sequentially. At each i, a = 1 and b = n0 . Consequently, G(i) ∩ {a, . . . .b} = 6 ∅ and Lemma 6 can be applied. ˜ Now, we prove simple monotonicity and binary monotonicity deliver a correct solution when G is ascending, which, by Lemma 4, gives that G is ascending. The simple monotonicity algorithm explores all states i ∈ {1, . . . , n} sequentially always with b = n0 (and so a ≤ b). For i = 1, a = 1 and so G(1) ∩ {a, . . . , b} = 6 ∅. Consequently, Lemma 6 gives that gˆ(1) ∈ G(1). Now, consider some i > 1 and suppose for induction that gˆ(i − 1) ∈ G(i − 1). Because G is ascending and gˆ(i − 1) ∈ G(i − 1), any g˙ ∈ G(i) implies max{ˆ g (i − 1), g} ˙ ∈ G(i). So, G(i) ∩ {g(i − 1), . . . , n0 } 6= ∅. Hence, Lemma 6 applies, and gˆ(i) ∈ G(i) completing the induction argument. Now consider the binary monotonicity algorithm. If n = 1 or n = 2, the algorithm is the same as simple monotonicity and so delivers a correct solution. If n > 2, then the algorithm first correctly identifies gˆ(1) (by brute force) and gˆ(n) (using the same argument as simple monotonicity). It then defines i = 1 and i = n and maintains the assumption that gˆ(i) ∈ G(i) and gˆ(i) ∈ G(i). The goal of step 2 is to find the optimal solution for all i ∈ {i, . . . , i}. The algorithm stops at step 2(a) if i ≤ i + 1, in which case this objective is clearly met since {i, . . . , i} = {i, i}. If the algorithm does not stop, then it computes gˆ(m) for m = b(a + b)/2c using the search space {ˆ g (i), . . . , gˆ(i)}. By Lemma 6, an optimum is found as long as G(m) ∩ {ˆ g (i), . . . , gˆ(i)} 6= ∅. If gˆ(i) ∈ G(i) and g (i), max{ˆ g (i), g}} ˙ ∈ G(m). So, gˆ(i) ∈ G(i), then for any g˙ ∈ G(m), G ascending gives min{ˆ G(m) ∩ {ˆ g (i), . . . , gˆ(i)} 6= ∅ if gˆ(i) ∈ G(i) and gˆ(i) ∈ G(i). This holds because of the algorithm’s maintained assumptions.10 So, if every i ∈ {2, . . . , n−1} is the midpoint of some (i, i) after iterating some number of times, the proof is complete. In other words, since the algorithm only solves for the optimal policy at midpoints once it reaches step 2, we need to prove every state (except for i = 1 and i = n) is eventually a midpoint. To show that every i ∈ {2, . . . , n − 1} is a midpoint of some interval reached in the recursion, fix an arbitrary such i and suppose not. Define (i1 , i1 ) = (1, n). When step 2 is first reached, i ∈ {i1 + 1, . . . , i1 − 1}. Now, uniquely and recursively define (ik , ik ) to be the one of (ik−1 , m) and (m, ik−1 ) with m = b(ik−1 + ik−1 )/2c such that i ∈ {ik + 1, . . . , ik − 1} (because i is assumed to never be a midpoint, this is well-defined). Now, consider the cardinality of {ik , . . . , ik } defining it as Nk = ik − ik + 1. By construction, 10

Formally, it can be shown through induction. It is true at the first instance of step 2. Since in step 2(c) the algorithm then divides into {i, . . . , m} and {m, . . . , i}, it is also true at the next iteration. Consequently, induction gives that a maximum is found for every midpoint.


i ∈ {ik + 1, . . . , ik − 1} for each k. So, a contradiction is reached if {ik + 1, . . . , ik − 1} = ∅ which is equivalent to Nk ≤ 2. So, it must be that Nk ≥ 3 for all k. If Nk−1 is odd, then Nk = (Nk−1 + 1)/2. If Nk−1 is even, Nk ≤ Nk−1 /2 + 1. So, in either case Nk ≤ Nk−1 /2 + 1. Defining Mk recursively by M1 = N1 and Mk = Mk−1 /2 + 1, one can show by induction that Nk ≤ Mk for all k. Because Nk ≥ 3 for all k, Mk ≥ 3 for all k. Hence Mk − Mk−1 = 1 − Mk−1 /2 ≤ 1 − 3/2 = −1/2. Hence Mk ≤ Mk−1 − 1/2. Therefore, Mk will be less than three in a finite number of iterations, which gives a contradiction.

F.3 Cost bound proofs and lemmas [Not for Publication]

We now give the cost bound proofs and supporting lemmas. Section F.3.1 establishes the performance of Heer and Maußner (2005)'s binary concavity. Section F.3.2 proves the performance of the one-state binary monotonicity as stated in Proposition 1. Section F.3.3 proves the performance of the two-state binary monotonicity as stated in Proposition 2.

F.3.1 Binary concavity

Lemma 7. Consider the problem maxi0 ∈{a,...,a+n−1} π(i, i0 ) for any a and any i. For any n ∈ Z++ , binary concavity requires no more 2dlog2 (n)e−1 evaluations if n ≥ 3 and no more than n evaluations if n ≤ 2. Proof. For n = 1, the algorithm computes π(i, a) and stops, so one evaluation is required. For n = 2, two evaluations are required (π(i, a) and π(i, a + 1)). For n = 3, step 3 requires π(i, m) to be computed and may require π(i, a) to be computed. Then step 3(a) or step 3(b) either stop with no additional function evaluations or go to step 2 with max{1a , 1b } = 1 where, in that case, at most one additional function evaluation is required. Consequently, n = 3 requires at most three function evaluations, which agrees with 2dlog2 (3)e − 1 = 3. So, the statement of lemma holds for 1 ≤ n ≤ 3. Now consider each n ∈ {4, 5, 6, 7} for any 1a , 1b flags. Since n ≥ 4 the algorithm is in (or goes to) step 4. Consequently, two evaluations are required. Since the new interval is either {a, . . . , m} or {m+1, . . . , b} and π(i, m) and π(i, m+1) are computed in step 4, the next step has max{1a , 1b } = 1. Now, if n = 4, the next step must be step 2, which requires at most one additional evaluation (since max{1a , 1b } = 1). Hence, the total evaluations are less than or equal to 3 (two for step 4 and one for step 2). If n = 5, then the next step is either step 2, requiring one evaluation, or step 3, requiring two evaluations. So, the total evaluations are not more than four. If n = 6, the next step is step 3, and so four evaluations are required. Lastly, for n = 7, the next step is either step 3, requiring two evaluations, or step 4 (with n = 4), requiring at most three evaluations. So, the evaluations are weakly less than 5 = 2 + max{2, 3}. Hence, for every n = 4, 5, 6, and 7, the required evaluations are less than 3, 4, 4, and 5, respectively. One can then verify that the evaluations are less than 2dlog2 (n)e − 1 for these values of n. Now, suppose n ≥ 4. We shall prove that the required number of evaluations is less than σ(n) := 2dlog2 (n)e−1 by induction. We have already verified the hypothesis holds for n ∈ {4, 5, 6, 7}, 40

so consider some n ≥ 8 and suppose the hypothesis holds for all m ∈ {4, . . . , n−1}. Let i be such that 11 n ∈ [2i +1, 2i+1 ]. Then note that two things are true, dlog2 (n)e = i+1 and dlog2 (b n+1 2 c)e = i. Since

n ≥ 4, the algorithm is in (or proceeds to) step 4, which requires two evaluations, and then proceeds with a new interval to step 4 (again). If n is even, the new interval has size n/2. If n is odd, the new interval either has a size of (n+1)/2 or (n−1)/2. So, if n is even, no more than 2+σ(n/2) evaluations are required; if n is odd, no more than 2 + max{σ((n + 1)/2), σ((n − 1)/2)} = 2 + σ((n + 1)/2) evaluations are required. The even and odd case can then be handled simultaneously with the bound 2 + σ(b n+1 2 c). Manipulating this expression using the previous observation that dlog2 (n)e = i + 1 and dlog2 (b n+1 2 c)e = i, 2 + σ(b

n+1 n+1 c) = 2 + 2dlog2 b ce − 1 2 2 = 2 + 2i − 1 = 2dlog2 (n)e − 1.

Hence, the proof by induction is complete.

F.3.2 Binary monotonicity in one dimension

In proving the performance on the one-state binary monotonicity algorithm, we allow for many maximization techniques by characterizing the algorithm’s properties conditional on a monotonically increasing σ : Z++ → Z+ that bounds the evaluation count required to solve (3). Because of the recursive nature of binary monotonicity, the π evaluation bound for general σ is also naturally recursive. In Proposition 7, we will show the algorithm’s cost is 2σ(n0 ) + Mσ (n, n0 ) where the function Mσ is defined as follows: Definition 5. For any σ : Z++ → Z+ , define Mσ : {2, 3, . . .} × Z+ → Z+ recursively by Mσ (z, γ) = 0 if z = 2 or γ = 0 and Mσ (z, γ) = σ(γ) +

n o z z Mσ (b c + 1, γ 0 ) + Mσ (b c + 1, γ − γ 0 + 1) 2 2 γ 0 ∈{1,...,γ} max

for z > 2 and γ > 0. Now, we establish that Mσ is increasing. Lemma 8. For any σ, Mσ (z, γ) is weakly increasing in z and γ. Proof. Fix a σ and suppress dependence on it. First, we will show M (z, γ) is weakly increasing in γ for every z. The proof is by induction. For z = 2, M (2, ·) = 0. For z = 3, M (3, γ) = σ(γ) which is weakly increasing in γ. Now consider some z > 3 and suppose M (y, ·) is weakly increasing for all 11 The proof is as follows. Both dlog2 (·)e and dlog2 (b·c)e are weakly increasing functions. So n ∈ [2i + 1, 2i+1 ] implies dlog2 (n)e ∈ [dlog2 (2i + 1)e, dlog2 (2i+1 )e] = [i + 1, i + 1]. Likewise, dlog2 (b n+1 c)e ∈ 2

[dlog2 (b 2

i

i+1 +1+1 c)e, dlog2 (b 2 2 +1 c)e] 2

= [i, i].

41

y ≤ z − 1. For γ2 > γ1 , o n z z M (b c + 1, γ 0 ) + M (b c + 1, γ1 − γ 0 + 1) 2 2 γ 0 ∈{1,...,γ1 } o n z z ≤ σ(γ2 ) + max M (b c + 1, γ 0 ) + M (b c + 1, γ2 − γ 0 + 1) 2 2 γ 0 ∈{1,...,γ2 }

M (z, γ1 ) = σ(γ1 ) +

max

= M (z, γ2 ) where the inequality is justified by σ being increasing and the induction hypothesis giving M (b z2 c + 1, ·) as an increasing function (note b z2 c + 1 ≤ z − 1 for all z > 3). Now we will show M (z, γ) is increasing in z for every γ. The proof is by induction. First, note that M (2, γ) = 0 ≤ σ(γ) = M (3, γ) for all γ > 0 and M (2, γ) = 0 = M (3, γ) for γ = 0. Now, consider some k > 3 and suppose that for any z1 , z2 ≤ k − 1 with z1 ≤ z2 that M (z1 , γ) ≤ M (z2 , γ) for all γ. The goal is to show that for any z1 , z2 ≤ k with z1 ≤ z2 that M (z1 , γ) ≤ M (z2 , γ) for all γ. So, consider such z1 , z2 ≤ k with z1 ≤ z2 . If γ = 0, then M (z1 , γ) = 0 = M (z2 , γ), so take γ > 0. Then

n o z1 z1 M (b c + 1, γ 0 ) + M (b c + 1, γ − γ 0 + 1) 2 2 γ 0 ∈{1,...,γ} n o z2 z 2 ≤ σ(γ) + max M (b c + 1, γ 0 ) + M (b c + 1, γ − γ 0 + 1) 2 2 γ 0 ∈{1,...,γ}

M (z1 , γ) = σ(γ) +

max

= M (z2 , γ). The inequality obtains since b z2i c + 1 ≤ k − 1 for all i (which is true since even if zi = k, one has bk/2c + 1 ≤ k − 1 by virtue of k > 3). So, the induction hypothesis gives M (b z21 c + 1, ·) ≤ M (b z22 c + 1, ·), and the proof by induction is complete. Now we can give a cost bound for the one-state binary monotonicity algorithm: Proposition 7. Let σ : Z++ → Z+ be an upper bound on the number of π evaluations required to solve (3). Then, the algorithm requires at most 2σ(n0 ) + Mσ (n, n0 ) evaluations for n ≥ 2 and n0 ≥ 1. Proof. Since g is the policy function associated with (1), g : {1, . . . , n} → {1, . . . , n0 }. By monotonicity, g is weakly increasing. Define N : {1, . . . , n}2 → Z+ by N (a, b) = M (b − a + 1, g(b) − g(a) + 1) noting that this is well-defined (based on the definition of M ) whenever b > a. Additionally, define a sequence of sets Ik for k = 1, . . . , n − 1 by Ik := {(i, i)|i = i + k and i, i ∈ {1, . . . , n}}. Note that for any k ∈ {1, . . . , n − 1}, Ik is nonempty and N (a, b) is well-defined for any (a, b) ∈ Ik . We shall now prove that for any k ∈ {1, . . . , n − 1}, (i, i) ∈ Ik implies N (i, i) is an upper bound on the number of evaluations of π required by the algorithm in order to compute the optimal policy 42

for all i ∈ {i, . . . , i} when g(i) and g(i) are known. If true, then beginning at step 2 in the algorithm (which assumes g(i) and g(i) are known) with (i, i) ∈ Ik , N (i, i) is an upper bound on the number of π evaluations. The argument is by induction. First, consider k = 1. For any (a, b) ∈ I1 , the algorithm terminates at step 2(a). Consequently, the number of required π evaluations is zero, which is the same as N (a, b) = M (b − a + 1, g(b) − g(a) + 1) = M (2, g(b) − g(a) + 1) = 0 (recall M (2, ·) = 0). Now, consider some k ∈ {2, . . . , n − 1} and suppose the induction hypothesis holds for all j in 1, . . . , k − 1. That is, assume for all j in 1, . . . , k − 1 that (i, i) ∈ Ij implies N (i, i) is an upper bound on the number of required π evaluations when g(i) and g(i) are known. We shall show it holds for k as well. Consider any (i, i) ∈ Ik with g(i) and g(i) are known. Since i > i + 1, the algorithm does not terminate at step 2(a). In step 2(b), to compute g(m) (where m := b i+i 2 c), one must find the maximum within the range g(i), . . . , g(i), which requires at most σ(g(i) − g(i) + 1) evaluations of π. In step 2(c), the space is then divided into {i, . . . , m} and {m, . . . , i}. If k is even, then m equals

i+i 2 .

Since (i, m) ∈ Ik/2 and g(i) and g(m) are known, the induction

hypothesis gives N (i, m) as an upper bound on the number of π evaluations needed to compute g(i), . . . , g(m). Similarly, since (m, i) ∈ Ik/2 and g(m) and g(i) are known, N (m, i) provides an upper bound on the number of π evaluations needed to compute g(m), . . . , g(i). Therefore, to compute g(i), . . . , g(i), at most σ(g(i) − g(i) + 1) + N (i, m) + N (m, i) evaluations are required. Defining γ = g(i) − g(i) + 1 and γ 0 = g(m) − g(i) + 1 and using the definition of m and N , we have that the number of required evaluations is less than σ(γ) + M (m − i + 1, g(m) − g(i) + 1) + M (i − m + 1, g(i) − g(m) + 1) i+i 2 i−i = σ(γ) + M ( 2 i−i = σ(γ) + M ( 2 i−i = σ(γ) + M ( 2

= σ(γ) + M (

i+i + 1, g(i) − g(m) + 1) 2

− i + 1, γ 0 ) + M (i −

i−i + 1, g(i) − g(m) + γ 0 − γ 0 + 1) 2 i−i + 1, γ 0 ) + M ( + 1, g(i) − g(m) + g(m) − g(i) + 1 − γ 0 + 1) 2 i−i + 1, γ 0 ) + M ( + 1, γ − γ 0 + 1). 2 + 1, γ 0 ) + M (

By virtue of monotonicity, g(m) ∈ {g(i), . . . , g(i)} and so g(m)−g(i)+1 ∈ {1, . . . , g(i)−g(i)+1} or equivalently γ 0 ∈ {1, . . . , γ}. Consequently, the number of function evaluations is not greater than  σ(γ) +

max

γ 0 ∈{1,...,γ}

 M

i−i + 1, γ 0 2



 +M

i−i + 1, γ − γ 0 + 1 2

 .

The case of k odd is very similar, but the divide-and-conquer algorithm splits the space unequally. If k is odd, then m equals 12

i+i−1 2 .

In this case (i, m) ∈ I(k−1)/2 and (m, i) ∈ I(k−1)/2+1 .12 Con-

To see this, note that (i, i) ∈ Ik implies k = i − i. To have, (i, m) ∈ I(k−1)/2 , it must be that m = i +

holds: i +

k−1 2

= i+

i−i−1 2

= i+

i+i−1−i−i 2

= i+m+

−2i 2

k−1 . 2

This

= m. Similarly, to have (m, i) ∈ I(k−1)/2+1 , one must have

43

sequently, computing the policy for i, . . . , i takes no more than σ(g(i)−g(i)−1)+N (i, m)+N (m, i) maximization steps. Defining γ and γ 0 the same as before and using the definition of m and N , we have that the required maximization steps is less than σ(γ) + M (m − i + 1, γ 0 ) + M (i − m + 1, γ − γ 0 + 1) i+i−1 i+i−1 = σ(γ) + M ( − i + 1, γ 0 ) + M (i − + 1, γ − γ 0 + 1) 2 2     i−i+1 0 i−i+1 0 = σ(γ) + M ,γ + M + 1, γ − γ + 1 . 2 2 Because M is increasing in the first argument, this is less than  σ(γ) +

max

γ 0 ∈{1,...,γ}

 M

i−i+1 + 1, γ 0 2



 +M

i−i+1 + 1, γ − γ 0 + 1 2

 .

Combining the bounds for k even and odd, the required number of π evaluations is less than  σ(γ) +

max

γ 0 ∈{1,...,γ}

 M

     i−i+1 i−i+1 0 0 + 1, γ + M + 1, γ − γ + 1 2 2

because if k is even, then b i−i+1 2 c=

i−i 2 .

(27)

Consequently, (27) gives an upper bound for any (i, i) ∈ Ik

for k ≥ 1 when g(i) and g(i) are known. If N (i, i) is less than this, then the proof by induction is complete. Since N (i, i) is defined as M (i − i + 1, g(i) − g(i) + 1), using the definitions of N and M shows N (i, i) = M (i − i + 1, g(i) − g(i) + 1) = M (i − i + 1, γ)  = σ(γ) +

max

γ 0 ∈{1,...,γ}

 i−i+1 i−i+1 0 0 M (b c + 1, γ ) + M (b c + 1, γ − γ + 1) . 2 2

Consequently, N (i, i) exactly equals the value in (27), and the proof by induction is complete. Step 1 of the algorithm requires at most 2σ(n0 ) evaluations to compute g(1) and g(n). If n = 2, step 2 is never reached. Since M (n, n0 ) = 0 in this case, 2σ(n0 ) + M (n, n0 ) provides an upper bound. If n > 2, then since (1, n) ∈ In−1 and g(1) and g(n) known, only N (1, n) additional evaluations are required. Therefore, to compute for each i ∈ {1, . . . , n}, no more than 2σ(n0 ) + N (1, n) = 2σ(n0 ) + M (n, g(n) − g(1) + 1) function evaluations are needed. Lemma (8) then gives that this is less than 2σ(n0 ) + M (n, n0 ) since g(n) − g(1) + 1 ≤ n0 − 1 + 1 = n0 . The preceding proposition gives a fairly tight bound. However, it is unwieldy because of the discrete nature of the problem. By bounding σ whose domain is Z++ with a σ ¯ whose domain is [1, ∞), a more convenient bound can be found. We give this bound in Lemma 11. For its proof, we need the following two lemmas: i = m + (k − 1)/2 + 1. This also obtains: m +

(k−1) 2

+1=

i+i−1 2

44

+

i−i−1 2

+1=

2i−2 2

+ 1 = i.

i−1 +1 Lemma 9. Define a sequence {mi }∞ i=1 by m1 = 2 and mi = 2mi−1 −1 for i ≥ 2. Then mi = 2

and log2 (mi − 1) = i − 1 for all i ≥ 1. Proof. The proof of mi = 2i−1 + 1 for all i ≥ 1 is by induction. For i = 1, m1 is defined as 2, which equals 21−1 + 1. For i > 1, suppose it holds for i − 1. Then mi = 2mi−1 − 1 = 2[2i−2 + 1] − 1 = 2i−1 + 1. Lemma 10. Consider any z ≥ 2. Then there exists a unique sequence {ni }Ii=1 such that n1 = 2, nI = z, b n2i c + 1 = ni−1 , and ni > 2 for all i > 1. Moreover, I = dlog2 (z − 1)e + 1. Proof. The proof that a unique sequence exists is by construction. Let z ≥ 2 be fixed. Define an infinite sequence {zi }∞ i=1 recursively as follows: Define zi = Ti (z) for all i ≥ 1 with T1 (z) := z and Ti+1 (z) = b Ti2(z) c + 1. We now establish all of the following: Ti (z) ≥ 2, Ti (z) ≥ Ti+1 (z), and Ti (z) > Ti+1 (z) whenever Ti (z) > 2. As an immediate consequence, for any z ≥ 2, there exists a unique I(z) ≥ 1 such that TI(z) = 2 and, for all i < I(z), Ti (z) > 2. We also show for later use that Ti (z) is weakly increasing in z for every i. To show Ti (z) ≥ 2, the proof is by induction. We have T1 (z) = z and z ≥ 2. Now, consider some i > 1 and suppose it holds for i − 1. Then Ti (z) = b Ti−12 (z) c + 1 ≥ b 22 c + 1 = 2. To show Ti (z) > Ti+1 (z) whenever Ti (z) > 2, consider two cases. First, consider Ti (z) even. Then Ti+1 (z) = b Ti2(z) c + 1 = Ti (z) odd. Then Ti+1 (z) =

Ti (z) 2 + 1 and so Ti+1 (z) < Ti (z) as long as Ti (z) > 2. Second, consider Ti (z) b 2 c + 1 = Ti (z)−1 + 1 and so Ti+1 (z) < Ti (z) as long as Ti (z) > 1. 2

To show that Ti (z) ≥ Ti+1 (z), all we need to show now is that Ti+1 (z) = 2 when Ti (z) = 2 (since the inequality is strict if Ti (z) > 2 and Ti (z) ≥ 2 for all i). If Ti (z) = 2, then Ti+1 (z) = b 22 c + 1 = 2. To establish that Ti (z) is weakly increasing in z for every i, the proof is by induction. For a ≤ b, T1 (a) = a ≤ b = T1 (b). Now consider some i > 1 and suppose the induction hypothesis holds for i − 1. Then Ti (a) = bTi−1 (a)/2c + 1 ≤ bTi−1 (b)/2c + 1 = Ti (b). I(z)

The sequence $\{n_j\}_{j=1}^{I(z)}$ defined by $n_j = T_{I(z)-j+1}(z)$—i.e., an inverted version of the sequence $\{T_i(z)\}_{i=1}^{I(z)}$—satisfies $n_{I(z)} = T_1(z) = z$, $n_1 = T_{I(z)}(z) = 2$, and $n_{i-1} = T_{I(z)-(i-1)+1}(z) = \lfloor T_{I(z)-(i-1)}(z)/2 \rfloor + 1 = \lfloor n_i/2 \rfloor + 1$. Also, by the definition of $I(z)$, $T_i(z) > 2$ for any $i < I(z)$, so $n_i > 2$ for all $i > 1$. So, if we can show that $I(z) = \lceil \log_2(z-1) \rceil + 1$, the proof is complete.

The proof of $I(z) = \lceil \log_2(z-1) \rceil + 1$ is as follows. Note that for $z = 2$, the sequence $\{z_i\}$ is simply $z_i = 2$ for all $i$, which implies $I(2) = 1$. Since $\lceil \log_2(2-1) \rceil + 1 = 1$, the relationship holds for $z = 2$. So, now consider $z > 2$. The proof proceeds in the following steps. First, for the special $\{m_i\}$ sequence defined in Lemma 9, we show $T_j(m_i) = m_{i+1-j}$ for any $i \geq 1$ and any $j \leq i$. Second, we use this to show that $I(m_i) = i$ for all $i \geq 1$. Third, we show that $z \in (m_{i-1}, m_i]$ implies $I(z) = i$ by showing $I(m_{i-1}) < I(z) \leq I(m_i)$. Fourth, we show that the $i$ such that $z \in (m_{i-1}, m_i]$ is given by $\lceil \log_2(z-1) \rceil + 1$. This then gives $I(z) = \lceil \log_2(z-1) \rceil + 1$ since $I(z) = I(m_i) = i = \lceil \log_2(z-1) \rceil + 1$.

First, we show $T_j(m_i) = m_{i+1-j}$ for any $i \geq 1$ and any $j \leq i$. Fix some $i \geq 1$. The proof is by induction. For $j = 1$, $T_1(m_i) = m_i = m_{i+1-1}$. Now consider some $j$ having $2 \leq j \leq i$ and suppose the induction hypothesis holds for $j - 1$. Then
$$
T_j(m_i) = \left\lfloor \frac{T_{j-1}(m_i)}{2} \right\rfloor + 1 = \left\lfloor \frac{m_{i+1-(j-1)}}{2} \right\rfloor + 1 = \left\lfloor \frac{m_{i+2-j}}{2} \right\rfloor + 1 = \left\lfloor \frac{2m_{i+1-j} - 1}{2} \right\rfloor + 1 = m_{i+1-j},
$$
which proves $T_j(m_i) = m_{i+1-j}$ for $j \leq i$. The fourth equality follows from the definition of $\{m_i\}$ in Lemma 9.

Second, we show $I(m_i) = i$ for all $i \geq 1$. Fix any $i \geq 1$. We just showed $T_j(m_i) = m_{i+1-j}$. Hence, $T_i(m_i) = m_1 = 2$ and $T_{i-1}(m_i) = m_2 = 3$. Consequently, the definition of $I$—which for a given $z$ is the smallest $i \geq 1$ such that $T_i(z) = 2$—gives $I(m_i) = i$ (recall $T_j$ is decreasing in $j$).

Third, we show that $z \in (m_{i-1}, m_i]$ implies $I(z) = i$ by showing $I(m_{i-1}) < I(z) \leq I(m_i)$. Note that, since $z > 2$ (having taken care of the $z = 2$ case already), there is some $i \geq 2$ such that $z \in (m_{i-1}, m_i]$ (since $m_1 = 2$). To see $I(z) \leq I(m_i)$, suppose not, that $I(z) > I(m_i)$. But then $2 = T_{I(z)}(z) < T_{I(m_i)}(z) \leq T_{I(m_i)}(m_i) = 2$, which is a contradiction. Therefore, $I(z) \leq I(m_i)$. To see $I(m_{i-1}) < I(z)$, we begin by showing $T_j(m_{i-1}) < T_j(m_{i-1} + \varepsilon)$ for any $\varepsilon > 0$ and any $j \leq i - 1$. Since $T_j(m_{i-1}) = m_{i-j}$, it is equivalent to show that $m_{i-j} < T_j(m_{i-1} + \varepsilon)$, which we show by induction. Clearly, for $j = 1$, we have $m_{i-1} < m_{i-1} + \varepsilon = T_1(m_{i-1} + \varepsilon)$. Now consider $j > 1$ and suppose it is true for $j - 1$. Then
$$
\begin{aligned}
T_j(m_{i-1} + \varepsilon) &= \left\lfloor \frac{T_{j-1}(m_{i-1} + \varepsilon)}{2} \right\rfloor + 1
= \left\lfloor \frac{T_{j-1}(m_{i-1} + \varepsilon) - m_{i-j+1} + m_{i-j+1}}{2} \right\rfloor + 1 \\
&= \left\lfloor \frac{T_{j-1}(m_{i-1} + \varepsilon) - m_{i-j+1} + 2m_{i-j} - 1}{2} \right\rfloor + 1
= \left\lfloor \frac{T_{j-1}(m_{i-1} + \varepsilon) - m_{i-j+1} - 1}{2} \right\rfloor + m_{i-j} + 1.
\end{aligned}
$$
Now, since the induction hypothesis of $T_{j-1}(m_{i-1} + \varepsilon) > m_{i-j+1}$ gives $T_{j-1}(m_{i-1} + \varepsilon) - m_{i-j+1} - 1 \geq 0$, one has
$$
T_j(m_{i-1} + \varepsilon) \geq \left\lfloor \frac{0}{2} \right\rfloor + m_{i-j} + 1 = m_{i-j} + 1 > m_{i-j}.
$$
Hence the proof by induction is complete. Now, having established $T_j(m_{i-1}) < T_j(m_{i-1} + \varepsilon)$ for any $\varepsilon > 0$ and any $j \leq i - 1$, we show $I(m_{i-1}) < I(z)$. Suppose not, that $I(m_{i-1}) \geq I(z)$. Then since $z > m_{i-1}$, taking $\varepsilon = z - m_{i-1}$ we have $2 = T_{I(m_{i-1})}(m_{i-1}) < T_{I(m_{i-1})}(m_{i-1} + \varepsilon) = T_{I(m_{i-1})}(z) \leq T_{I(z)}(z) = 2$, which is a contradiction.

Lastly, we now show that the $i$ such that $z \in (m_{i-1}, m_i]$ is given by $\lceil \log_2(z-1) \rceil + 1$. That this holds can be seen as follows. Note that $z \in (m_{i-1}, m_i]$ implies $\log_2(z-1) + 1 \in (\log_2(m_{i-1} - 1) + 1, \log_2(m_i - 1) + 1]$. Then, since $\log_2(m_j - 1) + 1 = j$ for all $j \geq 1$ (Lemma 9), we have $\log_2(z-1) + 1 \in (i-1, i]$. Then, by the definition of $\lceil \cdot \rceil$, one has $\lceil \log_2(z-1) + 1 \rceil = i$, which of course is equivalent to $\lceil \log_2(z-1) \rceil + 1 = i$. We established the $i$ such that $z \in (m_{i-1}, m_i]$ is $i = \lceil \log_2(z-1) \rceil + 1$. Also, we showed $i - 1 = I(m_{i-1}) < I(z) \leq I(m_i) = i$. Hence $I(z) = \lceil \log_2(z-1) \rceil + 1$, which completes the proof.
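The construction in Lemma 10 is easy to check numerically. The sketch below (our own check, with helper names of our choosing) iterates $T_{i+1}(z) = \lfloor T_i(z)/2 \rfloor + 1$ from $T_1(z) = z$ and confirms $I(z) = \lceil \log_2(z-1) \rceil + 1$.

```python
# Minimal sketch (our own check, not from the paper's code): verify Lemma 10 numerically.
# I(z) is the first index i with T_i(z) = 2, where T_1(z) = z and T_{i+1}(z) = floor(T_i(z)/2) + 1.
import math

def I_of(z: int) -> int:
    t, i = z, 1
    while t > 2:
        t = t // 2 + 1
        i += 1
    return i

for z in range(2, 10001):
    predicted = math.ceil(math.log2(z - 1)) + 1
    assert I_of(z) == predicted, (z, I_of(z), predicted)
print("Lemma 10 formula verified for z = 2, ..., 10000")
```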

Now we give a more convenient bound than the one in Proposition 7. The main result will apply it with $\bar{\sigma}$ bounds corresponding to brute force and binary concavity.

Lemma 11. Suppose $\bar{\sigma} : [1, \infty) \to \mathbb{R}_+$ is either the identity map ($\bar{\sigma}(\gamma) = \gamma$) or is a strictly increasing, strictly concave, and differentiable function. If $\bar{\sigma}(\gamma) \geq \sigma(\gamma)$ for all $\gamma \in \mathbb{Z}_{++}$, then an upper bound on function evaluations is
$$
3\bar{\sigma}(n') + \sum_{j=1}^{I-2} 2^j \bar{\sigma}(2^{-j}(n'-1) + 1)
$$
if $I > 2$ where $I = \lceil \log_2(n-1) \rceil + 1$. An upper bound for $I \leq 2$ is $3\bar{\sigma}(n')$.

Proof. In keeping with the notation of the other proofs, let $z$ and $\gamma$ correspond to $n$ and $n'$, respectively. Fix some arbitrary $z \geq 2$. By Lemma 10, there is a strictly monotone increasing sequence $\{z_i\}_{i=1}^{I}$ with $z_I = z$, $z_i = \lfloor z_{i+1}/2 \rfloor + 1$ for $i < I$, and with $I = \lceil \log_2(z-1) \rceil + 1$ (and having $z_1 = 2$). For $i > 1$ and any $\gamma \geq 1$, define
$$
W(z_i, \gamma) := \max_{\gamma' \in \{1,\ldots,\gamma\}} M(z_{i-1}, \gamma') + M(z_{i-1}, \gamma - \gamma' + 1).
$$

For $i = 1$, define $W(z_i, \cdot) = 0$. The definition of $M$ gives $M(z_i, \gamma) = \sigma(\gamma) + W(z_i, \gamma)$ for any $i > 1$ with $M(z_1, \cdot) = 0$. Note that $W(z_2, \gamma) = W(z_1, \gamma) = 0$.

Define $\bar{W}$—which we will demonstrate is an upper bound and continuous version of $W$—as
$$
\bar{W}(z_i, \gamma) := \bar{\sigma}^*(\gamma) + \max_{\gamma' \in [1,\gamma]} \bar{W}(z_{i-1}, \gamma') + \bar{W}(z_{i-1}, \gamma - \gamma' + 1)
$$
for $i > 2$ with
$$
\bar{\sigma}^*(\gamma) := \max_{\gamma' \in [1,\gamma]} \bar{\sigma}(\gamma') + \bar{\sigma}(\gamma - \gamma' + 1). \qquad (28)
$$
For $i = 1$ or $2$, define $\bar{W}(z_i, \gamma) = 0$. Then $W(z_i, \gamma) \leq \bar{W}(z_i, \gamma)$ for all $i \geq 1$ and all $\gamma \in \mathbb{Z}_{++}$. The proof is by induction. They are equal for $i = 1$ and $i = 2$. Now consider an $i > 2$ and suppose it holds for $i - 1$. Then,
$$
\begin{aligned}
W(z_i, \gamma) &= \max_{\gamma' \in \{1,\ldots,\gamma\}} M(z_{i-1}, \gamma') + M(z_{i-1}, \gamma - \gamma' + 1) \\
&= \max_{\gamma' \in \{1,\ldots,\gamma\}} \sigma(\gamma') + \sigma(\gamma - \gamma' + 1) + W(z_{i-1}, \gamma') + W(z_{i-1}, \gamma - \gamma' + 1) \\
&\leq \max_{\gamma' \in \{1,\ldots,\gamma\}} \left[ \sigma(\gamma') + \sigma(\gamma - \gamma' + 1) \right] + \max_{\gamma' \in \{1,\ldots,\gamma\}} \left[ W(z_{i-1}, \gamma') + W(z_{i-1}, \gamma - \gamma' + 1) \right] \\
&\leq \max_{\gamma' \in \{1,\ldots,\gamma\}} \left[ \bar{\sigma}(\gamma') + \bar{\sigma}(\gamma - \gamma' + 1) \right] + \max_{\gamma' \in \{1,\ldots,\gamma\}} \left[ \bar{W}(z_{i-1}, \gamma') + \bar{W}(z_{i-1}, \gamma - \gamma' + 1) \right] \\
&\leq \max_{\gamma' \in [1,\gamma]} \left[ \bar{\sigma}(\gamma') + \bar{\sigma}(\gamma - \gamma' + 1) \right] + \max_{\gamma' \in [1,\gamma]} \left[ \bar{W}(z_{i-1}, \gamma') + \bar{W}(z_{i-1}, \gamma - \gamma' + 1) \right] \\
&\leq \bar{\sigma}^*(\gamma) + \max_{\gamma' \in [1,\gamma]} \bar{W}(z_{i-1}, \gamma') + \bar{W}(z_{i-1}, \gamma - \gamma' + 1) \\
&= \bar{W}(z_i, \gamma).
\end{aligned}
$$

If $\bar{\sigma}(x) = x$ for all $x$, then $\bar{\sigma}(\gamma') + \bar{\sigma}(\gamma - \gamma' + 1) = \gamma + 1$, which does not depend on $\gamma'$. So, $\bar{\sigma}^*(\gamma) = \gamma + 1 = 2\bar{\sigma}(\frac{\gamma+1}{2})$. If the $\bar{\sigma}$ function is strictly increasing, strictly concave, and differentiable, then the first order condition of the $\bar{\sigma}^*(\gamma)$ problem yields $\bar{\sigma}'(\gamma') = \bar{\sigma}'(\gamma - \gamma' + 1)$. The derivative is invertible (by strict concavity) and so $\gamma' = \frac{\gamma+1}{2}$, the same condition as in the linear case.^13 So, $\bar{\sigma}^*(\gamma) = \bar{\sigma}(\frac{\gamma+1}{2}) + \bar{\sigma}(\gamma - \frac{\gamma+1}{2} + 1)$. Hence,
$$
\bar{\sigma}^*(\gamma) = 2\bar{\sigma}\!\left(\frac{\gamma+1}{2}\right). \qquad (29)
$$
So, for $i > 2$,
$$
\bar{W}(z_i, \gamma) = 2\bar{\sigma}\!\left(\frac{\gamma+1}{2}\right) + \max_{\gamma' \in [1,\gamma]} \bar{W}(z_{i-1}, \gamma') + \bar{W}(z_{i-1}, \gamma - \gamma' + 1). \qquad (30)
$$

^13 Since this is an interior solution and the problem is concave, the constraint $\gamma' \in [1, \gamma]$ is not binding.
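Equation (29) can be spot-checked numerically. In the sketch below (our own check), we take $\bar{\sigma}(x) = 2\log_2(x) + 1$, the strictly increasing, strictly concave bound that will be used for binary concavity in the proof of Proposition 1, and compare a fine-grid evaluation of (28) with the closed form $2\bar{\sigma}((\gamma+1)/2)$.

```python
# Minimal sketch (our own check): for the concave case sigma_bar(x) = 2*log2(x) + 1,
# the maximizer in (28) is gamma' = (gamma+1)/2, so sigma_bar_star(gamma) = 2*sigma_bar((gamma+1)/2) as in (29).
import math

def sigma_bar(x: float) -> float:
    return 2 * math.log2(x) + 1

for gamma in (3, 10, 57, 200):
    grid = [1 + k * (gamma - 1) / 100000 for k in range(100001)]
    star = max(sigma_bar(g) + sigma_bar(gamma - g + 1) for g in grid)
    closed = 2 * sigma_bar((gamma + 1) / 2)
    assert abs(star - closed) < 1e-6, (gamma, star, closed)
print("(29) verified on a fine grid")
```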

We will now show for $i > 2$ that $\bar{W}(z_i, \gamma) = 2\bar{\sigma}(\frac{\gamma+1}{2}) + 2\bar{W}(z_{i-1}, \frac{\gamma+1}{2})$, which gives a simple recursive relationship for the upper bound (for $i = 1$ or $2$, $\bar{W}(z_i, \gamma) = 0$). First note that $2\bar{W}(z_i, \frac{\gamma+1}{2})$ is just $\bar{W}(z_i, \gamma') + \bar{W}(z_i, \gamma - \gamma' + 1)$ evaluated at $\frac{\gamma+1}{2}$, and so $\max_{\gamma' \in [1,\gamma]} \bar{W}(z_i, \gamma') + \bar{W}(z_i, \gamma - \gamma' + 1) \geq 2\bar{W}(z_i, \frac{\gamma+1}{2})$. So, it is sufficient to show $\max_{\gamma' \in [1,\gamma]} \bar{W}(z_i, \gamma') + \bar{W}(z_i, \gamma - \gamma' + 1) \leq 2\bar{W}(z_i, \frac{\gamma+1}{2})$. Now,
$$
\begin{aligned}
&\max_{\gamma' \in [1,\gamma]} \bar{W}(z_i, \gamma') + \bar{W}(z_i, \gamma - \gamma' + 1) \\
&= \max_{\gamma' \in [1,\gamma]} 2\bar{\sigma}\!\left(\tfrac{\gamma'+1}{2}\right) + 2\bar{W}\!\left(z_{i-1}, \tfrac{\gamma'+1}{2}\right) + 2\bar{\sigma}\!\left(\tfrac{(\gamma-\gamma'+1)+1}{2}\right) + 2\bar{W}\!\left(z_{i-1}, \tfrac{(\gamma-\gamma'+1)+1}{2}\right) \\
&= 2\max_{\gamma' \in [1,\gamma]} \bar{\sigma}\!\left(\tfrac{\gamma'+1}{2}\right) + \bar{\sigma}\!\left(\tfrac{\gamma+1}{2} - \tfrac{\gamma'+1}{2} + 1\right) + \bar{W}\!\left(z_{i-1}, \tfrac{\gamma'+1}{2}\right) + \bar{W}\!\left(z_{i-1}, \tfrac{\gamma+1}{2} - \tfrac{\gamma'+1}{2} + 1\right) \\
&= 2\max_{\tilde{\gamma}' \in [1,\frac{\gamma+1}{2}]} \bar{\sigma}(\tilde{\gamma}') + \bar{\sigma}\!\left(\tfrac{\gamma+1}{2} - \tilde{\gamma}' + 1\right) + \bar{W}(z_{i-1}, \tilde{\gamma}') + \bar{W}\!\left(z_{i-1}, \tfrac{\gamma+1}{2} - \tilde{\gamma}' + 1\right) \\
&\leq 2\max_{\tilde{\gamma}' \in [1,\frac{\gamma+1}{2}]} \left[ \bar{\sigma}(\tilde{\gamma}') + \bar{\sigma}\!\left(\tfrac{\gamma+1}{2} - \tilde{\gamma}' + 1\right) \right] + 2\max_{\tilde{\gamma}' \in [1,\frac{\gamma+1}{2}]} \left[ \bar{W}(z_{i-1}, \tilde{\gamma}') + \bar{W}\!\left(z_{i-1}, \tfrac{\gamma+1}{2} - \tilde{\gamma}' + 1\right) \right] \\
&= 2\max_{\tilde{\gamma}' \in [1,\frac{\gamma+1}{2}]} \left[ \bar{\sigma}(\tilde{\gamma}') + \bar{\sigma}\!\left(\tfrac{\gamma+1}{2} - \tilde{\gamma}' + 1\right) \right] + 2\left( \bar{W}\!\left(z_i, \tfrac{\gamma+1}{2}\right) - 2\bar{\sigma}\!\left(\tfrac{\frac{\gamma+1}{2}+1}{2}\right) \right) \\
&= 2\bar{\sigma}^*\!\left(\tfrac{\gamma+1}{2}\right) + 2\bar{W}\!\left(z_i, \tfrac{\gamma+1}{2}\right) - 4\bar{\sigma}\!\left(\tfrac{\frac{\gamma+1}{2}+1}{2}\right) \\
&= 4\bar{\sigma}\!\left(\tfrac{\frac{\gamma+1}{2}+1}{2}\right) + 2\bar{W}\!\left(z_i, \tfrac{\gamma+1}{2}\right) - 4\bar{\sigma}\!\left(\tfrac{\frac{\gamma+1}{2}+1}{2}\right) \\
&= 2\bar{W}\!\left(z_i, \tfrac{\gamma+1}{2}\right).
\end{aligned}
$$
The first equality follows from (30). The second equality follows from algebra. The third equality is just a change of variables where $\tilde{\gamma}' = (\gamma' + 1)/2$. The inequality follows from $\max(f + g) \leq \max f + \max g$ for any $f, g$. The fourth equality follows from evaluating (30) at $(\gamma+1)/2$ and manipulation. The fifth equality follows from the definition of $\bar{\sigma}^*$ in (28). The sixth equality follows from (29). The last equality simplifies. So, $\max_{\gamma' \in [1,\gamma]} \bar{W}(z_i, \gamma') + \bar{W}(z_i, \gamma - \gamma' + 1) = 2\bar{W}(z_i, \frac{\gamma+1}{2})$.

Using this equality to replace the max in (30), one has (for $i > 2$) that
$$
\bar{W}(z_i, \gamma) = 2\bar{\sigma}\!\left(\frac{\gamma+1}{2}\right) + 2\bar{W}\!\left(z_{i-1}, \frac{\gamma+1}{2}\right). \qquad (31)
$$
Now, fix any $\gamma \geq 1$ and define $\gamma_I := \gamma$ and $\gamma_i = \frac{\gamma_{i+1}+1}{2}$. Then $\gamma_i = 2^{i-I}(\gamma_I - 1) + 1$.^14 Then, for $i > 2$, (31) becomes
$$
\bar{W}(z_i, \gamma_i) = 2\bar{\sigma}(\gamma_{i-1}) + 2\bar{W}(z_{i-1}, \gamma_{i-1}).
$$
So, if $I > 2$, one can repeatedly expand the above to find a value for $\bar{W}(z_I, \gamma_I)$:
$$
\begin{aligned}
\bar{W}(z_I, \gamma_I) &= 2\bar{\sigma}(\gamma_{I-1}) + 2\bar{W}(z_{I-1}, \gamma_{I-1}) \\
&= 2\bar{\sigma}(\gamma_{I-1}) + 2^2\bar{\sigma}(\gamma_{I-2}) + 2^2\bar{W}(z_{I-2}, \gamma_{I-2}) \\
&= 2\bar{\sigma}(\gamma_{I-1}) + \ldots + 2^{I-2}\bar{\sigma}(\gamma_2) + 2^{I-2}\bar{W}(z_2, \gamma_2) \\
&= 2\bar{\sigma}(\gamma_{I-1}) + \ldots + 2^{I-2}\bar{\sigma}(\gamma_2) \\
&= \sum_{j=1}^{I-2} 2^j \bar{\sigma}(\gamma_{I-j}) \\
&= \sum_{j=1}^{I-2} 2^j \bar{\sigma}(2^{I-j-I}(\gamma_I - 1) + 1) \\
&= \sum_{j=1}^{I-2} 2^j \bar{\sigma}(2^{-j}(\gamma_I - 1) + 1).
\end{aligned}
$$
The first equalities are algebra, the fourth uses the definition of $\bar{W}(z_2, \cdot) = 0$, and the rest are algebra. If $I \leq 2$, then $\bar{W}(z_I, \gamma_I) = 0$.

Proposition 7 shows the number of required evaluations is less than or equal to $2\sigma(\gamma_I) + M(z_I, \gamma_I)$ for $z_I \geq 2$ and $\gamma_I \geq 1$. Since $M(z_i, \gamma) = \sigma(\gamma) + W(z_i, \gamma)$ for any $i > 1$ (with $M(z_1, \gamma) = 0$) and $W(z_i, \gamma) \leq \bar{W}(z_i, \gamma)$, the required function evaluations are weakly less than $3\sigma(\gamma_I) + \bar{W}(z_I, \gamma_I)$ for any $I$ (recalling $W(z_I, \gamma_I) = 0$ for $I \leq 2$). Hence, if $I > 2$, then an upper bound is
$$
3\sigma(\gamma_I) + \sum_{j=1}^{I-2} 2^j \bar{\sigma}(2^{-j}(\gamma_I - 1) + 1),
$$
and if $I \leq 2$ an upper bound is $3\sigma(\gamma_I)$.

^14 The proof is by induction. For $i = I$, $\gamma_I = 2^{I-I}(\gamma_I - 1) + 1 = \gamma_I$, which holds. Consider then some $i < I$, and suppose it holds for $i + 1$. Then $\gamma_i = \frac{\gamma_{i+1}+1}{2} = \frac{(2^{i+1-I}(\gamma_I - 1) + 1) + 1}{2} = 2^{i-I}(\gamma_I - 1) + 1$.
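The telescoping that produces the sum in Lemma 11 can be verified directly: iterating (31) down to $\bar{W}(z_2, \cdot) = 0$ should reproduce $\sum_{j=1}^{I-2} 2^j \bar{\sigma}(2^{-j}(\gamma - 1) + 1)$. A minimal sketch, assuming the concave $\bar{\sigma}$ used later (our own check, not the authors' code):

```python
# Minimal sketch (our own check): the recursion (31) telescopes to the sum appearing in Lemma 11.
import math

def sigma_bar(x: float) -> float:
    return 2 * math.log2(x) + 1  # the concave bound used for binary concavity

def W_bar(i: int, gamma: float) -> float:
    if i <= 2:
        return 0.0
    mid = (gamma + 1) / 2
    return 2 * sigma_bar(mid) + 2 * W_bar(i - 1, mid)

def closed_form(I: int, gamma: float) -> float:
    return sum(2**j * sigma_bar(2**(-j) * (gamma - 1) + 1) for j in range(1, I - 1))

for I in (3, 5, 8):
    for gamma in (5.0, 33.0, 1000.0):
        assert abs(W_bar(I, gamma) - closed_form(I, gamma)) < 1e-9
print("Lemma 11 telescoping verified")
```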

Proposition 1 essentially applies the formula in Lemma 11 and simplifies. In the proof, there will be two partial sums, and we establish what they equal now.

Lemma 12. For $r \neq 1$, $\sum_{j=a}^{b} r^j = (r^a - r^{b+1})/(1-r)$ and $\sum_{j=a}^{b} jr^j = \frac{ar^a - br^{b+1}}{1-r} + \frac{r^{a+1} - r^{b+1}}{(1-r)^2}$. For $r = 2$, $\sum_{j=a}^{b} 2^j = 2^{b+1} - 2^a$ and $\sum_{j=a}^{b} j2^j = 2^a(2-a) + 2^{b+1}(b-1)$.

Proof. The first sum, written as $\sum_{j=a}^{b} r^j$, is the standard geometric series and its sum can be compactly written as $(r^a - r^{b+1})/(1-r)$ for $r \neq 1$. For $r = 2$, this is $2^{b+1} - 2^a$.

The second sum, $\sum_{j=a}^{b} jr^j$, a sort of weighted geometric series, has no commonly known formula, so we derive it. Define $S := \sum_{j=a}^{b} jr^j$. Then for $r \neq 1$,
$$
\begin{aligned}
(1-r)S &= \sum_{j=a}^{b} jr^j - \sum_{j=a}^{b} jr^{j+1} \\
&= \sum_{j=a}^{b} jr^j - \sum_{j=a+1}^{b+1} (j-1)r^j \\
&= \sum_{j=a}^{b} jr^j - \sum_{j=a+1}^{b+1} jr^j + \sum_{j=a+1}^{b+1} r^j \\
&= ar^a + \left( \sum_{j=a+1}^{b} jr^j \right) - \left( \sum_{j=a+1}^{b} jr^j + (b+1)r^{b+1} \right) + \frac{r^{a+1} - r^{b+2}}{1-r}.
\end{aligned}
$$
The first line is algebra, the second a change of indices, the third algebra, and the fourth separates out a term from each of the first two summations (and uses the geometric series formula to replace the third). Canceling the remaining summations, one then has
$$
\begin{aligned}
(1-r)S &= ar^a - (b+1)r^{b+1} + \frac{r^{a+1} - r^{b+2}}{1-r} \\
&= ar^a - br^{b+1} - \frac{(1-r)r^{b+1}}{1-r} + \frac{r^{a+1} - r^{b+2}}{1-r} \\
&= ar^a - br^{b+1} + \frac{r^{a+1} - r^{b+1}}{1-r} \\
\Leftrightarrow \quad S &= \frac{ar^a - br^{b+1}}{1-r} + \frac{r^{a+1} - r^{b+1}}{(1-r)^2}.
\end{aligned}
$$
Plugging in $r = 2$ gives $S = b2^{b+1} - a2^a + 2^{a+1} - 2^{b+1} = 2^a(2-a) + 2^{b+1}(b-1)$.
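Both closed forms in Lemma 12, specialized to $r = 2$ (the case used below), are easy to confirm by direct summation; a minimal sketch (our own check):

```python
# Minimal sketch (our own check): the two closed forms in Lemma 12, specialized to r = 2.
def geometric(a: int, b: int) -> int:
    return sum(2**j for j in range(a, b + 1))

def weighted(a: int, b: int) -> int:
    return sum(j * 2**j for j in range(a, b + 1))

for a in range(0, 6):
    for b in range(a, 12):
        assert geometric(a, b) == 2**(b + 1) - 2**a
        assert weighted(a, b) == 2**a * (2 - a) + 2**(b + 1) * (b - 1)
print("Lemma 12 closed forms verified")
```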

We can now prove Proposition 1 by applying Lemma 11 with $\bar{\sigma}$ bounds corresponding to brute force and binary concavity.

Proof of Proposition 1

Proof. From Lemma 11, the number of evaluations required by the algorithm is not greater than $3\bar{\sigma}(n') + \bar{W}(n, n')$ where $\bar{W}(n, n') := \sum_{j=1}^{I-2} 2^j \bar{\sigma}(2^{-j}(n'-1)+1)$ and $I = \lceil \log_2(n-1) \rceil + 1$. In the case of brute force, $\bar{\sigma}(\gamma) = \gamma$ is a valid upper bound on $\sigma(\gamma) = \gamma$. Plugging this into the expression for $\bar{W}(n, n')$, one has
$$
\begin{aligned}
\bar{W}(n, n') &= (I-2)(n'-1) + \sum_{j=1}^{I-2} 2^j \\
&= (I-2)(n'-1) + 2^{I-1} - 2 \\
&= (I-2)(n'-1) + 2^{\lceil \log_2(n-1) \rceil} - 2 \\
&\leq (I-2)(n'-1) + 2^{\log_2(n-1)+1} - 2 \\
&\leq (I-2)(n'-1) + 2(n-1) - 2 \\
&= (\lceil \log_2(n-1) \rceil - 1)(n'-1) + 2n - 4 \\
&\leq (n'-1)\log_2(n-1) + 2n - 4,
\end{aligned}
$$
where we have used Lemma 12 to arrive at the second line. So, no more than $(n'-1)\log_2(n-1) + 3n' + 2n - 4$ evaluations are required.

In the case of binary concavity, Lemma 7 shows that $\sigma(\gamma) = 2\lceil \log_2(\gamma) \rceil - 1$ for $\gamma \geq 3$ and $\sigma(\gamma) = \gamma$ for $\gamma \leq 3$ is an upper bound. Now consider $\bar{\sigma}(\gamma) = 2\log_2(\gamma) + 1$. It is a strictly increasing, strictly concave, and differentiable function. For $\gamma = 1$ or $2$, one can plug in values to find $\sigma(\gamma) \leq \bar{\sigma}(\gamma)$. Additionally, for $\gamma \geq 3$,
$$
\sigma(\gamma) \leq 2\lceil \log_2(\gamma) \rceil - 1 \leq 2(1 + \log_2(\gamma)) - 1 = \bar{\sigma}(\gamma).
$$
So, $\bar{\sigma}$ satisfies all the conditions of Lemma 11.


Plugging this $\bar{\sigma}$ into the bound, one finds
$$
\bar{W}(n, n') = \sum_{j=1}^{I-2} 2^j \left[ 1 + 2\log_2(2^{-j}(n'-1) + 1) \right]
= (2^{I-1} - 2) + 2\sum_{j=1}^{I-2} 2^j \log_2(2^{-j}(n'-1) + 1)
$$
(using Lemma 12). To handle the $\log_2(2^{-j}(n'-1) + 1)$ term, we break the summation into two parts, one with $2^{-j}(n'-1) < 1$ and one with $2^{-j}(n'-1) \geq 1$. We do this to exploit the following fact: For $x \geq 1$, $\log_2(x+1) \leq \log_2(x) + 1$ since they are equal at $x = 1$ and the right hand side grows more quickly in $x$ (i.e., the derivative of $\log_2(x+1)$ is less than the derivative of $\log_2(x) + 1$). Let $J$ be such that $j > J$ implies $2^{-j}(n'-1) < 1$ and $j \leq J$ implies $2^{-j}(n'-1) \geq 1$. Then since $2^{-j}(n'-1) = 1$ for $j = \log_2(n'-1)$, $J$ is given by $\lfloor \log_2(n'-1) \rfloor$. Recall that in the statement of the proposition we assumed that $n' \geq 3$. So, $J \geq 1$. Then
$$
\begin{aligned}
\bar{W}(n, n') &= (2^{I-1} - 2) + 2\sum_{j=J+1}^{I-2} 2^j \log_2\big(\underbrace{2^{-j}(n'-1)}_{<1} + 1\big) + 2\sum_{j=1}^{J} 2^j \log_2\big(\underbrace{2^{-j}(n'-1)}_{\geq 1} + 1\big) \\
&\leq (2^{I-1} - 2) + 2\sum_{j=J+1}^{I-2} 2^j + 2\sum_{j=1}^{J} 2^j \left(1 + \log_2(2^{-j}(n'-1))\right) \\
&= (2^{I-1} - 2) + 2\sum_{j=1}^{I-2} 2^j + 2\sum_{j=1}^{J} 2^j \log_2(2^{-j}(n'-1)) \\
&= 3(2^{I-1} - 2) + 2\sum_{j=1}^{J} 2^j \log_2(2^{-j}(n'-1)) \\
&= 3(2^{I-1} - 2) + 2\sum_{j=1}^{J} 2^j \left(-j + \log_2(n'-1)\right) \\
&= 3(2^{I-1} - 2) - 2\sum_{j=1}^{J} j2^j + 2\log_2(n'-1)\sum_{j=1}^{J} 2^j.
\end{aligned}
$$
The first line follows from the definition of $J$; the second from $\log_2(x+1) \leq 1 + \log_2(x)$ for $x \geq 1$ (and, for the first summation, $\log_2(x+1) < 1$ for $x < 1$); the third from algebra; the fourth from the standard geometric series formula; and the fifth and sixth from algebra. Then, using the weighted geometric sum found in Lemma 12, i.e., $\sum_{j=a}^{b} j2^j = 2^a(2-a) + 2^{b+1}(b-1)$,
$$
\begin{aligned}
\bar{W}(n, n') &\leq 3(2^{I-1} - 2) - 2\left(2^1(2-1) + 2^{J+1}(J-1)\right) + 2\log_2(n'-1)(2^{J+1} - 2) \\
&= 3(2^{I-1} - 2) - 4 - 2^{J+2}(J-1) + 2\log_2(n'-1)(2^{J+1} - 2) \\
&\leq 3(2^{I-1} - 2) - 4 - 2^{J+2}(J-1) + 2(J+1)(2^{J+1} - 2) \\
&= 3(2^{I-1} - 2) - 4 - 2^{J+2}(J-1) + (J+1)2^{J+2} - 4(J+1) \\
&= 3(2^{I-1} - 2) - 4 - J2^{J+2} + 2^{J+2} + J2^{J+2} + 2^{J+2} - 4(J+1) \\
&= 3(2^{I-1} - 2) - 4 + 2^{J+3} - 4J - 4 \\
&= 3 \cdot 2^{I-1} + 2^{J+3} - 4J - 14.
\end{aligned}
$$
The first line applies the weighted geometric sum formula, the second simplifies, the third uses $\log_2(n'-1) \leq J+1$ (i.e., $\log_2(n'-1) \leq \lfloor \log_2(n'-1) \rfloor + 1$), and the remaining lines use algebra. Now, substituting the expressions for $I$ and $J$,
$$
\begin{aligned}
\bar{W}(n, n') &\leq 3 \cdot 2^{\lceil \log_2(n-1) \rceil + 1 - 1} + 2^{\lfloor \log_2(n'-1) \rfloor + 3} - 4\lfloor \log_2(n'-1) \rfloor - 14 \\
&\leq 3 \cdot 2^{1+\log_2(n-1)} + 2^{\log_2(n'-1)+3} - 4\lfloor \log_2(n'-1) \rfloor - 14 \\
&= 6(n-1) + 8(n'-1) - 4\lfloor \log_2(n'-1) \rfloor - 14 \\
&= 6n + 8n' - 4\lfloor \log_2(n'-1) \rfloor - 28 \\
&\leq 6n + 8n' - 4(\log_2(n'-1) - 1) - 28 \\
&= 6n + 8n' - 4\log_2(n'-1) - 24.
\end{aligned}
$$
The above expression provides a bound for $\bar{W}(n, n')$. Hence, the total number of evaluations—which must be less than $3\bar{\sigma}(n') + \bar{W}(n, n')$—cannot exceed
$$
\begin{aligned}
3(2\log_2(n') + 1) + 6n + 8n' - 4\log_2(n'-1) - 24 &= 6n + 8n' + 6\log_2(n') - 4\log_2(n'-1) - 21 \\
&\leq 6n + 8n' + 6(\log_2(n'-1) + 1) - 4\log_2(n'-1) - 21 \\
&= 6n + 8n' + 2\log_2(n'-1) - 15.
\end{aligned}
$$
The second line is algebra, the third again uses $\log_2(x+1) \leq 1 + \log_2(x)$ for $x \geq 1$ (note $n' \geq 3$ in the statement of the proposition), and the last simplifies.
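To see what the two bounds just derived imply in practice, the sketch below (our own illustration, taking $n' = n$) evaluates them for a few grid sizes; the per-state costs are roughly $\log_2(n) + 5$ when using brute force within brackets and roughly $14$ with binary concavity, consistent with the $n\log_2(n) + 5n$ and $14n + 2\log_2(n)$ totals.

```python
# Minimal sketch (our own illustration): per-state worst-case evaluation counts implied by
# the two Proposition 1 bounds, (n'-1)log2(n-1) + 3n' + 2n - 4 and 6n + 8n' + 2log2(n'-1) - 15,
# evaluated here with n' = n.
import math

for n in (100, 500, 1000, 10000):
    brute = (n - 1) * math.log2(n - 1) + 3 * n + 2 * n - 4
    concave = 6 * n + 8 * n + 2 * math.log2(n - 1) - 15
    print(f"n = {n:6d}: brute-force bound/n = {brute / n:6.2f}, binary-concavity bound/n = {concave / n:6.2f}")
```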

F.3.3    Binary monotonicity in two dimensions

In proving the efficiency of the two-state binary monotonicity algorithm, we first establish a lemma.

Lemma 13. Define $m(a, b) = \lfloor \frac{a+b}{2} \rfloor$ for $b > a$ with $a, b \in \mathbb{Z}$. Then $m(a, b) - a + 1 \leq \lfloor \frac{b-a+1}{2} \rfloor + 1$ and $b - m(a, b) + 1 \leq \lfloor \frac{b-a+1}{2} \rfloor + 1$.

Proof. If exactly one of $a, b$ is odd, then $m = \frac{b+a-1}{2}$; otherwise, $m = \frac{b+a}{2}$. So,
$$
m - a + 1 \leq \frac{b+a}{2} - a + 1 = \frac{b-a+1}{2} + \frac{1}{2} \leq \left\lfloor \frac{b-a+1}{2} \right\rfloor + 1.
$$
Now, take the case of exactly one of $a, b$ being odd. Then $b - a$ is odd and $b - a + 1$ is even. In this case,
$$
b - m + 1 = b - \frac{b+a-1}{2} + 1 = \frac{b-a+1}{2} + 1 = \left\lfloor \frac{b-a+1}{2} \right\rfloor + 1.
$$
If on the other hand $a, b$ are either both even or both odd, then $b - a + 1$ is odd. In this case,
$$
b - m + 1 = b - \frac{b+a}{2} + 1 = \frac{b-a+1}{2} + \frac{1}{2} = \left\lfloor \frac{b-a+1}{2} \right\rfloor + 1.
$$
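Lemma 13's midpoint inequalities can be confirmed by brute force over a grid of integer pairs; a minimal sketch (our own check):

```python
# Minimal sketch (our own check): the midpoint inequalities of Lemma 13.
for a in range(-20, 21):
    for b in range(a + 1, a + 60):
        m = (a + b) // 2  # floor division matches the floor in the lemma, including negative values
        assert m - a + 1 <= (b - a + 1) // 2 + 1
        assert b - m + 1 <= (b - a + 1) // 2 + 1
print("Lemma 13 verified on a grid of (a, b) pairs")
```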

We now give the proof of the two-state algorithm's cost bounds:

Proof of Proposition 2

Proof. Let $C(z, \gamma)$ denote an upper bound on the cost of the usual binary monotonicity algorithm given $z$ states and $\gamma$ choices. Then note that the cost of solving for $g(\cdot, 1)$ is less than $C(n_1, n')$, as is the cost of solving for $g(\cdot, n_2)$. Let $g(\cdot, \cdot\,; \pi)$ give the policy selected by the algorithm when the objective function is $\pi$. Let $T^{\mathrm{exact}}(\underline{j}, \overline{j}; \pi)$ denote the exact cost of the algorithm for recovering $g(\cdot, j; \pi)$ for all $j \in \{\underline{j}+1, \ldots, \overline{j}-1\}$ when the objective function is $\pi$ and $g(\cdot, \underline{j}; \pi)$ and $g(\cdot, \overline{j}; \pi)$ are known. (That is, the cost from Step 3 onward.) Define for $z > 2$ and $\gamma \geq 1$
$$
\begin{aligned}
T^*(z, \gamma) = \sup_{\underline{j}, \overline{j} \in \{1,\ldots,n_2\},\, \pi} \quad & C(n_1, \gamma) + T^{\mathrm{exact}}(\underline{j}, m(\underline{j}, \overline{j}); \pi) + T^{\mathrm{exact}}(m(\underline{j}, \overline{j}), \overline{j}; \pi) \\
\text{s.t.} \quad & \overline{j} - \underline{j} + 1 \leq z, \quad \underline{j} < \overline{j}, \quad g(n_1, \overline{j}; \pi) - g(1, \underline{j}; \pi) + 1 \leq \gamma
\end{aligned}
$$
where $m(a, b)$ is the integer midpoint $\lfloor \frac{a+b}{2} \rfloor$. For $z = 2$, define $T^*(z, \cdot) = 0$. Note $T^*(z, \gamma)$ must be weakly increasing in both arguments.

Fix some $(\underline{j}, \overline{j}, \pi)$ with $\underline{j}, \overline{j} \in \{1, \ldots, n_2\}$ and $\underline{j} < \overline{j}$, and note that $(\underline{j}, \overline{j}, \pi)$ is in the choice set of the $T^*(\overline{j} - \underline{j} + 1, g(n_1, \overline{j}; \pi) - g(1, \underline{j}; \pi) + 1)$ problem. We now show $T^{\mathrm{exact}}$ is bounded by $T^*$ in that $T^{\mathrm{exact}}(\underline{j}, \overline{j}; \pi) \leq T^*(\overline{j} - \underline{j} + 1, g(n_1, \overline{j}; \pi) - g(1, \underline{j}; \pi) + 1)$. To see this, note that if $\overline{j} - \underline{j} + 1 = 2$ then $T^{\mathrm{exact}}(\underline{j}, \overline{j}; \pi) = 0$, in which case $T^*(\overline{j} - \underline{j} + 1, \cdot) = 0$ by definition. On the other hand, if $\overline{j} - \underline{j} + 1 > 2$, then
$$
T^{\mathrm{exact}}(\underline{j}, \overline{j}; \pi) \leq C(n_1, g(n_1, \overline{j}; \pi) - g(1, \underline{j}; \pi) + 1) + T^{\mathrm{exact}}(\underline{j}, m(\underline{j}, \overline{j}); \pi) + T^{\mathrm{exact}}(m(\underline{j}, \overline{j}), \overline{j}; \pi)
$$
because $C$ bounds the cost of the one-dimensional algorithm. Comparing with the definition of $T^*$, $T^*(\overline{j} - \underline{j} + 1, g(n_1, \overline{j}; \pi) - g(1, \underline{j}; \pi) + 1)$ is necessarily larger because $(\underline{j}, \overline{j}, \pi)$ is in its choice set.

Now, using this bound and the definition of $T^*$ gives
$$
\begin{aligned}
T^*(z, \gamma) &\leq \sup_{\underline{j}, \overline{j}, \pi} C(n_1, \gamma) + T^*(m - \underline{j} + 1, \gamma'(\pi, \underline{j}, \overline{j})) + T^*(\overline{j} - m + 1, g(n_1, \overline{j}; \pi) - g(1, m; \pi) + 1) \\
&= \sup_{\underline{j}, \overline{j}, \pi} C(n_1, \gamma) + T^*(m - \underline{j} + 1, \gamma'(\pi, \underline{j}, \overline{j})) + T^*\!\left(\overline{j} - m + 1, \; g(n_1, \overline{j}; \pi) - g(1, \underline{j}; \pi) + 1 + g(1, \underline{j}; \pi) - g(1, m; \pi)\right) \\
&\leq \sup_{\underline{j}, \overline{j}, \pi} C(n_1, \gamma) + T^*(m - \underline{j} + 1, \gamma'(\pi, \underline{j}, \overline{j})) + T^*(\overline{j} - m + 1, \gamma + g(1, \underline{j}; \pi) - g(1, m; \pi)) \\
&= \sup_{\underline{j}, \overline{j}, \pi} C(n_1, \gamma) + T^*(m - \underline{j} + 1, \gamma'(\pi, \underline{j}, \overline{j})) + T^*(\overline{j} - m + 1, \gamma + g(n_1, m; \pi) - g(1, m; \pi) + 1 - \gamma'(\pi, \underline{j}, \overline{j}))
\end{aligned}
$$
where $m = m(\underline{j}, \overline{j})$ and we have defined $\gamma'(\pi, \underline{j}, \overline{j}) := g(n_1, m; \pi) - g(1, \underline{j}; \pi) + 1$. The first relation follows from $T^{\mathrm{exact}}$ being less than $T^*$, the second from adding and subtracting $g(1, \underline{j}; \pi)$, the third from the constraint that $g(n_1, \overline{j}; \pi) - g(1, \underline{j}; \pi) + 1 \leq \gamma$ and $T^*$ being increasing in its second argument, and the last from adding and subtracting $\gamma'$ and manipulation.

With this definition of $\gamma'$, the constraint $g(n_1, \overline{j}; \pi) - g(1, \underline{j}; \pi) + 1 \leq \gamma$ implies $\gamma'(\pi, \underline{j}, \overline{j}) \leq \gamma$ (since $g(n_1, m; \pi) \leq g(n_1, \overline{j}; \pi)$). So, $\gamma'(\pi, \underline{j}, \overline{j}) \in [1, \gamma]$. Using our $\lambda$-based restriction, we have as an implication of it that $g(n_1, m; \pi) - g(1, m; \pi) + 1 \leq \lambda(g(n_1, \overline{j}; \pi) - g(1, \underline{j}; \pi) + 1) \leq \lambda\gamma$. So, because $T^*$ is increasing in its second argument,
$$
T^*(z, \gamma) \leq \sup_{\underline{j}, \overline{j}, \pi} C(n_1, \gamma) + T^*(m - \underline{j} + 1, \gamma'(\pi, \underline{j}, \overline{j})) + T^*(\overline{j} - m + 1, (1+\lambda)\gamma - \gamma'(\pi, \underline{j}, \overline{j})).
$$
Note now that $\pi$ only shows up in $\gamma'$. Since the choice set implies $\gamma' \in [1, \gamma]$, we can drop $\pi$ and allow $\gamma' \in [1, \gamma]$ to be chosen directly:
$$
T^*(z, \gamma) \leq \sup_{\underline{j}, \overline{j}, \gamma' \in [1,\gamma]} C(n_1, \gamma) + T^*(m - \underline{j} + 1, \gamma') + T^*(\overline{j} - m + 1, (1+\lambda)\gamma - \gamma').
$$
By Lemma 13, $m - \underline{j} + 1$ and $\overline{j} - m + 1$ are no greater than $\lfloor \frac{\overline{j} - \underline{j} + 1}{2} \rfloor + 1$, which—because of the constraint $\overline{j} - \underline{j} + 1 \leq z$—must be no greater than $\lfloor \frac{z}{2} \rfloor + 1$. So,
$$
T^*(z, \gamma) \leq \sup_{\underline{j}, \overline{j}, \gamma' \in [1,\gamma]} C(n_1, \gamma) + T^*\!\left(\left\lfloor \frac{z}{2} \right\rfloor + 1, \gamma'\right) + T^*\!\left(\left\lfloor \frac{z}{2} \right\rfloor + 1, (1+\lambda)\gamma - \gamma'\right),
$$
which no longer depends on $\underline{j}$ or $\overline{j}$. Hence, dropping these from the choice set, one has
$$
T^*(z, \gamma) \leq C(n_1, \gamma) + \sup_{\gamma' \in [1,\gamma]} T^*\!\left(\left\lfloor \frac{z}{2} \right\rfloor + 1, \gamma'\right) + T^*\!\left(\left\lfloor \frac{z}{2} \right\rfloor + 1, (1+\lambda)\gamma - \gamma'\right).
$$
Defining $T(z, \gamma) := C(n_1, \gamma) + \max_{\gamma' \in [1,\gamma]} T(\lfloor \frac{z}{2} \rfloor + 1, \gamma') + T(\lfloor \frac{z}{2} \rfloor + 1, (1+\lambda)\gamma - \gamma')$ for $z > 2$ and $0$ otherwise, we then have $T(z, \gamma)$ as an upper bound for $T^*(z, \gamma)$ (by induction with $T^*(2, \gamma) = T(2, \gamma) = 0$ as the base case). Now, by Lemma 10, there is a unique sequence $\{z_i\}_{i=1}^{I}$ with $z_I = z$, $\lfloor \frac{z_i}{2} \rfloor + 1 = z_{i-1}$, $z_1 = 2$, and $z_i > 2$ for all $i > 1$, and $I = \lceil \log_2(z-1) \rceil + 1$. Then,
$$
T(z_i, \gamma) = C(n_1, \gamma) + \sup_{\gamma' \in [1,\gamma]} T(z_{i-1}, \gamma') + T(z_{i-1}, (1+\lambda)\gamma - \gamma')
$$

for all $i$.

For binary monotonicity with brute force grid search, the upper bound on function counts in Proposition 1 is linear and increasing in $\gamma$. So suppose $C$ is linear and increasing in $\gamma$. In that case, we will show $T(z_i, \gamma)$ is linear and increasing in $\gamma$ for all $i \geq 2$ and that
$$
T(z_i, \gamma) = C(n_1, \gamma) + 2T\!\left(z_{i-1}, \gamma\frac{1+\lambda}{2}\right).
$$
First note this is trivially the case for $i = 2$ since in that case $T(z_{i-1}, \cdot) = T(z_1, \cdot) = T(2, \cdot) = 0$. Now, suppose that it holds for $i - 1$. Then continuity gives that the maximum is attained, so sup can be replaced with max. Because of linearity, $T(z_{i-1}, \gamma') + T(z_{i-1}, (1+\lambda)\gamma - \gamma')$ is independent of $\gamma'$. Consequently, $\max_{\gamma'} T(z_{i-1}, \gamma') + T(z_{i-1}, (1+\lambda)\gamma - \gamma') = 2T(z_{i-1}, \gamma\frac{1+\lambda}{2})$. Hence, $T(z_i, \gamma)$ will also be linearly increasing and satisfy the recursive formulation above.

Now, expanding the recursive formulation and defining $c := (1+\lambda)/2$,
$$
\begin{aligned}
T(z_I, \gamma) &= C(n_1, \gamma) + 2T(z_{I-1}, \gamma c) \\
&= C(n_1, \gamma) + 2\left(C(n_1, \gamma c) + 2T(z_{I-2}, (\gamma c)c)\right) \\
&= C(n_1, \gamma) + 2C(n_1, \gamma c) + 2^2 T(z_{I-2}, \gamma c^2) \\
&= C(n_1, \gamma) + 2C(n_1, \gamma c) + \ldots + 2^{I-2}C(n_1, \gamma c^{I-2}) + 2^{I-1}T(z_{I-(I-1)}, \gamma c^{I-1}) \\
&= C(n_1, \gamma) + 2C(n_1, \gamma c) + \ldots + 2^{I-2}C(n_1, \gamma c^{I-2}) + 2^{I-1}T(2, \gamma c^{I-1}) \\
&= C(n_1, \gamma) + 2C(n_1, \gamma c) + \ldots + 2^{I-2}C(n_1, \gamma c^{I-2}) \\
&= \sum_{i=0}^{I-2} 2^i C(n_1, \gamma c^i).
\end{aligned}
$$
Plugging in $c = (1+\lambda)/2$, $z_I = n_2$ (corresponding to the first time Step 3 is reached), and $\gamma = n'$ (corresponding to Step 3 being reached with the worst case $g(\cdot, 1) = 1$ and $g(\cdot, n_2) = n'$),
$$
T(n_2, n') = \sum_{i=0}^{I-2} 2^i C(n_1, 2^{-i}(1+\lambda)^i n')
$$
where $I = \lceil \log_2(n_2 - 1) \rceil + 1$. With brute force grid search, a valid $C$ is $C(n_1, \gamma) = 2n_1 + \gamma(\log_2(n_1) + 3)$. Then
$$
\begin{aligned}
T(n_2, n') &= \sum_{i=0}^{I-2} 2^i \left[ 2n_1 + 2^{-i}(1+\lambda)^i n' (\log_2(n_1) + 3) \right] \\
&= n'(\log_2(n_1) + 3) \sum_{i=0}^{I-2} (1+\lambda)^i + 2n_1 \sum_{i=0}^{I-2} 2^i \\
&= n'(\log_2(n_1) + 3) \frac{1 - (1+\lambda)^{I-1}}{-\lambda} + 2n_1(2^{I-1} - 1) \\
&= n'(\log_2(n_1) + 3) \frac{(1+\lambda)^{I-1} - 1}{\lambda} + 2n_1(2^{I-1} - 1)
\end{aligned}
$$
using the formulas in Lemma 12. Now, $I = \lceil \log_2(n_2 - 1) \rceil + 1$ implies $I - 1 \leq \log_2(n_2 - 1) + 1$. So, defining $\kappa := \log_2(1+\lambda)$ so that $(1+\lambda) = 2^{\kappa}$,
$$
\begin{aligned}
T(n_2, n') &= n'(\log_2(n_1) + 3) \frac{2^{\kappa(I-1)} - 1}{\lambda} + 2n_1(2^{I-1} - 1) \\
&\leq n'(\log_2(n_1) + 3)\lambda^{-1} 2^{\kappa(I-1)} + 2n_1(2^{I-1} - 1) \\
&\leq n'(\log_2(n_1) + 3)\lambda^{-1} 2^{\kappa(\log_2(n_2-1)+1)} + 2n_1(2^{\log_2(n_2-1)+1} - 1) \\
&= n'(\log_2(n_1) + 3)\lambda^{-1} (2^{\log_2(n_2-1)})^{\kappa} 2^{\kappa} + 2n_1(2(n_2-1) - 1) \\
&= n'(\log_2(n_1) + 3)\lambda^{-1} (n_2-1)^{\kappa} 2^{\kappa} + 4n_1 n_2 - 6n_1 \\
&\leq n'(\log_2(n_1) + 3)\lambda^{-1} n_2^{\kappa} (1+\lambda) + 4n_1 n_2 - 6n_1 \\
&= (1+\lambda)\lambda^{-1}\log_2(n_1) n' n_2^{\kappa} + 3(1+\lambda)\lambda^{-1} n' n_2^{\kappa} + 4n_1 n_2 - 6n_1 \\
&= (\lambda^{-1} + 1)\log_2(n_1) n' n_2^{\kappa} + 3(\lambda^{-1} + 1) n' n_2^{\kappa} + 4n_1 n_2 - 6n_1.
\end{aligned}
$$
This bound does not include the cost of solving for $g(\cdot, 1)$ and $g(\cdot, n_2)$ using the standard binary algorithm. Including this cost, the algorithm's total cost is less than
$$
\begin{aligned}
2C(n_1, n') + T(n_2, n') &\leq 2n'(\log_2(n_1) + 3) + 4n_1 + (1+\lambda^{-1})\log_2(n_1) n' n_2^{\kappa} + 3(1+\lambda^{-1}) n' n_2^{\kappa} + 4n_1 n_2 - 6n_1 \\
&\leq (1+\lambda^{-1})\log_2(n_1) n' n_2^{\kappa} + 3(1+\lambda^{-1}) n' n_2^{\kappa} + 4n_1 n_2 + 2n'\log_2(n_1) + 6n',
\end{aligned}
$$
which is the bound stated in the proposition.

Now, suppose that $\lambda < 1$ provides a uniform bound on $(g(n_1, j) - g(1, j) + 1)/(g(n_1, j+1) - g(1, j-1) + 1)$ (this also implies $\lambda > 0$). So, $\kappa \in (0, 1)$ (since $\kappa = \log_2(1+\lambda)$). To characterize the algorithm's $O(n_1 n_2)$ behavior when $n_1 = n' =: n$ and $n_2 = \rho^{-1} n$, divide the bound by $n_1 n_2 = n^2/\rho$ to arrive at
$$
c\,\frac{\log_2(n)\, n^{1+\kappa}}{n^2} + 4 + \frac{o(n^2)}{n^2}
$$
where $c$ is a positive constant. Then it is enough to show that $\log(n)n^{1+\kappa}$ (using that the natural log has the same asymptotics as $\log_2$) grows more slowly than $n^2$. The ratio $\log(n)n^{1+\kappa}/n^2$ equals $\log(n)/n^{1-\kappa}$. Using l'Hôpital's rule as $n \to \infty$, if the limit exists it is the same as the limit of $(1/n)/((1-\kappa)n^{-\kappa}) = n^{-1+\kappa}/(1-\kappa)$. This equals 0 since $\kappa \in (0, 1)$. Consequently, the cost is $O(n_1 n_2)$ with a hidden constant of 4.
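Finally, the $O(n_1 n_2)$ claim with hidden constant 4 can be visualized by evaluating the total-cost bound above and dividing by $n_1 n_2$. The sketch below (our own illustration) uses the assumed, purely illustrative values $\lambda = 1/2$ and $n_1 = n_2 = n' = n$; the ratio falls toward 4 as $n$ grows, though slowly because of the $\log_2(n_1)\, n' n_2^{\kappa}$ term.

```python
# Minimal sketch (our own illustration): the two-dimensional total-cost bound
#   (1 + 1/lambda) log2(n1) n' n2^kappa + 3(1 + 1/lambda) n' n2^kappa + 4 n1 n2 + 2 n' log2(n1) + 6 n',
# divided by n1*n2, approaches the hidden constant 4 as n grows (lambda = 1/2 assumed here).
import math

lam = 0.5
kappa = math.log2(1 + lam)

def bound(n1: int, n2: int, nprime: int) -> float:
    t = (1 + 1 / lam) * nprime * n2**kappa
    return t * math.log2(n1) + 3 * t + 4 * n1 * n2 + 2 * nprime * math.log2(n1) + 6 * nprime

for n in (10**2, 10**3, 10**4, 10**5):
    print(f"n = {n:7d}: bound / (n1*n2) = {bound(n, n, n) / (n * n):7.3f}")
```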
