Abstract. The Stochastic Trust-Region Response-Surface method (STRONG) is a new response-surface framework for simulation optimization. In this paper, we point out that STRONG can fail to converge when the response surface is generally-distributed rather than normal. We investigate the reason and propose a modified framework, called STRONG-X, that addresses the problem. We prove that STRONG-X achieves convergence under generally-distributed response surfaces. Numerical experiments show that STRONG and STRONG-X are preferred in different situations and are complementary to each other.

Keywords: Response surface methodology, Trust region method, Simulation optimization, Black-box method.

1 Introduction

Simulation optimization refers to an iterative procedure in search of the optimal parameter when the objective function can only be evaluated by stochastic simulation. Mathematically, the optimization problem can be written as:

minimize g(x), x ∈ R^p,  (1)

where g(x) = E[G(x)] is the objective function, G(x) is the stochastic response, and x is a controllable parameter (possibly a vector). Chang et al. (2011) proposed a new response-surface framework, called the Stochastic Trust-Region Response-Surface method (STRONG), for simulation optimization. Since its introduction, STRONG has drawn a great deal of attention, because it preserves the advantages, yet eliminates the disadvantages, of the traditional response surface methodology (RSM) (Myers et al., 2009), which is one of the most popular methods for simulation optimization. Specifically, thanks to well-established statistical tools such as design of experiment (DOE) and regression analysis, STRONG is computationally superior to other existing approaches, especially when solving higher-dimensional problems. The novel integration of the concept of "trust region" (Conn et al., 2000), originally developed in deterministic nonlinear programming, further frees STRONG from the weaknesses of the traditional RSM: the requirement of human involvement and the failure to guarantee convergence. Indeed, STRONG is an automated framework with a provable convergence guarantee. The automation property of STRONG is especially important for simulation optimization, because the number of iterations in simulation optimization is typically very large and hence it is difficult, if not impossible, to stop and ask for human input at each iteration, as required by the traditional RSM. The convergence property, which guarantees that the algorithm converges to one of the local optima with probability one, is also of theoretical relevance, as it ensures that the algorithm is "on the right track"; moreover, if sufficient computation is allowed, the algorithm can always find a solution arbitrarily close to the true optimum, which is desired but never achieved by the traditional RSM. In addition to the aforementioned strengths, STRONG enjoys broad applicability: it can handle both white-box and black-box problems. More specifically, when the problem can be exploited to derive an effective gradient estimator (i.e., white-box problems), STRONG allows the use of the derived gradient estimator to construct the local model and, as a consequence, achieves better computational efficiency. On the other hand, if the problem is too complex and only inputs and outputs of the simulation model are observable (i.e., black-box problems), a black-box version of STRONG that employs regression techniques to construct the local model is also provided. STRONG, although attractive for its automation property, guaranteed convergence, and computational superiority, has some limitations. The main point is that, when a new point is generated, STRONG decides whether to accept it by conducting hypothesis testing, which is essentially a t-test comparing the current solution and the new solution.
If there exists sufficient evidence showing that the new solution can "yield sufficient reductions," the current solution is replaced by the new solution; otherwise, the current solution is retained and the algorithm searches for another solution. In order to ensure that the algorithm is "on the right track" and finally achieves convergence, the type-I error of the hypothesis testing, which refers to the probability that a poor new solution is incorrectly accepted, should be well controlled. In STRONG, because the response variable is assumed to be normally distributed, the use of the t-test is justified. In practice, however, the response variable can follow any general distribution rather than the normal. This raises an immediate concern: the t-test is then only an approximation and the type-I error of the hypothesis testing cannot be exactly controlled. In fact, for a generally-distributed response surface, all one can hope for is that, when the sample size is sufficiently large, the normal distribution serves as a good approximation due to the central limit theorem (CLT). For physical experiments, the "inaccuracy" or "small error" due to the approximation may not be a serious problem. In simulation, however, the problem is aggravated, because the number of iterations conducted in simulation is typically much larger than in physical experiments. If the hypothesis testing is conducted at each iteration, the "small errors" can accumulate and ruin the convergence of the algorithm. We use a simple example to illustrate the point. Suppose, for each iteration k, the sample size used for hypothesis testing is also k. Let ξk(x) be the CDF of the standardized average of the k samples. Then, by the Berry-Esséen theorem (Serfling, 1980), the error of the CLT approximation can be characterized by

sup_x |ξk(x) − Φ(x)| ≤ 33 E|G(x) − g(x)|³ / (4 σ³ k^{1/2}) for all k,  (2)

where σ is the standard deviation of the response variable G(x) and Φ(x) is the CDF of the standard normal distribution. If the hypothesis testing is conducted based on the normal distribution Φ(x) instead of the exact distribution ξk(x), there is a small error in the type-I error. Although this error decreases as the algorithm iterates, its summation diverges, i.e., ∑_{k=1}^∞ C/k^{1/2} = ∞, where C is some constant. Consequently, by the second Borel-Cantelli lemma (Billingsley, 1995), we know that the event that a poor solution is incorrectly accepted will occur infinitely often, which is sufficient to

make the algorithm diverge. In this note, we study and address this problem. Moreover, we are interested in how non-normal response surfaces affect the performance of the algorithm when hypothesis testing is iteratively conducted based on the normal approximation. We re-examine the framework of STRONG and develop a modified framework for generally-distributed response surfaces. To distinguish it, we call the modified framework STRONG-X, where "X" denotes "relaxation." As we shall see, STRONG and STRONG-X have a complementary relationship and are favorable in different situations. More details will be presented in later sections. The rest of this article is organized as follows. In Section 2, we review the basic framework of STRONG and present the modified framework STRONG-X that can handle generally-distributed response surfaces. We also prove that STRONG-X achieves convergence with probability one. In Section 3, we compare and contrast the two frameworks, STRONG and STRONG-X, and discuss the situations where one is preferred to the other. We make concluding remarks in Section 4.
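The accumulation argument in this introduction can be checked numerically. The following sketch (ours, not the paper's; the constant C = 0.1 and the sample-size choices are arbitrary) contrasts the non-summable Berry-Esséen error rate with a summable error schedule:

```python
import math

# Illustration of the accumulation argument: if the per-iteration type-I
# error inflation shrinks like C / sqrt(k) (the Berry-Esseen rate when the
# sample size at iteration k is k), the partial sums diverge, so by the
# second Borel-Cantelli lemma wrong acceptances occur infinitely often.
# A summable schedule such as 1 / k**2 avoids this. C = 0.1 is arbitrary.
C = 0.1

def berry_esseen_bound(k):
    return C / math.sqrt(k)      # not summable: partial sums grow like 2*C*sqrt(n)

def summable_alpha(k):
    return 1.0 / k ** 2          # summable: partial sums approach pi**2 / 6

def partial_sum(term, n):
    """Partial sum of term(k) for k = 1..n."""
    return sum(term(k) for k in range(1, n + 1))

for n in (10 ** 2, 10 ** 4, 10 ** 6):
    print(n, partial_sum(berry_esseen_bound, n), partial_sum(summable_alpha, n))
```

The first column of partial sums keeps growing without bound, while the second stabilizes, which is exactly the distinction the convergence argument turns on.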

2 The new framework STRONG-X

In this section, we ﬁrst brieﬂy review the basic steps of STRONG and then point out how this framework can be modiﬁed to handle generally-distributed response surfaces. Finally, we prove that STRONG-X can achieve convergence.

2.1 Revisit the basic framework of STRONG

STRONG is a metamodel-based framework. For each iteration k, STRONG constructs a local model, either by a derived gradient estimator or by black-box estimation, and defines a "trust region," which represents a neighborhood where the local model is believed to represent the underlying response surface well. Suppose the current solution is xk and the radius of the trust region is ∆k > 0. The trust region is defined as

B(xk, ∆k) = {x ∈ R^p : ∥x − xk∥ ≤ ∆k},  (3)

where ∥·∥ denotes the Euclidean norm. STRONG uses the concept of "trust region" to replace the "region of interest" in the traditional RSM and automatically adjusts its size, thus avoiding the need for human involvement. For each iteration, STRONG performs the following steps:

Step 1. Construct a local model rk(x) around the center point xk.
Step 2. Solve x*k ∈ argmin{rk(x) : x ∈ B(xk, ∆k)}.
Step 3. Simulate several observations at xk and x*k, and estimate g(x*k).
Step 4. Conduct the so-called ratio-comparison (RC) and sufficient-reduction (SR) tests to examine the quality of x*k and to update xk+1 and the size of the trust region ∆k+1.

Here we only focus on Steps 3 and 4 and refer to Chang et al. (2011) for more details about the other steps. STRONG accepts a new solution x*k when it passes both the RC test and the SR test. Let G1(x), . . . , Gn(x) denote independent and identically distributed (i.i.d.) copies of G(x), each truncated at ±M, where M is an extremely large number. Further let the average

Ḡ(x, n) = (1/n) ∑_{i=1}^n Gi(x).  (4)
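For Step 2, when the local model rk is first-order (linear), the subproblem has a closed-form solution: step from xk along the negative gradient to the trust-region boundary. A minimal sketch under that assumption (the helper name `cauchy_step` is ours, not from the paper):

```python
import math

def cauchy_step(x, grad, delta):
    """Minimizer of a linear model r_k(y) = r_k(x) + grad . (y - x)
    over the trust region B(x, delta): move along -grad to the boundary."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(x)  # zero gradient: no descent direction, stay put
    return [xi - delta * gi / norm for xi, gi in zip(x, grad)]

print(cauchy_step([0.0, 0.0], [3.0, 4.0], 1.0))  # [-0.6, -0.8]
```

With a second-order model the subproblem no longer has this closed form and a trust-region subproblem solver is needed, as in Conn et al. (2000).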

At iteration k, STRONG estimates the function values of the current solution and the new solution, ĝk(xk) and ĝk(x*k), by Ḡ(xk, Nk) and Ḡ(x*k, Nk), respectively. The RC test statistic is defined as

ρk = (ĝk(xk) − ĝk(x*k)) / (rk(xk) − rk(x*k)).  (5)

Let 0 < η0 < η1 < 1; for example, we may set η0 = 1/4 and η1 = 3/4. The RC test compares the "observed reduction" with the "predicted reduction" given by the model and decides whether the new solution should be accepted. If the ratio is large (ρk ≥ η1), which implies that the new solution is significantly better than the current one, the new solution is accepted. If the ratio is moderate (η0 ≤ ρk < η1), which implies that the "observed reduction" has fair agreement with the "predicted reduction," the new solution is also accepted. If ρk is close to 0 or negative (ρk < η0), which implies that the "observed reduction" does not agree with the "predicted reduction," the new solution is rejected. If the new solution passes the RC test, the SR test is further employed to ascertain that it yields "sufficient reductions." The SR test is defined as follows:

H0: g(xk) − g(x*k) ≤ η0² ζk  vs.  H1: g(xk) − g(x*k) > η0² ζk,

where η0² ζk corresponds to "sufficient reductions" and is defined in Lemma 1 of Chang et al. (2011). The type-I error is set as αk. When H0 is rejected, we conclude that the new solution yields "sufficient reductions." Clearly, the SR test is a right-tailed test, which means a new solution is accepted only when there is strong evidence showing that it yields "sufficient reductions." The type-I error, referring to the probability that a poor solution is incorrectly accepted, has to be controlled. In fact, for STRONG to achieve convergence, it is required that ∑_{k=1}^∞ αk < ∞. Note that if a poor solution is incorrectly accepted, the algorithm can be led in a wrong direction and, as a consequence, it may take a long time to come back or, even worse, may never come back.
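The RC acceptance rule above can be sketched as follows. This is an illustrative reimplementation (not the authors' code), with η0 = 1/4 and η1 = 3/4 as defaults; the +1/0/−1 shorthand for the trust-region update is our own convention:

```python
def rc_test(observed_reduction, predicted_reduction, eta0=0.25, eta1=0.75):
    """Ratio-comparison test: compare observed vs. predicted reduction.

    Returns (accept, grow), where `accept` says whether the new solution
    replaces the current one and `grow` is +1 / 0 / -1 for enlarging,
    keeping, or shrinking the trust region.
    """
    rho = observed_reduction / predicted_reduction
    if rho >= eta1:       # strong agreement: accept and enlarge the region
        return True, +1
    elif rho >= eta0:     # fair agreement: accept, keep the region size
        return True, 0
    else:                 # poor agreement: reject and shrink the region
        return False, -1

print(rc_test(9.0, 10.0))   # rho = 0.9  -> (True, 1)
print(rc_test(5.0, 10.0))   # rho = 0.5  -> (True, 0)
print(rc_test(-1.0, 10.0))  # rho = -0.1 -> (False, -1)
```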

2.2 Uncover the reason that ruins convergence of the algorithm

In simulation optimization, when deciding whether a new solution should be accepted to replace the current solution, hypothesis testing is a natural choice, because it allows decisions to be made with a controlled type-I error. In this section, we uncover how the small error in the type-I error may cause the algorithm to diverge. As illustrated in Figure 1, suppose the skewed distribution is the true distribution of the test statistic in the hypothesis testing, and the symmetric distribution is the approximate distribution. The "true critical region" and "approximate critical region" correspond to the critical regions defined by the true and approximate distributions, respectively. It can easily be seen that, for a right-tailed test (such as the SR test) and a left-skewed true distribution, the "approximate critical region" is larger than the "true critical region." This increases the probability that a poor solution is incorrectly accepted. As discussed earlier, if a poor solution is incorrectly accepted infinitely often, the algorithm can fail to converge.
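This effect can be reproduced in a small Monte Carlo experiment (ours, not the paper's): draw left-skewed observations (a negated gamma), apply a right-tailed test with the normal-approximation critical value, and observe that the empirical type-I error exceeds the nominal α = 0.05:

```python
import random
import statistics

random.seed(7)
n, reps = 20, 20000
z = 1.6448536269514722  # upper 5% critical value of the standard normal

# Left-skewed population: a negated gamma with shape 0.5, scale 1.0
# (illustrative choices). Its true mean is -shape * scale.
shape, scale = 0.5, 1.0
mu = -shape * scale

hits = 0
for _ in range(reps):
    xs = [-random.gammavariate(shape, scale) for _ in range(n)]
    t = (statistics.mean(xs) - mu) / (statistics.stdev(xs) / n ** 0.5)
    hits += t > z  # rejection would occur with prob ~0.05 if normality held
print(hits / reps)  # empirically well above the nominal 0.05
```

The excess rejection probability is exactly the per-iteration "small error" that, when repeated over iterations, can accumulate in the way described above.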

2.3 Effective modifications

For generally-distributed response surfaces, since the SR test is no longer applicable, only the RC test remains for deciding whether to accept a new solution. However, the "observed reduction" in the RC test, i.e., the numerator of ρk, is estimated based on sample averages, which invariably contain some random error. In order to ensure that the RC test makes a "correct" judgement, it is crucial to control the random error in the "observed reduction."

Figure 1: Comparing the critical regions based on the exact and approximate distributions.

One way to accomplish this is to require a sufficiently large sample size for the estimation. We call the sample sizes required over iterations a sample size schedule, which refers to a sequence of positive integers {N1, N2, . . .} representing the number of samples taken at the current and new solutions. Obviously, Nk should be an increasing sequence so as to reduce the random errors and, in turn, make the RC test yield correct results. For the modified algorithm STRONG-X to achieve convergence, the following sample size schedule is required:

Nk+1 = ⌈Nk · ψ⌉, ψ > 1.  (6)

In STRONG-X, if a new solution, say x*k, passes the RC test, it replaces the current solution, and we let xk+1 = x*k. Based on the developed sample size schedule, we have the following lemma. The proof is given in the appendix. We use "w.p.1" to denote "with probability 1."

Lemma 1. Let Nk+1 = ⌈Nk · ψ⌉, ψ > 1. If sup_{x∈R^p} Var[G(x)] < ∞, then w.p.1,

∑_{k=1}^∞ |g(xk) − g(xk+1) − ĝk(xk) + ĝk+1(xk+1)| < ∞,  (7)

and

∑_{k=1}^∞ |ĝk(xk) − g(xk)| < ∞.  (8)

Lemma 1 shows that the developed sample size schedule is indeed effective: under this schedule, the accumulated differences between the "observed reduction" and the "true reduction" over iterations are bounded. Moreover, the accumulated differences between the "estimated function value" and the "true function value" over iterations are bounded too.
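The geometric schedule (6) is easy to generate; a sketch with illustrative values N1 = 10 and ψ = 1.5 (not prescribed by the paper). Because Nk grows geometrically, Var[Ḡ(x, Nk)] ∝ 1/Nk decays geometrically, which is what makes the error sums in Lemma 1 finite:

```python
import math

def sample_size_schedule(n1, psi, num_iterations):
    """Geometric sample-size schedule N_{k+1} = ceil(N_k * psi), psi > 1."""
    assert psi > 1, "psi must exceed 1 for the schedule to grow"
    sizes = [n1]
    for _ in range(num_iterations - 1):
        sizes.append(math.ceil(sizes[-1] * psi))
    return sizes

print(sample_size_schedule(10, 1.5, 8))  # [10, 15, 23, 35, 53, 80, 120, 180]
```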

2.4 The framework of STRONG-X and its convergence analysis

With the modifications developed in Section 2.3, we are ready to present the new framework STRONG-X and its convergence analysis. The flowchart and full algorithm of STRONG-X are given in Figures 6, 7, and 8 in the appendix. Simply speaking, for each iteration, instead of employing the SR test, STRONG-X increases the sample sizes at the current and new solutions by a multiplier ψ, ψ > 1, making the RC test yield asymptotically correct results. To show that STRONG-X achieves convergence, we need a few assumptions. In Assumption 1, ∇g(x) and H(x) denote the gradient and Hessian at x.
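A minimal sketch of the STRONG-X outer loop just described (ours, not the authors' implementation). The three inner helpers are hypothetical stand-ins for the real local-model construction, trust-region subproblem solver, and simulator, and the default parameter values are illustrative:

```python
import math

def strong_x(x0, n0=10, psi=1.5, delta0=2.0, delta_tilde=1.0,
             eta0=0.25, eta1=0.75, gamma1=0.5, gamma2=2.0, max_iter=5):
    """Skeleton of the STRONG-X outer loop: geometric sample sizes + RC test.

    build_model, solve_subproblem, and simulate are hypothetical placeholders
    (here a noiseless quadratic toy problem with optimum at (1, ..., 1)).
    """
    def build_model(x, delta, order):       # placeholder local model r_k
        return lambda y: sum((yi - 1.0) ** 2 for yi in y)
    def solve_subproblem(model, x, delta):  # placeholder argmin over B(x, delta)
        return [xi + min(delta, 1.0 - xi) for xi in x]
    def simulate(x, n):                     # placeholder estimate of g(x)
        return sum((xi - 1.0) ** 2 for xi in x)

    x, delta, n = list(x0), delta0, n0
    for k in range(max_iter):
        order = 1 if delta > delta_tilde else 2   # Stage I vs. Stage II
        model = build_model(x, delta, order)
        x_new = solve_subproblem(model, x, delta)
        observed = simulate(x, n) - simulate(x_new, n)
        predicted = model(x) - model(x_new)
        rho = observed / predicted if predicted else 0.0
        if rho >= eta1:                  # accept, enlarge trust region
            x, delta = x_new, gamma2 * delta
        elif rho >= eta0:                # accept, keep trust region
            x = x_new
        else:                            # reject, shrink trust region
            delta = gamma1 * delta
        n = math.ceil(n * psi)           # N_{k+1} = ceil(N_k * psi), no SR test
    return x

print(strong_x([0.0, 0.0]))
```

The point of the sketch is the control flow: the SR test of STRONG is gone, and the geometric sample-size update plays its role.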


Assumption 1. The objective function g(x) is bounded below and twice differentiable, and there exist two positive constants, α1 and β1, such that the gradient ∥∇g(x)∥ ≤ α1 and the Hessian ∥H(x)∥ ≤ β1 for all x ∈ R^p.

Assumption 1 requires that g(x) has uniformly bounded gradient and Hessian. Without loss of generality, we can assume there exists a strongly consistent gradient estimator that can be used to construct the local model. Note, however, that if the problem does not allow the derivation of a strongly consistent gradient estimator, Chang et al. (2011) provided a black-box approach based on design of experiment and regression analysis; this black-box approach can also be proved to be strongly consistent. Let ∇ĝm(x) denote the strongly consistent gradient estimator with sample size m.

Assumption 2. The estimators of g(x) and ∇g(x) satisfy sup_{x∈R^p} |Ḡ(x, n) − g(x)| → 0 w.p.1 as n → ∞ and sup_{x∈R^p} ∥∇ĝm(x) − ∇g(x)∥ → 0 w.p.1 as m → ∞.

Assumption 2 requires that both Ḡ(x, n) and ∇ĝm(x) obey a Uniform Law of Large Numbers (ULLN). Notably, any strongly consistent estimator can be extended to satisfy a ULLN as long as the conditions given in Andrews (1992) hold. Now we can present the convergence of STRONG-X. The proof strictly follows that for STRONG; we only provide the main theorem and refer readers to Chang et al. (2011) for the supporting lemmas. The proof of Theorem 1 is given in the appendix.

Theorem 1. Suppose that Assumptions 1 and 2 hold. If STRONG-X has infinitely many successful iterations, then lim inf_{k→∞} ∥∇g(xk)∥ = 0 w.p.1.

3 Compare and contrast STRONG and STRONG-X

Although STRONG-X has the advantage of generality, i.e., its convergence is guaranteed for any generally-distributed response surface, the price to pay is the increased sample size. As Figure 2 shows, for each iteration, STRONG uses a fixed sample size Nk = N0 and increases it to N′k only when the inner loop is initiated. On the other hand, STRONG-X requires the sample size to be iteratively increased by a multiplier, i.e., Nk = ⌈Nk−1 · ψ⌉, ψ > 1, and if the inner loop is initiated, the sample size is further increased. Overall, STRONG-X is more computationally demanding than STRONG.

Figure 2: Sample size schedules of STRONG (left) and STRONG-X (right).

To understand the advantages and disadvantages of STRONG and STRONG-X, we compare their performances under three scenarios. In particular, because the Berry-Esséen theorem indicates that the approximation error is proportional to the skewness, it is of interest to evaluate the performance of the algorithms when the response surface has different levels of skewness. We let the stochastic response be

G(x) = ∑_{i=1}^{p−1} [100(xi − x²_{i+1})² + (1 − xi)²] − ϵx,  (9)

where g(x) = E[G(x)], p = 6, and ϵx is (1) a normal random variable with mean zero and standard deviation g(x), (2) a gamma random variable with α = 0.5, β = (2 · g(x))^{1/2}, and (3) a gamma random variable with α = 20, β = (0.05 · g(x))^{1/2}. Note that the second and third scenarios represent generally-distributed response surfaces, with high skewness and low skewness, respectively. Also note that the random noise enters as "−ϵx," which allows left-skewed response surfaces to be generated. From Figure 3, we can see that STRONG performs much better than STRONG-X when the response surface is normally-distributed; the computational gains come from the use of hypothesis testing under normal response surfaces. For generally-distributed response surfaces, Figures 4 and 5 show that STRONG-X converges to the optimum in a steady manner, while STRONG oscillates without being able to identify the optimal solution. Moreover, comparing Figures 4 and 5, we find that STRONG is susceptible to the skewness of the response surface. When the skewness is higher, the performance trajectory of STRONG becomes much "bumpier," meaning that STRONG tends to (incorrectly) accept solutions that are not actually better than the current solutions, due to the use of an approximate distribution in the hypothesis testing. These results suggest that, when the response surface is known to be normally-distributed, STRONG should be used for its computational efficiency; when the distribution of the response surface is unknown, STRONG-X is preferred because of its convergence guarantee.
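For reference, a sketch of the noisy test response (9) under the three scenarios. The helper names and the gamma parameterization below (shape α, scale β) are our assumptions, not taken from the paper:

```python
import random

P = 6  # dimension used in the experiments

def g(x):
    """Deterministic Rosenbrock-type objective from Equation (9)."""
    return sum(100.0 * (x[i] - x[i + 1] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(P - 1))

def G(x, scenario, rng=random):
    """One noisy observation G(x) = g(x) - eps_x under the three scenarios."""
    gx = g(x)
    if scenario == 1:    # normal noise, mean 0, sd = g(x)
        eps = rng.gauss(0.0, gx)
    elif scenario == 2:  # gamma, alpha = 0.5 (high skewness)
        eps = rng.gammavariate(0.5, (2.0 * gx) ** 0.5)
    else:                # gamma, alpha = 20 (low skewness)
        eps = rng.gammavariate(20.0, (0.05 * gx) ** 0.5)
    return gx - eps      # "-eps" yields a left-skewed response

print(g([0.0] * P))  # five (1 - 0)^2 terms -> 5.0
```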

Figure 3: Performance comparison when the response surface is normally-distributed.

4 Concluding remarks

In this paper, we point out that the use of hypothesis testing based on an approximate distribution may cause STRONG to diverge. In fact, this problem may occur not only in STRONG but also in other simulation optimization frameworks. We develop a modified framework, STRONG-X, that can handle generally-distributed response surfaces, and we prove that STRONG-X achieves convergence under such surfaces. Numerical experiments show that STRONG and STRONG-X are preferable in different situations and have a complementary relationship.

Figure 4: Performance comparison when the response surface is generally-distributed with low skewness.

Figure 5: Performance comparison when the response surface is generally-distributed with high skewness.

5 Appendix

5.1 STRONG-X

5.2 Proof of Lemma 1

Proof. We only prove the first assertion since the second one can be proved by a similar argument. First observe that E[ĝk(xk) − ĝk(xk+1)] = g(xk) − g(xk+1). Let αk = 1/k², k = 1, 2, . . ., so that ∑_{k=1}^∞ 1/k² < ∞. By the Chebyshev inequality (Billingsley, 1995),

Pr(|ĝk(xk) − ĝk(xk+1) − g(xk) + g(xk+1)| > αk)
≤ E[ĝk(xk) − ĝk(xk+1) − g(xk) + g(xk+1)]² / αk²
≤ (Var[G(xk)] + Var[G(xk+1)]) / (Nk αk²)
≤ 2 sup_{x∈R^p} Var[G(x)] / (Nk αk²).  (10)

Figure 6: The main framework of STRONG-X (flowchart: initialization; Stage I first-order model or Stage II second-order model depending on the trust-region size; inner loop with stopping rule; and the sample-size update Nk+1 = ⌈Nk · ψ⌉, ψ > 1).

Step 0. Set the iteration count k = 0. Select an initial solution x0, an initial sample size n0 for the center point and m0 for the gradient estimator, an initial trust-region size ∆0, a switch threshold ∆̃ satisfying ∆0 > ∆̃ > 0, and constants η0, η1, γ1, γ2, ψ satisfying 0 < η0 < η1 < 1, 0 < γ1 < 1 < γ2, and ψ > 1.
Step 1. Let k = k + 1. If ∆k > ∆̃, go to Stage I; otherwise, go to Stage II. Let Nk+1 = ⌈Nk · ψ⌉.
Step 2. If the termination criterion is satisfied, stop and return the solution. Otherwise, go to Step 1.

Figure 7: STRONG-X main framework.


Stage I. First-order polynomial (for the kth iteration)
1. Build a first-order polynomial.
2. Solve the subproblem and obtain a new solution x*k.
3. Take n0 replications at the new solution x*k.
4. Conduct the RC test and update the size of the trust region:
• if ρk < η0, let xk+1 = xk, ∆k+1 = γ1 ∆k, and return to Step 1 in the main framework;
• if η0 ≤ ρk < η1, let xk+1 = x*k and ∆k+1 = ∆k, and return to Step 1 in the main framework;
• if ρk ≥ η1, let xk+1 = x*k and ∆k+1 = γ2 ∆k, and return to Step 1 in the main framework.

Stage II. Second-order polynomial (for the kth iteration)
1. Build a second-order polynomial.
2. Solve the subproblem and obtain a new solution x*k.
3. Take n0 replications at the new solution x*k.
4. Conduct the RC test and update the size of the trust region:
• if ρk < η0, go to the inner loop;
• if η0 ≤ ρk < η1, let xk+1 = x*k and ∆k+1 = ∆k, and return to Step 1 in the main framework;
• if ρk ≥ η1, let xk+1 = x*k and ∆k+1 = γ2 ∆k, and return to Step 1 in the main framework.

Inner Loop (for the ith subiteration in the kth iteration)
1. Set the subiteration count i = 0. Let nk0 = n0, mk0 = m0, and ∆k0 = ∆k.
2. Let i = i + 1. Increase the sample sizes for the current and new solutions as Section 3.4 of Chang et al. (2011) describes.
3. Increase the sample size of the gradient estimator as Section 3.4 of Chang et al. (2011) describes.
4. Build a second-order polynomial.
5. Solve the subproblem and obtain a new solution x*ki.
6. Conduct the RC test and update the size of the trust region:
• if ρki < η0, let ∆k,i+1 = γ1 ∆ki, and go back to Step 2 in the inner loop;
• if η0 ≤ ρki < η1, let xk+1 = x*ki and ∆k+1 = ∆k, and return to Step 1 in the main framework;
• if ρki ≥ η1, let xk+1 = x*ki and ∆k+1 = γ2 ∆k, and return to Step 1 in the main framework.

Figure 8: STRONG-X full algorithm.


The second inequality of Equation (10) uses the fact that ĝk(xk) and ĝk(xk+1) are independent. Since Nk = ⌈Nk−1 · ψ⌉, ψ > 1, we know Nk−1/Nk < 1 and hence lim_{k→∞} (Nk−1 α²_{k−1})/(Nk α²_k) = 1/ψ < 1. By the ratio test (Rudin, 1976), the series

∑_{k=1}^∞ 2 sup_{x∈R^p} Var[G(x)] / (Nk αk²) < ∞.

By the Borel-Cantelli Lemma (Billingsley, 1995),

Pr(|ĝk(xk) − ĝk(xk+1) − g(xk) + g(xk+1)| > αk i.o.) = 0,  (11)

which implies that there exists K̄ such that

|ĝk(xk) − ĝk(xk+1) − g(xk) + g(xk+1)| < αk for k ≥ K̄.  (12)

It follows then, w.p.1,

∑_{k=1}^∞ |ĝk(xk) − ĝk+1(xk+1) − g(xk) + g(xk+1)|
< ∑_{k=1}^{K̄} |ĝk(xk) − ĝk+1(xk+1) − g(xk) + g(xk+1)| + ∑_{k=K̄}^∞ αk < ∞.  (13)

Equation (13) follows because G(x) is truncated at ±M and ∑_{k=1}^∞ αk < ∞.

5.3 Proof of Theorem 1

b ′ (xk ), Proof. The convergence proof requires additional notations. For every iteration k, we let ∇g k b ′ (xk ), and ∆′ denote the estimates of gradient and Hessian at xk and the trust-region size at the H k

k

end of iteration k before the algorithm moves to iteration k + 1. Note that Theorem 1 is analyzed for outer loop, where xk is random in the decision The “w.p.1” is respect to the randomness

space.

b ′

of the algorithm. Also note that, if lim inf k→∞ ∇g (xk ) = 0, then there exists a subsequence of k

{xkj }∞ j=0

such that b ′ (xk )∥ = 0. lim ∥∇g kj j

j→∞

By lemma 6 in (Chang et al., 2011), we have limj→∞ ∥∇g(xkj )∥ = 0 w.p.1, which implies that lim inf k→∞ ∥∇g(xk )∥ = 0 w.p.1. Therefore, it suﬃces to prove that

b ′

lim inf ∇g (x ) k k = 0 w.p.1. k→∞

(14)

b ′ We prove Equation (14) by contradiction. Suppose that lim inf k→∞ ∇gk (xk ) > 0. Then, by lemma 5 in (Chang et al., 2011), lim inf k→∞ ∆′k > 0 w.p.1. Then, for almost all sample paths,

11

b ′

′ there exists a constant K1 > 0 such that ∇g k (xk ) ≥ ε and ∆k ≥ ∆L when k ≥ K1 for some ϵ > 0 and ∆L > 0. For each iteration, gbk (xk ) − gbk (xk+1 ) = G(xk , Nk ) − G(xk+1 , Nk ) b ′ (xk )∥ · ∆′ (for Stage I), or > η02 ∥∇g k k { } b ′ (xk )∥ ∥∇g 1 2 b ′ k > η ∥∇gk (xk )∥ · min , ∆′k . (for Stage II or inner loop) b ′ (xk )∥ 2 0 ∥H k For k ≥ K, ∞ ∑

gbk (xk ) − gbk (xk+1 ) =

k=K

=

∞ ∑ k=K ∞ ∑

G(xk , Nk ) − G(xk+1 , Nk ) G(xk , Nk ) − G(xk+1 , Nk ) + G(xk+1 , Nk ) − G(xk+1 , Nk+1 )

k=K

>

˜ η02 ε nI (K)∆

∞ {ε } ∑ 1 2 , ∆L + + η0 ε nII (K) min G(xk+1 , Nk ) − G(xk+1 , Nk+1 ), 2 κ k=K

(15) where nI (K) and nII (K) are the numbers of successful iterations of Stages I and II after iteration ˜ is the lower bound of the trust-region size for STRONG to switch from Stage I to II, and κ K, ∆ b ′ (xk )∥. Moreover, is the upper bound of ∥H k

∞ ∑

= <

k=K ∞ ∑ k=K ∞ ∑

G(xk+1 , Nk ) − G(xk+1 , Nk+1 ) G(xk+1 , Nk ) − g(xk+1 ) + g(xk+1 ) − G(xk+1 , Nk+1 ) |G(xk+1 , Nk ) − g(xk+1 )| + |g(xk+1 ) − G(xk+1 , Nk+1 )|

k=K

< ∞

(16)

Equation (16) uses Equation (8) in lemma 1. Also, since there are inﬁnitely many successful iterations, at least one of σI (K), σII (K) → ∞, it follows then Equation (15) = ∞ is unbounded. ∑ By Equation (7) in lemma 1, we know that ∞ k=K (g(xk )−g(xk+1 )) is also unbounded, which clearly contradicts the fact that g(x) is bounded below and the proof is complete.

References

Andrews, D.W.K. 1992. Generic uniform convergence. Econometric Theory 8 241-257.

Billingsley, P. 1995. Probability and Measure. John Wiley and Sons, New York.

Chang, K.-H., L.J. Hong, H. Wan. 2011. Stochastic trust-region response-surface method (STRONG): a new response-surface framework for simulation optimization. To appear in INFORMS Journal on Computing (download from http://sites.google.com/site/ssoptimizationlab/publications).

Conn, A.R., N.I.M. Gould, P.L. Toint. 2000. Trust-Region Methods. SIAM, Philadelphia.

Myers, R.H., D.C. Montgomery, C.M. Anderson-Cook. 2009. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. John Wiley and Sons, New York.

Rudin, W. 1976. Principles of Mathematical Analysis. McGraw-Hill, New York.

Serfling, R.J. 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.
