STRONG-X: An Extended Framework of STRONG for Simulation Optimization with Generally-Distributed Response Surfaces

Kuo-Hao Chang, Dept. of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu, 30013, Taiwan ROC
Hong Wan, School of Industrial Engineering, Purdue University, West Lafayette, IN 47907-2023, USA

February 29, 2012

Abstract

Stochastic Trust-Region Response-Surface Method (STRONG) is a new response-surface framework for simulation optimization. In this paper, we point out that STRONG can fail to converge when the response surface is generally-distributed rather than normal. We investigate the reason and propose a modified framework, called STRONG-X, that addresses the problem. We prove that STRONG-X achieves convergence under generally-distributed response surfaces. Numerical experiments show that STRONG and STRONG-X are preferred in different situations and are complementary to each other.

Keywords: Response surface methodology, Trust region method, Simulation optimization, Black-box method.

1 Introduction

Simulation optimization refers to an iterative procedure in search of the optimal parameter when the objective function can only be evaluated by stochastic simulation. Mathematically, the optimization problem can be written as:

minimize g(x), x ∈ R^p,  (1)

where g(x) = E[G(x)] is the objective function, G(x) is the stochastic response, and x is a controllable parameter (possibly a vector). Chang et al. (2011) proposed a new response-surface framework, called the Stochastic Trust-Region Response-Surface Method (STRONG), for simulation optimization. Since its introduction, STRONG has drawn a great deal of attention, because it preserves the advantages, yet eliminates the disadvantages, of the traditional response surface methodology (RSM) (Myers et al., 2009), one of the most popular methods for simulation optimization. Specifically, owing to its use of well-established statistical tools such as design of experiments (DOE) and regression analysis, STRONG is computationally superior to other existing approaches, especially when solving higher-dimensional problems. The novel integration of the concept of a "trust region" (Conn et al., 2000), originally developed in deterministic nonlinear programming, further enables STRONG to overcome the weaknesses of the traditional RSM: the requirement of human involvement and the failure to guarantee convergence. Indeed, STRONG is an automated framework with a provable convergence guarantee. The automation property is especially important for simulation optimization, because the number of iterations in simulation optimization is typically very large and it is therefore difficult, if not impossible, to stop and ask for human input at each iteration, as the traditional RSM requires. The convergence property, which guarantees that the algorithm converges to one of the local optima with probability one, is also of theoretical relevance, as it ensures that the algorithm is "on the right track"; moreover, given sufficient computation, the algorithm can always find a solution arbitrarily close to the true optimum, which is desired but never achieved by the traditional RSM. In addition to the aforementioned strengths, STRONG has broad applicability: it can handle both white-box and black-box problems. More specifically, when the problem structure can be exploited to derive an effective gradient estimator (i.e., white-box problems), STRONG allows the use of the derived gradient estimator to construct the local model and, as a consequence, achieves better computational efficiency. On the other hand, if the problem is too complex and only the inputs and outputs of the simulation model are observable (i.e., black-box problems), a black-box version of STRONG that employs regression techniques to construct the local model is also provided. STRONG, although attractive for its automation, guaranteed convergence, and computational superiority, has some limitations. The main issue is that, when a new point is generated, STRONG decides whether to accept it by conducting a hypothesis test, essentially a t-test comparing the current solution and the new solution.
If there is sufficient evidence showing that the new solution "yields sufficient reduction," the current solution is replaced by the new solution; otherwise, the current solution is retained and the algorithm searches for another solution. In order to ensure that the algorithm is "on the right track" and finally achieves convergence, the type-I error of the hypothesis test, which refers to the probability that a poor new solution is incorrectly accepted, should be well controlled. In STRONG, because the response variable is assumed to be normally distributed, the use of the t-test is justified. In practice, however, it is common that the response variable follows some general distribution rather than the normal. This raises an immediate concern: the t-test is then only an approximation, and the type-I error of the hypothesis test cannot be exactly controlled. In fact, for a generally-distributed response surface, the best one can hope is that, when the sample size is sufficiently large, the normal distribution serves as a good approximation due to the central limit theorem (CLT). For physical experiments, the "inaccuracy" or "small error" due to this approximation may not be a serious problem. In simulation, however, the problem is aggravated, because the number of iterations conducted in simulation is typically much larger than in physical experiments. If a hypothesis test is conducted at each iteration, the "small errors" can accumulate and ruin the convergence of the algorithm. We use a simple example to illustrate the point. Suppose that, for each iteration k, the sample size used for the hypothesis test is also k. Let ξ_k(x) denote the CDF of the standardized sample mean of k observations of the response variable. Then, by the Berry-Esséen theorem (Serfling, 1980), the error of the CLT approximation can be characterized by

sup_x |ξ_k(x) − Φ(x)| ≤ 33 E|G(x) − g(x)|³ / (4 σ³ k^{1/2}) for all k,  (2)

where σ is the standard deviation of the response variable G(x) and Φ(x) is the CDF of the standard normal distribution. If the hypothesis test is conducted based on the normal distribution Φ(x) instead of the exact distribution ξ_k(x), there is a small error in the type-I error. Although this error decreases as the algorithm iterates, its sum diverges, i.e., ∑_{k=1}^∞ C/k^{1/2} = ∞, where C is some constant. Consequently, by the second Borel-Cantelli lemma (Billingsley, 1995), the event that a poor solution is incorrectly accepted occurs infinitely often, which is sufficient to make the algorithm diverge. In this note, we study and address this problem. Moreover, we are interested in how non-normal response surfaces affect the performance of the algorithm when hypothesis tests are iteratively conducted based on the normal approximation. We re-examine the framework of STRONG and develop a modified framework for generally-distributed response surfaces. To distinguish the two, we call the modified framework STRONG-X, where "X" denotes "relaxation." As we shall see, STRONG and STRONG-X have a complementary relationship and are favorable in different situations. More details are presented in later sections. The rest of this article is organized as follows. In Section 2, we review the basic framework of STRONG and present the modified framework STRONG-X that can handle generally-distributed response surfaces. We also prove that STRONG-X achieves convergence with probability one. In Section 3, we compare and contrast the two frameworks and discuss the situations where one is preferred to the other. We make concluding remarks in Section 4.

2 The new framework STRONG-X

In this section, we first briefly review the basic steps of STRONG and then point out how this framework can be modified to handle generally-distributed response surfaces. Finally, we prove that STRONG-X can achieve convergence.

2.1 Revisiting the basic framework of STRONG

STRONG is a metamodel-based framework. At each iteration k, STRONG constructs a local model, either from a derived gradient estimator or by black-box estimation, and defines a "trust region," a neighborhood in which the local model is believed to represent the underlying response surface well. Suppose the current solution is x_k and the trust-region radius is ∆_k > 0. The trust region is defined as

B(x_k, ∆_k) = {x ∈ R^p : ∥x − x_k∥ ≤ ∆_k},  (3)

where ∥·∥ denotes the Euclidean norm. STRONG uses the trust region to replace the "region of interest" of the traditional RSM and adjusts its size automatically, thus avoiding the need for human involvement. At each iteration, STRONG performs the following four steps:

Step 1. Construct a local model r_k(x) around the center point x_k.
Step 2. Solve x*_k ∈ argmin{r_k(x) : x ∈ B(x_k, ∆_k)}.
Step 3. Simulate several observations at x_k and x*_k, and estimate g(x*_k).
Step 4. Conduct the so-called ratio-comparison (RC) and sufficient-reduction (SR) tests to examine the quality of x*_k and to update x_{k+1} and the trust-region size ∆_{k+1}.

Here we focus on Steps 3 and 4 and refer to Chang et al. (2011) for details of the other steps. STRONG accepts a new solution x*_k when it passes both the RC test and the SR test. Let G_1(x), ..., G_n(x) denote independent and identically distributed (i.i.d.) copies of G(x), with each G_i(x) truncated at ±M, where M is an extremely large number. Further, let the sample average be

Ḡ(x, n) = (1/n) ∑_{i=1}^n G_i(x).  (4)
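As a concrete reading of Equation (4), each replication is truncated at ±M before averaging. A minimal sketch, where the truncation bound M and the observation values are placeholders:

```python
def truncated_mean(observations, M=1e12):
    """Sample-mean estimator G-bar(x, n) of Equation (4): each replication
    G_i(x) is truncated at +/-M before the n observations are averaged."""
    clipped = [max(-M, min(M, g)) for g in observations]
    return sum(clipped) / len(clipped)

print(truncated_mean([1.0, 2.0, 3.0]))           # plain average: 2.0
print(truncated_mean([1.0, 2.0, 1e15], M=10.0))  # the outlier is clipped to 10
```

The truncation keeps every observation bounded, which the convergence analysis later relies on.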

At iteration k, STRONG estimates the function values of the current and new solutions, ĝ_k(x_k) and ĝ_k(x*_k), by Ḡ(x_k, N_k) and Ḡ(x*_k, N_k), respectively. The RC-test statistic is

ρ_k = [ĝ_k(x_k) − ĝ_k(x*_k)] / [r_k(x_k) − r_k(x*_k)].  (5)

Let 0 < η_0 < η_1 < 1; for example, we may set η_0 = 1/4, η_1 = 3/4. The RC test compares the "observed reduction" with the "predicted reduction" given by the model and decides whether the new solution should be accepted. If the ratio is large (ρ_k ≥ η_1), which implies that the new solution is significantly better than the current one, the new solution is accepted. If the ratio is moderate (η_0 ≤ ρ_k < η_1), which implies that the "observed reduction" agrees fairly well with the "predicted reduction," the new solution is also accepted. If ρ_k is close to 0 or negative (ρ_k < η_0), which implies that the "observed reduction" does not agree with the "predicted reduction," the new solution is rejected. If the new solution passes the RC test, the SR test is further employed to ascertain that it yields "sufficient reduction." The SR test is defined as follows:

H_0: g(x_k) − g(x*_k) ≤ η_0² ζ_k   vs.   H_1: g(x_k) − g(x*_k) > η_0² ζ_k,

where η_0² ζ_k corresponds to "sufficient reduction" and is defined in Lemma 1 of Chang et al. (2011). The type-I error is set as α_k. When H_0 is rejected, we conclude that the new solution yields "sufficient reduction." Clearly, the SR test is a right-tailed test: a new solution is accepted only when there is strong evidence that it yields "sufficient reduction." The type-I error, the probability that a poor solution is incorrectly accepted, has to be controlled. In fact, for STRONG to achieve convergence, it is required that ∑_{k=1}^∞ α_k < ∞. Note that if a poor solution is incorrectly accepted, the algorithm can be led in a wrong direction and, as a consequence, may take a long time to recover, or may never recover.
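The RC-test branching described above can be sketched as follows; the thresholds η_0 = 1/4 and η_1 = 3/4 are the example values from the text, and the returned trust-region action labels are hypothetical placeholders:

```python
def rc_test(observed_reduction, predicted_reduction, eta0=0.25, eta1=0.75):
    """Ratio-comparison (RC) test of Equation (5): compare the observed and
    predicted reductions; return (accept_new_solution, trust_region_action)."""
    rho = observed_reduction / predicted_reduction
    if rho < eta0:       # poor agreement: reject, shrink the trust region
        return False, "shrink"
    elif rho < eta1:     # fair agreement: accept, keep the trust region
        return True, "keep"
    else:                # strong agreement: accept, enlarge the trust region
        return True, "enlarge"

print(rc_test(0.1, 1.0))   # (False, 'shrink')
print(rc_test(0.5, 1.0))   # (True, 'keep')
print(rc_test(0.9, 1.0))   # (True, 'enlarge')
```

In STRONG an accepted solution must still pass the SR test; in STRONG-X, presented below, the RC test alone decides acceptance.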

2.2 Uncovering the reason that ruins the convergence of the algorithm

In simulation optimization, when deciding whether a new solution should be accepted to replace the current solution, hypothesis testing is a natural choice, because it allows decisions to be made with a controlled type-I error. In this section, we uncover how the small error in the type-I error may cause the algorithm to diverge. As illustrated in Figure 1, suppose the skewed distribution is the true distribution of the test statistic in the hypothesis test, and the symmetric distribution is the approximate distribution. The "true critical region" and "approximate critical region" are the critical regions defined by the true and approximate distributions, respectively. It can be easily seen that, for a right-tailed test (such as the SR test) and a left-skewed true distribution, the "approximate critical region" is larger than the "true critical region." This increases the probability that a poor solution is incorrectly accepted. As discussed earlier, if a poor solution is incorrectly accepted infinitely often, the algorithm can fail to converge.
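A Monte Carlo sketch makes the miscalibration concrete: for a standardized gamma test statistic, the nominal-5% normal cutoff z_{0.95} ≈ 1.645 does not deliver a 5% rejection rate. The gamma shape 0.5 and the number of draws are illustrative choices of ours, not values from the paper:

```python
import math
import random

random.seed(1)

def rejection_rate(n_draws=200_000, shape=0.5, z=1.645):
    """Estimate P(T > z) by Monte Carlo, where T is a standardized
    Gamma(shape) variable; a normal approximation would predict ~0.05."""
    mean, sd = shape, math.sqrt(shape)  # Gamma(k, 1) has mean k, variance k
    hits = sum(
        (random.gammavariate(shape, 1.0) - mean) / sd > z
        for _ in range(n_draws)
    )
    return hits / n_draws

p = rejection_rate()
print(p)  # noticeably off the nominal 0.05 for this skewed statistic
```

A single miscalibrated test is harmless; the point of the discussion above is that an error of this size, incurred at every iteration, does not sum to a finite total.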

2.3 Effective modifications

For generally-distributed response surfaces, since the SR test is no longer applicable, only the RC test remains for deciding whether to accept a new solution. However, the "observed reduction" in the RC test, i.e., the numerator of ρ_k, is estimated from sample averages, which invariably contain some random error. In order to ensure that the RC test makes a "correct" judgement, it is crucial to control the random error in the "observed reduction."

Figure 1: Comparing the critical regions based on the exact and approximate distributions.

One way to accomplish this is to require a sufficiently large sample size for the estimation. We call the sample sizes required over iterations a sample size schedule: a sequence of positive integers {N_1, N_2, ..., N_k} representing the numbers of samples taken at the current and new solutions. Obviously, N_k should be an increasing sequence so as to reduce the random errors and, in turn, make the RC test yield correct results. For the modified algorithm STRONG-X to achieve convergence, the following sample size schedule is required:

N_{k+1} = ⌈N_k · ψ⌉, ψ > 1.  (6)

In STRONG-X, if a new solution x*_k passes the RC test, it replaces the current solution, and we let x_{k+1} = x*_k. Based on the developed sample size schedule, we have the following lemma; the proof is given in the appendix. We use "w.p.1" to denote "with probability 1."

Lemma 1. Let N_{k+1} = ⌈N_k · ψ⌉, ψ > 1. If sup_{x∈R^p} Var[G(x)] < ∞, then w.p.1,

∑_{k=1}^∞ |g(x_k) − g(x_{k+1}) − ĝ_k(x_k) + ĝ_{k+1}(x_{k+1})| < ∞,  (7)

and

∑_{k=1}^∞ |ĝ_k(x_k) − g(x_k)| < ∞.  (8)

Lemma 1 shows that the developed sample size schedule is in fact effective: under this schedule, the accumulated differences between the "observed reduction" and the "true reduction" over iterations are bounded, and so are the accumulated differences between the "estimated function value" and the "true function value."
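The geometric sample-size schedule of Equation (6) is straightforward to realize; a sketch with illustrative values N_1 = 10 and ψ = 1.5:

```python
import math

def sample_size_schedule(n1, psi, num_iterations):
    """Sample-size schedule of Equation (6): N_{k+1} = ceil(N_k * psi), psi > 1."""
    sizes = [n1]
    for _ in range(num_iterations - 1):
        sizes.append(math.ceil(sizes[-1] * psi))
    return sizes

print(sample_size_schedule(10, 1.5, 6))  # [10, 15, 23, 35, 53, 80]
```

The ceiling keeps the sizes integral while preserving geometric growth, which is what drives the summability used in the proof of Lemma 1.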

2.4 The framework of STRONG-X and its convergence analysis

With the modifications developed in Section 2.3, we are ready to present the new framework STRONG-X and its convergence analysis. The flowchart and full algorithm of STRONG-X are given in Figures 6, 7, and 8 in the appendix. Simply speaking, at each iteration, instead of employing the SR test, STRONG-X increases the sample sizes at the current and new solutions by a multiplier ψ > 1, which makes the RC test asymptotically correct. To show that STRONG-X achieves convergence, we need a few assumptions. In Assumption 1, ∇g(x) and H(x) denote the gradient and Hessian at x.


Assumption 1. The objective function g(x) is bounded below and twice differentiable, and there exist two positive constants, α_1 and β_1, such that ∥∇g(x)∥ ≤ α_1 and ∥H(x)∥ ≤ β_1 for all x ∈ R^p.

Assumption 1 requires that g(x) has uniformly bounded gradient and Hessian. Without loss of generality, we assume there exists a strongly consistent gradient estimator that can be used to construct the local model. Note, however, that if the problem does not allow the derivation of such an estimator, Chang et al. (2011) provide a black-box approach based on design of experiments and regression analysis, which can also be proved strongly consistent. Let ∇ĝ_m(x) denote the strongly consistent gradient estimator with sample size m.

Assumption 2. The estimators of g(x) and ∇g(x) satisfy sup_{x∈R^p} |Ḡ(x, n) − g(x)| → 0 w.p.1 as n → ∞ and sup_{x∈R^p} ∥∇ĝ_m(x) − ∇g(x)∥ → 0 w.p.1 as m → ∞.

Assumption 2 requires that both Ḡ(x, n) and ∇ĝ_m(x) follow a Uniform Law of Large Numbers (ULLN). Notably, any strongly consistent estimator can be extended to a ULLN as long as the conditions given in Andrews (1992) are satisfied. Now we can present the convergence of STRONG-X. The steps of the proof strictly follow those for STRONG; we provide only the main theorem and refer readers to Chang et al. (2011) for the supporting lemmas. The proof of Theorem 1 is given in the appendix.

Theorem 1. Suppose that Assumptions 1 and 2 hold. If STRONG-X has infinitely many successful iterations, then lim inf_{k→∞} ∥∇g(x_k)∥ = 0 w.p.1.

3 Compare and contrast STRONG and STRONG-X

Although STRONG-X is advantageous in its generality, i.e., convergence is guaranteed for any generally-distributed response surface, the price to pay is an increased sample size. As Figure 2 shows, at each iteration STRONG uses a fixed sample size N_k = N_0 and increases it to N′_k only when the inner loop is initiated. STRONG-X, by contrast, requires the sample size to be increased iteratively by a multiplier, i.e., N_k = ⌈N_{k−1} · ψ⌉ with ψ > 1, and if the inner loop is initiated, the sample size is increased further. Overall, STRONG-X is more computationally demanding than STRONG.

Figure 2: Sample size schedules of STRONG (left) and STRONG-X (right).

To understand the advantages and disadvantages of STRONG and STRONG-X, we compare their performances under three scenarios. In particular, because the Berry-Esséen theorem indicates that the approximation error is proportional to the skewness, it is of interest to evaluate the

performance of the algorithms when the response surface has different levels of skewness. We let the response surface be

G(x) = ∑_{i=1}^{p−1} [100(x_i − x²_{i+1})² + (1 − x_i)²] − ϵ_x,  (9)

where g(x) = E[G(x)], p = 6, and ϵ_x is (1) a normal random variable with mean zero and standard deviation g(x); (2) a gamma random variable with α = 0.5 and β = (2 · g(x))^{1/2}; or (3) a gamma random variable with α = 20 and β = (0.05 · g(x))^{1/2}. Note that the second and third scenarios represent generally-distributed response surfaces with high and low skewness, respectively. Also note that the random noise enters as "−ϵ_x," which generates left-skewed response surfaces. From Figure 3, we can see that STRONG performs much better than STRONG-X when the response surface is normally-distributed. The computational gains come from the use of hypothesis testing under normal response surfaces. For generally-distributed response surfaces, Figures 4 and 5 show that STRONG-X converges to the optimum in a steady manner, while STRONG oscillates without being able to identify the optimal solution. Moreover, comparing Figures 4 and 5, we find that STRONG is susceptible to the skewness of the response surface. When the skewness is higher, the performance trajectory of STRONG becomes much "bumpier," meaning that STRONG tends to (incorrectly) accept solutions that are not actually better than the current ones, due to the use of an approximate distribution in the hypothesis test. These results suggest that, when the response surface is known to be normally-distributed, STRONG should be used for its computational efficiency. When the distribution of the response surface is unknown, STRONG-X is preferred because of its convergence guarantee.
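The three test scenarios can be sketched as below. Two caveats on our reading: we interpret (α, β) as the gamma shape and scale, and we center each noise draw at its mean so that E[G(x)] equals the deterministic surface; the paper leaves both points implicit.

```python
import random

random.seed(42)

def response(x, scenario):
    """One noisy observation G(x) of the Rosenbrock-type surface in Equation
    (9); the noise enters as -eps_x, so skewed noise makes G left-skewed."""
    p = len(x)
    g = sum(100.0 * (x[i] - x[i + 1] ** 2) ** 2 + (1.0 - x[i]) ** 2
            for i in range(p - 1))
    s = max(g, 1e-12)  # guard: gamma parameters must be strictly positive
    if scenario == 1:    # normal noise, mean 0, standard deviation g(x)
        eps = random.gauss(0.0, s)
    elif scenario == 2:  # gamma noise, alpha = 0.5: high skewness
        alpha, beta = 0.5, (2.0 * s) ** 0.5
        eps = random.gammavariate(alpha, beta) - alpha * beta  # centered
    else:                # gamma noise, alpha = 20: low skewness
        alpha, beta = 20.0, (0.05 * s) ** 0.5
        eps = random.gammavariate(alpha, beta) - alpha * beta  # centered
    return g - eps
```

Averaging many replications at a fixed x recovers the deterministic surface value in every scenario; scenarios 2 and 3 are both left-skewed because of the minus sign, with skewness magnitude 2/√α, i.e., high for α = 0.5 and low for α = 20.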

Figure 3: Performance comparison when the response surface is normally-distributed.

4 Concluding remarks

In this paper, we point out that the use of hypothesis testing based on an approximate distribution may cause STRONG to diverge. In fact, this problem may occur not only in STRONG but also in other simulation optimization frameworks. We develop a modified framework, STRONG-X, that can handle generally-distributed response surfaces, and we prove that it achieves convergence under such surfaces. Numerical experiments show that STRONG and STRONG-X are preferable in different situations and have a complementary relationship.

Figure 4: Performance comparison when the response surface is generally-distributed with low skewness.

Figure 5: Performance comparison when the response surface is generally-distributed with high skewness.

5 Appendix

5.1 STRONG-X

5.2 Proof of Lemma 1

Proof. We only prove the first assertion, since the second can be proved by a similar argument. First observe that E[ĝ_k(x_k) − ĝ_{k+1}(x_{k+1})] = g(x_k) − g(x_{k+1}). Let α_k = 1/k², k = 1, 2, ..., so that ∑_{k=1}^∞ α_k = ∑_{k=1}^∞ 1/k² < ∞. By the Chebyshev inequality (Billingsley, 1995),

Pr(|ĝ_k(x_k) − ĝ_{k+1}(x_{k+1}) − g(x_k) + g(x_{k+1})| > α_k)
  ≤ E[ĝ_k(x_k) − ĝ_{k+1}(x_{k+1}) − g(x_k) + g(x_{k+1})]² / α_k²
  ≤ (Var[G(x_k)] + Var[G(x_{k+1})]) / (N_k α_k²)
  ≤ 2 sup_{x∈R^p} Var[G(x)] / (N_k α_k²).  (10)

Figure 6: The main framework of STRONG-X.

Step 0. Set the iteration count k = 0. Select an initial solution x_0, an initial sample size n_0 for the center point and m_0 for the gradient estimator, an initial trust-region size ∆_0, a switch threshold ∆̃ satisfying ∆_0 > ∆̃ > 0, and constants η_0, η_1, γ_1, γ_2, ψ satisfying 0 < η_0 < η_1 < 1, 0 < γ_1 < 1 < γ_2, and ψ > 1.
Step 1. Let k = k + 1 and N_{k+1} = ⌈N_k · ψ⌉. If ∆_k > ∆̃, go to Stage I; otherwise, go to Stage II.
Step 2. If the termination criterion is satisfied, stop and return the solution. Otherwise, go to Step 1.

Figure 7: STRONG-X main framework.

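The main framework of Figure 7 can be rendered as a schematic loop; stage_one, stage_two, and terminated are hypothetical stand-ins for the stage logic (local model, subproblem, RC test) and the stopping rule:

```python
import math

# Hypothetical placeholders; real STRONG-X builds a local model, solves the
# trust-region subproblem, and runs the RC test inside each stage.
def stage_one(x, n, delta):
    return x, delta * 0.5          # placeholder Stage I step

def stage_two(x, n, delta):
    return x, delta * 0.5          # placeholder Stage II step

def terminated(x, k):
    return False                   # placeholder termination criterion

def strong_x_outer_loop(x0, n0, delta0, delta_switch, psi, max_iter=20):
    """Schematic outer loop (Figure 7): Stage I while the trust region is
    above the switch threshold, Stage II below it, and the geometric
    sample-size schedule N_{k+1} = ceil(N_k * psi) at every iteration."""
    x, n, delta = x0, n0, delta0
    sizes = [n]
    for k in range(max_iter):
        if delta > delta_switch:               # Step 1: which stage to run
            x, delta = stage_one(x, n, delta)
        else:
            x, delta = stage_two(x, n, delta)
        n = math.ceil(n * psi)                 # sample-size schedule
        sizes.append(n)
        if terminated(x, k):                   # Step 2: termination check
            break
    return x, sizes
```

The essential difference from STRONG's outer loop is the unconditional geometric growth of n; STRONG would keep n fixed at n_0 outside the inner loop.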

Stage I. First-order polynomial (for the kth iteration)
1. Build a first-order polynomial.
2. Solve the subproblem and obtain a new solution x*_k.
3. Take n_0 replications at the new solution x*_k.
4. Conduct the RC test and update the trust-region size:
   • if ρ_k < η_0, let x_{k+1} = x_k and ∆_{k+1} = γ_1 ∆_k, and return to Step 1 in the main framework;
   • if η_0 ≤ ρ_k < η_1, let x_{k+1} = x*_k and ∆_{k+1} = ∆_k, and return to Step 1 in the main framework;
   • if ρ_k ≥ η_1, let x_{k+1} = x*_k and ∆_{k+1} = γ_2 ∆_k, and return to Step 1 in the main framework.

Stage II. Second-order polynomial (for the kth iteration)
1. Build a second-order polynomial.
2. Solve the subproblem and obtain a new solution x*_k.
3. Take n_0 replications at the new solution x*_k.
4. Conduct the RC test and update the trust-region size:
   • if ρ_k < η_0, go to the inner loop;
   • if η_0 ≤ ρ_k < η_1, let x_{k+1} = x*_k and ∆_{k+1} = ∆_k, and return to Step 1 in the main framework;
   • if ρ_k ≥ η_1, let x_{k+1} = x*_k and ∆_{k+1} = γ_2 ∆_k, and return to Step 1 in the main framework.

Inner Loop (for the ith subiteration of the kth iteration)
1. Set the subiteration count i = 0. Let n_{k0} = n_0, m_{k0} = m_0, and ∆_{k0} = ∆_k.
2. Let i = i + 1. Increase the sample sizes at the current and new solutions as Section 3.4 of Chang et al. (2011) describes.
3. Increase the sample size of the gradient estimator as Section 3.4 of Chang et al. (2011) describes.
4. Build a second-order polynomial.
5. Solve the subproblem and obtain a new solution x*_{ki}.
6. Conduct the RC test and update the trust-region size:
   • if ρ_{ki} < η_0, let ∆_{k,i+1} = γ_1 ∆_{ki}, and go back to Step 2 of the inner loop;
   • if η_0 ≤ ρ_{ki} < η_1, let x_{k+1} = x*_{ki} and ∆_{k+1} = ∆_k, and return to Step 1 in the main framework;
   • if ρ_{ki} ≥ η_1, let x_{k+1} = x*_{ki} and ∆_{k+1} = γ_2 ∆_k, and return to Step 1 in the main framework.

Figure 8: STRONG-X full algorithm.

The second inequality of Equation (10) uses the fact that ĝ_k(x_k) and ĝ_{k+1}(x_{k+1}) are independent. Since N_{k+1} = ⌈N_k · ψ⌉ with ψ > 1, the ratio of successive terms of the series below satisfies lim_{k→∞} (N_k α_k²)/(N_{k+1} α_{k+1}²) = lim_{k→∞} (N_k/N_{k+1}) · ((k+1)/k)⁴ ≤ 1/ψ < 1. By the ratio test (Rudin, 1976), the series

∑_{k=1}^∞ 2 sup_{x∈R^p} Var[G(x)] / (N_k α_k²) < ∞.

By the Borel-Cantelli lemma (Billingsley, 1995),

Pr(|ĝ_k(x_k) − ĝ_{k+1}(x_{k+1}) − g(x_k) + g(x_{k+1})| > α_k  i.o.) = 0,  (11)

which implies that there exists K̄ such that, for all k ≥ K̄,

|ĝ_k(x_k) − ĝ_{k+1}(x_{k+1}) − g(x_k) + g(x_{k+1})| < α_k.  (12)

It then follows that, w.p.1,

∑_{k=1}^∞ |ĝ_k(x_k) − ĝ_{k+1}(x_{k+1}) − g(x_k) + g(x_{k+1})|
  < ∑_{k=1}^{K̄−1} |ĝ_k(x_k) − ĝ_{k+1}(x_{k+1}) − g(x_k) + g(x_{k+1})| + ∑_{k=K̄}^∞ α_k < ∞.  (13)

Equation (13) follows because G(x) is truncated at ±M, so the finitely many terms with k < K̄ are bounded, and ∑_{k=1}^∞ α_k < ∞. □

Proof of Theorem 1

b ′ (xk ), Proof. The convergence proof requires additional notations. For every iteration k, we let ∇g k b ′ (xk ), and ∆′ denote the estimates of gradient and Hessian at xk and the trust-region size at the H k

k

end of iteration k before the algorithm moves to iteration k + 1. Note that Theorem 1 is analyzed for outer loop, where xk is random in the decision The “w.p.1” is respect to the randomness

space.

b ′

of the algorithm. Also note that, if lim inf k→∞ ∇g (xk ) = 0, then there exists a subsequence of k

{xkj }∞ j=0

such that b ′ (xk )∥ = 0. lim ∥∇g kj j

j→∞

By lemma 6 in (Chang et al., 2011), we have limj→∞ ∥∇g(xkj )∥ = 0 w.p.1, which implies that lim inf k→∞ ∥∇g(xk )∥ = 0 w.p.1. Therefore, it suffices to prove that

b ′

lim inf ∇g (x ) k k = 0 w.p.1. k→∞

(14)



b ′ We prove Equation (14) by contradiction. Suppose that lim inf k→∞ ∇gk (xk ) > 0. Then, by lemma 5 in (Chang et al., 2011), lim inf k→∞ ∆′k > 0 w.p.1. Then, for almost all sample paths,

11



b ′

′ there exists a constant K1 > 0 such that ∇g k (xk ) ≥ ε and ∆k ≥ ∆L when k ≥ K1 for some ϵ > 0 and ∆L > 0. For each iteration, gbk (xk ) − gbk (xk+1 ) = G(xk , Nk ) − G(xk+1 , Nk ) b ′ (xk )∥ · ∆′ (for Stage I), or > η02 ∥∇g k k { } b ′ (xk )∥ ∥∇g 1 2 b ′ k > η ∥∇gk (xk )∥ · min , ∆′k . (for Stage II or inner loop) b ′ (xk )∥ 2 0 ∥H k For k ≥ K, ∞ ∑

gbk (xk ) − gbk (xk+1 ) =

k=K

=

∞ ∑ k=K ∞ ∑

G(xk , Nk ) − G(xk+1 , Nk ) G(xk , Nk ) − G(xk+1 , Nk ) + G(xk+1 , Nk ) − G(xk+1 , Nk+1 )

k=K

>

˜ η02 ε nI (K)∆

∞ {ε } ∑ 1 2 , ∆L + + η0 ε nII (K) min G(xk+1 , Nk ) − G(xk+1 , Nk+1 ), 2 κ k=K

(15) where nI (K) and nII (K) are the numbers of successful iterations of Stages I and II after iteration ˜ is the lower bound of the trust-region size for STRONG to switch from Stage I to II, and κ K, ∆ b ′ (xk )∥. Moreover, is the upper bound of ∥H k

∞ ∑

= <

k=K ∞ ∑ k=K ∞ ∑

G(xk+1 , Nk ) − G(xk+1 , Nk+1 ) G(xk+1 , Nk ) − g(xk+1 ) + g(xk+1 ) − G(xk+1 , Nk+1 ) |G(xk+1 , Nk ) − g(xk+1 )| + |g(xk+1 ) − G(xk+1 , Nk+1 )|

k=K

< ∞

(16)

Equation (16) uses Equation (8) in lemma 1. Also, since there are infinitely many successful iterations, at least one of σI (K), σII (K) → ∞, it follows then Equation (15) = ∞ is unbounded. ∑ By Equation (7) in lemma 1, we know that ∞ k=K (g(xk )−g(xk+1 )) is also unbounded, which clearly contradicts the fact that g(x) is bounded below and the proof is complete.

References

Andrews, D.W.K. 1992. Generic uniform convergence. Econometric Theory 8 241–257.

Billingsley, P. 1995. Probability and Measure. John Wiley and Sons, New York.

Chang, K.-H., L.J. Hong, H. Wan. 2011. Stochastic trust-region response-surface method (STRONG): a new response-surface framework for simulation optimization. To appear in INFORMS Journal on Computing (download from http://sites.google.com/site/ssoptimizationlab/publications).

Conn, A.R., N.I.M. Gould, P.L. Toint. 2000. Trust-Region Methods. SIAM.

Myers, R.H., D.C. Montgomery, C.M. Anderson-Cook. 2009. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. John Wiley and Sons, New York.

Rudin, W. 1976. Principles of Mathematical Analysis. McGraw-Hill, New York.

Serfling, R.J. 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.

