Stochastic Nelder-Mead Simplex Method: A New Globally Convergent Direct Search Method for Simulation Optimization

Kuo-Hao Chang
Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu, Taiwan

Abstract

The Nelder-Mead simplex method (NM), originally developed in deterministic optimization, is an efficient direct search method that optimizes the response function merely by comparing function values. While successful in deterministic settings, the application of NM to simulation optimization suffers from two problems: (1) it lacks an effective sample size scheme for controlling noise; consequently the algorithm can be misled in the wrong direction because of noise, and (2) it is a heuristic algorithm, so the quality of the estimated optimal solution cannot be quantified. We propose a new variant, called the Stochastic Nelder-Mead simplex method (SNM), that employs an effective sample size scheme and a specially-designed global and local search framework to address these two problems. Without the use of gradient information, SNM can handle problems where the response functions are nonsmooth or the gradient does not exist. This is complementary to the existing gradient-based approaches. We prove that SNM can converge to the true global optima with probability one. An extensive numerical study also shows that the performance of SNM is promising and is worthy of further investigation.

Keywords: Simulation, Direct search method, Nelder-Mead simplex.

1 Introduction

Simulation is one of the most popular planning tools in operations research and management science. The advantage of simulation is that it can account for any detail that is important to the system under investigation. Simulation optimization is concerned with identifying optimal design parameters for a stochastic system, where the objective function is expressed as an expectation of a function of response variables associated with a simulation model. Over the decades, simulation optimization has received considerable attention owing to a wide range of real-world applications, for example, designing production plans that minimize the expected inventory cost with stochastic customer demands and selecting financial portfolios that maximize the expected total profits with stochastic asset prices. In this paper, we focus on simulation optimization where the decision variables are continuous (henceforth called "continuous simulation optimization").

Recent methodology developments for continuous simulation optimization have been discussed in much of the literature, for example, Banks (1998), Fu (2002), Tekin and Sabuncuoglu (2004), and Fu (2006). Angün and Kleijnen (2010) classified the methodologies of continuous simulation optimization into two categories: the white-box method and the black-box method. The white-box method corresponds to methods where the gradient can be estimated via a single simulation run; the best known are perturbation analysis and the likelihood ratio/score function method (Fu, 2006). The black-box method refers to methods where the simulation model is essentially treated as a black box; examples are simultaneous perturbation stochastic approximation (SPSA) (Spall, 2003), response surface methodology (RSM) (Myers et al., 2009), and the many metaheuristics (genetic and evolutionary algorithms, scatter search, simulated annealing, tabu search). While the white-box methods are typically computationally efficient, they require substantial knowledge of the simulation model, such as the input distributions and/or some of the system dynamics, to derive an effective and efficient gradient estimator. When the simulation model is too complex to be known, the white-box method cannot be used and the black-box method appears to be the only option. Stochastic approximation (SA) (Kiefer and Wolfowitz, 1952) may be one of the most prevalent and extensively studied black-box methods of the past decades, e.g., Blum (1954), Fabian (1971), Benveniste et al. (1990), Andradóttir (1995), Kushner and Yin (1997), Wang and Spall (2008), Bhatnagar et al. (2011) and Andrieu et al. (2011). The advantages of SA are that it requires minimal knowledge of the simulation model and that it can be proved to converge under certain regularity conditions (Kushner and Yin, 1997). On the other hand, RSM, as stated in Myers et al. (2009), "is a collection of statistical and mathematical techniques useful for developing, improving, and optimizing processes." Taking a sequential experimentation strategy, RSM approximates the response function with first-order regression models, and later switches to second-order regression models whenever first-order models are insufficient to represent the underlying response function. Analytical approaches such as canonical analysis and ridge analysis are then used to locate the optimal solution. Full coverage of RSM can be found in many classic texts such as Khuri and Cornell (1996) and Myers et al. (2009). While SA and RSM are extensively used and enjoy tremendous success in many applications, their reliance on first- or second-order Taylor approximations is a serious limitation when solving problems whose objective functions are ill-behaved, for example, nondifferentiable functions (Swann, 1972), which arise in many engineering problems.

This motivates the interest in direct search methods, because they "rely only on function values to find the location of the optimum" (Barton and Ivey, 1996). In fact, there has been continuing interest in direct search methods, for example, Hooke and Jeeves (1961), Nelder and Mead (1965), Spendley et al. (1962), Anderson and Ferris (2001), and Kolda et al. (2003). Direct search methods are advantageous in that they do not require gradient estimation and involve relatively few function evaluations at each iteration. In practice they have generally proved to be robust and reliable (Swann, 1972). The Nelder-Mead simplex method (NM) (Nelder and Mead, 1965), originally developed for unconstrained optimization of deterministic functions, is one of the most popular direct search methods (Barton and Ivey, 1996). Fletcher (1987) noted that "Nelder-Mead simplex method is the most successful of the methods which merely compare function values." While successful in deterministic settings, the direct application of NM to simulation optimization, where the response variable is observed with noise, has some serious limitations. First, NM lacks an effective sample size scheme for controlling noise; consequently, when the response variable is grossly noisy, the noise can corrupt the relative ranks of solutions, leading the algorithm in a totally wrong direction. Second, NM is a heuristic algorithm (Spall, 2003); the quality of the estimated optimal solution cannot be quantified. While convergent variants of NM have been developed for deterministic optimization, e.g., Price et al. (2002), to the best of our knowledge the convergence of NM in stochastic environments has not been addressed, apart from some heuristic modifications designed to handle noisy response functions, e.g., Barton and Ivey (1996) and Humphrey and Wilson (2000). We propose a new variant of NM, called the Stochastic Nelder-Mead simplex method (SNM), for simulation optimization. Inheriting the advantages of NM, SNM only uses function values to optimize the response function instead of requiring gradient estimation; therefore it can solve problems whose objective function is nonsmooth or whose gradient does not exist. This makes SNM complementary to the existing gradient-based approaches. Moreover, SNM is equipped with a specially-designed sample size scheme, so that the noise inherent in the response variable can be effectively controlled and the mistakes made when ranking a set of solutions can be minimized. A newly-developed global and local search framework further prevents the algorithm from premature convergence, a notorious weakness of NM. We prove that SNM is a globally convergent algorithm, i.e., the algorithm is guaranteed to achieve the global optima with probability one.

Two computational enhancement procedures, Latin hypercube sampling (LHS) for generating the initial simplex and a signal-to-noise sample size scheme (SNS), are developed to help SNM achieve the best cost economy when handling practical problems. The rest of this paper is organized as follows. In Section 2, we formally define the problem. In Section 3, the main framework of SNM as well as its convergence analysis are presented. In Section 4, we discuss two procedures that can significantly enhance the computational efficiency of SNM. In Section 5, we use an extensive numerical study to demonstrate the performance of SNM and compare it with other competing algorithms. We conclude with future directions in Section 6.

2 Problem definition

Consider the following continuous simulation optimization problem:

Minimize_{x ∈ X}  E[G(x, ω)],   (1)

where x is a p-dimensional vector of continuous decision variables, X is the parameter space, and ω is a random variable defined on the probability space (Ω, F, P). The stochastic response G(x, ω) takes two inputs, the design parameters x and a random sample from the distribution of ω. A basic assumption requires that the response function G(x, ω) is measurable, so that the expectation E[G(x, ω)] exists for every x. The objective function g(x) = E[G(x, ω)] is assumed to be analytically unavailable; it can only be evaluated by "black-box" stochastic simulation with noise. Note that in many practical problems, g(x) can be functionally ill-behaved. To enhance the applicability of the developed methodology, we do not assume differentiability of g(x) but only assume g* = inf_{x ∈ X} g(x) > −∞. In what follows, we suppress ω in G(x, ω) for ease of presentation.

When g(x) is differentiable, many existing approaches, e.g., Deng and Ferris (2009) and Chang et al. (2009), have been developed to solve (1). They take advantage of the smoothness of g(x), approximate the gradient, and perform optimization based on the estimated gradient. These gradient-based approaches can fall short when g(x) lacks differentiability. For example, in many engineering problems, the objective function can be nonsmooth, rendering gradient-based approaches inapplicable. This paper aims at developing a direct search-based methodology to bridge the gap. Because it uses ordinal information about solutions instead of requiring gradient information, the developed methodology can withstand small random perturbations in function values, which is attractive in simulation optimization, where the response variable is observed with noise.
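To make the problem setting concrete, the following minimal Python sketch shows a stand-in black-box response G(x, ω) and the sample-average estimator of g(x) used throughout the paper; the quadratic g, the noise level, and the function names are illustrative assumptions, not part of the paper.

import numpy as np

rng = np.random.default_rng(0)

def G(x, sd=10.0):
    # A stand-in black-box response G(x, omega): a simple deterministic g(x)
    # (a quadratic, chosen only for illustration) plus Gaussian noise.
    x = np.asarray(x, dtype=float)
    return np.sum((x - 1.0) ** 2) + rng.normal(0.0, sd)

def g_hat(x, n):
    # Sample-average estimate of g(x) = E[G(x, omega)] from n replications.
    return np.mean([G(x) for _ in range(n)])

# Example: g_hat([0.0, 0.0], n=400) approximates g((0, 0)) = 2 to within O(sd / sqrt(n)).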

3 Stochastic Nelder-Mead simplex method

3.1 Nelder-Mead simplex method

The Nelder-Mead simplex method (NM) (Nelder and Mead, 1965) was originally developed for nonlinear and deterministic optimization. The idea of a simplex, a geometric object that is the convex hull of p + 1 points in R^p not lying in the same hyperplane, can be traced back to Spendley et al. (1962). NM is renowned because of the following advantages. First, it searches for a new solution by reflecting the extreme point with the worst function value through the centroid of the remaining extreme points. With this geometric justification, the algorithm can find an improved solution more efficiently than random search. Moreover, since no gradient information (either noisy or noise-free) is required in the process, it can handle problems for which the gradient does not exist everywhere. Second, NM allows the simplex to rescale or change its shape based on the local behavior of the response function. When the newly-generated point has good quality, an expansion step is taken in the hope that an even better solution can be found. On the other hand, when the newly-generated solution is of poor quality, a contraction step is taken, restricting the search to a smaller region. Third, since NM determines its search direction only by comparing function values, it is insensitive to small inaccuracies or stochastic perturbations in function values.

In spite of the advantages discussed above, NM has several disadvantages that limit its application in simulation optimization. First, NM lacks effective mechanisms (e.g., a sample size scheme) to control the noise. Therefore, when noise dominates the true differences in the function values, the relative ranks of function values in the simplex can be wrong due to sampling error, leading the algorithm in the wrong direction. To illustrate, consider a 10-D Rosenbrock function (More et al., 1981) with a varying noise level solved by NM. The global minimum is g* = 0. Figure 1 demonstrates the performance of NM under three different noise levels, S.D.=0, S.D.=0.1*g(x), and S.D.=g(x), which correspond to cases of no noise, mild noise, and large noise, respectively, with log(1 + g(x)) as the performance measure (Y-axis). It can be clearly seen that as the noise grows large, NM is driven to diverge. This shows that a direct application of NM to problems in the stochastic environment can result in unsatisfactory performance.

Figure 1: Performance of NM under different noise structures (noise levels S.D.=0, S.D.=0.1*g(x), and S.D.=g(x); Y-axis: log(1+g(x)); X-axis: number of observations).

Second, NM may perform the shrink step frequently and in turn reduce the size of the simplex drastically. Consequently, the algorithm can converge prematurely at a nonoptimal solution. Moreover, since the shrink step requires reevaluating all points except the best one, an excessive number of simulation runs is typically demanded. Third, while some effective modifications exist for applying NM to simulation optimization, e.g., Barton and Ivey (1996) and Humphrey and Wilson (2000), they are essentially heuristic algorithms without a convergence guarantee, i.e., the quality of the final solution cannot be quantified.

We study the classic NM method, exploit its structure, and propose a new variant, SNM, for simulation optimization. SNM addresses the weaknesses, yet maintains the advantages, of the classic NM. Embedded with an effective sample size scheme, SNM can control noise, minimizing the mistakes made when ranking a set of solutions in the presence of noise. A new global and local search framework, called Adaptive Random Search, is further developed to replace the shrink step in the classic NM, so as to prevent the algorithm from premature convergence and to mitigate the intensive computations the shrink step requires. We prove that SNM can converge to the global optima with probability one.


3.2 The SNM framework

We start this section by introducing some notation. Let ĝ_k = S_k/N_k be the estimate of g in iteration k, where S_k refers to the sum of the N_k objective function observations taken by iteration k. For the p + 1 extreme points, let x_k^max, x_k^2max, and x_k^min represent the points that have the highest, second highest, and lowest estimates of the function value, respectively, in iteration k. In any iteration k of the algorithm, SNM conducts the following four steps (a code sketch of this loop is given at the end of this subsection):

Step 1. Generate a new point x_k.

Step 2. Take N_k samples at the new point and N_k − N_{k−1} samples at the points that have been visited.

Step 3. Calculate ĝ_k = S_k/N_k for the new point and update ĝ_k for the points that have been visited. Remove the point that has the highest ĝ_k.

Step 4. Stop if a prespecified convergence criterion is satisfied or if the maximum number of function evaluations has been reached; return x* = x_k^min. Otherwise, go to Step 1.

These four steps and a sample size scheme (Section 3.3) are the cornerstones of SNM. The full algorithm and its detailed description are provided in Figure 6 in the Appendix. Two remarks deserve to be made about SNM. First, the way SNM generates a new point is similar to NM, except that SNM does not perform the "shrink" step. Specifically, SNM starts by generating the reflection point, followed by the contraction point if the reflection point is not promising, and finally performs the newly-developed Adaptive Random Search (ARS) when all previous steps fail. As will be introduced in Section 3.4, ARS guarantees a positive probability of finding a better point in each iteration, which enables SNM to converge to the global optima with probability one because of nonstop improvements. Second, to ensure that all points are correctly ranked, SNM requires each point to have a sufficient sample size as defined by the sample size scheme N_k. That is, N_k samples are taken at the new point, and N_k − N_{k−1} samples are taken at the points that have been visited (note that these points already have N_{k−1} samples from previous iterations). By comparing the sample averages of all points, SNM removes the worst point, i.e., the point that has the highest sample average, and determines the moving direction for the next iteration. The developed sample size scheme, as will be proved, can ensure that SNM can tell whether a new point should be accepted and can identify the truly best point in the set of simplex points.
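The following Python sketch outlines Steps 1-4 above. It is only a high-level illustration: the helper names (new_point, simulate) are hypothetical placeholders for the candidate-generation logic of Sections 3.2-3.4 and for the stochastic simulation, and N_k = [sqrt(k)] is used as an example schedule satisfying Equation (3).

import math
import numpy as np

def snm_sketch(new_point, simulate, init_simplex, max_evals=50_000):
    # Minimal sketch of the SNM iteration (Steps 1-4 of Section 3.2).
    # new_point(points, g_hat): returns the next candidate (reflection,
    #   contraction, or ARS point); assumed to be supplied by the caller.
    # simulate(x): one noisy observation G(x, omega).
    # init_simplex: list of p+1 starting points (e.g., from LHS, Section 4.1).
    points = [tuple(x) for x in init_simplex]
    S = {x: 0.0 for x in points}   # running sum of observations at x
    N = {x: 0 for x in points}     # number of observations at x
    evals = 0

    def sample(x, n):
        nonlocal evals
        for _ in range(n):
            S[x] += simulate(np.asarray(x))
            N[x] += 1
            evals += 1

    k = 0
    while evals < max_evals:
        k += 1
        N_k = max(1, int(math.sqrt(k)))           # N_k = [sqrt(k)], satisfies (3)
        for x in points:                          # Step 2 (visited points): top up
            sample(x, max(0, N_k - N[x]))         #   each to N_k observations
        g_hat = {x: S[x] / N[x] for x in points}
        x_new = tuple(new_point(points, g_hat))   # Step 1: reflection/contraction/ARS
        S.setdefault(x_new, 0.0)
        N.setdefault(x_new, 0)
        sample(x_new, max(0, N_k - N[x_new]))     # Step 2 (new point): N_k observations
        points.append(x_new)
        g_hat = {x: S[x] / N[x] for x in points}
        points.remove(max(points, key=g_hat.get)) # Step 3: drop the worst point
    return min(points, key=lambda x: S[x] / N[x]) # Step 4: return x_k^min

In this sketch the visited points are topped up before the new candidate is generated, a minor simplification of the ordering described above.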

3.3 Development of sample size scheme

As we have pointed out, a proper sample size scheme is essential for SNM to minimize the errors made when accepting and removing points in the set of simplex points. In this section, we investigate the sample size scheme required to render the algorithm convergent. For ease of exposition, we call the newly-generated point x_k^new, regardless of how it is generated, and use N_k to represent N_k(x). Recall that a newly-generated point is accepted if its sample average is lower than that of x_k^max, and a point is removed if it has the highest estimate of the function value in the set of simplex points. For each iteration k, let x_k^* denote the point that has the highest true function value in X_k, and let the indicator process be

I(k) = 1  if ĝ_k(x_k^*) ≥ max_{x ∈ X_k∖{x_k^*}} ĝ_k(x),  and  I(k) = 0  otherwise.

Assumption 1. E[e^{sG(x)}] exists and is continuous in a neighborhood of s = 0.

Assumption 1 implies the existence of the moment-generating function in a neighborhood of zero. While for each solution the accuracy of the performance measure cannot improve faster than O(1/√N_k), as a result of averaging i.i.d. noise over N_k simulation replications, our goal is not to find the absolute performance measure of each solution but to identify the solution that has the highest performance measure.

Lemma 1. If Assumption 1 holds, there exists a positive number β > 0 such that

Pr[I(k) = 1] = 1 − O(e^{−βN_k}),   Pr[I(k) = 0] = O(e^{−βN_k}).   (2)

Lemma 1 shows that the probability that the solution with the highest true function value also has the highest sample average among the others goes to one as the sample size goes to infinity. Moreover, the convergence rate for identifying the point with the highest true function value via the highest sample average is exponential. Note that the bound provided in Lemma 1 does not depend on the set of solutions under investigation.
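The proof of Lemma 1 is not reproduced here; the following is a minimal sketch, for a single pair of solutions, of the standard Chernoff-bound argument behind the exponential rate, under the additional working assumption that the N_k replications are i.i.d. For two points x and y with g(y) < g(x),

Pr[ĝ_k(y) ≥ ĝ_k(x)] = Pr[ Σ_{i=1}^{N_k} (G_i(y) − G_i(x)) ≥ 0 ] ≤ ( inf_{s>0} E[e^{s(G(y)−G(x))}] )^{N_k} = e^{−β_{xy} N_k},

where β_{xy} = −log inf_{s>0} E[e^{s(G(y)−G(x))}] > 0 because the moment-generating function is finite near zero (Assumption 1) and E[G(y) − G(x)] < 0. A union bound over the finitely many points compared in an iteration then yields a bound of the form in Equation (2).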

Let B_k be the event that the removed point is not the one that has the highest true function value in X_k, D_k be the event that a point is wrongly accepted, i.e., it does not have a lower function value than x_k^max, and F_k be the event that the estimator of the optimal solution, x_k^min, is not the point that has the lowest true function value in X_k. In the following lemma, we prove that the probability that these events occur infinitely often is zero under an appropriate sample size scheme. We use "i.o." to denote "infinitely often."

Lemma 2. Suppose Assumption 1 holds and N_k is a sequence satisfying

Σ_{k=1}^{∞} α^{N_k} < ∞   for all α ∈ (0, 1).   (3)

Then Pr(B_k i.o.) = 0, Pr(D_k i.o.) = 0, and Pr(F_k i.o.) = 0.

Proof. We only prove the first assertion; the others can be obtained by similar arguments. Note that Σ_{k=1}^{∞} Pr(B_k) = Σ_{k=1}^{∞} C_1 exp(−βN_k) < ∞, where β, C_1 > 0 are constants. The equality is due to Equation (2), and the finiteness of the sum is due to Equation (3) (take α = e^{−β}). By the Borel-Cantelli lemma (Billingsley, 1995), Pr(B_k i.o.) = 0.

Lemma 2 suggests that if the sample size at each iteration grows as Equation (3) requires, the sample-average-based removal and acceptance rules will be almost surely correct; moreover, for sufficiently large k, the estimator of the optimal solution is guaranteed to be the point that has the lowest true function value in the set of simplex points. Note that not all increasing sample size schemes satisfy Equation (3). Homem-De-Mello (2003) gave a counterexample: let N_k = c log k, c > 0; under this sample size scheme, we have

Σ_{k=1}^{∞} α^{c log k} = Σ_{k=1}^{∞} e^{c log k log α} = Σ_{k=1}^{∞} k^{c log α}.   (4)

Equation (4) diverges when α > exp(−1/c), so Equation (3) is not satisfied.
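A small numerical illustration of this point, assuming α = 0.9 and c = 2 purely for concreteness:

import math

def partial_sum(N, alpha, K):
    # Partial sum of sum_k alpha**N(k) over k = 1, ..., K.
    return sum(alpha ** N(k) for k in range(1, K + 1))

alpha, c = 0.9, 2.0
sqrt_scheme = lambda k: max(1, math.floor(math.sqrt(k)))  # N_k = [sqrt(k)]: satisfies Equation (3)
log_scheme = lambda k: c * math.log(k)                    # N_k = c log k: violates (3) when alpha > exp(-1/c)

for K in (10**3, 10**5):
    print(K, partial_sum(sqrt_scheme, alpha, K), partial_sum(log_scheme, alpha, K))
# The partial sums of the first scheme stabilize as K grows, while those of the
# second keep increasing (here exp(-1/c) is about 0.61 < alpha = 0.9).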

3.4 Adaptive random search

The Adaptive Random Search (ARS) procedure, consisting of a local search and a global search, is designed to find a satisfactory solution in the parameter space when the contraction step fails to generate a solution of good quality. With a uniquely defined neighborhood structure, ARS strikes a balance between exploring the parameter space (i.e., global search) and exploiting the local region (i.e., local search).

Suppose there are p + 1 solutions, say x_i, i = 1, 2, ..., p + 1. The neighborhood of solution x_i is defined as ψ(x_i) = {x ∈ X : ∥x − x_i∥ ≤ min_{j ≠ i} ∥x_i − x_j∥}, which adaptively changes based on the relative locations of the solutions generated by SNM. Let F(x) denote the fitness function, for which a higher fitness value implies better solution quality. For example, for a minimization problem, we can let F(x) = 1/g(x). A schematic representation of the ARS procedure is described as follows (a code sketch follows these steps):

Step 1. Decide to perform global search or local search with probability Ps > 0 and 1 − Ps > 0, respectively. Go to Step 2a if local search is selected and go to Step 2b if global search is selected.

Step 2a. Select a neighborhood of x_i with probability F(x_i)/Σ_{i=1}^{p+1} F(x_i) and perform uniform sampling on the selected neighborhood. Call the sampled solution x^ARS.

Step 2b. Perform uniform sampling on the whole parameter space. Call the sampled solution x^ARS.

Step 3. Calculate ĝ_k(x^ARS).

Step 4. If ĝ_k(x^ARS) ≤ ĝ_k(x_k^max), accept x^ARS and let X_k = X_{k−1} ∪ {x^ARS}. Otherwise, return to Step 1.
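A minimal Python sketch of one ARS proposal is given below. It assumes a rectangular parameter space (a box with per-dimension bounds), uses F(x) = 1/g(x) as in the example above, and leaves the acceptance test of Steps 3-4 (against ĝ_k(x_k^max)) to the caller; the function and variable names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def ars_candidate(points, g_hat, lower, upper, p_s=0.4):
    # One ARS proposal (Steps 1-2b of Section 3.4); acceptance is handled by the caller.
    if rng.random() < p_s:                                  # Step 2b: global search,
        return rng.uniform(lower, upper)                    #   uniform over the box X
    fitness = np.array([1.0 / max(g_hat[tuple(x)], 1e-12)   # F(x) = 1/g(x), guarded
                        for x in points])                   #   against division by ~0
    i = rng.choice(len(points), p=fitness / fitness.sum())  # Step 2a: pick a neighborhood
    center = np.asarray(points[i], dtype=float)
    radius = min(np.linalg.norm(center - np.asarray(x))     # psi(x_i): ball reaching the
                 for j, x in enumerate(points) if j != i)   #   nearest other simplex point
    while True:                                             # uniform point in that ball,
        u = rng.uniform(-radius, radius, size=center.size)  #   sampled by rejection
        if np.linalg.norm(u) <= radius:
            return np.clip(center + u, lower, upper)

Rejection sampling inside the ball is inefficient in high dimensions; it is used here only to keep the sketch short.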

3.5 Convergence analysis

In what follows, we analyze the convergence of SNM. We first show that ARS is able to find a solution arbitrarily close to the global optima, and then show that SNM achieves convergence because of ARS. Let Xϵ = {x ∈ X : g(x) ≤ g* + ϵ} denote the ϵ-neighborhood of the global optima. Assumption 2 requires that for every small ϵ, ARS has a positive probability of sampling a solution belonging to Xϵ.

Assumption 2. For each ϵ > 0, µ(Xϵ) > 0.

Without loss of generality, we assume that all the previous steps fail and SNM is forced to perform ARS to find an improved solution.

In Lemma 3, we prove that for any ϵ > 0 and sufficiently large k, SNM must be able to find a solution whose true function value is within the ϵ-neighborhood of the global optima with probability one. Define E_k(ϵ) = {∃ x ∈ X_k s.t. g(x) < g* + ϵ} as the event that the algorithm finds a solution within the ϵ-neighborhood of the global optima, and let Ē_k(ϵ) be the complement of E_k(ϵ). We have the following lemma.

Lemma 3. Suppose Assumptions 1 and 2 hold. Then, for any ϵ > 0,

Pr(Ē_k(ϵ) i.o.) = 0.   (5)

Proof. Fix ϵ > 0. Without loss of generality, assume that x_i ∉ Xϵ for all initial points x_i ∈ X_0, i = 1, 2, ..., p + 1. Note that

Pr(Ē_k(ϵ)) = Pr{g(x_i) ≥ g* + ϵ for all x_i ∈ X_k}
           ≤ Pr{max_{x_i ∈ X_k} g(x_i) ≥ g* + ϵ}
           ≤ Pr{x_t^new ∉ Xϵ for t = 1, 2, ..., k}
             + Pr{there is at least one t ∈ {1, 2, ..., k} such that x_t^new ∈ Xϵ and ĝ_t(x_t^new) > ĝ_t(x_t^max)}   (6)
           ≤ (1 − Ps µ(Xϵ))^k + O(C_2 Ps µ(Xϵ) exp(−βN_k)),   (7)

where C_2 > 0 is a constant. Note that the probability that x_t^new is sampled from Xϵ is at least Ps µ(Xϵ), and that the global sampling steps are independent of each other and of the observations evaluated at the sampled points. Equation (7) follows because of Lemma 1. Further,

Σ_{k=1}^{∞} Pr(Ē_k(ϵ)) ≤ Σ_{k=1}^{∞} [ (1 − Ps µ(Xϵ))^k + C_2 Ps µ(Xϵ) exp(−βN_k) ]
                       < Σ_{k=1}^{∞} (1 − Ps µ(Xϵ))^k + C_2 Ps µ(Xϵ) Σ_{k=1}^{∞} exp(−βN_k)
                       < ∞.   (8)

Note that 1 − Ps µ(Xϵ) < 1 because of Assumption 2, and Equation (8) follows due to Equation (3). By the Borel-Cantelli lemma (Billingsley, 1995), we have Pr(Ē_k(ϵ) i.o.) = 0. This concludes the lemma.

A detailed examination of the proof of Lemma 3 shows that only the distribution of the global sampling matters. Therefore, users can devise a more efficient local search framework tailored to their problems without affecting Lemma 3.


Also note that the only restriction imposed on the global sampling is Assumption 2. As long as Assumption 2 holds, the global sampling distribution can also be changed.

We now present the convergence analysis of the SNM method. In the following theorem, we use "w.p.1" to denote "with probability one."

Theorem 1. Suppose Assumptions 1 and 2 hold, and Σ_{k=1}^{∞} α^{N_k} < ∞ for all α ∈ (0, 1). If SNM generates a sequence {x_k^min}_{k=1}^{∞}, then we have

g(x_k^min) → g*   w.p.1.   (9)

Proof. Fix ϵ > 0. Then

Pr{g(x_k^min) > g* + ϵ} ≤ Pr(Ē_k(ϵ)) + Pr{g(x_k^min) > g* + ϵ, E_k(ϵ)}.   (10)

By Equation (8), we know Σ_{k=1}^{∞} Pr(Ē_k(ϵ)) < ∞. Recall that F_k is the event that the estimator of the optimal solution, x_k^min, is not the point that has the lowest true function value in X_k. We have

Pr{g(x_k^min) > g* + ϵ, E_k(ϵ)} ≤ Pr(F_k).   (11)

By Lemma 2, we know that Σ_{k=1}^{∞} Pr(F_k) < ∞, and therefore Σ_{k=1}^{∞} Pr{g(x_k^min) > g* + ϵ, E_k(ϵ)} < ∞. Consequently, Σ_{k=1}^{∞} Pr{g(x_k^min) > g* + ϵ} < ∞. By the Borel-Cantelli lemma, and letting ϵ → 0, the conclusion follows.

The convergence of SNM relies substantially on the ARS procedure. While the global search plays a key role in achieving convergence, the local search expedites the convergence by narrowing down the search to several promising regions so the algorithm can achieve convergence quickly. Note that the convergence of SNM does not require restrictive functional properties of the objective function, such as differentiability. This is valuable because, as we have pointed out earlier, many engineering problems do not have differentiable objective functions, and it greatly extends the applicability of SNM. Also note that the sample size scheme required in Equation (3) leaves much room for end users to design the most cost-efficient scheme for their own problems.

4 Enhancement of computational efficiency

While SNM is proved to possess nice global convergence properties, this alone does not guarantee satisfactory performance when applied to practical problems. In this section, we offer two procedures, the generation of a better initial simplex and a signal-to-noise sample size scheme, to enhance the computational performance of SNM.

4.1 Generation of a better initial simplex

In classic NM, the initial simplex is formed by randomly generating p + 1 points that do not lie on the same hyperplane. However, experience indicates that the initial simplex can significantly affect the performance of NM; a poor initial simplex can lead to unsatisfactory performance. Therefore, it makes sense to generate the initial simplex more wisely. Two goals are relevant when generating the initial simplex: (1) the simplex points should be spread over the parameter space so that the exploration of the underlying response function is maximized, and (2) the simplex points must not lie on the same hyperplane (otherwise the reflection point will be on the same hyperplane too and the algorithm will never be able to explore other regions). We propose to use Latin hypercube sampling (LHS) to generate the initial simplex, because LHS can simultaneously meet these two goals. The simplex generated based on LHS is more representative of the parameter space than one generated randomly. LHS is a generalization of the Latin square design (LSD) (Montgomery, 2005) to an arbitrary number of dimensions, whereby a square grid contains only one design point in each row and each column. The detailed algorithm for generating an LHS is given in the appendix.

4.2 Signal-to-Noise sample size scheme

In Section 3.3, the sample size scheme is required to satisfy

Σ_{k=1}^{∞} α^{N_k} < ∞   for all α ∈ (0, 1)   (12)

in order for the algorithm to achieve convergence. A typical selection of the sample size scheme is

N_k = [√k],   (13)

where [x] denotes the largest integer not exceeding x. Equation (13) implies that the sample size is small when k is small (e.g., in early iterations), but gradually increases as the algorithm iterates. While algorithm convergence can still be achieved, in early iterations the simplex can be thrown off course due to insufficient sample sizes.


We propose a Signal-to-Noise sample size scheme (SNS) to address this problem. To ensure that the algorithm ranks a set of points correctly, we suggest that the ratio of signal to noise be at least 3, where the signal represents the function value and the noise represents the standard deviation of its estimator:

g(x) / (σ(x)/√N_k) ≥ 3.   (14)

Note that in a stochastic environment both g(x) and σ(x) are unknown and must be estimated. Combining Equations (13) and (14) yields the following new sample size scheme,

N'_k = max{ [√k], 9 (σ(x)/g(x))^2 },   (15)

where N'_k is the sample size that satisfies both requirements. Based on Equation (15), in early iterations the sample size is determined by SNS, while the original sample size scheme gradually dominates as the algorithm progresses. As a consequence, Equation (15) not only fulfills the requirement for the algorithm to converge, but also ensures that the sample size is sufficient in early iterations.
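A minimal sketch of Equation (15) in Python, where the unknown g(x) and σ(x) are replaced by the current sample mean and sample standard deviation (this substitution and the guard for a near-zero signal are practical assumptions, not part of Equation (15)):

import math

def sns_sample_size(k, g_hat, sigma_hat):
    # Sample size N'_k of Equation (15) -- a minimal sketch.
    # g_hat, sigma_hat: estimates of g(x) and sigma(x) at the point.
    base = max(1, math.floor(math.sqrt(k)))          # [sqrt(k)], Equation (13)
    if abs(g_hat) < 1e-12:                           # guard against a near-zero signal
        return base
    snr = math.ceil(9.0 * (sigma_hat / g_hat) ** 2)  # enforces g / (sigma/sqrt(N)) >= 3
    return max(base, snr)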

5 Numerical study

In this section, we conduct a numerical study to evaluate the performance of SNM. In particular, we compare the performance of SNM with three widely-used algorithms, Simultaneous Perturbation Stochastic Approximation (SPSA) (Spall, 2003), the Modified Nelder-Mead (MNM) method (Barton and Ivey, 1996), and Pattern Search (PS), on 96 scenarios constituted by 8 types of test functions, 3 types of dimensionality, 2 types of variance settings, and 2 types of initial solutions. For SPSA and PS, we use the existing implementations given at http://www.jhuapl.edu/SPSA/Pages/MATLAB.htm and the patternsearch function in MATLAB (version 7), respectively.

5.1 Description of test problems

The test problem is composed of a deterministic function with added noise, i.e., G(x) = g(x) + ϵ_x. For each test problem, we consider three types of dimensionality: 4D, 12D and 16D, which correspond to low-, moderate-, and high-dimensional problems respectively. The added noise ϵ_x is generated from N(0, σ^2(x)) with two types of variances: fixed (σ(x) = 10) and varying (σ(x) = 0.5 · g(x)).

Note that in the latter setting, when the initial solution is selected far from the true optima, the function value g(x) will be large, making the response variable grossly noisy. This allows us to test the stability of the algorithms. To evaluate the effect of initial solutions, the initial solution is generated by: (1) using a fixed initial solution, 10 · 1_p, where 1_p = [1, 1, ..., 1]^T and p is the dimensionality of the problem; and (2) randomly selecting an initial solution from the parameter space.

Eight types of test functions are used. The first type is the absolute value function (AV):

g(x) = Σ_{i=1}^{p} |x_i − 1|.   (16)

This function does not have a gradient at points where one of the x_i is equal to 1. We use this function to examine the performance of SNM when handling nonsmooth response functions. The second to eighth test functions are selected from the nonlinear optimization literature (More et al., 1981) and are considered difficult even in deterministic settings. Specifically, the second type is the extended Rosenbrock function (ER),

g(x) = Σ_{i=1}^{p−1} [100(x_i − x_{i+1}^2)^2 + (1 − x_i)^2],   (17)

which is a multimodal function when p ≥ 4 (Shang and Qiu, 2006). The third type is the Freudenstein and Roth function (FR),

g(x) = Σ_{i=1}^{p/2} [−13 + x_{2i−1} + ((5 − x_{2i})x_{2i} − 2)x_{2i}]^2 + [−29 + x_{2i−1} + ((x_{2i} + 1)x_{2i} − 14)x_{2i}]^2,   (18)

which is also a multimodal function (More et al., 1981). The fourth type is the Powell badly scaled function (PBS),

g(x) = Σ_{i=1}^{p−1} [10^4 x_i x_{i+1} − 1]^2 + [exp(−x_i) + exp(−x_{i+1}) − 1.0001]^2.   (19)

The fifth type is the Beale function (BLE),

g(x) = Σ_{i=1}^{p/2} [1.5 − x_{2i−1}(1 − x_{2i})]^2 + [2.25 − x_{2i−1}(1 − x_{2i}^2)]^2 + [2.625 − x_{2i−1}(1 − x_{2i}^3)]^2.   (20)

The sixth type is the Powell singular function (PS),

g(x) = Σ_{i=1}^{p/4} (x_i + 10x_{i+1})^2 + (√5 (x_{i+2} − x_{i+3}))^2 + (x_{i+1} − 2x_{i+2})^4 + (√10 (x_i − x_{i+3})^2)^2.   (21)

The seventh type is the Wood function (WD),

g(x) = Σ_{i=1}^{p/4} (10(x_{i+1} − x_i^2))^2 + (1 − x_i)^2 + (√90 (x_{i+3} − x_{i+2}^2))^2 + (1 − x_{i+2})^2   (22)
       + (√10 (x_{i+1} + x_{i+3} − 2))^2 + (1/√10)(x_{i+1} − x_{i+3})^2,   (23)

and the eighth type is the Trigonometric function (TR),

g(x) = Σ_{i=1}^{p} { p − Σ_{j=1}^{p} cos x_j + i(1 − cos x_i) − sin x_i }^2.   (24)
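As an illustration, the following sketch implements two of these noisy test responses (AV and ER) as they would be fed to the algorithms, with the fixed- and varying-variance settings described above; the ER form follows Equation (17) as reconstructed here, and the helper names are illustrative.

import numpy as np

rng = np.random.default_rng(1)

def av(x):
    # Absolute value function, Equation (16).
    return np.sum(np.abs(np.asarray(x) - 1.0))

def er(x):
    # Extended Rosenbrock function, Equation (17).
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[:-1] - x[1:] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def noisy_fixed(g, x, sd=10.0):
    # Fixed-variance setting: G(x) = g(x) + eps_x with sigma(x) = 10.
    return g(x) + rng.normal(0.0, sd)

def noisy_varying(g, x, factor=0.5):
    # Varying-variance setting: sigma(x) = 0.5 * g(x) (absolute value used as a guard).
    return g(x) + rng.normal(0.0, factor * abs(g(x)))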

5.2 Performance measure

Two performance measures are used to evaluate the algorithm performance. When the initial solution is fixed, the quantity

log(g(x_k^*) − g* + 1)   (25)

is used as the performance measure, where x_k^* and g* correspond to the best solution found when the algorithm is terminated and the true globally optimal function value, respectively. Equation (25) reflects the gap in function value between the final solution and the true optima and is an absolute performance measure. On the other hand, when the initial solution is randomly generated, a relative performance measure,

(g(x_k^*) − g*) / (g(x_0) − g*),   (26)

is used. This quantity measures the percentage that the algorithm has not achieved relative to the initial total "distance." When it is close to zero, the algorithm is close to the optima; on the other hand, when it is close to one or even greater than one, the algorithm diverges. Note that when the initial solution is randomly selected, a relative performance measure makes more sense than an absolute one, because the algorithm may obtain a good absolute performance simply because of good "luck," i.e., an initial solution selected close to the true optima.

For each algorithm and each scenario, 30 macroreplications are run. For each macroreplication, we stop the algorithm when the maximum of 50,000 function evaluations is reached and report the performance measure as well as the associated standard deviation (given in parentheses). In the case where at least one macroreplication fails to converge, we report the ratio of successful macroreplications. For SNM, we also report the total number of points generated and the number of times that ARS is performed (the second and third numbers, respectively). The fitness function used in ARS is 1/g(x); the sample size scheme is N_k = [√k]; and Ps is set at 0.4.
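The two performance measures translate directly into code; a small sketch (function names are illustrative):

import math

def absolute_measure(g_best, g_star):
    # Equation (25): log(g(x*_k) - g* + 1), used with a fixed initial solution.
    return math.log(g_best - g_star + 1.0)

def relative_measure(g_best, g_init, g_star):
    # Equation (26): (g(x*_k) - g*) / (g(x_0) - g*), used with a random initial solution.
    return (g_best - g_star) / (g_init - g_star)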

5.3 Analysis

5.3.1 Evaluating the effectiveness of LHS and SNS

We first evaluate the computational gains obtained from the two computational enhancement procedures. We apply three algorithms, SNM, SNM+LHS, and SNM+SNS, to the test problems with two variance settings. Here SNM, SNM+LHS, and SNM+SNS correspond to the original SNM algorithm, the SNM algorithm with the improved generation of the initial simplex, and the SNM algorithm with the signal-to-noise sample size scheme, respectively. Since the results for all test functions exhibit similar behavior, we only show those for the absolute value function.

Figure 2: Evaluating the effectiveness of LHS and SNS (two panels: S.D.=10 and S.D.=0.5*g(x); Y-axis: g(x); X-axis: number of observations; curves: SNM, SNM+SNS, SNM+LHS).

In Figure 2, the gaps between the trajectories of SNM and SNM+LHS, and of SNM and SNM+SNS, represent the computational gains obtained by employing LHS and SNS, respectively. It is clear that both LHS and SNS can significantly enhance the computational efficiency of SNM. In particular, when the response variable is grossly noisy, i.e., σ(x) = 0.5 · g(x), the computational gains are even larger.


5.3.2 Evaluating the impact of non-normal response variable on the SNM performance

To evaluate the impact of different response variable distributions on the algorithm performance, we let ϵ_x ∼ Uniform(−10, 10) and run the algorithm on the extended Rosenbrock function with a fixed initial point. The algorithm performance is shown in Figure 3. It is found that, regardless of the variance setting, SNM is superior to the competing algorithms. This can be attributed to the fact that SNM estimates the function value based on the sample average, which is approximately normally distributed regardless of the original distribution of the response variable, due to the central limit theorem.

(Two panels: S.D.=10 and S.D.=0.5*g(x); Y-axis: log(g(x)+1); X-axis: number of observations; curves: SNM, SPSA, MNM.)

Figure 3: Evaluating the impact of non-normal response variable distributions on the SNM performance

5.3.3 Evaluating the impact of Ps on the SNM performance

Next, we evaluate the impact of the Ps used in ARS on the performance of SNM. Figure 4 shows the performance of SNM on the extended Rosenbrock function with a fixed initial solution and fixed variance. It can be easily seen that at early stages different selections of Ps can result in distinct behaviors. As the algorithm approaches convergence, however, the behaviors are very similar.

Figure 4: Performance of SNM under different Ps (Y-axis: log(g(x)+1); X-axis: number of observations; curves: Ps=0.2, Ps=0.4, Ps=0.6).

5.3.4 Evaluating the effectiveness of the sample size scheme

To evaluate the effectiveness of the developed sample size scheme, we compare SNM and the standard NM algorithm using the extended Rosenbrock function, under fixed and varying variance settings. The standard NM algorithm is run and restarted whenever it gets stuck, until the total computational budget is exhausted. From Figure 5, we can see that SNM slightly outperforms NM when the variance is fixed, while the performance of SNM is significantly better than that of NM when the variance is varying, i.e., when the noise level is high. This shows that the developed sample size scheme is important for the algorithm to achieve convergence, especially when the noise level is high.

Figure 5: Evaluating the effectiveness of the sample size scheme (two panels: S.D.=10 and S.D.=0.5*g(x); Y-axis: log(g(x)+1); X-axis: number of observations; curves: SNM, NM).

5.3.5 Comparing the performance of SNM, SPSA, MNM, and PS under 96 Scenarios

Tables 1, 2, 3 and 4 show the numerical performance of SNM and the other competing algorithms under the 96 scenarios. Overall, SNM has more robust and better performance than the other three competing algorithms. In particular, Table 1 provides the results when the initial solution is fixed and the variance is fixed. Clearly, SNM has comparable performance with MNM and PS but much better performance than SPSA, which only converges on the absolute value function, the Powell singular function and the Trigonometric function. Table 2 gives the results when the initial solution is fixed and the variance is varying. In this setting, SNM delivers more satisfactory performance than SPSA, MNM and PS. Furthermore, by comparing Tables 1 and 2, it can be easily seen that the performance of SPSA, MNM and PS becomes worse as the response variable becomes noisier. This is reasonable, because SPSA is known to be sensitive to the variance level of the response variable, and MNM, as we pointed out earlier, can be seriously affected when the response variable is extremely noisy due to the lack of a proper sample size scheme. PS also suffers from noise in the response variable. Tables 3 and 4 also indicate a clear preference for SNM over the other three algorithms.

6 Conclusion

In this paper, we proposed a new direct search method, SNM, for continuous simulation optimization. SNM does not require gradient estimation, and therefore it can handle problems where the response function is nonsmooth or the gradient does not exist. This is complementary to the existing gradient-based approaches. We prove that SNM can converge to the global optima with probability one. An extensive numerical study shows that SNM can outperform the existing algorithms SPSA, MNM and PS. The limitation of SNM, however, is that its performance can deteriorate as the problem dimensionality grows. This is attributable to the inefficiency of the ARS procedure when dealing with a large parameter space. More work should be done to address this problem. Future research should also include extending the developed SNM framework to handle problems with complex constraints, e.g., nonlinear and/or chance constraints.

Acknowledgments

The author would like to thank two anonymous referees for their insightful comments and suggestions that have significantly improved this paper. This research is supported in part by the Advanced Manufacturing and Service Management Center at National Tsing Hua University and National Science Council in Taiwan (NSC99-2221-E-007-038-MY1).


Table 1: Performance comparison when the noise level is fixed and the initial solution is fixed. The SNM entries give the absolute performance measure with the associated standard deviation (in parentheses), the total number of points generated, and the number of times that ARS is performed. For the other algorithms, only the absolute performance measure and the associated standard deviation are reported; if at least one macroreplication fails to converge, the ratio of successful macroreplications is reported instead. Within each test-function block, the rows are SNM, SPSA, MNM, and PS (Pattern Search), and the three entries per row correspond to 4D | 12D | 16D.

AV
  SNM:  0.58(0.18), 35, 15 | 1.065(0.31), 257, 51 | 1.15(0.35), 492, 62
  SPSA: 0.11(0.057) | 0.24(0.10) | 0.33(0.11)
  MNM:  0.29(0.32) | 0.76(0.60) | 1.0089(0.57)
  PS:   0.84(0.26) | 1.27(0.13) | 1.31(0.10)
ER
  SNM:  0.72(0.35), 51, 16 | 1.45(0.15), 253, 55 | 1.58(0.14), 404, 61
  SPSA: 0% | 0% | 0%
  MNM:  2.30(0.71) | 2.18(0.47) | 2.43(0.40)
  PS:   0.89(0.23) | 1.14(0.22) | 1.24(0.16)
FR
  SNM:  2.076(0.43), 52, 46 | 2.99(0.24), 391, 97 | 3.12(0.16), 817, 106
  SPSA: 0% | 0% | 0%
  MNM:  2.041(0.17) | 2.82(0.12) | 2.88(0.17)
  PS:   2.11(0.12) | 2.54(0.11) | 2.63(0.10)
PBS
  SNM:  0.46(0.084), 15, 11 | 0.64(0.17), 381, 14 | 0.73(0.18), 743, 10
  SPSA: 0% | 0% | 0%
  MNM:  0.68(0.24) | 1.16(0.13) | 1.21(0.10)
  PS:   0.47(0.06) | 0.84(0.03) | 0.97(0.03)
BLE
  SNM:  1.00(0.29), 27, 30 | 1.57(0.23), 240, 56 | 1.61(0.24), 353, 67
  SPSA: 0% | 0% | 0%
  MNM:  0.88(0.23) | 1.64(0.50) | 1.97(0.47)
  PS:   1.39(0.13) | 1.88(0.06) | 2.03(0.04)
PS
  SNM:  0.30(0.47), 43, 12 | 1.29(0.63), 382, 57 | 1.37(0.59), 1018, 41
  SPSA: 0.11(0.061) | 0.28(0.10) | 0.40(0.10)
  MNM:  1.084(0.45) | 1.76(0.50) | 1.90(0.54)
  PS:   1.32(0.14) | 1.75(0.16) | 1.86(0.21)
WD
  SNM:  0.98(0.54), 67, 25 | 1.75(0.55), 627, 63 | 2.13(0.81), 1052, 34
  SPSA: 0% | 0% | 0%
  MNM:  0.92(0.79) | 1.96(0.71) | 2.064(0.75)
  PS:   1.95(0.25) | 2.48(0.18) | 2.55(0.19)
TR
  SNM:  0.33(0.28), 18, 20 | 1.96(0.70), 224, 121 | 1.99(0.50), 427, 161
  SPSA: 0.031(0.026) | 0.044(0.034) | 0.050(0.033)
  MNM:  0.50(0.53) | 0.56(0.46) | 1.097(0.70)
  PS:   0.45(0.38) | 0.83(0.25) | 1.15(0.17)

Table 2: Performance comparison when the noise level is varying and the initial solution is fixed. The SNM entries give the absolute performance measure with the associated standard deviation (in parentheses), the total number of points generated, and the number of times that ARS is performed. For the other algorithms, only the absolute performance measure and the associated standard deviation are reported; if at least one macroreplication fails to converge, the ratio of successful macroreplications is reported instead. Within each test-function block, the rows are SNM, SPSA, MNM, and PS (Pattern Search), and the three entries per row correspond to 4D | 12D | 16D.

AV
  SNM:  0.31(0.19), 44, 44 | 1.35(0.83), 162, 85 | 1.58(0.85), 221, 97
  SPSA: 6.25E-06(3.69E-06) | 0.25(0.65) | 40%
  MNM:  70% | 55% | 40%
  PS:   60% | 55% | 65%
ER
  SNM:  0.71(0.33), 46, 34 | 1.90(0.81), 228, 57 | 2.63(1.037), 321, 77
  SPSA: 15% | 0% | 0%
  MNM:  2.012(1.49) | 95% | 85%
  PS:   70% | 60% | 45%
FR
  SNM:  2.067(0.41), 31, 46 | 3.20(0.17), 95, 79 | 3.49(0.12), 146, 112
  SPSA: 0% | 0% | 0%
  MNM:  2.95(0.33) | 25% | 55%
  PS:   65% | 65% | 65%
PBS
  SNM:  0.089(0.16), 58, 22 | 0.41(0.23), 337, 13 | 0.45(0.28), 723, 18
  SPSA: 0% | 0% | 0%
  MNM:  90% | 35% | 55%
  PS:   65% | 55% | 55%
BLE
  SNM:  0.72(0.52), 40, 35 | 2.0063(0.71), 166, 65 | 2.060(0.62), 288, 75
  SPSA: 0% | 0% | 0%
  MNM:  2.97(1.41) | 30% | 60%
  PS:   85% | 55% | 55%
PS
  SNM:  0.14(0.48), 915, 16 | 3.068(0.41), 119, 85 | 3.27(0.29), 175, 97
  SPSA: 0.0058(0.0034) | 10% | 10%
  MNM:  4.19(0.29) | 65% | 60%
  PS:   75% | 60% | 55%
WD
  SNM:  1.15(0.86), 54, 36 | 3.63(0.92), 161, 60 | 4.19(0.27), 197, 78
  SPSA: 95% | 0% | 0%
  MNM:  3.47(0.75) | 5.22(1.10) | 6.51(0.37)
  PS:   50% | 70% | 65%
TR
  SNM:  0.025(0.025), 45, 53 | 2.22(0.45), 72, 116 | 2.76(0.56), 88, 126
  SPSA: 0.0089(2.26E-05) | 0.0039(0.00013) | 2.42(1.93)
  MNM:  0.66(0.76) | 2.12(1.25) | 90%
  PS:   90% | 95% | 4.38(0.09)

Table 3: Performance comparison when the noise level is fixed and the initial solution is random. The SNM entries give the relative performance measure with the associated standard deviation (in parentheses), the total number of points generated, and the number of times that ARS is performed. For the other algorithms, only the performance measure and the associated standard deviation are reported; if at least one macroreplication fails to converge, the ratio of successful macroreplications is reported instead. Within each test-function block, the rows are SNM, SPSA, MNM, and PS (Pattern Search), and the three entries per row correspond to 4D | 12D | 16D.

AV
  SNM:  3.2E-03(2.7E-03), 35, 12 | 9.3E-03(9.7E-03), 279, 47 | 9.1E-03(6.0E-03), 542, 73
  SPSA: 9.1E-04(1.3E-03) | 6.4E-04(4.7E-04) | 5.6E-04(2.3E-04)
  MNM:  5.0E-03(8.5E-03) | 3.1E-02(3.4E-02) | 4.8E-02(6.1E-02)
  PS:   0.01(0.01) | 0.02(0.01) | 0.02(0.01)
ER
  SNM:  6.07E-07(6.72E-07), 41, 17 | 6.63E-07(2.29E-07), 260, 55 | 1.02E-06(4.46E-07), 436, 62
  SPSA: 0% | 0% | 0%
  MNM:  1.55E-06(1.88E-06) | 3.36E-06(2.95E-06) | 2.38E-06(2.81E-06)
  PS:   1.41E-06(9.06E-07) | 1.58E-06(7.24E-07) | 1.60E-06(4.14E-07)
FR
  SNM:  1.58E-06(1.13E-06), 52, 50 | 6.34E-06(4.48E-06), 501, 97 | 6.83E-06(4.90E-06), 756, 106
  SPSA: 0% | 0% | 0%
  MNM:  3.91E-06(4.72E-06) | 7.49E-06(6.99E-06) | 6.27E-06(3.15E-06)
  PS:   1.25E-06(8.10E-07) | 1.22E-06(4.34E-07) | 1.28E-06(4.48E-07)
PBS
  SNM:  2.71E-13(1.55E-13), 17, 11 | 1.61E-13(8.46E-14), 343, 11 | 1.44E-13(5.76E-14), 815, 6
  SPSA: 10% | 0% | 0%
  MNM:  2.36E-12(3.66E-12) | 1.27E-12(3.60E-12) | 6.08E-12(1.21E-11)
  PS:   5.85E-15(1.54E-14) | 1.17E-16(7.94E-17) | 1.16E-16(4.67E-17)
BLE
  SNM:  3.69E-09(6.05E-09), 32, 26 | 1.72E-09(1.05E-09), 233, 61 | 1.73E-09(8.40E-10), 427, 66
  SPSA: 5% | 0% | 0%
  MNM:  6.91E-08(2.60E-07) | 1.24E-08(1.65E-08) | 7.34E-08(1.42E-07)
  PS:   1.06E-09(8.47E-10) | 1.41E-09(1.10E-09) | 1.39E-09(7.30E-10)
PS
  SNM:  3.3E-04(9.8E-04), 57, 14 | 2.9E-04(4.8E-04), 557, 41 | 9.6E-04(1.3E-03), 1086, 40
  SPSA: 3.15E-05(2.40E-05) | 1.92E-05(1.41E-05) | 2.41E-05(1.75E-05)
  MNM:  5.5E-03(9.5E-03) | 1.2E-03(3.9E-03) | 1.9E-03(5.4E-03)
  PS:   1.02E-03(8.56E-04) | 0.00068(0.00039) | 0.00049(0.00018)
WD
  SNM:  8.31E-07(9.12E-07), 54, 19 | 1.14E-05(4.02E-05), 745, 51 | 1.15E-05(1.82E-05), 1248, 29
  SPSA: 5% | 0% | 0%
  MNM:  9.69E-06(4.34E-05) | 6.48E-06(1.32E-05) | 2.47E-05(5.07E-05)
  PS:   1.08E-05(1.11E-05) | 1.35E-05(6.82E-06) | 1.58E-05(4.92E-06)
TR
  SNM:  3.8E-03(4.3E-03), 19, 20 | 1.8E-02(2.0E-01), 203, 122 | 1.4E-02(1.1E-02), 732, 143
  SPSA: 9.52E-04(1.59E-03) | 2.29E-05(1.47E-05) | 1.46E-05(1.12E-05)
  MNM:  1.2E-02(1.5E-02) | 8.9E-02(9.8E-02) | 5.2E-02(9.1E-02)
  PS:   0.015(0.015) | 0.0019(0.00093) | 0.0012(0.00071)

Table 4: Performance comparison when the noise level is varying and the initial solution is random. The SNM entries give the relative performance measure with the associated standard deviation (in parentheses), the total number of points generated, and the number of times that ARS is performed. For the other algorithms, only the performance measure and the associated standard deviation are reported; if at least one macroreplication fails to converge, the ratio of successful macroreplications is reported instead. Within each test-function block, the rows are SNM, SPSA, MNM, and PS (Pattern Search), and the three entries per row correspond to 4D | 12D | 16D.

AV
  SNM:  4.8E-03(1.0E-02), 42, 45 | 5.8E-02(6.6E-02), 152, 91 | 6.4E-02(7.0E-02), 204, 109
  SPSA: 7.32E-08(1.59E-07) | 3.6E-02(1.5E-02) | 95%
  MNM:  90% | 75% | 50%
  PS:   95% | 90% | 75%
ER
  SNM:  6.58E-07(6.76E-07), 49, 32 | 1.04E-04(2.63E-04), 211, 59 | 8.72E-05(2.30E-04), 337, 62
  SPSA: 15% | 0% | 0%
  MNM:  95% | 95% | 80%
  PS:   90% | 80% | 75%
FR
  SNM:  3.01E-06(1.97E-06), 27, 47 | 1.28E-05(5.02E-06), 92, 84 | 1.45E-05(5.61E-06), 125, 102
  SPSA: 0% | 0% | 0%
  MNM:  2.9E-04(6.7E-04) | 80% | 80%
  PS:   90% | 90% | 95%
PBS
  SNM:  6.54E-14(1.50E-13), 44, 20 | 9.80E-14(8.53E-14), 334, 21 | 8.39E-14(5.38E-14), 710, 7
  SPSA: 0% | 0% | 0%
  MNM:  90% | 95% | 9.3E-03(2.3E-02)
  PS:   95% | 90% | 90%
BLE
  SNM:  1.53E-09(2.22E-09), 37, 41 | 7.99E-09(1.10E-08), 167, 69 | 1.27E-08(1.93E-08), 316, 77
  SPSA: 0% | 0% | 0%
  MNM:  95% | 80% | 75%
  PS:   90% | 75% | 90%
PS
  SNM:  5.9E-04(1.4E-03), 744, 22 | 2.4E-02(1.1E-02), 110, 76 | 3.1E-02(2.1E-02), 169, 102
  SPSA: 1.80E-06(1.92E-06) | 15% | 15%
  MNM:  60% | 70% | 70%
  PS:   0.28(0.27) | 80% | 95%
WD
  SNM:  4.01E-05(7.40E-05), 53, 40 | 3.44E-04(3.11E-04), 146, 64 | 5.17E-04(2.51E-04), 194, 170
  SPSA: 25% | 0% | 0%
  MNM:  90% | 90% | 90%
  PS:   1.32E-01(1.81E-01) | 75% | 90%
TR
  SNM:  2.3E-04(4.7E-04), 91, 51 | 2.8E-02(3.1E-02), 87, 112 | 5.5E-02(4.0E-02), 87, 125
  SPSA: 2.5E-03(4.2E-04) | 2.19E-06(1.10E-06) | 80%
  MNM:  0.018(0.044) | 95% | 70%
  PS:   85% | 65% | 80%

References

Anderson, E.J., M.C. Ferris. 2001. A direct search algorithm for optimization with noisy function evaluations. SIAM Journal on Optimization 11 837-857.

Andradóttir, S. 1995. Stochastic approximation algorithm with varying bounds. Operations Research 43(6) 1037-1048.

Andrieu, L., G. Cohen, F.J. Vázquez-Abad. 2011. Gradient-based simulation optimization under probability constraints. European Journal of Operational Research 212 345-351.

Angün, E., J. Kleijnen. 2010. An asymptotic test of optimality conditions in multiresponse simulation-based optimization. INFORMS Journal on Computing.

Banks, J., ed. 1998. Handbook of Simulation. John Wiley and Sons, New York.

Barton, R.R., J.S. Ivey. 1996. Nelder-Mead simplex modifications for simulation optimization. Management Science 42(7) 954-973.

Benveniste, A., M. Metivier, P. Priouret. 1990. Adaptive Algorithms and Stochastic Approximations. Springer-Verlag, New York.

Bhatnagar, S., N. Hemachandra, V.K. Mishra. 2011. Stochastic approximation algorithms for constrained optimization via simulation. ACM Transactions on Modeling and Computer Simulation 21(3) 15:1-15:22.

Billingsley, P. 1995. Probability and Measure. John Wiley and Sons, New York.

Blum, J.R. 1954. Approximation methods which converge with probability one. Annals of Mathematical Statistics 25 382-386.

Chang, K.-H., L.J. Hong, H. Wan. 2009. Stochastic trust region gradient-free method (STRONG): a new response-surface-based algorithm in simulation optimization. Working paper.

Deng, G., M.C. Ferris. 2009. Variable-number sample-path optimization. Mathematical Programming 117 81-109.

Fabian, V. 1971. Stochastic approximation. Optimizing Methods in Statistics 439-470.

Fang, K.-T., R. Li, A. Sudjianto. 2006. Design and Modeling for Computer Experiments. Chapman and Hall/CRC.

Fletcher, R. 1987. Practical Methods of Optimization. 2nd ed. John Wiley and Sons, New York.

Fu, M.C. 2002. Optimization for simulation: theory vs. practice. INFORMS Journal on Computing 14 192-227.

Fu, M.C. 2006. Gradient estimation. S.G. Henderson, B.L. Nelson, eds., Chapter 19 in Handbooks in Operations Research and Management Science, vol. 13: Simulation. Elsevier, Amsterdam, 575-616.

Homem-De-Mello, T. 2003. Variable-sample methods for stochastic optimization. ACM Transactions on Modeling and Computer Simulation 13(2) 108-133.

Hooke, R., T.A. Jeeves. 1961. Direct search solution of numerical and statistical problems. Journal of the ACM 8 212-229.

Humphrey, D.G., J.R. Wilson. 2000. A revised simplex search procedure for stochastic simulation response surface optimization. INFORMS Journal on Computing 12(4) 272-283.

Khuri, A.I., J.A. Cornell. 1996. Response Surfaces: Designs and Analyses. Marcel Dekker, New York.

Kiefer, J., J. Wolfowitz. 1952. Stochastic estimation of the maximum of a regression function. Annals of Mathematical Statistics 23 462-466.

Kolda, T.G., R.M. Lewis, V. Torczon. 2003. Optimization by direct search: new perspectives on some classical and modern methods. SIAM Review 45 385-482.

Kushner, H.J., G.G. Yin. 1997. Stochastic Approximation Algorithms and Applications. Springer-Verlag, New York.

Montgomery, D.C. 2005. Design and Analysis of Experiments. 6th ed. John Wiley and Sons, New York.

More, J.J., B.S. Garbow, K.E. Hillstrom. 1981. Testing unconstrained optimization software. ACM Transactions on Mathematical Software 7 17-41.

Myers, R.H., D.C. Montgomery, C.M. Anderson-Cook. 2009. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. John Wiley and Sons, New York.

Nelder, J.A., R. Mead. 1965. A simplex method for function minimization. The Computer Journal 7 308-313.

Price, C.J., I.D. Coope, D. Byatt. 2002. A convergent variant of the Nelder-Mead algorithm. Journal of Optimization Theory and Applications 113(1) 5-19.

Shang, Y.-W., Y.-H. Qiu. 2006. A note on the extended Rosenbrock function. Evolutionary Computation 14(1) 119-126.

Spall, J.C. 2003. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. John Wiley and Sons, New York.

Spendley, W., G.R. Hext, F.R. Himsworth. 1962. Sequential application of simplex designs in optimization and evolutionary operation. Technometrics 4 441-461.

Swann, W.H. 1972. Direct search methods. W. Murray, ed., Numerical Methods for Unconstrained Optimization. London.

Tekin, E., I. Sabuncuoglu. 2004. Simulation optimization: a comprehensive review on theory and applications. IIE Transactions 36 1067-1081.

Wang, I.-J., J.C. Spall. 2008. Stochastic optimization with inequality constraints using simultaneous perturbations and penalty functions. International Journal of Control 81(8) 1232-1238.

A Appendix

A.1 The SNM Algorithm

In the beginning, SNM generates p + 1 points (solutions) and stores them in the set of simplex points X_0. After evaluating and comparing the function values of the p + 1 points, three points are identified: x_k^max, x_k^2max, and x_k^min. The reflection point x_k^ref is generated by reflecting x_k^max through the centroid point x_k^cent in order to move the simplex in a direction away from the highest values of the objective function and toward the lowest value of the objective function. Comparing the estimate of the function value of x_k^ref with those of x_k^max, x_k^2max, and x_k^min can then result in three possibilities. If x_k^ref has a lower function value than x_k^2max but a higher one than x_k^min, it is accepted. On the other hand, if x_k^ref has an extremely low function value, even lower than x_k^min, an expansion point x_k^exp is generated further along the same direction in an attempt to explore a region that potentially contains even better solutions; if x_k^exp has a lower function value than x_k^ref, x_k^exp is accepted, otherwise x_k^ref is accepted. Lastly, if x_k^ref has a higher function value than x_k^2max, the algorithm performs a contraction, either an outside or an inside contraction (depending on how poor x_k^ref is), in an attempt to find a better solution to replace x_k^ref. The Adaptive Random Search procedure (ARS) presented in Section 3.4 is performed if all the previous steps fail. SNM uses x_k^min as the estimator of the optimal solution. It is remarkable that SNM maintains the cardinality of the set of simplex points at p + 1, because in each iteration (except for the initialization) one point is accepted and the worst point in the set of simplex points is removed. A code sketch of the candidate-generation step is given after Figure 6.

Step 0. Generate an initial set of p + 1 extreme points in R^p to form an initial simplex. Let X_0 store all simplex points.

Step 1. For any iteration k, calculate ĝ_k for all points in X_k and remove the point with the highest ĝ_k if the cardinality of X_k is more than p + 1.

Step 2. Find x_k^cent, the centroid of all vertices other than x_k^max. Rank all points and identify x_k^max, x_k^2max, and x_k^min according to ĝ_k. Generate a new point x_k^ref by reflecting x_k^max through x_k^cent according to x_k^ref = (1 + α)x_k^cent − αx_k^max (α > 0). Calculate ĝ_k(x_k^ref) = S_k(x_k^ref)/N_k.

Step 3a. If ĝ_k(x_k^min) ≤ ĝ_k(x_k^ref) < ĝ_k(x_k^2max), then let X_k = X_{k−1} ∪ {x_k^ref}.

Step 3b. If ĝ_k(x_k^ref) < ĝ_k(x_k^min), then generate an expansion point x_k^exp by extending further along the reflection direction. If ĝ_k(x_k^exp) < ĝ_k(x_k^ref), then let X_k = X_{k−1} ∪ {x_k^exp}. Otherwise, let X_k = X_{k−1} ∪ {x_k^ref}.

Step 3c. If ĝ_k(x_k^ref) ≥ ĝ_k(x_k^2max), then the simplex contracts. If (i) ĝ_k(x_k^2max) ≤ ĝ_k(x_k^ref) < ĝ_k(x_k^max), the contraction point is determined by x_k^cont = βx_k^ref + (1 − β)x_k^cent, 0 ≤ β ≤ 1 (outside contraction). If (ii) ĝ_k(x_k^ref) ≥ ĝ_k(x_k^max), the contraction point is determined by x_k^cont = βx_k^max + (1 − β)x_k^cent (inside contraction). In case (i), if ĝ_k(x_k^cont) ≤ ĝ_k(x_k^ref), the contraction is accepted. In case (ii), if ĝ_k(x_k^cont) ≤ ĝ_k(x_k^max), the contraction is accepted. If the contraction is accepted, let X_k = X_{k−1} ∪ {x_k^cont}. Otherwise, perform Adaptive Random Search (Section 3.4) to find an appropriate x^ARS and let X_k = X_{k−1} ∪ {x^ARS}.

Step 4. If a prespecified convergence criterion is met or if the maximum number of function evaluations has been reached, stop the algorithm and return the estimated optimal solution x_k^min; otherwise, let k = k + 1 and return to Step 1.

Figure 6: The full SNM Algorithm
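The candidate-generation geometry of Steps 2 and 3 in Figure 6 can be sketched in Python as follows; which candidate is accepted, and whether ARS is invoked instead, follows the tests in Figure 6 and is not repeated here (α and β are the reflection and contraction coefficients, and the function name is illustrative):

import numpy as np

def nm_candidates(points, g_hat, alpha=1.0, beta=0.5):
    # Reflection and contraction points of Steps 2-3c in Figure 6 (a sketch).
    pts = sorted(points, key=lambda x: g_hat[tuple(x)])            # best ... worst
    x_max = np.asarray(pts[-1], dtype=float)
    centroid = np.mean(np.asarray(pts[:-1], dtype=float), axis=0)  # centroid of all but x_max
    x_ref = (1.0 + alpha) * centroid - alpha * x_max               # reflection (Step 2)
    x_out = beta * x_ref + (1.0 - beta) * centroid                 # outside contraction (Step 3c, case i)
    x_in = beta * x_max + (1.0 - beta) * centroid                  # inside contraction (Step 3c, case ii)
    return x_ref, x_out, x_in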


A.2 Generation of LHS

Suppose the parameter space is of p + 1 dimensions, and n divisions are made for each dimension. Further suppose that each dimension is constrained to (a_j, b_j), j = 1, ..., p + 1. The algorithm for generating an initial simplex based on LHS is described as follows (Fang et al., 2006); a code sketch follows the steps.

Step 1. Take p + 1 permutations π_j(1), ..., π_j(n) of the integers 1, ..., n, for j = 1, ..., p + 1.

Step 2. Independently generate n(p + 1) uniform variates U_{kj} ∼ U(0, 1), k = 1, ..., n, j = 1, ..., p + 1. Let r_k^j = (π_j(k) − U_{kj})/n, k = 1, ..., n, j = 1, ..., p + 1.

Step 3. Let x_k^j = a_j + (b_j − a_j) r_k^j. Then the points x_k = (x_k^1, ..., x_k^{p+1}), k = 1, ..., n, form the initial simplex.
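A Python sketch of this LHS construction (written here for n points in d dimensions; for the initial simplex one would take n = p + 1 points over the design dimensions, and the function name is illustrative):

import numpy as np

rng = np.random.default_rng(2)

def lhs_points(bounds, n):
    # Latin hypercube sample of n points: one point per stratum in each dimension.
    # bounds: list of (a_j, b_j) pairs, one per dimension.
    d = len(bounds)
    pts = np.empty((n, d))
    for j, (a, b) in enumerate(bounds):
        perm = rng.permutation(n) + 1   # pi_j(1), ..., pi_j(n)
        u = rng.uniform(size=n)         # U_kj ~ U(0, 1)
        r = (perm - u) / n              # r_k^j = (pi_j(k) - U_kj) / n
        pts[:, j] = a + (b - a) * r
    return pts

# Example: lhs_points([(0.0, 1.0)] * 4, n=5) gives 5 stratified points in [0, 1]^4.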

