IEICE TRANS. INF. & SYST., VOL.E87–D, NO.11 NOVEMBER 2004


LETTER

n-Dimensional Cauchy Neighbor Generation for the Fast Simulated Annealing

Dongkyung NAM†a), Member, Jong-Seok LEE†, and Cheol Hoon PARK†, Nonmembers

SUMMARY   Many simulated annealing algorithms use Cauchy neighbors for fast convergence, and the conventional method approximates them by the product of n one-dimensional Cauchy distributions. However, this approximation severely slows down the search as the dimension gets high because of the dimension-wise neighbor generation. In this paper, we analyze the orthogonal neighbor characteristics of the conventional method and propose a method of generating symmetric neighbors from the exact n-dimensional Cauchy distribution. The simulation results show that the proposed method is very effective for the search in simulated annealing and can be applied to many other stochastic optimization algorithms.
key words: fast simulated annealing, Cauchy neighbor generation

1. Introduction

Simulated annealing (SA) is one of the powerful and robust stochastic search algorithms for various optimization problems having both discrete and continuous search spaces [1]. Basically, the SA is composed of three operations: neighbor generation with a probability density function g, state transition with an acceptance probability h, and temperature cooling with an annealing schedule T. The canonical SA is based on the Monte Carlo importance sampling [2], [3]. Figure 1 shows simple pseudo-code of the SA. In continuous state optimization, the SA uses the following Boltzmann annealing scheme [4]:

g(∆x, T, k) = (2πT(k))^{−n/2} exp[−∆x²/(2T(k))],   (1)
h(∆E, T, k) = min[1, exp(−∆E/T(k))],   (2)
T(k) = T_0 / ln k,   (3)

where g(·) is the neighbor generation function, h(·) the acceptance rule, T(·) the cooling schedule, x the state vector, n the dimension of the state space, T_0 the initial temperature, and k the iteration number. ∆x and ∆E denote the amounts of change in the state and the cost, respectively. Since the logarithmic cooling in (3) is too slow for practical use, many researchers have proposed modified versions of the SA [5], among which Szu and Hartley [4] proposed the Fast Simulated Annealing (FSA) given by

g(∆x, T, k) = a_n T(k) / (∆x² + T(k)²)^{(n+1)/2},   (4)
h(∆E, T, k) = min[1, exp(−∆E/T(k))],   (5)
T(k) = T_0 / k,   (6)

Manuscript received April 28, 2004.
†The authors are with the Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305–701, Korea.
a) E-mail: [email protected]

Set an initial temperature T and an initial state vector x.
repeat
    Generate a neighbor vector x̄ by g.
    Evaluate x and x̄.
    Accept x̄ as the next state with h.
    Cool the temperature T.
until the terminating conditions are satisfied.

Fig. 1   The pseudo-code of the SA.
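As an illustration of Fig. 1, the following Python sketch (ours, not the authors' code) shows one way the generic SA loop could be organized; the names cost, neighbor, and cool are placeholder callables standing in for the problem-specific cost function, the generation function g, and the cooling schedule T(k).

```python
import math
import random

def simulated_annealing(cost, x0, neighbor, cool, T0, max_iter=10_000):
    """Generic SA loop following Fig. 1: generate, evaluate, accept, cool."""
    x, fx = x0, cost(x0)
    T = T0
    for k in range(1, max_iter + 1):
        x_new = neighbor(x, T)              # neighbor generation with g
        f_new = cost(x_new)
        dE = f_new - fx
        # Metropolis acceptance rule h = min[1, exp(-dE/T)]
        if dE <= 0 or random.random() < math.exp(-dE / T):
            x, fx = x_new, f_new
        T = cool(T0, k)                     # temperature cooling schedule T(k)
    return x, fx
```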

where ∆x = (∆x_1² + ··· + ∆x_n²)^{1/2}, (∆x_1, ..., ∆x_n) is the state variation in the n-dimensional space, and a_n is a normalizing constant for the probability density function (e.g., a_1 = 1/π). The FSA using the Cauchy neighbors in (4) and the fast cooling schedule in (6) has been shown to be practically effective in many real-world applications [4]. However, because of the difficulty of generating random numbers from the n-dimensional Cauchy distribution, the product of n dimension-wise (one-dimensional) Cauchy distributions has commonly been used with a modified cooling scheme [5]:

g(∆x, T, k) = a_n ∏_{i=1}^{n} T(k) / (∆x_i² + T(k)²),   (7)
h(∆E, T, k) = min[1, exp(−∆E/T(k))],   (8)
T(k) = T_0 / k^{1/n},   (9)

where a_n is a normalizing constant for the probability density function. Even though this approximation is simple to use, it makes the search slow, and this side effect gets more severe as the dimension increases. The phenomenon arises mainly because the neighbor generation prefers the axial directions of the state space, as shown in Fig. 2(a). This preference for the (orthogonal) axial directions restricts the search capability and severely slows down the search in high dimensions. This is not a problem in Boltzmann annealing, which uses the normal distribution of (1), because the product of n one-dimensional normal distributions generates an omni-directional neighbor, as shown in Fig. 2(b). The FSA using dimension-wise neighbors, however, suffers from this high-dimensional orthogonal-neighbor problem. In this paper, to increase the search speed of the SA, we propose a method of generating omni-directional neighbors directly from the exact n-dimensional Cauchy distribution, as shown in Fig. 2(c).
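For reference, here is a minimal NumPy sketch of the conventional dimension-wise generation of (7), with a function name and interface of our own choosing: each coordinate receives an independent one-dimensional Cauchy step of scale T(k), which is what produces the axis-aligned samples of Fig. 2(a).

```python
import numpy as np

def dimension_wise_cauchy_neighbor(x, T, rng=None):
    """Conventional neighbor of Eq. (7): an independent 1-D Cauchy step per coordinate."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    # Inverse-CDF sampling of a standard Cauchy variate, scaled by the temperature T
    u = rng.uniform(-0.5, 0.5, size=x.shape)
    return x + T * np.tan(np.pi * u)
```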


Fig. 2   The distribution of neighbor generation (two-dimensional case): (a) the orthogonal neighbors of the dimension-wise Cauchy distributions (T(k) = 1); (b) the neighbors of the dimension-wise normal distribution (zero mean and unit variance); (c) the omni-directional neighbors of the proposed Cauchy neighbor generation (T(k) = 1).

2. Generation of the Exact n-Dimensional Cauchy Distribution

For simplicity, from now on, we abbreviate g(∆x, T, k) as g(∆x) and T(k) as T in the FSA. The cumulative distribution function (CDF) of g(∆x) of (4) is

G(∆x) = ∫···∫ a_n T / (∆x_1² + ··· + ∆x_n² + T²)^{(n+1)/2} d∆x_1 ··· d∆x_n.   (10)

By changing (10) from Cartesian coordinates into spherical polar coordinates, we obtain

G(∆x) = P{r ≤ r̂, φ_1 ≤ φ̂_1, ..., φ_{n−1} ≤ φ̂_{n−1}}
      = ∫_0^{r̂} ∫_0^{φ̂_1} ··· ∫_0^{φ̂_{n−2}} ∫_{−π}^{φ̂_{n−1}} [a_n T / (r² + T²)^{(n+1)/2}] r^{n−1} sin^{n−2}φ_1 ··· sin²φ_{n−3} sin φ_{n−2} dr dφ_1 ··· dφ_{n−3} dφ_{n−2} dφ_{n−1}   (11)
      = α(φ̂_1, ..., φ̂_{n−1}, n) ∫_0^{r̂} T r^{n−1} / (r² + T²)^{(n+1)/2} dr,

where r = (∆x_1² + ··· + ∆x_n²)^{1/2} (0 ≤ r < ∞), and

α(φ̂_1, ..., φ̂_{n−1}, n) = ∫_0^{φ̂_1} ··· ∫_0^{φ̂_{n−2}} ∫_{−π}^{φ̂_{n−1}} sin^{n−2}φ_1 ··· sin²φ_{n−3} sin φ_{n−2} dφ_1 ··· dφ_{n−3} dφ_{n−2} dφ_{n−1}   (12)

for 0 ≤ φ̂_1, ..., φ̂_{n−2} ≤ π and −π ≤ φ̂_{n−1} ≤ π. From (11), let

G̃(r̂) = ∫_0^{r̂} T r^{n−1} / (r² + T²)^{(n+1)/2} dr.   (13)

Substituting r = T tan θ (0 ≤ θ < π/2), we obtain

G̃(θ̂) = ∫_0^{θ̂} sin^{n−1}θ dθ   (14)

for 0 ≤ θ̂ < π/2. By normalizing (14) to make it a CDF, it can be calculated that

Ĝ(θ̂) = [1/β(n−1)] ∫_0^{θ̂} sin^{n−1}θ dθ,   (15)

where

β(n) = ∫_0^{π/2} sin^n θ dθ
     = 2^{2k} k! k! / (2k+1)!,                if n = 2k+1,
     = π (2k+1)! / [k! (k+1)! 2^{2k+2}],      if n = 2k+2,   (16)

for k = 0, 1, 2, ... and β(0) = π/2.

We can generate the random value of the radial length r̂ of the n-dimensional Cauchy distribution by drawing a uniform random number u in [0, 1), finding θ̂ from u = Ĝ(θ̂), and setting r̂ = T tan θ̂ [6]. Then, after determining the random radial length, we should generate a random variable from the angularly uniform distribution on the hyper-sphere with radius r̂. This is the well-known hyper-sphere point picking problem, and several methods have been developed for it [7]. Among them we choose the method of generating ∆x_1, ..., ∆x_n independently from the standard normal distribution, forming S = ∆x_1² + ··· + ∆x_n², and taking (∆x_1/√S, ..., ∆x_n/√S). The final n-dimensional Cauchy random neighbor becomes r̂·(∆x_1/√S, ..., ∆x_n/√S). Even though this method requires additional calculations for the normal distribution, it will be shown to be effective in the following computer simulations.
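The procedure above can be transcribed almost directly. The sketch below is our own illustration rather than the authors' MATLAB code: it evaluates β(n−1) and Ĝ numerically with SciPy instead of using the closed form (16) and the look-up table of Sect. 3, and it inverts Ĝ with a root finder. Each call performs its own quadrature, so it is slower than the table-based variant sketched in Sect. 3 but follows the derivation step by step.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def n_dimensional_cauchy_step(n, T, rng=None):
    """Draw one n-dimensional Cauchy neighbor step following the Sect. 2 procedure."""
    rng = np.random.default_rng() if rng is None else rng

    def integrand(t):
        return np.sin(t) ** (n - 1)

    beta = quad(integrand, 0.0, np.pi / 2)[0]   # beta(n-1), computed numerically instead of via Eq. (16)
    u = rng.random()                            # uniform random number in [0, 1)
    # Invert the normalized CDF of Eq. (15): find theta_hat with G_hat(theta_hat) = u
    theta = brentq(lambda th: quad(integrand, 0.0, th)[0] / beta - u, 0.0, np.pi / 2)
    r = T * np.tan(theta)                       # radial length r_hat = T * tan(theta_hat)
    g = rng.standard_normal(n)                  # uniform direction on the hyper-sphere [7]
    return r * g / np.linalg.norm(g)
```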

3. Simulations and Analyses

By changing the coordinates and subdividing the neighbor variables into two parts, magnitude and direction, we can simplify the calculation of the n-dimensional Cauchy distribution.


Fig. 3   Comparison of the convergence performances in the n-dimensional quadratic problems. (a) n = 30. (b) n = 100. (c) n = 1000.

Table 1   The CPU time (sec) in generating n-by-m random samples (m = 10,000), where n is the dimension and m is the number of samples.

                Normal   Dimension-wise Cauchy   n-Dimensional Cauchy
    n = 20       0.01             0.08                   0.23
    n = 50       0.04             0.22                   0.25
    n = 100      0.08             0.42                   0.34
    n = 200      0.16             0.85                   0.60

The calculation of the integration part in (15) may be an obstacle that makes the neighbor generation slow. However, because the dimension of the state space generally remains unchanged during optimization, we calculate the integration once numerically, build a look-up table, and use interpolation of the look-up table instead of performing the integration at each step of the search. We use the Runge-Kutta integration method and build a look-up table with 10,000 uniformly selected sample points in [0, π/2) for the simulation. Linear interpolation is used between sample points. The magnitude component of the neighbor is generated from the look-up table, and its direction (the n − 1 angular components) is then obtained from independent one-dimensional normal variables with normalization, as described in Sect. 2. By multiplying the magnitude by the unit directional vector, we generate an n-dimensional Cauchy neighbor.

Figure 2 shows the dimension-wise Cauchy neighbors (a), the normally distributed neighbors in two dimensions (b), and the proposed n-dimensional Cauchy neighbors (c). As expected, the n-dimensional Cauchy distribution is omni-directional, while the dimension-wise distribution is oriented more toward the axial directions. The n-dimensional Cauchy distribution also shows a 'fatter' tail than the Gaussian distribution, permitting the distant searches that are important for the FSA with its fast temperature cooling schedule.

We measure the neighbor generation time in the MATLAB environment on an AMD Athlon 2800+ CPU. Table 1 shows the CPU time for generating an n-by-m random sample matrix. The results show that the proposed method is not much slower than the dimension-wise neighbor generation and is even faster in higher dimensions.
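A look-up-table version of the sampler, along the lines just described, might look as follows. This sketch makes two simplifying assumptions of our own: a cumulative trapezoidal rule replaces the Runge-Kutta integration used in the letter, and np.interp provides the linear interpolation; the class name and interface are illustrative.

```python
import numpy as np

class CauchyNeighborTable:
    """Precompute the normalized CDF of Eq. (15) on a grid once per dimension,
    then generate n-dimensional Cauchy neighbor steps by table look-up."""

    def __init__(self, n, num_points=10_000):
        self.n = n
        # Uniformly spaced sample points in [0, pi/2), as in the letter
        self.theta = np.linspace(0.0, np.pi / 2, num_points, endpoint=False)
        pdf = np.sin(self.theta) ** (n - 1)
        # Cumulative trapezoidal integration of sin^{n-1}(theta)
        cdf = np.concatenate(([0.0], np.cumsum((pdf[1:] + pdf[:-1]) / 2.0 * np.diff(self.theta))))
        self.cdf = cdf / cdf[-1]                    # normalize so the table spans [0, 1]

    def step(self, T, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        u = rng.random()
        theta = np.interp(u, self.cdf, self.theta)  # linear interpolation of the inverse CDF
        r = T * np.tan(theta)                       # radial length r_hat = T * tan(theta_hat)
        g = rng.standard_normal(self.n)             # direction from normalized Gaussian draws
        return r * g / np.linalg.norm(g)
```

Because the table depends only on the dimension n, it is built once before the optimization and reused for every neighbor, which is why the per-sample cost reported in Table 1 remains competitive with the dimension-wise generator.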

Finally, in order to test the performance of the proposed neighbor generation, we apply it to the FSA for minimizing the following high-dimensional quadratic problem and compare the performance with that of the FSA using the conventional dimension-wise Cauchy neighbors:

f(x_1, ..., x_n) = ∑_{i=1}^{n} x_i²,   (−5.12 ≤ x_i ≤ 5.12)   (17)
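As a worked counterpart of (17), the test function is a one-liner; the commented lines indicate how it could be wired to the hypothetical sketches given earlier (simulated_annealing and CauchyNeighborTable), assuming the fast cooling schedule T(k) = T_0/k of (6).

```python
import numpy as np

def sphere(x):
    """Quadratic test function of Eq. (17): f(x) = sum_i x_i^2 on [-5.12, 5.12]^n."""
    return float(np.sum(np.square(x)))

# Hypothetical wiring of the earlier sketches, with a box constraint added by us:
# n = 100
# table = CauchyNeighborTable(n)
# x_best, f_best = simulated_annealing(
#     cost=sphere,
#     x0=np.random.uniform(-5.12, 5.12, size=n),
#     neighbor=lambda x, T: np.clip(x + table.step(T), -5.12, 5.12),
#     cool=lambda T0, k: T0 / k,    # cooling schedule of Eq. (6)
#     T0=1.0,
# )
```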

where n is the dimension of the state space. Figure 3 shows typical convergence trajectories for n = 30, n = 100, and n = 1000. The FSA with the proposed multi-dimensional Cauchy neighbors converges faster than the one with the dimension-wise Cauchy neighbors, especially when the dimension of the problem becomes high. We can also see that the latter takes much time in the initial stage to find the right way toward the optimal solution in a high-dimensional problem.

4. Conclusion

In this paper, we proposed a method of generating random numbers from the high-dimensional Cauchy distribution and compared it with the conventional dimension-wise method. Computer simulations show that the proposed method has omni-directional characteristics and that it is effective in the Fast Simulated Annealing, especially for high-dimensional problems. The proposed method can also be applied to other stochastic search algorithms such as evolutionary algorithms.

Acknowledgments

This work was supported by grant No. R01-2003-00010829-0 from the Basic Research Program of the Korea Science and Engineering Foundation.

References

[1] P.J. van Laarhoven and E.H.L. Aarts, Simulated Annealing: Theory and Applications, Kluwer Academic Press, Dordrecht, The Netherlands, 1987.
[2] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller, "Equation of state calculations by fast computing machines," J. Chemical Physics, vol.21, no.6, pp.1087–1092, June 1953.


[3] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, "Optimization by simulated annealing," Science, vol.220, no.4598, pp.671–680, May 1983.
[4] H. Szu and R. Hartley, "Fast simulated annealing," Phys. Lett. A, vol.122, pp.157–162, June 1987.
[5] L. Ingber, "Very fast simulated re-annealing," J. Math. and Comput. Model., vol.12, no.8, pp.967–973, 1989.
[6] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, NY, 1991.
[7] G. Marsaglia, "Choosing a point from the surface of a sphere," Ann. Math. Statist., vol.43, no.2, pp.645–646, 1972.
