IEICE TRANS. INF. & SYST., VOL.E87–D, NO.11 NOVEMBER 2004
2499
LETTER
n-Dimensional Cauchy Neighbor Generation for the Fast Simulated Annealing Dongkyung NAM†a) , Member, Jong-Seok LEE† , and Cheol Hoon PARK† , Nonmembers
SUMMARY Many simulated annealing algorithms use the Cauchy neighbors for fast convergence, and the conventional method uses the product of n one-dimensional Cauchy distributions as an approximation. However, this method slows down the search severely as the dimension gets high because of the dimension-wise neighbor generation. In this paper, we analyze the orthogonal neighbor characteristics of the conventional method and propose a method of generating symmetric neighbors from the n-dimensional Cauchy distribution. The simulation results show that the proposed method is very effective for the search in the simulated annealing and can be applied to many other stochastic optimization algorithms. key words: fast simulated annealing, cauchy neighbor generation
1. Introduction Simulated annealing (SA) is one of the powerful and robust stochastic search algorithms for the various optimization problems having both discrete and continuous search spaces [1]. Basically, the SA is composed of three operations – neighbor generation with a probability density function g, state transition with an acceptance probability h, and temperature cooling with an annealing schedule T . The canonical SA is based on the Monte Carlo importance sampling [2], [3]. Figure 1 shows the simple pseudo code of the SA. In continuous state optimization, the SA uses the following Boltzmann annealing scheme [4]. g(∆x, T, k) = (2πT (k))−n/2 exp[−∆x2 /(2T(k))], h(∆E, T, k) = min[1, exp(−∆E/T (k))], T (k) = T 0 / ln k.
(1) (2) (3)
where g(·) is the neighbor generation function, h(·) the acceptance rule, T (·) the cooling schedule, x the state vector, n the dimension of the state space, T 0 the initial temperature, and k the iteration number. ∆x and ∆E denote the amount of the state and the cost changed, respectively. Since the logarithmic cooling in (3) is too slow for a practical use, many researchers proposed various modified versions of the SA [5] among which Szu and Hartly [4] proposed the Fast Simulated Annealing (FSA) given by an T (k) , + T (k)2 )(n+1)/2 h(∆E, T, k) = min[1, exp(−∆E/T (k))], T (k) = T 0 /k g(∆x, T, k) =
(∆x2
(4) (5) (6)
Manuscript received April 28, 2004. The authors are with the Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305–701, Korea. a) E-mail:
[email protected] †
Set an initial temperature T and initial state vector x. repeat Generate a neighbor vector x¯ by g. Evaluate x and x¯. Accept x¯ as the next state with h. Cooling the temperature T . until the terminating conditions are satisfied. Fig. 1
The pseudo-code of the SA.
where ∆x = ∆x21 + · · · + ∆x2n , (∆x1 , . . . , ∆xn ) is the state variation in the n-dimensional space, and an a normalizing constant for the probability density function (e.g., a1 = 1/π). The FSA using the Cauchy neighbors in (4) and the fast cooling schedule in (6) has been shown to be practically effective in many real-world applications [4]. However, because of the difficulty of generating random numbers from the n-dimensional Cauchy distribution, people have used the product of n dimension-wise (one-dimensional) Cauchy distributions with the modified cooling scheme [5]. g(∆x, T, k) = an
n
∆x2i i=1
T (k) , + T (k)2
h(∆E, T, k) = min[1, exp(−∆E/T (k))], T (k) = T 0 /k1/n
(7) (8) (9)
where an is a normalizing constant for the probability density function. Even though this approximation is simple to be used, it makes the search slow and this side-effect gets severe as the dimension gets high. This phenomenon occurs mainly from the fact that the neighbor generation prefers the axial direction in the state space as shown in Fig. 2(a). The preference of the neighbor generation to the (orthogonal) axial direction restricts the search capability and severely slows down the search in a high dimension. This is not a problem in the Boltzmann annealing which uses the normal distribution of (1) because the product of n one-dimensional normal distributions generates an omnidirectional neighbor as shown in Fig. 2(b). However, the FSA using the dimension-wise neighbors suffers from the high dimensional orthogonal neighbor problem. In this paper, to increase the search speed of the SA, we propose a method of generating the omni-directional neighbors by using the exact form of the n-dimensional Cauchy distribution as shown in Fig. 2(c).
IEICE TRANS. INF. & SYST., VOL.E87–D, NO.11 NOVEMBER 2004
2500
Fig. 2 The distribution of neighbor generation (for two-dimensional case): (a) the orthogonal neighbor of the dimension-wise Cauchy distributions (T (k) = 1). (b) the neighbor of the dimension-wise normal distribution (zero mean and unit variance). (c) the omni-directional neighbor of the proposed Cauchy neighbor generation (T (k) = 1).
˜ θ) ˆ = G(
2. Generation of the Exact n-Dimensional Cauchy Distribution
where
For simplicity, from now on, we abbreviate g(∆x, T, k) as g(∆x) and T (k) as T in the FSA. The cumulative distribution function (CDF) of g(∆x) of (4) is, an T d∆x1 ···d∆xn . (10) G(∆x) = ··· n+1 2 (∆x1 +···+∆x2n +T 2 ) 2 By changing (10) in Cartesian coordinates into spherical polar coordinates, we obtain G(∆x) = P{r ≤ rˆ, φ1 ≤ φˆ 1 , . . . , φn−1 ≤ φˆ n−1 } rˆ φˆ 1 φˆ n−2 φˆ n−1 an T ··· rn−1 × = 2 2 (n+1)/2 0 0 −π (r + T ) 0 sinn−2 φ1 · · · sin2 φn−3 sin φn−2 × (11) drdφ1 · · · dφn−3 dφn−2 dφn−1 rˆ T rn−1 ˆ ˆ = α(φ1 , . . . , φn−1 , n) dr 2 2 (n+1)/2 0 (r + T ) where r = ∆x21 + · · · + ∆x2n (0 ≤ r ≤ ∞), and φˆ 1 φˆ n−2 φˆ n−1 α(φˆ 1 , . . . , φˆ n−1 , n) = 0 ··· 0 −π sinn−2 φ1 · · · (12) sin2 φn−3 sin φn−2 dφ1 ···dφn−3 dφn−2 dφn−1 for 0 ≤ φˆ 1 , . . . , φˆ n−2 ≤ π, and −π ≤ φˆ n−1 ≤ π. From (11), let rˆ T rn−1 ˜ dr. (13) G(ˆr) = 2 2 (n+1)/2 0 (r + T ) Substituting r = T tan θ (0 ≤ θ < π/2), we obtain θˆ ˜ θ) ˆ = G( sinn−1 θdθ
(14)
0
˜ θ) ˆ a for 0 ≤ θˆ < π/2. By normalization of (14) to make G( CDF, it can be calculated that
1 β(n − 1)
π/2
β(n) =
θˆ
sinn−1 θdθ,
(15)
0
sinn θdθ
0
2k 2 k!k! (2k + 1)! , = π(2k + 1)! , k!(k + 1)!22k+2
if n = 2k + 1, if n = 2k + 2,
(16)
for k = 0, 1, 2, · · · and β(0) = π/2. We can generate the random value of the radial length rˆ of the n-dimensional Cauchy distribution by finding θˆ from rˆ = T tan θˆ and ˜ θ) ˆ where u is a uniform random number in [0, 1) u = G( [6]. Then, after determining the random radial length value, we should generate the random variable from the angularly uniform distribution on the hyper-sphere with radius r. This is the well-known hyper-sphere point picking problem and there are several methods developed so far [7]. Among them we choose the method of generating ∆x1 , . . . , ∆xn independently with standard normal distribution and making √ √ S = ∆x21 + · · · + ∆x2n to obtain (∆x1 / S , · · · , ∆xn / S ). The final √ n-dimensional √ Cauchy random neighbor becomes rˆ(∆x1 / S , · · · , ∆xn / S ). Even though this method requires additional calculations for the normal distribution, it will be shown to be effective in the following computer simulation. 3. Simulations and Analyses By changing the coordinates and subdividing the neighbor variables into two parts – magnitude and direction, we can simplify the calculation of the n-dimensional Cauchy distribution. The calculation of the integration part in (15) may be an obstacle which makes the neighbor generation slow. However, because the dimension of the state keeps unchanged during optimization in general, we calculate the
LETTER
2501
Fig. 3 Comparison of the convergence performances in the n-dimensional quadratic problems. (a) n = 30. (b) n = 100. (c) n = 1000. Table 1 The CPU time (sec) in generating n by m random samples (m = 10, 000) where n represents the dimension and m is the number of samples. Normal n = 20 n = 50 n = 100 n = 200
0.01 0.04 0.08 0.16
Dimension-wise Cauchy 0.08 0.22 0.42 0.85
n-Dimensional Cauchy 0.23 0.25 0.34 0.60
integration once in a numerical way, make a look-up table and use the interpolation of the look-up table instead of each integration in the search. We use the Runge-Kutta integration method and make a look-up table with uniformly selected 10,000 sample points in [0, π/2) for the simulation. The linear interpolation method is used for the interpolation between sample points. The generation of the magnitude component of the neighbor variables from the look-up table is followed by the generation of their n − 1 angular components from the n − 1 one-dimensional normal distribution with normalization. By multiplying the magnitude value and the unit directional vector, we generate an n-dimensional Cauchy neighbor. Figure 2 shows the dimension-wise Cauchy neighbor (a), the normally distributed neighbor in two dimensions (b), and the proposed n-dimensional Cauchy neighbor (c). As expected, ndimensional Cauchy distribution is omni-directional while the dimension-wise distribution is more oriented to the axial direction. Also the n-dimensional Cauchy distribution shows a ‘fatter’ tail than the Gaussian distribution, permitting distant searches which are important for the FSA having a fast temperature cooling schedule. We measure the neighbor generation time in the MATLAB environment with the AMD Athlon 2800+ CPU. Table 1 shows the CPU time in generating n by m random sample matrix. The results show that the proposed method is not much slower than the dimension-wise neighbor generation and even faster in higher dimensions. Finally, in order to test the performance of the proposed neighbor generations, we apply the proposed neighbor generation method to the FSA for minimizing the following high dimensional quadratic problem and compare the per-
formance to the FSA with the conventional dimension-wise Cauchy neighbors. f (x1 , · · · , xn ) =
n
x2i , (−5.12 ≤ x1 ≤ 5.12)
(17)
i=1
where n is the dimension of the state space. Figure 3 shows the general convergence trajectories when n = 30, n = 100 and n = 1000. The FSA with the proposed multidimensional Cauchy neighbors converges faster than the one with the dimension-wise Cauchy neighbors especially when the dimension of the problem becomes high. We can also see that the latter takes much time in the initial stage to find the right way to the optimal solution in a high dimensional problem. 4. Conclusion In this paper, we proposed the method of generating random numbers from the high dimensional Cauchy distribution and compared it with the conventional dimension-wise method. Computer simulation shows that the proposed method has the omni-directional characteristics and that it is effective in the Fast Simulated Annealing especially for high dimensional problems. The proposed method can also be applied to other stochastic search algorithms such as evolutionary algorithms. Acknowledgments This work was supported by grant No. R01-2003-00010829-0 from the Basic Research Program of the Korea Science and Engineering Foundation. References [1] P.J. van Laarhoven and E.H.L. Aarts, Simulated Annealing: Theory and Applications, Kluwer Academic Press, Dordrecht, The Netherlands, 1987. [2] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller, “Equation of state calculations by fast computing machines,” J. Chemical Physics, vol.21, no.6, pp.1087–1092, June 1953.
IEICE TRANS. INF. & SYST., VOL.E87–D, NO.11 NOVEMBER 2004
2502
[3] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, “Optimization by simulated annealing,” Science, vol.220, no.4598, pp.671–680, May 1983. [4] H. Szu and R. Hartley, “Fast simulated annealing,” Phys. Lett. A, vol.122, pp.157–162, June 1987. [5] L. Ingber, “Very fast simulated re-annealing,” J. Math. and Comput.
Model., vol.12, no.8, pp.967–973, 1989. [6] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, NY, 1991. [7] G. Marsaglia, “Choosing a point from the surface of a sphere,” Ann. Math. Statist., vol.43, no.2, pp.645–646, 1972.