The Influence of Gaussian, Uniform, and Cauchy Perturbation Functions in the Neural Network Evolution

Paulito P. Palmes and Shiro Usui
RIKEN Brain Science Institute
2-1 Hirosawa, Wako, Saitama 351-0198, JAPAN
[email protected], [email protected]

Abstract. The majority of algorithms in the field of evolutionary artificial neural networks (EvoANN) rely on the proper choice and implementation of the perturbation function to maintain the population's diversity from generation to generation. Maintaining diversity is an important factor in the evolution process since it helps the population of ANNs (Artificial Neural Networks) escape local minima. To determine which of the perturbation functions are ideal for ANN evolution, this paper analyzes the influence of the three commonly used functions, namely Gaussian, Cauchy, and Uniform. Statistical comparisons are conducted to examine their influence on the generalization and training performance of EvoANN. Our simulations using the glass classification problem indicate that for mutation-with-crossover-based EvoANN, generalization performance among the three perturbation functions is not significantly different. On the other hand, mutation-based EvoANN that uses Gaussian mutation performs as well as the variant with crossover, but it performs worst when it uses either the Uniform or the Cauchy distribution function. These observations suggest that the crossover operation becomes a significant operation in systems that employ strong perturbation functions but has less significance in systems that use weak or conservative perturbation functions.

1 Introduction

There are two major approaches to evolving a non-gradient-based population of neural networks: the mutation-based approach, built on EP (Evolutionary Programming) or ES (Evolution Strategies) concepts, and the crossover-based approach, which follows the GA (Genetic Algorithm) implementation. While the former relies heavily on the mutation operation, the latter considers crossover to be the dominant operation of evolution. Common to both approaches is the choice of the perturbation function, which is responsible for introducing new characteristics and information into the population. Since the selection process favors individuals with better fitness for the next generation, it is important that the next generation not be populated by individuals that are too similar, to avoid the possibility of being stuck in a local minimum. This issue is addressed through the proper choice and implementation of the perturbation function, the encoding scheme, the selection criteria, and the proper formulation of the fitness function. In this study, we are interested in the first issue. The SEPA (Structure Evolution and Parameter Adaptation) [4] evolutionary neural network model is chosen for the implementation to ensure that the main driving force of evolution is the perturbation function and the crossover operation. The SEPA model does not use any gradient information and relies only on its mutation's perturbation function and crossover operation for ANN evolution.

2 Related Study

Several studies have examined the influence of different perturbation functions in the area of optimization. While Gaussian mutation is the predominant function in numerical optimization, the work done by [7] indicated that local convergence was similar between Gaussian and spherical Cauchy mutation but slower for non-spherical Cauchy. Studies done by [8] in evolutionary neural networks found that Cauchy mutation outperformed Gaussian mutation on multimodal problems with many local minima; for problems with few local minima, both functions had similar performance. A study conducted by [1] combined the Gaussian and Cauchy distributions by taking the mean of a random variable drawn from a Gaussian and a random variable drawn from a Cauchy. Preliminary results showed that the new function performed as well as or better than the plain Gaussian implementation. Common to these approaches is the system's reliance on the perturbation function to effect gradual changes to its parameters so that the system can find a better solution. In a typical implementation, the perturbation function undergoes adaptation together with the variables to be optimized. Equations (1) and (2) describe a typical implementation using Gaussian self-adaptation [1]:

    η′ = η + η N(0, 1)    (1)

    x′ = x + η′ N(0, 1)    (2)

where x is the vector of variables to be optimized; η is the vector of search step parameters (SSPs), each undergoing self-adaptation; and N is the vector of Gaussian functions with mean 0 and standard deviation controlled by the respective SSPs. Typical implementations in evolutionary neural networks follow a similar formulation for the mutation of weights:

    w′ = w + N(0, α ε(ϕ))    ∀w ∈ ϕ

where N(0, α ε(ϕ)) is the Gaussian perturbation with mean 0 and standard deviation α ε(ϕ); w is a weight; ε(ϕ) is an error function of network ϕ (e.g., the mean-squared error); and α is a user-defined scaling constant.
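Equations (1) and (2) can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's implementation; the vector lengths and initial values are hypothetical.

```python
import random

def self_adaptive_step(x, eta):
    """One Gaussian self-adaptation step following Eqs. (1)-(2):
    the step sizes eta are perturbed first, and the updated step
    sizes then scale the Gaussian perturbation applied to x."""
    eta_new = [e + e * random.gauss(0.0, 1.0) for e in eta]   # Eq. (1)
    x_new = [xi + ei * random.gauss(0.0, 1.0)                 # Eq. (2)
             for xi, ei in zip(x, eta_new)]
    return x_new, eta_new

random.seed(42)
x, eta = [0.5, -1.2, 3.0], [0.1, 0.1, 0.1]
x, eta = self_adaptive_step(x, eta)
```

Note that because each step size is updated before the variable it controls, a lucky sequence of draws can grow or shrink the search step, which is the self-adaptation the surrounding text describes.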

Unlike in a typical function optimization problem, where the main goal is to optimize the objective function, the goal of neural network evolution is to find the most suitable architecture with the best generalization performance. Good network training performance using a certain perturbation function does not necessarily translate into good generalization performance, due to overfitting. It is therefore important to study the influence of the different perturbation functions on the training and generalization performance of ANNs. Moreover, knowing which combination of mutation and adaptation strategies is suited for a particular perturbation function and problem domain will be a big help in neural network implementation. These issues will be examined in future work. In this paper, our discussion is limited to the performance of EvoANN on the glass classification problem taken from the UCI repository [2].

3 Evolutionary ANN Model

Fig. 1. Evolutionary ANN: a) an example ANN with a deleted hidden node; b) its SEPA representation as weight matrices W1 and W2.

Neural network implementation can be viewed as an optimization problem where the goal is to search for the best network configuration with good performance in training, testing, and validation. This is achieved by training the network to adjust its architecture and weights according to the constraints imposed by the problem. The SEPA model (Fig. 1) used in this study addresses this issue by making the weight and architecture searches a single process controlled by mutation and crossover. Changes caused by mutation and crossover induce corresponding changes to the weights and the architecture of the ANN at the same time [3]. In this manner, the major driving force of evolution in SEPA is the implementation of the crossover and mutation operations, which makes the choice of the perturbation function and the implementation of adaptation, mutation, and crossover very important for the successful evolution of the network. Below is a summary of the SEPA approach:

1. At iteration t = 0, randomly initialize a population P(t) = {net_1^t, ..., net_µ^t} of µ individuals:

       net_i = {W1_i, W2_i, θ_w1^i, θ_w2^i, ρ(pr_i, m_i, σ_i)}


   where W1, W2 are the weight matrices; θ_w1, θ_w2 are the threshold vectors; ρ is the perturbation function; pr is the mutation probability; m is the strategy parameter; and σ is the step size parameter (SSP).

2. Compute the fitness of each individual based on the objective function Q_fit [5]:

       Q_fit = α · Q_acc + β · Q_nmse + γ · Q_comp

   where Q_acc is the percentage error in classification; Q_nmse is the percentage of normalized mean-squared error (NMSE); Q_comp is the complexity measure in terms of the ratio between the active connections c and the total number of possible connections c_tot; and α, β, and γ are constants that control the strength of influence of their respective factors.

3. Using a rank selection policy, repeat until µ individuals have been generated:
   - Rank-select two parents, net_k and net_l, and apply the crossover operation by exchanging weights between W1_k and W1_l and between W2_k and W2_l:

         ∀(r, c) ∈ W1_k ∧ W1_l: if rand() < Θ, swap(W1_k[r][c], W1_l[r][c])
         ∀(r, c) ∈ W2_k ∧ W2_l: if rand() < Θ, swap(W2_k[r][c], W2_l[r][c])

     where Θ is initialized to a random value between 0 and 0.5.

4. Mutate each individual net_i, i = 1, ..., µ, by perturbing W1_i and W2_i using:

       δ_i = ρ(σ_i);   m′_i = m_i + ρ(δ_i);   w′_i = w_i + ρ(m′_i)

   where σ is the SSP (step size parameter); δ is the mutation strength intensity; ρ is the perturbation function; m is the adapted strategy parameter; and w is a weight chosen randomly from either W1 or W2.

5. Compute the fitness of each offspring using Q_fit.
6. Using an elitist replacement policy, retain the best two parents and replace the remaining parents with their offspring.
7. Stop if the stopping criterion is satisfied; otherwise, go to step 2.
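The crossover swap of step 3 and the chained mutation of step 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the exact sampling routines SEPA uses for ρ are not specified in the text, so standard textbook forms of the three distributions are assumed.

```python
import math
import random

# Three candidate perturbation functions rho (assumed standard forms).
def gaussian(scale):
    return random.gauss(0.0, scale)

def cauchy(scale):
    # Standard Cauchy sample via inverse transform, scaled by `scale`.
    return scale * math.tan(math.pi * (random.random() - 0.5))

def uniform(scale):
    return random.uniform(-abs(scale), abs(scale))

def crossover(Wk, Wl, theta):
    """Step 3: uniform crossover, swapping corresponding entries of two
    parents' weight matrices with probability theta (0 < theta < 0.5)."""
    for r in range(len(Wk)):
        for c in range(len(Wk[r])):
            if random.random() < theta:
                Wk[r][c], Wl[r][c] = Wl[r][c], Wk[r][c]

def mutate_weight(w, m, sigma, rho):
    """Step 4: chained perturbation delta = rho(sigma); m' = m + rho(delta);
    w' = w + rho(m'). Returns the mutated weight and updated strategy
    parameter."""
    delta = rho(sigma)
    m_new = m + rho(delta)
    return w + rho(m_new), m_new
```

Chaining the perturbation through σ, δ, and m is what makes the choice of ρ matter: a heavy-tailed Cauchy draw early in the chain can inflate the final step applied to the weight far more than a Gaussian or bounded Uniform draw would.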

4 Experiments and Results

Two major SEPA variants were used to aid the analysis: the mutation-based variant (mSEPA) and the mutation-crossover-based variant (mcSEPA, the standard SEPA). Each major variant is further divided into three categories according to the perturbation function used: mSEPA-c (Cauchy-based), mSEPA-g (Gaussian-based), and mSEPA-u (Uniform-based); mcSEPA follows the same categorization with mcSEPA-c, mcSEPA-g, and mcSEPA-u. Table 1 summarizes the important parameters and variables used by the different variants. The glass problem was chosen in particular because its noisy data makes generalization difficult, which is a good way to discriminate robust variants. The sampling procedure divided the data into 50% training, 25% validation, and 25% testing [6]. The objective was to forecast the glass type (6 types) from the results of the chemical analysis (6 inputs) using 214 observations. Table 2 shows the generalization performance of the different SEPA variants. The posthoc test in Table 2 uses Tukey's HSD, wherein average error results that are not significantly different are indicated by the same label (∗ or †).

Table 1. Features implemented in SEPA for the simulation

Feature              Implemented                  Comment
selection type       rank                         rank-sum selection
mutation type        gaussian/cauchy/uniform      depends on the variant
mutation prob.       0.01
SSP size             σ = 100                      Uniform range is U(-100, 100)
crossover type       uniform                      Θ randomly assigned between (0, 0.5)
replacement          elitist                      retains the two best parents
population size      100
no. of trials        30
max. hidden units    10
max. generations     5000
stopping criterion   validation sampling          evaluated at every 10th generation
fitness constants    α = 1.0, β = 0.7, γ = 0.3
classification       winner-takes-all

Table 2. ANOVA of generalization error in the glass classification problem (Gaussian vs. Uniform vs. Cauchy)

Variant             Average Error   Std. Dev.
mSEPA-g             0.3912∗         0.0470
mcSEPA-u            0.4006∗         0.0380
mcSEPA-g            0.4031∗         0.0516
mcSEPA-c            0.4113∗†        0.0626
mSEPA-u             0.4194†         0.0448
mSEPA-c             0.4453†         0.0649
Linear-BP [6]       0.5528          0.0127
Pivot-BP [6]        0.5560          0.0283
NoShortCut-BP [6]   0.5557          0.0370

∗, †: Tukey's HSD posthoc test classification at the α = 0.05 level of significance.

Table 2 indicates that for mutation-based SEPA (mSEPA), Gaussian perturbation is significantly superior to the Uniform and Cauchy functions. For mutation-crossover-based SEPA (mcSEPA), there is no significant difference among the three perturbation functions. Furthermore, the table also indicates that every SEPA variant generalizes better than any of the backpropagation variants tested by Prechelt [6]. Since these results are limited to the glass classification problem and BP can be implemented in many ways, the comparison of SEPA with the BP variants is not conclusive and requires further study. Moreover, Figure 2 and Table 2 suggest that even though the Uniform perturbation has the best training performance in mSEPA, it has the worst generalization performance. For mcSEPA, the performance of the three perturbation functions is similar.
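The comparison above rests on a one-way ANOVA followed by Tukey's HSD. As a minimal sketch of the first stage, the F statistic can be computed in pure Python; the samples below are synthetic, drawn to loosely mimic the means and standard deviations in Table 2, and are not the paper's actual per-trial data.

```python
import random

def one_way_anova_F(*groups):
    """One-way ANOVA F statistic: between-group mean square
    divided by within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

random.seed(0)
# Synthetic per-trial generalization errors (30 trials each), with
# hypothetical means/SDs patterned after the mSEPA rows of Table 2.
mg = [random.gauss(0.3912, 0.047) for _ in range(30)]  # mSEPA-g
mu = [random.gauss(0.4194, 0.045) for _ in range(30)]  # mSEPA-u
mc = [random.gauss(0.4453, 0.065) for _ in range(30)]  # mSEPA-c
F = one_way_anova_F(mg, mu, mc)
```

A large F relative to the F(k-1, n-k) critical value signals that at least one variant's mean error differs; the Tukey HSD posthoc step then identifies which pairs differ, producing the ∗/† groupings shown in Table 2.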

5 Conclusion

This preliminary study suggests that for evolutionary neural networks that rely solely on the mutation operation, Gaussian perturbation provides superior generalization performance compared to the Uniform and Cauchy functions. On the other hand, the introduction of a crossover operation helps significantly improve the performance of the Cauchy and Uniform functions. It also suggests that in order to manage the complexity introduced by more chaotic perturbation functions such as the Uniform and Cauchy perturbations, a proper crossover operation

Fig. 2. Training Performance of the Different SEPA Variants: correct classification (0.3-0.9) over 1000 generations for a) mSEPA (mSEPA-c, mSEPA-g, mSEPA-u) and b) mcSEPA (mcSEPA-c, mcSEPA-g, mcSEPA-u).

must be introduced to leverage and exploit the wider search coverage these functions provide. The simulation also indicates that superior training performance in mutation-based evolution does not necessarily imply good generalization performance; it may even worsen generalization due to overly localized searching.

References

1. K. Chellapilla and D. Fogel. Two new mutation operators for enhanced search and optimization in evolutionary programming. In B. Bosacchi, J. C. Bezdek, and D. B. Fogel, editors, Proc. of SPIE: Applications of Soft Computing, volume 3165, pages 260-269, 1997.
2. P. M. Murphy and D. W. Aha. UCI Repository of machine learning databases. University of California, Department of Information and Computer Science, Irvine, CA, 1994.
3. P. Palmes, T. Hayasaka, and S. Usui. Evolution and adaptation of neural networks. In Proceedings of the International Joint Conference on Neural Networks, IJCNN, volume II, pages 397-404, Portland, Oregon, USA, 19-24 July 2003. IEEE Computer Society Press.
4. P. Palmes, T. Hayasaka, and S. Usui. SEPA: Structure evolution and parameter adaptation. In E. Cantu-Paz, editor, Proceedings of the Genetic and Evolutionary Computation Conference, volume 2, page 223, Chicago, Illinois, USA, 11-17 July 2003. Morgan Kaufmann.
5. P. Palmes, T. Hayasaka, and S. Usui. Mutation-based genetic neural network. IEEE Transactions on Neural Networks, 2004. Article in press.
6. L. Prechelt. Proben1: A set of neural network benchmark problems and benchmarking rules. Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe, Karlsruhe, Germany, September 1994.
7. G. Rudolph. Local convergence rates of simple evolutionary algorithms with Cauchy mutations. IEEE Trans. on Evolutionary Computation, 1(4):249-258, 1997.
8. X. Yao, Y. Liu, and G. Lin. Evolutionary programming made faster. IEEE Trans. on Evolutionary Computation, 3(2):82-102, 1999.

