The Application of Evolutionary Computation to the ...

Viewer
Transcript

The Application of Evolutionary Computation to the Analysis of the Profiles of Elliptical Galaxies: A Maximum Likelihood Approach Ben Washbrook and Jin Li The Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA) The School of Computer Science The University of Birmingham Edgbaston, Birmingham B15 2TT, UK {msc35bxw, J.Li}@cs.bham.ac.uk Abstract- Genetic programming technique has been found to be suitable in scenarios where the formulation of models is a data driven process. Evolutionary programming provides a way of searching for parameters in a model without being prone to fall in local minima. A review of how these techniques have been applied to the analysis of elliptical galaxies is given. The effectiveness of a maximum likelihood based fitness function is asserted and is applied to the parameter fitting using evolutionary programming. A maximum likelihood based function is found to show consistent and significant improvement over a hit-based fitness function for modeling the profiles of elliptical galaxies. It is asserted that such a function would potentially improve the quality of model produced by symbolic regression using genetic programming.

1 Introduction Galaxies are structures within the universe formed by the mutual gravitation of matter into bound structures. Detectable via the emission of electromagnetic radiation from stars and gas (non-dark components), the intensity of the observed radiation from galaxies pertains to the distribution of light and thus matter within them. The morphology of a galaxies is a product of their formation history and therefore the classification of galaxies impinges upon the study of the physical processes which spawned them. Galaxies can be broadly divided into two morphological groups; those which appear as disks with a variety of spiral structure and those which are elliptical in appearance. Elliptical or earlytype galaxies exhibit a larger range of masses than spirals and exhibit less angular variation in brightness. This accommodates the decomposition of their 2 dimensional CCD (Charged Coupled Device) images to a 1 dimensional profile. This is achieved by evaluating the brightness between successive isophotal ellipses around the center of the galaxy resulting in a brightness distribution along the semi major axis (more detail of process in [Brown et al 2003]). 1.1 Traditional Approach The traditional approach in the analysis of physical phenomena is for a measure of a specific physical quantity to be made which can then be compared to a model derived from the assertion of the physical principles governing it’s form ([Babovic and Keijzer 2000]). The process of creating a physical model inevitably involves simplification of

Figure 1: The Elliptical galaxy M60 and it’s spiral companion NGC 4647 the phenomena under consideration. In the case of elliptical galaxies, a model proposed by Hubble (see Equation 1) (Hubble (1930) cited in [Li et al 2004]) based upon the treatment of the galaxy as a gravitating isothermal sphere of gas whose brightness falls as it’s density. I(r) = I0 /(1 + (r/Re ))2

(1)

Although there is a physical basis for the Hubble model it does not fully describe the physics of elliptical galaxy formation. This is highlighted by the fact that many of the models used to describe elliptical galaxies have a purely empirical basis, the exponential power law proposed by De Voucoulers (1948 cited in [Li et al 2004]) is an example such a model . I(r) = Ie exp −7.67((r/Re )4 − 1)

(2)

The basis on which purely empirical models are used is that they yield a better statistical fit although neither describes well the profiles of all elliptical galaxies ([Li et al 2004], and [Graham, et al 1996]). This desire to show elliptical galaxies as a homogeneous group of objects motivates the formulation of new models. 1.2 Evolutionary Approach The analysis of elliptical galaxies can be divided into two stages. 1. The formulation of a model. 2. The comparison of a model to real data.

in the GP and EP of [Li et al 2004] are shown in Table 1 and Table 2 respectively. The fitness function in this case is hits based, counting the number of times a model falls within a specified tolerance (a fraction of the data points value) of the data points. In the GP regression of [Li et al 2004] the hit criterion is combined with a term which reduces the fitness as a function of length of the expression (Table 1). The result of this analysis was two simple expressions; Equation (3) and Equation (4).

Figure 2: The tree structure ofpa genetic program describing the function f (x) = sin x + x2 + y These two stages have counterparts in evolutionary computation. [Koza 1996] asserts the effectiveness of genetic programming (GP) in performing functional regression in a situation where an empirical model is satisfactory. Evolutionary programming (EP) is a process which performs well in the finding parameters given a model ([Yao, X 1996]). GP is a heuristic search method by which a population of programs are generated which then are subject to mutation and cross-over on the basis of a fitness criterion. Fit individuals along with their offspring comprise the next generation, this selection procedure is the basis by which useful schemata (the components of individuals) are perpetuated to future generations. In the tree representation of GP programs are produced by combining terminals (fixed and variable values) and functions (operations) in a tree structure. The tree like nature of the representation allows only syntactically correct programs to be created. The population of individuals is then quantitively tested to see how well their performance correlates with that which is desired, this is their fitness. Using a set of mathematical operations as the functions this forms a basis for performing stochastic symbolic regression. Figure.2 shows an example of the tree representation structure of a function produced by GP. GP then describes a method to replace human intuition in the formulation of a model ([Langdon and Poli 2002]). EP is a search method suited to obtaining parameters given a function since the genomes subject to evolution are single values. The process of EP is similar to GP except the only genetic operator used is that of mutation. The process of EP is used to search the parameter space in order to find a global optimum in place of traditionally used hill climbing algorithms such as the Levenberg-Marquard method ([Press et al 1992]). This is necessary as although GP can find general models which fit data it does not fit the parameters (the constant terminal values) well.

2 Review of Previous Work Evolutionary computation has been applied to find functional descriptions of the profiles of elliptical galaxies. [Li et al 2004] applied both EP and GP to the profiles of 18 elliptical galaxies in the coma cluster. The parameters used

I(r) = A/(B + Cr)

(3)

I(r) = A/(B + Cr2 )

(4)

Interpreting Equation (3) and Equation (4) from a wish to write them succinctly and in terms of physical quantities they can be given the form of Equation (5) and Equation (6). This reduces the number of parameters without altering the properties of the expression. I0 represents the central brightness of the galaxy whilst Re is the radius at which the light has fallen to half it’s original brightness. Equation (6) shows similarity to Equation (1) and is so called the modified Hubble profile ([Rood et al 1972]). This is encouraging even prior to any statistical analysis as it shows the emergence of a model from a stochastic process which is known to perform well and is related to the physical properties of elliptical galaxies. I(r) = I0 /(1 + r/Re )

(5)

I(r) = I0 /(1 + (r/Re )2 )

(6)

The need for further interpretation is asserted as in the GP no representation of the power operation was present (see Table 1). In light of this it can be inferred that if both Equation (5) and Equation (6) describe the data to a comparable extent then they can be interpreted as two aspects of a single function. The model then takes the form of Equation (7), a generalized modified Hubble model. This generalization from a two parameter model to a three parameter model is similar to the generalization of (2) to (8) by Sersic (1968 cited in [Graham, et al 1996]) ([Khosroshahi et al 2004]). This generalization as well as giving a better fit to the data relates to the physical nature of the galaxies ([Khosroshahi et al 2004]). The parameter n in Equation (7) shows how the structure of a galaxy deviates from an isothermal sphere. I(r) = I0 /(1 + (r/Re )n )

(7)

I(r) = Ie exp −bn ((r/Re )n − 1)

(8)

2.1 Statistical Analysis The test of any model is ultimately statistical. From a maximum likelihood point of view how closely a model describes data can be based on taking the model to be true and considering the probability of the data being produced

Table 1: The parameters used in the GP regression of [Li et al 2004] Parameter Terminals Functions Crossover rate Mutation rate Population size Max no of generations Termination criterion Selection strategy Max Depth of individual expression Hit criterion Fitness criterion Max initial expression depth Max run time by the model ([Press et al 1992]). This approach is the basis of the χ2 test (Equation(9)). χ2 considers the error (σi ) attributed to each data point (di ) as being the standard deviation of a gaussian probability distribution around the model (mi ). χ2 =

X (di − mi )2 i

σi2

(9)

Thus the success of the fitness function in both GP and EP will be dependent on how well it approximates the χ2 test. In this respect the hits criterion exhibits two main weakness as an approximation of χ2 and therefore as a measure of model fitness. 1. It is discrete in nature which means it is unable to distinguish between differences in χ2 as long as the number of hits is the same i.e. a models proximity to a data point is disregarded beyond the boolean result of it existing within or beyond the bounds of the hits criterion. 2. Where the errors attributed to the data are not well approximated by a linear transformation upon the value of the data, the number of hits will be a poor approximation of χ2 i.e. if there exist data points within the set which have been significantly more/less accurately measured the fitness function will not take this into account. In light of this it would then seem that in this situation a fitness function based on maximum likelihood considerations would be suitable for this application of evolutionary computation.

3 Experiments and Results The application of fitness function based on maximum likelihood was applied to EP. The form of the fitness function is chosen to preserve the process of EP as a maximization procedure as opposed to minimization which is natural form of χ2 . This is done by simply using an inverted χ2 as the

Value r, R=[-10, 10] +, -, *, /, exp, sin, cos, log 0.9 0.01 6000 100 generation or time limit Tournament selection (size 6) 17 0.005 Hits - 0.01 * expression length 6 6 fitness function.To test the validity of the proposition that a maximum likelihood has significant practical advantages over a hits regime, an experimental comparison was performed upon 18 datasets from galaxies in the coma cluster as in [Li et al 2004]. The two methods were set to be identical except for the fitness function to ensure an unbiased procedure (see Table 2). The hits criterion was initialized at a value of 0.001 and was subject to re-evaluation once a maximum number of hits had been reached. The model chosen for the test was that inferred from the GP model regression of [Li et al 2004] Equation (7.). The three parameter model was needed to be extended to a four parameter model Equation (10) due to account for background light being present in the CCD image which was decomposed to form the data set. I(r) = I0 /(1 + (r/Re )n ) + K

(10)

In the performing of the parameter regression a set of errors acquired in the decomposition procedure is favored as to show the maximum likelihood based model’s ability to describe a set of errors of variable quality. This is not the case here and the nature of the errors used is that of a poisson statistic upon the counts (N) read from the galaxy (11) ([Barlow 1989]). σpoisson ≈

√ N

(11)

However this value is not known until after the fitting of the background parameter so in the EP parameter regression the poisson√statistic of the total counts value is taken. The order of N is approximately 10 times that of √ the background corrected value of N −√K although it is not a linear relationship. The value of N − K is used to calculate the values of χ2 This procedure, although not ideal, is far from unacceptable, the yielding errors of order √ N − K ≈ N × 10− 3. For each elliptical galaxies data set each procedure was run 5 times in order to get statistics on their respective performance. This is necessary as any evolutionary computational process is fundamentally random.

Table 2: The parameters used in the EP Parameter Mutation rate Magnitude of mutation Population size Max no of generations Termination criterion Selection strategy

Value 0.9 0.005 10000 200 Max generation Tournament selection (size 6)

3.1 Setup of Evolutionary Programming The EP process was that of an altered version of that used in [Li et al 2004]. The model parameters are represented within the EP by genomes which take real values. The genomes I0 , Re , and K were initialized randomly between -10 and 10, whilst the parameter n was initialized between -2 and 2 to account for the sensitivity of the power operation. This restriction in the initialization is not thought to restrict the process as values of the order of the isothermal value of 2 are expected. To accommodate the initialization procedure all the data is linearly normalized within the EP procedure such that they fall between values of 0 and 10. The nature of the mutation upon the genomes is Cauchy in order to search around fit values whilst maintaining diversity within the population ([Li et al 2004]). The magnitude of the mutation refers to the standard deviation (the standard fractional change in the genome value) of the probability distribution (see Table 2).

Figure 3: Fit using a maximum likelihood based method to the profile of galaxy D159s43

3.2 Results

greater than that of the mean. A very small number of solutions approximately 0.05 of the population returned incalculable values of χ2 as the parameter K was larger than N √ yielding a imaginary value of N − K, these points are not included in the results shown in Table 3. Although the majority of these points are the result of fitting by the fits method some are from the chi method. It is conjectured that these values of K arise from the discrepancies between the χ2 evaluation in the fitness evaluation and that in the The hits based method runs faster than the maximum likelihood based method which is expected from the nature of the respective fitness calculations. Fig. 3 and Fig. 4 show a fit by the maximum likelihood and hits based method to the galaxy D159s43, where the y-axis shows the brightness in counts and the x-axis is the semi-major axis in units of pixels. The χ2red for each are 0.12, and 20000 for Equation 3 and Equation 4, respectively.

Table 3 shows the statistics of the hits based method and the maximum likelihood based method from the 5 runs. χ ¯2red is the mean value of Function 12, where the degrees of freedom is given by the number of data points minus number of parameters, and σχ2red is the corresponding standard deviation. χ2red =

χ2 degreesof f reedom

(12)

If a value of χ2red < 1 is produced, the model is considered as a good fit and is on average within the every data point’s error bar. The average χ ¯2red on the hits based and maximum likelihood based method are 2260 and 2.41 respectively. It is confirmed that EP performed with the hits criterion returns a near maximum hits on every occasion ([Li et al 2004]). However this does not directly correspond to a high value of χ2 as shown in Table 3. The set of solutions from the hit based method fall into 2 categories; 1. Those which are physical and to a certain extent approximate χ2 ( χ ¯2red =32.6 ). 2. Those which are unphysical describing profiles which have negative brightness at certain radii (Fig.4) (χ ¯2red = 12500). The hits method as well as producing high values of χ2red , the standard deviation upon the χ2red is consistently

Figure 4: Fit using a hits based method to the profile of galaxy D159s43

4 Discussions and Conclusions 4.1 Interpretation The results of the analysis suggest that a hits based fitness function causes EP to converge in one case to poorer solutions than that found by the maximum likelihood based method and in the other cases to give solutions which don’t physically make sense (see Fig. 4). It is inferred that this arises when the hits criterion is given the higher possibility

Table 3: The comparative performance of each EP method Galaxy

Hits method σχ2red 834 1940 501 1250 6250 8980 2640 3750 2.75 4.41 132 276 3.37 2.17 760 1770 7.07 8.34 143 346 70.2 144 16300 21500 2860 6360 44.5 38.4 2140 4680 190 417 214 756 2.05 2.35 12.6 16.0

χ ¯2red IC843 D140 D159s43 D159s83 D159s89 D160s23 D160s27 D160s37 D160s159 IC3900 IC4011 IC4021 IC4012 IC4051 D159s41 IC4133 IC4673 IC4789 IC4807

of increasing fitness by “hitting” additional data points with the effect that one or some of the data points will be sacrificed from the fitting procedure all together (e.g., fitting the bulk of a galaxy well but appending negative brightness to the core). Given such a possibility, the hits method is more likely to accept some profile models that are generally poor, whereas the maximum likelihood would have rejected them. Thus, populations evolved by the hit method would become infested with many more poor profiles resulting in the huge discrepancy in terms of χ2red values, as compared to the maximum likelihood method’s. The results in Table 3 show that a maximum likelihood based fitness function found consistently better solutions than one based on a hits criterion in terms of χ2red values. In a scenario like this where the errors used in the regression of the parameters and the evaluation of the χ2red the power of the process is limited by it’s relationship to the errors used in the evaluation. Overall, although a hits based fitness function may be successful in some cases, the flaws in this context have been shown. A maximum likelihood approach shows a way of improving the performance of EP in this case and thus indicates promises for improving the performance of GP in the future if applied. 4.2 Discussion and Further Work In a situation such as this, where there is knowledge about the parameters which should be found, i.e., the parameters must be positive or they cannot represent the decomposed profile of the galaxy. This allows constrains to be applied to the search space which reduces and thus removes local minima which EP would otherwise find. EP shows itself as being a viable way of fitting parameters in this situation, however the time taken in the fitting is large in comparison

χ2 method σχ2red 0.507 0.0731 0.898 1.00 0.413 0.289 0.391 0.0152 0.158 0.114 0.317 0.218 1.08 2.81 0.279 0.182 1.05 1.08 0.375 0.0416 0.437 0.603 3.94 3.63 19.8 43.6 1.27 0.483 1.35 1.15 0.546 0.560 0.137 0.0349 0.253 0.245 13.7 14.9 χ ¯2red

to traditional methods of parameter regression. EP is known to be less susceptible to falling into local minima and less sensitive to it’s initialization. In choosing whether EP is a necessary method to use to fit parameters to a model two considerations need to be made. 1. How complex is the model: how many parameters does it have and what is there functional relationship. This pertains to the complexity of the search space i.e the space of χ2 (a1 , a2 ..an ). 2. How well can the parameters be initialized; a hill climbing algorithm may not be susceptible to falling into local minima if guess values near to the global solution can be produced. There is adequate evidence to suggest that the addition of a maximum likelihood based fitness function in the process of symbolic regression using GP may enhance it’s performance. A strength of GP is that it is a random process which means that many non-intuitive models can be formulated and tested relatively quickly. There is however much more that can be done to maximize the efficiency of GP symbolic regression in this case. Three factors are noted which limit the accuracy of models produced by GP: 1. The phenomenon of bloat: The tendency of GP to produce long expressions. this produces complex models with much insignificant substructure. 2. The fundamental accuracy of the data being fitted to. 3. The extent to which the constant terminal values in the GP fit the data. GP has the tendency to produce models which are overly complex with many terms which have little impact to the

overall form of the model. The success of various methods to combat such effects in symbolic regression are given in [Silva and Almeida 2003], where a combination of lexicographical parsimony pressure and dynamic maximum tree depth was found to reduce the amount of redundant code the most. If the assumption that data sets can be described by the same model is made then the limitation of the errors a single data set can be avoided by performing symbolic regression upon multiple data sets. The fitness of a model is dependent on it’s functions and it’s terminal values, within GP the parameters to a model are not found but very roughly approximated therefore the ability to find a new model is limited by this fact. If parameters could be fitted during the GP process then this would improve the fitness of the generated models.To achieved this without reducing the probability of new expressions would increase the power of the GP process in this case. GP allows any syntactically valid expression to be tested as a possible solution therefore allowing expressions such as Equation 3 to be produced, which are the same as Equation 5. This is another form of redundancy within GP; the existence of expressions which are semantically meaningless. To shrink the search space, [Babovic and Keijzer 2000] built the semantics which form a theory into their GP symbolic regression. However there may be subtle advantages to not restricting the regression in this way. The development of the application of evolutionary computation to the profiles of galaxies inevitable will have impact on other areas of physical analysis. Advantages of using the maximum likelihood approach over the hit-based approach certainly shed light on the analysis of other forms of physical data using evolutionary computation techniques.

Acknowledgments Thanks go to Somak Raychaudhury for his supervision and help. Datasets were supplied by Haibib Khosroshahi. EP environment was a modified version by Jin Li, based on ECJ by Sean Luke. Fig.1 is taken from www.astr.ua.edu/gifimages/m60b.html. Fig.2 is taken from http://jmvidal.cse.sc.edu/talks/geneticalgorithms/allslides.xml.

Bibliography [Babovic and Keijzer 2000] Babovic, V. and Keijzer, M. 2000 Data to knowledge - The new scientific paradigm. Water Industry Systems 3–14. Exeter, United Kingdom. [Barlow 1989] Barlow, R. 1989 Statistics. Wiley. [Brown et al 2003] Brown, R.J.N., Forbes, D.A., Silva, D., Helsdon, S.F., Ponman, T.J., Hau, G.K.T., Brodie, J.P., Goudfrooij, P. Bothun, G. 2003 Near-Infrared Imaging of Ellipticals: Surface Brightness Profiles and Photometry. Mon. Not. Roy. Astron. Soc., 341 p 747-759. [Graham, et al 1996] Graham et al. 1996 Brightest cluster galaxies profile shapes The Astrophysical Journal. 465 p 534-547.

[Khosroshahi et al 2004] Khosroshahi H, Raychaudhury S, Ponman T, Miles T, Forbes D. 2004 Scaling relations in early type galaxies belonging to groups Mon. Not. Roy. Astron. Soc. 349 p 527-534 [Koza 1996] Koza, J. 1996 Future Work and Practical Applications of Gernetic Programming. Handbook of Evolutionary Computaiton. [Langdon and Poli 2002] Langdon,W, B and Poli, R. 2002 Foundations of Genetic Programming. Springer-Verlag. [Li et al 2004] Li, J. Yao, X. Frayn, C, Khosroshahi, H. and Raychaudhury, S. 2004, An Evolutionary Approch to Modelling Radial Brightness Distributions in Elliptical Galaxies. Proceedings of the 8th International Conference on Parallel Problem Solving from Nature, PPSN VIII, LNCS 3242, 591-601. Berlin: Springer-Verlag. [Press et al 1992] Press, W, H, Teukolsky, S, A Vetterling, W, T, Flannery, B, P 1992 Numerical Recipes in C++ (2nd Edition). Cambridge. [Rood et al 1972] Rood, h, J; Page, T, L; Kinter, E, C; King, I, R; 1972 The Structure of the Coma Cluster of Galaxies. ApJ 175 p627. [Silva and Almeida 2003] Silva, S, Almeida, J, 2003 Dynamic Maximum Tree Depth - A Simple Technique for Avoiding Bloat un Tree-Based GP. Lecture Notes in Computer Science 2724:1776-1787 [Yao, X 1996] Yao, X, 1996 An Overview of Evolutionary Computation. Chinese Journal of Advanced Software Research Volume 3 No 1.

Theoretical Foundations of Evolutionary Computation