Generic GA-based Meta-level Parameter Optimization for Pattern Recognition Systems

Ernest Lumanpauw, Michel Pasquier, and Richard J. Oentaryo, Student Member, IEEE

The authors are with the Center for Computational Intelligence, School of Computer Engineering, Nanyang Technological University, 639798 Singapore (e-mail: asmbpasquier@ntu.edu.sg).

Abstract—This paper proposes a novel generic meta-level parameter optimization framework to address the problem of determining the optimal parameters of pattern recognition systems. The proposed framework is currently implemented to control the parameters of neuro-fuzzy systems, a subclass of pattern recognition systems, by employing a genetic algorithm (GA) as the core optimization technique. Two neuro-fuzzy systems, namely the Generic Self-Organizing Fuzzy Neural Network realizing Yager inference (GenSoFNN-Yager) and the Reduced Fuzzy Cerebellar Model Articulation Computer realizing Yager inference (RFCMAC-Yager), are employed as test prototypes to evaluate the proposed framework. Experimental results on several classification and regression problems demonstrate the efficacy and robustness of the proposed approach.

I. INTRODUCTION

PARAMETERS are pervasive in pattern recognition systems. Apart from the nature of the problem itself, the performance of any pattern recognition system depends heavily on its parameter settings. Given a task to be solved, the choice of parameters dictates the internal operations of the system, which ultimately determine the output produced and the overall performance of the system [1]. For instance, the multi-layer perceptron (MLP) [2] has parameters such as the learning rate, the number of hidden layers, the number of hidden neurons, and the activation function type, which together determine the learning process and the overall structure of the network. In practical situations, however, end-users or operators usually do not understand the effects of the parameters on the structure and internal operations of the system. As a result, given input data to be processed, they need to repetitively apply different parameter settings by brute force or trial-and-error to achieve desirable results. To resolve this problem, various optimization techniques have been embedded inside pattern recognition systems to internally optimize their structure and learning process, and to tune their weights, with varying results. Typical examples are GA-NN hybrids [3-5], combinations of a genetic algorithm (GA) [6, 7] and a neural network (NN) [8]. In these methods, the GA is usually employed to internally assist the learning process, tune the structure, or optimize the internal operations of the NN. These approaches, however, are not generic, as they are specifically tailored for a certain type of system and are highly dependent on its internal architecture. In addition, these approaches are not modular, since modifying the optimization technique requires probing into and/or modifying the system architecture.

In this paper, we propose a generic and modular approach for parameter optimization. In contrast to the aforementioned approaches, the optimization technique is external to the pattern recognition system and does not need to know the internal operations of the system, and vice versa; that is, each views the other as a black box. In this context, the optimization technique can be thought of as the "operator" of the system, working at the meta level to replace the human role of manually configuring the system parameters. Our present work focuses on implementing the proposed optimization framework using a GA as the core optimization technique and applying it to a subclass of pattern recognition systems called neuro-fuzzy systems (NFS) [9], which synergize the learning and adaptation capabilities of NNs with the human-like reasoning mechanisms of fuzzy systems. The GA, inspired by the Darwinian survival-of-the-fittest theory [10], is a global search technique widely used to find optimal or approximate solutions to many real-world problems. A GA can quickly locate high-performance regions in extremely large and complex search spaces, and is less likely to be trapped in local optima than other optimization techniques such as Simulated Annealing [11] and the Simplex Method [12]. For experimental purposes, two instances of neuro-fuzzy systems, termed the Generic Self-organizing Fuzzy Neural Network realizing Yager inference (GenSoFNN-Yager) [13] and the Reduced Fuzzy Cerebellar Model Articulation Computer based on Yager inference (RFCMAC-Yager) [14], are employed to evaluate the proposed framework. The former is a global learning memory system which employs online sequential learning [13], while the latter is a local learning memory system which uses a batched learning procedure [14].

This paper is organized as follows. Section II describes the proposed meta-level parameter optimization method. Section III discusses the implementation of the proposed framework for neuro-fuzzy systems. Section IV subsequently presents the experimental results and analysis. Section V finally concludes the paper and outlines future work.

II. META-LEVEL PARAMETER OPTIMIZATION

In general, a pattern recognition system at its operational level (or from the end-user's point of view) can be viewed as a black-box system, because its internal operations are opaque to the users. Such a system is usually defined in terms of its input and output characteristics, as shown in Fig. 1. The input section of the system typically consists of two parts: the input data x, which form the main information-processing inputs to the system, and the parameter vector p, which influences the internal operations or working mechanisms of the system. Since the system performance depends on both the input data and the parameter settings, the output y can be expressed as a function of the input data and the parameter vector: y = f(x, p).

Fig. 1. Black-box representation of a system (input x and parameter vector p enter the system f(.), which produces output y).

The representation given in Fig. 1 belongs to the category of open-loop systems [15]. Despite its simple structure, one major disadvantage of this architecture is that it has no feedback loop to control its output and determine whether it has achieved the desired (optimal) goal. As a result, in order to achieve the desired performance, one needs to repetitively apply different parameter settings by trial-and-error. To resolve this problem, we can transform the open-loop representation of Fig. 1 into a closed-loop system [15] by incorporating an optimization technique that provides feedback, as depicted in Fig. 2. In this approach, the output y remains a function of the input data x and the parameter vector p, i.e., y = f(x, p). Consequently, the process of finding optimal parameters can be viewed as a continuous communication process between the optimization technique and the system. The optimization technique sends parameters p to the system to process the input data. In exchange, the system returns performance metrics g(y) (e.g., total error, accuracy, efficiency) to the optimization technique for evaluation. The process continues until certain termination criteria (e.g., a maximum number of evaluations or a time limit) are satisfied.

Fig. 2. Meta-level parameter optimization framework (the optimization technique closes the loop by receiving the output metric g(.) and issuing new parameter vectors p).

III. IMPLEMENTATION TO NEURO-FUZZY SYSTEM

A. Architecture

For comparison purposes, a schematic diagram depicting the conventional trial-and-error framework used to find the best parameter set is presented in Fig. 3. The optimization begins with the user specifying the initial parameter set p, followed by training the network using the training set x_t and the desired training output y_t* (for supervised learning). Afterwards, the trained network is tested on the testing set x_s to obtain the network output y_s = f(x_s, p). Performance metrics g(y_s*, y_s) (e.g., mean squared error (MSE), correlation) can then be derived from the desired testing output y_s* and the NFS output y_s. This process is terminated when a satisfactory performance is reached.

Fig. 3. Neuro-fuzzy system without parameter optimization (an untrained black-box NFS is trained on x_t and y_t*, tested on x_s to produce y_s, and evaluated by g(.) against y_s*).
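To make the train-and-test evaluation of Fig. 3 concrete, the sketch below expresses a single evaluation of a candidate parameter vector p against a black-box system. The system_factory callable and its train/predict methods are placeholder names standing in for any concrete NFS implementation; they are not part of GenSoFNN-Yager or RFCMAC-Yager.

```python
import numpy as np

def evaluate(system_factory, p, x_t, y_t, x_s, y_s_desired):
    """Train a fresh black-box system with parameter vector p and score it.

    system_factory: callable returning an untrained system that exposes
                    train(x, y) and predict(x) -- a placeholder interface.
    Returns the performance metric g(y_s*, y_s); here, MSE on the test set.
    """
    system = system_factory(p)           # untrained system configured with p
    system.train(x_t, y_t)               # supervised learning on the training set
    y_s = system.predict(x_s)            # outputs on the testing set, y_s = f(x_s, p)
    mse = np.mean((np.asarray(y_s_desired) - np.asarray(y_s)) ** 2)
    return mse                           # g(y_s*, y_s)
```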


Fig. 4. Neuro-fuzzy system with GA-based parameter optimization (the trial-and-error loop of Fig. 3 with a GA closing the loop between the performance metric g(.) and the parameter vector p).

Employing the GA as the meta-level parameter optimization technique yields in turn a new architecture, as illustrated in Fig. 4. The continuous communication process between the GA and the neuro-fuzzy system is summarized by the flowchart in Fig. 5. The process starts with the initialization of n candidate parameter vectors by the GA. Each parameter vector is sent to the neuro-fuzzy system for training and testing in order to obtain its fitness value, i.e., g(y_s*, y_s). When all parameter vectors have been evaluated, the GA operations (i.e., selection, mutation, crossover, replacement) commence. This cycle is repeated until a termination criterion (e.g., a maximum number of generations) is reached.

B. Chromosome Representation

Fig. 6 shows the encoding of the discrete incremental clustering (DIC) parameters adopted by the GenSoFNN-Yager system in the form of a real-valued chromosome. They include the plasticity STEP, the input and output SLOPE, the fuzzy membership threshold, and the cluster annexation threshold [16]. In this representation, each gene corresponds to precisely one parameter. Likewise, the parameters of the RFCMAC-Yager are encoded into a real-valued chromosome in the form shown in Fig. 7. For interested readers, further details about the definitions and effects of the DIC parameters can be found in [16].
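The cycle of Fig. 5 can be written down compactly. The sketch below is an illustration only: a plain generational GA with truncation selection, one-point crossover, and Gaussian mutation stands in for DemeGA, the gene bounds follow Figs. 6 and 7, and the fitness function is supplied externally (e.g., the train-and-test evaluation sketched in Section III.A).

```python
import random

# Gene bounds for GenSoFNN-Yager, following Fig. 6: STEP (0, 1.0], input SLOPE,
# output SLOPE, membership threshold, annexation threshold, all in [0.0, 1.0].
# The small positive lower bound on STEP is an illustrative stand-in for "(0, 1.0]".
BOUNDS = [(1e-6, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]

def random_vector():
    return [random.uniform(lo, hi) for lo, hi in BOUNDS]

def mutate(p, rate=0.01):
    # Perturb each gene with small probability and clip back into its bounds.
    return [min(hi, max(lo, g + random.gauss(0, 0.1))) if random.random() < rate else g
            for g, (lo, hi) in zip(p, BOUNDS)]

def crossover(a, b):
    cut = random.randrange(1, len(a))    # one-point crossover
    return a[:cut] + b[cut:]

def optimize(fitness, pop_size=50, generations=20):
    """fitness(p) trains and tests the NFS and returns a score to maximize,
    e.g. classification accuracy or the negative of a prediction error."""
    population = [random_vector() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]               # truncation selection (simplified)
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children                 # elitist replacement
    return max(population, key=fitness)
```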

Fig. 5. GA-based meta-level parameter optimization for the neuro-fuzzy system (flowchart of the GA-NFS communication cycle).

Fig. 6. Chromosome representation of the GenSoFNN-Yager parameters: STEP (0, 1.0], input SLOPE [0.0, 1.0], output SLOPE [0.0, 1.0], membership threshold [0.0, 1.0], annexation threshold [0.0, 1.0].

Fig. 7. Chromosome representation of the RFCMAC-Yager parameters: STEP (0, 1.0], input SLOPE [0.0, 1.0], output SLOPE [0.0, 1.0], membership threshold [0.0, 1.0].

C. Fitness Function Formulation

Consider a standard optimization (maximization) problem in the following form:

Maximize f(x) subject to x ∈ X    (1)

where f(x) is the objective function, x is the decision vector, and X is the feasible region in the decision space. The problem of finding the optimal parameters of a neuro-fuzzy system can be cast into the same form:

Maximize g(y_s*, y_s) subject to LB_i ≤ p_i ≤ UB_i, i = 1, 2, ..., I    (2)

where g(y_s*, y_s) is the performance metric, p_1, ..., p_I ∈ p are the parameters of the neuro-fuzzy system, LB_i and UB_i are the lower and upper bounds of parameter i, and I is the total number of parameters in the neuro-fuzzy system. In general, there are two classes of supervised learning problems: classification and regression. They concern pattern recognition problems with discrete- and continuous-valued outputs respectively. For a classification problem, the optimal performance corresponds to the minimum number of misclassified instances, while the best performance for a regression problem is achieved when the error between the desired outputs and the NFS outputs is minimal. These can subsequently be formulated as:

Minimize MSE = (1/M) Σ_{m=1}^{M} (y_sm* − y_sm)^2    (3)
subject to LB_i ≤ p_i ≤ UB_i, i = 1, 2, ..., I

where M is the number of testing samples, y_sm* is the desired output of the mth testing sample, and y_sm is the output of the mth testing sample produced by the NFS.
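Formulation (2)-(3) maps directly onto a small fitness routine. The sketch below assumes a placeholder train_and_test(p) callable that trains the NFS with parameters p and returns the desired and predicted testing outputs; handling the bound constraints by clipping is an illustrative choice, not the paper's prescribed mechanism.

```python
import numpy as np

def clip_to_bounds(p, lower, upper):
    """Enforce LB_i <= p_i <= UB_i from Eq. (2) by clipping."""
    return np.minimum(np.asarray(upper), np.maximum(np.asarray(lower), np.asarray(p)))

def mse_fitness(p, train_and_test, lower, upper):
    """Return -MSE so that maximizing the fitness minimizes Eq. (3).

    train_and_test(p) is assumed to train the NFS with parameters p and
    return (y_s_desired, y_s) over the M testing samples.
    """
    p = clip_to_bounds(p, lower, upper)
    y_star, y_s = train_and_test(p)
    mse = np.mean((np.asarray(y_star) - np.asarray(y_s)) ** 2)   # Eq. (3)
    return -mse
```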

D. Complexity Analysis

The process of finding the optimal parameters of a neuro-fuzzy system requires O(gpτ) computational time, where g, p, and τ represent the number of generations, the size of the parameter vector population, and the computational time required for a single train-and-test run, respectively. In other words, the computational time is equal to the time required to train and test the NFS g × p times. It should be noted, however, that τ is not necessarily uniform; it may vary depending on the parameter vector. For some parameter vectors the training and testing time can be very short, but this may not be the case for others.

IV. EXPERIMENTAL RESULTS AND DISCUSSION

This section presents some of the experiments conducted to validate the effectiveness of the GA-based meta-level parameter optimization of GenSoFNN-Yager and RFCMAC-Yager for classification and regression problems.


The datasets used in the experiments comprise Fisher's Iris classification dataset [17], the Mackey-Glass time series dataset [18], and the Pima Indians Diabetes dataset [19]. The performance of the optimized NFSs is compared against that of the systems with their default parameters. The latter correspond to parameter sets determined heuristically from empirical performance observations on various benchmark problems. Thus, it is not surprising that the default parameters often yield an "acceptable" performance. For our simulation studies, we employ a specific GA implementation termed DemeGA [20]. It is chosen due to its comparatively good performance in several benchmarking experiments we conducted previously. The parameter settings of the DemeGA are listed in TABLE I.

TABLE I
PARAMETER SETTINGS OF DEMEGA
Parameter Name                              Value
Maximum number of generations               20
Population size                             50
Crossover probability (2-point crossover)   0.9
Mutation probability                        0.01
Elitism                                     On
Migration rate                              0.1%
Replacement rate                            0.5%
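The settings in TABLE I map naturally onto a small configuration record. The sketch below is a neutral Python container for these values rather than a reproduction of the GAlib/DemeGA API; how each field is consumed depends on the particular GA implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DemeGASettings:
    """GA settings used in the experiments, as listed in TABLE I."""
    max_generations: int = 20
    population_size: int = 50
    crossover_probability: float = 0.9   # 2-point crossover
    mutation_probability: float = 0.01
    elitism: bool = True
    migration_rate: float = 0.001        # 0.1%
    replacement_rate: float = 0.005      # 0.5%

settings = DemeGASettings()
```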

A. Iris Classification

The Iris dataset [17], originally conceived by Fisher in 1936, is used to provide a pedagogical illustration of the proposed parameter optimization framework. The dataset comprises 150 instances of iris flowers, evenly distributed over three classes: iris-setosa, iris-versicolor, and iris-virginica. Each class contains 50 vectors, and each vector has four features: sepal length X1, sepal width X2, petal length X3, and petal width X4 (all measured in centimeters).

TABLE II
OPTIMIZED NETWORK PARAMETERS FOR IRIS CLASSIFICATION
Parameters             GenSoFNN-Yager   RFCMAC-Yager
Input SLOPE            0.1              0.3
Output SLOPE           0.1              0.8
STEP                   0.5              0.1
Membership Threshold   0.4              0.5
Annex Threshold        0.8              -

In this experiment, the threefold cross-validation (CV) method is used to evaluate the system performance. TABLE II shows the optimized parameter settings of the GenSoFNN-Yager and RFCMAC-Yager systems. The obtained parameter settings are used to train and test both networks. Results obtained using the optimized parameter settings are then compared with those obtained using the default parameters. Comparisons are based on the following performance metrics: classification accuracy (i.e., the proportion of correctly classified instances), the number of fuzzy rules identified, and the number of fuzzy clusters for each input dimension (measuring the clustering efficiency).

TABLE III
COMPARISON BETWEEN DEFAULT AND OPTIMIZED GENSOFNN-YAGER FOR IRIS CLASSIFICATION
CV   Criteria     Default    Optimized   Improvement
1    Accuracy     78.00%     96.00%      +23.08%
     # Rules      31         13          -58.06%
     # Clusters   6:6:5:5    4:4:4:3     N/A
2    Accuracy     92.00%     98.00%      +6.52%
     # Rules      26         12          -53.85%
     # Clusters   7:7:4:5    4:4:3:3     N/A
3    Accuracy     74.00%     94.00%      +27.03%
     # Rules      30         12          -60.00%
     # Clusters   7:7:4:5    5:4:3:3     N/A

TABLE IV
COMPARISON BETWEEN DEFAULT AND OPTIMIZED RFCMAC-YAGER FOR IRIS CLASSIFICATION
CV   Criteria     Default    Optimized   Improvement
1    Accuracy     86.00%     96.00%      +11.63%
     # Rules      38         6           -84.21%
     # Clusters   6:6:5:5    4:0:0:4     N/A
2    Accuracy     94.00%     100%        +6.38%
     # Rules      50         7           -86.00%
     # Clusters   7:7:4:5    5:0:3:3     N/A
3    Accuracy     96.00%     98.00%      +2.08%
     # Rules      54         7           -87.04%
     # Clusters   7:7:4:5    5:0:3:3     N/A

TABLE III shows the performance comparison between the default and optimized GenSoFNN-Yager systems for Iris classification. From the table, it is evident that for all three CVs the optimized GenSoFNN-Yager achieves a higher classification accuracy with far fewer fuzzy rules and fuzzy clusters than the default GenSoFNN-Yager. The optimized network reduces the number of rules by 58.06%, 53.85%, and 60.00% for CV1-3 respectively, while at the same time improving the classification accuracy by 23.08%, 6.52%, and 27.03% for CV1-3 respectively. In terms of the number of fuzzy clusters, the default network produces 4-7 fuzzy clusters for each input, while the optimized network yields only 3-5 fuzzy clusters. This signifies that the optimized parameters not only significantly simplify the network but also improve its generalization ability.

TABLE IV shows the performance comparison between the default and optimized RFCMAC-Yager for the Iris dataset. As with the GenSoFNN-Yager, it is evident that the optimized RFCMAC-Yager achieves higher accuracy rates for all three CVs, with significantly fewer fuzzy rules and fuzzy clusters than the default RFCMAC-Yager. The optimized network reduces the number of rules by 84.21%, 86.00%, and 87.04%, and improves the classification accuracy by 11.63%, 6.38%, and 2.08% for CV1-3 respectively. In addition, the optimized parameters allow the feature reduction capability of RFCMAC-Yager to become evident. Previously, using the default parameters, no input feature was removed by the network. Using the optimized parameters, however, we observe that in CV1 the input dimensions sepal width (X2) and petal length (X3) are removed, and in CV2 and CV3 input dimension X2 is removed, yielding a simpler and more efficient network structure.

TABLE V summarizes the benchmarking results of the optimized GenSoFNN-Yager and RFCMAC-Yager for the Iris dataset against other popular pattern recognition systems, including the k-nearest neighbor classifier (k-NN) [21], the multi-layer perceptron (MLP) [2], a genetic algorithm-based classifier (GAC) [22], the fuzzy adaptive learning control network based on adaptive resonance theory (FALCON-ART) [23], the pseudo outer product fuzzy neural network based on the Yager inference scheme (POPFNN-Yager) [24], and FCMAC-Yager [25].

TABLE V
PERFORMANCE COMPARISON WITH OTHER METHODS FOR IRIS CLASSIFICATION
Method                   Evaluation   # Rules   Accuracy
k-NN (k = 3)             3-CV         N/A       95.33 ± 1.15%
MLP (7 hidden neurons)   3-CV         N/A       94.00 ± 3.46%
GAC                      50T-50E      N/A       96.00%
FALCON-ART               3-CV         3         75.76 ± 7.54%
POPFNN-Yager             50T-50E      56        89.93%
FCMAC-Yager              6-CV         286       96.00 ± 3.06%
GenSoFNN-Yager (def)     3-CV         29        81.33 ± 9.45%
GenSoFNN-Yager (opt)     3-CV         14        96.00 ± 2.00%
RFCMAC-Yager (def)       3-CV         48        92.00 ± 5.29%
RFCMAC-Yager (opt)       3-CV         7         98.00 ± 2.00%
def = default, opt = optimized, 50T-50E = 50% training-50% testing, k-CV = k-fold cross-validation

The average classification accuracy of the default GenSoFNN-Yager is 81.33 ± 9.45%, which is the lowest among the compared systems except for the FALCON-ART. Meanwhile, the default RFCMAC-Yager achieves an average classification accuracy of 92.00 ± 5.29%, which is inferior to that of k-NN, MLP, GAC, and FCMAC-Yager. Using the optimized parameters, however, the average accuracy of GenSoFNN-Yager rises to 96.00 ± 2.00%, which matches or exceeds that of the other architectures, while the number of rules decreases from 29 to 14, making the network more efficient. Likewise, the optimized RFCMAC-Yager increases the average classification accuracy from 92.00 ± 5.29% to 98.00 ± 2.00%, which is the best among all the systems compared, and the total number of rules reduces significantly from 48 to 7. These results emphasize the importance of finding the optimal parameter settings of a neuro-fuzzy system. They also show that the meta-level parameter optimization process not only eliminates the hassle of finding optimal parameters by trial-and-error, but also helps unleash the inherent capabilities of the neuro-fuzzy system, resulting in significant improvements to its overall performance.

B. Mackey-Glass Time Series

The Mackey-Glass (MG) time series [18] is an example of a chaotic time series. It is based on the Mackey-Glass differential equation and is widely regarded as a benchmark for comparing the generalization ability of different methods. The following time-delay ordinary differential equation is used to generate the series:

dx(t)/dt = −b x(t) + a x(t − τ) / (1 + x(t − τ)^10)    (4)

where a = 0.2, b = 0.1, and τ = 30. The problem can be formulated as: given the values x(t − m), x(t − m + 1), ..., x(t − 1), determine x(t − 1 + n), where m and n are fixed positive integers and t is the series index. Throughout all experiments, 6 previous values (i.e., m = 6) are used as inputs to predict the next step (set 1), 2 steps ahead (set 2), and 4 steps ahead (set 3), i.e., n = 1, n = 2, and n = 4 respectively. Since the data form a time series, the CV approach cannot be used because it would break the continuity of the data. Therefore, during the experiments, a holdout validation approach is used with 700 vectors as the training set and 300 vectors as the testing set (a small numerical sketch of this setup is given after TABLE VI below).

TABLE VI
OPTIMIZED NETWORK PARAMETERS FOR MG TIME SERIES SET 1-3
                       GenSoFNN-Yager         RFCMAC-Yager
Parameters             Set1   Set2   Set3     Set1   Set2   Set3
Input SLOPE            0.1    0.1    0.1      0.1    0.1    0.1
Output SLOPE           0.1    0.2    0.1      0.1    0.1    0.2
STEP                   0.6    0.6    0.6      0.6    0.3    0.3
Membership Threshold   0.6    0.6    0.5      0.6    0.5    0.6
Annex Threshold        0.6    0.7    0.9      -      -      -

TABLE VI presents the optimized parameters of the GenSoFNN-Yager and RFCMAC-Yager systems for MG time series prediction. The various combinations of parameter settings listed in this table illustrate the adaptive capability of the meta-level parameter optimization technique to produce (near) optimal parameter settings for different problem sets. The obtained parameter settings are used to train and test both networks. Results obtained using the optimized parameter settings are then compared to those obtained using the default parameter settings.
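As referenced above, the series in (4) can be generated numerically before forming the 6-lag input vectors and the 700/300 holdout split. The sketch below uses simple Euler integration with a = 0.2, b = 0.1, and τ = 30; the step size, initial history, and discarded transient length are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np

def mackey_glass(n_samples, a=0.2, b=0.1, tau=30, x0=1.2, dt=1.0, discard=500):
    """Euler integration of dx/dt = -b*x(t) + a*x(t-tau) / (1 + x(t-tau)**10)."""
    delay = int(tau / dt)
    x = [x0] * (delay + 1)                        # constant initial history
    for _ in range(n_samples + discard):
        x_tau = x[-delay - 1]
        x.append(x[-1] + dt * (-b * x[-1] + a * x_tau / (1.0 + x_tau ** 10)))
    return np.array(x[-n_samples:])               # drop the initial transient

def make_lagged(series, m=6, n=1):
    """Inputs x(t-m), ..., x(t-1); target x(t-1+n), as in Section IV.B."""
    X, y = [], []
    for t in range(m, len(series) - n + 1):
        X.append(series[t - m:t])
        y.append(series[t - 1 + n])
    return np.array(X), np.array(y)

series = mackey_glass(1200)
X, y = make_lagged(series, m=6, n=1)              # set 1: one-step-ahead prediction
X_train, y_train = X[:700], y[:700]               # holdout: 700 training vectors
X_test, y_test = X[700:1000], y[700:1000]         # 300 testing vectors
```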

Two metrics are used to measure the system performance: Pearson's product-moment correlation (indicating the relationship between the actual and the predicted outputs) and the root mean squared error (RMSE) (denoting the prediction accuracy). TABLE VII summarizes the prediction results obtained for the Mackey-Glass time series dataset using 6 previous values for single-step, two-step, and four-step predictions, comparing the performance of the default and optimized GenSoFNN-Yager. It is seen that the optimized GenSoFNN-Yager outperforms the default system. In particular, the performance improvement is shown by the reduction of the prediction error (RMSE) by 59.54%, 57.05%, and 40.46% for sets 1 to 3 respectively. Correspondingly, the correlation values improve, from a minor increase of 3.69% up to a significant increase of 10.19%.

TABLE VII
COMPARISON BETWEEN DEFAULT AND OPTIMIZED GENSOFNN-YAGER FOR MG TIME SERIES
Set   Criteria      Default   Optimized   Improvement
1     Correlation   95.80%    99.33%      +3.69%
      RMSE          0.0692    0.0280      -59.54%
2     Correlation   94.05%    98.40%      +4.63%
      RMSE          0.0970    0.0417      -57.07%
3     Correlation   86.35%    95.15%      +10.19%
      RMSE          0.1196    0.0712      -40.46%

TABLE VIII
COMPARISON BETWEEN DEFAULT AND OPTIMIZED RFCMAC-YAGER FOR MG TIME SERIES
Set   Criteria      Default   Optimized   Improvement
1     Correlation   97.37%    99.35%      +2.03%
      RMSE          0.0572    0.0266      -53.37%
2     Correlation   94.77%    98.75%      +4.21%
      RMSE          0.0842    0.0369      -56.12%
3     Correlation   87.68%    96.41%      +9.95%
      RMSE          0.1227    0.0615      -49.90%

As before, the optimized RFCMAC-Yager outperforms the default system, as can be verified from TABLE VIII. The RMSE values decrease by 53.37%, 56.12%, and 49.90% for sets 1 to 3 respectively, resulting in correlation improvements ranging from a minor increase of 2.03% to a significant increase of 9.95%. From TABLE VII and TABLE VIII, it is observed that the prediction performance of both GenSoFNN-Yager and RFCMAC-Yager degrades when moving from single-step prediction (set 1) to multi-step prediction (sets 2 and 3). These results are expected, since the prediction difficulty increases with the number of prediction steps. In addition, as observed in set 1 of both tables, the closer the performance metrics are to their optimal/desired values, the less evident the improvement resulting from the optimization process. Conversely, performance metrics that are far from their optimal values provide more room for improvement during the parameter optimization process, as observed in set 3 of both tables. It can therefore be inferred that there is an inverse relationship between the amount of improvement resulting from the optimization process and the performance metrics obtained from the system using its default parameters. An improvement of 2-4% may look small or insignificant in magnitude. However, in many real-world time series prediction tasks, e.g., stock price prediction, more accurate predictions can bring better insights, which could eventually translate into significant value in terms of trading profits or investment returns.

C. Pima Indians Diabetes

The Pima Indians Diabetes dataset [19] contains data of female patients (at least 21 years old) of Pima Indian heritage. Using this dataset, we are interested in investigating the effectiveness of the parameter optimization on a more complex, real-life problem. A total of 768 cases has been collected, 500 (65.1%) healthy and 268 (34.9%) positive for diabetes. Each sample consists of eight attributes: number of pregnancies, plasma glucose concentration, diastolic blood pressure (mmHg), triceps skin fold thickness (mm), two-hour serum insulin (μU/ml), body mass index (kg/m2), and diabetes pedigree function, plus age (years) and one class label.

TABLE IX shows the optimized parameters of the GenSoFNN-Yager and RFCMAC-Yager systems for the Pima Indians Diabetes dataset using the threefold CV method. The obtained parameter settings are used to train and test both networks. The results of the optimized networks are then compared to those obtained by the default networks based on the three performance metrics discussed in Section IV.A.

TABLE IX
OPTIMIZED NETWORK PARAMETERS FOR PIMA INDIANS DIABETES
Parameters             GenSoFNN-Yager   RFCMAC-Yager
STEP                   0.4              0.3
Input SLOPE            0.3              0.4
Output SLOPE           0.9              0.4
Membership Threshold   0.4              0.5
Annex Threshold        0.3              -

TABLE X shows the performance comparison between default and optimized GenSoFNN-Yager for Pima Indians Diabetes. It is shown that the optimized GenSoFNN-Yager achieves higher classification accuracy with significantly fewer fuzzy rules and fuzzy clusters compared to those of the default network. The optimized network reduces the number of rules by 75.92%, 79.71%, and 81.09% for CV1–3 respectively, and improves the classification accuracy by 1.69%, 12.50%, and 6.94% for CV1–3 respectively. In terms of number of fuzzy clusters, the default network derives 5-7 fuzzy clusters for each input dimension, while only 2-3 fuzzy clusters are identified using the optimized network.


TABLE X
COMPARISON BETWEEN DEFAULT AND OPTIMIZED GENSOFNN-YAGER FOR PIMA INDIANS DIABETES
CV   Criteria     Default            Optimized          Improvement
1    Accuracy     69.53%             70.70%             +1.69%
     # Rules      407                98                 -75.92%
     # Clusters   7:5:7:6:6:6:5:5    3:3:3:3:3:3:3:3    N/A
2    Accuracy     65.63%             73.83%             +12.50%
     # Rules      409                83                 -79.71%
     # Clusters   6:6:5:6:6:5:6:7    2:3:3:3:3:3:2:3    N/A
3    Accuracy     67.58%             72.27%             +6.94%
     # Rules      439                83                 -81.09%
     # Clusters   6:6:5:6:6:5:5:7    3:3:3:3:3:3:3:2    N/A

TABLE XI
COMPARISON BETWEEN DEFAULT AND OPTIMIZED RFCMAC-YAGER FOR PIMA INDIANS DIABETES
CV   Criteria     Default            Optimized          Improvement
1    Accuracy     69.14%             77.34%             +11.86%
     # Rules      411                189                -54.01%
     # Clusters   7:5:7:6:6:6:5:5    4:4:4:4:3:4:4:4    N/A
2    Accuracy     74.22%             75.39%             +1.58%
     # Rules      413                220                -46.73%
     # Clusters   6:6:5:6:6:5:6:7    4:4:4:3:3:4:3:4    N/A
3    Accuracy     67.19%             71.88%             +6.98%
     # Rules      442                214                -51.58%
     # Clusters   6:6:5:6:6:5:5:7    3:4:4:4:4:4:3:4    N/A

TABLE XI shows the performance comparison between the default and optimized RFCMAC-Yager for Pima Indians Diabetes. As with the GenSoFNN-Yager, it is evident that for all three CVs the optimized RFCMAC-Yager achieves higher accuracy with far fewer fuzzy rules and fuzzy clusters than the default RFCMAC-Yager. The optimized network reduces the number of rules by 54.01%, 46.73%, and 51.58%, and increases the classification accuracy by 11.86%, 1.58%, and 6.98% for CV1-3 respectively. In terms of the number of fuzzy clusters, using the default parameters the network requires 5-7 fuzzy clusters for each input dimension, while only 3-4 clusters are crafted by the optimized network.

TABLE XII subsequently compares the performance of GenSoFNN-Yager and RFCMAC-Yager on the Pima Indians Diabetes dataset against other approaches: Naïve Bayes [26], the C4.5 decision tree [26], classification and regression trees (CART) [27], and the MLP [27]. Using its default parameters, the average classification accuracy of GenSoFNN-Yager is 67.58 ± 1.95%, which is the lowest among the compared systems. RFCMAC-Yager, on the other hand, achieves an average classification accuracy of 70.18 ± 3.63%, which is higher than that of GenSoFNN-Yager but still not as good as that of Naïve Bayes, MLP, C4.5, and CART. However, the average classification accuracy of the optimized GenSoFNN-Yager increases to 72.27 ± 1.56%, which is now comparable to that of CART, C4.5, and Naïve Bayes. At the same time, the number of rules decreases drastically from 419 to 88. Likewise, the optimized RFCMAC-Yager improves the average accuracy from 70.18 ± 3.63% to 74.87 ± 2.77%, which is the best among the compared architectures except the MLP. In addition, the total number of rules reduces drastically from 422 to 208.

TABLE XII
PERFORMANCE COMPARISON WITH OTHER METHODS FOR PIMA INDIANS DIABETES
Method                   Evaluation   # Rules   Accuracy
Naïve Bayes              5-CV         N/A       74.5 ± 0.9%
C4.5                     5-CV         N/A       73.0 ± 0.9%
MLP                      10-CV        N/A       76.4%
CART                     10-CV        N/A       72.8%
GenSoFNN-Yager (def)     3-CV         419       67.58 ± 1.95%
GenSoFNN-Yager (opt)     3-CV         88        72.27 ± 1.56%
RFCMAC-Yager (def)       3-CV         422       70.18 ± 3.63%
RFCMAC-Yager (opt)       3-CV         208       74.87 ± 2.77%
def = default, opt = optimized, k-CV = k-fold cross-validation

As in Section IV.A, the above results signify the benefits of the meta-level parameter optimization in eliminating the hassle of trial-and-error parameter finding and in boosting the performance of the neuro-fuzzy systems.

V. CONCLUSION AND FUTURE WORK

A generic meta-level parameter optimization framework is put forward in this paper to address the fundamental issue of finding the optimal parameter settings for pattern recognition systems. The proposed framework has been employed atop two examples of neuro-fuzzy systems, GenSoFNN-Yager and RFCMAC-Yager, by adopting a GA as the main optimization technique. The presented simulation results on the Fisher's Iris classification, Mackey-Glass time series, and Pima Indians Diabetes datasets demonstrate the capability of the proposed framework to improve the accuracy of the neuro-fuzzy systems while producing more compact structures, i.e., fewer rules and input clusters. Nevertheless, computational time remains a major issue in the optimization process. This can be attributed to the amount of time required for every train-and-test operation, which depends on the problem and the parameter configuration. One plausible approach to resolve this problem is to employ a dynamic programming-based selection method for the candidate parameters to be evaluated. For example, a lookup table can be constructed to store previously evaluated parameter vectors and their fitness values, in order to avoid repeated evaluations and in turn shorten the total computational time required to complete the operation. This and other issues will be investigated to further improve the performance of the proposed optimization framework.
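One way to realize the lookup-table idea above is to memoize the fitness function, so that the GA never pays the train-and-test cost τ twice for the same candidate parameter vector. A minimal sketch, assuming any costly fitness callable such as the one in Section III.C:

```python
def cached_fitness(fitness, precision=4):
    """Wrap a costly fitness function with a lookup table of past evaluations.

    Parameter vectors are rounded to `precision` decimals to form the key;
    repeated or near-duplicate candidates reuse the stored fitness value.
    """
    table = {}
    def wrapped(p):
        key = tuple(round(float(g), precision) for g in p)
        if key not in table:
            table[key] = fitness(p)      # single train-and-test evaluation
        return table[key]
    return wrapped
```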


REFERENCES

[1] J. D. Trimmer, Response of Physical Systems. New York: Wiley, 1950, p. 13.
[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533-536, 1986.
[3] S.-S. Han and G. S. May, "Optimization of neural network structure and learning parameters using genetic algorithms," in Proceedings of the IEEE International Conference on Tools with Artificial Intelligence, 1996, pp. 200-206.
[4] H. F. Leung, H. K. Lam, S. H. Ling, and K. S. Tam, "Tuning of the structure and parameters of a neural network using an improved genetic algorithm," IEEE Transactions on Neural Networks, vol. 14, pp. 79-88, 2003.
[5] N. Richards, D. E. Moriarty, and R. Miikkulainen, "Evolving neural networks to play Go," Applied Intelligence, vol. 8, pp. 85-96, 1997.
[6] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Boston, MA: Kluwer Academic Publishers, 1989.
[7] J. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press, 1975.
[8] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1999.
[9] C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Upper Saddle River, NJ: Prentice Hall, 1996.
[10] C. Darwin, On the Origin of Species by Means of Natural Selection. London: John Murray, 1859.
[11] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.
[12] J. A. Nelder and R. Mead, "A simplex method for function minimization," Computer Journal, vol. 7, pp. 308-313, 1965.
[13] R. J. Oentaryo and M. Pasquier, "GenSoFNN-Yager: A novel hippocampus-like learning memory system realizing Yager inference," in Proceedings of the IEEE International Joint Conference on Neural Networks, Vancouver, BC, Canada, 2006, pp. 705-712.
[14] R. J. Oentaryo, "Metacognitive intelligent system in complex environments," Centre for Computational Intelligence, Nanyang Technological University, Singapore, First Year Ph.D. Confirmation Report, 2006.
[15] B. C. Kuo, Automatic Control Systems, 6th ed. New Jersey: Prentice Hall, 1991.
[16] W. L. Tung and C. Quek, "GenSoFNN: A generic self-organizing fuzzy neural network," IEEE Transactions on Neural Networks, vol. 13, pp. 1075-1086, 2002.
[17] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, pp. 179-188, 1936.
[18] L. P. Maguire, B. Roche, T. M. McGinnity, and L. J. McDaid, "Predicting a chaotic time series using a fuzzy neural network," Information Sciences, vol. 112, pp. 125-136, 1998.
[19] D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz, "UCI repository of machine learning databases," [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.
[20] M. Wall, "GAlib: A C++ library of genetic algorithm components," [Online]. Available: http://lancet.mit.edu/ga/, 1996.
[21] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. IT-13, pp. 21-27, 1967.
[22] S. K. Pal, S. Bandyopadhyay, and C. A. Murthy, "Genetic algorithms for generation of class boundaries," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 28, pp. 816-828, 1998.
[23] C. J. Lin and C. T. Lin, "An ART-based fuzzy adaptive learning control network," IEEE Transactions on Fuzzy Systems, vol. 5, pp. 477-496, 1997.
[24] C. Quek and A. Singh, "POP-Yager: A novel self-organising fuzzy neural network based on the Yager inference," Expert Systems with Applications, vol. 29, pp. 229-242, 2005.
[25] J. Sim, W. L. Tung, and C. Quek, "FCMAC-Yager: A novel Yager-inference-scheme-based fuzzy CMAC," IEEE Transactions on Neural Networks, vol. 17, pp. 1394-1410, 2006.
[26] N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian network classifiers," Machine Learning, vol. 29, pp. 131-163, 1997.
[27] B. Ster and A. Dobnikar, "Neural networks in medical diagnosis: Comparison with other methods," in Proceedings of the International Conference on Engineering Applications of Neural Networks, A. Bulsari, S. Kallio, and D. Tsaptsinos, Eds. London, 1996, pp. 427-430.

