Intelligent Data Analysis 17 (2013) 697–716 DOI 10.3233/IDA-130601 IOS Press

Diverse accurate feature selection for microarray cancer diagnosis

Nima Hatami a,∗ and Camelia Chira b

a Imaging Data Evaluation and Analysis Center, University of California – San Diego, La Jolla, CA, USA
b Department of Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania

Abstract. Gene expression microarray data provides simultaneous activity measurements for thousands of genes, facilitating potentially effective and reliable cancer diagnosis. An important and challenging task in microarray analysis is selecting the most relevant and significant genes for data (cancer) classification. A random subspace ensemble based method is proposed to address feature selection in gene expression cancer diagnosis. The introduced Diverse Accurate Feature Selection method relies on multiple individual classifiers built on random feature subspaces. Each feature is assigned a score computed from the pairwise diversity among individual classifiers and the ratio between individual and ensemble accuracies. This produces a ranked list of features over which a final classifier achieves increased performance using the minimum possible number of genes. Experimental results focus on the problem of gene expression cancer diagnosis based on publicly available microarray datasets. Numerical results show that the proposed method is competitive with related models from the literature.

Keywords: Random subspace ensembles, multiple classifier systems, multivariate feature selection, gene expression data analysis, pairwise diversity

1. Introduction


Gene expression microarray technology is widely used in clinical diagnosis as well as in the prediction of clinical treatment outcomes. Microarray data is characterized by a large number of features and needs efficient tools and techniques for a meaningful analysis. The number of samples usually available is very low, mainly due to the associated costs. Furthermore, most genes are not relevant to a classification task, or the information they carry is redundant. Feature selection is an important and challenging task by which a few relevant features (out of thousands) must be selected based on their value for a limited number of available samples (normally fewer than a hundred).

Cancer diagnosis based on gene expression data is clearly a high-dimensional low-sample size (HDLSS) problem, which represents a significant challenge in machine learning and pattern recognition. A common approach to deal with this problem starts with a feature selection/reduction method (by which unimportant or noisy features are eliminated) followed by a standard classification method. Feature reduction is generally addressed by statistical approaches, e.g. Principal Component Analysis (PCA) and Independent Component Analysis (ICA), able to transform the original features into a lower-dimensional space [20,24].

∗ Corresponding author: Nima Hatami, Imaging Data Evaluation and Analysis (IDEA) Center, University of California, San Diego, 9415 Campus Point Drive, La Jolla, CA 92093, USA. E-mail: [email protected].

© 2013 – IOS Press and the authors. All rights reserved. 1088-467X/13/$27.50


Regarding feature selection, there are two main approaches to reducing the dimensionality of the feature space. Some models use simple and fast univariate selection methods, which evaluate the relevance of each feature individually. Other methods use multivariate feature selection as a pre-processing step, which can be computationally expensive but is able to consider possible correlations and dependencies between features. The focus of the current paper is on the latter approach.

Multiple classifier systems use base classifiers with complementary behaviour, resulting in an efficient alternative to a complex and hard-to-manage single classifier. Among many well-known ensemble methods, the Random Subspace Ensemble (RSE) [10] is an efficient model which has obtained good results particularly for high-dimensional classification problems. RSE uses a number of base classifiers, each of them considering only a randomly determined subset of the original feature space.

In this paper, the Diverse Accurate Feature Selection (DAFS) method is introduced to deal with feature selection in gene expression cancer diagnosis. The main idea behind DAFS is to adapt RSE for feature selection by efficiently exploiting accuracy and diversity information. To be more specific, the proposed DAFS method uses the individual and ensemble accuracies of base classifiers specialized on random subspaces, as well as pairwise diversity, to rank features. This is achieved by assigning to each feature a score calculated by a metric which takes into account the mean ratio between individual and ensemble accuracies weighted by their diversity. This way, the introduced approach takes advantage of RSE to deal with the high dimensionality of the addressed problem while at the same time building many different classifiers on the same samples to overcome the sample-size limitation.

The proposed method is evaluated on 11 cancer gene expression datasets [28] (of which nine are multiclass and two are binary classification problems). Computational experiments and comparisons to related models indicate a good performance of the proposed DAFS method and emphasize the potential of RSE to deliver a fast and effective feature selection method.

The structure of the paper is as follows: Section 2 defines the problem of gene expression cancer diagnosis and reviews the major related feature selection methods, Section 3 describes the proposed DAFS method fully detailing its main features and the rationale behind them, Section 4 presents the computational experiments and discusses the obtained results, and Section 5 contains the conclusions of the paper and some future work directions.

2. Gene expression cancer diagnosis and related work


High-throughput technologies are nowadays able to produce huge amounts of valuable information which can be used in the identification and classification of various diagnostic areas. The required analysis of this information creates a real challenge for machine learning, and new robust models are still required to efficiently tackle this task. This section briefly presents the problem of cancer diagnosis based on gene expression data and reviews related feature selection methods.

2.1. Cancer classification using gene expression data

Cancer diagnosis based on gene expression data is an important emerging medical application domain of microarray analysis tools [29]. Clinical-based cancer classification has been shown to have limited diagnostic ability [22]. On the other hand, the classification of different tumor types using gene expression data is able to generate valuable knowledge for important tasks such as cancer diagnosis and drug discovery [22].


A microarray gene expression dataset is normally represented as a matrix where each row corresponds to a sample and each column corresponds to a gene. The element at position (i, j) represents the gene expression level of feature j in sample i. The number of rows available in the gene expression matrix is normally very low due to the difficulty of collecting and processing microarray samples [8,23]. A microarray experiment provides the expression levels of thousands of genes, but only a very small subset of them is relevant to clinical diagnosis [8].

Classifying microarray samples (for example, cancer versus normal cells) according to their gene expression profiles represents an important and challenging task. The complexity of the problem arises from the huge number of features that contribute to a profile compared to the very low number of samples normally available in microarray analysis. Another challenge for classification is the presence of (biological or technical) noise in the dataset, which further affects classifier accuracy. Moreover, the inherent presence of a large number of irrelevant genes increases the difficulty of the classification task, influencing the discrimination power of relevant features [22].
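To make this data layout concrete, a minimal sketch in Python (assuming NumPy; the values below are randomly generated placeholders, not real expression data):

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, n_genes = 60, 5726  # e.g. the 9_Tumors dimensions listed later in Table 1
    X = rng.normal(size=(n_samples, n_genes))  # X[i, j] = expression level of gene j in sample i
    y = rng.integers(0, 9, size=n_samples)     # class label (tumor type) of each sample

    # The HDLSS character of the problem: far more columns (genes) than rows (samples)
    print(X.shape, X.shape[1] / X.shape[0])    # (60, 5726), features/samples ratio of about 95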


2.2. Feature selection methods for gene expression data analysis

The problem of extracting significant knowledge from microarray data requires the development of robust methods able to address the HDLSS aspect of this task [19,23]. Lu and Han [22] emphasize that the most important aspects of classification and gene selection methods are their computation time, classification accuracy and ability to reveal biologically meaningful gene information. An extensive comparative analysis of many feature selection methods able to identify differentially expressed genes in microarray data can be found in [12]. Many studies consider the selection of genes to be an important integral preprocessing step for the classification problem [22], able to reduce dimensionality, remove irrelevant or noisy genes and improve the learning accuracy [1,18,22]. Furthermore, gene selection can reduce the computational cost of the classifier and lead to more compact results easily interpretable in the diagnostics task [21].

There are two main approaches to feature selection: filter (independent from the learning algorithm) and wrapper (embedding classifiers in the search model) methods [7,21]. Filter methods discriminate between genes based solely on the intrinsic characteristics of the data. Features are selected based on measures able to determine their relevance to the target class without any connection to a learning method. The MRMR (minimum redundancy maximum relevance) feature selection framework [7] is a well-known filter method. Besides the maximal relevance criterion, MRMR requires selected features to be maximally dissimilar to each other (the minimum redundancy criterion). For example, mutual information can be used to measure the relevance of a gene to a target class, while the mutual Euclidean distance between two features could measure their redundancy (a minimal code sketch of this idea follows below). The main disadvantage of filter approaches is that they ignore the interaction with the classifier [27], which can lead to poor recognition performance. On the positive side, filter methods can be computed easily, and even simple variants based on mutual information or statistical tests can be efficient [7].

Wrapper models, on the other hand, integrate learning algorithms in the selection process. The relevance of a feature is directly determined based on the accuracy of the learning method. A specific subset of features is first obtained by training and testing a classifier, and a search algorithm is then used in conjunction with the classification model to search the space of all feature subsets [27]. Most wrapper methods use population-based randomized heuristics to guide the search towards the optimal feature subset. Wrapper methods are therefore computationally intensive, as the number of feature subsets grows exponentially with the number of features.
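A minimal sketch of the filter scheme just described, assuming scikit-learn's mutual_info_classif for the relevance term and absolute Pearson correlation as a stand-in redundancy term; this illustrates the MRMR idea, not the implementation of [7]:

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def mrmr_like(X, y, n_select=10):
        # Relevance of each gene to the target class (maximum relevance criterion).
        relevance = mutual_info_classif(X, y)
        selected = [int(np.argmax(relevance))]
        candidates = set(range(X.shape[1])) - set(selected)
        while len(selected) < n_select and candidates:
            best, best_score = None, -np.inf
            for j in candidates:
                # Redundancy: mean absolute correlation with already selected genes
                # (minimum redundancy criterion).
                redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                      for s in selected])
                score = relevance[j] - redundancy
                if score > best_score:
                    best, best_score = j, score
            selected.append(best)
            candidates.remove(best)
        return selected

Note that the greedy scan over thousands of candidate genes makes such a direct implementation slow; practical filter methods rely on precomputed statistics.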


Genetic algorithms have been engaged in wrapper models to identify relevant groups of genes that are further explored by classifiers [31], to eliminate redundant genes by embedding the Markov blanket with memetic operators [33], or to select genes by evolving individuals (which represent gene subsets) towards underlying good classifier accuracies [19]. As opposed to filter approaches, wrapper methods require a high computational time and present a higher risk of overfitting [27]. The main advantages of wrapper models are the ability to model feature dependencies and the interaction of the search model with the classifier.

A special type of wrapper methods are the embedded [27] (or hybrid [21]) methods, which interact with the classification model by integrating the search for a suitable subset of features into the classifier construction. Saeys et al. [27] suggest that these embedded methods actually form a third class of feature selection techniques (besides filter and wrapper approaches). The main characteristic of embedded methods is that the combined space of feature subsets and classifiers forms the search space. For example, random forests have been used to find a subset of discriminative features based on the embedded capacity of several classifiers [27]. SVM-RFE [9] and its extensions [32] use a support vector machine (SVM) as a classifier to evaluate and remove redundant genes in a recursive procedure. Embedded methods are less computationally intensive than wrapper methods while still exploiting the benefit of the interaction with the classifier.

The volume of literature on feature selection methods is considerably large and many comprehensive surveys are available [5,17,18,22,27]. We present here a brief review of some recently introduced models (with a focus on embedded wrapper methods) considered to be highly related to the RSE-based gene selection proposed in the current paper.

Statnikov et al. [29] analyse several multicategory classification methods for 11 cancer gene expression datasets (available in [28]). The authors show that multicategory SVM (MC-SVM) clearly outperforms K-Nearest Neighbors, backpropagation and probabilistic neural networks. Furthermore, the best non-ensemble methods report better results compared to ensemble classifiers. A software system called Gene Expression Model Selector (GEMS) was created based on these results.

In [30], Valentini et al. describe the model of bagged (Bootstrap AGGregatING [6]) ensembles of SVMs for gene expression data classification tasks. A better performance of the bagged SVM ensembles compared to a single SVM is obtained for the Colon and Leukemia datasets.

Bertoni et al. [2–4] explore random subspace ensembles of SVMs for bio-molecular cancer prediction. In [2], a method using a set of base classifiers (linear SVMs) aggregated by majority voting is presented and tested for the Colon and Medulloblastoma datasets. This method is further extended in [3] by including an initial phase of feature selection to eliminate noisy and uninformative genes. The resulting method is called Random Subspace on Selected Features (RS-SF).

In [16], the random subspace method [10] is explored for multivariate feature selection. As opposed to univariate approaches to feature selection (which evaluate features individually), multivariate methods take into account the dependencies between features to select relevant groups of genes. The method proposed by Lai et al. [16] selects t feature subsets from the original feature space based on the random subspaces approach and applies a multivariate search technique on each of them. Based on these evaluations (performed in subspaces), a weight proportional to the relevance is associated with each feature. The final list of features is established based on a cross-validation procedure. Experiments engage two techniques as the multivariate methods (i.e. Recursive Feature Elimination [9] and a classifier called Linkon), but numerical results are reported only for an artificial dataset introduced by the authors.

Random subspace ensemble is also engaged by Kamath et al. [13] as a feature selection method for two binary cancer-related gene expression datasets. Accurate subsets of features are selected using a multivariate technique based on random subspaces. Subsets of features are simultaneously selected for


individual classifiers (ten thousand subspaces with 200 features each are considered). The base classifiers reporting both overall and per-class prediction accuracies above a specified threshold inform the selection of informative features. The authors show a better performance of the proposed method compared to the univariate feature selection approach (a 33% improvement in classification accuracy is reported, but this is limited to the two datasets considered).

The hybrid method BIRS (best incremental ranked subset) is presented in [26]. BIRS relies on the statistical significance of adding a ranked feature to the final subset. In the first phase, features are ranked according to an evaluation measure (the authors use the individual predictive power of a gene in a wrapper approach and the symmetrical uncertainty in a filter approach to ranking). In the second phase, BIRS identifies the best subset from the ranked list of genes based on statistical comparisons of the underlying classification accuracies. The subset obtained in this way should contain the relevant genes having a high discrimination power. The classifiers engaged for class prediction based on the selected features are a probabilistic learner (Naïve Bayes), an instance-based learner (IB1) and a decision tree learner (C4.5). Experiments are performed for cancer classification using four DNA microarray datasets (colon, leukemia, lymphoma and global cancer map – GCM). Results indicate that BIRS is capable of selecting small feature subsets, with an average size of 0.0018% of the original set of features, with competitive accuracy.

In [11], an evolutionary approach is proposed to design an SVM-based classifier. The resulting evolutionary SVM (ESVM) model follows two objectives simultaneously: optimization of automatic feature selection and SVM parameter tuning. The GA representation used includes the indices of the selected genes and the parameters of the SVM classifier. The fitness function is designed to minimize the number of selected features while maximizing the classification accuracy (of 10-fold cross-validation). Experiments are performed for the 11 cancer microarray datasets from [28], and results indicate a better performance of ESVM compared to MC-SVM [29] (the average classification accuracy over all datasets is 96.88% for ESVM compared to 89.44% obtained by MC-SVM).

An ensemble-based classification algorithm called CERP (Classification by Ensembles from Random Partitions) is described in [25]. Ensembles of classifiers are built based on the optimal number of random partitions of the feature space. Classification-Tree CERP (C-T CERP) is an ensemble of ensembles of optimal numbers of pruned tree classifiers. The feature space is randomly partitioned into mutually exclusive subsets which are used in the individual classifiers. The number of classifiers in an ensemble is determined based on an adaptive bisection algorithm using nested cross-validation. C-T CERP is applied with good results for classification in the lymphoma, lung and breast cancer datasets. The classification result is based on the majority vote among ensembles (each of which in turn uses the majority vote of its individual members).

Liu et al. [21] proposed a new ensemble method called ensemble gene selection by grouping (EGSG). Information theory and the approximate Markov blanket are engaged to select salient gene subsets for the classification task.
The EGSG method has three steps as follows: (i) genes are grouped by approximate Markov blanket so that similar genes belong to the same group and dissimilar genes are in different groups with respect to the information correlation coefficient, (ii) multiple gene subsets are created by randomly selecting one gene from each group, and (iii) classifiers are built for the gene subsets obtained in the second step and the ensemble is formed by majority voting over the base classifiers. The performance of the EGSG method is evaluated for the following five gene expression datasets: breast cancer, CNS (central nervous system), colon cancer, leukemia and prostate (all are binary problems). Naïve Bayes and k-nearest-neighbor are the learning algorithms (in conjunction with leave-one-out cross-validation) engaged to build classifiers on the gene subsets produced by EGSG.
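A rough sketch of the EGSG grouping-and-sampling scheme; absolute correlation is used here as a crude stand-in for the approximate Markov blanket and information correlation coefficient of [21], so this only illustrates the overall structure:

    import numpy as np

    def egsg_like_subsets(X, n_groups=20, n_subsets=5, seed=0):
        rng = np.random.default_rng(seed)
        # Step (i), simplified: order genes by their correlation with a reference
        # gene and split them into contiguous groups, so that similar genes tend
        # to fall into the same group (EGSG uses approximate Markov blankets).
        corr_with_ref = np.abs(np.corrcoef(X, rowvar=False))[0]
        groups = np.array_split(np.argsort(corr_with_ref), n_groups)
        # Step (ii): each gene subset takes one randomly chosen gene per group;
        # step (iii) would train one classifier per subset and majority-vote.
        return [[int(rng.choice(g)) for g in groups] for _ in range(n_subsets)]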


The authors show that EGSG is able to achieve lower classification errors compared to several model-free gene selection methods. Furthermore, EGSG is shown to be more stable than ensembles from random partitions (CERP [25]) based on the variance of the classification accuracy over ten runs for different numbers of gene subsets (24 values ranging from 2 to 48 subsets). One of the disadvantages of EGSG is the tendency of the method to select a high number of genes (more than the MRMR method [7]).

Recently, Lee and Leu [19] introduced a hybrid method by which gene subsets are determined using a genetic algorithm and top-ranked genes are selected based on a homogeneity test. The method is called genetic algorithm with dynamic parameter setting (GADP). In the first stage, the number of genes is reduced to 500 using an initial feature selection method based on the BW (between-groups to within-groups sum of squares) ratio. Individuals in GADP are represented based on an integer coding scheme. The fitness value of an individual is defined as the classification accuracy obtained by KNN using the genes in the individual. Furthermore, GADP uses an elitist strategy for selection and an extinction and immigration strategy for diversifying genetic material (to avoid the algorithm being trapped in local optima). The selection of features is based on their occurrence frequencies in a set of highly fit individuals produced by GADP. This results in a sorted list of genes, for which the χ2-test for homogeneity on each pair of consecutive genes in the list is employed to determine the final set of selected features. The efficiency of this selected set of features is tested using SVM for six microarray datasets (colon, small round blue cell tumor – SRBCT, breast cancer, ALL/AML, DLBCL and GCM). The experimental results presented indicate a competitive performance of the GADP method compared to most related methods considered.

In [23], a rough set based maximum relevance – maximum significance (MRMS) algorithm is proposed for feature selection. MRMS relies on rough set theory to maximize two criteria at the same time: the relevance of the selected features with respect to the class and the significance within the feature set. The authors show an overall better performance of the proposed MRMS compared to mutual information measures and other feature selection methods for five cancer and two arthritis microarray datasets.

Despite the large number of existing methods focusing on gene selection for cancer diagnosis, there is still an ongoing need to improve recognition rates and computational times. We believe that a promising approach to handle this HDLSS problem can start with the efficient division of the problem into smaller subproblems. A simple expert can be trained and specialized for each of these subproblems, and the final solution is triggered by an ensemble of experts acting in a cooperative manner.

3. Diverse Accurate Feature Selection


In this section, we introduce the Diverse Accurate Feature Selection (DAFS) method to facilitate multivariate feature selection for gene expression classification. The proposed DAFS relies on the RSE method [10] to select features triggering a high classification accuracy, and furthermore on certain diversity measures to minimize the redundancy among selected features. It should be emphasized that the proposed method is intended for feature selection and not for classification, which means that DAFS should be used as a preprocessing step of the classification task (an approach suggested in many studies [1,18,21,22]). As indicated by the current experimental results (presented in Section 4), DAFS is a fast and effective feature selection method (particularly in comparison with standard wrapper embedded methods), able to reduce the computational cost of the classifier by removing irrelevant features and improving the recognition rate.


Let us consider a sample set X_{M×N} of size N and dimensionality M. Instead of using all features for each classifier in the ensemble, RSE [10] samples the feature set: the ensemble assigns a class label by majority or average voting of L classifiers built on different feature subsets of size m (where m << M) sampled randomly and uniformly from the original feature set. Each base classifier is invariant to points that differ from the training points only in the unselected features, thus encouraging diversity. This approach results in each classifier generalizing in a different way. Hence, while most other classification methods suffer from the curse of dimensionality, the RSE method can take advantage of high dimensionality to build an ensemble of "diverse" classifiers, each specialized on a small subspace [10].

The major issues associated with the feature selection task which need to be addressed by DAFS are as follows:

1. Given L (the number of base classifiers) and m (the subset size), what is the probability that each feature is selected at least once in a subspace set?
2. What are the best values for L and m in the proposed selection strategy to achieve the maximum possible classification accuracy with minimum training cost?
3. Which is a suitable measure to rank the most important features (among a very large number of irrelevant features) using the information available from the base classifiers in RSE?

In the following, we first theoretically address the first and third issues raised above (the second one will be discussed in the next section from an experimental perspective) and then describe the proposed DAFS method in more detail.

3.1. Feature-space coverage

Let us denote by γ the total number of relevant features returned by the selection algorithm. The selection ratio is defined as γ/M. As indicated in many research reports [8,22], the number of useful genes resulting from feature selection is very small (without loss of generality, we can consider this number to be as small as 10 out of thousands of genes). Considering the high number of features present in microarray gene expression datasets (normally more than 5000), we can state that the selection ratio is < 0.002, i.e. a large amount of information is carried by a small number of genes. Therefore, any feature selection strategy should evaluate each gene at least once for an effective selection.

The probability that a particular feature fi is hit in m trials of RSE is m/M (a measure called SR – Subspace Ratio). Therefore, the probability of not selecting a feature in any of the L classifiers of the ensemble is P(f̄i) = (1 − SR)^L. The probability of fi appearing in at least one of the L selections is 1 − P(f̄i). Furthermore, the probability of all features appearing at least once in one of the L classifiers (called Pcov – the feature coverage probability) is defined as follows:

Pcov = (1 − (1 − SR)^L)^γ    (1)
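As a quick numerical check of formula 1, a short sketch (the values of M, γ, m and L below are illustrative):

    def p_cov(m, M, L, gamma):
        # Formula 1: Pcov = (1 - (1 - SR)^L)^gamma, with subspace ratio SR = m / M.
        SR = m / M
        return (1.0 - (1.0 - SR) ** L) ** gamma

    # With M = 5000 genes and gamma = 10 relevant features:
    print(p_cov(m=10, M=5000, L=10000, gamma=10))  # ~1.0: the subspaces cover the space
    print(p_cov(m=50, M=5000, L=500,   gamma=10))  # ~0.94: coverage almost guaranteed
    print(p_cov(m=10, M=5000, L=100,   gamma=10))  # ~4e-8: far too few base classifiers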

The assumption here is that the features within the selected subset of size m are sampled independently. The probability Pcov is considered in the proposed DAFS method to ensure feature-space coverage by informing the selection of the L and m parameters.

3.2. Feature evaluation and ranking

The core of any feature selection method is the process of evaluating each feature and selecting the most relevant ones for the problem at hand. Furthermore, this process should avoid redundant features because of their negative effect on the classification task (redundant features bring no additional useful information while further increasing dimensionality).


The individual accuracy and the diversity among ensemble members are two key characteristics affecting ensemble accuracy [15]. The precise relation of individual accuracy and diversity to ensemble accuracy is still not clear, yet it underlies the only principled way of designing an ensemble. The idea behind the DAFS method is to use the accuracy- and diversity-driven information not for designing an ensemble but for selecting the most relevant features from a very large set.

To minimize the redundancy between the selected features, the feature evaluation and ranking phase of DAFS makes use of an important ensemble characteristic, i.e. diversity. As opposed to many standard methods which directly compare the similarity within a set of features (e.g. by Euclidean distance) to measure their redundancy, we propose to evaluate features by comparing the label diversity of the base classifiers using a specific feature. The more diverse the classifiers to which a specific feature contributes, the less redundant the feature is. The accuracy of a base classifier built on a specific feature subset is able to capture the contribution of those features to the classification task. Thanks to this important ensemble characteristic, the importance of a feature is indirectly measured by comparing the performances of the corresponding classifiers.

To be more specific, let us consider a feature fi which appears in k random subspace sets RSfi = {rsi1, rsi2, . . . , rsik}, each subspace being used in the training of one of k base classifiers Lfi = {li1, li2, . . . , lik}. Intuitively, a feature is important if the corresponding classifiers using it achieve a high accuracy in the ensemble. Therefore, the individual accuracy/RSE accuracy ratio can be used for evaluating feature relevancy. In the same way, d(lens, liu), u = 1 . . . k, computes the redundancy of a feature using an individual classifier liu compared to the ensemble lens (d refers to any pairwise diversity measure). Two features are considered to carry similar information (to be redundant) if the classifiers using them have lower diversity. This way, we indirectly measure the redundancy of features via the outputs of the corresponding classifiers.

Considering all of the above mentioned issues, we propose a measure called Diversity-Accuracy (DA) associated with each feature. The DA measure uses the ensemble components to incorporate both accuracy (to assess feature relevancy) and diversity (to assess feature redundancy) information. For a feature fi present in the k subsets RSfi used in the classifiers Lfi, the DA measure is defined as follows:

DA(fi) = (1/k) Σ_{u=1..k} [ d(lens, liu)^r × (a(liu)/a(lens))^r ]    (2)


where a(l) represents the accuracy of classifier l and r is a parameter used for adjusting the contribution of an individual classifier versus the ensemble to the feature evaluation. A feature obtains a high DA value if the base classifiers using it achieve higher accuracy and diversity with respect to the ensemble (i.e., with respect to the majority of other features); this way, the top-ranked features according to the DA measure are the most representative and informative for the classification task. It should be noted that the ensemble performance represents the contribution of the entire set of features and of the majority of base classifiers. The DA measure therefore considers both the ensemble accuracy and the ensemble diversity as reference points for evaluating the performance of the individual classifiers for each feature.

3.3. DAFS Algorithm

The main steps of the proposed DAFS method are: randomly sampling the feature set, building the RSE, calculating the DA value for each feature appearing in at least one base classifier, and ranking the features according to the DA value. Figure 1 presents the block diagram model of DAFS.


Fig. 1. Block diagram representation of the proposed DAFS model.

The DAFS algorithm is detailed below (Algorithm 1). The input parameters are m – the subspace size, L – the ensemble size and G – the maximum number of features to be selected. The sample set X_{M×N} is divided into training (Xt) and testing (Xe) subsets (using, for example, cross-validation).


Algorithm 1 DAFS Algorithm
Parameters
• m: subspace size
• L: ensemble size
• G: maximum number of selected features
Input
• Xt, Xe: the training and testing sets, respectively
• li, i = 1 . . . L: the base classifiers in RSE
• h: the classifier used for the final classification
Feature evaluation and selection
• Select relevant m and L which satisfy the feature-space coverage Pcov condition (formula 1)
• Create L feature subspaces rsi, i = 1 . . . L, randomly and independently sampled from the entire feature space
• Build the learners (subclassifiers), each using the training data projected on its selected subspace
• Apply Xv to the L classifiers and assign the labels by majority voting (Xv is a subset of Xt used for validation)
• Calculate the DA value for each feature using formula 2
• Sort the features in descending order of DA value
• Select the first γ features from the sorted list, 1 ≤ γ ≤ G, which trigger the best accuracy on Xv by classifier h
Classification
• Train the classifier model h using Xt projected on the selected feature space of size γ
• Project the testing set on the selected feature space and classify it by applying h
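For concreteness, a condensed Python sketch of the feature evaluation phase of Algorithm 1, under simplifying assumptions: scikit-learn linear SVMs as base classifiers, the disagreement measure (formula 5) as d, integer-coded class labels, and a plain validation split in place of LOOCV. The function name and structure are illustrative, not the authors' implementation:

    import numpy as np
    from sklearn.svm import SVC

    def dafs_rank(X_train, y_train, X_val, y_val, m, L, r=1.0, seed=0):
        rng = np.random.default_rng(seed)
        M = X_train.shape[1]
        subspaces, preds = [], []
        for _ in range(L):
            rs = rng.choice(M, size=m, replace=False)   # one random feature subspace
            clf = SVC(kernel="linear").fit(X_train[:, rs], y_train)
            subspaces.append(rs)
            preds.append(clf.predict(X_val[:, rs]))
        preds = np.asarray(preds)                        # shape (L, n_val)
        # Ensemble label on the validation set by majority voting.
        ens = np.array([np.bincount(col).argmax() for col in preds.T])
        a_ens = max(np.mean(ens == y_val), 1e-12)        # guard against division by zero
        acc = (preds == y_val).mean(axis=1)              # individual accuracies a(l_iu)
        dis = (preds != ens).mean(axis=1)                # disagreement with the ensemble (formula 5)
        da, hits = np.zeros(M), np.zeros(M)
        for u, rs in enumerate(subspaces):               # accumulate formula 2 per feature
            da[rs] += (dis[u] ** r) * ((acc[u] / a_ens) ** r)
            hits[rs] += 1
        da[hits > 0] /= hits[hits > 0]                   # mean over the k classifiers using f_i
        return np.argsort(-da)                           # feature indices by decreasing DA value

The ranked indices are then scanned for the best prefix of size γ ≤ G on the validation set, as sketched at the end of this section.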

Table 1
Cancer gene expression datasets used in computational experiments (from [28])

Dataset          Samples   Genes (features)   Classes   Features/Samples   Max. Prior
11_Tumors        174       12533              11        72                 15.5%
14_Tumors        308       15009              26        49                 9.7%
9_Tumors         60        5726               9         95                 15.0%
Brain_Tumor1     90        5920               5         66                 66.7%
Brain_Tumor2     50        10367              4         207                30.0%
Leukemia1        72        5327               3         74                 52.8%
Leukemia2        72        11225              3         156                38.9%
Lung_Cancer      203       12600              5         62                 68.5%
SRBCT            83        2308               4         28                 34.9%
Prostate_Tumor   102       10509              2         103                51.0%
DLBCL            77        5469               2         71                 75.3%

The feature evaluation and selection phase of the algorithm refers to the application of the RSE method and the evaluation of features based on the DA measure. Ensemble accuracy is computed by a majority voting strategy. Based on the individual and ensemble classifier results, the DA value is calculated for each feature using formula 2. Features are then sorted in descending order of DA value and the first γ are selected. The final number γ of selected features is determined by considering all possible feature set sizes (up to the maximum G) and choosing the one that triggers the best performance on the validation set by the final classifier h. The γ features are finally used in a classification phase to determine the performance of a classifier using the selected features projected on the training and testing sets.
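A minimal sketch of this final selection step, assuming the hypothetical dafs_rank helper sketched after Algorithm 1 and a linear SVM as the final classifier h:

    from sklearn.svm import SVC

    def select_features(ranking, X_train, y_train, X_val, y_val, G=30):
        best_gamma, best_acc = 1, -1.0
        for gamma in range(1, G + 1):
            feats = ranking[:gamma]                      # top-gamma features by DA value
            h = SVC(kernel="linear").fit(X_train[:, feats], y_train)
            acc = h.score(X_val[:, feats], y_val)        # accuracy of h on the validation set
            if acc > best_acc:
                best_gamma, best_acc = gamma, acc
        return ranking[:best_gamma]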


4. Experimental results for gene expression data

This section presents computational experiments for various cancer gene expression datasets. Results are analysed from multiple perspectives (e.g. subset size vs. ensemble size, DA feature evaluation, different classification algorithms) and compared to those of related methods.

4.1. Dataset description


The cancer gene expression datasets available in [28] are engaged for computational experiments. Nine of the 11 datasets considered are multicategory, while two of them are binary classification problems. Table 1 presents the characteristics of these datasets. The column Max. Prior indicates the prior probability of the main diagnostic class for each problem. As specified in [28], the genes with "absent" calls in all samples were excluded from some of the datasets to reduce noise. The experiments reported in this paper use the exact datasets as given in [28].

The main properties of the considered datasets are as follows: 11_Tumors contains 174 samples with 12533 genes, with the task of identifying 11 human tumor types; 14_Tumors refers to 14 human tumor types and 12 normal tissue types; 9_Tumors has 60 samples across 9 human tumor types; Brain_Tumor1 covers 5 human brain tumor types; Brain_Tumor2 contains 4 malignant glioma types; Leukemia1 is a three-class task: acute myelogenous leukemia (AML), acute lymphoblastic leukemia (ALL) B-cell, and ALL T-cell; Leukemia2 has AML, ALL and mixed-lineage leukemia (MLL) classes; Lung_Cancer contains 4 lung cancer types and normal tissues; SRBCT covers small round blue cell tumors (SRBCT) of childhood; Prostate_Tumor is a binary classification task of prostate tumor versus normal tissues; and DLBCL has 77 samples with two possible classes: diffuse large B-cell lymphomas (DLBCL) and follicular lymphomas.


4.2. DAFS setup

The DAFS method needs the specification of the classifiers used in the RSE and of the diversity measure used in calculating the DA feature values. We have used different types of classification algorithms – k-Nearest Neighbour (kNN), Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP) – as base learners in order to show that the proposed method is independent of the particular base classifier. The error backpropagation algorithm is used for training the MLP base classifiers, and the iterative estimation process is stopped when an average squared error of 0.9 over the training set is obtained or when the maximum number of iterations is reached (adopted mainly to prevent the networks from overtraining). We also varied the number of hidden neurons to experimentally find the optimal architecture of the MLPs for each problem. The other parameter values used for training are as follows: the learning rate is 0.4 and the momentum parameter is 0.6. When SVM serves as the base classifier, a linear kernel is used. For the kNN classifier, the standard Euclidean distance is used to calculate distances between samples; the value of k was varied from 3 to 9 in order to find the best neighbourhood size for each problem. All other parameters for these three algorithms have been chosen according to the standard settings of the MATLAB toolboxes.

The datasets are split into training and testing subsets by using Leave-One-Out Cross-Validation (LOOCV). Since the problems have low sample sizes, evaluation by LOOCV provides a realistic estimate of generalization to unseen data. Furthermore, 20% of the training set is used for the validation phase.

Several diversity measures are considered to compute the DA measure in the evaluation of DAFS. In [15], Kuncheva and Whitaker emphasize that there is no generally accepted formal definition of diversity and present several definitions from statistics to measure pairwise and non-pairwise diversities. Based on the analysis presented in [15], the diversity measures used in this paper are the Q-statistic (Q-sta), the Correlation coefficient (Corr), the Disagreement measure (Dis) and the Double Fault measure (DF). We briefly recall these measures below:

Q-sta_{la,lb} = (N11 N00 − N01 N10) / (N11 N00 + N01 N10)    (3)

Corr_{la,lb} = (N11 N00 − N01 N10) / √[(N11 + N10)(N01 + N00)(N11 + N01)(N10 + N00)]    (4)

Dis_{la,lb} = (N01 + N10) / (N11 + N10 + N01 + N00)    (5)

DF_{la,lb} = N00 / (N11 + N10 + N01 + N00)    (6)

where la and lb are the two classifiers compared, N11 and N00 are the numbers of samples for which both classifiers la and lb make correct and incorrect decisions, respectively, while N10 and N01 are the numbers of samples for which la and lb do not agree on their labels.
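The four measures translate directly into code; a sketch assuming two 0/1 vectors marking which validation samples each classifier labels correctly (degenerate cases, e.g. a zero denominator in Q-sta, are not handled):

    import numpy as np

    def pairwise_diversity(correct_a, correct_b):
        a, b = np.asarray(correct_a), np.asarray(correct_b)
        n11 = int(np.sum((a == 1) & (b == 1)))   # both classifiers correct
        n00 = int(np.sum((a == 0) & (b == 0)))   # both classifiers wrong
        n10 = int(np.sum((a == 1) & (b == 0)))   # only the first correct
        n01 = int(np.sum((a == 0) & (b == 1)))   # only the second correct
        n = n11 + n10 + n01 + n00
        q = (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)            # formula 3
        corr = (n11 * n00 - n01 * n10) / np.sqrt(
            (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))       # formula 4
        dis = (n01 + n10) / n                                            # formula 5
        df = n00 / n                                                     # formula 6
        return q, corr, dis, df

    # e.g. two classifiers scored on six validation samples
    print(pairwise_diversity([1, 1, 0, 1, 0, 1], [1, 0, 0, 1, 1, 1]))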

To apply the RSE method, the two important parameters to be determined are the ensemble size (L) and the subspace size (m). Kuncheva et al. [14] show that there is no easily available effective pair of (m, L) values for RSE when dealing with high-dimensional classification problems. However, the authors indicate that a small L and a larger m are preferable in the case of fMRI classification. Although some similarities (e.g. dimensionality and a high signal-to-noise ratio) between our problem and the fMRI problem addressed in [14] do exist, there is a fundamental difference between the two approaches: we engage RSE for feature selection, not for classification. Therefore, the conclusion reported in [14] might not be valid here and some investigation is necessary. In the following subsection, experiments are presented by first analysing the effect of m and L on different aspects of the proposed DAFS method.

4.3. Numerical experiments

Fig. 2. Recognition rates obtained for different pairs of (m, L) in DAFS for the DLBCL dataset. In the left figure, the accuracies of DAFS with Q-sta as the diversity measure are presented for the kNN, SVM and MLP classifiers. In the right figure, the accuracies of DAFS using SVM as the base classifier are presented for the Q-sta, Corr, Dis and DF diversity measures.


The aim of the first set of experiments is to determine the most effective pair of (m, L) values for DAFS on gene expression data. For each dataset, various (m, L) pairs (selected in line with formula 1 so that the feature-space coverage probability equals 1) were tested in terms of the recognition rate. Numerical results obtained by DAFS based on each of the three classifiers considered (kNN, SVM and MLP) in all possible combinations with the four diversity measures (Q-sta, Corr, Dis and DF) have been analysed.

Figure 2 shows the results of experiments for the DLBCL dataset obtained by DAFS based on five different (m, L) pairs, i.e. (10, 10000), (50, 500), (500, 100), (1000, 20) and (1000, 10). On the left side, the results of DAFS using the Q-sta diversity measure are compared for the three classifiers. On the right side, the results of DAFS using SVM are compared for the four diversity measures. For the DLBCL dataset, the best results are obtained when using SVM as the base classifier and Q-sta or DF as the diversity measure. For comparison purposes, we also test the standard RSE without any feature selection. The DAFS results are clearly better than those of the basic RSE classifier (depicted in Fig. 3 for the same (m, L) pairs). Furthermore, the results indicate that (m, L) pairs selected to satisfy formula 1 trigger a similar performance of DAFS (there is no (m, L) pair clearly outperforming the others). The results are similar for the rest of the datasets given in Table 1.

The number of selected genes obtained by DAFS for different pairs of (m, L) is depicted in Fig. 4 (for the DLBCL dataset). The maximum number of features to be selected, G, has been set to 30. As shown in Fig. 4, the number of selected features (γ) based on DAFS with SVM is small for smaller m and larger L, and grows as m grows and L decreases. For the kNN and MLP classifiers, no particular trend in the obtained γ values can be observed.

Fig. 3. The performance of the RSE method for different (m, L) pairs obtained by majority vote and mean individual accuracy (for the DLBCL dataset).

Fig. 4. Number of features selected by the proposed DAFS method for different pairs of (m, L) for the DLBCL dataset.

Fig. 5. Computational running time of the proposed DAFS method for different pairs of (m, L) for the DLBCL dataset.

Fig. 6. The growth of the feature-space coverage probability for different m, L values. In the ensemble size curve, m = 20 and L varies from 20 to 4000. In the subset size curve, L = 20 while m varies from 20 to 4000.


Figure 5 shows the cost of the DAFS method in terms of the required computational running time for different pairs of (m, L). The required feature selection time depends on the m and L values and on the base classifiers. For a small subset size and a large number of base classifiers (SVMs in this case), the elapsed time for the DLBCL dataset is about 3 hours when the subspace size is 10 and the number of classifiers is 10000, while for the (1000, 10) pair DAFS is as fast as 1 second. The time reported in Fig. 5 is on a base-10 logarithmic scale.

In the second set of experiments, we investigate the validity of the formulation given for feature-space coverage. Figure 6 plots the feature coverage probability (Pcov), which can be viewed as a guide map for choosing the (m, L) pair for DAFS such that Pcov is close to 1. Each curve shows the Pcov values obtained by varying one of the m, L parameters while keeping the other one fixed. As expected, feature-space coverage is directly influenced by the m and L values. A better coverage is obtained by higher m and L values but, of course, because of computational limitations, the optimum values for m and L should be determined for the problem at hand. As shown in Fig. 6, both value pairs (20, 2500) and (2500, 20) for (m, L) result in Pcov ≈ 1, but they may have totally different computational costs.
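The same "guide map" can be computed directly; a sketch that, for a given subspace size m, scans for the smallest ensemble size L pushing the coverage probability of formula 1 above a threshold (it reuses the hypothetical p_cov helper sketched in Section 3.1):

    def smallest_L(m, M, gamma, threshold=0.999, L_max=20000):
        # Brute-force scan: return the first L whose coverage probability
        # meets the threshold, or None if L_max is not enough.
        for L in range(1, L_max + 1):
            if p_cov(m, M, L, gamma) >= threshold:
                return L
        return None

    # e.g. for the DLBCL dimensionality (M = 5469) and gamma = 10:
    for m in (10, 50, 500, 1000):
        print(m, smallest_L(m, M=5469, gamma=10))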

Fig. 7. Error rates reported by DAFS using different diversity measures for the DLBCL dataset (top left: Q-sta, top right: Corr, bottom left: Dis, bottom right: DF).

The third set of experiments focuses on the relation between the DA measure proposed in this paper and the error rates obtained by kNN, SVM and MLP using the selected features returned by DAFS. We analyse these values for a final feature set of size k, where k = 1 . . . G (and G = 30). The features included in the final set of size k are the top k ranked features according to the DA measure. Figure 7 presents the results obtained for the DLBCL dataset (each chart corresponds to a different diversity measure used in the computation of DA). For each point k on the horizontal axis (representing the number of selected features), the charts in Fig. 7 show the error obtained by kNN, SVM and MLP, as well as the DA value of the k-th ranked feature normalized by the maximum DA (i.e. the DA value of the top ranked feature).

An important result emphasized in Fig. 7 (and furthermore in Fig. 8) is the existing correlation between the DA values and the error rates of the classifiers. This result underlines the suitability of the DA measure for an efficient detection of relevant features. The error rates start from 1 at k = 0 and improve dramatically in the first stages of increasing the number of selected features (roughly up until k = 5 for the DLBCL dataset, as shown in Fig. 7). Furthermore, it can be observed that for k = 11 a new minimum of the error rate is detected, but after this point adding information from more features to the classification task generates no better performance. However, SVM as a base classifier is able to trigger better results with more features, particularly in conjunction with the DF diversity measure (for k = 18 the reported error rate is the lowest).

Figure 8 extends the results to other datasets from Table 1, presenting the error rates reported by DAFS based on the best diversity measure for each dataset. Similar to Fig. 7, the errors of kNN, MLP and SVM in Fig. 8 are depicted for k selected features (k = 1 . . . 30). The DA measure decreases steadily with an increasing number k of selected features, but this improvement is significant up until approximately k = 10.

Fig. 8. Error rates reported by DAFS for four different datasets, achieved by the best diversity measure for each (top left: DLBCL, top right: 9_Tumors, bottom left: Leukemia1, bottom right: Prostate_Tumor).

In the case of the DLBCL and Leukemia1 datasets, the performance of all considered classifiers is almost similar, whereas SVM and kNN trigger the best results for the 9_Tumors dataset, while SVM and MLP outperform kNN for the Prostate_Tumor dataset.

Figure 9 presents the accuracies of different classifiers obtained based on certain final selected feature sets. SVM triggers the best recognition rates based on DAFS with Q-sta, Corr or Dis as the diversity measure. The superiority of the DF diversity measure for the DLBCL dataset is clearly emphasized in Fig. 9 (bottom right), which shows that the accuracies of SVM, kNN and MLP are almost similar for many feature subsets. The optimum number of features varies from one diversity measure to another.

Fig. 9. Recognition rates of different classifiers using different numbers of selected features for the DLBCL dataset (top left: Q-sta, top right: Corr, bottom left: Dis, bottom right: DF).

4.4. Comparisons of DAFS with related methods

The proposed feature selection algorithm is compared to related methods based on the recognition rates of the final classifiers. Table 2 presents the comparative numerical results with respect to the baseline methods. The accuracies of single kNN, SVM and MLP are given in the first three columns of results. The next two columns show the accuracy obtained by the standard RSE and the average individual accuracy in RSE (column labeled Avg. Ind.). These five columns provide the baseline results of standard methods without any feature selection to comparatively assess the DAFS performance. The last column in Table 2 contains the accuracy reported by the proposed DAFS method. Except for the single classifier results, SVM has been used as the final classifier to produce the recognition rates.


The average rank of DAFS (computed based on the results over all datasets) is the best among all methods compared in Table 2. The advantages of DAFS compared to the baseline methods are clear from the results obtained. Table 2 shows that SVM classifiers are robust against dimensionality and able to achieve a good performance even without feature selection. However, feature selection significantly improves the classification performance of non-SVM learners: the recognition rates of kNN and MLP increase dramatically (more than doubling) when DAFS is first used for feature selection.

Furthermore, we compare the performance of DAFS with the most significant related methods for which results are available or can be computed for the same cancer diagnosis datasets considered in this study. Two commonly used feature selection methods in machine learning, i.e. Mutual Information (MI) and MRMR [7], have been implemented and applied for all considered cancer gene expression datasets. Furthermore, we directly compare the DAFS recognition rates with those reported in [29], where several multicategory classification methods are evaluated for the same gene expression cancer diagnosis datasets considered in the current paper. Statnikov et al. [29] analyse the results of multicategory SVMs (MC-SVMs), considered among the most effective classifiers in performing accurate cancer diagnosis based on microarray data. DAFS is compared with all MC-SVM techniques analysed in [29]: One-Versus-Rest (OVR), One-Versus-One (OVO), DAGSVM (which uses a rooted binary decision directed acyclic graph in the testing phase), the method by Weston and Watkins (WW) and the method by Crammer and Singer (CS). Comparative results are presented in Table 3 (the last column repeats the results obtained by DAFS for direct comparison purposes).


Table 2
Recognition rates obtained by the proposed DAFS method (last column) compared to the results of baseline methods

Dataset          single kNN   single SVM   single MLP   RS ensemble   Avg. Ind.   DAFS method
11_Tumors        73.2         94.6         55.8         68.1          52.8        95.3
14_Tumors        45.8         75.6         11.9         50.9          33.7        74.9
9_Tumors         44.1         62.7         20.5         44.9          31.6        65.6
Brain_Tumor1     85.3         91.2         83.9         84.4          79.1        90.6
Brain_Tumor2     65.1         77.8         62.6         63.0          54.9        78.8
Leukemia1        81.9         97.5         79.4         82.0          73.1        97.5
Leukemia2        85.9         95.2         88.6         87.7          80.3        97.2
Lung_Cancer      86.5         96.5         84.8         87.7          79.1        95.4
SRBCT            83.2         98.1         89.5         90.5          78.4        100
Prostate_Tumor   80.7         92.0         75.5         79.9          71.1        92.3
DLBCL            77.9         95.8         81.8         78.8          72.5        98.0
Avg. Rank        3.91         1.64         4.73         3.55          5.82        1.27

Table 3
Comparison of recognition rates obtained by DAFS (last column) with related methods: MI, MRMR and MC-SVM techniques – One-Versus-Rest (OVR), One-Versus-One (OVO), DAGSVM, the Weston and Watkins (WW) method and the Crammer and Singer (CS) method

Dataset          MI     MRMR   OVR     OVO     DAGSVM   WW      CS      DAFS
11_Tumors        92.2   95.3   94.68   90.36   90.36    94.68   95.3    95.3
14_Tumors        73.6   74.9   74.98   47.07   47.35    69.07   76.60   74.9
9_Tumors         60.0   65.6   65.10   58.57   60.24    62.24   65.33   65.6
Brain_Tumor1     89.9   91.0   91.67   90.56   90.56    90.56   90.56   90.6
Brain_Tumor2     77.5   78.8   77.00   77.83   77.83    73.33   72.83   78.8
Leukemia1        96.9   97.0   97.5    97.32   96.07    97.5    97.5    97.5
Leukemia2        94.8   95.0   97.32   95.89   95.89    95.89   95.89   97.2
Lung_Cancer      94.9   96.5   96.05   95.59   95.59    95.55   96.55   95.4
SRBCT            97.1   97.3   100     100     100      100     100     100
Prostate_Tumor   91.8   92.0   92.0    92.0    92.0     92.0    92.0    92.3
DLBCL            95.6   97.3   97.5    97.5    97.5     97.5    97.5    98.0
Avg. Rank        7.09   3.64   2.45    4.27    4.27     3.73    2.45    2.09
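The average ranks in Tables 2 and 3 follow a standard recipe: rank the methods on each dataset (rank 1 = highest accuracy) and average the ranks over the datasets. A sketch assuming SciPy's rankdata with its default average handling of ties (the tie-handling convention of the original computation is an assumption):

    import numpy as np
    from scipy.stats import rankdata

    def average_ranks(acc):
        # acc: (n_datasets, n_methods) matrix of recognition rates.
        ranks = np.array([rankdata(-row) for row in acc])  # per-dataset ranks, 1 = best
        return ranks.mean(axis=0)

    # e.g. the SRBCT and DLBCL rows of Table 3 (MI, MRMR, OVR, OVO, DAGSVM, WW, CS, DAFS)
    acc = np.array([[97.1, 97.3, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0],
                    [95.6, 97.3,  97.5,  97.5,  97.5,  97.5,  97.5,  98.0]])
    print(average_ranks(acc))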


As indicated in Table 3, DAFS clearly obtains better recognition rates than mutual information. An overall better performance compared to MRMR can also be observed. For three of the 11 considered datasets (i.e. 11_Tumors, 9_Tumors and Brain_Tumor1), DAFS and MRMR report the same accuracy, while for the Lung_Cancer dataset (a five-class problem with a high maximum prior probability of the dominant class), MRMR obtains a better recognition rate of 96.5 (the same accuracy as single SVM) compared to the 95.4 reported by DAFS. However, the proposed DAFS outperforms MRMR for the remaining seven datasets considered.

Compared to the MC-SVM techniques, DAFS has an overall better performance, confirmed by the best average rank of 2.09 (see Table 3). For the SRBCT dataset, all MC-SVM methods and DAFS obtain the best possible accuracy of 100% (only MI and MRMR are not able to reach this optimal value). DAFS obtains equal or better recognition rates compared to OVO, DAGSVM and WW for all datasets except the Lung_Cancer dataset (the same one for which MRMR outperforms DAFS, as explained above). OVR obtains better accuracies than DAFS in four cases, while for the majority of datasets a better performance of DAFS is observed. Compared to CS, DAFS obtains the same recognition rates for three datasets, slightly worse accuracies in only two cases, and a better performance for the remaining six datasets considered.

It is worth noting the huge difference in computational running time between the MC-SVM techniques and the proposed DAFS method. As reported in [29], the total running time in hours for all 11 datasets


ranges from 289.01 to 772.43 hours when LOOCV is used in all MC-SVM methods, and from 7.88 to 19.28 hours when a stratified 10-fold cross-validation is used. In comparison, DAFS (which uses LOOCV) needs from as little as a few seconds up to 3–10 hours of running time per dataset (leading to a maximum running time of approximately 90 hours for all 11 datasets), depending on the specific (m, L) parameter values selected in DAFS (see Fig. 5 for a visualization of the computational time required by DAFS for the DLBCL dataset). Therefore, DAFS brings the important advantage of a low computational time cost compared to related methods, which is clearly a critical factor in assessing the performance of gene selection methods.

5. Conclusions and future work

AU

Acknowledgments

TH OR

A simple and effective feature selection method has been proposed and evaluated for gene expression cancer diagnosis. The main strength of the introduced DAFS method is represented by an efficient combination of classification accuracy information extracted using RSE and diversity information generated by using certain measures. The first component facilitates the identification of relevant features in obtaining a good recognition rate in classification while the second component focuses on minimizing the redundancy among selected features. Experimental results for 11 microarray cancer diagnosis datasets support the conclusions that DAFS is a fast and efficient feature selection method. It has been shown that DAFS is able to reduce the computational cost of the classification by removing irrelevant features and improving the recognition rate. The best results are obtained using SVM as base classifier while DAFS performance induced by the diversity measure varies from one dataset to another. Obviously, the DAFS functionality is independent from the particular classifier used while the recognition rate based on the selected features does depend on the final classifier. For gene expression cancer classification, experiments revealed that better accuracies are produced by SVM in comparison with kNN and MLP. Moreover, the non-SVM learners benefit more from the proposed feature selection method in the sense that classfication accuracies are significantly improved when using DAFS as a preprocessing step. Future work focuses on extending the numerical experiments to other microarray datasets. Also, the performance of other classifiers in connection with more diversity measures will be investigated. The features of DAFS will be explored in the context of clustering tasks for microararay data where the selection of relevant features should rely on various correlation measures rather than classification performances.

Acknowledgments

This work is supported by CNCS Romania through grant PN II TE 320 – Emergence, auto-organization and evolution: New computational models in the study of complex systems.

References

[1] M. Banerjee, S. Mitra and H. Banka, Evolutionary rough feature selection in gene expression data, IEEE Transactions on Systems, Man, and Cybernetics – Part C: Applications and Reviews 37(4) (2007), 622–632.
[2] A. Bertoni, R. Folgieri and G. Valentini, Random subspace ensembles for the bio-molecular diagnosis of tumors, Models and Metaphors from Biology to Bioinformatics Tools, NETTAB 2004.
[3] A. Bertoni, R. Folgieri and G. Valentini, Feature selection combined with random subspace ensemble for gene expression based diagnosis of malignancies, in: Biological and Artificial Intelligence Environments, B. Apolloni, M. Marinaro and R. Tagliaferri, eds, Springer, 2005, pp. 29–36.
[4] A. Bertoni, R. Folgieri and G. Valentini, Bio-molecular cancer prediction with random subspace ensembles of Support Vector Machines, Neurocomputing 63 (2005), 535–539.
[5] A.-L. Boulesteix, C. Strobl, T. Augustin and M. Daumer, Evaluating microarray-based classifiers: An overview, Cancer Informatics 6 (2008), 77–97.
[6] L. Breiman, Bagging predictors, Machine Learning 24(2) (1996), 123–140.
[7] C. Ding and H. Peng, Minimum redundancy feature selection from microarray gene expression data, Journal of Bioinformatics and Computational Biology 3 (2005), 185–205.
[8] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield and E.S. Lander, Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286(5439) (1999), 531–537.
[9] I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning 46(1–3) (2002), 389–422.
[10] T. Ho, The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8) (1998), 832–844.
[11] H.-L. Huang and F.-L. Chang, ESVM: Evolutionary support vector machine for automatic feature selection and classification of microarray data, BioSystems 90 (2007), 516–528.
[12] I.B. Jeffery, D.G. Higgins and A.C. Culhane, Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data, BMC Bioinformatics 7 (2006), 359.
[13] V.P. Kamath, L.O. Hall, T.J. Yeatman and S. Eschrich, Multivariate feature selection using random subspace classifiers for gene expression data, Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE 2007), 2007, pp. 1041–1045.
[14] L.I. Kuncheva, J.J. Rodriguez, C.O. Plumpton, D.E.J. Linden and S.J. Johnston, Random subspace ensembles for fMRI classification, IEEE Transactions on Medical Imaging 29(2) (2010), 531–542.
[15] L.I. Kuncheva and C.J. Whitaker, Measures of diversity in classifier ensembles, Machine Learning 51 (2003), 181–207.
[16] C. Lai, M.J.T. Reinders and L. Wessels, Random subspace method for multivariate feature selection, Pattern Recognition Letters 27(10) (2006), 1067–1076.
[17] C. Lai, M.J.T. Reinders, L.J. van't Veer and L.F.A. Wessels, A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets, BMC Bioinformatics 7 (2006), 235.
[18] P. Larrañaga, B. Calvo, R. Santana, C. Bielza, J. Galdiano, I. Inza, J.A. Lozano, R. Armañanzas, G. Santafé, A. Pérez and V. Robles, Machine learning in bioinformatics, Briefings in Bioinformatics 7(1) (2006), 86–112.
[19] C.P. Lee and Y. Leu, A novel hybrid feature selection method for microarray data analysis, Applied Soft Computing 11 (2011), 208–213.
[20] H. Liu and R. Kustra, Dimension reduction of microarray data with penalized independent component analysis, NIPS Workshop on New Problems and Methods in Computational Biology, 2005.
[21] H. Liu, L. Liu and H. Zhang, Ensemble gene selection by grouping for microarray data classification, Journal of Biomedical Informatics 43 (2010), 81–87.
[22] Y. Lu and J. Han, Cancer classification using gene expression data, Information Systems 28(4) (2003), 243–268.
[23] P. Maji and S. Paul, Rough set based maximum relevance-maximum significance criterion and gene selection from microarray data, International Journal of Approximate Reasoning 52 (2011), 408–426.
[24] R. Malutan, P. Gomez Vilda and M. Borda, Independent component analysis algorithms for microarray data analysis, Intelligent Data Analysis 14(2) (2010), 193–206.
[25] H. Moon, H. Ahn, R.L. Kodell, S. Baek, C.-J. Lin and J.J. Chen, Ensemble methods for classification of patients for personalized medicine with high-dimensional data, Artificial Intelligence in Medicine 41(3) (2007), 197–207.
[26] R. Ruiz, J.C. Riquelme and J.S. Aguilar-Ruiz, Incremental wrapper-based gene selection from microarray data for cancer classification, Pattern Recognition 39 (2006), 2383–2392.
[27] Y. Saeys, I. Inza and P. Larrañaga, A review of feature selection techniques in bioinformatics, Bioinformatics 23 (2007), 2507–2517.
[28] A. Statnikov, C. Aliferis, I. Tsamardinos, D. Hardin and S. Levy, http://www.gems-system.org, 2004.
[29] A. Statnikov, C.F. Aliferis, I. Tsamardinos, D. Hardin and S. Levy, A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis, Bioinformatics 21(5) (2005), 631–643.
[30] G. Valentini, M. Muselli and F. Ruffino, Cancer recognition with bagged ensembles of support vector machines, Neurocomputing 56 (2004), 461–466.
[31] J.-Y. Yeh, Applying data mining techniques for cancer classification on gene expression data, Cybernetics and Systems 39(6) (2008), 583–602.

[32] X. Zhou and D.P. Tuck, MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data, Bioinformatics 23 (2007), 1106–1114.
[33] Z. Zhu, Y.-S. Ong and M. Dash, Markov blanket-embedded genetic algorithm for gene selection, Pattern Recognition 40(11) (2007), 3236–3248.
