A Genetic Algorithm for Hierarchical Multi-Label Classification

Ricardo Cerri, Rodrigo C. Barros, and André C. P. L. F. de Carvalho
Institute of Mathematical Sciences and Computing (ICMC-USP), University of São Paulo
Av. Trabalhador São-Carlense, 400, Centro

{cerri,rcbarros,andre}@icmc.usp.br

ABSTRACT

In Hierarchical Multi-Label Classification (HMC) problems, each example can be classified into two or more classes simultaneously, unlike in standard classification. Moreover, the classes are structured in a hierarchy, in the form of either a tree or a directed acyclic graph. An example can therefore be assigned to two or more paths of the hierarchical structure, resulting in a complex classification problem with possibly hundreds or thousands of classes. Several methods have been proposed to deal with such problems, some employing a single classifier to deal with all classes simultaneously (global methods), and others employing many classifiers to decompose the original problem into a set of subproblems (local methods). In this work, we propose a novel global method called HMC-GA, which employs a genetic algorithm to solve the HMC problem. In our approach, the genetic algorithm evolves the antecedents of classification rules, in order to optimize the coverage of each antecedent. A set of optimized antecedents is then selected to build the corresponding consequents of the rules (the sets of classes to be predicted). Our method is compared to state-of-the-art HMC algorithms on protein function prediction datasets. The experimental results show that our approach presents competitive predictive accuracy, suggesting that genetic algorithms constitute a promising alternative for the hierarchical multi-label classification of biological data.

Categories and Subject Descriptors
I.2.6 [Learning]: Induction and Knowledge Acquisition

Keywords
Genetic algorithms, hierarchical multi-label classification, classification rules, protein function prediction.

1. INTRODUCTION


Standard classification problems require a classifier to assign each test example to a single class. These classes are mutually exclusive and do not constitute any hierarchical structure. Nevertheless, in many real-world classification problems, such as the classification of biological data, one or more classes can be further divided into subclasses or grouped into superclasses. Hence, the classes constitute a hierarchical structure, usually in the form of a tree or of a Directed Acyclic Graph (DAG). These problems are known in the Machine Learning (ML) literature as hierarchical classification problems, in which examples are assigned to classes associated with nodes of a hierarchy [17].

There are two well-known approaches to deal with hierarchical problems: the local (top-down) approach and the global (one-shot, big-bang) approach. In the former, standard classification algorithms are trained to produce a hierarchy of classifiers, which are then used in a top-down fashion to classify new examples. Initially, the most generic class is predicted, and it is then used to reduce the set of possible classes for the next hierarchical level. Note that, as the hierarchy is traversed toward the leaves, classification errors are propagated to deeper levels, unless some procedure is adopted to avoid this problem. The global approach, in turn, induces a unique classifier using all classes of the hierarchy at once and, after the training phase, classifies a new example in just one step. Even though global methods are usually more complex, they avoid the error-propagation problem of local approaches. Another advantage of global methods is that they usually generate less complex classification rules than the collection of all the rule sets generated by local methods [2, 22, 26].

Hierarchical classification problems can be further divided to single out those in which the classes are not only structured in a hierarchy, but an example can also be assigned to more than one class at the same hierarchical level. This kind of problem is known as Hierarchical Multi-Label Classification (HMC), and it is very common in protein and gene function prediction and in text classification [25]. In HMC problems, an example can be assigned to two or more paths in a class hierarchy. Formally, given a space of examples X, the objective of the training process is to find a function that maps each example xi into a set of classes, respecting the constraints of the hierarchical structure and optimizing some quality criterion.

In this paper, we propose a novel method called HMC-GA (Hierarchical Multi-Label Classification with Genetic Algorithm). It is a global HMC method in which a Genetic Algorithm (GA) evolves sets of classification rules. HMC-GA evolves the antecedents of the rules, optimizing rule coverage within the training set. At the end of the evolution, a set of optimized rule antecedents is selected to build the consequents of the rules (the classes to be predicted). To the best of our knowledge, this is the first work to propose an evolutionary algorithm for HMC problems.

This paper is organized as follows. In Section 2, we briefly review work related to our approach. Our novel global method for HMC, which employs a GA, is described in Section 3. We detail the experimental methodology in Section 4 and present the experimental analysis in Section 5, in which our method is compared with other bio-inspired HMC methods, as well as with state-of-the-art decision trees for HMC problems, on protein function prediction datasets. Finally, we summarize our conclusions and point to future research in Section 6.

2. RELATED WORK

In this section, we review the most recent work in HMC, with emphasis on comprehensible predictive models such as classification rules and decision trees.

In [26], three algorithms based on the concept of Predictive Clustering Trees (PCT) for HMC problems are compared, namely Clus-HMC [3], Clus-SC and Clus-HSC. The authors also discuss different hierarchical structures and their impact on the design of algorithms for HMC. Whereas in [26] the authors employ the Euclidean distance as the proximity measure for creating PCTs, the work of Aleksovski et al. [1] investigates the use of other proximity measures, such as Jaccard, SimGIC and ImageClef. The authors report no significant differences among the proximity measures employed.

In [24], an ensemble of decision trees induced by Clus-HMC [26], namely Clus-HMC-ENS, is proposed to improve HMC. The authors make use of Bagging [6] to generate bootstrap samples of the training data, training a classifier on each sample. The predictions are combined by either the average (for regression) or the majority vote (for classification) of the target attribute. Ensembles of PCT-based classifiers are also investigated in [12–14], where the authors also experiment with Random Forests [7].

Sangsuriyun et al. [23] propose a global method for HMC, namely Hierarchical Multi-Label Associative Classification (HMAC). This approach provides classification rules that consider both the presence (positive rules) and the absence (negative rules) of a given attribute when classifying an example. Moreover, it takes into account rules that predict a set of negative classes, i.e., rules stating that a given example does not belong to a set of classes.

To the best of our knowledge, none of the proposed methods for HMC makes use of the Evolutionary Algorithms (EAs) paradigm. The bio-inspired strategies proposed so far for dealing with HMC problems are Artificial Neural Networks [8–10, 27], Artificial Immune Systems [2] and Ant Colony Optimization [22].

3. HMC-GA

In this section, we present a novel global method for HMC problems called Hierarchical Multi-Label Classification with Genetic Algorithm (HMC-GA). We present in detail the GA designed for evolving the antecedents of classification rules, and also how we generate the consequents of the rules.

3.1 Individual Representation

In HMC-GA, each individual is a string of integer/real values, representing a sequence of tests (antecedent) that constitute the classification rule. Recall that a classification rule for HMC problems is of the type

IF (A1 OP ∆) AND ... AND (Am OP ∆) THEN {C1, C1.3, C1.4, C1.4.2}

The antecedent of the rule is comprised of AND clauses, which test a given dataset attribute Ai according to a threshold ∆, based on a corresponding operator OP. For nominal attributes, the operators employed are = and ≠. For numeric attributes, the available operators are >, <, ≥ and ≤. For representing every possible test, each AND clause is associated with an attribute and encoded as a 4-tuple {FLAG, OP, ∆1, ∆2}. Gene FLAG can be either 1 or 0, indicating the presence (1) or absence (0) of the test over the attribute. Gene OP is an integer that indexes one of the possible operators of the test. Genes ∆1 and ∆2 are real values to be used as thresholds within the tests. Figure 1 depicts this rationale.

Figure 1: Individual representation (a sequence of {FLAG, OP, ∆1, ∆2} clauses, one per attribute).
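To make the encoding concrete, the sketch below renders one clause as a small Python structure. The field names, the operator table and the satisfies helper are our own illustrative assumptions, not the authors' implementation; nominal attributes, which compare a category index with = or ≠, are omitted for brevity.

from dataclasses import dataclass
from typing import Optional

NUMERIC_OPS = (">", "<", ">=", "<=")  # the OP gene indexes this tuple

@dataclass
class Clause:
    flag: int            # 1 = test active, 0 = inactive
    op: int              # index of the comparison operator in NUMERIC_OPS
    d1: Optional[float]  # threshold used by > and >= (null otherwise)
    d2: Optional[float]  # threshold used by < and <= (null otherwise)

def satisfies(clause: Clause, value: float) -> bool:
    """Evaluate one active numeric clause against an attribute value."""
    op = NUMERIC_OPS[clause.op]
    if op == ">":
        return value > clause.d1
    if op == "<":
        return value < clause.d2
    if op == ">=":
        return value >= clause.d1
    return value <= clause.d2  # "<="

An individual is then simply a list containing one Clause per dataset attribute.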

3.2 Evolution

We present the pseudocode of HMC-GA in Algorithm 1. The generation of the initial population (line 3) is performed by probabilistically activating the gene FLAG of the clauses in an individual (each clause has a 20% chance of being used), and by randomly setting different values for OP. The thresholds ∆1 and ∆2 are set according to the randomly assigned operator. Operators > and ≥ result in ∆2 ← null and ∆1 assigned a randomly generated real value in the interval [min, max], where min (max) is the minimum (maximum) value of the corresponding attribute in the training dataset. Conversely, operators < and ≤ result in ∆1 ← null and ∆2 assigned a randomly generated real value in the interval [min, max]. Operators = and ≠ result in ∆2 ← null and ∆1 assigned a randomly selected category of the corresponding attribute, indexed by an integer. The reason we defined two distinct threshold genes is to increase the flexibility of the algorithm, allowing composite tests such as ∆1 ≤ Ai ≤ ∆2; composite tests are, however, left for future research.

Regarding the evolution, HMC-GA is a generational GA, but also a sequential covering rule algorithm. In a sequential covering algorithm, examples covered by a rule are removed from the training set, so that newly generated rules can fit the remaining uncovered examples. HMC-GA runs a full evolutionary cycle (lines 3-19), and then saves all rules from the last generation whose number of covered examples surpasses a given threshold (lines 21 and 22). In addition, all examples covered by a saved rule are removed from the training set (line 25).
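A minimal sketch of this initialization for numeric attributes, reusing the Clause structure sketched in Section 3.1. The 20% activation probability follows the text, while function and parameter names are illustrative assumptions:

import random

def random_clause(min_val: float, max_val: float, p_active: float = 0.2) -> Clause:
    # Each clause has a 20% chance of having its FLAG gene activated.
    flag = 1 if random.random() < p_active else 0
    op = random.randrange(len(NUMERIC_OPS))
    # The threshold matching the drawn operator is sampled from the
    # attribute's [min, max] range observed in the training set.
    threshold = random.uniform(min_val, max_val)
    if NUMERIC_OPS[op] in (">", ">="):
        return Clause(flag, op, d1=threshold, d2=None)
    return Clause(flag, op, d1=None, d2=threshold)

def random_individual(attr_ranges):
    # attr_ranges: list of (min, max) pairs, one per numeric attribute.
    return [random_clause(lo, hi) for (lo, hi) in attr_ranges]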

Algorithm 1 HMC-GA.
Require: Training dataset D, number of generations G, population size p, minimum number of covered examples minCovered, maximum number of uncovered examples maxUncovered, crossover rate cr, mutation rate mr, mutate gene probability mg, tournament size t, elitism rate e.
 1: Rules ← ∅
 2: while (|D| > maxUncovered) do
 3:   Generate InitialPop
 4:   calculateFitness(InitialPop, D)
 5:   CurPop ← InitialPop
 6:   j ← G
 7:   while (j > 0) do
 8:     NewPop ← ∅
 9:     NewPop ← NewPop ∪ CurPop.elite(e)
10:     repeat
11:       Parents ← selection(CurPop, t)
12:       Children ← crossover(Parents, cr)
13:       NewPop ← NewPop ∪ Children
14:     until (|NewPop| = p)
15:     NewPop ← mutation(NewPop, mr, mg)
16:     CurPop ← NewPop
17:     calculateFitness(CurPop, D)
18:     j ← j − 1
19:   end while
20:   for (r = 1 to p) do
21:     if (CurPop.rule[r].coverage > minCovered) then
22:       Rules ← Rules ∪ CurPop.rule[r]
23:     end if
24:   end for
25:   Remove examples from D covered by Rules
26: end while
27: return Rules

Next, HMC-GA starts a full new evolutionary cycle, and this process is repeated until only a few examples (a user-defined amount) are left uncovered. HMC-GA implements tournament selection (line 11) and one-point crossover (line 12). Mutation (line 15) operates over a percentage mr of the individuals. For these individuals, each gene has a probability mg of suffering mutation, which can be either a flag mutation or a threshold mutation. Flag mutation is a bit-flip operation, activating or deactivating the clause. Threshold mutation can either reduce (Equation 1) or increase (Equation 2) the value of the appropriate threshold according to a randomly selected correction:

∆i ← ∆i − (random × ∆i)    (1)

∆i ← ∆i + (random × ∆i)    (2)
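A sketch of this per-gene mutation, under the assumption that the choice between flag and threshold mutation, and between the reduction of Equation (1) and the increase of Equation (2), is made uniformly at random (the paper does not fix these details; names are again illustrative):

import random

def mutate(individual, mg: float = 0.5):
    for clause in individual:
        if random.random() >= mg:
            continue  # this gene does not mutate
        if random.random() < 0.5:
            # Flag mutation: bit-flip that activates or deactivates the clause.
            clause.flag = 1 - clause.flag
        else:
            # Threshold mutation: Delta <- Delta -/+ (random * Delta),
            # applied to whichever threshold the clause actually uses.
            r = random.random() * random.choice((-1.0, 1.0))
            if clause.d1 is not None:
                clause.d1 += r * clause.d1
            else:
                clause.d2 += r * clause.d2
    return individual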

HMC-GA adopts as fitness function (lines 4 and 17) the average number of examples that satisfy each active clause of an individual. In other words, the fitness function has a bias towards individuals whose active clauses are satisfied by the greatest number of examples. Algorithm 2 shows how the fitness of a single individual is calculated.

Algorithm 2 Fitness Calculation.
Require: Individual I, Dataset D
 1: fitness ← 0
 2: maxExamples ← number of examples in D
 3: maxClauses ← number of attributes in D
 4: activeClauses ← number of active clauses in I
 5: for e = 1 to maxExamples do
 6:   sum ← 0
 7:   for c = 1 to maxClauses do
 8:     if I.clause[c] is active then
 9:       if I.clause[c] satisfies D.example[e] then
10:         sum ← sum + 1
11:       end if
12:     end if
13:   end for
14:   fitness ← fitness + sum
15: end for
16: fitness ← fitness ÷ activeClauses
17: return fitness
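A direct Python rendering of Algorithm 2, assuming each example is a list of attribute values aligned with the individual's clause list and reusing the satisfies helper sketched earlier:

def fitness(individual, dataset) -> float:
    active_clauses = sum(1 for c in individual if c.flag == 1)
    if active_clauses == 0:
        return 0.0
    total = 0
    for example in dataset:
        # Count how many active clauses this example satisfies.
        for attr_index, clause in enumerate(individual):
            if clause.flag == 1 and satisfies(clause, example[attr_index]):
                total += 1
    # Normalize by the number of active clauses (line 16 of Algorithm 2).
    return total / active_clauses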

3.3 Consequent Generation

HMC-GA evolves the antecedents of the rules, i.e., the AND clauses that form a rule. The consequent of the rule, which indicates the classes to which the examples that satisfy the rule belong, is calculated using a deterministic procedure, as follows. Given the set of examples Sr covered by a rule r, the consequent is a vector of length k (where k is the number of class labels in the class hierarchy). The value of the ith component of the consequent vector for rule r is given by

consequentr,i = |Sr,i| / |Sr|    (3)

where |Sr,i| is the number of examples covered by rule r that belong to the ith class of the hierarchy. More specifically, the consequent of a rule is a vector in which each ith component is the proportion of covered examples that belong to the ith class. Each position in the vector is thus a number in the interval [0, 1], representing the probability that an example covered by the rule belongs to the corresponding class. Hence, in order to obtain the classification predictions from each rule, we must choose a threshold δ indicating whether a covered example belongs to class i (r[i] ≥ δ) or not (r[i] < δ).
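The deterministic consequent computation and the thresholding step can be sketched as follows. Here covered_labels is assumed to hold one binary class-membership vector per covered example, and the δ value shown is only a placeholder (Section 4 varies it to trace PR curves):

def build_consequent(covered_labels, num_classes: int):
    # Equation (3): the i-th component is the fraction of covered
    # examples that belong to the i-th class of the hierarchy.
    n = len(covered_labels)
    if n == 0:
        return [0.0] * num_classes
    return [sum(y[i] for y in covered_labels) / n for i in range(num_classes)]

def predict_classes(consequent, delta: float = 0.5):
    # Class i is predicted when its component reaches the threshold delta.
    return [1 if p >= delta else 0 for p in consequent]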

4. EXPERIMENTAL METHODOLOGY

Seven datasets related to protein functions of the Saccharomyces cerevisiae organism were employed in the experiments. The datasets are freely available at http://www.cs.kuleuven.be/~dtai/clus/hmcdatasets.html, and comprise bioinformatics data such as phenotype data and gene expression levels. They are organized in a hierarchy structured as a tree, according to the FunCat classification scheme developed by MIPS. Table 1 shows the main characteristics of the training, validation and test datasets used. The performance of HMC-GA was compared to three state-of-the-art decision trees for HMC problems presented in [26]: Clus-HMC, a global method that induces a single decision tree for the whole set of classes; Clus-HSC, a local method that explores the hierarchical relationships to build a decision tree for each hierarchical node; and Clus-SC, a local method that builds a binary decision tree for each class of the hierarchy.

Table 1: Summary of datasets: number of attributes (|A|), number of classes (|C|), total number of examples (Total) and number of multi-label examples (Multi).

                        Training        Validation      Test
Dataset    |A|   |C|    Total   Multi   Total   Multi   Total   Multi
Cellcycle  77    499    1628    1323    848     673     1281    1059
Church     27    499    1630    1322    844     670     1281    1057
Derisi     63    499    1608    1309    842     671     1275    1055
Eisen      79    461    1058    900     529     441     837     719
Gasch2     52    499    1639    1328    849     674     1291    1064
Pheno      69    455    656     537     353     283     582     480
Spo        80    499    1600    1301    837     666     1266    1047

These methods are based on the concept of Predictive Clustering Trees (PCT) [4]. Additionally, our method was compared with two other bio-inspired methods. The first, presented in [22] and named hmAnt-Miner, is a global method based on Ant Colony Optimization that is also employed for discovering HMC rules. The second, HMC-LMLP [10], is a local method based on Artificial Neural Networks trained with the Back-propagation algorithm.

To evaluate the performance of the algorithms, we make use of Precision-Recall curves (PR curves), an evaluation strategy that reflects the precision of a classifier as a function of its recall, and gives a more informative picture of an algorithm's performance when dealing with highly skewed datasets [11]. The hierarchical precision (hP) and hierarchical recall (hR) measures (Equations 4 and 5) used to construct the PR curves consider that an example belongs not only to a class, but also to all ancestor classes of that class [21]. Thus, given an example (xi, Ci), with xi belonging to the space X of examples, Ci the set of its predicted classes, and Ci′ the set of its true classes, Ci and Ci′ can be extended to contain their corresponding ancestor classes as Ĉi = ∪ck∈Ci Ancestors(ck) and Ĉi′ = ∪cl∈Ci′ Ancestors(cl), where Ancestors(ck) denotes the set of ancestors of the class ck.

hP = Σi |Ĉi ∩ Ĉi′| / Σi |Ĉi|    (4)

hR = Σi |Ĉi ∩ Ĉi′| / Σi |Ĉi′|    (5)
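A sketch of Equations (4) and (5), assuming the hierarchy is available as a dictionary mapping each class to its set of ancestors, and that each extended set also contains the class itself (an assumption of this sketch; the paper's Ancestors notation leaves this implicit):

def hierarchical_pr(predicted, true, ancestors):
    # predicted, true: lists of class-label sets, one pair per example.
    num = den_p = den_r = 0
    for C_pred, C_true in zip(predicted, true):
        # Extend each label set with all ancestors of its classes.
        C_hat = set()
        for c in C_pred:
            C_hat |= {c} | ancestors[c]
        C_hat_true = set()
        for c in C_true:
            C_hat_true |= {c} | ancestors[c]
        num += len(C_hat & C_hat_true)
        den_p += len(C_hat)
        den_r += len(C_hat_true)
    hP = num / den_p if den_p else 0.0  # Equation (4)
    hR = num / den_r if den_r else 0.0  # Equation (5)
    return hP, hR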

To obtain a PR curve, different threshold values are applied to the outputs of the methods in order to generate different hP and hR values. The outputs of the methods are represented by vectors of real values, where each value is the pertinence degree of a class to a given example. For each threshold, a point in the PR curve is obtained, and the final curves are then constructed by the interpolation of these points, according to [11]. The areas under these curves (AU(PRC)) are then approximated by summing the trapezoidal areas between consecutive points, and are used to compare the performances of the methods: the higher the AU(PRC) of a method, the better its predictive performance.
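The paper interpolates PR points following Davis and Goadrich [11]; as a simplified stand-in, the sketch below sweeps the threshold and sums plain trapezoids between consecutive (hR, hP) points, which only approximates that procedure:

def au_prc(score_fn, thresholds):
    # score_fn(t) must return the (hR, hP) point obtained with threshold t;
    # how the point is produced (see hierarchical_pr above) is up to the caller.
    points = sorted(score_fn(t) for t in thresholds)  # sort by recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0  # trapezoid between two points
    return area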

Besides predictive performance, the methods were also compared with respect to the size of the induced model. For HMC-GA and hmAnt-Miner, the model size is defined as the number of rules discovered; for Clus-HMC, Clus-HSC and Clus-SC, it is defined as the number of leaf nodes in the decision tree, since each path from the root to a leaf node can be considered a rule. The HMC-LMLP algorithm was not evaluated regarding model size, since it does not produce classification rules. We employed the Wilcoxon rank-sum test [18], with Holm's correction [19] for multiple comparisons, to verify the significance of the results with a confidence level of 95%. As in [26], [22] and [10], 2/3 of each dataset were used for inducing the classification models, and 1/3 for testing. Table 2 shows the user-defined parameter values used in HMC-GA. No attempt was made to tune these parameter values: they were defined either based on typical GA parameter values or on very few preliminary trials.

Table 2: HMC-GA parameters.

Parameter      Description                             Value
maxUncovered   Maximum number of uncovered examples    1%
G              Number of generations                   1000
minCovered     Minimum covered examples per rule       10
p              Population size                         100
cr             Crossover rate                          90%
mr             Mutation rate                           10%
mg             Mutate gene probability                 0.5
t              Tournament size                         17
e              Elitism rate                            5%

5. EXPERIMENTAL ANALYSIS

Table 3 shows the results of the experiments for each dataset. As HMC-GA and hmAnt-Miner are probabilistic methods, the results reported are the means and standard deviations obtained after 10 executions. The results presented for the other methods are those provided in their references [10, 26], since we are using exactly the same training and test sets. For HMC-LMLP, the number of training epochs needed to obtain its results is also shown. According to Table 3, the best predictive performance in the majority of the datasets was achieved by the global methods, i.e., Clus-HMC, hmAnt-Miner and HMC-GA, respectively. Table 4 shows the p-values obtained in all pairwise comparisons involving all methods in terms of (i) AU(PRC) and (ii) model size (number of rules), using the Wilcoxon rank-sum test with Holm's correction. It is possible to see that Clus-HMC outperforms HMC-GA in terms of both AU(PRC) and model size. However, considering the other methods in terms of AU(PRC), HMC-GA significantly outperforms the Clus-SC method, and presents results competitive with the state-of-the-art methods hmAnt-Miner and Clus-HSC. It also outperforms, though with no statistically significant differences, the methods Clus-HSC and HMC-LMLP, showing that a GA can be considered an effective strategy for HMC problems.

Table 3: Average AU(PRC) and model size obtained in the seven datasets (mean ± standard deviation; the number in parentheses after each HMC-LMLP result is the number of training epochs).

           HMC-GA                          HMC-LMLP            hmAnt-Miner                   Clus-HMC       Clus-HSC       Clus-SC
Dataset    AU(PRC)        size             AU(PRC)             AU(PRC)        size           AU(PRC) size   AU(PRC) size   AU(PRC) size
Cellcycle  0.150 ± 0.001  66.70 ± 13.40    0.144 ± 0.009 (20)  0.154 ± 0.001  28.67 ± 1.62   0.172   24     0.111   4037   0.106   9671
Church     0.149 ± 0.001  126.30 ± 42.35   0.140 ± 0.002 (10)  0.168 ± 0.001  8.20 ± 0.58    0.170   17     0.131   2221   0.128   4186
Derisi     0.152 ± 0.001  37.00 ± 10.20    0.138 ± 0.008 (20)  0.161 ± 0.002  19.33 ± 1.66   0.175   4      0.094   3520   0.089   7807
Eisen      0.165 ± 0.005  63.80 ± 7.70     0.173 ± 0.009 (40)  0.180 ± 0.003  19.00 ± 0.98   0.204   29     0.127   2995   0.132   6311
Gasch2     0.151 ± 0.001  42.80 ± 8.76     0.132 ± 0.012 (20)  0.163 ± 0.002  32.33 ± 1.52   0.195   26     0.121   3756   0.119   7850
Pheno      0.148 ± 0.003  25.80 ± 8.06     0.085 ± 0.009 (70)  0.162 ± 0.001  7.40 ± 0.77    0.160   8      0.152   777    0.149   1238
Spo        0.151 ± 0.001  48.40 ± 10.70    0.139 ± 0.006 (50)  0.174 ± 0.002  15.80 ± 1.17   0.186   6      0.103   3623   0.098   8527

Table 4: Summary of all pairwise comparisons according to the statistical tests (p-values).

(i) AU(PRC)      Clus-HMC  Clus-HSC  Clus-SC  hmAnt-Miner  HMC-GA
Clus-HSC         0.0087
Clus-SC          0.0087    0.9015
hmAnt-Miner      0.3893    0.0087    0.0087
HMC-GA           0.0359    0.1070    0.0396   0.0841
HMC-LMLP         0.0396    0.5377    0.5377   0.0841       0.1483

(ii) Model size  Clus-HMC  Clus-HSC  Clus-SC  hmAnt-Miner
Clus-HSC         0.0058
Clus-SC          0.0058    0.0350
hmAnt-Miner      0.6200    0.0058    0.0058
HMC-GA           0.0093    0.0058    0.0058   0.0093

Note that HMC-GA evolves only the antecedents of the rules, and it is guided by a fitness function biased towards example coverage rather than typical rule quality. Perhaps a more complex fitness function, addressing both example coverage and rule quality, could yield better results. Even though HMC-GA generates plenty of different rules (as presented in Table 3), we do not take into account the similarity of these rules, since we do not employ a pruning step in the algorithm. A post-processing pruning phase could significantly reduce the number of rules generated, unifying similar rules (e.g., rules that cover the same examples in the training set) and increasing the generalization power of the algorithm, possibly improving its performance on unseen data.

Figure 2 shows examples of rule antecedents generated by HMC-GA. Looking carefully at Figure 2(a), we can see that the two rules extracted from the Derisi dataset differ in only one condition: the first rule has the condition r7dashbkg > 1.5371082836783643, which is not present in the second rule. Clearly, the rule with fewer conditions is more general, and covers the same training examples covered by the first rule. In this case, the rule with more conditions can be removed. Figure 2(b) shows another example where a post-processing step could decrease the number of rules generated by HMC-GA. The two rules differ in the value of just one condition (alpha_0 > -9.52991816689719 versus alpha_0 > -12.177906449008962). The second condition makes its rule more general, since -12.177906449008962 < -9.52991816689719; in this case, the first rule covers the same examples as the second rule.

A post-processing step can thus remove the first rule.

(a) Derisi dataset. First rule:

g1_bkg > 230.6461178938027 AND r1 < 44673.51105919659 AND ri_bkg >= 223.71394608733962 AND f1 >= 0.0 AND g2_bkg <= 3538.2421816681153 AND g3_bkg > 1079.6229383829861 AND r3_bkg >= 980.0 AND f5 >= 0.0 AND f6 <= 0.0 AND f7 <= 0.0 AND r2_ratio < 1.45 AND r5dashbkg <= 31415.862932196775 AND r6dashbkg >= 163.8488465000004 AND r7dashbkg > 1.5371082836783643

Second rule:

g1_bkg > 230.6461178938027 AND r1 < 44673.51105919659 AND ri_bkg >= 223.71394608733962 AND f1 >= 0.0 AND g2_bkg <= 3538.2421816681153 AND g3_bkg > 1079.6229383829861 AND r3_bkg >= 980.0 AND f5 >= 0.0 AND f6 <= 0.0 AND f7 <= 0.0 AND r2_ratio < 1.45 AND r5dashbkg <= 31415.862932196775 AND r6dashbkg >= 163.8488465000004

(b) Eisen dataset. First rule:

alpha_0 > -9.52991816689719 AND alpha_7 <= 6.157049380743048 AND alpha_77 <= 1.06 AND elu_30 < 0.73 AND cdc15_170 >= -0.15 AND cdc15_210 < 0.45 AND spo_early >= -2.74 AND diau_1260 <= 1.52

Second rule:

alpha_0 > -12.177906449008962 AND alpha_7 <= 6.157049380743048 AND alpha_77 <= 1.06 AND elu_30 < 0.73 AND cdc15_170 >= -0.15 AND cdc15_210 < 0.45 AND spo_early >= -2.74 AND diau_1260 <= 1.52

Figure 2: Examples of rule antecedents: (a) Derisi dataset; (b) Eisen dataset. In both cases, the two rules cover the same examples in the training set.

Considering the model size, it is clear that hmAnt-Miner, Clus-HMC and HMC-GA produce considerably fewer rules than the local methods, resulting in much simpler and more interpretable final models. Comparing the three global methods, HMC-GA builds more rules than the other two, for the reasons previously discussed.

6. CONCLUSIONS

This work presented a novel global method for HMC problems that makes use of a genetic algorithm. The proposed method, named HMC-GA, evolves the antecedents of classification rules, which are comprised of AND clauses that test a given dataset attribute Ai according to a threshold ∆ and a corresponding operator OP. The fitness function is biased towards rules with high example coverage, and the evolutionary approach comprises a sequential covering strategy, removing from the training set the examples already covered by the generated rules. Experimental results suggest that HMC-GA achieves competitive predictive performance when compared to state-of-the-art decision trees for HMC problems [26], as well as to bio-inspired strategies such as Ant Colony Optimization [22] and Artificial Neural Networks [10]. These results are quite encouraging, especially considering that we employed a typical generational GA with no specific operator modifications to deal with multi-label problems, and that we made no attempt to tune the GA parameter values. The PCT-based methods, on the other hand, have been investigated and tuned for more than a decade [4, 5, 24].

For future research, we plan to investigate the application of a more complex fitness function, which evaluates both rule coverage and rule quality. In addition, we plan to employ a parameter tuning strategy [15] to improve the performance of HMC-GA on the tested datasets. Finally, we plan to test our approach in other domains, such as multi-label hierarchical text categorization [16, 20]. Hierarchies structured as DAGs will also be investigated, requiring modifications to the evaluation of the results provided by the method.

7. ACKNOWLEDGMENTS

The authors would like to thank Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) for funding this research.

8. REFERENCES

[1] D. Aleksovski, D. Kocev, and S. Džeroski. Evaluation of Distance Measures for Hierarchical Multilabel Classification in Functional Genomics. In Workshop on Learning from Multi-Label Data, pages 5–16, 2009.
[2] R. Alves, M. Delgado, and A. Freitas. Knowledge Discovery with Artificial Immune Systems for Hierarchical Multi-Label Classification of Protein Functions. In International Conference on Fuzzy Systems, pages 2097–2104, 2010.
[3] H. Blockeel, M. Bruynooghe, S. Džeroski, J. Ramon, and J. Struyf. Hierarchical Multi-classification. In Workshop on Multi-Relational Data Mining, pages 21–35, 2002.
[4] H. Blockeel, L. De Raedt, and J. Ramon. Top-down Induction of Clustering Trees. In International Conference on Machine Learning, pages 55–63, 1998.
[5] H. Blockeel, L. Schietgat, J. Struyf, S. Džeroski, and A. Clare. Decision Trees for Hierarchical Multilabel Classification: A Case Study in Functional Genomics. In PKDD, pages 18–29, 2006.
[6] L. Breiman. Bagging Predictors. Machine Learning, (2):123–140, 1996.
[7] L. Breiman. Random Forests. Machine Learning, pages 5–32, 2001.
[8] F. Brucker, F. Benites, and E. Sapozhnikova. Multi-Label Classification and Extracting Predicted Class Hierarchies. Pattern Recognition, pages 724–738, 2010.
[9] L. Cai and T. Hofmann. Exploiting Known Taxonomies in Learning Overlapping Concepts. In International Joint Conference on Artificial Intelligence, pages 714–719, 2007.
[10] R. Cerri and A. C. P. L. F. Carvalho. Hierarchical Multilabel Protein Function Prediction Using Local Neural Networks. In Brazilian Symposium on Bioinformatics, pages 10–17, 2011.
[11] J. Davis and M. Goadrich. The Relationship Between Precision-Recall and ROC Curves. In International Conference on Machine Learning, pages 233–240, 2006.
[12] I. Dimitrovski, D. Kocev, S. Loskovska, and S. Džeroski. Hierarchical Annotation of Medical Images. Pattern Recognition, pages 2436–2449, 2011.
[13] I. Dimitrovski, D. Kocev, S. Loskovska, and S. Džeroski. ImageCLEF Medical Image Annotation Task: PCTs for Hierarchical Multi-Label Classification. In CLEF, pages 231–238, 2009.
[14] I. Dimitrovski, D. Kocev, S. Loskovska, and S. Džeroski. Detection of Visual Concepts and Annotation of Images Using Ensembles of Trees for Hierarchical Multi-Label Classification. In ICPR, pages 152–161, 2010.
[15] A. E. Eiben and S. K. Smit. Parameter Tuning for Configuring and Analyzing Evolutionary Algorithms. Swarm and Evolutionary Computation, pages 19–31, 2011.
[16] A. Esuli, T. Fagni, and F. Sebastiani. Boosting Multi-Label Hierarchical Text Categorization. Information Retrieval, pages 287–313, 2008.
[17] A. Freitas and A. C. de Carvalho. A Tutorial on Hierarchical Classification with Applications in Bioinformatics. In Research and Trends in Data Mining Technologies and Applications, chapter VII, pages 175–208. 2007.
[18] M. Hollander and D. A. Wolfe. Nonparametric Statistical Methods. Wiley-Interscience, 2nd edition, January 1999.
[19] S. Holm. A Simple Sequentially Rejective Multiple Test Procedure. Scandinavian Journal of Statistics, pages 65–70, 1979.
[20] S. Kiritchenko, S. Matwin, and A. Famili. Functional Annotation of Genes Using Hierarchical Text Categorization. In Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, 2005.
[21] S. Kiritchenko, S. Matwin, and A. F. Famili. Hierarchical Text Categorization as a Tool of Associating Genes with Gene Ontology Codes. In European Workshop on Data Mining and Text Mining in Bioinformatics, pages 30–34, 2004.
[22] F. E. B. Otero, A. A. Freitas, and C. G. Johnson. A Hierarchical Multi-Label Classification Ant Colony Algorithm for Protein Function Prediction. Memetic Computing, pages 165–181, 2010.
[23] S. Sangsuriyun, S. Marukatat, and K. Waiyamai. Hierarchical Multi-Label Associative Classification (HMAC) Using Negative Rules. In International Conference on Cognitive Informatics, pages 919–924, 2010.
[24] L. Schietgat, C. Vens, J. Struyf, H. Blockeel, D. Kocev, and S. Džeroski. Predicting Gene Function Using Hierarchical Multi-Label Decision Tree Ensembles. BMC Bioinformatics, 2010.
[25] C. Silla and A. Freitas. A Survey of Hierarchical Classification Across Different Application Domains. Data Mining and Knowledge Discovery, 22:31–72, 2011.
[26] C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel. Decision Trees for Hierarchical Multi-Label Classification. Machine Learning, pages 185–214, 2008.
[27] C. Woolam and L. Khan. Multi-label Large Margin Hierarchical Perceptron. International Journal of Data Mining, Modelling and Management, pages 5–22, 2008.
