DIMENSIONALITY REDUCTION TECHNIQUES FOR ENHANCING AUTOMATIC TEXT CATEGORIZATION by Dina Adel Said A Thesis Submitted To the Faculty of Engineering at Cairo University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE in COMPUTER ENGINEERING

FACULTY OF ENGINEERING, CAIRO UNIVERSITY GIZA, EGYPT 2007

DIMENSIONALITY REDUCTION TECHNIQUES FOR ENHANCING AUTOMATIC TEXT CATEGORIZATION by Dina Adel Said A Thesis Submitted To the Faculty of Engineering at Cairo University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE in COMPUTER ENGINEERING UNDER THE SUPERVISION OF

Nevin Mahmoud Darwish

Nadia Hamed Hegazy

Professor

Professor

Faculty of Engineering

Informatics Research Department

Cairo University

Electronics Research Institute

FACULTY OF ENGINEERING, CAIRO UNIVERSITY GIZA, EGYPT 2007

DIMENSIONALITY REDUCTION TECHNIQUES FOR ENHANCING AUTOMATIC TEXT CATEGORIZATION by Dina Adel Said A Thesis Submitted To the Faculty of Engineering, Cairo University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE in COMPUTER ENGINEERING Approved by Examining Committee:

Prof. Nevin Mahmoud Darwish, Thesis Main Advisor Prof. Nadia Hamed Hegazy, Thesis Advisor Prof. Reem Reda Bahget, Member Prof. Mohamed Zaki Abd ElMegeed, Member

FACULTY OF ENGINEERING, CAIRO UNIVERSITY GIZA, EGYPT 2007

In the name of Allah, the Beneficent, the Merciful "My Lord, arrange things for me so I shall act grateful for Your favor which You have bestowed upon me and my parents, and so I may act so honorably that You will approve of it. Admit me through Your mercy among Your honorable servants." The Holy Quran, Chapter: The Ant (19)

i

Acknowledgment All praise is due to Allah Who guided me to this. I could not truly have been led aright if Allah had not guided me. I would like to express my sincere gratitude to my supervisors; Dr. Nevin Darwish and Dr. Nadia Hegazy. I’m greatly indebted also to the assistance of Dr. Nayer Wanas. I would like to thank him for devoting numerous hours making constructive comments, criticisms, and suggestions for improving this thesis. Above all, thanks to him for teaching me how to become a researcher. Many thanks to RDI for their permission to use their Arabic NLP tools. I would like to thank also Dr. Mohamed Attia and Eng. Ibrahim Sobh for their generous help with these tools. Thanks also should go to Dr. Kareem Darwish for making his Arabic NLP tools available for research use. Furthermore, I would like to thank Dr. Samir Ahmed and Eng. Walaa Atta for providing me with Alj-News Arabic dataset. I also appreciate the efforts made by Mr. Kareem Said, Mr. Mahmoud Said, and Miss Nora Bilal in collecting Alj-Mgz Arabic dataset. I’d like also to thank Miss Hoda Said and Mr Mahmoud Said for testing Alj-Mgz using Sakhr categorizer Finally, I am very grateful to my dear parents, my family, and my friends whom I consider as my sisters. I would like to thank particularly Radwa Aboudina, Fatma Nada, Maha Nabil, Haidi Badr, and Marwa Kamal. Thank you all for being always there when I needed you most. Thank you for believing in me and supporting me through all these years. I think without your support and your prayers, none of this work would be accomplished.

ii

I dedicate this thesis to my parents, my little sister Maha, and the memory of my grandfather.

iii

List of Abbrevations 20NG The 20 Newsgroups Collection Alj-Mgz Al-jazirah Magazine Arabic Dataset Alj-News Aljazeera News Arabic Dataset AS using Al-Stem stemmer Avg Averaging global thresholding technique BOW The bag-of-words representation CC Correlation Coefficient DF Document Frequency DR Dimensionality Reduction FLocal Fixed Local thresholding technique GR Gain Ratio IG Information Gain INT Intersection Operator IR Information Retrieval KDD Knowledge Discovery in Data Max Maximum global thresholding technique MCS Multiple Classifier Systems MI Mutual Information ML Machine learning MR using RDI MORPHO3 root extractor MS using RDI MORPHO3 stemmer NAvg Normalized Averaging global thresholding technique NMax Normalized Maximum global thresholding technique NMD Normalized Maximum Deviation global thresholding technique NSTD Normalized Standard Deviation global thresholding technique OR Odds Ratio

iv

LIST OF ABBREVIATIONS (Cont.)

Reuters(10) Reuters(90) SR SVM TC tfidf UCD UCM UN W WAvg WLocal WMax WMD WSTD

The split of the top10 categories in Reuters-21578 dataset The ModeAptÂt’e split of Reuters-21578 dataset using Sebawai root extractor Support Vector Machine Text Categorization Term Frequency Inverse Document Frequency Union-cut Operator using DF Union-cut Operator using Maximization Union Operator using Raw text Weighted Averaging global thresholding technique Weighted Local thresholding technique Weighted Maximum global thresholding technique Weighted Maximum Deviation global thresholding technique Weighted Standard Deviation global thresholding technique

v

List of Notations A(wk , ci ) B(wk , ci ) C(wk , ci ) ci D(wk , ci ) f (wk , ci ) IT h N(ci ) N(Tr) N(wk ) Ntotal M t f (wk , d) Th Tr Sim(L1, L2) W (Tr) wk Wu(Tr) UT h

The number of times a word wk and a category ci co-occur The number of time the wk occurs without ci The number of times ci occurs without wk Category i The number of times neither ci nor wk occurs The score of the word wk w.r.t. the category ci The equivalent threshold value due to applying INT operator The number of documents in category ci The number of documents in the training set The number of documents in the training set in which wk occurs Total number of documents in the dataset Total number of categories in the dataset ˘ ˙Iterm frequencyâA ˘ ˙I the number of term wk occurrence in document d âA Threshold value used The training Set Similarity between the two feature lists; L1, L2 No of words in Tr Word k No of unique words in Tr The equivalent threshold value due to applying UN operator

vi

List of Tables 2.1 2.2 2.3

A Summary of approaches used for feature weighting . . . . . . . . . . . . 15 Conditions for T P, FP, FN, and T N . . . . . . . . . . . . . . . . . . . . . 21 A Summary of the State-of-art Research in Arabic TC . . . . . . . . . . . 24

3.1

A summary of comparative studies among thresholding Techniques . . . . 39

4.1 4.2 4.3

The category distribution of Alj-Mgz Dataset . . . . . . . . . . . . . . . . 47 Characteristics of the datasets used in the comparative study . . . . . . . . 48 Results summary of the combining operators . . . . . . . . . . . . . . . . 88

A.1 A.2 A.3 A.4 A.5 A.6 A.7 A.8 A.9 A.10 A.11 A.12 A.13 A.14 A.15 A.16 A.17 A.18 A.19 A.20

Thresholding techniques for the 20NG Dataset (MicroF1 ) . . . . Thresholding techniques for Alj-News-W Dataset (MicroF1 ) . . Thresholding techniques for Alj-News-AS Dataset (MicroF1 ) . . Thresholding techniques for Alj-News-SR Dataset (MicroF1 ) . . Thresholding techniques for Alj-News-MS Dataset (MicroF1 ) . Thresholding techniques for Alj-News-MR Dataset (MicroF1 ) . Thresholding techniques for the Ohsumed Dataset (MicroF1 ) . . Thresholding techniques for the Ohsumed Dataset (MacroF1 ) . Thresholding techniques for the Reuters(10) Dataset (MicroF1 ) . Thresholding techniques for the Reuters(10) Dataset (MacroF1 ) Thresholding techniques for the Reuters(90) Dataset (MicroF1 ) . Thresholding techniques for the Reuters(90) Dataset (MacroF1 ) Thresholding techniques for Alj-Mgz-W Dataset (MicroF1 ) . . . Thresholding techniques for Alj-Mgz-W Dataset (MacroF1 ) . . Thresholding techniques for Alj-Mgz-AS Dataset (MicroF1 ) . . Thresholding techniques for Alj-Mgz-AS Dataset (MacroF1 ) . . Thresholding techniques for Alj-Mgz-SR Dataset (MicroF1 ) . . Thresholding techniques for Alj-Mgz-SR Dataset (MacroF1 ) . . Thresholding techniques for Alj-Mgz-MS Dataset (MicroF1 ) . . Thresholding techniques for Alj-Mgz-MS Dataset (MacroF1 ) . vii

. . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . .

123 123 124 124 125 125 126 127 128 129 130 131 132 134 136 138 140 142 144 146

A.21 Thresholding techniques for Alj-Mgz-MR Dataset (MicroF1 ) . . . . . . . . 148 A.22 Thresholding techniques for Alj-Mgz-MR Dataset (MacroF1 ) . . . . . . . 150 B.1 B.2 B.3 B.4 B.5 B.6 B.7 B.8 B.9 B.10 B.11 B.12 B.13 B.14 B.15 B.16 B.17 B.18 B.19

Combining operators using 20NG Dataset (FLocal) . . . . Combining operators using Alj-News-AS Dataset (FLocal) Combining operators using Alj-News-SR Dataset (FLocal) Combining operators using Alj-News-MS Dataset (FLocal) Combining operators using Alj-News-MR Dataset (FLocal) Combining operators using Ohsumed Dataset (FLocal) . . Combining operators using Ohsumed Dataset (WLocal) . . Combining operators using Reuters(10) Dataset (FLocal) . Combining operators using Reuters(10) Dataset (WLocal) Combining operators using Reuters(90) Dataset (FLocal) . Combining operators using Reuters(90) Dataset (WLocal) Combining operators using Alj-Mgz-AS Dataset (FLocal) Combining operators using Alj-Mgz-AS Dataset (WLocal) Combining operators using Alj-Mgz-SR Dataset (FLocal) . Combining operators using Alj-Mgz-SR Dataset (WLocal) Combining operators using Alj-Mgz-MS Dataset (FLocal) Combining operators using Alj-Mgz-MS Dataset (WLocal) Combining operators using Alj-Mgz-MR Dataset (FLocal) Combining operators using Alj-Mgz-MR Dataset (WLocal)

viii

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

153 154 155 156 157 158 159 160 161 162 163 164 166 168 170 172 174 176 178

List of Figures 2.1

The process of TC (a) Training phase, (b) Testing phase . . . . . . . . . . .

9

3.1 3.2 3.3 3.4 3.5 3.6

34 36 41 42 43

3.7

Construction of combined lists using the UN, INT, UCM, and UCD operators Local and global thresholding . . . . . . . . . . . . . . . . . . . . . . . . System Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . Preprocessing and representation of a sample document . . . . . . . . . . . Applying feature scoring (IG) on sample categories . . . . . . . . . . . . . A sample of features in the training set after obtaining their global score using Max . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A sample of weighted feature vectors of different documents . . . . . . . .

4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14 4.15 4.16 4.17 4.18 4.19

MicroF1 of the thresholding techniques using 20NG . . . . . . . . . . . MicroF1 of the thresholding techniques using Ohsumed . . . . . . . . . MacroF1 of the thresholding techniques using Ohsumed . . . . . . . . . MicroF1 of the thresholding techniques using Reuters(10) . . . . . . . . MacroF1 of the thresholding techniques using Reuters(10) . . . . . . . MicroF1 of the thresholding techniques using Reuters(90) . . . . . . . . MacroF1 of the thresholding techniques using Reuters(90) . . . . . . . MicroF1 of the thresholding techniques using Alj-News-W . . . . . . . MicroF1 of the thresholding techniques using Alj-Mgz-W . . . . . . . . MacroF1 of the thresholding techniques using Alj-Mgz-W . . . . . . . Vocabulary size of the Arabic datasets due to different threshold values . MicroF1 of different versions of Alj-News datasets . . . . . . . . . . . MicroF1 of different versions of Alj-Mgz datasets . . . . . . . . . . . . MacroF1 of different versions of Alj-Mgz datasets . . . . . . . . . . . . MicroF1 of the combining operators using 20NG (FLocal) . . . . . . . MicroF1 of the combining operators using Ohsumed (FLocal) . . . . . . MacroF1 of the combining operators using Ohsumed (FLocal) . . . . . MicroF1 of the combining operators using Ohsumed (WLocal) . . . . . MacroF1 of the combining operators using Ohsumed (WLocal) . . . . .

50 52 54 57 58 60 61 63 64 65 66 67 68 69 73 74 75 76 77

ix

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

44 44

4.20 4.21 4.22 4.23 4.24 4.25 4.26 4.27 4.28 4.29 4.30 4.31 4.32 4.33

MicroF1 of the combining operators using Reuters(10) (FLocal) . . . . . MacroF1 of the combining operators using Reuters(10) (FLocal) . . . . . MicroF1 of the combining operators using Reuters(10) (WLocal) . . . . . MacroF1 of the combining operators using Reuters(10) (WLocal) . . . . . MicroF1 of the combining operators using Reuters(90) (FLocal) . . . . . MacroF1 of the combining operators using Reuters(90) (FLocal) . . . . . MicroF1 of the combining operators using Reuters(90) (WLocal) . . . . . MacroF1 of the combining operators using Reuters(90) (WLocal) . . . . . MicroF1 of the combining operators using Alj-News-AS (FLocal) . . . . MicroF1 of the combining operators using Alj-Mgz-AS (FLocal) . . . . . MacroF1 of the combining operators using Alj-Mgz-AS (FLocal) . . . . . MicroF1 of the combining operators using Alj-Mgz-AS (WLocal) . . . . MacroF1 of the combining operators using Alj-Mgz-AS (WLocal) . . . . Performance evaluation of the proposed system in comparison with Sakhr categorizer using five random splits of Alj-Mgz dataset . . . . . . . . . .

x

. . . . . . . . . . . . .

78 79 80 81 83 84 85 86 87 90 91 92 93

. 95

Abstract This work deals with the problem of Text Categorization (TC). It concentrates on the filter approach to achieve dimensionality reduction (DR). The filter approach consists of two main stages; feature scoring and thresholding. Feature scoring is applied locally on each category to evaluate the associated representative set of features. Next, thresholding is applied to select the highest scored features that in turn will be the input to the classifier used. New techniques are proposed for DR at both the feature scoring and thresholding stages. Regarding feature scoring, combining operators have been proposed to take the advantages of pairs of feature scoring methods. These operators are the Union (UN) operator, the Union-cut with DF (UCD) operator, the Union-cut with maximization (UCM) operator, and the Intersection (INT) operator. For the thresholding stage, the Standard Deviation (STD), Maximum Deviation (MD), and Normalizing global thresholding methods are proposed. A large comparative study has been conducted in order to evaluate these methods relative to the state-of-art methods. The experiments have been conducted using four benchmark English datasets and two Arabic datasets. The experiments concerning Arabic datasets have been conducted using the raw text, the stemmed text, and the root text. The results showed that MD thresholding technique outperforms other methods in thresholding Document Frequency (DF) scores using evenly distributed datasets, while Normalized MD is superior for moderately diverse datasets. Furthermore, the results indicated that normalizing feature scores improves the performance of rare categories and balances the bias of some techniques to frequent categories. Additionally, the proposed combining operators show a potential in improving the performance when the combined lists have similar performance and the correlation between them is limited. With respect to the Arabic datasets, the results showed that (Al-stem stemmer) AS is the best pre–processing tool for Arabic TC while (RDI Morpho3 root extractor) MR is the worst performed method. The feature filtering approach has showed improved performance when compared with the Sakhr online categorizer. This indicates the potential of this approach in Arabic TC.

xi

Contents Acknowledgment

ii

List of Abbreviations

iv

List of Notations

vi

List of Tables

vi

List of Figures

viii

Abstract

xi

1

. . . . .

1 1 3 4 5 7

. . . . . . . . .

8 8 8 10 12 14 16 20 23 27

2

Introduction 1.1 Text Mining . . . . 1.2 Research Scope . . 1.3 TC Applications . . 1.4 Problem Definition 1.5 Thesis Overview .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

Text Categorization (TC) 2.1 The process of Text Categorization . . . 2.1.1 Document Pre-processing . . . 2.1.2 Document Representation . . . 2.1.3 Dimensionality Reduction (DR) 2.1.4 Feature Weighting . . . . . . . 2.1.5 Classification . . . . . . . . . . 2.2 Performance Evaluation Methods . . . . 2.3 Arabic Text Categorization . . . . . . . 2.4 Summary . . . . . . . . . . . . . . . .

xii

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

. . . . .

. . . . . . . . .

3

4

5

TC using the Filter Approach of DR 3.1 Feature scoring Methods . . . . . . . . . . . . . . . . . 3.1.1 Feature Combining Methods . . . . . . . . . . . 3.2 Thresholding Techniques . . . . . . . . . . . . . . . . . 3.2.1 The local policy . . . . . . . . . . . . . . . . . . 3.2.2 The global policy . . . . . . . . . . . . . . . . . 3.2.3 Comparative studies of Thresholding Techniques 3.3 System Description . . . . . . . . . . . . . . . . . . . . 3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . Results 4.1 Experiments Setup . . . . . . . 4.1.1 Datasets . . . . . . . . . 4.2 Thresholding Techniques Results 4.2.1 The 20NG Dataset . . . 4.2.2 The Ohsumed Dataset . 4.2.3 The Reuters(10) Dataset 4.2.4 The Reuters(90) Dataset 4.2.5 Alj-News Dataset . . . . 4.2.6 Alj-Mgz Dataset . . . . 4.2.7 Conclusion . . . . . . . 4.3 Combining Operators . . . . . . 4.3.1 The 20NG Dataset . . . 4.3.2 The Ohsumed Dataset . 4.3.3 The Reuters(10) Dataset 4.3.4 The Reuters(90) Dataset 4.3.5 Alj-News-AS Dataset . . 4.3.6 Alj-Mgz-AS Dataset . . 4.3.7 Conclusion . . . . . . . 4.4 BenchMark Results . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . .

. . . . . . . . . . . . . . . . . . .

. . . . . . . .

28 28 32 35 35 37 39 40 42

. . . . . . . . . . . . . . . . . . .

45 45 46 48 49 51 51 56 59 62 66 71 71 72 72 72 82 88 88 94

Discusion and Conclusion 97 5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

List of Papers Resulting from this Thesis

100

References

101 xiii

A Thresholding Techniques Results

122

B Combining Operators Results

152

Arabic Abstract

180

xiv

Chapter 1

Introduction 1.1

Text Mining

In recent years, there have been a substantial growth in databases in all its forms (rational, graphical, and textual); this is largely due to the advances in the computational and storage technology. These advances have made generating, storing, retrieving, printing and multiplatform publishing of individual documents faster and simpler than ever before. In turn, there is a great need of intelligent techniques for searching, arranging, summarizing, and moreover, mining to discover interesting relations with this data. In order to meet these requirements several knowledge discovery in data (KDD) techniques, also known as Data Mining, have been proposed. KDD is defined as "the non–trivial extraction of implicit, previously unknown, and potentially useful information from given data" [67]. Nevertheless, KDD usually deals with structured data. However, the most natural way to store a piece of information is the textual form. Accordingly, Text Mining techniques have been proposed in order to discover knowledge from text or unstructured data [62]. The general framework of the text mining process consists of two main stages: preprocessing and discovery. In the preprocessing stage, the unstructured texts are transformed into semi-structured texts to facilitate the discovery process. While at the discovery stage, algorithms are applied to discover interesting and non-trivial relationships [168]. The applications of text mining cover a wide range include the following: • Information Retrieval (IR) is defined as "matching a user’s query against many unstructured text documents with the purpose of finding the documents that satisfy the user’s information needs" [76]. Three main approaches are used for matching queries; i) probabilistic retrieval, ii) knowledge based IR, and iii) learning systems based IR [31]. Probabilistic retrieval is based on estimating a probability of relevance of a certain document to the user’s query. On the other hand, a model of the system user and the expert’s knowledge is presented in the knowledge based approach. While 1

in learning based systems, a machine learning technique is applied in order to extract knowledge, and identify patterns in the documents. Learning systems are interesting as they automatically extract data from examples. Therefore, they are more flexible than knowledge based systems. Additionally, they don’t suffer from the problem of parameters estimation like the probabilistic retrieval systems [31]. • Text Categorization (TC) is the process of assigning one or more label to a given text. This process is considered as a supervised classification since a collection of labeled (pre–classified) documents is provided. The task is to assign a label to a newly encountered, yet unlabeled, pattern [184]. The most commonly used approach for classification is based on machine learning (ML) techniques [158]. ML is a general inductive process that automatically builds a classifier by learning the characteristics of the categories using a set of pre–classified documents. This is in contrast to the knowledge engineering (KE) based approach. KE is the process of manually defining a set of rules encoding expert’s knowledge on how to classify documents under the given categories. The advantages of ML over KE include considerable savings in terms of expert labor power, and straightforward portability to different domains [158]. • Text Clustering is considered an unsupervised learning process, where the main aim is to group a collection of unlabeled documents into meaningful clusters that are similar within themselves and dissimilar to documents in other clusters [86]. Clustering documents is attractive because it frees organizations from the need of manually organize document bases, which could be too expensive, or even infeasible given the time constraints of the application and/or the number of documents involved. Machine learning algorithms used for text clustering can be categorized into two main groups (i) hierarchical clustering algorithms, and (ii) partition-based clustering algorithms [87]. Hierarchical clustering algorithms produce nested partitions of data by merging or splitting clusters based on the similarity among them [56]. On the other hand, partition-based clustering algorithms group the data into non–overlapping partitions that usually locally optimize a clustering criterion [78]. Hierarchical clustering provides good visualization capabilities especially if data is naturally exists in hierarchary. However, it lacks robustness as it is very sensitive to outliers. Additionally, the computational time of hierarchical clustering is very large which limits its usage in large data [182]. • Text Summarization is the process of constructing a compressed summary text from the original document according to the user’s needs [152]. Summarization is either performed using extraction or abstraction. In extraction, important sentences are extracted from the document and gathered together to form the document summary. On 2

the other hand, abstraction analyzes the document and provides a better summary using a heavy machinery from natural language processing in addition to some commonsense and domain knowledge data [75]. There are two distinct classifications for text summarization approaches (i) Single Text Summarization, where a summary is required from one text, and (ii) Multiple Text Summarization, where multiple texts are used to construct the summary [165]. Most Multiple Text Summarization systems involve clustering to collect similar documents together and provide a summary to each cluster [120]. Nevertheless, the process of summarizing multiple document has multiple challenges. Among them is the determining of the order of extracted sentence in order to produce a coherent summary [15]. Additionally, summarizing multiple documents usually requires presentation methods that allow the summary to be displayed along with supporting information from the source documents [109]. Furthermore, multiple documents can be written by many authors with multiple styles to represent the same point of view. A good summarizer is one that handles this situation, taking into account that views are changing over time; documents written in different times, and may have conflicting information [144]. The borders among these different application are fuzzy. One application could be used as a preprocessing or a postprocessing stage to the other application. For example, text categorization and clustering could be used before IR is performed to narrow the search of user’s query into specific categories [104]. Moreover, the output documents from the IR process could be clustered or categorized in order to provide a better visualization to them [194]. As mentioned before, text clustering could be used as a preprocessing step in multiple document summarization. Text summarization could also be a useful tool in IR to extract important data from documents [92]. All in all, text categorization is one of the important applications of text mining and it could be used as a pre-processing or a post-processing stage to other applications.

1.2

Research Scope

This research concentrates on TC as an important application of text mining. The ML approach to TC has become the de facto standard since it is a generic approach that enables TC to be used in different domains like web-page classification, email classification and personal document classification [158]. In the ML approach, a set of pre-labeled documents is provided as training data such that the induction algorithm will be able to learn how to classify a new document to one or more categories. In this work, the following will be assumed: 3

• The categories are labeled in a symbolic way, that is to say there is no information available for their meanings. • The classification process is handled with only the endogenous data (i.e. data extracted from the documents). There is no other exogenous data such as the document type, publication source, publication date, etc . . . .

1.3

TC Applications

There has been great interest in TC due to numerous applications that can benefit from it. Among these applications are: • Web-page classification has become an essential issue since there is more than 11.5 billion indexed pages on the Internet [72]. This huge, and ever increasing, number of web-pages makes the task of locating specific information difficult. Search can be simplified with providing the user of a hierarchy of categories to navigate through. Obviously, performing this classification manually is an infeasible task. Therefore, various TC techniques have been adapted for classifying web-pages [82,160,190,193]. • Spam filtering is an important application of TC where TC techniques are used to distinguish between only two classes; spam and legitimate emails [35, 55, 197]. • News recommendation has become an urgent application due to the massive amount of news available online. Usually, the news recommendation systems build a hierarchy of categories of news stories that matches the user’s profile [8, 34]. • IR is another important application of TC, where a classifier is invoked to categorize the user’s query as well as all texts in the search domain. This classification process facilities the automatic retrieval of the appropriate text that matches the input query [68, 104, 133]. • Word sense disambiguation is the process of assigning the correct meaning (sense) to words in a given text. Knowing the topic of this text, using TC, could facilitate the disambiguation process [132, 139]. • Email classification where TC techniques are used to classify emails into user’s specific folders [98, 180]. • Topical crawling is a recent, yet important, application of text mining. The task of a topical crawling system is to retrieve all relevant web-pages starting from a single webpage. This is done by examining the hyperlinks in this page recursively. The main 4

challenge of topical crawling system is to identify the best hyperlinks to follow. TC algorithms have been used to identify these hyperlinks [140, 175]. Due to the huge growth of the electronic resources especially in web applications, there is an increasing demand of the previous applications. This shows the need to TC as a core process in these applications.

1.4

Problem Definition

Text categorization can be divided into five processes; namely (i) document pre-processing, (ii) document representation, (iii) dimensionality reduction (DR), (iv) feature weighing, and (v) classification. Eliminating punctuation marks, and common as well as rare words is conducted in the document pre-processing phase. Additionally, stemming could be performed in order to combine multiple versions of words into a single root. This will reduce the dimensionality significantly especially in languages that could have a lot of stems to a single word such as the Arabic language. Important features are extracted from documents in the representation phase. This could be as simple as extracting words from the documents, known as the Bag Of Words (BOW) approach. Dimensionality reduction is then performed by selecting some features that have higher importance to the classification process. The selected features are then weighted in the feature weighting phase. Finally, classification is performed through supervised learning where the selected features are used to learn how to distinguish among the different document categories. It is noted that dimensionality reduction (DR) is one of the most important challenges in TC. This reduction is necessary not only to decrease the computational resources, storage, and the memory required to manage these features, but also to avoid overfitting [74, 158]. DR is performed using either feature extraction or feature selection [158]. In feature extraction, new features are generated based on complex methods such as latent Semantic Indexing (LSI), Independent Component Analysis (ICA), or Linear Discriminative Analysis (LDA) [101]. On the other hand, the feature selection approach is based on selecting high-relevance features by filtering features [189] or wrapping the features around the used classifier [100]. This study concentrates on using filtering for DR, due to its simplicity compared to the other approachs [20, 38]. Filtering features involves applying feature scoring methods locally to each category in the training set in order to evaluate features. Thresholding is then performed either locally, or globally to select the features with the highest scores [48]. In local thresholding, a subset of features is chosen from each category independently. On the other hand, global thresholding is performed by applying a globalization scheme, such as maximization (Max) or averaging (Avg), to construct a single feature set. Selection is 5

applied to this set to construct the final subset of features that would be used in the learning, and subsequently in the classification, process. This work targets enhancing the TC process by proposing new techniques for both stages of the filter approach of DR; (Feature scoring and thresholding). The proposed techniques are attempting to overcome problems such as the bias to frequent categories and selecting non-discriminative features. This is with the aim of selecting more discriminative features in the final feature set that would be the input to the classifier used. Accordingly, this would have an impact on improving the classification performance as well as storage and computational resources. Two main problems exist in global thresholding techniques, (i) most of these techniques are biased to frequent categories which affects the classification of rare categories dramatically, and (ii) there is a possibility to choose non-discriminative features among the selected feature set when using these techniques. In this study, new methods for global thresholding are proposed to overcome these problems. These techniques are the Maximum Deviation (MD), and the Standard deviation (STD). In addition, methods for normalizing feature scores are proposed. In total eight new thresholding techniques are suggested. Few studies have been conducted on the performance of thresholding techniques, additionally studies that evaluates all the existing techniques are lacking. Moreover, the existing studies do not present consistent conclusions. In order to evaluate the proposed thresholding techniques, a comparative study is conducted among the eight new thresholding techniques and six existing state-of-art thresholding techniques. Since thresholding techniques are highly affected by the feature scoring method used, the comparative study is performed using four high-performing feature scoring methods at different thresholding values. This will help to evaluate the effectiveness of the new thresholding techniques and provide the TC literature with a a broad study among different thresholding techniques. With respect to the feature scoring stage, classifiers may saturate due to the selection of certain features. Combining features generated from several feature scoring methods can help overcome this problem. Some methods used in the literature for combining feature sets are very computationally expensive [50]. Other methods are targeting improving the performance without reducing the storage [53, 148]. In this work, new operators for combining feature scoring methods are suggested in order to overcome these problems. These are the Union (UN) operator, Intersection (INT) operator, Union-cut with Maximization (UCM) operator, Union-cut with Document frequency (UCD). The performance of these operators has been evaluated using different feature scoring methods for different threshold values. In order to evaluate the performance of the proposed thresholding techniques and combining operators, several experiments have been conducted using four English benchmark datasets, namely the 20 Newsgroups (20NG), Reuters-21578 Top10 (Reuters(10)), Reuters6

21578 ModeApté (Reuters(90)), and Ohsumed Datasets. The experiments of the Arabic language were performed using Aljazeera news dataset (Alj-News) [129], and Al-jazirah Magazine Dataset (Alj-Mgz). The Arabic experiments have been performed using the raw text, the stemmed text, and the root text. The stem and root have been extracted using two different sets of available tools. The first set was implemented by Kareem Darwish [41,42]. Whereas the second set was provided by [10].These results in a total of forteen datasets , four English and ten Arabic datasets, each of different nature in terms of the vocabulary size and category distribution. Conducting comparative studies using these diverse datasets assist in deriving conclusions about the performance of the thresholding techniques and combining operators.

1.5

Thesis Overview

This thesis is arranged such that chapter 2 presents a survey of different components of the TC process. In chapter 3, more light is shed on the feature filter technique for dimensionality reduction. This is in conjunction with a detailed description of the proposed methods to enhance the feature filtering. Comparative studies among the proposed methods and stateof-arts methods are presented in chapter 4. Finally, chapter 5 presents the conclusions of this thesis and suggested future work.

7

Chapter 2

Text Categorization (TC) 2.1

The process of Text Categorization

The process of TC consists of two phases; a training phase, and a testing phase. Figure 2.1 illustrates the main building blocks of these phases. In the training phase, a set of labeled documents is provided. Documents are initially pre–processed in order to eliminate noisy and non-useful terms. Next, features are extracted in the presentation phase. Due to the high dimensionality of these features, several Dimensionality Reduction (DR) techniques can be performed to extract discriminative features. These features are next weighted and presented to the classifier in order to build a classification model. The model is used to classify any test document in the testing phase. Before classifying any test document, it has to be pre–processed, represented, and weighted in the same way of manipulating the training documents. In the following, some light will be shaded on each of these phases. 2.1.1

Document Pre-processing

In the preprocessing stage, the document is transferred to a format suitable to the representation process. Mainly, punctuation and special characters are removed. This is followed by applying some feature-engineering choices. These choices can be used in order to reduce the noise in the document as well as to improve the classification and computational efficiency. Some of the most popular choices are: i. Removing of common words such as ’a’, ’the’, etc. . . in the English language could be performed. This is widely done by using a "stopword list". However, this approach suffers from being a language specific and domain specific choice. Alternatively, a threshold on the number of documents in which the word occurs could be specified e.g. words that occur in more than half of the documents may be considered as stop words [63]. However, this choice may be harmful to highly-skewed datasets. In such 8

Training Labeled Documents

Test  Document

Document Pre­processing

Document Pre­processing Prepared Documents

Prepared Documents

Document Presentation

Document Presentation

Set of Features Set of Features

Dimensionality Reduction Selected Features Feature Weighting Selected Weighted  Features

Feature Weighting Classifier

Learning

Selected Weighted Features Testing

(a)

(b)

Predicted  Category  (Categories)

Figure 2.1: The process of TC (a) Training phase, (b) Testing phase

datasets, one or two classes may include a large percentage of documents. Therefore, deleting stop words based on thresholding may delete the discriminative features that distinguish these categories. ii. Stemming could be applied with the objective of extracting the "word stem". As a result, the variation arises according to multiple occurrences of the same word in many grammatical forms could be eliminated. For example, "review", "reviewed", reviewer", "reviewing" are all mapping to the same stem "review". iii. Removing of rare words that occur two or fewer times may be useful. Forman showed that words in the corpus usually follow the Zipf distribution, where few words occurs frequently while many words occurs rarely [63]. Therefore, removing of rare 9

words may lead to a great saving in the feature space. However, this choice may be harmful to highly-skewed datasets since it may delete discriminative features in rare categories. In order to investigate the effect of the previous three feature engineering choices on TC, Silvatt and Ribeirot compared the performance of using combinations of these choices [161]. They concluded that stop word removal is an important choice as it affected the performance significantly. They showed also that stemming may degrade the performance slightly while removing rare words has nearly no influence on the performance. An extended study performed by Song et al. showed that stop word removal is neither helpful nor harmful to TC [162]. They also noticed that stemming may harm the performance slightly. However, they recommended the use of stop word removal and stemming since they have a significant effect on reducing the feature space. 2.1.2

Document Representation

After preprocessing, the main features of the document have to be extracted. This is done by representing the document as a bag of words, phrases, synsets, or even a hybrid combination of any of the three representations. i. The word-based representation : The simplest method to represent the document is to convert it to a vector of words, which is known as bag-of-words (BOW) [153]. BOW has many disadvantages as it discards a great deal of information about the original document such as paragraph, sentence, and word order. In addition, this representation breaks the syntactic structure, and ignores the semantic relations among words [157]. Despite these disadvantages, BOW has remained the dominant approach for representing documents in the TC literature [158] due to its simplicity and efficiency. ii. The phrase-based representation : The main drawback of the BOW representation is that it ignores the context of the word. BOW assumes that the contribution to the classification process due to the presence or absence of a certain word is independent on the presence of other words [37]. This assumption is unrealistic. As an example the phrase "artificial intelligence" has a distinct meaning from the words "artificial", and "intelligence" in learning the classes; computer, industry, and learning. In order to overcome this problem, a phrase-based representation could be used. The phrase could be simply a cluster of n distinct words, which is known as n-grams [110]. The main drawback of this representation is that it does not capture the relations among words in the same cluster. Other researchers constructed clusters of words that have the same Mutual Information [11]. Furthermore, words in the same cluster could be restricted 10

to have a fixed order, known as sparse-phrase [37]. Motivated by the fact that nouns are more informative than verbs, Scott and Martin used linguistics techniques to extract noun phrases to form clusters [157]. They also used another representation based on extracting the key phrases in the system. Additionally, Bekkerman et al. used the Information Bottleneck (IB) theory to cluster similar words showing an enhancement only in complex corpus [16]. Generally, none of the previous systems showed a significant improvement compared with the BOW despite the high complexity involved in constructing phrases [134]. iii. The synset-based representation: The semantic relations among words may be very informative to TC. These relations could be captured using the Wordnet; a large online thesaurus that contains information about synonymy and hyponymy [124]. Rodríguez et al. increased the weight of features that are synonyms of the topic heading [147]. On the other hand, Scott and Matwin mapped all synonymy words into a single mapping [157]. However, this representation did not show an improvement compared with the traditional BOW due to the level of ambiguity involved. On the contrary, another experiment conducted by [95] showed an improvement due to the usage of the word sense. They used a corpus with full knowledge about the word synonymy which is not a practical choice for the TC problem. Moschitti and Basili used word sense disambiguation to resolve the ambiguity among different words and selected only the proper synonymy [134]. Another solution is to merge WordNet with an ontology such as Yahoo in order to map a certain word into the proper concept [172]. The idea of using a domain ontology have been extended in [70] and have shown a significant improvement. iv. Combination of representations: a combination of two or more presentations could be used. A combination of uni-gram and bi-grams was implemented by [11, 29]. Additionally, Scott and Matwin used a union of noun-phrase and BOW. This led to a performance falling between that of BOW and noun-phrase [157]. Another choice here is to augment other features that are more relevant to the document. This may lead to more knowledge about the document style. As an example, syntactic features of the document could be used such as the percentage of noun phrases and the percentage of words in noun phrases [141]. In order to help finding the style of a certain category, other category features could be used such as the average document length of that category and the number of positive training examples of that category [103]. Whatever the method used, the representation stage is considered as converting the document into a bag of features that will be used later in the classification process.

11

2.1.3

Dimensionality Reduction (DR)

DR is one of the most important challenges that face the research in TC. DR is a necessary process to avoid the overfitting problem where the classifier fits the training data in the sense that it fails it to classify new unseen data [1]. Furthermore, DR techniques help to eliminate noisy and irrelevant terms. As a result, saving in the computational resources, storage , and memory requirements could be achieved [74]. Two approaches are mainly used for DR: i) feature selection, and ii) feature extraction. • Feature Selection Feature selection methods aim at selecting some of features that have higher importance to the classification process. The selection process is performed by applying either the filter approach or the wrapper approach [158]. While the filter approach is based on applying a scoring method to evaluate the features, the wrapper approach wraps the features around the classifier to be used to anticipate the benefits of adding or removing a certain feature from the training set. Moreover, a hybrid approach that combines both the filter and the wrapper approaches could be used. These approaches are outlined in the following. – The Filter approach where a feature scoring method is first applied to each category in order to assess the features in the category. This is followed by applying a selection scheme to select some of these features based on the assigned score. Several feature scoring methods have been adapted in TC. Among them are the DF, IG, GR, MI, χ2 , CC, Gss, odds ratio, and BNF [46, 63, 71, 126, 137, 189] (the mathematical definition of these feature scoring methods in addition to a detailed survey of the state of art in the filter approach is included in chapter 3). After scoring different feature sets in each category, a thresholding scheme is applied to select some of these features. The most straight forward approach is selecting the top features from each category, which is known by the local thresholding policy [114]. On the other hand, a globalization scheme could be used to combine these features in a single feature set. Thresholding is next performed on this global set. Maximization and weighted averaging are the most popular used globalization schemes [189]. – The Wrapper approach A major disadvantage of the filter approach is that it ignores the effect of the selected feature set on the classifier algorithm [20]. On the contrary, the wrapper approach selects the features that lead to an improvement in the performance of the classifier algorithm on a validation set. Commonly, either forward selection or backward elimination of features is used. In the forward selection approach, a wrapper examines the effect of adding each unselected 12

feature and chooses the one that leads to the best accuracy. On the other hand, features that cause the performance to the classifier to degrade are removed in the backward elimination approach [100]. It has been shown that the backward elimination is not a practical choice for TC, due to the large number of features involved [12]. Several researchers adopted the wrapper approach to TC including [22, 96, 127, 131]. Most studies showed that the wrapper methods are more efficient than the filter methods in terms of classification efficiency. However, the wrapper methods are very computationally expensive, as they involve calling the induction algorithm for each feature set considered [20]. Another drawback of the wrapper approach results from the using of a cross-validation set to evaluate the features. In the case of small training set, this may cause the wrapper approach to overfit the training samples. – Hybrid Approaches could be used with an objective of combining the advantages of the filter and wrapper approaches. This would help achieving a performance close to the wrapper methods with a reduced computational time due to the using of the filtering methods. As an example, Das [44] used the filter approach in order to evaluate the features. However, the size of the training set was determined automatically by the wrapper approach i.e. he allowed the adding of features as long as the classifier accuracy was increasing. On the other hand, a genetic algorithm was used by [50] to compose the optimal feature set. In this experiment, six feature scoring methods were applied to form six different feature sets. Then, the genetic algorithm was invoked to merge them in optimally to maximize the performance on the validated set. Another approach was presented by [12] where they used a wrapper approach on a validation set in order to estimate the parameter of the scoring method used. • Feature Extraction Feature extraction methods aim at transforming the vector space representation of the document into a one of a lower dimensionality. The dimensions in the new representation can be viewed as a linear combination of the original dimensions. Many methods have been adapted to TC in order to perform DR using feature extraction. Among them are the following: – Latent Semantic Indexing (LSI) is based on the idea of Principle Components Analysis (PCA). The PCA uses Singular Value Decomposition (SVD) to transform the original higher dimension into a new lower one where features are ranking by their importance within the document. Using truncated SVD, the best features (in terms of least squares) could be extracted [49]. A strong argument about 13

truncated SVD is that it does not only capture strong associations among features and documents, but it also could remove noise, redundancy, and word ambiguity in the dataset [169]. Several researchers adapted LSI to TC demonstrating its efficiency, despite being computationally expensive [27, 33, 119, 179, 195, 203]. – Independent Component Analysis (ICA) linearly transforms data into components that are maximally independent on each other [85]. Most applications of ICA have used PCA as a pre–processing step in order to transform the data into lower dimensions in which they are ordered by their importance. The ICA further transforms these new dimensions into independent components [170]. Examples of systems that have used ICA as a DR process are [101], [19], [159], and [176]. – Linear Discriminative Analysis (LDA) works in a supervised manner unlike LSI and ICA which pay no attention to the class labels. This makes LSI and ICA more suitable to text clustering other than text categorization [174]. On the contrary, LDA searches for features that best discriminate among classes. Therefore, LDA constructs a linear combination of these features that maximizes the margin among the desired classes [121]. In this sense, LDA extracts the best features for discrimination while PCA based methods extract the best features for representation [56]. Despite expecting that LDA should outperform PCA based methods, It has been shown PCA outperforms LDA in small size datasets [121]. The decision to choose a certain approach to perform DR is highly dependent on the requirements of the application in which TC would be used. For example, some applications require accurate performance regardless of the computational time. On the other hand, other applications may sacrifice some accuracy in favor of the execution time and storage needed [200]. It has been shown that filtering features is the simplest and fastest approach to perform DR. Therefore, several studies have been conducted to propose new techniques in order to enhance the performance of feature filtering methods. Among these studies are the proposing of new feature scoring methods [38, 63], combining feature scoring methods [148], and suggesting new ways to perform feature thresholding [64, 164]. This makes the performance of feature filtering approach comparable to feature wrappering and extraction without adding so much complexity to it. 2.1.4

Feature Weighting

Features obtained through DR have to be weighted before being presented to the classifier to be able to learn them. Several approaches have been proposed for feature weighting. All these approaches are mainly based on multiplying two components. The first component W1 indicates the feature importance in the document. On the other hand, the second 14

component W2 indicates the global feature importance in the entire training set. This is known as "Term Frequency Inverse Document Frequency" (t f id f ) weighting [151]. Furthermore, t f id f could be normalized to handle the diversity among documents. Table 2.1 summaries the most commonly used techniques for feature weighting [26, 51, 105]. Three letters are used to express a certain combination of the three components; W1, W2, and normalization. The normalization component could be excluded if the diversity among documents is marginal. However, Song et al. showed that normalization usually enhances the performance of the classification process significantly [162]. Table 2.1: A Summary of approaches used for feature weighting

W1 z

none (always 1)

n

(normal) tf

n

none (always 1)

l

log(1.0 + ln(t f ))

t

N(Tr) log( N(w ))

a

tf Max(t f )

0.5 + 0.5 ×

b

binary (1 or 0)

m

tf maxt f

i

1 1 − 1+t f

W2

r c

Normalization n

k

N(wk ) log(1 + N(Tr)−N(w

k)

c

none (always 1) cosine ( ∑

W 1×W 2

∀Words in document (W 1×W 2)

k ,ci )/N(ci ) log B(w A(w ,ci )/(N(Tr)−N(ci )) k

The simplest approach for feature weighting is the bnn where a feature is assigned a weight of one if it exists in the document, and zero otherwise. Yang and Pedersen [189] demonstrated that ltc is the best weighting scheme among those reported in [26]. Hence, ltc has become the most common approach for feature weighting in TC [158]. Debole and Sebastiani suggested a supervised term weighting scheme where they replaced the W2 component with the score obtained in the feature selection process [46, 47]. This score could be either the local or global score of the feature in the category. Concerning classification of frequent categories, ltc outperformed both the global and local scores. On the other hand, using the local score was shown to be superior in classification of rare categories when applied to highly-skewed datasets. Deng et al. [51] performed a similar experiment comparing mtn, mcn, (m × χ2 ), and (m × Odds Ratio). The comparison showed that supervised weighting methods outperformed all other methods. Lan et al. [105] conducted a comparison among several weighting schemes without normalizing scores. They showed that nrn is superior to all other methods. Additionally, they demonstrated the poor 15

)

performance achieved by using bnn and (t f × χ2 ). Another experiment conducted by [99] modified ltc to indicate the importance of the sentence the term occurs. This showed a significant improvement compared with traditional methods. It is worth noting that t f id f weighting method could be used as a feature scoring method to evaluate features during the DR process [52]. Weighting features is a very time consuming process, especially if normalizing weights will be performed. When using normalized weights, two passes through the document are required. In the first pass, the weight of each feature is determined without normalization. Then, the normalize factor of the document is calculated and used in the second pass to update the feature weights. Consider that weighing features is performed for both the training and testing phases, the role of DR is magnified. DR targets selecting a small portion of discriminative features for the classification process. This reduction will significantly reduce the time required for weighting features. 2.1.5

Classification

Prior to classification, all the stages could be considered as pre–processing to construct the final feature set. This feature set would be used by the classifier in the training phase. The classifier would further process this set to predict the class(es) of new unseen documents. Most classifiers assume that classes are mutually exclusive and exhaustive, which might be not always true [78]. The exhaustive assumption could be handled by assuming that there is additional class that holds documents that do not belong to any category. While dealing with the multi-label problems, where a document could belong to more than one category, is handled using the one-against-all approach. In this approach, the multi-label problem is transformed to M binary problems. Each problem, pi , consists of documents of category ci as positive examples as well as all documents of other categories as negative examples. Due to the simplicity and efficiency of this approach, it has become dominant in TC applications [158]. (For other approaches, the reader is directed to the survey provided by [146] in which they proved the effectiveness of the one-against-all approach compared with other approaches). The most popular classifiers used in TC are: i. Decision Tree (DT) is based on answering questions recursively in a form of a tree until a leaf node is reached. Leaf nodes represent all categories [56]. DT consists of two main steps: i) the growing step in which nodes are added to the tree based on a certain criteria, and ii) the pruning step where some nodes are deleted in order to overcome the problem of overfitting [25]. Among the systems that adapted DT for TC are [114], [57], [32], and [167]. DT Classifiers are fast and easy to interpret. However, most of them are not suitable to sparse data [91]. 16

ii. Decision rules are best suited when the relationships among categories can be described in terms of rules [56]. Examples of used rule-based systems in TC are: SWAP1 [9], RIPPER [36], CHARADE [136], ESC [117], CPAR [192], and Harmony [177]. Additionally, many research have proposed the extraction of rules from decision trees. As an example, [91] described a system that constructs a fast decision tree, then converts it to the logically equivalent rules. Decision rules have many advantages like readability and refinability [117], however, most of them are not appropriate for large training set with a large number of classes and particularly for the multi-categories problem [177]. iii. Maximum Entropy (ME) is based on the idea of giving an estimation to the conditional distribution of the class label given a set of features [18]. Using ME for TC was first introduced by [138] for the single-category task and by [202] for the multicategories task. Other researchers have used ME in TC including [123], [79], [65], and [94]. ME classifiers are simple, efficient, and robust against data sparseness. In addition, ME classifiers offer a framework for specifying any relevant information that may contribute to the classification task [65]. However, they suffer from the problem of overfitting [94]. iv. Neural Network (NNet) is a network of units, where the input units represent features and the output unit(s) represent(s) the category (categories) of interest. The edges connecting the units represent the relations among these units [204]. A basic advantage of NNet is its ability to generalize any continuous function. On the other hand, it is very hard to interpret the NNet, or determine why it takes a specific decision. Another key disadvantage of NNet is being very slow. Additionally, its converge time depends on the network initial conditions [178]. NNets are also affected by the presence of outliers in the training set, since they use the sum-of-square errors, and the problem of the local minima [1]. Among the researchers that have applied NNets to TC are [179], [137], [185], [102], [28], [149], [171], [30] and [118]. v. Naive Bayes (NB) is based on assigning to a new instance the most probable value νNB [125]. νNB is defined as M

νNB = max{p(ci )Π j p(w j |ci )} i=1

. A disadvantage of NB is its assumption of the word independency i.e. the conditional probabilities of a word given a category is independent on the other words given that category. Although this assumption is unrealistic, NB still achieves accurate perfor17

mance [78]. Due to its efficiency and simplicity, NB has been implemented in various TC systems including [112], [122], [128], [66], [145], [97], [142], and [156] vi. k-Nearest Neighbors (kNN) assigns a new test document X to the class that the majority of the k close neighbors to X belongs to [56]. Usually the similarity between X and a training document Y is calculated using the Euclidean distance D(X,Y ) which is defined by d

D(X,Y ) = { ∑ (x j − y j )2 }1/2 ,

(2.1)

j=1

where X = (x1 , x2 , ..., xd ) and Y = (y1 , y2 , ..., yd ). The kNN classifier is robust to noise and quite effective for a large set of training documents [125]. A major problem involved in the kNN classifier is the "curse of dimensionality". In high dimensional data, the Euclidean distance becomes meaningless and the kNN performs poorly [78]. Furthermore, kNN is considered a lazy classifier, since it does not build a model for the training data. Therefore, nearly all computations take place at the testing time rather than the training time. Therefore, kNN is very inefficient in terms of both the computational power, and the storage [125, 191]. Among the systems that proposed using kNN in the classification stage for TC are [163], [77], [14], and [13]. vii. Support Vector Machine (SVM) was first introduced by [39]. SVM attempts to find the best separator among classes, by maximizing the margins between the data points (support points) in the training set. An advantage of SVM is that only the points in the training set that affect the decision are necessary. This is in comparison to other classification methods that evaluate the whole training set [185]. SVM is robust to outliers and does not suffer from the local minima problem. On the other hand, SVM suffers from a long training time [191]. Furthermore, it suffers from the problem of model parameters where a large number of parameters have to be set in order to provide the optimal solution to a specific problem [1]. SVM has been widely used in TC for its effectiveness compared with other classifiers [52, 90, 105, 141, 173, 198]. viii. Other classifiers: Several other induction algorithms have been used in TC. Among the most popular algorithms are the Linear Least Squares Fit (LLSF) [187], Bayes Net (BN) [57], Logistic Regression (LR) [198], Ridge Regression (RR) [199], Rocchio [88, 115], Widrow-Hoff [115], Prototype [116], and kNN-Model, which is a combination of kNN and Rocchio [73]. ix. Multiple Classifier Systems (MCS) have gained significant momentum in the last few years in order to increase the classification accuracy and capitalize on different advantages of each classifier. This follows logically from the way that human consults 18

different approaches and conditions to reach a final decision. An early experiment performed by [107] showed the potential of MCS in TC. They combined Rocchio, kNN, and NB in a pairwise manner. They showed that this combination enhanced the performance compared with using a single classifier. They further showed that aggregating the three classifiers outperformed all the pairwise combinations. Del Castillo and Serrano divided the individual document into four parts and used a different classifier for each part [50]. They combined the decisions of all the classifiers by running a genetic algorithm for optimizing the weight of each classifier decision. They showed that this genetic combination has a better result than using individual learners or even voting among classifier decisions. Alternatively, Bell et al. combined four classifiers using Dempster’s Rule [17]. The classifiers used in this experiment were the Rocchio, kNN, SVM, and kNN-Model. The experiment showed that the best pairwise combination is the one that aggregates the SVM classifier with the kNN-Model. Another potential area of MCS is performing the fusion at the feature level [40]. Bagging and Boosting techniques are the most common approaches to perform feature fusion. Bagging is based on splitting the training set into random samples with replacement. Next, each of these sets are fed to a different classifier. The decisions of these classifiers are then combined to compose the final decision using several schemes such as maximum voting [24]. While Bagging can generate the classifiers in parallel, Boosting generates them sequentially as the input feature set to a certain classifier is based on the output of the previous classifiers [56]. Among the systems that used feature fusion approach in TC are [155], [45], [167], [3], and [54]. Several studies have been conducted to compare different classifier in TC. The studies of [57, 89, 188] showed the superior performance of SVM using Reuters dataset. Whereas the study of [116] showed that SVM outperforms other classifiers using MicroF1 performance measure which is an indication of the performance in frequent categories. However, using the MacroF1 measure, LLSF, NNet, and LR outperform SVM, which SVM has an edge over kNN and Rocchio. Since the MacroF1 is largely affected by the performance of infrequent categories and the dataset used in this experiment was Reuters(90) which is a highly-skewed dataset, this could justify these conclusions. These results are consistent with the study performed by [48] where they showed that SVM has a superior performance compared with kNN and Rocchio for both MicroF1 and MacroF1 using Reuters(10) dataset. However, kNN and Rocchio outperform SVM for only the MacroF1 of Reuters(90) and Reuters(115) datasets which are skewed datasets compared with Reuters(10). The experiment conducted by [48] also illustrates that selecting 10% of features using the filter approach exhibits the same classification performance when using all the features by SVM. This is in contrast to other classifiers such as kNN in which using all the features 19

degrades the performance significantly. This supports the fact that SVM is very robust to noise and it does not need DR techniques from the performance point of view. However, the complexity analysis of the classification algorithms presented by [191] demonstrated that SVM has nearly the longest training time and the largest storage requirements. This indicates, as shown by [201], the necessity of the DR techniques to reduce the space dimensionality and hence decrease the training time and storage needed. All in all, SVM has received confidence for the TC literature in its suitability for TC applications. Thresholding classification scores Some classifiers such as kNN and NB assign a score for each document-category pair. Hence, a thresholding scheme has to be applied. Others such as SVM do not need this thresholding. Yang [186] conducted a survey among several thresholding techniques including: 1. Rank-based thresholding (RCut) where categories are ranked w.r.t. each document. The highest k-top categories are then selected where the k parameter could be obtained by using a validation set or alternatively specified by the user. This could be as simple as a user specific parameter. 2. Proportion-based assignment (PCut) where documents are ranked w.r.t. each category ci . Next, the highest ki documents are assigned to ci , where ki = p(ci ) × x × M and x is the parameter that controls the thresholding operation. 3. Score-based optimization (SCut), similar to the PCut approach, documents are ordered w.r.t. each category. The number of documents to select from each category is determined using a validation set. Yang showed that SCut has an unstable performance, since it suffers from overfitting. On the other hand, the PCut approach is more stable particularly for rare categories. However, it lacks the facility to be used in on–line decision [186]. This is in construct to the RCut strategy which is more suitable to such applications. Nevertheless, the RCut may not be suitable for highly-skewed data-sets since it neglects the category distribution [108].

2.2

Performance Evaluation Methods

The performance of the TC process can be measured by one or more of the following methods: • Recall and Precision are two well known measures of effectiveness in text mining. While Recall is a measure of correctly predicted documents by the system among the 20

positive documents, Precision is a measure of correctly predicted documents by the system among all the predicted documents [113]. Recall and precision are calculated by equations (2.2) and (2.3) respectively. TP TP r= , (2.2) p= , (2.3) T P + FN T P + FP where r and p are the recall and the precision of ci , T P, FP, and FN refer to the set of true positives w.r.t. ci (documents correctly classified to belong to ci ), false positives w.r.t. ci (documents incorrectly classified to belong to ci ), and false negatives w.r.t. ci (documents incorrectly classified not to belong to ci ) respectively. Table 2.2 illustrates the conditions for these measures [111]. Table 2.2: Conditions for T P, FP, FN, and T N

Decision

True False

True State of Nature True False TP FP FN TN

As a matter of fact, a balance is needed between missing correct decisions "recall" and generating many false decisions "precision". Therefore, other measures have been proposed in the literature. • Eleven-point average precision is a measure based on recall and precision. Given a document, the system is allowed to achieve recall values of 0%, 10%, 20%, . . . 100%, and the precision values at these points are computed. The resulting 11 point precision values are then averaged. The average precision values of all test documents are further averaged to obtain a global measure of the system performance [189]. • The break-even point is the point where recall equals precision. It is obtained by allowing the classifier to assign more categories. As a result, the recall increases and precision decreases until they become equal [114]. • Fβ – measure is a commonly used technique to measure the trade off between recall and precision. Fβ was first proposed as a measure of effectiveness in TC by [113] and it is defined as: (β2 + 1)p ∗ r β2 p + r (β2 + 1)T P = (β2 + 1)T P + FP + β2 FN

Fβ =

21

(2.4)

It is clear that Fβ equals precision when β = 0 and it equals recall when β = ∞. Between the two extremes, β is specified to adjust the weight between recall and precision. For example, recall has half the importance of precision if β = 0.5, twice its importance if it equals 2.. A common used approach is to set β to 1 so that recall will have the same importance as precision. In this sense, F1 is defined as: F1 =

2∗TP . 2 ∗ T P + FP + FN

(2.5)

As shown in [185], the break-even of a classifier is always less or equal than its F1 value. MicroFβ and MacroFβ tests are two measures based on Fβ . In the MicroFβ , all the binary decisions are collected in a joint pool and then Fβ is computed. On the other hand, the MacroFβ is based on calculating Fβ for individual categories. The measure is then averaged over all categories [110]. Generally, the MacroFβ equally weights all categories, and thus it is influenced by the performance of rare categories. On the other hand, MicroFβ equally weights all the documents, and therefore it is affected by the performance of frequent categories [185]. • Receiver Operating Characteristics (ROC) curves provide a slightly more sensitive measurement that can be adapted to show the convex hull of precision and recall [21]. It provides a more informative picture of performance than the single break-even point of Fβ [157]. However, it is more computationally expensive and it needs human’s interpretation. • Utility measures have roots in the decision theory, since these measures extend the effectiveness using economic criteria such as gain and loss. The standard effectiveness is obtained when T P = T N > FP = FN [158]. However, this case is not usually desired. For example, in spam email filtering failing to prevent a junk email is less serious than discarding a one that is not junk [63]. According to [84], the linear and scaling utility are defined as shown in equations 2.6 and 2.7 respectively. U(ci ) = a1 ∗ T Pi + a2 ∗ T Ni + a3 ∗ FPi + a4 ∗ FNi ,

ScaledUtility =

maxU(ci ),U(s) −U(s) , maxU(ci ) −U(s)

(2.6)

(2.7)

where a1, a2, a3, and a4 are the utility parameters and U(s) is the utility of retrieving s non relevant documents in category ci and maxU(ci ) is the maximum utility obtained 22

for that category. • Training and Classification Efficiency were used by [57] to evaluate classifiers with closed performance. While the former calculates the average time taken to build a classifier for a certain category, the latter measures the average time taken to categorize a new document. The performance measures most widely used in the TC literatures are the MicroFβ and MacroFβ . These measures are very helpful in evaluating the performance of both frequent and rare categories.

2.3

Arabic Text Categorization

A key issue in Arabic text mining systems is how to perform stemming. The Arabic language is highly derivative, where tens or even hundreds of words could be formed using only one root. Furthermore, a single word may be derived from multiple roots [6]. Mainly, there are two approaches to perform an Arabic pre–processing; (i) the stem-based approach, and (ii) the root-based approach. In the stem-based approach, prefixes, infixes, and suffixes are removed from the word to extract the word stem. This stem may be further processed to compass the word root in the root-based approach [181]. Early studies performed on IR showed that using root words is better than using stemmed words [2, 4, 83]. Other studies, by [106,135], showed that the stem-based approach is superior to the root-based. An experiment performed by [43] showed that using context to improve the root extraction process may enhance the process of IR slightly compared to the stem-based approach, however, it is computationally expensive. On the other hand, Brants et al. reported that performing stemming to the Arabic text will increase the ambiguity and hence using the raw text may be better [23].

23

24

[61] Root-based algorithm of [5] Yes

[60]

Duwairi

[58, 59]

Al-Taani and Al-Awad

stem-based algorithm Their own root-based and

[129]

Syiam et al.

[166] Their own stem-based algorithm

Kanaan et al.

[93]

stem-based algorithms

Their own

Mohamed et al.

[7]

Root-based algorithm of

No

No

El-kourdi et al.

[154]

Sawaf et al.

[79]

Hassan

Stemming

Yes

Yes

Yes

unknown stemmer

No

No

No

No

IG

Gss, CC, OR

DF, χ2 , IG

n-grams

DF with

-

-

tfidf

-

DF,

Selection

removal χ2

Feature

Stop-word

kNN

Rocchio

kNN,

Distance-based

Fuzzy-based

Distance-based

NB

ME

ME

Classifier

Yes

No

Yes

No

No

Yes

Yes

No

Validation

Cross

Table 2.3: A Summary of the State-of-art Research in Arabic TC

600

1.1k

1.5k

50

1k

1.5k

33k

4.5k

Ntotal

6

6

5

5

10

5

10, 34

5

M

Yes

No

Yes

Yes

Yes

Yes

No

No

Even

Arabic Data-set

Unfortunately, documentation, publications, and research in the area of Arabic TC is fairly limited and hence not readily available. The only commercially available system that performs Arabic TC is the one implemented by the Sakhr Company [150]. However, it has not been published. Table 2.3 provides an overview of the literature concerning Arabic TC. Hassan [79] performed TC using both WebKB English dataset and ArabText dataset. Maximum Entropy was used as a base classifier. Feature Filtering was performed using DF and χ2 feature scoring methods. He concluded that χ2 is better than DF. Sawaf et al. [154] used also Maximum Entropy without any preprocessing or DR. They used a very large corpus; Arabic NEWSWIRE, and achieved a quite good F1 of about 60%. This illustrates the potential of the machine learning, and especially statistical learning, in TC. El-kourdi et al. [60] performed root extraction to their Arabic dataset that consists of only 300 documents that cover five categories. They used NB for the classification process and performed feature filtering using t f id f feature scoring method. They used cross-validation to estimate the average performance of their system which showed comparable results with previous work. Duwairi [58, 59] used the root based of [5] to perform preprocessing on her Arabic datasets of 1K document. She proposed a distance-based classifier based on the Dice measure which is defined as 2 ∗ Fx Fy SimDice (Fx , Fy ) = , Fx + Fy T

(2.8)

where Fx , Fy are the feature vector, BOW, of document x and category y. Therefore, the document is assigned to the category that has the maximum SimDice (Fx , Fy ). However, no comparison was provided among this classifier and other state-of-art classifiers used in TC literature. Additionally, the experiment has not been performed on a benchmark dataset to compare its results with benchmark results. Al-Taani and Al-Awad [7] proposed using a fuzzy similarity approach for classification. They applied six fuzzy operators and compared among their performance using a very small dataset; consists of only 50 documents. Similar to the study of Duwairi, the dataset used is not a benchmark dataset and no comparison between this classifiers and top classifiers in the TC literature was presented. Additionally, no information was provided about the stemmer used in the preprocessing stage. Mohamed et al. [129] used the DF of n-gram words as a feature scoring method. This is performed by counting the DF of each word, two words up to six words. Then, the highest DF features are selected in the training set. Similar to [7, 58, 59], they proposed a new classification algorithm without providing benchmark results or comparing it with top performing classifiers. Their algorithm is based on measuring the distance between the test document and the category which is similar to the algorithm performed in [58, 59]. 25

Syiam et al. [166] conducted a study to compare the performance when no stemming, using different stemmers, and root extraction. The results showed that using root-based technique outperformed using the raw text. While performing light stemmer was superior. Additionally, Syiam et al. compared different scoring methods in conjuction of removing rare words as a preprocessing stage. They showed that IG had the best performance. Additionally, they compare between the performance of Rocchi and kNN with different k values. They illustrated that Rocchi outperformed kNN significantly. However, the experiments of this study were conducted using only one non-benchmark relatively small dataset. More experiments are needed to increase the assertion of this work. Kanaan et al. [93] used IG as a feature scoring method and normalized t f id f (ltc) as a feature weighting method. They performed experiments using kNN classifier. They illustrated the importance of the k parameter on the performance and recommended a k of 19. However, they used only one dataset in their experiment which is very small (consists of 600 documents). Examining the state-of-art research in the Arabic TC leads to the following conclusions: • Most researchers either did not perform stemming at all, or used their own stemming algorithms. The reason for that is the lack of a standard readily available Arabic stemming algorithm. Recently, Darwish1 made his implementation of Arabic stembased and root-based algorithms available on the Web. These algorithms were applied to Arabic IR and showed promising results [42]. • With the exception of [79], none of the mentioned systems have been applied to an English benchmark dataset. Validating the system efficiency using benchmark datasets will help establish the confidence in these conclusions. • The distribution of documents across categories in most datasets used has been almost even. It is worth noting that the greater the skewness of the dataset, the more challenging the categorization process. • With the exception of [79, 154], all datasets used are of small sample size; usually below 1.5k documents ( This is in contrast to studies using standard data sets which present a variety of small sizes and distributions 4.1.1). • Most the previous systems did not indicate that they performed cross validation despite using non benchmark datasets. It is not clear why a certain set of documents are selected as the training documents and what would happen if this set changes. 1 The

stemmer and root extractor algorithms implemented by Kareem Darwish are available for free at http://www.glue.umd.edu/ kareem/research/

26

• DR techniques have been performed in only four systems [60,79,129,166]. However, there is no study that compares the performance of high performing feature scoring methods; namely IG, MI and CC. • All researchers , who performed feature filtering, did not indicate how thresholding of features was performed and whether they used local or global thresholding. • Despite being from the top performing classifier in TC, SVM has not been presented in the Arabic TC literature. • All systems , with the exception of [129], used the BOW approach for document representation. Phrase-based or other representations have not been used yet.

2.4

Summary

This chapter provides an overview of various techniques used in each of TC stages. TC consists of five main stages: 1. Document Pre–processing: The general literature of English TC has agreed to perform stemming and remove stop words. However, in the Arabic language, the choice of whether to perform stemming, root extraction, or using the raw text still needs more investigation. 2. Document Representation: The BOW representation is considered to be the dominant approach to represent documents due to its simplicity and efficiency. 3. Dimensionality Reduction: Generally, feature filtering is the simplest and fastest approach for DR , however, it is slightly outperformed by other techniques. 4. Feature weighting: The ltc method is the most popular approach despite innovating new efficient methods such as supervised feature weighting. 5. Classification: The SVM is considered to be generally the top classifier in the TC literature. However, it needs a long training time and large storage requirements. DR techniques are needed in order to reduce the computational time needed by the feature weighting and classification stages. Additionally, this would lead to a saving in storage required. Proposing new techniques for enhancing the performance of feature filtering is inevitable. Nevertheless, these techniques should not add complexity to the filtering approach in order to keep it simple and fast. In order to investigate the performance of these new techniques, a large comparative study has to be conducted using different benchmark datasets of different natures in terms of the vocabulary size and category distribution. 27

Chapter 3

TC using the Filter Approach of DR The filter approach is the simplest ,and by far, the most popular approach in DR. Filtering is composed of two stages (a) feature scoring, and (b) thresholding. In this chapter, more light is shed on the filtering approach. The contribution of this work in both scoring feature and thresholding is presented.

3.1

Feature scoring Methods

Several feature scoring methods have been presented in the literature. A good feature scoring method should achieve one or more of the following objectives: enhance performance, reduce storage requirements, and decrease computational time. Among the most popular features used in TC are: i. Term Frequency (TF) is the number of occurrences of the word in the training set [128]. ii. Document Frequency (DF), unlike TF, assumes that the single occurrence of the word in a document has the same importance as its multiple occurrence . DF can be simply calculated by counting the number of documents where a specific word , wk , occurs [189]. DF is defined by DF(wk , ci ) = N(ci ),

(3.1)

where N(ci ) is the number of documents in category ci . iii. Information Gain (IG) is the number of bits gained, for a certain category, by knowing the presence or absence of a word in the document [189]. Yang and Pedersen applied IG to evaluate features in the training set regardless of their categories [189]. In this case, thresholding is not needed to collect local feature scores since the features are evaluated globally on the entire set. On the other hand, it is possible to apply IG locally 28

in order to evaluate features in each category [158]. The global and local definitions of IG provided by [189], and [158] are shown in equations 3.2, and 3.3 respectively.

M

IG(wk ) = −



M

p(ci )log(p(ci )) +

ci =1





p(w)p(ci |w)log(p(ci |w))

(3.2)

ci =1 w∈{wk ,wk }

IG(wk , ci ) =





p(c|w)log

c∈{ci ,ci } w∈{wk ,wk }

p(c|w) p(w)p(c)

(3.3)

iv. Gain Ratio (GR) was first introduced by [46], and is defined as the ratio between the information gain and the entropy of the category. IG grows with the entropy of the category and dividing it by the entropy allows a fair comparison among categories. GR is defined as follows: GR(wk , ci ) =

IG(wk , ci ) − ∑c∈{ci ,ci } p(c)log(p(c))

(3.4)

v. Mutual Information (MI) measures the mutual dependency between the word k and the category ci . It is presented by equation 3.5 according to [189]. MI(wk , ci ) = log

A(wk , ci ) × N(Tr) , (A(wk , ci ) +C(wk , ci ))(A(wk , ci ) + B(wk , ci ))

(3.5)

where N(Tr) is the number of documents in the training set, A(wk , ci )is the number of times a word wk and a category ci co-occur, B(wk , ci ) is the number of time the wk occurs without ci , and C(wk , ci ) is the number of times ci occurs without wk . This can be simplified to: MI(wk , ci ) = log

A(wk , ci ) × N(Tr) , N(ci ) × N(wk )

(3.6)

where N(wk ) is the number of documents in the training set in which wk occurs. Since N(wk ) is always greater than or equal to A, the features that have the highest MI that N are unique. Hence, A(wk , ci ) = N(wk ) and MI(wk , ci ) = log N(c = log NA . This would i) be a favorable for rare categories over frequent ones, and the accuracy may be affected dramatically. In order to overcome this problem, an alternative definition of MI was proposed in [16] and is defined as: MI(wk , ci ) = p(wi |ci )log 29

A(wk , ci ) × N(Tr) . N(ci ) × N(wk )

(3.7)

This new definition overcomes the problem By multiplying the traditional MI by p(wk , ci ). N Unique words in rare categories will have small p(wk , ci ) and this will balance the N(c . i) Therefore, there would be no bias to rare categories. vi. The chi square (χ2 ) measures the lack of independency between a word, wk , and a category, ci . χ2 can be considered as a normalized form of MI. [189] defined χ2 to be:

χ2 (wk , ci ) =

N(Tr) × [A(wk , ci ) ∗ D(wk , ci ) −C(wk , ci ) ∗ B(wk , ci )]2 , (3.8) N(ci ) × N(wk ) × [D(wk , ci ) +C(wk , ci )] × [D(wk , ci ) + B(wk , ci )]

where D(wk , ci ) is the number of times neither ci nor wk occurs. vii. Correlation Coefficient (CC), also known as Ng-Goh-Low coefficient (NGL), is the square root of χ2 . Since the numerator in χ2 is squared, it fails to illustrate if "negative correlation" (C(wk , ci ) × B(wk , ci )) is greater than "positive correlation" (A(wk , ci ) × D(wk , ci )) [137]. Obtaining the square root, as defined in CC , has been shown to outperform χ2 [137, 149]. CC is defined as:

p N(Tr) × [A(wk , ci ) ∗ D(wk , ci ) −C(wk , ci ) ∗ B(wk , ci )]

. CC(wk , ci ) = p N(ci ) × N(wk ) × [D(wk , ci ) +C(wk , ci )] × [D(wk , ci ) + B(wk , ci )] (3.9) viii. Galavottia-Sebastiania–Simi coefficient (Gss) was developed by [71] to simplify CC. The definition of Gss is shown in Equation 3.10. Gss(wk , ci ) =

[A(wk , ci ) ∗ D(wk , ci )] − [C(wk , ci ) ∗ B(wk , ci )] . N(Tr)2

(3.10)

ix. Odds Ratio (OR), first used by [126], favors features that occur in documents that belong to the category "positive examples". Thus, the features that occur in positive examples and are absent from negative examples will have high scores. OR is defined as follows: OR(wk , ci ) =

p(wk , ci ) ∗ (1 − p(wk , ci )) A(wk , ci ) ∗ D(wk , ci ) ≈ (1 − p(wk , ci )) ∗ p(wk , ci ) C(wk , ci ) ∗ B(wk , ci )

(3.11)

x. Bi-Normal Separation (BNS) was proposed by [63] in order to select relevant and irrelevant features. Forman claimed that both set of features are necessary for the classification process. BNS is defined as: 30

BNS(wk , ci ) = |F −1 (p(wk , ci )) − F −1 p(wk , ci ))|,

(3.12)

where F −1 is the standard Normal distribution inverse cumulative probability function. xi. Random (RND) selects features randomly from the training set. xii. Other methods were used in TC such as term strength [189], ML measure [38], and Orthogonal Centroid Feature Selection (OCFS) [183]. Several studies have been conducted in order to evaluate the different feature metrics. The study of [189] compared among DF, IG, MI, χ2 , and term strength using the Ohsumed and Reuters-21578 datasets. They concluded that IG, DF, and χ2 are the best performing features. On the other hand, Ng indicated that CC is better than χ2 , where both metrics outperform DF [137]. However, this study was conduced using only the Reuters-21578. Using small datasets, Gabrilovich and Markovitch found that IG, χ2 , and BNF better than DF and RND [69]. On the other hand, Debole and Sebastian used three versions of Reuters21578 in their comparison among IG, χ2 , and GR [48]. They concluded that IG and χ2 outperform GR, while IG is slightly better than χ2 in most cases. The following conclusions could be derived from these studies: • CC could be used instead of χ2 due to its superior performance • RND seems to be unstable method and the other methods usually outperform it. • GR and term strength perform poorly in TC. • DF is definitely the simplest feature scoring method. However, it is outperformed by most methods. Nervelessness, its performance is acceptable if simplicity is preferable as indicated by [189]. • There is no study that compares among CC, IG, and MI. • The only study that evaluated MI is the one of [189]. However, this definition of MI suffers from some weakness as explained before. The second definition of MI, presented in [16], has not yet been compared with other feature scoring methods. This work conducts a comparative study among CC, DF, MI, and IG as feature scoring methods. This study uses different threshold values and several datasets of diverse nature. This would help to decide what is the best method to be used in TC.

31

3.1.1

Feature Combining Methods

Researchers have presented several combination techniques to merge two or more feature scoring methods. Since each of the feature scoring method evaluates features differently, merging these methods together could help to identify a better feature set. The purpose of the combining process should be clear in order to decide how to perform combination. Obviously, the combining operation is computationally expensive as it involves performing a number of feature scoring methods then merging the lists they produce. However, this computation takes place only in the training phase and the testing phase would not be affected. Among the researches that used a combination of feature scoring methods are the following: • Rogati and Yang [148]: The combining approach used in their study was based on normalizing the feature scores of each method and taking the maximum score. Thresholding is then performed on the combined list. Rogati and Yang concluded that using χ2 combined with either DF or IG leads to an improved performance. The idea of normalizing feature scores then taking the maximum seems to be a good idea. However, it could be simplified or better developed to produce more discriminative features. • Del Castillo and Serrano [50]: A genetic algorithm was used by in order to construct the optimal feature set from various feature sets produced by different scoring methods. Six different scoring methods were used to generate six feature sets. The genetic algorithm is then used to decide which features to include in the final set. However, using the genetic algorithm makes the feature selection process very complicated and time consuming which may be invisible to some applications. • Doan and Horiguchi [53]: The experiment conducted was based on selecting the top 2000 features of the MI feature set. These features are then aggregated with either the top 100 or the top 200 features of the DF feature set using the union. Doan and Horiguchi showed that the combined list outperformed both a feature set of the top 2000 MI and a feature set that contains all the terms in the training set. Using the union is a very fast and simple combining method. However, it is not clear why Doan and Horiguchi chose to combine the MI feature set with a DF feature set with smaller size. This may lead to a bias to the MI list. Alternatively, combining the same number of features from both lists will not produce any bias and may enhance the performance further. In [53], the authors compared the performance of a combined list of either 2100 or 2200 features with another list of 2000 features which might not be fare. In this study, another approach would be taken where the evaluation of the performance of these techniques should 32

be done by comparing their performance with the performance of the original feature set using the same feature set size. Therefore, the combined list has to lead to an improved performance compared with the performance of the individual lists for the same number of selected feature. Hence, instead of increasing the number of the selected features of a single scoring method in order to achieve a desired accuracy, one can combine two or more lists produced by different methods to reach an accuracy using a limited number of features. Reducing the size of the final feature set is a very important issue in TC; it does not affect only the storage needed but also the computational time. The combining operators should be fast, simple and effective. Additionally, the performance of the combining techniques should be investigated using different threshold values and different datasets. This would help to illustrate the benefits and drawbacks of using such techniques. With this purpose, this work proposes different operators for combining pairs of feature sets generated from different feature scoring methods. Figure 3.1 shows a graphical illustration of the process of combining feature sets using these operators. The proposed operators are the Union Operator (UN), Intersection Operator (INT), Union-cut Operator using Maximization (UCM), and Union-cut Operator using DF (UCD). • Union (UN) : The UN operator aggregates different feature sets together. As a result of aggregation, the number of selected features will be greater than or equal to the number of features of the original lists. The exact equivalent threshold (UTh) equals to: UT h = (T h × Sim + 2 × T h(1 − Sim)), (3.13) where T h is the threshold used in the original lists, Sim is the similarity between the original feature lists. • Intersection (INT) : The INT operator selects only the common features of the two lists. This list represents the intersection of the features selected by each scoring pair. Evidently, this list will be less than or equal to the number of features of the original lists. The equivalent threshold (ITh) will be equal to: IT h = (T h × Sim).

(3.14)

• Union-cut using Maximization (UCM) : This operator is a modification to the one used by [148]. The size of the resulted list is limited to be equal to the size of the original lists. However, instead of normalizing the scores and taking the maximum of all the features in the two lists, this normalization is performed for only the features 33

(a) The UN Operator

(b) The INT Operator

(c) The UCD Operator

(d) The UCM Operator

Figure 3.1: Construction of combined lists using the UN, INT, UCM, and UCD operators

34

that the two methods disagree about. Justification in this case is to preserve features that receive more confidence when selected by both methods. These features are selected using INT operator. These features are first included in the UCM list. The list is then completed by the highest scoring features after normalization. • Union-cut using DF (UCD) : This operator is a simplification of the UCM operator to avoid normalization, which is costly. The common features are selected, first, followed by the highest DF features instead of the highest scored features. The four proposed operators attempt to achieve better performance through combining each from its perspective. The UN operator selects all features generated by the combined scoring methods. Accordingly, it increases the size of the feature set and its performance should be compared with the performance of single lists at a higher threshold (UTh). On the other hand, the INT operator reduces the size of the feature set , hence, it should be matched with the individual lists at a lower threshold (ITh). While the performance of the UCM and UCD operators is compared with the same threshold (Th). The four combined operators are applied to the four feature scoring methods, CC, DF, IG, and MI in order to combine each pair of feature scoring method together. These combined feature are compared with the original set at the equivalent threshold according to the combining operator applied. A comparative study is conducted to evaluate these methods using different thresholding values and datasets of diverse natures. The results of this study is presented in chapter 4.

3.2

Thresholding Techniques

Thresholding is an important stage in feature selection using the filter approach. After applying feature scoring methods to each category, thresholding is performed to select the final representive feature set. Selection is done according to either (a) a local, or (b) global policy. In the local policy, thresholding is applied locally on each category, and their aggregate compose, the final representative feature set [52]. On the other hand, a globalization is performed to extract a single global score for features in the global policy by selecting features with the highest global scores [189]. Figure 3.2 illustrates these thresholding policies. 3.2.1

The local policy

Several researchers have suggested the usage of a local policy The local policy tends to optimize the classification process for each category by selecting the most relevant features in that category [158]. Local selection policy was used by [9] where a local dictionary 35

Top Th%

Feature Set

Feature Set

Features with global Scores

Globalization Schema Top 5% Top Th%

C1

Top Th%

Top Th%

C2

....

CM

C1

(a) Fixed Local thresholding

C2

....

CM

(b) Global thresholding

Figure 3.2: Local and global thresholding

of the most important words in each topic was used. They then selected from each category the words that matched the category dictionary. However, using local dictionaries is both a domain-dependent and a language-dependent approach. Alternatively, Lewis and Ringuette selected the top IG features from each category [114]. On the other hand, an implementation to local feature extraction was proposed by [179] where they applied LSI on each category separately. An extension to the usage of the local strategy was presented by [137] where they applied this policy to three feature scoring methods, namely DF, CC, and χ2 . Common to these approaches is the selection of the same number of features from each category which could be referred as a " fixed local approach" (FLocal). On the other hand, Soucy and Mineau [164], proposed selecting features in proportion to the category distribution, or what can be considered a weighted local approach (WLocal). The rational behind this approach is that the ratio among words of frequent and infrequent categories maybe very large. Hence, selecting the same number of features from both distributions may degrade the performance. This is more profound in highly skewed datasets. In [64], Forman proposed a similar idea to fixed local thresholding, which he called round-robin. In this approach features are selected from each class in a round-robin manner. Additionally, Forman proposed another selection policy called called rand-robin. In this method, the next class to select features from is determined randomly by a process that is controlled by the probability of the class distribution. This is highly similar to the idea of WLocal. However, one drawback of rand-robin is that in highly-skewed datasets it might not select any document from rare-categories. This is due to its random nature in selection, 36

which depends on the category probability. 3.2.2

The global policy

In contract to the local policy, the global policy aims to provide a global view of the training set by extracting a global score from local feature scores. Thresholding is then applied to these global scores, where features with the highest global score are retained. Yang and Pedersen [189] used Maximization (Max) and Weighted Averaging (WAvg) for extracting global scores from χ2 and MI. Additionally, they used Averaging (Avg) for DF and IG. Calvo and Ceccatto proposed the usage of Weighted Maximum (WMax), where features are weighted by the category probability [28]. Equations 3.15, 3.16, 3.17, and 3.18 provide the mathematical definitions of Avg, Max, WAvg, and WMax respectively. f (wk , ci ) ∑M FAvg (wk ) = i=1 , M

M

(3.15)

FMax (wk ) = max { f (wk , ci )} , i=1

(3.16)

M

FWAvg (wk ) =

∑M i=1

p(ci ) f (wk , ci ) , (3.17) M

FW Max (wk ) = max {p(ci ) f (wk , ci )} , i=1

(3.18)

where f (wk , ci ) is the score of the word wk w.r.t. the category ci , and M is number of categories in the training set. Feature scoring methods such as the DF, have an inherent biased to frequent categories. Similarly global techniques such as the Max or Avg tends to select features that are biased to frequent categories. Moreover, weighting the feature scores based on the category probability increases this bias and is excepted to degrade the performance, mainly in the classification of rare categories. These rare categories may have high importance and their number maybe large, especially in highly skewed datasets. Alternatively, we propose normalizing feature scores before applying the globalization scheme in order to enhance the classification process of rare categories and balance the bias of feature scoring methods such as DF. Accordingly, Normalized Average (NAvg), and Normalized Maximization (NMax) are defined as shown in equations 3.19, and 3.20 respectively: FNAvg (wk ) =

∑M i=1

f (wk ,ci ) p(ci )

M

M

(3.19)

FNMax (wk ) = max i=1

f (wk , ci ) . p(ci )

(3.20)

The main focus of selection is to identify "good features". It is usually assumed to be the features that have the maximum or the maximum average score in the training set. However, when using a feature scoring method such as DF, this definition is inappropriate. Based on 37

this definition, the features that exist in all categories with the same DF will be considered as good features. However, a good feature is the one that has a score in one category that is substantially different from its score in all other categories. In order to select such features, we propose the usage of the Standard Deviation (STD) as a globalization scheme. STD gives an estimated measure of how diverse the data is from the mean. STD, WSTD, and NSTD are defined as s  2 ∑M i=1 f (wk , ci ) − f Avg (wk ) FST D (wk ) = . (3.21) M s FW ST D (wk ) =

 2 f (w , c ) ∗ p(c ) − f (w ) ∑M i i WAvg k k i=1 . M

s FNST D (wk ) =

 2 f (w , c )/p(c ) − f (w ) ∑M i i NAvg k k i=1 . M

(3.22)

(3.23)

Although intuitively the STD should capture good features, as opposed to the Max and Avg, it has its shortcomings. To illustrate this, suppose that a certain feature w1 occurs in only one category with DF= x while another feature w2 occurs in two categories with the same DF=x. According to the definition of a good feature w1 should be considered better than w2 . When using DF as a scoring method, the mean of w2 will be higher than the mean of w1 . Accordingly, the STD of w2 will be greater than the STD of w1 since the scores of w2 will be more diverse from its mean. In order to overcome this pitfall, we propose the Maximum Deviation (MD) as a globalization scheme. Contrary to the STD, MD gives an estimation of how diverse is the data from the Max which makes it closer to realize the definition of a good feature compared to the STD. MD, WMD, and NMD are defined as: s FMD (wk ) = s FW MD (wk ) =

(3.24) 2

∑M i=1 [ f (wk , ci ) ∗ p(ci ) − fW Max (wk )] . M

s FNMD (wk ) =

2

∑M i=1 [ f (wk , ci ) − f Max (wk )] . M

2

∑M i=1 [ f (wk , ci )/p(ci ) − f NMax (wk )] . M

38

(3.25)

(3.26)

Table 3.1: A summary of comparative studies among thresholding Techniques

3.2.3

Yang and Pedersen [189]

Scoring Methods IG, χ2

Datasets Reuters-21578

Galavotti et. al. [71]

Simplified χ2

Reuters-21578

Calvo and Ceccatto [28]

χ2

Reuters-21578

Soucy and Mineau [164]

IG, DF ,χ2 , Cross Entropy

Forman [64]

IG, χ2

Reuters-21578 lingSpam, DigiTrad, Ohsumed 19 datasets

Díaz et al. [52]

IG, t f , t f id f

Reuters-21578, Ohsumed

Conclusions 11-average point: Max > WAvg MicroF1 : Max > WAvg MicroF1 : WAvg, WMax > Max MacroF1 : Max,WAvg > WMax MicroF1 : FLocal > WLocal > Max IG (MacroF1 ): FLocal > WLocal > Max > Avg χ2 (MacroF1 ): FLocal > Max > Avg > WLocal Recall and Precision: FLocal > Avg

Comparative studies of Thresholding Techniques

Table 3.1 provides a summary of major studies conducted to evaluate the performance of current thresholding techniques. The comparisons among the Max and WAvg performed by [71, 189] indicated that using the Max is better than WAvg. However, the two measures used in these experiments, MicroF1 and 11-averaging point, are mainly affected by the performance of frequent categories especially in a highly skewed dataset such as Reuters21578. The study performed by [28] took MacroF1 into consideration. MacroF1 is a measure of the performance of rare categories. However, the conclusion they presented contradicted the results of [71, 189] in MicroF1 despite using the same dataset. While the previous three studies used only Reuters dataset in the evaluation, Soucy and Mineau [164] used four different datasets and four feature scoring methods. However, they reported only the best performance achieved for each dataset regardless of the feature scoring method and threshold value used. Additionally, the evaluation measure used in this study, MicroF1 , does not reflect the performance of rare categories. The study of [64] was conducted using different small threshold values and nineteen different datasets. However, the results reported were only the average performance of the experiments conducted on these datasets. Both studies however fell short of presenting the complete picture required to determine the most situable thresholding techniques for different datasets. Díaz et. al. [52] presented a comparison between FLocal and Avg. Although this study didn’t evaluate the WLocal or even the traditional Max, however, it supported the results of [64, 164] that local thresholding are better than global thresholding. In order to avoid the pitfalls of these studies, in this work we present a comparative study among the proposed thresholding techniques and the existing six techniques; namely 39

FLocal, WLocal, Max, Avg, WMax, and WAvg. This study is conducted using different threshold values, different feature scoring methods, and datasets of diverse natures. The results of this study are reported using MicroF1 and MacroF1 in order to evaluate the performance of frequent and rare categories.

3.3

System Description

Figure 3.3 illustrates the block diagram of the system where the dotted lines show the processes performed if combining operators are applied. A set of training documents is provided to the preprocessing process where stopwords, punctation, special character are removed. Furthermore, words are stemmed to obtain the word stem. A document is then converted to a BOW. Figure 3.3 shows a sample of a document in the 20NG dataset and the constructed BOW. For each category, a bow is constructed by combining all the BOW of its documents. The BOW of the whole training set is then obtained by combining the BOW of individual categories. Feature scoring methods are then applied to categories to evaluate their features and give each feature a score indicated the feature importance in the category. Figure 3.3 illustrates the applying of a feature scoring method (IG) on the categories of the 20NG dataset. Thresholding is then applied either locally or globally. In local thresholding, the highest scored features are selected from the categories either with the same portion (Fixed local thresholding FLocal), or in propositional to the category distribution (Weighted local thresholding WLocal). On the other hand, in global thresholding, a globalization scheme is first applied to construct a global score for all the words in the training set. Then, thresholding is applied on these global score. Figure 3.6 shows the global scores of the training set after applying the Max globalization scheme. The documents are scanned again to construct a weighted feature vector for each document according to the thresholded feature list obtained after thresholding. Features are weighted by commonly using t f id f to indicate the term importance in both the document and the training set. Figure 3.7 shows a sample of feature vectors to the documents of the 20NG. These feature vectors are used by the classifier to learn the model which will be used in the testing process. If combining operators will be applied, two or more feature scoring methods are applied on categories to construct a scored list of feature like that in Figure 3.3. Thresholding is then applied on this lists either locally or globally. Then two or more thresholded feature lists are combined using the combining operators; UN, UCM, UCD, or INT (see Figure 3.1). The combined list is then used to weight the BOW of different documents to construct the feature vectors of different document (see Figure 3.7). These feature vectors are then used in the learning process.

40

Training Documents

Preprocessing

Construct BOW (representation)

BOW

Feature Scoring

List(s) of Scored words Thresholding (Local or global)

thresholded feature list(s)

Combine pair of feature Lists (using UN, UCM, UCD, or INT) Combined feature set

Feature Weighting

Weighted feature vectors

Classification

Figure 3.3: System Block Diagram

41

(a) Sample Document d1

(b) BOW of d1

Figure 3.4: Preprocessing and representation of a sample document

3.4

Summary

This chapter presented a literature review for the filter approach for feature selection. This is followed by an illustration of the new methods proposed to enhance this approach. New combining operators have been proposed in order to merge feature sets generated by pairs of feature scoring methods. This could lead to an improved performance as well as a reduction in the storage and computational resources needed. Additionally, new methods have been proposed for thresholding features scored using feature scoring methods. This could lead to a better choice of more discriminative features. This chapter showed that there are some pitfalls in previous studies conducted to evaluate feature scoring methods and thresholding techniques. Therefore, one of the main objectives of this work is to provide comparative studies among various state-of-art feature filtering techniques. These comparisons evaluate the new proposed techniques as well. The following chapters will present the experiments conducted to perform these comparative studies using datasets of diverse nature.

42

(a) Scored words of category 1

(b) Scored words of category 2

Figure 3.5: Applying feature scoring (IG) on sample categories

43

Figure 3.6: A sample of features in the training set after obtaining their global score using Max

Figure 3.7: A sample of weighted feature vectors of different documents

44

Chapter 4

Results 4.1

Experiments Setup

The main contribution of this thesis is to enhance TC using the filter approach of DR. In order to compare the results of this work with the benchmark results in TC, the most used technique is chosen in each stage of the TC process. The following section summarizes the how the experiments of this work have been setup. • Document Pre-processing : All punctuation and special characters are first removed from documents. It is worth noting that the removal of rare words was not considered, since it might be harmful especially in highly-skewed datasets. These datasets may contain categories contain a limited number of documents, and without rare words the discrimination of these categories may be difficult. When using datasets containing English documents, all stop words are removed. Stemming is then performed using the Porter Stemming Algorithm [143], which is the stemming algorithm of choice for English language. The experiments in the Arabic language have been conducted using the raw text, the stemmed, and the root words. The stemming and root extraction have been performed using two different algorithms. The first set was implemented by Kareem Darwish [41, 42] which consists of Al-Stem stem-based system and Sebawai root-based system. Whereas the second set was provided by [10] which are MORPHO3 stemmer and MORPHO3 root extractor. In this work, stop words removal has not been applied in the preprocessing of the Arabic documents. The diversity of stop word list depends on the preprocessing tool used. An additional objective of the experiments on the Arabic language is comparing among different pre-processing tools. Therefore, stop words have not been removed to isolate its effect and conduct the different experiments using the same conditions. • Document Representation : Every document was represented as a BOW which is 45

the most simplest representation avaliable. • Dimensionality Reduction : four feature scoring methods have been chosen to examine the performance of the thresholding techniques as well as the combining operators. These methods are the DF, IG, MI, and CC. These feature selection methods have been widely used, and have shown promising results [137, 158, 189]. The definitions of IG and MI adopted in this study are those suggested by [158] and [16] respectively (The mathemtical definitions of these methods are provided in 3.3 and 3.7). This choice has been supported by a pilot study within the scope of this work. • Feature Weighting : classical normalized tfidf (ltc) method has been adopted since it is the most commonly used feature weighting method. • Classification : SVM has been the method of choice of this work. Studies have shown that it is among the best performing classifiers in TC applications [48, 57, 188]. The SV M light [90]1 package is used for this purpose, where all parameters were set to the default parameters. That is to say, SVM is used in its linear form. • Performance Evaluation :The common MicroF1 and MacroF1 measures have been used for performance evaluation to benchmark our results with the literature. 4.1.1

Datasets

Six different datasets were used, it is worth noting that the 20NG, Reuters(10), Reuters(90), and Ohsumed datasets are benchmark datasets. • 20 Newsgroups Collection (20NG)2 is a collection of nearly 20,000 articles posted to the Usenet newsgroups [88]. Some of the newsgroups are much related together (e.g. rec.autos, rec.motorcycles), while others are highly unrelated (e.g. comp.graphics, talk.politics.mideast). The standard split "bydate" was used in which duplicates and headers were removed. This results in 18,941 documents, 60% of them are reserved for the training set, while the test set contains the remaining 40% [196]. • Reuters-21578 Dataset3 has been a standard benchmark in TC for the last 10 years [48]. It consists of over 20,000 news stories appeared in Reuters newswire in 1987 [80]. In this experiment, two standard splits of Reuters-21578 were used. The first one is the Top10 split (Reuters(10)) which contains the top 10 categories that have the largest number of documents in the training set [57]. The second is the ModeApté split 1 SV M light

is publicly published at http://svmlight.joachims.org. 20 Newsgroups and the bydate split can be found at http://people.csail.mit.edu/people/jrennie/20Newsgroups. 3 Reuters-21578 is available at http://www.daviddlewis.com/resources/testcollections/reuters21578/. 2 The

46

(Reuters(90)) that contains all categories with at least one positive training example and one positive test example which results in 90 categories [89]. • Ohsumed Dataset4 is a collection of 348,566 references gathered from 270 medical journal published from 1987 to 1991 [81]. In this work, the subset used by [38, 89] was followed. In this subset, only the first 20,000 documents with abstracts published in 1991 were considered which resulted in 23 categories. The first 10,000 had been used as training set and the rest as test set. • Aljazeera News Arabic Dataset (Alj-News) is a collection of 1500 Arabic news documents. These documents were gathered evenly from five categories found at aljazeera online news agency 5 [129]. The categories used in this work were: economic, art, sport, politics, and science. However, the number of documents in this dataset is small and the diversity among the nature of categories is big. This significantly simplify the classification process. In order to compare the results of this work with the results of [129], the same split used was adopted. Table 4.1: The category distribution of Alj-Mgz Dataset Category No of documents Arts 406 Culture 205 Economy 283 International-News 653 Local-News 433 Health 204 Society 1326 Sport 952

• Al-jazirah Magazine Arabic Dataset (Alj-Mgz) is an Arabic dataset collected manually from Al-jazirah online newspaper6 . The dataset consists of 4470 articles published from 2001 to 2005. The selection of these articles insured that at least one article from each week in that period was selected. The documents of this dataset are distributed unevenly among eight categories. Table 4.1 illustrates the distribution of Alj-Mgz dataset. Since there is no standard split to the training and test document of this dataset, cross validation is performed. Five random chosen splits were constructed such that the training documents in each split represent four fifth of the total number of documents. The results of experiments conducted on Alj-Mgz show both the mean and standard deviation using the five different splits. 4 Ohsumed

is available http://trec.nist.gov/data/t9_filtering/.

5 http://www.aljazeera.net/

6 http://www.al-jazirah.com/

47

Dataset 20NG Reuters(10) Reuters(90) Ohsumed Alj-News-W Alj-News-AS Alj-News-SR Alj-News-MS Alj-News-MR Alj-Mgz-W Alj-Mgz-AS Alj-Mgz-SR Alj-Mgz-MS Alj-Mgz-MR

Table 4.2: Characteristics of the datasets used in the Comparative Study . Multi-Category Evenly Distributed M Ntotal N(Tr) W (Tr) No Yes 20 '19K '11.3K '1.6M7 Yes No 10 '10K '6.5K '440K7 Yes No 90 '13K '7.8K '550K 7 Yes No 23 20K 10K '600k 7 No Yes 5 1.5K 750 '190K No Yes 5 1.5K 750 '167K No Yes 5 1.5K 750 '150K No Yes 5 1.5K 750 '196K No Yes 5 1.5K 750 ' 196K No No 8 '4.5K '3.5K '943K No No 8 '4.5K '3.5K '830K No No 8 '4.5K '3.5K '940K No No 8 '4.5K '3.5K ' 1M No No 8 '4.5K '3.5K ' 1M

Wu (Tr) '70k '17K '18K '19.5K '34K '16K '7.3K '18.6K '2.6K '93K '37K '32K '47K '3.6K

The datasets used have different characteristics in the sense of the vocabulary size, and category distribution. Both the 20NG dataset and Alj-News dataset are evenly distributed datasets. Whereas, the Reuters(10) and Alj-Mgz dataset can be considered as moderately diverse datasets. Ohsumed dataset can also be considered as a moderately diverse dataset, but its skew is larger than Reuters(10) and Alj-Mgz. On the other hand, Reuters(90) is a sample of a highly diverse dataset. Similarly, there is diversity in the vocabulary size among the used datasets. The largest dataset in vocabulary size is the 20NG followed by Alj-Mgz. While Reuters(10), Reuters(90), and Ohsumed have near similar vocabulary size. On the other hand, Alj-News dataset has the smallest vocabulary size. Table 4.2 summarizes the characteristics of the used datasets. In order to distinguish among different versions of Arabic datasets according to the stem or root extractor tool used, the letters (W, AS , SR, MS, MR) are appended to the name of the dataset to refer to using the raw text, Al-Stem stemmer, Sebawai root extractor, RDI MORPHO3 stemmer, and RDI MORPHO3 root extractor respectively.

4.2

Thresholding Techniques Results

Figures 4.1-4.14 compare the performance of different feature thresholding techniques at different thresholding values. These techniques include Averaging (Avg), Maximization (Max), Fixed Local (FLocal), and Weighted Local (WLocal) in addition to our proposed techniques Standard Deviation (STD) and Maximum Deviation (MD). All the globalization 7 The

vocabulary size of English datasets was calculated after applying Porter stemmer and removing stop words

48

techniques are evaluated using the original, the weighted, and the normalized score. This results in a comparison among fourteen thresholding techniques for non–evenly distributed datasets. However, for evenly distributed datasets, weighting or normalizing feature scores will be indifferent since all the categories have the same probability. Additionally, WLocal will be similar to FLocal. All the results are shown in both MicroF1 and MacroF1 . In evenly distributed, the values of MicroF1 and MacroF1 will match. Therefore, the results of MicroF1 are only presented. In the following, we will focus on the evaluation of the thresholding techniques for each dataset used separately. 4.2.1

The 20NG Dataset

The performance of the MicroF1 of the 20NG dataset is shown in Figure 4.1. The following notes could be derived by examining these results: • The results show that the MD has a limited improvement when using DF as a scoring method, especially noticeable at low threshold values. On the hand, FLocal has the best overall performing method. • Generally, the Max, and MD exhibit an almost identical performance in feature scoring methods with the exception of DF. As a matter of fact, the correlation between the feature sets they produce show a high similarity, generally above 90%. This is due to the fact that these scoring methods take into account the relevance of the feature in the category under investigation and other categories in the training set. Therefore, using either the Max, or MD tends to retain nearly the same set of features. This is an expected result, since these scoring methods follow from the rational of a good feature. Generally, FLocal shows a performance superior to the Max and MD in thresholding feature scoring methods except DF. This is due to the diversity in the number of good features from one category to the other. Therefore, globalization schemes such as the Max, or MD tend to be biased to certain categories in accordance to the number of good features they include. On the other hand, FLocal is an unbiased technique that forces the selection of the same number of features from all the categories. • The MD is usually slightly better than the STD which shows the potential of MD to capture discriminative features. • The Avg thresholding technique performs poorly since it sums the scores of the features across categories and hence it tends to select non-discriminative features.

49

0.8

0.8

0.75

0.75

0.7

Micro-F1

Micro-F1

0.7

0.65

0.65

0.6

0.6 0.55 0.55

0.5

0.5

0.45 0

2

4

6

8

10

0

2

4

Threshold

(a) CC

8

10

6

8

10

(b) DF

0.8

0.8

0.78

0.78

0.76

0.76 Micro-F1

Micro-F1

6 Threshold

0.74

0.74

0.72

0.72

0.7

0.7

0.68

0.68 0

2

4

6

8

10

0

Threshold

2

4 Threshold

(c) IG

(d) MI

Figure 4.1: MicroF1 of the thresholding techniques using 20NG

50

4.2.2

The Ohsumed Dataset

Figures 4.2, and 4.3 show the performance of MicroF1 , and MacroF1 for the Ohsumed dataset respectively. The following observations could be made by analyzing these results: • NMD achieves the best MicroF1 and MacroF1 in thresholding DF scores. • For CC, MI, and IG, FLocal has the best performance for MacroF1 . On the other hand, WLocal is better than FLocal with respect to MicroF1 at small threshold values. However, as the threshold increases, the performance of FLocal approaches that of WLocal. This threshold differs from one scoring method to the other. • Consistent with the results from the 20NG, the MD and Max show a high degree of similarity in performance when thresholding the IG and MI. On the other hand, using the CC, and Max provides a slightly improved performance compared to the MD. This is true for all the scores; the original score, the normalized score, and the weighted score. • The poor performance of the Avg in thresholding DF scores is not surprising. Selecting the highest average DF features at small threshold values tends to retain the features that exist in all categories. These features are not useful to discriminate among categories. • Generally, weighting the scores performs poorly compared to both the original and normalized scores. On the other hand, normalizing scores is better than using the original score at the macro level, since it adds bias to rare categories. In order to simplify the illustration of results in non-evenly distributed datasets, we will narrow the investigation to seven techniques. These techniques are the Max, NMax, MD, NMD, Avg, FLocal, and WLocal. Weighting global scores have been excluded because of its poor performance. Since the MD and NMD are usually better than the STD and NSTD, the results of STD and NSTD are also omitted. Additionally, NAvg has been excluded due to its poor performance. For complete results of all the fourteen techniques, the reader is directed to Appendix A. 4.2.3

The Reuters(10) Dataset

The performance of the MicroF1 , and MacroF1 for the Reuters (10) dataset is shown in Figures 4.4, and 4.5 respectively. The following remarks could be concluded via the analysis of these results:

51

0.6 0.55 0.5

Micro-F1

0.45 0.4 0.35 0.3 0.25 0.2 0.15 0

1

2

3

4

5

4

5

Threshold

(a) CC 0.6

0.5

Micro-F1

0.4

0.3

0.2

0.1

0 0

1

2

3 Threshold

(b) DF

Figure 4.2: MicroF1 of the thresholding techniques using Ohsumed

52

0.64 0.62 0.6 0.58

Micro-F1

0.56 0.54 0.52 0.5 0.48 0.46 0.44 0.42 0

1

2

3

4

5

3

4

5

Threshold

(c) IG 0.65 0.6 0.55

Micro-F1

0.5 0.45 0.4 0.35 0.3 0.25 0

1

2 Threshold

(d) MI

Figure 4.2: MicroF1 of the thresholding techniques using Ohsumed (Continued)

53

0.5 0.45 0.4

Macro-F1

0.35 0.3 0.25 0.2 0.15 0.1 0.05 0

1

2

3

4

5

4

5

Threshold

(a) CC 0.6

0.5

Macro-F1

0.4

0.3

0.2

0.1

0 0

1

2

3 Threshold

(b) DF

Figure 4.3: MacroF1 of the thresholding techniques using Ohsumed

54

0.6 0.55 0.5

Macro-F1

0.45 0.4 0.35 0.3 0.25 0.2 0

1

2

3

4

5

3

4

5

Threshold

(c) IG 0.55 0.5 0.45

Macro-F1

0.4 0.35 0.3 0.25 0.2 0.15 0

1

2 Threshold

(d) MI

Figure 4.3: MacroF1 of the thresholding techniques using Ohsumed (Continued)

55

• While the performance of the previous datasets enhances with increasing the threshold value; the situation in the Reuters(10) dataset is different. The best performance in this dataset is achieved at a small threshold value. As this threshold increases, the performance degrades. This shows that the Reuters(10) dataset is an easy classification problem that only a small set of features could lead to the highest accuracy. Additionally, this supports the idea that DR is very important issue in TC as it removes noisy terms that may cause the performance to decline. • For the CC, Max, MD, FLocal, and WLocal have nearly similar performance. However, the Avg is the least performing method based on both MicroF1 and MacroF1 , while the performance of NMax is not as good as the other methods based on only MicroF1 . • For small threshold values of DF, NMD and NMax are the best methods while the performance of Avg and MD is very poor. For large threshold values, all the methods exhibit nearly the same performance. • MD is the best technique for the MicroF1 of IG and small threshold values of MacroF1 . For larger threshold values, NMD and NMax are the better than other methods. • With respect to using MI, the best performance is achieved by MD at threshold 2.5%. For larger thresholds, WLocal presents the technique of choice.

4.2.4

The Reuters(90) Dataset

Figures 4.6, and 4.7 illustrate the performance of MicroF1 , and MacroF1 for the Reuters (90) dataset respectively. The analysis of the performance yields the following observations: • The performance of NMD and NMax is very poor especially for small threshold values. Since Reuters(90) is a highly skewed dataset, normalizing scores will increase the bias to rare categories at small threshold values. • At the micro level, MD, Max, FLocal, and WLocal have similar performance in MicroF1 with a limited superiority of WLocal at small threshold values due to the added bias to frequent categories. • At the macro level, one can conclude that the WLocal is the best method in thresholding IG and MI at threshold values greater than 1%. It is surprising to observe that the WLocal is generally better than the FLocal on the macro level. This is despite the fact that the FLocal allows the selection of more features from rare categories. Examining 56

0.95

0.94

0.94

0.93

0.93

0.92

0.92 Micro-F1

Micro-F1

0.95

0.91 0.9

0.91 0.9

0.89

0.89

0.88

0.88

0.87

0.87

0.86

0.86 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

6

8

10

Threshold

(a) CC

(b) DF

0.95

0.96

0.94

0.95 0.94

0.93

Micro-F1

Micro-F1

0.93 0.92

0.91

0.92 0.91

0.9 0.9 0.89

0.89

0.88

0.88 0

2

4

6

8

10

0

Threshold

2

4 Threshold

(c) IG

(d) MI

Figure 4.4: MicroF1 of the thresholding techniques using Reuters(10)

57

0.92

0.95

0.9

0.9 0.85

0.88

Macro-F1

Macro-F1

0.8 0.86

0.84

0.75 0.7

0.82 0.65 0.8

0.6

0.78

0.55 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

6

8

10

Threshold

(a) CC

(b) DF

0.91

0.92

0.91

0.9

0.9

Macro-F1

Macro-F1

0.89

0.88

0.89

0.88

0.87 0.87 0.86

0.86

0.85

0.85 0

2

4

6

8

10

0

Threshold

2

4 Threshold

(c) IG

(d) MI

Figure 4.5: MacroF1 of the thresholding techniques using Reuters(10)

58

the F1 of individual categories shows that FLocal in fact slightly enhances the performance of rare categories. However, WLocal enhances the performance of frequent categories significantly as it selects more features due to the highly skewed nature of Reuters(90). Since MacroF1 , as mentioned in section 2.2, gives the same weight to all categories , the WLocal is generally better than the FLocal in thresholding IG and MI. • On the other hand, FLocal is the best method considering the CC and DF for threshold values less than 7.5%. Beyond this threshold, WLocal outperforms FLocal. Using WLocal in thresholding CC and DF scores in small threshold values leads to a selection of a very small number of features from rare categories. If the scoring method used is not in itself a high-performing scheme, then adding these limited number of selected features will not be enough to discriminate rare categories. This argument is asserted by examining the F1 of individual categories. In this work, two Arabic datasets have been used which are Alj-News and Alj-Mgz. For each dataset, there are five versions; W (using the raw text), AS (using Al-Stem stemmer), SR (using Sebawai root extractor), MS (using RDI MORPHO3 stemmer), and MR (using RDI MORPHO3 root extractor). The difference among the versions of the same dataset is in number of words. However, the structure and the category distribution of the dataset are the same. Therefore, the raw text will be considered as a representative to the five versions of the dataset. For complete results for the other four versions, the reader is directed to Appendix A. 4.2.5

Alj-News Dataset

Similar to the 20NG dataset, Alj-News is an evenly distributed dataset. Therefore, the performance is evaluated for only the Avg, MD, Max, STD, and FLocal in terms of the MicroF1 . Figure 4.8 illustrates the performance for Alj-News-W. Examining these results yields the following observations: • The MicroF1 of this dataset is near perfect for most techniques which shows that the classification process of Alj-News dataset is fairly simple. • The diversity among the performance of different techniques is very small; usually less than 1%. This is with the exception of the Avg of DF and CC for small threshold values. Since the stop words were not eliminated from the Arabic datasets, small threshold values of DF and CC tends to select these stop word which are not helpful in the discrimination process.

59

0.9

0.9

0.85

0.8

0.8

0.7

0.75

0.6 Micro-F1

Micro-F1

0.7 0.65

0.5 0.4

0.6 0.3

0.55

0.2

0.5

0.1

0.45 0.4

0 0

2

4

6

8

10

0

2

4

Threshold

(a) CC

8

10

6

8

10

(b) DF

0.9

0.9

0.8

0.8

0.7

0.7

0.6

0.6 Micro-F1

Micro-F1

6 Threshold

0.5 0.4

0.5 0.4

0.3

0.3

0.2

0.2

0.1

0.1

0

0 0

2

4

6

8

10

0

Threshold

2

4 Threshold

(c) IG

(d) MI

Figure 4.6: MicroF1 of the thresholding techniques using Reuters(90)

60

0.45

0.45

0.4

0.4 0.35

0.35

0.3 Macro-F1

Macro-F1

0.3 0.25

0.25 0.2

0.2 0.15 0.15

0.1

0.1

0.05

0.05

0 0

2

4

6

8

10

0

2

4

Threshold

(a) CC

8

10

6

8

10

(b) DF

0.5

0.5

0.45

0.45

0.4

0.4

0.35

0.35 Macro-F1

Macro-F1

6 Threshold

0.3 0.25

0.3 0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05 0

2

4

6

8

10

0

Threshold

2

4 Threshold

(c) IG

(d) MI

Figure 4.7: MacroF1 of the thresholding techniques using Reuters(90)

61

• Consistent with the results of the 20NG, the MD is the best performing method for thresholding DF scores and FLocal is superior for other feature scoring methods.

4.2.6

Alj-Mgz Dataset

Tables A.13 and A.14 show the average and standard deviation of five runs using different randomly selected subsets for the training and testing data. Figures 4.9 and 4.9 illustrate the performance of average MicroF1 , and MacroF1 for Alj-Mgz-W dataset. Similar to the R(10) and R(90) datasets, seven thresholding techniques are examined; Avg, MD, Max, NMD, NMax, FLocal, and WLocal (The results of the fourteen thresholding techniques are shown in Appendix A). Examining the results of this dataset yields the following observations: • Generally, the standard deviation of the MacroF1 is bigger than that of the MicroF1 . In line with previous results, the MacroF1 is affected considerably by the performance of rare categories compared to those which are frequent. As a matter of fact, the rare categories are more influenced by the randomization process as the selection/deselection of a single file in the small set of testing files may cause the accuracy of that category to increase/decrease significantly. • When using the MicroF1 measure, generally the MD, Max, FLocal, and WLocal show near identical performance for most methods in high threshold values. On the other hand, WLocal is superior for most small threshold values for all methods other than DF, where FLocal shows the best performance. This is due to the added bias to frequent categories. • For methods except DF, the FLocal is superior for the MacroF1 of most small threshold values whereas most other methods show near identical performance for higher thresholds. The superior performance of FLocal is due to forcing the selection of words from rare categories. • Similar to the Alj-News, the Avg exhibits a poor performance for the DF and CC for small threshold values. This is because these methods tend to select the stop words in small threshold values. For comparing among different versions of the same dataset, the best thresholding technique is selected to be plotted for each feature scoring method. Figure 4.12 represents the comparison for the MicroF1 of Alj-News dataset while Figures 4.13 and 4.14 show the performance of the MicroF1 and MacroF1 of Alj-Mgz dataset respectively. 62

1

0.95

0.95

0.9

0.9 Micro-F1

Micro-F1

1

0.85

0.85

0.8

0.8

0.75

0.75

0.7

0.7 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

6

8

10

Threshold

(a) CC

(b) DF

0.98

0.97

0.97 0.96 0.96 0.95 Micro-F1

Micro-F1

0.95 0.94

0.94

0.93 0.93 0.92 0.92 0.91 0.9

0.91 0

2

4

6

8

10

0

Threshold

2

4 Threshold

(c) IG

(d) MI

Figure 4.8: MicroF1 of the thresholding techniques using Alj-News-W

63

0.87

0.86

0.86

0.84

0.85

0.82

0.84

Micro-F1

Micro-F1

0.88

0.8

0.83

0.78

0.82

0.76

0.81

0.74

0.8 0

2

4

6

8

10

0

2

4

Threshold

(a) CC

8

10

6

8

10

(b) DF

0.87

0.87

0.86

0.86

0.85

0.85

0.84

0.84

Micro-F1

Micro-F1

6 Threshold

0.83

0.83

0.82

0.82

0.81

0.81

0.8

0.8 0

2

4

6

8

10

0

Threshold

2

4 Threshold

(c) IG

(d) MI

Figure 4.9: MicroF1 of the thresholding techniques using Alj-Mgz-W

64

0.8

0.8 0.79

0.78 0.78 0.77 0.76

0.74

Macro-F1

Macro-F1

0.76

0.72

0.75 0.74 0.73

0.7

0.72 0.68 0.71 0.66

0.7 0

2

4

6

8

10

0

2

4

Threshold

(a) CC

8

10

6

8

10

(b) DF

0.81

0.81

0.8

0.8

0.79

0.79

0.78

0.78 Macro-F1

Macro-F1

6 Threshold

0.77

0.77

0.76

0.76

0.75

0.75

0.74

0.74

0.73

0.73 0

2

4

6

8

10

0

Threshold

2

4 Threshold

(c) IG

(d) MI

Figure 4.10: MacroF1 of the thresholding techniques using Alj-Mgz-W

65

3500

10000

Alj-News-W Alj-News-AS Alj-News-SR Alj-News-MS Alj-News-MR

3000

8000

2500

7000 No. of words

No. of words

Alj-Mgz-W Alj-Mgz-AS Alj-Mgz-SR Alj-Mgz-MS Alj-Mgz-MR

9000

2000

1500

6000 5000 4000 3000

1000

2000 500 1000 0

0 0

2

4

6

8

10

0

2

4

6

Threshold

Threshold

(a) Alj-News

(b) Alj-Mgz

8

10

Figure 4.11: Vocabulary size of the Arabic datasets due to different threshold values

Using the raw text (W) or either of the two stemmers; AS, and MS exhibit almost indifferent performance for both datasets. On the other hand, using the root extractor methods degrades the performance compared to the stemmed or the raw text. Generally, SR method is the worst performing method. Figure 4.11 shows the vocabulary size of each dataset. Examining this figure shows that the AS is the best performing pre-processing technique for both datasets. This is because it has the smallest vocabulary size compared to MS and W and it performs as well as them. Another remark is that the best performance is generally attained at threshold 5%. Beyond this threshold value, the performance is stable. This shows that the role of DR techniques in order to reduce the storage and computational resources without affecting the performance. 4.2.7

Conclusion

In this study, using the Standard Deviation (STD) and Maximum Deviation (MD) as global feature selections is purposed. In order to balance the bias of some scoring features to frequent categories, this work suggests normalizing feature scores before applying the globalization scheme. A complete study among these techniques and other state-of-art methods was conducted using different threshold values and different datasets. Generally, localization techniques are better than globalization methods, which supports the results obtained by [52, 64]. Additionally, localization techniques are much faster since thresholding is done on each category individually as opposed to the whole training set. 66

1

1

0.95

0.95 0.9

0.9

0.85

0.85 MicroF-1

MicroF-1

0.8 0.8 0.75

0.75 0.7

0.7

0.65

0.65

0.6

0.6

0.55

0.55

0.5 0

2

4

6

8

10

0

2

4

Threshold

(a) CC (FLocal)

8

10

6

8

10

(b) DF (MD)

0.98

0.98

0.96

0.96

0.94

0.94

0.92

0.92

MicroF-1

MicroF-1

6 Threshold

0.9

0.9

0.88

0.88

0.86

0.86

0.84

0.84 0

2

4

6

8

10

0

Threshold

2

4 Threshold

(c) IG (FLocal)

(d) MI (FLocal)

Figure 4.12: MicroF1 of different versions of Alj-News datasets

67

0.9

0.85

0.85

0.8

0.8

0.75

0.75

MicroF-1

MicroF-1

0.9

0.7

0.7

0.65

0.65

0.6

0.6

0.55

0.55 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

8

10

Threshold

(a) CC (FLocal)

(b) DF (FLocal)

0.88

0.88

0.86

0.86

0.84

0.84

0.82

0.82

0.8 MicroF-1

MicroF-1

0.8 0.78 0.76

0.78 0.76

0.74 0.74

0.72

0.72

0.7

0.7

0.68 0.66

0.68 0

2

4

6

8

10

0

Threshold

2

4

6 Threshold

(c) IG (FLocal)

(d) MI (FLocal)

Figure 4.13: MicroF1 of different versions of Alj-Mgz datasets

68

0.8

0.8

0.75

0.75 0.7

0.7

0.65

0.65 MacroF-1

MacroF-1

0.6 0.6 0.55

0.55 0.5

0.5

0.45

0.45

0.4

0.4

0.35

0.35

0.3 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

8

10

Threshold

(a) CC (FLocal)

(b) DF (FLocal)

0.8

0.8

0.75

0.75

0.7

0.65

MacroF-1

MacroF-1

0.7

0.6

0.65

0.6 0.55 0.55

0.5

0.45

0.5 0

2

4

6

8

10

0

Threshold

2

4

6 Threshold

(c) IG (FLocal)

(d) MI (FLocal)

Figure 4.14: MacroF1 of different versions of Alj-Mgz datasets

69

Furthermore, there might be a possibility to perform thresholding on local categories independently allows for parallism, which may help speedup the process of dimensionality reduction. FLocal is the best thresholding method in evenly distributed datasets and generally moderate diverse datasets when applied to methods other than DF. On the other hand, WLocal is the best in the MicroF1 and some cases of the MacroF1 of highly diverse dataset. Furthermore, WLocal achieves a performance that is either better than, or equivalent, to that of FLocal in the MicroF1 of moderate diverse dataset. MD shows an enhancement in the performance of thresholding when applied to DF, as a scoring method in an evenly distributed datasets. Normalized MD shows also potential in moderately diverse datasets and large threshold values of highly diverse datasets. However, MD has a performance similar to the Max in all other feature scoring methods. Generally, the MD outperforms STD in most cases. This follows intuitively from the fact that MD is more efficient in identifying good features. On the other hand, the Avg shows a poor performance comparable to other thresholding techniques. This is in consistent with the results of [71, 189]. Normalizing the features scores before applying a globalization scheme shows an enhancement in the MacroF1 of moderately diverse dataset. However, when using IG and MI in highly-skewed datasets, limited degradation, compared to the unmodified scheme is reported. For Arabic datasets, performing stemming is very useful as it reduces the vocabulary size significantly. In addition, it may enhance the performance slightly compared with using the raw text. On the other hand, using the root text leads to a degrading in the performance compared with using the stemmed or the raw text. This reduction is highly noticeable in large datasets like Alj-Mgz. This conclusion is in consistent with the one derived by [166] and results of IR field [106, 135]. Generally, using the Al-Stem stemmer (AS) is less than using RDI MORPHO3 stemmer (MS) in terms of the vocabulary size. While both tools exhibit nearly the same performance with a limited superiority of AS in some cases. Additionally, the Sebawai root extractor (SA) is superior to RDI MORPHO3 root extractor (MR) in terms of performance despite its large vocabulary size. Overall, the results suggest the usage of MD in an evenly distributed dataset and NMD for a moderately diverse dataset for thresholding DF scores. For methods other than DF, the results recommend FLocal for evenly distributed dataset. Furthermore, WLocal should be used in moderately diverse dataset in case that frequent categories are more important and the number of desired features is small. Otherwise, FLocal is recommended. However for a highly skewed dataset, WLocal is recommended for IG and MI as it generally enhances the classification of both frequent and rare categories. Additionally, WLocal is suggested for CC and DF if frequent categories are more important while FLocal is recommended if 70

rare categories are more important.

4.3

Combining Operators

In this work, four combining operators have been suggested namely; the Union (UN), Union-cut using Maximization (UCM), Union-cut using DF (UCD), and Intersection (INT). The four operators are used to combine pairs of feature sets generated locally by four feature scoring methods; CC, DF, MI, and IG. The choice of evaluating combining operators using only local thresholding techniques is based on the conclusion that local thresholding generally performs better and is considerably simpler than the global thresholding. Since the combining operators involve more computations, combining global feature sets seems to be inefficient in terms of computational time. Therefore, the performance of the combining operators is evaluated using the FLocal for evenly distributed datasets as well as FLocal and WLocal for uneven datasets. With respect to the Arabic datasets, the combining operators are evaluated using both the root and the stem text. It has been shown that the raw text leads to very large vocabulary size without improving the performance. Four preprocessing tools have been used; AS (using Al-Stem stemmer), SR (using Sebawai root extractor), MS (using RDI MORPHO3 stemmer), and MR (using RDI MORPHO3 root extractor). However, AS is chosen as a representative to the Arabic experiments since it is the best performed preprocessing tool (For complete results of other experiments, the reader is directed to Appendix B). To facilitate the assessment of the performance of the UCM and UCD operators, the similarities between the scoring method M1 and each of the UCM and UCD lists are evaluated at each examined threshold (Th). On the other hand, with respect to the analysis of the UN and INT operators, the equivalent thresholds due to using the UN or INT are calculated (see section 3.1.1). Therefore, the performance of the UN and INT lists are compared with the original lists at the equivalent threshold. In the following we will focus on the evaluation of the combining methods for each dataset separately. 4.3.1

The 20NG Dataset

Figure 4.15 shows that using both the IG and MI feature selection methods outperform the DF and CC. Therefore, combining any of these outperformed methods, IG or MI, with one of the underperformed methods, DF or CC, shows no improvement at all using any of the combining operators. On the other hand, it is notable that there is nearly no performance diversity between the DF and CC despite the low similarity between them. As a result, both the UCM and INT show a limited enhancement in the performance. Examining the similarity measure between the UCD list and the combined lists shows that there is a bias 71

to the DF list in combining DF with any method and a bias to CC in combining CC with either IG or MI. This bias even reaches 100% in some cases. Therefore, the performance of the UCD list is near identical to the either DF or CC which is outperformed by the other method. With respect to the IG and MI, none of the combining operators shows an enhancement in the performance due to the high similarity among the combined lists; generally greater than 80%. 4.3.2

The Ohsumed Dataset

Figures 4.16, and 4.17 present the MicroF1 and MacroF1 of combining pairs of features generated using the FLocal thresholding technique respectively. On the other hand, Figures 4.18, and 4.19 show the MicroF1 and MacroF1 using the WLocal thresholding technique respectively. The INT operator shows potential to enhance the performance of the CCDF and IG-MI. This is due to the limited diversity between these lists. Additionally, the UCM operator only improves the combining of small thresholds of CC and DF using the FLocal. This is because the performance proximity of CC and DF. Otherwise, none of the combining operators shows any improving in the performance due to the huge diversity among the combined lists. 4.3.3

The Reuters(10) Dataset

The performance of the combining operators using the FLocal and WLocal thresholding techniques is shown in Figures 4.20-4.23. While the performance of the previous datasets enhances with increasing the threshold value, the performance in Reuters(10) peaks at a small threshold, then it degrades slightly with increasing the threshold as a result of increasing noise. This nature of the Reuters(10) dataset affects the effectiveness of the combining operators. As a matter of fact, most operators show small enhancement while comparing its performance with the performance of individual lists at the equivalent threshold. However, this improvement is not very effective, since the best performance may have already been attained using a smaller threshold. Additionally, it might be worth noting that due to the poor performance of DF using WLocal thresholding technique none of the operators show an enhancement when combining DF with any method. This is on the contrary of FLocal where the performance diversity is not large. Therefore, the combining operators could lead to an enhanced or at least the same performance compared with the individual lists. 4.3.4

The Reuters(90) Dataset

Due to the high skew between the categories in this dataset, the conclusions based on the performance of the combining operators differs between the MicroF1 and MacroF1 . This 72

0.8

0.78

0.78

0.76

0.76

0.74

0.74

0.72

0.72

Micro-F1

Micro-F1

0.8

0.7 90

0.68

0.7 90

0.68

80

0.66

0.64

70

Similarity (%)

Similarity (%)

80 0.66

60 0.64

50

60 50 40

40

30 0

0.62

70

2

4

6

8

10

0

0.62

2

Threshold

4

6

8

10

Threshold

0.6

0.6 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.8

0.8

0.78

0.78

0.76 0.76 0.74

Micro-F1

Micro-F1

0.74 0.72

0.7

0.72

90

100 0.7

Similarity (%)

0.66

90 Similarity (%)

80 0.68

70 60

0.68

50

70 60

40 0

0.64

80

2

4

6

8

50

0.66

10

0

2

4

Threshold

6

8

10

Threshold

0.62

0.64 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.8

0.8

0.79 0.78 0.78 0.76 0.77 0.74

Micro-F1

0.72 100

0.75 100

0.74

0.7

0.68

90

Similarity (%)

Similarity (%)

Micro-F1

0.76

0.73

80 70 60

0.66

90

0.72 80 0

2

4

6

8

10

0

0.71

2

Threshold

4

6

8

10

Threshold

0.64

0.7 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.15: MicroF1 of the combining operators using 20NG (FLocal)

73

8

10

0.65

0.65

0.6

0.6

0.55

Micro-F1

Micro-F1

0.55

0.5

0.45

0.5

90

100 90

0.4

0.45

70 60 50 40

0.35

Similarity (%)

Similarity (%)

80

40 30

0.4

30 0

2

4

6

8

80 70 60 50

10

0

2

4

Threshold

0.3

8

10

0.35 0

2

4

6

8

10

0

2

4

Threshold

6

0.6

0.6

0.55

0.55

Micro-F1

0.65

0.5

0.5

90

100

80 Similarity (%)

Similarity (%)

90 0.45

70 60 50 40

0.4

10

(b) CC-IG

0.65

0.45

8

Threshold

(a) CC-DF

Micro-F1

6 Threshold

80 70 60 50

0.4

30

40 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

0.35

0.35 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.65

0.64

0.62 0.6 0.6

Micro-F1

0.58

0.5

0.56

0.54

100

100

90 0.52

80

Similarity (%)

0.45

Similarity (%)

Micro-F1

0.55

70 60

0.5

90

50

0.4

40

80 0

2

4

6

8

10

0

0.48

2

Threshold

4

6

8

10

Threshold

0.35

0.46 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.16: MicroF1 of the combining operators using Ohsumed (FLocal)

74

8

10

0.6

0.5

0.55

0.45

0.5

0.4

0.45

0.35

Macro-F1

Macro-F1

0.55

90

0.4

100 90

0.3

Similarity (%)

Similarity (%)

80 70 0.35

60 50 40

0.25

40 30

0.3

30 0

2

4

6

8

80 70 60 50

10

0

2

4

Threshold

6

8

10

Threshold

0.2

0.25 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.55

0.6

0.55

0.5

0.5

Macro-F1

Macro-F1

0.45

0.4 90

0.45

0.4

100

Similarity (%)

0.35

90 Similarity (%)

80 70 0.35

60 50 40

0.3

80 70 60 50

0.3

30 0

2

4

6

8

40

10

0

2

4

Threshold

6

8

10

Threshold

0.25

0.25 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.55

0.58

0.56 0.5 0.54

0.52 0.45

Macro-F1

0.4 100

0.48 100

0.46

80

0.44

70 60

90

0.42

50

0.3

Similarity (%)

90 0.35

Similarity (%)

Macro-F1

0.5

40

80 0

2

4

6

8

10

0

0.4

2

Threshold

4

6

8

10

Threshold

0.25

0.38 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.17: MacroF1 of the combining operators using Ohsumed (FLocal)

75

8

10

0.65

0.65

0.6 0.6

0.55

Micro-F1

Micro-F1

0.55

0.5 80

90

0.5

80

Similarity (%)

0.45

Similarity (%)

70 60 50 0.45

40

70 60 50 40

0.4 30

30 0

2

4

6

8

10

0

2

Threshold

4

6

8

10

Threshold

0.35

0.4 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.65

0.65

0.6 0.6

0.55

Micro-F1

Micro-F1

0.55

0.5

90

0.5

0.45

0.45

70

Similarity (%)

Similarity (%)

80

60 50 40 2

4

6

8

70 60 50 40 30

0.4

30 0

100 90 80

10

0

2

4

Threshold

6

8

10

Threshold

0.4

0.35 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.65

0.64

0.62 0.6 0.6 0.55

Micro-F1

0.5

0.56

100

100 Similarity (%)

0.54

90 0.45

Similarity (%)

Micro-F1

0.58

80 70 0.52

60

90

50

0.4

40 0

2

4

6

8

80

0.5

10

0

2

Threshold

4

6

8

10

Threshold

0.35

0.48 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.18: MicroF1 of the combining operators using Ohsumed (WLocal)

76

8

10

0.5

0.6

0.55

0.45

0.5

Macro-F1

Macro-F1

0.4

0.35 80

0.45

0.4

90 80

Similarity (%)

0.3

60

Similarity (%)

70 0.35

50 40

70 60 50 40

0.25

0.3

30 0

2

4

6

8

30

10

0

2

Threshold

4

6

8

10

Threshold

0.2

0.25 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.55

0.6

0.55 0.5 0.5 0.45

Macro-F1

0.4 90

0.35

80 Similarity (%)

0.35

0.4

Similarity (%)

Macro-F1

0.45

70 60 0.3

50 40

0.3

30 0

2

4

6

8

70 60 50 40 30

0.25

10

100 90 80

0

2

4

Threshold

6

8

10

Threshold

0.25

0.2 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.55

0.56

0.54 0.5 0.52 0.45

0.5

Macro-F1

0.35

100

0.46 100

0.44

0.3

80

0.42

70 60

90

0.4

50 0.25

Similarity (%)

90 Similarity (%)

Macro-F1

0.48 0.4

40

80 0

2

4

6

8

10

0

0.38

2

Threshold

4

6

8

10

Threshold

0.2

0.36 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.19: MacroF1 of the combining operators using Ohsumed (WLocal)

77

8

10

0.95

0.95

0.94

0.94

0.93 0.93

Micro-F1

Micro-F1

0.92

0.91 90

0.92

0.91

100

0.89

90

80 0.9

70 60

80 70 60 50

0.89

50

0.88

Similarity (%)

Similarity (%)

0.9

0

2

4

6

8

40

10

0

2

4

Threshold

6

8

10

Threshold

0.87

0.88 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.95

0.95

0.945 0.94 0.94 0.93 0.935 0.92

Micro-F1 100

0.9

0.925 100

0.92

Similarity (%)

90

0.89

90

80

Similarity (%)

Micro-F1

0.93

0.91

0.915

70 0.91

60

70 60

50

0.88

80

50 0

2

4

6

8

10

0

0.905

2

4

Threshold

6

8

10

Threshold

0.87

0.9 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.95

0.95

0.945

0.945

0.94 0.94

Micro-F1

0.93

0.93

100

100

0.925

0.92

Similarity (%)

0.925 Similarity (%)

Micro-F1

0.935 0.935

90 0.92

80

70 0

2

4

6

8

80

0.915

10

90

0

2

Threshold

4

6

8

10

Threshold

0.915

0.91 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.20: MicroF1 of the combining operators using Reuters(10) (FLocal)

78

8

10

0.91

0.91

0.9 0.9 0.89 0.89 0.88 0.88

Macro-F1

Macro-F1

0.87

0.86 90

0.85

0.87 100

0.84

0.83

70 0.85 60

80 70 60 50

50 0

0.82

90 Similarity (%)

Similarity (%)

0.86 80

2

4

6

8

40

0.84

10

0

2

4

Threshold

6

8

10

Threshold

0.81

0.83 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.91

0.91

0.9

0.9

0.89 0.89

Macro-F1

0.87 100 0.86

0.88

0.87

100

Similarity (%)

90

0.85

90

80

Similarity (%)

Macro-F1

0.88

0.86

70 60

70 60

0.85

50

0.84

80

0

2

4

6

8

50

10

0

2

4

Threshold

6

8

10

Threshold

0.83

0.84 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.91

0.91

0.905 0.905 0.9

0.895

0.9

Macro-F1

0.885

0.88 100

0.895

0.89

100

0.87

Similarity (%)

0.875 Similarity (%)

Macro-F1

0.89

90 0.885 80

90

0.865 0.88

70 0

0.86

2

4

6

8

80

10

0

2

Threshold

4

6

8

10

Threshold

0.855

0.875 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.21: MacroF1 of the combining operators using Reuters(10) (FLocal)

79

8

10

0.96

0.94

0.95

0.93

0.94

0.92

0.93

Micro-F1

Micro-F1

0.95

0.91

0.9

0.92

0.91

Similarity (%)

0.89

0.88

90

80

0.9

70

0.89

Similarity (%)

90

60

70 60 50

0

0.87

80

2

4

6

8

10

0

0.88

2

4

Threshold

6

8

10

Threshold

0.86

0.87 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.96

0.95

0.95 0.945 0.94 0.94

Micro-F1

0.92

0.91 100 Similarity (%)

0.9

0.89

0.935

0.93

100 90

90

Similarity (%)

Micro-F1

0.93

0.925

80 70

70 60

0.92

60 0

0.88

80

2

4

6

8

50

10

0

2

4

Threshold

6

8

10

Threshold

0.87

0.915 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.955

0.955

0.95

0.95

0.945 0.945

Micro-F1

0.935 100

0.94

0.935

100

0.93

0.925

90

90 0.93

80 70

80 70 60

0.925

60

0.92

Similarity (%)

Similarity (%)

Micro-F1

0.94

0

2

4

6

8

50

10

0

2

4

Threshold

6

8

10

Threshold

0.915

0.92 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.22: MicroF1 of the combining operators using Reuters(10) (WLocal)

80

8

10

0.95

0.92

0.9 0.9

0.88

0.86 0.85

Macro-F1

Macro-F1

0.84

0.8 90

0.82

0.8 90

Similarity (%)

0.75

Similarity (%)

0.78 80 0.76 70

70 60

0.74

0.7 60

50 0

2

4

6

8

10

0

0.72

Threshold

0.65

2

4

6

8

10

Threshold

0.7 0

2

4

6

8

10

0

2

4

Threshold

0.9

0.9

0.85

0.89

Macro-F1

0.91

0.8 100

10

0.88 100 90

90

0.87

Similarity (%)

Similarity (%)

8

(b) CC-IG

0.95

0.75

6

Threshold

(a) CC-DF

Macro-F1

80

80 70

80 70 60

0.7

0.86 60

50 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

0.65

0.85 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.92

0.92

0.91

0.91

0.9 0.9

Macro-F1

0.88 100

0.89

0.88

100

0.87

0.86

90

90 0.87

80 70

80 70 60

0.86

60

0.85

Similarity (%)

Similarity (%)

Macro-F1

0.89

0

2

4

6

8

50

10

0

2

4

Threshold

6

8

10

Threshold

0.84

0.85 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.23: MacroF1 of the combining operators using Reuters(10) (WLocal)

81

8

10

is in the contrary of the previous datasets where the performance of the frequent categories MicroF1 has nearly the same trend of the performance of rare categories MacroF1 . Figures 4.24-4.25 present the performance of the combining operators using the FLocal and WLocal thresholding technique. With respect to the FLocal thresholding, the similarity between the IG and MI is very high (generally > 90%). This leads to almost similar performance of the combining operators to the individual lists, either in the MicroF1 or the MacroF1 . Combining other methods, all the operators show an improved performance for the MicroF1 . The only exception is using the INT operator in combining the CC with any method in threshold values less than 1.5%. This is mainly because of the poor performance of the CC feature scoring method for these values. On the other hand, it is notable that all the methods outperform CC significantly for the MacroF1 . This performance diversity leads to a degradation in the performance when combining CC with any method using the INT and the UCD. As the DF is naturally based to frequent categories, it is outperformed by IG and MI using the MacroF1 measure. This leads to some degradation in the performance of the combining operators when combining DF with IG or MI. This is with the exception of UCM operator that shows some improvement. Examining the similarity measure between M1 and UCM shows that the UCM list is biased to IG or MI and only few words are taken from the DF list. However, these words could enhance the performance. This improvement reaches 3% at small threshold values. For the WLocal thresholding technique, the performance diversity among the four feature scoring methods is apparent. MI outperforms IG slightly. They both outperform CC and DF significantly. On the other hand, CC is outperformed by DF significantly in small threshold values. Similar to Reuters(10), the MacroF1 of all methods reaches a peak value then the performance degrades with increasing the threshold. All these conditions cause the performance of the combining operators to be slightly poorer with the exception of some few cases. 4.3.5

Alj-News-AS Dataset

Examining the results shown in Figure 4.28 indicates that all scoring methods outperform the DF significantly for small threshold values. Therefore, only the UN operator leads to a slight enhancement in combining MI-DF and IG-DF. However, none of the other operators improve the performance in combining any method with the DF. On the other hand, the UN and INT operators show some potential in improving the performance for combining CC-IG, CC-MI, and IG-MI. It should be noted that the UCD and UCM operators generally outperform the least-performed combined list but they could not outperform the other method. 82

0.9

0.9

0.85 0.85 0.8

100 90 Similarity (%)

0.75

0.7

80 70 60 50

0.75

0.7

100 90 Similarity (%)

Micro-F1

Micro-F1

0.8

0.65

40 30

40 30

0.6 0

2

4

6

8

80 70 60 50

10

0

2

4

Threshold

6

8

10

Threshold

0.65

0.55 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.9

0.88

0.86

0.85

0.84

Micro-F1

0.75

Similarity (%)

0.7

100 90 80

0.8

70 60 50

0.78

100 90 80 70 60

40 30

0.65

0.82

Similarity (%)

Micro-F1

0.8

0.76 0

2

4

6

8

50

10

0

2

4

Threshold

0.6

8

10

0.74 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG 0.88

0.86

0.86

0.84

0.84

Micro-F1

0.88

0.82

0.82

100

100

90 0.8

Similarity (%)

0.8

Similarity (%)

Micro-F1

6 Threshold

80 70 60

0.78

0.78 50

90 0

2

4

6

8

10

0

2

Threshold

4

6

8

10

Threshold

0.76

0.76 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.24: MicroF1 of the combining operators using Reuters(90) (FLocal)

83

8

10

0.5

0.4

0.45

0.35

0.4

Macro-F1

0.3 100 90 Similarity (%)

0.25

80 70 60 50

100 90 0.3

40 30

0.2

0.35

Similarity (%)

Macro-F1

0.45

40 30

0.25 0

2

4

6

8

80 70 60 50

10

0

2

4

Threshold

6

8

10

Threshold

0.15

0.2 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.5

0.48

0.46

0.45

0.44 0.4

100 90 80

100 0.38

70 60 50

90

0.36

80 70 60

40 30

0.2

0.4

Similarity (%)

0.25

Macro-F1

0.3

Similarity (%)

Macro-F1

0.42 0.35

0

2

4

6

8

50

0.34

10

0

2

4

Threshold

6

8

10

Threshold

0.15

0.32 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.46

0.46

0.45 0.44 0.44 0.42

Macro-F1

0.4 100

0.42 100 Similarity (%)

0.41

90 0.38

Similarity (%)

Macro-F1

0.43

80 70

0.4

60 0.36 50 0

2

4

6

8

90

0.39

10

0

2

Threshold

4

6

8

10

Threshold

0.34

0.38 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.25: MacroF1 of the combining operators using Reuters(90) (FLocal)

84

8

10

0.9

0.9

0.85

0.85

0.8

Micro-F1

Micro-F1

0.8

0.75 80

0.75

0.7

90 80

Similarity (%)

0.7

Similarity (%)

70 60 0.65

50 40 30

0.65

30 20

0.6

20 0

2

4

6

8

70 60 50 40

10

0

2

4

Threshold

6

8

10

Threshold

0.6

0.55 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.9

0.88

0.86 0.85

0.84

Micro-F1

Micro-F1

0.8

0.82

90

0.75

100

0.7

90 0.8

70

Similarity (%)

Similarity (%)

80

60 50

80 70 60

40

0.78

30

50 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

0.65

0.76 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.88

0.88

0.87 0.86 0.86 0.84

Micro-F1

0.84

0.82

100

100

0.83

0.82

90

90

0.8

Similarity (%)

Similarity (%)

Micro-F1

0.85

80 70

80 70 60

0.78 60

0.81

50 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

0.8

0.76 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.26: MicroF1 of the combining operators using Reuters(90) (WLocal)

85

8

10

0.45

0.5

0.45

0.4

0.4 0.35

80 Similarity (%)

90 80

0.25

70 0.2

0.3

Similarity (%)

0.25

Macro-F1

Macro-F1

0.35 0.3

60 50 0.2

40 30

0.15

20 0

2

4

6

8

30 20

0.15

10

70 60 50 40

0

2

Threshold

6

8

10

Threshold

0.1

0.1 0

2

4

6

8

10

0

2

4

Threshold

0.45

0.46

0.4

0.44

0.35

0.42

Macro-F1

0.48

0.3 90

90 Similarity (%)

Similarity (%)

100

70 60 0.36

50

80 70 60

40 30

0.15

10

0.4

0.38

80

0.2

8

(b) CC-IG

0.5

0.25

6

Threshold

(a) CC-DF

Macro-F1

4

0

2

4

6

8

50

0.34

10

0

2

4

Threshold

6

8

10

Threshold

0.1

0.32 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.48

0.48

0.46

0.46

0.44 0.44 0.42

Macro-F1

0.38 100

0.4 100 0.38

0.34

90

90

Similarity (%)

0.36

Similarity (%)

Macro-F1

0.42 0.4

80 0.36 70

70 60

60 0

0.32

80

2

4

6

8

50

0.34

10

0

2

4

Threshold

6

8

10

Threshold

0.3

0.32 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.27: MacroF1 of the combining operators using Reuters(90) (WLocal)

86

8

10

0.97

0.97

0.96 0.96 0.95 0.95

Micro-F1

Micro-F1

0.94

0.93

0.94

80

100

0.91

90

70

0.93

Similarity (%)

Similarity (%)

0.92

60 50

70 60 50

0.92 40

0.9

80

40 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

0.89

0.91 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.98

0.98

0.97 0.97 0.96 0.96

0.95

Micro-F1

Micro-F1

0.94 0.95

0.94

100

0.93 100

0.92

90

0.93

80

0.91

70 0.9

60 0.92

Similarity (%)

Similarity (%)

90

80 70 60 50

50

40 0

2

4

6

8

10

0

0.89

2

4

Threshold

6

8

10

Threshold

0.91

0.88 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.97

0.97

0.96 0.965 0.95

Micro-F1

0.96

0.93

0.92

0.955

100

100

90

0.9

0.95

Similarity (%)

0.91

Similarity (%)

Micro-F1

0.94

80 70

90

80

60 0.945 50

70 0

0.89

2

4

6

8

10

0

2

Threshold

4

6

8

10

Threshold

0.88

0.94 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.28: MicroF1 of the combining operators using Alj-News-AS (FLocal)

87

8

10

4.3.6

Alj-Mgz-AS Dataset

Figures 4.29-4.32 illustrate that the performance of feature scoring methods in this dataset reaches a saturation point from nearly threshold 5%. Accordingly, the performance of the combining operators in most cases, is slightly better, or the same as the performance of individual lists, at the equivalent threshold value. Even when there is a degradation in the performance, this degradation is very small and occurs mostly at small threshold values. 4.3.7

Conclusion

Table 4.3 summarizes the results of the combining operators using different datasets where + means a performance improvement, - means a degradation, and ' refers to an approximate performance. Analyzing this table leads to the following conclusions:

Table 4.3: Results summary of the combining operators UN

UCD

UCM

INT

MicroF1 20NG

CC-DF

'

-

+

+

CC-IG,CC-MI

'

-

-

-

DF-IG,DF-MI

-

-

-

-

-

-

-

'

+ (FLocal)

+

IG-MI

MicroF1 and MacroF1 Ohsumed (FLocal, WLocal)

CC-DF

-

'

- (WLocal CC-IG,CC-MI

-

-

-

-

DF-IG,DF-MI

-

-

-

-

'

-

-

+

+, '

+, '

IG-MI

MicroF1 and MacroF1 Reuters(10) (FLocal)

CC-DF

+, '

-

CC-IG,CC-MI

+, '

-

-

+, −

DF-IG,DF-MI

+ (MicroF1 )

-

+, −

+, '

+, '

-

+, '

+, '

IG-MI

MicroF1 and MacroF1 Reuters(10) (WLocal)

CC-DF

+, '

-

-

+, −

CC-IG,CC-MI

+, −

-

'

+, −

'

-

-

-

+, −

-

'

+,-

DF-IG,DF-MI IG-MI

MicroF1

Reuters(90) (FLocal)

CC-DF

'

+

+

+>1.5%

CC-IG,CC-MI

+

+

+

+>1.5%

DF-IG,DF-MI

'

+

+

+

IG-MI

'

'

'

'

88

Table 4.3 – continued UN

UCD

UCM

INT

MacroF1 CC-DF

-

-

+

-

CC-IG,CC-MI

+

-

+

-

DF-IG,DF-MI

+,-

-

+

-

IG-MI

'

+,'

+,'

+,'

CC-DF

'

-

-

-

CC-IG,CC-MI

-

-

+,'

-

DF-IG,DF-MI

-

-

+/-

+/-

IG-MI

+,'

-

-

+,'

IG-MI

-

-

-

+<2.5%

other methods

-

-

-

-

MicroF1

Reuters(90) (WLocal)

MacroF1

MicroF1 Alj-News-AS

CC-DF

'

−, '

−, '

-

CC-IG,CC-MI

+

-

-

+,-

DF-IG,DF-MI

+,-

-

-

-

+, '

-

-

+

IG-MI

MicroF1 and MacroF1 Alj-Mgz-AS (FLocal)

CC-DF

-

-

−, '

−, +

CC-IG

+

+, '

+, '

+

CC-MI DF-IG,DF-MI IG-MI

-

-

-

-

+, '

+, '

+, '

+

+, '

+, '

'

+, −

MicroF1 and MacroF1 Alj-Mgz-AS (WLocal)

CC-DF

', −

-

-

+, '

CC-IG

+

+, '

+, '

+, '

CC-MI

', −

'

', +

-

DF-IG,DF-MI

+, −

-

+, −

+, −

IG-MI

+, −

+, '

−, '

+

• The are three main factors that highly affect the effectiveness of the combining operators: – The correlation between the combined lists: If the combined lists are highly correlated, then the resulted list would be very similar to the original list, leading to a near identical performance. This is apparent in combining MI and IG using Reuters(90)-FLocal dataset. Since the similarity between the combined lists is above 90%, the performance of the combining operators is nearly the same as the original. 89

0.86

0.84

0.84

0.82

0.82

Micro-F1

Micro-F1

0.86

0.8

0.8

90

90

80

80 0.78

70 60 50 40

0.76

Similarity (%)

Similarity (%)

0.78

70 60 50 40

0.76

30

30 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

0.74

0.74 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.87

0.86

0.86 0.85 0.85

0.84

0.84

Micro-F1

0.82

0.81 90

0.83

0.82

100

80 Similarity (%)

0.8

0.79

90 Similarity (%)

Micro-F1

0.83

70 0.81

60 50

0.8

30 0

0.77

70 60

40

0.78

80

2

4

6

8

50

10

0

2

4

Threshold

6

8

10

Threshold

0.76

0.79 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.87

0.87

0.86

0.86

0.85 0.85

Micro-F1

0.83 100 0.82

0.84

0.83

100

0.81

80

Similarity (%)

90 Similarity (%)

Micro-F1

0.84

0.82

70

90

80

60 0.81

50

0.8

0

2

4

6

8

70

10

0

2

4

Threshold

6

8

10

Threshold

0.79

0.8 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.29: MicroF1 of the combining operators using Alj-Mgz-AS (FLocal)

90

8

10

0.82

0.8

0.8

0.78

0.78

0.76

0.76

0.74

0.74

Macro-F1

0.72

0.7 90 80 Similarity (%)

0.68

0.66

0.7 90 80

0.68

70 60

0.66

50 40

0.64

0.72

Similarity (%)

Macro-F1

0.82

60 50 40

0.64

30

30 0

0.62

70

2

4

6

8

10

0

0.62

Threshold

0.6

2

4

6

8

10

Threshold

0.6 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.82

0.82

0.8 0.8 0.78 0.78

0.76

Macro-F1

Macro-F1

0.74

0.72 90

0.7

0.76

0.74

100

0.68

0.66

90 Similarity (%)

Similarity (%)

80 70 0.72

60 50

70 60

40 0.7

30 0

0.64

80

2

4

6

8

50

10

0

2

4

Threshold

6

8

10

Threshold

0.62

0.68 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.82

0.81

0.8 0.8 0.79 0.78

0.74

Macro-F1

0.76

100

0.77

0.76 100

90 0.72

0.75

Similarity (%)

Similarity (%)

Macro-F1

0.78

80 70 0.74

90

80

60 0.7

50

70 0

2

4

6

8

10

0

0.73

2

4

Threshold

6

8

10

Threshold

0.68

0.72 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.30: MacroF1 of the combining operators using Alj-Mgz-AS (FLocal)

91

8

10

0.86

0.85

0.85

0.84

0.84

0.83

0.83

0.82

0.82

Micro-F1

0.81 80 Similarity (%)

0.8

0.79

0.78

0.81 80

0.8

70

70

Similarity (%)

Micro-F1

0.86

0.79

60 50

0.78

50 40

40

30 0

0.77

60

2

4

6

8

10

0

0.77

2

Threshold

4

6

8

10

Threshold

0.76

0.76 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.87

0.86

0.86 0.85 0.85 0.84 0.84 0.83

Micro-F1

Micro-F1

0.83

0.82

0.82

90

0.81

100

0.8

0.79

70 0.8 60

80 70 60

50 0

0.78

90 Similarity (%)

Similarity (%)

0.81 80

2

4

6

8

50

0.79

10

0

2

4

Threshold

6

8

10

Threshold

0.77

0.78 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.86

0.86

0.85

0.85

0.84

Micro-F1

0.83

0.82

0.83

100

100

90 0.81

0.82

Similarity (%)

Similarity (%)

Micro-F1

0.84

80 70 60

90 80 70

0.81

0.8

50

60 0

2

4

6

8

10

0

2

Threshold

4

6

8

10

Threshold

0.79

0.8 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.31: MicroF1 of the combining operators using Alj-Mgz-AS (WLocal)

92

8

10

0.8

0.82

0.8

0.78

0.78

0.76

0.76 0.74 0.74

Macro-F1

0.7 80 Similarity (%)

0.68

0.66

0.64

0.72

0.7 80 0.68

70 60

70 Similarity (%)

Macro-F1

0.72

0.66

50

50 40

0.64 40

30 0

0.62

60

2

4

6

8

10

0

0.62

Threshold

0.6

2

4

6

8

10

Threshold

0.6 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(a) CC-DF

(b) CC-IG

0.82

0.82

0.8 0.8 0.78 0.78

0.76

0.76

Macro-F1

0.72

0.7 90

0.74 100 0.72

Similarity (%)

0.68

0.66

90

80

Similarity (%)

Macro-F1

0.74

70 0.7 60

70 60

0.64 50 0

0.62

80

2

4

6

8

50

0.68

10

0

2

4

Threshold

6

8

10

Threshold

0.6

0.66 0

2

4

6

8

10

0

2

4

Threshold

6

8

10

Threshold

(c) CC-MI

(d) DF-IG

0.82

0.81

0.8 0.8 0.79 0.78 0.78 0.76

Macro-F1

0.74 100 0.72

0.76 100

0.75

0.7

80

Similarity (%)

90 Similarity (%)

Macro-F1

0.77

0.74

70 0.73

60 50

0.68

90 80 70 60

0

2

4

6

8

10

0

0.72

2

Threshold

4

6

8

10

Threshold

0.66

0.71 0

2

4

6

8

10

0

Threshold

2

4

6

Threshold

(e) DF-MI

(f) IG-MI

Figure 4.32: MacroF1 of the combining operators using Alj-Mgz-AS (WLocal)

93

8

10

– The performance diversity of the combined lists: Most combining operators show a degraded performance when the diversity between the performance of the combined lists is high. As a matter of fact, the performance of the resulted list suppresses the least performed list, mostly, but it could not outperform the best performance achieved. The most promising results have been achieved when using the combining operators to combine lists which are highly-uncorrelated and have comparable performance. For example, the diversity between the feature scoring methods in the 20NG is huge. Therefore, none of the combined operators shows improvement with the exception of combining CC and DF where the performance is close. – The nature of the datasets: Some datasets such as Alj-Mgz have the nature that leads to a performance saturation at some threshold. Others such as Reuters(10) shows a performance that peaks at some threshold, then the performance degrades with the increasing in threshold. Both cases put a limitation on the capabilities of the combining operators to enhance the performance. • The INT operator shows some potential in performance enhancement and hence it could reduce the storage requirements. This is because the INT operator leads to a reduced size feature set. Therefore, if it could enhance the performance compared with the equivalent threshold, this would lead to a reduction in the storage resources. This reduction will affect the computational resources as well. • The UN operator has a potential in improving the performance in some cases where the performance diversity is close and the correlation between the combined lists is high. However, it increases the storage needed and hence its performance should be compared with the combined lists at higher thresholds. • The UCD and UCM operators enhance the performance in some cases without increasing the storage. • Generally, the UCM operator shows a better performance than the UCD operator due to being independent on the DF that has been proved to not be an effective way to represent the relevance to different categories.

4.4

BenchMark Results

In [48], IG using the Max was implemented for threshold 10% using SVM classifier. The results of the Reuters(10) dataset were 0.93 and 0.88 for the MicroF1 and MacroF1 respectively. While the results of this work for the same threshold are 0.945 and 0.893. For the 94

1

Sakr 10-IG(FLocal)

0.95

Accuracy

0.9

0.85

0.8

0.75

0.7 Politics

Arts and Culture

Sports Categories

Economy

Medical

Figure 4.33: Performance evaluation of the proposed system in comparison with Sakhr categorizer using five random splits of Alj-Mgz dataset

Reuters(90) datasets, the study of [48] showed 0.86 and 0.42 for the MicroF1 and MacroF1 respectively. On the other hand, this work achieves a MicroF1 of 0.874 and a MacroF1 of 0.433 for the same threshold. The work of [130] showed a MicroF1 of 0.855 and 0.55 for the Reuters(90) and ohsumed datasets respectively. These results were obtained for the IG and FLocal at threshold 10% using SVM classifier. On the other hand, this work achieved MicroF1 of 0.874 and 0.636 for the same datasets under the same conditions. It might be worth noting that [130] implemented IG using equation 3.2 while this work implemented equation 3.3. This might be a reason for this diversity in results. The experiment conducted for Alj-News Arabic dataset showed a MicroF1 of 0.945 using stem words [129]. This work exhibited an improved MicroF1 of 0.975 using the Avg and MI at threshold value of 7.5%. Despite using different classifier and different feature selection techniques, the work of [129] is the only available work in the Arabic language to benchmark the results of this work with. This is due to the unavailability of Arabic benchmark datasets. In order to evaluate our system in Arabic language with a commercial product, the Saker8 categorizer system is used. However, no technical information is available for this system. The files of the Alj-Mgz dataset were tested using this system. In order to conduct a fair comparison with Sakhr, the following has been done: 8 http://siraj.sakhr.com/

95

• Since there are no separate categories for Arts and Culture in Sakhr categorizer, the Arts and Culture categories are combined into one category named "Arts and Culture". • The files in "Local-News" category are not considered since this category does not exist in Sakhr system. • The files in the "Society" Category are not considered since they are very biased to the Saudi Society. • There are three categories in Sakhr system that do not exist in Alj-Mgz dataset. There categories are "Accidents", "Science and tech", and "Religion". Therefore, if Sakhr claims that any file of Alj-Mgz dataset belongs to one of these categories, this file is not considered in the evaluation process. • Since Sakhr categorizer trial system does not categorize files larger than 50k, these files are not considered • The "International-News" category of Alj-Mgz dataset is assumed to be the "Politics" category in Sakhr. This results in about 3000 files splitted into five categories; Politics, Arts and Culture, Sports, Economy, and Health. Figure 4.4 shows the performance of Sakhr system in comparison with a system that uses feature filtering in DR. The scoring method used is IG with FLocal as a thresholding technique for threshold 10%. The pre–processing tool used is Al-Stem Stemmer (AS). Five random splits have been taken from Alj-Mgz dataset in order to make cross validation. The results show that this system has exhibited a performance better than Sakhr’s even in general categories such as "Politics" which has no biased to the Saudi Society. This shows the potential of the feature filtering system in Arabic TC. As a conclusion, this work showed comparable results with the state-of-art research in the literature. This increases the confidence in this work which helps to introduce Alj-Mgz Arabic dataset to be used as a benchmark dataset in Arabic TC.

96

Chapter 5

Discusion and Conclusion 5.1

Conclusion

This main contribution of this work is to enhance existing DR techniques and second is to conduct a comparative study that allows users to make comprehensive choices among available techniques. The main objective is to achieve the highest performance with the most simplest techniques. Due to the simplicity and efficiency of the feature filtering approach, this work demonstrates this approach in order to perform the DR process. The feature filtering approach is divided into two main stages: • Applying a feature scoring method in order to evaluate features in the training set. Several feature scoring methods have been investigated in the literature. Among them are the DF, IG, MI, and CC. This work proposes several combining operators that could be used in order to combine pairs of feature scoring methods. These operators are the Union (UN) operator, the Union-cut with DF (UCD) operator, the Union-cut with maximization (UCM) operator, and the Intersection (INT) operator. A comparative study has been conducted using several datasets in order to evaluate the performance of these operators. The results also show the potential of these operators in terms of performance enhancement and storage reduction. However, the effectiveness of these operators depend highly on three factors; the correlation between the combined lists, the performance diversity between the combined lists, and the nature of the dataset used. It has been shown that the less diversity in the performance and the less correlation between the combined lists, the more effective the combining operators. The results show that the INT operator has a potential in improving the performance and reducing the storage while the UN operator enhances the performance but increases the storage required. Both the UCD and the UCM operators improve the 97

performance without affecting the storage. Notebely is the fact that the UCD operator is outperformed by the UCM operator. This stems from the bias of the UCD operator to DF. • Thresholding scored features: This work presents new methods for global thresholding; namely Standard Deviation (STD), Maximum Deviation (MD), and normalizing global thresholding methods. A comparative study has been performed among these techniques and state-of-arts techniques using different nature datasets. The results show that local thresholding is generally better than global thresholding for feature scoring methods with the exception of the DF. In the DF, MD and normalized MD have shown some potential in improving the performance. Additionally, the results show that the performance of the thresholding techniques is highly dependent on the nature of the dataset whether it is evenly distributed, moderately diverse, or highly skewed. Care should be given to the chosen thresholding technique according to the type of the dataset and the feature scoring method used. The results suggest using MD for evenly distributed datasets and NMD for moderately distributed datasets when using DF as a feature scoring method. For methods other than DF, FLocal is recommended for evenly distributed datasets and WLocal is suggested for moderately diverse datasets if frequent categories are of higher importance. With respect to highly-skewed datasets, WLocal is recommended for IG and MI feature scoring methods while FLocal is suggested for DF and CC if rare categories are more important than frequent ones. Four English benchmark datasets have been used in order to evaluate the performance of the proposed techniques. They are 20NG, Ohsumed, Reuters(10), and Reuters(90). Additionally, the experiments have been conducted using two Arabic datasets. The Arabic datasets have been investigated using the raw words, the stem , and the root text. The results indicate that using the stem words does not lead to a significant loss in the accuracy while reducing the vocabulary size signifcantly. On the other hand, using the root text dramatically reduces the storage requirements. However, it has shown to lead to some degradation in the performance notable in large dataset. Two stemmers have been used in the Arabic experiments which are Al-Stem stemmer (AS), and RDI MORPHO3 stemmer (MS). Additionally, two root extraction tools have been applied which are the Sebawai root extractor (SR), and RDI MORPHO3 root extractor (MR). The results show that AS is better performed than MS and leads to more reduction in the storage requirements. On the other hand, SR leads to a larger feature set compared to MR but SR has a better performance comparable to MR.

98

The proposed combining operators and thresholding techniques have an improved the performance of the feature filtering approach. This leads to improved categorization accuracy and a saving in the feature set size. In addition, this reduction would help reduce the storage and decrease the computational resources. The proposed systems have shown comparable results with benchmark datasets. With respect to the Arabic datasets, the feature filtering approach has shown high superiority compared with the Sakhr online categorizer system using Alj-Mgz dataset.

5.2

Future Work

Some issues that remain open to future exploration • A further investigation is needed for other document representation techniques such as the phrase-based representation 2.1.2. None of the studies conducted for the English language showed a significant enhancement due to using phrase-based representation instead of the word-based representation. However, there is no similar comparison for the Arabic language. • The usage of semantic knowledge to enhance the performance of Arabic TC is needed to be explored. The main difficulty of using semantic information is that the research in semantic for the Arabic language is still in its first steps. As shown in section 2.1.2, most studies used WordNet for semantic knowledge. Unfortunately, the Arabic language is not implemented in the WordNet. Therefore, using semantic in Arabic TC would be a challenge. • This work proposes using the combining operators to combine pairs of feature sets produced by the filter approach of DR. An extended study could be conducted to investigate applying these combining operators to feature sets produced by feature extraction methods such as PCA and LSI. • Further investigation is needed to know the effect of combining more than two lists using the proposed combining operators. • A more extended study is needed to investigate the performance of the proposed thresholding techniques and combining operators using other classifier such as kNN, Rocchio, NNet etc . . . . • Using multiple classifier systems could be adopted to enhance the performance further. Ideas such as Boosting and Bagging are needed to be investigated.

99

List of Papers Resulting from this Thesis Wanas N., Said D. , Hegazy N., and Darwish N. A study of local and global thresholding techniques in text categorization. In Proc. of the Australasian Data Mining Conference (AusDM 2006), Volume 61 of Conferences in Research and Practice in Information Technology (CRPIT), 91-101 , Sydney, Australia, November 29-30, (2006). Wanas N., Said D. , Hegazy N., and Darwish N. Combining local feature scoring methods for text categorization. Special Issue on: Multiple Classifier Systems, Artificial Intelligence and Machine Learning (AIML) 23-33, (2006). Wanas N., Said D. , Hegazy N., and Darwish N. Combining local feature scoring methods for text categorization. In Special session on Multiple Classifier Systems, 4th International Conference on Informatics and Systems (INFOS2006), Cairo, Egypt, March 25-27, (2006).

100

References [1] Abe, S. Support Vector Machines for Pattern Classification. Advances in Pattern Recognition. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2005. [2] Abu-Salem, H., Al-Omari, M., and Evens, M. Stemming methodologies over individual query words for arabic information retrieval. Journal of the American Society for Information Science and Technology (JASIST), 50(6):524–529, 1999. [3] Adami, G., Avesani, P., and Sona, D. Bootstrapping for hierarchical document classification. In Proc. of the 12th ACM International Conference on Information and Knowledge Management (CIKM’03), pages 295–302, New Orleans, United stated, Nov. 02–08, 2003. ACM Press, New York, United States. [4] Al-Kharashi, I. and Evens, M. Comparing words, stems, and roots as index terms in an arabic information retrieval. Journal of the American Society for Information Science and Technology (JASIST), 45(8):548–560, 1994. [5] Al-Shalabi, R., Kanaan, G., and Al-Serhan, H. New approach for extracting Arabic roots. In Proc. of the 2003 Arab conference on Information Technology (AC˘ Z2003), ´ ITâA pages 42–59, Alexandria, Egypt, December 2003. [6] Al-Sughaiyer, I. and Al-Kharashi, I. Arabic morphological analysis techniques: a comprehensive survey. Journal of the American Society for Information Science and Technology (JASIST), 55(3):189–213, 2004. [7] Al-Taani, A. and Al-Awad, N. A. A comparative study of web-pages classification methods using fuzzy operators applied to Arabic web-pages. In Proc. of the International Enformatika Conference (IEC’05), pages 33–35, Prague, Czech Republic, Aug. 26–28, 2005. Enformatika, Çanakkale, Turkey. [8] Antonellis, I., Bouras, C., and Poulopoulos, V. Personalized news categorization through scalable text classification. In Proc. of the 8th Asia-Pacific Web ConferenceFrontiers of WWW Research and Development (APWeb 2006), volume 3841 of

101

Lecture Notes in Computer Science, pages 391–401, Harbin, China, Jan. 16–18, 2006. Springer-Verlag New York, Inc. [9] Apté, C., Damerau, F., and Weiss, S. Automated learning of decision rules for text categorization. ACM Transactions on Information Systems (TOIS), 12(3):233–251, July 1994. [10] Attia, M. A large-scale computational processor of the Arabic morphology. Master’s thesis, Computer Engineering, Faculty of Engineering, Cairo, Egypt, Jan. 2000. [11] Baker, D. and McCallum, A. Distributional clustering of words for text classification. In Proc. of the 21th ACM International Conference on Research and Development in Information Retrieval (SIGIR’98), pages 96–103, Melbourne, AU, Aug. 24–28, 1998. ACM Press, New York, United States. [12] Bakus, J. and Kamel, M. Higher order feature selection for text classification. ˘ S–491, Knowledge and Information Systems, 9(4):468âA ¸ Apr. 2006. [13] Bang, S., Yang, J., and Yang, H. Hierarchical document categorization with k-NN and concept-based thesauri. Information Processing and Management, 42(2):387– 406, 2006. [14] Baoli, L., Qin, L., and Shiwen, Y. An adaptive k-nearest neighbor text categorization strategy. ACM Transactions on Asian Language Information Processing, 3(4):215– 226, Dec. 2004. [15] Barzilay, R., Elhadad, N., and McKeown, K. Inferring strategies for sentence ordering in multidocument news summarization. Journal of Artificial Intelligence Research, 17:35–55, 2002. [16] Bekkerman, R., El-Yaniv, R., Tishby, N., and Winter, Y. Distributional word clusters vs. words for text categorization. Journal of Machine Learning Research, 3:1183– 1208, 2003. [17] Bell, D., Guan, J., and Bi, Y. On combining classifier mass functions for text categorization. IEEE Transaction on Knowledge and Date Engineering, 17(10):1307–1319, Oct. 2005. [18] Berger, A., Pietra, S., and Pietra, V. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71, 1996. [19] Bingham, E., Kabán, A., and Girolami, M. Topic identification in dynamical text by complexity pursuit. Neural Processing Letters, 17(1):69–83, 2003. 102

[20] Blum, A. and Langley, P. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2):245–271, Dec. 1997. [21] Bradley, A. The use of the area under the roc curve in the evaluation of machine learning algorithms. Pattern Recognition, 5(7):1145–1160, 1997. [22] Brank, J., Grobelnik, M., Mili´c-Frayling, N., and Mladeni´c, D. Feature selection using support vector machines. In Proc. of the 3rd International Conference on Data Mining Methods and Databases for Engineering, Finance, and Other Fields, Bologna, IT, 2002. [23] Brants, T., Chen, F., and Farahat, A. Arabic document topic analysis. In Proc. of the LREC-2002 Workshop Arabic Language Resources and Evaluation, Las Palmas, Spain, June 01, 2002. [24] Breiman, L. Bagging predictors. Machine Learning, 24(2):123–140, 1996. [25] Breslow, L. and Aha, D. Simplifying decision trees: a survey. Knowledge Engineering Review, 12(1):1–40, 1997. [26] Buckley, C. The importance of proper weighting methods. In Proc. of the workshop on Human Language Technology (HLT ’93), pages 349–352, Princeton, New Jersey, Mar. 21–24 1993. Association for Computational Linguistics, Morristown, NJ, USA. [27] Cai, L. and Hofmann, T. Text categorization by boosting automatically extracted concepts. In Proc. of the 26th ACM International Conference on Research and Development in Information Retrieval (SIGIR’03), pages 182–189, Toronto, Canada, July 28-Aug. 1, 2003. ACM Press, New York, United States. [28] Calvo, R. and Ceccatto, H. Intelligent document classification. Intelligent Data Analysis, 4(5):411–420, 2000. [29] Caropreso, M., Matwin, S., and Sebastiani, F. A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization. Text databases & document management: theory & practice, pages 78–102, 2001. [30] Chen, C.-M., Lee, H.-M., and Hwang, C.-W. A hierarchical neural network document classifier with linguistic feature selection. Applied Intelligence, 23(3):277– 294, Dec. 2005. [31] Chen, H. Machine learning for information retrieval: neural networks, symbolic learning, and genetic algorithms. Journal of the American Society for Information Science and Technology (JASIST), 46(3):194–216, 1995. 103

[32] Chen, H. and Ho, T. Evaluation of decision forests on text categorization. In Proc. of the 7th SPIE Conference on Document Recognition and Retrieval, pages 191– 199, San Jose, United States, Jan. 21–23, 2000. SPIE - The International Society for Optical Engineering. [33] Chen, L., Tokuda, N., and Nagai, A. A new differential LSI space-based probabilistic document classifier. Information Processing Letters, 88(5):203–212, 2003. [34] Chiang, J.-H. Chen, Y.-C. An intelligent news recommender agent for filtering and categorizing large volumes of text corpus. International journal of Intelligent Systems, 19(3):201–216, 2003. [35] Chuan, Z., Xianliang, L., Mengshu, H., and Xu, Z. A lvq-based neural network anti-spam email approach. ACM SIGOPS Operating Systems Review, 39(1):34–39, 2005. [36] Cohen, W. Fast effective rule induction. In Proc. of the 12th International Conference on Machine Learning (ICML’95), pages 115–123, Tahoe City, CA, July 9–12, 1995. Morgan Kaufmann Publishers, San Francisco, United States. [37] Cohen, W. and Singer, Y. Context-sensitive learning methods for text categorization. ACM Transactions on Information Systems, 17(2):141–173, 1999. [38] Combarro, E., Montanes, E., Diaz, I., Ranilla, J., and Mones, R. Introducing a family of linear measures for feature selection in text categorization. IEEE Transaction on Knowledge and Date Engineering, 17(9):1223–1232, Sept. 2005. [39] Cortes, C. and Vapnik, V. Support-vector networks. Machine Learning, 20(3):273– 297, 1995. [40] Dara, R. and Kamel, M. Sharing training patterns among multiple classifiers. In Proc. of the 5th International Workshop in Multiple Classifier Systems (MCS-2004), volume 3077 of Lecture Notes in Computer Science, pages 243–252. Springer-Verlag New York, Inc., Cagliari, Italy, June 09–11, 2004. [41] Darwish, K. Building a shallow Arabic morphological analyzer in one day. In Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL’02), pages 1–8, Philadelphia, Pennsylvania, United States, July 7–12, 2002. Association for Computational Linguistics, Morristown, NJ, USA. [42] Darwish, K. Probabilistic methods for searching OCR-degraded Arabic text. PhD thesis, University of Maryland, College Park, Maryland, United States, 2003. 104

[43] Darwish, K., Hassan, H., and Emam, O. Examining the effect of improved context sensitive morphology on Arabic information retrieval. In Proc. of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pages 25–30, Ann Arbor, Michigan, June 25–30, 2005. Association for Computational Linguistics, Morristown, NJ, USA. [44] Das, S. Filters, wrappers and a boosting-based hybrid for feature selection. In Proc. of the 18th International Conference on Machine Learning (ICML’01), pages 74– 81, Williamstown, MA, United States, June 28–July 1st 2001. Morgan Kaufmann Publishers, San Francisco, United States. [45] Dasigi, V., Mann, R., and Protopopescu, V. Information fusion for text classification an experimental comparison. Pattern Recognition, 34(12):2413–2425, December 2001. [46] Debole, F. and Sebastiani, F. Supervised term weighting for automated text categorization. In Proc. of the 2003 ACM Symposium on Applied Computing (SAC’03), pages 784–788, Melbourne, United States, Mar. 09–12, 2003. ACM Press, New York, United States. [47] Debole, F. and Sebastiani, F. Supervised term weighting for automated text categorization. In Sirmakessis, S., editor, Text Mining and its Applications, volume 138 of “Studies in Fuzziness and Soft Computing” series, pages 81–98. Physica-Verlag, Heidelberg, DE, 2004. [48] Debole, F. and Sebastiani, F. An analysis of the relative hardness of reuters-21578 subsets. Journal of the American Society for Information Science and Technology (JASIST), 56(6):584–596, Apr. 2005. [49] Deerwester, S., Dumais, S., Landauer, T., Furnas, G., and Harshman, R. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391–407, 1990. [50] del Castillo, M. D. and Serrano, J. A multistrategy approach for digital text categorization from imbalanced documents. SIGKDD Explorations, 6(1):70–79, 2004. [51] Deng, Z.-H., Tang, S., Yang, D., Zhang, M., Li, L.-Y., and Xie, K.-Q. A comparative study on feature weight in text categorization. In Proc. of the Advanced Web Technologies and Applications, 6th Asia-Pacific Web Conference, APWeb 2004, pages 588–597, Hangzhou, China, Apr. 14–17, 2004. Springer-Verlag New York, Inc. [52] Díaz, I., Ranilla, J., Montañes, E., Fernández, J., and Combarro, E. Improving performance of text categorization by combining filtering and support vector machines. 105

Journal of the American Society for Information Science and Technology (JASIST), ˘ S–592, 55(7):579âA ¸ May 2004. [53] Doan, S. and Horiguchi, S. An efficient feature selection using multi-criteria in text categorization. In Proc. of the 4th International Conference on Hybrid Intelligent Systems (HIS’04), pages 86–91, Kitakyushu, Japan, Dec. 05–08, 2004. IEEE Computer Society, Washington, DC, USA. [54] Dong, Y.-S. and Han, K.-S. Text classification based on data partitioning and parameter varying ensembles. In Proc. of the 2005 ACM Symposium on Applied Computing (SAC’05), pages 1044–1048, Santa Fe, New Mexico, USA, Mar. 13–17, 2005. ACM Press, New York, United States. [55] Drucker, H., Wu, D., and Vapnik, V. Support vector machines for spam categorization. IEEE Transactions on Neural Networks, 10(5):1048–1054, 1999. [56] Duda, R., Hart, P., and Stork, D. Pattern Classification. Wiley-Interscience, 2nd edition, Nov. 2000. [57] Dumais, S., Platt, J., Heckerman, D., and Sahami, M. Inductive learning algorithms and representations for text categorization. In Proc. of the 7th ACM International Conference on Information and Knowledge Management (CIKM’98), pages 148– 155, Bethesda, United States, Nov. 02–07, 1998. ACM Press, New York, United States. [58] Duwairi, R. A distance-based classifier for Arabic text categorization. In Proc. of the 2005 International Conference on Data Mining (DMIN 2005), pages 187–192, Las Vegas, Nevada, United States, June 20–23, 2005. CSREA Press, Las Vegas, Nevada, United States. [59] Duwairi, R. Machine learning for Arabic text categorization. Journal of the American Society for Information Science and Technology (JASIST), 57(8):1005 – 1010, June 2006. [60] El-kourdi, M., Bensaid, A., and eddine Rachifi, T. Automatic Arabic document categorization based on the Naive Bayes algorithm. In Proc. of the COLING 20th Workshop on Computational Approaches to Arabic Script-based Languages, University of Geneva, Geneva, Switzerland, Aug. 23–27, 2004. [61] El-kourdi, M., eddine Rachifi, T., and Bensaid, A. A concatenative approach to Arabic word root extraction. In in progress, 2006.

106

[62] Feldman, R. and Dagan, I. Knowledge discovery in textual databases (KDT). In Proc. of the 1st IEEE International Conference on Knowledge Discovery and Data Mining (KDD’95), pages 112–117, Montreal, Canada, Aug. 20–21, 1995. AAAI Press, Menlo Park, United States. [63] Forman, G. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research (JMLR), 3:1289–1305, Mar. 2003. [64] Forman, G. A pitfall and solution in multi-class feature selection for text classification. In Proc. of the 21st International Conference on Machine Learning (ICML’04), volume 69 of ACM International Conference Proceeding Series, page 38, Banff, Alberta, Canada, July 4–8, 2004. ACM Press, New York, United States. [65] Fragos, K., Maistros, Y., and Skourlas, C. A weighted maximum entropy language model for text classification. In Natural Language Understanding and Cognitive Science, Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science, NLUCS 2005, In conjunction with ICEIS 2005, pages 55–67, Miami, FL, United States, May 24th, 2005. INSTICC Press. [66] Frasconi, P., Soda, G., and Vullo, A. Text categorization for multi-page documents: A hybrid Naive Bayes HMM approach. In Proc. of the 1st ACM-IEEE Joint Conference on Digital Libraries (JCDL’01), pages 11–20, Roanoke, United States, June 24–28, 2001. IEEE Computer Society, Washington, DC, USA. [67] Frawley, W., Piatetsky-Shapiro, G., and Matheus, C. databases - an overview. AI Magazine, 13:57–70, 1992.

Knowledge discovery in

[68] Freitas-Junior, H., Ribeiro-Neto, B., Vale, R., Laender, A., and Lima, L. Categorization-driven cross-language retrieval of medical information. Journal of the American Society for Information Science and Technology (JASIST), 57(4):501 – 510, Jan. 2006. [69] Gabrilovich, E. and Markovitch, S. Text categorization with many redundant features: using aggressive feature selection to make SVMs competitive with C4.5. In Proc. of the 21st International Conference on Machine Learning (ICML’04), vol˘ S328, ume 69 of ACM International Conference Proceeding Series, page 321âA ¸ Banff, Alberta, Canada, July 4–8, 2004. ACM Press, New York, United States. [70] Gabrilovich, E. and Markovitch, S. Feature generation for text categorization using world knowledge. In Proc. of the the 19th International Joint Conference on Artificial Intelligence, pages 1048–1053, Edinburgh, Scotand, Aug. 2005. 107

[71] Galavotti, L., Sebastiani, F., and Simi, M. Experiments on the use of feature selection and negative evidence in automated text categorization. In Proc. of the 4th European Conference on Research and Advanced Technology for Digital Libraries (ECDL’00), volume 1923 of Lecture Notes in Computer Science, pages 59–68, Lisbon, Portugal, Sept. 18–20, 2000. Springer-Verlag New York, Inc. [72] Gulli, A. and Signorini, A. The indexable web is more than 11.5 billion pages. In Proc. of the Special interest tracks and posters of the 14th international conference on World Wide Web (WWW ’05), pages 902–903, Chiba, Japan, May 10–14, 2005. ACM Press, New York, United States. [73] Guo, G., Wang, H., Bell, D., Bi, Y., and Greer, K. Using KNN model for automatic text categorization. Soft Computing, 10(5):423–430, 2006. [74] Guyon, I. and Elisseeff, A. An introduction to variable and feature selection. Journal of Machine Learning Research (JMLR), 3:1157–1182, Mar. 2003. [75] Hahn, U. and Mani, I. The challenges of automatic summarization. Computer, 22(11):29–36, Nov. 2000. [76] Halvorsen, P.-K. Chapter document processing. In Cole, R., editor, Survey of the state of the art in human language technology. Cambridge University Press, New York, NY, USA, 1997. [77] Han, E.-H., Karypis, G., and Kumar, V. Text categorization using weight-adjusted k-nearest neighbor classification. In Proc. of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-01), volume 2035 of Lecture Notes in Computer Science, pages 53–65, Hong Kong, CN, Apr. 16–18, 2001. Springer-Verlag New York, Inc. [78] Hand, D., Mannila, H., and Smyth, P. Principles of Data Mining (Adaptive Computation and Machine Learning). The MIT Press, August 2001. [79] Hassan, H. Maximum entropy framework for natural language processing applied on Arabic text categorization. Master’s thesis, Department of Electronics and Communication Engineering, Faculty of Engineering, Cairo, Egypt, Nov. 2001. [80] Hayes, P. and Weinstein, S. C ONSTRUE /T IS: a system for content-based indexing of a database of news stories. In Proc. of the IAAI-90, 2nd Conference on Innovative Applications of Artificial Intelligence, pages 49–66, Boston, United States, 1990. AAAI Press, Menlo Park, United States.

108

[81] Hersh, W., Buckley, C., Leone, T. J., and Hickam, D. Ohsumed: an interactive retrieval evaluation and new large test collection for research. In Proc. of the 17th ACM International Conference on Research and Development in Information Retrieval (SIGIR’94), pages 192–201, Dublin, Ireland, July 03–06, 1994. SpringerVerlag New York, Inc. [82] Hess, A. Supervised and unsupervised ensemble learning for the semantic web. PhD thesis, School of Computer Science and Informatics, National University of Ireland, Dublin 4, Ireland, Feb. 2006. [83] Hmeidi, I., Kanaan, G., and Evenss, M. Design and implementation of automatic indexing for information retrieval with arabic documents. Journal of the American Society for Information Science and Technology (JASIST), 48(10):867–881, 1997. [84] Hull, D. The TREC-7 filtering track: description and analysis. In Proc. of the TREC-7, 7th Text Retrieval Conference, pages 33–56, Gaithersburg, United States, Nov. 09–11, 1998. National Institute of Standards and Technology, Gaithersburg, United States. [85] Hyvärinen, A. and Oja, E. Independent component analysis: algorithms and applications. Neural Networks, 13(4-5):411–430, 2000. [86] Ismail, M. and Kamel, M. Multidimensional data clustering utilizing hybrid search strategies. Pattern Recognition, 22(1):75–89, 1989. [87] Jain, A., Murty, M., and Flynn, P. Data clustering: a review. ACM Computing ˘ S– Surveys (CSUR), 31(3):264 âA ¸ 323, Sept. 1999. [88] Joachims, T. A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. In Proc. of the 14th International Conference on Machine Learning (ICML’97), pages 143–151, Nashville, United States, July 08–12, 1997. Morgan Kaufmann Publishers, San Francisco, United States. [89] Joachims, T. Text categorization with support vector machines: learning with many relevant features. In Proc. of the 10th European Conference on Machine Learning (ECML’98), volume 1398 of Lecture Notes in Computer Science, pages 137–142, Chemnitz, Germany, Apr. 21–24, 1998. Springer-Verlag New York, Inc. [90] Joachims, T. Transductive inference for text classification using support vector machines. In Proc. of the 16th International Conference on Machine Learning (ICML’99), pages 200–209, Bled, Slovenia, June 27–30, 1999. Morgan Kaufmann Publishers, San Francisco, United States. 109

[91] Johnson, D., Oles, F., Zhang, T., and Goetz, T. A decision-tree-based symbolic rule induction system for text categorization. IBM systems Journal, 41(3):428–437, 2002. [92] Kan, M.-Y., McKeown, K., and Klavans, J. Domain-specific informative and indicative summarization for information retrieval. In Proc. of the Document Understanding Conference (DUC 2001), New Orleans, United States, Sept. 2001. [93] Kanaan, G., Al-Shalabi, R., and Al-Akhras, A.-A. KNN Arabic text categorization using IG feature selection. In Proc. of the 4th International conference on Computer Science and Information Technology, Amman, Jordan, Apr. 05–07, 2006. [94] Kazama, J. and Tsujii, J. Maximum entropy models with inequality constraints: A case study on text categorization. Machine Learning, 60(1-3):159–194, Sept. 2005. [95] Kehagias, A., Petridis, V., Kaburlasos, V., and Fragkou, P. A comparison of wordand sense-based text categorization using several classification algorithms. Journal ˘ S–247, of Intelligent Information Systems, 21(3):227âA ¸ 2003. [96] Kim, H., Howland, P., and Park, H. Dimension reduction in text classification with support vector machines. Journal of Machine Learning Research, 6:37–53, 2005. [97] Kim, S.-B., Seo, H.-C., and Rim, H.-C. Poisson Naive Bayes for text classification with feature weighting. In Proc. of the 6th International Workshop on Information Retrieval with Asian Languages, pages 33–40, Sapporo, Japan, July 7, 2003. [98] Klimt, B. and Yang, Y. The Enron corpus: A new dataset for email classification research. In Proc. of the 15th European Conference on Machine Learning (ECML’04), volume 3201 of Lecture Notes in Computer Science, pages 217–226, Pisa, Italy, Sept. 20–24, 2004. Springer-Verlag New York, Inc. [99] Ko, Y., Park, J., and Seo, J. Improving text categorization using the importance of sentences. Information Processing and Management, 40(1):65–79, 2004. [100] Kohavi, R. and John, G. H. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273–324, 1997. [101] Kolenda, T., Hansen, L., and Sigurdsson, S. Chapter independent Components in text. In Girolami, M., editor, Advances in Independent Component Analysis, pages 229–250. Springer-Verlag New York, Inc., 2000. [102] Lam, S. and Lee, D. Feature reduction for neural network based text categorization. In Proc. of the 6th International Conference on Database Systems for Advanced 110

Applications (DASFAA ’99), pages 195–202, Hsinchu, Taiwan, Apr. 19–21, 1999. IEEE Computer Society, Washington, DC, USA. [103] Lam, W. and Lai, K.-Y. A meta-learning approach for text categorization. In Proc. of the 24th ACM International Conference on Research and Development in Information Retrieval (SIGIR’01), pages 303–309, New Orleans, Louisiana, United States, Sept. 09–13, 2001. ACM Press, New York, United States. [104] Lam, W., Ruiz, M., and Srinivasan, P. Automatic text categorization and its application to text retrieval. IEEE Transaction on Knowledge and Date Engineering, 11(6):865–879, 1999. [105] Lan, M., Tan, C.-L., Low, H.-B., and Sung, S.-Y. A comprehensive comparative study on term weighting schemes for text categorization with support vector machines. In Proc. of the Special interest tracks and posters of the 14th international conference on World Wide Web (WWW ’05), pages 1032–1033, Chiba, Japan, May 10–15, 2005. ACM Press, New York, United States. [106] Larkey, L., Ballesteros, L., and Connell, M. Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis. In Proc. of the 25th ACM International Conference on Research and Development in Information Retrieval (SIGIR’02), pages 275–282, Tampere, Finland, Aug. 11–15, 2002. ACM Press, New York, United States. [107] Larkey, L. and Croft, B. Combining classifiers in text categorization. In Proc. of the 19th ACM International Conference on Research and Development in Information Retrieval (SIGIR’96), pages 289–297, Zürich, Switzerland, Aug. 18–22, 1996. ACM Press, New York, United States. [108] Lee, K., Kay, J., Kang, B., and Rosebrock, U. A comparative study on statistical machine learning algorithms and thresholding strategies for automatic text categorization. In Proc. of the 7th Pacific Rim International Conference on Artificial Intelligence (PRICAI-02), volume 2417 of Lecture Notes in Computer Science, pages 444–453, Tokyo, Japan, 2002. Springer-Verlag New York, Inc. [109] Lehtonen, M., Petit, R., Heinonen, O., and Lindén, G. A dynamic user interface for document assembly. In Proc. of the ACM Symposium on Document Engineering (DocEng’02), pages 134–141, McLean, Virginia, USA, Nov. 8–9, 2002. ACM Press, New York, United States. [110] Lewis, D. An evaluation of phrasal and clustered representations on a text categorization task. In Proc. of the 15th ACM International Conference on Research 111

and Development in Information Retrieval (SIGIR’92), pages 37–50, Copenhagen, Denmark, June 21–24, 1992. Springer-Verlag New York, Inc. [111] Lewis, D. Evaluating and optimizing autonomous text classification systems. In Proc. of the 18th ACM International Conference on Research and Development in Information Retrieval (SIGIR’95), pages 246–254, Seattle, Washington, United States, July 9–13, 1995. ACM Press, New York, United States. [112] Lewis, D. Naive (Bayes) at forty: The independence assumption in information retrieval. In Proc. of the 10th European Conference on Machine Learning (ECML’98), volume 1398 of Lecture Notes in Computer Science, pages 4–15, Chemnitz, Germany, Apr. 21–24, 1998. Springer-Verlag New York, Inc. [113] Lewis, D. and Gale, W. A sequential algorithm for training text classifiers. In Proc. of the 17th ACM International Conference on Research and Development in Information Retrieval (SIGIR’94), pages 3–12, Dublin, Ireland, July 03–06, 1994. SpringerVerlag New York, Inc. [114] Lewis, D. and Ringuette, M. A comparison of two learning algorithms for text categorization. In Proc. of the 3rd Symposium on Document Analysis and Information Retrieval (SDAIR’94), pages 81–93, Las Vegas, United States, 1994. ISRI; University of Nevada. [115] Lewis, D., Schapire, R., Callan, J., and Papka, R. Training algorithms for linear text classifiers. In Proc. of the 19th ACM International Conference on Research and Development in Information Retrieval (SIGIR’96), pages 298–306, Zürich, Switzerland, Aug. 18–22, 1996. ACM Press, New York, United States. [116] Li, F. and Yang, Y. A loss function analysis for classification methods in text categorization. In Proc. of the 20th International Conference on Machine Learning (ICML’03), pages 472–479, Washington, DC, USA, Aug. 21–24, 2003. AAAI Press, Menlo Park, United States. [117] Li, H. and Yamanishi, K. Text classification using ESC-based stochastic decision lists. Information Processing and Management, 38(3):343–361, May 2002. [118] Li, Y., Cao, Y., Zhu, Q., and Zhu, Z. A novel framework for web page classification using two-stage neural network. In Proc. of the 1st International conf. in Advanced Data Mining and Applications (ADMA 2005), volume 3584 of Lecture Notes in Computer Science, pages 499–506, Wuhan, China, July 22–24, 2005. SpringerVerlag New York, Inc. 112

[119] Liu, T., Chen, Z., Zhang, B., ying Ma, W., and Wu, G. Improving text classification using local latent semantic indexing. In Proc. of the 4th IEEE International Conference on Data Mining (ICDM’04), pages 162–169, Brighton, UK, Nov. 01–04, 2004. IEEE Computer Society, Washington, DC, USA. [120] Mana-Lopez, M., Buenaga, M. D., and Gomez-Hidalgo, J. Multidocument summarization: An added value to clustering in interactive retrieval. ACM Transactions on Information Systems (TOIS), 22(2):215–241, 2004. [121] Martínez, A. and Kak, A. PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2):228–233, Feb. 2001. [122] McCallum, A. and Nigam, K. A comparison of event models for Naive Bayes text classification. In Proc. of the AAAI-98, Workshop on Learning for Text Categorization, pages 41–48, Madison, Wisconsin, United States, July 26–27, 1998. AAAI Press, Menlo Park, United States. [123] Mikheev, A. Feature lattices for maximum entropy modelling. In Proc. of the 17th international conference on Computational linguistics, pages 848–854, Montreal, Quebec, Canada, Aug. 10–14, 1998. Association for Computational Linguistics, Morristown, NJ, USA. [124] Miller, G., Beckwith, R., Gross, C., and Miller, K. Introduction to wordnet: An on-line lexical database. International Journal of Lexicography, 3(4):235–244, 1990. [125] Mitchell, T. Machine Learning. The MIT Press, 1997. [126] Mladeni´c, D. Feature subset selection in text-learning. In Proc. of the 10th European Conference on Machine Learning (ECML’98), volume 1398 of Lecture Notes in Computer Science, pages 95–100, Chemnitz, Germany, Apr. 21–23, 1998. SpringerVerlag New York, Inc. [127] Mladeni´c, D., Brank, J., Grobelnik, M., and Mili´c-Frayling, N. Feature selection using linear classifier weights: interaction with classification models. In Proc. of the 27th ACM International Conference on Research and Development in Information Retrieval (SIGIR’04), pages 234–241, Sheffield, United Kingdom, July 25–29, 2004. ACM Press, New York, United States. [128] Mladeni´c, D. and Grobelnik, M. Feature selection for unbalanced class distribution and Naive Bayes. In Proc. of the 16th International Conference on Machine Learning (ICML’99), pages 258–267, Bled, Slovenia, June 27–30, 1999. Morgan Kaufmann Publishers, San Francisco, United States. 113

[129] Mohamed, S., Ata, W., and Darwish, N. A new technique for automatic text categorization for Arabic documents. In Proc. of the 5th IBIMA International Conference on Internet and Information Technology in Modern Organizations, Cairo, Egypt, Dec. 13–15, 2005. [130] Montañés, E., Díaz, I., Ranilla, J., Combarro, E. F., and Fernández, J. Scoring and selecting terms for text categorization. IEEE Intelligent Systems, 20(3):40–47, 2005. [131] Montañés, E., Quevedo, J. R., and Díaz, I. A wrapper approach with support vector machines for text categorization. In Proc. of the 7th International Work-Conference on Artificial and Natural Neural Networks, (IWANN2003), volume 2686 of Lecture Notes in Computer Science, pages 230–237, Maó, Menorca, Spain, June 03–06, 2003. Springer-Verlag New York, Inc. [132] Montoyo, A., Suarez, A., Rigau, G., and Palomar, M. Combining knowledge- and corpus-based word-sense-disambiguation methods. Journal of Artificial Intelligence Research, 23:299–330, Mar. 2005. [133] Moschitti, A. Answer filtering via text categorization in question answering systems. In Proc. of the 15th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2003), pages 241–248, Sacramento, California, USA, Nov. 3–5, 2003. IEEE Computer Society. [134] Moschitti, A. and Basili, R. Complex linguistic features for text classification: A comprehensive study. In Proc. of the 26th European Conference on IR Research (ECIR’2004), volume 2997 of Lecture Notes in Computer Science, pages 181–196, Sunderland, United Kingdom, Apr. 05–07, 2004. [135] Moukdad, H. Stemming and root-based approaches to the retrieval of Arabic documents on the web. Webology, 3(1), Mar. 2006. [136] Moulinier, I. and Ganascia, J.-G. Applying an existing machine learning algorithm to text categorization. In Proc. of the IJCAI-95 Workshop on Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language, volume 1040 of Lecture Notes in Computer Science series, pages 343–354, Montreal, Canada, Aug. 21st, 1995. Springer-Verlag New York, Inc. [137] Ng, H., Goh, W., and Low, K. Feature selection, perceptron learning, and a usability case study for text categorization. In Proc. of the 20th ACM International Conference on Research and Development in Information Retrieval (SIGIR’97), pages 67–73, Philadelphia, United States, July 27–31, 1997. ACM Press, New York, United States. 114

[138] Nigam, K., Laferty, J., and McCallum, A. Using maximum entropy for text classification. In Proc. of the IJCAI-99, Workshop on Machine Learning for Information Filtering, pages 61–67, Stockholm, Sweden, Aug. 1st, 1999. Morgan Kaufmann Publishers, San Francisco, United States. [139] Paliouras, G., Karkaletsis, V., Androutsopoulos, I., and Spyropoulos, C. Learning rules for large-vocabulary word sense disambiguation: A comparison of various classifiers. In Proc. of the 2nd International Conference on Natural Language Processing, volume 1835 of Lecture Notes in Computer Science, pages 383–394, Patra, Greece, 2000. Springer-Verlag New York, Inc. [140] Pant, G. and Srinivasan, P. Learning to crawl: Comparing classification schemes. ACM Transactions on Information Systems (TOIS), 23(4):430–462, 2005. [141] Park, S.-B. and Zhang, B.-T. Co-trained support vector machines for large scale unstructured document classification using unlabeled data and syntactic information. Information Processing and Management, 40(3):421–439, Jan. 2004. [142] Peng, F., Schuurmans, D., and Wang, S. Augmenting Naive Bayes classifiers with statistical language models. Information Retrieval, 7(3-4):317–345, 2004. [143] Porter, M. An algorithm for suffix stripping. program, 14(3):130–137, July 1980. [144] Radev, D., Winkel, A., and Topper, M. Multi document centroid-based text summarization. In Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL’02), pages 112–113, Philadelphia, Pennsylvania, United States, July 7–12, 2002. Association for Computational Linguistics, Morristown, NJ, USA. [145] Rennie, J. Improving multi-class text classification with Naive Bayes. Master’s thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA, Sept. 2001. [146] Rifkin, R. and Klautau, A. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101–141, 2004. [147] Rodríguez, M. D. B., Gómez-Hidalgo, J. M., and Díaz-Agudo, B. Using WordNet to complement training information in text categorization. In Proc. of the 2nd International Conference on Recent Advances in Natural Language Processing (RANLP97), Tzigov Chark, Bulgaria, Sept. 11–13, 1997. [148] Rogati, M. and Yang, Y. High-performing feature selection for text classification. In Proc. of the 11th ACM International Conference on Information and Knowledge 115

Management (CIKM’02), pages 659 – 661, McLean, Virginia, United States, Nov. 04–09, 2002. ACM Press, New York, United States. [149] Ruiz, M. and Srinivasan, P. Hierarchical text classification using neural networks. Journal of Information Retrieval, 5(1):87–118, Jan. 2002. [150] Sakhr. Sakhr Arabic Categorization Engine, 2006. [151] Salton, G. and Buckley, C. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513–523, 1988. [152] Salton, G., Singhal, A., Mitra, M., and Buckley, C. Automatic text structuring and summarization. Information Processing and Management, 33(2):193–207, 1997. [153] Salton, G., Wong, A., and Yang, C. S. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, Nov. 1975. [154] Sawaf, H., Zaplo, J., and Ney, H. Statistical classification methods for Arabic news articles. In Proc. of the Arabic NLP Workshop at ACL/EACL, Toulouse, France, July 06, 2001. [155] Schapire, R. and Singer, Y. BoosTexter: a boosting-based system for text categorization. Machine Learning, 39(2/3):135–168, 2000. [156] Schneider, K.-M. Techniques for improving the performance of Naive Bayes for text classification. In Proc. of the 6th International Conference of Computational Linguistics and Intelligent Text Processing (CICLing 2005), volume 3406 of Lecture Notes in Computer Science, pages 682–693, Mexico City, Mexico, Feb. 13–19, 2005. Springer-Verlag New York, Inc. [157] Scott, S. and Matwin, S. Feature engineering for text classification. In Proc. of the 16th International Conference on Machine Learning (ICML’99), pages 379–388, Bled, Slovenia, June 27–30, 1999. Morgan Kaufmann Publishers, San Francisco, United States. [158] Sebastiani, F. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1–47, Mar. 2002. [159] Sevillano, X., Alías, F., and Socoró, J. C. Reliability in ICA-based text classification. In Proc. of the 5th Independent Component Analysis and Blind Signal Separation ICA 2004, volume 3195 of Lecture Notes in Computer Science, pages 1213–1220, Granada, Spain, Sept. 22–24, 2004. Springer-Verlag New York, Inc.

116

[160] Shen, D., Sun, J.-T., Yang, Q., and Chen, Z. A comparison of implicit and explicit links for web page classification. In Proc. of the 15th international conference on World Wide Web ( WWW ’06), pages 643–650, Edinburgh, Scotland, May 23–26, 2006. ACM Press, New York, United States. [161] Silvatt, C. and Ribeirot, B. The importance of stop word removal on recall values in text categorization. In Proc. of the IEEE International Joint Conference on Neural Networks (IJCNN 2003), volume 3, pages 1661–1666, Portland, Oregon, USA, July 20–24, 2003. [162] Song, F., Liu, S., and Yang, J. A comparative study on text representation schemes in text categorization. Pattern Analysis & Applications, 8(1-2):199–209, Sept. 2005. [163] Soucy, P. and Mineau, G. A simple KNN algorithm for text categorization. In Proc. of the IEEE International Conference on Data Mining (ICDM-01), pages 647–648, San Jose, California, United States, 2001. IEEE Computer Society, Washington, DC, USA. [164] Soucy, P. and Mineau, G. W. Feature selection strategies for text categorization. In Proc. of the 16th Conference of Canadian Conference on AI, the Canadian Society for Computational Studies of Intelligence, volume 2671 of Lecture Notes in Computer Science, pages 505–509, Halifax, Canada, June 11–13, 2003. Springer-Verlag New York, Inc. [165] Stein, G., Strzalkowski, T., and Wise, G. Interactive, text-based summarization of multiple documents. Computational Intelligence, 16(4):606–613, 2000. [166] Syiam, M., Fayed, Z., and Habib, M. Arabic text categorization using machine learning techniques. In Proc. of the 5th Conference on Language Engineering, Cairo, Egypt, Sept. 14–15, 2005. [167] Taira, H. Text Categorization using Machine Learning. PhD thesis, Department of Information Processing, Nara Institute of Science and Technology, Japan, Feb. 2002. [168] Tan, A.-H. Text mining: The state of the art and the challenges. In Proc. of the 3rd the Pacific Asia Conf on Knowledge Discovery and Data Mining PAKDD’99 workshop on Knowledge Discovery from Advanced Databases, pages 71–76, Beijing, China, Apr. 26–28, 1999. [169] Tang, B., Shepherd, M., Milios, E., and Heywood, M. Comparing and combining dimension reduction techniques for efficient text clustering. In SIAM International Workshop on Feature Selection for Data Mining - Interfacing Machine Learning and Statistics, Newport Beach, California, United States, April 23 2005. 117

[170] Tang, L. and Liu, H. Bias analysis in text classification for highly skewed data. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM-05), pages 781–784, Houston, Texas, United States, Nov. 27–30, 2005. [171] Teuteberg, F. In Abramowicz, W., editor, Knowledge-Based Information Retrieval and Filtering from the Web, chapter Intelligent Agents for Document Categorization and Adaptive Filtering Using a Neural Network Approach and Fuzzy Logic, pages 231–250. Kluwer Academic Publishers, Hingham, MA, USA, 2003. [172] Tiun, S., Abdullah, R., and Kong, T. E. Automatic topic identification using ontology hierarchy. In Proc. of the 2nd International Conference on Computational Linguistics and Intelligent Text Processing (CICLing ’01), volume 2004 of Lecture Notes in Computer Science, pages 444–453, Mexico-City, Mexico, Feb. 18–24, 2001. Springer-Verlag New York, Inc. [173] Tong, S. and Koller, D. Support vector machine active learning with applications to text classification. Journal of Machine Learning Research, 2:45–66, Mar. 2001. [174] Torkkola, K. Discriminative features for text document classification. Pattern Analysis and Applications, 6(4):301–308, 2004. [175] Tsay, J.-J., Shih, C.-Y., and Wu, B.-L. Autocrawler: An integrated system for automatic topical crawler. In Proc. of the 4th Annual ACIS International Conference on Computer and Information Science (ICIS’05), pages 462–467, Jeju Island, South Korea, July 14–16, 2005. IEEE Computer Society, Washington, DC, USA. [176] Väyrynen, J. Learning linguistic features from natural text data by independent component analysis. Master’s thesis, Department of Computer Science and Engineering, Helsinki University of Technology, Finland, 2005. [177] Wang, J. and Karypis, G. Harmony: Efficiently mining the best rules for classification. In Proc. of the SIAM International Conference on Data Mining (SDM-05), pages 205–216, Newport Beach, California, United States, Apr. 21–23, 2005. [178] Warner, B. and Misra, M. Understanding neural networks as statistical tools. The American Statistician, 50(4):284–293, 1996. [179] Wiener, E., Pedersen, J., and Weigend, A. A neural network approach to topic spotting. In Proc. of the 4th Symposium on Document Analysis and Information Retrieval (SDAIR’95), pages 317–332, Las Vegas, United States, Apr. 24–26, 1995. [180] Xia, Y., Dalli, A., Wilks, Y., and Guthrie, L. FASiL adaptive email categorization system. In Proc. of the 6th International Conference of Computational Linguistics and Intelligent Text Processing (CICLing 2005), volume 3406 of Lecture Notes 118

in Computer Science, pages 723–734, Mexico City, Mexico, Feb. 13–19, 2005. Springer-Verlag New York, Inc. [181] Xu, J., Fraser, A., and Weischedel, R. Empirical studies in strategies for Arabic retrieval. In Proc. of the 25th ACM International Conference on Research and Development in Information Retrieval (SIGIR’02), pages 269–274, Tampere, Finland, Aug. 11–15, 2002. ACM Press, New York, United States. [182] Xu, R. and Wunsch, D. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678, 2005. [183] Yan, J., Liu, N., Zhang, B., Yan, S., Chen, Z., Cheng, Q., Fan, W., and Ma, W.Y. OCFS: optimal orthogonal centroid feature selection for text categorization. In Proc. of the 28th ACM International Conference on Research and Development in Information Retrieval (SIGIR’05), pages 122–129, Salvador, Brazil, Aug. 15–19, 2005. ACM Press, New York, United States. [184] Yang, Y. Noise reduction in a statistical approach to text categorization. In Proc. of the18th ACM International Conference on Research and Development in Information Retrieval (SIGIR’95), pages 256–263, Seattle, Washington, United States, July 9–13, 1995. ACM Press, New York, United States. [185] Yang, Y. An evaluation of statistical approaches to text categorization. Journal of Information Retrieval, 1(1/2):69–90, 1999. [186] Yang, Y. A study of thresholding strategies for text categorization. In Proc. of the 24th ACM International Conference on Research and Development in Information Retrieval (SIGIR’01), pages 137–145, New Orleans, Louisiana, United States, Sept. 09–13, 2001. ACM Press, New York, United States. [187] Yang, Y. and Chute, C. An example-based mapping method for text categorization and retrieval. ACM Transactions on Information Systems, 12(3):252–277, 1994. [188] Yang, Y. and Liu, X. A re-examination of text categorization methods. In Proc. of the 22th ACM International Conference on Research and Development in Information Retrieval (SIGIR’99), pages 42–49, Berkeley, California, United States, Aug. 15–19, 1999. ACM Press, New York, United States. [189] Yang, Y. and Pedersen, J. A comparative study on feature selection in text categorization. In Proc. of the 14th International Conference on Machine Learning (ICML’97), pages 412–420, Nashville, Tennessee, United States, July 08–12, 1997. Morgan Kaufmann Publishers, San Francisco, United States. 119

[190] Yang, Y., Slattery, S., and Ghani, R. A study of approaches to hypertext categorization. Journal of Intelligent Information Systems, 18(2/3):219–241, 2002. [191] Yang, Y., Zhang, J., and Kisiel, B. A scalability analysis of classifiers in text categorization. In Proc. of the 26th ACM International Conference on Research and Development in Information Retrieval (SIGIR’03), pages 96–103, Toronto, Canada, July 28-Aug. 1, 2003. ACM Press, New York, United States. [192] Yin, X. and Han, J. CPAR: Classification based on predictive association rules. In Proc. of the SIAM International Conference on Data Mining (SDM-03), San Francisco, CA, United States, May 1–3, 2003. [193] Yu, H., Han, J., and Chang, K. C.-C. PEBL: Web page classification without negative examples. IEEE Transaction on Knowledge and Date Engineering, 16(1):70–82, Jan. 2004. [194] Zamir, O. and Etzioni, O. Grouper: A dynamic clustering interface to web search ˘ S–1374, results. Computer Networks, 31(11-16):1361âA ¸ 1999. [195] Zelikovitz, S. and Hirsh, H. Using LSI for text classification in the presence of background text. In Proc. of the 10th ACM International Conference on Information and Knowledge Management (CIKM’01), pages 113–118, Atlanta, Georgia, United States, Nov. 05–08, 2001. [196] Zhang, D., Chen, X., and Lee, W. Text classification with kernels on the multinomial manifold. In Proc. of the 28th ACM International Conference on Research and Development in Information Retrieval (SIGIR’05), pages 266–273, Salvador, Brazil, Aug. 15–19, 2005. ACM Press, New York, United States. [197] Zhang, L., Zhu, J., and Yao, T. An evaluation of statistical spam filtering techniques. ACM Transactions on Asian Language Information Processing (TALIP), 3(4):243– 269, 2004. [198] Zhang, T. and Oles, F. Text categorization based on regularized linear classification methods. Information Retrieval, 4(1):5–31, 2001. [199] Zheng, Z. and Srihari, R. Optimally combining positive and negative features for text categorization. In Proc. of the Workshop on Learning from Imbalanced Datasets (ICML’03), Washington DC, United States, Aug. 21–24, 2003. [200] Zhu, M., Zhu, J., and Chen, W. Effect analysis of dimension reduction on support vector machines. In Proc. of the IEEE International conference in Natural Language Processing and Knowledge Engineering (IEEE NLP-KE ’05), pages 592– 596, Wuhan, China, Oct. 30-Nov. 1, 2005. 120

[201] Zhu, M., Zhu, J., and Chen, W. Effect analysis of dimension reduction on support vector machines. In Proc. of the Natural Language Processing and Knowledge Engineering IEEE NLP-KE, pages 592–596, Wuhan, China, Oct30–Nov01 2005. [202] Zhu, S., Ji, X., Xu, W., and Gong, Y. Multi-labelled classification using maximum entropy method. In Proc. of the 28th ACM International Conference on Research and Development in Information Retrieval (SIGIR’05), pages 274–281, Salvador, Brazil, Aug. 15–19, 2005. ACM Press, New York, United States. [203] Zu, G., Ohyama, W., Wakabayashi, T., and Kimura, F. Accuracy improvement of automatic text classification based on feature transformation. In Proc. of the 2003 ACM symposium on Document Engineering (DocEng’03), pages 118–120, Grenoble, France, Nov. 20–22, 2003. ACM Press, New York, United States. [204] Zurada, J. Introduction to artificial neural systems. West publishing company, Saint Paul, Minnesota, 1992.

121

Appendix A

Thresholding Techniques Results

122

Table A.1: Thresholding techniques for the 20NG Dataset (MicroF1 ) Th

Avg

MD

Max

STD

FLocal

0.5

0.517

0.646

0.650

0.634

1

0.621

0.698

0.699

1.5

0.669

0.727

0.729

2.5

Metric

Metric

Avg

MD

Max

STD

FLocal

0.653

0.685

0.702

0.701

0.699

0.725

0.681

0.709

0.723

0.745

0.744

0.744

0.755

0.708

0.737

0.739

0.761

0.760

0.759

0.770

0.699

0.746

0.752

0.739

0.757

0.757

0.775

0.778

0.776

0.786

5

0.737

0.769

0.773

0.765

0.774

0.776

0.791

0.790

0.789

0.791

7.5

0.752

0.778

0.781

0.778

0.781

0.784

0.795

0.793

0.792

0.796

10

0.759

0.786

0.786

0.784

0.785

0.787

0.798

0.796

0.792

0.798

0.5

0.451

0.684

0.661

0.661

0.671

0.687

0.696

0.694

0.691

0.719

1

0.615

0.718

0.710

0.700

0.717

0.728

0.738

0.735

0.731

0.745

0.668

0.739

0.732

0.726

0.737

0.745

0.754

0.751

0.746

0.761

0.716

0.755

0.754

0.751

0.752

0.759

0.768

0.763

0.768

0.775

5

0.758

0.773

0.773

0.771

0.775

0.775

0.781

0.781

0.779

0.785

7.5

0.768

0.782

0.780

0.777

0.783

0.782

0.787

0.786

0.783

0.790

10

0.777

0.788

0.788

0.785

0.786

0.787

0.791

0.791

0.788

0.794

CC

1.5 2.5

DF

IG

MI

Table A.2: Thresholding techniques for Alj-News-W Dataset (MicroF1 ) Th

Avg

MD

Max

STD

FLocal

0.5

0.883

0.880

0.893

0.887

1

0.903

0.932

0.932

1.5

0.928

0.939

0.942

2.5

Metric

Metric

Avg

MD

Max

STD

FLocal

0.924

0.906

0.911

0.907

0.914

0.934

0.924

0.938

0.938

0.930

0.929

0.927

0.949

0.940

0.942

0.940

0.945

0.946

0.948

0.954

0.937

0.956

0.949

0.959

0.959

0.959

0.959

0.958

0.957

0.964

5

0.953

0.965

0.961

0.960

0.969

0.963

0.968

0.962

0.964

0.971

7.5

0.962

0.964

0.965

0.967

0.968

0.967

0.962

0.966

0.964

0.970

10

0.961

0.965

0.966

0.965

0.969

0.966

0.967

0.965

0.965

0.971

0.5

0.735

0.905

0.870

0.896

0.872

0.925

0.914

0.921

0.920

0.925

1

0.860

0.935

0.913

0.925

0.922

0.936

0.930

0.930

0.935

0.952

0.900

0.943

0.937

0.939

0.932

0.953

0.944

0.943

0.949

0.960

0.937

0.956

0.948

0.954

0.954

0.962

0.958

0.960

0.957

0.960

5

0.953

0.962

0.961

0.961

0.960

0.962

0.962

0.962

0.964

0.965

7.5

0.959

0.962

0.960

0.960

0.959

0.964

0.963

0.963

0.965

0.965

10

0.956

0.963

0.962

0.959

0.959

0.962

0.964

0.963

0.964

0.966

CC

1.5 2.5

DF

123

IG

MI

Table A.3: Thresholding techniques for Alj-News-AS Dataset (MicroF1 ) Th

Avg

MD

Max

STD

FLocal

0.5

0.841

0.890

0.913

0.899

1

0.889

0.941

0.936

1.5

0.917

0.952

0.950

2.5

Metric

Metric

Avg

MD

Max

STD

FLocal

0.931

0.930

0.926

0.926

0.926

0.943

0.935

0.948

0.946

0.940

0.940

0.943

0.951

0.948

0.952

0.955

0.953

0.953

0.952

0.961

0.944

0.956

0.959

0.954

0.963

0.961

0.958

0.962

0.961

0.966

5

0.951

0.965

0.968

0.966

0.967

0.967

0.969

0.969

0.972

0.965

7.5

0.953

0.966

0.965

0.966

0.967

0.966

0.966

0.967

0.967

0.967

10

0.958

0.967

0.966

0.967

0.964

0.967

0.965

0.968

0.967

0.967

0.5

0.721

0.920

0.892

0.921

0.905

0.932

0.930

0.926

0.928

0.945

1

0.866

0.941

0.935

0.947

0.937

0.943

0.946

0.943

0.946

0.945

0.912

0.947

0.947

0.949

0.947

0.946

0.950

0.949

0.955

0.956

0.936

0.959

0.954

0.956

0.956

0.964

0.955

0.952

0.955

0.966

5

0.957

0.963

0.962

0.967

0.968

0.970

0.963

0.964

0.968

0.967

7.5

0.962

0.966

0.961

0.967

0.959

0.975

0.968

0.969

0.971

0.967

10

0.960

0.969

0.964

0.966

0.965

0.970

0.965

0.967

0.967

0.967

CC

1.5 2.5

DF

IG

MI

Table A.4: Thresholding techniques for Alj-News-SR Dataset (MicroF1 ) Th

Avg

MD

Max

STD

FLocal

0.5

0.862

0.873

0.894

0.818

1

0.842

0.881

0.884

1.5

0.868

0.899

0.909

2.5

Metric

Metric

Avg

MD

Max

STD

FLocal

0.912

0.890

0.885

0.888

0.886

0.912

0.883

0.903

0.921

0.917

0.920

0.920

0.924

0.908

0.929

0.932

0.929

0.936

0.929

0.946

0.887

0.933

0.943

0.921

0.947

0.949

0.946

0.946

0.944

0.943

5

0.905

0.953

0.955

0.957

0.956

0.956

0.957

0.960

0.960

0.956

7.5

0.913

0.960

0.961

0.958

0.961

0.959

0.959

0.959

0.956

0.965

10

0.916

0.958

0.960

0.961

0.964

0.962

0.964

0.964

0.964

0.960

0.5

0.451

0.874

0.625

0.864

0.664

0.875

0.898

0.898

0.885

0.905

1

0.682

0.918

0.833

0.913

0.822

0.927

0.919

0.922

0.920

0.923

0.759

0.933

0.856

0.929

0.878

0.939

0.934

0.933

0.932

0.943

0.863

0.949

0.913

0.948

0.920

0.950

0.949

0.950

0.950

0.949

5

0.922

0.960

0.953

0.955

0.948

0.956

0.958

0.958

0.958

0.957

7.5

0.954

0.958

0.955

0.960

0.959

0.963

0.957

0.957

0.958

0.957

10

0.956

0.958

0.960

0.960

0.959

0.963

0.958

0.961

0.958

0.959

CC

1.5 2.5

DF

124

IG

MI

Table A.5: Thresholding techniques for Alj-News-MS Dataset (MicroF1 ) Th

Avg

MD

Max

STD

FLocal

0.5

0.862

0.873

0.894

0.818

1

0.893

0.910

0.929

1.5

0.912

0.934

0.931

2.5

Metric

Metric

Avg

MD

Max

STD

FLocal

0.912

0.889

0.883

0.885

0.883

0.933

0.912

0.944

0.936

0.932

0.934

0.932

0.949

0.933

0.946

0.946

0.941

0.942

0.943

0.953

0.938

0.951

0.949

0.949

0.954

0.955

0.952

0.954

0.954

0.959

5

0.951

0.961

0.959

0.954

0.964

0.958

0.965

0.964

0.966

0.958

7.5

0.955

0.961

0.960

0.964

0.964

0.962

0.965

0.963

0.962

0.960

10

0.960

0.960

0.960

0.964

0.961

0.966

0.964

0.962

0.964

0.968

0.5

0.602

0.888

0.823

0.885

0.869

0.881

0.900

0.892

0.892

0.926

1

0.801

0.928

0.909

0.923

0.911

0.930

0.926

0.929

0.931

0.949

0.861

0.938

0.922

0.932

0.925

0.946

0.941

0.939

0.942

0.957

0.920

0.950

0.945

0.950

0.945

0.955

0.952

0.954

0.954

0.959

5

0.942

0.955

0.954

0.952

0.957

0.965

0.957

0.962

0.956

0.962

7.5

0.953

0.962

0.956

0.955

0.954

0.962

0.965

0.963

0.962

0.960

10

0.959

0.961

0.959

0.960

0.959

0.960

0.962

0.964

0.962

0.960

CC

1.5 2.5

DF

IG

MI

Table A.6: Thresholding techniques for Alj-News-MR Dataset (MicroF1 ) Th

Avg

MD

Max

STD

FLocal

0.5

0.680

0.530

0.563

0.515

1

0.775

0.751

0.751

1.5

0.821

0.809

0.805

2.5

Metric

Metric

Avg

MD

Max

STD

FLocal

0.596

0.617

0.779

0.654

0.721

0.852

0.760

0.802

0.834

0.850

0.851

0.853

0.891

0.788

0.855

0.888

0.886

0.886

0.886

0.918

0.676

0.906

0.787

0.902

0.813

0.916

0.913

0.910

0.911

0.930

5

0.888

0.904

0.933

0.913

0.940

0.947

0.935

0.938

0.934

0.949

7.5

0.913

0.935

0.944

0.940

0.954

0.944

0.948

0.950

0.949

0.955

10

0.911

0.951

0.957

0.951

0.955

0.959

0.952

0.954

0.953

0.959

0.5

0.000

0.523

0.196

0.538

0.191

0.779

0.523

0.617

0.523

0.859

1

0.259

0.758

0.513

0.730

0.531

0.859

0.827

0.849

0.827

0.902

0.498

0.883

0.665

0.835

0.703

0.891

0.905

0.880

0.905

0.916

0.676

0.906

0.787

0.902

0.813

0.917

0.908

0.907

0.919

0.930

5

0.808

0.925

0.897

0.937

0.917

0.947

0.935

0.938

0.934

0.949

7.5

0.879

0.953

0.923

0.954

0.932

0.948

0.953

0.950

0.952

0.948

10

0.900

0.953

0.948

0.954

0.941

0.957

0.952

0.953

0.955

0.959

CC

1.5 2.5

DF

125

IG

MI

126

MI

IG

DF

CC

Th 0.297 0.397 0.451

0.169 0.283

0.557 0.594

0.408 0.467

0.284 0.351 0.443 0.517 0.557 0.568 0.143 0.290 0.348 0.426 0.511 0.546 0.570 0.485 0.522 0.545 0.581 0.609 0.610 0.617 0.474 0.522 0.551 0.591 0.617 0.625 0.627

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.612

0.608

0.588

0.547

0.499

0.610

0.609

0.532

0.499

0.446

0.561

0.543

0.501

0.409

0.326

0.520

0.490

0.337

0.252

0.5

WAvg

Avg 0.207

Metric

Avg

0.613

0.612

0.599

0.526

0.472

0.421

0.272

0.610

0.608

0.600

0.568

0.535

0.498

0.433

0.575

0.548

0.512

0.417

0.357

0.258

0.000

0.529

0.506

0.462

0.367

0.304

0.258

0.171

NAvg

MD

0.620

0.617

0.610

0.582

0.543

0.525

0.483

0.633

0.633

0.628

0.598

0.562

0.537

0.493

0.592

0.575

0.544

0.494

0.431

0.389

0.349

0.596

0.587

0.571

0.530

0.488

0.444

0.375

0.604

0.592

0.573

0.537

0.504

0.476

0.400

0.620

0.612

0.593

0.565

0.531

0.504

0.445

0.561

0.541

0.513

0.401

0.324

0.293

0.251

0.580

0.568

0.546

0.472

0.420

0.356

0.299

WMD

MD

0.614

0.611

0.604

0.551

0.518

0.490

0.435

0.622

0.615

0.592

0.560

0.524

0.501

0.441

0.605

0.604

0.591

0.548

0.507

0.486

0.423

0.600

0.589

0.574

0.528

0.460

0.416

0.366

NMD

Max

0.622

0.616

0.610

0.585

0.543

0.523

0.484

0.632

0.633

0.627

0.596

0.562

0.535

0.493

0.589

0.576

0.544

0.486

0.402

0.378

0.277

0.599

0.589

0.573

0.537

0.487

0.455

0.377

0.607

0.595

0.577

0.539

0.503

0.478

0.399

0.620

0.612

0.592

0.566

0.531

0.504

0.445

0.559

0.541

0.515

0.401

0.334

0.293

0.243

0.589

0.569

0.539

0.472

0.406

0.354

0.299

WMax

Max

0.616

0.614

0.608

0.555

0.518

0.491

0.436

0.624

0.617

0.608

0.561

0.525

0.501

0.441

0.600

0.596

0.574

0.526

0.491

0.444

0.381

0.603

0.596

0.579

0.534

0.453

0.424

0.365

NMax

STD

0.612

0.611

0.602

0.568

0.536

0.514

0.470

0.633

0.629

0.625

0.603

0.558

0.530

0.491

0.585

0.574

0.541

0.481

0.406

0.379

0.298

0.592

0.581

0.559

0.509

0.456

0.420

0.356

0.598

0.592

0.571

0.531

0.498

0.474

0.404

0.622

0.613

0.598

0.566

0.533

0.504

0.444

0.565

0.538

0.510

0.418

0.329

0.315

0.255

0.583

0.569

0.543

0.488

0.445

0.407

0.311

WSTD

STD

Table A.7: Thresholding techniques for the Ohsumed Dataset (MicroF1 )

0.610

0.609

0.600

0.565

0.511

0.488

0.431

0.625

0.619

0.601

0.566

0.529

0.500

0.438

0.601

0.591

0.576

0.529

0.498

0.458

0.391

0.595

0.581

0.558

0.504

0.438

0.411

0.347

NSTD

0.623

0.621

0.620

0.595

0.562

0.518

0.475

0.636

0.635

0.629

0.605

0.567

0.532

0.486

0.601

0.597

0.573

0.527

0.485

0.455

0.385

0.604

0.601

0.584

0.542

0.490

0.447

0.380

Flocal

0.618

0.621

0.617

0.592

0.563

0.535

0.495

0.632

0.630

0.628

0.603

0.578

0.553

0.505

0.586

0.576

0.557

0.516

0.477

0.430

0.395

0.602

0.596

0.585

0.546

0.505

0.457

0.413

WLocal

Local

127

MI

IG

DF

CC

Th 0.107 0.190 0.252

0.047 0.095

0.386 0.445

0.199 0.247

0.165 0.228 0.308 0.397 0.445 0.461 0.041 0.112 0.168 0.238 0.339 0.389 0.435 0.311 0.393 0.413 0.474 0.516 0.521 0.527 0.299 0.389 0.423 0.479 0.522 0.535 0.539

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.483

0.471

0.418

0.353

0.269

0.492

0.492

0.355

0.292

0.244

0.411

0.384

0.316

0.214

0.132

0.351

0.310

0.134

0.086

0.5

WAvg

Avg 0.115

Metric

Avg

0.530

0.533

0.527

0.475

0.415

0.381

0.282

0.527

0.528

0.525

0.499

0.455

0.413

0.343

0.467

0.429

0.378

0.252

0.191

0.124

0.000

0.484

0.465

0.429

0.338

0.292

0.246

0.176

NAvg

MD

0.534

0.530

0.520

0.468

0.418

0.396

0.317

0.558

0.555

0.554

0.500

0.456

0.421

0.360

0.470

0.440

0.379

0.304

0.238

0.205

0.160

0.473

0.464

0.429

0.381

0.329

0.289

0.195

0.466

0.436

0.406

0.349

0.287

0.266

0.192

0.486

0.469

0.429

0.388

0.336

0.286

0.239

0.416

0.386

0.336

0.196

0.127

0.098

0.067

0.440

0.409

0.375

0.291

0.226

0.162

0.093

WMD

MD

0.536

0.538

0.534

0.495

0.462

0.422

0.359

0.549

0.548

0.537

0.513

0.478

0.448

0.371

0.521

0.524

0.511

0.462

0.422

0.393

0.332

0.517

0.498

0.482

0.427

0.372

0.326

0.269

NMD

Max

0.534

0.530

0.520

0.473

0.419

0.395

0.317

0.555

0.555

0.548

0.497

0.454

0.418

0.360

0.466

0.441

0.379

0.300

0.214

0.197

0.084

0.481

0.467

0.429

0.391

0.333

0.295

0.198

0.468

0.440

0.410

0.350

0.286

0.266

0.192

0.488

0.469

0.429

0.389

0.336

0.287

0.239

0.412

0.386

0.336

0.196

0.135

0.097

0.065

0.449

0.415

0.353

0.272

0.202

0.154

0.093

WMax

Max

0.536

0.538

0.538

0.499

0.462

0.424

0.364

0.552

0.549

0.546

0.515

0.478

0.448

0.371

0.514

0.510

0.488

0.430

0.387

0.344

0.242

0.519

0.506

0.488

0.434

0.369

0.339

0.274

NMax

STD

0.521

0.519

0.505

0.445

0.407

0.359

0.303

0.553

0.553

0.546

0.504

0.453

0.414

0.340

0.462

0.436

0.375

0.295

0.220

0.199

0.112

0.479

0.451

0.417

0.333

0.275

0.239

0.176

0.458

0.440

0.404

0.340

0.285

0.265

0.196

0.503

0.478

0.432

0.391

0.352

0.289

0.244

0.412

0.368

0.331

0.218

0.130

0.118

0.067

0.438

0.414

0.366

0.289

0.246

0.216

0.119

WSTD

STD

Table A.8: Thresholding techniques for the Ohsumed Dataset (MacroF1 )

0.527

0.531

0.528

0.501

0.445

0.412

0.356

0.549

0.550

0.540

0.518

0.483

0.446

0.366

0.514

0.503

0.485

0.427

0.396

0.353

0.254

0.503

0.491

0.470

0.396

0.318

0.265

0.203

NSTD

0.545

0.548

0.548

0.528

0.498

0.446

0.397

0.562

0.566

0.561

0.546

0.508

0.457

0.405

0.518

0.514

0.490

0.440

0.392

0.359

0.272

0.521

0.517

0.499

0.444

0.395

0.362

0.273

Flocal

0.535

0.540

0.532

0.488

0.448

0.427

0.386

0.553

0.552

0.549

0.500

0.476

0.445

0.390

0.477

0.460

0.435

0.389

0.332

0.274

0.230

0.497

0.482

0.465

0.407

0.365

0.316

0.273

WLocal

Local

128

MI

IG

DF

CC

Th 0.815 0.847 0.858

0.816 0.870

0.944 0.946

0.834 0.897

0.888 0.896 0.910 0.916 0.914 0.918 0.853 0.906 0.925 0.934 0.944 0.946 0.945 0.929 0.942 0.946 0.947 0.944 0.945 0.942 0.922 0.939 0.941 0.947 0.944 0.944 0.943

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.940

0.937

0.936

0.929

0.913

0.944

0.947

0.936

0.935

0.914

0.944

0.942

0.938

0.903

0.891

0.869

0.870

0.842

0.756

0.5

WAvg

Avg 0.824

Metric

Avg

0.933

0.931

0.929

0.886

0.867

0.812

0.751

0.942

0.942

0.944

0.943

0.938

0.931

0.912

0.943

0.945

0.943

0.937

0.920

0.907

0.852

0.897

0.893

0.886

0.867

0.830

0.806

0.745

NAvg

MD

0.943

0.944

0.945

0.950

0.947

0.941

0.923

0.944

0.944

0.947

0.949

0.945

0.947

0.929

0.947

0.945

0.947

0.942

0.936

0.930

0.862

0.945

0.944

0.945

0.943

0.938

0.925

0.902

0.948

0.947

0.947

0.938

0.938

0.924

0.854

0.945

0.948

0.946

0.939

0.935

0.924

0.857

0.946

0.944

0.938

0.914

0.884

0.857

0.774

0.947

0.943

0.942

0.938

0.912

0.897

0.830

WMD

MD Max

0.942

0.944

0.941

0.942

0.926

0.923

0.897

0.943

0.944

0.945

0.945

0.948

0.948

0.940

0.925

0.945

0.943

0.945

0.941 0.939

0.948

0.946

0.942

0.925

0.946

0.946

0.948

0.941

0.933

0.922

0.862

0.943

0.945

0.945

0.942

0.939

0.923

0.905

0.936

0.915

0.910

0.885

0.943

0.946

0.944

0.944

0.940

0.934

0.916

0.942

0.942

0.944

0.939

0.933

0.922

0.901

NMD

0.946

0.945

0.949

0.944

0.940

0.924

0.827

0.945

0.948

0.944

0.939

0.935

0.932

0.877

0.945

0.943

0.937

0.912

0.885

0.858

0.774

0.947

0.944

0.944

0.934

0.916

0.895

0.791

WMax

Max

0.944

0.944

0.939

0.939

0.923

0.915

0.884

0.942

0.942

0.941

0.938

0.926

0.923

0.892

0.944

0.946

0.945

0.942

0.938

0.931

0.915

0.942

0.942

0.941

0.938

0.926

0.910

0.866

NMax

STD

0.942

0.946

0.946

0.948

0.946

0.941

0.922

0.945

0.943

0.946

0.949

0.948

0.942

0.929

0.946

0.946

0.948

0.943

0.935

0.922

0.887

0.946

0.945

0.942

0.941

0.935

0.925

0.897

0.946

0.948

0.945

0.937

0.927

0.916

0.859

0.944

0.948

0.946

0.940

0.939

0.935

0.879

0.946

0.941

0.937

0.906

0.882

0.858

0.805

0.945

0.943

0.942

0.935

0.919

0.893

0.823

WSTD

STD

Table A.9: Thresholding techniques for the Reuters(10) Dataset (MicroF1 )

0.942

0.943

0.940

0.939

0.930

0.928

0.897

0.941

0.941

0.939

0.937

0.925

0.920

0.889

0.942

0.946

0.946

0.942

0.942

0.934

0.914

0.944

0.943

0.944

0.939

0.933

0.923

0.902

NSTD

0.946

0.947

0.946

0.946

0.945

0.940

0.926

0.943

0.942

0.944

0.949

0.943

0.937

0.917

0.944

0.947

0.945

0.945

0.942

0.932

0.918

0.945

0.946

0.945

0.941

0.941

0.927

0.898

Flocal

0.949

0.949

0.951

0.949

0.945

0.940

0.925

0.945

0.944

0.944

0.948

0.948

0.942

0.929

0.945

0.948

0.947

0.939

0.932

0.930

0.922

0.946

0.948

0.946

0.942

0.937

0.932

0.895

WLocal

Local

129

MI

IG

DF

CC

Th 0.543 0.645 0.699

0.491 0.607

0.894 0.896

0.586 0.754

0.817 0.816 0.848 0.853 0.852 0.856 0.550 0.776 0.856 0.863 0.894 0.893 0.893 0.877 0.898 0.903 0.904 0.895 0.891 0.886 0.873 0.899 0.901 0.903 0.892 0.891 0.887

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.878

0.875

0.874

0.859

0.785

0.893

0.898

0.872

0.866

0.784

0.890

0.881

0.871

0.735

0.697

0.723

0.719

0.632

0.339

0.5

WAvg

Avg 0.679

Metric

Avg

0.884

0.883

0.887

0.866

0.864

0.831

0.799

0.889

0.887

0.895

0.897

0.900

0.898

0.877

0.892

0.897

0.892

0.893

0.870

0.847

0.807

0.863

0.861

0.857

0.854

0.814

0.813

0.764

NAvg

MD

0.888

0.891

0.895

0.912

0.904

0.895

0.856

0.891

0.889

0.898

0.908

0.900

0.905

0.882

0.895

0.893

0.900

0.889

0.870

0.862

0.555

0.893

0.892

0.893

0.894

0.884

0.845

0.797

0.898

0.896

0.901

0.882

0.876

0.814

0.534

0.896

0.890

0.893

0.894

0.903

0.891

0.893

0.870

0.893

0.892

0.897

0.894 0.898

0.897

0.887

0.881

0.854

0.887

0.896

0.892

0.897

0.901

0.895

0.877

0.892

0.892

0.896

0.892

0.891

0.884

0.854

NMD

0.871

0.870

0.820

0.591

0.897

0.891

0.872

0.774

0.657

0.573

0.323

0.895

0.890

0.889

0.881

0.749

0.674

0.493

WMD

MD Max

0.889

0.891

0.896

0.907

0.906

0.893

0.856

0.893

0.887

0.894

0.908

0.901

0.899

0.866

0.897

0.895

0.904

0.887

0.864

0.819

0.562

0.889

0.893

0.894

0.893

0.888

0.832

0.809

0.894

0.892

0.908

0.892

0.880

0.801

0.465

0.895

0.898

0.892

0.874

0.871

0.864

0.623

0.892

0.888

0.870

0.771

0.655

0.571

0.323

0.896

0.889

0.892

0.855

0.781

0.722

0.344

WMax

Max

0.889

0.891

0.890

0.902

0.888

0.886

0.859

0.889

0.892

0.898

0.898

0.894

0.894

0.860

0.890

0.895

0.894

0.899

0.895

0.888

0.877

0.890

0.892

0.892

0.895

0.888

0.866

0.827

NMax

STD

0.887

0.895

0.897

0.905

0.903

0.896

0.856

0.892

0.888

0.896

0.909

0.909

0.894

0.881

0.895

0.893

0.901

0.891

0.867

0.825

0.679

0.895

0.892

0.890

0.888

0.883

0.850

0.750

0.895

0.900

0.895

0.870

0.828

0.773

0.546

0.892

0.900

0.895

0.875

0.878

0.870

0.626

0.893

0.875

0.870

0.744

0.656

0.568

0.438

0.894

0.891

0.892

0.867

0.789

0.680

0.511

WSTD

STD

Table A.10: Thresholding techniques for the Reuters(10) Dataset (MacroF1 )

0.889

0.891

0.892

0.898

0.896

0.902

0.880

0.891

0.890

0.893

0.899

0.894

0.895

0.863

0.887

0.895

0.896

0.898

0.900

0.893

0.880

0.894

0.891

0.896

0.891

0.891

0.878

0.858

NSTD

0.891

0.893

0.896

0.904

0.903

0.904

0.886

0.886

0.891

0.894

0.908

0.901

0.903

0.881

0.888

0.895

0.895

0.898

0.902

0.888

0.863

0.892

0.888

0.892

0.894

0.901

0.880

0.853

Flocal

0.901

0.906

0.910

0.907

0.902

0.884

0.863

0.893

0.891

0.892

0.905

0.909

0.895

0.871

0.894

0.901

0.898

0.884

0.866

0.864

0.859

0.894

0.899

0.893

0.891

0.884

0.873

0.794

WLocal

Local

130

MI

IG

DF

CC

Th

0.825 0.825

0.700 0.727

0.655 0.729

0.838 0.859

0.673 0.761

0.802 0.823 0.843 0.850 0.854 0.5 0.778 0.802 0.824 0.861 0.870 0.871 0.812 0.848 0.866 0.870 0.874 0.873 0.872 0.804 0.836 0.856 0.863 0.873 0.872 0.872

2.5

5

7.5

10

0.713

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.841

0.841

0.837

0.806

0.785

0.868

0.863

0.814

0.796

0.770

0.862

0.848

0.826

0.792

0.745

0.750

0.737

0.668

0.651

0.778

1

0.837

0.837

0.822

0.771

0.717

0.698

0.624

0.861

0.856

0.731

0.688

0.616

0.868

0.865

0.860

0.843

0.801

0.750

0.673

0.841

0.819

0.808

0.790

0.727

0.706

0.641

1.5

NAvg

0.581

0.5

WAvg

Avg 0.725

Metric

Avg MD

0.874

0.876

0.876

0.876

0.869

0.851

0.806

0.874

0.875

0.878

0.876

0.877

0.857

0.821

0.874

0.872

0.868

0.841

0.820

0.794

0.738

0.872

0.871

0.869

0.863

0.825

0.810

0.762

0.860

0.854

0.854

0.834

0.809

0.776

0.718

0.869

0.860

0.855

0.821

0.811

0.784

0.720

0.853

0.848

0.825

0.786

0.742

0.709

0.636

0.860

0.858

0.850

0.810

0.783

0.706

0.649

WMD

MD

0.841

0.817

0.798

0.662

0.485

0.211

0.039

0.849

0.803

0.801

0.697

0.523

0.223

0.045

0.870

0.862

0.852

0.842

0.659

0.331

0.058

0.865

0.858

0.849

0.768

0.653

0.550

0.447

NMD

Max

0.873

0.877

0.876

0.875

0.869

0.852

0.804

0.874

0.875

0.877

0.875

0.877

0.858

0.822

0.874

0.873

0.870

0.838

0.818

0.794

0.741

0.874

0.871

0.868

0.862

0.825

0.810

0.764

0.860

0.855

0.855

0.833

0.809

0.776

0.717

0.869

0.860

0.854

0.823

0.811

0.785

0.721

0.853

0.848

0.824

0.785

0.741

0.708

0.634

0.855

0.850

0.839

0.815

0.776

0.696

0.634

WMax

Max

0.842

0.824

0.800

0.675

0.494

0.210

0.040

0.849

0.803

0.800

0.703

0.532

0.222

0.053

0.870

0.863

0.855

0.844

0.140

0.064

0.083

0.865

0.855

0.850

0.764

0.674

0.550

0.448

NMax

STD

0.875

0.874

0.873

0.875

0.864

0.855

0.803

0.875

0.875

0.874

0.876

0.873

0.859

0.820

0.872

0.870

0.867

0.841

0.815

0.788

0.744

0.870

0.868

0.866

0.853

0.813

0.800

0.733

0.864

0.861

0.855

0.828

0.789

0.772

0.710

0.870

0.862

0.856

0.825

0.814

0.788

0.747

0.855

0.849

0.825

0.788

0.742

0.719

0.648

0.866

0.851

0.848

0.810

0.775

0.740

0.675

WSTD

STD

Table A.11: Thresholding techniques for the Reuters(90) Dataset (MicroF1 )

0.858

0.838

0.806

0.746

0.650

0.564

0.234

0.854

0.848

0.806

0.761

0.642

0.572

0.227

0.869

0.867

0.856

0.843

0.817

0.785

0.706

0.868

0.865

0.856

0.821

0.776

0.725

0.494

NSTD

0.871

0.873

0.866

0.855

0.835

0.827

0.798

0.868

0.867

0.866

0.851

0.835

0.822

0.789

0.871

0.870

0.868

0.854

0.843

0.824

0.790

0.868

0.870

0.867

0.852

0.833

0.779

0.710

Flocal

0.876

0.874

0.880

0.879

0.875

0.861

0.842

0.874

0.875

0.873

0.878

0.875

0.867

0.803

0.872

0.875

0.875

0.870

0.865

0.855

0.810

0.873

0.871

0.871

0.857

0.837

0.820

0.701

WLocal

Local

131

MI

IG

DF

CC

Th 0.065 0.120 0.154

0.055 0.113

0.256 0.339

0.052 0.124

0.217 0.255 0.326 0.368 0.400 0.419 0.5 0.159 0.204 0.238 0.357 0.402 0.420 0.225 0.321 0.374 0.401 0.444 0.428 0.434 0.225 0.298 0.363 0.379 0.448 0.437 0.429

1

1.5

2.5

5

7.5

10

0.093

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.5

1

1.5

2.5

5

7.5

10

0.300

0.290

0.263

0.203

0.166

0.406

0.367

0.196

v 0.162

0.118

0.367

0.313

0.256

0.184

0.141

0.204

0.173

0.083

0.019

0.5

WAvg

Avg 0.150

Metric

Avg

0.423

0.422

0.416

0.374

0.329

0.294

0.222

0.429

0.433

0.432

0.410

0.322

0.287

0.216

0.430

0.434

0.432

0.368

0.255

0.203

0.126

0.418

0.404

0.412

0.348

0.272

0.230

0.156

NAvg

MD

0.448

0.452

0.451

0.407

0.366

0.295

0.181

0.443

0.455

0.459

0.432

0.407

0.341

0.240

0.425

0.420

0.389

0.255

0.210

0.162

0.092

0.421

0.417

0.384

0.325

0.229

0.197

0.117

0.329

0.305

0.301

0.240

0.183

0.116

0.069

0.396

0.354

0.323

0.226

0.199

0.156

0.084

0.340

0.318

0.254

0.180

0.136

0.101

0.042

0.356

0.332

0.296

0.197

0.157

0.068

0.039

WMD

MD

0.416

0.432

0.422

0.356

0.303

0.243

0.070

0.431

0.431

0.428

0.377

0.321

0.249

0.091

0.431

0.439

0.441

0.429

0.268

0.136

0.055

0.430

0.436

0.427

0.314

0.179

0.138

0.075

NMD

Max

0.447

0.456

0.451

0.407

0.366

0.290

0.181

0.443

0.455

0.456

0.431

0.408

0.343

0.240

0.423

0.423

0.392

0.249

0.209

0.161

0.097

0.425

0.419

0.387

0.326

0.229

0.200

0.119

0.328

0.322

0.304

0.234

0.182

0.116

0.069

0.397

0.354

0.323

0.226

0.199

0.156

0.085

0.342

0.316

0.256

0.178

0.136

0.100

0.041

0.321

0.306

0.272

0.196

0.139

0.065

0.031

WMax

Max

0.412

0.430

0.423

0.348

0.305

0.239

0.081

0.433

0.429

0.427

0.364

0.323

0.248

0.100

0.435

0.431

0.443

0.429

0.010

0.005

0.009

0.431

0.437

0.432

0.316

0.188

0.139

0.071

NMax

STD

0.442

0.442

0.444

0.412

0.352

0.297

0.189

0.441

0.445

0.449

0.422

0.393

0.345

0.234

0.420

0.407

0.386

0.249

0.210

0.168

0.106

0.412

0.412

0.380

0.288

0.205

0.185

0.094

0.376

0.355

0.323

0.229

0.166

0.135

0.066

0.402

0.362

0.323

0.228

0.205

0.156

0.093

0.346

0.314

0.252

0.189

0.135

0.110

0.048

0.391

0.348

0.309

0.211

0.174

0.117

0.057

WSTD

STD

Table A.12: Thresholding techniques for the Reuters(90) Dataset (MacroF1 )

0.429

0.435

0.429

0.396

0.342

0.293

0.187

0.419

0.433

0.427

0.394

0.339

0.307

0.186

0.432

0.435

0.435

0.399

0.351

0.289

0.202

0.424

0.437

0.434

0.347

0.248

0.191

0.081

NSTD

0.433

0.439

0.449

0.451

0.443

0.441

0.393

0.435

0.438

0.446

0.450

0.442

0.442

0.394

0.428

0.434

0.443

0.447

0.435

0.407

0.369

0.428

0.434

0.439

0.418

0.385

0.295

0.227

Flocal

0.460

0.459

0.469

0.469

0.470

0.437

0.380

0.440

0.459

0.463

0.468

0.462

0.449

0.380

0.433

0.447

0.440

0.433

0.428

0.406

0.340

0.431

0.400

0.392

0.314

0.271

0.241

0.153

WLocal

Local

132

DF

CC

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.856 (±0.007)

(±0.006)

(±0.006)

(±0.007)

0.857

0.855

(±0.006)

(±0.006)

0.856

0.857

(±0.011)

(±0.006)

0.857

0.841

(±0.007)

(±0.005)

0.846

0.822

(±0.007)

(±0.010)

0.828

0.816

(±0.006)

(±0.005)

0.821

0.776

0.788

0.828 (±0.009)

0.846

(±0.012)

0.822 (±0.008)

0.845

(±0.011)

0.821 (±0.009)

0.848

(±0.008)

0.810 (±0.008)

0.836

(±0.013)

0.801 (±0.007)

0.818

(±0)

(±0.007)

0.781

0.798

(±0.007)

(±0.011)

(±0)

1

0.745

WAvg

Avg

0.756

Th

0.5

Avg

(±0.005)

0.855

(±0.008)

0.858

(±0.010)

0.854

(±0.007)

0.846

(±0.007)

0.832

(±0.006)

0.812

(±0.007)

0.770

(±0.007)

0.837

(±0.011)

0.832

(±0.009)

0.829

(±0.008)

0.820

(±0)

0.792

(±0.009)

0.758

(±0.016)

0.680

NAvg

MD

(±0.008)

0.861

(±0.010)

0.861

(±0.005)

0.859

(±0.007)

0.856

(±0.009)

0.843

(±0.012)

0.833

(±0.008)

0.810

(±0.010)

0.861

(±0.006)

0.860

(±0.007)

0.858

(±0)

0.849

(±0.009)

0.835

(±0.019)

0.823

(±0.024)

0.786

(±0.009)

0.860

(±0.007)

0.856

(±0.005)

0.855

(±0.009)

0.844

(±0.007)

0.826

(±0.009)

0.810

(±0.008)

0.790

(±0.006)

0.857

(±0.010)

0.857

(±0)

0.855

(±0.014)

0.848

(±0.016)

0.835

(±0.018)

0.819

(±0.011)

0.789

WMD

MD

(±0.005)

0.858

(±0.007)

0.858

(±0.010)

0.857

(±0.008)

0.850

(±0.006)

0.843

(±0.009)

0.834

(±0.008)

0.816

(±0.009)

0.857

(±0)

0.856

(±0.011)

0.852

(±0.010)

0.845

(±0.008)

0.828

(±0.012)

0.810

(±0.011)

0.769

NMD

Max

(±0.007)

0.861

(±0.010)

0.860

(±0.006)

0.859

(±0.006)

0.855

(±0.010)

0.842

(±0.012)

0.829

(±0.008)

0.809

(±0)

0.859

(±0.010)

0.861

(±0.012)

0.858

(±0.008)

0.847

(±0.008)

0.835

(±0.005)

0.821

(±0.016)

0.786

(±0.011)

0.857

(±0.006)

0.857

(±0.009)

0.855

(±0.011)

0.844

(±0.006)

0.828

(±0.012)

0.809

(±0)

0.791

(±0.010)

0.858

(±0.013)

0.857

(±0.007)

0.856

(±0.008)

0.845

(±0.005)

0.833

(±0.010)

0.825

(±0.008)

0.790

WMax

Max

(±0.006)

0.859

(±0.009)

0.859

(±0.009)

0.859

(±0.003)

0.847

(±0.008)

0.842

(±0)

0.831

(±0.011)

0.810

(±0.013)

0.857

(±0.006)

0.856

(±0.008)

0.850

(±0.011)

0.844

(±0.009)

0.823

(±0.010)

0.807

(±0.009)

0.760

NMax

Table A.13: Thresholding techniques for Alj-Mgz-W Dataset (MicroF1 )

STD

(±0.007)

0.860

(±0.009)

0.860

(±0.008)

0.859

(±0.006)

0.857

(±0)

0.842

(±0.010)

0.832

(±0.008)

0.812

(±0.007)

0.860

(±0.007)

0.859

(±0.009)

0.855

(±0.007)

0.848

(±0.007)

0.834

(±0.006)

0.822

(±0.017)

0.783

(±0.012)

0.858

(±0.009)

0.857

(±0.006)

0.857

(±0)

0.844

(±0.011)

0.826

(±0.009)

0.809

(±0.011)

0.790

(±0.006)

0.857

(±0.009)

0.857

(±0.009)

0.856

(±0.006)

0.843

(±0.005)

0.833

(±0.012)

0.822

(±0.005)

0.792

WSTD

STD

(±0.009)

0.858

(±0.007)

0.858

(±0)

0.857

(±0.005)

0.848

(±0.009)

0.843

(±0.010)

0.832

(±0.007)

0.815

(±0.006)

0.857

(±0.012)

0.857

(±0.006)

0.852

(±0.006)

0.844

(±0.008)

0.830

(±0.013)

0.810

(±0.014)

0.767

NSTD

(±0.006)

0.859

(±0)

0.859

(±0.007)

0.860

(±0.005)

0.856

(±0.013)

0.847

(±0.007)

0.838

(±0.005)

0.822

(±0.008)

0.857

(±0.007)

0.859

(±0.008)

0.860

(±0.009)

0.849

(±0.010)

0.841

(±0.013)

0.831

(±0.013)

0.811

Flocal

(±0)

0.861

(±0.010)

0.859

(±0.006)

0.860

(±0.008)

0.854

(±0.008)

0.846

(±0.008)

0.835

(±0.005)

0.816

(±0.008)

0.858

(±0.007)

0.858

(±0.010)

0.858

(±0.007)

0.853

(±0.009)

0.842

(±0.010)

0.832

(±0.009)

0.818

WLocal

Local

133

MI

IG

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.863 (±0.006)

0.860

(±0.010)

0.863 (±0.009)

0.859

(±0.006)

0.859 (±0.006)

0.860

(±0.007)

0.847 (±0.007)

0.852

(±0.006)

0.829 (±0.008)

0.851

(±0.013)

0.816 (±0.012)

0.841

(±0.010)

0.769 (±0.006)

0.819

(±0.009)

(±0.005)

(±0.007)

0.861

(±0.009)

(±0.008)

0.860

0.861

(±0.005)

(±0.007)

0.862

0.859

0.859

(±0.008)

(±0.006)

(±0) 0.849

(±0.009)

0.851

0.838

(±0.012)

(±0)

0.844

0.833

(±0.008)

0.841

(±0.008)

1

0.811

0.822

0.5

Avg WAvg

Avg

Th

(±0.007)

0.850

(±0.005)

0.849

(±0.005)

0.848

(±0.008)

0.839

(±0.008)

0.832

(±0.007)

0.821

(±0.009)

0.793

(±0.007)

0.855

(±0.005)

0.854

(±0.004)

0.851

(±0)

0.848

(±0.006)

0.843

(±0.012)

0.836

(±0.009)

0.816

NAvg

(±0.006)

0.861

(±0.007)

0.859

(±0.005)

0.862

(±0.006)

0.858

(±0.008)

0.852

(±0.009)

0.841

(±0.006)

0.823

(±0.007)

0.861

(±0.007)

0.862

(±0)

0.861

(±0.005)

0.855

(±0.007)

0.846

(±0.007)

0.834

(±0.008)

0.819

MD

MD

(±0.008)

0.861

(±0.006)

0.861

(±0.006)

0.860

(±0.010)

0.851

(±0.007)

0.845

(±0.009)

0.832

(±0.011)

0.801

(±0.006)

0.862

(±0)

0.859

(±0.009)

0.855

(±0.008)

0.850

(±0.010)

0.839

(±0.009)

0.822

(±0.005)

0.790

WMD

(±0.005)

0.854

(±0.006)

0.852

(±0.004)

0.854

(±0.010)

0.848

(±0.008)

0.838

(±0.011)

0.822

(±0.006)

0.810

(±0)

0.851

(±0.008)

0.849

(±0.008)

0.847

(±0.009)

0.841

(±0.007)

0.832

(±0.014)

0.827

(±0.010)

0.802

NMD

(±0.006)

0.861

(±0.006)

0.859

(±0.008)

0.861

(±0.007)

0.856

(±0.010)

0.848

(±0.008)

0.841

(±0)

0.822

(±0.006)

0.861

(±0.006)

0.861

(±0.009)

0.862

(±0.008)

0.854

(±0.008)

0.846

(±0.008)

0.835

(±0.006)

0.822

Max

Max

0.794

WMax

(±0.007)

0.862

(±0.005)

0.863

(±0.007)

0.860

(±0.003)

0.855

(±0.013)

0.847

(±0)

0.833

(±0.009)

0.800

(±0.008)

0.861

(±0.009)

0.860

(±0.007)

0.858

(±0.008)

0.850

(±0.007)

0.840

(±0.008)

0.824

(±0.008)

Table A.13 – continued

(±0.006)

0.854

(±0.008)

0.852

(±0.003)

0.851

(±0.004)

0.848

(±0)

0.838

(±0.007)

0.824

(±0.008)

0.807

(±0.011)

0.851

(±0.007)

0.851

(±0.007)

0.848

(±0.007)

0.843

(±0.010)

0.835

(±0.014)

0.825

(±0.010)

0.807

NMax

(±0.008)

0.861

(±0.006)

0.858

(±0.004)

0.860

(±0)

0.856

(±0.010)

0.846

(±0.012)

0.841

(±0.012)

0.823

(±0.010)

0.860

(±0.011)

0.858

(±0.009)

0.861

(±0.006)

0.854

(±0.008)

0.845

(±0.007)

0.834

(±0.008)

0.820

STD

STD

(±0.006)

0.859

(±0.007)

0.859

(±0)

0.857

(±0.006)

0.851

(±0.013)

0.840

(±0.007)

0.831

(±0.005)

0.801

(±0.008)

0.862

(±0.007)

0.860

(±0.005)

0.856

(±0.009)

0.850

(±0.005)

0.838

(±0.009)

0.824

(±0.006)

0.796

WSTD

(±0.006)

0.854

(±0)

0.853

(±0.007)

0.853

(±0.011)

0.848

(±0.007)

0.839

(±0.012)

0.829

(±0.007)

0.806

(±0.007)

0.850

(±0.009)

0.852

(±0.005)

0.850

(±0.009)

0.843

(±0.011)

0.835

(±0.011)

0.824

(±0.009)

0.807

NSTD

(±0)

0.860

(±0.010)

0.862

(±0.012)

0.863

(±0.007)

0.862

(±0.006)

0.852

(±0.011)

0.847

(±0.008)

0.830

(±0.005)

0.857

(±0.007)

0.861

(±0.007)

0.864

(±0.009)

0.855

(±0.011)

0.847

(±0.005)

0.840

(±0.009)

0.862

(±0.011)

0.863

(±0.007)

0.862

(±0.004)

0.862

(±0.007)

0.855

(±0.007)

0.847

(±0.008)

0.832

(±0.008)

0.862

(±0.007)

0.861

(±0.006)

0.860

(±0.008)

0.858

(±0.007)

0.850

(±0.005)

0.840

(±0.007)

0.828

WLocal

Local

(±0.009)

0.820

Flocal

134

DF

CC

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.777 (±0.016)

(±0.015)

(±0.014)

(±0.014)

0.779

0.774

(±0.016)

(±0.016)

0.777

0.775

(±0.013)

(±0.012)

0.780

0.755

(±0.014)

(±0.011)

0.769

0.737

(±0.007)

(±0.018)

0.746

0.720

(±0.013)

(±0.014)

0.732

0.637

0.669

0.714 (±0.018)

0.768

(±0.020)

0.706 (±0.014)

0.766

(±0.016)

0.708 (±0.017)

0.769

(±0.017)

0.685 (±0.018)

0.754

(±0.022)

0.676 (±0.013)

0.734

(±0)

(±0.016)

0.642

0.703

(±0.010)

(±0.034)

(±0)

1

0.565

WAvg

Avg

0.618

Th

0.5

Avg

(±0.013)

0.779

(±0.019)

0.786

(±0.014)

0.780

(±0.016)

0.777

(±0.007)

0.759

(±0.012)

0.730

(±0.012)

0.667

(±0.015)

0.766

(±0.018)

0.762

(±0.018)

0.757

(±0.016)

0.751

(±0)

0.718

(±0.016)

0.686

(±0.034)

0.594

NAvg

MD

(±0.017)

0.784

(±0.018)

0.786

(±0.010)

0.789

(±0.012)

0.785

(±0.017)

0.765

(±0.029)

0.753

(±0.015)

0.701

(±0.017)

0.783

(±0.013)

0.786

(±0.019)

0.786

(±0)

0.783

(±0.017)

0.760

(±0.027)

0.746

(±0.019)

0.669

(±0.015)

0.782

(±0.015)

0.775

(±0.014)

0.772

(±0.016)

0.753

(±0.014)

0.727

(±0.013)

0.705

(±0.013)

0.658

(±0.016)

0.780

(±0.024)

0.783

(±0)

0.783

(±0.018)

0.768

(±0.027)

0.751

(±0.025)

0.721

(±0.021)

0.669

WMD

MD

(±0.014)

0.787

(±0.018)

0.789

(±0.019)

0.788

(±0.019)

0.790

(±0.010)

0.780

(±0.019)

0.768

(±0.018)

0.742

(±0.023)

0.785

(±0)

0.784

(±0.018)

0.784

(±0.021)

0.778

(±0.018)

0.764

(±0.020)

0.739

(±0.016)

0.678

NMD

Max

(±0.015)

0.786

(±0.015)

0.785

(±0.012)

0.785

(±0.012)

0.785

(±0.019)

0.765

(±0.018)

0.747

(±0.018)

0.705

(±0)

0.782

(±0.018)

0.788

(±0.025)

0.788

(±0.014)

0.778

(±0.020)

0.760

(±0.014)

0.737

(±0.029)

0.675

(±0.019)

0.775

(±0.014)

0.778

(±0.020)

0.773

(±0.016)

0.757

(±0.018)

0.736

(±0.025)

0.701

(±0)

0.661

(±0.016)

0.781

(±0.027)

0.780

(±0.013)

0.778

(±0.015)

0.757

(±0.014)

0.741

(±0.016)

0.729

(±0.014)

0.657

WMax

Max

(±0.014)

0.787

(±0.020)

0.789

(±0.018)

0.790

(±0.012)

0.784

(±0.018)

0.780

(±0)

0.764

(±0.014)

0.732

(±0.022)

0.784

(±0.012)

0.786

(±0.015)

0.782

(±0.028)

0.780

(±0.017)

0.759

(±0.017)

0.738

(±0.025)

0.665

NMax

Table A.14: Thresholding techniques for Alj-Mgz-W Dataset (MacroF1 )

STD

(±0.014)

0.785

(±0.016)

0.785

(±0.015)

0.786

(±0.016)

0.786

(±0)

0.766

(±0.015)

0.752

(±0.007)

0.712

(±0.015)

0.784

(±0.016)

0.786

(±0.015)

0.783

(±0.016)

0.779

(±0.013)

0.758

(±0.013)

0.741

(±0.029)

0.662

(±0.019)

0.777

(±0.017)

0.776

(±0.016)

0.778

(±0)

0.755

(±0.022)

0.728

(±0.018)

0.701

(±0.012)

0.660

(±0.012)

0.780

(±0.017)

0.782

(±0.016)

0.781

(±0.013)

0.767

(±0.008)

0.753

(±0.021)

0.734

(±0.014)

0.674

WSTD

STD

(±0.018)

0.785

(±0.015)

0.786

(±0)

0.788

(±0.015)

0.783

(±0.020)

0.779

(±0.017)

0.766

(±0.019)

0.733

(±0.011)

0.786

(±0.020)

0.784

(±0.014)

0.781

(±0.016)

0.775

(±0.015)

0.764

(±0.022)

0.733

(±0.031)

0.672

NSTD

(±0.015)

0.783

(±0)

0.784

(±0.017)

0.790

(±0.019)

0.789

(±0.015)

0.784

(±0.012)

0.771

(±0.009)

0.745

(±0.015)

0.783

(±0.013)

0.786

(±0.017)

0.788

(±0.018)

0.777

(±0.022)

0.771

(±0.020)

0.757

(±0.020)

0.722

Flocal

(±0)

0.785

(±0.017)

0.781

(±0.010)

0.785

(±0.015)

0.779

(±0.016)

0.775

(±0.009)

0.755

(±0.013)

0.721

(±0.017)

0.780

(±0.017)

0.779

(±0.016)

0.784

(±0.013)

0.774

(±0.016)

0.763

(±0.015)

0.748

(±0.017)

0.725

WLocal

Local

135

MI

IG

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.784 (±0.013)

0.787

(±0.017)

0.785 (±0.018)

0.790

(±0.012)

0.781 (±0.011)

0.792

(±0.015)

0.763 (±0.010)

0.789

(±0.010)

0.719 (±0.015)

0.789

(±0.016)

0.690 (±0.015)

0.774

(±0.017)

0.565 (±0.012)

0.739

(±0.014)

(±0.009)

(±0.010)

0.785

(±0.013)

(±0.020)

0.787

0.787

(±0.009)

(±0.011)

0.791

0.787

0.794

(±0.018)

(±0.013)

(±0) 0.778

(±0.011)

0.790

0.761

(±0.018)

(±0)

0.779

0.755

(±0.016)

0.770

(±0.015)

1

0.709

0.745

0.5

Avg WAvg

Avg

Th

(±0.018)

0.780

(±0.008)

0.780

(±0.011)

0.782

(±0.011)

0.773

(±0.015)

0.769

(±0.012)

0.756

(±0.014)

0.717

(±0.008)

0.785

(±0.010)

0.790

(±0.011)

0.788

(±0)

0.785

(±0.016)

0.781

(±0.022)

0.774

(±0.018)

0.744

NAvg

(±0.007)

0.788

(±0.009)

0.787

(±0.009)

0.796

(±0.011)

0.795

(±0.007)

0.790

(±0.016)

0.775

(±0.009)

0.746

(±0.009)

0.792

(±0.014)

0.793

(±0)

0.796

(±0.011)

0.797

(±0.022)

0.788

(±0.011)

0.771

(±0.019)

0.744

MD

MD

(±0.015)

0.788

(±0.015)

0.789

(±0.010)

0.789

(±0.017)

0.769

(±0.008)

0.765

(±0.014)

0.739

(±0.017)

0.677

(±0.012)

0.790

(±0)

0.786

(±0.019)

0.784

(±0.017)

0.774

(±0.020)

0.756

(±0.016)

0.725

(±0.014)

0.650

WMD

(±0.013)

0.787

(±0.008)

0.786

(±0.012)

0.792

(±0.017)

0.786

(±0.010)

0.774

(±0.019)

0.757

(±0.016)

0.740

(±0)

0.785

(±0.013)

0.788

(±0.016)

0.786

(±0.011)

0.779

(±0.007)

0.769

(±0.022)

0.767

(±0.018)

0.738

NMD

(±0.009)

0.787

(±0.014)

0.787

(±0.016)

0.795

(±0.012)

0.792

(±0.014)

0.785

(±0.015)

0.776

(±0)

0.748

(±0.014)

0.794

(±0.012)

0.792

(±0.014)

0.794

(±0.013)

0.793

(±0.017)

0.787

(±0.011)

0.770

(±0.015)

0.749

Max

Max

0.663

WMax

(±0.015)

0.790

(±0.012)

0.791

(±0.015)

0.792

(±0.008)

0.775

(±0.015)

0.763

(±0)

0.737

(±0.017)

0.664

(±0.013)

0.787

(±0.013)

0.787

(±0.011)

0.789

(±0.011)

0.771

(±0.009)

0.758

(±0.015)

0.728

(±0.018)

Table A.14 – continued

(±0.012)

0.785

(±0.009)

0.787

(±0.008)

0.788

(±0.010)

0.785

(±0)

0.774

(±0.013)

0.760

(±0.017)

0.739

(±0.015)

0.783

(±0.012)

0.788

(±0.014)

0.786

(±0.015)

0.780

(±0.010)

0.770

(±0.016)

0.763

(±0.017)

0.749

NMax

(±0.010)

0.789

(±0.007)

0.783

(±0.017)

0.794

(±0)

0.795

(±0.015)

0.783

(±0.020)

0.770

(±0.016)

0.746

(±0.017)

0.793

(±0.017)

0.788

(±0.016)

0.794

(±0.014)

0.793

(±0.016)

0.785

(±0.012)

0.767

(±0.016)

0.746

STD

STD

(±0.012)

0.784

(±0.016)

0.784

(±0)

0.785

(±0.014)

0.771

(±0.017)

0.758

(±0.010)

0.739

(±0.009)

0.680

(±0.014)

0.787

(±0.009)

0.787

(±0.012)

0.785

(±0.012)

0.772

(±0.010)

0.758

(±0.017)

0.731

(±0.016)

0.667

WSTD

(±0.013)

0.783

(±0)

0.785

(±0.012)

0.790

(±0.025)

0.785

(±0.012)

0.773

(±0.017)

0.764

(±0.013)

0.735

(±0.011)

0.785

(±0.013)

0.788

(±0.014)

0.790

(±0.012)

0.780

(±0.014)

0.771

(±0.011)

0.763

(±0.015)

0.747

NSTD

(±0)

0.787

(±0.016)

0.791

(±0.021)

0.796

(±0.012)

0.802

(±0.013)

0.789

(±0.019)

0.784

(±0.014)

0.757

(±0.011)

0.785

(±0.013)

0.792

(±0.012)

0.800

(±0.012)

0.797

(±0.020)

0.787

(±0.011)

0.782

(±0.015)

0.788

(±0.020)

0.793

(±0.009)

0.793

(±0.015)

0.798

(±0.016)

0.784

(±0.013)

0.775

(±0.013)

0.754

(±0.014)

0.788

(±0.010)

0.791

(±0.009)

0.794

(±0.012)

0.793

(±0.012)

0.786

(±0.009)

0.773

(±0.016)

0.760

WLocal

Local

(±0.015)

0.760

Flocal

136

DF

CC

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.856 (±0.007)

(±0.007)

(±0.006)

(±0.007)

0.857

0.854

(±0.006)

(±0.004)

0.852

0.850

(±0.009)

(±0.010)

0.850

0.837

(±0.012)

(±0.011)

0.841

0.822

(±0.009)

(±0.005)

0.825

0.810

(±0.008)

(±0.009)

0.802

0.760

0.742

0.803 (±0.007)

0.835

(±0.005)

0.802 (±0.008)

0.830

(±0.006)

0.796 (±0.007)

0.824

(±0.005)

0.780 (±0.010)

0.801

(±0.011)

0.759 (±0.015)

0.769

(±0.023)

0.731 (±0.014)

0.745

(±0.014)

(±0.024)

(±0.045)

1

0.670

WAvg

Avg

0.589

Th

0.5

Avg

(±0.007)

0.856

(±0.008)

0.855

(±0.010)

0.848

(±0.010)

0.834

(±0.008)

0.819

(±0.009)

0.794

(±0.010)

0.740

(±0.012)

0.827

(±0.009)

0.821

(±0.009)

0.807

(±0.005)

0.772

(±0.018)

0.727

(±0.016)

0.687

(±0.043)

0.485

NAvg

MD

(±0.008)

0.862

(±0.005)

0.859

(±0.004)

0.857

(±0.008)

0.852

(±0.010)

0.837

(±0.011)

0.831

(±0.005)

0.787

(±0.007)

0.858

(±0.008)

0.859

(±0.008)

0.856

(±0.007)

0.847

(±0.008)

0.827

(±0.009)

0.807

(±0.010)

0.761

(±0.010)

0.854

(±0.007)

0.855

(±0.008)

0.852

(±0.009)

0.838

(±0.006)

0.821

(±0.011)

0.806

(±0.006)

0.772

(±0.009)

0.856

(±0.005)

0.856

(±0.006)

0.855

(±0.011)

0.846

(±0.010)

0.829

(±0.011)

0.813

(±0.013)

0.770

WMD

MD

(±0.007)

0.855

(±0.009)

0.855

(±0.008)

0.854

(±0.008)

0.845

(±0.009)

0.837

(±0.006)

0.830

(±0.002)

0.794

(±0.012)

0.856

(±0.009)

0.856

(±0.006)

0.849

(±0.006)

0.835

(±0.006)

0.817

(±0.008)

0.785

(±0.026)

0.638

NMD

Max

(±0.009)

0.860

(±0.009)

0.858

(±0.007)

0.856

(±0.008)

0.850

(±0.013)

0.838

(±0.011)

0.830

(±0.007)

0.787

(±0.004)

0.857

(±0.006)

0.859

(±0.008)

0.858

(±0.007)

0.844

(±0.008)

0.824

(±0.003)

0.808

(±0.008)

0.756

(±0.008)

0.855

(±0.006)

0.855

(±0.004)

0.856

(±0.009)

0.839

(±0.008)

0.829

(±0.011)

0.808

(±0.009)

0.770

(±0.008)

0.857

(±0.006)

0.856

(±0.009)

0.852

(±0.009)

0.843

(±0.007)

0.827

(±0.010)

0.812

(±0.011)

0.749

WMax

Max

(±0.006)

0.856

(±0.013)

0.854

(±0.010)

0.851

(±0.008)

0.842

(±0.010)

0.833

(±0.008)

0.823

(±0.007)

0.792

(±0.011)

0.856

(±0.007)

0.855

(±0.005)

0.846

(±0.006)

0.833

(±0.003)

0.807

(±0.013)

0.776

(±0.036)

0.596

NMax

Table A.15: Thresholding techniques for Alj-Mgz-AS Dataset (MicroF1 )

STD

(±0.007)

0.860

(±0.005)

0.860

(±0.007)

0.857

(±0.008)

0.849

(±0.009)

0.841

(±0.013)

0.829

(±0.009)

0.784

(±0.007)

0.858

(±0.007)

0.858

(±0.008)

0.855

(±0.008)

0.842

(±0.012)

0.825

(±0.013)

0.810

(±0.008)

0.760

(±0.008)

0.855

(±0.005)

0.854

(±0.003)

0.854

(±0.009)

0.842

(±0.008)

0.825

(±0.008)

0.808

(±0.007)

0.767

(±0.006)

0.858

(±0.006)

0.857

(±0.007)

0.854

(±0.009)

0.844

(±0.014)

0.830

(±0.005)

0.810

(±0.016)

0.765

WSTD

STD

(±0.008)

0.856

(±0.007)

0.852

(±0.010)

0.853

(±0.008)

0.845

(±0.009)

0.833

(±0.005)

0.827

(±0.004)

0.800

(±0.009)

0.857

(±0.010)

0.855

(±0.008)

0.853

(±0.007)

0.835

(±0.006)

0.816

(±0.011)

0.790

(±0.032)

0.648

NSTD

(±0.008)

0.858

(±0.008)

0.858

(±0.007)

0.852

(±0.007)

0.851

(±0.008)

0.848

(±0.010)

0.833

(±0.010)

0.810

(±0.005)

0.859

(±0.009)

0.856

(±0.006)

0.856

(±0.005)

0.848

(±0.008)

0.837

(±0.008)

0.820

(±0.014)

0.782

Flocal

(±0.007)

0.856

(±0.006)

0.857

(±0.008)

0.856

(±0.007)

0.852

(±0.011)

0.841

(±0.009)

0.829

(±0.010)

0.805

(±0.007)

0.857

(±0.010)

0.856

(±0.006)

0.858

(±0.007)

0.846

(±0.009)

0.836

(±0.005)

0.823

(±0.007)

0.797

WLocal

Local

137

MI

IG

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.847 (±0.011)

0.857

(±0.006)

0.845 (±0.008)

0.857

(±0.006)

0.842 (±0.004)

0.857

(±0.009)

0.831 (±0.009)

0.853

(±0.005)

0.812 (±0.007)

0.843

(±0.009)

0.795 (±0.010)

0.831

(±0.008)

0.747 (±0.004)

0.809

(±0.007)

(±0.006)

(±0.007)

0.860

(±0.007)

(±0.007)

0.858

0.860

(±0.010)

(±0.006)

0.859

0.856

(±0.007)

(±0.004)

0.859

0.849

(±0.007)

(±0.007)

0.850

0.838

(±0.010)

(±0.008)

0.842

0.819

(±0.009)

0.832

(±0.004)

1

0.794

0.810

0.5

Avg WAvg

Avg

Th

(±0.007)

0.847

(±0.006)

0.848

(±0.007)

0.843

(±0.006)

0.831

(±0.007)

0.814

(±0.006)

0.802

(±0.007)

0.750

(±0.006)

0.855

(±0.009)

0.854

(±0.009)

0.849

(±0.004)

0.839

(±0.006)

0.836

(±0.004)

0.829

(±0.006)

0.795

NAvg

(±0.006)

0.858

(±0.007)

0.858

(±0.009)

0.858

(±0.007)

0.853

(±0.010)

0.845

(±0.006)

0.834

(±0.006)

0.807

(±0.006)

0.857

(±0.006)

0.856

(±0.007)

0.857

(±0.007)

0.849

(±0.007)

0.840

(±0.004)

0.828

(±0.007)

0.805

MD

MD

(±0.007)

0.858

(±0.009)

0.860

(±0.009)

0.857

(±0.009)

0.849

(±0.009)

0.837

(±0.009)

0.820

(±0.009)

0.769

(±0.009)

0.858

(±0.009)

0.858

(±0.008)

0.854

(±0.007)

0.848

(±0.010)

0.835

(±0.006)

0.810

(±0.011)

0.767

WMD

(±0.007)

0.852

(±0.009)

0.853

(±0.007)

0.846

(±0.005)

0.840

(±0.005)

0.831

(±0.005)

0.813

(±0.006)

0.780

(±0.008)

0.851

(±0.007)

0.850

(±0.010)

0.844

(±0.011)

0.833

(±0.007)

0.829

(±0.009)

0.810

(±0.009)

0.786

NMD

(±0.005)

0.859

(±0.007)

0.860

(±0.008)

0.858

(±0.007)

0.851

(±0.010)

0.845

(±0.005)

0.831

(±0.005)

0.808

(±0.009)

0.857

(±0.008)

0.858

(±0.006)

0.857

(±0.004)

0.850

(±0.006)

0.840

(±0.005)

0.826

(±0.008)

0.806

Max

Max

0.771

WMax

(±0.004)

0.859

(±0.005)

0.857

(±0.006)

0.855

(±0.006)

0.850

(±0.007)

0.833

(±0.009)

0.817

(±0.005)

0.760

(±0.007)

0.860

(±0.010)

0.857

(±0.008)

0.853

(±0.006)

0.849

(±0.010)

0.839

(±0.009)

0.810

(±0.009)

Table A.15 – continued

(±0.007)

0.853

(±0.010)

0.852

(±0.007)

0.847

(±0.008)

0.838

(±0.003)

0.827

(±0.002)

0.807

(±0.008)

0.782

(±0.009)

0.854

(±0.008)

0.851

(±0.008)

0.847

(±0.008)

0.837

(±0.008)

0.832

(±0.005)

0.816

(±0.008)

0.785

NMax

(±0.009)

0.857

(±0.007)

0.858

(±0.010)

0.856

(±0.005)

0.849

(±0.009)

0.846

(±0.007)

0.834

(±0.006)

0.808

(±0.009)

0.858

(±0.007)

0.857

(±0.007)

0.858

(±0.006)

0.851

(±0.005)

0.838

(±0.004)

0.829

(±0.004)

0.805

STD

STD

(±0.005)

0.860

(±0.009)

0.858

(±0.010)

0.858

(±0.008)

0.849

(±0.009)

0.838

(±0.008)

0.819

(±0.009)

0.777

(±0.006)

0.859

(±0.010)

0.858

(±0.008)

0.854

(±0.009)

0.847

(±0.013)

0.833

(±0.007)

0.813

(±0.002)

0.772

WSTD

(±0.009)

0.853

(±0.008)

0.854

(±0.008)

0.849

(±0.004)

0.842

(±0.003)

0.830

(±0.005)

0.814

(±0.006)

0.785

(±0.009)

0.853

(±0.010)

0.851

(±0.011)

0.846

(±0.007)

0.837

(±0.010)

0.832

(±0.005)

0.814

(±0.011)

0.783

NSTD

(±0.010)

0.859

(±0.008)

0.861

(±0.006)

0.860

(±0.006)

0.857

(±0.006)

0.846

(±0.007)

0.837

(±0.006)

0.815

(±0.009)

0.858

(±0.006)

0.857

(±0.008)

0.856

(±0.006)

0.845

(±0.006)

0.838

(±0.007)

0.829

(±0.007)

0.859

(±0.006)

0.860

(±0.006)

0.860

(±0.007)

0.855

(±0.006)

0.848

(±0.007)

0.838

(±0.008)

0.823

(±0.009)

0.858

(±0.006)

0.858

(±0.007)

0.856

(±0.004)

0.855

(±0.005)

0.843

(±0.007)

0.838

(±0.009)

0.811

WLocal

Local

(±0.008)

0.815

Flocal

138

DF

CC

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.792 (±0.012)

(±0.012)

(±0.009)

(±0.010)

0.792

0.790

(±0.011)

(±0.006)

0.789

0.784

(±0.016)

(±0.013)

0.786

0.768

(±0.018)

(±0.016)

0.775

0.746

(±0.017)

(±0.012)

0.754

0.721

(±0.017)

(±0.010)

0.720

0.584

0.563

0.689 (±0.020)

0.763

(±0.004)

0.682 (±0.025)

0.758

(±0.006)

0.677 (±0.011)

0.749

(±0.009)

0.643 (±0.015)

0.721

(±0.023)

0.595 (±0.032)

0.665

(±0.036)

0.541 (±0.036)

0.626

(±0.018)

(±0.035)

(±0.074)

1

0.433

WAvg

Avg

0.397

Th

0.5

Avg

(±0.012)

0.792

(±0.014)

0.794

(±0.015)

0.786

(±0.011)

0.772

(±0.013)

0.752

(±0.016)

0.724

(±0.020)

0.627

(±0.017)

0.768

(±0.012)

0.761

(±0.014)

0.748

(±0.012)

0.716

(±0.024)

0.656

(±0.020)

0.607

(±0.066)

0.388

NAvg

MD

(±0.012)

0.800

(±0.007)

0.797

(±0.008)

0.793

(±0.010)

0.792

(±0.012)

0.774

(±0.019)

0.754

(±0.022)

0.630

(±0.011)

0.796

(±0.013)

0.799

(±0.014)

0.793

(±0.012)

0.785

(±0.018)

0.755

(±0.016)

0.714

(±0.040)

0.605

(±0.016)

0.789

(±0.013)

0.788

(±0.013)

0.784

(±0.015)

0.766

(±0.012)

0.735

(±0.019)

0.707

(±0.014)

0.602

(±0.013)

0.795

(±0.008)

0.792

(±0.010)

0.791

(±0.020)

0.777

(±0.015)

0.749

(±0.021)

0.709

(±0.033)

0.594

WMD

MD

(±0.010)

0.795

(±0.008)

0.796

(±0.012)

0.796

(±0.011)

0.787

(±0.015)

0.776

(±0.009)

0.767

(±0.009)

0.698

(±0.017)

0.793

(±0.013)

0.797

(±0.009)

0.788

(±0.009)

0.773

(±0.012)

0.752

(±0.010)

0.695

(±0.028)

0.530

NMD

Max

(±0.014)

0.796

(±0.011)

0.796

(±0.010)

0.792

(±0.015)

0.790

(±0.018)

0.776

(±0.019)

0.760

(±0.018)

0.640

(±0.010)

0.794

(±0.012)

0.796

(±0.014)

0.796

(±0.016)

0.784

(±0.013)

0.745

(±0.009)

0.713

(±0.020)

0.606

(±0.012)

0.792

(±0.010)

0.788

(±0.007)

0.793

(±0.018)

0.768

(±0.019)

0.753

(±0.019)

0.706

(±0.019)

0.607

(±0.013)

0.794

(±0.012)

0.791

(±0.017)

0.785

(±0.017)

0.768

(±0.010)

0.736

(±0.017)

0.707

(±0.033)

0.554

WMax

Max

(±0.010)

0.794

(±0.017)

0.794

(±0.014)

0.792

(±0.013)

0.781

(±0.015)

0.770

(±0.009)

0.761

(±0.011)

0.695

(±0.019)

0.793

(±0.013)

0.796

(±0.007)

0.787

(±0.011)

0.771

(±0.010)

0.740

(±0.022)

0.687

(±0.029)

0.495

NMax

Table A.16: Thresholding techniques for Alj-Mgz-AS Dataset (MacroF1 )

STD

(±0.012)

0.796

(±0.007)

0.799

(±0.009)

0.794

(±0.012)

0.789

(±0.010)

0.778

(±0.017)

0.756

(±0.023)

0.627

(±0.010)

0.796

(±0.009)

0.795

(±0.015)

0.791

(±0.011)

0.778

(±0.021)

0.749

(±0.026)

0.718

(±0.026)

0.614

(±0.011)

0.792

(±0.009)

0.787

(±0.004)

0.789

(±0.015)

0.769

(±0.013)

0.743

(±0.016)

0.707

(±0.014)

0.595

(±0.008)

0.794

(±0.009)

0.794

(±0.008)

0.790

(±0.013)

0.776

(±0.025)

0.753

(±0.014)

0.712

(±0.031)

0.600

WSTD

STD

(±0.012)

0.794

(±0.011)

0.791

(±0.012)

0.794

(±0.010)

0.785

(±0.015)

0.776

(±0.006)

0.765

(±0.014)

0.702

(±0.015)

0.795

(±0.013)

0.794

(±0.010)

0.793

(±0.011)

0.773

(±0.012)

0.745

(±0.018)

0.699

(±0.041)

0.526

NSTD

(±0.013)

0.794

(±0.015)

0.793

(±0.013)

0.795

(±0.012)

0.791

(±0.011)

0.792

(±0.017)

0.771

(±0.021)

0.730

(±0.010)

0.799

(±0.016)

0.795

(±0.010)

0.796

(±0.008)

0.786

(±0.019)

0.770

(±0.014)

0.748

(±0.027)

0.671

Flocal

(±0.010)

0.791

(±0.008)

0.792

(±0.010)

0.794

(±0.013)

0.788

(±0.012)

0.778

(±0.012)

0.762

(±0.024)

0.714

(±0.013)

0.793

(±0.018)

0.792

(±0.012)

0.795

(±0.012)

0.773

(±0.017)

0.760

(±0.008)

0.745

(±0.017)

0.691

WLocal

Local

139

MI

IG

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.774 (±0.014)

0.796

(±0.008)

0.773 (±0.014)

0.796

(±0.011)

0.764 (±0.007)

0.799

(±0.008)

0.738 (±0.018)

0.798

(±0.008)

0.691 (±0.015)

0.786

(±0.011)

0.644 (±0.018)

0.768

(±0.011)

0.450 (±0.010)

0.708

(±0.012)

(±0.016)

(±0.010)

0.798

(±0.012)

(±0.009)

0.796

0.800

(±0.013)

(±0.010)

0.800

0.793

(±0.013)

(±0.011)

0.801

0.788

(±0.010)

(±0.012)

0.796

0.774

(±0.017)

(±0.011)

0.783

0.733

(±0.015)

0.769

(±0.011)

1

0.650

0.709

0.5

Avg WAvg

Avg

Th

(±0.013)

0.783

(±0.012)

0.788

(±0.012)

0.784

(±0.006)

0.775

(±0.008)

0.749

(±0.013)

0.730

(±0.010)

0.663

(±0.009)

0.795

(±0.013)

0.793

(±0.011)

0.791

(±0.005)

0.781

(±0.009)

0.778

(±0.007)

0.766

(±0.009)

0.699

NAvg

(±0.011)

0.796

(±0.009)

0.797

(±0.010)

0.801

(±0.009)

0.796

(±0.014)

0.788

(±0.006)

0.768

(±0.012)

0.703

(±0.009)

0.794

(±0.008)

0.798

(±0.013)

0.802

(±0.009)

0.795

(±0.013)

0.776

(±0.006)

0.755

(±0.015)

0.707

MD

MD

(±0.008)

0.794

(±0.014)

0.797

(±0.015)

0.796

(±0.013)

0.783

(±0.014)

0.761

(±0.021)

0.723

(±0.030)

0.562

(±0.014)

0.794

(±0.015)

0.796

(±0.011)

0.791

(±0.010)

0.784

(±0.020)

0.755

(±0.017)

0.694

(±0.023)

0.548

WMD

(±0.011)

0.789

(±0.015)

0.791

(±0.010)

0.787

(±0.008)

0.783

(±0.011)

0.771

(±0.009)

0.742

(±0.009)

0.690

(±0.011)

0.792

(±0.012)

0.791

(±0.012)

0.788

(±0.016)

0.776

(±0.010)

0.768

(±0.013)

0.737

(±0.014)

0.697

NMD

(±0.009)

0.797

(±0.008)

0.797

(±0.010)

0.801

(±0.007)

0.795

(±0.014)

0.786

(±0.006)

0.762

(±0.013)

0.704

(±0.012)

0.793

(±0.008)

0.799

(±0.009)

0.799

(±0.007)

0.798

(±0.009)

0.776

(±0.007)

0.751

(±0.015)

0.701

Max

Max

0.566

WMax

(±0.008)

0.792

(±0.011)

0.793

(±0.011)

0.794

(±0.007)

0.784

(±0.013)

0.752

(±0.021)

0.709

(±0.030)

0.522

(±0.012)

0.796

(±0.015)

0.794

(±0.013)

0.790

(±0.010)

0.784

(±0.017)

0.765

(±0.021)

0.697

(±0.011)

Table A.16 – continued

(±0.010)

0.790

(±0.016)

0.792

(±0.010)

0.789

(±0.009)

0.782

(±0.007)

0.766

(±0.003)

0.738

(±0.010)

0.693

(±0.011)

0.794

(±0.012)

0.790

(±0.014)

0.788

(±0.011)

0.779

(±0.013)

0.772

(±0.014)

0.747

(±0.012)

0.693

NMax

(±0.012)

0.795

(±0.009)

0.798

(±0.014)

0.798

(±0.007)

0.789

(±0.012)

0.787

(±0.014)

0.767

(±0.013)

0.709

(±0.013)

0.796

(±0.010)

0.798

(±0.008)

0.802

(±0.008)

0.796

(±0.009)

0.773

(±0.010)

0.758

(±0.012)

0.701

STD

STD

(±0.008)

0.797

(±0.012)

0.796

(±0.013)

0.795

(±0.010)

0.786

(±0.017)

0.763

(±0.015)

0.715

(±0.026)

0.596

(±0.013)

0.796

(±0.014)

0.794

(±0.013)

0.791

(±0.014)

0.783

(±0.022)

0.758

(±0.018)

0.708

(±0.016)

0.569

WSTD

(±0.013)

0.793

(±0.015)

0.795

(±0.011)

0.790

(±0.008)

0.786

(±0.006)

0.767

(±0.007)

0.746

(±0.009)

0.690

(±0.013)

0.793

(±0.013)

0.789

(±0.017)

0.788

(±0.007)

0.779

(±0.012)

0.773

(±0.009)

0.741

(±0.014)

0.690

NSTD

(±0.016)

0.797

(±0.013)

0.804

(±0.008)

0.804

(±0.010)

0.801

(±0.011)

0.789

(±0.011)

0.775

(±0.013)

0.743

(±0.014)

0.799

(±0.009)

0.798

(±0.015)

0.800

(±0.011)

0.790

(±0.010)

0.781

(±0.013)

0.769

(±0.008)

0.795

(±0.010)

0.800

(±0.008)

0.800

(±0.011)

0.796

(±0.008)

0.788

(±0.012)

0.776

(±0.013)

0.743

(±0.015)

0.800

(±0.011)

0.797

(±0.008)

0.796

(±0.006)

0.799

(±0.008)

0.784

(±0.008)

0.776

(±0.017)

0.729

WLocal

Local

(±0.011)

0.740

Flocal

140

DF

CC

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.834 (±0.007)

(±0.007)

(±0.009)

(±0.009)

0.834

0.834

(±0.008)

(±0.008)

0.834

0.831

(±0.007)

(±0.007)

0.833

0.827

(±0.008)

(±0.010)

0.828

0.807

(±0.010)

(±0.010)

0.807

0.782

(±0.007)

(±0.006)

0.784

0.724

0.719

0.730 (±0.012)

0.800

(±0.012)

0.725 (±0.010)

0.797

(±0.012)

0.717 (±0.013)

0.793

(±0.008)

0.703 (±0.004)

0.773

(±0.006)

0.689 (±0.007)

0.743

(±0.013)

0.667 (±0.016)

0.703

(±0.016)

(±0.025)

(±0.025)

1

0.616

WAvg

Avg

0.623

Th

0.5

Avg

(±0.008)

0.833

(±0.006)

0.833

(±0.009)

0.830

(±0.008)

0.828

(±0.009)

0.808

(±0.004)

0.781

(±0.005)

0.689

(±0.006)

0.811

(±0.005)

0.807

(±0.008)

0.802

(±0.004)

0.781

(±0.009)

0.740

(±0.008)

0.694

(±0.024)

0.555

NAvg

MD

(±0.006)

0.836

(±0.010)

0.836

(±0.007)

0.832

(±0.005)

0.830

(±0.005)

0.815

(±0.006)

0.804

(±0.004)

0.764

(±0.007)

0.836

(±0.010)

0.836

(±0.008)

0.833

(±0.007)

0.830

(±0.009)

0.814

(±0.006)

0.794

(±0.009)

0.740

(±0.009)

0.835

(±0.009)

0.835

(±0.007)

0.832

(±0.008)

0.826

(±0.006)

0.807

(±0.012)

0.782

(±0.005)

0.738

(±0.009)

0.835

(±0.009)

0.833

(±0.010)

0.833

(±0.005)

0.826

(±0.005)

0.814

(±0.008)

0.794

(±0.013)

0.731

WMD

MD

(±0.009)

0.837

(±0.008)

0.837

(±0.008)

0.832

(±0.010)

0.823

(±0.007)

0.817

(±0.007)

0.809

(±0.007)

0.771

(±0.010)

0.836

(±0.010)

0.832

(±0.007)

0.830

(±0.010)

0.827

(±0.006)

0.811

(±0.007)

0.778

(±0.011)

0.693

NMD

Max

(±0.008)

0.837

(±0.007)

0.836

(±0.007)

0.832

(±0.009)

0.831

(±0.006)

0.814

(±0.005)

0.795

(±0.003)

0.758

(±0.008)

0.837

(±0.010)

0.835

(±0.010)

0.834

(±0.004)

0.827

(±0.004)

0.814

(±0.006)

0.795

(±0.015)

0.731

(±0.009)

0.835

(±0.009)

0.834

(±0.007)

0.834

(±0.006)

0.828

(±0.008)

0.807

(±0.013)

0.789

(±0.006)

0.737

(±0.008)

0.834

(±0.007)

0.833

(±0.008)

0.831

(±0.011)

0.824

(±0.007)

0.812

(±0.004)

0.792

(±0.008)

0.745

WMax

Max

(±0.008)

0.835

(±0.009)

0.837

(±0.009)

0.830

(±0.006)

0.825

(±0.005)

0.819

(±0.012)

0.800

(±0.004)

0.753

(±0.009)

0.835

(±0.009)

0.833

(±0.007)

0.829

(±0.006)

0.828

(±0.007)

0.802

(±0.010)

0.776

(±0.009)

0.689

NMax

Table A.17: Thresholding techniques for Alj-Mgz-SR Dataset (MicroF1 )

STD

(±0.009)

0.838

(±0.007)

0.837

(±0.008)

0.833

(±0.008)

0.831

(±0.008)

0.818

(±0.005)

0.799

(±0.008)

0.761

(±0.006)

0.836

(±0.008)

0.836

(±0.010)

0.834

(±0.007)

0.827

(±0.006)

0.811

(±0.011)

0.794

(±0.012)

0.735

(±0.010)

0.835

(±0.009)

0.834

(±0.006)

0.834

(±0.005)

0.827

(±0.007)

0.808

(±0.010)

0.794

(±0.009)

0.735

(±0.007)

0.835

(±0.009)

0.834

(±0.012)

0.833

(±0.006)

0.826

(±0.004)

0.814

(±0.010)

0.787

(±0.014)

0.722

WSTD

STD

(±0.009)

0.834

(±0.010)

0.838

(±0.009)

0.831

(±0.006)

0.828

(±0.004)

0.817

(±0.008)

0.805

(±0.006)

0.770

(±0.009)

0.835

(±0.010)

0.834

(±0.008)

0.831

(±0.009)

0.827

(±0.011)

0.810

(±0.006)

0.786

(±0.016)

0.707

NSTD

(±0.006)

0.835

(±0.011)

0.836

(±0.009)

0.835

(±0.007)

0.829

(±0.009)

0.823

(±0.011)

0.807

(±0.010)

0.777

(±0.009)

0.834

(±0.008)

0.837

(±0.009)

0.832

(±0.006)

0.831

(±0.006)

0.825

(±0.011)

0.808

(±0.012)

0.762

Flocal

(±0.010)

0.836

(±0.009)

0.833

(±0.009)

0.831

(±0.006)

0.828

(±0.007)

0.814

(±0.006)

0.801

(±0.004)

0.765

(±0.009)

0.834

(±0.006)

0.832

(±0.009)

0.833

(±0.007)

0.826

(±0.004)

0.817

(±0.008)

0.803

(±0.009)

0.760

WLocal

Local

141

MI

IG

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.810 (±0.006)

0.837

(±0.006)

0.806 (±0.006)

0.837

(±0.011)

0.805 (±0.007)

0.834

(±0.009)

0.798 (±0.008)

0.827

(±0.009)

0.788 (±0.003)

0.817

(±0.008)

0.767 (±0.008)

0.805

(±0.008)

0.735 (±0.008)

0.775

(±0.007)

(±0.010)

(±0.007)

0.837

(±0.008)

(±0.007)

0.839

0.836

(±0.007)

(±0.009)

0.837

0.834

(±0.005)

(±0.009)

0.833

0.826

(±0.005)

(±0.007)

0.830

0.815

(±0.006)

(±0.007)

0.821

0.802

(±0.005)

0.811

(±0.006)

1

0.753

0.775

0.5

Avg WAvg

Avg

Th

(±0.008)

0.830

(±0.008)

0.829

(±0.006)

0.828

(±0.011)

0.821

(±0.008)

0.805

(±0.010)

0.777

(±0.006)

0.717

(±0.011)

0.836

(±0.009)

0.833

(±0.006)

0.831

(±0.014)

0.827

(±0.007)

0.816

(±0.008)

0.807

(±0.008)

0.765

NAvg

(±0.007)

0.836

(±0.010)

0.836

(±0.009)

0.835

(±0.008)

0.831

(±0.009)

0.826

(±0.008)

0.813

(±0.005)

0.784

(±0.006)

0.837

(±0.007)

0.838

(±0.010)

0.835

(±0.005)

0.830

(±0.009)

0.815

(±0.007)

0.799

(±0.005)

0.763

MD

MD

(±0.007)

0.836

(±0.008)

0.838

(±0.009)

0.835

(±0.007)

0.831

(±0.005)

0.824

(±0.008)

0.803

(±0.008)

0.755

(±0.007)

0.834

(±0.011)

0.834

(±0.010)

0.830

(±0.007)

0.827

(±0.006)

0.808

(±0.008)

0.784

(±0.005)

0.735

WMD

(±0.008)

0.833

(±0.011)

0.834

(±0.009)

0.830

(±0.013)

0.821

(±0.008)

0.812

(±0.011)

0.799

(±0.012)

0.760

(±0.011)

0.833

(±0.010)

0.835

(±0.006)

0.830

(±0.008)

0.823

(±0.008)

0.808

(±0.006)

0.785

(±0.009)

0.754

NMD

(±0.007)

0.837

(±0.009)

0.836

(±0.007)

0.836

(±0.008)

0.831

(±0.004)

0.828

(±0.008)

0.814

(±0.005)

0.787

(±0.009)

0.836

(±0.007)

0.837

(±0.007)

0.834

(±0.003)

0.829

(±0.010)

0.820

(±0.008)

0.805

(±0.004)

0.765

Max

Max

0.738

WMax

(±0.009)

0.837

(±0.009)

0.836

(±0.004)

0.834

(±0.007)

0.833

(±0.003)

0.822

(±0.005)

0.797

(±0.005)

0.752

(±0.011)

0.834

(±0.010)

0.833

(±0.008)

0.830

(±0.008)

0.826

(±0.006)

0.812

(±0.007)

0.790

(±0.005)

Table A.17 – continued

(±0.008)

0.834

(±0.010)

0.832

(±0.009)

0.829

(±0.011)

0.817

(±0.010)

0.814

(±0.010)

0.794

(±0.009)

0.757

(±0.010)

0.833

(±0.011)

0.835

(±0.008)

0.833

(±0.006)

0.823

(±0.005)

0.809

(±0.006)

0.788

(±0.013)

0.753

NMax

(±0.007)

0.838

(±0.009)

0.837

(±0.008)

0.833

(±0.008)

0.828

(±0.007)

0.822

(±0.010)

0.808

(±0.007)

0.780

(±0.009)

0.836

(±0.009)

0.837

(±0.008)

0.834

(±0.007)

0.832

(±0.006)

0.819

(±0.005)

0.800

(±0.005)

0.766

STD

STD

(±0.009)

0.837

(±0.006)

0.835

(±0.007)

0.837

(±0.006)

0.829

(±0.006)

0.816

(±0.005)

0.804

(±0.009)

0.756

(±0.010)

0.834

(±0.010)

0.835

(±0.008)

0.831

(±0.009)

0.826

(±0.005)

0.813

(±0.007)

0.787

(±0.009)

0.738

WSTD

(±0.009)

0.836

(±0.009)

0.835

(±0.009)

0.828

(±0.011)

0.822

(±0.008)

0.817

(±0.009)

0.804

(±0.009)

0.765

(±0.011)

0.834

(±0.008)

0.833

(±0.010)

0.833

(±0.007)

0.822

(±0.007)

0.811

(±0.008)

0.791

(±0.011)

0.760

NSTD

(±0.007)

0.834

(±0.008)

0.836

(±0.008)

0.836

(±0.006)

0.830

(±0.005)

0.825

(±0.010)

0.825

(±0.007)

0.800

(±0.005)

0.837

(±0.006)

0.836

(±0.009)

0.835

(±0.003)

0.831

(±0.008)

0.821

(±0.008)

0.815

(±0.007)

0.838

(±0.008)

0.839

(±0.006)

0.834

(±0.006)

0.833

(±0.003)

0.831

(±0.007)

0.826

(±0.008)

0.800

(±0.009)

0.834

(±0.009)

0.834

(±0.008)

0.831

(±0.006)

0.828

(±0.007)

0.823

(±0.003)

0.814

(±0.006)

0.779

WLocal

Local

(±0.007)

0.785

Flocal

142

DF

CC

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.757 (±0.012)

(±0.010)

(±0.013)

(±0.012)

0.760

0.757

(±0.014)

(±0.012)

0.760

0.757

(±0.019)

(±0.012)

0.757

0.751

(±0.017)

(±0.019)

0.756

0.725

(±0.014)

(±0.021)

0.721

0.675

(±0.010)

(±0.010)

0.677

0.554

0.547

0.557 (±0.018)

0.713

(±0.017)

0.547 (±0.018)

0.707

(±0.016)

0.531 (±0.026)

0.705

(±0.019)

0.499 (±0.029)

0.679

(±0.012)

0.467 (±0.026)

0.625

(±0.018)

0.425 (±0.026)

0.580

(±0.020)

(±0.027)

(±0.055)

1

0.344

WAvg

Avg

0.449

Th

0.5

Avg

(±0.012)

0.759

(±0.010)

0.759

(±0.017)

0.755

(±0.013)

0.755

(±0.016)

0.726

(±0.016)

0.689

(±0.008)

0.516

(±0.011)

0.740

(±0.009)

0.738

(±0.013)

0.731

(±0.012)

0.707

(±0.017)

0.655

(±0.008)

0.601

(±0.039)

0.422

NAvg

MD

(±0.010)

0.762

(±0.015)

0.761

(±0.013)

0.759

(±0.015)

0.758

(±0.015)

0.734

(±0.020)

0.711

(±0.007)

0.614

(±0.011)

0.764

(±0.015)

0.763

(±0.014)

0.759

(±0.017)

0.757

(±0.010)

0.734

(±0.011)

0.702

(±0.016)

0.581

(±0.015)

0.761

(±0.015)

0.760

(±0.011)

0.760

(±0.017)

0.750

(±0.015)

0.718

(±0.022)

0.686

(±0.009)

0.576

(±0.016)

0.760

(±0.018)

0.759

(±0.014)

0.760

(±0.010)

0.748

(±0.016)

0.729

(±0.020)

0.692

(±0.039)

0.539

WMD

MD

(±0.011)

0.763

(±0.013)

0.765

(±0.013)

0.757

(±0.012)

0.749

(±0.016)

0.747

(±0.010)

0.740

(±0.015)

0.670

(±0.013)

0.761

(±0.015)

0.760

(±0.014)

0.755

(±0.016)

0.757

(±0.007)

0.739

(±0.018)

0.688

(±0.012)

0.571

NMD

Max

(±0.011)

0.764

(±0.011)

0.762

(±0.010)

0.759

(±0.015)

0.759

(±0.017)

0.732

(±0.013)

0.695

(±0.007)

0.608

(±0.013)

0.764

(±0.016)

0.762

(±0.017)

0.762

(±0.012)

0.756

(±0.010)

0.735

(±0.010)

0.698

(±0.019)

0.570

(±0.014)

0.759

(±0.015)

0.759

(±0.012)

0.760

(±0.014)

0.751

(±0.017)

0.718

(±0.024)

0.695

(±0.010)

0.577

(±0.016)

0.759

(±0.013)

0.761

(±0.011)

0.756

(±0.013)

0.740

(±0.010)

0.713

(±0.012)

0.666

(±0.020)

0.542

WMax

Max

(±0.011)

0.761

(±0.014)

0.763

(±0.017)

0.753

(±0.012)

0.751

(±0.010)

0.749

(±0.021)

0.724

(±0.012)

0.643

(±0.014)

0.760

(±0.016)

0.760

(±0.015)

0.756

(±0.014)

0.757

(±0.011)

0.724

(±0.018)

0.685

(±0.018)

0.572

NMax

Table A.18: Thresholding techniques for Alj-Mgz-SR Dataset (MacroF1 )

STD

(±0.012)

0.764

(±0.014)

0.764

(±0.012)

0.761

(±0.019)

0.758

(±0.021)

0.739

(±0.016)

0.706

(±0.012)

0.607

(±0.011)

0.762

(±0.012)

0.763

(±0.017)

0.761

(±0.015)

0.753

(±0.013)

0.729

(±0.019)

0.697

(±0.024)

0.568

(±0.017)

0.758

(±0.016)

0.759

(±0.011)

0.760

(±0.016)

0.751

(±0.012)

0.720

(±0.020)

0.698

(±0.010)

0.572

(±0.012)

0.761

(±0.017)

0.759

(±0.017)

0.761

(±0.012)

0.749

(±0.015)

0.727

(±0.018)

0.689

(±0.026)

0.547

WSTD

STD

(±0.012)

0.759

(±0.014)

0.765

(±0.016)

0.756

(±0.012)

0.754

(±0.008)

0.745

(±0.014)

0.731

(±0.009)

0.669

(±0.013)

0.760

(±0.016)

0.762

(±0.017)

0.759

(±0.014)

0.756

(±0.014)

0.736

(±0.013)

0.693

(±0.016)

0.573

NSTD

(±0.011)

0.760

(±0.014)

0.764

(±0.017)

0.765

(±0.014)

0.758

(±0.011)

0.753

(±0.023)

0.726

(±0.015)

0.664

(±0.015)

0.759

(±0.017)

0.764

(±0.019)

0.761

(±0.017)

0.760

(±0.010)

0.751

(±0.022)

0.730

(±0.025)

0.627

Flocal

(±0.015)

0.759

(±0.015)

0.757

(±0.012)

0.756

(±0.018)

0.752

(±0.013)

0.734

(±0.006)

0.718

(±0.006)

0.626

(±0.019)

0.758

(±0.013)

0.756

(±0.011)

0.763

(±0.018)

0.750

(±0.013)

0.737

(±0.017)

0.707

(±0.016)

0.603

WLocal

Local

143

MI

IG

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.706 (±0.014)

0.764

(±0.012)

0.695 (±0.013)

0.764

(±0.016)

0.687 (±0.013)

0.760

(±0.014)

0.668 (±0.018)

0.752

(±0.017)

0.633 (±0.015)

0.750

(±0.015)

0.569 (±0.021)

0.729

(±0.016)

0.482 (±0.017)

0.667

(±0.012)

(±0.011)

(±0.009)

0.765

(±0.013)

(±0.012)

0.768

0.760

(±0.012)

(±0.014)

0.764

0.761

(±0.010)

(±0.015)

0.761

0.753

(±0.014)

(±0.016)

0.757

0.738

(±0.020)

(±0.014)

0.751

0.710

(±0.017)

0.737

(±0.012)

1

0.579

0.654

0.5

Avg WAvg

Avg

Th

(±0.012)

0.757

(±0.015)

0.756

(±0.012)

0.756

(±0.016)

0.750

(±0.015)

0.736

(±0.012)

0.697

(±0.010)

0.632

(±0.017)

0.763

(±0.015)

0.762

(±0.012)

0.757

(±0.020)

0.754

(±0.012)

0.746

(±0.011)

0.732

(±0.011)

0.675

NAvg

(±0.014)

0.763

(±0.015)

0.762

(±0.014)

0.764

(±0.013)

0.760

(±0.018)

0.754

(±0.014)

0.741

(±0.009)

0.678

(±0.010)

0.766

(±0.013)

0.767

(±0.012)

0.763

(±0.008)

0.760

(±0.016)

0.746

(±0.008)

0.717

(±0.011)

0.652

MD

MD

(±0.014)

0.762

(±0.013)

0.763

(±0.013)

0.760

(±0.014)

0.757

(±0.015)

0.742

(±0.017)

0.700

(±0.022)

0.554

(±0.016)

0.760

(±0.016)

0.763

(±0.019)

0.756

(±0.018)

0.750

(±0.022)

0.723

(±0.020)

0.674

(±0.010)

0.555

WMD

(±0.013)

0.760

(±0.020)

0.760

(±0.014)

0.756

(±0.015)

0.746

(±0.010)

0.737

(±0.015)

0.723

(±0.015)

0.669

(±0.012)

0.762

(±0.013)

0.763

(±0.011)

0.758

(±0.012)

0.749

(±0.011)

0.738

(±0.009)

0.695

(±0.010)

0.666

NMD

(±0.013)

0.765

(±0.014)

0.764

(±0.013)

0.766

(±0.015)

0.759

(±0.010)

0.757

(±0.012)

0.747

(±0.002)

0.684

(±0.014)

0.765

(±0.012)

0.765

(±0.014)

0.760

(±0.005)

0.757

(±0.020)

0.750

(±0.016)

0.730

(±0.009)

0.657

Max

Max

0.559

WMax

(±0.012)

0.765

(±0.013)

0.764

(±0.007)

0.761

(±0.012)

0.761

(±0.014)

0.738

(±0.015)

0.676

(±0.012)

0.539

(±0.018)

0.760

(±0.015)

0.762

(±0.016)

0.755

(±0.020)

0.751

(±0.019)

0.727

(±0.022)

0.682

(±0.006)

Table A.18 – continued

(±0.012)

0.762

(±0.015)

0.759

(±0.015)

0.755

(±0.016)

0.742

(±0.013)

0.742

(±0.014)

0.721

(±0.011)

0.665

(±0.014)

0.761

(±0.015)

0.764

(±0.012)

0.762

(±0.009)

0.752

(±0.004)

0.744

(±0.012)

0.705

(±0.016)

0.665

NMax

(±0.012)

0.767

(±0.013)

0.766

(±0.014)

0.759

(±0.014)

0.756

(±0.011)

0.752

(±0.016)

0.733

(±0.013)

0.674

(±0.014)

0.765

(±0.012)

0.766

(±0.015)

0.759

(±0.010)

0.761

(±0.014)

0.753

(±0.004)

0.723

(±0.010)

0.656

STD

STD

(±0.016)

0.764

(±0.013)

0.760

(±0.010)

0.764

(±0.013)

0.756

(±0.015)

0.732

(±0.014)

0.706

(±0.019)

0.597

(±0.016)

0.760

(±0.014)

0.764

(±0.015)

0.756

(±0.020)

0.751

(±0.019)

0.731

(±0.019)

0.679

(±0.031)

0.555

WSTD

(±0.014)

0.764

(±0.015)

0.762

(±0.017)

0.754

(±0.019)

0.750

(±0.011)

0.748

(±0.013)

0.732

(±0.010)

0.673

(±0.016)

0.762

(±0.013)

0.761

(±0.012)

0.760

(±0.011)

0.747

(±0.008)

0.739

(±0.012)

0.711

(±0.017)

0.670

NSTD

(±0.013)

0.762

(±0.014)

0.765

(±0.014)

0.770

(±0.010)

0.762

(±0.010)

0.753

(±0.010)

0.760

(±0.010)

0.723

(±0.009)

0.766

(±0.013)

0.763

(±0.016)

0.764

(±0.006)

0.769

(±0.011)

0.752

(±0.007)

0.747

(±0.015)

0.766

(±0.015)

0.768

(±0.015)

0.761

(±0.012)

0.765

(±0.012)

0.765

(±0.012)

0.752

(±0.018)

0.715

(±0.016)

0.760

(±0.016)

0.765

(±0.018)

0.761

(±0.014)

0.761

(±0.009)

0.754

(±0.006)

0.739

(±0.018)

0.682

WLocal

Local

(±0.014)

0.693

Flocal

144

DF

CC

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.861 (±0.005)

(±0.006)

(±0.012)

(±0.006)

0.863

0.86

(±0.011)

(±0.007)

0.86

0.852

(±0.007)

(±0.007)

0.857

0.828

(±0.012)

(±0.007)

0.834

0.815

(±0.006)

(±0.008)

0.818

0.788

(±0.007)

(±0.008)

0.803

0.74

0.729

0.811 (±0.008)

0.842

(±0.009)

0.807 (±0.01)

0.838

(±0.009)

0.804 (±0.007)

0.835

(±0.012)

0.779 (±0.009)

0.81

(±0.008)

0.752 (±0.01)

0.781

(±0.009)

0.73 (±0)

0.753

(±0.006)

(±0.023)

(±0)

1

0.676

WAvg

Avg

0.629

Th

0.5

Avg

(±0.008)

0.859

(±0.007)

0.856

(±0.005)

0.851

(±0.011)

0.833

(±0.007)

0.813

(±0.008)

0.79

(±0.011)

0.711

(±0.007)

0.833

(±0.008)

0.829

(±0.006)

0.819

(±0.007)

0.792

(±0)

0.736

(±0.016)

0.703

(±0.009)

0.502

NAvg

MD

(±0.006)

0.864

(±0.006)

0.861

(±0.01)

0.86

(±0.006)

0.846

(±0.008)

0.836

(±0.006)

0.826

(±0.008)

0.767

(±0.009)

0.862

(±0.008)

0.859

(±0.005)

0.854

(±0)

0.839

(±0.011)

0.82

(±0.007)

0.8

(±0.027)

0.752

(±0.006)

0.859

(±0.007)

0.856

(±0.011)

0.852

(±0.012)

0.83

(±0.009)

0.815

(±0.004)

0.8

(±0.008)

0.754

(±0.01)

0.861

(±0.01)

0.858

(±0)

0.852

(±0.015)

0.835

(±0.01)

0.818

(±0.009)

0.804

(±0.004)

0.768

WMD

MD

(±0.006)

0.863

(±0.007)

0.863

(±0.009)

0.858

(±0.01)

0.843

(±0.007)

0.836

(±0.009)

0.821

(±0.011)

0.797

(±0.009)

0.859

(±0)

0.858

(±0.01)

0.851

(±0.015)

0.834

(±0.007)

0.806

(±0.007)

0.785

(±0.006)

0.636

NMD

Max

(±0.007)

0.864

(±0.005)

0.861

(±0.011)

0.861

(±0.009)

0.845

(±0.007)

0.836

(±0.01)

0.821

(±0.009)

0.771

(±0)

0.864

(±0.007)

0.858

(±0.007)

0.852

(±0.013)

0.84

(±0.006)

0.823

(±0.004)

0.799

(±0.033)

0.747

(±0.005)

0.86

(±0.009)

0.858

(±0.012)

0.855

(±0.012)

0.83

(±0.008)

0.814

(±0.005)

0.802

(±0)

0.753

(±0.01)

0.857

(±0.008)

0.856

(±0.005)

0.849

(±0.012)

0.83

(±0.004)

0.812

(±0.012)

0.799

(±0.009)

0.759

WMax

Max

(±0.007)

0.862

(±0.006)

0.859

(±0.009)

0.856

(±0.006)

0.845

(±0.01)

0.831

(±0)

0.816

(±0.008)

0.788

(±0.008)

0.857

(±0.006)

0.859

(±0.007)

0.846

(±0.008)

0.831

(±0.013)

0.798

(±0.01)

0.774

(±0.01)

0.621

NMax

Table A.19: Thresholding techniques for Alj-Mgz-MS Dataset (MicroF1 )

(±0.006)

0.865

(±0.005)

0.862

(±0.005)

0.859

(±0.009)

0.847

(±0)

0.839

(±0.006)

0.826

(±0.006)

0.771

(±0.007)

0.862

(±0.009)

0.859

(±0.012)

0.856

(±0.011)

0.837

(±0.007)

0.818

(±0.009)

0.793

(±0.034)

0.75

STD

(±0.007)

0.861

(±0.005)

0.856

(±0.012)

0.852

(±0)

0.827

(±0.01)

0.814

(±0.005)

0.799

(±0.013)

0.752

(±0.007)

0.862

(±0.009)

0.857

(±0.008)

0.847

(±0.013)

0.832

(±0.007)

0.818

(±0.009)

0.807

(±0.009)

0.763

WSTD

STD

(±0.007)

0.862

(±0.005)

0.862

(±0)

0.858

(±0.01)

0.844

(±0.009)

0.833

(±0.007)

0.821

(±0.003)

0.793

(±0.008)

0.859

(±0.008)

0.859

(±0.006)

0.854

(±0.013)

0.838

(±0.014)

0.815

(±0.005)

0.785

(±0.003)

0.629

NSTD

(±0.007)

0.863

(±0)

0.864

(±0.007)

0.861

(±0.01)

0.853

(±0.01)

0.837

(±0.01)

0.828

(±0.006)

0.813

(±0.008)

0.862

(±0.007)

0.861

(±0.01)

0.854

(±0.006)

0.844

(±0.007)

0.833

(±0.005)

0.823

(±0.039)

0.782

Flocal

(±0)

0.863

(±0.005)

0.863

(±0.007)

0.858

(±0.009)

0.845

(±0.011)

0.837

(±0.007)

0.821

(±0.008)

0.788

(±0.009)

0.858

(±0.008)

0.857

(±0.011)

0.856

(±0.012)

0.842

(±0.008)

0.833

(±0.007)

0.821

(±0.01)

0.788

WLocal

Local

145

MI

IG

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.856 (±0.007)

0.86

(±0.003)

0.852 (±0.009)

0.861

(±0.005)

0.851 (±0.008)

0.858

(±0.012)

0.828 (±0.01)

0.849

(±0.005)

0.811 (±0.007)

0.837

(±0.007)

0.791 (±0.01)

0.827

(±0.009)

0.742 (±0.007)

0.81

(±0.007)

(±0.01)

(±0.006)

0.863

(±0.008)

(±0.008)

0.859

0.858

(±0.006)

(±0.005)

0.861

0.854

0.858

(±0.004)

(±0.008)

(±0) 0.842

(±0.005)

0.846

0.835

(±0.008)

(±0)

0.835

0.822

(±0.006)

0.827

(±0.005)

1

0.782

0.81

0.5

Avg WAvg

Avg

Th

(±0.006)

0.853

(±0.006)

0.848

(±0.007)

0.845

(±0.007)

0.83

(±0.013)

0.814

(±0.01)

0.801

(±0.008)

0.746

(±0.004)

0.86

(±0.002)

0.856

(±0.007)

0.853

(±0)

0.846

(±0.006)

0.836

(±0.006)

0.823

(±0.006)

0.799

NAvg

(±0.005)

0.863

(±0.004)

0.86

(±0.009)

0.862

(±0.003)

0.851

(±0.007)

0.837

(±0.01)

0.829

(±0.006)

0.808

(±0.008)

0.861

(±0.006)

0.862

(±0)

0.857

(±0.007)

0.844

(±0.008)

0.833

(±0.008)

0.821

(±0.008)

0.802

MD

MD

(±0.006)

0.862

(±0.009)

0.863

(±0.005)

0.853

(±0.009)

0.847

(±0.005)

0.833

(±0.009)

0.812

(±0.007)

0.764

(±0.009)

0.857

(±0)

0.856

(±0.006)

0.853

(±0.006)

0.842

(±0.007)

0.826

(±0.005)

0.8

(±0.009)

0.75

WMD

(±0.007)

0.861

(±0.007)

0.859

(±0.008)

0.853

(±0.007)

0.838

(±0.01)

0.822

(±0.008)

0.811

(±0.005)

0.786

(±0)

0.851

(±0.01)

0.849

(±0.006)

0.844

(±0.006)

0.83

(±0.01)

0.823

(±0.005)

0.809

(±0.01)

0.785

NMD

(±0.004)

0.863

(±0.007)

0.859

(±0.009)

0.863

(±0.005)

0.849

(±0.006)

0.838

(±0.008)

0.826

(±0)

0.806

(±0.007)

0.862

(±0.008)

0.862

(±0.007)

0.857

(±0.007)

0.844

(±0.006)

0.835

(±0.007)

0.823

(±0.008)

0.801

Max

Max

0.75

WMax

(±0.01)

0.863

(±0.01)

0.86

(±0.008)

0.855

(±0.005)

0.844

(±0.003)

0.832

(±0)

0.81

(±0.007)

0.76

(±0.008)

0.857

(±0.005)

0.856

(±0.006)

0.853

(±0.009)

0.845

(±0.006)

0.827

(±0.007)

0.807

(±0.007)

Table A.19 – continued

(±0.009)

0.859

(±0.007)

0.859

(±0.002)

0.851

(±0.006)

0.838

(±0)

0.823

(±0.009)

0.81

(±0.006)

0.782

(±0.007)

0.855

(±0.008)

0.854

(±0.009)

0.848

(±0.008)

0.834

(±0.009)

0.822

(±0.006)

0.813

(±0.008)

0.788

NMax

(±0.003)

0.864

(±0.005)

0.863

(±0.009)

0.861

(±0)

0.848

(±0.006)

0.836

(±0.007)

0.824

(±0.007)

0.806

(±0.008)

0.862

(±0.006)

0.862

(±0.004)

0.859

(±0.008)

0.842

(±0.003)

0.835

(±0.006)

0.822

(±0.008)

0.8

STD

STD

(±0.008)

0.863

(±0.008)

0.86

(±0)

0.856

(±0.006)

0.843

(±0.012)

0.835

(±0.009)

0.806

(±0.011)

0.765

(±0.008)

0.858

(±0.007)

0.857

(±0.004)

0.853

(±0.01)

0.845

(±0.006)

0.827

(±0.006)

0.811

(±0.008)

0.761

WSTD

(±0.008)

0.861

(±0)

0.856

(±0.004)

0.854

(±0.01)

0.837

(±0.005)

0.822

(±0.007)

0.814

(±0.01)

0.786

(±0.008)

0.854

(±0.006)

0.852

(±0.008)

0.849

(±0.005)

0.833

(±0.008)

0.828

(±0.003)

0.811

(±0.007)

0.787

NSTD

(±0)

0.865

(±0.008)

0.865

(±0.013)

0.863

(±0.006)

0.852

(±0.011)

0.844

(±0.005)

0.836

(±0.008)

0.818

(±0.009)

0.862

(±0.005)

0.863

(±0.007)

0.859

(±0.007)

0.849

(±0.005)

0.836

(±0.006)

0.825

(±0.009)

0.862

(±0.008)

0.862

(±0.009)

0.859

(±0.008)

0.856

(±0.005)

0.843

(±0.012)

0.834

(±0.008)

0.82

(±0.009)

0.859

(±0.004)

0.86

(±0.005)

0.861

(±0.008)

0.85

(±0.008)

0.836

(±0.007)

0.829

(±0.007)

0.81

WLocal

Local

(±0.008)

0.808

Flocal

146

DF

CC

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.789 (±0.016)

(±0.018)

(±0.022)

(±0.014)

0.795

0.788

(±0.02)

(±0.019)

0.79

0.778

(±0.013)

(±0.018)

0.789

0.746

(±0.02)

(±0.02)

0.757

0.719

(±0.007)

(±0.009)

0.722

0.668

(±0.012)

(±0.02)

0.706

0.564

0.55

0.69 (±0.016)

0.765

(±0.015)

0.677 (±0.02)

0.762

(±0.022)

0.674 (±0.015)

0.755

(±0.022)

0.631 (±0.018)

0.722

(±0.014)

0.573 (±0.018)

0.666

(±0)

(±0.016)

0.515

0.62

(±0.016)

(±0.034)

(±0)

1

0.406

WAvg

Avg

0.439

Th

0.5

Avg

(±0.017)

0.79

(±0.016)

0.788

(±0.009)

0.787

(±0.019)

0.758

(±0.007)

0.726

(±0.021)

0.698

(±0.02)

0.582

(±0.019)

0.765

(±0.018)

0.763

(±0.015)

0.753

(±0.009)

0.721

(±0)

0.655

(±0.028)

0.606

(±0.031)

0.369

NAvg

MD

(±0.016)

0.798

(±0.013)

0.797

(±0.022)

0.792

(±0.018)

0.773

(±0.016)

0.76

(±0.012)

0.745

(±0.018)

0.588

(±0.019)

0.793

(±0.016)

0.796

(±0.014)

0.786

(±0)

0.768

(±0.025)

0.735

(±0.025)

0.694

(±0.037)

0.588

(±0.014)

0.784

(±0.016)

0.78

(±0.021)

0.773

(±0.014)

0.737

(±0.013)

0.712

(±0.016)

0.68

(±0.016)

0.577

(±0.019)

0.793

(±0.022)

0.789

(±0)

0.78

(±0.02)

0.751

(±0.018)

0.718

(±0.014)

0.688

(±0.027)

0.605

WMD

MD

(±0.013)

0.799

(±0.019)

0.803

(±0.014)

0.797

(±0.02)

0.781

(±0.009)

0.769

(±0.016)

0.746

(±0.02)

0.7

(±0.021)

0.795

(±0)

0.795

(±0.02)

0.785

(±0.036)

0.77

(±0.011)

0.73

(±0.02)

0.69

(±0.024)

0.497

NMD

Max

(±0.017)

0.798

(±0.011)

0.796

(±0.021)

0.795

(±0.017)

0.771

(±0.013)

0.76

(±0.016)

0.735

(±0.018)

0.596

(±0)

0.799

(±0.014)

0.792

(±0.018)

0.782

(±0.015)

0.768

(±0.012)

0.737

(±0.011)

0.692

(±0.039)

0.583

(±0.011)

0.788

(±0.019)

0.781

(±0.026)

0.78

(±0.019)

0.741

(±0.017)

0.711

(±0.013)

0.688

(±0)

0.562

(±0.016)

0.785

(±0.012)

0.782

(±0.011)

0.772

(±0.019)

0.737

(±0.004)

0.696

(±0.021)

0.67

(±0.028)

0.568

WMax

Max

(±0.016)

0.797

(±0.017)

0.798

(±0.013)

0.794

(±0.019)

0.784

(±0.015)

0.763

(±0)

0.743

(±0.015)

0.692

(±0.023)

0.795

(±0.01)

0.796

(±0.018)

0.779

(±0.018)

0.764

(±0.018)

0.712

(±0.017)

0.677

(±0.037)

0.484

NMax

Table A.20: Thresholding techniques for Alj-Mgz-MS Dataset (MacroF1 )

STD

(±0.015)

0.798

(±0.013)

0.796

(±0.013)

0.794

(±0.019)

0.772

(±0)

0.766

(±0.012)

0.739

(±0.01)

0.594

(±0.014)

0.791

(±0.017)

0.791

(±0.024)

0.788

(±0.018)

0.762

(±0.012)

0.736

(±0.016)

0.679

(±0.038)

0.581

(±0.016)

0.789

(±0.014)

0.779

(±0.027)

0.779

(±0)

0.738

(±0.017)

0.711

(±0.008)

0.682

(±0.024)

0.568

(±0.018)

0.793

(±0.02)

0.788

(±0.019)

0.771

(±0.021)

0.747

(±0.019)

0.723

(±0.018)

0.701

(±0.031)

0.598

WSTD

STD

(±0.012)

0.798

(±0.017)

0.801

(±0)

0.798

(±0.02)

0.783

(±0.015)

0.766

(±0.013)

0.747

(±0.005)

0.697

(±0.019)

0.795

(±0.015)

0.796

(±0.015)

0.787

(±0.016)

0.767

(±0.027)

0.736

(±0.019)

0.686

(±0.01)

0.481

NSTD

(±0.018)

0.798

(±0)

0.801

(±0.018)

0.8

(±0.02)

0.789

(±0.019)

0.762

(±0.019)

0.754

(±0.013)

0.729

(±0.017)

0.797

(±0.016)

0.796

(±0.023)

0.786

(±0.011)

0.772

(±0.012)

0.75

(±0.007)

0.744

(±0.041)

0.649

Flocal

(±0)

0.793

(±0.012)

0.793

(±0.019)

0.786

(±0.016)

0.777

(±0.022)

0.766

(±0.007)

0.737

(±0.016)

0.649

(±0.017)

0.791

(±0.022)

0.789

(±0.021)

0.785

(±0.026)

0.767

(±0.02)

0.742

(±0.017)

0.725

(±0.011)

0.653

WLocal

Local

147

MI

IG

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.776 (±0.013)

0.797

(±0.007)

0.771 (±0.016)

0.801

(±0.014)

0.765 (±0.017)

0.8

(±0.028)

0.72 (±0.018)

0.786

(±0.01)

0.671 (±0.022)

0.767

(±0.007)

0.606 (±0.014)

0.755

(±0.016)

0.445 (±0.012)

0.712

(±0.01)

(±0.011)

(±0.01)

0.799

(±0.014)

(±0.022)

0.796

0.794

(±0.013)

(±0.008)

0.801

0.787

(±0.01)

(±0.015)

0.798

0.77

(±0)

(±0.009)

0.786

0.761

(±0.013)

(±0)

0.765

0.737

(±0.014)

0.751

(±0.013)

1

0.623

0.713

0.5

Avg WAvg

Avg

Th

(±0.015)

0.791

(±0.011)

0.786

(±0.016)

0.783

(±0.014)

0.767

(±0.013)

0.745

(±0.016)

0.725

(±0.011)

0.668

(±0.006)

0.798

(±0.011)

0.797

(±0.014)

0.79

(±0)

0.785

(±0.017)

0.769

(±0.007)

0.746

(±0.015)

0.703

NAvg

(±0.006)

0.801

(±0.01)

0.8

(±0.022)

0.802

(±0.012)

0.788

(±0.015)

0.769

(±0.016)

0.759

(±0.01)

0.714

(±0.017)

0.799

(±0.013)

0.801

(±0)

0.796

(±0.021)

0.781

(±0.014)

0.762

(±0.016)

0.75

(±0.015)

0.7

MD

MD

(±0.014)

0.8

(±0.021)

0.799

(±0.018)

0.777

(±0.021)

0.774

(±0.006)

0.745

(±0.012)

0.691

(±0.013)

0.561

(±0.012)

0.788

(±0)

0.789

(±0.009)

0.781

(±0.02)

0.764

(±0.01)

0.738

(±0.013)

0.672

(±0.019)

0.523

WMD

(±0.018)

0.799

(±0.012)

0.797

(±0.021)

0.793

(±0.009)

0.776

(±0.016)

0.753

(±0.014)

0.737

(±0.009)

0.696

(±0)

0.79

(±0.02)

0.791

(±0.014)

0.786

(±0.014)

0.766

(±0.021)

0.761

(±0.02)

0.733

(±0.018)

0.697

NMD

(±0.006)

0.8

(±0.015)

0.798

(±0.021)

0.802

(±0.019)

0.785

(±0.014)

0.77

(±0.014)

0.756

(±0)

0.706

(±0.015)

0.798

(±0.015)

0.801

(±0.013)

0.797

(±0.013)

0.78

(±0.016)

0.764

(±0.015)

0.747

(±0.018)

0.696

Max

Max

0.528

WMax

(±0.016)

0.799

(±0.022)

0.794

(±0.018)

0.783

(±0.012)

0.766

(±0.006)

0.74

(±0)

0.68

(±0.015)

0.544

(±0.012)

0.79

(±0.013)

0.789

(±0.011)

0.782

(±0.012)

0.769

(±0.011)

0.74

(±0.015)

0.69

(±0.018)

Table A.20 – continued

(±0.016)

0.797

(±0.012)

0.797

(±0.008)

0.79

(±0.016)

0.776

(±0)

0.756

(±0.015)

0.736

(±0.014)

0.694

(±0.013)

0.795

(±0.013)

0.796

(±0.023)

0.79

(±0.013)

0.768

(±0.018)

0.756

(±0.028)

0.74

(±0.015)

0.696

NMax

(±0.009)

0.802

(±0.013)

0.802

(±0.015)

0.801

(±0)

0.787

(±0.016)

0.766

(±0.01)

0.751

(±0.014)

0.704

(±0.011)

0.797

(±0.012)

0.801

(±0.007)

0.797

(±0.019)

0.778

(±0.006)

0.766

(±0.01)

0.746

(±0.017)

0.695

STD

STD

(±0.013)

0.8

(±0.019)

0.795

(±0)

0.786

(±0.019)

0.769

(±0.021)

0.75

(±0.011)

0.686

(±0.018)

0.582

(±0.01)

0.792

(±0.012)

0.791

(±0.011)

0.783

(±0.016)

0.77

(±0.007)

0.744

(±0.016)

0.705

(±0.019)

0.553

WSTD

(±0.012)

0.8

(±0)

0.795

(±0.009)

0.793

(±0.022)

0.775

(±0.012)

0.755

(±0.006)

0.739

(±0.018)

0.696

(±0.008)

0.793

(±0.009)

0.792

(±0.021)

0.79

(±0.006)

0.77

(±0.016)

0.765

(±0.017)

0.742

(±0.016)

0.698

NSTD

(±0)

0.802

(±0.015)

0.805

(±0.03)

0.804

(±0.01)

0.79

(±0.021)

0.782

(±0.01)

0.773

(±0.011)

0.741

(±0.013)

0.798

(±0.008)

0.803

(±0.014)

0.8

(±0.016)

0.792

(±0.007)

0.775

(±0.01)

0.762

(±0.013)

0.8

(±0.018)

0.799

(±0.014)

0.798

(±0.017)

0.791

(±0.011)

0.773

(±0.015)

0.758

(±0.014)

0.737

(±0.013)

0.794

(±0.008)

0.796

(±0.011)

0.799

(±0.016)

0.786

(±0.007)

0.767

(±0.011)

0.764

(±0.016)

0.73

WLocal

Local

(±0.017)

0.735

Flocal

148

DF

CC

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.817 (±0.018)

(±0.014)

(±0.014)

(±0.051)

0.818

0.8

(±0.043)

(±0.029)

0.801

0.764

(±0.027)

(±0.064)

0.76

0.675

(±0.074)

(±0.048)

0.668

0.627

(±0.16)

(±0.093)

0.613

0.568

(±0.007)

(±0.009)

0.552

0.432

0.457

0.711 (±0.009)

0.77

(±0.014)

0.707 (±0.025)

0.755

(±0.017)

0.686 (±0.074)

0.729

(±0.055)

0.644 (±0.057)

0.655

(±0.15)

0.598 (±0.072)

0.583

(±0)

(±0.07)

0.566

0.539

(±0.105)

(±0.138)

(±0)

1

0.466

WAvg

Avg

0.471

Th

0.5

Avg

(±0.006)

0.815

(±0.022)

0.795

(±0.018)

0.727

(±0.059)

0.662

(±0.065)

0.605

(±0.138)

0.551

(±0.006)

0.367

(±0.011)

0.775

(±0.024)

0.748

(±0.041)

0.701

(±0.056)

0.577

(±0)

0.493

(±0.124)

0.408

(±0.121)

0.351

NAvg

MD

(±0.007)

0.823

(±0.009)

0.811

(±0.041)

0.801

(±0.062)

0.77

(±0.066)

0.694

(±0.09)

0.659

(±0.007)

0.599

(±0.007)

0.818

(±0.016)

0.811

(±0.034)

0.792

(±0)

0.753

(±0.105)

0.693

(±0.072)

0.633

(±0.225)

0.544

(±0.003)

0.813

(±0.014)

0.809

(±0.047)

0.785

(±0.057)

0.737

(±0.068)

0.684

(±0.148)

0.651

(±0.001)

0.543

(±0.007)

0.817

(±0.018)

0.809

(±0)

0.787

(±0.083)

0.744

(±0.058)

0.702

(±0.2)

0.649

(±0.129)

0.558

WMD

MD

(±0.007)

0.826

(±0.024)

0.821

(±0.034)

0.802

(±0.062)

0.779

(±0.074)

0.76

(±0.068)

0.714

(±0.005)

0.627

(±0.008)

0.823

(±0)

0.807

(±0.039)

0.785

(±0.038)

0.675

(±0.155)

0.604

(±0.101)

0.498

(±0.094)

0.336

NMD

Max

(±0.012)

0.823

(±0.01)

0.815

(±0.031)

0.798

(±0.061)

0.743

(±0.05)

0.697

(±0.075)

0.655

(±0.01)

0.59

(±0)

0.821

(±0.031)

0.808

(±0.027)

0.789

(±0.136)

0.739

(±0.065)

0.685

(±0.082)

0.617

(±0.274)

0.537

(±0.004)

0.819

(±0.014)

0.805

(±0.044)

0.782

(±0.047)

0.735

(±0.06)

0.683

(±0.095)

0.652

(±0)

0.475

(±0.024)

0.814

(±0.021)

0.803

(±0.056)

0.786

(±0.043)

0.737

(±0.065)

0.693

(±0.192)

0.648

(±0.119)

0.533

WMax

Max

(±0.009)

0.828

(±0.026)

0.821

(±0.023)

0.798

(±0.047)

0.756

(±0.053)

0.711

(±0)

0.665

(±0.151)

0.505

(±0.021)

0.817

(±0.037)

0.798

(±0.018)

0.77

(±0.045)

0.654

(±0.131)

0.566

(±0.112)

0.488

(±0.12)

0.337

NMax

Table A.21: Thresholding techniques for Alj-Mgz-MR Dataset (MicroF1 )

STD

(±0.012)

0.819

(±0.009)

0.813

(±0.03)

0.799

(±0.047)

0.764

(±0)

0.692

(±0.121)

0.671

(±0.16)

0.594

(±0.023)

0.823

(±0.007)

0.815

(±0.019)

0.788

(±0.093)

0.74

(±0.073)

0.683

(±0.086)

0.64

(±0.274)

0.543

(±0.003)

0.815

(±0.009)

0.803

(±0.031)

0.779

(±0)

0.737

(±0.1)

0.69

(±0.112)

0.646

(±0.182)

0.468

(±0.006)

0.818

(±0.008)

0.807

(±0.023)

0.781

(±0.055)

0.729

(±0.075)

0.671

(±0.175)

0.621

(±0.136)

0.521

WSTD

STD

(±0.004)

0.826

(±0.011)

0.821

(±0)

0.805

(±0.084)

0.778

(±0.093)

0.727

(±0.126)

0.709

(±0.093)

0.643

(±0.006)

0.821

(±0.006)

0.81

(±0.018)

0.784

(±0.051)

0.686

(±0.143)

0.594

(±0.083)

0.528

(±0.123)

0.418

NSTD

(±0.004)

0.831

(±0)

0.824

(±0.035)

0.811

(±0.08)

0.768

(±0.106)

0.729

(±0.076)

0.698

(±0.106)

0.631

(±0.003)

0.823

(±0.012)

0.819

(±0.019)

0.798

(±0.109)

0.757

(±0.078)

0.721

(±0.088)

0.673

(±0.208)

0.552

Flocal

(±0)

0.827

(±0.013)

0.823

(±0.03)

0.797

(±0.088)

0.762

(±0.067)

0.72

(±0.064)

0.701

(±0.08)

0.599

(±0.006)

0.82

(±0.013)

0.812

(±0.03)

0.798

(±0.049)

0.759

(±0.09)

0.715

(±0.171)

0.677

(±0.127)

0.584

WLocal

Local

149

MI

IG

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.796 (±0.008)

0.824

(±0.038)

0.783 (±0.033)

0.816

(±0.019)

0.769 (±0.037)

0.806

(±0.043)

0.741 (±0.048)

0.778

(±0.047)

0.693 (±0.111)

0.747

(±0.06)

0.662 (±0.006)

0.707

(±0.004)

0.546 (±0.006)

0.673

(±0.018)

(±0.012)

(±0.012)

0.825

(±0.024)

(±0.039)

0.829

0.812

(±0.027)

(±0.031)

0.822

0.795

0.808

(±0.041)

(±0.037)

(±0) 0.757

(±0.057)

0.773

0.73

(±0.063)

(±0)

0.734

0.706

(±0.005)

0.716

(±0.006)

1

0.599

0.662

0.5

Avg WAvg

Avg

Th

(±0.017)

0.816

(±0.024)

0.794

(±0.042)

0.753

(±0.047)

0.67

(±0.153)

0.602

(±0.003)

0.5

(±0.012)

0.444

(±0.018)

0.824

(±0.024)

0.817

(±0.031)

0.796

(±0)

0.778

(±0.049)

0.761

(±0.096)

0.723

(±0.007)

0.617

NAvg

(±0.011)

0.83

(±0.021)

0.822

(±0.042)

0.811

(±0.055)

0.786

(±0.061)

0.746

(±0.003)

0.718

(±0.011)

0.677

(±0.007)

0.827

(±0.015)

0.816

(±0)

0.804

(±0.043)

0.777

(±0.051)

0.726

(±0.081)

0.711

(±0.006)

0.658

MD

MD

(±0.01)

0.825

(±0.035)

0.806

(±0.036)

0.794

(±0.046)

0.761

(±0.085)

0.743

(±0.004)

0.713

(±0.002)

0.583

(±0.008)

0.815

(±0)

0.802

(±0.026)

0.781

(±0.044)

0.748

(±0.045)

0.718

(±0.055)

0.658

(±0.006)

0.523

WMD

(±0.024)

0.818

(±0.023)

0.807

(±0.041)

0.797

(±0.079)

0.762

(±0.122)

0.733

(±0.001)

0.706

(±0.004)

0.59

(±0)

0.808

(±0.011)

0.796

(±0.036)

0.787

(±0.027)

0.763

(±0.044)

0.742

(±0.116)

0.728

(±0.002)

0.59

NMD

(±0.013)

0.828

(±0.025)

0.819

(±0.048)

0.81

(±0.059)

0.784

(±0.068)

0.747

(±0.009)

0.721

(±0)

0.679

(±0.004)

0.827

(±0.018)

0.821

(±0.019)

0.802

(±0.042)

0.783

(±0.078)

0.731

(±0.127)

0.713

(±0.006)

0.657

Max

Max

0.524

WMax

(±0.01)

0.82

(±0.029)

0.802

(±0.044)

0.79

(±0.038)

0.76

(±0.063)

0.743

(±0)

0.716

(±0.005)

0.567

(±0.009)

0.817

(±0.014)

0.806

(±0.025)

0.783

(±0.051)

0.752

(±0.033)

0.723

(±0.058)

0.676

(±0.004)

Table A.21 – continued

(±0.018)

0.818

(±0.025)

0.808

(±0.025)

0.793

(±0.039)

0.761

(±0)

0.734

(±0.056)

0.691

(±0.002)

0.473

(±0.003)

0.815

(±0.014)

0.802

(±0.039)

0.792

(±0.033)

0.767

(±0.045)

0.751

(±0.12)

0.726

(±0.003)

0.56

NMax

(±0.01)

0.828

(±0.015)

0.825

(±0.029)

0.809

(±0)

0.772

(±0.051)

0.74

(±0.104)

0.713

(±0.006)

0.67

(±0.008)

0.827

(±0.02)

0.82

(±0.027)

0.802

(±0.041)

0.776

(±0.071)

0.736

(±0.146)

0.71

(±0.006)

0.663

STD

STD

(±0.008)

0.821

(±0.018)

0.811

(±0)

0.797

(±0.033)

0.767

(±0.065)

0.72

(±0.158)

0.654

(±0.004)

0.598

(±0.013)

0.82

(±0.02)

0.81

(±0.024)

0.784

(±0.048)

0.747

(±0.034)

0.724

(±0.06)

0.689

(±0.004)

0.547

WSTD

(±0.009)

0.823

(±0)

0.815

(±0.02)

0.798

(±0.054)

0.763

(±0.161)

0.731

(±0.061)

0.692

(±0.005)

0.565

(±0.017)

0.816

(±0.013)

0.802

(±0.034)

0.791

(±0.029)

0.768

(±0.047)

0.746

(±0.109)

0.726

(±0.005)

0.591

NSTD

(±0)

0.832

(±0.012)

0.83

(±0.027)

0.823

(±0.109)

0.801

(±0.048)

0.779

(±0.098)

0.747

(±0.006)

0.676

(±0.007)

0.831

(±0.02)

0.824

(±0.028)

0.812

(±0.041)

0.789

(±0.049)

0.775

(±0.129)

0.744

(±0.007)

0.834

(±0.018)

0.829

(±0.075)

0.818

(±0.038)

0.8

(±0.045)

0.779

(±0.089)

0.759

(±0.006)

0.69

(±0.011)

0.832

(±0.018)

0.826

(±0.023)

0.807

(±0.05)

0.793

(±0.035)

0.768

(±0.063)

0.745

(±0.004)

0.678

WLocal

Local

(±0.004)

0.677

Flocal

150

DF

CC

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.739 (±0.025)

(±0.024)

(±0.036)

(±0.068)

0.744

0.701

(±0.095)

(±0.079)

0.701

0.62

(±0.027)

(±0.15)

0.616

0.484

(±0.161)

(±0.071)

0.485

0.392

(±0.181)

(±0.159)

0.376

0.329

(±0.009)

(±0.016)

0.319

0.224

0.247

0.513 (±0.021)

0.674

(±0.024)

0.501 (±0.058)

0.642

(±0.034)

0.467 (±0.109)

0.592

(±0.113)

0.387 (±0.097)

0.471

(±0.199)

0.321 (±0.135)

0.375

(±0.11)

0.29 (±0)

0.341

(±0.138)

(±0.128)

(±0)

1

0.231

WAvg

Avg

0.286

Th

0.5

Avg

(±0.013)

0.748

(±0.045)

0.702

(±0.022)

0.605

(±0.15)

0.473

(±0.151)

0.365

(±0.169)

0.324

(±0.016)

0.176

(±0.023)

0.696

(±0.036)

0.662

(±0.071)

0.598

(±0.131)

0.432

(±0)

0.327

(±0.14)

0.256

(±0.085)

0.216

NAvg

MD

(±0.011)

0.743

(±0.014)

0.712

(±0.099)

0.677

(±0.15)

0.596

(±0.14)

0.439

(±0.153)

0.391

(±0.013)

0.33

(±0.017)

0.737

(±0.027)

0.721

(±0.083)

0.674

(±0)

0.589

(±0.141)

0.475

(±0.062)

0.402

(±0.167)

0.327

(±0.006)

0.731

(±0.038)

0.719

(±0.103)

0.664

(±0.11)

0.552

(±0.154)

0.426

(±0.174)

0.382

(±0.007)

0.29

(±0.013)

0.733

(±0.038)

0.712

(±0)

0.654

(±0.122)

0.555

(±0.065)

0.448

(±0.209)

0.383

(±0.141)

0.289

WMD

MD

(±0.016)

0.757

(±0.047)

0.747

(±0.049)

0.71

(±0.153)

0.694

(±0.161)

0.673

(±0.099)

0.607

(±0.015)

0.453

(±0.019)

0.747

(±0)

0.721

(±0.06)

0.686

(±0.049)

0.534

(±0.188)

0.427

(±0.175)

0.325

(±0.12)

0.237

NMD

Max

(±0.017)

0.745

(±0.013)

0.724

(±0.08)

0.678

(±0.149)

0.557

(±0.079)

0.436

(±0.129)

0.389

(±0.028)

0.33

(±0)

0.74

(±0.041)

0.714

(±0.051)

0.66

(±0.156)

0.567

(±0.139)

0.479

(±0.163)

0.394

(±0.228)

0.329

(±0.012)

0.739

(±0.037)

0.716

(±0.1)

0.657

(±0.08)

0.545

(±0.136)

0.423

(±0.166)

0.389

(±0)

0.26

(±0.028)

0.72

(±0.042)

0.696

(±0.066)

0.65

(±0.088)

0.515

(±0.146)

0.438

(±0.21)

0.385

(±0.147)

0.289

WMax

Max

(±0.018)

0.756

(±0.057)

0.742

(±0.034)

0.701

(±0.125)

0.645

(±0.139)

0.54

(±0)

0.448

(±0.174)

0.326

(±0.033)

0.745

(±0.041)

0.717

(±0.037)

0.671

(±0.107)

0.508

(±0.186)

0.405

(±0.181)

0.317

(±0.126)

0.236

NMax

Table A.22: Thresholding techniques for Alj-Mgz-MR Dataset (MacroF1 )

STD

(±0.021)

0.743

(±0.012)

0.717

(±0.057)

0.679

(±0.134)

0.584

(±0)

0.434

(±0.184)

0.41

(±0.178)

0.33

(±0.02)

0.742

(±0.018)

0.724

(±0.048)

0.671

(±0.129)

0.555

(±0.14)

0.458

(±0.167)

0.41

(±0.225)

0.312

(±0.009)

0.731

(±0.013)

0.708

(±0.078)

0.647

(±0)

0.543

(±0.178)

0.436

(±0.18)

0.388

(±0.19)

0.258

(±0.012)

0.731

(±0.019)

0.709

(±0.031)

0.647

(±0.113)

0.55

(±0.156)

0.427

(±0.203)

0.371

(±0.152)

0.297

WSTD

STD

(±0.011)

0.753

(±0.033)

0.746

(±0)

0.713

(±0.14)

0.678

(±0.171)

0.593

(±0.199)

0.576

(±0.16)

0.482

(±0.014)

0.748

(±0.016)

0.724

(±0.045)

0.681

(±0.127)

0.546

(±0.183)

0.413

(±0.155)

0.332

(±0.146)

0.263

NSTD

(±0.014)

0.764

(±0)

0.748

(±0.067)

0.72

(±0.132)

0.638

(±0.19)

0.529

(±0.161)

0.471

(±0.158)

0.411

(±0.011)

0.75

(±0.023)

0.738

(±0.053)

0.693

(±0.145)

0.623

(±0.15)

0.575

(±0.171)

0.517

(±0.206)

0.356

Flocal

(±0)

0.754

(±0.026)

0.745

(±0.063)

0.682

(±0.148)

0.583

(±0.16)

0.472

(±0.152)

0.463

(±0.114)

0.338

(±0.01)

0.742

(±0.029)

0.72

(±0.037)

0.68

(±0.11)

0.584

(±0.179)

0.501

(±0.206)

0.469

(±0.152)

0.362

WLocal

Local

151

MI

IG

Metric

10

7.5

5

2.5

1.5

1

0.5

10

7.5

5

2.5

1.5

0.656 (±0.011)

0.749

(±0.033)

0.616 (±0.105)

0.738

(±0.033)

0.57 (±0.039)

0.71

(±0.136)

0.475 (±0.114)

0.669

(±0.045)

0.421 (±0.126)

0.592

(±0.133)

0.394 (±0.012)

0.47

(±0.009)

0.287 (±0.006)

0.437

(±0.047)

(±0.011)

(±0.015)

0.749

(±0.027)

(±0.102)

0.754

0.719

(±0.033)

(±0.039)

0.744

0.662

0.714

(±0.075)

(±0.047)

(±0) 0.557

(±0.114)

0.64

0.483

(±0.119)

(±0)

0.523

0.461

(±0.01)

0.477

(±0.013)

1

0.338

0.429

0.5

Avg WAvg

Avg

Th

(±0.051)

0.75

(±0.025)

0.716

(±0.104)

0.678

(±0.11)

0.615

(±0.135)

0.547

(±0.009)

0.396

(±0.025)

0.321

(±0.02)

0.753

(±0.032)

0.74

(±0.043)

0.706

(±0)

0.69

(±0.136)

0.668

(±0.135)

0.605

(±0.016)

0.444

NAvg

(±0.012)

0.758

(±0.035)

0.744

(±0.128)

0.722

(±0.05)

0.683

(±0.123)

0.554

(±0.01)

0.526

(±0.011)

0.444

(±0.012)

0.759

(±0.024)

0.735

(±0)

0.709

(±0.119)

0.672

(±0.124)

0.543

(±0.123)

0.528

(±0.011)

0.432

MD

MD

(±0.015)

0.74

(±0.116)

0.687

(±0.039)

0.655

(±0.13)

0.549

(±0.143)

0.49

(±0.008)

0.462

(±0.009)

0.305

(±0.007)

0.73

(±0)

0.7

(±0.056)

0.644

(±0.133)

0.55

(±0.072)

0.47

(±0.114)

0.388

(±0.008)

0.258

WMD

(±0.063)

0.749

(±0.025)

0.731

(±0.106)

0.712

(±0.169)

0.68

(±0.124)

0.651

(±0.011)

0.629

(±0.01)

0.467

(±0)

0.738

(±0.019)

0.714

(±0.102)

0.704

(±0.037)

0.68

(±0.1)

0.66

(±0.162)

0.642

(±0.007)

0.476

NMD

(±0.013)

0.756

(±0.062)

0.741

(±0.133)

0.72

(±0.058)

0.679

(±0.122)

0.557

(±0.02)

0.532

(±0)

0.444

(±0.005)

0.758

(±0.05)

0.741

(±0.023)

0.709

(±0.098)

0.678

(±0.154)

0.543

(±0.119)

0.53

(±0.015)

0.432

Max

Max

0.26

WMax

(±0.013)

0.731

(±0.086)

0.679

(±0.049)

0.639

(±0.052)

0.535

(±0.124)

0.488

(±0)

0.461

(±0.007)

0.304

(±0.017)

0.736

(±0.017)

0.705

(±0.042)

0.65

(±0.129)

0.56

(±0.029)

0.474

(±0.118)

0.412

(±0.006)

Table A.22 – continued

(±0.045)

0.749

(±0.025)

0.73

(±0.035)

0.709

(±0.081)

0.675

(±0)

0.654

(±0.123)

0.615

(±0.004)

0.414

(±0.007)

0.744

(±0.02)

0.721

(±0.097)

0.709

(±0.038)

0.682

(±0.102)

0.666

(±0.166)

0.638

(±0.008)

0.457

NMax

(±0.011)

0.754

(±0.021)

0.748

(±0.045)

0.717

(±0)

0.633

(±0.135)

0.544

(±0.104)

0.487

(±0.014)

0.436

(±0.01)

0.758

(±0.046)

0.742

(±0.029)

0.709

(±0.099)

0.67

(±0.148)

0.55

(±0.131)

0.529

(±0.011)

0.434

STD

STD

(±0.013)

0.735

(±0.028)

0.715

(±0)

0.663

(±0.075)

0.587

(±0.098)

0.472

(±0.171)

0.378

(±0.009)

0.327

(±0.025)

0.736

(±0.021)

0.705

(±0.04)

0.648

(±0.129)

0.544

(±0.033)

0.476

(±0.116)

0.444

(±0.004)

0.283

WSTD

(±0.01)

0.752

(±0)

0.743

(±0.035)

0.71

(±0.105)

0.68

(±0.17)

0.65

(±0.136)

0.615

(±0.007)

0.461

(±0.018)

0.739

(±0.017)

0.72

(±0.091)

0.708

(±0.04)

0.682

(±0.107)

0.662

(±0.16)

0.642

(±0.012)

0.476

NSTD

(±0)

0.767

(±0.017)

0.768

(±0.091)

0.754

(±0.093)

0.709

(±0.108)

0.688

(±0.121)

0.657

(±0.006)

0.505

(±0.008)

0.768

(±0.045)

0.755

(±0.032)

0.735

(±0.104)

0.705

(±0.113)

0.685

(±0.119)

0.656

(±0.011)

0.767

(±0.058)

0.758

(±0.063)

0.737

(±0.096)

0.7

(±0.118)

0.671

(±0.107)

0.612

(±0.013)

0.51

(±0.022)

0.764

(±0.022)

0.756

(±0.039)

0.722

(±0.131)

0.699

(±0.036)

0.669

(±0.115)

0.611

(±0.015)

0.463

WLocal

Local

(±0.017)

0.517

Flocal

Appendix B

Combining Operators Results

152

Table B.1: Combining operators using 20NG Dataset (FLocal) Threshold (%) Th UTh ITh

M1-M2

0.5 1 1.5 2.5 5 7.5 10

0.8 1.4 2.1 3.4 6.4 9.5 12.6

0.2 0.6 0.9 1.6 3.6 5.5 7.4

49.3 55.6 57.7 65.4 71.9 73.9 74.1

0.5 1 1.5 2.5 5 7.5 10

0.8 1.6 2.3 3.7 7.0 10.3 13.6

0.2 0.4 0.7 1.3 3.0 4.7 6.4

36.8 45.0 46.7 53.3 59.2 62.8 64.4

0.5 1 1.5 2.5 5 7.5 10

0.8 1.5 2.2 3.5 6.6 9.7 12.7

0.2 0.5 0.8 1.5 3.4 5.3 7.3

45.6 51.2 54.7 61.3 67.9 70.2 72.6

0.5 1 1.5 2.5 5 7.5 10

0.7 1.5 2.2 3.5 6.7 9.8 12.7

0.3 0.5 0.8 1.5 3.3 5.2 7.3

51.1 53.5 54.9 59.3 65.8 69.3 72.8

0.5 1 1.5 2.5 5 7.5 10

0.7 1.3 2.0 3.2 6.1 8.8 11.5

0.3 0.7 1.0 1.8 3.9 6.2 8.5

60.5 66.1 69.5 73.0 78.9 82.1 84.9

0.5 1 1.5 2.5 5 7.5 10

0.6 1.2 1.8 2.9 5.8 8.7 11.3

0.4 0.8 1.2 2.1 4.2 6.3 8.7

81.2 83.7 81.2 83.8 83.6 84.1 87.3

Similarity (%) M1-UCD M1-UCM CC-DF 62.2 69.3 66.8 69.1 67.4 72.2 73.9 76.8 78.6 80.6 79.8 81.4 81.3 81.3 CC-IG 78.5 59.8 82.6 62.1 81.6 63.3 85.7 67.7 87.1 71.5 88.9 72.8 88.3 74.2 CC-MI 78.9 62.7 78.4 62.4 78.6 64.6 82.8 70.1 84.9 73.6 86.1 76.6 87.0 77.5 DF-IG 99.1 73.3 99.6 77.1 99.7 76.1 99.7 79.2 99.3 83.7 99.5 83.4 99.5 88.0 DF-MI 100.0 77.2 100.0 79.4 99.4 82.0 99.8 84.0 99.7 86.8 99.7 89.5 99.7 92.1 IG-MI 84.7 91.3 85.2 89.6 82.8 88.4 85.0 90.5 85.2 89.1 85.8 90.2 88.0 92.0

153

microF1 UN UCD

M1

M2

UCM

INT

0.653 0.709 0.737 0.757 0.774 0.781 0.785

0.671 0.717 0.737 0.752 0.775 0.783 0.786

0.691 0.732 0.750 0.766 0.777 0.785 0.790

0.644 0.700 0.717 0.748 0.771 0.781 0.785

0.665 0.722 0.739 0.757 0.775 0.782 0.787

0.613 0.689 0.711 0.743 0.766 0.780 0.781

0.653 0.709 0.737 0.757 0.774 0.781 0.785

0.725 0.755 0.770 0.786 0.791 0.796 0.798

0.737 0.758 0.774 0.782 0.788 0.792 0.794

0.682 0.722 0.746 0.765 0.777 0.783 0.789

0.710 0.752 0.766 0.776 0.788 0.792 0.793

0.611 0.687 0.725 0.760 0.776 0.788 0.790

0.653 0.709 0.737 0.757 0.774 0.781 0.785

0.719 0.745 0.761 0.775 0.785 0.790 0.794

0.726 0.754 0.771 0.774 0.786 0.788 0.794

0.679 0.717 0.743 0.758 0.777 0.781 0.787

0.710 0.743 0.759 0.771 0.784 0.790 0.793

0.622 0.693 0.719 0.753 0.772 0.785 0.786

0.671 0.717 0.737 0.752 0.775 0.783 0.786

0.725 0.755 0.770 0.786 0.791 0.796 0.798

0.728 0.747 0.765 0.779 0.787 0.790 0.792

0.672 0.718 0.737 0.753 0.775 0.782 0.787

0.707 0.738 0.758 0.769 0.784 0.785 0.793

0.649 0.719 0.737 0.756 0.778 0.785 0.791

0.671 0.717 0.737 0.752 0.775 0.783 0.786

0.719 0.745 0.761 0.775 0.785 0.790 0.794

0.720 0.744 0.758 0.772 0.784 0.787 0.791

0.670 0.717 0.735 0.753 0.775 0.782 0.786

0.708 0.734 0.755 0.767 0.783 0.789 0.792

0.659 0.718 0.737 0.755 0.775 0.785 0.789

0.725 0.755 0.770 0.786 0.791 0.796 0.798

0.719 0.745 0.761 0.775 0.785 0.790 0.794

0.730 0.754 0.771 0.785 0.790 0.795 0.796

0.719 0.744 0.762 0.774 0.785 0.790 0.795

0.717 0.749 0.762 0.779 0.785 0.790 0.794

0.707 0.742 0.762 0.776 0.787 0.791 0.797

Table B.2: Combining operators using Alj-News-AS Dataset (FLocal) Threshold (%) Th UTh ITh

M1-M2

0.5 1 1.5 2.5 5 7.5 10

0.8 1.4 2.0 3.3 6.7 10.2 13.3

0.2 0.6 1.0 1.7 3.3 4.8 6.7

44.0 62.4 67.1 68.0 65.9 64.2 66.7

0.5 1 1.5 2.5 5 7.5 10

0.8 1.4 2.0 3.0 5.5 8.0 10.7

0.2 0.6 1.0 2.0 4.5 7.0 9.3

47.6 61.8 69.9 80.0 89.8 92.8 93.3

0.5 1 1.5 2.5 5 7.5 10

0.7 1.3 1.9 3.0 5.8 9.0 11.7

0.3 0.7 1.1 2.0 4.2 6.0 8.3

54.8 70.9 75.6 81.9 83.3 80.6 83.2

0.5 1 1.5 2.5 5 7.5 10

0.8 1.5 2.2 3.6 7.0 10.3 13.7

0.2 0.5 0.8 1.4 3.0 4.7 6.3

41.5 46.4 50.6 54.9 60.7 63.0 62.9

0.5 1 1.5 2.5 5 7.5 10

0.8 1.4 2.1 3.3 6.2 9.1 12.1

0.3 0.6 0.9 1.7 3.8 5.9 7.9

50.0 60.7 61.8 69.5 76.4 79.3 78.8

0.5 1 1.5 2.5 5 7.5 10

0.6 1.2 1.8 3.0 5.9 9.0 12.2

0.4 0.8 1.2 2.0 4.1 6.0 7.8

85.7 80.7 80.3 81.7 82.3 80.2 78.5

Similarity (%) M1-UCD M1-UCM CC-DF 48.8 67.1 64.8 72.7 73.6 74.8 71.0 78.3 65.9 77.4 64.8 73.5 66.7 79.7 CC-IG 97.6 73.8 99.4 81.8 98.8 85.8 99.5 91.2 99.6 95.7 98.8 97 97.4 96.2 CC-MI 91.7 75 96.4 84.8 97.2 89.8 93.1 88.8 86.2 89.1 82.0 85.6 83.7 87.6 DF-IG 100.0 72 100.0 75.9 99.6 76.7 99.3 77.1 100.0 82.1 100.0 86 100.0 80 DF-MI 100.0 76.8 100.0 81.5 100.0 84.2 99.3 83.4 100.0 88.3 99.8 89.7 100.0 89 IG-MI 85.7 92.9 81.9 91 82.6 91.1 82.7 89.5 83.5 90.7 81.0 87.5 81.0 88.2

154

microF1 UN UCD

M1

M2

UCM

INT

0.931 0.948 0.952 0.963 0.967 0.967 0.964

0.905 0.937 0.947 0.956 0.968 0.959 0.965

0.925 0.947 0.957 0.962 0.969 0.962 0.967

0.902 0.935 0.950 0.956 0.968 0.959 0.964

0.918 0.938 0.946 0.955 0.967 0.969 0.965

0.891 0.933 0.946 0.959 0.961 0.967 0.963

0.931 0.948 0.952 0.963 0.967 0.967 0.964

0.943 0.951 0.961 0.966 0.965 0.967 0.967

0.950 0.956 0.961 0.967 0.965 0.969 0.969

0.922 0.945 0.955 0.964 0.969 0.969 0.967

0.942 0.942 0.957 0.968 0.967 0.967 0.965

0.917 0.935 0.958 0.965 0.970 0.966 0.963

0.931 0.948 0.952 0.963 0.967 0.967 0.964

0.945 0.945 0.956 0.966 0.967 0.967 0.967

0.946 0.954 0.958 0.965 0.972 0.966 0.968

0.921 0.948 0.954 0.965 0.967 0.967 0.968

0.938 0.948 0.952 0.964 0.969 0.966 0.967

0.916 0.947 0.954 0.963 0.967 0.966 0.963

0.905 0.937 0.947 0.956 0.968 0.959 0.965

0.943 0.951 0.961 0.966 0.965 0.967 0.967

0.948 0.958 0.963 0.969 0.968 0.962 0.967

0.905 0.939 0.948 0.958 0.965 0.960 0.964

0.929 0.944 0.955 0.965 0.968 0.971 0.968

0.889 0.928 0.943 0.954 0.966 0.969 0.965

0.905 0.937 0.947 0.956 0.968 0.959 0.965

0.945 0.945 0.956 0.966 0.967 0.967 0.967

0.942 0.949 0.959 0.963 0.968 0.963 0.965

0.905 0.937 0.950 0.957 0.968 0.959 0.966

0.931 0.947 0.952 0.959 0.964 0.967 0.967

0.889 0.937 0.941 0.954 0.961 0.964 0.966

0.943 0.951 0.961 0.966 0.965 0.967 0.967

0.945 0.945 0.956 0.966 0.967 0.967 0.967

0.949 0.952 0.961 0.968 0.967 0.965 0.967

0.945 0.943 0.957 0.968 0.967 0.966 0.966

0.944 0.951 0.957 0.968 0.966 0.967 0.966

0.942 0.946 0.957 0.968 0.968 0.966 0.965

Table B.3: Combining operators using Alj-News-SR Dataset (FLocal) Threshold (%) Th UTh ITh

M1-M2

0.5 1 1.5 2.5 5 7.5 10

1.0 1.8 2.5 3.8 6.4 9.6 12.7

0.0 0.2 0.5 1.3 3.6 5.4 7.3

5.4 21.6 30.1 50.0 72.0 72.1 72.6

0.5 1 1.5 2.5 5 7.5 10

0.8 1.6 2.3 3.5 6.2 9.0 11.8

0.2 0.4 0.7 1.5 3.8 6.0 8.2

40.5 44.6 47.8 60.8 75.3 80.1 82.3

0.5 1 1.5 2.5 5 7.5 10

0.8 1.5 2.2 3.2 5.7 8.7 11.7

0.2 0.5 0.8 1.8 4.3 6.3 8.3

45.9 51.4 56.6 70.6 86.7 84.6 83.4

0.5 1 1.5 2.5 5 7.5 10

0.9 1.8 2.6 4.0 7.1 10.2 13.0

0.1 0.2 0.4 1.0 2.9 4.8 7.0

16.2 23.0 29.7 39.5 58.0 64.5 70.5

0.5 1 1.5 2.5 5 7.5 10

0.9 1.7 2.5 3.8 6.4 9.1 11.6

0.1 0.3 0.5 1.2 3.6 5.9 8.4

13.5 25.7 36.0 46.5 71.1 78.0 84.4

0.5 1 1.5 2.5 5 7.5 10

0.6 1.2 1.8 3.0 6.1 9.0 11.6

0.4 0.8 1.2 2.0 3.9 6.0 8.4

83.8 77.0 77.1 81.5 77.2 80.2 83.6

Similarity (%) M1-UCD M1-UCM CC-DF 8.1 29.7 23.0 31.1 31.5 37.8 55.1 58.9 75.7 74.1 74.1 75.2 72.6 73.4 CC-IG 75.7 48.6 89.2 55.4 89.4 59.3 93.8 72.7 97.6 81.8 95.8 88.7 91.6 86.4 CC-MI 73.0 56.8 78.4 62.2 82.3 62.8 91.7 75.5 97.8 88 90.8 87.8 85.8 83.8 DF-IG 100.0 56.8 100.0 70.3 100.0 73 98.9 75.7 98.4 85.3 99.6 88.4 99.6 91.4 DF-MI 100.0 54.1 100.0 67.6 100.0 69.4 98.9 73 98.9 85.6 99.5 88.2 99.7 91.5 IG-MI 86.5 94.6 81.1 89.2 83.3 90.4 85.9 90.1 85.1 89.2 82.8 86.2 85.2 87.8

155

microF1 UN UCD

M1

M2

UCM

INT

0.857 0.903 0.929 0.947 0.956 0.961 0.964

0.664 0.822 0.878 0.920 0.948 0.959 0.959

0.871 0.915 0.941 0.944 0.959 0.962 0.962

0.691 0.821 0.877 0.922 0.955 0.958 0.961

0.722 0.831 0.883 0.928 0.954 0.959 0.960

0.445 0.633 0.849 0.911 0.947 0.960 0.958

0.857 0.903 0.929 0.947 0.956 0.961 0.964

0.912 0.924 0.946 0.943 0.956 0.965 0.960

0.914 0.934 0.950 0.955 0.963 0.965 0.963

0.876 0.912 0.933 0.941 0.957 0.963 0.964

0.909 0.923 0.945 0.948 0.959 0.959 0.961

0.814 0.882 0.922 0.937 0.954 0.956 0.961

0.857 0.903 0.929 0.947 0.956 0.961 0.964

0.905 0.923 0.943 0.949 0.957 0.957 0.959

0.913 0.929 0.946 0.951 0.959 0.962 0.962

0.880 0.913 0.935 0.949 0.961 0.960 0.960

0.904 0.922 0.938 0.952 0.959 0.958 0.958

0.832 0.886 0.921 0.940 0.961 0.962 0.962

0.664 0.822 0.878 0.920 0.948 0.959 0.959

0.912 0.924 0.946 0.943 0.956 0.965 0.960

0.919 0.936 0.948 0.951 0.958 0.962 0.962

0.664 0.822 0.878 0.920 0.952 0.960 0.958

0.838 0.913 0.935 0.938 0.955 0.961 0.956

0.470 0.709 0.844 0.906 0.950 0.958 0.961

0.664 0.822 0.878 0.920 0.948 0.959 0.959

0.905 0.923 0.943 0.949 0.957 0.957 0.959

0.915 0.931 0.950 0.951 0.960 0.960 0.958

0.664 0.822 0.878 0.917 0.949 0.958 0.958

0.833 0.911 0.925 0.944 0.956 0.956 0.961

0.465 0.721 0.851 0.911 0.952 0.957 0.957

0.912 0.924 0.946 0.943 0.956 0.965 0.960

0.905 0.923 0.943 0.949 0.957 0.957 0.959

0.914 0.933 0.947 0.951 0.961 0.963 0.961

0.908 0.925 0.943 0.943 0.956 0.960 0.960

0.907 0.920 0.940 0.947 0.954 0.958 0.960

0.908 0.913 0.936 0.942 0.953 0.958 0.960

Table B.4: Combining operators using Alj-News-MS Dataset (FLocal) Threshold (%) Th UTh ITh

M1-M2

0.5 1 1.5 2.5 5 7.5 10

0.8 1.4 2 3.4 6.8 10.3 13.5

0.2 0.6 1 1.6 3.2 4.7 6.5

46.5 62.2 66.8 65.7 63.6 62.9 65.3

0.5 1 1.5 2.5 5 7.5 10

0.8 1.3 1.9 3 5.4 7.9 10.6

0.2 0.7 1.1 2 4.6 7.1 9.4

48.5 65.4 76.6 81.9 92.6 94.8 93.5

0.5 1 1.5 2.5 5 7.5 10

0.7 1.2 1.8 3 5.8 8.8 11.8

0.3 0.8 1.2 2 4.2 6.2 8.2

56.4 76.1 79 81.7 83.9 83.2 82.1

0.5 1 1.5 2.5 5 7.5 10

0.9 1.6 2.3 3.7 7 10.4 13.5

0.1 0.4 0.8 1.3 3 4.6 6.5

27.7 44.1 50 51.8 59.2 60.8 64.8

00.5 1 1.5 2.5 5 7.5 10

0.8 1.4 2.1 3.3 6.2 9.3 12

0.2 0.6 0.9 1.7 3.8 5.7 8

36.2 58.5 60.8 68.2 75.5 76.5 79.7

0.5 1 1.5 2.5 5 7.5 10

0.6 1.2 1.8 3 6 8.9 11.8

0.4 0.8 1.2 2 4 6.1 8.2

85.3 77.7 82.9 80.1 80.4 81.5 82.2

Similarity (%) M1-UCD M1-UCM CC-DF 52.1 60.6 68.6 68.1 70.6 72.4 66 72.8 63.6 67.8 62.9 69 65.5 80.5 CC-IG 100 66.3 99.5 82.4 99.7 89.2 99.6 91.2 99.5 96.1 98.2 97.2 97.9 97.1 CC-MI 98.9 73.4 98.9 84.6 94.8 85.7 92.3 87.4 85.9 85.1 84.1 84.8 83.9 85.7 DF-IG 97.9 63.8 99.5 73.9 99.7 77.3 100 74.9 100 80.5 100 81.2 100 79 DF-MI 98.9 70.2 100 79.3 99.7 81.5 100 83.3 100 86 100 86.6 100 87.4 IG-MI 86.2 93.6 80.9 86.7 84 91.3 80.9 89.1 81.1 86.4 82.1 87.5 82.7 88.4

156

microF1 UN UCD

M1

M2

UCM

INT

0.912 0.944 0.946 0.954 0.964 0.964 0.961

0.869 0.911 0.925 0.945 0.957 0.954 0.959

0.908 0.942 0.947 0.956 0.959 0.960 0.959

0.874 0.909 0.925 0.946 0.957 0.955 0.959

0.866 0.910 0.929 0.947 0.955 0.960 0.962

0.859 0.910 0.929 0.939 0.956 0.960 0.964

0.912 0.944 0.946 0.954 0.964 0.964 0.961

0.933 0.949 0.953 0.961 0.958 0.966 0.968

0.937 0.947 0.950 0.961 0.963 0.962 0.965

0.915 0.946 0.945 0.954 0.965 0.963 0.964

0.924 0.943 0.953 0.958 0.961 0.963 0.966

0.900 0.942 0.951 0.958 0.963 0.964 0.963

0.912 0.944 0.946 0.954 0.964 0.964 0.961

0.926 0.949 0.957 0.959 0.962 0.960 0.960

0.926 0.944 0.950 0.954 0.964 0.961 0.960

0.913 0.941 0.950 0.952 0.963 0.957 0.959

0.905 0.944 0.959 0.957 0.962 0.959 0.962

0.900 0.943 0.952 0.959 0.964 0.962 0.963

0.869 0.911 0.925 0.945 0.957 0.954 0.959

0.933 0.949 0.953 0.961 0.958 0.966 0.968

0.933 0.950 0.957 0.956 0.957 0.961 0.960

0.870 0.909 0.925 0.946 0.957 0.955 0.961

0.903 0.938 0.950 0.956 0.959 0.960 0.962

0.834 0.905 0.932 0.948 0.960 0.963 0.965

0.869 0.911 0.925 0.945 0.957 0.954 0.959

0.926 0.949 0.957 0.959 0.962 0.960 0.960

0.929 0.944 0.950 0.954 0.960 0.959 0.961

0.867 0.911 0.925 0.945 0.957 0.955 0.961

0.898 0.941 0.946 0.951 0.958 0.961 0.962

0.847 0.901 0.937 0.940 0.960 0.956 0.960

0.933 0.949 0.953 0.961 0.958 0.966 0.968

0.926 0.949 0.957 0.959 0.962 0.960 0.960

0.933 0.951 0.955 0.960 0.963 0.962 0.959

0.926 0.945 0.952 0.959 0.962 0.962 0.960

0.928 0.950 0.954 0.959 0.961 0.962 0.960

0.928 0.940 0.949 0.958 0.961 0.965 0.967

Table B.5: Combining operators using Alj-News-MR Dataset (FLocal) Threshold (%) Th UTh ITh

M1-M2

0.5 1 1.5 2.5 5 7.5 10

1 1.9 2.8 4.5 7.8 10.6 13

0 0.1 0.2 0.5 2.2 4.4 7

6.7 7.4 12.5 19.1 44.4 59.2 70.1

0.5 1 1.5 2.5 5 7.5 10

0.9 1.7 2.4 3.9 7.6 10.3 13.1

0.1 0.3 0.6 1.1 2.4 4.7 6.9

26.7 33.3 40 44.1 47.4 63.2 68.6

0.5 1 1.5 2.5 5 7.5 10

0.9 1.6 2.4 3.8 7 9.5 12

0.1 0.4 0.6 1.2 3 5.5 8

26.7 37 42.5 47.1 60.7 73.6 79.5

0.5 1 1.5 2.5 5 7.5 10

1 1.9 2.8 4.4 8.3 11.6 14.6

0 0.1 0.2 0.6 1.7 3.4 5.4

0 10.3 15 22.7 34.8 44.9 54.2

0.5 1 1.5 2.5 5 7.5 10

1 1.9 2.8 4.3 7.9 10.8 13.6

0 0.1 0.2 0.7 2.1 4.2 6.4

0 10.3 15 27.3 42.4 55.6 63.6

0.5 1 1.5 2.5 5 7.5 10

0.5 1.1 1.8 2.9 5.9 8.5 11.8

0.5 0.9 1.2 2.1 4.1 6.5 8.2

93.3 85.2 80 85.5 81.8 86.8 82.5

Similarity (%) M1-UCD M1-UCM CC-DF 7.1 57.1 7.4 37 15 45 22.7 51.5 47.7 64.4 62.1 73.7 70.5 78.4 CC-IG 53.3 60 70.4 59.3 85 70 86.8 67.6 90.9 68.2 95.9 80.2 96.6 83.3 CC-MI 53.3 60 59.3 66.7 72.5 75 77.9 66.2 88.8 76.9 94.9 84.8 96.6 88.5 DF-IG 100 50 100 59.3 100 62.5 100 62.1 100 71.2 99.5 75.1 99.6 80.7 DF-MI 100 50 100 51.9 100 60 100 60.6 100 71.2 100 77.8 100 80.9 IG-MI 93.3 93.3 85.2 96.3 82.5 90 87 92.8 84.8 90.9 88.8 93.9 87 90.8

157

microF1 UN UCD

M1

M2

UCM

INT

0.596 0.802 0.855 0.908 0.940 0.954 0.955

0.191 0.531 0.703 0.813 0.917 0.932 0.941

0.616 0.831 0.872 0.924 0.943 0.962 0.959

0.191 0.479 0.676 0.802 0.915 0.934 0.941

0.321 0.674 0.794 0.866 0.913 0.936 0.950

0.000 0.247 0.453 0.591 0.906 0.928 0.936

0.596 0.802 0.855 0.908 0.940 0.954 0.955

0.852 0.891 0.918 0.930 0.947 0.955 0.959

0.874 0.913 0.933 0.946 0.952 0.955 0.960

0.749 0.819 0.868 0.921 0.937 0.950 0.957

0.752 0.882 0.888 0.933 0.936 0.954 0.956

0.431 0.690 0.785 0.888 0.926 0.947 0.956

0.596 0.802 0.855 0.908 0.940 0.954 0.955

0.859 0.902 0.916 0.930 0.949 0.948 0.959

0.885 0.917 0.928 0.940 0.954 0.954 0.954

0.746 0.832 0.868 0.922 0.932 0.947 0.961

0.751 0.856 0.891 0.919 0.947 0.954 0.957

0.431 0.694 0.764 0.893 0.939 0.947 0.959

0.191 0.531 0.703 0.813 0.917 0.932 0.941

0.852 0.891 0.918 0.930 0.947 0.955 0.959

0.857 0.915 0.923 0.934 0.951 0.958 0.958

0.191 0.522 0.703 0.813 0.917 0.932 0.941

0.642 0.786 0.875 0.896 0.939 0.942 0.959

0.000 0.454 0.487 0.542 0.897 0.925 0.937

0.191 0.531 0.703 0.813 0.917 0.932 0.941

0.859 0.902 0.916 0.930 0.949 0.948 0.959

0.864 0.914 0.925 0.935 0.956 0.959 0.957

0.191 0.522 0.703 0.813 0.917 0.932 0.940

0.639 0.822 0.846 0.910 0.933 0.941 0.956

0.000 0.454 0.487 0.675 0.904 0.925 0.941

0.852 0.891 0.918 0.930 0.947 0.955 0.959

0.859 0.902 0.916 0.930 0.949 0.948 0.959

0.852 0.911 0.929 0.934 0.950 0.948 0.960

0.859 0.902 0.921 0.925 0.949 0.948 0.958

0.859 0.886 0.917 0.928 0.947 0.950 0.956

0.845 0.887 0.914 0.926 0.950 0.949 0.955

158

0.8 1.6 2.3 3.6 6.8 9.7 12.5

0.8 1.6 2.4 3.9 7.5 10.8 14.1

0.8 1.5 2.3 3.7 7.0 10.1 12.9

0.8 1.5 2.3 3.8 7.2 10.5 13.8

0.8 1.5 2.2 3.5 6.5 9.5 12.3

0.5 1.1 1.7 2.8 5.8 8.9 11.6

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 0.9 1.3 2.2 4.2 6.1 8.4

0.2 0.5 0.8 1.5 3.5 5.5 7.7

0.2 0.5 0.7 1.2 2.8 4.5 6.2

0.2 0.5 0.7 1.3 3.0 4.9 7.1

0.2 0.4 0.6 1.1 2.5 4.2 5.9

0.2 0.4 0.7 1.4 3.2 5.3 7.5

Threshold (%) Th UTh ITh

90.8 89.8 86.7 88.2 83.7 81.6 83.6

47.5 53.0 55.5 61.0 69.8 73.0 77.0

41.6 48.0 47.2 48.4 55.2 59.4 61.7

36.8 45.4 47.8 53.4 59.6 65.1 71.2

34.0 42.9 43.2 44.6 50.0 55.6 58.9

36.8 43.4 49.5 57.2 64.6 71.0 74.7

M1-M2

91.8 91.3 88.3 88.2 85.1 85.9 84.4

96.0 99.0 99.3 98.6 99.3 98.6 99.2

96.9 99.5 98.7 97.8 98.6 98.5 99.0

68.6 76.1 75.7 79.8 83.5 86.5 88.9

72.4 80.1 78.3 81.8 85.6 87.9 90.5

49.5 54.0 57.5 65.7 74.2 77.5 80.1

91.8 93.4 90.0 92.7 92.4 93.9 90.3

64.4 70.7 68.6 75.1 88.8 89.3 89.4

67.3 69.4 66.9 67.1 78.9 80.2 83.3

65.7 69.3 67.1 71.3 76.6 82.9 82.3

68.4 69.4 69.3 63.9 66.2 71.2 73.6

71.3 72.7 76.3 78.8 75.6 80.5 83.0

Similarity (%) M1-UCD M1-UCM

0.486 0.532 0.567 0.605 0.629 0.635 0.636

0.385 0.455 0.485 0.527 0.573 0.597 0.601

0.385 0.455 0.485 0.527 0.573 0.597 0.601

0.380 0.447 0.490 0.542 0.584 0.601 0.604

0.380 0.447 0.490 0.542 0.584 0.601 0.604

0.380 0.447 0.490 0.542 0.584 0.601 0.604

M1

0.475 0.518 0.562 0.595 0.620 0.621 0.623

0.475 0.518 0.562 0.595 0.620 0.621 0.623

0.486 0.532 0.567 0.605 0.629 0.635 0.636

0.475 0.518 0.562 0.595 0.620 0.621 0.623

0.486 0.532 0.567 0.605 0.629 0.635 0.636

0.385 0.455 0.485 0.527 0.573 0.597 0.601

M2

microF1 UN UCD CC-DF 0.418 0.378 0.500 0.423 0.526 0.473 0.553 0.527 0.590 0.569 0.600 0.590 0.603 0.596 CC-IG 0.493 0.436 0.538 0.491 0.575 0.534 0.606 0.567 0.620 0.602 0.621 0.608 0.621 0.609 CC-MI 0.486 0.436 0.531 0.489 0.572 0.531 0.596 0.562 0.615 0.601 0.615 0.607 0.617 0.607 DF-IG 0.484 0.396 0.528 0.456 0.562 0.490 0.594 0.534 0.612 0.577 0.614 0.598 0.614 0.602 DF-MI 0.475 0.384 0.521 0.456 0.556 0.486 0.585 0.529 0.608 0.576 0.611 0.598 0.611 0.602 IG-MI 0.482 0.478 0.534 0.518 0.577 0.557 0.608 0.592 0.623 0.619 0.627 0.621 0.626 0.622 0.479 0.524 0.558 0.599 0.625 0.630 0.627

0.471 0.500 0.546 0.579 0.603 0.612 0.613

0.475 0.510 0.548 0.588 0.614 0.618 0.619

0.470 0.508 0.553 0.584 0.613 0.615 0.617

0.471 0.515 0.554 0.593 0.622 0.626 0.628

0.405 0.477 0.509 0.546 0.582 0.603 0.601

UCM

0.480 0.518 0.550 0.592 0.626 0.628 0.632

0.392 0.457 0.480 0.531 0.584 0.610 0.614

0.393 0.459 0.481 0.529 0.587 0.615 0.621

0.353 0.441 0.477 0.526 0.587 0.603 0.609

0.358 0.438 0.481 0.517 0.591 0.609 0.618

0.344 0.406 0.448 0.504 0.570 0.595 0.604

INT

0.405 0.457 0.508 0.546 0.561 0.566 0.562

0.272 0.359 0.392 0.440 0.490 0.514 0.518

0.272 0.359 0.392 0.440 0.490 0.514 0.518

0.273 0.362 0.395 0.444 0.499 0.517 0.521

0.273 0.362 0.395 0.444 0.499 0.517 0.521

0.273 0.362 0.395 0.444 0.499 0.517 0.521

M1

Table B.6: Combining operators using Ohsumed Dataset (FLocal)

0.397 0.446 0.498 0.528 0.548 0.548 0.545

0.397 0.446 0.498 0.528 0.548 0.548 0.545

0.405 0.457 0.508 0.546 0.561 0.566 0.562

0.397 0.446 0.498 0.528 0.548 0.548 0.545

0.405 0.457 0.508 0.546 0.561 0.566 0.562

0.272 0.359 0.392 0.440 0.490 0.514 0.518

M2

0.402 0.457 0.518 0.543 0.551 0.552 0.546

0.386 0.442 0.483 0.510 0.530 0.530 0.529

0.395 0.446 0.494 0.523 0.536 0.533 0.532

0.406 0.453 0.499 0.522 0.540 0.536 0.535

0.412 0.460 0.509 0.535 0.546 0.541 0.538

0.310 0.406 0.429 0.461 0.508 0.515 0.519

0.399 0.440 0.492 0.528 0.549 0.548 0.542

0.274 0.359 0.393 0.442 0.493 0.513 0.519

0.280 0.360 0.396 0.443 0.494 0.512 0.519

0.301 0.381 0.419 0.460 0.512 0.520 0.521

0.300 0.381 0.429 0.466 0.514 0.521 0.521

0.232 0.315 0.368 0.422 0.480 0.501 0.510

macroF1 UN UCD

0.401 0.448 0.492 0.535 0.557 0.557 0.551

0.371 0.407 0.457 0.489 0.520 0.530 0.530

0.372 0.421 0.463 0.511 0.533 0.539 0.539

0.350 0.414 0.453 0.489 0.535 0.535 0.536

0.372 0.425 0.466 0.516 0.552 0.552 0.553

0.271 0.379 0.411 0.451 0.496 0.517 0.518

UCM

0.401 0.444 0.488 0.532 0.558 0.558 0.559

0.278 0.368 0.392 0.450 0.507 0.534 0.532

0.279 0.370 0.391 0.446 0.511 0.540 0.547

0.255 0.358 0.392 0.434 0.508 0.525 0.526

0.258 0.356 0.399 0.432 0.513 0.532 0.542

0.225 0.316 0.355 0.414 0.486 0.514 0.519

INT

159

0.8 1.6 2.3 3.6 6.9 10.2 13.2

0.8 1.6 2.3 3.8 7.3 10.8 14.2

0.8 1.5 2.2 3.6 6.8 9.9 13.1

0.8 1.6 2.4 3.9 7.6 11 14.5

0.8 1.5 2.2 3.5 6.8 9.8 13.1

0.6 1.2 1.7 2.9 5.9 8.8 12

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.4 0.8 1.3 2.1 4.1 6.2 8

0.2 0.5 0.8 1.5 3.2 5.2 6.9

0.2 0.4 0.6 1.1 2.4 4 5.5

0.2 0.5 0.8 1.4 3.2 5.1 6.9

0.2 0.4 0.7 1.2 2.7 4.2 5.8

0.2 0.4 0.7 1.4 3.1 4.8 6.8

Threshold (%) Th UTh ITh

86.7 84.7 85.4 82.8 81.4 82.8 80.1

45.9 48.7 51.2 59.9 64 68.8 69.5

39.8 39.1 41.8 43.4 48 53.1 54.7

38.8 45.9 52.2 58 64.7 67.5 69.4

34.7 43.4 46.4 47.5 53.1 56.1 58.4

38.8 40.3 45.4 55.4 62 64.7 67.6

M1-M2

88.8 85.6 86.1 83.4 83.1 84.6 83.2

98 99 99.3 99.2 99.1 98.7 98.4

98 99.5 100 99.4 99.1 98.7 98.4

66.3 70.4 74.2 77 80.3 81.7 84.7

67.3 76.5 77.6 80.4 82.4 84.1 85.7

46.9 46.4 50.5 60.1 65.3 67.8 71.6

92.9 94.1 94.2 95.1 93.8 94 93.8

84.7 86.3 89.9 96.1 98.7 99 97.6

82.7 79.7 85.1 88.2 91.4 92 90.9

68.4 65.8 72.9 71.9 77.5 80.5 82

63.3 61.7 65.8 57.8 62.3 66.8 69.7

55.1 53.6 53.6 60.5 64.5 68.4 72

Similarity (%) M1-UCD M1-UCM

0.505 0.553 0.578 0.603 0.628 0.63 0.632

0.395 0.43 0.477 0.516 0.557 0.576 0.586

0.395 0.43 0.477 0.516 0.557 0.576 0.586

0.413 0.457 0.505 0.546 0.585 0.596 0.602

0.413 0.457 0.505 0.546 0.585 0.596 0.602

0.413 0.457 0.505 0.546 0.585 0.596 0.602

M1

0.495 0.535 0.563 0.592 0.617 0.621 0.618

0.495 0.535 0.563 0.592 0.617 0.621 0.618

0.505 0.553 0.578 0.603 0.628 0.63 0.632

0.495 0.535 0.563 0.592 0.617 0.621 0.618

0.505 0.553 0.578 0.603 0.628 0.63 0.632

0.395 0.43 0.477 0.516 0.557 0.576 0.586

M2

microF1 UN UCD CC-DF 0.429 0.365 0.492 0.403 0.521 0.46 0.547 0.508 0.581 0.551 0.592 0.575 0.597 0.585 CC-IG 0.51 0.446 0.559 0.501 0.581 0.526 0.599 0.558 0.621 0.598 0.622 0.605 0.62 0.609 CC-MI 0.502 0.442 0.545 0.48 0.567 0.523 0.596 0.551 0.613 0.594 0.617 0.599 0.614 0.605 DF-IG 0.499 0.395 0.553 0.43 0.57 0.477 0.592 0.516 0.612 0.557 0.614 0.577 0.613 0.586 DF-MI 0.49 0.394 0.534 0.429 0.557 0.476 0.585 0.517 0.605 0.558 0.608 0.576 0.609 0.587 IG-MI 0.504 0.494 0.554 0.532 0.581 0.561 0.603 0.589 0.621 0.614 0.624 0.619 0.624 0.62 0.496 0.546 0.576 0.603 0.628 0.627 0.628

0.418 0.476 0.504 0.53 0.568 0.58 0.594

0.427 0.488 0.517 0.543 0.581 0.598 0.603

0.473 0.516 0.542 0.578 0.609 0.617 0.615

0.485 0.527 0.557 0.597 0.624 0.629 0.628

0.401 0.43 0.481 0.517 0.557 0.576 0.587

UCM

0.497 0.532 0.562 0.591 0.625 0.627 0.629

0.4 0.438 0.477 0.521 0.565 0.589 0.596

0.404 0.442 0.48 0.52 0.572 0.595 0.602

0.403 0.447 0.496 0.537 0.589 0.601 0.608

0.41 0.456 0.499 0.535 0.593 0.603 0.615

0.374 0.401 0.458 0.505 0.559 0.581 0.591

INT

0.39 0.445 0.476 0.5 0.549 0.552 0.553

0.23 0.274 0.332 0.389 0.435 0.46 0.477

0.23 0.274 0.332 0.389 0.435 0.46 0.477

0.273 0.316 0.365 0.407 0.465 0.482 0.497

0.273 0.316 0.365 0.407 0.465 0.482 0.497

0.273 0.316 0.365 0.407 0.465 0.482 0.497

M1

Table B.7: Combining operators using Ohsumed Dataset (WLocal)

0.386 0.427 0.448 0.488 0.532 0.54 0.535

0.386 0.427 0.448 0.488 0.532 0.54 0.535

0.39 0.445 0.476 0.5 0.549 0.552 0.553

0.386 0.427 0.448 0.488 0.532 0.54 0.535

0.39 0.445 0.476 0.5 0.549 0.552 0.553

0.23 0.274 0.332 0.389 0.435 0.46 0.477

M2

0.396 0.445 0.472 0.501 0.543 0.541 0.542

0.378 0.421 0.441 0.484 0.513 0.516 0.518

0.384 0.438 0.46 0.49 0.521 0.524 0.521

0.39 0.439 0.452 0.491 0.525 0.531 0.525

0.394 0.452 0.474 0.498 0.536 0.536 0.534

0.283 0.356 0.392 0.428 0.467 0.482 0.492

0.379 0.421 0.447 0.484 0.528 0.535 0.537

0.229 0.272 0.333 0.39 0.435 0.461 0.477

0.23 0.273 0.335 0.389 0.436 0.46 0.478

0.289 0.327 0.37 0.418 0.481 0.495 0.501

0.291 0.348 0.374 0.418 0.482 0.496 0.505

0.205 0.236 0.305 0.356 0.422 0.459 0.47

macroF1 UN UCD

0.386 0.441 0.47 0.5 0.546 0.549 0.549

0.243 0.329 0.363 0.402 0.447 0.473 0.489

0.253 0.352 0.385 0.428 0.477 0.501 0.51

0.323 0.378 0.418 0.465 0.516 0.526 0.531

0.354 0.404 0.454 0.494 0.542 0.552 0.548

0.249 0.269 0.336 0.372 0.426 0.458 0.475

UCM

0.381 0.421 0.454 0.488 0.542 0.548 0.548

0.233 0.284 0.339 0.398 0.446 0.477 0.496

0.234 0.285 0.341 0.4 0.454 0.487 0.505

0.258 0.306 0.35 0.395 0.469 0.489 0.509

0.264 0.313 0.352 0.4 0.475 0.497 0.516

0.208 0.235 0.304 0.356 0.434 0.462 0.482

INT

160

0.7 1.4 2.0 3.1 6.0 9.1 12.2

0.8 1.4 2.2 3.4 6.6 9.9 13.0

0.7 1.4 2.1 3.2 6.0 9.0 12.3

0.7 1.3 2.0 3.3 6.4 9.5 12.4

0.6 1.2 1.9 2.9 5.9 8.7 11.6

0.6 1.1 1.7 2.9 5.9 9.0 12.0

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.4 0.9 1.3 2.1 4.1 6.0 8.0

0.4 0.8 1.1 2.1 4.1 6.3 8.4

0.3 0.7 1.0 1.7 3.6 5.5 7.6

0.3 0.6 0.9 1.8 4.0 6.0 7.7

0.2 0.6 0.8 1.6 3.4 5.1 7.0

0.3 0.6 1.0 1.9 4.0 5.9 7.8

Threshold (%) Th UTh ITh

84.0 90.2 83.9 83.8 82.8 80.5 80.3

74.4 75.8 75.6 82.0 82.8 84.7 84.1

58.5 68.9 63.6 68.1 72.5 72.8 76.3

51.8 58.8 60.5 73.6 79.9 80.1 76.5

48.2 56.9 55.3 64.6 68.3 67.6 69.9

50.6 59.4 66.0 74.6 79.1 78.8 78.1

M1-M2

85.2 91.9 85.8 85.3 85.8 84.2 87.0

97.6 99.4 99.6 100.0 99.9 99.6 99.9

97.5 99.4 99.2 99.5 100.0 99.8 99.7

74.7 81.3 82.6 88.1 92.0 92.4 88.6

79.0 85.0 85.8 91.0 92.1 92.6 88.8

63.4 70.0 74.0 79.9 83.3 85.4 82.4

87.7 93.2 88.1 89.1 90 88 90.8

85.4 88.8 87.2 92.3 92.8 92.2 93.2

81.5 87.6 86.8 88.4 92.4 91.8 92.6

72.3 76.9 77.5 86 87.5 86.9 86.6

79 81.9 82.6 85.5 85.9 86.4 85.3

80.5 80.6 84 88.1 89.1 89.5 88.8

Similarity (%) M1-UCD M1-UCM

0.917 0.937 0.943 0.949 0.944 0.942 0.943

0.918 0.932 0.942 0.945 0.945 0.947 0.944

0.918 0.932 0.942 0.945 0.945 0.947 0.944

0.898 0.927 0.941 0.941 0.945 0.946 0.945

0.898 0.927 0.941 0.941 0.945 0.946 0.945

0.898 0.927 0.941 0.941 0.945 0.946 0.945

M1

0.926 0.940 0.945 0.946 0.946 0.947 0.946

0.926 0.940 0.945 0.946 0.946 0.947 0.946

0.917 0.937 0.943 0.949 0.944 0.942 0.943

0.926 0.940 0.945 0.946 0.946 0.947 0.946

0.917 0.937 0.943 0.949 0.944 0.942 0.943

0.918 0.932 0.942 0.945 0.945 0.947 0.944

M2

microF1 UN UCD CC-DF 0.928 0.916 0.934 0.929 0.943 0.941 0.945 0.943 0.945 0.947 0.944 0.946 0.945 0.944 CC-IG 0.923 0.918 0.938 0.930 0.945 0.944 0.947 0.943 0.946 0.944 0.944 0.945 0.945 0.945 CC-MI 0.931 0.922 0.942 0.934 0.945 0.941 0.945 0.941 0.945 0.943 0.943 0.947 0.945 0.944 DF-IG 0.931 0.923 0.941 0.930 0.945 0.943 0.948 0.945 0.946 0.946 0.944 0.947 0.945 0.944 DF-MI 0.931 0.922 0.941 0.932 0.942 0.942 0.948 0.946 0.947 0.945 0.947 0.946 0.946 0.944 IG-MI 0.929 0.928 0.940 0.939 0.946 0.946 0.948 0.946 0.946 0.946 0.945 0.944 0.945 0.946 0.929 0.939 0.946 0.947 0.946 0.945 0.945

0.928 0.938 0.944 0.947 0.947 0.946 0.944

0.930 0.937 0.946 0.947 0.947 0.948 0.943

0.931 0.936 0.942 0.942 0.944 0.947 0.944

0.919 0.931 0.942 0.943 0.946 0.946 0.942

0.923 0.929 0.944 0.941 0.946 0.946 0.945

UCM

0.913 0.937 0.942 0.947 0.944 0.943 0.944

0.916 0.931 0.945 0.946 0.945 0.948 0.943

0.902 0.929 0.941 0.945 0.943 0.944 0.944

0.880 0.925 0.940 0.946 0.945 0.946 0.945

0.884 0.925 0.938 0.945 0.942 0.943 0.944

0.876 0.924 0.939 0.942 0.944 0.946 0.945

INT

0.881 0.903 0.901 0.908 0.894 0.891 0.886

0.863 0.888 0.902 0.898 0.895 0.895 0.888

0.863 0.888 0.902 0.898 0.895 0.895 0.888

0.853 0.880 0.901 0.894 0.892 0.888 0.892

0.853 0.880 0.901 0.894 0.892 0.888 0.892

0.853 0.880 0.901 0.894 0.892 0.888 0.892

M1

Table B.8: Combining operators using Reuters(10) Dataset (FLocal)

0.886 0.904 0.903 0.904 0.896 0.893 0.891

0.886 0.904 0.903 0.904 0.896 0.893 0.891

0.881 0.903 0.901 0.908 0.894 0.891 0.886

0.886 0.904 0.903 0.904 0.896 0.893 0.891

0.881 0.903 0.901 0.908 0.894 0.891 0.886

0.863 0.888 0.902 0.898 0.895 0.895 0.888

M2

0.887 0.903 0.902 0.903 0.896 0.890 0.889

0.894 0.901 0.900 0.902 0.896 0.891 0.891

0.892 0.902 0.902 0.903 0.895 0.889 0.888

0.892 0.907 0.900 0.898 0.892 0.884 0.890

0.887 0.901 0.899 0.897 0.892 0.885 0.889

0.883 0.891 0.899 0.894 0.892 0.888 0.889

0.890 0.903 0.902 0.902 0.895 0.890 0.891

0.866 0.888 0.902 0.900 0.893 0.894 0.888

0.865 0.886 0.903 0.898 0.895 0.895 0.888

0.871 0.889 0.895 0.891 0.892 0.892 0.888

0.871 0.884 0.903 0.894 0.892 0.887 0.891

0.857 0.883 0.897 0.896 0.896 0.893 0.888

macroF1 UN UCD

0.890 0.902 0.904 0.903 0.896 0.891 0.891

0.889 0.896 0.902 0.901 0.898 0.894 0.890

0.889 0.897 0.908 0.903 0.900 0.895 0.888

0.886 0.893 0.900 0.892 0.892 0.895 0.890

0.875 0.888 0.899 0.899 0.894 0.892 0.885

0.877 0.884 0.906 0.895 0.895 0.894 0.894

UCM

0.878 0.905 0.899 0.902 0.895 0.895 0.888

0.860 0.888 0.906 0.902 0.894 0.895 0.887

0.850 0.888 0.903 0.901 0.891 0.896 0.891

0.835 0.879 0.899 0.900 0.894 0.889 0.892

0.837 0.876 0.902 0.903 0.891 0.893 0.892

0.819 0.878 0.898 0.896 0.892 0.890 0.892

INT

161

0.7 1.3 2.0 3.4 6.9 10.0 13.6

0.7 1.4 2.2 3.7 7.3 11.0 14.7

0.7 1.3 1.9 3.1 6.1 8.8 12.5

0.7 1.4 2.0 3.4 7.0 10.6 13.9

0.6 1.2 1.9 3.3 7.0 10.4 13.9

0.6 1.3 1.9 3.4 7.1 10.7 14.3

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.4 0.7 1.1 1.6 2.9 4.3 5.7

0.4 0.8 1.1 1.7 3.0 4.6 6.1

0.3 0.6 1.0 1.6 3.0 4.4 6.1

0.3 0.7 1.1 1.9 3.9 6.2 7.5

0.3 0.6 0.8 1.3 2.7 4.0 5.3

0.3 0.7 1.0 1.6 3.1 5.0 6.4

Threshold (%) Th UTh ITh

73.8 72.5 70.4 62.8 58.5 56.9 56.6

80.7 75.4 71.6 67.6 60.9 61.0 61.0

66.3 63.5 65.6 63.8 60.5 58.6 61.4

63.3 73.4 76.4 76.2 77.0 83.0 75.1

54.4 56.2 56.3 53.6 53.6 52.8 52.6

60.0 66.9 64.6 65.0 62.9 67.0 64.4

M1-M2

85.7 88.0 90.0 88.8 90.6 90.9 92.5

98.8 98.8 98.4 97.8 95.9 95.2 95.2

92.8 91.6 90.4 89.0 86.0 84.0 82.7

84.5 86.2 88.8 88.5 89.1 92.5 91.3

81.0 76.0 73.6 69.7 66.5 64.8 65.0

72.3 72.5 69.6 68.8 67.1 70.9 69.9

85.7 86.2 88 87.6 87.9 87.5 90.1

92.8 87.4 87.6 89 87.1 88.6 88.7

96.4 89.8 85.6 81.9 79.5 78.6 77.2

85.7 85 84.8 86.4 87.3 90.6 91.4

89.3 82 79.2 72.8 67.3 65 64.4

83.1 88 85.2 80.5 74.2 73.9 78.8

Similarity (%) M1-UCD M1-UCM

0.929 0.942 0.948 0.948 0.944 0.944 0.945

0.922 0.930 0.932 0.939 0.947 0.948 0.945

0.922 0.930 0.932 0.939 0.947 0.948 0.945

0.895 0.932 0.937 0.942 0.946 0.948 0.946

0.895 0.932 0.937 0.942 0.946 0.948 0.946

0.895 0.932 0.937 0.942 0.946 0.948 0.946

M1

0.925 0.940 0.945 0.949 0.951 0.949 0.949

0.925 0.940 0.945 0.949 0.951 0.949 0.949

0.929 0.942 0.948 0.948 0.944 0.944 0.945

0.925 0.940 0.945 0.949 0.951 0.949 0.949

0.929 0.942 0.948 0.948 0.944 0.944 0.945

0.922 0.930 0.932 0.939 0.947 0.948 0.945

M2

microF1 UN UCD CC-DF 0.935 0.911 0.939 0.929 0.937 0.932 0.943 0.939 0.949 0.948 0.945 0.947 0.945 0.945 CC-IG 0.937 0.916 0.944 0.934 0.948 0.939 0.951 0.944 0.946 0.946 0.946 0.947 0.945 0.943 CC-MI 0.934 0.914 0.941 0.933 0.943 0.936 0.949 0.947 0.951 0.949 0.947 0.949 0.946 0.947 DF-IG 0.927 0.922 0.940 0.930 0.948 0.933 0.947 0.942 0.942 0.946 0.947 0.945 0.944 0.945 DF-MI 0.928 0.921 0.937 0.929 0.941 0.932 0.945 0.941 0.948 0.948 0.948 0.949 0.948 0.947 IG-MI 0.930 0.925 0.944 0.939 0.949 0.946 0.950 0.946 0.946 0.947 0.946 0.945 0.946 0.944 0.928 0.941 0.948 0.946 0.947 0.946 0.947

0.923 0.938 0.941 0.945 0.947 0.948 0.947

0.922 0.933 0.945 0.944 0.945 0.946 0.946

0.922 0.939 0.944 0.948 0.951 0.948 0.947

0.922 0.937 0.940 0.947 0.946 0.947 0.946

0.907 0.934 0.936 0.939 0.945 0.946 0.947

UCM

0.921 0.939 0.943 0.948 0.949 0.949 0.948

0.915 0.930 0.938 0.943 0.948 0.950 0.948

0.919 0.931 0.935 0.941 0.949 0.948 0.947

0.873 0.926 0.938 0.941 0.943 0.948 0.948

0.880 0.932 0.936 0.940 0.942 0.947 0.946

0.868 0.925 0.930 0.939 0.942 0.948 0.948

INT

0.871 0.895 0.909 0.905 0.892 0.891 0.893

0.859 0.864 0.866 0.884 0.898 0.901 0.894

0.859 0.864 0.866 0.884 0.898 0.901 0.894

0.794 0.873 0.884 0.891 0.893 0.899 0.894

0.794 0.873 0.884 0.891 0.893 0.899 0.894

0.794 0.873 0.884 0.891 0.893 0.899 0.894

M1

M2

0.863 0.884 0.902 0.907 0.910 0.906 0.901

0.863 0.884 0.902 0.907 0.910 0.906 0.901

0.871 0.895 0.909 0.905 0.892 0.891 0.893

0.863 0.884 0.902 0.907 0.910 0.906 0.901

0.871 0.895 0.909 0.905 0.892 0.891 0.893

0.859 0.864 0.866 0.884 0.898 0.901 0.894

Table B.9: Combining operators using Reuters(10) Dataset (WLocal)

0.868 0.895 0.910 0.908 0.894 0.895 0.895

0.866 0.879 0.898 0.900 0.899 0.899 0.896

0.867 0.888 0.907 0.903 0.889 0.894 0.889

0.886 0.896 0.899 0.905 0.906 0.895 0.894

0.891 0.896 0.908 0.909 0.892 0.892 0.892

0.886 0.889 0.884 0.890 0.899 0.893 0.892

0.863 0.877 0.905 0.902 0.897 0.892 0.891

0.859 0.862 0.868 0.890 0.900 0.899 0.897

0.856 0.868 0.870 0.887 0.893 0.891 0.893

0.777 0.868 0.882 0.903 0.902 0.901 0.894

0.798 0.877 0.886 0.895 0.892 0.896 0.889

0.776 0.862 0.865 0.880 0.897 0.895 0.892

macroF1 UN UCD

0.868 0.895 0.907 0.905 0.900 0.894 0.896

0.858 0.880 0.898 0.900 0.897 0.899 0.895

0.859 0.868 0.899 0.897 0.892 0.896 0.895

0.856 0.891 0.898 0.909 0.906 0.899 0.894

0.870 0.889 0.893 0.902 0.892 0.895 0.891

0.803 0.881 0.883 0.882 0.891 0.893 0.894

UCM

0.860 0.884 0.901 0.907 0.908 0.906 0.901

0.850 0.864 0.879 0.894 0.908 0.905 0.900

0.856 0.868 0.874 0.891 0.903 0.901 0.898

0.698 0.852 0.886 0.891 0.893 0.905 0.902

0.716 0.876 0.882 0.890 0.889 0.902 0.897

0.695 0.848 0.857 0.890 0.894 0.906 0.898

INT

162

0.8 1.6 2.3 3.6 6.7 9.5 12.3

0.8 1.6 2.4 3.7 7.0 10.0 13.1

0.8 1.6 2.3 3.7 6.9 9.7 12.6

0.7 1.4 2.1 3.5 6.6 9.7 12.7

0.7 1.4 2.0 3.4 6.4 9.3 12.0

0.5 1.1 1.6 2.6 5.3 8.2 10.8

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 0.9 1.4 2.4 4.7 6.8 9.2

0.3 0.6 1.0 1.6 3.6 5.7 8.0

0.3 0.6 0.9 1.5 3.4 5.3 7.3

0.2 0.4 0.7 1.3 3.1 5.3 7.4

0.2 0.4 0.6 1.3 3.0 5.0 6.9

0.2 0.4 0.7 1.4 3.3 5.5 7.7

Threshold (%) Th UTh ITh

96.0 91.0 93.4 95.0 93.4 90.5 92.3

65.0 58.9 63.4 65.9 71.9 76.4 80.3

64.0 56.9 59.1 60.3 67.9 71.3 73.0

35.0 37.3 43.5 53.9 62.7 70.7 74.0

35.0 35.3 41.2 50.5 60.1 67.2 68.6

34.0 40.3 44.5 54.3 66.2 73.4 76.9

M1-M2

98.0 94.5 95.0 95.5 95.2 93.6 92.9

88.0 93.9 94.6 94.0 96.7 97.2 97.7

88.0 93.4 93.9 94.2 96.3 96.8 97.7

77.0 80.1 81.7 83.0 89.8 91.8 93.2

77.0 80.1 82.4 85.8 90.0 92.2 93.8

71.0 67.5 71.7 73.6 80.7 84.3 86.9

97 97.5 96.7 96.6 97.7 97 97.2

77 71.6 72.8 76.4 78.3 80.7 82.8

77 68 69.9 73.5 73.6 75.3 77

60 62.7 66.4 73.4 79.8 79.7 81.5

61 59.7 64.8 73.5 74 74.7 74.9

72 78.2 80.3 83.5 89.4 89.8 93.1

Similarity (%) M1-UCD M1-UCM

0.789 0.822 0.835 0.851 0.866 0.867 0.868

0.790 0.824 0.843 0.854 0.868 0.870 0.871

0.790 0.824 0.843 0.854 0.868 0.870 0.871

0.710 0.779 0.833 0.852 0.867 0.870 0.868

0.710 0.779 0.833 0.852 0.867 0.870 0.868

0.710 0.779 0.833 0.852 0.867 0.870 0.868

M1

0.798 0.827 0.835 0.855 0.866 0.873 0.871

0.798 0.827 0.835 0.855 0.866 0.873 0.871

0.789 0.822 0.835 0.851 0.866 0.867 0.868

0.798 0.827 0.835 0.855 0.866 0.873 0.871

0.789 0.822 0.835 0.851 0.866 0.867 0.868

0.790 0.824 0.843 0.854 0.868 0.870 0.871

M2

microF1 UN UCD CC-DF 0.809 0.783 0.840 0.829 0.857 0.849 0.859 0.858 0.867 0.869 0.871 0.872 0.868 0.868 CC-IG 0.818 0.798 0.844 0.829 0.859 0.853 0.866 0.863 0.869 0.869 0.871 0.872 0.868 0.870 CC-MI 0.814 0.795 0.846 0.829 0.859 0.854 0.865 0.864 0.868 0.869 0.870 0.871 0.868 0.869 DF-IG 0.813 0.808 0.842 0.841 0.850 0.851 0.857 0.859 0.868 0.870 0.870 0.871 0.868 0.869 DF-MI 0.810 0.805 0.836 0.836 0.851 0.850 0.858 0.860 0.867 0.870 0.868 0.871 0.870 0.870 IG-MI 0.803 0.803 0.827 0.827 0.838 0.836 0.856 0.857 0.866 0.866 0.871 0.871 0.870 0.871 0.803 0.826 0.838 0.855 0.867 0.872 0.871

0.805 0.825 0.845 0.863 0.870 0.874 0.872

0.808 0.834 0.843 0.860 0.871 0.872 0.871

0.805 0.831 0.852 0.860 0.870 0.874 0.870

0.810 0.837 0.854 0.855 0.872 0.872 0.872

0.785 0.831 0.850 0.862 0.873 0.872 0.870

UCM

0.767 0.811 0.831 0.847 0.866 0.867 0.868

0.772 0.811 0.828 0.848 0.865 0.873 0.872

0.742 0.793 0.826 0.846 0.863 0.867 0.869

0.612 0.714 0.798 0.836 0.865 0.872 0.874

0.550 0.701 0.799 0.833 0.864 0.868 0.872

0.662 0.745 0.811 0.845 0.869 0.869 0.871

INT

0.394 0.442 0.442 0.450 0.446 0.438 0.435

0.369 0.407 0.435 0.447 0.443 0.434 0.428

0.369 0.407 0.435 0.447 0.443 0.434 0.428

0.227 0.295 0.385 0.418 0.439 0.434 0.428

0.227 0.295 0.385 0.418 0.439 0.434 0.428

0.227 0.295 0.385 0.418 0.439 0.434 0.428

M1

M2

0.393 0.441 0.443 0.451 0.449 0.439 0.433

0.393 0.441 0.443 0.451 0.449 0.439 0.433

0.394 0.442 0.442 0.450 0.446 0.438 0.435

0.393 0.441 0.443 0.451 0.449 0.439 0.433

0.394 0.442 0.442 0.450 0.446 0.438 0.435

0.369 0.407 0.435 0.447 0.443 0.434 0.428

Table B.10: Combining operators using Reuters(90) Dataset (FLocal)

0.410 0.441 0.448 0.450 0.445 0.438 0.430

0.417 0.443 0.451 0.443 0.439 0.430 0.426

0.420 0.446 0.453 0.445 0.437 0.432 0.424

0.417 0.446 0.456 0.452 0.444 0.430 0.424

0.418 0.448 0.458 0.455 0.442 0.430 0.423

0.388 0.427 0.440 0.445 0.436 0.431 0.423

0.405 0.445 0.447 0.456 0.448 0.438 0.431

0.402 0.422 0.448 0.447 0.444 0.433 0.424

0.400 0.421 0.448 0.448 0.445 0.441 0.423

0.295 0.372 0.423 0.443 0.441 0.434 0.429

0.286 0.373 0.424 0.432 0.442 0.434 0.428

0.258 0.368 0.403 0.433 0.440 0.443 0.428

macroF1 UN UCD

0.400 0.444 0.446 0.451 0.446 0.437 0.434

0.423 0.453 0.459 0.458 0.452 0.436 0.433

0.427 0.457 0.463 0.464 0.450 0.438 0.431

0.397 0.439 0.461 0.454 0.457 0.441 0.429

0.398 0.444 0.463 0.450 0.448 0.440 0.434

0.338 0.430 0.438 0.446 0.444 0.436 0.431

UCM

0.384 0.443 0.441 0.449 0.450 0.438 0.435

0.343 0.394 0.438 0.446 0.448 0.444 0.434

0.339 0.375 0.430 0.452 0.449 0.443 0.434

0.197 0.260 0.350 0.402 0.453 0.443 0.439

0.206 0.263 0.349 0.401 0.445 0.440 0.442

0.198 0.251 0.348 0.408 0.442 0.438 0.435

INT

163

0.9 1.4 2.1 3.4 6.8 10.0 13.4

0.9 1.5 2.2 3.7 7.3 10.9 14.5

0.8 1.4 2.1 3.3 6.3 9.2 12.6

0.7 1.4 2.1 3.4 7.0 10.4 13.6

0.7 1.2 1.9 3.2 6.8 10.2 13.6

0.7 1.1 1.8 3.1 6.9 10.3 13.7

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.5 1 1.5 2.5 5 7.5 10

0.3 0.9 1.2 1.9 3.1 4.7 6.3

0.3 0.8 1.1 1.8 3.2 4.8 6.4

0.3 0.6 0.9 1.6 3.0 4.6 6.4

0.2 0.6 0.9 1.7 3.7 5.8 7.4

0.1 0.5 0.8 1.3 2.7 4.1 5.5

0.1 0.6 0.9 1.6 3.2 5.0 6.6

Threshold (%) Th UTh ITh

57.6 89.1 80.1 74.9 62.8 62.0 62.7

69.2 76.1 71.3 72.2 63.3 63.7 64.1

57.1 63.6 62.5 64.9 60.9 60.9 63.8

30.9 55.1 62.7 68.8 74.2 77.2 74.4

27.7 48.1 53.3 53.4 53.5 54.2 55.3

28.7 57.8 60.5 62.1 63.2 66.3 66.4

M1-M2

59.8 94.0 91.7 91.1 90.3 90.9 92.0

80.2 95.7 96.7 97.8 96.0 94.4 94.0

91.2 92.3 90.9 90.5 85.8 83.7 83.0

56.4 79.5 83.3 83.4 88.3 89.7 88.4

76.1 81.4 78.3 73.6 69.8 68.3 68.3

61.5 75.5 72.7 70.2 69.0 71.8 72.6

57.6 96.7 98.2 98.5 97.8 95.8 94.7

73.6 88 89.8 89.2 87.2 87 86.7

85.7 79.8 74.5 73.1 70.9 73.2 76

39.4 69.7 77.5 76.3 83.2 85.3 83.6

57.6 62.3 62.3 57.3 57.2 59.4 62.5

61.5 75.5 73.5 72.1 73.9 76.1 74.9

Similarity (%) M1-UCD M1-UCM

0.803 0.867 0.875 0.878 0.873 0.875 0.874

0.810 0.855 0.865 0.870 0.875 0.875 0.872

0.810 0.855 0.865 0.870 0.875 0.875 0.872

0.701 0.820 0.837 0.857 0.871 0.871 0.873

0.701 0.820 0.837 0.857 0.871 0.871 0.873

0.701 0.820 0.837 0.857 0.871 0.871 0.873

M1

0.842 0.861 0.875 0.879 0.880 0.874 0.876

0.842 0.861 0.875 0.879 0.880 0.874 0.876

0.803 0.867 0.875 0.878 0.873 0.875 0.874

0.842 0.861 0.875 0.879 0.880 0.874 0.876

0.803 0.867 0.875 0.878 0.873 0.875 0.874

0.810 0.855 0.865 0.870 0.875 0.875 0.872

M2

microF1 UN UCD CC-DF 0.834 0.788 0.863 0.829 0.868 0.838 0.871 0.863 0.876 0.876 0.871 0.871 0.871 0.870 CC-IG 0.829 0.796 0.871 0.836 0.876 0.856 0.876 0.865 0.873 0.873 0.874 0.873 0.872 0.871 CC-MI 0.848 0.794 0.869 0.834 0.876 0.860 0.878 0.866 0.877 0.876 0.874 0.874 0.875 0.874 DF-IG 0.834 0.826 0.872 0.857 0.874 0.860 0.876 0.874 0.874 0.872 0.872 0.871 0.870 0.870 DF-MI 0.844 0.825 0.867 0.862 0.872 0.864 0.878 0.873 0.877 0.877 0.871 0.873 0.870 0.871 IG-MI 0.847 0.839 0.868 0.861 0.878 0.875 0.877 0.875 0.874 0.874 0.874 0.875 0.872 0.874 0.842 0.866 0.874 0.876 0.874 0.877 0.875

0.830 0.861 0.870 0.876 0.876 0.875 0.873

0.834 0.867 0.873 0.878 0.876 0.876 0.875

0.820 0.861 0.874 0.876 0.878 0.874 0.875

0.810 0.872 0.877 0.877 0.875 0.875 0.875

0.808 0.829 0.839 0.857 0.872 0.875 0.873

UCM

0.770 0.860 0.875 0.879 0.878 0.875 0.876

0.805 0.850 0.863 0.873 0.880 0.879 0.875

0.772 0.848 0.864 0.872 0.873 0.877 0.874

0.660 0.802 0.828 0.855 0.874 0.873 0.874

0.554 0.805 0.829 0.855 0.869 0.871 0.874

0.643 0.807 0.825 0.851 0.870 0.873 0.874

INT

0.380 0.449 0.462 0.468 0.463 0.459 0.440

0.340 0.406 0.428 0.433 0.440 0.447 0.433

0.340 0.406 0.428 0.433 0.440 0.447 0.433

0.153 0.241 0.271 0.314 0.392 0.400 0.431

0.153 0.241 0.271 0.314 0.392 0.400 0.431

0.153 0.241 0.271 0.314 0.392 0.400 0.431

M1

0.380 0.437 0.470 0.469 0.469 0.459 0.460

0.380 0.437 0.470 0.469 0.469 0.459 0.460

0.380 0.449 0.462 0.468 0.463 0.459 0.440

0.380 0.437 0.470 0.469 0.469 0.459 0.460

0.380 0.449 0.462 0.468 0.463 0.459 0.440

0.340 0.406 0.428 0.433 0.440 0.447 0.433

M2

Table B.11: Combining operators using Reuters(90) Dataset (WLocal)

0.434 0.455 0.472 0.467 0.461 0.451 0.434

0.420 0.460 0.459 0.457 0.456 0.443 0.426

0.428 0.472 0.462 0.457 0.456 0.448 0.422

0.390 0.459 0.454 0.465 0.453 0.447 0.444

0.433 0.456 0.460 0.460 0.453 0.439 0.427

0.370 0.426 0.412 0.428 0.441 0.436 0.431

0.377 0.434 0.465 0.459 0.457 0.458 0.436

0.333 0.417 0.420 0.435 0.437 0.438 0.433

0.380 0.412 0.421 0.433 0.431 0.435 0.427

0.207 0.270 0.334 0.369 0.429 0.426 0.453

0.252 0.265 0.316 0.370 0.433 0.430 0.426

0.209 0.256 0.272 0.339 0.423 0.419 0.428

macroF1 UN UCD

0.378 0.442 0.465 0.465 0.459 0.460 0.452

0.330 0.418 0.432 0.445 0.459 0.446 0.447

0.417 0.422 0.431 0.452 0.455 0.459 0.453

0.267 0.375 0.414 0.424 0.451 0.452 0.454

0.383 0.428 0.446 0.458 0.456 0.461 0.454

0.252 0.246 0.273 0.319 0.414 0.427 0.441

UCM

0.329 0.434 0.460 0.465 0.468 0.458 0.461

0.307 0.402 0.426 0.440 0.448 0.453 0.449

0.321 0.398 0.424 0.439 0.441 0.452 0.447

0.108 0.206 0.239 0.307 0.400 0.396 0.432

0.115 0.203 0.247 0.311 0.390 0.398 0.438

0.105 0.215 0.235 0.294 0.380 0.404 0.437

INT

164

0.8 (±0.0) 1.6 (±0.0) 2.3 (±0.0) 3.7 (±0.1) 7.1 (±0.1) 10.4 (±0.0) 13.7 (±0.1)

0.8 (±0.0) 1.5 (±0.0) 2.2 (±0.0) 3.5 (±0.0) 6.6 (±0.1) 9.5 (±0.0) 12.6 (±0.1)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.8 (±0.0) 1.5 (±0.0) 2.2 (±0.0) 3.5 (±0.0) 6.6 (±0.0) 9.6 (±0.0) 12.6 (±0.1)

0.2 (±0.0) 0.5 (±0.0) 0.8 (±0.0) 1.5 (±0.0) 3.4 (±0.1) 5.5 (±0.0) 7.4 (±0.1)

0.2 (±0.0) 0.4 (±0.0) 0.7 (±0.0) 1.3 (±0.1) 2.9 (±0.1) 4.6 (±0.0) 6.3 (±0.1)

0.2 (±0.0) 0.5 (±0.0) 0.8 (±0.0) 1.5 (±0.0) 3.4 (±0.0) 5.4 (±0.0) 7.4 (±0.1)

Threshold (%) UTh ITh

0.5

Th

39.2 (±1.1) 49.2 (±0.7) 54.4 (±0.8) 59.6 (±1.5) 67.4 (±1.2) 73.2 (±0.5) 74.2 (±1.1)

34.0 (±1.4) 41.1 (±0.7) 46.8 (±0.8) 50.3 (±0.7) 57.2 (±0.5) 61.1 (±0.7) 63.3 (±0.9)

38.3 (±0.8) 49.0 (±1.5) 55.1 (±0.6) 59.3 (±1.2) 67.8 (±0.4) 72.1 (±0.5) 73.5 (±1.0)

M1-M2

62.9 (±2.4) 69.6 (±1.0) 72.0 (±0.9) 75.6 (±1.2) 80.7 (±0.5) 84.7 (±0.5) 85.5 (±0.8)

66.4 (±1.3) 71.3 (±0.9) 74.1 (±0.8) 77.0 (±1.0) 81.4 (±0.4) 83.7 (±0.4) 85.7 (±0.4)

47.1 (±1.2) 55.0 (±1.1) 62.5 (±0.6) 64.9 (±0.9) 72.8 (±0.3) 76.3 (±0.4) 77.4 (±0.8)

65.7 (±2.6) 70.0 (±3.2) 73.0 (±2.9) 77.0 (±2.4) 79.4 (±0.8) 82.9 (±1.6) 83.6 (±1.8)

60.3 (±2.7) 63.1 (±3.1) 67.0 (±2.6) 70.2 (±2.9) 72.0 (±1.7) 74.4 (±1.3) 74.2 (±1.6)

68.1 (±2.1) 72.3 (±2.2) 76.7 (±1.5) 79.8 (±2.1) 80.5 (±1.7) 83.3 (±0.7) 82.8 (±1.8)

Similarity (%) M1-UCD M1-UCM

0.782 (±0.014) 0.820 (±0.008) 0.837 (±0.008) 0.848 (±0.005) 0.856 (±0.006) 0.856 (±0.009) 0.859 (±0.005)

0.782 (±0.014) 0.820 (±0.008) 0.837 (±0.008) 0.848 (±0.005) 0.856 (±0.006) 0.856 (±0.009) 0.859 (±0.005)

0.782 (±0.014) 0.820 (±0.008) 0.837 (±0.008) 0.848 (±0.005) 0.856 (±0.006) 0.856 (±0.009) 0.859 (±0.005)

M1

0.815 (±0.006) 0.837 (±0.007) 0.846 (±0.006) 0.857 (±0.006) 0.860 (±0.006) 0.861 (±0.008) 0.859 (±0.010)

0.815 (±0.008) 0.829 (±0.007) 0.838 (±0.006) 0.845 (±0.006) 0.856 (±0.008) 0.857 (±0.006) 0.858 (±0.009)

0.810 (±0.010) 0.833 (±0.010) 0.848 (±0.008) 0.851 (±0.007) 0.852 (±0.007) 0.858 (±0.008) 0.858 (±0.008)

M2

0.824 (±0.007) 0.840 (±0.006) 0.848 (±0.003) 0.856 (±0.007) 0.858 (±0.003) 0.860 (±0.007) 0.858 (±0.008)

0.825 (±0.006) 0.841 (±0.005) 0.850 (±0.004) 0.855 (±0.003) 0.857 (±0.004) 0.860 (±0.008) 0.860 (±0.010)

UCD CC-DF 0.805 (±0.012) 0.830 (±0.007) 0.839 (±0.008) 0.850 (±0.009) 0.852 (±0.006) 0.859 (±0.009) 0.859 (±0.008) CC-IG 0.812 (±0.006) 0.836 (±0.006) 0.843 (±0.007) 0.850 (±0.005) 0.857 (±0.002) 0.859 (±0.009) 0.859 (±0.009) CC-MI 0.811 (±0.006) 0.837 (±0.006) 0.842 (±0.007) 0.851 (±0.007) 0.859 (±0.004) 0.860 (±0.008) 0.857 (±0.010)

microF1

0.819 (±0.009) 0.838 (±0.008) 0.850 (±0.006) 0.852 (±0.008) 0.856 (±0.004) 0.859 (±0.011) 0.858 (±0.012)

UN

0.806 (±0.007) 0.830 (±0.007) 0.842 (±0.008) 0.854 (±0.007) 0.859 (±0.003) 0.856 (±0.007) 0.857 (±0.009)

0.805 (±0.009) 0.826 (±0.007) 0.839 (±0.006) 0.849 (±0.006) 0.858 (±0.006) 0.857 (±0.007) 0.859 (±0.010)

0.801 (±0.006) 0.834 (±0.004) 0.840 (±0.010) 0.848 (±0.009) 0.855 (±0.003) 0.857 (±0.007) 0.859 (±0.006)

UCM

0.762 (±0.013) 0.809 (±0.016) 0.829 (±0.007) 0.844 (±0.008) 0.858 (±0.009) 0.858 (±0.005) 0.858 (±0.010)

0.749 (±0.011) 0.795 (±0.014) 0.822 (±0.006) 0.835 (±0.004) 0.854 (±0.003) 0.856 (±0.006) 0.858 (±0.009)

0.755 (±0.011) 0.807 (±0.012) 0.829 (±0.007) 0.843 (±0.007) 0.853 (±0.006) 0.855 (±0.006) 0.859 (±0.008)

INT

Table B.12: Combining operators using Alj-Mgz-AS Dataset (FLocal)

0.671 (±0.027) 0.748 (±0.014) 0.770 (±0.019) 0.786 (±0.008) 0.796 (±0.010) 0.795 (±0.016) 0.799 (±0.010)

0.671 (±0.027) 0.748 (±0.014) 0.770 (±0.019) 0.786 (±0.008) 0.796 (±0.010) 0.795 (±0.016) 0.799 (±0.010)

0.671 (±0.027) 0.748 (±0.014) 0.770 (±0.019) 0.786 (±0.008) 0.796 (±0.010) 0.795 (±0.016) 0.799 (±0.010)

M1

0.743 (±0.013) 0.775 (±0.011) 0.789 (±0.011) 0.801 (±0.010) 0.804 (±0.008) 0.804 (±0.013) 0.797 (±0.016)

0.740 (±0.011) 0.769 (±0.013) 0.781 (±0.010) 0.790 (±0.011) 0.800 (±0.015) 0.798 (±0.009) 0.799 (±0.014)

0.730 (±0.021) 0.771 (±0.017) 0.792 (±0.011) 0.791 (±0.012) 0.795 (±0.013) 0.793 (±0.015) 0.794 (±0.013)

M2

0.761 (±0.008) 0.778 (±0.009) 0.789 (±0.009) 0.797 (±0.010) 0.800 (±0.003) 0.799 (±0.016) 0.796 (±0.013)

0.757 (±0.010) 0.783 (±0.007) 0.793 (±0.005) 0.798 (±0.006) 0.799 (±0.008) 0.799 (±0.014) 0.799 (±0.016)

UCD

0.736 (±0.008) 0.775 (±0.010) 0.782 (±0.012) 0.791 (±0.008) 0.800 (±0.008) 0.799 (±0.015) 0.793 (±0.016)

0.739 (±0.011) 0.776 (±0.010) 0.783 (±0.009) 0.788 (±0.006) 0.797 (±0.005) 0.798 (±0.014) 0.797 (±0.014)

0.717 (±0.033) 0.765 (±0.009) 0.777 (±0.012) 0.791 (±0.010) 0.794 (±0.012) 0.796 (±0.013) 0.794 (±0.014)

macroF1

0.748 (±0.020) 0.780 (±0.013) 0.793 (±0.011) 0.791 (±0.013) 0.795 (±0.009) 0.797 (±0.017) 0.796 (±0.021)

UN

0.720 (±0.018) 0.771 (±0.009) 0.782 (±0.009) 0.798 (±0.008) 0.800 (±0.006) 0.797 (±0.012) 0.794 (±0.014)

0.714 (±0.017) 0.761 (±0.012) 0.778 (±0.008) 0.793 (±0.005) 0.800 (±0.008) 0.798 (±0.008) 0.799 (±0.014)

0.701 (±0.011) 0.768 (±0.009) 0.776 (±0.013) 0.786 (±0.008) 0.797 (±0.009) 0.796 (±0.010) 0.796 (±0.011)

UCM

0.627 (±0.029) 0.726 (±0.029) 0.761 (±0.014) 0.785 (±0.012) 0.800 (±0.012) 0.797 (±0.008) 0.800 (±0.016)

0.611 (±0.030) 0.705 (±0.022) 0.752 (±0.023) 0.775 (±0.010) 0.799 (±0.008) 0.797 (±0.012) 0.801 (±0.015)

0.606 (±0.015) 0.723 (±0.025) 0.756 (±0.014) 0.778 (±0.011) 0.796 (±0.009) 0.793 (±0.012) 0.800 (±0.015)

INT

165

0.7 (±0.0) 1.3 (±0.0) 1.9 (±0.0) 3.1 (±0.0) 6.0 (±0.0) 8.9 (±0.1) 11.8 (±0.0)

0.6 (±0.0) 1.2 (±0.0) 1.8 (±0.0) 3.0 (±0.0) 6.0 (±0.0) 9.0 (±0.1) 12.0 (±0.1)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.7 (±0.1) 1.5 (±0.0) 2.1 (±0.0) 3.4 (±0.0) 6.6 (±0.0) 9.7 (±0.0) 12.8 (±0.0)

0.5

Th

0.4 (±0.0) 0.8 (±0.0) 1.2 (±0.0) 2.0 (±0.0) 4.0 (±0.0) 6.0 (±0.1) 8.0 (±0.1)

0.3 (±0.0) 0.7 (±0.0) 1.1 (±0.0) 1.9 (±0.0) 4.0 (±0.0) 6.1 (±0.1) 8.2 (±0.0)

0.3 (±0.1) 0.5 (±0.0) 0.9 (±0.0) 1.6 (±0.0) 3.4 (±0.0) 5.3 (±0.0) 7.2 (±0.0)

Threshold (%) UTh ITh

78.8 (±2.2) 78.9 (±1.3) 82.1 (±0.9) 80.1 (±0.4) 79.4 (±0.3) 79.5 (±0.5) 80.3 (±0.8)

57.6 (±2.0) 66.5 (±0.5) 71.5 (±0.9) 74.9 (±0.7) 79.9 (±0.4) 81.9 (±0.4) 82.3 (±0.5)

50.8 (±2.1) 54.6 (±1.1) 60.5 (±1.0) 62.5 (±0.8) 67.8 (±0.3) 70.7 (±0.5) 72.5 (±0.5)

M1-M2

84.8 (±1.6) 82.6 (±1.2) 85.0 (±0.9) 82.8 (±0.3) 83.1 (±0.3) 84.0 (±0.6) 85.1 (±0.4)

99.7 (±0.3) 99.5 (±0.3) 99.4 (±0.1) 99.5 (±0.2) 99.4 (±0.1) 99.6 (±0.2) 99.5 (±0.2)

99.2 (±0.5) 99.3 (±0.1) 99.2 (±0.1) 99.5 (±0.1) 99.5 (±0.1) 99.7 (±0.2) 99.7 (±0.2)

90.4 (±1.5) 90.2 (±0.5) 91.7 (±0.5) 91.5 (±0.2) 91.1 (±0.4) 91.0 (±0.8) 92.7 (±0.4)

76.5 (±1.3) 81.5 (±0.8) 84.3 (±0.9) 86.5 (±0.5) 89.8 (±0.6) 92.2 (±1.3) 92.3 (±0.9)

73.7 (±1.1) 76.3 (±1.1) 78.8 (±0.7) 80.0 (±0.5) 84.6 (±1.4) 85.6 (±1.0) 85.9 (±0.2)

Similarity (%) M1-UCD M1-UCM

0.815 (±0.008) 0.829 (±0.007) 0.838 (±0.006) 0.845 (±0.006) 0.856 (±0.008) 0.857 (±0.006) 0.858 (±0.009)

0.810 (±0.010) 0.833 (±0.010) 0.848 (±0.008) 0.851 (±0.007) 0.852 (±0.007) 0.858 (±0.008) 0.858 (±0.008)

0.810 (±0.010) 0.833 (±0.010) 0.848 (±0.008) 0.851 (±0.007) 0.852 (±0.007) 0.858 (±0.008) 0.858 (±0.008)

M1

0.815 (±0.006) 0.837 (±0.007) 0.846 (±0.006) 0.857 (±0.006) 0.860 (±0.006) 0.861 (±0.008) 0.859 (±0.010)

0.815 (±0.006) 0.837 (±0.007) 0.846 (±0.006) 0.857 (±0.006) 0.860 (±0.006) 0.861 (±0.008) 0.859 (±0.010)

0.815 (±0.008) 0.829 (±0.007) 0.838 (±0.006) 0.845 (±0.006) 0.856 (±0.008) 0.857 (±0.006) 0.858 (±0.009)

M2

Table B.12 – continued microF1 UN UCD UCM DF-IG 0.829 0.812 0.813 (±0.010) (±0.007) (±0.007) 0.844 0.834 0.837 (±0.007) (±0.007) (±0.007) 0.851 0.848 0.846 (±0.004) (±0.009) (±0.010) 0.855 0.851 0.854 (±0.008) (±0.009) (±0.010) 0.856 0.853 0.855 (±0.007) (±0.007) (±0.003) 0.860 0.857 0.857 (±0.008) (±0.008) (±0.009) 0.860 0.859 0.857 (±0.008) (±0.008) (±0.006) DF-MI 0.831 0.811 0.815 (±0.008) (±0.009) (±0.006) 0.842 0.833 0.837 (±0.007) (±0.011) (±0.009) 0.853 0.848 0.849 (±0.006) (±0.009) (±0.008) 0.856 0.850 0.854 (±0.005) (±0.007) (±0.006) 0.857 0.854 0.856 (±0.006) (±0.007) (±0.004) 0.861 0.858 0.859 (±0.010) (±0.009) (±0.007) 0.857 0.859 0.860 (±0.010) (±0.006) (±0.008) IG-MI 0.825 0.817 0.816 (±0.010) (±0.007) (±0.008) 0.840 0.837 0.833 (±0.003) (±0.005) (±0.005) 0.848 0.846 0.843 (±0.005) (±0.006) (±0.004) 0.857 0.855 0.852 (±0.005) (±0.006) (±0.008) 0.860 0.859 0.857 (±0.005) (±0.005) (±0.007) 0.860 0.861 0.859 (±0.007) (±0.008) (±0.005) 0.858 0.857 0.857 (±0.008) (±0.007) (±0.007) 0.804 (±0.007) 0.825 (±0.007) 0.838 (±0.006) 0.844 (±0.005) 0.856 (±0.005) 0.856 (±0.007) 0.858 (±0.009)

0.794 (±0.008) 0.825 (±0.010) 0.838 (±0.009) 0.848 (±0.006) 0.857 (±0.003) 0.860 (±0.008) 0.860 (±0.006)

0.791 (±0.005) 0.817 (±0.008) 0.831 (±0.006) 0.837 (±0.007) 0.852 (±0.003) 0.857 (±0.010) 0.860 (±0.008)

INT

0.740 (±0.011) 0.769 (±0.013) 0.781 (±0.010) 0.790 (±0.011) 0.800 (±0.015) 0.798 (±0.009) 0.799 (±0.014)

0.730 (±0.021) 0.771 (±0.017) 0.792 (±0.011) 0.791 (±0.012) 0.795 (±0.013) 0.793 (±0.015) 0.794 (±0.013)

0.730 (±0.021) 0.771 (±0.017) 0.792 (±0.011) 0.791 (±0.012) 0.795 (±0.013) 0.793 (±0.015) 0.794 (±0.013)

M1

0.743 (±0.013) 0.775 (±0.011) 0.789 (±0.011) 0.801 (±0.010) 0.804 (±0.008) 0.804 (±0.013) 0.797 (±0.016)

0.743 (±0.013) 0.775 (±0.011) 0.789 (±0.011) 0.801 (±0.010) 0.804 (±0.008) 0.804 (±0.013) 0.797 (±0.016)

0.740 (±0.011) 0.769 (±0.013) 0.781 (±0.010) 0.790 (±0.011) 0.800 (±0.015) 0.798 (±0.009) 0.799 (±0.014)

M2

0.757 (±0.016) 0.783 (±0.006) 0.795 (±0.012) 0.803 (±0.010) 0.803 (±0.009) 0.799 (±0.013) 0.795 (±0.014)

0.769 (±0.010) 0.784 (±0.011) 0.798 (±0.009) 0.798 (±0.008) 0.800 (±0.012) 0.798 (±0.018) 0.792 (±0.015)

UC

0.744 (±0.012) 0.779 (±0.009) 0.788 (±0.010) 0.798 (±0.009) 0.802 (±0.009) 0.801 (±0.015) 0.791 (±0.013)

0.728 (±0.017) 0.770 (±0.018) 0.790 (±0.010) 0.789 (±0.013) 0.798 (±0.011) 0.794 (±0.016) 0.795 (±0.010)

0.732 (±0.018) 0.771 (±0.015) 0.791 (±0.011) 0.793 (±0.014) 0.797 (±0.013) 0.793 (±0.016) 0.796 (±0.013)

macroF1

0.763 (±0.013) 0.787 (±0.013) 0.794 (±0.007) 0.797 (±0.013) 0.799 (±0.012) 0.796 (±0.013) 0.795 (±0.015)

UN

0.739 (±0.010) 0.773 (±0.007) 0.785 (±0.007) 0.798 (±0.012) 0.801 (±0.012) 0.802 (±0.008) 0.794 (±0.014)

0.736 (±0.012) 0.779 (±0.009) 0.789 (±0.011) 0.797 (±0.008) 0.799 (±0.006) 0.797 (±0.014) 0.795 (±0.012)

0.722 (±0.017) 0.775 (±0.010) 0.783 (±0.012) 0.800 (±0.012) 0.797 (±0.007) 0.797 (±0.015) 0.794 (±0.014)

UM

0.728 (±0.011) 0.761 (±0.013) 0.780 (±0.008) 0.786 (±0.009) 0.799 (±0.010) 0.799 (±0.011) 0.799 (±0.014)

0.697 (±0.015) 0.762 (±0.014) 0.781 (±0.012) 0.789 (±0.007) 0.801 (±0.007) 0.802 (±0.013) 0.797 (±0.011)

0.693 (±0.009) 0.746 (±0.012) 0.771 (±0.008) 0.774 (±0.010) 0.794 (±0.006) 0.796 (±0.014) 0.800 (±0.012)

INT

166

0.8 (±0.0) 1.5 (±0.1) 2.3 (±0.0) 3.7 (±0.0) 7.3 (±0.0) 10.7 (±0.1) 14.1 (±0.0)

0.7 (±0.0) 1.4 (±0.0) 2.1 (±0.0) 3.4 (±0.0) 6.5 (±0.0) 9.6 (±0.0) 12.2 (±0.1)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.8 (±0.0) 1.5 (±0.0) 2.2 (±0.0) 3.5 (±0.0) 6.8 (±0.1) 10.0 (±0.1) 12.9 (±0.0)

0.3 (±0.0) 0.6 (±0.0) 0.9 (±0.0) 1.6 (±0.0) 3.5 (±0.0) 5.4 (±0.0) 7.8 (±0.1)

0.2 (±0.0) 0.5 (±0.1) 0.7 (±0.0) 1.3 (±0.0) 2.7 (±0.0) 4.3 (±0.1) 5.9 (±0.0)

0.2 (±0.0) 0.5 (±0.0) 0.8 (±0.0) 1.5 (±0.0) 3.2 (±0.1) 5.0 (±0.1) 7.1 (±0.0)

Threshold (%) UTh ITh

0.5

Th

51.5 (±2.0) 58.0 (±1.1) 61.2 (±0.6) 64.2 (±1.2) 69.6 (±0.5) 72.1 (±1.0) 78.2 (±0.6)

37.7 (±1.5) 45.8 (±1.8) 47.6 (±1.2) 51.3 (±0.8) 53.9 (±0.7) 57.3 (±0.7) 59.0 (±0.3)

47.8 (±1.4) 52.6 (±1.5) 56.0 (±0.9) 61.4 (±0.3) 64.9 (±0.6) 66.9 (±0.6) 70.7 (±0.5)

M1-M2

72.4 (±2.3) 77.4 (±0.6) 77.8 (±1.0) 79.4 (±0.5) 83.5 (±1.7) 84.1 (±0.7) 88.0 (±0.5)

68.6 (±2.3) 72.9 (±0.7) 72.9 (±0.7) 74.3 (±0.8) 74.9 (±1.2) 73.7 (±0.8) 75.7 (±0.7)

53.5 (±1.3) 56.9 (±1.1) 60.3 (±1.1) 64.9 (±0.5) 68.2 (±1.3) 69.6 (±0.9) 71.6 (±0.4)

67.7 (±2.5) 70.2 (±3.1) 72.1 (±2.6) 75.6 (±2.2) 80.9 (±2.1) 81.8 (±1.1) 85.4 (±1.1)

56.5 (±2.2) 60.5 (±3.1) 61.7 (±3.9) 65.8 (±3.3) 69.5 (±2.7) 69.9 (±2.3) 71.4 (±1.9)

65.4 (±2.4) 70.1 (±2.9) 70.6 (±2.4) 73.6 (±2.5) 76.0 (±2.3) 75.9 (±1.4) 76.4 (±0.4)

Similarity (%) M1-UCD M1-UCM

0.797 (±0.007) 0.823 (±0.005) 0.836 (±0.009) 0.846 (±0.007) 0.858 (±0.006) 0.856 (±0.010) 0.857 (±0.007)

0.797 (±0.007) 0.823 (±0.005) 0.836 (±0.009) 0.846 (±0.007) 0.858 (±0.006) 0.856 (±0.010) 0.857 (±0.007)

0.797 (±0.007) 0.823 (±0.005) 0.836 (±0.009) 0.846 (±0.007) 0.858 (±0.006) 0.856 (±0.010) 0.857 (±0.007)

M1

0.823 (±0.008) 0.838 (±0.007) 0.848 (±0.006) 0.855 (±0.007) 0.860 (±0.006) 0.860 (±0.006) 0.859 (±0.007)

0.811 (±0.009) 0.838 (±0.007) 0.843 (±0.005) 0.855 (±0.004) 0.856 (±0.007) 0.858 (±0.006) 0.858 (±0.009)

0.805 (±0.010) 0.829 (±0.009) 0.841 (±0.011) 0.852 (±0.007) 0.856 (±0.008) 0.857 (±0.006) 0.856 (±0.007)

M2

0.826 (±0.008) 0.842 (±0.007) 0.849 (±0.008) 0.858 (±0.007) 0.859 (±0.003) 0.860 (±0.005) 0.858 (±0.009)

0.832 (±0.009) 0.845 (±0.007) 0.854 (±0.005) 0.860 (±0.005) 0.858 (±0.005) 0.859 (±0.006) 0.858 (±0.007)

UCD CC-DF 0.797 (±0.007) 0.827 (±0.007) 0.840 (±0.009) 0.850 (±0.006) 0.856 (±0.008) 0.857 (±0.010) 0.858 (±0.007) CC-IG 0.810 (±0.009) 0.838 (±0.005) 0.847 (±0.008) 0.854 (±0.004) 0.857 (±0.004) 0.857 (±0.007) 0.859 (±0.007) CC-MI 0.814 (±0.009) 0.837 (±0.008) 0.847 (±0.005) 0.857 (±0.008) 0.859 (±0.008) 0.859 (±0.008) 0.859 (±0.008)

microF1

0.821 (±0.007) 0.837 (±0.007) 0.845 (±0.008) 0.853 (±0.008) 0.859 (±0.010) 0.857 (±0.006) 0.858 (±0.010)

UN

0.820 (±0.007) 0.838 (±0.008) 0.845 (±0.009) 0.856 (±0.006) 0.862 (±0.004) 0.861 (±0.006) 0.861 (±0.006)

0.810 (±0.003) 0.834 (±0.006) 0.843 (±0.010) 0.853 (±0.003) 0.860 (±0.005) 0.858 (±0.004) 0.857 (±0.007)

0.797 (±0.004) 0.830 (±0.010) 0.838 (±0.011) 0.849 (±0.007) 0.856 (±0.006) 0.857 (±0.006) 0.858 (±0.008)

UCM

0.772 (±0.007) 0.814 (±0.007) 0.829 (±0.008) 0.846 (±0.009) 0.859 (±0.008) 0.856 (±0.011) 0.856 (±0.006)

0.761 (±0.005) 0.802 (±0.009) 0.822 (±0.012) 0.842 (±0.006) 0.854 (±0.007) 0.856 (±0.009) 0.857 (±0.007)

0.766 (±0.011) 0.811 (±0.008) 0.827 (±0.005) 0.842 (±0.006) 0.857 (±0.006) 0.857 (±0.011) 0.857 (±0.007)

INT

Table B.13: Combining operators using Alj-Mgz-AS Dataset (WLocal)

0.691 (±0.017) 0.745 (±0.008) 0.760 (±0.017) 0.773 (±0.012) 0.795 (±0.012) 0.792 (±0.018) 0.793 (±0.013)

0.691 (±0.017) 0.745 (±0.008) 0.760 (±0.017) 0.773 (±0.012) 0.795 (±0.012) 0.792 (±0.018) 0.793 (±0.013)

0.691 (±0.017) 0.745 (±0.008) 0.760 (±0.017) 0.773 (±0.012) 0.795 (±0.012) 0.792 (±0.018) 0.793 (±0.013)

M1

0.743 (±0.013) 0.776 (±0.012) 0.788 (±0.008) 0.796 (±0.011) 0.800 (±0.008) 0.800 (±0.010) 0.795 (±0.008)

0.729 (±0.017) 0.776 (±0.008) 0.784 (±0.008) 0.799 (±0.006) 0.796 (±0.008) 0.797 (±0.011) 0.800 (±0.015)

0.714 (±0.024) 0.762 (±0.012) 0.778 (±0.012) 0.788 (±0.013) 0.794 (±0.010) 0.792 (±0.008) 0.791 (±0.010)

M2

0.759 (±0.012) 0.779 (±0.013) 0.785 (±0.008) 0.798 (±0.011) 0.800 (±0.004) 0.800 (±0.010) 0.796 (±0.015)

0.769 (±0.017) 0.785 (±0.007) 0.796 (±0.006) 0.802 (±0.009) 0.798 (±0.011) 0.798 (±0.010) 0.797 (±0.012)

UCD

0.737 (±0.016) 0.774 (±0.016) 0.784 (±0.008) 0.796 (±0.009) 0.800 (±0.010) 0.798 (±0.013) 0.796 (±0.011)

0.734 (±0.015) 0.775 (±0.009) 0.788 (±0.011) 0.792 (±0.008) 0.796 (±0.010) 0.793 (±0.012) 0.797 (±0.012)

0.690 (±0.016) 0.754 (±0.014) 0.778 (±0.015) 0.785 (±0.011) 0.795 (±0.011) 0.791 (±0.014) 0.794 (±0.010)

macroF1

0.743 (±0.019) 0.774 (±0.010) 0.779 (±0.010) 0.787 (±0.015) 0.798 (±0.012) 0.792 (±0.010) 0.794 (±0.013)

UN

0.742 (±0.018) 0.777 (±0.013) 0.790 (±0.006) 0.798 (±0.012) 0.803 (±0.010) 0.801 (±0.011) 0.799 (±0.007)

0.716 (±0.011) 0.770 (±0.010) 0.785 (±0.013) 0.796 (±0.004) 0.801 (±0.009) 0.796 (±0.008) 0.796 (±0.012)

0.684 (±0.009) 0.758 (±0.015) 0.775 (±0.010) 0.785 (±0.013) 0.793 (±0.011) 0.792 (±0.011) 0.793 (±0.009)

UCM

0.612 (±0.025) 0.724 (±0.012) 0.758 (±0.017) 0.776 (±0.015) 0.795 (±0.015) 0.794 (±0.014) 0.791 (±0.012)

0.604 (±0.014) 0.694 (±0.021) 0.740 (±0.026) 0.772 (±0.010) 0.794 (±0.010) 0.794 (±0.013) 0.794 (±0.012)

0.602 (±0.022) 0.717 (±0.008) 0.753 (±0.010) 0.769 (±0.010) 0.793 (±0.014) 0.792 (±0.020) 0.792 (±0.012)

INT

167

0.7 (±0.0) 1.4 (±0.0) 2.1 (±0.0) 3.3 (±0.0) 6.5 (±0.1) 9.7 (±0.0) 13.1 (±0.0)

0.7 (±0.1) 1.3 (±0.0) 1.9 (±0.0) 3.2 (±0.0) 6.5 (±0.0) 9.8 (±0.1) 13.2 (±0.1)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.7 (±0.0) 1.5 (±0.1) 2.1 (±0.1) 3.5 (±0.0) 6.9 (±0.1) 10.0 (±0.0) 13.5 (±0.0)

0.5

Th

0.3 (±0.0) 0.7 (±0.0) 1.1 (±0.0) 1.8 (±0.0) 3.5 (±0.0) 5.2 (±0.1) 6.8 (±0.1)

0.3 (±0.0) 0.6 (±0.0) 0.9 (±0.0) 1.7 (±0.0) 3.5 (±0.1) 5.3 (±0.0) 6.9 (±0.0)

0.3 (±0.0) 0.5 (±0.1) 0.9 (±0.1) 1.5 (±0.0) 3.1 (±0.1) 5.0 (±0.0) 6.5 (±0.0)

Threshold (%) UTh ITh

69.6 (±0.6) 73.3 (±1.2) 71.2 (±0.8) 71.0 (±0.5) 69.8 (±0.4) 69.3 (±0.6) 68.7 (±0.4)

55.9 (±0.7) 58.9 (±1.9) 62.5 (±0.5) 68.0 (±0.7) 69.5 (±1.1) 70.2 (±0.3) 69.2 (±0.2)

51.0 (±0.7) 55.3 (±0.8) 56.8 (±0.3) 60.8 (±0.3) 63.2 (±1.0) 66.6 (±0.8) 65.0 (±0.2)

M1-M2

84.6 (±1.1) 86.2 (±1.3) 84.2 (±0.6) 84.5 (±0.5) 85.8 (±0.5) 87.7 (±0.4) 87.7 (±0.2)

98.8 (±0.5) 98.7 (±0.3) 98.4 (±0.3) 98.5 (±0.3) 99.1 (±0.5) 98.3 (±0.4) 99.6 (±0.1)

96.6 (±0.5) 96.6 (±0.4) 96.0 (±0.6) 94.4 (±0.3) 95.0 (±0.7) 94.0 (±0.5) 95.7 (±0.5)

90.2 (±0.9) 92.7 (±1.4) 90.9 (±0.3) 90.4 (±0.5) 89.9 (±0.5) 91.5 (±0.5) 90.0 (±0.5)

79.2 (±0.6) 79.2 (±1.0) 80.2 (±0.6) 84.7 (±0.4) 87.8 (±0.9) 89.1 (±0.7) 92.7 (±0.7)

79.4 (±0.8) 76.5 (±0.4) 78.1 (±0.6) 80.9 (±0.4) 84.7 (±0.8) 85.7 (±0.6) 88.6 (±0.7)

Similarity (%) M1-UCD M1-UCM

0.811 (±0.009) 0.838 (±0.007) 0.843 (±0.005) 0.855 (±0.004) 0.856 (±0.007) 0.858 (±0.006) 0.858 (±0.009)

0.805 (±0.010) 0.829 (±0.009) 0.841 (±0.011) 0.852 (±0.007) 0.856 (±0.008) 0.857 (±0.006) 0.856 (±0.007)

0.805 (±0.010) 0.829 (±0.009) 0.841 (±0.011) 0.852 (±0.007) 0.856 (±0.008) 0.857 (±0.006) 0.856 (±0.007)

M1

0.823 (±0.008) 0.838 (±0.007) 0.848 (±0.006) 0.855 (±0.007) 0.860 (±0.006) 0.860 (±0.006) 0.859 (±0.007)

0.823 (±0.008) 0.838 (±0.007) 0.848 (±0.006) 0.855 (±0.007) 0.860 (±0.006) 0.860 (±0.006) 0.859 (±0.007)

0.811 (±0.009) 0.838 (±0.007) 0.843 (±0.005) 0.855 (±0.004) 0.856 (±0.007) 0.858 (±0.006) 0.858 (±0.009)

M2

Table B.13 – continued microF1 UN UCD UCM DF-IG 0.830 0.805 0.811 (±0.012) (±0.008) (±0.007) 0.846 0.831 0.835 (±0.009) (±0.011) (±0.010) 0.855 0.843 0.846 (±0.008) (±0.008) (±0.008) 0.857 0.850 0.852 (±0.005) (±0.009) (±0.006) 0.858 0.856 0.856 (±0.006) (±0.011) (±0.006) 0.857 0.857 0.858 (±0.008) (±0.008) (±0.007) 0.859 0.858 0.860 (±0.010) (±0.007) (±0.007) DF-MI 0.831 0.807 0.819 (±0.007) (±0.009) (±0.009) 0.845 0.828 0.838 (±0.008) (±0.010) (±0.011) 0.853 0.845 0.849 (±0.008) (±0.009) (±0.008) 0.858 0.852 0.852 (±0.009) (±0.006) (±0.004) 0.856 0.855 0.858 (±0.009) (±0.009) (±0.007) 0.857 0.856 0.858 (±0.007) (±0.006) (±0.008) 0.862 0.857 0.857 (±0.008) (±0.007) (±0.006) IG-MI 0.831 0.820 0.813 (±0.004) (±0.010) (±0.009) 0.843 0.838 0.836 (±0.007) (±0.009) (±0.006) 0.854 0.846 0.846 (±0.009) (±0.003) (±0.006) 0.859 0.859 0.859 (±0.004) (±0.005) (±0.004) 0.858 0.858 0.858 (±0.006) (±0.005) (±0.005) 0.858 0.856 0.859 (±0.007) (±0.009) (±0.005) 0.859 0.858 0.858 (±0.010) (±0.008) (±0.009) 0.803 (±0.010) 0.830 (±0.007) 0.837 (±0.003) 0.851 (±0.005) 0.857 (±0.003) 0.857 (±0.006) 0.859 (±0.006)

0.792 (±0.008) 0.824 (±0.006) 0.836 (±0.008) 0.845 (±0.007) 0.860 (±0.006) 0.859 (±0.007) 0.858 (±0.008)

0.782 (±0.004) 0.815 (±0.006) 0.827 (±0.007) 0.844 (±0.006) 0.856 (±0.006) 0.856 (±0.006) 0.856 (±0.007)

INT

0.729 (±0.017) 0.776 (±0.008) 0.784 (±0.008) 0.799 (±0.006) 0.796 (±0.008) 0.797 (±0.011) 0.800 (±0.015)

0.714 (±0.024) 0.762 (±0.012) 0.778 (±0.012) 0.788 (±0.013) 0.794 (±0.010) 0.792 (±0.008) 0.791 (±0.010)

0.714 (±0.024) 0.762 (±0.012) 0.778 (±0.012) 0.788 (±0.013) 0.794 (±0.010) 0.792 (±0.008) 0.791 (±0.010)

M1

0.743 (±0.013) 0.776 (±0.012) 0.788 (±0.008) 0.796 (±0.011) 0.800 (±0.008) 0.800 (±0.010) 0.795 (±0.008)

0.743 (±0.013) 0.776 (±0.012) 0.788 (±0.008) 0.796 (±0.011) 0.800 (±0.008) 0.800 (±0.010) 0.795 (±0.008)

0.729 (±0.017) 0.776 (±0.008) 0.784 (±0.008) 0.799 (±0.006) 0.796 (±0.008) 0.797 (±0.011) 0.800 (±0.015)

M2

0.759 (±0.009) 0.781 (±0.007) 0.796 (±0.009) 0.804 (±0.009) 0.795 (±0.010) 0.797 (±0.010) 0.798 (±0.015)

0.759 (±0.012) 0.784 (±0.012) 0.795 (±0.011) 0.798 (±0.014) 0.795 (±0.012) 0.792 (±0.013) 0.797 (±0.012)

UC

0.744 (±0.017) 0.778 (±0.013) 0.786 (±0.006) 0.805 (±0.011) 0.797 (±0.009) 0.794 (±0.013) 0.797 (±0.012)

0.718 (±0.019) 0.758 (±0.015) 0.784 (±0.013) 0.793 (±0.010) 0.793 (±0.011) 0.792 (±0.011) 0.793 (±0.010)

0.715 (±0.021) 0.767 (±0.017) 0.782 (±0.012) 0.793 (±0.014) 0.795 (±0.015) 0.792 (±0.014) 0.794 (±0.009)

macroF1

0.757 (±0.017) 0.786 (±0.012) 0.797 (±0.010) 0.800 (±0.007) 0.795 (±0.009) 0.793 (±0.012) 0.795 (±0.015)

UN

0.731 (±0.018) 0.772 (±0.009) 0.783 (±0.007) 0.804 (±0.007) 0.797 (±0.009) 0.799 (±0.009) 0.795 (±0.015)

0.734 (±0.012) 0.780 (±0.014) 0.793 (±0.009) 0.796 (±0.011) 0.796 (±0.010) 0.794 (±0.014) 0.793 (±0.008)

0.717 (±0.019) 0.770 (±0.019) 0.786 (±0.012) 0.795 (±0.012) 0.796 (±0.007) 0.797 (±0.009) 0.798 (±0.011)

UM

0.718 (±0.015) 0.765 (±0.015) 0.775 (±0.002) 0.793 (±0.007) 0.798 (±0.006) 0.799 (±0.008) 0.799 (±0.009)

0.674 (±0.018) 0.751 (±0.008) 0.771 (±0.011) 0.782 (±0.015) 0.803 (±0.011) 0.799 (±0.012) 0.796 (±0.011)

0.667 (±0.011) 0.735 (±0.014) 0.757 (±0.011) 0.783 (±0.010) 0.795 (±0.010) 0.794 (±0.010) 0.794 (±0.011)

INT

168

0.8 (±0.0) 1.5 (±0.1) 2.2 (±0.0) 3.5 (±0.0) 6.8 (±0.0) 10.3 (±0.1) 13.6 (±0.1)

0.8 (±0.0) 1.4 (±0.0) 2.0 (±0.1) 3.2 (±0.0) 6.1 (±0.0) 9.0 (±0.0) 12.2 (±0.1)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.8 (±0.0) 1.4 (±0.0) 2.1 (±0.0) 3.2 (±0.1) 6.3 (±0.0) 9.4 (±0.1) 12.3 (±0.1)

0.2 (±0.0) 0.6 (±0.0) 1.0 (±0.1) 1.8 (±0.0) 3.9 (±0.0) 6.0 (±0.0) 7.8 (±0.1)

0.2 (±0.0) 0.5 (±0.1) 0.8 (±0.0) 1.5 (±0.0) 3.2 (±0.0) 4.7 (±0.1) 6.4 (±0.1)

0.2 (±0.0) 0.6 (±0.0) 0.9 (±0.0) 1.8 (±0.1) 3.7 (±0.0) 5.6 (±0.1) 7.7 (±0.1)

Threshold (%) UTh ITh

0.5

Th

43.6 (±2.3) 58.4 (±3.2) 63.7 (±2.3) 72.0 (±1.5) 77.9 (±0.4) 79.5 (±0.4) 77.8 (±0.9)

34.8 (±2.1) 47.9 (±3.8) 53.3 (±1.1) 59.8 (±1.0) 63.8 (±0.4) 62.7 (±1.0) 64.3 (±0.3)

43.6 (±2.5) 57.7 (±2.0) 61.4 (±2.2) 70.5 (±1.3) 74.6 (±0.6) 74.5 (±0.9) 76.5 (±0.5)

M1-M2

70.8 (±2.8) 75.9 (±1.1) 77.6 (±1.7) 82.8 (±1.2) 88.0 (±0.3) 91.1 (±0.7) 91.6 (±1.1)

72.3 (±2.5) 76.0 (±2.5) 76.9 (±1.0) 81.9 (±1.0) 86.7 (±0.5) 87.9 (±1.0) 88.6 (±0.5)

49.9 (±2.3) 61.3 (±1.2) 64.4 (±2.5) 74.3 (±1.2) 77.8 (±0.8) 79.5 (±0.6) 81.9 (±0.3)

66.1 (±7.7) 72.7 (±5.5) 75.6 (±2.1) 82.8 (±2.0) 86.4 (±2.3) 86.1 (±1.6) 83.3 (±2.3)

56.5 (±8.8) 61.7 (±5.7) 65.3 (±2.6) 69.7 (±2.6) 74.1 (±2.4) 69.3 (±2.8) 70.9 (±0.8)

68.0 (±2.6) 74.2 (±3.9) 77.9 (±3.0) 81.1 (±1.9) 82.4 (±1.8) 80.8 (±3.0) 81.3 (±1.2)

Similarity (%) M1-UCD M1-UCM

0.762 (±0.012) 0.808 (±0.011) 0.825 (±0.006) 0.831 (±0.006) 0.832 (±0.009) 0.837 (±0.008) 0.834 (±0.009)

0.762 (±0.012) 0.808 (±0.011) 0.825 (±0.006) 0.831 (±0.006) 0.832 (±0.009) 0.837 (±0.008) 0.834 (±0.009)

0.762 (±0.012) 0.808 (±0.011) 0.825 (±0.006) 0.831 (±0.006) 0.832 (±0.009) 0.837 (±0.008) 0.834 (±0.009)

M1

0.800 (±0.007) 0.825 (±0.010) 0.825 (±0.005) 0.830 (±0.006) 0.836 (±0.008) 0.836 (±0.008) 0.834 (±0.007)

0.785 (±0.007) 0.815 (±0.008) 0.821 (±0.008) 0.831 (±0.003) 0.835 (±0.009) 0.836 (±0.006) 0.837 (±0.005)

0.777 (±0.010) 0.807 (±0.011) 0.823 (±0.009) 0.829 (±0.007) 0.835 (±0.009) 0.836 (±0.011) 0.835 (±0.006)

M2

0.806 (±0.008) 0.826 (±0.007) 0.831 (±0.006) 0.832 (±0.009) 0.836 (±0.006) 0.837 (±0.008) 0.835 (±0.009)

0.813 (±0.006) 0.827 (±0.007) 0.834 (±0.007) 0.835 (±0.010) 0.835 (±0.009) 0.837 (±0.009) 0.835 (±0.007)

UCD CC-DF 0.771 (±0.007) 0.805 (±0.009) 0.821 (±0.007) 0.828 (±0.006) 0.836 (±0.008) 0.836 (±0.010) 0.834 (±0.007) CC-IG 0.787 (±0.003) 0.815 (±0.009) 0.827 (±0.004) 0.830 (±0.006) 0.835 (±0.009) 0.836 (±0.009) 0.836 (±0.008) CC-MI 0.784 (±0.009) 0.817 (±0.006) 0.826 (±0.006) 0.830 (±0.007) 0.836 (±0.007) 0.836 (±0.008) 0.835 (±0.010)

microF1

0.797 (±0.006) 0.816 (±0.007) 0.827 (±0.005) 0.831 (±0.008) 0.835 (±0.010) 0.834 (±0.012) 0.835 (±0.007)

UN

0.790 (±0.007) 0.820 (±0.010) 0.826 (±0.006) 0.829 (±0.006) 0.836 (±0.010) 0.837 (±0.007) 0.834 (±0.008)

0.774 (±0.005) 0.811 (±0.009) 0.822 (±0.007) 0.829 (±0.005) 0.836 (±0.009) 0.837 (±0.007) 0.837 (±0.008)

0.770 (±0.009) 0.808 (±0.012) 0.818 (±0.007) 0.830 (±0.007) 0.835 (±0.009) 0.836 (±0.009) 0.835 (±0.007)

UCM

0.736 (±0.020) 0.803 (±0.011) 0.817 (±0.009) 0.829 (±0.005) 0.831 (±0.007) 0.836 (±0.008) 0.833 (±0.008)

0.710 (±0.009) 0.782 (±0.011) 0.804 (±0.012) 0.821 (±0.003) 0.833 (±0.007) 0.836 (±0.007) 0.837 (±0.007)

0.735 (±0.005) 0.790 (±0.015) 0.814 (±0.009) 0.826 (±0.006) 0.831 (±0.009) 0.837 (±0.010) 0.834 (±0.010)

INT

Table B.14: Combining operators using Alj-Mgz-SR Dataset (FLocal)

0.627 (±0.025) 0.730 (±0.022) 0.751 (±0.010) 0.760 (±0.017) 0.761 (±0.019) 0.764 (±0.017) 0.759 (±0.015)

0.627 (±0.025) 0.730 (±0.022) 0.751 (±0.010) 0.760 (±0.017) 0.761 (±0.019) 0.764 (±0.017) 0.759 (±0.015)

0.627 (±0.025) 0.730 (±0.022) 0.751 (±0.010) 0.760 (±0.017) 0.761 (±0.019) 0.764 (±0.017) 0.759 (±0.015)

M1

0.723 (±0.010) 0.760 (±0.010) 0.753 (±0.010) 0.762 (±0.010) 0.770 (±0.014) 0.765 (±0.014) 0.762 (±0.013)

0.693 (±0.014) 0.747 (±0.007) 0.752 (±0.011) 0.769 (±0.006) 0.764 (±0.016) 0.763 (±0.013) 0.766 (±0.009)

0.664 (±0.015) 0.726 (±0.023) 0.753 (±0.011) 0.758 (±0.014) 0.765 (±0.017) 0.764 (±0.014) 0.760 (±0.011)

M2

0.730 (±0.014) 0.758 (±0.009) 0.761 (±0.012) 0.763 (±0.017) 0.767 (±0.014) 0.767 (±0.015) 0.762 (±0.015)

0.738 (±0.012) 0.761 (±0.009) 0.768 (±0.012) 0.770 (±0.017) 0.764 (±0.019) 0.764 (±0.019) 0.762 (±0.012)

UCD

0.690 (±0.019) 0.746 (±0.009) 0.756 (±0.011) 0.758 (±0.013) 0.766 (±0.015) 0.766 (±0.016) 0.761 (±0.015)

0.697 (±0.007) 0.741 (±0.016) 0.758 (±0.010) 0.759 (±0.010) 0.764 (±0.015) 0.764 (±0.016) 0.765 (±0.011)

0.649 (±0.010) 0.721 (±0.018) 0.750 (±0.014) 0.757 (±0.016) 0.763 (±0.015) 0.764 (±0.014) 0.759 (±0.010)

macroF1

0.707 (±0.017) 0.738 (±0.015) 0.757 (±0.006) 0.759 (±0.015) 0.765 (±0.017) 0.758 (±0.016) 0.759 (±0.011)

UN

0.692 (±0.017) 0.753 (±0.012) 0.754 (±0.015) 0.760 (±0.010) 0.766 (±0.019) 0.767 (±0.015) 0.761 (±0.017)

0.669 (±0.009) 0.736 (±0.019) 0.754 (±0.012) 0.762 (±0.011) 0.763 (±0.018) 0.764 (±0.015) 0.767 (±0.014)

0.647 (±0.016) 0.727 (±0.022) 0.747 (±0.013) 0.760 (±0.015) 0.765 (±0.018) 0.765 (±0.013) 0.760 (±0.011)

UCM

0.590 (±0.039) 0.717 (±0.027) 0.738 (±0.013) 0.761 (±0.013) 0.762 (±0.014) 0.765 (±0.016) 0.760 (±0.013)

0.564 (±0.019) 0.682 (±0.029) 0.724 (±0.018) 0.752 (±0.007) 0.765 (±0.015) 0.761 (±0.014) 0.763 (±0.012)

0.567 (±0.011) 0.693 (±0.035) 0.739 (±0.015) 0.753 (±0.014) 0.761 (±0.017) 0.767 (±0.014) 0.760 (±0.015)

INT

169

0.8 (±0.0) 1.3 (±0.0) 1.9 (±0.0) 3.0 (±0.0) 6.0 (±0.0) 9.2 (±0.1) 12.2 (±0.2)

0.6 (±0.0) 1.2 (±0.0) 1.8 (±0) 3.04 (±0.06) 6.16 (±0.06) 9.26 (±0.11) 11.94 (±0.18)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.8 (±0.0) 1.4 (±0.0) 2.0 (±0.0) 3.2 (±0.0) 6.4 (±0.1) 9.6 (±0.0) 12.9 (±0.1)

0.5

Th

0.4 (±0.0) 0.8 (±0.0) 1.2 (±0) 1.96 (±0.06) 3.84 (±0.06) 5.74 (±0.11) 8.06 (±0.18)

0.2 (±0.0) 0.7 (±0.0) 1.1 (±0.0) 2.0 (±0.0) 4.0 (±0.0) 5.8 (±0.1) 7.8 (±0.2)

0.2 (±0.0) 0.6 (±0.0) 1.0 (±0.0) 1.8 (±0.0) 3.6 (±0.1) 5.4 (±0.0) 7.1 (±0.1)

Threshold (%) UTh ITh

75.9 (±2.0) 77.2 (±0.9) 79.3 (±1.2) 78.0 (±0.4) 76.7 (±0.7) 76.2 (±1.4) 80.5 (±1.7)

47.9 (±1.4) 66.7 (±1.3) 74.0 (±1.2) 81.0 (±1.0) 80.8 (±0.3) 76.9 (±0.6) 78.0 (±1.8)

41.8 (±0.8) 56.5 (±1.3) 64.2 (±0.8) 70.8 (±0.6) 72.8 (±0.9) 72.1 (±0.5) 71.6 (±0.8)

M1-M2

85.5 (±1.7) 83.0 (±1.1) 84.4 (±0.5) 84.3 (±0.4) 84.3 (±0.8) 85.4 (±0.7) 87.0 (±0.8)

99.6 (±0.3) 99.8 (±0.3) 99.9 (±0.1) 99.6 (±0.1) 99.1 (±0.4) 98.6 (±0.4) 99.0 (±0.5)

99.8 (±0.3) 99.6 (±0.2) 99.7 (±0.3) 99.5 (±0.1) 99.7 (±0.2) 99.8 (±0.4) 99.6 (±0.4)

91.9 (±0.8) 93.0 (±0.6) 94.2 (±0.4) 93.9 (±0.4) 91.8 (±0.8) 95.1 (±0.5) 95.7 (±1.4)

72.8 (±1.1) 82.2 (±0.7) 86.7 (±0.7) 92.4 (±0.6) 94.5 (±0.8) 92.6 (±2.1) 94.0 (±3.8)

70.6 (±1.5) 73.0 (±1.8) 77.9 (±1.2) 83.0 (±0.6) 86.3 (±0.6) 86.5 (±1.4) 85.7 (±0.7)

Similarity (%) M1-UCD M1-UCM

0.785 (±0.007) 0.815 (±0.008) 0.821 (±0.008) 0.831 (±0.003) 0.835 (±0.009) 0.836 (±0.006) 0.837 (±0.005)

0.777 (±0.010) 0.807 (±0.011) 0.823 (±0.009) 0.829 (±0.007) 0.835 (±0.009) 0.836 (±0.011) 0.835 (±0.006)

0.777 (±0.010) 0.807 (±0.011) 0.823 (±0.009) 0.829 (±0.007) 0.835 (±0.009) 0.836 (±0.011) 0.835 (±0.006)

M1

0.800 (±0.007) 0.825 (±0.010) 0.825 (±0.005) 0.830 (±0.006) 0.836 (±0.008) 0.836 (±0.008) 0.834 (±0.007)

0.800 (±0.007) 0.825 (±0.010) 0.825 (±0.005) 0.830 (±0.006) 0.836 (±0.008) 0.836 (±0.008) 0.834 (±0.007)

0.785 (±0.007) 0.815 (±0.008) 0.821 (±0.008) 0.831 (±0.003) 0.835 (±0.009) 0.836 (±0.006) 0.837 (±0.005)

M2

Table B.14 – continued microF1 UN UCD UCM DF-IG 0.813 0.777 0.793 (±0.009) (±0.009) (±0.006) 0.823 0.808 0.817 (±0.008) (±0.010) (±0.013) 0.828 0.822 0.825 (±0.005) (±0.009) (±0.008) 0.838 0.828 0.831 (±0.008) (±0.007) (±0.006) 0.836 0.835 0.837 (±0.009) (±0.008) (±0.008) 0.835 0.837 0.836 (±0.009) (±0.011) (±0.010) 0.835 0.834 0.837 (±0.006) (±0.007) (±0.007) DF-MI 0.812 0.776 0.791 (±0.010) (±0.008) (±0.008) 0.825 0.807 0.819 (±0.008) (±0.011) (±0.008) 0.828 0.822 0.826 (±0.006) (±0.009) (±0.009) 0.833 0.828 0.831 (±0.007) (±0.008) (±0.005) 0.837 0.835 0.836 (±0.010) (±0.008) (±0.008) 0.835 0.836 0.836 (±0.009) (±0.010) (±0.010) 0.835 0.835 0.835 (±0.006) (±0.007) (±0.008) IG-MI 0.811 0.794 0.793 (±0.007) (±0.009) (±0.008) 0.829 0.822 0.816 (±0.009) (±0.011) (±0.006) 0.830 0.826 0.824 (±0.005) (±0.003) (±0.004) 0.835 0.831 0.831 (±0.006) (±0.005) (±0.006) 0.835 0.836 0.836 (±0.008) (±0.006) (±0.008) 0.837 0.836 0.836 (±0.008) (±0.008) (±0.008) 0.835 0.835 0.837 (±0.007) (±0.005) (±0.005) 0.773 (±0.009) 0.809 (±0.007) 0.821 (±0.006) 0.825 (±0.002) 0.836 (±0.010) 0.837 (±0.007) 0.837 (±0.006)

0.751 (±0.008) 0.804 (±0.009) 0.822 (±0.009) 0.826 (±0.005) 0.834 (±0.008) 0.838 (±0.007) 0.834 (±0.007)

0.735 (±0.007) 0.784 (±0.009) 0.816 (±0.010) 0.822 (±0.004) 0.833 (±0.010) 0.835 (±0.007) 0.837 (±0.004)

INT

0.693 (±0.014) 0.747 (±0.007) 0.752 (±0.011) 0.769 (±0.006) 0.764 (±0.016) 0.763 (±0.013) 0.766 (±0.009)

0.664 (±0.015) 0.726 (±0.023) 0.753 (±0.011) 0.758 (±0.014) 0.765 (±0.017) 0.764 (±0.014) 0.760 (±0.011)

0.664 (±0.015) 0.726 (±0.023) 0.753 (±0.011) 0.758 (±0.014) 0.765 (±0.017) 0.764 (±0.014) 0.760 (±0.011)

M1

0.723 (±0.010) 0.760 (±0.010) 0.753 (±0.010) 0.762 (±0.010) 0.770 (±0.014) 0.765 (±0.014) 0.762 (±0.013)

0.723 (±0.010) 0.760 (±0.010) 0.753 (±0.010) 0.762 (±0.010) 0.770 (±0.014) 0.765 (±0.014) 0.762 (±0.013)

0.693 (±0.014) 0.747 (±0.007) 0.752 (±0.011) 0.769 (±0.006) 0.764 (±0.016) 0.763 (±0.013) 0.766 (±0.009)

M2

0.733 (±0.011) 0.765 (±0.012) 0.760 (±0.011) 0.773 (±0.012) 0.764 (±0.015) 0.765 (±0.015) 0.763 (±0.013)

0.739 (±0.014) 0.755 (±0.012) 0.757 (±0.015) 0.764 (±0.015) 0.767 (±0.017) 0.758 (±0.016) 0.759 (±0.012)

UC

0.705 (±0.016) 0.755 (±0.012) 0.755 (±0.010) 0.767 (±0.011) 0.770 (±0.012) 0.765 (±0.014) 0.762 (±0.011)

0.664 (±0.012) 0.726 (±0.019) 0.753 (±0.013) 0.758 (±0.014) 0.765 (±0.016) 0.764 (±0.014) 0.760 (±0.011)

0.667 (±0.010) 0.730 (±0.018) 0.752 (±0.014) 0.756 (±0.014) 0.764 (±0.015) 0.764 (±0.015) 0.759 (±0.012)

macroF1

0.737 (±0.018) 0.754 (±0.011) 0.756 (±0.011) 0.770 (±0.013) 0.766 (±0.015) 0.761 (±0.012) 0.759 (±0.012)

UN

0.701 (±0.018) 0.748 (±0.011) 0.752 (±0.009) 0.766 (±0.010) 0.767 (±0.015) 0.765 (±0.015) 0.767 (±0.013)

0.688 (±0.012) 0.745 (±0.012) 0.755 (±0.014) 0.761 (±0.013) 0.768 (±0.017) 0.764 (±0.017) 0.759 (±0.014)

0.694 (±0.009) 0.741 (±0.023) 0.758 (±0.014) 0.765 (±0.007) 0.767 (±0.016) 0.765 (±0.017) 0.763 (±0.011)

UM

0.672 (±0.015) 0.739 (±0.010) 0.750 (±0.008) 0.757 (±0.007) 0.768 (±0.019) 0.767 (±0.017) 0.767 (±0.013)

0.647 (±0.018) 0.718 (±0.021) 0.754 (±0.012) 0.759 (±0.015) 0.767 (±0.014) 0.769 (±0.014) 0.760 (±0.011)

0.625 (±0.016) 0.687 (±0.019) 0.749 (±0.015) 0.755 (±0.006) 0.762 (±0.016) 0.764 (±0.013) 0.766 (±0.008)

INT

170

0.8 (±0.0) 1.5 (±0.1) 2.2 (±0.1) 3.6 (±0.0) 7.1 (±0.1) 10.7 (±0.1) 13.9 (±0.1)

0.7 (±0.1) 1.4 (±0.0) 2.0 (±0.0) 3.2 (±0.0) 6.1 (±0.1) 9.6 (±0.1) 12.6 (±0.1)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.8 (±0.0) 1.5 (±0.0) 2.1 (±0.1) 3.4 (±0.0) 6.6 (±0.1) 10.1 (±0.1) 13.2 (±0.1)

0.3 (±0.1) 0.6 (±0.0) 1.0 (±0.0) 1.8 (±0.0) 3.9 (±0.1) 5.4 (±0.1) 7.4 (±0.1)

0.2 (±0.0) 0.5 (±0.1) 0.8 (±0.1) 1.4 (±0.0) 2.9 (±0.1) 4.3 (±0.0) 6.1 (±0.1)

0.2 (±0.0) 0.5 (±0.0) 0.9 (±0.1) 1.6 (±0.0) 3.4 (±0.1) 4.9 (±0.1) 6.8 (±0.1)

Threshold (%) UTh ITh

0.5

Th

53.1 (±4.4) 62.3 (±1.4) 68.4 (±1.8) 72.2 (±1.0) 77.3 (±1.8) 71.7 (±0.7) 74.1 (±0.9)

35.1 (±4.0) 45.5 (±0.9) 50.6 (±1.5) 56.2 (±1.1) 58.5 (±1.2) 57.5 (±0.7) 60.6 (±0.6)

45.8 (±1.6) 51.3 (±1.5) 58.0 (±1.6) 63.1 (±0.8) 66.6 (±1.6) 65.4 (±0.7) 67.5 (±0.8)

M1-M2

78.3 (±2.8) 80.3 (±0.7) 82.2 (±1.3) 84.1 (±1.0) 88.4 (±1.0) 86.3 (±1.0) 88.0 (±1.7)

69.5 (±2.0) 71.1 (±1.0) 72.1 (±1.2) 73.5 (±1.0) 75.8 (±0.4) 72.6 (±0.5) 77.2 (±1.1)

50.0 (±1.3) 55.2 (±1.9) 60.0 (±1.4) 65.6 (±1.0) 68.9 (±1.1) 70.1 (±0.8) 71.7 (±0.6)

65.8 (±6.7) 72.7 (±2.0) 77.7 (±1.6) 82.8 (±0.4) 86.7 (±0.8) 86.4 (±0.9) 87.5 (±1.6)

49.8 (±6.9) 57.0 (±2.0) 60.7 (±2.4) 63.1 (±1.1) 66.7 (±2.3) 63.5 (±1.3) 65.9 (±2.9)

57.3 (±3.6) 64.3 (±3.3) 68.2 (±2.0) 72.5 (±0.8) 76.7 (±1.7) 75.1 (±1.6) 76.6 (±0.9)

Similarity (%) M1-UCD M1-UCM

0.760 (±0.009) 0.803 (±0.008) 0.817 (±0.004) 0.826 (±0.007) 0.833 (±0.009) 0.832 (±0.006) 0.834 (±0.009)

0.760 (±0.009) 0.803 (±0.008) 0.817 (±0.004) 0.826 (±0.007) 0.833 (±0.009) 0.832 (±0.006) 0.834 (±0.009)

0.760 (±0.009) 0.803 (±0.008) 0.817 (±0.004) 0.826 (±0.007) 0.833 (±0.009) 0.832 (±0.006) 0.834 (±0.009)

M1

0.800 (±0.008) 0.826 (±0.007) 0.831 (±0.003) 0.833 (±0.006) 0.834 (±0.006) 0.839 (±0.008) 0.838 (±0.007)

0.779 (±0.006) 0.814 (±0.003) 0.823 (±0.007) 0.828 (±0.006) 0.831 (±0.008) 0.834 (±0.009) 0.834 (±0.009)

0.765 (±0.004) 0.801 (±0.006) 0.814 (±0.007) 0.828 (±0.006) 0.831 (±0.009) 0.833 (±0.009) 0.836 (±0.010)

M2

0.807 (±0.009) 0.823 (±0.007) 0.828 (±0.006) 0.834 (±0.009) 0.835 (±0.005) 0.836 (±0.007) 0.836 (±0.007)

0.813 (±0.007) 0.828 (±0.006) 0.833 (±0.006) 0.833 (±0.008) 0.835 (±0.008) 0.835 (±0.007) 0.835 (±0.007)

UCD CC-DF 0.757 (±0.007) 0.804 (±0.006) 0.812 (±0.007) 0.829 (±0.004) 0.828 (±0.009) 0.835 (±0.011) 0.835 (±0.009) CC-IG 0.786 (±0.008) 0.817 (±0.008) 0.825 (±0.008) 0.832 (±0.007) 0.834 (±0.009) 0.834 (±0.009) 0.836 (±0.009) CC-MI 0.785 (±0.011) 0.818 (±0.003) 0.828 (±0.003) 0.831 (±0.008) 0.834 (±0.007) 0.836 (±0.008) 0.835 (±0.009)

microF1

0.785 (±0.012) 0.815 (±0.005) 0.820 (±0.005) 0.826 (±0.007) 0.833 (±0.010) 0.834 (±0.011) 0.836 (±0.008)

UN

0.793 (±0.006) 0.823 (±0.008) 0.832 (±0.005) 0.832 (±0.007) 0.836 (±0.008) 0.836 (±0.008) 0.835 (±0.009)

0.774 (±0.004) 0.809 (±0.006) 0.821 (±0.008) 0.831 (±0.009) 0.833 (±0.006) 0.836 (±0.010) 0.834 (±0.010)

0.755 (±0.009) 0.800 (±0.008) 0.817 (±0.008) 0.828 (±0.006) 0.832 (±0.007) 0.834 (±0.010) 0.836 (±0.009)

UCM

0.725 (±0.021) 0.796 (±0.007) 0.817 (±0.004) 0.828 (±0.005) 0.832 (±0.009) 0.836 (±0.006) 0.835 (±0.008)

0.691 (±0.027) 0.773 (±0.010) 0.798 (±0.008) 0.825 (±0.004) 0.831 (±0.008) 0.830 (±0.009) 0.836 (±0.010)

0.714 (±0.013) 0.775 (±0.014) 0.805 (±0.008) 0.823 (±0.004) 0.831 (±0.007) 0.831 (±0.006) 0.834 (±0.006)

INT

Table B.15: Combining operators using Alj-Mgz-SR Dataset (WLocal)

0.603 (±0.016) 0.707 (±0.017) 0.737 (±0.013) 0.750 (±0.018) 0.763 (±0.011) 0.756 (±0.013) 0.758 (±0.019)

0.603 (±0.016) 0.707 (±0.017) 0.737 (±0.013) 0.750 (±0.018) 0.763 (±0.011) 0.756 (±0.013) 0.758 (±0.019)

0.603 (±0.016) 0.707 (±0.017) 0.737 (±0.013) 0.750 (±0.018) 0.763 (±0.011) 0.756 (±0.013) 0.758 (±0.019)

M1

0.715 (±0.018) 0.752 (±0.012) 0.765 (±0.012) 0.765 (±0.012) 0.761 (±0.015) 0.768 (±0.015) 0.766 (±0.015)

0.682 (±0.018) 0.739 (±0.006) 0.754 (±0.009) 0.761 (±0.014) 0.761 (±0.018) 0.765 (±0.016) 0.760 (±0.016)

0.626 (±0.006) 0.718 (±0.006) 0.734 (±0.013) 0.752 (±0.018) 0.756 (±0.012) 0.757 (±0.015) 0.759 (±0.015)

M2

0.724 (±0.012) 0.749 (±0.014) 0.758 (±0.016) 0.763 (±0.012) 0.762 (±0.013) 0.762 (±0.011) 0.761 (±0.015)

0.728 (±0.014) 0.753 (±0.011) 0.765 (±0.014) 0.766 (±0.015) 0.762 (±0.015) 0.764 (±0.014) 0.759 (±0.016)

UCD

0.683 (±0.021) 0.742 (±0.009) 0.759 (±0.013) 0.760 (±0.012) 0.763 (±0.013) 0.762 (±0.016) 0.760 (±0.018)

0.687 (±0.012) 0.736 (±0.014) 0.757 (±0.015) 0.765 (±0.015) 0.762 (±0.016) 0.762 (±0.016) 0.761 (±0.018)

0.603 (±0.008) 0.715 (±0.011) 0.729 (±0.015) 0.754 (±0.013) 0.753 (±0.013) 0.758 (±0.017) 0.758 (±0.011)

macroF1

0.678 (±0.024) 0.737 (±0.010) 0.742 (±0.008) 0.752 (±0.019) 0.760 (±0.014) 0.757 (±0.021) 0.761 (±0.012)

UN

0.694 (±0.011) 0.750 (±0.012) 0.764 (±0.017) 0.763 (±0.011) 0.766 (±0.014) 0.762 (±0.016) 0.761 (±0.017)

0.669 (±0.007) 0.731 (±0.013) 0.749 (±0.017) 0.763 (±0.016) 0.764 (±0.016) 0.765 (±0.019) 0.760 (±0.017)

0.608 (±0.020) 0.707 (±0.011) 0.733 (±0.015) 0.754 (±0.019) 0.759 (±0.013) 0.758 (±0.014) 0.761 (±0.015)

UCM

0.534 (±0.037) 0.692 (±0.017) 0.735 (±0.015) 0.752 (±0.015) 0.760 (±0.015) 0.764 (±0.009) 0.761 (±0.016)

0.492 (±0.051) 0.632 (±0.037) 0.713 (±0.018) 0.752 (±0.013) 0.757 (±0.016) 0.757 (±0.016) 0.763 (±0.018)

0.500 (±0.043) 0.631 (±0.036) 0.714 (±0.015) 0.747 (±0.013) 0.759 (±0.013) 0.755 (±0.012) 0.759 (±0.017)

INT

171

0.8 (±0.0) 1.5 (±0.1) 2.1 (±0.0) 3.4 (±0.0) 6.6 (±0.0) 9.9 (±0.0) 13.4 (±0.1)

0.7 (±0.0) 1.3 (±0.0) 2.0 (±0.0) 3.3 (±0.0) 6.7 (±0.0) 10.1 (±0.1) 13.2 (±0.2)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.8 (±0.0) 1.5 (±0.0) 2.1 (±0.0) 3.4 (±0.0) 6.8 (±0.0) 10.0 (±0.1) 13.5 (±0.2)

0.5

Th

0.3 (±0.0) 0.7 (±0.0) 1.0 (±0.0) 1.7 (±0.0) 3.3 (±0.0) 4.9 (±0.1) 6.8 (±0.2)

0.2 (±0.0) 0.5 (±0.1) 0.9 (±0.0) 1.6 (±0.0) 3.4 (±0.0) 5.1 (±0.0) 6.6 (±0.1)

0.2 (±0.0) 0.5 (±0.0) 0.9 (±0.0) 1.6 (±0.0) 3.2 (±0.0) 5.0 (±0.1) 6.5 (±0.2)

Threshold (%) UTh ITh

62.4 (±1.2) 67.8 (±0.8) 67.7 (±1.4) 68.1 (±0.8) 66.1 (±1.0) 65.8 (±0.7) 67.6 (±2.0)

39.6 (±1.6) 54.0 (±1.7) 61.7 (±1.4) 65.1 (±0.5) 68.4 (±0.5) 68.0 (±0.4) 65.5 (±1.3)

39.0 (±1.5) 52.9 (±0.3) 59.7 (±0.9) 64.9 (±0.4) 64.3 (±0.8) 66.0 (±0.6) 65.0 (±1.8)

M1-M2

84.8 (±0.8) 84.3 (±0.6) 84.8 (±0.8) 87.8 (±0.4) 87.3 (±0.8) 89.9 (±0.5) 91.3 (±1.2)

98.7 (±0.3) 97.6 (±0.3) 98.2 (±0.2) 97.4 (±0.2) 97.8 (±0.4) 97.2 (±0.5) 98.0 (±0.6)

93.6 (±0.9) 95.2 (±0.7) 95.4 (±0.3) 94.7 (±0.4) 93.0 (±0.8) 91.0 (±0.5) 93.1 (±0.8)

89.2 (±1.0) 93.3 (±0.6) 92.9 (±1.1) 95.1 (±0.6) 94.4 (±0.6) 95.4 (±0.5) 96.4 (±1.2)

74.4 (±0.7) 80.0 (±1.4) 85.8 (±0.5) 87.8 (±0.2) 90.1 (±0.9) 90.0 (±0.8) 89.1 (±0.7)

75.6 (±1.1) 76.5 (±2.1) 77.8 (±0.8) 78.8 (±0.6) 78.5 (±0.8) 77.4 (±1.0) 76.5 (±1.1)

Similarity (%) M1-UCD M1-UCM

0.779 (±0.006) 0.814 (±0.003) 0.823 (±0.007) 0.828 (±0.006) 0.831 (±0.008) 0.834 (±0.009) 0.834 (±0.009)

0.765 (±0.004) 0.801 (±0.006) 0.814 (±0.007) 0.828 (±0.006) 0.831 (±0.009) 0.833 (±0.009) 0.836 (±0.010)

0.765 (±0.004) 0.801 (±0.006) 0.814 (±0.007) 0.828 (±0.006) 0.831 (±0.009) 0.833 (±0.009) 0.836 (±0.010)

M1

0.800 (±0.008) 0.826 (±0.007) 0.831 (±0.003) 0.833 (±0.006) 0.834 (±0.006) 0.839 (±0.008) 0.838 (±0.007)

0.800 (±0.008) 0.826 (±0.007) 0.831 (±0.003) 0.833 (±0.006) 0.834 (±0.006) 0.839 (±0.008) 0.838 (±0.007)

0.779 (±0.006) 0.814 (±0.003) 0.823 (±0.007) 0.828 (±0.006) 0.831 (±0.008) 0.834 (±0.009) 0.834 (±0.009)

M2

Table B.15 – continued microF1 UN UCD UCM DF-IG 0.812 0.757 0.785 (±0.009) (±0.010) (±0.006) 0.828 0.803 0.811 (±0.006) (±0.003) (±0.009) 0.831 0.814 0.824 (±0.008) (±0.006) (±0.007) 0.834 0.828 0.828 (±0.006) (±0.005) (±0.006) 0.836 0.834 0.836 (±0.010) (±0.008) (±0.008) 0.836 0.836 0.835 (±0.008) (±0.010) (±0.009) 0.837 0.834 0.838 (±0.007) (±0.007) (±0.010) DF-MI 0.812 0.763 0.787 (±0.008) (±0.003) (±0.007) 0.824 0.806 0.819 (±0.004) (±0.003) (±0.010) 0.829 0.813 0.826 (±0.006) (±0.006) (±0.006) 0.832 0.828 0.832 (±0.010) (±0.004) (±0.010) 0.835 0.830 0.835 (±0.011) (±0.009) (±0.011) 0.837 0.835 0.839 (±0.010) (±0.011) (±0.010) 0.835 0.835 0.836 (±0.008) (±0.008) (±0.010) IG-MI 0.811 0.791 0.793 (±0.010) (±0.006) (±0.003) 0.829 0.821 0.815 (±0.007) (±0.006) (±0.006) 0.833 0.831 0.829 (±0.008) (±0.008) (±0.009) 0.835 0.833 0.835 (±0.006) (±0.006) (±0.007) 0.835 0.835 0.836 (±0.008) (±0.006) (±0.009) 0.836 0.836 0.836 (±0.006) (±0.008) (±0.007) 0.838 0.837 0.835 (±0.007) (±0.007) (±0.008) 0.766 (±0.004) 0.807 (±0.003) 0.819 (±0.003) 0.828 (±0.007) 0.834 (±0.007) 0.838 (±0.009) 0.836 (±0.008)

0.719 (±0.010) 0.795 (±0.010) 0.816 (±0.003) 0.824 (±0.003) 0.831 (±0.007) 0.836 (±0.007) 0.836 (±0.008)

0.699 (±0.011) 0.780 (±0.004) 0.802 (±0.007) 0.824 (±0.007) 0.826 (±0.006) 0.832 (±0.007) 0.834 (±0.011)

INT

0.682 (±0.018) 0.739 (±0.006) 0.754 (±0.009) 0.761 (±0.014) 0.761 (±0.018) 0.765 (±0.016) 0.760 (±0.016)

0.626 (±0.006) 0.718 (±0.006) 0.734 (±0.013) 0.752 (±0.018) 0.756 (±0.012) 0.757 (±0.015) 0.759 (±0.015)

0.626 (±0.006) 0.718 (±0.006) 0.734 (±0.013) 0.752 (±0.018) 0.756 (±0.012) 0.757 (±0.015) 0.759 (±0.015)

M1

0.715 (±0.018) 0.752 (±0.012) 0.765 (±0.012) 0.765 (±0.012) 0.761 (±0.015) 0.768 (±0.015) 0.766 (±0.015)

0.715 (±0.018) 0.752 (±0.012) 0.765 (±0.012) 0.765 (±0.012) 0.761 (±0.015) 0.768 (±0.015) 0.766 (±0.015)

0.682 (±0.018) 0.739 (±0.006) 0.754 (±0.009) 0.761 (±0.014) 0.761 (±0.018) 0.765 (±0.016) 0.760 (±0.016)

M2

0.732 (±0.015) 0.753 (±0.010) 0.769 (±0.017) 0.766 (±0.017) 0.763 (±0.014) 0.764 (±0.012) 0.763 (±0.016)

0.736 (±0.012) 0.752 (±0.005) 0.759 (±0.013) 0.762 (±0.016) 0.763 (±0.016) 0.760 (±0.019) 0.758 (±0.014)

UC

0.697 (±0.016) 0.745 (±0.010) 0.764 (±0.014) 0.764 (±0.011) 0.761 (±0.013) 0.761 (±0.013) 0.763 (±0.014)

0.625 (±0.007) 0.727 (±0.007) 0.733 (±0.013) 0.753 (±0.010) 0.755 (±0.014) 0.760 (±0.020) 0.756 (±0.011)

0.617 (±0.010) 0.721 (±0.008) 0.736 (±0.015) 0.755 (±0.012) 0.762 (±0.013) 0.761 (±0.014) 0.758 (±0.012)

macroF1

0.738 (±0.017) 0.757 (±0.011) 0.761 (±0.012) 0.763 (±0.012) 0.761 (±0.015) 0.761 (±0.014) 0.759 (±0.011)

UN

0.700 (±0.013) 0.741 (±0.013) 0.762 (±0.016) 0.765 (±0.017) 0.765 (±0.018) 0.765 (±0.013) 0.760 (±0.014)

0.677 (±0.009) 0.739 (±0.017) 0.758 (±0.014) 0.762 (±0.016) 0.763 (±0.017) 0.764 (±0.016) 0.760 (±0.017)

0.675 (±0.010) 0.733 (±0.012) 0.751 (±0.011) 0.754 (±0.015) 0.764 (±0.017) 0.762 (±0.014) 0.765 (±0.016)

UM

0.653 (±0.006) 0.723 (±0.005) 0.749 (±0.011) 0.760 (±0.012) 0.767 (±0.017) 0.770 (±0.014) 0.767 (±0.013)

0.534 (±0.013) 0.704 (±0.021) 0.739 (±0.014) 0.748 (±0.007) 0.758 (±0.011) 0.763 (±0.013) 0.765 (±0.012)

0.484 (±0.008) 0.671 (±0.012) 0.720 (±0.017) 0.751 (±0.020) 0.753 (±0.012) 0.764 (±0.012) 0.763 (±0.018)

INT

172

0.82 (±0) 1.6 (±0) 2.32 (±0) 3.8 (±0) 7.2 (±0) 10.46 (±0.1) 13.82 (±0)

0.8 (±0) 1.5 (±0) 2.2 (±0) 3.52 (±0) 6.72 (±0) 9.76 (±0.1) 12.66 (±0.1)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.8 (±0) 1.5 (±0) 2.2 (±0) 3.56 (±0.1) 6.72 (±0) 9.7 (±0.1) 12.74 (±0.1)

0.2 (±0) 0.5 (±0) 0.8 (±0) 1.48 (±0) 3.28 (±0) 5.24 (±0.1) 7.34 (±0.1)

0.18 (±0) 0.4 (±0) 0.68 (±0) 1.2 (±0) 2.8 (±0) 4.54 (±0.1) 6.18 (±0)

0.2 (±0) 0.5 (±0) 0.8 (±0) 1.44 (±0.1) 3.28 (±0) 5.3 (±0.1) 7.26 (±0.1)

Threshold (%) UTh ITh

0.5

Th

38.08 (±2.2) 48.24 (±2.1) 52.08 (±1.3) 58.16 (±0.7) 65.44 (±0.6) 70.06 (±0.2) 73.3 (±0.7)

32.38 (±2.6) 40.7 (±2.3) 44.36 (±1.1) 48.36 (±0.4) 55.72 (±0.7) 60.46 (±0.5) 61.8 (±0.5)

39.26 (±1.9) 48.54 (±1.5) 52.46 (±1) 57.1 (±1.3) 65.4 (±0.6) 70.88 (±0.8) 72.86 (±0.9)

M1-M2

62.5 (±1.6) 67.9 (±1.7) 71.18 (±1.4) 74 (±0.6) 78.52 (±0.9) 81.38 (±0.4) 84.56 (±0.5)

65.56 (±1.2) 71.78 (±0.7) 73.5 (±1.2) 75.9 (±0.7) 79.44 (±0.7) 82.42 (±0.6) 85.84 (±0.6)

47.74 (±2.1) 56.08 (±1.8) 59.46 (±0.5) 64.16 (±0.8) 71.86 (±0.4) 74.7 (±0.7) 76.24 (±0.6)

58.12 (±2.9) 64.56 (±1.5) 68.94 (±1.2) 72.76 (±1.4) 75.56 (±0.7) 77.88 (±0.9) 80.54 (±0.8)

54.14 (±3.8) 59.86 (±1.9) 63.34 (±0.9) 65.98 (±1) 68.4 (±1) 72.36 (±2) 72.18 (±1.5)

61.26 (±2.7) 69.46 (±1.8) 72.48 (±0.6) 75.2 (±0.9) 78.22 (±1) 82 (±0.9) 81.92 (±1.7)

Similarity (%) M1-UCD M1-UCM

0.782 (±0.01) 0.823 (±0.009) 0.833 (±0.009) 0.844 (±0.006) 0.854 (±0.008) 0.861 (±0.01) 0.862 (±0.011)

0.782 (±0.01) 0.823 (±0.009) 0.833 (±0.009) 0.844 (±0.006) 0.854 (±0.008) 0.861 (±0.01) 0.862 (±0.011)

0.782 (±0.01) 0.823 (±0.009) 0.833 (±0.009) 0.844 (±0.006) 0.854 (±0.008) 0.861 (±0.01) 0.862 (±0.011)

M1

0.818 (±0.006) 0.836 (±0.005) 0.844 (±0.002) 0.852 (±0.005) 0.863 (±0.008) 0.865 (±0.008) 0.865 (±0.007)

0.808 (±0.007) 0.825 (±0.008) 0.836 (±0.006) 0.849 (±0.002) 0.859 (±0.008) 0.863 (±0.007) 0.862 (±0.008)

0.813 (±0.01) 0.828 (±0.008) 0.837 (±0.006) 0.853 (±0.005) 0.861 (±0.005) 0.864 (±0.007) 0.863 (±0.008)

M2

0.823 (±0.011) 0.841 (±0.005) 0.847 (±0.006) 0.856 (±0.004) 0.863 (±0.011) 0.864 (±0.009) 0.862 (±0.008)

0.825 (±0.012) 0.841 (±0.007) 0.846 (±0.007) 0.853 (±0.007) 0.861 (±0.007) 0.865 (±0.008) 0.863 (±0.008)

UCD CC-DF 0.802 (±0.006) 0.827 (±0.01) 0.834 (±0.011) 0.85 (±0.008) 0.862 (±0.008) 0.864 (±0.006) 0.862 (±0.008) CC-IG 0.81 (±0.016) 0.824 (±0.012) 0.84 (±0.008) 0.853 (±0.004) 0.86 (±0.005) 0.863 (±0.007) 0.865 (±0.008) CC-MI 0.813 (±0.012) 0.832 (±0.011) 0.839 (±0.009) 0.852 (±0.006) 0.861 (±0.008) 0.866 (±0.005) 0.863 (±0.007)

microF1

0.825 (±0.011) 0.833 (±0.01) 0.843 (±0.004) 0.855 (±0.006) 0.862 (±0.007) 0.863 (±0.008) 0.862 (±0.008)

UN

0.812 (±0.007) 0.829 (±0.006) 0.841 (±0.005) 0.852 (±0.007) 0.859 (±0.009) 0.864 (±0.01) 0.865 (±0.007)

0.799 (±0.007) 0.823 (±0.008) 0.833 (±0.005) 0.846 (±0.007) 0.862 (±0.004) 0.862 (±0.007) 0.861 (±0.006)

0.802 (±0.009) 0.828 (±0.007) 0.834 (±0.009) 0.845 (±0.009) 0.86 (±0.007) 0.862 (±0.01) 0.862 (±0.008)

UCM

0.761 (±0.012) 0.813 (±0.007) 0.829 (±0.009) 0.84 (±0.005) 0.855 (±0.006) 0.864 (±0.009) 0.862 (±0.006)

0.746 (±0.014) 0.797 (±0.008) 0.813 (±0.006) 0.833 (±0.003) 0.849 (±0.003) 0.861 (±0.008) 0.86 (±0.006)

0.755 (±0.015) 0.81 (±0.008) 0.828 (±0.01) 0.837 (±0.007) 0.852 (±0.007) 0.861 (±0.008) 0.863 (±0.007)

INT

Table B.16: Combining operators using Alj-Mgz-MS Dataset (FLocal)

0.649 (±0.011) 0.744 (±0.016) 0.75 (±0.018) 0.772 (±0.015) 0.786 (±0.016) 0.796 (±0.019) 0.797 (±0.02)

0.649 (±0.011) 0.744 (±0.016) 0.75 (±0.018) 0.772 (±0.015) 0.786 (±0.016) 0.796 (±0.019) 0.797 (±0.02)

0.649 (±0.011) 0.744 (±0.016) 0.75 (±0.018) 0.772 (±0.015) 0.786 (±0.016) 0.796 (±0.019) 0.797 (±0.02)

M1

0.741 (±0.014) 0.773 (±0.012) 0.782 (±0.008) 0.79 (±0.013) 0.804 (±0.013) 0.805 (±0.017) 0.802 (±0.015)

0.735 (±0.011) 0.762 (±0.015) 0.775 (±0.013) 0.792 (±0.011) 0.8 (±0.017) 0.803 (±0.013) 0.798 (±0.014)

0.729 (±0.016) 0.754 (±0.017) 0.762 (±0.019) 0.789 (±0.013) 0.8 (±0.014) 0.801 (±0.012) 0.798 (±0.017)

M2

0.749 (±0.023) 0.776 (±0.008) 0.784 (±0.012) 0.792 (±0.012) 0.803 (±0.016) 0.801 (±0.018) 0.799 (±0.019)

0.758 (±0.016) 0.78 (±0.014) 0.782 (±0.011) 0.792 (±0.012) 0.8 (±0.013) 0.802 (±0.016) 0.798 (±0.019)

UCD

0.731 (±0.022) 0.76 (±0.02) 0.768 (±0.02) 0.783 (±0.016) 0.801 (±0.012) 0.804 (±0.013) 0.8 (±0.015)

0.727 (±0.026) 0.748 (±0.02) 0.769 (±0.018) 0.786 (±0.011) 0.798 (±0.008) 0.801 (±0.013) 0.802 (±0.016)

0.7 (±0.01) 0.75 (±0.022) 0.759 (±0.022) 0.782 (±0.022) 0.8 (±0.012) 0.801 (±0.013) 0.796 (±0.016)

macroF1

0.748 (±0.025) 0.759 (±0.018) 0.769 (±0.013) 0.79 (±0.016) 0.801 (±0.012) 0.798 (±0.017) 0.795 (±0.016)

UN

0.725 (±0.011) 0.761 (±0.016) 0.778 (±0.011) 0.789 (±0.017) 0.799 (±0.015) 0.804 (±0.017) 0.804 (±0.014)

0.71 (±0.007) 0.75 (±0.015) 0.77 (±0.014) 0.786 (±0.011) 0.802 (±0.012) 0.8 (±0.012) 0.797 (±0.014)

0.702 (±0.022) 0.757 (±0.018) 0.761 (±0.016) 0.776 (±0.021) 0.8 (±0.014) 0.799 (±0.016) 0.797 (±0.018)

UCM

0.601 (±0.009) 0.725 (±0.011) 0.753 (±0.012) 0.769 (±0.017) 0.787 (±0.014) 0.804 (±0.015) 0.798 (±0.014)

0.583 (±0.018) 0.705 (±0.01) 0.728 (±0.008) 0.766 (±0.013) 0.786 (±0.012) 0.799 (±0.013) 0.796 (±0.012)

0.587 (±0.014) 0.719 (±0.016) 0.751 (±0.013) 0.763 (±0.012) 0.786 (±0.013) 0.798 (±0.013) 0.797 (±0.014)

INT

173

0.7 (±0) 1.3 (±0) 1.94 (±0.1) 3.1 (±0) 6 (±0) 8.96 (±0.1) 11.84 (±0.1)

0.6 (±0) 1.2 (±0) 1.8 (±0) 3 (±0) 5.92 (±0) 8.94 (±0.1) 12 (±0.1)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.78 (±0) 1.42 (±0) 2.1 (±0) 3.42 (±0) 6.58 (±0) 9.78 (±0) 12.96 (±0.1)

0.5

Th

0.4 (±0) 0.8 (±0) 1.2 (±0) 2 (±0) 4.08 (±0) 6.06 (±0.1) 8 (±0.1)

0.3 (±0) 0.7 (±0) 1.06 (±0.1) 1.9 (±0) 4 (±0) 6.04 (±0.1) 8.16 (±0.1)

0.22 (±0) 0.58 (±0) 0.9 (±0) 1.58 (±0) 3.42 (±0) 5.22 (±0) 7.04 (±0.1)

Threshold (%) UTh ITh

77.4 (±1.2) 78.4 (±0.7) 79.7 (±0.8) 80.66 (±0.6) 81.16 (±0.7) 80.92 (±0.6) 80 (±0.7)

59.02 (±1) 68.42 (±1.2) 70.34 (±0.7) 75.04 (±0.8) 80.06 (±0.4) 80.54 (±1.1) 81.72 (±0.6)

49.26 (±1) 56.16 (±1.2) 57.7 (±0.7) 62.16 (±0.4) 68.58 (±0.4) 69.56 (±0.4) 70.16 (±0.5)

M1-M2

82.58 (±1.2) 81.66 (±0.7) 82.58 (±0.5) 83.14 (±0.5) 84.08 (±0.5) 84.46 (±0.5) 83.62 (±0.5)

99.84 (±0.2) 99.44 (±0.3) 99.4 (±0.4) 99.58 (±0.3) 99.44 (±0.2) 99.68 (±0.3) 99.66 (±0.4)

99.44 (±0.2) 99.26 (±0.4) 99.28 (±0.4) 99.52 (±0.3) 99.68 (±0.2) 99.7 (±0.3) 99.86 (±0.2)

88.34 (±1.1) 89 (±0.6) 89.88 (±0.8) 91 (±0.3) 91.4 (±0.3) 91.54 (±0.5) 91.34 (±1)

80.22 (±1.4) 82.82 (±0.8) 82.38 (±0.8) 86.82 (±0.4) 89.3 (±0.6) 90.54 (±0.4) 91.46 (±0.8)

76.42 (±0.9) 77.28 (±0.7) 77.3 (±0.7) 80.86 (±0.3) 85.1 (±0.5) 85.76 (±1.2) 85.28 (±1.3)

Similarity (%) M1-UCD M1-UCM

0.808 (±0.007) 0.825 (±0.008) 0.836 (±0.006) 0.849 (±0.002) 0.859 (±0.008) 0.863 (±0.007) 0.862 (±0.008)

0.813 (±0.01) 0.828 (±0.008) 0.837 (±0.006) 0.853 (±0.005) 0.861 (±0.005) 0.864 (±0.007) 0.863 (±0.008)

0.813 (±0.01) 0.828 (±0.008) 0.837 (±0.006) 0.853 (±0.005) 0.861 (±0.005) 0.864 (±0.007) 0.863 (±0.008)

M1

0.818 (±0.006) 0.836 (±0.005) 0.844 (±0.002) 0.852 (±0.005) 0.863 (±0.008) 0.865 (±0.008) 0.865 (±0.007)

0.818 (±0.006) 0.836 (±0.005) 0.844 (±0.002) 0.852 (±0.005) 0.863 (±0.008) 0.865 (±0.008) 0.865 (±0.007)

0.808 (±0.007) 0.825 (±0.008) 0.836 (±0.006) 0.849 (±0.002) 0.859 (±0.008) 0.863 (±0.007) 0.862 (±0.008)

M2

Table B.16 – continued microF1 UN UCD UCM DF-IG 0.828 0.812 0.815 (±0.007) (±0.011) (±0.009) 0.841 0.828 0.826 (±0.004) (±0.007) (±0.009) 0.849 0.837 0.84 (±0.005) (±0.007) (±0.008) 0.856 0.854 0.853 (±0.007) (±0.007) (±0.004) 0.866 0.862 0.862 (±0.006) (±0.007) (±0.005) 0.865 0.865 0.864 (±0.007) (±0.006) (±0.006) 0.863 0.864 0.864 (±0.006) (±0.007) (±0.007) DF-MI 0.824 0.812 0.816 (±0.007) (±0.01) (±0.008) 0.838 0.827 0.831 (±0.008) (±0.009) (±0.007) 0.849 0.837 0.84 (±0.004) (±0.008) (±0.007) 0.856 0.853 0.856 (±0.004) (±0.005) (±0.004) 0.864 0.861 0.861 (±0.007) (±0.006) (±0.007) 0.865 0.864 0.866 (±0.01) (±0.007) (±0.008) 0.863 0.863 0.863 (±0.008) (±0.008) (±0.008) IG-MI 0.826 0.818 0.818 (±0.004) (±0.009) (±0.007) 0.836 0.834 0.833 (±0.005) (±0.003) (±0.005) 0.848 0.844 0.84 (±0.003) (±0.003) (±0.004) 0.854 0.855 0.851 (±0.005) (±0.005) (±0.003) 0.865 0.862 0.86 (±0.008) (±0.009) (±0.007) 0.864 0.865 0.865 (±0.006) (±0.006) (±0.007) 0.864 0.862 0.864 (±0.007) (±0.008) (±0.008) 0.8 (±0.007) 0.824 (±0.006) 0.833 (±0.005) 0.847 (±0.004) 0.858 (±0.01) 0.863 (±0.01) 0.863 (±0.005)

0.796 (±0.007) 0.823 (±0.006) 0.836 (±0.005) 0.851 (±0.009) 0.858 (±0.005) 0.863 (±0.009) 0.863 (±0.006)

0.78 (±0.008) 0.811 (±0.008) 0.822 (±0.005) 0.843 (±0.005) 0.855 (±0.006) 0.862 (±0.012) 0.864 (±0.008)

INT

0.735 (±0.011) 0.762 (±0.015) 0.775 (±0.013) 0.792 (±0.011) 0.8 (±0.017) 0.803 (±0.013) 0.798 (±0.014)

0.729 (±0.016) 0.754 (±0.017) 0.762 (±0.019) 0.789 (±0.013) 0.8 (±0.014) 0.801 (±0.012) 0.798 (±0.017)

0.729 (±0.016) 0.754 (±0.017) 0.762 (±0.019) 0.789 (±0.013) 0.8 (±0.014) 0.801 (±0.012) 0.798 (±0.017)

M1

0.741 (±0.014) 0.773 (±0.012) 0.782 (±0.008) 0.79 (±0.013) 0.804 (±0.013) 0.805 (±0.017) 0.802 (±0.015)

0.741 (±0.014) 0.773 (±0.012) 0.782 (±0.008) 0.79 (±0.013) 0.804 (±0.013) 0.805 (±0.017) 0.802 (±0.015)

0.735 (±0.011) 0.762 (±0.015) 0.775 (±0.013) 0.792 (±0.011) 0.8 (±0.017) 0.803 (±0.013) 0.798 (±0.014)

M2

0.758 (±0.008) 0.775 (±0.011) 0.788 (±0.007) 0.797 (±0.013) 0.805 (±0.014) 0.804 (±0.012) 0.799 (±0.014)

0.755 (±0.016) 0.768 (±0.016) 0.781 (±0.012) 0.794 (±0.012) 0.806 (±0.014) 0.804 (±0.017) 0.798 (±0.014)

UC

0.738 (±0.011) 0.771 (±0.009) 0.78 (±0.01) 0.794 (±0.015) 0.801 (±0.013) 0.806 (±0.012) 0.799 (±0.014)

0.728 (±0.015) 0.752 (±0.017) 0.765 (±0.019) 0.788 (±0.013) 0.799 (±0.015) 0.801 (±0.014) 0.798 (±0.016)

0.727 (±0.019) 0.753 (±0.017) 0.763 (±0.017) 0.792 (±0.016) 0.798 (±0.015) 0.802 (±0.01) 0.799 (±0.016)

macroF1

0.756 (±0.01) 0.773 (±0.009) 0.785 (±0.016) 0.796 (±0.014) 0.806 (±0.013) 0.801 (±0.012) 0.797 (±0.012)

UN

0.742 (±0.014) 0.766 (±0.009) 0.779 (±0.009) 0.792 (±0.006) 0.8 (±0.013) 0.805 (±0.014) 0.802 (±0.016)

0.737 (±0.013) 0.763 (±0.017) 0.774 (±0.014) 0.796 (±0.01) 0.801 (±0.016) 0.808 (±0.015) 0.801 (±0.015)

0.736 (±0.016) 0.755 (±0.023) 0.771 (±0.015) 0.793 (±0.013) 0.8 (±0.013) 0.805 (±0.009) 0.804 (±0.014)

UM

0.718 (±0.014) 0.76 (±0.011) 0.774 (±0.012) 0.786 (±0.013) 0.796 (±0.017) 0.801 (±0.016) 0.801 (±0.01)

0.703 (±0.012) 0.75 (±0.014) 0.765 (±0.017) 0.789 (±0.018) 0.8 (±0.012) 0.802 (±0.013) 0.8 (±0.014)

0.675 (±0.017) 0.731 (±0.019) 0.749 (±0.015) 0.779 (±0.017) 0.795 (±0.009) 0.801 (±0.017) 0.8 (±0.016)

INT

174

0.8 (±0) 1.58 (±0) 2.3 (±0) 3.78 (±0) 7.26 (±0.1) 10.76 (±0.1) 14.08 (±0)

0.74 (±0.1) 1.4 (±0) 2.12 (±0) 3.46 (±0.1) 6.42 (±0) 9.78 (±0) 12.62 (±0)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.78 (±0) 1.48 (±0) 2.2 (±0) 3.54 (±0.1) 6.62 (±0) 9.98 (±0) 13.06 (±0.1)

0.28 (±0) 0.6 (±0) 0.88 (±0) 1.54 (±0.1) 3.58 (±0) 5.22 (±0) 7.38 (±0)

0.2 (±0) 0.42 (±0) 0.7 (±0) 1.22 (±0) 2.74 (±0.1) 4.24 (±0.1) 5.92 (±0)

0.22 (±0) 0.52 (±0) 0.8 (±0) 1.46 (±0.1) 3.38 (±0) 5.02 (±0) 6.94 (±0.1)

Threshold (%) UTh ITh

0.5

Th

51.08 (±1.8) 57.9 (±1.1) 58.32 (±1.6) 61.42 (±1) 71.9 (±0.7) 69.7 (±0.4) 73.88 (±0.4)

38.62 (±2.1) 44.56 (±1.4) 46.76 (±1.3) 49.72 (±0.6) 54.8 (±0.7) 56.64 (±0.4) 59.4 (±0.3)

46.64 (±2.3) 54.16 (±0.9) 55.44 (±0.7) 58.7 (±0.8) 67.2 (±0.9) 66.74 (±0.7) 69.28 (±0.6)

M1-M2

73.12 (±1.9) 76.26 (±0.6) 76.64 (±2) 79.26 (±0.9) 84.28 (±0.8) 81.74 (±0.4) 86.18 (±0.4)

69.02 (±1.9) 73.22 (±0.5) 73.72 (±1.6) 74.94 (±0.8) 75.84 (±0.6) 73.52 (±0.2) 76.52 (±0.3)

53.9 (±2.3) 59.14 (±0.6) 61.1 (±1.4) 64.78 (±0.3) 70.24 (±0.7) 69.26 (±0.3) 71.68 (±0.5)

62.72 (±1.9) 67.5 (±1.1) 69.14 (±1.5) 73.24 (±0.9) 78.86 (±0.5) 78.94 (±0.5) 82.06 (±0.2)

51.56 (±2.7) 54.84 (±1.2) 57.48 (±0.9) 62.5 (±1.8) 67.36 (±1.3) 68.5 (±1.4) 70.8 (±0.9)

61.38 (±2.4) 66.9 (±0.7) 67.26 (±0.9) 70.78 (±1.3) 75.36 (±0.9) 74.82 (±0.7) 76.68 (±0.8)

Similarity (%) M1-UCD M1-UCM

0.788 (±0.006) 0.821 (±0.01) 0.833 (±0.007) 0.842 (±0.005) 0.856 (±0.01) 0.857 (±0.009) 0.858 (±0.009)

0.788 (±0.006) 0.821 (±0.01) 0.833 (±0.007) 0.842 (±0.005) 0.856 (±0.01) 0.857 (±0.009) 0.858 (±0.009)

0.788 (±0.006) 0.821 (±0.01) 0.833 (±0.007) 0.842 (±0.005) 0.856 (±0.01) 0.857 (±0.009) 0.858 (±0.009)

M1

0.82 (±0.003) 0.834 (±0.006) 0.843 (±0.009) 0.856 (±0.008) 0.859 (±0.008) 0.862 (±0.006) 0.862 (±0.007)

0.81 (±0.005) 0.829 (±0.004) 0.836 (±0.007) 0.85 (±0.006) 0.861 (±0.009) 0.86 (±0.005) 0.859 (±0.008)

0.788 (±0.005) 0.821 (±0.01) 0.837 (±0.009) 0.845 (±0.012) 0.858 (±0.005) 0.863 (±0.007) 0.863 (±0.007)

M2

0.828 (±0.01) 0.838 (±0.011) 0.848 (±0.01) 0.857 (±0.009) 0.86 (±0.009) 0.861 (±0.008) 0.861 (±0.008)

0.828 (±0.008) 0.843 (±0.008) 0.85 (±0.011) 0.856 (±0.009) 0.862 (±0.009) 0.861 (±0.009) 0.86 (±0.008)

UCD CC-DF 0.782 (±0.006) 0.817 (±0.009) 0.83 (±0.009) 0.844 (±0.013) 0.856 (±0.006) 0.861 (±0.007) 0.863 (±0.009) CC-IG 0.805 (±0.013) 0.831 (±0.011) 0.841 (±0.006) 0.852 (±0.01) 0.859 (±0.007) 0.862 (±0.006) 0.862 (±0.006) CC-MI 0.811 (±0.009) 0.828 (±0.009) 0.841 (±0.012) 0.852 (±0.009) 0.858 (±0.008) 0.861 (±0.007) 0.862 (±0.007)

microF1

0.814 (±0.007) 0.836 (±0.008) 0.843 (±0.007) 0.854 (±0.012) 0.859 (±0.005) 0.862 (±0.007) 0.862 (±0.009)

UN

0.817 (±0.009) 0.831 (±0.008) 0.841 (±0.009) 0.853 (±0.009) 0.86 (±0.008) 0.861 (±0.007) 0.864 (±0.007)

0.806 (±0.011) 0.828 (±0.007) 0.839 (±0.008) 0.85 (±0.009) 0.859 (±0.009) 0.861 (±0.007) 0.862 (±0.007)

0.779 (±0.011) 0.823 (±0.011) 0.837 (±0.011) 0.846 (±0.01) 0.855 (±0.005) 0.859 (±0.005) 0.861 (±0.008)

UCM

0.759 (±0.01) 0.812 (±0.008) 0.825 (±0.009) 0.841 (±0.01) 0.854 (±0.009) 0.858 (±0.008) 0.862 (±0.011)

0.747 (±0.013) 0.798 (±0.006) 0.82 (±0.002) 0.836 (±0.006) 0.85 (±0.008) 0.852 (±0.006) 0.856 (±0.007)

0.742 (±0.016) 0.807 (±0.006) 0.825 (±0.007) 0.835 (±0.008) 0.854 (±0.009) 0.856 (±0.008) 0.86 (±0.009)

INT

Table B.17: Combining operators using Alj-Mgz-MS Dataset (WLocal)

0.653 (±0.016) 0.725 (±0.018) 0.742 (±0.009) 0.767 (±0.014) 0.785 (±0.022) 0.789 (±0.021) 0.791 (±0.018)

0.653 (±0.016) 0.725 (±0.018) 0.742 (±0.009) 0.767 (±0.014) 0.785 (±0.022) 0.789 (±0.021) 0.791 (±0.018)

0.653 (±0.016) 0.725 (±0.018) 0.742 (±0.009) 0.767 (±0.014) 0.785 (±0.022) 0.789 (±0.021) 0.791 (±0.018)

M1

0.737 (±0.006) 0.758 (±0.016) 0.773 (±0.015) 0.791 (±0.019) 0.798 (±0.012) 0.799 (±0.014) 0.8 (±0.012)

0.73 (±0.009) 0.764 (±0.01) 0.767 (±0.014) 0.786 (±0.013) 0.799 (±0.012) 0.796 (±0.009) 0.794 (±0.014)

0.649 (±0.013) 0.737 (±0.015) 0.766 (±0.019) 0.777 (±0.027) 0.786 (±0.017) 0.793 (±0.018) 0.793 (±0.016)

M2

0.751 (±0.018) 0.763 (±0.018) 0.777 (±0.021) 0.793 (±0.02) 0.797 (±0.017) 0.795 (±0.011) 0.796 (±0.013)

0.752 (±0.018) 0.776 (±0.017) 0.784 (±0.016) 0.794 (±0.019) 0.801 (±0.014) 0.8 (±0.018) 0.793 (±0.016)

UCD

0.717 (±0.017) 0.754 (±0.018) 0.773 (±0.023) 0.781 (±0.021) 0.795 (±0.018) 0.798 (±0.011) 0.797 (±0.015)

0.715 (±0.026) 0.758 (±0.021) 0.765 (±0.017) 0.782 (±0.024) 0.797 (±0.013) 0.797 (±0.012) 0.797 (±0.014)

0.632 (±0.022) 0.727 (±0.014) 0.75 (±0.017) 0.776 (±0.028) 0.787 (±0.022) 0.789 (±0.019) 0.796 (±0.018)

macroF1

0.71 (±0.012) 0.758 (±0.017) 0.769 (±0.018) 0.787 (±0.026) 0.787 (±0.02) 0.792 (±0.021) 0.793 (±0.02)

UN

0.734 (±0.019) 0.755 (±0.019) 0.768 (±0.02) 0.786 (±0.018) 0.8 (±0.013) 0.798 (±0.014) 0.803 (±0.014)

0.716 (±0.018) 0.756 (±0.021) 0.77 (±0.014) 0.783 (±0.014) 0.796 (±0.015) 0.798 (±0.018) v0.798 (±0.013)

0.626 (±0.027) 0.741 (±0.018) 0.762 (±0.026) 0.774 (±0.023) 0.786 (±0.015) 0.789 (±0.016) 0.791 (±0.014)

UCM

0.569 (±0.021) 0.697 (±0.013) 0.731 (±0.006) 0.766 (±0.019) 0.785 (±0.019) 0.79 (±0.021) 0.797 (±0.021)

0.553 (±0.016) 0.683 (±0.01) 0.726 (±0.005) 0.757 (±0.011) 0.782 (±0.015) 0.783 (±0.017) 0.79 (±0.014)

0.503 (±0.045) 0.691 (±0.011) 0.73 (±0.007) 0.757 (±0.014) 0.783 (±0.022) 0.783 (±0.02) 0.791 (±0.02)

INT

175

0.7 (±0) 1.4 (±0) 2 (±0) 3.3 (±0) 6.44 (±0.1) 9.58 (±0) 12.74 (±0.1)

0.62 (±0) 1.3 (±0) 1.9 (±0) 3.2 (±0) 6.46 (±0.1) 9.66 (±0.1) 13.06 (±0.1)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.8 (±0) 1.48 (±0) 2.2 (±0) 3.5 (±0) 6.84 (±0.1) 10.1 (±0) 13.3 (±0.1)

0.5

Th

0.38 (±0) 0.7 (±0) 1.1 (±0) 1.8 (±0) 3.54 (±0.1) 5.34 (±0.1) 6.94 (±0.1)

0.3 (±0) 0.6 (±0) 1 (±0) 1.7 (±0) 3.56 (±0.1) 5.42 (±0) 7.26 (±0.1)

0.2 (±0) 0.52 (±0) 0.8 (±0) 1.5 (±0) 3.16 (±0.1) 4.9 (±0) 6.7 (±0.1)

Threshold (%) UTh ITh

70.98 (±1) 70.84 (±1.1) 72.56 (±1.1) 72.04 (±0.5) 70.88 (±0.4) 71.24 (±0.7) 69.62 (±0.7)

52.82 (±1.1) 62.28 (±0.6) 64.76 (±0.5) 67.54 (±0.5) 71.38 (±1.1) 72.22 (±0.8) 72.7 (±0.5)

48.6 (±0.7) 54.64 (±0.4) 56.08 (±0.4) 59.24 (±0.2) 63.54 (±0.9) 65.54 (±0.2) 66.9 (±0.9)

M1-M2

85.58 (±0.8) 82.34 (±0.5) 83.28 (±0.8) 84.46 (±0.3) 85.26 (±0.2) 86.9 (±0.5) 86.8 (±0.6)

97.32 (±0.4) 98.3 (±0.3) 98.28 (±0.3) 98.56 (±0.2) 98.88 (±0.4) 98.84 (±0.3) 98.58 (±0.1)

94.3 (±0.5) 96.12 (±0.7) 95.44 (±0.4) 94.8 (±0.2) 94.84 (±0.4) 94 (±0.6) 93.78 (±0.3)

91.04 (±1.3) 89.64 (±0.9) 91.42 (±0.8) 89.34 (±0.3) 89 (±0.3) 89.88 (±0.4) 88.92 (±0.6)

77.2 (±0.6) 81.48 (±0.5) 81.64 (±0.5) 82.6 (±0.6) 87.18 (±0.8) 88.6 (±0.7) 89.46 (±0.6)

75 (±1) 77.32 (±0.6) 78.42 (±0.2) 80.14 (±0.7) 85 (±0.4) 85.56 (±1) 86.18 (±0.3)

Similarity (%) M1-UCD M1-UCM

0.81 (±0.005) 0.829 (±0.004) 0.836 (±0.007) 0.85 (±0.006) 0.861 (±0.009) 0.86 (±0.005) 0.859 (±0.008)

0.788 (±0.005) 0.821 (±0.01) 0.837 (±0.009) 0.845 (±0.012) 0.858 (±0.005) 0.863 (±0.007) 0.863 (±0.007)

0.788 (±0.005) 0.821 (±0.01) 0.837 (±0.009) 0.845 (±0.012) 0.858 (±0.005) 0.863 (±0.007) 0.863 (±0.007)

M1

0.82 (±0.003) 0.834 (±0.006) 0.843 (±0.009) 0.856 (±0.008) 0.859 (±0.008) 0.862 (±0.006) 0.862 (±0.007)

0.82 (±0.003) 0.834 (±0.006) 0.843 (±0.009) 0.856 (±0.008) 0.859 (±0.008) 0.862 (±0.006) 0.862 (±0.007)

0.81 (±0.005) 0.829 (±0.004) 0.836 (±0.007) 0.85 (±0.006) 0.861 (±0.009) 0.86 (±0.005) 0.859 (±0.008)

M2

Table B.17 – continued microF1 UN UCD UCM DF-IG 0.825 0.796 0.806 (±0.011) (±0.009) (±0.01) 0.845 0.826 0.826 (±0.006) (±0.009) (±0.008) 0.85 0.839 0.838 (±0.003) (±0.009) (±0.005) 0.859 0.847 0.85 (±0.009) (±0.01) (±0.006) 0.864 0.859 0.861 (±0.008) (±0.007) (±0.007) 0.865 0.864 0.864 (±0.005) (±0.007) (±0.008) 0.866 0.864 0.865 (±0.008) (±0.007) (±0.006) DF-MI 0.824 0.795 0.812 (±0.007) (±0.011) (±0.01) 0.836 0.825 0.83 (±0.006) (±0.009) (±0.01) 0.847 0.836 0.84 (±0.005) (±0.009) (±0.009) 0.857 0.846 0.851 (±0.009) (±0.011) (±0.006) 0.861 0.859 0.861 (±0.006) (±0.009) (±0.009) 0.864 0.862 0.863 (±0.008) (±0.007) (±0.006) 0.863 0.864 0.863 (±0.009) (±0.009) (±0.008) IG-MI 0.822 0.814 0.812 (±0.007) (±0.011) (±0.01) 0.843 0.832 0.833 (±0.005) (±0.008) (±0.007) 0.849 0.842 0.844 (±0.004) (±0.004) (±0.002) 0.858 0.855 0.853 (±0.01) (±0.005) (±0.006) 0.861 0.859 0.862 (±0.007) (±0.008) (±0.009) 0.864 0.861 0.862 (±0.007) (±0.009) (±0.007) 0.865 0.864 0.865 (±0.006) (±0.007) (±0.008) 0.802 (±0.011) 0.821 (±0.008) 0.834 (±0.007) 0.846 (±0.006) 0.858 (±0.007) 0.857 (±0.004) 0.86 (±0.007)

0.768 (±0.01) 0.819 (±0.009) 0.833 (±0.01) 0.843 (±0.007) 0.856 (±0.005) 0.86 (±0.007) 0.862 (±0.008)

0.757 (±0.006) 0.807 (±0.009) 0.825 (±0.009) 0.837 (±0.007) 0.854 (±0.004) 0.857 (±0.007) 0.859 (±0.005)

INT

0.73 (±0.009) 0.764 (±0.01) 0.767 (±0.014) 0.786 (±0.013) 0.799 (±0.012) 0.796 (±0.009) 0.794 (±0.014)

0.649 (±0.013) 0.737 (±0.015) 0.766 (±0.019) 0.777 (±0.027) 0.786 (±0.017) 0.793 (±0.018) 0.793 (±0.016)

0.649 (±0.013) 0.737 (±0.015) 0.766 (±0.019) 0.777 (±0.027) 0.786 (±0.017) 0.793 (±0.018) 0.793 (±0.016)

M1

0.737 (±0.006) 0.758 (±0.016) 0.773 (±0.015) 0.791 (±0.019) 0.798 (±0.012) 0.799 (±0.014) 0.8 (±0.012)

0.737 (±0.006) 0.758 (±0.016) 0.773 (±0.015) 0.791 (±0.019) 0.798 (±0.012) 0.799 (±0.014) 0.8 (±0.012)

0.73 (±0.009) 0.764 (±0.01) 0.767 (±0.014) 0.786 (±0.013) 0.799 (±0.012) 0.796 (±0.009) 0.794 (±0.014)

M2

0.742 (±0.013) 0.777 (±0.015) 0.781 (±0.011) 0.793 (±0.02) 0.799 (±0.011) 0.801 (±0.016) 0.803 (±0.012)

0.743 (±0.016) 0.759 (±0.018) 0.78 (±0.013) 0.796 (±0.019) 0.795 (±0.015) 0.798 (±0.017) 0.792 (±0.017)

UC

0.727 (±0.015) 0.757 (±0.022) 0.769 (±0.013) 0.792 (±0.018) 0.796 (±0.01) 0.799 (±0.017) 0.802 (±0.013)

0.675 (±0.02) 0.746 (±0.018) 0.762 (±0.02) 0.778 (±0.024) 0.79 (±0.022) 0.794 (±0.017) 0.798 (±0.017)

0.689 (±0.013) 0.747 (±0.015) 0.768 (±0.015) 0.777 (±0.026) 0.793 (±0.019) 0.798 (±0.014) 0.796 (±0.013)

macroF1

0.745 (±0.018) 0.776 (±0.016) 0.785 (±0.011) 0.795 (±0.02) 0.802 (±0.015) 0.8 (±0.013) 0.799 (±0.018)

UN

0.729 (±0.018) 0.764 (±0.011) 0.776 (±0.01) 0.789 (±0.013) 0.8 (±0.015) 0.8 (±0.013) 0.803 (±0.013)

0.722 (±0.013) 0.752 (±0.017) 0.767 (±0.019) 0.786 (±0.015) 0.796 (±0.015) 0.798 (±0.012) 0.798 (±0.016)

0.709 (±0.02) 0.752 (±0.015) 0.764 (±0.016) 0.785 (±0.013) 0.798 (±0.016) 0.8 (±0.016) 0.8 (±0.014)

UM

0.714 (±0.017) 0.741 (±0.013) 0.766 (±0.012) 0.785 (±0.015) 0.795 (±0.014) 0.793 (±0.008) 0.797 (±0.01)

0.569 (±0.016) 0.734 (±0.014) 0.76 (±0.02) 0.773 (±0.018) 0.789 (±0.01) 0.797 (±0.015) 0.801 (±0.016)

0.552 (±0.015) 0.722 (±0.014) 0.746 (±0.02) 0.768 (±0.019) 0.787 (±0.012) 0.793 (±0.014) 0.796 (±0.01)

INT

176

0.9 (±0.1) 1.76 (±0.2) 2.58 (±0.2) 4.16 (±0.4) 7.68 (±0.5) 11.02 (±0.4) 14.26 (±0.4)

0.88 (±0) 1.72 (±0.1) 2.46 (±0.3) 3.92 (±0.5) 7.26 (±0.7) 10.24 (±0.7) 13.2 (±0.5)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.92 (±0.1) 1.76 (±0.2) 2.56 (±0.3) 4.12 (±0.5) 7.34 (±0.6) 10.48 (±0.6) 13.52 (±0.7)

0.12 (±0) 0.3 (±0.1) 0.54 (±0.3) 1.08 (±0.5) 2.74 (±0.7) 4.76 (±0.7) 6.8 (±0.5)

0.1 (±0.1) 0.24 (±0.2) 0.42 (±0.2) 0.86 (±0.4) 2.32 (±0.5) 3.98 (±0.4) 5.74 (±0.4)

0.08 (±0.1) 0.24 (±0.2) 0.44 (±0.3) 0.88 (±0.5) 2.66 (±0.6) 4.52 (±0.6) 6.5 (±0.7)

Threshold (%) UTh ITh

0.5

Th

21.24 (±11.9) 30.06 (±15) 34.7 (±18.1) 42.5 (±18.2) 54.8 (±13.2) 63.2 (±9.4) 68.1 (±5.3)

19.56 (±10) 24.04 (±13.4) 28.86 (±14.6) 34.12 (±15.6) 46.42 (±9.9) 52.74 (±5.4) 57.32 (±3.9)

17.34 (±15.7) 23.5 (±18.8) 28.46 (±20.8) 36.06 (±20.5) 53.06 (±12.5) 60.16 (±8.4) 64.7 (±6.7)

M1-M2

68.54 (±4.9) 65.82 (±5.6) 67.76 (±7.1) 70.18 (±8.2) 75.28 (±7.2) 79.92 (±6.1) 82.36 (±4.9)

67.54 (±4.5) 67.56 (±5.2) 68.2 (±6.1) 71.22 (±7.2) 75.6 (±6.1) 79.06 (±5.1) 80.6 (±4.3)

22.96 (±16.5) 27.34 (±19.5) 34.38 (±18.9) 42.96 (±18.3) 59.14 (±10.8) 64.82 (±8.3) 68.54 (±7.7)

53.22 (±5.3) 62.12 (±10.4) 65.92 (±10.4) 68.48 (±10.8) 74.64 (±9) 76.44 (±8.1) 77.78 (±4.7)

42.56 (±5.5) 46.12 (±12.2) 51.52 (±10.9) 56.8 (±11.9) 63.86 (±9.9) 66.8 (±5.6) 68.88 (±3.8)

65.82 (±5.3) 59.46 (±10.6) 64.1 (±12.2) 63.36 (±11.4) 74.96 (±7.1) 77.94 (±4.2) 79.66 (±4.9)

Similarity (%) M1-UCD M1-UCM

0.552 (±0.127) 0.673 (±0.07) 0.721 (±0.057) 0.757 (±0.041) 0.798 (±0.016) 0.819 (±0.007) 0.823 (±0.005)

0.552 (±0.127) 0.673 (±0.07) 0.721 (±0.057) 0.757 (±0.041) 0.798 (±0.016) 0.819 (±0.007) 0.823 (±0.005)

0.552 (±0.127) 0.673 (±0.07) 0.721 (±0.057) 0.757 (±0.041) 0.798 (±0.016) 0.819 (±0.007) 0.823 (±0.005)

M1

0.676 (±0.068) 0.747 (±0.038) 0.779 (±0.025) 0.801 (±0.015) 0.823 (±0.008) 0.83 (±0.008) 0.832 (±0.008)

0.677 (±0.063) 0.744 (±0.037) 0.775 (±0.027) 0.789 (±0.024) 0.812 (±0.007) 0.824 (±0.002) 0.831 (±0.001)

0.631 (±0.075) 0.698 (±0.06) 0.729 (±0.047) 0.768 (±0.03) 0.811 (±0.009) 0.824 (±0.004) 0.831 (±0.004)

M2

0.714 (±0.049) 0.772 (±0.031) 0.794 (±0.018) 0.808 (±0.008) 0.828 (±0.007) 0.831 (±0.007) 0.828 (±0.006)

0.718 (±0.054) 0.771 (±0.035) 0.788 (±0.02) 0.808 (±0.01) 0.824 (±0.007) 0.831 (±0.006) 0.833 (±0.005)

UCD CC-DF 0.559 (±0.135) 0.679 (±0.072) 0.707 (±0.064) 0.751 (±0.038) 0.802 (±0.013) 0.823 (±0.007) 0.827 (±0.004) CC-IG 0.639 (±0.085) 0.714 (±0.052) 0.757 (±0.037) 0.792 (±0.018) 0.813 (±0.007) 0.824 (±0.004) 0.83 (±0.006) CC-MI 0.635 (±0.086) 0.72 (±0.049) 0.765 (±0.03) 0.79 (±0.018) 0.815 (±0.009) 0.827 (±0.003) 0.827 (±0.005)

microF1

0.682 (±0.062) 0.742 (±0.038) 0.771 (±0.028) 0.795 (±0.014) 0.819 (±0.004) 0.826 (±0.006) 0.829 (±0.003)

UN

0.666 (±0.068) 0.721 (±0.052) 0.761 (±0.037) 0.794 (±0.021) 0.814 (±0.009) 0.826 (±0.005) 0.83 (±0.007)

0.653 (±0.072) 0.714 (±0.052) 0.753 (±0.04) 0.786 (±0.024) 0.809 (±0.009) 0.819 (±0.005) 0.831 (±0.004)

0.639 (±0.072) 0.702 (±0.054) 0.729 (±0.045) 0.769 (±0.03) 0.803 (±0.011) 0.822 (±0.002) 0.828 (±0.002)

UCM

0.339 (±0.225) 0.552 (±0.156) 0.603 (±0.143) 0.72 (±0.06) 0.786 (±0.026) 0.813 (±0.011) 0.822 (±0.005)

0.301 (±0.274) 0.502 (±0.2) 0.605 (±0.107) 0.681 (±0.081) 0.778 (±0.027) 0.802 (±0.015) 0.82 (±0.006)

0.294 (±0.277) 0.538 (±0.161) 0.59 (±0.153) 0.687 (±0.078) 0.777 (±0.025) 0.807 (±0.012) 0.82 (±0.003)

INT

Table B.18: Combining operators using Alj-Mgz-MR Dataset (FLocal)

0.356 (±0.152) 0.517 (±0.11) 0.575 (±0.097) 0.623 (±0.071) 0.693 (±0.027) 0.738 (±0.013) 0.75 (±0.015)

0.356 (±0.152) 0.517 (±0.11) 0.575 (±0.097) 0.623 (±0.071) 0.693 (±0.027) 0.738 (±0.013) 0.75 (±0.015)

0.356 (±0.152) 0.517 (±0.11) 0.575 (±0.097) 0.623 (±0.071) 0.693 (±0.027) 0.738 (±0.013) 0.75 (±0.015)

M1

0.505 (±0.122) 0.657 (±0.052) 0.688 (±0.035) 0.709 (±0.021) 0.754 (±0.013) 0.768 (±0.02) 0.767 (±0.018)

0.517 (±0.115) 0.656 (±0.047) 0.685 (±0.033) 0.705 (±0.032) 0.735 (±0.012) 0.755 (±0.009) 0.768 (±0.011)

0.411 (±0.129) 0.471 (±0.136) 0.529 (±0.125) 0.638 (±0.057) 0.72 (±0.013) 0.748 (±0.011) 0.764 (±0.017)

M2

0.575 (±0.083) 0.677 (±0.043) 0.701 (±0.025) 0.72 (±0.011) 0.759 (±0.016) 0.766 (±0.018) 0.766 (±0.016)

0.576 (±0.095) 0.676 (±0.044) 0.695 (±0.028) 0.726 (±0.012) 0.754 (±0.018) 0.764 (±0.02) 0.768 (±0.016)

UCD

0.418 (±0.154) 0.537 (±0.107) 0.649 (±0.057) 0.687 (±0.029) 0.737 (±0.008) 0.757 (±0.013) 0.76 (±0.016)

0.413 (±0.16) 0.548 (±0.1) 0.629 (±0.07) 0.689 (±0.032) 0.729 (±0.008) 0.751 (±0.01) 0.763 (±0.018)

0.344 (±0.177) 0.459 (±0.138) 0.5 (±0.14) 0.592 (±0.077) 0.705 (±0.025) 0.75 (±0.014) 0.757 (±0.015)

macroF1

0.477 (±0.125) 0.575 (±0.079) 0.644 (±0.066) 0.693 (±0.022) 0.741 (±0.007) 0.758 (±0.018) 0.762 (±0.014)

UN

0.462 (±0.135) 0.564 (±0.104) 0.629 (±0.077) 0.696 (±0.033) 0.726 (±0.01) 0.758 (±0.014) 0.76 (±0.021)

0.441 (±0.125) 0.57 (±0.107) 0.636 (±0.075) 0.691 (±0.04) 0.716 (±0.012) 0.741 (±0.006) 0.766 (±0.014)

0.43 (±0.122) 0.499 (±0.115) 0.527 (±0.12) 0.62 (±0.07) 0.7 (±0.02) 0.747 (±0.002) 0.757 (±0.011)

UCM

0.205 (±0.209) 0.368 (±0.194) 0.417 (±0.191) 0.572 (±0.097) 0.663 (±0.048) 0.722 (±0.016) 0.746 (±0.009)

0.197 (±0.217) 0.345 (±0.193) 0.43 (±0.167) 0.518 (±0.135) 0.662 (±0.05) 0.699 (±0.027) 0.741 (±0.013)

0.184 (±0.218) 0.34 (±0.196) 0.399 (±0.187) 0.5 (±0.139) 0.647 (±0.049) 0.708 (±0.025) 0.745 (±0.01)

INT

177

0.88 (±0) 1.72 (±0.2) 2.54 (±0.4) 3.98 (±0.5) 7 (±0.6) 9.9 (±0.4) 12.68 (±0.3)

0.64 (±0.1) 1.24 (±0.1) 1.8 (±0) 3.04 (±0.1) 5.98 (±0.1) 8.96 (±0.2) 11.84 (±0.1)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.88 (±0) 1.68 (±0.2) 2.5 (±0.3) 3.92 (±0.4) 7.2 (±0.5) 10.4 (±0.5) 13.32 (±0.3)

0.5

Th

0.36 (±0.1) 0.76 (±0.1) 1.2 (±0) 1.96 (±0.1) 4.02 (±0.1) 6.04 (±0.2) 8.16 (±0.1)

0.12 (±0) 0.28 (±0.2) 0.46 (±0.4) 1.02 (±0.5) 3 (±0.6) 5.1 (±0.4) 7.32 (±0.3)

0.12 (±0) 0.32 (±0.2) 0.5 (±0.3) 1.08 (±0.4) 2.8 (±0.5) 4.6 (±0.5) 6.68 (±0.3)

Threshold (%) UTh ITh

72.68 (±5.3) 76.54 (±3.7) 80.14 (±1.6) 77.98 (±1.3) 80.48 (±2.7) 80.66 (±2.8) 81.36 (±0.9)

23.7 (±14.4) 26.62 (±22.9) 31.68 (±24.6) 41.94 (±21.6) 59.98 (±11.9) 68.22 (±5.3) 73 (±3.2)

26.74 (±9.3) 31.88 (±14.4) 32.86 (±18.2) 42 (±16.2) 55.68 (±9.9) 61.28 (±5.8) 66.6 (±3.3)

M1-M2

87.52 (±4.6) 85.86 (±3.4) 88.96 (±3.2) 88.36 (±2.3) 88.78 (±2.8) 86.24 (±1.6) 86.98 (±0.6)

99.88 (±0.3) 100 (±0) 99.58 (±0.8) 99.9 (±0.2) 99.56 (±0.6) 99.52 (±0.7) 99.38 (±0.5)

98.82 (±2.3) 99.34 (±1.2) 98.8 (±0.9) 99.9 (±0.2) 99.48 (±0.1) 99.48 (±0.4) 99.56 (±0.2)

94.32 (±3.9) 94.92 (±2.5) 94.76 (±0.9) 95.08 (±1.1) 95.28 (±1.8) 94.38 (±0.4) 95.04 (±0.6)

51.26 (±12.1) 58.82 (±13.5) 66 (±12.4) 73.46 (±10.5) 80.5 (±7.4) 82.76 (±6.1) 85.32 (±5.1)

48.5 (±12.3) 55.14 (±11.4) 56.56 (±11.8) 70.82 (±6.9) 75.64 (±6) 78.3 (±5.3) 79.44 (±3.6)

Similarity (%) M1-UCD M1-UCM

0.677 (±0.063) 0.744 (±0.037) 0.775 (±0.027) 0.789 (±0.024) 0.812 (±0.007) 0.824 (±0.002) 0.831 (±0.001)

0.631 (±0.075) 0.698 (±0.06) 0.729 (±0.047) 0.768 (±0.03) 0.811 (±0.009) 0.824 (±0.004) 0.831 (±0.004)

0.631 (±0.075) 0.698 (±0.06) 0.729 (±0.047) 0.768 (±0.03) 0.811 (±0.009) 0.824 (±0.004) 0.831 (±0.004)

M1

0.676 (±0.068) 0.747 (±0.038) 0.779 (±0.025) 0.801 (±0.015) 0.823 (±0.008) 0.83 (±0.008) 0.832 (±0.008)

0.676 (±0.068) 0.747 (±0.038) 0.779 (±0.025) 0.801 (±0.015) 0.823 (±0.008) 0.83 (±0.008) 0.832 (±0.008)

0.677 (±0.063) 0.744 (±0.037) 0.775 (±0.027) 0.789 (±0.024) 0.812 (±0.007) 0.824 (±0.002) 0.831 (±0.001)

M2

Table B.18 – continued microF1 UN UCD UCM DF-IG 0.717 0.629 0.675 (±0.052) (±0.077) (±0.071) 0.774 0.696 0.731 (±0.027) (±0.063) (±0.044) 0.794 0.724 0.776 (±0.016) (±0.047) (±0.024) 0.813 0.768 0.798 (±0.01) (±0.029) (±0.018) 0.827 0.807 0.819 (±0.002) (±0.011) (±0.004) 0.835 0.825 0.826 (±0.01) (±0.006) (±0.004) 0.839 0.831 0.837 (±0.008) (±0.005) (±0.007) DF-MI 0.715 0.631 0.676 (±0.05) (±0.075) (±0.064) 0.776 0.697 0.729 (±0.024) (±0.06) (±0.048) 0.8 0.728 0.766 (±0.014) (±0.046) (±0.027) 0.814 0.767 0.8 (±0.007) (±0.029) (±0.016) 0.827 0.811 0.817 (±0.006) (±0.009) (±0.009) 0.834 0.824 0.832 (±0.011) (±0.003) (±0.008) 0.836 0.831 0.833 (±0.009) (±0.005) (±0.008) IG-MI 0.709 0.679 0.68 (±0.058) (±0.065) (±0.062) 0.771 0.749 0.751 (±0.03) (±0.035) (±0.037) 0.787 0.777 0.779 (±0.022) (±0.025) (±0.026) 0.805 0.798 0.795 (±0.015) (±0.017) (±0.017) 0.823 0.818 0.816 (±0.007) (±0.01) (±0.009) 0.83 0.83 0.825 (±0.006) (±0.01) (±0.005) 0.834 0.835 0.835 (±0.005) (±0.007) (±0.003) 0.617 (±0.084) 0.718 (±0.045) 0.762 (±0.033) 0.787 (±0.024) 0.806 (±0.009) 0.82 (±0.003) 0.827 (±0.003)

0.457 (±0.172) 0.62 (±0.099) 0.642 (±0.094) 0.707 (±0.063) 0.792 (±0.019) 0.817 (±0.01) 0.828 (±0.004)

0.556 (±0.105) 0.633 (±0.081) 0.65 (±0.086) 0.707 (±0.066) 0.789 (±0.018) 0.804 (±0.012) 0.826 (±0.004)

INT

0.517 (±0.115) 0.656 (±0.047) 0.685 (±0.033) 0.705 (±0.032) 0.735 (±0.012) 0.755 (±0.009) 0.768 (±0.011)

0.411 (±0.129) 0.471 (±0.136) 0.529 (±0.125) 0.638 (±0.057) 0.72 (±0.013) 0.748 (±0.011) 0.764 (±0.017)

0.411 (±0.129) 0.471 (±0.136) 0.529 (±0.125) 0.638 (±0.057) 0.72 (±0.013) 0.748 (±0.011) 0.764 (±0.017)

M1

0.505 (±0.122) 0.657 (±0.052) 0.688 (±0.035) 0.709 (±0.021) 0.754 (±0.013) 0.768 (±0.02) 0.767 (±0.018)

0.505 (±0.122) 0.657 (±0.052) 0.688 (±0.035) 0.709 (±0.021) 0.754 (±0.013) 0.768 (±0.02) 0.767 (±0.018)

0.517 (±0.115) 0.656 (±0.047) 0.685 (±0.033) 0.705 (±0.032) 0.735 (±0.012) 0.755 (±0.009) 0.768 (±0.011)

M2

0.564 (±0.091) 0.675 (±0.047) 0.696 (±0.031) 0.716 (±0.023) 0.757 (±0.015) 0.765 (±0.017) 0.771 (±0.018)

0.562 (±0.099) 0.683 (±0.034) 0.706 (±0.023) 0.723 (±0.011) 0.759 (±0.015) 0.772 (±0.026) 0.771 (±0.022)

UC

0.485 (±0.135) 0.653 (±0.052) 0.683 (±0.035) 0.709 (±0.024) 0.739 (±0.014) 0.764 (±0.018) 0.773 (±0.018)

0.412 (±0.131) 0.466 (±0.137) 0.527 (±0.123) 0.638 (±0.058) 0.721 (±0.014) 0.748 (±0.011) 0.765 (±0.017)

0.411 (±0.136) 0.469 (±0.143) 0.523 (±0.122) 0.637 (±0.057) 0.714 (±0.017) 0.752 (±0.016) 0.765 (±0.016)

macroF1

0.571 (±0.098) 0.687 (±0.034) 0.7 (±0.023) 0.727 (±0.016) 0.756 (±0.01) 0.775 (±0.02) 0.774 (±0.018)

UN

0.504 (±0.121) 0.66 (±0.05) 0.685 (±0.033) 0.706 (±0.026) 0.737 (±0.01) 0.759 (±0.014) 0.772 (±0.017)

0.472 (±0.131) 0.549 (±0.103) 0.631 (±0.062) 0.7 (±0.024) 0.726 (±0.016) 0.761 (±0.018) 0.769 (±0.019)

0.472 (±0.128) 0.571 (±0.081) 0.672 (±0.039) 0.7 (±0.031) 0.732 (±0.006) 0.752 (±0.01) 0.771 (±0.017)

UM

0.44 (±0.128) 0.631 (±0.052) 0.671 (±0.04) 0.7 (±0.03) 0.718 (±0.015) 0.747 (±0.006) 0.763 (±0.013)

0.29 (±0.192) 0.413 (±0.162) 0.43 (±0.172) 0.535 (±0.113) 0.698 (±0.028) 0.741 (±0.011) 0.76 (±0.013)

0.344 (±0.149) 0.418 (±0.142) 0.439 (±0.162) 0.536 (±0.117) 0.695 (±0.024) 0.708 (±0.021) 0.754 (±0.01)

INT

178

0.9 (±0.1) 1.74 (±0.1) 2.58 (±0.2) 4.12 (±0.3) 7.8 (±0.4) 11.26 (±0.3) 14.56 (±0.4)

0.84 (±0.1) 1.6 (±0.1) 2.34 (±0.2) 3.7 (±0.3) 6.84 (±0.4) 9.9 (±0.2) 12.84 (±0.2)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.92 (±0.1) 1.78 (±0.2) 2.54 (±0.3) 4.04 (±0.4) 7.48 (±0.5) 10.92 (±0.5) 14.24 (±0.5)

0.16 (±0.1) 0.4 (±0.1) 0.66 (±0.2) 1.3 (±0.3) 3.16 (±0.4) 5.1 (±0.2) 7.16 (±0.2)

0.1 (±0.1) 0.28 (±0.1) 0.42 (±0.2) 0.88 (±0.3) 2.2 (±0.4) 3.74 (±0.3) 5.44 (±0.4)

0.08 (±0.1) 0.24 (±0.2) 0.46 (±0.3) 0.96 (±0.4) 2.52 (±0.5) 4.08 (±0.5) 5.76 (±0.5)

Threshold (%) UTh ITh

0.5

Th

33.28 (±12.3) 40.6 (±13.9) 44.28 (±13.8) 52.16 (±12.2) 62.74 (±8.2) 67.82 (±2.7) 71.66 (±2)

20.24 (±9.2) 24.92 (±11.7) 27.42 (±13.7) 35.92 (±12) 44.36 (±8.1) 49.7 (±4.6) 54.22 (±3.6)

17.14 (±15.7) 23.68 (±17.5) 31.46 (±16.1) 39.16 (±14.3) 50.48 (±8.9) 54.6 (±6.8) 57.44 (±5.3)

M1-M2

68.74 (±8.8) 71.98 (±6.1) 72.16 (±6.6) 75.4 (±5.7) 82.42 (±4) 83.22 (±2.2) 85.4 (±2.4)

60.66 (±7.2) 64.32 (±4.9) 64.32 (±4.7) 68.4 (±3.2) 70.96 (±3.2) 70.14 (±1.3) 72.7 (±2.8)

22.76 (±15) 30.52 (±15.7) 36.88 (±14.1) 43.8 (±12.9) 55 (±8.1) 57.98 (±7.1) 60.8 (±5.8)

59 (±9.3) 62.56 (±9.8) 68.4 (±9.5) 70.82 (±9.5) 76.56 (±9.7) 79.3 (±6.3) 82.1 (±5.7)

45.96 (±10) 41.12 (±10.9) 43.6 (±12.5) 51.18 (±8.9) 59.14 (±6.7) 61.16 (±3.6) 64.44 (±2.2)

36.52 (±13.4) 40.32 (±14.8) 49.44 (±13.6) 54.44 (±10.9) 66.58 (±6.7) 69.94 (±3.4) 71.88 (±5.5)

Similarity (%) M1-UCD M1-UCM

0.584 (±0.105) 0.677 (±0.072) 0.715 (±0.056) 0.759 (±0.034) 0.798 (±0.018) 0.812 (±0.008) 0.82 (±0.01)

0.584 (±0.105) 0.677 (±0.072) 0.715 (±0.056) 0.759 (±0.034) 0.798 (±0.018) 0.812 (±0.008) 0.82 (±0.01)

0.584 (±0.105) 0.677 (±0.072) 0.715 (±0.056) 0.759 (±0.034) 0.798 (±0.018) 0.812 (±0.008) 0.82 (±0.01)

M1

0.69 (±0.063) 0.759 (±0.039) 0.779 (±0.029) 0.8 (±0.018) 0.818 (±0.009) 0.829 (±0.008) 0.834 (±0.007)

0.678 (±0.057) 0.745 (±0.041) 0.768 (±0.031) 0.793 (±0.015) 0.807 (±0.008) 0.826 (±0.004) 0.832 (±0.009)

0.599 (±0.095) 0.701 (±0.053) 0.72 (±0.047) 0.762 (±0.031) 0.797 (±0.011) 0.823 (±0.004) 0.827 (±0.004)

M2

0.72 (±0.046) 0.773 (±0.028) 0.793 (±0.015) 0.809 (±0.011) 0.823 (±0.009) 0.83 (±0.008) 0.835 (±0.008)

0.723 (±0.047) 0.775 (±0.028) 0.791 (±0.019) 0.813 (±0.008) 0.824 (±0.005) 0.831 (±0.008) 0.839 (±0.008)

UCD CC-DF 0.554 (±0.118) 0.676 (±0.077) 0.709 (±0.051) 0.763 (±0.031) 0.794 (±0.012) 0.817 (±0.007) 0.826 (±0.003) CC-IG 0.66 (±0.071) 0.714 (±0.053) 0.737 (±0.043) 0.782 (±0.024) 0.815 (±0.007) 0.823 (±0.006) 0.829 (±0.004) CC-MI 0.654 (±0.074) 0.722 (±0.053) 0.755 (±0.041) 0.791 (±0.017) 0.813 (±0.008) 0.823 (±0.01) 0.827 (±0.006)

microF1

0.678 (±0.055) 0.738 (±0.043) 0.757 (±0.035) 0.789 (±0.016) 0.815 (±0.003) 0.827 (±0.005) 0.831 (±0.006)

UN

0.667 (±0.075) 0.739 (±0.05) 0.756 (±0.043) 0.782 (±0.029) 0.812 (±0.009) 0.825 (±0.008) 0.833 (±0.007)

0.666 (±0.067) 0.725 (±0.046) 0.744 (±0.039) 0.777 (±0.029) 0.808 (±0.011) 0.819 (±0.007) 0.825 (±0.006)

0.602 (±0.084) 0.701 (±0.052) 0.726 (±0.043) 0.765 (±0.032) 0.802 (±0.015) 0.816 (±0.003) 0.824 (±0.006)

UCM

0.442 (±0.171) 0.588 (±0.139) 0.636 (±0.111) 0.723 (±0.054) 0.782 (±0.025) 0.809 (±0.012) 0.821 (±0.003)

0.361 (±0.251) 0.586 (±0.115) 0.616 (±0.103) 0.703 (±0.067) 0.768 (±0.028) 0.796 (±0.014) 0.813 (±0.008)

0.244 (±0.288) 0.543 (±0.166) 0.638 (±0.09) 0.699 (±0.067) 0.767 (±0.033) 0.8 (±0.014) 0.814 (±0.01)

INT

Table B.19: Combining operators using Alj-Mgz-MR Dataset (WLocal)

0.362 (±0.138) 0.469 (±0.135) 0.501 (±0.131) 0.584 (±0.083) 0.68 (±0.038) 0.72 (±0.019) 0.742 (±0.028)

0.362 (±0.138) 0.469 (±0.135) 0.501 (±0.131) 0.584 (±0.083) 0.68 (±0.038) 0.72 (±0.019) 0.742 (±0.028)

0.362 (±0.138) 0.469 (±0.135) 0.501 (±0.131) 0.584 (±0.083) 0.68 (±0.038) 0.72 (±0.019) 0.742 (±0.028)

M1

0.51 (±0.124) 0.612 (±0.081) 0.671 (±0.045) 0.7 (±0.028) 0.737 (±0.01) 0.758 (±0.015) 0.767 (±0.013)

0.463 (±0.114) 0.611 (±0.075) 0.669 (±0.043) 0.699 (±0.024) 0.722 (±0.007) 0.756 (±0.01) 0.764 (±0.02)

0.338 (±0.166) 0.463 (±0.139) 0.472 (±0.134) 0.583 (±0.078) 0.682 (±0.033) 0.745 (±0.014) 0.754 (±0.015)

M2

0.555 (±0.095) 0.638 (±0.059) 0.686 (±0.027) 0.712 (±0.02) 0.75 (±0.014) 0.762 (±0.019) 0.768 (±0.02)

0.553 (±0.096) 0.661 (±0.049) 0.688 (±0.031) 0.718 (±0.015) 0.748 (±0.009) 0.764 (±0.015) 0.774 (±0.024)

UCD

0.427 (±0.142) 0.502 (±0.131) 0.587 (±0.09) 0.682 (±0.032) 0.724 (±0.016) 0.747 (±0.015) 0.754 (±0.014)

0.419 (±0.15) 0.494 (±0.131) 0.513 (±0.126) 0.654 (±0.052) 0.729 (±0.01) 0.752 (±0.008) 0.759 (±0.017)

0.311 (±0.16) 0.444 (±0.153) 0.466 (±0.132) 0.59 (±0.078) 0.672 (±0.035) 0.735 (±0.019) 0.753 (±0.013)

macroF1

0.445 (±0.123) 0.542 (±0.115) 0.576 (±0.091) 0.66 (±0.04) 0.724 (±0.011) 0.751 (±0.016) 0.762 (±0.015)

UN

0.465 (±0.138) 0.56 (±0.112) 0.584 (±0.11) 0.644 (±0.077) 0.722 (±0.013) 0.75 (±0.012) 0.764 (±0.017)

0.439 (±0.124) 0.552 (±0.104) 0.583 (±0.104) 0.653 (±0.075) 0.714 (±0.013) 0.74 (±0.013) 0.756 (±0.017)

0.339 (±0.152) 0.464 (±0.136) 0.513 (±0.111) 0.598 (±0.074) 0.697 (±0.029) 0.729 (±0.009) 0.751 (±0.016)

UCM

0.247 (±0.172) 0.385 (±0.204) 0.422 (±0.182) 0.539 (±0.107) 0.638 (±0.064) 0.702 (±0.028) 0.737 (±0.013)

0.207 (±0.172) 0.376 (±0.173) 0.416 (±0.169) 0.526 (±0.119) 0.622 (±0.067) 0.672 (±0.037) 0.721 (±0.024)

0.14 (±0.212) 0.338 (±0.209) 0.42 (±0.156) 0.462 (±0.152) 0.591 (±0.085) 0.671 (±0.045) 0.724 (±0.03)

INT

179

0.88 (±0) 1.74 (±0.1) 2.52 (±0.2) 3.96 (±0.3) 7.52 (±0.6) 10.8 (±0.5) 14.08 (±0.3)

0.7 (±0) 1.3 (±0) 2.02 (±0) 3.3 (±0) 6.58 (±0.1) 9.8 (±0.2) 12.96 (±0.2)

0.5

0.5

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

10

7.5

5

2.5

1.5

1

0.88 (±0) 1.66 (±0.1) 2.42 (±0.2) 3.9 (±0.3) 7.54 (±0.4) 10.66 (±0.4) 13.8 (±0.1)

0.5

Th

0.3 (±0) 0.7 (±0) 0.98 (±0) 1.7 (±0) 3.42 (±0.1) 5.2 (±0.2) 7.04 (±0.2)

0.12 (±0) 0.28 (±0.1) 0.48 (±0.2) 1.04 (±0.3) 2.48 (±0.6) 4.2 (±0.5) 5.92 (±0.3)

0.12 (±0) 0.34 (±0.1) 0.58 (±0.2) 1.1 (±0.3) 2.46 (±0.4) 4.34 (±0.4) 6.2 (±0.1)

Threshold (%) UTh ITh

65.28 (±2.9) 69.22 (±1.5) 67.14 (±3.1) 68.06 (±0.8) 68.16 (±1.2) 69.58 (±2.6) 70.42 (±2.2)

21.98 (±11.1) 30.18 (±13.8) 32.36 (±15.8) 42.24 (±13.2) 49.34 (±11.2) 56.1 (±6.7) 59.3 (±3)

25.92 (±8.2) 35.74 (±9.8) 37.64 (±12.7) 44.4 (±11.9) 49.24 (±8.7) 57.62 (±5.1) 62.06 (±1.1)

M1-M2

90.2 (±4.6) 89.8 (±3.4) 89.88 (±3.2) 89.76 (±1.3) 89.34 (±1.9) 88.92 (±0.7) 89.26 (±1.2)

99.64 (±0.8) 99.58 (±0.9) 98.52 (±0.8) 99.02 (±1.1) 98.74 (±0.5) 98.44 (±0.5) 97.96 (±0.2)

98.66 (±3) 96.04 (±1.4) 94.7 (±0.4) 95 (±0.6) 93.66 (±0.9) 95.1 (±2.1) 95.22 (±1.1)

92.84 (±2.4) 96.32 (±2.2) 96.36 (±1.9) 96.48 (±1) 94.22 (±1.3) 94.54 (±0.5) 94.24 (±1.1)

76.64 (±3.7) 80.72 (±1.1) 77.9 (±4.6) 81.24 (±4.1) 76.92 (±8) 78.84 (±6.8) 82.34 (±3.8)

78.44 (±5.6) 71.3 (±4.4) 69.78 (±4.8) 74.94 (±2.7) 74 (±3.2) 76.8 (±1.3) 78.02 (±1.3)

Similarity (%) M1-UCD M1-UCM

0.678 (±0.057) 0.745 (±0.041) 0.768 (±0.031) 0.793 (±0.015) 0.807 (±0.008) 0.826 (±0.004) 0.832 (±0.009)

0.599 (±0.095) 0.701 (±0.053) 0.72 (±0.047) 0.762 (±0.031) 0.797 (±0.011) 0.823 (±0.004) 0.827 (±0.004)

0.599 (±0.095) 0.701 (±0.053) 0.72 (±0.047) 0.762 (±0.031) 0.797 (±0.011) 0.823 (±0.004) 0.827 (±0.004)

M1

0.69 (±0.063) 0.759 (±0.039) 0.779 (±0.029) 0.8 (±0.018) 0.818 (±0.009) 0.829 (±0.008) 0.834 (±0.007)

0.69 (±0.063) 0.759 (±0.039) 0.779 (±0.029) 0.8 (±0.018) 0.818 (±0.009) 0.829 (±0.008) 0.834 (±0.007)

0.678 (±0.057) 0.745 (±0.041) 0.768 (±0.031) 0.793 (±0.015) 0.807 (±0.008) 0.826 (±0.004) 0.832 (±0.009)

M2

Table B.19 – continued microF1 UN UCD UCM DF-IG 0.716 0.595 0.665 (±0.049) (±0.087) (±0.064) 0.778 0.702 0.724 (±0.026) (±0.053) (±0.046) 0.79 0.717 0.748 (±0.016) (±0.05) (±0.036) 0.811 0.76 0.776 (±0.01) (±0.034) (±0.024) 0.828 0.794 0.815 (±0.008) (±0.016) (±0.005) 0.835 0.824 0.826 (±0.008) (±0.004) (±0.007) 0.838 0.825 0.83 (±0.008) (±0.006) (±0.004) DF-MI 0.726 0.598 0.667 (±0.044) (±0.093) (±0.065) 0.779 0.703 0.722 (±0.023) (±0.057) (±0.051) 0.797 0.719 0.746 (±0.012) (±0.049) (±0.042) 0.812 0.764 0.774 (±0.006) (±0.034) (±0.025) 0.831 0.798 0.819 (±0.01) (±0.011) (±0.003) 0.835 0.823 0.828 (±0.009) (±0.006) (±0.006) 0.834 0.826 0.833 (±0.008) (±0.004) (±0.008) IG-MI 0.719 0.682 0.683 (±0.053) (±0.056) (±0.063) 0.77 0.748 0.75 (±0.033) (±0.039) (±0.035) 0.789 0.772 0.77 (±0.019) (±0.028) (±0.028) 0.804 0.8 0.796 (±0.012) (±0.016) (±0.016) 0.821 0.815 0.812 (±0.005) (±0.009) (±0.006) 0.836 0.832 0.828 (±0.008) (±0.007) (±0.005) 0.838 0.838 0.831 (±0.009) (±0.008) (±0.006) 0.629 (±0.08) 0.726 (±0.044) 0.752 (±0.037) 0.783 (±0.022) 0.803 (±0.012) 0.817 (±0.006) 0.83 (±0.006)

0.347 (±0.211) 0.642 (±0.083) 0.654 (±0.089) 0.708 (±0.066) 0.773 (±0.028) 0.813 (±0.009) 0.822 (±0.006)

0.519 (±0.104) 0.644 (±0.077) 0.655 (±0.08) 0.705 (±0.061) 0.76 (±0.033) 0.802 (±0.011) 0.818 (±0.005)

INT

0.463 (±0.114) 0.611 (±0.075) 0.669 (±0.043) 0.699 (±0.024) 0.722 (±0.007) 0.756 (±0.01) 0.764 (±0.02)

0.338 (±0.166) 0.463 (±0.139) 0.472 (±0.134) 0.583 (±0.078) 0.682 (±0.033) 0.745 (±0.014) 0.754 (±0.015)

0.338 (±0.166) 0.463 (±0.139) 0.472 (±0.134) 0.583 (±0.078) 0.682 (±0.033) 0.745 (±0.014) 0.754 (±0.015)

M1

0.51 (±0.124) 0.612 (±0.081) 0.671 (±0.045) 0.7 (±0.028) 0.737 (±0.01) 0.758 (±0.015) 0.767 (±0.013)

0.51 (±0.124) 0.612 (±0.081) 0.671 (±0.045) 0.7 (±0.028) 0.737 (±0.01) 0.758 (±0.015) 0.767 (±0.013)

0.463 (±0.114) 0.611 (±0.075) 0.669 (±0.043) 0.699 (±0.024) 0.722 (±0.007) 0.756 (±0.01) 0.764 (±0.02)

M2

0.553 (±0.101) 0.635 (±0.07) 0.688 (±0.033) 0.707 (±0.018) 0.748 (±0.006) 0.771 (±0.018) 0.773 (±0.02)

0.55 (±0.108) 0.656 (±0.049) 0.694 (±0.025) 0.721 (±0.011) 0.758 (±0.015) 0.765 (±0.022) 0.767 (±0.018)

UC

0.458 (±0.123) 0.601 (±0.076) 0.671 (±0.04) 0.703 (±0.024) 0.728 (±0.013) 0.765 (±0.018) 0.772 (±0.02)

0.338 (±0.165) 0.465 (±0.142) 0.473 (±0.136) 0.586 (±0.086) 0.688 (±0.031) 0.744 (±0.02) 0.756 (±0.018)

0.333 (±0.153) 0.463 (±0.136) 0.472 (±0.136) 0.583 (±0.086) 0.681 (±0.034) 0.745 (±0.013) 0.756 (±0.02)

macroF1

0.533 (±0.109) 0.67 (±0.045) 0.688 (±0.03) 0.715 (±0.018) 0.757 (±0.017) 0.772 (±0.025) 0.773 (±0.02)

UN

0.475 (±0.13) 0.605 (±0.072) 0.661 (±0.045) 0.696 (±0.023) 0.725 (±0.01) 0.761 (±0.012) 0.765 (±0.016)

0.437 (±0.129) 0.477 (±0.139) 0.517 (±0.127) 0.613 (±0.068) 0.733 (±0.01) 0.758 (±0.018) 0.766 (±0.02)

0.435 (±0.129) 0.479 (±0.136) 0.543 (±0.105) 0.621 (±0.062) 0.728 (±0.005) 0.754 (±0.016) 0.761 (±0.014)

UM

0.427 (±0.132) 0.583 (±0.076) 0.634 (±0.063) 0.683 (±0.036) 0.709 (±0.017) 0.739 (±0.012) 0.764 (±0.013)

0.183 (±0.198) 0.42 (±0.157) 0.431 (±0.163) 0.469 (±0.151) 0.626 (±0.065) 0.722 (±0.015) 0.741 (±0.007)

0.238 (±0.137) 0.423 (±0.143) 0.432 (±0.148) 0.467 (±0.141) 0.596 (±0.077) 0.701 (±0.024) 0.737 (±0.006)

INT

Arabic abstract

180

Dimensionality Reduction Techniques for Enhancing ...

A large comparative study has been conducted in order to evaluate these .... in order to discover knowledge from text or unstructured data [62]. ... the preprocessing stage, the unstructured texts are transformed into semi-structured texts.

2MB Sizes 1 Downloads 445 Views

Recommend Documents

Comparison of Dimensionality Reduction Techniques ...
In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. Thus, in this paper we investigate the use of various dimensionality reduction techniques (DRTs) to extract th

Transferred Dimensionality Reduction
propose an algorithm named Transferred Discriminative Analysis to tackle this problem. It uses clustering ... cannot work, as the labeled and unlabeled data are from different classes. This is a more ... Projection Direction. (g) PCA+K-means(bigger s

Dimensionality Reduction for Online Learning ...
is possible to learn concepts even in surprisingly high dimensional spaces. However ... A typical learning system will first analyze the original data to obtain a .... Algorithm 2 is a wrapper for BPM which has a dimension independent mis-.

Distortion-Free Nonlinear Dimensionality Reduction
in many machine learning areas where essentially low-dimensional data is nonlinearly ... low dimensional representation of xi. ..... We explain this in detail.

eigenfaces and eigenvoices: dimensionality reduction ...
We conducted mean adaptation experiments on the Isolet database 1], which contains .... 4] Z. Hu, E. Barnard, and P. Vermeulen, \Speaker Normalization using.

Feature Sets and Dimensionality Reduction for Visual ...
France. Abstract. We describe a family of object detectors that provides .... 2×2 cell blocks for SIFT-style normalization, giving 36 feature dimensions per cell. .... This provides cleaner training data and, by mimicking the way in which the clas-.

Dimensionality reduction of sonar images for sediments ...
Abstract Data in most of the real world applications like sonar images clas- ... with: Xij: represent the euclidian distance between the sonar images xi and xj. ... simple way to obtain good classification results with a reduced knowledge of.

Dimensionality reduction using MCE-optimized LDA ...
Dec 7, 2008 - performance of this new method is as good as that of the MCE- trained system ..... get significant performance improvement on TiDigits. Compared with ... Application of Discriminative Feature Extraction to Filter-. Bank-Based ...

EMI Reduction Techniques
Dec 29, 1998 - peak jitter must remain within the system's specifications. .... avoided. These ground and power partitions may create complex current loops.

Dimensionality reduction using MCE-optimized LDA ...
Dec 7, 2008 - Continuous Speech Recognition (CSR) framework, we use. MCE criterion to optimize the conventional dimensionality reduction method, which ...

Self-taught dimensionality reduction on the high ...
Aug 4, 2012 - representations of target data are deleted for achieving the effectiveness and the efficiency. That is, this step performs feature selection on the new representations of target data. Finally, experimental results at various types of da

Nonlinear Dimensionality Reduction with Local Spline ...
Jan 14, 2009 - and call yij ∈ Y its corresponding global coordinate in Rd. During algorithm ..... To avoid degenerate solutions, we add a .... the center. We also ...

Nonlinear Dimensionality Reduction with Local Spline ...
Nov 28, 2008 - This paper presents a new algorithm for Non-Linear Dimensionality Reduction (NLDR). Our algorithm is developed under the conceptual framework of compatible mapping. Each such mapping is a compound of a tangent space projection and a gr

Fast and Efficient Dimensionality Reduction using ...
owns very low computational complexity O(d log d) and highly ef- ..... there is no need to store the transform explicitly in memory. The- oretically, it guarantees ...

Dimensionality reduction by Mixed Kernel Canonical ...
high dimensional data space is mapped into the reproducing kernel Hilbert space (RKHS) rather than the Hilbert .... data mining [51,47,16]. Recently .... samples. The proposed MKCCA method (i.e. PCA followed by CCA) essentially induces linear depende

Novel Shaping and Complexity-Reduction Techniques ...
useful shaping technique that has empirical performance approaching the ... server timing channel that can achieve half of capacity. However, 1) they can only ..... check codes for channels with cross-talk,” in Proceedings of. IEEE Information ...