An Automatic Algorithm for Building Ontologies From Data
F. Colace, M. De Santo, M. Vento
Università di Salerno, Dip. di Ing. dell'Informazione e Ing. Elettrica
Via P.te Don Melillo, 1 - 84084 Fisciano (SA)
{fcolace, desanto, mvento}@unisa.it

Abstract

In this paper we describe an automatic algorithm able to learn university course ontologies from experimental data. The algorithm is based on the use of the Bayesian network formalism for representing ontologies, as well as on a learning algorithm that infers the corresponding probabilistic model starting from the results of final course tests. Following a multi-expert approach, the method uses Bayesian network structural learning algorithms in order to build reference ontologies. The algorithm aims to help teachers in the organization of courses and students in the definition of customized learning paths. We provide an experimental evaluation of the method using data coming from real courses.

1. Introduction

Currently, one of the greatest challenges in scientific research is the development of advanced educational systems that are adaptable and intelligent. Methodologies linked to knowledge representation are among the key elements for building intelligent and advanced training systems. In fact, an ensemble of well-structured concepts is able to significantly improve interoperability and information sharing between systems. In the literature, such a set of concepts and their relationships describing a knowledge domain is called an ontology [1]. Ontologies are among the most efficient tools for formalizing knowledge that should then be shared by groups of people [2]. The ontology construction process is neither trivial nor easy. However, a source of indirect evidence exists that can be profitably employed for reconstructing the ontology used during a course or a series of lessons: end-of-course evaluation tests. A teacher planning the end-of-course evaluation test not only assesses the students' level of preparation on the most significant subjects proposed during the lessons, but also tends to describe the ontology, outlining the propaedeutic aspects that relate subjects to one another. It may therefore be useful to extract the ontology from these tests through the analysis of the answers given by students. Bayesian networks are a useful technique for this purpose. The aim of this paper is to introduce a technique, based on Bayesian network structural learning algorithms, that allows the unattended construction of an ontology, so that teachers or intelligent tutoring systems can more easily manage the contents related to every subject belonging to the ontology.

2. Ontologies

Ontologies are a vast topic that cannot be easily defined, given the disagreements arising from the several methods adopted to build and use them, as well as from the different roles they may play [3]. In the field of computer science, an ontology is a tool useful to the learning processes that are typical of artificial intelligence. In fact, the use of ontologies is rapidly growing thanks to the significant functions they carry out in information systems, the semantic web and knowledge-based systems. Ontological analysis clarifies knowledge structures: given a domain, its ontology represents the heart of any knowledge representation system for that domain. It is clear that ontologies are important because they explicate all the possible relations among the concepts belonging to a domain. Once these relations are made explicit, they can be easily modified if our knowledge about that domain changes. The explicit specifications provided by ontologies can also help new users to understand what specific terms in a domain mean.

3. Bayesian Networks

Bayesian networks have been successfully used to model knowledge under conditions of uncertainty within expert systems [4], and so they can also be used to model ontologies. A Bayesian network is a graph-based model encoding the joint probability distribution of a set of random variables. It consists of a directed acyclic graph S, where each node is associated with one random variable X_i and each arc represents the conditional dependence between the nodes that it joins, and of a set P of local probability distributions, each of which is associated with a random variable X_i and conditioned on the variables corresponding to the source nodes of the arcs entering the node with which X_i is associated. It can be useful to learn the structure of a Bayesian network from data. The main aim of structural learning algorithms is to make clear the relationships between the entities of the domain and to specify the causal ties starting from observations of the values of the domain variables. In the next section we will show how a Bayesian network can represent an ontology and how structural learning algorithms can build ontologies from data.

0-7803-8482-2/04/$20.00 ©2004 IEEE.

We have chosen five structural learning algorithms and use them according to a majority-vote multi-expert approach. These algorithms represent the main approaches followed in the structural learning research field. They are: the Bayesian algorithm [5], K2 [6], K3 [7], PC [8] and TPDA [9].
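The majority-vote combination of the five structural learning algorithms can be illustrated with a minimal Python sketch. The paper does not detail the combination rule, so voting edge by edge with a strict-majority threshold is an assumption here; the function name `majority_vote_edges`, the subject names and the expert outputs are all hypothetical.

```python
from collections import Counter


def majority_vote_edges(expert_edges, min_votes=None):
    """Keep every directed edge proposed by a strict majority of experts.

    expert_edges: list of sets of (parent, child) tuples, one set per
    structural learning algorithm. Voting per edge is an assumption of
    this sketch, not a rule stated in the paper.
    """
    if min_votes is None:
        min_votes = len(expert_edges) // 2 + 1  # strict majority
    # Count, for each edge, how many experts proposed it.
    votes = Counter(edge for edges in expert_edges for edge in set(edges))
    return {edge for edge, n in votes.items() if n >= min_votes}


# Hypothetical outputs of five learners over three made-up subjects.
experts = [
    {("variables", "loops"), ("loops", "arrays")},
    {("variables", "loops"), ("loops", "arrays")},
    {("variables", "loops"), ("variables", "arrays")},
    {("variables", "loops"), ("loops", "arrays")},
    {("arrays", "loops")},
]
consensus = majority_vote_edges(experts)
```

With these made-up inputs, only the two edges proposed by at least three of the five experts survive. Voting on edges independently does not by itself guarantee that the combined graph is acyclic, so a fuller implementation would presumably need a cycle check before treating the result as a Bayesian network.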

4. Our proposal and obtained results

This proposal presents a technique able to automatically infer propaedeutic relationships among the different subjects forming a university course. A source of indirect evidence that can be employed for reconstructing a posteriori the ontology used during a course, as well as the propaedeutic connections among the single subjects, is represented by the end-of-course evaluation tests. The teacher planning the end-of-course evaluation tends to describe the ontology on which his or her course was based, outlining the propaedeutic aspects that relate subjects to one another. It may be useful to extract the ontology from these tests, evaluating the propaedeutic relationships among the subjects forming it through the analysis of the answers given by students. On the basis of these considerations, teachers planned the final test of the first-level course on Computer Science at the Electronic Engineering Faculty (CSE) and the final test of the first-level course on Introduction to Computer Science at the Language Faculty of the University of Salerno. In the case of the second ontology, the teacher divided it into three sub-ontologies: hardware, software and web. On the basis of these ontologies, questionnaires composed of multiple-choice questions, to be filled in by students, were prepared. The graph described previously represents the ontology, but can also be used as a Bayesian network for the inference process. Each node of the networks has two states, 'Yes' for complete knowledge of the subject or 'Not' for total ignorance of the subject, and represents the probability that a generic learner knows the subject associated with that node. Through a majority-vote multi-expert approach we combined the results obtained by using the five structural learning algorithms previously cited (Figure 1). The obtained results show that the learned ontologies are close to the teachers' ones (Table 1).
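To make the two-state node model concrete, the following Python sketch builds a tiny network with 'Yes'/'Not' states and evaluates it by enumeration. It relies on the standard factorization of a Bayesian network, in which the joint probability is the product of each node's local distribution given its parents. The two subjects, the arc between them and every probability value are invented for illustration and do not come from the paper's data.

```python
from itertools import product

STATES = ("Yes", "Not")

# Hypothetical two-subject ontology: "variables" is propaedeutic to "loops".
parents = {"variables": (), "loops": ("variables",)}

# Made-up conditional probability tables: P(node = "Yes" | parent states).
p_yes = {
    "variables": {(): 0.7},
    "loops": {("Yes",): 0.8, ("Not",): 0.1},
}


def joint(assignment):
    """P(full assignment) as the product of each node's local distribution."""
    prob = 1.0
    for node, pars in parents.items():
        key = tuple(assignment[p] for p in pars)
        p = p_yes[node][key]
        prob *= p if assignment[node] == "Yes" else 1.0 - p
    return prob


def probability(fixed):
    """Sum the joint over all full assignments consistent with `fixed`."""
    names = list(parents)
    total = 0.0
    for states in product(STATES, repeat=len(names)):
        assignment = dict(zip(names, states))
        if all(assignment[n] == s for n, s in fixed.items()):
            total += joint(assignment)
    return total


# Probability that a generic learner knows "loops" (about 0.59 here).
p_loops = probability({"loops": "Yes"})
# Probability that the learner knows "variables", given that "loops" is known.
p_vars_given_loops = probability({"variables": "Yes", "loops": "Yes"}) / p_loops
```

Enumeration is exponential in the number of nodes, so it only suits very small networks such as this one; it is used here because it keeps the link between the graph, the local distributions and the resulting probabilities easy to follow.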

5. Conclusions

In this paper we have described a method for automatically learning ontologies. In particular, our approach is based on the use of Bayesian networks. Thanks to their characteristics, these networks can be used to model and evaluate the conditional dependencies among the nodes of an ontology on the basis of the data obtained from student tests. An experimental evaluation of the proposed method, performed using real student data, showed that the relationships inferred by the system are very similar to the ones defined by a human expert, which supports the effectiveness of the proposed method.

6. References

[1] Gruber T. R., A Translation Approach to Portable Ontology Specifications, Knowledge Acquisition, 5(2): 199-220, 1993
[2] Studer R., Benjamins V. R., Fensel D., Knowledge Engineering: Principles and Methods, DKE 25(1-2), 1998
[3] Uschold M., Gruninger M., Ontologies: Principles, Methods and Applications, Knowledge Engineering Review
[4] Heckerman D., A Tutorial on Learning with Bayesian Networks, Learning in Graphical Models, Adaptive Computation and Machine Learning, M. I. Jordan Editor, The MIT Press, Cambridge, Massachusetts, 1999
[5] Heckerman D. et al., Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, Machine Learning, 20(3): 197-243, 1995
[6] Cooper G. F., Herskovits E., A Bayesian Method for the Induction of Probabilistic Networks from Data, Machine Learning, 9: 309-347, 1992
[7] Bouckaert R., Probabilistic Network Construction Using the Minimum Description Length Principle, Lecture Notes in Computer Science, Vol. 747, 1993
[8] Spirtes P. et al., Causation, Prediction, and Search, MIT Press, 2001
[9] Cheng J., Bell D., Liu W., Learning Belief Networks from Data: An Information Theory Based Approach, Proceedings of the Sixth ACM International Conference on Information and Knowledge Management, 1997

Appendix: Images and Tables

Figure 1: Algorithm Schematization
