An Automatic Algorithm for Building Ontologies From Data

F. Colace, M. De Santo, M. Vento
Università di Salerno
Dip. di Ing. dell'Informazione e Ing. Elettrica
Via Ponte Don Melillo, 1 - 84084 Fisciano (SA)
{fcolace, desanto, mvento}@unisa.it

Abstract

In this paper we describe an automatic algorithm that learns the ontologies of university courses from experimental data. The algorithm uses the Bayesian network formalism to represent ontologies, together with a learning algorithm that infers the corresponding probabilistic model from the results of final course tests. Following a multi-expert approach, the method uses Bayesian network structural learning algorithms to build reference ontologies. The algorithm aims to help teachers organize their courses and to help students define customized learning paths. We provide an experimental evaluation of the method using data from real courses.

1. Introduction

Currently, one of the greatest challenges in scientific research is the development of advanced educational systems that are adaptive and intelligent. Methodologies for knowledge representation are among the key elements for building such intelligent training systems, since a well-structured set of concepts can significantly improve interoperability and information sharing between systems. In the literature, such a set of concepts and their relationships describing a knowledge domain is called an ontology [1]. Ontologies are among the most effective tools for formalizing knowledge that is to be shared by groups of people [2]. Building an ontology, however, is neither trivial nor easy. One source of indirect evidence that can be profitably used to reconstruct the ontology underlying a course or a series of lessons is the end-of-course evaluation test. A teacher planning such a test not only assesses the students' preparation on the most significant subjects presented during the lessons, but also tends to describe the ontology, outlining the propaedeutic (prerequisite) relations between subjects. The ontology can therefore be extracted by analyzing the answers students give on these tests, and Bayesian networks are a suitable technique for this purpose. The aim of this paper is to introduce a technique, based on Bayesian network structural learning algorithms, that allows the unattended construction of an ontology, so that teachers or intelligent tutoring systems can more easily manage the contents related to each subject in the ontology.

2. Ontologies

Ontologies are a vast topic that cannot be easily defined, given the disagreements arising from the many methods adopted to build and use them, as well as from the different roles they may play [3]. In computer science, an ontology is a tool supporting the learning processes typical of artificial intelligence. Indeed, the use of ontologies is growing rapidly thanks to the significant functions they perform in information systems, the Semantic Web and knowledge-based systems. Ontological analysis clarifies knowledge structures: given a domain, its ontology is the heart of any knowledge representation system for that domain. Ontologies are important because they make explicit all the possible relations among the concepts belonging to a domain. Once these relations are explicit, they can easily be modified if our knowledge of the domain changes. The explicit specifications provided by ontologies can also help new users understand what specific terms in a domain mean.

3. Bayesian Networks

Bayesian networks have been successfully used to model knowledge under conditions of uncertainty within expert systems [4], and they can therefore also be used to model ontologies. A Bayesian network is a graph-based model encoding the joint probability distribution of a set of random variables. It consists of (i) a directed acyclic graph S, in which each node is associated with a random variable X_i and each arc represents a conditional dependence between the nodes it joins, and (ii) a set P of local probability distributions, one for each variable X_i, conditioned on the variables associated with the source nodes of the arcs entering the node of X_i. Together, S and P define the joint distribution as the product of the local distributions: p(X_1, ..., X_n) = p(X_1 | Pa(X_1)) · ... · p(X_n | Pa(X_n)), where Pa(X_i) denotes the parents of X_i in S. The structure of a Bayesian network can also be learned from data: structural learning algorithms aim to make explicit the relationships among the entities of a domain and to identify the causal ties among them, starting from observed values of the domain variables. In the next section we show how a Bayesian network can represent an ontology and how structural learning algorithms can build ontologies from data.

0-7803-8482-2/04/$20.00 ©2004 IEEE.

We have chosen five structural learning algorithms and combined them according to a majority-vote multi-expert approach. These algorithms represent the main approaches followed in structural learning research: the Bayesian algorithm [5], K2 [6], K3 [7], PC [8] and TPDA [9].
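To make structural learning more concrete, here is a minimal sketch of the greedy search used by K2 [6], one of the five algorithms listed above. It assumes binary nodes encoded as 1 ("Yes") and 0 ("Not"), a user-supplied ordering of the subjects (K2 requires one) and a cap on the number of parents per node. The function names `k2_log_score` and `k2`, the parameter `max_parents` and the toy data are hypothetical; this is not the authors' implementation, and the paper does not state which ordering or parent limit was used.

```python
import math
from collections import Counter


def k2_log_score(data, child, parents, arity=2):
    """Log of the Cooper-Herskovits (K2) metric for `child` given `parents`.

    `data` is a list of rows; each row holds 0 ("Not") or 1 ("Yes") per subject,
    so node values are assumed to lie in range(arity).
    """
    # N_ijk: how often the child takes value k under parent configuration j.
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # N_ij: how often parent configuration j occurs in the data.
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    score = 0.0
    for config, count in n_ij.items():
        # log[(r - 1)! / (N_ij + r - 1)!], using lgamma(n) = log((n - 1)!)
        score += math.lgamma(arity) - math.lgamma(count + arity)
        for k in range(arity):
            score += math.lgamma(n_ijk[(config, k)] + 1)  # log(N_ijk!)
    return score


def k2(data, order, max_parents=2, arity=2):
    """Greedy K2 search: a node's parents are drawn only from its predecessors in `order`."""
    parents = {node: [] for node in order}
    for pos, node in enumerate(order):
        best = k2_log_score(data, node, parents[node], arity)
        while len(parents[node]) < max_parents:
            candidates = [z for z in order[:pos] if z not in parents[node]]
            if not candidates:
                break
            # Try each remaining predecessor and keep the best-scoring one.
            score, best_parent = max(
                (k2_log_score(data, node, parents[node] + [cand], arity), cand)
                for cand in candidates
            )
            if score <= best:
                break  # no single extra parent improves the score
            best = score
            parents[node].append(best_parent)
    return parents


# Hypothetical test results: columns are subjects 0, 1, 2 (1 = "Yes", 0 = "Not").
rows = [
    (1, 1, 1), (1, 1, 0), (1, 1, 1), (1, 1, 0),
    (0, 0, 1), (0, 0, 0), (0, 0, 1), (0, 0, 0),
]
structure = k2(rows, order=[0, 1, 2])
print(structure)  # {0: [], 1: [0], 2: []}
```

On this toy table, subject 1 is known exactly when subject 0 is, so the search adds an arc from 0 to 1, while subject 2 varies independently of both and receives no parents. In a multi-expert setting, a structure produced this way would be one of the votes to be combined.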

4. Our proposal and obtained results

This proposal aims to present a technique able to automatically infer the propaedeutic relationships among the different subjects forming a university course. A source of indirect evidence that can be employed to reconstruct a posteriori the ontology used during a course, together with the propaedeutic connections among its subjects, is the end-of-course evaluation test. The teacher planning the end-of-course evaluation tends to describe the ontology on which his/her course was based, outlining the propaedeutic aspects that relate subjects to one another. The ontology, and the propaedeutic relationships among its subjects, can therefore be extracted by analyzing the answers given by students on such tests. On the basis of these considerations, the teachers planned the final test of the first-level course on Computer Science at the Electronic Engineering Faculty (CSE) and the final test of the first-level course on Introduction to Computer Science at the Language Faculty of the University of Salerno. For the second ontology, the teacher divided it into three sub-ontologies: hardware, software and web. On the basis of these ontologies, questionnaires composed of multiple-choice questions were prepared for the students to fill in. The graph representing each ontology can also be used as a Bayesian network for the inference process. Each node of the network has two states, 'Yes' for complete knowledge of the subject and 'Not' for total ignorance of it, and represents the probability that a generic learner knows the subject associated with that node. Through a majority-vote multi-expert approach, we combined the results obtained with the five structural learning algorithms cited above (Figure 1). The results show that the learned ontologies are close to those defined by the teachers (Table 1).
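The paper does not detail how the five outputs are merged. The following is a minimal sketch, assuming that each learner returns a set of directed arcs between subjects and that an arc enters the combined ontology when a strict majority of learners (3 of 5) proposes it. The function names, subject names and edge sets are invented for illustration and are not the paper's results. Because arcs that win separate votes can close a cycle even when every learner's own graph is acyclic, the sketch also includes an acyclicity check based on Kahn's algorithm.

```python
from collections import Counter


def majority_vote(edge_sets):
    """Keep every directed arc proposed by a strict majority of the learners."""
    # Count each arc at most once per learner.
    votes = Counter(arc for arcs in edge_sets for arc in set(arcs))
    threshold = len(edge_sets) // 2 + 1  # e.g. 3 votes out of 5 learners
    return {arc for arc, n in votes.items() if n >= threshold}


def is_acyclic(arcs):
    """Kahn's algorithm: True if the directed graph formed by `arcs` has no cycle."""
    nodes = {node for arc in arcs for node in arc}
    indegree = {node: 0 for node in nodes}
    children = {node: [] for node in nodes}
    for parent, child in arcs:
        children[parent].append(child)
        indegree[child] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    # Every node is removed only if no cycle blocks the process.
    return visited == len(nodes)


# Hypothetical outputs of the five learners (invented, not the paper's results).
learned = {
    "Bayesian": {("Variables", "Loops"), ("Loops", "Functions")},
    "K2": {("Variables", "Loops"), ("Loops", "Functions"), ("Variables", "Functions")},
    "K3": {("Variables", "Loops")},
    "PC": {("Variables", "Loops"), ("Loops", "Functions")},
    "TPDA": {("Loops", "Functions"), ("Functions", "Variables")},
}
combined = majority_vote(list(learned.values()))
print(sorted(combined), is_acyclic(combined))
# [('Loops', 'Functions'), ('Variables', 'Loops')] True
```

With these toy inputs, the two arcs proposed by four learners survive and the arcs backed by a single learner are discarded. A per-arc vote is only one possible reading of the multi-expert scheme; voting on whole structures or on undirected adjacencies would be equally plausible designs.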

5. Conclusions

In this paper, we have described a method for learning ontologies automatically. Our approach is based on Bayesian networks: thanks to their characteristics, these networks can model and evaluate the conditional dependencies among the nodes of an ontology on the basis of data obtained from student tests. An experimental evaluation of the proposed method was performed on real student data and showed that the relationships inferred by the system are very similar to those defined by a human expert, confirming the effectiveness of the proposed method.

6. References

[1] Gruber T. R., A Translation Approach to Portable Ontology Specifications, Knowledge Acquisition, 5(2): 199-220, 1993.
[2] Studer R., Benjamins V. R., Fensel D., Knowledge Engineering: Principles and Methods, DKE 25(1-2), 1998.
[3] Uschold M., Gruninger M., Ontologies: Principles, Methods and Applications, Knowledge Engineering Review.
[4] Heckerman D., A Tutorial on Learning with Bayesian Networks, in M. I. Jordan (Ed.), Learning in Graphical Models, Adaptive Computation and Machine Learning, The MIT Press, Cambridge, Massachusetts, 1999.
[5] Heckerman D. et al., Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, Machine Learning, 20(3): 197-243, 1995.
[6] Cooper G. F., Herskovits E., A Bayesian Method for the Induction of Probabilistic Networks from Data, Machine Learning, 9: 309-347, 1992.
[7] Bouckaert R., Probabilistic Network Construction Using the Minimum Description Length Principle, Lecture Notes in Computer Science, Vol. 747, 1993.
[8] Spirtes P. et al., Causation, Prediction, and Search, MIT Press, 2001.
[9] Cheng J., Bell D., Liu W., Learning Belief Networks from Data: An Information Theory Based Approach, Proceedings of the Sixth ACM International Conference on Information and Knowledge Management, 1997.

Figure 1: Algorithm schematization
