Reasoning with Large Data Sets

Darko Anicic
Digital Enterprise Research Institute (DERI), University of Innsbruck, Austria
[email protected]

Abstract. Efficient reasoning is a critical factor for successful Semantic Web applications, which may need to process vast volumes of data in a short time. We develop novel reasoning techniques that extend current reasoning methods as well as existing database technologies in order to enable large-scale reasoning. We propose advances and key design principles primarily in two areas: the construction of efficient query execution plans, and memory, storage and recovery management. Our work is being implemented in the Integrated Rule Inference System (IRIS), a reasoner for the Web Service Modeling Language.

1 Problem Statement

The Web Service Modeling Language (WSML)¹ is a language framework for describing various aspects of Semantic Web (SW) services. We are developing IRIS² to serve as a WSML reasoner that handles large workloads efficiently. Current inference systems rely on reasoning methods developed for small knowledge bases [2]. Although these systems³ utilize mature and efficient relational database management systems (RDBMSs) and exploit a number of their evaluation strategies (e.g., query planning, caching and buffering), they cannot meet the requirements of reasoning in complex SW applications. The reason is that database techniques were developed for explicitly represented data and need to be extended to deal with implicit knowledge. In this work we investigate a framework that generalizes relational databases by adding deductive capabilities to them.

RDBMSs suffer from some limitations w.r.t. the expressivity of their language. The lack of full support for recursive views is one of them [3]. Furthermore, negation as failure is recognized as a very important nonmonotonic property for the Semantic Web. Although RDBMSs deal with negation as failure, they cannot select a minimal fixpoint that reflects the intended meaning in situations where the minimal fixpoint may not be unique. Our framework, although it exceeds the capabilities of RDBMSs, does not compromise their performance.

Current reasoners cannot cope with large data sets (i.e., relations larger than the system's main memory). Hence a reasoner needs to deal effectively with portions of relations (possibly distributed over many machines), and sophisticated strategies for partition-level relation management are required. Consequently, a relevant topic for our present and future work is the development of effective optimization algorithms as well as distribution and memory management strategies for reasoning with large data sets.

¹ WSML: http://www.wsmo.org/TR/d16/d16.1/v0.2/.
² IRIS: http://sourceforge.net/projects/iris-reasoner/.
³ Reasoners which utilize persistent storage: KAON2, Aditi, InstanceStore, DLDB.
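To make the notion of a recursive view from the problem statement concrete, the following minimal Python sketch computes the least fixpoint of a recursive rule pair bottom-up. It uses semi-naive evaluation, a standard deductive database technique that is chosen here purely for illustration; the paper does not state which evaluation strategy IRIS uses. The predicate names `edge` and `path` and the function `transitive_closure` are hypothetical.

```python
def transitive_closure(edges):
    """Semi-naive bottom-up evaluation of the (hypothetical) recursive rules
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    Returns the least fixpoint as a set of (X, Z) pairs.
    """
    # Index the extensional relation edge by its first argument.
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)

    path = set(edges)   # all facts derived so far
    delta = set(edges)  # facts derived in the previous iteration only
    while delta:
        new_facts = set()
        # Join only the newly derived facts with edge, not the whole relation.
        for x, y in delta:
            for z in successors.get(y, ()):
                if (x, z) not in path:
                    new_facts.add((x, z))
        path |= new_facts
        delta = new_facts
    return path


edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
```

The loop stops when an iteration derives no new tuples, which is the fixpoint condition. Each iteration joins only the previous iteration's new tuples with `edge`, so earlier derivations are not repeated.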

2 Efficient Large Scale Reasoning: an Approach

We now give a short overview of our approach to achieving effective reasoning with large data sets. Unlike other inference systems⁴, which use SQL to access extensional relations, we tightly integrate IRIS with its storage layer: rules are translated directly into relational algebra expressions, and SQL is avoided as an unnecessary overhead. We extend the query optimizer of the embedded RDBMS, which is designed for extensional data, to handle derived relations. The estimation of the size and evaluation cost of intensional predicates will be based on the adaptive sampling method [4, 1], while the size of extensional data will be estimated using graph-based synopses of data sets, similarly to [5].

Furthermore, when reasoning with large relations, run-time memory overflow may occur. Therefore, in IRIS we are developing novel techniques for selectively pushing currently processed tuples to disk. These techniques will be further extended to data distributed over many disks (e.g., a cluster of machines), and aim to enable IRIS to handle effectively large workloads that cannot fit in the main memory of the system. Our framework also comprises a recovery manager and thus features a fault-tolerant architecture. Using logging and replication we ensure that, when a crash occurs, the system can continue with an ongoing operation without loss of previously computed results.
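The size estimation step mentioned above can be sketched as follows. This is a simplified Python illustration in the spirit of adaptive sampling [4], applied to the size of a join between two relations. It is not the IRIS implementation: the function name `estimate_join_size`, the parameters `threshold` and `max_samples`, and the in-memory dictionary that stands in for an index lookup are all assumptions made for the example.

```python
import random


def estimate_join_size(r, s, key_r, key_s, threshold, max_samples, rng):
    """Estimate |R join S| by sampling tuples of R with replacement.

    Each tuple of R defines one partition of the join result: the tuples
    of S that join with it. Sampling continues until the accumulated
    partition sizes reach `threshold` or `max_samples` draws were made
    (a simplified, hypothetical form of the adaptive stopping rule).
    """
    if not r or max_samples < 1:
        return 0.0

    # Stand-in for an index on S: number of S tuples per join key.
    matches = {}
    for t in s:
        matches[key_s(t)] = matches.get(key_s(t), 0) + 1

    total = 0    # accumulated size of the sampled partitions
    samples = 0  # number of draws so far
    while samples < max_samples and (samples == 0 or total < threshold):
        t = rng.choice(r)
        total += matches.get(key_r(t), 0)
        samples += 1

    # Scale the average sampled partition size to all |R| partitions.
    return len(r) * total / samples


# Hypothetical data: every tuple of r joins with exactly two tuples of s.
r = [(i,) for i in range(100)]
s = [(i, j) for i in range(100) for j in range(2)]
estimate = estimate_join_size(
    r, s,
    key_r=lambda t: t[0],
    key_s=lambda t: t[0],
    threshold=50,
    max_samples=1000,
    rng=random.Random(0),
)
```

The number of samples is not fixed in advance. Sampling stops once enough result tuples have been observed, so queries with large results need few samples, while `max_samples` bounds the cost for queries with small or empty results.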

3 Acknowledgment

I am grateful to Michael Kifer and to my supervisors, Stijn Heymans and Dieter Fensel, for their help in conceptualizing this work and for insightful discussions.

References

1. M. E. Vidal, E. Ruckhaus, and E. Ruiz. Query evaluation and optimization in the semantic web. In ALPSWS2006 Workshop, Washington, USA.
2. Dieter Fensel and Frank van Harmelen. Unifying reasoning and search to web scale. IEEE Internet Computing, page 3, 2 2007.
3. Michael Kifer, Arthur Bernstein, and Philip M. Lewis. Database Systems: An Application Oriented Approach. Addison-Wesley, Boston, MA, USA, 2005.
4. R. J. Lipton and J. F. Naughton. Query size estimation by adaptive sampling. In PODS '90, NY, USA.
5. J. Spiegel and N. Polyzotis. Graph-based synopses for relational selectivity estimation. In SIGMOD '06, NY, USA.

⁴ KAON2, QUONTO, InstanceStore and DLDB exploit SQL for querying.
