Efficient Query Processing for Streamed XML Fragments

Huan Huo, Guoren Wang, Xiaoyun Hui, Rui Zhou, Bo Ning, and Chuan Xiao

Institute of Computer System, Northeastern University, Shenyang, China
[email protected]

Abstract. Unlike queries in traditional databases, queries on XML streams are bounded not only by memory but also by real-time processing constraints. The recently proposed Hole-Filler model is promising for information transmission and publication, because it slices XML data into fragments that are cheap to transmit and easy to synchronize. However, XPath queries are evaluated against the elements of the streamed XML data, not against the XML fragments, and the operation dependence caused by fragmentation reduces processing efficiency. By taking advantage of schema information for XML, this paper proposes the tid tree model, which optimizes queries over XML fragments by removing “redundant” operations. It then proposes XFPro, a processor for XPath queries on XML fragments that is efficient in both processing time and memory. Our performance study shows that XFPro performs well on both execution time and memory metrics.

1 Introduction

XML [1] is emerging as a de facto standard for information representation and data exchange over the web. As semi-structured data, XML can be represented by a tree-structured model with data contents and the structural relationships among them. Evaluating XML queries, such as XPath [2] and XQuery [3], has thus been widely studied in database management systems. Figure 1 gives an XML document and its DOM tree, which serves as the running example of our work.

However, being inherently hierarchical, stored XML data imposes a heavy runtime overhead, which makes it unsuitable for stream processing. In the stream model, data arrives in continuous streams and has to be analyzed in real time in a single pass. Hence, queries on XML streams are bounded not only by memory but also by real-time processing constraints. Many applications, such as network intrusion detection, sensor network monitoring, business transactions and earth climate monitoring, involve the analysis of streaming data.

Recently, much research has focused on answering queries on streamed XML data, for example XFrag [4] and XStreamCast [5]. In the XFrag framework, large XML documents are fragmented into manageable chunks of information, and XQuery queries are processed on streamed XML fragments in a pipelined model, without having to wait for the entire XML document to be received and materialized.

[Figure 1: an XML document and its DOM tree. The root element "commodities" contains "vendor" elements (names "Wal-Mart", "Carrefour", ...), each with a "name" and an "items" child; each "item" has "name", "make", "model" and "price" children. The example item is PDA, HP, PalmPilot, 315.25, with "currency" USD.]

Fig. 1. An XML Document and its DOM Tree

In [5], a query algebra for XQuery that operates on fragmented XML stream data is presented. All these frameworks are built on the streamed XML fragment model. In order to correlate XML fragments, the Hole-Filler model is proposed in [6]. In this model, a hole represents a placeholder into which another rooted subtree (a fragment), called a filler, can be positioned to complete the tree. In this way, infinite XML streams become a sequence of XML fragments, and queries on parts of the XML data require less memory and processing time. Furthermore, changes to XML data incur less overhead, because only the fragments corresponding to the changes are sent instead of the entire document.

Unfortunately, processing XML fragments instead of the whole XML document is fraught with challenges. The processor has to maintain the context of the fragments in order to navigate from fragment to fragment, and to cache the fragments related to the query answer when necessary. Since not all fragments are available at the same time and the fragments may arrive in any order, reducing the processing cost is the key issue for queries on XML fragments. In XFrag, XML fragments are processed as and when they arrive, and only those messages that may affect the query results are kept in the association table. However, the XFrag pipeline is still space consuming in maintaining the links in the association tables, and time consuming in scheduling the operations for each fragment. Furthermore, since fragments are forwarded through the operators on the pipeline, XFrag has to check the fragments' head information at each operator, which reduces processing efficiency. Nor can it avoid “redundant” operations when dependence occurs between adjacent operators.

This paper presents a new framework and a set of techniques for processing XPath queries over streamed XML fragments. Compared with the existing work on supporting XPath/XQuery over streamed XML fragments, we make the following contributions:

(i) We present techniques for transforming an XPath expression into an optimized query plan. We model query expressions using the tid tree and apply a series of transformations, which enable further analysis and optimization of query operations. Furthermore, such transformations reduce the query workload by specifying query operations such as “//” and “*”.

(ii) Based on the tid tree, we present a pruning scheme that cuts off redundant operations after query rewriting. In this way, we save memory space and processing power.

(iii) Based on the optimized tid tree, we propose query plan transformation techniques, which map a tid tree directly into an XML fragment query processor, named XFPro, and generate an efficient query execution plan.

Note that we assume the query clients cannot reconstruct the entire XML data before processing the queries.

The rest of this paper is organized as follows. Section 2 presents the related work in the area of XML stream query processing. Section 3 introduces the Hole-Filler model as the basis for our XML fragments. Section 4 gives a detailed description of our XML fragment processing framework. Section 5 presents experimental results from our implementation and shows the processing efficiency of our framework. Our conclusions are contained in Section 6.

2 Related Work

Many recent projects relate to query processing on streamed XML, such as NiagaraCQ [7], XQRL [8] and FluXQuery [9]. The BEA/XQRL processor [8] supports pipelined processing of streams by implementing the iterator model at the expression level. However, query optimizations specially designed for XML streams are limited in this system, and large documents cannot be processed. Transducer networks [10] have also been used to handle a subset of XQuery for streaming XML data. In FluXQuery [9], XQuery is translated into an event-based intermediate representation (IR), and the buffer size is optimized by analyzing the DTD as well as the query syntax.

Instead of evaluating an infinite XML stream token by token, several recent efforts have focused on continuous processing of fragmented XML. The hole-filler model was first proposed in [11]. However, it is used there in the context of pull-based content navigation over mediated views of XML data from disparate data sources. In Xstream [12], the advantages of a semantics-based fragmentation of XML data for efficient transmission over a wireless medium are highlighted. An alternative fragmented XML processing model, suitable for pull-based web-service applications, is presented in Active XML [13]. In XStreamCast [5], XML fragments are broadcast to clients in a push-based streaming model, and continuous queries are processed in a historical timeline.

In comparison, we present systematic and powerful techniques for optimizing and transforming queries that are not specifically written for fragment processing. As stated earlier, our additional contribution is specifying query expressions and pruning the “redundant” operations in them.

3 Model for Streamed Fragmented XML Data

In our approach, we adopt the hole-filler model [6] to describe XML fragments, which hold both the data contents and structural relationships. In order to simplify representation for further processing, a coding scheme is proposed to compress such information.

3.1 Preliminary Hole-Filler Model

We assume that a single document D is a node-labelled acyclic tree with an infinite set V of nodes and a finite set E of edges. An XML stream begins with finite XML documents and continues as new elements are added to the document or existing elements are updated. The following definitions introduce some fundamental notions used in the rest of the paper.

Definition 1. An XML document D is a tree T_d = (V_d, E_d, Σ_d, root_d, oid), where V_d is an infinite set of nodes, including element nodes, attribute nodes and text nodes; E_d is a finite set of directed edges, indicating the parent-child relationship between element nodes or the containment relationship between element nodes and attribute nodes; each node has a type and is identified by oid; Σ_d is the set of node types; root_d (∈ V_d) is the root element of D.

Definition 2. A filler F is a subtree of an XML document, T_f = (V_f, E_f, Σ_f, root_f, fid, head_f, H_d), where V_f is a subset of V_d, E_f is a subset of E_d and Σ_f is a subset of Σ_d; each filler is identified by fid, which is included in head_f; H_d is a finite set of holes; root_f (∈ V_f) is the root element of the subtree.

Definition 3. A hole H is an empty node n (n ∈ H_d) assigned a unique hid, into which the filler whose fid equals that hid can be positioned to complete the tree.

Given an XML document tree, we can fragment it by recursively inserting a hole at every point where a subtree is pruned (i.e. where a filler is generated) and associating the hole with an ID (the fid of the filler fragment). Note that a filler can in turn contain holes, which will be filled by other fillers. We can reconstruct the original XML document at the destination by substituting the holes with the corresponding fillers, as they were at the source. However, reconstructing the entire XML tree is not a good idea, since the query would have to wait for the end of the stream before processing could begin, which is not feasible for infinite streams of XML fragments. As will be discussed in the next section, our approach is to process XML fragments as and when they become available in the stream.

Definition 4. A tag structure is a fragment of the XML document with the highest priority, TS = (V_t, E_t, root_t, ID_t, Did), where V_t is an infinite set of tag nodes in the XML document; E_t is a finite set of edges; ID_t is a set of numbers identifying the tag nodes in the XML document; Did is the XML document identifier.

The tag structure is a structural summary for XML fragments. It provides structural information for the XML data and captures all the valid paths [6]. In the hole-filler model, the tag structure not only provides the relationships between element nodes, but also carries the fragmentation information of the XML data. It can be generated from an XML Schema or DTD, and can also be obtained while fragmenting an XML document that has no DTD.
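To make Definitions 2 and 3 concrete, the following is a minimal Python sketch of fillers and holes, assuming a simplified in-memory representation. The class names `Hole`, `Element` and `Filler`, the function `reconstruct`, and the filler ids 0, 1 and 2 are illustrative assumptions and do not come from [6]. The sketch only illustrates how a hole id is matched with a filler id; as noted above, a streaming processor would not wait to rebuild the whole document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Hole:
    hid: int  # id of the filler that belongs at this position


@dataclass
class Element:
    tag: str
    children: List[Union["Element", Hole, str]] = field(default_factory=list)


@dataclass
class Filler:
    fid: int       # filler id; the root filler uses fid 0
    root: Element  # root element of the fragment's subtree


def reconstruct(fillers: Dict[int, Filler], fid: int = 0) -> Element:
    """Substitute every hole whose filler has arrived, starting at filler `fid`."""

    def fill(node):
        if isinstance(node, Hole):
            filler = fillers.get(node.hid)
            # A hole stays in place while its filler has not arrived yet.
            return fill(filler.root) if filler is not None else node
        if isinstance(node, Element):
            return Element(node.tag, [fill(child) for child in node.children])
        return node  # text content

    return fill(fillers[fid].root)


# Fillers may arrive in any order; here the item arrives before its vendor.
arrived = [
    Filler(2, Element("item", [Element("name", ["PDA"]), Element("price", ["315.25"])])),
    Filler(0, Element("commodities", [Hole(1)])),
    Filler(1, Element("vendor", [Element("name", ["Wal-Mart"]), Element("items", [Hole(2)])])),
]
doc = reconstruct({f.fid: f for f in arrived})
```

If a filler is missing, its hole is simply left in the partial tree, which mirrors the situation a client faces while the stream is still running.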

3.2 Encoding Scheme

The DTD and tag structure of the XML document (given in Figure 1) in Section 1 are depicted in Figure 2.

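[Figure 2: the tag structure drawn as a tree. Recoverable labels: 1 commodities, 2 vendor (occurrence *), 3 name, 4 items, 5 item (occurrence +), followed by nodes 6 to 9 labelled name, make, model and price.]

Fig. 2. Tag Structure of Hole-Filler Model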

Here, we encode the tag attributes “ID” and “Filler” together as a tag code. The code consists of the “ID” value, a point, and a final digit for “Filler”: “1” if Filler = true, and “0” otherwise. The tag code for the tag “vendor” in the previous example is 2.1, while the tag code for the tag “items” is 4.0. In this way, we can obtain the fragmentation information by checking the end of the tag code.

Figure 3 gives two fragments of the XML document in Figure 1. Here, we number the root filler (i.e. the root of the fragmented document) with fid 0. The other filler IDs can be generated by a pre-order traversal of the XML document tree at the server site. The attribute tsid [4] (i.e. tag structure id) indicates the ID of the fragment's root element in the XML document DTD. We associate fillers with holes by matching filler IDs with hole IDs. The fid of Fragment 2 corresponds to an hid in Fragment 1, which means that Fragment 2 fills the corresponding hole in Fragment 1 as a subtree when the XML document is reconstructed.

The contents of Fragment 1 are relatively stable compared with those of Fragment 2, i.e. texts (or elements) in Fragment 2 (such as “price”) may be updated more frequently. We can save transmission cost by sending Fragment 2 rather than the whole XML document. Furthermore, we can cut “price” out as a separate fragment to save update transmission cost. This leads to a higher cost for querying item/price, because the two elements are then in different fragments. There is thus a trade-off between transmission cost and query cost. In this paper, we assume that XML documents have already been fragmented. We focus on query execution over streaming XML fragments at the client site. The fragmenting algorithm is described in [14] and omitted here.
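The encoding rule can be written down in a few lines. The following Python sketch assumes that a tag code is handled as a string; the function names `tag_code` and `parse_tag_code` are hypothetical and only illustrate the rule stated above.

```python
from typing import Tuple


def tag_code(tag_id: int, is_filler: bool) -> str:
    """Join the tag "ID" and the "Filler" flag with a point, e.g. 2 and true give "2.1"."""
    return f"{tag_id}.{1 if is_filler else 0}"


def parse_tag_code(code: str) -> Tuple[int, bool]:
    """Split a tag code back into its tag id and its filler flag."""
    tag_id, flag = code.split(".")
    return int(tag_id), flag == "1"


vendor_code = tag_code(2, True)   # "vendor" is the root of a filler
items_code = tag_code(4, False)   # "items" lies inside a filler
```

Checking whether a tag starts a new fragment then reduces to looking at the last character of its code.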

Fragment 1: Wal-Mart ...

Fragment 2: PDA HP PalmPilot 315.25

Fig. 3. XML Document Fragments

4 XFPro Query Handling

Based on the hole-filler model, infinite XML streams become a sequence of XML fragments, which are the basic processing units of a query. However, input queries are expressed over the elements of the XML document, not over the XML fragments. Since fragments with the same tag code share the same structure, we can skip evaluating the structural relationships inside the fragments and reduce processing time by rewriting the queries for XML fragments. This section focuses on the analysis and optimization we perform for queries on fragments. Our goal is to rewrite the query correctly so that it can be processed directly on fragments, and to prune off the redundant path evaluations. We begin with an overview of our framework.

4.1 Overview

In this paper, we consider the class of XPath queries that are formed using only the child, attribute and descendant axes, referred to as forward XPath. The following query, referred to as Query 1, is an example on the XML document described in Section 1.

Query 1: /commodities/vendor/items/item[name="PDA"]/price

The analysis we perform in this paper is based on the following key observations on queries over streamed XML fragments. In a path expression consisting of a predecessor node and a successor node, operation dependence (see Definition 6) occurs if one of the following conditions holds:

– The query results for the predecessor node and the successor node are in the same fragment, or
– Any fragment matching the predecessor node also matches the successor node.

The first condition is straightforward. Let us consider the second condition. When the query nodes involve predicates, the result set of the successor query must be a subset of that of the predecessor query. When the query nodes have no predicates, the first condition holds, which means that the query result depends only on the predecessor node. Queries that satisfy this property are said to exhibit subsumption dependence [15], and in most cases they can be made subsumption-free by removing the successor nodes.

Take Query 1 as an example. According to the fragmentation information indicated in the tag structure, “commodities”, “vendor” and “item” are root nodes of fillers, while “items”, “price” and “name” are not. Suppose that a “vendor” fragment with tsid 2, filler id 7 and hole ids from 10 to 100 arrives and is evaluated against the path expression starting from the query node “commodities”. Since the query nodes “items” and “vendor” belong to a common fragment according to the fragmentation information in the tag structure, fragments that match “vendor” obviously also match “items” (without considering predicates). Such fragments need not be evaluated for the structural relationship between “vendor” and “items”. Much of our analysis is based on such query operation dependencies.

Figure 4 shows the key phases of our XFPro system. First, we construct the tid tree from the query expression and the tag structure. Then, according to the tag structure, we apply a series of policies to prune and optimize the tid tree. Such techniques not only rewrite some queries to avoid redundant operations but, most importantly, also save memory space and processing power. After optimization, we transform the tid tree into a query processor, and an efficient query execution plan is generated.

[Figure 4: the XFPro framework. Recoverable labels: Query Transformer (Path Expressions, Tid Tree, Prune Policy, Optimized Tid Tree), Query Plan Generator (Query Plan), XFPro Query Engine.]

Fig. 4. Overview of the Framework

4.2 Tid Tree

We introduce the tid tree to represent the query expression and to enable further analysis and optimization of query operations.

Definition 5. Let N be the set of query nodes in a query Q. A tid tree is a tree TT = (T_t, E_t, root_t, P_t, O_t), where T_t is the set of tag codes corresponding to the nodes in N; E_t is a set of edges describing the structural relationship between two nodes; P_t is a text set of the predicate values; O_t is an operator set including boolean connectors; root_t (∈ T_t) is the root element of the tree.

We call the root of a filler a subroot node, and a node that is located in a filler but is not the root of its subtree a subelement node. By taking advantage of tag codes, we can easily tell subroot nodes from subelement nodes by checking the end of the code. A parent-child relationship between nodes is represented by a single arrow, while an ancestor-descendant relationship is represented by a double arrow. If the descendant node corresponds to multiple tag codes, we duplicate the descendant node and assign a different tag code to each copy (see Section 4.3 for details). The output of the query is depicted by an arrow. To distinguish a node that represents a tag code from a node that represents an atomic predicate, we draw tag nodes as circles and predicate values as rectangles. The operators (such as <, >, ≥, ≤, =) and boolean connectors are drawn as diamonds.

The tid tree for Query 1 described in the previous section is shown in Figure 5. 1.1 is the tag code of “commodities” and similarly, 2.1, 4.0, 5.1, 6.0 and 7.0 are the tag codes of “vendor”, “items”, “item”, “price” and “name”. Here, “=” and “PDA” are treated as an operator node and a predicate node respectively in the tid tree.

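[Figure 5: the tid tree of Query 1, with tag nodes 1.1, 2.1, 4.0, 5.1, 6.0 and 7.0, the operator node "=" and the predicate value "PDA".]

Fig. 5. Tid Tree of Query 1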
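As an illustration of Definition 5, the following Python sketch builds a tid tree for Query 1 from a list of already parsed path steps. The class `TidNode`, the functions `build_tid_tree` and `main_path`, and the step representation are assumptions made for this sketch; the tag codes are the ones listed in the text of this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TidNode:
    kind: str   # "tag" (circle), "operator" (diamond) or "value" (rectangle)
    label: str  # tag code, operator symbol or predicate value
    children: List["TidNode"] = field(default_factory=list)
    output: bool = False  # True for the node whose matches are returned


Step = Tuple[str, Optional[Tuple[str, str, str]]]  # (tag, (predicate tag, operator, value))


def build_tid_tree(steps: List[Step], codes: Dict[str, str]) -> TidNode:
    """Chain the steps of a child-axis path; a predicate becomes a side branch."""
    root = parent = None
    for tag, predicate in steps:
        node = TidNode("tag", codes[tag])
        if predicate is not None:
            pred_tag, op, value = predicate
            op_node = TidNode("operator", op, [TidNode("value", value)])
            node.children.append(TidNode("tag", codes[pred_tag], [op_node]))
        if parent is None:
            root = node
        else:
            parent.children.append(node)
        parent = node
    parent.output = True
    return root


def main_path(root: TidNode) -> List[str]:
    """Tag codes from the root to the output node; the path step is always the last child."""
    path, node = [], root
    while True:
        path.append(node.label)
        if node.output:
            return path
        node = node.children[-1]


# Tag codes as given in the text for Query 1.
codes = {"commodities": "1.1", "vendor": "2.1", "items": "4.0",
         "item": "5.1", "price": "6.0", "name": "7.0"}
query1 = [("commodities", None), ("vendor", None), ("items", None),
          ("item", ("name", "=", "PDA")), ("price", None)]
tree = build_tid_tree(query1, codes)
```

The predicate branch hangs off the “item” node next to the output step, so later passes can inspect tag codes and predicates without re-reading the XPath string.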

4.3 Optimizing Tid Tree

In XFrag [4], each query primitive corresponds to an XFrag operator, which processes a fragment only if the tsid of the fragment matches that of the operator. If they do not match, the fragment is simply passed on to the subsequent operator in the query tree. However, in the case of operator dependence (as illustrated in Section 4.1), the fragments that do not match the predecessor operator need not be evaluated against the successor.

Definition 6. Given any pair of nodes n1, n2 in the tid tree, if the query result of n2 is valid only if that of n1 is valid, n2 is considered dependent on n1. We use the directed edge e = (n1, n2) to denote the dependence between n1 and n2.

Definition 7. Given any pair of nodes n1, n2 in the tid tree, n2 is subsumption dependent on n1 if: (i) n2 is dependent on n1, and (ii) the query result of n2 is a subset of the query result of n1.

Subsumption-free queries are, intuitively, queries that do not contain “redundancies”. Some queries can be rewritten to be subsumption-free by eliminating redundant portions. Much of our analysis focuses on finding such dependencies on tsid nodes, in order to eliminate “redundant” query evaluations of structural relationships. In the pruning process, we use dashed arrows to represent subsumption dependencies, and solid arrows for subsumption-free dependencies.

Path Pattern Query

The path pattern query is the simplest type of query, and it is the basis of the tree pattern query. First, we assume that the query does not contain “//” or “*”. This class of query covers most of the structural relationship “redundancies”. For example, Query 2 is a simple path pattern query with only “/” involved.

Query 2: /commodities/vendor/items/item/name

The original query involves three fragments with tsid 1, tsid 2 and tsid 5, and the tid tree includes five steps with tsid 1, tsid 2, tsid 4, tsid 5 and tsid 6. However, fragments that do not match tsid 2 obviously do not match tsid 4, i.e. tsid 4 is subsumption dependent on tsid 2. We can rewrite the query to avoid such redundant operations by deleting the subelement nodes that have no predicates and are not leaf nodes in the tid tree. According to the tag codes, subroot nodes, whose codes end with “1”, are kept in the tid tree, while subelement nodes, whose codes end with “0”, are removed unless they have a predicate node among their children or are the leaf node of the tid tree. In Query 2, tsid 6 is a subelement node but is the leaf node, so it is kept. Figure 6 shows the optimized tid tree after pruning off the dependent node 4.0 (depicted by “X”).

[Figure 6: the tid tree of Query 2 with nodes 1.1, 2.1, 4.0, 5.1 and 6.0; the dependent node 4.0 is crossed out.]

Fig. 6. The Pruned Tid Tree after Removing Subsumption Dependence of Query 2
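The pruning rule for “/”-only path queries can be sketched as follows, assuming the tid tree is a simple chain given as a list of tag codes. The function `prune_path` and its `has_predicate` parameter are hypothetical names introduced for this illustration.

```python
from typing import FrozenSet, List


def prune_path(codes: List[str], has_predicate: FrozenSet[str] = frozenset()) -> List[str]:
    """Drop subelement steps (codes ending in ".0") that carry no predicate
    and are not the leaf step; keep every subroot step (codes ending in ".1")."""
    last = len(codes) - 1
    kept = []
    for i, code in enumerate(codes):
        is_subroot = code.endswith(".1")
        if is_subroot or i == last or code in has_predicate:
            kept.append(code)
    return kept


# Query 2: /commodities/vendor/items/item/name with tsid 1, 2, 4, 5 and 6.
query2 = ["1.1", "2.1", "4.0", "5.1", "6.0"]
pruned = prune_path(query2)
```

For Query 2 only the step 4.0 is removed, which agrees with the pruned tree described for Figure 6.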

However, pruning a path pattern query may lead to incorrect results when “//” and “*” are considered. This is because the ancestor node A before “//” and the descendant node D after “//” may belong to different fragments. Hence, a fragment that matches A may not match D. Similarly, the nodes matched by “*” may not lie in the same filler, so we cannot determine subsumption dependence directly. In such cases, we need to rewrite the tid tree into a form that excludes “//” and “*”. For “//”, we first capture all the paths from A to D by traversing the tag structure. Then we insert the tag codes of the corresponding subroot nodes of D into the tid tree and link them with A according to each path. In this way, “//” is replaced by “/”, and the query result is the union of the results of the output nodes in the tid tree. Now we can apply the pruning scheme for “/” to the rewritten tid tree. Figure 7 presents the tid tree of Query 3, which returns the “name” descendants of “vendor”.
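The first step of the “//” rewriting, collecting every path from A to D in the tag structure, can be sketched as a depth-first traversal. The dictionaries below encode the tag structure of Figure 2 under the assumption that its nodes are numbered 1 to 9 in the order commodities, vendor, name, items, item, name, make, model, price; the function name `expand_descendant` is hypothetical.

```python
from typing import Dict, List

# Assumed tag structure: parent tag id -> child tag ids, and tag id -> tag name.
CHILDREN: Dict[int, List[int]] = {1: [2], 2: [3, 4], 4: [5], 5: [6, 7, 8, 9]}
NAMES: Dict[int, str] = {1: "commodities", 2: "vendor", 3: "name", 4: "items",
                         5: "item", 6: "name", 7: "make", 8: "model", 9: "price"}


def expand_descendant(ancestor: int, descendant_name: str) -> List[List[int]]:
    """Return every tag-id path from `ancestor` down to a tag named
    `descendant_name`, so that A//D can be replaced by a set of "/" paths."""
    paths: List[List[int]] = []

    def walk(node: int, path: List[int]) -> None:
        for child in CHILDREN.get(node, []):
            new_path = path + [child]
            if NAMES[child] == descendant_name:
                paths.append(new_path)
            walk(child, new_path)

    walk(ancestor, [ancestor])
    return paths


# vendor//name expands to vendor/name and vendor/items/item/name.
name_paths = expand_descendant(2, "name")
```

Each returned path can then be turned into tag codes, linked under A in the tid tree, and pruned with the “/” rule; the query answer is the union over all expanded paths.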

4.4 Query Plan Generation

As described in the previous section, we rewrite the original query into a tid tree. However, the tid tree only represents a view of the relationships between tsid nodes and predicates, while the details of query processing are not modelled. This section focuses on the transformation from the tid tree to the corresponding query plan and gives a processing example of XFPro.

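[Figure 9: the tid tree of Query 1 (tag nodes 1.1, 2.1, 5.1, 6.0 and 7.0, operator "=", predicate value "PDA") next to the corresponding XFPro structure: a hash table with entries 1.1, 2.1, 5.1 and result, and buckets holding 6.0, 7.0, "=" and "PDA".]

Fig. 9. Transformation from Tid Tree to XFPro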

The transformation from the tid tree of Query 1 into the XFPro processor is depicted in Figure 9. Each subroot node in the tid tree corresponds to an entry of the hash table, which is tagged with one of the values true, false or undecided (⊥). Each subelement node is added to a bucket tagged with an odd value and linked to the entry of the corresponding subroot node, while each predicate node is added to a bucket tagged with an even value and linked to the corresponding entry. There is a result entry at the end of the hash table, which has a linked bucket to cache the candidate output. It holds the conjunction of the values of all entries and is set to true only if all the predecessor entries are set to true.

The XFPro processing for Query 1 is depicted in Figure 10. When the “commodities” fragment with tsid “1”, filler id “0” and hole ids “1, 21, 41” arrives, the query hash table sets entry 1 to T, and the fragment information is saved in the bucket linked to the entry. A fragment is tagged with the undecided value instead when its entry has a predicate whose condition has not yet been evaluated for this fragment. Note that, at this point, the “commodities” filler can be discarded, as it is no longer needed to produce the result and the hole-filler association has already been captured. This conserves memory on the fly. Similarly, when the “vendor” fragment with the corresponding tsid “2” arrives, entry 2 saves the information into its bucket and is set to T, as there is no condition on it. When the “item” fragment with tsid “5” and filler id “3” arrives, entry 5 is set to ⊥, since it has a predicate bucket. After it is determined that the information in filler “3” matches the predicate, the entry is set to T. The “item” fragment may also be discarded at this point to conserve memory, because the result value, which is a subset of the fragment, has already been captured in the linked bucket. Since all the entries in the hash table are now set to true, the value of price is output as the result.

The algorithms listed below describe the processing method.
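The three-valued entries and the result entry can be illustrated with a small Python sketch. It assumes a single chain of fragments, that ⊥ is represented by `None`, and that the result follows a three-valued conjunction (false if any entry is false, true only if all entries are true, undecided otherwise); the class `QueryTable` is a hypothetical name, and the buckets that hold fragment information are left out.

```python
from typing import Dict, Iterable, Optional

UNDECIDED = None  # stands for the undecided value ⊥


class QueryTable:
    """One entry per subroot node of the tid tree, each true, false or undecided."""

    def __init__(self, subroot_codes: Iterable[str]) -> None:
        self.entries: Dict[str, Optional[bool]] = {c: UNDECIDED for c in subroot_codes}

    def set(self, code: str, value: Optional[bool]) -> None:
        self.entries[code] = value

    def result(self) -> Optional[bool]:
        """Conjunction of all entries: true only if every entry is true."""
        values = list(self.entries.values())
        if any(v is False for v in values):
            return False
        if all(v is True for v in values):
            return True
        return UNDECIDED


# Replaying the Query 1 walkthrough for one commodities/vendor/item chain.
table = QueryTable(["1.1", "2.1", "5.1"])
table.set("1.1", True)        # "commodities" fragment arrives, no predicate
table.set("2.1", True)        # "vendor" fragment arrives, no predicate
table.set("5.1", UNDECIDED)   # "item" fragment arrives, predicate still pending
pending = table.result()
table.set("5.1", True)        # name = "PDA" holds for this item
final = table.result()
```

In this replay the result stays undecided until the predicate on “item” has been evaluated, at which point the cached price could be emitted.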

[Figure 10: four snapshots, (1) to (4), of the hash table and its buckets while Query 1 is processed. Recoverable labels: fragments tsid=1 fid=0 hid(1,21,41); tsid=2 fid=1 hid(2,...,20); tsid=5 fid=3; entries 2.1, 5.1 and result marked T; bucket contents 6.0, 7.0, "=", "PDA", name, and the output value 315.25.]

Fig. 10. XFPro Processing Example

Algorithm 1 FindQueryChild(element)
{Input an element node and trigger descendant operators}
IF (isHashTerminalNode(element)) THEN
    output element;
ELSE
    q <- HashBucketFirstnode(QueryNextnode(element));
    WHILE (q != null) DO
        IF (q.fid == element.fid) THEN
            {q lies in the same filler as element}
            q.val = element.val; FindQueryChild(q);
        ELSE
            {search the hole ids of element for the filler id of q}
            FOR (p = element.hid; p != null && p.hid != q.fid; p = p.next);
            IF (p != null) THEN
                q.val = element.val; FindQueryChild(q);
            END IF
        END IF
        q = q.next;
    END WHILE
END IF

Algorithms 1 and 2 change the corresponding values of the hash table in order to schedule triggering the descendant operator and inquiring the parent operator.

Algorithm 2 FindQueryParent(element)
{Input an element node and inquire parent operator}
IF (HashQueryFirstnode(element)) THEN
    element.val = TRUE;
ELSE
    element.val = UNDECIDED;   {default until a parent fragment is found}
    q <- HashBucketFirstnode(QueryPrenode(element));
    FOR ( ; q != null; q = q.next)
        IF (q.fid == element.fid) THEN
            element.val = q.val;
        ELSE
            FOR (p = q.hid; p != null; p = p.next)
                IF (p.hid == element.fid) THEN
                    element.val = q.val;
                END IF
            END FOR
        END IF
    END FOR
END IF
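The parent lookup can also be sketched in Python. This is a loose rendering under stated assumptions, not the authors' Java implementation: fragments are plain records, the bucket of the preceding query node is a list, undecided is `None`, and the names `Frag` and `inherit_from_parent` are hypothetical. The fragment ids are the ones used in the Query 1 walkthrough.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frag:
    tsid: int
    fid: int
    hids: Tuple[int, ...] = ()   # hole ids contained in this fragment
    val: Optional[bool] = None   # True, False or None (undecided)


def inherit_from_parent(element: Frag, parent_bucket: List[Frag],
                        is_first_step: bool = False) -> Optional[bool]:
    """Set element.val from the fragment that precedes it in the query path."""
    if is_first_step:
        element.val = True  # the first query node has no parent to wait for
        return element.val
    for parent in parent_bucket:
        # Same filler, or the parent holds a hole that this filler fills.
        if parent.fid == element.fid or element.fid in parent.hids:
            element.val = parent.val
            return element.val
    element.val = None  # parent has not arrived yet: undecided
    return element.val


commodities = Frag(tsid=1, fid=0, hids=(1, 21, 41))
inherit_from_parent(commodities, [], is_first_step=True)
vendor = Frag(tsid=2, fid=1, hids=tuple(range(2, 21)))
inherit_from_parent(vendor, [commodities])
item = Frag(tsid=5, fid=3)
inherit_from_parent(item, [vendor])
orphan = Frag(tsid=5, fid=30)      # belongs to a vendor that has not arrived
inherit_from_parent(orphan, [vendor])
```

Matching a filler id against the hole ids of cached parents is what lets a fragment be evaluated even when it arrives before or after its neighbours; an unmatched fragment simply stays undecided.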

5 Performance Evaluation

We have implemented the XFPro translator engine in Java. It rewrites XPath expressions into tid-tree based query plans for XML fragments. Our XFPro query engine on fragmented XML streams processes the optimized queries directly on the filler fragments, without first reconstructing the entire XML document. All experiments are run on a PC with a 2.6 GHz CPU, 512 MB of memory and an 80 GB hard disk. The operating system is Windows XP. The experiments are run on data sets generated by the xmlgen program. We have written an XML fragmenter that fragments an XML document into filler fragments to produce an XML stream, based on the tag structure defining the fragmentation layout. We have selected three representative queries (Q1, Q2 and Q3) on the generated XML documents and compared the results with the XFrag processor [4].

Query 1: doc("book.xml")/book/sections/section/subsection/title
Query 2: doc("book.xml")/book/section[difficulty>="default"]/title
Query 3: doc("book.xml")/book/title/section[difficulty>="default"]

To illustrate the differences between the query execution methods on the filler fragments, consider Query 1, which returns the subsection titles of the books. Since “section”, “subsection” and “title” are in common filler fragments according to the fragmentation information in the tag structure, our method evaluates “subsection” and “title” over a fragment only when the tsid of the fragment matches that of the operator. Furthermore, each fragment is evaluated only once and is hashed to the corresponding entry if its tsid matches. In XFrag, by contrast, each fragment needs to be passed through the pipeline and evaluated step by step. For this reason, our method performs better than XFrag.

The results of the experiments are summarized in Figure 11. From the experimental results, we observe that XFPro outperforms XFrag mainly in running time, while the difference in memory cost between the two methods is smaller. This is because both methods keep only the output-related information of the fragments, although the hash buckets use fewer links than the association table. As for query processing time, XFPro saves CPU time by avoiding subsumption operations. Furthermore, XFrag has to schedule the operations for each fragment, while XFPro only changes the corresponding value in the hash table.

Query  File size  Fragmented file size  Method  Run time (ms)  Memory (MB)
Q1     10 MB      11.04 MB              XFPro        518.27        0.36
Q1     10 MB      11.04 MB              XFrag       1875.00        0.62
Q1     15 MB      17.56 MB              XFPro       1377.05        0.81
Q1     15 MB      17.56 MB              XFrag       3926.50        1.35
Q1     20 MB      23.18 MB              XFPro       2121.59        1.18
Q1     20 MB      23.18 MB              XFrag       5245.56        1.83
Q2     10 MB      11.98 MB              XFPro       3015.92        1.87
Q2     10 MB      11.98 MB              XFrag       7329.70        2.13
Q2     15 MB      19.20 MB              XFPro       4585.60        5.39
Q2     15 MB      19.20 MB              XFrag      11444.55        6.95
Q2     20 MB      24.12 MB              XFPro       6727.93        6.78
Q2     20 MB      24.12 MB              XFrag      15259.40        9.83
Q3     10 MB      11.78 MB              XFPro       3005.86        2.08
Q3     10 MB      11.78 MB              XFrag       7239.07        2.03
Q3     15 MB      19.38 MB              XFPro       4550.15        5.01
Q3     15 MB      19.38 MB              XFrag      11429.71        6.64
Q3     20 MB      24.33 MB              XFPro       6674.87        6.73
Q3     20 MB      24.33 MB              XFrag      15154.78        8.86

Fig. 11. Experimental Results

6 Conclusions

This paper has presented a framework and a set of techniques for processing XPath queries over streamed XML fragments. We present techniques for transforming an XPath expression into an optimized query plan. Our tid tree query model helps to transform queries on element nodes into queries on XML fragments and to analyze the “redundant” operations in them. Furthermore, such transformations specify query operations such as “//” and “*” and reduce the query workload. Based on the optimized tid tree, we present a scheme that maps a tid tree directly into an XML fragment query processor, so that an efficient query execution plan is generated. Our experiments show that our framework saves both processing power and memory space.

Acknowledgments

This research was partially supported by the National Natural Science Foundation of China (Grant No. 60473074 and 60573089) and the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP).

References

1. W3C Recommendation: Extensible Markup Language (XML) 1.0 (Second Edition). (2000) http://www.w3.org/TR/REC-xml.
2. W3C Working Draft: XML Path Language (XPath), ver 2.0. (2001) Tech. Report WD-xpath20-20011220, W3C, 2001, http://www.w3.org/TR/WD-xpath20-20011220.
3. W3C Working Draft: XQuery 1.0: An XML Query Language. (2001) Technical Report WD-xquery-20010607, World Wide Web Consortium.
4. Bose, S., Fegaras, L.: XFrag: A query processing framework for fragmented XML data. In: Eighth International Workshop on the Web and Databases (WebDB 2005), Baltimore, Maryland (June 16–17, 2005)
5. Bose, S., Fegaras, L., Levine, D., Chaluvadi, V.: A query algebra for fragmented XML stream data. In: Proceedings of the 9th International Conference on Data Base Programming Languages (DBPL 2003), Potsdam, Germany (September 6–8, 2003)
6. Fegaras, L., Levine, D., Bose, S., Chaluvadi, V.: Query processing of streamed XML data. In: Eleventh International Conference on Information and Knowledge Management (CIKM 2002), McLean, Virginia, USA (November 4–9, 2002)
7. Chen, J., DeWitt, D.J., Tian, F., Wang, Y.: NiagaraCQ: A scalable continuous query system for internet databases. In Chen, W., Naughton, J.F., Bernstein, P.A., eds.: SIGMOD Conference, Dallas, Texas, USA, ACM (2000) 379–390
8. Florescu, D., Hillery, C., Kossmann, D., Lucas, P., Riccardi, F., Westmann, T., Carey, M.J., Sundararajan, A.: The BEA/XQRL streaming XQuery processor. In Freytag, J.C., Lockemann, P.C., Abiteboul, S., Carey, M.J., Selinger, P.G., Heuer, A., eds.: Proceedings of the 29th International Conference on Very Large Data Bases, Berlin, Germany (2003)
9. Koch, C., Scherzinger, S., Schweikardt, N., Stegmaier, B.: FluXQuery: An optimizing XQuery processor for streaming XML data. [16] 1309–1312
10. Ludäscher, B., Mukhopadhyay, P., Papakonstantinou, Y.: A transducer-based XML query processor. In Bernstein, P.A., Ioannidis, Y.E., Ramakrishnan, R., Papadias, D., eds.: Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong SAR, China (2002) 227–238
11. Ludäscher, B., Papakonstantinou, Y., Velikhov, P.: Navigation-driven evaluation of virtual mediated views. In: Proceedings of the 7th International Conference on Extending Data Base Technology (EDBT 2000), Konstanz, Germany (March 27–31, 2000) 150–165
12. Wang, E., et al.: Efficient management of XML contents over wireless environment by Xstream. In: ACM-SAC 2004. (March, 2004) 1122–1127
13. Abiteboul, S., Benjelloun, O., Cautis, B., Manolescu, I., Milo, T., Preda, N.: Lazy evaluation for active XML. In Weikum, G., König, A.C., Deßloch, S., eds.: SIGMOD Conference, Paris, France, ACM (2004) 227–238
14. Huo, H., Hui, X., Wang, G.: Document fragmentation for XML streams based on hole-filler model. In: 2005 China National Computer Conference, Wu Han, China (October 13–15, 2005)
15. Bar-Yossef, Z., Fontoura, M., Josifovski, V.: On the memory requirements of XPath evaluation over XML streams. [16]
16. Nascimento, M.A., Özsu, M.T., Kossmann, D., Miller, R.J., Blakeley, J.A., Schiefer, K.B., eds.: (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, Morgan Kaufmann (2004)
