
Information Sciences 248 (2013) 151–167

Contents lists available at SciVerse ScienceDirect

Information Sciences journal homepage: www.elsevier.com/locate/ins

REQUEST+: A framework for efficient processing of region-based queries in sensor networks

Dong-Wan Choi, Chin-Wan Chung (corresponding author)

Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 335 Gwahangno, Yuseong-gu, Daejeon 305-701, Republic of Korea

Article info

Article history: Received 17 June 2011; received in revised form 11 March 2013; accepted 16 June 2013; available online 24 June 2013.

Keywords: Sensor network; Spatial query; Group-by aggregation; Weighted set-cover problem

Abstract

In wireless sensor networks, individual sensing values are not reliable due to node failures. The effect of these failures can be reduced by using aggregated values for groups of sensor nodes instead of the individual sensing values. However, most existing works have focused on computing the aggregation of all the nodes without grouping. Only a few approaches deal with the processing of grouped aggregate queries. Moreover, since groups in those approaches are disjoint, some areas which are not covered by groups cannot be considered, even if the areas are relevant to the user's interest. In this paper, we propose a new type of queries, region-based queries, and a framework to process region-based queries, called REQUEST+. A region in REQUEST+ is defined as a maximal set of nodes located within a circle having a diameter specified in the query. To efficiently construct a large number of regions covering the entire monitoring area, we build the SEC (Smallest Enclosing Circle) index. Moreover, in order to process a region-based query, we adapt a clustering-based aggregation method, in which there is a leader node for each region. To minimize the communication cost, we formulate an optimal leader selection problem and prove that it is NP-hard. In addition, we transform the problem into the weighted set-cover problem to utilize the algorithm devised for that problem. Finally, we construct a query-initiated routing tree for the communication between the leader and non-leader nodes. In the experimental results, we show that the result of our region-based query is more reliable than that of the query which is based on individual nodes, and that our processing method is more energy-efficient than existing methods for processing grouped aggregate queries. © 2013 Elsevier Inc. All rights reserved.

Corresponding author. Tel.: +82 42 350 3537; fax: +82 42 350 3510. E-mail addresses: [email protected] (D.-W. Choi), [email protected] (C.-W. Chung). 0020-0255/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ins.2013.06.048

1. Introduction

Wireless sensor networks are broadly used in various environmental monitoring applications. With these applications, we can observe phenomena in the monitoring area and detect events corresponding to a given set of conditions. For example, if farmers could collect information about the distribution of nutrients in the soil or the locations colonized by many insects in real time, they could predict where and how much water, pesticide, and fertilizer are currently needed [1]. We can collect this information by gathering sensing values from the sensor nodes scattered in the monitoring area. However, each sensing value can be noisy, as sensor nodes are inherently prone to failure. Moreover, managing a large number of individual sensor nodes is ineffective when only a macro view of the monitoring area is required. To overcome these problems, we can construct groups of sensor nodes and use an aggregated value of each group.

Existing works on grouping nodes [12,31] in sensor networks address the problem by partitioning or clustering nodes with appropriate criteria such as the geographic location. In these works, there can be missing areas since they do not allow groups to overlap. For example, regions that are located in the middle of two clusters cannot be found among the retrieved groups.

It is natural to group sensor nodes with regions of the same size. This is because, in sensor network applications, our interest is not a node itself, but a region covered by the node. Also, each node has the same sensing coverage. From this perspective, we can claim that every node itself is also a group that covers a tiny equal-sized region and includes only one node. If a larger region size is required, groups including one or more nodes can be constructed.

Considering the above grouping method, we propose REQUEST+, which is a framework for region-based query processing in sensor networks. In region-based queries, the primitive processing unit is a region instead of a node. In addition, regions can overlap to cover every possible area where sensor nodes are deployed. Fig. 1 shows example queries and results that find nodes or regions with certain temperature and humidity. Note the difference between our proposed region-based query (Fig. 1b and c) and the query that is based on individual nodes (Fig. 1a).

Processing region-based queries poses several challenging problems. First, since regions overlap and the number of regions is fairly large, it is not trivial to efficiently construct regions with the size specified in the query. A naive approach is to move a circle representing a region by a certain step size. However, this approach is too inefficient, and it is not easy to find an appropriate step size. To solve this problem, we create the SEC (Smallest Enclosing Circle) index structure in the preprocessing phase, and construct regions by using the SEC index.

Second, the communication cost of forwarding sensing values and aggregated values can be considerably high due to the large number of regions. Especially when the region size is large, the TAG-based aggregation method [18] is not appropriate since a single node can belong to many groups. In order to process region-based queries energy-efficiently, we use a clustering-based aggregation method [11] as a basic processing scheme. In the clustering-based aggregation method, we have a leader node and several non-leader nodes in each group, and the aggregation of each group is computed locally in the group. Since there are numerous regions in our environment, it is more beneficial to calculate an aggregated value for each region as early as possible. By doing so, we can reduce the size and the total number of messages to send to the base station. Moreover, to minimize the communication cost while gathering aggregated values, we need an algorithm that efficiently selects leader nodes that are optimal in terms of energy consumption. To determine optimal leader nodes, we consider criteria such as the hop counts between nodes, the size of messages, the selectivity of retrieved regions, and the number of regions covered by a node. Based on these criteria, we formulate a leader selection problem, and prove the problem to be NP-hard. Also, we design an algorithm that transforms the problem into the weighted set-cover problem.

Fig. 1. Example queries and results.


Finally, we need a topology for communication between leader nodes and non-leader nodes. A TAG-based global routing tree [18] is not a proper topology for the intra-region communication, since it is constructed in order for the base station to collect the data of the entire network. The intra-region communication instead requires a tree through which the leader node collects the data of the non-leader nodes inside the region. Therefore, for each leader node, we build a new routing tree whose root is the leader node, called the query-initiated routing tree.

Our contributions in this paper are as follows:

We propose a new type of queries in sensor networks, called region-based queries, which use a region as a primitive data unit. By adjusting the region size in the query, we can collect the data at various levels of detail. Moreover, since regions overlap in REQUEST+, we can avoid missing important regions. To the best of our knowledge, the region-based query is the first type of grouped aggregate query in which groups can overlap.

We propose algorithms to efficiently process region-based queries. To efficiently construct a large number of overlapping regions, we propose the Smallest Enclosing Circle index. In addition, to deal with large-sized regions as well as small-sized regions, we allow a leader node of a region to be placed outside the region. To select optimal leader nodes, we formulate the leader selection problem, and prove the problem to be NP-hard. Also, we transform the problem into the weighted set-cover problem to use an algorithm for the weighted set-cover problem. To reduce the total number of sets, we devise a pruning method that utilizes the concept of circular convex set defined in [14].

Through extensive experiments, we show that by using region-based queries, higher reliability can be achieved in the result, especially when node failures occur frequently. Also, we show that the proposed query processing methods are more efficient than existing methods in terms of energy consumption.

The rest of the paper is organized as follows. In Section 2, we discuss the related work. In Section 3, we propose REQUEST+, a framework for processing region-based queries, and explain our processing methods. In Section 4, we show experimental results, and we conclude our work in Section 5.

2. Related work

Our region-based queries are similar to the spatial queries in sensor networks, which have been actively studied [7,8,10,27]. For instance, in [27], Soheili et al. propose a distributed spatial index, called SPIX, over the sensor nodes to process spatial queries in a distributed manner. However, most works on spatial queries in sensor networks have focused on using predefined regions. For example, in [27], spatial queries are used to answer questions such as "what is the average temperature in room-1?". In contrast, our region-based queries ask questions such as "which regions with a diameter of 10 m have an average temperature lower than a certain threshold?". Thus, in REQUEST+, regions are not predefined before the query is posed, but redefined whenever the region size specified in the query is changed.

Our query processing scheme is also related to aggregate queries in sensor networks. Although many works have been proposed to process aggregate queries [9,18,17,11,21,19,22,13], only a few works deal with grouped aggregate queries. In TAG [18], Madden et al. propose a grouped in-network aggregation method. In this method, while climbing up the global routing tree from the leaf nodes to the base station, partial aggregated values for each group are computed and forwarded at each node. However, this method has a problem if there are many groups to be maintained at each node. There have been other works [24,28], based on TAG, that improve the grouped in-network aggregation method. These works focus on modifying the routing protocol in order to reduce the size of messages. In [24], Sharaf et al. propose group-aware network configuration algorithms. The key idea of these algorithms is selecting a node in the same group as a parent. By doing so, the number of partial aggregations that should be maintained at each node can be reduced.
In [28], a multipath routing protocol is proposed in order for each node to send its data to different parents. All these grouped in-network aggregation methods based on TAG consider only disjoint groups. However, in REQUEST+, a large number of overlapping regions (groups) can be generated. The methods based on TAG are not directly applicable to our environment, since a node can belong to several groups simultaneously. Moreover, if larger-sized regions are required in the query, the number of groups to which a node belongs will increase.

Zhuang et al. propose the max regional aggregate query [32], which is the query most similar and related to the proposed region-based query. The max regional aggregate query finds a region with the maximum aggregated value. To do that, they propose a sampling-based approach in which regions and nodes are sampled within a certain accuracy. However, they only consider regions at which individual nodes are centered. Therefore, some regions relevant to the user's interest can be missing. In addition, they only focus on reducing the data to process but do not deal with an efficient collection of the data.

Lee et al. [16] present a framework to efficiently process group-by aggregate queries in sensor networks. They propose a compression scheme that uses the Haar wavelet in order to reduce the size of messages. However, they assume that groups and leader nodes are pre-determined, and only consider pre-clustered groups which are disjoint from each other.

In the preliminary work [6] of this paper, Choi et al. propose REQUEST, region-based query processing in sensor networks. While they propose an efficient solution to answer region-based queries, the solution only deals with small-sized regions. In REQUEST, the region size should not be larger than the communication range, and this constraint is a significant limitation. To overcome this limitation, in REQUEST+, we extend the preliminary solution to a more general solution that considers large-sized regions as well as small-sized regions by modifying the preliminary algorithm that transforms the leader selection problem into the weighted set-cover problem.

3. Framework to process region-based queries

In this section, we propose REQUEST+, a framework for region-based query processing in sensor networks. The goal of REQUEST+ is to gather regional aggregated values energy-efficiently so that we can find regions of interest satisfying several conditions. The specification of the region-based query is as follows:

    select          {region, aggfunc(attributes)}
    from            sensors
    group by        region (region size D)
    having          {having predicates}
    sampling rate   {time of sampling interval}
    duration        {maximum of sampling time}

We regard a region as a circle located anywhere in the sensor network monitoring area. In fact, as shown in Fig. 2a, there are infinitely many regions in the monitoring area, even if every region has the same size. However, the number of regions can be limited since the number of sensor nodes is finite. In order to limit the number of regions reasonably, we formally define a region as follows:

Definition 1 (Region). Let $D$ be the region size (diameter) specified in the query. A region $r$ is a maximal set of sensor nodes which are located within a circle having a diameter $D$. Since $r$ is maximal, it is not contained in any other region.

Fig. 2b shows three different regions with a size $D$ in REQUEST+. By Definition 1, $r_2$ cannot be a region since it is included in region $r_1$. The set of sensor nodes in $r_2$ can be a region when users request a smaller $D$.

Unlike in existing grouped aggregate methods, regions can overlap in our approach. Through this, we can cover all areas which contain the sensor nodes deployed in the monitoring area. Therefore, a node can belong to several different regions.

When there are a large number of groups, or especially when the selectivity of the having predicates in the query is relatively low, earlier aggregation can be more beneficial in terms of energy consumption. Therefore, we adapt a clustering-based aggregation method to process region-based queries more energy-efficiently. In this method, there is a leader node for each group, the non-leader nodes in the same group forward their sensing values, and finally the aggregation of each group is computed in the leader node.

In REQUEST+, a leader node can represent many regions. For example, in Fig. 2b, the dark-colored node is a leader node which covers three different regions. Unlike the example in Fig. 2b, a leader node can also be outside of its region to minimize the communication cost. We will explain further details about the optimal leader selection problem in Section 3.2.

The overall process of REQUEST+ consists of the following steps:

1. Regions and leader nodes are decided at the base station when a query is received from a user.
2. The initial query message with the leader and region information is sent to the entire network.
3. Query-initiated routing trees are constructed, one for each leader node.
4. Every non-leader node sends its data to its leader node(s).
5. Every leader node computes and forwards the aggregated value for each region to the base station.

Fig. 2. Regions in sensor networks.


3.1. Region construction

The region construction in REQUEST+ is to find every possible combination of sensor nodes located inside a circle of the diameter given in the query. As a naive approach, we can find all the regions by moving the circle from the top left corner to the bottom right corner. However, it is difficult to determine an appropriate step size for covering every region since a region can be placed at an arbitrary position. Moreover, if the monitoring area is large, this naive method is extremely time-consuming. Therefore, we propose an efficient region construction method that utilizes the SEC (Smallest Enclosing Circle) index. The definition of the SEC of a region is as follows:

Definition 2 (SEC). The SEC (Smallest Enclosing Circle) of a region $r$, denoted by $SEC(r)$, is the circle with the smallest diameter enclosing all nodes in $r$. By Definition 1, the diameter of $SEC(r)$ is not larger than $D$.

Our intuitive observation is that every region can be uniquely identified by the SEC of the region, which is proved in the following lemmas.

Lemma 1. Let $r_1$ and $r_2$ be regions complying with Definition 1. If $r_1 \neq r_2$, then there exist $n_1 \in r_1$ and $n_2 \in r_2$ such that the distance between $n_1$ and $n_2$ is larger than $D$ (see Fig. 3).

Proof. To prove the lemma by contradiction, suppose that the distance between every pair of nodes in $r_1 \cup r_2$ is not larger than $D$, assuming $r_1 \neq r_2$. Since $r_1 \neq r_2$, there exists a node $u$ satisfying $u \in r_1 \wedge u \notin r_2$. Then $u$ is located at most $D$ away from every node of $r_2$ according to the initial assumption. By Definition 1, $r_2 \cup \{u\}$ is also a valid region. Therefore, $r_2 \cup \{u\} \supset r_2$, and this contradicts that $r_2$ is maximal. □

Lemma 2. Every region is uniquely identified by its corresponding SEC.

Proof. This lemma means that the following two propositions are valid: (1) "If $r_1 = r_2$, then $SEC(r_1) = SEC(r_2)$" and (2) "If $r_1 \neq r_2$, then $SEC(r_1) \neq SEC(r_2)$". The proof of (1) is obvious by Definition 2. To prove (2), we again use proof by contradiction, as in the proof of Lemma 1. Suppose that $SEC(r_1) = SEC(r_2)$ and $r_1 \neq r_2$. Then the center and diameter of $SEC(r_1)$ are equal to those of $SEC(r_2)$. By Definition 2, therefore, all nodes in $r_1$ and $r_2$ are enclosed by both $SEC(r_1)$ and $SEC(r_2)$. Let $d$ denote the diameter of $SEC(r_1)$ (the same as that of $SEC(r_2)$). Then the distance between every pair of nodes in $r_1 \cup r_2$ is not larger than $d$, and obviously $d \leq D$. This contradicts Lemma 1. □

Another important fact behind our method is that every SEC can be defined by using at most three points, which is proved in [30]. Thus, in order to find all the regions, it is sufficient to check every triple (and pair) of sensor nodes. For example, Fig. 4 shows the relationship between regions and the corresponding SECs. Fig. 4b shows all the possible SECs of a given set of nodes. By Definition 1, constructing regions with a size $D$ is identical to finding maximal sets of nodes inside a circle having a diameter $D$. In Fig. 4a, there are two regions, which are sets of nodes. To generate these sets, we find the largest SECs among the SECs with a diameter that is not larger than $D$. In Fig. 4b, the SECs drawn with solid lines correspond to the regions in Fig. 4a.
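The observation that a SEC is determined by at most three points can be illustrated with a small sketch. The function names below (`circle_from_two`, `circle_from_three`, `candidate_circles`) are hypothetical and are not taken from the paper or from [30]; the sketch only enumerates the circles determined by pairs (as a diameter) and by non-collinear triples (as a circumcircle), which is a superset of the SECs discussed above.

```python
import math
from itertools import combinations


def circle_from_two(p, q):
    """Circle having segment pq as its diameter, as (cx, cy, radius)."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0, math.dist(p, q) / 2.0)


def circle_from_three(p, q, r):
    """Circumcircle of p, q, r as (cx, cy, radius); None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:  # collinear points have no circumcircle
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))


def candidate_circles(points):
    """Circles determined by every pair and every non-collinear triple."""
    circles = [circle_from_two(p, q) for p, q in combinations(points, 2)]
    for p, q, r in combinations(points, 3):
        c = circle_from_three(p, q, r)
        if c is not None:
            circles.append(c)
    return circles
```

For a triple forming an obtuse or right triangle, the smallest enclosing circle is the two-point circle of the longest side rather than the circumcircle, so a full SEC computation would still have to pick the smallest enclosing candidate.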
In summary, we first build the SEC index of the sensor network, and then construct regions by using the SEC index. To build the SEC index, we need an algorithm for finding the SEC that completely contains a given set of points. The problem of finding the SEC has been well studied in mathematics. We use an algorithm from [30], which is simple to implement and has linear average time complexity. Fig. 5 shows the algorithm of building the SEC index. Since every region can be defined by at most three nodes, the number of SEC index entries is at most $m^3$, where $m$ is the number of nodes. We keep the SEC index sorted by diameter so as to construct regions efficiently at runtime. Each entry of the SEC index consists of a diameter, a SEC, and the set of nodes which are located inside the SEC (Line 5). We assume that every node has a static position. Therefore, once the SEC index is built, we do not need further updates to the index.

By means of the SEC index, we can construct the corresponding regions when a region-based query is posed. Fig. 6 presents the algorithm of region construction. First, we find the largest SEC among the SEC index entries with a diameter that is not larger than that of the region specified in the query (Line 1). From the largest SEC to the smallest SEC, we generate regions unless they are sub-regions of already generated regions (Lines 3–5).

Fig. 3. The illustration of Lemma 1.

Fig. 4. Regions and the corresponding SECs.

Fig. 5. Algorithm of building the SEC index.

Fig. 6. Algorithm of region construction.
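The two steps described above (building the index, then constructing regions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pseudocode of Figs. 5 and 6: the names `sec`, `build_sec_index`, and `construct_regions` are hypothetical, single nodes are indexed as zero-diameter circles so that isolated nodes still form regions, a brute-force SEC for up to three points replaces the algorithm of [30], and `EPS` is an assumed floating-point tolerance.

```python
import math
from itertools import combinations

EPS = 1e-9  # assumed tolerance for points lying on a circle boundary


def _two_point_circle(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0, math.dist(p, q) / 2.0)


def _circumcircle(p, q, r):
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))


def _inside(circle, p):
    return math.dist((circle[0], circle[1]), p) <= circle[2] + EPS


def sec(pts):
    """Smallest enclosing circle of one, two, or three points."""
    if len(pts) == 1:
        return (pts[0][0], pts[0][1], 0.0)
    if len(pts) == 2:
        return _two_point_circle(*pts)
    best = None
    for a, b in combinations(pts, 2):  # a two-point circle may already suffice
        c = _two_point_circle(a, b)
        if all(_inside(c, p) for p in pts) and (best is None or c[2] < best[2]):
            best = c
    return best if best is not None else _circumcircle(*pts)


def build_sec_index(nodes):
    """nodes: dict id -> (x, y). Entries (diameter, circle, members), sorted."""
    ids = sorted(nodes)
    entries = []
    for k in (1, 2, 3):
        for combo in combinations(ids, k):
            c = sec([nodes[i] for i in combo])
            members = frozenset(i for i in ids if _inside(c, nodes[i]))
            entries.append((2.0 * c[2], c, members))
    entries.sort(key=lambda e: e[0])
    return entries


def construct_regions(index, D):
    """Maximal node sets among index entries whose diameter is at most D."""
    regions = []
    for diameter, _, members in reversed(index):  # largest SEC first
        if diameter > D + EPS:
            continue
        if not any(members <= r for r in regions):  # skip sub-regions
            regions.append(members)
    return regions
```

With four hypothetical nodes at (0, 0), (1, 0), (0.5, 0.5), and (5, 0), a query with D = 0.8 should yield overlapping regions that share the third node, while D = 1.5 merges the first three nodes into a single region. The sketch checks membership for every entry, so it costs roughly $m^4$ distance tests and is meant only to make the idea concrete.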

3.2. Leader selection

In REQUEST+, there are two kinds of communication. First, every sensor node sends its sensing value to its leader nodes, which is called the intra-region communication. Second, every leader node forwards aggregated values to the base station after filtering out values that do not satisfy the having conditions in the query, which is called the aggregation forwarding.

Since regions overlap in REQUEST+, a leader node represents one or more regions. Moreover, the leader node of a region does not have to be a node inside the region. Especially when a large region size is requested, it is more beneficial that leader nodes can be placed outside their regions. For example, in Fig. 7a, there are two leader nodes (filled nodes) which are located inside their regions. In this case, some non-leader nodes should send their data to both of the two leader nodes, and each of the two leader nodes should forward aggregated values to the base station. In contrast, in Fig. 7b, there is only one leader node covering four regions. Note that one of the regions does not contain the leader node. This reduces the intra-region communication cost as well as the aggregation forwarding cost.

Fig. 7. Communication according to the different leader nodes.

To minimize the communication cost, it is important to determine which nodes should be selected as leader nodes. Also, we should decide which and how many regions should be covered by each leader node, since a leader node can cover any regions even if those regions do not contain the leader node. Therefore, optimal leaders and their optimal collections of regions, called region clusters, should be determined.

There are some issues in solving these problems. First, there is a tradeoff between the aggregation forwarding cost and the intra-region communication cost, depending on the distances from leader nodes to other nodes. If we select leader nodes near the base station, the aggregation forwarding cost can be decreased. However, the intra-region communication cost can be increased, as the distances between leader nodes and non-leader nodes will be increased. On the other hand, if we select leader nodes near non-leader nodes, the aggregation forwarding cost will be increased while the intra-region communication cost will be decreased.

Second, we should consider the number of regions covered by each leader node and the total number of leader nodes. If the number of regions covered by a leader node is small, the cost for the leader node to forward aggregated values to the base station will also be low. This is because the size of the messages that a leader node forwards to the base station is proportional to the number of regions covered by the leader node. However, this also increases the total number of leader nodes, which can increase the total communication cost. It is desirable that leader nodes near the base station cover more regions, and leader nodes near non-leader nodes cover fewer regions.
Finally, since some aggregated values can be filtered out by the having conditions, we need to estimate the selectivity of the having predicate. When most aggregated values are discarded at each leader node due to the low selectivity, the aggregation forwarding cost is not an important issue. In this case, to reduce the intra-region communication cost, we should select leader nodes near non-leader nodes rather than those near the base station. In the opposite case, we should select leader nodes near the base station to reduce the aggregation forwarding cost.

3.2.1. The leader selection problem

Based on the requirements, we formulate the leader selection problem as follows:

Minimize
\[ \sum_{n_j \in L} \Big( \mathrm{dist}_g(\mathrm{root}, n_j) \cdot \mathit{selectivity} \cdot |R_j| + \sum_{n_i \in N_j} \mathrm{dist}_q(n_i, n_j) \Big) \]
Subject to
\[ \bigcup_{n_j \in L} R_j = R \]

where
$R$: the set of entire regions.
$N$: the set of entire nodes.
$L$: the set of leader nodes.
$R_j$: the set of regions which a leader node $n_j$ covers.
$N_j$: the set of nodes which are located in $R_j$.
$\mathrm{dist}_g(n_i, n_j)$: the hop count between $n_i$ and $n_j$ in the global routing tree.
$\mathrm{dist}_q(n_i, n_j)$: the hop count between $n_i$ and $n_j$ in their local routing tree.

In this formulation, given $R$ and $N$, our goal is to find the optimal $L$ and $R_j$ for each leader node $n_j \in L$ while minimizing the objective function. The objective function is the summation over $n_j$ of the expected cost when we select a certain node $n_j$ and a region cluster $R_j$. Note that the size of the messages between the base station and the leader node $n_j$ is $|R_j|$ times larger than that of the messages between the leader node $n_j$ and the non-leader nodes $n_i$, as long as the having conditions are satisfied. The only constraint is that all the regions in $R$ must be covered. We prove that the leader selection problem is NP-hard by the following theorem:

Theorem 1. The leader selection problem is NP-hard.

Proof. In order to prove this theorem, we reduce the discrete unit disk cover problem (a.k.a. the weighted geometric set-cover problem with unit disks) to the restricted version of the leader selection problem (i.e., a special case of the leader selection problem) that has the additional constraint that every leader node should be inside its residing region(s). This restricted version of the leader selection problem is the same as the one defined in the preliminary work [6].

Given a set $P = \{p_1, p_2, \ldots, p_m\}$ of points and a set $X = \{d_1, d_2, \ldots, d_n\}$ of unit disks in a 2-dimensional space, the discrete unit disk cover problem is to find a subset $X^* \subseteq X$ such that $\bigcup_{d_i \in X^*} d_i$ covers $P$ and $\sum_{d_i \in X^*} C(d_i)$ is minimized, where $C(d_i)$ is the cost (i.e., weight) of disk $d_i$. For an instance of the discrete unit disk cover problem, we can construct the corresponding instance of the leader selection problem by the following steps:

1. Set both the communication range and the region size (diameter) to be 1 (the diameter of unit disks), which implies that every pair of nodes in the same region is connected by 1 hop (i.e., $\sum_{n_i \in N_j} \mathrm{dist}_q(n_i, n_j) = |N_j|$).
2. $P \rightarrow R$, i.e., for each $p_i \in P$, draw region $r_i \in R$ centered at $p_i$. Note that $R$ is just given as an input set of the leader selection problem, which does not need to be constructed by our algorithm for the region construction, even if some regions in $R$ need a relaxation of Definition 1.
3. $X \rightarrow N$, i.e., for each $d_i \in X$, deploy node $n_i \in N$ into the area where $n_i$ belongs to all the corresponding regions that are originally the points covered by $d_i$.
4. Set $\mathit{selectivity}$ to be the reciprocal of the least common multiple of all $|d_i|$'s.
5. For each node $n_i \in N$, set $\mathrm{dist}_g(\mathrm{root}, n_i)$ to be $\frac{C(d_i) - |N_i|}{|d_i| \cdot \mathit{selectivity}}$, where $|N_i|$ is $|\{d_j \mid d_i \cap d_j \neq \emptyset\}|$ by step 3.

Since $1/\mathit{selectivity}$ is the least common multiple of all $|d_i|$'s by step 4, $\mathrm{dist}_g(\mathrm{root}, n_i)$ can be an integer value. Furthermore, because the global routing tree can be constructed in a very different manner from the local routing trees, called query-initiated routing trees (see Section 3.3), any positive integer value can be assigned to $\mathrm{dist}_g(\mathrm{root}, n_i)$, assuming that $C(d_i) > |N_i|$ without loss of generality. By doing this, if we choose $n_i \in N$ as a leader node in the leader selection problem, the communication cost for $n_i$ is
\[ \mathrm{dist}_g(\mathrm{root}, n_i) \cdot \mathit{selectivity} \cdot |R_i| + \sum_{n_j \in N_i} \mathrm{dist}_q(n_i, n_j) = \frac{C(d_i) - |N_i|}{|d_i| \cdot \mathit{selectivity}} \cdot \mathit{selectivity} \cdot |R_i| + |N_i| = C(d_i), \]
which is identical to the cost of $d_i \in X$ in the discrete unit disk cover problem. This reduction can obviously be performed in polynomial time. Through this polynomial reduction, the leader selection problem is proved to be NP-hard since the discrete unit disk cover problem is also NP-hard [15]. □

3.2.2. Transformation into the weighted set-cover problem

To solve the leader selection problem, we adapt the idea that the facility location problem can be transformed into the weighted set-cover problem [14]. Thus, a reduction in the opposite direction to the proof of Theorem 1 is used. To transform the problem into the weighted set-cover problem, it is required to define sets and the cost of each set. Intuitively, we need to cover all the regions by selecting a set of nodes as leaders in the leader selection problem, just as we find a collection of sets to cover all the elements in the weighted set-cover problem. Thus, selecting a node as a leader should be identical to choosing a set. Therefore, each node corresponds to a set or sets (recall that each leader node can cover many different sets of regions, called region clusters) and regions correspond to elements of a set. We define a set and the cost of each set in the weighted set-cover problem transformed from our leader selection problem as follows:

Definition 3 (Set). Let $S_j^k$ denote a set in the transformed weighted set-cover problem and $n_j$ denote a node. Then $S_j^k$ is a set of regions covered by $n_j$, and $k$ is a number to distinguish this set from other sets of regions covered by $n_j$.

Fig. 8. An instance of the leader selection problem.


Definition 4 (Cost of set). Let $C(S_j^k)$ denote the cost of set $S_j^k$. Then $C(S_j^k) = \mathrm{dist}_g(\mathrm{root}, n_j) \cdot \mathit{selectivity} \cdot |S_j^k| + \sum_{n_i \in N_j} \mathrm{dist}_q(n_i, n_j)$.
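Definitions 3 and 4 can be made concrete with a small sketch that enumerates candidate sets, prices each one, and then runs a greedy heuristic for weighted set cover. Everything here is an assumption made for illustration: the function names, the three-node line topology, the hop counts, and the selectivity of 0.5 are hypothetical and do not reproduce any figure in the paper. The greedy rule used (lowest cost per newly covered region) is the common weighted variant and is not claimed to be the exact procedure of [5].

```python
from itertools import combinations


def set_cost(leader, region_ids, regions, dist_g, hops, selectivity):
    """Cost of one candidate set, following Definition 4.

    regions: dict region id -> set of node ids
    dist_g:  dict node id -> hop count to the base station (root)
    hops:    function (node, leader) -> hop count in the local routing tree
    """
    members = set().union(*(regions[r] for r in region_ids))
    forwarding = dist_g[leader] * selectivity * len(region_ids)
    intra_region = sum(hops(n, leader) for n in members)
    return forwarding + intra_region


def candidate_sets(nodes, regions, dist_g, hops, selectivity):
    """Every (leader, region subset, cost) triple, one per non-empty subset."""
    region_ids = sorted(regions)
    out = []
    for leader in nodes:
        for size in range(1, len(region_ids) + 1):
            for subset in combinations(region_ids, size):
                cost = set_cost(leader, subset, regions, dist_g, hops, selectivity)
                out.append((leader, frozenset(subset), cost))
    return out


def greedy_cover(universe, candidates):
    """Greedy weighted set cover: lowest cost per newly covered element."""
    uncovered, chosen = set(universe), []
    while uncovered:
        useful = [c for c in candidates if c[1] & uncovered]
        if not useful:
            raise ValueError("some regions cannot be covered")
        best = min(useful, key=lambda c: c[2] / len(c[1] & uncovered))
        chosen.append(best)
        uncovered -= best[1]
    return chosen


def line_hops(position):
    """Hypothetical hop function for nodes placed on a line."""
    return lambda a, b: abs(position[a] - position[b])
```

As a usage example under the same assumptions, take regions `{"r1": {"n1", "n2"}, "r2": {"n2", "n3"}}`, hop counts to the root of 1, 2, and 3 for n1, n2, and n3, and nodes at positions 0, 1, and 2 on a line. The enumeration then produces nine candidate sets, and the greedy pass selects n1 as the leader of r1 and n2 as the leader of r2.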

Note that the cost of each set is naturally derived from the objective function in our formulation.

For an instance of the leader selection problem, as shown in Fig. 8, suppose that there are two regions $r_1$ and $r_2$, where three sensor nodes $n_1$, $n_2$, and $n_3$ are deployed. Since a node can cover any combination of regions, which means a set can be any combination of regions, we should consider the following sets: $S_1^1 = \{r_1\}$, $S_1^2 = \{r_2\}$, $S_1^3 = \{r_1, r_2\}$, $S_2^1 = \{r_1\}$, $S_2^2 = \{r_2\}$, $S_2^3 = \{r_1, r_2\}$, $S_3^1 = \{r_1\}$, $S_3^2 = \{r_2\}$, $S_3^3 = \{r_1, r_2\}$. Choosing the sets $S_1^1$ and $S_3^2$ is identical to selecting nodes $n_1$ and $n_3$ as leader nodes, which cover regions $r_1$ and $r_2$, respectively. As in this example, we should consider every possible non-empty subset of regions for each node, that is, the power set of the entire regions. This means that $k$ of Definition 3 is at most $2^m - 1$, where $m$ is the number of all the regions. Therefore, the number of sets in the transformed set-cover problem is about $n \cdot 2^m$, where $n$ is the number of sensor nodes. For an efficient solution of the leader selection problem, it is required to reduce this exponential number of sets.

3.2.3. Reducing the number of sets

Our idea for reducing the number of sets is based on the fact that some combinations of regions do not have to be considered because they are not useful in terms of the set-greedy algorithm. For example, consider the $m$ connected regions in Fig. 9. Note that sets such as $\{r_1, r_3\}$, $\{r_1, r_4\}$, ..., and $\{r_1, r_m\}$ are not necessary for finding the optimal region cluster for node $u$ since $\{r_1, r_2\}$ is the best among the sets having two elements. Therefore, we consider only the sets of regions which can possibly be an optimal region cluster, called the sufficient region sets. To formally define the sufficient region set for each node, we use the concept of circular convex set defined in [14].
For any subset of its geometric points, a circular convex set contains all the points that are contained in some 'circle' containing this subset (see Fig. 10). Unlike the concept of the circular convex set, the Euclidean distance between two points cannot be used for our sufficient region set. Instead, it is required to measure the distance between a node and a region for finding the optimal region cluster. We define the distance between node u and region r as follows:

Definition 5 (DISTANCE(r, u)). Let DISTANCE(r, u) denote the distance of region r from node u. Then

$$DISTANCE(r, u) = \sum_{v \in r} dist_q(u, v).$$
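The distance of Definition 5 can be sketched in a few lines. This is only an illustration: `dist_q` is passed in as a function, and the Manhattan distance on grid positions used here (`grid_hops`) is a hypothetical stand-in for it, as are the node coordinates and the region names.

```python
from typing import Callable, Hashable, Iterable

Node = Hashable


def region_distance(
    region: Iterable[Node],
    u: Node,
    dist_q: Callable[[Node, Node], float],
) -> float:
    """DISTANCE(r, u): sum of dist_q(u, v) over every node v in region r."""
    return sum(dist_q(u, v) for v in region)


def grid_hops(a, b):
    """Hypothetical stand-in for dist_q: Manhattan distance between grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


u = (0, 0)
regions = {
    "r1": [(0, 0), (1, 0)],
    "r2": [(2, 0), (3, 0)],
    "r3": [(4, 0), (5, 0)],
}

# Rank the regions from nearest to farthest with respect to node u.
ranked = sorted(regions, key=lambda name: region_distance(regions[name], u, grid_hops))
print(ranked)
```

Because the distance sums over all nodes of a region, a region with many nodes far from u is ranked later than a small nearby region, which matches the intent of measuring how expensive it is for u to collect values from that region.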

Based on the definition of DISTANCE(r, u), the sufficient region set for each node is defined as follows:

Definition 6 (Sufficient region set). A sufficient region set with regard to node u is a circular convex set of regions $r_i$ based on $DISTANCE(r_i, u)$.

We use SF(u) to denote the set of all the sufficient region sets for node u. For example, in Fig. 9, with regard to node u in r1, assuming that $DISTANCE(r_i, u)$ is proportional to the Euclidean distances of regions, where $1 \le i \le m$, we have $SF(u) = \{\{r_1\}, \{r_1, r_2\}, \{r_1, r_2, r_3\}, \ldots, \{r_1, r_2, r_3, \ldots, r_{m-1}, r_m\}\}$. Since $|SF(u)|$ for each node u is at most m, the total number of sets in the transformed set-cover problem can be reduced from about $n \cdot 2^m$ to $n \cdot m$.

3.2.4. Algorithm of the leader selection

Once we transform an instance of the leader selection problem into the weighted set-cover problem, an algorithm that solves the weighted set-cover problem can be applied. The set-cover problem is a well-known NP-hard problem, and has been actively studied in the theoretical communities. Among the several approximation algorithms that solve the set-cover problem in polynomial time, we use the set-greedy algorithm [5], which is best known for its simplicity as well as a reasonable approximation bound. At each step, the set-greedy algorithm picks the set that covers the greatest number of not-yet-covered elements per unit of cost (in the unweighted case, simply the greatest number of not-yet-covered elements). We omit a detailed explanation of the set-greedy algorithm since it is beyond the scope of our work. Fig. 11 presents the overall algorithm of the leader selection, in which the set-greedy algorithm is called as a subfunction (Line 7).

3.2.5. Further optimization technique for small-sized regions

In the case that the region size is not larger than the communication range, we can further reduce the number of sets by limiting the leader node for each region to be selected from the nodes inside the region [6]. If we give this constraint to the

Fig. 9. m connected regions.


D.-W. Choi, C.-W. Chung / Information Sciences 248 (2013) 151–167

Fig. 10. An example to illustrate circular convex sets.

Fig. 11. Algorithm of the leader selection.

optimal leader selection, the number of selected leader nodes can increase as shown in Fig. 7. However, if the region size is not larger than the communication range, all non-leader nodes are 1-hop distant from their leader nodes. Thus, even if a non-leader node has two or more leader nodes to which it sends its sensing values, the communication cost will be the same as that of sending to one leader node. This is because sensor nodes broadcast their messages to all of their 1-hop neighbor nodes when sending their sensing values. Therefore, the intra-region communication cost will not grow despite the increased number of leader nodes. By limiting the leader node for each region to be inside the region, the set of regions covered by each node is also limited to the regions containing the node. Thus, the maximum k of Definition 3 is just one. Therefore, the number of sets is reduced from $n \cdot m$ to n, where n is the total number of sensor nodes and m is the total number of regions.

3.3. Query-initiated routing tree

In order for a leader node to communicate with non-leader nodes, it is required to build a routing tree for each leader node, called the query-initiated routing tree. When a new region-based query is posed, we construct a new routing tree for each leader node since the requested region size can change. Basically, we apply to each leader node a routing method similar to the one used when the base station constructs the global routing tree. The query-initiated routing tree construction performs the following steps:

1. Query messages with the information on leader nodes and region clusters are flooded in the entire network. Each node thereby knows whether it is a leader or not, and which nodes are in the same region cluster. Thus, non-leader nodes can find where to send their sensing values, and leader nodes can identify the nodes in their region clusters.
2. The routing request messages (leader_id, hop_count) are broadcast to the non-leader nodes. Each receiving node increments hop_count and rebroadcasts the message to its neighbors. We assume that every node can identify another node's location from the node id.
3. Each node designates the sender node as its parent node. If multiple messages with the same leader node arrive at a node, the sender node of the message with the smallest hop_count is selected as the parent of the node.

Fig. 12. The deployment of LUCE (x-coordinate vs. y-coordinate, in meters).

4. For each leader node, the local region construction is performed. When a leader node receives data messages from the other nodes for the first time, it performs the local region construction based on the sender nodes of the arrived messages. The local region construction algorithm is almost the same as the algorithm in Fig. 6, except that only the nodes inside the regions covered by the leader node are considered and SECs are dynamically generated at each leader node.

4. Experiments

In order to investigate the effectiveness and efficiency of the proposed method in REQUEST+, we conduct experimental evaluations.

4.1. Experimental environment

We implement our method and the alternative methods on our own simulator, and conduct experiments using both real and synthetic datasets. As a topology for the synthetic datasets, we deploy a total of 100 sensor nodes in a grid environment. The area of each grid cell is 10 m × 10 m, and two sensor nodes are randomly located in each cell. We set the communication range to 10 m. Since our experiments are not affected by the spatial or temporal correlation in sensor networks, we randomly generate the sensing value, in the range [0, 10], for each node at every sampling time.

To assess the performance of our method in the real world, we also use a real dataset, called the LUCE dataset, which can be downloaded from the SensorScope project [3,23]. This dataset has been widely used for various experimental purposes in the literature related to sensor networks [20,25,26,29]. LUCE is collected from 97 weather stations (i.e., sensor nodes), each of which is equipped with a diverse set of sensors for measuring the temperature, humidity, solar radiation, wind speed, etc. Since our work does not focus on utilizing multiple attributes, we use only the temperature measurements for simplicity. Another important feature of LUCE is that it covers a relatively large area (about 300 m × 400 m), which is well-suited for REQUEST+, where the goal is to find interesting regions in the entire monitoring area. In our simulation environment, we use the deployment of all the sensor nodes in LUCE based on their real coordinates, as depicted in Fig. 12 (the website [23] provides the (x, y) coordinates as well as GPS coordinates), together with the data from the sensor nodes. The communication range is set to 60 m.

For convenience, we assume that a packet has a simple header consisting of a source address and one or more destination addresses. There can be more than one destination address if a node belongs to several regions simultaneously. A message consists of the region or node identifier and the corresponding sensing value(s). We set the node identifier and the region identifier to a sequential number and the coordinates of the region center, respectively. A region center can be decided from the corresponding SEC index entry. However, in our proposed method, we do not need the region identifier for a message, since the base station can identify regions from the leader node identifier.

4.2. Reliability

We conduct experiments to evaluate the effectiveness of region-based queries. As a metric of the reliability, we use the average relative error rate. This metric is calculated as follows:

$$\text{Average(relative error)} = \frac{\sum_{i=1}^{n} \frac{|v_i - v'_i|}{|v_i|}}{n}$$

In the above formula, n is the number of values, $v_i$ is the original value, and $v'_i$ is the corresponding value with noise due to failures. Using the average relative error, we conduct experiments with various failure rates from 10% to 90%. We use the following region-based query for this experiment:

    select region, AVG(temp)
    from sensors
    group by region(D)
    sampling rate 1
    duration 100
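The average relative error metric described above can be sketched as follows. The function name and the sample readings are hypothetical, and the sketch assumes that no original value is zero, since the relative error is undefined in that case.

```python
from typing import Sequence


def average_relative_error(original: Sequence[float], noisy: Sequence[float]) -> float:
    """Mean of |v_i - v'_i| / |v_i| over all n value pairs."""
    if len(original) != len(noisy):
        raise ValueError("both sequences must have the same length")
    if not original:
        raise ValueError("at least one value is required")
    if any(v == 0 for v in original):
        raise ValueError("relative error is undefined for an original value of zero")
    total = sum(abs(v - w) / abs(v) for v, w in zip(original, noisy))
    return total / len(original)


# Hypothetical readings: the second list contains values distorted by failures.
error = average_relative_error([10.0, 20.0], [9.0, 25.0])
print(error)
```

In this example the individual relative errors are 0.1 and 0.25, so the metric is their mean. Applying the same function to per-region aggregates instead of per-node values gives the region-based counterpart of the metric.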

Fig. 13 shows the results of the experiments on reliability. "Individual", "Region (5 m)", "Region (10 m)", "Region (15 m)", "Region (20 m)", and "Region (25 m)" are the cases in which D is 0 m, 5 m, 10 m, 15 m, 20 m, and 25 m, respectively. As the failure rate increases, the average relative error rates of all the cases also increase. However, region-based queries show better accuracy than the query based on individual nodes, especially when the failure rate is high. Also, as the region size increases, the accuracy improves. This result shows that using the aggregated values in the regions is effective in reducing the effect of node failures.

4.3. Energy-efficiency

In sensor networks, consumed energy is the primary performance measure. Since communication is the dominant factor in energy consumption, we use the amount of transmissions as the efficiency metric. We compare REQUEST+ with three other systems: "TAG", "REQUEST", and "Direct Collection". First, we implement TAG, which is a grouped in-network aggregation method proposed in [18]. In region-based queries, a node can belong to several groups (regions) at the same time, and a node cannot know those groups before the query is posed, because the groups change according to the region size specified in the query. Additional communication is therefore needed to notify each node of its group information. However, for simplicity, we assume for the compared system, TAG, that every node knows its corresponding groups in advance. Also, we compare with REQUEST which is the preliminary work of REQUEST+ by

Fig. 13. Experimental result for reliability (average relative error vs. failure rate; series: Individual, Region (5 m), Region (10 m), Region (15 m), Region (20 m), Region (25 m)).

adapting to large regions. To initialize query processing, each node in REQUEST+ needs more communication than REQUEST since leader nodes can be placed outside their regions in REQUEST+. Therefore, we give a penalty to REQUEST+ by adding the initialization cost to REQUEST+ only. Finally, Direct Collection is the most naive approach where every sensor node sends its sensing value to the base station directly without in-network processing. The query used in this experiment is as follows:

    select region, SUM(temp)
    from sensors
    group by region(D)
    having 0 < SUM(temp) ≤ t
    sampling rate 1
    duration 100

For simplicity, we use the following simple estimation for the selectivity of the having predicate in REQUEST+:

$$selectivity_{estimated} = \frac{t - Min(SUM(temp))}{Max(SUM(temp)) - Min(SUM(temp))}$$
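The selectivity estimate above is a linear interpolation of the threshold t between the smallest and largest region sums. A minimal sketch follows; the function and parameter names are hypothetical, and the guard against an empty range is an addition for the example rather than part of the original formula.

```python
def estimated_selectivity(t: float, min_sum: float, max_sum: float) -> float:
    """Estimate the fraction of regions that satisfy the having predicate for threshold t."""
    if max_sum <= min_sum:
        # The formula divides by (max - min), so the range must be non-empty.
        raise ValueError("max_sum must be greater than min_sum")
    return (t - min_sum) / (max_sum - min_sum)


# Hypothetical bounds on SUM(temp) over all regions.
low, high = 10.0, 50.0
print(estimated_selectivity(30.0, low, high))   # threshold halfway between the bounds
print(estimated_selectivity(high, low, high))   # threshold at the maximum
```

The estimate treats the region sums as if they were spread evenly between the minimum and the maximum, which is why the real selectivity can deviate from it when the sums are skewed.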

Note that the real selectivity is not the same as the estimated selectivity, and can be very different in some cases. Nevertheless, REQUEST+ shows the best energy-efficiency in most cases.

First, in order to show the efficiency of the basic processing scheme of REQUEST+ according to the region size, we conduct experiments with 100% selectivity of the having predicate, even though our region-based queries focus on finding a few regions relevant to the query rather than retrieving all the aggregated values from all the regions. In each round, we collect the aggregated values of all the regions, since every region satisfies the conditions in the having clause. We vary the region size D from 5 m to 25 m, and set t to a value which always makes the having predicate true.

Fig. 14 shows that REQUEST+ is more energy-efficient than the compared systems for most region sizes. When the region size is not larger than the communication range (10 m), REQUEST, REQUEST+, and TAG are all much better than Direct Collection. However, when the region size exceeds the communication range, REQUEST and TAG consume much more energy than Direct Collection. This is because the distances between leader nodes and non-leader nodes can be more than 1-hop, which increases the intra-region communication cost in REQUEST. Also, since a large region can contain more sensor nodes than a small region, the number of regions (groups) to which a sensor node belongs will increase. This increases the aggregation forwarding cost in REQUEST, and the size of the partial aggregation messages in TAG. Thus, when the region size becomes very large, in-network processing is not helpful in reducing the consumed energy. Nevertheless, our proposed REQUEST+ is almost always better than Direct Collection as well as the other systems in terms of energy consumption.

Fig. 14. Experimental result for varying region sizes (selectivity = 100%) (energy cost in bytes vs. region size from 5 m to 25 m; series: REQUEST, REQUEST+, TAG, Direct Collection).

Fig. 15. Experimental results for varying selectivities with large-sized regions (energy cost in bytes vs. selectivity; series: REQUEST, REQUEST+, TAG, Direct Collection).

Fig. 16. Experimental results for varying selectivities with small-sized regions (energy cost in bytes vs. selectivity; series: REQUEST, REQUEST+, TAG, Direct Collection).

Next, we conduct experiments while varying the selectivity of the having predicate. To do that, we change t in the range from the minimum of SUM(temp) to the maximum of SUM(temp). If t increases, the selectivity also increases. To apply the effects of changing the selectivity equally to the compared system, we implement TAG so that it suppresses messages at the intermediate nodes. We test five region sizes: 5 m, 10 m, 15 m, 20 m, and 25 m.

Fig. 15 shows the experimental results with regions larger than the communication range. For all of these region sizes, REQUEST+ is the most energy-efficient in most cases. As in Fig. 14, REQUEST and TAG do not show better performance than Direct Collection, even though their filtering of aggregated values becomes more effective as the selectivity decreases. Fig. 15c shows that the difference between REQUEST+ and Direct Collection is not large. This is because, if the region size is extremely large, REQUEST+ selects only two or fewer leader nodes, which are located near the base station. Thus, the communication pattern of REQUEST+ becomes similar to that of Direct Collection for the extremely large regions.

Fig. 16 shows the experimental results with regions smaller than or equal to the communication range. Both REQUEST and REQUEST+ show better performance than the other systems in most cases. In Fig. 16a, when the selectivity is high, TAG shows better efficiency than REQUEST and REQUEST+. This is because, in the case of tiny-sized regions, most sensor nodes belong to only one region (group) and most regions do not overlap each other, which is the ideal environment for the grouped in-network aggregation method. Nevertheless, REQUEST and REQUEST+ filter more effectively as the selectivity decreases, due to the earlier suppression at each leader node.

4.4. Results using a real dataset

We perform the same kind of experiments on the real dataset, LUCE, except for varying the failure rates. Thus, we evaluate the energy-efficiency of REQUEST+ compared with TAG, REQUEST, and Direct Collection. First, Fig. 17 shows the consumed energy of each system when varying the region size without any having conditions (i.e., 100% selectivity). It is worth recalling that our region-based queries aim at finding only interesting regions that satisfy

Fig. 17. Experimental result for varying region sizes (selectivity = 100%) using a real dataset (energy cost in bytes vs. region size from 10 m to 120 m; series: REQUEST, REQUEST+, TAG, Direct Collection).

Fig. 18. Experimental results for varying selectivities with large-sized regions using a real dataset (energy cost in bytes vs. selectivity; series: REQUEST, REQUEST+, TAG, Direct Collection).

given having conditions. Nevertheless, REQUEST+ shows the best performance even when gathering all the aggregated values. TAG and REQUEST are much less efficient than Direct Collection when the region size gets larger (recall that the communication range is set to 60 m for LUCE), which is similar to the result shown in Fig. 14.

Next, we also perform experiments using LUCE while varying the selectivity of the having predicate. Figs. 18 and 19 show the experimental results with large-sized regions and small-sized regions, respectively. The overall trends of the graphs are similar to (or better than) the results using the synthetic dataset, which are shown in Figs. 15 and 16. With large-sized regions, REQUEST+ is superior to the other systems when the selectivity is reasonably low. Note that REQUEST, our preliminary system, is as efficient as REQUEST+ when the region size is smaller than the communication range.

Fig. 19. Experimental results for varying selectivities with small-sized regions using a real dataset (energy cost in bytes vs. selectivity; series: REQUEST, REQUEST+, TAG, Direct Collection).

5. Discussion

It is worth noting the applicability of our framework to multiagent systems (MASs), which are systems composed of multiple interacting agents in an environment. MASs have been used in a variety of applications such as market simulations, e-commerce trading environments, GIS systems, and defence systems, which all require the engineering of distributed systems rather than centralized systems. In a given MAS, through cooperation between agents, we can achieve global goals that are difficult or impossible for a single agent to achieve (see [2,4] for a more detailed view of MASs).

Sensor networks can be regarded as a special case of MASs in the sense that individual sensor nodes cooperatively pass their data through the network in order to process a query that is usually posed at the base station in a global manner. In other words, the behavior of a sensor node is similar to that of a common type of agent that is in charge of acquiring data about the environment and obtaining the relevant information. In employing MASs, the unit of cooperation can vary from a sensor node to a set of sensor nodes in a region. REQUEST+ is applicable to any MAS that needs to gather information about a given environment, by modifying the definition of a region appropriately for that environment (note that a region in our framework is just defined as a maximal set). Our framework is useful for MASs in environments with a high portion of noise as well as MASs that need to control the degree of resolution of environmental information.

6. Conclusion

In this work, we proposed a new type of query in sensor networks, called the region-based query. This type of query is helpful in overcoming noise in the sensor data and in providing a macro view of a monitoring area. By permitting overlapping regions, we could deal with every possible region where sensor nodes are deployed. In order to construct numerous regions efficiently, we used the SEC index that is built in the preprocessing phase. Moreover, to efficiently process the region-based query, we used a clustering-based aggregation method and addressed an optimization problem for leader selection with a proof of its NP-hardness. Also, we provided an algorithm to solve the leader selection problem by mapping the problem to the weighted set-cover problem.
In addition, we built a new routing tree for each leader node, a query-initiated routing tree that enables the intra-region communication. Finally, we showed through the experiments that our proposed approach is effective and energy-efficient.

Acknowledgements

We would like to thank the editor and anonymous reviewers for their helpful comments. This work was supported by the National Research Foundation of Korea grant funded by the Korean government (MSIP) (No. NRF-2009-0081365).

References

[1] Precision Agriculture.
[2] E. Argente, G. Beydoun, R. Fuentes-Fernández, B. Henderson-Sellers, G. Low, Modelling with agents, in: Proceedings of the 10th International Conference on Agent-Oriented Software Engineering (AOSE), 2011, pp. 157–168.
[3] G. Barrenetxea, F. Ingelrest, G. Schaefer, M. Vetterli, O. Couach, M. Parlange, Sensorscope: out-of-the-box environmental monitoring, in: Proceedings of the 7th International Conference on Information Processing in Sensor Networks (IPSN), 2008, pp. 332–343.
[4] G. Beydoun, G.C. Low, B. Henderson-Sellers, H. Mouratidis, J.J. Gómez-Sanz, J. Pavón, C. Gonzalez-Perez, FAML: a generic metamodel for MAS development, IEEE Transactions on Software Engineering 35 (6) (2009) 841–863.
[5] V. Chvatal, A greedy heuristic for the set-covering problem, Mathematics of Operations Research 4 (3) (1979) 233–235.
[6] D.-W. Choi, C.-W. Chung, REQUEST: region-based query processing in sensor networks, in: Proceedings of the 16th International Conference on Database Systems for Advanced Applications (DASFAA), 2011, pp. 266–279.
[7] M. Demirbas, H. Ferhatosmanoglu, Peer-to-peer spatial queries in sensor networks, in: Proceedings of the 3rd International Conference on Peer-to-Peer Computing (P2P), 2003, pp. 32–39.
[8] V. Dyo, C. Mascolo, Adaptive distributed indexing for spatial queries in sensor networks, in: Proceedings of the 16th International Workshop on Database and Expert Systems Applications (DEXA), 2005, pp. 1103–1107.


[9] C. Intanagonwiwat, R. Govindan, D. Estrin, J.S. Heidemann, F. Silva, Directed diffusion for wireless sensor networking, IEEE/ACM Transactions on Networking 11 (1) (2003) 2–16.
[10] H. Gupta, Z. Zhou, S.R. Das, Q. Gu, Connected sensor cover: self-organization of sensor networks for efficient query execution, IEEE/ACM Transactions on Networking 14 (1) (2006) 55–67.
[11] W.R. Heinzelman, A. Chandrakasan, H. Balakrishnan, Energy-efficient communication protocol for wireless microsensor networks, in: Proceedings of the 33rd Annual Hawaii International Conference on System Sciences (HICSS), 2000, p. 10.
[12] W.R. Heinzelman, A. Chandrakasan, H. Balakrishnan, An application-specific protocol architecture for wireless microsensor networks, IEEE Transactions on Wireless Communications 1 (4) (2002) 660–670.
[13] Z. He, B.S. Lee, X.S. Wang, Aggregation in sensor networks with a user-provided quality of service goal, Information Sciences 178 (9) (2008) 2128–2149.
[14] D.S. Hochbaum, Heuristics for the fixed cost median problem, Mathematical Programming 22 (1) (1982) 148–162.
[15] D.S. Johnson, The NP-completeness column: an ongoing guide, Journal of Algorithms 3 (1982) 182–195.
[16] C.-H. Lee, C.-W. Chung, S.-J. Chun, Effective processing of continuous group-by aggregate queries in sensor networks, Journal of Systems and Software 83 (12) (2010) 2627–2641.
[17] S. Lindsey, C.S. Raghavendra, K.M. Sivalingam, Data gathering algorithms in sensor networks using energy metrics, IEEE Transactions on Parallel Distributed Systems 13 (9) (2002) 924–935.
[18] S. Madden, M.J. Franklin, J. Hellerstein, W. Hong, TAG: a tiny aggregation service for ad-hoc sensor networks, SIGOPS Oper. Syst. Rev. 36 (SI) (2002) 131–146.
[19] A. Manjhi, S. Nath, P.B. Gibbons, Tributaries and deltas: efficient and robust aggregation in sensor network streams, in: Proceedings of ACM SIGMOD International Conference on Management of Data, 2005, pp. 287–298.
[20] F. Marcelloni, M. Vecchio, An efficient lossless compression algorithm for tiny nodes of monitoring wireless sensor networks, Computer Journal 52 (8) (2009) 969–987.
[21] S. Nath, P.B. Gibbons, S. Seshan, Z.R. Anderson, Synopsis diffusion for robust aggregation in sensor networks, in: Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys), 2004, pp. 250–262.
[22] Q. Ren, Q. Liang, Energy and quality aware query processing in wireless sensor database systems, Information Sciences 177 (10) (2007) 2188–2205.
[23] Sensorscope Project.
[24] M.A. Sharaf, J. Beaver, A. Labrinidis, P.K. Chrysanthis, Balancing energy efficiency and quality of aggregate data in sensor networks, The VLDB Journal 13 (4) (2004) 384–403.
[25] A. Sharma, L. Golubchik, R. Govindan, Sensor faults: detection methods and prevalence in real-world datasets, TOSN 6 (3).
[26] S. Siripanadorn, W. Hattagam, N. Teaumroong, Anomaly detection using self-organizing map and wavelets in wireless sensor networks, in: Proceedings of the 10th WSEAS International Conference on Applied Computer Science, 2010, pp. 291–297.
[27] A. Soheili, V. Kalogeraki, D. Gunopulos, Spatial queries in sensor networks, in: Proceedings of the 13th ACM International Workshop on Geographic Information Systems (ACM-GIS), 2005, pp. 61–70.
[28] I. Song, Y.J. Roh, M.-H. Kim, Content-based multipath routing for sensor networks, in: Proceedings of the 15th International Conference on Database Systems for Advanced Applications (DASFAA), 2010, pp. 520–534.
[29] M. Stern, K. Böhm, E. Buchmann, Processing continuous join queries in sensor networks: a filtering approach, in: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2010, pp. 267–278.
[30] E. Welzl, Smallest enclosing disks (balls and ellipsoids), in: New Results and New Trends in Computer Science, Lecture Notes in Computer Science, vol. 555, Springer, Berlin, Heidelberg, 1991, pp. 359–370.
[31] O. Younis, S. Fahmy, Distributed clustering in ad-hoc sensor networks: a hybrid, energy-efficient approach, in: Proceedings of the 23rd Conference of the IEEE Communications Society (INFOCOM), 2004, pp. 640–.
[32] Y. Zhuang, L. Chen, Max regional aggregate over sensor networks, in: Proceedings of the 25th International Conference on Data Engineering (ICDE), 2009, pp. 1295–1298.
