Overhearing-aided Data Caching in Wireless Ad Hoc Networks

Weigang Wu
Department of Computer Science, Sun Yat-sen University, Guangzhou 510275, China
[email protected]

Jiannong Cao, Xiaopeng Fan
Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
{csjcao, csxpfan}@comp.polyu.edu.hk

Abstract—The wireless ad hoc network is a promising networking technology that provides users with various network services anywhere, anytime. To cope with the resource constraints of wireless ad hoc networks, data caching is widely used to reduce data access cost. In this paper, we propose an efficient data caching algorithm for wireless ad hoc networks that makes use of the overhearing property of wireless communication to improve caching performance. Due to the broadcast nature of wireless links, a packet can be overheard by any node within the transmission range of the transmitter, even if the node is not the intended target. Our proposed algorithm exploits the overheard information, including data requests and data replies, to optimize cache placement and cache discovery. To the best of our knowledge, this is the first data caching algorithm that considers the overhearing property of wireless communications for performance improvement. The simulation results show that, compared with existing work, our proposed algorithm can significantly reduce both message cost and access delay.

Keywords: data cache, ad hoc network, overhearing

I. INTRODUCTION

In recent years, wireless ad hoc networks have received a lot of attention from both academia and industry due to their flexible deployment, low cost, and easy maintenance. Wireless ad hoc networks are especially suitable for scenarios where the deployment of network infrastructure is too costly or even impossible, e.g., outdoor assemblies, disaster recovery, and battlefields. In a wireless ad hoc network, there is no support from any fixed infrastructure (e.g., the base stations in 3G networks or the access points in wireless LANs), and each network node communicates with other nodes through multi-hop paths. A network node can be any computing device, ranging from a mobile computer, e.g., a PDA or laptop, to a backbone network device, e.g., a mesh router [AW05], or even an embedded sensor node [IG00][RK03].

Wireless ad hoc networks are resource constrained in terms of bandwidth, power, etc., so data access cost is a major concern. Data caching has been widely used to reduce data access cost in traditional computer networks [CL98][CZ07][KK01], and it is even more desirable and effective in wireless ad hoc networks. When data are delivered through multi-hop paths, caching the data at intermediate nodes can significantly reduce the message cost and consequently save various resources, from network

bandwidth to battery power. Accessing data at a cache node also helps reduce data access delay.

Quite a lot of work has been conducted on data caching in wireless ad hoc networks, covering cache placement [YC06][NS06][TG08], cache discovery [SI03], and cache consistency [DG04][HC07][JE97]. Cache placement refers to determining where and what to cache; cache discovery refers to the mechanism for finding and obtaining a cached data item; and cache consistency means ensuring that the data value in cache copies is consistent with the source copy at the data server. The first two problems are so closely related that they are usually studied together. Existing work on cache placement mainly focuses on selecting cache nodes based on information about data access frequency and network topology [LK02][NS06][CY04][TG08][YC06][DT04][HT01]. For cache discovery, research has focused on combining passive and active query approaches [SI03]. The properties of wireless communication have not been well explored in designing these algorithms for better performance.

In this paper, we propose an efficient data caching algorithm for wireless ad hoc networks, focusing on how to optimize the cost of cache placement and discovery. While an intended target node can always "hear" a packet, with broadcast wireless links the packet can also be received by any node within the transmission range, even if that node is not the intended target; in this case, we say that the node overhears the packet. Our algorithm makes use of this overhearing property to significantly improve the performance of data caching in several aspects. First, by overhearing, a requesting node can obtain data copies from an intermediate node on the path to the cache node, reducing the data access cost. Second, overhearing helps collect more data access information, e.g., access frequency, which is necessary for making cache placement decisions.
However, several issues need to be addressed in order to make use of overhearing. First of all, determining which requests can be served by overhearing is a key issue: the cache discovery mechanism must be carefully designed so as to capture as many overheard data copies as possible. Second, how to define and evaluate the access cost for cache placement must be reconsidered. In existing work, the access cost is defined by the distance between a requesting node and the nearest cache node. With overhearing, however, this

definition is no longer suitable, because a request may be answered by an intermediate node. Although security and privacy may be concerns in some applications, we focus on the data access cost in this work.

Based on the above observations and considerations, we have designed an overhearing-aided data caching algorithm that jointly considers the cache placement and cache discovery issues. To evaluate the data access cost with overhearing, we define a cost function based on the distance between the requestor and the replier rather than the cache node. To capture data copies through overhearing, we identify a sufficient condition for predictable data reply overhearing. To our knowledge, this is the first algorithm of its kind. To evaluate the performance of our proposed algorithm, we have conducted extensive simulations covering various scenarios with different parameter settings. The simulation results show that our proposed algorithm performs much better than existing algorithms in terms of both message cost and data access delay.

The rest of the paper is organized as follows. Section 2 briefly reviews existing work on cache placement and cache discovery in wireless ad hoc networks. The cache placement problem is formulated in Section 3, where we first describe the system model and then define the data access cost function, taking the overhearing property into consideration. In Section 4, we describe the details of our proposed data caching algorithm, including the data structures, message types, and the cache placement and discovery operations; we also discuss possible improvements of our algorithm. In Section 5, we report the performance evaluation results in comparison with existing work. Finally, Section 6 concludes the paper and points out future work.

II. RELATED WORK

Although quite a lot of work has been done on data caching in wireless ad hoc networks, none of the previous algorithms has considered the overhearing property of wireless links. In this section, we review the work on cache placement and discovery that is related to ours. Hara [HT01][HT02a][HT02b] proposed three popularity-based algorithms for cache placement in ad hoc networks, in which a mobile node selects data items to cache based on access frequency; the three algorithms differ in whether and how the accesses of neighboring hosts are considered. Yin and Cao [YC06] also proposed data caching techniques based on data access frequency; their algorithms differ in what is cached, i.e., the data itself or the path to the nearest cache node. A data caching scheme specifically for accessing multimedia objects in ad hoc networks was proposed by Lau et al. [LK02]. Benefit-based data caching algorithms can be found in

[XL02][NS06][BR01][TG08]. Xu et al. [XL02] discussed cache placement in tree networks. Nuggehalli et al. [NS06] formulated the cache placement problem with a single data item in ad hoc networks as a special case of the connected facility location problem [SK02], considering the tradeoff between the query delay and the overall energy consumption. Tang et al. [TG08] proposed a benefit-based greedy cache placement algorithm for ad hoc networks, in which cache placement is done based on the total data access cost of all the nodes in the network: the data items that reduce the total access cost more are cached with higher priority. Existing cache discovery approaches can be categorized into two classes: passive discovery and active discovery. In a passive system [XL02][YC06], the cache copy at one node is unknown to the other nodes, and data requests are always destined to the data source; when an intermediate node on the path has a cache copy, it stops forwarding the request and sends the cached data back to the requestor. In active cache discovery [TG08][SI03], a node first learns of the cache node, and the request is then sent to the cache node rather than to the source node.

III. SYSTEM MODEL AND PROBLEM FORMULATION

As mentioned, our proposed data caching algorithm is the first that takes overhearing into consideration to improve the performance of data caching. In this section, we first describe the system model, including the network model and the data cache model, and then formulate the cache placement problem with the overhearing property taken into account.

A. System Model
We consider a wireless ad hoc network that consists of n nodes. The network nodes communicate by sending and receiving messages through wireless links. Such a network can be represented as an undirected graph G(V,E), where the set of vertices V represents the nodes in the network and E is the set of weighted edges; two nodes that can communicate directly with each other are connected by an edge. For user applications or other purposes, there are multiple data items to be accessed by the network nodes. Each data item is served by its source node, and the data source is assumed to be known to all the nodes. To reduce data access cost, each network node caches multiple data items, subject to its memory capacity limitation. A node with a cache copy of some data item is called a cache node of that data item. For simplicity of presentation, the data source is also viewed as a cache node.
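The graph model above can be made concrete with a small sketch (illustrative Python, not part of the paper): the weighted distances d_il used later in the cost function are simply all-pairs shortest paths over G(V,E).

```python
import heapq

def shortest_distances(n, edges):
    """Dijkstra from every node over an undirected weighted graph.

    `edges` is a list of (u, v, w) tuples; returns dist[i][l], the weighted
    distance between nodes i and l used in the cost function of Section III.
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = []
    for src in range(n):
        d = [float("inf")] * n
        d[src] = 0
        pq = [(0, src)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > d[u]:
                continue  # stale queue entry
            for v, w in adj[u]:
                if du + w < d[v]:
                    d[v] = du + w
                    heapq.heappush(pq, (d[v], v))
        dist.append(d)
    return dist

# A 4-node chain 0 - 1 - 2 - 3 with unit-weight links.
d = shortest_distances(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1)])
print(d[0][3])  # 3
```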

B. Formulation of the Cache Placement Problem
The objective of cache placement is to minimize the total access cost of all the nodes in the network by appropriately placing the cache copies. The cache placement problem can be formulated as follows. Given a wireless ad hoc network graph G(V,E) with p data items D1, D2, …, Dp, where data item Dj is served by a source node Sj, the cache placement problem is to select a set of sets of cache nodes M = {M1, M2, …, Mp}, where each node in Mj stores a copy of Dj, so as to minimize the total access cost Γ(G, M):

Γ(G, M) = Σ_{i∈V} Σ_{j=1}^{p} a_{ij} · min_{l∈({S_j}∪M_j)} d_{il}        (1)

under the constraint ∀i ∈ V: |{Mj | i ∈ Mj}| ≤ mi, which specifies the memory capacity, i.e., each node i appears in at most mi sets of M. In the problem formulation, for clarity of presentation, we assume that all data items have a uniform size of one memory unit and that a node i has a memory of size mi for data caching. We use a_{ij} to denote the frequency with which node i requests the data item Dj, and d_{il} to denote the weighted distance between two nodes i and l. The cache placement problem is known to be NP-hard [BR01][JV01].
To formulate the cache placement problem with overhearing, we introduce the following additional notation:
sp_{il}: the shortest path from node i to node l, i.e., a chain of nodes from i to l;
op_{kj}: the probability that node k serves a request for data item Dj by overhearing.
Then, with overhearing, Function (1) can be re-defined as:

Γ(G, M) = Σ_{i∈V} Σ_{j=1}^{p} a_{ij} · min_{l∈({S_j}∪M_j)} ( Σ_{m∈sp_{il}} |sp_{im}| · op_{mj} · Π_{t∈(sp_{im}\{m})} (1 − op_{tj}) )        (2)

under the same constraint ∀i ∈ V: |{Mj | i ∈ Mj}| ≤ mi. The difference between the definitions of Γ(G, M) in (1) and (2) lies in the distance that a data request travels along the path between the requestor and the nearest cache node. Without overhearing, the requested data item is always obtained from a cache node. With overhearing, however, if the request is overheard by some intermediate node between the requestor and the cache node, the data will be obtained from that intermediate node rather than from the cache node.
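As an illustration of the two cost definitions, the following Python sketch computes Γ(G, M) of Function (1) and its overhearing-aware variant of Function (2). The helper names (`path`, `op`, `a`, `d`) and two modeling choices are ours, not the paper's: the holder l is assumed to answer with probability 1, and the hop count |sp_im| is taken as d[i][m], which is exact for unit-weight links.

```python
def total_cost(a, d, sources, placement):
    """Total access cost of Function (1): node i obtains item j from the
    nearest node holding it (the source S_j or a cache node in M_j)."""
    cost = 0.0
    for i in range(len(a)):
        for j in range(len(sources)):
            holders = {sources[j]} | placement[j]
            cost += a[i][j] * min(d[i][l] for l in holders)
    return cost


def total_cost_overhearing(a, d, sources, placement, path, op):
    """Expected access cost of Function (2): on the way to holder l, the
    request may already be answered by an intermediate node m with
    probability op[m][j], provided no earlier node on sp_il answered it."""
    cost = 0.0
    for i in range(len(a)):
        for j in range(len(sources)):
            holders = {sources[j]} | placement[j]
            best = float("inf")
            for l in holders:
                expected, p_not_yet = 0.0, 1.0
                for m in path(i, l):  # nodes of sp_il, ordered from i to l
                    p_m = 1.0 if m == l else op[m][j]  # assumption: l always answers
                    expected += d[i][m] * p_not_yet * p_m
                    p_not_yet *= 1.0 - p_m
                best = min(best, expected)
            cost += a[i][j] * best
    return cost
```

For example, on a unit-weight chain 0-1-2-3 with the source of the single item at node 3, an intermediate node 2 that always serves by overhearing cuts the expected cost for node 0 from 3 hops to 2.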

IV. OVERHEARING-AIDED CACHING ALGORITHM

Our overhearing-aided data caching algorithm consists of two parts: cache placement and cache discovery. The cache placement part lets a node choose which data items to cache using our proposed data access cost function. The cache discovery part realizes overhearing-aided data access by providing each node with a mechanism to serve

data requests by overhearing. In the rest of this section, we first define the cache placement metric "benefit", and then introduce the data structures and message types used in our algorithm. The major part of the section describes the detailed cache placement and cache discovery operations of our algorithm. Finally, we discuss how to handle node mobility.

A. Definition of Benefit
The benefit of caching a data item is defined as the change in the overall data access cost of the whole network. Based on Function (2), benefit is formally defined as follows.
Definition 1 (Benefit): the benefit of caching a data item Dj at a node i with respect to the current cache placement M is defined as:

B(i, j, M) =
  0                                                            if i ∈ M_j
  Γ(G, M) − Γ(G, M_{M_j∪{i}})                                  if i ∉ M_j and |{M_j | i ∈ M_j}| < m_i
  max_{∀t: i∈M_t} ( Γ(G, M) − Γ(G, M_{M_j∪{i}, M_t\{i}}) )     if i ∉ M_j and |{M_j | i ∈ M_j}| ≥ m_i
                                                               (3)

The notation M_X denotes the set M with the change X applied to its element sets. The first condition in Function (3) specifies that the benefit of caching Dj at node i is zero if i has already cached a copy of Dj. The second condition specifies that, if node i has not cached Dj and still has empty space to cache it, the benefit of caching Dj at node i is the resulting reduction in the access cost Γ. Finally, under the third condition, node i has not cached Dj and its cache space is full, so some existing cache item must be replaced if node i wants to cache Dj. Since the change in the data access cost depends on which cache item is replaced, the benefit of caching Dj at node i is defined as the maximum reduction in the data access cost Γ over all possible replacements.

B. Data Structures and Message Types
When executing our algorithm, a node maintains the cache information using the following data structures:
Nearest-cache table: each node uses this table to maintain the nearest cache node for each data item. For example, the table entry Nj indicates the nearest node with a cache copy of Dj.
Overheard-cache list: each element of this list is a Boolean flag Oj, which is set to "true" if a request for Dj is still pending and the reply to this request can be (over)heard. This list is maintained at each node individually.
Cache-list: the set of nodes caching a data item Dj, i.e., Mj. This list is maintained only by the source node of Dj.
Besides the data structures above, our data caching algorithm involves the following message types:
ADD(i, j): the message from node i to inform the other nodes that i has added a copy of Dj.
ADDEL(i, j, t, Mt): the message from node i to inform the other nodes that i has added a copy of Dj by replacing a copy of Dt; the updated cache-list of Dt, i.e., Mt, is also included in the message.
REQ(i, j, Nj): the request message from node i to the nearest cache node Nj for data item Dj.
REP(i, Dj): the reply message carrying data item Dj to the requestor, node i.
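Definition 1 can be sketched in code as follows (illustrative Python; `cost_of` stands for any implementation of Γ(G, M), and the container and parameter names are our own, not the paper's):

```python
def benefit(i, j, placement, capacity, cost_of):
    """Benefit of caching item j at node i, per Definition 1 / Function (3).

    `placement` is a list of sets (placement[j] = nodes caching item j),
    `capacity[i]` is node i's cache capacity in items, and `cost_of` maps a
    placement M to the total access cost Γ(G, M).
    """
    if i in placement[j]:
        return 0.0  # first case: already cached at i
    current = cost_of(placement)
    used = sum(1 for Mt in placement if i in Mt)

    def with_add(drop=None):
        # Copy M, add i to M_j, optionally evict i from M_drop.
        M = [set(Mt) for Mt in placement]
        M[j].add(i)
        if drop is not None:
            M[drop].discard(i)
        return M

    if used < capacity[i]:
        return current - cost_of(with_add())  # second case: free space
    # Third case: cache full -- best reduction over all possible evictions.
    return max(current - cost_of(with_add(drop=t))
               for t, Mt in enumerate(placement) if i in Mt)
```

With a toy linear cost model the three cases can be checked by hand: adding a copy whose removal saves 3 cost units yields benefit 3, while replacing a 3-unit copy by a 1-unit copy yields benefit −2.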

C. Cache Placement
The pseudo-code of the cache placement operations of our algorithm is shown in Fig. 1; it is executed by each node. When a data item Dj is (over)heard by a node i, i first calculates the benefit of caching Dj using Definition 1. The key parameters in Definition 1 are a_ij and op_kj, and a major concern is how to obtain these two values. As in many similar works, a commonly used approach is to let a node estimate the parameter values from the localized historical request information collected by listening to the surrounding traffic. The access frequency can be estimated from the data request messages. To estimate the overhearing probability, an additional flag needs to be added to each reply message, indicating whether the data copy included in the reply was obtained by overhearing. This estimation is simple but not accurate, because a node has to estimate the parameters of all the nodes based only on local information.

//Cache Placement Algorithm
//Executed at each node
COBEGIN:
(101) When Dj is (over)heard by a node i:
(102)   if (B(i, j, M) > TH) then /*TH is a threshold*/
(103)     cache Dj;
(104)     if (Dt has been replaced by Dj) then
(105)       get Mt from the server;
(106)       broadcast ADDEL(i, j, t, Mt);
(107)     else
(108)       broadcast ADD(i, j);
(109) When a node k (over)hears an ADD(i, j)/ADDEL(i, j, t, Mt):
(110)   if (node i is nearer than Nj) then
(111)     Nj ← i;
(112)   if (Nt = i) then
(113)     Nt ← nearest(Mt);
COEND

Fig. 1. The Overhearing-aided Algorithm -- cache placement
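One way to realize the local estimation of a_ij and op_kj described above is a simple counter-based sketch (illustrative Python; the class and method names are ours, and the overhearing flag in REP messages is the one the paper proposes to add):

```python
from collections import defaultdict

class TrafficEstimator:
    """Estimate access frequency a_ij and overhearing probability op_kj
    from (over)heard request and reply messages, as sketched in Section IV.C.
    Each REP message is assumed to carry a flag telling whether the enclosed
    data copy was served by overhearing."""

    def __init__(self):
        self.requests = defaultdict(int)   # (node, item) -> requests seen
        self.replies = defaultdict(int)    # (node, item) -> replies seen
        self.overheard = defaultdict(int)  # (node, item) -> replies served by overhearing
        self.elapsed = 0.0                 # observation window in seconds

    def on_request(self, node, item):
        self.requests[(node, item)] += 1

    def on_reply(self, node, item, served_by_overhearing):
        self.replies[(node, item)] += 1
        if served_by_overhearing:
            self.overheard[(node, item)] += 1

    def access_frequency(self, node, item):
        """Estimated a_ij: requests per second over the window."""
        return self.requests[(node, item)] / self.elapsed if self.elapsed else 0.0

    def overhearing_probability(self, node, item):
        """Estimated op_kj: fraction of observed replies served by overhearing."""
        total = self.replies[(node, item)]
        return self.overheard[(node, item)] / total if total else 0.0
```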

A more accurate calculation can be achieved by exchanging the access frequency and overhearing probability values among the nodes: each node calculates only its own access frequency and overhearing probability, and then sends these values to the other nodes. To save communication cost, the values can be piggybacked on the data request and reply messages.
Based on the benefit of caching the data item Dj, node i can determine whether Dj should be cached. If the benefit of caching Dj exceeds a specified threshold TH, node i caches it (line 102); TH can be specified by the user. If Dj is cached in some empty space, node i broadcasts an ADD(i, j) message to inform the other nodes about the new

cache copy. Otherwise, a cache copy of some data item Dt must have been replaced by Dj; node i first needs to get the cache-list Mt of Dt from the source node St, and then broadcasts an ADDEL(i, j, t, Mt) message. Upon (over)hearing an ADD(i, j) or ADDEL(i, j, t, Mt) message, a node k updates Nj to i if node i is nearer than its current Nj. Moreover, if Dt has been replaced by Dj at node i and node i is currently the nearest cache node of Dt known to node k, i.e., Nt = i, node k updates Nt based on Mt.

D. Cache Discovery
The major challenge in the design is to determine in which cases overhearing is useful and can be exploited for data caching. Overhearing may occur in numerous scenarios, but not all of them can be utilized. Intuitively, if a node knows that a data reply for some pending request will be heard or overheard, later requests for the same data item can certainly be served without sending them to the cache node. We call such an overhearing of a data reply a "predictable" reply overhearing. How can a node determine whether a reply overhearing is predictable? The question is answered by the following theorem.
Theorem 1 (Predictable Reply Overhearing): if a node m overhears a request for a data item x forwarded by some intermediate node (i.e., a node other than the requestor), then m can overhear the data x in the reply corresponding to the overheard request.
Proof. We prove the theorem by examining the possible overhearing scenarios one by one. The overhearing scenarios that may occur during data access are shown in Fig. 2. A data requestor i sends a request for data item x to the cache node l through intermediate nodes j, k, …, m, …. Without loss of generality, we assume that node m can serve the request by overhearing. The request is then not forwarded further once it arrives at node m, and a copy of data x is sent back to node i by node m. During this request and reply procedure, other nodes may overhear the request or reply messages. If an overhearing node can predict the coming data copy x, it can serve other requests for data x even though it is not a cache node itself. Obviously, the predictability of the overhearing of data x depends on the location of the overhearing node. According to the transmission ranges of the nodes, the whole region is divided into non-overlapping sub-regions. We examine the sub-regions one by one and mark the regions of predictable overhearing with a tick, as shown in Fig. 2. By inspecting all the ticked sub-regions, we conclude that the theorem holds.
□
Based on Theorem 1, we design the following mechanisms to handle data requests and replies; the pseudo-code is shown in Fig. 3. When a node i has a request for a data item Dj that is not cached locally, node i sends a request message REQ(i, j, Nj) to the nearest cache node Nj (line 202) and then waits for the corresponding reply.

[Fig. 2. Overhearing Scenarios in Cache Discovery. A requestor i sends a request for data x to the cache node l through intermediate nodes j, k, …, m. The sub-regions determined by the transmission ranges are marked as: no overhearing, predictable overhearing (the node overhears a request forwarded by some node other than the requestor i), or unpredictable.]

//Cache Discovery Algorithm
//Executed at each node
COBEGIN:
(201) When node i has an access request for Dj:
(202)   send REQ(i, j, Nj) to Nj;
(203)   wait for reply with Dj;
(204) When node k hears a request REQ(i, j, Nj):
(205)   if (k ∈ Mj) then
(206)     send REP(i, Dj) to node i;
(207)   else
(208)     if (Oj is true) then
(209)       put REQ(i, j, Nj) in pending list;
(210)     else
(211)       send REQ(i, j, Nj) to node Nj;
(212)       Oj ← true;
(213) When node l overhears a request REQ(i, j, Nj) from node k ≠ i:
(214)   Oj ← true;
(215) When node k (k ≠ i) (over)hears a reply REP(i, Dj):
(216)   Oj ← false;
(217)   send REP(i, Dj) to each pending requestor for Dj;
COEND

Fig. 3. The Overhearing-aided Algorithm -- cache discovery

The request message is forwarded by the intermediate nodes between i and Nj. If an intermediate node k has a cache copy of Dj (which may happen due to the inaccuracy of Nj), it replies to the requestor by sending a REP(i, Dj) message (line 206). Otherwise, if its flag Oj is "true", i.e., another request for Dj is still pending and the reply to that request can be (over)heard, node k can still serve the request: the request is put into the pending list, and node k stops propagating it and waits for the reply to the pending request (line 209). If neither of the above two cases holds, node k continues forwarding the request towards Nj (line 211) and sets Oj to "true" so that it can serve other requests for the same data item once the reply to the current request arrives (line 212).
Besides the intermediate nodes on the path between node i and Nj, other nodes may also receive the request for Dj by overhearing. If a node l overhears a request forwarded by some intermediate node (line 213), it sets its Oj to "true" so as to serve future requests for the same data item (line 214). Upon hearing or overhearing a reply message REP(i, Dj), a node resets Oj to "false" and forwards the reply to all pending requestors for Dj (lines 216-217).

E. Handling Node Mobility
In the algorithm described above, node mobility is not fully considered. The mobility of nodes may change the network topology from time to time, so that the nearest-cache table at a node becomes inaccurate. To cope with such a dynamic topology, two alternative approaches can be used [TG08]. First, we can maintain integrated cache-routing tables in the same way as routing tables are maintained in mobile ad hoc networks [PB94]; for example, a cache entry can be maintained for each data item, analogous to a routing table entry. Alternatively, we can let the data sources periodically broadcast the updated cache lists; on adding or deleting a cache item, the cache node simply sends the ADD or ADDEL message to the source.
Moreover, node mobility can also affect the accuracy of the condition for predictable overhearing. Due to topology changes, the reply to a request may not be delivered along the reverse path of the request message, and consequently the condition of Theorem 1 may not hold. To address this problem, a possible solution is to let the intermediate nodes forwarding a request wait for some time, trying to overhear the requested data, before forwarding the request on to the destination cache node. Of course, the duration of the waiting time must be set carefully; it can be determined according to the overhearing probability of the requested data.
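The cache discovery handlers of Fig. 3 can be sketched as follows (illustrative Python; routing, timers, and the actual wireless channel are abstracted by a `send(dst, msg)` callback, which is our simplification, not part of the paper):

```python
from collections import defaultdict

class DiscoveryNode:
    """Per-node cache discovery state and handlers, mirroring Fig. 3.
    `cached` stands for this node's membership in M_j for the items it caches."""

    def __init__(self, node_id, cached, send):
        self.id = node_id
        self.cached = cached
        self.send = send
        self.reply_expected = set()        # items j whose flag Oj is true
        self.pending = defaultdict(list)   # item -> requestors parked here

    def on_request(self, requestor, item, nearest_cache):
        """A REQ heard by this node (lines 204-212)."""
        if item in self.cached:
            self.send(requestor, ("REP", item))                 # line 206
        elif item in self.reply_expected:
            self.pending[item].append(requestor)                # line 209
        else:
            self.send(nearest_cache, ("REQ", requestor, item))  # line 211
            self.reply_expected.add(item)                       # line 212

    def on_overheard_request(self, forwarder, requestor, item):
        """A REQ overheard from a forwarding node (lines 213-214)."""
        if forwarder != requestor:  # the condition of Theorem 1
            self.reply_expected.add(item)

    def on_reply(self, item):
        """A REP heard or overheard (lines 215-217)."""
        self.reply_expected.discard(item)
        for r in self.pending.pop(item, []):
            self.send(r, ("REP", item))
```

A short trace shows the overhearing path: a non-cache node forwards the first request for an item, parks a second request for the same item, and serves the parked requestor as soon as the reply passes by.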

V. PERFORMANCE EVALUATION

To evaluate the performance of our proposed algorithm, we have carried out extensive simulations using the ns-2 simulator [NS07]. The algorithm proposed in [TG08] was also simulated for comparison. In the following, we first introduce the simulation setup and the performance metrics, and then report the simulation results.

A. Simulation Setup
The setup of our simulations includes the network setup and the data access model. In general, we follow the setup used in [TG08]. All the key settings are listed in Table 1.

1) Network Setup
We simulated a wireless ad hoc network with 100 nodes randomly distributed in a territory of 2000×500 m² (the small territory) or 2000×1000 m² (the large territory). We assume the network is static, i.e., the nodes do not move. IEEE 802.11 [IE03] was used as the MAC layer protocol, and the transmission range is about 250 meters. Following [TG08], we used DSDV [PB94] as the routing protocol, and the data caching algorithms were implemented as applications directly on top of DSDV. We assume that there are in total 1000 data items with varying sizes served by two sources; the sources are randomly placed in the network, and each source serves 500 data items. The data size follows the uniform distribution between 100 and 1500 bytes. Each node has a cache space of 40000 bytes, which can hold about 50 data items (i.e., 5% of the total number of data items, as in [TG08]).

2) Data Access Model
All the network nodes, including the sources, act as client nodes. Following [TG08], each client sends out data access requests at a constant rate: the time interval between two consecutive requests follows the exponential distribution, with the mean value varied from 5 to 40 seconds. The timeout for a reply is set to 40 seconds. To get stable results, each simulation was run for 10000 seconds.
Two data access patterns have been simulated. The first one is the grid pattern, in which the data access frequencies at a node depend on its geographic location, i.e., closely located nodes have similar data access frequencies. In the beginning, the 1000 data items are uniformly distributed over the network territory in a grid-like manner, resulting in a virtual coordinate for each data item. Then, each client requests the data items following a Zipf-like distribution [BP99], with the access frequencies of the data items ordered by the distance between a data item's virtual coordinate and the location of the client. More precisely, the probability of accessing the jth closest data item is

P_j = (1/j^θ) / (Σ_{h=1}^{1000} 1/h^θ),  where 0 ≤ θ ≤ 1.

When θ = 1, the above distribution is the strict Zipf distribution; when θ = 0, it reduces to the uniform distribution. As in [TG08] and [YC06], we set θ to 0.8 based on real web trace studies [BP99]. The second data access pattern is a random pattern, where each node uniformly accesses 200 randomly chosen data items.

Table 1. Key Settings in Simulations
Parameter                    Value
# of nodes                   100
Network territory (m²)       2000×500, 2000×1000
Transmission range (m)       250
MAC protocol                 IEEE 802.11
Routing protocol             DSDV
# of data items              1000
# of sources                 2
Data item size (byte)        100-1500
Cache space size (byte)      40000, 80000
Request arrival rate (1/s)   1/40-1/5
Timeout for reply (s)        40
Simulation time (s)          10000

B. Performance Metrics
We measured four performance metrics to evaluate the data caching system.
Access Path Length (APL): the average length of the path between a data requestor and the corresponding replier. The replier can be a cache node or a node overhearing the data. This metric directly reflects the data access cost of a cache placement M.
Total Message Cost (TMC): the total number of message transmissions performed for data caching and access by all the network nodes. This includes all request/reply messages and control messages, e.g., the broadcast of ADD messages. This is the most important metric, since it precisely reflects the total message cost of the whole data caching system.
Access Delay (AD): the average time interval between sending a data request and receiving the corresponding reply.
Success Ratio (SR): the percentage of requests that are replied before the timeout occurs.

[Fig. 4. The Access Path Length: (a) Small territory, different access patterns; (b) Random pattern, different territory sizes]
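The Zipf-like access pattern described in the data access model can be sketched as follows (illustrative Python; function names are ours):

```python
import random

def zipf_probabilities(n_items=1000, theta=0.8):
    """P_j = (1/j^theta) / sum_{h=1..n_items} (1/h^theta): the probability
    of requesting the j-th closest data item in the grid access pattern."""
    weights = [1.0 / (j ** theta) for j in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def draw_request(ranked_items, probs, rng=random):
    """Draw one request; `ranked_items` lists the data items ordered by the
    distance of their virtual grid coordinate from the requesting client."""
    return rng.choices(ranked_items, weights=probs, k=1)[0]

probs = zipf_probabilities(theta=0.8)    # the setting used in the simulations
uniform = zipf_probabilities(theta=0.0)  # theta = 0 reduces to the uniform case
```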

[Fig. 5. The Total Message Cost: (a) Small territory, different access patterns; (b) Random pattern, different territory sizes]

C.

Simulation Results

In this section, we present the simulation results and discuss them according to the performance metrics. In the following figures, the results of the algorithm proposed in [TG08] are labelled "TG" and the results of our proposed algorithm are labelled "OH".

1) Access Path Length

The results of APL are shown in Fig. 4. First, let us examine the effect of the request arrival rate. In general, the faster requests arrive, the fewer hops are needed to access the data. This can be explained as follows. When a data item is accessed, it may be cached by the requestor and by other nodes forwarding the reply. If some node requests the same data after a short time, the cached copy is, with high probability, still there. Therefore, APL decreases as the request arrival rate increases.

The effect of the access pattern is shown in Fig. 4-(a). When the request arrival rate is low, the APL under the grid pattern is smaller than that under the random pattern. This is easy to understand: in the grid pattern, nodes close to each other share more common interests in data access, so cache nodes are also close to the requestor, and the access path length is consequently reduced. However, when access requests arrive frequently, this effect of the grid pattern is weakened. The territory size affects both algorithms similarly, as shown in Fig. 4-(b). In a large territory, the distribution of nodes is sparse, so more hops are needed to access the data.

Now, let us compare the TG algorithm with our proposed one. In all cases, our algorithm needs fewer hops to access a data item. This advantage comes from overhearing: in our caching system, an intermediate node can reply to a request it overhears, which reduces the distance between the requestor and the replier.

2) Total Message Cost

Fig. 5 shows the results of TMC against the request arrival rate. In general, TMC of both algorithms increases with the request arrival rate, because more requests generate more messages. The access pattern and territory size affect TMC in much the same way as they affect APL, since TMC is directly determined by APL. Benefiting from overhearing, our algorithm saves significant message cost compared with the TG algorithm; in some cases, the difference reaches about 40%. The higher the request arrival rate, the more overheard replies appear and the more message cost is saved compared with TG.

3) Access Delay

We plot the results of access delay in Fig. 6. Under a high request arrival rate, the network traffic is also high, so the message delay is large and the data access delay increases accordingly. The access pattern and territory size do not affect access delay much, because the access delay is very short in all cases, so the changes they cause are not obvious. Compared with the TG algorithm, our algorithm always performs better, although the difference is small. This gain, too, comes from overhearing: as shown in Fig. 4, our algorithm obtains the data copy over a shorter path, so the access delay is also shorter than that of TG.

(a) Small territory, different access patterns (b) Random pattern, different territory sizes
Fig. 6. The Access Delay

4) Success Ratio

The success ratio of data access is shown in Fig. 7. The request arrival rate affects SR significantly. Under a high request arrival rate, the workload of the whole network is also high, and all messages, including reply messages, suffer from long delays; therefore, more replies cannot be delivered in time. The access pattern does not affect SR much (Fig. 7-(a)), but the effect of territory size is obvious (Fig. 7-(b)). In a small territory, the diameter of the network is also small, so cache information can be propagated very quickly. Therefore, there is less outdated cache information in a small territory, and both algorithms achieve a higher success ratio.

(a) Small territory, different access patterns (b) Random pattern, different territory sizes
Fig. 7. The Success Ratio
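The overhearing behavior credited above can be illustrated with a short sketch of a single node's logic: a node that overhears a request it can serve replies directly (shortening the access path), and a node that overhears a reply can satisfy its own pending request and cache the copy. This is a simplified illustration only, not the paper's exact protocol; the class name, message tuple format, and the always-cache policy are assumptions, and the benefit-based placement decision is elided.

```python
# Simplified sketch of overhearing-aided caching at one node.
# NOT the paper's exact protocol: message format and the
# always-cache policy here are illustrative assumptions.

class CacheNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.cache = {}       # data_id -> cached data item
        self.pending = set()  # data_ids this node is waiting for

    def on_overhear_request(self, data_id, requester_id):
        """Every node in radio range hears the request, not only the
        intended next hop. A node holding a cached copy answers
        directly, shortening the access path."""
        if data_id in self.cache:
            return ("REPLY", data_id, self.cache[data_id], requester_id)
        return None  # no copy: stay silent, let routing proceed

    def on_overhear_reply(self, data_id, data):
        """An overheard reply can satisfy this node's own pending
        request at no extra message cost, and the copy may be cached
        (placement decision elided; here we always cache)."""
        self.pending.discard(data_id)
        self.cache[data_id] = data
```

Note that because several nodes may have a request pending on the same overheard reply, a single cache miss at the replier can fail all of them at once, which is consistent with the slightly lower success ratio observed for OH.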

In comparison with the TG algorithm, our proposed algorithm achieves a slightly smaller success ratio, although the difference is minor. This can be explained as follows. Due to dynamic cache replacement, a request may be sent to a node that has already evicted the requested data, causing a cache miss and thus a data access failure. Cache misses occur in both algorithms, but they cause more access failures in ours: because of the overhearing mechanism, more than one request may be pending on the same reply, so one cache miss may cause several access failures.

From the above comparisons we can see that, compared with the TG algorithm, our proposed algorithm significantly reduces both message cost and access delay. Although TG performs slightly better in success ratio, the difference is minor. VI.

CONCLUSIONS AND FUTURE WORK

In this paper, we propose the first overhearing-aided data caching algorithm for wireless ad hoc networks. Due to the broadcast nature of wireless links, a data request or reply may be heard by all nodes within the transmission range, even if they are not the intended destinations. Based on this observation, we designed an efficient data caching algorithm that reduces data access cost by making use of overheard data copies. Our algorithm consists of two parts: cache placement and cache discovery. For cache placement, we proposed a benefit-based placement metric that takes the overhearing property into account when evaluating data access cost. For cache discovery, we designed a mechanism to capture data copies through overhearing. The simulation results show that our proposed algorithm performs much better than the existing algorithm in terms of both message cost and access delay.

In the future, we will evaluate our algorithm with more extensive simulations, covering both data update and node mobility. Moreover, we will pursue further improvements, including tolerance of packet loss and a higher success ratio. We also plan to design a popularity-based data caching algorithm with the help of overhearing.

ACKNOWLEDGEMENTS

This research is partially supported by National Natural Science Foundation of China (Grant No. 60803137), Hong Kong University Research Grant Council (CERG Grant No. PolyU 5105/E), Hundred Talents Program (Bairen Jihua) of Sun Yat-sen University (Grant for "Wu Weigang"), Guangdong Natural Science Fund Committee (Grant No. 8451027501001507), and Program for Doctor Degree Foundation of Ministry of Education of China (Grant No. 200805581008).

REFERENCES

[AW05] I. F. Akyildiz, X. Wang, and W. Wang, Wireless Mesh Networks: a Survey, Computer Networks, 47(4), Mar. 2005.
[BR01] I. Baev and R. Rajaraman, Approximation Algorithms for Data Placement in Arbitrary Networks, in Proc. of SODA'01, 2001.
[BP99] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, Web Caching and Zipf-like Distributions: Evidence and Implications, in Proc. of INFOCOM'99, 1999.
[CL98] P. Cao and C. Liu, Maintaining Strong Cache Consistency in the World Wide Web, IEEE Trans. on Computers, 47(4), 1998.
[CY04] G. Cao, L. Yin, and C. R. Das, Cooperative Cache-Based Data Access in Ad Hoc Networks, IEEE Computer, 37(2), Feb. 2004.
[CZ07] J. Cao, Y. Zhang, L. Xie, and G. Cao, Data Consistency for Cooperative Caching in Mobile Environments, IEEE Computer, 40(4), Apr. 2007, pp. 60-66.
[DG04] Y. Du and S. Gupta, Cache Management in Wireless and Mobile Computing Environments, Chapter 15, Mobile Computing Handbook (edited by M. Ilyas and I. Mahgoub), CRC Press, Nov. 2004.
[HC07] Y. Huang, J. Cao, Z. Wang, B. Jin, and Y. Feng, Achieving Flexible Cache Consistency for Pervasive Internet Access, in Proc. of PerCom'07, 2007.
[HT01] T. Hara, Effective Replica Allocation in Ad Hoc Networks for Improving Data Accessibility, in Proc. of INFOCOM'01, 2001.
[HT02a] T. Hara, Cooperative Caching by Mobile Clients in Push-based Information Systems, in Proc. of CIKM'02, 2002.
[HT02b] T. Hara, Replica Allocation in Ad Hoc Networks with Periodic Data Update, in Proc. of MDM'02, 2002.
[IE03] IEEE Computer Society, Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, ANSI/IEEE Std 802.11, June 2003.
[IG00] C. Intanagonwiwat, R. Govindan, and D. Estrin, Directed Diffusion: a Scalable and Robust Communication Paradigm for Sensor Networks, in Proc. of MOBICOM'00, 2000.
[JE97] J. Jing, A. Elmagarmid, A. S. Helal, and R. Alonso, Bit-sequences: an Adaptive Cache Invalidation Method in Mobile Client/Server Environments, ACM Mobile Networks and Applications, 2(2), 1997, pp. 115-127.
[JV01] K. Jain and V. V. Vazirani, Approximation Algorithms for Metric Facility Location and k-median Problems Using the Primal-dual Schema and Lagrangian Relaxation, Journal of the ACM, 48(2), 2001.
[KK01] A. Kahol, S. Khurana, S. Gupta, and P. Srimani, A Strategy to Manage Cache Consistency in a Disconnected Distributed Environment, IEEE Trans. on Parallel and Distributed Systems, 12(7), 2001, pp. 686-700.
[LK02] W. Lau, M. Kumar, and S. Venkatesh, A Cooperative Cache Architecture in Supporting Caching Multimedia Objects in MANETs, in Proc. of WoWMoM'02, 2002.
[NS06] P. Nuggehalli, V. Srinivasan, and C. Chiasserini, Energy-efficient Caching Strategies in Ad Hoc Wireless Networks, IEEE/ACM Transactions on Networking, 14(5), Oct. 2006.
[NS07] The ns-2 project, http://www.isi.edu/nsnam/ns/
[PB94] C. Perkins and P. Bhagwat, Highly Dynamic Destination-Sequenced Distance-Vector Routing (DSDV) for Mobile Computers, in Proc. of SIGCOMM'94, 1994.
[RK03] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin, and F. Yu, Data-centric Storage in Sensornets with GHT, a Geographic Hash Table, Mobile Networks and Applications, 8(4), 2003.
[SI03] F. Sailhan and V. Issarny, Cooperative Caching in Ad Hoc Networks, in Proc. of MDM'03, 2003.
[SK02] C. Swamy and A. Kumar, Primal-dual Algorithms for Connected Facility Location Problems, in Proc. of APPROX'02, 2002.
[TG08] B. Tang, H. Gupta, and S. Das, Benefit-based Data Caching in Ad Hoc Networks, IEEE Transactions on Mobile Computing, 7(3), Mar. 2008, pp. 289-304.
[XL02] J. Xu, B. Li, and D. L. Lee, Placement Problems for Transparent Data Replication Proxy Services, IEEE JSAC, 20(7), 2002.
[YC06] L. Yin and G. Cao, Supporting Cooperative Caching in Ad Hoc Networks, IEEE Transactions on Mobile Computing, 5(1), Jan. 2006, pp. 77-89.
