
Enhancing Location Service Scalability with HIGH-GRADE

Yinzhe Yu∗, Guor-Huar Lu†, and Zhi-Li Zhang∗
∗Dept. of Computer Science & Engineering, †Dept. of Electrical & Computer Engineering
University of Minnesota, Twin Cities
∗{yyu,zhzhang}@cs.umn.edu, †[email protected]

Abstract— Location-based routing significantly reduces routing overhead in mobile ad hoc networks (MANETs) by utilizing the position information of mobile nodes in forwarding decisions. A location service is therefore critical to location-based routing, whose scalability hinges largely on the overhead of that service. Although several location service schemes have been proposed, most focus on only one or two aspects of scalability in their performance evaluations, and a comprehensive comparative study is missing. In this paper, we first explore the design space of location services and present a taxonomy of existing schemes. We then propose HIGH-GRADE, a new location service scheme that employs a multi-level hierarchical location server structure and a multi-grained organization of location information. We develop a uniform theoretical framework to analyze HIGH-GRADE and four other existing schemes in terms of three metrics: location maintenance cost, location query cost, and storage requirement cost. We show that the design of a location service scheme involves tradeoffs among all three kinds of overhead. Further, in our theoretical analysis and simulation experiments, HIGH-GRADE demonstrates superior scalability, especially when a localized data traffic pattern is assumed.

I. INTRODUCTION

Mobile ad hoc networks (MANETs), being fast and cheap to deploy and requiring little user configuration, are a key enabling technology of future mobile computing environments. However, the design of scalable routing protocols in MANETs remains a challenging research problem [1]. Classical protocols, such as DSR [2] and AODV [3], focus mainly on accommodating node mobility. These protocols discover routes on demand by flooding query messages across the entire network, and are therefore generally considered unsuitable for networks beyond a few hundred nodes, as the cost of flooding becomes prohibitive [4]. The recently proposed location-based routing protocols [5], [6], [7] are promising candidates for achieving scalable ad hoc routing. Assuming the availability of geographical location information for nodes (e.g., via GPS or other techniques), location-based routing allows nodes to make forwarding decisions based on the packet's destination location and neighboring nodes' locations (e.g., by greedily forwarding to the neighbor closest to the destination). Hence nodes maintain only local information about neighbors, rather than traditional "routing tables" covering all potential destinations.

This work was supported in part by the National Science Foundation under the grants ANI-0073819, ITR-0085824, and CAREER Award NCR-9734428. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Nevertheless, location-based routing introduces a new problem, i.e., the need for a location service: before packets can be forwarded, a (source) node needs to discover the location of a destination node. In ad hoc networks, a location service is often designed as a cooperative service in which every node is both a user and a server¹. Each node stores its location information on some other nodes, called its location servers. As a node moves around, it updates its servers with its current location. When a node wants to find the location of another (destination) node, it queries the appropriate location server(s) to retrieve the desired information. Hence location-based routing shifts the routing scalability problem from route discovery and maintenance to location discovery and maintenance, and the scalability of the location service is critical to the overall scalability of location-based ad hoc routing. Although many location service schemes for MANETs have been proposed, including GLS [7], SLURP [9], SLALoM [10], DLM [11], and Hierarchical Grid [12], several important issues remain to be addressed. First, the evaluations of the scalability properties of these schemes focus on only one or two metrics. For instance, [12] focuses mainly on the location update cost, while [10] and [11] consider the location query cost as well. We contend that in addition to these two metrics, the memory or storage requirement is also important, as a large number of cheap and small devices will be present in future ad hoc networks. In fact, as we will show in our performance study, the design of a location service involves tradeoffs among all three of these metrics. Second, we believe that the scalability of location services should be considered under traffic patterns (i.e., communication patterns among nodes) other than the uniform pattern, which is the only pattern considered in the aforementioned studies.
Of particular importance are localized traffic patterns, i.e., nodes are more likely to communicate with nodes in their nearby area than with those at a large distance. Such traffic patterns are prevalent in many human and computer communication environments, and are also highly likely in a large MANET. In this paper we first explore the location service design space and present a taxonomy of the existing schemes. We then propose a new scheme, HIerarchical Geographical Hashing with multi-GRained Address DElegation (HIGH-GRADE), which offers many advantages over existing schemes. We also present a common theoretical framework for studying

¹There are location services in which only a subset of all the nodes serve as location servers [8]. However, in this paper, we focus on those in which all nodes participate in a cooperative manner.

Fig. 1. Server organizations: (a) flat; (b) two-level; (c) multi-level hierarchical.

the location service scalability, based on which we analyze HIGH-GRADE as well as four other schemes. We show that HIGH-GRADE strikes the right balance among the three performance metrics and has superior scalability, especially under a localized traffic pattern. The analytical results are then validated by simulation experiments. We believe that our study provides valuable insights into the issue of MANET routing scalability and sheds light on the comparative performance of the various location service schemes. In the rest of the paper, we first discuss the design space and tradeoffs in Section II. In Section III, we present HIGH-GRADE. A comprehensive comparative study based on analytical modeling is described in Section IV. Section V presents simulation results comparing HIGH-GRADE and GLS. We conclude the paper in Section VI.

II. LOCATION SERVICE: THE PROBLEM AND THE DESIGN SPACE

The basic location service problem can be described as follows: for a node with unique ID B wishing to communicate with another node with unique ID A, how can the current location of node A be discovered, i.e., how can a node's ID be mapped to its current location in a distributed and dynamic manner? More specifically, how should A choose a set of nodes as its location servers? How should A update its location servers as it moves to ensure freshness of the information? How should B find the appropriate servers to query for A's location? Consider a MANET with N nodes, each with a unique ID, uniformly distributed in a square region of area A. Our basic design goal is to make the location service scale well as N grows.

A. Location Server Organization Structure

The first question is which nodes to select as the location server(s) of an arbitrary node A — the problem of location server organization. The simplest organization structure is a flat structure, used in SLURP [9]. In this method, the network area is divided into a flat grid of "squares".
A hash function is applied to node A's ID to obtain random (x, y) coordinates in the entire network area. The square containing the hashed coordinates is defined as A's "home square", and A's location information is stored on all the nodes in its home square. To query A's location, a node B applies the same hash function to A's ID to find A's home square. B can then forward a query packet to that square to retrieve the location of A. Figure 1(a) illustrates this method, where the solid line represents the location update path and the dashed line represents the location query path. A well-known drawback of the flat structure is that the expected distance a query packet of B needs to travel to retrieve A's location grows in proportion to the network size, i.e., even if B is geographically close to A, its query packet travels a long distance. To remedy this, a two-level structure is employed in SLALoM and DLM, where the entire network area is divided into many "level-2 squares". Node A's ID is hashed to the (same) point in each of the level-2 squares. Node A thus has one home square in every level-2 square. The distance a query packet needs to travel is then bounded by the size of a level-2 square, as illustrated in Figure 1(b). The two-level structure reduces the location query cost, but increases the cost of updating location servers — many home squares need to be updated. Note that in the two-level structure, location servers are distributed uniformly. A design alternative is to distribute the servers such that they are denser in nearby areas, so that nearby nodes incur a low location query cost. Such a design is especially attractive for localized traffic patterns, where many queries come from nearby nodes. This method, which we call the multi-level hierarchical structure, is employed by GLS and our HIGH-GRADE scheme. GLS assumes a square network area, called the level-H square (where H is the total number of levels in the hierarchy).
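The flat hashing step can be sketched in a few lines of Python. The grid dimensions and the SHA-256-based hash below are illustrative assumptions, not parameters taken from SLURP itself:

```python
import hashlib

SIDE = 1000.0   # network side length in meters (assumed)
SQUARE = 100.0  # side of one grid square (assumed)

def home_square(node_id, side=SIDE, square=SQUARE):
    """Hash a node ID to deterministic (x, y) coordinates, then return
    the grid square containing them; updater and querier agree."""
    d = hashlib.sha256(node_id.encode()).digest()
    x = int.from_bytes(d[0:4], "big") / 2 ** 32 * side
    y = int.from_bytes(d[4:8], "big") / 2 ** 32 * side
    return (int(x // square), int(y // square))

# any node can locate A's home square from A's ID alone
print(home_square("node-A"))
```

Because every node runs the same deterministic hash, no coordination is needed for an updater and a querier to rendezvous at the same square.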
The level-H square is divided into four level-(H−1) squares, each of which is further divided into four level-(H−2) squares, and so on, until finally reaching the level-0 squares. In each level-i square (for i > 0), node A selects three location servers, one in each of the level-(i−1) square quadrants that A is not in, as illustrated in Figure 1(c). HIGH-GRADE employs a similar hierarchical grid over the network, but uses a different method for selecting servers and storing location information, as we will explain in Section III.


Fig. 2. Taxonomy of location service schemes along two design dimensions.

                        Server organization
Location granularity    flat      two-level                multi-level
multi-grained           —         DLM (partial address)    HIGH-GRADE
two-level grained       —         SLALoM                   —
single grained          SLURP     DLM (complete address)   GLS

B. Granularity of Location Information

How frequently a node updates its location servers depends on the granularity of the location information stored on the servers. A straightforward method, called the single-grained strategy and used in SLURP and GLS, is to store the exact location. However, since exact location information becomes stale quickly, frequent and expensive updates are required. SLALoM employs two-level grained location information. To do this, SLALoM defines the "home squares near A" as the nine home squares closest to A, i.e., the home square in the level-2 square A resides in, and the home squares in the eight immediately surrounding level-2 squares. All the home squares of A ("near A" or not) know which level-2 square A is in (the coarse-grained location); in addition, all the home squares near A know the exact location of A (the fine-grained location). Therefore, only nearby servers need to be updated frequently. To carry the idea of location granularity a step further, HIGH-GRADE adopts a strategy of multi-grained location information. In HIGH-GRADE, a node A has location servers in each level-i square it resides in. Instead of storing the exact location of A on all these servers, a server serving a level-i square stores only the coarse information of "which level-(i−1) square A is in". In this way, location servers in higher-level squares (i.e., those far away from A) need only infrequent updates. On the other hand, by tracing a series of servers at lower and lower levels, the exact location of A can eventually be obtained. Finally, the DLM scheme offers two options for location granularity: either the "complete address" or a "partial address" can be stored at a server. The partial-address method is similar to HIGH-GRADE's multi-grained location information.

C. Design Tradeoffs

In Figure 2, we present a taxonomy of the five schemes over two design dimensions — server organization and location information granularity. We note that many tradeoffs are involved when we make a design choice. For example, SLALoM and DLM try to reduce the high location query cost of SLURP by changing from a flat server structure to a two-level one. But they may increase the storage overhead of each server, as more duplicates of a node's location information are stored across the network. As HIGH-GRADE uses coarse-grained location information, it may reduce the location update cost, but it also requires multiple servers to be queried to assemble the exact location of a node. Therefore, a good design must be based on clear design goals, and careful analysis of the involved tradeoffs is necessary to make the final call.

D. Other Related Work

We note that address/location management research has been active for many years in the context of packet radio and cellular networks. Lauer [13] described an address management scheme for a three-level hierarchical packet radio network, where nodes are organized into clusters and superclusters. Kasera and Ramanathan [14] proposed a similar location management protocol for cellular networks. Although the hierarchical clustering and routing schemes in these works bear a design philosophy similar to HIGH-GRADE's, there are several important differences. [13], [14] assume a hierarchical clustering algorithm based on the network topology, while in HIGH-GRADE the hierarchy is organized naturally using the nodes' geographical locations, without additional clustering cost. In addition, [13], [14] rely on cluster heads to take on the routing responsibility, while in HIGH-GRADE the location management is distributed over all the nodes. Therefore, the approach of [13], [14] is more suitable for an infrastructure-based network, where base stations are deployed to serve the special functions of a cluster, as described in [14].

III. THE HIGH-GRADE SCHEME

In this section, we first introduce a technique called Geographically Scoped Hashing (GSH), on which the multi-level hierarchical server structure of HIGH-GRADE is built. We then describe the location update and query procedures in HIGH-GRADE.

A. Geographically Scoped Hashing

We consider a mobile ad hoc network with N nodes in a square region of area A. In HIGH-GRADE, we recursively divide the network area into a quad-tree-like hierarchy of squares. At the top level, the entire area is called the level-H square, where H is a system parameter related to the total number of levels in the hierarchy.
The level-H square is divided into four quadrants, called the level-(H−1) squares, each of which is further divided into four quadrants as well, and so on, until the entire region is divided into 4^H level-0 squares. We denote the side length of a level-0 square by R (R = √A / 2^H). Figure 3 illustrates this hierarchy of squares with an example where H = 3. We note that in such a hierarchy, each node A resides in exactly one level-i square for each 0 ≤ i ≤ H. By applying a Cartesian coordinate system to our network area, with the lower left point as the origin of the system, we can formally define the level-i square of a node A as follows.

Definition 1 (level-i square): Let the current coordinates of a node A be (x_A, y_A). The level-i square of A is a square area

Fig. 3. Network hierarchy and location query in HIGH-GRADE. (The figure marks the level-0 through level-3 square boundaries and the location server candidates (LSCs) at each level.)

Fig. 4. Server selections around an LSP: (a) Perimeter based; (b) Perimeter and range based.

with lower left point (called its origin) (x^o_{A,i}, y^o_{A,i}) and side length 2^i R, where

x^o_{A,i} = x_A − (x_A mod 2^i R),
y^o_{A,i} = y_A − (y_A mod 2^i R).

From Definition 1 we see that the coordinates (x_A, y_A) of a node A can be decomposed into two parts: the coordinates of its level-i square origin and its relative position within the square. Figure 3 highlights the level-0, 1, 2 squares of node A with three levels of shading. With the level-i squares defined, a node A selects one location server (set) in each level-i square (0 ≤ i ≤ H), based on the concept of the location server point (LSP).

Definition 2 (Location Server Point): Node A's location server point in its level-i square is the point with coordinates LSP_{A,i} = (x^l_{A,i}, y^l_{A,i}) = (x^o_{A,i}, y^o_{A,i}) + h_i(A.ID), where (x^o_{A,i}, y^o_{A,i}) is A's level-i square origin, and h_i is a uniform hash function that hashes a node's ID to a relative position in a level-i square.

Note that Definition 2 assumes H + 1 uniform hash functions well known to all nodes in the network. In Figure 3, node A's four LSPs are identified by four different markers (with solid drawings). As each LSP is a hashed point within a geographically scoped area (a level-i square), we call this technique Geographically Scoped Hashing (GSH).
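Definitions 1 and 2 translate directly into code. The sketch below assumes R = 100 (a hypothetical value) and uses a SHA-256-based stand-in for the uniform hash functions h_i:

```python
import hashlib

R = 100.0  # assumed side length of a level-0 square (hypothetical)

def level_i_origin(x, y, i, r=R):
    """Origin (lower-left corner) of the level-i square containing (x, y),
    per Definition 1: each coordinate snapped down to a multiple of 2^i * R."""
    side = (2 ** i) * r
    return (x - x % side, y - y % side)

def h_i(node_id, i, side):
    """Stand-in for the uniform hash h_i: node ID -> relative position
    inside a level-i square (SHA-256 based, an illustrative choice)."""
    d = hashlib.sha256(f"{node_id}:{i}".encode()).digest()
    return (int.from_bytes(d[0:4], "big") / 2 ** 32 * side,
            int.from_bytes(d[4:8], "big") / 2 ** 32 * side)

def lsp(node_id, x, y, i, r=R):
    """LSP_{A,i} = level-i square origin of A + h_i(A.ID) (Definition 2)."""
    ox, oy = level_i_origin(x, y, i, r)
    dx, dy = h_i(node_id, i, (2 ** i) * r)
    return (ox + dx, oy + dy)

# all nodes in the same level-1 square derive the same LSP for node "A"
assert lsp("A", 250.0, 330.0, 1) == lsp("A", 260.0, 350.0, 1)
```

The final assertion illustrates the key property of GSH: the LSP depends only on the target's ID and the enclosing square, not on who computes it.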

B. Server Selection Around LSPs

An LSP provides a rendezvous point for a pair of nodes A and B, at which B can look up A's location. However, there is typically no node at the exact position of an LSP. We therefore need a consistent way of choosing location servers around the LSP, so that location updates and location queries reach the same nodes. In position-based forwarding schemes such as FACE-II [15] and GPSR [5], packets operate in two modes: greedy mode and perimeter mode. When an intermediate node has a neighbor

closer to the destination, it adopts the greedy strategy and forwards the packet to the neighbor closest to the destination. Otherwise, the packet is set to the perimeter mode, and is forwarded according to the “right hand rule” based on a planar subgraph of the network topology. When there is no node at the exact position of an LSP, a packet targeting the LSP will circulate on the “perimeter” around the LSP, as shown in Figure 4(a). We can use the following two strategies to select location servers around an LSP. 1) Store the location information on all the nodes on the perimeter enclosing the LSP. This strategy is used in [16] for sensor networks. It is illustrated in Figure 4(a). 2) Store the location information on those nodes that are on the perimeter and within a distance threshold κ of the LSP. This strategy is illustrated in Figure 4(b). A good property of the second strategy is that it creates an upper bound on the average number of nodes that are selected around each LSP (assuming a constant node density γ). However, when the node distribution is non-uniform across the network area and there are large holes in the network, the second strategy may result in an “empty” LSP with no location server around it.
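The perimeter-and-range strategy amounts to a distance filter over the perimeter nodes. A minimal sketch, with a hypothetical threshold κ = 40 and hand-picked perimeter coordinates:

```python
from math import dist

KAPPA = 40.0  # assumed distance threshold around the LSP

def select_servers(perimeter_nodes, lsp_point, kappa=KAPPA):
    """Perimeter-and-range strategy: of the nodes on the perimeter that
    encloses the LSP, keep only those within distance kappa of it."""
    return [n for n in perimeter_nodes if dist(n, lsp_point) <= kappa]

# hypothetical perimeter found by a GPSR-style packet circling the LSP
perimeter = [(10.0, 0.0), (35.0, 20.0), (80.0, 5.0), (0.0, 90.0)]
print(select_servers(perimeter, (20.0, 10.0)))
```

With a constant node density, the filter caps the expected server count at roughly the number of nodes in a disk of radius κ, which is the bound used in the analysis of Section IV.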

C. Multi-Grained Location Information

In HIGH-GRADE, multi-grained location information is stored on servers at different levels. Specifically, a level-i (i > 0) server of node A stores information on "which level-(i−1) square A resides in"; only level-0 servers store A's exact location. The following definition gives the exact format of the location records stored on the servers.

Definition 3 (Location Record): Suppose C is a level-i server of node A. The location record stored on C, denoted Record(C, A), is a triple of the following form:

Record(C, A) = (A.ID, (x^o_{A,i−1}, y^o_{A,i−1}), i − 1)   if 1 ≤ i ≤ H;
Record(C, A) = (A.ID, (x_A, y_A), NULL)                     if i = 0.

Note that the record stored in level-i (i > 0) servers uniquely specifies the level-(i − 1) square A is in, while a record stored in a level-0 server contains the exact location and a NULL value to indicate that the location (instead of a square) is stored.
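The two record forms of Definition 3 can be modeled as one tagged structure. In this sketch, Python's `None` stands in for the NULL marker and R = 100 is an assumed level-0 side length:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

R = 100.0  # assumed level-0 square side length (hypothetical)

@dataclass(frozen=True)
class Record:
    """Location record per Definition 3; `level` is None for a level-0
    server (mirroring NULL), else the level of the stored square."""
    node_id: str
    position: Tuple[float, float]
    level: Optional[int]

def make_record(node_id, x, y, server_level, r=R):
    """Record held by a level-i server: the level-(i-1) square origin for
    i > 0, or the exact position for i = 0."""
    if server_level == 0:
        return Record(node_id, (x, y), None)
    side = 2 ** (server_level - 1) * r
    return Record(node_id, (x - x % side, y - y % side), server_level - 1)

print(make_record("A", 250.0, 330.0, 1))  # stores A's level-0 square origin
```

The level field lets a querier tell at a glance whether a record pins down an exact position or only a square to descend into.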


D. Location Update and Maintenance

Based on Definition 3, as long as node A remains in its level-i square, all its level-j servers (∀j > i) have current information and need not be updated. This gives the following rule for location updates in HIGH-GRADE.

Rule 1 (Location Update): A level-j server of node A needs to be updated when and only when A moves across a level-i square boundary for i ≥ j − 1.

The location information stored on location servers needs to be maintained as servers move around: some servers may move away from the LSP, while other nodes may move close and become perimeter nodes. To accomplish this task, HIGH-GRADE uses a perimeter refresh protocol similar to [16]. One server on the perimeter is elected as the "home node" of an LSP (e.g., the first node encountered when the perimeter is traversed). The home node refreshes the perimeter periodically by sending update messages towards the LSP. When the home node moves away so that it is no longer on the perimeter, a new home node is elected at the next perimeter refresh.

E. Location Query

We now describe how a node B finds the location of A through A's location servers, starting with the concept of the minimal common square (MCS) of two nodes.

Definition 4 (Minimal Common Square): The minimal common square of A and B is the level-i square of A (or B) such that

i = min{j | x^o_{A,j} = x^o_{B,j}, y^o_{A,j} = y^o_{B,j}},

where (x^o_{A,j}, y^o_{A,j}) and (x^o_{B,j}, y^o_{B,j}) are A's and B's level-j square origins, respectively.

The MCS of A and B is their first common level-j square as we examine A's and B's level-j squares with increasing j values. Note that the MCS of A and B always exists, since x^o_{A,H} = x^o_{B,H} = 0 and y^o_{A,H} = y^o_{B,H} = 0, i.e., the level-H square is always a common square. The MCS is important because if B knows that A and B's MCS is at level i, B can obtain LSP_{A,i} immediately by adding h_i(ID(A)) to the origin of the MCS. Unfortunately, B has no way to know the MCS a priori.
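From a global viewpoint where both nodes' coordinates are known, the MCS level of Definition 4 can be computed directly. A minimal sketch, with assumed values R = 100 and H = 4:

```python
R, H = 100.0, 4  # assumed level-0 side length and number of levels

def level_i_origin(x, y, i, r=R):
    """Definition 1 restated: lower-left corner of the level-i square."""
    side = (2 ** i) * r
    return (x - x % side, y - y % side)

def mcs_level(a, b, h=H, r=R):
    """Smallest i at which A's and B's level-i square origins coincide
    (Definition 4); the level-h square is common to all nodes, so the
    search always terminates."""
    for i in range(h + 1):
        if level_i_origin(*a, i, r) == level_i_origin(*b, i, r):
            return i
    return h

print(mcs_level((50.0, 50.0), (60.0, 70.0)))   # same level-0 square
print(mcs_level((50.0, 50.0), (150.0, 50.0)))  # adjacent level-0 squares
```

In the protocol itself B cannot run this computation, since A's coordinates are exactly what it is trying to discover; this is why the query procedure below probes pLSPs level by level instead.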
Therefore, it may need to make several unsuccessful attempts, at locations we call potential location server points (pLSPs).

Definition 5 (Potential LSP): A pLSP computed by node B for a lookup of A's location in a level-i square is pLSP_{B,A,i} = (x^l_{B,A,i}, y^l_{B,A,i}) = (x^o_{B,i}, y^o_{B,i}) + h_i(ID(A)).

The same set of hash functions h_i is used in both the LSP and pLSP computations. For an LSP, the hashed point is applied to A's level-i square, while for a pLSP, B's level-i square is used. By the definitions of LSP, pLSP, and MCS, we have the following property: if the MCS of A and B is a level-i square, then i = min{j | LSP_{A,j} = pLSP_{B,A,j}}. Based on this property, we have the following strategy for B to find A's location: B queries pLSP_{B,A,j} sequentially for increasing j. Suppose the MCS of A and B is a level-i square; then the first i queries will fail, as pLSP_{B,A,j} ≠ LSP_{A,j} for

j < i. When a query at pLSP_{B,A,j} fails, the home node at that pLSP re-forwards the query to the next pLSP, i.e., pLSP_{B,A,j+1}, until a location server is finally reached (at LSP_{A,i}). The location server at LSP_{A,i} can then re-forward the query sequentially to LSP_{A,j} for decreasing values of j, until a server at LSP_{A,0} receives the query and replies to B with the current location of A. Figure 3 shows an example of how the query message is relayed when node B queries the location of node A. Note that in the figure, markers with dashed drawings are pLSPs, while markers with solid drawings are LSPs.

IV. COMPARATIVE STUDY BASED ON ANALYTIC MODELS

In this section, we develop analytic models to compare the scalability of HIGH-GRADE and existing schemes, including GLS, DLM, SLURP, and SLALoM. In particular, we focus on how they scale with the size of the network, represented by the number of nodes N, and the moving speed of nodes, denoted by v. We first define three metrics: location maintenance cost, location query cost, and storage requirement cost. In all the schemes, a node needs to update its location servers. In addition, some schemes require servers to maintain the location information among themselves (e.g., perimeter refreshing in HIGH-GRADE). These overheads serve to keep fresh location information on location servers, and we define them as the location maintenance cost.

Definition 6 (Location Maintenance Cost): The location maintenance cost C_m is defined as the number of forwarding operations each node needs to perform in a second to handle location update and location maintenance packets.

Note that we measure the cost in terms of the forwarding load. It is natural to correlate the number of packet forwardings a node performs with the node's CPU processing power and battery power consumption, two important scalability constraints. Similarly, we define the location query cost and storage requirement cost as follows.
Definition 7 (Location Query Cost): The location query cost C_q is the number of packet forwarding operations due to location queries that each node needs to perform in a second.

Definition 8 (Storage Requirement Cost): The storage requirement cost C_s of a location service is the number of location records a node needs to store as a location server.

We point out that all three metrics are defined in terms of the cost measured at an individual node. Since nodes are all symmetric in the schemes we examine, the expected values of the metrics are the same for all nodes in the network. We derive these expected values in the remainder of the section. Before we proceed, we summarize some basic assumptions we make:
• As N grows, the area A of the network grows linearly with N. This implies a constant node density γ.
• Nodes move according to a simplified Random Waypoint model [17]: each node picks a random point in the network area and moves towards it with velocity v; after reaching it, a new point is selected and the node moves on without pausing.
• A data traffic pattern is the probability distribution of traffic intensities between any pair of nodes in the network.


TABLE I
NOTATIONS.

C_m      location maintenance cost
C_q      location query cost
C_s      storage requirement cost
κ        distance threshold in perimeter refresh
v        node speed
z        average progress of each forwarding hop
ρ_i      level-i square boundary crossing rate
d^u      distance traveled by an update packet
d^q      distance traveled by a query packet
n^u      number of forwarding hops of an update packet
n^q      number of forwarding hops of a query packet
λ        perimeter refreshing rate
P^u_i    prob. of querying nodes in level-i square (uniform)
P^l_i    prob. of querying nodes in level-i square (localized)
c_1      constant of random distance within a square
c_{2,3}  constants of random distance between a pair of squares

The most commonly used pattern is the uniform model, where the probability of initiating a packet between any pair of nodes is equal. However, it has been shown in [18], [19] that under such a traffic pattern, the end-to-end throughput available to a node is O(1/√N), i.e., the throughput scales poorly anyway (no matter what routing scheme is used!). In our models, we consider a localized pattern in addition to the uniform pattern. We summarize our notations in Table I.

A. HIGH-GRADE

We derive the expected values of the three metrics for the HIGH-GRADE scheme.

1) Location Maintenance Cost: The location maintenance cost C_m of HIGH-GRADE consists of two parts: the cost due to location update packets (denoted by C_{m1}) and the cost due to the perimeter refreshing protocol (denoted by C_{m2}). By Rule 1, when a node A moves across a level-i square boundary, an update packet needs to be sent to each of the level-j LSPs for j ≤ i + 1. The following two lemmas give the frequency of boundary crossing events and the expected number of hops of each location update packet.

Lemma 1 (Boundary Crossing Rate): A node A's level-i square boundary crossing rate is

ρ_i ≈ (πv / 2R) · (1 / 2^i),   ∀i, 0 ≤ i ≤ H − 1,

where v is A's moving speed and R is the side length of a level-0 square.

Proof: First consider the level-0 square boundary crossing rate ρ_0. In [9], the author showed that ρ_0 ≈ πv/2R by approximating ρ_0 with the crossing rate of a node in a circular area with diameter R. Next, observe that a boundary crossing is either a vertical or a horizontal boundary crossing. In either case, a level-i boundary is always also a level-(i−1) boundary, while every other (vertical/horizontal) level-(i−1) boundary is a level-i boundary. Therefore, we have ρ_i = (1/2) ρ_{i−1} for 1 ≤ i ≤ H − 1.

Lemma 2: The expected number of hops of a location update packet sent from A to A's level-i LSP is

E(n^u_i) = c_1 · 2^i R / z,

where c_1 is a constant (≈ 0.5214), and z is a constant parameter representing the average progress towards the destination in each packet forwarding hop.

Proof: Consider the distance between A and its level-i LSP, denoted by d^u_i ("u" for update). Since we use a uniform hash function h_i to hash A's ID into the LSP, d^u_i can be viewed as the random distance between two points in the level-i square. Therefore,

E(d^u_i) = 2^i R ∫₀¹ ∫₀¹ ∫₀¹ ∫₀¹ √((x₁ − x₂)² + (y₁ − y₂)²) dx₁ dy₁ dx₂ dy₂ = c_1 · 2^i R ≈ 0.5214 · 2^i R.

A detailed procedure for the integration step above can be found in [20]. The expected number of hops in forwarding the packet is the expected distance divided by z, the average progress of each hop. As shown in [9], z is a function of the radio transmission range r_t and the node density γ. As we assume both are constants, so is z. Hence

E(n^u_i) = E(d^u_i) / z = c_1 · 2^i R / z.
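The constant c_1 from Lemma 2, the mean distance between two uniformly random points in a unit square, is easy to check numerically; a Monte Carlo sketch:

```python
import random

def estimate_c1(samples=200_000, seed=1):
    """Monte Carlo estimate of c1: the mean distance between two points
    drawn uniformly at random from the unit square (analytically ~0.5214)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x1, y1, x2, y2 = rng.random(), rng.random(), rng.random(), rng.random()
        total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total / samples

print(estimate_c1())  # statistically close to 0.5214
```

With 200,000 samples the standard error is well under 0.001, so the estimate lands near the analytic value 0.5214 cited in the lemma.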

With Lemmas 1 and 2, we are ready to prove an upper bound on the expected location maintenance cost.

Theorem 3: In the HIGH-GRADE scheme, E(C_m) = O(v · log N); that is, the location maintenance cost metric scales linearly with the node speed and logarithmically with the number of nodes in the network.

Proof: C_m has two components: C_{m1} and C_{m2}. C_{m1} is the cost of updating all the H + 1 LSPs. Therefore,

E(C_{m1}) = Σ_{i=0}^{H} ρ_i · E(n^u_i) = (H + 1) · πvc_1 / (2z) ∝ v · H.

If we hold R constant, then H is proportional to log √A. As A ∝ N, we have H ∝ log N. So E(C_{m1}) = O(v · log N). Now consider C_{m2}, the cost of perimeter refreshing. Let the refreshing rate be λ. Assuming the perimeter and range based refreshing protocol, the number of nodes around the perimeter is bounded by πκ²/γ, where κ is the distance threshold. Since H + 1 LSPs need to be refreshed, we have

E(C_{m2}) = (H + 1) · λ · πκ²/γ = O(log N).

Combining the two components, we have E(C_m) = E(C_{m1}) + E(C_{m2}) = O(v log N).
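The key step in the Theorem 3 proof, namely that every level contributes the same constant πvc_1/(2z) to E(C_{m1}), can be verified numerically; the parameter values below are arbitrary examples:

```python
from math import pi

def maintenance_update_cost(v, r, c1, z, h):
    """E(C_m1) from the Theorem 3 proof: a sum over levels of
    rho_i * E(n_i^u); the 2^i factors cancel, so every level contributes
    pi*v*c1/(2z) and the total grows linearly in v and in H."""
    total = 0.0
    for i in range(h + 1):
        rho_i = pi * v / (2 * r) / 2 ** i   # boundary crossing rate (Lemma 1)
        hops_i = c1 * (2 ** i) * r / z      # expected update hops (Lemma 2)
        total += rho_i * hops_i
    return total

# example (all parameter values arbitrary): doubling speed doubles cost
base = maintenance_update_cost(10.0, 100.0, 0.5214, 50.0, 5)
assert abs(maintenance_update_cost(20.0, 100.0, 0.5214, 50.0, 5) - 2 * base) < 1e-9
```

Because each term is level-independent, adding one hierarchy level adds a fixed increment to the cost, which is exactly the O(v · log N) behavior claimed.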


2) Location Query Cost: We first derive the expected number of forwarding hops traveled by a location query packet when the MCS is a level-i square.

Lemma 4: If A and B's MCS is a level-i square, then the expected number of forwarding hops traveled by a location query packet initiated by node B for A's location, denoted E(nq_i), is

E(nq_i) = (2^{i+2} − 3) · c1 · R/z.

Proof: As described in Section III, the location query packet is re-forwarded sequentially to pLSP_0, pLSP_1, ..., up to pLSP_i (i.e., LSP_i), and then to LSP_{i−1}, ..., and finally to LSP_0. Each of these steps can be viewed as the distance between a pair of random points in a level-j square, where j first increases from 0 to i and then decreases to 1. Therefore, the distance of each step is exactly what we derived for du_j in Lemma 2:

E(nq_i) = Σ_{j=0}^{i} E(du_j)/z + Σ_{j=i}^{1} E(du_j)/z = 2 · Σ_{j=0}^{i} c1 · 2^j · R/z − c1 · R/z = (2^{i+2} − 3) · c1 · R/z.

Lemma 4 gives the expected cost of a location query packet when A and B's MCS is a level-i square. To obtain the general expected cost, we need the probability that "the MCS is a level-i square", which we denote by P_i. If the traffic pattern is uniform across the network area, i.e., given A's position, the probability density of B's position is uniform across the network, then P_i is as follows (we use the notation P_i^u to indicate the uniform pattern):

P_i^u = 3/4^{H−i+1} for 1 ≤ i ≤ H, and P_i^u = 1/4^H for i = 0.

To capture the localization property of the traffic pattern, we also define another set of P_i (denoted P_i^l for the "localized" pattern) that decays exponentially with i. We define P_i^l = (1/2) · P_{i−1}^l for 1 ≤ i ≤ H. Given that Σ_{i=0}^{H} P_i^l = 1, we have

P_i^l = (1/2^{i+1}) · 1/(1 − 1/2^{H+1}), for 0 ≤ i ≤ H.
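As a quick numerical sanity check on Lemma 4 (our addition; the constants c1, R, and z are placeholder values), the closed form (2^{i+2} − 3) · c1 · R/z can be compared against directly summing the per-step costs E(du_j)/z along the pLSP_0 → ... → pLSP_i → LSP_{i−1} → ... → LSP_0 path:

```python
# Sanity check for Lemma 4: the query visits pLSP_0 .. pLSP_i and then
# LSP_{i-1} .. LSP_0; each step costs E(du_j)/z = c1 * 2**j * R / z hops.
c1, R, z = 0.52, 250.0, 100.0    # placeholder constants, not from the paper

def query_hops_sum(i):
    up = sum(c1 * 2**j * R / z for j in range(0, i + 1))    # j = 0, ..., i
    down = sum(c1 * 2**j * R / z for j in range(i, 0, -1))  # j = i, ..., 1
    return up + down

def query_hops_closed(i):
    return (2**(i + 2) - 3) * c1 * R / z

for i in range(8):
    assert abs(query_hops_sum(i) - query_hops_closed(i)) < 1e-9
```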

With the two traffic patterns defined, we can now prove an upper bound on the expected location query cost metric for the HIGH-GRADE scheme.

Theorem 5: The upper bound of the expected location query cost metric E(Cq) is O(√N) for the uniform traffic pattern and O(log N) for the localized traffic pattern.

Proof: For the localized traffic pattern,

E(Cq) = Σ_{i=0}^{H} E(nq_i) · P_i^l ≈ Σ_{i=0}^{H} (2^{i+2} − 3) · c1 · (R/z) · (1/2^{i+1}) · 1/(1 − 1/2^{H+1}) ≈ 2 · c1 · (R/z) · H = O(log N).

The proof of the bound for the uniform pattern is similar and straightforward.
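The contrast in Theorem 5 can be illustrated numerically (our own sketch; c1, R, and z are placeholder constants). Evaluating E(Cq) = Σ E(nq_i) · P_i under the two distributions, the uniform sum roughly quadruples as H increases by 2 (i.e., grows like 2^H ∝ √N), while the localized sum grows only linearly in H:

```python
# Numeric illustration of Theorem 5 under the two MCS-level distributions.
c1, R, z = 0.52, 250.0, 100.0    # placeholder constants, not from the paper

def e_nq(i):                      # Lemma 4: expected query hops at MCS level i
    return (2**(i + 2) - 3) * c1 * R / z

def p_uniform(i, H):              # P_i^u for the uniform traffic pattern
    return 1 / 4**H if i == 0 else 3 / 4**(H - i + 1)

def p_localized(i, H):            # P_i^l, exponentially decaying with i
    return (1 / 2**(i + 1)) / (1 - 1 / 2**(H + 1))

def e_cq(p, H):                   # E(Cq) = sum over levels of E(nq_i) * P_i
    return sum(e_nq(i) * p(i, H) for i in range(H + 1))

for H in (2, 4, 6, 8):            # H grows by 1 each time N quadruples
    print(H, round(e_cq(p_uniform, H), 1), round(e_cq(p_localized, H), 1))
```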

Fig. 5. Three constants. Left: random distance between a pair of nodes in a unit square (c1). Right: random distances between pairs of nodes in two squares (c2 and c3).

3) Storage Requirement Cost: We now analyze the third metric: the storage requirement cost Cs, i.e., the number of location records a node stores as a location server.

Theorem 6: The expected storage requirement cost of the HIGH-GRADE scheme is E(Cs) = O(log N).

Proof: Each node stores its location information at H + 1 LSPs. Assuming the perimeter- and range-based server selection method is used, the average number of servers at one LSP is bounded by πκ²/γ. Therefore, we have

E(Cs) = (N/N) · (H + 1) · πκ²/γ = O(log N).

B. GLS

GLS uses a multilevel hierarchical server structure similar to that of HIGH-GRADE. A node A selects three location servers in each level-i square, one in each of the level-(i − 1) quadrants that A is not in, as shown in Figure 1(c). Since GLS stores exact location information on every server, all the servers need to be updated periodically. The update period is set as the time a node takes to move a distance of δ. When a node B wants to find the location of A, it forwards a query packet towards node C1, the node closest to A in the ID space that B knows of. C1 does the same, re-forwarding the query packet to C2, the node closest to A that C1 has a record of, and so on, until a location server of A is found. Assuming nodes are relatively static during a packet's lifetime, GLS guarantees that the location server is reached in i steps of re-forwarding, where i is the level of A and B's MCS. In addition, in each of the i steps, the source and destination of the re-forwarded packet are within a level-j square, where j decreases from i to 1. We now prove the following theorem.

Theorem 7: For the GLS scheme,
E(Cm) = O(v√N);
E(Cq) = O(√N) for the uniform traffic pattern and O(log N) for the localized traffic pattern;
E(Cs) = O(log N).

Proof: Consider Cm first. In GLS, Cm is solely due to location updates. Consider the expected distance the three update packets travel at the level-i square, denoted E(du_i). We have E(du_i) = (2c2 + c3) · 2^i · R, where 2^i · R is the side length of a level-i square, and c2 and c3 are two constant factors


representing the average random distances between two points in a pair of squares that are adjacent to each other or that adjoin at a corner, as shown in Figure 5 (right pane). Obviously, we have c2 ≤ √5 and c3 ≤ 2√2. Since updates are sent out at a rate of v/δ, we have

E(Cm) = (v/δ) · Σ_{i=1}^{H} (2c2 + c3) · 2^i · R/z = O(v√N).

Next consider Cq. Based on the location query procedure described above, the expected location query cost when A and B's MCS is a level-i square is

E(nq_i) = Σ_{j=0}^{i} E(du_j)/z = Σ_{j=0}^{i} c1 · 2^j · R/z = (2^{i+1} − 1) · c1 · R/z.

Taking into account the probability of the MCS being a level-i square, we have (for the localized data traffic pattern)

E(Cq) = Σ_{i=0}^{H} E(nq_i) · P_i^l ≈ Σ_{i=0}^{H} (2^{i+1} − 1) · c1 · (R/z) · (1/2^{i+1}) · 1/(1 − 1/2^{H+1}) ≈ c1 · (R/z) · H = O(log N).

E(Cq) for the uniform pattern can be derived similarly. Finally, the storage requirement cost:

Cs = (N · 3H)/N = 3H = O(log N).

TABLE II
SUMMARY OF THE THREE SCALABILITY METRICS FOR THE FIVE LOCATION SERVICE SCHEMES.

                            HIGH-GRADE             GLS                    DLM                SLURP           SLALoM
Location maintenance cost   O(v log N)             O(v√N)                 O(v · N^{1/3})     O(v√N)          O(v · N^{1/3})
Location query cost         O(√N) (uniform),       O(√N) (uniform),       O(N^{1/3}) (both)  O(√N) (both)    O(N^{1/3}) (both)
                            O(log N) (localized)   O(log N) (localized)
Storage overhead            O(log N)               O(log N)               O(N^{2/3})         O(1)            O(N^{1/3})
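The difference between the two maintenance-cost bounds in Table II can be seen by evaluating the sums directly (our illustration with made-up constants; c2 and c3 are set to their upper bounds √5 and 2√2): GLS's per-update distance doubles with each level, while HIGH-GRADE pays a bounded refresh cost per level:

```python
# Our numeric comparison of the two maintenance-cost sums (constants made up):
#   GLS:        E(Cm) = (v / delta) * sum_{i=1..H} (2*c2 + c3) * 2**i * R / z
#   HIGH-GRADE: E(Cm1) grows as v * (H + 1), one bounded-cost term per level
v, delta, R, z = 10.0, 100.0, 250.0, 100.0
c2, c3 = 5**0.5, 2 * 2**0.5       # upper bounds: c2 <= sqrt(5), c3 <= 2*sqrt(2)

def gls_cm(H):
    return (v / delta) * sum((2 * c2 + c3) * 2**i * R / z
                             for i in range(1, H + 1))

def hg_cm(H):                     # HIGH-GRADE update cost, up to a constant factor
    return v * (H + 1)

for H in (2, 4, 6, 8):            # gls_cm roughly quadruples per Delta-H = 2
    print(H, round(hg_cm(H), 1), round(gls_cm(H), 1))
```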

C. Analysis of Other Schemes

Our analysis of the other schemes uses techniques very similar to those above. Due to the space limit, we refer the reader to the technical report version of this paper [21] for the detailed derivation of the scalability properties of the SLURP, SLALoM, and DLM schemes. The results are summarized in Table II.

D. Summary

Not surprisingly, the results for HIGH-GRADE are closest to those for GLS, as their designs exhibit the most similarities. HIGH-GRADE and GLS have the same asymptotic location query and storage requirement costs. However, HIGH-GRADE has a significant advantage in terms of the location maintenance cost. Comparing HIGH-GRADE and GLS in terms of total packet processing overhead (i.e., the sum of the location maintenance and query costs), HIGH-GRADE has several advantages. If the localized data traffic pattern is assumed, HIGH-GRADE has a better asymptotic result: O(v log N) vs. O(v√N). If the uniform data traffic pattern is assumed, the two schemes have the same asymptotic cost O(v√N). However, as we have mentioned, location query cost can often be reduced by various caching techniques. In HIGH-GRADE, since the bottleneck of the total cost is the location query cost, any improvement from caching techniques will be reflected directly in the total cost. In GLS, on the other hand, since the location maintenance cost has the higher asymptotic value, caching strategies may not bring much benefit.

We next observe that both SLALoM and DLM improve the total location maintenance/query costs over the SLURP scheme. However, they also introduce a significant increase in the storage requirement cost. In the case of DLM, the storage requirement becomes the dominant constraint on scalability. Another drawback of the SLALoM and DLM schemes is that they do not take good advantage of localized traffic patterns: their asymptotic bounds on location maintenance/query costs are the same for both traffic patterns.

We conclude our analysis with two observations. First, the design of a location service scheme involves tradeoffs among all three scalability metrics we considered, and omitting any of them is likely to miss part of the picture in making the final design decision. Second, the HIGH-GRADE scheme strikes a good balance among the three metrics. Specifically, when the data traffic pattern is localized, HIGH-GRADE achieves superior overall scalability on the order of O(v log N).

V. SIMULATION RESULTS

In this section, we compare the performance of the HIGH-GRADE and GLS schemes with simulation experiments. We choose GLS for comparison with HIGH-GRADE because the two are most similar in their designs. We therefore use the simulation experiments as a complement to our analytical results. The GLS implementation we use is that of [7]. HIGH-GRADE is implemented in the network simulator ns-2 [22][23].

Fig. 6. (a) Average update cost as a function of node number (v ≤ 10 m/s); (b) Average update cost as a function of node speed (N = 400); (c) Average query cost as a function of node number (v ≤ 10 m/s).

Both simulations use the IEEE 802.11 radio and MAC models as implemented in ns-2. Nodes are initially placed randomly across the network area, with a uniform density of 100 nodes/km². Therefore, the size of the network area increases with the number of nodes. Nodes move according to the random waypoint model with no pause time. Each time a random target is chosen, a moving speed is selected between zero and a maximum moving speed (default 10 m/s). Each node periodically broadcasts its location information to its neighbors. The radio transmission range r_t is 250 meters. For both GLS and HIGH-GRADE, the side length of a level-0 square is 250 meters. In all GLS simulations, the location update threshold is 100 meters. Each simulation runs for 300 seconds, during which each node generates on average 15 location queries to random destination nodes. For each parameter set, we run five simulations with different random seeds and report the mean and 95% confidence interval.

Figure 6(a) shows the average location update cost as a function of N, the number of nodes, when the maximum node speed v is 10 m/s. The location update cost is the number of location update packets originated at or forwarded by a node per second. The results reported are averaged over all nodes in the network. As shown in Figure 6(a), the location update costs of both schemes exhibit sub-linear growth; in comparison, HIGH-GRADE's cost grows much more slowly. This validates our analytic results, where the location update costs of the two schemes are O(v log N) vs. O(v√N). Figure 6(b) shows the average location update cost as a function of the maximum node speed for a 400-node network. Again, the simulation results confirm our analysis: the costs grow linearly with node speed for both HIGH-GRADE and GLS. Figure 6(c) shows the average location query cost as a function of N. The location query cost is the number of query packets forwarded by a node per second.
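The mobility pattern assumed above can be sketched as follows (our simplified version of the random waypoint model with zero pause time, not the actual ns-2 scenario generator):

```python
import random

# Simplified random-waypoint stepper (zero pause time): pick a uniform random
# target in the area and a uniform speed in [0, v_max], move toward the
# target, and pick a new target and speed upon arrival.
def random_waypoint(side, v_max, duration, dt=1.0, seed=1):
    rng = random.Random(seed)
    x, y = rng.uniform(0, side), rng.uniform(0, side)
    tx, ty = rng.uniform(0, side), rng.uniform(0, side)
    speed = rng.uniform(0, v_max)
    t = 0.0
    while t < duration:
        dx, dy = tx - x, ty - y
        dist = (dx * dx + dy * dy) ** 0.5
        step = speed * dt
        if dist <= step:                       # arrived: choose a new waypoint
            x, y = tx, ty
            tx, ty = rng.uniform(0, side), rng.uniform(0, side)
            speed = rng.uniform(0, v_max)
        else:                                  # advance along the segment
            x, y = x + dx / dist * step, y + dy / dist * step
        t += dt
        yield x, y
```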
In Figure 6(c), HIGH-GRADE and GLS exhibit similar trends in their location query costs, with HIGH-GRADE's cost somewhat higher. We also observe that although the query rate is quite high (15 queries in 300 seconds per node), the location query cost is much lower than the location update cost. Therefore the

overall cost is dominated by the update part.

As complementary results that ensure the fairness of the comparison above, we present the query success rates of the two schemes in Figures 7(a) and 7(b). For HIGH-GRADE, a query may fail because location information is missing on server(s) due to mobility. For GLS, query failures are mainly due to stale location information on servers. Figure 7(a) shows the query success rates as a function of the maximum node speed. As we can see, when the speed is high, the success rate remains high in HIGH-GRADE but drops steadily in GLS. We believe this may be due to the way location servers are updated in GLS. In GLS, when a node A needs to update one of its high-level servers, the update packet is forwarded in many steps, each targeting a node in a level-i square (with increasing values of i). The choice of the correct next target in each step depends on the correctness of the location information in the current level-i square. Therefore, in GLS, the updates of lower-level servers need to precede the updates of higher-level servers, making the "convergence" time long. In the face of high node speeds, the GLS scheme is slow to adapt to node mobility, resulting in many stale location records on servers. Figure 7(b) shows the query success rate as a function of N, where similar trends appear for both HIGH-GRADE and GLS, although the difference between the two schemes is less dramatic than in Figure 7(a). From Figures 6 and 7, we observe that compared with GLS, HIGH-GRADE achieves a higher query success rate with lower total communication costs (location update costs plus location query costs). Finally, Figure 7(c) presents the storage requirement cost of the two schemes. For each simulation run, we obtain the average number of location records stored on all the nodes at the end of the simulation. In addition, we also record the maximum number of records any node stores.
As shown in the figure, the average storage requirement costs of both schemes grow very slowly, in agreement with our analytic results. We also observe that GLS's average storage costs are slightly lower than HIGH-GRADE's. However, GLS's maximum storage costs are very erratic. We offer one possible explanation for this phenomenon. For both HIGH-GRADE and GLS, the network forms a "perfect" hierarchy (i.e., √A/R is a power

Fig. 7. (a) Query success rate as a function of maximum node speed (N = 400). (b) Query success rate as a function of node number (v ≤ 10 m/s). (c) Storage requirement cost as a function of node number (v ≤ 10 m/s).

of 2) when N = 100 and N = 400 in our simulations. In these cases, both HIGH-GRADE and GLS can distribute the location server load uniformly. In other cases, GLS places more load on the nodes located in the incomplete (i.e., not a power of 2) high-level squares, resulting in high maximum storage costs.

VI. CONCLUSIONS

We have explored the design space of location services and classified previously proposed schemes. We have presented HIGH-GRADE, a new location service scheme featuring a multilevel hierarchical server structure and multi-grained location information. We have developed a uniform theoretical framework to compare the scalability of HIGH-GRADE with that of four other schemes. Our analysis has shown that HIGH-GRADE has balanced costs across the three scalability metrics and superior asymptotic bounds, especially when a localized data traffic pattern is assumed. These analytical results are further supported by simulation experiments. In conclusion, we believe HIGH-GRADE is an attractive choice for building a location service for large-scale mobile ad hoc networks. Furthermore, we believe that our comparative study of the various location service schemes will facilitate a deeper understanding of routing scalability in MANETs.

REFERENCES

[1] Onur Arpacioglu, Tara Small, and Zygmunt J. Haas. Notes on scalability of wireless ad hoc networks. http://www.ietf.org/internet-drafts/draft-irtf-ans-scalability-notes-01.txt, October 2003.
[2] David B. Johnson and David A. Maltz. Dynamic source routing in ad hoc wireless networks. In Imielinski and Korth, editors, Mobile Computing, volume 353. Kluwer Academic Publishers, 1996.
[3] Charles Perkins and Elizabeth Royer. Ad-hoc on-demand distance vector routing. In Proceedings of IEEE Workshop on Mobile Computing Systems and Applications, 1999.
[4] Charles Perkins. Ad Hoc Networking. Addison Wesley Publishing Co., 2000.
[5] Brad Karp and H. T. Kung. GPSR: greedy perimeter stateless routing for wireless networks.
In Mobile Computing and Networking, pages 243–254, 2000. [6] Stefano Basagni, Imrich Chlamtac, Violet R. Syrotiuk, and Barry A. Woodward. A distance routing effect algorithm for mobility (DREAM). In Proceedings of ACM MobiCom 98, pages 76–84, 1998.

[7] J. Li, J. Jannotti, D. De Couto, D. Karger, and R. Morris. A scalable location service for geographic ad-hoc routing. In Proceedings of ACM MobiCom, pages 120–130, August 2000. [8] Zygmunt J. Haas and Ben Liang. Ad hoc mobility management with uniform quorum systems. IEEE/ACM Transactions on Networking, 9(4):228–240, 1999. [9] Seung-Chul M. Woo and Suresh Singh. Scalable routing protocol for ad hoc networks. Wireless Networks, 7(5):513–529, 2001. [10] Christine T. Cheng, H. L. Lemberg, Sumesh J. Philip, E. van den Berg, and T. Zhang. SLALoM: A scalable location management scheme for large mobile ad-hoc networks. In Proceedings of IEEE WCNC, March 2002. [11] Y. Xue, B. Li, and K. Nahrstedt. A scalable location management scheme in mobile ad-hoc networks. In Proceedings of the IEEE Conference on Local Computer Networks (LCN ’01), 2001. [12] Sumesh J. Philip and Chunming Qiao. Poster: Hierarchical grid location management for large wireless ad hoc networks. In Proceedings of ACM MobiHoc 03, Poster session, June 2003. [13] Gregory Lauer. Address servers in hierarchical networks. In Proceedings of IEEE ICC, 1988. [14] Kamal K. Kasera and Ram Ramanathan. A location management protocol for hierarchically organized multihop mobile wireless networks. In Proceedings of IEEE ICUPC, 1997. [15] P. Bose, P. Morin, I. Stojmenovic, and J. Urrutia. Routing with guaranteed delivery in ad hoc wireless networks. In Proceedings of 3rd International Workshop on Discrete Algorithms and methods for mobile computing and communications, August 1999. [16] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, and S. Shenker. GHT: A geographic hash table for data-centric storage in sensornets. In Proceedings of ACM WSNA, 2002. [17] Josh Broch, David A. Maltz, David B. Johnson, Yih-Chun Hu, and Jorjeta Jetcheva. A performance comparison of multi-hop wireless ad hoc network routing protocols. In Mobile Computing and Networking, pages 85–97, 1998. [18] Piyush Gupta and P. R. Kumar. 
The capacity of wireless networks. IEEE Transactions on Information Theory, 46(2):388–404, March 2000. [19] Jinyang Li, Charles Blake, Douglas S. J. De Couto, Hu Imm Lee, and Robert Morris. Capacity of ad hoc wireless networks. In Mobile Computing and Networking, pages 61–69, 2001. [20] Steven R. Dunbar. The average distance between points in geometric figures. College Mathematics Journal, 28(3), May 1997. [21] Yinzhe Yu, Guor-Huar Lu, and Zhi-Li Zhang. Enhancing Location Service Scalability with HIGH-GRADE, Dept. of Comp. Sci. & Eng., U of Minnesota, Technical Report TR-04-002, 2004. [22] The VINT Project, The ns Manual, 2003. http://www.isi.edu/nsnam/ns/ns-documentation.html. [23] CMU Monarch Group, CMU Monarch extensions to ns. http://www.monarch.cs.cmu.edu/.
