Delay Optimal Queue-based CSMA

Devavrat Shah
Department of EECS
Massachusetts Institute of Technology
Cambridge, MA 02139, USA
[email protected]

Jinwoo Shin
Department of Mathematics
Massachusetts Institute of Technology
Cambridge, MA 02139, USA
[email protected]

ABSTRACT

In the past year or so, exciting progress has led to throughput optimal design of CSMA-based algorithms for wireless networks ([16][31][22][30]). However, such algorithms suffer from very poor delay performance. A recent work [35] suggests that it is impossible to design a CSMA-like simple algorithm that is throughput optimal and induces low delay for arbitrary wireless networks. However, wireless networks arising in practice are formed by nodes placed, possibly arbitrarily, in some geographic area. In this paper, we propose a CSMA algorithm with per-node average delay bounded by a constant, independent of the network size, when the network has geometry (precisely, polynomial growth structure), which is present in any practical wireless network. Two novel features of our algorithm, crucial for its performance, are (a) the choice of access probabilities as an appropriate function of queue-sizes, and (b) the use of local network topological structure. Essentially, our algorithm is a queue-based CSMA with the minor difference that at each time instance a very small fraction of frozen nodes do not execute CSMA. Somewhat surprisingly, an appropriate selection of such frozen nodes, in a distributed manner, leads to delay optimal performance. We report several simulation results that support our theoretical results on the performance of our algorithm and provide a comparison with known algorithms.

Categories and Subject Descriptors

G.3 [Probability and Statistics]: Stochastic processes, Markov processes, Queueing theory; C.2.1 [Network Architecture and Design]: Distributed networks, Wireless communication

General Terms

Algorithms, Performance, Design

∗All authors are with the Laboratory for Information and Decision Systems, MIT. This work was supported in parts by NSF projects HSD 0729361, CNS 0546590, TF 0728554 and the DARPA ITMANET project.


Keywords

Wireless multi-access, Markov chain, Mixing time, Aloha

1. INTRODUCTION

Efficient scheduling to resolve contention, also known as Medium Access Control (MAC), arises in wireless and computer networks where multiple entities may interfere with each other while accessing a common resource. The design of good MAC protocols has been of great interest since the design of the Aloha network [1], and their practical importance is reflected in IEEE standards. Despite this long history (see Section 1.1), a satisfactory simple, distributed MAC protocol had remained elusive until recently. In the past year or so, it has been shown that extremely simple MAC protocols using carrier sensing information (CSMA) can achieve the maximum throughput in wireless networks (cf. [16][31][22][30]), thus realizing the promise of a simple, distributed, throughput optimal MAC. Unfortunately, all of these algorithms, based on a randomized sampling mechanism also known as Glauber dynamics, have very poor delay performance. In general, a simple MAC protocol cannot be expected to achieve high throughput and low delay simultaneously, due to the recent impossibility result [35]. But wireless networks arising in practice are not arbitrary; they are formed between nodes placed in a geographic area. Therefore, the question remains whether it is possible to design a high-throughput, low-delay algorithm for practical wireless networks, or networks with geometry.

In this paper, we address this question successfully. Specifically, we consider the task of designing a CSMA (Carrier Sensing Multiple Access) algorithm with high throughput and low delay for networks with geometry, formally called networks with polynomial growth. This class includes any wireless network formed between arbitrarily placed nodes in two- or three-dimensional space, when any two nodes interfere only if they are nearby and there is a minimal separation between the placements of any two nodes.
Our algorithm, by utilizing minimal local network structural information, provides a distributed queue-based CSMA that is essentially throughput optimal and has per-node, order optimal average delay (or queue-size).

1.1 Related Works

For the past four decades, researchers have addressed the question of designing MAC under various setups/assumptions in terms of (1) time scaling (slotted, asynchronous), (2) interference topology (one common channel, bipartite network, primary interference model, general topology), (3) available information (Aloha-style random back-off protocols, CSMA-type algorithms, message-passing algorithms, centralized algorithms), and (4) packet arrivals/services (saturated, exogenous). Most research focuses on simple or myopic protocols, such as random back-off or CSMA-based ones, because these are easiest to implement and understand.

We start by describing the classical problem setup, which relies on a slotted time-domain, unit-sized packets and one common channel. In this setup, under some exogenous arrival process, a random number of packets enter the system. Each packet may try to transmit at the beginning of each time step. If two or more packets try simultaneously, they interfere with each other (collide) and are not transmitted successfully. The channel is not centrally controlled; instead, a distributed protocol or algorithm is needed to resolve contention. Works by Kelly and MacPhee [19, 20, 23], Mosely and Humblet [29], Tsybakov and Likhanov [43], Aldous [2], Hastad, Leighton and Rogoff [14], and Goldberg et al. [12] establish various negative and positive results in this setup, with and without queued packets. Most of these protocols are of random access (or back-off) style, using collisions or busyness of the channel as a signal of congestion and reacting to it via a simple randomized rule. We refer the interested reader to an online survey [11] (maintained until October 2002).

Gupta and Stolyar [13] and Stolyar [39] consider a general interference topology (or network), which includes the one common channel model as a special case. They proposed random access algorithms whose access probabilities are designed using queueing information.
Their algorithms can achieve a certain (not complete) throughput optimality property under the assumption that all queues in the network are saturated, i.e. an unlimited number of packets is available for transmission at every queue at all times. Another class of random access algorithms for general topologies is based on "carrier sensing", i.e. each node verifies the absence of other traffic before transmitting. Eryilmaz, Marbach and Ozdaglar [24] showed that, under a particular interference model (the "primary interference model"), properly chosen access probabilities in CSMA can achieve the maximum throughput in the asymptotic regime of small sensing delay and large networks. A related work by Bordenave, McDonald and Proutière [4] analyzes the capacity of large networks (or the mean field limit) for a given set of access probabilities.

The Max-Weight (MW) scheduling algorithm proposed by Tassiulas and Ephremides [42] provides a myopic (though centralized) solution to contention resolution in general interference topologies. Variants of this algorithm have good delay properties (cf. Shah and Wischik [40, 6, 36, 37]). However, such algorithms require solving an NP-hard problem in every time slot, and hence are difficult to implement. Maximal scheduling and Longest-Queue-First algorithms are low-complexity alternatives to MW, but they achieve only a fraction of the maximal throughput region [3][25][17][21]. Simpler or distributed implementations of MW have also been extensively studied. Randomized versions of MW by Tassiulas [41] and its variant by Giaccone, Prabhakar and

Shah [10] provide a simpler (though centralized) implementation of MW for input-queued switches while retaining the throughput property. A distributed implementation of this algorithm, based on distributed sampling and a distributed (à la gossip, cf. Shah [33]) summation procedure, was proposed by Modiano, Shah and Zussman [27]. Another distributed implementation with constant overhead was proposed by Sanghavi, Bui and Srikant [32]. These algorithms, though distributed, require extensive information (or message) exchange for each new scheduling decision and are not applicable to general interference networks.

In recent work along this line, Rajagopalan, Shah and Shin [31, 34] proposed a simple CSMA-type throughput optimal algorithm, which we call RSS, for general networks; it essentially implements a variant of MW where weights are taken with respect to the double logarithm (log log) of queue-sizes. In RSS, each node uses an access probability based on its own queue-size and an estimate of the maximum queue-size Qmax in the entire network. This global information Qmax requires each node to exchange exactly one message/number (through a broadcast transmission) with its neighbors at each time. Jiang and Walrand [16] also provide a CSMA-type algorithm, which determines the access probabilities using arrival rate information instead of queue-sizes. Their algorithm does not require any information exchange between nodes, and was recently proven to possess (some form of) throughput optimality (see [22] for a restricted setup and [15] for a general setup). The perfect carrier-sensing assumption in both algorithms can be relaxed due to very recent work of Ni and Srikant [30]. In summary, CSMA algorithms based on local information fulfill the desire of achieving high throughput. However, the induced average queue-size or delay of these algorithms is quite large, as discussed earlier.
Indeed, in general it is unreasonable to expect both low delay and high throughput from CSMA algorithms, due to the recent work of Shah, Tse and Tsitsiklis [35], which proves that the average delay of any distributed algorithm in general networks cannot be polynomial in the network size unless NP ⊂ BPP.

1.2 Contributions

As the main result of this paper, we design a queue-based CSMA algorithm that is throughput optimal and induces O(1) per-node average queue-size (equivalently, average delay) for networks with polynomial growth; this includes any reasonable practical wireless network (see Section 2.1). The average queue-size/delay is order-optimal in the sense that the per-node average queue-size is a constant, independent of network size, though possibly dependent on the geometry of the network as well as the arrival load (as in the standard queueing scenario). The algorithm is essentially a queue-based randomized CSMA with a minor twist: at each time instance a very small fraction of frozen nodes do not execute CSMA. We prove that an appropriate selection of frozen nodes, in a distributed manner, leads to the desired performance.

The key idea behind our algorithm is a novel blend of a variant of RSS with a distributed implementation of the graph decomposition scheme of [18], which we call GDS, for networks with polynomial growth. The GDS suggests that such a geometric graph can be decomposed into local components so that the union of the maximum weight independent sets in these local components essentially gives a solution close to

the global maximum one. At first glance, one may expect that an immediate combination of GDS and RSS will lead to a good distributed implementation of MW if RSS finds (some form of) the maximum weight independent set in each local component. For such an algorithm to work, it is essential that the time taken by RSS to (approximately) solve a local maximum weight independent set problem be uniformly bounded, so that GDS can periodically update its decomposition of the network to keep queue-sizes small. Fundamentally, this is not possible with RSS, since it uses the double logarithm of queue-sizes as weights, which can become unboundedly large in principle. Therefore, to carry out such an approach one needs to overcome the non-trivial conceptual challenge of designing a variant of RSS that takes uniformly bounded time to reach near a local maximum weight independent set in each local component. We overcome this challenge by proposing a novel weight, dependent on queue-sizes, that always remains uniformly bounded (even though the queues may not be) and has the desired near-optimality property: it is a bounded linear function of the queue-size of the node. Using this bounded weight, we first show that the corresponding randomized CSMA quickly converges (close) to a local maximum weight solution in uniformly bounded time, thus allowing an appropriate choice of a constant updating period for GDS, independent of the network size. Eventually, this leads to throughput optimality and O(1) per-node average queue-size.

An extensive simulation study presented in Section 6 supports our theoretical results and shows that our algorithm comprehensively outperforms the algorithm RSS. The simulation results also strongly support the conjecture that GDS might not be necessary for the success of the algorithm, at least for highly structured geometric networks such as regular grids.
We make some remarks about the realizability of our protocol. As a reader will note, the GDS can be easily implemented in a light-weight distributed manner, using only local structural information, and hence is robust to dynamic network topology. Further, in principle it is possible to hardwire the parameters (constants) utilized in our algorithm into wireless nodes, as long as we know the growth rate of the polynomially growing wireless network (which is 2 or 3 in practical scenarios). Like RSS, our algorithm requires minimal local information exchange between nodes: it induces essentially a small constant amount of additional information per unit time, exchanged only between nodes within a small, constant distance. Finally, we note that the proof technique utilized to establish our results is relatively elementary compared to that of RSS, e.g. it does not require the non-trivial network adiabatic theorem.

Organization. Section 2 provides preliminaries, the setup of our interest and precise definitions of the performance metrics, throughput optimality and average queue-size/delay. In Section 3, we present our algorithm and state the main theorem about its performance guarantees. Sections 4 and 5 are dedicated to establishing the main theorem. In Section 6, we report several simulation results which support our theoretical claims, compare the algorithm's performance with prior work, and validate an interesting conjecture.

2. PRELIMINARIES

We start off with some notation. We reserve bold letters for vectors: e.g. u = [u_i]_{i=1}^d denotes a d-dimensional vector; 1 and 0 denote the vectors of all 1s and all 0s. For any vector u = [u_i], define u_max = max_i u_i and u_min = min_i u_i.

2.1 Network Model

Our network is a collection of n queues. Each queue has a dedicated exogenous arrival process through which new work arrives in the form of unit-sized packets. Each queue can be potentially serviced at unit rate, resulting in departures of packets upon completion of their unit service requirement. The network is assumed to be single-hop, i.e. once work leaves a queue, it leaves the network. At first glance, this appears to be a strong limitation. However, as a careful reader will find, the results of this paper, in terms of algorithm design and analysis, extend naturally to the multi-hop setting using differential back-pressure [42].

Let t ∈ R_+ denote the (continuous) time and τ = ⌊t⌋ ∈ N the corresponding discrete time slot. Let Q_i(t) ∈ R_+ be the amount of work in the ith queue at time t. Queues are served in a First-Come-First-Served manner. Q_i(t) is the number of packets in queue i at time t; e.g. Q_i(t) = 2.7 means the head-of-line packet has received 0.3 unit of service and 2 packets are waiting behind it. Also, define Q_i(τ) = Q_i(τ+) for τ ∈ N. Let Q(t), Q(τ) denote the vectors of queue-sizes [Q_i(t)]_{1≤i≤n}, [Q_i(τ)]_{1≤i≤n}, respectively. Initially, time t = τ = 0 and the system starts empty, i.e. Q(0) = 0.

For convenience, the arrival process is assumed to be discrete-time, with unit-sized packets arriving to queues. Let A_i(τ) denote the total number of packets that arrive to queue i in [0, τ], with the assumption that arrivals happen at the end of each time slot, i.e. arrivals in time slot τ happen at time (τ+1)− and equal A_i(τ+1) − A_i(τ) packets. For simplicity, we assume the A_i(·) are independent Bernoulli processes with parameter λ_i. That is, A_i(τ+1) − A_i(τ) ∈ {0, 1} and Pr(A_i(τ+1) − A_i(τ) = 1) = λ_i for all i and τ. Denote the arrival rate vector by λ = [λ_i]_{1≤i≤n}. Our assumption of a Bernoulli arrival process is only to simplify the proofs in this paper; one can consider other stochastic processes (e.g.
Poisson), as long as the second moment is bounded. The queues are offered service as per a continuous-time (i.e. asynchronous, non-slotted) scheduling algorithm. Each of the n queues is associated with a wireless transmission-capable device. Under any reasonable model of communication deployed in practice (e.g. the 802.11 standards), if two devices are close to each other and transmit on a common frequency at the same time, there will be interference and data is likely to be lost. If the devices are far away, they may be able to transmit simultaneously with no interference. Thus the scheduling constraint here is that no two devices that might interfere with each other can transmit at the same time. This can be naturally modeled as an independent-set constraint on a graph (called the interference graph), whose vertices correspond to the devices, and where two vertices share an edge if and only if the corresponding devices would interfere when transmitting simultaneously. Specifically, let G = (V, E) denote the network interference graph with V = {1, . . . , n} representing the n nodes and

E = {(i, j) : i and j interfere with each other}.

Graphs with Geometry. Interference graphs of our interest are of polynomial growth, defined as follows.

Definition 1 (Graphs with Polynomial Growth). G = (V, E) is a polynomial growth graph with rate ρ if there exists a universal constant B such that for any r ∈ N and v ∈ V,

|{w ∈ V : d_G(w, v) ≤ r}| ≤ B · r^ρ,

where d_G(u, v) denotes the length of the shortest path between u and v in G.

Now we explain why wireless interference networks G arising in practice have polynomial growth.

Example 1 (Wireless Network). Suppose n wireless nodes are located (arbitrarily) in R^2 or R^3 with minimum distance d_min between any two. Transmissions of two wireless nodes do not interfere with each other if the distance between them is large enough, say larger than d_max. Then, by virtually packing non-intersecting balls of radius d_min/2 centered at all nodes, it is easy to check that the number of devices within r hops w.r.t. the interference graph is at most (2d_max/d_min)^2 · r^2 or (2d_max/d_min)^3 · r^3. Therefore, the wireless interference network has polynomial growth of rate 2 or 3.

Let N(i) = {j ∈ V : (i, j) ∈ E} denote the neighbors of node i. We assume that if node i is transmitting, then all of its neighbors in N(i) can "listen" to it. Let I(G) denote the set of all independent sets of G, i.e. subsets of V in which no two vertices are adjacent. Formally,

I(G) = {σ = [σ_i] ∈ {0, 1}^n : σ_i + σ_j ≤ 1 for all (i, j) ∈ E}.

Under this setup, the set of feasible schedules is S = I(G). Given this, let σ(t) = [σ_i(t)] denote the collective scheduling decision at time t ∈ R_+, with σ_i(t) being the rate at which node i is transmitting. Then, as discussed, it must be that σ(t) ∈ I(G) and σ_i(t) ∈ {0, 1} for all i, t.

The queueing dynamics induced under the above model can be summarized by the following equation: for any 0 ≤ s < t and 1 ≤ i ≤ n,

Q_i(t) = Q_i(s) − ∫_s^t σ_i(y) 1_{Q_i(y)>0} dy + A_i(s, t),

where A(s, t) = [A_i(s, t)] denotes the cumulative arrivals in the time interval [s, t] and 1_{x} denotes the indicator function. Finally, define the cumulative departure D(s, t) = [D_i(s, t)], where

D_i(s, t) = ∫_s^t σ_i(y) 1_{Q_i(y)>0} dy.
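To make Definition 1 and Example 1 concrete, the following sketch (our own illustration, not part of the paper's algorithm) builds the interference graph of a 2D grid of devices and checks the ball-growth condition with B = 5 and ρ = 2; an interior node of a grid satisfies |{w : d_G(w, v) ≤ r}| = 2r² + 2r + 1.

```python
from collections import deque

def ball_size(adj, v, r):
    """Number of nodes within graph distance r of v (bounded BFS)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue                      # do not expand past radius r
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return len(dist)

def grid_graph(m):
    """m x m grid as an interference graph: 4-neighbors interfere."""
    adj = {}
    for x in range(m):
        for y in range(m):
            adj[(x, y)] = [(x + dx, y + dy)
                           for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                           if 0 <= x + dx < m and 0 <= y + dy < m]
    return adj

adj = grid_graph(21)
center = (10, 10)
# Interior ball sizes are 2r^2 + 2r + 1, so B = 5 and rho = 2 suffice.
for r in range(1, 6):
    assert ball_size(adj, center, r) <= 5 * r**2
```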

2.2 Performance Metrics

We need an algorithm to select a schedule σ(t) ∈ S = I(G) for all t ∈ R_+; thus, a scheduling algorithm is equivalent to the scheduling choices σ(t), t ∈ R_+. From the perspective of network performance, we would like the scheduling algorithm to keep the queues in the network as small as possible given the arrival process. From the implementation perspective, we wish the algorithm to be simple and distributed, i.e. it performs a constant number of logical operations at each node (or queue) per unit time, utilizes only information available locally at the node or obtained from a neighbor, and maintains as little data structure as possible at each node.

First, we formalize the notion of performance. In the setup described above, we define the capacity region C ⊂ [0, 1]^n as the convex hull of the feasible scheduling set I(G) = S, i.e.

C = { ∑_{σ∈S} α_σ σ : ∑_{σ∈S} α_σ = 1 and α_σ ≥ 0 for all σ ∈ I(G) }.

The intuition behind this definition of the capacity region comes from the fact that any algorithm has to choose a schedule from I(G) at each time, and hence the time average of the 'service rate' induced by any algorithm must belong to C. Therefore, if an arrival rate vector λ can be 'served' by some algorithm, then it must belong to C. Motivated by this, we call an arrival rate vector λ admissible if λ ∈ Λ, where

Λ = {λ ∈ R^n_+ : λ ≤ σ componentwise, for some σ ∈ C}.

Further, an admissible arrival rate vector λ is called ρ-loaded for ρ ∈ [0, 1] if ρ^{−1} λ ∈ Λ. When λ is ρ-loaded for ρ < 1, we say λ is strictly admissible, and let Λ^o be the set of strictly admissible arrival rate vectors, i.e. the interior of Λ.

Now we are ready to define the performance metric for a scheduling algorithm.

Definition 2 (Throughput Optimal). We call a scheduling algorithm throughput-optimal, or stable, or providing 100% throughput, if for any λ ∈ Λ^o the underlying network Markov process is positive Harris recurrent.

Positive Harris Recurrence & Its Implications. For completeness, we define the well-known notion of positive Harris recurrence (e.g. see [5]) and state its useful implications to explain its desirability. In this paper, we are concerned with discrete-time, time-homogeneous Markov processes (or chains) evolving over a complete, separable metric space X. Let B_X denote the Borel σ-algebra on X, and let X(τ) denote the state of the Markov process at time τ ∈ N. Consider any A ∈ B_X and define the stopping time T_A = inf{τ ≥ 1 : X(τ) ∈ A}. Then the set A is called Harris recurrent if

Pr_x(T_A < ∞) = 1 for any x ∈ X,

where Pr_x(·) ≡ Pr(·|X(0) = x). A Markov process is called Harris recurrent if there exists a σ-finite measure µ on (X, B_X) such that whenever µ(A) > 0 for A ∈ B_X, A is Harris recurrent. It is well known that if X is Harris recurrent then an essentially unique invariant measure exists (e.g. see Getoor [9]). If the invariant measure is finite, then it may be normalized to obtain a unique invariant probability measure (or stationary probability distribution); in this case X is called positive Harris recurrent.

Now we describe a useful implication of positive Harris recurrence. Let π be the unique invariant (or stationary) probability distribution of the positive Harris recurrent Markov process X. Then the following ergodic property is satisfied: for any x ∈ X and non-negative measurable function f : X → R_+,

lim_{T→∞} (1/T) ∑_{τ=0}^{T−1} f(X(τ)) = E_π[f], Pr_x-almost surely,

where E_π[f] = ∫ f(z) π(dz). Note that E_π[f] may not be finite.

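The ergodic property can be illustrated on a toy example. The sketch below (our own illustration; the two-state chain and its transition probabilities are made up) compares the time average of f along a trajectory with E_π[f] for a chain with Pr(0→1) = p and Pr(1→0) = q, whose stationary distribution is (q/(p+q), p/(p+q)).

```python
import random

def ergodic_average(p, q, f, T, seed=1):
    """Time average of f over T steps of the two-state chain
    with flip probabilities p (from state 0) and q (from state 1)."""
    rng = random.Random(seed)
    x, total = 0, 0.0
    for _ in range(T):
        total += f(x)
        if rng.random() < (p if x == 0 else q):
            x = 1 - x
    return total / T

p, q = 0.2, 0.3
est = ergodic_average(p, q, lambda x: x, 200_000)
exact = p / (p + q)          # E_pi[f] with f(x) = x, here 0.4
assert abs(est - exact) < 0.02
```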
Average Delay. The second important performance metric is the average delay or, equivalently by Little's law, the average queue-size in the network induced by the algorithm.

If the underlying network Markov process is positive Harris recurrent (hence the unique stationary distribution π exists), we define the per-node average queue-size (which we may also refer to as the average delay) of the algorithm as

(1/n) ∑_{i∈V} E_π[Q_i].

An algorithm is called order optimal if this quantity is bounded by a constant independent of n, though naturally dependent on the load ρ of the arrival rate vector, e.g. O(1/(1 − ρ)).

2.3 Popular Algorithm

In this paper, our interest is in scheduling algorithms that utilize the network state, i.e. the queue-sizes Q(t), to obtain a schedule. An important class of scheduling algorithms with the throughput-optimality property is the well-known maximum-weight scheduling algorithm, first proposed by Tassiulas and Ephremides [42]. We describe the slotted-time version of this algorithm. In this version, the algorithm changes its decision at the beginning of every time slot using Q(τ) = Q(τ+). Specifically, the scheduling decision σ(τ) remains the same for the entire time slot τ, i.e. σ(t) = σ(τ) for t ∈ (τ, τ + 1], and it satisfies

σ(τ) ∈ arg max_{ρ∈S} ∑_i ρ_i Q_i(τ).

Thus, this maximum weight or MW algorithm chooses the schedule σ ∈ S with maximum weight, where the weight is defined as σ · Q(τ) = ∑_{i=1}^n σ_i Q_i(τ).
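For intuition, MW can be rendered by brute force on a tiny interference graph. This sketch (our own illustration, not a practical implementation) enumerates all of I(G) directly, which is exactly the exponential cost that makes MW difficult to implement.

```python
from itertools import product

def independent_sets(n, edges):
    """All sigma in {0,1}^n with sigma_i + sigma_j <= 1 on every edge."""
    for sigma in product((0, 1), repeat=n):
        if all(sigma[i] + sigma[j] <= 1 for i, j in edges):
            yield sigma

def max_weight_schedule(Q, edges):
    """MW: the feasible schedule maximizing sigma . Q (exponential cost)."""
    return max(independent_sets(len(Q), edges),
               key=lambda s: sum(si * qi for si, qi in zip(s, Q)))

# Path 0-1-2 with Q = (3, 4, 3): serving both endpoints (weight 6)
# beats serving the middle queue alone (weight 4).
assert max_weight_schedule((3, 4, 3), [(0, 1), (1, 2)]) == (1, 0, 1)
```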

3. MAIN RESULT

This section presents the main result of this paper, namely a delay optimal queue-based CSMA algorithm that utilizes local network geometry to achieve good performance. In a nutshell, our algorithm is an asynchronous CSMA in which each wireless node adapts its medium access probability as a function of its current queue-size. That is, a node attempts transmission in a regular, asynchronous manner: if it finds the medium busy then it does not transmit; else it transmits with a probability that depends on its queue-size. In addition, at each time each node is in one of two states, frozen (colored red) or unfrozen (colored green). The frozen (i.e. red) nodes do not change their transmission state, while unfrozen (i.e. green) nodes execute the queue-based CSMA mentioned above and described in detail in Section 3.1. As we shall find, the performance of such an algorithm is crucially determined by the precise choices of (a) frozen nodes and (b) queue-based access probabilities. We describe these in Sections 3.2 and 3.3, respectively. We note that while the algorithm is inspired by and similar to that in [31, 34] (RSS), the choice of weights differs crucially, in addition to the selection of frozen nodes. As a reader will find, both of these modifications are essential to establish the desired delay order-optimal performance.

3.1 CSMA Algorithm

As explained above, the CSMA algorithm will be executed by green, or unfrozen, nodes. Let t ∈ R_+ denote the time and W(t) = [W_i(t)] the vector of weights at the nodes at time t. As explained in Section 3.3, the weights W(t) will be essentially a function of the queue-sizes Q(t).

Each node i has an independent Exponential clock of rate 1, i.e. the times between consecutive clock ticks are independently distributed as per the Exponential distribution with mean 1. Upon a clock tick of node i at time t, it does the following:

◦ Node i "listens to" or "senses" the medium.
◦ If any neighbor is transmitting, i.e. the medium is busy, then σ_i(t+) = 0.
◦ Else (i.e. the medium is free), set σ_i(t+) = 1 with probability exp(W_i(t))/(1 + exp(W_i(t))), and σ_i(t+) = 0 otherwise.

Note that due to the property of continuous random variables, no two clock ticks at different nodes happen at the same time (with probability 1)^1. We assume that if σ_i(t) = 1, then node i always transmits data irrespective of the value of Q_i(t), so that the neighbors of node i, i.e. nodes in N(i), can infer σ_i(t) by "listening" to the medium.
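The tick rule above can be sketched as follows (our own illustration; the path graph and the weights are hypothetical). Since the next tick among n independent rate-1 Exponential clocks belongs to a uniformly random node, a sequential sweep over uniformly chosen nodes emulates the continuous-time dynamics; note that the sequential updates can never create two adjacent transmitters.

```python
import math
import random

def csma_tick(i, sigma, W, adj, rng):
    """One clock tick of unfrozen node i: sense the medium, then
    transmit with probability exp(W_i)/(1 + exp(W_i)) if it is free."""
    if any(sigma[j] for j in adj[i]):          # medium busy: back off
        sigma[i] = 0
    else:                                      # medium free: randomize
        p = math.exp(W[i]) / (1.0 + math.exp(W[i]))
        sigma[i] = 1 if rng.random() < p else 0

rng = random.Random(0)
adj = {0: [1], 1: [0, 2], 2: [1]}              # path 0-1-2
sigma, W = [0, 0, 0], [2.0, 2.0, 2.0]
for _ in range(1000):
    csma_tick(rng.randrange(3), sigma, W, adj, rng)
# The schedule always stays an independent set of the path.
assert sigma[0] + sigma[1] <= 1 and sigma[1] + sigma[2] <= 1
```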

3.2 Coloring Scheme

In this section, we provide details of the randomized coloring, or freezing, decisions. They are updated regularly, L time apart; that is, the decisions are made at times T_k = kL, k ∈ Z_+. The following is a centralized description of the algorithm run at each time T_k; a simple distributed implementation is presented in Section 3.5.

◦ Initially, all nodes are uncolored.
◦ Repeat the following until every node is colored green or red:
(a) Choose an uncolored node u ∈ V uniformly at random.
(b) Draw a random integer R ∈ [1, K] according to a distribution, described below, that depends on K and a parameter ε > 0.
(c) Color all nodes in {w ∈ V : d_G(u, w) < R} green.
(d) Color all nodes in {w ∈ V : d_G(u, w) = R} red.

Note that a node may be re-colored multiple times until the loop terminates. In the above, K and ε are constants, to be chosen later, which affect the performance of the algorithm. The distribution of R used in step (b), parameterized by K and ε > 0, is essentially a Geometric with parameter ε truncated at K:

Pr[R = i] = ε(1 − ε)^{i−1} for 1 ≤ i < K, and Pr[R = K] = (1 − ε)^{K−1}.
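A minimal sketch of this centralized coloring scheme (the function names are ours). Each iteration colors at least the chosen center, so the loop terminates; later balls may re-color earlier nodes, as in the scheme above.

```python
import random
from collections import deque

def truncated_geometric(K, eps, rng):
    """Pr[R=i] = eps*(1-eps)**(i-1) for i < K, Pr[R=K] = (1-eps)**(K-1)."""
    for i in range(1, K):
        if rng.random() < eps:
            return i
    return K

def color_nodes(adj, K, eps, rng):
    """Grow balls of random radius R around uniformly chosen uncolored
    centers; ball interiors turn green, ball boundaries turn red."""
    color = {v: None for v in adj}
    while any(c is None for c in color.values()):
        u = rng.choice([v for v in adj if color[v] is None])
        R = truncated_geometric(K, eps, rng)
        dist = {u: 0}
        queue = deque([u])
        while queue:                       # BFS out to distance R
            v = queue.popleft()
            color[v] = 'red' if dist[v] == R else 'green'
            if dist[v] < R:
                for w in adj[v]:
                    if w not in dist:
                        dist[w] = dist[v] + 1
                        queue.append(w)
    return color
```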

3.3 Design of Weight

Let V_G and V_R be the green and red nodes generated by the coloring scheme in Section 3.2, respectively. Removing V_R from G partitions the graph into connected components of green nodes. For each green node i, Q_max,i(t) denotes the maximum queue-size at time t in the partition containing it. Based on this notation, the weight at green node i in the kth time interval [T_k, T_{k+1}) is defined as

W_i(t) = C · Q_i(T_k) / Q_max,i(T_k), for t ∈ [T_k, T_{k+1}),   (1)

where C is a constant and we use the convention 0/0 = 1. Thus, the weight of each node is updated regularly, L time apart, at times T_k, k ∈ Z_+. Here we have assumed that every node i knows the maximum queue-size in the virtual partition it belongs to. This can be computed using a simple distributed mechanism described in Section 3.5. Further, the algorithm's performance is robust with respect to errors in this estimation (see the remark in Section 3.4).

^1 Our algorithm can be easily adapted to slotted, or synchronous, time by using the 'trick' introduced in [30].
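Given a coloring, the weights in (1) can be computed by flood-filling the green components; a minimal sketch (the helper name and the small example are ours). Red, i.e. frozen, nodes receive no new weight here since they do not execute CSMA during the interval.

```python
def compute_weights(adj, color, Q, C):
    """W_i = C * Q_i / Qmax_i, where Qmax_i is the maximum queue-size in
    the green connected component containing i (0/0 read as 1)."""
    W, seen = {}, set()
    for s in adj:
        if color[s] != 'green' or s in seen:
            continue
        comp, stack = [], [s]              # flood-fill one green component
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if color[w] == 'green' and w not in seen:
                    seen.add(w)
                    stack.append(w)
        qmax = max(Q[v] for v in comp)
        for v in comp:
            W[v] = C * (Q[v] / qmax if qmax > 0 else 1.0)
    return W

# Path 0-1-2-3-4 with node 2 red: components {0,1} (Qmax 4) and {3,4} (Qmax 3).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
color = {0: 'green', 1: 'green', 2: 'red', 3: 'green', 4: 'green'}
Q = {0: 2, 1: 4, 2: 7, 3: 3, 4: 3}
W = compute_weights(adj, color, Q, C=2.0)   # {0: 1.0, 1: 2.0, 3: 2.0, 4: 2.0}
```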

3.4 Optimality: Throughput & Delay

This weight and coloring scheme, with proper choices of L, C, K and ε, lead to the following throughput optimality and delay optimality properties of the algorithm.

Theorem 1. Suppose λ ∈ (1 − δ)Λ for some δ > 0 and G is a polynomial growth graph of rate ρ. Then, there exist constants

L = L(ρ, δ), C = C(ρ, δ), K = K(ρ, δ), ε = ε(δ),

such that the (appropriately defined) network Markov process is positive Harris recurrent with a unique stationary distribution π. Further,

∑_{i∈V} E_π[Q_i] = O(n).

Conjecture. Note that the precise provable conditions on the constants in the theorem are

ε = Ω(δ), K = Ω((ρ/ε) log(ρ/ε)), C = Ω(K^ρ/δ), L ≥ L(K, C, ρ, δ) (see Lemma 5),

where the function L(·) is determined by the graph structure of G, specifically the mixing time of the Glauber dynamics on G, as we discuss in Section 4.2. This means that K, C and L should be decided in this order. We observe in our proofs and simulations that small choices of L and C and a large choice of K lead to lower delay. Now, as K → ∞, essentially all nodes are colored green. This naturally leads to the following conjecture: our algorithm with all nodes colored green, the weight selection as in (1), and appropriate universal constants L, C, will have near optimal performance in terms of throughput and delay. Establishing this conjecture requires removing the dependency between K and C, L in the theorem. We performed extensive simulations to verify this conjecture over regular geometric graphs; the empirical results validate it, as described in Section 6. Therefore, we strongly believe that the algorithm without coloring (or frozen) nodes should work well, at least for regular enough geometric graphs. This also highlights the importance of the weight selection in Section 3.3, in contrast to the log log(·) weight used in [31, 34].

Remark. Note that the term Q_max,i in the weight (1) is not sensitive for the performance of the algorithm. Specifically, using the same proof techniques as in this paper, one can prove

the theorem under a weight that uses some estimate Q̃max,i of Qmax,i in its place, as long as |Q̃max,i − Qmax,i| = O(1) for all i. For example, one can use older queueing information from time Tk − O(1) instead of Tk, since |Qmax,i(Tk − O(1)) − Qmax,i(Tk)| = O(1).
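To make the weight choice concrete: with the weight of (1) taken as Wi = C · Qi/Qmax,i (the form spelled out later in Section 5.4), a node's access intensity under the Glauber-dynamics transition rule of Section 4.1 can be sketched as follows. This is a minimal sketch; the function names are ours.

```python
import math

def weight(q_i, q_max_i, C):
    """Weight of node i per (1): W_i = C * Q_i / Qmax_i, where Qmax_i is
    the maximum queue-size in node i's local neighborhood."""
    if q_max_i == 0:
        return 0.0  # empty neighborhood: no pressure to transmit
    return C * q_i / q_max_i

def access_probability(w_i):
    """Probability that node i turns ON when its clock ticks and no
    neighbor transmits (the Glauber-dynamics rule of Definition 3)."""
    return math.exp(w_i) / (1.0 + math.exp(w_i))
```

Since 0 ≤ Wi ≤ C, the ON-probability stays within [1/2, e^C/(1 + e^C)]; keeping C a constant keeps this bounded away from 1, which is what controls the mixing time bounds of Section 4.2.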

3.5 Distributed Implementation

The goal in this paper is to design an algorithm that is distributed and simple: each node makes only a constant number of local operations at each time, communicates only a constant amount of information to its neighbors, maintains only a constant-size data structure, and utilizes only local information, i.e. nodes exchange information only with other nodes within constant distance. As described so far, our algorithm is indeed distributed and simple, given two centralized components that must be refreshed once every time period L: (a) the coloring scheme, and (b) the knowledge of the maximum queue-size Qmax,i in each partition. In this section, for completeness, we provide a simple (and somewhat obvious) distributed algorithm that achieves both. It is a two-round message-passing mechanism with at most O(K) iterations. Since the updating period L can be chosen arbitrarily large (though constant) compared to K, as discussed in Section 3.4, the cost of this distributed computation can be made minimal. Remember, however, that a smaller choice of L leads to a lower delay. The first round provides a distributed implementation of the coloring scheme of Section 3.2, and the second round computes the local maximum queue-size used in the weight (1). We now formally describe the algorithm.

Initialization. Initially, each node i draws a random number r = ri ∈ [0, 1] uniformly at random and R = Ri ∈ {1, . . . , K} according to the distribution in Section 3.2.

First Round (Coloring). In the first round, i sends a message (ri, Ri − 1) to its neighbors. Once a node j receives a message (r, k), it sends the message (r, k − 1) to all its neighbors if k > 0. Meanwhile, every node i maintains a message (r∗, k∗) such that r∗ = max r, with the maximum taken over all messages received by i (including ri), and k∗ is the corresponding k value. If k∗ = 0, the node decides to be colored red; otherwise, it is colored green. This first round terminates in at most K iterations.
Second Round (Weight). In the second round, every green node i generates a message (Qi, 2K) and sends it to all its neighbors. Once a green node j receives a message (Q, k), it sends the message (Q, k − 1) to all its neighbors if k > 0. Red nodes, on the other hand, do nothing even if they receive messages. Meanwhile, every green node i maintains Qmax,i = max Q, where the maximum is taken over all messages i receives (including Qi). This second round terminates in 2K iterations.
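The two rounds above can be sketched as a synchronous simulation (a sketch under our own conventions: `adj[i]` lists node i's neighbors, messages are forwarded once per iteration, and ties in r are broken by first arrival):

```python
import random

def two_round_protocol(adj, queues, K, eps, seed=0):
    """Synchronous sketch of the two rounds: coloring, then local max queue."""
    rng = random.Random(seed)
    n = len(adj)
    r = [rng.random() for _ in range(n)]
    # R_i has the truncated geometric law of Sections 3.2 / 4.3:
    # P[R=i] = eps(1-eps)^(i-1) for i < K, all remaining mass on K.
    def sample_R():
        for i in range(1, K):
            if rng.random() < eps:
                return i
        return K
    R = [sample_R() for _ in range(n)]

    # First round: node i initially sends (r_i, R_i - 1); a received (r, k)
    # is forwarded as (r, k - 1) while k > 0; each node tracks the message
    # with the largest r seen (including its own) and is red iff its k* = 0.
    best = {i: (r[i], R[i] - 1) for i in range(n)}
    inbox = {j: [(r[i], R[i] - 1) for i in adj[j]] for j in range(n)}
    for _ in range(K):  # terminates in at most K iterations
        nxt = {j: [] for j in range(n)}
        for i in range(n):
            for (rv, kv) in inbox[i]:
                if rv > best[i][0]:
                    best[i] = (rv, kv)
                if kv > 0:
                    for j in adj[i]:
                        nxt[j].append((rv, kv - 1))
        inbox = nxt
    green = [best[i][1] != 0 for i in range(n)]

    # Second round: green nodes flood (Q_i, 2K); red nodes drop messages.
    qmax = list(queues)
    inbox = {j: ([(queues[j], 2 * K)] if green[j] else []) for j in range(n)}
    for _ in range(2 * K):  # terminates in 2K iterations
        nxt = {j: [] for j in range(n)}
        for i in range(n):
            if not green[i]:
                continue  # red nodes do nothing with received messages
            for (qv, kv) in inbox[i]:
                qmax[i] = max(qmax[i], qv)
                if kv > 0:
                    for j in adj[i]:
                        nxt[j].append((qv, kv - 1))
        inbox = nxt
    return green, qmax
```

Red nodes simply keep their own queue value in `qmax`, consistent with the fact that they do not execute the CSMA update during the interval.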

4. TECHNICAL PRELIMINARIES

We present some known results about the stationary distribution and the convergence time (or mixing time) to stationarity for a specific class of finite-state Markov processes known as Glauber dynamics (or Metropolis-Hastings). We also recall a known property of the coloring scheme. As the reader will find, these results play an important role in establishing the positive Harris recurrence and constant average-delay of the network Markov process.

4.1 Finite State Markov Process

Consider a discrete-time Markov chain {X(τ)}τ∈Z+ over a finite state space Ω. Let the |Ω| × |Ω| matrix P be its transition probability matrix: µ(τ) = µ(τ − 1)P = µ(0)P^τ, where µ(τ) is the distribution of X(τ) ∈ Ω. If P is irreducible and aperiodic, then the Markov chain has a unique stationary distribution π and is ergodic in the sense that limτ→∞ µ(τ) = π. The adjoint of the transition matrix P (also called the time-reversal of P), denoted P∗, is defined by P∗(i, j) = πj P(j, i)/πi for any i, j ∈ Ω. By definition, P∗ has π as its stationary distribution as well. If P = P∗ then P is called reversible; in that case P has real eigenvalues 1 = λ1 ≥ λ2 ≥ · · · ≥ λ|Ω|, and 1 − λ2 is called the spectral gap of P.

A continuous-time Markov process {X(t)}t∈R+ over a finite state space Ω can be characterized using a discrete-time Markov chain P: for t ≥ 0, e^{t(P−I)} is the transition matrix of the process, i.e. µ(t) = µ(0)e^{t(P−I)}. We call P the kernel of the Markov process. As the reader will notice, our algorithm in each time interval induces a continuous-time Markov process on a finite space, the set of independent sets of each component. Its kernel is essentially the following Markov chain, known as Glauber dynamics (or Metropolis-Hastings).

Definition 3 (Glauber Dynamics). Consider a node-weighted graph G = (V, E) with W = [Wi]i∈V the vector of node weights. Let I(G) denote the set of all independent sets of G. The Glauber dynamics on I(G) with weights W, denoted GD(W), is the following Markov chain. Suppose the chain is at state σ = [σi]i∈V; then the next transition happens as follows:

◦ Pick a node i ∈ V uniformly at random.
◦ If σj = 0 for all j ∈ N(i), then set σi = 1 with probability exp(Wi)/(1 + exp(Wi)), and σi = 0 otherwise.
◦ Otherwise, set σi = 0.

Relation to Algorithm. Now we relate the Glauber dynamics to our algorithm. To this end, first observe that once the red nodes are decided to be frozen in the time interval [Tk, Tk+1), the set VF of all (virtual) frozen nodes becomes

VF = VR ∪ {i ∈ VG : ∃ j ∈ N(i) ∩ VR s.t. σj(Tk) = 1},

since even a green node cannot change its schedule while a frozen neighbor is transmitting. By removing VF from G, one can naturally define the partition of all non-frozen nodes as {G1, G2, . . .}, where Gl = (Vl, El) and ∪l Vl = V \ VF. Recall that the algorithm changes its scheduling decision when a non-frozen node's Exponential clock of rate 1 ticks. Due to the property of the Exponential distribution, no two

nodes have clocks ticking at the same time. Given a clock tick, it is equally likely to be any of the |Vl| nodes in the component Gl. The node whose clock ticks decides its transition based on the probability prescribed by the Glauber dynamics GD(W(t)), where, recall, W(t) is determined by Q(Tk) for t ∈ [Tk, Tk+1). Let P be the transition matrix prescribed by the Glauber dynamics GD(W(t)) for t ∈ [Tk, Tk+1) on the partition graph Gl. The stationary distribution π of the Glauber dynamics can be characterized easily from the reversibility of P: for W = W(t) = W(Tk), t ∈ [Tk, Tk+1),

πσl ∝ exp(W · σl),    for σl ∈ I(Gl).    (2)

Further, using the variational characterization of the distribution π in exponential form, one can obtain the following lemma, whose proof is presented in Appendix A.

Lemma 2.    Eπ[W · σl] ≥ max_{ρ∈I(Gl)} W · ρ − |Vl|.

Lemma 2 implies that sampling a schedule σl according to π is a max-weight choice with respect to W, with an additive error of at most |Vl|. Now let µl(t) denote the distribution of the schedule σl(t) ∈ I(Gl) at time t in the component Gl. The algorithm essentially runs P on I(Gl) whenever a clock ticks at time t. Since there are |Vl| clocks of rate 1, we have

µl(t) = Σ_{i=0}^{∞} Pr(ζ = i) µl(Tk) P^i = µl(Tk) e^{|Vl|(t−Tk)(P−I)},    (3)

where ζ is the number of clock ticks in the time interval [Tk, t], which is distributed as a Poisson random variable with mean |Vl|(t − Tk). Therefore, P is the kernel of the embedded Markov process {σl(t/|Vl|)} for t ∈ [Tk, Tk+1).

4.2 Mixing Time

We would like to establish that the distribution µl(t) is close to its stationary distribution for most of the time t ∈ [Tk, Tk+1), starting from any initial condition µl(Tk). To establish our results, we need quantifiable bounds on the time it takes for the process to come close to its stationary distribution, popularly known as the mixing time. To make this notion precise and recall known bounds on the mixing time, we start with a definition of distance between probability distributions.

Definition 4 (Distance of Measures). Given two probability distributions ν and µ on a finite space Ω, the total variation distance, denoted ∥ν − µ∥TV, is

∥ν − µ∥TV = (1/2) Σ_{i∈Ω} |νi − µi|.

For ε > 0 and a given initial distribution µ(Tk), the mixing time can be quantified using this distance: with π the corresponding stationary distribution,

TTV(ε) = min { t : ∥µ(Tk + t) − π∥TV ≤ ε }.    (4)

Since the Markov process induced by our algorithm is reversible, it is known [28] that

TTV(ε) ≤ (1/λ) ( (1/2) log (1/πmin) + log (1/(2ε)) ),

where λ is the spectral gap of the kernel of the Markov process. Hence, from (3), we have

TTV(ε) ≤ (1/(|Vl| λP)) ( (1/2) log (1/πmin) + log (1/(2ε)) ),    (5)

where λP is the spectral gap of P. From (2), with Wmax ≤ C, it is easy to obtain a lower bound on πmin in terms of |Vl| and C:

πmin ≥ 1 / ( 2^{|Vl|} exp(C|Vl|) ).    (6)

A naive bound on λP, via Cheeger's inequality [7, 38], is also known in terms of |Vl| and C (see Lemma 3 in [34]):

λP ≥ 1 / ( |Vl|² 2^{2|Vl|+3} exp(2(|Vl| + 1)C) ).    (7)

Therefore, from (5), (6) and (7), one can find an explicit constant T(ε, |Vl|, C) such that

TTV(ε) ≤ T(ε, |Vl|, C).    (8)
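To illustrate Definition 3 and the stationary distribution (2), the following sketch runs the Glauber dynamics GD(W) on a small graph and compares the chain's empirical occupation measure with π(σ) ∝ exp(W · σ) by total variation distance. This is our own toy harness; the brute-force enumeration of independent sets is only feasible for small graphs.

```python
import itertools, math, random

def glauber_step(sigma, adj, W, rng):
    """One transition of the Glauber dynamics GD(W) of Definition 3."""
    n = len(sigma)
    i = rng.randrange(n)
    if any(sigma[j] for j in adj[i]):
        sigma[i] = 0  # a neighbor transmits, so i must stay silent
    else:
        p = math.exp(W[i]) / (1.0 + math.exp(W[i]))
        sigma[i] = 1 if rng.random() < p else 0
    return sigma

def stationary_tv_gap(adj, W, steps=200000, seed=1):
    """Empirical TV distance between the chain's occupation measure and
    pi(sigma) proportional to exp(W . sigma) over independent sets, eq. (2)."""
    n = len(adj)
    rng = random.Random(seed)
    ind = [s for s in itertools.product([0, 1], repeat=n)
           if not any(s[i] and s[j] for i in range(n) for j in adj[i])]
    wt = lambda s: sum(W[i] * s[i] for i in range(n))
    Z = sum(math.exp(wt(s)) for s in ind)
    pi = {s: math.exp(wt(s)) / Z for s in ind}
    counts = {s: 0 for s in ind}
    sigma = [0] * n
    for _ in range(steps):
        glauber_step(sigma, adj, W, rng)
        counts[tuple(sigma)] += 1
    return 0.5 * sum(abs(counts[s] / steps - pi[s]) for s in ind)
```

On a 4-cycle with uniform weights the empirical gap shrinks with the number of steps, consistent with the reversibility of GD(W) and the bound (8).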

4.3 Graph Decomposition

The coloring scheme in Section 3.2 is the graph-decomposition scheme of [18] (see page 40) for polynomial growth graphs. For a polynomial growth graph G with rate ρ, the authors consider the following constant K and distribution of R: for some ε ∈ (0, 1),

Pr[R = i] = ε(1 − ε)^{i−1} if 1 ≤ i < K,    Pr[R = K] = (1 − ε)^{K−1},

K(ε, ρ) = (8ρ/ε) log(8ρ/ε) + (4/ρ) log B + (4/ρ) log(1/ε) + 2.

Under this randomized coloring scheme, they prove the following (see Lemma 4 in [18]).

Lemma 3. For any v ∈ V, Pr[v is colored red] ≤ 2ε.
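The distribution of R above is a geometric distribution truncated at K, with the entire remaining tail mass placed on K; the following sketch (our own helper) confirms it is a proper probability mass function:

```python
def radius_pmf(K, eps):
    """P[R=i] of Section 4.3: truncated geometric with the tail mass on K.
    Entry j of the returned list is P[R = j+1]."""
    pmf = [eps * (1 - eps) ** (i - 1) for i in range(1, K)]
    pmf.append((1 - eps) ** (K - 1))  # all remaining mass on R = K
    return pmf
```

The telescoping tail is what gives each node probability at most ε of stopping at any given radius, which underlies the 2ε bound of Lemma 3.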

Lemma 3 implies that for any v ∈ V,

Pr[v is frozen] = Pr[v ∈ VF]
    ≤ Pr[∃ red colored w ∈ {v} ∪ N(v)]
    ≤(a) 2ε(|N(v)| + 1) ≤ 2εB,    (9)

where we use the union bound and Lemma 3 for (a). Therefore, one can make the fraction of red or frozen nodes arbitrarily small under the coloring scheme.

5. PROOF OF MAIN RESULT

This section presents the detailed proof of Theorem 1. We first present the necessary notions and the proof skeleton, followed by the details.

5.1 Network Markov Process

We first introduce the necessary definition of the network Markov process under our algorithm. As before, let τ ∈ Z+ and t ∈ R+ index discrete and continuous time, respectively. Let Q(t) = [Qi(t)] denote the vector of queue-sizes at time t and σ(t) = [σi(t)] ∈ I(G) the scheduling choices of the n nodes at time t. Then it can be checked that the tuple X(τ) = (Q(τL), σ(τL)) is the Markov state of the network operating under the algorithm. Note that X(τ) ∈ X where X = R+^n × I(G). Clearly, X is a Polish space endowed with the natural product topology. Let BX be the Borel σ-algebra of X with respect to this product topology, and let P denote the probability transition matrix of this discrete-time X-valued Markov process. We wish to establish that X(τ) is indeed positive Harris recurrent under this setup. For any x = (Q, σ) ∈ X, we define the norm of x, denoted |x|, as |x| = |Q| + |σ|, where |Q| denotes the standard ℓ1 norm while |σ| is its index in {0, . . . , |I(G)| − 1}, assigned arbitrarily. Thus |σ| is always bounded, so in essence |x| → ∞ if and only if |Q| → ∞. Next, we present the proof of Theorem 1 based on a sequence of lemmas.

5.2 Proof of Theorem 1

Positive Recurrence. We need some definitions to begin with. Given a probability distribution (also called a sampling distribution) a on N, the a-sampled transition matrix of the Markov process, denoted Ka, is defined as

Ka(x, B) = Σ_{τ≥0} a(τ) P^τ(x, B),    for any x ∈ X, B ∈ BX.

Now, we define the notion of a petite set. A non-empty set A ∈ BX is called µa-petite if µa is a non-trivial measure on (X, BX) and a is a probability distribution on N such that for any x ∈ A, Ka(x, ·) ≥ µa(·). A set is called petite if it is µa-petite for some such non-trivial measure µa. A known sufficient condition for positive Harris recurrence of a Markov process is positive Harris recurrence of closed petite sets, as stated in the following lemma; we refer the interested reader to the book by Meyn and Tweedie [26] or the recent survey by Foss and Konstantopoulos [8] for details.

Lemma 4. Let B be a closed petite set. Suppose B is Harris recurrent, i.e. Prx(TB < ∞) = 1 for any x ∈ X. Further, let

sup_{x∈B} Ex[TB] < ∞,

where Ex[·] denotes the expectation with respect to the initial state X(0) = x. Then the Markov process is positive Harris recurrent.

Lemma 4 suggests that to establish the positive Harris recurrence of the network Markov process, it is sufficient to find a closed petite set that satisfies its conditions. To this end, we first establish that there exist closed sets satisfying the conditions of Lemma 4. Later we will establish that they are indeed petite sets. This will conclude the proof of positive Harris recurrence of the network Markov process.

The system Lyapunov function L : X → R+ is defined as

L(x) = Σ_{i=1}^{n} Qi²,    where x = (Q, σ) ∈ X.

We will establish the following drift inequality, whose proof is given in Section 5.3.

Lemma 5. Suppose λ ∈ (1 − δ)Λ and G is a polynomial growth graph of rate ρ. If we choose ε, K, C and L such that

K = K(ε, ρ) = (8ρ/ε) log(8ρ/ε) + (4/ρ) log B + (4/ρ) log(1/ε) + 2,
ε = δ/(8B²),    C = 8B²K^ρ/δ,    L = (3B/δ) ⌈T(δ/8, BK^ρ, C)⌉,

then

Ex[L(X(1)) − L(X(0))] ≤ − Σ_i Qi(0) + O(n),    (10)

where T(·) is defined in (8).

Now define Bκ = {x : L(x) ≤ κ} for any κ > 0. It will follow that Bκ is a closed set. Therefore, Lemma 5 and Theorem 1 in the survey [8] imply that there exists a constant κ0 > 0 such that for all κ > κ0 the following holds:

Ex[TBκ] < ∞ for any x ∈ X,    and    sup_{x∈Bκ} Ex[TBκ] < ∞.

Now we are ready to state the final nugget required in proving positive Harris recurrence.

Lemma 6. Consider any κ > 0. Then the set Bκ = {x : L(x) ≤ κ} is a closed petite set.

The proof of Lemma 6 is presented in Appendix B. Lemmas 4, 5 and 6 together imply that the network Markov process is positive Harris recurrent.

Average Delay. Lemma 5 implies that for τ ∈ Z+,

Ex[L(X(τ+1)) − L(X(τ))] ≤ − Ex[ Σ_i Qi(τL) ] + O(n),

since the network Markov process is time-homogeneous. Summing this inequality over τ from 0 to T − 1, we obtain

Ex[ Σ_{τ=0}^{T−1} Σ_i Qi(τL) ] ≤ O(Tn) + L(X(0)) − L(X(T)) ≤ O(Tn) + L(X(0)).

Therefore, it follows that

lim sup_{T→∞} (1/T) Σ_{τ=0}^{T−1} Ex[ Σ_i Qi(τL) ] ≤ O(n).

By the ergodic property implied by the positive Harris recurrence of the network Markov process, as stated in Section 2.2, we have Σ_i Eπ[Qi] = O(n). This completes the proof of Theorem 1.

5.3 Proof of Lemma 5

We first state the following lemma, which is the key to proving Lemma 5; its proof is given in Section 5.4.

Lemma 7. If G is a polynomial growth graph with rate ρ, then

Ex[Q(0) · σ(t)] ≥ (1 − δ/2) max_{ρ∈I(G)} Q(0) · ρ,

for t ∈ [M, L), where M = ⌈T(δ/8, BK^ρ, C)⌉.

Lemma 7 implies that our algorithm chooses an essentially max-weight schedule with respect to Q(0) after a large enough time (≥ M) within the first time interval. Hence, it follows that for t ∈ [M, L),

Ex[Q(t) · σ(t)] = Ex[Q(0) · σ(t)] + Ex[(Q(t) − Q(0)) · σ(t)]
    ≥(a) (1 − δ/2) max_{ρ∈I(G)} Q(0) · ρ − tn
    ≥ (1 − δ/2) Ex[ max_{ρ∈I(G)} Q(⌊t⌋) · ρ ] − 2tn
    ≥(b) (1 − δ/2) Ex[ max_{ρ∈I(G)} Q(⌊t⌋) · ρ ] − 2Ln,    (11)

where both (a) and (b) follow from the Lipschitz property of Qi, i.e. |Qi(s) − Qi(t)| ≤ |s − t| for all s, t ∈ R+.

Now, using (11), we analyze how the Lyapunov function L evolves over the time interval [τ, τ + 1] for τ ∈ [M, L) ∩ N:

Ex[ Σ_i Qi(τ+1)² − Σ_i Qi(τ)² ]
    = Ex[ Σ_i (Qi(τ+1) − Qi(τ))(Qi(τ+1) + Qi(τ)) ]
    ≤(a) 2 Ex[ Σ_i (Qi(τ+1) − Qi(τ)) Qi(τ) ] + n
    ≤ 2 Ex[A(τ, τ+1) · Q(τ)] − 2 Ex[D(τ, τ+1) · Q(τ)] + n
    ≤ 2 Ex[λ · Q(τ)] − 2 Ex[ Σ_i ∫_τ^{τ+1} σi(y) 1{Qi(y)>0} Qi(τ) dy ] + n
    ≤(b) 2 Ex[λ · Q(τ)] − 2 Ex[ Σ_i ∫_τ^{τ+1} σi(y) 1{Qi(y)>0} Qi(y) dy ] + 3n
    = 2 Ex[λ · Q(τ)] − 2 Ex[ Σ_i ∫_τ^{τ+1} σi(y) Qi(y) dy ] + 3n,

where we use the fact that Qi is Lipschitz for (a) and (b). Hence, we have

Ex[ Σ_i Qi(τ+1)² − Σ_i Qi(τ)² ]
    ≤ 2 Ex[λ · Q(τ)] − 2 Ex[ Σ_i ∫_τ^{τ+1} σi(y) Qi(y) dy ] + 3n
    ≤(a) 2 Ex[ (1 − δ) max_{ρ∈I(G)} ρ · Q(τ) ] − 2 ∫_τ^{τ+1} Ex[σ(y) · Q(y)] dy + 3n
    ≤(b) 2 Ex[ (1 − δ) max_{ρ∈I(G)} ρ · Q(τ) ] − 2 Ex[ (1 − δ/2) max_{ρ∈I(G)} ρ · Q(τ) ] + 4Ln + 3n
    ≤ − δ Ex[ max_{ρ∈I(G)} ρ · Q(τ) ] + O(n)
    ≤(c) − (δ/B) Ex[ Σ_i Qi(τ) ] + O(n).

In the above, (a) and (b) are from λ ∈ (1 − δ)Λ and (11), respectively, and (c) follows since the maximum degree of G is at most B − 1 by the growth condition of the graph defined in Section 2. Summing this inequality over τ from M to L − 1, we obtain

Ex[ Σ_i Qi(L)² − Σ_i Qi(M)² ] ≤ − (δ/B) Σ_{τ=M}^{L−1} Ex[ Σ_i Qi(τ) ] + O(n)(L − M).

Therefore, it follows that

Ex[L(X(1)) − L(X(0))] = Ex[ Σ_i Qi(L)² − Σ_i Qi(0)² ]
    ≤ − (δ/B) Σ_{τ=M}^{L−1} Ex[ Σ_i Qi(τ) ] + O(n)(L − M) + Ex[ Σ_i Qi(M)² − Σ_i Qi(0)² ]
    ≤(a) − (δ/B)(L − M) Σ_i Qi(0) + (δ/B)(L − M)nL + O(n)(L − M) + Ex[ Σ_i Qi(M)² − Σ_i Qi(0)² ]
    ≤(b) − (δ/B)(L − M) Σ_i Qi(0) + 2M Σ_i Qi(0) + O(n)
    ≤(c) − Σ_i Qi(0) + O(n),

where (a) and (b) are from the Lipschitz property of Qi, and (c) is due to L ≥ 3BM/δ and M ≥ 1, from our choice of L in Lemma 5. This completes the proof of Lemma 5.

5.4 Proof of Lemma 7

For a vector v ∈ R^{|V|}, we let v^l and v^F denote the projections of v onto Vl and VF, respectively. We start by observing the following:

Σ_l max_{ρ∈I(Gl)} Ql(0) · ρ ≥ Σ_l Ql(0) · ρ^{l∗}
    = max_{ρ∈I(G)} Q(0) · ρ − Q^F(0) · ρ^{F∗}
    ≥ max_{ρ∈I(G)} Q(0) · ρ − Σ_{i∈VF} Qi(0),    (12)

where ρ∗ ∈ arg max_{ρ∈I(G)} Q(0) · ρ. Recall that P is the transition matrix of the Glauber dynamics GD(W) on the partition graph Gl = (Vl, El) and the weight W is given as

Wi = C Qi(0)/Qmax,i(0) = C Qi(0)/Qmax,Vl(0),    for i ∈ Vl,

where we can define Qmax,Vl(0) = Qmax,i(0) since Qmax,i(0) = Qmax,j(0) for all i, j ∈ Vl. Now we know that for t ≥ M = ⌈T(δ/8, BK^ρ, C)⌉,

∥µl(t) − π∥TV ≤ δ/8,    (13)

from |Vl| ≤ BK^ρ and the definition of T(·) in (8). Thus, for t ∈ [M, L), we have

Ex[W · σl(t)] = E_{µl(t)}[W · σ]
    ≥ Eπ[W · σ] − ∥µl(t) − π∥TV · max_{ρ∈I(Gl)} W · ρ
    ≥(a) (1 − δ/8) max_{ρ∈I(Gl)} W · ρ − |Vl|,

where (a) is from Lemma 2 and (13). Multiplying both sides of this inequality by Qmax,Vl(0)/C, we obtain that for t ∈ [M, L),

Ex[Ql(0) · σl(t)] ≥ (1 − δ/8) max_{ρ∈I(Gl)} Ql(0) · ρ − (|Vl|/C) Qmax,Vl(0).    (14)

Using this, it follows that for t ∈ [M, L),

Ex[Q(0) · σ(t)] ≥ Σ_l Ex[Ql(0) · σl(t)]
    ≥(a) (1 − δ/8) Σ_l max_{ρ∈I(Gl)} Ql(0) · ρ − (1/C) Σ_l |Vl| Qmax,Vl(0)
    ≥(b) (1 − δ/8) Σ_l max_{ρ∈I(Gl)} Ql(0) · ρ − (BK^ρ/C) Σ_{i∈V} Qi(0)
    ≥(c) (1 − δ/8) ( max_{ρ∈I(G)} Q(0) · ρ − E[ Σ_{i∈VF} Qi(0) ] ) − (BK^ρ/C) Σ_{i∈V} Qi(0)
    ≥(d) (1 − δ/8) max_{ρ∈I(G)} Q(0) · ρ − ( 2εB + BK^ρ/C ) Σ_{i∈V} Qi(0).

In the above, (a), (c) and (d) are from (14), (12) and (9), respectively; (b) follows from the fact that Qmax,Vl(0) is the maximum of at most BK^ρ queue-sizes. Finally, if we choose ε and C as in Lemma 5, the above inequality leads to

Ex[Q(0) · σ(t)]
    ≥ (1 − δ/8) max_{ρ∈I(G)} Q(0) · ρ − ( 2εB² + B²K^ρ/C ) max_{ρ∈I(G)} Q(0) · ρ
    ≥ (1 − δ/8) max_{ρ∈I(G)} Q(0) · ρ − ( δ/4 + δ/8 ) max_{ρ∈I(G)} Q(0) · ρ
    = (1 − δ/2) max_{ρ∈I(G)} Q(0) · ρ,

where the second-to-last inequality is easily checked using the fact that the maximum degree of G is at most B − 1, so that Σ_{i∈V} Qi(0) ≤ B max_{ρ∈I(G)} Q(0) · ρ. This completes the proof of Lemma 7.

6. SIMULATIONS

We consider an N × N two-dimensional grid interference graph G to study the performance of our algorithm. We choose this example since (1) it reasonably approximates a mesh wireless network scenario, and (2) it is easy to characterize the capacity region of the network as

Λ = {λ ∈ R+^{N²} : λu + λv ≤ 1 for all edges (u, v)},

where (2) holds because G is a bipartite (hence perfect) graph. In our simulations, we consider Bernoulli arrival processes with a uniform load ρ ∈ [0, 1], i.e. λ ∈ ρΛ and λi = λj for all i, j. Unless otherwise stated, we use ρ = 0.85, i.e. λi = 0.5 × 0.85 = 0.425 for every node i. Under this arrival traffic, we simulate our algorithm of Section 3 with constants C, L, R. Note that although we suggest a randomized choice of R for theoretical reasons, we fix R in the simulations for simplicity.
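This setup can be reproduced at small scale with the following toy discrete-time version of the simulation (our own simplification: slotted time, R-hop maxima computed centrally rather than by message passing, and no freezing):

```python
import math, random

def simulate_grid_csma(N, C, load, steps, R=5, seed=0):
    """Toy slotted simulation on an N x N grid: Bernoulli arrivals of rate
    load/2 per node, Glauber updates with weight W_i = C * Q_i / Qmax_i
    (Qmax over an R-hop L1 ball). Returns the time-averaged queue-size."""
    rng = random.Random(seed)
    n = N * N
    def nbrs(v):
        x, y = divmod(v, N)
        out = []
        if x > 0: out.append(v - N)
        if x < N - 1: out.append(v + N)
        if y > 0: out.append(v - 1)
        if y < N - 1: out.append(v + 1)
        return out
    Q, sigma, total = [0] * n, [0] * n, 0.0
    lam = 0.5 * load  # lambda_i = 0.5 * rho on the grid
    for _ in range(steps):
        for v in range(n):  # Bernoulli arrivals
            if rng.random() < lam:
                Q[v] += 1
        for _ in range(n):  # n single-site updates ~ one unit of time
            v = rng.randrange(n)
            if any(sigma[u] for u in nbrs(v)):
                sigma[v] = 0
            else:
                x, y = divmod(v, N)
                qmax = max(Q[u] for u in range(n)
                           if abs(u // N - x) + abs(u % N - y) <= R)
                w = C * Q[v] / qmax if qmax > 0 else 0.0
                sigma[v] = 1 if rng.random() < math.exp(w) / (1 + math.exp(w)) else 0
        for v in range(n):  # departures: a scheduled node serves one packet
            if sigma[v] and Q[v] > 0:
                Q[v] -= 1
        total += sum(Q) / n
    return total / steps
```

The figures below were generated from the full continuous-time algorithm; this sketch only illustrates the setup and is not the code used for the reported plots.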

Figure 1: Comparison of RSS and our algorithm.

Comparison with RSS. We first compare our algorithm with that of [31, 34], which we call RSS. This simulation is done on a 10 × 10 grid network G (i.e. N = 10). We run our algorithm with the weights of Section 3.3 and parameters C = L = R = 5, and RSS with the double logarithm of queue-sizes as weights, as its authors suggest. In Figure 1, the x-axis is the time domain and the y-axis is the average queue-size in the network, i.e. Σi Qi / N². The simulation presented in Figure 1 shows that both algorithms are stable, but our algorithm has much better delay performance. Moreover, we observe that the delay gap between the two algorithms increases under highly loaded arrival traffic (i.e. ρ → 1) and in large networks (i.e. N → ∞). In other words, our algorithm performs much better under more extreme scenarios.

Verification of Theorem 1. Now, through simulations, we verify Theorem 1, which guarantees universal constants C, L, R independent of the network size N². We fix C = L = R = 5 in our algorithm and run it while varying the network size N = 10, 30, 50. We obtain Figure 2,

Figure 2: Effect of the network size on the delay performance.

which shows that the fixed choice C = L = R = 5 makes the algorithm stable for all N = 10, 30, 50. Further, the average-delay in the case N = 50 is roughly the same as that in the case N = 30, although the network size increases by a factor of 25/9! This matches the conclusion of Theorem 1, i.e. the average-delay of our algorithm with appropriate choices of universal constants remains bounded by a constant as N → ∞.

Figure 3: Relation between the delay performance and the arrival traffic load ρ.

Relation between the delay performance and ρ. Next, we study how the average-delay of our algorithm changes with the arrival traffic load ρ. To this end, we use the setup N = 10 and C = L = R = 5. While changing ρ from 0.1 to 0.9, we compute the average queue-size up to time 100000, i.e. (1/100000) Σ_{t=0}^{99999} Σi Qi(t)/N², which corresponds to the y-axis of Figure 3.

Verification of Conjecture. In Section 3.4, we made the interesting conjecture that the freezing scheme in our algorithm may not be necessary. To check whether this conjecture holds, we compare the performance of our algorithm under two scenarios: (1) N = 10, C = L = R = 5, and (2) N = 10, C = L = 5, R = 2N = 20. The second scenario, which chooses R = 2N, represents the no-freezing case since the diameter of G is at most 2N. The delay performance in the two scenarios, depicted in Figure 4, is roughly the same; hence the choice of R (i.e. the quality of the freezing scheme in our algorithm) may not be crucial to the performance of the algorithm in grid-like regular networks.

Figure 4: Effect of the freezing scheme on the performance.

Effect of C on Performance. As the last scenario, we compare the performance of our algorithm in terms of C, which decides the weights in (1). Running our algorithm with N = 10, R = L = 5 while changing C = 3, 5, 9 leads to Figure 5.

Figure 5: Effect of C on the performance.

This shows that a small C does not even guarantee the stability of the network, while a large C increases the average-delay. This phenomenon is also expected from the theoretical proofs in this paper: since our algorithm implicitly aims to find an approximate MW schedule, small and large C increase the approximation error and the time-complexity (i.e. the mixing time of the Glauber dynamics), respectively.

7. CONCLUSION

In the past years, we have witnessed exciting progress in the design of queue-based randomized CSMA algorithms. These approaches yield throughput optimal algorithms but induce very large delay. In this paper, motivated by the goal of designing a low-delay and high-throughput algorithm for wireless networks, we have designed a queue-based CSMA algorithm that utilizes minimal local network topology to achieve such performance for networks with geometry. We introduced a new queue-weight function for the random access probabilities, along with a local graph partitioning scheme. Our simulation results suggest that for regular enough networks, our weight function leads to low delay and high throughput even without the partitioning scheme. In summary, we believe that the results of this paper bring us closer to a practically useful queue-based CSMA for the wireless networks that arise in practice.

8. REFERENCES

[1] N. Abramson and F. Kuo (editors). The Aloha system. Computer-Communication Networks, 1973.
[2] D. J. Aldous. Ultimate instability of exponential back-off protocol for acknowledgement-based transmission control of random access communication channels. IEEE Transactions on Information Theory, 33(2):219–223, 1987.
[3] T. E. Anderson, S. S. Owicki, J. B. Saxe, and C. P. Thacker. High-speed switch scheduling for local-area networks. ACM Transactions on Computer Systems, 11(4):319–352, 1993.
[4] C. Bordenave, D. McDonald, and A. Proutiere. Performance of random medium access - an asymptotic approach. In Proceedings of ACM Sigmetrics, 2008.
[5] J. G. Dai. Stability of fluid and stochastic processing networks. Miscellanea Publication, (9), 1999.
[6] J. G. Dai and W. Lin. Asymptotic optimality of maximum pressure policies in stochastic processing networks. Annals of Applied Probability, 18(6):2239–2299, 2008.
[7] M. Dyer, A. Frieze, and R. Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM, 38(1):1–17, 1991.
[8] S. Foss and T. Konstantopoulos. An overview of some stability methods. Journal of the Operations Research Society of Japan, 47(4):275–303, 2004.
[9] R. K. Getoor. Transience and recurrence of Markov processes. In J. Azéma and M. Yor, editors, Séminaire de Probabilités XIV, pages 397–409, 1979.
[10] P. Giaccone, B. Prabhakar, and D. Shah. Randomized scheduling algorithms for high-aggregate bandwidth switches. IEEE Journal on Selected Areas in Communications, 21(4):546–559, 2003.
[11] L. A. Goldberg. Design and analysis of contention-resolution protocols, EPSRC research grant GR/L60982. http://www.csc.liv.ac.uk/ leslie/contention.html, last updated Oct. 2002.
[12] L. A. Goldberg, M. Jerrum, S. Kannan, and M. Paterson. A bound on the capacity of backoff and acknowledgement-based protocols. Research Report 365, Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK, January 2000.
[13] P. Gupta and A. L. Stolyar. Optimal throughput allocation in general random-access networks. In Proceedings of the 40th Annual Conference on Information Sciences and Systems, Princeton, NJ, pages 1254–1259, 2006.
[14] J. Hastad, T. Leighton, and B. Rogoff. Analysis of backoff protocols for multiple access channels. SIAM Journal on Computing, 25(4), 1996.
[15] L. Jiang, D. Shah, J. Shin, and J. C. Walrand. Distributed random access algorithm: Scheduling and congestion control. CoRR, abs/0907.1266, 2009.
[16] L. Jiang and J. Walrand. A distributed CSMA algorithm for throughput and utility maximization in wireless networks. In Proceedings of the 46th Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL, 2008.
[17] C. Joo, X. Lin, and N. B. Shroff. Understanding the capacity region of the greedy maximal scheduling algorithm in multi-hop wireless networks. In IEEE INFOCOM 2008, pages 1103–1111, 2008.
[18] K. Jung. Approximate Inference: Decomposition Methods with Applications to Networks. PhD thesis, Massachusetts Institute of Technology, 2009.
[19] F. P. Kelly. Stochastic models of computer communication systems. Journal of the Royal Statistical Society B, 47(3):379–395, 1985.
[20] F. P. Kelly and I. M. MacPhee. The number of packets transmitted by collision detect random access schemes. The Annals of Probability, 15(4):1557–1568, 1987.
[21] M. Leconte, J. Ni, and R. Srikant. Improved bounds on the throughput efficiency of greedy maximal scheduling in wireless networks. In MobiHoc, 2009.
[22] J. Liu, Y. Yi, A. Proutiere, M. Chiang, and V. Poor. Convergence and tradeoff of utility-optimal CSMA. Submitted, IEEE Communication Letters, February 2009.
[23] I. M. MacPhee. On Optimal Strategies in Stochastic Decision Processes. D.Phil. thesis, University of Cambridge, 1989.
[24] P. Marbach, A. Eryilmaz, and A. Ozdaglar. Achievable rate region of CSMA schedulers in wireless networks with primary interference constraints. In Proceedings of the IEEE Conference on Decision and Control, 2007.
[25] N. McKeown. iSLIP: a scheduling algorithm for input-queued switches. IEEE Transactions on Networking, 7(2):188–201, 1999.
[26] S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Springer-Verlag, London, 1993.
[27] E. Modiano, D. Shah, and G. Zussman. Maximizing throughput in wireless networks via gossiping. In ACM SIGMETRICS/Performance, 2006.
[28] R. Montenegro and P. Tetali. Mathematical aspects of mixing times in Markov chains. Foundations and Trends in Theoretical Computer Science, 1(3):237–354, 2006.
[29] J. Mosely and P. A. Humblet. A class of efficient contention resolution algorithms for multiple access channels. IEEE Transactions on Communications, 33(2):145–151, 1985.
[30] J. Ni and R. Srikant. Distributed CSMA/CA algorithms for achieving maximum throughput in wireless networks. In Proceedings of the Information Theory and Applications Workshop, 2009.
[31] S. Rajagopalan, D. Shah, and J. Shin. A network adiabatic theorem: an efficient randomized protocol for contention resolution. In ACM Sigmetrics/Performance, 2009.
[32] S. Sanghavi, L. Bui, and R. Srikant. Distributed link scheduling with constant overhead. In Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, page 324. ACM, 2007.
[33] D. Shah. Gossip Algorithms. Foundations and Trends in Networking, Now Publishers Inc, June 2009. Available at http://web.mit.edu/devavrat/www/Gossipbook.pdf.
[34] D. Shah and J. Shin. Randomized scheduling algorithm for queueing networks. CoRR, abs/0908.3670, 2009.
[35] D. Shah, D. N. C. Tse, and J. N. Tsitsiklis. Hardness of low delay network scheduling. Submitted to IEEE Transactions on Information Theory, 2009.
[36] D. Shah and D. J. Wischik. Optimal scheduling algorithm for input queued switch. In Proceedings of IEEE INFOCOM, 2006.
[37] D. Shah and D. J. Wischik. The teleology of scheduling algorithms for switched networks under light load, critical load, and overload. http://web.mit.edu/devavrat/www/shahwischik.pdf, 2007-09.
[38] A. Sinclair. Algorithms for Random Generation and Counting: A Markov Chain Approach. Birkhäuser, Boston, 1993.
[39] A. L. Stolyar. Dynamic distributed scheduling in random access networks. Journal of Applied Probability, 45(2):297–313, 2008.
[40] A. L. Stolyar. Maxweight scheduling in a generalized switch: State space collapse and workload minimization in heavy traffic. The Annals of Applied Probability, 14(1):1–53, 2004.
[41] L. Tassiulas. Linear complexity algorithms for maximum throughput in radio networks and input queued switches. In IEEE INFOCOM, volume 2, pages 533–539, 1998.
[42] L. Tassiulas and A. Ephremides. Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. IEEE Transactions on Automatic Control, 37:1936–1948, 1992.
[43] B. S. Tsybakov and N. B. Likhanov. Upper bound on the capacity of a random multiple-access system. Problemy Peredachi Informatsii, 23(3):64–78, 1987.

APPENDIX

A. PROOF OF LEMMA 2

To simplify notation, we write G and σ instead of G_l and σ_l. Observe that the definition of the distribution π implies that for any σ ∈ I(G),

    W · σ = log Z + log π_σ,

where Z is the normalizing factor in (2). Using this, for any distribution µ on I(G), we define

    F(µ) = E_µ[W · σ] + H_ER(µ),

where H_ER(µ) is the standard discrete entropy of µ. Now we obtain

    F(µ) = Σ_σ µ_σ (W · σ) − Σ_σ µ_σ log µ_σ
         = Σ_σ µ_σ (log Z + log π_σ) − Σ_σ µ_σ log µ_σ
         = Σ_σ µ_σ log Z + Σ_σ µ_σ log(π_σ/µ_σ)
         = log Z + Σ_σ µ_σ log(π_σ/µ_σ)
         ≤ log Z + log( Σ_σ µ_σ · (π_σ/µ_σ) )
         = log Z,

where the inequality follows from the concavity of log (Jensen's inequality), with equality if and only if µ = π. Let σ* ∈ arg max_{σ ∈ I(G)} W · σ and let µ be the delta distribution µ_σ = 1_{σ=σ*}. For this distribution, F(µ) = W · σ*. But F(π) ≥ F(µ). Also, the maximal entropy of any distribution on I(G) is log |I(G)| ≤ |V_l|. Therefore,

    W · σ* ≤ F(π) = E_π[W · σ] + H_ER(π) ≤ E_π[W · σ] + |V_l|.

This completes the proof of Lemma 2.
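The variational identity above (F(µ) ≤ log Z with equality exactly at µ = π, hence W · σ* ≤ E_π[W · σ] + |V|) can be checked numerically on a small graph. The following is an illustrative sketch, not code from the paper; the graph, weights, and helper names are chosen for the example only.

```python
# Numerical sanity check of Lemma 2: for pi_sigma proportional to exp(W . sigma)
# over independent sets I(G), the functional F(mu) = E_mu[W . sigma] + H(mu)
# attains its maximum log Z at mu = pi, so max_sigma W . sigma <= E_pi[W . sigma] + |V|.
import itertools
import math

def independent_sets(n, edges):
    """Enumerate all independent sets of an n-node graph as 0/1 tuples."""
    return [s for s in itertools.product([0, 1], repeat=n)
            if all(not (s[u] and s[v]) for u, v in edges)]

def check_lemma2(W, edges):
    n = len(W)
    I = independent_sets(n, edges)
    weight = lambda s: sum(w * x for w, x in zip(W, s))
    Z = sum(math.exp(weight(s)) for s in I)          # normalizing factor
    pi = [math.exp(weight(s)) / Z for s in I]        # product-form distribution
    E_pi = sum(p * weight(s) for p, s in zip(pi, I)) # E_pi[W . sigma]
    H_pi = -sum(p * math.log(p) for p in pi if p > 0)
    max_w = max(weight(s) for s in I)                # F of the delta at sigma*
    return E_pi + H_pi, math.log(Z), max_w, E_pi, n

# 4-cycle interference graph with arbitrary positive weights
F_pi, logZ, max_w, E_pi, n = check_lemma2(
    W=[1.0, 2.0, 0.5, 1.5], edges=[(0, 1), (1, 2), (2, 3), (3, 0)])
assert abs(F_pi - logZ) < 1e-9   # equality case F(pi) = log Z
assert max_w <= logZ + 1e-9      # F(delta_{sigma*}) <= log Z
assert max_w <= E_pi + n + 1e-9  # W . sigma* <= E_pi[W . sigma] + |V|
```

The assertions mirror the three steps of the proof: F(π) = log Z, F(δ_{σ*}) ≤ F(π), and the entropy bound H_ER(π) ≤ |V|.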

B. PROOF OF LEMMA 6

First note that B_κ is closed by definition. To establish that it is a petite set, we need to find a non-trivial measure µ on (X, B_X) and a sampling distribution a on Z_+ such that K_a(x, ·) ≥ µ(·) for every x ∈ B_κ. To construct such a measure µ, we shall use the following lemma.

Lemma 8. Let the network Markov process X(·) start from the state x ∈ B_κ at time 0, i.e. X(0) = x. Then there exist T_κ ≥ 1 and γ_κ > 0 such that

    Σ_{τ=1}^{T_κ} Pr_x(X(τ) = 0) ≥ γ_κ,   for all x ∈ B_κ.

Here 0 = (0, 0) ∈ X denotes the state in which all components of Q are 0 and the schedule is the empty independent set.

Proof. Consider any x ∈ B_κ. By definition, the size of each queue is no more than F^{-1}(κ). Consider some large enough (soon to be determined) T_κ. By the property of the Bernoulli arrival process, there is a positive probability θ_κ^0 > 0 that no arrivals happen to the system during time T_κ L. Assuming no arrivals happen, we will show that within a large enough time t_κ^1, with probability θ_κ^1 > 0, each queue receives at least F^{-1}(κ) amount of service; and after that, within an additional time t^2, with positive probability θ^2 > 0, the empty-set schedule is reached. This implies that, defining T_κ ≜ (t_κ^1 + t^2)/L, the state 0 ∈ X is reached with probability at least γ_κ ≜ θ_κ^0 θ_κ^1 θ^2 > 0, which immediately yields the desired result of Lemma 8.

To this end, it remains to establish the existence of t_κ^1, θ_κ^1 and t^2, θ^2 with the properties stated above. First, the existence of t_κ^1, θ_κ^1. Focus on a specific node i. The probabilities of i being frozen and unfrozen in the coloring scheme are at least some constants η_1 and η_2, respectively. Thus, if σ_i(0) = 1, then σ_i(t) = 1 and i is frozen during the second round t ∈ [L, 2L) with probability at least η_1^2. On the other hand, if σ_i(0) = 0, then σ_i(t) = 1 and i is frozen during the second round t ∈ [L, 2L) with probability at least η_2 × Pr[Markov chain P makes σ_i(L) = 1] × η_1, where recall that P is described in Section 4.1. Since the transition probabilities of P are uniformly bounded in terms of C, there exists a constant η_3 such that η_2 × Pr[Markov chain P makes σ_i(L) = 1] × η_1 ≥ η_3. Therefore, regardless of the initial schedule σ_i(0) of node i, the probability that i keeps transmitting during the second round [L, 2L) is at least η_4 ≜ min{η_1^2, η_3}. Applying this argument iteratively over time, after time ⌈F^{-1}(κ) + L⌉, queue i receives at least F^{-1}(κ) amount of work with probability η_4^{⌈F^{-1}(κ)/L⌉}. Thus, after time t_κ^1 = n⌈F^{-1}(κ) + L⌉, all queues receive at least F^{-1}(κ) amount of work with probability θ_κ^1 = η_4^{n⌈F^{-1}(κ)/L⌉}.

Next, to establish the existence of t^2, θ^2 as desired, observe that once the system reaches empty queues, then in the absence of new arrivals the empty schedule 0 is reached after some finite time t^2 with probability θ^2 > 0, by similar properties of the Markov chain P on I(G) when all queues are 0. Here t^2 and θ^2 depend on n only. This completes the proof of Lemma 8.

In what follows, Lemma 8 is used to complete the proof that B_κ is a closed petite set. To this end, consider Geometric(1/2) as the sampling distribution a, i.e. a(ℓ) = 2^{-ℓ}, ℓ ≥ 1. Let δ_0 be the delta distribution on the element 0 ∈ X. Then define µ by µ = 2^{-T_κ} γ_κ δ_0, that is, µ(·) = 2^{-T_κ} γ_κ δ_0(·). Clearly, µ is a non-trivial measure on (X, B_X). With these definitions of a and µ, Lemma 8 immediately implies that K_a(x, ·) ≥ µ(·) for any x ∈ B_κ. This establishes that B_κ is a closed petite set.
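The final minorization step, K_a(x, ·) ≥ 2^{-T_κ} γ_κ δ_0(·), follows purely from the bound in Lemma 8 and the choice a(ℓ) = 2^{-ℓ}, and can be verified on any finite chain. The toy 3-state chain below is an illustrative stand-in for the network process (its transition matrix is invented for the example), with state 0 playing the role of the all-empty state:

```python
# Sanity check of the petite-set construction: if
# sum_{tau=1}^{T} Pr_x(X(tau) = 0) >= gamma for all x, then with Geometric(1/2)
# sampling a(l) = 2^{-l} the sampled kernel satisfies
# K_a(x, {0}) = sum_l 2^{-l} P^l(x, {0}) >= 2^{-T} gamma.
import numpy as np

# Toy transition matrix on states {0, 1, 2}; 0 is the "all-empty" state.
P = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1],
              [0.2, 0.3, 0.5]])

T, L_MAX = 3, 60  # horizon T of Lemma 8; truncation level for the geometric sum
powers = [np.linalg.matrix_power(P, l) for l in range(1, L_MAX + 1)]

# gamma = min over starting states of sum_{tau=1}^{T} Pr_x(X(tau) = 0)
gamma = min(sum(powers[tau - 1][x, 0] for tau in range(1, T + 1))
            for x in range(3))

# Sampled kernel K_a(x, {0}) with a(l) = 2^{-l}
K_a = [sum(2.0 ** (-l) * powers[l - 1][x, 0] for l in range(1, L_MAX + 1))
       for x in range(3)]

for x in range(3):
    # minorization: K_a(x, .) >= mu(.) with mu({0}) = 2^{-T} gamma
    assert K_a[x] >= 2.0 ** (-T) * gamma
```

The inequality holds because keeping only the first T terms of the geometric sum and lower-bounding each weight 2^{-ℓ} by 2^{-T} recovers exactly the sum bounded below by γ in Lemma 8.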
