
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 1, JANUARY 2011


Queue-Aware Distributive Resource Control for Delay-Sensitive Two-Hop MIMO Cooperative Systems

Rui Wang, Member, IEEE, Vincent K. N. Lau, Senior Member, IEEE, and Ying Cui, Student Member, IEEE

Abstract—In this paper, we consider a queue-aware distributive resource control algorithm for two-hop MIMO cooperative systems. We shall illustrate that relay buffering is an effective way to reduce the intrinsic half-duplex penalty in cooperative systems. The complex interactions of the queues at the source node and the relays are modeled as an average-cost infinite horizon Markov decision process (MDP). The traditional approach to solving this MDP problem involves centralized control with huge complexity. To obtain a distributive and low complexity solution, we introduce a linear structure which approximates the value function of the associated Bellman equation by the sum of per-node value functions. We derive a distributive two-stage two-winner auction-based control policy which is a function of the local CSI and local QSI only. Furthermore, to estimate the best-fit approximation parameter, we propose a distributive online stochastic learning algorithm using stochastic approximation theory. Finally, we establish technical conditions for almost-sure convergence and show that under heavy traffic, the proposed low complexity distributive control is globally optimal.

Index Terms—Cooperative system, delay-sensitive, distributive scheduling, stochastic control.

I. INTRODUCTION

Cooperative relay communication has been a hot research topic in both academia [1] and industry [2] because it can exploit the broadcast nature of wireless communication to achieve cooperative diversity. One potential issue of cooperative communication is the half-duplex penalty at the relay nodes. Some recent works address the half-duplex issue in cooperative relay systems. For example, a complex echo cancellation technique is used at the relay to cancel the coupled interference from the transmitting path [3]. However, these works all focus on physical-layer signal processing. In [4], the authors exploit a special topology and propose relay protocols that remove the half-duplex penalty.

Manuscript received February 11, 2010; accepted September 16, 2010. Date of publication October 14, 2010; date of current version December 17, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Xiang-Gen Xia. This work was supported by RGC 615609. R. Wang was with the Department of ECE, The Hong Kong University of Science and Technology. He is now with Huawei Technologies Co., Ltd., Bantian, Longgang District, Shenzhen 518129, P.R. China (e-mail: [email protected]). V. K. N. Lau and Y. Cui are with the Department of ECE, The Hong Kong University of Science and Technology (e-mail: [email protected]; cuiying@ust.hk). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSP.2010.2086449

Moreover, this approach depends heavily on the locations of the relays and cannot be extended to a general relay channel.

In this paper, we are interested in exploring a system-level solution to the half-duplex issue. We consider a simple MIMO cooperative relay system with a multiantenna source node (Src), M multiantenna relay nodes (RS), and a multiantenna destination node (Dst). We shall illustrate that relay buffering can be utilized to significantly reduce the intrinsic half-duplex penalty. Since buffering is involved, it is important to consider not only the throughput performance but also the associated end-to-end delay performance. As a result, we shall focus on delay-optimal resource control for the two-hop protocol in MIMO cooperative relay systems.

Delay-optimal resource control in cooperative relay systems is a very difficult problem. Most existing works assume infinite backlogs of information and focus on optimizing the throughput performance only. A systematic approach is to model the delay-optimal control as a Markov decision process (MDP) [5], [6]. However, there is the well-known curse of dimensionality, and brute-force value iteration or policy iteration cannot give simple implementable solutions1. For multihop systems, there is a unique challenge concerning the complex interactions of the buffers at the source node and the RS nodes, and the existing solutions for single-hop systems cannot be extended easily to deal with this situation. A few recent works have considered queue dynamics in relay systems [7], [8]. However, these works focus on the characterization of the stability region and on throughput-optimal control. The question of delay-optimal control for cooperative relay systems remains open.

In addition, another important technical challenge is distributive implementation. For instance, the entire system state is characterized by the global CSI (CSI among every pair of nodes in the system) as well as the global QSI (QSI of every buffer in the system). A brute-force solution of the MDP yields a control policy that is adaptive to the global CSI and global QSI. This poses a huge implementation challenge because this global system state information is distributed locally at each of the source and relay nodes.

In this paper, we shall address the above challenges as follows. We shall first formulate the delay-optimal resource control policy (such as the power control and RS selection) as an average-cost infinite horizon Markov decision process (MDP). To

1For example, for a system with a maximum buffer length of 20, 3 CSI states and M RSs, the total number of system states is 20 3 , which is unmanageable even for a small number of RSs.

1053-587X/$26.00 © 2010 IEEE


Fig. 1. Illustration of the two-hop MIMO cooperative system with a multiantenna source node, 2 multiantenna RS nodes and a multiantenna destination node. By exploiting buffers at the 2 MIMO RSs, the S-R link (source node to RS1) and R-D link (RS2 to destination node) can deliver packets simultaneously without interfering with each other using signal processing techniques (with appropriate precoder and decorrelator designs).

alleviate the curse of dimensionality, and to obtain a distributive and low complexity solution, we first introduce a per-node value function to approximate the value function of the associated Bellman equation. Based on the per-node value function, we derive a distributive two-stage two-winner auction-based control policy, which is a function of the local CSI and local QSI. The per-node value function is obtained via a distributive online stochastic learning algorithm, which requires local CSI and local QSI only. The proposed online stochastic learning differs from conventional reinforcement learning [9] in two main ways: (1) we are dealing with a constrained MDP (CMDP), and our online iterative solution updates both the value function and the Lagrange multipliers (LMs) simultaneously; (2) the control action is determined from the per-node value functions of all the nodes via a per-slot auction mechanism. Therefore, the algorithm dynamics of the per-node online learning is not a contraction mapping, and hence the standard convergence proof using the fixed point theorem cannot be applied directly in our case. Using the technique of separation of time scales, we establish technical conditions for the almost-sure convergence of the proposed distributive stochastic learning. We also show that the proposed low complexity distributive solution is asymptotically globally optimal under heavy traffic loading. Finally, we demonstrate by simulation that the proposed scheme has significant performance gain over various baselines [such as conventional CSIT-only control and throughput-optimal control (in the stability sense)] with low complexity and low signaling overhead2.

II. SYSTEM MODELS

A. System Architecture and MIMO Relay Physical Layer Model

We consider a two-hop multiantenna cooperative relay communication system with one multiantenna source node, M multiantenna half-duplex relay stations (RSs) and one multiantenna destination node, as illustrated in Fig. 1. The source node cannot deliver packets directly to the destination node due to limited coverage, and the cooperative RSs are deployed to extend the source node's coverage. Denote the Rx-RS and the Tx-RS as the mth RS and the nth RS for notational simplicity3. The numbers of data streams transmitted in the S-R link and the R-D link are restricted so that simultaneous interference-free transmission is possible. The signal models of the two links are as follows:

• S-R link (source node to the mth RS): The source node applies a precoder matrix to its symbol vector, and the mth RS applies a decorrelator matrix to obtain its postprocessing symbol vector. The fading matrix from the source node to the mth RS is zero-mean, unit-variance, i.i.d. complex Gaussian, and the channel noise is zero-mean, unit-variance complex Gaussian.
• R-D link (nth RS to the destination node): The nth RS applies a precoder to its transmit symbol vector. The received symbol vector at the destination node4 is the precoded signal passed through the complex Gaussian fading matrix from the nth RS to the destination node, plus complex Gaussian channel noise.

In this paper, the resource control is performed distributively on each RS and therefore, we define the local channel state information (CSI) available at each RS as follows. For the mth RS, there are two types of local CSI, namely the type-I local CSI and the type-II local CSI, as illustrated in Fig. 2. For notational convenience, the local CSI5 at the mth RS collects both types, and the global CSI (GCSI) of the system collects the local CSI of all RSs. The assumption on the channel is summarized below.

Assumption 1 (Assumption on Channel Fading): We assume the channel fading elements in the global CSI are i.i.d. The CSI is quasi-static within a frame but i.i.d. between frames.

We assume strong channel coding is used and hence the maximum achievable data rate is given by the instantaneous mutual information. If the source node transmits information bits to the mth RS in the current frame, the frame will be successfully received if the number of bits does not exceed the instantaneous mutual information of the S-R link accumulated over the frame duration.

2Due to page limit, we omit all the proofs. Please refer to [10] for the technical details.

3Since the RSs are half-duplex under practical consideration, we require m ≠ n implicitly.

4Due to the limited coverage of the source node, we assume the received signal from the source node is negligible compared with the received signal from the relay node.

5Note that both the type-I and type-II local CSI at the mth RS refer to outgoing links from the mth RS and hence they can be measured at the mth RS using channel reciprocity and preambles. For example, there are standard signaling and channel sounding mechanisms in the WiMAX (802.16j, 802.16m) and LTE systems for the RS to acquire the local CSI.

WANG et al.: QUEUE-AWARE DISTRIBUTIVE RESOURCE CONTROL

Fig. 2. Illustration of an example of bidding protocol for a 2-RS system.

Similarly, the destination node can successfully decode a frame of information bits transmitted from the nth RS if the number of bits does not exceed the instantaneous mutual information of the R-D link accumulated over the frame duration.

B. Buffered Decode and Forward

Although the RS nodes are half-duplex relays, it is possible to reduce the system half-duplex penalty by exploiting buffers at the half-duplex RSs. Specifically, the source node can transmit a packet to the mth RS (denoted as the Rx-RS) while, at the same time, the nth RS (denoted as the Tx-RS) transmits its buffered packet to the destination node without interfering with the Rx-RS. This is possible by means of precoder-decorrelator designs at the source node, the Rx-RS (mth RS) and the Tx-RS (nth RS). Each link has its own total transmit power: one at the source node for the S-R link and one at the Tx-RS (nth RS) for the R-D link. For any given numbers of data streams on the S-R link and on the R-D link (where m ≠ n implicitly), the decorrelator and precoder designs are elaborated below.

• Precoder and Decorrelator Design of the S-R Link at the Rx-RS Node6: The precoder at the source node and the decorrelator at the Rx-RS node are chosen to optimize the mutual information of the S-R link subject to the transmit power constraint as follows:

(1)

Consider the SVD of the S-R channel matrix, where the singular values are sorted in decreasing order along the diagonal. Using standard optimization techniques [11], the source precoder is given by

(2)

where the expression involves the first singular values of the channel matrix (as many as there are data streams) and the Lagrange multiplier corresponding to the transmit power constraint in (1). The decorrelator is given by

(3)

• Precoder Design of the R-D Link at the Tx-RS Node7: Similarly, given the decorrelator in (3), the precoder at the Tx-RS node is selected to maximize the R-D link mutual information subject to the transmit power constraint and the interference nulling constraint (at the Rx-RS node) as follows:

(4)

(5)

The interference nulling constraint in (4) allows simultaneously active R-D and S-R links using half-duplex RSs. Consider the SVD of the effective R-D channel matrix, where the singular values are sorted in decreasing order along the diagonal and the null space of a matrix is used to enforce the nulling constraint. Using standard optimization techniques [11], the precoder at the Tx-RS is given by

(6)

where the expression involves the first singular values of that channel matrix and the Lagrange multiplier corresponding to the power constraint in (5).

6Type-I local CSI is required at the mth Rx-RS node to compute the precoder and decorrelator of the S-R link.

7Type-II local CSI is required at the nth Tx-RS node to compute the precoder of the R-D link.

C. Bursty Source Model and Queue Dynamics

There is one queue in the source node and one queue in each of the RSs for the storage of received information bits. All buffers in the source node and the RSs have the same maximum buffer size (number of bits). New information bits arrive at the source node in every frame. The assumption on the bit arrival process is given below.

Assumption 2 (Assumption on Arrival Process): We assume the arrival process is i.i.d. over frames according to a general distribution with a given mean, and the information bits arrive at the end of each frame.

Moreover, consider the number of information bits in the source node's queue and in the mth RS's queue at each frame. We assume each RS knows its own queue length and the source node's queue length. Together these form the local QSI of the mth RS. The queue lengths of all nodes form the global queue state information (GQSI) at each frame. The overall system queue dynamics at the source node and the RSs are summarized below.

• If the source node successfully delivers information bits to the mth RS in a frame, then the source node's queue decreases and the mth RS's queue increases by the delivered amount, subject to the new arrivals and the maximum buffer size.
• If the source node fails to deliver any information bit to the RSs, then the source node's queue changes only through the new arrivals.
• If the nth RS successfully delivers information bits to the destination in a frame, then the nth RS's queue decreases by the delivered amount.

Remark 1: Each information bit delivered from the source node is received by one of the RSs, so different RSs may hold different information bits in their buffers. When the source node is to deliver information bits to one RS, selecting RSs with different buffer lengths may have different effects on the average packet delay of the system. Therefore, not only the CSI of all S-R links but also the QSI of all RSs should be considered in directing the source node's transmission. Such coupling through the system QSI is a unique challenge in delay-optimal control of multihop systems. Fig. 1 shows the top-level architecture illustrating the interactions among all the queues in the two-hop cooperative system.
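The queue dynamics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's exact update equations: the function name `step_queues` and all parameter names are hypothetical, departures are applied before the end-of-frame arrivals (following Assumption 2), the source is assumed to move no more bits than the Rx-RS buffer can hold, and arrivals that exceed the source buffer are dropped.

```python
def step_queues(q_src, q_rs, rx, tx, delivered_sr, delivered_rd, arrivals, buffer_size):
    """Advance the source queue and the RS queues by one frame.

    q_src: bits in the source queue; q_rs: list with the bits in each RS queue.
    rx / tx: indices of the Rx-RS and Tx-RS (rx != tx), or None if that link is idle.
    delivered_sr / delivered_rd: bits successfully delivered on the S-R / R-D link
    (0 models a failed transmission).
    Returns (new source queue, new RS queues, bits dropped at the source).
    """
    q_rs = list(q_rs)
    if rx is not None and tx is not None and rx == tx:
        raise ValueError("a half-duplex RS cannot receive and transmit in the same frame")
    if rx is not None:
        # Assumption: never move more than is stored or than fits at the Rx-RS.
        room = buffer_size - q_rs[rx]
        moved = min(delivered_sr, q_src, room)
        q_src -= moved
        q_rs[rx] += moved
    if tx is not None:
        q_rs[tx] -= min(delivered_rd, q_rs[tx])
    # Arrivals join the source queue at the end of the frame; overflow is dropped.
    dropped = max(q_src + arrivals - buffer_size, 0)
    q_src = min(q_src + arrivals, buffer_size)
    return q_src, q_rs, dropped
```

For instance, with RS 0 receiving and RS 1 transmitting, `step_queues(10, [0, 5], 0, 1, 4, 3, 2, 20)` moves 4 bits from the source to RS 0, drains 3 bits from RS 1, and then adds 2 new bits at the source.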

D. Distributive Contention Protocol

Based on the BDF in Section II-B, we still need to determine (a) which RS should be the Rx-RS, (b) which RS should be the Tx-RS and (c) the number of data streams transmitted by the source node and by the Tx-RS. Due to the decentralized control requirement, we propose a two-stage two-winner auction mechanism for distributive contention resolution. Fig. 2 illustrates an example of the bidding protocol for a 2-RS system. As a result, the RS selection and data stream allocation procedure can be parameterized by a bidding vector. We shall refer to the bidding vector as the RS selection and data stream allocation policy in the rest of the paper.
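A two-stage two-winner contention of this kind can be sketched as follows. The sketch rests on assumptions that the text here does not confirm: bids are plain scalars (the paper's bids also determine the number of data streams), the first stage picks the Rx-RS and the second stage picks the Tx-RS among the remaining RSs, and ties go to the lowest index. The name `two_stage_auction` is hypothetical.

```python
def two_stage_auction(first_bids, second_bids):
    """Pick two distinct winners from per-RS bids.

    first_bids[k] / second_bids[k]: scalar bids broadcast by RS k.
    Returns (rx, tx): the stage-1 winner and the stage-2 winner, with rx != tx.
    """
    if len(first_bids) != len(second_bids) or len(first_bids) < 2:
        raise ValueError("need matching bid lists from at least two RSs")
    # Stage 1: the highest first-stage bid wins (ties go to the lowest index).
    rx = max(range(len(first_bids)), key=lambda k: first_bids[k])
    # Stage 2: the stage-1 winner is excluded, so no RS both receives and transmits.
    tx = max((k for k in range(len(second_bids)) if k != rx),
             key=lambda k: second_bids[k])
    return rx, tx
```

Because every RS hears the broadcast bids, each node can evaluate the same two `max` operations locally, which is what makes the contention resolution distributive in this sketch.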

E. Optimization Objective and Control Policy

Definition 1 (Distributive Stationary Control Policy): A distributive stationary control policy is a collection of stationary control policies, one at each RS. The policy at the mth RS includes the power allocation policy of the S-R link and the R-D link, and the first-stage and second-stage bidding policies. Specifically,

(7)

(8)

(9)

where the power allocation policy specifies the total transmit power at the source node for the S-R link and the total transmit power at the Tx-RS for the R-D link, each for a given number of data streams. The local system state of the mth RS consists of its local CSI and local QSI. Therefore, the global system state consists of the global CSI and the global QSI.

Remark 2 (Distributive Consideration of Control Policy): The stationary control policy is distributive in the sense that the policy at each RS depends only on the local system state and the broadcast bidding information available at that RS. Thus, for notational simplicity, we shall omit the bidding information when the meaning is clear in the rest of the paper.

A stationary control policy induces a joint distribution for the random process of global system states. Under Assumptions 1 and 2, the state at the next frame depends only on the state and actions at the current frame, and hence the induced random process for a given control policy is Markovian with the following transition probability:

(10)

where the equality is because of Assumption 1 and the queue dynamics transition probability is given by (11).

(11)

Given a unichain policy, the induced Markov chain is ergodic and there exists a unique steady state distribution [5]. Therefore, we have the average end-to-end delay of the two-hop cooperative RS system summarized in the following lemma:

Lemma 1 (Average End-to-End Delay): For a small average packet drop rate constraint, the average end-to-end delay of the two-hop cooperative RS system is given by

(12)

where the expectation is taken with respect to the induced steady state distribution (induced by the unichain control policy) and the normalization is by the average number of arrival bits per frame at the source node. Similarly, the source node's average drop rate constraint8, the source node's average power constraint and each RS's average power constraint are given by

(13)

(14)

(15)

III. CONSTRAINED MARKOV DECISION PROBLEM FORMULATION

In this section, we shall formulate the delay-optimal problem as an infinite horizon average cost constrained Markov decision problem (CMDP) and discuss the general solution.

A. CMDP Formulation

The goal of the controller is to choose an optimal stationary feasible unichain policy that minimizes the average end-to-end transmission delay in (12). Specifically, the delay-optimal control problem is summarized below.

Problem 1 (Delay-Optimal Control Problem): Find a feasible stationary unichain policy such that the average end-to-end delay is minimized subject to the average drop rate constraint at the source node and the average power constraints at the source node and each RS node9, i.e., minimize the delay in (12) s.t. (13), (14), (15).

Problem 1 is an infinite horizon average cost constrained Markov decision problem (CMDP) [12]. Its system state space is the product of the global QSI state space and the global CSI state space. Its action space is the product of the power allocation action space, the first-stage bidding action space and the second-stage bidding action space. Its transition kernel is given by (10), and it has an associated per-stage cost function.

B. Lagrangian Approach for the CMDP

The CMDP in Problem 1 can be converted into an unconstrained MDP by Lagrange theory [11]. For any vector of Lagrange multipliers (LMs), we define the corresponding Lagrangian. Therefore, the corresponding unconstrained MDP for a particular vector of LMs is given by

(16)

where the minimized Lagrangian gives the Lagrange dual function. The dual problem of the primal problem in Problem 1 is the maximization of the dual function over the LMs. It is shown in [13] that there exists a vector of LMs such that the optimizing policy minimizes the Lagrangian and the saddle point condition holds. Using standard Lagrange theory [11], this policy is primal optimal (i.e., it solves Problem 1), the vector of LMs is dual optimal (it solves the dual problem) and the duality gap is zero. Thus, by solving the dual problem, we can obtain the primal optimal policy. Therefore, we shall first solve the unconstrained MDP in (16) in the following. For a given LM vector, the optimizing unichain policy for the unconstrained MDP (16) can be obtained by solving the associated Bellman equation as follows:

(17)

where the unknowns are the value function of the MDP and the optimal average cost per stage, the transition kernel can be obtained from (10), and the optimizing policy is the one minimizing the R.H.S. of (17) at any state. For any unichain policy with an irreducible Markov chain, the solution to (17) is unique [12]. We restrict our policy space to unichain policies and refer to the minimizer as the optimal unichain policy.

8Since the source node and the M RSs have buffers with the same buffer size, the average drop rate at each RS node is much lower than the average drop rate at the source node. Therefore, we omit the average drop rate constraint at each RS to simplify the problem.

9To simplify the notation, we shall normalize N = 1 in the rest of the paper.
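For readers unfamiliar with average-cost Bellman equations, the following sketch solves a generic one by relative value iteration on a small finite MDP. It is a textbook method shown for illustration, not the paper's solution approach, and it assumes the standard form theta + V(s) = min_a [ g(s, a) + sum_s' P(s' | s, a) V(s') ] with V fixed to zero at a reference state. The names `relative_value_iteration`, `P`, `g`, `theta` and `V` are assumptions introduced here.

```python
def relative_value_iteration(P, g, ref=0, iters=500):
    """Relative value iteration for a finite average-cost MDP.

    P[a][s][s2]: probability of moving from state s to s2 under action a.
    g[s][a]: per-stage cost of action a in state s.
    Returns (theta, V, policy): average cost per stage, relative value
    function with V[ref] = 0, and a greedy policy.
    """
    n_states, n_actions = len(g), len(g[0])
    V = [0.0] * n_states
    theta = 0.0
    policy = [0] * n_states
    for _ in range(iters):
        # One-step lookahead cost of every (state, action) pair.
        q = [[g[s][a] + sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        policy = [min(range(n_actions), key=lambda a, s=s: q[s][a])
                  for s in range(n_states)]
        TV = [q[s][policy[s]] for s in range(n_states)]
        theta = TV[ref]                    # average-cost estimate
        V = [tv - theta for tv in TV]      # renormalize so that V[ref] = 0
    return theta, V, policy
```

A toy usage with two states (0: empty queue, 1: backlogged) and two actions (0: idle, 1: transmit), where the cost is the queue state plus a transmit-power cost of 0.5:

```python
P = [
    [[0.5, 0.5], [0.0, 1.0]],  # idle: a backlog never clears
    [[0.5, 0.5], [0.9, 0.1]],  # transmit: a backlog clears w.p. 0.9
]
g = [[0.0, 0.5], [1.0, 1.5]]
theta, V, policy = relative_value_iteration(P, g)
```

The cost of the lookahead grows with the number of states times the number of actions, which is why the exponentially large state space of the relay system makes this brute-force route impractical.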

C. Equivalent Bellman Equation for the CMDP

The Bellman equation in (17) is a fixed point problem over the functional space, and it is very complicated to solve due to the huge cardinality of the system state space. A brute-force solution does not lead to any useful implementation. In this subsection, we shall illustrate that the Bellman equation in (17) can be simplified into an equivalent form by exploiting the i.i.d. structure of the CSI process. For notational convenience, we partition the unichain policy into a collection of actions based on the QSI. Specifically, we have the following definition.

Definition 2 (Partitioned Actions for the mth Relay): Given a unichain control policy, we define the partitioned actions as the collection of actions for all possible local CSI under a given local QSI. The complete policy for the mth RS is therefore equal to the union of all the partitioned actions.

We show that the optimal policy of (16) can be obtained by solving an equivalent Bellman equation, summarized in the following lemma.

Lemma 2 (Equivalent Bellman Equation): The control policy obtained by solving the Bellman equation in (17) is the same as that obtained by solving the equivalent Bellman equation defined by

(18)

where the constant is the original optimal average cost per stage, the unknown function is the conditional average value function for each QSI state,

(19)

is the conditional per-stage cost, and the remaining term is the conditional average transition kernel.

Remark 3: Note that solving the R.H.S. of (18) for each QSI yields an overall control policy which is a function of both the CSI and the QSI.

IV. DISTRIBUTIVE ONLINE ALGORITHM BASED ON APPROXIMATED MDP

There are still two major obstacles ahead. First, obtaining the value function from (18) has exponential complexity. Second, even if we could obtain it, the derived control actions would depend on the global QSI and CSI, which is highly undesirable. In this section, we shall overcome these challenges using approximate MDP and distributive stochastic learning. The linear approximation architecture of the value function is given below:

(20)

where the summands are referred to as per-node value functions and their sum as the global value function. Compared with the original value function in (18), the dimension of the per-node value functions is much smaller. Therefore, the per-node value functions can only satisfy the Bellman equation (18) in some predetermined system queue states, which are referred to as the representative states. Without loss of generality, we define a set of representative states and set the state with all buffers empty as the reference state. In vector form, the per-node value functions are collected in a parameter vector, which is related to the global value function through a mapping matrix and an inverse mapping matrix.

A. Distributive Control Policy Under Linear Value Function Approximation

Using the approximate value function in (20), we shall derive a distributive control policy which depends on the local CSI and local QSI as well as the per-node value functions at each node. Specifically, using the approximation in (20), the control policy in (18) can be obtained by solving the following simplified optimization problem.

Problem 2 (Optimal Control Action With Approximate Value Functions): The optimal control policy is given by

(21)

Lemma 3 (Distributive Control Policy): Given the per-node value functions and the LMs, the following distributive control solves Problem 2:

• Power control for the S-R link and R-D link:

(22)

(23)

• First-stage bid at RSs:

(24)

• Second-stage bid at RSs:

(25)

In addition, for a sufficiently large source arrival rate and sufficiently large average transmit power constraints, the above power control policy has the following closed-form expression:

(26)

(27)

(28)

Remark 4 (Multilevel Water-Filling Structure of the Control Policy): The power control policy in (22) and (23), as well as the RS selection and data stream allocation control policy in (24) and (25), are functions of both the CSI and the QSI, where they depend on the QSI indirectly via the per-node value functions. The power control solution has the form of multilevel water-filling, where the power is allocated according to the CSI while the water level is adaptive to the QSI.

B. Online Distributive Stochastic Learning Algorithm to Estimate the Per-Node Value Functions and the LMs

In Lemma 3, the control actions are functions of the per-node value functions and the LMs. In this section, we propose an online learning algorithm to determine the per-node value functions and the LMs. The system procedure of the proposed distributive online learning is given below.

• Step 1 [Initialization]: Each RS initializes its own per-node value function and LMs, as well as the per-node value function and LMs for the source node. The initialization of the source node's per-node value function and LMs should be the same at every RS.
• Step 2 [Determination of control actions]: At the beginning of each frame, the source node broadcasts its QSI to the RS nodes. Based on the local system information and the per-node value functions, each RS determines the distributive control actions, including the S-R and R-D power allocation as well as the first-stage bid and the second-stage bid, according to Lemma 3. Based on the contention resolution protocol described in Section II-D, the Rx-RS and Tx-RS pair and the corresponding pair of numbers of data streams are determined by the winning bids.
• Step 3 [Per-node value functions and LMs update]: Each RS updates its per-node value function as well as the LMs according to Algorithm 1. Finally, advance the frame index and go to Step 2.

Algorithm 1 (Online Distributive Learning Algorithm for Per-Node Value Functions and LMs):

(29)

where the step size sequences of the per-node value function update and the LM update satisfy the conditions of two-timescale stochastic approximation, with the LM step sizes vanishing faster than the value function step sizes.
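The two-timescale idea behind such an update can be illustrated on a toy problem. The sketch below is not Algorithm 1: it replaces the per-node value function by a scalar fast iterate `p` and the LMs by a scalar multiplier `lam`, and solves "minimize (p - 2)^2 subject to p <= 1" from noisy gradients. The step sizes `t ** -0.6` and `1 / t` are assumed choices that are non-summable, square-summable, and have a vanishing ratio, so `lam` looks quasi-static to the fast update.

```python
import random

def two_timescale(num_iters=20000, seed=0):
    """Toy two-timescale stochastic approximation (illustrative only).

    Fast iterate p: noisy gradient descent on the Lagrangian
    (p - 2)^2 + lam * (p - 1). Slow iterate lam: projected ascent on the
    constraint violation p - 1. The saddle point is p = 1, lam = 2.
    """
    rng = random.Random(seed)
    p, lam = 0.0, 0.0
    for t in range(1, num_iters + 1):
        fast_step = t ** -0.6   # sum diverges, sum of squares converges
        slow_step = 1.0 / t     # slow_step / fast_step -> 0
        noisy_grad = 2.0 * (p - 2.0) + lam + rng.gauss(0.0, 0.1)
        p -= fast_step * noisy_grad
        lam = max(0.0, lam + slow_step * (p - 1.0))  # keep lam >= 0
    return p, lam
```

Running `two_timescale()` drives `p` toward 1 and `lam` toward 2. The fast iterate settles to the minimizer for the current multiplier, while the multiplier drifts slowly until the constraint is met, which mirrors the roles of timescale I and timescale II in the convergence analysis.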

C. Almost-Sure Convergence of Distributive Stochastic Learning

In this section, we shall establish technical conditions for the almost-sure convergence of the online distributive learning algorithm. Since the step sizes of the LM update vanish faster than those of the per-node value function update, the LM update and the per-node potential function update are done simultaneously but over two different time scales. During the per-node potential function update (timescale I), the change in the LMs is negligible relative to the change in the per-node potentials. Therefore, the LMs appear to be quasi-static [15] during the per-node value function update in (26). For notational convenience, define two sequences of matrices built from the unichain system control policy at each frame, the transition matrix of system states under that policy, and the identity matrix. The convergence property of the per-node value function update is given here.

Lemma 4 (Convergence of Per-Node Value Function Learning over Timescale I): Assume that, for all feasible policies in the policy space, there exist some positive integer and an associated constant such that

(30)

where the bracketed quantity denotes the element in a given row and column of the matrix (with the column corresponding to the reference state). The following statements are true.

• The update of the parameter vector (or per-node potential vector) converges almost surely for any given initial parameter vector and LMs.
• The steady state parameter vector satisfies

(31)

where one term is a constant and the other is a mapping. Note that (31) is equivalent to a Bellman equation on the representative states.

Hence, Lemma 4 guarantees that the proposed online learning algorithm converges to the best-fit parameter vector (per-node potential) satisfying (20). On the other hand, since the ratio of step sizes vanishes, during the LM update (timescale II) the per-node value function is updated much faster than the LMs. Hence, the update of the LMs in timescale II triggers another update process of the per-node value function in timescale I. By [16, Corollary 2.1], the per-node value function tracks its equilibrium for the current LMs w.p.1. Hence, in Algorithm 1, during the LM updates, the per-node value function update is seen as almost equilibrated. Moreover, the convergence of the LMs is summarized here.

Lemma 5 (Convergence of the LMs Over Timescale II): The iteration on the vector of LMs converges almost surely to a limit that satisfies the power and packet drop rate constraints in (14), (15), and (13).

Based on these lemmas, we summarize the convergence performance of the online per-node value function and LM learning algorithm in the following theorem.

Theorem 1 (Convergence of Online Learning Algorithm 1): Under the same conditions as in Lemma 4, the iterates converge w.p.1 to a limit that satisfies a fixed-point equation and the average power constraints (14), (15) as well as the average packet drop rate constraint (13).

D. Asymptotic Optimality

Finally, we shall show that the performance of the distributive algorithm is asymptotically globally optimal for high traffic loading.

Theorem 2 (Asymptotically Global Optimal at High Traffic Loading): For a sufficiently large value of the relevant system parameter and high traffic loading such that the optimization problem in Problem 1 is feasible, the performance of the proposed distributive control algorithm is asymptotically globally optimal.

V. SIMULATIONS AND DISCUSSIONS

In this section, we compare our proposed online per-node value function learning algorithm with five reference baselines. Baselines 1 and 4 refer to the proposed buffered decode-and-forward (BDF) protocol with a throughput-optimal policy (in the stability sense), namely the dynamic backpressure algorithm [17], where we utilize full-duplex RSs in Baseline 1 and half-duplex RSs in Baseline 4. Baselines 2 and 5 refer to the regular decode-and-forward (DF) protocol with CSIT-only scheduling (the link selection and power allocation are adaptive to the CSIT only so as to optimize the end-to-end throughput). We utilize full-duplex RSs in Baseline 2 and half-duplex RSs in Baseline 5. Moreover, Baseline 3 refers to the proposed BDF protocol with CSIT-only scheduling and half-duplex RSs. In the simulations, we assume the total bandwidth is 1 MHz and the packet arrival at the source node is Poisson with a fixed average arrival rate (in pck/s) and a deterministic packet size (in bits). The source node and the destination node have the same number of antennas. Moreover, all nodes (source node and RSs) have the same maximum buffer size.


Fig. 3. Average end-to-end delay versus average transmit SNR. The deterministic packet size is 25 K bits and the number of antennas at each RS is 4. The packet drop rates of Baselines 1–5 and the proposed distributive online learning are 0.2%, 0.2%, 13%, 3%, 24%, and 0.2%, respectively.

Fig. 4. Average end-to-end delay versus the number of relays with transmit SNR = 5 5 dB. The deterministic packet size is 25 K bits and the number of antennas at each RS is 4. The packet drop rates of Baselines 1–5 and the proposed distributive online learning are 23%, 23%, 86%, 82%, 96%, and 0.5%, respectively.

Fig. 5. Average end-to-end delay versus the number of relay antennas with transmit SNR = 5 dB. The deterministic packet size is 20 K bits and the number of antennas at each RS is 4. The packet drop rates of Baselines 1–5 and the proposed distributive online learning are 3%, 4%, 9%, 5%, 20%, and 0.1%, respectively.

Fig. 6. Illustration of the convergence of the proposed online learning algorithm. The instantaneous per-node value function is plotted versus time slot index for a cooperative MIMO system with a source node (with 2 antennas) and 2 RS nodes (each with 4 antennas). The transmit SNR of the source and the RS nodes are 10 dB and the target packet drop rate is 0.2%.

Fig. 3 illustrates the average end-to-end delay versus average transmit SNR per node with 4 antennas at each RS. It can be observed that the proposed distributive algorithm with half-duplex RSs could achieve significant performance gain in average delay over all baselines with full-duplex RSs, and even more significant gain over the baselines with half-duplex RSs. This illustrates the advantages of the proposed BDF algorithm with distributive delay-optimal control policy, which could effectively reduce the intrinsic half-duplex penalty in cooperative communication systems.

Figs. 4 and 5 illustrate the average end-to-end delay versus the number of relays and the number of relay antennas, respectively. It can be observed that the average delay of all the schemes decreases as the number of relays or the number of relay antennas increases. Furthermore, the proposed BDF algorithm with distributive delay-optimal control policy has significant gain in delay over all the baselines.

Fig. 6 illustrates the convergence property of the proposed distributive online learning algorithm. We plot the per-node value function of the first relay versus scheduling slot index at a transmit SNR = 10 dB. The average delay at the 200th scheduling slot is already very close to the steady-state value, which is much better than all the baselines. Furthermore, unlike the iterations in deterministic NUM problems, the proposed algorithm is online, meaning that normal payload is delivered during the iteration steps.
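To give a feel for how an online stochastic-approximation update of a value function settles while the system keeps running, the sketch below learns relative values of a fixed Markov chain from one sample path, using a reference state and diminishing step sizes. It is a toy policy-evaluation example only: the function `learn_relative_values`, the two-state chain, the step-size exponent 0.7, and the seed are all assumptions for illustration, and it omits the per-node decomposition, the Lagrange multipliers, and the auction of the proposed algorithm.

```python
import random
from typing import List


def learn_relative_values(P: List[List[float]], cost: List[float],
                          ref: int = 0, steps: int = 100_000,
                          seed: int = 0) -> List[float]:
    """Online estimate of V solving V(s) = c(s) - V(ref) + sum_s' P(s,s') V(s').

    At the fixed point, V[ref] equals the long-run average cost.
    """
    rng = random.Random(seed)
    n = len(cost)
    V = [0.0] * n
    visits = [0] * n
    s = ref
    for _ in range(steps):
        # Observe one transition of the running system.
        s_next = rng.choices(range(n), weights=P[s])[0]
        visits[s] += 1
        # Step sizes n^-0.7: their sum diverges, the sum of squares converges.
        step = visits[s] ** -0.7
        # Move V[s] toward the sampled right-hand side of the equation.
        V[s] += step * (cost[s] + V[s_next] - V[ref] - V[s])
        s = s_next
    return V


# Toy chain: queue empty (cost 0) or holding one packet (cost 1).
V = learn_relative_values([[0.5, 0.5], [0.5, 0.5]], [0.0, 1.0])
```

For this chain the exact solution is V = (0.5, 1.5), so `V[0]` should approach the average cost 0.5. Every update uses a transition the system makes anyway, which is the sense in which such learning is online.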


VI. SUMMARY

In this paper, we consider queue-aware resource control for two-hop cooperative MIMO systems. We show that by exploiting buffering in each MIMO relay, we could substantially reduce the intrinsic half-duplex loss in cooperative systems. The delay-optimal resource control policy is formulated as an average-cost infinite horizon MDP. To obtain a low complexity solution, we approximate the value function by a linear combination of per-node value functions. The per-node value function is obtained using a distributive stochastic learning algorithm. We also establish technical conditions for almost-sure convergence and show that, in the heavy traffic limit, the proposed low complexity distributive algorithm converges to the global optimal solution.

REFERENCES

[1] T. Cover and A. Gamal, “Capacity theorems for the relay channel,” IEEE Trans. Inf. Theory, vol. 25, no. 5, pp. 572–584, Sep. 1979.
[2] IEEE 802.16’s Relay Task Group. [Online]. Available: http://www.ieee802.org/16/relay/index.html
[3] L. Vega, H. Rey, J. Benesty, and S. Tressens, “A new robust variable step-size NLMS algorithm,” IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1878–1893, May 2008.
[4] E. Lo and K. Letaief, “Optimizing downlink throughput with user cooperation and scheduling in adaptive cellular networks,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC 2007), Mar. 2007, pp. 4342–4347.
[5] D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[6] X. Cao, Stochastic Learning and Optimization: A Sensitivity-Based Approach. New York: Springer, 2008.
[7] E. Yeh and R. Berry, “Throughput optimal control of cooperative relay networks,” in Proc. Int. Symp. Inf. Theory (ISIT 2005), Sep. 2005, pp. 1206–1210.
[8] L. Georgiadis, M. J. Neely, and L. Tassiulas, “Resource allocation and cross-layer control in wireless networks,” Found. Trends Netw., vol. 1, no. 1, pp. 1–144, 2006.
[9] J. Abounadi, D. Bertsekas, and V. S. Borkar, “Learning algorithms for Markov decision processes with average cost,” SIAM J. Contr. Optimiz., vol. 40, pp. 681–698, 1998.
[10] R. Wang, V. K. N. Lau, and Y. Cui, Queue-Aware Distributive Resource Control for Delay-Sensitive Two-Hop MIMO Cooperative Systems, Tech. Rep. [Online]. Available: http://www.ee.ust.hk/eeknlau
[11] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[12] D. Bertsekas, Dynamic Programming and Optimal Control. New York: Athena Scientific, 2007, vol. 2.
[13] V. S. Borkar, “An actor-critic algorithm for constrained Markov decision processes,” Syst. Contr. Lett., vol. 54, pp. 207–213, 2005.
[14] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoints, 1st ed. Cambridge, U.K.: Cambridge Univ. Press, 2008.
[15] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge, U.K.: Cambridge Univ. Press, 2008.
[16] V. S. Borkar, “Stochastic approximation with two time scales,” Syst. Contr. Lett., vol. 29, pp. 291–294, 1997.
[17] L. Georgiadis, M. Neely, and L. Tassiulas, Resource Allocation and Cross Layer Control in Wireless Networks. New York: Now Publishers, 2006.

Rui Wang (S’04–M’08) received the B.Eng. degree (first class honors) in computer science from the University of Science and Technology of China in 2004 and the Ph.D. degree from the Department of ECE, HKUST, in 2008. He was a Postdoctoral Researcher with HKUST from 2008 to 2009. He is currently with Huawei Technologies Co., Ltd. His current research interests include cross-layer optimization, wireless ad hoc network, and cognitive radio.

Vincent K. N. Lau (SM’01) received the B.Eng. degree (Distinction 1st Hons) from the University of Hong Kong in 1992 and the Ph.D. degree from Cambridge University, Cambridge, U.K., in 1997. He was with PCCW as system engineer from 1992 to 1995 and with Bell Labs—Lucent Technologies as member of Technical Staff from 1997 to 2003. He then joined the Department of ECE, HKUST, as an Associate Professor and was promoted to Professor in July 2010. His current research interests include the robust and delay-sensitive cross-layer scheduling, cooperative and cognitive communications, as well as stochastic approximation and Markov decision process.

Ying Cui (S’08) received the B.Eng. degree (first class honors) in electronic and information engineering, Xian Jiaotong University, China, in 2007. She is currently a Ph.D. degree candidate with the Department of ECE, Hong Kong University of Science and Technology (HKUST). Her current research interests include cooperative and cognitive communications, delay-sensitive cross-layer scheduling, as well as stochastic approximation and Markov decision process.
