Distributed Algorithms for Minimum Cost Multicast with Network Coding
Yufang Xi and Edmund M. Yeh
Department of Electrical Engineering, Yale University, New Haven, CT 06520, USA
{yufang.xi,edmund.yeh}@yale.edu

Abstract We consider the problem of finding the minimum-cost multicast scheme for a single session with elastic rate demand based on the network coding approach. It is shown that solving for the optimal coding subgraphs in network coding is equivalent to finding the optimal routing scheme in a multicommodity flow problem. We design a set of node-based distributed gradient projection algorithms consisting of joint congestion control/routing at the source node and “virtual” routing at intermediate nodes. With appropriately chosen parameters, we show that the distributed algorithms converge to the optimal configuration from all initial conditions.

I. INTRODUCTION

Routing has long been an important technique in optimizing data transmissions for a communication network. In conventional optimal routing problems for wired networks, each node functions as a switch for passing data streams. The node relays the data it receives, but makes no change to the data content. Analytically, the situation can be treated as a Multicommodity Flow Problem (MFP) [1], [2], where data streams are treated as commodities identified by their different destinations. As they are routed among nodes, all commodities maintain their distinctness. Node-based distributed algorithms are presented in [1], [2] to achieve routing patterns that minimize overall network cost. To prevent excessive data input flows from overloading capacitated networks, congestion control algorithms operating at source nodes have been developed [3], [4]. The combination of congestion control and routing [5]–[7] provides an overall optimal solution to the MFP described above.

The recent breakthrough in network coding [8], [9] extends the functionality of network nodes to performing algebraic operations on received data. In general, network coding techniques improve network throughput [8], network robustness [10], and the efficiency of network resource allocation [11] over those achievable by pure routing. The advantage of network coding is most pronounced in establishing multicast connections. Li et al. [12] prove that linear coding suffices to obtain the optimal throughput of a multicast session, achieving the fundamental max-flow-min-cut upper bound. Decentralized random linear coding schemes are proposed in [13], [14], thus rendering network coding applicable to real networks.

The problem of finding the minimum-cost multicast scheme using a network coding approach is addressed in [15]. It is shown in [15] that the solution of this problem can be decomposed into two parts: finding the minimum-cost coding subgraphs and designing the code applied over the optimal subgraphs.
A distributed solution for the second part was provided in [13]. To solve the first part, the work in [11] proposes a distributed algorithm for finding the optimal coding subgraphs via a primal-dual approach. This approach, however, is

This research is supported in part by Army Research Office (ARO) Young Investigator Program (YIP) grant DAAD1903-1-0229 and by National Science Foundation (NSF) grant CCR-0313183.

complicated by the fact that distributed evaluation of the dual function is in itself a complex problem. Motivated by this, we design a set of node-based primal gradient projection algorithms that iteratively find the minimum-cost coding subgraphs. Furthermore, we explicitly specify the proper scaling matrices and step sizes for each of the algorithms, and show that these parameters can be calculated efficiently in a distributed way.

Ironically, although network coding represents a decidedly different network management approach from routing, solving for the optimal coding subgraphs in network coding intrinsically resembles finding the optimal routing scheme in an MFP. In this work, we fully explore this connection and transform the subgraph searching problem into a "virtual" MFP. Furthermore, we generalize the distributed optimal routing algorithms developed in [1], [2] to design a complete set of distributed solutions for the optimal multicast problem involving both congestion control and network coding. In contrast to [2], our scheme uses a different technique for computing the scaling matrices and step sizes. This scheme allows us to guarantee the convergence of the algorithms from all initial conditions.

II. PROBLEM FORMULATION

We consider finding the minimum-cost network coding subgraphs for a single multicast session with elastic rate demand. This procedure, followed by a network code designed specifically for the derived subgraphs, provides the optimal network configuration for the multicast session. Let the network supporting the multicast session be modelled by a directed and connected graph G = (N, E). Specifically, denote the unique source node in N by s, and use W ⊆ N\{s} to represent the set of multiple destination nodes. For each w ∈ W, we say (s, w) is the source-destination pair of virtual session w. To measure the optimality of a multicast scheme, we first associate a utility function U(·) with the multicast session.
Assume the session's maximal transmission rate is R bits/sec, i.e., no more utility is gained by transmitting at a rate r ≥ R. As a function of the admitted rate r, U(r) is strictly increasing and concave on its domain [0, R]. For analytical purposes, assume U(r) is twice continuously differentiable.

We adopt a flow model to analyze the transmission of the multicast session's data traffic from the source s to the respective destinations w ∈ W. A cost measured by the function D_ij(F_ij) is incurred on link (i,j) when it transports the session's traffic at rate F_ij. We assume that the total network cost is Σ_{(i,j)} D_ij(F_ij). If costs also exist at nodes, they can be absorbed into the costs of the nodes' adjacent links. Furthermore, if each link (i,j) has finite capacity C_ij, we can let D_ij implicitly impose the link capacity constraint F_ij ≤ C_ij by defining D_ij(F_ij) = ∞ when F_ij > C_ij. In general, D_ij(·) is assumed to be convex, strictly increasing, and twice continuously differentiable on [0, C_ij). Such link cost functions are adopted in pure routing problems for unicast networks [1], [2].

The network coding technique asymptotically provides a link flow distribution (F_ij)_{(i,j)∈E} for a multicast session with admitted rate r that satisfies the following virtual flow conservation relations [8], [15]:

    f_ij(w) ≥ 0,  ∀(i,j) ∈ E and w ∈ W,   (1)

    Σ_{j∈O(s)} f_sj(w) = r ≡ t_s(w),  ∀w ∈ W,   (2)

    f_wj(w) = 0,  ∀w ∈ W and j ∈ O(w),   (3)

    Σ_{j∈O(i)} f_ij(w) = Σ_{j∈I(i)} f_ji(w) ≡ t_i(w),  ∀w ∈ W and i ∈ N\{s, w},   (4)

    F_ij = max_{w∈W} f_ij(w),  ∀(i,j) ∈ E,   (5)
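To make the conservation relations (1)-(5) concrete, the following sketch (our own illustration, not from the paper; the butterfly-style topology, rate, and flow values are hypothetical) checks a candidate virtual flow distribution for two destinations and recovers each actual link flow F_ij as the per-link maximum rather than the sum:

```python
# Sketch: checking the virtual flow conservation relations (1)-(5) on a
# small hypothetical butterfly network. Values are illustrative only.
from collections import defaultdict

r = 2.0                       # admitted multicast rate
s, W = "s", ["w1", "w2"]      # source and destination set
# virtual session flows f[w][(i, j)]
f = {
    "w1": {("s", "a"): 1.0, ("s", "b"): 1.0, ("a", "w1"): 1.0,
           ("b", "k"): 1.0, ("k", "j"): 1.0, ("j", "w1"): 1.0},
    "w2": {("s", "a"): 1.0, ("s", "b"): 1.0, ("b", "w2"): 1.0,
           ("a", "k"): 1.0, ("k", "j"): 1.0, ("j", "w2"): 1.0},
}

def conserved(f_w, w):
    """Check (1)-(4) for one virtual session w."""
    inflow, outflow = defaultdict(float), defaultdict(float)
    for (i, j), v in f_w.items():
        assert v >= 0.0                      # (1) non-negativity
        outflow[i] += v
        inflow[j] += v
    if outflow[s] != r:                      # (2) source injects rate r
        return False
    if outflow[w] != 0.0:                    # (3) nothing leaves the sink
        return False
    nodes = set(inflow) | set(outflow)
    return all(inflow[i] == outflow[i]       # (4) balance defines t_i(w)
               for i in nodes if i not in (s, w))

assert all(conserved(f[w], w) for w in W)

# (5): the actual link flow is the MAX over sessions, not the sum --
# this is where network coding departs from multicommodity routing.
edges = {e for w in W for e in f[w]}
F = {e: max(f[w].get(e, 0.0) for w in W) for e in edges}
print(F[("k", "j")])   # both sessions share the coded link at the cost of one
```

Note how link (k, j) carries both virtual sessions simultaneously at the cost of a single unit of actual flow, which a pure routing solution could not achieve.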

[Figure omitted.] Fig. 1. An extended multicast network with virtual sink and overflow link.

where O(i) ≜ {j : (i,j) ∈ E} and I(i) ≜ {j : (j,i) ∈ E} represent the sets of node i's next-hop and previous-hop neighbors in the network, respectively. The flow constraints in (1)-(5) essentially reflect the max-flow-min-cut bound, which is achievable by network coding. The above constraints are "virtual" in the sense that the conventional flow balance equation (4) and the source-sink equations (2)-(3) are with respect to the flows (f_ij(w))_{(i,j)∈E} of the individual virtual sessions. The main difference between the optimal coding subgraph problem and the traditional optimal routing problem is that in the optimal subgraph problem, the actual link flow F_ij is the maximum (rather than the sum) of the virtual session flows f_ij(w) (cf. (5)). In the above equations, we use t_i(w) to denote the total incoming rate of virtual session w at node i.²

The optimal multicast scheme is derived from balancing the session's rate demand against the resulting network cost as follows:

    maximize  U(r) − Σ_{(i,j)∈E} D_ij(F_ij)   (6)
    subject to  0 ≤ r ≤ R,   (7)
                virtual flow conservation (1)-(5).   (8)

Denote the rejected rate of the multicast session by F_ss′ ≜ R − r. Further define the "overflow" cost D_ss′(F_ss′) = U(R) − U(r), which is strictly increasing, convex, and twice continuously differentiable on [0, R]. Note the resemblance of D_ss′ to an ordinary link cost function: one can think of the rejected flow as being routed on a virtual overflow link [16] connecting s directly to a virtual sink s′. Thus, the optimization problem (6) is equivalent to the following Jointly Optimal Congestion control and Routing (JOCR) problem:

    minimize  Σ_{(i,j)∈E} D_ij(F_ij) + D_ss′(F_ss′).   (9)

In what follows, the above objective function is denoted by D. The adjustment of F_ss′ corresponds to the congestion control mechanism at s, while the flow distribution (F_ij) is determined by the coding subgraph configuration. Algorithmically, the flow variables are adjusted by the "virtual" routing functionality inside the network. An example multicast network with three destinations, one virtual sink, and one virtual overflow link is illustrated in Figure 1.

For analytical purposes, we use the approximation proposed in [11]:

    F_ij = max_{w∈W} f_ij(w) ≈ ( Σ_{w∈W} (f_ij(w))^n )^{1/n},   (10)

² Later, we use "session" and "virtual session" interchangeably when this causes no confusion.

to make the actual flow F_ij differentiable in every virtual flow variable f_ij(w). In this case, the partial derivatives are given by

    ∂F_ij/∂f_ij(w) = ( f_ij(w)/F_ij )^{n−1},   (11)

and F_ij is strictly convex in (f_ij(w)). The approximation (10) becomes exact as n → ∞. In the sequel, we assume n is very large and solve the JOCR problem (9) with constraint (5) replaced by (10).

III. NODE-BASED OPTIMIZATION VARIABLES AND OPTIMALITY CONDITIONS

We have shown that finding the optimal coding subgraphs is equivalent to solving for the minimum-cost flow distribution. The problem can therefore be tackled with an optimal routing methodology. To enable each node to adjust its virtual flow values independently, we adopt the routing variables introduced in [1]. For the source node s, they are defined as

    φ_ss′ = F_ss′/R,  φ_sj(w) = f_sj(w)/R,  ∀w ∈ W and j ∈ O(s),   (12)

and for an intermediate node i ∈ N\{s, w}, they are defined as

    φ_ij(w) = f_ij(w)/t_i(w),  ∀w ∈ W and j ∈ O(i).   (13)
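The behavior of the l^n-norm approximation (10) and its derivative (11) can be checked numerically; the sketch below is our own illustration with arbitrary flow values:

```python
# Sketch: the l^n-norm approximation (10) of F_ij = max_w f_ij(w) and its
# partial derivative (11). Flow values are arbitrary illustrations.
def F_approx(flows, n):
    # (10): (sum_w f_w^n)^(1/n) -> max_w f_w as n grows
    return sum(fw ** n for fw in flows) ** (1.0 / n)

def dF_df(flows, n, k):
    # (11): dF/df_k = (f_k / F)^(n-1)
    return (flows[k] / F_approx(flows, n)) ** (n - 1)

flows = [0.5, 1.2, 0.9]          # virtual session flows on one link
for n in (2, 10, 50):
    print(n, F_approx(flows, n)) # tends to max(flows) = 1.2 as n grows

assert abs(F_approx(flows, 50) - max(flows)) < 0.01
# the derivative concentrates on the maximizing session for large n
assert dF_df(flows, 50, 1) > 0.9 and dF_df(flows, 50, 0) < 1e-6
```

For large n, only the session with the largest flow on a link "feels" the marginal link cost, consistent with (5): increasing a non-maximal virtual flow is essentially free.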

These newly defined variables are subject to the node-based simplex constraints

    φ_ss′ ≥ 0,  φ_sj(w) ≥ 0,  φ_ss′ + Σ_{j∈O(s)} φ_sj(w) = 1,  ∀w ∈ W,   (14)

    φ_ij(w) ≥ 0,  Σ_{j∈O(i)} φ_ij(w) = 1,  ∀w ∈ W and i ≠ s, w.   (15)

When t_i(w) = 0, the values of {φ_ij(w)} are immaterial. However, they are required to conform to (15) for consistency.

The first derivatives of the objective function with respect to the optimization variables are

    ∂D/∂φ_ss′ = R·δφ_ss′,  ∂D/∂φ_sj(w) = R·δφ_sj(w),  ∂D/∂φ_ij(w) = t_i(w)·δφ_ij(w),   (16)

where the marginal cost indicators are defined as

    δφ_ss′ ≜ D′_ss′(F_ss′),  δφ_sj(w) ≜ ∂D_sj/∂f_sj(w) + ∂D/∂r_j(w),  δφ_ij(w) ≜ ∂D_ij/∂f_ij(w) + ∂D/∂r_j(w).   (17)

The partial derivatives in (17), representing the marginal link costs and marginal node costs of a virtual session, are computed as follows:

    ∂D_ij/∂f_ij(w) = D′_ij(F_ij) · ∂F_ij/∂f_ij(w),   (18)

    ∂D/∂r_j(w) = 0 if j = w;  Σ_{k∈O(j)} φ_jk(w)·δφ_jk(w) otherwise.   (19)
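The recursion (19) can be evaluated by sweeping upstream from the destination. The sketch below is our own hypothetical example (topology, quadratic link costs D_ij(F) = a_ij·F², and routing fractions are invented); with a single session, F_ij = f_ij, so the marginal link cost (18) reduces to D′_ij(f_ij):

```python
# Sketch: computing node marginal costs dD/dr_j(w) via the recursion (19)
# for a single session w on a tiny hypothetical DAG with quadratic link
# costs D_ij(F) = a_ij * F^2 (so D'_ij(F) = 2 * a_ij * F).
w = "w"
phi = {                                  # phi[i][j]: fraction of t_i(w) on (i, j)
    "s": {"a": 0.5, "b": 0.5},
    "a": {"w": 1.0},
    "b": {"w": 1.0},
}
a = {("s", "a"): 1.0, ("s", "b"): 2.0, ("a", "w"): 1.0, ("b", "w"): 1.0}

def flows(rate):
    """Propagate t_i(w) through the loop-free routing pattern."""
    t, f = {"s": rate}, {}
    for i in ("s", "a", "b"):            # topological (upstream-first) order
        for j, p in phi[i].items():
            f[(i, j)] = t.get(i, 0.0) * p
            t[j] = t.get(j, 0.0) + f[(i, j)]
    return f

def marginal_cost(i, f):
    """(19): dD/dr_i(w) = 0 at the sink, else sum_k phi_ik (D'_ik + dD/dr_k)."""
    if i == w:
        return 0.0
    return sum(p * (2.0 * a[(i, k)] * f[(i, k)] + marginal_cost(k, f))
               for k, p in phi[i].items())

f = flows(rate=1.0)
print(marginal_cost("s", f))   # marginal cost of injecting traffic at s
```

In the distributed protocol of Section IV-C, the recursive call would be replaced by a report from the downstream neighbor, but the arithmetic is identical.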

The conditions for optimality stated in the following theorem can be checked by individual nodes using their marginal cost indicators.

Theorem 1: For a feasible set of routing variables to induce the optimal coding subgraphs, the following conditions are necessary. For all w ∈ W and i ∈ N\{s, w} such that t_i(w) > 0,

    δφ_ik(w) = λ_i(w) if φ_ik(w) > 0;  δφ_ik(w) ≥ λ_i(w) if φ_ik(w) = 0.   (20)

For the source node s, define for every w ∈ W, λ_s(w) = min_{j∈O(s)} δφ_sj(w); then

    δφ_sk(w) = λ_s(w) if φ_sk(w) > 0;  δφ_sk(w) ≥ λ_s(w) if φ_sk(w) = 0,   (21)

and

    δφ_ss′ ≥ Σ_{w∈W} λ_s(w) if φ_ss′ = 0;  δφ_ss′ = Σ_{w∈W} λ_s(w) if φ_ss′ ∈ (0,1);  δφ_ss′ ≤ Σ_{w∈W} λ_s(w) if φ_ss′ = 1.   (22)

The above conditions are sufficient if (20) holds at all intermediate nodes whether t_i(w) > 0 or not. To prove the above theorem, we need the following lemma relating the marginal link costs and the marginal source node cost. Its proof is omitted here due to space limitations.

Lemma 1: With the link-based and node-based marginal routing costs defined as in (18) and (19), we have for all w ∈ W,

    Σ_{(i,k)∈E} (∂D_ik/∂f_ik(w)) · f_ik(w) = (∂D/∂r_s(w)) · R.   (23)

Proof of Theorem 1: The necessity of the optimality conditions can be verified in a straightforward manner. We thus prove only the sufficiency part. Assume a set of valid routing configurations {φ*_ss′, φ*_i(w)} satisfying the conditions specified in the theorem. Let {φ¹_ss′, φ¹_i(w)} be any other valid routing variables. Denote the resulting flows by {F*_ss′, f*_ik(w)} and {F¹_ss′, f¹_ik(w)}, respectively. We focus on the difference of the objective values under these two schemes. By the convexity of the cost functions D_ij and D_ss′ in (f_ij(w)) and F_ss′, we have

    Σ_{(i,k)∈E} D_ik(F¹_ik) + D_ss′(F¹_ss′) − Σ_{(i,k)∈E} D_ik(F*_ik) − D_ss′(F*_ss′)
      > Σ_{(i,k)∈E} Σ_{w∈W} (∂D_ik/∂f*_ik(w)) (f¹_ik(w) − f*_ik(w)) + (dD_ss′/dF*_ss′)(F¹_ss′ − F*_ss′)
      =(a) Σ_{w∈W} [ Σ_{(i,k)∈E} (∂D_ik/∂f*_ik(w)) t¹_i(w)φ¹_ik(w) − (∂D/∂r*_s(w))·R ] + (dD_ss′/dF*_ss′)·R(φ¹_ss′ − φ*_ss′)
          − Σ_{w∈W} Σ_{k≠s,w} (∂D/∂r*_k(w)) [ t¹_k(w) − Σ_{i≠w} t¹_i(w)φ¹_ik(w) ]
      =(b) R·(S¹ − S*) + Σ_{w∈W} Σ_{i≠s,w} t¹_i(w) Σ_{k∈O(i)} [ φ¹_ik(w)δφ*_ik(w) − φ*_ik(w)δφ*_ik(w) ],   (24)

where S¹ ≜ Σ_{w∈W} Σ_{k∈O(s)} φ¹_sk(w)δφ*_sk(w) + φ¹_ss′δφ*_ss′ and S* ≜ Σ_{w∈W} Σ_{k∈O(s)} φ*_sk(w)δφ*_sk(w) + φ*_ss′δφ*_ss′.

Equation (a) is obtained by using Lemma 1 and appending the zero terms [t¹_k(w) − Σ_{i≠w} t¹_i(w)φ¹_ik(w)]. After grouping similar terms and using definitions (17) and (19), we arrive at (b). We next show that S¹ ≥ S* by considering the following two cases.

Case 1: φ*_ss′ < 1. This implies that for each w there exists at least one k ∈ O(s) such that φ*_sk(w) > 0. Then by the optimality conditions (21) and (22), S* = Σ_{w∈W} λ*_s(w), and we have δφ*_ss′ ≥ Σ_{w∈W} λ*_s(w). Therefore

    S¹ ≥ Σ_{w∈W} Σ_{k∈O(s)} φ¹_sk(w)λ*_s(w) + φ¹_ss′ Σ_{w∈W} λ*_s(w) = Σ_{w∈W} λ*_s(w) = S*.

Case 2: φ*_ss′ = 1. The optimality condition (22) implies δφ*_ss′ ≤ Σ_{w∈W} λ*_s(w). Therefore

    S¹ ≥ Σ_{w∈W} Σ_{k∈O(s)} φ¹_sk(w)λ*_s(w) + φ¹_ss′δφ*_ss′ = Σ_{w∈W} λ*_s(w)(1 − φ¹_ss′) + φ¹_ss′δφ*_ss′ ≥ δφ*_ss′ = S*.

Following similar reasoning, one can verify that the last sum in (24) is also non-negative. Thus, we have shown that for any other routing configuration {φ¹_ss′, φ¹_i(w)},

    Σ_{(i,k)∈E} D_ik(F¹_ik) + D_ss′(F¹_ss′) − Σ_{(i,k)∈E} D_ik(F*_ik) − D_ss′(F*_ss′) > 0.

Therefore, {φ*_ss′, φ*_i(w)} satisfying (20)-(22) must be optimal. ∎
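The conditions of Theorem 1 are locally checkable. A sketch of the test an intermediate node could apply for condition (20) (our own illustration, with hypothetical marginal cost values and a floating-point tolerance added):

```python
# Sketch: local check of the intermediate-node optimality condition (20).
# A node compares each marginal cost indicator delta_phi_ik(w) against
# lambda_i(w) = min over outgoing links; the numbers are hypothetical.
TOL = 1e-9

def satisfies_20(phi, delta):
    """phi, delta: dicts over outgoing links k of node i for session w."""
    lam = min(delta.values())
    # every link carrying positive flow must attain the minimal marginal cost
    return all(abs(delta[k] - lam) <= TOL for k in phi if phi[k] > 0)

# balanced: both used links have equal, minimal marginal cost
assert satisfies_20({"j": 0.6, "k": 0.4}, {"j": 1.5, "k": 1.5})
# not optimal: positive flow on a link with strictly larger marginal cost
assert not satisfies_20({"j": 0.6, "k": 0.4}, {"j": 1.5, "k": 2.0})
```

The source-node checks (21)-(22) have the same structure, with the extra comparison of δφ_ss′ against Σ_w λ_s(w).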

IV. NODE-BASED DISTRIBUTED ALGORITHMS

After obtaining the optimality conditions, we come to the question of how individual nodes can adjust their local routing variables to find the optimal coding subgraphs. Since the JOCR problem in (9) involves the minimization of a convex objective over convex regions, the class of scaled gradient projection algorithms is appropriate for providing a distributed solution. Using this method, Bertsekas et al. [2] developed distributed routing algorithms for networks supporting unicast sessions. In this section, we adapt this technique to design algorithms that adjust all virtual sessions' routing configurations to find the minimum-cost coding subgraphs. These distributed algorithms are used at the source node and at intermediate nodes, respectively. Our scheme uses a different technique for computing the scaling matrices and step sizes, which allows us to guarantee the convergence of the algorithms from all initial conditions.

A. Source Node Congestion Control/Routing Algorithm (CR)

The unique source node s controls φ_s = (φ_ss′, (φ_s(w))), i.e., it adjusts the admitted rate of the multicast session (through φ_ss′) and the routing allocations of the incoming traffic with respect to all destinations (through φ_s(w)). Therefore, we specifically call the source node's algorithm the Congestion control/Routing (CR) algorithm. At the k-th iteration, the feasible set of the vector φ_s is

    F^k_{φ_s} = { φ_s ≥ 0 : φ_ss′ + φ_s(w)′·1 = 1 and φ_sj(w) = 0, ∀j ∈ B^k_s(w), w ∈ W },

where 0 denotes the all-zero vector of dimension |W|·|O(s)| + 1, and 1 represents the all-one vector of dimension |O(s)|. The notation B^k_s(w) stands for the blocked node set of node s relative to session w. This device was introduced in [1], [2] to prevent the formation of loops in the routing pattern of session w's traffic.
For the source or an intermediate node i, B^k_i(w) consists of those neighboring nodes j whose marginal cost ∂D/∂r^k_j(w) is higher than ∂D/∂r^k_i(w), as well as neighbors that route positive flow to more costly downstream nodes. By blocking such nodes, we force each session's traffic to flow through nodes in decreasing order of marginal

costs, thus precluding the existence of loops. For the exact definition of B^k_i(w), see [2]. Finally, it is easily seen from its definition that F^k_{φ_s} is compact and convex.

Node s updates the current routing vector φ^k_s via the following scaled gradient projection algorithm:

    φ^{k+1}_s = CR(φ^k_s) = [ φ^k_s − (M^k_s)^{−1} · δφ^k_s ]^+_{M^k_s}.   (25)

Here, δφ^k_s = (δφ^k_ss′, (δφ^k_s(w))) is the vector of marginal cost indicators, and M^k_s is a symmetric matrix that is positive definite on the subspace

    V^k_s = { v_s : v_ss′ + Σ_{j∈O(s)} v_sj(w) = 0 and v_sj(w) = 0, ∀w ∈ W, j ∈ B^k_s(w) }.

The scaling matrix is specified later. The operator [·]^+_{M^k_s} denotes projection on the feasible set F^k_{φ_s} relative to the norm induced by the matrix M^k_s. This is given by

    [φ̃]^+_{M^k_s} = arg min_{φ∈F^k_{φ_s}} ⟨φ − φ̃, M^k_s(φ − φ̃)⟩,
where ⟨·, ·⟩ denotes the standard Euclidean inner product.

B. Intermediate Node Routing Algorithm (RT)

An intermediate node i ≠ s, w changes the allocation of session w's traffic on its outgoing links j ∈ O(i) locally by adjusting its current routing vector φ^k_i(w) within the feasible set

    F^k_{φ_i(w)} = { φ_i(w) ≥ 0 : φ_i(w)′·1 = 1 and φ_ij(w) = 0, ∀j ∈ B^k_i(w) }.

Here, the vectors 0 and 1 are both of dimension |O(i)|, and B^k_i(w) is the blocked node set discussed above. Because φ^k_i(w) affects only the routing pattern of session w's traffic inside the network, we refer to the updating algorithm at an intermediate node as a pure Routing algorithm (RT). Similar to CR, it has the scaled gradient projection form

    φ^{k+1}_i(w) = RT(φ^k_i(w)) = [ φ^k_i(w) − (M^k_i(w))^{−1} · δφ^k_i(w) ]^+_{M^k_i(w)}.   (26)

Here, δφ^k_i(w) = (δφ^k_ij(w)) and M^k_i(w) is a symmetric matrix that is positive definite on the subspace V^k_i(w) = { v_i : Σ_{j∈O(i)} v_ij(w) = 0 and v_ij(w) = 0, ∀j ∈ B^k_i(w) }.³ A specific choice of the scaling matrix M^k_i(w) is given later. In contrast to CR at the source node, RT updates the routing vector φ_i(w) of one session at a time.
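Under a diagonal scaling matrix, the projection [·]^+_M in (25)-(26) reduces to a one-dimensional search for the multiplier of the simplex equality constraint. The following sketch of an RT-style update is our own simplification (diagonal M, hypothetical marginal costs, bisection for the multiplier, and blocked sets ignored):

```python
# Sketch: one scaled gradient projection step of the form (26) with a
# diagonal scaling matrix M = diag(m). Projection onto the simplex
# {x >= 0, sum x = 1} under the M-norm is found by bisecting on the
# multiplier mu of the equality constraint. Numbers are hypothetical.
def project_simplex_scaled(y, m, iters=200):
    """argmin_x sum_j m_j (x_j - y_j)^2  s.t.  x >= 0, sum_j x_j = 1."""
    def x_of(mu):
        # KKT: x_j = max(0, y_j - mu / (2 m_j)); sum is nonincreasing in mu
        return [max(0.0, yj - mu / (2.0 * mj)) for yj, mj in zip(y, m)]
    lo = min(2.0 * mj * (yj - 1.0) for yj, mj in zip(y, m))  # sum >= 1 here
    hi = max(2.0 * mj * yj for yj, mj in zip(y, m))          # sum == 0 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(x_of(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return x_of(0.5 * (lo + hi))

# One RT-style step: move against the scaled marginal costs, then project.
phi   = [0.5, 0.3, 0.2]     # current routing fractions phi_i(w)
delta = [2.0, 1.0, 3.0]     # marginal cost indicators delta_phi_ij(w)
m     = [4.0, 4.0, 4.0]     # diagonal entries of the scaling matrix
step  = [p - d / mj for p, d, mj in zip(phi, delta, m)]
phi_next = project_simplex_scaled(step, m)

assert abs(sum(phi_next) - 1.0) < 1e-9 and min(phi_next) >= 0.0
# traffic shifts toward the link with the smallest marginal cost
assert phi_next[1] > phi[1] and phi_next[2] < phi[2]
```

The update zeroes out the most expensive link entirely and shifts its share toward the cheapest one, which is exactly the behavior conditions (20)-(22) reward at equilibrium.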

C. Marginal Cost Exchange Protocol

In order to let each node acquire the information δφ_i(w) necessary to implement CR⁴ or RT, protocols for exchanging control messages must be developed. In [1], the rules for propagating the marginal cost information are specified. Before iterating its local algorithm, node i collects the local measures ∂D_ik/∂f_ik(w) and queries its next-hop neighbors k ∈ O(i) for their marginal costs ∂D/∂r_k(w) with respect to the adjusted session(s) w. It then evaluates the terms δφ_ik(w) using (17).

To update ∂D/∂r_k(w) throughout the network, all nodes compute it locally via the recursive equation (19) based on reports from their downstream neighbors, and then provide the results to their upstream neighbors. This procedure must terminate because the network consists of a finite number of nodes. Moreover, the algorithms CR and RT guarantee that the flow pattern of any session is loop-free.

³ The subspaces V^k_s and V^k_i(w) are spanned by the feasible incremental routing vectors φ_s − φ^k_s and φ_i(w) − φ^k_i(w), where φ_s ∈ F^k_{φ_s} and φ_i(w) ∈ F^k_{φ_i(w)}, respectively.
⁴ In the case of CR, δφ_ss′ = D′_ss′(F_ss′) is a local measure at s which is not needed elsewhere. Thus, we ignore δφ_ss′ when later discussing the marginal cost exchange protocol.

D. Convergence of Algorithms

The scaled gradient projection method seeks to reduce the objective value with each iteration. Because the update direction at every iteration is opposite to the gradient (scaled by a positive definite matrix) with respect to the adjusted variables, it is a descent direction. However, reduction of the objective cost is guaranteed only when appropriate scaling matrices are used. In this subsection, we specify such matrices for CR and RT, respectively. It turns out that the scaling matrix at each node i depends on the number of nodes in its downstream node set DN^k_i(w) relative to session w's flow at the current iteration k. For convenience, introduce the notations AN^k_i(w) ≡ O(i)\B^k_i(w) and AN^k_s ≡ ∪_w AN^k_s(w).

Lemma 2: Assume the initial network cost D0 is finite. If at each iteration k of CR the scaling matrix is

    M^k_s = (R/2)·diag{ A_ss′(D0), ( |W|[ A_sj(D0) + |AN^k_s||DN^k_j(w)|A(D0) ] )_{w∈W, j∈AN^k_s} },   (27)

where

    A_ij(D0) ≡ max_{F : D_ij(F) ≤ D0} D″_ij(F)  and  A(D0) ≡ max_{(m,n)∈E} A_mn(D0),   (28)

then the cost is strictly reduced by the current iteration unless the equilibrium conditions (21) and (22) are satisfied at s.

Proof: By the Projection Theorem [17], the cost difference after the current iteration is

    D(φ^{k+1}_s) − D(φ^k_s) = (R·δφ^k_s)′(φ^{k+1}_s − φ^k_s) + (1/2)(φ^{k+1}_s − φ^k_s)′ H^{k,λ}_{φ_s} (φ^{k+1}_s − φ^k_s)
      ≤ (φ^{k+1}_s − φ^k_s)′ ( −R·M^k_s + H^{k,λ}_{φ_s}/2 ) (φ^{k+1}_s − φ^k_s),   (29)

where H^{k,λ}_{φ_s} is the Hessian matrix of D with respect to the components of φ_s, evaluated at λφ^k_s + (1−λ)φ^{k+1}_s for some λ ∈ [0, 1]. We temporarily assume that D(φ^{k+1}_s) − D(φ^k_s) ≤ 0. We validate this assumption by showing that −R·M^k_s + H^{k,λ}_{φ_s}/2 is negative definite. That is, we show that for all non-zero v_s ∈ V^k_s, v′_s · H^{k,λ}_{φ_s} · v_s < v′_s · (2R·M^k_s) · v_s. For brevity, we suppress the superscripts (k, λ) in what follows. Plugging in the expressions for all entries of H_{φ_s}, we have

    v′_s · H_{φ_s} · v_s = R²[ D″_ss′(F_ss′)v²_ss′ + Σ_{w,w′∈W} Σ_{j∈AN_s} (∂²D_sj/∂f_sj(w)∂f_sj(w′)) v_sj(w)v_sj(w′)
        + Σ_{w,w′∈W} Σ_{j,k∈AN_s} (∂²D/∂r_j(w)∂r_k(w′)) v_sj(w)v_sk(w′) ]
      <(a) R²[ D″_ss′(F_ss′)v²_ss′ + Σ_{j∈AN_s} ( Σ_{w∈W} √(∂²D_sj/∂f_sj(w)²) |v_sj(w)| )²
        + ( Σ_{w∈W} Σ_{j∈AN_s} √(∂²D/∂r_j(w)²) |v_sj(w)| )² ]
      ≤(b) R²[ D″_ss′(F_ss′)v²_ss′ + |W| Σ_{j∈AN_s} Σ_{w∈W} (∂²D_sj/∂f_sj(w)²) v²_sj(w)
        + |W||AN_s| Σ_{w∈W} Σ_{j∈AN_s} (∂²D/∂r_j(w)²) v²_sj(w) ].   (30)

Inequality (a) follows from D_sj being strictly convex in (f_sj(w)), so that for w ≠ w′ or j ≠ k,

    |∂²D_sj/∂f_sj(w)∂f_sj(w′)| < √( (∂²D_sj/∂f_sj(w)²)(∂²D_sj/∂f_sj(w′)²) ),
    |∂²D/∂r_j(w)∂r_k(w′)| < √( (∂²D/∂r_j(w)²)(∂²D/∂r_k(w′)²) ).

By the Cauchy-Schwarz Inequality, we obtain (b). It can be shown that

    ∂²D/∂r_j(w)² ≤ max_{(m,n)∈E} (∂²D_mn/∂f_mn(w)²) · |DN_j(w)|,

where

    ∂²D_mn/∂f_mn(w)² = D″_mn(F_mn)(f_mn(w)/F_mn)^{2n−2} + D′_mn(F_mn) · ∂²F_mn/∂f_mn(w)².

By relation (10), f_mn(w)/F_mn ≤ 1 and

    ∂²F_mn/∂f_mn(w)² = ((n−1)/f_mn(w)) (f_mn(w)/F_mn)^{n−1} [ 1 − (f_mn(w)/F_mn)^n ] ≈ 0,

because for large n, either (f_mn(w)/F_mn)^{n−1} ≈ 0 or (f_mn(w)/F_mn)^n ≈ 1. Therefore, ∂²D_mn/∂f_mn(w)² ≤ D″_mn(F_mn). By the assumption D(φ^{k+1}_s) − D(φ^k_s) ≤ 0, we have D_mn(F^k_mn) ≤ D0 and D_mn(F^{k+1}_mn) ≤ D0, so D_mn(λF^k_mn + (1−λ)F^{k+1}_mn) ≤ D0 for all λ ∈ [0, 1]. Accordingly, D″_mn(F^{k,λ}_mn) ≤ A_mn(D0) and max_{(m,n)∈E} ∂²D_mn/∂f^{k,λ}_mn(w)² ≤ A(D0). Substituting the above bounds

into (30), we obtain the desired relation v′_s · H^{k,λ}_{φ_s} · v_s < v′_s · (2R·M^k_s) · v_s.

Now, with the scaling matrix M^k_s specified in the lemma, D(φ^{k+1}_s) − D(φ^k_s) ≤ 0, where the inequality is strict if and only if φ^{k+1}_s ≠ φ^k_s. Moreover, φ^k_s = CR(φ^k_s) only when conditions (21)-(22) hold at node s. Thus, the lemma is proved. ∎

Using similar techniques, we can derive appropriate scaling matrices for the algorithm RT used at intermediate nodes.

Lemma 3: Assume the initial network cost D0 is finite. If at each iteration k of RT the scaling matrix is

    M^k_i(w) = (t^k_i(w)/2)·diag{ A_ij(D0) + |AN^k_i(w)||DN^k_j(w)|A(D0) }_{j∈AN^k_i(w)},   (31)

then the cost is strictly reduced by the current iteration unless the equilibrium condition (20) is satisfied by φ^k_i(w).
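The entries of the scaling matrices in (27)-(28) and (31) are computable from locally available quantities. A sketch for an intermediate node's diagonal matrix (31), using quadratic link costs D_ij(F) = a_ij·F² so that D″_ij = 2a_ij is constant and the sublevel-set maximum in (28) is trivial (all numbers hypothetical):

```python
# Sketch: the diagonal scaling matrix (31) for RT at an intermediate node i.
# With quadratic costs D_ij(F) = a_ij * F^2, D''_ij = 2*a_ij everywhere, so
# A_ij(D0) = 2*a_ij and A(D0) is their maximum. All values are hypothetical.
a = {("i", "j1"): 1.0, ("i", "j2"): 2.0, ("m", "n"): 3.0}

def A_ij(edge, D0):
    # (28) for quadratic costs: the second derivative is constant, so the
    # maximum over the sublevel set {F : D_ij(F) <= D0} is just 2*a_ij
    return 2.0 * a[edge]

def A_max(D0):
    return max(A_ij(e, D0) for e in a)

def scaling_matrix_rt(t_i, AN_i, DN_sizes, D0):
    """(31): M^k_i(w) = (t_i(w)/2) diag{A_ij(D0) + |AN_i||DN_j|A(D0)}."""
    n_an = len(AN_i)
    return {j: (t_i / 2.0) * (A_ij(("i", j), D0) + n_an * DN_sizes[j] * A_max(D0))
            for j in AN_i}

M = scaling_matrix_rt(t_i=1.0, AN_i=["j1", "j2"],
                      DN_sizes={"j1": 2, "j2": 3}, D0=10.0)
print(M)
```

For a general convex cost, A_ij(D0) would instead require a one-dimensional maximization of D″_ij over the sublevel set {F : D_ij(F) ≤ D0}, which each link can perform locally.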

Building on the above two lemmas, we have the following convergence theorem.

Theorem 2: Assume an initial congestion control ratio φ⁰_ss′ and a loop-free routing configuration {φ⁰_i(w)} such that D(φ⁰_ss′, {φ⁰_i(w)}) = D0 < ∞. If the scaling matrices are chosen according to Lemmas 2 and 3, then the sequences generated by algorithms CR and RT converge, i.e., φ^k_ss′ → φ*_ss′ and φ^k_i(w) → φ*_i(w) for all w ∈ W and i ≠ w as k → ∞. Furthermore, φ*_ss′ and {φ*_i(w)} constitute a jointly optimal solution of the JOCR problem (9).

Proof: With the scaling matrices specified in Lemmas 2 and 3, any iteration of CR or RT strictly reduces the total network cost with all other variables fixed, unless the equilibrium conditions for the adjusted variables are satisfied. Because the optimization variables φ_s and {φ_i(w)} each take values in a compact set, the sequences {φ^k_s}_{k=0}^∞ and {φ^k_i(w)}_{k=0}^∞ must each have a convergent subsequence. As the objective function is bounded below, the non-increasing sequence of network costs generated by the iterations of all the algorithms must converge. Therefore, the limit points φ*_s and φ*_i(w) of the above sequences must be such that

conditions (21)-(22) are satisfied by φ*_s and condition (20) is satisfied by φ*_i(w) at all i ≠ s, w. Therefore, by Theorem 1, these limit points jointly constitute an optimal solution of the JOCR problem (9). ∎

Note that global convergence does not require any particular order in running the algorithms CR and RT at different nodes. For convergence to the joint optimum, every node i only needs to iterate its own algorithm(s) until its routing variables satisfy either (21)-(22) or (20).⁵ Since both the implementation and the termination of all the algorithms are fully distributed to individual nodes, the whole scheme provides a distributed method of finding the optimal multicast subgraphs for network coding.

V. CONCLUSION

We adopt the network coding approach to achieve minimum-cost multicast. In light of the intrinsic similarity between the optimal coding subgraph problem and the conventional multicommodity flow routing problem, we apply distributed routing algorithms to solve the multicast subgraph optimization. We develop a node-based optimization framework, and derive the necessary and sufficient optimality conditions for convex link costs and concave utility functions. Finally, we design a complete set of distributed algorithms involving both congestion control and routing, and prove its convergence to the optimal network configuration.

REFERENCES

[1] R. Gallager, "A minimum delay routing algorithm using distributed computation," IEEE Transactions on Communications, vol. 25, no. 1, pp. 73-85, 1977.
[2] D. Bertsekas, E. Gafni, and R. Gallager, "Second derivative algorithms for minimum delay distributed routing in networks," IEEE Transactions on Communications, vol. 32, no. 8, pp. 911-919, 1984.
[3] F. Kelly, A. Maulloo, and D. Tan, "Rate control in communication networks: shadow prices, proportional fairness and stability," Journal of the Operational Research Society, vol. 49, 1998.
[4] S. Low and D. Lapsley, "Optimization flow control, I: basic algorithm and convergence," IEEE/ACM Transactions on Networking, vol. 7, pp. 861-874, Dec. 1999.
[5] W. Wang, M. Palaniswami, and S. H. Low, "Optimal flow control and routing in multi-path networks," Performance Evaluation, vol. 52, pp. 119-132, 2003.
[6] K. Kar, S. Sarkar, and L. Tassiulas, "Optimization based rate control for multipath sessions," in Proceedings of the Seventeenth International Teletraffic Congress (ITC), Dec. 2001.
[7] X. Lin and N. B. Shroff, "The multi-path utility maximization problem," in 41st Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, Oct. 2003.
[8] R. Ahlswede, N. Cai, S.-Y. Li, and R. Yeung, "Network information flow," IEEE Transactions on Information Theory, vol. 46, pp. 1204-1216, July 2000.
[9] R. Koetter and M. Médard, "An algebraic approach to network coding," IEEE/ACM Transactions on Networking, vol. 11, pp. 782-795, Oct. 2003.
[10] T. Ho, M. Médard, and R. Koetter, "An information-theoretic view of network management," IEEE Transactions on Information Theory, vol. 51, pp. 1295-1312, Apr. 2005.
[11] D. Lun, N. Ratnakar, R. Koetter, M. Médard, E. Ahmed, and H. Lee, "Achieving minimum-cost multicast: a decentralized approach based on network coding," in Proceedings of IEEE INFOCOM 2005, Mar. 2005.
[12] S.-Y. Li, R. Yeung, and N. Cai, "Linear network coding," IEEE Transactions on Information Theory, vol. 49, pp. 371-381, Feb. 2003.
[13] T. Ho, M. Médard, M. Effros, and D. Karger, "On randomized network coding," in 41st Annual Allerton Conference on Communication, Control and Computing, Oct. 2003.
[14] P. A. Chou, Y. Wu, and K. Jain, "Practical network coding," in 41st Annual Allerton Conference on Communication, Control and Computing, Oct. 2003.
[15] D. Lun, M. Médard, T. Ho, and R. Koetter, "Network coding with a cost criterion," in International Symposium on Information Theory and its Applications (ISITA 2004), Oct. 2004.
[16] D. P. Bertsekas and R. Gallager, Data Networks. Prentice Hall, 2nd ed., 1992.
[17] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 2nd ed., 1999.

⁵ In practice, nodes may keep updating their routing variables with the corresponding algorithms until further reduction in network cost by any one of the algorithms is negligible.
