
Modeling of H.264 Video Transmission over MBMS Zhengye Liu†, Zhenyu Wu‡, Hang Liu‡, Yao Wang† †Polytechnic Institute of New York University, Brooklyn, NY 11201, USA ‡Corporate Research, Thomson Inc., Princeton, NJ 08540, USA

Abstract—The 3rd Generation Partnership Project (3GPP) has standardized Multimedia Broadcast and Multicast Services (MBMS) to provide mobile video streaming services over cellular networks. H.264/AVC has been selected as the recommended video coding scheme for MBMS. In this paper, we model the system behavior of each component of H.264-encoded video transmission over 3GPP MBMS, including the H.264 source rate and source distortion at the video encoder, MBMS packet losses during transmission, and the H.264 channel distortion at the video decoder. Our simulations verify the proposed models. Additionally, using the proposed models, we study a joint source channel optimization problem for video multicast over MBMS.

I. INTRODUCTION

The 3rd Generation Partnership Project (3GPP) committee has standardized the Multimedia Broadcast/Multicast Services (MBMS) as a feature in Release 6 [1]. MBMS is expected to enable efficient distribution of live or pre-recorded video programs to many receivers over 3G cellular networks. MBMS specifies video coding, channel coding, and protocol stacks for video streaming over 3G networks. First, MBMS selects the baseline profile of the H.264/AVC (in short, H.264) [2] video coding standard as the sole recommended video coding scheme [1], due to its significantly improved coding efficiency. Second, MBMS adopts application layer cross-packet forward error correction (FEC) to combat packet losses induced by error-prone wireless networks. Third, MBMS maximally reuses the existing infrastructure and protocols of GSM (Global System for Mobile communications) and UMTS (Universal Mobile Telecommunications System) [3], such as the packet-based bearers within UMTS or EGPRS (Enhanced General Packet Radio Services). However, despite its successful standardization and deployment, the system behavior of video transmission over MBMS has not been analytically studied and fully understood. In this paper, we model the system behavior of video transmission over MBMS, including the H.264 source rate and distortion, MBMS packet transmission, the H.264 channel distortion, and finally the end-to-end distortion. These models can be used to guide system design and operation. They provide an in-depth understanding of how different components, e.g., source coding, channel coding, and lower-layer channel conditions, affect the received video quality, based on which the system design can be refined to maximize the received video quality. Furthermore, in MBMS, one needs to determine how to allocate the available bandwidth to source coding and

channel coding to maximize the received video quality. This is a joint source and channel optimization problem. The models are essential to accelerate the decision-making procedure and make joint source and channel coding practically feasible. In this paper, we first propose sequence-level recursive models that relate the source rate and the quantization-induced distortion at the encoder to two configurable encoder parameters, namely the quantization parameter and the intra-block rate. Second, we propose a recursive model that relates the channel characteristics observed at the application layer, including the FEC success rate and the residual packet loss rate, to the link layer channel characteristics in MBMS. Third, we apply our previously proposed H.264 channel distortion model [4], [5] to MBMS to estimate the channel-error-induced distortion. By combining these components, we obtain an end-to-end distortion model for video transmission over MBMS. We verify these models via simulations and show that they provide sufficient accuracy over a large range of channel conditions. With the proposed models, we observe that under a representative system configuration, inserting more FEC parity packets in channel coding is more effective at providing error resilience than inserting more intra-coded blocks in source coding. Additionally, we investigate the joint source channel optimization problem. We first consider a single receiver scenario and observe that a proper bandwidth allocation between source and channel bits can yield more than 2 dB gain compared with an arbitrary bandwidth allocation. We then consider a multiple receiver scenario and investigate two performance metrics, namely the maximum weighted average metric and the minimax degradation metric, to deal with the heterogeneity of receiver channel conditions.
We find that the minimax degradation metric yields more consistent video quality across receivers, and for the same receiver over time, with only a slight reduction in average video quality. The remainder of this paper is structured as follows. Section II presents a typical system setup for video transmission over MBMS. In Section III, we discuss the models for source coding rate and distortion, packet loss rate, channel distortion, and total distortion. In Section IV, we investigate the joint source and channel optimization problem and compare two performance metrics for video multicast. We discuss related work in Section V and conclude the paper in Section VI.

Fig. 1. The principle of MBMS.

Fig. 2. Packetization through UMTS protocol stack.

Fig. 3. Structure of cross-packet FEC for MBMS.

Fig. 4. Architecture of simulation platform.

II. SYSTEM SETUP

We first give a brief overview of video transmission over MBMS, including video coding, channel coding, and protocol stacks. We then describe the simulation platform used to verify the proposed models.

A. Overview of MBMS

MBMS [1] is an IP datacast (IPDC) type of service that can be offered via existing GSM and UMTS cellular networks. It is a unidirectional point-to-multipoint service in which data is transmitted from a single source entity to multiple recipients in a specific area, with a transmission rate up to 256 kbps (although lower rates are likely to be more common). Figure 1 shows the principle of MBMS for UMTS. We assume that there is no IP packet loss within the 3GPP core network and UTRAN, and that packet losses are mainly induced by RLC-PDU losses over the air interface between the UE and the UTRAN.

The H.264 baseline profile [2] is adopted as the recommended video coding scheme in MBMS. The baseline profile is designed for low-cost applications with limited computing resources. It supports several new H.264 features that significantly improve coding efficiency, such as intra prediction and the in-loop deblocking filter. Additionally, it provides a set of error resilience tools, such as flexible macroblock ordering (FMO), arbitrary slice ordering (ASO), and redundant slices (RS). To limit computational complexity, the baseline profile supports only the IPPP... structure. Furthermore, H.264 introduces a network abstraction layer (NAL) in order to flexibly adapt to different networks. With the NAL, a video NAL unit (NALU) can be easily packetized into an RTP packet and transmitted through the RTP/UDP/IP protocols. Although higher definition video may be transmitted over MBMS in the future, in this paper we focus on QCIF (176x144) video sequences and target a low bit rate video streaming service. We focus our discussion on UMTS.

Figure 2 shows the UMTS packetization process [3]. An RTP/UDP/IP packet is put into one packet data convergence protocol (PDCP) packet, which then becomes a radio link control (RLC) service data unit (SDU). Video slices resulting from the H.264 encoder

are put into RTP packets. These packets have variable length (unless special mechanisms such as multi-pass encoding are exercised at the encoder), so the RLC-SDUs are of varying length as well. If an RLC-SDU is larger than an RLC protocol data unit (PDU), the SDU is segmented into several PDUs. If the last PDU is not filled up, the beginning portion of the next SDU is put there. At the receiver, if one PDU carrying a portion of an SDU is lost, the entire SDU, and hence the corresponding video packet, is considered lost. Note that an SDU is lost whenever any of its underlying PDUs is in error. Therefore, even if PDU losses are not bursty, the SDU losses, and consequently the video packet losses, can be very bursty.

In MBMS, channel coding is performed at the application layer. Application layer cross-packet FEC is adopted to combat packet losses induced by lower-layer channel degradation. As shown in Figure 3, N video source packets are grouped to form one source block, which is then shaped into k equal-length source symbols with a symbol length T. FEC padding bits may be added to the end of the last symbol of a video packet so that all symbols have length T. Note that these padding bits are only used for generating parity symbols and are not transmitted. An (n, k) FEC code with rate r = k/n is applied to the k source symbols to generate n − k parity symbols. The parity symbols are encapsulated into parity packets for transmission, where one or more parity symbols can be encapsulated into one RTP packet. With FEC, any k received symbols can be used to recover all k source symbols. However, if a receiver receives fewer than k symbols, FEC decoding fails and all received parity symbols become useless. MBMS uses Raptor codes to generate the parity packets.

B. Simulation Platform

We create a simulation platform that simulates video transmission over MBMS.
Figure 4 shows the architecture of the simulation platform. We adopt JM9.6 [2] as the H.264 video encoder. We code


TABLE I
SIMULATION SETUP FOR H.264 VIDEO TRANSMISSION OVER MBMS

Video encoder:
  Sequence name: Foreman
  Sequence format: QCIF (176x144), 30 frame/s
  Encoding frame rate: 10 frame/s
  GoP length: 4 seconds
  Encoding mode: Frame mode
  Quantization parameter: 29-35
  Forced intra rate: 0-12/99

Channel simulation configuration:
  UTRAN: 64 kbps
  TTI: 80 ms
  PDU length: 640 bytes
  PDU loss rate: 0.005, 0.01, 0.015, 0.1

a video sequence in groups of pictures (GoPs), each GoP led by one I-frame and followed by P-frames. In our simulations, the GoP length is set to 4 seconds. We periodically code MBs in a P-frame in the intra mode according to a specified forced intra rate. Each frame is coded into one NAL unit and then put into a separate RTP packet. Because of the relatively low available total bandwidth, we choose not to use slice mode, so as to maximize coding efficiency. We run the JM9.6 encoder on QCIF sequences with different quantization parameters and intra-block rates. We tested the "News", "Foreman", and "Football" sequences in our simulations; in this paper, we present the results for "Foreman" as a representative. In our simulation, interleaving of video source packets before FEC is adopted to combat bursts of packet losses, so that packet losses after FEC decoding (when FEC decoding is not successful) can be treated as i.i.d. To avoid introducing additional delay beyond the FEC coding delay, we use within-FEC-block interleaving instead of cross-FEC-block interleaving. We apply application layer cross-packet FEC on the encoded source video packets, and limit our study to the case where equal FEC protection is applied to all video data. We group all the video packets corresponding to one GoP to form one source block; thus, the system delay equals the GoP length (4 seconds in our simulation). An (n, k) FEC code is applied to each source block. We shape the video packets into k equal-length source symbols and choose n = 255. The symbol length T is chosen to guarantee that all the RTP packets from the same GoP fit into k source symbols. All RTP packetization follows RFC 3984 [6]. In our simulations, we use the PDU loss traces provided by Qualcomm and Toshiba [7] with different average loss rates for UTRAN.
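The shaping step just described can be sketched as simple bookkeeping (Python; the symbol length and packet sizes below are hypothetical, and this does not implement the Raptor code itself):

```python
# Bookkeeping sketch of FEC source-block shaping: variable-length packets
# are padded up to a whole number of T-byte symbols. Padding is used only
# for parity generation and is not transmitted. Values are hypothetical.

def shape_source_block(packet_lengths, T=48):
    """Return (k, symbols_per_packet) for symbol length T."""
    symbols_per_packet = [-(-length // T) for length in packet_lengths]  # ceil
    k = sum(symbols_per_packet)
    return k, symbols_per_packet

def parity_symbol_count(k, n=255):
    """An (n, k) code with rate r = k/n adds n - k parity symbols."""
    return n - k

# Three packets of 100, 130, and 95 bytes with T = 48 occupy 3 + 3 + 2
# symbols, so k = 8 and a (255, 8) code would add 247 parity symbols.
k, per_packet = shape_source_block([100, 130, 95])
```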
The total transmission rate is set to 64 kbps, and the corresponding TTI and PDU length are 80 ms and 640 bytes, respectively. The channel simulator simulates all layers below the application layer, e.g., header compression. The highest PDU loss rate among the available traces is 10%. As aforementioned, one PDU loss may lead to more than one SDU/packet loss. A receiver performs FEC decoding on the received source and parity packets, de-interleaves the recovered source packets, and then feeds these source packets to the JM9.6 H.264 video decoder. Motion copy is adopted as the error concealment

method. The detailed simulation setup is summarized in Table I.

III. MODELS FOR H.264 VIDEO TRANSMISSION OVER MBMS

We first present source rate and distortion models for H.264 video encoding. We next present a recursive model that relates the application layer channel characteristics to the link layer channel characteristics in MBMS. We then model the channel-induced distortion of the received video. Finally, we combine all these models into an end-to-end video distortion model for MBMS.

A. Source Rate and Distortion Models

We model the source video rate and the distortion incurred at encoding. The source distortion is mainly induced by quantization at the encoder. Let us denote the source rate, source distortion, quantization parameter, and intra-block rate by R_s, D_s, QP, and β, respectively. Several studies have modeled the relation between the quantization step size and the source rate and distortion in a block-based hybrid coder, e.g., [8], [9], [10], [11]. A limitation of these prior works is that they apply the same model to all blocks regardless of their coding modes. Because intra- and inter-coded blocks have very different statistics, they should ideally be modeled separately. The work in [12] considers the impact of the intra rate β on R_s and D_s, but it models the rate-distortion behavior in the ρ-domain and does not give an explicit relation between R_s, D_s, and QP. In the MBMS application, it is desirable to model R_s and D_s explicitly in terms of QP and β, which can be configured by a system operator. In this section, all video distortions are measured in MSE. Among intra-coded blocks, those that are forced to be coded in the intra mode (FI), those that the encoder codes more efficiently in the intra mode (I), and those in an I-frame (I-frame) typically have different characteristics, and should be modeled separately as well.
We denote the percentage of forced intra MBs by β_FI, the percentage of other intra-coded MBs (chosen by the encoder for coding efficiency purposes) by β_I, and the percentage of inter MBs by β_P = 1 − β_I − β_FI. Note that β_I depends on the characteristics of the underlying sequence and can be considered a fixed, known value for a given sequence. In the current work, we do not attempt to model the relation between the quantization step size and the rate for blocks coded in different modes. Rather, for a given QP, we measure the average bit rate for each coding mode (I-frame, FI, I, P) and the percentage of non-forced intra blocks (β_I) by running the actual encoder without any forced intra blocks and at an intermediate forced intra rate. We then estimate the total average rate for a given forced intra rate β_FI by

R(QP, β_FI) = (1/N) { R_I-frame(QP) + (N − 1) [ β_FI R_FI(QP) + β_I R_I(QP) + β_P R_P(QP) ] }.   (1)
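A direct implementation of (1) can be sketched as below; the per-mode rates and the non-forced intra fraction β_I are assumed to have been measured from trial encodings at the given QP, and the function and argument names are ours:

```python
# Sketch of the sequence-level rate model in Eq. (1): a GoP of N frames is
# one I-frame plus N-1 P-frames whose MBs split into forced-intra (FI),
# other-intra (I), and inter (P) modes. Per-mode rates are assumed measured.

def avg_source_rate(N, R_Iframe, R_FI, R_I, R_P, beta_FI, beta_I):
    """Average source rate over a GoP for a given forced intra rate."""
    beta_P = 1.0 - beta_FI - beta_I
    per_P_frame = beta_FI * R_FI + beta_I * R_I + beta_P * R_P
    return (R_Iframe + (N - 1) * per_P_frame) / N
```

Since the per-mode rates enter linearly, the predicted rate is linear in β_FI, which matches the behavior observed in Figure 5.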

Similarly, we measure the average distortion for each mode, and estimate the average source coding distortion over a GoP for a given β_FI by

Fig. 5. Verification of source rate model.
D_s(QP, β_FI) = (1/N) { D_s,I-frame(QP) + (N − 1) [ β_FI D_s,FI(QP) + β_I D_s,I(QP) + β_P D_s,P(QP) ] }.   (2)

To reduce the computation, chosen analytical models of rate vs. QP and distortion vs. QP could be used, with the model parameters for each mode determined in advance from training data. For example, for rate modeling, we found in a preliminary study that the quadratic model in [9] is quite accurate, but the model parameters for the inter mode and the various intra modes are quite different. For the distortion model, the general relation D_s(Q) = A/Q^2 can be applied, where Q is the actual quantization step size corresponding to the QP. Once accurate models are found, they can be incorporated into the proposed framework.

We verify the source rate and distortion models with simulations; the setup is shown in Table I. Figure 5 compares the source coding rates obtained from the model and from actual simulations. We put β_FI on the x-axis to highlight how intra-coded blocks affect the source rate. For different QP and β_FI, the model is sufficiently accurate. As expected, the source rate is linearly related to β_FI, reflecting that different coding modes indeed have different bit rate characteristics. Similarly, Figure 6 verifies the source distortion model. Unlike the source rate, the source distortion is mainly determined by QP rather than by the coding modes; therefore, the source distortion appears unchanged as β_FI varies.

B. Packet Loss Model

We relate the application layer channel characteristics to the link layer channel characteristics. We adopt the FEC block success rate (denoted P_FEC,suc) and the residual packet loss rate (denoted P_r) to represent the application layer channel characteristics.
PFEC,suc is defined as the percentage of FEC blocks that are successfully decoded at the receiver, and Pr is defined as the application layer packet loss rate from those FEC blocks that cannot be decoded successfully. These two characteristics will be used in Section III-C for channel distortion modeling. Recall that the variable-length SDUs (which correspond to

Fig. 6. Verification of source distortion model.

the application layer video packets) must be delivered over fixed-length PDUs, without alignment through bit stuffing. The channel degradation is therefore amplified: one PDU loss can lead to multiple SDU/application layer packet losses. We assume that N source packets are divided into k FEC source symbols. With FEC encoding, n − k parity symbols are created and packetized into N_p parity packets. We further assume that these N + N_p packets are transmitted over M fixed-length PDUs. In FEC decoding, if the number of lost symbols is no more than n − k, the FEC block is decoded successfully; otherwise, FEC decoding fails. Mathematically,

P_FEC,suc = Σ_{l=0}^{n−k} P_SYM(l),   (3)

where P_SYM(l) denotes the probability that l symbols in a FEC block are lost. The problem of estimating P_FEC,suc thus reduces to estimating the distribution of lost symbols, P_SYM(l). Figure 7(a) shows an example of the relation between PDUs, application packets, and FEC symbols. If PDU m − 1 has been received correctly, the loss of PDU m newly induces two packet losses with three FEC symbol losses; if PDU m − 1 has already been lost, the loss of PDU m induces only one new packet loss with two symbol losses. This indicates that packet/symbol losses are dependent. Furthermore, in a wireless channel, the underlying PDU losses are themselves dependent. The main challenge in estimating P_SYM(l) is to model the dependency among PDU losses, packet losses, and FEC symbol losses.

We use a two-state Markov model for PDU losses, characterized by the probabilities p and q with which the state transits from "good" to "good" and from "bad" to "bad", respectively. p and q can be obtained from the link layer characteristics, namely the average PDU loss rate and the average burst length of PDU losses. We denote the probabilities that PDU m is received correctly and incorrectly by P^g_PDU,m and P^b_PDU,m, respectively. Recursively, we can represent P^g_PDU,m as

P^g_PDU,m = p P^g_PDU,m−1 + (1 − q) P^b_PDU,m−1,   (4)
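The two-state (Gilbert) model and the recursion in (4) can be sketched as below; the numeric parameter values are hypothetical, not taken from the traces used in the paper:

```python
# Sketch of the two-state Markov PDU-loss model of Eq. (4).
# p = P(good -> good), q = P(bad -> bad); values here are hypothetical.

def p_good(M, p, q, Pe_PDU):
    """P^g_PDU,m for m = 1..M: probability that PDU m is received correctly,
    starting from P^g_PDU,1 = 1 - Pe_PDU."""
    Pg = [1.0 - Pe_PDU]
    for _ in range(2, M + 1):
        Pg.append(p * Pg[-1] + (1 - q) * (1 - Pg[-1]))  # Eq. (4)
    return Pg

# If Pe_PDU equals the chain's stationary loss rate (1-p) / ((1-p) + (1-q)),
# the recursion stays at this fixed point for every m.
```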

Fig. 7. Illustration of the recursive model for packet loss in MBMS. (a) Relation of PDUs, packets, and FEC symbols; (b) recording table for the recursive model.

where P^g_PDU,1 = 1 − P_e,PDU, with P_e,PDU denoting the average PDU loss rate, and P^b_PDU,m = 1 − P^g_PDU,m.

Let us consider how PDU losses affect the higher-level FEC symbols. Assume that m − 1 out of M PDUs have been transmitted. If PDU m − 1 has been received correctly, we denote the distribution of lost symbols by P^g_SYM,m−1(l), l = 1, . . . , n; for a given l, P^g_SYM,m−1(l) is the probability that, after transmitting m − 1 PDUs and given that PDU m − 1 is received correctly, l FEC symbols are lost. Similarly, P^b_SYM,m−1(l) denotes the same distribution given that PDU m − 1 has been lost. Given that PDU m is received correctly, we can derive P^g_SYM,m(l) from P^g_SYM,m−1(l) and P^b_SYM,m−1(l) recursively:

P^g_SYM,m(l) = p P^g_SYM,m−1(l) + (1 − q) P^b_SYM,m−1(l).   (5)

Similarly, P^b_SYM,m(l) can be derived as:

P^b_SYM,m(l) = (1 − p) P^g_SYM,m−1(l − L^g_SYM,m) + q P^b_SYM,m−1(l − L^b_SYM,m),   (6)

where L^g_SYM,m and L^b_SYM,m indicate the FEC symbol losses newly induced by the loss of PDU m, when PDU m − 1 is received and lost, respectively. In Figure 7(a), L^g_SYM,m = 3 and L^b_SYM,m = 2. Given an encoded video sequence, L^g_SYM,m and L^b_SYM,m are known for all M PDUs. Initially, we set

P^g_SYM,1(l) = 1 if l = 0, and 0 for l = 1, . . . , n,   (7)

and

P^b_SYM,1(l) = 1 if l = L^g_SYM,1, and 0 otherwise.   (8)
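The recursion (5)-(8), followed by averaging over the two PDU states, can be sketched as below. One implementation choice of ours: instead of the conditional distributions used in the text, the sketch tracks the joint probability of l lost symbols and the state of the most recent PDU, so the final mixing over PDU states reduces to summing the two tables; the per-PDU loss counts L_g, L_b are hypothetical inputs:

```python
# Sketch of the recursive symbol-loss model: two length-(n+1) tables track
# P(l symbols lost AND last PDU good/bad). Inputs are assumed consistent
# (total symbol losses never exceed n). L_g[m], L_b[m] are the symbols
# newly lost when PDU m+1 is lost, given the previous PDU was good / bad.

def fec_success_prob(n, k, p, q, Pe_PDU, L_g, L_b):
    """Probability that at most n - k symbols are lost (block decodes)."""
    M = len(L_g)
    Pg = [0.0] * (n + 1)
    Pb = [0.0] * (n + 1)
    Pg[0] = 1.0 - Pe_PDU           # PDU 1 received: no symbols lost
    Pb[min(L_g[0], n)] = Pe_PDU    # PDU 1 lost: L_g[0] symbols lost
    for m in range(1, M):          # PDUs 2..M
        Pg_new = [p * Pg[l] + (1 - q) * Pb[l] for l in range(n + 1)]
        Pb_new = [0.0] * (n + 1)
        for l in range(n + 1):
            if l >= L_g[m]:
                Pb_new[l] += (1 - p) * Pg[l - L_g[m]]
            if l >= L_b[m]:
                Pb_new[l] += q * Pb[l - L_b[m]]
        Pg, Pb = Pg_new, Pb_new
    Psym = [Pg[l] + Pb[l] for l in range(n + 1)]  # marginal over PDU states
    return sum(Psym[: n - k + 1])
```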

Therefore, we can obtain P_SYM(l) as

P_SYM(l) = P^g_PDU,M P^g_SYM,M(l) + P^b_PDU,M P^b_SYM,M(l).   (9)

P_FEC,suc can thus be computed from (3) and (9). To implement this model, at step m we maintain a table with 2n entries in total to record P^g_SYM,m(l) and P^b_SYM,m(l) for all possible l = 1, . . . , n, as illustrated in Figure 7(b). We differentiate the good and bad states for each PDU because (i) PDU losses are dependent, and (ii) the new FEC symbol losses depend not only on the current PDU m but also on the previous PDU m − 1. We notice that

the new FEC symbol losses on PDU m may also depend on earlier PDUs, e.g., PDU m − 2. However, since the video packet size is typically comparable to the PDU size, we restrict the dependency to adjacent PDUs.

We extend the recursive model to obtain the residual packet loss rate P_r. Assume l FEC symbols are lost at step m − 1. In the case that PDU m has been received correctly (good state), in addition to P^g_SYM,m(l) for each entry l, we also maintain the corresponding source packet loss distribution P^g_SRC,m(l, v), which records the probability that v source packets are lost, v = 1, . . . , N. Similarly, we record P^b_SRC,m(l, v) for each entry l in the case that PDU m is lost (bad state). Each table entry thus additionally records one vector P^g_SRC,m(l, v) or P^b_SRC,m(l, v). We recursively represent P^g_SRC,m(l, v) as:

P^g_SRC,m(l, v) = p P^g_SRC,m−1(l, v) + (1 − q) P^b_SRC,m−1(l, v).   (10)

Similarly, P^b_SRC,m(l, v) can be represented by:

P^b_SRC,m(l, v) = (1 − p) P^g_SRC,m−1(l − L^g_SYM,m, v − L^g_SRC,m) + q P^b_SRC,m−1(l − L^b_SYM,m, v − L^b_SRC,m),   (11)

where L^g_SRC,m and L^b_SRC,m denote the number of source packet losses newly induced by the loss of PDU m. Similar to (7) and (8), we can obtain the initial states of P^g_SRC,1(l, v) and P^b_SRC,1(l, v).

At step M, given that l symbols are lost, the expected number of lost source packets E_SRC(l) can be expressed as:

E_SRC(l) = Σ_{v=0}^{N} v P^g_SRC,M(l, v) P^g_SYM,M(l) + Σ_{v=0}^{N} v P^b_SRC,M(l, v) P^b_SYM,M(l).   (12)

The residual packet loss rate P_r can thus be expressed as:

P_r = [ Σ_{l=n−k+1}^{n} E_SRC(l) P_SYM(l) ] / [ N (1 − P_FEC,suc) ].   (13)

We verify the packet loss models in MBMS with simulations. We fix β_FI = 0 and vary QP from 29 to 33 to obtain encoded video sequences with different source rates and packet lengths. Note that since the total bandwidth is fixed, a smaller QP leads to a higher source rate and a higher FEC coding rate r = k/n. Figure 8 shows P_FEC,suc obtained from the model and from the simulations for different link layer channel conditions. For both the good channel condition with P_e,PDU = 0.015 and the bad channel condition with P_e,PDU = 0.1, the model provides a sufficiently accurate estimate of P_FEC,suc. In Table II, we compare P_r obtained from the model and from the simulations. Note that P_r is defined only if P_FEC,suc is less than 1. We can see that the recursive model provides an accurate estimate of P_r.

C. Channel Distortion Model

In this subsection, we model the video distortion that occurs during video transmission and the consequent error propagation in the decoded sequence. While all the prior works, e.g., [9],

Fig. 8. Verification of packet loss model for FEC success rate.

Fig. 9. Verification of channel distortion model.

TABLE II
VERIFICATION OF PACKET LOSS MODEL FOR RESIDUAL PACKET LOSS RATE

                   QP    Model     Experiment
  Pe,PDU = 0.015   29    0.0649    0.0719
                   30    0.1606    0.1474
  Pe,PDU = 0.1     29    0.1724    0.1942
                   30    0.2104    0.2127
                   31    0.2928    0.2782
                   32    0.3240    0.3368

[13], [14], [12], consider error propagation due to temporal prediction, most of them do not take into account intra prediction and deblocking filtering, two new features of the latest H.264 video coding standard. In our previous work [4], [5], we developed a quite accurate model that relates the channel distortion to the GoP length (denoted N), the intra-block rate β = β_FI + β_I, and the packet loss rate (denoted P). The model relates the average channel distortion in a P-frame to that of the previous frame in the same GoP by the recursion:

D_c,n(QP, β, P) = P D_ECP,n(QP) + α(β, P) D_c,n−1(QP, β, P),  n = 1, 2, . . . , N − 1,   (14)

with α(β, P) = (1 − P)(1 − β)(a + (1 − β)b) + P h,

where D_c,n(QP, β, P) denotes the expected distortion per pixel in frame n, and D_ECP,n(QP) denotes the average concealment distortion per pixel for lost MBs in frame n in the absence of any channel-induced distortion in previous frames. The value of D_ECP,n(QP) is sequence dependent but can be measured directly at the encoding stage if the decoder error concealment algorithm is known. In our simulation, we determine D_ECP,n(QP) by setting frame n alone as lost and running the assumed decoder error concealment method. The constants a, b, and h are model parameters that range between 0 and 1 and are estimated following [4], [5]. Using (14), we can easily compute the expected channel distortion in each frame within a GoP and derive the average channel distortion over the GoP, denoted D_c,GoP(QP, β, P).

In the context of MBMS, if a FEC block is decoded successfully, the GoP carried by this block is reconstructed with D_c = 0; if a FEC block fails to decode, the packet loss rate of this GoP is the residual packet loss rate P_r. Hence, the average channel distortion of a video sequence (denoted D_c) can be expressed as:

D_c(QP, β_FI; P_e) = (1 − P_FEC,suc) D_c,GoP(QP, β_FI + β_I, P_r),   (15)

where P_e collectively denotes the channel condition; in our case, P_e includes the PDU loss rate P_e,PDU and the average burst length of PDU losses. We explicitly use QP, β_FI, and P_e as parameters of the channel distortion model. Given P_e, P_FEC,suc and P_r can be obtained from the packet loss model described in Section III-B.

In Figure 9, we show the channel distortion D_c obtained by the model and by simulation for different β_FI and QP under different channel conditions. The modeled distortions are accurate. An interesting observation is that for different QPs, the minimum channel distortion is always achieved when β_FI = 0 and the leftover bandwidth is used for FEC coding. In prior studies on joint source and channel coding, e.g., [14], [12], the intra rate and the FEC rate can trade off against each other, and there exists an optimal point that minimizes the received channel distortion. However, under a typical MBMS setup, application layer cross-packet FEC appears to outperform intra refresh at providing error resilience. We have made the same observation for different video sequences under different bandwidth setups. Only if the bandwidth left for FEC coding is very limited and a large percentage of FEC blocks cannot be decoded successfully does intra refresh tend to be beneficial by stopping error propagation. When QP = 30, P_e,PDU = 0.1, and accordingly P_FEC,suc = 0.37 (as shown in Figure 8), the D_c achieved by increasing β_FI becomes close to that achieved by setting β_FI = 0 and allocating all leftover bandwidth to FEC coding. However, since FEC is then ineffective and the residual packet loss rate is high, D_c is high overall, which makes this a less meaningful operating point.
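A sketch of (14) and (15) follows; the per-frame concealment distortions D_ECP and the trained constants a, b, h are assumed given, and treating the leading I-frame's channel distortion as P · D_ECP,0 is our assumption, since (14) itself covers only the P-frames n = 1, . . . , N − 1:

```python
# Sketch of the channel-distortion recursion (14) and the MBMS combination
# (15). Inputs (concealment distortions, model constants) assumed measured.

def channel_distortion_gop(D_ECP, beta, P, a, b, h):
    """Average channel distortion over a GoP of N = len(D_ECP) frames."""
    alpha = (1 - P) * (1 - beta) * (a + (1 - beta) * b) + P * h
    Dc = [P * D_ECP[0]]                          # leading I-frame (assumed)
    for n in range(1, len(D_ECP)):
        Dc.append(P * D_ECP[n] + alpha * Dc[-1])  # Eq. (14)
    return sum(Dc) / len(Dc)

def channel_distortion_mbms(P_FEC_suc, Dc_gop_at_Pr):
    """Eq. (15): only FEC blocks that fail to decode contribute distortion."""
    return (1 - P_FEC_suc) * Dc_gop_at_Pr
```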

Fig. 10. Verification of total distortion model.

Fig. 11. Optimal operating point for joint source and channel coding.

D. Total Distortion Model

In [12], it has been shown with experimental data that the encoder-introduced source distortion and the channel-induced distortion are quite uncorrelated. The total distortion can therefore be approximated by the sum of the source distortion and the channel distortion, when distortions are measured in MSE. In the MBMS case, the total distortion D_t can be expressed as:

D_t(QP, β_FI; P_e) = D_s(QP, β_FI) + D_c(QP, β_FI; P_e).   (16)

Our simulations verify the accuracy of the total distortion model, as shown in Figure 10. We observe that given a channel condition, e.g., P_e,PDU = 0.1, different choices of (QP, β_FI) lead to quite different end-to-end distortions D_t. This motivates us to investigate the joint source and channel optimization problem.

IV. JOINT SOURCE CHANNEL OPTIMIZATION

In this section, we investigate how to jointly configure video source coding and channel coding to optimize the performance of video transmission over MBMS. We first consider a simple case with a single receiver in the multicast group, and investigate (i) how much gain can be achieved by joint source and channel optimization, and (ii) whether the proposed models are sufficiently accurate for operating point selection. We then consider the multiple receiver scenario and investigate how to choose an overall optimization criterion that deals with the heterogeneity of receiver channel conditions.

A. Single User Case

We briefly describe the joint source channel optimization problem. As discussed above, when a video is transmitted over a lossy packet network, the total distortion D_t depends on both the source distortion D_s and the channel distortion D_c. A smaller QP introduces lower D_s with a higher source rate R. Given a target bit rate, a higher R will reduce the rate allocated to FEC coding, hence channel distortion will increase.
A higher βFI can limit transmission error propagation and hence reduce Dc, but it leads to a higher R for almost the same Ds. For a particular

receiver with a given channel condition Pe, there is an optimal rate allocation between source bits and channel bits at which the total distortion is minimal. Figure 11 shows the maximum PSNR obtained from both the model and the simulation at each QP value for different PDU loss rates, with the corresponding optimal (βFI, r, PFEC,suc). We use video quality (PSNR) rather than video distortion (MSE) to present the gain from joint source and channel optimization. Consistent with the results shown in Figure 9 and Figure 10, the maximum PSNR (minimum distortion) is always achieved at βFI = 0 for every tested QP and Pe. Also, as expected, the optimal QP moves toward higher values as the channel condition worsens. Note that for a particular channel condition, the video quality differs considerably across QP values. For example, in the case of Pe,PDU = 0.1, the optimal QP is 33, and the degradation from using QP = 31 is about 8 dB; in the case of Pe,PDU = 0.015, the optimal QP is 31, and the degradation from using QP = 33 is about 2 dB. Therefore, the gain from optimizing the operating point according to the channel condition is substantial, especially when the channel condition is poor. We observe that the FEC success rates PFEC,suc corresponding to the optimal operating points are high (close to 1). A suitable bandwidth allocation should therefore assign just enough channel coding bits that most FEC blocks can be decoded correctly, use the remaining bits for source coding without any forced intra blocks, and choose QP as low as possible under the rate constraint. Furthermore, the model and the simulation identify the same optimal operating points for different channel conditions, demonstrating that our proposed models can be used for operating point selection in joint source and channel optimization.
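The single-user trade-off above can be made concrete with a small exhaustive search: spend part of the rate budget on source bits, give the remainder to FEC, and keep the (QP, βFI) pair minimizing Dt = Ds + Dc per Eq. (16). The rate and distortion functions below are illustrative placeholders, not the paper's fitted models; only the structure of the search is the point.

```python
# Exhaustive operating-point search for the single-receiver case.
# Bits not used for source coding go to FEC; we keep the (QP, beta_FI)
# pair minimizing total distortion D_t = D_s + D_c (Eq. 16).
# All rate/distortion functions are toy placeholders, NOT the fitted
# models of the paper.

def source_rate(qp, beta_fi):
    # Toy R(QP, beta_FI) in kbps: rate falls with QP and rises with
    # forced intra blocks (the same trend reported in the paper).
    return 2000.0 / qp * (1.0 + 0.5 * beta_fi)

def residual_loss(fec_rate, p_e):
    # Toy post-FEC loss probability: more FEC bits, fewer residual losses.
    return p_e / (1.0 + fec_rate / 50.0)

def total_distortion(qp, beta_fi, p_e, budget):
    fec_rate = budget - source_rate(qp, beta_fi)
    if fec_rate < 0:
        return float("inf")          # rate constraint violated
    d_s = 0.5 * qp ** 2 * (1.0 + 0.02 * beta_fi)                # toy D_s
    d_c = 4000.0 * residual_loss(fec_rate, p_e) / (1.0 + 0.5 * beta_fi)
    return d_s + d_c                 # Eq. (16): MSE terms simply add

def best_operating_point(p_e, budget):
    candidates = [(qp, b) for qp in range(20, 45) for b in (0.0, 0.05, 0.1)]
    return min(candidates, key=lambda s: total_distortion(*s, p_e, budget))

print(best_operating_point(p_e=0.1, budget=110.0))
```

Even with these toy functions, the search reproduces the qualitative behavior discussed above: the optimum sits at βFI = 0, with QP chosen just high enough to leave adequate FEC protection.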
B. Multiple User Case

Given the receiver topology and channel conditions, the optimal operating point in a multicast session depends on the overall performance criterion. In this paper, we propose and compare two multicast performance criteria, and investigate their effect on overall system performance.

1) Weighted average criterion: With this criterion, we maximize the weighted average of the video quality (in terms of PSNR) over all receivers in a multicast group. Mathematically, this can be written as:

Qopt = max_S [ sum_{k=1}^{G} W(k) Qk(S, Pe,k) ]    (17)

where G is the number of receivers in the multicast group, S collectively denotes (QP, βFI), Pe,k denotes the channel condition of receiver k, and Qk(S, Pe,k) is the individual video quality of receiver k. The weight W(k) depends on the channel condition of receiver k. In this paper, we set W(k) as

W(k) = { 1, Pe,k ≤ Pth;  0, Pe,k > Pth }    (18)

where Pth is a preset threshold. This criterion averages the individual performance over the receivers with reasonable channel conditions and ignores the receivers with very bad channel conditions.

2) Minimax degradation criterion: In this case, we minimize the maximum performance degradation due to multicast among multiple receivers, following the minimax criterion proposed in [15]. Unlike [15], our criterion requires that a receiver meet a minimum channel condition requirement in order to be served. This prevents a receiver with a very bad channel condition from causing dramatic quality degradation at the other receivers. We again use W(k) to achieve this. The minimax degradation criterion is defined as follows:

Qopt = min_S { max_k [ W(k) (Qopt,k(Pe,k) − Qk(S, Pe,k)) ] }    (19)

where Qopt,k(Pe,k) is the maximum video quality, in terms of PSNR, of the kth receiver obtainable with an operating point optimized for that receiver, and Qk(S, Pe,k) is the actual received video quality for receiver k with a chosen operating point S. Given each receiver's individual channel condition, this criterion attempts to equalize the degradation of video quality, relative to each receiver's individual optimum, across all receivers.

We simulate multicasting a video to a group of receivers, where every receiver experiences one of the four channel conditions shown in Table I in each 30-second time slot. At the start of each time slot, each receiver is assigned a new channel condition with probabilities 0.5, 0.3, 0.15, and 0.05, respectively. For example, a receiver is assigned the channel condition with Pe,PDU = 0.005 with probability 0.5 at the beginning of each time slot. We measure the actual overall channel condition in each time interval and determine the optimal operating point for it. We assume that 40 users are in the multicast group and test 60 time slots. We assume the system has perfect knowledge of each receiver's channel condition in every time slot and determines the optimal operating point in each slot dynamically based on the chosen multicast performance criterion. We set Pth = 0.3 for both criteria so that all receivers contribute to the operating point selection. Figure 12 (a) plots the received video qualities at a chosen time slot for all receivers under the two criteria. We can

Fig. 12. Effect of different criteria on video quality: (a) video quality for different users in a particular time slot (average PSNR: minimax degradation 32.52 dB, weighted average 32.86 dB); (b) video quality for a particular user over the 60 time slots (average PSNR: minimax degradation 32.77 dB, weighted average 33.42 dB).

observe that with the minimax degradation criterion, the individual video qualities of the different receivers are consistent with one another regardless of the channel conditions the receivers are experiencing. With the weighted average criterion, in contrast, the variance of video quality across receivers is much larger. The average PSNR over all receivers under the weighted average criterion is only slightly higher than under the minimax degradation criterion. As shown in Figure 12 (b), the individual video quality is also more stable over the entire streaming time under the minimax degradation criterion than under the weighted average criterion. Given that the minimax criterion yields much less variation of video quality among receivers and over time, at only a slightly lower average PSNR, we argue that this criterion is more appropriate for practical multicast system design.

V. RELATED WORK

A large body of work addresses video streaming in wireless environments. In [16], Wang and Zhu reviewed solutions for error resilience and concealment in video transmission over lossy networks; most of them can be applied to wireless video transmission to combat packet losses. In [17], Etoh and Yoshimura reviewed more recent solutions for wireless video, most of which focus on video delivery over cellular networks. While their work mainly targets the unicast scenario, several of the presented schemes can also be extended to multicast. Cross-layer approaches for wireless video transmission have been reviewed in [18]. For example, to overcome packet losses in wireless video, solutions targeted at different network layers have been proposed in [19], including selection of an appropriate physical layer mode, MAC layer retransmission (unicast only), packet size optimization, and scalable coding.
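As a concrete illustration of the two criteria proposed in Section IV-B, the sketch below evaluates both selection rules over a small candidate set. The per-receiver quality function Qk is a made-up stand-in (not the paper's fitted model), and the candidate operating points and loss rates are likewise illustrative.

```python
# Toy illustration of the weighted average criterion (Eqs. 17-18) and the
# minimax degradation criterion (Eq. 19). quality() is a made-up stand-in
# for Q_k(S, P_e,k); it is NOT the paper's fitted model.

P_TH = 0.3  # loss-rate threshold P_th of Eq. (18)

def weight(p_e):
    # Eq. (18): ignore receivers whose channel is worse than the threshold.
    return 1.0 if p_e <= P_TH else 0.0

def quality(s, p_e):
    # Toy Q_k(S, P_e,k) in dB: a low QP helps clean channels, while a
    # higher QP (leaving more bits for FEC) helps lossy ones.
    qp, beta_fi = s
    return (45.0 - 0.35 * qp
            - 300.0 * p_e / (1.0 + 0.25 * (qp - 20)) / (1.0 + beta_fi)
            - 8.0 * beta_fi)

def weighted_average_choice(candidates, losses):
    # Eq. (17): maximize the weighted sum of per-receiver qualities.
    return max(candidates,
               key=lambda s: sum(weight(p) * quality(s, p) for p in losses))

def minimax_choice(candidates, losses):
    # Eq. (19): minimize the worst weighted degradation from each
    # receiver's individually optimal quality Q_opt,k.
    q_opt = {p: max(quality(s, p) for s in candidates) for p in losses}
    return min(candidates,
               key=lambda s: max(weight(p) * (q_opt[p] - quality(s, p))
                                 for p in losses))

candidates = [(qp, b) for qp in (28, 31, 33, 36) for b in (0.0, 0.1)]
losses = [0.005, 0.015, 0.05, 0.1, 0.4]   # last receiver exceeds P_th

print(weighted_average_choice(candidates, losses))  # favors good channels
print(minimax_choice(candidates, losses))           # equalizes degradation
```

With these toy numbers, the weighted average criterion picks the operating point preferred by the (majority) good-channel receivers, while the minimax criterion shifts toward a compromise point, mirroring the behavior observed in Figure 12; the receiver with loss rate 0.4 is gated out by W(k) under both criteria.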
Scalable video coding has become a promising candidate in video multicast for dealing with the heterogeneity of receiver


channel conditions; see, e.g., [20], [21], [22], [23], [24] and the references therein. Typically, scalable video coding and unequal error protection are jointly designed to maximize system performance under a given overall performance criterion. More recently, attention has shifted from video transmission over a wireless last hop to video transmission over multihop wireless networks, e.g., [25], [26] and the references therein.

Video transmission over MBMS largely limits system design flexibility: the design is restricted to the application layer, only single-layer H.264 is supported, and feedback from receiver to sender is disabled. In this paper, instead of proposing new frameworks or schemes, we model the system behaviors of video transmission over MBMS, which we believe benefits both the design and the operation of an MBMS video streaming system. We also consider a practical joint source and channel optimization problem within the MBMS framework. The work most closely related to ours is [27], in which the authors implement an emulator for video transmission over MBMS from a system deployment perspective. Through extensive experiments, they investigate how different system parameters, including video coding and packetization, affect system performance. In our work, we instead study the MBMS system analytically. Some observations made in our work are consistent with their experimental study; for example, the experimental results in [27] also show that in MBMS, application-layer cross-packet FEC is more efficient for error resilience than intra refresh. As mentioned above, video modeling, i.e., modeling of source rate, source distortion, and channel distortion, has been widely studied in recent years, including [8], [9], [10], [11], [12], [13], [14], [4], [5].
In this paper, we explicitly consider the new features of H.264, e.g., forced intra-coded MBs and the deblocking filter [5]. We relate the source rate/distortion and the channel distortion explicitly to the configurable operating parameters, namely QP and βFI, and to the link layer channel conditions Pe of the receivers. The packet loss model accurately reflects the amplification of channel degradation in MBMS. The recursive approach used in packet loss modeling can be applied to any wireless network in which application-layer FEC is adopted and the underlying PDU/frame losses are dependent.

VI. CONCLUSION

In this paper, we model the system behaviors of H.264 video streaming over MBMS and show that the proposed models can be used in system design and operation. Via simulations that largely follow the MBMS specification, we verify the proposed models. We investigate a joint source and channel optimization problem and find that choosing a suitable operating point for the bandwidth allocation between source bits and channel bits is essential. We derive a general allocation rule: set βFI = 0, guarantee a high FEC success rate PFEC,suc, and then select QP as small as possible for a low Ds. Two multicast performance criteria are proposed and compared. We find that the minimax degradation criterion

can yield more consistent video quality among all receivers, and for the same mobile receiver at different times, at only a slight reduction in average video quality (in terms of PSNR).

REFERENCES

[1] 3GPP TS 22.146, "Multimedia Broadcast/Multicast Service (MBMS); Stage 1 (Release 6)," 2003.
[2] ITU-T Rec. H.264, "Advanced video coding for generic audiovisual services," 2003.
[3] 3GPP TS, version 6.3.0, "Universal Mobile Telecommunications System (UMTS); Multimedia Broadcast/Multicast Service (MBMS); Protocols and codecs," 2005.
[4] Y. Wang, Z. Wu, J. M. Boyce, and X. Lu, "Modelling of distortion caused by packet losses in video transport," in Proceedings of International Conference on Multimedia and Expo, July 2005.
[5] Y. Wang, Z. Wu, and J. M. Boyce, "Modeling of transmission-loss induced distortion in decoded video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 6, June 2006.
[6] http://www.rfc-editor.org/rfc/rfc3984.txt
[7] "Video network simulator and error masks for 3GPP services," November 2004.
[8] W. Ding and B. Liu, "Rate control of MPEG video coding and recoding by rate-quantization modeling," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 1, February 1996.
[9] T. Chiang and Y.-Q. Zhang, "A new rate control scheme using quadratic rate distortion model," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 1, February 1997.
[10] J. Ribas-Corbera and S. Lei, "Rate control in DCT video coding for low-delay communications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 1, February 1999.
[11] Y. K. Kim, Z. He, and S. Mitra, "A novel linear source model and a unified rate control algorithm for H.263/MPEG-2/MPEG-4," in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, May 2001.
[12] Z. He, J. Cai, and C.-W.
Chen, "Joint source channel rate-distortion analysis for adaptive mode selection and rate," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 6, June 2002.
[13] R. Zhang, S. L. Regunathan, and K. Rose, "Video coding with optimal inter/intra-mode switching for packet loss resilience," IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, June 2000.
[14] K. Stuhlmüller, N. Färber, M. Link, and B. Girod, "Analysis of video transmission over lossy channels," IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, June 2000.
[15] L. Qian and D. L. Jones, "Minimax disappointment criterion for video broadcasting," in Proceedings of International Conference on Image Processing, October 2001.
[16] Y. Wang and Q.-F. Zhu, "Error control and concealment for video communication: A review," Proceedings of the IEEE, vol. 86, no. 5, May 1998.
[17] M. Etoh and T. Yoshimura, "Advances in wireless video delivery," Proceedings of the IEEE, vol. 93, no. 1, January 2005.
[18] M. van der Schaar and S. Shankar, "Cross-layer wireless multimedia transmission: challenges, principles, and new paradigms," IEEE Wireless Communications Magazine, vol. 12, no. 4, August 2005.
[19] M. van der Schaar, S. Krishnamachari, S. Choi, and X. Xu, "Adaptive cross-layer protection strategies for robust scalable video transmission over 802.11 WLANs," IEEE Journal on Selected Areas in Communications, vol. 21, no. 10, December 2003.
[20] H. M. Smith, M. W. Mutka, and E. Torng, "Bandwidth allocation for layered multicast video," in Proceedings of International Conference on Multimedia Computing and Systems, June 1999.
[21] M. van der Schaar and H. Radha, "Unequal packet loss resilience for fine-granular scalability video," IEEE Transactions on Multimedia, vol. 3, no. 4, December 2001.
[22] D. Wu, Y. T. Hou, and Y.-Q. Zhang, "Scalable video coding and transport over broad-band wireless networks," Proceedings of the IEEE, vol. 89, no. 1, January 2001.
[23] Z.
Liu, Z. Wu, H. Liu, M. Wu, and A. Stein, “A layered hybrid-ARQ scheme for scalable video multicast over wireless networks,” in Proceedings of Asilomar Conference on Signals, Systems and Computers, November 2007. [24] S. Deb, S. Jaiswal, and K. Nagaraj, “Real-time video multicast in WiMAX networks,” in Proceedings of IEEE INFOCOM, April 2008.


[25] S. Mao, S. Lin, Y. Wang, and S. S. Panwar, "Video transport over ad hoc networks: multistream coding with multipath transport," IEEE Journal on Selected Areas in Communications, vol. 12, no. 4, August 2005.
[26] X. Zhu, P. Agrawal, J. P. Singh, T. Alpcan, and B. Girod, "Rate allocation for multi-user video streaming over heterogeneous access networks," in Proceedings of ACM Multimedia, September 2007.
[27] J. Afzal, T. Stockhammer, T. Gasiba, and W. Xu, "Video streaming over MBMS: A system design approach," Journal of Multimedia, vol. 1, no. 5, August 2006.
