

LOW-DELAY ROBUST VIDEO MULTICAST

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Zhi Li

March 2012

© 2012 by Zhi Li. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. http://creativecommons.org/licenses/by-nc-nd/3.0/us/

This dissertation is online at: http://purl.stanford.edu/ks814ss8141


I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Bernd Girod, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Fouad Tobagi, Co-Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Ali C. Begen

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

Video multicast offers a scalable solution for large-scale television distribution. For commercial-grade video multicast services such as Internet Protocol Television (IPTV), it is often challenging to meet a set of conflicting design goals, including high video quality, low request response time and good system scalability. This dissertation addresses these issues by developing a decentralized peer-assistance architecture and integrating it with streaming-specific source and channel coding techniques to deliver multicast video in a timely, reliable and scalable way.

Video quality is negatively affected by network packet loss. An existing framework to simultaneously address packet loss and request response delay is to combine multicast with a unicast service that provides both packet loss repair and fast stream startup functionalities. Typically, servers that provide the unicast service could become the system bottleneck. To mitigate this problem, we propose an alternative distributed solution to partially shift the burden of the unicast servers to receivers of the multicast. Using a Peer-Assisted Repair (PAR) protocol, we demonstrate that packet repairs can be delivered reliably on time using a combination of server-peer coordination and redundant repairs. PAR can be naturally extended to a Peer-Assisted Startup (PAS) protocol to facilitate fast stream startup, where a missing portion of the multicast stream is treated as a long erasure burst to be repaired. In these applications, we show that video transcoding can be seamlessly integrated to provide improved robustness to packet loss, reduced stream startup latency and improved system scalability.

Erasure correcting coding plays an essential role in the peer-assistance architecture. To understand how to mitigate the delay it adds to the stream startup process, we explore the design of erasure correcting codes that take into account a decoding delay constraint. Modeling the peer-assisted unicast service as communication over a set of parallel links, we propose a practical construction of a parallel-link block erasure correcting code, where each source packet is decoded at the sink on-the-fly with a strict decoding deadline. The proposed code handles two types of common errors in the upper layers of packet-switched networks – bursty packet loss and link outages. We further show that the proposed code construction is delay-optimal among all codes that achieve the Singleton bound.

The proposed decentralized architecture requires in-network video transcoding at the multicast peer receivers, where it is often desirable to use a scheme with low complexity, such as open-loop requantization-based transcoding. A known drawback of this scheme is the requantization distortion drift problem. We develop an analytical model to capture the distortion drift effect and show that this drawback can be largely mitigated through parameter optimization. We demonstrate its effectiveness in the transcoded-to-primary stream switching problem in fast stream startup.


Acknowledgments

Throughout my four and a half years of graduate study at Stanford, Professor Bernd Girod constantly guided me with his fierce intuition, breadth of knowledge and spirit of practicality. His vision, passion and entrepreneurial thinking will keep on inspiring me for many years on the road.

Professor Fouad Tobagi was very helpful, especially during the preparation of my defense and thesis. I enjoyed the many casual conversations with him, where I learned many historical aspects of computer networking.

I was fortunate to have Dr Ali Begen as a great mentor outside the academic world, who constantly steered my interest towards problems that were practically relevant.

Xiaoqing has always been a big sister to me. I want to thank her for the time and effort spent on correcting my papers, for being frank in expressing opinions and for being a good listener when I felt puzzled. I enjoyed the great time spent with the other IVMSers – David Varodayan, Aditya, Jeonghun, Yao-Chung, Vijay, Mina, Derek, Frank, Maryam, Andre, Hari, Ngai-Man, Keiichi, David Chen, Sam, Gabe, Sherif, Chuo-Ling, Pierpaolo and Markus. Every outing with you was a fun experience. Together we have created a unique group that fosters great openness and collaboration.

My graduate program would not have been a unique experience without the exceptional courses and seminars offered on the Stanford campus. My special thanks go to Professors Tom Cover, Stephen Boyd, Andrea Montanari, Andrew Ng and Tsachy Weissman, whose teaching strongly shaped my thought process. I enjoyed many recreational physical education courses, including Ann Gould’s tennis classes and Zora Neuhold-Huber’s swimming class. I also want to thank the speakers of a good number of talks, seminars and colloquia on technology and entrepreneurship that I attended.

I feel grateful for always having a group of friends sticking around. I was fortunate to have Daryl as a wonderful roommate along the years. I truly appreciated the little projects with Kai and Yuankai. It was a great learning experience, and may be a good point of departure for the future. I enjoyed the many dinner conversations with Jinghua and Bangpeng about almost everything. My friends at Stanford and in the Bay Area, Tianshi, Jia, Yuchao, Zheng, Yuxin, Chen, Yijie, Xiaoyu, Grace, Zhe, Qiuhua, Tao, Jie, Jianyin, Tom, Lei, Russ, Terry and many others shared great experiences with me along the years.

I spent the summer of ’09 at Microsoft Research Asia, when I rediscovered Beijing – a lovely city with rich culture and great people. Dr Feng Wu was a wonderful host, who gave me maximum flexibility in choosing research topics. I owe special thanks to Dr John Wright, who taught me the theory and practice of compressed sensing within a week! My summer of ’10 at Cisco was an equally fulfilling experience. Collaborating with the Boxborough folks was truly rewarding. Special thanks go to Ali, Dave, Kapil and Josh.

My collaboration with Dr Ashish Khisti (who was first at Deutsche Telekom Labs, then at University of Toronto) was lots of fun. He is such an ingenious researcher to work with.

Qibin was an adviser of my master’s thesis, and continued to be a close friend throughout the years. He is a resourceful person that I can always count on.

Last but not least, I would like to express my greatest gratitude to my family – Mom, Dad, Ray and Hu – for always being there with their unreserved support!


List of Abbreviations

Abbreviation    Explanation
ARQ             Automatic Retransmission reQuest
AVC             Advanced Video Coding
CDF             Cumulative Distribution Function
DSL             Digital Subscriber Line
FEC             Forward Error Correction
GOP             Group of Pictures
IGMP            Internet Group Management Protocol
IPTV            Internet Protocol Television
JM              H.264/AVC Joint Test Model
MDS             Maximum-Distance-Separable
MSE             Mean-Squared Error
PAR             Peer-Assisted Repair
PAR-CT          PAR with Centralized Tracking
PAR-DT          PAR with Distributed Tracking
PAS             Peer-Assisted Startup
PLR             Packet Loss Rate
PSNR            Peak Signal-to-Noise Ratio
QP              Quantization Parameter
RAP             Random Access Point
SAR             Server-Assisted Repair
SAS             Server-Assisted Startup
SLEP            Systematic Lossy Error Protection
SLEPr           Retransmitted SLEP
SRM             Scalable Reliable Multicast
SSM             Source-Specific Multicast
STB             Set-Top Box
TCP             Transmission Control Protocol
UDP             User Datagram Protocol
VOD             Video-on-Demand

Contents

Abstract
Acknowledgments
List of Abbreviations

1 Introduction

2 Background
  2.1 Multicast
    2.1.1 Reliable Multicast
    2.1.2 Peer-to-Peer (P2P) and Peer-Assistance Architectures
    2.1.3 Fast Stream Startup Architectures
    2.1.4 Network Coding
  2.2 Video Coding
    2.2.1 Hybrid Video Coding Architecture
    2.2.2 Distortion Models for Hybrid Video Coding
    2.2.3 Video Transcoding
  2.3 Error-Resilient Video Transmission
    2.3.1 Robust Video Coding
    2.3.2 Error Correcting Coding
      2.3.2.1 Low-Delay Burst Erasure Codes
      2.3.2.2 Hybrid Automatic Repeat reQuest (H-ARQ)
    2.3.3 Systematic Lossy Error Protection (SLEP)

3 Peer-Assisted Packet Loss Repair
  3.1 Peer-Assisted Repair (PAR)
    3.1.1 Redundancy of Repairs
    3.1.2 PAR with Centralized Tracking (PAR-CT)
    3.1.3 PAR with Distributed Tracking (PAR-DT)
  3.2 Cross-Layer Design with SLEP/SLEPr
    3.2.1 SLEP Packet Generation and Decoding
    3.2.2 SLEP/SLEPr Procedure
    3.2.3 Joint PAR-SLEP/SLEPr Error Protection
  3.3 Analysis
    3.3.1 Setup
    3.3.2 Model for Packet Repair Protocols
      3.3.2.1 Server-Assisted Repair (SAR)
      3.3.2.2 Peer-Assisted Repair (PAR)
    3.3.3 Numerical Results
  3.4 Simulation Results
    3.4.1 Simulation Setup
    3.4.2 Effects of Link Bitrates
      3.4.2.1 Unicast Server Bitrate
      3.4.2.2 Peer Uplink Bitrate
      3.4.2.3 Peer Downlink Bitrate
    3.4.3 Effect of Repair Redundancy
    3.4.4 Effect of Correlated Loss
    3.4.5 End-to-End Latency
    3.4.6 Joint PAR-SLEP/SLEPr Error Protection

4 Peer-Assisted Fast Stream Startup
  4.1 Stream Startup Procedures
    4.1.1 Server-Assisted Startup (SAS)
    4.1.2 Peer-Assisted Startup (PAS)
  4.2 Analysis
    4.2.1 Setup
    4.2.2 Server-Assisted Startup (SAS)
      4.2.2.1 Modeling of SAS Procedure
      4.2.2.2 Performance Measurements
      4.2.2.3 Optimization of RAP and $I_J$
    4.2.3 Peer-Assisted Startup (PAS)
    4.2.4 Numerical Results
  4.3 Simulation Results
    4.3.1 Stream Startup Parameter Optimization
    4.3.2 Number of Multicast Receivers
    4.3.3 Effect of Request Arrival Rate
    4.3.4 Effect of Video Transcoding

5 Low-Delay Burst Erasure Codes
  5.1 Problem Setup
  5.2 Main Result
  5.3 Code Construction
    5.3.1 Single-Link Burst Erasure Code
    5.3.2 Parallel-Link Burst Erasure Code
      5.3.2.1 Encoding
      5.3.2.2 Decoding
    5.3.3 Complexity
      5.3.3.1 Finite Field Size
      5.3.3.2 Encoding Complexity
      5.3.3.3 Decoding Complexity
  5.4 Extension to Heterogeneous Parallel Links
    5.4.1 Parallel Links with Heterogeneous Capacity
    5.4.2 Parallel Links with Heterogeneous Link Delay
  5.5 Application to Fast Stream Startup

6 Distortion-Drift-Aware Video Transcoding
  6.1 Problem Model
  6.2 Analysis of Transcoding Error Process
    6.2.1 Intra-Frame Requantization Distortion
      6.2.1.1 Rate-Distortion Model
      6.2.1.2 QP-PSNR Model
    6.2.2 Inter-Frame Requantization Drift
      6.2.2.1 I-Frame
      6.2.2.2 P-Frame
      6.2.2.3 B-Frame
      6.2.2.4 Power Transfer Matrix
    6.2.3 Numerical Results
    6.2.4 Complexity of Parameter Estimation
  6.3 Transcoded-to-Primary Stream Switching

7 Conclusions
  7.1 Applications
  7.2 Lessons Learned
  7.3 Open Problems

A Converse Proofs
  A.1 Converse of Theorem 1
  A.2 Converse of Theorem 5

B Requantization Penalty

Bibliography

List of Tables

3.1 Notations used in Chapter 3.
3.2 Simulation parameters.
4.1 Notations used in Chapter 4.
4.2 Simulation parameters.
5.1 Notations used in Chapter 5.
6.1 Notations used in Chapter 6.
A.1 Illustration of the erasure patterns used in proving the converse. denotes a symbol erasure.

List of Figures

1.1 Illustration of a multicast system combining multicast with unicast retransmissions for packet loss repair and fast stream startup.

3.1 Packet loss repair protocols illustrated in simplified network topology. (a) Server-Assisted Repair (SAR), (b) Peer-Assisted Repair with Centralized Tracking (PAR-CT) and (c) Peer-Assisted Repair with Distributed Tracking (PAR-DT). Only the retransmissions are depicted. The lightning sign denotes a bursty erasure not correctable by FEC.

3.2 Generation and transmission of coded redundant repair packets from peers in the co-multicast set $C_{\Omega \setminus n}$ to the requesting Peer $n$. In this illustration, each repair packet is generated by a different peer.

3.3 SLEP parity packet generation (a) and decoding (b).

3.4 Generation and retransmission of coded redundant repair packets from peers in the co-multicast set $|C_{\Omega \setminus n}|$ to the requesting Peer $n$ – joint PAR-SLEP/SLEPr error protection case. In this illustration, each repair packet is generated by a different peer.

3.5 Analysis: expected unicast server bitrate $E\{R_S\}$ as a function of the number of supported peers $\tilde{N}$, for (1) SAR, (2) PAR-CT and (3) PAR-DT. The redundancy $\mu$ is set to 1.

3.6 Analysis: probability of repair failure event $P(A_F)$ as a function of peer departure rate $\gamma_{\mathrm{DEPT}}$, for repair redundancy $\mu = 1, 2, 3$ and peer uplink bitrate $\tilde{R}_{PU} = 0, 1, 2$ (PAR-CT and PAR-DT, uncoded case).

3.7 Analysis: probability of repair failure event $P(A_F)$ as a function of the peer departure rate $\gamma_{\mathrm{DEPT}}$, for (a) uncoded case, (b) separately coded case and (c) jointly coded case. $\mu$ is set to 2; $\beta$ is set to 0.5; $\alpha_{\mathrm{FWD}}$ and $\alpha_{\mathrm{RET}}$ are set at 5% and 5%, respectively.

3.8 Analysis: probability of repair failure event $P(A_F)$ as a function of the peer departure rate $\gamma_{\mathrm{DEPT}}$, for (a) $\beta = 1$, (b) $\beta = 1/2$ and (c) $\beta = 1/3$. We set $\mu\beta = \tilde{R}_{PD} = 1$. The packets are jointly coded with $\alpha_{\mathrm{FWD}}$ and $\alpha_{\mathrm{RET}}$ set at 5% and 5%, respectively.

3.9 Simulation: unicast server bitrate $R_S$, for a post-repair PLR of 3e-4, as a function of the number of supported peers $\tilde{N}$, for (1) SAR, (2) PAR-CT and (3) PAR-DT. Video sequence: Soccer.

3.10 Simulation: probability of repair failure event $P(A_F)$ as a function of the normalized peer uplink bitrate $\tilde{R}_{PU}$, for (a) PAR-CT and (b) PAR-DT with request redundancy $\mu = 1, 2, 3$, uncoded case. The results are measured at $\gamma_{\mathrm{DEPT}} = 0.1$. Video sequence: Soccer.

3.11 Simulation: probability of repair failure event $P(A_F)$ as a function of the normalized peer downlink bitrate $\tilde{R}_{PD}$, for (a) PAR-CT and (b) PAR-DT with redundancy $\mu = 1, 2, 3$, uncoded case. The results are measured at $\gamma_{\mathrm{DEPT}} = 0.1$. Video sequence: Soccer.

3.12 Simulation: probability of repair failure event $P(A_F)$ as a function of unicast server bitrate $R_S$ for (1) PAR-CT and (2) PAR-DT at different redundancy and uncoded/coded cases. The results are measured at $\gamma_{\mathrm{DEPT}} = 0.1$. Video sequence: Soccer.

3.13 Simulation: probability of repair failure event $P(A_F)$ as a function of unicast server bitrate $R_S$ at different percentages of correlated peers. The results are measured at $\gamma_{\mathrm{DEPT}} = 0.1$. Video sequence: Soccer.

3.14 Simulation: probability of repair failure event $P(A_F)$ as a function of retransmission delay $D_{\mathrm{RET}}$ for (1) SAR, (2) PAR-CT and (3) PAR-DT. The unicast server bitrate is set such that at an excessive delay of $D_{\mathrm{RET}} = 0.6$ sec, all schemes experience no packet repair failure within 100 runs of experiments.

3.15 Simulation: CDF of frame PSNR over 100 peers at peer departure rate $\gamma_{\mathrm{DEPT}} = 0.1$ for PAR-CT and PAR-DT in the uncoded, separately coded and SLEP/SLEPr jointly coded settings. The FEC budget and retransmission budget are set at 5% and 5%, respectively. $T_{\mathrm{BURST}}$ is set to 8 ms. Video sequence: (a) Soccer, (b) City and (c) Crew.

3.16 Simulation: CDF of frame PSNR over 100 peers at peer departure rate $\gamma_{\mathrm{DEPT}} = 0.1$ for PAR-CT and PAR-DT in the uncoded, separately coded and SLEP/SLEPr jointly coded settings. The FEC budget and retransmission budget are set at 5% and 5%, respectively. $T_{\mathrm{BURST}}$ is set to 8 ms. Video sequence: (d) Harbour, (e) Ice and (f) Spincal.

3.17 Simulation: CDF of frame PSNR over 100 peers at peer departure rate $\gamma_{\mathrm{DEPT}} = 0.15$ for PAR-CT and PAR-DT for different combinations of redundancy $\mu$ and SLEP compression ratio $\beta$: (i) $\beta = 0.5$, $\mu = 2$, (ii) $\beta = 1$, $\mu = 1$ and (iii) $\beta = 1$, $\mu = 2$. The repair bitrate is set to be 10% of the primary video stream bitrate. Video sequence: (a) Soccer, (b) City, and (c) Crew.

3.18 Simulation: CDF of frame PSNR over 100 peers at peer departure rate $\gamma_{\mathrm{DEPT}} = 0.15$ for PAR-CT and PAR-DT for different combinations of redundancy $\mu$ and SLEP compression ratio $\beta$: (i) $\beta = 0.5$, $\mu = 2$, (ii) $\beta = 1$, $\mu = 1$ and (iii) $\beta = 1$, $\mu = 2$. The repair bitrate is set to be 10% of the primary video stream bitrate. Video sequence: (d) Harbour, (e) Ice and (f) Spincal.

3.19 Simulation: frame PSNR for 100 peers at peer departure rate $\gamma_{\mathrm{DEPT}} = 0.15$ for different combinations of redundancy $\mu$ and stream compression ratio $\beta$: (i) $\beta = 0.5$, $\mu = 2$, (ii) $\beta = 1$, $\mu = 1$ and (iii) $\beta = 1$, $\mu = 2$. (a) PSNR trace for peer 25, PAR-CT, (b) PSNR trace for peer 65, PAR-DT, (c) PSNR of frame 20 for all peers, PAR-CT, (d) PSNR of frame 20 for all peers, PAR-DT. Video sequence: Soccer.

4.1 Procedure of Server-Assisted Startup (SAS).

4.2 Procedure of Peer-Assisted Startup (PAS).

4.3 Video data received by a peer receiver over time for (a) SAS, Case I and (b) SAS, Case II. For the diagram displayed, we assume the video is not transcoded. Abbreviations: $B_{\mathrm{CAT}}$ – data size the unicast needs to catch up; $T_{SS}$ – stream startup latency; $I_J$ – multicast join instant; $T_J$ – multicast join delay; $T_U$ – unicast duration; $T_C$ – block erasure coding delay; $B_B$ – decoder buffering data size.

4.4 Analysis: startup latency $T_{SS}$, unicast data size $\mu_U \beta B_U$ and unicast duration $T_U$ as a function of the multicast stream’s relative position within a RAP interval when the receiver initiates a stream startup request, at e-factor $e = 0.2$, 0.3 and 0.4. $\beta = 1$.

4.5 Analysis: startup latency $T_{SS}$, unicast data size $\mu_U \beta B_U$ and unicast duration $T_U$ as a function of the transcoding compression ratio $\beta$, at e-factor $e = 0.2$, 0.3 and 0.4. The relative position within a RAP interval is 0.8.

4.6 Analysis: expected retransmission bitrate $E\{R_S\}$ as a function of (a) stream startup request arrival rate $\lambda_{SS}$, and (b) number of current peer receivers $\tilde{N}$. $M = 6$.

4.7 Simulated traces of video packets received by a peer, to demonstrate the selection of the multicast join time $I_J$ that optimizes the unicast data size. $\mu_U = 1$ and no decoding delay is involved. In this experiment, $\beta = 1$; the receiver sends the stream startup request at time 0.165. (a) demonstrates the case where optimization is applied; (b) demonstrates a simple strategy where the unicast terminates when it catches up with the multicast. In this illustration, no decoding delay is involved, otherwise the last unicast packet has to be received earlier. Abbreviations: MCast – multicast, UCast RS – unicast sent from the unicast server, UC Req – request for unicast stream startup, MC Join – send multicast join request.

4.8 Simulated traces of video packets received by a peer, to demonstrate the selection of the unicast startup RAP that optimizes the startup delay. $\mu_U = 1$ and no decoding delay is involved. In this experiment, $\beta = 0.5$; the receiver joins at time 1.09. (a) demonstrates the case where optimization is applied; (b) demonstrates a simple strategy where the unicast terminates when it catches up with the multicast. Abbreviations: MCast – multicast, UCast RS – unicast sent from the unicast server, UC Req – request for unicast stream startup, MC Join – send multicast join request.

4.9 Simulation: (a) Unicast data size/duration (mean and standard deviation) from the unicast server and the peers, as a function of the number of current peers. (b) CDF of server unicast data size/duration for SAS (i.e., analogous to the number of current peers being 0), PAS with number of current peers 5 and 15. The number of requesting peers is 10.

4.10 Simulation: unicast server unicast data size/duration (mean and standard deviation) as a function of the stream startup request arrival rate $\lambda_{SS}$, for SAS, and PAS with number of current peers 10 and 20.

4.11 Simulation: stream startup latency $T_{SS}$ and the unicast server unicast data size/duration (mean and standard deviation) as a function of the transcoding compression ratio $\beta$. The number of current peers and requesting peers are both 15. We set the unicast server bitrate to be 70% of the sufficient to introduce modest contention among the receivers.

5.1 Numerical comparison of the rate-delay bound for delay-optimal codes and MDS codes. In this plot, we fix $L = 2$ and $Z = 3$.

5.2 Simulation: CDFs of stream startup latency $T_{SS}$ under proposed transcoding and erasure correcting coding schemes. Both maximum burst durations of 50 ms and 100 ms are examined. Abbreviations: LD – low-delay codes, MDS – maximum distance separable codes. $\beta$ is the transcoding compression ratio.

6.1 Block diagram of the video encoder (with embedded decoder) and the open-loop requantization-based transcoder.

6.2 Simplified codec structure for analysis.

6.3 Frame MSE after transcoding versus frame size for the first 15 frames of Soccer, encoded using IBBBP structure. The model is fitted using 4 out of the 23 simulation points (the points lying off the convex hull are avoided). The triangle represents the operating point of the primary video (QP 25).

6.4 Frame PSNR after transcoding versus the transcoding QP for the first 15 frames of Soccer, encoded using IBBBP structure. The model is fitted using 4 out of the 23 simulation points (except those with QP difference between primary quantization and requantization equal to 0 and 1). The triangle represents the operating point of the primary video.

6.5 The power transfer factor $\alpha[m, n]$ as a function of the power transfer step $m$ (only over I and P frames within a GOP). The GOP of 33 frames from the Soccer sequence is encoded using the IBBBP structure. The models are fitted with data points obtained by transcoding the starting frame while keeping the other frames intact and recording the MSE of the subsequent propagated frames. In this example, as the decaying behaviors of the error processes starting from a P frame are similar, we aggregate the data points to obtain one unified model for them.

6.6 Power transfer matrix obtained for a GOP of 33 frames from the Soccer sequence, encoded using IBBBP structure. The coefficients are estimated through trials of coding and model fitting of parameters.

6.7 Traces of PSNR and frame data size for a GOP of 33 frames from Soccer, encoded using IBBBP structure with different frames transcoded: (a) all Frames 1 to 33 are transcoded, (b) Frames 1 to 15 are transcoded, (c) starting from Frame 1, one out of every three frames is transcoded, (d) Frames 2 to 5 and 20 to 23 are transcoded. The primary video QP and transcoding QP are 25 and 33, respectively.

6.8 Stream composition in a rapid acquisition architecture for fast stream startup. The unicast section is colored red; the multicast section is colored green.

6.9 (a) PSNR traces and (b) the corresponding frame size allocation for different switching point locations within a GOP, for a GOP of 33 frames from Soccer, where the transcoded stream uses (a) fixed QP for each frame, and (b) optimized QP based on (6.31). The mean transcoded frame size is 93 Kb.

6.10 Visual comparison of the decoded video frame from Soccer, for transcoded-to-primary switching point $M = 33$, (a) using fixed QP (PSNR 29.1dB) and (b) using optimized QP (PSNR 39.8dB). Playout frame #33.

6.11 Mean and standard deviation of the frame PSNR as a function of the mean transcoded frame data size in the transcoded-to-primary stream switching problem, for the six test sequences, transcoded using (i) fixed QP for each frame, and (ii) optimized QP based on solving (6.31).

6.12 Simulation: CDF of frame PSNR for unicast-to-multicast stream switching with (i) optimized QP using the formulation in (6.31) of Chapter 6, and (ii) with fixed QP. The simulation results are obtained using PAS with 10 current peers and 10 requesting peers. The stream transcoding compression ratio is selected to be $\beta = 0.7$. As our main focus here is on the effect of transcoding on the video quality, in these simulations we assume that no bursty erasures occur in the received unicast/multicast stream.

B.1 Illustration of additional error introduced by requantization. Reproduced from Figure 2, [25].

B.2 Requantization effect on synthetic data: MSE vs. quantization stepsize (left) and MSE vs. entropy of quantized data (right) for source with (a) standard deviation 1.0 and (b) standard deviation 5.0. We generate $10^5$ samples according to i.i.d. Laplacian distribution, and compare two cases of interest: (i) the samples are quantized once by a coarse quantizer with stepsize $\Delta_2$ (Quant), (ii) the samples are first quantized by a fine quantizer with stepsize $\Delta_1 = 1$, then by a coarse quantizer with stepsize $\Delta_2$, where $\Delta_2 \geq \Delta_1$. The quantizers have the form $f(x) = \mathrm{sgn}(x) \left\lfloor \frac{|x|}{\Delta_i} + 0.5 \right\rfloor \Delta_i$, $i = 1, 2$.

Chapter 1 Introduction Video multicast oﬀers a scalable solution for large-scale television distribution over packet-switched networks. Multicast delivers datagrams to a group of selected receivers in a single transmission at the source and creates copies at network nodes only when required by the network topology. Since the early 1990’s, multicast has been an active research topic. Today, multicast ﬁnds many important real-world applications, ranging from Internet Protocol Television (IPTV) riding on network-layer IP multicast, to peer-to-peer video streaming or ﬁle sharing built on application-layer overlay multicast. This thesis is largely motivated by the real-world deployment of commercial-grade video streaming service over bandwidth-limited and error-prone access networks, for example, IPTV service over Digital Subscriber Line (DSL) links. In these applications, a set of conﬂicting design criteria must be met simultaneously: • Video quality. The received video must have as high a resolution as possible and should exhibit no artifacts due to lossy compression or network error. Video quality depends on the eﬃciency of the underlying video compression algorithm, network bandwidth provisioned and the system’s robustness in maintaining good video quality in the presence of network error, such as packet loss. IPTV-over-DSL provides challenges because the receiver downlink (i.e., the DSL link) is strictly bandwidth-limited and prone to impulse noise that leads to burst packet loss [32]. 1


• Response time. The latency from the moment a user initiates a request (e.g., to start a new stream, or to switch to another stream) to the moment newly decoded video is displayed on the screen should be as low as possible. A subjective study [22] has shown that in order to achieve user satisfaction, the response time must be kept below one second. In IP networks, the overall response time is the aggregate of a number of delay components, for example, the acquisition delay of waiting for the next random access point (RAP) in the multicast stream, the network buffering delay of error control mechanisms, and the decoder buffering delay of storing a whole video frame before decoding can proceed.

• Scalability. The system architecture must be able to cope with a growing number of users and supported multicast sessions (or television programs) while maintaining a reasonable cost.

The first two criteria aim to provide user satisfaction, as users have high expectations of service quality based on years of traditional broadcast or digital cable experience. The third criterion comes from the perspective of a service provider, whose primary concern is to maintain a low operational cost.

A managed network infrastructure is often required to meet these goals. Current IPTV-over-DSL deployments complement source-specific multicast (SSM) [58, 84] with an error control mechanism that combines multicast forward error correction (FEC) and unicast retransmissions to repair bursty packet loss [32]. Bursty losses are often harder to repair than randomly distributed packet losses. In this architecture, retransmitted packets are generated by unicast servers, which are usually located at the edge of the core network. The unicast servers also support fast stream startup by introducing a unicast stream to avoid the acquisition delay of the video stream [166, 67, 33].
To better understand this, note that a missing video stream at the startup moment is akin to a long burst of packet loss to be repaired. However, given its unicast nature, each server can only support a limited number of downstream set-top boxes (STBs), lest it be overwhelmed by the transmission requests for erasure burst correction and stream startup. Typically, the unicast servers become the bottleneck for the system to scale up to support a higher number of users. An illustration of such a system is shown in Figure 1.1.

[Figure 1.1 diagram labels: Multicast Source (S), Router, Unicast Server (U), Receivers (R); Multicast and Unicast paths; Packet Loss Repair; Fast Stream Startup]

Figure 1.1: Illustration of a multicast system combining multicast with unicast retransmissions for packet loss repair and fast stream startup.

In this thesis, these challenges motivate us to re-examine the problem of video multicast from different perspectives. From an architectural view, we consider an alternative distributed solution where the burden of the unicast servers is partially shifted to the multicast peer receivers. From a channel coding perspective, we explore how erasure correcting codes can be designed while taking into account the decoding delay constraint. From a source coding perspective, we explore how video transcoding can be optimized to yield good video quality while maintaining low computational cost, thus making it well suited for implementation in the decentralized architecture.

The main contributions of this thesis are summarized as follows.

• Design and analysis of PAR and PAS protocols. We propose to utilize multicast peer receivers in the packet loss repair process to lessen the burden on the unicast servers. One main challenge in using this architecture is the uncertain status of the peers. For example, a peer’s departure from the group may lead to packet repair failures. We show that this issue can be greatly alleviated by introducing redundancy in the repairs. We propose the Peer-Assisted Repair (PAR) protocol, which coordinates the server and the peers to deliver repair packets in a timely, reliable and decentralized way. We also describe how PAR can be


integrated with forward and retransmitted Systematic Lossy Error Protection (SLEP/SLEPr), a transcoding-based source-aware error protection mechanism, to provide improved robustness to bursty packet loss. We further extend PAR to a reservation-based Peer-Assisted Startup (PAS) protocol to facilitate fast stream startup, and demonstrate how transcoding of the unicast video stream improves system scalability and reduces startup latency. Analytical models and simulation results are used to analyze the performance of the PAR and PAS protocols.

• Delay-optimal burst erasure code for parallel links. Motivated by the scenario of multiple peers assisting a single receiver, we study block erasure correcting coding for a sequence of packets streamed over a set of parallel links. We consider a new model for network error correction, where each source symbol is decoded at the sink on-the-fly with a strict decoding deadline. We distinguish two error types that have very different implications for the design of the code: bursty packet loss and link outages. For a class of codes that achieve the Singleton bound, we state a theorem that characterizes the fundamental tradeoff among coding rate, decoding delay and erasure correction performance. To prove the theorem, we show the achievability via a practical code construction, and the converse via an entropy argument. We apply this code to the peer-assisted fast stream startup problem and demonstrate its effectiveness in reducing the stream startup latency.

• Distortion-drift-aware video transcoding. Open-loop transcoding based on requantization is well suited for in-network processing of a video stream for bitrate reduction. A well-known drawback of this scheme is requantization distortion drift. We develop an analytical model to capture the distortion drift effect; we then show how this problem can be largely mitigated through parameter optimization. Finally, we demonstrate its effectiveness in improving the video quality in the transcoded-to-primary stream switching problem in fast stream startup.

Although the proposed solutions in this thesis are designed with IPTV multicast as


a motivating application, we believe that they are naturally suited for a broader class of applications. Bursty error is commonly encountered in real-world scenarios. The impulse noise in DSL links is one example. In wireless communications, a prominent feature of wireless media is time-varying multipath fading, which often causes transitions between good and bad channel states [170]. As another example, in packet-switched networks, when a router is overwhelmed by traffic, it tends to drop packets in bursts (see footnote 1). Our proposed solutions have the potential to mitigate the impact of bursty error and reliably deliver media content over these networks. The proposed peer-assistance architecture is also useful in other delay-constrained environments involving multiple receivers, such as multi-party audio or video conferencing, gaming and online whiteboarding, where both delay and reliability are important to the user experience.

Footnote 1: Unless the random early detection (RED) feature is turned on. To the best of our knowledge, in practice, RED is implemented in most routers but rarely turned on due to compatibility issues.

The rest of this thesis is structured as follows. Chapter 2 reviews the related research areas of multicast architectures, video coding and video streaming techniques. Chapter 3 focuses on the aforementioned system-level solutions, where the multicast receivers are involved in the packet loss repair process. We first present the proposed PAR protocol, then describe the combination of the PAR protocol with SLEP/SLEPr to improve resistance to bursty packet loss and overcome the peer downlink bottleneck. We also present an analytical model with the aim of characterizing the performance trade-offs and the gains achieved by the hybrid PAR-SLEP/SLEPr scheme. Chapter 4 presents the PAS protocol for fast stream startup. After a description of the stream startup procedures, we analyze the various schemes and discuss the related parameter optimization. Chapter 5 focuses on burst erasure correcting codes for parallel links with a decoding delay constraint. After stating the problem model, we present the main result, which characterizes the fundamental tradeoff between the code rate and the decoding delay. We then describe the construction of the parallel-link burst erasure code and show its optimality among all Singleton-bound-achieving codes. The cases where the parallel links are heterogeneous in terms of capacity or link delay are also examined. We then apply the constructed low-delay codes to the peer-assisted fast stream startup problem and demonstrate their performance gain. Chapter 6 discusses distortion-drift-aware video transcoding and its application in fast stream startup. We first introduce a model for the open-loop requantization-based transcoding scheme and analyze the distortion drift effect. We then show how this framework can be applied to optimizing the received video quality in the transcoded-to-primary stream switching problem.

Chapter 2 Background

Our work on low-delay robust video multicast builds upon prior work in multicast architectures, video coding and video transmission techniques. In this chapter, we review the state of the art in these areas and discuss their relation to our work.

2.1 Multicast

Multicast delivers information to a group of selected receivers in a single transmission at the source and creates copies automatically at network nodes only when required by the network topology. Multicast can be built at different layers of a network. IP multicast [52, 127, 143, 138, 86, 66, 163, 129, 77, 135, 99], initially proposed by Deering [52], relies on the support of network routers to replicate messages at intermediate nodes to achieve one-to-many distribution. A content delivery network (CDN) [101, 179, 53, 137, 144, 149, 176, 161, 23, 95], on the other hand, places cache servers at various locations in a network to mirror data and serve the local receivers. CDNs are often considered a viable alternative to IP multicast [130], as they provide similar benefits but are easier to deploy. At the application layer, overlay multicast [41, 30, 175, 115, 104, 103, 209, 153] does not rely on any specific infrastructure. Data packets are replicated at the end hosts, which logically form an overlay network. Streaming of live video over peer-to-peer (P2P) networks [175, 115, 209, 206, 153] belongs to this last category. As the solutions proposed in this thesis are not bound to any particular multicast implementation, they have the potential to be applied to any viable multicast infrastructure.

2.1.1 Reliable Multicast

Reliable multicast is a concept related to IP multicast. Reliable multicast protocols have been researched for a wide spectrum of applications, ranging from media conferencing and network gaming to file transfer and news feeds. Similar to the Transmission Control Protocol (TCP), reliable multicast seeks mechanisms for unfailing delivery of network packets. A noticeable difference between reliable multicast and the robust multicast in this thesis is that the latter does not attempt to provide an absolute guarantee of end-to-end packet delivery; instead, it seeks to deliver a video stream within a deadline while optimizing the received video quality.

In the multicast scenario, achieving scalability by reducing the amount of feedback control traffic is an important concern. This issue has been addressed either in a hierarchical fashion, such as in the Reliable Multicast Transport Protocol (RMTP) [138], or using multicast retransmission with feedback suppression [66, 135].

Scalable Reliable Multicast (SRM) [66] is a protocol implementing the “repair by any receiver” concept, which is similar in spirit to the PAR protocol in this thesis. However, as low latency is not important in SRM but essential in PAR, there are major differences between the two. PAR provides coordination among the receivers by leveraging the presence of a dedicated server. In SRM, the repairs are fully distributed; the receivers implement random backoff to suppress duplicated requests/repairs. In SRM, group members maintain status updates through low-frequency session messages; in PAR, members promptly inform the dedicated server when joining/leaving the group and this information is promptly propagated to all the relevant members.

Pretty Good Multicast (PGM) [163] is a protocol that bypasses the User Datagram Protocol (UDP) and interfaces directly with IP. Unlike PAR, PGM does not utilize peers in retransmission. To achieve good scalability, the key idea of PGM is to suppress duplicated negative acknowledgment (NACK) packets at intermediate routers as they propagate upstream towards the source. PGM is designed with real-time applications in mind and treats low latency as important, which is in principle similar to PAR. However, no precaution is taken for possible unavailability of repair: if no NACK packets are received by the time a transmit window times out, the data simply becomes unavailable for repair.

Multicast File Transfer Protocol (MFTP) [129] targets bulk data delivery applications and has no low-latency requirement. MFTP breaks data into large chunks and aggregates repair requests for each chunk in NACK packets to reduce feedback implosion. A recent work named Cooperative Peer assists and Multicast (CPM) [77], aimed at Video-on-Demand (VOD) applications, implements the bulk data transfer idea. In addition, to achieve low startup delay, CPM proposes to pre-populate the beginning of each video to the local machines during off-peak hours. However, this option is not available for real-time streaming applications. CPM also uses dedicated server(s), but with the aim of constraining the latency. In contrast, the dedicated server is used for constraining the error probability in PAR.
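The latency cost of SRM-style random backoff, as opposed to server-coordinated repair, can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that are not taken from the SRM specification or from this thesis: all receivers miss the same packet, each draws a uniform timer, and every receiver hears the first request after one fixed propagation delay. The function and parameter names are hypothetical.

```python
import random


def srm_backoff_round(num_receivers, max_backoff, propagation_delay, rng):
    """Simulate one loss event observed by all receivers.

    Each receiver draws a timer uniformly in [0, max_backoff]. The earliest
    timer fires first; any receiver whose timer fires before it could have
    heard that request (within propagation_delay) sends a duplicate.
    Returns (requests_sent, waiting_time_before_first_request).
    """
    timers = [rng.uniform(0.0, max_backoff) for _ in range(num_receivers)]
    first = min(timers)
    requests_sent = sum(1 for t in timers if t <= first + propagation_delay)
    return requests_sent, first


def average_backoff_behaviour(num_receivers, max_backoff, propagation_delay,
                              rounds=2000, seed=1):
    """Average number of requests and average added waiting time per loss."""
    rng = random.Random(seed)
    total_requests = 0
    total_wait = 0.0
    for _ in range(rounds):
        sent, wait = srm_backoff_round(num_receivers, max_backoff,
                                       propagation_delay, rng)
        total_requests += sent
        total_wait += wait
    return total_requests / rounds, total_wait / rounds


if __name__ == "__main__":
    for backoff in (0.0, 0.1, 1.0):
        reqs, wait = average_backoff_behaviour(50, backoff, 0.01)
        print(f"max_backoff={backoff}: requests={reqs:.2f}, wait={wait:.4f}")
```

In this model a longer backoff window suppresses more duplicate requests but delays the first request, which is consistent with the observation above that feedback suppression of this kind suits protocols where low latency is not essential.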

2.1.2 Peer-to-Peer (P2P) and Peer-Assistance Architectures

P2P architectures have enjoyed wide popularity over the last decade. In a P2P system, there is no dedicated infrastructure and the receivers contribute their resources (e.g., bandwidth, storage) to the system. P2P architectures have been applied to file sharing [43, 7, 2, 5, 6, 3, 4], live video streaming [175, 115, 153, 209, 206], content delivery networking [23, 104] and VOD [24, 96, 77, 78]. A pure P2P architecture has many limitations, such as unreliability and prolonged startup delay. In view of these limitations, many works propose a hybrid architecture where P2P is backed by a more traditional server infrastructure [152, 77, 117, 79, 202, 195, 107, 134]. Our proposed peer-assistance architecture belongs to this category. To the best of our knowledge, we are the first to apply a peer-assistance architecture to error resilience and to the reduction of stream startup latency.


2.1.3 Fast Stream Startup Architectures

In IP-based video multicast, the latency experienced by a user when starting up a new stream is the aggregate of a number of delay components, for example, the acquisition delay of waiting for the next RAP in the multicast stream, the network buffering delay of error control mechanisms, and the decoder buffering delay of storing a whole video frame before decoding can proceed.

Various mechanisms to accelerate the stream startup procedure have been proposed. One approach is to send, along with the primary stream, a companion stream of lower quality but more frequent RAPs, so that the acquisition delay can be effectively limited [67, 102, 68]. A second approach is to use multiple time-shifted multicast sessions for a single video stream and let the receiver join the session that leads to the minimum acquisition delay [150, 34]. The companion stream approach and the time-shifted multicast approach impose extra burden either on the multicast router data plane and the access network downlinks, or on the multicast router control plane. A different approach is to use a dedicated server to burst a unicast stream to achieve rapid acquisition of the new stream [166, 67, 33]. This approach can avoid the acquisition delay, as the unicast stream always starts with an RAP.

The rapid acquisition approach is related to patching, a multicast technique for VOD services [87, 120, 89]. Both complement multicast service with unicast to start a new video stream. However, there are a number of differences. First, as patching is for VOD, the unicast always starts from the very beginning of a video; on the other hand, since rapid acquisition is for real-time video, the unicast can start from any RAP in a video stream. Second, the original design of patching [87] assumes that the unicast and the multicast streams can be downloaded simultaneously without a receiver downlink bandwidth constraint. In contrast, rapid acquisition explicitly considers this constraint in scheduling/pacing the unicast and multicast streams.

An associated problem with the rapid acquisition approach is bitstream switching from the (transcoded) unicast back to the multicast stream. One way to do this is to switch the stream only at RAPs (e.g., I-frames). In this way, no mismatch distortion drift is introduced, but this leads to long delay or sacrifices coding efficiency. Another approach is to generate SP/SI switching frames to bridge the two streams [91]. However, generating switching frames involves considerable complexity, which may lead to scalability issues under the multicast scenario. An approach that avoids generating switching frames yet mitigates the distortion drift effect is proposed in [197, 198]. The key idea is to choose a switching point in a neighborhood with the highest encoding quality, within a switching window determined by the delay constraint. In this thesis, we also adopt an approach without switching frames, but instead of selecting a switching time within a time window, we select a quantization parameter (QP) for each unicast frame to mitigate the distortion drift effect. Under this approach, the delay associated with bitstream switching can be completely avoided, while good video quality can be maintained.
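To make the acquisition delay component concrete, the following sketch computes the wait until the next RAP for a receiver that joins a multicast stream at an arbitrary time. It assumes, purely for illustration, that RAPs occur at exact multiples of a fixed interval and that join times are uniformly distributed; the interval values and function names are hypothetical and not taken from the cited deployments.

```python
import random


def acquisition_delay(join_time, rap_interval):
    """Time from join_time until the next RAP, with RAPs at multiples of rap_interval."""
    return (-join_time) % rap_interval


def mean_acquisition_delay(rap_interval, trials=100_000, seed=0):
    """Monte Carlo estimate of the average acquisition delay for uniform join times."""
    rng = random.Random(seed)
    horizon = 1000.0 * rap_interval
    total = 0.0
    for _ in range(trials):
        total += acquisition_delay(rng.uniform(0.0, horizon), rap_interval)
    return total / trials


if __name__ == "__main__":
    # Hypothetical RAP intervals in seconds: a primary stream and a companion
    # stream with more frequent RAPs.
    for name, interval in (("primary", 2.0), ("companion", 0.5)):
        print(name, round(mean_acquisition_delay(interval), 3))
```

Under these assumptions the average wait is about half the RAP interval, which is why a companion stream with more frequent RAPs limits the acquisition delay, and why a unicast burst that always starts at an RAP removes this component altogether.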

2.1.4 Network Coding

Network coding is the technique of coding packets (instead of merely forwarding them) at intermediate network nodes to achieve improved throughput and error robustness. The coding scheme leveraged in the PAR/PAS protocols shares some similarities with network coding [38, 40, 20, 105, 94, 201, 35, 93]. Both rely on encoding of packets at network nodes in a distributed manner, and decoding at the destination nodes upon receiving enough packets. There are, however, a number of differences. Practical network coding [40, 38, 39] leverages random linear codes to allow the encoding to proceed in a fully distributed manner (i.e., incoherent coding). The coding rules are tagged on the data packets to allow decoding to proceed in a distributed manner. Random linear codes [40, 105], by nature, have a small probability of failure. On the other hand, since retransmission requests in the PAR/PAS protocols are initiated by a single network node, the coding rules can be specified in a centralized way (i.e., coherent coding) and included in the request packets. The request recipients then generate the data packets according to the specified rules. Instead of random linear codes, deterministic codes, such as Reed-Solomon codes or low-delay burst erasure codes, can then be used to achieve the best possible performance.
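The failure probability of random linear codes mentioned above can be quantified in a simple setting. The sketch below assumes that the receiver collects exactly k coded packets whose coefficient vectors are drawn uniformly at random over a finite field of size q, so decoding succeeds only if the k-by-k coefficient matrix is invertible, which happens with probability (1 - 1/q)(1 - 1/q^2)...(1 - 1/q^k). The Monte Carlo check is restricted to GF(2) for simplicity; the function names are illustrative and do not refer to any particular network coding library.

```python
import random


def decode_failure_probability(k, field_size):
    """Probability that k uniformly random length-k coefficient vectors over
    GF(field_size) are linearly dependent, so that decoding fails."""
    success = 1.0
    for i in range(1, k + 1):
        success *= 1.0 - float(field_size) ** (-i)
    return 1.0 - success


def gf2_rank(rows):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            lowest_bit = pivot & -pivot
            # Eliminate the pivot's lowest set bit from the remaining rows.
            rows = [r ^ pivot if r & lowest_bit else r for r in rows]
    return rank


def simulated_gf2_failure(k, trials=5000, seed=0):
    """Fraction of trials in which k random GF(2) vectors are not full rank."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        rows = [rng.getrandbits(k) for _ in range(k)]
        if gf2_rank(rows) < k:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    print("GF(2),   k=8 :", round(decode_failure_probability(8, 2), 4))
    print("GF(256), k=16:", round(decode_failure_probability(16, 256), 5))
    print("GF(2) simulated, k=8:", simulated_gf2_failure(8))
```

In this model the failure probability is small only when the field is reasonably large (or extra packets are collected), whereas a deterministic code specified by the requesting node, as in PAR/PAS, avoids this source of failure by construction.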


2.2 Video Coding

2.2.1 Hybrid Video Coding Architecture

Today’s prevailing technology for video compression is based on a hybrid coding architecture that combines predictive coding and transform coding to reduce the redundancy of a video signal. The simulation results we present in this thesis are based on the latest video coding standard H.264/Advanced Video Coding (H.264/AVC), which was finalized in March 2003 [14]. Its predecessors, including H.261, H.262 (MPEG-2), H.263 and MPEG-4 [17, 16, 18, 15], and competing standards [165, 164, 98] are all based on similar hybrid coding architectures. Its successor, High Efficiency Video Coding (HEVC, or H.265) [1], is currently under development (as of 2011). Overviews of modern video coding and in particular of H.264/AVC can be found in [182, 119, 136, 60].

State-of-the-art video codecs achieve high compression ratios by using appropriate combinations of a large number of tools. For example, H.264/AVC achieves a bit rate reduction of up to 50% at comparable quality relative to H.263 as a result of a combination of better motion-compensated prediction with more reference frames for prediction [194], a finer granularity with varying block sizes down to 4x4 pixels [189], spatial prediction of independently coded frames [194], a 4x4 integer transform [122], and improved entropy coding [123], among many others. A reference software implementation of H.264/AVC is available [8].

2.2.2 Distortion Models for Hybrid Video Coding

Visual artifacts in a decoded video sequence can result from lossy compression and/or transmission error. To study the visual quality of a video sequence, the most straightforward metric is the mean-squared-error (MSE) distortion, or equivalently, the peak signal-to-noise ratio (PSNR), measured between the original and the decoded video signals. MSE has its limitations [70], and more advanced subjective or objective metrics have been proposed [27, 187, 186, 185, 151, 188, 139, 126]. However, due to its practicality for evaluation and simplicity for modeling, MSE is still widely used today. The results on video quality in this thesis are evaluated based on the MSE distortion of each frame of a video sequence.

To model the distortion present in a video sequence due to lossy compression using a motion-compensated hybrid video codec, a theoretical framework was first presented in [69] based on the analysis of the power spectral density of the video signal. This model leads to a closed-form expression of the distortion under the assumption that the source process is stationary and jointly Gaussian. This framework was later extended in [71] to account for sub-pixel motion accuracy, in [72] for multi-hypothesis predictive coding (or B-frames) and in [65, 64, 63] where more specific information about prediction is considered. Other extensions of this framework include the analysis of SI/SP switching frames [91, 154, 155, 153], the analysis of scalable video codecs [44, 140, 45] and motion side information estimation in distributed video coding [19, 73, 106].

The model presented in [69] provides insights on the influence of various elements of a video coding system, but its idealized assumptions make the model unsuitable for precisely calculating the operational rate-distortion behavior of a particular video sequence. Various works have proposed empirical models to characterize operational rate-distortion functions [168, 82], some developed for specific implementations of codecs [55, 200, 50]. These empirical models are typically useful for rate control [81, 83, 128, 173]. In this thesis, we apply a modified version of the empirical model in [168] to the requantized video frames, to compensate for the requantization penalty, a factor not captured in the original model.

Distortion may also be a result of channel noise or transmission error. In the event of video packet loss, the distortion in the current frame tends to propagate to the other predictively coded frames in a video sequence. Various models have been proposed to capture this distortion propagation effect [168, 75, 47, 207, 59, 108, 36, 109]. A model for a general error process across predictively coded video frames is proposed in [168, 75], which applies not only to distortion caused by packet loss but also to requantization distortion, as will be discussed in this thesis.
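For reference, the relationship between MSE and PSNR used as the quality metric can be written as a short routine. This is a minimal sketch that assumes 8-bit samples (peak value 255) and represents a frame as a flat list of sample values; the function names are illustrative.

```python
import math


def mse(reference, decoded):
    """Mean squared error between two equally sized sequences of samples."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)


def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    error = mse(reference, decoded)
    if error == 0.0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / error)


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    decoded = [11, 19, 30, 42]
    print("MSE :", mse(original, decoded))
    print("PSNR:", round(psnr(original, decoded), 2), "dB")
```

Because PSNR is a monotonically decreasing function of MSE for a fixed peak value, ranking frames by either quantity gives the same ordering.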


2.2.3 Video Transcoding

Video transcoding is the procedure of performing one or more operations to transform one compressed video stream into another. Video transcoding can be performed with different goals in mind, for example, bitrate reduction [116, 97, 80], format conversion [157, 88, 190, 158, 131], error resilience enhancement [196, 51, 181] and digital watermarking for copyright protection [48]. In this thesis, we focus on video transcoding to achieve bitrate reduction. For overviews of video transcoding, please refer to [180, 21, 90].

Video transcoding can be performed at various complexities: at one end of the spectrum is full-blown decoding/re-encoding, at the other end is simple open-loop transcoding via requantization [132, 171, 204], with various schemes in between [199, 172, 92, 25] balancing video quality and complexity. This thesis revisits the simple architecture of open-loop transcoding by requantizing the motion-compensated transform coefficients while preserving the coding modes, motion estimation information and group of pictures (GOP) structure. This scheme has been studied extensively [132, 171, 172, 26, 204, 203, 199] and surveyed in [21, 90].

There are well-known drawbacks of the open-loop requantization architecture. First, additional requantization error occurs if the requantizer is not embedded with the encoder primary quantizer [25]. Second, since no closed-loop motion compensation is performed with requantization, the requantization distortion will drift across the subsequent frames until the next intra-coded frame (I-frame) [204, 90]. These drawbacks are relatively minor for our applications in packet loss repair and fast stream startup, since the transcoded video is only displayed transiently. Nevertheless, we would like to minimize their impact on the displayed video quality. One possible approach is to design a quantizer specifically for the transcoding purpose [192], but this would render the transcoded bitstream incompatible. Other approaches mainly focus on replacing the open-loop architecture with a closed-loop architecture [21, 90], which inevitably increases the system complexity. In this work we adopt a system-level backward-compatible approach: we model the rate-distortion behavior of each frame and the requantization distortion drift effect, and select appropriate QPs of the standard quantizer for each frame to minimize the modeled distortion, while taking into account the additional requantization error problem.

Another related topic is the theoretical analysis of requantization distortion. Conventional wisdom on quantization distortion is that it can be treated as independent additive noise. However, as discussed by Widrow and Kollar [193], this assumption holds only under certain conditions. For requantization distortion, several works [31, 159] have attempted to model its effect without the independence assumption.
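The additional requantization error described above can be reproduced on synthetic data. The sketch below follows the setup of the synthetic-data experiment listed for Figure B.2 (i.i.d. Laplacian samples, a fine quantizer followed by a coarse one) and assumes a uniform rounding quantizer of the form sgn(x) * floor(|x| / step + 0.5) * step. It is an illustration under these assumptions, not the thesis's experimental code, and all function names are hypothetical.

```python
import math
import random


def quantize(x, step):
    """Uniform rounding quantizer: sgn(x) * floor(|x| / step + 0.5) * step."""
    return math.copysign(math.floor(abs(x) / step + 0.5) * step, x)


def laplacian_samples(n, std, rng):
    """Zero-mean Laplacian samples with the given standard deviation.

    A Laplacian with scale b has variance 2 * b**2, and the difference of two
    independent exponential variables with mean b is Laplacian with scale b.
    """
    b = std / math.sqrt(2.0)
    return [rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) for _ in range(n)]


def requantization_mse(step_fine, step_coarse, std=5.0, n=100_000, seed=0):
    """Return (direct_mse, cascaded_mse) for single vs. two-stage quantization."""
    rng = random.Random(seed)
    samples = laplacian_samples(n, std, rng)
    direct = sum((x - quantize(x, step_coarse)) ** 2 for x in samples) / n
    cascaded = sum(
        (x - quantize(quantize(x, step_fine), step_coarse)) ** 2 for x in samples
    ) / n
    return direct, cascaded


if __name__ == "__main__":
    for coarse in (2.0, 3.0, 4.0):
        direct, cascaded = requantization_mse(1.0, coarse)
        print(f"step={coarse}: direct={direct:.4f}, cascaded={cascaded:.4f}")
```

With this quantizer form, a coarse step that is an odd multiple of the fine step places its decision thresholds on thresholds of the fine quantizer, so the cascade behaves like the direct quantizer. For other step sizes the thresholds do not line up and the cascade incurs an extra penalty.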

2.3 Error-Resilient Video Transmission

Error resilience techniques mitigate the impact of transmission error or packet loss on the quality of the decoded video. To achieve error resilience, tools from both video coding and error correcting coding have proven useful. Overviews of this topic can be found in [210, 205, 42, 76, 37, 184, 183, 74, 191, 177]. It is also possible to combine techniques from both areas to achieve superior performance. SLEP, which plays a crucial role in our robust video multicast architecture, belongs to this category.

2.3.1 Robust Video Coding

One approach to achieving error resilience is to introduce redundancy in the compressed video bitstream at the video encoding stage at the cost of lower coding efficiency. This can be done through, for example, varying the GOP size, intra-macroblock refresh, data partitioning, redundant slices, smart selection of reference pictures, and flexible macroblock ordering [191, 167, 46, 208]. If a feedback channel is available, the transmitter can adapt the encoding to the feedback information sent by the decoder, via adaptive reference picture selection [74, 75, 61, 114, 110, 111, 112], or by sending SP/SI switching frames [62, 91].

Error concealment is a post-processing technique at the decoding stage that acts as a last resort when error/loss is unavoidable. The simplest error concealment technique is frame repetition, which requires no additional computation but does not correct visual artifacts. Temporal linear interpolation algorithms can be implemented at the decoder to create new frames. More advanced schemes have been proposed based on a combination of spatial and temporal motion-compensated interpolation [174, 169, 49] and accurate estimation of coding modes and motion vectors [142, 28, 100].

2.3.2 Error Correcting Coding

Error correcting coding has been studied for over half a century. General discussions on error correcting coding can be found in [113, 121]. In this section, we review two topics in this area that are directly related to the work in this thesis.

2.3.2.1 Low-Delay Burst Erasure Codes

Low-delay burst erasure codes address how to achieve minimal decoding delay when the erasures are bursty. Martinian et al. first studied delay-optimal systematic codes for a single erasure burst [57, 125, 124]. They show that conventional maximum-distance-separable (MDS) codes (such as Reed-Solomon codes) are not delay-optimal, and codes with a better rate-delay tradeoff are possible. They design a family of delay-optimal block codes, then use diagonal interleaving [125] to convert the block codes into streaming codes. The single-link code is extended to a parallel-link code in [124], which is limited to correcting a single burst. Delay-optimal erasure codes for broadcast channels are studied by Khisti et al. in [29].

The burst erasure code proposed in this thesis differs from [57, 125, 124, 29] as follows. First, block codes are considered instead of streaming codes. The conversion from block codes to streaming codes can be achieved trivially [125]. Second, a broader class of causal codes is considered instead of systematic codes. Third, more general erasure patterns are considered, including erasure bursts and link outages. In particular, we consider correcting multiple erasure bursts instead of a single burst as in [124].

2.3.2.2 Hybrid Automatic Repeat reQuest (H-ARQ)

Hybrid Automatic Repeat reQuest (H-ARQ) [162, 178] is a variation of the standard ARQ where FEC is combined with retransmissions for error correction. Two types of H-ARQ schemes have been used in practice. In Type I, the source message and the parity are transmitted. At the receiver, error correction decoding is attempted. If it fails, retransmission of the source data is requested. In Type II, the receiver instead sends a request asking for more parity information. When received, the new parity and the forward-transmitted parity are both used in a stronger error correction decoding. Type II H-ARQ is also known as incremental redundancy (IR) or on-demand FEC. Rate-adaptive codes such as shortened Reed-Solomon codes [147], LT codes [118] or Raptor codes [160] are required in Type II H-ARQ.
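The difference between the two H-ARQ types can be illustrated by counting how many packets the receiver must request after a failed decoding attempt. The sketch below assumes an idealized packet erasure setting with an MDS code (any k received packets out of a block suffice to recover k source packets) and assumes that a Type I receiver re-requests only the source packets it is missing. Both assumptions, and the function names, are illustrative and are not taken from the cited H-ARQ literature.

```python
def type1_retransmission_request(k, source_received, parity_received):
    """Type I: if FEC decoding fails, request the missing source packets."""
    if source_received + parity_received >= k:
        return 0  # FEC decoding already succeeds
    return k - source_received


def type2_parity_request(k, source_received, parity_received):
    """Type II: request only as much extra parity as decoding still needs.

    The new parity is combined with the source and parity packets that were
    already received in the forward transmission.
    """
    return max(0, k - (source_received + parity_received))


if __name__ == "__main__":
    k = 10  # source packets per block
    # A burst erases 4 source packets; both forward parity packets arrive.
    print("Type I  request:", type1_retransmission_request(k, 6, 2))
    print("Type II request:", type2_parity_request(k, 6, 2))
```

In this example the Type II receiver asks for two parity packets where the Type I receiver asks for four source packets, because the parity that already arrived still contributes to decoding.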

2.3.3 Systematic Lossy Error Protection (SLEP)

Equipped with both video coding and error correcting coding tools, SLEP is an application-layer forward error correction scheme for robust transmission of video over packet erasure channels [73, 145]. The scheme is systematic in the sense that the protection stream is separable from the source stream, and lossy in that robustness is achieved in exchange for a slight drop in video quality when error correction is needed. The concept originates from the distributed source coding principle and the problem of digital enhancement of a degraded analog signal [156]. SLEP is able to achieve graceful degradation even under very noisy channel conditions because it can avoid the “cliﬀ eﬀect” (i.e., the error correcting performance undergoes a sharp degradation at some threshold) that is usually experienced in conventional FEC. A practical implementation of SLEP using H.264/AVC redundant slices and Reed-Solomon codes is described in [145]. A theoretical analysis of the SLEP performance is presented in [146] for a ﬁrst-order Gauss-Markov source and a DPCM-like encoder. In this work, we build on top of the existing framework and extend SLEP to the H-ARQ mode.

Chapter 3

Peer-Assisted Packet Loss Repair

In this chapter and the next, we take an architectural view and study a system-level video multicast solution that scales to a large number of receivers. In the current chapter, we focus on the packet loss repair problem; Chapter 4 is dedicated to fast stream startup.

In Chapter 1 (Figure 1.1), we discussed a multicast system that is combined with unicast retransmissions for bursty packet loss repair and fast stream startup. This system can be abstracted as shown in Figure 3.1(a), where only the unicast retransmission is shown. Since the repairs involve only the unicast server, we refer to this approach as Server-Assisted Repair (SAR). Unlike multicast, where the server delivers a single stream to support multiple receivers, a unicast server must deliver data to each receiver individually; as a result, the traffic grows linearly with the number of supported receivers, making the unicast server the bottleneck that prevents such a system from scaling to a larger number of receivers.

The key idea for overcoming this bottleneck is to realize that the unicast server is not the only resource that can provide such a retransmission service. The same video stream is also cached by the peer receivers of a multicast session. In a nutshell, our idea is to partially shift the unicast service from the server to the multicast peer receivers. The unicast server serves as the last resort for repair when peer-assisted repair becomes unavailable. We show how this can be achieved through the design of a Peer-Assisted Repair (PAR) protocol.

One challenging issue faced by PAR (but not by SAR) is unexpected peer departures that could lead to repair failures. For example, when a peer receiver leaves a multicast session, it is still deemed available for packet repair until the state update is relayed to the unicast server and other peers. Besides peer departure, other uncertainties could also lead to repair failures. For example, due to imperfect synchronization, the requested packet may fall outside a peer's cache window. In this chapter, using the principle of end-to-end system design [148], we propose to address all these problems in the application layer using redundant repairs.

The PAR protocol operates at the network/transport layers and is designed with low latency in mind. That is, unlike TCP or SRM [66], the system cannot afford to let a peer receiver keep sending requests until the repair is received; instead, it must ensure that the repair information is received after one retransmission attempt with high probability. In addition, the communication between the peers does not rely on the peers exchanging state information among themselves. Instead, the unicast server pushes the state information to the peers.

Given that the transported source is video rather than generic data, performance can be improved by exploiting the properties of a video stream. In this chapter, we also present how PAR can be integrated with forward and retransmitted Systematic Lossy Error Protection (SLEP/SLEPr), a transcoding-based source-aware error protection mechanism, to provide improved robustness to bursty packet loss and overcome the peer downlink bottleneck. The joint PAR-SLEP/SLEPr scheme is able to deal effectively with erasure bursts and peer departures within a unified framework.

This chapter is organized as follows. Section 3.1 presents the proposed PAR protocol. Section 3.2 describes how to combine the PAR protocol with SLEP/SLEPr. Section 3.3 presents an analytical model with the aim of characterizing the gains achieved by the joint PAR-SLEP/SLEPr scheme and the performance trade-offs. Simulation results are presented in Section 3.4. The notations used in this chapter are summarized in Table 3.1.


Notation                                    Explanation
$\tilde{N}$                                 Number of receivers
$\tilde{M}$                                 Number of multicast sessions
$\tilde{N}_m$                               Number of receivers in Multicast Session $m$
$\mathcal{B}$; $B$                          Index set for erasure burst; erasure burst length $B := |\mathcal{B}|$
$s$; $p$                                    Source packet; parity packet
$\mu$                                       Redundancy degree
$CR_\ell$                                   Coding rule
$\mathrm{MS}(\cdot)$                        Operator that maps a peer to a multicast session
$C_{\Omega \setminus n}$                    Peer $n$'s co-multicast set
$(N, K)$                                    Number of channel symbols / source symbols in a coding block
$\mathcal{S}$                               Index set for a block of source packets
$\mathcal{P}_{FWD}$; $M_{FWD}$              Index set for parity packets in FEC; $M_{FWD} := |\mathcal{P}_{FWD}|$
$\mathcal{P}_{RET}$                         Index set for parity packets in retransmission
$R_{FWD}$; $R_{RET}$                        FEC parity stream bitrate; retransmitted parity stream bitrate
$\alpha_{FWD}$; $\alpha_{RET}$              FEC parity stream rate budget; retransmitted parity stream rate budget
$\beta$                                     SLEP stream compression ratio
$R_{SRC}$                                   Bitrate of the source video stream
$T_{BURST}$                                 Mean time of erasure bursts
$T_{BB}$                                    Mean time between two erasure bursts
$A_S$                                       Event that a packet is successfully received
$A_{FWD}$                                   Event that a packet is successfully repaired by FEC
$A_{RET}$                                   Event that a packet is successfully repaired by retransmission
$A_F$                                       Event that a packet fails to be repaired
$R_S$                                       Unicast server bitrate
$R_{PU}$; $\tilde{R}_{PU}$                  Uplink bitrate at peer receiver link; $\tilde{R}_{PU} := R_{PU} / (\beta R_{RET})$
$R_{PD}$; $\tilde{R}_{PD}$                  Downlink bitrate at peer receiver link; $\tilde{R}_{PD} := R_{PD} / (\beta R_{RET})$
$\gamma_{DEPT}$                             Peer departure rate
$A_{NR}$                                    Event that a peer is unresponsive to a request
$p_Z(m; s, \tilde{M})$                      Zipf probability distribution with parameters $s$ and $\tilde{M}$
$b(k; n, p)$                                Binomial distribution $b(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k}$
$E\{\cdot\}$                                Expectation of a random variable

Table 3.1: Notations used in Chapter 3.

[Figure 3.1 diagrams. Nodes: Receivers (R), Router, Unicast Server (U). Message labels by panel: (a) 1. Repair Request, 2. Repair Packet(s); (b) 1. Repair Request, 2. Repair Request (w/ Coding Rules) or Repair Packet(s), 3. (Coded) Repair Packet(s); (c) 1. Repair Request (w/ Coding Rules), 2. (Coded) Repair Packet(s).]

Figure 3.1: Packet loss repair protocols illustrated in simpliﬁed network topology. (a) Server-Assisted Repair (SAR), (b) Peer-Assisted Repair with Centralized Tracking (PAR-CT) and (c) Peer-Assisted Repair with Distributed Tracking (PAR-DT). Only the retransmissions are depicted. The lightning sign denotes a bursty erasure not correctable by FEC.

3.1 Peer-Assisted Repair (PAR)

In Section 3.1.1, we discuss how to introduce redundancy in the repair packets sent from different peers. In Sections 3.1.2 and 3.1.3, respectively, we discuss two variations of the PAR protocol: PAR with Centralized Tracking (PAR-CT) and PAR with Distributed Tracking (PAR-DT). Figures 3.1(b) and (c) illustrate these two variations.

In the sequel, we consider the case in which the router is connected to $\tilde{N}$ receivers. Attached to the router is one unicast server. Each peer may join one of $\tilde{M}$ multicast sessions. Each network packet may consist of multiple symbols, in which case they are multiplexed and encoded/decoded in parallel. For simplicity, we use the symbol notation $s_i$ or $p_i$ to denote one network packet.

3.1.1 Redundancy of Repairs

Uncoded Case. When a peer receiver experiences an erasure burst, it first attempts to correct the burst by applying FEC using the received parity packets. If the attempt fails, denote by $\mathcal{B}$ the index set for the residual erasure burst of size $B := |\mathcal{B}|$, and by $\{s_i\}_{i \in \mathcal{B}}$ the set of erased packets. The simplest solution to recover these packets is to request redundant repairs for each packet $s_i$ individually, thereby increasing the probability that the requesting peer receives at least one retransmitted packet for each $s_i$. Exactly $\mu$ copies of each $s_i$ are requested from the requesting peer's helping peers. When not enough helping peers are available, retransmission is performed by the unicast server, in which case redundant repairs are not needed since the unicast server is considered reliable. Note that the number of requested copies $\mu$ should not be too large; otherwise, the redundant packets will cause congestion in the requesting peer's downlink. The issue of selecting a suitable value of $\mu$ is discussed in detail in Sections 3.4.3 and 3.4.6.

Coded Case. A more efficient solution is to apply erasure codes across helping peers to recover a burst of packet loss, as illustrated in Figure 3.2. Specifically, for a burst of size $B$, a $(\mu B, B)$ code shall be applied. Note that unlike the uncoded case where the redundancy degree $\mu$ must be an integer, now $\mu$ can be a fractional number such that $\mu B$ is an integer. In total, $\mu B$ requests are generated and sent to the helping peers. Each Request $\ell$ also includes a coding rule $CR_\ell$ in the following form:

$$CR_\ell = \tilde{g}(\ell) := [g(1, \ell), \ldots, g(B, \ell)]^T, \quad \ell = 1, \ldots, \mu B.$$

The vector $\tilde{g}(\ell)$ corresponds to a column of the code generator matrix $G \in Q^{B \times \mu B}$ of a linear block code, where $Q$ is a finite field. Peer $\ell$, after receiving the request with the coding rule $CR_\ell$, generates the coded packet $\tilde{p}_\ell$ according to

$$\tilde{p}_\ell = \sum_{i=1}^{B} g(i, \ell)\, s_{(i)},$$

where $\{s_{(i)}\}_{i=1}^{B} := \{s_i\}_{i \in \mathcal{B}}$ (we use the subscript $(i)$ for the indexing of packets within $\mathcal{B}$, which is different from the index $i$ used for the set of all packets) and transmits $\tilde{p}_\ell$ to the requesting peer, which performs decoding after successfully receiving $B$ coded packets. Various erasure correcting codes can be applied, such as Reed-Solomon codes, in which case efficient decoding methods such as Newton's interpolation exist [85]. In all the discussions, we assume that the code achieves the Singleton bound, i.e., it is MDS.

[Figure 3.2 diagram: Peers $p_1, \ldots, p_{\mu B} \subseteq C_{\Omega \setminus n}$ each hold $s_{(1)}, \ldots, s_{(B)}$ and send coded packets $\tilde{p}_1, \ldots, \tilde{p}_{\mu B}$ to Peer $p_n$, which experienced the erasure burst $\mathcal{B}$. Decoding succeeds if the number of received coded packets is $\geq B$.]

Figure 3.2: Generation and transmission of coded redundant repair packets from peers in the co-multicast set $C_{\Omega \setminus n}$ to the requesting Peer $n$. In this illustration, each repair packet is generated by a different peer.
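The coded case can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it assumes a prime field GF(257) instead of a general finite field $Q$, a Vandermonde-style coding rule $g(i, \ell) = \ell^{i-1}$ (which makes any $B$ coded packets decodable, i.e., the code is MDS), and a single symbol per packet. The names `coding_rule`, `encode` and `decode` are hypothetical.

```python
P = 257  # assumed prime field size; every symbol is an integer in [0, P)

def coding_rule(ell, B):
    """Coding rule CR_ell: g(i, ell) = ell^(i-1) mod P, for i = 1..B."""
    return [pow(ell, i, P) for i in range(B)]

def encode(rule, source):
    """Coded packet generated by a helping peer: sum_i g(i, ell) * s_(i) mod P."""
    return sum(g * s for g, s in zip(rule, source)) % P

def decode(received, B):
    """Recover the B lost symbols from any B pairs (ell, coded_symbol)."""
    use = received[:B]
    if len(use) < B or len({ell for ell, _ in use}) < B:
        raise ValueError("need B coded packets with distinct coding rules")
    # Augmented matrix [g(1, ell), ..., g(B, ell) | coded symbol], one row per packet.
    rows = [coding_rule(ell, B) + [p] for ell, p in use]
    # Gauss-Jordan elimination over GF(P).
    for col in range(B):
        pivot = next(r for r in range(col, B) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # modular inverse via Fermat's little theorem
        rows[col] = [(x * inv) % P for x in rows[col]]
        for r in range(B):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[col])]
    return [rows[i][B] for i in range(B)]

# Usage: burst of B = 3 packets, redundancy mu = 2, hence mu * B = 6 requests.
lost = [10, 200, 33]
coded = [(ell, encode(coding_rule(ell, 3), lost)) for ell in range(1, 7)]
# Suppose only the packets from peers 2, 5 and 6 arrive.
recovered = decode([coded[1], coded[4], coded[5]], 3)
```

In a real packet, the same rule would be applied independently at every symbol position, in line with the parallel per-symbol encoding mentioned at the beginning of Section 3.1.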

3.1.2 PAR with Centralized Tracking (PAR-CT)

In PAR-CT (Figure 3.1(b)), a peer's status is continuously tracked by the unicast server. When Peer $n$ joins Multicast Session $m$ (we denote this by $m = \mathrm{MS}(n)$, where $\mathrm{MS}(\cdot)$ is the operator that maps Peer $n$ to Multicast Session $m$), the unicast server is promptly informed of this event via a special message [166]. During the session, the peer periodically sends messages to the unicast server to confirm that it continues to receive the multicast. If no such message is received for some time, the server no longer considers the peer a receiver of that multicast session. When the peer leaves the multicast session, it promptly informs the server of this event. The server also maintains a table that keeps track of the peer-session pairs $\{(n, \mathrm{MS}(n))\}_{n=1}^{\tilde{N}}$. Thus, it knows the co-multicast set $C_{\Omega \setminus n} := \{j : j \neq n, \mathrm{MS}(j) = \mathrm{MS}(n)\}$ of any Peer $n$, i.e., the set of peers currently in the same multicast session as the peer.

When a peer experiences an erasure burst $\mathcal{B}$ that causes the loss of packets $\{s_i\}_{i \in \mathcal{B}}$ which cannot be corrected by FEC, it sends a repair request to the unicast server, indicating $\mathcal{B}$, i.e., which packets are missing. The server determines $\mu$, the number of redundant repairs per packet needed to maintain a high probability of recovery. The total number of repair packets is then $\mu B$. Let Peer $n$ be the requesting peer. If a minimum number of helping peers in the co-multicast set can be found, i.e., $|C_{\Omega \setminus n}| \geq \mu$, the server delegates the $\mu B$ requests to these peers in $C_{\Omega \setminus n}$. The requests should be assigned to the peers such that they share the load equally. If coding needs to be applied across the repair packets, the coding rules $CR_\ell$, $\ell = 1, \ldots, \mu B$, are also specified in the request packets (see Section 3.1.1). If not enough peers are found, i.e., $|C_{\Omega \setminus n}| < \mu$, the unicast server itself replies to Peer $n$ with the requested packets $\{s_i\}_{i \in \mathcal{B}}$ (i.e., the scheme fails over to SAR). Note that by delegating the request instead of sending the repair packets, the unicast server bitrate and computation can be greatly reduced. Upon receiving the request, each Peer $\ell$ looks in its cache for the requested source packets, or for the packets required for generating the parity packets. If found, it responds to Peer $n$ with the repair packets (i.e., the requested source packets or the generated parity packets).
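The server-side decision in PAR-CT can be sketched as follows for the uncoded case. This is an illustrative Python sketch under stated assumptions, not the protocol implementation: the class name `TrackingServer`, its methods, and the round-robin assignment are hypothetical choices, and message formats, keep-alive timeouts and coding rules are omitted.

```python
class TrackingServer:
    """Hypothetical unicast server keeping the peer-session table {(n, MS(n))}."""

    def __init__(self):
        self.session_of = {}  # peer id -> multicast session id, i.e. MS(n)

    def join(self, peer, session):
        self.session_of[peer] = session

    def leave(self, peer):
        self.session_of.pop(peer, None)

    def co_multicast_set(self, n):
        """Peers j != n with MS(j) == MS(n)."""
        m = self.session_of.get(n)
        return sorted(j for j, s in self.session_of.items() if j != n and s == m)

    def handle_repair_request(self, n, burst, mu):
        """Delegate mu copies of each lost packet to peers, or fail over to SAR."""
        helpers = self.co_multicast_set(n)
        if len(helpers) < mu:
            # Not enough helping peers: the server sends the packets itself.
            return {"mode": "SAR", "packets": list(burst)}
        assignments = {}
        slot = 0
        for packet_index in burst:
            # Consecutive slots map to distinct peers because len(helpers) >= mu,
            # and cycling through the helpers keeps the load balanced.
            for _ in range(mu):
                helper = helpers[slot % len(helpers)]
                assignments.setdefault(helper, []).append(packet_index)
                slot += 1
        return {"mode": "PAR", "assignments": assignments}
```

With peers 1 to 4 in one session, a request from Peer 1 for a two-packet burst with $\mu = 2$ is spread over Peers 2, 3 and 4; a peer that is alone in its session is served by the server directly.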

3.1.3 PAR with Distributed Tracking (PAR-DT)

Similar to PAR-CT, in PAR-DT (Figure 3.1(c)) a peer's status is always promptly updated. However, each Peer $n$, $n \in \{1, \ldots, \tilde{N}\}$, maintains its own tracking table that contains the co-multicast set $C_{\Omega \setminus n}$. Whenever there is a status change, the unicast server is informed first, which, in turn, sends a message to the relevant peers whose tracking tables need an update.

When a peer experiences the loss of packets $\{s_i\}_{i \in \mathcal{B}}$ which cannot be corrected by FEC, it first determines the repair redundancy $\mu$ needed. The peer then assigns the $\mu B$ requests to the selected peers in the co-multicast set $C_{\Omega \setminus n}$ in a similar way as the unicast server does in PAR-CT. It also specifies the coding rules $CR_\ell$, $\ell = 1, \ldots, \mu B$, if coding is to be applied. Let Peer $n$ be the requesting peer. If Peer $n$ does not find a sufficient number of peers receiving the same multicast, i.e., $|C_{\Omega \setminus n}| < \mu$, it sends the repair request to the server, indicating $\mathcal{B}$ (i.e., the scheme fails over to SAR). Either the server or the peers in $C_{\Omega \setminus n}$ respond with the repair packets, as in PAR-CT.

PAR-CT and PAR-DT differ from each other in (i) where the tracking table(s) are maintained, and (ii) who decides from where to request repair packets. Compared to PAR-CT, PAR-DT demands a lower server bitrate and reduces queueing delay at the server, but it is more difficult for PAR-DT to achieve tracking table synchronization. This is because the server needs to constantly update the peers by sending them the state information. Since the state information is subject to delay and loss, there is more uncertainty than in the solution in which the server maintains the tracking table itself.

3.2 Cross-Layer Design with SLEP/SLEPr

In this section, we describe how to combine the PAR protocol with forward and retransmitted SLEP (SLEP/SLEPr) to achieve improved resistance to erasure bursts, overcome the peer downlink bottleneck and eﬀectively deal with erasure bursts and peer departures within a uniﬁed framework. In Section 3.2.1, we highlight the SLEP packet generation and decoding procedure. In Section 3.2.2, we illustrate the SLEP/SLEPr procedure as an extension of SLEP in the hybrid error control scenario (i.e., H-ARQ, refer to Section 2.3.2.2). In Section 3.2.3, we show how to combine the application-layer SLEP/SLEPr with network/transport-layer PAR.

3.2.1 SLEP Packet Generation and Decoding

The SLEP scheme described in [145] consists of a codec for generating/decoding the H.264/AVC-compatible primary packets and an enhancement codec for generating/decoding the parity packets. Figure 3.3(a) illustrates the generation of SLEP parity packets, and Figure 3.3(b) illustrates its decoding process. Below we highlight the parity packet generation and decoding process. Interested readers can refer to [145] for more details.


Packet Generation. The sender first encodes the input video using a standard H.264/AVC coder into a stream of source packets. The bitstream corresponding to a number of macroblocks is encapsulated in a so-called primary slice, one per source packet. From each source packet, the sender then generates a redundant packet that contains one redundant slice (covering the same macroblocks as in the source packet). The redundant slice is obtained by open-loop transcoding through requantization with a QP larger than the primary quantization QP, thus reducing the data size of each slice. If the generated redundant slices are of different sizes, they can be zero-padded to exactly the same size. Next, the source stream is divided into blocks of packets, where the block boundary should align with the boundary of video frames. For a block of K packets, a systematic (N, K) block erasure code is applied across the K redundant packets to generate N − K parity packets, where each parity packet contains the encoded parity slice plus some helper information. The helper information includes the redundant slice QP and the slice boundaries. The SLEP parity packet generation can be considered a "compression scheme" for the parity packets, as the parity packets are generated based on the redundant packets instead of the source packets.

Packet Decoding. The receiver receives the source packets and the parity packets transmitted over an erasure channel. If the total number of received packets is no smaller than K, then from each received source packet, it regenerates a redundant packet that contains one redundant slice, the same way as at the sender. Then the regenerated redundant packets and the received parity packets are passed to the erasure decoder, where the redundant slices covering the same macroblocks as in the lost source packets are generated. The redundant slices are then decoded and fed into the H.264/AVC decoding loop as a substitute for the primary slices in the lost source packets. If the total number of source and parity packets is smaller than K, then post-processing error concealment is triggered in the decoding loop to cover the lost primary slice.


Figure 3.3: SLEP parity packet generation (a) and decoding (b).

3.2.2 SLEP/SLEPr Procedure

Consider the scenario where SLEP is applied to a Type II H-ARQ (see Section 2.3.2.2). Let $s_i$ denote a source packet, $\hat{s}_i$ a redundant packet and $\hat{p}_i$ a parity packet. Let $\mathcal{S}$ denote the index set for a block of source packets. Among the SLEP parity packets generated from $\mathcal{S}$, let $\{\hat{p}_i\}_{i \in \mathcal{P}_{FWD}}$ be the set of parity packets used in FEC, where $\mathcal{P}_{FWD}$ is the FEC index set, and let $\{\hat{p}_i\}_{i \in \mathcal{P}_{RET}}$ be the set of parity packets used in retransmission, where $\mathcal{P}_{RET}$ is the retransmission index set. In practice, the forward parity packets are generated at the source encoding stage; depending on the repair scheme, the retransmitted parity packets may be generated either at the source encoding stage, or at the unicast server (as in SAR) or at the peers (as in PAR) upon request. We detail the procedure of SLEP/SLEPr as follows.

Forward Transmission. At the sender, the source packets {si }i∈S and the forward parity packets {ˆpi }i∈PFWD are transmitted to the receiver over the error-prone erasure channel.

Forward Error Correction. Let $\mathcal{B}$ denote an erasure burst that occurred during forward transmission. At the receiver, use the received source packets $\{s_i\}_{i \in \mathcal{S} \setminus \mathcal{B}}$ to regenerate the redundant packets $\{\hat{s}_i\}_{i \in \mathcal{S} \setminus \mathcal{B}}$. If the total number of regenerated redundant packets and received parity packets is no smaller than $K$, i.e.,

$$|\{\hat{s}_i\}_{i \in \mathcal{S} \setminus \mathcal{B}}| + |\{\hat{p}_i\}_{i \in \mathcal{P}_{FWD} \setminus \mathcal{B}}| \geq K, \qquad (3.1)$$

apply block erasure decoding to recover the corrupted packets. Otherwise, the receiver sends a retransmission request for additional parity packets $\{\hat{p}_i\}_{i \in \mathcal{P}_{RET}}$.

Retransmission. Upon receipt of the additional parity packets, apply erasure decoding if enough packets are received, i.e.,

$$|\{\hat{s}_i\}_{i \in \mathcal{S} \setminus \mathcal{B}}| + |\{\hat{p}_i\}_{i \in \mathcal{P}_{FWD} \setminus \mathcal{B}}| + |\{\hat{p}_i\}_{i \in \mathcal{P}_{RET} \setminus \mathcal{B}'}| \geq K, \qquad (3.2)$$

where $\mathcal{B}'$ denotes new erasures (e.g., due to peer departure or packet loss) during retransmission of repair packets. Then decode the redundant slices and substitute them into the primary video stream. If not enough packets are received, the decoder falls back on error concealment.
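The two decoding conditions (3.1) and (3.2) amount to counting usable packets against $K$. The following Python sketch shows that decision logic only, assuming an ideal MDS code and modelling all index sets as Python sets with disjoint labels; the function name `slep_slepr_outcome` and the returned strings are hypothetical, and the actual transcoding and erasure decoding are not modelled.

```python
def slep_slepr_outcome(S, P_fwd, P_ret, burst, new_erasures, K):
    """Return which stage recovers the block: FEC, retransmission, or neither.

    S, P_fwd, P_ret : index sets of source, forward parity and retransmitted parity packets
    burst           : indices erased during forward transmission
    new_erasures    : indices erased during retransmission (e.g., peer departure)
    K               : number of packets the erasure decoder needs
    """
    regenerated = len(S - burst)      # redundant packets regenerated from received source packets
    forward = len(P_fwd - burst)      # forward parity packets that survived the burst
    if regenerated + forward >= K:    # condition (3.1)
        return "FEC"
    retransmitted = len(P_ret - new_erasures)
    if regenerated + forward + retransmitted >= K:  # condition (3.2)
        return "RETRANSMISSION"
    return "CONCEALMENT"              # decoder falls back on error concealment


# Usage: a block of K = 8 source packets, 2 forward and 3 retransmitted parity packets.
S = set(range(8))
P_fwd = {8, 9}
P_ret = {10, 11, 12}
short_burst = slep_slepr_outcome(S, P_fwd, P_ret, {2, 3}, set(), 8)
long_burst = slep_slepr_outcome(S, P_fwd, P_ret, {2, 3, 4, 8}, set(), 8)
long_burst_with_departures = slep_slepr_outcome(S, P_fwd, P_ret, {2, 3, 4, 8}, {10, 11}, 8)
```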

3.2.3 Joint PAR-SLEP/SLEPr Error Protection

We have seen that SLEP/SLEPr operates in the application layer whereas PAR operates in the network/transport layer. Note that erasure codes (e.g., Reed-Solomon codes) are used in both layers, but for different purposes: for resisting the erasure burst in the application layer and for mitigating peer departures in the transport layer. This introduces inefficiency. If we can unify SLEP/SLEPr and PAR into one single framework, then erasure coding is performed only once and the redundancy can be utilized more efficiently. This can be achieved by modifying the PAR protocol to make it aware of the encoding in SLEP/SLEPr. We refer to this new scheme as the jointly coded case. In contrast, the coded case discussed in Section 3.1.1 is referred to as the separately coded case.

Specifically, we modify the coded case described in Section 3.1.1 as follows. The repair packet generation and transmission procedure is illustrated in Figure 3.4. Note that the erasure encoding/decoding is performed in blocks. Either the unicast server (in PAR-CT) or a peer receiver (in PAR-DT) must be aware of the coding blocks. Either the server or the peer specifies the coding rules

$$CR_\ell = \tilde{g}(\ell) := [g(1, \ell), \ldots, g(K, \ell)]^T, \quad \ell = 1, \ldots, |\mathcal{P}_{RET}|,$$

where $\tilde{g}(\ell)$ corresponds to a column in the retransmission sub-matrix $G_{RET} \in Q^{K \times |\mathcal{P}_{RET}|}$ of the generator matrix of a $Q$-ary linear block code. Note that each coding rule is now specified with support $\{\hat{s}_i\}_{i \in \mathcal{S}}$, i.e., all the redundant packets within the coding block, instead of only for the corrupted source packets $\{s_i\}_{i \in \mathcal{B}}$ as in Section 3.1.1. Each Peer $\ell$, after receiving the request with the coding rule $CR_\ell$, first regenerates the redundant packets $\{\hat{s}_i\}_{i \in \mathcal{S}}$ from the source packets $\{s_i\}_{i \in \mathcal{S}}$, then generates the coded packet $\tilde{p}_\ell$ according to

$$\tilde{p}_\ell = \sum_{i=1}^{K} g(i, \ell)\, \hat{s}_{(i)},$$

where $\{\hat{s}_{(i)}\}_{i=1}^{K} := \{\hat{s}_i\}_{i \in \mathcal{S}}$, and transmits the coded packet $\tilde{p}_\ell$ to the destination Peer $n$. Once it receives a sufficient number of packets, including regenerated redundant packets and forward parity packets, satisfying (3.2), Peer $n$ performs erasure decoding. Note that one drawback of the joint scheme is the additional delay introduced by block coding.

[Figure 3.4 diagram: Peers $p_1, \ldots, p_{|\mathcal{P}_{RET}|} \subseteq C_{\Omega \setminus n}$ each regenerate $\hat{s}_{(1)}, \ldots, \hat{s}_{(K)}$ and send coded packets $\tilde{p}_1, \ldots, \tilde{p}_{|\mathcal{P}_{RET}|}$ to Peer $p_n$, whose Source Block $\mathcal{S}$ and Forward Parity $\mathcal{P}_{FWD}$ were hit by the erasure burst $\mathcal{B}$. Decoding succeeds if the received coded packets satisfy (3.2).]

Figure 3.4: Generation and retransmission of coded redundant repair packets from peers in the co-multicast set $C_{\Omega \setminus n}$ to the requesting Peer $n$: joint PAR-SLEP/SLEPr error protection case. In this illustration, each repair packet is generated by a different peer.

3.3 Analysis

In this section, we present an analytical model for the packet repair protocols and the SLEP/SLEPr error protection scheme described in Sections 3.1 and 3.2. The aim is to characterize the performance trade-offs and the gains achieved by the joint PAR-SLEP/SLEPr framework compared to other schemes. The analytical results will later be compared with the simulation results in Section 3.4.

3.3.1 Setup

Throughout the analysis, we make the following assumptions.

1. A video multicast system supports in total $\tilde{N}$ receivers, each belonging to one of $\tilde{M}$ multicast sessions. The receiver distribution among the multicast sessions follows a Zipf model [141], having the following probability mass function (PMF):

$$p_Z(m; s, \tilde{M}) = \frac{1/m^s}{\sum_{n=1}^{\tilde{M}} 1/n^s},$$

where $m$ is the popularity rank of a particular multicast session and $s \geq 0$ is the exponent that determines the shape of the distribution. The number of peers in Multicast Session $m$ is $\tilde{N}_m = p_Z(m; s, \tilde{M}) \cdot \tilde{N}$.

2. The multicast source video stream has bitrate $R_{SRC}$. The FEC parity packet stream has bitrate $R_{FWD}$. If the SLEP compression is applied in generating the parity packets, the parity packet stream has bitrate $\beta R_{FWD}$, where $\beta \in (0, 1]$ is referred to as the SLEP compression ratio (see Section 3.2.1). Similarly, the retransmitted parity packet stream has on-time¹ bitrate $R_{RET}$, and if SLEP compression is applied, the SLEPr stream has on-time bitrate $\beta R_{RET}$. Define $\alpha_{FWD} := \beta R_{FWD} / R_{SRC}$ and $\alpha_{RET} := \beta R_{RET} / R_{SRC}$ as the forward and retransmission rate budgets, respectively.

3. We model the channel of each receiver link using a two-state Markov model. The erasure burst duration $t_{BURST}$ and the time between two bursts $t_{BB}$ are independent random variables, each following an exponential distribution with mean $T_{BURST}$ and $T_{BB}$, respectively. Consequently, the expectation of any function $g(t_{BURST}, t_{BB})$ is obtained by integrating over the two exponential distributions:

$$E\{g(t_{BURST}, t_{BB})\} = \int_{t} \int_{t'} g(t, t')\, f_{EXP}(t; T_{BURST})\, f_{EXP}(t'; T_{BB})\, dt\, dt',$$

with $f_{EXP}(t; T)$ denoting the exponential distribution parametrized by mean $T$.

4. The erasure bursts in different receiver links are independent. The reason for making this assumption is two-fold. First, in many applications, it is reasonable to assume that the error events are local (e.g., in DSL links one typical event that triggers impulse noise is turning on/off microwave ovens). Second, the independence assumption ensures that the analytical results can be expressed in closed form. In Section 3.4, we will evaluate through simulations the case when the bursts are correlated.

5. If a packet is not affected by the erasure burst, it is considered successfully received (denoted by event $A_S$); otherwise the packet is corrupted, and could be repaired by either forward error protection packets (event $A_{FWD}$) or retransmitted packets (event $A_{RET}$); if neither succeeds, the repair is considered failed (event $A_F$). Denote by $\omega_i$ the outcome of Packet $i$ after transmission over the channel and SLEP/SLEPr recovery. We have $\omega_i \in \Omega = \{A_S, A_{FWD}, A_{RET}, A_F\}$, where $\Omega$ denotes the sample space.

6. The erasure burst falls into one forward error protection coding block. This is a pessimistic assumption, since if the burst falls into two blocks, it can be more easily corrected within each block separately. Furthermore, no two erasure bursts occur within the same coding block. This is true with a high probability when burst occurrence is rare.²

Our analysis relates the following figures of merit for each protocol analyzed:

• The expected bitrate requirement at the unicast server link, $E\{R_S\}$, for sending the repair packets and optionally forwarding the request packets.³

• The uplink and downlink bitrate at each peer receiver link, denoted by $R_{PU}$ and $R_{PD}$, respectively.⁴ Alternatively, we can use the normalized uplink and downlink bitrates, defined by

$$\tilde{R}_{PU} := \frac{R_{PU}}{\beta R_{RET}} \quad \text{and} \quad \tilde{R}_{PD} := \frac{R_{PD}}{\beta R_{RET}},$$

respectively. Intuitively, $\tilde{R}_{PU}$ and $\tilde{R}_{PD}$ represent the number of uploads and downloads a peer simultaneously supports, respectively.

• The overall repair failure rate $P(A_F)$.

¹ Note that the retransmitted stream is on-off.
² For example, below an effective packet loss rate of 1e-5 [32] in a typical DSL link.
³ At any time instance, the bitrate $R_S$ at the unicast server link needed to support the peer requests and retransmissions is a random variable. By analyzing the expected bitrate $E\{R_S\}$ we are able to compare the performance of different repair protocols. This is different from the setup in the experiments in Section 3.4.2.1, where we fix the bitrate at the unicast server link and observe the resulting probability of repair failures.
⁴ Note that the downlink bitrate $R_{PD}$ excludes the forward transmitted primary stream rate $R_{SRC}$, since we only consider repair packets in this analysis.
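As an illustration of Assumption 1, the Zipf PMF and the resulting session sizes can be computed with the short Python sketch below. The function names `zipf_pmf` and `session_sizes` are illustrative and not part of the original work; session sizes are left fractional, exactly as the formula $\tilde{N}_m = p_Z(m; s, \tilde{M}) \cdot \tilde{N}$ produces them.

```python
def zipf_pmf(m, s, M):
    """Zipf PMF p_Z(m; s, M) for popularity rank m in 1..M and exponent s >= 0."""
    normalizer = sum(1.0 / n**s for n in range(1, M + 1))
    return (1.0 / m**s) / normalizer

def session_sizes(N, s, M):
    """Expected number of peers in each multicast session, ordered by rank."""
    return [zipf_pmf(m, s, M) * N for m in range(1, M + 1)]

# Usage: 1000 receivers spread over 4 sessions with exponent s = 1.
sizes = session_sizes(1000, 1.0, 4)
```

With $s = 0$ the distribution is uniform; larger $s$ concentrates receivers in the most popular sessions, which leaves the low-ranked sessions with few potential helping peers.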

3.3.2 Model for Packet Repair Protocols

Consider that source packets $\{s_i\}_{i \geq 0}$ are multicast to the end-users. Denote by $B_{SRC}$ the packet size of the source stream. Each source packet has duration $T_{SRC} = B_{SRC} / R_{SRC}$. A packet is considered lost if a portion of it is corrupted. Assume the starting location of the corruption is uniformly distributed over $T_{SRC}$. For $T_{BB} \gg T_{BURST}$, the packet loss rate given $t_{BURST}$ and $t_{BB}$ can be approximated as

$$P(\Omega \setminus A_S \mid t_{BURST}, t_{BB}) = \frac{t_{BURST} + T_{SRC}}{t_{BB}}. \qquad (3.3)$$

Now consider forward error correction. For an erasure burst $\mathcal{B}$ of duration $t_{BURST}$, following (3.3), the number of packets it corrupts (averaged over the starting location of corruption) is

$$B := |\mathcal{B}| = \frac{t_{BURST}}{T_{SRC}} + 1. \qquad (3.4)$$

Let $M_{FWD} := |\mathcal{P}_{FWD}|$. The outcome $\omega_i = A_{FWD}$ if $\omega_i \notin A_S$ and $B \leq M_{FWD}$ (see Section 3.2.2, (3.1)). The probability of event $A_{FWD}$ (i.e., that a packet can be repaired by forward error correction) is thus

$$P(A_{FWD}) = E\left\{ \frac{t_{BURST} + T_{SRC}}{t_{BB}} \cdot \mathbf{1}\left( \frac{t_{BURST}}{T_{SRC}} + 1 \leq M_{FWD} \right) \right\},$$

where $\mathbf{1}(\cdot)$ denotes the indicator function. Given $t_{BURST}$ and $t_{BB}$, the probability of requesting a repair packet is

$$P_{RET} := P(A_{RET} \cup A_F \mid t_{BURST}, t_{BB}) = \frac{t_{BURST} + T_{SRC}}{t_{BB}} \cdot \mathbf{1}\left( \frac{t_{BURST}}{T_{SRC}} + 1 > M_{FWD} \right).$$

Overall, the probability of requesting a repair packet is

$$P_{RET} := P(A_{RET} \cup A_F) = E\{P_{RET}\}, \qquad (3.5)$$

which can be calculated using (3.5), with the $P_{RET}$ inside the expectation denoting the conditional probability given $t_{BURST}$ and $t_{BB}$ defined above.
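The conditional quantities in (3.3) and (3.4), and the conditional probability of requesting a repair, can be written down directly, as in the following Python sketch. The function names are hypothetical. The last helper is an added observation rather than a formula from the text: under the exponential burst-duration assumption of Section 3.3.1, the probability that a burst exceeds the FEC capability, $P(t_{BURST}/T_{SRC} + 1 > M_{FWD})$, equals $\exp(-(M_{FWD} - 1)\, T_{SRC} / T_{BURST})$ for $M_{FWD} \geq 1$.

```python
import math

def loss_rate(t_burst, t_bb, T_src):
    """Approximate packet loss rate given burst duration and inter-burst time, (3.3)."""
    return (t_burst + T_src) / t_bb

def burst_packets(t_burst, T_src):
    """Average number of packets corrupted by a burst of duration t_burst, (3.4)."""
    return t_burst / T_src + 1

def p_ret_conditional(t_burst, t_bb, T_src, M_fwd):
    """Conditional probability of requesting a repair packet, given t_burst and t_bb."""
    if burst_packets(t_burst, T_src) > M_fwd:   # burst too long for the forward parity
        return loss_rate(t_burst, t_bb, T_src)
    return 0.0

def prob_burst_exceeds_fec(T_burst_mean, T_src, M_fwd):
    """P(t_burst / T_src + 1 > M_fwd) for exponential t_burst with mean T_burst_mean."""
    return math.exp(-max(M_fwd - 1, 0) * T_src / T_burst_mean)

# Usage with arbitrary illustrative numbers (seconds): a 1 s burst, 0.25 s packets,
# 100 s between bursts. The burst corrupts 5 packets on average.
corrupted = burst_packets(1.0, 0.25)
needs_retransmission = p_ret_conditional(1.0, 100.0, 0.25, M_fwd=4)
repaired_by_fec = p_ret_conditional(1.0, 100.0, 0.25, M_fwd=5)
```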

3.3.2.1 Server-Assisted Repair (SAR)

For SAR, the expected number of receivers sending requests to the unicast server is $P_{RET} \tilde{N}$. The expected unicast server bitrate is

$$E\{R_S^{SAR}\} = P_{RET} \tilde{N} \cdot R_{RET}\, \beta.$$

Trivially, the normalized uplink bitrate needed to support SAR at each receiver link is

$$\tilde{R}_{PU}^{SAR} = 0,$$

since no uplink is involved in SAR, and the normalized downlink bitrate is

$$\tilde{R}_{PD}^{SAR} = 1.$$

Given sufficient server bitrate, we assume that the repair handled by the server is reliable, i.e., $P(A_F^{SAR}) = 0$.

3.3.2.2 Peer-Assisted Repair (PAR)

We analyze the two variations of the PAR protocol: PAR-CT and PAR-DT. We also analyze three cases of applying coding across packets: the uncoded case, the separately coded case (both described in Section 3.1.1) and the jointly coded case (described in Section 3.2.3). Our analysis proceeds in two steps: (i) We first find, for each Multicast Session $m$, $m = 1, \ldots, \tilde{M}$, the probability of the unicast server directly handling the repair, and the probability of repair failure, both as a function of the multicast session peer size $\tilde{N}_m$. (ii) We then compute the overall unicast server bitrate required, as well as the probability of repair failure for the uncoded, separately coded and jointly coded cases.

Consider Multicast Session $m$ of size $\tilde{N}_m$. In PAR, there are several possibilities that may cause a peer receiver to be unresponsive to a request: (i) The peer may have just departed the multicast session and the request is sent before the notification arrives. Denote this event by $A_{DEPT}$ and the probability of this event (referred to as the peer departure rate) by $\gamma_{DEPT} := P(A_{DEPT})$. (ii) The peer's corresponding packet is also lost and not repairable by FEC. (iii) The peer's uplink is congested, denoted by $A_{CONG}$. The event that a peer is unresponsive to a request, denoted by $A_{NR}$, has conditional probability

$$P(A_{NR}) = \gamma_{DEPT} + (1 - \gamma_{DEPT}) P_{RET} + (1 - \gamma_{DEPT})(1 - P_{RET}) P(A_{CONG}). \qquad (3.6)$$

To find $P(A_{CONG})$, note that this is equivalent to the probability that the instantaneous demand on the uplink is beyond its capacity. Given normalized uplink bitrate $\tilde{R}_{PU}^{PAR}$ (assumed to be integer), each peer can serve up to $\tilde{R}_{PU}^{PAR}$ repair requests simultaneously. We further introduce the practical constraint that each peer can communicate with at most $n_{NB}$ neighboring peers, and $n_{NB}$ neighboring peers can receive repair packets from that peer. Then $P(A_{CONG})$ can be expressed as

$$P(A_{CONG}) = \sum_{k = \tilde{R}_{PU}^{PAR}}^{\tilde{n}_{NB}} b\left(k; \tilde{n}_{NB}, \frac{P_{RET}}{\tilde{n}_{NB}}\right), \qquad (3.7)$$

where $\tilde{n}_{NB} := \min(\tilde{N}_m - 1, n_{NB})$ and $b(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k}$ denotes the binomial distribution.

We have specified $\tilde{R}_{PU}^{PAR}$ as a parameter in (3.7). Now let us consider $\tilde{R}_{PD}^{PAR}$ (for both PAR-CT and PAR-DT). Denote by $\mu$ the redundancy of repair packets. For the uncoded case, it is the number of peers each request is sent to. For the separately/jointly coded case, it is the average redundancy degree for each lost packet, which can be a fractional number. The normalized peer downlink bitrate necessary to support PAR is

$$\tilde{R}_{PD}^{PAR} = \mu. \qquad (3.8)$$
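The unresponsiveness model in (3.6) and (3.7) can be evaluated numerically as in the Python sketch below. It is a sketch under assumptions: the session size is taken to be an integer, the summation in (3.7) starts at $k = \tilde{R}_{PU}^{PAR}$ as written there, and the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial distribution b(k; n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_congestion(p_ret, N_m, n_nb, R_pu):
    """P(A_CONG) following (3.7); N_m, n_nb and R_pu are integers."""
    n_tilde = min(N_m - 1, n_nb)          # effective number of neighboring peers
    if n_tilde <= 0:
        return 0.0                        # no neighbors, hence no uplink demand
    p = p_ret / n_tilde
    return sum(binom_pmf(k, n_tilde, p) for k in range(R_pu, n_tilde + 1))

def p_unresponsive(gamma_dept, p_ret, p_cong):
    """P(A_NR) following (3.6): departure, own packet loss, or uplink congestion."""
    return (gamma_dept
            + (1 - gamma_dept) * p_ret
            + (1 - gamma_dept) * (1 - p_ret) * p_cong)

# Usage with arbitrary illustrative numbers.
p_cong = p_congestion(p_ret=0.01, N_m=50, n_nb=10, R_pu=2)
p_nr = p_unresponsive(gamma_dept=0.02, p_ret=0.01, p_cong=p_cong)
```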

Now consider the server bitrate $R_S$. For PAR-CT, the unicast server traffic consists of the request packets forwarded and the repair packets directly sent by the server. Recall that if the group size $\tilde{N}_m$ is no larger than $\mu$, the request is handled by the server. The expected number of peers the server directly sends repairs to is $P_{RET} \sum_{m: \tilde{N}_m \le \mu} \tilde{N}_m$. The expected unicast server bitrate for PAR-CT can be expressed as

$$E\{R_S^{PAR\text{-}CT}\} = P_{RET} \sum_{m: \tilde{N}_m \le \mu} \tilde{N}_m \cdot R_{RET}\,\beta \;+\; P_{RET} \sum_{m: \tilde{N}_m > \mu} \tilde{N}_m \cdot \mu R_{CTL},$$

where $R_{CTL}$ is the bitrate for the control information (i.e., the request). For PAR-DT, the unicast server traffic only consists of the repair packets directly sent by the server, thus its expected bitrate can be expressed as

$$E\{R_S^{PAR\text{-}DT}\} = P_{RET} \sum_{m: \tilde{N}_m \le \mu} \tilde{N}_m \cdot R_{RET}\,\beta.$$
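The two server bitrate expressions can be evaluated with a few lines of Python. This is a minimal sketch: the function name, the list of group sizes and the bitrate values are hypothetical, and the group sizes are taken as given inputs rather than derived from the Zipf popularity model.

```python
def expected_server_bitrate(group_sizes, p_ret, r_ret, beta, mu, r_ctl):
    """Return (E{R_S} for PAR-CT, E{R_S} for PAR-DT).

    group_sizes : sizes N_m of the multicast groups
    p_ret       : probability that a peer needs a retransmission
    r_ret       : retransmission bitrate per repaired peer
    beta        : SLEP compression ratio
    mu          : repair redundancy
    r_ctl       : bitrate of the control (request) traffic
    """
    # Groups no larger than mu are repaired directly by the server.
    served_by_server = sum(n for n in group_sizes if n <= mu)
    # Larger groups are repaired by peers; PAR-CT still forwards requests.
    served_by_peers = sum(n for n in group_sizes if n > mu)

    repair_rate = p_ret * served_by_server * r_ret * beta
    request_rate = p_ret * served_by_peers * mu * r_ctl

    par_ct = repair_rate + request_rate
    par_dt = repair_rate
    return par_ct, par_dt


if __name__ == "__main__":
    # Hypothetical example: two singleton groups and two larger groups.
    ct, dt = expected_server_bitrate(
        group_sizes=[1, 1, 5, 10], p_ret=0.01, r_ret=0.5,
        beta=1.0, mu=1, r_ctl=0.02)
    print(ct, dt)
```

The only difference between the two schemes in this sketch is the request-forwarding term, which PAR-DT avoids because peers contact each other directly.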

We want to evaluate the PAR protocol repair failure probability for the uncoded, separately coded and jointly coded cases. For a multicast session of size $\tilde{N}_m$, the probability of repair failure conditioned on retransmission is $P(A_F \mid A_{RET} A_F, \tilde{N}_m)$. Given $P(A_F \mid A_{RET} A_F, \tilde{N}_m)$, the overall probability of failure can be evaluated as

$$P(A_F^{PAR}) = \sum_{m=1}^{\tilde{M}} P(A_F \mid \tilde{N}_m)\, p_Z(m; s, \tilde{M}) = \sum_{m=1}^{\tilde{M}} E\{P_{RET}\, P(A_F \mid A_{RET} A_F, \tilde{N}_m)\}\, p_Z(m; s, \tilde{M}).$$

The uncoded, separately coded and jointly coded cases differ only in the way that $P(A_F \mid A_{RET} A_F, \tilde{N}_m)$ is calculated.

Uncoded Case. If $\tilde{N}_m > \mu$, the repair is handled by the peers. In this case, given redundancy $\mu \in \mathbb{N}$, the probability of repair failure is $P(A_{NR})^{\mu}$. Otherwise the repair is handled by the unicast server, in which case we assume that the repair is reliable. Overall, for a multicast session of size $\tilde{N}_m$, the conditional repair failure probability is

$$P(A_F \mid A_{RET} A_F, \tilde{N}_m) = \begin{cases} P(A_{NR})^{\mu}, & \tilde{N}_m > \mu, \\ 0, & \text{otherwise.} \end{cases}$$

Separately Coded Case. Similarly, if $\tilde{N}_m > 1$, the repair is handled by the peers. Recall that the number of corrupted packets $B$ is computed by (3.4). The encoding generates $\mu B$ packets. The repair is successful if at least $B$ out of the $\mu B$ packets are received. Assume that the $\mu B$ repair packets are transmitted with independent loss probability.⁵ The conditional repair failure probability is

$$P(A_F \mid A_{RET} A_F, \tilde{N}_m) = \begin{cases} \sum_{k=0}^{B-1} b(k;\, \mu B,\, 1 - P(A_{NR})), & \tilde{N}_m > \mu, \\ 0, & \text{otherwise,} \end{cases}$$

where $b(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k}$ denotes the binomial distribution.

Jointly Coded Case. The jointly coded case differs from the separately coded case in that the $M_{FWD}$ FEC packets can help in the decoding. Assuming the worst case, in which the burst always falls into one coding block, we have $B = |\mathcal{B} \cap (\mathcal{S} \cup \mathcal{P}_{FWD})|$. The repair is successful if at least $B - M_{FWD}$ out of the $\mu B$ packets are received. Similarly, assume that the $\mu B$ packets are transmitted with independent loss probability. The conditional repair failure probability is

$$P(A_F \mid A_{RET} A_F, \tilde{N}_m) = \begin{cases} \sum_{k=0}^{B - M_{FWD} - 1} b(k;\, \mu B,\, 1 - P(A_{NR})), & \tilde{N}_m > \mu, \\ 0, & \text{otherwise.} \end{cases}$$
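To compare the three cases numerically, the sketch below evaluates the peer-handled branch ($\tilde{N}_m > \mu$) of each conditional failure probability. The function names are illustrative, the sample values are hypothetical, and rounding $\mu B$ up to an integer for fractional $\mu$ is an assumption of this sketch rather than something stated in the analysis.

```python
from math import ceil, comb


def binom_pmf(k, n, p):
    """Binomial probability b(k; n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def fail_uncoded(p_nr, mu):
    """All mu independently requested peers fail to respond."""
    return p_nr**mu


def fail_coded(p_nr, mu, burst_len, m_fwd=0):
    """Fewer than (burst_len - m_fwd) of the mu*burst_len packets arrive.

    m_fwd = 0 gives the separately coded case; m_fwd > 0 gives the
    jointly coded case, where forward FEC packets help the decoding.
    """
    n_sent = ceil(mu * burst_len)        # assumption: round up if fractional
    needed = max(burst_len - m_fwd, 0)   # packets required for decoding
    p_arrive = 1.0 - p_nr
    return sum(binom_pmf(k, n_sent, p_arrive) for k in range(needed))


if __name__ == "__main__":
    p_nr = 0.1  # hypothetical no-response probability
    print("uncoded ", fail_uncoded(p_nr, mu=2))
    print("separate", fail_coded(p_nr, mu=2, burst_len=4))
    print("joint   ", fail_coded(p_nr, mu=2, burst_len=4, m_fwd=1))
```

For a burst of length one and no forward FEC, the coded expression reduces to the uncoded one, which is a quick sanity check on the formulas.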

⁵ This independence assumption makes it possible to express the repair failure probability in a closed form, which can serve as a lower bound for the more general dependent case.

3.3.3 Numerical Results

In this section, we present numerical results obtained from the analytical model. The parameters used in obtaining the plots are selected from Table 3.2. Most of the parameters are chosen to mimic a practical setup; others are chosen while taking into consideration the simulation constraints (e.g., the number of peers is set to 100 by default, whereas in reality a unicast server could support thousands of peers).

Figure 3.5: Analysis: expected unicast server bitrate $E\{R_S\}$ as a function of the number of supported peers $\tilde{N}$, for (1) SAR, (2) PAR-CT and (3) PAR-DT. The redundancy $\mu$ is set to 1. (x-axis: number of peers $\tilde{N}$; y-axis: expected server rate $E\{R_S\}$ in Mb/sec; curves: SAR, PAR-CT, PAR-DT, PAR-CT REQ overhead.)

Figure 3.6: Analysis: probability of repair failure event $P(A_F)$ as a function of peer departure rate $\gamma_{DEPT}$, for repair redundancy $\mu = 1, 2, 3$ and peer uplink bitrate $\tilde{R}_{PU} = 0, 1, 2$ (PAR-CT and PAR-DT, uncoded case). (Curves, as $(\mu, \tilde{R}_{PU})$: (1, 0), (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2).)

We first look at the unicast server bitrate, which reflects the server burden. In Figure 3.5, we compare the expected bitrate of SAR, PAR-CT and PAR-DT. We can observe that the rate of SAR grows linearly with $\tilde{N}$. The rate of PAR-CT consists of two portions: control information (i.e., request packets) and repair packets. For PAR-DT, only the repair packets count towards the bitrate. The repair packet portion of the bitrate exhibits an interesting behavior: as $\tilde{N}$ increases, it grows until it reaches a maximum and then it drops. This can be explained as follows. As $\tilde{N}$ grows, the number of multicast groups increases, contributing to the overall rate. The number of peers within each group also increases, thus the availability of peers that possess the requested repair packets increases, alleviating the demand on the unicast server. The expected rate reaches the maximum at a point when each multicast group starts to have at least one peer.

Figure 3.6 illustrates how the redundancy degree $\mu$ and the peer uplink bitrate $\tilde{R}_{PU}$ influence the probability of repair failure under different peer departure rates. The first observation we make is that increasing repair redundancy could significantly help mitigate the impact of peer departures. However, note that a high degree of redundancy is expensive because the peer downlink bitrate needs to be increased accordingly (recall (3.8)). Secondly, we observe that further increasing $\tilde{R}_{PU}$ above 1 does not help much in reducing the repair failure rate. This suggests that excessive peer uplink bitrate is not required in the system design.

Figure 3.7: Analysis: probability of repair failure event $P(A_F)$ as a function of the peer departure rate $\gamma_{DEPT}$, for (a) uncoded case, (b) separately coded case and (c) jointly coded case. $\mu$ is set to 2; $\beta$ is set to 0.5; $\alpha_{FWD}$ and $\alpha_{RET}$ are set at 5% and 5%, respectively.

Figure 3.8: Analysis: probability of repair failure event $P(A_F)$ as a function of the peer departure rate $\gamma_{DEPT}$, for (a) $\beta = 1$, (b) $\beta = 1/2$ and (c) $\beta = 1/3$. We set $\mu\beta = \tilde{R}_{PD} = 1$. The packets are jointly coded with $\alpha_{FWD}$ and $\alpha_{RET}$ set at 5% and 5%, respectively.

To examine the influence of applying coding across the repair packets, we plot the repair failure rate for the uncoded case, separately coded case and jointly coded case in Figure 3.7. The advantage of applying coding in mitigating the repair failure is prominent. As expected, the jointly coded case achieves even better performance than the separately coded case, as it takes into account all the redundancy in the network/transport/application layers in correcting the packet erasures.

Lastly, we examine the effect of SLEP compression of the parity packets. Figure 3.8 shows the repair failure rate when we apply SLEP compression ratios $\beta = 1, 1/2, 1/3$. In each case, the repair redundancy $\mu$ (in the jointly coded case) is set such that $\mu\beta = 1$. This ensures that the bitrate used for error protection is the same in all cases. It is shown that when SLEP is applied, the repair failure rate can be lowered dramatically. However, note that in SLEP, the reconstructed video quality also degrades if an erasure occurs. In Section 3.4, we will evaluate the performance in terms of the reconstructed video quality through simulations.

3.4 Simulation Results

In this section, the performance and properties of the PAR protocol, as well as its joint performance with the application-layer error protection scheme SLEP/SLEPr, are thoroughly evaluated through simulations.

3.4.1 Simulation Setup

We have implemented a simulation program for the peer-assisted packet loss repair system for video multicast in Network Simulator (ns-2) [11] and a modified H.264/AVC reference software JM (version 13.2) [8]. Table 3.2 summarizes the parameters used in the simulations. To allow simulations of up to 1000 peers, we have made some simplifications. Only the repair packets (but not the forward transmitted parity packets) are simulated. The channel error model is assumed to be on-off exponential, which corresponds to the two-state Markov model assumed in Section 3.3. By picking proper values for $T_{BURST}$ and $T_{BB}$, we simulate an environment with a mean erasure burst length of 2 to 16 ms and a packet loss rate (PLR) on the order of 1e-3.

Parameter                                    Default Value   Range
Nb. of peers $\tilde{N}$                     100             10 to 1000
Nb. of multicast sessions $\tilde{M}$        200
Zipf distr. shape $s$                        1
Primary stream bitrate $R_{SRC}$             5 Mb/sec
Video duration                               300 frames      240 to 300 frames
FEC delay $D_{FWD}$                          50 ms
Retrans. delay $D_{RET}$                     0.5 sec         0.1 to 0.5 sec
Peer-router delay                            20 ms
Router-unicast server delay                  0 ms
FEC budget $\alpha_{FWD}$                    0%              0 to 5%
Retrans. budget $\alpha_{RET}$               10%             5 to 10%
Mean burst length $T_{BURST}$                8 ms            2 to 16 ms
Mean time btw burst $T_{BB}$                 2 sec
Packet loss rate $P(\Omega \setminus A_S)$   0.4%            0.2% to 0.8%
Req. pkt size $B_{CTL}$                      64 Bytes
Primary pkt size $B_{SRC}$                   1375 Bytes
Repair redundancy $\mu$                      1               1 to 3
SLEP compression ratio $\beta$               1               0.5 to 1
Peer departure rate $\gamma_{DEPT}$          0               0 to 0.25
Peer uplink bitrate $\tilde{R}_{PU}$         1               0.1 to 3
Peer downlink bitrate $\tilde{R}_{PD}$       1               0.1 to 3
Number of peer neighbors $n_{NB}$            10

Table 3.2: Simulation parameters.

When the received video quality is evaluated, the results are based on six 4CIF format (i.e., having a spatial resolution of 704×576) sequences of diverse rate-distortion characteristics: Soccer, Ice, City, Harbour, Crew and Spincal. The sequences are encoded at 30 frames per second with an H.264/AVC JM encoder at QP 25 for I/P frames and 27 for B frames. The GOP structure is chosen to be IBBBP and the GOP size is 33.
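The relation between the on-off exponential channel parameters and the packet loss rate can be checked with a small Python sketch. It is not the ns-2 model used in the simulations: the function names are hypothetical, and it assumes that $T_{BB}$ is the mean duration of the loss-free state between bursts, so that the long-run loss rate is $T_{BURST} / (T_{BURST} + T_{BB})$.

```python
import random


def stationary_loss_rate(t_burst, t_gap):
    """Long-run fraction of time spent in the burst (loss) state."""
    return t_burst / (t_burst + t_gap)


def simulate_on_off(t_burst, t_gap, packet_interval, n_packets, seed=0):
    """Return a list of booleans, True where a packet falls in a burst.

    State durations are exponential with means t_burst (loss state)
    and t_gap (loss-free state); all times are in seconds.
    """
    rng = random.Random(seed)
    in_burst = False
    state_end = rng.expovariate(1.0 / t_gap)  # start in the loss-free state
    lost = []
    for i in range(n_packets):
        t = i * packet_interval
        while t >= state_end:                 # advance to the state covering t
            in_burst = not in_burst
            mean = t_burst if in_burst else t_gap
            state_end += rng.expovariate(1.0 / mean)
        lost.append(in_burst)
    return lost


if __name__ == "__main__":
    # Defaults from Table 3.2: 8 ms bursts, 2 s between bursts,
    # 1375-byte packets at 5 Mb/sec (one packet every 2.2 ms).
    interval = 1375 * 8 / 5e6
    print(stationary_loss_rate(0.008, 2.0))
    trace = simulate_on_off(0.008, 2.0, interval, 200_000, seed=1)
    print(sum(trace) / len(trace))
```

With the default values the stationary rate is about 0.4%, which agrees with the default packet loss rate listed in Table 3.2.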

3.4.2 Effects of Link Bitrates

We examine the effect of the three link bitrates of interest: the server link bitrate $R_S$, the peer uplink bitrate $\tilde{R}_{PU}$ and the peer downlink bitrate $\tilde{R}_{PD}$.

3.4.2.1 Unicast Server Bitrate

We first investigate the bitrate needed at the unicast server to support loss repairs in different schemes. For this experiment, we fix the target PLR and find the bitrate needed to achieve this PLR. Figure 3.9 plots the unicast server bitrate obtained in simulations. Note that in Figure 3.5 the y-axis is the expected bitrate, which is lower than the bitrate for a PLR of 3e-4 in Figure 3.9. Nevertheless, we observe a qualitative agreement between the simulation results and the analytical model. Both the PAR-CT and PAR-DT curves exhibit a lower server bitrate than the SAR curve. Furthermore, the PAR-DT curve grows as the number of peers increases until it reaches a maximum and then it drops. Refer to Section 3.3.3 for more explanations on the curve shapes.

Figure 3.9: Simulation: unicast server bitrate $R_S$, for a post-repair PLR of 3e-4, as a function of the number of supported peers $\tilde{N}$, for (1) SAR, (2) PAR-CT and (3) PAR-DT. Video sequence: Soccer. (x-axis: number of peers $\tilde{N}$; y-axis: $R_S$ in Mb/sec.)

3.4.2.2 Peer Uplink Bitrate

Compared to SAR, PAR-CT and PAR-DT require additional uplink bitrate from each peer. Figure 3.10 plots the repair failure probability versus the normalized peer uplink bitrate $\tilde{R}_{PU}$. We observe that no matter how much redundancy is introduced, it is sufficient to have the peer uplink bandwidth matched to the downlink bandwidth reserved for repair packets. The reason is that, for loss uncorrelated on different links, the chance that a peer is burdened by simultaneous repair requests from more than one peer is small. This result was expected based on the model analysis in Figure 3.6, where increasing $\tilde{R}_{PU}$ above 1 does not reduce the repair failure rate.

Figure 3.10: Simulation: probability of repair failure event $P(A_F)$ as a function of the normalized peer uplink bitrate $\tilde{R}_{PU}$, for (a) PAR-CT and (b) PAR-DT with request redundancy $\mu = 1, 2, 3$, uncoded case. The results are measured at $\gamma_{DEPT} = 0.1$. Video sequence: Soccer.

3.4.2.3 Peer Downlink Bitrate

According to (3.8), the peer downlink bandwidth needs to be increased with the repair redundancy to accommodate more packet arrivals and avoid congestion. This is confirmed by the simulation result in Figure 3.11. The plots show that in the uncoded case, if we keep the downlink bitrate low (e.g., $\tilde{R}_{PD} = 1$), repair redundancy does not necessarily reduce the repair failure probability. In other words, the downlink bandwidth must grow in accordance with the repair redundancy. Therefore, using uncoded repairs with high redundancy is not an effective repair strategy. We will show in Section 3.4.3 that coded repairs with moderate redundancy (e.g., $\mu = 3/2$) are more effective in reducing the repair failure probability, even if we keep $\tilde{R}_{PD} = 1$.

Figure 3.11: Simulation: probability of repair failure event $P(A_F)$ as a function of the normalized peer downlink bitrate $\tilde{R}_{PD}$, for (a) PAR-CT and (b) PAR-DT with redundancy $\mu = 1, 2, 3$, uncoded case. The results are measured at $\gamma_{DEPT} = 0.1$. Video sequence: Soccer.

3.4.3 Effect of Repair Redundancy

Figure 3.12 shows the repair failure probability as a function of the unicast server bitrate $R_S$, for various degrees of redundancy. Increasing the (uncoded) redundancy of repair packets is an effective tool to mitigate the peer departures, but at the expense of potentially congesting the communication links. This is evidenced by the ineffectiveness of increasing the redundancy from 2 to 3, which confirms the observation we made in Section 3.4.2.3. If we instead use a moderate redundancy (e.g., $\mu = 3/2$) and apply Reed-Solomon codes across the redundant packets, as shown in Figure 3.12, we can avoid the peer downlink congestion while further reducing the PLR in the presence of peer departures. Applying coding across packets is a more effective solution for mitigating the impact of peer departures.

Figure 3.12: Simulation: probability of repair failure event $P(A_F)$ as a function of unicast server bitrate $R_S$ for (1) PAR-CT and (2) PAR-DT at different redundancy and uncoded/coded cases. The results are measured at $\gamma_{DEPT} = 0.1$. Video sequence: Soccer. (Curves: $\mu = 1$; $\mu = 2$, uncoded; $\mu = 3$, uncoded; $\mu = 3/2$, coded.)

3.4.4 Effect of Correlated Loss

PAR is most efficient when the packet losses among peers are uncorrelated. This is because, when losses are uncorrelated, the probability that the peers lose the same portion of the stream is small. Figure 3.13 plots the repair failure probability at different correlation degrees. For some percentage of peers, the loss is fully correlated, whereas for the rest it is uncorrelated. The plot shows that the PLR becomes quite significant when 30% or more of the peers are correlated. However, note that in many practical cases (e.g., DSL systems), the loss is correlated only in very rare events (e.g., lightning strikes). Furthermore, PAR-DT appears to be somewhat less susceptible to correlated loss than PAR-CT. This is because most of the requests in PAR-DT do not need to go through the unicast server link; therefore they are not affected by the congestion created by the simultaneous requests in the bandwidth-limited server link.

3.4.5 End-to-End Latency

We verify the delay characteristics of SAR, PAR-CT and PAR-DT. We select a unicast server bitrate sufficient to support all schemes, and vary the pre-determined playout delay. Figure 3.14 illustrates the repair failure probability as a function of the retransmission delay. The result suggests that PAR-CT and PAR-DT require an additional playout delay of about twice the peer-router delay (40 ms) compared with SAR. This is not unexpected: for each repair request, PAR-CT and PAR-DT traverse two hops more than SAR. Notice that, compared to the target retransmission delay of 500 ms (see Table 3.2), the additional 40 ms is a rather short delay.

Figure 3.13: Simulation: probability of repair failure event $P(A_F)$ as a function of unicast server bitrate $R_S$ at different percentage of correlated peers. The results are measured at $\gamma_{DEPT} = 0.1$. Video sequence: Soccer. (Panels: PAR-CT and PAR-DT; curves: 0%, 5%, 10%, 20%, 30%, 40% and 100% correlated peers.)

Figure 3.14: Simulation: probability of repair failure event $P(A_F)$ as a function of retransmission delay $D_{RET}$ for (1) SAR, (2) PAR-CT and (3) PAR-DT. The unicast server bitrate is set such that at an excessive delay of $D_{RET} = 0.6$ sec, all schemes experience no packet repair failure within 100 runs of experiments.

3.4.6 Joint PAR-SLEP/SLEPr Error Protection

In this section, we evaluate the performance of the jointly coded scheme combining PAR and SLEP/SLEPr. Since SLEP/SLEPr causes a slight video quality degradation even if the corrupted packet is successfully repaired, we evaluate different systems based on the received video quality.

Figures 3.15 and 3.16 show the received video quality (in terms of the empirical cumulative distribution function (CDF) of frame PSNR) of the six test sequences, transmitted with PAR-CT and PAR-DT in uncoded, separately coded and jointly coded settings at $T_{BURST} = 8$ ms. As expected, for all the test sequences, the gain of applying coding across packets over the uncoded case is reflected in the CDFs of reconstructed video quality. The jointly coded case achieves even better performance than the separately coded case, as it combines the redundancy in the network/transport/application layers in correcting the packet erasures.

To evaluate the effect of repair redundancy and SLEP parity packet compression, we present results of received video quality for various schemes in Figure 3.17 and Figure 3.18. We compare the video quality for three schemes with different combinations of redundancy $\mu$ and SLEP compression ratio $\beta$: (i) $\beta = 0.5$, $\mu = 2$, (ii) $\beta = 1$, $\mu = 1$ and (iii) $\beta = 1$, $\mu = 2$. It can be seen that both (ii) and (iii) incur worse video quality. Case (ii) fails because it does not provide redundancy against peer departures, whereas case (iii) fails due to peer downlink congestion. In comparison, case (i) demonstrates robust performance, as it avoids congestion by transmitting packets of reduced sizes. For a deeper understanding of the video reconstruction quality in these schemes, in Figure 3.19 we select some peers and plot the frame PSNR trace, and select some frames and plot the frame PSNR for all the peers.

The results show that the video quality can be made robust by applying redundant repairs jointly with SLEP/SLEPr to resist bursty packet erasures and overcome peer departures while avoiding the peer downlink congestion.
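The trade-off among the three schemes can be made concrete with a short calculation. The sketch below assumes, following the setting $\mu\beta = \tilde{R}_{PD}$ used for Figure 3.8, that the normalized repair load on the peer downlink is the product $\mu\beta$; the function name and the returned fields are hypothetical.

```python
def assess_scheme(mu, beta, r_pd=1.0):
    """Classify a (redundancy, compression ratio) pair.

    mu   : repair redundancy
    beta : SLEP compression ratio (1 means no compression)
    r_pd : normalized peer downlink bitrate reserved for repairs
    """
    load = mu * beta  # assumed normalized repair load on the downlink
    return {
        "load": load,
        "has_redundancy": mu > 1,   # protection against peer departures
        "congested": load > r_pd,   # load exceeds the reserved downlink
    }


if __name__ == "__main__":
    schemes = {"(i)": (2, 0.5), "(ii)": (1, 1.0), "(iii)": (2, 1.0)}
    for name, (mu, beta) in schemes.items():
        print(name, assess_scheme(mu, beta))
```

Under this simplified view, only scheme (i) offers redundancy without exceeding the reserved downlink bitrate, which is consistent with the qualitative explanation given above.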

Figure 3.15: Simulation: CDF of frame PSNR over 100 peers at peer departure rate $\gamma_{DEPT} = 0.1$ for PAR-CT and PAR-DT in the uncoded, separately coded and SLEP/SLEPr jointly coded settings. The FEC budget and retransmission budget is set at 5% and 5%, respectively. $T_{BURST}$ is set to 8 ms. Video sequence: (a) Soccer, (b) City and (c) Crew. (x-axis: frame PSNR in dB; y-axis: CDF; curves: Uncoded, Separate, Joint.)

Figure 3.16: Simulation: CDF of frame PSNR over 100 peers at peer departure rate $\gamma_{DEPT} = 0.1$ for PAR-CT and PAR-DT in the uncoded, separately coded and SLEP/SLEPr jointly coded settings. The FEC budget and retransmission budget is set at 5% and 5%, respectively. $T_{BURST}$ is set to 8 ms. Video sequence: (d) Harbour, (e) Ice and (f) Spincal. (x-axis: frame PSNR in dB; y-axis: CDF; curves: Uncoded, Separate, Joint.)

Figure 3.17: Simulation: CDF of frame PSNR over 100 peers at peer departure rate $\gamma_{DEPT} = 0.15$ for PAR-CT and PAR-DT for different combinations of redundancy $\mu$ and SLEP compression ratio $\beta$: (i) $\beta = 0.5$, $\mu = 2$, (ii) $\beta = 1$, $\mu = 1$ and (iii) $\beta = 1$, $\mu = 2$. The repair bitrate is set to be 10% of the primary video stream bitrate. Video sequence: (a) Soccer, (b) City, and (c) Crew. (x-axis: frame PSNR in dB; y-axis: CDF.)

Figure 3.18: Simulation: CDF of frame PSNR over 100 peers at peer departure rate $\gamma_{DEPT} = 0.15$ for PAR-CT and PAR-DT for different combinations of redundancy $\mu$ and SLEP compression ratio $\beta$: (i) $\beta = 0.5$, $\mu = 2$, (ii) $\beta = 1$, $\mu = 1$ and (iii) $\beta = 1$, $\mu = 2$. The repair bitrate is set to be 10% of the primary video stream bitrate. Video sequence: (d) Harbour, (e) Ice and (f) Spincal. (x-axis: frame PSNR in dB; y-axis: CDF.)

Figure 3.19: Simulation: frame PSNR for 100 peers at peer departure rate $\gamma_{DEPT} = 0.15$ for different combinations of redundancy $\mu$ and stream compression ratio $\beta$: (i) $\beta = 0.5$, $\mu = 2$, (ii) $\beta = 1$, $\mu = 1$ and (iii) $\beta = 1$, $\mu = 2$. (a) PSNR trace for peer 25, PAR-CT, (b) PSNR trace for peer 65, PAR-DT, (c) PSNR of frame 20 for all peers, PAR-CT, (d) PSNR of frame 20 for all peers, PAR-DT. Video sequence: Soccer.

Chapter Conclusion

In this chapter, we have focused on the system-level peer-assistance solution where the multicast receivers are involved in the packet loss repair process. We first presented the proposed PAR protocol, then described how to combine the PAR protocol with SLEP/SLEPr to achieve improved resistance to impulse noise and overcome the peer downlink bottleneck. We also presented an analytical model with the aim of characterizing the gains achieved by the hybrid PAR-SLEP/SLEPr scheme and the performance trade-offs. Simulations using ns-2 and a modified H.264/AVC coder were conducted to evaluate the performance.

Through analysis and simulation results, we have confirmed that in the application of packet loss repair, our approach of combining the servers and the peers in a unified fashion is able to greatly improve the system scalability while preserving the reliability offered by a pure server-based model. Additionally, we have verified that applying coding to proactively protect against uncertainties appears to be a viable way of ensuring robustness and timeliness.

Chapter 4 Peer-Assisted Fast Stream Startup

In a digital television service, the response time to a request to start a new stream experienced by a user is a crucial measure of the user's quality of experience. For commercial-grade IPTV service, to achieve user satisfaction, the response time must be kept below one second [22, 67, 56]. Compared to traditional analog television and digital cable, the startup delay in IP-based video multicast could be particularly significant. Typically, the overall startup latency is the sum of several delay components:

• Delay due to multicast operations, including the delay for leaving a previous multicast session and delay for joining a new multicast session.

• Delay due to network operations, including the delay caused by error control mechanisms such as FEC and/or retransmissions, link delay and network buffering in order to smooth out jitters.

• Acquisition delay, referring to the waiting time until the information in the stream necessary for decoding is acquired, such as program specific information (PSI), encryption keys and a video stream RAP (e.g., an I-frame).

• Decoder buffering delay, due to the need to buffer data (e.g., a video frame) before decoding and playout can commence.

We have discussed existing approaches to expediting the stream startup for video multicast in Section 2.1.3. One approach we would like to highlight here uses a dedicated unicast server to burst a unicast stream to achieve rapid acquisition of the decodable video stream, and later on switches from the unicast stream back to the multicast [166, 33, 67]. Since the unicast stream can always start from a RAP, the acquisition delay can be avoided. Furthermore, the unicast stream is bursted at a rate higher than the multicast rate, expediting the decoder buffering process. The same unicast server used for packet loss repair (as in Chapter 3) can be used to support fast stream startup, because the missing stream at the startup moment can be essentially treated as a long erasure burst.

When the unicast server is used for facilitating fast stream startup, the system scalability is limited by a bottleneck effect similar to that of the packet loss repair service. In order to overcome this limit, in this chapter, we extend the PAR protocol presented in Chapter 3 to the application of fast stream startup. Furthermore, we show that video transcoding is especially helpful in improving the system scalability, since even moderate transcoding of the video could lead to a tremendous reduction of unicast data volume.

The proposed Peer-Assisted Startup (PAS) protocol combines PAR-CT with the unicast rapid acquisition and unicast-to-multicast switching procedure. For fast stream startup, PAR-CT appears to be a more suitable candidate than PAR-DT for the following reason. If PAR-DT is used for fast stream startup, each peer receiver needs to maintain information about the new multicast session it is about to join (thus potentially information about all multicast sessions). This is different from the packet loss repair case, where each peer receiver only maintains information about its current multicast session. Thus, PAR-DT would result in much larger control overhead.

The new PAS protocol must take into account a number of differences between packet loss repair and fast stream startup:

• Duration of unicast stream. In packet loss repair, the duration of each repair packet stream is typically short (e.g., less than 100 ms); in fast stream startup, the unicast stream can last up to a few seconds. This has a few implications: (i) It is more desirable to use a reservation-based mechanism during peer-assisted stream startup, rather than relying on repair packets to be delivered on time with high probability. (ii) The chance that the peer link experiences an erasure burst during unicast streaming should not be ignored, thus it is desirable to apply erasure correcting codes to resist these errors.¹ (iii) Unlike in the peer-assisted packet loss repair case, where applying Reed-Solomon codes to short coding blocks is sufficient, it is more desirable to consider erasure correcting codes that explicitly consider the decoding delay (we will explore this topic further in Chapter 5).

• Utilization of peer downlink bitrate. In packet loss repair, the repair packet stream is transmitted in tandem with the forward-transmitted source stream and FEC stream, thus it only uses the residual bandwidth available in the peer downlink. In fast stream startup, however, after the previous multicast stream is stopped, the unicast stream may grab the full bandwidth of the peer downlink. To fully utilize this bandwidth, the peer-assistance protocol should consider the typical asymmetry of peer uplink and downlink bandwidth. Furthermore, when the new multicast stream arrives, the unicast stream may keep on transmitting, but at a lower rate to avoid link congestion. Thus, the protocol must properly pace the packets sent to the peer receiver to achieve a certain average bitrate.

• Statistics of requests. In packet loss repair, the overall load of unicast is primarily affected by the rate of error occurrence on the communication links; in fast stream startup, the load is mainly affected by the incoming rate of startup requests. In addition, in peer-assisted packet loss repair, the load is affected by the current receiver distribution among the multicast sessions; in fast stream startup, it is also affected by the likelihood of which multicast session a new receiver is to join. Thus, a different statistical model must be used in the system behavior analysis and parameter selection.
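The pacing requirement mentioned in the list above can be illustrated with a minimal Python sketch. It assumes fixed-size packets sent at evenly spaced instants, which is only one simple way to reach a target average bitrate; the function names are hypothetical and this is not the pacing mechanism of the PAS protocol itself.

```python
def pacing_interval(packet_bytes, target_bps):
    """Seconds between packet departures for a target average bitrate."""
    if target_bps <= 0:
        raise ValueError("target bitrate must be positive")
    return packet_bytes * 8 / target_bps


def send_schedule(n_packets, packet_bytes, target_bps, start=0.0):
    """Evenly spaced departure times (in seconds) for n_packets packets."""
    dt = pacing_interval(packet_bytes, target_bps)
    return [start + i * dt for i in range(n_packets)]


if __name__ == "__main__":
    # Example: 1375-byte packets. The unicast stream first bursts at
    # 10 Mb/sec, then slows to 1 Mb/sec once the multicast stream
    # arrives (both rates are illustrative assumptions).
    print(pacing_interval(1375, 10e6))   # spacing during the burst phase
    print(pacing_interval(1375, 1e6))    # spacing after the multicast join
    print(send_schedule(3, 1375, 5e6))
```

Lowering the target bitrate after the multicast join simply stretches the spacing between packets, which keeps the combined unicast and multicast load within the downlink capacity.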
In the sequel, after describing the stream startup procedures in Section 4.1, we model and analyze the various protocols/schemes and discuss their parameter optimization in Section 4.2. In Section 4.3, we present simulation results based on ns-2, and evaluate the performance in terms of server bitrate, channel change latency and unicast burst duration.

¹ In principle, retransmissions could also be applied, but the worst-case latency would be longer.

The notations used in this chapter are summarized in Table 4.1.

Notation                          Explanation
$\tilde{N}$                       Number of receivers currently in multicast sessions
$\tilde{M}$                       Number of multicast sessions
$\tilde{N}_m$                     Number of receivers in Multicast Session $m$
$s$; $p$                          Source packet; parity packet
$\mathcal{B}$; $B$                Index set for erasure burst; erasure burst length $B := |\mathcal{B}|$
$\mathcal{R}_m$                   Receiver group of Multicast Session $m$
$(\kappa_n, \kappa_n^{MAX})$      Number of reservations made on Peer $n$; its maximum number
$\mathcal{U}$                     Index set for unicast stream
$\mu_U$; $\mu_M$                  Redundancy of unicast stream; redundancy of multicast stream
$M$                               Number of simultaneous helping peers (PAS)
$\beta$                           Transcoding compression ratio
$T_J$; $I_J$                      Multicast join delay; multicast join instant
$T_C$                             Block erasure coding delay
$T_B$; $B_B$                      Decoder buffering delay; decoder buffer data size
$R_{PU}$; $R_{PD}$                Peer receiver uplink bitrate; peer receiver downlink bitrate
$e$                               Excessive bandwidth factor (e-factor)
$\lambda_{SS}$                    Stream startup request arrival rate
$T_{SS}$                          Stream startup latency
$T_U$; $B_U$                      Unicast stream duration; unicast stream data size
$R_S$                             Unicast server bitrate
$R_{SRC}$                         Bitrate of the source video stream
CR                                Coding rule
MS($\cdot$)                       Operator that maps a peer to a multicast session
$p_Z(m; s, \tilde{M})$            Zipf probability distribution with parameters $s$ and $\tilde{M}$
$\gamma_{DEPT}$                   Peer departure rate
$A_{NR}$                          Event that a peer is unresponsive to a request
$b(k; n, p)$                      Binomial distribution $b(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k}$
$\phi_m$                          Number of peers currently reserved in $\mathcal{R}_m$
$E\{\cdot\}$                      Expectation of a random variable

Table 4.1: Notations used in Chapter 4.


Figure 4.1: Procedure of Server-Assisted Startup (SAS).

4.1 Stream Startup Procedures

4.1.1 Server-Assisted Startup (SAS)

We refer to the fast stream startup mechanism that uses a dedicated unicast server to achieve rapid acquisition as Server-Assisted Startup (SAS), as illustrated in Figure 4.1. The scheme uses the Internet Group Management Protocol (IGMP) for multicast signaling. The unicast server is situated at the edge of the distribution network and continuously caches the packets of all multicast sessions received from the multicast router over a preceding period of time (typically dozens of seconds). When a peer receiver requests assistance for starting a new multicast stream, the peer sends an IGMP LEAVE message to the multicast router (corresponding to Step (a) in Figure 4.1) to stop its current multicast stream if needed; at the same time, the peer sends a rapid acquisition request message to the unicast server (Step (b)). The server responds with a message announcing whether the request will be served (Step (c)), and if so, the server determines the sequence numbers $\mathcal{U}$ of the packets to be sent, always starting from a RAP, and starts sending the stream $\{s_i\}_{i \in \mathcal{U}}$ from its cache of the multicast stream (Step (d)). If transcoding is applied to the unicast stream, the server sends the redundant packets $\{\hat{s}_i\}_{i \in \mathcal{U}}$ generated from the source packets $\{s_i\}_{i \in \mathcal{U}}$ instead. If block erasure coding is applied, the encoded stream is sent instead. If the request is rejected, the peer joins the multicast session regularly (i.e., it fails over to the multicast join process without the unicast server). The unicast packets are carefully paced such that the peer downlink bitrate is fully utilized while the link is not congested. After some delay (which, depending on the implementation, may be calculated at the server side and signaled to the peer), the peer sends the multicast router an IGMP JOIN message for joining the new multicast session (Step (e)). After a short delay, the multicast stream from the router arrives at the peer (Step (f)), and eventually the unicast stream ceases.

Figure 4.2: Procedure of Peer-Assisted Startup (PAS).

4.1.2 Peer-Assisted Startup (PAS)

The procedure of PAS is illustrated in Figure 4.2. In PAS, a peer receiver's status is continuously tracked by the unicast server, similar to that in PAR-CT (see Section 3.1.2). The server maintains a table that keeps track of which peer is currently receiving which multicast, i.e., the peer-session pairs $\{(n, \mathrm{MS}(n))\}_{n=1}^{\tilde{N}}$. In addition, each Peer $n$ is associated with a positive integer $\kappa_n^{MAX}$, the maximum number of new stream startups it can simultaneously support (and thus be reserved for), which depends on its uplink bandwidth and processing power. The server also maintains $\{(\kappa_n, \kappa_n^{MAX})\}_{n=1}^{\tilde{N}}$, where the non-negative integer $\kappa_n$ is the number of reservations Peer $n$ currently has. When a peer requests assistance for starting a stream of a new multicast session, the peer sends an IGMP LEAVE message to the multicast router to stop its current multicast stream if needed (corresponding to Step (a) in Figure 4.2); at the same time, the peer sends a rapid acquisition request message to the unicast server (Step (b)).


The server responds with a message (Step (c)) with one of three possible outcomes: (i) If a group of $M$ reliable² peers in the new multicast session that are available for reservation (i.e., each Peer $n$ has $\kappa_n < \kappa_n^{MAX}$) can be found, the request is served by these peers. Meanwhile, these helping peers are notified and reserved by the server. (ii) Otherwise, if the server has enough resources to provide the fast stream startup, the request is served by the server itself, in which case the scheme fails over to the SAS case described in Section 4.1.1. (iii) Otherwise, the request is rejected, in which case the scheme fails over to the regular multicast join procedure.

² In practice, it is possible to filter out unreliable peer receivers. For example, in IPTV, it is possible to learn viewers' channel change behavior and determine if a viewer is currently surfing channels. If this is the case, the viewer's dwell time is short, thus the peer should be excluded from the group providing peer assistance.

In the sequel, we focus on Case (i). The number of helping peers $M$ should be selected based on both the reservation availability and the uplink/downlink bitrate of the peer receiver links (see Section 4.2 for more details). The unicast server determines the unicast coding redundancy $\mu_U$ and the total number of requests $\mu_U |\mathcal{U}|$. Usually these requests are split evenly among the $M$ peers. The server then sends the requests to each helping peer at evenly spaced time intervals (in order to control the average rate of the unicast stream), including the specified coding rule $C_{R,m}$ (Step (d1)). Assuming block coding is applied to every $K$ source packets, the rules for generating a block of $MN$ channel packets are specified as

$$C_{R,m} = \tilde{g}_m(\ell) := [g_m(1, \ell), \ldots, g_m(K, \ell)]^T, \quad \ell = 1, \ldots, N, \quad m = 1, \ldots, M,$$

where $\tilde{g}_m(\ell)$ corresponds to a column in the erasure code generator matrix $G_m \in Q^{K \times N}$. Note that in this case, the coding redundancy is $\mu_U = MN/K$.

If transcoding is applied to the unicast stream, each helping Peer $m$, $m = 1, \ldots, M$, upon receiving the assistance notification from the server, starts transcoding the stream by generating the redundant packets $\{\hat{s}_i\}_{i \in \mathcal{U}}$ from the source packets $\{s_i\}_{i \in \mathcal{U}}$. Upon receiving a request, it generates the coded packet $\tilde{p}_{\ell,m}$ according to

$$\tilde{p}_{\ell,m} = \sum_{i=1}^{K} g_m(i, \ell)\, \hat{s}(i)$$

and sends $\tilde{p}_{\ell,m}$ to the requesting peer (Step (d2)), which then performs decoding when enough packets are received. Lastly, after some delay, the requesting peer sends the multicast router an IGMP JOIN message (Step (e)). After a short delay, the multicast stream arrives (Step (f)), the unicast stream ceases, and eventually the $M$ helping peers are notified by the server that the reservation has ended.
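The three outcomes of Step (c) and the even split of requests can be sketched as follows. This is a minimal illustration only, not the protocol implementation: the names `PeerState`, `handle_startup_request`, `split_requests` and `release`, the `reliable` flag, and the first-come selection of helpers are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PeerState:
    session: int            # MS(n): multicast session the peer is receiving
    reservations: int       # kappa_n: reservations the peer currently holds
    max_reservations: int   # kappa_n^MAX: limit set by uplink and processing power
    reliable: bool = True   # hypothetical flag, e.g. False while channel surfing


def handle_startup_request(
    peers: Dict[int, PeerState],
    new_session: int,
    num_helpers: int,
    server_has_capacity: bool,
) -> Tuple[str, List[int]]:
    """Return (outcome, helper_ids) for one rapid acquisition request."""
    candidates = [
        pid for pid, p in peers.items()
        if p.session == new_session and p.reliable
        and p.reservations < p.max_reservations
    ]
    if len(candidates) >= num_helpers:          # outcome (i): peer-assisted
        helpers = candidates[:num_helpers]      # assumed policy: first M found
        for pid in helpers:
            peers[pid].reservations += 1        # reserve the helping peers
        return "PAS", helpers
    if server_has_capacity:                     # outcome (ii): fail over to SAS
        return "SAS", []
    return "REJECT", []                         # outcome (iii): regular join


def split_requests(total_requests: int, helpers: List[int]) -> Dict[int, int]:
    """Split mu_U * |U| packet requests as evenly as possible among helpers."""
    base, extra = divmod(total_requests, len(helpers))
    return {pid: base + (1 if i < extra else 0) for i, pid in enumerate(helpers)}


def release(peers: Dict[int, PeerState], helpers: List[int]) -> None:
    """End the reservations once the unicast stream has ceased."""
    for pid in helpers:
        peers[pid].reservations -= 1
```

A real server would also pace the requests in time and attach the coding rule to each of them; both are omitted here to keep the sketch short.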

4.2 Analysis

This section introduces an analytical model to characterize the performance of SAS and PAS, combined with the unicast stream transcoding. The goal is to capture the fundamental trade-offs in these systems without introducing too many implementation details. Based on the model, we also discuss the selection of a few key parameters that optimize the system performance.

4.2.1 Setup

The main assumptions of this model are listed below.

1. Three delay components are considered: the multicast join delay $T_J$, the block erasure coding delay $T_C$ and the decoder buffering delay $T_B$. Recall that both SAS and PAS can avoid the acquisition delay.

2. Signaling is assumed ideal among the nodes. That is, the control messages among the multicast peer receivers, the unicast server and the multicast router are assumed to be lossless and instantaneous. Furthermore, the unicast server is able to send the unicast packets (or the requests) to control the unicast stream bitrate in perfect coordination with the multicast arrival to avoid congestion.

3. The multicast (unicast) stream has bitrate $\mu_M R_{SRC}$ ($\mu_U R_{SRC}$), where $R_{SRC}$ is the bitrate of the source video stream and $\mu_M > 1$ ($\mu_U > 1$) reflects the FEC redundancy applied to the multicast (unicast) stream. For the unicast stream, if video transcoding with compression ratio $0 < \beta \le 1$ is further applied, the bitrate becomes $\mu_U \beta R_{SRC}$. The maximum bitrate in the peer receiver downlink is $R_{PD} = (1 + e) R_{SRC}$, where $e > 0$ is referred to as the excessive bandwidth factor (or the e-factor).³ The bitrate in the peer receiver uplink is $R_{PU}$.

4. Assumption 1 of Section 3.3.1 applies, i.e., the distribution of $\tilde{N}$ current receivers among $\tilde{M}$ multicast sessions follows a Zipf model $p_Z(m; s, \tilde{M})$, and the number of current peer receivers in Multicast Session $m$ is $\tilde{N}_m = p_Z(m; s, \tilde{M}) \cdot \tilde{N}$. Furthermore, we assume the arrival of stream startup requests follows a Poisson process with arrival rate $\lambda_{SS}$, and the probability of a request joining a multicast session is proportional to the popularity of that session as characterized by the Zipf model.

5. Assumptions 3, 4 and 6 of Section 3.3.1 also apply.

We use the following figures of merit to evaluate each stream startup scheme:

• The stream startup latency $T_{SS}$: the duration between the request and the actual playout of the new stream. During the analysis and parameter optimization, we assume that $T_{SS}$ has the highest priority among all the criteria.

• The duration of the unicast stream $T_U$ and the total data size of the unicast stream $\mu_U \beta B_U$, where $B_U$ is the corresponding video data size if no transcoding is applied. $T_U$ and $\mu_U \beta B_U$ capture the burden on the unicast server control plane and data plane, respectively. As we will see, in most cases, the optimizations of $T_U$ and $B_U$ are consistent with each other. During the analysis, we arbitrarily select $B_U$ as the objective to optimize.

• The expected bitrate requirement at the unicast server link, $E\{R_S\}$.

• The quality of the decoded video at the receivers.

³ $R_{PD}$ of Chapter 3 differs from $R_{PD}$ here in that the former only considers the bitrate used by retransmissions, excluding the bitrate of the multicast stream.


4.2.2 Server-Assisted Startup (SAS)

4.2.2.1 Modeling of SAS Procedure

At time 0, after a peer sends a request, the unicast server starts bursting a unicast stream at the bitrate $(1 + e) R_{SRC}$, starting from some preceding RAP. The higher bitrate allows the unicast to catch up with the multicast. The choice of RAP also determines $B_{CAT}$, the video data size the unicast needs to catch up (refer to Figure 4.3). Let $T_{CAT}$ be the duration needed for the unicast to catch up with the multicast, which can be computed as

$$T_{CAT} = \frac{\mu_U \beta B_{CAT}}{(1 + e - \mu_U \beta) R_{SRC}}.$$

We distinguish two cases: if the peer joins the multicast before the unicast catches up, i.e., if $T_J \le T_{CAT}$, we have Case I (Figure 4.3 (a)); if $T_J > T_{CAT}$, we have Case II (Figure 4.3 (b)). In both cases, the data arriving at the peer becomes available to the video decoder only after the block erasure coding delay $T_C$; then after $T_B$, when $B_B$ video bits are completely buffered, the video decoder starts decoding. We now discuss the two cases in turn.
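Before turning to the individual cases, the catch-up time and the case distinction can be sketched numerically. The function names and the example gap of 2.5 Mb are assumptions chosen for illustration; the remaining values follow the defaults of Table 4.2 (data in Mb, bitrates in Mb/s, times in seconds).

```python
def catch_up_time(b_cat, r_src, e, mu_u=1.0, beta=1.0):
    """T_CAT: time for a unicast burst at (1+e)*R_SRC to close a gap of b_cat.

    b_cat is in Mb of source video, r_src in Mb/s; the result is in seconds.
    """
    spare = (1.0 + e - mu_u * beta) * r_src
    if spare <= 0:
        raise ValueError("no catch-up possible unless 1 + e > mu_U * beta")
    return mu_u * beta * b_cat / spare


def sas_case(t_j, t_cat):
    """Case I if the multicast can be joined before catch-up, else Case II."""
    return "I" if t_j <= t_cat else "II"


# Illustrative numbers: 5 Mb/s source, e = 0.2, mu_U = 1.1, no transcoding,
# and a gap of 2.5 Mb (half a second of video; an assumed value).
t_cat = catch_up_time(b_cat=2.5, r_src=5.0, e=0.2, mu_u=1.1, beta=1.0)
print(f"T_CAT = {t_cat:.2f} s, case {sas_case(t_j=0.1, t_cat=t_cat)}")
```

With only 0.1 of spare capacity (in units of $R_{SRC}$), closing half a second of video takes about 5.5 s, which illustrates why a small e-factor leads to long unicast bursts.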

Case I. At some instant $I_J$, the peer sends an IGMP JOIN message to the multicast router; after $T_J$, the peer starts receiving the real-time multicast stream of bitrate $\mu_M R_{SRC}$ from the router. To make room for the multicast, the unicast server lowers its unicast bitrate to $(1 + e - \mu_M) R_{SRC}$ at the moment the multicast stream arrives. The unicast stream keeps on transmitting at that bitrate until $T_U$, when the gap between the unicast and multicast streams is closed. Denote by $T_{HBR}$ the duration in which the unicast stream is sent at the high bitrate $(1 + e) R_{SRC}$. It is clear that $T_{HBR}$ of Case I is

$$T_{HBR}^{SAS-I} = I_J + T_J,$$

as the unicast stream bitrate must be lowered when the multicast stream arrives. The unicast duration of Case I can be expressed as

$$T_U^{SAS-I} = I_J + T_J + \frac{\mu_U \beta B_U - (I_J + T_J)(1 + e) R_{SRC}}{(1 + e - \mu_M) R_{SRC}}.$$
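The two-phase rate schedule of Case I can be written out as a short sketch. The function name and return format are assumptions for illustration; the formulas are the ones given above. A useful sanity check is that the data sent in the two phases adds up to $\mu_U \beta B_U$.

```python
def case1_rate_schedule(i_j, t_j, b_u, r_src, e, mu_u, mu_m, beta=1.0):
    """Return (t_hbr, t_u, phases) for SAS Case I.

    phases is a list of (start, end, bitrate) tuples; times in seconds,
    data in Mb, bitrates in Mb/s.
    """
    high = (1.0 + e) * r_src            # full downlink before the multicast arrives
    low = (1.0 + e - mu_m) * r_src      # residual downlink next to the multicast
    if low <= 0:
        raise ValueError("no residual bandwidth: need 1 + e > mu_M")
    t_hbr = i_j + t_j                   # multicast arrival ends the high-rate phase
    remaining = mu_u * beta * b_u - t_hbr * high
    if remaining < 0:
        raise ValueError("unicast ends before the multicast arrives: not Case I")
    t_u = t_hbr + remaining / low
    return t_hbr, t_u, [(0.0, t_hbr, high), (t_hbr, t_u, low)]


# Assumed example: join sent at 4.45 s, T_J = 0.1 s, B_U = 25.25 Mb.
t_hbr, t_u, phases = case1_rate_schedule(
    i_j=4.45, t_j=0.1, b_u=25.25, r_src=5.0, e=0.2, mu_u=1.1, mu_m=1.1)
sent = sum((end - start) * rate for start, end, rate in phases)
print(f"T_HBR = {t_hbr:.2f} s, T_U = {t_u:.2f} s, data sent = {sent:.3f} Mb")
```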

[Figure 4.3 plot: two panels (a) and (b); x-axis: Time, y-axis: Bit #; curves: Multicast, Unicast, Playout; marked quantities: RAP, Multicast Join $I_J$, $T_J$, Unicast End $T_U$, $B_{CAT}$, $B_B$, $T_C$, $T_{SS}$.]

Figure 4.3: Video data received by a peer receiver over time for (a) SAS, Case I and (b) SAS, Case II. For the diagram displayed, we assume the video is not transcoded. Abbreviation: BCAT – data size the unicast needs to catch up; TSS – stream startup latency; IJ – multicast join instant; TJ – multicast join delay; TU – unicast duration; TC – block erasure coding delay; BB – decoder buﬀering data size.

Note that the selection of $I_J > 0$ must be subject to the following constraint. If $I_J$ is too early, it may result in a gap between the multicast and unicast streams, thus interrupting the playout. To avoid this, we must ensure that at the moment the last unicast bit is passed to the video decoder (i.e., at $T_U + T_C$), this bit is not played out yet; that is,

$$(T_U + T_C - T_{SS}) R_{SRC} \le B_U. \tag{4.1}$$

Case II. After the unicast catches up, it should never pass the multicast, thus the unicast has to lower its bitrate from $(1 + e) R_{SRC}$ to $\mu_U R_{SRC}$. The unicast server terminates the unicast at the moment the multicast begins. Thus, the unicast duration of Case II is

$$T_U^{SAS-II} = I_J + T_J.$$

The duration of unicast bursting at the high bitrate of Case II is

$$T_{HBR}^{SAS-II} = T_{CAT}.$$

4.2.2.2 Performance Measurements

Stream Startup Latency $T_{SS}$. The stream startup latency of SAS consists of the block erasure coding delay $T_C$ and the decoder buffering delay $T_B$:

$$T_{SS} = T_C + T_B.$$

$T_C$ and $T_B$ can be obtained as follows. By Assumption 7 of Section 3.3.1, we select a block erasure code to resist one erasure burst of duration $T_{BURST}^{MAX}$. Applying an MDS code with redundancy $\mu_U$ results in a worst-case decoding delay (i.e., a block length)

$$T_C^{SAS-MDS} = \frac{\mu_U T_{BURST}^{MAX}}{\mu_U - 1}.$$

To compute $T_B$, note that since reducing $T_{SS}$ has the highest priority, we should transmit the unicast at the high bitrate $(1 + e) R_{SRC}$ until $B_B$ bits are stored in the video buffer. Thus,

$$T_B = \frac{\mu_U \beta B_B}{(1 + e) R_{SRC}},$$

as long as the unicast burst duration at the high rate, $T_{HBR}$, is long enough, i.e.,

$$T_{HBR} \ge T_{SS}. \tag{4.2}$$
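As a worked example of these formulas, the sketch below evaluates $T_C$, $T_B$ and $T_{SS}$ for the default parameters of Table 4.2. The function names are illustrative, and the buffer size $B_B$ = 2.5 Mb is an assumption: it interprets the 500 ms decoder video buffer of Table 4.2 as half a second of video at the 5 Mb/s source rate.

```python
def mds_coding_delay(mu_u, t_burst_max):
    """Worst-case MDS decoding delay (block length) for one erasure burst."""
    if mu_u <= 1.0:
        raise ValueError("redundancy mu_U must exceed 1 to correct a burst")
    return mu_u * t_burst_max / (mu_u - 1.0)


def buffering_delay(b_b, r_src, e, mu_u, beta=1.0):
    """Time to fill the decoder buffer while bursting at (1+e)*R_SRC."""
    return mu_u * beta * b_b / ((1.0 + e) * r_src)


def startup_latency(b_b, r_src, e, mu_u, t_burst_max, beta=1.0):
    """T_SS = T_C + T_B, valid while the high-rate burst lasts at least T_SS."""
    return mds_coding_delay(mu_u, t_burst_max) + buffering_delay(
        b_b, r_src, e, mu_u, beta)


# Table 4.2 defaults; b_b = 2.5 Mb is the assumed buffer size discussed above.
t_c = mds_coding_delay(mu_u=1.1, t_burst_max=0.05)
t_b = buffering_delay(b_b=2.5, r_src=5.0, e=0.2, mu_u=1.1)
t_ss = startup_latency(b_b=2.5, r_src=5.0, e=0.2, mu_u=1.1, t_burst_max=0.05)
print(f"T_C = {t_c:.3f} s, T_B = {t_b:.3f} s, T_SS = {t_ss:.3f} s")
```

Under these assumptions the coding delay (about 0.55 s) is slightly larger than the buffering delay (about 0.46 s), giving a startup latency of roughly 1 s. Halving $\beta$ halves only the buffering term.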

Unicast Video Data Size $B_U$. In SAS, $B_U$ can be related to $B_{CAT}$ and $I_J$ as

$$B_U = B_{CAT} + (I_J + T_J) R_{SRC}. \tag{4.3}$$

Unicast Server Bitrate. Lastly, the unicast server handles all the stream startup requests, which have a mean arrival rate of $\lambda_{SS}$ requests/sec. If we model the position of the video at the stream startup request as a random variable uniform over a RAP interval, the unicast data size $B_U$ is also a random variable, which can be modeled as being independent of the request arrival process. The expected unicast server bitrate is thus

$$E\{R_S^{SAS}\} = \mu_U \beta E\{B_U\} \lambda_{SS}.$$

4.2.2.3 Optimization of RAP and $I_J$

In SAS, two tunable parameters are present: the unicast startup RAP and the multicast join instant $I_J$. We would like to select $(\mathrm{RAP}^*, I_J^*)$ to optimize both the stream startup latency $T_{SS}$ and the unicast duration $T_U$. First, it is clear that $T_{SS}$ is optimal as long as (4.2) is met. In (4.3), $B_{CAT}$ is determined by RAP. The problem now becomes the selection of $(\mathrm{RAP}^*, I_J^*)$ to minimize $B_U$ subject to the constraints:

$$(\mathrm{RAP}^*, I_J^*) = \arg\min_{(\mathrm{RAP}, I_J)} B_U \quad \text{subject to} \quad T_{SS} \le T_{HBR}, \quad (T_U + T_C - T_{SS}) R_{SRC} \le B_U. \tag{4.4}$$

It is possible to search the feasible domain of $(\mathrm{RAP}, I_J)$ for the optimal parameters that minimize $B_U$. A good approximation to (4.4) is the following two-step formulation. First, select

$$\mathrm{RAP}^* = \arg\min_{\mathrm{RAP}:\, T_{SS} \le T_{CAT}} B_{CAT}.$$

Then, select

$$I_J^* = \min_{I_J} I_J \quad \text{subject to} \quad T_{SS} \le T_{HBR}, \quad (T_U + T_C - T_{SS}) R_{SRC} \le B_U.$$
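The two-step approximation can be sketched as a small search. This is an illustration under stated assumptions rather than the optimization code used for the results: the gap model `b_cat = (rel_pos + k) * rap_interval * r_src` (request arriving at relative position `rel_pos` in the RAP interval, unicast starting `k` RAPs back), the grid step, and the function names are all hypothetical. Only Case I is handled, and $1 + e > \mu_U \beta$ and $1 + e > \mu_M$ are assumed.

```python
def select_rap(rel_pos, rap_interval, t_ss, r_src, e, mu_u, beta=1.0, max_back=10):
    """Step 1: nearest preceding RAP whose catch-up time is at least T_SS."""
    for k in range(max_back):
        b_cat = (rel_pos + k) * rap_interval * r_src   # assumed gap model
        t_cat = mu_u * beta * b_cat / ((1.0 + e - mu_u * beta) * r_src)
        if t_cat >= t_ss:
            return k, b_cat
    return None


def select_join_instant(b_cat, t_c, t_b, t_j, r_src, e, mu_u, mu_m,
                        beta=1.0, step=1e-3):
    """Step 2: smallest I_J on a grid that satisfies (4.1) and (4.2), Case I."""
    t_ss = t_c + t_b
    t_cat = mu_u * beta * b_cat / ((1.0 + e - mu_u * beta) * r_src)
    n = 0
    while n * step + t_j <= t_cat:          # stay within Case I
        i_j = n * step
        t_hbr = i_j + t_j
        b_u = b_cat + t_hbr * r_src                                   # (4.3)
        t_u = t_hbr + (mu_u * beta * b_u - t_hbr * (1.0 + e) * r_src) / (
            (1.0 + e - mu_m) * r_src)
        if t_ss <= t_hbr and (t_u + t_c - t_ss) * r_src <= b_u:
            return i_j, b_u, t_u
        n += 1
    return None


# Table 4.2 defaults, with T_C = 0.55 s and T_B = 2.75/6 s as assumed inputs.
T_C, T_B = 0.55, 2.75 / 6.0
print(select_rap(0.05, 1.1, T_C + T_B, 5.0, 0.2, 1.1))   # falls back one RAP
print(select_join_instant(2.5, T_C, T_B, 0.1, 5.0, 0.2, 1.1, 1.1))
```

In the first call the immediate RAP is too close for a full-rate burst of length $T_{SS}$, so the search steps back by one RAP interval; this is the mechanism behind the sawtooth pattern discussed in Section 4.2.4.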

4.2.3 Peer-Assisted Startup (PAS)

PAS essentially combines PAR-CT with the SAS procedure of unicast rapid acquisition and unicast-to-multicast stream switching. The modeling and optimization of RAP and $I_J$ presented in Section 4.2.2 apply to PAS, with the exception of the calculation of the block erasure coding delay $T_C$. We will discuss the calculation of $T_C$ and the selection of a parameter unique to PAS: the number of peers $M$ involved in a stream startup. Lastly, we will discuss the unicast server bitrate and the probability of peer-assisted startup failure.

Block Erasure Coding Delay $T_C$. Unlike in SAS, in PAS the erasure code applied should resist not only erasure bursts, but also peer departures. Assume we would like to apply an $M$-link block code that resists $Z$ erasure bursts of maximum duration $T_{BURST}^{MAX}$ and $L$ peer departures. Applying an MDS code with redundancy $\mu_U$ results in a worst-case decoding delay

$$T_C^{PAS-MDS} = \frac{Z \mu_U T_{BURST}^{MAX}}{M(\mu_U - 1) - L \mu_U}. \tag{4.5}$$
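A small sketch of (4.5) makes two of its properties visible: it reduces to the SAS delay for $M = 1$, $Z = 1$, $L = 0$, and it is only finite when $M(\mu_U - 1) > L \mu_U$, i.e., $\mu_U > M/(M - L)$. The function name and the example values are assumptions for illustration.

```python
def pas_mds_coding_delay(mu_u, t_burst_max, m, z=1, l=0):
    """Worst-case decoding delay of an M-link MDS code, following (4.5).

    z is the number of erasure bursts and l the number of peer departures
    the code should resist; times are in seconds.
    """
    margin = m * (mu_u - 1.0) - l * mu_u
    if margin <= 0:
        raise ValueError("redundancy too low: need mu_U > M / (M - L)")
    return z * mu_u * t_burst_max / margin


# Three helping peers, one 50 ms burst, no departures (Table 4.2 defaults).
print(pas_mds_coding_delay(mu_u=1.1, t_burst_max=0.05, m=3))
# Tolerating one departure among three peers needs mu_U > 1.5.
print(pas_mds_coding_delay(mu_u=2.0, t_burst_max=0.05, m=3, l=1))
```

Spreading the block over three links cuts the coding delay to a third of the single-link value, whereas resisting even one peer departure requires a much higher redundancy than the default 1.1.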

Selection of Helping Peer Number $M$. In PAR-CT of Section 3.1.2, we randomly select as many peers as possible from the co-multicast group $C_{\Omega \setminus n}$ of a peer to maximize the path diversity and minimize the impact of peer departures. In PAS, as the unicast stream is potentially very long, it is necessary to deterministically select a number of peers and send them the requests periodically. Besides (i) path diversity, the selection of the number of assistance peers $M$ is also subject to the following considerations. (ii) $M$ should not be too small, given the typical asymmetric bitrate of peer uplink and downlink. Specifically, to support the full rate of the peer downlink, we should select $M \ge R_{PD}/R_{PU}$. (iii) The larger $M$ is, the more source bits are modeled as having the same deadline. In reality, since the video stream is sequential in nature, there is a penalty in decoding delay when sequential deadlines are considered parallel. Therefore, $M$ should not be too large. Overall, an appropriate value of $M$ should be selected to balance all three considerations.

Unicast Server Bitrate. In PAS, if a new request for Multicast Session $m$ arrives, the unicast server looks into Session $m$'s current receiver group $R_m$ of size $\tilde{N}_m$. In this analysis, for simplicity, assume $\kappa_n^{MAX} = 1$, i.e., each peer can handle at most one reservation at a time. Let $\phi_m$ be the number of peers currently reserved in $R_m$. If $\tilde{N}_m - \phi_m < M$, the request is handled by the unicast server, where the server sends data of size $\mu_U \beta B_U$; otherwise, the request is handled by $M$ helping peers, where the server sends control data of size $\mu_U \frac{B_{CTL}}{B_{SRC}} B_U$. Here $B_{CTL}$ and $B_{SRC}$ are the control packet and data packet sizes, respectively. We would like to find the distribution of $\phi_m$. By Assumption 4 of Section 4.2.1, the request arrival processes of Multicast Session $m$, $m = 1, \ldots, \tilde{M}$, are mutually independent, each following a Poisson process of mean arrival rate $p_Z(m; s, \tilde{M}) \lambda_{SS}$. For each request, if $M$ peers are reserved for duration $T_U$, it can be shown that

$$\frac{\phi_m}{M} \sim \mathrm{Poisson}\big(p_Z(m; s, \tilde{M}) \lambda_{SS} T_U\big)$$

and $\phi_m$, $m = 1, \ldots, \tilde{M}$, are mutually independent. The overall expected unicast server bitrate can be expressed as:

$$E\{R_S^{PAS}\} = \sum_{m=1}^{\tilde{M}} E\Big\{ 1(\tilde{N}_m - \phi_m < M)\, \mu_U \beta B_U + 1(\tilde{N}_m - \phi_m \ge M)\, \mu_U \frac{B_{CTL}}{B_{SRC}} B_U \Big\}\, p_Z(m; s, \tilde{M}) \lambda_{SS}. \tag{4.6}$$

Note that BU and TU are functions of the position of video at stream startup request, which is a random variable uniform over one RAP interval. The expectation on the right hand side of (4.6) is over both the video startup position and the request arrival process, which are assumed independent.
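For intuition, (4.6) can be evaluated numerically in a simplified setting. The sketch below treats $B_U$ and $T_U$ as fixed constants rather than averaging over the video startup position, and assumes the standard Zipf form $p_Z(m; s, \tilde{M}) \propto m^{-s}$; both are simplifying assumptions, as are the function names and example values.

```python
import math


def zipf_pmf(num_sessions, s):
    """Assumed Zipf form: p_Z(m) proportional to m**(-s), m = 1..num_sessions."""
    weights = [1.0 / (m ** s) for m in range(1, num_sessions + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); returns 0 for negative k."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    acc = term
    for i in range(1, k + 1):
        term *= lam / i
        acc += term
    return min(acc, 1.0)


def expected_server_bitrate_pas(n_total, num_sessions, s, lam_ss, m_helpers,
                                t_u, b_u, mu_u, beta, b_ctl, b_src):
    """Evaluate (4.6) with fixed b_u (Mb) and t_u (s); result in Mb/s."""
    server_data = mu_u * beta * b_u
    control_data = mu_u * (b_ctl / b_src) * b_u
    rate = 0.0
    for p in zipf_pmf(num_sessions, s):
        n_m = p * n_total
        # Peers suffice iff phi_m <= n_m - M, with phi_m / M ~ Poisson.
        j_max = math.floor((n_m - m_helpers) / m_helpers)
        p_peers = poisson_cdf(j_max, p * lam_ss * t_u)
        rate += ((1.0 - p_peers) * server_data + p_peers * control_data) * p * lam_ss
    return rate


common = dict(num_sessions=10, s=1.0, lam_ss=1.0, m_helpers=6, t_u=5.0,
              b_u=20.0, mu_u=1.1, beta=1.0, b_ctl=64, b_src=1375)
for n in (0, 100, 200):
    print(n, round(expected_server_bitrate_pas(n_total=n, **common), 2))
```

With no current peers the expression reduces to the SAS bitrate $\mu_U \beta B_U \lambda_{SS}$, and the server load falls as $\tilde{N}$ grows.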

[Figure 4.4 plot: three panels versus Relative Position (0 to 1); y-axes: $T_{SS}$ (sec), $\mu_U \beta B_U$ (Mb), $T_U$ (sec); curves for $e$ = 0.2, 0.3, 0.4.]

Figure 4.4: Analysis: startup latency $T_{SS}$, unicast data size $\mu_U \beta B_U$ and unicast duration $T_U$ as a function of the multicast stream's relative position within a RAP interval when the receiver initiates a stream startup request, at e-factor $e$ = 0.2, 0.3 and 0.4. $\beta = 1$.

Probability of Startup Failure $P(A_F)$. For simplicity, we make the following assumptions. (i) Differing from (3.6) for PAR, since PAS is reservation-based, assume the event that a peer is unresponsive to a request, $A_{NR}$, is due to peer departure only. Thus $P(A_{NR}) = \gamma_{DEPT}$. (ii) The erasure decoding failure is due to peer departure only. (iii) The peer departures are independent. The probability of request failure for Multicast Session $m$ can then be expressed as

$$P(A_F \mid \mathrm{MS}_m) = E\{1(\tilde{N}_m - \phi_m \ge M)\} \sum_{k=0}^{M-L-1} b(k; M, 1 - P(A_{NR})),$$

where $b(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k}$ denotes the binomial distribution. The overall probability of request failure is

$$P(A_F) = \sum_{m=1}^{\tilde{M}} p_Z(m; s, \tilde{M})\, P(A_F \mid \mathrm{MS}_m).$$
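The binomial sum above counts the events in which at most $M - L - 1$ of the $M$ helping peers stay, i.e., more than $L$ depart. A minimal sketch follows; the function names are illustrative, and the per-session probabilities of a request being peer-assisted are passed in as given numbers rather than derived from the Poisson model.

```python
from math import comb


def binom_pmf(k, n, p):
    """b(k; n, p) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def decoding_failure_prob(m_helpers, l_tolerated, gamma_dept):
    """Probability that more than l_tolerated of the m_helpers peers depart."""
    p_stay = 1.0 - gamma_dept
    return sum(binom_pmf(k, m_helpers, p_stay)
               for k in range(0, m_helpers - l_tolerated))


def overall_failure_prob(session_probs, peer_assist_probs, m_helpers,
                         l_tolerated, gamma_dept):
    """P(A_F): popularity-weighted sum over sessions of P(A_F | MS_m)."""
    fail = decoding_failure_prob(m_helpers, l_tolerated, gamma_dept)
    return sum(p_z * p_pas * fail
               for p_z, p_pas in zip(session_probs, peer_assist_probs))


# Assumed example: 3 helpers, code tolerates 1 departure, gamma_DEPT = 0.1.
print(decoding_failure_prob(3, 1, 0.1))
print(decoding_failure_prob(3, 0, 0.1))
```

In this example, tolerating a single departure lowers the conditional failure probability from about 0.27 to about 0.03.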

4.2.4 Numerical Results

Based on our analysis, we present a set of numerical results that characterize the behavior of the stream startup schemes discussed. Unless stated otherwise, the parameters used in obtaining the plots are selected from Table 4.2.

We start with the simple scenario of a unicast server supporting a single receiver using SAS. The same result also applies to PAS, when the unicast is from the peer receivers. In Figure 4.4, we show, under various e-factors, how the startup latency and unicast data size/duration vary as a function of the multicast stream's relative position within a RAP interval when the receiver initiates a stream startup request. First, the startup latency $T_{SS}$ is independent of the initial position, as expected, indicating that the unicast-based stream startup mechanism is able to avoid the acquisition delay. The unicast data size/duration, however, vary significantly with the initial position, which determines how much data the unicast needs to catch up. The variation is particularly prominent when the e-factor is small, when it takes a considerable amount of time for the unicast stream to catch up with the multicast stream. Also note that the unicast data size/duration is not linear in the relative position within the RAP interval; instead, a sawtooth pattern can be observed. The reason is that the optimization scheme selects a RAP that ensures that the unicast is burst at full bitrate before the stream startup is completed; if the unicast catches up too soon, the unicast is forced to start from a RAP earlier than the immediate RAP.

In Figure 4.5, we show how the startup latency and unicast data size/duration vary as a function of the transcoding compression ratio $\beta$. First, the startup latency $T_{SS}$ decreases linearly with decreasing $\beta$, due to the reduction of the video data size the decoder has to buffer. We then look at the unicast data size/duration. In the region where $\beta$ is close to 1, the unicast data size/duration reduces superlinearly with $\beta$. As illustrated by the unicast data size, at e-factor $e = 0.2$, when $\beta$ is reduced from 1 to 0.8, the data size is reduced from over 50 Mb to about 15 Mb. This effect can be explained by the fact that reducing $\beta$ is similar to increasing the e-factor, which could make the unicast catch up much faster with the multicast. However, note that reducing $\beta$ further may lead the unicast server to start bursting from a RAP earlier than the immediate RAP (to avoid the delay of stream startup as we discussed above), thus increasing the unicast data size. In practice, we should always avoid selecting a $\beta$ that is so low that it would lead to both a lower video quality and a larger unicast data size.

[Figure 4.5 plot: three panels versus $\beta$ (0.4 to 1); y-axes: $T_{SS}$ (sec), $\mu_U \beta B_U$ (Mb), $T_U$ (sec); curves for $e$ = 0.2, 0.3, 0.4.]

Figure 4.5: Analysis: startup latency $T_{SS}$, unicast data size $\mu_U \beta B_U$ and unicast duration $T_U$ as a function of the transcoding compression ratio $\beta$, at e-factor $e$ = 0.2, 0.3 and 0.4. The relative position within a RAP interval is 0.8.

Now we shift the attention to PAS. The most interesting quantity to examine is the unicast server bitrate $E\{R_S\}$ required. We examine it as a function of both $\lambda_{SS}$, the stream startup request arrival rate, and $\tilde{N}$, the number of peer receivers that help in peer-assisted stream startup, as shown in Figure 4.6. In Figure 4.6 (a), as the request rate increases, $E\{R_S\}$ in SAS increases linearly. For PAS, however, $E\{R_S\}$ initially increases fairly slowly (mostly due to control traffic), until it reaches some saturation level, at which point $E\{R_S\}$ starts to increase linearly at the same rate as SAS. This is expected, since the stream startup request rate supported is limited by the number of current peer receivers $\tilde{N}$. As we increase $\tilde{N}$ from 100 to 200, the supported request rate when saturated also increases. This claim is validated by Figure 4.6 (b), where we observe that increasing the number of current peer receivers $\tilde{N}$ effectively reduces the expected server rate $E\{R_S\}$. However, when the request arrival rate $\lambda_{SS}$ is high, the number of current peer receivers required to drive down the server rate has to be large. The numerical results for the failure probability due to peer departures are qualitatively similar to Chapter 3, Figure 3.6, and we do not repeat them here.

[Figure 4.6 plot: (a) $E\{R_S\}$ (Mb/sec) versus $\lambda_{SS}$ (req/sec), curves: SAS, PAS with $\tilde{N}$ = 100, PAS with $\tilde{N}$ = 200; (b) $E\{R_S\}$ (Mb/sec) versus Number of Current Peers $\tilde{N}$, curves: $\lambda_{SS}$ = 2, 1, 0.5.]

Figure 4.6: Analysis: expected retransmission bitrate $E\{R_S\}$ as a function of (a) stream startup request arrival rate $\lambda_{SS}$, and (b) number of current peer receivers $\tilde{N}$. $M = 6$.

Parameter                                       Default Value   Range
Nb. of current peer receivers $\tilde{N}$       10              0∼20
Nb. of requesting peer receivers                10
Nb. of multicast sessions $\tilde{M}$           10
Zipf distr. shape $s$                           1
Primary stream bitrate $R_{SRC}$                5 Mb/sec
Video duration                                  300 frames      240∼300 frames
Peer-router delay                               20 ms
Router-unicast server delay                     0 ms
Req. pkt size $B_{CTL}$                         64 Bytes
Primary pkt size $B_{SRC}$                      1375 Bytes
SLEP compression ratio $\beta$                  1               0.5∼1
Retrans. server bitrate %                       70%             10%∼100%
e-factor $e$                                    0.2             0.2∼0.4
Multicast join delay $T_J$                      100 ms
Decoder video buffer $T_B$                      500 ms
Unicast redundancy $\mu_U$                      1.1             1.1, 1.2
Multicast redundancy $\mu_M$                    1.1             1.1, 1.2
Max. erasure burst dur. $T_{BURST}^{MAX}$       50 ms
Nb. of assistance peers $M$                     3               3, 6
RAP (GOP) interval                              1.1 sec

Table 4.2: Simulation parameters.

4.3 Simulation Results

We implement a simulation program using ns-2 and a modified H.264/AVC reference software (JM version 13). Unlike the packet loss repair simulation in Chapter 3, where only unicast traffic is simulated, in fast stream startup both the multicast and unicast traffic are simulated. To simulate the operation of the PAS protocol, we consider a simplified setup where the peer receivers attached to the multicast router are divided into two groups: the current peers and the requesting peers. The current peers join the multicast session from time 0 and are prepared to respond to assistance requests. The requesting peers each send the unicast server a stream startup request at a random instant within a simulated interval. The primary simulation parameters are listed in Table 4.2.

[Figure 4.7 plot: two panels (a) and (b); x-axis: Time (sec), y-axis: Packet Seq. No.; legend: MCast, UCast RS, Playout, UC Req, MC Join, RAP.]

Figure 4.7: Simulated traces of video packets received by a peer, to demonstrate the selection of the multicast join time $I_J$ that optimizes the unicast data size. $\mu_U = 1$ and no decoding delay is involved (otherwise the last unicast packet would have to be received earlier). In this experiment, $\beta = 1$; the receiver sends the stream startup request at time 0.165. (a) demonstrates the case where optimization is applied; (b) demonstrates a simple strategy where the unicast terminates when it catches up with the multicast. Abbreviations: MCast: multicast; UCast RS: unicast sent from the unicast server; UC Req: request for unicast stream startup; MC Join: send multicast join request.

4.3.1 Stream Startup Parameter Optimization

We first give some intuition on how the optimization of the parameters $I_J$ and RAP works. Consider the simple case where the unicast server supports a single receiver and the downlink bandwidth is sufficient. We demonstrate the stream startup procedure in a few traces of video packets received over time, shown in Figure 4.7 and Figure 4.8. In Figure 4.7 (a), the optimization scheme selects a multicast join instant $I_J$ and lowers the unicast rate at the moment when the multicast traffic arrives, in such a way that at the moment the unicast ends, the decoder is just about to play the last unicast packet (in this illustration, no decoding delay is involved; otherwise the last unicast packet would have to be received earlier). This ensures that the unicast server sends the least amount of unicast data while not interrupting the playout. In contrast, in Figure 4.7 (b), when a simple strategy is implemented, i.e., the unicast terminates when it catches up with the multicast, the amount of unicast data is significantly larger. Note that in this experiment, the playout delay is the same for both cases.

In Figure 4.8 (a), we illustrate a scenario where the optimization scheme chooses to start the unicast stream at a RAP earlier than the immediate RAP, so that during the decoder buffering of video data, the unicast can be burst at the full receiver downlink bitrate. In contrast, in Figure 4.8 (b), when the simple strategy mentioned above is used, the resulting instant of playout is delayed (refer to the distance between the starting points of the red and black curves).

[Figure 4.8 plot: two panels (a) and (b); x-axis: Time (sec), y-axis: Packet Seq. No.; legend: MCast, UCast RS, Playout, UC Req, MC Join, RAP.]

Figure 4.8: Simulated traces of video packets received by a peer, to demonstrate the selection of the unicast startup RAP that optimizes the startup delay. $\mu_U = 1$ and no decoding delay is involved. In this experiment, $\beta = 0.5$; the receiver joins at time 1.09. (a) demonstrates the case where optimization is applied; (b) demonstrates a simple strategy where the unicast terminates when it catches up with the multicast. Abbreviations: MCast: multicast; UCast RS: unicast sent from the unicast server; UC Req: request for unicast stream startup; MC Join: send multicast join request.

[Figure 4.9 plot: (a) Data Size (Mb) and Duration (sec) versus Nb. Current Peers, curves: Server Unicast, Peer Unicast; (b) CDF versus Server Unicast Data Size (Mb) and Server Unicast Duration (sec), curves: SAS, PAS with $\tilde{N}$ = 5, PAS with $\tilde{N}$ = 15.]

Figure 4.9: Simulation: (a) Unicast data size/duration (mean and standard deviation) from the unicast server and the peers, as a function of the number of current peers. (b) CDF of server unicast data size/duration for SAS (i.e., analogous to the case where the number of current peers is 0), and PAS with the number of current peers 5 and 15. The number of requesting peers is 10.

4.3.2

Number of Multicast Receivers

We are interested in how PAS shifts the unicast traffic from the unicast server to the peers. In Figure 4.9 (a), we plot the mean and standard deviation of the unicast data size/duration, from both the unicast server and the peers, as a function of the number of current peers supporting PAS. Both the server unicast data size and duration decrease as we increase the number of current peers. This agrees well with the analytical result in Figure 4.6 (b). The current peers' unicast data size/duration increases with the number of current peers. In Figure 4.9 (b), we also plot the CDFs of server unicast data size/duration for some selected numbers of current peers.

[Figure 4.10 plots: server unicast data size (Mb) and duration (sec) versus λSS (req/sec); curves: SAS, PAS with Ñ = 10, PAS with Ñ = 20.]

Figure 4.10: Simulation: server unicast data size/duration (mean and standard deviation) as a function of the stream startup request arrival rate λSS, for SAS, and for PAS with 10 and 20 current peers.

4.3.3

Effect of Request Arrival Rate

In Section 4.2.4, we discussed the unicast server bitrate's behavior as the stream startup request arrival rate λSS increases at different numbers of current peers. In this section, we verify this behavior in terms of the unicast data size and duration of the unicast server, as shown in Figure 4.10. For the SAS case (i.e., analogous to the case where the number of current peers equals 0), both the unicast data size and duration increase linearly with the request rate. For PAS with a number of current peers larger than 0, both the unicast data size and duration initially increase slowly, until they reach a saturation level, beyond which they start to increase linearly at the same rate as SAS. This agrees with the behavior predicted by the model.

4.3.4

Effect of Video Transcoding

We examine the effect of applying transcoding to the unicast video stream on the reduction of stream startup latency TSS and the unicast data size/duration. Figure 4.11 shows TSS and the server unicast data size/duration as a function of the transcoding compression ratio β. First of all, TSS decreases linearly with decreasing β. This agrees with our observation from the analytical results in Figure 4.5. Second, at e = 0.2, the unicast data size and duration decrease significantly when β is reduced from 1 to 0.8, and then continue to decrease with β. The reduction in unicast data size/duration is less significant when e is higher.

[Figure 4.11 plots: TSS (sec), server unicast data size (Mb) and duration (sec) versus the transcoding compression ratio β; curves: e = 0.2, e = 0.4.]

Figure 4.11: Simulation: stream startup latency TSS and the server unicast data size/duration (mean and standard deviation) as a function of the transcoding compression ratio β. The number of current peers and the number of requesting peers are both 15. We set the unicast server bitrate to be 70% of the sufficient bitrate, to introduce modest contention among the receivers.

Chapter Conclusion

In this chapter, the peer-assistance architecture has been applied to the fast stream startup problem. After a description of the stream startup procedures and the SAS/PAS protocols, we modeled and analyzed the protocols and discussed their parameter optimization. Lastly, we presented results of ns-2 simulations, and evaluated the performance in terms of server bitrate, channel change latency and unicast burst duration. We have verified the applicability of the peer-assistance architecture in aiding the fast startup of new video streams, as well as the benefit of transcoding the unicast video stream for reducing the unicast server load and improving the system scalability.

Chapter 5

Low-Delay Burst Erasure Codes

Erasure correcting coding plays an essential role in the peer-assistance architecture. To resist unexpected peer departures that may lead to repair failures, coding is applied across the repair packets sent from multiple peer receivers. To repair a long erasure burst, a hybrid scheme complementing multicast FEC with unicast parity packet retransmissions achieves great efficiency. When the hybrid erasure correcting scheme is further combined with robust video coding, the resulting SLEP/SLEPr achieves improved resistance to erasure bursts at a slight cost in video quality.

Erasure correcting coding inevitably introduces decoding delay. In fast stream startup, a long decoding delay would contribute to the overall latency for starting a new stream, hence affecting the user experience. In this chapter, we study how to design an erasure correcting code that explicitly takes into consideration the decoding delay constraint.

In the peer-assistance architecture, a receiver is aided by its peers receiving the same multicast stream. This scenario can be modeled as a communication problem where a single sender transmits data over a set of parallel links to a single receiver, which also encompasses the single-link problem as a special case. We would like to design a parallel-link erasure correcting code to handle two types of network errors: link outages and packet erasure bursts. A link outage refers to the case where all packets transmitted over a link are lost, which could be the result of a peer departure. Packet erasure bursts may occur on multiple links. However, in designing the code, we make the assumption that no two erasure bursts occur on the same link within a coding block, similar to Assumption 6 of Section 3.3.1. We also restrict ourselves to the FEC case, since in fast stream startup we assume that, after the user initiates the request, all the unicast packets are forward transmitted. The primary question we would like to answer is: given a code rate and an error correction performance constraint, what is the lowest achievable decoding delay, and how can a code be constructed to achieve it?

After the formal problem statement in Section 5.1, we state the main theorem of this chapter in Section 5.2, which characterizes the fundamental design tradeoff among the code rate, error correction performance and decoding delay for any Singleton-bound-achieving parallel-link codes. In Section 5.3, we describe the practical code constructions. We start with the single-link burst erasure codes, on top of which we build the proposed parallel-link burst erasure code construction. The cases when the parallel links are heterogeneous in terms of capacity and link delay are discussed in Section 5.4. Lastly, in Section 5.5, we apply the constructed low-delay codes to the peer-assisted fast stream startup problem and demonstrate the performance gain. The notations used in this chapter are summarized in Table 5.1.

5.1

Problem Setup

An M-parallel-link network contains a sender node connected to a receiver node via M parallel links. We consider homogeneous links and assume each link has unit capacity, i.e., it can transmit one symbol within a unit time. For the time being, we also assume that the links are delay-free. The more general cases with heterogeneous capacity and link delay are discussed in Section 5.4. Multiple channel symbols can be transmitted over a link sequentially over time. A channel symbol can either be erased (denoted by the symbol ⋆) or passed to the receiver intact. Consider a length-N channel symbol block for each Link m, m = 1, ..., M. Denote by $\mathcal{B} \subseteq \{(m, n) : m = 1, \dots, M,\ n = 1, \dots, N\}$ an erasure pattern, i.e., the set of channel symbol positions where erasures occur. When x[m, n] is transmitted over the link, the received channel symbol is

$$y[m, n] = \begin{cases} \star, & (m, n) \in \mathcal{B}, \\ x[m, n], & \text{otherwise.} \end{cases}$$

Notation                          Explanation
s, x, y                           Random variables in sans-serif letters
H(x)                              Entropy of x
H(x | y)                          Conditional entropy of x given y
(m, n)                            Symbol index within a coding block, m – link and n – time
a:b                               a, ..., b
M                                 Number of parallel links
N                                 Number of time slots within a coding block
B                                 Number of symbols an erasure burst spans
$\mathcal{B}$; $\{\mathcal{B}\}$  Erasure pattern; set of erasure patterns
Z                                 Number of erasure bursts within a coding block
L                                 Number of link outages within a coding block
T                                 Decoding delay (in number of symbols)
K                                 Number of source symbols within a coding block
$K_n$                             Number of source symbols injected at time n
Q                                 Finite field
R                                 Rate of code
⊗                                 Kronecker product
⋆                                 Erasure symbol
τ                                 Link delay
$E_m$; $E_{m,n}$                  Encoding function for Link m; encoding function for Link m, time n
$D$; $D_n$                        Decoding function; decoding function for time n
$C$                               Code, $C = (\{E_m\}_{m=1}^{M}, D)$

Table 5.1: Notations used in Chapter 5.

We care about two types of practically relevant errors. If a link outage occurs, all the symbols transmitted over a link are erased. A B-burst is B consecutive symbol erasures on a link. In this thesis, we assume that within a coding block, at most one burst occurs on a link. We usually consider a set of erasure patterns $\{\mathcal{B}\}$, for example, the set of all patterns with L link outages and Z B-bursts, each occurring on a separate link.

A code $C$ over an M-parallel-link network consists of a set of M encoding functions $\{E_m\}_{m=1}^{M}$ and a decoding function $D$. The operation of the code is described as follows.

Encoding. At time n, the source node observes a vector of source symbols $S[n] = (s[1, n], \dots, s[K_n, n])^T \in Q^{K_n \times 1}$, where each source symbol is over the finite field Q, and $K_n$ is the number of source symbols injected at time n.¹ To encode the source symbols into channel symbols, for each Link m, m = 1, ..., M, we consider an encoding function $E_m : Q^K \to Q^N$ that takes in a vector of $K := \sum_{n=1}^{N} K_n$ source symbols $S \in Q^K$ ² and maps them into a block of N channel symbols. The generated channel symbols $X_m = E_m(S)$ are then transmitted sequentially over the link. The rate of the code is defined as

$$R := \frac{K}{N} \ \text{symbols/unit time}. \tag{5.1}$$

The code is causal if, in the encoding function, the current channel symbol is a function of the current and previous source symbols only. That is, let $E_m = \{E_{m,n}\}_{n=1}^{N}$, where $E_{m,n}$ is the encoding function for the output symbol at time n; then $x[m, n] = E_{m,n}(\{S[n']\}_{n'=1}^{n})$, m = 1, ..., M, n = 1, ..., N. If the code is linear, the encoding functions can be represented as the matrix multiplication

$$X_m = E_m(S) = S \cdot G_m \tag{5.2}$$

where $S \in Q^{1 \times K}$, $X_m \in Q^{1 \times N}$ and $G_m \in Q^{K \times N}$.

¹ In our setting, $K_n$ varies over time. This is akin to the single-link (N, K) block code case, where $K_n = 1$ for n = 1, ..., K and $K_n = 0$ for n = K + 1, ..., N. Note that the source symbol rate can be smoothed out by packetizing symbols through diagonal interleaving [125].

² $S$ and $\{S[n]\}_{n=1,\dots,N}$ are in one-to-one correspondence, but the symbols could be re-ordered at the convenience of the algorithm description. See Section 5.3.2.1 for more details.

Decoding. At the sink, the decoding function $D : \{Q, \star\}^{M \times N} \to Q^K$ (with ⋆ the erasure symbol) takes in the received channel symbols $\{Y_m\}_{m=1}^{M}$, where $Y_m = (y[m, 1]\ \dots\ y[m, N])$, and maps them into the reconstructed source symbols $\hat{S} \in Q^K$. If $\hat{S} = S$ for every $\mathcal{B} \in \{\mathcal{B}\}$, we say code $C$ is feasible for correcting the set of erasure patterns $\{\mathcal{B}\}$. Furthermore, each source symbol is subject to a decoding delay of T symbols. That is, let $D = \{D_n\}_{n=1}^{N}$, where $D_n$ is the decoding function that outputs the reconstructed source vector $\hat{S}[n] \in Q^{K_n \times 1}$; then $\hat{S}[n] = D_n(\{y[m, n']\}_{m=1,\dots,M,\ n'=1,\dots,n+T})$, n = 1, ..., N. We then say the code is delay-T.

The Singleton bound, in the context of erasure codes, requires that the number of source symbols is no larger than the number of unerased symbols. For an M-parallel-link block code feasible for correcting L link outages and Z B-bursts, it means MN ≥ K + LN + ZB. If MN = K + LN + ZB, we refer to the code as Singleton-achieving.
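The counting argument behind the Singleton bound can be checked numerically. The following is a minimal sketch; the function names are hypothetical and not from the thesis, and the bound is only a necessary condition, so passing the check does not by itself show that a feasible code exists.

```python
def unerased_symbols(M, N, L, Z, B):
    """Channel symbols left in an M x N block after L link outages
    (N symbols each) and Z B-bursts on Z other links (B symbols each)."""
    return M * N - L * N - Z * B


def satisfies_singleton_bound(M, N, K, L, Z, B):
    """Necessary condition MN >= K + LN + ZB for a feasible code."""
    return K <= unerased_symbols(M, N, L, Z, B)


def is_singleton_achieving(M, N, K, L, Z, B):
    """True when the bound holds with equality, MN = K + LN + ZB."""
    return K == unerased_symbols(M, N, L, Z, B)
```

For instance, a block with M = 4, N = 5, L = 1, Z = 2 and B = 2 leaves 20 − 5 − 4 = 11 unerased symbols, so a code carrying K = 11 source symbols per block would be Singleton-achieving.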

5.2

Main Result

Our main result for the parallel-link burst erasure codes can be summarized in the following theorem:

Theorem 1. It is possible to construct a delay-T rate-R Singleton-achieving M-parallel-link causal block erasure code feasible for L link outages and Z B-bursts, each occurring on a separate link, if

$$R \le R^* := \begin{cases} M - L - \dfrac{ZB}{T + B}, & T \ge B, \\ M - L - Z, & T < B. \end{cases} \tag{5.3}$$

Conversely, if R > R∗, no feasible code can be constructed.

Proof. The achievability of the code is proven by construction in Section 5.3. The converse is proven in Appendix A.1.

Theorem 1 assures the best rate that can be achieved for the case when the decoding of the source takes place either before (T < B) or after (T ≥ B) the end of a single burst. An alternative view of the theorem is obtained by solving (5.3) for T:

$$T \ge \begin{cases} \max\left(\dfrac{Z}{M - L - R} - 1,\ 1\right) B, & R > M - L - Z, \\ 0, & R \le M - L - Z, \end{cases} \tag{5.4}$$

which suggests that when R > M − L − Z, the lowest delay that can be achieved is T = B, which occurs when R = M − L − Z/2. The delay T cannot be reduced further when M − L − Z < R < M − L − Z/2. However, when R ≤ M − L − Z, we could have T = 0.
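To make the tradeoff in (5.3) and (5.4) concrete, the two bounds can be evaluated with exact rational arithmetic. This is an illustrative sketch only: the function names are hypothetical, B ≥ 1 is assumed, and rates at or above M − L are rejected because (5.4) then has no finite solution.

```python
from fractions import Fraction


def max_rate(M, L, Z, B, T):
    """Best achievable rate R* of (5.3) for decoding delay T (assumes B >= 1)."""
    if T >= B:
        return M - L - Fraction(Z * B, T + B)
    return Fraction(M - L - Z)


def min_delay(M, L, Z, B, R):
    """Lowest decoding delay T permitted by (5.4) for code rate R."""
    R = Fraction(R)
    if R <= M - L - Z:
        return Fraction(0)
    if R >= M - L:
        raise ValueError("rate must be below M - L for a finite delay")
    return max(Fraction(Z) / (M - L - R) - 1, Fraction(1)) * B
```

With M = 4, L = 1, Z = 2 and B = 2, `max_rate(4, 1, 2, 2, 3)` evaluates to 11/5, and `min_delay(4, 1, 2, 2, Fraction(11, 5))` returns the delay 3, so the two views of the theorem agree at that operating point.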

[Figure 5.1 plot: normalized rate R*/M versus normalized delay T/B for M = 6, 10 and 20; curves: MDS Codes, Delay-Optimal Codes.]

Figure 5.1: Numerical comparison of the rate-delay bound for delay-optimal codes and MDS codes. In this plot, we fix L = 2 and Z = 3.

When M = 1, Z = 1 and L = 0, (5.4) degenerates to

$$T \ge \max\left(\frac{R}{1 - R},\ 1\right) \cdot B, \tag{5.5}$$

which is the single-link single-burst bound in [57]. As expected, in this case it is not possible to recover an erasure before delay B. However, note that the result in [57] applies only to systematic codes (i.e., the first K channel symbols are a replica of the K source symbols), whereas Theorem 1 applies to the more general causal codes.

We plot in Figure 5.1 the rate-delay bound derived in Theorem 1 for a few parameters. The result is benchmarked against the bound for codes that use symbols within delay T from different links to form a long coding block, and apply a K × (MN) generator matrix of an MDS code (for example, a Reed-Solomon code). We observe that the bounds are the same at T < B. At T ≥ B, the delay-optimal bound has a higher rate than the bound for MDS codes, with the largest gain in the low-rate region. The gain diminishes as T/B becomes large. Furthermore, codes of larger M have a higher per-link rate R/M than codes of smaller M. This is expected, as path diversity introduces more robustness to error, which in turn improves coding efficiency.

It is interesting to note that each rate-delay curve has a discontinuity at T = B. In other words, for a rate R within a certain range of values, the best delay T achievable is lower-bounded by the burst length B. To get some intuition on why this is the case, first consider the single-link single-burst case with bound (5.5), where T is bounded by B regardless of the rate R, because if the burst occurs from the beginning of a block, no information can be received before the burst ends. In the case M > 1, as we will show in the converse (see Appendix A.1), the only way that T can be lower than B is to form coding blocks across the links (but not over time) at a low rate R, which yields T = 0.

5.3

Code Construction

We consider the achievability part of Theorem 1 by constructing codes to meet the rate-delay bound in (5.4). We start with the single-link burst erasure codes, and then describe the proposed parallel-link burst erasure codes built on top of them.

5.3.1

Single-Link Burst Erasure Code

For single-link block codes (M = 1) that correct one B-burst with delay T, it is sufficient to use a (T, T − B) cyclic code over finite field Q, with generator matrix

$$G_C = \begin{pmatrix} I_{T-B} & P_{(T-B) \times B} \end{pmatrix} \tag{5.6}$$

where $I_M \in Q^{M \times M}$ denotes the M × M identity matrix and $P_{(T-B) \times B} \in Q^{(T-B) \times B}$ is the parity sub-matrix. Martinian et al. [57, 125, 124] propose a construction of a (T + B, T) code with the same delay and burst correction performance, but with a better rate, having the form

$$G_D = \begin{pmatrix} I_T & \begin{matrix} I_B \\ P_{(T-B) \times B} \end{matrix} \end{pmatrix} \tag{5.7}$$

where $P_{(T-B) \times B}$ is the same as in (5.6). An intuition behind this construction is that since it does not correct wrap-around bursts, the rate can be improved. Clearly, the proposed code construction achieves the rate-delay-burst bound in (5.5).
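The delay property of a matrix of the form (5.7) can be checked by brute force for a small case. The sketch below takes T = 3, B = 2 over GF(2) with the assumed parity sub-matrix P = (1 1), and tests whether each source symbol lies in the span of the unerased channel symbols received within T slots of its injection. The helper names and the 0-based time convention (source symbol n injected at slot n) are assumptions made for illustration.

```python
T, B = 3, 2
# G_D = (I_T | [I_B ; P]) with P = (1 1); rows are source symbols, columns are time slots.
G_D = [[1, 0, 0, 1, 0],
       [0, 1, 0, 0, 1],
       [0, 0, 1, 1, 1]]


def column_mask(G, j):
    """Pack column j of G into an integer; bit i is the coefficient of source symbol i."""
    return sum(G[i][j] << i for i in range(len(G)))


def reduce_vec(v, basis):
    """Reduce v against an XOR basis keyed by leading bit; 0 means v is in the span."""
    while v:
        lead = v.bit_length() - 1
        if lead not in basis:
            return v
        v ^= basis[lead]
    return 0


def in_span(vectors, target):
    """True if target is a GF(2) linear combination of the given vectors."""
    basis = {}
    for v in vectors:
        v = reduce_vec(v, basis)
        if v:
            basis[v.bit_length() - 1] = v
    return reduce_vec(target, basis) == 0


def corrects_burst_with_delay(G, burst_start, burst_len, delay):
    """True if every source symbol n is recoverable from the unerased channel
    symbols of slots 0 .. min(n + delay, N - 1) when the burst starts at burst_start."""
    K, N = len(G), len(G[0])
    erased = set(range(burst_start, burst_start + burst_len))
    for n in range(K):
        deadline = min(n + delay, N - 1)
        seen = [column_mask(G, j) for j in range(deadline + 1) if j not in erased]
        if not in_span(seen, 1 << n):
            return False
    return True


# Every non-wrap-around burst position within the block.
all_bursts_ok = all(corrects_burst_with_delay(G_D, start, B, T)
                    for start in range(len(G_D[0]) - B + 1))
```

Under these assumptions every in-block 2-burst is corrected with delay 3, whereas the same check with delay 2 fails for a burst at the start of the block, in line with the bound T ≥ B.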

5.3.2

Parallel-Link Burst Erasure Code

The parallel-link burst erasure code we propose utilizes the single-link burst erasure code as a building block. It also has some similarity with product codes, which are a special class of concatenated codes [133].

5.3.2.1

Encoding

Proposition 2 (Encoding). The encoding matrices $G_m$, m = 1, ..., M, of a delay-T rate-R∗ (as in (5.3)) Singleton-achieving M-parallel-link causal linear block erasure code feasible for L link outages and Z B-bursts, each occurring on a separate link, can be constructed as follows. First select a field size $q^m$ such that a (T, T − B) cyclic code, an (M, M − L) MDS code and an (M, M − L − Z) MDS code can be constructed in the finite field $Q = \mathrm{GF}(q^m)$. Create a systematic (T, T − B) cyclic code with generator matrix $G_C$ and the corresponding single-link inner code generator matrix $G_D$ (as in (5.7)). Create an (M, M − L) MDS outer code with generator matrix $U \in Q^{(M-L) \times M}$ and an (M, M − L − Z) MDS outer code with generator matrix $V \in Q^{(M-L-Z) \times M}$. Create a generator matrix $G \in Q^{[(M-L)T + (M-L-Z)B] \times [(T+B)M]}$ as

$$G = \begin{pmatrix} G^U \\ G^L \end{pmatrix} := \begin{pmatrix} U \otimes G_D \\ V \otimes \begin{pmatrix} 0_{B \times T} & I_B \end{pmatrix} \end{pmatrix}$$

where ⊗ denotes the Kronecker product, $0_{B \times T} \in Q^{B \times T}$ is an all-0 matrix and $I_B \in Q^{B \times B}$ is an identity matrix. $G^U \in Q^{[(M-L)T] \times [(T+B)M]}$ and $G^L \in Q^{[(M-L-Z)B] \times [(T+B)M]}$ are the upper and lower part of G, respectively. Divide G into M equal-size sub-matrices, i.e., $G = (G_1\ G_2\ \dots\ G_M)$, and use $G_m \in Q^{[(M-L)T + (M-L-Z)B] \times [T+B]}$ as the generator matrix for link m, m = 1, ..., M. Let the source vector (as in (5.2)) be $S =: (s_1\ s_2\ \dots\ s_K)$. S[n], the source symbols injected at time n, is specified as

$$S[n] = \begin{cases} \left(\{s_{(i-1)T+n}\}_{i=1}^{M-L}\right)^T, & n = 1, \dots, T, \\ \left(\{s_{(M-L-1)T+(i-1)B+n}\}_{i=1}^{M-L-Z}\right)^T, & n = T+1, \dots, T+B. \end{cases} \tag{5.8}$$

Example. To construct a 4-parallel-link code that corrects one link outage and two 2-bursts with delay 3, we choose a (5, 3) single-burst block erasure code in GF(2) with generator matrix $G_D$ as

$$G_D = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 \end{pmatrix},$$

a (4, 3) MDS code in GF(2) with generator matrix

$$U = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

and a (4, 1) MDS code in GF(2) with generator matrix $V = (1\ 1\ 1\ 1)$. $G = (G_1\ G_2\ G_3\ G_4)$ is then constructed as

$$G = \left(\begin{array}{ccccc|ccccc|ccccc|ccccc}
1&0&0&1&0 & 0&0&0&0&0 & 0&0&0&0&0 & 1&0&0&1&0 \\
0&1&0&0&1 & 0&0&0&0&0 & 0&0&0&0&0 & 0&1&0&0&1 \\
0&0&1&1&1 & 0&0&0&0&0 & 0&0&0&0&0 & 0&0&1&1&1 \\
0&0&0&0&0 & 1&0&0&1&0 & 0&0&0&0&0 & 1&0&0&1&0 \\
0&0&0&0&0 & 0&1&0&0&1 & 0&0&0&0&0 & 0&1&0&0&1 \\
0&0&0&0&0 & 0&0&1&1&1 & 0&0&0&0&0 & 0&0&1&1&1 \\
0&0&0&0&0 & 0&0&0&0&0 & 1&0&0&1&0 & 1&0&0&1&0 \\
0&0&0&0&0 & 0&0&0&0&0 & 0&1&0&0&1 & 0&1&0&0&1 \\
0&0&0&0&0 & 0&0&0&0&0 & 0&0&1&1&1 & 0&0&1&1&1 \\
0&0&0&1&0 & 0&0&0&1&0 & 0&0&0&1&0 & 0&0&0&1&0 \\
0&0&0&0&1 & 0&0&0&0&1 & 0&0&0&0&1 & 0&0&0&0&1
\end{array}\right) \tag{5.9}$$

Let $S = (s_1\ s_2\ \dots\ s_{11})$. The source symbols injected at times 1 to 5 are specified as $S[1] = (s_1\ s_4\ s_7)^T$, $S[2] = (s_2\ s_5\ s_8)^T$, $S[3] = (s_3\ s_6\ s_9)^T$, $S[4] = (s_{10})^T$ and $S[5] = (s_{11})^T$.

The constructed generator matrix G consists of two parts. The upper part $G^U$ uses the single-link code $G_D$ as the inner code at each link, and is further coded by an outer code U of rate (M − L)/M applied across the links to resist up to L link outages. The upper part $G^U$, if used alone, provides resistance to up to one B-burst at each link and up to L link outages. Given that the B-bursts occur only in Z links, the lower part $G^L$ inserts additional source symbols. The inserted source symbols (corresponding to the $I_B$ part) share the channel symbols with the parity part of $G_D$. Since they have to resist both L link outages and Z bursts, they are coded by the outer code V of rate (M − L − Z)/M.

One can verify that the constructed code is causal by observing the source symbols S[n] injected at each time n and the corresponding coefficients in $G_m$, m = 1, ..., M. One can also verify that the constructed code has rate R = M − L − ZB/(B + T). As a special case, when T < B, (5.3) suggests that we rather consider codes with R = M − L − Z, which would give us T = 0. In this case, the generator matrix degenerates to $G = V \otimes I_N$, i.e., the source symbols are coded via generator matrix V and then interleaved by N. For the single-link code with M = 1, L = 0 and Z = 1, G simply degenerates to $G_D$. For codes that only correct link outages, i.e., Z = 0 (or equivalently, B = 0), the generator matrix degenerates to $G = U \otimes I_N$.

In a slight modification to this construction, we can also construct a burst-correction-only (M − L)-link code correcting Z B-bursts as the inner code and an (M, M − L) MDS outer code across the links. This concatenated construction, after simplification, gives a generator matrix very similar (but not identical) to the one in Proposition 2. This type of construction often requires a larger field size, but is useful when extending the parallel-link code to single-source multicast networks.
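The Kronecker-product construction of Proposition 2 can be reproduced for the worked example in a few lines. The following is a sketch in plain Python over GF(2); the helper names `kron`, `zero_id` and `inject_time` are hypothetical, and the causality check uses 0-based time slots with the injection order of (5.8).

```python
from fractions import Fraction


def kron(A, Bm):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in Bm]


# Parameters of the worked example: 4 links, 1 outage, two 2-bursts, delay 3.
M, L, Z, T, B = 4, 1, 2, 3, 2
G_D = [[1, 0, 0, 1, 0],
       [0, 1, 0, 0, 1],
       [0, 0, 1, 1, 1]]          # (5, 3) single-link inner code
U = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]               # (4, 3) outer code
V = [[1, 1, 1, 1]]               # (4, 1) outer code

# (0_{BxT} | I_B): places the extra source symbols on the last B time slots.
zero_id = [[0] * T + [int(i == j) for j in range(B)] for i in range(B)]

# Entries are 0/1 and each product is a single term, so no reduction mod 2 is needed.
G = kron(U, G_D) + kron(V, zero_id)      # G^U stacked on top of G^L
K, N = len(G), T + B
G_links = [[row[m * N:(m + 1) * N] for row in G] for m in range(M)]  # G_1 ... G_M
rate = Fraction(K, N)


def inject_time(r):
    """0-based time slot at which source symbol s_{r+1} is injected, following (5.8)."""
    upper = (M - L) * T
    return r % T if r < upper else T + (r - upper) % B


# Causality: a channel symbol at slot t may only depend on symbols injected at slots <= t.
is_causal = all(inject_time(r) <= c % N
                for r in range(K) for c in range(M * N) if G[r][c])
```

The resulting `G` has 11 rows and 20 columns, `rate` equals 11/5 = M − L − ZB/(T + B), and `is_causal` holds for this example.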

5.3.2.2

Decoding

Define $\mathcal{L}$ as the set of link outages and $\mathcal{B}$ the set of links where erasure bursts occur. Divide the source vector into

$$S = (A_1\ \dots\ A_{M-L}\ \ B_1\ \dots\ B_{M-L-Z})$$

where $A_i \in Q^{1 \times T}$ and $B_i \in Q^{1 \times B}$ are the sub-vectors corresponding to $G^U$ and $G^L$, respectively. Denote by $E_{G_D} : Q^T \to Q^{T+B}$ the encoding function of the single-link code with generator matrix $G_D$, and by $D_{G_D} : \{Q, \star\}^{T+B} \to Q^T \times Q^{T+B}$ the corresponding decoding function that outputs the source symbol vector and the clean channel symbol vector. The encoding and decoding functions for U and V are defined similarly.

Proposition 3 (Decoding). The M-parallel-link causal linear block erasure code described in Proposition 2 can be decoded in the following steps:

• For each link $m \notin \mathcal{L} \cup \mathcal{B}$, mark the last B symbols of $Y_m$ as erased and perform decoding $(C_m, X_m^U) = D_{G_D}(Y_m)$. Compute the difference $X_m^L = Y_m - X_m^U$, $m \notin \mathcal{L} \cup \mathcal{B}$, where $X_m^L$ corresponds to the information encoded by $G^L$.

• Perform decoding $(\{\hat{B}_i\}_{i=1}^{M-L-Z}, \{X_m^L\}_{m=1}^{M}) = D_V(\{X_m^L\}_{m \notin \mathcal{L} \cup \mathcal{B}})$. Compute the difference $Y_m^U = Y_m - X_m^L$, $m \in \mathcal{B}$, where $Y_m^U$ corresponds to the information encoded by $G^U$ (but still corrupted by the bursts).

• Perform decoding $(C_m, X_m^U) = D_{G_D}(Y_m^U)$, $m \in \mathcal{B}$.

• Perform decoding $(\{\hat{A}_i\}_{i=1}^{M-L}, \{X_m^U\}_{m=1}^{M}) = D_U(\{X_m^U\}_{m \notin \mathcal{L}})$.

• Output $\hat{S} = (\hat{A}_1\ \dots\ \hat{A}_{M-L}\ \ \hat{B}_1\ \dots\ \hat{B}_{M-L-Z})$.

Example. Consider the 4-parallel-link code example correcting one link outage and two 2-bursts with delay 3, described in Section 5.3.2.1. Assume Link 1 is lost (i.e., $\mathcal{L} = \{1\}$), Link 2 has y[2, 1], y[2, 2] erased, and Link 3 has y[3, 2], y[3, 3] erased (thus $\mathcal{B} = \{2, 3\}$). To decode, start with decoding the channel symbols on Link 4. Mark y[4, 4], y[4, 5] as erased (since they are "corrupted" by the last two source symbols) and perform decoding to recover $X_4^U$. Note that since the code is linear, if we linearly combine three codewords, the resulting sum is still a codeword, thus the decoding can be performed successfully. After that, we recover $X_4^L = Y_4 - X_4^U$ and perform decoding on $X_4^L$, which allows us to completely recover the source information corresponding to the lower part $G^L$. After removing this information, we are left with $Y_2^U$ and $Y_3^U$, and then we can proceed to recover $X_2^U$, $X_3^U$ and the corresponding source information within the delay constraint. Finally, performing decoding across the links, we recover $X_1^U$ and the corresponding source information.
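The decoding steps can be traced in code for this specific example. The sketch below works over GF(2) and hard-codes the example's erasure pattern (Link 1 lost, bursts on Links 2 and 3); the inner decoder is an exhaustive search over the 2^T messages, chosen for brevity rather than taken from the thesis, and all function names are hypothetical. Timing is not modeled, so the sketch illustrates correctness of the steps only, not the delay guarantee.

```python
T, B = 3, 2
G_D = [[1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [0, 0, 1, 1, 1]]
ERASED = None


def inner_encode(c):
    """Encode T bits with G_D over GF(2)."""
    return [sum(c[i] * G_D[i][j] for i in range(T)) % 2 for j in range(T + B)]


def inner_decode(y):
    """Erasure-decode by exhaustive search: the unique message consistent with y."""
    matches = []
    for code in range(2 ** T):
        c = [(code >> i) & 1 for i in range(T)]
        x = inner_encode(c)
        if all(yj is ERASED or yj == xj for yj, xj in zip(y, x)):
            matches.append((c, x))
    assert len(matches) == 1, "erasure pattern not correctable"
    return matches[0]


def encode(A, B1):
    """U = (I_3 | 1): Links 1-3 carry A_1..A_3 and Link 4 their sum;
    V = (1 1 1 1) adds B_1 onto the last B symbols of every link."""
    C = A + [[(A[0][k] + A[1][k] + A[2][k]) % 2 for k in range(T)]]
    X = [inner_encode(c) for c in C]
    for x in X:
        for k in range(B):
            x[T + k] ^= B1[k]
    return X


def strip_lower(y, B1):
    """Remove the G^L contribution from the unerased tail symbols of a link."""
    return [v if v is ERASED or j < T else v ^ B1[j - T] for j, v in enumerate(y)]


def decode(Y):
    """Proposition 3 for outage set {1} and burst set {2, 3} (links numbered from 1)."""
    # Step 1: Link 4 is clean; mark its last B symbols as erased and decode.
    C4, XU4 = inner_decode(Y[3][:T] + [ERASED] * B)
    XL4 = [y ^ x for y, x in zip(Y[3], XU4)]
    # Step 2: V is a repetition code, so X^L of one clean link yields B_1.
    B1 = XL4[T:]
    # Step 3: strip B_1 from the burst links and decode the inner code.
    A2, _ = inner_decode(strip_lower(Y[1], B1))
    A3, _ = inner_decode(strip_lower(Y[2], B1))
    # Step 4: U is a single-parity-check code, so A_1 = C_4 + A_2 + A_3.
    A1 = [(C4[k] + A2[k] + A3[k]) % 2 for k in range(T)]
    return [A1, A2, A3], B1


def apply_example_erasures(X):
    """Link 1 outage; y[2,1], y[2,2] and y[3,2], y[3,3] erased."""
    Y = [list(x) for x in X]
    Y[0] = [ERASED] * (T + B)
    Y[1][0] = Y[1][1] = ERASED
    Y[2][1] = Y[2][2] = ERASED
    return Y
```

A round trip such as `decode(apply_example_erasures(encode(A, B1)))` should return `(A, B1)` for any three 3-bit vectors `A` and any 2-bit vector `B1`.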

5.3.3

Complexity

We discuss a few issues related to the computational complexity of the constructed parallel-link code.

5.3.3.1

Finite Field Size

A small finite field size is generally desirable for the complexity of finite field arithmetic. Proposition 2 suggests that the required field size $q^m$ should meet the need for (i) a (T, T − B) cyclic code, (ii) an (M, M − L) MDS code and (iii) an (M, M − L − Z) MDS code. It is sufficient to use (shortened) Reed-Solomon codes in all three cases. Since a Reed-Solomon code must satisfy $n \le q^m - 1$, where n is the coding block length, it is sufficient to use a field size $q^m$ = q · max(T, M). For the example of constructing a 4-parallel-link code that corrects one link outage and two 2-bursts with delay 3, a sufficient field size is $q^m = 2^2$. Note that the example in Section 5.3.2.1 uses field size 2, since a Hamming code and a repetition code are used instead of a Reed-Solomon code, both requiring a smaller field size.

5.3.3.2

Encoding Complexity

Since the code is linear, the encoding can be represented as a matrix multiplication. Generally, the encoding complexity is on the order of O(K · N), where K = (M − L)T + (M − L − Z)B and N = (T + B)M. Note that the generator matrix G can be very sparse in our construction.


5.3.3.3

Decoding Complexity

The decoding complexity mainly involves the decoding of (i) a (T, T − B) cyclic code, (ii) an (M, M − L) MDS code and (iii) an (M, M − L − Z) MDS code. If Reed-Solomon codes and a Fast-Fourier-Transform-like decoding scheme are used in all three cases, the decoding complexity for each block is O(n log n), where n is the coding block length.

5.4

Extension to Heterogeneous Parallel Links

In this section, we extend our results to the more general cases involving heterogeneous link capacity and delay.

5.4.1

Parallel Links with Heterogeneous Capacity

In Section 5.1, we model the parallel links as having unit capacity, i.e., each link can transmit one symbol within a unit time. It is not hard to generalize this to parallel links with heterogeneous capacity. First, if the capacity is a fractional number, we can always approximate it well with an integer by choosing the time unit sufficiently small. Therefore, we can focus on the integer case. We can model a link with capacity $c_m$, where $c_m > 1$ is an integer, as $c_m$ parallel links each with unit capacity. The delay penalty associated with this conversion is at most one symbol duration. To see this, consider that within a unit symbol duration, a link with capacity $c_m$ can send $c_m$ symbols sequentially. The first symbol will finish the transmission after $1/c_m$. If, however, the $c_m$ symbols are sent in parallel over $c_m$ unit-capacity links, each symbol will finish the transmission after 1. Therefore, the worst delay penalty is $1 - 1/c_m \le 1$. If we further assume the worst case, in which the link outages and burst erasures occur on the links with the highest capacities, then the following proposition characterizes the performance when a unit-capacity parallel-link code is applied to the heterogeneous-capacity parallel links:

Proposition 4. For M parallel links, each with capacity $c_m \in \mathbb{N}$, m = 1, ..., M, arranged in descending order, to correct L link outages and Z B-bursts with decoding delay T, each occurring on a separate link, it is sufficient to apply a Singleton-achieving $\left(\sum_{m=1}^{M} c_m\right)$-parallel-link causal block erasure code with decoding delay T − 1, feasible for $\sum_{m=1}^{L} c_m$ link outages and $\sum_{m=L+1}^{L+Z} c_m$ B-bursts.
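The parameter mapping in Proposition 4 amounts to sorting the capacities and summing prefixes. A minimal sketch follows; the function name and the returned dictionary keys are hypothetical, chosen only for illustration.

```python
def unit_link_parameters(capacities, L, Z, T):
    """Map integer link capacities to the parameters of the equivalent
    unit-capacity parallel-link code of Proposition 4 (worst case: outages
    hit the L largest links, bursts hit the next Z largest)."""
    c = sorted(capacities, reverse=True)
    if any(ci < 1 for ci in c):
        raise ValueError("capacities must be positive integers")
    if L + Z > len(c):
        raise ValueError("more impaired links than links")
    if T < 1:
        raise ValueError("the conversion costs up to one symbol of delay")
    return {
        "links": sum(c),                # total number of unit-capacity links
        "outages": sum(c[:L]),          # unit links lost to the L outages
        "bursts": sum(c[L:L + Z]),      # unit links hit by a B-burst
        "delay": T - 1,                 # delay budget left for the unit-link code
    }
```

For example, capacities (1, 3, 2, 1) with L = 1, Z = 1 and T = 4 map to a 7-parallel-link code correcting 3 link outages and 2 B-bursts with decoding delay 3.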

5.4.2

Parallel Links with Heterogeneous Link Delay

Previously we assumed that the links are delay-free. We would like to generalize the results developed in Section 5.2 to parallel links where each link has a potentially different delay. Consider a problem setup similar to Section 5.1, except that each of the M parallel links introduces a delay of $\tau_m$ symbols, m = 1, ..., M; that is, the received channel symbols are related to the transmitted channel symbol x[m, n] as

$$y[m, n + \tau_m] = \begin{cases} \star, & (m, n) \in \mathcal{B}, \\ x[m, n], & \text{otherwise,} \end{cases}$$

where ⋆ denotes an erasure.

It is not hard to justify that the following strategy will work: we simply encode the source symbols using the same code construction as in Section 5.3. At the decoder, we wait for the longest $\tau_m$ until all symbols needed for performing the current decoding arrive, i.e., we recover S[n] at time $n + T + \max_m \tau_m$, n = 1, ..., N. The question is, can we do better than this strategy? Unfortunately, the following result shows that this is not possible.

Theorem 5. For M parallel links, each with link delay $\tau_m$, m = 1, ..., M, it is possible to construct a rate-R Singleton-achieving M-parallel-link causal block erasure code with overall decoding delay $T + \max_{m=1,\dots,M} \tau_m$, feasible for L link outages and Z B-bursts, each occurring on a separate link, if

$$R \le R^* := \begin{cases} M - L - \dfrac{ZB}{T + B}, & T \ge B, \\ M - L - Z, & T < B. \end{cases} \tag{5.10}$$

Conversely, if R > R∗, no feasible code can be constructed.

Proof. Achievability: encode the source symbols using the generator matrix G in Proposition 2. At time $n + T + \max_m \tau_m$, n = 1, ..., N, the decoder collects $\{y[m, n'] : n' = \tau_m + 1, \dots, \tau_m + n + T,\ m = 1, \dots, M\}$ and applies decoding as in Proposition 3 to reconstruct S[n]. A proof of the converse is outlined in Appendix A.2.

This result shows that in order to achieve low decoding delay over parallel links, the variation of link delay should be made as low as possible.

5.5

Application to Fast Stream Startup

In this section, we apply the constructed parallel-link burst erasure code to the peer-assisted fast stream startup system discussed in Chapter 4 and show that it can effectively help reduce the overall startup latency. In Section 4.2.3, we have shown that if an MDS code is applied, the resulting decoding delay can be expressed as in (4.5). If a low-delay block erasure code is applied, by (5.4), the decoding delay can be expressed as

$$T_C^{\mathrm{PAS\text{-}LD}} = \begin{cases} \max\left(\dfrac{Z \mu_U}{M(\mu_U - 1) - L \mu_U} - 1,\ 1\right) T_{\mathrm{BURST}}^{\mathrm{MAX}}, & \mu_U < \dfrac{M}{M - L - Z}, \\ 0, & \mu_U \ge \dfrac{M}{M - L - Z}. \end{cases}$$
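The expression for the low-delay decoding delay can be evaluated directly. The sketch below uses exact rational arithmetic; the function and parameter names are hypothetical, and the guard on a non-positive denominator is an added assumption covering operating points where the expression has no finite value.

```python
from fractions import Fraction


def low_delay_decoding_delay(mu_u, M, L, Z, t_burst_max):
    """Evaluate the piecewise decoding-delay expression for the low-delay code.
    The result is in the same time unit as t_burst_max."""
    mu = Fraction(mu_u)
    # Upper branch: the unicast rate is high enough for zero decoding delay.
    if M - L - Z > 0 and mu >= Fraction(M, M - L - Z):
        return Fraction(0)
    denom = M * (mu - 1) - L * mu
    if denom <= 0:
        raise ValueError("no finite decoding delay at this unicast rate")
    return max(Z * mu / denom - 1, Fraction(1)) * t_burst_max
```

With M = 4, L = 1 and Z = 2, the threshold M/(M − L − Z) equals 4: any $\mu_U \ge 4$ gives zero decoding delay, $\mu_U = 2$ gives exactly one maximum burst duration, and $\mu_U = 3/2$ gives five.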

We illustrate the performance of both the MDS code and the low-delay code using the ns-2 simulation program discussed in Section 4.3. Assume that the erasure burst duration follows an exponential distribution with mean 16 ms. We examine the cases in which the target maximum correctable burst duration is set to 50 ms and 100 ms, which correspond to the 99.8% and 99.99% percentiles of the distribution, respectively. In Figure 4.11, we have shown that decreasing β helps reduce the startup latency, as transcoding helps reduce the decoder buffering data size. We would like to show how much additional gain low-delay coding can achieve on top of transcoding. In Figure 5.2, we plot the CDFs of the startup latency TSS under various combinations of transcoding (β = 0.5) / no transcoding (β = 1) and MDS code / low-delay code. It can be observed that low-delay coding achieves additional delay reduction on top of transcoding. The largest reduction is achieved when transcoding is combined with low-delay coding. Furthermore, the longer the burst duration to be corrected, the more saving the low-delay code achieves.

[Figure 5.2 plots: CDF of the stream startup latency TSS (sec), in two panels for a maximum burst duration of 50 ms and of 100 ms; curves in each panel: LD, β = 0.5; MDS, β = 0.5; LD, β = 1; MDS, β = 1.]

Figure 5.2: Simulation: CDFs of stream startup latency TSS under the proposed transcoding and erasure correcting coding schemes. Maximum burst durations of both 50 ms and 100 ms are examined. Abbreviations: LD – low-delay codes, MDS – maximum distance separable codes. β is the transcoding compression ratio.

Chapter Conclusion

In this chapter, we have studied the design of an erasure correcting code for the peer-assistance architecture that explicitly takes into consideration the decoding delay constraint. After a formal problem statement, we presented the main theorem that characterizes the fundamental design tradeoff among the code rate, error correction performance and decoding delay for any Singleton-bound-achieving parallel-link codes. We then described the practical code construction of the parallel-link burst erasure code. The cases when the parallel links are heterogeneous in terms of capacity and link delay were also discussed. Lastly, we applied the constructed low-delay codes to the peer-assisted fast stream startup problem and demonstrated the performance gain. Through this chapter, we have demonstrated the feasibility of designing a code with an aim beyond the conventional error correction performance.

Chapter 6
Distortion-Drift-Aware Video Transcoding

In previous chapters, we have demonstrated that video transcoding can greatly enhance system performance by reducing the stream bitrate. In Chapter 3, when video transcoding is integrated with erasure correcting coding, the resulting source-aware error protection scheme SLEP/SLEPr achieves improved robustness to bursty packet loss and overcomes the peer downlink bottleneck issue. In Chapter 4, we have shown that video transcoding is especially helpful in improving the scalability of the system for fast stream startup, since even moderate transcoding of the video could lead to a tremendous reduction of the unicast data volume.

In the proposed peer-assistance architecture, video transcoding needs to be performed in-network at the multicast peer receivers, where it is often desirable to use a scheme with low complexity. One such approach is open-loop transcoding, which requantizes the motion-compensated transform coefficients while preserving other control information (refer to Section 2.2.3 for more discussion). A known drawback of this scheme is the requantization distortion drift problem; that is, the distortion caused by requantization in a frame propagates over many predicted frames until the end of a GOP, potentially leading to visible artifacts. In our applications of packet loss repair and fast stream startup, this problem is relatively minor, since the transcoded video is only displayed transiently. Nevertheless, we would like to minimize its impact on the visual quality of the video.

In this chapter, we propose an optimization framework to mitigate the impact of transcoding on the received video quality. We first develop an analytical model to quantitatively capture the transcoding distortion drift effect in a video sequence. We then optimize the received video quality by selecting appropriate QPs of the quantizer for individual frames so as to minimize the modeled distortion. This distortion-drift-aware transcoding framework is useful for both packet loss repair and fast stream startup. In this chapter, we demonstrate its effectiveness in the transcoded-to-primary stream switching problem in fast stream startup.

In Section 6.1, we introduce a simplified model for the open-loop requantization-based transcoder, based on which we analyze the requantization error process in Section 6.2. In Section 6.3, we formulate the optimization problem for transcoded-to-primary stream switching, and show simulation results to demonstrate its effectiveness in improving the video quality. Lastly, we test the distortion-drift-aware transcoding framework in the peer-assisted fast stream startup system.

Throughout this chapter, when the received video quality is evaluated, we use a setup similar to that in Chapter 3. The results are based on six 4CIF (i.e., having a spatial resolution of 704×576) sequences of diverse rate-distortion characteristics – Soccer, Ice, City, Harbour, Crew and Spincal. The sequences are encoded at 30 frames per second with an H.264/AVC JM encoder at QP 25 for I/P frames and 27 for B frames. The GOP structure is chosen to be IBBBP and the GOP size is 33. Table 6.1 summarizes the notations used in this chapter.

6.1 Problem Model

In the open-loop requantization-based transcoding architecture, the transcoder requantizes the motion-compensated transform coeﬃcients while preserving the coding modes, motion estimation information and GOP structure. The block diagram of the video encoder (with embedded decoder) and the transcoder is shown in Figure 6.1.

Table 6.1: Notations used in Chapter 6.

Notation                              Explanation
$\mathsf{x}, \mathsf{v}, \mathsf{q}$  Random variables in sans-serif letters
$E\{\cdot\}$                          Expectation of a random variable
$f(\cdot)$                            Spatial loop filter
$\tau(\cdot)$                         Motion-compensation operator
$i$                                   Space index $i = (i_x, i_y)$
$n$                                   Time index
$[i^{-/+}(m), n^{-/+}(m)]$            Position $[i, n]$'s $m$th-order backward/forward recursive reference
$\mu_n^{-/+}$                         Backward/forward prediction coefficients for a B-frame $n$
$x$                                   Video signal
$\tilde{x}$                           Encoder local reconstruction of $x$
$\hat{x}$                             Decoder local reconstruction of $x$
$v$                                   Prediction error signal
$\tilde{v}$                           Quantized prediction error signal
$v_T$                                 Requantized prediction error signal
$q$                                   Quantization error signal
$q_T$                                 Requantization error signal
$\sigma_0^2[n]$                       Quantization distortion in primary video Frame $n$
$\sigma_T^2[n]$                       Quantization distortion in transcoded video Frame $n$
$\sigma_{EE}^2[n]$                    End-to-end distortion in decoder reconstruction Frame $n$
$\Delta\sigma_T^2[n]$                 $\Delta\sigma_T^2[n] := \sigma_T^2[n] - \sigma_0^2[n]$
$(D_n, R_n, \theta_n)$                Parameters of rate-distortion model [168]
$(A)^+$                               $(A)^+ := \max(A, 0)$
$\alpha[m, n]$                        Power transfer factor after $m$ steps for a process starting from Frame $n$
$\beta_n; \gamma_n$                   Model parameters associated with $\alpha[m, n]$
$\Phi_{PT}$                           Power transfer matrix
$N$                                   Number of frames in a GOP
$M$                                   Stream switching point within a GOP

The encoder structure represents a typical hybrid predictive-transform video coding architecture. After receiving the encoded video, the transcoder performs entropy decoding to reconstruct the quantized transform coefficients, followed by dequantization. It then uses a coarser quantizer to requantize the coefficients, followed by entropy re-encoding.¹ The control data and motion data are kept intact in the transcoded stream. This transcoder saves the complexity of both the motion estimation and the closed-loop motion compensation operations.

¹ If intra-frame prediction is invoked (e.g., in H.264/AVC), the transcoder must also invert the transform coding and the intra-frame prediction before applying requantization, in order to prevent catastrophic distortion propagation within the same frame. But since the complexity of intra-frame prediction is much lower than that of inter-frame motion compensation, such operations are assumed affordable.

Figure 6.1: Block diagram of the video encoder (with embedded decoder) and the open-loop requantization-based transcoder. [Encoder blocks: coder control, transform/quantizer, embedded decoder (dequantizer/inverse transform, intra/inter switch, motion-compensated predictor), motion estimator, entropy encoder. Transcoder blocks: entropy decoder, dequantizer, requantizer, entropy encoder; control data and motion data pass through unchanged.]

Consider a simplified codec structure used for analysis, shown in Figure 6.2, where the transform coding and entropy coding blocks are omitted since they can be considered lossless. The video signal $x[i, n]$ is discrete in two-dimensional space $i = (i_x, i_y)$ and time $n$. The codec includes a spatial loop filter $f(\cdot)$ such that $f(x[i, n]) := f[i] * x[i, n]$, where the impulse response $f[i]$ represents effects like sub-pel interpolation, and a motion-compensation operator $\tau(\cdot)$, which involves spatial and temporal shifting. Note that $\tau(\cdot)$ is time- and space-variant and should be indexed as $\tau_{i,n}(\cdot)$, but for convenience, we omit the subscript. For intra-coded frames (I-frames), uni-directionally predicted frames (P-frames) and bi-directionally predicted frames (B-frames), respectively:

$$\tau(x[i, n]) = 0, \tag{6.1}$$

$$\tau(x[i, n]) = x[i^-, n^-], \tag{6.2}$$

$$\tau(x[i, n]) = \mu_n^- \cdot x[i^-, n^-] + \mu_n^+ \cdot x[i^+, n^+], \tag{6.3}$$

where $n^-$ and $n^+$ are Frame $n$'s backward and forward reference frames, respectively, associated with the motion vectors $(i, i^-)$ and $(i, i^+)$. For a B-frame $n$, $\mu_n^- \ge 0$ and $\mu_n^+ \ge 0$ are the backward and forward prediction coefficients, respectively. Note that (6.3) is a simplification of the more general case where the forward and backward predictions are independent, each involving a different interpolation (or filtering) operation.

Figure 6.2: Simplified codec structure for analysis. [Encoder, transcoder and decoder blocks; signals $x$, $v$, $q$, $\tilde{v}$, $q_T$, $v_T$, $\tilde{x}$, $\hat{x}$; prediction loops through $f(\cdot)$ and $\tau(\cdot)$ at the encoder and the decoder.]

The encoder, transcoder and decoder are modeled as follows.

• Encoder. The input video signal $x$ is fed into the encoder. $x$ is predicted by a filtered and motion-compensated reference based on the encoder local reconstruction $\tilde{x}$. The prediction error signal $v$ is quantized to $\tilde{v}$. Denote by $q$ the additive quantization error, which is modeled as zero-mean. We send $\tilde{v}$ to the transcoder. $\tilde{v}$ is also used for the local reconstruction of $x$. The quantities at the encoder can be related as:

$$x[i, n] = f(\tau(\tilde{x}[i, n])) + v[i, n], \tag{6.4}$$

$$\tilde{v}[i, n] = v[i, n] + q[i, n], \tag{6.5}$$

$$\tilde{x}[i, n] = f(\tau(\tilde{x}[i, n])) + \tilde{v}[i, n]. \tag{6.6}$$

• Transcoder. The transcoder coarsely requantizes $\tilde{v}$ to $v_T$, with $q_T$ denoting the zero-mean additive requantization error:

$$v_T[i, n] = \tilde{v}[i, n] + q_T[i, n]. \tag{6.7}$$

We send $v_T$ to the decoder.

• Decoder. At the decoder, we generate the reconstruction based on the received $v_T$ and the locally available reference $\hat{x}$:

$$\hat{x}[i, n] = f(\tau(\hat{x}[i, n])) + v_T[i, n]. \tag{6.8}$$
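The encoder, transcoder and decoder relations (6.4) to (6.8) can be exercised numerically with a toy sketch. The following is only an illustration under strong simplifying assumptions that are not part of the model above: the loop filter f(·) is the identity, τ(·) always points to the co-located sample of the previous frame, the signal is one-dimensional white Gaussian noise, and both quantizers are uniform. The names `quantize` and `simulate_drift` are hypothetical. Under these assumptions the decoder-encoder mismatch is exactly the running sum of the requantization errors, which is the drift effect analyzed in the remainder of the chapter.

```python
import random


def quantize(value, step):
    """Uniform scalar quantizer with the given step size."""
    return step * round(value / step)


def simulate_drift(num_frames=6, num_samples=2000, step=4.0, step_t=16.0, seed=0):
    """Toy open-loop requantization with f = identity and tau = previous frame.

    Returns two lists (one entry per frame): the mean squared mismatch between
    decoder and encoder reconstructions, and the mean squared accumulated
    requantization error.
    """
    rng = random.Random(seed)
    x_enc = [0.0] * num_samples   # encoder local reconstruction (zero before Frame 0)
    x_dec = [0.0] * num_samples   # decoder reconstruction (zero before Frame 0)
    qt_sum = [0.0] * num_samples  # requantization error accumulated over frames
    drift_power, qt_sum_power = [], []
    for _ in range(num_frames):
        for i in range(num_samples):
            x = rng.gauss(0.0, 50.0)     # toy video sample
            v = x - x_enc[i]             # prediction error, as in (6.4)
            v_q = quantize(v, step)      # primary quantization, as in (6.5)
            v_t = quantize(v_q, step_t)  # coarser requantization, as in (6.7)
            x_enc[i] += v_q              # encoder reconstruction, as in (6.6)
            x_dec[i] += v_t              # decoder reconstruction, as in (6.8)
            qt_sum[i] += v_t - v_q       # q_T accumulated along the prediction chain
        drift_power.append(
            sum((d - e) ** 2 for d, e in zip(x_dec, x_enc)) / num_samples)
        qt_sum_power.append(sum(q * q for q in qt_sum) / num_samples)
    return drift_power, qt_sum_power
```

Because the first frame is predicted from an all-zero reference, it plays the role of an I-frame in this sketch; with a real loop filter the accumulated error power would decay rather than grow without bound.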

We are primarily interested in (i) $\sigma_0^2[n]$, the quantization distortion in the primary video Frame $n$; (ii) $\sigma_T^2[n]$, the quantization distortion (due to both primary quantization and requantization) in the transcoded video Frame $n$; and (iii) $\sigma_{EE}^2[n]$, the end-to-end distortion in the decoder reconstruction Frame $n$, respectively defined as:

$$\sigma_0^2[n] := E\{q[i, n]^2\}, \tag{6.9}$$

$$\sigma_T^2[n] := E\{(q[i, n] + q_T[i, n])^2\}, \tag{6.10}$$

$$\sigma_{EE}^2[n] := E\{(\hat{x}[i, n] - x[i, n])^2\}, \tag{6.11}$$

where $E\{\cdot\}$ denotes expectation. Throughout our analysis, we make two uncorrelatedness approximations:

(i) The primary quantization error $q$ and the requantization error $q_T$ in any frames are uncorrelated, i.e.,

$$E\{q[i, n] \cdot q_T[j, m]\} = 0, \quad \forall i, j, n, m. \tag{6.12}$$

This approximation will be further discussed in Section 6.2.1.1.

(ii) The requantization errors are spatially and temporally uncorrelated, i.e.,

$$E\{q_T[i, n] \cdot q_T[j, m]\} = 0, \quad \forall i \ne j, \text{ or } n \ne m. \tag{6.13}$$

6.2 Analysis of Transcoding Error Process

The main objective of this analysis is to model the end-to-end distortion $\{\sigma_{EE}^2[n]\}_{n=1}^{N}$ for a GOP of $N$ frames as a function of their frame sizes $\{R_T[n]\}_{n=1}^{N}$. In Section 6.2.1, through analysis of the intra-frame requantization error, we relate the transcoded frame size $R_T[n]$ to the quantization distortions $\sigma_0^2[n]$ and $\sigma_T^2[n]$. In Section 6.2.2, through analysis of the inter-frame requantization error, we relate $\sigma_{EE}^2[n]$ to $\sigma_0^2[n]$ and $\sigma_T^2[n]$.

Note that the transcoded frame size $R_T[n]$ can be associated with the transcoding compression ratio β in Chapter 4 (and the SLEP compression ratio β in Chapter 3), by defining

$$\beta := \frac{\sum_n R_T[n]}{\sum_n R_0[n]},$$

where $R_0[n]$ is the data size of Frame $n$ when it is not transcoded.

6.2.1 Intra-Frame Requantization Distortion

By (6.9), (6.10) and the uncorrelatedness approximation (6.12), we have:

$$E\{q_T[i, n]^2\} = \sigma_T^2[n] - \sigma_0^2[n] =: \Delta\sigma_T^2[n], \tag{6.14}$$

i.e., for each Frame $n$, the power of the requantization error can be computed as the difference between the distortion of the transcoded frame and that of the primary frame, denoted by $\Delta\sigma_T^2[n]$. It follows from (6.14) that to compute the power of the requantization error, we need to find both $\sigma_0^2[n]$ and $\sigma_T^2[n]$. $\sigma_0^2[n]$ can be obtained during primary video encoding. We obtain $\sigma_T^2[n]$ by establishing a rate-distortion model: we measure a few values at different frame sizes $R_T[n]$ (by transcoding the current Frame $n$ while keeping other frames intact, followed by decoding of Frame $n$) and fit them with a parametric model. Additionally, we also want to model the quantization parameter $QP_T[n]$ that achieves $\sigma_T^2[n]$.

6.2.1.1 Rate-Distortion Model

The rate-distortion characteristic of a transcoded frame depends on different factors, including the video content, the frame type (I-, P- or B-frame) and the QP used for encoding the primary video. Following [168, 145], we use a parametric model:

$$\sigma_T^2[n] = D_n + \frac{\theta_n}{R_T[n] - R_n}, \quad R_T[n] > (R_n)^+, \tag{6.15}$$

where $(A)^+ := \max(A, 0)$ and $\theta_n > 0$, $D_n$, $R_n$ are the three model parameters to be determined by trials of transcoding and decoding of Frame $n$. Figure 6.3 shows the simulated results and the fitted models of the MSE versus frame size for the first 15 frames of Soccer, encoded using IBBBP structure. We can see that most of the points can be modeled by the parametric form (6.15), except for the points in P and B frames with ΔQP of about 2 ∼ 8 (the primary video is encoded with QP 25, and ΔQP is the difference between the transcoding QP and primary video QP), which lie far from the rate-distortion convex hull, implying an additional distortion penalty associated with requantization. In the transcoding process, we would like to avoid these operating points, since they give an inefficient marginal return on the improvement of the video quality. This requantization penalty indicates that the uncorrelatedness approximation (6.12) is not always valid. More precisely, when the quantization and requantization stepsizes are close, the associated errors cannot be well approximated as being uncorrelated. We are primarily interested in operating in the range where the approximation is valid. In Appendix B, a more detailed explanation of the requantization penalty is provided.
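To make the fitting step concrete, the sketch below determines the three parameters of (6.15) from exactly three measured (frame size, MSE) pairs. It is not the fitting procedure used for the reported results, which the text does not specify; it relies on the observation that (6.15) can be rearranged into sigma2 * R = R_n * sigma2 + D_n * R + c with c = theta_n - D_n * R_n, which is linear in (R_n, D_n, c) and can be solved with Cramer's rule. The function names and the sample numbers are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_rd_model(points):
    """Fit sigma2 = D + theta / (R - R_off) through three (R, sigma2) points.

    Uses the linearized form sigma2 * R = R_off * sigma2 + D * R + c,
    where c = theta - D * R_off. Returns (D, R_off, theta).
    """
    if len(points) != 3:
        raise ValueError("exactly three (R, sigma2) points are required")
    rows = [[s, r, 1.0] for r, s in points]   # unknowns: R_off, D, c
    rhs = [s * r for r, s in points]
    det = det3(rows)
    if abs(det) < 1e-12:
        raise ValueError("points do not determine the model uniquely")
    solution = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        replaced = [row[:col] + [rhs[k]] + row[col + 1:] for k, row in enumerate(rows)]
        solution.append(det3(replaced) / det)
    r_off, d, c = solution
    theta = c + d * r_off
    return d, r_off, theta


# Synthetic points generated from D = 2, R_off = 10, theta = 300 (made-up values).
sample_points = [(40.0, 12.0), (70.0, 7.0), (100.0, 2.0 + 300.0 / 90.0)]
fitted_d, fitted_r_off, fitted_theta = fit_rd_model(sample_points)
```

With more than three measurements, the same linearized form could be solved in the least-squares sense; points lying off the convex hull would still need to be excluded beforehand, as described above.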

6.2.1.2 QP-PSNR Model

Next, we would like to relate the QP used in quantization to the distortion. In H.264/AVC, each increment of six in the QP doubles the quantization stepsize Δ [122], i.e., $\Delta = A \cdot 2^{QP/6}$, where $A$ is a constant. Furthermore, the distortion $\sigma_T^2[n]$ is approximately quadratic in the stepsize. It follows that the frame peak signal-to-noise ratio (PSNR), defined as

$$\mathrm{PSNR}_T[n] := 10 \cdot \log_{10} \frac{\sigma_{MAX}^2}{\sigma_T^2[n]},$$

where $\sigma_{MAX}$ is 255 for 8-bit depth, is affine in $QP_T[n]$. Therefore, we can model the relationship between $\mathrm{PSNR}_T[n]$ and $QP_T[n]$ as:

$$\mathrm{PSNR}_T[n] = a_n + b_n \cdot QP_T[n], \tag{6.16}$$

where $a_n > 0$ and $b_n < 0$ are to be determined by trials of transcoding and decoding of Frame $n$. Figure 6.4 shows the simulation results and the fitted models of the PSNR versus QP for the first 15 frames of Soccer, encoded using IBBBP structure. We can see that most of the operating points can be explained by the model, except in B and P frames when the QP difference between the primary quantization and requantization is 0 or 1. These are the cases in which the additional requantization error is small (or zero when the QP difference is 0). In our model, we use (6.16) when the QP difference is greater than 1, and directly use the measured PSNR when the QP difference equals 0 or 1.

Figure 6.3: Frame MSE after transcoding versus frame size for the first 15 frames of Soccer, encoded using IBBBP structure. The model is fitted using 4 out of the 23 simulation points (the points lying off the convex hull are avoided). The triangle represents the operating point of the primary video (QP 25). [One panel per frame, labeled I, P or B; axes: MSE versus frame size (Kb); series: simulation, model, primary.]

Figure 6.4: Frame PSNR after transcoding versus the transcoding QP for the first 15 frames of Soccer, encoded using IBBBP structure. The model is fitted using 4 out of the 23 simulation points (except those with QP difference between primary quantization and requantization equal to 0 and 1). The triangle represents the operating point of the primary video. [One panel per frame, labeled I, P or B; axes: PSNR (dB) versus QP; series: simulation, model, primary.]
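The affine form of (6.16) can be checked with a short worked derivation. This is a sketch under the idealized assumption that the distortion is exactly proportional to the squared stepsize, $\sigma_T^2[n] = c \cdot \Delta^2$, for some constant $c$ (for instance $c = 1/12$ for a uniform quantizer at high resolution); the constant $c$ is introduced here only for illustration.

```latex
\begin{aligned}
\mathrm{PSNR}_T[n]
  &= 10 \log_{10} \frac{\sigma_{MAX}^2}{c \, A^2 \, 2^{QP_T[n]/3}} \\
  &= \underbrace{10 \log_{10} \frac{\sigma_{MAX}^2}{c \, A^2}}_{a_n}
     \; \underbrace{- \, \frac{10 \log_{10} 2}{3}}_{b_n} \cdot QP_T[n],
\qquad b_n \approx -1.003 \ \text{dB per QP unit}.
\end{aligned}
```

Under this idealization, each increment of six in the QP lowers the PSNR by about 6.02 dB. The fitted values of $b_n$ in practice need not equal this nominal slope, since the proportionality between distortion and squared stepsize is only approximate.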

6.2.2 Inter-Frame Requantization Drift

Having considered the requantization distortion occurring within each frame, we now shift the focus to the propagation of the distortion across frames due to the drift between encoder and decoder. First consider iteratively expressing the error between the reconstructed value at the encoder $\tilde{x}$ and that at the decoder $\hat{x}$ in terms of $q_T[i, n]$:

$$\begin{aligned}
\hat{x}[i, n] - \tilde{x}[i, n] &= q_T[i, n] + f(\tau(\hat{x}[i, n] - \tilde{x}[i, n])) && (6.17) \\
&= q_T[i, n] + f \circ \tau(q_T[i, n]) + f \circ \tau \circ f \circ \tau(\hat{x}[i, n] - \tilde{x}[i, n]) && (6.18) \\
&= \dots \\
&= \sum_{m=0}^{n} (f \circ \tau \circ)^m q_T[i, n] && (6.19)
\end{aligned}$$

where $\circ$ denotes the composition of functions (i.e., $f \circ g(\cdot) := f(g(\cdot))$) and $(f \circ \tau \circ)^m(\cdot)$ denotes recursively composing spatial filtering $f(\cdot)$ and motion compensation $\tau(\cdot)$ $m$ times. (6.17) follows from (6.6), (6.7) and (6.8); both (6.17) and (6.18) use the linearity of $f(\cdot)$ and $\tau(\cdot)$. The end-to-end distortion for any Frame $n$ can then be expressed as:

$$\begin{aligned}
\sigma_{EE}^2[n] &= E\{((\hat{x}[i, n] - \tilde{x}[i, n]) + (\tilde{x}[i, n] - x[i, n]))^2\} \\
&= E\Big\{\Big(\sum_{m=0}^{n} (f \circ \tau \circ)^m q_T[i, n] + q[i, n]\Big)^2\Big\} \\
&= E\Big\{\Big(\sum_{m=0}^{n} (f \circ \tau \circ)^m q_T[i, n]\Big)^2\Big\} + E\{q[i, n]^2\} && (6.20)
\end{aligned}$$

where (6.20) follows from the uncorrelatedness approximation (6.12) and the linearity of $f(\cdot)$ and $\tau(\cdot)$. Since $\tau(\cdot)$ differs across frame types, the way to expand the first term of (6.20) also differs. In the sequel, we first consider each case individually, and then combine the results in a unified matrix form.

6.2.2.1 I-Frame

Since an I-frame does not have a reference frame, its end-to-end distortion consists purely of the current frame's primary quantization and requantization distortions. For an I-frame $n$, we have:

$$\sigma_{EE}^2[n] = \Delta\sigma_T^2[n] + \sigma_0^2[n] = \sigma_T^2[n]. \tag{6.21}$$

6.2.2.2 P-Frame

The end-to-end distortion incurred on a P-frame consists of the effects due to quantization of the current frame as well as drift distortion from its recursively referenced I- and P-frames. Consider $(f \circ \tau \circ)^m q_T[i, n]$ for a P-frame $n$:

$$\begin{aligned}
(f \circ \tau \circ)^m q_T[i, n] &= (f \circ)^m (\tau \circ)^m q_T[i, n] && (6.22) \\
&= (f \circ)^m q_T[i^-(m), n^-(m)] && (6.23)
\end{aligned}$$

where in (6.22) $f(\cdot)$ and $\tau(\cdot)$ are re-ordered assuming, for mathematical tractability, that both are spatially and temporally shift-invariant. In (6.23), $[i^-(m), n^-(m)]$ denotes the $m$th-order backward recursive reference of $[i, n]$. For example, $[i, n]$ has reference $[i^-(1), n^-(1)]$, which in turn has reference $[i^-(2), n^-(2)]$, and so on (all of which are either I- or P-frames). We can then write

$$\begin{aligned}
E\Big\{\Big(\sum_{m=0}^{n} (f \circ \tau \circ)^m q_T[i, n]\Big)^2\Big\}
&= E\Big\{\Big(\sum_{m=0}^{n} (f \circ)^m q_T[i^-(m), n^-(m)]\Big)^2\Big\} \\
&= \sum_{m=0}^{n} E\{((f \circ)^m q_T[i^-(m), n^-(m)])^2\} && (6.24)
\end{aligned}$$

where (6.24) follows from the uncorrelatedness approximation (6.13). All that remains is to study the power drift effect under repetitive filtering $(f \circ)^m(\cdot)$. In general, the power decays with the number of filterings. This decaying effect is studied by Färber and Girod [75] for a general error process $u$ across inter-coded video frames, where it is shown that, with the Gaussian assumption on both the error signal power spectral density (PSD) and the transfer function magnitude, the error power after $m$ repetitive filterings of $u$ can be approximated by

$$E\{((f \circ)^m u[i, n])^2\} = \alpha[m] \, \sigma_u^2, \tag{6.25}$$

where

$$\alpha[m] := \frac{1 - \beta m}{1 + \gamma m} \tag{6.26}$$

is the power transfer factor after $m$ steps, β is the percentage of intra-coded macroblocks in inter-frames and γ is the leakage parameter, defined as

$$\gamma := \frac{\sigma_f^2}{\sigma_g^2}.$$

Here $\sigma_f$ is a parameter describing the filtering strength of the spatial loop filter $f(\cdot)$ and $\sigma_g$ is a parameter describing the error signal's spectral shape. The stronger the filtering effect (i.e., larger $\sigma_f$) and the flatter the error signal's PSD (i.e., smaller $\sigma_g$), the larger γ is (i.e., the less error propagation).

Figure 6.5: The power transfer factor α[m, n] as a function of the power transfer step m (only over I and P frames within a GOP). The GOP of 33 frames from the Soccer sequence is encoded using the IBBBP structure. The models are fitted with data points obtained by transcoding the starting frame while keeping the other frames intact and recording the MSE of the subsequent propagated frames. In this example, as the decaying behaviors of the error processes starting from a P frame are similar, we aggregate the data points to obtain one unified model for them. [Two panels: error process starting from an I frame, and error process starting from a P frame; axes: α[m, n] versus propagation step m; series: simulation, model.]

In general, the decaying behavior of a requantization error process depends on its starting frame. For example, since the coefficients in a P-frame are the residues after temporal prediction, the requantization error signal tends to have a flatter PSD, leading to a faster decay under repeated low-pass filtering. In the sequel, we denote by α[m, n] the power transfer factor after m steps for an error process starting from Frame n, and by βn and γn the corresponding model parameters. The values of βn and γn can be determined by trial transcoding of Frame n while keeping the other frames intact, followed by decoding and model fitting. In Figure 6.5, we plot some simulated data points and the fitted model for one GOP of the Soccer sequence. It can be empirically verified that the requantization error process starting from a P frame decays faster than that starting from an I frame.

Combining (6.20), (6.24), (6.14) and (6.25), we obtain the final form of the end-to-end distortion for a P-frame $n$:

$$\sigma_{EE}^2[n] = \sigma_0^2[n] + \sum_{m=0}^{n} \alpha[m, n^-(m)] \cdot \Delta\sigma_T^2[n^-(m)]. \tag{6.27}$$

6.2.2.3 B-Frame

The analysis for a B-frame is similar to that for a P-frame, except that a B-frame has two reference frames $n^-$ and $n^+$, which are either I- or P-frames. Similarly, let $[i^+(m), n^+(m)]$ denote the $m$th-order forward recursive reference of $[i, n]$. We can obtain the final form of the end-to-end distortion for a B-frame $n$:

$$\sigma_{EE}^2[n] = \sigma_0^2[n] + \Delta\sigma_T^2[n] + \sum_{m=1}^{n} \Big[ (\mu^-[n])^2 \cdot \alpha[m, n^-(m)] \cdot \Delta\sigma_T^2[n^-(m)] + (\mu^+[n])^2 \cdot \alpha[m, n^+(m)] \cdot \Delta\sigma_T^2[n^+(m)] \Big]. \tag{6.28}$$

6.2.2.4 Power Transfer Matrix

We note that in (6.21), (6.27) and (6.28), $\sigma_{EE}^2[n]$ is expressed as a linear combination of the terms $\Delta\sigma_T^2[n]$ and $\sigma_0^2[n]$. Thus, we can express $\sigma_{EE}^2[n]$ in a matrix form. Consider a GOP of $N$ frames.² Let $\sigma_{EE}^2 := (\sigma_{EE}^2[1] \dots \sigma_{EE}^2[N])$, $\Delta\sigma_T^2 := (\Delta\sigma_T^2[1] \dots \Delta\sigma_T^2[N])$ and $\sigma_0^2 := (\sigma_0^2[1] \dots \sigma_0^2[N])$ be $1 \times N$ row vectors. Define the power transfer matrix $\Phi_{PT}$ as an $N \times N$ matrix with entry $\Phi_{PT}(m, n)$ denoting the coefficient of $\Delta\sigma_T^2[m]$ for $\sigma_{EE}^2[n]$, which can be obtained from (6.21), (6.27) and (6.28). Then $\sigma_{EE}^2$ can be expressed as:

$$\sigma_{EE}^2 = \Delta\sigma_T^2 \cdot \Phi_{PT} + \sigma_0^2. \tag{6.29}$$

As an example, in Figure 6.6, we plot the power transfer matrix for a GOP of 33 frames from the Soccer sequence, encoded using the IBBBP structure. The coefficients are estimated through trials of coding and model fitting of parameters. For more discussion of parameter estimation, refer to Section 6.2.4.

² We only consider closed GOPs, which start with an instantaneous decoder refresh (IDR) frame.
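The matrix form (6.29) can be illustrated with a small sketch for the simplest case of a GOP that contains only an I-frame followed by P-frames, each predicted from the immediately preceding frame. This is a deliberately reduced setting, not the IBBBP structure used in the experiments. The sketch also assumes a single pair (beta, gamma) shared by all starting frames and clips the numerator of (6.26) at zero; both are assumptions made here for illustration, and all function names are hypothetical.

```python
def alpha(m, beta, gamma):
    """Power transfer factor (6.26) after m steps, clipped at zero (assumption)."""
    return max(1.0 - beta * m, 0.0) / (1.0 + gamma * m)


def power_transfer_matrix(num_frames, beta, gamma):
    """Phi[src][dst] = coefficient of delta_sigma_T^2[src] in sigma_EE^2[dst].

    Assumes an I-P-P-... GOP in which each frame references the previous one,
    so an error injected in frame src reaches frame dst after dst - src steps.
    """
    phi = [[0.0] * num_frames for _ in range(num_frames)]
    for src in range(num_frames):
        for dst in range(src, num_frames):
            phi[src][dst] = alpha(dst - src, beta, gamma)
    return phi


def end_to_end_distortion(delta_sigma_t, sigma_0, phi):
    """Row-vector product of (6.29): sigma_EE^2 = delta_sigma_T^2 * Phi + sigma_0^2."""
    n = len(sigma_0)
    return [
        sigma_0[dst] + sum(delta_sigma_t[src] * phi[src][dst] for src in range(n))
        for dst in range(n)
    ]


# Requantization error injected only in the first frame of a 4-frame GOP.
phi_example = power_transfer_matrix(4, beta=0.05, gamma=0.5)
sigma_ee = end_to_end_distortion([10.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], phi_example)
```

The matrix is upper triangular in this setting because drift only propagates forward along the prediction chain; with B-frames, additional entries weighted by the squared prediction coefficients of (6.28) would appear.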

Figure 6.6: Power transfer matrix obtained for a GOP of 33 frames from the Soccer sequence, encoded using IBBBP structure. The coefficients are estimated through trials of coding and model fitting of parameters. [Both axes: playout frame #; color scale from 0 to 1.]

6.2.3 Numerical Results

Together, (6.15), (6.16) and (6.29) allow us to predict the reconstruction PSNR and data size for each frame of the transcoded video. We compare the proposed analytical model with simulation results for the end-to-end distortion of the decoded frames and the frame data size. Figure 6.7 shows examples of the PSNR trace and frame data size for a GOP of 33 frames. It can be seen that the proposed model predicts the reconstructed video quality and the frame data size fairly accurately.

6.2.4 Complexity of Parameter Estimation

The prediction of the end-to-end distortion and data size of each frame involves the estimation of the parameters in (6.15), (6.16) and (6.29), which can be obtained through trials of coding and model fitting at the video encoding stage. In practice, we always face the tradeoff between estimation accuracy and computational complexity. For (6.15) and (6.16), most generally, we need to obtain $D_n$, $R_n$, $\theta_n$, $a_n$ and $b_n$ for each Frame $n = 1, ..., N$, by transcoding the current Frame $n$ followed by decoding at three different QPs. If the video content is relatively stationary, we can aggregate all the parameters for I, for P and for B frames individually, so that we only need to sample three frames. For (6.29), most generally, we need to obtain the parameters $\beta_n$ and $\gamma_n$ in (6.26) by model fitting for each $n$. But typically, we can aggregate the parameters for all the error processes starting from I, from P or from B frames. For B frames, the values of $\mu_n^-$ and $\mu_n^+$ can be heuristically determined through linear interpolation based on their distance to the reference frames.

Figure 6.7: Traces of PSNR and frame data size for a GOP of 33 frames from Soccer, encoded using IBBBP structure with different frames transcoded: (a) all Frames 1 to 33 are transcoded, (b) Frames 1 to 15 are transcoded, (c) starting from Frame 1, one out of every three frames is transcoded, (d) Frames 2 to 5 and 20 to 23 are transcoded. The primary video QP and transcoding QP are 25 and 33, respectively. [Each panel plots PSNR (dB) and frame size (Kb) versus playout frame # for simulation and model. Panel titles report average PSNR (model / simulation) of 30.3 / 30.3, 31.8 / 31.8, 33.7 / 33.3 and 34.4 / 34 dB, and average frame sizes (model / simulation) of 73.5 / 73.3, 115 / 115, 125 / 127 and 136 / 139 Kb.]

Figure 6.8: Stream composition in a rapid acquisition architecture for fast stream startup. The unicast section is colored red; the multicast section is colored green.

6.3 Transcoded-to-Primary Stream Switching

The analytical model we have developed thus far enables us to quantitatively capture the transcoding distortion drift effect, which is a necessary step for formulating optimization problems with the aim of improving the received video quality. This optimization framework can be applied in many scenarios. In this section, we demonstrate its effectiveness by applying it to the transcoded-to-primary stream switching problem, which is an important concern when open-loop requantization-based transcoding is applied to the rapid acquisition architecture for fast stream startup.

Recall that in the rapid acquisition architecture, after a stream startup request is initiated, a unicast stream starting from a RAP is sent at a higher-than-usual bitrate to the receiver. Later on, at a pre-determined moment, the stream is switched from the unicast back to the multicast. The switching point could be an arbitrary position in the stream, typically between two RAPs. A resulting stream composition is illustrated in Figure 6.8. If open-loop requantization-based transcoding is applied to the unicast stream, the resulting requantization distortion will drift across the entire unicast section, then continue drifting across the multicast section until the next RAP, even if the multicast frames are never transcoded.

Our objective is to minimize the distortion incurred by the transcoding operation by selecting appropriate QPs of the quantizer for individual unicast frames, given the constraint on the unicast data size budget. The problem formulation should capture the following intuitions. First, the selected QPs should depend on the types of the frames. For example, an I- or P-frame to which many subsequent frames refer deserves a QP that yields more bits than a B-frame which has no dependent frame (hence no distortion drift). Second, the selected QPs should also depend on the relative position of the switching point within a GOP: if the switching point is earlier, fewer unicast frames are transcoded, but the distortion is going to drift further in the primary multicast frames; and vice versa.

Formally, consider a GOP of $N$ frames crossing a switching point $M$, where Frames $1, ..., M$ belong to the transcoded stream, and Frames $M + 1, ..., N$ belong to the primary stream. Since the switching point is arbitrary within a GOP, we can model $\mathsf{M}$ as a random variable uniformly distributed in $\{1, ..., N\}$. For each realization $\mathsf{M} = M$, we are free to choose the requantization QPs for each transcoded frame and the corresponding data sizes $R_T(M) := (R_T(M)[1] \dots R_T(M)[M])$, where $R_T(M)$ is a $1 \times M$ row vector. In choosing the transcoded frame size, we want to avoid the inefficient operating points falling off the rate-distortion convex hull boundary (see Section 6.2.1). In other words, we want to impose the constraint that

$$R_T(M)[n] \notin (\ell_n, u_n), \tag{6.30}$$

where $(\ell_n, u_n)$ is the rate interval corresponding to the inefficient operating points (typically corresponding to ΔQP = 2 ∼ 8). The total data size of the unicast section is $\sum_{n=1}^{\mathsf{M}} R_T(\mathsf{M})[n]$, which is also a random variable. The expected unicast data size is subject to a constraint, i.e., $E\{\sum_{n=1}^{\mathsf{M}} R_T(\mathsf{M})[n]\} \le E\{\mathsf{M}\} \cdot \bar{R}_T$, where $\bar{R}_T$ is a pre-determined value for the average unicast frame size. The objective is to minimize the expected per-frame distortion across the whole GOP. Overall, this optimization problem can be cast as:

$$\begin{aligned}
\{R_T^*(M)\}_{M=1}^{N} = \operatorname*{argmin}_{\{R_T(M)\}} \ & E\Big\{\tfrac{1}{N} \, \sigma_{EE}^2 \cdot \mathbf{1}^T\Big\}, \\
\text{subject to} \ \ & \sigma_{EE}^2 = (\sigma_T^2 - \sigma_0^2) \cdot \Phi_{PT} + \sigma_0^2, \\
& \sigma_T^2[n] \ge D_n + \frac{\theta_n}{R_T(M)[n] - R_n}, && n = 1, ..., M, \\
& \sigma_T^2[n] \ge \sigma_0^2[n], && n = M + 1, ..., N, \\
& R_T(M)[n] \notin (-\infty, (R_n)^+) \cup (\ell_n, u_n), && n = 1, ..., M, \\
& E\Big\{\sum_{n=1}^{\mathsf{M}} R_T(\mathsf{M})[n]\Big\} \le E\{\mathsf{M}\} \cdot \bar{R}_T, && M = 1, ..., N.
\end{aligned} \tag{6.31}$$

Note that in this formulation, there are $N(N + 1)/2$ unknowns that need to be solved. This optimization problem is non-convex, since the domain for each $R_T(M)[n]$ is non-convex. However, the following heuristic can lead to a good sub-optimal solution. First, we solve the optimization problem without considering the constraint (6.30). Excluding this constraint makes the problem convex, implying that it can be solved tractably. The solution may contain components $R_T^*(M)[n]$ that fall inside $(\ell_n, u_n)$, in which case we simply quantize the value to $\ell_n$; this leads to unused data size. Note that (6.15) is monotonic, thus the optimality occurs when $E\{R_T[\mathsf{M}]\} = \bar{R}_T$, i.e., when the data size budget is fully used. The next step is therefore to allocate the excess rate to other frames. We use the heuristic that when allocating the excess rate, an I-frame has priority over a P-frame, and a P-frame has priority over a B-frame. Additionally, an earlier P-frame has priority over a later P-frame.

In Figure 6.9, we plot the frame PSNR traces and the allocated frame data size (i.e., total bits allocated for the transcoded unicast stream) for different switching point locations within a GOP, for a GOP of 33 frames from Soccer, transcoded using fixed QPs for each frame, and using optimized QPs obtained by solving the optimization problem (6.31). Two cropped frames, one from the fixed-QP scheme and one from the optimized-QP scheme, are shown in Figure 6.10 for visual comparison. From the results, we observe that the gain is especially prominent when the switching point is late within the GOP. Looking at the transcoding data size allocation, we find that more bits are allocated to the case of early switching points. Intuitively, this is to reduce the error propagation within the long primary stream after an early switching point.
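For intuition about the convex relaxation, the following sketch solves a much reduced version of (6.31) in closed form. It assumes a single, fixed switching point, drops the excluded interval (6.30) and the lower bound on the transcoded distortion, and treats the data size budget as an equality. Under (6.29) and (6.15), the objective then reduces to minimizing the sum of w_n * theta_n / (R_T[n] - R_n), where the weight w_n is taken to be the row sum of the power transfer matrix for Frame n, and the Lagrangian conditions give R_T[n] = R_n + s * sqrt(w_n * theta_n) with a common scale s fixed by the budget. This is an illustrative simplification, not the solver used for the reported results; the function name and the sample numbers are hypothetical.

```python
import math


def allocate_rates(weights, theta, rate_offsets, budget):
    """Closed-form rate allocation for one fixed switching point.

    Minimizes sum_n weights[n] * theta[n] / (rate[n] - rate_offsets[n])
    subject to sum_n rate[n] == budget, assuming positive weights and theta.
    """
    slack = budget - sum(rate_offsets)
    if slack <= 0:
        raise ValueError("budget must exceed the sum of the rate offsets")
    roots = [math.sqrt(w * t) for w, t in zip(weights, theta)]
    scale = slack / sum(roots)  # common factor determined by the budget
    return [r + scale * s for r, s in zip(rate_offsets, roots)]


# Two frames: the first one has four times the drift weight of the second.
example_rates = allocate_rates(
    weights=[4.0, 1.0], theta=[100.0, 100.0], rate_offsets=[10.0, 10.0], budget=80.0)
```

In this example the frame with the larger drift weight receives the larger share of the budget, which matches the intuition that frames with many dependents deserve more bits. Handling the excluded interval would still require the quantize-and-reallocate heuristic described above.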

[Figure 6.9. Panel (a): PSNR (dB) versus playout frame number. Panel (b): frame size (Kb) versus encoded frame number. One subplot per switching point location; curves: Optimized QP and Fixed QP.]

Figure 6.9: (a) PSNR traces and (b) the corresponding frame size allocation for different switching point locations within a GOP, for a GOP of 33 frames from Soccer, where the transcoded stream uses (i) fixed QP for each frame, and (ii) optimized QP based on (6.31). The mean transcoded frame size is 93 Kb.

[Figure 6.10. Panel (a): Fixed QP. Panel (b): Optimized QP.]

Figure 6.10: Visual comparison of the decoded video frame from Soccer, for transcoded-to-primary switching point M = 33, (a) using fixed QP (PSNR 29.1 dB) and (b) using optimized QP (PSNR 39.8 dB). Playout frame #33.

In Figure 6.11, for the six test sequences, we show the mean and the standard deviation of the frame PSNR as a function of the mean transcoded frame data size. The results are averaged over switching points uniformly distributed over each GOP. From the results, it can be observed that if we transcode the video using the optimized QPs, the received video quality shows a significant gain over the case with fixed QPs. The gain is most prominent when the transcoded data size is 70% to 80% of the original primary stream data size. This is partially because, in this region, the fixed-QP scheme operates very inefficiently due to the requantization penalty discussed in Section 6.2.1, whereas the optimized-QP scheme is able to avoid this operational region and thus the penalty.

Lastly, we examine the gain of the transcoded-to-primary stream switching optimization scheme in the fast stream startup system, using the simulation settings in Section 4.3. The average stream compression ratio is set to β = 0.7 to provide a good balance between the video quality and the reduction of unicast data size/duration. As our main focus here is on the effect of transcoding on the video quality, in this set of experiments we assume that no bursty erasures occur in the received unicast/multicast stream. In Figure 6.12, we plot the CDF of the resulting frame PSNR for the six test sequences, comparing the scheme with optimized QPs with the scheme with fixed QPs. It can be observed that the optimized-QP scheme works well in the system, achieving considerable gain over the fixed-QP scheme.

[Figure 6.11. Panels: (a) Soccer, (b) City, (c) Crew, (d) Harbour, (e) Ice, (f) Spincal. Axes: PSNR (dB) versus mean transcoded frame size (Kb). Curves: Optimized QP and Fixed QP.]

Figure 6.11: Mean and standard deviation of the frame PSNR as a function of the mean transcoded frame data size in the transcoded-to-primary stream switching problem, for the six test sequences, transcoded using (i) fixed QP for each frame, and (ii) optimized QP based on solving (6.31).

Chapter Conclusion

In this chapter, we discuss distortion-drift-aware video transcoding and its application in the transcoded-to-primary stream switching problem in fast stream startup. We first introduce a model for the open-loop requantization-based transcoding scheme and analyze the distortion drift effect. We then show how this framework can be applied to optimize the received video quality in the transcoded-to-primary stream switching problem. The gain in video quality demonstrated in the simulation results suggests that it is important to exploit the characteristics of the video content in the video transport process.

[Figure 6.12. Panels: (a) Soccer, (b) City, (c) Crew, (d) Harbour, (e) Ice, (f) Spincal. Axes: CDF (logarithmic scale) versus frame PSNR (dB). Curves: Optimized QP and Fixed QP.]

Figure 6.12: Simulation: CDF of frame PSNR for unicast-to-multicast stream switching with (i) optimized QP using the formulation in (6.31) of Chapter 6, and (ii) fixed QP. The simulation results are obtained using PAS with 10 current peers and 10 requesting peers. The stream transcoding compression ratio is selected to be β = 0.7. As our main focus here is on the effect of transcoding on the video quality, in these simulations we assume that no bursty erasures occur in the received unicast/multicast stream.

Chapter 7 Conclusions

The study of low-delay robust video multicast in this thesis is largely motivated by the real-world deployment of commercial-grade video services over bandwidth-limited and error-prone access networks, under which a set of challenging design criteria must be met simultaneously, including high video quality, low response time and system scalability. These challenges motivate us to re-examine the video multicast problem from a number of perspectives, including a peer-assistance architecture to improve the system scalability, a practical construction of a decoding-delay-optimal erasure correcting code that fits naturally into the peer-assistance architecture, and a distortion-drift-aware transcoding scheme which ensures that video can be transcoded in-network with good quality and low complexity. We have also demonstrated that all the proposed solutions can work seamlessly together. In this final chapter, we discuss the application scenarios of our solutions, summarize the major lessons learned from conducting this research and suggest a number of general ideas for future research in this area.

7.1 Applications

The motivating example of our proposed solutions is IP multicast of video over bursty error-prone access networks. However, we would like to emphasize that these solutions are naturally suited for a broader class of applications. First, bursty errors are commonly encountered in real-world scenarios. Besides impulse noise in the access network, time-varying multipath fading and the drop-tail strategy implemented at network routers are both common causes of bursty packet loss. Our proposed schemes have the potential to provide effective solutions to mitigate the impact of bursty packet loss and reliably deliver media content over these networks. Second, our solutions ride on a generic multicast architecture and do not make any assumptions about its implementation details. Thus, our solutions can be naturally applied to application-layer overlay multicast [41], digital video broadcasting (DVB) [9] or multicast-like solutions such as CDNs [95]. Third, besides video streaming applications, the proposed PAR framework also has the potential to be applied in real-time collaborative services involving multiple receivers, such as multi-party audio or video conferencing, gaming and online whiteboarding, where both delay and reliability are important to the user experience.

Recently, with the development of fiber-to-the-home technology, it has become possible to provide network connection speeds of gigabits per second with very low error rates [10]. One natural question is whether this trend will eventually make our solutions outdated. Our answer to this question is two-fold. First, video is extremely resource-hungry, and it is always challenging to push the limit of a communication technology to meet the ever-growing bandwidth demand of emerging applications such as high-definition/3D/stereo television/gaming. In this regard, it is always a better choice to have a viably engineered solution, where errors are admissible but handled by error correction mechanisms, rather than an over-engineered solution where errors are absolutely suppressed [148]. Second, even if packet loss repair is no longer needed under the new network infrastructure, the peer-assistance architecture for fast stream startup will still be useful.

7.2 Lessons Learned

We summarize the major lessons learned from conducting this research, hoping that they might provide insights to beneﬁt the broader class of applications discussed above.


A network system that relies purely on servers to provide services to a large number of users could easily suffer from scalability issues. Likewise, a pure P2P system without any infrastructure could have many limitations, such as unreliability and prolonged startup delay. When the servers and the peers are combined in a unified fashion, however, the resulting system could overcome the shortcomings of both. In this work, through designing the PAR and PAS protocols, we have demonstrated that it is possible to partially shift the servers' burden to the local peers to improve the system scalability, by using the peers to assist the servers in error control and stream startup. Other works (e.g., [152]) have demonstrated that it is also possible to start from a P2P network, and add servers to the architecture to enhance reliability and reduce latency while preserving the scalability benefit of a P2P network.

A P2P network, by nature, involves many uncertainties. One important concern, among others, is peer departure, which could easily lead to service failure. There are many ways to overcome these issues, such as building a reliable service at the same layer as IP multicast [138, 66, 163], or providing probabilistic guarantees based on swarming in the overlay network [2, 209]. In this work, we have learned that we can also apply coding in the application layer to proactively protect against uncertainties, meeting both reliability and latency requirements.

Shannon's celebrated channel coding theorem has established the fundamental limit of the data rate of reliable communication over a channel contaminated by noise. However, latency, an important concern in real-time communication, has not been part of the formulation. Given practical assumptions about the channel, is it possible to tailor the code design to take into account this new objective? In this work, through the design of a parallel-link burst erasure code with low decoding delay, we have shown that this is possible.

Compared to generic data, video and other media formats can have very different properties. First, different parts of the coded stream may carry unequal importance as far as the reconstruction quality is concerned. Second, depending on the semantic meaning they carry, different media streams could exhibit different resilience to errors. When these properties are exploited properly in the transport process, large gains in end-to-end reconstruction quality can often be observed.

7.3 Open Problems

In the proposed peer-assistance architecture, a key insight is that a video stream is replicated and cached by many network nodes, including the unicast servers and the peer receivers. A lost packet is successfully repaired as long as any one of the nodes delivers a copy of that packet. In the current PAR and PAS protocols, the unicast servers explicitly take on the role of coordinating the repair, or keep pushing the status information to the peers. It is desirable to handle such tasks more transparently. Content-centric networking (CCN) [13], or named data networking (NDN) [12], is a promising networking paradigm that builds packet caching and look-up services into the low layers of a network, thus greatly enhancing their effectiveness. How to build real-time packet loss repair and fast stream startup services under the CCN or NDN paradigm is a very interesting research question.

In the work presented in this thesis, we have imposed some practical constraints to ensure that our solutions are directly implementable in the current IP multicast architecture. In a bigger picture, however, the concept of "multicast" can be generalized. Instead of creating duplicates of packets at network nodes when needed, it is possible to create new packets by arbitrarily coding the received packets and relay the new packets to the subsequent nodes [20]. The system is considered multicast as long as each receiver is able to decode the intended source messages. In such an "information flow" view of multicast, there is no need to distinguish between multicast and unicast. Thus, the coding we currently consider on the unicast stream can be jointly applied to the multicast stream. Furthermore, the distinction between the unicast servers and the peer receivers should be de-emphasized. Each can be regarded as a network node "in the cloud" with some caching capabilities to accommodate time shifts, the only difference being whether or not it is a receiver.

This generalized model could encompass a greater class of problems: not only the packet loss repair and fast stream startup considered in this thesis, but also services such as video-on-demand, which involves a larger time shift and a focus on storage. Under this framework, both the low-delay codes and distributed storage codes [54] should be studied in a unified fashion.

Appendix A Converse Proofs

A.1 Converse of Theorem 1

In this section, we prove the converse part of Theorem 1. The converse result in [57, 124] based on the periodic erasure channel argument is applicable when the code is systematic. For the more general causal codes, a different proof technique is required. The entropy argument used in this section is based on the following reasoning. Since the feasibility of a code must be irrespective of the source's prior distribution, if we can assign the source symbols some distribution, and use the entropy inequalities to show that there exists a set of conditions (i.e., erasure patterns) that the code cannot possibly meet all at the same time, we thereby show the infeasibility of the code. Another way of interpreting this entropy argument is as follows. Recall that the Bayes risk of any estimator is always no larger than its minimax risk. If we can show that its Bayes risk is bounded away from the targeted risk, then its minimax risk is surely bounded away as well. We start with a useful entropy lemma:

Lemma 6. Let $X = (x_1, \ldots, x_n)$, $n \ge 2$, and $X_{\Omega \setminus i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$. If $H(s) > 0$ and

$$H(s \mid X_{\Omega \setminus i}) = 0, \quad i = 1, \ldots, n, \tag{A.1}$$

then

$$H(X) < \sum_{i=1}^{n} H(x_i).$$

Proof. Consider

$$H(x_1 \mid x_2^n) \le H(s, x_1 \mid x_2^n) = H(s \mid x_2^n) + H(x_1 \mid s, x_2^n) = H(x_1 \mid s, x_2^n) \tag{A.2}$$

$$\le H(x_1 \mid s, x_3^n) = H(s \mid x_1, x_3^n) + H(x_1 \mid x_3^n) - H(s \mid x_3^n) = H(x_1 \mid x_3^n) - H(s \mid x_3^n) \tag{A.3}$$

$$\le H(x_1) - H(s \mid x_3^n) \tag{A.4}$$

where $x_i^j$ denotes $(x_i, \ldots, x_j)$. In (A.2) and (A.3) we apply (A.1) for i = 1 and i = 2, respectively. Applying (A.4), we have:

$$H(X) = H(x_1 \mid x_2^n) + H(x_2^n) \le H(x_1) - H(s \mid x_3^n) + H(x_2^n) \le \sum_{i=1}^{n} H(x_i) - H(s \mid x_3^n).$$

If $H(s \mid x_3^n) > 0$, the conclusion follows; otherwise we have

$$H(s \mid x_3^n) = 0. \tag{A.5}$$

Similarly, applying (A.5), together with (A.1) for i = 3, to upper bound $H(x_1^2 \mid x_3^n)$, we end up with

$$H(x_1^2 \mid x_3^n) \le H(x_1^2) - H(s \mid x_4^n).$$

If $H(s \mid x_4^n) > 0$, the conclusion follows; otherwise we have $H(s \mid x_4^n) = 0$. Repeat the same argument until we exhaust all the cases.
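As a quick sanity check of Lemma 6, the following Python sketch evaluates both sides of the inequality on a toy example by exhaustive enumeration. The example is an assumption chosen for illustration and does not come from the proof: the source s is a pair of independent uniform bits (a, b), and the three symbols are x1 = a, x2 = b and x3 = a XOR b, so that any two symbols determine s. The function names are hypothetical.

```python
import itertools
import math
from collections import Counter

def entropy(outcomes):
    """Entropy in bits of a variable whose equally likely rows are listed."""
    total = len(outcomes)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(outcomes).values())

def conditional_entropy(target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(list(zip(target, given))) - entropy(given)

# Toy source s = (a, b): two independent uniform bits.
# Toy symbols x1 = a, x2 = b, x3 = a XOR b; any two of them determine s.
sources = list(itertools.product((0, 1), repeat=2))
symbols = [(a, b, a ^ b) for a, b in sources]
n = 3

# Hypothesis of the lemma: H(s | X with x_i removed) = 0 for every i.
hypothesis_gaps = []
for i in range(n):
    others = [tuple(v for j, v in enumerate(x) if j != i) for x in symbols]
    hypothesis_gaps.append(conditional_entropy(sources, others))

joint_entropy = entropy(symbols)                      # H(X)
sum_marginals = sum(entropy([x[i] for x in symbols])  # sum of H(x_i)
                    for i in range(n))
print(hypothesis_gaps, joint_entropy, sum_marginals)
```

Here the hypothesis holds for every i, and the joint entropy (2 bits) is strictly below the sum of the marginal entropies (3 bits), as the lemma requires.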

The proof of the converse proceeds as follows. Let the source symbols s[m, n], m = 1, ..., Kn , n = 1, ..., n be distributed i.i.d. uniform over Q. Thus

$$H(s[m, n]) = H(s) := \log Q. \tag{A.6}$$

Furthermore, since each x[m, n] is over Q, we have

$$H(x[m, n]) \le H(s), \quad \forall m, n. \tag{A.7}$$

For simplicity, we only consider the $W := M - L$ links without link outage. By the Singleton-achieving assumption,

$$K = WN - BZ. \tag{A.8}$$

In the sequel, we discuss two cases. Note that the causality assumption is used in proving Case II, and the proof for Case I holds for general non-causal codes.

Case I: $T \ge B$, where we want to show

$$T \ge N - B. \tag{A.9}$$

Assume the opposite is true, for example, $T = N - B - 1$. Then $S[1]$ must be recovered at time $N - B$. Let $X_{\text{urgent}} := x[1 : W, 1 : (N - B)]$ be divided into non-overlapping segments of length $B$, except for the segments in the last columns; then the number of segments in each row is $N_B := \lceil (N - B)/B \rceil$. Precisely, let the index sets be

$$I_{m,1} = \{(m, 1), \ldots, (m, B)\}, \; I_{m,2} = \{(m, B + 1), \ldots, (m, 2B)\}, \; \ldots, \; I_{m,N_B} = \{(m, N_B B - B + 1), \ldots, (m, N - B)\}, \quad m = 1, \ldots, W,$$

then $X_{\text{urgent}} = \{x[I_{m,i}]\}_{m=1,\ldots,W, \; i=1,\ldots,N_B}$. Overall, there are $W N_B$ segments. Let them be indexed as $I_{(1)} = I_{1,1}$, $I_{(2)} = I_{2,1}$, ..., $I_{(W+1)} = I_{1,2}$, ..., $I_{(W N_B)} = I_{W,N_B}$. Consider a set of $W N_B - Z + 1$ erasure patterns (refer to Table A.1(a) for an example):

$$B_i = \{I_{(j)}\}_{j=1}^{Z-1} \cup I_{(Z-1+i)}, \quad i = 1, \ldots, W N_B - Z + 1.$$

That $S[1]$ is recovered at $N - B$ implies $H(S[1] \mid X_{\text{urgent}} \setminus x[B_i]) = 0$, $i = 1, \ldots, W N_B - Z + 1$. Applying Lemma 6 with $s = S[1]$ and $x_i = x[I_{(Z-1+i)}]$, $i = 1, \ldots, W N_B - Z + 1$, we have

$$H(X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}]) < \sum_{i=1}^{W N_B - Z + 1} H(x[I_{(Z-1+i)}]) \tag{A.10}$$

$$\le \sum_{x[m,n] \in X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}]} H(x[m, n])$$

$$\le (K - WB + B) \cdot H(s) \tag{A.11}$$

where (A.11) follows from (A.8) and (A.7). Intuitively, this suggests that the channel symbols $X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}]$ cannot have the maximum entropy possible.

Now, consider the erasure pattern (refer to Table A.1(b) for an example):

$$B_0 = \{I_{(j)}\}_{j=1}^{Z-1} \cup \{(W, N - B + 1), \ldots, (W, N)\}.$$

The feasibility of the code implies that all the source symbols $S$ must be recovered from the remaining symbols $X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}]$ and $x[1 : W - 1, N - B + 1 : N]$, or

$$H(S \mid X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}], \; x[1 : W - 1, N - B + 1 : N]) = 0.$$

But

$$0 = H(S \mid X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}], \; x[1 : W - 1, N - B + 1 : N])$$

$$\ge H(S) - H(X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}]) - H(x[1 : W - 1, N - B + 1 : N]) \tag{A.12}$$

$$> K \cdot H(s) - (K - WB + B) \cdot H(s) - (W - 1)B \cdot H(s) \tag{A.13}$$

$$= 0$$

where in (A.12) we use the fact that $x[m, n]$ must be a function of $S$, implying $H(X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}], \; x[1 : W - 1, (N - B + 1) : N] \mid S) = 0$; (A.13) follows from (A.6), (A.7), (A.11) and the independence of the source symbols. Intuitively, since we have seen that $X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}]$ cannot have the maximum entropy possible, it is not sufficient to fully recover $S$. Thus, the assumption fails and (A.9) is true. Combining (5.1), (A.8) and (A.9), we obtain

$$R \le W - \frac{BZ}{T + B}, \quad T \ge B.$$
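To make the Case I construction concrete, the following Python sketch enumerates the segments and the erasure patterns for the parameters quoted in Table A.1(a), and checks the symbol count behind (A.11). It is an illustration only: the function name `case1_patterns` and its return layout are assumptions, and segments are ordered column block by column block, link by link, following the indexing described in the text.

```python
import math

def case1_patterns(W, Z, B, N):
    """Segments I_(1), I_(2), ... and erasure patterns B_i of Case I."""
    NB = math.ceil((N - B) / B)           # segments per link
    segments = []
    for block in range(1, NB + 1):        # column blocks, left to right
        first = (block - 1) * B + 1
        last = min(block * B, N - B)      # the last block may be shorter
        for link in range(1, W + 1):      # links 1..W within each block
            segments.append({(link, t) for t in range(first, last + 1)})
    common = set().union(*segments[:Z - 1])   # I_(1), ..., I_(Z-1)
    patterns = [common | segments[Z - 2 + i]  # adds I_(Z-1+i), 1-based i
                for i in range(1, W * NB - Z + 2)]
    return segments, common, patterns

W, Z, B, N = 3, 2, 2, 5                   # parameters of Table A.1(a)
segments, common, patterns = case1_patterns(W, Z, B, N)

# Symbols of X_urgent as (link, time) pairs, and the Singleton value of K.
urgent = {(m, t) for m in range(1, W + 1) for t in range(1, N - B + 1)}
K = W * N - B * Z

remaining = len(urgent - common)          # symbols counted in (A.11)
print(sorted(patterns[0]), remaining, K - W * B + B)
```

For these parameters there are six segments and five erasure patterns, and the number of urgent symbols left after removing the common segments equals K - WB + B = 7, which is the coefficient appearing in (A.11).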

Case II: $T < B$, where we want to show

$$N \le B. \tag{A.14}$$

Assume the opposite is true, for example, let $N = B + 1$, thus $N > T + 1$. In other words, when $S[1]$ is recovered at time $T + 1$, there are channel symbols remaining to be received. We will show these "excessive" channel symbols lead to a problem. Define two index sets

$$I_{(Z-1) \times B} := \{(m, n)\}_{m = W - Z + 2, \ldots, W, \; n = 1, \ldots, B},$$

$$I_{(Z-1) \times 1} := \{(m, n)\}_{m = W - Z + 2, \ldots, W, \; n = N}.$$

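Table A.1: Illustration of the erasure patterns used in proving the converse. "x" denotes a symbol erasure and "." a received symbol.

(a) Case I: B_1. W = 3, Z = 2, B = 2, N = 5.

Time   | 1 | 2 | 3 | 4 | 5
Link 1 | x | x | . | . | .
Link 2 | x | x | . | . | .
Link 3 | . | . | . | . | .

(b) Case I: B_0. W = 3, Z = 2, B = 2, N = 5.

Time   | 1 | 2 | 3 | 4 | 5
Link 1 | x | x | . | . | .
Link 2 | . | . | . | . | .
Link 3 | . | . | . | x | x

(c) Case II: B_1. W = 3, Z = 2, B = 4, N = 5.

Time   | 1 | 2 | 3 | 4 | 5
Link 1 | x | x | x | x | .
Link 2 | . | . | . | . | .
Link 3 | x | x | x | x | .

(d) Case II: B_2. W = 3, Z = 2, B = 4, N = 5.

Time   | 1 | 2 | 3 | 4 | 5
Link 1 | . | x | x | x | x
Link 2 | . | . | . | . | .
Link 3 | x | x | x | x | .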

Consider the erasure pattern (refer to Table A.1(c) for an example):

$$B_1 = I_{(Z-1) \times B} \cup \{(1, 1), \ldots, (1, B)\}.$$

Since $S[1]$ is recovered at $T + 1$, where $T + 1 \le B$, we must have $H(S[1] \mid x[2 : (W - Z + 1), 1 : (T + 1)]) = 0$, or

$$S[1] = f(x[2 : W - Z + 1, 1 : T + 1]). \tag{A.15}$$

Now consider the erasure pattern (refer to Table A.1(d) for an example):

$$B_2 = I_{(Z-1) \times B} \cup \{(1, 2), \ldots, (1, N)\}.$$

That the code is feasible implies

$$0 = H(S \mid x[1 : W - Z + 1, 1], \; x[2 : W - Z + 1, 2 : N], \; x[I_{(Z-1) \times 1}])$$

$$= H(S \mid x[1 : W - Z + 1, 1], \; x[2 : W - Z + 1, 1 : N], \; x[I_{(Z-1) \times 1}]) \tag{A.16}$$

$$\ge H(S \mid S[1], \; x[2 : W - Z + 1, 1 : N], \; x[I_{(Z-1) \times 1}]) \tag{A.17}$$

$$= H(S \mid f(x[2 : W - Z + 1, 1 : T + 1]), \; x[2 : W - Z + 1, 1 : N], \; x[I_{(Z-1) \times 1}]) \tag{A.18}$$

$$\ge H(S \mid x[2 : W - Z + 1, 1 : N], \; x[I_{(Z-1) \times 1}]) \tag{A.19}$$

$$= H(S) - H(x[2 : W - Z + 1, 1 : N], \; x[I_{(Z-1) \times 1}]) \tag{A.20}$$

$$\ge [K - (W - Z)N - (Z - 1)] \cdot H(s) \tag{A.21}$$

$$= H(s) \tag{A.22}$$

where in (A.16) we repeat the conditioning on $x[2 : W - Z + 1, 1]$; (A.17) follows from the causality of the code, by which $x[1 : W - Z + 1, 1]$ must be a function of $S[1]$; (A.18) follows from (A.15); (A.19) follows from $T + 1 \le N$; (A.20) follows because $x[m, n]$ is a function of $S$ for all $m, n$; (A.21) follows from (A.6), (A.7) and the independence of the source symbols; and (A.22) follows from (A.8) with $N = B + 1$. The whole chain cannot be valid since $H(s) > 0$. Therefore, the assumption is false, and we have (A.14). The key factor that leads to the contradiction is that, since the code is causal, $x[1 : W - Z + 1, 1]$ must be a function of $S[1]$, which in turn must be a function of $x[2 : W - Z + 1, 1 : T + 1]$. This dependency reduces the entropy of the symbols remaining after $B_2$ is erased. Finally, combining (5.1), (A.8) and (A.14), we obtain

$$R \le W - Z, \quad T < B.$$
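The two rate bounds obtained in Case I and Case II, and the arithmetic step from (A.21) to (A.22), can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the parameter values are examples, and exact rational arithmetic is used only to avoid rounding in the printed values.

```python
from fractions import Fraction

def converse_rate_bound(W, Z, B, T):
    """Upper bound on the rate R from the two cases of the converse."""
    if T >= B:
        return Fraction(W) - Fraction(B * Z, T + B)   # Case I
    return Fraction(W - Z)                            # Case II

def case2_residual(W, Z, B):
    """Coefficient of H(s) in (A.21) under the assumption N = B + 1."""
    N = B + 1
    K = W * N - B * Z            # Singleton-achieving assumption (A.8)
    return K - (W - Z) * N - (Z - 1)

bound_case1 = converse_rate_bound(W=3, Z=2, B=2, T=3)
bound_case2 = converse_rate_bound(W=3, Z=2, B=4, T=3)
residual = case2_residual(W=3, Z=2, B=4)
print(bound_case1, bound_case2, residual)
```

The residual evaluates to 1 for any admissible W, Z and B, since K - (W - Z)N - (Z - 1) = Z(N - B - 1) + 1 and N = B + 1. This is why the chain ends in H(s) rather than 0.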

A.2 Converse of Theorem 5

In this section, we sketch the converse of Theorem 5, highlighting the differences from Appendix A.1. Again, for simplicity, we only consider the links without link outage. We follow a similar source distribution and the Singleton-achieving assumption, so (A.6), (A.7) and (A.8) still apply. In the case where each link $m$ has delay $\tau_m$, each channel symbol $x[m, n]$ is received at time $n + \tau_m$ as $y[m, n + \tau_m]$. Let $\tau^* := \max_{m=1,\ldots,M} \tau_m$ be the longest link delay, and $L^* := \{m : m = \arg\max_{m'=1,\ldots,M} \tau_{m'}\}$ be the set of link(s) with the longest delay. We discuss two cases in the sequel.

Case I: $T \ge B$, where we want to show that

$$T \ge N - B. \tag{A.23}$$

Assume the opposite is true, for example, T = N − B − 1, or T = N − B + τ∗ − 1. Then $S[1]$ must be recovered at time $N - B + \tau^*$. At this moment, the symbols that have been received are within

$$X_{\text{urgent}} := \left\{ x[m, n] : n = \begin{cases} 1, \ldots, N - B + 1, & m \notin L^* \\ 1, \ldots, N - B, & m \in L^* \end{cases} \right\}.$$

In a similar manner, we can define $I_{(1)}, I_{(2)}, \ldots, I_{(N_U)}$ such that $X_{\text{urgent}} = \{x[I_{(i)}]\}_{i=1,\ldots,N_U}$. We can proceed to show that $H(X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}])$ cannot achieve the maximum entropy possible by invoking Lemma 6. Next, let $m^*$ be one of the links in $L^*$ and consider the erasure pattern:

$$B_0 = \{I_{(j)}\}_{j=1}^{Z-1} \cup \{(m^*, N - B + 1), \ldots, (m^*, N)\}.$$

The feasibility of the code implies that all the source symbols $S$ must be recovered from the remaining symbols. But since $X_{\text{urgent}} \setminus x[\{I_{(j)}\}_{j=1}^{Z-1}]$ cannot have the maximum entropy possible, the remaining symbols are not sufficient to fully recover $S$. Thus, the assumption is false and (A.23) is true.

Case II: $T < B$, where we want to show $N \le B$. Assume the opposite is true, for example, let $N = B + 1$, thus $N + \tau^* > 1 + T + \tau^*$. Similarly, the key is to realize that when $S[1]$ is recovered at time $1 + T + \tau^*$, there are "excessive" channel symbols remaining to be received, which will cause a contradiction when multiple erasure patterns are considered. We can complete the proof by contradiction using the early recovery property, the full recovery property and the causality of the code.

Appendix B Requantization Penalty

In this part, we provide a detailed explanation for the distortion penalty associated with requantization, as discussed in Section 6.2.1.1. Figure B.1 shows an example where applying quantization twice with different stepsizes could lead to an inconsistent result compared to applying quantization once only. Consider scalar quantizers with uniform stepsizes. The finer quantizer and the coarser quantizer have stepsize $\Delta_1$ and $\Delta_2$, respectively, with $\Delta_1 < \Delta_2$. For any value $x$, its quantization level by the finer quantizer, by the coarser quantizer and by the finer quantizer followed by the coarser quantizer are denoted by $Q^x_1$, $Q^x_2$ and $Q^x_{12}$, respectively. For value $a$, $Q^a_2 = Q^a_{12}$, thus no additional error occurs. However, for value $b$, $Q^b_2 \ne Q^b_{12}$, implying that there is additional quantization error due to applying quantization twice on the source. Typically, the additional quantization error is most severe when $\Delta_2$ is less than double $\Delta_1$. (Footnote 1: Doubling the stepsize corresponds to a QP difference of 6.)

[Figure B.1. Diagram of values a and b on a line with fine stepsize $\Delta_1$ and coarse stepsize $\Delta_2$, marking the levels $Q^a_1$, $Q^b_1$, $Q^a_2 = Q^a_{12} = Q^b_2$ and $Q^b_{12}$.]

Figure B.1: Illustration of additional error introduced by requantization. Reproduced from Figure 2, [25].

The effect of requantization has been studied analytically by Shen in [159], based on number theory. However, in his work, no simple analytical form relating the distortion and the stepsize has been derived. Instead, in this part we empirically study the requantization penalty based on synthetic data. We generate $10^5$ samples according to an i.i.d. Laplacian distribution with zero mean and standard deviation 1.0 and 5.0, apply uniform quantization/requantization to the samples and compute the MSE between the original samples and the reconstructed samples. Figure B.2 plots the MSE as a function of the quantization stepsize (left) and the MSE as a function of the entropy of the quantized data (right) for the two cases of interest: (i) the samples are quantized once by a coarse quantizer with $\Delta_2$, (ii) the samples are first quantized by a fine quantizer with $\Delta_1$, then requantized by a coarse quantizer with $\Delta_2$. Based on these plots, we make the following observations:

1) From the relation of MSE and stepsizes, it is evident that quantization and requantization errors are not well approximated by the uncorrelated assumption, especially when the source variance is low. Instead, the MSE varies as a function of the stepsize ratio: at odd integer ratios, requantization incurs no additional error; at even integer ratios, the error is especially high (note that this depends on the form of the quantizer as well).

2) From the relation of MSE and entropy, it is evident that when the source variance is low, there is a large penalty in the high-entropy region. The variation of MSE as a function of the entropy is complicated: first, the relation is not monotonic; second, it is interesting to see that requantization sometimes does help reduce the MSE (although overall, the MSE is higher).

3) When the source variance is high, the behavior of the two cases (i) and (ii) becomes more consistent, which explains why in Figure 6.3, the plot for the I-frame appears to have less requantization penalty than the other plots.

[Figure B.2. Row (a): STD = 1.0, Quant Stepsize = 1.0. Row (b): STD = 5.0, Quant Stepsize = 1.0. Left panels: MSE versus Quant/Requant stepsize ratio. Right panels: MSE versus entropy (bits). Curves: Quant and Quant/Requant.]

Figure B.2: Requantization effect on synthetic data: MSE vs. quantization stepsize (left) and MSE vs. entropy of quantized data (right) for a source with (a) standard deviation 1.0 and (b) standard deviation 5.0. We generate $10^5$ samples according to an i.i.d. Laplacian distribution, and compare two cases of interest: (i) the samples are quantized once by a coarse quantizer with stepsize $\Delta_2$ (Quant), (ii) the samples are first quantized by a fine quantizer with stepsize $\Delta_1 = 1$, then by a coarse quantizer with stepsize $\Delta_2$, where $\Delta_2 \ge \Delta_1$ (Quant/Requant). The quantizers have the form $f(x) = \mathrm{sgn}(x) \left\lfloor \frac{|x|}{\Delta_i} + 0.5 \right\rfloor \Delta_i$, $i = 1, 2$.
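The synthetic experiment described in this appendix can be reproduced in outline with the Python sketch below. It is a minimal illustration rather than the code used to produce Figure B.2: the function names, the random seed and the two stepsize ratios (2 and 3) are assumptions chosen here, and only the standard library is used. Laplacian samples are drawn as the difference of two exponential variates, and the quantizer is the rounding form sgn(x) * floor(|x|/step + 0.5) * step.

```python
import math
import random

def quantize(x, step):
    """Uniform quantizer sgn(x) * floor(|x| / step + 0.5) * step."""
    return math.copysign(math.floor(abs(x) / step + 0.5) * step, x)

def laplacian_samples(count, std, seed=0):
    """Zero-mean Laplacian samples with the given standard deviation."""
    rng = random.Random(seed)
    rate = math.sqrt(2.0) / std   # Laplace scale b = std / sqrt(2), rate = 1 / b
    # The difference of two i.i.d. exponentials is Laplacian distributed.
    return [rng.expovariate(rate) - rng.expovariate(rate) for _ in range(count)]

def mse(samples, reconstruct):
    """Mean squared error between samples and their reconstruction."""
    return sum((x - reconstruct(x)) ** 2 for x in samples) / len(samples)

def penalty(samples, fine_step, coarse_step):
    """Return (MSE of direct coarse quantization, MSE of fine-then-coarse)."""
    direct = mse(samples, lambda x: quantize(x, coarse_step))
    requant = mse(samples,
                  lambda x: quantize(quantize(x, fine_step), coarse_step))
    return direct, requant

samples = laplacian_samples(100_000, std=1.0)
for ratio in (2.0, 3.0):
    direct, requant = penalty(samples, fine_step=1.0, coarse_step=ratio)
    print(f"ratio {ratio}: direct MSE {direct:.4f}, requantized MSE {requant:.4f}")
```

With a fine stepsize of 1, a ratio of 3 gives the same reconstruction as direct coarse quantization, while a ratio of 2 gives a visibly larger MSE, in line with observation 1) on odd and even stepsize ratios.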

Bibliography [1] ISO/IEC JTC1/SC29/WG11/N10361: Vision and Requirements for HighPerformance Video Coding (HVC), Feb. 2009. [2] BitTorrent. http://en.wikipedia.org/wiki/BitTorrent (protocol). [3] The eDonkey network. http://en.wikipedia.org/wiki/EDonkey network. [4] FastTrack. http://en.wikipedia.org/wiki/FastTrack. [5] Gnutella. http://en.wikipedia.org/wiki/Gnutella. [6] Kazaa. http://en.wikipedia.org/wiki/Kazaa. [7] Napster. http://en.wikipedia.org/wiki/Napster. [8] H.264/AVC reference software. http://iphome.hhi.de/suehring/tml/. [9] Digital Video Broadcasting (DVB) project. http://www.dvb.org. [10] Google Fiber Project. http://www.google.com/appserve/ﬁberrﬁ/. [11] The Network Simulator – ns-2. http://www.isi.edu/nsnam/ns/. [12] Named Data Networking (NDN). http://www.named-data.net/index.html. [13] Content-Centric Networking (CCN).

http://www.parc.com/work/focus-

area/content-centric-networking/. [14] Advanced Video Coding for Generic Audiovisual Services, ITU-T Recommendation H.264 - ISO/IEC 14496-10(AVC), ITU-T and ISO/IEC JTC 1, 2003. 135

BIBLIOGRAPHY

136

[15] ISO/IEC JTC1, Coding of Audio-Visual Objects Part 2: Visual, ISO/IEC 14496-2, MPEG-4 Visual Version 1, Apr. 1999; Amendment 1 (Version 2), Feb., 2000; Amendment 4 (streaming proﬁle), Jan., 2001. [16] ITU-T and ISO/IEC JTC 1, Generic Coding of Moving Pictures and Associated Audio Information, Part 2: Video, ITU-T Recommendation H.262 ISO/IEC 13818-2 (MPEG-2), Nov. 1994. [17] ITU-T, Video Codec for Audiovisual Services at 64 kbit/s, ITU-T Recommendation H.261, Version 1: Nov. 1990; Version 2: Mar. 1993. [18] ITU-T, Video Coding for Low Bit Rate Communication, ITU-T Recommendation H.263, version 1, Nov. 1995; version 2, Jan. 1998; version 3, Nov. 2000. [19] A. Aaron, R. Zhang, and B. Girod. Wyner-Ziv coding of motion video. In Proc. Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, volume 1, pages 240–244, Paciﬁc Grove, CA, USA, Nov. 2002. [20] R. Ahlswede, N. Cai, S.-Y.R. Li, and R.W. Yeung. Network information ﬂow. IEEE Trans. Information Theory, 46(4):1204–1216, Jul. 2000. [21] I. Ahmad, X. Wei, Y. Sun, and Y.-Q. Zhang. Video transcoding: an overview of various techniques and research issues. IEEE Trans. Multimedia, 7(5):793–804, 2005. [22] K. Ahmed. Perceived quality of channel zapping. In ITU-T IPTV Global Technical Workshop, 2006. [23] Stephanos Androutsellis-Theotokis and Diomidis Spinellis. A survey of peerto-peer content distribution technologies. ACM Comput. Surv., 36:335–371, December 2004. [24] S. Annapureddy, S. Guha, C. Gkantsidis, D. Gunawardena, and P. Rodriguez. Exploring VoD in P2P swarming systems. In Proc. IEEE International Conference on Computer Communications (INFOCOM), pages 2571–2575, May 2007.


[25] P.A.A. Assuncao and M. Ghanbari. A frequency-domain video transcoder for dynamic bitrate reduction of MPEG-2 bit streams. IEEE Trans. Circuits Syst. Video Technol., 8(8):953–967, 1998.
[26] P.A.A. Assuncao and M. Ghanbari. Transcoding of single-layer MPEG video into lower rates. IEE Proceedings, Vision, Image and Signal Processing, 144(6):377–383, Dec. 1997.
[27] I. Avcibas, B. Sankur, and K. Sayood. Statistical evaluation of image quality measures. Journal of Electronic Imaging, 11:206–223, 2002.
[28] P. Baccichet, D. Bagni, A. Chimienti, L. Pezzoni, and F.S. Rovati. Frame concealment for H.264/AVC decoders. IEEE Transactions on Consumer Electronics, 51(1):227–233, Feb. 2005.
[29] A. Badr, A. Khisti, and E. Martinian. Diversity embedded streaming erasure codes (DE-SCo): Constructions and optimality. In Proc. IEEE Global Telecommunications Conference (GLOBECOM), pages 1–5, Miami, FL, USA, Dec. 2010.
[30] Suman Banerjee, Bobby Bhattacharjee, and Christopher Kommareddy. Scalable application layer multicast. In Proc. ACM Conference on Applications, Technologies, Architectures and Protocols for Computer Communications, SIGCOMM ’02, pages 205–217, Pittsburgh, Pennsylvania, USA, 2002. ACM.
[31] H.H. Bauschke, C.H. Hamilton, M.S. Macklem, J.S. McMichael, and N.R. Swart. Recompression of JPEG images by requantization. IEEE Transactions on Image Processing, 12(7):843–849, Jul. 2003.
[32] A.C. Begen. Error control for IPTV over xDSL networks. In Proc. IEEE Consumer Communications and Networking Conference (CCNC), pages 632–637, Las Vegas, NV, USA, Jan. 2008.
[33] A.C. Begen, N. Glazebrook, and W. V. Steeg. A unified approach for repairing packet loss and accelerating channel changes in multicast IPTV. In Proc. IEEE


Consumer Communications and Networking Conference (CCNC), Las Vegas, NV, USA, 2009.
[34] Y. Bejerano and P.V. Koppol. Improving zap response time for IPTV. In Proc. IEEE International Conference on Computer Communications (INFOCOM), pages 1971–1979, Rio de Janeiro, Brazil, Apr. 2009.
[35] N. Cai and R.W. Yeung. Network error correction, part II: lower bounds. Communications in Information and Systems, 6(1):37–54, 2006.
[36] J. Chakareski, J. Apostolopoulos, W.-T. Tan, S. Wee, and B. Girod. Distortion chains for predicting the video distortion for general packet loss patterns. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 5, pages V – 1001–4 vol.5, Montreal, Quebec, Canada, May 2004.
[37] R. Chandramouli, R. Shorey, P.K. Srimani, X. Wang, and H. Yu. Guest editorial: recent advances in wireless multimedia. IEEE Journal on Selected Areas in Communications, 21(10):1501–1505, Dec. 2003.
[38] P.A. Chou and Y. Wu. Network coding for the Internet and wireless networks. IEEE Signal Processing Magazine, 24(5):77–85, Sep. 2007.
[39] P.A. Chou and Y. Wu. Network coding for the Internet and wireless networks. Microsoft Research, Technical Report, MSR-TR-2007-70, Jun. 2007.
[40] P.A. Chou, Y. Wu, and K. Jain. Practical network coding. In Proc. 41st Allerton Conference on Communications, Control and Computing, Monticello, Illinois, USA, 2003.
[41] Y.-H. Chu, S. G. Rao, and H. Zhang. A case for end system multicast. In Proc. ACM Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), pages 1–12, 2000.


[42] M.R. Civanlar, A. Luthra, S. Wenger, and Wenwu Zhu. Introduction to the special issue on streaming video. IEEE Transactions on Circuits and Systems for Video Technology, 11(3):265–268, Mar. 2001.
[43] Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A distributed anonymous information storage and retrieval system. In Proc. International Workshop on Designing Privacy Enhancing Technologies: Design Issues in Anonymity and Unobservability, pages 46–66. Springer-Verlag New York, Inc., 2001.
[44] G.W. Cook, J. Prades-Nebot, and E.J. Delp. Rate-distortion bounds for motion compensated rate scalable video coders. In Proc. International Conference on Image Processing, volume 5, Singapore, Oct. 2004.
[45] G.W. Cook, J. Prades-Nebot, Y. Liu, and E.J. Delp. Rate-distortion analysis of motion-compensated rate scalable video. IEEE Transactions on Image Processing, 15(8):2170–2190, Aug. 2006.
[46] G. Cote and F. Kossentini. Optimal intra coding of blocks for robust communication over the Internet. Signal Processing: Image Communication, 15(1-2):25–34, Sep. 1999.
[47] G. Cote, S. Shirani, and F. Kossentini. Optimal mode selection and synchronization for robust video communications over error-prone networks. IEEE Journal on Selected Areas in Communications, 18(6):952–965, Jun. 2000.
[48] I.J. Cox, M. Miller, J. Bloom, J. Fridrich, and T. Kalker. Digital watermarking and steganography. Morgan Kaufmann Publishers, 2008.
[49] P. Csillag and L. Boroczky. Enhancement of video data using motion-compensated postprocessing techniques. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 4, pages 2897–2900 vol.4, Munich, Bavaria, Germany, Apr. 1997.


[50] M. Dai, D. Loguinov, and H. Radha. Rate-distortion modeling of scalable video coders. In Proc. International Conference on Image Processing, volume 2, pages 1093–1096 Vol.2, Singapore, Oct. 2004.
[51] G. De Los Reyes, A.R. Reibman, S.-F. Chang, and J.C.-I. Chuang. Error-resilient transcoding for video over wireless channels. IEEE Journal on Selected Areas in Communications, 18(6):1063–1074, Jun. 2000.
[52] S. Deering. Multicast routing in a datagram internetwork. Ph.D. Thesis, Stanford University, Dec. 1991.
[53] John Dilley, Bruce Maggs, Jay Parikh, Harald Prokop, and Bill Weihl. Globally distributed content delivery. IEEE Internet Computing, 6:50–58, 2002.
[54] A.G. Dimakis, P.B. Godfrey, Yunnan Wu, M.J. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. IEEE Transactions on Information Theory, 56(9):4539–4551, Sep. 2010.
[55] W. Ding and B. Liu. Rate control of MPEG video coding and recording by rate-quantization modeling. IEEE Transactions on Circuits and Systems for Video Technology, 6(1):12–20, Feb. 1996.
[56] DSL Forum Technical Report TR-126. Triple-play Services Quality of Experience (QoE) Requirements. Dec. 2006. http://www.broadband-forum.org/technical/download/TR-126.pdf.
[57] E. Martinian and C.-E. W. Sundberg. Burst erasure correction codes with low decoding delay. IEEE Trans. on Information Theory, 50(10):2494–2502, Oct. 2004.
[58] S. Bhattacharyya (ed.). An overview of source-specific multicast (SSM). RFC 3569, Jul. 2003.
[59] Y. Eisenberg, F. Zhai, C.E. Luna, T.N. Pappas, R. Berry, and A.K. Katsaggelos. Variance-aware distortion estimation for wireless video communications. In


Proc. IEEE International Conference on Image Processing, volume 1, pages I – 89–92 vol.1, Barcelona, Catalonia, Spain, Sep. 2003.
[60] B. Erol, A. Dumitras, F. Kossentini, A. Joch, and G. Sullivan. MPEG-4, H.264/AVC, and MPEG-7: New standards for the digital video industry. Handbook of Image and Video Processing, 2nd Ed, Academic Press, 2005.
[61] N. Färber. Feedback-based error control for robust video transmission. Ph.D. Thesis, University of Erlangen, 2000.
[62] N. Färber and B. Girod. Robust H.263 compatible video transmission for mobile access to video servers. In Proc. IEEE International Conference on Image Processing, volume 2, pages 73–76 vol.2, Washington, DC, USA, Oct. 1997.
[63] M. Flierl. Video Coding with Superimposed Motion-Compensated Signals. Ph.D. Dissertation, University of Erlangen, 2003.
[64] M. Flierl and B. Girod. Multihypothesis motion-compensated prediction with forward adaptive hypothesis switching. In Proc. Picture Coding Symposium (PCS), Seoul, Korea, 2001.
[65] M. Flierl and B. Girod. Multihypothesis motion estimation for video coding. In Proc. Data Compression Conference (DCC), Snowbird, UT, USA, 2001.
[66] S. Floyd, V. Jacobson, C. Liu, S. McCanne, and L. Zhang. A reliable multicast framework for light-weight sessions and application level framing. IEEE/ACM Transactions on Networking, 5(6):784–803, Dec. 1997.
[67] H. Fuchs and N. Färber. Optimizing channel change time in IPTV applications. In Proc. IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pages 1–8, Mar. 31–Apr. 2, 2008.
[68] H. Fuchs, U. Jennehag, and H. Thoma. Subjective evaluation of low resolution tune-in streams for IPTV fast channel change. In Proc. IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pages 1–5, May 2009.


[69] B. Girod. The efficiency of motion-compensating prediction for hybrid coding of video sequences. IEEE Journal on Selected Areas in Communications, 5(7):1140–1154, Aug. 1987.
[70] B. Girod. What’s wrong with mean-squared error? In Digital Images and Human Vision, pages 207–220. MIT Press, Cambridge, MA, USA, 1993.
[71] B. Girod. Motion-compensating prediction with fractional-pel accuracy. IEEE Transactions on Communications, 41(4):604–612, Apr. 1993.
[72] B. Girod. Efficiency analysis of multihypothesis motion-compensated prediction for video coding. IEEE Transactions on Image Processing, 9(2):173–183, Feb. 2000.
[73] B. Girod, A.M. Aaron, S. Rane, and D. Rebollo-Monedero. Distributed video coding. Proceedings of the IEEE, 93(1):71–83, Jan. 2005.
[74] B. Girod and N. Färber. Wireless video. In A. Reibman and M.-T. Sun (Eds.), Compressed Video over Networks. Marcel Dekker, 1999.
[75] B. Girod and N. Färber. Feedback-based error control for mobile video transmission. Proceedings of the IEEE, 87(10):1707–1723, Oct. 1999.
[76] B. Girod, I. Lagendijk, Qian Zhang, and Wenwu Zhu. Advances in wireless video. IEEE Wireless Communications, 12(4):4–5, Aug. 2005.
[77] V. Gopalakrishnan, B. Bhattacharjee, K.K. Ramakrishnan, R. Jana, and D. Srivastava. CPM: Adaptive video-on-demand with cooperative peer assists and multicast. In Proc. IEEE International Conference on Computer Communications (INFOCOM), pages 91–99, Apr. 2009.
[78] Quan Gu. A novel content scheduling algorithm for peer-to-peer VoD system. In Proc. International Conference on Computer Application and System Modeling (ICCASM), volume 11, pages V11 172–175, Oct. 2010.


[79] O. Harmanci, S. Kanumuri, U.C. Kozat, U. Demircin, and R. Civanlar. Peer assisted streaming of scalable video via optimized distributed caching. In Proc. IEEE Consumer Communications and Networking Conference (CCNC), pages 1–5, Jan. 2009.
[80] T. Hata, N. Kuwahara, T. Nozawa, D.L. Schwenke, and A. Vetro. Surveillance system with object-aware video transcoder. In 2005 IEEE 7th Workshop on Multimedia Signal Processing, pages 1–4, Shanghai, China, Nov. 2005.
[81] Z. He, Y.K. Kim, and S.K. Mitra. Low-delay rate control for DCT video coding via ρ-domain source modeling. IEEE Transactions on Circuits and Systems for Video Technology, 11(8):928–940, Aug. 2001.
[82] Z. He and S.K. Mitra. A unified rate-distortion analysis framework for transform coding. IEEE Transactions on Circuits and Systems for Video Technology, 11(12):1221–1236, Dec. 2001.
[83] Z. He and S.K. Mitra. Optimum bit allocation and accurate rate control for video coding via ρ-domain source modeling. IEEE Transactions on Circuits and Systems for Video Technology, 12(10):840–849, Oct. 2002.
[84] H. Holbrook and B. Cain. Source-specific multicast for IP. RFC 4607, Aug. 2006.
[85] U. Horn, K. Stuhlmüller, M. Link, and B. Girod. Robust Internet video transmission based on scalable coding and unequal error protection. Image Communication, Special Issue on Real-time Video over the Internet, 15(1-2):77–94, Sep. 1999.
[86] Kien A. Hua, Ying Cai, and Simon Sheu. Patching: a multicast technique for true video-on-demand services. In Proc. Sixth ACM International Conference on Multimedia, MULTIMEDIA ’98, pages 191–200, Bristol, United Kingdom, 1998. ACM.


[87] Kien A. Hua, Ying Cai, and Simon Sheu. Patching: a multicast technique for true video-on-demand services. In Proc. ACM International Conference on Multimedia, MULTIMEDIA ’98, pages 191–200, New York, NY, USA, 1998. ACM.
[88] Kan-Li Huang, Yi-Shin Tung, Ja-Ling Wu, Po-Kang Hsiao, and Hsien-Shuo Chen. A frame-based MPEG characteristics extraction tool and its application in video transcoding. IEEE Transactions on Consumer Electronics, 48(3):522–532, Aug. 2002.
[89] Espen Jacobsen, Carsten Griwodz, and Pål Halvorsen. Pull-patching: a combination of multicast and adaptive segmented HTTP streaming. In Proc. International Conference on Multimedia, MM ’10, 2010.
[90] J. Xin, C.-W. Lin, and M.-T. Sun. Digital video transcoding. Proceedings of the IEEE, 93(1):84–97, 2005.
[91] M. Karczewicz and R. Kurceren. The SP- and SI-frames design for H.264/AVC. IEEE Trans. Circuits and Systems for Video Technology, 13(7):637–644, Jul. 2003.
[92] G. Keesman, R. Hellinghuizen, F. Hoeksema, and G. Heideman. Transcoding of MPEG bitstreams. Signal Processing: Image Commun., 8(6):481–500, 1996.
[93] R. Koetter and F. Kschischang. Coding for errors and erasures in random network coding. IEEE Trans. Information Theory, 54(8):3579–3591, 2008.
[94] R. Koetter and M. Medard. An algebraic approach to network coding. IEEE/ACM Trans. Networking, 11(5):782–795, Oct. 2003.
[95] L. Kontothanassis, R. Sitaraman, J. Wein, D. Hong, R. Kleinberg, B. Mancuso, D. Shaw, and D. Stodolsky. A transport layer for live streaming in a content delivery network. Proceedings of the IEEE, 92(9):1408–1419, Sep. 2004.


[96] A. Korosi, C. Lukovszki, B. Szekely, and A. Csaszar. High quality P2P-Video-on-Demand with download bandwidth limitation. In International Workshop on Quality of Service (IWQoS), pages 1–9, Jul. 2009.
[97] R. Kumar. A protocol with transcoding to support QoS over Internet for multimedia traffic. In Proc. International Conference on Multimedia and Expo, volume 1, pages I – 465–8 vol.1, Baltimore, MD, USA, Jul. 2003.
[98] L. Yu, F. Yi, J. Dong, and C. Zhang. Overview of AVS-video: Tools, performance and complexity. In Proc. Visual Communications and Image Processing (VCIP), Beijing, China, 2005.
[99] M.S. Lacher, J. Nonnenmacher, and E.W. Biersack. Performance comparison of centralized versus distributed error recovery for reliable multicast. IEEE/ACM Transactions on Networking, 8(2):224–238, Apr. 2000.
[100] W.M. Lam, A.R. Reibman, and B. Liu. Recovery of lost or erroneously received motion vectors. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 5, pages 417–420 vol.5, Minneapolis, Minnesota, USA, Apr. 1993.
[101] Irwin Lazar and William Terrill. Exploring content delivery networking. IT Professional, 3:47–49, July 2001.
[102] Y. Lee, J. Lee, I. Kim, and H. Shin. Reducing IPTV channel switching time using H.264 scalable video coding. IEEE Trans. on Consumer Electronics, 54(2):912–919, May 2008.
[103] Jin Li. PeerStreaming: A practical receiver-driven peer-to-peer media streaming system. Technical report, Microsoft Research, MSR-TR-2004-101, 2004.
[104] Jin Li, Philip A. Chou, and Cha Zhang. Mutualcast: An efficient mechanism for content distribution in a peer-to-peer (P2P) network. Technical report, Microsoft Research, MSR-TR-2004-98, 2004.


[105] S.-Y.R. Li, R.W. Yeung, and N. Cai. Linear network coding. IEEE Trans. Information Theory, 49(2):371–381, Feb. 2003.
[106] Z. Li, L. Liu, and E.J. Delp. Rate distortion analysis of motion side estimation in Wyner-Ziv video coding. IEEE Transactions on Image Processing, 16(1):98–113, Jan. 2007.
[107] Chao Liang, Zhenghua Fu, Yong Liu, and Chai Wah Wu. Incentivized peer-assisted streaming for on-demand services. IEEE Transactions on Parallel and Distributed Systems, 21(9):1354–1367, Sep. 2010.
[108] Y.J. Liang, J.G. Apostolopoulos, and B. Girod. Analysis of packet loss for compressed video: does burst-length matter? In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 5, pages V – 684–7 vol.5, Hong Kong, China, Apr. 2003.
[109] Y.J. Liang, J.G. Apostolopoulos, and B. Girod. Analysis of packet loss for compressed video: Effect of burst losses and correlation between error frames. IEEE Transactions on Circuits and Systems for Video Technology, 18(7):861–874, Jul. 2008.
[110] Y.J. Liang and B. Girod. Low-latency streaming of pre-encoded video using channel-adaptive bitstream assembly. In IEEE International Conference on Multimedia and Expo, volume 1, pages 873–876 vol.1, Lausanne, Switzerland, 2002.
[111] Y.J. Liang, E. Setton, and B. Girod. Channel-adaptive video streaming using packet path diversity and rate-distortion optimized reference picture selection. In Proc. IEEE Workshop on Multimedia Signal Processing, pages 420–423, St. Thomas, Virgin Islands, USA, Dec. 2002.
[112] Y.J. Liang, M. Flierl, and B. Girod. Low-latency video transmission over lossy packet networks using rate-distortion optimized reference picture selection. In IEEE International Conference on Image Processing, volume 2, pages II–181–184 vol.2, Rochester, NY, USA, 2002.


[113] Shu Lin and Daniel J. Costello, Jr. Error Control Coding: Fundamentals and Applications. New Jersey, NJ: Prentice-Hall, 1983.
[114] Shunan Lin, Shiwen Mao, Yao Wang, and S. Panwar. A reference picture selection scheme for video transmission over ad-hoc networks using multiple paths. In Proc. IEEE International Conference on Multimedia and Expo, pages 96–99, Tokyo, Japan, Aug. 2001.
[115] J. Liu, S.G. Rao, B. Li, and H. Zhang. Opportunities and challenges of peer-to-peer Internet video broadcast. Proceedings of the IEEE, 96(1):11–24, Jan. 2008.
[116] Shan Liu and C.-C.J. Kuo. Joint temporal-spatial rate control for adaptive video transcoding. In Proc. International Conference on Multimedia and Expo, volume 2, pages II – 225–8 vol.2, Baltimore, MD, USA, Jul. 2003.
[117] Yu Liu, Hao Yin, Guangxi Zhu, and Xuening Liu. Peer-assisted content delivery network for live streaming: Architecture and practice. In Proc. International Conference on Networking, Architecture, and Storage (NAS), pages 149–150, Jun. 2008.
[118] M. Luby. LT codes. In Proc. 43rd Symposium on Foundations of Computer Science, FOCS ’02, pages 271–, Vancouver, Canada, 2002. IEEE Computer Society.
[119] A. Luthra, G. Sullivan, and T. Wiegand (Eds.). Special issue on the H.264/AVC video coding standard. IEEE Trans. Circuits and Systems for Video Technology, 13(7), Jul. 2003.
[120] Huadong Ma, Kang G. Shin, and Weibiao Wu. Best-effort patching for multicast true VoD service. Multimedia Tools Appl., 26:101–122, May 2005.
[121] D.J.C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.


[122] H. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky. Low-complexity transform and quantization in H.264/AVC. IEEE Trans. Circuits and Systems for Video Technology, 13(7):598–603, Jul. 2003.
[123] D. Marpe, H. Schwarz, and T. Wiegand. Context-adaptive binary arithmetic coding in the H.264/AVC video compression standard. IEEE Trans. Circuits and Systems for Video Technology, 13(7):620–636, Jul. 2003.
[124] E. Martinian. Dynamic information and constraints in source and channel coding. Ph.D. Thesis, Massachusetts Inst. of Technology, Sep. 2004.
[125] E. Martinian and M. Trott. Delay-optimal burst erasure code construction. In Proc. IEEE International Symposium on Information Theory (ISIT), pages 1006–1010, Nice, France, Jun. 2007.
[126] M. Masry, S.S. Hemami, and Y. Sermadevi. A scalable wavelet-based video distortion metric and applications. IEEE Transactions on Circuits and Systems for Video Technology, 16(2):260–273, Feb. 2006.
[127] S. McCanne, V. Jacobson, and M. Vetterli. Receiver-driven layered multicast. ACM Computer Communication Review, 26(4):117–130, Aug. 1996.
[128] M. Militzer, M. Suchomski, and K. Meyer-Wegener. Improved ρ-domain rate control and perceived quality optimizations for MPEG-4 real-time video applications. In Proc. Eleventh ACM International Conference on Multimedia, pages 402–411, Berkeley, CA, USA, 2003.
[129] K. Miller, K. Robertson, A. Tweedly, and M. White. Starburst multicast file transfer protocol (MFTP) specification. http://tools.ietf.org/html/draft-miller-mftp-spec-03.
[130] M. Mitzenmacher. Digital fountains: a survey and look forward. In Proc. IEEE Information Theory Workshop, pages 271–276, San Antonio, TX, USA, Oct. 2004.


[131] R. Mohan, J.R. Smith, and Chung-Sheng Li. Adapting multimedia internet content for universal access. IEEE Transactions on Multimedia, 1(1):104–114, Mar. 1999.
[132] Y. Nakajima, H. Hori, and T. Kanoh. Rate conversion of MPEG coded video by re-quantization process. In Proc. IEEE Int. Conf. Image Processing, pages 408–411 vol.3, Washington, DC, USA, 1995.
[133] A. Neubauer, J. Freudenberg, and V. Kuhn. Coding Theory Algorithms, Architectures and Applications. John Wiley & Sons, 2007.
[134] Di Niu, Zimu Liu, Baochun Li, and Shuqiao Zhao. Demand forecast and performance prediction in peer-assisted on-demand streaming systems. In Proc. IEEE International Conference on Computer Communications (INFOCOM), pages 421–425, Apr. 2011.
[135] J. Nonnenmacher and E.W. Biersack. Scalable feedback for large groups. IEEE/ACM Transactions on Networking, 7(3):375–386, Jun. 1999.
[136] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi. Video coding with H.264/AVC: Tools, performance, and complexity. IEEE Circuits and Systems Magazine, 4(1):851–875, Jan. 2004.
[137] Kihong Park, Walter Willinger (editors), H.T. Kung, and C.H. Wu. Content networks: Taxonomy and new approaches, 2002.
[138] S. Paul, K.K. Sabnani, J.C. Lin, and S. Bhattacharyya. Reliable multicast transport protocol (RMTP). IEEE Journal on Selected Areas in Communications, Apr. 1997.
[139] M.H. Pinson and S. Wolf. A new standardized method for objectively measuring video quality. IEEE Transactions on Broadcasting, 50(3):312–322, Sep. 2004.


[140] J. Prades-Nebot, G.W. Cook, and E.J. Delp. Analysis of the efficiency of SNR-scalable strategies for motion compensated video coders. In Proc. International Conference on Image Processing, volume 5, Singapore, Oct. 2004.
[141] Tongqing Qiu, Zihui Ge, Seungjoon Lee, Jia Wang, Qi Zhao, and Jun Xu. Modeling channel popularity dynamics in a large IPTV system. In Proc. 11th International Joint Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’09, pages 275–286, Seattle, WA, USA, 2009. ACM.
[142] E. Quacchio, E. Magli, G. Olmo, P. Baccichet, and A. Chimienti. Enhancing whole-frame error concealment with an intra motion vector estimator in H.264/AVC. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages 329–332, Philadelphia, Pennsylvania, USA, Mar. 18–23, 2005.
[143] B. Quinn and K. Almeroth. RFC 3170: IP multicast applications: Challenges and solutions. http://www.ietf.org/rfc/rfc3170.txt.
[144] Michael Rabinovich and Oliver Spatschek. Web caching and replication. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.
[145] S. Rane, P. Baccichet, and B. Girod. Systematic lossy error protection of video signals. IEEE Transactions on Circuits and Systems for Video Technology, 18(10):1347–1360, Oct. 2008.
[146] S. Rane, D. Rebollo-Monedero, and B. Girod. High-rate analysis of Systematic Lossy Error Protection of a predictively encoded source. In Proc. Data Compression Conference (DCC), pages 263–272, Mar. 2007.
[147] Irving S. Reed and Gustave Solomon. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics (SIAM), 8(2):300–304, 1960.
[148] J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems, 2(4), 1984.


[149] Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy. An analysis of internet content delivery systems. SIGOPS Oper. Syst. Rev., 36:315–327, December 2002.
[150] C. Sasaki, A. Tagami, T. Hasegawa, and S. Ano. Rapid channel zapping for IPTV broadcasting with additional multicast stream. In IEEE International Conference on Communications, pages 1760–1766, May 2008.
[151] K. Seshadrinathan, R. Soundararajan, A.C. Bovik, and L.K. Cormack. Study of subjective and objective quality assessment of video. IEEE Transactions on Image Processing, 19(6):1427–1441, Jun. 2010.
[152] E. Setton and J. Apostolopoulos. Towards quality of service for peer-to-peer video multicast. In Proc. IEEE International Conference on Image Processing, volume 5, pages V – 81–84, Oct. 2007.
[153] E. Setton, P. Baccichet, and B. Girod. Peer-to-peer live multicast: A video perspective. Proceedings of the IEEE, 96(1):25–38, 2008.
[154] E. Setton and B. Girod. Video streaming with SP and SI frames. In Proc. SPIE Visual Communication and Image Processing, 2005.
[155] E. Setton and B. Girod. Rate-distortion analysis and streaming of SP and SI frames. IEEE Transactions on Circuits and Systems for Video Technology, 16(6):733–743, Jun. 2006.
[156] S. Shamai, S. Verdu, and R. Zamir. Systematic lossy source/channel coding. 44(2):564–578, Mar. 1998.
[157] T. Shanableh and M. Ghanbari. Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats. IEEE Transactions on Multimedia, 2(2):101–110, Jun. 2000.
[158] T. Shanableh and M. Ghanbari. Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats. IEEE Transactions on Multimedia, 2(2):101–110, Jun. 2000.


[159] B. Shen. Modeled analysis on requantization error. In Proc. 2005 IEEE 7th Workshop on Multimedia Signal Processing, pages 1–4, Shanghai, China, Nov. 2005.
[160] A. Shokrollahi. Raptor codes. IEEE Transactions on Information Theory, 52(6):2551–2567, Jun. 2006.
[161] Swaminathan Sivasubramanian, Michal Szymaniak, Guillaume Pierre, and Maarten van Steen. Replication for web hosting systems. ACM Comput. Surv., 36:291–334, September 2004.
[162] E. Soljanin, R. Liu, and P. Spasojevic. Hybrid ARQ with random transmission assignments. In Proc. DIMACS Workshop on Network Information Theory, Piscataway, NJ, USA, May 2003.
[163] T. Speakman, J. Crowcroft, J. Gemmell, D. Farinacci, S. Lin, D. Leshchiner, M. Luby, T. Montgomery, L. Rizzo, A. Tweedly, R. Edmonstone, and L. Vicisano. RFC 3208: PGM reliable transport protocol specification. http://www.ietf.org/rfc/rfc3208.txt.
[164] G. Srinivasan and S. Regunathan. An overview of VC-1. In Proc. SPIE Visual Communications and Image Processing (VCIP), Beijing, China, 2005.
[165] S. Srinivasan, P. Hsu, T. Holcomb, K. Mukerjee, S. Regunathan, B. Lin, J. Liang, M.-C. Lee, and J. Ribas-Corbera. Windows Media Video 9: Overview and applications. Signal Processing: Image Communication, 19(9):851–875, Oct. 2004.
[166] B. Ver Steeg, A. Begen, T. Van Caenegem, and Z. Vax. Internet draft: Unicast-based rapid acquisition of multicast RTP sessions. http://tools.ietf.org/html/draft-ietf-avt-rapid-acquisition-for-rtp.
[167] T. Stockhammer, M.M. Hannuksela, and T. Wiegand. H.264/AVC in wireless environments. IEEE Transactions on Circuits and Systems for Video Technology, 13(7):657–673, Jul. 2003.


[168] K. Stuhlmüller, N. Färber, M. Link, and B. Girod. Analysis of video transmission over lossy channels. IEEE Journal on Selected Areas in Communications, 18(6):1012–1032, Jun. 2000.
[169] J. Su and R. Mersereau. Motion-compensated interpolation of untransmitted frames in compressed video. In Proc. 30th Asilomar Conf. on Signals Systems and Computers, pages 100–104, Pacific Grove, CA, Nov. 1996.
[170] V. Subramanian and Vijay Gautam Subramanian. Broadband fading channels: Signal burstiness and capacity. IEEE Transactions on Information Theory, 48:809–827, 1999.
[171] H. Sun, W. Kwok, and J. W. Zdepski. Architectures for MPEG compressed bitstream scaling. IEEE Trans. Circuits Syst. Video Technol., 6(2):191–199, 1996.
[172] Huifang Sun, W. Kwok, and J.W. Zdepski. Architectures for MPEG compressed bitstream scaling. IEEE Transactions on Circuits and Systems for Video Technology, 6(2):191–199, Apr. 1996.
[173] W.-T. Tan and B. Shen. Accurate distortion-driven macroblock level rate control via ρ-domain analysis. In Proc. IEEE International Conference on Image Processing, volume 3, pages III – 45–8, Genoa, Italy, Sep. 2005.
[174] R. Thoma and M. Bierling. Motion compensated interpolation considering covered and uncovered background. Signal Processing: Image Communication, 1(2):192–212, Oct. 1989.
[175] D.H.K. Tsang, K.W. Ross, P. Rodriguez, J. Li, and G. Karlsson. Advances in peer-to-peer streaming systems. IEEE Journal on Selected Areas in Communications, 25(9):1609–1611, Nov. 2007.
[176] Athena Vakali and George Pallis. Content delivery networks: Status and trends. IEEE Internet Computing, 7.


[177] M. van der Schaar and P. Chou (Eds.). Multimedia over IP and Wireless Networks. Elsevier, 2007.
[178] N. Varnica, E. Soljanin, and P. Whiting. LDPC code ensembles for incremental redundancy hybrid ARQ. In Proc. International Symposium on Information Theory, 2005, pages 995–999, Adelaide, Australia, Sep. 2005.
[179] D.C. Verma. Content Distribution Networks: An Engineering Approach. John Wiley and Sons, Inc., New York, USA, 2002.
[180] A. Vetro, C. Christopoulos, and Huifang Sun. Video transcoding architectures and techniques: an overview. IEEE Signal Processing Magazine, 20(2):18–29, Mar. 2003.
[181] A. Vetro, J. Xin, and Huifang Sun. Error resilience video transcoding for wireless communications. IEEE Wireless Communications, 12(4):14–21, Aug. 2005.
[182] Yao Wang, J. Ostermann, and Y.-Q. Zhang. Video Processing and Communications. Prentice Hall, New Jersey, 2001.
[183] Yao Wang, S. Wenger, Jiantao Wen, and A.K. Katsaggelos. Error resilient video coding techniques. IEEE Signal Processing Magazine, 17(4):61–82, Jul. 2000.
[184] Yao Wang and Qin-Fan Zhu. Error control and concealment for video communication: a review. Proceedings of the IEEE, 86(5):974–997, May 1998.
[185] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, Apr. 2004.
[186] Zhou Wang, L. Lu, and A.C. Bovik. Video quality assessment based on structural distortion measurement. Signal Processing: Image Communication, 19(2):121–132, 2004.
[187] Zhou Wang, H.R. Sheikh, and A.C. Bovik. Objective video quality assessment. In The Handbook of Video Databases: Design and Applications, pages 1041–1078. CRC Press, 2003.


[188] Andrew B. Watson, James Hu, and John F. McGowan III. DVQ: A digital video quality metric based on human vision. Journal of Electronic Imaging, 10:20–29, 2001.
[189] T. Wedi and H. Musmann. Motion- and aliasing-compensated prediction for hybrid video coding. IEEE Trans. Circuits and Systems for Video Technology, 13(7):577–586, Jul. 2003.
[190] S.J. Wee, J.G. Apostolopoulos, and N. Feamster. Field-to-frame transcoding with spatial and temporal downsampling. In Proc. International Conference on Image Processing, volume 4, pages 271–275 vol.4, Kobe, Japan, 1999.
[191] S. Wenger. H.264/AVC over IP. IEEE Transactions on Circuits and Systems for Video Technology, 13(7):645–656, Jul. 2003.
[192] O. Werner. Requantization for transcoding of MPEG-2 intraframes. IEEE Trans. Image Process., 8(2):179–191, 1999.
[193] B. Widrow and I. Kollar. Quantization Noise: Roundoff Error in Digital Computation, Signal Processing, Control, and Communications. Cambridge University Press, 2008.
[194] T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra. Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits and Systems for Video Technology, 13(7):560–576, Jul. 2003.
[195] Jiahua Wu and Baochun Li. Keep cache replacement simple in peer-assisted VoD systems. In Proc. IEEE International Conference on Computer Communications (INFOCOM), pages 2591–2595, Apr. 2009.
[196] Minghui Xia, A. Vetro, Huifang Sun, and Bede Liu. Rate-distortion optimized bit allocation for error resilient video transcoding. In Proc. of the 2004 International Symposium on Circuits and Systems, volume 3, pages III – 945–8 Vol.3, Vancouver, Canada, May 2004.

[197] Bo Xie and Wenjun Zeng. Source characteristics based fast bitstream switching. In Proc. International Conference on Multimedia and Expo, volume 1, pages I-521–524, Baltimore, MD, USA, Jul. 2003.
[198] Bo Xie and Wenjun Zeng. Fast bitstream switching algorithms for real-time adaptive video multicasting. IEEE Transactions on Multimedia, 9(1):169–175, Jan. 2007.
[199] R. Xie, J. Liu, and X. Wang. Efficient MPEG-2 to MPEG-4 compressed video transcoding. In Proc. SPIE Visual Communications and Image Processing, volume 4671, pages 192–201, San Jose, CA, USA, Jan. 2002.
[200] K.H. Yang, A. Jacquin, and N.S. Jayant. A normalized rate-distortion model for H.263-compatible codecs and its application to quantizer selection. In Proc. International Conference on Image Processing, volume 2, pages 41–44, Washington, DC, USA, Oct. 1997.
[201] R.W. Yeung and N. Cai. Network error correction, part I: basic concepts and upper bounds. Communications in Information and Systems, 6(1):19–36, 2006.
[202] Hao Yin, Xuening Liu, Tongyu Zhan, Vyas Sekar, Feng Qiu, Chuang Lin, Hui Zhang, and Bo Li. Design and deployment of a hybrid CDN-P2P system for live video streaming: experiences with LiveSky. In Proceedings of the 17th ACM international conference on Multimedia, MM ’09, pages 25–34, New York, NY, USA, 2009. ACM.
[203] P. Yin, A. Vetro, H. Sun, and B. Liu. Drift compensation architectures and techniques for reduced resolution transcoding. In Proc. SPIE Visual Communications and Image Processing, volume 4671, San Jose, CA, USA, Jan. 2002.
[204] Jeongnam Youn, Ming-Ting Sun, and Jun Xin. Video transcoder architectures for bit rate scaling of H.263 bit streams. In Proc. 7th ACM International Conference on Multimedia, MULTIMEDIA ’99, pages 243–250, Orlando, Florida, United States, 1999. ACM.

[205] W. Zeng, K. Nahrstedt, P.A. Chou, A. Ortega, P. Frossard, and H.H. Yu. Introduction to the special issue on streaming media. IEEE Transactions on Multimedia, 6(2):225–229, Apr. 2004.
[206] Meng Zhang, Jian-Guang Luo, Li Zhao, and Shi-Qiang Yang. A peer-to-peer network for live media streaming using a push-pull approach. In Proceedings of the 13th annual ACM international conference on Multimedia, MULTIMEDIA ’05, pages 287–290, New York, NY, USA, 2005. ACM.
[207] Rui Zhang, S. Regunathan, and K. Rose. End-to-end distortion estimation for RD-based robust delivery of pre-compressed video. In Proc. IEEE Asilomar Conference on Signals, Systems and Computers, pages 210–214, Pacific Grove, CA, USA, Jul. 2001.
[208] Rui Zhang, S.L. Regunathan, and K. Rose. Video coding with optimal inter/intra-mode switching for packet loss resilience. IEEE Journal on Selected Areas in Communications, 18(6):966–976, Jun. 2000.
[209] Xinyan Zhang, Jiangchuan Liu, Bo Li, and Y.-S.P. Yum. CoolStreaming/DONet: a data-driven overlay network for peer-to-peer live media streaming. In Proc. IEEE International Conference on Computer Communications (INFOCOM), volume 3, pages 2102–2111, Mar. 2005.
[210] W. Zhu, M.-T. Sun, L.-G. Chen, and T. Sikora. Special issue on advances in video coding and delivery. Proceedings of the IEEE, 93(1):3–5, Jan. 2005.
