Global Rate Control Scheme for MPEG-2 HDTV Parallel Encoding System

Ken NAKAMURA, Mitsuo IKEDA, Takeshi YOSHITOME and Takeshi OGURA
NTT Cyber Space Laboratories
1-1 Hikarinooka, Yokosuka, Kanagawa 239-0847, Japan
[email protected]
Abstract

This paper proposes a new rate control scheme for an MPEG-2 HDTV parallel encoding system consisting of multiple encoding units. In this rate control scheme, the target number of bits is allocated to each encoding unit every frame period using the complexity of the divided parts of an image. We evaluate this rate control scheme and verify that its image quality is as good as or better than that of single encoding.
1. Introduction

High definition television (HDTV) is an important leading-edge technology for video applications in the fields of broadcasting, communication, and storage media. One of the key HDTV technologies is MPEG-2 video encoding. The MPEG-2 video standard was established in 1994 as a generic video encoding standard covering the range from SDTV to HDTV, and it has had an enormous impact on the communication and entertainment industries. Advances in video technology now enable HDTV digital broadcasting services to be provided in many countries, and the MPEG-2 video standard is employed for most of them. MPEG-2 MP@HL (Main Profile at High Level) is a typical video encoding format for HDTV images, and it is employed as an HDTV digital broadcasting format in many countries. It is therefore anticipated that there will be a strong need for encoders compliant with the MPEG-2 MP@HL format in various situations. But MP@HL encoders are currently very expensive and quite large because of the high computational power they require. For example, the number of pixels in the 1080i format, a typical HDTV format, is six times that in the NTSC format. MP@HL encoders therefore cannot be implemented on one chip at a reasonable cost, as MP@ML (Main Profile at Main Level) encoders, a typical MPEG-2 video coding format for SDTV, can.

There are several approaches to constructing MP@HL encoders. One is to construct them from many dedicated function units designed for HDTV encoders. This approach, however, makes encoders very expensive because the development costs are excessively high. Another approach is the parallel encoding method using multiple video encoding units. With this approach, each video encoding unit has a nearly complete encoding function by itself; the encoding units encode divided images concurrently and output partial bitstreams, which are then reconstructed into an HDTV bitstream. There are several different methods for dividing the input images [1]. In the spatial division method, input images are divided into a number of rectangular images, horizontally or vertically, as illustrated in Figure 1. In the block interleave method, the input image is divided into small blocks, and these blocks are collected at intervals horizontally or vertically.
Figure 1. Parallel encoding method with spatial division.
We consider parallel encoding with spatial division to be the most advantageous method, because it enables single-chip MP@ML encoders to be used as the video encoding units, with only a few pre-processors and post-processors. This raises the possibility of developing very low-cost, compact encoders with multi-format scalability. And with one-chip MP@ML encoders being increasingly used in consumer products such as DVD and digital video recorders, their cost can be expected to become even lower.
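As an illustration of the spatial division just described (a sketch, not the system's actual pre-processor), the following Python function partitions a frame's lines into horizontal stripes aligned to MPEG-2's 16-line macroblock rows, so each encoding unit receives a whole number of slice rows. The function name and line counts are hypothetical.

```python
def split_into_stripes(num_lines, n_units):
    """Partition num_lines picture lines into n_units horizontal stripes,
    each a whole number of 16-line macroblock (slice) rows."""
    assert num_lines % 16 == 0, "height must be a multiple of the 16-line macroblock size"
    mb_rows = num_lines // 16
    base, extra = divmod(mb_rows, n_units)
    bounds, top = [], 0
    for i in range(n_units):
        # spread any remainder macroblock rows over the leading units
        height = (base + (1 if i < extra else 0)) * 16
        bounds.append((top, top + height))  # [top, bottom) line range for one unit
        top += height
    return bounds

# e.g. a 1024-line frame over 8 units gives eight 128-line stripes
stripes = split_into_stripes(1024, 8)
assert stripes[0] == (0, 128)
```

With the simulation conditions used later in the paper (a 1920 x 1024 image and n = 8 units), every unit receives exactly eight macroblock rows.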
Earlier, we proposed a new architecture for an HDTV parallel encoding system using one-chip MP@ML encoders [2]. In this architecture, the input HDTV images are divided horizontally into multiple rectangular images, each consisting of several slices. Each encoding unit encodes one divided image and outputs a partial bitstream, and the partial bitstreams are then reconstructed into an HDTV bitstream. This MP@ML encoder LSI has features that enable it to be used in a multi-chip encoder system, such as image-size flexibility and reference-image transmission capability. However, a problem remains concerning encoding efficiency and image quality: the rate control system is divided, so when each encoding unit encodes at a fixed bit rate independently, the number of bits becomes insufficient in some encoding units but excessive in others. This is a critical problem because the parts that are short of bits are very noticeable. To overcome this problem, the encoding parameters should be shared among all encoding units and the bit allocation should be optimized globally.

In this paper, we discuss a global rate control scheme for such an MPEG-2 HDTV parallel encoder. Section 2 reviews the architecture of the encoder system. Section 3 proposes a new rate control scheme based on sharing of encoding information. Section 4 evaluates the effect of this rate control scheme, as well as the effect of reference image transmission, and compares the efficiency of the parallel encoder with that of a single encoder. Section 5 is a brief conclusion.

2. Architecture of MP@HL encoder

In our encoding system, the encoding unit is an MPEG-2 MP@ML encoder LSI we developed, named "SuperENC" [3]; the system architecture is shown in Figure 2. This LSI is extensible for application to parallel encoding systems. First, it can encode rectangular images of arbitrary size within the constraints of its computational power, which is equivalent to that of MP@ML encoding; thus it can encode the divided parts of HDTV images. It can also flexibly generate the bitstream syntax so that the reconstructed bitstream is compliant with an MP@HL stream. Further, the encoding units can transmit image data to each other to achieve motion compensation across the borders of the divided parts of an image.

Motion vector estimation and motion compensation require reference image data, which includes not only original images but also locally decoded images. Therefore, MPEG-2 encoders must decode the encoded images locally and store them in order to perform motion vector estimation and motion compensation. In spatial-division encoding, however, reference images cannot be accessed from neighboring encoding units, so motion compensation is limited at the borders of the divided parts of an image, and this degrades encoding efficiency and image quality. In this architecture, therefore, the encoding units transmit image data to each other using the Multi-chip Data Transfer Interface (MDTI), which enables them to access neighboring parts of the image data and achieve motion compensation across the borders of the divided parts of images. This architecture also has advantages in cost and compactness, because it needs only MP@ML encoder LSIs with a few pre-processors and post-processors. This means we can build very inexpensive and compact encoders with scalability for application to multiple HDTV formats.

Figure 2. System architecture for the MP@HL encoder (a host processor on a control bus, with multiple SuperENC encoding units and pre-processors connected in an MDTI ring; u.p.: upper port, l.p.: lower port).
3. New Rate Control Scheme

In this section, we propose a new rate control scheme for a parallel encoding system. If the encoding units run independently and there is no global rate control in the parallel encoding system, each encoding unit must encode at a fixed bit rate. With fixed bit allocation, the image quality is not uniformly consistent, i.e., the quality of some parts of the image becomes much lower than that of other parts. Therefore, it is essential to have a global rate control system that can adaptively allocate bits among the encoding units.

We propose a new rate control scheme to overcome this problem; a model of the scheme is shown in Figure 3. The scheme consists of three stages. In Stage 1, inter-frame rate control, a target number of bits is determined for every picture. Stage 2 allocates these bits among the encoding units and determines a target number of bits for each unit. Stage 3 is the intra-frame rate control. In this scheme, the intra-frame and inter-frame rate control algorithms are basically independent of each other: the inter-frame rate control algorithm calculates the target number of bits for every picture, and the intra-frame rate control algorithm of each encoding unit encodes a picture such that the target number of bits is attained.

Figure 3. Rate control scheme (Stage 1 produces the picture target bits $T$ from the coding parameters; Stage 2 splits $T$ into per-unit targets $T_i$; Stage 3 runs inside each encoding unit).

3.1 Stage 1 - Inter-frame rate control

Stage 1 determines the target number of bits for every picture using an inter-frame rate control algorithm. The Stage 1 module is implemented in a host processor connected to all encoders by a control bus (Fig. 2). After every picture is encoded in the encoding units, the host processor gathers the encoding parameters from all encoding units and calculates the parameters of the entire image from them. It then calculates the target number of bits for the next picture using those parameters. Various algorithms are applicable as the inter-frame rate control algorithm, among them Test Model 5 [4] and VBR algorithms. In the proposed rate control scheme, intra-frame and inter-frame rate control are basically independent of each other. This is an advantage from the standpoint of efficient algorithm development, because it enables various inter-frame rate control algorithms to be adopted directly. VBV buffer control is also executed in this module. The VBV buffer is a hypothetical decoder buffer conceptually connected to the encoder output [5]. The output bitstream must be controlled so that the VBV buffer neither underflows nor overflows; the host processor verifies this constraint and controls the generated number of bits. It also calculates vbv_delay for the picture header syntax.

3.2 Stage 2 - Bit allocation to encoding units

Bit allocation to each encoding unit is performed in Stage 2. This stage determines how many bits should be allocated to each encoding unit for the next picture, within the target number of bits for the next picture:

    $T_i = \alpha_i T, \qquad \sum_{i=1}^{n} \alpha_i = 1$    (1)

$T$ denotes the target number of bits for the next picture, determined as the result of Stage 1, and $T_i$ denotes the target number of bits for the $i$th part of the next picture. We now describe how to estimate $\alpha_i$, the ratio of the target number of bits for the $i$th encoding unit. Generally, the generated number of bits is said to be almost inversely proportional to the quantization parameter, as described in Appendix A. So we can estimate an expected number of bits $R_i$ for each encoding unit from a quantization parameter $Q_i$ and a complexity $X_i$, which is the coefficient of the inversely proportional relation between $R_i$ and $Q_i$:

    $R_i = X_i / Q_i$    (2)

Let us consider the case in which a distribution of quantization parameters $Q_i$ like that shown below is given, where $\beta_i$ denotes the ratio of the quantization parameter of each encoding unit:

    $Q_1 / \beta_1 = Q_2 / \beta_2 = \cdots = Q_n / \beta_n$    (3)

The ratio of quantization parameters $\beta_i$ should be determined according to the characteristics of the image. Then the ratio of the expected numbers of bits $R_i$ for the encoding units is obtained from equations (2) and (3), as below:

    $X_1 / (\beta_1 R_1) = X_2 / (\beta_2 R_2) = \cdots = X_n / (\beta_n R_n)$    (4)

We determine the ratio of the target number of bits for each encoding unit, $\alpha_i$, according to the ratio of the expected numbers of bits:

    $\alpha_i = R_i / \sum_j R_j = (X_i / \beta_i) / \sum_j (X_j / \beta_j)$    (5)

Thus, the ratio of bit allocation $\alpha_i$ is estimated according to the equation below, using the complexity of the corresponding picture type:

    $\alpha_i = \begin{cases}
      (X_{I,i} / \beta_i) / \sum_j (X_{I,j} / \beta_j) & \text{(when the picture type is I-picture)} \\
      (X_{P,i} / \beta_i) / \sum_j (X_{P,j} / \beta_j) & \text{(when the picture type is P-picture)} \\
      (X_{B,i} / \beta_i) / \sum_j (X_{B,j} / \beta_j) & \text{(when the picture type is B-picture)}
    \end{cases}$    (6)

    $X_i = \tilde{R}_i \tilde{Q}_i$    (7)

where the complexity of each part of the image, $X_i$, is calculated from the generated number of bits $\tilde{R}_i$ and the average quantization parameter $\tilde{Q}_i$ of the previous picture of the same picture type. Here we make two assumptions: first, that the rate-quantization model is exactly inversely proportional; and second, that the rate-quantization models of two consecutive pictures of the same picture type in the same position are almost the same. Although these assumptions reduce the control accuracy, they keep the quantization parameters approximately as we expect. Further, they simplify the estimation of target bit counts and make implementation easy.

3.3 Stage 3 - Intra-frame rate control

In Stage 3, intra-frame rate control is performed in each encoding unit. It controls the macroblock quantization parameters in order to attain the given target number of bits and realize consistent image quality. This scheme also has advantages from the viewpoint of implementation: Stage 3 is implemented in each encoding unit, and the intra-frame control of the MP@ML encoder can be used as Stage 3 of this scheme without any major modifications.

4. Experiment

We evaluated the rate control scheme by software simulation. In the simulation, the bits are allocated so as to give uniform quantization parameters among all encoding units, i.e., $\beta_1 = \beta_2 = \cdots = \beta_n$. The simulations were executed under the conditions shown in Table 1. The parallel encoding system has eight encoding units, and its inter-frame and intra-frame rate control algorithms are based on the Test Model 5 (TM5) algorithm.

Table 1. Coding simulation conditions.

  GOP structure     M=3, N=15, frame structure
  Chroma format     4:2:0
  Bit rate          20 Mbps, 30 frames/second, CBR (based on TM5)
  Search range      horizontal 16.0, vertical 5.0, for 1 field period
  Number of units   n = 8
  Image size        1920 x 1024, 150 frames
  Test sequence     ITE Standard Test Sequences for Subjective Assessment of HDTV Picture Quality

Figures 4 and 5 show the quantization parameter and the SNR of each part of the image for the test sequence "Soccer Action". These experimental results were obtained by three methods: single encoding with TM5, parallel encoding with fixed bit allocation, and parallel encoding with the proposed rate control. Figure 4 shows that in the fixed bit allocation method there is disparity in the quantization parameters among the parts of the image, whereas in the proposed bit allocation method they are nearly constant, as we expected. And Figure 5 shows that the distribution of SNR is flatter than in the fixed bit allocation method. This indicates that very poor quality regions do not appear.

Tables 2 and 3 show the SNRs obtained by the various encoding methods, denoted as differences from the SNR of a single encoder with TM5. Cases A and C were encoded by a parallel encoder with fixed bit allocation among the encoding units, and cases B and D were encoded by a parallel encoder with the proposed rate control scheme. In cases A and B, the reference images for motion compensation were transmitted between the units, and motion compensation was not limited at the borders. In cases C and D, however, motion compensation was executed independently and was limited at the borders of the parts of the images.

Figure 4. Average quantization parameter of each part ('Soccer Action', average over 150 frames).
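The Stage-2 allocation described in Section 3.2 (equations (5)-(7)) can be sketched in a few lines of Python. This is a minimal illustration assuming uniform beta, as in the simulations; the function names and the example statistics are hypothetical, not taken from the paper.

```python
def update_complexity(gen_bits, avg_q):
    """Eq. (7): complexity of one image part, estimated from the previous
    picture of the same type: X_i = R~_i * Q~_i."""
    return gen_bits * avg_q

def allocate_target_bits(T, complexities, betas=None):
    """Eqs. (5)-(6): split the per-picture target T over the encoding units
    in proportion to X_i / beta_i.  Passing no betas reproduces the uniform
    setting beta_1 = ... = beta_n used in the simulations."""
    if betas is None:
        betas = [1.0] * len(complexities)
    weights = [x / b for x, b in zip(complexities, betas)]
    total = sum(weights)
    return [T * w / total for w in weights]

# hypothetical previous P-picture statistics per unit: (generated bits, mean Q)
stats = [(120_000, 14.0), (80_000, 15.0), (200_000, 13.0), (90_000, 16.0)]
X = [update_complexity(r, q) for r, q in stats]
targets = allocate_target_bits(666_667, X)  # bits for the next P-picture
assert abs(sum(targets) - 666_667) < 1e-3   # the per-unit targets sum to T
```

Note that complex parts (large X) receive proportionally more bits, which is exactly what flattens the quantization parameters across the parts in Figure 4.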
The SNR for the proposed rate control scheme was found to be higher than that for the fixed bit allocation method for all test sequences, as is clear from the difference between cases A and B in Table 2. This means that optimizing the bit allocation improves the quality of the entire picture. The effect is especially large in image sequences whose parts have different characteristics, such as "Soccer Action" and "Whale Show". The differences between cases A and C, and between cases B and D, show that including image transmission is also important and effective. In sequences with a lot of vertical motion, the effect is particularly significant, as in "Church", a scene that includes tilting. In most of the test sequences, the SNR for case B is as good as or better than that of the single encoding method. In a subjective evaluation, too, the image quality is as good as that of a single encoder. Therefore, we can conclude that the proposed rate control scheme combined with reference image transmission among encoding units overcomes the drawbacks of parallel encoding systems with spatial division, and that its image quality is comparable to that of single encoding.

Figure 5. SNR of each part ('Soccer Action', average over 150 frames).

Table 2. SNR for each method with image transmission [dB] (difference from the SNR of the single encoding method).

  conditions                 case A    case B
  image transmission         ON        ON
  proposed rate control      OFF       ON
  'Soccer Action'            -0.403    +0.155
  'Whale Show'               -0.509    +0.037
  'Green Leaves'             -0.210    -0.008
  'Church'                   -0.088    -0.006
  'Marching in'              -0.051    +0.079
  'Walk through the Square'  -0.117    +0.047

Table 3. SNR for each method without image transmission [dB] (difference from the SNR of the single encoding method).

  conditions                 case C    case D
  image transmission         OFF       OFF
  proposed rate control      OFF       ON
  'Soccer Action'            -0.467    +0.104
  'Whale Show'               -0.632    -0.128
  'Green Leaves'             -0.436    -0.233
  'Church'                   -0.654    -0.592
  'Marching in'              -0.136    -0.009
  'Walk through the Square'  -0.179    -0.007

5. Conclusion

We have described a method that improves the efficiency and image quality of an MPEG-2 HDTV encoder with spatial division. We proposed a new rate control scheme and showed that adaptive bit allocation according to the complexity of each encoding unit's part of the image improves the quality of the images. We also verified the effect of transmitting reference images among encoding units, and showed that the combination of these methods overcomes the drawbacks of parallel encoding systems. As a result, we confirmed that our parallel encoding system using multiple encoding units achieves not only compactness, reasonable cost and scalability, but also high-efficiency encoding. We are currently evaluating the effect of this scheme on the evaluation board of a real-time MP@HL encoder, and we have obtained good results on the board. Our future focus will be on refining the scheme and applying it to real-time encoding systems.

References

[1] Y. Yashima, A. Shimizu, H. Kotera, "A HDTV Parallel Coding Method Based on Image Division", International Workshop on HDTV '94, 8-A-1.
[2] K. Suguri et al., "A Scalable Architecture of Real-Time MP@ML MPEG-2 Video Encoder for Multi-Resolution Video", Proc. IS&T/SPIE Conf. Visual Communications and Image Processing, Vol. 3653, SPIE, 1999, pp. 895-904.
[3] M. Ikeda et al., "SuperENC: MPEG-2 Video Encoder Chip", IEEE Micro, Vol. 19, No. 4, July/August 1999.
[4] MPEG-2 Test Model 5, Document ISO/IEC JTC1/SC29/WG11/93-400, Test Model Editing Committee, April 1993.
[5] H.262 (MPEG-2) ISO/IEC 13818-2 International Standard, Jan. 1995.
[6] G. Keesman, I. Shah, R. K. Gunnewiek, "Bit-rate control for MPEG encoders", Signal Processing: Image Communication 6 (1995), pp. 545-560.
Appendix

A. Rate Quantization Model

In a lossy image encoding method like MPEG-2, an input image is transformed using the discrete cosine transform (DCT) or another transform, and the coefficients in the frequency domain are quantized. This quantization process causes some distortion of the image, but it enables highly efficient encoding and flexible control of the bit rate. In order to achieve the required bit rate, it is necessary to control the quantization parameter of each macroblock; thus it is important to predict the number of bits that will be generated for a given quantization parameter. Generally, the product of the quantization parameter $Q$ and the number of generated bits $R$ is called the "complexity" [4, 6], denoted as $X$:

    $X = R Q$    (8)

The complexity $X$ is said to be approximately constant for the same image when it is encoded by the same encoding method. The complexity differs depending on the characteristics of the image: the complexity of a complicated image is large, and that of a simple image is small. This also means that the number of generated bits is inversely proportional to the quantization parameter, with the complexity as the coefficient of the inversely proportional relation:

    $R = X / Q$    (9)

We show this property in Figure 6.

Figure 6. Rate Quantization Model (two curves $R = X/Q$ with different complexities; at a common quantization parameter $q$, the more complex image generates $r_0$ bits and the simpler one $r_1$).

Figure 6 shows two curves with different complexities, the lower one representing an "easier" image than the higher one. If the two images are quantized with the same quantization parameter $q$, the numbers of bits generated for them amount to $r_0$ and $r_1$, respectively. The ratio of $r_0$ to $r_1$ is approximately the same as the ratio of the complexities of the two images, because of the inversely proportional relation. Thus, if we estimate the rate-quantization model of each image, we can easily control the number of bits generated for the image by determining the quantization parameter.
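The rate-quantization model of equations (8) and (9) can be exercised with a few lines of Python. The helper names and the sample numbers below are illustrative assumptions, not values from the paper; the point is only that, under the model, doubling Q roughly halves the generated bits, and the model can be inverted to pick a Q for a bit target.

```python
def complexity(bits, q):
    # Eq. (8): X = R * Q, assumed roughly constant for a given image.
    return bits * q

def predict_bits(x, q):
    # Eq. (9): R = X / Q -- generated bits are inversely proportional to Q.
    return x / q

def q_for_target(x, target_bits):
    # Inverting eq. (9): the quantization parameter expected to hit a bit target.
    return x / target_bits

x = complexity(150_000, 12.0)               # one observed encode: 150 kbit at Q = 12
assert predict_bits(x, 24.0) == 75_000.0    # doubling Q halves the predicted bits
assert q_for_target(x, 100_000) == 18.0     # Q needed for a 100 kbit target
```

This inversion is exactly what Stage 2 relies on when it converts per-unit complexities into per-unit bit targets.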