A New Rate-Quantization Model for H.264/AVC Low-Delay Rate Control Junhui Hou1,2 , Shuai Wan2 , Zhan Ma3 , Fuzheng Yang4 , and Lap-Pui Chau1 1
School of Electrical and Electronics Engineering, Nanyang Technological University, 639798 Singapore. 2 School of Electronics and Information, Northwestern Polytechnical University, Xi’an, 710072 China. 3 Dallas Technology Lab, Samsung, Richardson TX 75082, USA. 4 State Key Laboratory of ISN, Xidian University, Xi’an, 710071 China. {
[email protected],
[email protected],
[email protected], fzhyang@mail.xidian.edu.cn,
[email protected]}
Abstract. In this paper, we present a new rate-quantization (R-Q) model for H.264/AVC low-delay rate control. The rate model is a power function of the quantization stepsize, derived through theoretical analysis under the assumption of a Laplacian-distributed source. The model parameters are content adaptive and are updated frame by frame. The proposed R-Q model is implemented in the H.264/AVC reference software JM17.2 to evaluate its performance, where the quantization stepsize for each frame is selected according to the target bit rate. Compared with the most recently published work [5] and the well-known R-Q model adopted in JM [13], the proposed model reduces the bits mismatch error by almost a factor of two.

Keywords: Rate-quantization model, H.264/AVC, low-delay rate control
1 Introduction
In video encoding, a fundamental problem is how to select an appropriate quantization stepsize (QS), or equivalently a quantization parameter (QP), to meet a bit rate constraint. Precise bit rate control is critical in encoder design, in particular for real-time rate control in low-delay or ultra-low-delay scenarios such as video conferencing and live video shows. An accurate rate-quantization (R-Q) model is highly desirable for solving this problem, and R-Q modeling has been studied extensively for decades. For instance, a well-known quadratic R-Q model was developed assuming a Laplacian-distributed source [2] [3], with a parameter that depends on the frame residual complexity in order to accurately capture variation in frame content. This model is adopted in the MPEG-4 reference software for encoder rate control, where frame complexity is measured by the mean of the absolute difference (MAD) of the residual signal. In addition to MAD, other frame complexity measures, such as the variance of the difference (VOD) [5] and the sum of the
absolute of the transform difference (SATD) [6], have been introduced to improve the R-Q model for rate control. Meanwhile, a power-function-based R-Q model was introduced in [8] [9] assuming a Cauchy-distributed residual, where a gradient-based frame complexity measure is used in [9] to improve bit prediction. Besides these R-Q models, a ρ-domain rate model has been developed, where ρ is the percentage of zero coefficients after quantization [10]. The ρ-domain model turns out to be very accurate; however, it does not give an explicit relationship between the bit rate and the QS or QP.

In this paper, a new R-Q model is derived analytically based on a Laplacian-distributed residual source. To accurately capture the bit consumption of each frame, the frame residual complexity is measured by the MAD and the adjacent-pixel correlation of the residual block. The proposed R-Q model has a power-function form, which is consistent with the work in [8]. We implement the proposed model in rate control to measure its performance, where the quantization stepsize for each frame is selected from the proposed R-Q model according to the target bit rate. Results show that the proposed R-Q model can reduce the bits mismatch error by up to 50% compared with the models proposed in [5] [13].

The remainder of this paper is organized as follows. The theoretical derivation of the proposed model is given in Section 2. In Section 3, online model parameter prediction and update are presented, while Section 4 shows the rate prediction accuracy of the proposed model when performing rate control for H.264/AVC. The paper closes with the conclusions in Section 5.
2 Analytical Rate-Quantization Model
Consider an M × M residual block X after prediction. The pixel values in X can be modeled by a Laplacian distribution with zero mean and a separable autocorrelation function [11]. In a video encoder, X is further transformed (through the DCT or a DCT-like integer transform) for energy compaction. For example, in H.264/AVC the transformed block is $Y = C_f X C_f^T$ [1], with $C_f$ representing the forward transform. The variance of the transformed coefficient at the (u, v)-th position (0 ≤ u < M, 0 ≤ v < M) in any frame, i.e., $\sigma_Y^2(u, v)$, can be statistically expressed as [12]

$$\sigma_Y^2(u, v) = K(u, v)\,\sigma_X^2, \tag{1}$$

where $\sigma_X^2$ is the variance of X and K(u, v) is the (u, v)-th element of the 4 × 4 matrix K (see footnote 1), given by

$$K = \mathrm{diag}(C_f P C_f^T)\,\mathrm{diag}(C_f P C_f^T)^T. \tag{2}$$

Footnote 1: We assume the 4×4 block transform used in H.264 for the theoretical analysis. The 8×8 block transform can be analyzed similarly.
Here $\mathrm{diag}(C_f P C_f^T)$ is a column vector consisting of all the diagonal elements of the matrix $C_f P C_f^T$. P in (2) is the correlation matrix defined as

$$P = \begin{bmatrix} 1 & \zeta & \zeta^2 & \zeta^3 \\ \zeta & 1 & \zeta & \zeta^2 \\ \zeta^2 & \zeta & 1 & \zeta \\ \zeta^3 & \zeta^2 & \zeta & 1 \end{bmatrix}, \tag{3}$$

where |ζ| ≤ 1 is the correlation coefficient between adjacent residual pixels. Moreover, for a Laplacian-distributed random signal, $\sigma_X$ can be approximated from the mean of the absolute difference (MAD) of the residual signal, i.e.,

$$\sigma_X = \sqrt{2} \cdot \mathrm{MAD}. \tag{4}$$

Substituting (4) into (1) gives

$$\sigma_Y(u, v) = \sqrt{2K(u, v)} \cdot \mathrm{MAD}. \tag{5}$$
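As a numerical illustration of (1)-(5), the following Python sketch builds P from ζ, forms K as in (2), and evaluates σY(u, v). It assumes that C_f is the unnormalized 4×4 H.264/AVC core transform matrix described in [1], with the scaling factors ignored. The function names and the sample values of ζ and MAD are hypothetical and are not taken from the experiments.

```python
import numpy as np

# Assumption: unnormalized 4x4 forward core transform of H.264/AVC [1];
# the post-scaling factors are ignored in this sketch.
CF = np.array([[1, 1, 1, 1],
               [2, 1, -1, -2],
               [1, -1, -1, 1],
               [1, -2, 2, -1]], dtype=float)


def correlation_matrix(zeta, size=4):
    """Correlation matrix P of (3): P[i, j] = zeta ** |i - j|."""
    idx = np.arange(size)
    return zeta ** np.abs(idx[:, None] - idx[None, :])


def variance_gain_matrix(zeta):
    """Matrix K of (2): outer product of diag(Cf P Cf^T) with itself."""
    d = np.diag(CF @ correlation_matrix(zeta) @ CF.T)
    return np.outer(d, d)


def coefficient_std(zeta, mad):
    """sigma_Y(u, v) of (5) for every coefficient position."""
    return np.sqrt(2.0 * variance_gain_matrix(zeta)) * mad


if __name__ == "__main__":
    # Hypothetical frame statistics, for illustration only.
    print(variance_gain_matrix(0.5))
    print(coefficient_std(0.5, mad=3.0))
```

For ζ = 0 the matrix P reduces to the identity, so K depends only on the row energies of C_f, which gives a quick sanity check of the implementation.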
As shown in [14], the transformed coefficient at the (u, v)-th position can also be modeled by a Laplacian distribution,

$$f_{Y(u,v)}(y) = \frac{\lambda(u, v)}{2}\, e^{-\lambda(u, v)|y|}, \tag{6}$$

where λ(u, v) is the Laplacian distribution parameter, i.e.,

$$\lambda(u, v) = \frac{\sqrt{2}}{\sigma_Y(u, v)} = \frac{1}{\sqrt{K(u, v)} \cdot \mathrm{MAD}}. \tag{7}$$
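The relation between λ(u, v) and σY(u, v) in (7) can be checked numerically. The sketch below, under the assumption of hypothetical values for K(u, v) and MAD, computes λ from (7), evaluates the density (6), and compares the sample standard deviation of Laplacian draws with √2/λ. The helper names are illustrative only.

```python
import numpy as np


def laplacian_lambda(k_uv, mad):
    """Laplacian parameter of (7): lambda = 1 / (sqrt(K(u, v)) * MAD)."""
    return 1.0 / (np.sqrt(k_uv) * mad)


def laplacian_pdf(y, lam):
    """Density of (6)."""
    return 0.5 * lam * np.exp(-lam * np.abs(y))


def empirical_std(lam, num_samples=200_000, seed=0):
    """Standard deviation of samples drawn from the density in (6)."""
    rng = np.random.default_rng(seed)
    # NumPy parameterizes the Laplace distribution by its scale 1 / lambda.
    samples = rng.laplace(loc=0.0, scale=1.0 / lam, size=num_samples)
    return samples.std()


if __name__ == "__main__":
    lam = laplacian_lambda(k_uv=16.0, mad=3.0)  # hypothetical values
    print("lambda:", lam)
    print("sigma_Y from (7):", np.sqrt(2.0) / lam)
    print("sample sigma_Y:", empirical_std(lam))
```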
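As we can see, the distribution of the transform coefficients can be easily derived if ζ and MAD are known. Hence the bit consumption for encoding one frame can be accurately predicted. Under the magnitude error criterion [15], the well-known R-D function for the coefficient at the (u, v)-th position is given by

$$R_{(u,v)}(D) = \begin{cases} \ln\!\left(\dfrac{1}{\lambda(u, v) D}\right), & 0 < D < \dfrac{1}{\lambda(u, v)}, \\[2ex] 0, & D \ge \dfrac{1}{\lambda(u, v)}. \end{cases} \tag{8}$$

Eq. (8) can be expanded as a Taylor series:

$$\begin{aligned} R_{(u,v)}(D) &= \ln x_0 + \frac{1}{x_0}\left(\frac{1}{\lambda_{u,v} D} - x_0\right) - \frac{1}{2!\,x_0^2}\left(\frac{1}{\lambda_{u,v} D} - x_0\right)^2 + \dots \\ &= \left(\ln x_0 - \frac{3}{2}\right) + \frac{2}{x_0 \lambda_{u,v}} D^{-1} - \frac{1}{2!\,(x_0 \lambda_{u,v})^2} D^{-2} + \dots, \end{aligned}$$

where $\lambda_{u,v}$ represents λ(u, v) at the (u, v)-th position, the constant collects the contributions of the terms shown, and $x_0 > 1$ is the expansion point, chosen so that the Taylor series converges. Since the magnitude error criterion is used as the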
Fig. 1. Scatter plots of ln(R) versus α ln(ζ) + β ln(MAD/q) for Football and Tempete at different QPs (first row: QP = 30; second row: QP = 36).
distortion measure in Eq. (8), D = q/4 as shown in [16], with q standing for the QS. Therefore, an R-Q model for the coefficient at (u, v) can be formulated (neglecting the high-order terms) as

$$R_{(u,v)}(q) = a_0 + \frac{a_1}{\lambda(u, v)}\, q^{-1} + \frac{a_2}{\lambda(u, v)^2}\, q^{-2}, \tag{9}$$

with $a_0$, $a_1$ and $a_2$ as the model parameters. Substituting Eq. (7) into (9) yields

$$R_{(u,v)}(q) = a_0 + a_1 \sqrt{K(u, v)}\,\mathrm{MAD} \cdot q^{-1} + a_2 K(u, v)\,\mathrm{MAD}^2 \cdot q^{-2}. \tag{10}$$
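To make the step from (8) to (9) concrete, the following Python sketch evaluates the exact R-D function (8) at D = q/4 and compares it with the second-order Taylor expansion of ln(x) around x0, where x = 1/(λD). It is only an illustration: in the model itself a0, a1 and a2 are fitted parameters, and the function names and sample numbers here are assumptions.

```python
import math


def exact_rate(lam, q):
    """R-D function (8) in nats, evaluated at D = q / 4."""
    d = q / 4.0
    if d >= 1.0 / lam:
        return 0.0
    return math.log(1.0 / (lam * d))


def truncated_rate(lam, q, x0):
    """Second-order Taylor expansion of ln(x) around x0, with x = 4 / (lam * q)."""
    x = 4.0 / (lam * q)
    return math.log(x0) + (x - x0) / x0 - (x - x0) ** 2 / (2.0 * x0 ** 2)


if __name__ == "__main__":
    lam = 0.1   # hypothetical Laplacian parameter
    x0 = 4.0    # hypothetical expansion point (x0 > 1)
    for q in (6.0, 8.0, 10.0, 12.0):
        print(q, exact_rate(lam, q), truncated_rate(lam, q, x0))
```

The truncated expression is a polynomial in 1/q, which is the structure that (9) retains; its accuracy degrades as 4/(λq) moves away from x0.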
As a result, the total number of texture bits R(q) for a frame is the sum of the bits spent at each position, i.e.,

$$R(q) = \sum_{u=0}^{M-1} \sum_{v=0}^{M-1} R_{(u,v)}(q) = a_0 + a_1 \frac{\mathrm{MAD}}{q} \sum_{u=0}^{M-1} \sum_{v=0}^{M-1} \sqrt{K(u, v)} + a_2 \left(\frac{\mathrm{MAD}}{q}\right)^2 \sum_{u=0}^{M-1} \sum_{v=0}^{M-1} K(u, v), \tag{11}$$

where the constant term summed over all positions is again denoted by $a_0$. As noted above [cf. (2) and (3)], K(u, v) is a function of ζ. We define two functions, $f_1(\zeta)$ and $f_2(\zeta)$, denoting $\sum_{u=0}^{M-1} \sum_{v=0}^{M-1} \sqrt{K(u, v)}$ and $\sum_{u=0}^{M-1} \sum_{v=0}^{M-1} K(u, v)$, respectively. Hence, (11) can be rewritten as

$$R(q) = a_0 + a_1 \frac{\mathrm{MAD}}{q} f_1(\zeta) + a_2 \left(\frac{\mathrm{MAD}}{q}\right)^2 f_2(\zeta). \tag{12}$$
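Although f1(ζ) and f2(ζ) are not written out explicitly, they are easy to evaluate numerically. The sketch below computes both sums from (2) and (3) and plugs them into (12). It assumes the unnormalized 4×4 H.264/AVC core transform of [1] for C_f; the values of a0, a1, a2, ζ, MAD and q are hypothetical placeholders.

```python
import numpy as np

# Assumption: unnormalized 4x4 forward core transform of H.264/AVC [1].
CF = np.array([[1, 1, 1, 1],
               [2, 1, -1, -2],
               [1, -1, -1, 1],
               [1, -2, 2, -1]], dtype=float)


def variance_gain_matrix(zeta):
    """Matrix K of (2), built from the correlation matrix P of (3)."""
    idx = np.arange(4)
    p = zeta ** np.abs(idx[:, None] - idx[None, :])
    # Clip guards against tiny negative values caused by rounding.
    d = np.clip(np.diag(CF @ p @ CF.T), 0.0, None)
    return np.outer(d, d)


def f1(zeta):
    """Sum over all positions of sqrt(K(u, v))."""
    return np.sqrt(variance_gain_matrix(zeta)).sum()


def f2(zeta):
    """Sum over all positions of K(u, v)."""
    return variance_gain_matrix(zeta).sum()


def frame_rate(q, mad, zeta, a0, a1, a2):
    """Frame-level rate of (12)."""
    x = mad / q
    return a0 + a1 * x * f1(zeta) + a2 * x ** 2 * f2(zeta)


if __name__ == "__main__":
    for zeta in (0.2, 0.4, 0.6):
        print(zeta, f1(zeta), f2(zeta))
    # Hypothetical parameters, for illustration only.
    print(frame_rate(q=16.0, mad=4.0, zeta=0.4, a0=100.0, a1=50.0, a2=5.0))
```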
Note that R(q) is a polynomial function of MAD/q. However, it is difficult to find closed forms for $f_1(\zeta)$ and $f_2(\zeta)$ from which a closed-form R(q) could be derived analytically. Instead, we observe that the logarithm of R(q) is linearly related to the logarithm of MAD/q, as illustrated in Fig. 1. This linearity is well preserved for different video contents coded with different QPs. To save space, we show only two videos (Football as an example of motion-intensive video and Tempete as an example of richly textured content) at two different QPs. Therefore, we simplify the R(q) model in (12) to

$$R(q) = C \times \zeta^{\alpha} \times \left(\frac{\mathrm{MAD}}{q}\right)^{\beta}, \tag{13}$$
where C, α and β are model parameters, which are refined using the least-squares method (LSM) [18] after encoding each frame. ζ and MAD reflect the frame complexity. Note that the functional form of the proposed R-Q model is also consistent with the models presented in [8] [19].
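Taking the logarithm of (13) gives ln R = ln C + α ln ζ + β ln(MAD/q), which is linear in (ln C, α, β) and can therefore be fitted by ordinary least squares. The following Python sketch shows one way to do this with NumPy. It assumes ζ > 0 so that ln ζ is defined, and it uses synthetic, noise-free data with hypothetical parameter values rather than encoder statistics.

```python
import numpy as np


def fit_rq_parameters(bits, zeta, mad, qstep):
    """Least-squares fit of ln R = ln C + alpha ln(zeta) + beta ln(MAD / q)."""
    bits, zeta, mad, qstep = (np.asarray(a, dtype=float)
                              for a in (bits, zeta, mad, qstep))
    design = np.column_stack([np.ones_like(bits),
                              np.log(zeta),
                              np.log(mad / qstep)])
    theta, *_ = np.linalg.lstsq(design, np.log(bits), rcond=None)
    ln_c, alpha, beta = theta
    return np.exp(ln_c), alpha, beta


def predict_bits(c, alpha, beta, zeta, mad, qstep):
    """Rate model (13)."""
    return c * zeta ** alpha * (mad / qstep) ** beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20  # window of previously coded frames
    zeta = rng.uniform(0.3, 0.7, n)
    mad = rng.uniform(2.0, 6.0, n)
    qstep = rng.uniform(8.0, 40.0, n)
    # Hypothetical "true" parameters used to generate synthetic data.
    bits = predict_bits(5.0e4, -0.5, 1.2, zeta, mad, qstep)
    print(fit_rq_parameters(bits, zeta, mad, qstep))
```

Using `np.linalg.lstsq` rather than forming the normal equations explicitly is a numerical-stability choice of this sketch, not something prescribed by the model.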
3 Model Parameter Prediction and Update

3.1 ζ and MAD Prediction
In practice, the actual values of ζ and MAD are not available before the encoding of a frame is finished [13]. We adopt the linear prediction used in [13] to predict the MAD of the current frame, i.e.,

$$\mathrm{MAD}_p[i] = Y_1[i] \times \mathrm{MAD}[i-1] + Y_0[i], \tag{14}$$

where $\mathrm{MAD}_p[i]$ denotes the predicted MAD of the i-th frame, MAD[i−1] denotes the actual MAD after encoding the (i−1)-th frame, and $Y_1[i]$ and $Y_0[i]$ are the first-order and zero-order parameters of the linear prediction model, which are updated after encoding every frame in the same way as in [13].

Let $\zeta_d$ be the correlation coefficient of the direct residual signal, where the direct residual signal is the co-located difference between the current original frame and the previous reconstructed frame. We have found that $\zeta_d$ follows the same trend as ζ. Hence, we propose to predict ζ as follows:

$$\zeta_p[i] = \zeta[i-1] \cdot (1 + \omega \cdot \rho_\zeta[i]), \tag{15}$$

where $\zeta_p[i]$ is the predicted autocorrelation coefficient of the i-th frame, ζ[i−1] is the actual autocorrelation coefficient of the (i−1)-th frame, $\omega = \zeta[i-1]/\zeta_d[i-1]$, and $\rho_\zeta[i] = (\zeta_d[i] - \zeta_d[i-1])/\zeta_d[i-1]$. Table 1 shows the averaged prediction error over N coded frames, defined as $e_\zeta = \frac{1}{N}\sum_{i=1}^{N} |\zeta[i] - \zeta_p[i]|$. We also plot the predicted and actual values of ζ in Fig. 2 for Football and Silent. As demonstrated, (15) predicts the actual autocorrelation coefficient well.
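The two predictors (14) and (15) and the error measure e_ζ translate directly into code. The sketch below is a minimal Python illustration; the function names and the sample numbers are hypothetical, and the update of Y1 and Y0 described in [13] is not reproduced here.

```python
def predict_mad(prev_mad, y1, y0):
    """Linear MAD prediction of (14)."""
    return y1 * prev_mad + y0


def predict_zeta(prev_zeta, prev_zeta_d, cur_zeta_d):
    """Correlation prediction of (15).

    prev_zeta:   actual zeta of frame i-1
    prev_zeta_d: direct-residual correlation of frame i-1 (must be nonzero)
    cur_zeta_d:  direct-residual correlation of frame i
    """
    omega = prev_zeta / prev_zeta_d
    rho = (cur_zeta_d - prev_zeta_d) / prev_zeta_d
    return prev_zeta * (1.0 + omega * rho)


def mean_abs_error(actual, predicted):
    """Averaged prediction error e_zeta over the coded frames."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(predict_mad(prev_mad=4.2, y1=1.0, y0=0.0))
    print(predict_zeta(prev_zeta=0.50, prev_zeta_d=0.40, cur_zeta_d=0.42))
    print(mean_abs_error([0.50, 0.52], [0.48, 0.55]))
```

When ζd does not change between frames, ρζ is zero and the prediction reduces to ζ[i−1].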
Table 1. Averaged prediction errors for ζ prediction

Sequence   Mother   Silent   Football   Foreman
eζ         0.0252   0.0906   0.0251     0.0624
Fig. 2. Predicted and actual values of ζ versus frame number for Football and Silent.
3.2 Online Joint Update of C, α and β
C, α and β are updated before encoding the next frame using the least-squares method [18]. Taking the logarithm of (13) gives ln R = ln C + α ln ζ + β ln(MAD/q), which is linear in [ln C, α, β]. After coding the current frame, the encoder collects the number of bits, MAD, ζ and QS of the previously coded frames to form $\mathbf{R} = [\ln R_1, \ln R_2, \dots, \ln R_n]^T$ and $\mathbf{V} = [V_1^T, V_2^T, \dots, V_n^T]^T$ with $V_i = [1, \ln \zeta_i, \ln(\mathrm{MAD}_i \cdot q_i^{-1})]$, 1 ≤ i ≤ n, where n denotes the number of previously coded frames used for the parameter update. In our simulation, we set n = 20 as in [13], and the parameters are updated through

$$[\ln C, \alpha, \beta]^T = (\mathbf{V}^T \mathbf{V})^{-1} \mathbf{V}^T \mathbf{R}. \tag{16}$$

4 Experimental Results and Discussions
In this section, we present experimental results on the rate prediction accuracy of different R-Q models used in H.264/AVC rate control. The H.264/AVC reference software JM17.2 [17] is used as the benchmark, with rate control enabled. The test sequences are chosen to have different characteristics: “mother-daughter”, “ice”, “soccer”, “city”, and “crew”, at CIF resolution and 30 frames per second. Each sequence is encoded with a single intra frame followed by P-frames only, for low-delay encoding. Several typical target bit rates are simulated to verify the model accuracy. We compare the proposed R-Q model with the models described in [5] and [13]. In our experiment, a constant number of target bits is set for each frame. The first three frames of a sequence are coded with an initial QP, and the QP values of the following frames are determined by the R-Q model under test. The average bits mismatch error between the target bits and the actually generated bits is used to evaluate the performance
Fig. 3. The relative error between target bits and generated bits for each frame (panels: mother-daughter, soccer, crew, ice; curves: the proposed model, the model in [5], the model in [13]).
of R-Q models, defined as

$$e = \frac{1}{N} \sum_{i=1}^{N} \frac{|R_t(i) - R(i)|}{R_t(i)},$$
where $R_t(i)$ and R(i) are the target and the actual coded bits of the i-th frame, respectively. The average bits mismatch errors are shown in Table 2, from which it can be observed that the proposed R-Q model consistently outperforms the models in [5] and [13], with smaller rate prediction error; that is, the actual bits are closer to the target bits when the proposed model is used. The relative errors in frame encoding order are plotted in Fig. 3, which shows that, in addition to the smaller average relative error, the maximum relative error is also markedly lower than in the previous works. Meanwhile, the complexity of the proposed method is similar to that of the schemes used for comparison, because the same least-squares method is adopted to update the model parameters in all three models.
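The rate control step evaluated in this section can be sketched as follows: given the target bits of a frame and the predicted ζ and MAD, the model (13) is inverted for q, a QP is chosen, and the mismatch error e is accumulated over the coded frames. The Python sketch below assumes the standard H.264/AVC relation in which the quantization stepsize doubles for every increase of 6 in QP; picking the QP whose stepsize is nearest to the solved q is an assumption of this sketch, since the rounding rule is not stated in the text. All names and numbers are hypothetical.

```python
# Assumption: standard H.264/AVC QP-to-Qstep mapping (Qstep doubles every 6 QP).
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qp_to_qstep(qp):
    """Quantization stepsize for an integer QP in [0, 51]."""
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)


def qstep_for_target(target_bits, c, alpha, beta, zeta_p, mad_p):
    """Invert (13): q = MAD * (C * zeta^alpha / R_t)^(1 / beta), beta != 0."""
    return mad_p * (c * zeta_p ** alpha / target_bits) ** (1.0 / beta)


def nearest_qp(qstep, qp_min=0, qp_max=51):
    """QP whose stepsize is closest to the requested one (sketch assumption)."""
    return min(range(qp_min, qp_max + 1),
               key=lambda qp: abs(qp_to_qstep(qp) - qstep))


def mismatch_error(target_bits, actual_bits):
    """Average relative bits mismatch error e over N frames."""
    return sum(abs(t - r) / t
               for t, r in zip(target_bits, actual_bits)) / len(target_bits)


if __name__ == "__main__":
    # Hypothetical model parameters and frame statistics.
    q = qstep_for_target(20000.0, c=5.0e4, alpha=-0.5, beta=1.2,
                         zeta_p=0.5, mad_p=4.0)
    print(q, nearest_qp(q))
    print(mismatch_error([20000.0, 20000.0], [21000.0, 19500.0]))
```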
5 Conclusions

In this paper, a novel rate-quantization (R-Q) model is proposed through theoretical derivation assuming a Laplacian-distributed source. There are five
Table 2. Averaged relative errors e between target and actual bits

Sequence   Rt (kb/s)   Model in [13]   Model in [5]   Proposed
Mother     120         0.8108          0.7435         0.3348
           60          1.1270          0.8171         0.4475
           30          1.3214          1.1101         0.5212
Soccer     600         0.2210          0.1807         0.1289
           180         0.2851          0.1955         0.1477
           60          0.3537          0.2739         0.1537
City       600         0.1725          0.2448         0.1442
           240         0.3050          0.4129         0.1177
           60          0.2901          0.4418         0.1393
Crew       540         0.2502          0.2822         0.1820
           240         0.3537          0.4034         0.2867
           90          0.4906          0.4816         0.3522
Ice        600         0.1319          0.1213         0.0771
           180         0.2418          0.1749         0.1459
           60          0.2338          0.2002         0.1540
Ave.                   0.4388          0.4069         0.2221
model parameters in total, two of which are content dependent: the frame complexity measured by MAD and the adjacent residual pixel correlation ζ. All these parameters are updated online, frame by frame. MAD and ζ are predicted using two simple prediction methods, while the other three parameters are updated using the least-squares method. Experimental results demonstrate that the proposed algorithm reduces the bit mismatch error by 45% and 49% on average compared with the latest work [5] and the default method adopted in the H.264/AVC reference software [13], respectively.
References

1. H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky: Low-complexity transform and quantization in H.264/AVC. IEEE Trans. Circuits Syst. Video Technol., 13(7), 598-603 (2003)
2. T. Chiang and Y.-Q. Zhang: A New Rate Control Scheme Using Quadratic Rate Distortion Model. IEEE Trans. Circuits Syst. Video Technol., 7(1), 246-250 (1997)
3. H.-J. Lee, T. Chiang, and Y.-Q. Zhang: Scalable rate control for MPEG-4 Video. IEEE Trans. Circuits Syst. Video Technol., 10(6), 878-894 (2000)
4. S. Wan and E. Izquierdo: A new rate distortion model for rate control of video coding. In Proc. of WIAMIS (2006)
5. C.-W. Seo, J. W. Kang, J.-K. Han and T. Q. Nguyen: Efficient Bit Allocation and Rate Control Algorithms for Hierarchical Video Coding. IEEE Trans. Circuits Syst. Video Technol., 20(9), 1210-1223 (2010)
6. D.-K. Kwon, M.-Y. Shen, and C.-C. J. Kuo: Rate Control for H.264 Video With Enhanced Rate and Distortion Models. IEEE Trans. Circuits Syst. Video Technol., 17(5), 517-529 (2007)
7. S. Hu, H. Wang, S. Kwong and T. Zhao: Frame level rate control for H.264/AVC with novel rate-quantization model. In Proc. of ICME, 226-231 (2010)
8. N. Karmaci, Y. Altunbasak and R. M. Mersereau: Frame Bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models. IEEE Trans. Circuits Syst. Video Technol., 15(8), 994-1006 (2005)
9. X. Jing, L.-P. Chau and W.-C. Siu: Frame complexity-based rate-quantization model for H.264/AVC intraframe rate control. IEEE Signal Process. Lett., 15, 373-376 (2008)
10. Z. He and S. K. Mitra: A unified rate-distortion analysis framework for transform coding. IEEE Trans. Circuits Syst. Video Technol., 11(12), 1221-1236 (2001)
11. I.-M. Pao and M.-T. Sun: Modeling DCT coefficients for fast video encoding. IEEE Trans. Circuits Syst. Video Technol., 9(4), 608-616 (1999)
12. A. K. Jain: Fundamentals of Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ (1989)
13. Z. G. Li, F. Pan, K. P. Lim, G. Feng, X. Lin, and S. Rahardja: Adaptive basic unit layer rate control for JVT. Doc. JVT-G012-r1, Thailand (2003)
14. E. Y. Lam and J. W. Goodman: A Mathematical Analysis of the DCT Coefficient Distribution for Images. IEEE Trans. Image Process., 9(10), 1661-1666 (2000)
15. T. Berger: Rate Distortion Theory. Prentice-Hall, Englewood Cliffs, NJ, 94-95 (1971)
16. H. Gish and J. N. Pierce: Asymptotically efficient quantizing. IEEE Trans. Inform. Theory, IT-14(5), 676-683 (1968)
17. H.264/14496-10 AVC Reference software JM17.2 [Online]. Available: http://iphome.hhi.de/suehring/tml/index.htm
18. S. G. Nash and A. Sofer: Linear and Nonlinear Programming. McGraw-Hill Companies, Inc., New York (1996)
19. W. Ding and B. Liu: Rate control of MPEG video coding and recoding by rate-quantization modeling. IEEE Trans. Circuits Syst. Video Technol., 6(2), 12-20 (1996)