
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 5, MAY 2008

Motion Feature and Hadamard Coefficient-Based Fast Multiple Reference Frame Motion Estimation for H.264

Zhenyu Liu, Member, IEEE, Lingfeng Li, Yang Song, Member, IEEE, Shen Li, Satoshi Goto, Fellow, IEEE, and Takeshi Ikenaga, Member, IEEE

Abstract—In the state-of-the-art video coding standard, H.264/AVC, the encoder is allowed to search for its prediction signals among a large number of reference pictures that have been decoded and stored in the decoder to enhance its coding efficiency. Consequently, the computational complexity of motion estimation (ME) increases linearly with the number of reference pictures. Many fast multiple reference frame ME algorithms have been proposed; their performance, however, is considerably degraded in hardwired encoder designs because of the macroblock (MB) pipelining architecture. Considering the limitations of the traditional four-stage MB pipelining architecture, two fast multiple reference frame ME algorithms are proposed here. First, mathematical analysis reveals that the efficiency of multiple reference frames is degraded by the relative motion between the camera and the objects. On this basis, for a slow-moving MB, the authors adopt multiple reference frames but reduce their search range, whereas for a fast-moving MB, only the first previous reference frame is used, with the full search range, during ME. Because the large search range and the multiple reference frames are mutually exclusive, the computation saving of the proposed algorithm is insensitive to the nature of the video sequence. Second, following the Hadamard transform coefficient-based all_zeros block early detection algorithm, two early termination criteria are proposed. These methods ensure pronounced computation savings when the encoded video has strong spatial homogeneity or temporal stationarity. Experimental results show that 72.7%–93.7% of the computation can be saved by the proposed fast algorithms with an average coding quality degradation of 0.0899 dB. Moreover, these fast algorithms can be combined with fast block-matching algorithms to further improve the speedup.

Index Terms—H.264/AVC, motion estimation (ME), motion vector (MV), multiple reference frames, video coding, video signal processing, VLSI.

Manuscript received May 24, 2007; revised September 28, 2007. This work was supported by fund from the CREST JST. This paper was recommended by Associate Editor L.-G. Chen. Z. Liu, Y. Song, S. Goto, and T. Ikenaga are with the Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). L. Li was with the Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan. He is now with Nemochips Inc., Shanghai 200135, China (e-mail: [email protected]). S. Li was with the Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan. He is now with Toshiba Company Ltd., Kawasaki 212-8520, Japan (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2008.918844

I. INTRODUCTION

THE superior performance of the latest international video coding standard, H.264/AVC, comes mainly from several state-of-the-art techniques, which include quarter-pixel accurate variable block size motion estimation (ME) with multiple reference frames, intra-prediction (IP), context-based adaptive variable length entropy coding (EC), and an in-loop deblocking (DB) filter [1]. This superior coding quality comes at the price of a considerable increase in encoding computational complexity. ME is the most computation-intensive component in the encoder. According to the analysis in [2], 89.2% of the computation power is consumed by the ME part. Compared with earlier standards, ME in H.264/AVC takes advantage of variable block sizes and multiple reference frames to further reduce the prediction residues, and the multiple reference frame technique is the main cause of the huge computational complexity: the required computation is directly proportional to the number of reference frames.

For the massive computation of real-time H.264/AVC encoding applications, especially with HDTV specifications, a dedicated hardwired encoder with up-to-date techniques is still a must. The huge computational complexity of the prediction algorithm renders even the traditional two-stage MB pipelining architecture inefficient for H.264 hardwired encoder design [3]. In H.264/AVC, fractional ME (FME) is 100 times more complex than that of previous standards. It has become the system bottleneck because of the quarter-pixel accuracy, variable block size, multiple reference frames, and precise distortion evaluation. Consequently, integer ME (IME) and FME must be arranged in two separate stages to achieve high hardware utilization and throughput. One optimized four-stage MB pipelining hardwired encoder, provided in [3], is shown in Fig. 1.
To investigate the impact of MB pipelining on the existing fast multiple reference frame algorithms, a brief dataflow overview of the four-stage hardware encoder is given here. In the first stage, the IME engine processes all reference frames. The integer motion vectors (MVs) of the 41 blocks in the current macroblock (MB) on all reference frames are obtained and then dispatched to the second stage, FME. Through quarter-pixel accurate ME and precise rate-distortion (RD) cost evaluation, the FME engine finds the best prediction candidates and decides the best inter prediction mode. IP, the post inter/intra mode decision, and chroma motion compensation are implemented in the third stage. EC and DB are processed in parallel in the fourth stage.


1051-8215/$25.00 © 2008 IEEE

LIU et al.: FRAME MOTION ESTIMATION FOR H.264


Fig. 1. Block diagram of the traditional four-stage real-time hardwired H.264 encoder.

As the multiple reference frame algorithm is the main cause of the huge computational complexity, many fast multiple reference frame ME methods have been proposed in the literature to reduce its computational load. One method [2] proposes four criteria for early termination of the motion search on multiple reference frames. These criteria efficiently remove 30%–80% of the redundant computation in software, whereas the MB pipelining architecture of the hardware encoder degrades their performance. From [2], it can be seen that all these early termination criteria must be applied in the second stage, FME. The IME engine, the most computation-intensive component, cannot benefit from these approaches. Hsu et al. [4] propose a fast reference frame selection algorithm, in which the best reference frame of the current block is determined by the Lagrangian costs of its neighboring blocks. As the Lagrangian cost can be obtained only after the FME search, for hardwired engine design this method has the same drawback as the work in [2]. Another promising scheme reduces the search areas on multiple reference frames by exploiting the strong correlation of MVs in consecutive pictures [5]–[7]. The main drawback of this scheme is the hardware overhead consumed by the MV composition. For example, in [5], the 4 × 4 block-based MVs of all reference frames must be kept. With the HDTV 720p frame size, a 128 × 128 search range, and five reference frames, a total of 5 529 600 bits of memory is required to buffer these MVs. For the accuracy of the MV composition, multiplication, which is a hardware-consuming operation, is also applied in [5]. Moreover, this kind of algorithm simplifies only the computation of IME and does not contribute to FME computation saving. Other researchers applied fast block-matching algorithms in multiple reference frame ME [8], [9].
For example, Kim improved the multi-resolution block-matching algorithm and adopted it in multiple reference frame ME [9]. These methods can be categorized as extensions of fast block-matching schemes.

In this paper, two fast multiple reference frame ME algorithms, which are compatible with the existing four-stage MB pipelining architecture, are proposed. The first one is the integer MV-based reference frame reduction and search area adjustment algorithm. The authors' mathematical analysis suggests that fast movement blurs the edges of the sampled object, which makes blocks under fast movement insensitive to the multiple reference frame algorithm. Consequently, for a fast-moving MB, only the first previous reference frame is searched, with the full search range; for a slow-moving MB, on the other hand, multiple reference frames are required, but their search ranges can be greatly reduced. The second fast search method is the Hadamard transform coefficient-based all_zeros block early detection algorithm. Based on this detection method, two early termination criteria are devised. These early termination algorithms are efficient for sequences with strong spatial homogeneity or temporal stationarity.

The remainder of this paper is organized as follows. The MV-based reference frame reduction and search area adjustment algorithm is presented in Section II, and the Hadamard transform coefficient-based all_zeros block early detection algorithm is introduced in Section III. The whole process flow of the proposed fast algorithms is analyzed in Section IV. Section V presents the experimental results that demonstrate the proposed schemes. Finally, conclusions are drawn in Section VI.

II. MV-BASED REFERENCE FRAME REDUCTION AND SEARCH RANGE ADJUSTMENT

By mathematical analysis, the aliasing problem is found to be the chief factor that deteriorates motion prediction efficiency [10]–[13]. Subpixel interpolation and multiple reference frame techniques are adopted by H.264/AVC mainly to compensate for the prediction error caused by aliasing. In this section, first, the prediction error coming from aliasing is introduced and investigated. Second, the effect of image motion blurring on aliasing is analytically described. Finally, the integer MV-based reference frame reduction and search range adjustment algorithm is proposed.

A. Impact of Aliasing to Prediction Error

In the previous literature [2], [5], many reasons have been adduced for the superior prediction performance of multiple reference frame ME. For instance, seven factors that let multiple reference frame ME achieve smaller prediction residues, such as "uncovered background," "repetitive motions," and "camera shaking," are listed in [5]. However, these reasons address the problem only superficially. Through mathematical analysis and experimental results, the authors found that aliasing, caused by the detailed textures in the picture and the displacement error, is the most critical issue among all these [13].
The highly sophisticated ME techniques applied in H.264/AVC, such as multiple reference frames and subpixel interpolation, were developed mainly to circumvent the prediction error caused by the prediction displacement error [10], [11]. In the spectral domain, this prediction error can be explained as an aliasing problem arising from subsampling [12]. To simplify the mathematical description, the analysis is restricted to a one-dimensional spatial signal. Two spatially continuous signals are considered at two time instances, the later one being a version of the earlier one displaced by a certain distance.


Fig. 2. RD curve comparisons of Mobile_QCIF and Stockholmpan_QCIF with 1 and 5 reference frames (QP = 16, 20, 24, 28, 32; GOP = IPPP; RDO_ON; CAVLC; 200 frames).

Fig. 3. RD curve comparisons of Football_QCIF and Foreman_QCIF with 1 and 5 reference frames (QP = 16, 20, 24, 28, 32; GOP = IPPP; RDO_ON; CAVLC; 200 frames for Football_QCIF and 20 frames for Foreman_QCIF, from the 299th to the 318th frame).

The analysis is carried out on the frequency-domain representations of these two signals. Before digital processing, the continuous image signals are sampled by the sensor array at a fixed spatial sampling interval, and the displacement error is expressed relative to this interval. Aliasing does not occur if the precondition of the Nyquist–Shannon sampling theorem is satisfied, i.e., if the signal spectrum vanishes above half the sampling frequency. However, because no spatially limited signal can be band limited and the lowpass filter of the sampling system is not ideal, this precondition cannot be fulfilled. On the basis of [12], with the sampling frequency normalized, the magnitude of the prediction error signal caused by aliasing is described by (1). Two important conclusions can be drawn in light of (1).

1) Because of the spectral term in (1), aliasing is caused by the high frequency signals in the image.
2) Because of the displacement-dependent term in (1), the impact of aliasing vanishes at full-pixel displacements and is maximum at half-pixel displacements.

Conclusion 1 states that an image rich in high frequency signals is more susceptible to the aliasing problem. Conclusion 2 underlines the need for multiple reference frames during prediction: if the displacement error amplitude between the current image and the first previous image is greater than that between the current image and an earlier reference image, then the earlier image is preferred as the reference picture because its aliasing errors are weakened by the displacement-dependent term.

To justify the assumption that aliasing by high frequency signals is the chief determining issue for multiple reference frame ME, some experimental results are illustrated in Figs. 2 and 3. Both Mobile_QCIF and Stockholmpan_QCIF contain many detailed textures and have almost no fast movement. Stockholmpan_QCIF, in particular, does not even have the "occlusion," "repetitive motion," or "camera shaking" mentioned in [5]. However, the effect of multiple reference frames is prominent in these sequences. For Stockholmpan_QCIF, at high bit rates, a 1.8 dB peak signal-to-noise ratio (PSNR) gain can be achieved by multiple reference frame ME. Foreman_QCIF contains an abrupt fast camera pan, located from the 299th to the 318th frame. The authors made use of Football_QCIF and the fast pan segment of Foreman_QCIF to study the impact of high motion on multiple reference frame ME. As shown in Fig. 3, the coding quality improvement by multiple reference frame ME for these two sequences is negligible.

Fig. 4. MV on the first previous reference frame statistics of Mobile_QCIF and Football_QCIF (QP = 16; GOP = IPPP; RDO_ON; CAVLC; 200 frames).

Before proceeding further with the spectral analysis of the test sequences, it is instructive to study their motion features. The statistics of the MB MVs on the first previous reference frame are depicted in Fig. 4. Most MBs in Mobile_QCIF undergo slow motion: the ratio of MBs with small MV amplitude is 92.27%. In contrast, this ratio reaches just 7.42% in Football_QCIF. Although Football_QCIF has fast and complex motions, multiple reference frame ME cannot achieve a noticeable coding gain.
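The displacement dependence stated in conclusion 2 can be reproduced with a short one-dimensional derivation. This is a sketch under simplifying assumptions rather than the exact form of (1) in [12]: the symbols $s(x)$, $S(\omega)$, $d$, and $E(\omega)$ are introduced here for illustration only, the sampling interval is normalized to one, and the prediction is assumed to be an ideal shift of the sampled reference by the displacement error $d$.

```latex
% Assumed notation (not from the source): s(x) is the continuous reference
% signal, S(\omega) its spectrum, and d the displacement error measured in
% units of the sampling interval.
\begin{align}
  \text{sampled reference:}\quad
    & S_{\mathrm{ref}}(\omega) = \sum_{k} S(\omega - 2\pi k), \\
  \text{sampled current signal } s(x-d):\quad
    & S_{\mathrm{cur}}(\omega) = \sum_{k} S(\omega - 2\pi k)\, e^{-j(\omega - 2\pi k)d}, \\
  \text{ideal shift of the sampled reference:}\quad
    & S_{\mathrm{pred}}(\omega) = e^{-j\omega d} \sum_{k} S(\omega - 2\pi k), \\
  \text{prediction error:}\quad
    & E(\omega) = S_{\mathrm{cur}}(\omega) - S_{\mathrm{pred}}(\omega)
      = e^{-j\omega d} \sum_{k \neq 0} S(\omega - 2\pi k)\left(e^{j 2\pi k d} - 1\right), \\
  \text{first alias term } (k = 1):\quad
    & \left| S(\omega - 2\pi)\left(e^{j 2\pi d} - 1\right) \right|
      = 2\,\bigl|S(\omega - 2\pi)\bigr|\,\bigl|\sin(\pi d)\bigr| .
\end{align}
```

Under these assumptions, the factor $|\sin(\pi d)|$ is zero for integer $d$ and largest at $d = \pm 1/2$, while $S(\omega - 2\pi)$ is nonzero in the baseband $|\omega| < \pi$ only when the signal carries energy above half the sampling frequency. Both observations agree with the two conclusions drawn from (1).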
For Mobile_QCIF, on the other hand, because of its complex textures, the PSNR differences between searching five reference frames and searching only one reference frame are about 1.4–1.5 dB. The 2-D Fourier spectra of Mobile_QCIF and Football_QCIF are shown in Fig. 5. For each test sequence, 128 frames and a Hamming window were used to calculate the average spectrum amplitude. It is obvious that the high frequency signals of Mobile_QCIF are much more abundant than those of Football_QCIF. The authors also quantitatively analyzed the high frequency signals. The high frequency signal power accounts for 5.62% of the whole image power in Football_QCIF and for 10.98%, almost double, in Mobile_QCIF.

Fig. 5. Spectral analysis of Mobile_QCIF and Football_QCIF sequence. (a) Average Fourier spectrum of 128 Football_QCIF frames. (b) Average Fourier spectrum of 128 Mobile_QCIF frames.

The theory relating the aliasing problem to the multiple reference frame search has been successfully adopted in [13]. By spatial analysis, Liu et al. [13] deduced conclusions similar to those derived by spectral analysis and further demonstrated that the variance of the prediction error is determined mainly by the edge gradient amplitude. Section II-B shows that image motion smooths the edges in the picture and thus alleviates the prediction errors caused by the aliasing problem.

B. Impact of Image Motion to Multiple Reference Frame Algorithm

According to Gonzalez and Woods [14], the motion between the object and the sensor blurs the sampled image. If an image undergoes planar motion, with time-varying motion components in the x- and y-directions, the image obtained at each position can be expressed as the integration of the moving image over the exposure period, as in (2). In the frequency domain, this procedure can be expressed as (3). If the image undergoes uniform linear motion, (3) reduces to (4).

From this analysis, it can be seen that the effect of motion is the same as that of a lowpass filter whose pass band decreases with the motion speed in both directions. In other words, the edges of the sampled image are blurred by the motion. As the edge gradient amplitude decreases, according to the conclusion in [13], the prediction errors of a fast-moving block are reduced, so that the benefit of the multiple reference frame ME technique is lowered. To verify this assumption, the authors synthesized the motion effect on the Mobile_CIF video sequence. Fig. 6 compares the RD curves of the original sequence and the blurred one. At moderate and high bit rates, the PSNR enhancement from the multiple reference frame ME algorithm was 1.2–1.3 dB for the original sequence. In contrast, for the blurred test sequence, the PSNR differences were reduced to 0.5–0.6 dB.

Fig. 6. Blurring caused by synthesized motion degrades the efficiency of multiple reference frame ME (QP = 16, 20, 24, 28, 32; 200 frames; GOP = IPPP; RDO_ON; CAVLC).

C. Motion Vector Based Reference Frame Reduction and Search Area Adjustment Algorithm

From the investigations in Section II-B, it is concluded that a block with fast motion needs a large search range but is insensitive to the multiple reference frame algorithm. On the other hand, a slow-moving block can be searched in a small area with multiple reference frames. Consequently, the MV-based reference frame reduction and search area adjustment algorithm is provided.

1) In the first reference frame, IME of the 16 × 16 mode is processed in the full search range.
2) If the integer MV amplitude of the 16 × 16 block is greater than the threshold, the current MB is a fast-moving one, so that the searches on subsequent reference frames can be eliminated. Otherwise, as the current MB is a slow-moving one and the aliasing problem still exists, multiple reference frames are required. However, the subsequent searches can be processed in reduced ranges to save the computation of the IME part.

The MV amplitude is defined in terms of the integer MV and the x- and y-components of the MV predictor. The threshold is determined by the frame size of the coded sequence. Through exhaustive experiments, the threshold is defined as 12 pixels for the QCIF format and 16 pixels for the CIF format. As small blocks contain fewer pixels and are prone to be trapped in local optima, in the authors' algorithm the motion feature determination depends solely on the integer MV of the 16 × 16 block. This also simplifies its computation.

For VLSI implementation, this criterion is implemented in the first stage, IME, as shown in Fig. 1. If the current MB is determined to be of the fast-moving type, the searches of the IME engine and the FME engine are restricted to the first reference frame. In this way, the computational load of both IME and FME is reduced. For slow-moving MBs, the subsequent variable block size IME searches are processed in the reduced search area. In this case, the algorithm contributes only to the computation saving of the IME engine.
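The decision rule of Section II-C can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and field names are hypothetical, the MV amplitude is assumed to be the larger absolute component of the difference between the integer MV and the MV predictor, and the full and reduced search ranges are placeholder values. Only the thresholds of 12 pixels (QCIF) and 16 pixels (CIF) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds in pixels, as stated in the text for each frame format.
MV_THRESHOLD = {"QCIF": 12, "CIF": 16}


@dataclass
class SearchPlan:
    fast_moving: bool            # True if the MB is classified as fast-moving
    num_ref_frames: int          # reference frames searched by IME and FME
    later_range: Optional[int]   # +/- range on later frames, None if unused


def mv_amplitude(mv, mvp):
    # Assumption: amplitude is the larger absolute component of (MV - predictor).
    return max(abs(mv[0] - mvp[0]), abs(mv[1] - mvp[1]))


def plan_search(mv_16x16, mvp, frame_format, max_refs=5, reduced_range=4):
    """Choose reference frames and search range from the first-frame integer MV.

    max_refs and reduced_range are hypothetical placeholder values.
    """
    if mv_amplitude(mv_16x16, mvp) > MV_THRESHOLD[frame_format]:
        # Fast-moving MB: keep only the first previous reference frame.
        return SearchPlan(True, 1, None)
    # Slow-moving MB: keep all reference frames, shrink the later searches.
    return SearchPlan(False, max_refs, reduced_range)


fast = plan_search((20, -3), (1, 0), "QCIF")
slow = plan_search((2, 1), (1, 0), "CIF")
print(fast)
print(slow)
```

The first-frame search itself always uses the full range, so the sketch only decides what happens on the remaining frames. Because a fast-moving MB drops those frames entirely while a slow-moving MB searches them in a small window, the two costly settings (large range, many frames) never occur together.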

III. HADAMARD TRANSFORM COEFFICIENT BASED all_zeros BLOCK DETECTION ALGORITHM

In the H.264/AVC standard, after motion-compensated prediction, the processing of the prediction residues includes a forward transform, zig-zag scanning, and scaling and rounding as the quantization process, followed by EC. If the transformed and quantized coefficients of a block or sub-MB can be determined early, during ME, to be the all_zeros case, its searches on other reference frames can be eliminated because further search cannot reduce the coefficients after quantization any more. As mentioned in [15], this early termination also has an adverse effect: it may miss the best matching prediction block and thus deteriorate the reconstruction quality. Nevertheless, the experiments in Section V show that the coding quality loss caused by the proposed early termination algorithms was less than 0.0664 dB. In this section, the authors first present the mathematical demonstration of the Hadamard transform coefficient-based all_zeros block detection algorithm and then propose two early termination criteria based on this detection approach.

The separable 4 × 4 2-D discrete cosine transform (DCT) applied in H.264/AVC is written as (5), where the superscript T denotes transposition and the two matrices involved are the 4 × 4 residual matrix and the transform matrix shown in (6). During the quantization step, the thresholds form a 4 × 4 matrix, shown in (7), which depends on the quantization parameter QP and on the scaling matrix given in [16].

For one block, during multiple reference frame ME, if the residues at the current reference frame are found to be small enough to make the block all_zeros after DCT and quantization (Q), the search can be terminated. Depending on either the sum of absolute differences (SAD) of the prediction errors or the sum of absolute transformed differences (SATD), Huang et al. [2] provided one method for the early estimation of the all_zeros block. Compared with the constant threshold approach [17], the threshold in [2] increases with QP, so that more computation can be saved by the dynamic threshold scheme as QP increases. This is a notable improvement. However, as SAD and SATD are sums of prediction residues or transformed prediction residues, they are not efficient indicators of the frequency characteristics and provide only coarse all_zeros estimations. Some works try to improve the detection accuracy at the cost of computation. Xie et al. [18] adopted a sum-of-squares operation to evaluate the AC power in the prediction residues. As the all_zeros detection must be processed at every search position for all inter mode blocks during FME, the computation overhead of such an operation is unacceptable.

In the JM reference software, SATD is applied in the FME process to search for the best candidate, and the low-complexity mode decision algorithm also depends on the SATD cost to calculate the distortion. Compared with SAD, SATD accounts for the amount of prediction error as well as the cost of the transformed representation. Hence, SATD operates as a more accurate RD cost criterion for the prediction error than SAD. The SATD-based mode decision algorithm has already been implemented in hardware [3]. To derive the SATD value, the 4 × 4 Hadamard transform is first applied to each 4 × 4 residual block as shown in (8), where the Hadamard transform matrix is given in (9).

In fact, depending on these Hadamard coefficients, more accurate all_zeros estimations can be made. Comparing the DCT transform matrix with the Hadamard matrix, it was found that two of the basis functions of the Hadamard matrix were the same as the corresponding two of the DCT matrix and that the remaining two had patterns similar to their DCT counterparts. In fact, the Hadamard transform is a simplified form of the DCT. Its transformed signal emulates the frequency characteristics of the true DCT transformed block in the subsequent DCT/Q stage at very low computational cost. As the Hadamard coefficients have already been derived during FME, a threshold matrix can be set for the early detection of the all_zeros block. The entries of this threshold matrix all have the same value, as illustrated in (10). If a Hadamard coefficient is less than the threshold, it is assumed that the corresponding DCT coefficient will become zero after the quantization stage.
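The relation between the two transforms and the resulting detection test can be illustrated with a short sketch. The core transform matrix and the Hadamard matrix below are the standard 4 × 4 matrices used by H.264/AVC and by the SATD computation; the helper names and the single scalar threshold are assumptions made for this illustration, and the threshold values used in the example are arbitrary rather than the one defined in (10).

```python
# 4x4 forward core transform matrix of H.264/AVC (integer DCT approximation).
CORE = [[1, 1, 1, 1],
        [2, 1, -1, -2],
        [1, -1, -1, 1],
        [1, -2, 2, -1]]

# 4x4 Hadamard matrix as used for SATD, rows in the same frequency order.
HADAMARD = [[1, 1, 1, 1],
            [1, 1, -1, -1],
            [1, -1, -1, 1],
            [1, -1, 1, -1]]


def transform_2d(matrix, block):
    """Return matrix * block * matrix^T for 4x4 integer inputs."""
    rows = [[sum(matrix[i][k] * block[k][j] for k in range(4))
             for j in range(4)] for i in range(4)]
    return [[sum(rows[i][k] * matrix[j][k] for k in range(4))
             for j in range(4)] for i in range(4)]


def is_all_zeros(residual, threshold):
    """Hypothetical test: every Hadamard coefficient is below one threshold."""
    coeffs = transform_2d(HADAMARD, residual)
    return all(abs(c) < threshold for row in coeffs for c in row)


# Example residual block (arbitrary values chosen for illustration).
residual = [[1, 0, -1, 0],
            [0, 1, 0, 0],
            [0, 0, 0, -1],
            [1, 0, 0, 0]]

had = transform_2d(HADAMARD, residual)
dct = transform_2d(CORE, residual)

# Rows 0 and 2 of both matrices are identical, so the coefficients at
# positions (0,0), (0,2), (2,0), and (2,2) agree exactly.
shared = [(i, j) for i in (0, 2) for j in (0, 2)]
print(all(had[i][j] == dct[i][j] for i, j in shared))
print(is_all_zeros(residual, threshold=8))
```

For the four shared positions the Hadamard coefficient is the core transform coefficient itself, so a threshold taken from the quantizer is exact there. For the remaining positions the Hadamard coefficient is only an approximation of the transform coefficient, so the test is an estimate there.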

The threshold setting in (10) rests on two reasons. First, for a subset of the entries, the Hadamard coefficients and thresholds coincide with their DCT counterparts, so the real values of these entries after the DCT/Q stage can be obtained. Second, for the other entries, the ratio of the DCT coefficient standard deviation to the Hadamard coefficient standard deviation is similar to the ratio of their thresholds.

Proof: According to Huang et al. [2], the standard deviation matrices of the DCT and Hadamard coefficients can be expressed as (11) and (12), respectively, in terms of the standard deviation of the residues. The matrix of ratios between them is illustrated in (13). One term of (7) is ignored to simplify the analysis. The ratio of the DCT thresholds to the Hadamard thresholds then has six cases according to QP. For example, the first case is shown in (14), and the ratio between the standard deviation ratio and the threshold ratio for this case is shown in (15). The other cases follow by analogy and are expressed as (16)–(20), where the division is element-wise, i.e., each entry of the first matrix is divided by the entry in the same position of the second matrix. It is observed that the resulting ratio matrix is approximately the all_ones case, and thus the assumption is demonstrated.

On the basis of the foregoing analysis, the following early termination criteria are provided to alleviate the computation load of ME.

1) If the MB evaluated at SKIP_MVP satisfies the all_zeros condition, this MB is set to skip mode and all subsequent ME of the current MB is terminated early;
2) For modes 16 × 16, 16 × 8, and 8 × 16, if the all_zeros condition is satisfied, the ME of the block in the other reference frames is terminated early;
3) For modes 8 × 8, 8 × 4, 4 × 8, and 4 × 4, if the all_zeros condition is satisfied, the ME of the sub-MB in the other reference frames is terminated early.

SKIP_MVP denotes the skip mode MV predictor. For a VLSI implementation of the authors' algorithm, only four comparators need to be appended to the output of each 4 × 4 Hadamard module, as the 2-D Hadamard architecture has already been implemented in the FME engine [3]. This additional hardware overhead is trivial compared with that of the whole FME engine. The coding quality and computation saving efficiency of the authors' algorithm depend on the QP value and the features of the encoded video sequence. Compared with the SA(T)D (SAD or SATD)-based all_zeros block early detection method, the experiments in Section V-B demonstrate that the authors' approach provides more robust performance versus QP. Further, at a similar detection rate, the authors' approach has better coding quality, especially at low bit rates.

IV. OVERALL ALGORITHMS

Based on the analysis of Sections II and III, the authors integrated the MV-based reference frame reduction and search area adjustment algorithm and the Hadamard coefficient-based all_zeros block detection method into the JVT reference software. The pseudo-code of the complete fast algorithms embedded into the reference software JM11.0 provided by JVT is shown in Fig. 7. The modifications to the encode_one_MB function in JM11.0 are highlighted in bold italic font. In the first step, the Hadamard coefficient-based skip MB early detection was performed. For MBs that were determined early to be in skip mode, the searches for other modes can be terminated. Otherwise, the search for the 16 × 16 mode was performed. With the knowledge of the integer MV of the 16 × 16 block at the first reference frame, the Fast_MB flag was determined. The reference frame numbers and search ranges of the remaining modes depend on the Fast_MB flag. At the FME stage, the all_zeros block detection algorithm is applied to terminate early the searches in other reference frames.

Fig. 7. Pseudo-codes of the proposed fast multiple reference frame ME algorithms.

The proposed schemes are VLSI oriented and can be conveniently implemented in a hardwired encoder to save its computation power. The block diagram of the MB-pipelining hardware encoder with the authors' fast algorithms is shown in Fig. 8. In the first stage, IME, before the IME search, the Hadamard coefficient-based all_zeros detection was first executed at SKIP_MVP. If the all_zeros case was detected, the current MB was decided as skip mode and its subsequent inter and intra searches were both eliminated. Otherwise, the IME engine was scheduled to perform the IME search in the first reference frame. With the knowledge of the integer MV amplitude on the first reference frame, the redundant searches can be eliminated. That is, if the amplitude of the integer MV of the 16 × 16 block was larger than the threshold, the searches on the other frames were skipped. On the other hand, for an MB with a small MV amplitude, the search ranges in the subsequent IME search processing could be shrunk. The skip mode MB (Skip_MB) and Fast_Motion flags were both transferred to the second stage, FME. If the current MB was determined to be in skip mode, all operations in the FME engine were eliminated, and if the Fast_Motion flag was set, FME processed only the ME on the first reference frame. The Hadamard coefficient-based all_zeros block detection method can be applied in the FME stage to further save its computational power.

V. EXPERIMENTAL RESULTS

In this section, the performance of the MV-based fast algorithm is discussed first, followed by that of the Hadamard coefficient-based all_zeros block early detection algorithm. Finally, the detailed coding quality and computation saving performance of the complete algorithms are presented. In all simulations, the authors implemented their fast ME algorithms in the H.264 reference software version JM11.0. Several high and low motion sequences were selected, including the QCIF sequences Foreman (255 frames), Carphone (255 frames), Mobile (250 frames), Coastguard (255 frames), Football (250 frames), Tempete (250 frames), News (255 frames), and Container (255 frames), and the CIF sequences Foreman (255 frames), Silent (255 frames), Football (250 frames), Stefan (255 frames), News (255 frames), Container (255 frames), Coastguard (255 frames), and Canoa (220 frames). For all these sequences, the picture rate was 30 frames/s. The authors used an IPPP GOP structure and a single slice per picture. Different search ranges were used for the QCIF and CIF sequences. The number of reference frames was set to 5. High complexity RD optimization (RDO_ON) and CAVLC entropy coding were also used. For each sequence, quantization parameter values of 16, 20, 24, 28, 32, 36, and 40 were tested.

A. Performance Analysis of Motion Vector Based Algorithm

To quantify the coding quality of their algorithms, the authors used the Bjontegaard delta bit rate (BDBR) and Bjontegaard delta PSNR (BDPSNR) [19], which are the average differences in bit rate and PSNR between two methods, respectively. BDBR and BDPSNR in this paper are derived from the simulations with QP = 16, 24, 32, and 40. A positive sign in BDBR and a negative sign in BDPSNR indicate coding loss. The coding quality analysis of the MV-based reference frame reduction algorithm is given in Table I.
Not surprisingly, the performance of this algorithm was good to the sequences with low motion, such as Mobile and Container, because almost all MBs in those sequences were processed with multiple reference frames, which was illustrated in Table II. For slow-moving sequences, such as Mobile, the percentage of reduced reference frames was near 0.00%. The performance of this method was also robust to other sequences with moderate and high motion, such as Foreman, Carphone, Stefan and Football. As shown in Table I, in the worst case (Stefan_CIF), the introduced PSNR

LIU et al.: FRAME MOTION ESTIMATION FOR H.264


Fig. 8. H.264/AVC hardwired encoder architecture with the proposed fast multiple reference frame ME algorithms.

In the ME time saving definition (21), the two quantities are the ME time of the authors' algorithm and the time taken by the JM11.0 exhaustive search with five reference frames. The experimental ME saving results are also listed in Table III; they demonstrate that 74.9%–87.1% of the computation can be saved by the authors' MV-based reference frame and search range reduction algorithm. Compared with VADVS in [7], the proposed MV-based algorithm provides similar coding quality. For example, for the Foreman_QCIF sequence, the coding quality degradation of VADVS was 0.086 dB average PSNR loss and 1.406% bit rate increase. In contrast, the coding performance loss of the authors' algorithm was 0.0350 dB, or equivalently a 0.69% bit rate increase.
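As a hedged illustration, an ME time saving of this kind is commonly expressed as a relative reduction in ME time. The symbols $T_{\mathrm{JM}}$ (ME time of the JM11.0 exhaustive search with five reference frames) and $T_{\mathrm{fast}}$ (ME time of the fast algorithm) are assumed names, not necessarily the notation of (21):

```latex
\Delta T_{\mathrm{ME}} = \frac{T_{\mathrm{JM}} - T_{\mathrm{fast}}}{T_{\mathrm{JM}}} \times 100\%
```

Under this assumed form, a saving of 74.9% corresponds to $T_{\mathrm{fast}} \approx 0.251\,T_{\mathrm{JM}}$ (about a 4.0 times shorter ME time), and a saving of 87.1% corresponds to $T_{\mathrm{fast}} \approx 0.129\,T_{\mathrm{JM}}$ (about 7.8 times shorter).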

TABLE I CODING QUALITY COMPARISON OF MOTION VECTOR BASED REFERENCE FRAME REDUCTION WITH JVT CODEC

TABLE II PERCENTAGE OF REFERENCE FRAME REDUCTION

B. Performance Analysis of Hadamard Coefficient Based All_Zeros Block Detection Algorithm

loss by the authors' MV-based reference frame reduction algorithm was 0.0559 dB, or equivalently a 1.09% bit rate increase, versus the original JM11.0 algorithm. The coding performance of the whole MV-based reference frame reduction and search range adjustment algorithm is compared in Table III. The worst case comes from the Stefan_CIF test sequence, whose BDPSNR was 0.1637 dB. Comparing Tables III and I, it was observed that the worst additional coding quality loss caused by the search range adjustment comes from Stefan_CIF, whose BDPSNR difference between Table III and Table I was 0.1038 dB. The search range adjustment algorithm performed better on sequences with low or moderate motion than on those with high motion. For example, the additional BDPSNR values for Foreman_QCIF and Mobile_QCIF were 0.0144 and 0.0045 dB, respectively. In contrast, for sequences with a large amount of complex movement, such as Stefan_CIF and Football_CIF, this parameter reached 0.1038 and 0.0817 dB, respectively. To evaluate the coding speedup of the MV-based algorithm, the ME time saving is defined as in (21).

An SA(T)D-based all_zeros early detection algorithm has been proposed by Huang et al. [2]. In this section, the authors compare their Hadamard coefficient-based method with the SATD-based one. The performance analysis focuses mainly on the Hadamard coefficient-based skip MB early detection algorithm. First, the SATD-based skip MB detection algorithm is introduced. If the SATD of the MB at SKIP_MVP was less than the threshold, then that MB was identified as the all_zeros case and determined as skip mode. The merit of the SATD-based method is that the threshold increases with respect to QP. At low rate (high QP), more skip MBs can be detected early, which is consistent with actual behavior. However, this method provides only a coarse all_zeros estimation. Users find themselves in a dilemma in choosing the threshold parameter. When the parameter is set large, the false all_zeros detection rate increases and the coding quality is degraded. On the other hand, if it is set small, the increased miss rate deteriorates the efficiency of computation saving. For example, when , 20 and , 20, 24, 28, 32, 36, the performance comparisons of coding quality and detection rate between the SATD-based method and the authors' algorithm are shown in Fig. 9. When is equal to 16 and , the coding quality of the SATD-based method is similar to that of the Hadamard coefficient-based one. However, its detection rate is less than that of the authors' method. When is augmented, e.g., , and , the detection rate of the SATD-based one improves at the cost of coding quality loss. For instance, when , the detection rate of the SATD-based algorithm is 1.4 times that of the authors' method, but its PSNR loss is 0.6 dB. That is, its false detection rate severely deteriorates its coding


TABLE III CODING QUALITY AND MOTION ESTIMATION TIME SAVING COMPARISONS OF MOTION VECTOR BASED FAST ALGORITHM WITH JVT CODEC

TABLE IV SKIP_MB EARLY DETECTION RATE

Fig. 9. Performance comparisons between SATD-based and our Hadamard coefficient-based Skip_MB detection algorithm. (a) RD curve comparisons. (b) Detection rate comparisons.
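The SATD-based skip test that Fig. 9 compares against can be sketched as follows. This is a minimal illustration, not the implementation of [2] or of JM11.0: the SATD is accumulated over the sixteen 4x4 blocks of a macroblock without any normalization, and the threshold model `beta * qp` is a hypothetical stand-in that only captures the stated property that the threshold grows with QP. The names `beta`, `skip_threshold`, and `is_early_skip` are assumptions introduced here.

```python
# 4x4 Hadamard matrix (symmetric), as used for SATD-style costs.
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]

def matmul4(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd4x4(diff):
    """Sum of absolute 2-D Hadamard coefficients of a 4x4 residual block."""
    # H4 is symmetric, so H4 * diff * H4 is the 2-D transform.
    coeffs = matmul4(matmul4(H4, diff), H4)
    return sum(abs(c) for row in coeffs for c in row)

def mb_satd(cur, pred):
    """SATD of a 16x16 macroblock against its prediction at SKIP_MVP."""
    total = 0
    for by in range(0, 16, 4):
        for bx in range(0, 16, 4):
            diff = [[cur[by + y][bx + x] - pred[by + y][bx + x]
                     for x in range(4)] for y in range(4)]
            total += satd4x4(diff)
    return total

def skip_threshold(qp, beta):
    """HYPOTHETICAL threshold model: increases linearly with QP."""
    return beta * qp

def is_early_skip(cur, pred, qp, beta):
    """Declare the MB all_zeros (skip) when its SATD is below the threshold."""
    return mb_satd(cur, pred) < skip_threshold(qp, beta)
```

Because the whole macroblock is summarized by one scalar, a large `beta` admits blocks that still hold significant coefficients (false detections), while a small `beta` misses true all_zeros blocks. This is the dilemma described in the text.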

performance. Another problem of the SATD-based all_zeros detection algorithm is its instability versus the quantization parameter. The authors noticed that when and , the SATD-based algorithm presented coding quality similar to that of the JVT reference software. When , the SATD-based fast algorithm exhibited a noticeable PSNR loss of about 0.3 dB, and when , this gap further increased to 0.7 dB. On the other hand, the authors' Hadamard coefficient-based algorithm performed so robustly that the JM11.0 full search reference curve and the authors' fast search curve are hardly distinguishable in Fig. 9. The computational saving efficiency of the authors' skip MB early detection algorithm depends mainly on the features of the test sequence and the quantization parameter. Table IV summarizes

the Skip_MB early detection rate of the authors' algorithm versus the quantization parameter. The Skip_MB early detection rate indicates the percentage of early detected skip MBs with respect to the total number of MBs. It is clearly illustrated that the rate always increases with QP. Moreover, the authors' algorithm is efficient for sequences with strong spatial homogeneity or temporal stationarity because, under these conditions, the aliasing problem is not so prominent. At low bit rate, 66.27% of the MBs in the News_QCIF sequence are determined early as skip mode because of its stationary background. On the other hand, Mobile_QCIF is rich in complex textures and all scenes undergo slow movement. The impact of aliasing degrades the performance of the authors' algorithm. Even at , just 13.97% of the MBs in Mobile_QCIF are detected early. From Table IV, it was also observed that early detection efficiency always improved with the frame size. Comparing the tests of Foreman_QCIF and Foreman_CIF, when , the detection rate is increased by 12.3%. Other test sequences demonstrate the same property. This can also be explained with the aliasing theory [12]. With the decrease in the sampling interval , the sampling frequency increases, so that the high frequency signals, which are normalized by , become fewer. Consequently, more all_zeros blocks appear as prediction errors decrease.

C. Performance Analysis of Whole Algorithms

The RD curve comparisons of the whole fast algorithms versus the JVT reference software JM11.0 with eight QCIF and eight CIF test sequences are shown in Figs. 10 and 11, respectively. The coding performance comparisons are conducted with the following three test cases:
1) JM11.0 refers to the five previous frames;


Fig. 10. RD curve comparisons for QCIF sequences.

2) JM11.0 refers only to the immediately previous frame;
3) the authors' algorithms refer to the five previous frames.

As the authors' algorithm provides almost the same coding efficiency, in most cases it is hard to distinguish its curves from the results of JM11.0 with five reference frames. The quantitative coding quality analysis based on the BDPSNR and BDBR criteria is given in Table V. The average BDPSNR of all tests is 0.0899 dB. Among these sequences, the coding quality of Foreman, Carphone, Tempete, Mobile, and Stefan is sensitive to the multiple reference frames technique. With the proposed algorithms, for sequences with moderate or low motion, such as Foreman, Carphone, Tempete, and Mobile, the maximum PSNR degradation was less than 0.1564 dB. The maximum coding quality loss (BDPSNR = -0.1822 dB) among all these sequences was from Stefan_CIF. This is mainly

due to fast zooming and large movements that deteriorate the performance of the adaptive search range algorithm. From the comparison between Tables V and III, it can be found that the additional PSNR loss of the authors' all_zeros early detection algorithm is no more than 0.0664 dB. The experimental ME saving results are listed in Table VI, which demonstrate that 72.7%–93.7% of the computation can be saved by the authors' schemes. It is observed that the ME time saving always increases with QP. This comes mainly from the efficiency enhancement of the all_zeros block detection algorithm. When QP was increased, more blocks were determined early as all_zeros blocks, and their searches in other reference frames could be skipped. The comparisons between the authors' proposals and other existing fast multiple reference frame algorithms are shown in Table VII. It can be seen that the algorithms proposed in [2] are


Fig. 11. Rate-distortion curve comparisons for CIF sequences.

TABLE V CODING QUALITY COMPARISON OF OVERALL ALGORITHMS WITH JVT CODEC

not fully compatible with the existing MB pipelining architecture. This comes mainly from their skip MB detection criterion, which requires the Intra prediction cost of the current MB. This means that the Intra prediction engine must be arranged before the Inter prediction engines; consequently, it is not compatible

TABLE VI MOTION ESTIMATION TIME SAVING OF OVERALL ALGORITHMS


TABLE VII COMPARISONS WITH EXISTING FAST MULTIPLE REFERENCE FRAME MOTION ESTIMATION ALGORITHMS

with the original MB pipelining processing flow in [3]. Compared with the MV composition-based algorithms, the authors' approaches need less hardware overhead. In particular, the additional memory cost of [5], [7] is directly proportional to the frame size. For HDTV720p applications, the required memory size for the MV buffer in [5] is 5 529 600 bits. Table VII illustrates that the authors' approaches have better computation saving performance than the other counterparts, especially at low bit rate. In regard to coding quality, the performance of the authors' approaches is similar to that of [2]. References [5], [7] do not provide a coding performance analysis by BDPSNR. However, from their RD curves, it can be estimated that the BDPSNR of the Mobile sequence is about 0.2 dB in [7] and that of the Foreman sequence is about 0.1 dB in [5]. In summary, the authors' approaches are compatible with the existing MB-pipelining hardwired encoder architecture, and their coding quality performance is similar to that of other existing fast algorithms with less additional hardware overhead. It should be noted that the authors' algorithms can be integrated with fast block matching algorithms. The proposed multiple reference frame reduction and search window adjustment algorithm does not define the search pattern during the block matching procedure. In each search area, fast block matching methods, such as four-step search, diamond search, and successive elimination, can be applied to further reduce the ME computation.

VI. CONCLUSION

Considering the limitations of MB pipeline hardware architectures, the authors propose two fast algorithms for multiple reference frame variable block size ME in H.264/AVC.
First, on the basis of mathematical analysis and experimental results, it is demonstrated that the aliasing problem is the primary issue that makes the multiple reference frame algorithm essential and that the motion between the image and the camera alleviates the adverse effect of aliasing. Consequently, the MV-based reference frame elimination and search area adjustment algorithm is proposed to save operations in both the IME and FME stages. Second, with the Hadamard transform coefficient-based all_zeros block detection algorithm, two early termination criteria are proposed. Compared with the predecessor SA(T)D-based all_zeros block detection algorithm, the Hadamard transform coefficient-based algorithm provides better coding quality and is more robust versus the quantization parameter. Experimental results show that 72.7%–93.7% of the computation can be saved with almost the same coding quality as the reference software JM11.0 with the exhaustive ME algorithm. Moreover, these proposed schemes can be combined with other fast block ME algorithms to further improve the computation saving performance.

REFERENCES

[1] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi, “Video coding with H.264/AVC: Tools, performance, and complexity,” IEEE Circuits Syst. Mag., vol. 4, no. 1, pp. 7–28, 2004.
[2] Y.-W. Huang, B.-Y. Hsieh, S.-Y. Chien, S.-Y. Ma, and L.-G. Chen, “Analysis and complexity reduction of multiple reference frames motion estimation in H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 4, pp. 507–522, Apr. 2006.
[3] T.-C. Chen, S.-Y. Chien, Y.-W. Huang, C.-H. Tsai, C.-Y. Chen, T.-W. Chen, and L.-G. Chen, “Analysis and architecture design of an HDTV720P 30 frames/s H.264/AVC encoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 6, pp. 673–688, Jun. 2006.
[4] C.-T. Hsu, H.-J. Li, and M.-J. Chen, “Fast reference frame selection method for motion estimation in JVT/H.264,” IEICE Trans. Commun., vol. E87-B, no. 12, pp. 3827–3830, Dec. 2004.
[5] Y.-P. Su and M.-T. Sun, “Fast multiple reference frame motion estimation for H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 3, pp. 447–452, Mar. 2006.
[6] M.-J. Chen, Y.-Y. Chiang, H.-J. Li, and M.-C. Chi, “Efficient multiframe motion estimation algorithms for MPEG-4 AVC/JVT/H.264,” in Proc. 2004 Int. Symp. Circuits Syst., May 2004, vol. 3, pp. 737–740.
[7] M.-J. Chen, G.-L. Li, Y.-Y. Chiang, and C.-T. Hsu, “Fast multiframe motion estimation algorithms by motion vector composition for the MPEG-4 AVC/JVT/H.264 standard,” IEEE Trans. Multimedia, vol. 8, no. 3, pp. 478–487, Jun. 2006.
[8] M. Al-Mualla, C. Canagarajah, and D. Bull, “Simplex minimization for single- and multiple-reference motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 12, pp. 1209–1220, Dec. 2001.
[9] M.-J. Kim, Y.-G. Lee, and J.-B. Ra, “A fast multi-resolution block matching algorithm for multiple-frame motion estimation,” IEICE Trans. Inf. Syst., vol. E88-D, no. 12, pp. 2819–2827, Dec. 2005.
[10] B. Girod, “The efficiency of motion-compensating prediction for hybrid coding of video sequences,” IEEE J. Sel. Areas Commun., vol. SAC-5, no. 7, pp. 1140–1154, Aug. 1987.
[11] B. Girod, “Efficiency analysis of multihypothesis motion compensated prediction for video coding,” IEEE Trans. Image Process., vol. 9, no. 2, pp. 173–183, Feb. 2000.
[12] T. Wedi and H. G. Musmann, “Motion- and aliasing-compensated prediction for hybrid video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 577–586, Jul. 2003.
[13] Z.-Y. Liu, Y.-Q. Huang, Y. Song, S. Goto, and T. Ikenaga, “VLSI friendly edge gradient detection based multiple reference frames motion estimation optimization for H.264/AVC,” in Proc. 2007 Eur. Signal Process. Conf., Sep. 2007, pp. 1809–1813.
[14] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 2002.
[15] M. Kucukgoz and M.-T. Sun, “Early-stop and motion vector re-using for MPEG-2 to MPEG-4 transcoding,” in Proc. SPIE Visual Commun. Image Process., Jan. 2004, pp. 932–936.
[16] H. S. Malvar, A. Hallapuro, M. Karczewicz, and L. Kerofsky, “Low-complexity transform and quantization in H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 598–603, Jul. 2003.


[17] D. Wu, F. Pan, K. P. Lim, S. Wu, Z. G. Li, X. Lin, S. Rahardja, and C. C. Ko, “Fast intermode decision in H.264/AVC video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 953–958, Jul. 2005.
[18] Z.-G. Xie, Y. Liu, J. Liu, and T. Yang, “A general method for detecting all-zero blocks prior to DCT and quantization,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp. 237–241, Feb. 2007.
[19] G. Bjontegaard, Calculation of Average PSNR Differences Between RD-Curves. Austin, TX: VCEG-M33, 2001.

Zhenyu Liu (M’07) received the B.E., M.E., and Ph.D. degrees in electronics engineering from Beijing Institute of Technology, Beijing, China, in 1996, 1999, and 2002, respectively. His Ph.D. research focused on the real time signal processing and relative ASIC design. From 2002 to 2004, he worked as a Postdoctorate in Tsinghua University of China, Beijing, China, where his work mainly concentrated on the embedded CPU architecture design. Currently, he is a Researcher in the Graduate School of Information, Production and Systems of Waseda University, Kitakyushu, Japan. His research interests include real-time H.264/AVC encoding algorithm optimization and associated VLSI architecture design.

Lingfeng Li received the B.E. degree from Xi’an Jiaotong University, Xi’an, China, in 1997, the M.E. degree from Tsinghua University, Beijing, China, in 2003, and the Ph.D. degree from Waseda University, Kitakyushu, Japan, in 2007. Currently, he is with Nemochips Inc., Shanghai, China, as Senior Design/Research Engineer. His research interests include image compression, video signal processors, and system design of multimedia SoCs.

Yang Song (S’06–M’08) received the B.E. degree in computer science from Xi’an Jiaotong University, Xi’an, China, in 2001 and the M.E. degree in computer science from Tsinghua University, Beijing, China in 2004, and the Ph.D. degree from Waseda University, Kitakyushu, Japan. He is currently a Researcher in Graduate School of Information, Production and Systems, Waseda University, Japan. His research interest includes video coding and associated very large scale integration (VLSI) architecture.

Shen Li was born in 1980. He received the B.E. degree in automation engineering from Tsinghua University, Beijing, China, in 2002, and the M.E. and Ph.D. degrees from the Graduate School of Information, Production and Systems, Waseda University, Japan, in 2005 and 2007, respectively. He is now with SoC Research Center, Toshiba Company, Ltd., Kawasaki, Japan, and his research interest is video processing and related VLSI architecture design.

Satoshi Goto (S’69–M’77–SM’84–F’86) was born in Hiroshima, Japan, in 1945. He received the B.E., M.E., and Ph.D. degrees in electronics and communication engineering from Waseda University, Kitakyushu, Japan, in 1968, 1970, and 1981, respectively. He joined NEC Laboratories in 1970, where he worked on LSI design, multimedia systems, and software as GM and Vice President. Since 2003, he has been a Professor at the Graduate School of Information, Production and Systems of Waseda University, Kitakyushu. His main interest is now VLSI design methodologies for multimedia and mobile applications. He has published 7 books, 38 journal papers, and 67 reviewed international conference papers. Dr. Goto served as General Chair of ICCAD and ASPDAC and was a board member of the IEEE Circuits and Systems Society. He is a Fellow of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan and a Member of the Academy Engineering Society of Japan.

Takeshi Ikenaga (M’95) received the B.E. and M.E. degrees in electrical engineering and the Ph.D. degree in information and computer science from Waseda University, Tokyo, Japan, in 1988, 1990, and 2002, respectively. He joined LSI Laboratories, Nippon Telegraph and Telephone Corporation (NTT) in 1990, where he undertook research on the design and test methodologies for high performance ASICs, a real-time MPEG2 encoder chip set, and a highly parallel LSI and system design for image-understanding processing. He is presently an Associate Professor in the system LSI field of the Graduate School of Information, Production and Systems, Waseda University, Kitakyushu. His current interests are application SoCs for image, security, and network processing. In particular, he is engaged in research on H.264 encoder LSI, JPEG2000 codec LSI, LDPC decoder LSI, UWB wireless communication LSI, public key encryption LSI, object recognition LSI, etc. Dr. Ikenaga is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan, and the Information Processing Society of Japan.
