Block based embedded color image and video coding

Nithin Nagaraj (a), William A. Pearlman (b) and Asad Islam (c)

(a) GE Global Research, John F. Welch Technology Centre, Bangalore, India
(b) Rensselaer Polytechnic Institute, Troy, NY, USA
(c) Nokia Research Center, Dallas, Texas, USA

ABSTRACT

The Set Partitioned Embedded bloCK coder (SPECK) has been found to perform comparably to the best-known still grayscale image coders such as EZW, SPIHT and JPEG2000. In this paper, we first propose Color-SPECK (CSPECK), a natural extension of SPECK to handle color still images in the YUV 4:2:0 format; extensions to other YUV formats are also possible. PSNR results indicate that CSPECK is among the best known color coders, while the perceptual quality of its reconstructions is superior to that of SPIHT and JPEG2000. We then propose a moving picture coding system called Motion-SPECK, with CSPECK as the core algorithm in an intra-based setting. Specifically, we demonstrate two modes of operation of Motion-SPECK: the constant-rate mode, where every frame is coded at the same bit-rate, and the constant-distortion mode, where we ensure the same quality for each frame. Results on well-known QCIF and CIF sequences indicate that Motion-SPECK performs comparably to Motion-JPEG2000, while the visual quality of the sequence is in general superior. Both CSPECK and Motion-SPECK automatically inherit all the desirable features of SPECK, such as embeddedness, low computational complexity, highly efficient performance, fast decoding and low dynamic memory requirements. The intended applications of Motion-SPECK are high-end and emerging video applications such as high quality digital video recording systems, Internet video and medical imaging.

Keywords: image coding, video coding, compression, color coding, embedded, Motion-JPEG2000, wavelet, intra-based coding, moving picture coding.

1. INTRODUCTION

In the arena of still image compression, there has been growing interest in wavelet-based embedded image coders because of their desirable features, namely high quality at large compression ratios, very fast decoding, progressive transmission, etc. In particular, set-partitioning schemes seem to have an edge due to their relatively low complexity and high performance when compared to other coders, such as those employing vector quantization. Recently, a number of hierarchical coding techniques have emerged which provide very good performance, apart from exhibiting the desirable properties characteristic of such coding techniques. All of these techniques are based on the idea of partitioning the image into sets and exploiting the hierarchical subband pyramidal structure of the transformed image. Shapiro's Embedded Zerotree Wavelet algorithm (EZW)8 was one of the first of this kind. Said and Pearlman successfully improved the EZW algorithm by providing symbols for combinations of parallel zerotrees.9 Their implementation is based on a set-partitioning sorting algorithm called Set Partitioning in Hierarchical Trees (SPIHT). More recent is Islam and Pearlman's Set Partitioned Embedded bloCK coder (SPECK).4 SPECK has all the desirable properties of embeddedness, progressive transmission, low computational complexity, low dynamic memory requirements and fast encoding/decoding, and provides excellent performance.

Inspired by the huge success of SPIHT and SPECK for still grayscale image coding, there has been extensive research on color image coding using the zerotree structure. A simple application of these grayscale algorithms to the individual components of a color image would result in losing embeddedness. We investigate a natural extension of SPECK to handle color images. By treating all color planes as one unit at the coding stage, we generate a single mixed bit-stream, so that we can stop at any point of the bit-stream and reconstruct the color image with the best quality at that bit-rate. We call this scheme CSPECK (Color-SPECK) and compare its performance with SPIHT, JPEG2000, etc. We also utilize CSPECK for a moving picture coding system, where we operate CSPECK in an intra-based setting on each frame of the video sequence, and we demonstrate its performance against Motion-JPEG2000. The intended applications of Motion-SPECK are high-end and emerging video applications such as high quality digital video recording systems, Internet video and medical imaging.

This paper is organized as follows. In Section 2, we briefly review the set partitioning methods of SPECK, which motivate a natural extension to color images. In Section 3, we introduce color image coding and the extension of SPECK to color images (CSPECK), and compare its performance with JPEG2000, SPIHT and the Predictive Embedded Zero-tree Wavelet (PEZW),2 an improved version of the original EZW incorporated in the MPEG-4 verification model (VM) 6.0. Results of extensive simulations on various types of color test images were recently reported by Pearlman.7 In Section 4, we propose Motion-SPECK and compare its performance with Motion-JPEG2000 on standard CIF sequences (300 frames per sequence). We summarize our findings in Section 5.

Further author information: E-mail: (a) [email protected]. This work was done when the author was a graduate student at Rensselaer Polytechnic Institute.

2. MOTIVATION

Many scalar-quantized schemes, such as EZW, SPIHT and SPECK, employ some kind of significance testing of sets or groups of wavelet coefficients, in which a set is tested to determine whether its maximum magnitude is above a certain threshold. The results of these significance tests determine the path taken by the coder to code the source samples. These significance-testing schemes are based on some very simple principles which allow them to exhibit excellent performance. Among these principles are the partial ordering of coefficient magnitudes with a set-partitioning sorting algorithm, bit-plane transmission in decreasing bit-plane order, and exploitation of self-similarity across different scales of an image wavelet transform. An interesting thing to note about these schemes is that all of them have relatively low computational complexity, considering that their performance is comparable to the best-known image coding algorithms.

As previously mentioned, this class of coders exhibits the property of progressive transmission and embeddedness, which is a very important and desirable characteristic. Progressive transmission refers to the transmission of information in decreasing order of its information content; in other words, the coefficients with the highest magnitudes are transmitted first. Since all of these coding schemes transmit bits in decreasing bit-plane order, the transmission is progressive. Such a transmission scheme makes it possible for the bit-stream to be embedded, i.e., a single coded file can be used to decode the image at various rates less than or equal to the coded rate, to give the best reconstruction possible with the particular coding scheme.

For a complete description of the algorithm itself, the reader is referred to Islam5 or Pearlman7; here we highlight the set partitioning methods used in SPECK that make it highly efficient. Consider an image X which has been adequately transformed using an appropriate subband transformation (most commonly, the discrete wavelet transform). The transformed image is said to exhibit a hierarchical pyramidal structure defined by the levels of decomposition, with the topmost level being the root. The finest coefficients lie at the bottom level of the pyramid while the coarsest coefficients lie at the top (root) level. The image X is represented by an indexed set of transformed coefficients c_{i,j}, located at coefficient position (i,j) in the transformed image. Transform coefficients are grouped together in sets which comprise regions of the transformed image. Following the ideas of SPIHT, we say that a set T of coefficients is significant with respect to n if $2^n \le \max_{(i,j)\in T} |c_{i,j}| < 2^{n+1}$; otherwise it is insignificant. We can write the significance of a set T as

$$
S_n(T) =
\begin{cases}
1, & \text{if } 2^n \le \max_{(i,j)\in T} |c_{i,j}| < 2^{n+1} \\
0, & \text{otherwise}
\end{cases}
\qquad (1)
$$
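For illustration, the test of Eq. (1) can be written as the following minimal Python sketch; the array layout and function name are ours, not the authors' implementation:

```python
import numpy as np

def significance(coeffs, n):
    """Significance S_n(T) of a set T of wavelet coefficients against bit-plane n.

    Returns 1 if the largest magnitude m in the set satisfies 2^n <= m < 2^(n+1),
    and 0 otherwise, as in Eq. (1).
    """
    m = np.max(np.abs(coeffs))
    return 1 if (2 ** n) <= m < (2 ** (n + 1)) else 0

# Example: a 2x2 block of transform coefficients.
block = np.array([[13.0, -2.5], [7.0, 0.5]])
print(significance(block, 3))   # 1, since 8 <= 13 < 16
print(significance(block, 4))   # 0
```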

Figure 1. (a) Left: Partitioning of image X into sets S and I. (b) Middle: Quadtree Partitioning of set S. (c) Right: Octave-band Partitioning of set I.

The SPECK algorithm makes use of rectangular regions of the image transform. These regions or sets, referred to as sets of type S, can be of varying dimensions. The dimension of a set S depends on the dimension of the original image and the subband level of the pyramidal structure at which the set lies. During the course of the algorithm, sets of various sizes will be formed, depending on the characteristics of the coefficients in the original set. Note that a set of size 1 consists of just one coefficient. The other type of set used in the SPECK algorithm is referred to as a set of type I. These sets are obtained by chopping off a small square region from the top left portion of a larger square region. A typical set I is illustrated in Fig. 1.

The SPECK algorithm maintains two linked lists: LIS, the List of Insignificant Sets, and LSP, the List of Significant Pixels. The former contains sets of type S of varying sizes which have not yet been found significant against a threshold n, while the latter contains those coefficients which have tested significant against n. The algorithm starts by partitioning the image into two sets: a set S which is the root of the pyramid, and a set I which is everything that is left of the image after taking out the root, as shown in Fig. 1(a). To start the algorithm, set S is added to the LIS. The coder then notes the maximum threshold n_max such that every c_{i,j} in X is insignificant with respect to n_max + 1, but at least one c_{i,j} in X is significant with respect to n_max. The set S in the LIS is processed by testing it for significance against the threshold n = n_max. Assuming that S is significant, it is partitioned by a quadtree partitioning process into four subsets O(S), each having approximately one-fourth the size of the parent set S; Fig. 1(b) illustrates this partitioning process. Each of these four subsets is, in turn, treated as a set of type S and processed recursively until the coefficient level is reached, where the coefficients that are significant in the original set S are located and thereby coded. The coefficients/sets that are found insignificant during this hunting process are added to the LIS to be tested against the next lower threshold later on. The motivation for quadtree partitioning of such sets is to zoom in quickly to areas of high energy in the set S and code them first.

At this stage of the algorithm, there are no more sets of type S that need to be tested against n; if there were any, they would be processed before going on to the next part of the algorithm. Once all sets of type S are processed, the set I is processed by testing it against the same threshold n. Upon finding it significant, it is further partitioned by another partitioning scheme, the octave-band partitioning; Fig. 1(c) illustrates this scheme. The set I is partitioned into four sets: three sets of type S and one of type I. The size of each of these three sets S is the same as that of the chopped portion of X. The new set I formed by this partitioning process is reduced in size. The idea behind this partitioning scheme is to exploit the hierarchical pyramidal structure of the subband decomposition, where it is more likely that energy is concentrated at the topmost levels of the pyramid, and as one goes down the pyramid the energy content decreases gradually. If a set I is significant against the same threshold n, it is most likely that the coefficients that cause I to be significant lie in the top left regions of I. These regions are decomposed into sets of type S and are put next in line for processing.
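The two partitioning rules described above can be sketched as follows; representing a set S as a rectangle (y, x, height, width) and a set I by the size of its chopped-off top-left block is an assumption made purely for this illustration:

```python
def quad_partition(S):
    """Quadtree partitioning: split a rectangular set S = (y, x, h, w) into four
    subsets O(S), each roughly one quarter of the parent."""
    y, x, h, w = S
    h1, w1 = (h + 1) // 2, (w + 1) // 2          # ceiling halves handle odd sizes
    return [(y,      x,      h1,     w1),
            (y,      x + w1, h1,     w - w1),
            (y + h1, x,      h - h1, w1),
            (y + h1, x + w1, h - h1, w - w1)]

def octave_partition(chop, img_h, img_w):
    """Octave-band partitioning: split set I (everything outside the top-left
    block of size chop = (sh, sw)) into three S sets of that same size plus a
    new, smaller set I, which becomes empty once it reaches the image size."""
    sh, sw = chop
    S_sets = [(0,  sw, sh, sw),                  # top-right block
              (sh, 0,  sh, sw),                  # bottom-left block
              (sh, sw, sh, sw)]                  # bottom-right block
    new_chop = (2 * sh, 2 * sw)
    new_I = None if (new_chop[0] >= img_h and new_chop[1] >= img_w) else new_chop
    return S_sets, new_I

# Example: 512x512 image, root S of size 8x8.
print(quad_partition((0, 0, 8, 8)))              # four 4x4 subsets
print(octave_partition((8, 8), 512, 512))        # three 8x8 S sets and a reduced I
```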

In this way, regions that are likely to contain significant coefficients are grouped into relatively small sets and processed first, while regions that are likely to contain insignificant coefficients are grouped into a large set; a single bit may be enough to code this large region against the particular threshold. Hence, once the set I is partitioned by the octave-band partitioning method, the three sets S are processed in the regular image-scanning order, after which the newly formed, reduced set I is processed. It should be noted that processing the set I is a recursive process, and depending on the characteristics of the image, at some point in the algorithm the set I will cover only the lowest (bottom) level of the pyramidal structure. When, at this point, the set I tests significant against some threshold, it is broken down into three sets S, but there is no new reduced set I; to be precise, the new set I is an empty set. Once all the sets S and I have been processed, the sorting pass for that particular threshold n is complete, after which the refinement pass is initiated; it refines the quantization of the coefficients in the LSP, i.e. those coefficients which tested significant during the previous sorting passes. Once this is done, the threshold is lowered (the quantization step), the LIS is revisited, and the sequence of sorting and refinement passes is repeated against this lower threshold. It is important to note that in the next round of the sorting pass, the sets of type S in the LIS are processed in increasing order of their size; in other words, for a square image, sets of size 1 (individual coefficients) are processed first, sets of size 4 (blocks of 2 x 2 coefficients) next, and so on. The algorithm repeats until the desired rate is achieved or, in the case of lossless or nearly lossless compression, until all thresholds down to the last one, corresponding to n = 0, are tested.
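The overall pass structure can be summarized by the sketch below; it is an illustrative simplification (the LIS is seeded with individual coefficient positions rather than the actual S and I sets, and output bits are only counted), not the authors' implementation:

```python
import numpy as np

def speck_pass_skeleton(coeffs):
    """Skeleton of SPECK's sorting/refinement pass structure (simplified)."""
    n_max = int(np.floor(np.log2(np.max(np.abs(coeffs)))))
    LIS = list(np.ndindex(coeffs.shape))     # List of Insignificant Sets (here: pixels)
    LSP = []                                 # List of Significant Pixels
    bits = 0
    for n in range(n_max, -1, -1):
        previously_significant = list(LSP)
        # Sorting pass against the threshold 2^n.
        still_insignificant = []
        for idx in LIS:
            bits += 1                        # significance decision bit
            if abs(coeffs[idx]) >= 2 ** n:
                LSP.append(idx)
                bits += 1                    # sign bit
            else:
                still_insignificant.append(idx)
        LIS = still_insignificant
        # Refinement pass: n-th magnitude bit of pixels found significant earlier.
        bits += len(previously_significant)
        # Quantization step: the threshold is halved and the passes repeat.
    return LSP, bits

coeffs = np.array([[34.0, -3.0], [9.0, 0.4]])
print(speck_pass_skeleton(coeffs))
```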

3. COLOR IMAGE CODING

Tristimulus color spaces have been successfully used for the representation of color. The color space most often used in still image compression is the red (R), green (G) and blue (B), or RGB, space. Other color spaces, such as luminance-chrominance spaces, have been used in digital video systems; the most popular of these are the YUV and YCrCb spaces. Generally, the components of many of these spaces are highly correlated, and this correlation has not been fully exploited in color image compression. The use of an appropriate color transformation can provide a significant increase in compression performance. In this paper, we use the YUV space, and the transformation from RGB to YUV space is as follows:

$$
\begin{bmatrix} Y \\ U \\ V \end{bmatrix}
=
\begin{bmatrix}
 0.299 &  0.587 &  0.114 \\
 0.500 & -0.419 & -0.081 \\
-0.169 & -0.331 & -0.500
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\qquad (2)
$$

and the reverse transformation is

$$
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix}
 1.0012 &  1.4017 &  0.0012 \\
 1.4451 & -0.7137 &  0.4451 \\
-1.2951 & -0.0013 & -2.2951
\end{bmatrix}
\begin{bmatrix} Y \\ U \\ V \end{bmatrix}.
\qquad (3)
$$

One of the advantages of this transformation is that it reduces the psychovisual redundancy in an image. Also, it has the important feature that the Y or the luminance component is the compatible monochrome (grayscale) version of the image. It has been shown that the human visual system (HVS) is relatively insensitive to the high frequency content of the chrominance components (refer to Poynton6 ). Thus, these components are commonly sub-sampled to remove redundancy.
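The transform can be applied directly with the matrices of Eqs. (2) and (3); the following sketch simply wraps them (the function names are ours):

```python
import numpy as np

# Transform matrices taken from Eqs. (2) and (3).
RGB_TO_YUV = np.array([[ 0.299,  0.587,  0.114],
                       [ 0.500, -0.419, -0.081],
                       [-0.169, -0.331, -0.500]])

YUV_TO_RGB = np.array([[ 1.0012,  1.4017,  0.0012],
                       [ 1.4451, -0.7137,  0.4451],
                       [-1.2951, -0.0013, -2.2951]])

def rgb_to_yuv(rgb):
    """rgb: array of shape (..., 3); returns YUV of the same shape."""
    return rgb @ RGB_TO_YUV.T

def yuv_to_rgb(yuv):
    return yuv @ YUV_TO_RGB.T

# Round trip on a single pixel; exact only up to the rounding of the published entries.
pixel = np.array([200.0, 120.0, 40.0])
print(yuv_to_rgb(rgb_to_yuv(pixel)))    # approximately [200, 120, 40]
```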

3.1. Color Image Coding Using SPECK

A simple application of SPECK to color images would be to code each color plane separately, as a conventional color image coder does, and then concatenate the generated bit-streams of the planes serially. However, this simple method would require bit allocation among color components, losing precise rate control, and would fail to meet the requirement of full embeddedness of the image codec, since the decoder would need to wait until the full bit-stream arrives to reconstruct and display. Instead, one can treat all color planes as one unit at the coding stage and generate one mixed bit-stream, so that we can stop at any point of the bit-stream and reconstruct the color image with the best quality at that bit-rate.

Figure 2. Set-Partitioning scheme for CSPECK (YUV 4:2:0 assumed).

In addition, the coder automatically allocates bits optimally among the color planes. By doing so, we retain the full embeddedness and precise rate control of SPECK. We call this scheme Color-SPECK (CSPECK); unlike the plane-by-plane approach, we can stop at any point of the bit-stream and still reconstruct a color image at that bit-rate.

Let us consider the tristimulus color space YUV (4:2:0 format), where Y stands for the luminance plane, and U and V stand for the chrominance planes, which are sub-sampled by a factor of 2 in both the horizontal and vertical directions. One motivation for using this format is that the human visual system (HVS) is relatively insensitive to the high-frequency content of the chrominance components; thus these components are commonly sub-sampled to remove redundancy, as in the JPEG and MPEG standards. Each color plane is separately wavelet transformed (using the 9/7 filter), having its own pyramid structure. To code all color planes together, the SPECK algorithm begins by initializing the LIS (List of Insignificant Sets) with the appropriate coordinates of the top level in all three planes. The set partitioning rules are similar to the original SPECK algorithm (refer to Islam4), except that now there are three LISs but only one LSP. Each color plane is partitioned into sets S and I, as shown in Fig. 2, which are mutually exclusive and exhaustive among color planes. This ensures that the algorithm automatically assigns the bits among the planes according to the significance of the magnitudes of their own coordinates. The effect of the order in which the root pixels of each color plane are initialized is negligible except when coding at extremely low bit-rates. For a given threshold, the Y component is first partitioned and tested for significance (both S and I); once this is finished, the U and V components are partitioned and tested for the same threshold. The n-th bits of the LSP coefficients found significant in the previous passes are output, the bit-plane is reduced by one, and the algorithm repeats until the bit budget is met or, in the case of lossless coding, down through the lowest bit-plane. A back-end arithmetic entropy coder further compresses the bit-stream.
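A compact sketch of this coding order is given below. As with the earlier SPECK skeleton, it is an illustrative simplification (the per-plane LISs hold individual positions rather than the real S and I sets, and bits are only counted), not the authors' code; it is meant to show the three LISs, the single shared LSP, and the Y-then-U-then-V order within each bit-plane:

```python
import numpy as np

def cspeck_skeleton(Y, U, V, bit_budget):
    """Simplified CSPECK coding order: 3 LISs, 1 shared LSP, 1 shared bit budget."""
    planes = {'Y': Y, 'U': U, 'V': V}
    n_max = int(np.floor(np.log2(max(np.max(np.abs(p)) for p in planes.values()))))
    LIS = {k: list(np.ndindex(p.shape)) for k, p in planes.items()}   # three LISs
    LSP = []                                                          # one shared LSP
    bits = 0
    for n in range(n_max, -1, -1):
        previously_significant = list(LSP)
        for k in ('Y', 'U', 'V'):            # Y is tested first, then U, then V
            still_insignificant = []
            for idx in LIS[k]:
                bits += 1                    # significance decision bit
                if bits >= bit_budget:       # embedded: stop anywhere in the stream
                    return LSP, bits
                if abs(planes[k][idx]) >= 2 ** n:
                    LSP.append((k, idx))
                    bits += 1                # sign bit
                else:
                    still_insignificant.append(idx)
            LIS[k] = still_insignificant
        # Refinement pass: n-th bit of coefficients significant in earlier passes.
        bits += len(previously_significant)
    return LSP, bits

Y = np.random.randn(8, 8) * 100.0           # stand-ins for transformed planes
U = np.random.randn(4, 4) * 30.0
V = np.random.randn(4, 4) * 30.0
print(cspeck_skeleton(Y, U, V, bit_budget=200))
```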

3.2. Results and Discussion on CSPECK

In this section, we compare CSPECK with other algorithms on the color Akiyo image (QCIF and CIF formats) in the YUV 4:2:0 format. There are not many reports of color image coding in the literature, since chrominance components have been believed to be comparatively easy to code; consequently, not much attention has been devoted to setting up standard criteria for evaluating color image compression. Notwithstanding, here we report the results of color image compression in terms of the PSNR of each color plane. We compare the CSPECK algorithm with other embedded image coders: the Predictive Embedded Zero-tree Wavelet (PEZW), an improved version of the original EZW that is currently in the MPEG-4 verification model (VM) 6.0, SPIHT, and JPEG2000 (VM 8.0). The test results of PEZW for various types of color images were obtained by Kim.2 We tried to match the number of bits, type of filters and number of decompositions as closely as possible for fair comparisons. The 9/7 bi-orthogonal Daubechies filter of Antonini et al.1 is used for the 2-D wavelet decomposition. Table 1 provides a comparison of CSPECK, SPIHT, JPEG2000 and PEZW on the color Akiyo image in QCIF and CIF formats at various bit-rates. From the table, we observe that the performance of CSPECK is best for the luminance (Y) component, whereas JPEG2000 performs better for the chrominance components (U and V). But since the human eye is more sensitive to changes in brightness, we claim that the gain in dB for the Y component obtained by CSPECK yields visually better results. This is in accordance with the images shown in Fig. 6 and Fig. 7. For comprehensive simulation results on various color test images at various bit-rates with QCIF and CIF formats, the reader is referred to Pearlman.7

4. MOTION-SPECK: EMBEDDED COLOR MOVING PICTURE CODING WITH SPECK

In this section, we propose a new moving-picture coding system with an intra-based coding scheme as the core technology. This differs significantly from the current moving picture standards, MPEG (MPEG-1, MPEG-2 and MPEG-4). It is well known that MPEG outperforms intra-based coding schemes in terms of compression efficiency because MPEG takes advantage of motion prediction between frames. On the other hand, intra-based coding systems like Motion JPEG and Motion JPEG2000 (MJP2) provide powerful features, such as embeddedness, frame-by-frame editing, arbitrary frame extraction, and robustness to bit errors in error-prone channel environments. These features are very important for consumer applications as well as professional broadcasting systems. We call this new intra-based coding system built on SPECK Motion-SPECK and provide a full description of the scheme. This coding scheme has the potential to provide subjective image quality superior or comparable to Motion JPEG2000 without sacrificing performance at other points in the rate-distortion spectrum.

4.1. Constant-Rate mode

The proposed moving picture coding system is shown in Fig. 3. The encoder consists primarily of a 2-D wavelet analysis part, without any kind of motion compensation, and a coding part with CSPECK as the core algorithm; the decoder has a structure symmetric to the encoder. Frames from the input sequence are assumed to be in the YUV 4:2:0 format (QCIF or CIF). The frames are treated individually, i.e., each frame undergoes the 2-D wavelet analysis followed by the CSPECK algorithm, and the resulting data is further compressed by an entropy coder, in this case an arithmetic coder. The user inputs a specific rate in bits/pixel/frame, and this is used to determine the bit budget allotted to each frame. For example, a rate of 0.5 bits/pixel/frame for a QCIF image sequence consisting of 300 frames implies a bit budget of 0.5 x 144 x 176 = 12672 bits, or 1584 bytes, for each frame of the sequence, which amounts to a total of 1584 x 300 = 475200 bytes for the compressed bit-stream. With this scheme, there is no complication of rate allocation, nor is there a feedback loop of a prediction error signal, which could slow down the system. The omission of a motion compensation stage may yield lower compression but results in a very flexible codec, since each frame can be decoded independently of other frames. It should be noted that we generate one mixed bit-stream of the three components, so that we can stop at any point in the bit-stream and still reconstruct the color video sequence at the corresponding bit-rate. Each color plane has its own pyramidal structure, and CSPECK automatically assigns appropriate bit allocations to each color plane, as discussed in the previous section. This results in an embedded output bit-stream, and these features make the scheme desirable in some of the applications mentioned previously. The complexity of the system is low since there is no motion compensation; it is effectively the same as that of the CSPECK coder, applied to each frame of the sequence.
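As a trivial sketch of this budget calculation (the function name is ours):

```python
def frame_bit_budget(rate_bpp, width, height, num_frames=1):
    """Bit budget per frame for the constant-rate mode, plus the total stream size in bytes."""
    bits_per_frame = rate_bpp * width * height
    bytes_per_frame = bits_per_frame / 8.0
    return bits_per_frame, bytes_per_frame, bytes_per_frame * num_frames

# QCIF (176x144) at 0.5 bits/pixel/frame, 300 frames:
print(frame_bit_budget(0.5, 176, 144, 300))   # (12672.0, 1584.0, 475200.0)
```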

4.2. Constant-Distortion mode

It is often desirable to obtain the same quality for each frame of a sequence, which results in a very pleasing experience for the human eye. In order to have the same quality we need approximately the same distortion (PSNR) across different frames; the method to achieve this is described below. The input sequence is treated frame by frame as in Fig. 3. The additional step is that after each frame, consisting of the three color components (Y, U and V), is wavelet transformed, the resulting coefficients are passed through a uniform quantizer before CSPECK is applied. The quantizer step size is chosen according to the quality factor input by the user.

Figure 3. Block diagram of the proposed codec: Motion-SPECK

The equations governing this are as follows:

$$
\text{ColorWeight}[\text{component}] = \{1.0,\ 0.7,\ 0.7\} \qquad (4)
$$
$$
\text{Quality} = \langle \text{input} \rangle \qquad (5)
$$
$$
\text{QualMult} = 0.02; \quad \text{QualConst} = 5.0 \qquad (6)
$$
$$
\frac{1}{\delta} = 0.208 \cdot e^{\,\text{QualMult}\cdot(\text{Quality}+1)-\text{QualConst}} \qquad (7)
$$
$$
\frac{1}{\delta_Y} = \frac{1}{\delta}\cdot \text{ColorWeight}[Y] \qquad (8)
$$
$$
\frac{1}{\delta_U} = \frac{1}{\delta}\cdot \text{ColorWeight}[U] \qquad (9)
$$
$$
\frac{1}{\delta_V} = \frac{1}{\delta}\cdot \text{ColorWeight}[V] \qquad (10)
$$

Here Quality is input by the user and can take a value from 0 to 250, and the component index refers to the Y, U and V components. As can be seen, the ColorWeight for the U and V components is 0.7 times that of Y, which in turn results in larger values of δ_U and δ_V than δ_Y. The wavelet coefficients of the three components are quantized by the uniform quantizer using the respective step sizes. Note that the lossless S+P filter10 was used for the wavelet transform. Although the S+P filter results in a slight loss in compression ratio compared to the 9/7 Daubechies floating-point filter, it has the added advantage of higher speed owing to less computation, and it ensures that there is no loss in the wavelet-transform step, owing to the lossless nature of the S+P transform. It is important to note that once the step sizes δ_Y, δ_U and δ_V are determined by the above equations, the same values are used for every frame of the input sequence. This results in approximately the same mean-squared error (MSE) in every frame; hence each frame is of the same quality in the sense that the MSE, or distortion, is approximately the same.
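Equations (4)-(10) translate directly into the short sketch below (the function name is ours; the constants are those given above):

```python
import math

def quantizer_step_sizes(quality):
    """Per-plane uniform-quantizer step sizes for the constant-distortion mode,
    following Eqs. (4)-(10). `quality` ranges from 0 to 250."""
    color_weight = {'Y': 1.0, 'U': 0.7, 'V': 0.7}                          # Eq. (4)
    qual_mult, qual_const = 0.02, 5.0                                       # Eq. (6)
    inv_delta = 0.208 * math.exp(qual_mult * (quality + 1) - qual_const)    # Eq. (7)
    # Eqs. (8)-(10): 1/delta_c = (1/delta) * ColorWeight[c]
    return {c: 1.0 / (inv_delta * w) for c, w in color_weight.items()}

# The same step sizes are reused for every frame of the sequence, giving
# approximately constant MSE; note delta_U = delta_V = delta_Y / 0.7, i.e. the
# chrominance planes are quantized more coarsely than luminance.
print(quantizer_step_sizes(100))
```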

4.3. Simulation and Results

In this section, we report results for both modes. The constant-rate mode was tested on a number of sequences at different rates (bits/pixel/frame), and the results were compared with Motion-JPEG2000 (VM 2.1). Rates of 0.25, 0.5 and 1.0 bits/pixel/frame were used for the CIF sequences Coastguard, Container and Foreman, each consisting of 300 frames. Table 2 tabulates those results in terms of the average PSNR of the Y, U and V components over the 300 frames. The plots in Fig. 4 show the PSNR of the Y component in dB. It can be inferred from the results that Motion-SPECK outperforms Motion-JPEG2000 in terms of Y-component PSNR for the Foreman sequence and is very close for the other two sequences, while Motion-JPEG2000 does better for the U and V components. The subjective quality of Motion-SPECK is comparable to that of Motion-JPEG2000 (refer to Fig. 8). It is important to note that a single embedded bit-stream, generated by Motion-SPECK at 1.0 bits/pixel/frame, was used to decode at the different rates of 0.25 and 1.0; exact bit-rates were achieved for any rate less than or equal to the coded rate. On the other hand, exact bit-rates are not achieved with Motion-JPEG2000 due to the way rate allocation is performed in the EBCOT algorithm.

Similarly, the constant-distortion mode was tested on the CIF sequences (352 x 288, 300 frames, YUV 4:2:0) Coastguard, Container and Foreman for quality factors of 50, 100, 150 and 200, and the results were compared with Motion-JPEG2000 (Fig. 5). In order to have a fair comparison, the bits/pixel/frame for Motion-JPEG2000 was determined by first running Motion-SPECK on each sequence and noting the file sizes of the resulting compressed bit-streams; the bit-rate (bits/pixel/frame) determined from this actual file size was then used to generate the Motion-JPEG2000 bit-stream so as to match the file size. By the nature of the rate allocation scheme used in Motion-JPEG2000, it is not possible to obtain the exact rate, only a close approximation; hence we also give the actual file sizes of the compressed bit-streams for both schemes. Table 3 gives an exhaustive comparison of the two coders. As in the constant-rate mode, Motion-SPECK does slightly better for the Y component and worse for the other two components, because Motion-SPECK allocates more bits to the Y component than Motion-JPEG2000. The subjective quality of the images is comparable, as seen in Fig. 9.

5. CONCLUSIONS

In this work, we developed a new color-embedded coding scheme for still images by modifying the SPECK algorithm. Color-embedded coding means that the generated bit-stream is a mixture of the three color planes, so that the decoder can choose any quality of the decoded color image from a single bit-stream and can improve the image quality by receiving more bits from the source. We call this scheme Color-SPECK, or CSPECK, and simulation results indicate that it preserves desirable properties such as automatic optimal rate allocation (within the framework of the algorithm) among color planes and among pixels within a plane, precise rate control, and fast coding/decoding. For color-embedded image coding in particular, the coding results are among the best known thus far in terms of PSNR.

We then presented a way of extending CSPECK to the coding of moving color image/video frames. Motion-SPECK, an intra-based coding scheme with CSPECK as the core algorithm, is a low-complexity moving-picture (video) coding algorithm with clear advantages over rivals such as the standard Motion-JPEG2000. Specifically, we demonstrated the constant-rate and constant-distortion modes of Motion-SPECK. Although it is well known that inter-frame coding schemes, which exploit redundancy in the temporal domain, outperform intra-based coding schemes in terms of compression ratio, intra-based coding schemes nevertheless have their own advantages, such as embeddedness, easy editing and robustness to error-prone environments, which the former schemes fail to provide.

REFERENCES

[1] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies. Image Coding Using Wavelet Transform. IEEE Transactions on Image Processing, 1992.
[2] Beong-Jo Kim. Embedded Video Subband Coding With 3-D Set Partitioning in Hierarchical Trees. Ph.D. thesis, Rensselaer Polytechnic Institute, December 1992.
[3] T. Cover and J. Thomas. Elements of Information Theory. John Wiley and Sons, Inc., 1991.
[4] A. Islam and W. A. Pearlman. An Embedded and Efficient Low-Complexity Hierarchical Image Coder. Visual Communications and Image Processing, January 1999.
[5] A. Islam. Set-Partitioned Image Coding. Ph.D. thesis, ECSE Dept., RPI, September 1999.
[6] C. A. Poynton. A Technical Introduction to Digital Video. John Wiley and Sons, 1996.
[7] William A. Pearlman, Asad Islam, Nithin Nagaraj, and Amir Said. Efficient, Low-Complexity Image Coding with a Set-Partitioning Embedded Block Coder. IEEE Transactions on Circuits and Systems for Video Technology, accepted for publication, in press.
[8] J. M. Shapiro. Embedded Image Coding Using Zerotrees of Wavelet Coefficients. IEEE Transactions on Signal Processing, December 1993.
[9] A. Said and W. A. Pearlman. A New Fast and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees. IEEE Transactions on Circuits and Systems for Video Technology, June 1996.
[10] A. Said and W. A. Pearlman. An Image Multiresolution Representation for Lossless and Lossy Compression. IEEE Transactions on Image Processing, September 1996.

Table 1. Comparison of CSPECK with various algorithms on the color Akiyo image. For each coder, the number of bits used and the PSNR (dB) of the Y, U and V components are given.

QCIF 176x144:
  PEZW               10256 bits   Y 32.3    U 34.2    V 36.9
  SPIHT              10000 bits   Y 32.34   U 35.02   V 38.29
  JPEG2000 (VM 8.0)  10048 bits   Y 30.74   U 36.65   V 38.14
  CSPECK             10000 bits   Y 33.29   U 34.43   V 36.65

QCIF 176x144:
  PEZW               20816 bits   Y 37.5    U 39.1    V 41.0
  SPIHT              20000 bits   Y 38.22   U 39.97   V 41.12
  JPEG2000 (VM 8.0)  19992 bits   Y 36.69   U 42.71   V 42.72
  CSPECK             20000 bits   Y 38.68   U 38.75   V 39.84

QCIF 176x144:
  PEZW               29240 bits   Y 40.8    U 41.5    V 42.6
  SPIHT              30000 bits   Y 41.91   U 43.68   V 44.04
  JPEG2000 (VM 8.0)  29960 bits   Y 40.87   U 46.36   V 46.64
  CSPECK             30000 bits   Y 42.17   U 43.10   V 43.56

CIF 352x288:
  PEZW               25112 bits   Y 34.7    U 37.7    V 40.1
  SPIHT              25000 bits   Y 35.33   U 38.70   V 40.60
  JPEG2000 (VM 8.0)  24928 bits   Y 34.45   U 41.55   V 42.15
  CSPECK             25000 bits   Y 35.63   U 37.28   V 38.95

CIF 352x288:
  PEZW               49016 bits   Y 39.3    U 41.3    V 43.6
  SPIHT              50000 bits   Y 40.51   U 42.64   V 44.01
  JPEG2000 (VM 8.0)  49720 bits   Y 39.85   U 45.08   V 46.19
  CSPECK             50000 bits   Y 40.56   U 40.68   V 42.36

CIF 352x288:
  PEZW               70448 bits   Y 42.2    U 43.2    V 45.2
  SPIHT              70000 bits   Y 43.14   U 44.87   V 46.13
  JPEG2000 (VM 8.0)  70000 bits   Y 42.48   U 47.40   V 48.06
  CSPECK             70000 bits   Y 42.92   U 44.40   V 45.83

Table 2. Comparison of Motion-JPEG2000 (VM 2.1) and Motion-SPECK (constant-rate mode) on CIF (352x288) sequences of 300 frames. File sizes are in bytes; PSNR values are the Y, U and V means (dB) over the 300 frames.

Coastguard, 0.25 bits/pixel/frame:
  Motion-JPEG2000    940038 bytes   Y 27.46   U 40.94   V 42.79
  Motion-SPECK       950400 bytes   Y 27.41   U 37.05   V 39.96
Coastguard, 0.50 bits/pixel/frame:
  Motion-JPEG2000   1884484 bytes   Y 30.64   U 42.40   V 44.06
  Motion-SPECK      1900800 bytes   Y 30.36   U 39.98   V 41.77
Coastguard, 1.00 bits/pixel/frame:
  Motion-JPEG2000   3777731 bytes   Y 34.97   U 44.43   V 45.94
  Motion-SPECK      3801600 bytes   Y 34.55   U 42.20   V 43.96

Container, 0.25 bits/pixel/frame:
  Motion-JPEG2000    943419 bytes   Y 28.14   U 38.04   V 38.10
  Motion-SPECK       950400 bytes   Y 28.20   U 35.31   V 34.07
Container, 0.50 bits/pixel/frame:
  Motion-JPEG2000   1888200 bytes   Y 32.34   U 41.34   V 40.88
  Motion-SPECK      1900800 bytes   Y 32.21   U 37.63   V 37.33
Container, 1.00 bits/pixel/frame:
  Motion-JPEG2000   3795336 bytes   Y 37.21   U 45.01   V 45.48
  Motion-SPECK      3801600 bytes   Y 37.20   U 40.68   V 40.68

Foreman, 0.25 bits/pixel/frame:
  Motion-JPEG2000    941943 bytes   Y 30.41   U 39.05   V 40.20
  Motion-SPECK       950400 bytes   Y 31.04   U 36.09   V 36.17
Foreman, 0.50 bits/pixel/frame:
  Motion-JPEG2000   1887527 bytes   Y 33.71   U 41.51   V 43.20
  Motion-SPECK      1900800 bytes   Y 34.29   U 38.11   V 38.82
Foreman, 1.00 bits/pixel/frame:
  Motion-JPEG2000   3784433 bytes   Y 37.87   U 44.68   V 46.22
  Motion-SPECK      3801600 bytes   Y 38.52   U 40.94   V 42.18

Table 3. Comparison of Motion-JPEG2000 (VM 2.1) and Motion-SPECK (constant-distortion mode) on CIF (352x288) sequences of 300 frames. File sizes are in bytes; PSNR values are the Y, U and V means (dB) over the 300 frames.

Container, quality factor 50:
  Motion-JPEG2000    227425 bytes   Y 22.51   U 34.30   V 32.96
  Motion-SPECK       228539 bytes   Y 22.93   U 28.58   V 28.16
Container, quality factor 100:
  Motion-JPEG2000    798763 bytes   Y 27.33   U 37.58   V 37.52
  Motion-SPECK       804789 bytes   Y 27.20   U 33.73   V 33.44
Container, quality factor 150:
  Motion-JPEG2000   2133787 bytes   Y 33.14   U 41.90   V 41.94
  Motion-SPECK      2139080 bytes   Y 32.84   U 37.66   V 37.43
Container, quality factor 200:
  Motion-JPEG2000   4754242 bytes   Y 38.88   U 46.89   V 47.03
  Motion-SPECK      4759690 bytes   Y 38.95   U 42.03   V 42.05

Foreman, quality factor 50:
  Motion-JPEG2000    154196 bytes   Y 23.18   U 33.98   V 33.52
  Motion-SPECK       152959 bytes   Y 24.25   U 30.57   V 29.13
Foreman, quality factor 100:
  Motion-JPEG2000    562898 bytes   Y 28.21   U 37.60   V 38.39
  Motion-SPECK       568804 bytes   Y 28.35   U 34.51   V 34.16
Foreman, quality factor 150:
  Motion-JPEG2000   1774737 bytes   Y 33.39   U 41.32   V 43.01
  Motion-SPECK      1775820 bytes   Y 33.29   U 37.62   V 38.38
Foreman, quality factor 200:
  Motion-JPEG2000   4387401 bytes   Y 38.88   U 45.46   V 46.90
  Motion-SPECK      4387950 bytes   Y 39.23   U 41.06   V 42.72

Coastguard, quality factor 50:
  Motion-JPEG2000    182024 bytes   Y 22.34   U 38.19   V 40.83
  Motion-SPECK       184429 bytes   Y 22.65   U 28.39   V 29.04
Coastguard, quality factor 100:
  Motion-JPEG2000    733581 bytes   Y 26.55   U 40.44   V 42.28
  Motion-SPECK       740849 bytes   Y 26.23   U 37.52   V 37.73
Coastguard, quality factor 150:
  Motion-JPEG2000   2383848 bytes   Y 31.94   U 43.11   V 44.73
  Motion-SPECK      2402010 bytes   Y 31.39   U 41.10   V 42.01
Coastguard, quality factor 200:
  Motion-JPEG2000   5641570 bytes   Y 38.10   U 46.46   V 47.68
  Motion-SPECK      5661550 bytes   Y 37.84   U 43.15   V 44.93


Figure 4. PSNR vs. frame number (1-300) for Foreman. (a) Left: 0.25 bits/pixel/frame. (b) Right: 0.5 bits/pixel/frame. (Solid line: Motion-SPECK in the Constant-Rate mode. Broken line: Motion-JPEG2000.)

Figure 5. PSNR vs. frame number (1-300) for Container. (a) Left: Quality-factor=50. (b) Right: Quality-factor=200. (Solid line: Motion-SPECK in the Constant-Distortion mode. Broken line: Motion-JPEG2000.)


Figure 6. (a) Left: Original Foreman QCIF image. (b) Right: CSPECK at 19.3 Kbits (PSNR Y:34.77 dB, U:38.08 dB, V:36.76 dB).

Figure 7. (c) Left: SPIHT at 19.3 Kbits (Y:34.32 dB, U:40.09 dB, V:40.32 dB). (d) Right: JPEG2000 (VM 8.1) at 19.3 Kbits (Y:33.65 dB, U:41.97 dB,V:43.10 dB).

Figure 8. The same reconstructed frame (50) of Foreman at 1.0 bits/pixel/frame. (a) Left: Original. (b) Middle: Motion-SPECK (constant-rate mode). (c) Right: Motion-JPEG2000.

Figure 9. Frame no. 98 of the Container sequence. (a) Left: Original. (b) Middle: Motion-SPECK in constant-distortion mode (Quality-factor=100). (c) Right: Motion-JPEG2000.

