Fig. 1. A typical scalable video encoder

video sequence. Here some information about the new object can be obtained from the co-located base layer block, but is not present in the temporal (inter) or spatial (intra) domains. We denote the residue obtained using this mode as “IntraBL residue”. In almost all prior work in the literature, research has focused on the coding of intra or inter residue. Recently, Han et al. [7, 8] showed that DST-Type 7 is the optimal transform for the horizontal and vertical prediction modes in H.264/AVC. Saxena & Fernandes then showed in [13] that this is indeed the case for the other oblique directional modes as well, and their work on DST [14, 16] was adopted in the HEVC standard. An et al. showed how to use DST for the inter residues in [2]. In addition, in [1] and [15], a different class of secondary transforms was proposed for the intra/inter residues, which apply only to the low-frequency coefficients of the DCT (unless otherwise specified, DCT refers to DCT-Type 2 throughout the paper). All these aforementioned schemes primarily targeted intra or inter residues.

Recently, Guo et al. in [6] and [5] proposed an alternate transform scheme for coding the “IntraBL residue”. They show that transforms other than DCT-Type 2, when applied to the IntraBL block residue, can provide substantial gains. Specifically, at sizes 4 to 32, DCT-Type 3 and DST-Type 3 were used as alternates to the conventional DCT-Type 2 transform. At the encoder, a Rate-Distortion (R-D) search was performed and one of the following transforms was chosen: DCT-Type 2, DCT-Type 3, or DST-Type 3. The transform choice is signaled to the decoder by a flag (which can take 3 values, one for each of the 3 transforms). At the decoder, the flag is parsed and the corresponding inverse transform is used. However, the scheme would require two additional transform cores at each of the sizes 4, 8, 16 and 32. This means 8 new, additional

Fig. 2. Comparison of the energy distributions for intra/inter residual, and inter-layer residual for a video sequence (reproduced from [5]). Figure best viewed in color.

transform cores (2 transforms x 4 sizes = 8) are required. Furthermore, additional transform cores, especially at size 32x32, are extremely expensive to implement in hardware. Hence, to avoid large alternate transforms for inter-layer prediction residues, there is a need for a low-complexity transform method that can be applied efficiently on the IntraBL residues. To avoid the above shortcomings, and to improve the coding efficiency of S-HEVC, in this paper we propose to use secondary transforms, which we describe in detail in the next section. We show that our proposed secondary transforms perform close to the transforms proposed by Guo et al., but at almost negligible complexity as compared to [6, 5].

The rest of the paper is organized as follows: Sec. 2 reviews the application of DCT Type-3 and DST Type-3 transforms for scalable video coding from [5]. The preliminaries for secondary transforms, and the proposed secondary transform scheme for scalable video coding, are described in Sec. 3. A low-complexity version of secondary transforms, Rotational transforms, is given in Sec. 4. Experimental results with a discussion on the complexity of the proposed transforms are given in Sec. 5, followed finally by Conclusions in Sec. 6.

2. ALTERNATE DCT AND DST TRANSFORM SCHEME FOR SCALABLE VIDEO CODING: REVIEW

In this section, we briefly review the alternate DCT-Type 3 and DST-Type 3 scheme presented in [6, 5]. Let x denote the original video sequence, and x̂_BL denote the reconstruction at the base layer. For the spatial scalability scenario, x̂_BL is up-sampled, and then subtracted from x to construct the inter-layer prediction residual e_IL (see Fig. 1). For the SNR scalability scenario, x̂_BL is directly subtracted from x to obtain e_IL without intermediate up-sampling. We reproduce Fig. 2 (c,d) from [5] in Fig. 2, which shows the energy distribution of the transform coefficients for the intra/inter residual and for the inter-layer prediction residual e_IL for a video sequence at transform size 16x16.
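The two constructions of the inter-layer residual described above can be written compactly as follows. The operator $U(\cdot)$ is a notational assumption introduced only for this summary; it stands for whatever up-sampling filter the codec applies and is not a symbol taken from [5] or [6].

```latex
e_{IL} =
\begin{cases}
x - U(\hat{x}_{BL}), & \text{spatial scalability (base layer up-sampled first)},\\
x - \hat{x}_{BL},    & \text{SNR scalability (no up-sampling)}.
\end{cases}
```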
From Fig. 2, it is evident that the residual e_IL contains energy in the medium and high frequencies as well, which is not generally the case for intra/inter residues. The primary reason for this is that the base layer reconstruction x̂_BL does not contain much medium/high frequency information, due to heavier quantization at the base layer. Also, the upsampling process is inherently a low-pass

Fig. 3. Transform Basis for 4x4 transforms: DCT Type-2 (top), DCT Type-3 (bottom left) and DST Type-3 (bottom right) (reproduced from Fig. 3 in [5]). Figure best viewed in color.

operation in scalable HEVC [4]. Hence e_IL contains energy in the medium and high frequencies as well. For such a residue, transforms (other than DCT-Type 2) which efficiently capture medium to high frequencies provide more efficient representations. Such an observation was made in [5], where the authors show that DCT-Type 3 and DST-Type 3, with basis vectors as shown in Fig. 3, can represent the medium/high frequencies efficiently. In contrast, the first basis vector of DCT-Type 2 is flat (Fig. 3), and therefore emphasizes the DC component of the signal more than the medium/high frequencies.

Another advantage of using DCT and DST of Type 3 is that these transforms can be easily obtained from DCT Type-2: DCT Type-3 is just the transpose (or inverse, as DCT Type-2 is unitary) of DCT Type-2, and DST-Type 3 can be obtained from DCT-Type 3 via some additional permutation matrices [12]. Therefore, both DCT and DST of Type 3 have fast implementations. But they do require additional transform cores in a video codec. For example, when DCT Type-3 is being used at the encoder, it cannot re-use the inverse transform architecture of DCT Type-2, as the decoder (inside the encoder) might be using the inverse transform. Also, a device which only has the decoder (e.g., a mobile phone), and not the encoder, won’t have the forward transform for DCT Type-2, even though the inverse of DCT Type-3 is the forward DCT Type-2. Hence Guo et al.’s scheme requires additional transforms to be implemented in hardware. Next, we move on to secondary transforms, and show how these secondary transforms can provide performance close to Guo et al.’s scheme, but at a substantially lower complexity.

3. SECONDARY TRANSFORMS FOR SCALABLE VIDEO CODING

3.1. Preliminaries

Since most of the energy in the DCT coefficients is concentrated in the upper-left coefficients (see Fig. 4), it is, in general, sufficient to perform operations only on a small fraction of the DCT output (typically the 4x4 or 8x8 upper-left block).
These operations can be performed by simply using a secondary transform of size 4x4 or 8x8 on the upper-left block. Furthermore, the same secondary transform derived for a block size (such as 8x8) can be applied at higher block sizes (such as 16x16 or 32x32). This re-utilization at higher block sizes is one of the most important advantages of secondary transforms.
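As an illustration of this re-use, the sketch below applies one 8x8 integer secondary transform to the upper-left corner of a larger coefficient block and leaves all other coefficients untouched. It is a minimal sketch under stated assumptions, not the codec's implementation: the function name `apply_secondary_2d`, the separable form M * Y * M^T, and the rounding by a right shift are all assumptions made for this example, and the two test matrices (a scaled identity and a scaled row swap) are toy stand-ins for a real secondary transform.

```python
def matmul(a, b):
    """Plain integer matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def apply_secondary_2d(coeffs, m, shift=7):
    """Apply integer matrix m (entries scaled by 2**shift) to the upper-left
    corner of coeffs; coefficients outside that corner are copied unchanged.

    Assumed separable 2-D form: m * corner * m^T, which carries a total
    scale of 2**(2*shift) that is removed with a rounding right shift.
    """
    size = len(m)
    corner = [row[:size] for row in coeffs[:size]]
    product = matmul(matmul(m, corner), transpose(m))
    rounding = 1 << (2 * shift - 1)
    out = [list(row) for row in coeffs]
    for i in range(size):
        for j in range(size):
            out[i][j] = (product[i][j] + rounding) >> (2 * shift)
    return out


# Toy 16x16 "DCT output" and two toy 8x8 matrices scaled by 128.
block = [[16 * r + c for c in range(16)] for r in range(16)]
identity = [[128 if r == c else 0 for c in range(8)] for r in range(8)]
swap01 = [row[:] for row in identity]
swap01[0], swap01[1] = swap01[1], swap01[0]  # exchanges the first two basis vectors

unchanged = apply_secondary_2d(block, identity)
swapped = apply_secondary_2d(block, swap01)
```

The same call works unchanged for a 32x32 block, since only `len(m)` determines which corner is touched.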

[Fig. 4 labels: 32x32 block; 8x8 block of low-frequency coefficients]

3.2. Secondary transform matrices for IntraBL residue

In [6, 5], it was shown that the primary alternative transforms DCT Type-3 and DST Type-3 can be used in addition to DCT Type-2. One of the three possible transforms was selected via a Rate-Distortion search at the encoder, and the choice is signaled to the decoder via a flag. At the decoder, the flag is parsed and the corresponding inverse transform is used. Next, we show how a low-complexity secondary transform can be used for IntraBL residues. Here we show the derivation and usage of secondary transforms at block size K x K with K = 8, but this can be trivially extended to other block sizes.

Let us begin with a secondary transform at size 8. At size 8, we would always like to use DCT-Type 2 as the primary transform. Let $C$ denote the DCT Type-2. DCT Type-3, which is simply the inverse (or transpose) of DCT Type-2, is then given by $C^{T}$ (note that we ignore the normalization factors such as $\sqrt{2}$ in the definition of the DCTs, which is a common practice in the literature). Also let $S$ denote DST-Type 3. For an alternate primary transform $A$ (for example $C^{T}$ or $S$), an equivalent secondary transform is $M$ such that $C \ast M = A$, or $M = C^{T} \ast A$. Mathematically, it means that DCT followed by the secondary transform $M$ (i.e., $C \ast M$) is equivalent to the matrix $A$. Hence, if the alternate transform is DCT Type-3, i.e., $C^{T}$, then the corresponding secondary transform $M_c$ is $M_c = C^{T} \ast C^{T}$. For DST Type-3, the secondary matrix $M_s$ would be $C^{T} \ast S$. We next present the secondary transform matrices $M_c$ and $M_s$, corresponding to DCT Type-3 and DST Type-3 respectively, after rounding and shifting by 7 bits:

\begin{align}
M_c &= \mathrm{round}(C^{T} \ast C^{T} \ast 128) \tag{1} \\
&= \begin{bmatrix}
120 & 39 & -20 & -4 & -9 & -4 & -4 & -2 \\
-33 & 114 & 37 & -29 & -1 & -11 & -2 & -3 \\
26 & -26 & 116 & 22 & -33 & -2 & -10 & -1 \\
-9 & 29 & -14 & 119 & 4 & -33 & -2 & -7 \\
14 & -6 & 27 & -3 & 120 & -15 & -30 & -3 \\
-1 & 16 & 0 & 26 & 8 & 118 & -33 & -22 \\
8 & 2 & 14 & 6 & 26 & 20 & 114 & -45 \\
4 & 9 & 7 & 13 & 14 & 27 & 36 & 118
\end{bmatrix} \tag{2}
\end{align}

and

\begin{align}
M_s &= \mathrm{round}(C^{T} \ast S \ast 128) \tag{3} \\
&= \begin{bmatrix}
120 & -39 & -20 & 4 & -9 & 4 & -4 & 2 \\
33 & 114 & -37 & -29 & 1 & -11 & 2 & -3 \\
26 & 26 & 116 & -22 & -33 & 2 & -10 & 1 \\
9 & 29 & 14 & 119 & -4 & -33 & 2 & -7 \\
14 & 6 & 27 & 3 & 120 & 15 & -30 & 3 \\
1 & 16 & 0 & 26 & -8 & 118 & 33 & -22 \\
8 & -2 & 14 & -6 & 26 & -20 & 114 & 45 \\
-4 & 9 & -7 & 13 & -14 & 27 & -36 & 118
\end{bmatrix} \tag{4}
\end{align}

where the basis vectors for the transforms are along rows. Note that the presented transforms Mc and Ms above are of sizes 8x8. But they can be applied at block sizes 16x16 and 32x32 as well. Specifically, at the encoder for a 16x16 (and similarly 32x32) block, DCT Type-2 of size 16x16 would be applied first, followed by Mc (or Ms ) to only the low-frequency 8x8 coefficients. Also, note that we show the analysis in the case of 1-dimension. Extension to 2-d in the context of a video codec is trivial.
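The relation M_c = round(C^T * C^T * 128) can be checked numerically with a few lines of code. The sketch below is only an illustration under stated assumptions: it uses the orthonormal DCT Type-2 definition with basis vectors along rows, and the helper names `dct2_matrix`, `matmul` and `transpose` are introduced here for the example. It covers only M_c; no particular DST Type-3 normalization is assumed, so M_s is not computed.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT Type-2 matrix of size n x n, basis vectors along rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = dct2_matrix(8)
Ct = transpose(C)  # DCT Type-3: transpose (and inverse) of DCT Type-2

# C is unitary, so C * C^T should be the identity up to floating-point error.
CCt = matmul(C, Ct)
identity_error = max(abs(CCt[i][j] - (1.0 if i == j else 0.0))
                     for i in range(8) for j in range(8))

# Secondary transform equivalent to DCT Type-3, scaled by 128 and rounded.
Mc = [[round(128 * v) for v in row] for row in matmul(Ct, Ct)]

for row in Mc:
    print(" ".join(f"{v:5d}" for v in row))
```

Under these assumptions the first row starts with 120, 39 and the second with -33, in agreement with the leading entries of M_c above.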

Fig. 4. Low frequency components of a DCT transformed output

3.3. Application of secondary transforms in scalable video codec

We show the decoder operation for the secondary transforms in Fig. 5. The secondary transform is triggered when the transform unit (TU) size is greater than or equal to 8x8 for the Luma video component, the prediction mode is IntraBL, and the secondary transform flag is “ON” (1). The secondary transform implementation in a practical video codec can be summarized as follows for the encoder and the decoder:

Encoder
1. Select one of the following 3 transform choices (a), (b) or (c) for all the Transform Units in a Coding Unit (CU) via a Rate-Distortion search:
   (a) 2-d DCT.
   (b) 2-d DCT followed by secondary transform Mc.
   (c) 2-d DCT followed by secondary transform Ms.
2. Based on the choice made in step 1, set the flag to the appropriate transform choice: DCT, or DCT + Mc, or DCT + Ms.
3. Encode the coefficients obtained with the transform chosen in step 1, and encode the flag with the appropriate value (the flag is signaled only when there are non-zero coefficients in the block).

Decoder
1. Decode the flag and obtain the transform choice: DCT, or DCT + Mc, or DCT + Ms.
2. If (transform choice == DCT) {
      Apply inverse DCT.
   } else if (transform choice == secondary transform Mc) {
      Apply inverse secondary transform Mc.
      Apply inverse DCT.
   } else (i.e., transform choice == secondary transform Ms) {
      Apply inverse secondary transform Ms.
      Apply inverse DCT.
   }

Note that, in the above algorithm, we have assumed only 2 secondary transform choices. This can also be trivially extended to different transform sizes and block sizes. Also, similar to [6], the flag is encoded using truncated unary codes with codeword ‘0’ for DCT only, ‘10’ for DCT + Mc, and ‘11’ for DCT + Ms.

3.4. Comments on Proposed Secondary Transforms

1. The secondary transforms derived using DCT Type-3 and DST Type-3 have the same coefficients in magnitude, and only a few coefficients have alternate signs. This reduces the hardware implementation cost of the secondary transforms, since the same core with a few sign changes can be used as the second secondary transform.

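The first Rotational transform R1 (Eqn. 5, discussed in Sec. 4) is given as:

\[
R_1 = \begin{bmatrix}
87 & -93 & 12 & 0 & 0 & 0 & 0 & 0 \\
91 & 79 & -44 & 0 & 0 & 0 & 0 & 0 \\
25 & 38 & 120 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 118 & -50 & -5 & 0 & 0 \\
0 & 0 & 0 & -50 & -118 & -13 & 0 & 0 \\
0 & 0 & 0 & 1 & 14 & -128 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 128 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 128
\end{bmatrix}
\]

while the second Rotational transform, $R_2 = R_1^{T}$, is simply the transpose of the first Rotational transform.

Fig. 5. Decoder operations for the proposed secondary transforms scheme together with trigger conditions in scalable video coding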

2. It has been shown by Loeffler [10] that an 8x8 DCT Type-2 can be implemented via 11 multiplications and 29 additions. DCT Type-3, which is the transpose of DCT Type-2, can therefore also be implemented via 11 multiplications and 29 additions. Thus the 8x8 secondary transform Mc, which is a cascade of two inverse DCT Type-2 transforms, can be computed via 22 multiplications and 58 additions. This requires far fewer multiplications than a full matrix multiplication at size 8x8, i.e., 64 multiplications and 56 additions.
3. Similarly, the secondary transform corresponding to DST-Type 3 (which can be obtained by changing the signs of some transform coefficients of the previous secondary transform matrix Mc) can also be implemented via 22 multiplications and 58 additions.
4. The 4x4 secondary transforms (not presented in this paper) corresponding to DCT Type-3 and DST Type-3 actually have an even faster factorization, with only 6 multiplications, as compared to the cascade of two DCTs. Further details and the derivation for this are beyond the scope of this paper.
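To make the signalling of Sec. 3.3 concrete, the sketch below encodes the three transform choices with the truncated unary codewords ‘0’, ‘10’ and ‘11’, parses them back, and lists the inverse steps the decoder would run. It is a toy illustration rather than SHM code: the function names, the string labels for the choices, and the representation of the bit-stream as a Python string of ‘0’/‘1’ characters are all assumptions made for this example.

```python
# Truncated unary codewords for the three transform choices (from Sec. 3.3).
CODEWORDS = {"DCT": "0", "DCT+Mc": "10", "DCT+Ms": "11"}


def secondary_transform_allowed(tu_size, is_luma, pred_mode):
    """Trigger condition: luma TU of size >= 8 coded in IntraBL mode."""
    return is_luma and tu_size >= 8 and pred_mode == "IntraBL"


def encode_flags(choices):
    """Concatenate the codewords of a sequence of transform choices."""
    return "".join(CODEWORDS[c] for c in choices)


def decode_flags(bits):
    """Parse a string of '0'/'1' characters back into transform choices."""
    it = iter(bits)
    choices = []
    for first in it:
        if first == "0":
            choices.append("DCT")
        else:
            second = next(it)  # a leading '1' is always followed by one more bit
            choices.append("DCT+Mc" if second == "0" else "DCT+Ms")
    return choices


def inverse_steps(choice):
    """Decoder order: undo the secondary transform (if any), then the DCT."""
    steps = []
    if choice != "DCT":
        steps.append("inverse " + choice.split("+")[1])
    steps.append("inverse DCT")
    return steps


stream = encode_flags(["DCT", "DCT+Mc", "DCT+Ms"])
parsed = decode_flags(stream)
```

Because the DCT-only choice gets the shortest codeword, blocks that do not use a secondary transform pay only one bit for the flag.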

4. LOW-COMPLEXITY SECONDARY TRANSFORMS: ROTATIONAL TRANSFORMS

Rotational transforms [1] were derived for Intra residue in the context of HEVC. In fact, the rotational transforms are special cases of secondary transforms, and can also be used as secondary transforms for IntraBL residues. The main advantage of using Rotational Transforms is that their structure is sparse, and there are only 20 non-zero elements at size 8x8 (an example Rotational transform is given in Eqn. 5). So they can be implemented via only 20 multiplications and 12 additions, which is fewer than the 22 multiplications and 58 additions that would be required for the 8x8 secondary transform implementation in the previous section. Further, the number of operations for the 8x8 rotational transform can be reduced by using a lifting implementation of rotational transforms as described in [11]. However, as these matrices are low-complexity and were designed primarily for “Intra” residue, the compression gains in scalable video coding are smaller than for the secondary transforms proposed in this paper. We refer the reader to [1] for more details about these matrices. In this work, we present the results for the two best performing Rotational Transforms from [1]. The first Rotational transform R1 is given in Eqn. 5.

5. EXPERIMENTAL RESULTS

In our simulations, we encoded five full-length sequences (which had 240 to 600 images) at resolution 1920x1080. The five sequences are Kimono, ParkScene, Cactus, BasketballDrive, and BQTerrace, and are part of the sequences being tested in the common test conditions for the scalable extensions of the HEVC standard. Full details about the GOP size, Intra period, coding structure of these video sequences etc. are available in [9]. Note that we present here the results only for the evaluation of the proposed transform scheme, and retain all other test settings as in [9]. We test two scenarios for spatial scalability, 2x and 1.5x, where the base layer is sub-sampled by ratios of 2 and 1.5 from 1920x1080 to 960x540 and 1280x720, respectively. According to the common test conditions in [9], the following two specific QP structures for the base layer (BL) and enhancement layer (EL) are used:
1. QP setting 1: Four different points are used, with (BL QP, EL QP) as (22,22), (26,26), (30,30) and (34,34).
2. QP setting 2: Four different points are used, with (BL QP, EL QP) as (22,24), (26,28), (30,32) and (34,36).

The anchor (base-line reference) for the proposed transform scheme is SHM 1.0 [4], the reference software for the S-HEVC standard, with DCT applied to inter-layer residual blocks (IntraBL residuals) for sizes 8x8 and higher. For 4x4 IntraBL blocks, we retain the 4x4 DST Type-7 transform in SHM 1.0 [4]. Note that at size 4x4, DST Type-7 is used since its compression efficiency is almost the same as that of DCT in S-HEVC, but in a single-layer HEVC codec DST Type-7 provides close to 1 % compression gain [13], and it may be beneficial to have only DST Type-7 for an All Intra profile in S-HEVC as well. Thus, DST Type-7 is used for IntraBL residue in EL coding as well. For our simulations, we present the results for All-Intra (AI) and Random Access (RA) scenarios. In the AI and RA settings, the images were encoded as I-I-I- and I-B-B- respectively.

Table 1 presents the BD-Rate [3] gains for the Luma component for various video sequences. In all the tests, PSNR is calculated using the reconstructed enhancement layer video, and the rate is the total rate from the enhancement layer and the base layer. Here, we compare the technique in Guo et al. [6] and our proposed 8x8 secondary transform scheme with transforms Mc and Ms. Guo et al.’s scheme applies full-size transforms at sizes 8, 16 and 32, while in our proposed scheme, only secondary transforms for the top 8x8 low-frequency coefficients are applied for blocks of size 8, 16 and 32. From the table, our proposed algorithm gains up to 1.4% and 1.3% BD-Rate in the All Intra 2x and 1.5x settings respectively. This is slightly less (around 0.2 %) than Guo et al.’s method, but the complexity of our scheme is negligible as compared to Guo et al.’s, since

(5)

Sequence Name     Guo et al.   Proposed Sec Trans.   Guo et al.   Proposed Sec Trans.
                  AI 2x        AI 2x                 AI 1.5x      AI 1.5x
Kimono            -2.0         -1.9                  -2.0         -1.7
ParkScene         -2.2         -1.8                  -1.9         -1.6
Cactus            -1.9         -1.6                  -1.7         -1.4
BasketballDrive   -1.0         -1.0                  -1.1         -1.0
BQTerrace         -0.7         -0.7                  -0.9         -0.7
Average           -1.6         -1.4                  -1.5         -1.3

Table 1. BD-Rate gains (in percentages) for the technique presented in Guo et al. [6] and proposed transform scheme for All Intra settings and QP Setting 1. Note that negative BD-Rate means compression gain.

Sequence Name     Guo et al.   Proposed Sec Trans.   Guo et al.   Proposed Sec Trans.
                  AI 2x        AI 2x                 AI 1.5x      AI 1.5x
Kimono            -2.1         -1.8                  -0.7         -0.5
ParkScene         -2.2         -1.8                  -1.0         -0.7
Cactus            -1.8         -1.5                  -1.0         -0.7
BasketballDrive   -1.1         -1.0                  -0.7         -0.4
BQTerrace         -1.0         -0.9                  -0.9         -0.5
Average           -1.6         -1.4                  -0.9         -0.6

Table 2. BD-Rate gains (in percentages) for the technique presented in Guo et al. [6] and proposed transform scheme for All Intra settings and QP Setting 2. Note that negative BD-Rate means compression gain.

we require only small 8x8 secondary transforms for storage, and not the additional large 16x16 and 32x32 transforms as proposed by Guo et al. Also, the average gains for 2x spatial scalability are larger than for 1.5x spatial scalability, since the EL rate (as a fraction of the total EL+BL rate) is higher in the 2x scenario than in the 1.5x scenario, and a larger reduction of the EL rate leads to more gains in the 2x scenario (the BL is kept fixed throughout the simulations).

In Table 2, we provide the BD-Rate gains for QP setting 2 for Guo et al.’s and our proposed method. Again, our scheme performs close to Guo et al.’s scheme. The gains vary across different video sequences for the various methods, as is typically the case in video coding depending on the sequence characteristics. Also, gains such as 0.5-1% in the HEVC and S-HEVC standards are difficult to come by and are, in general, considered to be significant, unlike the days of H.264/AVC, where 5-10% gains were considered significant.

Next, in Table 3, we show the results for Guo et al. and our method on RA settings for QP setting 1. The trend is similar to the previous 2 tables, and our scheme performs close to Guo et al.’s method. Also, note that the gains for all methods in RA settings are smaller than in All Intra settings, since the number of IntraBL blocks is higher in All Intra settings.

In Table 4, we show the results for low-complexity versions of the secondary transforms for All Intra settings and QP setting 1. We show the results when only one secondary transform Mc (corresponding to DCT Type-3) is applied in the first two columns. Here, we observe average gains of 0.9 % and 0.8 % for AI

Sequence Name     Guo et al.   Proposed Sec Trans.   Guo et al.   Proposed Sec Trans.
                  RA 2x        RA 2x                 RA 1.5x      RA 1.5x
Kimono            -1.3         -1.1                  -1.1         -0.9
ParkScene         -1.3         -1.2                  -1.2         -1.1
Cactus            -1.2         -1.0                  -1.2         -1.0
BasketballDrive   -0.7         -0.6                  -0.8         -0.6
BQTerrace         -0.5         -0.5                  -0.7         -0.7
Average           -1.0         -0.9                  -1.0         -0.9

Table 3. BD-Rate gains (in percentages) for the technique presented in Guo et al. [6] and proposed transform scheme for Random Access settings and QP Setting 1. Note that negative BD-Rate means compression gain.

Sequence Name     One Sec Trans Mc   One Sec Trans Mc   ROT Trans   ROT Trans
                  AI 2x              AI 1.5x            AI 2x       AI 1.5x
Kimono            -1.3               -1.1               -1.6        -1.2
ParkScene         -1.2               -1.0               -1.2        -0.9
Cactus            -1.0               -0.9               -1.2        -0.9
BasketballDrive   -0.6               -0.6               -0.9        -0.8
BQTerrace         -0.4               -0.5               -0.7        -0.7
Average           -0.9               -0.8               -1.1        -0.9

Table 4. BD-Rate gains (in percentages) for various proposed secondary transforms for All Intra settings and QP Setting 1. Note that negative BD-Rate means compression gain.

2x and 1.5x scenarios. Note from Table 1 that the gains when Mc and Ms are both applied are around 1.4 % and 1.3 % for the AI 2x and 1.5x scenarios. So Mc alone can provide almost 60-65 % of the gain, and can be a low-complexity alternative with lower coding efficiency but also the lower implementation cost of using only one secondary transform Mc. Next, the last two columns in Table 4 show the gains when we apply the two Rotational Transforms R1 and R2 of Sec. 4 as the secondary transforms. From the results, we observe that average gains of 1.1 % and 0.9 % are obtained for the Rotational Transforms. The gains are smaller than those of the secondary transforms in Table 1, but they could provide an attractive low-complexity point, since the number of additions is very small in Rotational Transforms due to their sparse structure.

5.1. Discussion on Complexity

We begin the discussion with a note on the complexity of the transforms: the average increase over the reference (no secondary transform case) in the encoder/decoder run-times for the proposed transform schemes is around 10 % per additional transform tested in the All Intra setting, and around 4-5 % for Random Access settings. This is expected, as the proposed transforms require an R-D search. Other advantages of the secondary transform are:
1. The same secondary transform can be applied on all blocks of size 8x8 and larger, thereby eliminating the need to store and apply different secondary transforms for different block sizes: 8x8, 16x16 and 32x32. In general, when different secondary transforms are designed for 8x8, 16x16 and 32x32 blocks,

Technique Sequence Name Kimono ParkScene Cactus BasketballDrive BQTerrace Average

Fig. 6. Worst-case and average case operations count summary for the secondary transforms

and are respectively applied to these blocks, the difference in compression efficiency is almost negligible. However, in that case three matrices would need to be stored.
2. The secondary transform can be applied in a similar fashion to any non-square block such as 8x32, after the 8-point DCT and 32-point DCT for the 8x32 block.
3. To the best of our knowledge, this is the first attempt in the literature to apply the secondary transform for IntraBL residue in scalable video coding.
4. Various techniques to reduce latency in secondary transform implementation have also been presented in the literature in [17].

5.2. Operation Counts

Next we give the number of operations for the proposed transform scheme in Fig. 6. Guo et al.’s scheme in [6, 5] requires the same number of operations as the 2-d DCT, but different transform cores than DCT Type-2. The entries corresponding to the secondary transform indicate the additional operations required after the DCT for the proposed secondary transform. The worst case happens when the secondary transform is used at the decoder. For the average case, we present the results assuming that the secondary transforms are used half of the time. Note that, in general, the worst-case scenario in a video codec is important, since an “evil” video bit-stream which can break the video codec will present the worst-case scenario. From Fig. 6, we can observe that the 8x8 secondary transform and the 8x8 Rotational Transforms provide an excellent trade-off in terms of complexity and compression efficiency, since the number of additional operations as compared to the DCT is almost negligible, especially at sizes 16 and 32.

We also show the results in Table 5 when the proposed secondary transforms Mc and Ms, or the two Rotational Transforms, are applied at block sizes 16 and 32 only for QP setting 1. Note that there is a slight decrease in the compression efficiency from the case when the secondary transforms were applied to block size 8 as well (see Table 1 for secondary transform results, and Table 4 for Rotational transform results), but the main advantage here is that we eliminate the “worst-case” block of 8x8, where the extra operations of applying the secondary transform are maximum. In our future work, we plan to find faster algorithms for the proposed secondary transform to further reduce the operation counts.

6. CONCLUSIONS

In this paper, we have presented various secondary transforms for coding IntraBL residue instead of the conventional DCT. The proposed secondary transform scheme requires the storage of only at

Sequence Name     Sec Trans   Sec Trans   ROT Trans   ROT Trans
                  AI 2x       AI 1.5x     AI 2x       AI 1.5x
Kimono            -1.7        -1.5        -1.5        -1.1
ParkScene         -0.9        -0.7        -0.8        -0.5
Cactus            -0.9        -0.6        -0.7        -0.5
BasketballDrive   -0.7        -0.6        -0.6        -0.5
BQTerrace         -0.3        -0.2        -0.2        -0.2
Average           -0.9        -0.7        -0.8        -0.6

Table 5. BD-Rate gains (in percentages) for various proposed secondary transforms for All Intra settings and QP Setting 1 when applied at block sizes 16 and 32 only. Note that negative BD-Rate means compression gain.

most two additional transform cores. Simulation results show significant gains in compression performance as compared to SHM 1.0 anchors in the context of scalable video coding.

7. REFERENCES

[1] E. Alshina, A. Alshin, and F. Fernandes. Rotational transform for image and video compression. In IEEE ICIP, Sept 2011.
[2] J. An, X. Zhao, X. Guo, and S. Lei. Boundary-dependent transform for inter-predicted residue. ITU-T & ISO/IEC JCTVC-G281, Nov 2011.
[3] G. Bjontegaard. Calculation of average PSNR differences between RD curves. ITU-T SG16/Q6, VCEG-M33, April 2001.
[4] J. Chen, J. Boyce, Y. Ye, and M. M. Hannuksela. SHVC Test Model 1 (SHM 1). ITU-T & ISO/IEC JCTVC-L1007, Jan 2013.
[5] L. Guo, M. Karczewicz, and J. Chen. Transform for inter-layer prediction residues in scalable video coding. In International Workshop on Emerging Multimedia Systems and Applications, July 2013.
[6] L. Guo, M. Karczewicz, J. Sole, and J. Chen. Transform selection for inter-layer texture prediction in scalable video coding. ITU-T & ISO/IEC JCTVC-K0321, Oct 2012.
[7] J. Han, A. Saxena, V. Melkote, and K. Rose. Jointly optimized spatial prediction and block transform for video and image coding. IEEE Trans. on Image Processing, 21(4):1874–1884, April 2012.
[8] J. Han, A. Saxena, and K. Rose. Towards jointly optimal spatial prediction and adaptive transform in video/image coding. In IEEE ICASSP, pages 726–729, March 2010.
[9] X. Li, J. Boyce, P. Onno, and Y. Ye. Common SHM test conditions and software reference configurations. ITU-T & ISO/IEC JCTVC-L1009, Jan 2013.
[10] C. Loeffler, A. Ligtenberg, and G. Moschytz. Practical fast 1-D DCT algorithms with 11 multiplications. In IEEE ICASSP, Feb 1989.
[11] Z. Ma et al. Experimental results for the Rotational transform. ITU-T & ISO/IEC JCTVC-F294, July 2011.
[12] K. R. Rao and P. Yip. Discrete Cosine Transform: Algorithms, Advantages and Applications. Academic Press, 1990.
[13] A. Saxena and F. Fernandes. Mode dependent DCT/DST for intra prediction in block-based image/video coding. In IEEE ICIP, Sept 2011.
[14] A. Saxena and F. Fernandes. Mode-dependent DCT/DST without 4*4 full matrix multiplication for intra prediction. ITU-T & ISO/IEC JCTVC-E125, March 2011.
[15] A. Saxena and F. Fernandes. On secondary transforms for prediction residual. In IEEE ICIP, Oct 2012.
[16] A. Saxena and F. Fernandes. DCT/DST based transform coding for intra prediction in image/video coding. IEEE Trans. on Image Processing, 22(10):3974–3981, October 2013.
[17] A. Saxena and F. Fernandes. Low latency secondary transforms for intra/inter prediction residual. IEEE Trans. on Image Processing, 22(10):4061–4071, October 2013.