IJRIT International Journal of Research in Information Technology, Volume 2, Issue 4, April 2014, Pg: 298- 305

International Journal of Research in Information Technology (IJRIT) www.ijrit.com

ISSN 2001-5569

Implementation of Fast Radix-2 DCT Algorithm using CORDIC

T. Saketh1, R. Prithiviraj2

1 MTech, VLSI Design, Department of Electronics and Communication Engineering, SRM University, Chennai, India, Email id: [email protected]

2 Assistant Professor, Department of Electronics and Communication Engineering, SRM University, Chennai, India, Email id: [email protected]

Abstract

This paper proposes a Coordinate Rotation Digital Computer (CORDIC) based DCT/IDCT algorithm that uses only shift and add operations for computation. The proposed algorithm extends easily to $2^n$-point DCTs/IDCTs: an N-point DCT/IDCT is derived from two N/2-point DCTs/IDCTs. The proposed architecture is also modeled in MATLAB and applied to a DCT-based JPEG process. The experimental results show that the peak signal-to-noise ratio (PSNR) of the proposed architecture is higher than that of the existing method. Moreover, the proposed architectures have high regularity, modularity, and computational accuracy, and are suitable for VLSI implementation.

Keywords: Coordinate rotation digital computer (CORDIC), Discrete cosine transform (DCT), Inverse discrete cosine transform (IDCT), Fast radix-2 algorithm.

I. Introduction

The discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) are among the most widely used transforms in image and signal processing because of their near-optimal performance in compressing highly correlated data. The DCT was proposed by Ahmed et al. [1], and various fast algorithms have been reported since. Existing fast algorithms can be classified into two categories, namely nonradix and radix. Nonradix algorithms attempt to reduce computational complexity and make the computation more efficient; examples include matrix factorization, algorithms deduced directly from signal flow graphs, and Coordinate Rotation Digital Computer (CORDIC) based fast algorithms. Because of their extensive design optimization for cost reduction and performance enhancement, these DCT algorithms are often complicated and hardly scalable beyond 8-point DCTs. In contrast, radix algorithms generate higher-order DCTs from lower-order DCTs and have a regular computational structure, which reduces implementation complexity. However, because of their recursive nature, radix algorithms are difficult to pipeline and are not well suited to high-speed applications. Among radix algorithms, the radix-2 algorithm is the most popular because of its computational efficiency and structural simplicity.

Hence we propose a CORDIC-based radix-2 fast DCT algorithm. From the proposed algorithm, the signal flow graph of the DCT is developed, and that of the IDCT is deduced using the orthogonal property of the transforms. The proposed algorithm is similar to the Cooley-Tukey fast Fourier transform (FFT) and can generate the next higher-order DCT from two identical lower-order DCTs. In addition, it has some distinct advantages, such as a regular dataflow like that of the FFT, a uniform post-scaling factor, in-place computation, and rotation angles that form an arithmetic sequence. The difficulty of pipelining a conventional CORDIC can be overcome by using the unfolded CORDIC technique, which results in a pipelined, high-speed VLSI implementation. Compared with existing DCTs, the proposed algorithm has low computational complexity and is highly scalable, modular, regular, and amenable to efficient pipelined implementation.

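II. CORDIC BASED FAST DCT ALGORITHM EQUATIONS

For an N-point signal x[n], the DCT is defined as

$$C[k] = \alpha[k] \sum_{n=0}^{N-1} x[n] \cos\left[\frac{(2n+1)k\pi}{2N}\right], \quad k = 0, 1, \ldots, N-1 \tag{1}$$

where $\alpha[0] = 1/\sqrt{N}$ if k = 0, and $\alpha[k] = \sqrt{2/N}$ otherwise.

According to (1), neglecting the post-scaling factor without loss of generality, the main operation of an N-point DCT can be written as

$$C[k] = \sum_{n=0}^{N-1} x[n] \cos\left[\frac{(2n+1)k\pi}{2N}\right]. \tag{2}$$

A length-N input sequence x[n], where N is a power of two, can be decomposed into

$$x_L[n] = \tfrac{1}{2}\{x[2n] + x[2n+1]\} \tag{3}$$

and

$$x_H[n] = \tfrac{1}{2}\{x[2n] - x[2n+1]\} \tag{4}$$

where n = 0, 1, 2, ..., (N/2) - 1.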

Fig.1. Signal flow graph of an N-point fast DCT using CORDIC.

Fig.2. Signal flow graph of a 2-point fast DCT using CORDIC.


Fig.3. Signal flow graph of a 4-point fast DCT using CORDIC.

So the original signal x[n] can be obtained from $x_L[n]$ and $x_H[n]$ as follows:

$$x[2n] = x_L[n] + x_H[n] \tag{5}$$

and

$$x[2n+1] = x_L[n] - x_H[n]. \tag{6}$$
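As a quick sanity check on the decomposition in (3) and (4) and its inverse in (5) and (6), the pairwise split can be sketched in Python. The function names `split_pairs` and `merge_pairs` and the sample values are illustrative assumptions, not part of the original work.

```python
def split_pairs(x):
    """Half-sums xL and half-differences xH of adjacent samples, as in (3) and (4)."""
    half = len(x) // 2
    xL = [(x[2 * n] + x[2 * n + 1]) / 2 for n in range(half)]
    xH = [(x[2 * n] - x[2 * n + 1]) / 2 for n in range(half)]
    return xL, xH


def merge_pairs(xL, xH):
    """Rebuild x from xL and xH, as in (5) and (6)."""
    x = []
    for low, high in zip(xL, xH):
        x.append(low + high)   # x[2n]
        x.append(low - high)   # x[2n + 1]
    return x


# Hypothetical 8-point input used only for illustration.
x = [4.0, 2.0, 7.0, 1.0, 0.0, 6.0, 3.0, 5.0]
xL, xH = split_pairs(x)
restored = merge_pairs(xL, xH)
```

The round trip returns the original samples, which is what lets the N-point problem be restated entirely in terms of the two half-length sequences.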

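Substituting (5) and (6) into (2), (2) can be rewritten as

$$C[k] = 2\cos\left(\frac{k\pi}{2N}\right) \sum_{n=0}^{N/2-1} x_L[n] \cos\left[\frac{(2n+1)k\pi}{N}\right] + 2\sin\left(\frac{k\pi}{2N}\right) \sum_{n=0}^{N/2-1} x_H[n] \sin\left[\frac{(2n+1)k\pi}{N}\right] \tag{7}$$

where k = 0, 1, 2, ..., N - 1. Since

$$\sin\left[\frac{(2n+1)k\pi}{N}\right] = (-1)^n \cos\left[\frac{(2n+1)(N/2-k)\pi}{N}\right] \tag{8}$$

we get (9) and (10):

$$C[k] = 2\cos\left(\frac{k\pi}{2N}\right) \sum_{n=0}^{N/2-1} x_L[n] \cos\left[\frac{(2n+1)k\pi}{N}\right] + 2\sin\left(\frac{k\pi}{2N}\right) \sum_{n=0}^{N/2-1} (-1)^n x_H[n] \cos\left[\frac{(2n+1)(N/2-k)\pi}{N}\right] \tag{9}$$

$$C[N-k] = -2\sin\left(\frac{k\pi}{2N}\right) \sum_{n=0}^{N/2-1} x_L[n] \cos\left[\frac{(2n+1)k\pi}{N}\right] + 2\cos\left(\frac{k\pi}{2N}\right) \sum_{n=0}^{N/2-1} (-1)^n x_H[n] \cos\left[\frac{(2n+1)(N/2-k)\pi}{N}\right] \tag{10}$$

where k = 0, 1, 2, ..., N/2 - 1.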
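The pair of expressions for C[k] and C[N - k] can be checked numerically against the direct sum in (2). The sketch below assumes the reading of (9) and (10) spelled out in the code comments (a cosine-weighted sum over xL and a cosine-weighted sum over the sign-alternated xH); the function names and the test vector are hypothetical. As an assumption of this sketch, (9) is also evaluated at k = N/2 so that every output index is covered.

```python
import math


def dct_direct(x):
    """Unscaled DCT of (2): C[k] = sum_n x[n] * cos((2n+1) k pi / (2N))."""
    N = len(x)
    return [sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N))
            for k in range(N)]


def dct_one_level(x):
    """One decomposition level: outputs built from sums over xL and xH only."""
    N = len(x)
    half = N // 2
    xL = [(x[2 * n] + x[2 * n + 1]) / 2 for n in range(half)]
    xH = [(x[2 * n] - x[2 * n + 1]) / 2 for n in range(half)]
    C = [0.0] * N
    for k in range(half + 1):
        c = math.cos(k * math.pi / (2 * N))
        s = math.sin(k * math.pi / (2 * N))
        # N/2-point DCT-like sum over xL at index k
        low = sum(xL[n] * math.cos((2 * n + 1) * k * math.pi / N) for n in range(half))
        # N/2-point DCT-like sum over (-1)^n xH[n] at index N/2 - k
        high = sum((-1) ** n * xH[n] * math.cos((2 * n + 1) * (half - k) * math.pi / N)
                   for n in range(half))
        C[k] = 2 * c * low + 2 * s * high            # form of (9)
        if 0 < k < half:
            C[N - k] = -2 * s * low + 2 * c * high   # form of (10)
    return C


x = [4.0, 2.0, 7.0, 1.0, 0.0, 6.0, 3.0, 5.0]
max_err = max(abs(a - b) for a, b in zip(dct_direct(x), dct_one_level(x)))
```

Here `max_err` stays at floating-point rounding level, which supports the claim that the same two half-length sums feed both C[k] and C[N - k].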


From (9) and (10), we find that each equation contains the same two N/2-point DCTs weighted by different coefficients, and that the four coefficients together form exactly one plane rotation, which a single CORDIC can realize. So we combine the two equations to realize a CORDIC-based fast DCT algorithm.
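To make the role of the CORDIC concrete, the following is a minimal floating-point sketch of a rotation-mode CORDIC, not the authors' hardware design. The function name `cordic_rotate`, the iteration count, and the explicit gain division are assumptions made for illustration; multiplication by 2**-i stands in for the right shift that fixed-point hardware would use.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) anticlockwise by `angle` radians with CORDIC micro-rotations.

    A negative `angle` gives a clockwise rotation. The method converges for
    |angle| below roughly 1.74 rad.
    """
    z = angle      # residual angle still to be rotated
    gain = 1.0     # accumulated CORDIC gain, divided out at the end
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # direction of this micro-rotation
        shift = 2.0 ** -i                    # emulates a right shift by i bits
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
        gain *= math.sqrt(1.0 + shift * shift)
    # Hardware would typically fold this constant into a scaling factor instead.
    return x / gain, y / gain


theta = math.pi / 16
clockwise = cordic_rotate(1.0, 0.0, -theta)                  # clockwise by theta
restored = cordic_rotate(clockwise[0], clockwise[1], theta)  # anticlockwise by theta
```

Reversing the rotation direction only flips the sign `d` in each iteration, so the same structure serves both directions with adders and subtractors exchanged.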

Fig.4. Signal flow graph of an 8-point fast DCT using CORDIC.

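Let

$$\hat{x}_H[n] = (-1)^n x_H[n]. \tag{11}$$

Combining the constant values 2 and $\sqrt{2}$ in the recursively decomposing stages with the post-scaling factor, the DCT can be written as

$$\begin{bmatrix} C[k] \\ C[N-k] \end{bmatrix} = \begin{bmatrix} \cos\frac{k\pi}{2N} & \sin\frac{k\pi}{2N} \\ -\sin\frac{k\pi}{2N} & \cos\frac{k\pi}{2N} \end{bmatrix} \begin{bmatrix} \sum_{n=0}^{N/2-1} x_L[n] \cos\left[\frac{(2n+1)k\pi}{N}\right] \\ \sum_{n=0}^{N/2-1} \hat{x}_H[n] \cos\left[\frac{(2n+1)(N/2-k)\pi}{N}\right] \end{bmatrix} \tag{12}$$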
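The recursive structure can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the constants 2 and sqrt(2) are kept explicit instead of being folded into the post-scaling factor, the rotation uses `math.cos` and `math.sin` where hardware would use a CORDIC, and the names `fast_dct` and `dct_direct` are hypothetical.

```python
import math


def dct_direct(x):
    """Unscaled reference DCT: C[k] = sum_n x[n] * cos((2n+1) k pi / (2N))."""
    N = len(x)
    return [sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N))
            for k in range(N)]


def fast_dct(x):
    """Radix-2 recursion: an N-point DCT from two N/2-point DCTs plus rotations."""
    N = len(x)
    if N == 1:
        return [x[0]]
    half = N // 2
    xL = [(x[2 * n] + x[2 * n + 1]) / 2 for n in range(half)]
    xH_hat = [(-1) ** n * (x[2 * n] - x[2 * n + 1]) / 2 for n in range(half)]
    A = fast_dct(xL)       # N/2-point DCT of the half-sums
    B = fast_dct(xH_hat)   # N/2-point DCT of the sign-alternated half-differences
    C = [0.0] * N
    C[0] = 2 * A[0]                  # k = 0 needs no rotation
    C[half] = math.sqrt(2) * B[0]    # k = N/2 is a rotation by pi/4 of (0, B[0])
    for k in range(1, half):         # N/2 - 1 rotations, angle step pi/(2N)
        theta = k * math.pi / (2 * N)
        c, s = math.cos(theta), math.sin(theta)
        C[k] = 2 * (c * A[k] + s * B[half - k])
        C[N - k] = 2 * (-s * A[k] + c * B[half - k])
    return C


x = [4.0, 2.0, 7.0, 1.0, 0.0, 6.0, 3.0, 5.0]
max_err = max(abs(a - b) for a, b in zip(dct_direct(x), fast_dct(x)))
```

Each level of the recursion performs N/2 - 1 rotations whose angles are multiples of pi/(2N), which matches the arithmetic-sequence property of the rotation angles.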

According to (12), we can decompose the N-point DCT into two N/2-point DCTs based on the CORDIC algorithm. For a power-of-two-point DCT, the proposed algorithm computes the DCT by recursively decomposing it down to 2-point DCTs. Since the basic operation of the algorithm is a 2-point DCT, as in the radix-2 FFT, this algorithm is called the fast radix-2 DCT. The rotation angles of the CORDICs form an arithmetic sequence with a common difference of -π/2N, and, importantly, all the outputs have a uniform post-scaling factor.

III. ALGORITHM FLOW OF DCT AND IDCT BASED ON CORDIC

The signal flow graph of the proposed fast DCT algorithm given in (12) is shown in Fig. 1. Similarly, the signal flow graphs of the 2-point, 4-point, and 8-point DCTs are shown in Figs. 2 to 4, where the angles in the circles represent CORDICs with the corresponding rotation angles. In Fig. 1 there are two separate N/2-point DCTs and one CORDIC array. The CORDIC array has N/2 - 1 CORDICs whose rotation angles form an arithmetic sequence. Because the algorithm is formulated as decimation in frequency (DIF), the inputs are addressed in bit-reversed order and the outputs in natural order. It also supports in-place computation like the FFT. It is suitable for pipelined VLSI implementation because the signal flow has regular and purely feed-forward data paths. The recursion problem of the conventional CORDIC can be overcome by using an unfolded CORDIC.

Similarly, the fast algorithm for the N-point IDCT can be deduced in the same way as the fast DCT algorithm, but it is obtained more easily from the orthogonal property. Because the DCT and IDCT are orthogonal transformations, the signal flow graph of the N-point IDCT can be obtained by inverting the transfer function of each building block, as listed in Table I, and reversing the signal flow direction.

Symbol               DCT                   IDCT

Butterfly            xout = xin + yin      xout = (xin + yin)/2
                     yout = xin - yin      yout = (xin - yin)/2

Multiply Constant    kx                    x/k

CORDIC               Clockwise (-ϴ)        Anticlockwise (ϴ)

Table I. Transfer functions of DCT and IDCT.

The CORDICs in Table I have the same rotation angle but opposite rotation directions. So, to change a CORDIC from a clockwise to an anticlockwise rotation by the same angle, the only change required is to swap all adders and subtractors in the rotation iteration stage. This gives an easy way to implement a reconfigurable or unified architecture for DCTs and IDCTs. The proposed fast IDCT algorithm has the same arithmetic complexity as the DCT.

IV. MODE OF IMPLEMENTATION AND APPLICATION

The proposed N-point DCT algorithm needs two N/2-point DCTs, N/2 - 1 CORDICs, and N additions. Therefore the number of CORDICs required by the proposed algorithm is

$$\frac{N}{2}\log_2 N - N + 1 \tag{13}$$

and the number of additions is

$$N \log_2 N. \tag{14}$$

The functionality of the proposed architecture is checked using SystemVerilog for both the DCT and IDCT. In addition, the 8-point fast DCT using CORDIC is written in Verilog and simulated using ModelSim 6.6d.
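The operation counts follow from the stated per-stage cost (two N/2-point DCTs, N/2 - 1 CORDICs, and N additions). The sketch below unrolls that recurrence and compares it with the closed forms (N/2) log2 N - N + 1 and N log2 N. It assumes a 1-point DCT costs nothing; the function names are hypothetical.

```python
import math


def cordic_count(N):
    """CORDICs needed for an N-point DCT: two half-size DCTs plus N/2 - 1 rotations."""
    if N == 1:
        return 0  # assumed base case: a 1-point DCT needs no operations
    return 2 * cordic_count(N // 2) + N // 2 - 1


def addition_count(N):
    """Additions needed for an N-point DCT: two half-size DCTs plus N butterfly additions."""
    if N == 1:
        return 0
    return 2 * addition_count(N // 2) + N


sizes = [2, 4, 8, 16, 32, 64]
table = [(N, cordic_count(N), addition_count(N)) for N in sizes]
closed_forms = [(N, (N // 2) * int(math.log2(N)) - N + 1, N * int(math.log2(N)))
                for N in sizes]
```

For the 8-point case this gives 5 CORDICs and 24 additions, and the recurrence agrees with the closed forms for every size listed.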


Fig.5. Simulated output of 8-point fast DCT algorithm.

Fig.6. Layout view of the proposed 8-point DCT after placement and routing (P&R), based on the TSMC 0.18 µm standard cell library.


Fig.7. Reconstructed image from proposed DCT algorithm.

To verify the performance of the proposed architecture, we model the proposed algorithm in MATLAB and compare its PSNR with the value obtained in [5]. The test image and the resulting PSNR are shown in Fig. 7. For the baboon JPEG image, the PSNR obtained is 26.95 dB.

The following are some major features of our proposed CORDIC-based fast DCT algorithm in comparison with others.

Hardware complexity: The proposed CORDIC-based algorithm is highly suitable for VLSI implementation, since it uses only shifters and adders. Furthermore, the computational accuracy can be selected based on the trade-off between hardware complexity and approximation error. Because of the uniform post-scaling factor, the proposed architecture is also suitable for scaled DCT implementation. Fig. 6 shows the layout of the proposed 8-point DCT after placement and routing with the TSMC 0.18 µm standard cell library; it occupies an area of 82012 µm² and consumes 10.8 mW at an operating frequency of 66.6 MHz.

Scalability: Many CORDIC-based algorithms are limited to short-length DCTs. Our proposed algorithm offers an easier route to scalability: it can be extended to compute long-length DCTs as long as the radix is 2, that is, the length is a power of two.

CORDICs with modularity: For an N-point DCT, our proposed algorithm requires only N/2 - 1 different CORDIC types. It is modular in that the double-angle formula can be used to reduce the number of CORDIC types.

Pipelinability: The architectures in [20], [22], [23] are recursive in nature, which makes them difficult to pipeline. Using a modified unfolded CORDIC overcomes this difficulty.

Reconfigurability: Using the orthogonal property, we present an easy way to implement a reconfigurable architecture for the DCT and IDCT.

V. CONCLUSION

A novel CORDIC-based radix-2 fast DCT algorithm is proposed. This algorithm can generate the next higher-order DCT from two identical lower-order DCTs using only shift and add operations. Compared with existing DCT algorithms, the proposed algorithm has several distinct advantages: low computational complexity, high scalability, modularity, regularity, and suitability for pipelined implementation. Furthermore, it provides an easy way to implement a reconfigurable architecture for DCTs and IDCTs. The PSNR value is calculated to show that the proposed architecture is efficient.
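For reference, the PSNR figure used in the evaluation is conventionally computed from the mean squared error between the original and reconstructed images. The sketch below assumes 8-bit samples (peak value 255) and flat lists of pixel values; the function name `psnr` and the sample data are hypothetical and do not reproduce the paper's measurement.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical pixel values: every sample is off by one grey level, so MSE = 1.
original = [0, 255, 128, 64]
decoded = [1, 254, 129, 63]
value_db = psnr(original, decoded)
```

A higher value means the reconstructed image is closer to the original, which is the sense in which the paper compares its result with [5].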


REFERENCES

[1] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. C-23, pp. 90–94, 1974.

[2] T. D. Tran, “The binDCT: Fast multiplierless approximation of the DCT,” IEEE Signal Process. Lett., vol. 7, no. 6, pp. 141–144, 2000.

[3] L. Xiao and H. Huang, “A novel CORDIC based unified architecture for DCT and IDCT,” in 2012 Int. Conf. Optoelectronics and Microelectronics (ICOM), Aug. 2012, pp. 496–500.

[4] Z. Wu, J. Sha, Z. Wang, L. Li, and M. Gao, “An improved scaled DCT architecture,” IEEE Trans. Consumer Electron., vol. 55, no. 2, pp. 685–689, May 2009.

[5] C.-C. Sun, S.-J. Ruan, B. Heyne, and J. Goetze, “Low-power and high-quality CORDIC-based Loeffler DCT for signal processing,” IET Proc.-Circuits, Devices Syst., vol. 1, no. 6, pp. 453–461, 2007.

[6] S. Yu and E. E. Swartzlander, Jr., “A scaled DCT architecture with the CORDIC algorithm,” IEEE Trans. Signal Process., vol. 50, no. 1, pp. 160–167, Jan. 2002.
