Gulnar Perveen / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue5, September- October 2012, pp.1413-1415

# Low Power DCT Implementation In An Image Compression System

**Gulnar Perveen** 

Solar Energy Centre, Ministry of New & Renewable Energy, CGO Complex, New Delhi, India

#### Abstract

Low power is one of the most important challenges in many signal-processing system designs, both to maximize battery life and to save energy, particularly in multimedia cellular applications and multimedia system-on-chip design. The 2-D DCT is a commonly used frequency transformation in compression algorithms. In this paper, an efficient Baseline 2-D DCT Architecture is compared with the **Row/Column** Distributed Arithmetic (DA) Architecture and the Fully Pipelined DCT Architecture for wireless image compression systems. The Row/Column DA DCT Architecture is observed to provide a power saving of 24.4%, and the Fully Pipelined Architecture a power saving of 16.4%, compared with the 2-D DCT Baseline Architecture. Speed was also measured: the Fully Pipelined Architecture, which exploits the principles of pipelining and parallelism, reaches a throughput of 4.703 GHz.

Keywords: DCT (Discrete Cosine Transform), 2-D (Two-Dimensional), DA (Distributed Arithmetic).

## I. ROW/COLUMN ARCHITECTURE

In this approach, an 8-point 1-D DCT is applied to each of the 8 rows, and then again to each of the 8 columns. The same 1-D algorithm is applied to both the rows and the columns, so identical pieces of hardware can be used for the row computation and the column computation.
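The row/column decomposition can be checked numerically. The Python sketch below assumes the orthonormal DCT-II scaling (the paper does not state which normalization its hardware uses), and the function names are hypothetical. It computes the 2-D DCT of an 8x8 block by applying one 1-D routine to the 8 rows and then to the 8 columns, and compares the result with the direct double-sum definition of the 2-D DCT.

```python
import math

N = 8  # block size used throughout the paper


def c(k):
    """Orthonormal DCT-II scale factor (assumed normalization)."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_1d(x):
    """8-point DCT-II computed directly from its definition."""
    return [c(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                       for n in range(N))
            for k in range(N)]


def dct_2d_row_column(block):
    """2-D DCT as 8 row transforms followed by 8 column transforms."""
    rows = [dct_1d(r) for r in block]                                   # 8 row passes
    cols = [dct_1d([rows[i][j] for i in range(N)]) for j in range(N)]  # 8 column passes
    # cols[j][i] holds coefficient (i, j); transpose back to row-major order
    return [[cols[j][i] for j in range(N)] for i in range(N)]


def dct_2d_direct(block):
    """Reference: the 2-D DCT evaluated as a single double sum."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out
```

Because the 2-D kernel is a product of two 1-D kernels, both functions agree to floating-point precision; the row/column form needs 16 one-dimensional transforms instead of one 64-term sum per coefficient.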

Most of the design effort and computation lies in the 8-point 1-D DCT block, which can be reused 16 times per 8x8 block: 8 times for the rows and 8 times for the columns. A fast algorithm for computing the 1-D DCT is therefore usually selected. The DCT core processor implements a Row-Column Distributed Arithmetic fast DCT algorithm enhanced with activity reduction methods. The Row/Column Architecture has two 1-D DCT units connected through a transposition matrix memory.

The design is synchronous, uses a single positive clock edge, and has no internal tri-state buffers. A RAM stores the product results after the first DCT stage, so that both 1-D DCT units can work in parallel for maximum performance. The architecture takes 8-bit input data and produces 12-bit output using 12-bit DCT matrix coefficients [4].
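The word lengths quoted above can be illustrated with integer arithmetic. The Python sketch below is an illustration under assumptions, not the datapath of the core in [4]: it assumes the 12-bit coefficients are stored in a signed Q1.11 format (one sign bit, eleven fraction bits) and that the 8-bit inputs are signed. It rounds the orthonormal 8-point DCT matrix to 12-bit integers and evaluates one 1-D DCT with exact integer multiply-accumulates followed by a single rounding step. It uses ordinary multiplications for clarity, whereas the DA core avoids multipliers.

```python
import math

N = 8
COEF_BITS = 12               # coefficient word length stated in the text
FRAC_BITS = COEF_BITS - 1    # assumed Q1.11 format: 1 sign bit, 11 fraction bits


def dct_matrix():
    """Orthonormal 8x8 DCT-II matrix in floating point."""
    m = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        m.append([scale * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)])
    return m


def quantize_matrix(m):
    """Round each coefficient to a signed COEF_BITS-bit integer."""
    lo, hi = -(1 << (COEF_BITS - 1)), (1 << (COEF_BITS - 1)) - 1
    q = [[round(coef * (1 << FRAC_BITS)) for coef in row] for row in m]
    assert all(lo <= coef <= hi for row in q for coef in row), "coefficient overflow"
    return q


def dct_1d_fixed(x, qm):
    """Integer-only 1-D DCT of 8 signed 8-bit samples."""
    out = []
    for row in qm:
        acc = sum(coef * s for coef, s in zip(row, x))              # exact integer MAC
        out.append((acc + (1 << (FRAC_BITS - 1))) >> FRAC_BITS)     # round, drop fraction bits
    return out
```

With this scaling the largest coefficient magnitude is about 1004, well inside the 12-bit range, and the 1-D outputs of 8-bit data stay within 12 bits.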



Figure 1. Row/Column Approach Distributed Arithmetic DCT Architecture (D0-D7: registers).

In the above architecture, while the second DCT stage reads data out of transposition memory 1, the first DCT stage can populate transposition memory 2 with new data. The 1-D DCT units use Distributed Arithmetic with butterfly computation to compute the DCT values. Because the DA is parallel, the units need a considerable amount of ROM to compute one DCT value.
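One standard way a butterfly is combined with DA, assumed here rather than taken from [4], exploits the symmetry C(k, 7-n) = (-1)^k C(k, n) of the 8-point DCT matrix. A butterfly stage forms the four sums x[n] + x[7-n] and the four differences x[n] - x[7-n]; each even-indexed output then depends only on the sums and each odd-indexed output only on the differences. In a DA realization this means each output's ROM is addressed by 4 bits (16 words) instead of 8 bits (256 words). The Python sketch below (hypothetical function names, orthonormal scaling assumed) checks the identity numerically.

```python
import math

N = 8


def dct_coef(k, n):
    """Entry (k, n) of the orthonormal 8-point DCT-II matrix."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct_butterfly(x):
    """8-point DCT via one butterfly stage and two 4-input inner products per output."""
    s = [x[n] + x[N - 1 - n] for n in range(N // 2)]   # sums feed the even outputs
    d = [x[n] - x[N - 1 - n] for n in range(N // 2)]   # differences feed the odd outputs
    y = []
    for k in range(N):
        src = s if k % 2 == 0 else d
        y.append(sum(dct_coef(k, n) * src[n] for n in range(N // 2)))
    return y


def dct_direct(x):
    """Reference 8-input inner products for every output."""
    return [sum(dct_coef(k, n) * x[n] for n in range(N)) for k in range(N)]
```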




Figure 2. RAC unit of Row/Column Approach.

A design based on Distributed Arithmetic does not use any multipliers to compute MAC (multiply-and-accumulate) operations; instead, it stores precomputed partial results in ROM. A buffer, which is essentially a memory arbiter, sits between the two 1-D DCT stages. The DCT architecture implements a Row-Column Distributed Arithmetic fast DCT algorithm enhanced with activity reduction methods [4].
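The multiplier-free MAC can be sketched in Python. This is the generic textbook formulation of distributed arithmetic, assumed rather than taken from the core in [4]: for a fixed coefficient vector, a ROM stores every possible partial sum of the coefficients, and the inner product is formed by reading one ROM word per input bit plane and shifting and accumulating it, which is the role of a RAC unit such as the one in Figure 2. Integer coefficients and two's-complement inputs are assumptions, and `build_rom` and `da_inner_product` are hypothetical names.

```python
def build_rom(coeffs):
    """ROM word at address a = sum of the coefficients whose input bit is 1 in a."""
    k = len(coeffs)
    return [sum(coef for i, coef in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_inner_product(xs, rom, width):
    """Bit-serial DA: one ROM lookup per bit plane, shift-and-add, no multipliers.

    xs are signed integers that fit in `width`-bit two's complement.
    """
    acc = 0
    for j in range(width):                 # least significant bit plane first
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> j) & 1) << i    # bit j of input i forms address bit i
        word = rom[addr]
        if j == width - 1:
            acc -= word << j               # sign-bit plane has weight -2^(width-1)
        else:
            acc += word << j
    return acc
```

For four inputs the ROM holds 16 words; inputs taken after a butterfly stage are one bit wider than the original samples, so `width` would be 9 for 8-bit pixels in that case.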

## II. BASELINE 2-D DCT ARCHITECTURE

The Baseline 2-D DCT Architecture, based on the Chen algorithm, serves as the reference design against which the power savings of our low-power techniques are computed. This approach requires three steps: eight 1-D DCT/IDCTs along the rows, a memory transposition, and another eight 1-D DCT/IDCTs along the transposed columns.

The block diagram of the Baseline Architecture, shown in Figure 3, includes a controller that, through the SEN signal, enables input of the first row of data (DIN) via the ser2par (serial-to-parallel) unit. The controller then activates the 1-D DCT unit, with the SEL and REN signals determining the data path. The results are stored in the first row of the transposition memory, and the process repeats for the remaining seven rows of the input block. Next, the ISEL and COLACK signals enable the 1-D DCT unit to receive its input data from the columns of the transposition memory. The final results of the column-wise 1-D DCT are available at the output [1].
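The sequencing above can be summarized as a behavioral model in Python. This sketch makes several simplifying assumptions: the Chen fast algorithm is replaced by the direct DCT formula for brevity, the control signals (SEN, SEL, REN, ISEL, COLACK) are not modeled, only the order of operations they produce, and the function names are hypothetical. A single 1-D DCT routine is reused for both the row phase and the column phase, with a transposition memory in between.

```python
import math

N = 8


def dct_1d(x):
    """Orthonormal 8-point DCT-II from its definition (stands in for the Chen-based unit)."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                               for n in range(N)))
    return out


def baseline_2d_dct(serial_pixels):
    """Behavioral model: one shared 1-D DCT unit plus a transposition memory.

    serial_pixels: 64 samples arriving one at a time in raster order.
    """
    assert len(serial_pixels) == N * N
    stream = iter(serial_pixels)
    transpose_mem = [[0.0] * N for _ in range(N)]
    # Row phase: gather 8 serial samples (serial-to-parallel), transform them,
    # and store the result in one row of the transposition memory.
    for r in range(N):
        row = [next(stream) for _ in range(N)]
        transpose_mem[r] = dct_1d(row)
    # Column phase: the same 1-D unit now reads the columns of the memory.
    out = [[0.0] * N for _ in range(N)]
    for col in range(N):
        col_result = dct_1d([transpose_mem[r][col] for r in range(N)])
        for u in range(N):
            out[u][col] = col_result[u]
    return out
```

Because one unit does all 16 transforms in sequence, this arrangement trades throughput for hardware, which is what makes it a convenient baseline.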

## III. FULLY PIPELINED ARCHITECTURE

In this architecture, a row output vector Y is computed using multipliers, multiplexers, accumulators, and registers. The elements of the input vector X are fed into the circuit one at a time. The 8 output elements are computed simultaneously and are shifted out serially. The output is obtained by multiplying the input vector by the coefficient matrix M'. The second element of Y is computed through additions and subtractions of the elements of X, followed by multiplication by the constant a. The result is permuted before it enters the accumulator.
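The accumulation scheme described above can be sketched at the cycle level in Python. This is a simplified illustration under stated assumptions: the coefficient matrix is taken to be the plain orthonormal 8x8 DCT matrix rather than the factored matrix M' of [2], the additions, subtractions, and permutation that precede the accumulators are not modeled, and the function names are hypothetical. Each input element contributes one column of the matrix to eight parallel accumulators.

```python
import math

N = 8


def dct_matrix():
    """Orthonormal 8x8 DCT-II matrix, used here as the coefficient matrix."""
    m = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        m.append([scale * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)])
    return m


def pipelined_row_dct(x_stream, m):
    """Cycle-level sketch of one row transform.

    Each iteration of the outer loop models one clock cycle in which a single
    element x[n] enters and all eight accumulators are updated in parallel.
    """
    acc = [0.0] * N
    for n, x in enumerate(x_stream):
        for k in range(N):
            acc[k] += m[k][n] * x      # column n of the matrix scaled by x[n]
    return acc  # in hardware these eight results are shifted out serially
```

After 8 cycles all eight outputs are complete at once, which is why they can be shifted out while the next row is already entering.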



Figure 3. Baseline 2-D DCT Architecture.

The circuit accepts one pixel per clock cycle, and the entire processing is performed as a linear pipeline. When the left column of register set RS has been filled with eight data elements, the entire column is copied onto the corresponding registers in the right column; the same process occurs simultaneously in each of the partitions. The transpose buffer consists of an 8x8 array of register pairs. Data are written into the transpose buffer row-wise until all 64 registers are loaded, and are then copied in parallel onto the adjacent registers, which are connected column-wise. While data are being read out of the column registers, the row registers keep receiving further data from the DCT module. In this way, the output of the row-wise DCT computation is transposed for the column-wise DCT computation [2].

Figure 4. Fully Pipelined 2-D DCT Architecture.
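The double-buffered behavior of the transpose buffer can be modeled in a few lines of Python. In this sketch the class and method names are hypothetical and one element per clock is assumed: each call to `clock` reads one element of the previously captured block from the column registers in column-major order and writes one incoming element into the row registers in row-major order; after 64 writes the row registers are copied in parallel to the column registers.

```python
N = 8


class TransposeBuffer:
    """Sketch of an 8x8 transpose buffer built from register pairs."""

    def __init__(self):
        self.row_regs = [[None] * N for _ in range(N)]   # written row-wise
        self.col_regs = [[None] * N for _ in range(N)]   # read column-wise
        self.count = 0                                   # position within the current block

    def clock(self, value):
        """One clock cycle: read one element column-wise, write one element row-wise."""
        t = self.count
        # Column-wise read of the previously captured block (None until one exists).
        out = self.col_regs[t % N][t // N]
        # Row-wise write of the incoming element.
        self.row_regs[t // N][t % N] = value
        self.count += 1
        if self.count == N * N:
            # All 64 row registers loaded: copy them in parallel to the column registers.
            self.col_regs = [row[:] for row in self.row_regs]
            self.count = 0
        return out
```

Reads return `None` until the first block has been captured; from then on, each block emerges transposed while the next block is being written, with a latency of 64 cycles.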

## IV. SIMULATION RESULTS

TABLE I. Comparison of Speed & Power for DCT Architectures

| Architecture | Speed | Power consumption | Power saving |
|---|---|---|---|
| 2-D DCT Baseline Architecture | 2.934 GHz | 9 mW | Reference |
| Row/Column DA Approach | 3.23 GHz | 6.3 mW | 24.4% |
| Fully Pipelined 2-D DCT Architecture | 4.703 GHz | 7.5 mW | 16.6% |

TABLE II. Performance Analysis of JPEG Compressor

| Unit | Logic cells | Memory bits | Frequency (MHz) |
|---|---|---|---|
| DCT 2-D | 3,749 | 1,528 | 50.485 |
| Quantization | 309 | 700 | 85.700 |
| Zigzag Buffer | 84 | 1,380 | 181.285 |
| Run Length Encoder | 510 | 3,901 | 78.500 |
| JPEG Compressor | 4,652 | 7,509 | 395.970 |

## V. CONCLUSION

The comparison of power consumption and speed for the three DCT architectures, and the performance analysis of the JPEG compressor discussed above, were carried out by describing the designs in Verilog and synthesizing them with Xilinx tools.

The Row/Column (Distributed Arithmetic) DCT Architecture was selected for implementation after comparing its power savings against those of the Fully Pipelined Architecture and the Baseline 2-D DCT Architecture. This architecture gives the maximum power savings because its butterfly operation results in minimum data-path bit widths, so fewer flip-flops are needed between stages, which reduces power consumption. The Fully Pipelined 2-D DCT Architecture exploits the principles of pipelining and parallelism to the maximum extent to obtain higher speed and throughput than the Row/Column (DA) DCT approach and the Baseline 2-D DCT approach.

## REFERENCES

- [1] Nathaniel J. August and Dong Sam Ha, "Low Power Design of DCT and IDCT for Low Bit Rate Video Codecs", IEEE Transactions on Multimedia, vol. 6, no. 3, June 2004.
- [2] Jim Li and Shih-Lien Lu, "Low Power Design of Two-Dimensional DCT", IEEE Transactions on Communication, vol. 14, no. 4, April 1996.
- [3] F. Bensaali, A. Amira and A. Bouridane, "An Efficient Architecture for Color Space Conversion Using Distributed Arithmetic", The 10th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2003), Sharjah, UAE, December 14-17, 2004.
- [4] Thucydides Xanthopoulos and Anantha P. Chandrakasan, "Low Power DCT Core Using Adaptive Bitwidth and Arithmetic Activity Exploiting Signal Correlations and Quantization", IEEE Journal of Solid-State Circuits, vol. 35, no. 5, May 2000.
- [5] Jie Chen and K. J. Ray Liu, "Low-Power Architectures for Compressed Domain Video Coding Co-Processor", IEEE Transactions on Circuits and Systems, vol. 7, pp. 459-467, June 2000.
- [6] L. Fanucci and S. Saponara, "Data Driven VLSI Computation for Low Power DCT-Based Video Coding", Proceedings of the IEEE, vol. 83, no. 2, pp. 220-246, May 2002.
- [7] Hyeonuk Jeong, Jinsang Kim, and Wonkyung Cho, "Low-Power Multiplierless DCT Architecture Using Image Data Correlation", IEEE Transactions on Consumer Electronics, vol. 50, no. 1, February 2004.