## **RESEARCH ARTICLE**

## OPEN ACCESS

# Design of Fast Fourier Transform using Radix-2 Butterfly using Shift Register and Folding technique

Ranbeer Rathore<sup>1</sup>, Prof. Navneet Kaur<sup>2</sup>

<sup>1,2</sup>Dept. of Electronics and Communication Sagar Institute of Research & Technology, Bhopal

## ABSTRACT

The Discrete Fourier Transform (DFT) is an important technique in the field of Digital Signal Processing (DSP) and Telecommunications, especially for applications in Orthogonal Frequency Division Multiplexing (OFDM) systems. The Fast Fourier Transform (FFT) is an efficient algorithm to compute the DFT and IDFT. This paper involves the implementation of an area efficient 8-point, 16-point, 32-point, 64-point, 128-point, 256-point, 512-point and 1024-point single path delay feedback (SDF) and folding technique using radix-2 DIT FFT algorithm for signed and unsigned numbers. Implementation for complex numbers is also done for N= 8 and N=16 points. The implemented algorithm used radix-2 butterfly at all stages. It is area efficient and has less delay as compared to previous algorithms. All designs are implemented on vertex-2p device family in Xilinx software. *Index Terms*— FFT, Folding Technique, Single Path Delay Feedback (SDF), Serial in Serial out Shift Register

Date of Submission: 24-08-2017 Date of acceptance: 14-10-2017

#### I. INTRODUCTION

The Fourier Transform is an inevitable approach in signal processing, particularly for applications in Orthogonal Frequency Division Multiplexing (OFDM) systems [1]. The Discrete Fourier Transform decomposes a set of values into different components of frequency. The Fast Fourier transform (FFT) is an appropriate technique to do manipulation of DFT. The algorithm of FFT was derived by Cooley and Tukey in order to decrease the amount of complexity with respect to time and computations [2].

The hardware of FFT can be implemented by two types of classifications- memory architecture and pipeline architecture. The memory architecture, comprises a single processing element and various units of memory [3]. The merits of memory architecture include low power and low cost when compared to that of other styles. The specific demerits are greater latency and lower throughput. The above demerits of the memory architecture are totally eliminated by pipeline architecture at the expense of extra hardware in an acceptable way. The various types of pipeline architecture include Single delay feedback (SDF), Single delay commutator (SDC) and multiple delay commutator (MDC). The pipeline architecture is a regular structure which can be adopted by using hardware description language in an easy manner. In the recent years, the communication systems need to transmit voice and video signals of high quality in an efficient manner. In present day the efficient module is a high speed, reliable technology of communication [4].

Orthogonal Frequency Division Multiplexing (OFDM) is such a reliable option to accomplish the above requirement. The algorithms of FFT can be grouped into fixed-radix, mixed-radix and split radix algorithms in a rough manner [5]. The basic categories of algorithms of FFT include -Decimation in-frequency (DIF) and the Decimationin-time (DIT) as shown in Figure 1. Both of these algorithms depend on disintegration of transformation of an N-point sequence into many subsequences in a successive manner. There is no major difference between them as far as complexity of computation is concerned [6]. Generally DIT deals with the input and output in reverse sequence and normal sequence respectively, while DIF deals with input and output in normal sequence and reverse sequence respectively. Only Decimation-infrequency (DIF) algorithm will be taken into consideration [7].



Figure 1: (a) DIF FFT processing (b) DIT FFT processing (c) DIF FFT butterfly (d) DIT FFT butterfly

## II. LITERATURE REVIEW

Pramod Kumar Mehe et al. [1], the decimation-intime (DIT) fast Fourier transform (FFT) very often has advantage over the decimation-in-frequency (DIF) FFT for most real-valued applications, like speech/image/video processing, biomedical signal processing, and time-series analysis, etc., since it does not require any output reordering. Besides, the DIT FFT butterfly involves less computation time than its DIF counterpart. In this paper, we present an efficient architecture for the radix-2 DIT real-valued FFT (RFFT). We present here the necessary mathematical formulation for removing the redundancies in the radix-2 DIT RFFT, and present a formulation to regularize its flow graph to facilitate folded computation with a simple control unit. We propose here a register-based storage design which involves significantly less area at the cost of a little higher latency compared with the conventional RAM-based storage. The address generation for folded in-place DIT RFFT computation with register-based storage is challenging since both read and write operations are performed in the same clock cycle at different locations. Therefore, we present here a simple formulation of address generation for the proposed radix-2 DIT RFFT structure. The proposed structure involves 61% less area and 40% less power consumption than those of [8], on average, for FFT sizes 16, 32, 64, and 128. It involves 70% less areadelay product and 57% less energy per sample than those of the other, on average, for the same FFT sizes.

Himanshu Thapaliyal et al. [2], this paper proposes the hardware implementation of RSA encryption/decryption algorithm using the algorithms of Ancient Indian Vedic Mathematics that have been modified to improve performance. The recently proposed hierarchical overlay multiplier architecture is used in the RSA circuitry for multiplication operation. The most significant aspect of the paper is the development of a division architecture based on Straight Division algorithm of Ancient Indian Vedic Mathematics and embedding it in RSA encryption/decryption circuitry for improved efficiency. The coding is done in Verilog HDL and the FPGA synthesis is done using Xilinx Spartan library. The results show that RSA circuitry implemented using Vedic division and multiplication is efficient in terms of area/speed compared to its implementation using conventional multiplication and division architectures.

**M.** Ayinala et al. [3], this brief presents a novel scalable architecture for in-place fast Fourier transform (IFFT) computation for real-valued signals. The proposed computation is based on a modified radix-2 algorithm, which removes the redundant operations from the flow graph. A new processing element (PE) is proposed using two radix-2 butterflies that can process four inputs in parallel. A novel conflict-free memory-addressing scheme is proposed to ensure the continuous operation of the FFT processor. Furthermore, the addressing scheme is extended to support multiple parallel Pes [9]. The proposed real-FFT processor simultaneously requires fewer computation cycles and lower hardware cost compared to prior work.

For example, the proposed design with two PEs reduces the computation cycles by a factor of 2 for a 256-point real fast Fourier transform (RFFT) compared to a prior work while maintaining a lower hardware complexity. The number of computation cycles is reduced proportionately with the increase in the number of PEs.

S. S. Kerur et al. [4], digital signal processors (DSPs) are very important in various engineering disciplines [10]. Fast multiplication is very important in DSPs for convolution, Fourier transforms, etc. A fast method for multiplication based on ancient Indian Vedic mathematics is proposed in this paper. The whole of Vedic mathematics is based on 16 sutras (word formulae) and manifests a unified structure of mathematics [11]. Among the various methods of multiplication in Vedic mathematics, Urdhava tiryakbhyam is discussed in detail. Urdhava tiryakbhyam is the general multiplication formula applicable to all cases of multiplication. The coding is done in VHDL (very high speed integrated circuit hardware description language) and synthesis is done using Xilinx ISE series [12]. The combinational delay obtained after the synthesis is compared with normal multiplier. Further, this Vedic multiplier is used in matrix multiplication. This Vedic multiplier can bring great improvement in the DSP performance.

#### **III. FAST FOURIER TRANSFORM**

There are two types with respect to FFT algorithm devised by Cooley and Tukey - Decimation-in-Time algorithm (DIT) and Decimation-in-Frequency algorithm (DIF). The computation of a sequence of N-point can be obtained by means of a dual approach [13]. The input sequence x(n) of size 'N' is decomposed into samples of odd and even and the corresponding subsequences  $f_1(n)$  and  $f_2(n)$  are given by:

$$f_1(n) = x(2n)$$

$$f_1(n) = x(2n+1), \ n = 0,1.....\frac{N}{2} - 1$$
(2)







Figure 3: Butterfly of Radix-2 DIF FFT Algorithm

Figure 2, show the butterfly of radix-2 DIT FFT algorithm. Here we have used eight inputs and eight outputs [9]. In case of DIT the input sample is used in bit reversal order while the output of DIT FFT coefficients is generated in natural order [14] Figure 3,show the butterfly of radix-2 DIF FFT algorithm. In case of DIF the input sample is used in natural order while the output of DIF FFT coefficient is generated in bit reversed order [15].

#### **IV. PROPOSED METHODOLOGY**

In this proposed method we have used binary input in serial input serial output (SISO) shift register. This proposed algorithm consist of SISO, different types of multipliers, Kogge stone (KS) adder, subtractor, single path delay feedback (SPD) pipeline and folding architecture. The flow chart of proposed algorithm is shown in figure 4.

SISO: - SISO technique depends on the binary input. Suppose the binary input of the system is 8 word length, then eight delay flip flops are used in SISO register.



Figure 4: Flow Chart of Proposed Algorithm

Step-I: - The binary input function is a signal conditioning device that interfaces to the serial in serial out shift register. All integer number are applied in binary form to FFT architecture. Binary input depends on the word length i.e. suppose word length of the binary input (3 down to 0) means the input range is 0 to 15.

Step-II: - Second block of the proposed FFT architecture is serial in serial out shift register. Serial in serial out shift register can be constructed using flips flop. The register is first cleared, forcing all output of the serial in serial out shift register to zero. The input data is then applied sequentially to the D input of the first flip flop of the left. During each clock pulse, one bit is transmitted from left to right.

Step-III: - Third block of the proposed FFT architecture is decision multiplier block. According to the number the block is selected and gives the output to the adder and sub-tractor. There are three condition applied in the decision multiplier block i.e. suppose m=0 than select array multiplier, m=1 than select sign multiplier and m=2 than select complex multiplier.

Step-IV: - According to the result of decision multiplier block it uses adder and sub-tractor block. Step-V: - Pipeline FFT processor is a specified class for DFT computation utilizing fast algorithm. This technique is used to lower the delay and for portable digital signal processing device. The single path delay feedback pipeline architecture is used to divide number of order according to the system.

Step-VI: - And at the last of the algorithm we used folding technique. Folding architecture handles two inputs per clock and so two output samples are processed per clock cycle. The advantage of the folding technique is minimized delay in overall system.

Different types of Multiplier:- Three types of multiplier are used in proposed algorithm, i.e. array multiplier, sign (Baugh) multiplier and complex multiplier. Another name of array multiplier is binary unsigned multiplied.

The Baugh - Wooley algorithm for the signed binary multiplication is based on the concept shown in figure 5. The algorithm specifies that all possible AND operations are performed by white cell and all possible NAND operations are performed by gray cell. The white cell consists of full adder and AND gate, but the gray cell consists of full adder and NAND gate.



Figure 5: Block Diagram of Sign Multiplier

Complex multiplication generally consists of 4 multiplications, 2 adders and 1 subtraction. But the proposed complex multiplication consists of 3 multiplications, 1 adder and 1 subtraction in shown in figure 6.



Figure 6: Block Diagram of Complex Multiplication

Subtractor:- Subtractor consists of half subtractor and full subtractor depends on word length in implemented algorithm.

Single-path Delay Feedback Pipeline Architecture:-It is used as a feedback mechanism in order to minimize the number of delay elements. In this implemented architecture one half of outputs from each stage are fed back to the input data buffer when the input data are directly sent to the butterfly.



Figure 7: Flow Graph of the Radix-2 SDF Pipeline Architecture

Folding Architecture:- Folding is a transformation technique used in DSP architecture, implementation for minimizing the number of functional blocks in synthesizing DSP architecture.

In this paper we have calculated the computational delay as  $T_{AM}$  and  $T_{MA}$ , where  $T_{MA}$  is computation time of multiplication followed by addition and  $T_{AM}$  is computation time of addition followed by multiplication.





folding technique



Figure 9: Proposed DIT Radix-2 FFT algorithm using Radix-2 Butterfly

#### V. SIMULATION & RESULTS

The simulation results for various FFT algorithms have been tested practically by implementing in the Vertex-2p Xilinx software. Some of the snapshots of results in the Xilinx software and simulation are as follows



Figure 10: Resistor Transfer Level of radix-2 DIT butterfly



Figure 11: Output Wave Form of radix-2 DIT butterfly



Figure 12: Resistor Transfer Level of radix-2 DIF butterfly

| 📽 🖬 🗏 🕅 🖏 📶 16 10 2 🔍 Q |                   |                  |     |                |      |     |     |     |
|-------------------------|-------------------|------------------|-----|----------------|------|-----|-----|-----|
| Time (ns)               | 0                 | 100              | 200 | 300            | 400  | 500 | 600 | 700 |
| s1[7:0] 🗅               | 0                 | 1                | 2   | 3              | 4    | 5   | 6   | 7   |
| s2[7:0] 🗅               | 3                 | 4                | 5   | 6              | 7)   | 8   | 9   | 10  |
| w[7:0] 🗅                | 2                 | 3                | 4   | 5              | 6    | 7   | 8   | 9   |
| <b>g</b> 1[7:0] 🔼       | 3                 | 5                | 7   | <mark>9</mark> | 11   | 13  | 15  | 17  |
| <b>g</b> 2[7:0] 🧲       | _ <mark>∕6</mark> | <mark>)</mark> 9 | 12  | 15             | ) 18 | 21  | 24  | 27  |

Figure 13: Output Wave Form of radix-2 DIF butterfly

**Table I:** Comparison result for existing algorithm and proposed algorithm

| Radix-2 DIT Algorithm for N=8  |           |        |        |  |
|--------------------------------|-----------|--------|--------|--|
| Parameters                     | Number of |        | MCPD   |  |
|                                | Slice     | of     | (nsec) |  |
|                                |           | LUTs   | · · /  |  |
| P. K. Meher                    | -         | -      | 16.895 |  |
| et al. [1]                     |           |        |        |  |
| Proposed                       | 96        | 192    | 15.548 |  |
| Design                         |           |        |        |  |
| (unsigned)                     |           |        |        |  |
| Proposed                       | 221       | 387    | 15.137 |  |
| Design                         |           |        |        |  |
| (signed)                       |           |        |        |  |
| Radix-2 DIT Algorithm for N=16 |           |        |        |  |
| Parameters                     | Number of | Number | MCPD   |  |
|                                | Slice     | of     | (nsec) |  |
|                                |           | LUTs   |        |  |
| P. K. Meher                    | -         | -      | 21.054 |  |
| et al. [1]                     |           |        |        |  |
| Proposed                       | 256       | 512    | 19.260 |  |
| Design                         |           |        |        |  |
| (unsigned)                     |           |        |        |  |

| Proposed                        | 573                            | 1000   | 18.927  |  |  |  |
|---------------------------------|--------------------------------|--------|---------|--|--|--|
| Design                          |                                |        |         |  |  |  |
| (signed)                        |                                |        |         |  |  |  |
| _                               |                                |        |         |  |  |  |
| Radix                           | Radix-2 DIT Algorithm for N=32 |        |         |  |  |  |
| Parameters                      | Number of                      | Number | MCPD    |  |  |  |
|                                 | Slice                          | of     | (nsec)  |  |  |  |
|                                 |                                | LUTs   |         |  |  |  |
| P. K. Meher                     | -                              | -      | 24.421  |  |  |  |
| et al. [1]                      |                                |        |         |  |  |  |
| Proposed                        | 640                            | 1280   | 22.147  |  |  |  |
| Design                          |                                |        |         |  |  |  |
| (unsigned)                      |                                |        |         |  |  |  |
| Proposed                        | 1410                           | 2462   | 22.806  |  |  |  |
| Design                          |                                |        |         |  |  |  |
| (signed)                        |                                |        |         |  |  |  |
|                                 | Radix-2 DIT Algorithm for N=64 |        |         |  |  |  |
| Parameters                      | Number of                      |        | MCPD    |  |  |  |
|                                 | Slice                          | of     | (nsec)  |  |  |  |
|                                 | Shee                           | LUTs   | (11500) |  |  |  |
| P. K. Meher                     | -                              | -      | 29.421  |  |  |  |
| et al. [1]                      |                                |        |         |  |  |  |
| Proposed                        | 1536                           | 3072   | 26.319  |  |  |  |
| Design                          | 1000                           | 0012   | 201017  |  |  |  |
| (unsigned)                      |                                |        |         |  |  |  |
| Proposed                        | 2431                           | 4063   | 22.785  |  |  |  |
| Design                          |                                | .000   |         |  |  |  |
| (signed)                        |                                |        |         |  |  |  |
| Radix-2 DIT Algorithm for N=128 |                                |        |         |  |  |  |
| Parameters                      | Number of                      | Number | MCPD    |  |  |  |
|                                 | Slice                          | of     | (nsec)  |  |  |  |
|                                 |                                | LUTs   |         |  |  |  |
| P. K. Meher                     | -                              | -      | 36.421  |  |  |  |
| et al. [1]                      |                                |        |         |  |  |  |
| Proposed                        | 2739                           | 5042   | 32.895  |  |  |  |
| Design                          |                                | -      |         |  |  |  |
| (unsigned)                      |                                |        |         |  |  |  |
| Proposed                        | 3390                           | 6042   | 33.806  |  |  |  |
| Design                          |                                |        | 20.000  |  |  |  |
| (signed)                        |                                |        |         |  |  |  |
| (Signed)                        | l                              |        |         |  |  |  |

#### Table II: Result for proposed algorithm

| Radix-2 DIT Algorithm for N=256  |                                 |                   |                |  |  |
|----------------------------------|---------------------------------|-------------------|----------------|--|--|
|                                  | Number of<br>Slice              | Number of<br>LUTs | MCPD<br>(nsec) |  |  |
| Proposed<br>Design               | 7694                            | 16782             | 47.903         |  |  |
| Radi                             | Radix-2 DIT Algorithm for N=512 |                   |                |  |  |
| Proposed<br>Design               | 18943                           | 37716             | 56.989         |  |  |
| Radix-2 DIT Algorithm for N=1024 |                                 |                   |                |  |  |
| Proposed<br>Design               | 42138                           | 81920             | 67.008         |  |  |

www.ijera.com

| Table III. Result for complex number   |                 |         |        |  |
|----------------------------------------|-----------------|---------|--------|--|
| Radix-2 DIT Algorithm for N=8 and N=16 |                 |         |        |  |
| Parameters                             | Number Number N |         | MCPD   |  |
|                                        | of Slice        | of LUTs | (nsec) |  |
| N=08                                   | 438             | 864     | 24.948 |  |
| N=16                                   | 1168            | 2304    | 30.687 |  |

| Table III: Result for complex | a number |
|-------------------------------|----------|
|-------------------------------|----------|

#### **V. CONCLUSION**

The Fast Fourier transformation (FFT) is a frequently used Digital signal processing (DSP) algorithms for the applications of Orthogonal Frequency Division multiplexing (OFDM). The combination of Orthogonal Frequency Division Multiplexing (OFDM) with Multiple Input Multiple Output (MIMO) signal processing is a definite approach of enhancing the data rates of various communication systems such as Wireless LAN, e Mobile, 4G etc.

Since FFT processor is a complex module in OFDM, it is highly inevitable to design the processor in an efficient way. This research work involved the implementation of a low delay and area efficient 8-point, 16-point, 32-point, 64-point, 128point, 256-point, 512-point and 1024-point single path delay feedback (SDF) and folding technique using radix-2 DIT FFT algorithm for signed and unsigned numbers. Implementation for complex numbers is also done for N= 8 and N=16 points. The implemented algorithm used radix-2 pipelined structure at all stages. The implemented algorithm consumes about 14-15% less delay as compared to existing algorithm.

Implementation of a minimum delay and low area efficient radix-2 DIT FFT algorithm using folding technique shows the improvement of 8.46% in computational delay for mult-add (DIT) operation, 10.46% in computational delay for addmult (DIF) operation for N=16 and 7.95% in computational delay for mult-add operation, 6.487% computational delay for add-mult operation. Improved Percentage for MCPDs are

- 7.6 % computational delay for N=8
- 8.5 % computational delay for N=16
- 6.61 % computational delay for N=32
- 10.54 % computational delay for N=64
- 9.68 % computational delay for N=128

#### **IV. FUTURE SCOPE**

This work can be extended for 2048 point, 4096 point and so on. And also we can use for Radix-3, Radix-4 and so on for FFT Design. These software outputs can be verified with simulation results obtained using MODELSIM

#### REFERENCES

[1] Pramod Kumar Mehe. Basant Kumar Mohanty, Sujit Kumar Patel, Soumya

Thambipillai Srikanthan, Ganguly, and "Efficient VLSI Architecture for Decimationin-Time Fast Fourier Transform of Real-Valued Data", IEEE Transactions on Circuits And Systems-I: Regular Papers, Vol. 62, No. 12, December 2015.

- [2] Himanshu Thapaliyal and M.B Srinivas, "VLSI Implementation of RSA Encryption Using Ancient Indian Vedic System Mathematics", Center for VLSI and System Embedded Technologies, International Institute of Information Technology Hyderabad, India 2014.
- M. Ayinala, Y. Lao, and K. K. Parhi, "An in-[3] place FFT architecture for real-valued signals," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 10, pp. 652-656, Oct. 2013.
- S. S. Kerur, Prakash Narchi, Jayashree C N, [4] Harish M Kittur and Girish V A, "Implementation of Vedic multiplier for Digital Signal Processing", International VLSI, Communication & Conference on Instrumentation (ICVCI) 2011, Proceedings published by International Joural of Computer Applications® (IJCA), pp.1-6.
- Jagadguru Swami Sri Bharati Krishna Tirthaji [5] Maharaja, "Vedic Mathematics: Sixteen simple Mathematical Formulae from the Veda", Delhi (2011).
- D. Tsonev, S. Sinanovic and H. Haas, [6] "Enhanced subcarrier index modulation OFDM," IEEE Global (SIM) in Communications (IEEE Conference GLOBECOM 2011), 5-9Dec. 2011.
- M.A. Mohamed, A.S. Samarah and M.I. Fath [7] Allah, "A Novel implementation of OFDM using FPGA," in IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.11, Nov. 2011.
- [8] Charles. Roth Jr., "Digital Systems Design using VHDL", Thomson Brooks/Cole, 7th reprint, 2005.
- [9] Harpreet Singh Dhillon and Abhijit Mitra, "A Reduced-bit Multiplication Algorithm for Digital Arithmetic", International Journal of Computational and Mathematical Sciences, Febrauary 2008, pp.64-69.
- [10] R. Abualhiga and H. Haas, "Subcarrier-Index Modulation OFDM," in Proc. of the International Symposium on Personal, Indoor Mobile Radio Communications and (PIMRC), Tokyo, Japan, Sep. 13-16, 2009.
- [11] Harpreet Singh Dhillon and Abhijit Mitra, "A Reduced-bit Multiplication Algorithm for Digital Arithmetic", International Journal of Computational and Mathematical Sciences, Febrauary 2008, pp.64-69.

www.ijera.com

- [12] B. G. Jo and M. H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy", IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911–919, May 2005.
- [13] Uwe Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays, second edition, Springer, Berlin, 2004.
- [14] R.W Chang, "Synthesis of Band limited Orthogonal Signals for Multichannel data Transmission", Bell System Tech. J., pp.1775-1776, Dec 1996.
- [15] Peled an A. Ruiz, "Frequency Domain Data Transmission using Reduced Computational Complexity Algorithms", In Proc. IEEE Int. conf. Acoust., Speech, Signal Processing, pp. 964-967, Denver, CO, 1980.

Ranbeer Rathore. "Design of Fast Fourier Transform using Radix-2 Butterfly using Shift Register and Folding technique." International Journal of Engineering Research and Applications (IJERA), vol. 7, no. 10, 2017, pp. 27–34.