**RESEARCH ARTICLE** 

OPEN ACCESS

## High Performance FPGA Based Optimization Techniques for **DSP Blocks**

B.Krishna<sup>1</sup>, M.Srinu<sup>2</sup>

<sup>1</sup>Assistant Professor, ECE, AAR Mahaveer Engineering College; <sup>2</sup>Assistant Professor, ECE, AAR Mahaveer Institute of Science and Technology

## ABSTRACT

Optimization of a digital circuit is needed to attain higher performance, which includes minimizing the area occupied and increasing the speed of operation. In any digital circuit, the critical path (the path with the longest delay) decides the operating frequency of the system. The operating frequency of digital circuits can be increased by several techniques, such as pipelining and wave-pipelining (WP). The proposed technique is evaluated by implementing a 4x4 array multiplier, a 4-tap FIR filter using the array multiplier and a 4-tap Distributed Arithmetic (DA) based FIR filter under three different schemes (non-pipelining, pipelining and wave-pipelining) on a Spartan 3E FPGA. The WP array multiplier and FIR filters operate at a higher frequency than their conventionally pipelined and non-pipelined counterparts.

Keywords: Clock skew, DA, MAC, WP, OSPAM, NRE.

Date of Submission: 08-08-2017

#### I. **INTRODUCTION**

Wave pipelining is a high-performance circuit design technique that implements pipelining in logic without the use of intermediate registers; it has the potential to improve speed, efficiency and economy. The operating speed of a wave-pipelined (WP) circuit can be increased by the following three tasks: adjustment of the clock period, adjustment of the clock skew and equalization of path delays. Path-delay equalization can be done in theory, but the real challenge is to accomplish it in the presence of many different delays. To solve the path-delay equalization problem, a control circuit is inserted in the WP circuit; it acts as the critical path for the data moving from input to output.

Conventional Finite Impulse Response (FIR) filters use multipliers, adders and delay elements to produce the required output. The multipliers, which multiply the input by fixed coefficients, occupy significant area to store their temporary values and also increase the power consumption. The multipliers in the FIR filter are therefore replaced with a multiplier-less Distributed Arithmetic (DA) based technique.

Most Digital Signal Processing (DSP) algorithms require multiplication and addition in real time; the unit carrying out this function is called a Multiply-Accumulate (MAC) unit. Three types of technology exist for the implementation of DSP algorithms: Programmable DSP (PDSP) chips, Application-Specific Integrated Circuits (ASICs) and Field-Programmable Gate Arrays (FPGAs).

Typically, PDSP chips have only one MAC unit, which can perform one MAC in less than a clock cycle. DSP processors (PDSP chips) are flexible, but they might not be fast enough. The reason is that a DSP processor is general purpose, and that architecture requires instructions to be constantly fetched, decoded and executed.

Date of acceptance: 22-11-2017

ASICs have multiple dedicated MACs that perform DSP functions in parallel, but their high cost, low-volume production and the inability to make design modifications after production make them less attractive. FPGAs offer extensive capabilities for many embedded systems, from network infrastructure to military and medical applications. Many front-end DSP algorithms, such as Fast Fourier Transforms (FFTs) and Finite Impulse Response (FIR) or Infinite Impulse Response (IIR) filters, that were built with ASICs or PDSPs are now implemented on FPGAs. FPGAs have advantages over ASICs such as rapid prototyping, circuit programmability, lower Non-Recurring Engineering (NRE) costs and more economical designs.

### II. **MOTIVATION**

Multipliers are extensively used in common engineering tasks such as correlation, convolution, filtering and frequency analysis. Multiplication, especially for wide data paths, is an expensive and slow process and is often the limiting factor for the performance of a design. With the trend in modern digital systems increasingly focused on design solutions that offer high reliability and low cost under power restrictions, design verification and reliability are becoming two critical design cost components. Filters using Wave-Pipelined (WP) multipliers are found to be faster and to require less power than those using pipelined multipliers.

www.ijera.com

In a wave-pipelined system, by contrast, multiple waves of data propagate between storage elements. Wave-pipelining was first introduced by Cotton [3], who observed that the maximum rate at which logic can propagate through a circuit depends not only on the longest path delay but also on the difference between the longest and shortest path delays. Several computational "waves" (logic signals) related to different clock cycles can propagate through the logic simultaneously. The operating speed of the WP circuit can be increased by the following three tasks: adjustment of the clock period, adjustment of the clock skew and equalization of path delays. A new high-speed control circuit, which acts as the critical path, is used in [4].
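The dependence on the difference between the longest and shortest path delays can be illustrated with a small Python sketch. It is a simplified model, not the timing analysis used in this work: clock skew and register setup/hold requirements are lumped into a single hypothetical `overhead_ns` term, and the delay values in the usage lines are made-up numbers.

```python
import math


def min_wave_clock_period(d_max_ns, d_min_ns, overhead_ns=0.0):
    """Simplified lower bound on the clock period of a wave-pipelined block.

    Assumption: the period is limited by the spread between the longest
    and shortest path delays plus a lumped overhead (skew, setup, hold).
    """
    if d_min_ns > d_max_ns:
        raise ValueError("d_min_ns must not exceed d_max_ns")
    return (d_max_ns - d_min_ns) + overhead_ns


def waves_in_flight(d_max_ns, clock_period_ns):
    """Number of data waves that coexist in the logic at this clock period."""
    return math.ceil(d_max_ns / clock_period_ns)


# Hypothetical block: longest path 10 ns, shortest path 8 ns, 0.5 ns overhead.
period = min_wave_clock_period(10.0, 8.0, overhead_ns=0.5)
print(period, waves_in_flight(10.0, period))
```

In this model, equalizing the path delays (raising the shortest delay towards the longest) shrinks the bound, which is why delay equalization is listed as one of the three tasks.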

## III. CONCEPT OF PIPELINING

### A. LOGIC PIPELINING

A pipeline is a set of data processing elements connected in series so that the output of one element is the input of the next one. Consider the following circuit, which performs a calculation on a stored value and stores the result:

flip-flop ---> combinational logic (10 gates) ---> flip-flop

Assume that the combinational logic is a sequence of 10 logic gates, each introducing a delay of 1 ns. It then takes 10 ns for a change in the first flip-flop to reach the second, and therefore this circuit can be clocked at 100 MHz at most. It takes 1 clock cycle (10 ns) for the data to reach the output, and the throughput of this circuit is 1 value per cycle, or 1 value per 10 ns.

Now split the combinational logic in half, as shown below:

flip-flop ---> logic (5 gates) ---> flip-flop ---> logic (5 gates) ---> flip-flop

Now it takes 5 ns for data from the first flip-flop to reach the middle one and 5 ns from the middle to the end. The key is that these two paths are now independent, so this circuit can be clocked at 200 MHz. The latency is 2 clock cycles, or 10 ns. In addition, we get 1 value out per cycle, so the throughput has doubled to 1 value per 5 ns: we can apply an input and get the correct result at the output 10 ns later, but we can apply a new input every 5 ns. Pipelining this circuit has doubled the clock speed and throughput without affecting the latency.

The clock frequency is determined by the largest delay of any combinational logic part. When this delay is reduced, the usable clock frequency increases, so the way to decrease the delay is to chop the combinational logic into smaller chunks with synchronous registers in between.
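The arithmetic of this example can be reproduced with a short Python sketch. It assumes, as the text does, that every gate has the same delay, that the gates divide evenly among the stages and that flip-flop overhead is negligible; the function name `pipeline_timing` is illustrative only.

```python
def pipeline_timing(total_gates, gate_delay_ns, stages):
    """Return (f_max in MHz, latency in ns, ns per output value).

    Assumes equal gate delays, an even split of gates across stages
    and zero register overhead, matching the idealized example.
    """
    if total_gates % stages != 0:
        raise ValueError("gates must divide evenly among the stages")
    stage_delay_ns = (total_gates // stages) * gate_delay_ns
    f_max_mhz = 1000.0 / stage_delay_ns   # 1 / (delay in ns) expressed in MHz
    latency_ns = stages * stage_delay_ns  # time from input to output
    return f_max_mhz, latency_ns, stage_delay_ns


for stages in (1, 2):
    f_max, latency, per_value = pipeline_timing(10, 1.0, stages)
    print(f"{stages} stage(s): {f_max} MHz, latency {latency} ns, "
          f"one value per {per_value} ns")
```

With 10 gates of 1 ns each, one stage gives 100 MHz and two stages give 200 MHz, while the latency stays at 10 ns in both cases.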



Fig 3.1 Optimization schemes for FPGA-based wave-pipelined multipliers and FIR filters

### B. 4x4 PIPELINED ARRAY MULTIPLIER

The 4x4 array multiplier, built from the functional blocks M0, M1, M2 and M4 (each M is either a half adder (HA) or a full adder (FA)), is shown in Fig 3.2. X3X2X1X0 is the 4-bit multiplicand and Y3Y2Y1Y0 is the 4-bit multiplier. The full adder is the important component in each cell. Each cell also contains an AND gate, which determines whether a multiplicand bit Xj is added to the incoming partial product bit, based on the value of the multiplier bit Yi. If Yi = 0, PPi is left unchanged and passed vertically downward; otherwise, the row adds the (appropriately shifted) multiplicand to the incoming partial product PPi to generate the outgoing partial product PP(i+1).
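The row-by-row behaviour described above can be modelled in a few lines of Python. This is a behavioural sketch only: it uses integer addition in place of the individual HA/FA cells, and the function name `array_multiply_4x4` is an assumption made for illustration.

```python
def array_multiply_4x4(x, y):
    """Behavioural model of a 4x4 array multiplier.

    Row i ANDs every multiplicand bit Xj with multiplier bit Yi and adds
    the shifted result to the incoming partial product PPi.
    """
    if not (0 <= x < 16 and 0 <= y < 16):
        raise ValueError("operands must be 4-bit unsigned values")
    partial_product = 0
    for i in range(4):
        y_i = (y >> i) & 1
        row = 0
        for j in range(4):
            x_j = (x >> j) & 1
            row |= (x_j & y_i) << j      # AND gate in cell (i, j)
        partial_product += row << i      # adder row: PP(i+1) = PPi + shifted row
    return partial_product


print(hex(array_multiply_4x4(0xF, 0xF)))  # 15 * 15 = 225
```

When Yi = 0 the AND gates force the row to zero, so the partial product passes through unchanged, as in the description.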



### C. 4x4 OPTIMALLY SYNTHESIZED PIPELINED ARRAY MULTIPLIER

The objective of the synthesis technique proposed in [5] is to ensure that all four inputs of the LUTs in the FPGA are effectively engaged. Consider the optimization of the 4x4 array multiplier given in Fig 3.2. The stages involving HAs utilize the LUTs inefficiently. These stages may be modified to engage all four inputs of the LUT as follows. Stage 1 may be modified to compute the partial products due to the two least significant multiplier bits. The last 'N' stages may be reduced to 'N/2' stages by replacing the HAs with suitable functional blocks and feeding the sum and carry outputs from one stage to the next appropriately. The resulting pipelined array multiplier, referred to as the Optimally Synthesized Pipelined Array Multiplier (OSPAM), is shown in Fig 3.3. It consists of five stages of combinational logic blocks; these stages use the functional blocks M0, M2, M5, M6, M7, M8, M9 and M10.



Fig 3.3 Optimally Synthesized Pipelined Array Multiplier

### D. CONCEPT OF CONTROL CIRCUIT USED FOR CRITICAL PATH SCHEME

The control circuit consists of flip-flops and XOR gates, as shown in Fig 3.5. It is a high-speed circuit that is placed in the WP architecture, as shown in Fig 3.4, to improve the operating speed. In a WP circuit, data moves from input to output in different waves. In previous work, designers calculated the maximum ($D_{max}$) and minimum ($D_{min}$) distances from input to output and improved the speed by using the difference between these two distances ($D_{max} - D_{min}$). Calculating all the distances and then sorting out the maximum and minimum is a very time-consuming process and requires more hardware. The modified WP circuit is shown in Fig 3.4. In Fig 3.5, whatever data enters at the input X appears at the output at very high speed. The advantage of this circuit is that it is inherently fast and acts as the critical path for the data moving from input to output. This control circuit is used for implementing the 4x4 wave-pipelined array multiplier (WPAM) and the 4-tap WP FIR filter.



Fig 3.4 Modified WP Circuit with Control Circuit.



Fig 3.5 Control Circuit having D flip-flops and XOR gates.

### E. FIR FILTER USING ARRAY MULTIPLIER

The FIR filter is shown in Fig 3.6. The input samples and the impulse response coefficients, each 4 bits wide, are stored in RAMs. The products of the inputs with the filter coefficients are accumulated in the adders, and the outputs of these adders are finally accumulated to give the filter output. The throughput of the FIR filter can be improved by using pipelined multipliers and adders. In this case, a pipelined array multiplier is used and the multipliers and adders are separated by registers. A 4-tap WP FIR filter is implemented using 4x4 WPAMs, adders and the control circuit.



Fig 3.6 FIR Filter Using Array Multiplier

### F. DA BASED FIR FILTER

Memory-based structures are well suited for many DSP applications. Memory elements such as Random Access Memory (RAM) or Read-Only Memory (ROM) are used as a part or the whole of an arithmetic unit. Memory-based structures are more regular than MAC structures and have further advantages: greater potential for high throughput and reduced latency (since the memory access time is much shorter than the usual multiplication time), and lower expected dynamic power consumption, because memory-read operations involve less switching activity than conventional multipliers [1]. The multipliers are therefore replaced with memory-based structures to reduce the area and latency of the system.
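As an illustration of how a look-up table can replace the multipliers, the following Python sketch models a generic bit-serial DA inner product. The article does not describe its LUT organisation, so this is a textbook-style sketch under stated assumptions: unsigned 4-bit samples, one 16-entry LUT holding all partial sums of the coefficients, and illustrative function names (`build_da_lut`, `da_inner_product`, `da_fir`). The coefficients and samples are the ones quoted in Section V for the DA filter.

```python
def build_da_lut(coeffs):
    """LUT[addr] = sum of the coefficients whose address bit is set."""
    taps = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << taps)]


def da_inner_product(lut, window, bits=4):
    """Compute sum(h[k] * window[k]) using only LUT reads, shifts and adds."""
    acc = 0
    for b in range(bits):
        addr = 0
        for k, sample in enumerate(window):
            addr |= ((sample >> b) & 1) << k   # bit b of every sample forms the address
        acc += lut[addr] << b                  # weight the LUT output by 2**b
    return acc


def da_fir(coeffs, samples, bits=4):
    """FIR filter built on the DA inner product (unsigned samples assumed)."""
    lut = build_da_lut(coeffs)
    taps = len(coeffs)
    window = [0] * taps                        # window[k] holds x[n-k]
    outputs = []
    for x in list(samples) + [0] * (taps - 1): # flush the delay line with zeros
        window = [x] + window[:-1]
        outputs.append(da_inner_product(lut, window, bits))
    return outputs


print(da_fir([2, 3, 5, 6], [10, 2, 0, 0]))
```

Each output needs one LUT read per input bit instead of one multiplication per tap, which is the trade the text describes between memory accesses and multipliers.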

## IV. IMPLEMENTATION RESULTS

## A. SIMULATION RESULTS

The simulation results of the 4x4 array multiplier, the 4-tap FIR filter using the array multiplier and the 4-tap DA based FIR filter, each implemented with the three techniques (non-pipelining, pipelining and wave-pipelining), are shown in Fig 4.1 to Fig 4.9.



Fig 4.1 Simulation Results of 4x4 Non-Pipelined Array Multiplier

[Simulation waveform; signals shown: clk, x[3:0], y[3:0], p[7:0]; simulation time 5000 ns]

Fig 4.2 Simulation Results of 4x4 Pipelined Array Multiplier

[Simulation waveform, 0 to 1000 ns; signals shown: clk, xx[3:0], yy[3:0], pp[7:0]. Successive values: xx = 4'h0, 4'h8, 4'h4, 4'hF, 4'h2; yy = 4'h0, 4'hF, 4'hE, 4'hF, 4'hB; pp = 8'h00, 8'h78, 8'h38, 8'hE1, 8'h16]

Fig 4.3 Simulation Results of 4x4 WPAM

[Simulation waveform, 0 to 3000 ns; signals shown: clk, op[7:0]]

Fig 4.4 Simulation Results of 4-tap FIR Filter using Non-Pipelined Array Multiplier

[Simulation waveform; signals shown: clk, op[7:0]; simulation time 4000 ns. Successive values of op: 8'h00, 8'h06, 8'h16, 8'h21, 8'h2B, 8'h28]

Fig 4.5 Simulation Results of 4-tap FIR Filter using Pipelined Array Multiplier

[Simulation waveform; signals shown: clk, op[7:0]; simulation time 4000 ns]

Fig 4.6 Simulation Results of 4-tap FIR Filter using WPAM

[Simulation waveform, 0 to 3000 ns; signals shown: clk, data_out[7:0]]

Fig 4.7 Simulation Results of 4-tap Non-Pipelined DA FIR Filter



Fig 4.8 Simulation Results of 4-tap Pipelined DA FIR Filter

[Simulation waveform, 0 to 3000 ns; signals shown: clk, data_out[7:0]]

Fig 4.9 Simulation Results of 4-tap WP DA FIR Filter

## B. SYNTHESIS RESULTS

The implementation results of the 4x4 non-pipelined, pipelined and WP array multipliers and of the 4-tap FIR filter are shown in Table 4.1 and Table 4.2, respectively. The implementation results of the 4-tap DA based FIR filter using non-pipelining, pipelining and wave-pipelining are shown in Table 4.3.

Table 4.1 Implementation Results of 4x4 Non-Pipelined, Pipelined and WP Array Multipliers

| Array Multiplier | No. of Slices | No. of Slice Flip-Flops | No. of LUTs | Min period (ns) | Max Frequency (MHz) |
|------------------|---------------|-------------------------|-------------|-----------------|---------------------|
| Non-pipelining   | 17            | 16                      | 28          | 5.137           | 194.678             |
| Pipelining       | 22            | 38                      | 31          | 2.658           | 376.230             |
| Wave-pipelining  | 18            | 12                      | 30          | 1.997           | 500.663             |

 
Table 4.2 Implementation Results of 4-tap FIR filter using Non-Pipelined, Pipelined and WP Array Multipliers

| FIR using Array Multiplier | No. of Slices | No. of Slice Flip-Flops | No. of LUTs | Min period (ns) | Max Frequency (MHz) |
|----------------------------|---------------|-------------------------|-------------|-----------------|---------------------|
| Non-pipelining             | 27            | 29                      | 42          | 6.273           | 159.406             |
| Pipelining                 | 33            | 54                      | 55          | 4.113           | 243.114             |
| Wave-pipelining            | 30            | 31                      | 48          | 3.062           | 326.541             |

Table 4.3 Implementation Results of 4-tap DA based FIR Filter using Non-Pipelining, Pipelining and Wave-Pipelining

| FIR using DA    | No. of Slices | No. of Slice Flip-Flops | No. of LUTs | Min period (ns) | Max Frequency (MHz) |
|-----------------|---------------|-------------------------|-------------|-----------------|---------------------|
| Non-pipelining  | 15            | 20                      | 25          | 4.273           | 236.019             |
| Pipelining      | 22            | 38                      | 26          | 3.492           | 286.369             |
| Wave-pipelining | 16            | 17                      | 23          | 3.023           | 330.764             |
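The relationship between the minimum period and maximum frequency columns, and the relative gain of each scheme, can be checked with a short Python sketch. The numbers are copied from Tables 4.1 to 4.3; the dictionary layout and the helper names are illustrative assumptions, and the speed-up is simply the ratio of the reported maximum frequencies.

```python
# (min period in ns, max frequency in MHz) as reported in Tables 4.1 to 4.3
RESULTS = {
    "4x4 array multiplier": {
        "non-pipelining": (5.137, 194.678),
        "pipelining": (2.658, 376.230),
        "wave-pipelining": (1.997, 500.663),
    },
    "FIR using array multiplier": {
        "non-pipelining": (6.273, 159.406),
        "pipelining": (4.113, 243.114),
        "wave-pipelining": (3.062, 326.541),
    },
    "FIR using DA": {
        "non-pipelining": (4.273, 236.019),
        "pipelining": (3.492, 286.369),
        "wave-pipelining": (3.023, 330.764),
    },
}


def frequency_from_period(period_ns):
    """f (MHz) = 1000 / T (ns)."""
    return 1000.0 / period_ns


def speedup(design, scheme, baseline="non-pipelining"):
    """Ratio of reported maximum frequencies relative to the baseline scheme."""
    return RESULTS[design][scheme][1] / RESULTS[design][baseline][1]


for design, schemes in RESULTS.items():
    for scheme, (period_ns, f_mhz) in schemes.items():
        print(f"{design:28s} {scheme:16s} reported {f_mhz:8.3f} MHz, "
              f"1000/T = {frequency_from_period(period_ns):8.3f} MHz, "
              f"speed-up {speedup(design, scheme):.2f}x")
```

The reported frequencies agree with 1000/T to within about 1%, and in all three designs the wave-pipelined version has the highest reported frequency.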

### **V. FPGA IMPLEMENTATION**

This project is implemented on a Spartan-3E XC3S100E FPGA. A crystal oscillator generates a 24 MHz clock signal, which is the clock source for the FPGA. The block diagram of the hardware implementation on the FPGA is shown in Fig. 5.1.



Fig 5.1 Spartan-3E FPGA Board

This is a standard Spartan-3E FPGA board. The WP DSP blocks are implemented on this board.

### A. LCD Display



Fig 5.2 LCD Displayed Output of 4x4 WPAM

The input samples and the filter coefficients used for the 4-tap FIR filter using WPAM are $\{3, 5, 2, 1\}$ and $\{2, 4, 3, 6\}$, respectively. The result of the FIR filter is $\{6, 22, 33, 43, 40, 15\}$.

The input samples and the filter coefficients used for the 4-tap WP FIR filter using DA are $\{10, 2, 0, 0\}$ and $\{2, 3, 5, 6\}$, respectively. The result of the FIR filter is $\{20, 34, 56, 70, 12\}$.
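These results can be cross-checked in software with a direct-form FIR model, in which each output is a sum of multiply-accumulate operations over a delay line. The Python sketch below is a reference model only, not the hardware design; the function name `fir_direct` is an assumption, and the input is padded with zeros so that the delay line is flushed.

```python
def fir_direct(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n-k]."""
    taps = len(coeffs)
    delay_line = [0] * taps                     # delay_line[k] holds x[n-k]
    outputs = []
    for x in list(samples) + [0] * (taps - 1):  # zero padding flushes the filter
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for h, d in zip(coeffs, delay_line):
            acc += h * d                        # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs


print(fir_direct([3, 5, 2, 1], [2, 4, 3, 6]))
print(fir_direct([10, 2, 0, 0], [2, 3, 5, 6]))
```

The full response of each filter has seven samples. The leading samples of the two responses match the values listed above; the remaining samples are 6 for the first filter and 0, 0 for the second.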



Fig 5.3 LCD Displayed Output of 4-tap FIR Filter using WPAM



Fig 5.4 LCD Displayed Output of 4-tap WP FIR Filter using DA

### VI. CONCLUSION

An optimized multiplier is obtained for the multiplication of two 4-bit numbers by replacing the gates that do not use all the available LUT inputs efficiently, which is an important factor when targeting an FPGA. The delay is further reduced from the pipelined multiplier to the optimized multiplier, which shows improved performance. Applying the wave-pipelining technique increases the performance of the array multiplier and the FIR filter. Traditionally, the direct implementation of a K-tap FIR filter requires K MAC blocks, which are expensive to implement in an FPGA because of their logic complexity and resource usage. An alternative to computing the multiplications is to decompose the MAC operations into a series of LUT accesses and summations. The advantage of this method is that the LUTs readily available in FPGAs can be utilized efficiently. This work presents a DA architecture for the FIR filter, i.e., a multiplier-less architecture, which reduces complexity and power consumption and increases speed. The proposed architecture provides an efficient area-time implementation with significantly less latency and lower area-delay complexity than existing structures for FIR filters. The results were analyzed for a 4-tap FIR filter using a DA LUT with Xilinx 10.1i, targeting a SPARTAN-3E FPGA device.

### REFERENCES

- [1]. Jiafeng Xie, Jianjun He, Guanzheng Tan, "FPGA Realization of FIR filters for high-speed and medium-speed by using modified distributed arithmetic architectures", Microelectronics Journal 41 (2010) 365-370.
- [2]. Antoniou, "Digital Filters: Analysis, Design, and Applications", McGraw-Hill, New York, 1993.
- [3]. L. Cotton, "Maximum rate pipelined systems," in Proc. AFIPS Spring Joint Comput. Conf., 1969.
- [4]. Charanjit Singh, Balwinder Singh, "Design of High Performance Modified Wave pipelined DAA Filter with Critical Path Approach", International Journal of Electrical and Electronics Engineering (IJEEE), Vol. 1, No. 2, 2011.
- [5]. G. Lakshminarayanan, B. Venkataramani, "Optimization Techniques for FPGA-Based Wave-Pipelined DSP Blocks", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 13, No. 7, July 2005, pp. 783–793.
- [6]. Nyathi, J. and Delgado-Frias, G. J. A hybrid wave pipelined network router. *IEEE Trans. Circ. Syst. I: Fundam. Theory Appl.* 49, 12, 1764–1772. 1999.
- [7]. G. Seetharaman, B. Venkataramani, "Automation Schemes for FPGA Implementation of Wave-Pipelined Circuits", ACM Transactions on Reconfigurable Technology and Systems, Vol. 2, No. 2, Article 11, Pub. date: June 2009.
- [8]. Hirak Kumar Maity, Mitra Barun Sarkar and A. Chakrobarty, "Wave Pipelining: An Analysis for High Performance Digital Circuits", International Journal of Electronic Engineering Research, Vol. 1, Number 3 (2009), pp. 269–278.

International Journal of Engineering Research and Applications (IJERA) is **UGC approved** Journal with Sl. No. 4525, Journal no. 47088. Indexed in Cross Ref, Index Copernicus (ICV 80.82), NASA, Ads, Researcher Id Thomson Reuters, DOAJ.

B.Krishna "High Performance FPGA Based Optimization Techniques for DSP Blocks." International Journal of Engineering Research and Applications (IJERA), vol. 7, no. 11, 2017, pp. 50-55.
