# An efficient Differential Conditional Capturing Energy Recovery Flip-Flop design

## **Rajendra Prasad\* and Ritu Kanodia\*\***

\*Dept. of ECE Mewar University, Chittorgarh, 632014 Rajasthan, INDIA \*\* Dept. ECE

## Abstract—

This paper introduces an efficient approach to the design of differential conditional capturing energy recovery (DCCER) flip-flops. The main speed bottleneck of existing DCCER flip-flops is the cross-coupled set-reset (SR) latch at the output stage. To make the differential amplifier more efficient, we designed a clocked CMOS based latch for its output stage. We survey existing differential amplifier designs and present their experimental evaluation. The simulation results are compared, in terms of timing parameters, switching activity and power dissipation, with a conventional flip-flop and with a hybrid approach presented in this field. The transistors were sized at their minimum and the corresponding layouts were designed. Netlists with parasitic capacitances were extracted from the layouts and simulated using Tanner EDA in TSMC 0.18 µm technology. We applied a single-phase sinusoidal clock of 200 MHz generated by a high-efficiency clock generator. The designed circuit provides a ratioless design, reduced short-circuit power dissipation, and glitch-free operation. The power delay product (PDP) is improved by 50.54%, 17.95% and 25.21% in comparison with the conventional flip-flop for switching activity  $\alpha = 1, 0.5$  and 0.25 respectively. A PDP improvement of 14.9%, 13.66% and 8.72% is also reported in comparison with the hybrid approach. These results indicate the flip-flop's suitability for low-power, high-performance designs.

*Index Terms*—Clocked CMOS, CMOS digital integrated circuits, flip-flops, Differential amplifier, PDP.

## I. INTRODUCTION

Flip-flops (FFs) are ubiquitous elements in CMOS designs and make up a major portion of synchronous circuits. As a result, the structure of the FF used in a circuit has a large impact on system power consumption. Moreover, the type of FF used determines the amount of clock load, which directly affects the dynamic power consumption  $P_{\rm DYN}$  of a circuit. Thus, it is prudent to develop techniques that reduce the power consumption of FFs in order to reduce the overall system power consumption [1]. In addition, the clock distribution network (CDN) dissipates 30% to 60% of the total system power, and 90% of that share is consumed by the flip-flops and the part of the CDN that drives them [2]. As the power budget of today's portable digital circuits is severely limited, it is important to reduce the power dissipation in both the CDN and the flip-flops.
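For reference, the dynamic power of a switched CMOS node is commonly approximated to first order as shown below. The symbols $C_L$ (switched capacitance) and $f_{CLK}$ (clock frequency) are notation introduced here for illustration and do not appear in the original text; $\alpha$ denotes the switching activity.

$$P_{\rm DYN} = \alpha \, C_L \, V_{DD}^{2} \, f_{CLK}$$

Under this approximation the clock load of the chosen FF enters directly through $C_L$, which is why the FF structure has such a strong influence on clock power.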

Timing elements, that is latches and flip-flops, are critical to the performance of digital systems because of tight timing constraints and low-power requirements [3]. Short setup and hold times are also required for performance, but are often overlooked. In a complex system it is very often necessary to be able to scan data in and out during test and diagnosis.

Traditionally, higher performance has been obtained by increasing the clock frequency as technology scales. In deep sub-micrometer generations, however, higher performance is obtained through parallelism at the architecture level (e.g., multicore processor architectures [4]). Deeply pipelined systems exhibit inherent parallelism that requires higher fan-outs at the register outputs, which in turn calls for higher flip-flop driving strength. The impact of clock skew on the minimum cycle time increases in deep sub-micrometer designs because clock skew does not follow technology scaling. Thus the ability to absorb clock skew without affecting the setup time becomes important. The portion of the cycle time taken by the flip-flop is the sum of the setup time and the clock-to-output delay [5]. Therefore, the true measure of flip-flop delay is the time between the latest point of data arrival and the corresponding output transition.
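The delay measure described above can be written compactly. The symbol names follow the timing parameters used later in the paper; the numerical values are chosen for illustration and appear as one row of Table 3.

$$T_{DQ} = T_{su} + T_{CLK-Q}$$

For example, with $T_{su} = 0.58$ ns and $T_{CLK-Q} = 0.23$ ns this gives $T_{DQ} = 0.81$ ns.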

In this work we studied existing flip-flop architectures extensively, analyzed them, and propose a new differential amplifier based flip-flop circuit. The DCCER FF is differential in character, fast, and consumes low power. Sense amplifier based flip-flop (SAFF) circuits are used in systems such as microprocessors and digital signal processing units. Section II presents the mechanism, the timing definitions and the tools used. Section III presents the operation of the proposed latch for the differential amplifier. Section IV presents the analysis, observations and comparison with the SR flip-flop in a simple CMOS design, the clocked CMOS based design and a hybrid solution between the standard NAND-based set/reset latch and the NC<sup>2</sup>-MOS approach. Section V discusses the results and conclusions.

#### II. DIFFERENTIAL-AMPLIFIER BASED FLIP FLOP

#### A. Mechanism

Fig [1] shows the general mechanism of flip-flop operation. It differs from both the master-slave approach (a cascade of two latches) and the glitch approach. The master-slave flip-flop causes problems if the clock phases overlap.

Differential amplifier circuits accept small input signals and amplify them to generate rail-to-rail swings. They are used extensively in memory cores and in low-swing bus drivers, either to improve performance or to reduce power dissipation.

In general, flip-flop operation can be divided into two parts, shown as two blocks in fig [1]: a pulse generator (or precharged front-end amplifier [6]) and a slave latch, analogous to the master and slave latches of a master-slave combination.



Fig: 1 Basic structure of flip flop

In the pulse generator, the inputs are the data and the clock. The pulse generator produces short pulses of sufficient duration, which in turn set the slave latch. Depending on the particular realization, the pulse is generated on the rising or the falling edge of the clock. The use of a cross-coupled inverter guarantees that the differential output switches only once per cycle. The differential inputs in this implementation do not have rail-to-rail swing. In contrast, a master-slave latch is level sensitive. The sensitivity of the pulse generator may compromise the reliability and robustness of operation under certain conditions. For this reason the use of such flip-flops has been prohibited in some design methodologies such as IBM's LSSD [7].

#### B. Timing, tools and technology used:

Fig [2] shows the timing definitions used for all observations. The reference point for all measurements is the instant at which the rising edge of the clock crosses half of the supply voltage.



Fig: 2 sample waveforms for timing definition

The fig [3] represents the timing requirements to send the correct data to output.



Fig 3: Timing requirements

Setup time is defined as the time for which the data must be stable before the rising edge of the clock.

$$T_{su} \ge T_{CLK-Q} + T_{logic} + T_{hold} \qquad (1)$$

It imposes an extra constraint for proper operation. Hold time is defined as the time from the rising transition of the clock to the earliest time at which the data may change after being sampled.

$$T_{hold} \le T_{CDreg} + T_{CDlogic} \qquad (2)$$

From fig [2], the hold time will be negative and large, since the data is held before the clock edge.

The CLK-Q delay is defined as the time from the point where the rising edge of CLK reaches half of the supply voltage to the corresponding point on the more delayed output (here QB). The D-Q delay is defined as the time from the point where the D transition reaches 50% of the supply voltage to the point where the Q transition reaches 50% of the supply voltage.
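The 50% crossing definitions above can be sketched as a small measurement routine. This is a minimal sketch and not the authors' measurement setup: the function name `crossing_time`, the piecewise-linear waveforms and the sample values are hypothetical, and only the 2.5 V supply and the 50% reference level are taken from the text.

```python
from typing import Sequence

def crossing_time(t: Sequence[float], v: Sequence[float], level: float,
                  rising: bool, t_start: float = 0.0) -> float:
    """First time >= t_start at which v crosses `level` in the given
    direction, found by linear interpolation between samples."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if not crossed:
            continue
        frac = (level - v0) / (v1 - v0)
        tc = t[i - 1] + frac * (t[i] - t[i - 1])
        if tc >= t_start:
            return tc
    raise ValueError("no crossing found")

VDD = 2.5          # supply voltage in volts (from the text)
HALF = VDD / 2     # 50% reference level

# Hypothetical waveforms: time in ns, voltage in V
t   = [0.0, 1.0, 2.0, 3.0, 4.0]
clk = [0.0, 0.0, 2.5, 2.5, 2.5]   # rises between 1 ns and 2 ns
q   = [0.0, 0.0, 0.0, 2.5, 2.5]   # rises between 2 ns and 3 ns

t_clk = crossing_time(t, clk, HALF, rising=True)
t_q = crossing_time(t, q, HALF, rising=True, t_start=t_clk)
clk_to_q = t_q - t_clk
print(f"CLK-Q delay = {clk_to_q:.3f} ns")
```

The same helper can measure the D-Q delay by passing the D waveform instead of CLK as the reference signal.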

We simulated all the DCCER FFs with the Tanner EDA tool in 0.18 µm technology with a supply voltage of 2.5 V. The design was optimized at a temperature of  $27^{\circ}$ C for a clock frequency of 200 MHz, and a load capacitance of 30 fF was used at the output Q.

## Section III

#### A. Conventional NAND gate based latch in differential amplifier

This flip-flop is a dynamic flip-flop with precharge and evaluate phases of operation, as discussed in [8]. In [9] it is used as an energy recovery flip-flop to recover energy from the clock distribution network and from the clock input capacitance of the flip-flops, since the clock signal is highly capacitive. Recovering energy from the internal nodes of flip-flops in a quasi-adiabatic fashion is also desirable. However, the storage elements of flip-flops cannot be energy recovering because they drive standard (non-adiabatic) logic.

The fig [4] represents the schematic of differential amplifier based flip flop with the minimum sizing.



Fig: 4 Differential Amplifier with NAND based SR latch

Instead of using the clock for precharging, small pull-up PMOS transistors (P\_1 and P\_2) are used to charge the precharge nodes (SET and RESET). While the SET and RESET nodes are high, the output of the cross-coupled NAND latch does not change. The DCCER flip-flop uses a NAND-based set/reset latch as the storage mechanism. Conditional capturing is implemented by feeding back the outputs (Q and QB) to the control transistors N\_2 and N\_4 in the evaluation paths. Therefore, if the state of the input data (D and Dbar) is the same as that of the output, both the left and right evaluation paths are turned off, preventing SET and RESET from being discharged. This saves power at low data switching activities, when the input data remains idle for more than one clock cycle.



Fig: 5 simulated output with timing definition



Fig: 6 simulated output of SAFF with NAND based SR latch at output

The energy recovery clock is applied to a minimum-sized inverter skewed for fast high-to-low transition. Such skewing creates a sharp high-to-low transition on CLKB to ensure correct timing for the flip-flop operation. The minimum sizing of the inverter reduces its short circuit power caused by slow rising of the input clock. The clock signal and the inverter output (CLKB) are applied to transistors N\_6 and N\_5. The series combination of these transistors conducts for a short period of time during the rising transition of the clock when both the CLK and CLKB signals have voltages above the threshold voltages of the NMOS transistors.

When the CLK and CLKB signals are both above  $V_{th}$  and both D and QB are high, SET has a discharging path through the four stacked NMOS transistors (N\_1, N\_2, N\_5, N\_6), which causes the output Q to charge to 1. At this time there is no discharging path for the RESET node, so it remains high. This state transition is captured by the SET/RESET latch (made of NAND gates).
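The conditional capturing behaviour described above can be illustrated with a purely logical model. This is a sketch under stated assumptions, not a circuit simulation: the clock conduction window is idealized as one event per cycle, the RESET path is assumed to be the mirror image of the SET path (driven by Dbar and Q), and the function names are hypothetical.

```python
from typing import Tuple

def dccer_cycle(d: bool, q: bool) -> Tuple[bool, bool]:
    """Model one rising clock edge.

    Returns (new_q, discharged), where `discharged` is True when either
    precharged node (SET or RESET) is pulled low during the cycle.
    """
    qb = not q
    db = not d
    set_discharges = d and qb      # D and QB high: SET path conducts
    reset_discharges = db and q    # assumed mirror path: Dbar and Q high
    if set_discharges:
        q = True                   # SR latch captures SET
    elif reset_discharges:
        q = False                  # SR latch captures RESET
    return q, (set_discharges or reset_discharges)

def count_discharges(bits: str, q_init: bool = False) -> int:
    """Count cycles in which an internal node is discharged for a data pattern."""
    q = q_init
    count = 0
    for b in bits:
        q, fired = dccer_cycle(b == "1", q)
        count += int(fired)
    return count

# Data patterns with decreasing switching activity
for pattern in ("0101010101010101", "0011001100110011", "0000111100001111"):
    print(pattern, count_discharges(pattern))
```

In this model an internal node is discharged only in cycles where the data differs from the stored output, so the number of discharge events falls as the data switching activity falls.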

The DCCER approach is used to minimize FF power at low data switching activities. Moreover, since there are four stacked transistors in the evaluation path, significant charge sharing may occur when three of them turn on simultaneously. Properly sized pull-up PMOS transistors therefore help to reduce the effect of charge sharing. Another property of the circuit that helps to reduce charge sharing is that the clock transistor N\_6, which is the largest transistor in the evaluation path, is placed at the bottom of the stack. The diffusion capacitance of the source terminal of N\_6 is therefore grounded and does not contribute to charge sharing.
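To first order, the effect of charge sharing on a precharged node can be estimated by charge conservation. The expression below assumes that the SET node, with capacitance $C_{SET}$, is precharged to $V_{DD}$, that the internal stack nodes with total capacitance $C_{int}$ start fully discharged, and that the current supplied by the pull-up PMOS is ignored. The symbols $C_{SET}$ and $C_{int}$ are introduced here for illustration and are not taken from the paper.

$$V_{SET} = V_{DD} \, \frac{C_{SET}}{C_{SET} + C_{int}}$$

A smaller $C_{int}$ keeps $V_{SET}$ closer to $V_{DD}$, which is consistent with placing the largest transistor at the bottom of the stack, where its source diffusion capacitance is grounded.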

#### B. Hybrid approach

The output latch of the differential amplifier is replaced by a hybrid of the standard NAND based SR latch and the  $NC^2MOS$  approach [11]. In [12], Kim et al. propose a DCCER FF that uses a slave latch with two  $NC^2MOS$  circuits and two cross-coupled weak inverter pairs, which are needed to make the flip-flop static. That design has some disadvantages: it exhibits glitches and higher power dissipation at the output due to crowbar current. Fig [7] shows the schematic of the Strollo based latch [11]. We used it with a single-phase sinusoidal clock for low power applications.



Fig: 7 Antonio based SAFF

Circuit operation: If transistors N\_7, N\_8, N\_11 and N\_12 are removed, the circuit becomes a NAND based SR latch. When D is high at the rising edge of the clock, the sense amplifier drives the SET node to zero, while RESET stays high due to the feedback inverter. P\_3 is then on and N\_13 is off, which makes the output Q high, and QB is quickly pulled down to 0 through N\_12, N\_11 and N\_14. Hence the  $H \rightarrow L$  output transition requires only one stage of delay, because the output latch immediately catches the precharged value at the rising edge of the clock. Transistors P\_4, N\_9, N\_13, P\_5, N\_10 and N\_14 hold the previous values of Q and QB during the sense amplifier precharge, making the flip-flop fully static. Transistors N\_8 and N\_11 make the design glitch free. Suppose, for instance, that a CLK L $\rightarrow$ H transition occurs with both Q and D high. After the CLK rising edge, the SET node is still high and transistor N\_13 is on. Since N\_8 is off, node Q remains at VDD without a glitch.



Fig: 8 simulated glitch free output with timing definition



Fig: 9 simulated output of Antonio based flip flop

#### C. Clocked CMOS based design

#### 1) First approach

A clocked CMOS version of the latch is shown in fig [10]. It consists of a cross-coupled inverter pair, an inverter and an NMOS transistor at each of Q and QB, and four extra transistors that drive the flip-flop from one state to the other and provide synchronization. INV1, INV2, P\_3 and P\_4 at the output reduce redundant internal switching. In steady state, one inverter resides in the high state and the other in the low state, so no static path exists between VDD and GND. Transistor sizing is nevertheless essential to ensure that the flip-flop can switch from one state to the other when required.



Fig: 10 Clocked CMOS based approach

To describe the circuit operation, let us assume that D is high at the rising edge of CLK. The sense amplifier drives SET to 0 while RESET remains high; as a result N\_8 is off and N\_10 is on, driving QB high. Note that turning off N\_8 simultaneously ensures a ratioless design without crowbar current and makes the transition delay independent of the capacitive load on the other output. The feedback inverter quickly discharges QB through INV2 and drives the output Q high. The H $\rightarrow$ L transition requires only one stage of delay because the output latch catches the precharged value at the rising edge of CLK. Note that, after the pull-up of QB, the PMOS of INV4 turns on, keeping QB high even if input D changes after the CLK rising edge. INV2 and INV4 hold the previous Q and QB values during the sense amplifier precharge, making the flip-flop fully static. The output of the proposed design is shown in fig [11].



Fig: 11 simulated output of the proposed design



Fig: 12 simulated output with timing definition (first approach)

#### 2) Second approach

Fig [13] shows the schematic of the design. Its operation is the same as described above. Here we added to the first approach two NMOS transistors whose inputs are D and DB, which avoids the glitch problem. Suppose that at the clock rising edge both QB and D are high. After the CLK rising edge, the SET node is still high and transistor N\_11 is on. However, the pull-down through the speed-up network N\_7-8 does not take place, since N\_8 is off. As a consequence, QB remains stable at VDD and Q at 0 without a glitch. The absence of glitches at the output ensures safe operation and reduces power dissipation.

Fig 13: Schematic of second approach



#### **IV. Analysis and Observations**

| SAFF           | CLK-to-output delay (L→H / H→L) | Ratioless | Q and QB delay independence | Glitch-free output | Output noise immunity | Robustness against sizing | Dual outputs |
|----------------|---------------------------------|-----------|-----------------------------|--------------------|-----------------------|---------------------------|--------------|
| Conventional   | 2 stages / 3 stages             | Yes       | No                          | Yes                | No                    | Yes                       | Yes          |
| Nikolic et al. | 2 stages / 3 stages             | Yes       | Yes                         | Yes                | No                    | Yes                       | Yes          |
| Kim et al.     | 2 stages / 1 stage              | No        | Yes                         | No                 | No                    | No                        | Yes          |
| Antonio et al. | 2 stages / 1 stage              | Yes       | Yes                         | Yes                | No                    | Yes                       | Yes          |
| Proposed       | 2 stages / 1 stage              | Yes       | Yes                         | Yes                | No                    | Yes                       | Yes          |

 Table 1: (Flip-flop characteristics)

| Design                               | Total width of pulse generator, with inverter (µm) | Total width of output latch (µm) | Total width (µm) | Total no. of transistors used in output latch | Clock used              |
|--------------------------------------|----------------------------------------------------|----------------------------------|------------------|-----------------------------------------------|-------------------------|
| Conventional NAND based SR latch     | 9.1                                                | 16                               | 25.1             | 8                                             | Sinusoidal single phase |
| Matsui et al. and Montanaro et al.   | -                                                  | -                                | 54.9             | 8                                             | Square wave             |
| Antonio et al.                       | 12.58                                              | 10.75                            | 23.33            | 12                                            | Sinusoidal single phase |
|                                      | -                                                  | -                                | 44.5             | 12                                            | Square wave             |
| Proposed (II)                        | 4.7                                                | 28.6                             | 33.3             | 16                                            | Sinusoidal single phase |
| Nikolic et al.                       | -                                                  | -                                | 67.8             | 16                                            | Square wave             |
| Kim et al.                           | -                                                  | -                                | 58.4             | 14                                            | Square wave             |

Table 2: (Flip-flop measurements)

| Data pattern (α) | Tsu (ns) | Thold (ns) | Tcq (ns) | Tdq (ns) | Total power dissipation (µW) | PDP (fJ) | Flip-flop                    |
|------------------|----------|------------|----------|----------|------------------------------|----------|------------------------------|
| 0101010101010101 | 1        | 0.79       | 0.21     | 1.2      | 42.97                        | 52.08    | Conventional NAND based SAFF |
| 0011001100110011 | 0.58     | 1.48       | 0.23     | 0.81     | 42.97                        | 34.81    |                              |
| 0000111100001111 | 0.44     | 1.08       | 0.37     | 0.78     | 42.97                        | 33.51    |                              |
| 0101010101010101 | 0.49     | 1.2        | 0.197    | 0.69     | 42.96                        | 29.6     | Antonio et al.               |
| 0011001          | 0.58     | 1.48       | 0.193    | 0.77     | 42.96                        | 33.08    |                              |
| 0000111100001111 | 0.44     | 1.08       | 0.196    | 0.64     | 42.96                        | 27.5     |                              |
| 01010101010101   | 1        | 8.0        | 0.38     | 0.70     | 36.8                         | 25.76    | Proposed (II)                |
| 00110011001100   | 0.6      | 1.4        | 0.35     | 0.80     | 35.7                         | 28.56    |                              |
| 0000111          | 0.58     | 1          | 0.3      | 0.70     | 35.8                         | 25.06    |                              |

Table 3: PDP and timing parameters for switching activities  $\alpha=1$, $\alpha=0.5$ and $\alpha=0.25$  with a 200 MHz sinusoidal clock and an output load in the range 30 fF to 200 fF.

Table 1 lists the characteristics of the proposed design and the other flip-flops. The CLK-to-output delay for the  $H\rightarrow L$  transition of Q or QB is one gate delay because the output appears immediately at the rising edge of CLK; however, the delay of the two feedback inverters is also incurred in both approaches.

Table 2 lists the minimum sizing of the pulse generator and output latch, and the number of transistors used in the output latch, for different clocks. We used the sinusoidal clock for comparison. Table 3 lists the PDP and CLK-Q delay of the various flip-flops and of the proposed one for each switching activity. Of the flip-flops in Table 3, the proposed one has the lowest total power dissipation and the lowest PDP at every switching activity. The design can be sized to work at other frequencies.
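The PDP comparison can be reproduced with a few lines of arithmetic. The sketch below is illustrative: the paper does not state how PDP is computed, but the product of Tdq and total power matches the reported PDP for most rows of Table 3, so that relation is assumed here. The mapping of data patterns to α = 1, 0.5 and 0.25 follows the order in the table caption and is also an assumption, as are the dictionary and function names.

```python
# Rows transcribed from Table 3, keyed by switching activity alpha:
# (T_dq in ns, total power in µW, reported PDP in fJ)
TABLE3 = {
    "conventional": {1.0: (1.20, 42.97, 52.08),
                     0.5: (0.81, 42.97, 34.81),
                     0.25: (0.78, 42.97, 33.51)},
    "hybrid":       {1.0: (0.69, 42.96, 29.60),
                     0.5: (0.77, 42.96, 33.08),
                     0.25: (0.64, 42.96, 27.50)},
    "proposed":     {1.0: (0.70, 36.80, 25.76),
                     0.5: (0.80, 35.70, 28.56),
                     0.25: (0.70, 35.80, 25.06)},
}

def pdp_fj(t_dq_ns: float, power_uw: float) -> float:
    """Power-delay product; ns * µW gives fJ directly."""
    return t_dq_ns * power_uw

def improvement_percent(reference: float, new: float) -> float:
    """Relative reduction of `new` with respect to `reference`, in percent."""
    return 100.0 * (reference - new) / reference

for alpha in (1.0, 0.5, 0.25):
    ref_pdp = TABLE3["conventional"][alpha][2]
    new_pdp = TABLE3["proposed"][alpha][2]
    gain = improvement_percent(ref_pdp, new_pdp)
    print(f"alpha={alpha}: PDP improvement vs conventional = {gain:.2f}%")
```

Applied to the reported PDP values, this calculation agrees with the improvements over the conventional flip-flop quoted in the abstract to within rounding.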

## V. CONCLUSION

This paper introduces a new sense amplifier based flip-flop. The slave latch of the new flip-flop retains the advantages of the NC<sup>2</sup>MOS approach [15]. Future work can add preset and clear inputs to obtain an asynchronous flip-flop. The design can be used as an energy recovery flip-flop: only the storage elements of the flip-flop cannot be energy recovering, because they drive standard (non-adiabatic) logic [9], whereas the energy at the gates of the sense amplifier can be recovered, since they are not part of the storage element. The proposed design is more efficient owing to the feedback inverters used, and it consumes less total power than the conventional and hybrid flip-flops.

The proposed flip-flop gives a very good PDP with glitch-free operation, lower crowbar current and lower static power dissipation.

#### ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their constructive comments, which have substantially improved this paper.

## REFERENCES

- [1] F. Moradi, C. Augustine, A. Goel, G. Karakonstantis, T. V. Cao, D. Wisland, H. Mahmoodi, and K. Roy, "Data dependent sense amplifier flip flop for low power application," *IEEE Con.* 2010.
- [2] H. Kawaguchi and T. Sakurai, "A reduced clock swing flip flop (RCFF) for 63% power reduction," *IEEE J. Solid-State Circuits*, vol. 33, pp. 807–811, May 1998.
- [3] G. Gerosa *et al.*, "A 2.2W, 80 MHz superscalar RISC microprocessor," *IEEE J. Solid-State Circuits*, vol. 29, pp. 1440–1452, Dec. 1994.
- [4] S. Rusu, S. Tam, H. Muljono, D. Ayers, J. Chang, B. Cherkauer, J. Stinson, J. Benoit, R. Varada, J. Leung, R. D. Limaye, and S. Vora, "A 65-nm dual core multithreaded Xeon processor with 16-MB L3 cache," *IEEE J. Solid-State Circuits*, vol. 42, no. 1, pp. 17–25, Jan. 2007.
- [5] S. H. Unger and C. J. Tan, "Clocking schemes for high speed digital systems," *IEEE Trans. Comput.*, vol. C-35, no. 10, pp. 880–895, Oct. 1986.
- [6] J. M. Rabaey *et al.*, *Digital Integrated Circuits: A Design Perspective*, 2<sup>nd</sup> edition, Prentice Hall of India, 2005.
- [7] *Engineering Design Systems: LSSD Rules and Applications*, manual 3531, IBM Corp., Armonk, NY, 1985.
- [8] J. U. Nikolic, "Improved sense amplifier based flip flop: design and measurements," *IEEE Trans. Electron Devices*, vol. 35, no. 6, pp. 35–39, Jun. 2000.
- [9] H. Mahmoodi, V. Tirumalashetty, M. Cooke, and K. Roy, "Ultra low power clocking scheme using energy recovery and clock gating," *IEEE Trans. VLSI Syst.*, vol. 17, no. 1, pp. 33–46, Jan. 2009.
- [10] F. Moradi, T. V. Cao, D. Wisland, C. Augustine, A. Goel, G. Karakonstantis, K. Roy, and H. Mahmoodi, "Data dependent sense amplifier flip flop for low power application," in *2010 IEEE Int. Conf.*.
- [11] G. M. Strollo, D. D. Caro, E. Napoli, and N. Petra, "A novel high speed sense-amplifier based flip flop," *IEEE Trans. VLSI Syst.*, vol. 13, no. 11, pp. 1266–1274, Nov. 2005.
- [12] J. Kim, Y. Yang, and H. Park, "CMOS sense amplifier based flip flop with two NC<sup>2</sup>MOS output latches," *Electron. Lett.*, vol. 36, no. 6, pp. 498–500, Mar. 2000.
- [13] M. Matsui, H. Hara, Y. Uetani, L. Kim, T. Nagamatsu, Y. Watanabe, A. Chiba, K. Matsuda, and T. Sakurai, "A 200MHz 13 mm2 2D DCT macrocell using sense amplifying pipeline flip flop scheme," *IEEE J. Solid-State Circuits*, vol. 29, no. 12, pp. 1482–1490, Dec. 1994.
- [14] J. Montanaro *et al.*, "A 160 MHz 32-b 0.5-W CMOS RISC microprocessor," *IEEE J. Solid-State Circuits*, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.
- [15] J. Kim, Y. Jang, and H. Park, "CMOS sense amplifier based flip flop with two N-C<sup>2</sup>MOS output latches," *Electron. Lett.*, vol. 36, no. 6, pp. 498–500, Mar. 2000.