## A review of energy efficient ternary content addressable memory (TCAM) circuits for network application

by

Nafiur Rahman (122435), K. M. Rumman (122438), S. M. Rakib Raihan (122440)

A Thesis Submitted to the Academic Faculty in Partial Fulfillment of the Requirements for the Degree of

#### BACHELOR OF SCIENCE IN ELECTRICAL AND ELECTRONIC ENGINEERING



**Department of Electrical and Electronic Engineering** 

Islamic University of Technology (IUT)

Gazipur, Bangladesh

November 2016

#### A Dissertation on

### A review of energy efficient ternary content addressable memory (TCAM) circuits for network application

#### Submitted By

Nafiur Rahman

K. M. Rumman

S.M. Rakib Raihan

Approved by


Dr. Md. Ashraful Hoque

Professor and Departmental Head

Department of Electrical and Electronic Engineering,

Islamic University of Technology (IUT),

Boardbazar, Gazipur-1704

Date:

**Supervised By** 


Dr. Syed Iftekhar Ali

#### **Associate Professor**

Department of Electrical and Electronic Engineering,

Islamic University of Technology (IUT),

## Abstract

Ternary content addressable memories (TCAMs) are especially popular in network routers for packet forwarding and classification, but they are also beneficial in a variety of other applications that require high-speed table lookup. They implement the lookup-table function in a single clock cycle using dedicated comparison circuitry. Despite these attractive features, high power consumption is a crucial drawback of TCAMs. Hence, considerable effort is needed to design low-power TCAMs without compromising speed or voltage margin. Match line sense amplifiers (MLSAs) consume a significant portion of the TCAM power during match detection. This work compares circuit techniques for reducing TCAM power consumption.

## Acknowledgements

This work was supported by Electrical and Electronic University of Technology, OIC, Bangladesh.

We would like to express deep gratitude to our supervisor, Dr. Syed Iftekhar Ali, Associate Professor of Electrical and Electronic Engineering Department of Islamic University of Technology for providing moral support, technical guidance and encouragement. He gave us full freedom to develop our research. At the same time, he was easily accessible whenever we needed his guidance and feedback. During the tough times (when we were having difficulties in simulation results), he provided us moral support and useful tips. Our association with him made our undergraduate research an enjoyable learning experience.

We did not know much about VLSI circuit design when we started our research, and our supervisor extended his cooperation by helping us understand the basics of VLSI design. He pushed us only in those tasks that he felt we should complete without his assistance, and after finishing each such task we gained self-confidence and contentment. We are honored to have done research under his supervision.

We would like to thank Prof. Ashraful Hoque, Head of the Department, EEE, IUT, and Md. Mehedi Hassan Galib for encouraging us to do research work on VLSI circuit design. We would also like to thank our friends for supporting us during our study and research.

Last, but definitely not the least, a very special thanks to our beloved parents who always supported us with their unending love and care during all ups and downs of our research. Their encouragement, moral support, faith and love were the source of our enthusiasm.

# Contents

| Section    | Title                                                     |
|------------|-----------------------------------------------------------|
| Chapter 1  |                                                           |
| 1          | Introduction                                              |
| 1.1        | CAM Basics                                                |
| 1.2        | Binary CAM Cell                                           |
| 1.2.1      | WRITE operation                                           |
| 1.2.2      | READ operation                                            |
| 1.2.3      | SEARCH operation                                          |
| 1.3        | The Ternary CAM cell (TCAM)                               |
| Chapter 2  |                                                           |
| 2.1        | TCAM structure                                            |
| 2.1.1      | SRAM Structure                                            |
| 2.1.2      | Comparator Circuit                                        |
| 2.1.3      | TCAM Basic Structure                                      |
| 2.1.4      | TCAM Word, Array and SEARCH operation                     |
| 2.2        | Match line and Search line Sensing Schemes                |
| 2.2.1      | Conventional Scheme                                       |
| 2.2.2      | Precharge-high low-swing or charge-shared ML sensing      |
| 2.2.3      | Pipelining scheme                                         |
| 2.2.4      | CR scheme                                                 |
| 2.2.4.1    | CR-based schemes with positive feedback                   |
| Chapter 3  |                                                           |
| 3          | Simulation Results and Performance Analysis               |
| 3.1        | Current Race Scheme                                       |
| 3.1.1      | Match line current variation in Current Race Scheme       |
| 3.1.2      | Search Time                                               |
| 3.1.3      | Voltage Margin                                            |
| 3.1.4      | Peak Dynamic Power and Worst Case Energy Consumption      |
| 3.2        | Active Feedback Sensing Scheme                            |
| 3.2.1      | Match line current variation in Active Feedback Scheme    |
| 3.2.2      | Search Time                                               |
| 3.2.3      | Voltage Margin                                            |
| 3.2.4      | Peak Dynamic Power and Worst Case Energy Consumption      |
| 3.3        | Resistive Feedback                                        |
| 3.3.1      | Match line current variation in Resistive Feedback Scheme |
| 3.3.2      | Search Time                                               |
| 3.3.3      | Voltage Margin                                            |
| 3.3.4      | Peak Dynamic Power and Worst Case Energy Consumption      |
| 3.4        | Final Comparison                                          |
| Chapter 4  |                                                           |
| 4          | Conclusion                                                |
| References |                                                           |

# List of Figures

| Figure | Caption                                                                                        |
|--------|------------------------------------------------------------------------------------------------|
| 1.1    | Simple schematic of a model CAM with 4 words having 3 bits each                                |
| 1.2    | Circuit diagram of conventional 10T NOR-type BCAM cell and 9T NAND-type cell                   |
| 1.3    | Multiple cells connected side-by-side to form an n-bit data word                               |
| 2.1    | 16T NOR-type TCAM cell and a 16T NAND-type TCAM cell                                           |
| 2.2    | Single SRAM cell                                                                               |
| 2.3    | Matched state and mismatched state                                                             |
| 2.4    | One TCAM data word                                                                             |
| 2.5    | A k-word x n-digit TCAM array using NOR-type cells                                             |
| 2.6    | TCAM-based routing table implementation and packet forwarding                                  |
| 2.7    | Conventional ML sensing scheme                                                                 |
| 2.8    | Charge-shared ML sensing scheme                                                                |
| 2.9    | Charge-shared ML sensing scheme containing arbitrary k digits                                  |
| 2.10   | General form of pipelining scheme                                                              |
| 2.11   | CR ML sensing scheme: one word and dummy word                                                  |
| 2.12   | MLSA in MD power allocation technique for ML sensing                                           |
| 2.13   | AF ML sensing scheme                                                                           |
| 2.14   | RF ML sensing scheme                                                                           |
| 3.1    | ML current variation in current race scheme                                                    |
| 3.2    | Search time for current race scheme                                                            |
| 3.3    | Voltage margin for current race scheme                                                         |
| 3.4    | The peak dynamic power and worst case energy consumption for current race sensing scheme       |
| 3.5    | ML current variation in Active Feedback scheme                                                 |
| 3.6    | Search time for active feedback sensing scheme                                                 |
| 3.7    | Voltage margin in Active Feedback scheme                                                       |
| 3.8    | The peak dynamic power and worst case energy consumption for Active Feedback sensing scheme    |
| 3.9    | ML current variation in Resistive Feedback scheme                                              |
| 3.10   | Search time for resistive feedback scheme                                                      |
| 3.11   | Voltage margin for resistive feedback scheme                                                   |
| 3.12   | The peak dynamic power and worst case energy consumption for Resistive Feedback sensing scheme |

## List of Tables

| Table | Caption                                                     |
|-------|-------------------------------------------------------------|
| 3.1   | Different parameters of current race scheme                 |
| 3.2   | Aspect ratio of gate parameters of active feedback scheme   |
| 3.3   | Different parameters of active feedback scheme              |
| 3.4   | Aspect ratio of gate parameters of resistive feedback scheme |
| 3.5   | Different parameters of Resistive Feedback scheme           |
| 3.6   | Final comparison                                            |

### Chapter 1

### 1. Introduction

The massive increase in internet users throughout the world has created demand for high-speed networks. In modern network architecture, routers are the most important components. A router connects multiple networks and forwards data packets between them. Each packet contains a header and a payload. The header contains information such as a source address, a destination address, the data length, a sequence number and the data type of the packet<sup>1</sup>.

By inspecting the information in the header of an incoming packet, the router can decide the target network and can select the preferred path between the source and the destination networks. Each router maintains a routing table which contains multiple entries where each entry, known as prefix, typically contains information such as destination address and corresponding output port address<sup>2</sup>.

The sheer explosion of traffic created by new mobile and social applications is driving high-capacity line cards. This growing demand for high-speed networks is pushing the existing table look-up solutions to their limits. Many software-based methods have been proposed for longest-prefix-matching table look-up.

Software-based techniques use external static random access memory (SRAM) or dynamic random access memory (DRAM) to store and to search a data structure. These techniques use algorithms such as hashing, trees and tries to reduce computational complexity<sup>3-8</sup>.

While these approaches can reduce the number of memory accesses for a single search key, they usually cannot meet the requirement of high-speed forwarding or classification.

Using a hardware-based solution, more specifically ternary content addressable memory (TCAM), to perform packet forwarding and classification has become the de facto industrial standard<sup>9</sup>.

Content addressable memory (CAM) is a special type of memory used in very high-speed searching applications. It is also known as associative memory, associative storage or associative array.

In addition to READ and WRITE operations, a CAM can also perform a SEARCH operation. In standard random access memory (RAM), the user supplies a memory address and the RAM returns the data word stored at that address. A CAM is designed so that the user supplies a search word and the CAM searches its entire content to see if that word is stored anywhere in it. If the search word is found, the CAM returns a list of one or more storage addresses where it is found. Because a CAM searches its entire memory in a single operation, it is much faster than RAM in search applications.
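The difference between the two access styles can be illustrated with a minimal behavioral sketch. The function names and the memory contents below are hypothetical and chosen only for illustration; the software loop visits words one after another, whereas a CAM compares all words in parallel in hardware.

```python
def ram_read(memory, address):
    """RAM-style access: an address goes in, the stored word comes out."""
    return memory[address]


def cam_search(memory, search_word):
    """CAM-style access: a word goes in, every matching address comes out."""
    return [address for address, word in enumerate(memory) if word == search_word]


# Hypothetical contents: 4 words of 3 bits each.
memory = ["101", "010", "111", "010"]

print(ram_read(memory, 2))        # word stored at address 2
print(cam_search(memory, "010"))  # all addresses that hold "010"
```

A search for a word that is not stored simply returns an empty list, which corresponds to no match line staying high.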

A ternary CAM or TCAM has the additional capability of storing mask or "don't care" bits. This makes TCAMs even more attractive for packet forwarding or classification, since these applications require mask bits to be stored in the routing table.

Discrete high-performance search processors and TCAMs are typically used for layer 2 to layer 4 look-ups in higher-end Edge and Core equipment. Besides the networking equipment, CAMs are also attractive for other applications<sup>10</sup>.

However, current TCAM research is primarily driven by networking applications, which require high-capacity TCAMs with low-power and high-speed operation. With increasing search speed and larger word size, TCAM dynamic search power and energy consumption has become a major concern. TCAM chips form a part of the network processor. Low-power sub-system design is essential for integration into a system-on-chip (SoC). Therefore, energy-efficient TCAM design has become an active research area.

In this work, we present a survey of CAM circuits and architectures primarily designed to reduce dynamic search energy consumption.

## 1.1 CAM Basics

A CAM is a very good choice for implementing lookup-table functions because of its fast search capability. However, the speed of a CAM comes at the expense of increased silicon area and power consumption. The power consumption problem becomes more acute with larger CAMs. Hence, reducing power consumption without sacrificing speed or area is the main challenge going forward.

Figure 1.1 shows a CAM consisting of 4 words, with each word containing 3 bits arranged horizontally (corresponding to 3 CAM cells). There is a match line corresponding to each word (ML<sub>0</sub>, ML<sub>1</sub>, ML<sub>2</sub>, etc.) feeding into match line sense amplifiers (MLSAs), and there is a differential search line pair corresponding to each bit of the search word (SL<sub>0</sub>, $\overline{SL_0}$, SL<sub>1</sub>, $\overline{SL_1}$, etc.). A CAM search operation begins with loading the search-data word into the search-data registers. The search data is then driven onto the differential search lines, and each CAM core cell compares its stored bit against the bit on its corresponding search lines. Match lines on which all bits match remain in the high state. Match lines that have at least one mismatched bit discharge to ground. The MLSA then detects whether its match line is in the match condition or the miss condition. Finally, the encoder maps the match line of the matching location to its encoded address<sup>11</sup>.



Fig. 1.1 Simple schematic of a model CAM with 4 words having 3 bits each.

A CAM cell serves two basic functions: bit storage (as in RAM) and bit comparison (unique to CAM). CAMs can be divided into two categories: (i) binary CAMs and (ii) ternary CAMs (TCAMs). A binary CAM can store and search binary words (made of 0s and 1s). Thus, binary CAMs are suitable for applications that require only exact-match searches. A more powerful and feature-rich TCAM can store and search three states (1, 0 and X). The state 'X', also called "mask" or "don't care", allows a wildcard operation: an 'X' value stored in a cell causes a match regardless of the input bit.
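The wildcard behavior of the 'X' state can be sketched in a few lines. This is a behavioral illustration only; the function name `ternary_match` and the string representation of words are assumptions made for the example, not part of any circuit described here.

```python
def ternary_match(stored_word, search_word):
    """Return True when every stored digit equals the search bit or is 'X'."""
    if len(stored_word) != len(search_word):
        raise ValueError("stored word and search word must have equal length")
    return all(s == "X" or s == k for s, k in zip(stored_word, search_word))


# A stored 'X' matches either input bit in that position.
print(ternary_match("10X", "101"))
print(ternary_match("10X", "100"))
print(ternary_match("10X", "111"))
```

A binary CAM corresponds to the special case in which no stored digit is 'X', so only exact matches succeed.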

### 1.2 Binary CAM Cell

A binary CAM (BCAM) can store either of the two states '1' (high) or '0' (low) in each cell. Since only one of two states is to be stored, a simple 6T SRAM cell is used for data storage. Each cell contains a comparison circuit to carry out the SEARCH operation. There are two types of comparison circuit:

- 1) NAND-type
- 2) NOR-type.

Figure 1.2 shows the BCAM cells with both types of comparison circuits.



Fig. 1.2 Circuit diagrams of conventional (a) 10T NOR-type BCAM cell and (b) 9T NAND-type cell.

Figure reference: [10]

In Figs. 1.2(a) and 1.2(b), transistors M1–M6 constitute the 6T SRAM cells. In Fig. 1.2(a), the NOR-type comparison circuit is composed of transistors M7–M10, while in Fig. 1.2(b), the NAND-type comparison circuit comprises transistors M7–M9. READ and WRITE operations are carried out with the help of the bit lines (BLs) and the word line (WL). The search bit is supplied on the search lines (SLs). Each cell (bit) in a data word is connected to a match line (ML). The logic level at the ML is affected by the match or mismatch condition, which is sensed to deduce the match result. In a NOR-type cell, the ML is pulled down to ground if the stored bit does not match the search bit. In a NAND-type cell, a match results in a connection between two successive ML segments, ML1 and ML2. Since READ, WRITE and SEARCH operations are never performed simultaneously, the same lines are sometimes used for READ/WRITE and SEARCH, i.e., BLs and SLs are merged.

#### 1.2.1. WRITE operation

The WRITE operation is performed by driving the data to be written onto the BLs and enabling the WL. Enabling the WL turns on the access transistors M5 and M6. Data from the BLs pass through the access transistors to the internal nodes (Data and $\overline{Data}$) and are preserved there as full rail-to-rail voltages because of the feedback action of the cross-coupled inverters. The data to be written may differ from the stored value. In that case, the appropriate internal node (Data or $\overline{Data}$) needs to be flipped from '1' to '0'. Since NMOS has higher carrier mobility than PMOS, using same-sized (minimum-sized) access transistors (M5, M6) and load transistors (M1, M3) in the SRAM cell accomplishes this easily.

#### 1.2.2 READ operation

The READ operation is performed by precharging the BLs to '1' and enabling the WL. After precharging, the BL drivers are turned off. Access transistors M5 and M6 are ON, which causes one of the BLs to start discharging. The voltage drop on one BL creates a voltage difference between the BLs, which is sensed by the BL sense amplifier (BLSA) and converted into a full rail-to-rail voltage. To prevent an accidental flip of the stored bit during the READ operation, the driver transistors (M2, M4) are chosen to be stronger than the access transistors (M5, M6). Generally, the driver transistors are 1.5 to 2.5 times wider (for the same gate length) than the access transistors.

#### 1.2.3 SEARCH operation

The search key is supplied through the SLs. In Fig. 1.2(a), if Data = SL and hence $\overline{Data}$ = $\overline{SL}$, the ML remains disconnected from ground. Otherwise, one of the pull-down paths (through M7, M8 or through M9, M10) connects the ML to ground. Therefore, transistors M7 through M10 implement the XNOR function, i.e., ML = '0' if Data ≠ SL (ML = Data XNOR SL).



Fig. 1.3 (a) One BCAM data word consisting of n-bit NOR-type cells and (b) one data word consisting of n-bit NAND-type cells.

Figure 1.3(a) shows how multiple cells are connected side-by-side to form an n-bit data word. Even if a single bit is mismatched in a word, the ML is connected to ground through the comparison circuit of the mismatched cell and has logic '0'. Only in the case of a full word match does the ML remain in the high-impedance state. The decision circuit senses this difference and produces the corresponding match result. The comparison circuits of multiple parallel cells form parallel pull-down paths for the ML, just like the parallel pull-down paths of the driver network of a NOR gate. That is why this type of cell is called a NOR-type cell.

In Fig. 1.2(b), if Data = SL, then either M8 (when Data = SL = '1') or M7 (when Data = SL = '0') will be ON and will pass a logic high to the gate of M9. Therefore, M9 will also be ON, making a connection between ML1 and ML2 (and eventually a connection to ground). Transistors M7 through M9 implement XOR logic, i.e., ML = '0' if Data = SL (ML = Data XOR SL). One problem of the NAND-type cell is that the gate of M9 never reaches full $V_{DD}$ but only $V_{DD} - V_{tn}$, since an NMOS pass transistor cannot pass a logic high well. In case of a mismatch (Data $\neq$ SL), M9 has logic '0' at its gate and hence remains OFF. This results in no connection between ML2 and ground. When multiple cells are connected side-by-side to form a word as shown in Fig. 1.3(b), all ML segments (ML<sub>0</sub> to ML<sub>n</sub>) are connected to ground only in case of a full match between the data word and the search key. Even if there is a mismatch in one bit, all ML segments (and the decision circuit) to the right side of that bit remain disconnected from ground.

The decision circuit distinguishes between a match and a mismatch and produces an appropriate output. The ML consists of series-connected NMOS transistors, just like the pull-down path of the driver network of a NAND gate, which makes the naming of the cell appropriate.
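The complementary behavior of the two word organizations can be summarized in a short behavioral sketch. The function names are hypothetical and the model ignores all electrical effects (precharge, threshold drop, timing); it only captures whether the match line ends up with a conducting path to ground.

```python
def nor_ml_grounded(stored_bits, search_bits):
    """NOR-type word: parallel pull-downs, so any mismatched cell grounds the ML."""
    return any(d != s for d, s in zip(stored_bits, search_bits))


def nand_ml_grounded(stored_bits, search_bits):
    """NAND-type word: series pass transistors, so only a full match grounds the ML."""
    return all(d == s for d, s in zip(stored_bits, search_bits))


stored = [1, 0, 1]
print(nor_ml_grounded(stored, [1, 0, 1]), nand_ml_grounded(stored, [1, 0, 1]))  # full match
print(nor_ml_grounded(stored, [1, 1, 1]), nand_ml_grounded(stored, [1, 1, 1]))  # one mismatch
```

In both organizations the decision circuit has to translate the ML state into the same match result, which is why its construction depends on the cell type and sensing scheme.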

### 1.3 The Ternary CAM cell (TCAM)

A binary CAM can perform exact-match searches, but it cannot store a don't care or mask condition. This drawback leads to a more powerful CAM known as the ternary content addressable memory (TCAM). A TCAM can store the don't care condition. Don't cares act as wildcards during a search and are particularly attractive for implementing longest-prefix-match searches in routing tables.

### **CHAPTER 2**

### 2.1 TCAM structure

The word "ternary" comes from the fact that each cell in a TCAM can store three states, namely high, low and mask or don't care 'X'. Representing three states requires two bits. Hence, each cell in a TCAM contains two SRAM cells. A TCAM cell can be either of two types: NOR-type or NAND-type. Figure 2.1 shows both types of TCAM cells.



Fig. 2.1 (a) 16T NOR-type TCAM cell and (b) 16T NAND-type TCAM cell. One TCAM cell contains two SRAM cells (bits).

### 2.1.1 SRAM Structure

An SRAM cell is made of two cross-coupled inverters and can store one bit of data. In Fig. 2.2, two NMOS access transistors, M1 and M2, are used to read or write a bit in the SRAM cell. When the word line is high, M1 and M2 are turned on and the bit present on the bit lines passes into the SRAM cell. When the word line is kept low, the bit written in the cell is retained. To read data from the SRAM cell, the word line is made high, which turns on M1 and M2; the stored bit then passes onto the bit lines and is read.



#### 2.1.2 Comparator Circuit

The comparator is a circuit that compares the search data with the stored data and reports the result as either a match or a mismatch.

If the stored data and the search data are the same, the condition is called the match condition and the ML is floating [Fig. 2.3(a)]. If the stored data and the search data are not the same, the condition is called the mismatch condition and the ML is grounded [Fig. 2.3(b)]. Whether the ML is charged or discharged during sensing depends on the sensing scheme.

Fig. 2.3 (a) Matched state and (b) mismatched state.

#### 2.1.3 TCAM Basic Structure

In the NOR-type TCAM cell shown in Fig. 2.1(a), transistors M1–M4 make up the comparison circuit. The three states are stored as Data1Data2 = 01 (low), Data1Data2 = 10 (high) and Data1Data2 = 00 (don't care). The state Data1Data2 = 11 is not allowed. Similar encoding is used for the search data, i.e., SL1SL2 = 01, 10 or 00.

When the stored data is Data1Data2 = 00, the masking is called local masking. SL1SL2 = 00 means global masking. When Data1Data2 = SL1SL2 (match) or local/global masking is being used, neither of the ML pull-down paths (through M1, M2 or through M3, M4) is active and the ML remains disconnected from ground. Only when Data1Data2 $\neq$ SL1SL2 (mismatch) and there is no masking is the ML pulled down to ground by one of the two transistor pairs M1M2 or M3M4.
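The two-bit encoding and the resulting pull-down condition can be checked with a small truth-table sketch. The pairing of each stored bit with the opposite search line (Data1 with SL2, Data2 with SL1) is an assumption made here so that the model reproduces the behavior described above; the actual transistor wiring should be taken from Fig. 2.1(a).

```python
# Two-bit encoding used for both stored data and search data.
ENCODE = {"0": (0, 1), "1": (1, 0), "X": (0, 0)}


def nor_tcam_cell_pulls_down(stored, search):
    """Return True when the cell connects the ML to ground (unmasked mismatch)."""
    d1, d2 = ENCODE[stored]
    s1, s2 = ENCODE[search]
    # Assumed pairing: one series path is gated by (Data1, SL2), the other by (Data2, SL1).
    return bool((d1 and s2) or (d2 and s1))


for stored in "01X":
    for search in "01X":
        print(stored, search, nor_tcam_cell_pulls_down(stored, search))
```

Only the two unmasked mismatches (stored '0' with search '1', and stored '1' with search '0') activate a pull-down path; local masking (stored 'X') and global masking (search 'X') both leave the ML disconnected from ground.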

In Fig. 2.1(b), the NAND-type cell uses separate cells for data and mask. Transistors M1 through M4 form the comparison circuit. Local masking is achieved by storing '1' in the mask cell (X = '1'), which turns on pass transistor M4 and connects the ML segments on both sides (ML1 and ML2) irrespective of the value stored in the data cell. Global masking is achieved by supplying SL1 = '1' and SL2 = '1' (SL1SL2 = 00 is not allowed). This turns pass transistor M3 ON, since either M1 or M2 is always ON. That in turn connects the ML segments on both sides irrespective of the values stored in the data and mask cells. When there is no masking, i.e., X = '0' and SL1 = $\overline{SL2}$, the operation is the same as that of the BCAM NAND cell. Pass transistor M3 turns ON only if the stored data and the search data match, i.e., Data = SL1 (and $\overline{Data}$ = SL2 = $\overline{SL1}$). Therefore, only in case of a mismatch do both pass transistors M3 and M4 remain OFF.
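The conduction condition of the NAND-type TCAM cell can be written as a small behavioral sketch. It assumes that, when no global masking is applied, SL2 carries the complement of SL1; the function name and argument order are hypothetical and chosen only for illustration.

```python
def nand_tcam_cell_conducts(data, mask_x, sl1, sl2):
    """Return True when the cell connects its two ML segments (ML1 and ML2)."""
    if sl1 == 0 and sl2 == 0:
        raise ValueError("SL1SL2 = 00 is not allowed for the NAND-type cell")
    m4_on = mask_x == 1            # local masking: mask cell stores '1'
    if sl1 == 1 and sl2 == 1:
        m3_on = True               # global masking: M3 is ON for any stored data
    else:
        m3_on = data == sl1        # no masking: M3 is ON only on a match
    return m3_on or m4_on


print(nand_tcam_cell_conducts(data=1, mask_x=0, sl1=1, sl2=0))  # match
print(nand_tcam_cell_conducts(data=1, mask_x=0, sl1=0, sl2=1))  # mismatch
print(nand_tcam_cell_conducts(data=1, mask_x=1, sl1=0, sl2=1))  # local masking
print(nand_tcam_cell_conducts(data=0, mask_x=0, sl1=1, sl2=1))  # global masking
```

The segments stay disconnected only in the unmasked mismatch case, which is the single situation in which both M3 and M4 are OFF.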

In packet forwarding/classification applications, the WRITE operation is carried out to update the routing table and the READ operation to verify a successful WRITE. These operations are infrequent, while SEARCH is the most frequently performed task. READ/WRITE operations on the SRAM cells within TCAM cells are carried out using the same procedure described for the BCAM.

### 2.1.4. TCAM Word, Array and SEARCH operation

A TCAM word is formed by joining TCAM cells side-by-side. The construction of a TCAM word is shown in Fig. 2.4.



Fig. 2.4 One TCAM data word consisting of (a) *n*-digit NOR-type cells and (b) *n*-digit NAND-type cells. BLs, WLs and access transistors have not been shown for clarity.

In Fig. 2.4(a), the ML is pulled down to ground if there are one or more mismatched cells in the word. Otherwise, the ML remains floating. In Fig. 2.4(b), the whole ML (all intermediate nodes ML<sub>n</sub>, ML<sub>n-1</sub>, ..., ML<sub>1</sub>) is grounded only if there is a full match between the data word and the search key. The decision circuit senses the state of the ML and produces a match result. The construction of the decision circuit varies with the ML sensing scheme. In all cases, the match result is '1' when there is a full match. Otherwise, the match result is '0', signifying one or more mismatches. Multiple words are used to form a table/array of stored words, as shown in Fig. 2.5. During the SEARCH operation, the search key is compared with each word simultaneously and the match results for all the words become available within one clock cycle.



Fig. 2.5 A k-word  $\times$  n-digit TCAM array using NOR-type cells.

For packet forwarding/classification applications, all the ML sensing outputs (MLSOs) are inputs to a priority encoder (not shown). In classless inter-domain routing (CIDR), the IP address prefixes are allowed to have variable lengths. When storing addresses in the table, if the prefix size is less than the word size, the least significant bits are padded with mask bits. To implement longest prefix matching (LPM), the prefixes in the table are sorted according to their actual lengths, i.e., the entry with the smallest number of mask bits (longest prefix) has the highest priority. In case of multiple matches, the priority encoder chooses the MLSO of the highest-priority prefix, i.e., the longest prefix. Figure 2.6 shows a simple example of packet forwarding using a TCAM.



Fig. 2.6 TCAM-based routing table implementation and packet forwarding.

In Fig. 2.6, a 4-word x 6-digit TCAM array stores the routing table in descending order of priority from bottom to top. Because of the mask/wildcard bits, each entry represents a range. For example, the second highest priority entry 0110XX means packets with destination addresses in the range 011000 to 011011 have to be routed to port B.

If the destination address is 011010, there are matches with the second and fourth entries. However, the second entry is selected by the priority encoder since it has higher priority (fewer mask bits), and the packet is sent to port B. This is how LPM is implemented using TCAMs.
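The lookup described above can be sketched behaviorally as a set of match lines followed by a priority encoder. Only the entry 0110XX mapped to port B is taken from the example in the text; the other three entries and their ports are hypothetical placeholders, chosen so that the destination address 011010 matches the second and fourth entries. The list is ordered with the highest-priority (longest) prefix first.

```python
def ternary_match(entry, key):
    """True when every digit of the entry equals the key bit or is masked ('X')."""
    return all(e == "X" or e == k for e, k in zip(entry, key))


def match_lines(table, key):
    """One match result per stored word, as produced by the ML sensing outputs."""
    return [ternary_match(prefix, key) for prefix, _port in table]


def priority_encode(mls):
    """Return the index of the highest-priority asserted match line, or None."""
    for index, hit in enumerate(mls):
        if hit:
            return index
    return None


# Index 0 has the highest priority (fewest mask bits).
table = [
    ("010101", "A"),  # hypothetical entry
    ("0110XX", "B"),  # entry from the example in the text
    ("100XXX", "C"),  # hypothetical entry
    ("01XXXX", "D"),  # hypothetical entry
]

mls = match_lines(table, "011010")
winner = priority_encode(mls)
print(mls, table[winner][1])
```

In hardware all match lines are evaluated simultaneously; the list comprehension here only imitates that result sequentially.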

### 2.2 Match line and Search line Sensing Schemes

There are many match line and search line sensing schemes. Some of them are discussed here.

### 2.2.1 Conventional Scheme

Figure 2.7(a) shows the conventional scheme for match detection proposed for NOR-type TCAM<sup>12</sup>. Match detection requires the following sequence of events: first, discharge all SLs to ground; next, precharge all the MLs to high (using the precharge signal); then broadcast the search key on the SLs. If there is a full match, the corresponding ML retains its voltage. If there are one or more mismatches, the ML has conducting path(s) to ground and is discharged. The decision circuit is composed of a charging PMOS transistor and an ML sense amplifier (MLSA).

The MLSA senses the logic level present at the ML and produces a high output for a match and a low output for a mismatch.



Fig. 2.7 (a) Conventional ML sensing scheme where SLs are discharged to ground and MLs are precharged to high before match evaluation. Only the comparison logic of nth cell is shown for clarity. (b) Modified comparison logic suitable for conventional scheme.

The comparison circuit of the scheme in Fig. 2.7(a) suffers from a charge-sharing problem. During match evaluation, after the SLs are discharged, M1 and M2 turn OFF. The MLs are precharged to high and the search key is supplied. Depending on the search bit, either M1 or M2 may turn on. The charge stored on the ML is then shared with node A1 or A2, which reduces the ML voltage. If this charge sharing happens in a large number of cells in a fully matched word, the ML voltage may drop to a small value and lead to a wrong match result. That is why the modified comparison circuit shown in Fig. 2.7(b) is preferred for the conventional scheme [16].

In the modified circuit, the stored bit controls the states of M1 and M2, and since this bit is constant during a search, the charge-sharing problem is eliminated.

Generally, only a few words in an array fully match. MLs and SLs are heavily capacitive. As shown in Fig. 2.5, the NOR-type ML is shared by all the cells in a word. Therefore, the ML capacitance ($C_{ML}$) is directly proportional to the number of cells (digits) in a word. Similarly, the same SL pair is shared by all the cells in a column. Therefore, the SL capacitance ($C_{SL}$) depends on the number of entries in the table. Modern routing tables may contain several hundred thousand entries. During a search, all the MLs and SLs are activated simultaneously. Switching these highly capacitive lines causes large power consumption. The problem with the conventional precharge-high technique is that all the MLs are precharged to $V_{DD}$ and most of them are discharged to ground during match evaluation. This wastes a large amount of energy. The power consumed by a single mismatched ML due to precharge and discharge can be estimated as [10]

$$P_{miss} = C_{ML} V_{DD}^{2} f \qquad (1)$$

where $f$ is the frequency of the SEARCH operation. Since there is only a small number of matches, the overall ML power consumption with $w$ MLs can be estimated as

$$P_{ML} = w C_{ML} V_{DD}^{2} f \qquad (2)$$

Energy-efficient match detection or ML sensing techniques try to reduce ML and SL power consumption.
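To get a feel for the magnitudes involved, the two estimates above can be evaluated numerically. The parameter values in this sketch (ML capacitance, supply voltage, search frequency, number of words) are hypothetical and are not taken from the simulations in this work.

```python
def mismatch_ml_power(c_ml, v_dd, f):
    """Eq. (1): power of one mismatched ML that is precharged and discharged."""
    return c_ml * v_dd ** 2 * f


def total_ml_power(w, c_ml, v_dd, f):
    """Eq. (2): ML power of w words, assuming almost every ML mismatches."""
    return w * mismatch_ml_power(c_ml, v_dd, f)


# Hypothetical operating point, for illustration only.
C_ML = 100e-15   # ML capacitance in farads
V_DD = 1.2       # supply voltage in volts
F = 100e6        # search frequency in hertz
W = 1024         # number of words (match lines)

print(f"P_miss = {mismatch_ml_power(C_ML, V_DD, F) * 1e6:.2f} uW")
print(f"P_ML   = {total_ml_power(W, C_ML, V_DD, F) * 1e3:.2f} mW")
```

Because the power grows linearly with the number of words and quadratically with the supply voltage, reducing the ML voltage swing is a natural target for the low-power schemes discussed next.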

### 2.2.2 Precharge-high low-swing or charge-shared ML sensing

One technique to reduce ML power consumption and to increase speed is to limit the ML voltage swing [13-15]. ML power consumption is proportional to the ML voltage swing ($V_{ML}$) and is given by [10]

 $P_{ML} = w C_{ML} V_{DD} V_{ML} f \qquad (3)$ 

By using charge sharing, this technique reduces  $V_{ML}$ . The technique in Refs. 13 and 14 uses a separate capacitor to store charge in the precharge phase. In the next phase, this charge is shared with the ML. Figure 2.8 shows a simplified schematic of the NOR-type ML sensing scheme proposed in Refs. 13 and 14.

Fig. 2.8 Charge-shared ML sensing scheme

First, the ML is discharged to ground and a tank capacitor ( $C_{tank}$ ) is charged to full  $V_{DD}$ . During match evaluation, the charge stored in  $C_{tank}$  is shared with the ML ( $C_{ML}$ ). The final ML voltage ( $V_{ML}$ ) is determined by the values of  $C_{tank}$  and  $C_{ML}$ , so  $V_{ML}$  can be controlled by choosing a suitable value of  $C_{tank}$ .
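The dependence of the final ML voltage on the two capacitances can be sketched with ideal charge conservation: a tank charged to V<sub>DD</sub> shares its charge with an ML that starts at ground. This relation is an idealisation (no leakage, no parasitics) and is not quoted from Refs. 13 and 14; the function names and numbers below are hypothetical.

```python
def shared_ml_voltage(v_dd, c_tank, c_ml):
    """Final voltage of a matched ML after ideal charge sharing with the tank."""
    return v_dd * c_tank / (c_tank + c_ml)


def tank_for_target_swing(v_dd, v_target, c_ml):
    """Tank capacitance needed for a target ML swing (inverse of the relation above)."""
    if not 0 < v_target < v_dd:
        raise ValueError("target swing must lie between 0 and v_dd")
    return c_ml * v_target / (v_dd - v_target)


def low_swing_ml_power(w, c_ml, v_dd, v_ml, f):
    """ML power with reduced swing, following the low-swing estimate given above."""
    return w * c_ml * v_dd * v_ml * f


# Hypothetical values for illustration only.
V_DD, C_ML, F_SEARCH, W = 1.8, 100e-15, 100e6, 1000
c_tank = tank_for_target_swing(V_DD, 0.6, C_ML)
v_ml = shared_ml_voltage(V_DD, c_tank, C_ML)
full = low_swing_ml_power(W, C_ML, V_DD, V_DD, F_SEARCH)
reduced = low_swing_ml_power(W, C_ML, V_DD, v_ml, F_SEARCH)
print(f"C_tank = {c_tank * 1e15:.1f} fF, V_ML = {v_ml:.2f} V, power ratio = {reduced / full:.2f}")
```

A smaller tank gives a smaller swing and lower power, but also less voltage available for sensing, which is the noise margin trade-off of this technique.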

If the word is fully matched, the ML can retain its voltage. Otherwise, ML is discharged to ground. The drawbacks of this technique are additional area required by the tank capacitor and low noise margin. The technique in Ref. 15 divides the NOR-type ML into four segments and pre-charges capacitances of segment 1 and 4 to full  $V_{DD}$  as shown in Fig. 2.9(a).





Fig. 2.9 (a) Charge-shared ML sensing scheme, (b) one segment containing arbitrary k digits and (c) match sensor block when  $V_{lf}$  and  $V_{rf}$  are greater than  $V_{tn}$ .

During the precharge phase, SLs are kept at low logic level to keep ML segments in high impedance state. Signal CS is also low to prevent charge sharing. In the second phase of the search, SLs are loaded with search key and CS is asserted to open two pass gates to let the charge stored in pre charged segments to be shared by remaining two segments. If the two segments in a block fully match the corresponding search key fragments, they can retain their voltages ( $V_{lf} > 0$  and  $V_{rf} > 0$ ).

Magnitudes of the resulting ML voltages ( $V_{lf}$  and  $V_{rf}$ ) depend on the sizes (number of cells) of the segments. Mismatches cause ML voltages to discharge to ground ( $V_{lf}=0$  and/or  $V_{rf}=0$ ). The match sensor block combines the match results from the left and right blocks and produces an additional match result. The ML segments are never pre-discharged to ground before the search begins. Therefore, when the same word is matched in two or more successive searches, any charge remaining in the ML from the previous search cycle can be reused to reduce the energy consumption of the current SEARCH operation. The technique in Ref. 28 is targeted at reducing the peak power consumption, though the reported worst case energy figure is large. Also, the construction of the array is complex due to the larger number of ML segments. Using equal-sized segments can save approximately 50% energy compared to the conventional scheme ( $V_{ML}=V_{DD}/2$  in Eq. (3)). Figure 2.9(c) shows the match sensor block in the special case when  $V_{lf}$  and  $V_{rf}$  are greater than the threshold voltages of N1 and N2, respectively.

#### 2.2.3 Pipelining scheme

Pipelining is a variation of selective pre charge where ML is divided into more than two segments. Figure 2.10 shows the general form of pipelining scheme for ML using NOR-type cells. Like selective pre charge, searching is carried out serially segment-by-segment. If there is a mismatch in any one segment, charging/match detection in subsequent segments in a word is aborted. Each pipelining stage consists of a portion of ML and its own MLSA which is activated by the match result in the previous stage. ML segmentation results in breaking down the large ML capacitance into smaller portions which are charged only if the previous stage results in a match. This results in energy savings.
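The segment-by-segment behaviour can be illustrated with a purely logical model that ignores all circuit detail. In this sketch a stored ternary digit is written as `0`, `1` or `X` (don't care), and the search stops at the first mismatching segment. The function names, the word and the segment sizes are hypothetical.

```python
def digit_matches(stored, key_bit):
    """A stored ternary digit matches if it is 'X' (don't care) or equals the key bit."""
    return stored == "X" or stored == key_bit


def pipelined_search(word, key, segment_sizes):
    """Search one word segment by segment.

    Returns (is_match, segments_evaluated). Later segments are not evaluated
    (and, in hardware, not charged) once an earlier segment mismatches.
    """
    if not (len(word) == len(key) == sum(segment_sizes)):
        raise ValueError("word, key and segment sizes must cover the same width")
    start = 0
    evaluated = 0
    for size in segment_sizes:
        evaluated += 1
        segment = zip(word[start:start + size], key[start:start + size])
        if not all(digit_matches(s, k) for s, k in segment):
            return False, evaluated
        start += size
    return True, evaluated


key = "10110011"
for word in ("10X1X0XX", "11X1X0XX", "10X0X0XX"):
    print(word, pipelined_search(word, key, [2, 2, 2, 2]))
```

Words that mismatch in an early segment activate only a fraction of their ML, which is where the energy saving comes from.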



Fig 2.10 General form of pipelining scheme using four pipelined stages.

Different authors have implemented the pipelining scheme in different forms<sup>16-24</sup>. In Ref. 16, four stages, each consisting of NAND-type TCAM cells, have been used. In the first and third stages, the NAND-type cells have been modified to have a PMOS chain in the ML (instead of an NMOS chain), which propagates logic high (instead of logic low) in case of a match. The technique in Refs. 17 and 18 uses three stages, where each of the first two contains two NAND cells and the third stage contains the rest of the bits implemented using NOR cells. The technique in Refs. 19 and 20 uses five stages, with the first containing 8 bits and each of the remaining four containing 34 bits. NOR-type cells and a current-race (CR)-type MLSA (discussed later) have been used in each stage. In Ref. 21, the CAM cell has been modified to make each bit an individual segment, so pipelining operates on a bit-by-bit basis. Additional circuitry is required in each cell to implement this technique. References 22 and 23 use a large number of ML segments, each containing only 4/6/8 NAND-type TCAM cells, to keep the signal propagation delay small. Up to four segments are compared in parallel and match decisions from all segments are propagated to the next segments in a crisscross fashion called butterfly connection. The butterfly connection makes the mismatch information in any one segment propagate to all the next segments with a high degree of parallelism. In yet another variation in Ref. 13, small ML segments as in Refs. 22 and 23 have been interconnected using a tree AND-type ML combined with straightforward or leap-frog interconnection, with the same objective of increasing parallelism in the SEARCH operation. The drawback of the pipelining technique is increased latency and area overhead due to additional circuits. Due to the serial nature of the SEARCH operation, pipelining schemes tend to have slower match detection.

#### 2.2.4 CR scheme

The CR ML sensing scheme<sup>25</sup> is one of the most cited ML sensing techniques found in the literature. Many authors have used this technique as a basis for the construction of their own techniques in an attempt to achieve better performance than CR. Therefore, CR-based schemes have evolved as a separate class of sensing scheme. The TCAM array in the CR scheme is constructed using NOR-type cells. The original CR scheme was targeted at reducing the ML power consumption of the conventional scheme. The main difference between the CR scheme and the conventional scheme is that the CR scheme is a precharge-low scheme while the conventional scheme is precharge-high. Instead of charging all MLs to high, the CR scheme pre-discharges all MLs to low.

During match evaluation, all MLs are charged towards high. Matched MLs charge quickly to a large voltage (due to NOR-type cells), while mismatched MLs have a much lower voltage due to the presence of discharging paths. The precharge-low technique has another advantage. In the conventional scheme, the SLs need to be discharged to zero voltage during the precharge phase to ensure that MLs remain disconnected from ground (Fig. 2.7(b)). During evaluation, SLs are loaded with the actual search key. In the CR scheme, since MLs are pre-discharged, SLs need not be switched to zero. This reduced SL switching activity results in around 50% saving of SL energy. Controlling the timing of clocked circuits is an issue in CAM design. A common technique to address this issue is the use of a dummy word which is always matched.





Fig. 2.11 CR ML sensing scheme: (a) one word and (b) the dummy word.

The associated dummy ML is used to control the duration of precharge and evaluation phase. This minimizes the effects of process variations since the dummy ML tracks the process variations in the rest of the array. CR scheme eliminates unnecessary ML charging by using same dummy word concept. Figure 2.11 shows the CR scheme.

The MLSA consists of a charging unit and a sensing unit. The operation starts by resetting any voltage on the ML ( $V_{ML}$ ) and the MLSA outputs (MLSOs and DMLSO) to zero using the MLRST signal. Then, asserting signal MLEN turns on transistor M2, causing  $I_{ML}$  to flow. The ML capacitance ( $C_{ML}$ ) starts charging. If the word is fully matched, the ML can charge up to the threshold voltage of M3, turning it ON, so MLSO becomes high. If the word is not matched,  $V_{ML}$  is small since the ML has discharging path(s) to ground. M3 remains OFF and MLSO remains zero. The dummy word is always matched by local masking and its MLSA always produces a high DMLSO. A delayed and inverted version of DMLSO, i.e., MLOFF, is used to turn off the M2 transistors in all words. This eliminates unnecessary charging (and energy consumption) of MLs. The programmable delay after DMLSO ensures that all the matched MLs get sufficient time to charge up to the threshold voltage of M3 if the dummy word detects the match earlier (due to process variations).  $V_{bias}$  controls  $I_{ML}$  and hence controls the speed and energy consumption of the match detection process.
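The race between matched and mismatched MLs can be sketched with a first-order model. It assumes an ideal constant current source, a linear resistance for each mismatching cell's pull-down path, and no clamping at the supply; the real MLSA current is not constant, so this is only a qualitative illustration. All names and component values are hypothetical.

```python
import math


def ml_voltage(t, i_ml, c_ml, r_path, k):
    """ML voltage at time t after MLEN for a word with k mismatching cells.

    k == 0: the ML is isolated from ground and ramps linearly.
    k > 0:  k parallel pull-down paths (r_path each) limit the final voltage.
    """
    if k == 0:
        return i_ml * t / c_ml
    r_eq = r_path / k
    return i_ml * r_eq * (1.0 - math.exp(-t / (r_eq * c_ml)))


def match_sense_time(v_th, i_ml, c_ml):
    """Time a fully matched ML needs to reach the sensing threshold v_th."""
    return c_ml * v_th / i_ml


# Hypothetical values, not taken from the thesis.
I_ML, C_ML, R_PATH, V_TH = 20e-6, 50e-15, 20e3, 0.5
t_sense = match_sense_time(V_TH, I_ML, C_ML)
for k in (0, 1, 2, 8):
    print(f"k = {k}: V_ML(t_sense) = {ml_voltage(t_sense, I_ML, C_ML, R_PATH, k):.3f} V")
```

In this model the dummy word plays the role of the `k = 0` case: charging is stopped shortly after it reaches the threshold, before any mismatched ML can do so.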

In CR scheme, NOR cell comparison circuit shown in Fig. 2.7(a) is preferred over the comparison circuit shown in Fig. 2.7(b). The ON/OFF states of M1 and M2 in Fig. 2.7(b) determine the parasitic capacitance of ML. For the comparison circuit shown in Fig. 2.7(b), ML parasitic capacitance depends on stored data. Since different cells have different stored data,  $C_{ML}$  varies from ML to ML. But, for the comparison circuit shown in Fig. 2.7(a),  $C_{ML}$  depends on search data. Since search data bits are same along a column,  $C_{ML}$  remains the same for all MLs in a search. This ensures good matching between MLs and prevents sensing error due to capacitance variation<sup>10</sup>.

CR scheme supplies the same initial current (on the rising edge of MLEN) to both mismatched and matched MLs. But, since matched MLs charge to higher voltage, I<sub>ML</sub> gradually decreases in matched MLs. On the contrary, mismatched MLs have lower resistance path(s) to ground. The equivalent resistance of the ML pull-down path decreases with increasing number of mismatches. Therefore, I<sub>ML</sub> increases with increasing number of mismatches. Since most of the MLs are mismatched, large currents to mismatched MLs cause significant wastage of energy. This problem can be solved by supplying smaller currents to mismatched MLs.

### 2.2.4.1 CR-based schemes with positive feedback

The idea of using positive feedback in CR MLSA was first proposed in Refs. 26 and 27. This scheme is called current saving technique <sup>26</sup> or mismatch-dependent (MD) power allocation technique<sup>27</sup>.



Fig. 2.12 MLSA in MD power allocation technique for ML sensing.

In this technique, a feedback unit has been added to the basic CR MLSA to detect mismatched MLs during charging and reduce the currents to those MLs. Figure 2.12 shows the MD scheme. The feedback unit contains a level shifter and a feedback circuit to implement feedback. In the speed-optimized setting, the nodes MLSO,  $V_{ML}$  and  $V_{VAR}$  are pre-discharged to ground first. MLEN starts the charging of all MLs. Initially, both  $V_{ML}$  and  $V_{VAR}$  increase by currents  $I_{ML}$  and  $I_{bias}$ , respectively. As  $V_{VAR}$  increases, current  $I_{ML}$  decreases. But, with increasing  $V_{ML}$ , the level shifter output also increases.  $V_{ML}$  in a matched ML increases faster than that in a mismatched ML. The level shifter output becomes sufficiently high in a matched ML to turn on N1 and hence  $V_{VAR}$  starts to discharge. With reduced  $V_{VAR}$ ,  $I_{ML}$  for the matched ML increases again. For a mismatched ML,  $V_{ML}$  rises slowly and to a lower value, which depends on the number of mismatches present in the word. Therefore, for mismatched MLs, N1 may be weakly turned on or may remain OFF. This results in a small or no reduction in  $V_{VAR}$ . Therefore,  $I_{ML}$  in a mismatched ML keeps decreasing. Matched MLs get a higher average  $I_{ML}$  than mismatched MLs.

Same dummy word as in CR scheme has been used to generate MLOFF signal to control ML charging duration. In terms of ML energy, this technique can offer significant energy reduction compared to CR scheme. But, the level shifter and the feedback circuit consume considerable amount of energy. In case of full match, transistor N1 is fully turned on. For small number of mismatches (1-bit or 2-bit), N1 remains partially ON. This causes establishment of conducting path from VDD to ground in the feedback circuit. Again, for larger number of mismatches, V<sub>ML</sub> is small since ML has multiple discharging paths to ground. This makes both the level shifter transistors ON and creates a conducting path from V<sub>DD</sub> to ground. Conducting paths from V<sub>DD</sub> to ground cause static power consumption.

In order to overcome the problems of MD scheme, Refs. 25 and 28 have proposed two MLSAs with feedback—active feedback (AF) MLSA and resistive feedback/ shielding (RF) MLSA. Figure 2.13 shows ML sensing scheme with AF MLSA. After  $V_{ML}$  and MLSO are reset to zero (by MLRST), search is started by asserting MLEN.



Fig. 2.13 AF ML sensing scheme.

M1 and M2 turn ON, causing the gate capacitance of M4 (plus the drain capacitances of M2 and M3 and the wiring capacitance) to charge. The initial  $I_{ML}$  currents are the same in all MLSAs, which is also true for the  $I_{Chr}$  currents. Therefore,  $V_{CS}$  and  $V_{ML}$  increase with time.

As a result,  $I_{ML}$  and  $I_{Chr}$  decrease with time. Now, for a matched ML,  $V_{ML}$  rises quickly since the ML is disconnected from ground. This turns M2 OFF ( $I_{Chr} = 0$ ) when  $V_{M2,source} - V_{ML} \leq V_{M2,threshold}$ .

Since M3 is always ON,  $V_{CS}$  starts to discharge through M3 in a matched ML. That makes M4 conduct more again, so  $I_{ML}$  increases again. For a mismatched ML,  $V_{ML}$  rises slowly. This keeps M2 ON and makes  $V_{CS}$  keep increasing, eventually turning M4 OFF ( $I_{ML}$=0). A mismatched ML gets a smaller current than a matched ML, and the energy consumption in mismatched words becomes small.  $V_{bias}$  and the MLSA transistor sizes need to be tuned to make this feedback mechanism effective. The dummy word provides the MLOFF signal to terminate ML charging. Some energy is wasted due to the current through M3 (to ground), especially in mismatched MLs.

Figure 2.14 shows the RF MLSA. First, the MLRST signal resets the voltages at the ML, the sensing point (SP) and MLSO. Then, the MLEN signal initiates the search operation by turning on M2. V<sub>bias</sub> controls the total charging current  $I_{Chr}$ . Initially, the  $I_{Chr}$  currents are the same in all MLs and so are the  $I_{ML}$  and  $I_{SP}$  currents. Feedback action is implemented by a single NMOS transistor M3. As the ML voltage is increased by  $I_{ML}$ ,  $V_{gs}$  and  $V_{ds}$  of M3 decrease. Therefore,  $I_{ML}$  decreases. The reduction in  $I_{ML}$  causes an increase in  $I_{SP}$ , which in turn increases the voltage at SP ( $V_{SP}$ ) and  $V_{ds}$  of M3.



Fig. 2.14 RF ML sensing scheme.

As  $V_{ML}$  in a matched word increases faster and to a higher value,  $V_{SP}$  in a matched word also increases to a higher value. Since the capacitance at node SP ( $C_{SP}$ ) is much lower than  $C_{ML}$ , the rate of rise of  $V_{SP}$  is much larger than that of  $V_{ML}$ . In a mismatched ML,  $V_{ML}$  rises slowly. Therefore,  $V_{SP}$  also rises slowly.  $V_{SP}$  in a matched ML reaches the threshold voltage of M4 much quicker than that in a mismatched ML and produces a high output at MLSO. A dummy word, fully matched by local masking as in the CR scheme, signals the end of match detection (by its own MLSO) and terminates further charging of the ML and SP nodes in all words (by MLOFF).  $V_{SP}$  of mismatched words does not get sufficient time to charge up to the sensing threshold voltage.

This technique of ML sensing is extremely simple in construction, i.e., only two additional transistors compared to CR MLSA are required. Yet, the technique is very effective not only to reduce energy consumption but also to speed up the SEARCH operation. Energy reduction occurs due to the fact that large ML capacitances are not required to be charged to as large voltages as in CR or other feedback techniques for proper match detection. Energy overhead occurs due to the charging of node SP.

Since  $C_{SP}$  is very small, this energy overhead is small compared to other feedback techniques. The drawback of this technique is the requirement of two analog control voltages ( $V_{bias}$  and  $V_{res}$ ). In order to improve the performance further and eliminate the requirement of two bias voltages, Refs. 29 and 30 have used positive feedback twice, in a way that controls both feedback paths using one bias voltage. Figure 20 shows the dual feedback MLSA of Refs. 29 and 30. One positive feedback is provided by the transistor N1, as in the RF scheme. The second feedback is provided by the feedback unit, which is a modified version of the MD scheme designed to eliminate the static power consumption problem. The signal DMLSO is obtained from the dummy word MLSA output. The feedback actions create a large difference between the I<sub>SN</sub> currents in matched and mismatched MLs. Match sensing is possible with a very small ML voltage, which results in superior speed and energy consumption compared to other feedback schemes.

### Chapter 3

### 3. Simulation Results and Performance Analysis

Our HSpice simulations use a 130 nm, 1.8 V CMOS technology. As mentioned in Chapter 2, a TCAM works in the form of an array, i.e., multiple words × multiple bits. In our simulations we have used TCAM arrays of 24 words × 24 bits and a dummy word of the same width.

We have performed a comparative analysis of the Current Race scheme and two positive feedback based schemes, Active Feedback and Resistive Feedback. Although there are many performance parameters, we worked only on search time, voltage margin, peak dynamic power and worst case energy consumption.

### 3.1 Current Race Scheme

The circuit diagram of the current race scheme is given in Fig. 2.11. It has two parts, namely the charging unit and the sensing unit. We need to change the gate parameters of the MOSFETs in the charging unit to make the feedback work. V<sub>bias</sub> was set high, and L<sub>min</sub> and W<sub>min</sub> are the minimum feature sizes.

In the following subsections we discuss different performance parameters, namely search time, voltage margin, peak dynamic power and worst case energy consumption, for the current race scheme.

### 3.1.1 Match line current variation in Current Race Scheme

In the current race scheme, the ML current is lowest for a fully matched ML and increases with the number of mismatched bits. Our simulation result in Fig. 3.1 shows exactly this behavior.



Figure 3.1 ML current variation in current race scheme

### 3.1.2 Search Time

Search time is defined as the time from 50% of MLEN to 50% of the final output of a matched ML. Using the bias voltage  $V_{\text{bias}} = 1.8V$ , we found a search time of 810.90 ps for the current race scheme, as shown in Figure 3.2.



Figure 3.2: Search time for current race scheme

To determine the search time, we took the time difference between 50% of the MLEN voltage and 50% of the matched MLSO (match line sense output). In our case the full match was word number one in our routing table [MLSO 16].
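The 50%-to-50% measurement described above can be reproduced on exported waveform samples with a short script. This is a minimal sketch that assumes the waveforms are available as lists of time and voltage samples on a common time base; the helper names and the sample data are hypothetical and are not the simulation results reported here.

```python
def crossing_time(times, values, level):
    """First time the sampled waveform rises through `level` (linear interpolation)."""
    for i in range(len(times) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 < level <= v1:
            t0, t1 = times[i], times[i + 1]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never rises through the requested level")


def search_time(times, mlen, mlso, v_dd):
    """Delay from 50% of MLEN to 50% of the matched MLSO."""
    half = 0.5 * v_dd
    return crossing_time(times, mlso, half) - crossing_time(times, mlen, half)


# Hypothetical waveforms (time in ps, voltage in V), for illustration only.
t = [0, 100, 200, 800, 900, 1000]
mlen = [0.0, 1.8, 1.8, 1.8, 1.8, 1.8]
mlso = [0.0, 0.0, 0.0, 0.0, 1.8, 1.8]
print(f"search time = {search_time(t, mlen, mlso, 1.8):.1f} ps")
```

Linear interpolation between samples is a design choice here; with a fine simulation time step its error is small compared to the measured delay.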

#### 3.1.3 Voltage Margin

Among all types of mismatches, a one-bit mismatch gives the maximum resistance in the ML pull-down path, since there is only one path through which the ML can discharge. If there are multiple mismatches, multiple pull-down paths exist in parallel, so the equivalent resistance from the ML to ground is lower and the ML discharges faster as the number of paths increases. Maximum resistance in the pull-down path means less charge leakage from the ML to ground during match evaluation. Hence an ML with a 1-bit mismatch charges faster than MLs with more than one mismatch. A 1-bit mismatch is therefore the hardest to detect and has the highest probability of being detected as a false match, so there should be a distinct voltage gap between a fully matched ML and a 1-bit mismatched ML.

Voltage margin is defined as the difference between the sensing threshold of the sensing unit and the maximum voltage to which a 1-bit mismatched ML is charged. It has been calculated using the graphical method shown in Figure 3.3 for the current race scheme.



Figure 3.3: Voltage margin for current race scheme

To find the voltage margin, we first determined the crossing point of the matched ML and MLSO waveforms. Then we determined the maximum voltage to which the 1-bit mismatched ML charged; the difference between the two is the voltage margin.

In the case of the Current Race scheme, the voltage margin was 864.89 mV.
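The graphical procedure for the voltage margin can also be written as a small calculation on sampled waveforms. The sketch below assumes that the matched ML, its MLSO and the 1-bit mismatched ML are sampled on the same time base, and that the sensing threshold is taken as the voltage where the rising MLSO crosses the matched ML. The function names and sample values are hypothetical.

```python
def crossing_voltage(ml_match, mlso_match):
    """Voltage at which the rising MLSO waveform crosses the matched ML waveform."""
    for i in range(len(ml_match) - 1):
        d0 = ml_match[i] - mlso_match[i]
        d1 = ml_match[i + 1] - mlso_match[i + 1]
        if d0 > 0 and d1 <= 0:
            frac = d0 / (d0 - d1)  # linear interpolation between the two samples
            return ml_match[i] + frac * (ml_match[i + 1] - ml_match[i])
    raise ValueError("MLSO never crosses the matched ML waveform")


def voltage_margin(ml_match, mlso_match, ml_one_bit_mismatch):
    """Sensing threshold minus the peak voltage of the 1-bit mismatched ML."""
    return crossing_voltage(ml_match, mlso_match) - max(ml_one_bit_mismatch)


# Hypothetical samples in volts, for illustration only.
ml_match = [0.0, 0.4, 0.8, 1.0, 1.1]
mlso_match = [0.0, 0.0, 0.0, 1.8, 1.8]
ml_one_bit = [0.0, 0.1, 0.2, 0.25, 0.25]
print(f"voltage margin = {voltage_margin(ml_match, mlso_match, ml_one_bit) * 1e3:.0f} mV")
```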

#### 3.1.4 Peak Dynamic Power and Worst Case Energy Consumption

Peak power consumption with the worst case data pattern in the routing table is a critical TCAM performance criterion. Many energy saving techniques concentrate on reducing average power consumption while the peak power consumption increases. Increased peak power consumption means that more power has to be allocated to the TCAM chip; this power is useful only for a short duration, and during the rest of the search cycle most of it remains unutilized. Lower peak power consumption therefore means that a cheaper supply can be used or the extra power can be used for other components. The worst case routing table used in the energy comparison has also been used to obtain the peak power consumption of the various schemes.

The energy consumption of a scheme depends on the mismatch conditions of the stored words, and assessing the probabilities of the different mismatch conditions may be difficult. We therefore prefer to evaluate the worst case energy of each scheme. Fully matched words consume the highest energy among all types of words. Among mismatched words, the energy per word decreases with the number of mismatches for active feedback and resistive feedback, while in current race it increases with the number of mismatches. So a one-bit mismatch causes the maximum energy consumption per mismatched word for active feedback and resistive feedback, and the minimum for current race, where the maximum energy consumption occurs in the case of a full mismatch. We therefore prepared two routing tables for worst case energy consumption. For the active feedback and resistive feedback schemes, the routing table contains two fully matched words and 22 words with a one-bit mismatch. For current race, we prepared the routing table with two fully matched words.
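The worst case table for the two feedback schemes (two fully matched words and 22 one-bit mismatches in a 24 × 24 array) can be generated programmatically. The sketch below works on plain binary strings without don't-care digits; the search key and the choice of which bit is flipped in each word are assumptions made only for illustration.

```python
def flip_bit(word, pos):
    """Return a copy of the binary string `word` with the bit at `pos` inverted."""
    flipped = "1" if word[pos] == "0" else "0"
    return word[:pos] + flipped + word[pos + 1:]


def count_mismatches(word, key):
    """Number of bit positions where the stored word differs from the search key."""
    return sum(1 for s, k in zip(word, key) if s != k)


def worst_case_table_feedback(key, n_words=24, n_matched=2):
    """Table with n_matched full matches; every other word has a one-bit mismatch."""
    table = [key] * n_matched
    for i in range(n_words - n_matched):
        table.append(flip_bit(key, i % len(key)))
    return table


key = "10" * 12  # hypothetical 24-bit search key
table = worst_case_table_feedback(key)
print([count_mismatches(word, key) for word in table])
```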

The peak dynamic power and worst case energy consumption for the current race sensing scheme are shown in Figure 3.4.



Figure 3.4: The peak dynamic power and worst case energy consumption for current race sensing scheme

Peak dynamic power is the maximum peak in the graph and worst case energy can be calculated from the area under the curve.
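Both quantities can be extracted from a sampled power waveform: the peak is the largest sample and the energy is the area under the curve, approximated here with the trapezoidal rule. The waveform below is a hypothetical triangle used only to show the calculation; it is not the simulated power curve.

```python
def peak_power(power):
    """Largest sample of the power waveform."""
    return max(power)


def energy(times, power):
    """Area under the power curve, computed with the trapezoidal rule."""
    total = 0.0
    for i in range(len(times) - 1):
        total += 0.5 * (power[i] + power[i + 1]) * (times[i + 1] - times[i])
    return total


# Hypothetical waveform: time in seconds, power in watts.
t = [0.0, 1e-9, 2e-9]
p = [0.0, 10e-3, 0.0]
print(f"peak power = {peak_power(p) * 1e3:.2f} mW")
print(f"energy     = {energy(t, p) * 1e15:.1f} fJ")
```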

The peak dynamic power of Current Race scheme was 17.87 mW and the worst case energy consumption was 2042.95 fJ.

| Parameter                          | Current Race Scheme |
|------------------------------------|---------------------|
| Search Time [ps]                   | 810.9               |
| Voltage Margin [mV]                | 864.89              |
| Peak Dynamic Power [mW]            | 17.87               |
| Worst Case Energy Consumption [fJ] | 2042.95             |

Table 3.1: Performance parameters of the current race scheme

## 3.2 Active Feedback Sensing Scheme

In this scheme we have to change the aspect ratios of the MOSFETs to make the feedback work properly. The gate parameters used in the active feedback scheme are listed below.

| Nmos Name | Length (Lmin) | Width (Wmin ) |
|-----------|---------------|---------------|
| M1        | 1             | 7/3           |
| M2        | 7/3           | 7/3           |
| M3        | 11/4          | 7/3           |
| M4        | 1             | 8/3           |
| M5        | 1             | 10/3          |
| MLRST BAR | 1             | 10/3          |
| M7        | 10/3          | 5/4           |
| M6        | 1             | 10/3          |
| MLRST     | 1             | 10/3          |

Table 3.2: Aspect ratio of gate parameters of active feedback scheme

 $V_{\text{bias}}$  was tuned to 1.8 V.  $L_{\text{min}}$  and  $W_{\text{min}}$  are the minimum feature sizes.

In the following subsections we discuss different performance parameters, namely search time, voltage margin, peak dynamic power and worst case energy consumption, for the Active Feedback scheme simulated with the aspect ratios mentioned above.

### 3.2.1 Match line current variation in Active Feedback Scheme

In the Active Feedback scheme, the ML current is highest for a fully matched ML and decreases with the number of mismatched bits. Our simulation result in Fig. 3.5 shows exactly this behavior.



Figure 3.5: ML current variation in Active Feedback scheme

#### 3.2.2 Search Time

For the above configuration, we obtained a search time of 485.58 ps, as shown in Figure 3.6.



Figure 3.6: Search time for active feedback sensing scheme

### 3.2.3 Voltage Margin

In the case of the Active Feedback scheme, the voltage margin was 675.59 mV, as shown in Figure 3.7.



Figure 3.7: Voltage margin in Active Feedback scheme

#### 3.2.4 Peak Dynamic Power and Worst Case Energy Consumption

The peak dynamic power and worst case energy consumption for the Active Feedback sensing scheme are shown in Figure 3.8.



Figure 3.8: The peak dynamic power and worst case energy consumption for Active Feedback sensing scheme

The peak dynamic power for the Active Feedback sensing scheme was 14.05 mW and the worst case energy consumption was 936.88 fJ.

| Parameter                          | Active Feedback Scheme |
|------------------------------------|------------------------|
| Search Time [ps]                   | 485.58                 |
| Voltage Margin [mV]                | 675.59                 |
| Peak Dynamic Power [mW]            | 14.05                  |
| Worst Case Energy Consumption [fJ] | 936.88                 |

Table 3.3: Performance parameters of the active feedback scheme

## 3.3 Resistive Feedback

In this scheme we have to change the aspect ratios of the MOSFETs to make the feedback work properly. The gate parameters used in the resistive feedback scheme are listed below.

| MOS name  | Length (Lmin) | Width (Wmin) |
|-----------|---------------|--------------|
| M1        | 1             | 10/3         |
| M2        | 1             | 10/3         |
| M3        | 1             | 10/3         |
| MLRST2    | 1             | 7/3          |
| MLRST1    | 1             | 10/3         |
| MLRST BAR | 1             | 10/3         |
| M7        | 10/3          | 11/9         |
| M4        | 1             | 10/3         |

Table 3.4: Aspect ratio of gate parameters of resistive feedback scheme

V<sub>res</sub> was tuned to 1.8 V. L<sub>min</sub> and W<sub>min</sub> are the minimum feature sizes.

In the following subsections we discuss different performance parameters, namely search time, voltage margin, peak dynamic power and worst case energy consumption, for the Resistive Feedback scheme simulated with the aspect ratios mentioned above.

#### 3.3.1 Match line current variation in Resistive Feedback Scheme

In the Resistive Feedback scheme, the ML current is highest for a fully matched ML and decreases with the number of mismatched bits. Our simulation result in Fig. 3.9 shows exactly this behavior, which also shows that the feedback is working properly.



Figure 3.9: ML current variation in Resistive feedback scheme

#### 3.3.2 Search Time



For the above configuration, we obtained a search time of 372.2 ps, as shown in Figure 3.10.

Figure 3.10: Search time for resistive feedback scheme

#### 3.3.3 Voltage Margin

In the case of the Resistive Feedback scheme, the voltage margin was 628.5 mV, as shown in Figure 3.11.



Figure 3.11: Voltage margin for resistive feedback scheme

#### 3.3.4 Peak Dynamic Power and Worst Case Energy Consumption

The peak dynamic power and worst case energy consumption for the Resistive Feedback sensing scheme are shown in Figure 3.12.



Figure 3.12: The peak dynamic power and worst case energy consumption for Resistive Feedback sensing scheme

The peak dynamic power for the Resistive Feedback sensing scheme was 14.17 mW and the worst case energy consumption was 1120.95 fJ.

| Parameter                          | Resistive Feedback Scheme |
|------------------------------------|---------------------------|
| Search Time [ps]                   | 372.2                     |
| Voltage Margin [mV]                | 628.5                     |
| Peak Dynamic Power [mW]            | 14.17                     |
| Worst Case Energy Consumption [fJ] | 1120.95                   |

Table 3.5: Performance parameters of the resistive feedback scheme

## 3.4 Final Comparison

| Comparison Parameter               | Current Race | Active Feedback | Resistive Feedback |
|------------------------------------|--------------|-----------------|--------------------|
| Search Time [ps]                   | 810.9        | 485.58          | 372.2              |
| Voltage Margin [mV]                | 864.89       | 675.59          | 628.5              |
| Peak Dynamic Power [mW]            | 17.87        | 14.05           | 14.17              |
| Worst Case Energy Consumption [fJ] | 2042.95      | 936.88          | 1120.95            |

Table 3.6: Final comparison

Among the three schemes, Resistive Feedback is superior in terms of search time, but it has the lowest voltage margin, and its peak dynamic power and worst case energy consumption are slightly higher than those of Active Feedback.

In terms of voltage margin, Current Race is superior, although it is the worst in terms of search time, peak dynamic power and worst case energy consumption.

In terms of worst case energy consumption and peak dynamic power, Active Feedback is superior. Its search time and voltage margin lie between those of the other two schemes.
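The ranking can be checked directly from the values in Table 3.6. The short script below copies the table into a dictionary, picks the best scheme per parameter and computes the percentage reduction relative to Current Race; only the dictionary keys and function names are introduced here for illustration.

```python
# Values copied from Table 3.6.
RESULTS = {
    "Current Race": {
        "search_time_ps": 810.9, "voltage_margin_mV": 864.89,
        "peak_power_mW": 17.87, "energy_fJ": 2042.95,
    },
    "Active Feedback": {
        "search_time_ps": 485.58, "voltage_margin_mV": 675.59,
        "peak_power_mW": 14.05, "energy_fJ": 936.88,
    },
    "Resistive Feedback": {
        "search_time_ps": 372.2, "voltage_margin_mV": 628.5,
        "peak_power_mW": 14.17, "energy_fJ": 1120.95,
    },
}
HIGHER_IS_BETTER = {"voltage_margin_mV"}  # every other parameter should be small


def best_scheme(metric):
    """Scheme with the best value of `metric` in Table 3.6."""
    pick = max if metric in HIGHER_IS_BETTER else min
    return pick(RESULTS, key=lambda scheme: RESULTS[scheme][metric])


def reduction_vs_current_race(scheme, metric):
    """Percentage reduction of `metric` relative to the Current Race scheme."""
    ref = RESULTS["Current Race"][metric]
    return 100.0 * (ref - RESULTS[scheme][metric]) / ref


for metric in ("search_time_ps", "voltage_margin_mV", "peak_power_mW", "energy_fJ"):
    print(f"{metric:18s} best: {best_scheme(metric)}")
for scheme in ("Active Feedback", "Resistive Feedback"):
    print(f"{scheme}: energy reduction {reduction_vs_current_race(scheme, 'energy_fJ'):.1f} %")
```

The energy reductions computed this way are roughly 54% for Active Feedback and 45% for Resistive Feedback relative to Current Race.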

## **CHAPTER 4**

### 4. Conclusion

TCAMs are gaining importance in high-speed lookup intensive applications. However, the high power consumption of TCAMs is limiting their popularity and versatility. A significant portion of the TCAM power is consumed by MLSAs for match detection. We discussed the CR scheme and two MLSAs that apply positive feedback for power reduction in ML sensing. Instead of providing the same current to all MLs, these MLSAs modulate the ML current source such that a large current flows into a matched ML (ML<sub>0</sub>) and a small current flows into a mismatched ML (ML<sub>k</sub>). We tuned the gate parameters in such a way that the combination maximizes the speed and noise margin and minimizes the energy consumption of the ML sensing scheme. The above statement implies that the proposed MLSAs do not necessarily reduce the power consumption (Power = Energy × Frequency), because the increase in frequency offsets the reduction in energy. However, power reduction can be achieved if the frequency is not increased. Simulation results of the resistive feedback scheme (in 130 nm CMOS technology) show about 45-46% reduction in energy over the conventional CR-MLSA. The energy reduction was best in the active feedback scheme, at about 54-55% over the conventional CR-MLSA. In addition, the two positive feedback based schemes improve the robustness of ML sensing by feeding less current to ML<sub>1</sub> and more current to ML<sub>0</sub>.

In the case of the active feedback scheme, this energy saving comes at the expense of a reduced voltage margin, while its peak dynamic power is also lower than that of the conventional CR-MLSA. The resistive feedback scheme likewise shows a reduced voltage margin and a lower peak dynamic power compared to the conventional CR-MLSA, and its worst case energy consumption is also lower. We have found that, among the two positive feedback based schemes, resistive feedback provides the best search time. The voltage margin is best in the current race scheme. Worst case energy consumption is least in the active feedback scheme. If the router is exposed to a noisy environment, then we should go for the scheme with the best voltage margin, which is the current race scheme. If energy consumption and heating of the device are a concern, then we should go for the active feedback scheme.

Future research can be carried out in understanding the search algorithms and applying that information to reduce switching activity in SLs. In addition, innovative circuit techniques can be developed for the comparison logic to reduce the voltage swing and capacitance of SLs. Since large cell area is also a serious concern for large-capacity TCAMs, future research can also include the design of low area TCAM cells that are compatible with the standard CMOS process.

# References

1. A. S. Tanenbaum, Computer Networks (Prentice Hall, Upper Saddle River, 2003).

2. K. Pagiamtzis, Introduction to content-addressable memory (2007), http://www.pagiamtzis.com/articles.

3. M.-F. Tzeng, Routing table partitioning for speedy packet look-ups in scalable routers, IEEE Trans. Parallel Distrib. Syst. 17 (2006) 481–494.

4. Y.-K. Chang and Y.-C. Lin, A fast and memory efficient dynamic IP look-up algorithm based on B-Tree, Proc. Int. Conf. Advanced Information Networking and Applications (2009), pp. 278–284.

5. L.-C. Wuu, T.-J. Liu and K.-M. Chen, A longest prefix first search tree for IP look-up, Comput. Netw.: Int. J. Comput. Telecommun. Netw. 51 (2007) 3354–3367.

6. X. Sun and Y. Q. Zhao, An on-chip IP address look-up algorithm, IEEE Trans. Comput. 54 (2005) 873–885.

7. S. Sahni, K. S. Kim and H. Lu, Data structures for one-dimensional packet classification using most-specific-rule matching, Int. J. Found. Comput. Sci. 14 (2003) 337–358.

8. P. Gupta and N. McKeown, Algorithms for packet classification, IEEE Netw. 15 (2001) 24–32.

9. K. Lakshminarayanan, A. Rangarajan and S. Venkatachary, Algorithms for advanced packet classification with ternary CAMs, Proc. ACM SIGCOMM (2005), pp. 193–204.

10. K. Pagiamtzis and A. Sheikholeslami, Content-addressable memory (CAM) circuits and architectures: A tutorial and survey, IEEE J. Solid-State Circuits 41 (2006) 712–727.

11. K. Pagiamtzis and A. Sheikholeslami, Content-addressable memory (CAM) circuits and architectures: A tutorial and survey, IEEE J. Solid-State Circuits 41 (2006) 712–727.

12. H. Kadota, J. Miyake, Y. Nishimichi, H. Kudoh and K. Kagawa, An 8-kbit content-addressable and reentrant memory, IEEE J. Solid-State Circuits 20 (1985) 951–957.

13. G. Kasai, Y. Takarabe, K. Furumi and M. Yoneda, 200 MHz/200 MSPS 3.2 W at 1.5 V Vdd, 9.4 Mbits ternary CAM with new charge injection match detect circuits and bank selection scheme, Proc. IEEE Custom Integrated Circuits Conf. (CICC) (2003), pp. 387–390.

14. M. M. Khellah and M. I. Elmasry, Use of charge sharing to reduce energy consumption in wide fan-in gates, Proc. IEEE Int. Symp. Circuits and Systems (ISCAS), Vol. 2, 31 May–3 June 1998, pp. 9–12.

15. S. Baeg, Low-power ternary content-addressable memory design using a segmented match line, IEEE Trans. Circuits Syst. 55 (2008) 1485–1494.

16. I. Y.-L. Hsiao, D.-H. Wang and C.-W. Jen, Power modeling and low-power design of content addressable memories, Proc. IEEE Int. Symp. Circuits and Systems (ISCAS), Vol. 4, 6–9 May 2001, pp. 926–929.

17. A. Efthymiou and J. D. Garside, An adaptive serial–parallel CAM architecture for low-power cache blocks, Proc. IEEE Int. Symp. Low Power Electronics and Design (ISLPED) (2002), pp. 136–141.

18. A. Efthymiou and J. D. Garside, A CAM with mixed serial–parallel comparison for use in low energy caches, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 12 (2004) 325–329.

19. K. Pagiamtzis and A. Sheikholeslami, Pipelined match-lines and hierarchical search-lines for low-power content-addressable memories, Proc. IEEE Custom Integrated Circuits Conf. (CICC) (2003), pp. 383–386.

20. K. Pagiamtzis and A. Sheikholeslami, A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme, IEEE J. Solid-State Circuits 39 (2004) 1512–1519.

21. I. M. Hyjazie and C. Wang, An approach for improving the speed of content addressable memories, Proc. IEEE Int. Symp. Circuits and Systems (ISCAS), Vol. 5, 25–28 May 2003, pp. 177–180.

Solid-State Circuits 46(2011) 507-519.

24. C.-C. Wang, J.-S. Wang and C. Yeh, High-speed and low-power design techniques for TCAM macros, IEEE J. Solid-State Circuits 43 (2008) 530–540.

25. I. Arsovski, T. Chandler and A. Sheikholeslami, A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme, IEEE J. Solid-State Circuits 38 (2003) 155–158.

26. I. Arsovski and A. Sheikholeslami, A current-saving match-line sensing scheme for content-addressable memories, Digest of Technical Papers of IEEE Int. Solid-State Circuits Conf. (ISSCC) (2003), pp. 304–305.

27. I. Arsovski and A. Sheikholeslami, A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories, IEEE J. Solid-State Circuits 38 (2003) 1958–1966.

28. N. Mohan, W. Fung, D. Wright and M. Sachdev, A low-power ternary CAM with positive-feedback match-line sense amplifiers, IEEE Trans. Circuits Syst. I, Regul. Pap. 56 (2009) 566–573.

29. S. I. Ali and M. S. Islam, A current race-based technique with dual feedback for match-line energy reduction in ternary content addressable memory, Proc. IEEE Int. Conf. Electrical and Computer Engineering (ICECE) (2012), pp. 725–728.

30. S. I. Ali and M. S. Islam, A match-line dynamic energy reduction technique for high-speed ternary CAM using dual feedback sense amplifier, Microelectron. J. 45 (2014) 95–101.