## Simulation and comparative study of different feedback-based matchline sensing circuits for Ternary Content Addressable Memory(TCAM)

by

#### Abir Hasan 160021092

#### Shamim Al Mamun 160021093

#### Sadman Sakib Soumik 160021087

A Thesis submitted to the Academic Faculty in the Partial Fulfillment of the requirements of the Degree of

#### BACHELOR OF SCIENCE IN ELECTRICAL AND ELECTRONIC ENGINEERING



Department of Electrical and Electronic Engineering Islamic University of Technology(IUT) Gazipur, Bangladesh March, 2021


Approved by:

#### Dr. Syed Iftekhar Ali

Supervisor and Professor, Department of Electrical and Electronic Engineering, Islamic University of Technology (IUT), BoardBazar, Gazipur-1704.

Date:\_\_\_\_\_

## Contents

- List of Tables
- List of Figures
- List of Acronyms
- Acknowledgements
- Abstract
- 1 Introduction
- 2 CAM Basics
  - 2.1 Binary CAM (BCAM)
  - 2.2 BCAM Operations
    - 2.2.1 Read
    - 2.2.2 Write
    - 2.2.3 Search
  - 2.3 Ternary CAM (TCAM)
    - 2.3.1 TCAM word formation
  - 2.4 TCAM Array
- 3 Matchline Sensing Scheme
  - 3.1 CR scheme
  - 3.2 Sensing circuits with Feedbacks
    - 3.2.1 CR scheme Mismatch Dependent (MD)
    - 3.2.2 CR scheme Active Feedback (AF)
    - 3.2.3 CR scheme Resistive Feedback (RF)
    - 3.2.4 CR scheme Dual Feedback (DF)
- 4 Simulation, Results and Comparison
  - 4.1 CR scheme and controlling signals
  - 4.2 CR scheme MD Graphs
  - 4.3 CR scheme AF Graphs
  - 4.4 CR scheme RF Graphs
  - 4.5 CR scheme DF Graphs
  - 4.6 Comparison
- 5 Conclusion
- References


### List of Tables

• Comparison data among different CR schemes

#### **List of Figures**

- (a)NOR BCAM cell and (b)NAND BCAM cell
- (a)NOR BCAM data word consisting of n-bit cells, (b)NAND BCAM data word consisting of n-bit cells
- (a)NOR TCAM cell and (b)NAND TCAM cell
- (a)NOR TCAM Data word consisting of n-bit cells, (b)NAND TCAM Data word consisting of n-bit cells
- TCAM Array
- TCAM-based routing table implementation and packet forwarding.
- (a)Conventional ML sensing scheme where SLs are discharged to ground and MLs are precharged to high before match evaluation, (b)Modified comparison logic
- (a)CR scheme, (b)Dummy unit to generate $\overline{MLOFF}$
- Mismatch Dependent CR scheme
- CR scheme with Active Feedback
- CR scheme with Resistive Feedback
- CR scheme with Dual Feedback
- (a)CR scheme V<sub>ML</sub> Graph and (b)Controlling signals
- CR scheme MD(Mismatch dependent) (a) $V_{ML}$  Graph, (b) $I_{ML}$  Graph, (c) $V_{VAR}$  Graph
- CR scheme AF(Active Feedback) (a) $V_{ML}$  Graph, (b) $I_{ML}$  Graph, (c) $V_{CS}$  Graph
- CR scheme RF(Resistive Feedback) (a) $V_{ML}$  Graph,(b) $I_{ML}$  Graph,(c) $V_{SP}$  Graph,(d) $\overline{MLOFF}$  Graph
- CR scheme Dual Feedback (a)V<sub>ML</sub> Graph,(b)I<sub>ML</sub> Graph

#### List of Acronyms

- CIDR: Classless Inter-Domain routing
- IP: Internet Protocol
- LPM: Longest Prefix Match
- BL: Bit Line
- WL: Word Line
- SL: Search Line
- CAM: Content Addressable Memory
- BCAM: Binary Content Addressable Memory
- TCAM: Ternary Content Addressable Memory
- CR: Current Race
- MD: Mismatch Dependent
- AF: Active Feedback
- RF: Resistive Feedback
- DF: Dual Feedback
- ML: MatchLine

### Acknowledgements

I would like to express my gratitude to my supervisor, Professor Dr. Syed Iftekhar Ali, for his precise guidance and motivating support in the completion of this thesis work. Without his assistance, accomplishing the objective of this thesis would not have been possible.

#### Abstract

Content addressable memory (CAM) can perform high-speed table look-up with bit-level masking capability. The search and detection of data stored in CAM cells are done using different sensing circuits. Current race (CR) schemes serve this purpose with great efficiency. Adding a feedback system to the circuitry increases performance and energy efficiency. In this thesis, a comparative study has been conducted among different CR schemes, and the simulation profiles have been observed on a 16 by 32 ternary CAM (TCAM) array.

# Chapter 1 Introduction

In the present age of high-speed information search and processing, the demand for performance keeps growing without limit. Routers are the main building blocks of modern internet architecture. A router interconnects different networks and exchanges data packets between them. Every packet contains a header and a payload. The header contains information such as the source address, the destination address, the data length, a sequence number and the data type of the packet. By examining the header of an incoming packet, the router can determine the destination network and choose the preferred path between the source and destination networks. Every router keeps a routing table with many entries, where each entry, known as a prefix, typically contains a destination address and the corresponding output port address. In the basic packet forwarding application, the router compares the destination address in the packet header with all entries in the routing table, finds the entry with the best match and forwards the packet to the output port based on that match. Classless inter-domain routing (CIDR) requires variable-length IP prefixes. Consequently, some bits in an IP address or prefix may be in masked or "don't care" states. One IP address may match several prefixes in the routing (or forwarding) table, and the longest prefix match (LPM), i.e., the matching entry with the smallest number of mask bits, must be chosen. The speed of this table look-up limits the achievable data rate and is therefore the most critical part of the routing operation. Although not frequently, routers also need to update their routing tables by communicating with each other. Increased congestion and packet loss arising from the huge increase in internet traffic have driven the industry to come up with a new class of routers.

These new routers are capable of identifying and classifying incoming packets into different classes of service. This new function is called packet classification. Packet classification is implemented by policy-based look-up or access control list (ACL) look-up, which has to enforce access control rules (also called filters) and requires a high degree of flexibility. Implementing these rules requires the ability to mask certain bits in each prefix and to have "don't care" or wildcard bits scattered anywhere in the prefix. In packet classification, the best matching filter, which contains multiple fields, must be found in a filter set for a given packet. Packet classification searches the table of filters to assign a flow identifier for the highest priority filter that matches the packet in all fields. The returned flow identifier indicates the action that is applied to the packet next. The standard five-tuple fields are the source address, destination address, protocol type, source port and destination port. Among these, the source and destination address fields are prefixes and often require the LPM. The protocol field can be a wildcard or an exact value. Source and destination port numbers are usually specified as ranges. In contrast to packet forwarding, which involves a search in only one dimension, i.e., the destination IP address, packet classification is a multidimensional (multifield) search. It enables routers to support many new network applications such as virtual private networks (VPNs), network address translation (NAT), load balancing, traffic accounting and monitoring, quality of service (QoS), analysis, network intrusion detection and multimedia. These features often require multiple policy look-ups for most packets. These look-ups are often recursive, where the result of one look-up affects the following look-up.

Consequently, a large latency can significantly slow down the policy look-ups. Packet classification is often a performance bottleneck in the network infrastructure since it lies in the critical data path of the routers. Compared to simple packet forwarding, the complexity of (and hence the time spent in) the table look-up is far greater in packet classification. The situation is further aggravated by the transition from internet protocol version 4 (IPv4) to internet protocol version 6 (IPv6). IPv4 uses a 32-bit address. IPv6, with a 128-bit address, was first developed by the Internet Engineering Task Force (IETF) in the mid-1990s because of the pressing need to extend the rapidly shrinking IPv4 address space. The increase in address length poses a new challenge to table look-up speed. The five-tuple look-up can make the prefix size 104 bits in IPv4 and 296 bits in IPv6. Auxiliary information such as QoS data can increase the IPv6 routing table prefix size to as much as 576 bits. The increase in routing table entry size, combined with the growing number of entries in the routing table (due to the growing number of users), has made high-speed table look-up a significant challenge. The sheer explosion of traffic created by new mobile and social applications is driving high-capacity line cards. This growing demand for high-speed networking is pushing the existing table look-up solutions further toward their limits.

Many software-based methods have been proposed for the longest-prefix-matching table look-up. Software-based techniques use external static random access memory (SRAM) or dynamic random access memory (DRAM) to store and search a data structure. These methods use algorithms such as hashing, trees and tries to reduce computational complexity. While these approaches can reduce the number of memory accesses for a single search key, they usually cannot meet the requirements of high-speed forwarding or classification. Using a hardware-based solution, more specifically ternary content addressable memory (TCAM), for packet forwarding and classification has become the *de facto* industry standard.

Content addressable memory (CAM) is a special type of memory used in very high-speed search applications. It is also known as associative memory, associative storage or associative array. In addition to READ and WRITE operations, CAM can also perform a SEARCH operation. In standard random access memory (RAM), the user supplies a memory address and the RAM returns the data word stored at that address. A CAM is designed so that the user supplies a search word and the CAM searches its entire contents to see whether that word is stored anywhere in it. If the search word is found, the CAM returns a list of one or more storage addresses where it was found. CAM is designed to search its entire memory in a single operation. Therefore, it is much faster than RAM in search applications. Every individual memory cell in a fully parallel CAM has its own associated comparison circuit to detect a match between the stored bit and the search bit. In addition, the match outputs from every cell in the word must be combined to yield a complete data word match signal.

A ternary CAM (TCAM) has the additional capability of storing mask or "don't care" bits. This makes TCAMs even more attractive for packet forwarding and classification, since these applications require mask bits to be stored in the routing table. In the presence of multiple matches, a priority encoder can resolve the highest priority match. This feature is particularly useful for finding the LPM. This flexibility comes at the cost of increased circuit size, since every TCAM cell requires two SRAM cells. The additional comparison circuitry increases the physical size of the TCAM chip, which increases its cost. It also increases power dissipation, since every comparison circuit is active in every clock cycle. Yet, because of its unmatched search speed and simplicity of table maintenance, the use of TCAM is unavoidable in modern switches. Discrete high-performance search processors and TCAMs are commonly used for layer 2 to layer 4 look-ups in high-end edge and core equipment. Besides networking equipment, CAMs are also attractive for other applications. However, current TCAM research is primarily driven by networking applications, which require high-capacity TCAMs with low-power and high-speed operation. With increasing speed and larger word sizes, TCAM dynamic search power/energy consumption has also become a major concern. TCAM chips form part of a network processor. Low-power sub-system design is essential for integration into a system-on-chip (SoC). Consequently, energy-efficient TCAM design has become an active research area.

In this thesis, we discuss the basics and working principles of CAMs [1] and the differences between BCAMs and TCAMs [1]. We also discuss different sensing circuits for look-up [2][3][4][5]. We use simulation results to compare these sensing circuits, and we tabulate parameters such as energy consumption, search time and voltage margin.

## **Chapter 2**

## **CAM Basics**

#### 2.1 Binary CAM (BCAM)

Binary CAM (BCAM) is a type of CAM that can store either of two states, '1' (high) or '0' (low), in each cell. Since only one of two states has to be stored, a basic 6T SRAM cell is used for data storage. Every cell has a comparison circuit to perform the SEARCH operation. The comparison circuit can be either NAND-type or NOR-type. Figure 2.1 shows BCAM cells with both types of comparison circuit [1]. In Figs. 2.1(a) and 2.1(b), transistors M1–M6 form the 6T SRAM cells. In Fig. 2.1(a), the NOR-type comparison circuit consists of transistors M7–M10, while in Fig. 2.1(b), the NAND-type comparison circuit consists of transistors M7–M9. READ and WRITE operations are carried out with the help of the bit lines (BLs) and the word line (WL). The search bit is supplied on the search lines (SLs). Every cell (bit) in a data word is connected to a match line (ML). The logic level on the ML depends on the match or mismatch condition and is sensed to determine the match result. In a NOR-type cell, the ML is pulled down to ground if the stored bit does not match the search bit. In a NAND-type cell, a match connects two successive ML segments, ML1 and ML2. Since READ, WRITE and SEARCH operations are never performed simultaneously, the same lines are sometimes used for READ/WRITE and SEARCH, i.e., the BLs and SLs are combined.



Figure 2.1: (a)NOR BCAM cell and (b)NAND BCAM cell

#### **2.2 BCAM Operations**

#### 2.2.1 Read

READ is performed by precharging the BLs to '1' and enabling the WL. After precharging, the BL drivers are switched off. With access transistors M5 and M6 ON, one of the BLs begins to discharge. The voltage drop on that BL creates a voltage difference between the BLs, which is sensed by the BL sense amplifier (BLSA) and converted into a full rail-to-rail voltage. To prevent an accidental flip of the stored bit during READ, the driver transistors (M2, M4) are made stronger than the access transistors (M5, M6). Typically, the driver transistors are 1.5 to 2.5 times wider (for the same gate length) than the access transistors.

#### 2.2.2 Write

WRITE is performed by placing the data to be written on the BLs and enabling the WL. Enabling the WL turns on the access transistors M5 and M6. Data from the BLs pass through the access transistors to the internal nodes (Data and $\overline{Data}$) and are held there at full rail-to-rail voltage by the feedback action of the cross-coupled inverters. The data to be written may differ from the previously stored value. In that case, the appropriate internal node (Data or $\overline{Data}$) needs to be flipped from '1' to '0'. Since NMOS has higher carrier mobility than PMOS, using same-sized (minimum-sized) access transistors (M5, M6) and load transistors (M1, M3) in the SRAM cell accomplishes this easily.

#### 2.2.3 Search

The search key is supplied through the SLs. In Fig. 2.1(a), if Data = SL (and thus $\overline{Data} = \overline{SL}$), the ML stays disconnected from ground. Otherwise, one of the pull-down paths (through M7, M8 or through M9, M10) connects the ML to ground. Transistors M7 through M10 therefore implement the XNOR operation, i.e., ML = '0' if Data $\neq$ SL (ML = $\overline{Data \oplus SL}$). Figure 2.2(a) shows how several cells are connected side by side to form an n-bit data word [1]. Even if a single bit in the word mismatches, the ML is connected to ground through the comparison circuit of the mismatched cell and is at logic '0'. Only in the case of a full word match does the ML stay in the high-impedance state. The decision circuit detects this difference and produces the corresponding match result. The comparison circuits of the parallel cells form parallel pull-down paths for the ML, just like the parallel pull-down paths of the driver network of a NOR gate. That is why this type of cell is called a NOR-type cell.

In Fig. 2.1(b), if Data = SL, either M8 (when Data = SL = '1') or M7 (when $\overline{Data} = \overline{SL} =$ '1') is ON and passes logic high to the gate of M9. M9 is therefore also ON, connecting ML1 and ML2 (and ultimately connecting to ground). Transistors M7 through M9 implement XOR logic, i.e., ML = '0' if Data = SL (ML = Data $\oplus$ SL). One drawback of the NAND-type cell is that the gate of M9 never reaches full V<sub>DD</sub> but only V<sub>DD</sub> − V<sub>tn</sub>, since an NMOS pass transistor cannot pass a logic high well. In the case of a mismatch (Data $\neq$ SL), M9 has logic '0' at its gate and hence stays OFF, so there is no connection between ML<sub>2</sub> and ground. When several cells are connected side by side to form a word as shown in Fig. 2.2(b), all ML segments ($ML_0$ to $ML_n$) are connected to ground only in the case of a full match between the data word and the search key [1]. Even if a single bit mismatches, all ML segments (and the decision circuit) to the right of that bit stay disconnected from ground. The decision circuit distinguishes between a match and a mismatch and produces the appropriate output. The ML consists of series-connected NMOS transistors, just like the pull-down path of the driver network of a NAND gate, which explains the name of the cell. Depending on the match/mismatch sensing technique, the construction of the decision circuit can vary, as discussed later.
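The word-level behaviour of the two BCAM cell types can be summarised with a purely logical sketch. It ignores all circuit effects (timing, threshold drops, the decision circuit) and only models whether each cell's comparison circuit conducts. The function names are hypothetical and chosen for illustration.

```python
def nor_cell_pulls_down(data: int, sl: int) -> bool:
    """NOR-type cell: the pull-down path to ground is active on a mismatch."""
    return data != sl


def nand_cell_conducts(data: int, sl: int) -> bool:
    """NAND-type cell: the pass transistor joins adjacent ML segments on a match."""
    return data == sl


def nor_word_matches(stored, key) -> bool:
    """Parallel pull-downs: the ML keeps its level only if no cell pulls it down."""
    return not any(nor_cell_pulls_down(d, s) for d, s in zip(stored, key))


def nand_word_matches(stored, key) -> bool:
    """Series chain: the ML reaches ground only if every cell conducts."""
    return all(nand_cell_conducts(d, s) for d, s in zip(stored, key))


stored_word = [1, 0, 1, 1]
print(nor_word_matches(stored_word, [1, 0, 1, 1]))   # full match
print(nor_word_matches(stored_word, [1, 0, 0, 1]))   # one mismatched bit
print(nand_word_matches(stored_word, [1, 0, 1, 1]))  # full match
```

Both functions return the same logical result for any word; the difference lies in the circuit topology (parallel pull-downs versus a series chain), which is what the NOR and NAND names refer to.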





Figure 2.2: (a)NOR BCAM data word consisting of n-bit cells, (b)NAND BCAM data word consisting of n-bit cells

#### **2.3 Ternary CAM (TCAM)**

The word "ternary" comes from the fact that each cell in a TCAM can store three states: high, low and mask or don't care ('X'). Representing three states requires two bits. Hence, each TCAM cell contains two SRAM cells. A TCAM cell can be either NOR-type or NAND-type. Figure 2.3 shows both types of TCAM cell [1].

Figure 2.3: (a)NOR TCAM cell and (b)NAND TCAM cell

In the NOR-type TCAM cell shown in Fig. 2.3(a), transistors M1–M4 form the comparison circuit. The three states are stored as Data1Data2 = 01 (low), Data1Data2 = 10 (high) and Data1Data2 = 00 (don't care). The state Data1Data2 = 11 is not allowed. The same encoding is used for the search data, i.e., SL1SL2 = 01, 10 or 00. Masking with stored data Data1Data2 = 00 is called local masking, while SL1SL2 = 00 means global masking. When Data1Data2 = SL1SL2 (match) or local/global masking is used, neither of the ML pull-down paths (through M1, M2 or through M3, M4) is active and the ML stays disconnected from ground. Only when Data1Data2 $\neq$ SL1SL2 (mismatch) and there is no masking is the ML pulled down to ground by one of the two transistor pairs, M1–M2 or M3–M4.

In Fig. 2.3(b), the NAND-type cell uses separate cells for data and mask. Transistors M1 through M4 form the comparison circuit. Local masking is achieved by storing '1' in the mask cell (X = '1'), which turns on pass transistor M4 and connects the ML segments on the two sides ($ML_1$ and $ML_2$) regardless of the value stored in the data cell. Global masking is achieved by supplying SL1 = '1' and SL2 = '1' (SL1SL2 = 00 is not allowed). This turns pass transistor M3 ON, since either M1 or M2 is always ON, which in turn connects the ML segments on the two sides regardless of the values stored in the data and mask cells. When there is no masking, i.e., X = '0' and SL1 = $\overline{SL2}$, the operation is the same as that of the BCAM NAND cell: pass transistor M3 turns ON only if the stored data and the search data match, i.e., Data = SL1 (and $\overline{Data}$ = SL2 = $\overline{SL1}$). Thus, only in the case of a mismatch do both pass transistors M3 and M4 stay OFF.

In packet forwarding/classification applications, WRITE is performed to update the routing table and READ to verify a successful WRITE. These operations are infrequent, while SEARCH is the most frequently performed operation. READ/WRITE of the SRAM cells inside TCAM cells follows the same procedure described for BCAM.
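The NOR-type encoding described above can be checked with a small logical sketch. It assumes that each series transistor pair conducts when a stored bit of one rail meets the opposite search rail; which physical pair (M1–M2 or M3–M4) corresponds to which product term is not specified here and is an assumption of this model. The names `ENCODE`, `nor_tcam_cell_pulls_down` and `nor_tcam_word_matches` are hypothetical.

```python
# Two-bit encoding from the text: (Data1, Data2) for stored data,
# (SL1, SL2) for search data. The state 11 is not allowed.
ENCODE = {"0": (0, 1), "1": (1, 0), "X": (0, 0)}


def nor_tcam_cell_pulls_down(stored: str, search: str) -> bool:
    """Return True if the cell connects the ML to ground (unmasked mismatch)."""
    d1, d2 = ENCODE[stored]
    s1, s2 = ENCODE[search]
    # Assumed logical model: one series pair conducts for stored '1' vs
    # searched '0', the other for stored '0' vs searched '1'.
    return bool((d1 and s2) or (d2 and s1))


def nor_tcam_word_matches(stored_word: str, search_word: str) -> bool:
    """The ML stays disconnected from ground only if no cell pulls it down."""
    return not any(
        nor_tcam_cell_pulls_down(d, s) for d, s in zip(stored_word, search_word)
    )


print(nor_tcam_word_matches("0110XX", "011010"))  # local masking hides the last two bits
print(nor_tcam_word_matches("0110XX", "011110"))  # unmasked mismatch in the fourth bit
print(nor_tcam_word_matches("011010", "0110XX"))  # global masking on the search side
```

With 00 on either the stored or the search side, both product terms are zero, which reproduces local and global masking without any special case.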

#### 2.3.1 TCAM word formation

A TCAM word is formed by joining TCAM cells side by side. The construction of a TCAM word is shown in Fig. 2.4 [1]. In Fig. 2.4(a), the ML is pulled down to ground if one or more cells in the word mismatch. Otherwise, the ML remains floating. In Fig. 2.4(b), the whole ML (i.e., all intermediate nodes $ML_n$, $ML_{n-1}$, ..., $ML_1$) is grounded only if there is a full match between the data word and the search key. The decision circuit senses the state of the ML and produces a match result. Depending on the ML sensing scheme, the construction of the decision circuit varies. In all cases, the match result is '1' for a full match and '0' otherwise, signifying one or more mismatches.



Figure 2.4: (a)NOR TCAM Data word consisting of n-bit cells, (b)NAND TCAM Data word consisting of n-bit cells

#### 2.4 TCAM Array

A TCAM array consists of several horizontal TCAM words stacked together. TCAM cells in the same column share the same BLs and SLs, while the WL is common to all TCAM cells in the same word.

Multiple words are used to form a table/array of stored words as shown in Fig. 2.5 [1].

In this work, the simulations are performed on a TCAM array of size **16 by 32**, i.e., 16 matchlines, each with 32 TCAM cells.

During the SEARCH operation, the search key is compared with every word simultaneously, and the match results for all words become available within one clock cycle. For packet forwarding/classification applications, all the ML sensing outputs (MLSOs) are inputs to a priority encoder (not shown). In CIDR, IP prefixes are allowed to have variable lengths. When storing addresses in the table, if the prefix size is less than the word size, the least significant bits are padded with mask bits. To implement the LPM, the prefixes in the table are sorted according to their actual lengths, i.e., the entry with the smallest number of mask bits (longest prefix) has the highest priority. In case of multiple matches, the priority encoder chooses the MLSO of the highest priority prefix, i.e., the longest prefix.

Figure 2.5: TCAM Array

Figure 2.6 shows a simple example of packet forwarding using TCAM [1]. In Fig. 2.6, a 4-word × 6-bit TCAM array stores the routing table in descending order of priority from bottom to top. Because of the mask/wildcard bits, each entry represents a range of addresses. For example, the second highest priority entry, 0110XX, means that packets with destination addresses in the range 011000 to 011011 have to be routed to port B. If the destination address is 011010, it matches the second and fourth entries. The priority encoder selects the second entry since it has higher priority (fewer mask bits), and the packet is sent to port B. This is how searching and packet forwarding are done in a TCAM array.
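The look-up described above can be emulated in software as a minimal sketch. Only the entry 0110XX and its port B come from the text; the other three entries and their ports are hypothetical placeholders chosen so that 011010 matches the second and fourth entries, and they are not taken from Fig. 2.6. The loop visits entries one by one, whereas a real TCAM compares all words in parallel and leaves the selection to the priority encoder.

```python
def ternary_match(entry: str, key: str) -> bool:
    """A stored 'X' matches either search bit; other bits must be equal."""
    return all(e == "X" or e == k for e, k in zip(entry, key))


def lpm_lookup(table, key):
    """Return the port of the highest-priority matching entry, or None.

    `table` must be ordered from highest to lowest priority, i.e. from the
    fewest to the most mask bits, so the first hit is the longest prefix.
    """
    for entry, port in table:
        if ternary_match(entry, key):
            return port
    return None


ROUTING_TABLE = [
    ("011011", "A"),  # hypothetical entry
    ("0110XX", "B"),  # entry discussed in the text
    ("010XXX", "C"),  # hypothetical entry
    ("01XXXX", "D"),  # hypothetical entry
]

print(lpm_lookup(ROUTING_TABLE, "011010"))  # matches entries 2 and 4; entry 2 wins
```

Keeping the table sorted by prefix length is what lets a simple first-match rule implement the longest prefix match.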



Figure 2.6: TCAM-based routing table implementation and packet forwarding.

## Chapter 3

## **Matchline Sensing Scheme**

Figure 3.1(a) shows the conventional match detection scheme proposed for NOR-type TCAM [1]. Match detection requires the following sequence of events: first discharge all SLs to ground, precharge all MLs to high (using the $\overline{Precharge}$ signal) and then broadcast the search key on the SLs. If there is a full match, the corresponding ML holds its voltage. If there are one or more mismatches, the ML has conducting path(s) to ground and is discharged. The decision circuit consists of a charging PMOS transistor and an ML sense amplifier (MLSA). The MLSA senses the logic level on the ML and produces a high output for a match and a low output for a mismatch. The comparison circuit of this scheme suffers from a charge sharing problem. During match evaluation, after the SLs are discharged, M1 and M2 turn OFF. The MLs are precharged to high and the search key is applied. Depending on the search bit, either M1 or M2 may turn ON. The charge stored on the ML is then shared with node A1 or A2, which reduces the ML voltage. If this charge sharing occurs in a large number of cells in a fully matched word, the ML voltage may drop to a small value, which may lead to a wrong match result. For this reason, the modified comparison circuit shown in Fig. 3.1(b) is preferred for the conventional scheme. Since the stored bit controls the states of M1 and M2 and the stored bit is constant during a search, the charge sharing problem is eliminated.
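As a rough illustration of why charge sharing matters, the following estimate assumes ideal charge redistribution: each of the $n$ cells that shares charge exposes one internal node (A1 or A2) with capacitance $C_A$ that starts at 0 V, and threshold-voltage effects in the NMOS transistors are ignored. The symbols $n$ and $C_A$ are introduced here only for illustration.

$$V_{ML,\,final} = V_{DD}\,\frac{C_{ML}}{C_{ML} + n\,C_A}$$

Under these assumptions the voltage drop grows with $n$, which is consistent with the observation that a fully matched word with many sharing cells is the worst case.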

Figure 3.1: (a)Conventional ML sensing scheme where SLs are discharged to ground and MLs are precharged to high before match evaluation, (b)Modified comparison logic

Generally, only a few words in an array fully match. From Figs. 2.3 and 2.5, it is apparent that the MLs and SLs are heavily capacitive. As shown in Fig. 2.5, a NOR-type ML is shared by all the cells in a word, so the ML capacitance ($C_{ML}$) is directly proportional to the number of cells (bits) in a word. Likewise, the same SL pair is shared by all the cells in a column, so the SL capacitance ($C_{SL}$) depends on the number of entries in the table. Modern routing tables may contain several hundred thousand entries. During a search, all the MLs and SLs are activated at the same time. Switching these highly capacitive lines causes large power consumption. The problem with the conventional precharge-high scheme is that all MLs are precharged to $V_{DD}$ and most of them are discharged to ground during match evaluation, so a large amount of energy is wasted. The power consumed by a single mismatched ML due to precharge and discharge can be estimated by

$$P_{Miss}=C_{ML}V_{DD}^2f$$

where $f$ is the frequency of the SEARCH operation. Since there are only a small number of matches, the overall ML power consumption with $w$ MLs can be estimated by

$$P_{ML} = w\,C_{ML}V_{DD}^2f$$

Energy efficient match detection or ML sensing techniques try to reduce ML and SL power consumptions.
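To give a feel for the magnitudes involved, the two estimates above can be evaluated numerically. The sketch below uses hypothetical values ($C_{ML}$ = 50 fF, $V_{DD}$ = 1.0 V, $f$ = 100 MHz); only the word count of 16 corresponds to the array simulated in this work. None of these numbers are simulation results.

```python
def ml_miss_power(c_ml: float, vdd: float, f: float) -> float:
    """Power of one mismatched ML: P_Miss = C_ML * V_DD^2 * f (watts)."""
    return c_ml * vdd ** 2 * f


def ml_array_power(w: int, c_ml: float, vdd: float, f: float) -> float:
    """Overall ML power, assuming nearly all w MLs mismatch: P_ML = w * P_Miss."""
    return w * ml_miss_power(c_ml, vdd, f)


# Hypothetical operating point, chosen only for illustration.
C_ML = 50e-15   # farads
VDD = 1.0       # volts
F_SEARCH = 100e6  # searches per second
WORDS = 16      # matchlines in the 16 by 32 array

p_miss = ml_miss_power(C_ML, VDD, F_SEARCH)            # about 5 microwatts
p_array = ml_array_power(WORDS, C_ML, VDD, F_SEARCH)   # about 80 microwatts
print(f"P_Miss = {p_miss * 1e6:.2f} uW, P_ML = {p_array * 1e6:.2f} uW")
```

Because the power scales with $V_{DD}^2$ and linearly with $w$, reducing the ML voltage swing is a natural target for energy-efficient sensing schemes.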

Several sensing circuits are used to detect matches and mismatches, for example the selective precharge scheme and the pipelining scheme. Each of them has its advantages and disadvantages. However, the most popular sensing circuit is the **current race (CR) scheme**.

#### **3.1 CR scheme**

The TCAM array in CR scheme is built utilizing NOR-type cells (Fig.2.3). The original CR scheme was focused to decrease ML power consumption of the conventional scheme. The primary contrast between CR scheme and conventional scheme is that CR scheme is a precharge-low scheme while conventional scheme is prechargehigh. Rather than charging all MLs to high, CR scheme pre-discharges all MLs to low. During match evaluation, all MLs are charged towards high. Matched MLs charge rapidly to large voltage (because of NOR-type cells) while mismatched MLs have a lot lower voltage because of presence of discharging paths. Precharge-low method has another benefit. In conventional scheme, the SLs should be discharged to zero voltage during precharge stage to guarantee that MLs stay separated from ground .During evaluation, SLs are stacked with the actual search key. In CR scheme, since MLs are predischarged, SLs need not to be changed to zero. This reduced SL exchanging movement results in around 50% saving of SL energy. Controlling the timing of clocked circuits is an issue in CAM design. A typical procedure to address this issue is the implementation of a dummy word which is always matched. The related dummy ML is utilized to control the length of precharge and evaluation stage. This minimizes the effects of process variations since the dummy ML tracks the process variations in the rest of the array. CR scheme dispenses with pointless ML charging by utilizing same dummy word idea [1]. Figure 3.2(a) shows the CR scheme [1]. MLSA comprises of a charging unit and sensing unit. The activity begins by resetting any voltage on ML ( $V_{ML}$ ) and MLSA yields (ML-SOs and DMLSO) to zero by MLRST signal. At that point, declaring








Figure 3.2: (a)CR scheme, (b)Dummy unit to generate  $\overline{MLOFF}$ 

signal MLEN turns on transistor M2 so that $I_{ML}$ flows. The ML capacitance ($C_{ML}$) begins charging. If the word is fully matched, the ML can charge up to the threshold voltage of M3, turning it ON. As a result, MLSO goes high. If the word is not matched, $V_{ML}$ stays small since the ML has discharging path(s) to ground. M3 stays OFF and MLSO stays at zero. The dummy word is always matched by local masking and its MLSA always produces a high DMLSO. A delayed and inverted version of DMLSO, i.e., $\overline{MLOFF}$, is used to turn off the M2 transistors in all words. This eliminates unnecessary charging (and energy consumption) of the MLs. The programmable delay after DMLSO guarantees that all matched MLs get sufficient time to charge up to the threshold voltage of M3 in case the dummy word detects the match earlier (because of process variation). $V_{bias}$ controls $I_{ML}$ and thus controls the speed and energy consumption of the match detection process.

In the CR scheme, the NOR cell comparison circuit shown in Fig. 3.1(a) is preferred over the comparison circuit shown in Fig. 3.1(b). The ON/OFF states of M1 and M2 in Fig. 3.1(b) determine the parasitic capacitance of the ML. For the comparison circuit shown in Fig. 3.1(b), the ML parasitic capacitance depends on the stored data. Since different cells have different stored data, $C_{ML}$ varies from one ML to another. For the comparison circuit shown in Fig. 3.1(a), however, $C_{ML}$ depends on the search data. Since the search data bits are the same along a column, $C_{ML}$ is the same for all MLs in a search. This ensures good matching among MLs and prevents sensing errors due to capacitance variation.

The CR scheme supplies the same initial current (on the rising edge of MLEN) to both mismatched and matched MLs. However, since matched MLs charge to a higher voltage, $I_{ML}$ gradually decreases in matched MLs. In contrast, mismatched MLs have lower-resistance path(s) to ground. The equivalent resistance of the ML pull-down path decreases with an increasing number of mismatches. Consequently, $I_{ML}$ increases with the number of mismatches. Since most of the MLs are mismatched, large currents to mismatched MLs waste a large amount of energy. This problem can be solved by supplying smaller currents to mismatched MLs. Positive feedback has been combined with the basic CR scheme to achieve this non-uniform current distribution.
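The match/mismatch voltage separation described above can be illustrated with a first-order model. The following is only a sketch under stated assumptions: the charging current is treated as constant (in the actual circuit $I_{ML}$ varies with $V_{ML}$), each mismatched cell is modelled as an identical resistor to ground, and all component values (`C_ML`, `I_ML`, `R_CELL`) are hypothetical and not taken from the simulated circuits.

```python
import math

VDD = 1.2        # supply voltage in volts (the 130nm value used in this work)
C_ML = 100e-15   # hypothetical ML capacitance in farads
I_ML = 100e-6    # hypothetical constant charging current in amperes
R_CELL = 5e3     # hypothetical pull-down resistance of one mismatched cell in ohms


def ml_voltage(t, mismatches):
    """Idealized ML voltage at time t (seconds) after MLEN is asserted."""
    if mismatches == 0:
        # Matched ML: no path to ground, so the capacitor charges linearly.
        return min(VDD, I_ML * t / C_ML)
    # Mismatched ML: k parallel pull-down paths give R = R_CELL / k,
    # and the node settles exponentially towards I_ML * R.
    r = R_CELL / mismatches
    v = I_ML * r * (1.0 - math.exp(-t / (r * C_ML)))
    return min(VDD, v)


if __name__ == "__main__":
    for k in range(4):
        print(k, "mismatch(es):", round(ml_voltage(0.5e-9, k), 3), "V")
```

In this simplified picture the matched ML keeps rising while each additional mismatch lowers the level a mismatched ML can reach, which is the separation the sensing unit relies on.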

#### **3.2 Sensing circuits with Feedbacks**

The idea of using positive feedback in CR MLSA was proposed to reduce the current flow  $I_{ML}$  as most of the words are mismatched. This scheme is called current saving technique or mismatch-dependent (MD) power allocation technique. In this technique, a feedback unit has been added to the basic CR MLSA to detect mismatched MLs during charging and reduce currents to those MLs.

#### **3.2.1 CR scheme Mismatch Dependent(MD)**



Figure 3.3: Mismatch Dependent CR scheme

Figure 3.3 shows the MD scheme [1]. The feedback unit contains a level shifter and a feedback circuit. In the speed-optimized setting, the nodes MLSO, $V_{ML}$ and $V_{VAR}$ are first predischarged to ground. MLEN then starts the charging of all MLs. Initially, both $V_{ML}$ and $V_{VAR}$ rise due to the currents $I_{ML}$ and $I_{bias}$, respectively. As $V_{VAR}$ rises, the current $I_{ML}$ decreases. However, as $V_{ML}$ rises, the level shifter output also rises. $V_{ML}$ in a matched ML rises faster than in a mismatched ML. The level shifter output becomes high enough in a matched ML to turn on N1, and $V_{VAR}$ then starts to discharge. With the reduced $V_{VAR}$, $I_{ML}$ for the matched ML increases again. For a mismatched ML, $V_{ML}$ rises slowly and to a lower value which depends on the number of mismatches present in the word. Consequently, for mismatched MLs, N1 may be weakly turned on or may stay OFF. This results in little or no decrease in $V_{VAR}$. Hence, $I_{ML}$ in a mismatched ML continues to decrease. Matched MLs get a higher average $I_{ML}$ than mismatched MLs. The same dummy word as in the CR scheme is used to produce the $\overline{MLOFF}$ signal that controls the ML charging duration.

In terms of ML energy, this method can offer a large energy reduction compared to the CR scheme. However, the level shifter and the feedback circuit consume a considerable amount of energy. In the case of a full match, transistor N1 is completely turned on. For a small number of mismatches (1-bit or 2-bit), N1 remains partially ON. This establishes a conducting path from VDD to ground in the feedback circuit. For a larger number of mismatches, $V_{ML}$ is small since the ML has several discharging paths to ground. This turns both level shifter transistors ON and creates a conducting path from VDD to ground. Conducting paths from VDD to ground cause static power consumption. The level shifter acts as a voltage divider that requires the greater part of VDD to be dropped across M1. This requires M1 to have a large gate length and M2 a large gate width. The capacitive loading on the ML increases because of the large M2. Consequently, in terms of total energy, the saving is not significant. The large transistors in the level shifter make the match detection process considerably slower. The transistor count of the MLSA is also larger than that of the CR MLSA.

#### **3.2.2 CR scheme Active Feedback(AF)**

To overcome the problems of the MD scheme, two other MLSAs with feedback are available: the active feedback (AF) MLSA and the resistive feedback/shielding (RF) MLSA. Figure 3.4 shows the ML sensing scheme with the AF MLSA [1]. After $V_{ML}$ and MLSO are reset to zero (by MLRST),



Figure 3.4: CR scheme with Active Feedback

the search is started by asserting MLEN. M1 and M2 turn ON, causing the gate capacitance of M4 (plus the drain capacitances of M2 and M3 and the wiring capacitance) to charge. The initial $I_{ML}$ currents are the same in all MLSAs, which is also true for the $I_{Chr}$ currents. Therefore, $V_{CS}$ and $V_{ML}$ increase with time. As a result, $I_{ML}$ and $I_{Chr}$ decrease with time. For a matched ML, $V_{ML}$ rises quickly since the ML is disconnected from ground. This turns M2 OFF ($I_{Chr} = 0$) when $V_{M2_{source}} - V_{ML} < |V_{M2_{threshold}}|$. Since M3 is always ON, $V_{CS}$ starts to discharge through M3 in a matched ML. That makes M4 conduct more and $I_{ML}$ increase again. For a mismatched ML, $V_{ML}$ rises slowly. This keeps M2 ON and makes $V_{CS}$ keep increasing, eventually turning M4 OFF ($I_{ML} = 0$). A mismatched ML gets a smaller current than a matched ML, and the energy consumption in mismatched words becomes small. $V_{bias}$ and the MLSA transistor sizes need to be tuned to make this feedback mechanism effective. The dummy word provides the $\overline{MLOFF}$ signal to terminate ML charging. Some energy is wasted due to the current through M3 (to ground), especially in the mismatched MLs.

#### **3.2.3 CR scheme Resistive Feedback(RF)**



Figure 3.5: CR scheme with Resistive Feedback

Figure 3.5 shows the RF MLSA [1]. First, the MLRST signal resets the voltages at the ML, the sensing point (SP) and MLSO. Then, the MLEN signal initiates the search operation by turning on M2. $V_{bias}$ controls the total charging current $I_{Chr}$. Initially, the $I_{Chr}$ currents are the same in all MLs and so are the $I_{ML}$ and $I_{SP}$ currents. The feedback action is implemented by a single NMOS transistor M3. As the ML voltage is increased by $I_{ML}$, $V_{gs}$ and $V_{ds}$ of M3 decrease. Therefore, $I_{ML}$ decreases. The reduction in $I_{ML}$ causes an increase in $I_{SP}$. The increase in $I_{SP}$ raises the voltage at SP ($V_{SP}$) and $V_{ds}$ of M3. As $V_{ML}$ in a matched word increases faster and to a higher value, $V_{SP}$ in a matched word also increases to a higher value. Since the capacitance at node SP ($C_{SP}$) is much lower than $C_{ML}$, the rate of rise of $V_{SP}$ is much larger than that of $V_{ML}$. In a mismatched ML, $V_{ML}$ rises slowly. Therefore, $V_{SP}$ also rises slowly. $V_{SP}$ in a matched ML reaches the threshold voltage of M4 much sooner than in a mismatched ML and produces a high output at MLSO. A dummy word, fully matched by local masking as in the CR scheme, signals the end of match detection (by its own MLSO) and terminates further charging of the ML and SP nodes in all words (by $\overline{MLOFF}$). $V_{SP}$ of mismatched words does not get sufficient time to charge up to the sensing threshold voltage.

This ML sensing technique is extremely simple in construction, i.e., only two additional transistors are required compared to the CR MLSA. Yet, the technique is very effective not only in reducing energy consumption but also in speeding up the SEARCH operation. The energy reduction occurs because the large ML capacitances do not need to be charged to voltages as large as in the CR or other feedback techniques for proper match detection. An energy overhead occurs due to the charging of node SP. Since $C_{SP}$ is very small, this energy overhead is small compared to other feedback techniques. The drawback of this technique is the requirement of two analog control voltages ($V_{bias}$ and $V_{res}$).
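The energy argument above can be made concrete with a simple charge-balance estimate. This is only a sketch, assuming a matched ML with no path to ground that starts from 0 V and is charged from $V_{DD}$ for a duration $T$ (the symbols $E_{ML}$, $E_{SP}$ and $T$ are introduced here for illustration):

$$E_{ML} = V_{DD}\int_0^{T} I_{ML}(t)\,dt = V_{DD}\,C_{ML}\,V_{ML}(T), \qquad E_{SP} = V_{DD}\,C_{SP}\,V_{SP}(T)$$

Under this assumption, and for equal $C_{ML}$, the energy drawn to charge a matched ML scales with the final ML voltage. With the full-match ML voltages later reported in Table 4.1 (0.81 V for RF and 1.13 V for the basic CR scheme), the ratio is about $0.81/1.13 \approx 0.72$. The estimate ignores the current that flows to ground in mismatched MLs, so it does not predict the total energy per search.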

#### **3.2.4 CR scheme Dual Feedback(DF)**

To improve the performance further and eliminate the requirement of two bias voltages, positive feedback is used twice, in such a way that both feedback paths are controlled by one bias voltage. Figure 3.6 shows the dual feedback MLSA [1]. One positive feedback is provided by the transistor N1, the same as in the RF scheme. The second feedback is provided by the feedback unit. The feedback unit is a modified version of the MD scheme designed to eliminate the static power consumption problem.

Figure 3.6: CR scheme with Dual Feedback

The signal $\overline{DMLSO}$ is obtained from the dummy word MLSA output. The feedback actions create a large difference between the $I_{SN}$ currents in matched and mismatched MLs. Match sensing is possible with a very small ML voltage, which results in superior speed and energy consumption compared to other feedback schemes.

## **Chapter 4**

## Simulation, Results and Comparison

The conventional CR scheme and the CR schemes with feedback circuits have been simulated using **HSPICE** [2][3][4][5]. We have used 130nm technology parameters and library files. In this technology, the full $V_{DD}$ is 1.2V. The transistor sizes are tuned such that the current flow is optimized. The transistors of the charging unit are wider than those of the other circuits, since the current increases as the transistor width increases and the transistor length decreases.

$$I \propto \frac{W}{L}$$

Here W is the transistor channel width and L is the transistor channel length. The voltages ($V_{ML}$), currents ($I_{ML}$) and other relevant graphs are plotted in **Cosmos Scope**. The graphs and simulations verify the operation of the sensing circuits.
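The proportionality to $W/L$ follows from the textbook long-channel (square-law) expression for the saturation drain current, given here only as a qualitative guide. The symbols $\mu$ (carrier mobility), $C_{ox}$ (gate oxide capacitance per unit area), $V_{GS}$ and $V_{th}$ are not defined elsewhere in this work and are introduced for illustration:

$$I_D = \frac{1}{2}\,\mu\,C_{ox}\,\frac{W}{L}\,\left(V_{GS} - V_{th}\right)^2$$

Short-channel devices in a 130nm process deviate from this expression (for example due to velocity saturation), so the sizing is still tuned through simulation. The trend that a wider or shorter transistor carries more current remains valid.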

#### 4.1 CR scheme and controlling signals

From Fig.4.1(a), we can observe the ML voltages for different match results. For a full match, the voltage rises to about 1.1 V. The next highest voltage is for a 1-bit mismatch. The difference between the full match and 1-bit mismatch voltages is quite significant. This difference is known as the voltage margin. The



Figure 4.1: (a)CR scheme  $V_{ML}$  Graph and (b)Controlling signals

larger the margin, the easier it is for the sensing circuit to differentiate match and mismatch. As the number of mismatched bits increases, the voltage gets lower and lower, as shown in the graph. Fig.4.1(b) shows the three major controlling signals, MLEN, $\overline{MLOFF}$ and MLRST, that are used for running our simulation.

#### 4.2 CR scheme MD Graphs

In Fig.4.2(a), it is observed that the voltage margin discussed previously has a higher value. That means the difference between the full match and 1-bit mismatch ML voltages is higher than in the normal CR scheme. This is because of the feedback system implemented. In Fig.4.2(b), the $I_{ML}$ graph is shown. Here, the full match gets the highest current, and the current decreases as the number of mismatches increases. In Fig.4.2(c), the $V_{VAR}$ graph is shown, which is the feedback voltage that controls the $I_{ML}$ current flow. For a full match, $V_{VAR}$ is low (almost zero). This lets the PMOS conduct a large current $I_{ML}$. As the number of mismatches increases, $V_{VAR}$ is lowered less. Thus the $I_{ML}$ current flow degrades gradually, as the PMOS channel conducts less with a higher gate voltage. This shows the proper operation of the feedback system. However, it is not energy efficient, as a large amount of current is wasted through the level shifter. The search speed is also not up to the mark. To eliminate these problems, better versions of feedback systems are discussed below.



Figure 4.2: CR scheme MD(Mismatch dependent) (a) $V_{ML}$  Graph, (b) $I_{ML}$  Graph, (c) $V_{VAR}$  Graph

#### 4.3 CR scheme AF Graphs

In Fig.4.3(a), the $V_{ML}$ voltages are shown. The voltage margin is quite significant. The $I_{ML}$ flow in Fig.4.3(b) is also convincing, as the full match $I_{ML}$ and the 1-bit mismatch $I_{ML}$ differ significantly. So energy consumption is lessened. Also, because of a single controlling voltage, the search speed is very quick. The $V_{CS}$ graph in Fig.4.3(c) shows that the feedback system is working effectively, as the voltage increases with the number of mismatches.



Figure 4.3: CR scheme AF(Active Feedback) (a) $V_{ML}$  Graph, (b) $I_{ML}$  Graph, (c) $V_{CS}$  Graph

#### 4.4 CR scheme RF Graphs

In Fig.4.4(a), the $V_{ML}$ voltages are shown. Here, the voltage margin is not significant, as the $I_{ML}$ flow is controlled in a very simple way. For all matches and mismatches, a similar current flows to the ML and the voltage rises quickly. Thus the search speed is quick in this sensing circuit. The $V_{SP}$ voltage margin is also good, so detection is efficient. $V_{SP}$ rises quickly along with $V_{ML}$ in a matched word, so for the dummy unit $\overline{MLOFF}$ switches earlier than in the basic CR scheme, as shown in Fig.4.4(d). Thus the $I_{ML}$ flow is turned off quickly, which saves energy. The $I_{ML}$ and $V_{SP}$ graphs are shown in Fig.4.4(b) and 4.4(c), respectively.



Figure 4.4: CR scheme RF(Resistive Feedback) (a) $V_{ML}$  Graph,(b) $I_{ML}$  Graph,(c) $V_{SP}$  Graph,(d) $\overline{MLOFF}$  Graph

#### 4.5 CR scheme DF Graphs



Figure 4.5: CR scheme Dual Feedback (a) $V_{ML}$  Graph,(b) $I_{ML}$  Graph

In Fig.4.5(a), it is observed that the $V_{ML}$ voltage margin is high, so detection is faster and more precise. In Fig.4.5(b), the $I_{ML}$ current flow is controlled very efficiently. In the beginning, the initial $I_{ML}$ is the same for all matchlines. Due to the first feedback, which is the RF feedback, the current decreases for all matchlines. Due to the second feedback, the $I_{ML}$ current flow for the matched ML increases. Energy consumption is highly optimized. The search speed is also acceptable. However, circuit complexity is the main challenge in implementing this sensing circuit.

#### 4.6 Comparison

After conducting successful simulations in HSPICE, we compared the sensing circuits by measuring four quantities. The first is the energy consumed by the circuit in a single search. The second is the search time, which is the time required for the circuit to show the MLSO voltage after MLEN has been turned on (the time between the 0.6V crossing of the rising MLEN and the 0.6V crossing of the rising MLSO, as the technology voltage is 1.2V). The third is the voltage margin, which is the difference between the full match ML voltage and the 1-bit mismatch ML voltage. The fourth is the ML voltage reached when the word is fully matched for that particular search.

The energy is consumed by the $I_{ML}$ current flow. The more current flows or is wasted, the more energy is consumed. So less energy consumption is preferable. Similarly, the less time it takes to present the search result, the sooner the dummy circuit turns off the charging and thus saves energy. So a short search time, i.e., a high search speed, is required. For the voltage margin, the higher the difference is, the better the circuit can distinguish match from mismatch. So a higher voltage margin is preferred. The ML voltage rise needs to be optimal: not so high that energy is wasted, yet not so low that the sensing circuit cannot detect the match. After extracting the data from the graphs, we compared them in tabular form. The data are shown in Table 4.1.
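The search-time definition above (time from the 0.6V crossing of MLEN to the 0.6V crossing of MLSO) can be sketched as a small post-processing routine. This is an illustrative sketch and not the measurement script used in this work: the function names are hypothetical, the waveforms are assumed to be available as sampled lists of time and voltage, and linear interpolation between samples is an assumption.

```python
def crossing_time(times, values, threshold):
    """Return the first time `values` rises through `threshold`, or None."""
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        if v0 < threshold <= v1:
            # Linear interpolation between the two neighbouring samples.
            frac = (threshold - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


def search_time(times, mlen, mlso, vdd=1.2):
    """Delay between the VDD/2 crossings of the rising MLEN and MLSO."""
    half = vdd / 2.0
    t_start = crossing_time(times, mlen, half)
    t_end = crossing_time(times, mlso, half)
    if t_start is None or t_end is None:
        return None  # one of the signals never crossed VDD/2
    return t_end - t_start


if __name__ == "__main__":
    # Synthetic waveforms for illustration only (time in ps, voltage in V).
    t = [0.0, 100.0, 200.0, 300.0, 400.0]
    mlen = [0.0, 1.2, 1.2, 1.2, 1.2]
    mlso = [0.0, 0.0, 0.0, 1.2, 1.2]
    print("search time (ps):", search_time(t, mlen, mlso))
```

Returning `None` when a signal never crosses the threshold makes a mismatched word (whose MLSO stays low) easy to tell apart from a slow match.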

| Circuit type | Energy consumed in a single search (fJ) | Search time (ps) | Voltage margin (mV) | Full match ML voltage (V) |
|--------------|--------------------|--------|---------|------------|
| CR scheme    | 561.93             | 280    | 684     | 1.13       |
| CR scheme MD | 628.112            | 470    | 640     | 0.96       |
| CR scheme AF | 516.288            | 250    | 810     | 1.12       |
| CR scheme RF | 542.432            | 200    | 266     | 0.81       |
| CR scheme DF | 358.248            | 420    | 465     | 0.587      |


Table 4.1: Comparison data among different CR schemes
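The figures in Table 4.1 can be compared programmatically. The sketch below copies the energy, search time and voltage margin values from the table; the dictionary layout and the helper name `energy_saving_vs_cr` are illustrative choices, not part of the simulation flow.

```python
# Values copied from Table 4.1:
# scheme -> (energy in fJ, search time in ps, voltage margin in mV)
TABLE_4_1 = {
    "CR": (561.93, 280, 684),
    "MD": (628.112, 470, 640),
    "AF": (516.288, 250, 810),
    "RF": (542.432, 200, 266),
    "DF": (358.248, 420, 465),
}


def energy_saving_vs_cr(scheme):
    """Energy saving in percent relative to the basic CR scheme."""
    base = TABLE_4_1["CR"][0]
    return 100.0 * (base - TABLE_4_1[scheme][0]) / base


# Best scheme for each metric taken on its own.
lowest_energy = min(TABLE_4_1, key=lambda s: TABLE_4_1[s][0])
fastest = min(TABLE_4_1, key=lambda s: TABLE_4_1[s][1])
widest_margin = max(TABLE_4_1, key=lambda s: TABLE_4_1[s][2])

if __name__ == "__main__":
    for name in TABLE_4_1:
        print(name, "energy saving vs CR (%):", round(energy_saving_vs_cr(name), 1))
    print("lowest energy:", lowest_energy)
    print("fastest search:", fastest)
    print("widest voltage margin:", widest_margin)
```

A negative saving indicates a scheme that consumes more energy per search than the basic CR scheme.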

# Chapter 5 Conclusion

This project presents a detailed comparison of all four types of feedback-based current race schemes together with the basic current race scheme. Through analysis of the simulation graphs, we have observed how the sensing circuits control the current flow $I_{ML}$ according to the rise of $V_{ML}$. The response of each circuit is also affected by these parameters. Thus, energy consumption and search speed vary between the different feedback systems. The data collected from the simulations have been tabulated in the previous chapter for comparison.

From the tabular data, it is clear that different sensing circuits serve different purposes. For example, if energy optimization is needed, then dual feedback is preferable. On the other hand, if speed optimization is the concern, then resistive feedback is a wise choice. For a trade-off between them, active feedback serves well. For low circuit complexity, the basic CR scheme can be used. The choice depends on the requirements of the designer.

Finally, we conclude that TCAM circuits are highly efficient for high-speed search and are therefore widely used. For detecting the ML voltages and producing match and mismatch results, current race schemes are state-of-the-art circuitry. Adding a feedback system makes them more energy efficient and faster.

## REFERENCES

- [1] Syed Iftekhar Ali, Md Shafiqul Islam and Mohammad Rakibul Islam. A Comprehensive Review of Energy Efficient Content Addressable Memory Circuits for Network Applications, World Scientific Publishing Company (2015).
- [2] Igor Arsovski, Trevis Chandler and Ali Sheikholeslami. A Ternary Content-Addressable Memory (TCAM) Based on 4T Static Storage and Including a Current-Race Sensing Scheme, IEEE JSSC (2003).
- [3] Igor Arsovski and Ali Sheikholeslami. A Mismatch-Dependent Power Allocation Technique for Match-Line Sensing in Content-Addressable Memories, IEEE JSSC (2003).
- [4] Igor Arsovski and Ali Sheikholeslami. A current-saving matchline sensing scheme for content-addressable memories, IEEE ISSCC (2003).
- [5] Nitin Mohan, Wilson Fung, Derek Wright and Manoj Sachdev. A Low-Power Ternary CAM With Positive-Feedback Match-Line Sense Amplifiers, IEEE TCSI (2009).
- [6] Syed Iftekhar Ali and Md Shafiqul Islam. A Current race-based technique with dual feedback for matchline energy reduction in Ternary Content Addressable Memory, ICECE (2012).
- [7] Syed Iftekhar Ali and Md Shafiqul Islam. A match-line dynamic energy reduction technique for high-speed ternary CAM using dual feedback sense amplifier, Microelectron (2014).
- [8] Syed Iftekhar Ali and Md Shafiqul Islam. Improved charge-shared match-line sensing scheme for dynamic energy reduction in TCAM, IETE (2015).

- [9] Md Shafiqul Islam and Syed Iftekhar Ali. Improved charge shared scheme for low-energy match line sensing in ternary content addressable memory, ISCAS (2014).