

# ISLAMIC UNIVERSITY OF TECHNOLOGY

# (IUT)

# Performance Comparison of Charge-Shared Matchline Sensing Schemes in High-Speed Ternary Content Addressable Memory (TCAM)

BY

Raqqib Bin Kadir

Muntasim-Ul-Haque

Ahmed Selim Anwar

A Dissertation Submitted in Partial Fulfillment of the Requirement for the **Bachelor of Science in Electrical and Electronic Engineering** Academic Year: 2013-2014

Department of Electrical and Electronic Engineering. Islamic University of Technology (IUT) A Subsidiary Organ of OIC Dhaka, Bangladesh.

## A Dissertation on

# Performance Comparison of Charge-Shared Matchline Sensing Schemes in High-Speed Ternary Content Addressable Memory (TCAM)

Submitted by

Raqqib Bin Kadir (102415)

Muntasim-Ul-Haque (102420)

Ahmed Selim Anwar (102459)

Approved By

Prof. Dr. Md. Shahid Ullah

Head of the Department Department of Electrical and Electronic Engineering Islamic University of Technology (IUT) Gazipur-1704, Bangladesh.

Supervised by

Dr. Syed Iftekhar Ali

Associate Professor Department of Electrical and Electronic Engineering Islamic University of Technology (IUT).

# Abstract

Ternary content addressable memory (TCAM) is a memory that offers high-speed table look-up capability for applications such as internet protocol (IP) packet forwarding and classification in network routers. This thesis presents a performance comparison of different matchline sensing schemes in high-speed TCAM.

Two charge-shared schemes are compared with the conventional current race scheme. Segmenting the matchline and then sharing charge between the segments reduces power consumption to some extent. These charge-shared schemes can also improve search time and voltage margin.

Simulations are performed using 180 nm, 1.8 V CMOS logic in HSPICE. The properties of different transistors were varied, and the simulated outputs were compared to assess performance.

# Motivation

Ternary Content Addressable Memory offers very high-speed searching capability, which is why it is used in a variety of applications. But the speed of TCAM comes at the cost of increased silicon area and power consumption. As TCAM applications multiply and data sizes grow, the size of the TCAM is also increasing. A larger TCAM demands more power and more area. Decreasing power consumption without sacrificing speed and area is therefore the main concern of recent research on large-capacity TCAMs.

We have chosen to work on this topic, focusing mainly on charge-shared matchline sensing schemes because they seem more feasible than other methods. Charge-shared methods provide more opportunity to reduce power consumption while retaining speed. Reducing power consumption can, however, cause some degradation in performance, so a trade-off must be made to achieve both speed and reduced power consumption.

# Acknowledgements

First of all, every respect is due to the Almighty Allah for giving us the opportunity to do this work with patience over the last year. This undergraduate thesis, 'Performance Comparison of Charge-Shared Matchline Sensing Schemes in High-Speed Ternary Content Addressable Memory (TCAM)', is the result of one year of continuous hard work and effort, and is by far the most significant accomplishment of our lives. It would have been impossible without the continuous support and appreciation of many people.

We would like to express our sincere gratitude to our supervisor, Dr. Syed Iftekhar Ali, Associate Professor, Department of Electrical and Electronic Engineering, Islamic University of Technology (IUT), for his continuous guidance, support and enthusiasm over the last year. Without his dedicated help this work would have been impossible. He was always generous with his time, listening to us carefully and criticizing us fairly.

We are also thankful to our departmental head, Prof. Dr. Md. Shahid Ullah, for his inspiration to complete the work. Finally, we are grateful to all of our well-wishers, family members, friends and relatives for their support throughout this time.

# **Table of Contents**

- Abstract
- Motivation
- Acknowledgements
- CHAPTER ONE: INTRODUCTION
  - THESIS LAYOUT
- CHAPTER TWO: TCAM BASICS
  - 2.1 Ternary Content Addressable Memory (TCAM)
    - 2.1.1 TCAM Cell
    - 2.1.2 TCAM Word
    - 2.1.3 NOR-type cell versus NAND-type cell
  - 2.2 Operations
    - 2.2.1 Write Operation
    - 2.2.2 Read Operation
    - 2.2.3 Search Operation
  - 2.3 Applications of TCAM
    - 2.3.1 Packet Forwarding Using CAM
- CHAPTER THREE: MATCHLINE SENSING SCHEMES
  - 3.1 Matchline Sensing Scheme
  - 3.2 Current Race (CR) Scheme
  - 3.3 Charged Shared Matchline Sensing Schemes
    - 3.3.1 Charge Shared Matchline Sensing by Segmentation of Two Blocks (4 Segments)
    - 3.3.2 ML Sensing Scheme Using the Combination of Charge-Sharing, Selective Precharge and Replica Control
- CHAPTER FOUR: SIMULATION AND COMPARISONS
  - 4.1 Parameter
    - 4.1.1 Search Time
    - 4.1.2 Voltage Margin
  - 4.2 Simulation Results
    - 4.2.1 Current-Race Scheme
    - 4.2.2 Charge Shared Matchline Sensing by Segmentation of Two Blocks (4 Segments)
    - 4.2.3 ML Sensing Scheme Using the Combination of Charge Sharing, Selective Precharge and Replica Control
  - 4.3 Comparison Results
- CHAPTER FIVE
  - 5.1 CONCLUSION
  - 5.2 FUTURE WORK
- REFERENCES

## CHAPTER ONE

## **INTRODUCTION**

# THESIS LAYOUT

#### **TCAM BASICS**

The TCAM cell, the TCAM word and the NOR-type and NAND-type cells are discussed. An overview is given of the write, read and search operations. Emphasis is placed on the applications of TCAM and on packet forwarding using TCAM.

#### MATCHLINE SENSING SCHEME

Several matchline sensing schemes exist, e.g. conventional matchline sensing, low-swing schemes, the current-race scheme, the selective-precharge scheme, the pipelining scheme, the current-race scheme with active feedback, the current-saving scheme and charge-shared matchline sensing schemes. The current race scheme and two types of charge-shared matchline sensing schemes are discussed here.

#### SIMULATION AND COMPARISONS

The current race method is compared with the 4-segment and 2-segment charge-shared matchline sensing schemes on two parameters: search time and voltage margin.

#### CONCLUSION

We discuss the simulated results and compare them. Among the schemes considered, charge-shared matchline sensing appears to be the most promising.

### CHAPTER TWO

## TCAM BASICS

A TCAM is a memory that implements the lookup-table function in a single clock cycle using dedicated comparison circuitry. TCAMs are especially popular in network routers for packet forwarding and packet classification. In this chapter, the basic structure of TCAM is discussed along with the different types of TCAM cells, their differences and their operation. The chapter ends with a discussion of TCAM applications, particularly packet forwarding in network routers.

#### 2.1. Ternary Content Addressable Memory (TCAM)

Content Addressable Memory (CAM) is an application-specific memory which compares input data against a table of stored data and returns the address of the matching data. A distinct feature of CAM is that it can perform the search operation in a single clock cycle, which makes it faster than other hardware- and software-based search systems. A binary CAM performs exact-match searches, while a ternary CAM allows matching with the use of don't care bits. A don't care bit acts as a wildcard in a search and is particularly useful for implementing longest-prefix-match searches in routing tables.

#### 2.1.1. TCAM Cell

In TCAM, the word 'Ternary' comes from the fact that each TCAM cell stores one of three states: high, low and don't care ('X'). Representing three states requires two bits, which is why each TCAM cell contains two SRAM cells.

A TCAM cell can be of two types: NOR-type or NAND-type. Figure 1.1 shows both types of TCAM cells.



**Figure 1.1** (a) 16T NOR-type TCAM cell and (b) 16T NAND-type TCAM cell. One TCAM cell contains two SRAM cells (bits).

In the NOR-type TCAM cell shown in figure 1.1 (a), transistors M1 to M4 form the comparison circuit. DATA1 and DATA2 store the data as two bits. The three states are stored as:

DATA1 DATA2=01(low)

DATA1 DATA2=10(high)

DATA1 DATA2=00(don't care)

DATA1 DATA2=11(not allowed)

The searchlines are encoded in the same way (SL1 SL2 = 10, 01 or 00). Masking with DATA1 DATA2 = 00 is called local masking, while masking with SL1 SL2 = 00 is called global masking.

The ML can be pulled down to ground through either pair of series transistors (M1, M2 or M3, M4). When the search data matches the stored data (DATA1 DATA2 = SL1 SL2), or when either of them is a don't care, neither ML pull-down path conducts. If the stored data does not match the search data, one of the series pairs forms a conducting path between the ML and ground.

Therefore, if there is a match the ML retains its voltage, and if there is a mismatch the ML is grounded.
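As an illustration of the encoding above, the match behaviour of a single NOR-type cell can be sketched in Python. The pairing of stored bits with searchlines in the two pull-down paths (DATA1 in series with SL2, DATA2 in series with SL1) is an assumption chosen so that equal codes produce a match; it is not read from Figure 1.1, and the names `ENCODING` and `cell_pulls_down` are hypothetical.

```python
# Two-bit codes from the text, used for both (DATA1, DATA2) and (SL1, SL2).
ENCODING = {"0": (0, 1), "1": (1, 0), "X": (0, 0)}


def cell_pulls_down(stored: str, search: str) -> bool:
    """Return True if the cell connects the ML to ground (a mismatch)."""
    d1, d2 = ENCODING[stored]
    s1, s2 = ENCODING[search]
    # Assumed wiring: one series pair is gated by DATA1 and SL2,
    # the other by DATA2 and SL1. A pair conducts only if both gates are high.
    return bool((d1 and s2) or (d2 and s1))


for stored in "01X":
    for search in "01X":
        state = "mismatch" if cell_pulls_down(stored, search) else "match"
        print(f"stored={stored} search={search}: {state}")
```

With this wiring, a stored or searched 'X' (code 00) disables both pull-down paths, which corresponds to local and global masking respectively.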

In the NAND-type cell of figure 1.1(b), there are separate cells for data and mask. Transistors M1 to M4 form the comparison circuit. In local masking, '1' is stored in the mask cell (X = '1'), which turns on the pass transistor M4 and connects the matchline nodes on both sides of the cell to each other. Both searchlines can also be driven high so that transistor M3 turns ON. Otherwise, pass transistor M3 turns ON only if the stored data and the search data match. If there is a mismatch, both transistors M3 and M4 remain OFF.

Therefore, if there is a match (an exact match or a wildcard match), one or both of transistors M3 and M4 are ON, and if there is a mismatch, both of them are OFF.

#### 2.1.2 TCAM Word

A TCAM word is formed by joining a large number of TCAM cells side by side. The cells can be NOR-type or NAND-type; both produce the same match result, but by different means. The number of bits in a TCAM word is usually large, with existing implementations ranging from 36 to 144 bits. A typical TCAM employs a table size ranging from a few hundred entries to 32k entries, with a corresponding address space of 7 to 15 bits. All the TCAM cells of a word are connected to the same matchline.
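The relation between table size and address width quoted above can be checked with a short calculation. This is only a sketch; the function name `address_bits` is hypothetical.

```python
def address_bits(entries: int) -> int:
    """Number of address bits needed to index `entries` table entries."""
    # (entries - 1).bit_length() equals ceil(log2(entries)) for entries >= 2.
    return max(1, (entries - 1).bit_length())


print(address_bits(128))        # smallest table that needs 7 address bits
print(address_bits(32 * 1024))  # a 32k-entry table
```

A 7-bit address space corresponds to 128 entries and a 15-bit address space to 32k (32768) entries.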

If the stored data and the search data match, the ML remains floating; otherwise the ML is pulled down to ground.



Figure 1.2 (a) One TCAM data word consisting of n-bit NOR-type cells and (b) one data word consisting of n-bit NAND-type cells. Bit lines, word lines and access transistors have not been shown for clarity. One 'TCAM bit' is actually represented by two bits.

#### 2.1.3 NOR-type cell versus NAND-type cell

From the NOR-type and NAND-type TCAM words we see that the NOR-type cells are arranged in parallel within a word, whereas the NAND-type cells are arranged in series. A property of the NOR-type cell is therefore that it provides a full rail-to-rail voltage at the gates of all comparison transistors. In the NAND-type word, on the other hand, the signal is degraded as it passes through the series-connected cells. A disadvantage of the NAND-type cell is thus that it provides only a reduced logic '1', which can reach at most $V_{DD} - V_{tn}$ (where $V_{DD}$ is the supply voltage and $V_{tn}$ is the NMOS threshold voltage). This can cause a mismatch to be sensed even when there is a match. For this reason NOR-type cells are preferred over NAND-type cells.

#### **2.2 Operations**

#### **2.2.1 Write Operation**

The WRITE operation is performed by enabling the word line (WL) and supplying the data to be written to the bit lines (BL). When WL is enabled, the access transistors M5 and M6 turn ON. The data supplied to the BLs then pass through the access transistors to the internal nodes (DATA and $\overline{DATA}$), where they are preserved as full rail-to-rail voltages by the feedback action of the cross-coupled inverters. The data written can be different from the previously stored value.



Figure 1.3 Circuit diagrams of conventional (a) 10T NOR-type BCAM cell and (b) 9T NAND-type cell

#### 2.2.2 Read Operation

To read the stored data, the BLs are precharged high and the WL is enabled. After precharging, the BL drivers are turned OFF. Access transistors M5 and M6 are still ON at that moment, so the BL starts discharging. The voltage drop during the discharge creates a voltage difference between the BLs, which is sensed by the BL sense amplifier (BLSA) and converted into a full rail-to-rail voltage. Thus the data can be read. While the BL discharges, the voltage of the DATA node may rise. If it rises up to the threshold voltage of M2, it might flip the stored bit. To prevent this, M4 needs to be wider so that the charge is drained quickly before the voltage of the DATA node rises. Usually, the driver transistors are made 1.5 times wider than the access transistors.

#### 2.2.3 Search Operation

In the search operation, the data to be searched are supplied to the search lines (SL). If the search data match the stored data, the ML remains disconnected from ground. If they do not match, the pull-down path connects the ML to ground. Figure 1.4 shows how multiple cells are connected together to form a word. If a single bit mismatches, the whole ML is grounded through the pull-down path of that cell, so the collective result is '0'. The matchline remains high only if the full search word matches the corresponding stored word.



Figure 1.4 One BCAM data word consisting of n-bit NOR-type cells



Figure 1.5 A k-word×n-bit TCAM array using NOR-type cells

#### 2.3 Applications of TCAM

As TCAMs are faster than other hardware- or software-based search systems, they have a wide variety of applications. These applications include parametric curve extraction [6], Hough transformation [7], Huffman coding/decoding [8], [9], Lempel–Ziv compression [10]–[13], and image coding [14]. The primary commercial application of CAMs today is to classify and forward Internet protocol (IP) packets in network routers [15]–[20].

In networks like the Internet, a message such as an e-mail or a Web page is transferred by first breaking up the message into small data packets of a few hundred bytes, and then sending each data packet individually through the network. These packets are routed from the source, through the intermediate nodes of the network (called routers), and reassembled at the destination to reproduce the original message. The function of a router is to compare the destination address of a packet to all possible routes, in order to choose the appropriate one. A CAM is a good choice for implementing this lookup operation due to its fast search capability.

#### 2.3.1 Packet Forwarding Using CAM

Network routers forward data packets from an incoming port to an outgoing port, using an address-lookup function. The address-lookup function examines the destination address of the packet and selects the output port associated with that address. The router maintains a list, called the routing table, that contains destination addresses and their corresponding output ports. An example of a simplified routing table is displayed in Table I. All four entries in the table are 5-bit words, with the don't care bit, "X", matching both a 0 and a 1 in that position. Because of the "X" bits, the first three entries in the table represent a range of input addresses, i.e., entry 1 maps all addresses in the range 10100 to 10111 to port A. The router searches this table for the destination address of each incoming packet, and selects the appropriate output port. For example, if the router receives a packet with the destination address 10100, the packet is forwarded to port A. In the case of the incoming address 01101, the address lookup matches both entry 2 and entry 3 in the table. Entry 2 is selected since it has the fewest "X" bits, or, alternatively, it has the longest prefix, indicating that it is the most direct route to the destination. This lookup method is called longest-prefix matching.

| Entry No. | Address (Binary) | Output Port |
|-----------|------------------|-------------|
| 1         | 101XX            | A           |
| 2         | 0110X            | B           |
| 3         | 011XX            | C           |
| 4         | 10011            | D           |

Table I Simplified routing table



Figure 1.6 CAM-based implementation of the routing table of Table I

Figure 1.6 illustrates how a CAM accomplishes address lookup by implementing the routing table shown in Table I. On the left of Figure 1.6, the packet destination-address of 01101 is the input to the CAM. As in the table, two locations match, with the (priority) encoder choosing the upper entry and generating the match location 01, which corresponds to the most-direct route. This match location is the input address to a RAM that contains a list of output ports, as depicted in Figure 1.6. A RAM read operation outputs the port designation, port B, to which the incoming packet is forwarded. We can view the match location output of the CAM as a pointer that retrieves the associated word from the RAM. In the particular case of packet forwarding the associated word is the designation of the output port. This CAM/RAM system is a complete implementation of an address-lookup engine for packet forwarding.
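The lookup described above can be sketched in software as follows. The table contents are those of Table I; the function names are hypothetical, and the loop only mimics sequentially what the CAM hardware does in parallel. The priority encoder is modelled as returning the lowest matching location, which yields the longest prefix only because the table is ordered with more specific entries first.

```python
# Routing table of Table I: (ternary address, output port).
ROUTING_TABLE = [
    ("101XX", "A"),
    ("0110X", "B"),
    ("011XX", "C"),
    ("10011", "D"),
]


def word_matches(stored: str, key: str) -> bool:
    """A word matches if every bit is equal or the stored bit is 'X'."""
    return all(s == "X" or s == k for s, k in zip(stored, key))


def lookup(key: str):
    """Return (match location, output port), or None if nothing matches."""
    # One matchline per stored word.
    matchlines = [word_matches(stored, key) for stored, _ in ROUTING_TABLE]
    # Priority encoder: the uppermost (lowest-index) match wins.
    for location, hit in enumerate(matchlines):
        if hit:
            # RAM read: the match location indexes the port list.
            return location, ROUTING_TABLE[location][1]
    return None


print(lookup("01101"))  # matches entries 2 and 3; location 1 (binary 01), port B
print(lookup("10100"))  # matches entry 1; location 0, port A
```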

## CHAPTER THREE

## MATCHLINE SENSING SCHEMES

In the last chapter, the basics of TCAM, its structure, its operation and its applications were discussed. In this chapter different matchline sensing schemes are discussed, along with the performance of two different charge-shared schemes and the current race scheme. The chapter ends with a review of their performance curves and their advantages and disadvantages.

#### 3.1 Matchline Sensing Scheme

Different matchline sensing schemes are available to improve the performance of TCAM and reduce its power dissipation without compromising its search speed. Examples:

1. Conventional (precharge-high) matchline sensing
2. Low-swing schemes
3. Current-race scheme
4. Selective-precharge scheme
5. Pipelining scheme
6. Current-race scheme with active feedback
7. Current-saving scheme
8. Charge-shared matchline sensing schemes

Among these schemes, our main consideration is the current race scheme and the charge-shared matchline sensing schemes. We have chosen mainly the charge-shared schemes because they seem more promising than the others, so we discuss two of the charge-shared methods and compare them with the current race scheme.

#### 3.2 Current Race (CR) Scheme

The current race (CR) ML sensing scheme is one of the most popular sensing schemes, and several other sensing schemes with better performance have been derived from it. NOR-type cells are used to construct the TCAM array in the CR scheme. The original aim of the CR scheme was to reduce the ML power of the precharge-high scheme. The CR scheme differs from the conventional scheme in being a precharge-low scheme: it pre-discharges all MLs to ground. If every bit of a word matches, the corresponding ML goes high. In the CR scheme the SLs do not need to be discharged to ground, since the MLs are pre-discharged. This technique saves half of the SL energy. The CR scheme also uses dummy (replica) control, which eliminates the need to charge MLs unnecessarily.



**Figure 2.1** Current race ML sensing scheme – (a) one word and (b) the dummy (replica) word

The MLSA has two units: a charging unit and a sensing unit. The search operation begins by pre-discharging the MLs to ground and resetting the MLSA to zero voltage; the MLRST signal is used for this purpose. MLEN then turns on transistor M2, which starts the flow of the ML current, $I_{ML}$, and the ML capacitance, $C_{ML}$, starts charging. If a word is fully matched, its ML charges up to the threshold voltage of M3 and turns M3 on, so MLSO becomes high. When a word is not matched, $V_{ML}$ remains small because the ML has a discharging path to ground. In this case M3 remains OFF and MLSO remains zero. The dummy (replica) word is always in the matched state, so it always produces a high DMLSO. The inverted version of DMLSO, MLOFF, is used to turn off transistor M2 after a delay. This reduces unnecessary charging of the MLs and thus the energy consumption. The delay is intentional: it ensures that the MLs get enough time to charge up to the threshold voltage of M3 once the dummy word has detected a match. The speed and energy consumption of the MLs are controlled by $V_{bias}$, since it controls $I_{ML}$.
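The role of $I_{ML}$ in the speed of the scheme can be illustrated with a first-order model. This is only a sketch under strong assumptions that are not part of the simulated circuit: the charging unit is treated as an ideal constant current source, each mismatching cell as a linear resistor to ground, and all numerical values are hypothetical.

```python
def time_to_threshold(i_ml: float, c_ml: float, v_th: float) -> float:
    """Time for a fully matched ML to charge from 0 V to v_th (seconds).

    Assumes a constant current: V_ML(t) = I_ML * t / C_ML.
    """
    return c_ml * v_th / i_ml


def mismatch_voltage(i_ml: float, r_pulldown: float, n_mismatch: int) -> float:
    """Steady-state ML voltage when n_mismatch cells pull the ML down.

    Assumes identical linear pull-down paths in parallel (R / n).
    """
    return i_ml * r_pulldown / n_mismatch


# Hypothetical values, chosen only for illustration.
C_ML = 100e-15     # ML capacitance (F)
V_TH = 0.5         # threshold voltage of M3 (V)
R_PULLDOWN = 20e3  # resistance of one mismatching pull-down path (ohm)

for i_ml in (5e-6, 10e-6, 20e-6):
    t = time_to_threshold(i_ml, C_ML, V_TH)
    v = mismatch_voltage(i_ml, R_PULLDOWN, 1)
    print(f"I_ML = {i_ml * 1e6:.0f} uA: t_match = {t * 1e9:.1f} ns, "
          f"V_ML (1-bit mismatch) = {v:.2f} V")
```

In this model a larger $I_{ML}$ shortens the time a matched ML needs to reach the threshold of M3, but it also raises the voltage of a 1-bit-mismatch ML towards that threshold, which is the trade-off set by $V_{bias}$.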

The parasitic capacitance of the ML is determined by the ON/OFF states of transistors M1 and M2, so it depends on the search data. Since the search bits are the same along each column, $C_{ML}$ is the same for every ML in a search. This ensures good matching between the MLs and prevents sensing errors due to capacitance variation.

In the CR scheme, matched and mismatched MLs initially receive the same amount of current. $I_{ML}$ decreases in the matched case because a matched ML charges to a high voltage. Mismatched MLs, in contrast, have one or more low-resistance paths to ground. An increasing number of mismatches decreases the equivalent resistance of the ML pull-down path and therefore increases $I_{ML}$. Since most of the MLs are mismatched, the large currents supplied to mismatched MLs waste a significant amount of energy. Supplying smaller currents to mismatched MLs can solve this energy consumption problem.



Figure 2.2 A timing diagram for a single search cycle

#### **3.3 Charged Shared Matchline Sensing Schemes**

In the charge-shared matchline sensing schemes, charge is shared between different ML segments to reduce power consumption. These schemes can also enlarge the voltage margin, making the sensing more immune to noise, without compromising the search speed.

# **3.3.1** Charge Shared Matchline Sensing by Segmentation of Two Blocks (4 Segments)

In this scheme each matchline is divided into four segments. In the first phase of the search operation, two segments (segments 1 and 4) are precharged to VDD. During this phase the SLs are kept low to avoid a high-impedance state in the ML segments. CS is also low, so no charge sharing takes place.

In the second phase the SLs are driven with the search key and CS turns on the pass gates for charge sharing with the two remaining segments (segments 2 and 3). If the two segments in a block match their corresponding search key inputs, the voltages ($V_{1f}$ and $V_{rf}$) will be high, producing high ML voltages. In case of a mismatch, the matchline voltage discharges to ground.



Figure 2.3 Charge shared ML sensing scheme

The job of the match sensor block between segments 2 and 3 is to combine the match results from the left and right segments and give the final match result.
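The voltage reached after charge sharing follows from charge conservation. The sketch below assumes ideal capacitors, an ideal pass gate and no pull-down current (the match case); the function name and the equal segment capacitances in the example are hypothetical.

```python
def shared_voltage(c_a: float, v_a: float, c_b: float, v_b: float) -> float:
    """Final voltage after two capacitors are connected together.

    Charge conservation: (C_a + C_b) * V_final = C_a * V_a + C_b * V_b.
    """
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


VDD = 1.8  # supply voltage used in this work (V)

# A precharged segment shares its charge with a discharged segment.
# Equal capacitances are assumed here purely for illustration.
print(shared_voltage(1.0, VDD, 1.0, 0.0))
```

With equal segment capacitances a matched block settles at VDD/2, whereas in a mismatched block the conducting pull-down path drains the shared charge to ground.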

The advantage of this scheme is that, if there is a match in two consecutive searches, any charge remaining on the ML from the previous search cycle can be reused to reduce power consumption. Another advantage is that it reduces peak power consumption without compromising speed or average power compared to other existing techniques. The main focus of this scheme was to reduce the peak power consumption; it did not aim to reduce the search time, which may be considered a drawback of the scheme.

# **3.3.2.** ML Sensing Scheme Using the Combination of Charge-Sharing, Selective Precharge and Replica Control

In this scheme each matchline is divided into two ML segments, where the second segment is larger than the first.

The search operation can be divided into two phases. At the beginning of the search operation the ML segments and MLSO are discharged to ground.

In the first phase the MLEN signal starts charging the first segments of all the MLs. Each first segment is compared with the corresponding part of the search key. If there is a match, the voltage of ML segment 1 becomes high enough to trigger MLSA1, which drives MLSO1 high. When MLSO1 becomes high, it starts charging the second segment and also starts the charge sharing between the two segments via pass transistor M2. If there is a mismatch, the pass transistor stays off and MLSO2 remains zero.



**Figure 2.4** ML sensing scheme using combination of charge sharing, selective precharge and replica control - (a) one word of TCAM array, (b) replica/dummy word

The match results of the first segments select which second segments are activated. This procedure is known as selective precharge.

In the second phase, the second matchline segments that are now activated are compared with the search key inputs. MLSA2 can quickly reach its sensing threshold voltage because charging and charge sharing take place at the same time, so MLSO2 goes high. If there is a mismatch, MLSO2 stays at zero because of the discharging path to ground.

The dummy words are always matched due to local masking. When DMLSO1 becomes high, the charging of the first segments is stopped by the $\overline{\text{MLOFF1}}$ signal. This in turn stops the charge sharing between the two segments. The charging duration can be controlled by delayed and inverted versions of the MLSA outputs.

This scheme is a combination of current race techniques, selective precharge and charge sharing. Energy is saved because only the selected second segments are charged, and also because the charge stored in the first segment is reused.
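The selection logic of the two phases can be sketched behaviourally as below. This models only which second segments are activated, not voltages or timing. The stored words (reused from Table I), the split position and the function names are illustrative assumptions, and counting activated second segments is only a rough proxy for the energy spent in the second phase.

```python
def segment_matches(stored: str, key: str) -> bool:
    """Ternary match of one ML segment: equal bits or a stored 'X'."""
    return all(s == "X" or s == k for s, k in zip(stored, key))


def two_phase_search(words, key, split):
    """Return (final match results, number of activated second segments)."""
    results = []
    activated = 0
    for word in words:
        # Phase 1: only the short first segment is charged and evaluated.
        first_ok = segment_matches(word[:split], key[:split])
        if first_ok:
            # Phase 2: the second segment is activated only after a
            # first-segment match (selective precharge).
            activated += 1
        results.append(first_ok and segment_matches(word[split:], key[split:]))
    return results, activated


words = ["101XX", "0110X", "011XX", "10011"]
results, activated = two_phase_search(words, "01101", split=2)
print(results, f"{activated} of {len(words)} second segments activated")
```

In this example only two of the four second segments are charged, since the other two words already mismatch in their first segment.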

The main disadvantage of this scheme is the tuning of the transistors. As the pass transistors cannot be switched off instantly, some charge sharing continues after MLOFF1 is turned on. As a result, the first segment is still being charged even after the transistors are turned off.

Consequently, the segment 1 voltage is always higher than the segment 2 voltage, and some charge in the first segment is wasted. To avoid this problem, a digitally controlled delay can be used to generate $\overline{\text{MLOFF1}}$ from DMLSO1, but this in turn increases the circuit complexity.



**Figure 2.5** (a) Circuit schematic of the proposed charge-shared ML scheme using a current-race ML sense amplifier, and (b) its timing diagram.

### CHAPTER FOUR

## SIMULATION AND COMPARISONS

In the previous chapter, we have discussed different charge shared matchline sensing schemes and also the current race scheme. In this chapter we have simulated all those schemes with HSPICE and compared their search time and voltage margin.

#### 4.1 Parameter

For the simulations we used HSPICE 2007, and the simulation graphs were obtained with COSMO-SCOPE. The simulations were done with 180nm CMOS technology. In all cases, a 32 bit 16\*16 TCAM array is used.

From these simulations we obtained the voltage margin and the search time.

#### 4.1.1 Search Time

Search time is the delay between the initializing signal (the precharge signal or the matchline RESET signal) and the final match result obtained from the matchline sense amplifiers (MLSAs). This difference measures how quickly the circuitry delivers the match result, so the lowest possible search time is desirable.

#### 4.1.2 Voltage Margin

Voltage margin is defined as the difference between the maximum voltage of a 1-bit-mismatch ML and the voltage at which the matched ML crosses the match result signal. We use the maximum voltage of the 1-bit mismatch because it is the highest voltage a mismatched ML can reach; as the number of mismatches increases, the mismatched ML voltage decreases. The crossing between the two signals (matched ML and match result) indicates the threshold voltage at which the MLSA gives its result. If the voltage margin is less than a certain value, the MLSA may give a mismatch result even if there is a match.
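The two parameters defined in this section can be extracted from sampled waveforms along the following lines. This is a minimal sketch and not the measurement procedure used in HSPICE: the function names, the 50% VDD output level and the sample waveforms are all hypothetical, and linear interpolation between samples is assumed.

```python
def crossing_time(times, values, level):
    """First time at which `values` rises through `level`."""
    for i in range(len(times) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 < level <= v1:
            return times[i] + (level - v0) * (times[i + 1] - times[i]) / (v1 - v0)
    return None


def crossing_voltage(wave_a, wave_b):
    """Voltage at which two waveforms on the same time grid first cross."""
    for i in range(len(wave_a) - 1):
        d0 = wave_a[i] - wave_b[i]
        d1 = wave_a[i + 1] - wave_b[i + 1]
        # Skip samples where the waveforms coincide (e.g. both at 0 V).
        if d0 != 0 and d0 * d1 <= 0:
            frac = d0 / (d0 - d1)
            return wave_a[i] + frac * (wave_a[i + 1] - wave_a[i])
    return None


def search_time(t_start, times, mlso, level):
    """Delay from the initializing signal at t_start to the match result."""
    return crossing_time(times, mlso, level) - t_start


def voltage_margin(matched_ml, mlso, mismatched_ml):
    """Crossing voltage of matched ML and MLSO minus the 1-bit-mismatch peak."""
    return crossing_voltage(matched_ml, mlso) - max(mismatched_ml)


# Made-up sample waveforms (time in ns, voltage in V), for illustration only.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
matched_ml = [0.0, 0.4, 0.8, 1.0, 1.0]
mlso = [0.0, 0.0, 0.0, 1.8, 1.8]
mismatched_ml = [0.0, 0.2, 0.3, 0.3, 0.1]

print(search_time(0.0, times, mlso, level=0.9))
print(voltage_margin(matched_ml, mlso, mismatched_ml))
```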

## 4.2 Simulation Results:

## 4.2.1 Current-Race Scheme:



Figure 3.1 Search time measurements



Figure 3.2 Voltage Margin measurement





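## 4.2.2 Charge Shared Matchline Sensing by Segmentation of Two Blocks (4 Segments)

Figure 3.3 Search time measurement

Figure 3.4 Voltage Margin measurement

(Mismatch in Left Block)

Figure 3.5 Voltage Margin measurement

(Mismatch in Right Block)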

# **4.2.3 ML Sensing Scheme Using the Combination of Charge Sharing, Selective Precharge and Replica Control**



Figure 3.6 Search time measurement



Figure 3.7 Voltage Margin measurement

(Mismatch in 1<sup>st</sup> Segment)



Figure 3.8 Voltage Margin measurement

(Mismatch in 2<sup>nd</sup> Segment)

#### 4.3 Comparison Results:

| Scheme | Search Time (ns) | Mismatch location | Voltage Margin (V) |
|--------|------------------|-------------------|--------------------|
| Current Race | 5.0446 | | 1.064405 |
| Charge-Shared, 1<sup>st</sup> Scheme | 4.0145 | Left Block | 0.25157 |
| | | Right Block | 0.76002 |
| Charge-Shared, 2<sup>nd</sup> Scheme | 5.6497 | 1<sup>st</sup> Segment | 1.127786 |
| | | 2<sup>nd</sup> Segment | 1.2212 |

#### Table of Comparison

We compared the current race scheme with two charge-shared schemes. The results are summarized in the table above.

The search time of the current race scheme is 5.0446ns. For the charge-shared schemes, the search time is 4.0145ns for the first scheme and 5.6497ns for the second scheme. A lower search time is better, because it means a higher search speed.

The voltage margin of the current race scheme is 1.064405V. For the first charge-shared scheme, the voltage margin is 0.25157V for a mismatch in the left block and 0.76002V for a mismatch in the right block. For the second charge-shared scheme, the voltage margin is 1.127786V for a mismatch in the first segment and 1.2212V for a mismatch in the second segment. A higher voltage margin is better for circuit operation, because it makes the sensing more immune to noise. If the voltage margin is too small, the MLSA may fail to distinguish a match from a mismatch, and the search result will be in error.

We can conclude that, in terms of search time, the first charge-shared scheme (4.0145ns) is better than both the second charge-shared scheme (5.6497ns) and the current race scheme (5.0446ns). In terms of voltage margin, the second charge-shared scheme is better than both the first charge-shared scheme and the current race scheme. The choice between the two charge-shared schemes is therefore a trade-off between search speed and voltage margin.
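The trade-off can be made concrete with a short calculation on the values in the table of comparison. The sketch below is only an illustration: the dictionary layout and the helper name `percent_change` are assumptions, and taking the lowest margin of each scheme as its worst case is a choice made here, not a result of the simulations.

```python
# Values copied from the table of comparison (search time in ns, margin in V).
CURRENT_RACE = {"search_time_ns": 5.0446, "voltage_margin_v": 1.064405}
CHARGE_SHARED = {
    "1st scheme": {
        "search_time_ns": 4.0145,
        "voltage_margin_v": {"left block": 0.25157, "right block": 0.76002},
    },
    "2nd scheme": {
        "search_time_ns": 5.6497,
        "voltage_margin_v": {"1st segment": 1.127786, "2nd segment": 1.2212},
    },
}


def percent_change(value, baseline):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


for name, data in CHARGE_SHARED.items():
    dt = percent_change(data["search_time_ns"], CURRENT_RACE["search_time_ns"])
    # Assumption: the smallest margin over mismatch locations is the worst case.
    worst_margin = min(data["voltage_margin_v"].values())
    dv = worst_margin - CURRENT_RACE["voltage_margin_v"]
    print(f"{name}: search time {dt:+.1f}% vs current race, "
          f"worst-case margin {dv:+.3f} V vs current race")
```

With these numbers, the first scheme's search time is roughly 20% shorter than that of the current race scheme and the second scheme's is roughly 12% longer, while only the second scheme keeps its worst-case voltage margin above the current race value.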

## CHAPTER FIVE

## **5.1 CONCLUSION**

There are different matchline sensing schemes. Among them, we have mainly focused on the charge-shared schemes, as they are more convenient than the other matchline sensing schemes. Our aim was to compare the simulation results of the charge-shared schemes with those of the current race scheme and to identify which gives the better performance.

In the conventional matchline sensing schemes, the power consumption is high. In charge-shared schemes, the power consumption can be reduced, leaving aside the search speed in the mismatch case.

We therefore considered two charge-shared schemes. In the first scheme, the matchline is divided into four segments. If there is a match in two consecutive searches, the charge stored in the previous cycle can be reused, which reduces power.

In the second scheme, the matchline is divided into two segments, and the first segment is small compared to the second one. If there is a mismatch in the first segment, there is no need to charge the second segment, so both time and power are saved. This scheme is more advantageous, as it combines current race, charge sharing and selective precharge.

#### **5.2 FUTURE WORK**

In this work, our main concern was to compare the performance of different charge-shared schemes with the existing current race scheme. However, the data we obtained through simulation are only basic performance measurements. The results can be optimized by tuning the transistor sizes.

In our simulations we used a $16 \times 16$ TCAM array. Increasing the array size would give a more practical view of TCAM performance.

Our main target is therefore to optimize the results by tuning the transistor sizes and by increasing the number of bits. Our next step would be to compare these schemes with other schemes and to combine them in search of improved performance. Moreover, we look forward to developing new schemes that further improve power consumption and search speed.

#### REFERENCES

- 1. T. Kohonen, Content-Addressable Memories, 2nd ed. New York: Springer-Verlag, 1987.
- L. Chisvin and R. J. Duckworth, "Content-addressable and associative memory: alternatives to the ubiquitous RAM," IEEE Computer, vol. 22, no. 7, pp. 51–64, Jul. 1989.
- K. E. Grosspietsch, "Associative processors and memories: a survey," IEEE Micro, vol. 12, no. 3, pp. 12–19, Jun. 1992. N. Robinson, "Pattern-addressable memory," IEEE Micro, vol. 12, no.3, pp. 20–30, Jun. 1992.
- 4. S. Stas, "Associative processing with CAMs," in Northcon/93 Conf. Record, 1993, pp. 161–167.
- 5. S. Stas, "Associative processing with CAMs," in Northcon/93 Conf. Record, 1993, pp. 161–167.
- M. Meribout, T. Ogura, and M. Nakanishi, "On using the CAM concept for parametric curve extraction, IEEE Trans. Image Process" vol.9,no.12, pp. 2126–2130, Dec. 2000.
- M. Nakanishi and T. Ogura, "Real-time CAM-based Hough transform and its performance evaluation," Machine Vision Appl., vol. 12, no. 2, pp. 59–68, Aug. 2000.
- 8. E. Komoto, T. Homma, and T. Nakamura, "A high-speed and compact-size JPEG Huffman decoder using CAM," in Symp. VLSI Circuits Dig. Tech. Papers, 1993, pp. 37–38.
- 9. L.-Y. Liu, J.-F. Wang, R.-J. Wang, and J.-Y. Lee, "CAM-based VLSI architectures for dynamic Huffman coding," IEEE Trans. Consumer Electron., vol. 40, no. 3, pp. 282–289, Aug. 1994.
- 10.B. W. Wei, R. Tarver, J.-S. Kim, and K. Ng, "A single chip Lempel-Ziv data compressor," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol.3, 1993, pp. 1953–1955.
- 11.R.-Y. Yang and C.-Y. Lee, "High-throughput data compressor designs using content addressable memory," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol. 4, 1994, pp. 147–150.

- 12.C.-Y. Lee and R.-Y. Yang, "High-throughput data compressor designs using content addressable memory," IEE Proc.—Circuits, Devices and Syst., vol. 142, no. 1, pp. 69–73, Feb. 1995.
- 13.D. J. Craft, "A fast hardware data compression algorithm and some algorithmic extensions," IBM J. Res. Devel., vol. 42, no. 6, pp. 733–745, Nov. 1998.
- 14.S. Panchanathan and M. Goldberg, "A content-addressable memory architecture for image coding using vector quantization," IEEE Trans. Signal Process. vol. 39, no. 9, pp. 2066–2078, Sep. 1991.
- 15.T.-B. Pei and C. Zukowski, "VLSI implementation of routing tables: tries and CAMs," in Proc. IEEE INFOCOM, vol. 2, 1991, pp. 515–524.
- 16. "Putting routing tables in silicon", IEEE Network Mag., vol. 6, no.1, pp. 42–50, Jan. 1992.
- 17.A. J. McAuley and P. Francis, "Fast routing table lookup using CAMs," in Proc. IEEE INFOCOM, vol. 3, 1993, pp. 1282–1391.
- 18.N.-F. Huang, W.-E. Chen, J.-Y. Luo, and J.-M. Chen, "Design of multi-field IPv6 packet classifiers using ternary CAMs," in Proc. IEEE GLOBECOM, vol. 3, 2001, pp. 1877–1881.
- 19.G. Qin, S. Ata, I. Oka, and C. Fujiwara, "Effective bit selection methods for improving performance of packet classifications on IP routers," in Proc. IEEE GLOBECOM, vol. 2, 2002, pp. 2350–2354.
- 20.H. J. Chao, "Next generation routers," Proc. IEEE, vol. 90, no. 9, pp. 1518– 1558, Sep. 2002.