Owing to the limitations of the widely used von Neumann architecture, various devices and circuit systems suitable for logic-in-memory computing have been investigated. In this work, the silicon-based floating gate memory cell transistor structure, which has attracted attention as a memory to replace dynamic random access memory or NAND Flash technology, was revisited, and its applicability to logic-in-memory applications was confirmed. This floating gate field effect transistor (FGFET) has the advantage that its compatibility with the existing silicon-based complementary metal–oxide–semiconductor (CMOS) process is far superior to that of logic-in-memory devices based on materials with new memory characteristics. An environment that can simultaneously analyze the FGFET at the device and circuit levels was established at the 32 nm technology node, which is the last node to which the planar MOSFET structure is applied. For a seamless connection between FGFET device and circuit analysis, a compact model of the FGFET was developed and applied to logic-in-memory ternary content addressable memory (TCAM) circuit design. It was verified that the two types of logic-in-memory TCAM circuits based on FGFETs are superior to a conventional CMOS FET-based TCAM circuit in the number of devices used (=circuit area) and in power/energy efficiency.

The traditional von Neumann architecture consists of separate processors and memory, which results in high power consumption and access overhead. It is therefore difficult to satisfy the requirements of the memory-intensive applications driven by the recent development of artificial intelligence.1–3 As one way to solve this problem, novel devices that enable circuits such as processors incorporating non-volatile memory, or memories incorporating a computing function, have been attracting attention. Such devices make it possible to improve performance and area while satisfying the given power budget of smart edge devices.4–15 These technologies were first presented as Logic-in-Memory (LiM) concepts in the 1970s,4 and they have recently attracted attention again because various non-volatile memory technologies can be highly integrated with complementary metal–oxide–semiconductor (CMOS) FETs. Non-volatile memories that have been reported for LiM applications include resistive memory (ReRAM),5 magnetic memory (MRAM),6 phase change memory (PCM),7 and ferroelectric FETs (FeFET).8 The LiM concept can be classified in various ways according to the location of the memory and computation within the system architecture.9 In technologies such as processing in memory, in-memory computing, and coarse-grain LiM, non-volatile memory is arranged in a matrix array located close to the processor, whereas in fine-grain LiM, memory and logic are very tightly integrated within the computing circuit.
The ternary content addressable memory (TCAM) circuit has been implemented with fine-grain LiM devices (e.g., FeFETs), showing a smaller circuit area and lower energy consumption than a TCAM composed of conventional silicon CMOS-based static random access memory (SRAM).10–14 A TCAM searches a stored data table for the given data in parallel and reports matches on the Match Line (ML); it has been used in networking hardware and other applications (e.g., routers, database search, and associative memory)15 and has recently been studied and applied to artificial intelligence architectures.16

The floating gate field effect transistor (FGFET) in this work is similar to the floating gate memory device structure used in existing silicon-based NAND Flash memory. Hence, the FGFET structure, whose integration with conventional silicon CMOS FETs is far superior to that of the previously mentioned non-volatile LiM devices, is introduced into LiM TCAM circuit design for the first time. The FGFET structure is based on the device structures presented by Samsung Electronics and Hitachi as Scalable Two Transistor Memory (STTM) and Phase-state Low Electron-number Drive Random Access Memory (PLEDM),17–22 respectively. The FGFET has a floating node that stores data placed on the gate stack of a common MOSFET, and the data write function is performed by applying the Data Line (DL) voltage and Word Line (WL) voltage. In particular, an energy barrier is placed on the floating node to prevent the loss of stored data. In this work, the FGFET was applied to the 32 nm technology node, the most scaled-down node of the single gate planar MOSFET logic process, and its electrical characteristics were confirmed using well-calibrated Technology Computer Aided Design (TCAD). In addition, a compact model was developed so that FGFETs can be described in a circuit simulator, and various TCAM circuits applying LiM technology were analyzed. The results of benchmarking the circuit characteristics against a conventional CMOS-based TCAM at the 32 nm technology node are also presented. It is believed that the FGFET is a highly practical LiM technology for industrial application.

This paper is organized as follows. Section 2 describes the FGFET device structure and compact model. Section 3 presents the optimization of various TCAM circuit schemes using the developed FGFET model library and their benchmarking against silicon CMOS technology. Finally, Sec. 4 summarizes the results.

This section describes the FGFET device operation and the design-technology co-optimization (DTCO) environment used for FGFET analysis. As shown in Fig. 1(a), the TCAD simulator used in this work was calibrated against the measured data of fabricated devices published in previous papers,17,23,24 and a compact model of the FGFET was developed using this well-calibrated TCAD. The developed model was used to confirm the LiM application characteristics of the FGFET through TCAM circuit design. The FGFET structure is divided into a vertical FET (VFET) and a sense FET (SFET), as shown in Fig. 3(a). In addition, a coupling capacitance (CVA) arises when the VFET source and the SFET gate are integrated into the FGFET structure having a memory node. In this work, a planar MOSFET at the 32 nm technology node, which is the node just before the FinFET structure was introduced, was adopted as the SFET that forms the baseline of the FGFET. For the SFET, the 32 nm technology node Predictive Technology Model (PTM) was used as reference data. Verification against industrial data, including ITRS 2005, has shown that the PTM has sufficient physical properties and scalability for a wide range of process and design conditions and gives excellent predictions for both nominal and variability characteristics.23,24 The channel length (LCH) of the VFET was set considering the aspect ratio of the fabricated FGFET presented in a previous paper.17 Figure 2 shows the FGFET structure and the key parameters used in TCAD, and Table I summarizes the key dimensions of the FGFET used in this work. Synopsys’ Sentaurus was used as the TCAD software;25 the carrier transport models (mobility, carrier velocity, tunneling parameters, etc.) were calibrated with measurement data of fabricated hardware for the SFET23,24 and the VFET17 individually, and the two devices were then integrated into an FGFET structure so that their mutual coupling effect was taken into account.
In particular, the description of the tunneling current through the thin tunneling barrier is very important for the VFET, and the calibration of parameters related to the tunneling mass and density of states has a very large effect on the accuracy of the VFET TCAD. The SFET and VFET TCAD calibration results are shown in Figs. 1(b) and 1(c); the calibration achieved more than 90% accuracy over the entire transfer curve, including the sub-threshold region. Compact modeling was then performed to predict and describe the electrical characteristics (I–V and C–V curves) of the FGFET at 32 nm using the well-calibrated TCAD. Adding a Central Shallow Barrier (CSB) to an existing FGFET reduces the number of tunneling electrons owing to the barriers at the bottom, top, and center of the channel. Hence, the voltage of the memory node is lowered, and the stand-by power is reduced.17

FIG. 1.

(a) Established FGFET Design Technology Co-Optimization (DTCO) framework for LiM applications, (b) SFET TCAD calibration results with hardware-based I–V transfer curve of planar MOSFET at the 32 nm technology node, and (c) VFET TCAD calibration results with the hardware-based I–V transfer curve.

FIG. 2.

Key device parameters of the FGFET.

TABLE I.

Values for key device parameters of the FGFET in this work.

Parameter                            Value
Gate separation (TD)                 32 nm
VFET gate oxide thickness (TOX)      10 nm
Metal thickness (TM)                 25 nm
VFET channel length (LCH)            100.2 nm
Source/drain length (LSD)            25 nm
Source/drain barrier (LSDB)          2 nm
VFET channel doping                  Intrinsic
VFET S/D doping                      2 × 10^20 cm^−3
Memory node thickness (tN)           23.7 nm
SiO2 thickness (TSiO2)               0.7 nm
HfO2 thickness (THfO2)               3 nm
Substrate doping                     1 × 10^16–1.8 × 10^16 cm^−3
SFET source/drain doping             5 × 10^19 cm^−3

Figure 3(a) shows the FGFET structure, and Fig. 3(b) shows the schematic of the compact model. The SFET and VFET were modeled using BSIM4, one of the industry-standard models, and the coupling between the VFET’s gate and the memory node was implemented through CVA modeled in Verilog-A. The FGFET operates as follows. As shown in Fig. 3(b), operation is divided into initialize, write, storage, and read modes according to the WL voltage applied to the FGFET: 3 V is applied to WL for initialize and write, −2 V for storage, and 0.5 V for read. Initialize is the process of resetting the memory node to 0 V, and write is the interval in which the desired data (0 or 1) is written to the memory node by applying 1 V (high) or 0.05 V (low) to the DL. After writing, the data stored in the memory node of the FGFET are isolated by the dielectric barriers placed at the bottom and top, and the storage mode is held until reading so that the data are maintained. Finally, during reading, 0.9 V is applied to the Sense Line (SL) to read “0” or “1.” Figure 4 shows the space charge distribution inside the memory node when the FGFET is in the storage and read modes. WL is −2 V in the storage mode and 0.5 V in the read mode, but the change in space charge is larger than the change in voltage in each mode. The change in charge distribution with voltage confirmed in this way was applied to the capacitor modeling. The CVA developed in Verilog-A is modeled to account not only for the physical capacitance of the dielectric layer but also for the voltage-dependent depletion capacitance.
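The bias conditions quoted above can be collected into a small lookup, which makes the one-bit sequence (initialize, write, storage, read) explicit. This is only an illustrative sketch: the names `Bias`, `bias`, and `one_bit_cycle` are hypothetical, and line voltages that the text does not specify for a given mode are left as `None` rather than guessed.

```python
from typing import List, NamedTuple, Optional


class Bias(NamedTuple):
    wl: float            # Word Line voltage (V)
    dl: Optional[float]  # Data Line voltage (V); None where the text gives no value
    sl: Optional[float]  # Sense Line voltage (V); None where the text gives no value


# WL voltages per operation mode, as stated in the text.
WL_VOLTAGE = {"initialize": 3.0, "write": 3.0, "storage": -2.0, "read": 0.5}
# DL voltage during write: 1 V for data "1" (high), 0.05 V for data "0" (low).
DL_WRITE = {1: 1.0, 0: 0.05}
# SL voltage applied during read.
SL_READ = 0.9


def bias(mode: str, data: Optional[int] = None) -> Bias:
    """Return the bias condition for one operation mode."""
    if mode not in WL_VOLTAGE:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "write" and data not in DL_WRITE:
        raise ValueError("write mode needs data 0 or 1")
    dl = DL_WRITE[data] if mode == "write" else None
    sl = SL_READ if mode == "read" else None
    return Bias(WL_VOLTAGE[mode], dl, sl)


def one_bit_cycle(data: int) -> List[Bias]:
    """Bias sequence for storing and reading back one bit."""
    return [bias("initialize"), bias("write", data), bias("storage"), bias("read")]


if __name__ == "__main__":
    for step in one_bit_cycle(1):
        print(step)
```

Such a table is convenient as the source of piecewise-linear stimulus definitions when the same sequence has to be applied in both TCAD and circuit simulation.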

FIG. 3.

FGFET (a) cross section view and (b) equivalent circuit model of the FGFET, which is a combined structure of the VFET and SFET.

FIG. 4.

Space charge density profile at the memory node of the FGFET under (a) storage mode (data low), (b) read mode (data low), (c) storage mode (data high), and (d) read mode (data high).

As shown in Fig. 3(b), to model the VFET and SFET constituting the FGFET, a BSIM4 model library was developed by extracting the electrical characteristic (I–V and C–V) curves of the individual FETs from TCAD. After integration with the Verilog-A-based CVA, various memory operation characteristics were obtained and compared with transient TCAD simulations to secure the consistency of the FGFET compact model. The results of the BSIM4 model library developed to describe the electrical characteristics of the VFET and SFET are shown in Fig. 5.

FIG. 5.

Comparison of the simulation results between the TCAD and SPICE models for (a) VFET and SFET IDS-VGS at low VDS, (b) VFET and SFET IDS-VGS at high VDS, (c) VFET CGG vs VGS at low VDS, and (d) SFET CGG vs VGS at low VDS.

After the VFET, SFET, and CVA of the FGFET were modeled individually, as shown in Fig. 3(b), the integrated compact model had to reproduce the transient operation of the FGFET shown in Fig. 6, so additional tuning was performed through CVA as follows. The capacitance of the geometric direct overlap area and the depletion capacitance, both of which can be determined from the structural dimensions and TCAD simulation, were set as the initial values of CVA; the FGFET transient characteristics were then checked, and fitting parameters were added. Because CVA may include additional parasitic capacitance from fringe fields besides the two components mentioned above, it is reasonable to introduce fitting parameters into CVA. As shown in Fig. 6, the developed compact model reproduces the TCAD results under various operating modes and voltage conditions.
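The tuning procedure for CVA can be illustrated with a short numerical sketch. The text does not state how the overlap, depletion, and fringe components are combined, so the form used here (overlap capacitance in series with the depletion capacitance, plus an additive fitting term standing in for the fringe contribution) is an assumption, as are the function names and all numerical values.

```python
EPS0 = 8.854e-12  # vacuum permittivity (F/m)


def plate_capacitance(eps_r: float, area_m2: float, thickness_m: float) -> float:
    """Parallel-plate estimate of the geometric overlap capacitance."""
    return EPS0 * eps_r * area_m2 / thickness_m


def c_va(c_overlap: float, c_depletion: float, c_fit: float = 0.0) -> float:
    """Assumed form of CVA: series(overlap, depletion) + fitting term.

    c_fit is a hypothetical fitting parameter representing extra parasitic
    (e.g., fringe-field) capacitance; it starts at zero and is adjusted until
    the compact model reproduces the TCAD transient response.
    """
    series = c_overlap * c_depletion / (c_overlap + c_depletion)
    return series + c_fit


if __name__ == "__main__":
    # Hypothetical geometry and bias-dependent depletion value, for illustration only.
    c_ov = plate_capacitance(eps_r=3.9, area_m2=32e-9 * 1e-6, thickness_m=10e-9)
    c_dep = 0.5 * c_ov
    print(f"initial CVA = {c_va(c_ov, c_dep):.3e} F")
    print(f"tuned CVA   = {c_va(c_ov, c_dep, c_fit=0.2 * c_ov):.3e} F")
```

In a voltage-dependent implementation, `c_depletion` would be re-evaluated at each bias point, which is the role the depletion component plays in the Verilog-A model described above.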

FIG. 6.

FGFET compact model: (a) one-bit memory timing chart at VDL = 1 V and (b) one-bit memory timing chart at VDL = 0.05 V.

TCAM is a high-speed memory circuit used in applications that require very fast search. In a search operation, the “0,” “1,” and “Don’t care” states stored in the memory cells are compared in parallel with the search data, and the address of the matching entry is output. In this section, the operating characteristics of two types of FGFET-based TCAM circuits are analyzed and compared with those of a conventional FET-based TCAM circuit.
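The search behavior described above can be expressed as a purely functional model, independent of any circuit implementation. This is a minimal sketch: the function names and the string encoding (`"0"`, `"1"`, and `"x"` for “Don’t care”) are illustrative choices, and the loop only emulates what the hardware evaluates on all rows in parallel.

```python
from typing import List


def word_matches(stored: str, key: str) -> bool:
    """True if every stored symbol is 'x' (Don't care) or equals the key bit."""
    if len(stored) != len(key):
        raise ValueError("stored word and search key must have equal length")
    return all(s == "x" or s == k for s, k in zip(stored, key))


def search(table: List[str], key: str) -> List[int]:
    """Return the addresses (row indices) whose stored word matches the key."""
    return [addr for addr, word in enumerate(table) if word_matches(word, key)]


if __name__ == "__main__":
    table = ["10x1", "0000", "1xx1"]
    print(search(table, "1011"))
```

Each row of `table` corresponds to one match line; a row is reported only when none of its cells mismatches the search key.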

Figure 7(a) shows a schematic of the 2FET+2FGFET TCAM cell, which is composed of two conventional FETs and two FGFETs. As shown in Fig. 7(b), a conventional FET-based TCAM requires 16 conventional FETs to store one bit, whereas the 2FET+2FGFET TCAM requires only four transistors. The 2FET+2FGFET TCAM checks whether the data match by means of the FGFETs (FG1,2) and the FETs (T1,2), which form branches connected in parallel. Compared with the 2FGFET TCAM consisting only of FG1,2, the addition of T1,2 alleviates the ML voltage instability, which makes the 2FET+2FGFET TCAM a circuit scheme with excellent stability. In Fig. 7(a), FG1/T1 and FG2/T2 each form a pull-down path connecting the ML to ground (GND). Each pull-down path is shorted or opened depending on the data stored in FG1,2 and on whether T1,2 is on or off. In the case of a match, both pull-down paths (FG1/T1 and FG2/T2) are off, the connection between ML and GND is open, and ML stays high (=0.9 V). On the other hand, in the case of a mismatch, one of the pull-down paths is shorted to GND, and ML is discharged to low (=0 V). In the “Don’t care” state, ML is always maintained high regardless of the input data. Unlike Fig. 7(a), the FeFET-based TCAM, in which ferroelectric layers are integrated into the gate stack of the FETs, places the conventional FETs directly under the ML and the FeFETs below them.11 In that arrangement, a potential charge-sharing problem arises between ML and node A1,2, and T1 or T2 may be unintentionally turned on, discharging the ML.26 To solve this problem, in this work the FGFET is connected directly under the ML node, and the conventional FET driven by the Select Line (SL) or Select Line Bar (SLB) is connected below it, as shown in Fig. 7(a). Therefore, more stable operation is possible. Figures 7(c)–7(e) depict the operating characteristics of the 2FET+2FGFET TCAM circuit. To write “0,” 3 V is applied to OP to turn on the VFET of FG1,2, and 29 mV is applied to BL.
Then, −2 V is applied to OP to turn off the VFET and store the data, after which a search operation is performed by applying 0.5 V to OP. During the search, 0.9 V is applied to SL and SLB to determine whether the data match. Writing “1” is identical to writing “0” in the storage and search modes, except that 2 V is applied to BL during the write. As shown in Fig. 7(d), the ML node voltage drops only when the search is “0,” and no voltage drop occurs when the search is “1.” In the “Don’t care” state, 29 mV is applied to both BL and BLB, as shown in Fig. 7(e), and no voltage drop occurs on ML.
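The pull-down logic of one 2FET+2FGFET cell can be sketched at the switch level. The encoding below is an assumption chosen to reproduce the behavior stated in the text (a stored “1” discharges ML only for search “0,” and “Don’t care” never discharges ML): a stored “1” is taken to put FG1 in the conducting state, a stored “0” to put FG2 in the conducting state, and the search bit is taken to turn on T1 for “0” and T2 for “1.” Which of SL and SLB drives which transistor is not specified here, and all names are hypothetical.

```python
# Assumed stored-state encoding: (FG1 conducting, FG2 conducting).
ENCODE = {"1": (True, False), "0": (False, True), "x": (False, False)}


def cell_pulls_down(stored: str, search_bit: str) -> bool:
    """True if either series path (FG1/T1 or FG2/T2) connects ML to GND."""
    fg1, fg2 = ENCODE[stored]
    t1 = search_bit == "0"  # assumed: T1 is turned on when searching "0"
    t2 = search_bit == "1"  # assumed: T2 is turned on when searching "1"
    return (fg1 and t1) or (fg2 and t2)


def match_line_high(stored_word: str, key: str) -> bool:
    """ML stays high (0.9 V) only if no cell on the line pulls it down."""
    return not any(cell_pulls_down(s, k) for s, k in zip(stored_word, key))


if __name__ == "__main__":
    for stored in "01x":
        for bit in "01":
            state = "low" if cell_pulls_down(stored, bit) else "high"
            print(f"stored={stored} search={bit} -> ML {state}")
```

Because every cell on a match line is wired in parallel between ML and GND, a single conducting path is enough to report a mismatch for the whole word.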

FIG. 7.

(a) 2FET+2FGFET TCAM; (b) conventional 16FET TCAM schematic; (c) in 2FET+2FGFET TCAM, when write is “0,” search is 0 and 1; (d) in 2FET+2FGFET TCAM, when write is “1,” search is 0 and 1; (e) in 2FET+2FGFET TCAM, when write is “x,” search is 0 and 1.

Figure 8 compares the conventional TCAM composed of 16 baseline FETs with the 2FET+2FGFET-based TCAM as a function of array size in the case of one-mismatch. Figure 8(a) shows the search delay time, which is defined as the time difference between the first time V(pre) reaches half of its value and the first time V(out) reaches half of its value in Fig. 7(a). Compared with the conventional 16FET TCAM, the search delay time of the 2FET+2FGFET TCAM is 76.5% slower in the 1 × 1 array, but the difference decreases as the array size increases, and the 2FET+2FGFET TCAM is 2.3% faster in the 64 × 64 array. As the array size increases, the inverter serving as the sense amplifier (SA) slows down, and the buffer delay required to search for data increases. Because the 2FET+2FGFET TCAM is less affected by this buffer delay than the conventional 16FET TCAM, its search delay time becomes smaller in large arrays. As shown in Fig. 8(b), the search energy of the 2FET+2FGFET TCAM is 25.4% smaller in the 1 × 1 array and 35.9% smaller in the 64 × 64 array than that of the conventional 16FET TCAM. In the 1 × 1 array, the proportion of search energy consumed by the SA is large, but as the array size increases, the proportion of energy consumed by the SA decreases, and the improvement rate increases. In Fig. 8(c), the Energy-Delay Product (EDP) of the 2FET+2FGFET TCAM is 31.6% larger than that of the conventional 16FET TCAM in the 1 × 1 array but 37.3% smaller in the 64 × 64 array. For TCAM array sizes of 16 × 16 or more, the EDP of the 2FET+2FGFET TCAM is better than that of the conventional 16FET TCAM, and the larger the array, the greater the improvement. Figure 8(d) shows the power-delay product (PDP), which is the product of the search delay time and the search power, and it follows the same trend as the EDP.10,11
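Since the EDP is the product of search energy and search delay, the relative EDP change follows directly from the relative delay and energy changes quoted above. The sketch below reproduces that arithmetic; it assumes that “X% slower” means a delay X% larger than the conventional value and “X% faster” means a delay X% smaller, and the function name is illustrative.

```python
def relative_edp_change(delay_change_pct: float, energy_change_pct: float) -> float:
    """Relative EDP change (%) given relative delay and energy changes (%).

    Positive inputs mean larger than the reference, negative mean smaller.
    """
    ratio = (1 + delay_change_pct / 100) * (1 + energy_change_pct / 100)
    return (ratio - 1) * 100


if __name__ == "__main__":
    # 1 x 1 array: delay 76.5% larger, energy 25.4% smaller.
    small = relative_edp_change(+76.5, -25.4)
    # 64 x 64 array: delay 2.3% smaller, energy 35.9% smaller.
    large = relative_edp_change(-2.3, -35.9)
    print(f"1x1:   {small:+.1f}%")
    print(f"64x64: {large:+.1f}%")
```

Under these assumptions the products come out at roughly +31.7% and −37.4%, which agrees with the quoted 31.6% and 37.3% to within the rounding of the inputs.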

FIG. 8.

In the case of “one-mismatch,” the TCAM characteristic comparison results of (a) search delay time, (b) search energy, (c) energy-delay product (EDP), and (d) power-delay product (PDP) according to the array size of conventional 16FET TCAM and 2FET+2FGFET TCAM.

Figures 9(a)–9(d) compare the search delay time, search energy, EDP, and PDP of the 2FET+2FGFET TCAM and the conventional 16FET TCAM for all-mismatch, from the 1 × 1 array to the 64 × 64 array. Figure 9(a) shows the search delay time; the search delay time of the conventional 16FET TCAM remains the smaller of the two in the all-mismatch case. Figure 9(b) depicts the search energy of each TCAM. The 2FET+2FGFET TCAM consumes 25.4% less search energy than the conventional 16FET TCAM in the 1 × 1 array and 66.7% less in the 64 × 64 array. Because the energy per one-bit mismatch of the 2FET+2FGFET TCAM is smaller than its energy per one-bit match, the improvement rate is higher than in the one-mismatch case. Figure 9(c) shows the EDP of each TCAM. In the 1 × 1 array, the EDP of the 2FET+2FGFET-based TCAM is 31.6% worse than that of the conventional 16FET TCAM, but it improves as the array size increases and is 15.9% better in the 64 × 64 array. In all-mismatch, the 2FET+2FGFET TCAM consumes less energy than the conventional 16FET TCAM, but its search delay time is large, so the EDP improvement rate is relatively small compared with that of one-mismatch. Figure 9(d) shows the PDP, for which the difference from the conventional 16FET TCAM is similar to that of the EDP.

FIG. 9.

In the case of “all-mismatch,” the TCAM characteristic comparison results of (a) search delay time, (b) search energy, (c) EDP, and (d) PDP according to the array size of conventional 16FET TCAM and 2FET+2FGFET TCAM.

In this subsection, the characteristics of the FGFET applied to the 2FGFET TCAM are analyzed. A TCAM circuit composed of two FeFETs has been proposed previously.27 As depicted in Fig. 10(a), unlike the 2FET+2FGFET TCAM, the 2FGFET TCAM has no conventional FETs, so the FGFETs are directly connected to ML and GND, and OP/OPB plays the role of SL/SLB. Since the FGFET used in the 2FET+2FGFET TCAM cannot implement a NAND circuit, the 2FGFET TCAM was implemented by shifting the I–V curve of the FGFET to the right through gate work function engineering, as shown in Fig. 10(b).

FIG. 10.

(a) 2FGFET TCAM schematic and (b) I–V curve of SFET and I–V curve shift through gate work function engineering.

The 2FGFET TCAM has the smallest footprint, as it requires only two transistors to store one bit. The 2FET+2FGFET TCAM checks for a match or mismatch on ML by means of the SL and SLB signals applied to the transistors (T1,2), as shown in Fig. 7(a), whereas the 2FGFET TCAM checks the data stored in the FGFET by using the OP/OPB nodes. In terms of the circuit, only FG1 and FG2 shown in Fig. 10(a) form the pull-down paths, and each path is turned on or off according to the data stored in the FGFET. Therefore, unlike in the 2FET+2FGFET TCAM, different voltages are applied to the FGFET gates OP and OPB in the 2FGFET TCAM, as shown in the first panels of Figs. 11(a)–11(c). In the case of a match, since FGX is off in the pull-down path, the connection between ground and ML is open, and ML remains high (=0.9 V). On the other hand, in the case of a mismatch, one of the pull-down paths is shorted to GND, and ML is discharged to low (=0 V). In the “Don’t care” state, ML is always maintained high regardless of the input data. Figures 11(a)–11(c) show the operating characteristics. To write “0,” 3 V is applied to OP and 30 mV is applied to BL at the same time. Data “0” is then stored by applying −2 V to OP. After that, 0.75 V is applied to OP, complementary to OPB, to check the data on ML. As shown in the last panel of Fig. 11(a), no voltage drop occurs on ML when the search is “0,” but a voltage drop occurs when the search is “1.” Conversely, to write “1,” the same OP voltages as for write “0” are used for writing and storing, but 1.45 V is applied to BL during the write, and OP operates opposite to OPB during the search. As shown in Fig. 11(b), the ML voltage drops only when the search is “0,” and the high state is maintained when the search is “1.” In the “Don’t care” state, 30 mV is applied to both BL and BLB. Figure 11(c) shows the “Don’t care” timing diagram, in which no voltage drop occurs.

FIG. 11.

(a) When write is “0,” search 0 and 1; (b) when write is “1,” search 0 and 1; (c) when write is “x,” search 0 and 1 in 2FGFET TCAM. (d) Time difference according to the matching degree and (e) relationship between 1/τ and the number of mismatched bits.

In the 2FGFET TCAM, the ML discharge time differs according to the number of mismatches between the stored data and the search data. Therefore, the number of mismatched TCAM cells can be predicted from the time difference. Figure 11(d) shows the time difference according to the degree of mismatch on ML, with the data from 1000000 to 1111111 extracted from a 1 × 7 TCAM. It can be seen that the discharge speed increases as the number of mismatches increases, so the difference in the number of mismatched bits can be identified.16 Figure 11(e) expresses the discharge rate as a function of the number of mismatched bits in terms of 1/τ, where τ is the time constant; 1/τ increases as the number of mismatched bits increases. Because the search delay time varies with the number of mismatches, evaluating the conventional 16FET TCAM, 2FET+2FGFET TCAM, and 2FGFET TCAM with the one-mismatch case alone is limited. Therefore, in this work, the search delay time, search energy, EDP, and PDP were compared for both one-mismatch and all-mismatch.
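The dependence of 1/τ on the number of mismatched bits can be illustrated with a first-order RC picture of the match line. This is a simplified sketch and not the simulated behavior: it assumes that each mismatched cell contributes an identical, constant pull-down resistance `r_path` in parallel and that the ML is a fixed capacitance `c_ml`, so that 1/τ = n/(R·C). The component values and function names are hypothetical.

```python
import math


def inv_tau(n_mismatch: int, r_path: float, c_ml: float) -> float:
    """1/tau (1/s) for n identical pull-down paths discharging the ML in parallel."""
    return n_mismatch / (r_path * c_ml)


def ml_voltage(t: float, n_mismatch: int, r_path: float, c_ml: float,
               v0: float = 0.9) -> float:
    """ML voltage at time t for an exponential discharge from v0."""
    return v0 * math.exp(-t * inv_tau(n_mismatch, r_path, c_ml))


def half_time(n_mismatch: int, r_path: float, c_ml: float) -> float:
    """Time for the ML to fall to half of its initial value."""
    if n_mismatch == 0:
        return math.inf  # a full match never discharges the ML in this model
    return math.log(2) / inv_tau(n_mismatch, r_path, c_ml)


if __name__ == "__main__":
    R, C = 100e3, 1e-15  # hypothetical path resistance (ohm) and ML capacitance (F)
    for n in range(1, 8):  # 1 to 7 mismatched bits in a 1 x 7 word
        print(n, f"{inv_tau(n, R, C):.2e} 1/s", f"{half_time(n, R, C) * 1e12:.1f} ps")
```

In this idealized model 1/τ grows linearly with the number of mismatched bits; a real cell has a voltage-dependent on-current, so the simulated curve need not be exactly linear.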

Figure 12 compares the 2FGFET TCAM and the conventional 16FET TCAM according to the array size in the case of one-mismatch. As shown in Fig. 12(a), the search delay time of the conventional 16FET TCAM is 64.2% faster than that of the 2FGFET TCAM in the 1 × 1 array and 27.7% faster in the 64 × 64 array. Figure 12(b) shows the search energy. In the 1 × 1 array, the conventional 16FET TCAM consumes 48.8% less search energy than the 2FGFET TCAM, but in the 64 × 64 array, the 2FGFET TCAM consumes 33.0% less search energy than the conventional 16FET TCAM. Figures 12(c) and 12(d) compare the EDP and PDP of the TCAMs, respectively. It can be confirmed that the improvement rate of the 2FGFET TCAM increases as the array size increases. Although the 2FGFET TCAM has the disadvantage of a longer delay time, as described above, this improvement arises because its energy (power) saving is more dominant.

FIG. 12.

In the case of “one-mismatch,” the TCAM characteristic comparison results of (a) search delay time, (b) search energy, (c) EDP, and (d) PDP according to array size of conventional 16FET TCAM and 2FGFET TCAM.

Figure 13 compares the 2FGFET TCAM and the conventional 16FET TCAM according to the array size in the case of all-mismatch. As shown in Fig. 13(a), the search delay time improves as the array size increases because the number of active pull-down paths increases in the case of a mismatch. However, once the number of connections to ground exceeds a certain number, the search delay time saturates. The search delay time of the conventional 16FET TCAM is 64.2% of that of the 2FGFET TCAM in the 1 × 1 array and 62.3% of that of the 2FGFET TCAM in the 64 × 64 array. As shown in Fig. 13(b), the 2FGFET TCAM is more efficient in search energy than the conventional 16FET TCAM for 4 × 4 and larger arrays. The search energy of the conventional 16FET TCAM is 51.2% smaller than that of the 2FGFET TCAM in the 1 × 1 array, but as the array size increases, the search energy of the 2FGFET TCAM becomes 27.8% smaller than that of the conventional 16FET TCAM in the 64 × 64 array. Figures 13(c) and 13(d) compare the EDP and PDP of the TCAMs, respectively. The 2FGFET TCAM has 26.3% better EDP and PDP than the conventional 16FET TCAM in the 64 × 64 array.

FIG. 13.

In the case of “all-mismatch,” the TCAM characteristic comparison results of (a) search delay time, (b) search energy, (c) EDP, and (d) PDP according to the array size of conventional 16FET TCAM and 2FGFET TCAM.

Performance evaluation compares one-mismatch and all-mismatch cases. Figures 14 and 15 show the comparison results of conventional 16FET TCAM, 2FET+2FGFET, and 2FGFET in case of one-mismatch and all-mismatch in the 64 × 64 array.

FIG. 14.

In the case of “one-mismatch,” the TCAM characteristic comparison results of (a) search delay time, (b) search energy, (c) EDP, and (d) PDP according to conventional 16FET TCAM, 2FET+2FGFET, and 2FGFET TCAM in the 64 × 64 array.

FIG. 15.

In the case of “all-mismatch,” the TCAM characteristic comparison results of (a) search delay time, (b) search energy, (c) EDP, and (d) PDP according to conventional 16FET TCAM, 2FET+2FGFET, and 2FGFET TCAM in the 64 × 64 array.

Figure 14(a) shows the search delay time. The search delay time of the 2FET+2FGFET TCAM is 2.3% smaller than that of the conventional 16FET TCAM, whereas that of the 2FGFET TCAM is 38.2% larger. This is because, when the I–V curve of the FGFET used for the 2FET+2FGFET TCAM is compared with that of the FGFET used for the 2FGFET TCAM, the current of the latter is lower at the same voltage. Figure 14(b) shows the search energy. Both the 2FET+2FGFET TCAM and the 2FGFET TCAM have better search energy efficiency than the conventional 16FET TCAM: the search energy of the 2FET+2FGFET TCAM is 35.9% smaller, and that of the 2FGFET TCAM is 33.0% smaller. Figures 14(c) and 14(d) show the results for EDP and PDP. Since the 2FET+2FGFET TCAM outperforms the conventional 16FET TCAM in both delay and energy, its EDP and PDP are also smaller. On the other hand, the 2FGFET TCAM has a larger search delay time than the conventional 16FET TCAM, but both its EDP and PDP are smaller because of its excellent search energy (power) performance.
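The 38.2% figure here and the 27.7% figure quoted for the 64 × 64 one-mismatch case in the Fig. 12(a) discussion describe the same pair of delays from opposite reference points. The short sketch below shows the conversion; it assumes that “27.7% faster” means the conventional delay is 27.7% below the 2FGFET delay, and the function name is illustrative.

```python
def pct_larger_from_pct_faster(pct_faster: float) -> float:
    """If delay A is pct_faster % below delay B, return how much larger B is than A (%)."""
    return (1.0 / (1.0 - pct_faster / 100.0) - 1.0) * 100.0


if __name__ == "__main__":
    # Conventional 16FET delay taken as 27.7% below the 2FGFET delay (64 x 64, one-mismatch).
    print(f"{pct_larger_from_pct_faster(27.7):.1f}% larger")
```

Under this reading the 2FGFET delay comes out about 38.3% larger than the conventional one, which matches the quoted 38.2% to within rounding.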

Figure 15(a) shows that the conventional 16FET TCAM has a shorter search delay time than the FGFET TCAMs. The search energy in Fig. 15(b) shows the opposite trend: because the capacitance of the conventional 16FET TCAM is relatively larger than that of the FGFET TCAMs, the conventional 16FET TCAM consumes more energy.11,28 Figures 15(c) and 15(d) show the EDP and PDP, for which the 2FET+2FGFET TCAM and the 2FGFET TCAM improve on the conventional 16FET TCAM by 15.9% and 26.3%, respectively.

In the case of a mismatch, the ML voltage drops from 0.9 to 0 V. The ML of the 2FGFET TCAM discharges more gradually than that of the conventional 16FET TCAM, so the voltage is maintained for a longer time. Since the P-channel Metal–Oxide–Semiconductor (PMOS) current of the inverter is larger in the 2FGFET TCAM than in the conventional 16FET TCAM, the Match Line Sense Amplifier (MLSA) of the 2FGFET TCAM consumes more energy, and the 2FGFET TCAM consumes more total energy than the conventional 16FET TCAM. Figure 16 and Table II show, for the one-mismatch case, the energy of each TCAM and the share of each component in the total energy. The same tendency persists as the array size increases to 64 × 64. In the 64 × 64 array, however, both the 2FET+2FGFET TCAM and the 2FGFET TCAM perform better than the conventional 16FET TCAM. This is because (1) the 63 match comparisons of the conventional 16FET TCAM consume more energy than the single mismatch comparison of an FGFET TCAM and (2) the match comparisons account for only a small share of the energy of the FGFET TCAMs.

FIG. 16.

Energy consumption percentage of each device among the overall search energy in the 64 × 64 array.

TABLE II.

Energy consumption ratio of each device among the overall search energy in the 64 × 64 array. The red box means a mismatched TCAM cell.

| Search energy (J) | CMOS | 2FGFET+2FET | 2FGFET | 2FGFET+2FET, compared to CMOS (%) | 2FGFET, compared to CMOS (%) |
| --- | --- | --- | --- | --- | --- |
| MLSA m1 (MPRE) | 3.0 × 10⁻¹⁶ | 3.1 × 10⁻¹⁶ | 4.4 × 10⁻¹⁷ | 104 | 14.8 |
| MLSA m2 (inverter’s PMOS) | 1.8 × 10⁻¹⁴ | 4.9 × 10⁻¹⁴ | 9.8 × 10⁻¹⁴ | 264 | 529 |
| MLSA m3 (inverter’s NMOS) | 1.6 × 10⁻¹⁴ | 2.2 × 10⁻¹⁴ | 3.9 × 10⁻¹⁴ | 139 | 244 |
| MLSA total | 3.5 × 10⁻¹⁴ | 7.1 × 10⁻¹⁴ | 1.4 × 10⁻¹³ | 204 | 392 |
| Comparison (FG1 or X1+SRAM1) | 6.2 × 10⁻¹⁴ | 6.8 × 10⁻¹⁴ | 2.3 × 10⁻¹⁴ | 109 | 24.9 |
| Comparison (FG2 or X2+SRAM2) | 7.5 × 10⁻¹⁶ | 2.1 × 10⁻¹⁸ | 2.6 × 10⁻²¹ | 0.28 | 0.26 |
| Comparison T1 | 3.1 × 10⁻¹⁴ | 1.0 × 10⁻¹⁴ | ⋯ | 33.6 | ⋯ |
| Comparison T2 | 9.9 × 10⁻¹⁹ | 4.8 × 10⁻¹⁹ | ⋯ | 48.1 | ⋯ |
| Comparison total | 9.4 × 10⁻¹⁴ | 7.9 × 10⁻¹⁴ | 2.3 × 10⁻¹⁴ | 83.4 | 24.9 |
| Match comparison | 1.6 × 10⁻¹⁵ × 63 | 6.5 × 10⁻¹⁷ × 63 | 1.0 × 10⁻¹⁷ × 63 | 3.9 | 0.62 |
| Total TCAM | 1.5 × 10⁻¹¹ | 9.9 × 10⁻¹² | 1.0 × 10⁻¹¹ | 64.4 | 67.3 |
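The two reasons given for the 64 × 64 result can be checked directly against the Table II values. The sketch below is a bookkeeping exercise, not the authors' analysis flow: it assumes that the energy of one word (one match line) is the MLSA total plus the mismatch comparison total plus 63 match comparisons, and that the array total is 64 such words. All names are illustrative.

```python
# Search energies in joules, copied from Table II (one-mismatch case).
#   "mlsa"     = MLSA total
#   "mismatch" = Comparison total (the single mismatched cell)
#   "match"    = Match comparison, per matched cell (listed as "x 63")
TABLE_II = {
    "CMOS": {"mlsa": 3.5e-14, "mismatch": 9.4e-14, "match": 1.6e-15},
    # Column labeled 2FGFET+2FET in Table II.
    "2FET+2FGFET": {"mlsa": 7.1e-14, "mismatch": 7.9e-14, "match": 6.5e-17},
    "2FGFET": {"mlsa": 1.4e-13, "mismatch": 2.3e-14, "match": 1.0e-17},
}
MATCHED_CELLS = 63  # from Table II: 63 matched cells per word
WORDS = 64          # assumption: every word of the 64 x 64 array is searched


def matched_energy(design: str) -> float:
    """Energy of the 63 matched cells of one word."""
    return MATCHED_CELLS * TABLE_II[design]["match"]


def word_energy(design: str) -> float:
    """Assumed per-word energy: MLSA + mismatched cell + matched cells."""
    entry = TABLE_II[design]
    return entry["mlsa"] + entry["mismatch"] + matched_energy(design)


def matched_share(design: str) -> float:
    """Fraction of the per-word energy spent in the matched cells."""
    return matched_energy(design) / word_energy(design)


def array_energy(design: str) -> float:
    """Assumed array total: WORDS identical words."""
    return WORDS * word_energy(design)


if __name__ == "__main__":
    for design in TABLE_II:
        print(f"{design}: word {word_energy(design):.3e} J, "
              f"matched share {matched_share(design):.1%}, "
              f"array {array_energy(design):.3e} J")
```

With these inputs, the 63 matched cells of the CMOS design consume about 1.0 × 10⁻¹³ J, more than the mismatch comparison of either FGFET design, and they make up roughly 44% of the CMOS per-word energy against less than 3% for the FGFET designs. Multiplying the per-word sum by 64 also lands within a few percent of the "Total TCAM" row, which suggests the assumed decomposition is close to the one behind the table.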

TCAMs have been used in networking hardware such as routers and in other applications, e.g., database search and associative memory. Existing Artificial Intelligence (AI) systems use a Graphics Processing Unit (GPU) together with Dynamic Random Access Memory (DRAM): the DRAM stores the vectors, and the GPU performs the computation over all stored vectors. Recently, one-shot learning was performed in AI by constructing a TCAM circuit from non-volatile devices.16,29 The advantage of the TCAM circuit is that data movement can be eliminated because the distance is calculated in parallel within the memory. The conventional 16FET TCAM is volatile and is built from SRAM, so it requires a very large area and consumes a lot of energy. The FGFET, in contrast, is a non-volatile device that stores data in its memory node, which keeps the cell area small and makes it usable for neuromorphic computation.
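As a purely functional illustration of the in-memory distance calculation mentioned above, the sketch below models a TCAM search in software. It assumes that the distance is the Hamming distance (the number of mismatching cells) and that a stored "X" is a don't-care cell; the function names are hypothetical, and the loop over words stands in for what the array evaluates on all match lines at once.

```python
from typing import List


def mismatch_count(stored: str, key: str) -> int:
    """Count mismatching cells; a stored 'X' (don't care) matches any bit."""
    if len(stored) != len(key):
        raise ValueError("stored word and search key must have equal length")
    return sum(1 for s, k in zip(stored, key) if s != "X" and s != k)


def search(words: List[str], key: str) -> List[int]:
    """Mismatch count of every stored word against the search key.

    A TCAM array evaluates all words in parallel; this loop is only a
    sequential software stand-in for that behavior.
    """
    return [mismatch_count(word, key) for word in words]


def nearest(words: List[str], key: str) -> int:
    """Index of the stored word with the fewest mismatching cells."""
    counts = search(words, key)
    return counts.index(min(counts))


if __name__ == "__main__":
    stored_words = ["1010", "1X10", "0001"]
    print(search(stored_words, "1110"))   # per-word mismatch counts
    print(nearest(stored_words, "1110"))  # index of the closest word
```

In this model, a count of zero corresponds to a match and a count equal to the number of cells that are not don't-care corresponds to the all-mismatch case discussed earlier.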

In this work, FGFET devices applicable to LiM computing systems, which can overcome the limitations of existing von Neumann architecture-based computing systems, were introduced and analyzed by applying them to the TCAM, an LiM application circuit. The FGFET is a highly feasible structure because of its material and structural similarity to the existing silicon-based floating gate NAND Flash memory cell.

Two FGFET-based TCAM circuits (2FET+2FGFET and 2FGFET) were analyzed and benchmarked against a conventional SRAM-based CMOS TCAM in terms of speed and energy consumption. Both FGFET TCAMs showed better EDP characteristics than the conventional 16FET TCAM, and the improvement grew as the TCAM array size increased. The 2FET+2FGFET TCAM consumes more energy than the 2FGFET TCAM because of its two additional FETs; however, since its SL/SLB is provided separately, its operation stability is high. The 2FGFET TCAM, on the other hand, has the disadvantage that its search delay time is strongly affected by the number of mismatch bits because it has no baseline FET for SL/SLB, but it has advantages in area and energy consumption.

The authors are thankful to the IC Design Education Center (IDEC) for EDA tool support. This paper was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021M3F3A2A03017693), and partly by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020M3F3A2A01081595).

The authors have no conflicts to disclose.

S.C. and S.K. contributed equally to this work.

Sangki Cho: Conceptualization (lead); Data curation (lead); Formal analysis (lead); Investigation (lead); Writing – original draft (lead). Sueyeon Kim: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Writing – original draft (equal). Insoo Choi: Conceptualization (supporting); Data curation (supporting); Formal analysis (supporting). Myounggon Kang: Supervision (supporting). Seungjae Baik: Supervision (supporting). Jongwook Jeon: Supervision (lead).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1. A. Jaiswal et al., “8T SRAM cell as a multibit dot-product engine for beyond von Neumann computing,” IEEE Trans. Very Large Scale Integr. Syst. 27(11), 2556–2567 (2019).
2. X. Huang et al., “In-memory computing to break the memory wall,” Chin. Phys. B 29(7), 078504 (2020).
3. O. Mutlu et al., “Processing data where it makes sense: Enabling in-memory computation,” Microprocess. Microsyst. 67, 28–41 (2019).
4. H. S. Stone, “A logic-in-memory computer,” IEEE Trans. Comput. C-19(1), 73–78 (1970).
5. N. Talati, R. Ben-Hur, N. Wald, A. Haj-Ali, J. Reuben, and S. Kvatinsky, “mMPU – A real processing-in-memory architecture to combat the von Neumann bottleneck,” in Applications of Emerging Memory Technology: Beyond Storage, edited by M. Suri (Springer, 2020), pp. 191–213.
6. D. Ielmini and H.-S. P. Wong, “In-memory computing with resistive switching devices,” Nat. Electron. 1(6), 333–343 (2018).
7. X. Chen, X. Yin, M. Niemier, and X. S. Hu, “Design and optimization of FeFET-based crossbars for binary convolution neural networks,” in 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) (IEEE, 2018), pp. 1205–1210.
8. Y. Zhang, L. Xu, K. Yang, Q. Dong, S. Jeloka, D. Blaauw, and D. Sylvester, “Recryptor: A reconfigurable in-memory cryptographic cortex-M0 processor for IoT,” in 2017 Symposium on VLSI Circuits (IEEE, 2017), pp. C264–C265.
9. I. O’Connor, M. Cantan, C. Marchand, B. Vilquin, S. Slesazeck, E. T. Breyer, H. Mulaosmanovic, T. Mikolajick, B. Giraud, J. Noël, A. Ionescu, and I. Stolichnov, “Prospects for energy-efficient edge computing with integrated HfO2-based ferroelectric devices,” in 2018 IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC) (IEEE, 2018), pp. 180–183.
10. X. Yin, M. Niemier, and X. S. Hu, “Design and benchmarking of ferroelectric FET based TCAM,” in Design, Automation and Test in Europe Conference and Exhibition (DATE), 2017 (IEEE, 2017), pp. 1444–1449.
11. X. Yin et al., “Ferroelectric FETs-based nonvolatile logic-in-memory circuits,” IEEE Trans. Very Large Scale Integr. Syst. 27(1), 159–172 (2018).
12. J. Li et al., “1 Mb 0.41 µm² 2T-2R cell nonvolatile TCAM with two-bit encoding and clocked self-referenced sensing,” IEEE J. Solid-State Circuits 49(4), 896–907 (2013).
13. B. Song et al., “A 10T-4MTJ nonvolatile ternary CAM cell for reliable search operation and a compact area,” IEEE Trans. Circuits Syst. II: Express Briefs 64(6), 700–704 (2016).
14. C.-C. Lin et al., “7.4 A 256b-wordlength ReRAM-based TCAM with 1 ns search-time and 14× improvement in wordlength-energyefficiency-density product using 2.5 T1R cell,” in 2016 IEEE International Solid-State Circuits Conference (ISSCC) (IEEE, 2016), pp. 136–137.
15. R. Karam et al., “Emerging trends in design and applications of memory-based computing and content-addressable memories,” Proc. IEEE 103(8), 1311–1330 (2015).
16. K. Ni et al., “Ferroelectric ternary content-addressable memory for one-shot learning,” Nat. Electron. 2(11), 521–529 (2019).
17. H. Mizuta et al., “The role of tunnel barriers in phase-state low electron-number drive transistors (PLEDTRs),” IEEE Trans. Electron Devices 48, 1103–1108 (2001).
18. K.-D. Kim et al., “Characterization of multi-barrier tunneling diodes and vertical transistors using 2-D device simulation,” in International Conference on Simulation of Semiconductor Processes and Devices (IEEE, 2002), pp. 167–170.
19. S. J. Ahn et al., “Highly scalable and CMOS-compatible STTM cell technology,” in IEEE International Electron Devices Meeting 2003 (IEEE, 2003), pp. 10.4.1–10.4.4.
20. S. J. Baik et al., “STTM-promising nanoelectronic DRAM device,” in 4th IEEE Conference on Nanotechnology, 2004 (IEEE, 2004), pp. 45–46.
21. K. Nakazato et al., “Phase-state low electron-number drive random access memory (PLEDM),” in 2000 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No. 00CH37056) (IEEE, 2000), pp. 132–133.
22. S. Kang et al., “Scalable two transistor memory (STTM) for mobile embedded applications with 80 nm technology,” in International SoC Design Conference (ISOCC), Seoul, Korea, October 2004, pp. 159–162.
23. W. Zhao and Y. Cao, “Predictive technology model for nano-CMOS design exploration,” ACM J. Emerging Technol. Comput. Syst. 3(1), 1 (2007).
24. ITRS 2005, The International Technology Roadmap for Semiconductors.
25. Sentaurus Device User Guide, Version L-2016.03, Synopsys TCAD Sentaurus, San Jose, CA, USA, 2016.
26. K. Pagiamtzis and A. Sheikholeslami, “Content-addressable memory (CAM) circuits and architectures: A tutorial and survey,” IEEE J. Solid-State Circuits 41(3), 712–727 (2006).
27. X. Yin et al., “An ultra-dense 2FeFET TCAM design based on a multi-domain FeFET model,” IEEE Trans. Circuits Syst. II: Express Briefs 66(9), 1577–1581 (2018).
28. X. Yin et al., “Ferroelectric ternary content addressable memories for energy efficient associative search,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 42, 1099 (2022).
29. P. Huang, R. Han, and J. Kang, “AI learns how to learn with TCAMs,” Nat. Electron. 2(11), 493–494 (2019).