Traditional physical-based models have generally been used to model the resistive-switching behavior of resistive-switching memory (RSM). Recently, vacancy-based conduction-filament (CF) growth models have been used to model device characteristics of a wide range of RSM devices. However, few have focused on learning the other-device-parameter values (e.g., low-resistance state, high-resistance state, set voltage, and reset voltage) to compute the compliance-current (CC) value that controls the size of CF, which can influence the behavior of RSM devices. Additionally, traditional CF growth models are typically physical-based models, which can show accuracy limitations. Machine learning holds the promise of modeling vacancy-based CF growth by learning other-device-parameter values to compute the CC value with excellent accuracy via examples, bypassing the need to solve traditional physical-based equations. Here, we sidestep the accuracy issues by directly learning the relationship between other-device-parameter values to compute the CC values via a data-driven approach with high accuracy for test devices and various device types using machine learning. We perform the first modeling with machine-learned device parameters on aluminum-nitride-based RSM devices and are able to compute the CC values for nitrogen-vacancy-based CF growth using only a few RSM device parameters. This model may now allow the computation of accurate RSM device parameters for realistic device modeling.
I. INTRODUCTION
Resistive-switching memory (RSM) is an emerging candidate for next-generation memory and computing devices, such as storage-class memory devices, multilevel memories, and synapses in neuromorphic computing. RSM devices have gained significant attention due to their simple metal–insulator–metal (MIM) structure, which comprises a dielectric material sandwiched between the top electrode (TE) and bottom electrode (BE).1 RSM operation is based on resistive switching: the set process switches the device from a high resistance state (HRS) to a low resistance state (LRS), and the reset process switches it from an LRS to an HRS.2 These processes depend on the amplitude and polarity of the applied voltage in the unipolar and bipolar-switching modes, respectively.2 Applying a voltage between the TE and BE leads to the formation of a conduction filament (CF) in the set process and rupture of the CF in the reset process.3 The formation of a CF in RSM devices is typically based on the electrochemical metallization (ECM) effect,4 valence-change memory (VCM) effect,5 thermochemical memory (TCM) effect,6 and other effects. Aluminum-nitride-based RSM devices exhibit bipolar switching; in these devices, the growth of the CF is governed by the VCM effect, and the CF is formed by nitrogen vacancies.7
Traditional physical-based models of CF growth are generally used to model filament-type RSM devices. Various phenomena and different approaches have been used to develop these models.8–12 For example, a flux-charge model was developed that treats a memristor as equivalent to an inductor or a capacitor.13 This model used the cumulative device current to provide a time-based measure of the electrical conductance of a memristor. Conventional physical-based models usually assumed a linear relationship among the variables of a memristor. In particular, the ion-drift model with linear correlation developed by Hewlett–Packard (HP) laid the foundation for modeling RSM devices. Notably, the model compares the hysteresis in the measured switching curves with that in the calculated curves.14 Additionally, it includes a linear relationship between the applied voltage (V) and current (I) to define the electrical conductance of RSM devices. On the other hand, non-linearities were taken into account in the ion-drift model with exponential correlation, which considers the switching speed, volatility, and drift velocity variations for different applied voltages. However, this model may be limited to ionic crystals in which the major interactive forces are Coulomb repulsion. Application-specific models, such as the Simmons tunneling barrier model15 and the models by Yakopcic and co-workers16 and Guan et al.,17 were also developed to describe the switching behavior of neuromorphic systems. These models were also simulation-program-with-integrated-circuit-emphasis (SPICE) compatible. Recently, a bipolar model proposed by Bocquet et al. used the radius of the CF to describe the switching rate and the formation and rupture of the CF.18,19 However, this model may assume a CF with a cylindrical shape and exclude variations of the temperature along the CF. The bipolar model proposed by González-Cordero and co-workers includes the distribution of temperature, but it may be based on a tapered-cone-shaped CF.20
Early results on compact/simplified physical-based models related to CF growth tend to show a current-based and highly accurate current–voltage (I–V) calculation, generally without solving complex physics equations.21 These models can offer a simple guide to an experimentalist by indicating which current-based parameters are expected to result in excellent device performance and which are expected to lead to poorer performance. However, few have focused on computing the compliance-current (CC) value, which generally controls the size or growth of the CF and thereby influences the device performance. This gap is partially due to the focus on only certain aspects of device operation and to the generally implicit form of conventional physics-based models. Additionally, traditional physical-based models rely on various assumptions regarding the shape, growth direction, and temperature profile of the CF to emulate the switching behavior, and these assumptions could make the models less reliable and less accurate. To avoid such assumptions while maintaining the accuracy level of existing models, we propose to study the switching behavior by using machine learning. In addition, the CC value can be influenced by other general device parameters, such as HRS, LRS, Vset, and Vreset.22,23 Moreover, the mean-squared error (MSE) value for fitting an I–V curve tends to be ∼0.0142, which could be insufficient for large-sample-size modeling,21 considering recent advances in memristor-memory technology and the increased focus on the potential of memristor memory for various applications.
It might be possible to generally use machine-learning (ML) algorithms, e.g., neural-network-based multi-layer-perceptron (MLP) algorithm, and bootstrap-aggregating-based random-forest (RF) algorithm, to learn other-device-parameters to compute the CC value and develop an accurate CF-growth model (Fig. 1).
In this report, we present an alternative ML approach in which we avoid the need to solve traditional physical-based equations by directly learning the relationship between other-device-parameter values to compute the CC values, i.e., we establish a CC-and-other-device-parameters (CAP) model [Fig. 1(a)]. This model can allow an experimentalist to calculate/predict the correct CC magnitude reliably, which is usually not known to the experimentalist before a new batch of device testing, to achieve a specified device performance in terms of other-device parameters, e.g., HRS, LRS, Vset, and Vreset, rather than the traditional error-vulnerable trial-and-error method. Additionally, the model can enable an experimentalist to avoid filament overgrowth for preventing permanent hard breakdown during initial device testing. The proposed ML approach can model the compliance current more accurately without knowing the shape of the conduction filament and/or complex behavior, such as ion migration and thermal diffusion. One of the significant advantages of the machine learning model is that it is able to learn the other-device-parameter value relationship via a data-driven approach and with excellent accuracy. We show that the other-device-parameter value relationship can be learned via a data-driven approach and with excellent accuracy that exceeds previous current-related compact-based models to describe the nitrogen-vacancy-based CF growth. Furthermore, we show that it could be applied to real-device sampling by computing the CC value of test devices and various device types with just a few device parameters. Moreover, as we have already implemented the approach with common aluminum-nitride-based RSM devices, this could now be performed on a much wider range of RSM devices.
II. RESULTS
Training of a CAP model is shown in Fig. 1(c). Aluminum-nitride-based RSM cells were fabricated and characterized using the standard semiconductor process. Silicon substrate was used as the starting material on which a 45 nm-thick titanium BE was deposited. A 12 nm-thick aluminum nitride was deposited to form the switching layer, and a 50 nm-thick iridium TE was then deposited. Finally, a 300 nm-thick aluminum capping layer was deposited to complete the cell structure (see the supplementary material for details). The fabricated cells were sampled using a standard electrical-characterization system (Keithley 4200 SCS) at a sweep rate of 0.6 V/s to obtain the Vset and Vreset values. Data of the cells were prepared using the Pandas software (supplementary material). Each CAP model was implemented by using around 3100 data points from 50 aluminum-nitride-based RSM cells (supplementary material). Moreover, variations of the set and reset voltages were low (supplementary material, Fig. S3). In addition, we chose to investigate the set voltage, reset voltage, low-resistance state, high-resistance state, and corresponding compliance current at standard room temperature to describe the switching behavior.
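As an illustration of how such cell data might be organized before training, the following minimal sketch stores one record per measured switching cycle and splits the records into training, validation, and test sets. It is not the Pandas-based procedure used in this work; the `CellSample` field names, the `split_samples` helper, and the 70/15/15 split fractions are all hypothetical placeholders.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CellSample:
    """One measured switching cycle of an RSM cell (field names are hypothetical)."""
    v_set: float    # set voltage
    v_reset: float  # reset voltage
    lrs: float      # low-resistance state
    hrs: float      # high-resistance state
    cc: float       # compliance current (the quantity to be computed)


def split_samples(samples, frac_train=0.70, frac_val=0.15, seed=0):
    """Shuffle the records and split them into (training, validation, test) lists.

    The default fractions are placeholders, not values taken from this work.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = round(len(shuffled) * frac_train)
    n_val = round(len(shuffled) * frac_val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Keeping the test set separate from the data used for tuning is what allows the MSE on test data to serve as an unbiased check of the trained model.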
Parameters of the CAP models, which were defined as hyperparameters, allow tuning of the characteristics of a ML algorithm, which in turn controls the performance of a training process.24 Here, we utilized the MLP algorithm and initiated hyperparameters of a MLP algorithm before the start of a training process using values found within the literature25–28 (see also the supplementary material). Figure 2(a) shows the architecture of a MLP model. Respective hyperparameters used for the MLP algorithm were (1) epoch, (2) learning rate (α), (3) momentum (β), and (4) number of neurons in each hidden layer. These parameters were tuned for achieving a high training performance. During the training process, hidden layers of a MLP model were used to process the weights associated with each layer. These weights were then adjusted continuously to find patterns from the training data to learn the relationship between other-device-parameter values to compute the CC values. The random-forest algorithm was also utilized to learn the mapping between other-device-parameter values (see the supplementary material). The training data and several decision trees were used to compute the CC value.29 Figure 2(b) shows the architecture of a RF model. Respective hyperparameters, e.g., minimum number of samples split in node, minimum number of samples in leaf, number of trees in forest, and maximum depth of tree, were used. An optimized RF model can be achieved by tuning these hyperparameters using a parameter-sweep method.30 CAP models trained with the MLP and RF algorithms were named the MLP-CO and RF-CO models, respectively.25–28
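To make the roles of the learning rate (α) and momentum (β) concrete, the sketch below shows one common form of the gradient-descent weight update with momentum. The exact optimizer used for the MLP-CO model is not specified in the text, so this classical-momentum form and the function name `sgd_momentum_step` are assumptions for illustration only.

```python
def sgd_momentum_step(weights, grads, velocity, alpha, beta):
    """One weight update with classical momentum (an assumed optimizer form).

    alpha: learning rate, scales the current gradient.
    beta:  momentum, the fraction of the previous update that is carried over.
    """
    new_velocity = [beta * v - alpha * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Usage on a toy loss L(w) = w**2, whose gradient is 2*w.
w, v = [1.0], [0.0]
for _ in range(300):
    w, v = sgd_momentum_step(w, [2.0 * wi for wi in w], v, alpha=0.1, beta=0.9)
```

A larger α takes bigger steps per epoch, while β close to 1 smooths the trajectory by averaging over past gradients; both therefore affect how many epochs are needed before the training MSE saturates.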
Tuning of the hyperparameters of CAP models is used to achieve a high training performance, that is, a low variance/bias direction toward training data. Variance is typically defined as an error that occurs due to the sensitivity of a ML model toward small fluctuations in training data. Variance of a ML model can be described via the MSE,
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{CC}_{\mathrm{targeted},i} - \mathrm{CC}_{\mathrm{computed},i}\right)^{2},$$
where n is the number of samples in the data set being evaluated (training, validation, or test data). Computed CC values are the values generated by CAP models, while targeted CC values are the values obtained from the experiments of RSM cells. Generally, variance of a MLP-CO model can be denoted by the difference between the MSE value for validation data (MSEV) and the MSE value for training data (MSETRN). Variance is considered high if the MSEV value is higher than the MSETRN value (commonly referred to as overfitting) and low if the MSEV value is approximately equal to the MSETRN value. Similarly, bias of a MLP-CO model is defined by the same difference between the MSEV and MSETRN values. However, bias is regarded as high if the MSEV value is lower than the MSETRN value (commonly referred to as underfitting) and low if the two values are almost the same. A MLP-CO model that exhibits a low variance/bias direction toward training data is desired, since such a model is likely to establish a good relationship between other-device-parameter values and to compute CC values with excellent accuracy using test data of RSM cells. Note that the RF-CO model uses the bootstrap-aggregating procedure to form a relationship between other-device-parameter values. Therefore, a high variance direction toward training data is usually absent in a RF-CO model. Tuning of the hyperparameters is then an important approach to develop a RF-CO model that can compute CC values using test data of RSM cells.
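The MSE and the variance/bias criterion described above can be sketched in a few lines. The function names and the relative tolerance `rel_tol` used to decide when two MSE values are "approximately equal" are assumptions; the text does not give a numerical threshold.

```python
def mse(targeted, computed):
    """Mean-squared error between targeted (measured) and computed CC values."""
    if len(targeted) != len(computed) or not targeted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - c) ** 2 for t, c in zip(targeted, computed)) / len(targeted)


def fit_direction(mse_trn, mse_v, rel_tol=0.15):
    """Classify a model from its training and validation MSE values.

    Follows the convention in the text: MSEV well above MSETRN is high
    variance, MSEV well below MSETRN is high bias. rel_tol is an
    illustrative threshold, not a value from this work.
    """
    gap = mse_v - mse_trn
    if abs(gap) <= rel_tol * max(mse_trn, mse_v):
        return "low variance/bias"
    return "high variance (overfitting)" if gap > 0 else "high bias (underfitting)"
```

For example, `fit_direction(0.0037, 0.0033)` returns `"low variance/bias"` under this assumed tolerance, since the two values differ by only about 11% of the larger one.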
Now, we discuss the training performance of CAP models. The MSETEST values of the MLP-CO and RF-CO models are shown in Fig. 2(c). The lowest MSETEST value was achieved by using specific hyperparameters to avoid a high variance/bias direction toward training data. The MLP-CO model showed high bias and variance directions toward training data when 100 and 300 epochs were used, respectively (supplementary material, Fig. S6 and Table SI). These observations indicate that a low variance/bias direction toward training data could be obtained if the number of epochs used is between 100 and 300. To achieve a low variance/bias direction toward training data, we chose the number of epochs to be 200. Additionally, the MLP-CO model performance could be improved by the use of another hyperparameter, i.e., patience, which defines the upper bound on the number of epochs used. If the patience value is defined as n, the process would stop if the MSETRN value does not decrease consecutively for n epochs. We chose the patience value to be 20 for the MLP-CO model with 200 epochs. The MLP-CO model with 200 epochs and 20 patience-type epochs showed a MSETRN value of 0.0037, similar to that of the default MLP-CO model with only 300 epochs (0.0038) (we call MLP-CO1) (supplementary material, Table SI). These findings indicate that the MLP-CO model with 200 epochs and 20 patience-type epochs is not able to account for the saturation of the MSETRN value and a reduced patience value is needed. Therefore, we chose a reduced patience value of 10 for the MLP-CO model with 200 epochs. The process stopped at around 175 epochs. Notably, the MLP-CO model with 200 epochs (MLP-CO2) and 10 patience-type epochs showed low MSEV, MSETRN, and MSETEST values of 0.0033, 0.0037, and 0.0019, respectively [Fig. 1(b) and supplementary material, Table SI]. The MSEV value of the MLP-CO2 model (0.0033) is lower compared to that of the MLP-CO1 model (0.0040). 
Even though the MSETRN value of the MLP-CO model with 200 epochs and 10 patience-type epochs (0.0037) is similar to that of the MLP-CO model with 200 epochs and 20 patience-type epochs (0.0037), the MSETEST value of the MLP-CO model with 200 epochs and 10 patience-type epochs (0.0019) is still lower compared to the MSETEST value of the MLP-CO model with 200 epochs and 20 patience-type epochs (0.0023) (supplementary material, Table SI). Thus, the MLP-CO2 model might be considered as optimized. Furthermore, the MLP-CO2 model shows a low variance/bias direction toward training data since the MSEV value (0.0033) is approximately equal to the MSETRN value (0.0037), indicating that the MLP-CO model can learn the other-device-parameter value relationship well.
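The patience rule can be expressed as a short routine that scans the per-epoch training MSE and reports the epoch at which training would stop. This is a minimal sketch of the stopping criterion as described in the text; the function name and the toy MSE curve in the usage example are hypothetical and do not reproduce the measured training curves.

```python
def early_stopping_epoch(mse_per_epoch, patience, max_epochs):
    """Return the 1-based epoch at which training stops.

    Training stops once the MSE has failed to decrease for `patience`
    consecutive epochs, or when `max_epochs` is reached.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, value in enumerate(mse_per_epoch[:max_epochs], start=1):
        if value < best:
            best = value
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(mse_per_epoch), max_epochs)


# Toy curve: eight improving epochs followed by a plateau (synthetic values).
toy_curve = [0.010, 0.009, 0.008, 0.007, 0.006, 0.005, 0.004, 0.003] + [0.003] * 50
stop_10 = early_stopping_epoch(toy_curve, patience=10, max_epochs=200)
stop_20 = early_stopping_epoch(toy_curve, patience=20, max_epochs=200)
```

On the same curve, a smaller patience value stops training sooner after the MSE saturates, which is the behavior exploited when the patience value is reduced from 20 to 10.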
To train the RF-CO model, a specified range of hyperparameter values was initialized using values found in the literature.30 Figure 2(c) and supplementary material, Table SII, show the specified range of hyperparameter values used and the MSETEST values obtained for a RF-CO model. The MSETEST value of the RF-CO model with default hyperparameter values (which we call RF-CO1), where the number of trees in forest, minimum number of samples split in node, minimum number of samples in leaf, and maximum depth of tree were chosen to be 10, 2, 1, and non-specified, respectively, turned out to be 0.0018. Even though the MSETEST value of the RF-CO1 model is small, a model without optimized hyperparameters could exhibit a high variance/bias direction toward training data. To achieve a low variance/bias direction toward training data, the number of trees in forest of the RF-CO model was parameter swept. Interestingly, the RF-CO model showed a reduced MSETEST value (from ∼0.0018 to 0.00155) with an increase in the number of trees in forest (from 10 to 100) [supplementary material, Fig. S8(a)]. Additionally, the MSETEST value decreased (from ∼0.0039 to 0.0014) and subsequently became saturated (∼0.0015) with an increase in the maximum depth of tree (from 2 to 24) [Fig. 1(b)]. Moreover, the MSETEST value remained constant (∼0.0014) as the minimum number of samples in leaf was varied [supplementary material, Fig. S8(b)]. The lowest MSETEST value (0.0014) of the RF-CO model (RF-CO2) was obtained when the number of trees in forest, minimum number of samples split in node, minimum number of samples in leaf, and maximum depth of tree were 20, 2, 1, and 8, respectively (supplementary material, Table SII). As the RF-CO2 model did not exhibit an improved MSETEST value with further tuning of the hyperparameters, the model could be considered as optimized. 
This indicates that a low variance/bias direction toward training data could be achieved. The MSETEST value of the optimized RF-CO2 model (0.0014) was lower than that of the default RF-CO1 model (0.0018). Additionally, the MSETEST value of the RF-CO2 model (0.0014) tends to be lower than that of the MLP-CO2 model (0.0019) [Fig. 2(c)]. Furthermore, the MSETEST value of the RF-CO2 model (0.0014) tends to be much lower compared to a compact physical-based modeling MSE baseline of 0.0142.21
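A parameter sweep of this kind can be sketched as a loop that varies one hyperparameter at a time while keeping the best values found so far for the others. The sketch below is an assumption about the general procedure rather than the code used here: the `evaluate` callable stands in for "train a RF-CO model and return its MSETEST value", the dictionary keys are hypothetical names, and `toy_mse_test` is a purely synthetic surrogate that does not represent measured results.

```python
def sweep_one_at_a_time(evaluate, defaults, grid):
    """Sweep each hyperparameter in turn and keep any value that lowers the score.

    evaluate: callable mapping a hyperparameter dict to an error (lower is better).
    defaults: starting hyperparameter values.
    grid:     dict mapping a hyperparameter name to the list of values to try.
    """
    best = dict(defaults)
    best_score = evaluate(best)
    for name, values in grid.items():
        for value in values:
            trial = dict(best)
            trial[name] = value
            score = evaluate(trial)
            if score < best_score:
                best, best_score = trial, score
    return best, best_score


def toy_mse_test(params):
    """Synthetic stand-in for training and testing a model (not real data)."""
    depth = 24 if params["max_depth"] is None else params["max_depth"]
    return 1e-6 * (params["n_trees"] - 20) ** 2 + 1e-5 * (depth - 8) ** 2


defaults = {"n_trees": 10, "min_samples_split": 2,
            "min_samples_leaf": 1, "max_depth": None}
grid = {"n_trees": [10, 20, 50, 100], "max_depth": [2, 4, 8, 16, 24]}
best_params, best_score = sweep_one_at_a_time(toy_mse_test, defaults, grid)
```

A one-at-a-time sweep needs far fewer training runs than a full grid over all combinations, at the cost of possibly missing interactions between hyperparameters.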
Figures 2(d)–2(g) show the error histograms of CAP models. The error is defined as the difference between the targeted and computed CC values. Overall, the standard deviation of the error histogram is minimized, indicating that the models are able to compute CC values for a given set of test data. Moreover, the optimized MLP-CO2 model showed a smaller standard deviation value (∼0.043) compared to that of the default MLP-CO1 model (∼0.049). The optimized RF-CO2 model also exhibited a lower standard deviation value (∼0.038) than that of the default RF-CO1 model (∼0.043). Additionally, the standard deviation value of the RF-CO2 model (0.038) is also lower compared to that of the MLP-CO2 model (0.043), which indicates that the RF-CO2 model is able to compute CC values with excellent accuracy. In addition, the distribution of error of the RF-CO2 model is of Gaussian-type. This indicates that the RF-CO2 model can establish a good relationship between other-device-parameter values.
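The quantities summarized by such an error histogram can be computed as follows. This is a minimal sketch under stated assumptions: the error is taken as targeted minus computed, the population standard deviation is used (the text does not say whether the population or sample form was applied), and the equal-width binning and the function name are illustrative choices.

```python
import statistics


def error_summary(targeted, computed, n_bins=10):
    """Return (mean error, standard deviation, histogram counts).

    The error is targeted minus computed CC; bins are equal-width
    between the smallest and largest error.
    """
    errors = [t - c for t, c in zip(targeted, computed)]
    mean = statistics.mean(errors)
    std = statistics.pstdev(errors)  # population form, an assumption
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all errors are equal
    counts = [0] * n_bins
    for e in errors:
        # Clamp so that the largest error falls in the last bin.
        idx = min(int((e - lo) / width), n_bins - 1)
        counts[idx] += 1
    return mean, std, counts
```

A mean close to zero together with a small standard deviation corresponds to the narrow, centered histogram that indicates accurate CC computation.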
To test the versatility and verify the results of the CAP model used for different distributions of test data, we used the CAP model to compute CC values of a new set of aluminum-nitride-based RSM cells. Values of other-device-parameters, i.e., LRS, HRS, Vset, and Vreset, were experimentally collected from a new set of RSM cells. In the test data, other-device-parameter values of both new and original sets of RSM cells were used. This mixture of data can produce a different distribution of test data for ML. Figure 3(a) shows the cumulative probability function of the CC value for the MLP-CO2 and RF-CO2 models. Interestingly, the RF-CO2 model shows a reduced difference between the computed and experimental CC values of a new set of RSM cells compared to that of the MLP-CO2 model, indicating that the RF-CO2 model is able to accurately compute the CC values. Note that the CC values of both the MLP-CO2 and RF-CO2 models might not be in excellent agreement with those of experimental data. It is possible that the CAP model may need various types of data for different test data distributions. For instance, data obtained from different batches and regions of wafer could be used to retrain a CAP model used for different distributions of test data. Figures 3(b) and 3(c) show the cumulative probability function of the CC value of the CAP model for various types of RSM cells.31,32 New test data of other-device-parameter values of the SiOx and TiO2-based cells available in the literature were used. These data were included in the original test data of AlN-based cells. Notably, the RF-CO2 model performed better than the MLP-CO2 model for both the SiOx and TiO2-based cells. Moreover, the computed CC values of CAP models for the SiOx and TiO2-based cells fitted the experimental data well.
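A cumulative probability curve of the kind plotted for the CC values can be built directly from a list of values, as sketched below. The second function, which reports the largest vertical distance between two such curves, is an added illustrative metric for comparing computed and experimental distributions; it is an assumption and not a quantity reported in this work. Both function names are hypothetical.

```python
from bisect import bisect_right


def cumulative_probability(values):
    """Return sorted (value, cumulative probability) pairs for an empirical CDF."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def max_cdf_gap(measured, computed):
    """Largest vertical distance between two empirical CDFs (illustrative metric)."""
    a, b = sorted(measured), sorted(computed)

    def cdf(sorted_vals, x):
        # Fraction of values less than or equal to x.
        return bisect_right(sorted_vals, x) / len(sorted_vals)

    return max(abs(cdf(a, x) - cdf(b, x)) for x in a + b)
```

A gap of 0 means the two distributions coincide, and a gap of 1 means they do not overlap at all, so a smaller gap would correspond to closer agreement between computed and experimental CC values.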
III. DISCUSSION
Resistive-switching memory is a promising candidate for next-generation edge computing. However, the behavior of RSM devices is still not well understood. We explore here an alternative ML approach in which we avoid the need to solve traditional physical-based equations by directly learning the relationship between other-device-parameter values to compute the CC values, i.e., we establish a CC-and-other-device-parameter (CAP) model. The absolute values of the CC were obtained from the experiments directly. Traditional physics-based models need to know the shape of the conduction filament and/or complex behaviors, such as ion migration and thermal diffusion, to model the current-based values. On the other hand, our ML approach can model the compliance current well without knowing the shape of the conduction filament or these behaviors. Moreover, we demonstrate that the device-parameter relationship can be learned via a data-driven approach and with excellent accuracy that exceeds previous current-related compact-based models. Additionally, we have shown that it can be harnessed for real-device sampling by computing the CC values of test devices and various device types with just a few device parameters. Besides predicting the CC value, the ML method could also be used for other parameter prediction and design. Furthermore, it is possible that this method could be used for a wider range of RSM devices as well as large-scale circuits without the need to use traditional complex physical-based analysis. This would be very useful for large-scale RSM-based circuit design and implementation.
Overall, the model properties of CAP models can be modulated. For example, the RF-CO2 model can compute CC values with a higher accuracy (lower MSETEST value) for a given set of other-device-parameter values than the MLP-CO2 model. Moreover, the RF-CO2 model can show a lower MSETEST value (0.0014) compared to that of the MLP-CO2 model (0.0019). This finding could be attributed to the tabular nature of data, whereby other-device-parameter values are arranged in rows and columns, where the relationship between other-device-parameter values is known. It is possible that this could allow the RF algorithm to map the relationship between other-device-parameter values well. Even though the tabular nature of data may also benefit a MLP-CO model, the amount of data and the number of other-device-parameters used tend to influence the mapping accuracy of a MLP algorithm. Nevertheless, this could be avoided by the use of an increased amount of cell data.
In order to improve the accuracy of the CAP model, increased combinations of cell data are expected to form the training, validation, and test data. Currently, the CAP models can allow the use of training data obtained from various positions of a wafer. However, it could be improved, for example, by the inclusion of data of AlN-based RSM cells obtained under different operating conditions, fabricated using various methodologies, and sampled by a variety of systems to improve the accuracy of model computation for AlN-based RSM cells. Additionally, the current noise in the data introduced by the artifacts of testing tools could also be eliminated by including cell data obtained from various sampling systems. Although the collection of more data will help to improve the existing model, it could be time-consuming and not so cost-effective. This could be improved by adopting various data collection methodologies from machine learning, such as the data-augmentation method.33
Moreover, the trained weights and data of the current CAP model could be combined with other available data to improve the generality/versatility of the CAP model. For example, the existing CAP model can only compute the CC values of RSM cells. This model could be expanded further to also compute the other-device-parameter values, such as Vset, Vreset, LRS, and HRS by retraining the model through an interchange between the input and output parameters. Additionally, the existing model could also be improved to compute parameter values of a wide variety of RSM cells by including different RSM cell data in existing training data. The use of different machine learning techniques, such as data sharing, data augmentation, and data generation, could also help to enhance the existing CAP model to compute parameter values of a wide variety of RSM cells.
IV. CONCLUSION
In this report, we present a data-driven and accurate learning approach via the CAP model that directly learns the other-device-parameter value relationship to compute the CC value and avoids the need to solve traditional physical-based equations. Significantly, the learning device-parameter model achieved a minimized MSETEST value of ∼0.0014 and was able to compute the CC value of nitrogen-vacancy-based aluminum-nitride-type RSM devices and different device types with only a few device parameters. We believe such facile analysis of resistive switching behavior not only provides a reliable and accurate picture of resistive switching but also establishes much-needed guidelines for continued design and optimization of this important class of devices for memory applications.
SUPPLEMENTARY MATERIAL
See the supplementary material for device fabrication, data processing, other RF-CO model data, other MLP-CO model data, source data, and ML code.
ACKNOWLEDGMENTS
This work was financially supported by the Ministry of Science and Technology, R.O.C. (Contract No. MOST 109-221-E-182-032), Chang Gung Memorial Hospital, Linkou, Taiwan (Contract Nos. CMRPD2H0133, CMRPD2J0052, and BMRPA74), the Ministry of Education (Singapore) (Grant No. MOE2017-T2-2-064), the SUTD-MIT International Design Center (Singapore), the SUTD-ZJU IDEA Grant Program (SUTD-ZJU (VP) 201903), Changi General Hospital (Singapore) (Grant No. CGH-SUTD-HTIF2019-001), Agency of Science Technology and Research (Singapore) (Grant No. A20G9b0135), and the National Supercomputing Centre (Singapore) (Grant No. 15001618). Karthekeyan Periasamy acknowledges the Singapore University of Technology and Design for scholarship support.
The authors declare no conflict of interest.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.