Knowledge of the physical properties of ionic liquids (ILs), such as the surface tension and speed of sound, is important for both industrial and research applications. Unfortunately, technical challenges and costs limit exhaustive experimental screening of ILs for these critical properties. Previous work has demonstrated that quantum-mechanics-based thermochemical property prediction tools, such as the conductor-like screening model for real solvents, when combined with machine learning (ML) approaches, may provide an alternative pathway to guide the rapid screening and design of ILs for desired physicochemical properties. However, the question of which machine-learning approaches are most appropriate remains open. In the present study, we examine how different ML architectures, ranging from tree-based approaches to feed-forward artificial neural networks, perform in generating nonlinear multivariate quantitative structure–property relationship models for the prediction of the temperature- and pressure-dependent surface tension of and speed of sound in ILs over a wide range of surface tensions (16.9–76.2 mN/m) and speeds of sound (1009.7–1992 m/s). The ML models are further interrogated using a powerful interpretation method, Shapley additive explanations (SHAP). We find that several different ML models provide high accuracy, according to traditional statistical metrics. The decision tree-based approaches appear to be the most accurate and precise, with extreme gradient-boosting trees and gradient-boosting trees being the best performers. However, our results also indicate that the promise of using machine learning to gain deep insights into the underlying physics driving structure–property relationships in ILs may still be somewhat premature.

Ionic liquids (ILs) have a wide range of potential applications, ranging from carbon dioxide (CO2) capture to lignocellulosic biomass fractionation.1 This wide range of applications is due to several attractive properties of ILs, such as a negligible vapor pressure, low flammability, low toxicity, large electrochemical window, and high thermal and chemical stability.2–4 

ILs are molten salts, typically composed of an organic cation and an organic and/or inorganic anion, with melting temperatures below 100 °C.5,6 In addition to the bulk properties noted above, recent work has highlighted that both the surface tension of and the speed of sound in ILs are crucial factors for the accurate design and development of disparate processes, such as gas absorption, membrane separation, and distillation.7,8 The surface tension of a liquid is related to the intermolecular association free energy and the liquid interfacial microstructure and decreases with temperature.9,10 In biomass pretreatment, the surface tension of ILs has been shown to have a negative correlation with glucose yields.11–13 The speed of sound in a solvent is particularly useful for the development of an equation of state that describes the fluid and, consequently, can be used to derive several thermophysical properties, such as the reduced isobaric thermal expansion coefficient, isentropic and isothermal compressibility, bulk modulus, thermal pressure coefficient, isobaric and isochoric heat capacities, and the Joule–Thomson coefficient.14–16 

Unfortunately, owing to the high costs of detailed characterization studies, experimental structure–property relationship studies of IL surface tension and speed of sound are limited. Despite this challenge, several groups, including our own, have carefully established literature-derived IL structure–property databases and developed a variety of predictive, machine-learned, quantitative structure–property relationship (QSPR) models for the surface tension of and speed of sound in ILs. Recently, machine learning (ML) models that make use of COnductor-like Screening MOdel for Real Solvents (COSMO-RS)-derived features to describe ILs have shown substantial improvements in predictive power over traditional group-contribution approaches.17,18

The present study focuses on comparing and contrasting eight different machine-learning QSPR models for IL surface tension and speed of sound prediction, with the aim of exploring the impact of ML model choice on prediction quality and interpretability. In our previous studies, we established the effectiveness of COSMO-RS-derived input features for the prediction of IL properties and carbon-capture capabilities.19,20 Eight ML algorithms were used to develop models for the prediction of IL properties: two regression approaches (multilinear regression [MLR] and two-factor polynomial regression), three decision-tree methods (random forest [RF], gradient-boosting tree [GBT], and extreme gradient-boosting tree [XGBoost]), two kernel-based algorithms (Gaussian process regression [GPR] and support vector machine [SVM]), and a feed-forward neural network (FFNN). In contrast to analytical models that have explicit mathematical expressions, a typical ML model is a nonparametric function of its input features that requires a post-processing algorithm to interrogate the model and reveal its dependency on the training features. SHAP (SHapley Additive exPlanations) analysis21,22 was performed to reveal the importance of the input features to IL property predictions, which, in turn, can yield principles for designing new ILs with desired target properties.

Recently, we published two new comprehensive datasets: one consisting of 2524 surface tension data points for 360 different ionic liquids at temperatures from 263 to 533 K, and another containing 5702 data points characterizing the speed of sound in 218 different ionic liquids at temperatures from 273 to 413 K and pressures from 86 to 200 000 kPa.19 Here, we make use of these same datasets, as they are substantially larger than prior literature databases focusing on these physical properties.23–26

The goal of quantitative structure–property relationship (QSPR) regression models is to establish a relationship between the structural features of ILs and their physicochemical properties.8,27 In our recent studies, we demonstrated the robustness of COSMO-RS-derived σ-profile features as input for highly accurate ML models trained to predict IL properties and CO2 solubilities. The COSMO-RS model creates a virtual conductor around each molecule, computes the surface area and screening charge density of each molecular surface segment, and derives the σ-profile from these quantities.5,28,29 The σ-profile features of the cations and anions were calculated as described elsewhere,19 and the same input features were used in the present study to develop the different ML models. For completeness, we provide a summary of the computational details of the COSMO-RS calculations and the generation of the σ-profile features in the supplementary material. For the development of the ML models, we included both chemical (σ-profiles) and physical (T, P, surface area, and molecular weight) parameters of the ILs as input features. σ-profiles capture information regarding molecular surfaces and, thus, interaction energies; however, they ignore inertial considerations. To account for these latter effects, the molecular mass is also included as a feature. It should be noted that while surface area and molecular weight are relatively highly correlated, the relationship is nonlinear; for example, adamantane and dehydrodecalin have the same molecular weight but different surface areas, suggesting that including both features may be useful.

1. Multiple linear regression

First, we developed a multilinear regression (MLR) model to predict the surface tension of and speed of sound in ILs using T, P, the COSMO-RS-derived σ-profile features (Sσ-profile-1 to Sσ-profile-13), and the ILs' molecular weights and surface areas as input features. The MLR model was developed using the scikit-learn package.30 The structure–property relationship is expressed in an MLR model as a linear combination of the input features, as follows:
y = b + \sum_{i=1}^{n} w_i x_i,    (1)
where b is the intercept of the MLR model and wi is the coefficient of feature xi, both of which are determined by least-squares fitting, and y is the target property (either surface tension or speed of sound). It should be made clear that the MLR model presented here is a reproduction of our previous linear QSPR model reported elsewhere.19
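
As a concrete illustration, the following minimal scikit-learn sketch fits a model of the form of Eq. (1); the feature matrix X and target y are synthetic placeholders standing in for the actual descriptor set, not the study's data.

```python
# Minimal MLR sketch in the spirit of Eq. (1); X and y are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 17))        # stand-in for 17 input features
y = X @ rng.normal(size=17) + 1.5     # stand-in target

mlr = LinearRegression()              # least-squares fit of b and w_i
mlr.fit(X, y)
print(mlr.intercept_, mlr.coef_[:3])  # b and the first few w_i
```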

2. Two-factor polynomial regression

Similar to the MLR model, a two-factor polynomial regression (TFPR) model was also used to develop a simple, interpretable baseline for comparison to the machine-learning approaches presented elsewhere in this manuscript. The TFPR model was developed using the scikit-learn package in Python.30 In the TFPR, the binary interactions between the input features can be accounted for by adding an additional term to Eq. (1) as follows:
y = b + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} x_i x_j,    (2)
where xj represents input feature j, wij is the interaction regression coefficient, b is the intercept of the TFPR model, and wi is the coefficient of feature xi.
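
A minimal sketch of Eq. (2), assuming synthetic placeholder data: scikit-learn's PolynomialFeatures with interaction_only=True generates exactly the linear x_i terms plus the pairwise x_i x_j interaction terms.

```python
# Minimal TFPR sketch for Eq. (2); X and y are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 17))
y = rng.normal(size=100)

tfpr = make_pipeline(
    # degree-2 interaction terms only (no squared terms), matching Eq. (2)
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LinearRegression(),
)
tfpr.fit(X, y)
```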

3. Gaussian process regression

Gaussian process regression (GPR) is a non-parametric probabilistic kernel algorithm that has gained increasing attention over the past few years.31,32 One of the main benefits of the GPR method is its flexibility: a variety of kernel functions can be used, and simple structures formed in a higher-dimensional feature space correspond to complex structures in the input space.33 The method not only provides predictions but also a confidence interval for each prediction, which quantifies its uncertainty. Essentially, a Gaussian process generalizes the Gaussian probability distribution from random vectors to functions: instead of being characterized by a mean and variance, it is characterized by a mean function and a covariance (kernel) function. More details on the GPR model can be found in the literature.34,35
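
The following minimal sketch, using synthetic placeholder data, builds a scikit-learn GPR with the composite kernel listed in Table I and returns both a mean prediction and a per-point uncertainty.

```python
# Minimal GPR sketch with the DotProduct + WhiteKernel + RBF kernel of Table I.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, DotProduct, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 17))   # placeholder features
y = rng.normal(size=50)         # placeholder target

gpr = GaussianProcessRegressor(
    kernel=DotProduct() + WhiteKernel() + RBF(),
    n_restarts_optimizer=10,
)
gpr.fit(X, y)
mean, std = gpr.predict(X, return_std=True)  # prediction plus uncertainty
```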

4. Feed-forward neural networks

The feed-forward neural network (FFNN) is a computational model that mimics the structure and function of a biological neural network, consisting of a large number of neurons and the interconnections between them. In a neural network model, the "neurons" are mathematical functions, historically referred to as perceptrons, that apply an activation function to a weighted sum of their inputs; in the classical perceptron, the activation toggles the output between 0 and 1, whereas modern networks typically use continuous activations. The outputs of the neurons in the last layer are collected to create the necessary output response.36 FFNNs have been successfully applied across industries to a wide range of engineering problems, demonstrating exceptional performance in areas such as nonlinear function fitting, and are well known for their ability to generate high-accuracy models and their robustness in solving complex problems. We built an FFNN model with three layers: one input layer, one hidden layer, and one output layer. The number of nodes in the input layer is equal to the number of input features. The number of nodes in the hidden layer was optimized using RandomizedSearchCV in the scikit-learn package.30 The hidden layer was built with 15 neurons and used the ReLU (rectified linear unit) function as its activation function.
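
A minimal sketch of this architecture in scikit-learn, using the hyperparameters reported in Table I; the data are synthetic placeholders.

```python
# Minimal FFNN sketch: one hidden layer of 15 ReLU neurons (Table I).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 17))  # placeholder features
y = rng.normal(size=200)        # placeholder target

ffnn = MLPRegressor(
    hidden_layer_sizes=(15,),   # one hidden layer with 15 neurons
    activation="relu",
    solver="lbfgs",
    max_iter=100_000_000,
)
ffnn.fit(X, y)
```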

5. Support-vector machines

Support-vector machines (SVMs) are based on the statistical learning theory developed by Vapnik.37 The SVM is a popular supervised machine learning method that can be applied to classification and regression problems. In a regression problem, the goal is to fit a model that minimizes the error between the prediction and the target. In a support vector regression (SVR) problem, the goal is instead to fit a model such that the error between the predicted and actual responses falls within a range of −ε to ε, which provides a more flexible fit. The popularity of SVMs is largely based on their ability to establish nonlinear relationships between the feature set and the prediction target through the use of kernel functions, which can often yield better predictions than a simple linear SVM model.38 Two popular nonlinear kernel functions used for SVM problems are the radial basis function (RBF) and polynomials of degree d ≥ 2. Gamma is a hyperparameter specific to the RBF kernel, and cost (C) is a hyperparameter common to all kernels; both were optimized during model training. Here, we used the RBF kernel for the prediction of both surface tension and speed of sound, with gamma and cost values of 0.2 and 20 for surface tension and 0.1 and 140 for speed of sound, respectively.
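
A minimal SVR sketch with the RBF kernel and the gamma/C values quoted above; the data are synthetic placeholders, and in practice the features would first be standardized, as discussed below.

```python
# Minimal SVR sketch with the RBF kernel settings quoted in the text.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 17))  # placeholder features
y = rng.normal(size=200)        # placeholder target

svr_st = SVR(kernel="rbf", gamma=0.2, C=20)    # surface tension settings
svr_ss = SVR(kernel="rbf", gamma=0.1, C=140)   # speed of sound settings
svr_st.fit(X, y)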

6. Decision tree models

In addition to the FFNN and SVM, we developed three decision tree-based ML models for IL property prediction: random forest (RF), gradient-boosting tree (GBT), and extreme gradient boosting (XGBoost). A random forest is an ensemble of classification or regression trees, first proposed by Breiman39 in 2001. Each decision tree in a random forest is independent and can be processed in parallel during classification or regression, thus reducing the computational cost of model development. The random selection of features considered at the splitting nodes enables fast training of this algorithm, even for feature vectors of large dimensionality, and each tree is trained on a random bootstrap sample of the training data. The success of RF lies in the fact that many weak tree models in aggregate yield a model with high accuracy. The final prediction for an observation is the average of the predicted values for that observation over all the decision trees.

Gradient-boosted decision trees are a popular machine-learning algorithm for solving prediction problems in both classification and regression domains.40 Unlike random forests, which construct an ensemble of independent trees, gradient boosting builds many interdependent weak estimators, each successively fit to the negative gradient of the loss function. GBT models obtain their high accuracy via this "boosting" mechanism, in which decision trees are added in series and each subsequent tree minimizes the errors of the prior trees.41 Previously, we reported the power of a GBT for generating predictive models of the same IL properties of interest here;19 for the present comparison of ML approaches, the previously reported GBT model was re-trained with feature scaling.

Finally, extreme gradient boosting (XGBoost) is a decision-tree ensemble method similar to GBT that uses a gradient-boosting algorithm. Whereas GBT uses gradient descent, i.e., a first-order Taylor expansion, when searching for the optimal function in the feature and hyperparameter space, XGBoost uses a second-order Taylor expansion (Newton's method) to approximate the optimal function at each step. XGBoost consists of a series of trees built iteratively.22 The model starts with weak learners whose errors inform the loss function minimized by each subsequent tree; trees are added until the accuracy no longer improves.
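
The following minimal sketch instantiates the three tree ensembles with the surface tension hyperparameters from Table I; it assumes the xgboost package is available and uses synthetic placeholder data.

```python
# Minimal sketches of the three tree ensembles (Table I, surface tension).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 17))  # placeholder features
y = rng.normal(size=300)        # placeholder target

rf = RandomForestRegressor(n_estimators=100, min_samples_split=3,
                           min_samples_leaf=1, max_features="sqrt")
gbt = GradientBoostingRegressor(n_estimators=100, learning_rate=0.03,
                                min_samples_leaf=1, min_samples_split=3)
xgb = XGBRegressor(n_estimators=500, learning_rate=0.03,
                   min_child_weight=6, booster="gbtree")
for model in (rf, gbt, xgb):
    model.fit(X, y)
```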

All the ML models were implemented in Python 3 with the scikit-learn package.30,33 All the above ML models used fivefold cross-validation for optimization, and RandomizedSearchCV in the scikit-learn package was used to optimize the hyperparameters of all the developed ML models.42 The optimized hyperparameters for all the ML models are reported in Table I for both surface tension and speed of sound. The models were trained using raw features as well as normalized, or scaled, features. Feature scaling is particularly important when training ML methods that generate parametric functions of the constituent features, especially when the features vary by orders of magnitude. For the FFNN and SVM, the best results were achieved using the "StandardScaler" method from scikit-learn, and similar feature processing was used for the other ML models to compare model performance.
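
A minimal sketch of this workflow, combining a StandardScaler with an estimator in a pipeline and tuning it via RandomizedSearchCV with fivefold cross-validation; the search grid shown is illustrative, not the grid used in the study.

```python
# Minimal sketch: scaling + fivefold randomized hyperparameter search.
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 17))  # placeholder features
y = rng.normal(size=200)        # placeholder target

pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])
search = RandomizedSearchCV(
    pipe,
    param_distributions={"svr__gamma": [0.05, 0.1, 0.2, 0.5],
                         "svr__C": [10, 20, 50, 140]},  # illustrative grid
    n_iter=10, cv=5, random_state=0,  # fivefold cross-validation
)
search.fit(X, y)
print(search.best_params_)
```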

TABLE I. Optimized hyperparameters of different ML models for surface tension and speed of sound predictions.a

ML model   Surface tension                            Speed of sound
GPR        kernel = DotProduct + WhiteKernel + RBF    kernel = DotProduct + WhiteKernel + RBF
           optimizer = fmin_l_bfgs_b                  optimizer = fmin_l_bfgs_b
           n_restarts_optimizer = 10                  n_restarts_optimizer = 10
FFNN       hidden_layer_sizes = 15                    hidden_layer_sizes = 15
           activation = relu                          activation = relu
           solver = lbfgs                             solver = lbfgs
           learning_rate = adaptive                   learning_rate = constant
           max_iter = 100 000 000                     max_iter = 100 000 000
           learning_rate_init = 0.1                   learning_rate_init = 0.1
SVR        kernel = rbf                               kernel = rbf
           gamma = 0.2                                gamma = 0.1
           C = 20                                     C = 140
RF         n_estimators = 100                         n_estimators = 500
           min_samples_split = 3                      min_samples_split = 3
           min_samples_leaf = 1                       min_samples_leaf = 1
           max_features = sqrt                        max_features = sqrt
GBT        n_estimators = 100                         n_estimators = 500
           learning_rate = 0.03                       learning_rate = 0.03
           min_samples_leaf = 1                       min_samples_leaf = 1
           min_samples_split = 3                      min_samples_split = 3
XGBoost    learning_rate = 0.03                       learning_rate = 0.03
           min_child_weight = 6                       min_child_weight = 5
           n_estimators = 500                         n_estimators = 300
           booster = gbtree                           booster = gbtree
           max_depth = none                           max_depth = 6
a Other unlisted hyperparameters were kept at default values. SVR = support vector regression.

The model performance was evaluated using a combination of statistical metrics, including the average absolute relative deviation (AARD), mean absolute error (MAE), R2, and root mean square error (RMSE). As a reference for the reader, we provide the equations used to calculate these performance metrics below,
R^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{y}_m)^2 - \sum_{i=1}^{N} (y_i^{pred} - y_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y}_m)^2},    (3)

AARD (%) = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{y_i^{pred} - y_i}{y_i} \right|,    (4)

MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - y_i^{pred} \right|,    (5)

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i^{pred} - y_i \right)^2},    (6)
where N is the total number of data points, y_i and y_i^{pred} are the experimental and predicted values of the surface tension or speed of sound, respectively, and \bar{y}_m is the average of the experimental data.
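
For reference, a minimal NumPy implementation of Eqs. (3)-(6) for arbitrary arrays of experimental and predicted values:

```python
# Minimal sketch of the performance metrics in Eqs. (3)-(6).
import numpy as np

def metrics(y, y_pred):
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    ss_tot = np.sum((y - y.mean()) ** 2)
    ss_res = np.sum((y_pred - y) ** 2)
    r2 = (ss_tot - ss_res) / ss_tot                    # Eq. (3)
    aard = 100.0 * np.mean(np.abs((y_pred - y) / y))   # Eq. (4)
    mae = np.mean(np.abs(y - y_pred))                  # Eq. (5)
    rmse = np.sqrt(np.mean((y_pred - y) ** 2))         # Eq. (6)
    return r2, aard, mae, rmse

print(metrics([30.0, 40.0, 50.0], [31.0, 39.0, 52.0]))
```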

After training and evaluating the performance of the eight ML models, we interpreted the feature importance for the best-performing models using SHapley Additive exPlanation (SHAP) analysis, which constructs an additive explanatory model inspired by cooperative game theory, in which all features are considered "contributors."43 SHAP contrasts with traditional techniques for interpreting machine-learned models, such as impurity importance and permutation importance, which merely quantify how important a feature is.43 SHAP values give both the magnitude and the sign (positive or negative) of a feature's influence on individual predictions. Whereas permutation feature importance is based on the decrease in model performance when a feature is shuffled, SHAP importance is based on the magnitude of the feature attributions. SHAP is implemented as a Python package that can interpret the output of any machine learning model. For each prediction sample, the model produces a predicted value, and a SHAP value is assigned to each feature of that sample. A SHAP value >0 means that the feature pushes the prediction higher (a positive effect), whereas a SHAP value <0 means that the feature pushes the prediction lower (a negative effect).
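
A minimal sketch of such an analysis for a fitted tree ensemble using the shap package; the model and data are illustrative placeholders rather than the trained models of this study.

```python
# Minimal SHAP sketch for a tree ensemble; model and data are placeholders.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

model = GradientBoostingRegressor().fit(X, y)
explainer = shap.TreeExplainer(model)   # efficient SHAP values for trees
shap_values = explainer.shap_values(X)  # one value per sample and feature

# Mean |SHAP| per feature serves as a global importance ranking.
print(np.abs(shap_values).mean(axis=0))
```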

Recent studies have demonstrated that COSMO-RS-derived molecular descriptors, such as the σ-profile and descriptors reflecting intermolecular interactions such as hydrogen bonding, can be used to build ML models that predict the thermodynamic properties of ionic liquids, deep eutectic solvents, and organic solvents with high accuracy.44–46 As noted in the Introduction, while similar features have been used for model training, the choice of ML architecture has been somewhat ad hoc, based primarily on model performance. Therefore, in the present study, we sought not only to explore how eight different supervised machine learning architectures/models perform at predicting given IL properties (surface tension and speed of sound) from a common feature set but also to gauge how the model architecture may impact interpretation metrics. The input features for the machine learning models are the COSMO-RS-calculated σ-profile descriptors of the cation (Sσ-profile-1 to Sσ-profile-6) and anion (Sσ-profile-7 to Sσ-profile-13), the molecular weight and surface area of the IL, and the experimental temperatures and pressures. As outlined in Sec. II B, the COSMO files of the investigated molecules were generated and used for the calculation of the σ-profiles. To generate the ML models, we used 55% of the dataset for training and 45% for testing. The rationale for this data splitting is to minimize bias in the split, and the training and testing data were likely exposed to all the cationic types of ionic liquids investigated in this study. The training and testing data were split randomly using the random state option in scikit-learn. Furthermore, the data splitting was repeated with different random state numbers, yielding similar ML model performance.
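
A minimal sketch of the 55/45 random split described above; the random_state value shown is illustrative, and in the study the split was repeated with different seeds to check robustness.

```python
# Minimal sketch of the 55/45 train/test split; the seed is illustrative.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 17))  # placeholder features
y = rng.normal(size=500)        # placeholder target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.55, test_size=0.45, random_state=42)
```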

Figures 1 and S1 illustrate the correlation between the experimental and ML-predicted IL surface tensions in the training and testing sets for the different ML models. As depicted in the parity plots of Figs. 1 and S1, the MLR [Fig. S1(a)], two-factor PR [Fig. 1(a)], and GPR [Fig. S1(b)] models underestimate the IL surface tensions, with R2 values of 0.48–0.88 for the training and test sets. The SVM and FFNN models performed well on the training sets; however, they show weaker performance on the test sets, with lower R2 (0.92–0.93) and higher RMSE (2.34–2.50 mN/m) values than the decision-tree methods (Table II). In contrast, the decision-tree models' (RF, GBT, and XGBoost) predictions for the training and testing sets are in excellent agreement with the experimental data, with higher accuracy [Figs. 1(d)–1(f)] and greater reliability than the other investigated ML models. Figure 2 reports the performance metrics (R2, AARD, MAE, and RMSE) for all investigated ML models on the testing data, and the models are ranked by their performance metric values. The XGBoost method yielded the highest-performing model, with R2, AARD, MAE, and RMSE values of 0.963, 2.67%, 0.936 mN/m, and 1.72 mN/m, respectively (Fig. 2 and Table II). The RF and GBT methods yielded the next most accurate models after XGBoost for both the training and testing datasets. The SVM, FFNN, GPR, and linear regression models show lower accuracy on the testing dataset, with higher RMSE and AARD values (Fig. 2 and Table II), than the decision-tree-based models. From the statistical parameter estimates, the tree-based models show low bias and low variance, while the FFNN and SVM models have low bias but higher variance (a larger deviation for the testing set).

FIG. 1. Experimental and predicted surface tension of ILs with different ML models: (a) two-factor PR, (b) SVM, (c) FFNN, (d) RF, (e) GBT, and (f) XGBoost.
FIG. 2. Predictive performance metrics of different models on the surface tension of ILs: (a) RMSE, (b) AARD, (c) MAE, and (d) R2. The performance metrics were plotted on the test set.
TABLE II. Statistical parameters for the developed machine learning models for the surface tension of ionic liquids.

Model           Dataset     R2      AARD (%)   RMSE (mN/m)   MAE (mN/m)
RF              Training    0.990    1.105      0.898         0.422
                Testing     0.959    2.691      1.786         0.953
                Total       0.977    1.819      1.370         0.661
GBT             Training    0.991    1.558      1.026         0.571
                Testing     0.956    3.011      1.922         1.039
                Total       0.976    2.212      1.497         0.782
XGBoost         Training    0.991    1.327      0.861         0.489
                Testing     0.963    2.668      1.716         0.936
                Total       0.979    1.931      1.316         0.691
SVM             Training    0.948    2.207      2.069         0.866
                Testing     0.930    3.156      2.335         1.160
                Total       0.940    2.634      2.193         0.998
FFNN            Training    0.951    3.374      2.000         1.286
                Testing     0.922    4.180      2.502         1.531
                Total       0.938    3.737      2.240         1.396
GPR             Training    0.627   11.013      5.525         4.122
                Testing     0.619   11.243      5.447         4.115
                Total       0.623   11.117      5.490         4.119
Two-factor PR   Training    0.867    6.508      3.287         2.431
                Testing     0.883    5.903      3.003         2.191
                Total       0.874    6.236      3.162         2.323
MLR             Training    0.478   13.697      6.515         5.028
                Testing     0.476   13.722      6.375         4.962
                Total       0.477   13.708      6.453         4.998

ML models based on the eight algorithms were also developed for the prediction of the speed of sound in ILs using the same type of input features (σ-profile, molecular weight and surface area of the IL, and experimental temperatures and pressures) as were used for training the surface tension models. Figures 3 and S1 depict the correlation between the experimental and predicted speed of sound in the training and testing sets for the different ML models. Except for the MLR [Figs. S1(a) and S1(c)] and GPR [Figs. S1(b) and S1(d)] models, all the investigated ML models predict the training and test sets with high accuracy. However, the predictions of the models trained using two-factor PR, SVM, FFNN, and RF show large deviations at 1200–1800 m/s [Figs. 3(a)–3(d)], which leads to larger RMSE and AARD values. The GBT and XGBoost models show the best performance on the training and test sets, with high R2 and low AARD, MAE, and RMSE values compared with the models trained with the other ML algorithms, similar to what was observed for surface tension prediction. Figure 4 shows the performance metrics of the ML models trained to predict the speed of sound. Based on these metrics, the GBT and XGBoost models exhibit lower deviations and higher R2 (0.992–0.994) than the other models. Interestingly, the SVM model shows relatively lower AARD and MAE values for the speed of sound than the GBT and XGBoost models; however, it exhibits a substantially higher RMSE (20.67 m/s) and lower R2 (0.989), resulting in a less accurate model than GBT and XGBoost (see Table III). From the statistical performance metrics and model predictions, the gradient-boosting tree models show low bias and low variance and, thus, can be regarded as suitable for the prediction of IL properties.

FIG. 3. Experimental and predicted speed of sound in ILs with different ML models: (a) two-factor PR, (b) SVM, (c) FFNN, (d) RF, (e) GBT, and (f) XGBoost.
FIG. 4. Predictive performance metrics of different models on the speed of sound in ILs: (a) RMSE, (b) AARD, (c) MAE, and (d) R2. The performance metrics were plotted on the test set.
TABLE III. Statistical parameters for the developed machine learning models for the ionic liquid speed of sound.

ML model        Dataset     R2      AARD (%)   RMSE (m/s)   MAE (m/s)
RF              Training    0.994    0.687     17.022       10.232
                Testing     0.987    1.019     23.759       14.933
                Total       0.990    0.834     20.105       12.273
GBT             Training    0.996    0.503     12.221        7.446
                Testing     0.994    0.655     16.206        9.651
                Total       0.995    0.581     14.353        8.551
XGBoost         Training    0.996    0.531     12.681        7.830
                Testing     0.992    0.723     17.573       10.652
                Total       0.994    0.628     15.221        9.236
SVM             Training    0.990    0.483     20.166        7.132
                Testing     0.988    0.583     21.908        8.615
                Total       0.989    0.527     20.662        7.753
FFNN            Training    0.993    0.591     16.434        8.767
                Testing     0.992    0.668     18.302        9.893
                Total       0.993    0.626     17.251        9.277
GPR             Training    0.805    4.570     88.970       69.135
                Testing     0.808    4.637     88.463       69.502
                Total       0.806    4.608     88.668       69.332
Two-factor PR   Training    0.990    0.825     20.078       12.122
                Testing     0.989    0.902     20.792       13.247
                Total       0.990    0.857     20.432       12.539
MLR             Training    0.871    3.598     70.465       53.908
                Testing     0.875    3.631     71.126       54.080
                Total       0.873    3.635     71.665       54.256

To extract meaningful chemical insights and facilitate the rational design of IL molecules with desired target properties, further interpretation of the ML models is necessary. We applied the SHAP analysis protocol to interpret the top four ML models based on the results from the training sets. The average of the absolute SHAP value of each feature can serve as an indicator of feature importance. Figure 5 shows the importance of the input features in the IL surface tension predictions. In Fig. 5, a positive SHAP value for a feature indicates that it increases the predicted IL property, while a negative SHAP value indicates that it decreases the prediction. For the tree-based models, the molecular surface area, S5, S9, and temperature are the most important features, while for the FFNN model, the anion polar region σ-features (S11–S13) and the IL molecular weight are the most important features for surface tension predictions. Low values of the surface area, the non-polar region of the σ-profile (S5: cation and S9: anion), and the temperature boost the surface tension predictions with positive SHAP values, in agreement with experimental observations.47,48 By definition, surface tension is a measure of the cohesive forces between the liquid molecules present at the surface and quantifies the force per unit length or free energy per unit area.47,49 The lower the values of the σ-profile in the non-polar region (e.g., S5 and S9) and of the IL surface area, the higher the surface tension of the IL, suggesting that the surface tension should depend strongly on the size of the nonpolar moiety, such as the alkyl chain length of the ionic liquid; indeed, ILs with longer alkyl chains tend to have lower surface tensions.47 This behavior may be attributed to a weakening of the Coulomb interaction upon increasing the alkyl chain length of an IL: strong Coulomb interaction energies lead to high surface tension values, and ions with shorter alkyl chains and higher polarity exhibit larger Coulomb interaction energies. In contrast, SHAP analysis of the FFNN model shows a different set of important features for the surface tension predictions than for the tree-based models. Lower values of the σ-profile in the anion polar region (e.g., S11–S13) correspond to positive SHAP values, indicating that the lower the anion polarity, the higher the surface tension of the IL, which is contrary to experimental measurements.

FIG. 5. SHAP feature importance for the training dataset of surface tension of ionic liquids for (a) XGBoost, (b) GBT, (c) RF, and (d) FFNN models. S1–S6 are the cation sigma profile features, and S7–S13 are the anion sigma profile features.

In addition, to better understand the feature importance, the partial dependence of the SHAP values for surface tension with the XGBoost model was analyzed and is depicted in Fig. 6. The strongly negative correlation between the SHAP values and the top XGBoost features is consistent with previous studies,8,19,47,48 demonstrating, in turn, that the interpretation given by SHAP is reliable, though potentially ML-model dependent. Variations in surface tension are generally analyzed from the perspective of molecular interactions; for example, with increasing temperature, the ion species acquire enough kinetic energy to weaken the hydrogen-bond forces and move freely. Aside from temperature, the remaining features describe the chemical environment of diverse ILs. As discussed earlier, higher values of the IL surface area and of the σ-profile non-polar regions (S5 and S9) result in lower surface tension. From the distribution of SHAP values shown in Fig. 6, we see that molecular weight has minor effects compared to surface area. The vertical spread of the molecular weight values reflects the interactions of the other features with the SHAP value of molecular weight.22 According to experimental measurements, higher molecular weight is generally associated with low surface tension.47 Although it is surprising that some low-molecular-weight ILs have negative SHAP values [Fig. 6(d)], these outliers are likely protic ionic liquids. Protic ILs have a greater ability to form stronger hydrogen bonds and show larger Coulomb interaction energies. However, depending on the difference in dissociation constants (pKa) and proton affinities between the acids and bases, protic ILs contain both ionic and neutral molecules in their mixtures, which decreases the surface tension of the IL.50
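
A minimal sketch of how such a SHAP partial-dependence plot can be generated with the shap package; the model, data, and feature index are illustrative placeholders of the kind shown in Fig. 6.

```python
# Minimal SHAP partial-dependence sketch; model and data are placeholders.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

model = GradientBoostingRegressor().fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)

# SHAP value of feature 0 vs. its value; points are colored by the most
# strongly interacting other feature (cf. the vertical spread noted above).
shap.dependence_plot(0, shap_values, X)
```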

FIG. 6. Partial dependence plots of the SHAP value for surface tension with the XGBoost model: (a) IL surface area, (b) cation sigma profile feature S5, (c) temperature, and (d) IL molecular weight.

SHAP analysis was also performed on the tree-based and FFNN models for the speed of sound in ILs, and the results are depicted in Fig. 7. We did not perform SHAP analysis for the SVM model owing to the high computational cost of calculating SHAP values for SVMs and the lower accuracy of the SVM model. The tree-based models show that the IL molecular weight, pressure, temperature, and σ-profile features S9 and S12 are the most important input features for speed of sound predictions. In comparison, SHAP analysis of the FFNN model identifies the IL molecular weight and σ-profile features S9, S2, and S4 as important. Based on the tree-based SHAP analysis, higher values of the IL molecular weight, temperature, and σ-profile feature S9 (non-polar region) correspond to negative SHAP values and thus lower predicted speeds of sound, while increasing pressure and σ-profile feature S12 correspond to positive SHAP values and higher predicted speeds of sound. A higher molecular weight results in a larger free volume, in turn leading to lower values of the speed of sound, indicating consistency of the SHAP analysis with known experimental trends.48 Temperature shows balanced SHAP values (Fig. 7) for the speed of sound, indicating that with increasing temperature, the speed of sound in ILs decreases; this trend is again in accordance with the corresponding experimental observations.48,51,52 Pressure is known to influence the speed of sound: as the pressure increases, the speed of sound increases. The anion σ-profile feature S12 also boosts the speed of sound predictions; as the polarity of the anion increases, the speed of sound in the IL increases. For example, the IL N-methyl-2-oxopyrrolidinium formate exhibits a higher speed of sound than N-methyl-2-oxopyrrolidinium hexanoate, owing to the higher free volume and lower polarity of the hexanoate anion compared to the formate anion. This observation is also in line with experimental measurements.53

FIG. 7. SHAP feature importance for the training dataset of the speed of sound in ionic liquids for (a) GBT, (b) XGBoost, (c) RF, and (d) FFNN models. S1–S6 are the cation sigma profile features, and S7–S13 are the anion sigma profile features.

Figure 8 shows the SHAP partial dependence values for the speed of sound with the GBT model, which shows the best performance based on the full set of performance metrics. From the SHAP partial dependence values, pressure and temperature show approximately linear correlations: as the pressure increases, the SHAP values increase, whereas as the temperature increases, the SHAP values decrease. It is important to mention that the FFNN model exhibits different results with regard to the SHAP importance of the S4 and S9 features. In the FFNN (Fig. 7), larger values of the non-polar regions (S4 and S9) result in higher predicted speeds of sound, which is contrary to experimental observations. Furthermore, pressure, anion polarity, and temperature are among the least important features of the FFNN model. The SHAP method provides clear interpretations of the effects of the features on the IL property predictions, with the decision tree-based model results in agreement with experimental observation and existing theory.

FIG. 8. Partial dependence plots of the SHAP value for the speed of sound with the GBT model: (a) IL molecular weight, (b) anion sigma profile feature S12, (c) pressure, and (d) temperature.

To further evaluate the reliability and rationality of the ML models developed in this work, the effects of experimental parameters, such as temperature, pressure, and the alkyl chain lengths of the anion and cation, on the IL property predictions were investigated and compared with experimental measurements. Figure 9 shows the rationality of the RF, XGBoost, and FFNN models for the prediction of IL surface tension over different temperature ranges, along with the effects of the anion and cation alkyl chain lengths. The XGBoost and FFNN models show a linear decrease in surface tension with increasing temperature; however, the FFNN predictions deviate more from the corresponding experimental observations than those of XGBoost. In contrast, the RF model demonstrates an unusual temperature effect: as the temperature increases from 298.15 to 318.15 K, the decrease in surface tension is insignificant (Fig. 9), while at higher temperatures (between 318.15 and 323.15 K) a substantial drop in surface tension is noticed. Furthermore, as shown in Figs. 9(a) and 9(b), all the ML models correctly capture the anion and cation effects on the surface tension predictions; for instance, the longer the alkyl chain of the anion/cation, the lower the predicted surface tension of the IL, in accordance with experiment. Wei et al.54 reported the surface tension of protic ILs with the same cation (N-butylammonium, [1-BA]+) and varying anions (formate, acetate, propionate, and butyrate), and Song et al.55 reported the surface tension of pyridinium-based ILs with trifluoroacetate [TFA]− as the anion. In both cases, the longer the alkyl chain of the pyridinium cation or carboxylate anion, the lower the surface tension of the IL, in accordance with experiment.54,55

FIG. 9. Rationality of the ML model for IL surface tension predictions. (a) Effect of anion alkyl chain lengths at different temperatures. The symbols of ILs are diamonds: [1-BA][For], circles: [1-BA][Ace], and squares: [1-BA][Prop]. (b) Effect of cation alkyl chain lengths at different temperatures. The symbols of ILs are diamonds: [EPy][TFA], circles: [C3Py][TFA], squares: [BPy][TFA], and up triangles: [HPy][TFA]. Colors: red, random forest; blue, XGBoost; and green, FFNN models.

In addition to surface tension, we also evaluated the rationality of the ML models for the speed of sound predictions; the results are presented in Fig. 10 as a function of the experimental parameters. The rationality of the GBT, SVM, and FFNN models for the prediction of the speed of sound in ILs was examined over different pressure and temperature ranges, along with the effects of the anion and cation alkyl chain lengths. Because the RF predictions demonstrated unusual temperature effects, this model was not considered in the speed of sound rationality tests. Figure 10(a) shows the effect of pressure on the speed of sound for the different ML models: a linear increase is seen in the predicted speed of sound with increasing pressure. Figures 10(b)–10(d) show the effect of temperature on the speed of sound; all the models' predictions show a monotonic decrease in the speed of sound with increasing temperature. Furthermore, Fig. 11 shows a 3D plot of the temperature and pressure effects on the speed of sound predictions for the different ML models. The FFNN and SVM models precisely capture the temperature and pressure effects on the speed of sound, whereas the tree-based models show fluctuations in their predictions, suggesting that the tree-based models may have "learned" a rugged response surface compared with the SVM and FFNN models. For example, in Fig. 10(d), the GBT model shows a weaker temperature effect (283.15–298.15 K) on the speed of sound prediction for [MMIM][MeSO4]. Uncertainty quantification in ML-based predictions of solvent and material properties is of immense importance in any experimental, computational, or ML assessment, as it determines how trustworthy the measured or predicted data are. However, thus far in the ML literature, little effort has been spent evaluating ML model uncertainties.56 Recently, Tavazza et al.57 reported three methods for quantifying the uncertainty of ML models, but none can be regarded as a "gold standard," because each method has its advantages and disadvantages. Therefore, we do not provide uncertainties in Figs. 9 and 10.

FIG. 10. Rationality of the ML model for IL speed of sound predictions. (a) Effect of pressure for [C3MIM][TF2N]. (b) Effect of anion alkyl chain lengths at different temperatures. The symbols of ILs are diamonds: N-methyl-2-oxopyrrolidinium formate, circles: N-methyl-2-oxopyrrolidinium acetate, and squares: N-methyl-2-oxopyrrolidinium hexanoate. (c) Effect of anion at different temperatures. The symbols of ILs are diamonds: [DBUH][2-OP], circles: [DBUH][3-OP], and squares: [DBUH][4-OP]. (d) Effect of cation alkyl chain lengths at different temperatures. The symbols of ILs are diamonds: [MMIM][MeSO4], circles: [EMIM][MeSO4], and squares: [BMIM][MeSO4]. The color representation in (b)–(d): red, gradient-boosting tree; blue, support vector machine; and green, artificial neural network models.
FIG. 11. Experimental and ML predicted speed of sound in ILs as a function of temperature and pressure for 1-methyl-3-propylimidazolium bis(trifluoromethylsulfonyl)imide [C3MIM][TF2N] IL: (a) experimental, (b) RF, (c) GBT, (d) XGBoost, (e) SVM, and (f) FFNN.

Finally, the effect of the anion and cation alkyl chain lengths was also evaluated. As shown in Fig. 10(b), as the alkyl chain length of the anion increases, the speed of sound in the IL decreases. For example, Panda and Gardas48 reported the speed of sound in N-methyl-2-oxopyrrolidinium-based protic ILs with anion alkyl chain lengths varying from formate to hexanoate: the longer the alkyl chain of the carboxylate anion, the lower the speed of sound in the IL, in line with the experimental observations. A similar trend is seen for the cation alkyl chain lengths [Fig. 10(d)]: as the cation chain length increases, the speed of sound in the IL decreases. A longer alkyl chain on the anion or cation results in a larger free volume, which in turn leads to lower values of the speed of sound. All the ML models correctly capture the anion/cation alkyl chain length effect; however, the FFNN and SVM show larger deviations from experiment. Figure 10(c) shows the effect of the hydroxypyridine anion ([OP]−) paired with the 1,8-diazabicyclo[5.4.0]undec-7-ene ([DBUH]+) cation on the speed of sound in the IL. As the distance between O and N in the anion ([2-OP], [3-OP], and [4-OP]) increases, the speed of sound in the IL with the same cation ([DBUH]+) follows the order [4-OP] > [3-OP] > [2-OP], indicating that the hydrogen-bond network between the anion and cation is gradually enhanced with increasing O–N distance,58 leading to a more ordered structure, in accordance with the ML model predictions. Figures 9 and 10 thus demonstrate the rationality of the developed machine learning models for IL property predictions.

The present work has focused on the development of different ML models for the prediction of the surface tension and speed of sound in ILs using the COSMO-RS-derived sigma profile input features. We find that several different ML models, including a variety of decision tree approaches, support-vector machines, and neural networks, provide high accuracy, according to traditional statistical metrics. Of the successful models, the decision tree-based approaches appear to be the most accurate and precise, with XGBoost and GBT being the best performers for the surface tension and speed of sound in ILs for training and testing sets.

Despite strong statistical performance metrics, when examining model rationality, i.e., whether thermophysical trends (i.e., temperature and pressure dependencies) are reproduced correctly for several test systems, the tree-based methods perform somewhat poorly, with the RF model faring the worst on this metric. Interestingly, although the FFNN model exhibits large deviations and weaker performance on surface tension predictions, compared to the tree-based approaches, it performs better in terms of our rationality tests.

Furthermore, by employing the SHAP technique, feature importances for each model type were extracted to add explainability to the models. For the tree-based models, the molecular area of the IL and the non-polar region of the σ-profile descriptor (S5: cation and S9: anion) play important roles in the prediction of surface tension, whereas the molecular weight, pressure, and S12 of the σ-profile descriptor are important for the speed of sound. In contrast, the FFNN-based models, which differ only modestly in their statistical metrics from the trees and were shown to correctly capture the temperature and pressure dependencies, provide feature importances that are incongruent with both the tree-based feature importances and prior trends reported in the literature. Given the conflicting rationality and feature importances among the different QSPR models developed here, it would seem that the promise of using machine-learning models in conjunction with rationality testing and SHAP (or other explanatory tools) to gain deep insights into the underlying physics driving structure–property relationships in ILs may still be somewhat premature. However, it would be unfair to say that these tools have no use; indeed, for molecular systems where ample data are available but general structure–property trends are unclear and physical intuition is limited, our results suggest that by training competing machine-learning models with varied architectures, competing hypotheses can be extracted from feature importance calculations, which could be useful for guiding researchers.

COSMO-RS methodology and the MLR and GPR model predictions for surface tension and speed of sound are provided in the supplementary material.

This work was supported and provided by the U.S. Department of Energy (DOE), Office of Science, through the Genomic Science Program, Office of Biological and Environmental Research (Contract No. FWP ERKP752). Michelle K. Kidder acknowledges the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences, and Biosciences (CSGB), Grant No. 3ERKCG25, for partially supporting this research. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

The authors have no conflicts to disclose.

Mood Mohan: Conceptualization (lead); Data curation (lead); Formal analysis (lead); Investigation (lead); Methodology (lead); Validation (lead); Visualization (lead); Writing – original draft (lead). Micholas Dean Smith: Methodology (equal); Supervision (equal); Writing – original draft (supporting); Writing – review & editing (equal). Omar Demerdash: Methodology (equal); Supervision (equal); Writing – review & editing (equal). Michelle K. Kidder: Funding acquisition (lead); Project administration (supporting); Supervision (equal); Writing – review & editing (lead). Jeremy C. Smith: Funding acquisition (lead); Project administration (lead); Supervision (lead); Writing – review & editing (lead).

The data that support the findings of this study are available from the corresponding authors upon reasonable request.

References

1. N. V. Plechkova and K. R. Seddon, "Applications of ionic liquids in the chemical industry," Chem. Soc. Rev. 37(1), 123–150 (2008).
2. M. Mohan, H. Choudhary, A. George, B. A. Simmons, K. Sale, and J. M. Gladden, "Towards understanding of delignification of grassy and woody biomass in cholinium-based ionic liquids," Green Chem. 23(16), 6020–6035 (2021).
3. M. Mohan, P. Viswanath, T. Banerjee, and V. V. Goud, "Multiscale modelling strategies and experimental insights for the solvation of cellulose and hemicellulose in ionic liquids," Mol. Phys. 116(15–16), 2108–2128 (2018).
4. M. Mohan, K. Huang, V. R. Pidatala, B. A. Simmons, S. Singh, K. L. Sale, and J. M. Gladden, "Prediction of solubility parameters of lignin and ionic liquids using multi-resolution simulation approaches," Green Chem. 24, 1165–1176 (2022).
5. M. Mohan, J. D. Keasling, B. A. Simmons, and S. Singh, "In-silico COSMO-RS predictive screening of ionic liquids for the dissolution of plastic," Green Chem. 24, 4140–4152 (2022).
6. M. Mohan, V. V. Goud, and T. Banerjee, "Solubility of glucose, xylose, fructose and galactose in ionic liquids: Experimental and theoretical studies using a continuum solvation model," Fluid Phase Equilib. 395, 33–43 (2015).
7. M. Hashemkhani, R. Soleimani, H. Fazeli, M. Lee, A. Bahadori, and M. Tavalaeian, "Prediction of the binary surface tension of mixtures containing ionic liquids using support vector machine algorithms," J. Mol. Liq. 211, 534–552 (2015).
8. R. J. Obaid, H. Kotb, A. M. Alsubaiyel, J. Uddin, M. Sani Sarjad, M. Lutfor Rahman, and S. A. Ahmed, "Novel and accurate mathematical simulation of various models for accurate prediction of surface tension parameters through ionic liquids," Arabian J. Chem. 15(11), 104228 (2022).
9. J. Restolho, A. P. Serro, J. L. Mata, and B. Saramago, "Viscosity and surface tension of 1-ethanol-3-methylimidazolium tetrafluoroborate and 1-methyl-3-octylimidazolium tetrafluoroborate over a wide temperature range," J. Chem. Eng. Data 54(3), 950–955 (2009).
10. M. H. Ghatee, M. Zare, A. R. Zolghadr, and F. Moosavi, "Temperature dependence of viscosity and relation with the surface tension of ionic liquids," Fluid Phase Equilib. 291(2), 188–194 (2010).
11. P. K. Kilaru and P. Scovazzo, "Correlations of low-pressure carbon dioxide and hydrocarbon solubilities in imidazolium-, phosphonium-, and ammonium-based room-temperature ionic liquids. Part 2. Using activation energy of viscosity," Ind. Eng. Chem. Res. 47(3), 910–919 (2008).
12. T. Raj, R. Gaur, P. Dixit, R. P. Gupta, V. Kagdiyal, R. Kumar, and D. K. Tuli, "Ionic liquid pretreatment of biomass for sugars production: Driving factors with a plausible mechanism for higher enzymatic digestibility," Carbohydr. Polym. 149, 369–381 (2016).
13. K. M. Lee, J. Y. Hong, and W. Y. Tey, "Combination of ultrasonication and deep eutectic solvent in pretreatment of lignocellulosic biomass for enhanced enzymatic saccharification," Cellulose 28(3), 1513–1526 (2021).
14. A. J. Queimada, J. A. P. Coutinho, I. M. Marrucho, and J. L. Daridon, "Corresponding-states modeling of the speed of sound of long-chain hydrocarbons," Int. J. Thermophys. 27(4), 1095–1109 (2006).
15. R. L. Gardas and J. A. P. Coutinho, "Estimation of speed of sound of ionic liquids using surface tensions and densities: A volume based approach," Fluid Phase Equilib. 267(2), 188–192 (2008).
16. A. F. Estrada-Alexanders and D. Justo, "New method for deriving accurate thermodynamic properties from speed-of-sound," J. Chem. Thermodyn. 36(5), 419–429 (2004).
17. G. Járvás, J. Kontos, G. Babics, and A. Dallos, "A novel method for the surface tension estimation of ionic liquids based on COSMO-RS theory," Fluid Phase Equilib. 468, 9–17 (2018).
18. R. Wan, M. Li, F. Song, Y. Xiao, F. Zeng, C. Peng, and H. Liu, "Predicting the thermal conductivity of ionic liquids using a quantitative structure–property relationship," Ind. Eng. Chem. Res. 61(32), 12032–12039 (2022).
19. M. Mohan, M. D. Smith, O. Demerdash, B. Simmons, S. Singh, M. K. Kidder, and J. C. Smith, "Quantum chemistry-driven machine learning approach for the prediction of the surface tension and speed of sound of ionic liquids," ACS Sustainable Chem. Eng. 11(20), 7809–7821 (2023).
20. M. Mohan, O. Demerdash, B. A. Simmons, J. C. Smith, M. K. Kidder, and S. Singh, "Accurate prediction of carbon dioxide capture by deep eutectic solvents using quantum chemistry and a neural network," Green Chem. 25, 3475–3492 (2023).
21. I. U. Ekanayake, D. P. P. Meddage, and U. Rathnayake, "A novel approach to explain the black-box nature of machine learning in compressive strength predictions of concrete using Shapley additive explanations (SHAP)," Case Stud. Constr. Mater. 16, e01059 (2022).
22. D. Shi, F. Zhou, W. Mu, C. Ling, T. Mu, G. Yu, and R. Li, "Deep insights into the viscosity of deep eutectic solvents by an XGBoost-based model plus SHapley Additive exPlanation," Phys. Chem. Chem. Phys. 24(42), 26029–26036 (2022).
23. S.-P. Mousavi, S. Atashrouz, M. Nait Amar, F. Hadavimoghaddam, M.-R. Mohammadi, A. Hemmati-Sarapardeh, and A. Mohaddespour, "Modeling surface tension of ionic liquids by chemical structure-intelligence based models," J. Mol. Liq. 342, 116961 (2021).
24. Y. Huang, H. Dong, X. Zhang, C. Li, and S. Zhang, "A new fragment contribution-corresponding states method for physicochemical properties prediction of ionic liquids," AIChE J. 59(4), 1348–1359 (2013).
25. R. Haghbakhsh, S. Keshtkari, and S. Raeissi, "Simple estimations of the speed of sound in ionic liquids, with and without any physical property data available," Fluid Phase Equilib. 503, 112291 (2020).
26. Y. Xu, "Using artificial neural network to predict speed of sound and heat capacity of pure ionic liquid," ProQuest dissertation (University of Colorado, Denver, 2017), https://www.proquest.com/openview/424c5fa12f8f3ed823c920a6a9a16142/1?pq-origsite=gscholar&cbl=18750; accessed 24 April 2023.
27. R. L. Gardas and J. A. P. Coutinho, "Applying a QSPR correlation to the prediction of surface tensions of ionic liquids," Fluid Phase Equilib. 265(1–2), 57–65 (2008).
28. A. Klamt, "The COSMO and COSMO-RS solvation models," Wiley Interdiscip. Rev.: Comput. Mol. Sci. 1(5), 699–709 (2011).
29. M. Mohan, B. A. Simmons, K. L. Sale, and S. Singh, "Multiscale molecular simulations for the solvation of lignin in ionic liquids," Sci. Rep. 13(1), 271 (2023).
30. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, and V. Dubourg, "Scikit-learn: Machine learning in Python," J. Mach. Learn. Res. 12, 2825–2830 (2011).
31. M. Sharifzadeh, A. Sikinioti-Lock, and N. Shah, "Machine-learning methods for integrated renewable power generation: A comparative study of artificial neural networks, support vector regression, and Gaussian process regression," Renewable Sustainable Energy Rev. 108, 513–538 (2019).
32. M. Seeger, "Gaussian processes for machine learning," Int. J. Neural Syst. 14(2), 69–106 (2004).
33. O. Kramer, "Scikit-learn," in Machine Learning for Evolution Strategies (Springer, 2016), Vol. 20, pp. 45–53.
34. E. Schulz, M. Speekenbrink, and A. Krause, "A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions," J. Math. Psychol. 85, 1–16 (2018).
35. V. L. Deringer, A. P. Bartók, N. Bernstein, D. M. Wilkins, M. Ceriotti, and G. Csányi, "Gaussian process regression for materials and molecules," Chem. Rev. 121(16), 10073–10141 (2021).
36. K. Shahbaz, S. Baroutian, F. S. Mjalli, M. A. Hashim, and I. M. AlNashef, "Densities of ammonium and phosphonium based deep eutectic solvents: Prediction using artificial intelligence and group contribution techniques," Thermochim. Acta 527, 59–66 (2012).
37. V. Vapnik, The Nature of Statistical Learning Theory (Springer Science & Business Media, 1999).
38. J. A. K. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Process. Lett. 9(3), 293–300 (1999).
39. L. Breiman, "Random forests," Mach. Learn. 45(1), 5–32 (2001).
40. L.-Y. Yu, G.-P. Ren, X.-J. Hou, K.-J. Wu, and Y. He, "Transition state theory-inspired neural network for estimating the viscosity of deep eutectic solvents," ACS Cent. Sci. 8(7), 983–995 (2022).
41. O. D. Abarbanel and G. R. Hutchison, "Machine learning to accelerate screening for Marcus reorganization energies," J. Chem. Phys. 155(5), 054106 (2021).
42. E. Bisong, "More supervised machine learning techniques with scikit-learn," in Building Machine Learning and Deep Learning Models on Google Cloud Platform (Springer, 2019), pp. 287–308.
43. S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," in Advances in Neural Information Processing Systems (2017), Vol. 30, pp. 4768–4777.
44. J. P. Wojeicchowski, D. O. Abranches, A. M. Ferreira, M. R. Mafra, and J. A. P. Coutinho, "Using COSMO-RS to predict solvatochromic parameters for deep eutectic solvents," ACS Sustainable Chem. Eng. 9(30), 10240–10249 (2021).
45. A. Kondor, G. Járvás, J. Kontos, and A. Dallos, "Temperature dependent surface tension estimation using COSMO-RS sigma moments," Chem. Eng. Res. Des. 92(12), 2867–2872 (2014).
46. A. Alibakhshi and B. Hartke, "Improved prediction of solvation free energies by machine-learning polarizable continuum solvation model," Nat. Commun. 12(1), 3584 (2021).
47. M. Tariq, M. G. Freire, B. Saramago, J. A. P. Coutinho, J. N. C. Lopes, and L. P. N. Rebelo, "Surface tension of ionic liquids and ionic liquid solutions," Chem. Soc. Rev. 41(2), 829–868 (2012).
48. S. Panda and R. L. Gardas, "Measurement and correlation for the thermophysical properties of novel pyrrolidonium ionic liquids: Effect of temperature and alkyl chain length on anion," Fluid Phase Equilib. 386, 65–74 (2015).
49. R. Sedev, "Surface tension, interfacial tension and contact angles of ionic liquids," Curr. Opin. Colloid Interface Sci. 16(4), 310–316 (2011).
50. K. Huang, M. Mohan, A. George, B. A. Simmons, Y. Xu, and J. M. Gladden, "Integration of acetic acid catalysis with one-pot protic ionic liquid configuration to achieve high-efficient biorefinery of poplar biomass," Green Chem. 23(16), 6036–6049 (2021).
51. M. Ramos-Estrada, I. Y. López-Cortés, G. A. Iglesias-Silva, and F. Pérez-Villaseñor, "Density, viscosity, and speed of sound of pure and binary mixtures of ionic liquids based on sulfonium and imidazolium cations and bis(trifluoromethylsulfonyl) imide anion with 1-propanol," J. Chem. Eng. Data 63(12), 4425–4444 (2018).
52. M. Dzida, M. Musiał, E. Zorębski, S. Jężak, J. Skowronek, K. Malarz, A. Mrozek-Wilczkiewicz, R. Musiol, A. Cyranka, and M. Świątek, "Comparative study of the high pressure thermophysical properties of 1-ethyl-3-methylimidazolium and 1,3-diethylimidazolium ethyl sulfates for use as sustainable and efficient hydraulic fluids," ACS Sustainable Chem. Eng. 6(8), 10934–10943 (2018).
53. E. Zorębski, M. Musiał, K. Bałuszyńska, M. Zorębski, and M. Dzida, "Isobaric and isochoric heat capacities as well as isentropic and isothermal compressibilities of di- and trisubstituted imidazolium-based ionic liquids as a function of temperature," Ind. Eng. Chem. Res. 57(14), 5161–5172 (2018).
54. Y. Wei, T. Xu, X. Zhang, Y. Di, and Q. Zhang, "Thermodynamic properties and intermolecular interactions of a series of N-butylammonium carboxylate ionic liquids," J. Chem. Eng. Data 63(12), 4475–4483 (2018).
55. Z. Song, Q. Yan, M. Xia, X. Qi, Z. Zhang, J. Wei, D. Fang, and X. Ma, "Physicochemical properties of N-alkylpyridine trifluoroacetate ionic liquids [CnPy][TFA] (n = 2–6)," J. Chem. Thermodyn. 155, 106366 (2021).
56. D. L. Shrestha and D. P. Solomatine, "Machine learning approaches for estimation of prediction interval for the model output," Neural Networks 19(2), 225–235 (2006).
57. F. Tavazza, B. DeCost, and K. Choudhary, "Uncertainty prediction for machine learning models of material properties," ACS Omega 6(48), 32431–32440 (2021).
58. T. Chen, T. Chen, X. Wu, and Y. Xu, "Effects of the structure on physicochemical properties and CO2 absorption of hydroxypyridine anion-based protic ionic liquids," J. Mol. Liq. 362, 119743 (2022).

Supplementary Material