Fluorescent organic dyes are used extensively in the design and discovery of new materials, photovoltaic cells, light sensors, imaging, medicinal chemistry, drug design, energy-harvesting technologies, and the dye, pigment, and pharmaceutical industries, among other areas. However, designing and synthesizing new fluorescent organic dyes with properties suited to a specific application requires knowledge of the chemical and physical properties of previously studied molecules, and it is difficult for experimentalists to determine the photophysical properties of a target molecule quickly and at low cost. Machine learning (ML) models are therefore an attractive way to estimate photophysical properties and may offer an alternative to density functional theory. In this study, we used 15 single models and proposed three hybrid models to predict photophysical properties for a dataset of 3066 organic materials. Model performance was evaluated on held-out test data using three metrics: mean absolute error, root mean squared error, and the coefficient of determination (R²). The proposed hybrid models achieved the highest R² values, 97.28%, 95.19%, and 74.01%, for predicting absorption wavelengths, emission wavelengths, and quantum yields, respectively. These values are approximately 1.9, 2.7, and 2.4 percentage points higher than those of the best recently reported models on the same dataset. This research supports the rapid and accurate development of new fluorescent organic dyes with photophysical properties tailored to specific applications.

Fluorescent organic dye molecules have a wide range of applications in basic research (biochemistry, biophysics, chemistry, materials science, medical science, and so on), in technology and engineering (fluorescence-based sensors, lasers, solar cells, display devices, light harvesting, cryptography, and so on), and, most importantly, in industry (display systems, textiles, plastics, paints, polymers, pharmaceuticals, and many others).1–8 Most applications require the photochemical and photophysical properties of a compound to be properly assessed before it is used.9 Numerous attempts have been made to develop compounds with the desired photophysical properties, but determining these properties experimentally is time-consuming and costly. Advance knowledge of the photophysical properties of a compound of interest is therefore extremely valuable before experimental work begins. Once the target photophysical properties have been decided, scientists synthesize candidate compounds based on their knowledge and intuition, and knowledge of previously studied fluorescent organic dye molecules is essential for designing new dyes with the desired properties. The absorption and emission properties of a fluorescent dye can be characterized experimentally by the peak position, bandwidth, molar absorption coefficient, and photoluminescence quantum yield, although these measurements are challenging and time-consuming.
Density functional theory (DFT) and time-dependent DFT (TD-DFT) are widely used computational methods for predicting absorption wavelengths, emission wavelengths, Stokes shifts, band gaps, quantum yields, and other parameters.10,11 Although these methods reliably predict the photophysical properties of fluorescent organic dyes, they are expensive and time-consuming for large-scale molecular calculations. To obtain prior information about molecular properties, scientists have increasingly turned to fast computational approaches, such as machine learning (ML), artificial intelligence (AI), and deep learning (DL), which have proliferated because, combined with laboratory automation, they can drastically reduce design and experimental effort.12–18 ML-based hybrid ensemble models are therefore a promising technique for estimating photophysical properties and may offer an alternative to the established DFT and TD-DFT approaches, which in most cases carry a high computational cost. Our objective is to develop hybrid models that estimate absorption wavelengths, emission wavelengths, and quantum yields, addressing the challenge of predicting the photophysical properties of fluorescent organic dyes quickly and accurately. To this end, we curated a dataset of 3066 fluorescent organic materials in solvents. Fig. 1 shows a few selected fluorescent organic dyes from the investigated dataset.

FIG. 1.

Selected fluorescent organic dyes in the investigated dataset.


ML-based techniques have been widely employed in recent years as robust and versatile tools for uncovering correlations in large amounts of data and for accurately estimating material properties, including photophysical properties.9,19 Numerous studies have used ML to predict photophysical properties such as absorption wavelengths, emission wavelengths, quantum yields, Stokes shifts, and molar absorption coefficients. For example, Ksenofontov et al. predicted the molar absorption coefficients of 20 000 unique dye molecules from different classes, including BODIPY, azo, xanthene, cyanine, squaraine, naphthalene, oxadiazole, anthracene, pyrene, oxazine, acridine, arylmethine, and coumarin dyes.20 For one of these classes, BODIPY dyes, Gupta et al. developed data-driven models to calculate excitation energies.21 Using similar data-driven ML approaches, Zhao et al. estimated another important property, the Stokes shift.22 Han and co-workers predicted the quantum yield of carbon dots with a best accuracy of nearly 69% using an XGBoost regression model.23 Using the same model, Mai et al. predicted the absorption wavelengths of azo dyes with a highest R² of 0.87.24 Beyond conventional ML, DL techniques predicted absorption wavelengths with greater than 90% accuracy in 2022.25,26 In the same year, Senanayake et al. and Hong et al. used ML-assisted/guided carbon dot synthesis to predict emission wavelengths with accuracies of 94% and 96%, respectively.27,28 Hong et al. used principal component analysis (PCA) to improve accuracy by reducing the dimensionality of the data and removing redundancy.28 Several studies have also combined ML with other computational approaches for prediction and validation. For example, Ye et al. predicted the emission wavelengths of organic compounds using QSAR and ML-based techniques.29 In addition, Ju et al. combined TD-DFT and ML to predict the photophysical properties of over 3000 organic materials, with accuracy comparable to that of TD-DFT.30 Using the same dataset, we extend this research with hybrid ensemble models in order to improve predictive performance. Our ML-based hybrid ensemble models are intended to meet the need for fast and accurate prediction of the photophysical properties of fluorescent organic dyes.

In the present work, we employed 15 ML-based single regression models and newly proposed hybrid ensemble models, and evaluated them using the root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) to identify the most suitable model for the examined dataset. We estimated three photophysical properties, namely absorption wavelength, emission wavelength, and quantum yield, for 3066 records of fluorescent organic materials using Morgan fingerprints and RDKit molecular descriptors. To predict the absorption wavelength, we developed a hybrid ensemble model that combines two component models, multi-layer perceptron regression (MLPR) and extra tree regression (ETR); this model outperformed all other implemented models on all three metrics, with an R² of 0.9728, MAE of 8.853 nm, and RMSE of 15.025 nm on the test data (10% of the total data). To predict the emission wavelength, we developed a different hybrid ensemble model by combining light gradient boosting machine regression (LGBMR) and ETR. This model performed best, with an R² of 0.9519, MAE of 13.423 nm, and RMSE of 20.893 nm on the test data (10% of the total data). For quantum yields, evaluated with a slightly larger test set (11% of the total data), a third hybrid ensemble model combining extreme gradient boosting regression (XGBR) and LGBMR outperformed the other examined models, with an R² of 0.7401, MAE of 0.102, and RMSE of 0.151. The best R² values of the three hybrid models are approximately 1.9, 2.7, and 2.4 percentage points higher than the best values reported by Ju et al. on the same dataset (0.954, 0.925, and 0.716) for absorption wavelengths, emission wavelengths, and quantum yields, respectively.30 These findings suggest that ML-based hybrid ensemble models are useful for rapidly predicting the photophysical characteristics of fluorescent organic dyes. The proposed hybrid ensemble models can be used without specialist knowledge of fluorescent materials to estimate photophysical properties well before the desired fluorescent organic materials are synthesized in the laboratory, at low computational time and cost.
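
The improvements quoted above are absolute differences in R², that is, percentage points, rather than relative changes. The short Python sketch below makes the distinction explicit using only the R² values stated in the text; the dictionary and function names are illustrative.

```python
# Best R2 values from this work and from Ju et al. on the same dataset (values from the text).
ours = {"absorption": 0.9728, "emission": 0.9519, "quantum_yield": 0.7401}
ju_et_al = {"absorption": 0.954, "emission": 0.925, "quantum_yield": 0.716}


def improvement(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    points = (new - old) * 100
    relative = (new - old) / old * 100
    return round(points, 1), round(relative, 1)


for prop in ours:
    pp, rel = improvement(ours[prop], ju_et_al[prop])
    print(f"{prop}: +{pp} percentage points (+{rel}% relative)")
```

Expressed as relative changes, the same gains would be roughly 2.0%, 2.9%, and 3.4%, so the ∼1.9/2.7/2.4 figures correspond to the percentage-point reading.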

Figure 2 shows the workflow of the ML-based models for predicting the photophysical properties of organic dyes. The workflow proceeds sequentially through data pre-processing (data collection, cleaning, and conversion), test-size selection, the proposed methods, and finally the evaluation of the results. The data pre-processing steps are as follows:

  • The dataset was collected from the Figshare repository for the purpose of studying photophysical properties.30 

  • We removed records with missing values from the collected dataset, which is provided as a comma-separated values (CSV) file.

  • We removed duplicate records from the dataset using the pandas drop_duplicates method in Python.

  • The dataset contained both numerical and string-type data, and all of it was converted into machine-readable form. In particular, SMILES (Simplified Molecular Input Line Entry System) strings were converted into binary bit vectors using the Morgan fingerprint.
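
The two cleaning steps above can be sketched with pandas as follows. The toy DataFrame and its column names are hypothetical stand-ins for the Figshare CSV, and dropping every row that has a missing value in any column is one reading of "removed records with missing values"; the real pipeline may have restricted this to selected columns.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop records with any missing value, then exact duplicate records."""
    cleaned = df.dropna()                # rows with a missing value in any column
    cleaned = cleaned.drop_duplicates()  # keeps the first occurrence of each record
    return cleaned.reset_index(drop=True)


# Toy stand-in for the Figshare CSV; in practice: raw = pd.read_csv(path_to_csv).
# The column names are hypothetical and may not match the real file.
raw = pd.DataFrame(
    {
        "chromophore_smiles": ["c1ccccc1", "c1ccccc1", "CCO", None],
        "solvent_smiles": ["ClCCl", "ClCCl", "O", "O"],
        "absorption_nm": [260.0, 260.0, 300.0, 310.0],
        "emission_nm": [300.0, 300.0, 350.0, 360.0],
        "quantum_yield": [0.10, 0.10, 0.50, 0.20],
    }
)
clean = preprocess(raw)
print(clean)
```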

FIG. 2.

The workflow of the machine learning models.


In 2021, Ju et al. constructed a dataset of the maximum absorption wavelengths, emission wavelengths, and quantum yields of organic fluorescent materials.30 This dataset contains 4386 instances and 11 attributes: five solvatochromic parameters, three photophysical properties, the SMILES of the materials, the solvents (also given in SMILES format), and the reference from which each material's information was collected. In this work, the SMILES of the organic fluorescent materials and solvents were encoded using Morgan fingerprints, a type of molecular fingerprint that can be obtained without quantum-mechanical calculations.31,32 Morgan fingerprints are implemented in the RDKit library33 and were selected for this study because of their general applicability, computational efficiency, and suitability for the high-throughput screening of materials. A molecular fingerprint turns a molecule into a sequence of bits (0s and 1s) that can easily be compared between molecules.
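
The encoding step can be sketched with RDKit as follows. The radius of 2, the 1024-bit length, the concatenation of dye and solvent fingerprints into one input vector, and the helper names morgan_bits and featurize are assumptions for illustration; the fingerprint settings actually used are not stated in this section.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Assumed settings (hypothetical): radius 2 and a 1024-bit vector.
RADIUS = 2
N_BITS = 1024
_generator = rdFingerprintGenerator.GetMorganGenerator(radius=RADIUS, fpSize=N_BITS)


def morgan_bits(smiles: str) -> np.ndarray:
    """Encode one SMILES string as a 0/1 Morgan fingerprint vector."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    bitvect = _generator.GetFingerprint(mol)  # ExplicitBitVect of length N_BITS
    return np.array([int(bit) for bit in bitvect.ToBitString()], dtype=np.uint8)


def featurize(dye_smiles: str, solvent_smiles: str) -> np.ndarray:
    """Concatenate the dye and solvent fingerprints into one model input."""
    return np.concatenate([morgan_bits(dye_smiles), morgan_bits(solvent_smiles)])


x = featurize("c1ccc2ccccc2c1", "ClCCl")  # naphthalene in dichloromethane
print(x.shape, int(x.sum()))
```

Concatenation keeps the dye and solvent bits in separate positions, so a model can learn solvent effects independently of the chromophore structure.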

After pre-processing, 3066 organic fluorescent materials remained for analysis with the ML models. Figures 3(a)–3(d) show the distributions of molecular weight (95–1676 g/mol), absorption wavelength (260–1026 nm), emission wavelength (296–1045 nm), and quantum yield (0–1) for the examined dataset. To estimate the photophysical properties, we divided the dataset into a training set and a test set. For the absorption and emission wavelengths, 90% of the data were used for training and the remaining 10% for testing. For quantum yield prediction, 11% of the data were used for testing and the remainder for training.
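
The two split ratios can be sketched with scikit-learn's train_test_split. The placeholder arrays and random_state=42 are assumptions, since the seed and exact splitting code are not given in this section.

```python
import numpy as np
from sklearn.model_selection import train_test_split

N_RECORDS = 3066  # records remaining after pre-processing

# Placeholder features/targets; in practice X holds the fingerprints and y one property.
X = np.zeros((N_RECORDS, 8))
y = np.zeros(N_RECORDS)

# 10% test split for absorption/emission, 11% for quantum yield.
# random_state=42 is an arbitrary, hypothetical choice.
splits = {}
for name, test_size in [("wavelengths", 0.10), ("quantum_yield", 0.11)]:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    splits[name] = (len(X_train), len(X_test))
    print(f"{name}: {len(X_train)} train / {len(X_test)} test")
```

train_test_split rounds a fractional test size up, so 10% of 3066 records gives 307 test records and 11% gives 338.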

FIG. 3.

Distribution of (a) molecular weight, (b) absorption wavelength, (c) emission wavelength, and (d) quantum yield of the 3066 fluorescent organic materials.


Many regression models and algorithms are available in ML. Among the popular, well-performing predictive models are Ridge Regression (RR), Lasso Regression (LSR), Elastic Net Regression (ENR), Bayesian Ridge Regression (BRR), Automatic Relevance Determination Regression (ARDR), Decision Tree Regression (DTR), Random Forest Regression (RFR), Extra Tree Regression (ETR), AdaBoost Regression (ABR), Gradient Boosting Regression (GBR), Extreme Gradient Boosting Regression (XGBR), CatBoost Regression (CBR), Light Gradient Boosting Machine Regression (LGBMR), K-Nearest Neighbors Regression (KNNR), and Multi-Layer Perceptron Regression (MLPR). These 15 single regression models were chosen to predict the photophysical properties of organic dyes in this study, and this article compares their performance. Details on the theory and hyper-parameters of these models can be found in the supplementary material.
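
A comparison of this kind can be sketched as a loop over scikit-learn estimators scored by R² on a held-out split. This is an illustration on synthetic data with library defaults, not the tuned models of this study. XGBR, CBR, and LGBMR are omitted because they need the third-party xgboost, catboost, and lightgbm packages; their scikit-learn-compatible XGBRegressor, CatBoostRegressor, and LGBMRegressor classes could be added to the same dictionary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.linear_model import ARDRegression, BayesianRidge, ElasticNet, Lasso, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

# Library defaults (plus fixed seeds) stand in for the tuned hyper-parameters.
MODELS = {
    "RR": Ridge(),
    "LSR": Lasso(),
    "ENR": ElasticNet(),
    "BRR": BayesianRidge(),
    "ARDR": ARDRegression(),
    "DTR": DecisionTreeRegressor(random_state=0),
    "RFR": RandomForestRegressor(random_state=0),
    "ETR": ExtraTreesRegressor(random_state=0),
    "ABR": AdaBoostRegressor(random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
    "KNNR": KNeighborsRegressor(),
    "MLPR": MLPRegressor(max_iter=2000, random_state=0),
}


def benchmark(X_train, X_test, y_train, y_test):
    """Fit every model and return test-set R2 scores, best first."""
    scores = {}
    for name, model in MODELS.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Synthetic regression data, used here only so the sketch runs end to end.
X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)
scores = benchmark(X_tr, X_te, y_tr, y_te)
for name, r2 in scores.items():
    print(f"{name:5s} R2 = {r2:.3f}")
```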

In machine learning, a sequential hybrid ensemble model is formed when two or more models are applied in sequence to a specific problem. Such models aim to improve the overall performance and robustness of the system by exploiting the strengths of each individual model while mitigating its weaknesses. Hybrid ensembles can take different forms, such as stacking, cascading, and fine-tuning. In this study, we developed hybrid ensemble models by combining single models with a cascade learning approach. Cascading is effective for nonlinear regression problems because later stages refine the initial estimates produced by earlier stages, which can increase prediction accuracy. Hybrid ensemble models have been reported to outperform single ML models in several applications, including materials science research.34–36

Figure 4 depicts the framework of the hybrid ensemble models: the output of model 1 is combined with the training dataset to generate an extended dataset, and model 2 is then trained on the extended dataset to produce the final predictions. The prediction therefore comes from two sequential (cascaded) models, where the output of one model serves as an input to the next. This design can improve model performance on complex ML problems. To estimate the three photophysical properties, we constructed three hybrid ensemble models: (i) MLPR+ETR for absorption wavelengths, (ii) LGBMR+ETR for emission wavelengths, and (iii) XGBR+LGBMR for quantum yields. In each case, the first-named model serves as model 1 (MLPR, LGBMR, and XGBR, respectively) and the second as model 2 (ETR, ETR, and LGBMR, respectively).
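
A minimal sketch of the two-stage cascade in Fig. 4, written as a scikit-learn-style estimator. CascadeRegressor is a hypothetical name, not the authors' code; the MLPR+ETR pairing mirrors the absorption-wavelength model, but the data are synthetic and the hyper-parameters are assumptions. Following the description above, model 2 is trained on model 1's predictions for the training set itself.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor


class CascadeRegressor(RegressorMixin, BaseEstimator):
    """Two-stage cascade: model 2 is trained on the features plus model 1's prediction."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def _extend(self, X):
        # Append model 1's prediction as one extra feature column (the "extended dataset").
        return np.column_stack([X, self.first_.predict(X)])

    def fit(self, X, y):
        self.first_ = clone(self.first).fit(X, y)
        self.second_ = clone(self.second).fit(self._extend(X), y)
        return self

    def predict(self, X):
        return self.second_.predict(self._extend(X))


# Illustrative MLPR+ETR pairing on synthetic data (not the dye dataset).
X, y = make_regression(n_samples=400, n_features=5, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)
hybrid = CascadeRegressor(
    MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    ExtraTreesRegressor(n_estimators=200, random_state=0),
).fit(X_tr, y_tr)
print(f"test R2 = {hybrid.score(X_te, y_te):.3f}")
```

Because model 1's training-set predictions are usually more accurate than its predictions on unseen data, model 2 may over-trust the extra column. A common alternative, not stated in the paper, is to build the extended training set from out-of-fold predictions obtained with sklearn.model_selection.cross_val_predict.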

FIG. 4.

The framework of machine learning-based hybrid ensemble models.

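The implemented regression models were evaluated using MAE, RMSE, and R², defined as

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|, \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}, \tag{2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}, \tag{3}$$

where $y_i$, $\hat{y}_i$, and $\bar{y}$ are the actual values, the predicted values, and the mean of the actual values, respectively, and $n$ is the number of samples.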
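
Eqs. (1)–(3) translate directly into NumPy. This sketch is for illustration and should agree with scikit-learn's mean_absolute_error and r2_score, and with the square root of mean_squared_error.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (1)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (3); y-bar is the mean of the actual values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))  # 0.25 0.5 0.8
```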

The spectroscopic behavior of fluorescent organic dyes in different solvents was studied in order to predict photophysical properties such as absorption wavelengths, emission wavelengths, and quantum yields. The absorption properties of a fluorescent dye in a solvent environment are defined by its absorption peak position, bandwidth, and molar absorption coefficient, whereas its emission properties are characterized by the emission peak position, bandwidth, and photoluminescence quantum yield. The excited-state lifetime is another important property of a fluorescent dye and is relevant to specific applications such as dynamic quenching and fluorescence imaging. The photophysical properties of 3066 fluorescent materials in different solvent environments were used as prediction targets for the implemented models. As model inputs, the molecular structures of the fluorescent organic materials and solvents were encoded using Morgan fingerprints, which provide material information without any quantum-mechanical calculation. Chemical and structural information about each molecule is recorded in binary form (0 and 1), allowing molecules with similar or dissimilar fingerprints to be compared.31,32 Functional groups in molecular skeletons may enhance spin–orbit coupling, and rotatable bonds reflect the positions of these functional groups. The circular and substructure fingerprints consequently capture the positions of functional groups, which are closely related to singlet–triplet gaps, intersystem crossing, and the radiative and non-radiative transitions of excited states.37,38 Excited-state phenomena are also sensitive to the solvent, especially for molecules with charge-transfer character, as reflected in solvatochromic parameters (ET(30), SP, SdP, SA, and SB).39,40 The choice of solvent can substantially affect a material's optical and photophysical properties,41 because the electronic structure, energy levels, and transition probabilities of a material may change in different solvents. Solvent effects are especially important for environment-sensitive compounds, such as some dyes. Molecular size also matters: materials with higher molecular weights have lower-frequency vibrational modes that interact with electronic transitions, and they may show stronger intermolecular interactions or aggregation effects that affect absorption, emission, and quantum yield.

We employed 15 single regression models and a hybrid model to predict absorption wavelengths. The models were trained on the studied dataset and showed strong prediction performance for absorption wavelengths on the test data. Table I reports the evaluation metrics of the implemented models on the test data, and the corresponding plots of actual versus predicted values are provided in the supplementary material (Figs. S1–S16). Figures 5, 6, and 7 show the MAE, RMSE, and R² values, respectively, of the models for maximum absorption wavelength prediction. The proposed hybrid model (MLPR+ETR) achieved the best results, with an MAE of 8.853 nm, RMSE of 15.025 nm, and R² of 0.9728 (97.28%), followed by ETR with an MAE of 9.239 nm, RMSE of 16.433 nm, and R² of 0.9674 (96.74%). ETR was therefore the best-performing single model on all three evaluation metrics. Of the 15 single models, ten achieved an R² above 0.90, three between 0.80 and 0.90, and two below 0.80. These results demonstrate that a hybrid model combining two component models (MLPR and ETR) is a suitable method for accurately predicting absorption wavelengths, one of the key photophysical properties of organic fluorescent materials. Figure 8 shows the correlation between the predicted and actual absorption wavelengths of the test dataset for the hybrid model.
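
The model counts quoted above can be checked directly from Table I. This short sketch, with the hypothetical names ABSORPTION_R2, JU_BEST_R2, and tally, bins the single-model R² values and lists the models that exceed the best R² of 0.954 reported by Ju et al. on the same dataset.

```python
# R2 of the 15 single models for absorption wavelength, copied from Table I.
ABSORPTION_R2 = {
    "RR": 0.9628, "LSR": 0.8347, "ENR": 0.7905, "BRR": 0.9626, "ARDR": 0.9568,
    "DTR": 0.8715, "RFR": 0.9470, "ETR": 0.9674, "ABR": 0.7826, "GBR": 0.9164,
    "XGBR": 0.9477, "CBR": 0.9589, "LGBMR": 0.9517, "KNNR": 0.8155, "MLPR": 0.9589,
}
JU_BEST_R2 = 0.954  # best absorption R2 reported by Ju et al. (GBRT)


def tally(r2_by_model, baseline):
    """Count models per R2 band and list those that beat a baseline R2, best first."""
    values = list(r2_by_model.values())
    bands = {
        "above 0.90": sum(r > 0.90 for r in values),
        "0.80-0.90": sum(0.80 < r <= 0.90 for r in values),
        "below 0.80": sum(r <= 0.80 for r in values),
    }
    beats = sorted((m for m, r in r2_by_model.items() if r > baseline),
                   key=lambda m: -r2_by_model[m])
    return bands, beats


bands, beats = tally(ABSORPTION_R2, JU_BEST_R2)
print(bands)
print(beats)
```

Applied to the Table II values with a baseline of 0.925, the same function gives 9, 4, and 2 models per band and five models above the baseline (LGBMR, ETR, XGBR, CBR, and RFR), matching the emission-wavelength discussion.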

TABLE I.

Performance of implemented models for absorption wavelength prediction.

Model name    MAE (nm)   RMSE (nm)   R²
RR            11.617     17.570      0.9628
LSR           26.809     37.014      0.8347
ENR           31.189     41.670      0.7905
BRR           11.625     17.602      0.9626
ARDR          11.977     18.923      0.9568
DTR           16.605     32.633      0.8715
RFR           11.815     20.962      0.9470
ETR            9.239     16.433      0.9674
ABR           34.000     42.450      0.7826
GBR           19.070     26.324      0.9164
XGBR          13.767     20.831      0.9477
CBR           12.094     18.457      0.9589
LGBMR         13.265     20.004      0.9517
KNNR          24.593     39.107      0.8155
MLPR          12.714     18.459      0.9589
MLPR+ETR       8.853     15.025      0.9728

FIG. 5.

The proposed models’ MAE results for absorption wavelength prediction.

FIG. 6.

The proposed models’ RMSE results for absorption wavelength prediction.

FIG. 7.

The proposed models’ R² results for absorption wavelength prediction.

FIG. 8.

Correlation plots of absorption wavelengths on the test dataset for the hybrid model (MLPR+ETR).


Here, we applied several types of base models, including linear, tree-based, neural network, and boosting models. The linear regression (LR)-based models (RR, LSR, ENR, BRR, and ARDR) are all regularized methods that aim to prevent overfitting by adding a penalty term to the LR objective function. Among the LR-based models, LSR and ENR did not give satisfactory results for absorption wavelength (R² values of 0.8347 and 0.7905, respectively), likely because they cannot capture complex nonlinear relationships in high-dimensional spaces. Some level of feature preselection is often necessary before applying LSR and ENR to achieve better performance.42 A DTR consists of a root node (decision node) and leaf nodes (terminal nodes),43 and its prediction depends on how the tree is split.44 RFR and ETR, in contrast, are ensembles of decision trees: RFR grows each tree on a bootstrap sample drawn at random from the examined dataset,45 whereas ETR additionally randomizes the split thresholds and, in its standard form, grows each tree on the full training set. Accordingly, the single DTR model reached an R² of 0.8715 for absorption wavelength prediction, whereas RFR and ETR reached 0.9470 and 0.9674, respectively. ETR is also generally faster to train than RFR. MLPR, a type of neural network, also performed well, with an R² of 0.9589; neural networks generally need a large amount of training data to perform well.46 Another well-established regression model, KNNR, did not perform well on the examined dataset (R² of 0.8155). KNNR predictions are sensitive to the choice of k and to noisy data and outliers in the training set, and in high-dimensional feature spaces the distances between samples become less informative (the curse of dimensionality), which further reduces accuracy. In addition to these models, we applied five boosting models, ABR, GBR, XGBR, CBR, and LGBMR, whose R² values were 0.7826, 0.9164, 0.9477, 0.9589, and 0.9517, respectively. Notably, ABR was the lowest-performing model among both the implemented models and the boosting models, with an R² of 0.7826, MAE of 34 nm, and RMSE of 42.45 nm. ABR is highly sensitive to outliers and noise in the data, so improving its accuracy would require removing them before training. With such cleaned data, ABR can deliver better regression performance by iteratively modifying the instance weights and combining numerous weak models.
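
The penalty terms mentioned above differ between the three regularized linear models. Written in the form used by scikit-learn (an assumption; the implementation is not stated in this section), with $\mathbf{w}$ the coefficient vector, $X$ the feature matrix, $n$ the number of samples, $\alpha \ge 0$ the regularization strength, and $\rho \in [0, 1]$ the elastic-net mixing ratio:

```latex
\begin{aligned}
\text{RR:}\quad  & \min_{\mathbf{w}} \; \lVert \mathbf{y} - X\mathbf{w} \rVert_2^2
                   + \alpha \lVert \mathbf{w} \rVert_2^2 \\
\text{LSR:}\quad & \min_{\mathbf{w}} \; \frac{1}{2n}\lVert \mathbf{y} - X\mathbf{w} \rVert_2^2
                   + \alpha \lVert \mathbf{w} \rVert_1 \\
\text{ENR:}\quad & \min_{\mathbf{w}} \; \frac{1}{2n}\lVert \mathbf{y} - X\mathbf{w} \rVert_2^2
                   + \alpha\rho \lVert \mathbf{w} \rVert_1
                   + \frac{\alpha(1-\rho)}{2} \lVert \mathbf{w} \rVert_2^2
\end{aligned}
```

The $\ell_1$ term can set coefficients exactly to zero, so LSR and ENR perform implicit feature selection, whereas RR only shrinks coefficients. With high-dimensional fingerprint inputs, an overly strong $\alpha$ could discard informative bits; this is one possible contributor to the lower LSR and ENR scores, not something tested here.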

Compared with the recent results of Ju et al. on the same dataset, who reported a highest R² of 0.954 using a gradient boosted regression trees (GBRT) model, six of our single models (ETR, RR, BRR, CBR, MLPR, and ARDR) and the developed hybrid model achieved higher accuracy for predicting absorption wavelengths.30 Comparisons with studies on other datasets are less direct. In 2022, Mai et al. predicted the maximum absorption wavelengths of an azo dye dataset and obtained an R² of 0.87 using the XGBoost model,24 about 0.10 lower than that of our proposed hybrid model. Ksenofontov et al. studied a BODIPY dye dataset using the RDKit descriptor, the same descriptor as in the current study, and obtained an R² of 0.93, which is lower than the highest R² of 0.97 of the proposed model.26 Shao et al. applied deep learning algorithms with different molecular fingerprints and obtained their lowest MAE of 9.54 nm by combining Morgan and MACCS fingerprints, whereas our best MAE was 8.853 nm for the proposed hybrid model using Morgan fingerprints alone. For a comparison based on the same fingerprint type, Shao et al. reported MAEs with Morgan fingerprints of 10.57 nm for a fully connected neural network (FCNN), 13.13 nm for a convolutional neural network (CNN), and 14.42 nm for ChemFluo; the MAE obtained in the current work is lower by approximately 2, 4, and 6 nm, respectively.25

We employed 15 single regression models and a hybrid model to predict emission wavelengths. The models were trained on the examined dataset and showed strong prediction performance for emission wavelengths on the test data. We analyzed and compared their results using three evaluation metrics: MAE, RMSE, and R². The results are listed in Table II, and the corresponding plots of actual versus predicted values are provided in the supplementary material (Figs. S17–S32). Figure 9 shows the MAE values of the models for maximum emission wavelength prediction. The lowest MAE of 13.423 nm was obtained by the proposed hybrid model (LGBMR+ETR), followed by ETR with an MAE of 14.205 nm. Figures 10 and 11 show the corresponding RMSE and R² results. The best RMSE of 20.893 nm and R² of 0.9519 (95.19%) were obtained by the proposed hybrid model (LGBMR+ETR), followed by LGBMR with an RMSE of 22.238 nm and R² of 0.9455 (94.55%). Thus, among the single models, LGBMR gave the best RMSE and R², whereas ETR gave the lowest MAE. Of the 15 single models, nine achieved an R² above 0.90, four between 0.80 and 0.90, and two below 0.80. These results demonstrate that a hybrid model combining two component models (LGBMR and ETR) is a suitable method for accurately predicting emission wavelengths, another key photophysical property of organic fluorescent materials. Figure 12 shows the correlation between the predicted and actual emission wavelengths of the test dataset for the hybrid model.

TABLE II.

Performance of implemented models for emission wavelength prediction.

Model name    MAE (nm)   RMSE (nm)   R²
RR            20.145     27.132      0.9188
LSR           32.894     42.026      0.8053
ENR           35.234     45.364      0.7731
BRR           20.354     27.374      0.9174
ARDR          21.271     29.863      0.9017
DTR           21.661     36.736      0.8512
RFR           15.892     24.397      0.9344
ETR           14.205     22.478      0.9443
ABR           37.590     46.459      0.7620
GBR           22.687     30.419      0.8980
XGBR          16.006     23.578      0.9387
CBR           16.273     23.609      0.9385
LGBMR         15.151     22.238      0.9455
KNNR          23.261     36.326      0.8545
MLPR          19.176     26.471      0.9227
LGBMR+ETR     13.423     20.893      0.9519

FIG. 9.

The proposed models’ MAE results for emission wavelength prediction.

FIG. 10.

The proposed models’ RMSE results for emission wavelength prediction.

FIG. 11.

The proposed models’ R² results for emission wavelength prediction.

FIG. 12.

Correlation plots of emission wavelengths on the test dataset for the hybrid model (LGBMR+ETR).


When the LR-based models were compared, LSR and ENR again did not produce satisfactory results for emission wavelength (R² values of 0.8053 and 0.7731, respectively), perhaps because they cannot accurately represent complex nonlinear relationships in a high-dimensional dataset.42 Among the tree-based models, the ensemble models (ETR and RFR) outperformed DTR because they aggregate many randomized decision trees (built on bootstrap samples in the case of RFR) rather than relying on a single tree.45 In addition, we used five boosting models, ABR, GBR, XGBR, CBR, and LGBMR, with R² values of 0.7620, 0.8980, 0.9387, 0.9385, and 0.9455, respectively. In terms of R², LGBMR performed best among the boosting models and among all the single models. A grid search was used to determine the optimal values of its hyper-parameters, namely the number of estimators, learning rate, tree depth, and regularization parameters, to improve the efficiency and performance of the LGBMR model.22 Notably, ABR was again the lowest-performing model, because it is very sensitive to outliers and noise in the data. A further factor is its reweighting scheme: at each iteration, ABR increases the weights of instances that were predicted with larger errors in the previous iteration. This emphasis on difficult-to-predict instances directs succeeding weak models to focus on those instances and attempt to correct the earlier errors, but when those instances are noisy or outliers, the emphasis can degrade overall performance.

Ju et al. recently reported a highest R2 value of 0.925 on the same dataset using the GBRT model. Five of our single models (LGBMR, ETR, XGBR, CBR, and RFR) and the developed hybrid model achieved better accuracy than this for predicting emission wavelengths.30 We also compared our results with emission wavelength predictions on other datasets. In 2022, Senanayake et al. predicted the emission color and wavelength of carbon dots and obtained lowest MAEs of 19.4, 36.2, and 25.8 nm on three different test sets (test 1, test 2, and test 3, respectively). These values are approximately 6, 23, and 12 nm higher than the MAE of 13.423 nm obtained by our proposed hybrid model.27 In the same year, Hong et al. predicted the emission centers of carbon dots with an R2 value above 0.96 by using principal component analysis (PCA), which reduces the dimensionality of the data, and achieved better accuracy than traditional ML approaches.28 Ye et al. predicted the emission wavelengths of 11 460 experimentally synthesized fluorescent organic molecules and achieved a highest accuracy of 92%, about 3 percentage points lower than our result.29
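The gaps quoted in this comparison are simple differences between reported values. The snippet below recomputes them from the numbers given in the text. Note that results obtained on different datasets (carbon dots versus organic dyes) are only loosely comparable.

```python
# Emission-wavelength comparisons quoted in the text (values copied from it).
hybrid_mae_nm = 13.423  # proposed hybrid model for emission wavelength
senanayake_mae_nm = {"test 1": 19.4, "test 2": 36.2, "test 3": 25.8}

# MAE gap in nm: a positive value means the hybrid model's error is smaller.
mae_gap_nm = {name: round(mae - hybrid_mae_nm, 1)
              for name, mae in senanayake_mae_nm.items()}

# Accuracy gap in percentage points: hybrid R2 (95.19%) vs. Ye et al. (92%).
hybrid_r2_pct = 95.19
ye_gap_pp = round(hybrid_r2_pct - 92.0, 2)

print(mae_gap_nm)
print(ye_gap_pp)
```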

Apart from absorption and emission wavelengths, another important photophysical parameter is the quantum yield, which is the most important factor in determining the fluorescence intensity of fluorescent organic dyes. However, the literature on quantum yield prediction is still very limited because the factors that govern it are difficult to capture in a single dataset. The quantum yield depends on the deactivation processes of the excited state in the material, which are characterized by the fluorescence lifetime, the radiative and non-radiative transition rates, and other parameters. Synthesis (tuning) parameters such as reaction temperature, precursor mass, ramp rate, and reaction time can also influence the quantum yield. In phosphorescent materials, the quantum yield also depends on the phosphorescence lifetime, because phosphorescence is emission that is delayed after excitation.
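The dependence on the radiative and non-radiative rates can be made explicit with the standard relation for the fluorescence quantum yield. It is written here under the usual assumption of first-order decay from a single emitting state. The symbols k_r, k_nr, and τ_F are introduced only for this illustration.

```latex
\Phi_F = \frac{k_r}{k_r + k_{nr}} = k_r\,\tau_F,
\qquad
\tau_F = \frac{1}{k_r + k_{nr}}
```

For example, k_r = k_nr = 1 × 10^8 s^-1 gives Φ_F = 0.5 and τ_F = 5 ns. Any structural change that speeds up non-radiative decay lowers Φ_F, even if k_r is unchanged.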

For estimating the quantum yield, we employed 15 single regression models and a developed hybrid model. Their results are listed in Table III, and plots of actual versus predicted values are available in the supplementary material (Figs. S33–S48). Figure 13 depicts the MAE values of the proposed models for quantum yield prediction. The lowest MAE of 0.102 was obtained by the proposed hybrid model (XGBR+LGBMR), followed by ETR with an MAE of 0.104. Figures 14 and 15 show the RMSE and R2 results of the proposed models for quantum yield prediction. The best RMSE of 0.151 was obtained by both CBR and the proposed hybrid model (XGBR+LGBMR). The best R2 of 74.01% was obtained by the proposed hybrid model, followed by CBR with an R2 of 73.99%. Among the single models examined, CBR had the best RMSE and R2 for predicting the quantum yield. CBR is a gradient-boosting method suited to regression tasks, particularly those involving structured (tabular) input. Its effectiveness depends on the settings of parameters such as the learning rate, tree depth, regularization parameters, and number of iterations.
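The three evaluation metrics reported in Table III follow standard definitions. The sketch below is a plain-Python version of those definitions, matching in intent scikit-learn’s `mean_absolute_error`, the square root of `mean_squared_error`, and `r2_score`. The function name `regression_metrics` and the sample values are hypothetical quantum yields, not data from this study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) for paired lists of actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Hypothetical actual vs. predicted quantum yields.
print(regression_metrics([0.1, 0.4, 0.7, 0.9], [0.2, 0.35, 0.6, 0.95]))
```

Because R2 normalizes the squared error by the variance of the actual values, two properties with similar RMSE can have very different R2 scores.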

TABLE III.

Performance of implemented models for quantum yield prediction.

Model name MAE RMSE R2
RR 0.150 0.200 0.5447 
LSR 0.241 0.280 0.1091 
ENR 0.239 0.279 0.1184 
BRR 0.143 0.189 0.5932 
ARDR 0.162 0.214 0.4793 
DTR 0.130 0.214 0.4794 
RFR 0.113 0.159 0.7123 
ETR 0.104 0.154 0.7300 
ABR 0.236 0.269 0.1770 
GBR 0.164 0.199 0.5528 
XGBR 0.109 0.152 0.7383 
CBR 0.112 0.151 0.7399 
LGBMR 0.112 0.155 0.7280 
KNNR 0.175 0.240 0.3471 
MLPR 0.149 0.195 0.5682 
XGBR+LGBMR 0.102 0.151 0.7401 
FIG. 13.

The proposed models’ MAE results for quantum yield prediction.

FIG. 14.

The proposed models’ RMSE results for quantum yield prediction.

FIG. 15.

The proposed models’ R2 results for quantum yield prediction.


Of the 15 single models, nine achieved an R2 value above 0.50, two between 0.40 and 0.50, and four below 0.40. Our proposed hybrid model achieved the highest accuracy for predicting the quantum yield on the examined test data (11% of the total data), with an R2 of 0.7401, an MAE of 0.102, and an RMSE of 0.151. These results indicate that a hybrid model combining two constituent models (XGBR and LGBMR) is a suitable method for predicting the quantum yield, one of the important photophysical properties of organic fluorescent materials, although its margin over the best single model (CBR, R2 = 0.7399) is small. Figure 16 shows the correlation between the predicted and actual quantum yields on the test dataset for the hybrid model. Figure 15 shows that the R2 values of ABR, ENR, and LSR are low (0.1770, 0.1184, and 0.1091, respectively), meaning that these models explain only a small fraction of the variance in the quantum yield. Notably, the examined dataset does not contain radiative and non-radiative transition parameters, so these could not be considered. In addition, synthesis (tuning) parameters are not included as predictors of the quantum yield. Quantum yields are strongly related to the radiative and non-radiative transitions of molecules, which depend on the functional groups connected to the molecular skeleton and on rotatable bonds. Small shifts in the positions of the functional groups can therefore affect intersystem crossing, singlet–triplet gaps, and spin–orbit coupling strength. Capturing such effects requires three-dimensional structural information, which the 2D Morgan fingerprints used in the current study cannot provide. These limitations are the likely causes of the weaker performance of the models for quantum yield prediction.
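The rule by which the hybrid model combines XGBR and LGBMR is not restated in this section. One common choice, assumed here purely for illustration, is a (weighted) average of the constituent regressors’ predictions, which is what scikit-learn’s `VotingRegressor` does. The function `hybrid_predict`, its `weight_a` parameter, and the prediction lists below are hypothetical stand-ins for fitted XGBR and LGBMR outputs.

```python
def hybrid_predict(pred_a, pred_b, weight_a=0.5):
    """Blend two regressors' predictions: weight_a * a + (1 - weight_a) * b.

    weight_a = 0.5 gives a plain average. This averaging rule is an
    assumption for illustration, not necessarily the paper's hybrid scheme.
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical quantum-yield predictions from two fitted models.
xgbr_pred = [0.30, 0.55, 0.80]
lgbmr_pred = [0.40, 0.45, 0.70]
print(hybrid_predict(xgbr_pred, lgbmr_pred))
```

Averaging reduces the error variance only to the extent that the two models’ errors are not perfectly correlated. Two strong boosting models trained on the same features may therefore gain only a little from being combined.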

FIG. 16.

Correlation plots of quantum yield on the test dataset for the hybrid model (XGBR+LGBMR).


Ju et al. recently reported a highest R2 value of 0.716 on the same dataset using the LGBMR model, whereas our LGBMR obtained 0.728, which is higher by 0.012. Four of our single models (CBR, XGBR, ETR, and LGBMR) and the developed hybrid model exceeded the value of 0.716 reported by Ju et al. for predicting quantum yields.30 We also compared our results with quantum yield predictions on another dataset. In 2020, Han et al. predicted the enhanced quantum yield of carbon dots and obtained a highest R2 value of 0.69 using the XGBR model.23 This value is approximately 0.05 lower than those of both our XGBR model and our best model, the hybrid model.

Motivated by the challenge of estimating the photophysical properties of fluorescent organic dyes early and accurately, we developed ML-based hybrid ensemble models that provide this information quickly and at negligible cost in time and money. We used a dataset of 3066 organic fluorescent materials and evaluated the implemented models on the held-out test data. Our proposed hybrid models, (MLPR+ETR), (LGBMR+ETR), and (XGBR+LGBMR), achieved the highest accuracies (R2) of 97.28%, 95.19%, and 74.01% for predicting the absorption wavelengths, emission wavelengths, and quantum yields, respectively. These values are ∼1.9, ∼2.7, and ∼2.4 percentage points higher than those of the best models reported by Ju et al. on the same dataset for absorption wavelengths, emission wavelengths, and quantum yields, respectively.30 This study establishes a relationship between the chemical structure of organic fluorescent materials and their photophysical properties, which can speed up the synthesis of fluorescent organic compounds with the desired photophysical properties.
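The improvements over Ju et al. are absolute differences in R2, that is, percentage points. Relative gains are somewhat larger. The sketch below makes this distinction explicit for the two properties whose reference values are given in this section: R2 = 0.925 for emission wavelength and 0.716 for quantum yield. The function name `r2_improvement` is hypothetical.

```python
def r2_improvement(new_pct, ref_pct):
    """Return (gain in percentage points, gain relative to the reference in %)."""
    gain_pp = new_pct - ref_pct
    gain_rel = 100.0 * gain_pp / ref_pct
    return round(gain_pp, 2), round(gain_rel, 2)


# R2 in percent: proposed hybrid models vs. best models reported by Ju et al.
comparisons = {
    "emission wavelength": (95.19, 92.5),  # LGBMR+ETR vs. GBRT
    "quantum yield": (74.01, 71.6),        # XGBR+LGBMR vs. LGBMR
}
for prop, (ours, ref) in comparisons.items():
    pp, rel = r2_improvement(ours, ref)
    print(f"{prop}: +{pp} percentage points (+{rel}% relative)")
```

For the quantum yield, for instance, a gain of about 2.4 percentage points corresponds to roughly a 3.4% relative improvement.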

For follow-up research, we encourage applying the proposed models to additional datasets to further improve the performance of the regression models and to assist scientists in drug design and in the development of new organic dyes.

The supplementary material describes the theory of the predictive regression models. It also shows the correlation plots of each regression model on the test dataset for absorption wavelength, emission wavelength, and quantum yield prediction, and it provides the hyper-parameter details of the models.

This work was supported by CSIR, India. K.D.M. acknowledges the CSIR NET-JRF fellowship [File No. 09/1277(0001)/2019-EMR-I] for financial support.

The authors have no conflicts to disclose.

Kapil Dev Mahato: Conceptualization (lead); Data curation (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). S. S. Gourab Kumar Das: Data curation (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Writing – review & editing (equal). Chandrashekhar Azad: Methodology (equal); Resources (equal); Software (equal); Supervision (equal); Validation (equal); Visualization (equal); Writing – review & editing (equal). Uday Kumar: Project administration (equal); Resources (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal).

1. K. D. Mahato and U. Kumar, “A review of organic dye based nanoparticles: Preparation, properties, and engineering/technical applications,” Mini-Rev. Org. Chem. 20(7), 655–674 (2023).
2. W. Cheng, H. Chen, C. Liu, C. Ji, G. Ma, and M. Yin, “Functional organic dyes for health-related applications,” View 1(4), 20200055 (2020).
3. A. Tkaczyk, K. Mitrowska, and A. Posyniak, “Synthetic organic dyes as contaminants of the aquatic environment and their implications for ecosystems: A review,” Sci. Total Environ. 717, 137222 (2020).
4. C. Ji, L. Lai, P. Li, Z. Wu, W. Cheng, and M. Yin, “Organic dye assemblies with aggregation-induced photophysical changes and their bio-applications,” Aggregate 2(4), e39 (2021).
5. N. Tomar, G. Rani, V. S. Dhaka, and P. K. Surolia, “Role of artificial neural networks in predicting design and efficiency of dye sensitized solar cells,” Int. J. Energy Res. 46(9), 11556–11573 (2022).
6. K. D. Mahato and U. Kumar, “A comparative study of conventional FRET and light harvesting properties of Rh-110/Rh-6G and Rh-19/Rh-B organic dye pairs impregnated in sol-gel glasses,” Methods Appl. Fluoresc. 11(3), 035003 (2023).
7. Y. Cai, W. Si, W. Huang, P. Chen, J. Shao, and X. Dong, “Organic dye based nanoparticles for cancer phototheranostics,” Small 14(25), 1704247 (2018).
8. W. W. Bao, R. Li, Z. C. Dai, J. Tang, X. Shi, J. T. Geng, Z. F. Deng, and J. Hua, “Diketopyrrolopyrrole (DPP)-based materials and its applications: A review,” Front. Chem. 8, 679 (2020).
9. J. Li, M. Vacher, P. O. Dral, and S. A. Lopez, Theoretical and Computational Photochemistry: Fundamentals, Methods, Applications and Synergy with Experimental Approaches (Elsevier, 2023), pp. 163–189.
10. N. Agnihotri and R. P. Steer, “Time dependent DFT investigation of the optical properties of artificial light harvesting special pairs,” Phys. Chem. Chem. Phys. 18(22), 15337–15351 (2016).
11. P. Rybczyński, M. H. E. Bousquet, A. Kaczmarek-Kędziera, B. Jędrzejewska, D. Jacquemin, and B. Ośmiałowski, “Controlling the fluorescence quantum yields of benzothiazole-difluoroborates by optimal substitution,” Chem. Sci. 13(45), 13347–13360 (2022).
12. B. Dou, Z. Zhu, E. Merkurjev, L. Ke, L. Chen, J. Jiang, Y. Zhu, J. Liu, B. Zhang, and G. Wei, “Machine learning methods for small data challenges in molecular science,” Chem. Rev. 123(13), 8736–8780 (2023).
13. J. Westermayr and P. Marquetand, “Machine learning for electronically excited states of molecules,” Chem. Rev. 121(16), 9873–9926 (2021).
14. P. O. Dral and M. Barbatti, “Molecular excited states through a machine learning lens,” Nat. Rev. Chem. 5(6), 388–405 (2021).
15. J. Westermayr and P. Marquetand, “Machine learning and excited-state molecular dynamics,” Mach. Learn.: Sci. Technol. 1(4), 043001 (2020).
16. K. Choudhary, B. DeCost, C. Chen, A. Jain, F. Tavazza, R. Cohn, C. W. Park, A. Choudhary, A. Agrawal, S. J. L. Billinge, E. Holm, S. P. Ong, and C. Wolverton, “Recent advances and applications of deep learning methods in materials science,” npj Comput. Mater. 8(1), 59 (2022).
17. Z. J. Baum, X. Yu, P. Y. Ayala, Y. Zhao, S. P. Watkins, and Q. Zhou, “Artificial intelligence in chemistry: Current trends and future directions,” J. Chem. Inf. Model. 61(7), 3197–3212 (2021).
18. H. Abroshan, P. Winget, H. S. Kwak, Y. An, C. T. Brown, and M. D. Halls, Machine Learning in Materials Informatics: Methods and Applications (ACS Publications, 2022), pp. 33–49.
19. F. Musil, A. Grisafi, A. P. Bartók, C. Ortner, G. Csányi, and M. Ceriotti, “Physics-inspired structural representations for molecules and materials,” Chem. Rev. 121(16), 9759–9815 (2021).
20. A. A. Ksenofontov, M. M. Lukanov, and P. S. Bocharov, “Can machine learning methods accurately predict the molar absorption coefficient of different classes of dyes?,” Spectrochim. Acta, Part A 279, 121442 (2022).
21. A. Gupta, S. Chakraborty, D. Ghosh, and R. Ramakrishnan, “Data-driven modeling of S0 → S1 excitation energy in the BODIPY chemical space: High-throughput computation, quantum machine learning, and inverse design,” J. Chem. Phys. 155(24), 244102 (2021).
22. Y. Zhao, K. Chen, L. Zhu, and Q. Huang, “Data-driven machine learning models for quick prediction of the Stokes shift of organic fluorescent materials,” Dyes Pigm. 220, 111670 (2023).
23. Y. Han, B. Tang, L. Wang, H. Bao, Y. Lu, C. Guan, L. Zhang, M. Le, Z. Liu, and M. Wu, “Machine-learning-driven synthesis of carbon dots with enhanced quantum yields,” ACS Nano 14(11), 14761–14768 (2020).
24. J. Mai, T. Lu, P. Xu, Z. Lian, M. Li, and W. Lu, “Predicting the maximum absorption wavelength of azo dyes using an interpretable machine learning strategy,” Dyes Pigm. 206, 110647 (2022).
25. J. Shao, Y. Liu, J. Yan, Z. Y. Yan, Y. Wu, Z. Ru, J. Y. Liao, X. Miao, and L. Qian, “Prediction of maximum absorption wavelength using deep neural networks,” J. Chem. Inf. Model. 62(6), 1368–1375 (2022).
26. A. A. Ksenofontov, M. M. Lukanov, P. S. Bocharov, M. B. Berezin, and I. V. Tetko, “Deep neural network model for highly accurate prediction of BODIPYs absorption,” Spectrochim. Acta, Part A 267, 120577 (2022).
27. R. D. Senanayake, X. Yao, C. E. Froehlich, M. S. Cahill, T. R. Sheldon, M. McIntire, C. L. Haynes, and R. Hernandez, “Machine learning-assisted carbon dot synthesis: Prediction of emission color and wavelength,” J. Chem. Inf. Model. 62(23), 5918–5928 (2022).
28. Q. Hong, X. Y. Wang, Y. T. Gao, J. Lv, B. B. Chen, D. W. Li, and R. C. Qian, “Customized carbon dots with predictable optical properties synthesized at room temperature guided by machine learning,” Chem. Mater. 34(3), 998–1009 (2022).
29. Z. R. Ye, I. S. Huang, Y. T. Chan, Z. J. Li, C. C. Liao, H. R. Tsai, M. C. Hsieh, C. C. Chang, and M. K. Tsai, “Predicting the emission wavelength of organic molecules using a combinatorial QSAR and machine learning approach,” RSC Adv. 10(40), 23834–23841 (2020).
30. C. W. Ju, H. Bai, B. Li, and R. Liu, “Machine learning enables highly accurate predictions of photophysical properties of organic fluorescent materials: Emission wavelengths and quantum yields,” J. Chem. Inf. Model. 61(3), 1053–1065 (2021).
31. A. Cereto-Massagué, M. J. Ojeda, C. Valls, M. Mulero, S. Garcia-Vallvé, and G. Pujadas, “Molecular fingerprint similarity search in virtual screening,” Methods 71, 58–63 (2015).
32. D. Rogers and M. Hahn, “Extended-connectivity fingerprints,” J. Chem. Inf. Model. 50(5), 742–754 (2010).
33. G. Landrum, RDKit: Open-Source Cheminformatics Software, 2016.
34. H. Li, Y. Cui, Y. Liu, W. Li, Y. Shi, C. Fang, H. Li, T. Gao, L. Hu, and Y. Lu, “Ensemble learning for overall power conversion efficiency of the all-organic dye-sensitized solar cells,” IEEE Access 6, 34118–34126 (2018).
35. J. Sansana, M. N. Joswiak, I. Castillo, Z. Wang, R. Rendall, L. H. Chiang, and M. S. Reis, “Recent trends on hybrid modeling for Industry 4.0,” Comput. Chem. Eng. 151, 107365 (2021).
36. M. Borovic, M. Ojstersek, and D. Strnad, “A hybrid approach to recommending universal decimal classification codes for cataloguing in Slovenian digital libraries,” IEEE Access 10, 85595–85605 (2022).
37. A. W. Kohn, Z. Lin, and T. Van Voorhis, “Toward prediction of nonradiative decay pathways in organic compounds I: The case of naphthalene quantum yields,” J. Phys. Chem. C 123(25), 15394–15402 (2019).
38. Z. Lin, A. W. Kohn, and T. Van Voorhis, “Toward prediction of nonradiative decay pathways in organic compounds II: Two internal conversion channels in BODIPYs,” J. Phys. Chem. C 124(7), 3925–3938 (2020).
39. C. Reichardt, “Solvatochromic dyes as solvent polarity indicators,” Chem. Rev. 94(8), 2319–2358 (1994).
40. J. Catalán, “Toward a generalized treatment of the solvent effect based on four empirical scales: Dipolarity (SdP, a new scale), polarizability (SP), acidity (SA), and basicity (SB) of the medium,” J. Phys. Chem. B 113(17), 5951–5960 (2009).
41. J. F. Joung, M. Han, J. Hwang, M. Jeong, D. H. Choi, and S. Park, “Deep learning optical spectroscopy based on experimental database: Potential applications to molecular design,” JACS Au 1(4), 427–438 (2021).
42. P. Filzmoser and K. Nordhausen, “Robust linear regression for high-dimensional data: An overview,” Wiley Interdiscip. Rev.: Comput. Mol. Sci. 13(4), e1524 (2021).
43. C. Azad and V. Kumar Jha, “Genetic algorithm to solve the problem of small disjunct in the decision tree based intrusion detection system,” Int. J. Comput. Network Inf. Secur. 7(8), 56–71 (2015).
44. C. Azad, B. Bhushan, R. Sharma, A. Shankar, K. K. Singh, and A. Khamparia, “Prediction model using SMOTE, genetic algorithm and decision tree (PMSGD) for classification of diabetes mellitus,” Multimedia Syst. 28(4), 1289–1307 (2022).
45. C. Saini, K. D. Mahato, C. Azad, and U. Kumar, in 2023 IEEE International Conference on Artificial intelligence and its Applications Alliance Technology Conference (IEEE, 2023), pp. 1–6.
46. Y. Zhang, M. Fan, Z. Xu, Y. Jiang, H. Ding, Z. Li, K. Shu, M. Zhao, G. Feng, K. T. Yong, B. Dong, W. Zhu, and G. Xu, “Machine-learning screening of luminogens with aggregation-induced emission characteristics for fluorescence imaging,” J. Nanobiotechnol. 21(1), 107–117 (2023).
47. K. D. Mahato, “ML-Based-Hybrid-Ensemble-Models-for-Prediction-of-Organic-Dyes-Photophysical-Properties,” KDMSir (2023), https://github.com/KDMsir/ML-Based-Hybrid-Ensemble-Models-for-Prediction-of-Organic-Dyes-Photophysical-Properties.git

Supplementary Material