Acoustic cavitation threshold charts are used to map between acoustic parameters (mainly intensity and frequency) and different regimes of acoustic cavitation. The two main regimes are transient cavitation, where a bubble collapses, and stable cavitation, where a bubble undergoes periodic oscillations without collapse. The cavitation charts strongly depend on the physical model used to compute the bubble dynamics and the algorithm for classifying the cavitation threshold. The differences between modeling approaches become especially noticeable for resonant bubbles and when sonication parameters result in large-amplitude oscillations. This paper proposes a machine learning approach that integrates three physical models, i.e., the Rayleigh–Plesset, Keller–Miksis, and Gilmore equations, and multiple cavitation classification techniques. Specifically, we classify the cavitation regimes based on the maximum radius, the acoustic Mach number, the kurtosis factor of acoustic emissions, and the Flynn criterion on the inertial and pressure functions. Four machine learning strategies were developed to predict the likelihood of transient and stable cavitation, using equally weighted contributions from the classification techniques. By solving the differential equations for bubble dynamics across a range of sonication and material parameters and applying cross-validation on held-out test data, our framework demonstrates high predictive accuracy for cavitation regimes. This physics-informed machine learning approach offers probabilistic insights into cavitation likelihood, combining diverse physical models and classification strategies, each contributing different levels of physical rigor and interpretability.
I. INTRODUCTION
Acoustic waves propagating through a liquid can induce the formation, growth, and oscillation of vapor- or gas-filled cavities (bubbles) within the liquid.1 The dynamical behavior of these bubbles strongly depends on the parameters of the acoustic waves (i.e., pressure amplitude, frequency, and sonication protocol), the properties of the bubble, and the characteristics of the surrounding liquid. The two main cavitation regimes are stable and transient cavitation. Stable cavitation is characterized by periodic oscillations of a bubble. In contrast, a bubble undergoing transient cavitation rapidly expands to its maximum size, followed by a violent collapse. Mapping between acoustic parameters and cavitation regimes is important because different bubble dynamics generate different physical effects in the surrounding medium, including localized shock waves, microstreaming and jetting, shear stresses, and the production of free radicals.1 These effects have broad applications in science and engineering, such as sonochemistry,2,3 sonocrystallization,4–7 and imaging and therapeutic ultrasound.8–11
Theoretical models of bubble dynamics are employed to conduct mechanistic studies and develop novel applications of acoustic cavitation. These models allow us to study stable and transient cavitation as a function of acoustic parameters, physical properties of the medium, and boundary conditions. The first theoretical models were developed in the early 1900s with the pioneering works of Lord Rayleigh12 and Plesset,13 which resulted in the development of the classical Rayleigh–Plesset equation. This ordinary differential equation (ODE) assumes that a bubble remains spherical during oscillations, there is no mass transport to/from the bubble, and the surrounding liquid is unbounded and incompressible. However, these assumptions break down near a wall or when damping becomes important in bubble dynamics, as observed in inertial cavitation.14 Several other models have been developed to address these shortcomings.15–17 These theoretical models are nonlinear ODEs and are solved numerically with stiff ODE solvers that use adaptive time stepping.
The exact properties of the bubble nuclei are often unknown in practical situations. Therefore, cavitation models are solved for a range of initial values, and the dynamics are classified using acoustic cavitation thresholds. These thresholds are important to determine the onset and type of cavitation in various scenarios. Classical examples include the mechanical index to predict the onset of inertial cavitation18 and indicators based on the maximum bubble radius or acoustic Mach number of a bubble oscillation.1 The mechanical index is frequently used in ultrasound imaging to guide the selection of imaging parameters. Nonetheless, it is considered to be a poor indicator in many cases, especially when a bubble undergoes large-amplitude oscillations.19 As an alternative, the dynamical threshold based on radius-time curves was developed by Flynn,16 which is more accurate than other thresholds but computationally more elaborate.
The common approach for analyzing cavitation characteristics is to select an appropriate theoretical model for the targeted application and solve it numerically for all expected combinations of acoustical and material parameters. The practical challenges of this approach are as follows: (i) selecting a theoretical model that explains the physics correctly requires expert knowledge; (ii) accurately solving these models with numerical integration schemes can be computationally expensive, especially when parameter sweeps are performed; and (iii) multiple cavitation thresholds should be computed as there is no single algorithm to predict cavitation types reliably. To address these challenges, this study proposes a machine-learning approach that combines multiple theoretical models and cavitation thresholds to predict the cavitation type quickly. This method provides physics-informed predictions incorporating multiple cavitation models and thresholds.
Machine learning algorithms have recently received increased interest as a predictive tool in computational acoustics,20,21 with physics-informed neural networks22 being the most well-known for multiphysics simulations.23–25 In the context of acoustic cavitation, convolutional neural networks have been used to analyze images of bubble dynamics from laboratory experiments.26,27 Machine learning techniques were also proposed to simulate multiscale dynamics by training DeepONets on data generated with the Rayleigh–Plesset equation as a continuum model and direct simulation for particle dynamics.28 However, this approach was only tested at acoustic pressures lower than typically encountered in practical applications of acoustic cavitation.29 A machine learning algorithm was also used to estimate the maximum bubble radius in a stable cavitation regime, trained on data generated by solving the Keller–Miksis equation.30 In our work, we introduce a novel machine-learning methodology for predicting cavitation regimes for a wide range of acoustical and material parameters relevant to biomedical ultrasound and sonochemistry applications. Our machine learning predictions determine the likelihood of stable or transient cavitation based on four different cavitation thresholds, which are presented as cavitation likelihood charts. This method offers a fast and accurate tool for designing sonication protocols.
This manuscript first presents the physical and mathematical formulations in Sec. II. Specifically, that section covers the differential equations for bubble dynamics, the classifiers for cavitation type, the generation of the training data, the design of the machine learning strategies, and the performance metrics for the predictions. Subsequently, the computational results are presented in Sec. III, which shows the performance of the machine learning designs and the predictions of cavitation type. The conclusion of the research will be summarized in Sec. IV.
II. FORMULATION
A. Bubble dynamics
A gas-filled bubble within a fluid medium will oscillate when subjected to an acoustic pressure field. These oscillations may remain stable over time or may be sufficiently strong to cause a collapse of the bubble. The dynamics of such motion in a Newtonian fluid can be modeled with the Navier–Stokes equations. We consider the following assumptions that are commonly used to model the dynamics of a single bubble: (i) an unbounded fluid surrounds the bubble; (ii) oscillations remain spherically symmetric; and (iii) mass and heat transfer between the bubble and the fluid are ignored. Notice that the first assumption restricts our study to an isolated bubble, with no acoustic interactions with other bubbles or a boundary. These assumptions enable us to describe the bubble's oscillations by its time-varying radius only. Furthermore, we assume that the relevant physical quantities depend on the temperature only and remain constant in time. Specifically, ρ denotes the fluid's density in kg/m³, σ the surface tension in N/m, μ the fluid's viscosity in Pa·s, p_v the vapor pressure in Pa, and c the speed of sound in m/s. The time-dependent variables are R, the bubble radius in m; Ṙ, the velocity of the bubble's surface; and R̈, the acceleration of the oscillations. The variables of the system's equilibrium state include R₀, the initial bubble radius, and p₀, the atmospheric pressure. Additionally, there is a known incident pressure field, denoted as p_a(t) and measured in Pa.
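To make this notation concrete, the following minimal sketch (in Python) writes the classical textbook form of the Rayleigh–Plesset equation, detailed in Sec. II A 1, as a first-order system suitable for a stiff ODE solver. The polytropic gas law and the specific coding choices are our illustrative assumptions rather than the exact implementation used to generate the results.

```python
import numpy as np

# Illustrative sketch (not necessarily the authors' exact implementation):
# the classical Rayleigh-Plesset equation with a polytropic gas law, written
# as a first-order system y = [R, dR/dt] for a stiff ODE solver.
def rayleigh_plesset_rhs(t, y, R0, p0, pv, rho, sigma, mu, gamma, p_acoustic):
    R, Rdot = y
    # Gas pressure inside the bubble (polytropic compression from equilibrium).
    p_gas = (p0 - pv + 2.0 * sigma / R0) * (R0 / R) ** (3.0 * gamma)
    # Pressure exerted on the liquid at the bubble wall.
    p_wall = p_gas + pv - 2.0 * sigma / R - 4.0 * mu * Rdot / R
    # Pressure far from the bubble: atmospheric plus the incident acoustic field.
    p_inf = p0 + p_acoustic(t)
    # Rayleigh-Plesset: R*Rddot + 1.5*Rdot^2 = (p_wall - p_inf) / rho.
    Rddot = ((p_wall - p_inf) / rho - 1.5 * Rdot ** 2) / R
    return [Rdot, Rddot]
```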
1. Rayleigh–Plesset
2. Keller–Miksis
3. Gilmore
B. Classifying cavitation type
Although cavitation is commonly classified as stable or transient, there is no consensus on exactly how to distinguish between these regimes. Various thresholds have been proposed to classify the dynamics of bubbles. In this study, we consider four different classifiers, each based on distinct physical assumptions. They include acoustic emissions (i.e., the radiated pressure), the maximum bubble radius and velocity, and the pressure and inertial functions of the bubble dynamics. All four classifiers will be included in the machine learning algorithms. This approach ensures a more reliable classification of stable and transient events, as the methodology does not depend on a single threshold or the physical limitations of a specific model. Instead, our approach combines multiple models and thresholds that characterize bubble dynamics.
1. Dynamical threshold
2. Acoustic emissions
3. Mach number
Transient cavitation is more likely when a bubble oscillates fast, i.e., when the bubble wall velocity is large. Hence, we calculate the wall velocity from the solution of the differential equations and find its maximum over the entire simulation. If the maximum velocity is higher than the speed of sound in the liquid (i.e., the acoustic Mach number, defined as the maximum wall velocity divided by the speed of sound, is larger than one), the cavitation is classified as transient. Otherwise, we check whether the initial radius is smaller than the radius calculated from the natural frequency of oscillation.37 If this condition is satisfied, the cavitation is classified as transient; otherwise, it is considered stable.1
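A minimal sketch of this classification logic is given below. The resonance radius is estimated here from the Minnaert natural frequency with surface tension neglected, which is our simplifying assumption and not necessarily the formula of Ref. 37.

```python
import numpy as np

# Sketch of the Mach-number classifier: transient if the wall velocity exceeds
# the speed of sound, otherwise transient if the bubble is smaller than the
# resonant size for the driving frequency (Minnaert estimate), else stable.
def classify_mach(Rdot, R0, freq, c, rho, p0, gamma):
    mach = np.max(np.abs(Rdot)) / c                      # acoustic Mach number
    if mach > 1.0:
        return "transient"
    # Radius of a bubble whose Minnaert frequency equals the driving frequency.
    R_res = np.sqrt(3.0 * gamma * p0 / rho) / (2.0 * np.pi * freq)
    return "transient" if R0 < R_res else "stable"

# Example call: a 2-micron bubble driven at 1.2 MHz in water-like conditions.
label = classify_mach(Rdot=np.array([0.0, 150.0, -300.0]), R0=2e-6,
                      freq=1.2e6, c=1482.0, rho=998.0, p0=1e5, gamma=1.33)
```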
4. Maximum radius
C. Machine learning
The previous sections presented three differential equations that model bubble dynamics and four classifiers to identify the cavitation regimes based on the bubble's oscillations. Rather than selecting a specific differential equation and classifier, we design machine learning algorithms that consider all information from the 12 possible combinations of differential equations and classifiers.
1. Supervised machine learning design
Supervised machine learning methods are used in this work. Supervised machine learning refers to a family of techniques that predict a label from features. The label in supervised machine learning encodes the variable of interest, which is, in our case, a boolean indicating stable or transient cavitation. The features in supervised machine learning are the selected physical and acoustical parameters that encode known predictors for each simulation. Specifically, we use the initial radius, acoustic pressure, temperature, frequency, density, viscosity, surface tension, speed of sound, and vapor pressure as features.
Supervised machine learning algorithms learn patterns from known examples of feature-label pairs during a training phase. We create a dataset with training examples by selecting a range of input parameters, solving the three differential equations from Sec. II A, and applying the four classifiers from Sec. II B. Each classifier can be applied to each simulation result. As a result, there are 12 combinations to create a training example of cavitation type for the same set of material and acoustical input parameters. In other words, each set of features has 12 values for the label.
This study aims to investigate the feasibility of machine learning in classifying cavitation types. The 12 combinations of differential equations and classifiers allow us to design four supervised machine learning approaches. Creating four different designs lets us assess the effectiveness of such strategies in more detail. Furthermore, all these designs facilitate the application of any classification or regression algorithm.
a. Ensemble design.
The threshold-based ensemble design uses 12 independent machine learning algorithms. Each algorithm uses a specific combination of differential equations and classifiers to create a binary label for cavitation type. The predictions generated by this ensemble of 12 algorithms can be combined into a single result in two ways. First, we can average the individual results into a proportion. Second, we can apply majority voting to obtain a binary outcome, where ties are classified as stable cavitation.
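A minimal sketch of the ensemble design follows, with synthetic placeholder data standing in for the simulation-derived features and labels (1 denotes stable cavitation and 0 transient cavitation); it is an illustration of the design rather than the exact code used here.

```python
from itertools import product
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic placeholder data: 200 samples with 9 features (radius, pressure,
# temperature, frequency, and the derived material properties).
rng = np.random.default_rng(0)
X = rng.random((200, 9))
odes = ["rayleigh_plesset", "keller_miksis", "gilmore"]
thresholds = ["dynamical", "emissions", "mach", "max_radius"]
labels = {pair: rng.integers(0, 2, 200) for pair in product(odes, thresholds)}

# Train one independent classifier per (ODE, classifier) combination.
models = {pair: RandomForestClassifier(n_estimators=15, random_state=0).fit(X, y)
          for pair, y in labels.items()}

# Combine the 12 predictions: mean proportion, or majority vote (ties -> stable).
preds = np.column_stack([m.predict(X) for m in models.values()])
likelihood_stable = preds.mean(axis=1)
majority_vote = (preds.sum(axis=1) >= 6).astype(int)
```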
b. Multi-objective design.
The multi-objective design uses a machine learning algorithm that predicts multiple labels for the same feature set. Specifically, for a given set of feature values, it predicts 12 labels. Each label corresponds to a specific combination of differential equations and classifiers. Like the ensemble design, averaging or majority voting can reduce the 12 results to a single outcome.
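A corresponding sketch of the multi-objective design, again with synthetic placeholder data, exploits the native multi-output support of scikit-learn's random forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic placeholder data: one label per (ODE, classifier) pair, 1 = stable.
rng = np.random.default_rng(0)
X = rng.random((200, 9))
Y = rng.integers(0, 2, (200, 12))

multi_model = RandomForestClassifier(n_estimators=15, random_state=0)
multi_model.fit(X, Y)                           # multi-output fit: 12 labels at once

Y_pred = multi_model.predict(X)                 # shape (n_samples, 12)
likelihood_stable = Y_pred.mean(axis=1)
majority_vote = (Y_pred.sum(axis=1) >= 6).astype(int)
```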
c. Expansion design.
The expanded data design considers the 12 combinations as independent data items. Hence, the training data set has 12 repetitions of the same features but with labels that may differ depending on the specific differential equation and classifier. The prediction is a boolean value for cavitation type.
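A sketch of the expansion design, with synthetic placeholder data, simply repeats each feature row 12 times and flattens the label matrix accordingly.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic placeholder data for illustration only.
rng = np.random.default_rng(0)
X = rng.random((200, 9))
Y = rng.integers(0, 2, (200, 12))               # 12 labels per sample, 1 = stable

X_expanded = np.repeat(X, 12, axis=0)           # each feature row repeated 12 times
y_expanded = Y.reshape(-1)                      # matching flattened labels

expanded_model = RandomForestClassifier(n_estimators=15, random_state=0)
expanded_model.fit(X_expanded, y_expanded)
y_pred = expanded_model.predict(X)              # one boolean cavitation label per query
```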
d. Likelihood design.
The likelihood design uses the same training data structure as the expanded data design. However, the binary labels of transient and stable cavitation are converted to the real values 0.0 and 1.0, respectively. Then, a regression algorithm predicts a real number. This outcome can be interpreted as the likelihood of having stable cavitation according to the set of differential equations and classifiers considered in this study.
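A sketch of the likelihood design, with synthetic placeholder data, replaces the classifier by a regressor fitted to the expanded real-valued labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic placeholder data: binary labels treated as reals (0.0 = transient,
# 1.0 = stable), so each prediction reads as a likelihood of stable cavitation.
rng = np.random.default_rng(0)
X = rng.random((200, 9))
Y = rng.integers(0, 2, (200, 12)).astype(float)

likelihood_model = RandomForestRegressor(n_estimators=15, random_state=0)
likelihood_model.fit(np.repeat(X, 12, axis=0), Y.reshape(-1))
p_stable = likelihood_model.predict(X)          # real values between 0 and 1
```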
2. Generating training data
The generation of training data for the machine learning designs involves solving the three theoretical models for bubble dynamics explained in Sec. II A, namely the Rayleigh–Plesset, Keller–Miksis, and Gilmore equations. These differential equations are numerically solved with the LSODA time integrator.38 Each experiment simulates the bubble dynamics for a duration of 20 periods of the incident acoustic wave. The numerical integrator uses a fixed number of time steps in each acoustic period, chosen sufficiently fine to achieve high-precision calculations. Since the second-order differential equations are solved as a system of two first-order differential equations, the time integrator's output is the bubble's radius and velocity at each time step. These outcomes are then used to obtain the labels for the training data by applying the classifiers from Sec. II B to the numerical simulations.
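A minimal sketch of one such simulation is shown below. It reuses the Rayleigh–Plesset sketch from Sec. II A; the number of time steps per period and the parameter values are illustrative assumptions rather than the settings used to generate the published dataset.

```python
import numpy as np
from scipy.integrate import solve_ivp

freq = 1.2e6                                   # driving frequency [Hz]
period = 1.0 / freq                            # acoustic period [s]
n_periods, steps_per_period = 20, 1000         # 20 periods; step count is our assumption
t_eval = np.linspace(0.0, n_periods * period, n_periods * steps_per_period)

p_ac = lambda t: -0.3e6 * np.sin(2.0 * np.pi * freq * t)   # incident pressure [Pa]
R0 = 1e-6                                      # initial bubble radius [m]
sol = solve_ivp(
    rayleigh_plesset_rhs, (0.0, n_periods * period), y0=[R0, 0.0],
    t_eval=t_eval, method="LSODA",
    args=(R0, 1e5, 3270.0, 998.0, 0.0728, 1e-3, 1.33, p_ac),
)
radius, velocity = sol.y                       # inputs to the cavitation classifiers
```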
The accuracy of machine learning approaches strongly depends on the variety of the training data. Here, this means that we must generate training examples for a wide range of physical and acoustical settings of interest. We choose to vary four of the most significant parameters and keep the rest fixed. Specifically, we use ranges of the initial radius, frequency, pressure amplitude, and temperature relevant for typical engineering applications; see Table I for the values. We take 10 samples for each parameter, uniformly distributed within the ranges. This results in 10^4 = 10,000 unique combinations for the training dataset.
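The parameter sweep can be sketched as follows; the numerical ranges below are placeholders for the actual ranges listed in Table I.

```python
import numpy as np
from itertools import product

# 10 uniformly spaced samples per parameter give 10^4 = 10,000 combinations.
radii = np.linspace(1e-6, 10e-6, 10)           # initial radius [m] (placeholder range)
pressures = np.linspace(0.1e6, 2.0e6, 10)      # acoustic pressure [Pa] (placeholder range)
freqs = np.linspace(0.5e6, 2.0e6, 10)          # frequency [Hz] (placeholder range)
temps = np.linspace(20.0, 60.0, 10)            # temperature [deg C] (placeholder range)

training_inputs = list(product(radii, pressures, freqs, temps))
assert len(training_inputs) == 10_000
```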
Ranges for the input parameters in the training dataset.
Parameter | Range
---|---
Initial radius |
Acoustic pressure |
Frequency |
Temperature |
The material parameters are chosen to resemble water and are calculated through standard state equations from the literature. Specifically, the temperature determines the density,39 viscosity,40 surface tension,39 and speed of sound41 of the medium. Furthermore, we consider the vapor pressure of the bubble to be 3270 Pa, the atmospheric pressure to be 100 kPa, and the adiabatic index to be 1.33.
3. Performance metrics
The performance of machine learning to predict the desired outcomes can be analyzed with different metrics. We use the accuracy, mean absolute error (MAE), and root mean squared error (RMSE) as performance metrics. Notice that we also calculate performance metrics for the intermediate results in the ensemble and multi-objective designs before taking the ensemble average.
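These metrics can be computed with standard scikit-learn routines, as in the following sketch with placeholder arrays for the true labels and the (possibly real-valued) predictions of any design.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

y_true = np.array([1, 0, 1, 1, 0])             # placeholder labels (1 = stable)
y_pred = np.array([0.9, 0.2, 0.4, 1.0, 0.1])   # placeholder predictions or likelihoods

accuracy = accuracy_score(y_true, (y_pred >= 0.5).astype(int))
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```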
D. Methodology
The workflow of the proposed methodology consists of two main phases: training and prediction. In the training phase, we select values for key physical parameters—initial radius, acoustic pressure, frequency, and temperature—and solve cavitation models based on the Rayleigh–Plesset, Keller–Miksis, and Gilmore equations. Each simulation result is processed to classify bubble dynamics as stable or transient cavitation using the four thresholds outlined in Sec. II B. This process is repeated for all input parameter sets, generating a dataset of feature-label pairs, where the features correspond to the physical parameters and the label represents the cavitation regime. A machine-learning model is then trained by fitting optimal parameters to this dataset. In the prediction phase, the trained model is used to classify the cavitation regime for a specific input set. Given values for the four physical parameters, the machine learning algorithm predicts the most likely cavitation regime: stable or transient. Figure 1 shows the workflow.
As an example, the random-forest algorithm consists of various decision trees.42 Each decision tree is trained on a subset of the dataset and creates branches that minimize the entropy in each leaf node. During the prediction phase, each decision tree follows the branches for the input values and provides a binary label, with the final outcome being the majority vote on the cavitation regime.
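A minimal sketch of this training and prediction workflow, with synthetic placeholder data in place of the simulation-derived dataset, is given below.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 9))                  # one feature row per simulation
y_train = rng.integers(0, 2, 200)               # 1 = stable, 0 = transient

forest = RandomForestClassifier(n_estimators=15, random_state=0)
forest.fit(X_train, y_train)                    # training phase

X_query = rng.random((1, 9))                    # new input with the same feature layout
regime = forest.predict(X_query)                # majority vote over the decision trees
```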
E. Supervised machine learning algorithm
The proposed machine-learning framework is flexible and not restricted to any specific supervised learning algorithm for classification or regression tasks. Although a comprehensive comparison of different algorithms is beyond the scope of this article, Table II presents performance metrics for an ensemble design using five common algorithms. Based on these results and our computational experience across various settings, we selected the random forest algorithm for the subsequent analyses in this manuscript. This choice is motivated by three key factors: (i) The random forest consistently ranked among the top-performing algorithms in the test set, with performance metrics either the highest or within a few percentage points of the best; (ii) The reasonable gap between training and test scores indicates that the model is not overfitting; (iii) Hyperparameter tuning of the number of predictors showed that using 15 decision trees effectively balances precision and robustness.
Performance metrics for machine learning algorithms on train and test sets.
Algorithm | MAE Train | MAE Test
---|---|---
Linear regression | 0.2805 | 0.2841
Logistic regression | 0.3058 | 0.3082
Decision tree | 0.0000 | 0.0862
Gradient boost | 0.0730 | 0.0785
Random forest | 0.0032 | 0.0870
III. RESULTS
This section presents the computational results of our machine learning designs to predict stable and transient cavitation types.
A. Computational settings
The differential equations governing bubble dynamics (see Sec. II A) were nondimensionalized to improve computational robustness. Specifically, the radius was nondimensionalized by the equilibrium radius R₀, resulting in the nondimensional radius R/R₀. The nondimensional time variable was based on the period of the acoustic wave. The ODEs are solved with the LSODA time integrator available in Python's SciPy library.43 Since each simulation takes a few minutes on a standard desktop computer, we generated the training dataset by parallelizing the simulations over 32 cores on a high-performance compute node. This parallelization significantly reduced the overall computation time, allowing us to complete the data generation phase within a few hours.
The machine learning designs, presented in Sec. II C, require choosing a specific classification or regression algorithm. We performed tests with various algorithms, including logistic regression, linear regression, decision trees, and gradient boost in both its regressor and classifier versions. Upon comparing all these algorithms, the random forest algorithm42 came out as the best choice. That is, the performance metrics of the random forest were always the best or only a few percentage points away from the top-performing algorithm, confirming its consistency across the various machine learning designs. The random forest algorithm is configured with 15 decision trees to balance precision and robustness.
We implemented the machine learning algorithms with Python's Scikit-learn library,44 known for its robust and efficient tools for machine learning model development and evaluation. We analyze the performance of supervised machine learning by applying cross-validation. Specifically, a fivefold approach was employed, where the dataset was randomly partitioned into five subsets of equal size. In each fold, one subset was reserved for testing, while the remaining four subsets were used for training. This process was repeated across all five folds, ensuring that each subset served as the test set exactly once. The performance metrics for each fold were averaged to calculate the overall performance metrics.
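A sketch of this cross-validation procedure, with synthetic placeholder data, is shown below.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((500, 9))                        # placeholder features
y = rng.integers(0, 2, 500)                     # placeholder cavitation labels

# Randomly partition the data into five folds; each fold is the test set once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
forest = RandomForestClassifier(n_estimators=15, random_state=0)
scores = cross_val_score(forest, X, y, cv=cv, scoring="accuracy")
mean_accuracy = scores.mean()                   # averaged over the five folds
```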
B. Bubble cavitation
Two examples indicative of stable and transient cavitation are presented in this section. Figure 2 shows the evolution of a bubble during 10 periods of sonication. The sonication parameters are a pressure amplitude of 0.3 MPa and a frequency of 1.2 MHz. At this relatively low acoustic pressure, the bubble oscillates smoothly and its maximum radius remains small, so the dynamics are classified as stable cavitation. For this scenario, the three different physical models predict similar bubble dynamics.
The oscillation of a bubble exposed to ultrasound waves with a frequency of 1.2 MHz and an amplitude of 0.3 MPa. This is an example of stable cavitation.
Now, let us consider a scenario where the sonication frequency and amplitude are 1.2 MHz and 2 MPa, respectively. Figure 3 shows that the bubble grows rapidly within a cycle of excitation, followed by a strong collapse at the end of the second cycle. This type of bubble dynamics is a good example of transient cavitation. Furthermore, we can see that the predictions of the three models in the subsequent cycles differ, as compressibility becomes important in this high-amplitude transient cavitation scenario.
The oscillation of a bubble exposed to ultrasound waves with a frequency of 1.2 MHz and an amplitude of 2 MPa. This is an example of transient cavitation.
C. Training data
We generate training data by solving the three models of bubble dynamics presented in Sec. II. Then, the four different classifiers are computed from these simulations. This procedure is repeated for the 10,000 combinations of the four input parameters within the ranges given in Table I. These ten thousand numerical simulations, each with 12 values for cavitation type, form the training dataset for the supervised machine learning algorithms.
The labels in the training dataset are split between stable and transient cavitation. To explore this distribution in more detail, the acoustic cavitation data are grouped by the differential equation and the classifier, as illustrated in Figs. 4 and 5, respectively.
The distribution of acoustic cavitation data grouped by the differential equation. The horizontal axis represents the cumulative count of classifiers indicating stable cavitation events. The vertical axis counts the number of simulations.
Distribution of acoustic cavitation data grouped by the classifier. The horizontal axis represents the cumulative count of differential equations indicating stable cavitation events. The vertical axis counts the number of simulations.
The four classifiers in Sec. II B all distinguish between stable and transient cavitation, but through different modeling approaches and under different physical assumptions. Therefore, consistency between classifiers is not guaranteed. The horizontal axis in Fig. 4 represents the consistency of the classifiers. For example, the value zero refers to cases where all classifiers judge the bubble cavitation as transient. Similarly, the value four represents cases where all four classifiers consistently classify the cavitation as stable. These extreme cases have small counts (short bars), meaning that complete agreement between the classifiers is rarely achieved. In fact, in the largest proportion of cases, different classifiers give different results. When looking at the differences between the bar groups, we notice that the Rayleigh–Plesset equation is more inclined to classify the same experiment as transient compared to the Keller–Miksis and Gilmore equations.
Figure 5 presents the label distribution of the training data grouped by the classifier. For example, the tallest bar on the right corresponds to the maximum radius threshold, for which, in more than 7000 cases, all three differential equations yield stable cavitation. Conversely, the value zero on the horizontal axis corresponds to cases where all three differential equations feature transient cavitation. There is also a large proportion of situations where one differential equation yields a different cavitation type than the other two. This inconsistency is due to the different modeling approaches of bubble dynamics and the physics of cavitation.
D. Machine learning predictions
We trained four different machine learning designs on the cavitation dataset, as explained in Sec. II C 1. Here, we present the performance results in predicting the cavitation type.
1. Ensemble designs
Let us first consider the ensemble and multi-objective designs, which both provide predictions for each of the 12 combinations of differential equations and classifiers. Figure 6 presents the accuracy of the machine learning predictions. The two designs achieve similar average accuracy, and the prediction accuracy differs only slightly between the types of differential equation and classifier. The errors in the machine learning predictions are due to the training errors of the random forest algorithm but are also caused by the complexities in the dataset. That is, the differential equations and classifiers for bubble cavitation also come with modeling errors.
The accuracy score for ensemble and multi-objective designs for each of the 12 combinations between classifiers and differential equations.
We observe that the dynamical threshold classifier has a lower accuracy than the other classifiers in both designs. This can be attributed to the classifier's more complex approach to distinguishing transient from stable cavitation based on the bubble's oscillation profile. Hence, it is more challenging for machine learning to reproduce this cavitation behavior from the input parameters.
2. Comparative performance
Table III shows the performance metrics for all four machine learning designs, with the variants of the ensemble mean and majority voting for the first two. In general, the accuracy is reasonably high, and the errors sufficiently low. However, there are significant differences between the machine learning designs. For example, the ensemble and multi-objective models have a higher accuracy than the expansion and likelihood models. Remember that the first two designs train separate algorithms for specific combinations of differential equations and classifiers, while the latter two use a single machine learning algorithm for all combinations. This shows that it is easier for a machine learning algorithm to capture the behavior of a single differential equation and classifier, but it is more challenging to find patterns when all 12 combinations are included in the training set.
Performance metrics for machine learning models on the test sets in fivefold cross-validation.
Method | Accuracy | MAE | RMSE
---|---|---|---
Ensemble Mean | 0.8230 | 0.0603 | 0.0894
Ensemble Voting | 0.8230 | 0.1770 | 0.4206
Multi-objective Mean | 0.8193 | 0.0610 | 0.0914
Multi-objective Voting | 0.8193 | 0.1807 | 0.4250
Expansion | 0.6116 | 0.3884 | 0.6232
Likelihood | 0.6251 | 0.0492 | 0.0689
When considering the errors in the regression tasks of finding the likelihood of stable and transient cavitation, the likelihood design performs best. The principal difference between the accuracy and the error metrics is that the first is based on binary classification between transient and stable cavitation, while the second allows real values that indicate the likelihood of cavitation type. From the training data presented in Sec. III C, it was already clear that the differential equations and classifiers provide inconsistent results for a large proportion of the input parameters. Hence, the training data come with uncertainties in cavitation type, which are best handled with the likelihood design for machine learning.
Looking into the differences between the mean and voting techniques to achieve a final result in the ensemble and multi-objective designs, Table III shows that the mean methodology results in lower MAE and RMSE compared to the voting methodology. This implies that the mean method is more effective in situations with more intermediate cases, where for the same features or input parameters, the label or cavitation type is different across differential equations and classifiers.
Additionally, we notice that both the ensemble and multi-objective designs have an accuracy close to 82%. In contrast, the average accuracy presented in Fig. 6 differs from this value. This discrepancy arises because Table III reports the accuracy of the combined designs, whereas Fig. 6 considers the average accuracy across the individual combinations of differential equations and classifiers. Therefore, these are different calculations.
3. Generalization
So far, we presented performance metrics for machine learning predictions for input parameters within the range of the training data. However, we can also use the trained models to predict the likelihood of cavitation for input parameters outside the training range. This generalization becomes increasingly challenging for parameters more distant from the training data. Here, we consider a generalization experiment where we leave out the cases with the highest values of the initial radius. Hence, the training set consists of the lower portion of the initial radius range and the entire ranges of the other input parameters.
Table IV presents the performance metrics for the generalization experiment. At first glance, it is evident that the performance is similar to the cross-validation metrics presented in Table III. This confirms the effectiveness of generalization with machine learning. However, upon closer inspection, the accuracy is lower and the error is higher in most cases. The reduced accuracy for generalization is expected behavior because we are testing machine learning predictions on cases unseen during the training phase. Yet, the performance deterioration is small. Hence, this experiment shows that machine learning provides reasonable estimates for cavitation experiments with input parameters just outside the training data.
Performance metrics of the machine learning designs on the test set for the generalization experiment, where the training set comprised the cases with the lower initial radii and the test set comprised the cases with the higher initial radii.
Method | Accuracy | MAE | RMSE
---|---|---|---
Ensemble mean | 0.8295 | 0.0683 | 0.1007
Ensemble voting | 0.8295 | 0.1705 | 0.4129
Multi-objective mean | 0.8325 | 0.0683 | 0.1007
Multi-objective voting | 0.8325 | 0.1675 | 0.4093
Expansion value | 0.7036 | 0.2964 | 0.5444
Likelihood value | 0.4585 | 0.0659 | 0.0941
E. Cavitation likelihood charts
As explained in the introduction, the exact physical properties of the bubbles and surrounding media are uncertain or unknown in many practical situations. Furthermore, acoustical parameters such as amplitude and frequency can often be chosen in laboratory experiments, and sonication protocols are optimized to achieve specific objectives for the onset and type of cavitation. Hence, cavitation threshold charts for a broad range of acoustical and physical parameters are needed to understand the likelihood of cavitation regimes in various situations. At the same time, trained machine learning models quickly predict the variable of interest since they avoid solving the physical models for each set of input parameters. For this purpose, we trained our machine learning algorithms on the entire dataset and predicted the likelihood of transient cavitation for different pairs of physical and acoustical input parameters. Specifically, we use the ensemble design and calculate the percentage of predictions (i.e., ODE and classifier combinations) that indicate transient cavitation.
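A sketch of how such a chart can be assembled is given below. It reuses the 12 trained models from the ensemble-design sketch in Sec. II C; the feature-assembly helper and the parameter ranges are illustrative assumptions rather than the published settings.

```python
import numpy as np
import matplotlib.pyplot as plt

def make_features(r0, p_ac, freq, temp):
    # Hypothetical helper: in the real workflow, the trailing entries would be the
    # temperature-derived material properties rather than fixed water-like values.
    return np.array([[r0, p_ac, freq, temp, 998.0, 1e-3, 0.0728, 1482.0, 3270.0]])

radii = np.linspace(1e-6, 10e-6, 50)             # initial radius [m] (placeholder range)
pressures = np.linspace(0.1e6, 2.0e6, 50)        # acoustic pressure [Pa] (placeholder range)
freq, temp = 1.2e6, 20.0                         # fixed frequency [Hz] and temperature [C]
chart = np.zeros((len(pressures), len(radii)))

for i, p_ac in enumerate(pressures):
    for j, r0 in enumerate(radii):
        x = make_features(r0, p_ac, freq, temp)
        votes = [m.predict(x)[0] for m in models.values()]   # 1 = stable, 0 = transient
        chart[i, j] = 100.0 * (1.0 - np.mean(votes))         # percentage of transient votes

plt.pcolormesh(radii * 1e6, pressures * 1e-6, chart, shading="auto")
plt.xlabel("Initial radius (um)")
plt.ylabel("Acoustic pressure (MPa)")
plt.colorbar(label="Transient cavitation likelihood (%)")
plt.show()
```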
Figure 7 presents cavitation likelihood charts, where two input parameters are fixed, while the other two vary across their full range (see Table I). The heatmaps reveal the intricate, nonlinear effects of the bubble's equilibrium radius and acoustic parameters on bubble dynamics and cavitation regimes. The variability and block-like patterns in the heatmaps arise from the limited resolution of the training dataset and the inherently nonlinear nature of cavitation.
Cavitation charts in the form of heatmaps indicating the likelihood of transient cavitation as predicted by the machine learning algorithm with ensemble design. The independent variables are the temperature, initial radius, frequency, and acoustic pressure. The colors indicate the proportion, in percentage points, of transient cavitation predictions by the machine learning algorithm. The panels show different cases for two fixed input parameters.
According to the physical models of bubble dynamics,4,45–47 transient cavitation is more likely to occur for resonant bubbles (i.e., bubbles with resonance frequencies close to the driving frequency of the acoustic waves) and at higher acoustic pressures. For example, bubbles with initial radii of a few micrometers have resonance frequencies in the megahertz range in water. Thus, we expect a higher likelihood of transient cavitation for bubbles with initial radii of a few micrometers at MHz acoustic frequencies, as shown in Figs. 7(a) and 7(c). Also, larger bubbles, with lower resonance frequencies, exhibit a small likelihood of transient cavitation at higher frequencies, as seen in Figs. 7(c) and 7(d).
At a fixed ultrasound frequency, the plots show a higher likelihood of transient cavitation as the pressure amplitude increases, especially for bubbles with resonance frequencies close to the ultrasound frequency, see Figs. 7(a) and 7(b). These predictions are consistent with both experimental46,48,49 data and theoretical findings.4,45–47
The effect of temperature on bubble dynamics is complex. In general, as temperature increases, the surface tension, viscosity, and liquid density decrease.46–48 This facilitates bubble oscillations and potentially increases the likelihood of cavitation. Similar observations can be made in Figs. 7(b) and 7(d) compared to panels (a) and (c): there are more blocks with a high likelihood of transient cavitation at 37 °C than at 20 °C. Despite the nonlinear relationships between cavitation thresholds and the input parameters (i.e., acoustic parameters, the physical properties of the medium, and the initial bubble radii), the predictions from the developed machine learning model are consistent with the underlying physics of cavitation and agree qualitatively with experimental observations.
IV. CONCLUSIONS
In this study, we developed and evaluated machine learning algorithms for predicting cavitation regimes of air bubbles in liquids across a broad range of acoustic and material parameters. These algorithms were trained on simulated bubble dynamics data generated using various theoretical models, including the Rayleigh–Plesset, Keller–Miksis, and Gilmore differential equations. The trained models were tested across diverse scenarios with input parameters relevant to biomedical ultrasound and sonochemistry applications. Cross-validation on held-out test data achieved an accuracy of approximately 80%.
The best-performing machine learning model was used to compute cavitation threshold maps at different temperatures. The predictions integrate equally weighted contributions from multiple cavitation classifiers, including maximum bubble radius, maximum acoustic Mach number, Flynn's criterion based on pressure and inertia functions, and the kurtosis of acoustic emissions. By incorporating data from multiple theoretical models and using multiple classifiers, the proposed approach provides a more comprehensive and statistically robust methodology compared to traditional cavitation threshold maps, which rely on a single physical model and threshold. The likelihood charts show good qualitative agreement with theoretical and experimental data published in the literature.
The machine learning models developed in this study offer a fast, accurate, and reliable means of predicting cavitation likelihoods. However, the physical models used to generate training data assume radially symmetric oscillations of uncoated gas bubbles in viscous liquids. Future work aims to extend these models to more general scenarios, including: (i) coated bubbles oscillating in viscoelastic media, such as ultrasound contrast agents in soft tissue; (ii) multiple interacting bubbles; and (iii) scenarios requiring full simulations of the Navier–Stokes equations, such as asymmetric oscillations near boundaries, which are particularly relevant for microbubbles in blood vessels.
ACKNOWLEDGMENTS
This work was financially supported by the Agencia Nacional de Investigación y Desarrollo, Chile [FONDECYT 1230642].
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Trinidad Gatica: Investigation (equal); Writing – original draft (equal). Elwin van 't Wout: Supervision (equal); Writing – review & editing (equal). Reza Haqshenas: Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The software code and data that support the findings of this study are openly available in GitHub at https://github.com/trinidadgatica/Bubble-Cavitation-ML.