Simulating and predicting multiscale problems that couple multiple physics and dynamics across many orders of spatiotemporal scales is a great challenge that has not been investigated systematically by deep neural networks (DNNs). Herein, we develop a framework based on operator regression, the so-called deep operator network (DeepONet), with the long-term objective to simplify multiscale modeling by avoiding the fragile and time-consuming “hand-shaking” interface algorithms for stitching together heterogeneous descriptions of multiscale phenomena. To this end, as a first step, we investigate if a DeepONet can learn the dynamics of different scale regimes, one at the deterministic macroscale and the other at the stochastic microscale regime with inherent thermal fluctuations. Specifically, we test the effectiveness and accuracy of the DeepONet in predicting multirate bubble growth dynamics, which is described by a Rayleigh–Plesset (R–P) equation at the macroscale and modeled as a stochastic nucleation and cavitation process at the microscale by dissipative particle dynamics (DPD). First, we generate data using the R–P equation for multirate bubble growth dynamics caused by randomly time-varying liquid pressures drawn from Gaussian random fields (GRFs). Our results show that properly trained DeepONets can accurately predict the macroscale bubble growth dynamics and can outperform long short-term memory networks. We also demonstrate that the DeepONet can extrapolate accurately outside the input distribution using only very few new measurements. Subsequently, we train the DeepONet with DPD data corresponding to stochastic bubble growth dynamics. Although the DPD data are noisy and we only collect sparse data points on the trajectories, the trained DeepONet model is able to predict accurately the mean bubble dynamics for time-varying GRF pressures. Taken together, our findings demonstrate that DeepONets can be employed to unify the macroscale and microscale models of the multirate bubble growth problem, hence providing new insight into the role of operator regression via DNNs in tackling realistic multiscale problems and in simplifying modeling with heterogeneous descriptions.

## I. INTRODUCTION

Machine learning methods that learn effective physical models from available observational data have attracted increasing attention in recent years^{1–3} and have demonstrated success in diverse application areas.^{4–7} In particular, the use of deep neural networks (DNNs) as a surrogate model of complex dynamic systems is a promising approach for exploring hidden physics from observed data.^{4,5} Many different DNN architectures have been developed for targeting different types of scientific problems.^{8,9} Examples include convolutional neural networks (CNNs) for turbulence modeling,^{10} recurrent neural networks (RNNs) for predictive flow control,^{11} and generative adversarial networks (GANs) for super-resolution of fluid flows.^{12} In general, different physical quantities involved in the same dynamic process are correlated either explicitly or implicitly. Using the observational data of one physical quantity to predict other unobserved quantities requires a model to correctly correlate different physical quantities, which can be achieved by training a DNN with a lot of observational data.^{13} There are two possible scenarios when we develop a DNN model for a physical problem. First, if we have sufficient physical knowledge on the dynamic process and know its governing equations, i.e., in the form of partial differential equations (PDEs), it becomes possible to encode the PDEs into the DNN, which leads to the physics-informed neural network (PINN) model.^{5} Second, if we lack physical knowledge of a given physical problem and do not know the form of its governing equations, we need to learn the governing equations or operators from available observed data. This motivated Lu *et al.*^{3} to develop a deep operator network (DeepONet) to learn implicitly PDEs from data and many other explicit and implicit diverse operators.

Both PINN and DeepONet models have been successfully applied to mostly single scale problems, including high-speed flows,^{14} hidden fluid mechanics,^{4} and diffusion–reaction processes.^{3} However, it is still challenging to apply DNNs to truly multiscale problems described by different mathematical formulations. The form of governing equations for various scales could be significantly different even for the same physical problem, which makes the integration and coupling of heterogeneous physical models in multiscale problems very difficult. Because the physical models constructed by the DeepONet do not have an explicit form of governing equations, we can possibly develop a unified multiscale framework based on the DeepONet to significantly simplify the multiscale modeling procedure. In the present work, we will investigate this possibility by considering multiscale bubble growth dynamics, described by the Rayleigh–Plesset (R–P) equation at the continuum scale and modeled as a stochastic nucleation and cavitation process at the microscale.

Bubbles are present in a wide range of applications, from advanced materials^{15} to biology and medicine,^{16} as either laser-generated or acoustically driven bubbles.^{17} The heat transfer and fluid flow processes associated with liquid–vapor phase change phenomena are among the most complex transport conditions that are encountered in engineering applications. The complexities typically arise from the interplay of multiple length and time scales involved in the process, non-equilibrium effects, and other effects associated with dynamic interfaces. Nucleation of one phase into another occurs at the atomic scales, whereas the growth of the nuclei and their interaction with the system occur at much longer length and time scales.^{18} For example, boiling is a complex nonequilibrium multiscale process, where the physics is linked from nano- to meso- to macroscales. For each of these processes, the physics from different scales has to be seamlessly connected. However, different models are valid or are practical only within some space and time range, and bridging the gap between models of different scales has been a challenge for decades.

A simple and fundamental problem to understand bubble dynamics is that of the growth of a single bubble as the pressure changes in an otherwise quiescent fluid. This problem has a long history, beginning with the important contributions of Lord Rayleigh.^{19} In general, contemporary interest in the problem is motivated largely by the sound produced by bubbles that undergo a time-dependent change in volume, particularly in the context of cavitation bubbles that are produced in hydromachinery or in the propulsion systems of submarines and other underwater vehicles or for biomedical applications. The objective for single-bubble dynamics is usually to predict the bubble radius as a function of time in a preset ambient pressure field. Once we know the bubble radius as a function of time, we can determine the velocity and pressure fields throughout the liquid and evaluate the sound produced.

In the macroscale regime, the R–P ordinary differential equation (ODE) is a valid model for the governing dynamics of a spherical bubble in an incompressible fluid. It is derived from the Navier–Stokes (N–S) equation based on the assumptions of a continuum hypothesis and of a single spherically symmetric bubble in an infinite domain of liquid at rest far from the bubble and with uniform temperature far from the bubble. In the microscale and nanoscale regimes, we can use molecular dynamics (MD) models to investigate the single-bubble nucleation problem.^{20} For longer term development, dissipative particle dynamics (DPD) is a more efficient coarse-grained MD method that has been used to model complex fluid systems such as red blood cells,^{21} polymers,^{22} droplets,^{23,24} and suspension.^{25} The equations of DPD are stochastic differential equations with conservative, dissipative, and random forces. A few studies have reported using DPD to simulate bubble dynamics.^{26,27} An effective model^{28} is a hybrid model of DPD and its extension to multi-body dissipative particle dynamics (MDPD), where the traditional DPD particles are used for the gas phase and MDPD particles are used for the liquid phase. The model is validated by comparing it to the R–P equation for larger sized bubbles and with MD for smaller sized bubbles.

In this paper, our goal is to test the effectiveness of the DeepONet as the surrogate model for both the macroscale and microscale physics. This paper is organized as follows: In Sec. II, we describe the details of the DeepONet model. In Sec. III, we introduce the macroscopic model (R–P model) for bubble dynamics and evaluate the performance of the DeepONet at the continuum scale. In Sec. IV, we introduce the microscopic model (DPD model) and evaluate the performance of the DeepONet at the stochastic microscale. Finally, we conclude this paper in Sec. V.

## II. LEARNING OPERATORS VIA DeepONet

Our work is inspired by the work of Lu *et al.*,^{3} who proposed a deep operator network (DeepONet) to approximate nonlinear continuous operators. The DeepONet architecture is based on rigorous mathematical theory, namely, the *universal operator approximation theorem*,^{29} which has been extended to DNNs.

The architecture of the DeepONet consists of a DNN for encoding the discrete input function space (branch net) and another DNN for encoding the coordinates of the output functions (trunk net). In the bubble dynamics application, the input function is the ambient liquid pressure change deviating from the initial equilibrium state, Δ*P*, while the output function is the bubble radius, *R*(*t*). There are different ways to represent a function, but here, the input to the branch is represented by discrete function values at a sufficient but finite number of locations, which we call “sensors.” The architecture we used in this study is shown in Fig. 1. The trunk network takes time *t* as the input and outputs $[a1,a2,\u2026,ap]T$. The branch network takes $[\Delta p(t1),\Delta p(t2),\u2026,\Delta p(tm)]T$ as the input and outputs a vector $[b1,b2,\u2026,bp]T$. This architecture corresponds to the unstacked DeepONet.^{3} Then, the two vectors are merged together to predict the bubble radius, $R(t)=G(\Delta p)(t)=\Sigma k=1pbkak$.

The loss function we employ is the mean squared error (MSE) between the true value of *R*(*t*) from the R–P equation and the network prediction derived from the input {[Δ*p*(*t*_{1}), Δ*p*(*t*_{2}), …, Δ*p*(*t*_{m})], *t*}. The DeepONet is a high-level network architecture without restricting the architectures of its inner trunk and branch networks. Here, we choose feedforward neural networks (FNNs) as the architectures of the two sub-networks, which are implemented in the Python library DeepXDE.^{30}

For each input function Δ*P*, the DeepONet requires evaluations at the same set of discrete times. However, the DeepONet does not have any constraint on the number or location for the evaluation of output function *R*(*t*). The DeepONet provides a continuous estimate of the bubble radius, *R*, at any arbitrary time. In this study, for the efficient training of the networks, we choose a fixed number of evaluations for each *R*(*t*) but at randomly selected times within the whole interval (see Fig. 2).

## III. MACROSCALE (CONTINUUM) REGIME

### A. Rayleigh–Plesset model

The R–P equation^{31–33} is the classical continuum-level model that describes the change in radius *R*(*t*) of a single gas-vapor bubble in an infinite body of liquid as the far-field pressure *P*_{∞}(*t*) varies with time *t*. This ODE for the dynamics of a spherical bubble is based on a number of assumptions. Although it is limited to the case where a continuum model is valid, it is a simple yet powerful tool to reveal the effects of the surroundings on bubble response and it is the starting point for our investigation of operator learning as applied to bubble dynamics.

The underlying assumptions of the R–P model are that the liquid is at rest far from the bubble and that the liquid phase is incompressible, with constant density *ρ*_{L} and kinematic viscosity *ν*_{L}. It is also assumed that the contents of the bubble have negligible inertia and are homogeneous and that the temperature, *T*_{B}(*t*), and pressure, *P*_{B}(*t*), within the bubble are uniform. The configuration is illustrated in Fig. 3. As the bubble expands, it generates a radial potential flow corresponding to a point source at the bubble center. The pressure in the liquid can be determined from the Navier–Stokes equations together with the balance of the liquid pressure at the bubble surface with the bubble pressure *P*_{B} and the surface tension between the phases given by the coefficient of surface tension *γ*. As the liquid responds to the expanding bubble, the liquid pressure balances the effects of liquid inertia and damping of the motion by viscosity. The R–P equation can be written as

The general R–P model assumes that the bubble contains a mixture of an ideal gas and vapor. The pressure inside the bubble is then the sum of the partial pressures. The vapor pressure *P*_{V} is determined by the temperature within the bubble *T*_{B}, while the pressure of the gas *P*_{G} is given by a polytropic gas law,

The coefficient *k* = 1 corresponds to isothermal conditions, or *k* = 1.4 for a simple adiabatic gas. Since the microscopic simulations to be described later are based on a thermostat to maintain a constant temperature, we will neglect the effects of any temperature difference between the bubble and the liquid, setting *T*_{B} = *T*_{∞}, and assume isothermal conditions. With these assumptions, the R–P equation becomes

The initial conditions imposed here are that the initial microbubble radius is *R*_{0} at *t* = 0 and that the bubble is in static equilibrium, with the gas pressure *P*_{G} = *P*_{G0} and the fluid at a pressure *P*_{∞}(0) so that

Consistent with initial equilibrium, *dR*/*dt*|_{t=0} = 0. Under these conditions, the vapor pressure simply gives a constant offset in the liquid pressure at infinity. So as to reduce the number of parameters under consideration, the results here are based on *P*_{∞}(0) = *P*_{V}(*T*_{∞}), as might occur close to the onset of cavitation. We are not trying to model any specific physical set of conditions, but for convenience, the initial gas pressure is given in terms of the surface tension, *P*_{G0} = 2*γ*/*R*_{0}.

A typical solution for Eq. (3) under these conditions is shown in Fig. 4 with a sharp change in the pressure *P*_{∞}(*t*), which drops sharply below *P*_{∞}(0) to a new constant value. The pressure drop induces a series of damped oscillations as the bubble first expands rapidly and then gradually adjusts to a new equilibrium size. In the absence of dissipation by viscosity, these oscillations would continue indefinitely without attenuation. For small variations in the liquid pressure *P*_{∞}(*t*) = *P*_{∞}(0) + Δ*p*(*t*), the response is *R*(*t*) = *R*_{0} + *r*(*t*) and the linearized form of Eq. (3) is

Under the present conditions, the last coefficient is equal to 2*P*_{G0} or equivalently 4*γ*/*R*_{0}. The corresponding natural frequency *ω*_{0} for undamped oscillations is given by

This shows the natural time scale of the bubble response to changing pressure and an indication of how often the bubble radius normally should be observed. It is also clear that this time scale is dependent on *R*_{0}.

For the training and test data to use with the DeepONet, we consider a bubble that is subject to a sharp low-pressure at time *t* = 0 and then experiences a random liquid pressure fluctuation thereafter. The liquid pressure change comprises of two parts: the random fluctuation generated by Gaussian random fields (GRFs) and the ramp function, given in terms of a power law, for the initial drop in pressure *P*_{∞}(*t*). We define the liquid pressure change, Δ*P*(*t*), as

The Gaussian covariance kernel *k*_{lT}(*t*_{1}, *t*_{2}) = exp(−∥*t*_{1} − *t*_{2}∥^{2}/2*l*^{2}*T*^{2}) has a correlation time scale based on the time-scale parameter *lT* > 0. *T* is the time interval and the dimensionless parameter *l* determines the smoothness of the sampled function. We choose *l* = 0.2 unless otherwise stated. The value of *q* indicates the duration of the initial pressure drop, and we choose *q* = 0.1 unless otherwise stated. The mean value of the pressure change is *μ* = −1500 Pa and the standard deviation is *σ* = 400 Pa.

The setup of the problem is summarized in Table I. The parameters are representative of water as the liquid phase and the surface tension are set for a water–air interface at room temperature.

Parameter . | Physical meaning . | Value . |
---|---|---|

ρ_{L} | Liquid density | 1 × 10^{3} kg m^{−3} |

ν_{L} | Viscosity | 1 × 10^{−6} m^{2} s^{−1} |

γ | Surface tension | 7.2 × 10^{−2} N m^{−1} |

T | Time interval | 5 × 10^{−4} s |

R(t_{0}) | Initial radius | 1 × 10^{−5} m |

dR(t_{0})/dt | Initial radius velocity | 0 m/s |

Parameter . | Physical meaning . | Value . |
---|---|---|

ρ_{L} | Liquid density | 1 × 10^{3} kg m^{−3} |

ν_{L} | Viscosity | 1 × 10^{−6} m^{2} s^{−1} |

γ | Surface tension | 7.2 × 10^{−2} N m^{−1} |

T | Time interval | 5 × 10^{−4} s |

R(t_{0}) | Initial radius | 1 × 10^{−5} m |

dR(t_{0})/dt | Initial radius velocity | 0 m/s |

Figure 5 demonstrates some representative data for the liquid pressure change, Δ*P*(*t*), and corresponding bubble radius, *R*(*t*). The general features of these solutions are characteristic of the response of a bubble as it passes through any low-pressure region.

### B. Convergence tests

Here, we investigate how the network size and training dataset affect the convergence of the DeepONet. We apply the Adam optimizer with a small learning rate 5 × 10^{−4} to train the DeepONet for 2 × 10^{5} iterations. The rectified linear unit (ReLU) is used for the activation function. By varying the network width, we observe that there is an optimal width to achieve the smallest error (depth 2) [see Fig. 7(a)]. We note that increasing the width from 1 to 500 could decrease the error, but the error would increase instead when the width further increases. We also note that increasing the depth from one hidden layer to ten hidden layers could decrease the training error (width 100), but the test loss reaches the minimum when the depth is about 6. To examine the effect of the training dataset size, we choose networks of width 200 and depth 3 to eliminate the network size effect, and the test errors using different dataset size are shown in Fig. 6(c). As the DeepONet has no constraint on the number of data points per trajectory, we can get the same number of data points either from more trajectories with sparser points per trajectory or fewer trajectories with denser points per trajectory. Figure 6(c) shows the influence of the number of trajectories and the size of the dataset. It is reasonable that including more training data points leads to a smaller error, and we can also see that when the number of data points is fixed, the number of trajectories is usually more important than the density of data points on each trajectory.

Figure 7 shows some representative results of the best model. The width and depth of the network is 200 and 3, respectively, and the number of data points is 1 × 10^{5} (5000 trajectories). The loss histories of the training and testing are presented in Fig. 7(a); the losses converge to rather small values. Upon training, the DeepONet can predict the bubble radius at any time when the input functions are given. The comparison of prediction values against the ground truth is shown in Fig. 7(b) in which the diagonal line indicates a perfect prediction. The points are colored based on the neighbor density because the points near the diagonal line are highly overlapped. Some *R*(*t*) trajectories predicted by the DeepONet are shown in Fig. 7(c) along with the ground truth. The DeepONet can predict them very accurately, capturing all the oscillations, decay, and smooth change afterward. We also use the trained DeepONet to predict an out-of-distribution pressure input signal (not from GRFs), which is the typical solution of the R–P equation discussed in Sec. III [see the results in Fig. 7(d)] (the inset subplot is the pressure field).

### C. Comparison of DeepONet and LSTM

Considering the nature of the present problem where we aim to forecast the time series, it is reasonable to question if any recurrent neural network (RNN) could also be a suitable tool for forecasting besides the DeepONet. RNNs were initially used for language processing models due to their ability to memorize long-term dependencies. However, when time lags increase, gradients of RNNs may vanish through unfolding RNNs into deep FNNs. Due to the gradient vanishing problem, the long short-term memory (LSTM)^{34} was proposed with forget units, which were designed to give the memory cells the ability to choose when to forget certain information, thus determining the optimal time lags. The LSTM cell contains mainly four parts: input gate, input modulation gate, forget gate, and output gate. The input gate takes a new input point from outside and processes newly coming data; the input modulation gate takes input from the output of the LSTM cell in the last iteration; the forget gate decides when to forget the output results and thus selects the optimal time lag for the input sequence; and the output gate takes all results calculated and generates output for the LSTM cell. LSTM has been utilized for traffic flow prediction,^{35} vessel dynamics prediction,^{36} human trajectory prediction,^{37} etc.

In the comparison case, the training data for both the DeepONet and LSTM are generated in the same way (see Sec. III A). With ample training data and tuned hyper-parameters, we find that both the DeepONet and LSTM can handle this problem well. The DeepONet has a clear advantage due to its flexibility in handling arbitrary training data points. We develop six benchmark cases and plot the results in Fig. 8. The inset plots are the input functions, and they are the same in each column for a fair comparison. To train the model, 5000 pairs of liquid pressure and bubble radius trajectories are provided while the availability of data points on the radius trajectories is limited. In Figs. 8(a) and 8(d), only 20 data points are known per trajectory; in Figs. 8(b) and 8(e), 80 data points are known per trajectory; and in Figs. 8(c) and 8(f), 200 data points are known per trajectory. These sample points are evenly spaced in LSTM’s case because LSTM requires that the locations of all data points be aligned. Even though LSTM learned well on these given locations, it does not have any knowledge on locations that are not on a predefined grid [see Figs. 8(a) and 8(b)]. The problem can be alleviated when the data points are sufficiently dense [Fig. 8(c)]. On the other hand, the sample points have arbitrary and different distribution for each trajectory in DeepONet’s case. As stated in Sec. II, the DeepONet does not enforce any constraint on the number or locations for the evaluation of the output function. This improvement enables the DeepONet to predict accurately based on any arbitrary locations, no matter how sparse the training data are [see Figs. 8(d) and 8(e)]. From Fig. 8(d), we note that the predicted trajectory does not deviate from the ground truth. Moreover, from our experience, once the training data are ample, the DeepONet can predict smooth and accurate results even if only a single data point is collected per trajectory. In general, we have no prior knowledge about the right time interval at which to perform measurements or we may have difficulty in acquiring very dense data, and hence, the DeepONet shows a clear advantage both for its accuracy and versatility.

### D. Predictions for random initial bubble size

In this section, we extend the model to different initial bubble sizes. The initial condition can be embedded in the DeepONet in many ways, so here we choose to input *R*(*t*_{0}) in the trunk net along with time *t*. In the current example, the training dataset contains 30 000 pairs of trajectories. Different from previous cases, the bubble radius trajectories are solved with random initial size ranging from 1 × 10^{−5} m to 2 × 10^{−5} m. Some examples of the input and output functions are presented in Fig. 9. As shown, the initial bubble size has a strong effect not only on the initial value of the solution but also on the oscillation frequency and the decay rate.

Upon training, DeepONet’s predictions agree with the ground truth very well. Figure 10(a) shows the comparison; each point’s horizontal coordinate is a bubble radius value predicted by the DeepONet at random times, while the vertical coordinate is the corresponding ground truth. The color indicates the density of overlapping points. We see that most points lie on or near the diagonal line, which indicates an almost perfect prediction. Figure 10(b) shows some samples of bubble radius evolution predicted by the DeepONet and solved with the R–P equation from the test dataset. We note that the different oscillation magnitude, oscillation frequency, and decay are accurately predicted by the DeepONet.

### E. Extrapolation: Predicting inputs outside the function space of the training data

The DeepONet predictions shown previously are results for the pressure field in the input training range, i.e., the testing was performed with independent data but with a pressure field generated with *l* = 0.2 [Eq. (7)], the same correlation length as in the training data (interpolation). In this section, we would like to obtain predictions with the trained DeepONet for pressure fields that do not belong to the input function space, i.e., perform an extrapolation. In particular, we want to use the trained DeepONet to predict the bubble radius based on pressure fields generated with unseen correlation length *l*, ranging from 0.07 to 0.9. The other parameters used are as follows: *q* in Eq. (7) equals 0.5, the network size is (50, 50, 50, 50) for both the trunk and branch net, and the training dataset contains 3000 pairs of trajectories. These settings remain the same throughout this section.

In Fig. 11(a), we show the test error when we extrapolate to the unseen pressure field with *l* ranging from 0.07 to 0.9. The error rises moderately when *l* increases but may go up quite sharply when *l* decreases. In Fig. 11(b), we use as training data a combination of three different sets corresponding to values of *l* (0.1, 0.15, and 0.2). Even though the training time and all other parameters are the same, the test errors of extrapolation are reduced substantially. The light blue area in the plot indicates the interpolation range; the test errors for *l* = 0.125 and *l* = 0.175 are smaller than that for *l* = 0.1. Then, we design the representation of the input space more carefully, moving the window toward the smaller values of *l*, where we observed the highest error. In Fig. 11(c), the training data are a combination of *l* = 0.07, 0.1, and 0.15. We observe that the errors with small *l* are significantly reduced, while for the extrapolation with large *l*, the errors increase. Overall, the errors in the entire range are more balanced. Figure 11(d) shows an extra case when the training data are a combination of *l* = 0.07, 0.15, and 0.9. In this case, the training data are selected with more representative *l* so that all the test cases fall into the interpolation range. We observe that the test errors are reduced more uniformly, which indicates that a better representation of the input space will lead to a higher accuracy.

Next, we will demonstrate how extrapolation can be improved when only a few data are available for the output variable. In this example, the DeepONet is trained with liquid pressure produced with *l* = 0.1, 0.15, and 0.2, as shown in Fig. 11(b), and is applied to extrapolate for *l* = 0.07. We use the relative *L*_{2}-norm error to evaluate the performance, namely,

where *R*^{P} and *R*^{T} are the predicted and true bubble radii, respectively. Figures 12(a) and 12(d) show two examples of using the pre-trained DeepONet for making predictions on new scenarios (i.e., unseen data), where we can see that the *L*_{2} errors are relatively high. In order to improve the performance of the DeepONet for extrapolation, we assume that a few measurements of the unseen profile are available. Moreover, since the input function is known *a priori*, when doing extrapolation, the locations of these additional data points can be selected based on the local extremum of the pressure profile. Then, we fine-tune the pre-trained networks based on only the new measurements by using the Adam optimizer with a reduced learning rate (2 × 10^{−5}) for 100 epochs. We note that only the last layer of the trunk net is tunable during the fine-tuning process. The results are shown in Figs. 12(b), 12(c), 12(e), and 12(f) from which we can see that the predictions after fine-tuning are in good agreement with the ground truth. In the first case, the *L*_{2} relative error drops from 15.3% to 9.7% (three new data) or 6.2% (six new data), and in the second case, the *L*_{2} error drops from 19.5% to 14.3% (three new data) or 11.1% (six new data). Overall, by performing transfer learning with sparse data points, we obtain much better solutions compared to just the original pretrained networks. In addition, we note that the transfer learning process is very fast, i.e., it takes only a few seconds either on a single central processing unit (CPU) or graphics processing unit (GPU).

## IV. MICROSCALE (STOCHASTIC) REGIME

The R–P equation is based on the continuum assumption, which is not valid in the microscopic regime. Usually, particle-based simulation methods are good alternatives at the microscale. In this section, the DPD method is employed to simulate the dynamics of bubble size evolution at the microscale level and to generate data for training the DeepONet. As with MD systems, a DPD model consists of many interacting particles governed by Newton’s equation of motion,^{38}

where *m*_{i} is the mass of a particle *i*, **r**_{i} and **v**_{i} are position and velocity vectors of particle *i*, and **F**_{i} is the total force acting on particle *i* due to the presence of neighboring particles. The summation for computing **F**_{i} is carried out over all neighboring particles within a cutoff range. Beyond the cutoff, all pairwise interactions are considered zero. The pairwise force is comprised of a conservative force $FijC$, a dissipative force $FijD$, and a random force $FijR$, which, respectively, are of the form

where *r*_{ij} = |**r**_{ij}| = |**r**_{i} − **r**_{j}| is the distance between particles *i* and *j*, **e**_{ij} = **r**_{ij}/*r*_{ij} is the unit vector, and **v**_{ij} = **v**_{i} − **v**_{j} is the velocity difference; $dW\u0303ij$ is an independent increment of the Wiener process.^{39} The coefficients *a*_{ij}, *γ*_{ij}, and *σ*_{ij} determine the strength of the three forces, respectively. To satisfy the fluctuation–dissipation theorem (FDT)^{39} and to maintain the DPD system at a constant temperature,^{38,40} the dissipative force and the random force are constrained by $\sigma ij2=2\gamma ijkBT$ and $\omega D(rij)=\omega R2(rij)$. Common choices for weight functions are *ω*_{C}(*r*_{ij}) = 1 − *r*_{ij}/*r*_{c} and $\omega D(rij)=\omega R2(rij)=(1\u2212rij/rc)1/2$, with *r*_{c} being the cutoff radius.

With the purely repulsive conservative forces, DPD is most suitable for simulating a gas, which fills the space spontaneously. To simulate the liquid phase, we use MDPD, which is an extension of DPD by modifying the conservative force to include both a repulsive force and an attractive force,^{41,42}

Because a DPD model with a single interaction range cannot maintain a stable interface,^{43} the repulsive contribution in Eq. (11) is set to act within a shorter range *r*_{d} < *r*_{c} than the soft pair attractive potential. The many-body repulsion is chosen in the form of a self-energy per particle, which is quadratic in the local density, *B*_{ij}(*ρ*_{i} + *ρ*_{j})$w$_{d}(*r*_{ij}), where *B* > 0. The density of each particle is defined as

and its weight function $w$_{ρ} is defined as

where $w$_{ρ} vanishes for *r* > *r*_{d}.

Figure 13 illustrates the DPD-MDPD coupled simulation system. To reduce the computational cost, a thin box with periodic boundaries in the *x* and *z* directions is used. The yellow particles represent the gas phase, with the interaction force modeled by a DPD potential; the blue particles represent the liquid phase, with the interaction force modeled by a MDPD potential. The interaction force between gas and liquid particles is described by a DPD potential, which can prevent the gas particles dissolving into the liquid phase. The top and bottom solid walls are constructed by pre-equilibrium frozen particles. The number density of wall particles is the same as the density of the liquid. The no-slip boundary method^{44} for arbitrary-shape geometries is applied. To change the system pressure, the top wall is set to have one degree of freedom moving just in the *y* direction. Before the simulation starts, a uniform static external pressure is exerted on the top wall to reach an initial equilibrium state. Then, the value of the external pressure evolves following a preset function. We continuously record the volume of the gas phase and use it to compute the effective bubble radius. Because the particles of a DPD system are randomly distributed in space, it is difficult to accurately compute the volume of the gas bubble. In the present work, we adopt a three-dimensional Voronoi tessellation^{45} to estimate the instantaneous volume occupied by gas particles. In the Voronoi scheme, each particle in the system occupies a cell around it, which contains all the space closer to that particle. In our case, the Voronoi cells take the form of irregular polyhedra (see the inset of Fig. 13). Let Ω be the volume of a gas or liquid phase; we compute the instantaneous volume-averaged virial stress based on the virial theorem,^{46}

where *α* and *β* take values of *x*, *y*, and *z* to generate the components of the stress. The first term represents a kinetic energy contribution for particle *i*. The second term is a pairwise energy contribution due to interactions between particles *i* and *j*, where *j* loops over the *N*_{p} neighbors of particle *i*; **r**^{i} and **r**^{j} are the positions of the two particles in the pairwise interaction and **F**^{i} and **F**^{j} are the forces on the two particles resulting from the pairwise interaction. We take the mean of diagonals of the negative stress tensor as the pressure,

Separate simulations are conducted to measure the properties of the gas and liquid phases. The simulation setup and the measured properties of fluids are summarized in Table II. By mapping the dimensionless properties (density, surface tension, and viscosity) to properties of water at 20 °C, the scaling relationship can be determined:^{47,48} length reference [*l*] = 4.12 × 10^{−8} m, time reference [*t*] = 1.79 × 10^{−9} s, and mass reference [*m*] = 8.82 × 10^{−21} kg. With this scaling relationship, the dimensionless properties in the DPD simulation can be transformed to physical values. For example, the bubble radius is 5.77 × 10^{−7} m and the simulation time span is ∼3.58 × 10^{−6} s. The preset pressure on the top wall is generated using GRFs [Eq. (7), *μ* = 37.55 and *σ* = 2.17] with an exponential function mask ensuring smooth transition from the initial equilibrium state. Examples of pressure trajectories are presented in Fig. 14.

. | Property . | Value . |
---|---|---|

System | Box | X = 60 (periodic) |

Y = 66 (shrink-wrapped) | ||

Z = 4 (periodic) | ||

Temperature | k_{b}T = 1.0 | |

Cutoff | r_{c} = 1.0 | |

Time | T = 2000 | |

Time step | t = 0.01 | |

Pressure | See Eq. (7), μ = 37.55, σ = 2.17 | |

Liquid | MDPD parameter | A = −40, B = −25, γ = 4.5, r_{d} = 0.75 |

Number density | n_{l} = 7.93 | |

Particle mass | m_{l} = 1.0 | |

Liquid density | ρ_{l} = 7.93 | |

Dynamic viscosity | μ_{l} = 8.36 | |

Speed of sound | c = 18.52 | |

Gas | DPD parameter | a = 5, γ = 4.5 |

Number density | n_{g} = 8.25 | |

Particle mass | m_{g} = 0.2 | |

Gas density | ρ_{g} = 1.65 | |

Dynamic viscosity | μ_{g} = 1.41 | |

Interface | DPD parameter | a = 20, γ = 4.5 |

Pressure in liquid | p_{l} = 43.08 | |

Pressure in gas | p_{g} = 44.96 | |

Bubble radius | R = 14.05 | |

Surface tension | γ = 26.42 |

. | Property . | Value . |
---|---|---|

System | Box | X = 60 (periodic) |

Y = 66 (shrink-wrapped) | ||

Z = 4 (periodic) | ||

Temperature | k_{b}T = 1.0 | |

Cutoff | r_{c} = 1.0 | |

Time | T = 2000 | |

Time step | t = 0.01 | |

Pressure | See Eq. (7), μ = 37.55, σ = 2.17 | |

Liquid | MDPD parameter | A = −40, B = −25, γ = 4.5, r_{d} = 0.75 |

Number density | n_{l} = 7.93 | |

Particle mass | m_{l} = 1.0 | |

Liquid density | ρ_{l} = 7.93 | |

Dynamic viscosity | μ_{l} = 8.36 | |

Speed of sound | c = 18.52 | |

Gas | DPD parameter | a = 5, γ = 4.5 |

Number density | n_{g} = 8.25 | |

Particle mass | m_{g} = 0.2 | |

Gas density | ρ_{g} = 1.65 | |

Dynamic viscosity | μ_{g} = 1.41 | |

Interface | DPD parameter | a = 20, γ = 4.5 |

Pressure in liquid | p_{l} = 43.08 | |

Pressure in gas | p_{g} = 44.96 | |

Bubble radius | R = 14.05 | |

Surface tension | γ = 26.42 |

We run 400 independent DPD simulations to generate the training data for the DeepONet. Each simulation can generate one pair of input and output functions. Figure 15 shows two representative simulations. In Fig. 15(a), we plot both the preset pressure on the top wall and the measured liquid pressure. The measured liquid pressure exhibits noticeable noise compared to the smooth input pressure signal. Figure 15(b) shows the measured bubble radius; the corresponding Δ*P*(*t*) and *R*(*t*) are presented with the same color.

The setup of the DeepONet is summarized in Table III. Here, 50 evenly spaced sensors are employed for the input function and 25 randomly spaced points are employed for the output function. Both branch and trunk nets have two hidden layers and 200 neurons per layer.

Pressure (Train + Test) | No. of pressure | No. of radius | No. of train data | No. of test data |

= 400 + 100 | sensors = 50 | sensors = 25 | points = 10 000 | points = 2500 |

Type of DeepONet | Branch depth = 2 | Branch width = 200 | Trunk depth = 2 | Trunk width = 200 |

unstacked | ||||

Pressure generator | Optimizer Adam | Learning rate = 0.0005 | Activation ReLU | No. of epochs = 10 000 |

Gaussian random |

Pressure (Train + Test) | No. of pressure | No. of radius | No. of train data | No. of test data |

= 400 + 100 | sensors = 50 | sensors = 25 | points = 10 000 | points = 2500 |

Type of DeepONet | Branch depth = 2 | Branch width = 200 | Trunk depth = 2 | Trunk width = 200 |

unstacked | ||||

Pressure generator | Optimizer Adam | Learning rate = 0.0005 | Activation ReLU | No. of epochs = 10 000 |

Gaussian random |

Figure 16(a) shows the loss histories for training and testing. Figure 16(b) is the parity plot of values predicted by the DeepONet and DPD simulation results. We see that almost all data points fall near the diagonal line, which indicates almost perfect accuracy. The small discrepancy is mainly because the raw trajectory is noisy but the predicted trajectory is quite smooth (see Fig. 17).

Figure 17 shows two representative cases of the DeepONet prediction. Even though the DPD results exhibit high frequency fluctuations and we only collect sparse data points on the noisy trajectory, the DeepONet yields an accurate smooth trajectory, which captures the mean trajectory of the data quite well.

The results shown above indicate that once trained, the DeepONet can be used to infer the microscale bubble growth from a complex particle simulation system, which cannot be easily described by simple control functions. Conventional numerical simulation methods, in which pair-list updates and inter-particle force calculation are required, are usually very time consuming. For each DPD simulation, it takes 3 h on a 16-core workstation, while it takes only a fraction of a second for the DeepONet to evaluate *R*(*t*), so the speed-up factor in this case exceeds 200 000.

## V. SUMMARY

In this paper, we first introduce the macroscopic model (R–P) and microscopic model (DPD) for the problem of the growth of a single bubble due to time-varying changes in the ambient liquid pressure. We then demonstrate the effectiveness of the DeepONet for both models. We use Gaussian random fields to generate various pressure fields, which are the input signals to the dynamical system. We investigate the influence of the network depth, width, and data size, and by comparing to LSTM, we demonstrated a unique advantage of the DeepONet, i.e., the flexibility in data density and location, which will be very useful when we have no prior knowledge of how much data density is sufficient or when we have restrictions on data acquisition. Moreover, we test the case when the input is not within the input space, i.e., the correlation length of the pressure field is out of the training range. In this case, we cannot obtain very accurate predictions; however, we resolve this issue by transfer learning and fine-tuning the trunk net of the pre-trained DeepONet with just a few new data points. In the microscale regime, the data generated by the DPD model contain high frequency fluctuations. We show that without any extra data processing, the DeepONet can learn the mean component of these noisy raw data, and the computational time could be reduced from 48 (3 h × 16 cores) CPU hours to a fraction of a second for a speed-up over 200 000. Learning the stochastic fluctuations accurately requires a more careful design and training of the DeepONet, and we leave this for future work.

More broadly, the results in this paper show that the DeepONet can be used both in the macroscopic and microscopic regimes of the bubble growth dynamics, which creates a good foundation for a unified neural network model that could predict the results from macroscale to microscale seamlessly. This topic is currently under investigation, and we will report such results in future work.

## ACKNOWLEDGMENTS

The authors acknowledge support from DOE/PhILMs (Grant No. DE-SC0019453) and ARL, Utah (Cooperative Agreement No. W911NF-12-2-0023).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.