Machine learning (ML) is used to build a new computationally efficient data-driven dynamical model for single-phase and complex multicomponent particle–liquid turbulent flows in a stirred vessel. By feeding short-term trajectories of flow phases or components acquired experimentally for a given flow condition via a positron emission particle tracking (PEPT) technique, the ML model learns primary flow dynamics from the input driver data and predicts new long-term trajectories pertaining to new flow conditions. The model performance is evaluated over a wide range of flow conditions by comparing ML-predicted flow fields with extensive long-term experimental PEPT data. The ML model predicts the local velocities and spatial distribution of each flow phase and component to a high degree of accuracy, including conditions of impeller speeds, particle loadings and sizes within and without the range of the input driver datasets. A new flow analysis and modeling strategy is thus developed, whereby only short-term experiments (or alternatively high-fidelity simulations) covering a few typical flow situations are sufficient to enable the prediction of complex multiphase flows, significantly reducing experimental and/or simulation costs.
I. INTRODUCTION
Machine learning (ML) is an extremely powerful data-driven technique, which automatically builds a mathematical model from supplied sample data to make decisions and predictions without being explicitly programmed. It has achieved advances in a diverse range of applications, such as climatology, fluid turbulence, finance, robotics, and neuroscience.^{1–3} ML is especially useful for complex dynamical systems whose characteristics of nonlinearity, multiscale, high-dimensionality, and dynamics often limit the use of conventional methods to understand, predict, design, and control them. ML strategies offer an agile and modular modeling framework that can solve specific issues based purely on input data.^{4} For example, ML recently played an important role in better understanding how COVID-19 spread, thus helping to inform the fight against the pandemic worldwide.^{5,6} Although data-driven ML has achieved significant success in a number of fields, this is still a growing science and there is a need to explore and develop new strategies to widen and improve its applicability to science and engineering.
Mixing flows in mechanically agitated vessels are a typical example of a dynamical system in the study of multiphase fluid mechanics, where the blending of different phases and/or phase components produces a complex dynamic flow behavior. The main objective is to rapidly reduce the inhomogeneity of phase, temperature, and concentration, thus, speeding up mixture production and ensuring good product physical mixing and/or enhancing chemical reaction.^{7,8} The selection of a suitable system for a specific mixing application should consider a number of factors including the vessel geometry, impeller type, fluid properties and operating conditions since the resulting mixing flow pattern is a complex function of these parameters.^{9} Thus, understanding the complex flow dynamics involved is key to the successful design, operation and optimization of these devices and processes. Conventionally, the prediction of complex flow dynamics is achieved by establishing mechanistic models based on the underlying knowledge of physicochemical phenomena or simplistic empirical models.^{10} However, the high complexity of such flow systems often poses challenges for creating accurate models, and numerical solutions may not always be practicable due to extensive computation costs.
Fortunately, the ever-increasing availability of high-fidelity experimental and numerical data has opened up new routes for modeling complex flows and, in this respect, data-driven ML methods have attracted significant attention in recent years. One of the main topics of interest is using ML to analyze fluid dynamics phenomena with reduced-order modeling, which projects dynamical methods in reduced form and improves computational efficiency on high-dimensional data. Hasegawa et al.^{11} used a convolutional neural network autoencoder to extract the evolution of laminar bluff body wakes, which took advantage of the low-dimensional feature of the latent space and captured hidden wakes around a body of an arbitrary shape. Another promising topic is closure modeling or establishing physically informed models that improve the speed or accuracy of conventional computational fluid dynamics (CFD) models. For example, Ling et al.^{12} applied ML to identify and model the Reynolds stress tensor discrepancies between the Reynolds-averaged Navier–Stokes (RANS) model and high-fidelity simulations. Hou et al.^{13} used neural networks to predict flow properties and detect flow disturbances in dynamical systems, and Zhai et al.^{14} developed a semi-physics-informed neural network to predict the microbubble dynamics in bubbly flows. Moreover, ML was applied to flow pattern identification, sensor placement, and flow control,^{15} and an adaptive sensing and control strategy was developed to arrange the sensor placement for obtaining maximal information.^{16} Rabault et al.^{17} used artificial neural networks to discover control strategies for active flow control.
More recently, a number of other data-driven theoretical methodologies based on high-fidelity driver data have also been reported, but have not been widely used to model and analyze engineering fluid flows, including Lagrangian stochastic modeling (LSM), Lagrangian recurrence tracking and Lagrangian coherent structure detection.^{18–20} Sheikh et al.^{20,21} developed a data-driven Lagrangian stochastic model (LSM) to simulate single-phase and particle–liquid flows in stirred vessels, which was driven by experimental local velocity measurements in conjunction with decorrelation statistics. LSM has also been used to model geophysical ocean flows for underwater vehicles,^{22} to predict the mixing and transport of pollutants in water or atmosphere,^{23,24} and to study solar wind turbulence and turbulent combustion.^{25,26}
Recently, we successfully developed an ML modeling framework to reconstruct the turbulent flow fields of single-phase and two-phase flows in a stirred vessel. The strategy relied on feeding a very short-term Lagrangian trajectory of the phase concerned, experimentally determined by a positron emission particle tracking (PEPT) technique.^{27} The method proved very efficient in producing the corresponding long-term Lagrangian trajectory of the particular phase, having the same statistical features learned from the short driver dataset input, which greatly reduces the experimental and/or numerical costs needed for complex flow simulations. This work represents another branch of ML applications in fluid dynamics, i.e., flow field and parameter estimation via machine learning from limited data input.
In this study, we extend the capability of the ML framework to predict the flow developed under conditions different from those under which the limited sets of driver data are experimentally acquired. In addition to different conditions of single-phase flow, the strategy is also extended to complex two-phase multicomponent particle–liquid flows. The framework strategy is borrowed from the Reynolds averaging concept, where the instantaneous flow field is approximated by a mean flow field coupled with a Gaussian distributed fluctuation. The mean flow field is predicted by a k-nearest neighbors (KNN) regressor, which is trained on the driver Lagrangian trajectory data, and the fluctuation is produced by a Gaussian noise generator with the same statistical pattern corresponding to the driver flow conditions. Extensive experimental PEPT data are used to validate the ML framework, including the local velocity field and spatial distribution of each flow phase and component involved.
II. EXPERIMENTAL AND TURBULENT DYNAMIC ANALYSIS
A. Mixing apparatus
Single-phase and complex particle–liquid mixing experiments were conducted in a stirred vessel of standard configuration with diameter T = 288 mm (radius R = 0.5T), fitted with four wall baffles of width 0.1T and filled to a height H = T. The vessel was agitated by a down-pumping six-blade 45° pitched-turbine (PBTD) of diameter D = 0.5T, blade height 0.1T, and off-bottom clearance 0.25T, as depicted in Fig. 1(a). In single-phase flow tests, the vessel was filled with NaCl solution (density 1150 kg/m^{3}). Agitation speeds ranged from 60 to 540 rpm, corresponding to a fully turbulent regime, i.e., impeller Reynolds number Re_{imp} ≥ 24 × 10^{3}. In particle–liquid flow experiments, two different types of ballotini suspensions (density 2485 kg/m^{3}) were used, i.e., monodisperse and polydisperse. The suspending medium was an aqueous NaCl solution. The monodisperse suspensions were studied at mean solid mass loadings of C_{m} = 5 and 20 wt. %, and agitated at impeller speeds ranging from the minimum speed for particle suspension N_{js} up to 2N_{js}. The polydisperse suspensions consisted of five particle sizes of equal mass fraction and overall mean solid mass loadings of C_{m} = 5, 10, 20, and 40 wt. % and were agitated at an impeller speed corresponding to N_{js}. In each case, the N_{js} speed was experimentally determined based on the well-known Zwietering criterion.^{28} The conditions of the experiments conducted are summarized in Table I.
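The impeller Reynolds numbers quoted for the single-phase runs can be cross-checked with a short calculation. The sketch below assumes the standard stirred-vessel definition Re_{imp} = ρND^{2}/μ with N in revolutions per second; the liquid viscosity is not given in the text, so the value used here (1.0 × 10^{-3} Pa s) is an assumption, and the function name is illustrative only.

```python
def impeller_reynolds(n_rpm, d_imp, rho, mu):
    """Impeller Reynolds number rho * N * D**2 / mu, with N converted to rev/s."""
    n_rps = n_rpm / 60.0
    return rho * n_rps * d_imp ** 2 / mu


T = 0.288      # vessel diameter (m), from the text
D = 0.5 * T    # impeller diameter (m), from the text
RHO = 1150.0   # NaCl solution density (kg/m^3), from the text
MU = 1.0e-3    # ASSUMED liquid viscosity (Pa s); not stated in the text

for n in (60, 260, 540):
    re = impeller_reynolds(n, D, RHO, MU)
    print(f"N = {n} rpm -> Re_imp = {re / 1e5:.2f} x 10^5")
```

With the assumed viscosity, the lowest speed (60 rpm) gives a value close to the 24 × 10^{3} threshold quoted above; small differences from the tabulated values at higher speeds would reflect the assumed viscosity rather than the method.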
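TABLE I. Conditions of the experiments conducted.

| Flow system | C_{m} (wt. %) | C_{v} (vol. %) | d (mm) | T (mm) | N (rpm) | Re_{imp} (×10^{5}) |
| --- | --- | --- | --- | --- | --- | --- |
| Single-phase | ⋯ | ⋯ | ⋯ | 288 | 60 | 0.24 |
| Single-phase | ⋯ | ⋯ | ⋯ | 288 | 100 | 0.40 |
| Single-phase | ⋯ | ⋯ | ⋯ | 288 | 150 | 0.60 |
| Single-phase | ⋯ | ⋯ | ⋯ | 288 | 260 | 1.03 |
| Single-phase | ⋯ | ⋯ | ⋯ | 288 | 330 | 1.31 |
| Single-phase | ⋯ | ⋯ | ⋯ | 288 | 400 | 1.59 |
| Single-phase | ⋯ | ⋯ | ⋯ | 288 | 500 | 1.98 |
| Single-phase | ⋯ | ⋯ | ⋯ | 288 | 540 | 2.13 |
| Monodisperse particle–liquid | 5 | 2.5 | 3 | 288 | 360 (N_{js}) | 1.43 |
| Monodisperse particle–liquid | 5 | 2.5 | 3 | 288 | 540 (1.5N_{js}) | 2.13 |
| Monodisperse particle–liquid | 5 | 2.5 | 3 | 288 | 720 (2N_{js}) | 2.84 |
| Monodisperse particle–liquid | 20 | 10.4 | 3 | 288 | 490 (N_{js}) | 1.94 |
| Monodisperse particle–liquid | 20 | 10.4 | 3 | 288 | 613 (1.25N_{js}) | 2.42 |
| Monodisperse particle–liquid | 20 | 10.4 | 3 | 288 | 735 (1.5N_{js}) | 2.91 |
| Polydisperse particle–liquid | 5 | 2.5 | 1.1, 1.7, 2.1, 2.7, 3.1 (equal mass fraction = C_{m}/5) | 288 | 380 (N_{js}) | 1.50 |
| Polydisperse particle–liquid | 10 | 5.2 | 1.1, 1.7, 2.1, 2.7, 3.1 (equal mass fraction = C_{m}/5) | 288 | 450 (N_{js}) | 1.78 |
| Polydisperse particle–liquid | 20 | 10.4 | 1.1, 1.7, 2.1, 2.7, 3.1 (equal mass fraction = C_{m}/5) | 288 | 510 (N_{js}) | 2.01 |
| Polydisperse particle–liquid | 40 | 23.6 | 1.1, 1.7, 2.1, 2.7, 3.1 (equal mass fraction = C_{m}/5) | 288 | 610 (N_{js}) | 2.41 |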
B. PEPT measurements
PEPT allows non-invasive imaging of opaque flows in opaque devices by using a representative positron-emitting particle tracer to track the three-dimensional (3D) motion of each phase.^{29} In a typical PEPT experiment, a radiolabelled tracer is introduced in the flow and its movement is recorded, providing a long-term Lagrangian trajectory, as shown in Fig. 1(b). The liquid phase was tracked using a ∼600 μm neutrally buoyant resin particle tracer (note that NaCl was added to water to match the density of the resin tracer). Each component of the particle phase was individually tracked using a representative glass bead tracer taken from the particle fraction considered. Being able to visualize opaque flows in 3D with a comparable accuracy to leading optical techniques, such as particle image velocimetry (PIV)/laser Doppler velocimetry (LDV),^{30,31} gives the PEPT technique a unique advantage. PEPT has been extensively used to study single and multiphase flows in pipes and stirred vessels and more details can be found in our earlier papers.^{18,19,21,29,32–37} Here, the Lagrangian trajectories of the carrier fluid and the suspended particles were acquired over a period of ∼40 min, providing ample data to reliably study their flow behaviour.^{33}
C. Dynamic analysis of turbulent flow field
Comparing the standard deviations of the fluctuating velocity components in Fig. 2, a descending order $\sigma_\theta > \sigma_r > \sigma_z$ is observed in both single- and two-phase flows. The tangential fluctuations have the largest standard deviation mainly because of the periodic impeller motion and the vortex-breaking effect of the wall-mounted baffles. In single-phase flows, the standard deviation is not significantly affected by the agitation speed over the range considered, as depicted in Fig. 2(b), which indicates that the flows studied belong to the same regime (i.e., fully turbulent). In a particle–liquid flow at a given condition, there is no significant difference in the fluctuation characteristics between the individual phases, as shown in Fig. 2(c). Comparing the standard deviations of the different flow systems in Figs. 2(b) and 2(c), the fluctuations in two-phase flow are slightly greater than those in single-phase flow, which is consistent with reports that adding large solid particles increases the turbulence intensity of the carrier phase.^{41–43} Hence, the statistical features of the fluctuating velocity components in similar flow regimes are close to each other, i.e., flows of similar regimes possess a similar fluctuation pattern.
III. MACHINE-LEARNING MODELING FRAMEWORK
The proposed ML framework consists of three main modules, as illustrated in Fig. 3: (a) driver Lagrangian data organization; (b) KNN regressor training; and (c) Lagrangian trajectory construction. First, the driver data, which consist of short experimental Lagrangian trajectories, are analyzed and the extracted flow characteristics (i.e., instantaneous velocities and distribution of fluctuating velocities) are stored in a database. Then, a KNN regressor is trained to learn the primary flow pattern governing the flow field. By setting new flow conditions similar to those of the driver data, a corresponding new long-term Lagrangian trajectory is constructed by advancing a seed tracer throughout the instantaneous velocity field, which is approximated by the KNN-predicted mean velocity coupled with a Gaussian fluctuation. The key equations and parameters are presented and discussed in Secs. III A and III B.
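As a compact restatement of the description above, the velocity model can be written as follows. This is a sketch using the symbols of the Nomenclature ($\bar{v}$, $v'$, $\mathcal{N}(\mu, \sigma^2)$, $dt$), not necessarily the authors' exact formulation; in particular, the explicit first-order position update is an assumption made here for illustration.

$$ v_i = \bar{v}_i(N, C_m, d, r, z, \theta) + v_i', \qquad v_i' \sim \mathcal{N}(\mu_i, \sigma_i^2), \qquad i \in \{r, z, \theta\} $$

$$ \mathbf{x}(t + dt) \approx \mathbf{x}(t) + \mathbf{v}\big(\mathbf{x}(t)\big)\, dt $$

Here $\bar{v}_i$ is the mean velocity returned by the KNN regressor at the current tracer location and flow condition, and $v_i'$ is drawn independently for each direction from the Gaussian distribution extracted from the driver data.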
A. Key equations
As described in Sec. II C, the dynamic analysis of the input driver trajectories provides the local instantaneous velocities and the global fluctuation distributions in each direction, which are combined with the flow operating conditions used, namely, the impeller rotational speed (N), particle mass concentration ($C_m$), and particle size (d), and organized into two parts in the database. Part-1 data are encoded in the format $[N, C_m, d, r, z, \theta, v_r, v_z, v_\theta]$ and used to train the KNN regressor, while Part-2 data formatted as $[N, C_m, d, \mu_r, \sigma_r, \mu_z, \sigma_z, \mu_\theta, \sigma_\theta]$ are used to feed the Gaussian noise generator.
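The two-part database and the trajectory construction step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the inverse-distance weighting, the feature scaling, the clamping at the vessel boundaries, the toy solid-body-rotation driver field, and the noise values are all hypothetical, and the function names `knn_predict` and `build_trajectory` are introduced here only for the example.

```python
import math
import random


def knn_predict(train_X, train_V, query, k=3, p=2):
    """Inverse-distance-weighted KNN regression with a Minkowski (L_p) distance."""
    dist_targets = [
        (sum(abs(a - b) ** p for a, b in zip(x, query)) ** (1.0 / p), v)
        for x, v in zip(train_X, train_V)
    ]
    nearest = sorted(dist_targets, key=lambda item: item[0])[:k]
    weights = [1.0 / (dist + 1e-12) for dist, _ in nearest]  # assumed weight function
    total = sum(weights)
    return tuple(
        sum(w * v[i] for w, (_, v) in zip(weights, nearest)) / total
        for i in range(len(nearest[0][1]))
    )


def build_trajectory(seed, cond, train_X, train_V, noise, scale,
                     n_steps, dt, R, H, rng, k=3):
    """Advance a seed tracer (r, z, theta) using mean + Gaussian fluctuation velocities."""
    r, z, theta = seed
    path = [seed]
    for _ in range(n_steps):
        query = tuple(f / s for f, s in zip(cond + (r, z, theta), scale))
        mean_v = knn_predict(train_X, train_V, query, k=k)
        v_r, v_z, v_t = (m + rng.gauss(mu, sigma)
                         for m, (mu, sigma) in zip(mean_v, noise))
        theta = (theta + v_t / max(r, 1e-6) * dt) % (2.0 * math.pi)
        r = min(max(r + v_r * dt, 1e-6), R)  # crude clamp, not a wall-collision model
        z = min(max(z + v_z * dt, 0.0), H)
        path.append((r, z, theta))
    return path


R, H = 0.144, 0.288                             # vessel radius and fill height (m)
scale = (540.0, 1.0, 1.0, R, H, 2.0 * math.pi)  # ASSUMED feature scaling
rng = random.Random(0)

# Part 1 (toy data): [N, C_m, d, r, z, theta] -> [v_r, v_z, v_theta]
train_X, train_V = [], []
for N in (260.0, 400.0):
    for _ in range(200):
        r, z = rng.uniform(0.01, R), rng.uniform(0.0, H)
        th = rng.uniform(0.0, 2.0 * math.pi)
        train_X.append(tuple(f / s for f, s in zip((N, 0.0, 0.0, r, z, th), scale)))
        train_V.append((0.0, 0.0, 0.01 * N * r))  # hypothetical solid-body rotation

# Part 2 (hypothetical values): (mu, sigma) for the r, z and theta fluctuations
noise = [(0.0, 0.05), (0.0, 0.03), (0.0, 0.08)]

path = build_trajectory((0.5 * R, 0.5 * H, 0.0), (330.0, 0.0, 0.0),
                        train_X, train_V, noise, scale,
                        n_steps=50, dt=0.005, R=R, H=H, rng=rng)
print(len(path), path[-1])
```

The query at 330 rpm sits between the two toy driver speeds, mirroring the interpolation case discussed later. The angular feature is treated as a plain number here, so the periodicity of θ in the distance metric is ignored for brevity.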
B. Key parameters
To implement the ML framework, several key parameters should be determined and used to simulate the new phase/component trajectory of flow systems in stirred vessels. First and foremost is the minimum amount of input Lagrangian driver data, which should include tracer visits to most of the grid cells in the vessel domain, and which serves as a benchmark for the flow pattern. Trial-and-error tests showed that using a 5 min trajectory was sufficient to achieve an excellent agreement between ML-predicted flow fields and PEPT measurements for the same flow condition.
The aim here is to use ML to predict long-term Lagrangian trajectories pertaining to new flow conditions; hence, several short-term trajectories from different flow conditions should be utilized to drive the ML framework. For example, to predict the single-phase flow field in a vessel agitated at any given value of N, different input driver datasets obtained at various agitation speeds should be used, and similarly for other flow variables. A sample case is illustrated in Fig. 4. The three velocity components and total velocity are computed from 40 min trajectories predicted by the ML approach for N = 330 rpm, using driver datasets consisting of individual trajectories (5 min each) corresponding to: (i) one agitation speed (260 rpm); (ii) two agitation speeds (260, 400 rpm); and (iii) four agitation speeds (150, 260, 400, and 500 rpm). Comparison of the results with PEPT measurements shows that a single 5 min driver dataset corresponding to 260 rpm yields an overall good prediction of the velocity field at 330 rpm, despite some minor discrepancies throughout the vessel. Feeding driver data corresponding to two agitation speeds clearly enhances predictability, whereas no significant further improvement is obtained from four datasets compared to two, as depicted in Fig. 4. Similar tests showed that this also applies to other flow variables, e.g., $C_m$ or d.
Moreover, the quality of the driver datasets is also crucial since it serves as a benchmark, i.e., a poor driver dataset (inaccurate experimental data) would mislead the ML framework. Henceforth, for optimum accuracy and computational efficiency, two datasets of short-term Lagrangian trajectories will be used to drive the framework. Furthermore, extensive tests showed that the prediction accuracy of the three velocity components is fully reflected in the total velocity; therefore, to avoid duplication, only the total velocity profiles are presented in Sec. IV. It should be noted, however, that the flow to be predicted should not pertain to a hydrodynamic regime, in terms of flow (e.g., turbulent) and particle suspension (e.g., just-suspended), which is too dissimilar to the one from which the driver data are obtained. In other words, extrapolation from one hydrodynamic regime to a completely different one is not currently possible.
The next key parameter is k in Eq. (4), which is the number of instances in the training data used to predict a new instance, determining the performance of the KNN regressor. Too large or too small a k value will produce serious errors by underfitting or overfitting the training data. In other words, as k increases, the ML framework performance gradually improves to an optimum point beyond which it starts deteriorating. In general, this optimum k value is automatically determined by minimizing the KNN prediction error during the training process [Fig. 3(b)].
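The trade-off in k described above can be illustrated with a simple hold-out search. The sketch below is an assumption-laden toy: it uses a one-dimensional synthetic dataset, an unweighted KNN average, and a fixed train/validation split, none of which are taken from the paper, and the helper names are hypothetical. It only shows the general idea of picking the k that minimizes a validation error.

```python
import math
import random


def knn_mean(train, query, k):
    """Unweighted KNN regression on (x, y) pairs with scalar x."""
    nearest = sorted(train, key=lambda xy: abs(xy[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(train, valid, k):
    """Root-mean-square prediction error on the validation pairs."""
    sq = sum((knn_mean(train, x, k) - y) ** 2 for x, y in valid)
    return math.sqrt(sq / len(valid))


def select_k(train, valid, candidates):
    """Return the candidate k with the smallest validation error."""
    return min(candidates, key=lambda k: rmse(train, valid, k))


# Synthetic noisy samples of a smooth function (hypothetical data).
rng = random.Random(1)
xs = [rng.uniform(0.0, 6.0) for _ in range(300)]
data = [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]
train, valid = data[:200], data[200:]

candidates = [1, 2, 5, 10, 20, 50, 100, 200]
best_k = select_k(train, valid, candidates)
errors = {k: round(rmse(train, valid, k), 3) for k in candidates}
print(best_k, errors)
```

With k equal to the full training set size, the predictor collapses to the global mean (underfitting); with k = 1 it reproduces the noise of individual samples (overfitting). The printed error table makes the intermediate optimum visible.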
IV. RESULTS AND DISCUSSION
The ML model was implemented using the Python language, to predict long-term Lagrangian trajectories in single-phase and complex multicomponent particle–liquid turbulent flows under a range of conditions (N = 60–540 rpm and $C_m$ = 5–40 wt. %). The typical computation time for predicting and constructing a 40 min ML trajectory with a typical 5 ms time step was around 2 h, which is orders of magnitude less than most conventional numerical simulations of turbulent flows. Note that the size of the time step does not affect the accuracy of the results provided it is on the order of milliseconds; 5 ms was selected to match the time step of the acquisition of the experimental PEPT data. For each application, the ML framework was fed with short-term Lagrangian trajectory driver datasets, as described above. To evaluate the framework, the azimuthally averaged profiles of local total velocity and solid phase concentration predicted by ML were compared with the PEPT-measured long-term profiles. Comparison was performed axially over four cylindrical envelopes spanning the vessel radius (r = 0.30R–0.92R) and radially over six horizontal planes spanning the height of the vessel (z = 0.05H–0.89H). Further validation was conducted via verification of mass continuity for all ML modeling cases.
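The azimuthally averaged total-velocity profiles used for the comparison can be sketched as a simple binning operation over trajectory samples. The code below is illustrative only: it assumes uniform radial and axial bins and a plain per-sample average, whereas the paper's grid (the Nomenclature mentions equal-volume cells) and averaging weights may differ. The function names are hypothetical; the tip speed uses the standard relation v_{tip} = πND.

```python
import math
from collections import defaultdict


def azimuthal_speed_profile(samples, R, H, n_r, n_z):
    """Average total speed per (r, z) bin, pooling all azimuthal positions.

    samples: iterable of (r, z, theta, v_r, v_z, v_theta) tuples.
    Returns a dict mapping (i_r, i_z) bin indices to the mean total speed.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, z, _theta, v_r, v_z, v_t in samples:
        i = min(int(r / R * n_r), n_r - 1)   # uniform radial bins (assumption)
        j = min(int(z / H * n_z), n_z - 1)   # uniform axial bins (assumption)
        sums[(i, j)] += math.sqrt(v_r ** 2 + v_z ** 2 + v_t ** 2)
        counts[(i, j)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def tip_speed(n_rpm, d_imp):
    """Impeller tip speed pi * N * D, with N converted to rev/s."""
    return math.pi * d_imp * n_rpm / 60.0


# Two hypothetical samples at the same (r, z) but different azimuthal angles.
samples = [
    (0.05, 0.10, 0.0, 3.0, 4.0, 0.0),
    (0.05, 0.10, 3.0, 0.0, 0.0, 5.0),
]
profile = azimuthal_speed_profile(samples, R=0.144, H=0.288, n_r=4, n_z=4)
print(profile, tip_speed(330.0, 0.144))
```

Dividing each bin value by `tip_speed(N, D)` gives a dimensionless profile, which is a common way to compare runs at different impeller speeds.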
A. Evaluation of ML model in singlephase flows
To evaluate the capability of the dynamical model to predict the velocity field in single-phase flows, two short-term (5 min) trajectories measured by PEPT at N = 260 and 400 rpm were used as input driver data. The flow fields corresponding to agitation speeds of 330 rpm (interpolation within the range of input driver data, $\mathrm{ML}_{\mathrm{single},\,260\,\mathrm{rpm}+400\,\mathrm{rpm}\rightarrow 330\,\mathrm{rpm}}$) and 540 rpm (extrapolation outside the range of input driver data, $\mathrm{ML}_{\mathrm{single},\,260\,\mathrm{rpm}+400\,\mathrm{rpm}\rightarrow 540\,\mathrm{rpm}}$) were then computed from the long-term (40 min) trajectories predicted by the ML model. Comparison of the ML-predicted and PEPT-measured velocity fields is presented in Figs. 6 and 7, showing an excellent agreement for both 330 and 540 rpm. Hence, the ML model is capable of predicting flow both within and without the range of impeller speeds pertaining to the supplied driver data. More extensive validation was performed using PEPT measurements in fully turbulent single-phase flows across a wide range of agitation speeds (see Table I: N = 60–540 rpm). More results are depicted in Fig. S1 in the supplementary material.
B. ML model evaluation in monodisperse particle–liquid suspensions
The capability of the model to predict the flow field of monodisperse particle–liquid suspensions was assessed under different agitation regimes (i.e., at the just-suspension speed N_{js} and at speeds above it up to 2N_{js}). Thus, 5-min-long liquid and solid trajectories determined by PEPT at a certain agitation speed were used as input driver data to simulate the flow field developed at higher speeds. For example, predicting flow at 1.25N_{js} was implemented using the liquid and solid PEPT trajectories developed at N_{js} as input driver data, which is denoted by $\mathrm{ML}_{\mathrm{mono},\,N_{js}\rightarrow 1.25N_{js}}$. The azimuthally averaged profiles of local total liquid and particle velocities predicted by ML for N = 1.25N_{js} and C_{m} = 20 wt. % are, respectively, compared in Figs. 9 and 10(a) with long-term PEPT measurements. Results show that the ML-predicted velocities for both liquid and solid phases are generally in very good agreement with the PEPT measurements.
Similarly, the flow field of the 20 wt. % case was predicted by the ML model at a higher speed of 1.5N_{js} using the liquid and solid PEPT trajectories developed at N_{js} as input driver data, and is denoted by $\mathrm{ML}_{\mathrm{mono},\,N_{js}\rightarrow 1.5N_{js}}$. Comparison of the obtained local total liquid velocity, particle velocity, and particle volume concentration distribution with experiments is depicted in Figs. 11 and 12. Results show that the ML predictions are overall in good agreement with PEPT measurements, although the deviations of particle volume concentration near the bottom corner observed for the case of $\mathrm{ML}_{\mathrm{mono},\,N_{js}\rightarrow 1.25N_{js}}$ are now slightly larger.
Attempts to predict the flow field at the even higher speed of 2N_{js} based on driver data obtained at N_{js} failed for 20 wt. % as well as other concentrations, providing a particle volume concentration with a significant overestimation in the bottom half of the vessel and a significant underestimation in the upper half. This may be attributed to the large difference in the hydrodynamic regime approaching complete suspension homogeneity at 2N_{js} compared to the just-suspended regime at N_{js}. In other words, the ML model is unable to predict flow in a regime that is too remote from the regime of the input driver data. However, using input driver data based on particle–liquid flow developed at N_{js}, the ML model is able to predict with good accuracy flows developed at agitation speeds up to 1.5N_{js}, which corresponds to a large difference (>threefold) in power input.
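The "threefold" figure follows from the usual cubic dependence of impeller power on speed. As a brief worked estimate, assuming a constant power number $P_o$ in the fully turbulent regime and neglecting any change in effective suspension density with speed (both assumptions made here, not stated in the text):

$$ P = P_o\, \rho\, N^3 D^5 \quad\Rightarrow\quad \frac{P(1.5N_{js})}{P(N_{js})} = 1.5^3 \approx 3.4, \qquad \frac{P(2N_{js})}{P(N_{js})} = 2^3 = 8 $$

On the same basis, the failed prediction at 2N_{js} corresponds to roughly an eightfold difference in power input relative to the driver condition.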
Further tests were conducted using driver data that are more diverse, i.e., acquired at two different impeller speeds, for example, at N_{js} and 2N_{js}. Comparing the results of $\mathrm{ML}_{\mathrm{mono},\,N_{js}+2N_{js}\rightarrow 1.5N_{js}}$ (Fig. S2 in the supplementary material) with those of $\mathrm{ML}_{\mathrm{mono},\,N_{js}\rightarrow 1.5N_{js}}$ (Fig. 12) shows that the former case provides improved predictions, especially near the bottom of the vessel. This further demonstrates that the accuracy and reliability of the ML model are improved with a larger and more diverse amount of input driver data, provided the particle–liquid flow regime is not too dissimilar to that of the input driver data. Mass continuity was also verified for all the above cases, as shown in Fig. S3.
C. Evaluation of ML model in polydisperse particle–liquid suspensions
The capability of the proposed ML approach to predict the multicomponent particle–liquid flow field developed at the just-suspension regime (N = N_{js}) was assessed under varying conditions of solid loading. Thus, 5-min PEPT-determined trajectories of all individual components corresponding to two different particle concentrations were used as input driver data to simulate the flow developed at a different particle concentration. For example, the interpolation case of predicting flow at 20 wt. % solid loading, which is denoted by $\mathrm{ML}_{\mathrm{poly},\,5\,\mathrm{wt.\%}+40\,\mathrm{wt.\%}\rightarrow 20\,\mathrm{wt.\%}}$, was implemented using PEPT trajectories of all components from 5 and 40 wt. % suspensions as input driver data. Comparison of the ML-predicted and PEPT-measured liquid velocity field is presented in Fig. 13, showing excellent agreement. Sample validation results of the velocity field and particle distribution are exhibited in Figs. 14 and 15, for two particle size fractions, 1.1 and 3.1 mm. Results for the other particle size fractions (1.7, 2.2, 2.7 mm) are presented in Figs. S4–S6 in the supplementary material. Again, the predictions are overall very good. Mass continuity was verified for all the flow components, and the results are summarized in Fig. S7, showing close-to-zero average velocities $\langle v_z \rangle_{S_z}$ and $\langle v_r \rangle_{S_r}$, generally less than 0.02v_{tip}.
Other tests including interpolation as well as extrapolation cases were conducted using polydisperse suspensions of different solid loadings, 5, 10, and 40 wt. %, denoted by $\mathrm{ML}_{\mathrm{poly},\,10\,\mathrm{wt.\%}+20\,\mathrm{wt.\%}\rightarrow 5\,\mathrm{wt.\%}}$, $\mathrm{ML}_{\mathrm{poly},\,5\,\mathrm{wt.\%}+40\,\mathrm{wt.\%}\rightarrow 10\,\mathrm{wt.\%}}$, and $\mathrm{ML}_{\mathrm{poly},\,10\,\mathrm{wt.\%}+20\,\mathrm{wt.\%}\rightarrow 40\,\mathrm{wt.\%}}$. Sample results of an extrapolation case at 40 wt. % are depicted in Fig. S8 in the supplementary material. Overall, a high degree of model predictability was achieved. In conclusion, the ML framework is capable of very good predictions within and without the range of solid loading concentrations used to provide driver data for the model.
Furthermore, the ability to predict the flow field of a given particle size fraction using input driver data pertaining to other size fractions was investigated in the just-suspension regime (N = N_{js}). For example, the interpolation case of predicting the flow field of 2.2 mm particles in 40 wt. % suspensions, which is denoted by $\mathrm{ML}_{\mathrm{poly},\,1.1\,\mathrm{mm}+3.1\,\mathrm{mm}\rightarrow 2.2\,\mathrm{mm}}$, was implemented using 5 min PEPT trajectories of 1.1 and 3.1 mm particles as input driver data, as depicted in Fig. 16. Extrapolation cases, $\mathrm{ML}_{\mathrm{poly},\,1.7\,\mathrm{mm}+2.2\,\mathrm{mm}\rightarrow 1.1\,\mathrm{mm}}$ and $\mathrm{ML}_{\mathrm{poly},\,1.7\,\mathrm{mm}+2.2\,\mathrm{mm}\rightarrow 3.1\,\mathrm{mm}}$, were also tested; sample results for 1.1 mm are presented in Fig. S9 in the supplementary material. In all cases, results show that the ML predictions are overall very good. Mass continuity was also verified for all investigated cases, as shown in Fig. S10.
V. CONCLUSIONS
We have successfully developed and validated a computationally efficient machine learning framework for predicting single-phase and complex two-phase multicomponent turbulent flows in stirred vessels. Using a very small amount of Lagrangian driver trajectories, the proposed ML model learns the primary flow dynamics from the driver data and efficiently produces new long-term trajectories of the flow components corresponding to new flow conditions. Evaluation was conducted by comparing the long-term ML-predicted trajectories and long-term PEPT-measured trajectories using the local flow characteristics inferred, namely, local flow component velocities and concentration distribution. The overall excellent agreement obtained confirmed the accuracy and reliability of the presented ML model for predicting such complex flows under a wide range of conditions. Thus, accurate ML predictions can be achieved within and without the respective range of the supplied driver data for (i) different impeller speeds, provided the hydrodynamic flow regime is not too dissimilar to the one corresponding to the driver data; (ii) different solid loading concentrations; and (iii) different particle size fractions.
The proposed ML framework provides a new flow analysis and modeling strategy, whereby only short-term experiments (or alternatively high-fidelity simulations) covering a few typical flow situations are sufficient to enable the prediction of complex multiphase flows, significantly reducing experimental and/or simulation costs. The ML technique is applicable to other impeller configurations provided that driver data are available for the particular impeller configuration considered. However, using driver data from one impeller configuration to predict the performance of another impeller configuration is generally not possible because of the large differences in flow patterns. For example, our investigations showed that data pertaining to a down-pumping PBT could not be used to drive the ML model to predict the performance of an up-pumping PBT.
SUPPLEMENTARY MATERIAL
See the supplementary material for additional validation results of the proposed machine-learning framework.
ACKNOWLEDGMENTS
This work was supported by EPSRC Programme Grant No. EP/R045046/1: Probing Multiscale Complex Multiphase Flows with Positrons for Engineering and Biomedical Applications (PI: Professor M. Barigou, University of Birmingham).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Kun Li: Conceptualization (equal); Data curation (equal); Formal analysis (lead); Investigation (equal); Methodology (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). Chiya Savari: Formal analysis (supporting); Investigation (supporting); Methodology (supporting); Writing – review & editing (equal). Mostafa Barigou: Conceptualization (lead); Funding acquisition (lead); Methodology (equal); Project administration (lead); Resources (lead); Supervision (lead); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material.
NOMENCLATURE
Symbols
c  Local volume solid concentration (%)
C_{m}  Mean solid mass concentration (%)
C_{v}  Mean solid volume concentration (%)
D  Impeller diameter (m)
d  Solid particle diameter (mm)
dt  Time step of machine learning trajectory (s)
$e$  Restitution coefficient (-)
H  Height of the fluid in vessel (m)
k  Number of neighbors used in KNN regressor (-)
$L_p$  Distance between two samples in feature space (-)
N  Impeller rotational speed (rpm)
$N_c$  Number of equal-volume cells of cylindrical mesh system (-)
N_{js}  Minimum impeller rotational speed for just suspension (rpm)
$N_L$  Number of detected locations in a cell (-)
$N_r, N_z, N_\theta$  Number of cells in three directions of cylindrical mesh system (-)
$\mathcal{N}(\mu, \sigma^2)$  Gaussian distribution with mean value $\mu$ and standard deviation $\sigma$ (-)
$O_E$  Local occupancy (-)
R  Vessel radius (m)
r  Radial cylindrical coordinate (m)
Re_{imp}  Impeller Reynolds number (-)
T  Vessel diameter (m)
t  Time (s)
$t_E$  Ergodic time (s)
$t_\infty$  Total runtime of Lagrangian trajectory (s)
$V$  Targets of training sample (-)
$v$  Lagrangian velocity (m/s)
$v'$  Fluctuating velocity (m/s)
$\bar{v}$  Mean velocity (m/s)
v_{tip}  Impeller tip velocity (m/s)
$\langle v_r \rangle_{S_r}$  Averaged radial velocity over a cylindrical envelope (m/s)
$\langle v_z \rangle_{S_z}$  Averaged axial velocity over a horizontal plane (m/s)
$v_\theta, v_r, v_z$  Lagrangian velocity components (m/s)
$v_\theta', v_r', v_z'$  Fluctuating velocity components (m/s)
$v_1$, $v_2$  Velocities before and after collision (m/s)
$V_{\mathrm{pred}}$  New predicted targets (-)
w  Weight function of KNN regressor (-)
$x$  3D space location of tracer (m)
x, y, z  Cartesian coordinates (m)
$X_i, X_j$  ith query instance and jth training instance (-)
$X_i^{(l)}, X_j^{(l)}$  lth dimensional feature of ith instance and jth instance (-)
$\Delta t$  Cumulative time of tracer spent in each cell (s)