A regularized version of the lattice Boltzmann method for efficient simulation of soft materials is introduced. Unlike standard approaches, this method reconstructs the distribution functions from available hydrodynamic variables (density, momentum, and pressure tensor) without storing the full set of discrete populations. This scheme shows significantly lower memory requirements and data access costs. A series of benchmark tests of relevance to soft matter, such as collisions of fluid droplets, is discussed to validate the method. The results can be of particular interest for high-performance simulations of soft matter systems on future exascale computers.

Minimizing the cost of accessing data is one of the most compelling challenges of present-day high-performance computing (HPC).1 This is particularly true for memory-bound computational schemes used in computational fluid dynamics. Among them, the lattice Boltzmann (LB) method2 is characterized by a comparatively low arithmetic intensity spanning in a range between 1 and 5 Flops/Byte ratio depending on the implementation, hence below the roofline threshold beyond which computing resources become the limiting factor.3,4

Given the widespread use of LB across a broad spectrum of scales and regimes, it is of prime interest to develop techniques to mitigate the impact of data access on future (exascale) LB simulations. Several such techniques have been developed in the past, based on hierarchical memory access,5 data organization in terms of array of structures and structure of arrays,6 as well as special crystallographic lattices that double the resolution at a given level of memory occupation.7 Furthermore, a number of approaches have been proposed to reduce memory occupancy. These can be summarized in two main strategies: the first one focuses on algorithmic implementation, while the second one exploits different numerical representations. As a remarkable example of the algorithmic approach, we recall the so-called “in-place” streaming, like the esoteric pull,8,9 while in the latest one, low precision and shifted equilibrium distributions are used either with single precision (instead of double) or with a mixed single precision/half-precision representation, with a factor two of memory saving and no loss of accuracy.10,11

In this work, we retrieve the pioneering considerations given by Ladd and Verberg12 on the possibility of reducing memory consumption by only storing hydrodynamic fields and their gradients. Thus, we present a new strategy to implement the LB approach to model a bi-component system based on the idea of refraining from storing the populations and reconstructing them on the fly based on the available hydrodynamic information. This approach grants a significant reduction in memory usage since it avoids population storage (in the following simply called “without populations”). Furthermore, this is particularly useful for multi-component and multiple species applications in which each species is carried by a common flow field, thereby requiring a single hydrodynamic field (density) as opposed to the full kinetic representation with O(30) populations. A similar argument applies to flows far from equilibrium or relativistic hydrodynamics,13,14 which demand higher order lattices, sometimes entailing hundreds of discrete populations per species.

The present approach is tested on a selection of physically relevant situations in soft matter. More specifically, we simulate a 3D off-axis and a 3D head-on collision between two equally sized fluid droplets in which the merging is suppressed. These simulations are compared with those performed using a regularized LB version with populations, and the accuracy of the results is quantitatively assessed by computing the time evolution of the droplet densities and the relative error. Even though the reconstruction procedure suffers from an inherent approximation concerning the standard implementation of the LB method (see Sec. II), our results show that the error remains negligible, the numerical accuracy is generally preserved, and the physics is correctly reproduced. Furthermore, the reconstruction procedure benefits from the regularization techniques15–17 that grants stability properties filtering out the non-hydrodynamic content in off-equilibrium distribution functions. The distinctive feature of the reconstruction procedure discussed here is that, unlike a standard LB approach, it considerably enhances the computational performance of lattice Boltzmann codes, entailing about 40 percent of memory savings on large-scale simulations. This represents a significant step toward the study of the multiscale physics of soft materials, particularly on modern graphics processing unit (GPU)-accelerated machines where the minimization of the burden imposed by memory requirements and data access are often mandatory.

This paper is structured as follows: In Sec. II, we illustrate the idea behind the LB without populations, while in Sec. III, we shortly describe the implementation and performance data. Section IV presents some test cases that validate the method, and in Sec. V, we discuss possible perspectives.

The LB scheme is based on a set of discrete populations fi(x,t)=f(x,υ,t)δ(vci) describing the probability of finding a representative fluid particle at position x and time t with a discrete molecular velocity υ=ci. The discrete velocities are chosen according to suitable crystallographic symmetries ensuring compliance with the basic conservation laws of macroscopic fluids. This implies a set of order b ∼ 20 − 30 discrete populations fi, i = 0, b, which exceed the number of associated hydrodynamic fields—scalar density ρ, (1), flow velocity ua (a = 1, 3), and momentum–flux tensor Πab (six terms for each lattice point)—totaling ten independent fields. Hence, LB requires about twice as much memory as compared to a corresponding Navier–Stokes-based computational fluid dynamics method. This redundancy buys several major computational advantages, primarily the fact that streaming is exact and diffusion is emergent (no need for a second order spatial derivatives), which proved invaluable assets in achieving outstanding parallel efficiency across virtually any HPC platform, also in the presence of real-world complex geometries.

These extra-memory requirements put a premium on strategies aimed at minimizing the cost of data access in massively parallel LB codes. Among others, recognized techniques are the hierarchical access and the use of special crystallographic lattices.7 

In this paper, we present an alternative idea, namely, running the LB scheme with no need of storing the populations. Let us unwrap the idea in short. Let f = [f1, …, fb] the b-dimensional array (multiscalar) associated with the discrete populations and H = [ρ, ua, Πab] the array associated with the hydro-fields (including the dissipative component). The hydrodynamic fields are linear and local combinations of the populations,18 that is,
(1)
which we write in compact form simply as H = Pf, where P is a projector matrix defined by the above relations. The inverse operation, namely, the reconstruction of f from H, reads as f = RH, where the reconstruction operator is the pseudo-inverse P. Clearly, the latter is ill-posed since the information removed by the projector P cannot be retraced back exactly. Formally, we write f = h + g, where h is the hydro-component of f, the one that can be reconstructed exactly from H, whereas g is the residual lost in the projection, known as “ghost” (non-hydrodynamic) component in LB jargon.2 At zeroth order, one can simply set g = 0.
More specifically, the distribution functions fi can be expanded around their equilibrium value12,16–18
(2)
where fi(0)=hieq is equal, by definition, to the equilibrium distribution function, and all other components fi(k) are sought in the order O(ɛk), where ɛ is the Knudsen number (the ratio between mean free path and a characteristic length) with ɛ ≪ 1. Using a multiscale Chapman–Enskog expansion,19 it can be shown that only the two first terms, fi(0) and fi(1), are sufficient to recover (asymptotically) the dynamics of the Navier–Stokes equation.20,21 Consequently, one can write fi=fi(0)+fi(1)+O(ε2) and identify fi(1)=fifi(0)+O(ε2)hineq as the non-equilibrium hydrodynamic term, apart from O(ɛ2) contributions. Therefore, the distribution functions can be written as fi=hieq+hineq+O(ε2)=hi+O(ε2). As reported in the literature,22 the reconstruction operator amounts to the following expression:
(3)
where wi are a set of coefficients whose values depend on the lattice geometry, uia=ua/cs2, and Qiab=(ciacibcs2δab)/(2cs4). Note that, unlike the implementation discussed in Refs. 23–25 (also called simplified lattice Boltzmann method), the populations hi are reconstructed on the fly as the sum of the equilibrium and non-equilibrium terms. In particular, the first one is computed as
(4)
whereas the second one is given by
(5)
with Πabeq=ihieq(ρ,u)ciacib, as shown in Refs. 16 and 17. All the higher-order non-equilibrium information (namely, the ghost components) is discarded, i.e., gi = 0. It is also worth reminding that hieq fulfills the relations12,18,20,26
(6)
while hineq complies with the following ones:
(7)
Here, σab = η(aub + bua) is the shear stress tensor with η the dynamic viscosity, and EΠ is an additional term whose relative error eΠ = EΠ/Π was shown to scale with the second order in the Mach number.18 

The reconstructed populations hi are then relaxed and streamed without storing them (see Sec. III). Note that streaming revives the ghost components gi, which affect only the non-equilibrium part of the set of populations. As a result, the operational sequence at each time step is Reconstruct–Collide–Stream (RCS). It should be emphasized that the RCS sequence does not match exactly the result of the original, populations preserving, LB scheme. The underlying assumption is that, whenever the present approach is compared to standard Bathnagar–Gross–Krook (BGK) collision,27 the discrepancy introduced by the R-step naturally heals itself via the ghost revival described earlier. Since the ghosts contain high-order, non-hydrodynamic non-equilibrium information, this assumption appears well justified, except perhaps in the case of very low viscous flows (turbulence), which lie outside the scope of this paper. However, discrepancies are lowered whenever the comparison is performed with a regularized LB version with populations, as in the present case. To be noted that a further source of discrepancy relates to the truncation error due to the floating-point representation, as well as to the different sequences of operations in the algorithmic implementation. However, none of these are expected to play a significant role in the physical phenomena under study in this work.

The basic idea is to construct the populations on the fly, based on Eq. (3) without storing them. In order to caution against overwriting the values at time t and t + 1, two copies of the hydrodynamic arrays must be stored, doubling the memory request. For a single component fluid in the paradigm of flip–flop memory access pattern,28 the memory saving is significant, 20 hydro-arrays (1 for density, 3 for velocity vector, 6 for the pressure tensor, two times in the flip–flop paradigm) vs 38 populations for the D3Q19 lattice scheme, adopted in the present investigation. For two component fluids, the advantage increases. For example, using the color-gradient lattice Boltzmann (LBCG) model,29–31 we need to add two extra arrays for the second component, thus obtaining 22 hydro-arrays vs 76 populations (i.e., 38 populations per each fluid component).

In a single-component fluid, collisions are local and proceed according to the standard BGK update
(8)
where all hi are stored as temporary scalars, thus without arrays, and ω is the BGK relaxation frequency that tunes the kinematic viscosity ν=cs2(τ12), being τ = 1/ω the relaxation time and cs=1/3 the isothermal sound speed of the model.2,20
Inserting Eqs. (3) and (4) into Eq. (8), we obtain
(9)
with Πabneq=ΠabΠabeq.
The streaming is a little subtler because the streamed populations are constructed on the fly from the hydrodynamics fields computed in the neighboring points and then pulled from the neighboring sites (non-local). In equations
(10)
where hi is the post-collision population that is function of the fields ρ, ua, and Πab. Note that, since the populations are reconstructed on the fly, collision and streaming are more conveniently fused into a single step (fused approach), thus allowing for computation and storage of the hydro-fields only after Eq. (10). This is in contrast with a split approach, in which the fields would be stored after the collision step of Eq. (8). As aforementioned, the best strategy for the fused approach is to implement the Reconstruct–Collide–Stream (RCS) sequence, since it combines the locality of the collision step with the non-locality of the pulled streaming, allowing the reconstruction of the populations only at the pre-collision stage, in analogy with the regularized method described by Latt and Chopard.16 
For a two-component fluid, we use the color-gradient model.29–31 This method requires the calculation of the local density gradients, which are necessary for the perturbation and recoloring steps. Denoting the phase field as Θ = (ρRρB)/(ρR + ρB), where ρR and ρB are the densities of the two fluids (red and blue), the reconstructed populations obtained from Eq. (9) need for the following extra step:
(11)
(12)
where ρ = ρR + ρB, cos ϕi is the cosine of the angle between the phase field gradient and the ith lattice vector and hieq(ρ,0) is the equilibrium distribution function computed for ρ and zero velocity. In addition, Bi are lattice dependent weights given in Ref. 30, the parameter A is chosen to fit the interfacial tension γ through the relation
(13)
and the coefficient β is a free parameter that tunes the interface width, thus playing the role of an inverse diffusion length scale.
Hence, the collided populations hR,i and hB,i are streamed applying Eq. (10) for each component, and the updated hydrodynamic variables are finally computed
(14)
These ones are then used in the next step by applying, in sequence, Eqs. (9)(12) in a fused approach with the streaming pulled by the neighbor sites. For a single fluid component, Eqs. (11) and (12) are not required and the density field is computed only for the single component.

About the implementation of the boundary conditions (BCs), since the populations are reconstructed on the fly, BCs can be easily implemented by conditional statements for the applications of ad-hoc rules on the post-collision populations in the fused approach. For instance, halfway bounce-back can be easily implemented, including a minor correction to endure a Dirichlet boundary condition.12,32 On the other hand, the performance could notably decrease due to the extensive use of conditional statements in the fused implementations. The problem can be avoided by adopting the non-local regularized BCs,21,26 where the BCs are applied by modifying only the hydrodynamics fields used to reconstruct the populations. In particular, in the non-local regularized BCs, the target values of ρBC and uBC at the wall can be assigned or extrapolated by finite differences. At the same time, the non-equilibrium tensor can be estimated18,26 as Πabneqρcs2τ(aub+bua). Both the proposed BC strategies preserve second-order accuracy in space and time.12,21

Recently, an extension of the lattice Boltzmann method to model near contact interactions among interfaces (in the following denoted LBNCI)33,34 has granted the possibility to simulate a broad variety of complex flow systems, such as foams, emulsions, flowing collections of droplets, bubbles, and high internal phase emulsions, i.e., foamy-like media displaying ordered patterns of fluid drops when under confinement.35–37 In those systems, the relative motion of two fluid interfaces in close contact represents a complex hydrodynamic process due to the concurrent action of several repulsive and/or attractive forces, such as electrostatic and van der Waals forces, hydration repulsion, steric interactions, and depletion attraction.38,39 Importantly, these forces govern the physics of phenomena, such as coalescence and/or repulsion between neighboring droplets that can considerably alter the mechanics of the material, occurring at lengthscales much larger than the near-contact ones.40–44 Their nature has been initially elucidated by the pioneering works of Gibbs and Marangoni on the thermodynamics of liquid thin films39 and by the seminal contributions of Derjaguin and Overbeek45,46 concurring in the development of the DLVO theory.

The LBNCI method is a variant of an LBM approach for multicomponent flows, based on the color-gradient model,30 augmented with an extra forcing term capturing the effects of the near-contact forces acting at the fluid interface level. This method still holds to a continuum description of the interface dynamics, even though near interaction forces arise at a molecular level down to nanometers (and below), namely, the relevant spatial scale of contact forces. In other words, the mesoscale representation of near-contact forces provides a coarse-grained description able to retain computational viability without compromising the underlying physics.

In particular, the additional term is included within the LB method via an extra forcing contribution that acts only on the fluid interfaces in near contact, and it is given by
(15)

In the above equation, Ah[h(x)] is the parameter controlling the strength (force per unit volume) of the near contact interactions, h(x) is the distance between two close interfaces, n is a unit vector normal to the interface, and δI ∝ ∇Θ is a function, proportional to the phase field Θ, which localizes the force at the interface.

Performing a Chapman–Enskog expansion, it can be shown that the actual numerical algorithm in the hydrodynamic limit approaches the equation of the linear momentum
(16)
where Fsurf=σ(n)nδI is a volume interfacial force with σ denoting the surface tension,47 and p the pressure. This is the Navier–Stokes equation for a multicomponent system augmented with a surface-localized repulsive term.
As a validation test, we present a simulation of a head-off-axis collision between two equally sized fluid droplets. Such a process has been extensively investigated in a previous study using a central processing unit (CPU)-based color-gradient lattice Boltzmann method augmented with repulsive near-contact interactions preventing coalescence.33 In the present article, we use a GPU-based version of the method that has been included in LBcuda,48 a GPU-accelerated version of LBsoft.49 We test the algorithm in conditions where the BGK operator is under-relaxed and over-relaxed; this is done by setting two different values of relaxation time τ for the two fluid components, with τR = 0.6 and τB = 1.4. These ones define the corresponding kinematic viscosity νR ≃ 0.03 and νB ≃ 0.3. At the interface, the effective relaxation time τeff is linearly interpolated as
(17)
and the corresponding relaxation frequency, ωeff = 1/τeff, is inserted in Eqs. (9) and (13).

The numerical experiment is as follows: Two drops, having diameter D = 30 lattice sites, of a fluid (red component) are dispersed in a second fluid (blue component), with the viscosity ratio reported above and densities equal to one for both the components. The system is placed within a cubic box of linear size L = 128 and the drops are located at a distance such that reciprocal interactions are negligible. The impact parameter (i.e., the distance between the collision trajectories perpendicular to their relative direction of motion) is b = 0.85, the strength of near-contact forces preventing coalescence is Ah = 0.1, while other values can be found in Table I of Ref. 33. In Fig. 1, we show the time sequence of such a collision simulated using an LB approach without populations [(a)–(e)] and with populations [(f)–(j)]. In both cases, the relative impact velocity Urel is set to 0.5, while Reynolds and Capillary numbers are ReR = 450 and CaR ≃ 1.6 for the droplet (red fluid), and ReB = 50 for the surrounding medium (blue fluid). The two drops initially approach and then impact without merging [Figs. 1(b), 1(c), 1(g), and 1(h)], since the repulsive near-contact interactions prevent the coalescence and enable the formation of a thin film that separates the drops.33 During the bouncing, the shape of the drops considerably departs from the initial spherical geometry, temporarily flattening where opposite interfaces come into close contact. Afterward, they separate and gradually reacquire a spherical shape when their reciprocal distance is sufficiently high.

FIG. 1.

Time sequence of an off-axis collision between two fluid droplets simulated using an LB scheme without populations [NOPOPS, (a)–(e)] and with populations [POPS, (f)–(j)]. The centers of mass are initially placed at a distance where the mutual interaction of the droplet is negligible. The two droplets, initially perfectly spherical [(a)–(f)], deform acquiring a bullet-like shape as they collide [(b), (c) and (g), (h)]. Afterward, they separate [(d)–(i)] and attain a spherical shape once the mutual distance is sufficiently large [(e)–(j)]. No appreciable differences emerge between the two approaches. The color map represents the values of the density field ranging from 0 to 1.

FIG. 1.

Time sequence of an off-axis collision between two fluid droplets simulated using an LB scheme without populations [NOPOPS, (a)–(e)] and with populations [POPS, (f)–(j)]. The centers of mass are initially placed at a distance where the mutual interaction of the droplet is negligible. The two droplets, initially perfectly spherical [(a)–(f)], deform acquiring a bullet-like shape as they collide [(b), (c) and (g), (h)]. Afterward, they separate [(d)–(i)] and attain a spherical shape once the mutual distance is sufficiently large [(e)–(j)]. No appreciable differences emerge between the two approaches. The color map represents the values of the density field ranging from 0 to 1.

Close modal

The time sequence of Fig. 1 shows that the two numerical approaches with and without populations essentially lead to equivalent dynamic behaviors. On a quantitative basis, this result is corroborated by evaluating, for example, the time evolution of the density field ρ|ρ>0.5 shown in Fig. 2, where ⟨⋯⟩ represents a spatial average computed over lattice points for which ρ > 0.5 (i.e., within both drops, the red regions in Fig. 1). Larger fluctuations occur only at early times, i.e., when the droplets come into contact, while they progressively shrink as the droplets move away. Note that the differences between the two plots remain marginal over the full process. In Fig. 3, we compute, at two different lattice sites, the time evolution of the relative error |ρNρP|ρNMAX, where ρN and ρP are the densities of the LB without and with populations, while ρNMAX is the maximum value of ρ calculated using LB without populations. Unlike the previous one, this quantity provides a local measure of potential discrepancies between the two approaches. The plot shows that the relative error remains rather small over time, a further indication that the procedure of reconstructing the populations proposed in this work preserves, at a high level of accuracy, the physics of the systems.

FIG. 2.

Time evolution of ⟨ρ|ρ>0.5, where ⟨⋯⟩ is a spatial average computed considering lattice sites with ρ > 0.5 (i.e., within the drops). The interaction time between the drops is 102t ≲ 15 × 102. Differences between the plots are generally very weak, slightly larger when the droplets are in close contact and negligible later on.

FIG. 2.

Time evolution of ⟨ρ|ρ>0.5, where ⟨⋯⟩ is a spatial average computed considering lattice sites with ρ > 0.5 (i.e., within the drops). The interaction time between the drops is 102t ≲ 15 × 102. Differences between the plots are generally very weak, slightly larger when the droplets are in close contact and negligible later on.

Close modal
FIG. 3.

Typical time evolution of the relative error |ρNρP|ρNMAX calculated at three different sites, i.e., near the droplet interface (x = 32, y = 64, z = 64), within a drop (x = 45, y = 50, z = 64, inset) and far from the drops (x = 128, y = 128, z = 128).

FIG. 3.

Typical time evolution of the relative error |ρNρP|ρNMAX calculated at three different sites, i.e., near the droplet interface (x = 32, y = 64, z = 64), within a drop (x = 45, y = 50, z = 64, inset) and far from the drops (x = 128, y = 128, z = 128).

Close modal

The approach without populations has been also tested in simulations of a head-on collision between two fluid droplets (see Fig. 4). Here, b = 0, Urel = 0.25, and D = 30. Once again, the physics is correctly reproduced and no appreciable discrepancies emerge, as also shown in Fig. 5 where the time evolution of ⟨ρρ>0.5 is plotted. Note incidentally that, in this case, the interaction time is considerably longer than before since, after the bouncing, the drops remain in close contact for a longer period of time.

FIG. 4.

Time sequence of a head-on collision between two fluid droplets simulated using an LB scheme without populations [(a)–(e)] and with populations [(f)–(j)]. The centers of mass are initially placed at a distance where the mutual interaction of the droplet is negligible. The two droplets, initially spherical [(a)–(f)], approach and, once in close contact, elongate vertically flattening where opposite interfaces are closer [(b), (c) and (g), (h)]. Afterward, they progressively relax toward equilibrium and reacquire a spherical shape [(d), (e) and (i), (j)]. Once again, no appreciable differences emerge between the two schemes. The color map represents the values of the density field ranging from 0 to 1.

FIG. 4.

Time sequence of a head-on collision between two fluid droplets simulated using an LB scheme without populations [(a)–(e)] and with populations [(f)–(j)]. The centers of mass are initially placed at a distance where the mutual interaction of the droplet is negligible. The two droplets, initially spherical [(a)–(f)], approach and, once in close contact, elongate vertically flattening where opposite interfaces are closer [(b), (c) and (g), (h)]. Afterward, they progressively relax toward equilibrium and reacquire a spherical shape [(d), (e) and (i), (j)]. Once again, no appreciable differences emerge between the two schemes. The color map represents the values of the density field ranging from 0 to 1.

Close modal
FIG. 5.

Time evolution of ⟨ρ|ρ>0.5, where ⟨⋯⟩ is a spatial average computed considering lattice sites with ρ > 0.5 (i.e., within the drops). The interaction time between the drops is 102t ≲ 80 × 102.

FIG. 5.

Time evolution of ⟨ρ|ρ>0.5, where ⟨⋯⟩ is a spatial average computed considering lattice sites with ρ > 0.5 (i.e., within the drops). The interaction time between the drops is 102t ≲ 80 × 102.

Close modal

The numerical experiments discussed so far present results in which near-contact interactions function as an effective force field preventing droplet coalescence. But what if these forces are turned off? In Fig. 6, we simulate precisely this case by setting Ah equal to zero (other parameters are the same as those of Fig. 1). The drops, initially static and perfectly spherical, approach and connect through a tiny neck of red fluid, a region in which capillary forces are strong enough to prevent its breakup. Afterward, the two droplets merge into a single one that slowly attains an equilibrated spherical shape. Once again, both methods with and without populations describe the physics of the process with negligible differences, as also demonstrated in Fig. 7, where early times density fluctuations capture the dynamics of droplet coalescence.

FIG. 6.

Time sequence of the coalescence between two fluid droplets induced by a collision. Snapshots (a)–(e) show the results obtained using an LB scheme without populations while (f)–(j) the ones with populations. The centers of mass are initially placed at a distance where the mutual interaction of the droplet is negligible. Here the drops, initially separated [(a) and (f)], come into close contact [(b) and (g)] and connect through a thin neck of red fluid [(c) and (h)]. Afterward, they slowly merge, morphing into a single droplet.

FIG. 6.

Time sequence of the coalescence between two fluid droplets induced by a collision. Snapshots (a)–(e) show the results obtained using an LB scheme without populations while (f)–(j) the ones with populations. The centers of mass are initially placed at a distance where the mutual interaction of the droplet is negligible. Here the drops, initially separated [(a) and (f)], come into close contact [(b) and (g)] and connect through a thin neck of red fluid [(c) and (h)]. Afterward, they slowly merge, morphing into a single droplet.

Close modal
FIG. 7.

Time evolution of ⟨ρ|ρ>0.5, where ⟨⋯⟩ is a spatial average computed considering lattice sites with ρ > 0.5. Fluctuations observed for t ≲ 60 × 102 indicate that the droplets interact and the merging is ongoing. Once again, the differences between the two methods are mild.

FIG. 7.

Time evolution of ⟨ρ|ρ>0.5, where ⟨⋯⟩ is a spatial average computed considering lattice sites with ρ > 0.5. Fluctuations observed for t ≲ 60 × 102 indicate that the droplets interact and the merging is ongoing. Once again, the differences between the two methods are mild.

Close modal
To compare the performance, we use the head-off-axis collision reported in Sec. IV as a benchmark case. It consists of a cubic box of 128 lattice points per side with a two-component fluid. All the tests were carried out on an NVIDIA® Tesla® V100 Tensor Core equipped with 16 GB of GPU memory. We exploit the Mega Lattice Updates Per Second (MLUPS) metrics to probe the efficiency. In particular, the definition of MLUPS reads
(18)
where ts is the run (wall-clock) time (in seconds) per single time step iteration, while Lx, Ly, and Lz are the domain sizes. The current version without populations runs at about 622 MLUPS against the 277 MLUPS of the standard CGLB. Several specific CUDA optimization techniques have been developed to improve the performance, to be detailed in a future and more technical presentation. The main difficulty is the non-locality in the pulled streaming step, which requires reading the hydrodynamic fields (11 arrays) in the 18 neighbor points for a total of 198 data accesses (792 bytes in single precision). Although the readings are scattered, neighbor threads share several readings of the hydrodynamic fields. Consequently, threads in the same thread block may cooperate in the data access by using shared memory. Nonetheless, the shared memory is set to 48 Kbytes by default, and it is configurable up to 96 Kbytes per streaming multiprocess in the NVIDIA V100 card. In the present work, we exploit the shared memory to store only the density fields, obtaining a performance improvement of about 20 percent. In this context, the larger size of shared memory (up to 164 Kbytes) in the NVIDIA Ampere® A100 Tensor Core GPU will probably grant a significant performance gain, allowing to store also the velocity vector in the shared memory. Finally, the main advantage is observed in the GPU memory usage that is equal to 0.6782 GB in the version without populations compared to the version with populations where it is equal to 1.1411 GB, 40% of GPU memory saving. Note that the advantage is strictly connected to the three-dimensional lattice model implemented in the code. A 40% memory saving is highly appealing on very large-scale simulations with multi-billion grid points since it amounts to hundreds of Gbytes savings.50 More importantly, such savings are destined to become much more significant for the case of multiple species, such as those occurring in many natural and industrial applications.

In summary, we have introduced a regularized lattice Boltzmann method for the efficient simulation of soft materials that reconstructs the distribution functions from hydrodynamics quantities (density, momentum, and pressure tensor) with no need to store the full set of populations. This entails a significant decrease in memory requirements and a substantial enhancement of the computational performance, features of particular relevance for large-scale simulations. The accuracy of the method has been investigated in numerical experiments of relevance to soft matter, such as the collision of fluid droplets. For such cases, the formulation without populations shows satisfactory numerical stability and negligible differences with respect to a standard reconstruction-free LB approach. In practical terms, 40% memory reduction essentially means doubling the volume of the simulation, which would save roughly 4 TB memory, for the case of a simulation demanding about 10 TB of data.50–52 It would be of interest to further test the model on more complex states of matter, such as colloidal fluids and fluid droplets flowing within microfluidic channels, where the implementation of solid boundaries is often necessary.53 On a more general basis, our results could be helpful for the design of novel high-performance computing methods in soft matter, where customized numerical schemes are fundamental to tackle phenomena occurring over a broad spectrum of scales in space and time.

The research leading to these results has received funding from MIUR under the project “3D-Phys” (Grant No. PRIN 2017PHRM8X) and from the European Research Council under the European Union’s Horizon 2020 Framework Program (Grant No. FP/2014–2020)/ERC Grant Agreement No. 739964 (“COPMAT”). We acknowledge CINECA Project No. IsB25_DIW3F under the ISCRA initiative, for the availability of high-performance computing resources and support. F.B. acknowledges funding by the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Program (Grant Agreement No. 882340). M.L. acknowledges the support of the Italian National Group for Mathematical Physics (GNFM-INdAM).

The authors have no conflicts to disclose.

Adriano Tiribocchi: Software (equal); Validation (equal); Visualization (lead); Writing – original draft (lead). Andrea Montessori: Formal analysis (equal); Software (equal); Validation (equal); Writing – review & editing (equal). Giorgio Amati: Software (equal); Writing – review & editing (equal). Massimo Bernaschi: Software (equal); Writing – review & editing (equal). Fabio Bonaccorso: Software (equal); Writing – review & editing (equal). Sergio Orlandini: Software (equal); Writing – review & editing (equal). Sauro Succi: Formal analysis (equal); Supervision (equal); Writing – review & editing (equal). Marco Lauricella: Conceptualization (lead); Formal analysis (equal); Methodology (lead); Software (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal).

The data supporting this study’s findings are available from the corresponding author upon reasonable request.

1.
S.
Jamshed
,
Using HPC for Computational Fluid Dynamics: A Guide to High Performance Computing for CFD Engineers
(
Academic Press
,
2015
).
2.
S.
Succi
,
The Lattice Boltzmann Equation: For Complex States of Flowing Matter
(
Oxford University Press
,
2018
).
3.
S.
Succi
,
G.
Amati
,
F.
Bonaccorso
,
M.
Lauricella
,
M.
Bernaschi
,
A.
Montessori
, and
A.
Tiribocchi
, “
Toward exascale design of soft mesoscale materials
,”
J. Comput. Sci.
46
,
101175
(
2020
).
4.
S.
Succi
,
G.
Amati
,
M.
Bernaschi
,
G.
Falcucci
,
M.
Lauricella
, and
A.
Montessori
, “
Towards exascale lattice Boltzmann computing
,”
Comput. Fluids
181
,
107
115
(
2019
).
5.
K.
Mattila
,
J.
Hyväluoma
,
J.
Timonen
, and
T.
Rossi
, “
Comparison of implementations of the lattice-Boltzmann method
,”
Comput. Math. Appl.
55
,
1514
1524
(
2008
).
6.
E.
Calore
,
A.
Gabbana
,
S.
Schifano
, and
R.
Tripiccione
, “
Optimization of lattice Boltzmann simulations on heterogeneous computers
,”
Int. J. High Perform. Comput. Appl.
33
,
124
139
(
2017
).
7.
M.
Namburi
,
S.
Krithivasan
, and
S.
Ansumali
, “
Crystallographic lattice Boltzmann method
,”
Sci. Rep.
6
,
27172
(
2016
).
8.
M.
Lehmann
, “
Esoteric pull and esoteric push: Two simple in-place streaming schemes for the lattice Boltzmann method on GPUs
,”
Computation
10
,
92
(
2022
).
9.
M.
Geier
and
M.
Schönherr
, “
Esoteric twist: An efficient in-place streaming algorithmus for the lattice Boltzmann method on massively parallel hardware
,”
Computation
5
,
19
(
2017
).
10.
M.
Lehmann
,
M. J.
Krause
,
G.
Amati
,
M.
Sega
,
J.
Harting
, and
S.
Gekle
, “
Accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit, and customized 16-bit number formats
,”
Phys. Rev. E
106
,
015308
(
2022
).
11.
F.
Gray
and
E.
Boek
, “
Enhancing computational precision for lattice Boltzmann schemes in porous media flows
,”
Computation
4
,
11
(
2016
).
12.
A. J. C.
Ladd
and
R.
Verberg
, “
Lattice-Boltzmann simulations of particle-fluid suspensions
,”
J. Stat. Phys.
104
,
1191
1251
(
2001
).
13.
L.
Rezzolla
and
O.
Zanotti
,
Relativistic Hydrodynamics
(
Oxford University Press
,
2013
).
14.
L. R.
Weih
,
A.
Gabbana
,
D.
Simeoni
,
L.
Rezzolla
,
S.
Succi
, and
R.
Tripiccione
, “
Beyond moments: Relativistic lattice Boltzmann methods for radiative transport in computational astrophysics
,”
Mon. Not. R. Astron. Soc.
498
,
3374
3394
(
2020
).
15.
C.
Coreixas
,
B.
Chopard
, and
J.
Latt
, “
Comprehensive comparison of collision models in the lattice Boltzmann framework: Theoretical investigations
,”
Phys. Rev. E
100
,
033305
(
2019
).
16.
J.
Latt
and
B.
Chopard
, “
Lattice Boltzmann method with regularized pre-collision distribution functions
,”
Math. Comput. Simul.
72
,
165
168
(
2006
).
17.
R.
Zhang
,
X.
Shan
, and
H.
Chen
, “
Efficient kinetic method for fluid simulation beyond the Navier-Stokes equation
,”
Phys. Rev. E
74
,
046703
(
2006
).
18.
T.
Krüger
,
F.
Varnik
, and
D.
Raabe
, “
Shear stress in lattice Boltzmann simulations
,”
Phys. Rev. E
79
,
046704
(
2009
).
19.
S.
Chapman
and
T. G.
Cowling
,
The Mathematical Theory of Non-uniform Gases: An Account of the Kinetic Theory of Viscosity, Thermal Conduction and Diffusion in Gases
(
Cambridge University Press
,
1990
).
20.
T.
Krüger
,
H.
Kusumaatmaja
,
A.
Kuzmin
,
O.
Shardt
,
G.
Silva
, and
E. M.
Viggen
,
The Lattice Boltzmann Method
(
Springer International Publishing
,
2017
), Vol. 10, pp.
4
15
.
21.
J.
Latt
,
B.
Chopard
,
O.
Malaspinas
,
M.
Deville
, and
A.
Michler
, “
Straight velocity boundaries in the lattice Boltzmann method
,”
Phys. Rev. E
77
,
056703
(
2008
).
22.
G.
Wissocq
,
C.
Coreixas
, and
J. F.
Boussuge
, “
Linear stability and isotropy properties of athermal regularized lattice Boltzmann methods
,”
Phys. Rev. E
102
,
053305
(
2020
).
23.
Z.
Chen
,
C.
Shu
, and
D.
Tan
, “
Highly accurate simplified lattice Boltzmann method
,”
Phys. Fluids
30
,
103605
(
2018
).
24.
Z.
Chen
,
C.
Shu
, and
D.
Tan
, “
A simplified thermal lattice Boltzmann method without evolution of distribution functions
,”
Int. J. Heat Mass Transfer
105
,
741
757
(
2017
).
25.
Z.
Chen
,
C.
Shu
,
Y.
Wang
,
L. M.
Yang
, and
D.
Tan
, “
A simplified lattice Boltzmann method without evolution of distribution function
,”
Adv. Appl. Math. Mech.
9
,
1
22
(
2017
).
26.
J.
Latt
, “
Hydrodynamic limit of lattice Boltzmann equations
,”
Ph.D. thesis
,
University of Geneva
,
2007
.
27.
P. L.
Bathnagar
,
E. P.
Gross
, and
M.
Krook
, “
A model for collision processes in gases. I. small amplitude processes in charged and neutral one-component systems
,”
Phys. Rev.
94
,
511
525
(
1954
).
28.
J.
Myre
,
S. D. C.
Walsh
,
D.
Lilja
, and
M. O.
Saar
, “
Performance analysis of single-phase, multiphase, and multicomponent lattice-Boltzmann fluid flow simulations on GPU clusters
,”
Concurrency Comput.: Pract. Exper.
23
,
332
350
(
2011
).
29.
Z.
Wen
,
Q.
Li
,
Y.
Yu
, and
K. H.
Luo
, “
Improved three-dimensional color-gradient lattice Boltzmann model for immiscible two-phase flows
,”
Phys. Rev. E
100
,
023301
(
2019
).
30.
S.
Leclaire
,
A.
Parmigiani
,
O.
Malaspinas
,
B.
Chopard
, and
J.
Latt
, “
Generalized three-dimensional lattice Boltzmann color-gradient method for immiscible two-phase pore-scale imbibition and drainage in porous media
,”
Phys. Rev. E
95
,
033306
(
2017
).
31.
H.
Liu
,
A. J.
Valocchi
, and
Q.
Kang
, “
Three-dimensional lattice Boltzmann model for immiscible two-phase flow simulations
,”
Phys. Rev. E
85
,
046309
(
2012
).
32.
A. J. C.
Ladd
, “
Numerical simulations of particulate suspensions via a discretized Boltzmann equation. Part 1. Theoretical foundation
,”
J. Fluid Mech.
271
,
285
309
(
1994
).
33.
A.
Montessori
,
M.
Lauricella
,
N.
Tirelli
, and
S.
Succi
, “
Mesoscale modelling of near-contact interactions for complex flowing interfaces
,”
J. Fluid Mech.
872
,
327
347
(
2019
).
34.
A.
Montessori
,
M.
Lauricella
,
A.
Tiribocchi
, and
S.
Succi
, “
Modeling pattern formation in soft flowing crystals
,”
Phys. Rev. Fluids
4
,
072201
(
2019
).
35.
M.
Bogdan
,
A.
Montessori
,
A.
Tiribocchi
,
F.
Bonaccorso
,
M.
Lauricella
,
L.
Jurkiewicz
,
S.
Succi
, and
J.
Guzowski
, “
Stochastic jetting and dripping in confined soft granular flows
,”
Phys. Rev. Lett.
128
,
128001
(
2022
).
36.
A.
Montessori
,
A.
Tiribocchi
,
M.
Bogdan
,
F.
Bonaccorso
,
M.
Lauricella
,
J.
Guzowski
, and
S.
Succi
, “
Translocation dynamics of high-internal phase double emulsions in narrow channels
,”
Langmuir
37
,
9026
9033
(
2021
).
37.
A.
Montessori
,
A.
Tiribocchi
,
M.
Lauricella
,
F.
Bonaccorso
, and
S.
Succi
, “
Wet to dry self-transitions in dense emulsions: From order to disorder and back
,”
Phys. Rev. Fluids
6
,
023606
(
2021
).
38.
C.
Stubenrauch
and
R.
Von Klitzing
, “
Disjoining pressure in thin liquid foam and emulsion films—New concepts and perspectives
,”
J. Phys.: Condens. Matter
15
,
R1197
(
2003
).
39.
V.
Bergeron
, “
Forces and structure in thin liquid soap films
,”
J. Phys.: Condens. Matter
11
,
R215
(
1999
).
40.
R. H.
Davis
,
J. A.
Schonberg
, and
J. M.
Rallison
, “
The lubrication force between two viscous drops
,”
Phys. Fluids A
1
,
77
81
(
1989
).
41.
G.
Barnocky
and
R. H.
Davis
, “
The lubrication force between spherical drops, bubbles and rigid particles in a viscous fluid
,”
Int. J. Multiphase Flow
15
,
627
638
(
1989
).
42.
S.
Rubin
,
A.
Tulchinsky
,
A. D.
Gat
, and
M.
Bercovici
, “
Elastic deformations driven by non-uniform lubrication flows
,”
J. Fluid Mech.
812
,
841
865
(
2017
).
43.
X. D.
Shi
,
M. P.
Brenner
, and
S. R.
Nagel
, “
A cascade of structure in a drop falling from a faucet
,”
Science
265
,
219
222
(
1994
).
44.
M.
Mani
,
S.
Mandre
, and
M. P.
Brenner
, “
Events before droplet splashing on a solid surface
,”
J. Fluid Mech.
647
,
163
185
(
2010
).
45.
B.
Derjaguin
, “
On the repulsive forces between charged colloid particles and on the theory of slow coagulation and stability of lyophobe sols
,”
Trans. Faraday Soc.
35
,
203
215
(
1940
).
46.
E. J. W.
Verwey
, “
Theory of the stability of lyophobic colloids
,”
J. Phys. Chem.
51
,
631
636
(
1947
).
47.
A.
Montessori
,
M.
Lauricella
, and
S.
Succi
, “
Mesoscale modelling of soft flowing crystals
,”
Philos. Trans. R. Soc., A
377
,
20180149
(
2019
).
48.
F.
Bonaccorso
,
M.
Lauricella
,
A.
Montessori
,
G.
Amati
,
M.
Bernaschi
,
F.
Spiga
,
A.
Tiribocchi
, and
S.
Succi
, “
LBcuda: A high-performance CUDA port of LBsoft for simulation of colloidal systems
,”
Comput. Phys. Commun.
277
,
108380
(
2022
).
49.
F.
Bonaccorso
,
A.
Montessori
,
A.
Tiribocchi
,
G.
Amati
,
M.
Bernaschi
,
M.
Lauricella
, and
S.
Succi
, “
LBsoft: A parallel open-source software for simulation of colloidal systems
,”
Comput. Phys. Commun.
256
,
107455
(
2020
).
50.
G.
Falcucci
,
G.
Amati
,
P.
Fanelli
,
K. V.
Krastev
, and
S.
Succi
, “
Extreme flow simulations reveal skeletal adaptations of deep-sea sponges
,”
Nature
595
,
537
541
(
2021
).
51.
M.
Bernaschi
,
S.
Melchionna
,
S.
Succi
,
M.
Fyta
,
E.
Kaxiras
, and
J. K.
Sircar
, “
MUPHY: A parallel MUlti PHYsics/scale code for high performance bio-fluidic simulations
,”
Comput. Phys. Commun.
180
,
1495
1502
(
2009
).
52.
D. E.
Keyes
,
L. C.
McInnes
,
C.
Woodward
et al, “
Multiphysics simulations: Challenges and opportunities
,”
Int. J. High Perform. Comput. Appl.
27
,
4
83
(
2013
).
53.
R.
Benzi
,
L.
Biferale
,
M.
Sbragaglia
,
S.
Succi
, and
F.
Toschi
, “
Mesoscopic two-phase model for describing apparent slip in micro-channel flows
,”
Europhys. Lett.
74
,
651
(
2006
).
Published open access through an agreement with Consiglio Nazionale delle Ricerche