An efficient parallel Stokes’ solver has been developed for the complete description of hydrodynamic interactions between Brownian particles in bulk and confined geometries. A Langevin description of the particle dynamics is adopted, where the long-range interactions are included using a Green’s function formalism. A scalable parallel computational approach is presented, where the general geometry Stokeslet is calculated following a matrix-free algorithm using the general geometry Ewald-like method. Our approach employs a highly efficient iterative finite-element Stokes’ solver for the accurate treatment of long-range hydrodynamic interactions in arbitrary confined geometries. A combination of midpoint time integration of the Brownian stochastic differential equation, the parallel Stokes’ solver, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem leads to an O(N) parallel algorithm. We illustrate the new algorithm in the context of the dynamics of confined polymer solutions under equilibrium and nonequilibrium conditions. The method is then extended to treat suspended finite-size particles of arbitrary shape in any geometry using an immersed boundary approach.
I. INTRODUCTION
The dynamics of suspended objects in confined geometries, such as colloidal particles or polymeric molecules, is important for a wide range of applications, including enhanced oil recovery, biology, materials design, and medicine. For example, in modern genomic technologies, such as DNA sequencing or optical mapping, individual DNA molecules can be directly manipulated in microfluidic and nanofluidic devices, where hydrodynamic effects play a key role.^{1–5} Another set of applications, involving lab-on-a-chip devices, aims to manipulate cell and particle suspensions through the precise design of flow conduits and accurate control of flow rates and hydrodynamic forces.^{6,7} From a physical point of view, many intriguing rheological behaviors in suspensions, including foam formation,^{8} shear thickening,^{9} solidification,^{10} and shear induced migration,^{11–13} have been traced back to hydrodynamic effects. Similarly, in the context of intracellular motion, the effects of crowding are of considerable interest and are believed to be caused by hydrodynamic interactions.^{14–21} The precise mechanisms that underpin such phenomena, however, remain poorly understood, particularly when they arise at high concentrations. Efficient methods and algorithms capable of accurately capturing hydrodynamic interactions and their influence on microstructure evolution are necessary to describe the dynamics of concentrated particle suspensions and for the design of fluidic devices.
Brownian Dynamics (BD) is generally used for problems that are characterized by a broad separation of length and time scales (several orders of magnitude) between the relaxation times of suspended particles or molecules and those of the solvent. The system is evolved through the integration of the Fokker-Planck equation for the probability distribution function, which incorporates a force balance for the suspended particles and the fluid, coupled through the fluctuation-dissipation theorem.^{22,23} A representative system is provided by a 21 $\mu$m DNA molecule suspended in an aqueous solution. The characteristic size of the molecule is $18 \times 10^{-10}$ m, and it has a relaxation time, i.e., the time required for the molecule to diffuse its own size, of 0.1 s. In contrast, the solvent has a characteristic time on the order of 10^{−15} s. Simulations that include the solvent explicitly would therefore be impractical. In BD, the fluid is represented as a continuum governed by the Stokes momentum equations.^{23–26} In such a system, the motion of each particle is influenced by that of all other particles, mediated by hydrodynamic interactions (HIs), which are long-ranged. Explicit expressions to calculate HI are available only in free space and for simple geometries; for the general cases encountered in micro- or nanofluidic devices, hydrodynamic forces must be obtained numerically.
BD simulations have been used extensively to study the dynamics of macromolecules and colloids,^{27–30} the transport of DNA,^{3,31–35} and the flow behavior of colloidal dispersions.^{36–38} Such simulations are computationally demanding and are often limited to small systems in unbounded domains or in simple geometries. Over the last two decades, increasingly efficient methods and algorithms have been developed to evaluate HI. These include Stokes solvers,^{39,40} Stokesian Dynamics (SD),^{41–43} the lattice Boltzmann method (LBM),^{44–47} dissipative particle dynamics (DPD),^{48,49} and Green’s function based methods.^{29,30,39,50–53} Fluid particle methods, like DPD, and fluid mesh methods, like the LBM, are widely used and preferred by numerous authors. In DPD, the fluid/solvent molecules are represented by coarse-grained particles, which reduces the number of solvent particles and allows larger time steps than molecular dynamics (MD). However, the correct implementation of confining walls, i.e., no-slip boundary conditions, is not a trivial task in DPD. The LBM, on the other hand, provides approximate solutions to the incompressible Navier-Stokes equations and requires finite Reynolds, Re, and Mach, Ma, numbers. Complementary methods are also needed in the LBM to properly model complex geometries and eliminate viscosity-dependent errors.^{54,55} Similar to other fluid element/mesh methods, the LBM can become mesh-dependent when high concentrations of suspended particles or molecules are considered, and care must be exercised in its implementation. The general geometry Ewald-like method (GgEm)^{53} is of particular interest to our work because it directly solves the Stokes equations (Re = 0) and is able to handle arbitrary geometries and no-slip boundary conditions efficiently.
GgEm uses the linear character of the momentum equations to split the contributions of long-range interactions into a local and a global calculation, following the general philosophy of the Ewald split. It resolves short-range particle-particle interactions precisely, while far-field interactions are evaluated on a mesh using a computational fluid dynamics approach. GgEm has been used for simulations of confined DNA,^{3} DNA pore translocation,^{30,56} charged dipoles^{57} and polyelectrolytes,^{58} suspensions of rigid and deformable particles,^{59,60} and active suspensions.^{61–63}
As helpful as the GgEm method has been, its implementation is challenging, particularly in highly parallel algorithms. The purpose of this work is to introduce an efficient and scalable computational implementation and to publicly release the corresponding software. To do so, we rely on a finite element formulation for the far-field contributions, using distributed memory parallelization methods that include a direct lower-upper (LU) decomposition solver for systems with fewer than one million degrees of freedom, and a fast parallel iterative solver, with hybrid preconditioning, for larger systems. Our parallel Stokes’ solver is combined with the GgEm, a midpoint integration scheme, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem, to arrive at an efficient O(N) and completely scalable parallel BD algorithm. In what follows, we offer a detailed explanation of our proposed approach, including a discussion of important numerical issues and general aspects of the algorithm. We then illustrate its use in the context of confined polymer solutions and finite-size particle dynamics. More specifically, we calculate the diffusion of polymers in a slit geometry and demonstrate that the correct Zimm scaling is obtained.^{26,64} We then simulate flowing polymers in a cross-channel geometry and show how the method can be used in arbitrary domain shapes. We also show results for the dynamics of finite-size particles using an immersed boundary-GgEm formulation.
II. THE FOKKER-PLANCK AND THE BROWNIAN DYNAMICS EQUATIONS
The stochastic differential equation that governs the dynamics of N suspended beads in a viscous solvent is obtained from the force balance
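The display equation (Eq. (1)) is elided here; in the notation defined below, the balance reads

$$ M \cdot \frac{dU_{\nu}}{dt} = f_{\nu}^{H} + f_{\nu}^{B} + f_{\nu}^{EV} + f_{\nu}^{C} + f_{\nu}^{other} $$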
for each bead $\nu = 1,\ldots,N$, where $f_{\nu}^{H}$ is the hydrodynamic force, $f_{\nu}^{B}$ is the Brownian force, $f_{\nu}^{EV}$ are excluded volume forces (bead-bead and bead-wall), $f_{\nu}^{C}$ are configurational forces (springs), $f_{\nu}^{other}$ are other external forces that may apply (e.g., forces of electrostatic or magnetostatic character), M is the mass tensor, and U is the translational velocity. Re is assumed to be zero and inertial effects are neglected; therefore, the left-hand side of Eq. (1) is equal to zero.
In a Lagrangian frame of reference, the evolution equation for the probability distribution function, $\psi(x,t)$, for the bead positions is a convection-diffusion equation of the Fokker-Planck type,^{22}
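In the notation defined below, this equation takes the standard form

$$ \frac{\partial \psi}{\partial t} = -\frac{\partial}{\partial R} \cdot \left[ \left( \kappa \cdot R + \frac{1}{k_B T}\, D \cdot F \right) \psi \right] + \frac{\partial}{\partial R} \cdot \left( D \cdot \frac{\partial \psi}{\partial R} \right) $$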
where $R = (x_1, x_2, \ldots, x_N)$ is a 3N vector containing the spatial coordinates of the beads, $\kappa$ is a diagonal $3N \times 3N$ tensor with the diagonal components of the imposed velocity gradient $\nabla U_0$, D is the $3N \times 3N$ diffusion tensor, F is a 3N vector with the non-Brownian and non-hydrodynamic components of the force, k_{B} is Boltzmann’s constant, and T is the temperature.
The diffusion tensor is given by D = k_{B}TM, where M is the mobility tensor that includes Stokes’ drag and the pairwise Stokeslets that account for the hydrodynamic interactions between beads. Note that the time evolution of $\psi$, according to the Fokker-Planck equation, is due to convection (first term on the right-hand side) and diffusion (second term on the right-hand side) of the probability density. This is particularly important because once this equation is transformed into an equivalent stochastic differential equation, convective and diffusive terms will be generated. Each of these terms poses particular challenges for efficient numerical integration.
Assuming a continuous probability density and using the Chapman-Kolmogorov equation with a Wiener process, an equivalent stochastic differential equation for the motion of the beads is obtained as follows:^{23}
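Namely, an Itô equation whose terms match those described below,

$$ dR = \left[ U_0 + M \cdot F + \frac{\partial}{\partial R} \cdot D \right] dt + \sqrt{2}\, B \cdot dW $$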
where U_{0} denotes a 3N vector with the unperturbed fluid velocity at the bead’s position and $U = M \cdot F$ contains the fluctuating velocities from the hydrodynamic interactions; this is a convective term. The divergence of the diffusion tensor, $\frac{\partial}{\partial R} \cdot D$, is the drift resulting from the configuration-dependent mobility of the confined particles; this is the first diffusive term. Finally, dW is a random vector, the components of which are obtained from a real-valued Gaussian distribution with zero mean and variance dt. It is coupled to the diffusion tensor through the fluctuation-dissipation theorem: $D = B \cdot B^{T}$. The term $\sqrt{2}\, B \cdot dW$, the second diffusive term, represents the Brownian displacement, which results from collisions between the beads and the surrounding molecules in the fluid.
Three issues must be addressed in order to arrive at an efficient numerical algorithm for Eq. (3). First, the mobility/diffusion tensor cannot be constructed explicitly; this would result in an O(N^{2}) algorithm. This implies that the fluctuating velocity, U, the divergence of the diffusion tensor, $\nabla \cdot D$, and the diffusion tensor decomposition, B, must be implemented in a matrix-free scheme. In our proposed algorithm, we use (i) the general geometry Ewald-like method (GgEm) for a matrix-free product of the mobility tensor with any vector, $M \cdot F$; (ii) a midpoint algorithm, proposed by Fixman,^{65} that avoids the explicit calculation of $\nabla \cdot D$; and (iii) a Chebyshev polynomial approximation for the $B \cdot dW$ product, also proposed by Fixman,^{66} that uses the GgEm to avoid the explicit calculation of D. The entire algorithm scales as O(N) and enables one to handle arbitrarily shaped systems by relying on the GgEm.^{53} In addition, we also rely on parallel finite element methods and parallel libraries to achieve the above tasks.
A. General geometry Ewald-like method (GgEm)
A velocity field, $U = (u_1, u_2, \ldots, u_N) = M \cdot F$, driven by a set of forces at the bead positions, $F = (f_1, f_2, \ldots, f_N)$, can be calculated directly by recognizing that, for zero Re, the force balance on the fluid reduces to Stokes’ momentum equations,
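which, with the point-force density defined below, take the form

$$ -\nabla p + \mu \nabla^2 u + \rho^f(x) = 0, \qquad \nabla \cdot u = 0 $$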
where $\mu$ is the fluid viscosity, u(x) is the fluid velocity field generated by a force density $\rho^f(x) = \sum_{\nu=1}^{N} f_{\nu}\, \delta(x - x_{\nu})$, and p is the corresponding pressure. The inclusion of the Dirac delta function, $\delta(x)$, in the force density implies that we are assuming, for the time being, that the forces exerted by the beads on the fluid are singular point-forces.
Similar to conventional particle-mesh Ewald methods,^{67} GgEm uses the linear character of Stokes’ equation to redefine the force density in Eq. (4),
where the “local” density is
while the “global” density is given by
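That is, with a screening function g,

$$ \rho^f(x) = \rho_l^f(x) + \rho_g^f(x), \qquad \rho_l^f(x) = \sum_{\nu=1}^{N} f_{\nu} \left[ \delta(x - x_{\nu}) - g(x - x_{\nu}) \right], \qquad \rho_g^f(x) = \sum_{\nu=1}^{N} f_{\nu}\, g(x - x_{\nu}) $$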
The “screening” function, g(x), must satisfy $\int_{\text{all space}} g(x)\, dx = 1$, and it is chosen in a way that the local density contribution to the velocity field decays exponentially. We have found that^{53} a modified Gaussian function of the type,
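namely (in the form given in Ref. 53),

$$ g(x) = \frac{\alpha^3}{\pi^{3/2}}\, e^{-\alpha^2 r^2} \left( \frac{5}{2} - \alpha^2 r^2 \right), \qquad r = \lVert x \rVert, $$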
results in an exponentially decaying “local” velocity field over a length scale $\alpha^{-1}$.
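As a quick numerical check of the normalization constraint, the following sketch evaluates the screening function (assuming the modified Gaussian form above) and integrates it over all space:

```python
import numpy as np

def g_screen(r, alpha):
    # assumed GgEm screening function: a modified Gaussian that integrates to 1
    a2r2 = (alpha * r) ** 2
    return alpha**3 / np.pi**1.5 * np.exp(-a2r2) * (2.5 - a2r2)

# numerical check of the normalization: integral of g over all space,
# computed as a radial trapezoidal quadrature of 4*pi*r^2 * g(r)
alpha = 0.7
r = np.linspace(1e-8, 20.0 / alpha, 200_000)
f = 4.0 * np.pi * r**2 * g_screen(r, alpha)
integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r))
```

The factor $(5/2 - \alpha^2 r^2)$ makes the function change sign at large $\alpha r$, which is what allows the local velocity field (rather than just the density) to decay exponentially.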
Ignoring the confining walls, a Green’s function calculation can be carried out to obtain the local contribution according to
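With constant prefactors absorbed into the kernel, this takes the form of a pairwise sum,

$$ u_l(x) = \sum_{\nu=1}^{N} G_l(x - x_{\nu}) \cdot f_{\nu}, $$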
where G_{l}(x) is a screened Green’s function given by
where $r = \lVert x \rVert$.
Note that the aim here is to calculate the product of the mobility/diffusion tensor and a forcing vector. In doing so, the physical characteristics of such a tensor, its positive definiteness and self-adjoint form, must be preserved. As the concentration of beads is increased, using a point-force model does not ensure such limits. We can, however, introduce a point-force regularization using the same form of the modified Gaussian over a new length scale $\xi^{-1}$. For $\xi^{-1} = 3a/\pi^{1/2}$, the maximum fluid velocity is equal to that of a particle with radius a and the pair mobility remains positive-definite. Introducing this regularization in the local contribution is trivial because it only modifies the screened Green’s function as follows:
The functions $G_l^{R}$ and G_{l} decay exponentially over the length scale $\alpha^{-1}$. In practice, the local velocity field can be computed, as is done in Ewald-like methods, by considering only the neighbors of each particle. On the other hand, the “global” contribution to the velocity field, u_{g}(x), which is driven by the force density $\rho_g^f(x)$, is obtained numerically by solving the Stokes equations,
This requires that u_{l} + u_{g} satisfy the appropriate boundary conditions at the domain walls and boundaries. For example, a Dirichlet boundary condition, where the wall velocity is given by $\bar{u}(x_W)$, results in a boundary condition for the global contribution $u_g(x) = \bar{u}(x_W) - u_l(x)$, for $x \in x_W$.
The global solution is obtained on a set of M discrete points on a mesh; in this regard, the GgEm resembles a Particle-Particle-Particle-Mesh (P3M) method, where the assignment function is replaced by the delta function. Several techniques (finite differences, finite elements, and spectral methods) can be used to find the global contribution; once the mesh solution is obtained, interpolation is used to evaluate the global velocity at position x. As mentioned earlier, in the GgEm, $M \cdot F$ is determined implicitly, requiring O(N) operations. For the point-force version, M = M^{T}, as required by the self-adjoint character of Stokes’ equations. For the regularized version, M = M^{T} if the boundary condition on u_{g} remains the same as for the point-force version, at the cost of violating the no-slip boundary condition for points within $\sim \xi^{-1}$ of the wall (which would normally be prevented by excluded volume interactions).
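The O(N) character of the local sum can be sketched as follows. The kernel used here is a placeholder (an exponentially damped Oseen tensor), not the actual GgEm screened Green's function, and the self term is omitted; only the neighbor-truncation structure is the point:

```python
import numpy as np
from scipy.spatial import cKDTree

def screened_oseen(r_vec, alpha=1.0, mu=1.0):
    # placeholder kernel: an Oseen tensor damped over the length scale 1/alpha;
    # the true GgEm screened Green's function has a different closed form
    r = np.linalg.norm(r_vec)
    G = (np.eye(3) / r + np.outer(r_vec, r_vec) / r**3) / (8.0 * np.pi * mu)
    return np.exp(-((alpha * r) ** 2)) * G

def local_velocity(pos, forces, kernel, cutoff):
    # each bead only interacts with neighbors within `cutoff`; with a fixed
    # cutoff the number of neighbors per bead is bounded, so the sum is O(N)
    tree = cKDTree(pos)
    u = np.zeros_like(pos)
    for i, j in tree.query_pairs(cutoff, output_type="ndarray"):
        G = kernel(pos[j] - pos[i])      # G(-r) = G(r) for this kernel
        u[i] += G @ forces[j]
        u[j] += G.T @ forces[i]
    return u                             # self terms omitted in this sketch

rng = np.random.default_rng(1)
pos = rng.uniform(0.0, 10.0, size=(50, 3))
forces = rng.standard_normal((50, 3))
u = local_velocity(pos, forces, screened_oseen, cutoff=6.0)
```

Because the kernel decays exponentially on the scale $\alpha^{-1}$, a cutoff of a few screening lengths makes the truncation error negligible.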
B. Midpoint integration algorithm
To avoid the explicit calculation of $\nabla \cdot D$, we adopt Fixman’s midpoint algorithm,^{24} where the time evolution from time t to time t + dt requires an intermediate step $t^{*}$.
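A schematic of one midpoint step, with the matrix-free operations passed in as hypothetical callbacks. Following Fixman's idea, the corrector reuses the same random increment but propagates it through the midpoint mobility (via $M(R^*) \cdot B(R)^{-T}$) to generate the $\nabla \cdot D$ drift; the exact update equations of the paper are not reproduced here:

```python
import numpy as np

def fixman_midpoint_step(R, dt, dW, mobility_dot, forces, U0, B_dot, MBinvT_dot):
    """One midpoint step from t to t + dt via the intermediate time t*.
    Hypothetical matrix-free callbacks:
      mobility_dot(R, F)   -> M(R)·F
      B_dot(R, w)          -> B(R)·w
      MBinvT_dot(Rs, R, w) -> M(Rs)·B(R)^{-T}·w  (produces the div(D) drift)
    """
    F = forces(R)
    dB = np.sqrt(2.0) * B_dot(R, dW)
    # predictor: half step with the mobility evaluated at R
    R_star = R + 0.5 * dt * (U0(R) + mobility_dot(R, F)) + 0.5 * dB
    # corrector: full step with the mobility at the midpoint configuration,
    # reusing the SAME random increment dW
    F_star = forces(R_star)
    dB_star = np.sqrt(2.0) * MBinvT_dot(R_star, R, dW)
    return R + dt * (U0(R_star) + mobility_dot(R_star, F_star)) + dB_star
```

For a configuration-independent mobility the drift term vanishes and the scheme reduces to a deterministic midpoint rule plus the usual Brownian displacement.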
C. Chebyshev polynomial approximation
Finally, the diffusion tensor decomposition is carried out using a Chebyshev polynomial approximation as proposed by Fixman,^{65} which guarantees that the fluctuation-dissipation theorem is satisfied. When combined with the GgEm, it results in a matrix-free algorithm because it only requires matrix-vector products. This method has been widely used in previous studies with unbounded or periodic domains.^{3,11,27,33,53} Here we only present a brief summary of the algorithm and highlight some of the key issues that arise in our scalable numerical implementation.
An operation over a matrix, f(M), can be approximated by a linear combination of Chebyshev polynomials as follows:
where a_{l} and L are the coefficients and order of the approximation, respectively, and
The parameters d_{a} and d_{b} depend on the span of the approximation and are defined by
where $[\lambda_{min}, \lambda_{max}]$ is the range of eigenvalues of M over which the approximation takes effect.
For the midpoint algorithm, we require the calculation of $y = f(M) \cdot dw$. This can then be expressed in terms of a series of matrix-vector products given by
where
The convergence rate of this polynomial approximation depends on both the condition number of the matrix and the accuracy of the estimates of its eigenvalues. The well-conditioned character of the diffusion tensor is related to its positive-definite form, which is guaranteed by using the GgEm with a point-force regularization. More important is the proper calculation (or estimation) of the limiting eigenvalues. This procedure cannot rely on an explicit construction of the tensor, and it should be scalable and parallelizable. In our present numerical implementation, the eigenvalues $\lambda_{min}$ and $\lambda_{max}$ are calculated using the open-source software package SLEPc,^{68} which is capable of rapidly computing upper and lower bounds on the eigenvalues of large systems on parallel computers. SLEPc also provides a variety of solution methods in a matrix-free way. Therefore, combining the GgEm with SLEPc results in an O(N) scalable parallel algorithm.
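A minimal sketch of applying $D^{1/2}$ to a vector using only matrix-vector products: the Chebyshev coefficients of $\sqrt{\cdot}$ are fitted on $[\lambda_{min}, \lambda_{max}]$ and the series is evaluated with the Clenshaw recurrence. A small dense matrix stands in for the matrix-free diffusion tensor; the fitting and recurrence details are this sketch's choices, not necessarily those of the paper:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

def chebyshev_sqrt_apply(matvec, w, lam_min, lam_max, order=40):
    # Chebyshev-basis coefficients of sqrt on the eigenvalue interval
    c = Chebyshev.interpolate(np.sqrt, order, domain=[lam_min, lam_max]).coef
    a, b = lam_min, lam_max
    # affinely mapped operator S = (2 D - (a + b) I) / (b - a), matrix-free
    S_dot = lambda v: (2.0 * matvec(v) - (a + b) * v) / (b - a)
    b1 = np.zeros_like(w)
    b2 = np.zeros_like(w)
    for k in range(len(c) - 1, 0, -1):   # Clenshaw recurrence, vectors only
        b1, b2 = 2.0 * S_dot(b1) - b2 + c[k] * w, b1
    return S_dot(b1) - b2 + c[0] * w

# demo on a small dense SPD stand-in for the diffusion tensor
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 30))
D = A @ A.T + 30.0 * np.eye(30)
w = rng.standard_normal(30)
evals = np.linalg.eigvalsh(D)
y = chebyshev_sqrt_apply(lambda v: D @ v, w, evals[0], evals[-1])
```

In the full algorithm the `matvec` closure would be the GgEm mobility product and the eigenvalue bounds would come from SLEPc.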
III. PARALLEL FEM STOKES FLOW SOLVER
To be able to accommodate complex confining geometries, we discretize the Stokes problem, Eq. (13), using the P^{2} − P^{1} Taylor-Hood mixed element for u − p.^{69} This is a stable element that is a staple of Stokes discretizations. The discrete Stokes equation in this basis has the following form:
where $\hat{u}$, $\hat{p}$, and $\hat{\rho}_g^f$ are the vectors of the finite element method (FEM) coefficients for the velocity, pressure, and global forcing, respectively; A is the Laplacian operator and B is the divergence operator discretized using the FEM basis, both incorporating whichever boundary conditions are imposed on the velocity and pressure, respectively.
Once the bead positions are determined and $\hat{\rho}_g^f$ is formed, the Stokes solver determines $\hat{u}$ and evaluates it at the bead positions to obtain U. This procedure implicitly defines the mobility tensor M. Since M is applied repeatedly in the time-stepper and the Chebyshev approximations, it is crucial to have an efficient solver for Eq. (27) that can handle sufficiently fine discretizations, necessary for resolving complicated confining geometries of large containers.
A. Direct solver
For meshes with moderate numbers of degrees of freedom N_{h}, the best approach for solving Eq. (27) is a direct solver, such as the LU factorization $Q = L \Lambda U$, where L and U are lower- and upper-triangular, respectively, and $\Lambda$ is diagonal. The advantage of this approach is that the matrix has to be factored only once, and the factors can be reused repeatedly to compute the action of M, including inside the Chebyshev operator expansion, using backward and forward solves with U and L, respectively.
The main limitations of this approach are that the L, U factors have to be explicitly computed and stored, consuming $O(N_h^3)$ time and $O(N_h^2)$ storage, respectively. The solves with L and U can also consume $O(N_h)$ time and can introduce substantial serialization among the processors in a parallel setting. While parallel direct solver packages, such as SuperLU_DIST,^{70} allow moderately large systems to be solved in reasonable time by partitioning the factor matrices among processor memories, the direct approach is fundamentally unscalable: storage requirements with $N_h \approx 5.0 \times 10^5$ exceed 16 GB, a typical RAM size for many commodity machines.
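The factor-once, solve-many pattern can be illustrated with SciPy's sparse LU on a small stand-in matrix (the actual code uses parallel packages such as SuperLU_DIST, and the matrix below is not the Stokes operator):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# small sparse stand-in for the factored operator
n = 200
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

lu = splu(A)              # factor once (the expensive step) ...
b = np.ones(n)
x = lu.solve(b)           # ... then reuse the factors for every new
x2 = lu.solve(2.0 * b)    # right-hand side, e.g., inside the Chebyshev expansion
```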
B. Iterative solver
Modern scalable parallel solvers for sparse linear systems such as Eq. (27) are usually implemented using Krylov subspace methods (KSPs). These are a family of algorithms that rely solely on the availability of a subroutine implementing the matrix-vector product $Qw$ to solve $Qw = h$.^{71} Because of this, KSPs are highly parallelizable, provided an efficient preconditioner is available. Popular members of the KSP family are the conjugate-gradient (CG) method, which is applicable to symmetric positive-definite systems only, and the more flexible generalized minimal residual (GMRES) method. It is useful, albeit not exactly accurate, to think of a preconditioner as an operator P that approximates the inverse of Q, so that $PQ \approx I$, at least in the sense of a better clustering of the resulting spectrum. The Portable, Extensible Toolkit for Scientific Computation (PETSc) library implements a wide variety of KSP methods as well as a number of general-purpose preconditioners, and makes preconditioners from other packages available through a uniform Application Program Interface (API). This allows the user to quickly experiment with different combinations of solvers and fine-tune the solver parameters without changing the code.
Since Q in (27) is not positive definite, CG is inapplicable, but the more flexible GMRES can work well, given a good preconditioner for Q. Multigrid methods and other generalpurpose preconditioners are not very effective for Q due to its indefinite nature and a zero diagonal pressure block. However, PETSc provides effective tools for building preconditioners out of block methods based on decompositions such as the natural velocitypressure split.
Generally, the most effective preconditioners for the discrete Stokes system^{72} exploit the Schur complement matrix S = −BA^{−1}B^{T}, obtained from a factorization of Q,
Indeed, inverting the block matrices from left to right leads to an exact solution $(x, y)^T$ to $Q\,(x, y)^T = (f, g)^T$ in terms of solutions to the subsystems,
defining the action of the operator $(x, y)^T = P\,(f, g)^T$. The first and the last systems involve solves with the discrete Laplacian, which can be effectively preconditioned by a multigrid method or, even more simply, using ILU(0), an incomplete LU factorization with zero fill applied locally on each processor.^{71} The middle (negative-definite) system also responds well to GMRES, provided that S can be applied efficiently without forming the matrix, and that an effective preconditioner for S can be obtained.
Many common preconditioners for Stokes are approximate solution operators P obtained by solving the three systems above inexactly using Krylov solvers $\kappa$ on the blocks of Q. This is done by replacing $A^{-1}$ and S^{−1} in the three systems of Eq. (28) by applications of $\kappa(A)$ and $\kappa(\hat{S})$, respectively. The Schur preconditioner itself is applied only approximately, using $\kappa(A)$ to define
Varying degrees of accuracy are required of the different $\kappa$, and frequently only a few iterations (or even one) are necessary for an effective preconditioner; in particular, each occurrence of $\kappa(A)$ potentially denotes a different solver.
The main difficulty is in preconditioning $\hat{S}$ in such a way that the overall solver performance does not degrade with mesh refinement. A classical approach to preconditioning S or $\hat{S}$ consists of using the pressure mass matrix M_{p}, which is simply the classical mass matrix for the pressure finite-element basis; in our case, it is the mass matrix in the P^{1} basis of piecewise linear continuous functions. Mass matrices have many desirable properties; in particular, they are symmetric, positive-definite, and have positive entries. Many Stokes solvers use −M_{p} to precondition $\hat{S}$ very successfully, since $-M_p^{-1}\hat{S}$ generally has very good spectral properties, making KSP methods converge rather well.
Since −M_{p} is a good preconditioner for $\hat{S}$, it is spectrally similar to $\hat{S}$ and can, in some sense, replace $\hat{S}$. Using this observation, we were able to devise a particularly effective preconditioner for Eq. (27) that replaces $\hat{S}$ by −M_{p}. This solves the problem of preconditioning the middle system in Eq. (28), since M_{p} is well preconditioned by the diagonal lumped mass matrix $\hat{M}_p$, which contains the row sums of M_{p} on its diagonal. At the same time, the use of M_{p} eliminates the need for the relatively expensive application of $\kappa(A)$ in the definition of the action of $\hat{S}$ on vectors.
Whenever pressure boundary conditions are used, it is important to observe that the lower-right block in Eq. (27) is not actually zero but contains rows of the negative identity matrix corresponding to the mesh nodes where this boundary condition is applied. In that case, we add those rows to −M_{p}, which remains negative-definite. In fact, we use a penalty formulation of all Dirichlet boundary conditions, which adds large negative diagonal entries to −M_{p} and, incidentally, preserves the symmetry of A. In particular, we can use CG with ILU(0), or its Cholesky variant ICC(0), for both $\kappa(A)$ and $\kappa(M_p)$. While ILU(0) is not guaranteed to preserve the symmetry or the definiteness of the preconditioned operator, in our experience this does not cause a problem. In case CG fails to converge, the more robust GMRES can be used.
An additional simplification of P can be obtained by dropping the last of the three factors and inverting the remaining two using $\kappa$ in place of inverses. This is a rather common approximation (see Ref. 72), which might increase the number of GMRES iterations needed for convergence, but makes each application of P considerably cheaper, since it omits the intermediate z. Concretely, the action of our preconditioner is defined as $\hat{P}\,(f, g)^T = (x, y)^T$, where x and y are computed as follows:
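The two solves are $x = \kappa(A)\,f$ followed by $y = \kappa(-M_p)\,(g - Bx)$. A serial toy sketch of this block-triangular preconditioner, with a synthetic saddle-point system standing in for Eq. (27) and an identity matrix standing in for the pressure mass matrix of a real P^{1} discretization:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, splu

# synthetic saddle-point system Q = [[A, B^T], [B, 0]]
n, m = 60, 15
A = sp.diags([-1.0, 3.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
rng = np.random.default_rng(0)
B = sp.csr_matrix(rng.standard_normal((m, n)) / np.sqrt(n))
Q = sp.bmat([[A, B.T], [B, None]], format="csc")

A_lu = splu(A)                      # exact solve standing in for kappa(A)
Mp = sp.identity(m, format="csc")   # identity standing in for M_p
Mp_lu = splu(Mp)

def apply_P(v):
    f, g = v[:n], v[n:]
    x = A_lu.solve(f)               # x = kappa(A) f
    y = -Mp_lu.solve(g - B @ x)     # y = kappa(-M_p) (g - B x)
    return np.concatenate([x, y])

P = LinearOperator(Q.shape, matvec=apply_P)
b = rng.standard_normal(n + m)
sol, info = gmres(Q, b, M=P, restart=100, maxiter=100)
```

In the production solver the two inner solves would themselves be inexact Krylov iterations, which is why the flexible FGMRES variant is mentioned below.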
The effectiveness of KSP methods depends on the particular right-hand side vector in the system being solved. For the specific case of $(\hat{\rho}_g^f, 0)^T$, GMRES preconditioned with the P described above proves to be extremely effective, outperforming all other iterative methods we are aware of.
Finally, we note that, as defined above, P is, in fact, a nonlinear operator, since it contains possibly under-converged KSP applications. This might cause GMRES to diverge, although this has not happened in our experience. In that case, the more robust flexible GMRES (FGMRES) should be used (see Ref. 71).
C. Solver configuration and comparison
In our software, the selection of a direct or an iterative solver can be made by simply changing runtime parameters, depending on the specific demands of the problem, most commonly the size of the computational mesh. Moreover, all of the iterative solver parameters can be controlled in this manner, allowing the user to fine-tune the solver for a specific problem. We provide defaults, however, that we have found to be the most effective; these can be overridden at runtime thanks to PETSc's flexible command-line options database system. In particular, by default, $\kappa(A)$ above is CG preconditioned with ILU(0) on each processor's local block and solved to a relative tolerance of 10^{−6} on the residual. CG is also used in $\kappa(M_p)$, likewise preconditioned with ILU(0) and solved to the same relative tolerance of 10^{−6}. These settings are used in all of the numerical experiments reported here whenever the iterative solver is employed.
We compared the direct and iterative solvers using 128 CPU cores of Intel Xeon E5-2698v3 @ 2.3 GHz on Blues at Argonne National Laboratory (ANL). We observe that, for problem sizes up to 1.2 × 10^{6} degrees of freedom, the direct solver performs better than the iterative solver. For systems with 0.32, 0.64, and 1.2 × 10^{6} degrees of freedom, the Stokes solution takes 0.083, 0.18, and 0.55 s, respectively, using the direct LU solver (in the preparation step, the LU decomposition takes 80, 225, and 1036 s, respectively). The iterative Stokes solution, on the other hand, takes 6, 10, and 29 s, respectively, for the same system sizes. For a problem with 2.4 × 10^{6} degrees of freedom, the direct solver was unable to provide a one-step Stokes solve and hung in the LU decomposition process for at least 6 h. The iterative solver took 57, 218, and 625 s for a one-step Stokes solve on systems with 2.4, 9.7, and 16.1 × 10^{6} degrees of freedom, respectively.
IV. RESULTS
Multiple levels of validation of our parallel finite element GgEm solver (pFE-GgEm) are in order. We start with the precision and scalability of the Stokes’ solver and follow with the convergence and precision of the full BD code. We also include two systems that illustrate the capabilities of these open-source, publicly available routines.
A. Scalability and Stokes’ flow validation
Given that the GgEm is a proven and validated method, we wish to verify that the parallel FEM solver is correct and that its scalability allows application to large systems (recall that the pFE-GgEm method permits the study of confined, nonequilibrium systems).
We start by locating three point particles in an infinite cubic domain, the boundary of which is unconstrained, so that an analytic solution, using the free space Green’s function or Oseen tensor, is accessible. To make the solutions comparable, we set the exact values on the domain boundaries, evaluated from the analytic method, as the boundary conditions of the numerical method. This also helps to verify that the boundary conditions for the GgEm solution are imposed appropriately. The point-forces are located at x_{1} = (−5, 0, 0), x_{2} = (0, 0, 0), and x_{3} = (+5, 0, 0) in a $30 \times 30 \times 30$ cube.
We used the bead hydrodynamic radius a as the characteristic length scale. The bead diffusion time $a^2 \zeta / k_B T$ is the characteristic time ($\zeta = 6\pi\mu a$ is the Stokes drag coefficient); these result in a characteristic force $k_B T/a$ and velocity $k_B T/(a\zeta)$. The point-forces have a strength of $f_\nu = 1/3$ along the x-direction. Figure 1 shows the velocity in the x-direction along the line through the bead centers, calculated with the GgEm and with the analytical solution. The figure includes the GgEm local and global contributions for a solution with $\alpha = 0.1$ and a global mesh with a resolution of $1/2\alpha$ ($15 \times 15 \times 15$ mesh). The inset shows the relative error between the solutions; the maximum error is of order 10^{−2}.
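The analytic reference for this test is the free-space Oseen tensor; a sketch of the superposition used for comparison (nondimensional units with $\mu = 1$; the evaluation point is arbitrary):

```python
import numpy as np

def oseen_velocity(x, sources, forces, mu=1.0):
    # free-space Green's function: G(r) = (I/|r| + r r^T/|r|^3) / (8 pi mu)
    u = np.zeros(3)
    for x0, f in zip(sources, forces):
        r = np.asarray(x, dtype=float) - np.asarray(x0, dtype=float)
        d = np.linalg.norm(r)
        G = (np.eye(3) / d + np.outer(r, r) / d**3) / (8.0 * np.pi * mu)
        u += G @ f
    return u

# the three point forces of the validation case: strength 1/3 along x
sources = [(-5.0, 0.0, 0.0), (0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
forces = [np.array([1.0 / 3.0, 0.0, 0.0])] * 3
u = oseen_velocity((2.0, 1.0, 0.0), sources, forces)
```

Evaluating this superposition on the domain boundary is also how the Dirichlet data for the numerical solution are generated in this test.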
The parallel scalability of the pFE-GgEm algorithm is tested by placing 100 beads at random in a $30 \times 30 \times 30$ domain, using $\alpha = 0.1$ and a $1/2\alpha$ mesh resolution, i.e., a $60 \times 60 \times 60$ mesh. The FEM mesh results in approximately 2.9 × 10^{6} degrees of freedom from the P^{2} − P^{1} FEM formulation using hexahedral elements with 20 nodes (HEX20).^{39} The iterative solver described in Sec. III, (F)GMRES with a custom Schur complement-based preconditioner, is used to solve Stokes’ equations. Simulations are carried out at ANL’s Laboratory Computing Resource Center (LCRC) Blues cluster, and the CPU time is measured as a function of the number of CPUs used for the calculation. Figure 2 shows the CPU time as a function of the number of CPUs for this system. As shown in the figure, the pFE-GgEm algorithm closely follows the ideal power-law scaling of −1.
B. Confined DNA solutions
We now proceed to show how the pFEGgEm algorithm translates into a full BD simulation. We start by calculating the diffusion coefficient of slit-confined DNA molecules. This simulation validates the correct calculation of the diffusive terms in the stochastic differential equation: both the gradient of the diffusion tensor and the Chebyshev polynomial approximation must be correct in order to obtain the proper diffusion coefficient and the Zimm scaling for the confined molecules.
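The Chebyshev polynomial approximation referred to here (Fixman's approach) can be sketched in a few lines: given eigenvalue bounds for the symmetric positive-definite diffusion tensor $D$, a Chebyshev series of $\sqrt{x}$ applied to a spectrally shifted matrix approximates $D^{1/2}\mathbf{z}$ using only matrix-vector products. The sketch below is ours, with dense linear algebra standing in for the paper's matrix-free solver; the function name, the dense eigenvalue-bound computation, and the term count are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def chebyshev_sqrt_mv(D, z, n_terms=40):
    """Approximate D^{1/2} @ z via a Chebyshev expansion of sqrt(x)
    over the eigenvalue range of the SPD matrix D (Fixman's method)."""
    # Eigenvalue bounds; in a matrix-free code these come from a cheap estimator.
    eigs = np.linalg.eigvalsh(D)
    lam_min, lam_max = eigs[0], eigs[-1]
    # Chebyshev coefficients of sqrt on [lam_min, lam_max] via Gauss nodes.
    j = np.arange(n_terms)
    t = np.cos(np.pi * (j + 0.5) / n_terms)          # nodes in [-1, 1]
    x = 0.5 * (lam_max - lam_min) * t + 0.5 * (lam_max + lam_min)
    f = np.sqrt(x)
    c = np.array([2.0 / n_terms *
                  np.sum(f * np.cos(k * np.pi * (j + 0.5) / n_terms))
                  for k in range(n_terms)])
    # Shifted matrix M maps the spectrum of D into [-1, 1].
    I = np.eye(D.shape[0])
    M = (2.0 * D - (lam_max + lam_min) * I) / (lam_max - lam_min)
    # Three-term recurrence: T_0 z = z, T_1 z = M z, T_{k+1} = 2 M T_k - T_{k-1}.
    T_prev, T_curr = z, M @ z
    result = 0.5 * c[0] * T_prev + c[1] * T_curr
    for k in range(2, n_terms):
        T_prev, T_curr = T_curr, 2.0 * (M @ T_curr) - T_prev
        result = result + c[k] * T_curr
    return result
```

Because only products `M @ v` are needed, the cost per term is one Stokes solve in the matrix-free setting, which is what makes the overall algorithm scale as O(N).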
The DNA model that we use was parametrized previously.^{11,32,33,73} A DNA molecule is described by a bead-spring chain composed of $N_b$ beads with hydrodynamic radius $a$, connected by $N_s = N_b - 1$ wormlike springs.^{74,75} The force between two connected beads i and j is given by
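The display equation did not survive extraction here. Based on the definitions that follow (wormlike springs, Kuhn length $b_k$, maximum spring length $q_0$), the spring force presumably takes the standard Marko-Siggia wormlike chain interpolation form; the expression below is a reconstruction, not a verbatim restoration of the original:

```latex
\mathbf{f}^{S}_{ij} = \frac{k_B T}{2 b_k}
\left[ \left( 1 - \frac{r_{ij}}{q_0} \right)^{-2} - 1 + \frac{4\, r_{ij}}{q_0} \right]
\frac{\mathbf{x}_{ij}}{r_{ij}},
```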
where $x_{ij} = x_j - x_i$, $r_{ij} = |x_{ij}|$, $b_k$ is the Kuhn length, and $q_0 = N_{k,s}b_k$ is the maximum spring length ($N_{k,s}$ is the number of Kuhn segments per spring). Consequently, the contour length of the DNA molecule is $L = N_s q_0$. A DNA chain behaves as an ideal Gaussian chain at short scales (over the Kuhn segment); therefore, a Gaussian bead-to-bead excluded volume potential is used as follows:^{32,73}
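The excluded volume expression itself is also missing from the extracted text. The Gaussian bead-to-bead potential used in the cited parametrizations has the standard form below, reconstructed from the symbols defined in the next sentence ($\upsilon$, $N_{k,s}$, $S_s^2$); treat it as a sketch rather than the original equation:

```latex
U^{ev}_{ij} = \frac{1}{2}\, \upsilon\, k_B T\, N_{k,s}^{2}
\left( \frac{3}{4 \pi S_s^2} \right)^{3/2}
\exp\!\left( - \frac{3\, r_{ij}^2}{4 S_s^2} \right),
```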
where $\upsilon$ is the excluded volume parameter, related to the solvent quality, and $S_s^2 = N_{k,s}b_k^2/6$ is the mean-square radius of gyration of an ideal chain consisting of $N_{k,s}$ Kuhn segments.
The confining walls impose excluded volume interactions on the DNA beads. We use an empirical bead-wall repulsive potential^{11} of the type
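The display equation for the bead-wall potential is missing. One common form of this empirical repulsion in the DNA simulation literature, consistent with the parameters $A_{wall}$ and $\delta_{wall}$ defined next, is the cubic ramp below; it is offered as a plausible reconstruction only:

```latex
U^{wall}_{i} =
\begin{cases}
\dfrac{A_{wall}}{3\, \delta_{wall}^{2}} \left( h - \delta_{wall} \right)^{3}, & h < \delta_{wall},\\[6pt]
0, & h \ge \delta_{wall},
\end{cases}
```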
where h represents the perpendicular distance between bead i and the wall, $A_{wall} = 25 k_BT$ is the strength of the repulsion, and $\delta_{wall} = N_{k,s}^{1/2} b_k/2$.
Following past studies,^{32} the model parameters used here are $a = 0.077\ \mu$m, $b_k = 0.106\ \mu$m, $\upsilon = 0.0012\ \mu$m$^3$, and $N_{k,s} = 19.8$. Three different molecular weights are simulated, namely, a 21 $\mu$m ($N_s = 10$), a 42 $\mu$m ($N_s = 20$), and an 84 $\mu$m ($N_s = 40$) long DNA molecule. The molecules are confined between two parallel walls separated by $H = 2\ \mu$m. The domain is periodic in the nonconfined directions with a 20 $\mu$m period. Ten chains were randomly distributed within the domain at a volume fraction of $\varphi < 0.02\%$ to ensure that the system is in the dilute regime. The GgEm parameters are $\alpha = 0.1$ and a mesh resolution of $1/2\alpha$, resulting in a $37\times37\times4$ HEX20 mesh and 89 913 degrees of freedom.
We measure the center-of-mass in-plane mean squared displacement (MSD) in order to obtain the chain diffusion coefficient, i.e., $\langle\Delta x_{1,cm}^2\rangle + \langle\Delta x_{2,cm}^2\rangle = 4Dt$ (see Fig. 3). In the bulk, the diffusion coefficients of these DNA molecules are^{76} $D_{bulk,21\,\mu m} = 0.53\ \mu$m$^2$/s, $D_{bulk,42\,\mu m} = 0.37\ \mu$m$^2$/s, and $D_{bulk,84\,\mu m} = 0.22\ \mu$m$^2$/s; the radii of gyration are $R_{g,bulk,21\,\mu m} = 0.594\ \mu$m, $R_{g,bulk,42\,\mu m} = 0.85\ \mu$m, and $R_{g,bulk,84\,\mu m} = 1.43\ \mu$m. Confinement and HI affect the value of the diffusion coefficient: it decreases because the walls reduce the chain mobility, and it follows a $(R_{g,bulk}/H)^{-2/3}$ scaling.^{76} Figure 3(a) shows typical MSD curves for the different DNA chains; the corresponding diffusion coefficients are plotted as a function of the confinement in Fig. 3(b). The calculated diffusion coefficients are $D_{slit,21\,\mu m} = 0.3395\ \mu$m$^2$/s, $D_{slit,42\,\mu m} = 0.2104\ \mu$m$^2$/s, and $D_{slit,84\,\mu m} = 0.0713\ \mu$m$^2$/s, in agreement with previous reports.^{32,76} They follow the correct Zimm scaling,^{26,64} confirming that the "noise" and the HI are correctly calculated.
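The $4Dt$ relation above lends itself to a one-line fit. The snippet below is our illustration, with synthetic linear MSD data standing in for simulation output; the function name and units are assumptions:

```python
import numpy as np

def diffusion_from_msd(t, msd_inplane):
    """Least-squares slope of the in-plane MSD vs. time; D = slope / 4,
    since two unconfined directions each contribute 2*D*t."""
    slope = np.polyfit(t, msd_inplane, 1)[0]
    return slope / 4.0

# Synthetic example: ideal linear MSD for a chain with D = 0.3395 um^2/s
# (the 21 um slit-confined value quoted in the text).
t = np.linspace(0.0, 10.0, 200)      # time in seconds
msd = 4.0 * 0.3395 * t               # in-plane MSD in um^2
D = diffusion_from_msd(t, msd)       # recovers 0.3395 um^2/s
```

In practice one would restrict the fit to the late-time linear regime of the measured MSD rather than fitting the whole curve.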
We now proceed to illustrate how the pFEGgEm algorithm handles a complex geometry under nonequilibrium conditions. To do this, we study the behavior of 21 $\mu$m long DNA chains under an elongational flow within a cross-channel geometry. Figure 4 shows the geometry and contours of the magnitude of the imposed fluid velocity at the $z = 0$ plane. To impose the elongation, a Poiseuille flow is generated by applying a pressure difference $\Delta p = p_{in} - p_{out}$ between the inlet and outlet boundaries. A fluid viscosity of $\mu = 1$ cP is used; the strength of the flow field is characterized by a Weissenberg number $Wi = \lambda\dot{\gamma}$, where $\lambda$ is the longest relaxation time of the DNA chain and $\dot{\gamma}$ is the characteristic shear rate.^{30} Three individual molecules were placed near an inlet boundary, and their centers of mass were then followed.
We measure the instantaneous molecular "stretch" of chain $\nu$ along the y-direction, $S_{y,\nu}$, according to
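The defining equation for the stretch was lost in extraction. Consistent with the description of $Y_\nu$ that follows, the natural definition is the maximum extent of the chain along y; the form below is our reconstruction:

```latex
S_{y,\nu} = \max\left( \mathbf{Y}_{\nu} \right) - \min\left( \mathbf{Y}_{\nu} \right),
```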
where $Y_\nu$ is an $N_b$ vector containing the y-Cartesian coordinates of the beads of chain $\nu = 1, \ldots, 3$. We plot the molecular stretch as a function of time and of a reaction coordinate $\varphi$, defined as follows:
where $L_x = 50.82\ \mu$m and $l_x = 5.39\ \mu$m.
Figure 5 shows the instantaneous molecular stretch, along the y-direction, of the three DNA chains. Two main observations can be made from the figure. First, in the early stages ($t < 0.05$ s), the $S_y$ magnitudes fluctuate around a value that corresponds to the initial configuration of the DNA chains. During this period, the chains are moving from the left inlet toward the stagnation point under the elongational flow; they are therefore only weakly stretched, and the fluctuations are due mainly to Brownian motion and HI. The stretch gradually increases to high values after a certain residence time. This maximum stretch, however, differs from chain to chain according to their positions along the streamlines. For instance, the chains represented by the black lines do not approach the stagnation point, and their elongation is low. On the other hand, the blue and red chains pass through the stagnation point, where the elongation is maximum, and their stretch is high. As a function of the reaction coordinate, the molecular stretch grows at approximately the same value, namely, $\varphi = 23\ \mu$m, which delimits the entrance of the "cross section." Within this section, the chains undergo a sharp velocity reduction and a transition from being elongated in the x-direction to being elongated in the y-direction.
C. Sedimentation of finite-size particles: Immersed boundary pFEGgEm
Our pFEGgEm algorithm can be generalized to treat finite-size particles through the Immersed Boundary (IB) method developed by Peskin.^{1,77} This generalization was applied previously by Pranay et al.^{60} for a GgEm implementation based on finite differences and fast Fourier transforms. Here, we use an IB-pFEGgEm that can handle confined, large-scale suspensions of finite-size particles of arbitrary shape in an arbitrary geometry.
In the IB method, the force distributions at moving solids, interfaces, or membranes are discretized as distributions of regularized point forces, where the length scale for "smoothing" the delta function scales with the grid spacing used to represent the moving entities. That is, if h is a characteristic length for the node spacing or element size on the surface, then we relate this scale to the regularization scale of the GgEm, i.e., $\xi_{IB} \sim h^{-1}$. This ensures that the force density associated with each node on the surface is spread over the length scale of the associated elements, thereby preventing fluid from penetrating the membrane surface. As in conventional IB methods, in the IB-GgEm the simulation results are insensitive to the choice of the regularization parameter provided that $\xi_{IB} h = O(1)$. Details of the IB-GgEm can be found in the work of Pranay et al.^{60}
As mentioned above, the forces exerted by a boundary immersed in a fluid are modeled as a set of regularized point forces, just as in the regular GgEm. In particular,
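The force density expression is missing here. Following the GgEm formalism described in the next sentence ($N_{IB}$ regularized point forces smoothed by $\delta_{IB}$), it presumably reads as follows; this is a reconstruction based on those definitions:

```latex
\boldsymbol{\rho}(\mathbf{x}) = \sum_{\nu=1}^{N_{IB}}
\mathbf{f}^{C}_{\nu}\, \delta_{IB}\!\left( \mathbf{x} - \mathbf{x}_{\nu} \right),
```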
where $N_{IB}$ is the number of nodes (beads) used to represent the suspended solids, $\delta_{IB}$ is the regularized modified Gaussian that includes $\xi_{IB}$, and $f_\nu^C$ is the constitutive force used to describe the particles. The fluid velocity field comes from the numerical solution of Stokes' equations driven by this force density. We use the pFEGgEm algorithm to efficiently calculate the dynamics of the particles. Recall that in traditional IB methods the fluid mesh must resolve the $\xi_{IB}^{-1}$ scale; using the GgEm avoids this requirement, because the fluid mesh only needs to resolve the $\alpha^{-1}$ scale,^{60} thereby increasing the performance of the algorithm.
We test the IB-pFEGgEm approach by simulating the sedimentation of ten solid particles in a rectangular channel. Seven cylinders and three spheres are initialized in a particular distribution, as shown in Fig. 6. The particles sediment in a channel of size $300\times100\times100$ under a sedimenting force $g = 0.1\delta_1$ along the channel axis. The radius of the largest sphere is 15. In this simulation, we kept the same "Brownian" characteristic variables for consistency: $a$ for length, $a^2\zeta/k_BT$ for time, and $k_BT/a$ for force. The solids are modeled using 218 nodes for each sphere and 278 nodes for each cylinder, resulting in a total of 2600 tracking points. The pFEGgEm mesh for the global contribution has 145 696 degrees of freedom for $\alpha = 0.1$. The node separation on the particles' surfaces is between $h_{min} = 1.246\,247$ and $h_{max} = 5.402\,710$; the smoothing parameter of the IB is $\xi_{IB} = 1.0/(0.75 h_{min})$.
We assume that the particles are "rigid": each node point (tracking bead) on a particle is linked to its neighboring nodes by elastic springs with a prescribed large stiffness constant. In addition, all of the nodes are connected to the particle's geometric center-of-mass point by an elastic spring. For simplicity, we assume that each link is a linear spring, so the force acting on point i due to point j is given by
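The spring-force equation is missing from the extracted text; for a linear (Hookean) link with the constants defined in the next sentence, it presumably has the form

```latex
\mathbf{F}_{ij} = k \left( r_{ij} - r_0 \right) \frac{\mathbf{x}_{ij}}{r_{ij}},
```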
where k is the spring elastic constant and $r_0$ is the equilibrium spring length for each specific link. For each particle, a spring network is thus formed, which generates the internal nodal forces on the tracking beads that resist deformation of the particle.
A time sequence of the particles' positions is shown in Fig. 6; fluid velocity contours are included in Fig. 7. As expected, particles located at the center of the channel sediment faster than those closer to the walls, owing to their higher mobility (the fluid velocity vanishes at the walls). The HI prevent the solids from sedimenting at the same rate and change their relative orientation because of the differing fluid resistances. The induced fluid velocity includes regions where the velocity opposes the sedimenting direction (negative values in Fig. 7), a typical characteristic of sedimenting systems.^{78}
V. pFEGgEm ROUTINES AND LIBRARIES
The parallel finite element GgEm (pFEGgEm) routines are built using open source libraries, thereby facilitating usage of our software. We list the required libraries and the repository from which our pFEGgEm routines can be downloaded:

- the finite element method is built using libMesh^{79} and PETSc;^{80–82}
- for systems with fewer than one million fluid degrees of freedom, we recommend a direct solver such as the distributed-memory SuperLU_DIST;^{70,83,84}
- the scalable eigenvalue calculation uses SLEPc;^{68}
- the pFEGgEm routines can be downloaded at http://miccomcodes.org as part of the Continuum-Particle Simulation Suite (COPSS) from the Midwest Integrated Center for Computational Materials (MICCoM).
VI. CONCLUSIONS
We have developed an efficient O(N) computational approach to model the dynamics of hydrodynamically interacting Brownian or micron-sized particles in arbitrary geometries. A parallel finite element Stokes' solver is at the center of the algorithm; combining it with the General geometry Ewald-like method (GgEm), a midpoint time integration scheme, and a Chebyshev polynomial approximation for the fluctuation-dissipation theorem results in a scalable algorithm, the pFEGgEm algorithm. The approximations within these methods reduce the precision with which the fluctuation-dissipation theorem is satisfied. However, the Chebyshev approximation has been used extensively in confined and unconfined systems, and its performance is excellent. Likewise, the GgEm resolves Stokes' equations with satisfactory precision. In particular, a 10% tolerance in the Chebyshev polynomial approximation and a proper selection of the GgEm $\alpha$ and mesh parameters yield the correct diffusion regime and diffusivities within 5% of the analytical or experimental values. In practice, precision is slightly sacrificed in the interest of computational performance: $O(N^3)$^{85} vs. $O(N\log N)$ or O(N). The finite element formulation in the pFEGgEm has a distributed-memory parallelization that may use a direct LU solver or a fast parallel iterative solver with preconditioning.
The pFEGgEm algorithm shows good (near-linear) parallel performance as a function of the number of CPUs. We validated the algorithm by comparing its solutions to theoretical results in a simple geometry (for the Stokes' solver) and by calculating the diffusion coefficients of slit-confined DNA molecules (for the midpoint scheme, the eigenvalues of the diffusion tensor, and the Chebyshev polynomial approximation).
It was also shown that the pFEGgEm algorithm is capable of handling complex geometries under nonequilibrium conditions by studying the behavior of DNA chains under an elongational flow in a cross-channel geometry. Finally, the proposed algorithm was combined with the immersed boundary method, in the IB-pFEGgEm approach, to simulate finite-size particles of arbitrary shape in any geometry. To illustrate the use of the IB-pFEGgEm, we presented results for the sedimentation of solid particles in a rectangular channel.
We stress that the proposed algorithms are built on top of open-source libraries, such as libMesh (a framework for parallel mesh management and FEM discretization of partial differential equations), PETSc (parallel linear and nonlinear equation solvers), and SLEPc (eigenvalue calculations). We have provided convenient interfaces and building blocks for computational researchers that will facilitate the incorporation of our routines into available BD simulation codes.
ACKNOWLEDGMENTS
This work was supported by "MICCoM," as part of the Computational Materials Sciences Program funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division and by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract No. DE-AC02-06CH11357. We gratefully acknowledge the computing resources provided on Blues and Fusion, high-performance computing clusters operated by the Laboratory Computing Resource Center at Argonne National Laboratory.