The formulation of waterborne coating fluids composed of colloidal particles interacting with associative polymers as rheology modifiers is a complex multiscale problem with competing design requirements, for which the selection of new formulations remains largely empirical. To move toward rational multiscale design, we here develop active learning meta-models that capture from fine-grained molecular/colloidal simulations the association thermodynamics and dynamics of associative polymers bridging the particles. These properties dictate the macroscopic rheological behavior of these suspensions and can be used in a coarse-grained rheological model for practical predictions. The meta-models were developed using an intelligent search algorithm based on fine-grained data acquired from detailed Brownian dynamics simulations. The active learning approach enables an efficient meta-model development, removing the need for the conventional exhaustive exploration of the entire multidimensional design space. We applied Shapley additive explanations, a machine learning interpretability tool to the developed meta-models, which reveals quantitatively how the gap between particles largely determines the bridge and loop lifetimes. The attraction strength between the polymer ends and particles has minimal effect on the bridge and loop fractions but strongly influences rates of transition between them. These methods pave the way for the computational design of waterborne coatings, guiding bench-top formulation. Additionally, the developed meta-models provide an efficient mechanism for the transfer of information from higher resolution to lower resolution models, therefore bridging the scales, which can be applied to other problems involving the prediction of coarse-grained rheological or other dynamic properties from fine-grained molecular and colloidal simulations.

Understanding the association dynamics of multicomponent and multiscale materials composed of interacting colloidal particles with polymers is crucial in various biological and chemical processes such as wound healing, involving platelet-polymer plugs that assemble at the site of damage [1], drug delivery systems [2], food products [3], cosmetics [4], and waterborne coatings [5]. Waterborne coatings were introduced as a sustainable alternative to conventional coatings, which often contain hazardous solvents that pose risks to health and the environment. However, if not appropriately formulated, they may fail to meet desired performance and criteria, particularly rheological behavior during application as well as their shelf life [6]. Formulating effective waterborne coatings demands fine-tuning a range of competing design requirements. The impacts of various design choices on rheological properties are intertwined and multiscale, making the selection of new formulations largely empirical.

The components of waterborne coating fluids that most impact their rheology are colloidal latex particles and associative polymers, all suspended in water. The associative polymers are typically telechelic; they consist of a long hydrophilic backbone, commonly polyethylene oxide (PEO), capped at each end by relatively small hydrophobic alkyl groups (stickers). The stickers primarily associate with hydrophobic surfaces such as latex colloidal particles. Their attraction strength is determined by the number of methylene groups [7]. These associative chains, a.k.a rheology modifiers, can take four possible configurations in latex-polymer solutions, namely, (a) a bridge between neighboring particles, (b) a loop on one single particle, (c) a dangling chain with one end in the solution, and (d) a free chain in the solution. In addition, within a paint mixture, their association with other components of the waterborne coatings, such as pigment particles, was determined to have a negligible effect on rheology [8].

The typical associative polymers used in waterborne latex coating formulations are hydrophobically ethoxylated urethanes (HEURs). In the solution by themselves, these telechelic polymers usually aggregate into flowerlike micelles with a core of hydrophobic ends and a corona of stretched hydrophilic blocks. At moderate concentrations, stickers entering and leaving the adjacent flowers enable network relaxation. The most straightforward theory for such solutions was developed by Green and Tobolsky [9] and termed the “temporary network model.” In this relatively simple theory, the network relaxation is determined by a single relaxation time, denoted as τ. When there is no flow, this relaxation time is exponentially dependent on the free energy cost, Δ G, for the escape of the sticker from the micelle, given by the equation τ = τ 0 exp ( Δ G / k B T ). Annable et al. [10] observed that the relaxation times determined rheologically for the HEUR solutions agreed with the above equation, assuming that Δ G increases linearly with the length of the sticker. The increase was estimated as approximately 1  k B T per methylene group added to the sticker.

Unlike pure HEUR solutions, latex particle-HEUR solutions possess a broad distribution of relaxation times with nonterminal behavior. Wang and Larson [11] developed Brownian dynamics (BD) simulations of small particles with explicit, but over-simplified, dumbbell chains representing polymers to understand the structure and dynamics of latex particle-polymer suspensions. They demonstrated that even a simplified system shows at least four relaxation times, including (1) local relaxation of the HEUR loops, τ H E U R; (2) translation of the loops along the particle surfaces, τ s u r f; (3) relaxation of the HEUR bridges, τ b r i d g e; and (4) relaxation of clusters of bridged particles, τ c l. In general, τ H E U R and τ s u r f are very small (<1  mu s), and, therefore, these modes do not affect the experimentally detectable contributions to rheological behavior. These simulations facilitated the investigation of the interaction between polymers and particles. However, they can only predict stress relaxation for small particles with rapid diffusion, involving up to several dozen chains per particle. Simulating more realistic systems with larger particles ( 100 nm), each covered by hundreds of HEUR chains, is prohibitively expensive. Furthermore, a small system size limits the investigation of slow dynamics, such as relaxation of clusters of bridged particles and of the system-spanning network.

Thus, to mitigate the computational challenges posed by the numerous degrees of freedom, Hajizadeh et al. [12] developed a novel hybrid population balance-Brownian dynamics (Pop-BD) model. Unlike explicit BD simulations that track each individual chain, Pop-BD replaces the chains with dynamic bonds/potentials that are continuously added or removed based on particle motion. This approach employs probabilistic rules to estimate the likelihood of bridge or loop formation and rupture, depending on the current state of the system. Reducing the systems’ degrees of freedom, the Pop-BD model can simulate systems with hundreds of large colloidal particles and hundreds of implicit chains per particle. The ability to capture relaxation times related to chain relaxation is constrained in Pop-BD because it does not account for the presence of the explicit chains. However, this innovative approach is particularly useful for exploring large-scale systems and their larger relaxation times of relevance to experimentally measurable rheological responses of realistic systems, while being informed by the essential physics of the polymer-particle interactions from high resolution models.

However, accurately informing the Pop-BD model requires knowledge of bridge populations and loop-to-bridge and bridge-to-loop transition rates [represented by mean loop lifetime (MLL) and mean bridge lifetime (MBL) in this study]. These transition rates are functions of various factors, including but not limited to the gap between two particles, the sticker-free energy or attraction strength and the gap-dependent spring potential for chain stretch. Tripathi et al. [13] proposed a constitutive model to capture the dynamic behavior of telechelic polymer networks encompassing micelles and polymeric chains. Quantifying the shear-dependent behavior of pure HEUR solutions, they approximated the loop and bridge transition rates between micelles. These rates were later used as inputs for the Pop-BD model developed by Hajizadeh et al. [12] It should be noted that the simulation of particles using the gap-dependent transition rates required for the Pop-BD simulations has now been integrated into HOOMD-blue [14], a versatile particle simulation toolkit [15]. In their 2018 study, Rezvantalab and Larson [16] investigated transitioning from loops to bridges in bead-spring chains within a dilute regime, neglecting polymer-excluded volume effects. Additionally, they assumed a constant bridge-to-loop rate irrespective of the extent of chain stretching. Their investigation underscored the necessity of achieving resolution at the Kuhn step level to obtain reliable findings. Subsequently, Travitz et al. [17] employed BD simulations to measure the faster bridge-to-loop transition time. This analysis accounted for the excluded volume effects at the level of a Kuhn step. Furthermore, they applied a self-consistent field theory (SCFT) developed by Zhang et al. [18] to determine the equilibrium ratio of the number of bridges and loops and used detailed balance to thereby infer the slower loop-to-bridge transition times. These simulations treated colloid particles as flat surfaces, which is valid only in the small interparticle gap-to-colloid radius ratios, where the Derjaguin approximation can be used to account for the nonuniform gap. This model was not verified based on explicit-chain BD simulations with smaller colloid particles.

The main challenge in developing a precise predictive model of waterborne coating fluids is the fact the chains’ association dynamics and separation distances of the colloids unfold across substantially different time and length scales. These scales span from microseconds and 1 nm for the association dynamics to 100 nm and seconds for re-arrangement of the colloidal particle network. Consequently, a single simulation proves inadequate for encapsulating the extensive array of time and length scales inherent in macroscopic many-colloid simulations. Nevertheless, simulation of these dynamics at both the smaller and larger scales is essential for accurate prediction of the rheological behavior of these materials. Thus, the outstanding challenge necessitates developing an integrated and refined predictive model for the association dynamics that can reliably relay results from fine-grain simulations to coarser-grained simulations, such as Pop-BD.

Addressing this challenge has long been a need, but recent advances in computer modeling across multiple scales, combined with the advent of advanced machine learning (ML) techniques, now offer promising solutions [19,20]. Specifically, active learning meta-modeling can significantly enhance the efficiency and accelerate the development of precise meta-models or surrogate models of expensive-to-evaluate simulations. These ML methodologies commonly integrate Gaussian process (GP) models [21] with adaptive sampling strategies, efficiently exploring the design space and building accurate surrogate models. The sampling strategies involve iteratively selecting the most informative simulations or experiments for training and targeting areas where the model’s predictions are uncertain or need further understanding [22,23]. Furthermore, there is a particular interest in understanding how ML-based meta-model predictions are influenced by system parameters from the design space. This requires interpretability techniques such as feature importance analysis [24], partial dependence plots, and SHAP (Shapley additive explanations) values [25]. Employing interpretability techniques can leverage the understanding of the output of the ML meta-model and uncover underlying patterns and interactions within the system parameters. These approaches provide crucial guidance for material design and discovery [26] with valuable insights into the fundamental mechanisms of the system.

Therefore, we initially aimed to develop new fine-grained simulations that maintain accuracy without relying on the simplifying approximations commonly employed in the literature [27]. Although computationally demanding, these predictive models can predict the association dynamics of polymeric chains across various system parameters, including chain length and population, the radius and spacing of latex particles, and the attraction strength between stickers and particles. In what follows, after developing these detailed simulations, we implemented novel active learning meta-models. The active learning process is able to minimize the number of samples necessary for building an accurate surrogate model over the design space, as it iteratively selects the most informative samples based on the knowledge gained from previous ones. This study prioritized interpretability throughout the model development process to promote the comprehension of these meta-models. The developed meta-models, consequently, facilitate the integration of the microscopic simulations into macroscale models by passing information from the detailed and fine-grained BD simulations to coarse-grained models such as Pop-BD. The gained insights also provide a flexible tool for exploring and optimizing the design parameters of waterborne coatings.

The paper is structured as follows. In Subsection II A, we investigate the dynamics of chains at the high resolution of a single Kuhn step ( 1 nm) by performing BD simulations of a minimal model, which allowed us to capture key responses, including loop and bridge lifetimes as well as bridge populations, essential for informing the lower-resolution simulations such as Pop-BD. In Subsection II B, we define the design space and develop an intelligent search algorithm to gradually learn the model responses across the design space. In Sec. III, we analyzed our findings to explain the results and highlight the significance of different parameters on the model responses using a machine learning interpretability tool. Finally, the conclusion and future direction of this research are discussed in Sec. IV.

The simulated model features both colloidal particles and HEUR polymeric chains. The particle size varies from around 10 nm in theoretical studies [11,27] to approximately 60 nm in typical latex suspensions. As previously noted, HEURs are telechelic polymers characterized by PEO backbones, end-capped on both sides with hydrophobic alkyl groups (stickers). The molecular weight of the PEO is 137 g/mol per each Kuhn segment with an effective length ( b k) of approximately 1.1 nm [28]. Experimental studies [5] typically employ telechelic polymers with a molecular weight of approximately 30 000 g/mol, corresponding to around 200 Kuhn steps. The strength of attraction ( ε s) between the stickers and colloidal particles is quantified by the number of carbon groups in the hydrophobic end, each contributing to approximately 0.75  k B T [6]. The range of ε s varies from 7 to 10  k B T in this study. We restricted the hard-core radius of colloidal particles, R p, to between 11 and 33 nm and the number of Kuhn steps ( N k) from 10 to 30. This parameter range resulted in end-to-end distance to colloid radius ratios of approximately between 0.1 and 0.5, which aligns with the ratios used in experiments ( 0.23). However, it should be noted that the ranges of ε s, R p, and N k are restricted due to the current computational limitations. The number of chains per particle was assumed to range from 10 to 100, with the upper bound approaching typical experimental values [29]. The ranges of these variables are summarized in Table I.

TABLE I.

The system parameters or model features explored in this study with their ranges, shaping the five-dimensional design space.

System parameters (features)Rangea
H/Nk bk [0.1 − 0.6] 
Rp/bk [10 − 30] 
Nk [10 − 30] 
Nc/Np [10 − 100] 
ε s / k B T [7 − 10] 
System parameters (features)Rangea
H/Nk bk [0.1 − 0.6] 
Rp/bk [10 − 30] 
Nk [10 − 30] 
Nc/Np [10 − 100] 
ε s / k B T [7 − 10] 
a

The ranges are in reduced units.

The HEURs are modeled as chains of repulsive beads, with a resolution of one Kuhn step per added bead due to recent results showing that further coarse graining tends to lose accuracy [27]. The bonds between beads are represented as stiff, rodlike, harmonic springs as
(1)
where k = 400 k B T / b k 2, r 0 = 1.0 b K, and r is the distance between the bead centers. The beads, including stickers, interact with each other via a Weeks–Chandler–Anderson (WCA) potential,
(2)
with ε b b = 1.0 k B T and σ b b = 0.4 b K [18], set by matching the radius of gyration vs N k with light-scattering experiments for dilute PEO in water [30]. A shifted Lennard-Jones potential is also employed to represent the sticky interaction between the hydrophobic sticker to the hydrophobic colloidal particle,
(3)
where σ p s = 1.0 b K and ε s is the attraction strength between stickers. r is the distance between the center of the stickers and that of the particles. R p is the hard-core radius of the particles. According to the above equation, the maximum attraction occurs when stickers and particles are positioned at R t r u e = R p + 1.12 × σ p s, where the stickers favorably tend to reside. This distance is referred to as the “true” radius of the particles [11]. Given that beads that are not stickers exhibit repulsive interactions with colloidal particles, their behavior can be effectively modeled using the WCA potential as
(4)
where σ p b = 1.0 b K, ε p b = 1.0 k B T, and r is the distance between the centers of the beads and particles.
Simulations were conducted using the BD integrator [31] in HOOMD-blue, a particle simulation software package optimized for GPU performance [15]. The integrator advances particles forward in time solving the over damped Langevin equations of motion, a.k.a BD, according to
(5)
where the subscript i refers to the beads, including the stickers. F i C represents the cumulative force on the particle from all potentials (i.e., F i C = U / r i). F i R also refers to a random force, i.e., F i R = 2 d k B T ζ i / δ ( t ), in which d is the dimensionality of the system, δ ( t ) is the delta function, and ζ i is the friction coefficient for each bead. Following the work of Travitz and Larson [17], the bead friction coefficient in our free-draining simulation is set to produce a relaxation time that matches that of a chain experiencing hydrodynamic interactions and is calculated by ζ b e a d = 1.277 × 10 11 N k 1.79 s. In the simulations, reduced units were employed, including k B T as the reduced energy unit, b k = 1.1 nm (one Kuhn step length) as the reduced length unit, and the water viscosity is maintained at a constant value of η s = 8.90 × 10 4 Pa s at 298 K. These dimensions of k B T, b k, and η s allow for calculating other reduced units; for instance, [ t ] = ( η s b k 3 ) / k B T = 0.288 ns gives the reduced time unit.

To carry out the expensive fine-scale BD simulations resolved at the level of the Kuhn step needed to obtain the data for the Pop-BD model, we develop a “minimal model” in a periodic simulation box containing both polymeric chains and a fixed particle positioned at the center of the box. This model is the smallest unit that can represent a system able to allow simulation of bridge and loop transition dynamics needed for the Pop-BD simulations, with the periodic images of the particle serving as the neighboring particles to which bridges can form. The simulation box is cubic with a dimension L that we vary from run to run to set the gap H, as L = H + 2 × R t r u e, in which H is then the effective interparticle gap, defined as the minimum distance between the surfaces of the particle and its periodic image, and R t r u e is the true radius of the particles defined in Eqs. (3) and (4). The chains are initialized randomly in the space between particles where they can form the four possible configurations (loop, bridge, dangling, and free) depending on the stickers’ distance from the particle(s) surfaces. A sticker is considered “stuck” to a colloidal particle when it is within R c u t , 1 of the particle’s center. R c u t , 1 is defined as the distance at which the interaction energy is stronger than 1 k B T, as depicted in Fig. 1. A chain is in the “loop” configuration if both of its stickers are stuck to the same particle, or in the “bridge” configuration if the stickers are stuck to different particles. Similarly, the chain is dangling or free, if one or none, of the stickers are stuck to a particle, respectively. The chains continuously transition between these four configurations; however, a similar study [32] has shown that chains spend the majority of their time in either the loop or bridge configuration for the range of ε s considered here.

FIG. 1.

On the left-hand side, the simulation box contains a fixed colloidal particle at the center, showing the Lennard-Jones potentials representing the attraction strength ε s between a sticker (small circle end-capping chains) and the particle, R t r u e is the minimum of the potential well, R c u t , 1 is the radial position where U ( R c u t , 1 ) = 1 k B T, and R p is the hard-core repulsive boundary of the particle defined in Eqs. (3) and (4). On the right-hand side, an adjacent periodic box image is a single example of the periodic replicas that extend in all three spatial directions infinitely. A sample loop configuration is depicted in which the two stickers are within R c u t , 1. A sample bridge configuration is between the particle and its image. The beads and stickers representing a presumed bridging chain are displayed with their wrapped coordinates in full color, while their unwrapped coordinates are shown in shadow. The section of the chain exiting the right side of the box enters the adjacent periodic box, illustrating the formation of a sample bridge. The system parameters, including the radius of the particle ( R p), the number of Kuhn steps ( N k), the effective interparticle gap ( H), and the attraction strength between a sticker and the particle ( ε s) are also shown.

FIG. 1.

On the left-hand side, the simulation box contains a fixed colloidal particle at the center, showing the Lennard-Jones potentials representing the attraction strength ε s between a sticker (small circle end-capping chains) and the particle, R t r u e is the minimum of the potential well, R c u t , 1 is the radial position where U ( R c u t , 1 ) = 1 k B T, and R p is the hard-core repulsive boundary of the particle defined in Eqs. (3) and (4). On the right-hand side, an adjacent periodic box image is a single example of the periodic replicas that extend in all three spatial directions infinitely. A sample loop configuration is depicted in which the two stickers are within R c u t , 1. A sample bridge configuration is between the particle and its image. The beads and stickers representing a presumed bridging chain are displayed with their wrapped coordinates in full color, while their unwrapped coordinates are shown in shadow. The section of the chain exiting the right side of the box enters the adjacent periodic box, illustrating the formation of a sample bridge. The system parameters, including the radius of the particle ( R p), the number of Kuhn steps ( N k), the effective interparticle gap ( H), and the attraction strength between a sticker and the particle ( ε s) are also shown.

Close modal

The critical results extracted from the simulations include the MBL (or bridge-to-loop transition time), the MLL (or loop-to-bridge transition time), and the MBF. The bridge lifetime measures how long chains stay in a bridge configuration before one end-sticker escapes from its particle’s attraction potential (exceeding 1  k B T) and attaches to another particle, forming a loop. Conversely, the loop lifetime refers to the duration chains remain in a loop configuration, with both stickers attached to a single particle, until one end-sticker escapes the potential well (exceeding 1  k B T) and forms a bridge. Any intermediate dangling or free chains that form during the transition from loop to bridge or vice versa are assumed to be part of the preceding bridge or loop configuration. Thus, there are only two states: bridge and loop, with transitional time periods between the two being lumped into the state existing prior to completion of the transition to the second state, following the same procedure used in the work of Travitz and Larson [17]. Before computing the average lifetimes, bridge and loop lifetime samples should be collected over a sufficiently long simulation time to ensure adequate statistical sampling. Further details regarding the sampling are discussed in Appendix A 1.

In this subsection, we detail the development of active learning meta-models of the MLL, mean bridge lifetime (MBL), and bridge fractions (BFs) that are extracted from the detailed BD simulations of colloidal particles interacting with “rheology modifying” telechelic polymer chains discussed in Sec. II A. As mentioned there and represented in Table I, the design space explored in this study is represented by five system parameters, namely, the hard-core radius of particles R p / b k [ 10 30 ], number of Kuhn steps N k [ 10 30 ], normalized gap H / N k b k [ 0.1 0.6 ], number of polymer chains per particle N c / N p [ 10 100 ], and Lennard-Jones attraction strength between the stickers and the particles, where ε s / k B T [ 7 10 ]. While the ranges of particle sizes, chain lengths, and numbers of chains per particle to be explored in Table I are lower (by a factor of around 10) than the commercially most common, these values chosen to allow faster simulations and serve to provide a proof of principle. The ranges can be expanded upon in future work.

The active learning modeling aims at learning the model responses { f 1 , f 2 , f 3 } = {MLL, MBL, BF } across the entire design space, while reducing the number of simulations or experiments needed. If we were to conduct conventional grid-search (a form of exhaustive and unguided search) to find responses across the design space, it would have been necessary to perform a minimum of 1080 simulations for considering three options for each R p and N k, six options for H / N k b k, five options for N c / N p, and four options for ε s. Our active learning meta-modeling aims to significantly reduce this exhaustive set of simulations by using an intelligent search methodology. The overall process is illustrated in Fig. 2, with each section explained in more detail in what follows. After defining the design space, the process begins by fitting initial meta-models f ^ 1 , f ^ 2 , f ^ 3 for initial random points X = { x 1 , , x n } T within the design space, where each x i is a set of values of the system parameters or features (i.e., five parameters defined above). The essence of the process lies in making informed decisions about the next samples to simulate or experiment providing the most informative knowledge. This is guided by an acquisition function tailored to the specific problem. Once the new samples or a batch of new samples are selected, the BD simulations (or experiments) are carried out to update the fitted meta-models. This iterative process continues until specific stopping criteria are satisfied.

FIG. 2.

The flow chart illustrates the intelligent active learning framework, beginning with the definition of the design space and the generation of an initial set of uniform random samples (see Sec. II B 1). The actual responses of these initial samples (represented as dots) are obtained through BD simulations (see Sec. II A), which serve as observed samples for fitting initial GP regression meta-models (see Sec. II B 2). A sample GP model of a one-dimensional function is displayed with its mean value and confidence intervals compared to its true value (represented as a solid line), which remains unknown. Guided by a customized acquisition function (see Sec. II B 3), the next batch of samples is selected for simulation or experimentation using a kriging believer (KB, see Sec. II B 4). This selection aims to choose the most informative new samples by considering the model’s responses, their gradients, and uncertainties, without a risk of clustering around each other. After conducting the simulations or experiments, the meta-models are updated. This iterative process continues until the stopping criteria are met (see Sec. II B 5). Samples (represented as squares) are selected efficiently to minimize the need to exhaustive grid searches, allowing the model responses to be learned across the design space. Finally, the developed meta-models are explained (see Sec. II B 6) using SHAP.

FIG. 2.

The flow chart illustrates the intelligent active learning framework, beginning with the definition of the design space and the generation of an initial set of uniform random samples (see Sec. II B 1). The actual responses of these initial samples (represented as dots) are obtained through BD simulations (see Sec. II A), which serve as observed samples for fitting initial GP regression meta-models (see Sec. II B 2). A sample GP model of a one-dimensional function is displayed with its mean value and confidence intervals compared to its true value (represented as a solid line), which remains unknown. Guided by a customized acquisition function (see Sec. II B 3), the next batch of samples is selected for simulation or experimentation using a kriging believer (KB, see Sec. II B 4). This selection aims to choose the most informative new samples by considering the model’s responses, their gradients, and uncertainties, without a risk of clustering around each other. After conducting the simulations or experiments, the meta-models are updated. This iterative process continues until the stopping criteria are met (see Sec. II B 5). Samples (represented as squares) are selected efficiently to minimize the need to exhaustive grid searches, allowing the model responses to be learned across the design space. Finally, the developed meta-models are explained (see Sec. II B 6) using SHAP.

Close modal

1. Initial sampling

A maximin distance method [33] was used to generate a diverse and well-distributed set of initial random points, reducing the risk of clustering within the design space. The algorithm generates 1000 points and selects an initial set with the maximum pairwise distance between points, where “distance” is measured numerically in terms of the differences in values of the parameters in Table I. The initial sample size was 50 or 10-times the dimensionality of the problem as it is a commonly suggested guideline to start with [34]. Simulations were then conducted to obtain the actual model responses for each initial sample point.

2. Gaussian process (GP) regression

To generate meta-models, we used GP regression due to its effectiveness with limited datasets [35]. Each GP uses Bayesian principles and prior knowledge to create a posterior Gaussian distribution f ( x ) G P ( f ^ ( x ) , σ ^ 2 ( x ) ), where f ^ ( x ) is the predicted response (mean value) and σ ^ 2 ( x ) is the prediction variance. GPs utilize a kernel (or covariance function) to define the similarity between data points. Depending on the specific problem, various kernel functions can be selected or custom-designed for a GP (readers are referred to [36] for more detailed information on selecting a kernel/covariance function for a GP). The kernels used for the GP models are a combination of common choices, including radial basis function ( R B F), Matérn ( M a t), and dot product ( D o t) kernels, based on analyses explained in Appendix A 2.

3. Acquisition function

The active leaning process then iteratively adds new points x n e w based on an adaptive sampling criterion (a.k.a. “acquisition function”). Effective adaptive sampling should balance global exploration and local exploitation. Global exploration ensures comprehensive domain coverage by identifying unexplored areas, while local exploitation identifies informative regions (readers are referred to [22] on different categories of adaptive sampling strategies). A recent acquisition criterion called the gradient and uncertainty enhanced sequential sampling (GUESS) [37] leverages the predicted standard deviation to explore the design domain and employs a Taylor expansion-based approximation of the second- and higher-order remainders for exploitation as
(6)
where T f ^ t ( x ) ( x x ) represents the first- (and can be extend to higher-) order Taylor expansion of the model at x . The exploitation term is also adjusted with the predicted standard deviation, σ ( x ), serving as a penalty to prevent selecting samples from being too close to those already observed. The GUESS acquisition function is defined as
(7)
In this study, the acquisition function used for the responses ( F = { f 1 , f 2 , f 3 } = MLL, MBL,BF) is the GUESS acquisition function [Eq. (7)] modified by multiplying it by the bridge fraction. The idea behind this modification is to direct the new queries toward regions with a high bridge fraction. In regions with a low BF, the dynamics are rare, meaning it would take a long time for chains to transition their configurations, and a higher level of uncertainty is tolerable there. The new query was then obtained in such a way that maximizes the averaged customized GUESS (CGUESS) acquisition as
(8)

4. Kriging believer

For practical reasons, to perform parallel simulations, a batch of new samples was calculated according to the KB concepts [38]. The KB criterion updates the meta-models by believing (hallucinating) that the actual response of the new query is its GP mean-value prediction. This process of updating the quasi-Kriging model and applying the acquisition function is repeated until the desired number of samples within a batch (i.e., 10) is collected.

5. Stopping criteria

The simulations were run for each new batch and fed into the database to update the meta-models until a stopping criterion is met. To assess model performance and define a stopping criterion, the leave-one-out cross-validation (LOOCV) error can be expressed as a percentage error, from which the generalized mean squared cross-validation percentage error (GMSPE) is derived. Additionally, a test set of 27 data points was generated using Maximin distance design concepts. The actual responses of the test points were compared to those predicted by GP models, with percentage errors calculated and expressed as root mean squared percentage error (RMSPE). Figure 3 represents the GMSPE and RMSPE for (a) the BF response, the primary response of interest and (b) the MBL and MLL. As can be seen, both error criteria converge after a significant decline, indicating saturation and minimal likelihood for further improvement with additional iterations.

FIG. 3.

GMSPE and RMSPE values for (a) BF, (b) MBL and MLL. The RMSPE values were calculated for a test set of 27 samples, comparing the GP-predicted responses to the actual values generated by the developed model. Symbols are the data points while lines are guides to the eye.

FIG. 3.

GMSPE and RMSPE values for (a) BF, (b) MBL and MLL. The RMSPE values were calculated for a test set of 27 samples, comparing the GP-predicted responses to the actual values generated by the developed model. Symbols are the data points while lines are guides to the eye.

Close modal

6. Interpretability

Conducting interpretability analysis is crucial for obtaining insight from developed machine-learning models. In this study, we applied SHAP interpretability analysis [25] to our developed meta-models to interpret the model predictions and explain how the system parameters (features) impact the model responses (predictions). The SHAP values are computed using a concept from cooperative game theory called Shapley values [39], which considers all possible feature combinations and determines the contribution of each feature to the model’s response. The SHAP analyses can assess feature importance models, depicting how features interact with one another and influence specific predictions. The interpretability analysis is discussed in further detail in Sec. III.

In this section, the SHAP analyses along with the results derived from the developed meta-models are discussed for the model responses, including BF, MLL, and MBL, using the final dataset of 120 data points.

Figure 4(a) illustrates the SHAP global feature importance for each system parameter or feature, where the absolute mean values are quantified for each individual system parameter. It is worth mentioning that the sum of all SHAP values for a given feature equals the difference between the actual prediction and the average predicted response. Therefore, larger SHAP values for a particular feature indicate a greater contribution to the deviation of the actual response from the average response, signifying its higher level of influence. The normalized gap ( H / N k b k) emerged as the primary driver for the BF response, as evidenced by its larger corresponding SHAP value. This is also evident in the SHAP summary plot in Fig. 4(b), where the normalized gap is positioned at the top, distributing more widely. A more widespread spread of a feature’s points leads to a higher mean SHAP value, indicating a more significant impact on the model’s responses. Conversely, a narrow spread contributes to points clustered abound zero, reducing its mean SHAP value and showing little influence on the model’s responses. The gray scale intensity encodes feature values, with darker shades depicting higher feature values. The horizontal axis reflects the impact of these values on the BF response. Notably, higher values of the normalized gap (represented by black markers at the top row) are associated with lower predicted BFs (negative SHAP values). Consequently, as particle separation increases, the likelihood of chain bridging diminishes expectedly due to the reduced contact opportunities. Similarly, particle radius ( R p / b k) influences the bridge formation as the second most important factor. Larger particles increase free space in the simulation box, leading to fewer bridges. This is further confirmed by the “effective cap fraction,” which measures the fraction of the surface area of the particle over which bridges are formed, as defined in our previous study [12]. Increasing particle radius at a constant gap and chain length reduces the effective cap fraction for bridge formation, leading simultaneously to increased availability of surfaces for loop formation. This phenomenon can also be observed in the Fig. 5, where the GP mean predicted bridge fraction is depicted against both H / N k b k and R p / b k for ε s = 8 k B T, N k = 10, and N c / N p = 10, which we take as “standard” values. The other “standard value” is R p / b k = 10, which values we assume in all plots except where otherwise noted. Time constants are always normalized by τ 0 = ζ b e a d b k 2 / k B T. Notably, BFs become negligible beyond a normalized gap of approximately 0.6, indicating a deficient bridge population and rare bridging dynamics. Under these conditions, most chains are expected to remain in loop configurations for highly extended periods, and any bridges that do form will break almost immediately, hence not contributing to any rheological effect of interest.

FIG. 4.

(a) The SHAP global bar plot shows the absolute mean values of SHAP scores for each system parameter or feature in the BF model, highlighting the overall importance order of features. Features with higher bar values have a greater impact on the BF predictions. (b) The SHAP summary plot for the BF model illustrates both the importance and direction of influence by showing the distribution of SHAP values for each feature across all training data points. A wider distribution of a feature’s SHAP values indicates a greater impact on the BFs, while a narrower distribution contributes to a weaker impact. By analyzing the color gradient that represents feature values, the direction of influence for each feature can be deduced. For example, positive SHAP values associated with higher feature values imply a positive correlation between that feature and the overall BF response.

FIG. 4.

(a) The SHAP global bar plot shows the absolute mean values of SHAP scores for each system parameter or feature in the BF model, highlighting the overall importance order of features. Features with higher bar values have a greater impact on the BF predictions. (b) The SHAP summary plot for the BF model illustrates both the importance and direction of influence by showing the distribution of SHAP values for each feature across all training data points. A wider distribution of a feature’s SHAP values indicates a greater impact on the BFs, while a narrower distribution contributes to a weaker impact. By analyzing the color gradient that represents feature values, the direction of influence for each feature can be deduced. For example, positive SHAP values associated with higher feature values imply a positive correlation between that feature and the overall BF response.

Close modal
FIG. 5.

The predicted BF vs H / N k b k and R p / b k for systems with standard values of ε s = 8 k B T, N k = 10, and N c / N p = 10. The BF decreases as the normalized gap and the radius of particles increase, resulting in enhancing a greater availability of surfaces for loop formation.

FIG. 5.

The predicted BF vs H / N k b k and R p / b k for systems with standard values of ε s = 8 k B T, N k = 10, and N c / N p = 10. The BF decreases as the normalized gap and the radius of particles increase, resulting in enhancing a greater availability of surfaces for loop formation.

Close modal

Figure 6(a) illustrates the relationship between the most important feature ( H / N k b k) and the BF responses, revealing how the feature’s impact varies across different number of Kuhn steps ( N k). The plot is shaded according to N k, with darker colors indicating higher N k (larger chains) values and lighter colors representing lower N k (shorter chains). Interestingly, as seen in Fig. 6(b), at normalized gaps below 0.2, longer chains correlate with increased BFs, whereas at normalized gaps greater than 0.2, longer chains exhibit lower BFs. As N k grows, the rate of BF decline against H / N k b k becomes steeper. It should be noted that the normalized gap is the ratio of the closest distance between two adjacent particles divided by the chain length. Thus, to further demonstrate the dependency of BF on N k, we calculated the available space for chains between particles. Defining the volume fraction ( ν) as the ratio of particle volume to total volume, the “available volume fraction” can be measured as 1 ν, accounting for both the normalized gap, H / N k b k, and R p. Figure 6(c) depicts the predicted BFs vs both available volume fraction and N k for systems of ε s = 8 k B T, R p / b k = 10, and N c / N p = 10. It is clear that larger chains are more likely to span larger gaps and, therefore, contribute to a higher BF at a constant 1 ν.

FIG. 6.

(a) The BF SHAP values for H / N k b k interacting with N k displays that, at lower normalized gaps, higher values of N k contribute to increased BF SHAP values for the normalized gap. However, this correlation is reversed at higher normalized gaps. (b) The predicted BF vs both H / N k b k and N k confirms the previously mentioned interdependency between the normalized gap and N k. (c) Additionally, because the normalized gap is sensitive to N k, the BF is also presented against the available volume fraction ( 1 ν) and N k . It shows that chains with higher N k contribute to increased BF values at a constant available volume fraction. In both (b) and (c), the standard values were taken of ε s = 8 k B T, R p / b k = 10, and N c / N p = 10.

FIG. 6.

(a) The BF SHAP values for H / N k b k interacting with N k displays that, at lower normalized gaps, higher values of N k contribute to increased BF SHAP values for the normalized gap. However, this correlation is reversed at higher normalized gaps. (b) The predicted BF vs both H / N k b k and N k confirms the previously mentioned interdependency between the normalized gap and N k. (c) Additionally, because the normalized gap is sensitive to N k, the BF is also presented against the available volume fraction ( 1 ν) and N k . It shows that chains with higher N k contribute to increased BF values at a constant available volume fraction. In both (b) and (c), the standard values were taken of ε s = 8 k B T, R p / b k = 10, and N c / N p = 10.

Close modal

Conversely, the chain-to-particle ratio ( N c / N p) and the attraction strength between stickers and colloidal particles ( ε s) exhibit minimal influence on BF model outcomes. Despite accounting for the excluded volume of beads, the relative abundance of chains and particles does not significantly affect BFs or other response metrics under equilibrium conditions. Thus, altering the chain population does not impact the proportion of chains forming bridges as shown in Fig. 7(a). Surprisingly, ε s also demonstrates negligible impact on BF when other system parameters remain constant over the range of ε s considered. Since ε s governs the attraction between stickers and particles, increasing this attraction, for example, appears to equally stabilize both loop and bridge configurations—loops tend to retain their structure and bridges maintain their connections—leaving the bridges fraction nearly unchanged [as shown in Fig. 7(b)]. This is not surprising, at least for relatively high values of ε s for which almost all stickers are bound to surfaces, which can be accomplished equally well through formation of either loops or bridges. It is the gap between particles that, therefore, controls the ratio of bridges to loops. However, it is evident that ε s significantly influences the time a sticker remains bound to a particle, and hence ε s strongly influences other system properties such as MLL and MBL, as will be discussed subsequently.

FIG. 7.

The predicted BF values vs (a) H / N k b k and N c / N p, and (b) H / N k b k and ε s is depicted, where other parameters take on their standard values. These figures confirm the findings in Fig. 4 that both N c / N p and ε s a minimal impact on the BF values.

FIG. 7.

The predicted BF values vs (a) H / N k b k and N c / N p, and (b) H / N k b k and ε s is depicted, where other parameters take on their standard values. These figures confirm the findings in Fig. 4 that both N c / N p and ε s a minimal impact on the BF values.

Close modal

The SHAP global summary plots [Figs. 8(a) and 8(b)] reveal that the normalized gap is the most influential system parameter affecting the MLL response, as evidenced by its highest value in the SHAP global bar plot [Fig. 8(a)] and its wide distribution at the top of the SHAP summary plot [Fig. 8(b)]. The dark-colored dots on the right side of the SHAP summary plot demonstrate a positive correlation between the normalized gap and MLL, with increased gap values leading to a higher MLL. This suggests that there is an energetic preference for chains to remain in loop configurations rather than forming or transitioning to bridges, leading to an increased time spent in the loop configuration and consequently larger MLL values. As shown in Fig. 9, this phenomenon can be enhanced when the attraction strength (the second most influential parameter) also increases. Under the conditions of a large gap and strong attraction between stickers and particles, loops are further encouraged to maintain their configurations, leading to even larger MLL values.

FIG. 8.

(a) The SHAP global bar plot illustrates the absolute mean SHAP scores for each feature in the MLL model, highlighting the overall importance of each feature. Features with higher bar values, positioned at the top, have a greater impact, while those with lower bar values and ranks indicate a lesser impact on the MLL predictions. (b) The SHAP summary plot for the MLL model combines the importance of features with their direction of effect by displaying the distribution of SHAP values for each feature across all training data points. The color gradient represents the feature values, and the horizontal axis shows the SHAP values, quantifying the impact of these values on the MLL response. Higher normalized gap values, for instance, correlate with higher predicted MLL values, as shown by the positive SHAP values and darker data points, i.e., a positive correlation between this feature and the MLL responses.

FIG. 8.

(a) The SHAP global bar plot illustrates the absolute mean SHAP scores for each feature in the MLL model, highlighting the overall importance of each feature. Features with higher bar values, positioned at the top, have a greater impact, while those with lower bar values and ranks indicate a lesser impact on the MLL predictions. (b) The SHAP summary plot for the MLL model combines the importance of features with their direction of effect by displaying the distribution of SHAP values for each feature across all training data points. The color gradient represents the feature values, and the horizontal axis shows the SHAP values, quantifying the impact of these values on the MLL response. Higher normalized gap values, for instance, correlate with higher predicted MLL values, as shown by the positive SHAP values and darker data points, i.e., a positive correlation between this feature and the MLL responses.

Close modal
FIG. 9.

The predicted MLL vs both H / N k b k and ε s. MLL values are normalized by τ 0 = ζ b e a d b k 2 / k B T, a characteristic frictional time. An increase in H / N k b k and ε s synergistically enhances MLL responses, meaning that the combined effect of normalized gap and the attraction energy drives greater MLL values.

FIG. 9.

The predicted MLL vs both H / N k b k and ε s. MLL values are normalized by τ 0 = ζ b e a d b k 2 / k B T, a characteristic frictional time. An increase in H / N k b k and ε s synergistically enhances MLL responses, meaning that the combined effect of normalized gap and the attraction energy drives greater MLL values.

Close modal

Figure 10(a) illustrates the SHAP dependence plot of the normalized gap on MLL, considering its interaction with the number of Kuhn segments ( N k). When particles are positioned relatively close to each other, larger chains [represented by darker dots in Fig. 10(a)] do not necessarily exhibit a marked increase in MLL. This observation is also evident in Fig. 10(b), where MLL values are less sensitive to N k at smaller normalized gaps than at higher gaps. This behavior is attributed to the increased likelihood of chains disrupting loops to form bridges at smaller gaps. Conversely, as the particle separation distance increases, the probability of transitioning from loop to bridge decreases, promoting more stable loops and consequently larger MLL values. This phenomenon is more pronounced for shorter chains.

FIG. 10.

(a) The MLL SHAP dependence plot for H / N k b k interacting with N k along with (b) the MLL values plotted against both H / N k b k and N k, indicate that the MLL increases with increasing normalized gaps. Additionally, at smaller normalized gaps, the increase in N k results in a relatively moderate rise in MLL.

FIG. 10.

(a) The MLL SHAP dependence plot for H / N k b k interacting with N k along with (b) the MLL values plotted against both H / N k b k and N k, indicate that the MLL increases with increasing normalized gaps. Additionally, at smaller normalized gaps, the increase in N k results in a relatively moderate rise in MLL.

Close modal

Figure 8 suggests that the radius of particles ( R p) also positively contributes to MLL responses, where increased R p leads to larger MLL values. As discussed previously, increasing R p (while keeping other parameters constant) decreases the likelihood of the chains forming bridges, resulting in a higher proportion of chains maintaining loop configurations for more extended periods. An alternative perspective involves considering the available volume between the particles. Increasing the interparticle volume influenced by both the normalized gap and R p is expected to correlate with larger MLL values. Figures 11(a) and (b) illustrate the relationship between predicted MLL values and available volume fractions for different R p and N k values, where clearly the higher available volume fractions correspond to longer MLL values. Interestingly, Fig. 11(b) suggests that at fixed available volume fractions shorter chains tend to exhibit longer MLL values due to the reduced probability of shorter loops traversing the same available space and transitioning to bridges compared to longer ones.

FIG. 11.

(a) The MLL vs available volume fraction and R p / b k indicates that increasing R p / b k while keeping available volume fractions constant results in higher MLL values. (b) The MLL vs available volume fraction and N k demonstrates that increasing N k at a constant available volume fraction leads to lower MLL values.

FIG. 11.

(a) The MLL vs available volume fraction and R p / b k indicates that increasing R p / b k while keeping available volume fractions constant results in higher MLL values. (b) The MLL vs available volume fraction and N k demonstrates that increasing N k at a constant available volume fraction leads to lower MLL values.

Close modal

While the attraction strength between stickers and colloidal particles ( ε s) exhibited minimal influence on the BF in Figs. 4(a), 4(b), and 7(b), it emerges as a dominant factor affecting MBL as evidenced by its top ranking in Fig. 12(a). Figure 12(b) further emphasizes the importance of ε s, with its positive contribution to MBL indicating that higher ε s values contribute to longer-lived bridges and consequently larger MBL. Because forming bridges requires stretching chains to cross the gap, maintaining bridges costs stretching free energy, for which ε s serves as a primary compensation force. After ε s, N k is the second most important factor influencing MBL. Longer chains can be stretched with less free energy cost and hence maintain their bridge configurations (once formed) longer than do shorter ones. As a consequence, higher or lower MBL values can be achieved by synergistically increasing or decreasing both ε s and N k, respectively. This insight is particularly useful for the design of coating fluid formulations.

FIG. 12.

(a) The SHAP global bar plot highlights the overall importance of each feature on the MBL model, by quantifying the absolute mean SHAP scores. It ranks the features from the one with the greatest impact on MBL predictions to the one with the least impact. (b) The SHAP summary plot for the MBL model illustrates the relative importance of each feature, along with their directional effects, by showing the distribution of SHAP values for every feature across the training dataset. The horizontal axis represents the SHAP values, which can be either positive or negative. The gradient color on the right side indicates the values of the features, while the features’ ranks of importance are shown on the left side. As observed, higher attraction strength between stickers and particles ( ε s) is associated with increased predicted MBL values, as indicated by the positive SHAP values and darker data points. This suggests a positive correlation between this specific feature and the MBL responses.

FIG. 12.

(a) The SHAP global bar plot highlights the overall importance of each feature on the MBL model, by quantifying the absolute mean SHAP scores. It ranks the features from the one with the greatest impact on MBL predictions to the one with the least impact. (b) The SHAP summary plot for the MBL model illustrates the relative importance of each feature, along with their directional effects, by showing the distribution of SHAP values for every feature across the training dataset. The horizontal axis represents the SHAP values, which can be either positive or negative. The gradient color on the right side indicates the values of the features, while the features’ ranks of importance are shown on the left side. As observed, higher attraction strength between stickers and particles ( ε s) is associated with increased predicted MBL values, as indicated by the positive SHAP values and darker data points. This suggests a positive correlation between this specific feature and the MBL responses.

Close modal

Figure 13(a) illustrates the SHAP values for the normalized gap ( H / N k b k) and its interaction with N k, where the MBL values initially increases with increasing the H / N k b k, reaching a peak, followed by a decrease upon further increase in the gap ratios. The lower MBL values at smaller gaps can be attributed to the more confined space between particles, which pushes the stickers outside and places them in less optimal positions to maintain bridges. As the normalized gap widens, the influence of the normalized gap leads to decreased MBL values due to the increase energy cost of sustaining bridge configurations at larger separation distances. Moreover, at smaller normalized gaps, larger chains tend to increase MBL values. However, this correlation appears to be reversed as the normalized gap increases (darker colored dots correspond to lower SHAP values). This implies that the MBL values for chains with higher N k decrease more sharply as the normalized gap grows. This behavior is further highlighted in Fig. 13(b), where the GP-predicted MBL is plotted against H / N k b k and N k with other parameters at their standard values, as always. Notably, the position of the peak on the x-axis shifts rightward as the chain length increases. This trend can be attributed to the polymer end-to-end distance, which measures the distance between the two stickers within a chain. The MBL peak typically occurs at a distance between particles that approximates the equilibrium chain end-to-end distance, which is approximately H / b k N k, representing the most favorable distance for sticker interaction under equilibrium as confirmed by the peak in MBL at H / b k N k around unity in Fig. 13(c) for different N k. This behavior provides valuable insight that can guide the development of design principles for formulating waterborne coating fluids.

FIG. 13.

(a) The MBL SHAP dependence plot for H / N k b k interacting with N k and (b) The MBL vs H / N k b k and N k show a rising trend in MBL as the normalized gap increases at smaller normalized gaps, while a declining trend is observed at larger gaps. Additionally, it is noted that MBL is highly sensitive to increases in N k at larger normalized gaps. (c) The MBL vs H / b k N k and N k indicates that the most favorable distance between particles for forming stable bridges under equilibrium is approximately equal to the polymer end-to-end distance for all chains with varying N k.

FIG. 13.

(a) The MBL SHAP dependence plot for H / N k b k interacting with N k and (b) The MBL vs H / N k b k and N k show a rising trend in MBL as the normalized gap increases at smaller normalized gaps, while a declining trend is observed at larger gaps. Additionally, it is noted that MBL is highly sensitive to increases in N k at larger normalized gaps. (c) The MBL vs H / b k N k and N k indicates that the most favorable distance between particles for forming stable bridges under equilibrium is approximately equal to the polymer end-to-end distance for all chains with varying N k.

Close modal

This study explored the association dynamics of telechelic polymers with colloidal latex particles, a pivotal factor in understanding and hence controlling the rheology of waterborne coating fluids. Recognizing the limitations of existing simulation techniques in capturing the full spectrum of relevant time and length scales, we developed a novel active learning-based approach to bridge the gap between high-resolution models and those with lower resolution. Initially, we developed BD simulations as precise yet computationally expensive models, resolving the fine-grained interaction of telechelic chains with colloidal particle surfaces. As a result, we could measure the system’s microscopic responses, such as the bridges and loops lifetimes and the BFs. These simulations allowed the generation of initial and sequential system responses for any points within the design space, utilizing an active learning approach. The active learning meta-models developed here are capable of predicting the lifespans of bridges and loops and the populations of bridges within the design space. This achievement is realized with a notably reduced number of samples required to map the system parameters to the corresponding responses from thousands to one hundred samples. Moreover, the interpretability of these meta-models enhances the model transparency and further facilitates refinement of formulations by focusing on the most influential design parameters.

Analyzing the meta-models provides insights into the interplay between system parameters in predicting MLLs and MBLs, and BFs. By adjusting parameters such as polymer length (represented in Kuhn steps), the number of chains per particle (polymer concentration in the formulation), particle spacing (which reflects the particle volume fraction in the solution), and the attraction strength between stickers and particles (number of the methylene in the hydrophobic end caps), experimentalists can tailor formulations to fine-tune the rheological behavior of waterborne coatings. The findings indicate that the normalized gap between particles plays a prominent role in bridge formation and loop lifetime. Conversely, the attraction strength between the stickers and particles has a minimal impact on the BF. Furthermore, this study identified a synergistic effect of chain length and attraction strength on promoting extended bridge lifetimes, resulting in longer relaxation times. This has been found experimentally [40,41] to increase the viscosity and stability of analogous systems, i.e., interconnected micelles by bridging chains. The developed meta-model effectively demonstrates that the optimal distance between particles maintaining bridges over an extended period corresponds to the end-to-end distance of the chains under equilibrium conditions, which refers to a more stable coating suspension in storage. In addition, the intelligent search strategy developed in this work can easily be adopted to accelerate and guide the design of laboratory experiments and is not limited to computational studies.

This study has laid a solid foundation for future research on the interaction between telechelic polymers and colloidal latex particles. Several avenues can be explored to extend and deepen our understanding of this complex systems. Integrating these meta-models with coarser-grained simulation methods, such as Pop-BD [12], could provide a more comprehensive understanding of the system at different lengths and time scales. The significant advantage of this research lies in its ability to formulate quantitative models suitable for integration into Pop-BD simulations. In addition to the gap-dependent BF, Pop-BD simulations require the loop-to-bridge transition rate (L) and the bridge-to-loop transition rate (M) that are equivalent to 1/MLL and 1/MBL, respectively. These Pop-BD input values are represented for the standard values (i.e., ε s = 8 k B T, N k = 10, R p / b k = 10, and N c / N p = 10) in Fig. 14. By providing accurate estimates of BF and these transition rates, our predictive meta-models allow the Pop-BD simulations to accurately capture the higher relaxation times of systems with larger colloidal particles, which is essential for bridging the microscale interactions to macroscale rheological behavior. With the developed meta-modes, our future research will focus on enhancing the predictive power of Pop-BD simulations informed by developed meta-models and gaining a more comprehensive understanding of the rheological properties at the macroscale.

FIG. 14.

Loop-to-bridge transition rate (L), bridge-to-loop transition rate (M), and BF for the standard values, predicted by developed active learning meta-models (i.e., ε s = 8 k B T, N k = 10, R p / b k = 10, and N c / N p = 10), which can be used in conducting Pop-BD simulations.

FIG. 14.

Loop-to-bridge transition rate (L), bridge-to-loop transition rate (M), and BF for the standard values, predicted by developed active learning meta-models (i.e., ε s = 8 k B T, N k = 10, R p / b k = 10, and N c / N p = 10), which can be used in conducting Pop-BD simulations.

Close modal

Future work should also focus on extending the meta-models to larger values of the particle size, chain length, and numbers of chains per particle, and sticker strength. This could be done by a combination of additional runs and using ML methods to extrapolate from smaller parameter values to larger, using more sparse sampling for the more expensive simulations to conserve computer power. Moreover, investigating the system’s behavior under shear conditions would provide valuable insights into the rheological properties of waterborne coatings. Refined gap-dependent analyses of these systems under shear forces are of great interest, as highlighted in [42]. These future studies ultimately pave the way for optimizing waterborne coating formulations to achieve the desired macrolevel rheological properties.

Consequently, this work represents a significant step forward in the multiscale modeling of waterborne coatings, offering a computationally efficient method to provide detailed microscopic-level information for use in macroscopic simulations. Offering a comprehensive understanding of how various factors such as polymer chain length, particle size, and attraction strength contribute to association dynamics under equilibrium conditions, these meta-models have minimized the need for extensive trial-and-error simulations and can accelerate the development of more efficient and environmentally friendly coatings.

This research was supported by The University of Melbourne’s Research Computing Services and the Petascale Campus Initiative. J.A. acknowledges the training facilities at the Australian Research Council (ARC) Industrial Transformation Training Centre in Optimisation Technologies, Integrated Methodologies, and Applications (OPTIMA). R.G.L. is grateful for the financial support of the National Science Foundation of the USA under Grant No. DMR-2100513. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.

The authors have no conflicts to disclose.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1. Statistical observation of MLL and MBL

The developed BD simulations update the positions of all beads and stickers in the system at each time step, enabling the study of their dynamics over time. A time step of 5 × 10 5 [ t ] was used, consistent with the literature [27]. As time progresses, the positions of the stickers are tracked at specific time intervals, known as dumped time steps. These time steps are typically set to be as low as possible to capture accurate particle positions. To manage the volume of data generated, a dumped time step of 10 was used, ensuring that it is significantly smaller than any bridge or loop lifetime, as calculated according to the methods described in Sec. II A. It is crucial to allow simulations to run for a sufficiently long duration to capture adequate statistics of bridge and loop lifetimes before calculating averages. Failing to do so or imposing a cap on the total run time can lead to inaccurate results, particularly for loop lifetimes, which can be significantly longer than the capped total run time. Given the random and independent nature of loop and bridge lifetimes, their observed distributions should closely resemble a Poisson distribution [43]. The histogram plots of the samples are presented in Fig. 15 for a system with H / N k b k = 0.2 N k = 10 of ε s = 8 k B T, R p / b k = 10, and N c / N p = 10. As observed, the frequency of occurrence of loop or bridge samples decreases exponentially as the duration of events increases.

FIG. 15.

(a) The distribution of (a) loop and (b) bridge lifetimes for a system with H / N k b k = 0.2 N k = 10 of ε s = 8 k B T, R p / b k = 10, and N c / N p = 10.

FIG. 15.

(a) The distribution of (a) loop and (b) bridge lifetimes for a system with H / N k b k = 0.2 N k = 10 of ε s = 8 k B T, R p / b k = 10, and N c / N p = 10.

Close modal

2. Gaussian process (GP) kernels analysis

For a GP regression model, the marginal likelihood p ( y X ) over the function f is the integral of the likelihood function over all possible functions that could have generated the data, weighted by the prior p ( f X ) as:
(A1)
where, y and X are the vector of observed data and the set of input data points, respectively. This marginal likelihood has a closed-form expression because both the prior and likelihood are Gaussian. The log marginal likelihood (LML) is then given by:
(A2)
here, K = K ( X , X ; θ ) is the covariance matrix of the observations and contains the prior information about the function, computed using the kernel function with hyperparameters θ. The choice of kernel function can influence how the model interprets the smoothness, periodicity, and other properties of the underlying function. Different kernels imply different assumptions about the data structure. The LML provides a quantitative measure for comparison of these assumptions. After fitting the model with optimized hyperparameters, the LML was computed for each kernel. The kernel with the highest LML is considered the most suitable among the other kernel candidates. Additionally, LOOCV can provide an unbiased estimate of the model’s generalization ability. In LOOCV, each data point is sequentially left out from the training set, the model is trained on the remaining points, and its performance is evaluated on the left-out point, as explain in Algorithm 1. A lower LOOCV error (here expressed as GMSPE) generally indicates a better-performing kernel for a model.
ALGORITHM 1.

Leave-One-Out Cross-Validation (LOOCV) for Gaussian Process Model.

Input: Dataset D = { ( x i , y i ) } i = 1 n 
  • Initialize an empty list of prediction errors, E = [ ].

  • For each data point i = 1 to n:

    • (a) Remove the i-th data point ( x i , y i ) from the dataset D.

    • (b) Train the Gaussian Process (GP) model using the remaining n 1 data points.

    • (c) Predict the output y ^ i for the left-out i-th data point using the trained GP model.

    • (d) Calculate the prediction error:
      (A3)
    • (e) Append e i to the list of prediction errors E.

  • End For

  • Calculate the Generalized Mean Squared Percentage Error (GMSPE):
    (A4)
 
Output: GMSPE 
Input: Dataset D = { ( x i , y i ) } i = 1 n 
  • Initialize an empty list of prediction errors, E = [ ].

  • For each data point i = 1 to n:

    • (a) Remove the i-th data point ( x i , y i ) from the dataset D.

    • (b) Train the Gaussian Process (GP) model using the remaining n 1 data points.

    • (c) Predict the output y ^ i for the left-out i-th data point using the trained GP model.

    • (d) Calculate the prediction error:
      (A3)
    • (e) Append e i to the list of prediction errors E.

  • End For

  • Calculate the Generalized Mean Squared Percentage Error (GMSPE):
    (A4)
 
Output: GMSPE 

We used the criteria mentioned above to assess the kernel performance and select the best kernel for generating new samples. After generating the initial samples, the LML values and GMSPE were calculated for different kernel combinations at each iteration. These results were used to compare and evaluate the kernels that have the maximum value for LML and the minimum value for GMSPE. Another criterion considered was the simplicity of the kernels or having fewer hyperparameters to avoid overfitting. Once a satisfactory level of accuracy was achieved (below 5 % in GMSPE computed for the BF responses), the final selected kernels were retained for the remaining iterations. Figure 16 represents the GMSPE and LML values for different combinations of kernels used to predict the three responses: BF, MLL, and MBL. The data are based on a population of 80 samples. According Fig. 16(a), ( M a t r n ( ) + D o t P r o d u c t ( ) 2 ) 2 was selected because it has a relatively low GMSPE value (although not the lowest), a high LML value, and the fewest number of hyper parameters. Similarly, ( M a t r n ( ) × R B F ( ) × D o t P r o d u c t ( ) ) and ( M a t r n ( ) + D o t P r o d u c t ( ) 2 ) 2 were selected to represent kernels for MLL and MBL, respectively.

FIG. 16.

The GMSPE and LML values for various kernel combinations used to predict the three responses—(a) BF, (b) MLL, and (c) MBL—are shown based on a population of 80 samples.

FIG. 16.

The GMSPE and LML values for various kernel combinations used to predict the three responses—(a) BF, (b) MLL, and (c) MBL—are shown based on a population of 80 samples.

Close modal
1.
Chen
,
H.
, and
A.
Alexander-Katz
, “
Structure and dynamics of blood-clotting-inspired polymer-colloid composites
,”
Soft Matter
9
,
10381
10390
(
2013
).
2.
Kreuter
,
J.
,
Colloidal Drug Delivery Systems
(
CRC
,
New York
,
2014
).
3.
Dickinson
,
E.
,
Food Polymers, Gels and Colloids
(
Elsevier
,
Sawston, UK
,
1991
), 82.
4.
Xu
,
L.
,
Y.
Zhou
, and
S.
Amin
, “
Polymer colloids for cosmetics and personal care
,” in
Polymer Colloids: Formation, Characterization and Applications
,
edited by R. Priestley and R. Prud'homme (The Royal Society of Chemistry, London, UK, 2019), Chap. 13, pp. 399–417
.
5.
Ginzburg
,
V. V.
,
T.
Chatterjee
,
A. I.
Nakatani
, and
A. K.
Van Dyk
, “
Oscillatory and steady shear rheology of model hydrophobically modified ethoxylated urethane-thickened waterborne paints
,”
Langmuir
34
,
10993
11002
(
2018
).
6.
Larson
,
R. G.
,
A. K.
Van Dyk
,
T.
Chatterjee
, and
V. V.
Ginzburg
, “
Associative thickeners for waterborne paints: Structure, characterization, rheology, and modeling
,”
Prog. Polym. Sci.
129
,
101546
(
2022
).
7.
Maechling-Strasser
,
C.
,
J.
François
,
F.
Clouet
, and
C.
Tripette
, “
Hydrophobically end-capped poly (ethylene oxide) urethanes: 1. Characterization and experimental study of their association in aqueous solution
,”
Polymer
33
,
627
636
(
1992
).
8.
Van Dyk
,
A. K.
,
T.
Chatterjee
,
V. V.
Ginzburg
, and
A. I.
Nakatani
, “
Shear-dependent interactions in hydrophobically modified ethylene oxide urethane (HEUR) based coatings: Mesoscale structure and viscosity
,”
Macromolecules
48
,
1866
1882
(
2015
).
9.
Green
,
M. S.
, and
A. V.
Tobolsky
, “
A new approach to the theory of relaxing polymeric media
,”
J. Chem. Phys.
14
,
80
92
(
1946
).
10.
Annable
,
T.
,
R.
Buscall
,
R.
Ettelaie
, and
D.
Whittlestone
, “
The rheology of solutions of associating polymers: Comparison of experimental behavior with transient network theory
,”
J. Rheol.
37
,
695
726
(
1993
).
11.
Wang
,
S.
, and
R. G.
Larson
, “
Multiple relaxation modes in suspensions of colloidal particles bridged by telechelic polymers
,”
J. Rheol.
62
,
477
490
(
2018
).
12.
Hajizadeh
,
E.
,
S.
Yu
,
S.
Wang
, and
R. G.
Larson
, “
A novel hybrid population balance—Brownian dynamics method for simulating the dynamics of polymer-bridged colloidal latex particle suspensions
,”
J. Rheol.
62
,
235
247
(
2018
).
13.
Tripathi
,
A.
,
K. C.
Tam
, and
G. H.
McKinley
, “
Rheology and dynamics of associative polymers in shear and extension: Theory and experiments
,”
Macromolecules
39
,
1981
1999
(
2006
).
14.
Travitz
,
A.
, Multiscale modeling of polymer-colloid interactions in waterborne coatings, Ph.D. thesis, University of Michigan (2021).
15.
Anderson
,
J. A.
,
J.
Glaser
, and
S. C.
Glotzer
, “
Hoomd-blue: A Python package for high-performance molecular dynamics and hard particle Monte Carlo simulations
,”
Comput. Mater. Sci.
173
,
109363
(
2020
).
16.
Rezvantalab
,
H.
, and
R. G.
Larson
, “
Bridging dynamics of telechelic polymers between solid surfaces
,”
Macromolecules
51
,
2125
2137
(
2018
).
17.
Travitz
,
A.
, and
R. G.
Larson
, “
Brownian dynamics simulations of telechelic polymers transitioning between hydrophobic surfaces
,”
Macromolecules
54
,
8612
8621
(
2021
).
18.
Zhang
,
W.
,
A.
Travitz
, and
R. G.
Larson
, “
Modeling intercolloidal interactions induced by adsorption of mobile telechelic polymers onto particle surfaces
,”
Macromolecules
52
,
5357
5365
(
2019
).
19.
Alber
,
M.
,
A.
Buganza Tepole
,
W. R.
Cannon
,
S.
De
,
S.
Dura-Bernal
,
K.
Garikipati
,
G.
Karniadakis
,
W. W.
Lytton
,
P.
Perdikaris
,
L.
Petzold
et al., “
Integrating machine learning and multiscale modeling—Perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences
,”
NPJ Digital Medicine
2
,
115
(
2019
).
20.
Weeratunge
,
H.
,
D.
Robe
,
A.
Menzel
,
A. W.
Phillips
,
M.
Kirley
,
K.
Smith-Miles
, and
E.
Hajizadeh
, “
Bayesian coarsening: Rapid tuning of polymer model parameters
,”
Rheol. Acta
62
,
477
490
(
2023
).
21.
Williams
,
C. K.
, and
C. E.
Rasmussen
,
Gaussian Processes for Machine Learning
(
MIT
,
Cambridge, MA
,
2006
), Vol. 2.
22.
Liu
,
H.
,
Y.-S.
Ong
, and
J.
Cai
, “
A survey of adaptive sampling for global metamodeling in support of simulation-based complex engineering design
,”
Struct. Multidiscip. Optim.
57
,
393
416
(
2018
).
23.
Jin
,
R.
,
W.
Chen
, and
A.
Sudjianto
, On sequential sampling for global metamodeling in engineering design, in International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (ASME, New York, 2002), Vol. 36223, pp. 539–548.
24.
Ribeiro
,
M. T.
,
S.
Singh
, and
C.
Guestrin
, “Why should i trust you?” explaining the predictions of any classifier, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2016), pp. 1135–1144.
25.
Lundberg
,
S. M.
, and
S.-I.
Lee
, “
A unified approach to interpreting model predictions
,” in
Proceedings of the Advances in Neural Information Processing Systems
,
Long Beach, CA
,
4–9 December 2017
(
2017
), pp. 4765–4774.
26.
Oviedo
,
F.
,
J. L.
Ferres
,
T.
Buonassisi
, and
K. T.
Butler
, “
Interpretable and explainable machine learning for materials science and chemistry
,”
Acc. Mater. Res.
3
,
597
607
(
2022
).
27.
Travitz
,
A.
, and
R. G.
Larson
, “
Brownian dynamics simulations of telechelic polymers transitioning between hydrophobic surfaces
,”
Macromolecules
54
,
8612
8621
(
2021
).
28.
Chatterjee
,
T.
,
A. I.
Nakatani
, and
A. K.
Van Dyk
, “
Shear-dependent interactions in hydrophobically modified ethylene oxide urethane (HEUR) based rheology modifier–latex suspensions: Part 1. Molecular microstructure
,”
Macromolecules
47
,
1155
1174
(
2014
).
29.
Chatterjee
,
T.
,
A. K.
Van Dyk
,
V. V.
Ginzburg
, and
A. I.
Nakatani
, “
Formulation-controlled positive and negative first normal stress differences in waterborne hydrophobically modified ethylene oxide urethane (HEUR)-latex suspensions
,”
ACS Macro Lett.
6
,
716
720
(
2017
).
30.
Devanand
,
K.
, and
J.
Selser
, “
Asymptotic behavior and long-range interactions in aqueous solutions of poly (ethylene oxide)
,”
Macromolecules
24
,
5943
5947
(
1991
).
31.
Snook
,
I.
,
The Langevin and Generalised Langevin Approach to the Dynamics of Atomic, Polymeric and Colloidal Systems
(
Elsevier
,
New York
,
2006
).
32.
Ginzburg
,
V. V.
,
A. K.
Van Dyk
,
T.
Chatterjee
,
A. I.
Nakatani
,
S.
Wang
, and
R. G.
Larson
, “
Modeling the adsorption of rheology modifiers onto latex particles using coarse-grained molecular dynamics (CG-MD) and self-consistent field theory (SCFT)
,”
Macromolecules
48
,
8045
8054
(
2015
).
33.
Johnson
,
M. E.
,
L. M.
Moore
, and
D.
Ylvisaker
, “
Minimax and maximin distance designs
,”
J. Statist. Plann. Inference
26
,
131
148
(
1990
).
34.
Loeppky
,
J. L.
,
J.
Sacks
, and
W. J.
Welch
, “
Choosing the sample size of a computer experiment: A practical guide
,”
Technometrics
51
,
366
376
(
2009
).
35.
Berns
,
F.
,
J.
Hüwel
, and
C.
Beecks
, “
Automated model inference for Gaussian processes: An overview of state-of-the-art methods and algorithms
,”
SN Computer Science
3
,
300
(
2022
).
36.
Duvenaud
,
D.
, Automatic model construction with Gaussian processes, Ph.D. thesis, University of Cambridge (
2014
).
37.
Lämmle
,
S.
,
C.
Bogoclu
,
K.
Cremanns
, and
D.
Roos
, “
Gradient and uncertainty enhanced sequential sampling for global fit
,”
Comput. Methods Appl. Mech. Eng.
415
,
116226
(
2023
).
38.
Ginsbourger
,
D.
,
R.
Le Riche
, and
L.
Carraro
, Kriging is well-suited to parallelize optimization, in Computational Intelligence in Expensive Optimization Problems (Springer, 2010), pp. 131–162.
39.
Shapley
,
L.
, “
A value for n-person games
,” in
Contributions to the Theory of Games, Volume II
, edited by H. Kuhn and A. Tucker (Princeton University Press, Princeton, NJ), pp. 307–318.
40.
Kujawa
,
P.
,
H.
Watanabe
,
F.
Tanaka
, and
F.
Winnik
, “
Amphiphilic telechelic poly (n-isopropylacrylamide) in water: From micelles to gels
,”
Eur. Phys. J. E
17
,
129
137
(
2005
).
41.
Meng
,
X.-X.
, and
W. B.
Russel
, “
Rheology of telechelic associative polymers in aqueous solutions
,”
J. Rheol.
50
,
189
205
(
2006
).
42.
Krishnamurthy
,
S.
,
G.
Parthasarathy
,
R. G.
Larson
, and
E.
Mani
, “
Brownian dynamics simulations of telechelic polymer–latex suspensions under steady shear
,”
Soft Matter
19
,
2949
2961
(
2023
).
43.
Katti
,
S.
, and
A. V.
Rao
, “
Handbook of the Poisson distribution
,”
Technometrics
10
(2), 412–412 (
1968
), see https://www.tandfonline.com/doi/abs/10.1080/00401706.1968.10490580.