The interactions of molecules and particles in solution often involve an interplay between isotropic and highly directional interactions that lead to a mutual coupling of phase separation and self-assembly. This situation arises, for example, in proteins interacting through hydrophobic and charged patch regions on their surface and in nanoparticles with grafted polymer chains, such as DNA. As a minimal model of complex fluids exhibiting this interaction coupling, we investigate spherical particles having an isotropic interaction and a constellation of five attractive patches on the particle’s surface. Monte Carlo simulations and mean-field calculations of the phase boundaries of this model depend strongly on the relative strength of the isotropic and patch potentials, where we surprisingly find that analytic mean-field predictions become increasingly accurate as the directional interactions become increasingly predominant. We quantitatively account for this effect by noting that the effective interaction range increases with increasing relative directional to isotropic interaction strength. We also identify thermodynamic transition lines associated with self-assembly, extract the entropy and energy of association, and characterize the resulting cluster properties obtained from simulations using percolation scaling theory and Flory-Stockmayer mean-field theory. We find that the fractal dimension and cluster size distribution are consistent with those of lattice animals, i.e., randomly branched polymers swollen by excluded volume interactions. We also identify a universal functional form for the average molecular weight and a nearly universal functional form for a scaling parameter characterizing the cluster size distribution. Since the formation of branched clusters at equilibrium is a common phenomenon in nature, we detail how our analysis can be used in experimental characterization of such associating fluids.

## I. INTRODUCTION

Many complex fluids are composed of highly anisotropic molecules in solution that can be described by a superposition of directional and isotropic intermolecular interactions. To describe these fluids, patch models, which represent molecules as spheres decorated by a constellation of “patches” that introduce directional interactions, provide an attractive minimal model that allows for the study of both liquid-liquid phase coexistence and self-assembly.^{1} Although such models are applicable to a wide variety of complex fluids, most current work using these models focuses on protein and colloidal solutions.

Patchy models have been used extensively to describe small globular proteins, such as lysozyme and *γ*-crystallin,^{2–7} since they gained attention in 1999 due to work done by Benedek and coworkers.^{8} The introduction of patches represented an advance over prior models that only considered isotropic interactions.^{9–12} Although the treatment of proteins as spherical particles is simplistic, as detailed by Sarangapani *et al.*,^{13} this approach is useful for analyzing scattering data^{12,14} and for describing the phase coexistence of proteins.^{2,4,6,7}

Recently, Dill *et al.*^{7} used a variant of a patch model where the proteins were treated as hard spheres, while the number and interaction strength of the patches were estimated using experimental liquid-liquid phase coexistence curves. In particular, they found that they could reproduce liquid-liquid phase coexistence curves for lysozyme and *γ* IIIa-crystallin in a phosphate buffer with pH strengths close to seven. However, they did not consider the presence of attractive isotropic interactions (in addition to those of the attractive patches), as Liu *et al.*^{4} had done previously. Liu *et al.* found that spheres with a short-range isotropic interaction and either 4, 5, or 7 attractive patches could also reproduce the liquid-liquid phase coexistence curves of lysozyme and *γ*-crystallin when normalized by both the critical temperature and density. Dill *et al.*,^{7} Liu *et al.*,^{4} and others^{2,6} have focused primarily on liquid-liquid phase separation rather than self-assembly, a process that can occur well above the critical temperature for phase coexistence. Experimentally, it is known that proteins form self-assembled clusters, but it is less clear whether the clusters form under equilibrium or non-equilibrium conditions^{15–21} and how their formation relates to phase coexistence.

Distinct from the case of proteins, patch models have been used extensively to study the self-assembly of synthesized anisotropic particles, in addition to their phase separation.^{22–28} These studies are fueled by advances in the synthesis of new particles that are anisotropic in shape or interactions, as well as the use of particles in applications including electronics and drug delivery.^{29–32} For example, one realization of patchy particles uses DNA to provide highly specific interactions,^{33–36,83} with recent advances in synthesis allowing for systematic design of patch symmetries and size.^{35,36} The interactions of such patchy particles can be further controlled by modifying the length, sequence, and number of DNA strands.^{32}

Although the patch models are simplistic, they provide a platform for quantifying the role of anisotropic interactions compared to isotropic interactions, an interplay that is clearly of importance in both protein and particle solutions, as well as in molecular fluids with highly directional interactions, e.g., water and alcohols. Using a lattice-based patch model, Frenkel *et al.*^{37} identified the critical temperature and critical density for a wide range of both isotropic and patch interaction strengths, as well as various locations and numbers of patches using both simulations and theory. However, they did not compute the liquid-liquid phase coexistence curves nor did they study self-assembly. Roberts and Blanco^{38} also studied the role of anisotropic interactions, but limited their study to the second osmotic virial coefficient.

The interplay of anisotropic and isotropic interactions has broad significance in the study of the coupling between phase separation and self-assembly. Dudowicz and coworkers^{39–41} have studied this problem in detail within the context of lattice-based linear polymerization models. They quantified the competition between phase separation and dynamic formation of polymers, a common type of self-assembly process. However, their theory was only developed for molecules having the equivalent of two spots so that only linear polymers are formed. If more patches are considered, the resulting molecules self-assemble into dynamic, branched polymeric clusters.^{42–44} This case has received relatively less attention in the literature where the coupling between phase separation and self-assembly is considered.^{42,45}

To quantify the effects of the relative isotropic to anisotropic interaction strength in the context of the practical problem of characterizing protein and patchy particle solutions and to fill a gap in the literature regarding the quantification of coupling between phase separation and self-assembly for multi-functional particles, we study a five spot patch model using both exact Monte Carlo simulations and a renormalized mean-field theory. We expect this generic model of multi-particle association to provide insight into the general pattern of phase separation and self-assembly in complex fluids; thus, we analyze the phase boundaries of these fluids and cluster formation properties including the self-assembly transition lines for cluster formation and percolation, energy and entropy of association, size distributions, and cluster shapes as a function of the relative interaction strength. Although our analysis of the phase boundaries follows the qualitative pattern examined for the two spot case,^{39–41} the extension to multi-functional association, coupled with both simulations and mean-field theories, leads to many new results in comparison to former work.^{22–25,39–41,45} Specifically, we analytically quantify the relative difference of the simulation and mean-field critical temperatures due to fluctuations absent in mean-field descriptions. Additionally, we identify a universal function for the average cluster size, inspired by mean-field theories, and explore the implications of our results on the interpretation of experimental data.

The paper is organized as follows. In Sec. II, we describe our model of patchy particles, simulation techniques, and the renormalized mean-field theory. In Sec. III, we present the liquid-liquid phase coexistence curves, and in Sec. IV, we present thermodynamic transition lines related to the formation of clusters for multiple different relative isotropic to directional interaction strengths. We also analyze the cluster size distribution using geometrical percolation theory and Flory-Stockmayer theory, and we quantify cluster shape and size using the radius of gyration tensor. Finally, we discuss the applicability of our results to experimentally realizable quantities in Sec. V, and in Sec. VI, we summarize the main findings of our work.

## II. METHODS

### A. Patchy particle model

Our patchy particles consist of spheres with diameter *σ* that are decorated with five completely penetrable, smaller spheres, or patches, of diameter *δσ*. The smaller spheres are arranged in a dipyramid shape with the center of the smaller spheres located at the edge of the larger sphere (see Fig. 1). The large spheres interact with an isotropic square well potential,

where *r*_{i} is the distance between the centers of the large spheres. The small spheres located on different particles interact through a purely attractive square well potential,

where *r*_{p} is the distance between the centers of the small spheres. We chose $\delta =(5\u221223\u22121)/2\u22480.119$, which is the largest size where geometric constraints dictate that patchy “bonds” only occur between two, rather than three or more particles; this choice allows for comparison with theory. We choose the isotropic range *λ* = 1.15 consistent with the model of Liu *et al.*^{4} Since the interactions are attractive and short-ranged, the isotropic term can be thought of qualitatively as a van der Waals interaction.

### B. Monte Carlo simulations

For the calculation of the liquid-liquid phase coexistence, we applied the transition matrix Monte Carlo (TMMC) method, as described in Ref. 47. TMMC was performed in the grand canonical ensemble where the number of particles *N* varies during the course of the simulation. For efficiency, each individual simulation considers only a range of *N*, specifically, from *N*_{min} to *N*_{max}. Due to the computational expense of simulations sampling large *N*, the range of particles is chosen to decrease with increasing *N*_{min}. In particular, *N*_{min} = *N*_{0}*n*^{2/3} where *N*_{0} is a prefactor, *n* is the simulation number, and *N*_{max} is chosen such that there is an overlap of four values of *N*, i.e., *N*_{max} = *N*_{0}(*n* + 1)^{2/3} + 4. For example, taking *N*_{0} = 164, simulation numbers 0, 1, and 2 have [*N*_{min}, *N*_{max}] ranges of [0, 168], [164, 264], and [260, 345], respectively. Each simulation was run across 12 cores with the particle range determined using a similar method. Specifically, for odd processor number *p*, the range is [*N*_{min} + *c*(*p* − 1)^{2/3}, *N*_{min} + *c*(*p* + 1)^{2/3} − 1], and for even processor number *p*, the range is [*N*_{min} + (*c*/2)((*p* − 2)^{2/3} + *p*^{2/3}), *N*_{min} + (*c*/2)(*p*^{2/3} + (*p* + 2)^{2/3}) − 1] where *c* = 0.202(*N*_{max} − *N*_{min}). Such a choice allows for a large overlap in order to facilitate equilibration through swaps in particle range between processors.

All simulations were initialized without any particles in order to ensure random initial configurations and were only limited to their specified ranges once *N*_{min} was achieved. At each Monte Carlo step, one of four types of moves was attempted: single particle insertions (10% probability), single particle deletions (10% probability), single particle rotations (40% probability), or single particle displacements (40% probability). Every 10^{8} Monte Carlo moves, simulations on different cores were allowed to swap their *N*_{min} and *N*_{max} if their *N* ranges overlapped. The target maximum displacement and maximum rotation angle were updated with a target acceptance rate of 50%. Finally, the probability distributions of sampling *N* particles were updated using data across all 12 cores. Simulations continued until each *N* was visited 25 000 or more times from a different *N*. Normally, each *N* was visited from a different *N* close to 10^{7} times for *N* less than 150 and close to 10^{6} for almost all *N* up to 900, with the exception of the patchy limit *ϵ*_{i} = 0. The probability distributions from each simulation were then stitched together by matching the probability distributions at the largest *N* for the simulation with fewer particles, and demanding normalization. Finally, reweighting of the chemical potential was used to find the condition where the areas under both the low density and high density curves were equal, which was then used to define the average density in both the dilute and rich phases.

The corresponding critical point for the liquid-liquid phase separation was estimated using fits to the structure factor in the one phase region. Details of this technique can be found in the supplementary material.^{48} Our results agree with standard scaling expressions for critical properties of theories that incorporate fluctuation effects by altering the critical exponents from their mean field theory values.^{49}

For the one phase region, canonical Monte Carlo simulations were run with a 50% probability of single particle displacement and a 50% probability of single particle rotation at each Monte Carlo step. For densities *ρ* ≡ *N*/*V* < 0.8 *σ*^{−3}, the initial configuration was generated via a grand canonical simulation until the desired density was reached. For *ρ* ≥ 0.8 *σ*^{−3}, this procedure was prohibitively computationally expensive to perform for each simulation, so it was performed once to generate a random initial condition that was then used for all temperatures and interaction strengths considered. The maximum displacement distance and maximum rotation angle were chosen such that moves were accepted roughly 50% of the time. After the maximum displacement and maximum angles were determined, the simulations were run for 5 × 10^{9} Monte Carlo steps; the first 5 × 10^{8} Monte Carlo steps were discarded in order to ensure the equilibration of the system. For calculations of the heat capacity, simulations were run for 5 × 10^{10} Monte Carlo steps, and the first 5 × 10^{9} Monte Carlo steps were discarded in analysis. The heat capacity was computed for each temperature and density using fluctuations of the potential energy (e.g., (〈*U*^{2}〉 − 〈*U*〉^{2})/(*k*_{B}*T*^{2})); uncertainty was determined using a standard error analysis.^{50} The maximum heat capacity was estimated by fitting a quadratic function to the heat capacity data in the vicinity of the maximum.

All simulations were run using a cubic box size of edge length 10 *σ* with periodic boundary conditions such that the volume *V* = 1000 *σ*^{3}. The metrics for clustering only weakly depend on box size at this size (see comparison to a smaller box size in final section in the supplementary material^{48}). For consistency, the same box size was used for the phase separation.

### C. Renormalized mean-field theory

To describe our system using theory, we use a statistical associating fluid theory for variable range potentials (SAFT-VR)^{51} and subsequently apply a renormalization technique. In SAFT-VR, an approximate theory, the Helmholtz free energy normalized by the number of particles is given by

where the subscripts id, i, and p correspond to the ideal, isotropic square well, and patchy contributions, respectively. The ideal part of the free energy is given by

where *β* ≡ 1/(*k*_{B}*T*), *k*_{B} is Boltzmann’s constant, *T* is temperature, *ρ* is the number density, and *λ*_{T} is de Broglie’s wavelength.

The isotropic contribution to the free energy, treated using an inverse temperature expansion,^{52,53} is described by a sum of three contributions

where *βf*_{hs} is the hard sphere contribution and *βf*_{sw,1} and *β*^{2}*f*_{sw,2} are the first and second perturbation terms. From Carnahan and Starling,^{54} the hard sphere contribution is approximated as

where the packing fraction *ϕ* = *ρσ*^{3}*π*/6. Following Gil-Villegas *et al.*,^{51} the two square well contributions are

and

where the effective volume fraction is *ϕ*_{eff} = 0.859 413*ϕ* − 0.153 391*ϕ*^{2} − 0.121 318*ϕ*^{3} for the isotropic range *λ* = 1.15.

The patch contribution to the free energy, estimated using Wertheim’s thermodynamic perturbation theory^{55–57} is

where *M*_{p} is the number of patches and *X* is the fraction of patches that are non-bonded,

The patch interaction strength is

Here, *r*_{12} is the distance between two particles, *g*_{sw} is the reference pair correlation function, *f*(12) = *e*^{−βup} − 1 is the Mayer *f* function,^{58} and 〈*f*(12)〉_{ω1,ω2} is the Mayer *f* function averaged over all orientations,

with *S*(*r*) = (*δσ* + *σ* − *r*)^{2}(2*δσ* − *σ* + *r*)/(6*σ*^{2}*r*).^{24} Note that the Mayer *f* function only includes the patch interactions, since the isotropic interactions are contained in the reference pair correlation function, *g*_{sw}(*r*_{12}), which is approximated by its value at contact, since the range of interaction is short. In turn, the contact value, *g*_{sw}(*σ*), is approximated by^{51}

Here, the first term represents the hard sphere radial distribution function and the following two terms together yield the $O(\beta )$ term in the free energy expansion. Note that only an *n* − 1 order expansion is required for structural quantities such as *g*(*r*) in order to be consistent with *n* order expansion for thermodynamic quantities such as *f*.^{51} Using the above approximations, the patchy interaction strength simplifies to the form

where the bonding volume *V*_{b} = *πδ*^{4}*σ*^{3}(4*δ* + 15)/30.

However, it is evident that this theory involves a number of approximations and thus fails to recover the correct theta temperature *T*_{Θ}, or Boyle temperature, defined as the temperature at which the second osmotic virial coefficient *B*_{2} vanishes, i.e., *B*_{2}(*T*_{Θ}) = 0. This defect in the analytic theory can be corrected by redefining the isotropic interaction strength so that the theory exactly recovers *T*_{Θ}, a basic measure of mean interparticle interaction. We refer to the resulting analytic model as the renormalized mean-field theory (RMFT). Such a renormalization also yields an improved estimate of *B*_{2} across the full range of temperatures (see Sec. 1 in the supplementary material^{48}) and should minimize errors due to the inverse temperature expansion meaning that the main effect of deviations between RMFT and simulation will be due to the mean-field approximation rather than other approximations. The need for a theory whose deviations are primarily due to only the mean-field approximation will become apparent in Sec. III.

The first step in the renormalization procedure involves determining *B*_{2} for the exact model and the approximate theory. In particular, *B*_{2} can be exactly computed for our model using the Mayer cluster formalism^{59}

where **r**_{12} is the distance between the center of mass of the two particles, **Ω**_{j} is the orientation of particle *j*, *u* is the total potential energy including both isotropic and patch contributions. Evaluating the above quantity yields

with $B2hs$ representing the hard sphere virial. For the approximate liquid state theory described above, *B*_{2} can be computed by expanding the compressibility factor in density and taking the coefficient of the first order term. Such a calculation results in the relation

The second step in the renormalization procedure involves determining the renormalized *ϵ*_{i}, denoted as $\u03f5ire$ for *ϵ*_{i}≠0. For clarity, we switch to energy and temperature scales relative to the patch energy *ϵ*_{p}. Next, we compute the exact, analytic value of *T*_{Θ} for a given *ϵ*_{i}. We then determine a renormalized value of $\u03f5ire$, by requiring $T\Theta MFT(\u03f5ire)=T\Theta (\u03f5i)$ exactly. Then $\u03f5ire$ is used in place of *ϵ*_{i} within the theory to yield the RMFT. This procedure ensures that the mean-field theory produces the correct theta temperature for our molecular model. The dependence of $\u03f5ire$ on *ϵ*_{i} is shown in Sec. 1 of the supplementary material.^{48}

Using the above theory, the critical point is determined by simultaneously requiring ∂^{3}(*ρf*)/∂*ρ*^{3} = 0 and ∂^{2}(*ρf*)/∂*ρ*^{2} = 0. The phase coexistence is determined by minimizing the total free energy density of the system, i.e., *ρ*_{T}*f*_{T} = (*ρ*_{T} − *ρ*_{2})/(*ρ*_{1} − *ρ*_{2}) *ρ*_{1}*f*(*ρ*_{1}) + (*ρ*_{T} − *ρ*_{1})(*ρ*_{2} − *ρ*_{1}) *ρ*_{2}*f*(*ρ*_{2}) with respect to *ρ*_{1} and *ρ*_{2} subject to 0 ≤ *ρ*_{1} ≤ *ρ*_{T} and *ρ*_{T} ≤ *ρ*_{2} ≤ 6/(*πσ*^{3}). *ρ*_{T} is the initial concentration and always chosen to be the critical density. This procedure is a Gibbs ensemble formulation^{60} and is equivalent to requiring that the chemical potentials and pressures are equal in both phases.

For conciseness, we define energy and temperature scales relative to the patch energy strength *ϵ*_{p}, and lengths relative to the hard sphere diameter *σ* for the remainder of the paper.

## III. PHASE SEPARATION

Prior to investigating self-assembly, we determine the location of the liquid-liquid phase coexistence curves as a function of the isotropic interaction strength *ϵ*_{i}, where *ϵ*_{i} is in reduced units and thus represents the relative isotropic to directional interaction strength. In Fig. 2, we show the phase boundaries obtained by Monte Carlo simulation and RMFT. In both cases, the critical density *ρ*_{c} and temperature *T*_{c} shift to smaller values with decreasing *ϵ*_{i}, although the shift is more pronounced for the analytic theory. A prior study^{23} on the numbers of patches in the *ϵ*_{i} = 0 limit found that the critical density decreases with decreasing number of patches and becomes zero in the two spot case.^{23} In this sense, making the number of patches large qualitatively corresponds to an isotropic potential, so the trend for *ρ*_{c} and *T*_{c} with increasing *ϵ*_{i} is consistent with the earlier work. Additionally, the non-zero critical density for patch numbers greater than two is fundamentally different than the two spot case where only linear chains can form. Previous work has attributed this shift to a constant non-zero *ρ*_{c} in the *ϵ*_{i} = 0 limit to the presence of cooperative interactions due to competitive equilibria.^{61}

It is apparent from Fig. 2 that the RMFT becomes an increasingly accurate description of the simulation data for small values of *ϵ*_{i}. In order to explore this further, we plot the critical temperatures for both simulation *T*_{c,sim} and RMFT *T*_{c,RMFT} along with the theta temperature *T*_{Θ} in Fig. 3(a). As mentioned in Sec. II, the *T*_{Θ} corresponds to the temperature at which the second osmotic virial coefficient *B*_{2} = 0. Examples of *B*_{2} as a function of temperature for different values of *ϵ*_{i} can be found in Fig. 3(b). Due to the renormalization technique employed in the RMFT, the *T*_{Θ} for both the simulation and RMFT is equal, by definition (see Eq. (16)). Due to critical fluctuation effects, which are absent in RMFT (as well as all analytic theories of phase separation in three dimensions), we would expect deviations between *T*_{c,RMFT} and *T*_{c,sim}. However, these deviations, surprisingly, almost vanish as the purely patchy limit is approached (i.e., small *ϵ*_{i}). This striking effect, that fluctuation effects are weak in the patchy limit, is apparent in former simulations but has not been explained previously.^{23,62} In the patchy limit, *ϵ*_{i} = 0, the ratio of *T*_{Θ} to *T*_{c} approaches a constant that is greater than 1, the limit for long permanent homopolymers.

In order to quantify the critical fluctuation effects as a function of *ϵ*_{i}, we plot the ratio of *T*_{c,sim} to *T*_{c,RMFT} (Fig. 3(c)). In the limit that *ϵ*_{i} → ∞, we expect this ratio to be less than 1 and comparable to the corresponding estimate for the Ising model in three dimensions with a nearest neighbor interactions, *T*_{c,sim}/*T*_{c,RMFT} = 0.752.^{63,64} This ratio is nearly independent of the lattice^{64} suggesting its applicability for off-lattice fluids. Further, an expansion of *T*_{c,sim}/*T*_{c,RMFT} in terms of the lattice coordination number *q* yields^{65–67}

We can translate Eq. (18) into a corresponding result for an off-lattice fluid by calculating the *B*_{2} of a lattice fluid, as well as that of a square well fluid in the continuum. Direct correspondence implies that *q* in the lattice model corresponds to the dimensionless interaction range variable *λ* of the square well potential, i.e., *q* ∝ *λ*^{3} − 1.^{67–69} Thus, we have

where we take *a*_{0} to be a constant that exactly recovers the nearest-neighbor Ising result in the limit *ϵ*_{i} → ∞. For *λ* = 1.15, this condition implies *a*_{0} = 0.129 by consistency. Since *λ*^{3} − 1 can be taken as the prefactor to (*e*^{βϵi} − 1) in *B*_{2} (Eq. (16)), *B*_{2} can also be used to determine an effective range parameter $\lambda \u0304$ for any *ϵ*_{i} by following the same principle,

We also need the temperature, which we chose to be *T*_{c,RMFT}, in order to fully specify $\lambda \u03043\u22121$. Combining this information with the value of *a*_{0} from our continuum potential model and Eq. (19) with the replacement of *λ* by $\lambda \u0304$ allows for a prediction of *T*_{c,sim}/*T*_{c,RMFT} using only system parameters and RMFT. Fig. 3(c) shows the resulting prediction as a dotted line. There seems to be a constant shift of ≈0.03 between our theoretical estimates and measured ratios, but this discrepancy is likely due to the inherent approximations of the RMFT. Given these approximations, we view the similarity of the results as quite encouraging. Additionally, the analysis indicates that RMFT works better at smaller values of *ϵ*_{i} because the effective coordination number of the intermolecular interaction increases as *ϵ*_{i} decreases, an effect that is naturally associated with large clustering near the critical point. We emphasize that this prediction requires no knowledge from simulation. Thus, it can be used to estimate the phase boundaries with fluctuations based on only RMFT. Of course, separate arguments will have to be considered to estimate the correct critical density with fluctuation effects included.

## IV. SELF-ASSEMBLY

### A. Self-assembly transition lines

In addition to phase separation, due to their patchy nature, our particles form dynamic clusters upon cooling, where clusters are uniquely defined through associative patchy interactions. In our case, clusters can be defined without the introduction of a cut-off distance, since the patchy potential is prescribed by a square well interaction. Fig. 4 illustrates examples of different clusters obtained from simulation. It is clear that the clusters resemble highly branched polymers. They also contain multiple branch points and loops, and form and disintegrate in dynamic equilibrium. Prior to quantifying the cluster distributions and sizes of these clusters, we first consider two metrics that can be used to define polymerization transition lines governing the self-assembly, as opposed to the liquid-liquid phase separation boundaries. As this process of self-assembly does not involve discontinuities in any of the derivatives of the free energy, these polymerization transition lines highlight the progress of self-assembly, rather than a phase transition proper. Nonetheless, it has been shown that the polymerization transition lines for linear polymers can be described as a line of “rounded,” thermodynamic transitions^{70,71} that can be mathematically described by an interacting spin model with an applied magnetic field controlling the degree of “rounding” or “cooperativity.”^{72} We expect that a similar situation is true for our branched polymeric clusters.

The first metric for describing the emergence of self-assembly is the extent of particle cluster formation Φ, also referred to as the extent of polymerization. Φ is defined as the average fraction of particles that are in clusters, as opposed to being in a monomeric state, which is given by 1 − Φ. This quantity represents an order parameter for the self-assembly.^{39–41} Simulation results for Φ are plotted as points in Fig. 5(a) for *ϵ*_{i} = 0.1, various temperatures, and various densities. When either the temperature is lowered or the density is increased, Φ, and thus the number of particles in clusters, increases. Since the predictions for Φ as a function of *T* from RMFT do not exactly overlap with the simulation data, we use the functional form from RMFT to obtain accurate estimates for Φ at intermediate temperatures. Specifically,

where *X* is the probability that a patch is not bonded and is given analytically by Eq. (10); thus 1 − *X*^{5} is the probability that there is at least one bonded patch. The exponent 5 signifies the number of patches per particle. Combining the relation between Φ and *X* (Eq. (21)), the relation between *X* and Δ (Eq. (10)), and the expression for Δ (Eq. (14)), yields the functional form for RMFT,

with the parameters *a*_{Φ} and *b*_{Φ} defined exactly within RMFT given both *ϵ*_{i} and *ρ*. Since the theory does not match the simulations exactly, we let *a*_{Φ} and *b*_{Φ} become fitting parameters. Note that *b*_{Φ} is exactly 0 for the case of *ϵ*_{i} = 0, and thus it is not used as a fitting parameter in this limit. As can be seen from Fig. 5(a), this procedure provides an excellent description of the simulation data.

In order to characterize the assembly process, we also identify the thermodynamic conditions at which percolating or system spanning clusters become significant in our simulations. Following standard arguments of simple geometrical percolation theory,^{73} we consider the probability that one or more clusters of associated particles percolate across the simulation box at any given time during the course of the simulation as our second metric for self-assembly. Note that unlike simple geometrical percolation theory, clusters are defined through patch interactions rather than proximity. The resulting percolation probability is plotted as points in Fig. 5(b) for *ϵ*_{i} = 0.1. The data at low temperatures are cut-off due to intersection with the phase coexistence curves. The lines are fits assuming that the distribution follows 1/2(1 − erf[(*T* − Δ*T*)/*w*]) where Δ*T* and *w* are fitting parameters that are dependent on the density.^{73} As simulation box size increases, these curves in Fig. 5(b) should approach Heaviside step functions in the thermodynamic limit; however, finite size effects cause a rounding of this transition.^{73} Fortunately, the temperature at which the probability of percolation, *p*_{perc}, equals 1/2 denoted by *T*_{pperc=1/2} is only slightly sensitive to the simulation box size (see final section in the supplementary material^{48}), and thus, *T*_{pperc=1/2} can be used with minimal concern regarding finite size effects. Interestingly, when *p*_{perc} = 1/2, Φ varies only slightly, ranging from 0.83 to 0.89 suggesting that Φ alone may serve as a rough criterion for describing the state of self-assembly. This point will be explored below.

These metrics are then used to define polymerization transition lines, which in turn define the continuous process of particles associating into polymeric structures. For a given density and *ϵ*_{i}, the temperatures at which Φ = 1/2, ∂^{2}Φ/∂*T*^{2} = 0 (inflection point) and *p*_{perc} = 1/2 define polymerization transition lines. These transition lines are denoted as *T*_{Φ=1/2}, *T*_{Φ,infl}, and *T*_{pperc=1/2}, respectively. Fits to the simulation data, such as that in Fig. 5(a), are used to identify the transition lines based on Φ, while linear interpolation is used for the transition lines based on *p*_{perc}. These transition lines are represented by points in Fig. 6 for various values of *ϵ*_{i}. For comparison, estimates of the phase boundaries from simulation (see Fig. 2) are shown as dashed lines. Theoretical Φ based transition lines are computed with RMFT, since Φ = 1 − *X*^{5} with *X* given by Eq. (10). Interestingly, there is a crossover point in these transition lines for different values of *ϵ*_{i} at densities around 0.75. In RMFT, Φ is calculated directly from *X*, and the only dependence on *ϵ*_{i} occurs in the reference radial distribution function. At the observed crossover point in RMFT where Φ does not depend on *ϵ*_{i}, the square well contributions to the reference radial distribution function perfectly cancel such that the radial distribution function is equal to the hard sphere radial distribution function (see Eq. (13) noting that the first term is the hard sphere contribution).

The maximum in the heat capacity is also used as a metric for self-assembly and is directly determinable via experiments. Thus, this metric is shown in Fig. 6 for both simulations and RMFT. For the simulations, it is difficult to determine precise values of the maximum heat capacity, and based on our fits, we estimate the uncertainty to be roughly twice the symbol size. Note that the RMFT for the maximum heat capacity deviates from the simulation data much more than in the case of Φ based transition lines for reasons that are not clear.

Together, the four polymerization transition lines displayed in Fig. 6 provide a characterization of the self-assembly present in our system. Specifically, at very high temperatures all of the particles are in a monomeric state. Then, as the temperature lowers, they start to form dynamic, branched, polymeric clusters with exactly half of the particles in clusters on average at *T*_{Φ=1/2}. As the temperature continues to lower, the clustering speeds up until *T*_{Φ,infl} is reached. Eventually, system spanning clusters form at *T*_{pperc=1/2}, which occurs at even lower temperatures. Finally, a maximum in the heat capacity is observed at significantly lower temperatures assuming that phase separation did not occur first. Compared to the two spot model, the difference between *T*_{Φ=1/2} and *T*_{Φ,infl} is much greater,^{24} which could be predicted *a priori* given the RMFT.

Perhaps the largest effect of *ϵ*_{i} is its influence on the relative locations of the polymerization transition lines compared to the phase separation curves. When the patch strength is much stronger than the isotropic strength, the region in which clustering occurs is significantly larger and extends to lower densities. This trend holds regardless of the temperature reference used (see Sec. 3 in the supplementary material^{48} for details).

### B. Entropy and energy

In addition to determining the location of relevant transition lines, we determine entropy and energy of association using the RMFT and values for Φ, and thus *X*, from simulation. Specifically, a rearrangement of Eq. (10) yields a chemical equilibrium form of the formation of bonds between two particles, which can then be used to identify an equilibrium constant *K _{b}* as in Ref. 24,

In turn, *K _{b}* can be used to extract both the energy Δ

*U*and entropy Δ

*S*via the Helmholtz free energy Δ

*F*for the formation of the bonds between two particles,

Using the same functional form as the RMFT (see Eq. (22)), the functional form of the equilibrium constant is

Here, *a*_{Φ} and *b*_{Φ} are parameters that are uniquely determined within the RMFT by *ρ* and *ϵ*_{i}. Assuming that *e*^{1/T} ≫ 1, which should hold for temperatures that are low enough to observe self-assembly, *e*^{1/T} − 1 can be approximated by *e*^{1/T} allowing for the determination of both the energy and the entropy via thermodynamic definitions. The limit where *a*_{Φ} ≫ *b*_{Φ}*ϵ*_{i}/*T* permits the further approximations

Although this further approximation is not yet justified, its value will become apparent. By combining Eqs. (23), (24), (26), and (27), we find

The above expression can be simplified by recasting ln*a*_{Φ} + ln*ρ* in terms of *T*_{Φ=1/2} (by plugging both *T* = *T*_{Φ=1/2} and *X* = 1 − Φ^{1/5} = 1 − (1/2)^{1/5} into Eq. (28) and solving for ln*a*_{Φ} + ln*ρ*). Plugging this result back into Eq. (28) yields the simple form

This functional form allows for the identification of both the energy and the entropy through the use of an Arrhenius plot (see Fig. 7). In particular, the entropy of association becomes

where *T*_{Φ=1/2} is density dependent and the $ln21/5(21/5\u22121)$ term must be modified for a different choice of *M*_{p}. The points are simulation results determined using *M*_{p}*ρ*Δ = (1 − *X*)/*X*^{2} and the line is generated using Eq. (29). Due to the quality of the fit, we can confirm the consistency of the approximations and that a general graph of *M*_{p}*ρ*Δ versus 1/*T* can be used to extract the energy through the negative of the slope and the entropy through the intercept. The implications of this scaling for the interpretation of experimental data are discussed in Sec. V.

### C. Quantifying cluster size and shape

Having identified the entropy and energy of association and thermally reversible polymerization transition lines, in addition to the phase boundaries, we now focus on the cluster distributions, as well as the cluster sizes and shapes under different thermodynamic conditions from our simulations. In Fig. 8, we show the distribution *p*(*M*) of clusters of size *M* for a wide range of temperatures, densities, and *ϵ*_{i} for state points above the percolation transition line. In order to characterize these distributions, we make use of two different theories. The first is a scaling framework that assumes the applicability of geometrical percolation theory to our dynamically associating particles;^{73} the second is Flory-Stockmayer^{74,75} mean-field theory.

For non-percolating systems, the probability distribution for observing clusters of size *M* within geometrical percolation theory is given by

where *μ* represents a metric of size and *τ* represents the power law associated with the distribution. In Eq. (31), Li_{τ} is the polylogarithm function and results from normalization so that the sum of *p*(*M*) over all values of *M* equals 1. Treating *τ* and *μ* as free parameters, the fits using geometrical percolation theory are shown as solid lines in Fig. 8. Within geometrical percolation theory, the probability distribution function formally applies up to the percolation transition, at which point *μ* diverges and the probability distribution becomes a power law. However, this approach requires two fitting parameters and the variation of the fitting parameters is not known *a priori*, since geometrical percolation theory is based on a different type of percolation than observed in our system. Thus, we also consider the applicability of the mean-field theory of Flory and Stockmayer.^{74,75}

Within Flory-Stockmayer theory, the probability distribution for observing clusters of size *M* is given by

This equation assumes that no loops are formed, an assumption that clearly becomes invalid at even moderate cluster sizes (loops can be clearly seen in Fig. 4) and that the percolation has not yet been reached. The later assumption is quantified within the theory by requiring *X* be larger than its value at percolation, 3/4. Since *X* is directly related to Φ (see Eq. (21)), the Flory-Stockmayer cluster distributions can be generated from knowledge of Φ from the simulations and thus requires no fitting parameters. These predictions are shown in Fig. 8 as dashed lines. Despite the fact that the assumptions for Flory-Stockmayer do not apply for small *M*, the predictions are in good agreement with the simulations, suggesting that Flory-Stockmayer theory may be used to gain further insight into the system regardless of its approximate nature. The highly attractive feature of Flory-Stockmayer theory is that it can roughly reproduce the correct distribution with minimal information. Specifically, since there is only one parameter, this parameter can be determined through knowledge of only the average cluster size 〈*M*〉 (see Eq. (33)) meaning that a good estimate of *p*(*M*) can be determined from a single experimental measurement.

We also explore the shape of the branched, polymeric clusters from our simulations. We do not include system spanning clusters. As can be seen in Fig. 9, the radius of gyration *R*_{g} scales as *M* to a power near *ν* ≈ 1/2 for a wide range of conditions, including different temperatures, densities, and *ϵ*_{i}. This means that the fractal dimension of the non-percolating clusters *d _{f}* = 1/

*ν*≈ 2, which is the known value for lattice animals,

^{76}but distinct from geometrical percolation clusters where the fractal dimension is near 2.5.

^{73}Lattice animals are in the same universality class as branched polymers swollen by excluded volume interactions, while geometrical percolation clusters are in the same universality class as branched polymers with screened excluded volume interactions or at their theta point.

^{77}This suggests that our clusters are in the swollen, branched polymer universality class. For comparison, the mass scaling exponent for the mean-field Flory-Stockmayer theory is 1/4, reflecting the mean-field nature of this model and the large upper critical dimension of 8, above which excluded volume interactions can be neglected.

^{73}Evidently, configurational fluctuations lead to large deviations from mean-field predictions regarding polymer size.

To further quantify the shapes of the clusters, we compute the ratios of the average values of the eigenvalues of the radius of gyration tensor for the same clusters, shown in Fig. 9. We label the eigenvalues, *λ _{i}* such that

*λ*

_{1}≤

*λ*

_{2}≤

*λ*

_{3}. Note that by definition $Rg2=\lambda 1+\lambda 2+\lambda 3$ or, equivalently, $Rg,i2=\lambda i$. The ratios are plotted in Fig. 10. For perfectly isotropic clusters both ratios,

*λ*

_{3}/

*λ*

_{1}=

*λ*

_{2}/

*λ*

_{1}= 1. Small clusters are highly anisotropic, but this is not surprising given that two of the eigenvalues are zero for a dimer. As the cluster size increases, the ratios asymptote to a result near the expected value for lattice animals, rather than geometrical percolation clusters.

^{78}This finding is consistent with our analysis of the fractal dimension. However, for clusters with very large masses, it is reasonable to expect that the excluded volume interactions would be screened, implying that the scaling exponent should approach that of geometrical percolation clusters due to screening of excluded volume interactions. Finite size limitations do not permit a definite conclusion regarding such a trend.

### D. Universal descriptors of cluster size

Having quantified cluster size distributions, as well as the cluster sizes and shapes from simulation, we investigate the average cluster size 〈*M*〉, which we plot as a function of density for various temperatures in Fig. 11(a). We observe larger average cluster sizes both at lower temperatures and higher densities. Such trends follow intuition and qualitatively accord with experimental results for the average size of lysozyme clusters.^{20} If we plot 〈*M*〉 as a function of the order parameter Φ instead of density, 〈*M*〉 for all densities, temperatures, and *ϵ*_{i} above the percolation transition line roughly follow a master curve as can be seen in Fig. 11(b). Using Flory-Stockmayer theory, this relationship equals^{79}

where *X* is determined by Φ = 1 − *X*^{5}. This relation is expected to hold for *X* > 3/4 or equivalently, Φ ≤ 0.763. As can be seen from Fig. 11(b), Flory-Stockmayer theory agrees nearly perfectly with our simulation results within the range in which it is applicable. A linear plot can also be generated by plotting 1/〈*M*〉 versus *X* as shown in Sec. 4 in the supplementary material.^{48} For larger values of Φ, the relation breaks down, due to both the formation of percolating clusters and the presence of loops.

We now explore if we can obtain a universal description of the *T* dependence of 〈*M*〉. Inspired by the Arrhenius description, which clearly links 〈*M*〉, and thus *X*, to *T*, we plot 〈*M*〉 as a function of 1/*T* − 1/*T*_{Φ=1/2} in Fig. 12 yielding a master curve. This master curve follows the expected function, shown as a black line. The function is determined by solving Eq. (29) for *X* and then plugging the result in Eq. (33). Small deviations at larger values of 1/*T* are due to the breakdown of the Flory-Stockmayer relation (Eq. (33)).

Similar to the universal 1/*T* − 1/*T*_{Φ=1/2} dependence of 〈*M*〉, we consider if the characteristic cluster size parameter *μ* from percolation theory (see Eq. (31)) also has a master curve (the inset to Fig. 12). Here the data reduction is not a true master curve, as there are small but systematic deviations for different interaction strengths. To provide an approximate analytic form for the *T* dependence of *μ* and 〈*M*〉 that would be valuable for describing experimental data, we again turn to the Flory-Stockmayer theory, since, despite its deficiencies, it is able to provide physical insight into the cluster size distribution and cluster size average. Specifically, in the large *M* limit Flory-Stockmayer theory reduces to the closed form,

Comparing the above equation with the probability distribution expected from geometrical percolation theory (see Eq. (31)), one finds the Flory-Stockmayer result corresponds to *τ* = 5/2 and

Although Flory-Stockmayer predicts *τ* = 5/2, our fits indicate that it ranges from roughly 1.7 to 2 for *T* > *T*_{Φ=1/2}, which is consistent with the expected range for lattice animals^{78} and consistent with branched dynamic immobile particle clusters observed in simulations of glass-forming polymer liquids.^{82} This exponent range for *τ* deviates from 2.18, the expected value from geometric percolation theory describing randomly placed non-interacting particles.^{73} Again we have evidence that our dynamic clusters are not percolation clusters.

In our quantification of the relationship between *μ* and temperature, we consider only the functional form for *μ* suggested by Flory-Stockmayer theory. Accordingly, we let ln(27/256) become a fitting parameter *y* and find that a fit of −ln(*X*^{3}(1 − *X*)) versus 1/*μ* yields *y* = − 2.146 ± 0.003, as compared to the Flory-Stockmayer prediction of −2.249. The results of this fit, which relate *μ* to *X* can then be combined with the relationship between *X* and 1/*T* − 1/*T*_{Φ=1/2} to yield the black line in the inset of Fig. 12. This fit is due to only one fitting parameter (i.e., *y*). For comparison, we plot the Flory-Stockmayer relationship without the fitting parameter *y* as a dashed line. Clearly, the fit yields a significantly improved prediction. However, this fit will breakdown completely for values of *μ* that are much larger (Φ ≤ 0.76), since Flory-Stockmayer theory breaks down before the percolation transition line is reached. Nonetheless, within the range plotted in the inset of Fig. 12, the fit is rather impressive.

These comparisons of Flory-Stockmayer theory with geometrical percolation theory can provide insight into how our system relates to geometrical percolation. Overall, the combination of Flory-Stockmayer theory and geometrical percolation theory allows for an understanding of master-like curves for two different metrics of cluster size (〈*M*〉 and *μ*). Additionally, we can conclude that the properties of the clusters at temperatures above the percolation transition line are independent of *ϵ*_{i} and can be quantified. Thus, the main effect of *ϵ*_{i} is to control the value of Φ for a given temperature and density, which then specifies all the cluster properties.

## V. THEORETICAL FRAMEWORK FOR CHARACTERIZING SELF-ASSEMBLING SYSTEMS

In Sec. IV, we introduced a simple form of the average cluster size 〈*M*〉 and expressions for extracting the energy and entropy of association that are applicable to particles exhibiting multi-functional association. We now show how this framework can be adapted to allow for the determination of the energy and entropy from experimental measurements. Specifically, an Arrhenius plot such as that in Fig. 7 can be now generated using the experimental molecular weight. The relationship between 〈*M*〉 (number average molecular weight) and *X* and between *X* and *M*_{p}*ρ*Δ in principle allows for the generation of an Arrhenius plot. However, in practice, it is typically the weight averaged molecular weight *M _{w}* that is measured experimentally. In particular, static light scattering coupled with the assumption of a fractal dimension of 2 (required for consistency with our results) can be used to extract

*M*in a dynamically associating system.

_{w}^{80,81}However, an Arrhenius plot, such as that in Fig. 7 can still be generated by linking

*M*to

_{w}*X*, which is straightforward within Flory-Stockmayer theory. For five spots, the

*y*-axis is defined by the relation

which, when plotted as a function of 1/*T*, should be linear. However, in general the number of spots may not be five. Thus, the above expression can be generalized and the linearity of such a plot can be used to identify the effective number of spots in the system,

Once the effective number of spots is determined, the energetic and entropic parameters can be extracted. Specifically, the slope of the line in the Arrhenius plot yields the energy of association, while the intercept yields the entropy of association divided by *k*_{B}. Note that the discussion in Sec. IV B uses simulation, rather than experimental units. The entropy can also be used to extract the transition temperature *T*_{Φ=1/2} using a generalization of Eq. (30) to an arbitrary number of spots. A full discussion of the extension of the above framework to an arbitrary number of spots will be the subject of a future paper. Thus, using only average molecular weight data for various temperatures and initial concentrations, the effective number of spots, the energy of association, the entropy of association as a function of density, the transition temperature *T*_{Φ=1/2} as a function of density, and the rough cluster size distribution (assuming Flory-Stockmayer theory) can all be determined.

The above analysis hinges on the definition of clusters defined solely on the basis of the patch interactions rather than the isotropic interaction. However, our choice is reasonable because the value of the structure factor at small wavevectors, which is related to the molecular weight, increases as the temperature drops below the transition lines. Such a change in the structure factor is not observed if transition lines are defined using isotropic rather than patch interactions, confirming the validity of our choice.

Additionally, relevant physical quantities can in principle be extracted from the determination of the second osmotic virial coefficient *B*_{2} as a function of temperature. Specifically, the patch size *δ*, isotropic well depth *ϵ*_{i}, and range of the isotropic interaction *λ* can be determined. However, the experimental system must be at conditions, such as high salt concentration, where the dominant isotropic interactions are short ranged and where only unassociated proteins or particles are present.

## VI. CONCLUSIONS

Using both simulations and a RMFT, we identify phase coexistence curves of five patch particles with an additional isotropic interaction for a wide range of relative interaction strengths. Although RMFT overpredicts the critical temperature for larger values of isotropic interaction strength *ϵ*_{i}, we can predict this overestimate using only information from the RMFT and the purely isotropic case. Specifically, we find that the effective coordination number is larger for larger values of *ϵ*_{i}, which explains why the mean-field model, RMFT, performs better in this limit. The prediction also allows for better estimates of phase boundaries using only RMFT. Additionally, we use three different metrics, the extent of clustering, percolation probability, and heat capacity, to define polymerization transition lines that delineate the phase diagram into characteristic regions. We also find that the largest effect of the isotropic interactions was on the relative location of the phase separation boundaries relative to the clustering transition lines, with the largest regions of clustering occurring for the smallest values of *ϵ*_{i}.

We provide a method for extracting the energy and entropy of association, and we analyze the cluster size distributions, sizes, and shapes for different densities, temperatures, and interaction strengths. Cluster size distributions and related quantities are explored in the context of two different theories, Flory-Stockmayer and geometrical percolation theory, both of which yield different information. Using the radius of gyration tensor, we determine that the clusters are like lattice animals—they have a fractal dimension of two and are anisotropic with roughly the expected ratios of average eigenvalues for the radius of gyration tensor. Since lattice animals are in the same class as swollen, randomly branched polymers, the clusters formed can be thought of as equilibrium, swollen, branched polymers. Finally, we identify a master curve for average cluster size and a master-like curve for the cluster size parameter from geometrical percolation theory. By combining knowledge from RMFT, Flory-Stockmayer theory, and geometrical percolation theory, we quantify the curves using no fitting parameters for the average cluster size and a single fitting parameter for the cluster size parameter from geometrical percolation theory. Consequently, cluster shape, size, and distributions within the clustering regions are controlled primarily by the extent of clustering, Φ, rather than the temperature, density, or interaction strength directly.

We expect that our results will provide insight into clustering phenomena in general both in the protein and colloidal contexts, since we provide frameworks in which to quantify observed clustering and, thus, make predictions about the system as discussed in detail in Sec. V.

## Acknowledgments

D.J.A. thanks the NRC postdoctoral fellowship program for their generous support, and F.W.S. acknowledges support from NIST Award No. 70NANB15H282.

## REFERENCES

Here we define the self-assembly as a thermoreversible transition without a phase transition.