We extend Wertheim’s thermodynamic perturbation theory to derive the association free energy of a multicomponent mixture for which double bonds can form between any two pairs of the molecules’ arbitrary number of bonding sites. This generalization reduces in limiting cases to prior theories that restrict double bonding to at most one pair of sites per molecule. We apply the new theory to an associating mixture of colloidal particles (“colloids”) and flexible chain molecules (“linkers”). The linkers have two functional end groups, each of which may bond to one of several sites on the colloids. Due to their flexibility, a significant fraction of linkers can “loop” with both ends bonding to sites on the same colloid instead of bridging sites on different colloids. We use the theory to show that the fraction of linkers in loops depends sensitively on the linker end-to-end distance relative to the colloid bonding-site distance, which suggests strategies for mitigating the loop formation that may otherwise hinder linker-mediated colloidal assembly.

## I. INTRODUCTION

It has been over 35 years since Wertheim developed a thermodynamic perturbation theory (TPT) for fluids of molecules that associate through strong directional attractions.^{1–4} Wertheim’s TPT has proven to be powerful and broadly applicable, capturing the thermodynamic behavior of fluids comprising, e.g., small molecules with chemically reactive sites, associating macromolecules, such as DNA nanostars,^{5,6} and patchy colloidal particles (“colloids”).^{7–10} Indeed, Wertheim’s TPT is a foundation for statistical associating fluid theory (SAFT),^{11–14} which has been widely used to develop equations of state for complex fluid mixtures.^{15}

In practice, many applications of Wertheim’s TPT, including SAFT, adopt a first-order approximation that treats association as pairwise bonding into tree-like networks and is often referred to as TPT1.^{2,4} We previously used TPT1 to predict the phase behavior of mixtures of colloids and difunctional chain molecules (“linkers”) in which linker ends associate with sites on the colloids through strong reversible bonds (Fig. 1).^{16} Homogeneous colloid–linker mixtures of this type lose thermodynamic stability for a range of colloid or linker concentrations,^{16,17} phase separating into colloid-rich and colloid-lean phases and potentially gelling.^{18,19} TPT1 qualitatively predicts this transition, but we have found that it underestimates the amount of linker required for phase separation.^{16,19} We hypothesized that this underestimation was due, in part, to TPT1’s neglect of other prevalent bonding motifs such as linkers that “loop” so that both ends associate to the same colloid [Fig. 1(a)], i.e., form a double bond between a linker and a colloid.

TPT1 has been extended to include double bonds for restricted numbers of components, bonding sites, and/or pairs of sites that can participate in the bonds.^{21–28} The most general result to date is the work of Marshall,^{29} which includes multiple components with multiple bonding sites but permits only one pair of sites per component to double bond. This restriction is well justified when the geometry of the sites limits potential double bonding pairs, e.g., direct association of colloids with small numbers of attractive patches. However, it can break down for common linking schemes for self-assembly. For example, nanoparticles are often coated with flexible ligands that are functionalized to bond with a complementary functional group on another nanoparticle^{30–35} or a linking molecule.^{19,36–38} Control over the specific number of functionalized ligands on each nanoparticle is infeasible or produces low yields,^{39} and due to their flexibility and surface coverage, multiple ligands on the same nanoparticle might readily double bond with another nanoparticle or with a linker.^{19} Given the success of TPT1 for other associating fluids, we aim at developing the theory to handle such cases.

Building on previous TPTs, we derive a general expression for the association free energy of a mixture when a double bond can form between any two pairs of sites on any of the components; this expression reduces to TPT1^{14} and Marshall’s result^{29} as special cases. We apply the theory to the colloid–linker mixture we previously studied, where the linker readily double bonded with nearly any pair of sites on the colloids.^{16} We compute the fraction of linkers forming double-bond loops. In the strong association limit, this fraction is strongly dependent on the distribution of linker end-to-end distances. Modifications to linker length or flexibility that make the linker end-to-end distance incompatible with the distance between colloid bonding sites inhibit loop formation; this consideration may prove helpful in the design of molecular linkers for self-assembly. We also show that the theory readily incorporates other prevalent bonding motifs with two-site association, such as “rings” made from multiple linkages between colloids [Fig. 1(b)], and that incorporating these motifs improves its accuracy.

The rest of this article is organized as follows. In Sec. II, we review the fundamentals of Wertheim’s theory, which we hope will serve as a pedagogical reference complementary to and expanding on excellent recent reviews.^{40,41} In Sec. III, we derive the new theory for double bonding between multiple components with multiple sites. We apply the theory to the colloid–linker mixture in Sec. IV before summarizing in Sec. V.

## II. WERTHEIM’S THEORY

We begin by outlining Wertheim’s theory for associating fluids,^{1–4} which is based on a graphical expansion in the grand canonical ensemble using the densities of various bonding states. Our aim is to provide a useful entry point for those seeking to understand and build upon this work, as well as for those who wish to similarly apply or extend Wertheim’s theory. We will follow Wertheim’s derivation for molecules with multiple association sites^{3} with small modifications to include multiple components; any subscripts or superscripts denoting the component type in this article should be neglected when comparing to the notation from his work.

### A. Mixture model

Consider a mixture of *k* components that are able to associate with each other through short-ranged attractions. We will adopt a physical model for the mixture that refers to the constituents of a component as “molecules” that are covered with bonding “sites,” but here, a molecule should be interpreted liberally to include not only small molecules but also macromolecules and colloids. A molecule of component *i* has a set of sites $\Gamma i={A1,\u2026,Ani}$, which we will assume for now are rigidly fixed within the molecule so that its conformation can be specified by its position **r** and orientation **Ω**. The total potential energy of the mixture *U* is taken to be the sum of a one-body potential $u1(i)$ acting on each molecule of component *i* and a pairwise interaction $u2(ij)$ between two molecules of components *i* and *j*. The pairwise interaction is split into a repulsive core $uR(ij)$ and a short-ranged attraction $uAB(ij)$ between sites,

where **1** = (**r**_{1}, **Ω**_{1}) is a shorthand notation for the position and orientation of molecule 1. The site attractions are functions of **1** and **2** through the positions of sites *A* and *B*, $rA(i)(1)$ and $rB(j)(2)$. The summation is taken over all pairs of sites in the two molecules, but some sites may be noninteracting.

In the grand canonical ensemble, the temperature *T*, volume *V*, and the chemical potentials ${\mu i}$ of all components are held constant, and the numbers of molecules {*N*_{i}} fluctuate. The grand partition function is

where **N**_{i} indicates all the positions and orientations of the *N*_{i} molecules of component *i*, Λ_{i} accounts for integrals over translational and rotational momenta (i.e., is related to the thermal wavelength), *N*_{i}! is a factor for permutating labels of the molecules, and *β* = 1/(*k*_{B}*T*), with *k*_{B} being Boltzmann’s constant. The one-body contributions $u1(i)$ to the potential energy *U* can be grouped with the chemical potential *μ*_{i} to define a modified fugacity

so that for our model,

The product of *z*_{i} is taken over all molecules of component *i* (indexed **1**), and the sum of $u2(ij)$ is taken over all unique pairs of molecules (indexed **1** and **2**).

### B. Graphical expansion

We now define $e(ij)(1,2)=exp[\u2212\beta u2(ij)(1,2)]$ as the *e*-function (Boltzmann weight) associated with an interaction between two molecules of components *i* and *j*, respectively, and *f*^{(ij)}(**1**, **2**) = *e*^{(ij)}(**1**, **2**) − 1 as the corresponding Mayer *f*-function. The integrand in Eq. (4) can be written as a product of *f*-functions using

The integrals in Eq. (4) can then be expanded into multiple integrals over different products of *f*-functions, and it is useful to express these integrals as graphs (Fig. 2).^{42} Each integral is represented by a “*z*-graph” whose nodes, called field points, are the integration variables and are each assigned a factor of the fugacity *z*_{i}. Two field points in a graph may be connected by at most one *f*-bond whose presence is represented by an edge between the points that carries a weight *f*^{(ij)}. There is an infinite number of *z*-graphs in Ξ of which only a subset are connected graphs, i.e., all points have joined to each other by *f*-bonds. It can be shown that ln Ξ is the sum of only these connected graphs, and the mixture thermodynamics can be computed using the grand potential *F* = −*β*^{−1} ln Ξ. However, using only *f*-bonds to represent association can give a poorly convergent summation for ln Ξ, which motivated Wertheim to further decompose the graphs.^{1}

For the model interactions defined by Eq. (1), an *f*-bond is separable into repulsive and attractive parts,

where the subscripts denote the part of the potential. Graphically, each point in a *z*-graph is replaced by a hyperpoint encompassing the molecule’s sites, and the *f*-bonds are replaced by repulsive bonds between hyperpoints and attractive bonds between sites (Fig. 2). Each hyperpoint still carries a factor *z*_{i}, and repulsive *f*_{R}-bonds are edges between two hyperpoints. Attractive *f*_{AB}-bonds are edges between sites *A* and *B* in different hyperpoints, which have in parallel a repulsive *e*_{R}-bond between the hyperpoints.

Two hyperpoints are “directly connected” if they have a repulsive *f*_{R}-bond or any attractive *f*_{AB}-bond between their sites. Furthermore, two sites are “bond-connected” if there is a path of attractive bonds between them, while two sites are “constraint-connected” if they are in the same hyperpoint. Now, consider the set of *z*-graphs where all hyperpoints are connected by networks of attractive bonds; Wertheim called such a graph having *s* hyperpoints an “*s*-mer” analogous to the association of monomers into a polymer. The connected *z*-graphs in ln Ξ can all be expressed as *s*-mers with *f*_{R}-bonds between hyperpoints in distinct *s*-mers, including the *s* = 1-mer or single hyperpoint without any attractive bonds.

Within an *s*-mer graph, there can be multiple networks of bond-connected sites; these networks must be constraint-connected. Consider two hyperpoints **1** and **2** in an *s*-mer that are not directly connected by an attractive bond. For such a hyperpoint pair, Wertheim proposed that if any site *A* on **1** is bond-connected to any site *B* on **2**, two *s*-mer graphs with and without an *f*_{R}-bond between **1** and **2** can be usefully combined into one graph with an *e*_{R}-bond between **1** and **2**. Hence, we will form *e*_{R}-bonds between all pairs of hyperpoints in an *s*-mer with bond-connected sites, but pairs of hyperpoints in an *s*-mer that are connected only through a constraint connection will not have an *e*_{R}-bond. By this procedure, all hyperpoints in a bond-connected network become irreducibly connected.^{43}

Define *ρ*^{(i)}(**1**) to be the sum of *z*-graphs obtained by turning a field hyperpoint of component *i*, which is an integration variable, into a “root” hyperpoint labeled **1**, which is not an integration variable. This is equivalent to the functional differentiation,

so *ρ*^{(i)}(**1**) is the average “singlet” density of component *i* at **1** in the grand canonical ensemble. The singlet density is related to the familiar number density *ρ*_{i}(**r**_{1}) by integration over orientations,

The various *z*-graphs in the sum *ρ*^{(i)}(**1**) can be further analyzed in terms of their bonding at the root hyperpoint. A *z*-graph is assigned to the sum $\rho \alpha (i)(1)$ if *α* is the set of bonded sites at **1** (*α* ⊆ Γ_{i}),^{44} and as a result,

Now, let $c\u2205(i)(1)$ be the sum of the subset of graphs in $\rho \u2205(i)(1)/zi(1)$ for which **1** is not an articulation point. It can be shown that^{41}

Furthermore, let $c\alpha (i)(1)$ be the sum of the subset of graphs in $\rho \alpha (i)(1)/\rho \u2205(i)(1)$ for which **1** is not a “constraint” articulation point, i.e., the removal of the hyperpoint and any incident repulsive bonds at **1** leaves the graph connected. $\rho \alpha (i)(1)$ includes all graphs bonded at the sites *α*, which can be formed by composing at the root point various $ca(i)(1)$ for *a* ⊆ *α*, such that their union gives *α*, along with the graphs that are not bonded,

The sum is over all set partitions of *α*, denoted as *P*(*α*),^{45} and the product is over all sets in each partition.

### C. Topological reduction

As currently defined, $c\alpha (i)$ is a sum of both reducible and irreducible *z*-graphs, and it is computationally desirable to eliminate the reducible ones. To do so, Wertheim considered the procedure of adding at each field point of an irreducible graph either nothing or another graph with one root point. By the *e*_{R}-bond replacement procedure for an *s*-mer, the root point of the added graph must have bonding complementary to the bonding at the field point of the irreducible graph, so the added graphs can be any $\rho \gamma (i)$ for *γ* ⊆ Γ_{i}$\$*α*.^{46} This naturally leads to the definition of the auxiliary singlet densities,

which are the sums of all graphs whose bonded sites at root point **1** are a subset of *α*. For example, $\sigma A(i)$ contains all graphs bonded at *A* (and not bonded elsewhere) or having no bonded sites.^{47} Two special cases are $\sigma \u2205(i)=\rho \u2205(i)$, the sum of graphs not bonded at any site, and $\sigma \Gamma i(i)=\rho (i)$, including all possible combinations of sites. The bonded singlet densities $\rho \alpha (i)$ can be expressed in terms of $\sigma \alpha (i)$,^{48}

with details in Appendix A.

Each field point of an irreducible graph that has a set of bonded sites *α* is now reassigned a factor $\sigma \Gamma i\\alpha (i)$, which are precisely the graphs *not* bonded at *α*, and $c\alpha (i)$ can be expressed as a sum of only irreducible graphs. We define *c* to be the sum of all irreducible graphs consisting of *s*-mer graphs and *f*_{R}-bonds between *s*-mers where all hyperpoints are field points and carry these factors. (Wertheim called this sum *c*^{(0)}, but we will use *c* to avoid confusion with the type superscripts.) All $c\alpha (i)$ can be obtained from *c* by turning a field point bonded at *α* into a root point labeled **1** and removing its factor $\sigma \Gamma i\\alpha (i)$. This procedure is given by the functional differentiation,

$c\alpha (i)$ can also be expressed as functions of $\sigma \alpha (i)$ directly,

for *α* ≠ *∅*, with *δ*_{i,j} being the Kronecker delta. [Note that this corrects a typographical error in Eq. (27) of Ref. 3.] The inverse relationship is

See Appendix A for details required to arrive at these results.

### D. Thermodynamics

Given $\sigma \alpha (i)$ and $c\alpha (i)$, it can be shown using a Legendre transform of the grand potential that the Helmholtz free energy *A* of the associating mixture is^{41}

The integrand can be expressed in terms of only $\sigma \alpha (i)$ using Eq. (15) to replace $c\alpha (i)$ (see Appendix A),

with

to give

At equilibrium, *A* should be minimized with respect to variations in $\sigma \alpha (i)$, subject to a constraint on $\sigma \Gamma i(i)$ to give a fixed number of molecules. Taking the functional derivative of *A* with respect to $\sigma \alpha (i)$ for *α* ⊂ Γ_{i} gives^{49}

That this relationship holds can be verified by taking derivatives of Eq. (19) and comparing them to Eq. (15). Functional differentiation of *A* with respect to the total singlet density $\sigma \Gamma i(i)$ gives the chemical potential *μ*_{i} after some manipulation, which is the correct thermodynamic relationship.

The results so far are formally exact, but Wertheim’s theory has been most successfully applied as an approximate TPT.^{2,4} In TPT, the free energy is computed relative to a reference system with molecules whose pairwise interactions are given by only $uR(ij)$ and whose free energy *A*_{R} is more readily computed. The reference system has no attractive bonds and is assumed to have the same total densities $\sigma \Gamma i(i)$,

where *c*_{R} is the sum of irreducible graphs for the reference fluid without any bonding, i.e., it contains only *f*_{R}-bonds and should only be a functional of the total density $\sigma \Gamma i(i)$. By identifying the first term as the ideal free energy, −*c*_{R} is the excess contribution to *βA*_{R}. In practice, *A*_{R} is usually evaluated with a known equation of state for the reference system; for example, *A*_{R} for a spatially homogeneous hard-sphere mixture is well approximated by Boublík’s equation of state.^{50}

The association free energy, Δ*A* = *A* − *A*_{R}, is then

The last term Δ*c* = *c* − *c*_{R} is the key to Wertheim’s theory, as it is the contributions to the sum of irreducible graphs from association. Although Δ*c* formally contains all irreducible graphs with at least one attractive bond, Wertheim discussed a few cases of “steric hindrance” that can simplify the graphs that need to be included. Two particularly convenient ones are as follows:

The formation of an

*f*_{AB}-bond between site*A*on molecule 1 and site*B*on molecule 2 prevents any site in third molecule 3 from bonding with either*A*or*B*. This hindrance usually occurs because of repulsion between the cores of the molecules.Two sites

*B*and*C*on molecule 2 cannot bond to the same site*A*on molecule 1. This can also be enforced by the site geometry or by restricting the model to form exclusive bonds (e.g., covalent chemistry).

Together, these hindrances remove all graphs with multiple bonding of a single site, which greatly reduces the combinations that must be considered. Further simplification can be made by including only a limited subset of irreducible graphs, which we do next.

## III. DOUBLE-BOND ASSOCIATION

TPT1 usually includes the lowest-order irreducible graphs from pair association in Δ*c*.^{2,4} This admits at most one attractive bond between two molecules and enforces a tree-like bonded network. To extend TPT1 to include double bonding, we also include the irreducible graphs with two attractive bonds between two molecules,

where $gR(ij)$ is the pair correlation function in the reference fluid. The sums over *AB* and *CD* are taken over the two-site pairs on *i* and *j* (e.g., *AB* = {*A*, *B*} is a two-element subset of Γ_{i}), and

accounts for the two way sites in *AB*, and *CD* can associate with each other. The first term in Eq. (24) is standard in TPT1 and accounts for all ways a single bond can form between two molecules, while the second term accounts for all ways a double bond can form. In practice, some or all these graphs may be zero or negligible for a given model due to the bonding chemistry and/or site geometry, but we do not exclude any of these at this stage.

In expressing the graph sum in this way, we have neglected any rings (cycles) in the graph sum beyond the double bond, although all such graphs are irreducible.^{22,51} Hence, the overall bonding network is still a tree of single and double bonds between molecules. Rings of molecules bonded at two sites can be included within the framework we develop here using the approximation of Sear and Jackson;^{22} we will revisit this point in Sec. IV. We also neglect the effects of steric hindrance that prevent bonding at one site when a bond forms at another; these effects might be treated by including additional graphs or with a renormalization approach.^{27,28,52}

We now differentiate Eq. (24) with respect to $\sigma \alpha (i)$, giving

and

so

The remaining $\Delta c\alpha (i)$ for *α* ⊂Γ_{i} are zero. Because *c*_{R} is only a functional of $\sigma \Gamma i(i)$, we can replace $c\alpha (i)$ by $\Delta c\alpha (i)$ in Eq. (18),

We may further express $\Delta cA(i)$ and $\Delta cAB(i)$ in terms of $\sigma \alpha (i)$ using Eq. (15),

and

These results can be directly substituted into Eq. (23) to define Δ*A* using only $\sigma \alpha (i)$; however, this system of equations is still incomplete for specifying all $\sigma \alpha (i)$ (and hence, Δ*A*).

We use Eq. (16) to find a relationship between $\sigma \alpha (i)$ in terms of $\Delta cA(i)$ and $\Delta cAB(i)$, given that all other $\Delta c\alpha (i)$ are zero for *α* ⊂ Γ_{i},

The sum is taken over all partitions of *α* having only one-element or two-element subsets, and the products are taken over each type of subset; all other partitions do not contribute. For |*α*|≥ 3, Eq. (32) can be restated as a recursive relationship,

after the substitution of Eqs. (30) and (31). The first term accounts for *A* being a one-element subset in the first product of Eq. (32), which contributes a factor of $1+\Delta cA(i)$; this factor then multiplies $\sigma \alpha \A(i)$, which includes all partitions of *α*$\$*A*. The second term accounts for all two-element subsets containing *A*, which contribute a factor of $\Delta cAB(i)$ in the second product of Eq. (32) for all *B* ∈ *α*$\$*A*. This must multiply $\sigma \alpha \AB(i)$, which includes all remaining partitions of *α*$\$*AB*. For example, if *α* = *ABC* = {*A*, *B*, *C*} and we remove *A*, we obtain

As expected, the same expression is obtained if site *B* or *C* is initially removed from *α* rather than *A*. The partitions of *α* having only one-element or two-element subsets are {{*A*}, {*B*, *C*}}, {{*B*}, {*A*, *C*}}, {{*C*}, {*A*, *B*}}, and {{*A*}, {*B*}, {*C*}}, so Eq. (32) is also equivalent after substitution and some simplification.

We now recast these equations using the notation of Chapman and co-workers^{11–14} and define $X\alpha (i)(1)=\sigma \Gamma i\\alpha (i)(1)/\sigma \Gamma i(i)(1)$ as the local fraction of molecules of type *i* not bonded at all sites in *α*. [Note the special case $X\u2205(i)(1)=1$.] The association free energy is

while the “mass action” or “chemical equilibrium” conditions from $\Delta c\alpha (i)$ are

and

If the mixture is spatially homogeneous, none of the singlet densities (or $X\alpha (i)$) depends on position or orientation. Carrying out the integral in Eq. (35) gives the association free-energy density Δ*a* = Δ*A*/*V* in terms of the number densities *ρ*_{i} and $X\alpha (i)$,

integrating Eq. (36) over **1** gives

with

and integrating Eq. (37) over **1** gives

with

where **r**_{12} = **r**_{2} − **r**_{1} is the separation between molecules and Ω_{i} is the integral over orientations of a molecule of component *i*. Here, we have used that $gR(ij)$ should only depend on the relative separation between molecules in a spatially homogeneous mixture. $\Delta AB(ij)$ and $\Delta ABCD(ij)$ are then “bond volumes” integrated over separations and averaged over all orientations of the two molecules.^{13,14}

Equation (32) can also be restated as

and equivalently, as a recursive relationship,

These relationships allow $XA(i)$ and $XAB(i)$ to be computed in terms of $X\Gamma i(i)$ and all $X\Gamma i\A(i)$ and $X\Gamma i\AB(i)$, and they also self-consistently define $X\Gamma i(i)$. Taken together, Eqs. (39), (41), and (43) comprise a system of nonlinear equations that can be numerically solved to obtain all $X\alpha (i)$, with Eq. (38) giving the free energy; in practice, this may be challenging for certain parameter values. There are, however, two cases that simplify considerably: (1) no double bonds can form and (2) a double bond can form at only one pair of sites on each molecule. We now consider these, in turn, before applying the general theory to a colloid–linker mixture in Sec. IV.

### A. No double bonds

In TPT1, at most one bond can form between two molecules,^{2,4} which is a reasonable approximation when the site geometry on the molecules inhibits double bonding, e.g., patchy colloids with two bonding sites on opposite hemispheres, and the formation of bonded rings of molecules is unlikely, e.g., because the bonding sites are rigidly fixed. (This amounts to setting $\Delta ABCD(ij)=0$ for all pairs of sites on all molecules.) TPT1 is the foundation of SAFT,^{11–15} so we will show that we recover the SAFT equations originally derived by Chapman for multicomponent mixtures with multiple bonding sites^{14} if we neglect double bonding between molecules.

Making the simplification that $\Delta ABCD(ij)=0$, Eq. (44) becomes

for any *A* ∈ *α*, which when evaluated for *α* = Γ_{i} gives

and it follows that

This relationship captures the statistical independence of bonding in TPT1: the fraction of molecules of component *i* that are not bonded at *any* site is the product of the fraction not bonded at *each* site. The free energy

and chemical equilibrium equations

follow by substitution and are equivalent to Eqs. (3) and (4) of Ref. 14 after a conversion to express Δ*A* per molecule.

### B. One double bond

As another special case, suppose that exactly two sites *A*_{12} = {*A*_{1}, *A*_{2}} on a molecule of component *i* can form a double bond with a similar set of two sites on a molecule of component *j*. (We will label all these sites *A*_{1} and *A*_{2}, but *A*_{1} on component *i* and *A*_{1} on component *j* need not be the same, and similarly for *A*_{2}.) This restriction may physically correspond to a molecular geometry where only two of the sites are close enough to double bond with another molecule, which was studied by Marshall and Chapman for a one-component fluid^{28} and by Marshall for a multicomponent mixture^{29} or a restrictive bonding chemistry.

The immediate consequence is that only $\Delta A12A12(ij)\u22600$, which drastically simplifies the expressions for $\sigma \alpha (i)$ because a maximum of two partitions of *α* contribute to Eq. (43)—the partition into all singleton sets and the partition containing one two-element subset *A*_{12}—so

The first case, when *A* is not one of the double-bond sites, is obtained from Eq. (44) with *α* = Γ_{i} and is the same as in TPT1. The second case, when *A* is one of the double-bond sites, is obtained from Eq. (44) with *α* = Γ_{i}$\$*A*_{1} and Γ_{i}$\$*A*_{2} and removing *A*_{2} and *A*_{1}, respectively. Combining Eq. (50) with Eq. (44) for *α* = Γ_{i} and either *A* ∈ *A*_{12} removed gives the additional relationship

Finally, it follows from applying Eq. (44) recursively that

With these results, the free energy is

The free energy has been written so that the sum over sites *A* is the same as TPT1, while the additional terms give the free energy due to double-bond formation. This expression is equivalent to Eq. (19) of Ref. 29 after some manipulations. The corresponding chemical equilibrium equations for *A* ∉ *A*_{12} are the same as Eq. (49), while for *α* ⊆ *A*_{12}, the three $X\alpha (i)$ are given by the system of equations,

a similar equation with the labels *A*_{1} and *A*_{2} reversed, and

## IV. COLLOID–LINKER MIXTURE

Having extended Wertheim’s TPT to include general double-bond association, we now apply the theory to a model mixture of colloids and flexible linker molecules that we previously studied with molecular simulations and TPT1.^{16} In that work, we aimed at predicting the phase behavior of the homogeneous mixture; however, TPT also gives valuable information about the prevalence of different bonding motifs. We measured a significant fraction of linkers forming loops (a double bond with both linker ends attached to the same colloid) in our simulations. Linker loops are neglected in TPT1, so here we investigate the linker loop fraction under different conditions as a useful test of the new theory.

### A. Model

The linkers we studied were linear chains of *M* = 8 tangentially bonded hard-sphere segments with diameter *d*_{l}, which we also called “polymers” in Ref. 16, while the colloids were hard spheres with diameter *d*_{c} = 5 *d*_{l}. The mixture composition was defined using the colloid volume fraction $\eta c=\rho c\pi dc3/6$, which we varied from 0.01 to 0.15, and the linker-to-colloid number ratio *ρ*_{l}/*ρ*_{c}, which we fixed at 1.5. Each linker had *n*_{l} = 2 bonding sites, one at the center of each end segment. Each colloid had *n*_{c} = 6 bonding sites arranged at the vertices of an octahedron at a distance $dcl*=dcl+(21/6\u22121)dl\u22483.12\u2009dl$ from the center of the colloid, where *d*_{cl} = (*d*_{c} + *d*_{l})/2 = 3 *d*_{l} is the hard-sphere contact distance for a colloid and a linker segment. (Placement of the colloid bonding sites at $dcl*$ rather than *d*_{cl} was due to our original work using nearly hard potentials for the spheres.^{53}) A linker could reversibly bond to a colloid through a short-ranged site attraction,

where $rAB=|rB(l)(2)\u2212rA(c)(1)|$ is the distance between the linker and colloid sites, *ε* is the strength of the attraction, and *ℓ* = 0.2 *d*_{l} sets the range of the attraction. The potential was truncated for *r*_{AB} ≥ 2.5 *ℓ* = 0.5 *d*_{l}. There was at most one bond at each site, and there were no bonds between two colloid sites or two linker sites.

### B. Perturbation theory with molecule flexibility

We developed the theory in Secs. II and III with all sites fixed on rigid molecules, but in our colloid–linker model, the linker bonding sites fluctuate with the end-to-end vector **R**. This intramolecular flexibility can be incorporated into the theory with a few modifications and approximations.^{21,22,52} To specify the linker’s conformation, we redefined **1** = (**r**_{1}, **R**_{1}), where **r**_{1} is the position of one end of the chain and **R**_{1} is its end-to-end vector, effectively coarse-graining the internal linker segments. To appropriately weight the conformations, the linkers were given an intramolecular potential

where *p*(**R**) is the probability distribution of end-to-end vectors in the reference mixture without bonding. The linker singlet densities now depend on **R**, and this dependence may be different when both ends of a linker are bonded than when they are free. As a result, $\sigma \Gamma l(l)$ may have a different **R**-dependence in the associating mixture than in the reference mixture, and computing *A*_{R} at the same $\sigma \Gamma l(l)$ may be inconvenient for equations of state that are functionals of the number density. Wertheim showed how this can be circumvented in certain cases by choosing a reference mixture having the same number density after integration over internal degrees of freedom and making a small modification to Eq. (23).^{52} We accordingly take the reference state to be the hard-chain mixture at the same number density and without bonding (*βε* = 0).

With these modifications, we specialized the graph sum given by Eq. (24) to our model. We first reduced the number of $\sigma \alpha (c)$ that needed to be explicitly included. For a linker to double bond, the arc-length distance between colloid bonding sites must be less than the linker’s contour length *L* = (*M* − 1)*d*_{l} = 7 *d*_{l}. In our model, the arc-length distance between nearest sites on the colloid was $dcl*\pi /2\u22484.9\u2009dl$, so the linker was able to form double bonds with any of these pairs of sites. However, the arc-length distance between sites on opposite hemispheres was $dcl*\pi \u22489.8\u2009dl$, so a linker could not form a double bond with these sites. We accordingly excluded double bonds between the three pairs of sites on opposite sides of the colloid; the remaining *ν*_{c} = *n*_{c}(*n*_{c} − 2)/2 double-bond pairs were all equivalent. We then defined the graph sum using representative sets of colloid bonding sites, *A*_{1} for a single bond and *A*_{12} = {*A*_{1}, *A*_{2}} for a double bond between nearest sites. The linker had *n*_{l} = 2 equivalent bonding sites *B*_{12} = {*B*_{1}, *B*_{2}}.

We next simplified the chemical equilibrium equations for a spatially homogeneous mixture. The linker singlet densities were constant with respect to **r** but not **R**, so we redefined $X\alpha (l)(r1)=\sigma \Gamma l\\alpha (l)(r1)/\sigma \Gamma l(l)(r1)$ for the linkers using singlet densities integrated over **R**_{1}. We further assumed that the distribution of end-to-end vectors was the same for linkers having neither or one end bonded and with both given by *p*(**R**), i.e., $\rho \u2205(l)(1)=\rho \u2205(l)(r1)p(R1)$ and $\rho B1(l)(1)=\rho B1(l)(r1)p(R1)$. Carrying through the integration and explicitly specializing for the difunctional linker gave

and

with single-bond volume

and double-bond volume

The factors of 2 in Eqs. (60) and (61) account for equivalent permutations of the site labels. Additionally, Eq. (43) gives $XA1(c)$ and $XA12(c)$ as functions of $X\Gamma c\A12(c)$, $X\Gamma c\A1(c)$, and $X\Gamma c(c)$, along with an implicit equation for $X\Gamma c(c)$ in these variables. Together, these make a complete set of nonlinear equations in five variables that can be solved for a given mixture composition and attraction strength *ε* after the evaluation of Δ_{1} and Δ_{2}. Complete details of how we evaluated these integrals and solved these equations are given in Appendix B.

### C. Pair correlation function

The pair correlation function between the colloids and linkers in the reference system $gR(cl)(r12,R2)$ depends not only on the distance between the colloid and the linker **r**_{12} but also on the linker’s conformation **R**_{2}. We considered approximating $gR(cl)$ using a superposition of the hard-sphere correlations $ghs(cl)$ in the mixture obtained by dissolving the linker’s internal bonds, but we suspected this approach might overestimate $gR(cl)$ because segments of polymer chains are known to be depleted near the surface of colloids,^{54} whereas hard spheres are enriched.^{50}

We accordingly measured $gR(cl)$ in molecular dynamics simulations of the reference hard-chain mixture using the model and simulation methods of Ref. 16. We simulated 1000 colloids and 1500 linkers in a cubic box with periodic boundary conditions using LAMMPS (22 Aug 2018).^{53,55,56} Starting at *η*_{c} = 0.01, we first equilibrated the mixture for 1.5 × 10^{4} *τ*, where *τ* is the unit of time in the simulations. We then sampled configurations for analysis every 10 *τ* over a 10^{4} *τ* period (1000 configurations). Finally, we linearly compressed the edge length of the simulation box over 5 × 10^{3} *τ* to reach *η*_{c} = 0.02. We repeated this procedure up to and including *η*_{c} = 0.15. We computed $gR(cl)$ as a histogram of three variables: the distance from the center of the colloid to one end of the linker $rB1=|r12|$, the end-to-end distance *R* = |**R**_{2}|, and the polar angle *ϕ* between **R**_{2} and **r**_{12}, i.e., $cos\u2061\varphi =(r12\u22c5R2)/(rB1R2)$. The histogram bin ranges and widths were $3\u2009dl\u2264rB1\u22644\u2009dl$ with width 0.25 *d*_{l}, 0 *d*_{l} ≤ *R* ≤ 7 *d*_{l} with width 0.5 *d*_{l}, and 0 ≤ *ϕ* ≤ *π* with width *π*/9 (20°). We have labeled $rB1$ using the end *B*_{1}, but we also took the other linker end *B*_{2} as **r**_{12} to improve sampling.

We also computed the end-to-end vector distribution *p*(**R**) as a histogram of *R* using the same *R*-bins as for $gR(cl)$ (Fig. 3). This distribution was largely independent of *η*_{c} at the compositions we simulated, so we used *p*(**R**) at *η*_{c} = 0.01 to calculate Δ_{1} and Δ_{2}. [We used the *p*(**R**) measured at each composition, however, to normalize $gR(cl)$.] We noted that the probability of having *R* commensurate with the distance between two colloid bonding sites ($2dcl*\u22484.4\u2009dl$) was nonnegligible, meaning that it was likely that double bonds would form.

The integrand of Δ_{1} is nonzero only when one end of the linker interacts with a bonding site. Anticipating the hard-sphere exclusion between the colloid and linker segment, this requires $dcl\u2264rB1\u2264r*$, where $r*=dcl*+2.5\u2113$ based on the range of Eq. (56). We found that $gR(cl)$ did not change significantly in this range of $rB1$, tending to increase slightly for larger $rB1$. In the strong association limit of large *βε*, the dominant contribution to the integral occurs when the bonding sites overlap at $rB1\u2248dcl*$. We accordingly approximated $gR(cl)(rB1,R,\varphi )\u2248gR(cl)(dcl*,R,\varphi )$, which still depended on both *R* and *ϕ*, e.g., because the other linker end cannot penetrate the colloid. However, further inspection of the data suggested a function of only one other variable, namely, the distance $rB2$ from the center of the colloid to the other end of the linker,

Conformations having $rB2<dcl$ had $gR(cl)(dcl*,rB2)=0$ because this would cause the linker end to penetrate the colloid. For larger values of $rB2$, we typically found $gR(cl)(dcl*,rB2)<1$ for most conformations, indicating depletion of the linker near the surface of the colloid.^{54}

We noted that $gR(cl)(dcl*,rB2)$ also had a volume fraction dependence. Conveniently, scaling $gR(cl)$ by $ghs(cl)(dcl+)$, which we obtained from Boublík’s equation of state,^{50} effectively accounted for this composition dependence and collapsed the data [Fig. 4(a)]. [We included in Fig. 4(a) only data for *d*_{l} ≤ *R* ≤ 5 *d*_{l} and 0 ≤ *ϕ* ≤ *π*/2, for which we had sufficient sampling. These conformations are expected to contribute most significantly to Δ_{1}, as others are less probable or wholly excluded.] Empirically, the data were well fit by a linear function of $rB2$ for $rB2>dcl$, which we used to evaluate Δ_{1}.

The integrand of Δ_{2} is nonzero only when both ends of the linker interact with bonding sites. Using similar reasoning as for Δ_{1}, we approximated $gR(cl)$ by its value when all the bonding sites overlap, $gR(cl)(rB1,R,\varphi )\u2248gR(cl)(dcl*,2dcl*,3\pi /4)$. The value of $gR(cl)$ again depended on *η*_{c}, but we found that it was essentially a constant across volume fractions when scaled by $[ghs(cl)(dcl+)]2$ [Fig. 4(b)]. Qualitatively, the additional factor of $ghs(cl)(dcl+)$ here accounts for the presence of the other chain end near the surface.^{21,22,51} The value of $gR(cl)$ is significantly less than the superposition of the hard-sphere correlations, presumably due to the large entropic penalty to confine the linker near the surface of the colloid.

### D. Loop fractions

With all inputs determined, we proceeded to compute the fraction of colloids and linkers in different bonding states at various attraction strengths *ε* and colloid volume fractions *η*_{c}. We first determined the fractions of linkers having no, one, or both ends bonded using TPT; these fractions are $XB12(l)$, $2(XB1(l)\u2212XB12(l))$, and $1\u22122XB1(l)+XB12(l)$, respectively, using Eq. (13). Figure 5 shows a representative result for *η*_{c} = 0.10 as a function of *ε*, which was a composition where the mixture remained a single (homogeneous) phase in our previous simulations.^{16} The fraction of unbonded linkers decreased monotonically as the attraction strength *ε* increased, the fraction of linkers bonded at both ends concomitantly increased monotonically, and the fraction of linkers with only one end bonded had a maximum near *βε* ≈ 12.5. The TPT calculation (blue dashed line) agreed nearly quantitatively with the simulation data (black circles).^{16} However, first-order TPT, which completely neglects double bonds between colloids and linkers, yielded comparable results that we omitted here for clarity.

The usefulness of the newly developed theory is its ability to predict the fraction of linkers in loops *χ*, which is absent from TPT1. This fraction can be computed from the double-bond graphs in $\rho B12(l)$, which here is

In TPT1, *χ* = 0 because $\Delta cB12(l)=0$ (Sec. III A), but with double bonding included, *χ* should depend on *ε* and on the mixture composition. We computed *χ* as a function of *ε* at volume fractions *η*_{c} = 0.01 and *η*_{c} = 0.10 (red dotted lines in Fig. 6). At both compositions, more loops formed as *ε* increased, and there were significantly more loops at the dilute composition *η*_{c} = 0.01 than when *η*_{c} = 0.10. The latter is more obvious considering *χ* as a function of *η*_{c} in the strong association limit, which we take here as *βε* = 20 (Fig. 7). Both of these trends in the TPT predictions are qualitatively consistent with our simulations (red circles).

By manipulating the chemical equilibrium equations for the linkers, it can be shown in the strong association limit that

where $K=(XA1(c))2/XA12(c)$ approaches a limiting value as *βε* becomes large. Here, we assume that Δ_{2} ≫ Δ_{1} and $\Delta 2\u223c\Delta 12$, which is motivated by Eqs. (62) and (63) and supported by our calculations. For the range of compositions we studied, *K* depended weakly on the mixture composition in our numerical calculations but was approximately 1. Consistent with our calculations and simulations (Fig. 7), smaller colloid number density *ρ*_{c} favors loop formation (larger *χ*). *χ* also depends on the number of colloid bonding sites *n*_{c} relative to the number of potential double bonding pairs *ν*_{c} and, significantly, on the ratio of bond volumes $\Delta 12/\Delta 2$. This suggests a potential strategy for limiting loop formation. The number of double bonding pairs *ν*_{c} depends on the linker length, and Δ_{2} is also roughly proportional to $p(rA1A2)$, the probability of the linker having an end-to-end vector commensurate with the distance between colloid bonding sites ( Appendix B). Reducing the compatibility between the linker and the typical distance between colloid bonding sites could decrease both *ν*_{c} and $p(rA1A2)$, and as a result *χ*. This might be achieved by modifying the linker’s length or flexibility and presents an opportunity for engineering colloidal self-assembly.

### E. Loop and ring fractions

Despite qualitatively capturing the *ε* and *η*_{c} dependences of *χ*, our TPT calculations consistently overestimated *χ* compared to the simulations. We note that the simulations at smaller colloid volume fractions (*η*_{c} ≲ 0.06) phase separated,^{16} but our TPT calculations assume that the mixture is homogeneous. Some differences might then be expected between TPT and simulations in this regime; however, similar differences were also obtained at larger volume fractions where the simulated structures were homogeneous. Hence, we suspected there might be additional bonding motifs impacting the thermodynamics that we neglected in our approximation of the irreducible graph sum Δ*c*. In particular, our simulations suggested that “rings,” where two linkers redundantly bridged two colloids [Fig. 1(b)], were also prevalent, but only double-bond graphs [Fig. 8(a)] representing loops were included in Eq. (24).

Rings of two colloids linked by two linkers can be incorporated into our TPT using the approximation of Sear and Jackson,^{22} which adds irreducible graphs [Fig. 8(b)] to Δ*c*,

where $gR(4)$ is the four-body correlation function in the reference mixture approximated by the superposition,

The sums run over all pairs of bonding sites on the colloids labeled **1** and **2**, {*E*, *F*} and {*G*, *H*} are the bonding sites on the linkers labeled **3** and **4**, respectively, and the factor of 16 accounts for permutations of bonding within these sets of sites. Because this ring graph involves only singlet densities not bonded at two sites, it will contribute only to $\Delta cA12(c)$ and $\Delta cB12(l)$ in a similar manner to the double-bond graphs. A similar approximation was also made by Haghmoradi *et al.* for mixtures of ring-forming colloids with two bonding sites in an article that appeared after our work was completed.^{57}

We carried out the functional differentiation of Δ*c* including these ring graphs, and we integrated the chemical equilibrium equations for a spatially homogeneous fluid. Two-colloid rings are only likely to form at colloid bonding sites that are nearest neighbors because the linkers would need to stretch significantly (*R* ≳ *d*_{c}) to bond at sites on opposite hemispheres. Hence, the equivalent $\sigma \alpha (c)$ remains the same as before, and we need only add terms for the ring graph to Eqs. (60) and (61). The result is

with the ring-bond volume

Details of how we approximated Δ_{4} are given in Appendix B.

As in Sec. IV D, we proceeded to solve the new chemical equilibrium equations. The fractions of linkers with zero, one, or two bonded ends predicted by the TPT with both loops and rings (red line in Fig. 5) were very similar to those predicted by the TPT with only loops, and both agreed well with the simulations. We then determined the fractions of linkers in either loops or rings. With rings included in the TPT, *χ* given by Eq. (65) is now the total fraction of linkers in either motifs. We accordingly separated the loop-graph contributions to $\Delta cB12(l)$ and *χ*,

and by subtraction, the ring-graph contribution was *χ*_{ring} = *χ* − *χ*_{loop}.

The TPT-predicted fraction of linkers in rings (blue dashed line in Figs. 6 and 7) is in very good agreement with the simulations (blue squares). The predicted fraction of loops also decreased in the TPT with loops and rings (red line) compared to the TPT with only loops. This improved the agreement between the TPT and simulations, although the fraction of linkers in loops was still slightly overpredicted in the TPT. We believe that this overprediction might be due to the neglect of other competing bonding motifs such as larger rings or of other irreducible graphs that include effects from steric hindrance.^{52} Nonetheless, the agreement between the simulations and the TPT with loops and two-colloid rings is good, and the TPT with loops and rings is a significant improvement over TPT1, which predicted that *χ* = 0 for both bonding motifs.

## V. CONCLUSIONS

We have extended Wertheim’s thermodynamic perturbation theory (TPT) for fluids with strong directional attractions to include double bonds between an arbitrary number of pairs of molecular bonding sites in a multicomponent mixture. This extension, which relaxes restrictions in prior TPTs on the number of potential double-bond site pairs, was required to model the assembly of colloidal particles (“colloids”) and difunctional flexible chain molecules (“linkers”). We showed that the fraction of linkers that “looped” to make a double bond with both ends attached to the same colloid and/or that formed “rings” of two linkers bridging between the same two colloids could be reliably predicted using TPT across a range of compositions and attraction strengths. A large linker loop fraction may inhibit assembly of percolated, self-supporting networks of colloids because looped linkers do not form new bridges between colloids. Our work suggests that the loop fraction can be reduced by making the end-to-end distance of the linker incompatible with the distance between colloid bonding sites, which might be achieved by modifying the linker’s molecular weight or flexibility.^{58} It would be interesting to use the developed theory to compute not only loop fractions but also phase boundaries in order to demonstrate how loops modify the phase behavior of the mixture, including conditions amenable to gelation or equilibrium gels.^{7,59}

## ACKNOWLEDGMENTS

M.P.H. thanks Ryan Jadrich for introducing him to Wertheim’s elegant theory. This research was primarily supported by the National Science Foundation through the Center for Dynamics and Control of Materials: an NSF MRSEC under Cooperative Agreement No. DMR-1720595, with additional support from an Arnold O. Beckman Postdoctoral Fellowship (Z.M.S.) and the Welch Foundation (Grant Nos. F-1696 and F-1848). We acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources.

## DATA AVAILABILITY

The data that support the findings of this study are available from the authors upon reasonable request.

### APPENDIX A: SITE OPERATOR ALGEBRA

Useful relationships between $\rho \alpha (i)$, $\sigma \alpha (i)$, and $c\alpha (i)$ can be derived through a formalism of “site operators” (or dual numbers) that permits for the manipulation of equations such as Eq. (11).^{3} For completeness, we will explicitly derive some of Wertheim’s key results using this formalism. Let $\epsilon A(i)(1)$ be an operator of a site *A* on a molecule of component *i* at a root point labeled **1**. All site operators commute and are defined to satisfy

For some *α* ⊆ Γ_{i}, define

and as a result,

Wertheim considered numbers of the form

that follow the arithmetic and algebra for real numbers with the exception that division by $x\u0308(i)$ having $x\u2205(i)=0$ is not permitted. Analytical functions of $x\u0308(i)$ can be computed by series expansion.

With this algebra, $\rho \u0308(i)$ is defined by Eq. (A4),

and we will now connect $\sigma \u0308(i)$ to $\rho \u0308(i)$. We start with this same definition for $\sigma \u0308(i)$ and use Eq. (12) to replace $\sigma \alpha (i)$,

We made use of the identity

in the third line and

in the last line.

It is now straightforward to invert this relationship,

using the series expansion $(1+\epsilon A(i))\u22121=1\u2212\epsilon A(i)$. To find expressions for $\rho \alpha (i)$ in terms of $\sigma \alpha (i)$, we expand $\sigma \u0308(i)$,

and collect coefficients of $\epsilon \alpha (i)$. This can only include terms in subsets *γ* ⊆ *α*, giving Eq. (13).

To find expressions for $c\alpha (i)$ in terms of $\sigma \alpha (i)$, we start from the definition of $c\u0308(i)$ and consider its exponential

where in the second equality, we have separated the exponential and used the series expansion $exp(\epsilon \alpha (i))=1+\epsilon \alpha (i)$. The product can be re-expressed as a sum over partitions of *α* because only products of terms in disjoint sets will survive the expansion,

using Eq. (11) to arrive at the second equality and Eq. (A9) with the replacement $\rho \u2205(i)=\sigma \u2205(i)$ to obtain the last equality.

Alternatively, we can invert Eq. (A12) directly,

where the last equality used the series expansion $ln(1\u2212\epsilon A(i))=\u2212\epsilon A(i)$. In order to determine expressions for $c\alpha (i)$, we now series expand the logarithm,

Recall that in general,

so the expansion can be performed in multiple variables by collecting all terms involving disjoint sets whose union is *α*, i.e., these sets must be partitions of *α*,

The factor of |*γ*|! accounts for all permutations of *γ* that occur in the expansion. This gives Eq. (15) for $c\alpha (i)$.

We will now show one additional relationship that is useful for computing the thermodynamics. We start by explicitly expanding the sum

The second term can be re-expressed as a sum over partitions of Γ_{i}, excluding the improper partition, by substituting Eq. (15), and regrouping terms,

Note in the second equality that the sum excludes the improper partition of Γ_{i} and that there is a factor of |*γ*| that accounts for all permutations removing one of the terms in the partition. By direct substitution of Eq. (15), the third term can be similarly expressed,

Combining the two sums leads to

where *Q*^{(i)} is given by Eq. (19). Note that *Q*^{(i)} does not depend on $\sigma \Gamma (i)$.

### APPENDIX B: NUMERICS

To evaluate Δ_{1}, we fixed the colloid at the origin and specified its orientation by a director through *A*_{1} having three intrinsic Euler angles (*θ*_{c}, *ϕ*_{c}, *ψ*_{c})—0 ≤ *θ*_{c} ≤ 2*π* is a right-handed rotation around the *z* axis, 0 ≤ *ϕ*_{c} ≤ *π* is a right-handed rotation about the rotated *y* axis, and 0 ≤ *ψ*_{c} ≤ 2*π* is a right-handed rotation about the rotated *z* axis—so Ω_{c} = 8*π*^{2} and *θ*_{c} and *ϕ*_{c} have the same meaning as the azimuthal and polar angles in spherical coordinates. We defined the position $rB1$ of the bonded linker site *B*_{1} in spherical coordinates $(rB1,\theta B1,\varphi B1)$ using the colloid as the origin and $rA1$ as an axis, i.e., in a frame that rotates with the colloid. We will denote the vector between two sites *A* and *B* by **r**_{AB} = **r**_{B} − **r**_{A} and the distance by *r*_{AB} = |**r**_{AB}|. The end-to-end vector $R=rB1B2$ was also defined in spherical coordinates $(rB1B2,\theta B1B2,\varphi B1B2)$ using *B*_{1} as the origin and $rB1$ as the axis. These coordinates are illustrated in Fig. 9(a).

The distance between *A*_{1} and *B*_{1} was

and with the potential given by Eq. (56), $fA1B1$ was nonzero $rA1B1<2.5\u2113$, or equivalently

The end-to-end distance was limited by the segment diameter and the contour length, $dl\u2264rB1B2\u2264L$. Conformations where the linker wraps around the colloid are either improbable or forbidden by the hard-sphere exclusions, so we limited $0\u2264\varphi B1B2\u2264\pi /2$, effectively approximating the colloid as a flat surface. With these considerations and after integration over all colloid orientations, $\theta B1$, and $\theta B1B2$, the integral for Δ_{1} [Eq. (62)] was

We used a linear fit [Fig. 4(a)] for the pair correlation function

with $rB2$ given by Eq. (64). For numerical convenience, we also fit *p*(**R**) for |**R**| ≥ *d*_{l} at *η*_{c} = 0.01 (Fig. 3) with a piecewise function,

where *r*_{max} = 2.76, *p*_{max} = 3.00 × 10^{−3}, *a* = −5.24 × 10^{−4}, *b* = 0.276, and *c* = 2.45 (all units implicit and consistent with *r* having units of *d*_{l}).

To evaluate Δ_{2}, we modified the scheme we used for Δ_{1} so that *B*_{2} was now bonded with a site *A*_{2} on the colloid that was a nearest neighbor of *A*_{1}. The position $rB2$ of *B*_{2} was redefined in spherical coordinates $(rB2,\theta B2,\varphi B2)$ with the colloid as the origin and $rA2$ as the axis [Fig. 9(b)]. The distance between *A*_{1} and *B*_{1} was unchanged, and the distance between *A*_{2} and *B*_{2} was given by an analogous formula. The distance between *B*_{1} and *B*_{2} was

It was then not possible to trivially integrate over $\theta B1$ and $\theta B2$, and we found that this increased dimensionality added a significant computational cost for evaluating Δ_{2}. However, the distance between the colloid sites $rA1A2$ is much larger than the range of Eq. (56), so to a good approximation $rB1B2\u2248rA1A2$. As a result,

with

from Fig. 4(b) and

To evaluate Δ_{4}, which is an even higher dimensional integral than Δ_{2}, we adopted a similar approximation for determining *p* and $gR(4)$ using the positions of the colloid bonding sites rather than the linker ends. We replaced each pairwise $gR(cl)$ in the superposition approximation for $gR(4)$ using Eq. (B4). Labeling the additional colloid bonding sites as *C*_{12} = {*C*_{1}, *C*_{2}} and linker bonding sites as *D*_{12} = {*D*_{1}, *D*_{2}}, we obtained

We evaluated the integral over the range of colloid separations $dc\u2264r12\u22643dcl*+L$, for which the integrand is nonzero based on $eR(cc)$ and *p*(**R**), using the trapezoidal rule with 21 uniformly spaced points. The angle brackets denote an unweighted average over the orientations of the two colloids, which we evaluated at each separation *r*_{12} using Monte Carlo sampling. We placed one colloid with sites *A*_{12} at the origin (0, 0, 0) and the other with sites *C*_{12} along the *z* axis at (0, 0, *r*_{12}). The bonding sites on both colloids were initially oriented along the +*z* and +*y* axes, and we generated 4 × 10^{6} configurations by drawing and applying uniformly random axis–angle rotations to each colloid.^{60} In averaging, we rejected any configurations in which (1) the distance $rA1C1$ or $rA2C2$ was greater than the contour length *L* or less than the segment diameter *d*_{l}, (2) either of the site *z*-coordinates $zA1$ or $zA2$ was less than zero, or (3) either of the site *z*-coordinates $zC1$ or $zC2$ was greater than *r*_{12}. The first condition rejects linker conformations excluded based on *p*(**R**), while the other two conditions reject ring configurations that are unlikely or forbidden because the linkers would need to wrap around or penetrate the colloids.

The chemical equilibrium equations can be written as five nonlinear algebraic equations in five variables: $X\Gamma c\A12(c)$, $X\Gamma c\A1(c)$, $X\Gamma c(c)$, $XB1(l)$, and $XB12(l)$. In addition to Eqs. (58) and (59) and Eqs. (69) and (70), the equations from Eq. (43) were

The expressions for $XA1(c)$ and $XA12(c)$ were directly substituted before solving.

To obtain numerical solutions, we first evaluated the bond volume integrals for 0 ≤ *βε* ≤ 20 in increments of 0.5 using the multivariable quadrature method in SciPy (version 1.3.1) with numba (version 0.50.1).^{61–63} We then iteratively solved the equations for a given composition *η*_{c} and *ρ*_{l}/*ρ*_{c} as a function of *ε*, starting from *βε* = 0 where all $X\alpha (i)=1$ and increasing in steps of 0.01. The bond volumes at intermediate values of *ε* were interpolated; for *βε* < 5, we used a cubic spline interpolation of the bond volumes, while for larger *ε*, we used a cubic spline interpolation of the natural logarithms of the bond volumes. The equations were solved using the SciPy implementation of the Levenberg–Marquadt algorithm with an explicitly specified Jacobian matrix, a relative error tolerance of 10^{−12} in the sum of squares, a relative error tolerance of 10^{−12} in the solution, and a maximum of 5000 iterations. We used the solution from the previous value of *ε* as an initial guess, and we checked that $0\u2264X\alpha (i)\u22641$ for both components and all *α* (including dependent values of $X\alpha (c)$) to ensure a physically meaningful converged solution.

## REFERENCES

An articulation point of a connected graph is a point that, if removed, will disconnect the graph into two or more graphs.^{42} An irreducible graph is free of articulation points. For example, a pair of points or any closed cycle of points is irreducible, but three points bonded collinearly are not irreducible because the middle point is an articulation point.

*A* ⊆ *B* denotes that *A* is a subset of *B*, including the improper subset *A* = *B*.

A partition of set *A* is a grouping of the elements of *A* into one or more non-empty sets using every element exactly once. For example, if *A* = {*a*, *b*, *c*}, then {{*a*}, {*b*}, {*c*}}, {{*a*}, {*b*, *c*}}, {{*a*, *b*}, {*c*}}, {{*a*, *c*}, {*b*}}, and {{*a*, *b*, *c*}} are all partitions of *A*. *P*(*A*) denotes the set of all possible partitions of *A*. The last partition, into only a single subset {*A*}, is called an improper partition.

*A*∖*B* = {*a* ∈ *A*|*a* ∉ *B*} denotes the set difference, i.e., all elements that are in *A* but not in *B*.

Forgiving some abuse of notation, the label of a single site *A* should be replaced by a set {*A*} when it represents a set of bonded sites.

|*A*| denotes the number of elements in set *A*.

*A* ⊂ *B* denotes that $A$ is a proper subset of *B*, i.e., there is at least one element of *B* that is not in *A* so *A* ≠ *B*.

^{3}