Hydrophobic interactions drive numerous biological and synthetic processes. The materials used in these processes often possess chemically heterogeneous surfaces that are characterized by diverse chemical groups positioned in close proximity at the nanoscale; examples include functionalized nanomaterials and biomolecules, such as proteins and peptides. Nonadditive contributions to the hydrophobicity of such surfaces depend on the chemical identities and spatial patterns of polar and nonpolar groups in ways that remain poorly understood. Here, we develop a dual-loop active learning framework that combines a fast reduced-accuracy method (a convolutional neural network) with a slow higher-accuracy method (molecular dynamics simulations with enhanced sampling) to efficiently predict the hydration free energy, a thermodynamic descriptor of hydrophobicity, for nearly 200 000 chemically heterogeneous self-assembled monolayers (SAMs). Analysis of this dataset reveals that SAMs with distinct polar groups exhibit substantial variations in hydrophobicity as a function of their composition and patterning, but the clustering of nonpolar groups is a common signature of highly hydrophobic patterns. Further molecular dynamics analysis relates such clustering to the perturbation of interfacial water structure. These results provide new insight into the influence of chemical heterogeneity on hydrophobicity via quantitative analysis of a large set of surfaces, enabled by the active learning approach.

## INTRODUCTION

The hydrophobicity of an interface dictates the magnitude of attractive water-mediated hydrophobic interactions that drive diverse processes, including molecular recognition,^{1–3} protein folding,^{2,4,5} colloidal aggregation,^{3,6} and assembly at the nano–bio interface.^{7–9} While the hydrophobicity of spherical uniformly nonpolar solutes can be related to the molecular rearrangement of water,^{10} the hydrophobicity of solutes with chemically heterogeneous surfaces—characterized by polar and nonpolar groups in close (∼nm) proximity—is less well-understood.^{11,12} Approaches to predict quantitative metrics of hydrophobicity for these surfaces often assume that contributions from polar and nonpolar groups are additive; examples of approaches include predicting water contact angles based on area-weighted sums of the contact angles of polar and nonpolar surface regions (i.e., the Cassie equation),^{13,14} predicting transfer free energies by multiplying the nonpolar solvent-accessible surface area of a solute by a constant fitting parameter,^{15–19} or predicting partitioning into nonpolar media by summing hydrophobicity scale values^{20} or octanol–water partition coefficients.^{21–24} However, experiments have demonstrated that the hydrophobicity of chemically heterogeneous surfaces can deviate from these additive approximations depending on the spatial arrangement of polar and nonpolar groups at the nanoscale (i.e., surface patterning) and the chemical identity of polar groups. 
For example, Woodward *et al.* attributed deviations from the Cassie equation in binary self-assembled monolayers (SAMs) to a “boundary region” around hydrophobic patches.^{25} Wang *et al.* and Ma *et al.* found that the hydrophobic force between a nonpolar SAM and a *β*-peptide containing polar and nonpolar amino-acid side chains was only measurable if the polar side chains were adjacent to a well-defined nonpolar patch.^{11,26} These studies revealed the impact of the adjacent polar groups: Lysine side chains (with an amine group) had no measurable effect on the hydrophobic force, whereas glutamine side chains (with an amide group) eliminated the hydrophobic force completely.^{11,26} This interplay between surface patterning and polar group chemistry on the hydrophobicity of chemically heterogeneous surfaces remains poorly understood.

To complement experimental studies, atomistic molecular dynamics (MD) simulations have been employed to analyze the interfacial water structure at chemically heterogeneous interfaces and examine the effect of surface patterning on hydrophobicity.^{27–33} For example, Xi *et al.* studied the impact of surface patterning by systematically varying the distance between hydroxyl groups in a binary SAM and found a nonmonotonic relationship between this distance and interfacial water density fluctuations.^{12} Monroe and Shell leveraged a genetic algorithm to show that hydroxylated surfaces with highly clustered polar domains had the highest surface water diffusivity, a measure of hydrophobicity.^{34} Luo *et al.* reported drastic differences in wetting behavior and time-averaged interfacial water structures for surfaces with similar chemical compositions but differing morphologies representative of patchy and striped nanoparticles.^{35} This work seeks to build upon these and related studies^{36,37} by exhaustively mapping surface patterning to hydrophobicity for surfaces with different types of polar chemical groups, thereby enabling the quantitative analysis of pattern features that impact hydrophobicity. A key challenge is the combinatorial complexity associated with surface patterning; for example, a binary SAM with polar and nonpolar ligands arranged on an *n* × *n* lattice has $2^{n^2}$ arrangements of polar and nonpolar groups. Enhanced sampling methods that calculate the hydration free energy (HFE), a thermodynamic descriptor of hydrophobicity,^{27,28} are too computationally expensive to tractably explore all such patterns.
We have recently developed a data-driven method for predicting HFEs utilizing a convolutional neural network (CNN) that requires minimal simulation time.^{32} However, the CNN’s reduced accuracy combined with the combinatorial complexity of patterning necessitates more efficient methods to study the effect of patterning on hydrophobicity, particularly when considering multiple types of polar groups.

In this work, we develop a new computational approach to efficiently map surface patterning to hydrophobicity for chemically heterogeneous surfaces. Our approach uses a Gaussian Process Regression (GPR) model to predict the HFE of a patterned SAM surface using a binary matrix representation of the pattern as input. To efficiently train the GPR model, we develop a dual-loop active learning scheme to label a small subset of patterned SAMs with HFEs. Active learning accelerates model training by choosing training data using an information gain criterion^{38} and has recently been employed for materials discovery.^{40–41} Our scheme combines a fast reduced-accuracy method (a pre-trained CNN) with a slow higher-accuracy method [Indirect Umbrella Sampling (INDUS)] to label selected training patterns with HFEs. Once trained, the GPR model can rapidly predict HFEs for tens of thousands of patterns without additional simulations. We leverage this capability to identify patterns that display large deviations from additive behavior (i.e., the HFE cannot be approximated by additive contributions from polar and nonpolar groups) for nearly 200 000 binary SAMs containing either amine, hydroxyl, or amide polar end groups. Analysis of this large dataset reveals that SAMs with different polar groups exhibit substantial variations in hydrophobicity as a function of surface composition and patterning, but highly hydrophobic patterns are always distinguished by the clustering of nonpolar groups, which we attribute to the disruption of interfacial water structure. These results thus provide new physical insight into the role of chemical heterogeneity in hydrophobicity via quantitative analysis of a substantially larger set of surfaces than would be accessible by simulations alone.

## METHODS

### System description and simulation parameters

Identifying patterned surfaces with interfacial hydrophobicities that deviate from additive behavior requires defining (i) the set of patterned surfaces, (ii) a descriptor of interfacial hydrophobicity, and (iii) additive behavior. We chose a set of patterned binary SAMs consisting of alkanethiol ligands terminated with either methyl groups (referred to as nonpolar ligands) or one of three polar end groups—amine, hydroxyl, or amide (referred to as polar ligands). Ligand and end group chemical structures are shown in Fig. 1(a). SAMs were constructed from a hexagonal lattice of 64 ligands with a 4 × 4 patch of polar and nonpolar ligands embedded within a background of 48 nonpolar ligands. The number and spatial positioning of polar and nonpolar ligands within the patch were permuted to generate different patterns, as shown in Fig. 1(b). In total, this set included 3 × 2^{16} = 196608 SAMs, permitting broad exploration of the relationship between spatial patterning and hydrophobicity for different polar end groups. Long ligand backbones were selected to ensure that ligands formed an ordered arrangement with minimal disruption in the relative positions of end groups in order to isolate the effect of chemical heterogeneity on SAM hydrophobicity. The effect of surface roughness (for chemically uniform systems) emerging from fluctuations of disordered ligand backbones on hydrophobicity was considered in previous work by our group.^{31,33}
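The size of this pattern set follows directly from the 4 × 4 patch: each of the 16 sites is polar or nonpolar, and three polar end groups are considered. A minimal sketch of the count and of one possible binary-matrix encoding of patch patterns (the index-to-matrix mapping here is an illustrative choice, not the encoding used in this work):

```python
import numpy as np

N_PER_GROUP = 2 ** 16          # 4 x 4 patch, polar or nonpolar at each site
N_TOTAL = 3 * N_PER_GROUP      # three polar end groups: amine, hydroxyl, amide

def pattern_from_index(i):
    """Map an integer index to a 4 x 4 binary matrix (1 = polar ligand)."""
    bits = [(i >> b) & 1 for b in range(16)]
    return np.array(bits, dtype=int).reshape(4, 4)
```

Enumerating indices 0 through 2^{16} − 1 visits every distinct patch pattern exactly once for a given polar end group.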

Each SAM was constructed by positioning 64 ligands in the *xy* plane with a grafting density of 21.6 Å^{2}/ligand to mimic self-assembly onto a gold (111) lattice.^{42} Gold atoms were not modeled because prior work showed that removing the gold substrate leads to SAM properties in better agreement with experiments without influencing the water structure, possibly due to inaccuracies with classical gold force fields.^{31} The systems considered in this work also have long carbon backbones and a high ligand grafting density, ensuring that gold-end group interactions are minimized and no water molecules are capable of penetrating through the SAM layer to access the (missing) gold surface. Ligands were oriented with the end groups pointing in the positive *z*-direction. A 5-nm thick water layer was placed above the SAM and in contact with the ligand end groups. A 3-nm thick buffering vacuum layer was added above the water layer. Periodic boundary conditions were applied in all dimensions. Ligands were modeled using the CHARMM36 force field^{43} with the TIP4P/2005 water model.^{44} Electrostatic interactions were calculated using the smooth particle-mesh Ewald algorithm^{45} with short-range Coulomb, van der Waals, and neighbor list cutoffs set to 1.2 nm. Because gold-sulfur bonds were omitted in the simulations, harmonic restraints with a spring constant of 50 000 kJ/mol/nm^{2} were applied to the sulfur atoms to maintain the SAM structure. All MD simulations were performed in the *NVT* ensemble using *GROMACS 2016* with a 2-fs time step.^{46} The temperature was maintained at 300 K using a velocity-rescaling thermostat with a temperature-coupling time of 0.1 ps.^{47}

### Thermodynamic descriptor of hydrophobicity

We quantified SAM interfacial hydrophobicity via the hydration free energy (HFE), which measures the magnitude of water density fluctuations.^{27,28} Equation (1) defines the HFE in terms of the probability $P_\nu(0)$ that a cavity (indicated by subscript *ν*) near the SAM is occupied by zero water molecules,^{27}

$$\text{HFE} = -k_B T \ln P_\nu(0). \tag{1}$$

Water density fluctuations are enhanced near hydrophobic surfaces, increasing $P_\nu(0)$ and decreasing the HFE. The HFE is an appropriate descriptor of hydrophobicity for chemically heterogeneous surfaces because it has been shown to vary for surfaces with hand-selected patterns.^{12,48} Previous studies have also shown HFEs to correlate with experimental measurements of surface contact angles^{49,50} and hydrophobic forces.^{33,48} Because the HFE is sensitive to the placement and size of the cavity, all HFE calculations used a 2.0 × 2.0 × 0.3 nm^{3} cavity [Fig. 1(c)] positioned with its base on a constant water (number) density isosurface.^{51} Section S1.1 of the supplementary material further discusses the cavity placement. HFEs were obtained either by Indirect Umbrella Sampling (INDUS),^{27,28} an enhanced sampling technique that biases the removal of water molecules from the cavity during independent MD simulations, or by using a previously developed 3D convolutional neural network (CNN),^{32} a machine learning model that maps interfacial water positions sampled from a short MD simulation to an HFE. INDUS is a high accuracy approach but requires substantial MD simulation time (∼65 ns), whereas the CNN is less accurate but requires minimal simulation time (∼2 ns). Details on the usage of these two methods are provided below.
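In practice, Eq. (1) converts a sampled cavity-emptying probability directly into a free energy; a minimal sketch, working in units of *k*_{B}*T*:

```python
from math import log

def hfe_in_kT(p_zero):
    """Eq. (1): HFE = -k_B T ln P_v(0), returned here in units of k_B T."""
    return -log(p_zero)
```

For example, a cavity that empties with probability $e^{-30}$ corresponds to an HFE of 30 *k*_{B}*T*; such small probabilities are exactly why biased sampling (INDUS) is needed for large cavities.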

We expect the HFE of a surface that exhibits additive behavior to depend on the relative amounts of polar and nonpolar area covered by the cavity.^{10,15–19} Accordingly, we computed Voronoi diagrams to partition the total area covered by the cavity into polar and nonpolar cells determined by the closest ligand end group at equilibrium [Fig. 1(b)]. Cells were generated from a subset of MD simulations for each polar end group and were found to be nearly equal hexagons in all simulations. We thus define the polar area fraction as the fraction of covered area that is occupied by polar cells, assuming that cell areas are independent of the pattern and polar end group, and define a pattern as exhibiting additive behavior if its HFE is a linear combination of the HFEs of SAMs containing only polar and only nonpolar ligands weighted by its polar area fraction. Because the cavity has a square 2D footprint, while the SAM lattice is hexagonal, the polar area fraction can vary by up to ∼0.05 for patterns containing the same number of polar groups and reaches a maximum of ∼0.82 for a pattern with only polar groups due to the boundary of nonpolar groups [Fig. 1(b)]. Details on the calculation of the polar area fraction are provided in the supplementary material.
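The equal-cell approximation for the polar area fraction can be illustrated by grid-sampling the square cavity footprint and assigning each sample point to the Voronoi cell of its nearest ligand site. This is an illustrative sketch, not the production calculation: the 0.5-nm lattice spacing (roughly consistent with the stated grafting density), the sampling resolution, and the function names are all assumptions.

```python
import numpy as np

def hex_lattice(nx, ny, a=0.5):
    """Hexagonal lattice of ligand grafting points (nm); a is an assumed spacing."""
    pts = [(i * a + 0.5 * a * (j % 2), j * a * np.sqrt(3) / 2.0)
           for j in range(ny) for i in range(nx)]
    return np.array(pts)

def polar_area_fraction(sites, is_polar, origin, size=2.0, n=80):
    """Fraction of the square cavity footprint covered by polar Voronoi cells."""
    xs = np.linspace(origin[0], origin[0] + size, n)
    ys = np.linspace(origin[1], origin[1] + size, n)
    gx, gy = np.meshgrid(xs, ys)
    samples = np.column_stack([gx.ravel(), gy.ravel()])
    # nearest ligand site <=> membership in that site's Voronoi cell
    d2 = ((samples[:, None, :] - sites[None, :, :]) ** 2).sum(axis=-1)
    return float(is_polar[d2.argmin(axis=1)].mean())
```

Because the square footprint straddles hexagonal cell boundaries, a fully polar patch embedded in a nonpolar background yields a fraction below one, consistent with the ∼0.82 maximum noted above.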

### Dual-loop active learning scheme to efficiently map patterns to hydrophobicity

The large number of possible patterned SAMs inhibits the direct calculation of HFEs for all possible SAMs. Instead, we trained a Gaussian Process Regression (GPR) model to map patterns (encoded as 4 × 4 binary matrices) to HFE labels; the model is trained on HFEs calculated from simulation data, but after training, the GPR model can predict HFEs for all patterns without additional simulations. To minimize the amount of training data while maximizing GPR model accuracy, we designed a dual-loop active learning scheme that exploits the advantages of both INDUS (high accuracy, more computationally expensive) and the CNN (lower accuracy, less computationally expensive) during GPR model training. This scheme uses accurate INDUS calculations to label selected patterns with large deviations from additive behavior and faster, but less accurate, CNN calculations to explore large portions of pattern space.

Figure 2 illustrates the active learning scheme. The GPR model is initially trained using a “seed” set of 52 SAMs for each polar end group with INDUS-calculated HFE labels. Two loops are then performed in parallel to iteratively label additional SAMs with HFEs: a “fast loop” utilizing the CNN and a “slow loop” utilizing INDUS. 50 iterations of the fast loop require approximately the same computational time as one iteration of the slow loop. A pattern is chosen for each iteration of the fast loop by maximizing an acquisition function that optimizes the trade-off between choosing patterns with high deviations from additive behavior and sampling patterns in uncertain regions of pattern space. The chosen pattern is then labeled with the CNN. The pattern that was labeled with the largest deviation from the expected additive behavior during the 50 iterations of the fast loop is then chosen for the slow loop; the fast loop continues for an additional 50 iterations while this pattern is labeled with INDUS. The parallel execution of the fast and slow loops is repeated until the convergence of GPR predictions,^{52} which required only 30 iterations of the slow loop and 1448 iterations of the fast loop (detailed in the supplementary material). Table I summarizes the scheme, which we repeated once for each polar end group. Details about the choice of seed patterns, HFE calculations, GPR, and choice of the acquisition function are discussed below.

**Table I.** Dual-loop active learning algorithm.

Dual-Loop Active Learning Algorithm

1 Generate a set of 52 seed patterns and label with INDUS simulations
2 Use GPR model to map patterns to HFE labels
3 while (convergence criterion is not met):
4 | Execute 50 iterations of the fast loop and 1 iteration of the slow loop in parallel
5 end

Fast Loop

1 Maximize the acquisition function to select the next pattern to sample
2 Perform unbiased MD simulation and predict HFE using the CNN
3 Append the CNN-predicted HFE to the training data with proper replicate error
4 Use GPR model to map patterns to HFE labels

Slow Loop

1 Choose the pattern with the maximum deviation from expected additive behavior from the last 50 fast-loop iterations
2 Perform INDUS simulation of the chosen pattern to calculate the HFE
3 Append the INDUS-calculated HFE to the training data with proper replicate error

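The control flow of Table I can be summarized as a driver loop. The sketch below is schematic: `cnn_label`, `indus_label`, `acquire`, `deviation`, and `fit_gpr` are hypothetical stand-ins for the components described in the following sections, and the parallel execution of the fast and slow loops is serialized here for clarity.

```python
def dual_loop(seed, candidates, cnn_label, indus_label, acquire, deviation,
              fit_gpr, n_fast=50, n_slow=30):
    """Schematic dual-loop active learning driver.

    seed: list of (pattern, HFE, replicate_error) triples from INDUS.
    cnn_label / indus_label: return an (HFE, replicate_error) pair for a pattern.
    acquire: acquisition score of a candidate pattern given the current GPR model.
    deviation: deviation of a labeled pattern from expected additive behavior.
    fit_gpr: trains a GPR on the labeled data and returns a model.
    """
    data = list(seed)
    for _ in range(n_slow):
        fast_batch = []
        for _ in range(n_fast):                  # fast loop: cheap CNN labels
            model = fit_gpr(data)
            x = max(candidates, key=lambda c: acquire(c, model))
            hfe, err = cnn_label(x)
            data.append((x, hfe, err))
            fast_batch.append(x)
        x_slow = max(fast_batch, key=deviation)  # slow loop: INDUS relabels the
        hfe, err = indus_label(x_slow)           # largest-deviation pattern
        data.append((x_slow, hfe, err))
    return fit_gpr(data)
```

Each slow-loop iteration thus adds 50 CNN-labeled patterns and 1 INDUS-labeled pattern to the training set before the GPR model is refit.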

### Seed patterns and hydration free energy calculations

The set of seed patterns included patterns generated from 50 random binary matrices and two patterns corresponding to all polar and all nonpolar groups. The same seed patterns were used for all polar end groups. Seed patterns and patterns chosen for the slow loop during active learning were labeled with HFEs using INDUS. INDUS samples $P_\nu(0)$ [Eq. (1)] by biasing the number of water molecules in the cavity *ν* because $P_\nu(0)$ is too small to be sampled during unbiased simulations if the cavity is large.^{27,28} We performed INDUS using *GROMACS 2016.6*^{53} patched with the PLUMED 2.5.1 plugin.^{54} Each SAM was equilibrated for 5 ns, INDUS was performed using 13 independent 5-ns simulation windows, and the unbiased value of $P_\nu(0)$ was obtained using the weighted histogram analysis method.^{55} Details are included in the supplementary material. The error associated with HFEs calculated using INDUS is ∼2 *k*_{B}*T*.^{32,48}

Patterns chosen for the fast loop were labeled using a CNN developed in our prior work.^{32} The CNN was previously trained on a set of homogeneous SAMs^{36} containing amine, amide, and hydroxyl ligands with scaled partial charges. Input to the CNN consists of a short unbiased MD simulation (∼1 ns) converted into a sequence of atomic density histograms. The CNN predicts HFEs that are highly correlated with INDUS-calculated HFEs and accurate up to a multiplicative constant. The error associated with HFEs predicted by the CNN is ∼6 *k*_{B}*T*. While the error associated with CNN-predicted HFEs is larger than that of INDUS-calculated HFEs, the error of the CNN is only ∼10% of the range of HFEs. While this is detrimental when studying surfaces with similar HFEs, this error is small enough that the CNN can consistently rank-order surfaces with different HFEs using very short unbiased simulations, which is required for the rapid evaluation of HFEs during the fast loop in the dual-loop active learning algorithm. CNN details are included in the supplementary material.

### Gaussian process regression to map patterns to HFE labels

GPR is a non-parametric Bayesian model that estimates the mean and uncertainty of unlabeled test data based on the knowledge of labeled training data under the assumption that the joint distribution of training and test data is Gaussian.^{56} In our approach, input data are patterns encoded as 4 × 4 binary matrices and labels are HFEs. We define elements of the symmetric covariance matrix **K** using a radial basis kernel function,^{57}

$$k(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\gamma^2}\right), \tag{2}$$

where **x**_{i} and **x**_{j} denote individual patterns and $\gamma = 2$ defines the characteristic length scale in pattern space. For any pattern **x*** not in the training data, estimates for the mean ($\mu_{f^*}$) and uncertainty ($\sigma_{f^*}$) of the HFE for that pattern are

$$\mu_{f^*} = \mathbf{k}^T \left[\mathbf{K} + \mathrm{diag}(\boldsymbol{\sigma}_{\mathrm{rep}}^2)\right]^{-1} \mathbf{y}, \tag{3}$$

$$\sigma_{f^*}^2 = k(\mathbf{x}^*, \mathbf{x}^*) - \mathbf{k}^T \left[\mathbf{K} + \mathrm{diag}(\boldsymbol{\sigma}_{\mathrm{rep}}^2)\right]^{-1} \mathbf{k}, \tag{4}$$

where **y** is a vector of HFE labels (obtained from either INDUS or the CNN) for the *t* training patterns, *σ*_{rep} is a vector of error values for the training pattern HFEs (equal to 2 *k*_{B}*T* for HFEs from INDUS and 6 *k*_{B}*T* for HFEs from the CNN), and $\mathbf{k} = [k(\mathbf{x}_1, \mathbf{x}^*), k(\mathbf{x}_2, \mathbf{x}^*), \ldots, k(\mathbf{x}_t, \mathbf{x}^*)]^T$ is a vector of kernel distances between each training pattern **x**_{i} and **x***. Additional details are included in the supplementary material.
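Under this kernel and noise model, the GPR posterior can be sketched with NumPy. This is a minimal illustration, assuming the radial basis kernel form above with per-point replicate errors on the diagonal; function names are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, gamma=2.0):
    """k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 gamma^2)) for rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * gamma ** 2))

def gpr_predict(X, y, sigma_rep, X_star, gamma=2.0):
    """Posterior mean and standard deviation of the HFE at test patterns X_star.

    X: (t, 16) flattened training patterns; y: (t,) HFE labels;
    sigma_rep: (t,) replicate errors (e.g., 2 kT for INDUS, 6 kT for CNN labels).
    """
    K = rbf_kernel(X, X, gamma) + np.diag(np.asarray(sigma_rep) ** 2)
    k_star = rbf_kernel(X, X_star, gamma)                  # shape (t, m)
    mean = k_star.T @ np.linalg.solve(K, y)
    # prior variance k(x*, x*) = 1 for the RBF kernel above
    var = 1.0 - (k_star * np.linalg.solve(K, k_star)).sum(axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))
```

With small replicate errors, the posterior mean interpolates the training labels, and the predicted uncertainty collapses at training patterns while growing for patterns far from the training set.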

### Acquisition function for pattern selection

Active learning requires an acquisition function, *u*(**x***), which is maximized to choose a new pattern, **x**_{t+1}, to label and add to the *t* training patterns. We used the “Expectation of Improvement” acquisition function to balance the trade-off between choosing patterns with high GPR-predicted uncertainties (exploration) and choosing patterns with large deviations from additive behavior (exploitation).^{57,58} To define this trade-off, we first define the expected additive HFE for a given pattern, HFE_{add}, as the average of INDUS-calculated HFEs for patterns with only polar (HFE_{polar}) and only nonpolar (HFE_{nonpolar}) ligands weighted by the mole fraction of polar ligands in the pattern,

$$\text{HFE}_{\text{add}}(\mathbf{x}^*) = \chi_{\mathbf{x}^*}\,\text{HFE}_{\text{polar}} + (1 - \chi_{\mathbf{x}^*})\,\text{HFE}_{\text{nonpolar}}, \tag{5}$$

where $\chi_{\mathbf{x}^*}$ is the mole fraction of polar ligands in the pattern. The deviation from the expected additive HFE is quantified by

$$\Delta(\mathbf{x}^*) = \text{HFE}_{\text{GPR}}(\mathbf{x}^*) - \text{HFE}_{\text{add}}(\mathbf{x}^*), \tag{6}$$

where HFE_{GPR}(**x***) is the GPR-predicted HFE for a pattern **x***. The acquisition function is then

$$u(\mathbf{x}^*) = \left[\mu_{\Delta(\mathbf{x}^*)} - \Delta^+ - \xi\right]\Phi(Z) + \sigma_{\Delta(\mathbf{x}^*)}\,\varphi(Z), \qquad Z = \frac{\mu_{\Delta(\mathbf{x}^*)} - \Delta^+ - \xi}{\sigma_{\Delta(\mathbf{x}^*)}}, \tag{7}$$

where $\Delta^+$ is the maximum deviation among the set of training points, $\mu_{\Delta(\mathbf{x}^*)}$ and $\sigma_{\Delta(\mathbf{x}^*)}$ are the GPR-predicted mean and uncertainty of the deviation for pattern **x***, $\varphi(Z)$ and $\Phi(Z)$ are the probability density function and cumulative distribution function of a standard Gaussian, respectively, and *ξ* is a parameter that controls the exploration–exploitation trade-off. We chose *ξ* = 0.01 in accordance with past experiments.^{58} During the fast loop, *u*(**x***) is computed for all possible SAM patterns, and the pattern that maximizes $u(\mathbf{x}^*)$ is chosen as the next pattern to label.
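A scalar sketch of the Expectation of Improvement using only the standard library (here `mu` and `sigma` are the GPR-predicted mean and uncertainty of the deviation for a candidate pattern, and `best` plays the role of the maximum training-set deviation; the zero-uncertainty convention is an assumption):

```python
from math import erf, exp, pi, sqrt

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expectation of Improvement over the largest deviation seen so far."""
    if sigma <= 0.0:
        return 0.0                               # no uncertainty, no improvement
    z = (mu - best - xi) / sigma
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)     # standard normal density
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))       # standard normal CDF
    return (mu - best - xi) * cdf + sigma * pdf
```

A candidate whose predicted deviation far exceeds the current best scores roughly `mu - best - xi` (exploitation), while a candidate with a low predicted deviation but large uncertainty still scores positively (exploration).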

## RESULTS

### Predictions from the Gaussian process regression model before and after active learning

We first trained the GPR model using the same set of randomly selected seed patterns for all three polar end groups and then initiated the dual-loop active learning scheme. Figure 3(a) depicts envelopes of GPR-predicted HFEs after being trained only on seed patterns. Each envelope illustrates the range of GPR-predicted HFEs for different patterns as a function of polar area fraction; predicted HFEs for all possible SAM patterns lie within the envelopes. The linear interpolation line depicts additive behavior. The differences in HFEs for patterns with equivalent polar area fractions indicate the impact of patterning. All HFE envelopes monotonically increase with the polar area fraction as observed previously.^{32,34} However, there are substantial differences between polar groups; HFEs for the amine SAMs (i.e., SAMs containing polar ligands with amine end groups) are clustered near the linear interpolation line, hydroxyl SAMs display large positive deviations from additive behavior, and amide SAMs display positive deviations less than those of the hydroxyl SAMs. Figure 3(b) shows that the envelopes of GPR-predicted HFEs increase in area after convergence of active learning, indicating that active learning amplifies HFE variations associated with patterning.^{52} To quantify these changes, we characterize the envelope of INDUS-calculated HFEs before (i.e., for only the seed patterns) and after active learning (i.e., including patterns selected for the slow loop). Only INDUS-calculated HFEs are considered because of their low error and because patterns were selected for the slow loop to maximize deviations from additive behavior. We define the average envelope width as the difference between the maximum and minimum INDUS-calculated HFEs averaged over all polar area fractions and the maximum envelope width as the largest difference between the maximum and minimum INDUS-calculated HFEs for any polar area fraction.

Figure 3(b) shows that amine SAMs still display behavior close to additive behavior after active learning. However, the average envelope width increases from ∼3.3 to ∼5.5 *k*_{B}*T*, indicating that active learning successfully identifies patterns with higher deviations from additivity than present in the seed set. Similarly, the maximum envelope width increases from ∼6.3 to ∼9.7 *k*_{B}*T* after active learning. Hydroxyl SAMs exhibit a plateau in the calculated HFEs as the polar area fraction reaches ∼0.6, a value lower than that of a pure polar pattern (0.82), in qualitative agreement with past studies.^{32,34} All INDUS-calculated HFEs exhibit positive deviations from additive behavior, with some tracking the upper edge of the GPR-predicted envelope. None of the INDUS-calculated HFEs lie below the linear interpolation line, where the GPR model predicts negative deviations from additive behavior. This discrepancy is an artifact of the CNN, which tends to systematically underpredict values at the lower end of the HFE range. Despite this quantitative inaccuracy, active learning still discovers new patterns with large deviations from additive behavior; the average envelope width for hydroxyl patterns increases from ∼4.7 to ∼7.1 *k*_{B}*T* after active learning and the maximum envelope width substantially increases from ∼8.1 to ∼15.7 *k*_{B}*T*. The largest positive deviation from additivity for INDUS-calculated HFEs also increases from ∼17.9 to ∼20.1 *k*_{B}*T*. Amide SAMs exhibit behavior between that of the amine and hydroxyl SAMs, with most INDUS-calculated HFEs displaying positive deviations from additive behavior but without plateauing at intermediate polar area fractions. The average envelope width increases from ∼5.6 to ∼7.2 *k*_{B}*T* while the maximum envelope width increases from ∼12.7 to ∼15.4 *k*_{B}*T* after active learning.
Together, these data indicate that active learning identifies larger deviations from additivity for all three sets of SAMs while confirming that varying patterning at a fixed polar area fraction substantially impacts HFEs, even for amine SAMs that only moderately deviate from additive behavior.
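The envelope-width statistics used above reduce to a per-bin max-minus-min spread. A minimal sketch (the binning resolution here is an illustrative choice):

```python
import numpy as np

def envelope_widths(area_frac, hfe, n_bins=20):
    """Average and maximum (max - min) HFE spread over polar-area-fraction bins."""
    area_frac, hfe = np.asarray(area_frac), np.asarray(hfe)
    edges = np.linspace(area_frac.min(), area_frac.max(), n_bins + 1)
    idx = np.clip(np.digitize(area_frac, edges) - 1, 0, n_bins - 1)
    widths = [np.ptp(hfe[idx == b]) for b in range(n_bins) if np.any(idx == b)]
    return float(np.mean(widths)), float(np.max(widths))
```

Applied before and after active learning, growth in both statistics indicates that newly labeled patterns widen the range of HFEs found at fixed composition.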

### The trained GPR model identifies patterns with large deviations from additive behavior

A chief benefit of the GPR model is the ability to label all possible SAM patterns, permitting analysis of the distribution of GPR-predicted HFEs as a function of polar area fraction. We quantified this distribution by binning the number of patterns with respect to the polar area fraction (in increments of 0.04) and the GPR-predicted HFE (in increments of 0.5 *k*_{B}*T*). Each bin was normalized by the largest number of patterns in any bin for the same polar area fraction to obtain the relative probability of obtaining an HFE for any given polar area fraction. Figure 4 plots the distribution of GPR-predicted HFEs for the three end groups as a function of the polar area fraction. For the amine and amide SAMs, most patterns have HFEs clustered near the linear interpolation line. The inset in Fig. 4 shows the relative probability of patterns for an amine SAM with a polar area fraction of 0.38, indicating that most patterns also have HFEs clustered near the average HFE. Together, these features indicate that randomly sampled patterns for these SAMs would likely exhibit additive behavior, emphasizing the need for active learning to identify patterns with large deviations from ideality. Conversely, the average HFEs for hydroxyl SAMs cluster near the linear interpolation line at small polar area fractions but deviate from this line at larger polar area fractions, indicating that most patterns for SAMs with large numbers of hydroxyl groups would exhibit deviations from additive behavior (as noted in prior computational studies).^{34}
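The column-wise normalization described above can be sketched with a 2D histogram (bin widths follow the text; variable names are illustrative):

```python
import numpy as np

def relative_probability(area_frac, hfe, d_area=0.04, d_hfe=0.5):
    """2D histogram over (polar area fraction, HFE), with each area-fraction
    column normalized by its most populated HFE bin."""
    area_frac, hfe = np.asarray(area_frac), np.asarray(hfe)
    a_edges = np.arange(0.0, area_frac.max() + d_area, d_area)
    h_edges = np.arange(hfe.min(), hfe.max() + d_hfe, d_hfe)
    counts, _, _ = np.histogram2d(area_frac, hfe, bins=[a_edges, h_edges])
    col_max = counts.max(axis=1, keepdims=True)
    return np.divide(counts, col_max, out=np.zeros_like(counts),
                     where=col_max > 0)
```

Each populated area-fraction column then has a maximum relative probability of exactly one, so the map highlights which HFEs are typical versus rare at fixed composition.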

We next sought to confirm the patterns lying on the extremes of the GPR-predicted HFE distributions with higher accuracy INDUS calculations. We selected four patterns—corresponding to the two largest and two smallest GPR-predicted HFEs—in each polar area fraction bin that contained at least 4000 patterns, resulting in 32 patterns per polar group. GPR-predicted and INDUS-calculated HFEs are highly correlated for these patterns with a Pearson’s *r* of ∼0.9 for amine SAMs and ∼0.8 for hydroxyl and amide SAMs. Figure S8 of the supplementary material shows INDUS-calculated HFEs superimposed on the final GPR-predicted HFE envelope for the selected SAMs. Figure 4 shows that the patterns predicted by the GPR model to have the highest positive deviations from ideality have INDUS-calculated HFEs (denoted by red plusses) that consistently lie on the upper edge of the envelope in regions where the relative probability of finding patterns is extremely low, with outliers noted only for the amide SAMs. On average, these patterns have INDUS-calculated HFEs that deviate by ∼3.0 *k*_{B}*T* more from additivity than the set of seed patterns (shown as black crosses in Fig. 4). These results indicate that the GPR model reliably identifies patterns with large deviations from additivity in addition to those patterns chosen for the active learning slow loop, enabling the study of pattern features that distinguish hydrophobicity utilizing a much larger number of patterns than in previous studies.^{12,30,34}

### Clustering of nonpolar ligands distinguishes hydrophobic and hydrophilic patterns

Having identified large numbers of patterns with substantial deviations from additive behavior for all three sets of SAMs, we sought to identify features of these patterns that distinguish extremes of hydrophobicity. Figure 5(a) visualizes patterns with high and low HFEs at a constant polar area fraction. Visually, patterns with high HFEs appear to have more densely clustered polar groups, whereas patterns with low HFEs appear to have a higher degree of “connectedness” between nonpolar groups. To quantify these observations, we defined a variety of clustering and spatial autocorrelation metrics that could be computed for patterns. Clustering metrics were computed using a graph-based approach; either polar or nonpolar ligands were defined as vertices and edges were defined between neighboring ligands. We calculated the edge connectivity, node connectivity, and the average clustering coefficient for distinct graphs of polar and nonpolar end groups (a total of six metrics for each pattern). Spatial autocorrelation metrics included Moran’s I and adjusted Geary’s C. Patterns displaying extremes of hydrophobicity were defined based on their GPR-predicted HFEs: patterns with HFEs at least two standard deviations larger than the mean HFE for any given polar area fraction (binned following the same approach as in Fig. 4) were defined as hydrophilic for this analysis, whereas patterns with HFEs at least two standard deviations smaller than the mean HFE were defined as hydrophobic. We only included polar area fraction bins with more than 4000 patterns to ensure statistical significance. We then used a binary support vector machine (SVM) classifier with an *l*_{1} weight penalty to select the metrics that can accurately classify a pattern as hydrophobic or hydrophilic. More details on the calculation of these metrics and the SVM classifier can be found in the supplementary material. 
Alternative methods, such as unsupervised clustering methods using rotationally and translationally invariant motif identifiers that act as system-specific fingerprints,^{59} could be used to find features common to the most hydrophobic and hydrophilic patterns. However, in this work, we restrict ourselves to hand-crafted interpretable features of patterning and leave more complex methods as an avenue for future work.

The SVM classifier identified the average clustering coefficient of nonpolar ligands as the most important metric for distinguishing hydrophobic and hydrophilic patterns. Figure 5(b) shows that there is a distinct difference between the average clustering coefficient of nonpolar ligands computed for hydrophobic and hydrophilic patterns for all end groups and polar area fractions, with the difference most pronounced at intermediate polar area fractions (consistent with the widest HFE envelopes in Fig. 3). The impact of clustered nonpolar groups on hydrophobicity has been demonstrated in previous computational studies of patterned chemically heterogeneous surfaces^{12,32,34,36,48} and in experimental studies of *β*-peptides with extended nonpolar domains^{11,26} but, to our knowledge, has not been demonstrated quantitatively for a large number of patterns and different polar end groups. The other clustering and spatial autocorrelation metrics do not distinguish between patterns with large variations in HFEs, emphasizing the importance of clustered nonpolar domains over other possible pattern features. This finding underscores the importance of using a quantitative, data-driven approach with enough sampling to find measurable differences in factors affecting hydrophobicity. Changing thermodynamic conditions, such as the temperature or pressure, or SAM-specific properties, such as grafting density or ligand structure, might lead to systematic changes in SAM hydrophobicity. However, these results show that the impact of clustering of nonpolar end groups on hydrophobicity is universal across the patterns tested for all three surface chemistries. We expect that this result will generalize to other polar groups, given that the hydroxyl, amine, and amide groups individually display distinctly different properties.
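The *l*_{1}-penalized SVM feature selection described above can be sketched with scikit-learn. The feature matrix below is a synthetic stand-in in which only column 0 (playing the role of the nonpolar clustering coefficient) carries signal; the paper's eight metrics come from the graph and autocorrelation calculations:

```python
# Sketch of l1-penalized SVM feature selection; synthetic stand-in data.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_patterns = 500
X = rng.normal(size=(n_patterns, 8))  # eight metrics per pattern
# Binary label: 1 = hydrophobic, 0 = hydrophilic (driven by column 0 only)
y = (X[:, 0] + 0.1 * rng.normal(size=n_patterns) > 0).astype(int)

# The l1 penalty requires dual=False; a small C encourages sparse weights
clf = LinearSVC(penalty="l1", dual=False, C=0.1).fit(X, y)
weights = np.abs(clf.coef_).ravel()
most_important = int(np.argmax(weights))  # index of the dominant metric
```

The sparsity induced by the *l*_{1} penalty zeroes out uninformative metrics, so inspecting the surviving weights is what identifies the dominant feature.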

### Clustering of nonpolar ligands impacts the pinning of water molecules

To explain why the clustering coefficient of nonpolar ligands distinguishes hydrophobic and hydrophilic patterns, we hypothesized that this metric reports on the pinning of the SAM–water interface. Pinning refers to the strong attraction of water molecules to hydrophilic regions of the surface, which reduces water density fluctuations and decreases the likelihood of dewetting [thus contributing to a smaller value of *P*_{v}(0), the probability of observing an empty probe volume, and a larger HFE].^{12,33} Because water molecules are highly correlated due to strong water–water interactions, pinning impacts water density fluctuations across multiple molecular lengths and is influenced by the patterning of chemically heterogeneous surfaces. Pinning can be quantified by using MD simulations to analyze spatial variations in the average height of the SAM–water interface, which tends to decrease near hydrophilic regions due to pinning.^{12,48} In our recent study of surfaces with hand-selected patterns, we found that SAMs with uniformly distributed polar and nonpolar groups more tightly pinned water molecules (leading to lower interfacial heights) and appeared more hydrophilic than SAMs with large nonpolar domains.^{48} We thus sought to investigate whether the clustering coefficient of nonpolar ligands similarly captures a relationship between patterning and pinning for SAMs with patterns that do not fit these hand-selected extremes. We characterized pinning by performing MD simulations of representative SAMs and calculating the SAM–water interface height, which we defined as the difference between the height of a constant water density isosurface (described in the supplementary material) and the height of the center of mass of the ligand end groups.
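The pinning analysis amounts to mapping a local interface height over the surface. A numpy-only sketch under simplified assumptions (a single frame, a uniform synthetic water slab, and an arbitrary isosurface threshold `rho_iso`); a real analysis would read MD trajectories and use the isosurface definition in the supplementary material:

```python
# Simplified single-frame sketch of the interface-height calculation;
# all thresholds and geometry here are illustrative assumptions.
import numpy as np

def interface_height_map(water_xyz, endgroup_com_z, box_xy, n_cells=8,
                         rho_iso=0.5):
    """Per-cell height (relative to the end-group center of mass) of the
    lowest z at which the normalized water density reaches rho_iso."""
    z_edges = np.linspace(endgroup_com_z, endgroup_com_z + 1.0, 51)
    dz = z_edges[1] - z_edges[0]
    heights = np.full((n_cells, n_cells), np.nan)
    # Assign each water molecule to a lateral (x, y) grid cell
    ix = np.minimum((water_xyz[:, 0] / box_xy[0] * n_cells).astype(int), n_cells - 1)
    iy = np.minimum((water_xyz[:, 1] / box_xy[1] * n_cells).astype(int), n_cells - 1)
    for cx in range(n_cells):
        for cy in range(n_cells):
            z = water_xyz[(ix == cx) & (iy == cy), 2]
            hist, _ = np.histogram(z, bins=z_edges)
            dens = hist / max(hist.max(), 1)  # normalized density profile
            above = np.nonzero(dens >= rho_iso)[0]
            if above.size:
                heights[cx, cy] = z_edges[above[0]] + 0.5 * dz - endgroup_com_z
    return heights

# Synthetic water slab beginning 0.2 nm above the end groups:
rng = np.random.default_rng(2)
n_w = 40000
water = np.column_stack([rng.uniform(0.0, 4.0, n_w),
                         rng.uniform(0.0, 4.0, n_w),
                         rng.uniform(1.2, 2.0, n_w)])
h = interface_height_map(water, endgroup_com_z=1.0, box_xy=(4.0, 4.0))
```

Cells with lower values in `h` correspond to tighter pinning; histograms of `h` over frames give distributions like those discussed below for Fig. 6(b).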

Figure 6 shows interface height contours for a representative pair of hydroxyl SAMs with the same polar area fraction but with INDUS-calculated HFEs that differ by ∼15 *k*_{B}*T*. The average clustering coefficient of nonpolar ligands is 0.63 for the more hydrophobic pattern (HFE of 76.9 *k*_{B}*T*) and 0 for the more hydrophilic pattern (HFE of 92.1 *k*_{B}*T*). A larger region of loose pinning (i.e., a region with a high interface height) is apparent for the hydrophobic pattern. In the hydrophilic pattern, the interface height is reduced near the nonpolar ligand that is bordered by polar ligands, reflecting the perturbation of the water structure due to these ligands. Figure 6(b) quantifies these differences by plotting the distribution of interface heights for both patterns. In agreement with prior studies of hand-selected patterns, there is a greater prevalence of larger areas of loose pinning (interface height ∼0.21 nm) for the hydrophobic pattern and of tight pinning (interface height <0.15 nm) for the hydrophilic pattern. Similar distributions were identified for other patterns (examples shown in Fig. S11 of the supplementary material), pointing to a general relationship between pinning and the clustering of nonpolar ligands. These findings illustrate the non-obvious changes to the interfacial water structure that can be related to patterns identified by the GPR model without being selected by hand.

## CONCLUSIONS

We investigated nonadditive contributions to the hydrophobicity of chemically heterogeneous surfaces by training a GPR model that relates patterning to HFEs for SAMs with three different polar end groups (amine, hydroxyl, and amide). To efficiently train the GPR model, we developed a dual-loop active learning scheme that leverages the strengths of two simulation methods to label HFEs: a fast method with lower accuracy (a pre-trained CNN) that is used to explore pattern space and a slow method with high accuracy (INDUS) that is used to identify patterns with large deviations from additive behavior. The active learning scheme enables the GPR model to map 65 536 patterns to HFEs by training on simulation data for only ∼2% of the patterns (labeled with 82 INDUS and 1448 unbiased simulations). GPR-predicted HFEs reveal substantial differences in the impact of patterning and composition on hydrophobicity for SAMs with different polar groups, emphasizing the need for inclusion of diverse polar groups in studies of chemically heterogeneous surfaces. The trained GPR model also consistently identifies patterns with HFEs that substantially deviate from additive behavior. Analysis of pattern features further shows that a metric quantifying the clustering of nonpolar ligands distinguishes more hydrophobic from more hydrophilic patterns at a constant polar area fraction, which we attribute to differences in the local pinning of the SAM–water interface. To our knowledge, this is the first analysis of a large number (thousands) of patterns to quantitatively demonstrate the effect of clustering on the hydrophobicity of diverse chemically heterogeneous SAMs, enabled by the new active learning approach. We have limited this study to the effect of SAM patterning on hydrophobicity for SAMs with high grafting densities and ordered ligand backbones.
Given these simplifications, similar results might be obtained with alternative surfaces that display the same functional groups with comparable densities; SAMs were selected because they are representative experimentally realizable surface coatings. Future work can consider other SAM design metrics (notably, flexibility, surface curvature, and grafting density) to extend these findings to a broader SAM design space.

The success of the dual-loop active learning scheme suggests several possibilities for future investigation. To correct the tendency of the GPR model to underpredict the lower bounds of possible HFEs, future work will investigate machine learning models that exploit more informative data representations (e.g., hydrogen bond graphs^{60,61}) or alternative deep learning architectures (e.g., recurrent neural networks) to improve the accuracy of HFE predictions during the fast loop. We also envision the extension of the active learning scheme to train a single GPR model that maps both polar group chemistry (encoded as a molecular fingerprint, for example) and patterning to HFEs, permitting efficient investigation of a broader range of polar groups. A significant challenge for any data-centric model trained using planar SAM systems is the extrapolation to complex 3D geometries. Previously mentioned data representations, which are not restricted to planar geometries (e.g., graph representations^{60,61}), could be used to aid the transferability of this framework to more topologically complex surfaces in future work. Finally, previous studies have shown that INDUS-calculated HFEs correlate with experimental measurements of hydrophobicity, such as SAM–SAM pull-off force measurements using atomic force microscopy and contact angle measurements.^{33,49,50,62} This correlation suggests that the active learning model developed in this work could be extended to other experimental systems with large design spaces (e.g., *β*-peptides or SAMs with varying end group chemistry^{11,26}). More generally, the active learning approach could utilize experimental measurements as part of the slow loop rather than high-accuracy free energy calculations to further enable connections between simulation and experiment. 
This work, thus, presents a new approach for efficiently exploring SAM hydrophobicity using HFEs as one possible metric, with the potential to apply the dual-loop active learning framework to a broader range of systems or quantitative metrics of surface properties.

## SUPPLEMENTARY MATERIAL

See the supplementary material for details on INDUS and CNN calculations of hydration free energies, details on Gaussian process regression methods, additional comparisons of hydration free energies, and details on clustering and spatial autocorrelation metrics.

## ACKNOWLEDGMENTS

This work was supported by the National Science Foundation (Grant No. 2044997). This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation (Grant No. ACI-1548562). R.C.V.L. also acknowledges support from the 3M Non-Tenured Faculty Award.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

Raw data consisting of labeled trajectories for over 370 patterned SAMs, a pre-processed dataset containing histograms of oxygen and hydrogen number densities for all INDUS-labeled patterns in the dataset, and all files needed to reproduce or extend the trajectories are available at https://doi.org/10.5061/dryad.41ns1rnft. All data and code to reproduce this work are available at https://gitlab.com/atharva-kelkar/dual-loop-active-learning, Ref. 63.

## REFERENCES
