Efficient structure search is a major challenge in computational materials science. We present a modification of the basin hopping global geometry optimization approach that uses a curvilinear coordinate system to describe global trial moves. This approach has recently been shown to be efficient in structure determination of clusters [C. Panosetti *et al.*, Nano Lett. **15**, 8044–8048 (2015)] and is here extended for its application to covalent, complex molecules and large adsorbates on surfaces. The employed automatically constructed delocalized internal coordinates are similar to molecular vibrations, which enhances the generation of chemically meaningful trial structures. By introducing flexible constraints and local translation and rotation of independent geometrical subunits, we enable the use of this method for molecules adsorbed on surfaces and interfaces. For two test systems, *trans*-*β*-ionylideneacetic acid adsorbed on a Au(111) surface and methane adsorbed on a Ag(111) surface, we obtain superior performance of the method compared to standard optimization moves based on Cartesian coordinates.

## I. INTRODUCTION

In just 20 years the complexity of systems studied in surface science has increased by orders of magnitude.^{1} Whereas previous major problems were associated with the adsorption behavior of small adsorbates such as CO on Pt(111),^{2} typical challenges that dominate surface science nowadays are associated with the structural, electronic, and chemical properties of large organic adsorbates and molecular networks on metal and semiconductor surfaces.^{3–5} Here the interaction strength and geometry of systems and the implications for nanoarchitectures, self-assembly, or atomic-scale manipulation are studied by two complementary sets of techniques, namely, by *in vacuo* surface science experiments^{6} and *ab initio* electronic structure simulations.^{7} The capabilities of both experiment and simulation have increased tremendously. On the computational side, previously unthinkable system sizes can be dealt with thanks to the immense increase in computational power and the numerical, as well as algorithmic efficiency of Density-Functional Theory (DFT).^{8} Owing to the latest advances in particular in the treatment of dispersive interactions, such calculations are nowadays able to provide a reliable description of the adsorption structure and energetics of highly complex organic adsorbates on metal surfaces.^{9–11}

An especially important aspect of theoretical surface science that should benefit from this progress is the ability to conduct computational screenings of possible geometries of complex interfaces, be it for determining the atomic configuration of surface reconstructions,^{12} topologies of organic crystals,^{13} or identifying the manifold of all possible compositions and adsorption sites in processes of heterogeneous catalysis and in compound materials.^{14–16} However, one thing that has not changed much in recent years is how, in simulations, optimal adsorption structures are identified. The large number of degrees of freedom in a complex adsorbate, as well as the structural complexity of large surface nanostructures, necessitates a full global search for optimal structures and overlayer phases that includes all possible binding modes and chemical changes that can occur upon adsorption. Knowing the geometric structure of an interface is an important prerequisite to investigating its electronic properties and level alignment, which in turn determines the performance in potential electronic or catalytic applications. An efficient potential energy surface (PES) sampling tool is necessary to enable this, as opposed to just conventional local optimization of several chemically sensible initial guesses. In general, this problem can be solved by applying global geometry optimization methods.^{17,18}

The goal of a majority of such global optimization approaches is to efficiently traverse the PES by generating new trial configurations that are subsequently accepted or rejected based on certain criteria. Multiple reviews on global optimization methods specific to certain types of systems, ranging from small clusters to large biomolecules, have been published recently.^{19,20} In particular, much attention has been attracted by the application of global optimization methods to the study of metal clusters,^{21} binary nanoalloys,^{22} proteins,^{23} water clusters,^{24} and molecular switches.^{25} Recently, global geometry optimization schemes have been specifically developed and applied in the growing field of heterogeneous catalysis,^{26} including application in reaction coordinate prediction,^{27} and biomolecular simulations.^{28} A wide range of different global optimization algorithms has thereby been suggested over the last two decades, ranging from simple classical statistical mechanics simulated annealing schemes^{29} to sophisticated landscape paving,^{30} puddle-jumping,^{31,32} or neural-network controlled dynamic evolutionary approaches.^{33,34} Two prominent and most popular families of global geometry optimization techniques include Monte Carlo based methods, such as basin hopping (BH),^{35} and evolutionary principles based genetic algorithms (GA).^{36} However, no general rule of preference to a specific algorithm exists, as the efficiency of classical global optimization methods is highly system dependent.^{22}

There are many technical aspects that influence the performance of any global sampling technique, such as the choice of initial geometry, the ways of disturbing the configuration during the trial move, the definition of acceptance criteria, the methods employed to calculate potential energies and forces, etc. Many authors were concerned with the efficiency of applied sampling methods and suggested various improvements.^{37–42} Moreover, the possibility of moving from the total energy as the main sampling criterion towards observable-driven^{43} and grand-canonical^{44} global sampling schemes has been suggested. The parameter that plays the most crucial role in method performance, however, is the choice of coordinates suitable for representing the geometries and, most importantly, changes in geometries during the sampling, i.e., the so-called trial moves.^{45} The essential importance of the trial geometry generation step has already been noticed in early system-specific publications.^{45,46} For instance, simple group rotations within proteins^{47} or clusters^{35} already led to significant gains in sampling efficiency. Furthermore, several computationally more expensive ways to produce elementary trial moves were suggested based on short high-temperature molecular dynamics (MD) runs.^{38,48} For certain systems they have indeed proven to be much more effective, allowing to, e.g., optimize large metallic clusters within atomistic models.^{49,50}

From a programmer’s point of view, the perhaps most intuitive representation of the atomic coordinates of trial moves are Cartesian coordinate (CC) displacements. It has been noted though that this most popular representation is often inefficient,^{45,51} due to its chemical blindness, which may easily lead to unphysical configurations (e.g., dissociated structures). CC displacements do not account for the overall geometry and the coupling between coordinates. Different approaches have been proposed to overcome this difficulty, such as virtual-alphabet genetic algorithms,^{52} employing the idea that coordinates should stand as meaningful building blocks,^{53} or genetic algorithms with space-fixed CCs,^{45,46,54} introducing non-traditional genetic operators. The main limitation of these methods, however, is that they remain rather system specific. The choice of suitable coordinates for global optimization should rather be system independent, while at the same time adapted to the chemical structure. Especially in the case of large adsorbates, the completely unbiased way in which the structural phase space is sampled in CCs does not allow for an efficient structural search for conformational changes upon adsorption that leave the molecular connectivity intact. The chemically most intuitive set of coordinates for such a situation would instead be internal coordinates (ICs), namely, bond stretches, angle bends, or torsions. Already in the seminal work on BH of Wales and Doye, the potential usefulness of such coordinates has been noted^{35} and later shown to be beneficial for structures connected by double-ended pathways.^{55}

One of the first applications of the idea of using ICs for global geometry optimization was reported in the context of protein-ligand docking, which developed into the so-called Internal Coordinate Mechanics (ICM) model,^{56} further extending the above-mentioned concept of meaningful building blocks.^{53} Another example of global optimization in ICs was the attempt of introducing dihedral angles into the framework of the so-called deterministic global optimization.^{57} Global optimization of clusters and molecules in ICs using the Z-matrix representation was suggested by Dieterich and Hartke.^{58} However, automatic construction of a Z-matrix is close to impossible for larger, more complicated systems, especially organic molecules containing rings. Besides, it also does not eliminate the problem of coordinate redundancies, which generally limits the applicability of this approach.

Especially suitable for a system independent description of complex molecular structures are instead so-called delocalized internal coordinates (DICs), i.e., non-redundant linear combinations of ICs, that have been extensively used for efficient local structure optimization of covalent molecules,^{59–62} crystalline structures,^{63–65} and for vibrational calculations.^{66,67} In our recent work,^{68} we implemented such automatically generated collective curvilinear coordinates in a BH global sampling procedure. The similarity of these coordinates to molecular vibrations does yield an enhanced generation of chemically meaningful trial structures, especially for covalently bound systems. In the application to hydrogenated Si clusters, we correspondingly observed a significantly increased efficiency in identifying low-energy structures when compared to CC trial moves and exploited this enhancement for an extensive sampling of potential products of silicon cluster soft landing on the Si(001) surface.

In the present work, we provide a detailed methodological account of this curvilinear coordinate global optimization approach^{68} and extend it to a conformational screening of adsorbates on surfaces. We do this by introducing constraints and extending the coordinate system to include lateral surface translations and rigid adsorbate rotations. These curvilinear coordinates are constructed automatically at every global optimization step and, similarly to the original BH scheme, we pick a random set of coordinate displacements with which a trial move is attempted. Testing this for the *trans*-*β*-ionylideneacetic acid molecule adsorbed on a Au(111) surface (as an example of a complex functionalized organic adsorbate), and for methane adsorbed on a Ag(111) surface (as an example of a small adsorbate with a shallow PES and many minima), we find that DIC trial moves increase the efficiency of BH by both a more complete sampling of the possible surface adsorption sites and by reducing the number of molecular dissociations.

The paper is organized as follows: in Section II we outline traditional approaches to global structure optimization, explain the construction of DICs and our definition of a complete coordinate set for adsorbates on surfaces, and how these coordinates are applied for conformational structure searches in a global optimization framework such as basin hopping. In Section III we summarize the computational details of our benchmark studies on the surface-adsorbed aggregates presented in Section IV. We conclude our work in Section V. Additional remarks on software and algorithms can be found in Appendix A, and details on DFT calculated diol stability can be found in Appendix B.

## II. METHODS

### A. Global optimization — basin hopping

A variety of global optimization procedures exists to date, including simulated annealing,^{29} genetic algorithms,^{36} BH,^{17,35} and many variants and combinations thereof.^{37–39} One reason for the success of basin hopping might be the intriguing simplicity of the approach, summarized as follows:

displace a starting geometry with a random

*global*Cartesian coordinate (CC) trial move Δ**x**normalized to a step width*dr*;perform a

*local*structure optimization to a minimum energy structure with energy*E*;_{i}- pick a random number and accept the new minimum energy structure with the probabilitywhere$P(\Delta Ei)=exp\u2212Ei\u2212EminkBTeff,$
*T*_{eff}corresponds to an effective temperature and*E*_{min}refers to the energy of the current lowest energy structure; if the structure is accepted, place it as a new starting point and proceed with point 1.

Following this procedure, as illustrated in Fig. 1, the PES is sampled until all chemically relevant minima are found. A sufficiently high *T*_{eff} ensures that also higher-lying minima are accepted with adequate probability and a large area of the PES can be sampled. The size of trial moves Δ**x** must be large enough to escape the basin of attraction of the current local minimum, but small enough not to lead to regions of the PES that are of little interest, e.g., dissociated structures or structures with colliding atoms. The corresponding displacement Δ**x** is constructed from random Cartesian vectors normalized to the chosen step width. This approach has proven highly efficient for optimization of clusters and biomolecules.^{20,69} In the following, we will discuss a modification of this general algorithm only with respect to point 1 by changing the construction of the displacement vector Δ**x**.^{68}

### B. Construction of delocalized internal coordinates

ICs can be defined in many different ways. A typically employed approach is to define a set of coordinates in a body-fixed frame as a set of bond stretches, angle bends, torsional modes, and out-of-plane angles.^{70} Any displacement Δ**x** from a given point **R**_{x0} in Cartesian coordinate space can then be transformed to a set of internal coordinate displacements Δ**q** by

where

The set of coordinates in **q**-space can be chosen to be non-orthogonal and complete (e.g., via a Z-matrix); however, this is often difficult and highly system specific.

An alternative approach is the construction of a highly redundant set of *M* > (3*N* − 6) primitive ICs to represent the independent degrees of freedom of an *N* atom system.^{71} Pulay, Fogarasi, Baker, and others^{60–62,66,71,72} have shown that an orthogonal, complete set of coordinates can be constructed from such a redundant coordinate set simply by singular value decomposition (SVD) of the matrix **G** = **B**^{†}**B**. By construction, matrix **G** is positive semi-definite and Hermitian and satisfies

In Eq. (4), **U** is a set of vectors in the space of primitive ICs **q** that defines linear combinations thereof. SVD yields a subset of (3*N* − 6) such independent vectors with non-zero eigenvalues **Λ** that constitute a fully orthogonal system of DICs.^{61} These coordinates span all primitive ICs that were previously defined in **B**. **U** is therefore nothing else than a Jacobian that transforms displacements in the space of primitive internals **q** into displacements in the space of DICs **d**,

In the last equation, we have defined the transformation matrix $B\u0303$ of dimension (3*N* − 6) × 3*N* that transforms CC into DIC displacements. Unfortunately the back transformation cannot simply be done by inversion as the matrix $B\u0303$ is non-quadratic and therefore singular. However, we can define a generalized inverse^{62} that connects displacements in CC and DIC space

where

Equation (6) can easily be verified by multiplication from the left with $B\u0303$. Utilizing the above expressions, we have thus established a bijective transformation between the displacements Δ**x** and Δ**d**. However, transformation of the absolute positions of atoms in a molecule, their nuclear gradients, and second derivatives requires iterative backtransformation from DIC to CC space. This is due to the fact that $B\u0303$ is simply a linear tangential approximation to the curvilinear hyperplane spanned by coordinates **d**. Further details on DIC transformation properties can be found in Ref. 62.

In summary, the here presented construction allows to define displacements in terms of a random set of DICs Δ**d** and expression thereof as CC displacements Δ**x**, which can then be used to generate a new trial structure from a starting geometry (cf. basin hopping procedure, point 1).

### C. Curvilinear coordinates for adsorbates and aggregates

In the above presented formalism, we have constructed (3*N* − 6) DICs, which fully describe all (body-fixed) deformations of a molecule. By construction, overall translations and rigid rotations of isolated molecules and clusters are removed due to their invariance with respect to these coordinates. Re-introducing these degrees of freedom can in some cases be useful, for example, in the description of molecules or clusters adsorbed on surfaces, in dense molecular arrangements, or organic crystals. In such systems the position and orientation of a tightly bound subsystem, such as a molecule, with respect to the rest of the system, such as a surface or the molecular neighbors, are determined by overall weaker interactions and forces.

One can account for this specific chemistry of the system by partitioning it into two (or more) subsystems (e.g., adsorbate and surface), for each of which one constructs a set of 3 rigid translations, 3 rigid rotations, and (3*N*_{a} − 6) DICs, where *N*_{a} is the number of atoms in subsystem a. In the following, we will refer to such a set of coordinates as *complete* delocalized internal coordinates (CDICs). This idea is illustrated in Fig. 2. The pictured system, namely, methane on a Ag(111) surface, is partitioned into molecule and surface, and each of these two subsystems is assigned its own set of coordinates, the concatenation of which comprises a full-dimensional coordinate vector that is equivalent to CCs. The translations can be added as simple Cartesian unit vectors of the center of mass (or any set or subset of atoms), as symmetry-adapted linear combinations thereof, or even as fractional CCs in a unit cell, for example, to adapt to the symmetry of a given surface unit cell. For the definition of sub-system rotations, we have chosen to employ nautical or Tait-Bryan angles. Hereby the three rotation angles Φ, Θ, and Ψ (also called yaw, pitch, and roll) are defined around three distinct rotation axes with respect to a reference orientation. This reference orientation is given by the molecular geometry from which the DICs are constructed and establishes an Eckart frame. Changes to orientation are identified and applied by standard rigid-body superposition using Quaternions.^{73}

In the systems where the above partitioning makes sense chemically, the concomitant decoupling of coordinates may be beneficial for the description of dynamics and global optimization. In cases where, for example, no significant surface reconstruction is expected upon adsorption, it may not be useful to construct DICs for the substrate, where chemical bonding is more isotropic. Instead, CDIC partitions can be constructed for arbitrary subsets of coordinates (e.g., the adsorbate) in order to then be combined with the remaining CCs (e.g., the metal substrate). Finally, CDICs (as well as DICs) make the definition of arbitrary coordinate constraints fairly straightforward as is shown in Subsection II D.

### D. Constraints on delocalized internal coordinates

Starting from a set of DICs, constraints on primitive internals such as bond stretches, angle bends, or torsions can be applied by projection from the initial set of DIC vectors **U**.^{61} The constraint vector **C**_{i} is taken as a simple unit vector in the space of primitive ICs **q**,

Correspondingly, the projection of this constraint vector onto the eigenvector matrix **U**

yields a vector $C\u0303i$ in the space of DICs **d** (denoted by the tilde sign). All vectors **U**_{j} need to be orthogonalized with respect to **C**_{i} (e.g., using Gram-Schmidt orthogonalization) to ensure that the primitive coordinate *i* is removed from the DICs. The resulting matrix **U**′ is then used to transform back to CCs using the inverse of $B\u0303\u2032=BU\u2032$ in the same way as in Eq. (6).

Constraints on sub-system translations or rotations in CDICs can be applied by simply removing them from the available set of coordinates in **d**-space. The iterative nature of the backtransformation to absolute CCs may require an iterative coordinate constraint algorithm such as SHAKE.^{74} Cartesian constraints can be applied by simply excluding atoms or CCs from the reference geometry from which the DICs are constructed, or by projections in Cartesian space **x** equivalent to the above projections in **q**-space.^{75}

### E. Global optimization with curvilinear trial moves

Summarizing the above, we have defined a set of complete, orthogonal curvilinear coordinates describing all internal motions of molecules, which can be constructed in a fully automatic manner from a set of CCs. For systems with multiple molecules, aggregates, or for surface-adsorbed molecules, we can supplement these with translational and rotational degrees of freedom for the individual subsystems. Any changes or displacements applied in these coordinates can be transformed back to CCs and be readily applied in this space.

Displacements in the form of DICs or CDICs yield molecular deformations that are more natural to the chemistry of the system, simply due to their construction based on the connectivity of atoms. We have recently shown that application of such coordinates in the framework of global optimization methods can largely facilitate a structure search by preferential bias towards energetically more favorable geometries.^{68} Modification of a global optimization algorithm in terms of DIC trial moves on the example of the BH procedure sketched above can be summarized as follows:

Displace a starting geometry with a random DIC (CDIC) trial move Δ

**d**. Therefore, we have the following:construct a set of DICs (CDICs);

apply constraints, if necessary;

pick a random (sub-)set of DICs (CDICs) and normalize them to a given step width

*dr*; andtransform coordinates back to Cartesian space using Eq. (6) and apply them to the structure.

Perform a

*local*structure optimization to a minimum energy structure with energy*E*._{i}- Pick a random number and accept the new minimum energy structure with the probability$P(\Delta Ei)=exp(\u2212Ei\u2212EminkBTeff).$
If the structure is accepted, place it as a new starting point and proceed with point 1.

## III. COMPUTATIONAL DETAILS

We consider three test systems: retinoic acid (ReA, cf., Fig. 3) in gas phase; *trans*-*β*-ionylideneacetic acid^{76} (*β*-acid, cf. Fig. 4) adsorbed on a six-layer, (5 × 5) surface unit cell Au(111) slab; and methane (CH_{4}) on a four-layer, (2 × 2) surface unit cell Ag(111) slab. All test systems were built using the ASE (Atomic Simulation Environment) package.^{77} Energetics for the global optimization runs were calculated with Density Functional-based Tight-Binding (DFTB) performed with the DFTB+ V1.1 code.^{78,79} To correctly account for surface-molecule interactions in the *β*-acid on Au(111) case, the Tkatchenko-Scheffler screened dispersion correction method vdW^{surf} ^{80} was applied as an *a posteriori* correction to the DFTB results. Since vdW^{surf} was not available in DFTB+ at the time, we have neglected the environment-dependent rescaling of atomwise dispersion coefficients. Environment-dependence of dispersion interactions in conjunction with tight-binding methods has been introduced recently.^{81} As convergence criterion for the self-consistent DFTB charges, 10^{−7} electrons was used for all systems. DFTB parameters for interaction between C, H, O, Au, and Ag have been generated as described in Refs. 82 and 83 and are available upon request. Si and H parameters are described in Ref. 84. We have used the refinements as described in Ref. 85.

We implemented the coordinate generation as well as a BH procedure in a modular python package (winak, see Appendix A).^{86} The set of redundant internal coordinates was constructed by including bond lengths, bending angles, and dihedral angles. Bond lengths were included if they were within the sum of the covalent radii of the atoms plus 0.5 Å. Bending angles were included if the composing atoms were bonded to each other and if the bending angle was below 170°. Dihedral angles were included if the composing atoms were bonded and if the dihedral angle was smaller than 160°. The effective temperature in the BH procedure was set to 300 K. Trial moves in unconstrained DICs/CDICs and with all stretches constrained (constr. DICs/CDICs) were always constructed using linear combinations of a varying percentage of all modes, e.g., 10%, 25%, 75%, or 100% of all available coordinates. These subsets were randomly chosen, and in the construction of the displacement vector, each DIC mode was scaled with a random factor in the range [ − 1; 1]. CC displacements were created by assigning a random value in the range [ − 1; 1] to the *x*, *y*, and *z* coordinates of every atom. As a normalization, all displacements were divided by the single largest component in CCs, such that the maximum absolute displacement of an atom in *x*, *y*, or *z* direction was 1 Å. Subsequently the whole displacement vector was multiplied by a tunable dimensionless factor *dr*, i.e., the step width introduced above. The global minimum was always used as initial starting geometry and local optimizations were performed until the maximum residual force was smaller than 0.025 eV/Å. Local optimizations were performed using a standard quasi-Newton optimizer and Cartesian coordinates.

## IV. RESULTS AND DISCUSSION

In our recent work, we have already described the efficiency of the DIC-based structure search approach for metallic and hydrogenated silicon clusters in gas phase and when adsorbed on surfaces.^{68} In the following, we shortly review these results in the present context and proceed to evaluate the efficiency and applicability of structure search based on DIC (CDIC) trial moves for global optimization of molecules in gas phase and adsorbed on surfaces.

### A. Performance of delocalized internal coordinates for isolated molecules and clusters

In our previous work,^{68} we have compared the efficiency of BH when using CCs and DICs as trial moves by performing a series of 500-step runs with different parameters (step width, percentage of DICs) for a hydrogenated silicon cluster Si_{16}H_{16} both in isolation and adsorbed on a Si(001) surface. Regardless of the choice of parameters, all the runs in DICs were shown to sample a more relevant region of the PES, that is, the minima identified with DIC trial moves are more often intact and the distribution of their energies is invariably centered around lower energies compared to the corresponding CC run with equivalent parameters. Furthermore, CC-based runs almost always failed to identify the global minimum, whereas almost all DIC-based runs found it. Finally, as DIC trial moves produce less strained geometries, the local relaxation at each global step necessitates on average 30% fewer local optimization steps, which drastically reduces the overall simulation time.

We ascribe the superior performance of DICs to their collective account of the structure and chemistry of the system. DICs by construction take into account the connectivity of atoms to produce collective displacements that resemble vibrational modes. Such displacements are largely composed of concerted motions of groups of atoms, thus naturally suitable to explore configurational subspaces in which certain bonding patterns are preserved. The latter condition is particularly desirable in the investigation of both the gas-phase and the adsorption behavior of large functional molecules, where two intertwined aspects need to be addressed: (i) the structural integrity is a necessary prerequisite to enable functionality; (ii) even small conformational changes, or minor isomerizations, can determine dramatic variations of electronic properties (e.g., of molecular switches). The structural screening of molecules of this kind should therefore be constrained to subspaces of the PES that satisfy these aspects, while ideally retaining a sufficient degree of flexibility to be able to capture non-intuitive, potentially relevant geometries.

In the case of large organic molecules, we need to additionally account for the fact that their degrees of freedom show largely varying bond stiffness. High-frequency stretch motions will, for example, generally not be relevant for a conformational structure search. To test the efficiency of the DIC-BH approach in such cases, global optimization of isolated ReA (cf. Fig. 3) has been performed using 100 trial moves based on random CCs and displacements along varying subsets of DICs. Table I compiles the number of symmetry-inequivalent minima found. It is evident that random CCs are particularly ill-suited for large organic molecules. In most of the cases the molecule ended up dissociated and once the molecule is torn apart, regaining an intact structure by random CC displacements is highly improbable. For most applications involving organic molecules, dissociated structures are not of interest and the computing time used for the local optimization step is essentially wasted. When using random CCs, this can amount to up to 95% of the overall computing cost.

. | Step width dr
. | |||
---|---|---|---|---|

Coords. . | 0.55 . | 0.70 . | 0.90 . | 1.20 . |

CC | 1 | 0 | … | … |

DIC 10% | 2 | 2 | 3 | 3 |

DIC 25% | 5 | 2 | 2 | 1 |

DIC 100% | 1 | 3 | 2 | 1 |

Coords. | 0.55 | 0.70 | 1.50 | 2.00 |

Constr. DIC 10% | 2 | 3 | 8 | 8 |

Coords. | 0.70 | 0.90 | 1.50 | 2.50 |

Constr. DIC 25% | 2 | 4 | 5 | 9 |

Constr. DIC 100% | 2 | 6 | 4 | 9 |

. | Step width dr
. | |||
---|---|---|---|---|

Coords. . | 0.55 . | 0.70 . | 0.90 . | 1.20 . |

CC | 1 | 0 | … | … |

DIC 10% | 2 | 2 | 3 | 3 |

DIC 25% | 5 | 2 | 2 | 1 |

DIC 100% | 1 | 3 | 2 | 1 |

Coords. | 0.55 | 0.70 | 1.50 | 2.00 |

Constr. DIC 10% | 2 | 3 | 8 | 8 |

Coords. | 0.70 | 0.90 | 1.50 | 2.50 |

Constr. DIC 25% | 2 | 4 | 5 | 9 |

Constr. DIC 100% | 2 | 6 | 4 | 9 |

The use of DIC displacements already greatly improves on this. Steps that lead to dissociations can be reduced to 75% of the overall amount of steps, while still finding more relevant minima compared to random CCs. The code responsible for creating the ICs automatically detects dissociations, such that they can be discarded directly at runtime prior to local optimization. This shows that DICs can already be useful in their most general form. However, for an efficient search of the conformational phase space of a large organic molecule, this can even further be improved by removing bond stretches as explained in Section II. When the bond stretches are fully constrained (constr. DICs in Table I), a much finer screening of the PES is possible, and more unique and intact minima are found. Many different conformers (for example, eclipsed/staggered stereochemistry) can be observed which would be extremely difficult to find using trial moves based on CCs or unconstrained DICs. Dissociation events can be reduced to as low as 33% of all steps even at large step widths.

While in the simple example of ReA global optimization might not be necessary to identify the most relevant conformations, these results demonstrate that customizing DICs with structure-preserving constraints can make them more applicable to covalently bound systems. We conclude with a note on the step width and percentage of DICs. Due to the normalization, the result of the algorithm is highly dependent on the percentage of DICs used. The more DICs a displacement is made up of, the less influence a single DIC has on the overall displacement. Individual DICs contribute more to the 10% than to the 25% DIC displacements. To find certain minima, some atoms might have to be moved over a few Ångström, so a large step width would be of advantage. However, for our test system, CC displacements failed to find a single minimum already for a step width as small as 0.70. For DIC displacements, the step width can and should be chosen much higher than that. Optimization runs using constr. DIC displacements performed best for a step width of above 2.00.

### B. Performance of delocalized internal coordinates for molecules on surfaces

We proceed to apply the here presented approach to two molecules adsorbed on surfaces: a *β*-acid molecule on Au(111) and methane on Ag(111). The first case exhibits a large number of internal degrees of freedom, and the second a shallow PES regarding translations and rotations on the surface.

#### 1. Trans-*β*-ionylideneacetic acid adsorbed on Au(111)

Even though there might be crystallographic data for large biological molecules, the current resolution of experiments is often insufficient to resolve the individual atomic positions of larger functional molecules adsorbed on surfaces. Closely related to their role in nature, functional molecules could act as molecular switches when adsorbed on a metal surface.^{4,87,88} Therefore, prediction of key structural elements by electronic structure methods is crucial. We study the performance of CDIC-based global optimization on the example of a smaller analogue of ReA, namely, *trans*-*β*-ionylideneacetic acid^{76} (*β*-acid, cf. Fig. 4) adsorbed on a Au(111) surface. As an initial starting geometry, the molecule in the conformation representing the global minimum in gas phase was placed flat lying at a random adsorption site on the surface, and the structure was relaxed using local optimization. No surface relaxation was allowed. This was used as a starting point for all BH runs. As a reference, 100 global steps were performed for step widths of 0.25, 0.5, and 1.0 using CC trial moves. This is compared to 100 global step runs using 10%, 25%, and 100% mixtures of constr. CDICs, all with a step width of 1.5.

For surface-adsorbed molecules, the shortcomings of standard CC displacements become even more apparent as shown in Table II. In 70%–90% of all global optimization steps, a dissociated structure is obtained. Most commonly the cyclohexenyl ring is torn apart or the conjugated side chain is broken. In the context of conformational switching, this would render this minimum irrelevant for the process under study. Keeping important structural motifs intact is difficult with CC displacements. One could argue that setting a smaller step width remedies this problem; however, our results show that, as the step width decreases, the starting geometry is found with higher probability. There seems to be no optimal setting for the step width parameter: either the molecule dissociates or the sampling is not able to escape the initial basin of attraction. As a result, we did not obtain any single new geometry other than the starting point and there was almost no lateral adsorption site sampling. CC trial moves are thus not applicable to this case. The result of the sampling is changed dramatically by the introduction of CDIC trial moves with constrained stretches (constr. CDIC, cf. Table III). Using 25% and 100% of available constr. CDICs reduces dissociation events to about 40% of all steps. For 10% constr. CDIC displacements at a step width of 1.5, the amount of dissociations is comparable to CC trial moves using a step width of 0.25, however, at largely different sampling efficiency. When fewer DICs are mixed into a displacement, the influence of each individual DIC in the final trial move is larger due to normalization. As a consequence and consistent with the gas-phase sampling results above, 10% constr. CDIC displacements introduce the most significant changes to the molecular geometry, i.e., result in more dissociations. Inversely, 100% constr. CDIC displacements sample the starting geometry more often than 10% or 25% constr. CDIC trial moves. As the number of translations and rotations in comparison to the number of internal degrees of freedom in the overall displacement is smaller at higher percentages of CDICs, lateral sampling is less efficient than when using 25% or 10% constr. CDIC trial moves. It is not straightforward to find the optimal position and orientation of a complex adsorbate like the one studied here, and efficient lateral sampling is therefore strongly desired. In contrast to standard CC based trial moves, different rotations and various adsorption sites of the molecule have been sampled during all three runs. A closer analysis of this aspect will be presented in Sec. IV B 2.

. | Step width dr
. | ||
---|---|---|---|

CC trial moves . | 0.25 . | 0.5 . | 1 . |

Dissociations | 73 | 93 | 98 |

Starting geometry | 24 | 7 | 2 |

Different adsorption site | 3 | 0 | 0 |

New structures | 0 | 0 | 0 |

. | Step width dr
. | ||
---|---|---|---|

CC trial moves . | 0.25 . | 0.5 . | 1 . |

Dissociations | 73 | 93 | 98 |

Starting geometry | 24 | 7 | 2 |

Different adsorption site | 3 | 0 | 0 |

New structures | 0 | 0 | 0 |

. | Amount of constr. CDICs . | ||
---|---|---|---|

Constr. CDIC . | 10% . | 25% . | 100% . |

Dissociations | 62 | 38 | 37 |

Starting geometry | 8 | 3 | 30 |

Different adsorption site | 19 | 48 | 23 |

New structures | 11 | 11 | 10 |

. | Amount of constr. CDICs . | ||
---|---|---|---|

Constr. CDIC . | 10% . | 25% . | 100% . |

Dissociations | 62 | 38 | 37 |

Starting geometry | 8 | 3 | 30 |

Different adsorption site | 19 | 48 | 23 |

New structures | 11 | 11 | 10 |

Conformational structure search for *β*-acid on Au(111) serves the purpose of finding new and stable configurations. Especially interesting are *cis* isomers (cf. Fig. 5, structures B and C) since they could play an active role in a molecular switching process. These structures are particularly hard to uncover by unbiased sampling, since a concerted motion of a large number of atoms is necessary. We were able to find such minima in two out of three constr. CDIC based runs (10% and 100% constr. CDIC trial moves) with a relatively small number of steps per run (25 and 88 global steps, respectively). It should be noted that CC based sampling, however, was not able to uncover any new conformations apart from the starting geometry.

Other minima of interest are conformations of the molecule that might all contribute to the apparent finite-temperature geometry observed in experiment, such as different ring conformations (chair, boat, half-chair, etc.) or changes in internal degrees of freedom (chain to ring angle, staggered vs. eclipsed conformations of methyl groups, etc.). The broadest range of such structures has been identified using 10% constr. CDIC displacements and a step width of 1.5. Configurations were found where the ring was bent towards the surface and away from it. We have also found structures where the ring is in a boat configuration (cf. Fig. 5, structure A) or twisted with respect to the chain. Methyl groups both on the ring and on the chain were found in different rotations (staggered conformation). Another group of minima involved modifications of the acid group, rotating either the OH or the whole COOH group by 180°. 25% constr. CDIC trial moves found similar minima, but were not able to sample changes in the acid group. Instead, the boat configuration of the ring was sampled more often. The third setup (100% constr. CDIC displacements) did not result in a completely rotated acid group; the OH group was found flipped once. This run was the only one that sampled half-chair conformations of the ring. A somewhat unexpected result was the repeated discovery of a diol structure (cf. Fig. 5, structure D) in all constr. CDIC runs, while CC based runs always failed to find this structure. We have performed DFT calculations that predict a stabilizing effect of the surface by 0.22 eV on this structure as compared to the gas-phase minimum conformation; however, the acid structure nevertheless remains the global minimum (see Appendix B). The energetically most favorable structures that were found by all runs are slight variations of the global minimum conformation found in gas phase, adsorbed such that the oxygen atoms are situated above a bridge site.

In summary, CC displacements were found to either repeatedly sample the starting geometry at small step widths or dissociated structures at larger step widths. Using CC trial moves we were not able to find any step width that is both large enough to overcome barriers on the PES and simultaneously small enough not to dissociate the molecule. Expressing the displacements in constr. CDICs, however, enables a finely tunable sampling of phase space. Indeed a small number of steps are sufficient to find a plethora of molecular configurations on the surface as well as different adsorption sites and rotations.

#### 2. CH_{4} adsorbed on Ag(111)

The example of *β*-acid on Au(111) illustrated the importance of chemically adapted coordinates in sampling the internal degrees of freedom of an adsorbate on a surface. To better understand the sampling of combined surface/adsorbate degrees of freedom such as the lateral adsorption site and the orientation of the molecule on the surface, we next study CH_{4} (methane) adsorbed on Ag(111). Different adsorption sites of CH_{4}/Ag(111) are found to have similar stability and the PES is dominated by small barriers and little corrugation between these sites. As a consequence, it should be an ideal test system for which we can sample all adsorption geometries in the unit cell.

Even though the system appears simpler than the previous test cases, a large number of distinct minima exists. The surface has four different high-symmetry adsorption sites (cf. Fig. 6) and although the molecule is highly symmetric, CH_{4} can adsorb in multiple rotational orientations with respect to the surface. In principle, most of these minima might be found by visual inspection; however, as emphasized by Peterson,^{26} as soon as more than one adsorbate is introduced or if the surface features steps or defects, combinatorics makes brute-force sampling and analysis methods quickly intractable. We performed global optimization runs with Cartesian displacements (step width 0.4) as well as constr. CDIC displacements made up of 25% and 75% of the available coordinates (step width 1.25) until 260 intact CH_{4} minima were found. This took 600 global optimization steps with CC trial moves and 500 with constr. CDIC trial moves, respectively. Step widths were chosen according to the best performance in shorter test runs (100 steps each). As initial starting geometry, the molecule has been positioned at a top site with one hydrogen atom pointing away from the surface, followed by local optimization (cf. left-hand side of Fig. 6). Optimization results have been obtained for CH_{4} in a (2 × 2) unit cell to mimick a lower coverage at the surface.

To achieve a rigorous sampling on a metal surface, the first prerequisite is a complete lateral exploration of the surface unit cell. In our test case, this means that the simulation has to traverse through all four distinct high-symmetry adsorption sites (right-hand side of Fig. 6). We therefore monitor the position of the carbon atom in all intact methane minima identified over the course of the sampling run and plot the result in a 2D histogram in Fig. 7. It immediately stands out that CC trial moves fail to sample larger areas of the surface unit cell. This includes rare sampling of the bridge sites, the center of the fcc hollow site, as well as the surroundings of the hcp hollow site. Top sites are instead sampled repeatedly, and computational sampling time is wasted in these revisits. In the case of constr. CDIC trial moves, the distribution of identified minima is more homogeneous and all four high-symmetry sites are visited. An especially interesting finding is that the most often sampled areas are not around the starting site, but around the bridge sites. There is also a slight preference for the fcc hollow site. Since translations are explicitly included in constr. CDIC displacements on surfaces, it is not a big surprise that our trial moves perform better in this regard. In contrast, a random CC trial move is unlikely to describe a concerted molecular translation across the surface.

The second characteristic of a thorough structural sampling in this case is the variety of rotational arrangements of the methane molecule that are found. CC trial moves mostly produce geometries close to the top sites with one hydrogen atom pointing towards the center of the corresponding surface silver atom. A minimum with two hydrogen atoms wedged in-between the bridge sites, the others pointing towards the top sites, is found twice. The most stable configuration, which is only slightly more favorable than other minima, corresponds to adsorption at the hcp hollow site, with the hydrogen atoms rotated to lie above the ridges between silver atoms (staggered). This geometry at the hcp and fcc hollow site was found in the BH run based on CC trial moves. The constr. CDIC trial moves were instead able to find all these rotational arrangements, as well as a number of others. Both runs (25% and 75% constr. CDIC) sampled a variety of different orientations at the bridge sites and close to the top sites as well. The 75% constr. CDIC run even found the molecule flipped around completely once, with one of the hydrogen atoms pointing towards the surface. 25% constr. CDIC trial moves were the only ones that produced an eclipsed rotational configuration of hydrogen atoms at both hollow sites, with the H atoms oriented along the close-packed rows of the Ag(111) surface.

Fig. 8 shows how the discovery of new, unique CH_{4} minima occurred over the course of the global optimization runs. In all three runs combined, a total of 20 symmetry-inequivalent minima was found, of which the CC based run was only able to find 8. All these minima were also identified in either one or both of the constr. CDIC runs. An interesting finding here is that the constr. CDIC runs found twice as many inequivalent geometries as the CC run already after about 100 global optimization steps. After an initial phase of 150 steps, CC trial moves do not lead to any more new minima, whereas constr. CDIC runs continue to find more. Both effects together clearly show the superior performance of constr. CDIC trial moves compared to simple CC trial moves.

Up to this point we have focused on geometries where the CH_{4} molecule remained intact. However, CDIC trial moves are equally able to efficiently sample the space of chemically different minima that involve hydrogen dissociations and chemical reactions on the surface. CC displacements are completely defined by the single parameter of step width *dr*, and there is no straightforward way of fine tuning the search target, for example, in terms of focusing on mainly dissociated or mainly intact geometries of CH_{4}. CDIC displacements, however, enable a straightforward inclusion of constraints that confine configurational space, for example, by constraining bond stretches. In order to analyze the distribution of dissociated and intact structures throughout a BH run, we additionally performed 500 steps of an unconstrained CDIC run using 25% of available coordinates and a step width of 1.6. Figure 9 shows the distribution of intact and dissociated minima for different displacement methods. As also found in the case of *β*-acid on Au(111), constr. CDIC displacements effectively restrict the geometry search to intact CH_{4} molecules. More importantly, unconstrained CDIC trial moves find a similar distribution of dissociated and intact structures as CC displacements and are therefore equally suited to sample the space of reactive intermediates at the surface. However, this comes at a higher sampling rate of unique minima than achieved with CC trial moves.

Lastly, there is one more mechanism with which DICs increase the computational efficiency of global optimization. The most expensive part of a global optimization step is the local optimization, which typically uses a large number of steps to converge to the local minimum energy structure. Reducing the average number of such local optimization steps can lead to a significant computational speed-up. For all test cases we find that constr. CDIC trial moves generally generate geometries that are less strained compared to the ones produced by CC trial moves. This is emphasized by the fact that geometries created from CC trial moves take almost twice as many steps for local convergence as ones generated by 75% constr. CDIC displacements, namely, on average 528 BFGS^{89–92} steps compared to 267 using a force threshold of 0.025 eV/Å. As explained in Sec. IV B 1, using smaller percentages of constr. CDICs introduces more severe changes to the molecule. Therefore 25% constr. CDIC displacements take slightly longer for convergence as 75% constr. CDIC trial moves (309 steps). Overall this nevertheless constitutes a significant reduction in computational cost, consistent with what we already observed in our previous work on silicon clusters.^{68} In summary, we thus find that global structure search with DIC trial moves not only finds more unique minima than CC displacements, but also does so at reduced computational cost as well.

## V. CONCLUSIONS AND OUTLOOK

We presented a modification of the basin hopping algorithm by performing global optimization trial moves in chemically motivated curvilinear coordinates that resemble molecular vibrations. This approach has recently been shown to be efficient in structure determination of clusters^{68} and was here investigated and extended for the application to organic molecules both in gas phase and adsorbed on metal surfaces. The chemical nature of the collective displacements enables straightforward inclusion of rotations, translations, as well as constraints on these and arbitrary internal degrees of freedom. This allows for a finely tunable sampling of practically relevant areas of phase space for complex systems that appear in nanotechnology and heterogeneous catalysis, such as hybrid organic-inorganic interfaces,^{11} organic crystals, self-assembled nanostructures, and reaction networks on surfaces. These systems feature heterogeneous chemical bonding, where stiff energetically favorable bonding moieties, such as covalent molecules, coexist with weak and flexible bonding forces such as interactions between molecules or between a molecule and a surface.

For the show cases *β*-acid adsorbed on Au(111) and methane adsorbed on Ag(111) we find that collective internal coordinate based trial moves outperform Cartesian trial moves in several ways: (i) CDICs systematically identify a larger number of structures at equal number of global optimization steps, thus allowing to reduce the number of necessary global optimization steps; (ii) constr. CDICs are able to restrict structure searches to well defined chemically motivated subdomains of the configurational space and thus find more relevant structures; (iii) DICs generate less strained structures and reduce the computational cost associated with a single global optimization step.

In future work we plan to utilize the variability of singular-value decomposed collective coordinates in the context of materials structure search to facilitate simulation of surface reconstructions and defect formation, as well as crystal polymorphism and phase stability in organic crystals and layered materials. So far we have only discussed the relevance of these coordinate systems in the context of basin hopping; however, displacement moves are a common element of many different stochastic global optimization strategies, such as, minima hopping, where curvilinear constraints can help guide the MD.^{26} Our implementation can easily be extended to feature such algorithms as well.

## Acknowledgments

The authors would like to thank Daniel Strobusch and Christoph Scheurer for supplying the initial coordinate construction code and for fruitful discussions. R.J.M. acknowledges funding from the DoE-Basic Energy Sciences Grant No. DE-FG02-05ER15677. C.P. gratefully acknowledges funding from the Alexander von Humboldt Foundation and within the DFG Research Unit FOR1282. D.P. gratefully acknowledges funding from the Engineering and Physical Sciences Research Council (Project No. EP/J011185/1).

### APPENDIX A: winak: A PYTHON-BASED TOOL TO CONSTRUCT AND APPLY CURVILINEAR COORDINATES FOR MATERIALS STRUCTURE SEARCH AND BEYOND

Many global optimization procedures can be divided into three steps: displacement, optimization, evaluation, and analysis. In order to make DIC displacements as easily accessible as possible, we have developed a modular Python-based framework called winak,^{86} where each of the three aforementioned steps can be customized. For instance, in order to change from the basin hopping method to the minima hopping method,^{38} instead of a global optimization, an MD run can be started, and instead of employing a Metropolis criterion, all newly identified minima are accepted.^{26}

In order to make the code independent of a particular application, each part of the global optimization process has been encapsulated in an individual class. Exchanging local optimization by an MD run can easily and safely be done, independent of the coordinate generation class. The same is true for the displacement step. Logging and error handling is managed by analysis routines that work with any combination of sub-classes. To ensure compatibility, abstract base classes and templates are provided on the basis of which custom procedures for the three above mentioned steps can be developed. This code is designed to be interfaced with ASE,^{77} which enables a wide range of electronic structure codes to be used in combination with winak.

### APPENDIX B: DENSITY-FUNCTIONAL THEORY CALCULATIONS

We repeatedly encountered diol isomers in the case of *trans*-*β*-ionylideneacetic acid adsorbed on Au(111) on the DFTB level. To further investigate this, we recalculated the energy differences between acid and diol, both in gas phase and on the surface using DFT. All calculations were carried out using the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional,^{93} implemented in the FHI-aims code using the light basis set.^{94,95} Local optimization was considered converged when the maximum residual force was smaller than 0.025 eV/Å. Calculations including the surface were done on 4 layers of (4 × 6) Au(111) with a k-grid of (5 × 5 × 1). Surface atoms were not allowed to relax during optimization. Both in gas phase and on the surface the acid isomer is more stable, which aligns with chemical intuition. However, the energy difference is lowered from 1.09 eV in the gas phase to 0.87 eV for the adsorbed system, i.e., the diol is stabilized.