We present a new geodesic-based method for geometry optimization in a basis set of redundant internal coordinates. Our method updates the molecular geometry by following the geodesic generated by a displacement vector on the internal coordinate manifold, which dramatically reduces the number of steps required to converge to a minimum. Our method can be implemented in any existing optimization code, requiring only implementation of derivatives of the Wilson B-matrix and the ability to numerically solve an ordinary differential equation.
I. INTRODUCTION
Geometry optimization is a crucial first step in the computational modeling of molecules, solids, and other atomic systems. The most obvious way to optimize molecular geometries is to directly manipulate the Cartesian positions of each atom in the molecule. This approach is appropriate for condensed phase systems, in which the motion of each individual atom is constrained by the presence of nearby atoms. However, this is not true for gas-phase molecules, in which a group of atoms can easily move long distances through concerted motion. These kinds of displacements are large in magnitude in a Cartesian coordinate space and therefore would require a large number of individual geometry optimization steps.
It is for this reason that optimization of molecular geometries is instead usually performed in an internal coordinate basis. In an internal coordinate basis, the geometry of a molecule is represented by a collection of intramolecular coordinates, such as bond stretch distances, bending angles, and dihedral angles.1 These chemically relevant features are directly modified by the optimization algorithm, which reduces the number of steps required to realize large-magnitude displacements through curvilinear steps in the internal coordinate space.
Several factors complicate optimization in an internal coordinate basis. In general, there are more chemically relevant bonds, bending angles, and dihedral angles than degrees of freedom in a molecule. Using a minimal basis of internal coordinates necessarily leaves some chemically relevant features out of the coordinate system, which reduces the efficiency by which these indirectly represented features can be optimized. If all chemically relevant coordinates are used instead, then the coordinate system will have a higher dimensionality than the number of degrees of freedom in the molecule. This latter approach is referred to as a redundant internal coordinate system.2,3
In a redundant internal coordinate system, a seemingly valid displacement vector may lead to a point in internal coordinate space that does not correspond to any physical arrangement of atoms. For example, while an infinitesimal displacement of a C–C–C–C dihedral angle in benzene may appear to preserve all C–C bond distances, any finite displacement will necessarily perturb at least one bond distance. The traditional method for solving this problem is to find a valid point in internal coordinate space that is closest to the invalid coordinates generated using the optimization algorithm.2,4,5 While computationally facile, this approach does not account for the coupling between internal coordinates; this coupling manifests as a curvature of the space of valid internal coordinates.
One may consider the space of valid internal coordinates as a manifold embedded in the higher-dimensional redundant internal coordinate space.6 In order for a displacement vector to be valid, it must be tangent to the manifold at the starting geometry and the final geometry after displacement must also lie in the manifold. Using this perspective, we propose a new method for realizing displacements in a redundant internal coordinate space in which the displacement vector is interpreted as tangent to a geodesic embedded in the manifold of internal coordinates. The geodesic follows the curvature of the manifold as the geometry evolves along the displacement trajectory. Applied to geometry optimization, this approach generates intermediate structures that converge to a minimum-energy structure in fewer steps when compared to the standard approach.
II. THEORY
The Cartesian coordinate vector of an n-atom molecule encodes the geometry of the molecule as the Cartesian positions of each atom in that molecule. The internal coordinate vector encodes the geometry of a molecule in a set of m local coordinates, usually consisting of bond distances, bending angles, and dihedral angles.1 These internal coordinates cannot represent net translation or rotation of a molecule, so, in general, only 3n − 6 internal coordinates are required to fully specify the geometry of nonlinear molecules. When m is greater than its minimum possible value of 3n − 6, it is said that the internal coordinate representation is redundant.2,3 Even in a redundant internal coordinate basis, the set of all valid internal coordinate vectors q only spans a (3n − 6)-dimensional space due to coupling between redundant internal coordinates. As described by Zhu et al.,6 the space of valid internal coordinates can be considered as a (3n − 6)-dimensional manifold embedded in a larger m-dimensional space. It is necessary to ensure that all geometry optimization steps in a redundant internal coordinate basis stay on the (3n − 6)-dimensional manifold, that is, the steps correspond to valid internal coordinates. This means both that displacement vectors must be tangent to the internal coordinate manifold and that new structures obtained during optimization must be found in a way that accounts for the curvature of the manifold.
One way to ensure that the displacement vector Δq lies tangent to the manifold is to temporarily switch to a minimal local coordinate system in which only valid displacement vectors are possible. The delocalized internal coordinate approach constructs a new minimal coordinate system as a linear transformation of the redundant internal coordinates. This is accomplished with the projection matrix U, which is the (m × (3n − 6)) matrix of left singular vectors of the Jacobian matrix B, also known as the Wilson B-matrix,1,7
where U are the aforementioned left singular vectors, S is the diagonal matrix of nonzero singular values, V are the right singular vectors, and U′ and V′ are the left and right singular vectors, respectively, spanning the null space of B.
The coordinates q, the gradient at the current geometry g, and the approximate Hessian matrix H are projected into the delocalized internal coordinate space with U,
The U matrix is recalculated for every geometry visited during optimization in order to ensure that the delocalized internal coordinates remain a complete basis. Using these projected quantities, a variety of optimization algorithms, such as rational function optimization (RFO)8 or quasi-Newton BFGS,9–12 can be used to determine the displacement vector Δp in the delocalized internal coordinate space. In RFO, Δp is obtained from the leftmost eigenvector of the RFO eigenvalue equation,
where α is chosen such that Δp satisfies the trust region condition (see Appendix A for more details), and the eigenvalue μ is not used. The displacement vector obtained from Eq. (5) is then projected back to the full redundant internal coordinate space,
Although the process described above ensures that Δq is tangent to the internal coordinate manifold at the current geometry q0, the point q0 + Δq still may not lie on the manifold due to the curvature. This can be a particularly severe problem for molecules with many highly coupled internal coordinates, such as systems with multiple fused ring structures. This problem is traditionally solved by updating the geometry to be the point on the internal coordinate manifold that is closest to q0 + Δq. This can be accomplished with Newton’s root-finding method in which a series of rectilinear displacements are taken in the Cartesian coordinate basis according to the equation
where we have used the Einstein summation convention, q(k) and x(k) are the internal and Cartesian coordinates at iteration k, respectively, and is the Moore–Penrose pseudo-inverse of the Jacobian matrix evaluated at x(k).2,4,5 In Eq. (7) and below, Latin indices correspond to quantities represented in Cartesian coordinates while Greek indices correspond to quantities represented in redundant internal coordinates. The converged Cartesian coordinates x(k) obtained from Eq. (7) are then used to calculate the new internal coordinates qNewton, which correspond to the point in the internal coordinate manifold closest to q0 + Δq. Although each iteration of Eq. (7) consists of a rectilinear displacement in Cartesian coordinates, the Newton method results in an overall curvilinear displacement.
This approach is computationally facile and generally converges in only a few iterations, with the greatest cost being the evaluation and inversion of the Jacobian matrix. However, when the manifold has a high degree of redundancy or coupling between coordinates, such as in systems with rings, Eq. (7) may fail to converge. In this scenario, a common solution is to iterate Eq. (7) only a single time, which is equivalent to taking the rectilinear Cartesian displacement .5 This fallback approach can have substantial deleterious effects on optimization performance, as these rectilinear displacements tend to perturb bond distances when modifying bending angles or dihedral angles. Even when Eq. (7) does converge, it cannot fully account for the changes in coupling between internal coordinates during the displacement because it does not explicitly consider the curvature of the manifold at any point.
As an alternative to the Newton approach, we suggest a new method for realizing a displacement vector Δq based on geodesics of the internal coordinate manifold. Geodesics are curves that trace the shortest path between two points on a manifold. In our application, the geodesic is determined from the starting geometry q0 and a vector that is tangent to the geodesic, which we choose to be Δq. The orientation of Δq determines the trajectory of the geodesic q(τ), where τ is the dimensionless geodesic parameter, while the magnitude ‖Δq‖ determines the distance along the trajectory to travel. The trajectory can be found by solving the geodesic equation,
where Newton’s dot notation is used to refer to derivatives with respect to τ and are the Christoffel symbols of the second kind for the internal coordinates (see Appendix B for more details).13 Equation (8) is solved for the initial conditions q(0) = q0 and . The new geometry is determined by integrating q(τ) until τ = 1 by Eq. (8), which corresponds with a displacement distance of ‖Δq‖ along the geodesic.
Equation (8) cannot be solved directly, as the internal coordinates q are calculated from the Cartesian coordinates x and are therefore not independent variables. Instead, we solve the geodesic equation in the Cartesian coordinate basis,
where x(0) = x0 are the Cartesian coordinates corresponding to q0 and . The point x(1) obtained from this differential equation is used to calculate the new internal coordinates q(1). Although Eq. (9) depends on the second derivative of q with respect to x, this quantity is not prohibitively onerous to implement for commonly used internal coordinate types, and it has a sparse structure that can be exploited to accelerate the summation over indices k and l. These second derivatives can be evaluated numerically from the Jacobian matrix,1 analytically,14,15 or through automatic differentiation.16 Equation (9) can be solved using an off-the-shelf ordinary differential equation (ODE) solver such as LSODA17 or CVODE18 using a standard order reduction strategy.
Following a geometry step, it is typical for optimization algorithms to update an approximate Hessian matrix H in order to satisfy the secant condition,
where g is the gradient vector in the internal coordinate basis. In order for the approximate curvature to lie in the tangent space of the manifold at the new point q1, this secant condition must be modified to
where is obtained from the solution to Eq. (8) and is the gradient vector at point q0, which has been parallel transported along the geodesic to the point q1.19 Parallel transport is the process of translating vectors that are tangent to a manifold along a curvilinear trajectory on that manifold (such as a geodesic) in such a way that the vectors remain both tangent to the manifold along the entire trajectory and self-parallel along infinitesimal displacements. Details on how is determined are presented in Appendix C. In the Hessian update scheme of our geodesic approach, the raw displacement q1 − q0 is replaced by and the initial gradient vector g0 is replaced by its parallel transported equivalent . Our implementation of the algorithm initializes the Hessian matrix using the scheme of Fischer and Almlof,20 while the Hessian matrix is updated using the TS-BFGS algorithm, as it is suitable for both minimization and saddle point optimization.21,22
An illustration comparing the geodesic and the Newton stepping methods is presented in Fig. 1. In Fig. 1, the purple surface represents the manifold of valid internal coordinates in a methane molecule in which only a single hydrogen atom is free to move at a fixed distance from the carbon atom. Thus, the only unconstrained internal coordinates of this molecule are three of the bending angles, as depicted in Fig. 1(c). Although this system has three free bending angle coordinates, only two degrees of freedom remain due to the coupling between the angular coordinates. Figures 1(a) and 1(b) depict the entire manifold in a basis of the three free bending angles from two different perspectives. On this basis, the internal coordinate manifold takes the form of an octahedron with smoothed edges. The geodesic approach follows the curvature of the manifold to find the new point qgeodesic. In contrast, the Newton method converges to the point qNewton on the manifold that is closest to q0 + Δq. Figure 1(d) shows the same manifold but rotated and zoomed to better illustrate the difference between the Newton and the geodesic stepping methods. From Fig. 1(d), it is clear that qNewton does not lie on the geodesic curve. This is to be expected as the Newton stepping method is not aware of the curvature of the manifold, unlike the geodesic method that follows the curvature of the manifold by construction. For a convex manifold, the geodesic step length will always be greater than or equal to the Newton step length as a consequence of the triangle inequality. Despite this, the primary difference between the geodesic and the Newton methods is not the magnitude of the displacement, but rather the trajectory.
III. RESULTS
Although Fig. 1 demonstrates that the Newton and the geodesic methods result in different structures, it is not immediately obvious which of the two stepping methods is better for geometry optimization. To determine the difference in performance between the Newton and geodesic methods, we use a geometry optimization benchmark originally developed by Birkholz and Schlegel consisting of 20 molecules that have between 20 and 95 atoms.5 Potential energies were evaluated using DFTB+ with the DFTB3 parameterization.23–27 Structure optimization was performed using Sella, an open source Python package primarily focused on saddle point optimization that is also capable of performing geometry minimization.28,29 We note that because Sella is primarily intended to be used for saddle point optimization, the performance of its RFO minimization algorithm is likely to be lower than that of other purpose-built minimization codes. The focus is therefore only on the relative performance of the Newton and geodesic stepping approaches, with all other aspects of the minimization algorithm remaining identical. Of the original 20 molecules in the benchmark, one molecule was excluded due to a missing initial structure from the reference and another was excluded as DFTB3 lacks parameters for aluminum. Scripts to reproduce these calculations can be found in the supplementary material.
The results in Table I indicate that the geodesic approach requires fewer steps to reach convergence in all tested systems. For the molecule sphingomyelin, the geodesic approach converges to a lower energy structure than the Newton approach while also requiring fewer steps to converge. The optimization trajectories of two of the molecules, azadirachtin and sphingomyelin, are illustrated in Fig. 2. Trajectories of all tested molecules can be found in the supplementary material. Initially, the largest components of the gradients of the structures in this test set tend to lie in bond stretching coordinates, and so early stages of the geometry optimization are dominated by bond stretch displacements. These bond stretch displacements tend to be rectilinear or nearly rectilinear, meaning that the manifold has very low curvature in these directions, and therefore, the two methods tend to take very similar steps near the beginning of optimization. After the bond stretching modes are largely relaxed, the larger amplitude angle bending and dihedral angle coordinates begin to dominate the optimization; it is at this point that the Newton and geodesic methods begin to exhibit different performance characteristics. In this regime, the Newton method more frequently takes steps that result in an increase in the potential energy as evidenced by the many spikes in the optimization trajectories in Fig. 2. In contrast, the geodesic method is less likely to take steps that increase the potential energy and generally reaches convergence in fewer steps overall compared to the Newton method. Azadirachtin, as illustrated in Fig. 2(a) is an example of a molecule with several fused ring and bicyclic structures that benefit substantially from the geodesic stepping method, as previously suggested.
Species (No. of atoms) . | Newton . | Geodesic . |
---|---|---|
Artemisinin (42) | 122 | 33 |
Avobenzone (45) | 307 | 90 |
Azadirachtin (95) | 108 | 45 |
Bisphenol A (33) | 268 | 89 |
Cetirizine (52) | 178 | 34 |
Codeine (43) | 270 | 107 |
Diisobutyl phthalate (42) | 119 | 40 |
Estradiol (44) | 174 | 47 |
Inosinea (31) | 351 | 92 |
Maltose (45) | 197 | 104 |
Mg porphyrin (37) | 86 | 15 |
Ochratoxin A (45) | 212 | 47 |
Penicillin V (42) | 167 | 55 |
Raffinose (66) | 300 | 168 |
Sphingomyelin (84) | 217b | 164 |
Tamoxifen (57) | 211 | 58 |
Vitamin C (20) | 160 | 60 |
Zn EDTA (33) | 126 | 41 |
Mean | 198.5 | 71.6 |
Standard deviation | 73.7 | 42.0 |
Species (No. of atoms) . | Newton . | Geodesic . |
---|---|---|
Artemisinin (42) | 122 | 33 |
Avobenzone (45) | 307 | 90 |
Azadirachtin (95) | 108 | 45 |
Bisphenol A (33) | 268 | 89 |
Cetirizine (52) | 178 | 34 |
Codeine (43) | 270 | 107 |
Diisobutyl phthalate (42) | 119 | 40 |
Estradiol (44) | 174 | 47 |
Inosinea (31) | 351 | 92 |
Maltose (45) | 197 | 104 |
Mg porphyrin (37) | 86 | 15 |
Ochratoxin A (45) | 212 | 47 |
Penicillin V (42) | 167 | 55 |
Raffinose (66) | 300 | 168 |
Sphingomyelin (84) | 217b | 164 |
Tamoxifen (57) | 211 | 58 |
Vitamin C (20) | 160 | 60 |
Zn EDTA (33) | 126 | 41 |
Mean | 198.5 | 71.6 |
Standard deviation | 73.7 | 42.0 |
The structure provided by Ref. 5 erroneously replaced a nitrogen atom with a carbon atom. We have corrected this error in our calculations.
Converges to a higher-energy structure.
We believe that this difference in performance is not primarily a consequence of the magnitude of the individual displacement steps, but rather the differing displacement trajectories generated by the two methods. In Sella’s primary application of saddle point optimization, preliminary results suggest a substantial increase in performance compared to other leading algorithms, which we intend to show in a future publication.
IV. CONCLUSION
We have presented a new approach for realizing displacements in a redundant internal coordinate space by translating the molecular geometry along geodesics of the internal coordinate manifold. In contrast to the traditional Newton stepping approach, our method substantially reduces the number of steps required to reach convergence, particularly for systems with highly coupled internal coordinates, such as molecules with multiple fused rings. The improved performance of the geodesic stepping approach is due to its consideration of the coupling between internal coordinates along the entire displacement trajectory. Our method is straightforward to implement, requiring only second derivatives of the internal coordinates (i.e., first derivatives of the Wilson B-matrix) and numerical solution of a non-stiff first-order differential equation.
SUPPLEMENTARY MATERIAL
Additional optimization trajectories, such as those presented in Fig. 2, for all tested systems can be found in the supplementary material. Additionally, the Python scripts used to generate these trajectories and the data presented in Table I are also available in the supplementary material.
ACKNOWLEDGMENTS
This work was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences and Biosciences Division, as part of the Computational Chemistry Sciences Program (Award No. 0000232253).
Sandia National Laboratories is a multimission laboratory managed and operated by the National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under Contract No. DE-NA0003525. The views expressed in this article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
We would like to thank Dr. Laura McCaslin for useful discussions leading to improved readability and comprehensibility of this work.
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material and are openly available in Zenodo at http://doi.org/10.5281/zenodo.4747052.29
APPENDIX A: TRUST REGION ALGORITHM
In our implementation of trust region RFO, the parameter α of Eq. (5) is chosen to satisfy the trust region condition ‖Δp‖∞ ≤ δk, where δk is the size of the trust region at iteration k. First, Eq. (5) is solved for α = 1, and if the resulting Δp satisfies the trust region condition, the displacement is accepted. Otherwise, bisection is used to determine a value of α for which ‖Δp‖∞ = δk. After every iteration, δk is updated based on the agreement between the expected change in electronic potential energy and the actual change in energy. The ratio between the predicted change and the true change in energy is
where Δϵ is the true change in energy following a geometry optimization step. Two threshold values, ρinc and ρdec, are used to determine whether to increase or decrease the trust radius, respectively. If ρ > ρdec or , this indicates that the quadratic approximation is of poor quality and that the trust region should be decreased. For minimization, this condition will be met whenever the energy increases during optimization, as ρ will become negative, while ρdec is always positive. In this case, the new trust region is chosen to be
where σdec < 1 is a parameter of the method. If , this indicates that the quadratic approximation is highly accurate and that the trust region can be expanded. In this case, the new trust region is chosen to be
where σinc > 1 is a parameter of the method.
For this work, we use the following manually tuned parameters: δ0 = 0.2 (units ambiguous, vide infra), ρdec = 100 (effectively, the trust region only shrinks when the energy increases), ρinc = 1.035, σdec = 0.90, and σinc = 1.15. Note that because p is a linear combination of bond stretches, angle bends, and dihedral angles, the units of δ are somewhat ambiguous. In our implementation, angstroms are used for bond stretches while radians are used for both angle bends and dihedral angles. The choice of the units impacts the performance of the algorithm. For example, using degrees for the angle coordinates instead of radians would result in a decrease in the relative magnitude of displacements in the angle coordinates as compared to the bond stretch coordinates. Similarly, using atomic units of distance for bond stretch coordinates would result in a decrease in the relative magnitude of displacements in the bond stretch coordinates as compared to the angle coordinates.
APPENDIX B: CHRISTOFFEL SYMBOLS AND THE METRIC TENSOR
The Christoffel symbols of a manifold contain information about the curvature of the manifold. The symbols are related to the metric tensor G through the relation
The metric tensor G is a representation of the metric function g(u, v), which generalizes the notion of the dot product to curved manifolds,
For the current application, the metric tensor is that of the internal coordinate space as represented in a Cartesian coordinate basis, which is given by
APPENDIX C: SOLVING THE GEODESIC EQUATION AND PARALLEL TRANSPORT
The geodesic is found by solving the differential equation outlined in Eq. (9). This equation is reduced to a first-order differential equation by substituting y1 = x and ,
Parallel transport of the gradient vector g0 from the point at which it is evaluated q0 along the geodesic to the new point q1 is accomplished by augmenting Eqs. (C1) and (C2) with a third variable y3,
with the initial condition , where is the Moore–Penrose pseudo-inverse of the Jacobian matrix evaluated for geometry x0 = y1(0).13 The parallel transported gradient vector is given by
where B1 is the Jacobian matrix evaluated at x1 = y1(1). The final differential equation is solved in terms of with the SciPy implementation of the LSODA ODE solver.