Force Field X (FFX) is an open-source software package for atomic resolution modeling of genetic variants and organic crystals that leverages advanced potential energy functions and experimental data. FFX currently consists of nine modular packages with novel algorithms that include global optimization via a many-body expansion, acid–base chemistry using polarizable constant-pH molecular dynamics, estimation of free energy differences, generalized Kirkwood implicit solvent models, and many more. Applications of FFX focus on the use and development of a crystal structure prediction pipeline, biomolecular structure refinement against experimental datasets, and estimation of the thermodynamic effects of genetic variants on both proteins and nucleic acids. The use of Parallel Java and OpenMM combines to offer shared memory, message passing, and graphics processing unit parallelization for high performance simulations. Overall, the FFX platform serves as a computational microscope to study systems ranging from organic crystals to solvated biomolecular systems.
INTRODUCTION
Force Field X (FFX) is an open-source software platform for atomic resolution modeling of organic materials and biomolecules that leverages molecular mechanics (MM) force fields and a family of novel electrostatics,1–3 optimization,4,5 thermodynamics,6–8 and experimental refinement algorithms.9–11 This article introduces the unique computational models and algorithms available in FFX, with emphasis placed on applications such as crystal structure prediction, understanding the mechanism of disease-causing protein missense variants, and the refinement of biomolecules against experimental datasets. FFX was envisioned as a platform to develop experimental refinement algorithms using advanced potential energy functions based on permanent atomic multipoles and induced dipoles, including the Atomic Multipole Optimized Energetics for Biomolecular Applications (AMOEBA) force field.12–16 FFX’s design has been influenced by lessons learned from packages such as Tinker,17–19 OpenMM,20 and Crystallography and NMR System (CNS),21,22 while prioritizing the use of virtual machine (VM) technology and polyglot programming. FFX is distributed under the GPL v.3 license with the classpath exception, which is the same license used by OpenJDK (although v.2 in this case). The long-term goal for the FFX platform is to support the study of all major facets of organic materials, biological molecules, and their assembly into large complexes using the principles of chemical physics.
Polyglot architecture
FFX achieves portability and support for polyglot programming (i.e., the use of several programming languages) through VM technologies, including the Java VM (JVM),23–25 GraalVM,26,27 and TornadoVM28,29 as shown in Fig. 1. The JVM, as part of the Java Development Kit (JDK), is one of the oldest VM platforms. Over the last decade, the GraalVM project has augmented the JVM with an alternative just-in-time compiler (Graal), the ability to create ahead-of-time native images, and support for polyglot programming to allow languages such as Python30,31 to execute on the JVM and interoperate with canonical JVM languages (e.g., Java, Kotlin, Scala, and Groovy).
An overview of Force Field X. The second row shows languages that are compatible with FFX. The third row represents the platforms that execute FFX on CPU cores on the left and coprocessors on the right. The final row indicates the CPU and GPU hardware that FFX can perform calculations on.
An overview of Force Field X. The second row shows languages that are compatible with FFX. The third row represents the platforms that execute FFX on CPU cores on the left and coprocessors on the right. The final row indicates the CPU and GPU hardware that FFX can perform calculations on.
Novel algorithms
The platform includes many novel algorithms that will be discussed in more detail throughout this article. For example, FFX debuted the first implementation of x-ray crystallography refinement using the polarizable atomic multipole AMOEBA force field, the first constant-pH molecular dynamics (MD) support for a polarizable force field,32 and the generalized Kirkwood (GK) implicit solvent models.3,33–35 FFX also implements efficient inclusion of space group symmetry into long-range electrostatics via particle-mesh Ewald (PME) summation1,35–37 and orthogonal space tempering (OST) for the calculation of free energy differences.6,8,38,39 It can also perform global optimization using dead-end40 (and Goldstein41) elimination for target functions that include higher-order many-body terms11,43 and dual topology methods to compute free energy differences between potential energy models.7,42
Commands and parallelization
FFX offers more than 50 commands that implement a variety of methods to investigate organic and biomolecular systems, which are organized into nine Java packages. These packages will be discussed in more detail later in this article. FFX leverages GraalVM technologies for cross-platform polyglot programming in Java, Groovy, Python, and Kotlin. FFX utilizes the Parallel Java (PJ) package to facilitate parallelization across the central processing unit (CPU) cores/threads of a single process [i.e., shared memory (SM)] and among multiple processes using its message passing interface (MPI). Graphics processing unit (GPU) acceleration is currently achieved using OpenMM based on a Java class hierarchy that mirrors the OpenMM C++ application programming interface (API). The source code is freely available for download from the FFX website (http://ffx.biochem.uiowa.edu) or from GitHub (https://github.com/SchniedersLab/forcefieldx) and currently depends on JDK version 21 or greater. The software is compiled within a terminal window using the Apache Maven tool and is executed in either “headless” command line mode using the “ffxc” command or with a simple interactive graphical user interface using the “ffx” command.
Applications
Throughout this article, we will discuss applications of the novel algorithms available in FFX. One application is the refinement of atomic resolution models against x-ray and/or neutron diffraction data using advanced force fields.10,11 FFX is also able to investigate the impact of missense variants on protein structure, dynamics, and thermodynamics.43–53 In addition, FFX premiers a novel pipeline for crystal structure prediction.54,55 The wide-ranging capabilities of FFX make it particularly useful for the computational study of biochemical mechanisms.
FEATURES AND ORGANIZATION
File types, coordinate representations, and conversions
FFX utilizes a variety of file types to describe molecular systems. Files typically consist of a base name that is shared among all files for a system and a unique suffix to denote the information contained therein (e.g., molecule.xyz, molecule.mtz, and molecule.properties). FFX uses a suffix versioning system where subsequent files of the same type are saved with an appended integer (e.g., generated coordinates from molecule.xyz would produce a file named molecule.xyz_2). Commonly used file types are found with a brief description in Table I.
File types read and written by FFX, including the origin of each format.
Atomic coordinates and variables . | . | ||
---|---|---|---|
ARC | Structural archive | Tinker | |
CIF | Crystallographic information file | IUCr | |
DYN | Molecular dynamics restart information | Tinker | |
ESV | Extended system variables | FFX | |
INT | Internal coordinates | Tinker | |
PDB | Protein databank structure | PDB | |
XPH | XYZ file with pH information | FFX | |
XYZ | Coordinates, atom types, and connectivity | Tinker | |
Control properties and force field parameters | |||
DST | Distance matrix from superposing structures | FFX | |
Properties/key | Control file with Java properties/tinker keywords | FFX/Tinker | |
PRM | Force field parameter file | Tinker | |
Structure factors and real space maps | |||
CCP4/MAP | Real space density map | CCP4 | |
CIF | Structure factors | IUCr | |
CNS/HKL | Structure factors | CNS/Xplor | |
MTZ | Binary format for structure factors | CCP4 | |
XPLOR | Real space density map | CNS/Xplor | |
Thermodynamics | |||
BAR | Window energy values for FEP/BAR | Tinker | |
HIS | Orthogonal space histogram | FFX | |
LAM | Lambda restart information | FFX | |
MBAR | Window energy values for MBAR | FFX |
Atomic coordinates and variables . | . | ||
---|---|---|---|
ARC | Structural archive | Tinker | |
CIF | Crystallographic information file | IUCr | |
DYN | Molecular dynamics restart information | Tinker | |
ESV | Extended system variables | FFX | |
INT | Internal coordinates | Tinker | |
PDB | Protein databank structure | PDB | |
XPH | XYZ file with pH information | FFX | |
XYZ | Coordinates, atom types, and connectivity | Tinker | |
Control properties and force field parameters | |||
DST | Distance matrix from superposing structures | FFX | |
Properties/key | Control file with Java properties/tinker keywords | FFX/Tinker | |
PRM | Force field parameter file | Tinker | |
Structure factors and real space maps | |||
CCP4/MAP | Real space density map | CCP4 | |
CIF | Structure factors | IUCr | |
CNS/HKL | Structure factors | CNS/Xplor | |
MTZ | Binary format for structure factors | CCP4 | |
XPLOR | Real space density map | CNS/Xplor | |
Thermodynamics | |||
BAR | Window energy values for FEP/BAR | Tinker | |
HIS | Orthogonal space histogram | FFX | |
LAM | Lambda restart information | FFX | |
MBAR | Window energy values for MBAR | FFX |
FFX implements a variety of commands to facilitate alterations and conversions between the file types listed in Table I. General file commands are listed in Table II along with a brief description. The Cart2Frac and Frac2Cart commands convert atomic coordinates between Cartesian and fractional coordinate systems, where the latter defines each atomic position as a fractional (unitless) distance along each axis of a unit cell. SaveAsP1 expands a periodic system by applying space group symmetry operators to the asymmetric unit to generate a P1 unit cell (or a larger replicated unit cell). SaveAsXYZ converts a system into an XYZ coordinate file. SaveAsQE generates a default script that can be used with Quantum ESPRESSO to perform a plain-wave self-consistent field (PWscf) calculation. ImportCIF converts an organic crystal from the Crystallographic information file (CIF) format (e.g., from the Cambridge Structural Database56) into XYZ format based on the constituent molecule(s) having already been parameterized. While reading systems, ImportCIF actively enforces space group restrictions with regard to lattice parameters and updates the space group in some cases (e.g., a rhombohedral space group with hexagonal lattice parameters is converted to the appropriate hexagonal space group). ImportCIF adds hydrogen atoms to a system if none are present in the original CIF file. It can also convert an XYZ system to a basic CIF format containing the coordinate and crystal information. MoveIntoUnitCell moves the molecules of a system to ensure each center of mass is located within the unit cell. Finally, the command xray.MTZInfo displays human readable information for a binary MTZ file, while xray.CIFtoMTZ generates an MTZ file for a corresponding CIF file.
Structure manipulation commands available in FFX.
Command . | Description . |
---|---|
Cart2Frac | Convert from Cartesian to fractional coordinates |
Frac2Cart | Convert from fractional to Cartesian coordinates |
SaveAsP1 | Expands a crystal to P1 or a replicated unit cell |
SaveAsXYZ | Save the system as an XYZ file |
SaveAsQE | Create a quantum ESPRESSO input script |
ImportCIF | Import a system from a CIF file |
MoveIntoUnitCell | Move all molecules into the unit cell |
xray.MTZInfo | Log information for an MTZ file |
xray.CIFtoMTZ | Convert diffraction data from CIF format to MTZ |
Command . | Description . |
---|---|
Cart2Frac | Convert from Cartesian to fractional coordinates |
Frac2Cart | Convert from fractional to Cartesian coordinates |
SaveAsP1 | Expands a crystal to P1 or a replicated unit cell |
SaveAsXYZ | Save the system as an XYZ file |
SaveAsQE | Create a quantum ESPRESSO input script |
ImportCIF | Import a system from a CIF file |
MoveIntoUnitCell | Move all molecules into the unit cell |
xray.MTZInfo | Log information for an MTZ file |
xray.CIFtoMTZ | Convert diffraction data from CIF format to MTZ |
Software organization
FFX is composed of nine Java packages organized by functional themes and capabilities. The nine packages are Parallel Java (PJ), Utilities, Numerics, Crystal, OpenMM, Potential, Algorithms, Refinement, and User Interfaces (UI). The organization of packages and their dependencies within FFX are shown in Fig. 2.
A flowchart representation of the package dependencies in FFX. Each package is indicated by a separate color, and the lines between packages indicate a dependency relationship. For example, most colored lines connect to the PJ package as all packages depend on PJ except for OpenMM. Commands available in the Potential, Algorithms, and Refinement packages are displayed in the corresponding color-coded box. For Refinement commands, those with an asterisk indicate that both x-ray and Real Space versions of the command are available, while those without an asterisk currently have only an x-ray version.
A flowchart representation of the package dependencies in FFX. Each package is indicated by a separate color, and the lines between packages indicate a dependency relationship. For example, most colored lines connect to the PJ package as all packages depend on PJ except for OpenMM. Commands available in the Potential, Algorithms, and Refinement packages are displayed in the corresponding color-coded box. For Refinement commands, those with an asterisk indicate that both x-ray and Real Space versions of the command are available, while those without an asterisk currently have only an x-ray version.
Parallel Java57,58 provides APIs for both shared memory (SM) parallelization using threads and a message-passing interface (MPI) for parallelization across JVM processes that are executing on the same node and/or between nodes. The former SM parallelization leverages concepts analogous to those defined by OpenMP, including support for parallelization of “for” loops, atomic operations, and scheduling of tasks. The PJ MPI approach defines its own scheduling and communication (e.g., “mpirun” is unnecessary) without any dependencies beyond the Java Runtime Environment (JRE). The Parallel Java package is used to accelerate operations in most other packages [e.g., 3D fast Fourier transforms (FFTs), force field energy evaluations, and replica-based sampling strategies]. Therefore, all packages have PJ as a dependency. Our fork of the original PJ code by Alan Kaminsky contains several changes geared toward freeing SM hardware threads that are no longer in use and enhanced logging of MPI communications.57,58
The Utilities package provides simple functionality for file handling (e.g., file copying and file naming conventions) and the implementation of system properties. Properties in FFX are built on the JVM standards for system properties. These properties (key-value pairs) are used to specify calculation details that vary between simulation goals. All packages except for PJ and OpenMM depend on the Utilities package.
The Numerics package implements common mathematical methods and numerical recipes required for calculations available within FFX. The Numerics package computes real and complex 1D59,60 and 3D fast Fourier transforms (FFTs) and offers both float (single precision) and double (double precision) vector math libraries and a range of special functions, such as Erf/Erfc and modified Bessel functions. Numerics offers a novel family of multipole tensor recursion61 algorithms for Coulomb, Ewald, Thole, and GK interactions using both Cartesian and quasi-internal62 coordinate frames. Finally, the Numerics package implements atomic operations on arrays, a limited-memory Broyden–Fletcher–Goldfarb–Shanno (BFGS)63–66 optimizer, and support for uniform b-splines.67 The Crystal, Potential, Algorithms, Refinement, and UI packages each depends on the Numerics package.
The Crystal package gives FFX the ability to perform operations on all 230 crystallographic space groups during the evaluation of a force field potential energy and during crystallographic refinement against x-ray and/or neutron diffraction data. It provides methods for application of the minimum image convention, symmetry operators, conversion between Cartesian and fractional coordinates, and the storage of reflection lists from diffraction experiments.68 The Potential, Algorithms, Experiment, and UI packages each depends on the Crystal package.
The Potential package provides support for evaluating the potential energy of atomic resolution molecular systems using fixed partial atomic charge force fields, polarizable atomic multipole force fields, and preliminary support for using neural network potentials that are based on PyTorch.69 When a molecular system is loaded from a file, an instance of the MolecularAssembly class is instantiated together with creation of an associated ForceFieldEnergy. The latter provides methods to compute the potential energy of the system and optionally its gradient for use with simulation methods defined in the Algorithms package described below. The Potential package supports calculation of long-range electrostatics using particle-mesh Ewald35–37 (PME) summation with novel support for space group symmetry1 and offers continuum treatment of solvent via our GK model.3,33–35 The FFX commands defined within the Potential package are listed in Fig. 1. Some notable commands include Energy to calculate the potential energy of a system, Solvator to solvate a system into a water box with or without free salt, and the SaveAs family of commands to convert systems between file types. The Algorithms, Refinement, and UI packages depend on the Potential package.
The Algorithms package contains optimization and sampling methods that operate on the potential energy functions defined in the Potential package. The Algorithms package commands are listed in Fig. 2. A family of local optimization commands (Minimize, CrystalMinimize, and PhMinimize) leverage the L-BFGS method defined in the Numerics package. The Algorithms package also offers global optimization methods based on simulated annealing (Anneal) and via our many-body versions4,11 of dead-end elimination40 and Goldstein elimination73 (ManyBody). The Dynamics command executes molecular dynamics via integrators that include velocity Verlet, Beeman,70 stochastic dynamics,71,72 and the reversible reference system propagation algorithm (r-RESPA).73,74 In the absence of stochastic dynamics, temperature control is available via thermostats by Berendsen et al.75 and Bussi et al.76 Constant pressure is achieved using a novel Monte Carlo barostat77 that respects space group constraints. The Thermodynamics command computes free energy differences via an alchemical path defined by a state variable (λ) either by sampling an array of windows at fixed λ values [followed by application of the Bennett Acceptance Ratio (BAR) method78] or using a unique implementation of the orthogonal space tempering38,79 method that supports both polarizable force fields and space group symmetry.6–8,80 The Refinement and User Interface packages depend on the Algorithms package.
The Refinement package implements target functions that combine a potential energy function defined in the Potential package (e.g., a fixed charge or polarizable force field) with a function that compares a molecular model (e.g., its atomic coordinates, b-factors, and/or occupancies) to either diffraction data in reciprocal space or a scalar field (e.g., electron density) in real space.10 This includes a novel bulk scattering model that is differentiable with respect to atomic coordinates9 and the unique ability to evaluate long-range electrostatics using the rigorous PME method.1 Once the overall target function is defined, then most of the optimization and sampling methods of the Algorithms package can be employed for model refinement (e.g., local minimization, molecular dynamics, many-body side-chain optimization, and simulated annealing). Refinement against x-ray diffraction data, neutron diffraction data, or joint x-ray/neutron diffraction datasets is supported. Global optimization of side-chain conformations against either a reciprocal space or real space target function is supported.11 The available commands in the Refinement package are listed in Fig. 2. Those with an asterisk (*) have both reciprocal and real space versions. The User Interface package depends on the Refinement package.
The User Interface package implements both command line and graphical user interfaces. This package depends on all other packages. We will elaborate on FFX functionalities in more explicit detail later in this article.
EXTERNAL LIBRARIES
FFX leverages a variety of external libraries to perform functions such as CIF file parsing, parallelization, and bioinformatics methods. These packages include CIF tools,81 BioJava,82 the Chemistry Development Kit (CDK),83 TornadoVM,84 Parallel Java,57 the picocli (a command line interface package),85 the Groovy language, and the GraalVM Python implementation. CIF tools give FFX the ability to read and write CIF files from the Cambridge Structural Database.56 The BioJava library supports local installations of the Protein Data Bank (PDB), can load protein and nucleic acid sequences stored in FASTA format, and offers algorithms for structural alignment. The Chemistry Development Kit (CDK) is a collection of modular libraries for processing chemical information, including support for parsing files in the SMILES, SDF, and Mol2 format. CDK also allows substructure and SMARTS pattern matching and fingerprint methods for similarity searching. TornadoVM extends the JRE by offering programming constructs to facilitate translating a subset of Java code into PTX (CUDA), OpenCL, or SPIRV (Intel Level Zero) backends. As mentioned previously, the Parallel Java library implements both “OpenMP Style” shared memory coding constructs and a platform independent implementation of the canonical message passing interface operations for parallelization across processes. Picocli is an annotation-driven library for creating command line applications along with documentation in HyperText Markup Language, portable document format (PDF), or Unix man page formats. The Groovy scripting language is used to quickly prototype new ideas and create FFX commands. Finally, with the emergence of a Python 3.10 implementation that runs on the JVM from the GraalVM team, FFX now also fully supports execution of Python scripts in a cross-platform manner.
POTENTIAL ENERGY FUNCTIONS
FFX supports a variety of potential energy functions that include both fixed partial charge and polarizable atomic multipole force fields. There is support for implicit solvents consisting of cavitation, dispersion, and generalized Kirkwood contributions, energy terms for refinement against x-ray and/or neutron diffraction data, and energy terms for refinement against real space maps (e.g., from either CryoEM or diffraction experiments). FFX also offers unique support for a family of dual-topology potential energy functions that facilitate computing free energy differences due to chemical modifications in the AMOEBA force field (e.g., in the context of relative binding affinity or relative hydration free energy), between two crystal polymorphs, or between alternative force field models. Force field parameter files adopt Tinker conventions when possible, however, supports for experimental refinement and dual-topology algorithms are unique to FFX.
Force field models
IMPLICIT SOLVENT
FFX implements implicit solvents for applications where the use of explicit water molecules is cumbersome, including the repacking of protein side chains and the refinement of biomolecular models against experimental data. The implicit solvent is generally formulated as a sum of three free energy differences defined by a thermodynamic cycle: cavitation, dispersion, and electrostatics contributions. The cycle describes the transfer of the biomolecular system between vacuum into solvent phases. The combination of cavitation and dispersion free energy differences is usually referred to as the non-polar contribution,56–59 while the latter electrostatic contribution is based on an analytic approximation to the Poisson equation as described below.
FFX calculates effective radii by combining the analytic Hawkins, Cramer, and Truhlar (HCT) pairwise descreening approximation98 with the solvent field approximation (SFA) proposed by Grycuk.99 Contributions to effective radii due to interstitial spaces (spaces within a system where a water molecule cannot fit) are also included due to their importance when modeling proteins. The interstitial space corrections include a pairwise “neck” between nearby atoms100 and a hyperbolic tangent (tanh) function101 to help smoothly scale up the effective radius of an atom as it becomes more deeply buried. Currently, the FFX implementation of GK implicit solvent has been validated for protein simulations with work ongoing to support nucleic acids.3
DUAL-TOPOLOGY FRAMEWORK
Interpolation between force fields
Overly aggressive coordinate constraints cause the calculated free energy correction to diverge from the actual free energy correction due to the approximation that both systems share an identical phase space. On the other hand, overly conservative constraints render the correction increasingly expensive to converge. The SB approach was used to correct the binding free energy differences for a set of divalent cation binding proteins to within statistical uncertainty of the true calculated AMOEBA values.8 Compared to prior IFE methods, the SB approach allowed an order of magnitude more atoms to be converted between resolutions. Future IFE efforts would benefit from force field resolutions with identical bonded terms (e.g., between the fixed partial charge and polarizable atomic multipole resolutions) such that only non-bonded terms contribute to the corrections.
Interpolation between polymorphs
Depiction of an alchemical path connecting two crystal polymorphs defined by different space groups via a dual topology framework [Eq. (10)]. Symmetry mates produced via the symmetry operators of polymorph A are shown with carbon atoms colored black, and those produced by the symmetry operators of polymorph B are light gray. The atoms of the asymmetric unit of polymorph A are mapped via a custom symmetry operator into the asymmetric unit of polymorph B (asymmetric unit carbon atoms are colored gray for each state).
Depiction of an alchemical path connecting two crystal polymorphs defined by different space groups via a dual topology framework [Eq. (10)]. Symmetry mates produced via the symmetry operators of polymorph A are shown with carbon atoms colored black, and those produced by the symmetry operators of polymorph B are light gray. The atoms of the asymmetric unit of polymorph A are mapped via a custom symmetry operator into the asymmetric unit of polymorph B (asymmetric unit carbon atoms are colored gray for each state).
STRUCTURE MANIPULATION
FFX contains methods to alter systems as needed. MutatePDB changes the identity of a chosen amino acid to an alternative identity. Solvator creates a periodic box of water around the input system and optionally adds explicit counterions. The Superpose command calculates the coordinate root mean square deviation (RMSD) between two systems via quaternion superposition. A unique SuperposeCrystals command quantifies the packing similarity between two crystals. This is performed using the Progressive Alignment of Crystals (PAC) algorithm, which calculates an atomistic RMSD for the aligned molecular subclusters of each crystal.54 The input options the user can select include the number of asymmetric units in each subcluster, the selection criteria for molecules in the subcluster, and atom(s) to be excluded from the comparison. SuperposeCrystals can save crystal systems that are within a user specified RMSD from a desired crystal or to write out a matrix of RMSD values for clustering and/or filtering out similar crystals within an ensemble. Furthermore, PAC can accumulate the rotations and translations into an overall symmetry operator to map moleculars between crystal polymorphs as discussed in the section titled Interpolation Between Polymorphs.
LOCAL OPTIMIZATION
FFX currently has two methods for local optimization: the steepest descent method and the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method. The latter is a quasi-Newton approach that minimizes a nonlinear function by approximating the inverse Hessian matrix.103,104 Both methods are available via the Minimize command to optimize atomic coordinates for a given potential energy function. The MinimizeCrystals command additionally supports optimization of lattice parameters, while the MinimizePh command supports local optimization of titration and tautomer states. Finally, the xray.Minimize and realSpace.Minimize commands facilitate optimization of coordinates, b-factors, and/or occupancies against experimental diffraction or real space data, respectively.
GLOBAL OPTIMIZATION
Simulated annealing
FFX implements simulated annealing functionality via the Anneal command.50,105 The command allows for selection of the heating and cooling schedule, the molecular dynamics protocol and simulation length at each temperature, and selection of the atomic degrees of freedom that should remain fixed. Although the simulated annealing approach can locate the global minimum of a target function, its success depends on the simulations at each temperature being of sufficient length. Versions of the method are available for refinement against both experimental diffraction data (xray.Anneal) and real space maps (realSpace.Anneal).106
Methods based on a many-body expansion
In addition to extending the Goldstein elimination criteria and implementing parallelization strategies, four approximations were introduced to improve computational performance. In the context of rotamer optimization, the expansion can often be truncated at pairwise terms due to damping of three-body and higher-order terms by the generalized Kirkwood implicit solvent. However, previous work also demonstrated that inclusion of three-body terms is sometimes necessary (i.e., in the absence of implicit solvent or when using an x-ray diffraction target function).11 The second approximation employs a distance cutoff to exclude interactions where the closest rotamers for a residue pair are more than 3 Å apart or for residue triples that are more than 2 Å apart. Pruning removed rotamers with self-energies 25 kcal/mol or more above the lowest self-energy of a residue, prior to calculation of two-body energies. The pruning criterion is based on the heuristic observation that rotamers with such an unfavorable self-energy (e.g., due to an atomic clash with backbone atoms) are not found in well-packed structures. The final approximation imposed a 3D grid over the protein, followed by optimization within each subdomain (cube) of the grid, rather than including all protein residues simultaneously. The repacking algorithm is a provable global optimizer within a single subdomain of the grid, but not for the whole protein because coordinated changes between subdomains are neglected.
PERIODIC SYSTEMS, SPACE GROUP SYMMETRY, AND NEIGHBOR LISTS
The conditional convergence of Coulomb’s law under periodic boundary conditions can be treated by splitting the electrostatic potential into a strictly convergent short-ranged real space contribution and a smooth periodic contribution. The smooth periodic contribution is strictly convergent in Fourier space as described by Ewald.107 For crystals of small organic molecules, both space group symmetry and handling of the minimum image convention require special consideration when building the neighbor list as shown in for Compound 23. In FFX, both issues are handled in a uniform fashion based on permuting space group symmetry operators with the translational operators needed to generate replicated copies of the unit cell (i.e., to generate an overall cell that is larger than twice the cutoff used during building of the neighbor list as shown in Fig. 4). In direct space, the non-bonded pairwise loops over neighbors (i.e., for van der Waals or electrostatics interactions) leverage an additional outer loop over the permuted symmetry operators. FFX then utilizes a customized version of the smooth particle mesh Ewald (PME) algorithm that handles both space group symmetry and small unit cells.108 For NPT simulations, the number of replicates cells is adjusted dynamically to accommodate shrinking or growing unit cells.
PARTICLE MESH EWALD SUMMATION
As a part of FFX’s potential energy capabilities, a general implementation of smooth particle-mesh Ewald summation (PME) is included to avoid using electrostatic cutoffs and boost performance.35,36 The FFX implementation of PME for multipoles builds on the work of Sagui et al.37 by adding unified support for symmetry operators for all 230 crystallographic space groups and replicates operators for small unit cells (i.e., interactions with neighboring unit cells are required to satisfy the real space cut-off) and their combination.1 Here, we will emphasize notable changes in the algorithm compared to that available in other simulation packages due to the inclusion of symmetry operators for both the direct and reciprocal space terms.
Real space energy
Reciprocal space energy
POLARIZATION ALGORITHMS
Definition of the self-consistent field
Successive over relaxation (SOR)
The SOR technique is a variant of the Gauss–Seidel method for solving a linear system of equations. It incorporates the previous solution of the system of linear equations with a variable called the relaxation factor (0.7 by default). Although the convergence rate can be improved slightly by tuning the relaxation factor to the system of interest, SOR requires significantly more iterations than the preconditioned conjugate gradient method described below to achieve the same convergence criteria. By default, the SCF convergence criteria are to reduce the change in induced dipole magnitude between iterations below a threshold of 1.0 × 10−6 rms D.
Preconditioned conjugate gradient (PCG)
The objective of the PCG technique is to minimize the residual found by moving all components of the linear system to the right-hand side (r.h.s.). The notation for AMOEBA is achieved by rearranging Eq. (23) to give the residual as . A preconditioner is used to improve the condition number of the matrix . In FFX, the preconditioner is based on using a short real space cutoff with a default value of 4.5 Å (with no reciprocal space contribution to the field). The PCG method typically reduces the rms Debye change between iterations by an order of magnitude each cycle (i.e., just six or seven cycles are needed to reach the default convergence criteria).
FFX allows users to define the polarization model, where the default value of “mutual” (i.e., SOR or PCG is used to iteratively converge the SCF) can be changed to “direct” or “none.” The “direct” option includes the field due to permanent multipoles, but not the field due to induced dipoles themselves (i.e., the second term on the r.h.s. of Eq. (23) is not included). Selecting “none” eliminates the polarization energy term from the potential energy and is useful for relaxing poor starting coordinates whose SCF is unstable.
DYNAMICS METHODS
Integrators and controls
Several integrators are implemented to propagate degrees of freedom based on Newton’s equations of motion during molecular dynamics (MD) simulations. Velocity Verlet provides the simplest integration scheme that is numerically stable and time-reversible.110,111 The closely related Beeman70 algorithm for integration produces an identical trajectory to velocity Verlet with a modification that calculates velocities more accurately and better conserves energy. FFX implements a reversible-RESPA integrator112 that offers multiple-time step simulations through the separation of long-range force calculations from the short-range. The short-range forces are calculated at each time step by a position Verlet algorithm. The position Verlet offers greater stability than velocity Verlet if large time steps are used. The long-range forces are calculated at n time steps, reducing the computational cost of integration. FFX offers two thermostats for use in conjunction with the integrators: the thermostat of Berendsen et al.117 and Bussi–Donadio–Parrinello113 thermostat. The Berendsen thermostat causes the system temperature to decay exponentially toward a target temperature, but it does not produce particle velocities that are consistent with sampling from the canonical ensemble. The Bussi thermostat can be considered a global version of the (local) Langevin thermostat described below, and it produces particle velocities consistent with rigorous sampling from the canonical ensemble.
The Langevin dynamics119,120 integrator incorporates two forces: a viscous damping force proportional to particle velocity and a random force representing the effects of collisions with molecules in the environment. The random forces are pulled from a Gaussian distribution with a mean of zero. The magnitude of both the viscous damping and random forces are controlled by a collision frequency parameter (default in FFX is 91/ps to mimic water-like viscosity114). In this way, the Langevin integrator provides rigorous temperature control. Finally, pressure control is achieved through a Monte Carlo barostat that obeys lattice system constraints.115
Implementations
The default implementation for performing molecular dynamics leverages shared memory parallelism (SMP) constructs defined by the Parallel Java API. This code path is currently recommended for small systems (e.g., pharmaceutical crystals), for use with orthogonal space tempering, or for refinement against experimental data. For GPU acceleration of systems without space group symmetry, FFX offers an interface with OpenMM.123 This latter code path is recommended for simulating large systems, such as proteins and/or DNA solvated in a periodic box of water.
Constant pH molecular dynamics
FREE ENERGY CALCULATIONS
The Thermodynamics command supports estimation of free energy differences using two complimentary approaches. The first approach applies free energy perturbation (FEP), the Bennett Acceptance Ratio (BAR) method, and/or Multi-State BAR (MBAR) to sample from ensembles defined by pairs of λ values. It then accumulates the overall free energy difference (and its statistical uncertainty) over a path defined by a series of λ windows.127,119 The second approach uses the orthogonal space tempering (OST) biased sampling strategy to estimate the free energy difference while overcoming hidden barriers.38,129 Future work will access opportunities to re-weight biased samples from OST and thereby permit evaluation with MBAR.
FFX supports all 230 space groups, the use of unit cells whose interfacial radius is less than half of the nonbonded cutoff distance, and the combination of small unit cells with space group symmetry.
FFX supports all 230 space groups, the use of unit cells whose interfacial radius is less than half of the nonbonded cutoff distance, and the combination of small unit cells with space group symmetry.
Free energy perturbation, BAR, and MBAR
For the BAR and MBAR methods, the λ values range from 0 to 1, where λ = 0 indicates the initial state and λ = 1 indicates the end state. Intermediate simulation windows sample from an ensemble that represents an unphysical alchemical state. The resulting samples are stored in an archive file (.arc) for each λ value, which can then read by the BAR or MBAR commands during the estimation of free energy differences.
For example, Thermodynamics can be used to estimate the free energy difference associated with amino acid substitutions within the protein structure. In this case, the thermodynamic path is broken into four sets of simulations to slowly remove the wildtype (WT) amino acid and slowly grow in the mutant (MT) amino acid. The first set turns off the electrostatics from the WT side chain, the second turns off van der Waals from the WT side chain, the third turns on the van der Waals for the MT side chain, and the final simulation turns on the electrostatics for the MT side chain.
To remove the electrostatics contribution of the WT side chain, FFX splits the simulation into a set number of alchemical intermediate simulations, typically referred to as windows. The user sets the alchemical atoms to be affected by the λ states (i.e., the side chain atoms). Each window will have a different λ value evenly distributed across n windows from zero to one. The λ value acts as a dial to tune the contribution from the electrostatics from on (λ = 1) to off (λ = 0). FFX initiates windowed alchemical simulations with Thermodynamics when n windows are set. Each simulation is run in parallel and has GPU acceleration. The result is n archive files with snapshots from thermodynamics simulations used for further analysis to estimate the overall free energy change. The BAR command estimates the free energy difference from the collected samples at each λ value.127
An extension of BAR is MBAR where equilibrium samples from multiple thermodynamic states can be utilized to construct a statistically optimal free energy estimator. MBAR reduces to BAR when used on two thermodynamic states.
Our recent expansion of the dual topology framework to directly estimate free energy differences between polymorphs is presented as an additional demonstration for both BAR and MBAR. A dual topology simulation via the Thermodynamics command was used to sample states between experimental crystals with lattice parameters and coordinates minimized to convergence criteria of 0.05 kcal/mol/Å according to parameters generated by PolType2120 for 2-((4-(2-(3,4-dichlorophenyl)ethyl)phenyl)amino)benzoic acid (CSD ID: XAFPAY) also known as compound XXIII. Symmetry operators to map between the two polymorph end states were generated via the PAC algorithm. Individual symmetry operators were generated for each of the aromatic rings and their constituents as shown in Fig. 5. Several hydrogen atoms had inter-polymorph distances larger than 1.0 Å after the application of the symmetry operator that was applied to match their bonded heavy atom between polymorphs; therefore, those hydrogen atoms were treated as alchemical.
The upper panel shows compound XXIII colored by an atomic number with chlorine green, oxygen red, nitrogen blue, carbon gray, and hydrogen white. The lower panel is colored by dual topology groups, with orange, violet, and yellow atoms each assigned a unique symmetry operator to map between polymorphs. The hydrogen atoms colored green in the lower panel were given independent degrees of freedom for the two topologies.
The upper panel shows compound XXIII colored by an atomic number with chlorine green, oxygen red, nitrogen blue, carbon gray, and hydrogen white. The lower panel is colored by dual topology groups, with orange, violet, and yellow atoms each assigned a unique symmetry operator to map between polymorphs. The hydrogen atoms colored green in the lower panel were given independent degrees of freedom for the two topologies.
One CPU core was used to generate sampling for each of 21 lambda windows in parallel. Each window sampled for one nanosecond (∼11.1 h on an Intel® Xeon® Gold 6330 CPU at 2.00 GHz) with coordinate snapshots saved each picosecond. The last 900 snapshots were evaluated MBAR to computed polymorph relative free energy differences, with results given in Tables III and IV. For comparison, relative lattice potential energy differences are provided for AMOEBA (which was used for the free energy difference calculations) along with several other approaches featured in the sixth blind test of organic crystal structure prediction organized by the Cambridge Crystallographic Data Centre.121
Relative lattice energies for the experimental polymorphs of compound XXIII using a variety of models, compared to AMOEBA relative polymorph free energy differences (kcal/mol).
. | . | Form . | ||||
---|---|---|---|---|---|---|
Modela . | Method . | A . | B . | C . | D . | E . |
Team 3: Day et al. | Multipoles and exp-6 | 0.3 | 1.3 | 0.0 | 0.6 | 0.1 |
Team 5: van Eijck | Charges and exp-6 | 1.0 | 0.0 | 1.3 | 1.3 | 1.1 |
Team 14: Neumann et al. | PBE + Neuman-Perrin | 0.9 | 0.0 | 0.0 | 0.7 | 0.5 |
Team 18: Price et al. | Multipoles and exp-6 | 2.3 | 0.0 | 0.8 | 2.2 | 1.3 |
Potential energy | AMOEBA | 1.30 | 0.00 | 1.80 | 1.34 | 1.78 |
MBAR relative ΔG | AMOEBA | 0.39 | 0.00 | 0.87 | 0.66 | 0.84 |
. | . | Form . | ||||
---|---|---|---|---|---|---|
Modela . | Method . | A . | B . | C . | D . | E . |
Team 3: Day et al. | Multipoles and exp-6 | 0.3 | 1.3 | 0.0 | 0.6 | 0.1 |
Team 5: van Eijck | Charges and exp-6 | 1.0 | 0.0 | 1.3 | 1.3 | 1.1 |
Team 14: Neumann et al. | PBE + Neuman-Perrin | 0.9 | 0.0 | 0.0 | 0.7 | 0.5 |
Team 18: Price et al. | Multipoles and exp-6 | 2.3 | 0.0 | 0.8 | 2.2 | 1.3 |
Potential energy | AMOEBA | 1.30 | 0.00 | 1.80 | 1.34 | 1.78 |
MBAR relative ΔG | AMOEBA | 0.39 | 0.00 | 0.87 | 0.66 | 0.84 |
Model entries that start with “Team” were converted from Table S12 of the sixth blind test of organic crystal structure prediction.
Free energy differences and uncertainties in kcal/mol per molecule between experimental polymorphs of compound XXIII computed using MBAR.
Transformation . | ΔG . | Uncertainty . |
---|---|---|
B → D | 0.656 | 0.067 |
D → E | 0.183 | 0.097 |
E → C | 0.027 | 0.092 |
C → A | −0.482 | 0.095 |
A → B | −0.386 | 0.065 |
Cycle closure | −0.001 |
Transformation . | ΔG . | Uncertainty . |
---|---|---|
B → D | 0.656 | 0.067 |
D → E | 0.183 | 0.097 |
E → C | 0.027 | 0.092 |
C → A | −0.482 | 0.095 |
A → B | −0.386 | 0.065 |
Cycle closure | −0.001 |
The estimated free energy differences from both BAR and MBAR follow the trends observed from the AMOEBA lattice potential energy differences. Individual free energy differences between each of the experimental polymorphs, their associated uncertainties, and cycle closure errors can be found in Table IV.
Orthogonal space tempering
Figure 6 represents a visualization of the OST algorithm as applied to a simulation of carbamazepine as it transitions between vacuum to crystalline phases. In Fig. 6(a), the ensemble average partial derivative of the potential energy with respect to the path variable, , is plotted as a black solid line. The deposition free energy difference at each relative to the vacuum state (i.e., ) is represented as a blue dashed line. Figure 6(b) demonstrates a contour plot of the total OST bias (i.e., the summation of both 1D and 2D bias contributions). The addition of OST bias promotes sampling away from the free energy minimum, while in the crystalline phase (near or at ) encouraging the exploration of free energy barriers along (orthogonal to the path). Figure 6(c) features only the two-dimensional bias contribution as a contour plot by removing the 1D bias, which is constant for each value. The partial derivatives needed for OST (, , ) are supported for softcore van der Waals interactions, softcore charge (or multipolar) interactions, and polarization energy contributions within the CPU code path,6 but not yet available within OpenMM.129
Plots illustrating the use of the OST sampling approach on a carbamazepine simulation between vacuum () and crystalline () phases. (a) The ensemble average partial derivative of the potential energy with respect to the path variable , , (black solid line) and the deposition free energy difference obtained as the integration over the phase transition path (blue dashed line). (b) A contour plot of the combined contributions from the 1D and 2D biases. (c) A contour plot of only the 2D bias contributions. Both (b) and (c) show as a function of .
Plots illustrating the use of the OST sampling approach on a carbamazepine simulation between vacuum () and crystalline () phases. (a) The ensemble average partial derivative of the potential energy with respect to the path variable , , (black solid line) and the deposition free energy difference obtained as the integration over the phase transition path (blue dashed line). (b) A contour plot of the combined contributions from the 1D and 2D biases. (c) A contour plot of only the 2D bias contributions. Both (b) and (c) show as a function of .
PROPERTIES AND ANALYSIS
A family of commands leverage coordinate snapshots and trajectories to analyze the properties of organic materials and biomolecules. The Energy command provides a summary of the potential energy contributions for all terms in use (e.g., bonds, angles, torsion, van der Waals, permanent electrostatics, polarization, and various restraints). The Volume command supports calculation of molecular volume and surface area using either the GaussVol125 or the Connolly algorithm.126 The former is limited to calculation of van der Waals volume and surface area. The latter additionally supports calculation of both (1) molecular volume and surface area and (2) solvent excluded volume and solvent accessible surface area.
BAR and MBAR can be applied to single or dual topology systems after simulation with molecular dynamics or Metropolis Monte Carlo techniques.127,128 BAR can save Tinker.bar files that contain energy evaluations for snapshots in each λ value, which is particularly useful for re-evaluating free energy differences using subsets of the coordinate snapshots (e.g., to compare free energy difference estimates from the first and second half of trajectories). With an additional script SortArc, coordinate snapshots from thermodynamics simulations using the replica exchange algorithm can be reorganized and analyzed with BAR.129 Specifically, SortArc sorts the multi-state snapshot files into archive files containing snapshots for only a single state (i.e., all snapshots at λ = 1 will be contained in a single .arc file).
Finally, the Histogram command is used to load a 2D count matrix from an OST simulation (stored in a histogram file *.his). Histogram then leverages the counts to first compute the ensemble average partial derivative of the potential energy with respect to the state variable followed by estimation of the free energy difference as a function λ using thermodynamic integration. The Histogram command can optionally save out text files that are input to a simple Matlab script that plots the 1D potential of mean force (PMF) and 2D OST bias to visualize the free energy surface as a function of both λ and .
ROTAMER OPTIMIZATION OF MANY-BODY POTENTIAL
Rotamer optimization can be performed on a protein using the ManyBody command. The total energy of the protein is optimized with the many-body expansion through movement of defined side chain positions relative to the backbone (rotamers). The user can choose between two rotamer libraries with rotamers for every amino acid, including the protonated and deprotonated form of titratable residues.130,131 ManyBody results in an optimized structure with side chains in the global minimum energy conformation (GMEC). The user can improve the rotamer optimization by removing default approximations within the many-body expansion. The distance cutoffs for both two- and three-body energy terms can be increased to capture interactions of more distal residues. Users can adjust energy cutoffs for clash pruning, add or remove the three-body energy term, or allow soft-coring of clashes. There are multiple residue selection algorithms that include providing start and end residues as well as a user-specified list of residues. Rotamer optimization with ManyBody improves the overall structural quality of proteins.
For example, rotamer optimization with local minimization was applied to AlphaFold2132 deep learning algorithm predicted isoform-specific protein structures for 218 protein-coding genes found in the Deafness Variation Database (DVD).133 The MolProbity algorithm evaluated structures before and after optimization to quantify the improvement in atomic clashes, backbone angles, and side chain conformations.5 Structures from AlphaFold2 had an average clash score of 20.75 (i.e., number of unphysical overlaps per 1000 atoms), and the overall MolProbity score was 2.86 (i.e., the protein models were consistent with those from 2.86 Å resolution diffraction data). After optimization with FFX, the average clash score of the protein models dropped to only 0.11, and the protein structures had an average MolProbity score of 0.97, which is consistent with models from atomic resolution diffraction data.
REFINEMENT AGAINST X-RAY, NEUTRON, AND JOINT X-RAY/NEUTRON TARGETS
Real space refinement is also supported using CryoEM electron density maps or those from a Fourier synthesis, such as or , where are observed structure factors and are calculated structure factors. Prior work has shown that PME electrostatics, especially when combined with a polarizable multipole force field, improves structure quality.1 Further model improvements are afforded by use of a differentiable bulk solvent model9 and global optimization of sidechain rotamers using a many-body expansion of Eq. (35) coupled to Goldstein elimination criteria.11 Work remains to implement a convergent and efficient implementation of generalized Kirkwood continuum electrostatics under periodic boundary conditions, which will open the door to estimation of free energy differences between model conformations and chemical compositions.
UNIT TESTING AND CONTINUOUS INTEGRATION
FFX currently includes more than 500 JUnit tests, with many building on the commands described previously (e.g., Energy, Thermodynamics, and ManyBody) to validate both core functionality and command line flags. The number of JUnit tests for each package and current percentage of code covered by each test is summarized in Table V. No JUnit tests specific to Parallel Java have been created due to the careful work by its original author.57 We also lack JUnit tests for the graphical user interface. The OpenMM classes are covered by JUnit tests in the Potential and Algorithms packages. A Jenkins continuous integration server is used to automatically run all FFX unit tests and generate the Force Field X website based on polling the GitHub repository. The FFX website includes documentation for all commands and properties generated from annotations within Java code. Future work will focus on expanding test coverage for emerging methods within the Algorithms and Refinement packages.
The number of unit tests and code coverage for most of the FFX packages.
Package . | Number of tests . | Code coverage (%) . |
---|---|---|
Algorithms | 100 | 32 |
Crystal | 13 | 91 |
Numerics | 175 | 82 |
OpenMM | ⋯ | 58 |
Potential | 201 | 60 |
Refinement | 27 | 40 |
Utilities | 2 | 14 |
Package . | Number of tests . | Code coverage (%) . |
---|---|---|
Algorithms | 100 | 32 |
Crystal | 13 | 91 |
Numerics | 175 | 82 |
OpenMM | ⋯ | 58 |
Potential | 201 | 60 |
Refinement | 27 | 40 |
Utilities | 2 | 14 |
SHARED MEMORY, MPI, AND GPU PARALLELIZATION
Shared memory parallelization
FFX leverages Parallel Java (PJ) for shared memory parallelism (SMP)57 based on classes that offer functionality that is analogous to OpenMP style pragmas. Rather than annotating a loop with a pragma, PJ defines various “ForLoop” classes that are extended and then executed by a collection of threads called a “ParallelTeam.”58 By default, nonbonded forces are calculated using all available threads. For PME, the direct space and reciprocal terms can be calculated concurrently. The direct space contributions are organized using neighbor lists to distribute and balance the workload. Parallelization of the 3D convolution operation that is the basis of reciprocal space PME has been described previously.1
Message passing across nodes
FFX executes on a single process by default but supports the cooperation of multiple processes through the use of the PJ Scheduler and MPI implementation of PJ. The Scheduler organizes execution of a parallel job across a cluster of multiple processes with a user defined number of threads per process and memory per process. The Scheduler defaults to evenly splitting all available cores if this property remains unspecified by the user. The Scheduler and MPI constructs can be used in conjunction with both the SMP approach described above and the GPU support described below.
For example, OST free energy difference calculations can be accelerated through MPI parallelization over multiple walkers that each contributes counts to the same 2D histogram (Multiple Walker OST). The Scheduler launches individual trajectories that each starts from an identical value of the state variable λ or be distributed across the thermodynamic path (i.e., each walker has a unique starting value of λ). The samples from any given walker are communicated to all other walkers such that each process is applying the same OST bias and thereby providing the same estimated free energy difference. The addition of walkers enhances sampling and facilitates convergence. MPI approaches are also employed in the context of both many-body optimization and replica exchange constant pH MD.
Usage of GPUs via OpenMM
OpenMM binaries are bundled with FFX to allow for usage of GPUs.20 Alternatively, a source code can be built and used via JNA or JExtract. FFX has Java implementations for the majority of C++ classes available in OpenMM (i.e., Context, Platform, and State). Many-body optimization employs OpenMM for force field energy calculations and in turn for self, pair, and triple energy calculations. The initialization of many-body occurs on the CPU, while the AMOEBA energy calculations are performed after creating an OpenMM context and moving to the GPU. The finalization of the global minimum energy conformation is passed back to the CPU.4 GPU acceleration is also available for MD, AMOEBA/GK calculations, replica exchange constant pH MD (hybrid CPU–GPU implementation), etc. SMP and MPI can be used in conjunction with OpenMM.
BENCHMARKS
Energy and force timings for simulating carbamazepine crystalline units are presented in Table VI for three polymorphs to show the efficiency gained by simulating asymmetric units relative to replicated unit cells. Furthermore, the carbamazepine polymorph deposition simulation that produced the plots in Fig. 6 was performed utilizing two threads of a recent Intel CPU (a Xeon Gold 6330 CPU at 2.00 GHz). Simulating for 3.6 h produced 1 ns of sampling using the AMOEBA force field or more than 350 ns/day using all 112 threads/56 cores of a dual CPU configuration.
Comparison of the asymmetric unit (ASU), unit cell (UC), and replicated unit cell that satisfies minimum image convention for several experimental polymorphs of carbamazepine.
CSD ID . | Crystalline unit . | Space group . | Number of molecules . | Degrees of freedom . | Time for energy and force evaluations (s) . |
---|---|---|---|---|---|
CBMZPN 02 | ASU | P21/n | 1 | 90 | 0.027 |
UC | P1 | 4 | 360 | 0.029 | |
4 × 3 × 3 UC | P1 | 144 | 12 960 | 0.733 | |
CBMZPN 12 | ASU | C2/c | 1 | 90 | 0.026 |
UC | P1 | 8 | 720 | 0.055 | |
2 × 5 × 3 UC | P1 | 240 | 21 600 | 1.116 | |
CBMZPN 16 | ASU | Pbca | 1 | 90 | 0.034 |
UC | P1 | 8 | 720 | 0.046 | |
4 × 3 × 2 UC | P1 | 192 | 17 280 | 0.894 |
CSD ID . | Crystalline unit . | Space group . | Number of molecules . | Degrees of freedom . | Time for energy and force evaluations (s) . |
---|---|---|---|---|---|
CBMZPN 02 | ASU | P21/n | 1 | 90 | 0.027 |
UC | P1 | 4 | 360 | 0.029 | |
4 × 3 × 3 UC | P1 | 144 | 12 960 | 0.733 | |
CBMZPN 12 | ASU | C2/c | 1 | 90 | 0.026 |
UC | P1 | 8 | 720 | 0.055 | |
2 × 5 × 3 UC | P1 | 240 | 21 600 | 1.116 | |
CBMZPN 16 | ASU | Pbca | 1 | 90 | 0.034 |
UC | P1 | 8 | 720 | 0.046 | |
4 × 3 × 2 UC | P1 | 192 | 17 280 | 0.894 |
Rotamer optimization was performed for a turkey-ovomucoid third domain, a 56 amino acid protein structure requiring ∼112.5 thousand AMOEBA/GK energy calculations with zero, one, two, and four GPUs on Nvidia A10 s with 28 Intel CPU cores per GPU. The simulation experienced a 9.4× speed up from 7.6 AMOEBA energy calculations per second to 71.7 per second when increasing from no GPU to four GPUs (Table VII).
AMOEBA/GK energy evaluations per second with different numbers of GPU’s performing a many-body optimization on a 56-residue turkey-ovomucoid third domain protein.
Architecture . | AMOEBA energy evaluations (s) . |
---|---|
CPU | 7.6 |
1 GPU | 29.9 |
2 GPUs | 48.8 |
4 GPUs | 71.7 |
Architecture . | AMOEBA energy evaluations (s) . |
---|---|
CPU | 7.6 |
1 GPU | 29.9 |
2 GPUs | 48.8 |
4 GPUs | 71.7 |
Finally, for molecular dynamics simulations that can be offloaded to OpenMM (e.g., unit cell lengths that are larger than twice the non-bonded cutoff and do not require symmetry operators), the benchmarks described by the OpenMM developers20,134 are representative of the performance in FFX.
CONCLUSIONS
This article has demonstrated significant use cases and advancements available in FFX for atomic resolution modeling of organic materials and biomolecules. Specifically, we have highlighted FFX’s handling of crystal structures and data, the generalized Kirkwood implicit solvent model, constant-pH MD for the polarizable AMOEBA force field, dual topology methods, and global side chain optimization. FFX development will continue with novel algorithms and advanced treatment of force fields, acid–base chemistry, and prediction of crystal properties. Some methods under active development include a statistical mechanics method for accelerated acid–base chemistry calculations, AMOEBA folding and binding free energy difference predictions for amino acid mutations, GPU acceleration of constant pH MD, and methods for relative polymorph free energy differences. It is our hope that the open-source and freely available FFX software can serve as a computational microscope to understand the biophysics of organic materials and biomolecules.
ACKNOWLEDGMENTS
A.C.T., R.A.C.G., and M.R.T. were supported by the NSF (National Science Foundation) Graduate Research Fellowship under Grant No. 000390183. R.A.C.G. and A.J.N. were partially supported by a Ballard and Seashore Dissertation Fellowship from the University of Iowa. R.A.C.G., G.Q., and L.M.C. were partially supported by fellowships from the University of Iowa Office of Undergraduate Research. J.W.P. and P.R. were supported by NIH under Grant Nos. R01GM114237 and R01GM106137. J.S. was supported by NIH under Grant No. R35GM148261. M.J.S. was supported by NSF under Grant No. CHE-1751688. M.J.S. and R.J.H.S. were supported by NIH under Grant No. R01DC012049. M.J.S. and J.J.M. were supported by the Simons Foundation SFARI under Award No. 1019623.
AUTHOR DECLARATIONS
Conflict of Interest
J.-P. Piquemal, J. W. Ponder, and P. Ren are cofounders of Qubit Pharmaceuticals.
Author Contributions
R.A.G., A.J.N., and A.C.T. contributed equally to this work.
Rose A. Gogal: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Aaron J. Nessler: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Andrew C. Thiel: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Hernan V. Bernabe: Data curation (equal); Methodology (equal); Software (equal); Writing – review & editing (equal). Rae A. Corrigan Grove: Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (supporting); Writing – review & editing (supporting). Leah M. Cousineau: Data curation (supporting); Investigation (supporting); Methodology (supporting); Software (supporting); Visualization (supporting); Writing – review & editing (supporting). Jacob M. Litman: Data curation (supporting); Formal analysis (supporting); Methodology (equal); Software (equal); Validation (equal); Writing – original draft (supporting); Writing – review & editing (equal). Jacob M. Miller: Data curation (supporting); Formal analysis (supporting); Investigation (supporting); Methodology (supporting); Software (supporting); Validation (equal); Writing – original draft (supporting); Writing – review & editing (equal). Guowei Qi: Formal analysis (supporting); Methodology (supporting); Software (supporting); Validation (supporting); Writing – review & editing (supporting). Matthew J. Speranza: Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Writing – original draft (equal); Writing – review & editing (equal). Mallory R. Tollefson: Data curation (supporting); Formal analysis (supporting); Investigation (supporting); Methodology (supporting); Software (supporting); Validation (supporting); Writing – original draft (supporting); Writing – review & editing (supporting). Timothy D. Fenn: Conceptualization (supporting); Data curation (supporting); Investigation (supporting); Methodology (supporting); Software (supporting); Validation (supporting); Visualization (supporting); Writing – review & editing (supporting). Jacob J. Michaelson: Funding acquisition (supporting); Resources (supporting); Supervision (supporting); Writing – review & editing (supporting). Okimasa Okada: Data curation (supporting); Formal analysis (supporting); Investigation (equal); Methodology (equal); Software (supporting); Validation (equal); Writing – review & editing (supporting). Jean-Philip Piquemal: Data curation (equal); Funding acquisition (equal); Methodology (equal); Resources (equal); Validation (equal); Writing – review & editing (equal). Jay W. Ponder: Funding acquisition (supporting); Methodology (supporting); Resources (supporting); Software (supporting); Validation (supporting); Writing – review & editing (supporting). Jana Shen: Formal analysis (supporting); Investigation (supporting); Methodology (equal); Software (supporting); Validation (supporting); Writing – review & editing (supporting). Richard J. H. Smith: Funding acquisition (supporting); Methodology (supporting); Resources (supporting); Writing – review & editing (supporting). Wei Yang: Investigation (supporting); Methodology (supporting); Resources (supporting); Software (supporting); Writing – review & editing (supporting). Pengyu Ren: Data curation (supporting); Funding acquisition (equal); Methodology (equal); Project administration (supporting); Resources (supporting); Software (supporting); Validation (supporting); Writing – review & editing (supporting). Michael J. Schnieders: Conceptualization (equal);Data curation (equal); Formal analysis (equal); Funding acquisition(equal); Investigation (equal); Methodology (equal); Project administration(equal); Resources (equal); Software (equal); Supervision(equal); Validation (equal); Visualization (equal); Writing – originaldraft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The Force Field X code is available on GitHub (https://github.com/SchniedersLab/forcefieldx) under the GNU General Public License, version 3, with the Classpath Exception. Force Field X documentation, including instructions to install precompiled versions or build the software from source code, is available from the University of Iowa (https://ffx.biochem.uiowa.edu).