DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and material science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features, such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensile properties, type embedding, model deviation, DP-range correction, DP long range, graphics processing unit support for customized operators, model compression, non-von Neumann molecular dynamics, and improved usability, including documentation, compiled binary packages, graphical user interfaces, and application programming interfaces. This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, this article presents a comprehensive procedure for conducting molecular dynamics as a representative application, benchmarks the accuracy and efficiency of different models, and discusses ongoing developments.
I. INTRODUCTION
In recent years, the increasing popularity of machine learning potentials (MLPs) has revolutionized molecular dynamics (MD) simulations across various fields, including neural network potentials (NNPs),1–19 message passing models,7,20–24 and other machine learning models.25–28 Numerous software packages have been developed to support the use of MLPs.13,29–40 One of the main reasons for the widespread adoption of MLPs is their exceptional speed and accuracy, which outperform traditional molecular mechanics (MM) and ab initio quantum mechanics (QM) methods.41,42 As a result, MLP-powered MD simulations have become ubiquitous in the field and are increasingly recognized as a valuable tool for studying atomistic systems.43–49
The DeePMD-kit is an open-source software package that facilitates molecular dynamics (MD) simulations using neural network potentials. The package was first released in 201729 and has since undergone rapid development with contributions from many developers. The DeePMD-kit implements a series of MLP models known as Deep Potential (DP) models,9,10,50–54 which have been widely adopted in the fields of physics, chemistry, biology, and material science for studying a broad range of atomistic systems. These systems include metallic materials,55 non-metallic inorganic materials,56–60 water,61–71 organic systems,10,72 solutions,52,73–76 gas-phase systems,77–80 macromolecular systems,81,82 and interfaces.83–87 Furthermore, the DeePMD-kit is capable of simulating systems containing almost all Periodic Table elements,51 operating under a wide range of temperature and pressure,88 and can handle drug-like molecules,72,89 ions,73,76 transition states,75,77 and excited states.90 As a result, the DeePMD-kit is a powerful and versatile tool that can be used to simulate a wide range of atomistic systems. Here, we present three exemplary instances that highlight its diverse applications.
Theoretical investigation of the water phase diagram poses a significant challenge due to the requirement for a highly accurate model of water interatomic interactions.91,92 Consequently, it serves as an exceptionally stringent test for the model’s accuracy and provides a means to validate the software implementation necessary for molecular dynamics simulations used in phase diagram calculations.92 Zhang et al.88 utilized DeePMD-kit to construct a deep potential model for the water system, covering a range of thermodynamic states from 0 to 2400 K and 0–50 GPa. The model was trained on density functional theory (DFT) data generated using the SCAN approximation of the exchange–correlation functional and exhibited consistent accuracy [with an Root mean square error (RMSE) of less than 1 meV/H2O] within the relevant thermodynamic range. Moreover, it accurately predicted fluid, molecular, and ionic phases and all stable ice polymorphs within the range, except for phases III and XV. The study extensively investigated the two first-order phase transitions from ice VII to VII” and VII” to ionic fluid and the atomistic mechanism of proton diffusion, leveraging the model’s capability and high accuracy in predicting water molecule ionization.
Another challenging area is condensed-phase MD simulations, as long-range interactions are critical for modeling heterogeneous systems in the condensed phase. Electrostatic interactions are not only the longest but also are well-understood, and linear-scaling methods exist for their efficient computation at the point charge,93 multipole,94,95 and quantum mechanical96,97 levels. Fast semiempirical quantum mechanical methods can be developed98,99 that can accurately and efficiently model charge densities and many-body effects in the long-range but may still lack quantitative accuracy in the mid-range (typically less than 8 Å). This limits the predictive capability of the methods in condensed-phase simulations. Zeng et al.52 created a new Δ-MLP method called Deep Potential-Range correction (DPRc) to integrate with combined quantum mechanical/molecular mechanical (QM/MM) potentials, which corrects the potential energy from a fast, linear-scaling low-level semiempirical QM/MM theory to a high-level ab initio QM/MM theory. Unlike many of the emerging Δ-MLPs that correct internal QM energy and forces, the DPRc model corrects both the QM–QM and QM–MM interactions of a QM/MM calculation in a manner that conserves energy as MM atoms enter (or leave) the vicinity of the QM region. This enables the model to be easily integrated as a mid-ranged correction to the potential energy within molecular simulation software that uses non-bonded lists, i.e., for each atom, a list of other atoms within a fixed cut-off distance (typically 8–12 Å). The trained DPRc model with a 6 Å range-correction was applied to simulate RNA 2′-O-transphosphorylation reactions in solution in long timescales75 and obtain better free energy estimates with the help of the generalization of the weighted thermodynamic perturbation (gwTP) method.100 Very recently, Zeng et al.72 have trained a Δ-MLP correction model called Quantum Deep Potential Interaction (QDπ) for drug-like molecules, including tautomeric forms and protonation states, which was found to be superior to other semiempirical methods and pure MLP models.89
The third important application is large-scale reactive MD simulations over a nanosecond time scale, which enable the construction of interwoven reaction networks for complex reactive systems101 instead of focusing on studying a single reaction. These simulations require the potential energy model to be accurate and computationally efficient, covering the chemical space of possible reactions. Zeng et al.77 introduced a deep potential model for simulating 1 ns methane combustion reactions and identified 798 different chemical reactions in both space and time using the ReacNetGenerator package.102 The concurrent learning procedure103 was adopted and proved crucial in exploring known and unknown chemical space during the complex reaction process. Subsequent work conducted by the research team extended these simulations to more complex reactive systems, including linear alkane pyrolysis,78 decomposition of explosive,79,104 and the growth of polycyclic aromatic hydrocarbon.80
Compared to its initial release,29 DeePMD-kit has evolved significantly, with the current version (v2.2.1) offering an extensive range of features. These include DeepPot-SE, attention-based, and hybrid descriptors,10,50,51,53 the ability to fit tensorial properties,105,106 type embedding, model deviation,103,107 Deep Potential-Range Correction (DPRc),52,75 Deep Potential Long Range (DPLR),53 graphics processing unit (GPU) support for customized operators,108 model compression,109 non-von Neumann molecular dynamics (NVNMD),110 and various usability improvements, such as documentation, compiled binary packages, graphical user interfaces (GUIs), and application programming interfaces (APIs). This article provides an overview of the current major version of the DeePMD-kit, highlighting its features and technical details, presenting a comprehensive procedure for conducting molecular dynamics as a representative application, benchmarking the accuracy and efficiency of different models, and discussing ongoing developments.
II. FEATURES
In this section, we introduce features from the perspective of components (shown in Fig. 1). A component represents units of computation. It is organized as a Python class inside the package, and a corresponding TensorFlow static graph will be created at runtime.
The components of the DeePMD-kit package. The direction of the arrow indicates the dependency between the components. The blue box represents an optional component.
The components of the DeePMD-kit package. The direction of the arrow indicates the dependency between the components. The blue box represents an optional component.
A. Models
1. Neural networks
2. Descriptors
DeePMD-kit supports multiple atomic descriptors, including the local frame descriptor, the two-body and three-body embedding DeepPot-SE descriptor, the attention-based descriptor, and the hybrid descriptor that is defined as a combination of multiple descriptors. In the following text, we use to represent the atomic descriptor of the atom i.
a. Local frame.
The limitation of the local frame descriptor is that it is not smooth at the cutoff radius and the exchanging of the order of two nearest neighbors [i.e., the swapping of a(i) and b(i)], so its usage is limited. We note that the local frame descriptor is the only non-smooth descriptor among all DP descriptors, and we recommend using other descriptors for the usual system.
b. Two-body embedding DeepPot-SE.
c. Three-body embedding DeepPot-SE.
d. Handling the systems composed of multiple chemical species.
e. Type embedding.
f. Attention-based descriptor.
g. Hybrid descriptor.
h. Compression.
The compression of the DP model uses three techniques, tabulated inference, operator merging, and precise neighbor indexing, to improve the performance of model training and inference when the model parameters are properly trained.109
In the standard DP model inference, taking the two-body embedding descriptor as an example, the matrix product requires the transfer of the tensor between the register and the host/device memories, which usually becomes the bottle-neck of the computation due to the relatively small memory bandwidth of the GPUs. The compressed DP model merges the matrix multiplication with the tabulated inference step. More specifically, once one column of is evaluated, it is immediately multiplied with one row of the environment matrix in the register, and the outer product is deposited to the result of . By the operator merging technique, the allocation of and the memory movement between register and host/device memories is avoided. The operator merging of the three-body embedding can be derived analogously.
The first dimension, Nc, of the environment and embedding matrices is the expected maximum number of neighbors. If the number of neighbors of an atom is smaller than Nc, the corresponding positions of the matrices are pad with zeros. In practice, if the real number of neighbors is significantly smaller than Nc, a notable operation is spent on the multiplication of padding zeros. In the compressed DP model, the number of neighbors is precisely indexed at the tabulated inference stage, further saving computational costs.
3. Fitting networks
The fitting network can fit the potential energy of a system, along with the force and the virial, and tensorial properties, such as the dipole and the polarizability.
a. Fitting potential energies.
b. Fitting tensorial properties.
c. Handling the systems composed of multiple chemical species.
4. Deep potential range correction (DPRc)
5. Deep potential long range (DPLR)
6. Interpolation with a pairwise potential
B. Trainer
1. Learning rate
2. Loss function
3. Training process
4. Multiple task training
C. Model deviation
Statistics of ϵF,i and ϵΞ,αβ can be provided, including the maximum, average, and minimal model deviation over the atom index i and over the component index α, β, respectively. The maximum model deviation of forces ϵF,max in a frame was found to be the best error indicator in a concurrent or active learning algorithm.103,107
III. TECHNICAL IMPLEMENTATION
In addition to incorporating new powerful features, DeePMD-kit has been designed with the following goals in mind: high performance, high usability, high extensibility, and community engagement. These goals are crucial for DeePMD-kit to become a widely-used platform across various computational fields. In this section, we will introduce several technical implementations that have been put in place to achieve these goals.
A. Code architecture
The DeePMD-kit utilizes TensorFlow’s computational graph architecture to construct its DP models,125 which are composed of various operators implemented with C++, including customized ones, such as the environment matrix, Ewald summation, compressed operator, and their backward propagations. The auto-grad mechanism provided by TensorFlow is used to compute the derivatives of the DP model with respect to the input atomic coordinates and simulation cell tensors. To optimize performance, some of the critical customized operators are implemented for GPU execution using CUDA or ROCm toolkit libraries. The DeePMD-kit provides Python, C++, and C APIs for inference, facilitating easy integration with third-party software packages. As indicated in Fig. 2, the code of the DeePMD-kit consists of the following modules:
The core C++ library provides the implementation of customized operators, such as the atomic environmental matrix, neighbor lists, and compressed neural networks. It is important to note that the core C++ library is independently built and tested without TensorFlow’s C++ interface.
The GPU library (CUDA126 or ROCm127), an optional part of the core C++ library, is used to compute customized operators on GPU devices other than central processing units (CPUs). This library depends on the GPU toolkit library (NVIDIA CUDA Toolkit or AMD ROCm Toolkit) and is also independently built and tested.
The DP operators library contains several customized operators not supported by TensorFlow.125 TensorFlow provides both Python and C++ interfaces to implement some customized operators, with the TensorFlow C++ library packaged inside its Python package.
The “model definition” module, written in Python, is used to generate computing graphs composed of TensorFlow operators, DP customized operators, and model parameters organized as “variables.” The graph can be saved into a file that can be restored for inference. It depends on the TensorFlow Python API (version 1, tf.compat.v1) and other Python dependencies, such as the NumPy128 and H5Py129 packages.
The Python application programming interface (API) is used for inference and can read computing graphs from a file and use the TensorFlow Python API to execute the graph.
The C++ API, built on the TensorFlow C++ interface, does the same thing as the Python API for inference.
The C API is a wrapper of the C++ API and provides the same features as the C++ API. Compared to the C++ API, the C API has a more stable application binary interface (ABI) and ensures backward compatibility.
The header-only C++ API is a wrapper of the C API and provides the same interface as the C++ API. It has the same stable ABI as the C API but still takes advantage of the flexibility of C++.
The command line interface (CLI) is provided to both general users and developers and is used for both training and inference. It depends on the model definition module and the Python API.
The architecture of the DeePMD-kit code. The red boxes are modules within the DeePMD-kit package (the green box), the orange box represents computing graphs, the blue boxes are dependencies of the DeePMD-kit, and the yellow box represents third-party packages integrated with DeePMD-kit, including LAMMPS, i-PI, GROMACS, AMBER, OpenMM, ABACUS, ASE, MAGUS, DP-Data, DP-GEN, and MLatom. Customized operators are operators that are not offered by TensorFlow, including atomic environmental matrix, interpolation with a pairwise potential, and tabulated inference of the embedding matrix. The direction of the black arrow A → B indicates that module A is dependent on module B. The red and purple arrows represent “define” and “use,” respectively.
The architecture of the DeePMD-kit code. The red boxes are modules within the DeePMD-kit package (the green box), the orange box represents computing graphs, the blue boxes are dependencies of the DeePMD-kit, and the yellow box represents third-party packages integrated with DeePMD-kit, including LAMMPS, i-PI, GROMACS, AMBER, OpenMM, ABACUS, ASE, MAGUS, DP-Data, DP-GEN, and MLatom. Customized operators are operators that are not offered by TensorFlow, including atomic environmental matrix, interpolation with a pairwise potential, and tabulated inference of the embedding matrix. The direction of the black arrow A → B indicates that module A is dependent on module B. The red and purple arrows represent “define” and “use,” respectively.
The CMake build system130 manages all modules, and the pip and scikit-build131 packages are used to distribute DeePMD-kit as a Python package. The standard Python unit testing framework132 is used for unit tests on all Python codes, while GoogleTest software133 is used for tests on all C++ codes. GitHub Actions automates build, test, and deployment pipelines.
B. Performance
1. Hardware acceleration
In the TensorFlow framework, a static graph combines multiple operators with inputs and outputs. Two kinds of operators are time-consuming during training or inference. The first one is TensorFlow’s native operators for neural networks (see Sec. II A 1) and matrix operations, which have been fully optimized by the TensorFlow framework itself125 for both CPU and GPU architectures. Second, the DeePMD-kit’s customized operators are for computing the atomic environment [Eqs. (6) and (12)], for interpolation with a pairwise potential, and for the tabulated inference of the embedding matrix [Eq. (30)]. These operators are not supported by the TensorFlow framework but can be accelerated using OpenMP,134 CUDA,126 and ROCm127 for parallelization under both CPUs and GPUs, except the features without GPU supports listed in Appendix B.
2. MPI implementation for multi-device training and MD simulations
Users may prefer to utilize multiple CPU cores, GPUs, or hardware across multiple nodes to achieve faster performance and larger memory during training or molecular dynamics (MD) simulations. To facilitate this, DeePMD-kit has added message-passing interface (MPI) implementation135,136 for multi-device training and MD simulations in two ways, which are described below.
Multi-device training is conducted with the help of Horovod, a distributed training framework.137 Horovod works in the data-parallel mode by equally distributing a batch of data among workers along the axis of the batch size .138 During training, each worker consumes sliced input records at different offsets, and only the trainable parameter gradients are averaged with peers. This design avoids batch size and tensor shape conflicts and reduces the number of bytes that need to be communicated among processes. The mpi4py package139 is used to remove redundant logs.
Multi-device MD simulations are implemented by utilizing the existing parallelism features of third-party MD packages. For example, a Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) enables parallelism across CPUs by optimizing partitioning, communication, and neighbor lists.140 AMBER builds a similar neighbor list in the interface to DeePMD-kit.52,54,141 DeePMD-kit supports local atomic environment calculation and accepts the neighbor list n(i) from other software to replace the native neighbor list calculation.108 In a device, the neighbors from other devices are considered “ghost” atoms that do not contribute atomic energy Ei to this device’s total potential energy E.
3. Non-von Neumann molecular dynamics (NVNMD)
When performing molecular dynamics (MD) simulations on CPUs and GPUs, a large majority of time and energy (e.g., more than 95%) is consumed by the DP model inference. This inference process is limited by the “memory wall” and “power wall” bottlenecks of von Neumann (vN) architecture, which means that a significant amount of time and energy (e.g., over 90%) is wasted on data transfer between the processor and memory. As a result, it is difficult to improve computational efficiency.
To address these challenges, non-von Neumann molecular dynamics (NVNMD) uses a non-von Neumann (NvN) architecture chip to accelerate inference. The NvN chip contains processing and memory units that can be used to implement the DP algorithm. In the NvN chip, the hardware algorithm runs fully pipelined. The model parameters are stored in on-chip memory after being loaded from off-chip memory during the initialization process. Therefore, two components of data shuttling are avoided: (1) reading/writing the intermediate results from/to off-chip memory and (2) loading model parameters from off-chip memory during the calculation process. As a result, the DP model ensures high accuracy with NVNMD, while the NvN chip ensures high computational efficiency. For more details, see Ref. 110.
C. Usability
1. Documentation
DeePMD-kit’s features and arguments have grown rapidly with more and more development. To address this issue, we have introduced Sphinx142 and Doxygen143 to manage and generate documentation for developers from docstrings in the code. We use the DArgs package (see Sec. III E) to automatically generate Sphinx documentation for user input arguments. The documentation is currently hosted on Read the Docs (https://docs.deepmodeling.org/projects/deepmd/). Furthermore, we strive to make the error messages raised by DeePMD-kit clear to users. In addition, the GitHub Discussion forum allows users to ask questions and receive answers. Recently, several tutorials have been published49,54 to help new users quickly learn DeePMD-kit.
2. Easy installation
As shown in Fig. 2, DeePMD-kit has dependencies on both Python and C++ libraries of TensorFlow, which can make it difficult and time-consuming for new users to build TensorFlow and DeePMD-kit from the source code. Therefore, we provide compiled binary packages that are distributed via pip, Conda (DeepModeling and conda-forge144 channels), Docker, and offline packages for Linux, macOS, and Windows platforms. With the help of these pre-compiled binary packages, users can install DeePMD-kit in just a few minutes. These binary packages include DeePMD-kit’s LAMMPS plugin, i-PI driver, and GROMACS patch. As LAMMPS provides a plugin mode in its latest version, DeePMD-kit’s LAMMPS plugin can be compiled without having to re-compile LAMMPS.140 We offer a compiled binary package that includes the C API and the header-only C++ API, making it simpler to integrate with sophisticated software, such as AMBER.52,54,141
3. User interface
DeePMD-kit offers a command line interface (CLI) for training, freezing, and testing models. In addition to CLI arguments, users must provide a JSON145 or YAML146 file with completed arguments for components listed in Sec. II. The DArgs package (see Sec. III E) parses these arguments to check if user input is correct. An example of how to use the user interface is provided in Ref. 54. Users can also use DP-GUI (see Sec. III E) to fill in arguments in an interactive web page and save them to a JSON145 file.
DeePMD-kit provides an automatic algorithm that assists new users in deciding on several arguments. For example, the automatic batch size determines the maximum batch size during training or inferring to fully utilize memory on a GPU card. The automatic neighbor size Nc determines the maximum number of neighbors by stating the training data to reduce model memory usage. The automatic probability determines the probability of using a system during training. These automatic arguments reduce the difficulty of learning and using the DeePMD-kit.
4. Input data
To train and test models, users are required to provide fitting data in a specified format. DeePMD-kit supports two file formats for data input: NumPy binary files128 and HDF5 files.147 These formats are designed to offer superior performance when read by the program with parallel algorithms compared to text files. HDF5 files have the advantage of being able to store multiple arrays in a single file, making them easier to transfer between machines. The Python package “DP-Data” (see Sec. III E) can generate these files from the output of an electronic calculation package.
5. Model visualization
DeePMD-kit supports most of the visualization features offered by TensorBoard,125 such as tracking and visualizing metrics, viewing the model graph, histograms of tensors, summaries of trainable variables, and debugging profiles.
D. Extensibility
1. Application programming interface and third-party software
DeePMD-kit offers various APIs, including the Python, C++, C, and header-only C++ API, as well as a command-line interface (CLI), as shown in Fig. 2. These APIs are primarily used for inference by developers and high-level users in different situations. Sphinx142 generates the API details in the documentation.
These APIs can be easily accessed by various third-party software. The Python API, for instance, is utilized by third-party Python packages, such as Atomic Simulation Environment (ASE),148 MAGUS,149 and DP-Data (see Sec. III E). The C++, C, or header-only C++ API has also been integrated into several third-party MD packages, such as LAMMPS,140,150 i-PI,151 GROMACS,152 AMBER,52,54,141 OpenMM,153,154 and ABACUS.155 Moreover, the CLI is called by various third-party workflow packages, such as DP-GEN107 and MLatom.34 While the ASE calculator, the LAMMPS plugin, the i-PI driver, and the GROMACS patch are developed within the DeePMD-kit code, others are distributed separately. By integrating these APIs into their programs, researchers can perform simulations and minimization, without being restricted by DeePMD-kit’s software features.72,75,83,156 Additionally, they can combine DP models with other potentials outside the DeePMD-kit package if necessary.52,72,157
Molecular dynamics, a primary application for DP models, is facilitated by several third-party packages that interface with the DeePMD-kit package, offering a wide range of supported features:
LAMMPS140 is seamlessly integrated with the DeePMD-kit through a dedicated plugin developed within the DeePMD-kit project. This plugin supports MPI, as discussed in Sec. III B 2, and provides essential functionalities, such as force calculations. Additionally, it enables on-the-fly computation of model deviation, as shown in Eqs. (68)–(72), during concurrent learning. The plugin can obtain atomic and frame-specific parameters in Eq. (38) from various sources, including constants, electronic temperatures calculated by LAMMPS, or any compute style from LAMMPS. LAMMPS also supports calculating classical point charges’ long-range (Coulomb) interaction using the Ewald summation and the fast algorithm particle–particle particle–mesh Ewald (PPPM). The k-space part of these methods, involving the Fourier space transformation of Gaussian charge distributions to compute the Coulomb interaction158 [as shown in Eq. (50)], is utilized by the DPLR method to handle the long-range interaction.
i-PI151 is integrated with the DeePMD-kit through a dedicated driver provided within the DeePMD-kit project. The driver enables the path integral molecular dynamics (PIMD) driven by the i-PI engine and is compatible with the MolSSI Driver Interface (MDI) package159 and a similar interface in the ASE package.148 However, the communication between the i-PI driver and the engine relies on UNIX-domain sockets or the network interface, which can limit performance. To overcome this limitation, developers have incorporated PIMD features into the LAMMPS package, allowing for seamless integration with the DeePMD-kit.
AMBER141 is integrated with the DeePMD-kit package through the customized source code.54 The AMBER/DeePMD-kit interface allows for effective QM/MM + DPRc simulations using the DPRc model.52 The interface extends beyond QM/QM interactions and includes a range correction for QM/MM interactions. The DeePMD-kit package only infers the selected QM region (assigned by an AMBER mask) and its MM buffer within the cutoff radius of the QM region. Like the LAMMPS integration, this interface supports MPI, as discussed in Sec. III B 2, and allows for on-the-fly computation of model deviation during concurrent learning. The AMBER/DeePMD-kit interface also enables alchemical free energy simulations to be performed, leveraging AMBER’s GPU-accelerated free energy engine120 and new features160–162 for MM transformations and using indirect MM → QM/Δ-MLP methods163 to correct the end states to the higher level.
OpenMM,153 a widely adopted molecular dynamics engine, integrates with the DeePMD-kit through an OpenMM plugin. This plugin enables standard molecular dynamics simulations with DP models and supports hybrid DP/MM-type simulations. In hybrid simulations, the system can be simulated with a fixed DP region or adaptively changing regions during the simulation.154
GROMACS152 is integrated with the DeePMD-kit through a patch to GROMACS. The patch enables DP/MM simulations by assigning the atom types inferred by DeePMD-kit.
ABACUS155 supports the C and C++ interfaces provided by DeePMD-kit. In addition, ABACUS supports various molecular dynamics based on different methods, such as classical molecular dynamics using LJ pair potential and first-principles molecular dynamics based on methods such as Kohn-Sham density functional theory (KSDFT), stochastic density functional theory (SDFT), and orbital-free density functional theory (OFDFT). The possibility of combining the DeePMD-kit with these methods requires further exploration.
These integrations and interfaces with existing packages offer researchers the flexibility to utilize the DeePMD-kit in conjunction with other powerful tools, enhancing the capabilities of molecular dynamics simulations.
2. Customized plugins
DeePMD-kit is built with an object-oriented design, and each component discussed in Sec. II corresponds to a Python class. One of the advantages of this design is the availability of a plugin system for these components. With this plugin system, developers can create and incorporate their customized components, without having to modify the DeePMD-kit package. This approach expedites the realization of their ideas. Moreover, the plugin system facilitates the addition of new components within the DeePMD-kit package itself.
E. DeepModeling community
DeePMD-kit is a free and open-source software licensed under the LGPL-3.0 license, enabling developers to modify and incorporate DeePMD-kit into their own packages. Serving as the core, DeePMD-kit led to the formation of an open-source community named DeepModeling in 2021, which manages open-source packages for scientific computing. Since then, numerous open-source packages for scientific computing have either been created or joined the DeepModeling community, such as DP-GEN,107 DeePKS-kit,164 DMFF,165 ABACUS,155 DeePH,166 and DeepFlame,167 among others, whether directly or indirectly related to DeePMD-kit. The DeepModeling packages that are related to DeePMD-kit are listed as follows.
Deep Potential GENerator (DP-GEN)107 is a package that implements the concurrent learning procedure103 and is capable of generating uniformly accurate DP models with minimal human intervention and computational cost. DP-GEN2 is the next generation of this package, built on the workflow platform Dflow.
Deep Potential Thermodynamic Integration (DP-Ti) is a Python package that enables users to calculate free energy, perform thermodynamic integration, and determine pressure-temperature phase diagrams for materials with DP models.
DP-Data is a Python package that helps users convert atomistic data between different formats and calculate atomistic data through electronic calculation and MLP packages. It can be used to generate training data files for DeePMD-kit and visualize structures via 3Dmol.js.168 The package supports a plugin system and is compatible with ASE,148 allowing it to support any data format without being limited by the package’s code.
DP-Dispatcher is a Python package used to generate input scripts for high-performance computing (HPC) schedulers, submit them to HPC systems, and monitor their progress until completion. It was originally developed as part of the DP-GEN package,107 but has since become an independent package that serves other packages.
DArgs is a Python package that manages and filters user input arguments. It provides a Sphinx142 extension to generate documentation for arguments.
DP-GUI is a web-based graphical user interface (GUI) built with the Vue.js framework.169 It allows users to fill in arguments interactively on a web page and save them to a JSON145 file. DArgs is used to provide details and documentation of arguments in the GUI.
IV. EXAMPLE APPLICATION: MOLECULAR DYNAMICS
This section introduces a general workflow for performing deep potential molecular dynamics using concurrent learning170 from scratch, as depicted in Fig. 3. The target simulation can encompass various conditions, such as temperature, pressure, and classical or path-integral dynamics, with or without enhanced sampling methods, in equilibrium or non-equilibrium states, and at different scales and time scales. It is important to note that this section does not serve as a user manual or tutorial or delve into specific systems.
The general workflow of performing deep potential molecular dynamics in the manner of concurrent learning.
The general workflow of performing deep potential molecular dynamics in the manner of concurrent learning.
The initial step involves preparing the initial dataset. This dataset is typically generated by sampling from small-scale, short-time MD simulations conducted under the same conditions as target simulations. The simulation level can vary, ranging from ab initio170 to semi-empirical52 or force fields,77 depending on the computational cost. Subsequently, these configurations are relabeled using high-accuracy ab initio methods.
If the ratio of accurate configurations (ϵF,max < θlow) in a simulation converges (remains unchanged in subsequent concurrent learning cycles), it can be considered as the target simulation, and the iteration can be stopped. Such a simulation trajectory can be further analyzed.
The above workflow can be executed manually or using the DP-GEN package107 automatically.
V. BENCHMARKING
We performed benchmarking on various potential energy models with different descriptors on multiple datasets to show the precision and performance of descriptors developed within the DeePMD-kit package. The datasets, the models, the hardware, and the results will be described and discussed in the following Secs. V A–V C.
A. Datasets
The datasets we used included water,9,61 copper (Cu),107 high entropy alloys (HEAs),51,171 OC2M subset in Open Catalyst 2020 (OC20),115,116 Small-Molecule/Protein Interaction Chemical Energies (SPICEs),104 and dipeptide subset in SPICE,104 as shown in Table I and listed as follows:
The water dataset contains of 140 000 configurations collected from path-integral ab initio MD simulations and classical ab initio MD simulations for liquid water and ice. Configurations were labeled using the hybrid version of Perdew–Burke–Ernzerhof (PBE0)172+ Tkatchenko–Scheffler (TS) functional and projector augmented-wave (PAW) method.173 The energy cutoff was set to 115 Ry (1565 eV).
The copper dataset consists of 15 366 configurations in Face Centered Cubic (FCC), Hexagonal Close Packed (HCP), and Body Centered Cubic (BCC) crystal. MD simulations sampled the configurations across a temperature range of 50–2579 K and a pressure range of 1–5 × 104 Bar. The concurrent learning scheme170 was employed to select the critical configurations that improved the accuracy of an ensemble of models used to estimate the model prediction error. The Perdew–Burke–Ernzerhof (PBE) functional174 and PAW method were used with an energy cutoff of 650 eV.
The High Entropy Alloy (HEA) dataset comprises six elements: Ta, Nb, W, Mo, V, and Al.51,171 These elements occupy a 2 × 2 × 2 BCC lattice consisting of 16 atoms in a random arrangement. The concentrations of Ta, Nb, W, Mo, and V encompass the entire composition space, while Al is considered an additive, with its maximum quantity being less than six. MD simulations sampled the configurations across a temperature range of 50–388.1 K and a pressure range of 1–5 × 104 bars. The concurrent learning scheme170 was employed to select the critical configurations that improved the accuracy of an ensemble of models used to estimate the model prediction error. The dataset comprises 8160 configurations labeled by the density functional theory with PBE approximation174 of the exchange and correlation. The PAW method was used with an energy cutoff of 1200 eV and a k-space sampling grid size of 0.12 Å−1.
The OC2M subset116 in the Open Catalyst 2020 (OC20) dataset takes 2 × 106 configurations from the OC20 dataset115 and includes 57 elements. OC20 consists of 1 281 040 configurations across a wide swath of materials, surfaces, and adsorbates and is labeled by the revised PBE functional174 under the periodic boundary condition. The PAW method was employed with an energy cutoff of 350 eV.
The Small-Molecule/Protein Interaction Chemical Energies (SPICE)104 dataset is a drug-like dataset that includes various subsets: dipeptides, solvated amino acids, PubChem molecules, DES370K dimers, DES370K monomers, and ion pairs. The dataset is composed of 1 132 808 non-period configurations labeled at the ωB97M-D3BJ/def2-TZVPPD level.175,176 It consists of 15 elements and contains charged configurations. We adopted the same method as described in Ref. 104 to consider each unique combination of element and formal charge as a different atom type.
The dipeptide subset in SPICE104 comprises all possible dipeptides formed by the 20 natural amino acids and their common protonation variants. This subset contains 33 850 configurations with elements, including H, C, N, O, and S, corresponding to the amino acids.
Datasets used to benchmark.
Dataset . | No. of frames . | Elements . | DFT level . | References . |
---|---|---|---|---|
Water | 140 000 | H, O | PBE0+TS/PAW (Ecutoff = 1565 eV) | 9 and 61 |
Copper | 15 366 | Cu | PBE/PAW (Ecutoff = 650 eV) | 107 |
HEA | 8 160 | Ta, Nb, W, Mo, V, Al | PBE/PAW(Ecutoff = 1200 eV) | 51 and 171 |
OC2M | 2 000 000 | Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, | RPBE/PAW (Ecutoff = 350 eV) | 115 and 116 |
Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mg, Mn, Mo, N, Na, Nb, | ||||
Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, | ||||
Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr | ||||
SPICE | 1 132 808 | H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I | ωB97M-D3BJ/def2-TZVPPD | 104 |
Dipeptides | 33 850 | H, C, N, O, S | ωB97M-D3BJ/def2-TZVPPD | 104 |
Dataset . | No. of frames . | Elements . | DFT level . | References . |
---|---|---|---|---|
Water | 140 000 | H, O | PBE0+TS/PAW (Ecutoff = 1565 eV) | 9 and 61 |
Copper | 15 366 | Cu | PBE/PAW (Ecutoff = 650 eV) | 107 |
HEA | 8 160 | Ta, Nb, W, Mo, V, Al | PBE/PAW(Ecutoff = 1200 eV) | 51 and 171 |
OC2M | 2 000 000 | Ag, Al, As, Au, B, Bi, C, Ca, Cd, Cl, Co, Cr, Cs, Cu, | RPBE/PAW (Ecutoff = 350 eV) | 115 and 116 |
Fe, Ga, Ge, H, Hf, Hg, In, Ir, K, Mg, Mn, Mo, N, Na, Nb, | ||||
Ni, O, Os, P, Pb, Pd, Pt, Rb, Re, Rh, Ru, S, Sb, Sc, Se, | ||||
Si, Sn, Sr, Ta, Tc, Te, Ti, Tl, V, W, Y, Zn, Zr | ||||
SPICE | 1 132 808 | H, Li, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I | ωB97M-D3BJ/def2-TZVPPD | 104 |
Dipeptides | 33 850 | H, C, N, O, S | ωB97M-D3BJ/def2-TZVPPD | 104 |
The above datasets are representative as they contain liquids, solids, and gases, configurations in both periodic and non-periodic boundary conditions, configurations spanning a wide range of temperatures and pressures, and ions and drug-like molecules in different protonation states. The study of all these systems is essential in the field of chemical physics.
We split all the datasets into a training set containing 95% of the data and a validation set containing the remaining 5% of the data.
B. Models and hardware
We compared various descriptors, including the local frame (loc_frame), two-body embedding full-information DeepPot-SE (se_e2_a), a hybrid descriptor with two-body embedding full- and radial-information DeepPot-SE (se_e2_a+se_e2_r), a hybrid descriptor with two-body embedding full-information and three-body embedding DeepPot-SE (se_e2_a+se_e3), and an attention-based descriptor (se_atten). In all models, we set rs to 0.5 Å, M< to 16, and La to 2, if applicable. We used (25, 50, 100) neurons for two-body embedding networks , (2, 4, 8) neurons for three-body embedding networks , and (240, 240, 240, 1) neurons for fitting networks . In the full-information part (se_e2_a) of the hybrid descriptor with two-body embedding full-information and radius-information DeepPot-SE (se_e2_a+se_e2_r) and the two-body embedding part (se_e2_a) of the hybrid descriptor with two-body full-information and three-body DeepPot-SE (se_e2_a+se_e3), we set rc to 4 Å. For the OC2M system, we set rc to 9 Å, while under other situations, we set rc to 6 Å.
We trained each model for a fixed number of steps (1 000 000 for water, Cu, and dipeptides, 16 000 000 for HEA, and 10 000 000 for OC2M and SPICE) using neural networks in double floating precision (FP64) and single floating precision (FP32) separately. We used the LAMMPS package140 to perform MD simulations for water, Cu, and HEA with as many atoms as possible. We compared the performance of compressed models with that of the original model where applicable.108 The platforms used to benchmark performance included 128-core AMD EPYC 7742, NVIDIA GeForce RTX 3080 Ti (12 GB), NVIDIA Tesla V100 (40 GB), NVIDIA Tesla A100 (80 GB), AMD Instinct MI250, and Xilinx Virtex Ultrascale + VU9P FPGA for NVNMD only.110 We note that currently, the model compression feature only supports se_e2_a, se_e2_r, and se_e3 descriptors, and NVNMD only supports regular se_e2_a for systems with no more than four chemical species in FP64 precision. The model compression feature for se_atten is under development.
It is important to note that these models are designed for the purpose of comparing different descriptors and floating-point number precisions supported by the package under the same conditions, with the aim of recommending the best model to use. However, it should be emphasized that the number of training steps is limited, and hyperparameters, such as the number of neurons in neural networks, are not tuned for any specific system. Therefore, it is not advisable to utilize these models for production purposes, and it would be meaningless to compare them with the well-established models reported in other references, Refs. 9, 51, 107, and 104.
Furthermore, it is not recommended to compare these models with models produced by other packages, as it can be challenging to establish a fair comparison. For instance, ensuring that hyperparameters in different models are precisely the same or making all models consume the same computational resources across different packages is not straightforward.
C. Results and discussion
We present the validation errors of different models in Table II, the training and MD performance on various platforms in Tables III and IV, as well as the maximum number of atoms that a platform can simulate in Table V. None of the models outperforms the others in terms of accuracy for all datasets. The non-smooth local frame descriptor achieves the best accuracy for the water system, with an energy RMSE of 0.689 meV/atom and a force RMSE of 39.2 meV/Å. Moreover, this model exhibits the fastest computing performance among all models on CPUs, although it has not yet been implemented on GPUs, as shown in Appendix B. The local frame descriptor, despite having higher accuracy in some cases, has limitations that hinder its widespread applicability. One such limitation is that it is not smooth. Additionally, this descriptor does not perform well for the copper system, which was collected over a wide range of temperatures and pressures.107 Another limitation is that it requires all systems to have similar chemical species to build the local frame, which makes it challenging to apply in datasets, such as HEA, OC2M, dipeptides, and SPICE.
Root mean square errors (RMSEs) in the energy per atom (E, meV/atom) and forces (F, meV/Å) for water, Cu, HEA, OC2M, dipeptides, and SPICE validation sets. The boldfaced values denote the best model in an indicator.
. | . | loc_frame . | se_e2_a . | se_e2_a+se_e2_r . | se_e2_a+se_e3 . | se_atten . | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
System . | Indicator . | FP64 . | FP32 . | FP64 . | FP32 . | FP64 . | FP32 . | FP64 . | FP32 . | FP64 . | FP32 . |
Water | E RMSE | 0.7 | 0.7 | 1.0 | 1.0 | 0.9 | 1.0 | 1.0 | 1.0 | 1.5 | 1.2 |
F RMSE | 40.0 | 39.2 | 49.0 | 48.4 | 48.6 | 50.0 | 46.5 | 45.9 | 44.4 | 42.3 | |
Cu | E RMSE | 12.7 | 19.2 | 3.0 | 2.8 | 4.8 | 4.9 | 2.5 | 2.6 | 3.2 | 3.6 |
F RMSE | 84.7 | 105 | 17.7 | 17.9 | 21.4 | 22.0 | 16.8 | 16.6 | 16.9 | 16.9 | |
HEA | E RMSE | ⋯ | ⋯ | 15.4 | 15.3 | 13.5 | 14.5 | 12.1 | 17.2 | 5.5 | 6.4 |
F RMSE | ⋯ | ⋯ | 134 | 137 | 163 | 158 | 136 | 180 | 90.7 | 98.3 | |
OC2M | E RMSE | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 15.1 | 14.3 |
F RMSE | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 155 | 148 | |
Dipeptides | E RMSE | ⋯ | ⋯ | 9.5 | 9.6 | 12.5 | 11.7 | 14.9 | 16.8 | 12.8 | 12.7 |
F RMSE | ⋯ | ⋯ | 97.9 | 97.7 | 98.6 | 101 | 160 | 222 | 99.7 | 96.7 | |
SPICE | E RMSE | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 80.9 | 78.3 |
F RMSE | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 233 | 234 |
. | . | loc_frame . | se_e2_a . | se_e2_a+se_e2_r . | se_e2_a+se_e3 . | se_atten . | |||||
---|---|---|---|---|---|---|---|---|---|---|---|
System . | Indicator . | FP64 . | FP32 . | FP64 . | FP32 . | FP64 . | FP32 . | FP64 . | FP32 . | FP64 . | FP32 . |
Water | E RMSE | 0.7 | 0.7 | 1.0 | 1.0 | 0.9 | 1.0 | 1.0 | 1.0 | 1.5 | 1.2 |
F RMSE | 40.0 | 39.2 | 49.0 | 48.4 | 48.6 | 50.0 | 46.5 | 45.9 | 44.4 | 42.3 | |
Cu | E RMSE | 12.7 | 19.2 | 3.0 | 2.8 | 4.8 | 4.9 | 2.5 | 2.6 | 3.2 | 3.6 |
F RMSE | 84.7 | 105 | 17.7 | 17.9 | 21.4 | 22.0 | 16.8 | 16.6 | 16.9 | 16.9 | |
HEA | E RMSE | ⋯ | ⋯ | 15.4 | 15.3 | 13.5 | 14.5 | 12.1 | 17.2 | 5.5 | 6.4 |
F RMSE | ⋯ | ⋯ | 134 | 137 | 163 | 158 | 136 | 180 | 90.7 | 98.3 | |
OC2M | E RMSE | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 15.1 | 14.3 |
F RMSE | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 155 | 148 | |
Dipeptides | E RMSE | ⋯ | ⋯ | 9.5 | 9.6 | 12.5 | 11.7 | 14.9 | 16.8 | 12.8 | 12.7 |
F RMSE | ⋯ | ⋯ | 97.9 | 97.7 | 98.6 | 101 | 160 | 222 | 99.7 | 96.7 | |
SPICE | E RMSE | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 80.9 | 78.3 |
F RMSE | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 233 | 234 |
Training performance (ms/step) for water, Cu, HEA, OC2M, dipeptides, and SPICE systems. “FP64” means double floating precision, “FP32” means single floating precision, and “FP64c” and “FP32c” mean the compressed training109 for double and single floating precision, respectively. “EPYC” performed on 128 AMD EPYC 7742 cores, “3080 Ti” performed on an NVIDIA GeForce RTX 3080 Ti card, “V100” performed on an NVIDIA Tesla V100 card, “A100” performed on an NVIDIA Tesla A100 card, and “MI250” performed on an AMD Instinct MI250 Graphics Compute Die (GCD).
. | . | loc_frame . | se_e2_a . | se_e2_a+se_e2_r . | se_e2_a+se_e3 . | se_atten . | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
System . | Hardware . | FP64 . | FP32 . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . |
Water | EPYC | 14.7 | 9.20 | 97.3 | 45.0 | 28.4 | 16.2 | 63.7 | 32.5 | 29.9 | 15.4 | 141 | 85.2 | 34.0 | 20.6 | 1210 | 383 |
3080 Ti | 7.00 | 4.80 | 24.6 | 10.3 | 9.70 | 6.40 | 26.3 | 11.6 | 12.0 | 8.20 | 52.8 | 17.2 | 16.3 | 6.80 | 199 | 26.9 | |
V100 | 7.90 | 8.50 | 11.1 | 8.20 | 5.90 | 4.80 | 13.6 | 10.9 | 6.90 | 6.40 | 23.5 | 14.0 | 8.60 | 7.30 | 69.6 | 31.7 | |
A100 | 10.7 | 10.0 | 8.20 | 9.30 | 4.90 | 5.70 | 14.5 | 10.8 | 7.80 | 6.30 | 24.5 | 12.0 | 7.50 | 7.20 | 30.8 | 21.2 | |
MI250 | 11.7 | 10.9 | 20.3 | 13.1 | 7.70 | 7.00 | 27.3 | 19.7 | 11.5 | 10.9 | 278 | 27.7 | 12.8 | 11.2 | 125 | 31.7 | |
Cu | EPYC | 4.90 | 3.30 | 33.7 | 12.8 | 8.00 | 5.40 | 19.9 | 10.0 | 10.5 | 5.30 | 45.5 | 24.2 | 9.10 | 6.50 | 226 | 89.1 |
3080 Ti | 3.20 | 2.20 | 6.50 | 5.10 | 4.60 | 3.90 | 8.70 | 6.30 | 5.90 | 3.40 | 11.8 | 4.80 | 7.20 | 5.70 | 36.8 | 8.80 | |
V100 | 3.20 | 3.80 | 4.20 | 4.80 | 3.20 | 3.70 | 6.50 | 5.30 | 5.50 | 4.10 | 7.90 | 5.60 | 6.00 | 5.80 | 15.6 | 11.9 | |
A100 | 4.00 | 3.90 | 3.80 | 3.70 | 3.10 | 3.00 | 5.40 | 5.30 | 4.10 | 4.10 | 8.00 | 5.60 | 4.80 | 4.60 | 11.6 | 11.2 | |
MI250 | 4.80 | 4.90 | 6.90 | 6.40 | 5.10 | 5.00 | 9.10 | 9.40 | 7.40 | 7.00 | 49.9 | 10.1 | 8.00 | 7.30 | 23.6 | 18.6 | |
HEA | EPYC | ⋯ | ⋯ | 53.4 | 30.5 | 19.4 | 12.2 | 52.3 | 29.3 | 27.7 | 16.7 | 83.7 | 51.1 | 26.6 | 15.7 | 159 | 60.1 |
3080 Ti | ⋯ | ⋯ | 38.4 | 25.2 | 11.2 | 9.10 | 71.4 | 41.8 | 16.3 | 12.7 | 93.6 | 41.0 | 19.7 | 15.0 | 35.9 | 9.10 | |
V100 | ⋯ | ⋯ | 33.2 | 29.8 | 11.8 | 11.1 | 63.2 | 47.4 | 17.5 | 16.5 | 65.5 | 49.6 | 27.4 | 18.7 | 15.6 | 11.9 | |
A100 | ⋯ | ⋯ | 30.5 | 28.6 | 10.9 | 10.4 | 51.6 | 67.4 | 16.9 | 21.2 | 61.7 | 52.9 | 18.6 | 18.8 | 11.7 | 11.5 | |
MI250 | ⋯ | ⋯ | 48.8 | 42.7 | 18.5 | 18.0 | 72.3 | 69.3 | 28.7 | 27.3 | 134 | 88.4 | 32.7 | 32.3 | 21.6 | 19.5 | |
OC2M | EPYC | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 2070 | 625 |
3080 Ti | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 352 | 46.0 | |
V100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 120 | 52.8 | |
A100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 51.4 | 30.9 | |
MI250 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 171 | 55.7 | |
Dipeptides | EPYC | ⋯ | ⋯ | 49.7 | 30.5 | 21.2 | 19.4 | 52.0 | 35.3 | 30.1 | 21.2 | 89.5 | 61.1 | 35.0 | 21.2 | 214 | 91.5 |
3080 Ti | ⋯ | ⋯ | 54.8 | 39.5 | 17.3 | 11.3 | 90.0 | 64.3 | 19.0 | 15.3 | 131 | 67.7 | 25.4 | 19.2 | 26.1 | 12.0 | |
V100 | ⋯ | ⋯ | 54.1 | 52.6 | 14.8 | 14.8 | 88.0 | 84.3 | 20.5 | 21.7 | 96.2 | 103 | 30.1 | 30.8 | 14.3 | 10.6 | |
A100 | ⋯ | ⋯ | 50.2 | 50.8 | 14.3 | 14.3 | 89.0 | 75.9 | 20.7 | 19.9 | 91.1 | 82.7 | 26.6 | 26.7 | 13.2 | 11.1 | |
MI250 | ⋯ | ⋯ | 66.2 | 67.8 | 23.1 | 22.9 | 117 | 112 | 35.0 | 32.4 | 155 | 129 | 45.9 | 44.9 | 19.6 | 16.8 | |
SPICE | EPYC | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 244 | 98.0 |
3080 Ti | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 35.4 | 15.3 | |
V100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 17.3 | 15.9 | |
A100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 11.9 | 12.2 | |
MI250 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 29.0 | 24.1 |
. | . | loc_frame . | se_e2_a . | se_e2_a+se_e2_r . | se_e2_a+se_e3 . | se_atten . | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
System . | Hardware . | FP64 . | FP32 . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . |
Water | EPYC | 14.7 | 9.20 | 97.3 | 45.0 | 28.4 | 16.2 | 63.7 | 32.5 | 29.9 | 15.4 | 141 | 85.2 | 34.0 | 20.6 | 1210 | 383 |
3080 Ti | 7.00 | 4.80 | 24.6 | 10.3 | 9.70 | 6.40 | 26.3 | 11.6 | 12.0 | 8.20 | 52.8 | 17.2 | 16.3 | 6.80 | 199 | 26.9 | |
V100 | 7.90 | 8.50 | 11.1 | 8.20 | 5.90 | 4.80 | 13.6 | 10.9 | 6.90 | 6.40 | 23.5 | 14.0 | 8.60 | 7.30 | 69.6 | 31.7 | |
A100 | 10.7 | 10.0 | 8.20 | 9.30 | 4.90 | 5.70 | 14.5 | 10.8 | 7.80 | 6.30 | 24.5 | 12.0 | 7.50 | 7.20 | 30.8 | 21.2 | |
MI250 | 11.7 | 10.9 | 20.3 | 13.1 | 7.70 | 7.00 | 27.3 | 19.7 | 11.5 | 10.9 | 278 | 27.7 | 12.8 | 11.2 | 125 | 31.7 | |
Cu | EPYC | 4.90 | 3.30 | 33.7 | 12.8 | 8.00 | 5.40 | 19.9 | 10.0 | 10.5 | 5.30 | 45.5 | 24.2 | 9.10 | 6.50 | 226 | 89.1 |
3080 Ti | 3.20 | 2.20 | 6.50 | 5.10 | 4.60 | 3.90 | 8.70 | 6.30 | 5.90 | 3.40 | 11.8 | 4.80 | 7.20 | 5.70 | 36.8 | 8.80 | |
V100 | 3.20 | 3.80 | 4.20 | 4.80 | 3.20 | 3.70 | 6.50 | 5.30 | 5.50 | 4.10 | 7.90 | 5.60 | 6.00 | 5.80 | 15.6 | 11.9 | |
A100 | 4.00 | 3.90 | 3.80 | 3.70 | 3.10 | 3.00 | 5.40 | 5.30 | 4.10 | 4.10 | 8.00 | 5.60 | 4.80 | 4.60 | 11.6 | 11.2 | |
MI250 | 4.80 | 4.90 | 6.90 | 6.40 | 5.10 | 5.00 | 9.10 | 9.40 | 7.40 | 7.00 | 49.9 | 10.1 | 8.00 | 7.30 | 23.6 | 18.6 | |
HEA | EPYC | ⋯ | ⋯ | 53.4 | 30.5 | 19.4 | 12.2 | 52.3 | 29.3 | 27.7 | 16.7 | 83.7 | 51.1 | 26.6 | 15.7 | 159 | 60.1 |
3080 Ti | ⋯ | ⋯ | 38.4 | 25.2 | 11.2 | 9.10 | 71.4 | 41.8 | 16.3 | 12.7 | 93.6 | 41.0 | 19.7 | 15.0 | 35.9 | 9.10 | |
V100 | ⋯ | ⋯ | 33.2 | 29.8 | 11.8 | 11.1 | 63.2 | 47.4 | 17.5 | 16.5 | 65.5 | 49.6 | 27.4 | 18.7 | 15.6 | 11.9 | |
A100 | ⋯ | ⋯ | 30.5 | 28.6 | 10.9 | 10.4 | 51.6 | 67.4 | 16.9 | 21.2 | 61.7 | 52.9 | 18.6 | 18.8 | 11.7 | 11.5 | |
MI250 | ⋯ | ⋯ | 48.8 | 42.7 | 18.5 | 18.0 | 72.3 | 69.3 | 28.7 | 27.3 | 134 | 88.4 | 32.7 | 32.3 | 21.6 | 19.5 | |
OC2M | EPYC | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 2070 | 625 |
3080 Ti | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 352 | 46.0 | |
V100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 120 | 52.8 | |
A100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 51.4 | 30.9 | |
MI250 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 171 | 55.7 | |
Dipeptides | EPYC | ⋯ | ⋯ | 49.7 | 30.5 | 21.2 | 19.4 | 52.0 | 35.3 | 30.1 | 21.2 | 89.5 | 61.1 | 35.0 | 21.2 | 214 | 91.5 |
3080 Ti | ⋯ | ⋯ | 54.8 | 39.5 | 17.3 | 11.3 | 90.0 | 64.3 | 19.0 | 15.3 | 131 | 67.7 | 25.4 | 19.2 | 26.1 | 12.0 | |
V100 | ⋯ | ⋯ | 54.1 | 52.6 | 14.8 | 14.8 | 88.0 | 84.3 | 20.5 | 21.7 | 96.2 | 103 | 30.1 | 30.8 | 14.3 | 10.6 | |
A100 | ⋯ | ⋯ | 50.2 | 50.8 | 14.3 | 14.3 | 89.0 | 75.9 | 20.7 | 19.9 | 91.1 | 82.7 | 26.6 | 26.7 | 13.2 | 11.1 | |
MI250 | ⋯ | ⋯ | 66.2 | 67.8 | 23.1 | 22.9 | 117 | 112 | 35.0 | 32.4 | 155 | 129 | 45.9 | 44.9 | 19.6 | 16.8 | |
SPICE | EPYC | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 244 | 98.0 |
3080 Ti | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 35.4 | 15.3 | |
V100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 17.3 | 15.9 | |
A100 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 11.9 | 12.2 | |
MI250 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | 29.0 | 24.1 |
MD performance (μs/step/atom) for water, Cu, and HEA systems. “FP64” means double floating precision, “FP32” means single floating precision, and “FP64c” and “FP32c” mean the compressed model109 for double and single floating precision, respectively. “EPYC” performed on 128 AMD EPYC 7742 cores, “3080 Ti” performed on an NVIDIA GeForce RTX 3080 Ti card, “V100” performed on an NVIDIA Tesla V100 card, “A100” performed on an NVIDIA Tesla A100 card, “MI250” performed on an AMD Instinct MI250 Graphics Compute Die (GCD), and “VU9P” performed NVNMD110 on a Xilinx Virtex Ultrascale+ VU9P FPGA board.
. | . | loc_frame . | se_e2_a . | se_e2_a+se_e2_r . | se_e2_a+se_e3 . | se_atten . | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
System . | Hardware . | FP64 . | FP32 . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . |
Water | EPYC | 1.25 | 0.699 | 19.3 | 8.73 | 3.89 | 2.61 | 8.33 | 3.43 | 3.78 | 1.86 | 37.2 | 15.1 | 5.04 | 3.63 | 221 | 83.8 |
3080 Ti | 12.9 | 8.63 | 29.0 | 4.21 | 9.71 | 1.73 | 20.8 | 3.43 | 9.06 | 1.99 | 69.5 | 10.5 | 18.5 | 2.89 | 294 | 32.3 | |
V100 | 16.1 | 16.8 | 8.25 | 4.59 | 1.94 | 1.51 | 6.21 | 3.53 | 2.22 | 1.62 | 22.2 | 11.3 | 3.31 | 2.41 | 91.2 | 37.2 | |
A100 | 35.7 | 33.9 | 4.37 | 3.01 | 1.56 | 1.42 | 4.11 | 2.44 | 2.07 | 1.53 | 12.5 | 7.17 | 2.64 | 2.25 | 35.6 | 22.4 | |
MI250 | 40.2 | 39.6 | 7.74 | 3.96 | 1.74 | 1.41 | 6.03 | 3.20 | 2.00 | 1.54 | 30.5 | 18.8 | 3.51 | 2.64 | 55.0 | 30.2 | |
VU9P | ⋯ | ⋯ | 0.306 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | |
Cu | EPYC | 1.14 | 0.702 | 22.2 | 9.38 | 3.43 | 2.04 | 11.9 | 5.28 | 3.09 | 1.56 | 47.9 | 19.5 | 4.20 | 2.73 | 200 | 62.1 |
3080 Ti | 14.9 | 8.98 | 30.5 | 4.18 | 8.52 | 1.51 | 18.8 | 3.15 | 7.98 | 1.81 | 74.6 | 11.2 | 14.7 | 2.32 | 294 | 33.0 | |
V100 | 15.7 | 15.7 | 8.73 | 4.81 | 1.56 | 1.27 | 5.71 | 3.18 | 1.84 | 1.38 | 24.3 | 12.2 | 2.60 | 1.83 | 91.1 | 37.3 | |
A100 | 36.9 | 36.9 | 4.41 | 2.65 | 1.36 | 1.15 | 3.35 | 2.15 | 1.63 | 1.42 | 13.5 | 7.49 | 2.15 | 1.78 | 36.2 | 21.0 | |
MI250 | 39.0 | 39.1 | 8.27 | 4.13 | 1.37 | 1.21 | 5.62 | 2.98 | 1.59 | 1.35 | 26.9 | 12.6 | 2.56 | 2.00 | 55.4 | 29.5 | |
VU9P | ⋯ | ⋯ | 0.310 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | |
HEA | EPYC | ⋯ | ⋯ | 32.8 | 13.0 | 7.04 | 4.58 | 15.3 | 7.64 | 6.83 | 3.80 | 81.0 | 33.4 | 8.56 | 5.68 | 156 | 45.9 |
3080 Ti | ⋯ | ⋯ | 65.3 | 9.72 | 10.5 | 2.51 | 36.1 | 6.83 | 11.9 | 3.24 | 171 | 24.9 | 29.6 | 5.37 | 290 | 32.8 | |
V100 | ⋯ | ⋯ | 20.1 | 10.9 | 2.88 | 2.39 | 12.3 | 6.86 | 12.3 | 2.85 | 55.2 | 28.4 | 9.42 | 5.47 | 91.2 | 37.4 | |
A100 | ⋯ | ⋯ | 10.4 | 6.09 | 2.13 | 1.83 | 7.25 | 5.48 | 2.98 | 2.83 | 30.1 | 17.1 | 4.21 | 4.22 | 35.0 | 20.0 | |
MI250 | ⋯ | ⋯ | 20.1 | 11.6 | 4.57 | 4.22 | 16.2 | 12.0 | 7.01 | 6.44 | 76.0 | 44.9 | 9.09 | 7.61 | 55.7 | 30.5 |
. | . | loc_frame . | se_e2_a . | se_e2_a+se_e2_r . | se_e2_a+se_e3 . | se_atten . | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
System . | Hardware . | FP64 . | FP32 . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . |
Water | EPYC | 1.25 | 0.699 | 19.3 | 8.73 | 3.89 | 2.61 | 8.33 | 3.43 | 3.78 | 1.86 | 37.2 | 15.1 | 5.04 | 3.63 | 221 | 83.8 |
3080 Ti | 12.9 | 8.63 | 29.0 | 4.21 | 9.71 | 1.73 | 20.8 | 3.43 | 9.06 | 1.99 | 69.5 | 10.5 | 18.5 | 2.89 | 294 | 32.3 | |
V100 | 16.1 | 16.8 | 8.25 | 4.59 | 1.94 | 1.51 | 6.21 | 3.53 | 2.22 | 1.62 | 22.2 | 11.3 | 3.31 | 2.41 | 91.2 | 37.2 | |
A100 | 35.7 | 33.9 | 4.37 | 3.01 | 1.56 | 1.42 | 4.11 | 2.44 | 2.07 | 1.53 | 12.5 | 7.17 | 2.64 | 2.25 | 35.6 | 22.4 | |
MI250 | 40.2 | 39.6 | 7.74 | 3.96 | 1.74 | 1.41 | 6.03 | 3.20 | 2.00 | 1.54 | 30.5 | 18.8 | 3.51 | 2.64 | 55.0 | 30.2 | |
VU9P | ⋯ | ⋯ | 0.306 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | |
Cu | EPYC | 1.14 | 0.702 | 22.2 | 9.38 | 3.43 | 2.04 | 11.9 | 5.28 | 3.09 | 1.56 | 47.9 | 19.5 | 4.20 | 2.73 | 200 | 62.1 |
3080 Ti | 14.9 | 8.98 | 30.5 | 4.18 | 8.52 | 1.51 | 18.8 | 3.15 | 7.98 | 1.81 | 74.6 | 11.2 | 14.7 | 2.32 | 294 | 33.0 | |
V100 | 15.7 | 15.7 | 8.73 | 4.81 | 1.56 | 1.27 | 5.71 | 3.18 | 1.84 | 1.38 | 24.3 | 12.2 | 2.60 | 1.83 | 91.1 | 37.3 | |
A100 | 36.9 | 36.9 | 4.41 | 2.65 | 1.36 | 1.15 | 3.35 | 2.15 | 1.63 | 1.42 | 13.5 | 7.49 | 2.15 | 1.78 | 36.2 | 21.0 | |
MI250 | 39.0 | 39.1 | 8.27 | 4.13 | 1.37 | 1.21 | 5.62 | 2.98 | 1.59 | 1.35 | 26.9 | 12.6 | 2.56 | 2.00 | 55.4 | 29.5 | |
VU9P | ⋯ | ⋯ | 0.310 | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | |
HEA | EPYC | ⋯ | ⋯ | 32.8 | 13.0 | 7.04 | 4.58 | 15.3 | 7.64 | 6.83 | 3.80 | 81.0 | 33.4 | 8.56 | 5.68 | 156 | 45.9 |
3080 Ti | ⋯ | ⋯ | 65.3 | 9.72 | 10.5 | 2.51 | 36.1 | 6.83 | 11.9 | 3.24 | 171 | 24.9 | 29.6 | 5.37 | 290 | 32.8 | |
V100 | ⋯ | ⋯ | 20.1 | 10.9 | 2.88 | 2.39 | 12.3 | 6.86 | 12.3 | 2.85 | 55.2 | 28.4 | 9.42 | 5.47 | 91.2 | 37.4 | |
A100 | ⋯ | ⋯ | 10.4 | 6.09 | 2.13 | 1.83 | 7.25 | 5.48 | 2.98 | 2.83 | 30.1 | 17.1 | 4.21 | 4.22 | 35.0 | 20.0 | |
MI250 | ⋯ | ⋯ | 20.1 | 11.6 | 4.57 | 4.22 | 16.2 | 12.0 | 7.01 | 6.44 | 76.0 | 44.9 | 9.09 | 7.61 | 55.7 | 30.5 |
The maximum number of atoms (103) that a GPU card can simulate for water, Cu, and HEA systems. “FP64” means double floating precision, “FP32” means single floating precision, and “FP64c” and “FP32c” mean the compressed model109 for double and single floating precision, respectively. “3080 Ti” performed on an NVIDIA GeForce RTX 3080 Ti card (12 GB), “V100” performed on an NVIDIA Tesla V100 card (40 GB), and “A100” performed on an NVIDIA Tesla A100 card (80 GB).a
. | . | se_e2_a . | se_e2_a+se_e2_r . | se_e2_a+se_e3 . | se_atten . | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
System . | Hardware . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . |
Water | 3080 Ti | 27 | 51 | 127 | 141 | 44 | 74 | 94 | 128 | 10 | 21 | 86 | 95 | 3 | 7 |
V100 | 73 | 135 | 415 | 493 | 114 | 196 | 274 | 430 | 28 | 55 | 250 | 265 | 9 | 19 | |
A100 | 189 | 332 | 987 | 1128 | 288 | 488 | 651 | 900 | 76 | 147 | 618 | 736 | 22 | 49 | |
Cu | 3080 Ti | 18 | 35 | 214 | 202 | 40 | 77 | 125 | 151 | 7 | 14 | 83 | 144 | 3 | 7 |
V100 | 54 | 106 | 606 | 635 | 122 | 183 | 337 | 461 | 19 | 39 | 271 | 330 | 9 | 19 | |
A100b | 141 | 244 | 1534 | 1615 | 286 | 453 | 706 | 1074 | 50 | 99 | 697 | 867 | 22 | 49 | |
HEA | 3080 Ti | 11 | 19 | 60 | 69 | 21 | 33 | 47 | 59 | 5 | 9 | 42 | 52 | 3 | 7 |
V100 | 30 | 53 | 175 | 184 | 57 | 87 | 131 | 166 | 13 | 25 | 117 | 142 | 9 | 19 | |
A100 | 76 | 132 | 447 | 468 | 140 | 218 | 323 | 408 | 35 | 66 | 292 | 365 | 22 | 48 |
. | . | se_e2_a . | se_e2_a+se_e2_r . | se_e2_a+se_e3 . | se_atten . | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
System . | Hardware . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . | FP64c . | FP32c . | FP64 . | FP32 . |
Water | 3080 Ti | 27 | 51 | 127 | 141 | 44 | 74 | 94 | 128 | 10 | 21 | 86 | 95 | 3 | 7 |
V100 | 73 | 135 | 415 | 493 | 114 | 196 | 274 | 430 | 28 | 55 | 250 | 265 | 9 | 19 | |
A100 | 189 | 332 | 987 | 1128 | 288 | 488 | 651 | 900 | 76 | 147 | 618 | 736 | 22 | 49 | |
Cu | 3080 Ti | 18 | 35 | 214 | 202 | 40 | 77 | 125 | 151 | 7 | 14 | 83 | 144 | 3 | 7 |
V100 | 54 | 106 | 606 | 635 | 122 | 183 | 337 | 461 | 19 | 39 | 271 | 330 | 9 | 19 | |
A100b | 141 | 244 | 1534 | 1615 | 286 | 453 | 706 | 1074 | 50 | 99 | 697 | 867 | 22 | 49 | |
HEA | 3080 Ti | 11 | 19 | 60 | 69 | 21 | 33 | 47 | 59 | 5 | 9 | 42 | 52 | 3 | 7 |
V100 | 30 | 53 | 175 | 184 | 57 | 87 | 131 | 166 | 13 | 25 | 117 | 142 | 9 | 19 | |
A100 | 76 | 132 | 447 | 468 | 140 | 218 | 323 | 408 | 35 | 66 | 292 | 365 | 22 | 48 |
The results on the MI250 card are not reported since the ROCm Toolkit reported a segmentation fault when the number of atoms increased.
Two MPI ranks were used since integer overflow occurred in TensorFlow when the number of elements in an operator exceeded 231.
On the other hand, the DeepPot-SE descriptor offers greater generalization in terms of both accuracy and performance. The compressed models are 1–10× faster than the original for training and inference, and the NVNMD is 50–100× faster than the regular MD, both of which demonstrate impressive computational performance. It is expected that the MD performance of the uncompressed water se_e2_a FP64 model (8.25 µs/step/atom on a single V100 card) is close to the MD performance reported in Ref. 41 (8.19 µs/step/atom per V100 card), and the MD performance of the compressed model (1.94 µs/step/atom) is about 3× faster in this case. In addition, the compressed model can simulate 6× atoms in a single card compared to the uncompressed model. The three-body embedding descriptor theoretically contains more information than the two-body embedding descriptor and is expected to be more accurate but slower. While this is true for the water and copper systems, the expected order of accuracy is not clearly observed for the HEA and dipeptides datasets. Further research is required to determine the reason for this discrepancy, but it is likely due to the loss not converging within the same training steps when more chemical species result in more trainable parameters. Furthermore, the performance on these two datasets slows down as there are more neural networks.
The attention-based models with the type embedding exhibit better accuracy for the HEA system and equivalent accuracy for the dipeptide system. These models also have the advantage of faster training on GPUs, with equivalent accuracy for these two systems, by reducing the number of neural networks. However, this advantage is not observed on CPUs or MD simulations, as attention layers are computationally expensive, which calls for future improvements. Furthermore, when there are many chemical species, the attention-based descriptor requires less CPU or GPU memory than other models since it has fewer neural networks. This feature makes it possible to apply to the OC2M dataset with over 60 species and the SPICE dataset with about 20 species.
It is noteworthy that in nearly all systems, FP32 is 0.5 to 2× faster than FP64 and demonstrates similar validation errors. In this case, since we only apply FP32 into neural networks but keep precision in other components, the model precision is also well-known as “mixed precision.” This result is consistent with the fact that mixed precision has been widely adopted in other packages.38,140 Therefore, FP32 should be widely adopted in most applications. Moreover, FP32 enables high performance on hardware with poor FP64 performance, such as consumer GPUs or CPUs.
VI. SUMMARY
DeePMD-kit is a powerful and versatile community-developed open-source software package for molecular dynamics (MD) simulations using machine learning potentials (MLPs). Its excellent performance, usability, and extensibility have made it a popular choice for researchers in various fields. DeePMD-kit is licensed under the LGPL-3.0 license, which allows anyone to use, modify, and extend the software freely. Thanks to its well-designed code architecture, DeePMD-kit is highly customizable and can be easily extended in various aspects. The models are organized as Python modules in an object-oriented design and saved into the computing graphs, making it easier to add new models. The computing graph is composed of TensorFlow and customized operators, making it easier to optimize the package for a particular hardware architecture and certain operators. The package also has rich and flexible APIs, making it easier to integrate with other molecular simulation packages. DeePMD-kit is open to contributions from researchers in computational science, and we hope that the community will continue to develop and enhance its features in the future.
ACKNOWLEDGMENTS
The authors thank Yihao Liu, Xinzijian Liu, Haidi Wang, Hailin Yang, and the GitHub user ZhengdQin for their code contribution to DeePMD-kit; Zhi X. Chen, Jincai Yang, and Tong Zhu for suggestions to the manuscript; and Ming Li for designing the highlight image. D.T. is grateful to Stefano Baroni, Riccardo Bertossa, Federico Grasselli, and Paolo Pegolo for enlightening discussions throughout the completion of this work. ChatGPT was used to edit the language with the prompt “polish in English,” and its outputs were manually reviewed. The work of J.Z. and D.M.Y. was supported by the National Institutes of Health (Grant No. GM107485 to D.M.Y.) and the National Science Foundation (Grant No. 2209718 to D.M.Y.). J.Z. is grateful for the Van Dyke Award from the Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey. The work of Y.C., Yifan Li, and R.C. was supported by the “Chemistry in Solution and at Interfaces” (CSI) Center funded by the United States Department of Energy under Award No. DE-SC0019394. The work of M.R. was supported by the VEGA under Project No. 1/0640/20 and by the Slovak Research and Development Agency under Contract No. APVV-19-0371. The work of Q.Z. was supported by the Science and Technology Innovation Program of Hunan Province under Grant No. 2021RC4026. The work of S.L.B. was supported by the Research Council of Norway through the Centre of Excellence Hylleraas Centre for Quantum Molecular Sciences (Grant No. 262695). The work of C.L. and R.W. was supported by the United States Department of Energy (DOE) under Award No. DE-SC0019759. The work of H.W. was supported by the National Key R&D Program of China under Grant No. 2022YFA1004300 and the National Natural Science Foundation of China under Grant No. 12122103. Computational resources were provided by the Bohrium Cloud Platform at DP technology; the Office of Advanced Research Computing (OARC) at Rutgers, The State University of New Jersey; the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which was supported by National Science Foundation under Grant Nos. 2138259, 2138286, 2138307, 2137603, and 2138296 (supercomputer Expanse at SDSC through allocation Grant No. CHE190067); the Texas Advanced Computing Center (TACC) at the University of Texas at Austin, URL: http://www.tacc.utexas.edu (supercomputer Frontera through allocation Grant No. CHE20002); the AMD Cloud Platform at AMD, Inc; and the Princeton Research Computing resources at Princeton University, which is a consortium of groups led by the Princeton Institute for Computational Science and Engineering (PICSciE) and the Office of Information Technology’s Research Computing.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Jinzhe Zeng: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Duo Zhang: Data curation (equal); Investigation (equal); Methodology (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Denghui Lu: Methodology (equal); Software (equal); Writing – review & editing (equal). Pinghui Mo: Investigation (equal); Methodology (equal); Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Zeyu Li: Software (equal); Writing – review & editing (equal). Yixiao Chen: Software (equal); Writing – review & editing (equal). Marián Rynik: Software (equal); Writing – review & editing (equal). Li’ang Huang: Software (equal); Writing – review & editing (equal). Ziyao Li: Software (equal); Writing – review & editing (equal). Shaochen Shi: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Yingze Wang: Software (equal); Writing – review & editing (equal). Haotian Ye: Software (equal); Writing – review & editing (equal). Ping Tuo: Software (equal); Writing – review & editing (equal). Jiabin Yang: Software (equal); Writing – review & editing (equal). Ye Ding: Software (equal); Writing – original draft (equal); Writing – review & editing (equal). Yifan Li: Investigation (equal); Software (equal); Writing – review & editing (equal). Davide Tisi: Software (equal); Writing – review & editing (equal). Qiyu Zeng: Funding acquisition (equal); Software (equal); Writing – review & editing (equal). Han Bao: Software (equal); Writing – review & editing (equal). Yu Xia: Software (equal); Writing – review & editing (equal). Jiameng Huang: Software (equal); Writing – review & editing (equal). Koki Muraoka: Software (equal); Writing – review & editing (equal). Yibo Wang: Software (equal); Writing – review & editing (equal). Junhan Chang: Software (equal); Writing – review & editing (equal). Fengbo Yuan: Software (equal); Writing – review & editing (equal). Sigbjorn Loland Bore: Funding acquisition (equal); Software (equal); Writing – review & editing (equal). Chun Cai: Software (equal); Writing – original draft (supporting); Writing – review & editing (equal). Yinnian Lin: Software (equal); Writing – review & editing (equal). Bo Wang: Software (equal); Writing – review & editing (equal). Jiayan Xu: Software (equal); Writing – review & editing (equal). Jia-Xin Zhu: Software (equal); Writing – review & editing (equal). Chenxing Luo: Software (equal); Writing – review & editing (equal). Yuzhi Zhang: Software (equal); Writing – review & editing (equal). Rhys E. A. Goodall: Software (equal); Writing – review & editing (equal). Wenshuo Liang: Software (equal); Writing – review & editing (equal). Anurag Kumar Singh: Software (equal); Writing – review & editing (equal). Sikai Yao: Software (equal); Writing – review & editing (equal). Jingchao Zhang: Software (equal); Writing – review & editing (equal). Renata Wentzcovitch: Funding acquisition (equal); Supervision (equal); Writing – review & editing (equal). Jiequn Han: Methodology (equal); Software (equal); Writing – review & editing (equal). Jie Liu: Conceptualization (equal); Methodology (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal). Weile Jia: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). Darrin M. York: Conceptualization (equal); Funding acquisition (equal); Methodology (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal). Weinan E: Conceptualization (equal); Methodology (equal); Supervision (equal); Writing – review & editing (equal). Roberto Car: Conceptualization (equal); Funding acquisition (equal); Methodology (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal). Linfeng Zhang: Conceptualization (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Supervision (equal); Writing – review & editing (equal). Han Wang: Conceptualization (equal); Funding acquisition (equal); Methodology (equal); Project administration (equal); Software (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
DeePMD-kit is openly hosted at the GitHub repository: https://github.com/deepmodeling/deepmd-kit. The datasets, the models, the simulation systems, and the benchmarking scripts used in this study can be downloaded from the GitHub repository: https://github.com/deepmodeling-activity/deepmd-kit-v2-paper. Other data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX A: FIFTH-ORDER POLYNOMIAL INTERPOLATION
APPENDIX B: FEATURES WITHOUT GPU SUPPORT
At present, the following features do not have GPU support:
Calculating the maximum number of neighbors Nc within the cutoff radius from the given data.
All NVNMD-specific features.
The KSpace solver for DPLR in the LAMMPS plugin.