The extended-system adaptive biasing force (eABF) method and its newer variants offer rapid exploration of the configuration space of chemical systems. Instead of directly applying the ABF bias to collective variables, they are harmonically coupled to fictitious particles, which separates the problem of enhanced sampling from that of free energy estimation. The prevalent analysis method to obtain the potential of mean force (PMF) from eABF is thermodynamic integration. However, besides the PMF, most information is lost as the unbiased probability of visited configurations is never recovered. In this contribution, we show how statistical weights of individual frames can be computed using the Multistate Bennett’s Acceptance Ratio (MBAR), putting the post-processing of eABF on one level with other frequently used sampling methods. In addition, we apply this formalism to the prediction of nuclear magnetic resonance shieldings, which are very sensitive to molecular geometries and often require extensive sampling. The results show that the combination of enhanced sampling by means of extended-system dynamics with the MBAR estimator is a highly useful tool for the calculation of ensemble properties. Furthermore, the extension of the presented scheme to the recently published Gaussian-accelerated molecular dynamics eABF hybrid is straightforward and approximation free.

Exploring high-dimensional potential energy surfaces of complex molecular systems poses a great challenge, as it often involves extensive sampling of regions separated by free energy barriers.1,2 Molecular dynamics (MD) or Monte Carlo (MC) simulations usually remain trapped in free energy wells, leading to non-ergodic sampling. Therefore, it is paramount to use importance-sampling algorithms to improve sampling efficiency.3–8 Many of these methods rely on the definition of a low-dimensional set of collective variables (CVs) that are sufficient to discriminate between the states of interest. Typically, one to three-dimensional variables are chosen, because of the massive growth of sampling needed in higher-dimensional space, known as curse of dimensionality,9 which, in turn, leads to immense computational cost.

The adaptive biasing force (ABF)4,10,11 method is one such algorithm that achieves uniform sampling along the CV. Here, a running estimate of the mean force is applied as bias. Subsequently, the potential of mean force (PMF) can directly be obtained from thermodynamic integration (TI)12 of the biasing force. Despite its outstanding stability and beneficial formal convergence properties, the application of ABF has been limited by stringent technical requirements on CVs to be usable for ABF.13,14 These could be lifted by Lesage et al.15 in the extended-system ABF (eABF) method. Instead of directly applying a biasing force to the physical system, each CV is coupled to a fictitious particle, extending the system.

In addition to removing all technical constraints on the CV, this framework separates the problem of sampling from free energy estimation. The resulting flexibility in the choice of bias force applied to the fictitious particle has been exploited by combining ABF and metadynamics,16–18 termed well-tempered metadynamics extended-system ABF (WTM-eABF),19,20 a highly potent enhanced sampling scheme, which stands out due to its efficiency and robustness.21 In line with the original ABF method, the PMF is typically obtained via thermodynamic integration of the estimated free energy gradient.15,22 Very recently, Gaussian-accelerated WTM-eABF (GaWTM-eABF) was introduced by Chen et al.,23 where WTM-eABF is coupled with Gaussian-accelerated MD (GaMD),24 which helps to overcome barriers orthogonal to the CV.

Many CV-based advanced sampling techniques not only offer access to the PMF but also provide access to the unbiased probability of a configuration. The original weight of a trajectory frame can be obtained, e.g., for umbrella sampling3 by means of Multistate Bennett’s Acceptance Ratio (MBAR).25–27 While formalisms have been derived for metadynamics by which one regains the unbiased probability of a visited configuration,28–31 for the classes of ABF methods, such an algorithm is missing, barring users from computing unbiased ensemble averages.

In this contribution, we close this gap for the aforementioned family of eABF-based algorithms. We show that MBAR can be applied to extend the scope of eABF-based enhanced sampling from pure PMF calculation to the calculation of ensemble averages in general. By solving the MBAR equations, Boltzmann weights of individual configurations are obtained, which enables the direct reweighting of any coordinate-dependent observable of interest. Because no histograms or other form of finite numeric grids are introduced, MBAR also provably yields the lowest variance free energy estimate.26,32

In the following, we will first revise the theory behind enhanced sampling using the ABF and eABF algorithms. We then show how MBAR can be applied to the statistical analysis of eABF simulations and demonstrate through several numerical examples how eABF-trajectories are unbiased by means of MBAR. Finally, we showcase for a real chemical system how advanced sampling and unbiased averages can be useful for calculating equilibrium nuclear magnetic resonance (NMR) shieldings. Due to their sensitivity to the molecular geometry, the accurate prediction of NMR shieldings is particularly challenging and often requires extensive sampling of molecular configurations.33,34 Further, we demonstrate how by reweighting, eABF-biased simulations can even yield CV-resolved NMR shieldings.

We consider a system consisting of Na atoms with Cartesian coordinates xT=(x1,,x3Na), whose physical distribution for the potential energy surface (PES) U(x) is the Boltzmann-distribution ρ(x) at absolute temperature T.

We assume that a reaction coordinate can be represented by a collective variable (CV) ξ(x):R3NaR, which has the marginal probability distribution,

(1)

where ⟨⟩ denotes the ensemble average and δ[x] the Dirac delta distribution.1 The PMF A(z), i.e., free energy profile, along a CV is usually defined as

(2)

where β=(kBT)1 and kB is the Boltzmann constant. Note that the shape of the PMF is not gauge invariant against the choice of CV function ξ. Therefore, features of the PMF like minima or transition barriers have no CV-independent meaning.35 This is, however, not problematic for the calculation of free energy differences as long as the CV can discriminate between configurations of states R and P, because the CV-dependent local features are integrated out to obtain the probabilities of the system occupying either state R or P,36 

(3)

In the case of chemical reactions, there often exist several metastable states that are separated by free energy barriers much larger than the thermal energy kBT,1 which are not surmounted within the typical time scale of molecular dynamics simulations. For the exploration of chemical space, it is, therefore, crucial to accelerate sampling, e.g., by means of ABF, which we will revise in the following.

The ABF algorithm enhances sampling of low probability states by introducing a force A(z)zξ(x) that cancels the unbiased mean force and leads to uniform sampling along the CV.11 

Often, one has no prior knowledge of the free energy derivative, and, thus, ABF uses an on-the-fly estimate of the mean force. For this purpose, the CV is divided in L equally spaced bins of size dz. The approximation of the bias force F̄(zi) in bin i is the average of collected force samples Fiμ,11 

(4)

Assuming ergodicity, F̄(zi) will converge to the z-conditioned ensemble average Fξ(x)=zi for a sufficiently large sample size Ni.

Taking a more general perspective, this corresponds to biasing the system dynamics with a time-dependent approximation of the free energy surface projected onto a low-dimensional collective variable, At(x)◦ξ(x).11 Although this approximation is very poor at the beginning of a simulation and the bias based on the running average has to be downscaled in the low-sampling regime, it systematically improves over the course of the simulation. This biasing strategy connects ABF to metadynamics and its well-tempered variant, which follow a similar idea, but push the system away from visited regions with a time-dependent repulsive potential. However, both algorithms change the bias potential and, thereby, also the weights of each frame over time. Recovering the statistical information of the underlying unbiased system is, therefore, difficult. While reweigthing schemes have been proposed for metadynamics,28–31 to our knowledge none has been reported for ABF simulations.

Free energy surfaces can directly be recovered from F̄(zi) by TI.10 Despite its appealing theoretical simplicity and efficiency, the application of ABF is hindered by some practical limitations: When using multiple CVs, they are required to be mutually orthogonal as well as orthogonal to any additional constraints on the system. Additionally, a Jacobian term depending on the second derivative of the coordinates has to be calculated.11 

To circumvent the technical requirements of ABF, Lesage et al.15 proposed the extended-system ABF (eABF) method, where a fictitious particle λ is coupled to the CV by a harmonic spring. The extended system (x, λ) evolves according to the potential

(5)

with thermal coupling width to the extended system σ. The key intuition behind this choice of potential is that in the tight coupling (low σ) limit, efficient sampling of λ will result in efficient sampling of ξ. Therefore, it is sufficient to bias the dynamics of λ to accelerate sampling along ξ with any choice of bias potential Ub(λ, t). This opens up great flexibility for the choice of the importance-sampling algorithm. As each CV is coupled to its own fictitious particle, orthogonality conditions are fulfilled by construction, and the application of ABF is trivial. An especially useful variant is the combination of eABF with (Well-Tempered) Metadynamics (meta-eABF, WTM-eABF), which offers rapid exploration of free energy surfaces.19,20 An asymptotically unbiased estimator of A(z), which is independent of Ub(λ, t), can be derived by correcting the free energy gradient, obtained from the biased distribution ρB(z) of the extended system, with the average harmonic force on z,

(6)

This estimator is called corrected z-averaged restraint (CZAR) and can be calculated from the trajectories of ξ and λ alone.15 In the spirit of the original ABF method, the CZAR estimator yields the free energy gradient, which is integrated using TI.

However, none of these algorithms are able to accelerate sampling of degrees of freedom orthogonal to the CV. In recent years, great effort went into the development of CVs that contain all slow degrees of freedom to circumvent this inherent problem, for example, by means of machine learning.37–41 Chen et al.23 recently reported an alternative approach by combining WTM-eABF with Gaussian-accelerated MD (GaWTM-eABF). The GaMD potential, which is only a function of the potential energy itself, reduces free energy barriers along all degrees of freedom. Then WTM-eABF gives sampling a specific direction on this modified potential. The full GaWTM-eABF potential reads

(7)

where the GaMD potential ΔU(x) is given by

(8)

Parameters E = Umax and k0=min1,σ0σUUmaxUminUmaxUavg are obtained from an equilibration calculation, the only free parameter being σ0, which defines the upper bound for the standard deviation of ΔU(x). Umax, Umin, Uavg, and σU denote the maximum, minimum, average, and standard deviation of U(x) in the equilibration stage, respectively. The PMF is obtained by summing the contributions of both biases,

(9)

where the CZAR estimate is used for WTM-eABF and the GaMD bias is approximated by a second order cumulant expansion with average and variance of ΔU, C1 and C2, respectively. This approximation requires the distribution of ΔU to be Gaussian-shaped, which can be ensured by an appropriate choice of σ0.

For all methods discussed thus far, the unbiased Boltzmann weight of individual frames is lost. In Sec. II C, we will show how the MBAR estimator can be applied to recover statistical weights and calculate any ensemble average of interest from all eABF-based methods.

To gain a statistically optimal analysis of eABF-biased trajectories, we will make use of MBAR.26 In eABF simulations, the physical system is effectively sampled under harmonic restraint for each value of the fictitious particle λ. We take advantage of this to split the extended-system Uext(x, λ, t), where λ is a dynamic variable, into M biased systems Ui(x) with constant λ = λi. Configurations of the continuously sampled trajectory are separated post hoc into M biased states. The modified potential in window i reads

(10)

where the GaMD potential ΔU(x) is only added for GaWTM-eABF. The biased probability distribution ρB(z) of the eABF trajectory is converted to a mixture distribution of M overlapping, λ-conditioned, biased distributions ρB(z|λi) (see also Sec. S1 of the supplementary material).

Note that one could obtain a similar set of biased windows by means of umbrella sampling3 with M harmonic biasing potentials. However, this has several disadvantages: Overlap of biased probability distributions ρ(z|λi) has to be ensured by hand prior to the simulation with an appropriate choice of λ windows. This can be far from trivial and often requires significant experience from the user, whereas eABF ensures a uniform distribution of ρ(z) automatically. Moreover, when using eABF, the physical system diffuses between states, which does not happen in standard umbrella simulations that have a static λi value. For systems with parallel valleys, this freedom of motion helps overcoming orthogonal barriers,11 especially when paired with multiple-walker techniques42,43 or GaMD.23 

To obtain an estimate of the reduced free energy differences fi between λ-states, one can solve the MBAR equations,26 

(11)

with Nm samples in the mth λ-state such that N=m=1MNm. Because we are only interested in relative free energies, f̂1 is typically fixed to zero. Note that unlike before [compare Eq. (9)], no Gaussian distribution of ΔU is required.

By solving the MBAR equations iteratively, we obtain the weights W(xn) of individual samples as

(12)

where N is a normalization constant to ensure that nW(xn) = 1. As noted by Shirts and Ferguson,27 it is no longer relevant which sample belongs to which state as the normalized probability of each sample is known. An unbiased equilibrium ensemble average of a position-dependent operator O(xn) from all collected samples is obtained via reweighting,

(13)

The unbiased probability density ρ(z) (and therefore the PMF) along some CV can be obtained by calculating the expectation of the indicator function27 

(14)

where

(15)

with bin centers zi and bin width Δz. Note that the indicator function can potentially map to any K-dimensional CV (which might be different from the original bias CVs). Similarly, free energy differences are directly computed from weights W(xn), instead of numeric integration of ρ(z) [compare Eq. (3)],

(16)

where R and P denote two metastable states divided by a free energy barrier.

All sampling algorithms and numerical simulations are implemented, performed, and analyzed using NumPy. PMFs are calculated by post-processing the MD trajectories of ξ and λ. The TI of the CZAR estimate is computed with the trapezoid rule. The MBAR equations are solved self-consistently and the PMF is computed as a combination of Eqs. (2) and (14). The starting guess for all βf̂i's is zero and f̂1 is set to zero after every cycle. Convergence is reached when the largest change of βf̂i compared to the last cycle drops under 10−6. Subsamples in λ-windows with window size Δλ are obtained by selecting all frames where λ ∈ [λ − Δλ/2, λ + Δλ/2] from the full biased trajectory. Error bars for both estimators are calculated via bootstrapping.32,44 For this purpose, the data are resampled 100 times with replacement and equilibrium ensembles are calculated for each bootstrap sample. A generalized Python class for various adaptive sampling methods and their analysis methods is available at https://github.com/ochsenfeld-lab/adaptive_sampling; the numerical examples from this study can be found there as well.

For numerical tests, we consider a single particle of mass 10 a.u. moving on two different two-dimensional double-well potentials given by

(17)
(18)

where (x, y) are particle coordinates and a, b, c, and d are free parameters of the potential, respectively (reported in Table I). The particle evolves according to Langevin dynamics45 with friction constant 0.001 fs−1. The system was propagated using a Velocity Verlet46 integrator with a step size of 5 fs for 2 × 106 steps (10 ns) for simulations in U1 and 4 × 106 steps (20 ns) for simulations in U2. The initial momenta were drawn randomly from the Maxwell–Boltzmann distribution at 300 K. Analytic reference PMFs were obtained by numerical integration of exact probability densities.

TABLE I.

Parameters of numerical potentials U1 [Eq. (17)] and U2 [Eq. (18)].

U1U2
 a 8×106kJmolÅ4    5 × 10−3 Å−2 
5×101kJmolÅ2 4 × 10−2 Å−2 
80 Å 40 Å 
160 Å 20 Å 
ϵ ⋯ 1 kJ mol−1 
U1U2
 a 8×106kJmolÅ4    5 × 10−3 Å−2 
5×101kJmolÅ2 4 × 10−2 Å−2 
80 Å 40 Å 
160 Å 20 Å 
ϵ ⋯ 1 kJ mol−1 

Sampling along x was enhanced using eABF,15 WTM-eABF,20 or GaWTM-eABF23 with σ = 2.0 Å and mλ = 20 a.u. The fictitious particle was confined to the range of interest by harmonic walls with a force constant of 500 kJ/(mol Å2). The ABF force was scaled with a linear ramp and fully applied in bins with more than 100 samples. For the metadynamics potential, Gaussians with a standard deviation of 6.0 Å were added every 20 steps with an initial height of 1.0 kJ/mol and scaled down over the course of the simulation in the well-tempered framework47 applying an effective temperature of 4000 K. To obtain parameters for the GaMD potential, the system was equilibrated for 4 × 105 steps, including 5 × 104 initial steps where no boost potential is applied. σ0 was set to 3.5 kJ/mol. PMFs along the x-axis and free energy difference were calculated with the MBAR and CZAR estimator as described above.

To demonstrate the calculation of unbiased ensemble averages of observables in real chemical systems, we compute the equilibrium NMR shieldings of gaseous ammonia. For this purpose, ab initio simulations are carried out with the Python interface of our in-house quantum-chemical program suite FermiONs++48–51 at the level of ωB97M-V/def2-TZVP.52,53 The molecule was heated from 0.1 to 310 K over 3100 time steps with a step size of 0.1 fs. Initial momenta were randomly drawn from the Maxwell–Boltzmann distribution. Velocities were rescaled every 10 time steps to increase the temperature by 1 K. For the production run, the heated system was simulated for 30 ps with a time step of 0.5 fs. The temperature was controlled with the Langevin thermostat at 310 K with a friction coefficient of 0.001 fs−1. The dynamics of the ammonia inversion was biased with the WTM-eABF algorithm.19,20 The mean of the three improper torsional angles χ was chosen as collective variable,

(19)

where rN denotes the position of the nitrogen atom and rH1 to rH3 the hydrogen atom positions, respectively. For the WTM potential, Gaussians with an initial height of 1 kJ/mol and variance of 4° were deposited every 10 fs. The effective temperature was set to 1000 K. The fictitious particle had a mass of 20 a.u. and was coupled to the CV with a thermal width of 2°. The WTM force and ABF were collected on a grid with bin size 2°. The ABF was scaled with a linear ramp and fully applied in bins with more than 500 samples. Absolute isotropic NMR shieldings were calculated for all MD frames and for the optimized geometry using B97-2/pcSseg-2.54,55

The CZAR and MBAR estimators are tested for extended-system dynamics by simulations of a single particle on potential energy surfaces U1 and U2. In both potentials, two metastable states are separated by barriers of roughly 8 kBT (20 kJ/mol). As CV, ξ1(x, y) = x is chosen for all test simulations. We start with potential U1, which is shown in Fig. 1(a) together with data points of an eABF-biased trajectory of the particle. Figure 1(b) shows the trajectory of the CV, which frequently crosses the transition state and gives an overall uniform probability density of z and λ. The resulting PMFs as obtained from MBAR and CZAR are shown in Fig. 1(c), both are almost identical with the analytical PMF and the bootstrapped standard deviations of both estimates are negligible because of the large sample size along the x-axis. As both estimators are asymptotically unbiased, this is what we expected in the limit of large sample sizes. Figure 1(d) shows PMF RMSDs for increasing distance between discrete λ-states Δλ, hereafter referred to as “window size.” The corresponding PMF RMSD from CZAR is shown in orange. Interestingly, the PMF RMSD of the MBAR estimator is constant at about 0.1 kJ/mol if the window size is below or equal to the thermal coupling width σ of λ to the CV, which was 2 Å in this simulation. This can be rationalized by the fact that σ is the standard deviation of the CV for a given λ-state. Thus, it provides an intuitive and robust choice for Δλ. For larger windows of up to 5 Å, the PMF RMSD rises moderately to about 0.3 kJ/mol. Only for window sizes larger than 6 Å (3 σ), the error rises above 1 kJ/mol. The CZAR estimate has no Δλ-dependence and is, therefore, visualized as a line in Fig. 1(d).

FIG. 1.

(a) Every 10th data point of the eABF trajectory. The color indicates the time progression by gradually changing from yellow to red over the course of the simulation. Potential U1 [Eq. (17)] is shown as contour plot. (b) Left: eABF trajectory along ξ(x, y) = x. Right: Probability densities of ξ and λ. (c) PMFs obtained with either analysis method, analytical reference in green. Error bars indicating the standard deviation of 100 bootstrap runs are smaller than the linewidth. (d) RMSD of PMFs with respect to the analytic reference. The MBAR PMF is computed for different distances between discrete λ-states. The PMF RMSD obtained with CZAR is shown as orange line. All PMFs are computed for a bin size of Δz = 2 Å.

FIG. 1.

(a) Every 10th data point of the eABF trajectory. The color indicates the time progression by gradually changing from yellow to red over the course of the simulation. Potential U1 [Eq. (17)] is shown as contour plot. (b) Left: eABF trajectory along ξ(x, y) = x. Right: Probability densities of ξ and λ. (c) PMFs obtained with either analysis method, analytical reference in green. Error bars indicating the standard deviation of 100 bootstrap runs are smaller than the linewidth. (d) RMSD of PMFs with respect to the analytic reference. The MBAR PMF is computed for different distances between discrete λ-states. The PMF RMSD obtained with CZAR is shown as orange line. All PMFs are computed for a bin size of Δz = 2 Å.

Close modal

We next turn to WTM-eABF and GaWTM-eABF simulations on potential U2, where we again choose ξ1(x, y) = x. Figure 2(a) shows the PMF obtained from WTM-eABF. As before, CZAR and MBAR estimates are almost identical and bootstrapped standard deviations are negligible. However, both computed PMFs are very different from the analytical result. The reason becomes clear when looking at the sampling of the potential U2(x, y), as depicted in Fig. 2(b). Because of the chosen collective variable, sampling is only accelerated along the x-axis, missing the y-component of the transition between both minima. Therefore, the system rarely crosses the transition state, but rather samples both parallel valleys. As shown in Figs. 2(c) and 2(d), PMFs estimated with GaWTM-eABF are much closer to the analytic reference because the dividing surface between both states is much better sampled. Here, the MBAR estimate slightly underestimates the analytical reference; an error in this region is expected as both valleys are still sampled more densely than the transition state. The CZAR estimate overestimates the PMF at the transition state. This can be attributed to the additional second order cumulant expansion of ΔU [Eq. (9)], which accounts for the contribution of GaMD to the PMF. The implied approximation only holds if ΔU obeys a near-Gaussian distribution (see also Sec. S2 of the supplementary material). For larger values of σ0, this approximation breaks down because the distribution of ΔU is no longer Gaussian. For the MBAR estimator, no such approximations are made. Obtained PMFs are not affected by the harmonicity of ΔU and are, therefore, insensitive to the choice of GaMD parameters, making the analysis of GaWTM-eABF simulations much more robust.

FIG. 2.

[(a) and (c)] PMFs obtained from WTM-eABF and GaWTM-eABF simulations, respectively, using ξ(x, y) = x and either analysis method; analytical reference in green. Error bars indicating the standard deviation of 100 bootstrap runs are smaller than the linewidth. All PMFs are computed with bin and window size 2 Å. [(b) and (d)] Every 20th data point of the WTM-eABF and GaWTM-eABF trajectory. The color indicates the time progression by gradually changing from yellow to red over the course of the simulation. Potential U2 [Eq. (18)] is shown as contour plot.

FIG. 2.

[(a) and (c)] PMFs obtained from WTM-eABF and GaWTM-eABF simulations, respectively, using ξ(x, y) = x and either analysis method; analytical reference in green. Error bars indicating the standard deviation of 100 bootstrap runs are smaller than the linewidth. All PMFs are computed with bin and window size 2 Å. [(b) and (d)] Every 20th data point of the WTM-eABF and GaWTM-eABF trajectory. The color indicates the time progression by gradually changing from yellow to red over the course of the simulation. Potential U2 [Eq. (18)] is shown as contour plot.

Close modal

Finally, to demonstrate how MBAR can be used to recover unbiased ensemble averages, we will reweight the PMF of WTM-eABF/MBAR and GaWTM-eABF/MBAR simulations shown in Fig. 2 to a new CV, ξ2(x, y) = 0.25x + y, and we also compute the conditional average U2z. Figure 3(a) shows the reweighted PMF. Because ξ2 is orthogonal to the dividing surface of both states, it constitutes the ideal CV for the given transition.36 Because ξ1 was chosen as CV during the simulations, using WTM-eABF, the transition state is barely sampled, again causing large deviations from the analytical PMF along ξ2. GaWTM-eABF reproduces the analytic PMF remarkably well. For U2z, the difference between both sampling algorithms is less severe. Not only does the conditional average computed from GaWTM-eABF/MBAR match the analytic reference almost exactly, but also the result from WTM-eABF/MBAR is only slightly distorted. The reason is that with both methods, minima of the potential energy are sampled sufficiently, which carry the by far highest statistical weight for the mean potential energy along the x-axis. Overall, both examples demonstrate the ability of MBAR to produce accurate ensemble and conditional averages, if the underlying configuration space is sampled sufficiently. In Sec. IV B, we show how the same technique can be applied to a real chemical system, i.e., for the prediction of NMR shieldings.

FIG. 3.

(a) Reweighting of WTM-eABF and GaWTM-eABF simulations shown in Fig. 2: (a) to a different collective variable ξ2(x, y) = 0.25x + y, (b) to the conditional average of the potential energy of U2 along the x-axis.

FIG. 3.

(a) Reweighting of WTM-eABF and GaWTM-eABF simulations shown in Fig. 2: (a) to a different collective variable ξ2(x, y) = 0.25x + y, (b) to the conditional average of the potential energy of U2 along the x-axis.

Close modal

To showcase the usefulness of computing equilibrium observables from biased trajectories of a real molecular system, we calculate NMR shieldings for ammonia. Figure 4(a) shows the WTM-eABF-biased trajectory of ammonia using a linear combination of the three improper torsional angles as CV [Eq. (19)]. The transition state at z = 0°, where the molecule is planar, is frequently crossed and the probability density is roughly uniform between −50° and 50°. Figure 4(b) shows the PMF and its error bars obtained from 100 bootstrapping runs with the MBAR or CZAR estimator, respectively. Using a grid size of 1°, numerical errors of the CZAR estimate due to thermodynamic integration are negligible and both PMFs are almost identical with bootstrapped standard deviations of roughly 0.3 kJ/mol.

FIG. 4.

(a) Left: WTM-eABF trajectory. Right: Probability densities of ξ and λ. (b) PMFs obtained from WTM-eABF/MBAR and WTM-eABF/CZAR. Error bars indicate the standard deviation from 100 bootstrapping runs.

FIG. 4.

(a) Left: WTM-eABF trajectory. Right: Probability densities of ξ and λ. (b) PMFs obtained from WTM-eABF/MBAR and WTM-eABF/CZAR. Error bars indicate the standard deviation from 100 bootstrapping runs.

Close modal

Absolute isotropic shieldings are calculated for all data frames of the WTM-eABF-biased trajectory and reweighted according to Eq. (13) in order to obtain unbiased averages. Figures 5(a) and 5(b) show probability densities of 15N and 1H shifts. Blue and orange histograms represent the biased and reweighted probability density, respectively. For 1H, shieldings of all three protons are averaged for each data point. Mean chemical shifts are given as vertical lines.

FIG. 5.

[(a) and (b)] Absolute isotropic 15N and 1H shieldings, respectively. The distribution of chemical shifts as obtained from an WTM-eABF-biased trajectory is shown in blue and the MBAR reweighted distribution in orange. The vertical lines of the same color indicate mean values. The green line is positioned at the shielding value of the minimum energy geometry. [(c) and (d)] Conditional average of absolute isotropic 15N and 1H shieldings along the CV, respectively. Error bars indicate the standard error of the mean of the conditional average.

FIG. 5.

[(a) and (b)] Absolute isotropic 15N and 1H shieldings, respectively. The distribution of chemical shifts as obtained from an WTM-eABF-biased trajectory is shown in blue and the MBAR reweighted distribution in orange. The vertical lines of the same color indicate mean values. The green line is positioned at the shielding value of the minimum energy geometry. [(c) and (d)] Conditional average of absolute isotropic 15N and 1H shieldings along the CV, respectively. Error bars indicate the standard error of the mean of the conditional average.

Close modal

Biased probability distributions are slightly skewed and broader due to enhanced sampling, which deliberately overestimates the probability of configurations that are located in high energy (low probability) regions. Reweighting the probability of individual data frames with the unbiased probability as obtained from MBAR recovers the expected Gaussian-shaped Boltzmann distributions for the equilibrium NMR shieldings. Figures 5(c) and 5(d) show conditional averages of isotropic shieldings along the CV. For 15N, σz has the same shape as that of the PMF, local minima and maxima of the chemical shielding curve occur at the same z values as those of the PMF. Therefore, the 15N shielding for the optimized structure is almost identical to the unbiased mean. The biased mean is shifted by about 4 ppm, due to overestimation of the probability of transition state configurations. In contrast, the conditional average of 1H shieldings has a V-like shape, where the minimum is located at the transition state. Since high energy geometries correspond to z values that can have both lower or higher isotropic shieldings, the biased and unbiased mean are much more similar than in the case of 15N [compare Figs. 4(a) and 4(b)].

To summarize, we have shown how MBAR can be applied to post-process extended-system dynamics simulations of molecular transitions. Thus, having recovered the full unbiased statistical information of the given dataset, equilibrium properties such as NMR shieldings, are recovered seamlessly by reweighting the probability of each data frame.

We have shown that MBAR is a suitable and robust estimator of statistical information for the family of eABF-based enhanced sampling algorithms. While the CZAR estimator already yields fast and accurate on-the-fly estimates of PMFs, MBAR supplements it by offering a statistically optimal analysis in post-processing. By computing the unbiased statistical weight of each frame, it extends the application of eABF and its variants from the computation of PMFs to ensemble averages in general. The presented framework, therefore, enables the application of this highly efficient class of enhanced sampling algorithms without any loss of statistical information. We have shown how this can be useful for the prediction of equilibrium properties, such as NMR shieldings, which are highly sensitive to the molecular geometry and, therefore, have to be calculated from ensemble averages accounting for all contributing configurations. Additionally, we have shown how PMFs can be reweighted to other collective variables of interest, which might yield mechanistic insight into complex chemical processes.

We expect the recently published GaWTM-eABF to be particularly useful in this regard, as it reduces the dependence of sampling efficiency on the choice of collective variables. Here, MBAR additionally removes the requirement on the distribution of the GaMD boost potential to be Gaussian-shaped, making estimations of free energies significantly more accurate and robust against the choice of GaMD parameters.

The supplementary material contains a pdf file with a detailed discussion of the separation of the distribution of ξ into λ-dependent subsamples and the different dependence of the estimated PMF on GaMD parameters when using MBAR or a cumulant expansion. Additionally, a zip archive containing all scripts to perform the numerical simulations of Sec. IV A is given. A maintained and updated version of the code is available at https://github.com/ochsenfeld-lab/adaptive_sampling.

The authors thank J. Kussmann (LMU Munich) for providing a development version of the FermiONs++ program package and Yannick Lemke for useful comments and discussions. Financial support was provided by the “Deutsche Forschungsgemeinschaft” (DFG, German Research Foundation) within the Cluster of Excellence “e-conversion” (Grant No. EXC 2089/1-390776260) and SFB 1309-325871075, “Chemical Biology of Epigenetic Modifications.” C.O. acknowledges further support as a Max Planck Fellow at MPI-FKF Stuttgart. J.C.B.D. acknowledges support from the Leopoldina fellowship program, German National Academy of Sciences Leopoldina (Grant No. LPDS 2021-08).

The authors have no conflicts to disclose.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1.
C.
Chipot
and
A.
Pohorille
,
Free Energy Calculations
(
Springer
,
2007
).
2.
C.
Chipot
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
4
,
71
(
2014
).
3.
G. M.
Torrie
and
J. P.
Valleau
,
J. Comput. Phys.
23
,
187
(
1977
).
4.
E.
Darve
and
A.
Pohorille
,
J. Chem. Phys.
115
,
9169
(
2001
).
5.
A.
Laio
and
M.
Parrinello
,
Proc. Natl. Acad. Sci. U. S. A.
99
,
12562
(
2002
).
6.
C.
Abrams
and
G.
Bussi
,
Entropy
16
,
163
(
2014
).
7.
V.
Spiwok
,
Z.
Sucur
, and
P.
Hosek
,
Biotechnol. Adv.
33
,
1130
(
2015
).
8.
O.
Valsson
,
P.
Tiwary
, and
M.
Parrinello
,
Annu. Rev. Phys. Chem.
67
,
159
(
2016
).
9.
M.
Köppen
, in
5th Online World Conference on Soft Computing in Industrial Applications (WSC5)
,
2000
, Vol. 1, pp.
4
8
.
10.
E.
Darve
,
D.
Rodríguez-Gómez
, and
A.
Pohorille
,
J. Chem. Phys.
128
,
144120
(
2008
).
11.
J.
Comer
,
J. C.
Gumbart
,
J.
Hénin
,
T.
Lelièvre
,
A.
Pohorille
, and
C.
Chipot
,
J. Phys. Chem. B
119
,
1129
(
2015
).
12.
J. G.
Kirkwood
,
J. Chem. Phys.
3
,
300
(
1935
).
13.
G.
Ciccotti
,
R.
Kapral
, and
E.
Vanden-Eijnden
,
ChemPhysChem
6
,
1809
(
2005
).
14.
G.
Fiorin
,
M. L.
Klein
, and
J.
Hénin
,
Mol. Phys.
111
,
3345
(
2013
).
15.
A.
Lesage
,
T.
Lelièvre
,
G.
Stoltz
, and
J.
Hénin
,
J. Phys. Chem. B
121
,
3676
(
2017
).
16.
G.
Bussi
,
A.
Laio
, and
M.
Parrinello
,
Phys. Rev. Lett.
96
,
090601
(
2006
).
17.
A.
Laio
and
F. L.
Gervasio
,
Rep. Prog. Phys.
71
,
126601
(
2008
).
18.
A.
Barducci
,
M.
Bonomi
, and
M.
Parrinello
,
Wiley Interdiscip. Rev.: Comput. Mol. Sci.
1
,
826
(
2011
).
19.
H.
Fu
,
H.
Zhang
,
H.
Chen
,
X.
Shao
,
C.
Chipot
, and
W.
Cai
,
J. Phys. Chem. Lett.
9
,
4738
(
2018
).
20.
H.
Fu
,
X.
Shao
,
W.
Cai
, and
C.
Chipot
,
Acc. Chem. Res.
52
,
3254
(
2019
).
21.
H.
Fu
,
H.
Chen
,
X. a.
Wang
,
H.
Chai
,
X.
Shao
,
W.
Cai
, and
C.
Chipot
,
J. Chem. Theory Comput.
60
,
5366
(
2020
).
22.
H.
Fu
,
X.
Shao
,
C.
Chipot
, and
W.
Cai
,
J. Chem. Theory Comput.
12
,
3506
(
2016
).
23.
H.
Chen
,
H.
Fu
,
C.
Chipot
,
X.
Shao
, and
W.
Cai
,
J. Chem. Theory Comput.
17
,
3886
(
2021
).
24.
Y.
Miao
,
V. A.
Feher
, and
J. A.
McCammon
,
J. Chem. Theory Comput.
11
,
3584
(
2015
).
25.
S.
Kumar
,
J. M.
Rosenberg
,
D.
Bouzida
,
R. H.
Swendsen
, and
P. A.
Kollman
,
J. Comput. Chem.
13
,
1011
(
1992
).
26.
M. R.
Shirts
and
J. D.
Chodera
,
J. Chem. Phys.
129
,
124105
(
2008
).
27.
M. R.
Shirts
and
A. L.
Ferguson
,
J. Chem. Theory Comput.
16
,
4107
(
2020
).
28.
29.
M.
Bonomi
,
A.
Barducci
, and
M.
Parrinello
,
J. Comput. Chem.
30
,
1615
(
2009
).
30.
P.
Tiwary
and
M.
Parrinello
,
J. Phys. Chem. B
119
,
736
(
2015
).
31.
T. M.
Schäfer
and
G.
Settanni
,
J. Chem. Theory Comput.
16
,
2042
(
2020
).
32.
H.
Paliwal
and
M. R.
Shirts
,
J. Chem. Theory Comput.
7
,
4115
(
2011
).
33.
M.
Dracinsky
,
H. M.
Möller
, and
T. E.
Exner
,
J. Chem. Theory Comput.
9
,
3806
(
2013
).
34.
J. C. B.
Dietschreit
,
A.
Wagner
,
T. A.
Le
,
P.
Klein
,
H.
Schindelin
,
T.
Opatz
,
B.
Engels
,
U. A.
Hellmich
, and
C.
Ochsenfeld
,
Angew. Chem., Int. Ed.
59
,
12669
(
2020
).
35.
C.
Hartmann
,
J. C.
Latorre
, and
G.
Ciccotti
,
EPJ: Spec. Top.
200
,
73
(
2011
).
36.
J. C. B.
Dietschreit
,
D. J.
Diestler
, and
C.
Ochsenfeld
,
J. Chem. Phys.
156
,
114105
(
2022
).
37.
D.
Mendels
,
G.
Piccini
, and
M.
Parrinello
,
J. Phys. Chem. Lett.
9
,
2776
(
2018
).
38.
Y.
Wang
,
J. M. L.
Ribeiro
, and
P.
Tiwary
,
Nat. Commun.
10
,
3573
(
2019
).
39.
L.
Sun
,
J.
Vandermause
,
S.
Batzner
,
Y.
Xie
,
D.
Clark
,
W.
Chen
, and
B.
Kozinsky
,
J. Chem. Theory Comput.
18
,
1549
(
2022
).
40.
L.
Bonati
,
V.
Rizzi
, and
M.
Parrinello
,
J. Phys. Chem. Lett.
11
,
2998
(
2020
).
41.
D.
Wang
and
P.
Tiwary
,
J. Chem. Phys.
154
,
134111
(
2021
).
42.
K.
Minoukadeh
,
C.
Chipot
, and
T.
Lelièvre
,
J. Chem. Theory Comput.
6
,
1008
(
2010
).
43.
J.
Comer
,
J. C.
Phillips
,
K.
Schulten
, and
C.
Chipot
,
J. Chem. Theory Comput.
10
,
5276
(
2014
).
45.
M.
Kröger
,
Models for Polymeric and Anisotropic Liquids
(
Springer Science & Business Media
,
2005
), Vol. 675.
46.
W. C.
Swope
,
H. C.
Andersen
,
P. H.
Berens
, and
K. R.
Wilson
,
J. Chem. Phys.
76
,
637
(
1982
).
47.
A.
Barducci
,
G.
Bussi
, and
M.
Parrinello
,
Phys. Rev. Lett.
100
,
020603
(
2008
).
48.
J.
Kussmann
and
C.
Ochsenfeld
,
J. Chem. Phys.
138
,
134114
(
2013
).
49.
J.
Kussmann
and
C.
Ochsenfeld
,
J. Chem. Theory Comput.
11
,
918
(
2015
).
50.
H.
Laqua
,
T. H.
Thompson
,
J.
Kussmann
, and
C.
Ochsenfeld
,
J. Chem. Theory Comput.
16
,
1456
(
2020
).
51.
H.
Laqua
,
J.
Kussmann
, and
C.
Ochsenfeld
,
J. Chem. Phys.
154
,
214116
(
2021
).
52.
N.
Mardirossian
and
M.
Head-Gordon
,
J. Chem. Phys.
144
,
214110
(
2016
).
53.
F.
Weigend
and
R.
Ahlrichs
,
Phys. Chem. Chem. Phys.
7
,
3297
(
2005
).
54.
P. J.
Wilson
,
T. J.
Bradley
, and
D. J.
Tozer
,
J. Chem. Phys.
115
,
9233
(
2001
).
55.
F.
Jensen
,
J. Chem. Theory Comput.
11
,
132
(
2015
).

Supplementary Material