The virial equation of state (VEOS) provides a rigorous bridge between molecular interactions and thermodynamic properties. The past decade has seen renewed interest in the VEOS due to advances in theory, algorithms, computing power, and quality of molecular models. Now, with the emergence of increasingly accurate first-principles computational chemistry methods, and machine-learning techniques to generate potential-energy surfaces from them, VEOS is poised to play a larger role in modeling and computing properties. Its scope of application is limited to where the density series converges, but this still admits a useful range of conditions and applications, and there is potential to expand this range further. Recent applications have shown that for simple molecules, VEOS can provide first-principles thermodynamic property data that are competitive in quality with experiment. Moreover, VEOS provides a focused and actionable test of molecular models and first-principles calculations via comparison to experiment. This Perspective presents an overview of recent advances and suggests areas of focus for further progress.

## I. INTRODUCTION

*Ab initio* and related methods for computing physical properties from first-principles considerations are advancing rapidly. Evaluation of single-molecule properties (e.g., charge distribution, spectroscopic behavior) for small- to medium-sized molecules is routine,^{1} applications of density functional theory and its extensions^{2} to crystalline phases are competitive with experiment, and first-principles techniques are increasingly finding their way into fluid-phase simulations.^{3} A logical progression from the current situation to the next level of application should advance through the non-condensed vapor and supercritical fluid phases, which provides a clear path from single-molecule, ideal-gas properties to nontrivial, dense gases. Properties of interest in this context include those relating to volumetric behavior—the equation of state in particular—as well as thermal properties, such as the heat capacity; a number of other properties of interest, e.g., the Joule–Thomson (JT) coefficient, can be derived from these. However, there has to date been no concerted effort to develop and apply first-principles methods in this direction. The value of doing so is twofold: (1) It provides a systematic approach to rigorous and precise testing of the molecular models and *ab initio* methods, through comparison to available experimental data; and (2) it can yield useful computational-sourced data for properties of important systems for which no experimental data are available. The conventional approach for bridging molecular models to thermodynamic properties is molecular simulation, but, at least for gases and supercritical fluid phases, an alternative path exists that has some advantages; see Fig. 1.

More specifically, *ab initio* methods and molecular models can be connected rigorously to bulk fluid properties via calculation of “cluster integrals,” which are integrals over the configurations of a small number of molecules (often fewer than ten). These methods typically provide fluid properties in the form of a power series in density, with coefficients expressed in terms of the cluster integrals. As a modeling tool, cluster-integral methods hold a distinct position between molecular simulation on the one hand and traditional molecularly inspired thermodynamic models on the other. Specifically,

like simulation, (1) for the state conditions where they are applicable, they provide an exact thermodynamic description for any system that is defined in terms of its molecular interactions; (2) they include the theoretically correct dependence on state variables, including mole fractions when applied to mixtures; (3) they can accommodate any level of complexity in the model, all with full statistical-mechanical rigor; (4) they can provide information for a broad range of thermodynamic properties, all consistent with a given molecular model.

like thermodynamic models, (5) they provide a model in the form of an equation (rather than discrete data), which can be manipulated, analyzed, and used in larger calculations (e.g., process design) as needed; (6) cluster integrals (i.e., thermodynamic model parameters) developed for one substance enter as part of the description of mixtures containing that substance, so effort made to characterize it is not discarded when it appears elsewhere.

and, unique to cluster methods, (7) unfortunately, their applicability is (at present) limited to non-condensed fluid phases and, thus, excludes, e.g., liquids, which are separated from the low-density gas by a first-order phase transition; moreover, they become more difficult to apply with increasing density, even at supercritical temperatures; (8) however, unlike any other thermodynamic model, when failure occurs, it is detectable—variation of the property with addition of higher-order terms indicates that it is not accurate at a given order; (9) related to this, cluster methods are based on a well-defined approximation and they can be systematically improved; finally, (10) they can be interfaced with other known features of the fluid-property behavior to form approximants that are more effective over a broader range of conditions, compared to the basic series.

Thus, the approach as a whole really is in a class by itself, wherein the simultaneous character of being both a detailed molecular treatment and a rigorous thermodynamic model makes it a methodology with unique appeal. The small community of researchers working on the virial framework has made truly remarkable progress in recent years toward turning it into a practical tool to connect molecular and bulk properties, one that may come to rival molecular simulation in some respects. Major advances have been made in methods for calculating the cluster integrals, understanding their convergence, and extending their range of application via interfacing with approximants. A particularly important development, which makes the proposed activity possible now for the first time, is the formulation of very efficient algorithms for computing cluster integrals and their derivatives for multibody potentials.

## II. FRAMEWORK AND RECENT ADVANCES

### A. Virial equation of state

#### 1. Virial coefficients

The virial equation of state (VEOS) relates pressure *p*, temperature *T*, and number density *ρ* as a series expansion in density,^{4–6}

where *k*_{B} is the Boltzmann constant and *B*_{k} is the *k*th virial coefficient. We label the series truncated at order *n* (shown here) as VEOS*n*. The series in terms of the activity $z=exp\mu /kBT$, where *μ* is the chemical potential, is sometimes considered in developing, understanding, or extending the overall framework,

The *B*_{n} and *b*_{n} coefficients can be computed from each other in a straightforward way.^{7}

The virial coefficients are independent of density and depend only on temperature; coefficients for mixtures are given rigorously as a polynomial in mole fractions (see Sec. II A 2). The virial coefficients are related to the intermolecular potential via the configuration integral,

where molecule 1 defines the origin and positions of molecules 2, 3, …, *n* are defined relative to it (integration over orientations or intramolecular coordinates would be performed as well, if needed). Here, *f*_{B} is the Husumi function, and it is given as a complicated sum of products of terms involving the energy *U*_{k} of groupings of *k* ≤ *n* molecules, formed from partitions of the entire set of *n* molecules in configuration **r**^{n}. For pairwise-additive models, such that $Uk=\u2211i<jkuij$, *f*_{B} can be represented as a sum of doubly connected graphs, with bonds given by the Mayer function, $fij=exp\u2212uij/kBT$. Significantly, the approximation of pairwise additivity is not a requirement, and the VEOS can be applied with virial coefficients computed using potentials that include multibody contributions. It has been shown that such contributions are essential to providing property values that are competitive in accuracy to experiment.^{8–13}

#### 2. Mixtures

The treatment of mixtures by the VEOS is handled rigorously. The coefficient *B*_{n} for a mixture of *c* components is given as a polynomial in mole fractions of the species, with coefficients given by cluster integrals computed for the corresponding set of molecules. For example, *B*_{3} for a three-component mixture is

where *B*_{ijk} is the cluster integral for *i* molecules of species 1,*j* molecules of species 2, and *k* molecules of species 3, with mole fractions *x*_{1}, *x*_{2}, and *x*_{3}, respectively. The generalization to other orders *n* and number of species *c* is straightforward.

Often, the weak link in the treatment of mixtures is evaluating the cross-species configurational energy needed for coefficients that involve a heterogeneous set of molecules. Potentials are typically developed and refined for pure species, and when needed, unlike-molecule interactions are estimated using semiempirical combining rules, such as Lorentz–Berthelot. However, in principle, it is possible to formulate energy surfaces for heterogeneous sets of molecules, and this is sometimes done, but it obviously involves more computational effort. Nevertheless, calculations that aim for high accuracy relative to experiment should develop bespoke potential models for each set of molecule types and cannot rely on approximate combining rules. If the potential is modeled by summing only two- and three-body contributions, the effort required can be manageable.

#### 3. Other properties

When supplemented by ideal-gas thermal properties, standard thermodynamic manipulations allow derivation of formulas^{14} for any thermodynamic property of interest, for any temperature, density, and composition within the region of convergence of the series (and where coefficient values are available). This includes thermal expansion coefficient, heat capacities, Joule–Thomson inversion curve, fugacity coefficients, etc. Cluster integrals cannot be performed to compute transport coefficients, and attempts to develop an appropriate density expansion for these properties have been problematic,^{15} but extensions of the VEOS approach have been developed for calculation of other properties, such as those related to inhomogeneous fluids and surfaces,^{16} and electrostatic phenomena^{17} (e.g., dielectric constant).

The VEOS is a rigorous description of the equation of state for the molecular model used to compute the virial coefficients. Where it is converged, it provides exact values of the pressure for a given density, temperature, and mixture composition. In particular, it describes behavior in the thermodynamic limit, so it is not subject to finite-size errors that arise in molecular simulation.

### B. Calculation of virial coefficients

#### 1. Mayer-sampling Monte Carlo

The integral in Eq. (3) must be evaluated numerically using Monte Carlo (MC) methods. Mayer-sampling Monte Carlo (MSMC)^{18} is formulated by extending single-stage and multistage free-energy methods, and it has opened up many opportunities for evaluating cluster integrals. In MSMC, configurations are generated via a Markov process, sampled in proportion to a distribution *π*(**r**^{n}). For each configuration, *f*_{B} is evaluated, and the integral is given as the average

where $Bnref\u2261n\u22121n!\u222bfBrefdrn\u22121$ is the virial coefficient (or some cluster integral) for a reference system (typically based on the hard-sphere model) and is taken as known. Importance sampling suggests that *π* be given by *f*_{B}, but because *f*_{B} can be negative, an absolute value is used. With this choice, it is likely that the reference system is incompletely sampled, so we routinely employ a multistage method^{19,20} (overlap sampling, a variant of Bennett’s method^{21}) in which configurations are generated in parallel by sampling on $\pi ref=|fBref|$. Averages in both samples are then taken for an overlap function that is defined in terms of *f*_{B} and $fBref$, with a parameter [and allocation of central processing unit (CPU) time between the two samples] that is tuned to obtain precise *B*_{n} most efficiently.

Mayer-sampling calculations are performed in an open volume, with no containing walls and no periodic boundaries—importance sampling keeps the molecules together. Configurations with large steric overlap are included in the samples because *f*_{B} is nonzero there (Fig. 2). No correction for truncation of the potential is needed, even for systems with electrostatic interactions (i.e., no Ewald sum or similar methods are invoked). However, it is sometimes necessary to couple offsetting orientations when computing a weight for a configuration of molecules that have electrostatic interactions. This rough average can be as simple as including the direct configuration as well as others where each molecule is flipped to reverse the direction of its dipole moment. Electrostatic interactions decay more rapidly upon Boltzmann-weighted orientation averaging, relative to fixed orientation, and this flipping process prevents molecules from oversampling distant separations.

#### 2. Wheatley’s recursion

Each configuration generated in a MSMC process requires evaluation of the integrand *f*_{B}. This calculation is increasingly complicated and computationally expensive with increasing order of the coefficient *n*. This is the major barrier to evaluation of virial coefficients to arbitrarily high order *n*. For example, for pairwise-additive potentials, *f*_{B} requires evaluation of a sum of terms that grows in number as $2n2$. Table I lists the total number of graphs that can be formed from *n* distinguishable points with zero or one bonds joining each pair. The fraction of these that are doubly connected must be summed to evaluate *f*_{B} for each configuration. By just *n* = 13, the total number of graphs is of order 10^{23}, of which 96% are biconnected (also referred to as “irreducible”). Clearly, direct evaluation of such a sum, even once, is wholly infeasible.

. | Number of graphs . | Fraction . | . |
---|---|---|---|

Order n
. | 2^{n(n−1)/2}
. | biconnected (%) . | 3^{n}
. |

2 | 2 | 50 | 9 |

3 | 8 | 13 | 27 |

4 | 64 | 16 | 81 |

5 | 1024 | 23 | 243 |

6 | 32 768 | 34 | 729 |

7 | 2 097 152 | 48 | 2187 |

8 | 268 435 456 | 62 | 6561 |

9 | 68 719 476 736 | 74 | 19 683 |

10 | 35 184 372 088 832 | 83 | 59 049 |

11 | 36 028 797 018 963 968 | 89 | 177 147 |

12 | 73 786 976 294 838 206 464 | 93 | 531 441 |

13 | 0.501 mol | 96 | 1 594 323 |

. | Number of graphs . | Fraction . | . |
---|---|---|---|

Order n
. | 2^{n(n−1)/2}
. | biconnected (%) . | 3^{n}
. |

2 | 2 | 50 | 9 |

3 | 8 | 13 | 27 |

4 | 64 | 16 | 81 |

5 | 1024 | 23 | 243 |

6 | 32 768 | 34 | 729 |

7 | 2 097 152 | 48 | 2187 |

8 | 268 435 456 | 62 | 6561 |

9 | 68 719 476 736 | 74 | 19 683 |

10 | 35 184 372 088 832 | 83 | 59 049 |

11 | 36 028 797 018 963 968 | 89 | 177 147 |

12 | 73 786 976 294 838 206 464 | 93 | 531 441 |

13 | 0.501 mol | 96 | 1 594 323 |

Ten years ago, Wheatley^{22} proposed a very clever method to break through this barrier, or at least push it back. He presented a recursive method for evaluating *f*_{B} with computational effort that grows (only) exponentially with *n*, specifically as 3^{n}. This rate of growth is illustrated in the table as well, showing that calculation of *B*_{n} for *n* = 13 is only of order 10^{5} times more computationally expensive than *n* = 2. The calculation still becomes untenable for not-very-large *n*, but the advance extends our capabilities by a useful amount. Apart from the expense of evaluating *f*_{B}, larger *n* implies a larger space of configurations to sample, and the presence of offsetting positive and negative contributions to the average can further slow convergence.

Arguably, even more important than the computational savings provided by Wheatley’s algorithm is the relative simplicity it presents for writing a general-purpose code that allows calculation of *f*_{B} for any *n*. Previously, this required enumeration and coding of the relevant doubly connected graphs, which becomes increasingly tedious for increasing *n* and increasing number of species *c*. Moreover, without added effort, the enumeration will not handle permutations of molecule labels (yielding graphs that are equivalent upon integration but differ for a given configuration); Wheatley’s algorithm handles these permutations automatically, reaping the most information from each configuration.

Even more, Wheatley and co-workers^{23} recently presented a detailed proof showing that his algorithm yields the correct *f*_{B} for nonadditive potentials. The complexity of including non-pairwise terms in a direct sum significantly compounds the tedium already inherent in formulating the pairwise-additive sum. This realization opens the door to facile implementation of calculations of virial coefficients for highly realistic molecular models, suitable for high-accuracy property calculation, where three-body interactions (at a minimum) are needed to compete with state-of-the-art experimental data.

Finally, Wheatley’s recursion can be extended in a straightforward manner to allow calculation of virial-coefficient derivatives, with added cost that is comparable to the cost of computing the coefficient value itself (in excess of the cost of the energy calculations, which do not need to be repeated for derivatives).^{23} The temperature derivative in particular is important, as the first and second temperature derivatives are required for evaluation of thermal properties, such as the heat capacity and Joule–Thomson coefficient.

As a result of these advances, calculation of virial coefficients to seventh or maybe eighth order, for sufficiently simple molecular models, can be accomplished without extraordinary computational expense. The limit of what can be reached if extraordinary effort is applied is demonstrated by some recent publications, reporting coefficients up to *B*_{12} for the hard-sphere model,^{22,24} up to *B*_{9} with full temperature dependence for the square-well model,^{25} and up to *B*_{16} for a single low temperature for the Lennard-Jones model.^{26}

#### 3. Flexible correction

The formulas typically presented for the virial coefficients in terms of doubly connected graphs are applicable only to molecules that are rigid, having no bend, torsion, stretch, or other intramolecular coordinate degrees of freedom. The mathematical manipulations that convert the activity series to the density series involve cancellation of graphs that are equivalent only if the molecules are rigid; see Fig. 3. Hence, to compute *B*_{n} correctly, it is necessary (for *n* > 2) to apply a well-defined “flexible correction.”^{27} As of now, no general-purpose algorithm has yet been developed to compute the correction for arbitrary *n* and *c*, e.g., as an extension to Wheatley’s recursion, so the correction has to be coded specifically for each case. The correction is small to the extent that the molecule is rigid, and an algorithm has been proposed to exploit this feature,^{28} but in general it can be difficult to calculate the correction with good precision, and often it contributes the most to the overall uncertainty in *B*_{n}. Wheatley’s algorithm, by the way, computes the *b*_{n} coefficients in the process of evaluating *B*_{n}, and the activity-series coefficients evaluated this way are correct even for flexible molecules—it is in the process of deriving *B*_{n} that the need for the correction is introduced.

It should be noted that “flexibility” in the intermolecular potential, e.g., electrostatic polarization, is not a concern in this context; it is only nuclear-coordinate flexibility that is at issue.

#### 4. Nuclear quantum effects

Light atoms at low temperatures exhibit behavior that is not accurately described using a classical treatment, and it becomes necessary to introduce corrections or methods that capture nuclear quantum effects. Semiempirical intermolecular models incorporate these effects implicitly when the model parameters are fit to experimental data. In contrast, calculations based on models that are derived from high-accuracy first-principles electronic structure calculations must instead handle the effects explicitly. The main effect is due to the quantum nature of the nucleus, which does not allow it to be treated as a mathematical point particle but rather as a diffuse (albeit highly localized) density cloud. Another possible complication is nuclear exchange, which can further give rise to peculiar isotope effects; these considerations are important only at very low temperatures (less than 10 K) for very light atoms (hydrogen, helium).

The simplest approach to handle quantum diffuseness is to invoke a semiclassical approximation,^{29,30} which is derived as a series expansion in Planck’s constant, *h*, usually truncated at *h*^{2}. The correction requires evaluation of first and second derivatives of the intermolecular potential with respect to the atom coordinates. Semiclassical methods can extend the range of temperature where accurate results can be obtained, but they eventually become inaccurate at sufficiently low temperatures. In these cases, path-integral methods should be applied.^{12,31,32} Curiously, introduction of path-integral techniques effectively makes rigid (or even monatomic) molecules into flexible molecules, and the treatments discussed in Sec. II B 3 come into play for coefficients beyond *B*_{2}.

### C. Convergence of the VEOS

#### 1. Performance, and assessment of convergence

The VEOS is a power series in density, and as such, it can provide accurate values for properties only for conditions where the series is converged. One can assess whether the series truncated at a particular order is converged by examining contributions to the properties made by successive terms in the series. At a given state condition, the magnitude of the last term in the sum gives a sense of the error made in neglecting the remaining terms. Of course, it is possible that the highest-order coefficient happens to be small where subsequent coefficients are not, so ideally one will have enough coefficients to observe a consistent trend of diminishing contributions. In several applications,^{14,33} we have performed a test of the reliability of the approach in the following manner. Say we have *n* coefficients’ values available to us. We set a tolerance for accuracy, e.g., 1%, and we evaluate the order of the VEOS required to obtain the property value to this accuracy, gauging the truncated series VEOS*k* against the full available series VEOS*n*, *k* < *n*. In these tests, we also have available the correct property value, given either by experiment or molecular simulation for the same model. We perform the same evaluation, finding the VEOS*j* that provides agreement of the series and the known property value within the tolerance. We typically observe that the self-assessment of convergence is highly consistent with the externally validated assessment, i.e., *j* = *k*. Thus, in contrast with other thermodynamic models, VEOS is unusual in its ability to provide a reliable self-assessment of its accuracy (for a given molecular model, whose accuracy itself is a different issue).

Condensation represents a singularity in the free-energy surface. Accordingly, VEOS, as a power series about *ρ* = 0, cannot converge in the liquid region. In fact, it was proved by Mayer and Mayer^{7} that the density series is not convergent for densities exceeding the spinodal density, i.e., where (*∂p*/*∂ρ*)_{T} = 0. In practice, we find that convergence is good up to the binodal density, sometimes even on approach to the critical point. The VEOS in principle can provide the location of the critical point, but in practice, we have not been able to do this reliably, such that increasing the series order leads to a smooth convergence to the critical conditions. Above the critical temperature, there is *a priori* no clear barrier to convergence with increasing density. However, we observe an apparent barrier when convergence is mapped in the pressure–temperature plane. Self-assessment of convergence viewed this way suggests that convergence becomes difficult upon approach to conditions that extrapolate the coexistence line in a Clausius–Clapeyron representation; this behavior is illustrated in Fig. 4. We suspect this barrier may have some relation to a Widom line,^{34} which is connected to maxima in the thermodynamic response functions^{35} (e.g., compressibility). The effect attenuates as one moves away from the critical point, toward higher temperatures, such that a series using about six coefficients is sufficient to reach the freezing point, at least for simple, nonpolar fluids.

#### 2. Approximants

The VEOS is just one way to represent the equation of state as a power series. The series in activity [Eq. (2)] is an alternative, and the reversion of Eq. (1), giving density as a series in pressure, is another. The convergence properties of these various approaches differ from each other. Studies have found^{36} that the conventional VEOS [Eq. (1)] exhibits the best behavior of these alternatives.

It is worthwhile to consider more sophisticated ways to recast the VEOS as a different series. In doing this, we want to take advantage of any independent physical information that may be available about the system and build that into the treatment. Padé approximants, for example, are formed as a ratio of two series with coefficients selected to be consistent with the original, direct series. They can accelerate convergence significantly, but one must be careful in choosing how they are formulated, in particular, how many terms to include in the numerator series and how many in the denominator (the total number of independent terms cannot exceed the number of terms in the original series). In application to the soft-sphere model, we showed that convergence of the approximant depended critically on whether the approximant captured an estimated form for the pressure at high density.^{37}

Another example of a physically motivated approximant considers behavior near the critical point. Here, all fluids in a given universality class exhibit behavior having a common mathematical form, and a strategy for enhancing convergence of the VEOS in the vicinity of criticality is to formulate an approximant with this universal behavior built in. The simplest such example attempts to capture the critical singularity along a path following the critical isotherm. In particular, the following form is imposed^{38} (subscript c indicates a value at the critical point):

The coefficients *A*_{k} are determined by expanding this form about *ρ* = 0 and matching the VEOS. The exponent *δ* is universal, with the value 4.789. Extension of this approximant to approach the critical point from any direction (not just along *T*_{c}) must be handled carefully. We developed an approximant^{39} based on the Schofield’s parametric equation of state^{40} and found significantly better performance over the basic VEOS. However, there is still much room (and need) for improvement.

#### 3. Condensation and the liquid state

It is interesting to consider the prospect of extending the VEOS into the liquid phase. Mayer and Mayer^{7} proved that the VEOS strictly fails at the point of condensation, the binodal density. Ushcats^{41} developed a recursive formulation of the partition function (and from it, an equation of state, which we will designate UEOS) in terms of the same irreducible cluster integrals appearing in the VEOS. The formulation depends on the number of molecules *N* and the volume *V* independently (rather than only as their ratio *ρ* = *N*/*V*). In the limit *N* → ∞, UEOS and VEOS are equivalent for *ρ* less than the binodal density, but for densities beyond the point where *∂p*/*∂ρ* = 0, UEOS follows a flat line, mimicking the true coexistence behavior. However, this line does not terminate at the coexisting liquid density but instead extends to arbitrarily large density without change in pressure. UEOS fails to identify the coexisting liquid and the subsequent increase in pressure with density in the liquid region. This failure is attributed to the cluster integrals’ lack of volume dependence, as the effects of constraining walls (or periodic boundaries) are removed in the development of the cluster-integral framework.

Ushcats and co-workers have done much to advance thinking on this topic over the past decade.^{42} They raise issues regarding the relative importance of reducible vs irreducible cluster integrals in determining the needed volume dependence and how this dependence might be represented in practice. They argue that volume dependence must be manifested in “macroscopic” cluster integrals, where *n* is of order *ρV*, where *V* is large enough to support a liquid phase. Indeed, given that any such volume dependence must scale in a way that ensures that in the thermodynamic limit, properties depend on *N*/*V* and not *N* and *V* independently, one might consider whether there exists a universal form for the volume dependence of large cluster integrals, independent of microscopic details in the same manner as critical-point universality. Regardless, the topic seems ripe for some type of breakthrough and deserves additional attention from the statistical physics community.

## III. CAN ACCURACY OF VEOS PROPERTIES EXCEED EXPERIMENT?

Attempts to generate VEOS-based experiment-quality property data from *ab initio* computational chemistry calculations are showing significant progress. The natural starting point for such efforts is helium, for which a history of virial-coefficient calculations through 2021 may be found in Ref. 33. A high-quality first-principles pair potential has been presented and refined several times over the past decade,^{43} and a good three-body potential has been available since 2009.^{44} We used these models to compute virial coefficients up to *B*_{7}, and in Fig. 5, we compare the errors in the pressure (expressed as the compressibility factor, *Z*) given via VEOS to those estimated for state-of-the-art experimental measurements.^{45,46} The figure shows that the contribution to *Z* from each additional term in the series steadily decreases such that the error from truncating the series is safely less than 10^{−5}. At the same time, stochastic error in the VEOS pressure, propagated from uncertainties in the coefficients, is negligible; estimated systematic error in VEOS, propagated from bounds on errors published with the pair and three-body potentials, and estimated error due to neglect of four-body interactions^{12} are the limiting (largest) errors for VEOS, but they do not exceed 10^{−4}. Meanwhile, estimated stochastic errors published with the experiments are small, but estimated systematic errors do exceed 10^{−4} for all pressures. Hence, based on best estimates of all errors contributing to VEOS and experiment, we could conclude that the *ab initio*-based VEOS calculations exceed the accuracy and precision of the experimental measurements for helium at this temperature.

Propagation of estimated systematic error in the intermolecular potential into estimated error in a virial coefficient is typically accomplished by perturbing the entire potential-energy surface in one direction by its estimated error and computing how this changes the coefficient. Although imperfect, this provides a reasonable gauge of the effect of errors in the potential. The process can be completed efficiently without computing the coefficient twice, by focusing on calculating the difference itself^{33} or by evaluating functional derivatives.^{12}

Going beyond helium, greater challenge (and interest) is found with multiatomic molecules. Hellmann generated *ab initio* pair^{47} and three-body^{48} potential functions for rigid CO_{2} molecules, and he used these potentials to compute first-principles virial coefficients for CO_{2} up to *B*_{8}. We used his coefficients to compute the Joule–Thomson (JT) inversion curve for CO_{2} and compare the result with experimental data in Fig. 6. The JT coefficient requires first and second temperature derivatives of the virial coefficients, and JT inversion is generally considered to be a challenging property to capture accurately with any equation of state; it is also a difficult property to evaluate experimentally. The figure demonstrates the convergence of the VEOS and the very good agreement with experimental data. The figure also shows the performance of a semiempirical pair potential (TraPPE), which does well in fitting CO_{2} properties overall. In this application, however, its limits are apparent, and they highlight the success of the first-principles coefficients in capturing this behavior.

Finally, we point out a very recent paper from Graham and Wheatley,^{11} who computed properties from first-principles VEOS for the mixture CO_{2} + Ar. This work is notable for its use of the Gaussian Processes (GP) machine-learning method to describe the potential-energy surface (PES) based on inputs from two- and three-molecule *ab initio* calculation. Separate *ab initio* calculations and GP representations were determined for each set of pairs (AA, AB, BB, where A and B represent CO_{2} and Ar molecules, respectively) and triplets (AAA, AAB, ABB, and BBB) needed to compute the PES for any set of species. Then from these, all mixture virial coefficients up to *B*_{5} were computed from 150 to 370 K (with some pure-species *ab initio* coefficients taken from the literature). Graham and Wheatley report excellent agreement with experiment at conditions where VEOS is converged, at all compositions of the mixture. Tested properties include the pressure, speed of sound, and Joule–Thomson inversion.

We would suggest that these recent studies argue for the feasibility of computing properties from first principles via the VEOS, over regions where it is converged, with results that are competitive to experiment in both precision and accuracy.

## IV. MOVING FORWARD

The steady accumulation of advances in disconnected fields—first-principles molecular modeling, cluster-integral theory and methods, machine-learning tools for parameterizing force fields, and computer hardware and software—has now set the stage for a concerted, systematic effort to generate highly accurate first-principles equations of state for bulk fluids. We can now for the first time calculate properties of technological importance for a huge variety of supercritical fluid mixtures from molecular considerations. We can generate data for far more systems than can be measured in the laboratory, and we can present these data as a reliable equation of state suitable for manipulation in design and optimization or in other investigations (e.g., estimating critical lines). In doing so, we generate information that can be used to identify weaknesses in *ab initio* methods, thereby guiding efforts in that arena.

For semiempirical pairwise-additive (or polarizable) potentials that are formulated in part by fitting to experiment, the virial coefficients are just another property that may or may not be reproduced well. These potentials necessarily make compromises that enable them to cover a broad range of properties adequately over a range of state conditions and to do so while being computationally inexpensive and thereby suitable for molecular simulations. Often, these compromises are manifest in systematic but compensating differences from experimental virial coefficients, such that, for example, *B*_{2} is overestimated while *B*_{3} is underestimated.^{14}

Deviations of this sort should not be tolerated for virial coefficients computed from potential-energy functions that purport to provide the true pair, three-body, etc., interactions. Moving forward, we would advocate that the presentation of any *ab initio* potential-energy function be accompanied by a demonstration that it is consistent with any available experimental virial-coefficient data. Such a comparison should cover a range of temperatures when possible, because different temperatures will give emphasis to different parts of the PES. Indeed, often this comparison is dutifully performed for new potentials, but in some cases, it is not. Potentials that are validated this way are more valuable, because they provide more reliable probes of behavior and can serve as a more reliable basis for property calculations, whether via VEOS or molecular simulation.

Related to this, a quantitative estimate of the magnitude of any systematic error that may be present in the potential is needed to generate error bounds on the properties computed from them. Such errors could have origins in the limitations of the *ab initio* theory or basis sets or in the means used to represent the PES from the limited *ab initio* data. It would be helpful to developers and users to have standards and procedures in place for estimating these errors and an expectation that they will be applied and their results reported with any new model or data computed from it.

Methods for calculation of virial coefficients given a function for the PES are adequate for simple systems, but some advances in application to flexible molecules will be needed to allow for application to a broad range of molecule types.

The utility of high-accuracy virial coefficients can be greatly extended if we can improve the reach of the VEOS. This may be accomplished by the development of more powerful approximants, which can, for example, be built upon treatments of critical universality other than previously attempted or perhaps with accommodation of the anomalies associated with the Widom region; approximants for mixtures have not yet been attempted. The prospect of reaching condensed states is alluring, but success in that direction is uncertain despite significant advances over the past decade. All in all, the number of researchers who have reported attempts to extend the reach of the VEOS is small, and continued advances may arrive more quickly if a broader community of researchers can be persuaded to look at this problem.

Finally, the ability of VEOS to advance first-principles and machine-learning molecular models hinges on the availability of high-quality experimental data that can be used to assess and guide the model development. We have advocated that an effective form of this interplay is through comparison of temperature-dependent virial coefficients. However, to extract meaningful virial coefficients from experimental data (e.g., compressibility factor, speed of sound), one should extrapolate to zero density and evaluate the limiting intercept, slope, curvature, etc. This is difficult to do with good precision, and consequently experimental data for coefficients of order *B*_{3} and (especially) higher are rare; the situation is even worse for mixtures. A variety of experimental methods exist for generating data that can yield virial coefficients,^{4} and others may yet be invented. It is our hope that an emergence of the VEOS as a new frontier for computational chemistry would spur a concomitant renewal of interest in experimental gas-phase thermodynamics as a means to inform and support these advances.

## ACKNOWLEDGMENTS

The authors are grateful for support from the U.S. National Science Foundation, Grant No. CBET-2152946. D.A.K. is grateful for the support of the Walter E. Schmid Family Foundation.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Andrew J. Schultz**: Conceptualization (equal); Writing – review & editing (equal). **David A. Kofke**: Conceptualization (equal); Writing – original draft (lead); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

_{2}/CH

_{4}mixtures from the virial equation of state

_{2}–H

_{2}interaction potential from first principles

^{4}He and

^{3}He from an accurate relativistic interaction potential

_{2}, O

_{2}, CO

_{2}, and NH

_{3}mixtures from the virial equation of state