The formally *exact* framework of equilibrium Density Functional Theory (DFT) is capable of simultaneously and consistently describing thermodynamic and structural properties of interacting many-body systems in arbitrary external potentials. In practice, however, DFT hinges on *approximate* (free-)energy functionals from which density profiles (and hence the thermodynamic potential) follow via an Euler–Lagrange equation. Here, we explore a relatively simple Machine-Learning (ML) approach to improve the standard mean-field approximation of the excess Helmholtz free-energy functional of a 3D Lennard-Jones system at a supercritical temperature. The learning set consists of density profiles from grand-canonical Monte Carlo simulations of this system at varying chemical potentials and external potentials in a *planar* geometry only. Using the DFT formalism, we nevertheless can extract not only very accurate 3D *bulk* equations of state but also *radial* distribution functions using the Percus test-particle method. Unfortunately, our ML approach did not provide very reliable Ornstein–Zernike direct correlation functions for small distances.

## I. INTRODUCTION

Given the massive present-day availability of computer power and data, the grown general interest in machine learning (ML) should not come as a big surprise. This interest also extends to physics whose community excels at gathering, organizing, and analyzing data in order to predict and model the behavior of systems with many degrees of freedom, which is also one of the strengths of ML. An important distinction between the field of physics and ML is that physicists tend to understand, model, and predict the systems of their interest via a stepwise chain of reasoning from cause to effect, whereas ML algorithms tend to “only” directly relate cause to effect without necessarily understanding (in the traditional “human” sense) the steps in between. In other words, ML can often be regarded as a black box that is as incomprehensible as the initial raw data itself.

Here, we will also suffer, at least to some extent, from this black-box character of ML applied to a problem in classical Density Functional Theory (DFT).^{1,2} However, only in a limited way because we can build on the foundations of physics to exploit, in this case, a few ingredients of the DFT formalism. As we will explain in full detail below, DFT is an exact framework to describe thermodynamic and structural properties of interacting many-body systems. This involves the solution of Euler–Lagrange equations for the equilibrium density profile for a particle system in an external potential. Now, DFT hinges for given particle–particle interactions on approximate free-energy density functionals. By comparison with Monte Carlo (MC) simulations of density profiles in a learning set of external potentials, a free-energy functional can be constructed during an ML process. The additional physics that can be extracted beyond the learning set not only includes density profiles for external potentials outside the learning set but also (i) thermodynamic bulk quantities (because the minimal value of the functional corresponds to the thermodynamic potential at equilibrium from which, for instance, the bulk pressure follows) and (ii) the two-body direct correlation function (because it is related to the second functional derivative of the functional) from which the radial distribution function follows. Moreover, thermodynamic surface properties such as the adsorption and the interfacial tension can be extracted from the functional. Our work is strongly inspired by recent ML work to construct a classical DFT for the Lennard-Jones (LJ) fluid in one spatial dimension,^{3,4} which we here extend to the three-dimensional LJ fluid. Similar to Refs. 3 and 4, we use grand-canonical Monte Carlo (MC) simulations at a learning set of chemical potentials and external potentials, however, in a planar geometry. We stress that the planar geometry yields an effective 1D problem embedded in 3D, not to be confused with an actual 1D problem. We will show the ability not only to “learn” a free-energy functional that predicts density profiles of this system at chemical and external potentials outside the training set but also to extract several system properties that were not at all present in the data of the training set or at least not explicitly. In particular, we will show that from a learning set in a *planar* geometry, a machine-learned functional can be constructed that is capable of predicting the 3D mechanical bulk equation of state of the *homogeneous* fluid (the pressure–density–chemical potential relations), the 3D *radially symmetric* direct correlation function and the radial distribution function at any density, and (in principle) Lennard-Jones density profiles in an arbitrary external potential in 3D. The agreement of these predictions against simulations varies from very good (the equation of state, radial distributions from the Percus test-particle method, and density profiles outside the learning set) to, admittedly, rather poor (direct correlation function). The poor prediction for the latter is probably due to the rather simple form (and in retrospect perhaps an overly simple form compared to Ref. 4) for the free-energy functional and due to the treatment of the repulsive part of the LJ interaction. The main thrust of our findings at this point, therefore, is not the construction of the Lennard-Jones free-energy functional that compares “best” with MC simulations, but rather the notion that free-energy functionals for 3D systems can be constructed from relatively simple geometries (here planar) in the learning set. Extensions to other systems, for instance electrolytes and ionic liquids forming an electric double layer in contact with planar electrodes, could be a next step with actual applications in modeling the osmotic equation of state, the differential capacitance, and the adsorption in porous geometries.

This paper is organized as follows. We start in Sec. II with an extensive introduction for classical DFT—which can easily be skipped by readers familiar with this framework. In Secs. III and IV, we discuss the system and the simulation and machine-learning methods that we use, and in Sec. V, we discuss the resulting kernels, density profiles, equations of state, and pair correlation functions. We end in Sec. VI with a discussion and outlook.

## II. CLASSICAL DENSITY FUNCTIONAL THEORY

### A. Formalism

We consider a classical one-component system of *N* spherical particles with linear momenta **p**_{i} and center-of-mass positions **r**_{i}, with *i* = 1, …, *N* being the particle label. The particles interact with each other via an isotropic pair potential *ϕ*(*r*_{ij}), where *r*_{ij} = |**r**_{i} − **r**_{j}| is the distance between particles *i* and *j*. All particles are subject to a static external potential *V*_{ext}(**r**_{i}) such that the Hamiltonian of the system reads

where *m* denotes the mass of the particles. Here, we note that Eq. (1) can also describe macroscopic bulk systems by considering the external potential to be zero in a box of volume *V* at a homogeneous density *ρ* = *N*/*V* and temperature *T*. For these homogeneous systems, typical thermodynamic equilibrium quantities of interest include the caloric and mechanical equations of state *u*(*ρ*, *T*) and *p*(*ρ*, *T*) for the internal energy and pressure, respectively. In addition, structural quantities such as the radial distribution function *g*(*r*) (at particle–particle separation *r*) and the structure factor *S*(*q*) (at wavenumber *q*) are of interest for homogeneous systems.^{2} Equilibrium statistical mechanics offers a variety of techniques to calculate (approximations to) these quantities, for instance, systematic low-*ρ* or high-*T* expansions, integral equations based on the Ornstein–Zernike equation, or computer simulations. However, the situation is more complicated in the case of a nontrivial external potential due to, for instance, the Earth’s gravity, an attractive or repulsive substrate, or a porous matrix that may confine the particles of interest. In this case, the system described by Eq. (1) becomes heterogeneous in thermodynamic equilibrium such that the local density *ρ*(**r**) varies in space. Consequently, the energy density *u* and the pressure *p* become ill-defined (except, of course, within a local density approximation), and the broken translation invariance causes the radial distribution function to be of the form g(**r**, **r**′) rather than *g*(|**r** − **r**′|). Nevertheless, the formalism of Density Functional Theory (DFT) can provide a consistent picture of the thermodynamic and structural properties of inhomogeneous fluids in an external potential. Although DFT finds its roots in the quantum-many-body description of electrons, it has also found many applications in the (essentially) classical context of soft-matter systems to describe molecular liquids, electrolytes, colloidal dispersions, etc.^{1,5–12}

DFT is essentially a grand-canonical framework in which the temperature *T* and the chemical potential *μ* of the particles are fixed to characterize the heat bath and the particle bath with which the system is in thermal and diffusive equilibrium. The corresponding thermodynamic potential is the grand potential Ω_{0} defined by $\beta \Omega 0=\u2212ln\u2061\u2211N=0\u221e\u222bdpNdrN\u2061exp[\beta (\mu N\u2212HN)]/N!h3N$, where *β*^{−1} = *k*_{B}*T*, *k*_{B} is the Boltzmann constant, and *h* is an arbitrary constant with the same dimension as the Planck constant. From Ω_{0}, essentially all thermodynamic properties would follow, for instance, the pressure of the homogeneous system equals −Ω_{0}/*V*, and the internal energy is *∂β*Ω_{0}/*∂β*. Of course, this involves the immense problem of evaluating the 6*N*-dimensional phase-space integral in the definition of Ω_{0}. The key of classical DFT is that it circumvents this high-dimensional phase-space integral by a proof^{1} of the existence of a grand potential *functional* Ω[*ρ*] of the variational one-body density profile *ρ*(**r**), with the properties that (i) the equilibrium density profile *ρ*_{0}(**r**) minimizes the functional Ω[*ρ*], and (ii) this minimum equals the equilibrium grand potential Ω_{0}. This implies that

The problem is thus reduced to finding the functional Ω[*ρ*] and, after that, to solving the 3D Euler–Lagrange equation (2), which amounts to a huge reduction in the problem compared to the high-dimensional phase-space integral.

One can also prove rigorously^{1,2,13} that the grand potential functional Ω[*ρ*] can always be written as

where $F[\rho ]$ is the *intrinsic* Helmholtz free-energy functional that, and this is crucial for our machine-learning approach, *only and uniquely* depends on the particle–particle interactions [here, the pair potential *ϕ*(*r*)] and on the temperature, and *not* on *μ* and *V*_{ext}(**r**). In other words, the same and unique functional $F[\rho ]$ for a given *ϕ*(*r*) applies at any chemical and external potential. That $F[\rho ]$ is a *Helmholtz* free-energy functional follows straightforwardly from the thermodynamic relation Ω_{0} = *F*_{0} − *μN*_{0}, with *N*_{0} = *∫d***r***ρ*_{0}(**r**) being the equilibrium number of particles and *F*_{0} being the equilibrium Helmholtz free energy, which can be decomposed into the sum of the potential energy *∫*d**r***ρ*_{0}(**r**)*V*_{ext}(**r**) due to the external field and the remaining intrinsic free energy $F[\rho 0]$.

Unfortunately, $F[\rho ]$ is not explicitly known in most cases. An exception is the ideal-gas case of *ϕ*(*r*) ≡ 0, where it is possible to construct the intrinsic free-energy functional as $Fid[\rho ]=kBT\u222bdr\rho (r)ln\u2061\rho (r)\Lambda 3\u22121$, with $\Lambda =h/2\pi mkBT$ being the thermal wavelength. The common practice in DFT is now to split the intrinsic free energy into the ideal and the excess-over-ideal part, $F[\rho ]=Fid[\rho ]+Fexc[\rho ]$, and to find an explicit (usually approximate) expression for $Fexc[\rho ]$. Once such an explicit expression has been found, we can cast the minimum condition for *ρ*_{0}(**r**) of Eq. (2) in the explicit form

Note that Eq. (4) is a self-consistency relation for interacting systems, which usually takes the form of a nonlinear integro-differential equation that needs to be solved numerically for given *μ* and *V*_{ext}(**r**) for a system of interest with pair potential *ϕ*(*r*) at temperature *T* and, hence, with a given excess functional $Fexc[\rho ]$. In relatively simple geometries, for instance, with planar or radial symmetry, a numerical solution of Eq. (4) can be found at relatively low computational cost by means of a Picard iteration scheme.

Thus, the remaining problem of DFT lies in constructing an explicit form for $Fexc[\rho ]$, for which no universal recipe is available —not unlike the case of partition functions of interacting systems. There is, however, one more exact relation that can be and has been exploited and involves the second functional derivative $\u2212\beta \delta 2Fexc[\rho ]/\delta \rho (r)\delta \rho r\u2032$, which equals by definition the Ornstein–Zernike direct correlation function *c*(**r**, **r**′) and is hence directly related to the two-body structure of the system. In particular, in a homogeneous bulk system, the direct correlation function is of the form *c*_{b}(|**r** − **r**′|), and its Fourier transform $c\u0302b(q)$ yields the structure factor $S(q)=(1\u2212\rho c\u0302b(q))\u22121$, from which the radial distribution function *g*(*r*) follows by an inverse Fourier transformation.

In this manuscript, we will focus on a Lennard-Jones fluid. In the DFT treatment, we split the pair potential *ϕ*(*r*) = *ϕ*_{0}(*r*) + *ϕ*_{1}(*r*) into a steep repulsion *ϕ*_{0}(*r*) at short distances and an attractive tail *ϕ*_{1}(*r*) of well depth −*ɛ* < 0 in accordance with Barker–Henderson theory, as further detailed in Sec. III. On the basis of the vast body of knowledge on the thermodynamics and the two-body structure of the hard-sphere system, extremely accurate approximations have been constructed for its intrinsic excess Helmholtz free-energy functional $FHSexc[\rho ]$, for which we will use the White-Bear mark II^{12} version of the fundamental measure theory^{6,11} throughout this paper. It is common practice in liquid-state theory to treat the attractions as a perturbation on the hard-sphere system, and a popular version results in the van der Waals-like mean-field (MF) approximation,

The high-temperature limit of Eq. (5) returns the hard-sphere limit, but the MF approximation fails to give accurate results for lower temperatures where the attractions play a more prominent role.^{2,3,14} We therefore seek improved excess free-energy functionals in terms of corrections to the mean-field functional of Eq. (5) of the quadratic and cubic forms,

where the labels ML2 and ML3 refer to the fact that we will use machine learning (ML) to find the optimal form of the kernels Ω_{2}(*r*) and Ω_{3}(*r*). We note that ML2 reduces to the mean-field form for Ω_{2}(*r*) ≡ 0 and that ML3 reduces to ML2 for Ω_{3}(*r*) ≡ 0. We also emphasize that the ML3 form of the functional is *not* compatible with the third-order virial-type expansion, which would have entailed an additional spatial integration (say over **r**′) and a kernel of the triple product form $f(|r\u2212r\u2032|)f|r\u2032\u2212r\u2033|f|r\u2033\u2212r|$; finding the optimal kernel *f*(*r*) proved to be computationally too demanding and inconvenient at the exploration phase of this project, and hence, we settled for the simpler form of the cubic term ML3.

The building of ML functionals upon the MF functional is important. For long-ranged potentials, the mean-field functional retrieves the correct asymptotic decay of the direct correlation function such that the range of the ML corrections can conveniently be limited. For short-ranged potentials, the direct correlations are short-ranged anyway, and the inclusion of the mean-field term puts no constraint on the resulting ML functional.

### B. Planar geometry

Interestingly, we can exploit the fact that the optimal kernels Ω_{2}(*r*) and Ω_{3}(*r*) that we seek *must* be independent of *μ* and *V*_{ext}(**r**) to determine them in systems with a planar geometry, i.e., systems that have translation invariance in the *x*- and *y*-directions with external potentials *V*_{ext}(*z*) and density profiles *ρ*(*z*) that only depend on the normal coordinate *z*. One easily checks that the non-HS part of the MF functional of Eq. (5) then reduces to

where *A* = *∫*d*x*d*y* is the (macroscopically large) area of the planar surface and $\varphi 1,z(|z\u2212z\u2032|)=\u222bdxdy\varphi 1((z\u2212z\u2032)2+x2+y2)$ is the laterally integrated pair potential *ϕ*_{1}(*r*). Likewise, the non-HS contributions to the functionals of Eq. (6) can be cast in the form

where the laterally integrated kernels *ω*_{2} and *ω*_{3} can be written as

Interestingly, Eq. (11) can be inverted such that we find

In other words, once we find *ω*_{2}(*z*) and *ω*_{3}(*z*) from calculations in planar geometry, we can determine Ω_{2}(*r*) and Ω_{3}(*r*) from Eq. (12) such that the direct correlation function follows by taking second functional derivatives of Eqs. (6) and (7). Hence, we have access to thermodynamic and structural properties in bulk, in any geometric confinement, at any external potential, at any chemical potential, and solely on the basis of input in a planar geometry.

## III. SYSTEM

In this paper, we consider a 3D fluid in which the particles interact with a truncated and shifted Lennard-Jones (LJ) interaction given by

where *ɛ* > 0 denotes the well depth and *σ* is the LJ particle diameter. The full LJ potential is truncated at *r*_{cut} = 4*σ* and shifted upward by *ε*_{cut} = 0.98 · 10^{−3}*ɛ* such that *ϕ*^{LJ}(*r*_{cut}) = 0. The splitting of *ϕ*^{LJ}(*r*) into a hard-sphere reference and an attractive tail in the DFT treatment is performed on the basis of the well-known Barker–Henderson theory^{15,16} that leads to an effective and temperature-dependent hard-core diameter 0 < *d* ≤ *σ* that does not depend on the bulk density, as explained in Refs. 15 and 17. At the temperature *k*_{B}*T*/*ɛ* = 2 of our main interest, the effective diameter is given by *d* = 0.9568*σ*. The resulting expression for *ϕ*_{1}(*r*) then reads

We stress that the value inside the core, i.e., *ϕ*_{1}(*r* < *d*), is not uniquely defined,^{18,19} and its value can be used as a fit parameter for better agreement between simulations and DFT. However, we choose here to set it to 0 in line with previous studies on the LJ system.^{20–22}

The external potentials *V*_{ext}(*z*) that we consider in this manuscript all mimic a planar slit geometry. The slit is translationally invariant in the lateral *x*–*y* plane and is mirror-symmetric in the midplane *z* = 0 such that *V*_{ext}(*z*) = *V*_{ext}(−*z*). We employ a family of external wall-particle potentials that is repulsive and parameterized by

where the dimensionless strength *s* = *βV*_{ext}(*L*/2) ≥ 40 characterizes the potential at |*z*| = *L*/2, *w* ∈ [0, 1] denotes the width of the central part of the slit, *βV*_{ext}(*z*) = 0, and *p* > 0 is the power that characterizes the steepness of the potential. Figure 1 illustrates the external potential for general *s*, *w*, and *L* and for steepness parameters *p* = 2 (black) and *p* = 8 (blue).

## IV. METHODS

### A. Simulations

To generate the training and validation datasets to “learn” the density functional, we perform grand-canonical Monte Carlo (MC) simulations of the 3D truncated and shifted Lennard-Jones (LJ) fluid confined between two planar soft-repulsive walls described by the external potential *V*_{ext}(*z*) of Eq. (15). Here, we only consider highly repulsive walls with *s* ≥ 40 to ensure that the density reduces to essentially 0 at |*z*| = *L*/2. We measure the equilibrium density profile *ρ*^{MC}(*z*) in a cubic simulation box of volume *V* = *L*^{3}, with *L* = 10*σ*. We impose periodic boundary conditions in the *x*- and *y*-directions and equilibrate the system for at least 10^{5} MC cycles before the measurements start. Each MC cycle consists of *N* trial moves, with *N* denoting the instantaneous number of particles. Each trial move can either displace a particle or insert/remove a particle. The probability of selecting a trial move to displace a particle instead of an insertion/deletion move is set to 90%. The sampling of the density profile *ρ*^{MC}(*z*) is performed by dividing the volume in 320 equidistant bins that represent planar slices normal to the *z* axis, each of width *σ*/32 such that the interval *z* ∈ [−5*σ*, 5*σ*] is fully covered. The density in each bin is measured and stored after every fourth MC cycle.

In order to avoid (interesting but at this stage undesired) complications due to possible phase transitions (condensation, pre-wetting, capillary evaporation, etc.), we consider only a supercritical temperature *k*_{B}*T*/*ɛ* = 2. Eight different chemical potentials *μ* are imposed in the grand-canonical MC simulations of the LJ system, given by *βμ* ∈ {−3.0, −2.5, …, 0.0, 0.5}. Here, the arbitrary offset of *μ* is chosen such that the thermal wavelength equals the particle diameter, Λ = *σ*; it implies that *βμ* → log *ρ*_{b}*σ*^{3} in the dilute (ideal-gas) limit *ρ*_{b}*σ*^{3} ≪ 1. A total of 24 different external potentials are considered as training sets, all with total slit length *L* = 10*σ* and strengths *s* ∈ {40, 60}, widths *w* ∈ {0.4, 0.65, 0.9}, and steepness parameters *p* ∈ {2, 4, 8, 10}.

As an illustration, we show in Fig. 2 the simulated density profiles *ρ*^{MC}(*z*) of a LJ fluid at *k*_{B}*T*/*ɛ* = 2 and chemical potentials *βμ* = {−3.0, −2.5, −2, −1.5, −1.0, −0.5, 0, 0.5} (symbols) corresponding to (separately simulated) bulk densities *ρ*_{b}*σ*^{3} ≈ {0.056, 0.10, 0.19, 0.33, 0.47, 0.56, 0.62, 0.67} in the external potential *V*_{ext}(*z*) characterized by *s* = 60, *w* = 0.4, and *p* = 4, as denoted by the red solid line. We observe monotonous density profiles at the lowest *μ*’s, the development of density oscillations at higher *μ*′s, and a fairly well-defined “bulk” density in the vicinity of *z* = 0 (except at the highest *μ*′s, where the profiles of the two walls show some overlap due to the limited system size).

### B. Machine-learning methods

With an optimization process that uses several techniques from the field of ML, we will construct intrinsic free-energy functionals of the form of Eqs. (9) and (10) such that the density profiles *ρ*^{ML}(*z*) that follow from this machine-learned functional are an “optimal” approximation to the corresponding MC densities *ρ*^{MC}(*z*). We recall that *ρ*^{ML}(*z*) is to be determined as a solution of the Euler–Lagrange equation (4), not only for Λ = *σ* at a given temperature, chemical potential, and external potential but also for a given excess functional $Fexc[\rho ]$. In other words, we are interested in optimal kernels *ω*_{2}(*z*) and *ω*_{3}(*z*) [where *ω*_{3}(*z*) ≡ 0 for ML2].

In order to quantify “optimal,” we define the so-called loss function $L$ that characterizes the difference between ML and MC profiles and that we will minimize with respect to *ω*_{2}(*z*) and *ω*_{3}(*z*). Here, we define $L=L1+L2$ to consist of a dominant contribution $L1$ and a regularization term $L2$. The dominant loss term is defined by the mean-square error^{23} between the MC and ML profiles,

where *j* = 1, …, *n* labels the *n* = 24 × 8 = 192 combinations of 24 external potentials and the eight chemical potentials of the training set as identified above. We normalize the difference between the MC and ML profiles by the MC bulk density at the chemical potential *μ*_{j} of training set *j*, for which we performed separate bulk simulations. This scaling promotes equal weights to high- and low-density states during the learning process. The regularization term $L2$ is independent of the MC and ML profiles and defined by

where *f*(*z*) is given by

It accounts for the constraint that *ω*_{2}(*z*) and *ω*_{3}(*z*) must decay smoothly to 0 for *z* ≫ *σ*, where our statistics is poor. Moreover, $L2$ also suppresses undue high-wavenumber undulations that tend to develop at |*z*| < *σ*. We tune the (positive) regularization parameter *λ* by trial and error such that it contributes less to the minimization procedure than $L1$, while not being too small to be irrelevant. Note that $L2$ effectively reduces the range of *ω*_{i}(*z*) by suppressing it exponentially for |*z*| > *σ*.

The minimization of the total loss function $L$ is performed with the stochastic and iterative optimization method Adam as proposed by Kingma and Ba in Ref. 24. We use their suggested default step size *α* = 0.001 and exponential decay rates *β*_{1} = 0.9 and *β*_{2} = 0.999 and refer the reader to their work for a full description of the method and its parameters. During each iteration of the minimization process, the gradient of the loss function with respect to the kernels *ω*_{i}(*z*) is required, which are straightforwardly derived for $L2$ to be the functional derivatives,

The functional derivatives of $L1$ with respect to *ω*_{i}(*z*) for *i* = 2, 3 are more involved and stem from the dependence of $L1$ on the ML density profiles $\rho jML(z)$ for *j* = 1, …, *n* such that the functional chain-rule yields

where we replaced the dummy integration variable *z* of Eq. (16) by *z*′. From the Euler–Lagrange equation (4) for the DFT equilibrium profiles *ρ*_{0}—which are represented by $\rho jML$ in Eq. (20)—one checks that $\delta \rho 0(z\u2032)/\delta \omega i(z)=\u2212\rho 0(z\u2032)\xd7\delta 2\beta Fexc/\delta \rho (z\u2032)\delta \omega i(z)$. Upon considering *ω*_{i} and *ρ* independent variables in Eqs. (9) and (10) for the ML2 and ML3 excess functionals, respectively, the second (cross) derivative for *i* = 2 equals *ρ*_{0}(*z* + *z*′), and for ML3 and *i* = 3, it equals $13[\rho 02(z+z\u2032)+2\rho 0(z\u2032)\rho 0(z+z\u2032)]$. Hence, within this approximation, a numerical integration of *z*′ suffices to evaluate Eq. (20), and in combination with Eq. (19), we can numerically calculate $\delta L/\delta \omega i(z)$ for the grid points *z* of our system. Thus, we have all ingredients to minimize $L$ by means of Adam.^{24}

### C. The training process

The training process starts with an initial guess for the two kernels, for which we take the MF approximation $\omega 20(z)=\omega 30(z)\u22610$, where the superscript 0 denotes the 0-th iteration in the training process. Next, we use these kernels to calculate the *n* density profiles $\rho j,kML(z)$ for learning sample *j* = 1, …, *n* and iteration label *k* = 0 by solving the Euler–Lagrange equation [Eq. (4)] using a Picard iteration scheme with the MC profile $\rho jMC(z)$ as the initial guess.

On the basis of Eqs. (16)--(20), we can then evaluate $L$ and $\delta L/\delta \omega i(z)$ for *i* = 1, 2, from which improved kernels $\omega ik(z)$ are constructed for *k* = 1 by employing Adam,^{24} which will give rise to improved density profiles $\rho j,1ML(z)$, etc. For *k* ≥ 2, we take $\rho j,k\u22121ML(z)$ as the initial guess in the Picard-iteration of $\rho j,kML$. The iteration process is repeated until the loss function has converged.

Although Adam is already an efficient algorithm for the learning process, its computational cost can be significantly reduced by making use of stochastic optimization. Rather than using all *n* elements of the training set in every iteration, which involves the addition of all *n* terms in Eq. (20) at every iteration level *k*, we consider mini batches with only 20 randomly selected elements of the training set during each Picard iteration *k*. The gradient of the loss function $L1$ is computed by only taking into account this mini batch; thus, the summation over the *n* density profiles of Eq. (20) changes to a summation over 20 randomly selected density profiles, and the normalization factor 1/*n* becomes 1/20. A new mini batch is randomly selected during every iteration in the ML process.

## V. RESULTS FOR THE LENNARD-JONES SYSTEM

We perform MC simulations of the LJ system to generate MC density profiles with 24 different external potentials, described in Sec. III, and eight equi-distant different chemical potentials, *βμ* ∈ {−3.0, −2.5, …, 0.0, 0.5}, for the temperature *k*_{B}*T*/*ɛ* = 2. We describe the kernels, the resulting density profiles, the mechanical equations of state of the bulk fluid, and the radial distribution functions that follow from the functionals ML2 and ML3 using two different routes.

### A. The kernels

For several choices of the regularization parameter *λ* in Eq. (17), we determined the kernel *ω*_{2}(*z*) for ML2 and *ω*_{2}(*z*) and *ω*_{3}(*z*) for ML3. Without a significant $L2$ contribution, *λ* ≤ 10^{−3}, we found spurious peaks in both *ω*_{2}(*z*) and *ω*_{3}(*z*) for |*z*| > 8*σ*, i.e., at the largest separations (with the poorest statistics) we considered in the learning set; these spurious peaks disappeared, and *ω*_{2}(*z*) smoothly decayed to 0 for *λ* ≥ 10^{−2}, and throughout, we settle for *λ* = 10^{−2} as a reasonable compromise between error-correction and minimization of the actual loss function of interest $L1$.

In Fig. 3, we present the evolution of the loss functions $L1$ (blue) and $L2$ (red) during the training process with iteration label *k* in (a) for ML2 and in (b) for ML3; the gray curves in (a) and (b) represent a moving average of $L1$ over 15 iterations. We observe good convergence after, say, *k* = 5000 iterations, respectively. We note that the minimized loss function $L1$ of ML2 is as small as 5 × 10^{−4}, and for ML3, it is even about four times smaller. We also note that $L2<L1$ for ML2, as desired for a regularization term that is (naively) supposed to be a small correction to the total loss function. However, for ML3, we find $L1$ to be so small that it has dropped below the regularization term $L2$, which in retrospect should be seen as a consequence of the good accuracy of the ML3 functional rather than as a problem for the relative magnitude of the two contributions to the loss function.

In Fig. 4(a), we show the MF (scaled) kernel *βϕ*_{1,z}(*z*) (blue dashed line) and its ML2 correction *βϕ*_{1,z}(*z*) + *ω*_{2}(*z*) (black solid line), as obtained after 5000 iterations. For all *z*, the ML2 kernel is more negative than the MF kernel, as if there is actually more cohesive energy in the system than predicted by MF. We see that *ω*_{2}(*z*) develops a peculiar and unexpected small “bump” close to *z* = 0. For ML3, a similar feature occurs close to *z* = 0 in both *ω*_{2}(*z*) and *ω*_{3}(*z*), as can be seen in Fig. 4(b), where we plot *ω*_{2}(*z*) (green solid line) and *ω*_{3}(*z*) (green dotted line) for the ML3 case as obtained after 5000 iterations, together with the ML2 kernel *ω*_{2}(*z*) (black solid line) for comparison. We see that *ω*_{2} from ML3 is again essentially negative (except for a tiny positive feature at *z* = 0 and |*z*| ≃ 2*σ*) and contains a “bump” similar to the ML2 case. We also see that *ω*_{3}(*z*) has a structure that is quite similar to *ω*_{2}(*z*), however more pronounced with higher peaks and lower valleys. Below, we will investigate the thermodynamic and structural properties that follow from DFT based on these kernels.

### B. The density profiles

The first test of the quality of the ML functionals is a comparison of their resulting density profiles with the simulated ones from the training set. In Fig. 5, this comparison is illustrated for the external potential parameterized by *w* = 0.65, *p* = 2.0, and *s* = 40 (shown in red) and the four chemical potentials *βμ* ∈ {−2.5, −1.5, −0.5, 0.5}; for symmetry reasons, we only plot the regime 0 < *z* < *L*/2, and for comparison, we also show the MF profiles. Clearly, the MF predictions are substantially worse than ML2 and ML3, except at the lowest *μ*, and ML3 constitutes a small improvement over ML2, especially at the peaks of the profiles at intermediate to high *μ*. In fact, we can also conclude from Fig. 5 that the main improvement of ML2 and ML3 over MF compared to the simulations concerns the bulk density *ρ*_{b} that is approached in the center of the slit at *z* = 0, as will be made more explicit below.

In Fig. 6, we consider a comparison of MC simulations with MF, ML2, and ML3 density profiles in a particular external potential outside the training set, at *βμ* = −1. The external potential consists of hard walls at *z* = 0 and *z* = 20*σ*, and for *z* ∈ [0, 20*σ*], the potential varies irregularly with wells and barriers between ±*k*_{B}*T*, as shown by the red solid curve in Fig. 6. We see again that both ML2 and ML3 are largely of comparable quality and substantially more accurate than MF.

### C. Mechanical equation of state of the bulk

The (isothermal) mechanical bulk equation of state provides relations between the bulk density *ρ*_{b}, the pressure *p*, and the chemical potential *μ*, satisfying the constraint of the Gibbs–Duhem equation *dp* = *ρ*_{b} *dμ* such that we can equivalently consider *ρ*_{b}(*μ*), *p*(*μ*), or *p*(*ρ*_{b}). Within DFT, the bulk density *ρ*_{b}(*μ*) that follows from a particular free-energy excess functional follows from the solution of the Euler–Lagrange equation (4) for the homogeneous bulk case *V*_{ext} ≡ 0, which reduces for ML2 and ML3 to a nonlinear algebraic equation with coefficients that depend on ∫*dz ω*_{i}(*z*). Hence, *ρ*_{b}(*μ*) is straightforwardly solved numerically for the three functionals ML2, ML3, and MF of our interest here. The bulk pressure follows as *p*(*μ*) = −Ω[*ρ*_{b}]/*V*, from which *p*(*ρ*_{b}) follows upon inversion of *ρ*_{b}(*μ*). For the temperature of interest, *k*_{B}*T*/*ɛ* = 2, these three representations of the equation of state are shown in Figs. 7(a)–7(c) for the three functionals MF (blue dashed line), ML2 (black solid line), and ML3 (green solid line) together with the MC data (purple symbols). The regime of the training set is hatched gray. In the *μ*-dependent curves of (a) *ρ*_{b}(*μ*) and (b) *p*(*μ*), we find agreement in the low-density limit *βμ* < −3, as expected, since all functionals include the ideal-gas limit properly. In the regime of the training set, we also see ML2 and ML3 outperforming MF by a large margin in (a) and (b), with a small but hardly noticeable improvement of ML3 compared to ML2, as we could have expected on the basis of the density profiles of Fig. 6 and the loss functions of Fig. 3. At the high-*μ* side outside the training set, Fig. 7(a) shows an increasingly deteriorating quality of the ML2 and, especially, the ML3 prediction, which are systematically higher than the MC data, although they are still much more accurate than the predictions based on the MF functional. Interestingly, however, the picture that emerges from the *p*(*ρ*_{b}) representation shown in Fig. 7(c) is much more forgiving for the MF functional, which is now of comparable good agreement in the complete regime of the training set and deviates as much as ML2 (and even less than ML3) from the MC data. Clearly, this relatively good MF and ML2 performance is due to a fortunate cancellation of errors occurring in the process of eliminating the dependence on the chemical potential.

It is, perhaps, remarkable that rather accurate *bulk* equations of state in a complete density interval can be obtained from MC simulations at only a few chemical potentials in only a few external potentials. Here, it is crucial to appreciate the DFT formalism, which includes the statement that the intrinsic excess free-energy functional $Fexc[\rho ]$ that we construct by the ML ansatz of Eqs. (9) and (10) is independent of the external and chemical potentials and, hence, can also be applied at any *μ* in the homogeneous bulk where *V*_{ext} ≡ 0.

### D. The structure of the bulk fluid

A key feature of DFT is that it provides not only thermodynamic but also structural information, where we have seen that the first functional derivative $\delta Fexc[\rho ]/\delta \rho (r)$ plays a key role in the Euler–Lagrange equation (4) for the equilibrium one-body distribution function. We have also seen already that the second functional derivative $\u2212\beta \delta 2Fexc[\rho ]/\delta \rho (r)\delta \rho r\u2032\u2261cr,r\u2032$ equals the direct correlation function and governs the two-body distribution function.^{1,2} In a homogeneous and isotropic bulk fluid, symmetry dictates that the direct correlation takes the bulk form *c*_{b}(|**r** − **r**′|), and the radial distribution function *g*(*r*) follows from the Ornstein–Zernike equation $g(r)\u22121=cb(r)+\rho b\u222bdr\u2032(g(r\u2032)\u22121)cb(|r\u2212r\u2032)|$. Since the ML2 and ML3 functionals have been fully determined in terms of *ω*_{i}(*z*) in planar geometry and its conversion to Ω_{i}(*r*) according to Eq. (12), we can write from Eqs. (9) and (10) that

where for *c*_{HS}(*r*), we used the White-Bear mark I direct correlation function reported in Ref. 25 (and where Ω_{3} ≡ 0 for ML2). Consequently, our ML2 and ML3 functionals (and likewise also the MF functional) give direct access to the two-body structure encoded in *c*_{b}(*r*) and *g*(*r*).

In Fig. 8, we plot the resulting *c*_{b}(*r*) for bulk density *ρ*_{b}*σ*^{3} = 0.39 and find fairly good agreement between MF, ML2, and ML3, except close to *r* = 0 where *c*_{b}(*r*) from ML2 and, especially, ML3 become deeply negative. This can be traced back directly to Fig. 4, which reveals that (i) there is close structural similarity between the functionals outside the hard core and (ii) that the “bumps” of *ω*_{2}(*z*) and *ω*_{3}(*z*) close to *z* = 0 give rise to a substantial slope *dω*_{3}(*z*)/*dz* for *z*/*σ* ∈ [0, 0.05] that translates via Eq. (12) in a relatively large effect in *c*_{b}(*r*) in the vicinity of *r* = 0. Note also that *c*_{b}(*r*) vanishes for *d* < *r* ≤ *σ* due to the Barker–Henderson splitting and that all three versions of *c*_{b}(*r*) agree pretty accurately outside the hard core, at least on the scale of the plot.

Upon insertion of *c*_{b}(*r*) into the (Fourier transform of the) Ornstein–Zernike equation, we find (after an inverse Fourier transform) the radial distribution functions *g*(*r*) that we compare with canonical MC simulations at a given density *ρ*_{b} (at the fixed temperature of interest *k*_{B}*T*/*ɛ* = 2). The three lines in Fig. 9 show these radial distribution functions for MF, ML2, and ML3 at bulk densities (a) *ρ*_{b}*σ*^{3} = 0.39 and (b) *ρ*_{b}*σ*^{3} = 0.837, together with the MC simulation results (symbols). For both the lower density in (a) and the higher one in (b), we find reasonably good overall agreement outside the hard core (*r* > *d*), with MF and ML2 actually outperforming ML3 close to contact. Inside the hard core, our prediction for *g*(*r*) is poor in all cases, which is not surprising, given that the underlying *c*_{HS}(*r*) is constructed such as to cause vanishing *g*(*r*) inside the hard core; any tampering of the direct correlation [such as adding terms as we do in Eq. (21)] will give rise to spurious nonzero contributions to *g*(*r*) for *r* < *d*.^{26}

Interestingly, DFT provides another procedure to calculate the radial distribution function of a bulk fluid. This so-called “Percus test-particle method”^{27} is based on the identification of *ρ*_{b}*g*(*r*) with the equilibrium density profile *ρ*_{0}(*r*) that surrounds a given (test) particle that is fixed in the origin of an otherwise homogeneous fluid at bulk density *ρ*_{b} ≡ *ρ*_{0}(*∞*). In other words, *g*(*r*) = *ρ*_{0}(*r*)/*ρ*_{0}(*∞*), with *ρ*_{0}(*r*) being the spherically symmetric density profile of the fluid in an external potential that equals the pair potential, *V*_{ext}(**r**) = *ϕ*(*r*), scaled such that *g*(*∞*) = 1. For a given chemical potential *μ* and a given functional $Fexc[\rho ]$, one thus obtains *g*(*r*) through the solution *ρ*_{0}(*r*) of the Euler–Lagrange equation (4). For the same two state points as used in Fig. 9, we present the resulting radial distributions in Figs. 10(a) and 10(b). For both densities, the agreement between simulation and all three DFTs is substantially better than obtained from the Ornstein–Zernike route shown in Fig. 9, not only for *r* < *σ* where the Boltzmann factor of the external potential in Eq. (4) ensures a vanishingly small contribution to *g*(*r*) but also at larger distances where the oscillations in the MC data are rather accurately captured by all three DFTs. Interestingly, however, the peaks of the oscillations in (b) are actually better accounted for by MF and ML2 than by ML3, which underestimates them especially at close contact. The relatively good performance of MF in predicting *g*(*r*) via the Percus test-particle method compared to its relatively poor prediction of the equation of state *ρ*_{b}(*μ*) is due to the scaling-out of *ρ*_{b} in the density profile *ρ*_{0}(*r*) such that *g*(*∞*) = 1 by construction. Clearly, an overall comparison of Figs. 9 and 10 shows that *g*(*r*) based on the test-particle method is much more accurate compared to the MC simulations than those based on the Ornstein–Zernike equation. This is not surprising, given that we constructed the direct correlation function in Eq. (21) based on a modification of that of a reference hard-sphere system, which yields non-vanishing radial distributions inside the hard core if the OZ route is used. A more careful discussion on the different radial distribution functions from the two routes can be found in Ref. 26.

## VI. SUMMARY, DISCUSSION, AND OUTLOOK

In this article, we combine the formalism of classical density functional theory (DFT) with machine-learning (ML) density profiles from Monte Carlo (MC) simulations to construct approximations to the excess intrinsic Helmholtz free-energy functional $Fexc[\rho ]$ of a (truncated and shifted) Lennard-Jones fluid at the supercritical temperature *k*_{B}*T*/*ɛ* = 2. This functional consists of a well-known and accurate hard-sphere contribution, a standard van der Waals-type mean-field account of the attractions, and new machine-learned corrections that are, for simplicity, either of a quadratic (ML2) or an additional cubic (ML3) form in the density. The kernels of ML2 and ML3 are radially symmetric and translation-invariant two-point functions of the form Ω_{i}(|**r** − **r**′|) for *i* = 2 and 3; see Eqs. (9) and (10). By comparing DFT predictions of the equilibrium density profiles *ρ*_{0}(*z*) with grand-canonical MC simulations at a learning set of chemical potentials *μ* and external potentials *V*_{ext}(*z*) in a 3D planar geometry, we can construct the optimal planar kernels *ω*_{i}(|*z* − *z*′|) using Adam to minimize a suitable loss function, from which we can reconstruct the full radially symmetric kernels Ω_{i}(|**r** − **r**′|). Given that $Fexc[\rho ]$ is independent of the external potential and the chemical potential, the functional and its Euler–Lagrange equation (4) for *ρ*_{0}(**r**) can be applied to any *μ* and any *V*_{ext}(**r**). By comparisons with density profiles obtained from grand-canonical MC simulations, for conditions within and outside of the learning set, we find that the ML2 and ML3 functionals generally outperform MF by far because the latter predicts densities that are systematically too low; ML3 improves ML2 somewhat on some of the details at higher *μ*, at least within the training set. A similar picture emerges from the resulting representations of the mechanical equations of state, viz., the bulk density *ρ*_{b}(*μ*) and the pressure *p*(*μ*), where MF is too low by a large margin and ML3 performs only slightly better than ML2 within the training set, while showing a slightly poorer performance outside. The functional $Fexc[\rho ]$ can also be used to calculate the direct pair correlation function, from which the radial distribution *g*(*r*) of a bulk fluid follows via the Ornstein–Zernike equation. At the relatively low bulk density *ρ*_{b}*σ*^{3} = 0.39, this yields, outside the hard core, an (almost) equally satisfying result for MF, ML2, and ML3; see Fig. 9. Inside the hard core and also close to contact, say, *σ* < *r* < 1.3*σ*, the prediction for *g*(*r*) is poor in all cases. However, all three functionals give a rather good account of the simulated *g*(*r*) at these two state points if the Percus test-particle method is employed, although here ML3 overestimates the peaks at the higher density slightly. The reason for the relatively good MF performance for *g*(*r*) compared to the equation of state *ρ*_{b}(*μ*) and density profiles stems from the imposed asymptotic normalization *g*(*r* → *∞*) = 1.

A disadvantage of the ML approach is its black-box character and the associated difficulty in interpreting the outcome. In particular, the “hump” in the ML2 and ML3 kernels *ω*_{i}(*z*) close to *z* = 0 shown in Fig. 4 and the associated deeply negative direct correlation *c*_{b}(*r*) close to *r* = 0 for ML3 in Fig. 8 are actually rather suspicious. In retrospect, we expect these features to be the result of some degree of overfitting the data in the learning process. This is also borne out by closer inspection of the bulk equations of state *ρ*_{b}(*μ*) and *p*(*μ*) in Figs. 7(a) and 7(b), respectively, where ML3 hardly improves upon ML2 in the (hatched) regime of the learning set while performing even poorer outside and, likewise, for *g*(*r*) of Figs. 9(a) and 10(a) at the density *ρ*_{b}*σ*^{3} = 0.39 that lies comfortably in the middle of the training set. Of course, ML3 does outperform ML2 somewhat for the density profiles of Fig. 5. Nevertheless, some more caution could or should have been exercised in the diversity of the training set of external potentials, perhaps with attractive components and discontinuities. We leave studies along these lines for future work.

Although there is room for improvement and extensions, we have shown here anyway that it is, in principle, possible to construct a free-energy functional for an atomic fluid by an ML process that takes data from grand-canonical MC simulations at a variety of chemical and external potentials, from which further predictions outside the training set can be made. Interestingly, even data taken in a planar geometry can suffice to construct the full functional, at least for the (relatively simple) functional forms that we considered here, which are linear in the kernels Ω_{i}(|**r** − **r**′|); nonlinear forms probably require a different treatment. It is important to realize that we fixed the temperature, and although $Fexc[\rho ]$ is independent of *μ* and *V*_{ext}(**r**), it is dependent on *T*, so strictly speaking, a new functional is to be constructed at every temperature of interest. We leave the *T*-dependence of the functional to future work. Another rather straightforward extension is to use the newly constructed functional to calculate the Gibbs adsorption and the tensions of wall–fluid interfaces, for which we expect good accuracy on the basis of the good agreement of the density profiles. We also expect that it is possible to extend studies of this type to other systems with spherically symmetric particles, also to mixtures such as electrolytes. Systems of particles with orientation degrees of freedom are probably challenging, in practice, because of their larger number of variables, although one could imagine first attempts based on truncated expansions in spherical harmonics or an initial focus on homogeneous bulk states (nematics). We hope that this paper will stimulate further explorations of the combination of DFT, ML, and MC simulation.

## ACKNOWLEDGMENTS

This work was part of the D-ITP consortium, a program of The Netherlands Organisation for Scientific Research (NWO) that is funded by the Dutch Ministry of Education, Culture and Science (OCW). It also forms part of the NWO program “Data-driven science for smart and sustainable energy research,” with Project Nos. 16DDS003 and 16DDS014.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.