In modeling the interior of cells by simulating a reaction–diffusion master equation over a grid of compartments, one employs the assumption that the copy numbers of various chemical species are small, discrete quantities. We show that, in this case, textbook expressions for the change in Gibbs free energy accompanying a chemical reaction or diffusion between adjacent compartments are inaccurate. We derive exact expressions for these free energy changes for the case of discrete copy numbers and show how these expressions reduce to traditional expressions under a series of successive approximations leveraging the relative sizes of the stoichiometric coefficients and the copy numbers of the solutes and solvent. Numerical results are presented to corroborate the claim that if the copy numbers are treated as discrete quantities, then only these more accurate expressions lead to correct behavior. Thus, the newly derived expressions are critical for correctly computing entropy production in mesoscopic simulations based on the reaction–diffusion master equation formalism.

## I. INTRODUCTION

In recent years, the coarse-grained computational modeling of intracellular environments has enjoyed significant advances. An important paradigm shared by many such models is to treat the evolution of reacting chemical species’ copy numbers and spatial distributions by simulating a reaction–diffusion master equation (RDME).^{1} In this approach, the system volume is divided into compartments, each with local values of the copy numbers and chemical potentials of the different chemical species (Fig. 1). The RDME is a differential equation describing the evolution of the probability $PN,t$ of observing the vector of copy numbers $N={Ni,A}i\u2208L,A\u2208\Omega $ of chemical species *i* in compartment *A* at time *t*, where *L* is the set of solute species and Ω is the set of compartments in the system. The RDME can be written schematically as

where $M^$ and $D^$ represent operators describing chemical reactions and inter-compartment diffusion, respectively.^{2} An in-depth description of the RDME approach can be found in Refs. 3 and 4, where the forms of the operators are discussed. Rather than directly solving Eq. (1), one often simulates trajectories of the vector **N** obeying the stochastic dynamics encoded in the RDME using a variant of the Gillespie algorithm.^{5} The sizes of the compartments are commonly determined by the Kuramoto lengths, the mean free diffusional path for a species before it participates in a chemical reaction. Within each compartment, the spatial distributions of the reacting species are assumed to be homogeneous, allowing the use of mass-action kinetics with the compartment’s local values of the species’ concentrations to describe the stochastic chemical reaction propensities. Molecules can additionally jump between adjacent compartments in “diffusion events” (whose propensities also depend on the compartments’ local concentrations of species) to give rise to concentration gradients on the scale of the compartment length. This modeling approach is appropriate when the Kuramoto length is small compared to the system size (i.e., the assumption of homogeneity over the system volume fails), yet large compared to the intermolecular distance scale.^{1} Examples of simulation platforms based on such an approach include Virtual Cell,^{6,7} lattice microbes,^{8,9} MesoRD,^{10} MEDYAN (Mechanochemical Dynamics of Active Networks),^{11,12} and others.^{13–17}

One important aspect of simulating non-equilibrium biological systems is the computation of thermodynamic forces that drive the observed flux on the network of chemical reactions.^{18–20} Determining these forces can allow for the quantification of entropy production in chemically reactive systems.^{21,22} In several research groups, measuring entropy production in biological active matter has been a recent goal.^{23,24} For instance, in recent work, we quantified the entropy production rates of self-organizing non-equilibrium actomyosin networks in MEDYAN using the expressions derived here as a first step.^{12} The ability to measure dissipation in active matter systems will allow us to test the applicability of different physical organizing principles relating the production of entropy to the likelihood of observing certain trajectories.^{25,26} For isothermal, isobaric, chemically reactive solutions, which include most mesoscopic biological systems, measuring the total entropy production amounts to determining the change in Gibbs free energy that accompanies chemical reactions and diffusion down concentration gradients.^{27–29} A ubiquitous textbook expression for the change in Gibbs free energy *G* accompanying a chemical reaction is

where *K*_{eq} is the equilibrium constant, *Q* is the reaction quotient (we give definitions of these quantities below), *k*_{B} is Boltzmann’s constant, and *T* is the temperature.^{30–35} At equilibrium, $Q=Keq\u22121$, and as a result, Δ*G* = 0. However, in this paper, we argue that Eq. (2) is a biased approximation to the exact value of Δ*G* accompanying a chemical reaction that holds when the copy numbers of the reacting molecules are large, such as on the order of Avogadro’s number. When the system is small, such as when copy numbers are on the order of 100 as is often the case in RDME simulations of intracellular environments, thermodynamic expressions such as Eq. (2) require corrections.^{36,37}

As a simple motivating example, consider a mixture of an even total number of two chemical species, red and green, which inter-convert at equal rates. At equilibrium, the copy numbers of these molecules will be equal, and $Q=Keq\u22121=1$. Now, if a reaction were to occur at equilibrium to produce one additional red molecule and to eliminate one green molecule. Then we would expect that the Gibbs free energy of the system had increased, since we have left equilibrium where the free energy attains its minimum. However, the prediction of Eq. (2) is that Δ*G* = 0 for this reaction. The assumption whose violation leads to Eq. (2) being incorrect is that the copy number of chemical species is a continuous quantity. When these variables are considered as discrete, a different expression for Δ*G* must be used to give correct behavior.

Similarly, for diffusion between adjacent compartments, a common expression for the change in Gibbs free energy accompanying the jump of a molecule *i* from compartment *A* with copy number *N*_{i,A} to compartment *B*, where its copy number is *N*_{i,B}, is

Imagine, however, we have a situation where *N*_{i,A} = *N*_{i,B} and a molecule jumps from compartment *A* to *B*. The Gibbs free energy should have increased since we have departed from the highest entropy distribution of the molecules over the two compartments; however, Eq. (3) will predict that Δ*G* = 0.

In this paper, we derive exact expressions for the change in Gibbs free energy accompanying chemical reactions within compartments and diffusion events between compartments, and we further show how these expressions relate to the familiar textbook formulas [Eqs. (2) and (3)] through a series of approximations. We also discuss the assumptions involved in defining the Gibbs free energy of a grid of homogeneously mixed compartments that can exchange energy and particles, such as that used in a simulation of a RDME. Finally, we present numerical simulations using MEDYAN to demonstrate the need to use these more exact expressions for Δ*G* in order to obtain sensible results when copy numbers are treated as discrete variables. Only these more exact expressions will give correct, unbiased behavior when measuring entropy production in mesoscopic *in silico* studies of biological non-equilibrium systems that rely on the RDME formalism.

## II. METHODS

### A. **Δ***G* of reactions

Here, we make successive approximations to the formula for Δ*G* accompanying a chemical reaction, and our notation reflects the level of approximation in which certain quantities are being used: when appropriate, we subscript quantities with a parenthesized number, i.e., Δ*G*_{(0)}, where increasing numbers represent more approximate versions. The symbol $\u2009\u2009\u2009\u2009\u0303$ will indicate that the quantities of chemical species are being represented by mole fractions *χ*_{i}, rather than concentrations *C*_{i}. In this section, we treat the case that our system comprises a single closed compartment of a homogeneous dilute solution in which a chemical reaction has occurred, and we derive an expression for Δ*G*. In this system, the number of solvent molecules is fixed and the solute molecules participate in chemical reactions, causing their copy numbers to change. In Secs. II B and II C, we consider a system with multiple weakly interacting compartments (which may, in general, comprise an irregular grid rather than the regular Cartesian grid visualized in Fig. 1), each of which contains a homogeneous solution with local copy numbers of solvent and solutes and between which both solvent and solutes can diffuse. The nearly exact result [Eq. (21)] obtained in this section will also apply to those systems, as we argue below.

Before restricting to the case of a single closed compartment, we establish notation for properties of the chemical species in a compartment grid. The chemical potential for species *i* in compartment *A* can be expressed as depending either upon the mole fraction, *χ*_{i,A}, or upon the concentration, *C*_{i,A}, of that species in the compartment,

where *k*_{B} is Boltzmann’s constant, $\mu i0\u0303(T,p)$ is the standard state chemical potential at temperature *T* and pressure *p* when working with the dimensionless *χ*_{i,A}, and $\mu i0(T,p)$ is the same when working with *C*_{i,A}. *C*_{i,A} and *χ*_{i,A} both play the role of the “composition variable” leading to these different, yet equivalent expressions for the chemical potential.^{38} We make the distinction between dependence upon copy number and dependence upon concentration in order to establish parameters that can be used in simulation, which commonly works with copy numbers, based on those given in the literature, which typically use units of concentration. Here, we make the assumption of an ideal-dilute solution and neglect the coefficient of activity of the solute species.^{39} The mole fraction can be written as

where *N*_{A} is the total copy number of molecules in compartment *A* and *M* is the set of all species including the solvent in the system. Similarly, the concentration can be written as

where

is a conversion factor, *N*_{Av} is Avogadro’s number, and *V*_{A} is the compartment volume (we assume constant pressure and that the fluctuations in volume are negligible for these liquid systems, allowing the use of Gibbs free energy). Using Eqs. (4)–(6), we can write^{40}

Using standard arguments concerning thermodynamic extensivity, it is possible to establish that the Gibbs free energy of the solution in compartment *A* can be written as a weighted sum over the chemical potentials of the species,

where $NA={Ni,A}i\u2208M$ is the vector of species copy numbers *N*_{i,A}, and where we have explicitly indicated the dependency of *μ*_{i,A} upon mole fraction *χ*_{i,A} via Eq. (5).^{28,41} One may be concerned that Eq. (9) fails to apply when the copy numbers of solutes are small. As explained below, we make the assumption that boundary effects are still negligible, which allows us to treat *G* as a first order homogeneous function of the number of copies of the system.^{37} This is the necessary property to establish Eq. (9), so the small copy numbers of solutes do not render this approach invalid. We rely on Eqs. (4) and (9) to derive changes in Gibbs free energy accompanying chemical reactions and inter-compartment diffusion.

Consider a reaction of the general form

where *X*_{i} represent reactants, *Y*_{j} represent products, *ν*_{i} and *υ*_{j} are stoichiometric coefficients, and the rate of reaction is *k*_{+} to the right and *k*_{−} to the left. We have dropped the subscript *A* indicating the compartment in which the reaction takes place and now restrict to the case that our system is a single compartment. When this reaction has occurred once to the right, the copy numbers of reactants have changed *N*_{i} → *N*_{i} − *ν*_{i} and those of the products have changed *N*_{j} → *N*_{j} + *υ*_{j}. We calculate the change in Gibbs free energy accompanying this reaction by considering it as resulting from these finite, discrete changes in copy numbers,^{37} not from infinitesimal changes. Using Eqs. (4), (5), and (9), the Gibbs free energy before the reaction has occurred can be written as

where *R* is the set of reactants, *P* is the set of products, the subscript *s* refers to the solvent, and where we have dropped the dependence of the standard state chemical potential on *T* and *p*. $\mu i0\u0303$ and $\mu j0\u0303$ describe the chemical potential at a reference concentration of the solute in the solvent (also referred to as the solute standard state), whereas $\mu s0*\u0303$ describes the chemical potential at a reference state of pure solvent (also referred to as the solvent standard state).^{39} We assume here, for simplicity and without loss of generality, that there are no solute species that have not participated in the reaction (i.e., spectator solute species). These species would also contribute terms to Eq. (11), but when we subtract the initial from the final Gibbs free energy, their inclusion would not yield a different result. The final Gibbs free energy is

where

which is the amount by which the total species copy number *N* has changed. As described in Refs. 38 and 42, when *σ* ≠ 0, it is important to account for the solvent species in the calculation of free energy differences. This is because the free energy of the solvent, which is the most abundant species in the reaction volume, will not be the same after the reaction has taken place since its mole fraction will change as *N* changes to *N* + *σ*. Neglecting the solvent species when *σ* ≠ 0 leads to expressions for Δ*G* that are off by an amount *σk*_{B}*T*.^{38} Whereas the authors of Refs. 38 and 42 describe the appearance of this erroneous term while formulating the Gibbs free energy as a function of a continuous degree of advancement of reaction *dξ* = −*dN*_{i}/*ν*_{i} = *dN*_{j}/*υ*_{j}, here we treat the extent of reaction as a discrete quantity. In the limit that *ν*_{i}/*N*_{i} → 0, *υ*_{j}/*N*_{j} → 0 for each reactant and product, the discrete case passes into the continuum case; however, under the assumption of small copy numbers, we do not satisfy this limit. In Appendix A, we discuss further differences between the continuum treatment and the discrete treatment, as well as the relation to the Gibbs–Duhem equation.

After some algebra (using the fact that ∑_{i∈R}*N*_{i} + ∑_{j∈P}*N*_{j} + *N*_{s} = *N*), we can write the change in Gibbs free energy as

where

Equation (14) is exact, but we would like to avoid specifying *N* in simulation since the solvent is typically not modeled explicitly, as we elaborate on in Sec. II B. We would also like to determine $\Delta G0\u0303$ from literature values of Δ*G*^{0}. To these ends, we first rewrite Eq. (14) as

where

where

Inserting this into Eq. (16), we have

We now make the approximation that *N* ≫ *σ*, which is certainly reasonable for most realistic parameterizations of the compartment grid (in the example of a 0.125 *µ*m^{3} compartment filled with water, *N* ∼ 10^{9} while *σ* ∼ 1). With this, we can write the third term in Eq. (20) as −*σk*_{B}*T*,^{43} giving

where $Q\u0303(1)=Q\u0303(0)$ (we updated the subscript to indicate the use of this quantity in a more approximate version of the formula for Δ*G*). We recommend using Eq. (21) in simulation because it allows us to incorporate literature values for Δ*G*^{0} and avoids specification of *N*. To understand the term −*σk*_{B}*T* in Eq. (21) and to understand the relationship between Eqs. (21) and (16) and the textbook expression for Δ*G*, we proceed by making further approximations leveraging the large sizes of the solute copy numbers compared to their stoichiometric coefficients. First, we rewrite $Q\u0303(1)$ as

Assuming *N*_{i} ≫ *ν*_{i} and *N*_{j} ≫ *υ*_{j}, and using the limit

we obtain

Inserting this into Eq. (21), canceling the term *σk*_{B}*T*, gives

where

Finally, introducing

and discarding terms in Δ*G*_{(2)} that scale like $\nu iNi$ or $vjNj$, we arrive at

where

We see that Eq. (28) is obtained from Eq. (25) upon making the approximations *N*_{i} − *ν*_{i} ≈ *N*_{i} and *N*_{j} + *υ*_{j} ≈ *N*_{j}. This final result in Eq. (28) is a standard textbook expression for the change in Gibbs free energy.^{30–35}

We, thus, have four expressions for Δ*G* of chemical reactions:

Equation (14) for Δ

*G*_{(0)}is exact; however, it requires specifying*N*.Equation (21) for Δ

*G*_{(1)}uses the approximation*N*≫*σ*. We recommend the use of this expression because it is the most exact expression for which we need not specify*N*, and since it is written in terms of Δ*G*^{0}, for which literature values can be found.Equation (25) for Δ

*G*_{(2)}uses the approximations*N*_{i}≫*ν*_{i}and*N*_{j}≫*υ*_{j}.Equation (28) for Δ

*G*_{(3)}uses the approximations*N*_{i}≫*ν*_{i}and*N*_{j}≫*υ*_{j}again.

In Appendix B, we provide expressions for the accuracy of these approximations. Note that, without specifying the copy number of solvents *N*_{s}, we can only approximately compute changes in Gibbs free energy, not the instantaneous Gibbs free energy of the system. Typically, only the changes are of interest.

The definition of Gibbs free energy states that

Comparing this expression to Eq. (28), we identify

and

where Δ*H* is the enthalpy of reaction, and Δ*S*^{0} and Δ*S*_{mix} represent the changes in entropy due to the molecular conformations and the translational motions, respectively,

It is instructive to realize that the discrepancies between the various expressions for Δ*G* are due to how the Δ*S*_{mix} term in Eq. (33) is written.

When employing any of the above expressions that involve the logarithms of products of copy numbers that are raised to the power of other copy numbers, we recommend splitting the logarithm of products into a sum of logarithms as well as using log *x*^{y} = *y* log *x* in order to prevent overflow resulting from computing very large numbers. The results of this section do not assume a system consisting of multiple, weakly interacting compartments, which we discuss next.

### B. Thermodynamics of a reaction–diffusion compartment grid

Now that we have treated the scenario of a reaction event occurring within a single compartment, we want to generalize to the case of a grid of compartments. For this, we develop an argument based on timescales that will allow us to track the copy numbers of the reactive solutes while ignoring those of the inert solvent. This is a necessary modeling feature due to the computational infeasibility of tracking the solvent copy numbers in each compartment for long times. After describing the thermodynamic framework for a grid of compartments, in Sec. II C, we derive the change in *G* resulting from diffusion of solutes between adjacent compartments.

In simulating a RDME, one commonly treats diffusion between adjacent compartments and chemical reactions within compartments using an augmented set of all species and reactions in the system that treats species as distinct if they belong to separate compartments. Thus, if there are |*L*| reacting species and |Ω| compartments, where *L* and Ω are the sets of solute species and compartments, respectively, then in the augmented set, there are |*L*∥Ω| species tracked. The number of reactions in the augmented system, including *r* chemical reactions per compartment and roughly *z*|*L*| diffusion events per compartments (where *z* is the assumed constant number of neighbors of each compartment, ignoring boundary compartments), is |Ω|(*r* + *z*|*L*|). This augmented set of species and reactions is then simulated using the Gillespie algorithm in which the reaction propensities are appropriately scaled according to the compartment volumes.^{4}

Crucial to the justification of this strategy to simulate a RDME is the assumption that, within each compartment, the reacting species can be considered homogeneously distributed so that one may use mass-action kinetics to determine the propensities. This assumption amounts to the condition that the timescale describing diffusion within compartments, *τ*_{D}, is much less than the timescale of chemical reactions, *τ*_{C},

This comparison should be done for each diffusing and reacting species, and the timescale of the fastest reaction (taken as the inverse of the propensity) for each species should be used. If the condition holds, then the process of intra-compartmental diffusion will homogenize the solution faster than chemical reactions that occur, so the assumption of mass-action kinetics holds. Let the dimension of the space be *d*, the length of the (here assumed cubical) compartments be *h*, and the diffusion constant of a species be *D*. Then,

The Kuramoto length is given by

so one can see that the condition *τ*_{D} ≪ *τ*_{C} is equivalent to the condition *l*_{K} ≫ *h*, and thus, one can enforce this condition by choosing a smaller compartment size *h*. For fixed total volume, there is a trade-off between *h* and |Ω|, which determines the size of the augmented system and therefore the computational efficiency. The timescale of intra-compartment diffusion *τ*_{D} is approximately equal to the timescale of inter-compartment diffusion (which can be given as the inverse of $kD=Dh2$), so a third way of describing this condition is that the frequency of jumps between adjacent compartments is much greater than the frequency of chemical reactions inside the compartments, which can be checked empirically in simulation.^{1,4} We note that, in the literature, these expressions may differ up to a constant coefficient depending on the reference.

In order to approximately describe the thermodynamics of this system, we distinguish between the inert solvent and the dilute, chemically reactive solutes. We assume here that the system is impermeable to the flow of either kind of species to the exterior. Within the system, all species are permeable (though local diffusion constants may be incorporated for each species^{1}). We treat the solutes explicitly (at the level of their compartment copy numbers), whereas we model the solvent implicitly through an appropriate limit as in the steps leading to Eq. (21). To this end, we assume the existence of a laboratory timescale *τ*_{l} whose purpose is to define the temporal resolution of our measurements such that any changes occurring on a timescale longer than *τ*_{l} will be measured. We assume that *τ*_{l} is much longer than the timescale describing the local compartment fluctuations in the solvent copy numbers *τ*_{s}, yet shorter than but on the order of the timescale describing diffusion of the solutes *τ*_{D}. On this timescale *τ*_{l}, then in the time between the chemical reactions and diffusion events involving the solutes, the system is quasi-equilibrated with respect to fluctuations involving the fast process of solvent diffusion, and thus, the system may be assigned well-defined values of Gibbs free energy.^{44} This hierarchy of timescales can be written as

Typical ratios of the diffusion constants for solute to solvent are in the range of 1/10 to 1/100, placing *τ*_{D}/*τ*_{s} in the range of 10–100.^{45}

On the timescale *τ*_{l}, there is enough temporal resolution to track the diffusion and chemical reactions of the solutes while allowing averaging over the fluctuations of the solvent. To describe activity occurring over the large grid of compartments for extended systems, we introduce new timescales $\tau xg$, where *x* refers to any of the timescales defined above. If we hold the compartment size *h* and the chemical concentrations fixed and add more compartments to our system, the rates of solute diffusion events and chemical reactions occurring anywhere in the system will scale as |Ω|, the number of compartments, and thus, the timescales needed to describe them scale as |Ω|^{−1}, i.e., $\tau lg\u223c\tau l/|\Omega |$. One might be concerned that $\tau lg$ will be less than *τ*_{s} for large systems, ostensibly violating our requirement that we will be able to average over the solvent fluctuations to define quasi-equilibrated states. However, the timescale of solvent fluctuations across the whole grid, $\tau sg$, will also scale inversely with |Ω|, so the condition for being able to average over solvent fluctuations expressed in Eq. (37) does not ultimately depend on the number of compartments. In other words, for systems with many compartments, as long as Eq. (37) holds for one compartment, we can be sure that our laboratory timescale that describes the whole grid, $\tau lg$, will be short enough to describe processes involving the solutes, while long enough to allow averaging over the fluctuations of the solvent occurring locally in each compartment.

We assume that the exterior of the system acts as a reservoir for the thermodynamic variables *p* and *T*. The volume *V* of the system also remains constant; however, under the assumption of the solvent being an incompressible liquid, the change in quantity *pV* is approximately zero, and for each reaction, the change in Gibbs free energy is equal to that of the Helmholtz free energy. Thus, it is inconsequential whether we consider *p* or *V* to be reservoir variables, and we choose *p* in order to speak of the Gibbs free energy of the system. We can write the Gibbs free energy of the system as *G*(**N**, *p*, *T*), where $N={{Ni,A}i\u2208M}A\u2208\Omega $ represents the set of copy numbers of all solute and solvent species in each compartment in the grid. We further assume the compartments to be only weakly interacting, that is, they can exchange energy and particles, but the interaction of the two subsystems does not contribute a term to the Gibbs free energy of the system. This is equivalent to assuming that the Gibbs free energy of the compartments *G*_{A} is linearly additive,

without any terms of the form *G*_{AB}. To justify this, we first note that the interaction free energy between two adjacent compartments will primarily be due to the interaction of the solvent at the interface. This interfacial free energy will, even for mesoscopically sized compartments, be small compared to the free energy due to the bulk of the compartment. Furthermore, our main interest will be in changes in the Gibbs free energy of the system due to solute diffusion between compartments and chemical reactions, neither of which will significantly affect the interfacial free energy, so all terms of the form *G*_{AB} will drop out of the expression for Δ*G*.

In our approach of averaging over fluctuations in the solvent amounts and taking the limit that this average is large compared to the copy numbers of the solutes, we are choosing to neglect the changes in Gibbs free energy of the system owing to the solvent fluctuations. We justify this with an argument that these fluctuations are small compared to those resulting from the activity of the solute molecules; it also arises out of necessity due to the computational expense of tracking the solvent fluctuations. On the laboratory timescale *τ*_{l}, each compartment has an average copy number of solvent molecules, $Ns,A\xaf$. The fluctuations in this quantity will have a standard deviation on the order of $Ns,A\xaf1/2$.^{44} As indicated previously [to arrive at Eq. (21)], we avoid specifying $Ns,A\xaf$ by taking the limit that it is much larger than the number of solute species, and thus, fluctuations in this quantity do not matter when computing nearly exact changes in the Gibbs free energy accompanying reactions involving the solutes. We need to establish that the change in Gibbs free energy of a compartment due to a fluctuation in the solvent copy number on the order of $Ns,A\xaf1/2$ is small compared to the change in Gibbs free energy accompanying chemical reactions and inter-compartment solute diffusion events and, thus, that the fluctuations in this quantity are not outweighing the changes that we are measuring. In Appendix C, we show that if the concentrations of solutes are very different in the two compartments, then this Gibbs free energy change is on the order of $\epsilon Ns,A\xaf1/2kBT$, where *ε* is the ratio of solutes to solvent. We show that this quantity is typically much less in magnitude than the change in Gibbs free energy accompanying a solute diffusion event. If the concentrations are nearly equal, then the Gibbs free energy is on the order of *εk*_{B}*T* and is thus negligibly small. We make the modeling choice to ignore these changes in the Gibbs free energy resulting from solvent fluctuations because they tend to be small and they do not represent the processes we are interested in, which involve the activity of the solute molecules.

To summarize, we track the changes in the Gibbs free energy of the system by computing a value of Δ*G* whenever a chemical reaction occurs within a compartment or a solute diffusion event occurs between adjacent compartments. The timescales on which these events occur are much slower than that of solvent equilibration, allowing for well-defined values of *G* between the events. By the linear additivity of the compartments’ free energies, any change in the free energy of a single compartment is equal to the change in free energy of the whole system (i.e., Δ*G*_{A} = Δ*G*). We assume that the diffusion of the solvent only contributes small, fluctuating, unbiased changes to the free energy that we ignore; we also assume that the amount of solvent is so large that one can take a limit and neglect the fluctuations in this quantity when computing the changes in free energy for processes involving the solute copy numbers.

### C. **Δ***G* of diffusion

To describe the change in Gibbs free energy accompanying a solute diffusion event between neighboring compartments, we use a similar approach to the one used above for chemical reactions. The key difference here is that, as opposed to multiple species being involved in a reaction taking place in a single compartment, we now have a single species involved in a diffusion event taking place between two compartments.

Consider species *i* diffusing from compartment *A*, where its initial copy number is *N*_{i,A}, to compartment *B*, where its initial copy number is *N*_{i,B}. The total copy numbers of molecules in compartments *A* and *B* are *N*_{A} and *N*_{B}, respectively. Assume there is just one spectator species constituting the solvent, labeled *s* with copy numbers *N*_{s,A} and *N*_{s,B} (we remove our uncertainty in the exact values of these numbers by taking a limit later). As a result of the diffusion event, we have the following changes in these quantities: *N*_{i,A} → *N*_{i,A} − 1, *N*_{i,B} → *N*_{i,B} + 1, *N*_{s,A} → *N*_{s,A}, *N*_{s,B} → *N*_{s,B}, *N*_{A} → *N*_{A} − 1, and *N*_{B} → *N*_{B} + 1. The initial Gibbs free energy is

and the final Gibbs free energy is

The difference of these two expressions leads to an exact formula, similar to Eq. (16), which we omit here. Analogously to the approximation *N* ≫ *σ* made in the context of chemical reactions, here we assume that *N*_{A}, *N*_{B} ≫ 1, which amounts to setting *N*_{A} − 1 ≈ *N*_{A} and *N*_{B} + 1 ≈ *N*_{B} in Eqs. (39) and (40). Using this approximation, we can express the change in Gibbs free energy as

We see that, analogously to making the approximation *N*_{i} ≫ *ν*_{i} above, if we here take *N*_{i,A}, *N*_{i,B} ≫ 1, then Eq. (41) reduces to the common expression

The right-hand side of Eq. (42) will always be less than that of Eq. (41), which, although the difference is typically very slight, can lead to systematically biased calculations, as we describe in Sec. III.

The considerations leading to the main results of this paper [Eqs. (14), (21), and (41)] do not depend essentially on the assumption that the compartments comprising the simulation volume form a regular Cartesian grid (as illustrated in Fig. 1). A variety of strategies have been developed to discretize the simulation volume in a more sophisticated manner. These include unstructured meshes that use complex polygonal compartments useful for modeling curved surfaces, as well as adaptive meshes that use compartments with time-dependent boundaries that can increase computational efficiency.^{46–48} Hybrid, multiscale methods also exist, which combine a mesoscopic compartment-based description of part of the simulation volume with either a macroscopic or microscopic description elsewhere in the volume.^{49,50} Determining reaction and diffusion propensities in these frameworks is a somewhat complicated issue that we do not address here. Without undertaking a detailed analysis of how the above derivations of expressions for Δ*G* should be extended to apply in each of these computational frameworks, we observe that, given a method for calculating propensities leading to reaction and diffusion events, the expressions derived here can be applied directly to the case of an irregular grid of compartments, which encompasses several applications of interest. In an irregular grid, each compartment *A* has an arbitrary shape and volume *V*_{A}, with *V*_{A} ≠ *V*_{B}, in general, for any pair *A*, *B*. It is important to ensure that for each of these compartments the assumptions regarding timescales described above still hold. If they do hold, then to apply the above results, it is only necessary to use compartment *A*’s local value of Θ_{A} = *N*_{Av}*V*_{A} in place of Θ where it appears in the expressions for Δ*G*_{(1)}, Δ*G*_{(2)}, and Δ*G*_{(3)} [Eqs. (21), (25), and (28), respectively].

## III. RESULTS

Here, we perform stochastic simulations to illustrate the effects of approximating Δ*G* for reaction and diffusion events when copy numbers are considered small and discrete. We use MEDYAN, a simulation platform designed to study active networks at high resolution, which is equipped with a RDME simulation engine as described above.^{11} With this, we report on two different simulation setups to test the discrepancy between the nearly exact and approximate formulas for Δ*G* corresponding to reactions and to diffusion. We compare the nearly exact equations (21) and (41) for reactions and diffusion, respectively, with their approximate counterparts [Eqs. (28) and (42)]. We observe that only the nearly exact expressions result in sensible behavior, i.e., the rate of change of Gibbs free energy on average is 0 *k*_{B}*T*/s when the system is at equilibrium.

### A. **Δ***G* of reactions

To test the effect of approximation for Δ*G* of reactions, we consider the simple reaction scheme

where the rate constants to the right and left are *k*_{+} = 0.05 *µ*M^{−1} s^{−1} and *k*_{−} = 0.01 s^{−1}, respectively, giving an equilibrium constant of *K*_{eq} = *k*_{−}/*k*_{+} = 0.2 *µ*M. We perform stochastic simulations with the Next Reaction Method in MEDYAN^{51} using a single compartment of size 0.125 *µ*m^{3} (i.e., in this example, there is no diffusion). To employ Eq. (21) in simulation, we first compute Δ*G*^{0} − *σk*_{B}*T* log Θ − *σk*_{B}*T* = 3.71 *k*_{B}*T* for the forward reaction and −3.71 *k*_{B}*T* for the reverse reaction (using Δ*G*^{0} = *k*_{B}*T* log *K*_{eq}), and then, when each reaction fires during simulation, the quantity $Q\u0303(1)$ is computed from the instantaneous values of the copy numbers to determine Δ*G*_{(1)}. A similar approach is taken to employ Eq. (28). We begin with *N*_{A} = 100, *N*_{B} = *N*_{C} = 50 and repeat a simulation of 100 s duration 3000 times to obtain averages of the trajectory of rates *∂*_{t}Δ*G*(*t*) resulting from the forward and reverse reactions [the notation *∂*_{t}Δ*G*(*t*) implies that the measured rates vary smoothly as a function of time; however, this quantity is calculated as the total change in Gibbs free energy resulting from discretely timed chemical events during 1 s-long windows]. Figure 2 displays the results of these simulations. Note how, while the two trajectories bear close similarity, the equilibrium value of *∂*_{t}Δ*G*(*t*) centers around 0 *k*_{B}*T*/s for the nearly exact formulation [Eq. (21)], yet erroneously centers around ∼ −0.08 *k*_{B}*T*/s for the approximate version [Eq. (28)].

### B. **Δ***G* of diffusion

To test the effect of approximating Δ*G* for diffusion in a compartment-based reaction–diffusion scheme, we next employed MEDYAN to simulate diffusion of 1000 solute molecules with diffusion constant 20 *µ*m^{2} s^{−1} in a 2 × 2 × 2 grid of compartments, each a cube with volume 0.125 *µ*m^{3}. The initial distribution of molecules is uniformly random over the compartment grid, and thus, the system begins near equilibrium and is then allowed to stochastically evolve for 100 s, i.e., the molecules hop randomly between adjacent compartments. For each diffusion event, the value of Δ*G* is determined using the nearly exact equation (28) for one run, and in another run, the approximate equation (42) is used. The difference in the trajectories of *∂*_{t}Δ*G* for these simulations is stark, as displayed in Fig. 2. While the trajectory centers around 0 *k*_{B}*T*/s for the nearly exact formulation of Δ*G*, it erroneously centers around ∼ −1925 *k*_{B}*T*/s for the approximate version. Diffusion events are very frequent in this system, occurring around 240 000 times/s (this number can be calculated from the parameters of the system and is also observed during simulations). Thus, since Eq. (42) is always less than the nearly exact quantity, even by a small amount on the order 0.05 *k*_{B}*T* for this system, this systematic bias is amplified by the frequency of diffusion events to produce significant differences from the expected behavior, necessitating the use of a more exact formula for Δ*G*. Finally, we performed a simulation involving both reactions and diffusion across multiple compartments. We found again that only when the nearly exact formulas were used did the rate of change of Gibbs free energy center on 0 *k*_{B}*T*/s at equilibrium. It is not additionally illuminating to show the data, so we do not display it here.

## IV. DISCUSSION

We have argued that when the copy numbers of the reactants and products are treated as small, discrete quantities, then certain approximations leading to the textbook formulas for Δ*G* of reaction and diffusion [Eqs. (2) and (3)] break down and lead to biased results. We emphasize that this is true only when the copy numbers and reaction occurrences are treated as discrete; when they are treated as continuous, one should use the textbook formulas. This can be shown by considering a continuous version of the chemical system described by Eq. (43). The time evolution of the concentrations of the chemical species is obtained by solving a system of ordinary differential equations that employ mass-action kinetics, and from this solution, the rates *∂*_{t}Δ*G*(*t*) resulting from the forward and reverse reactions are computed using the instantaneous values of the species’ copy numbers that enter into Eqs. (21) and (28). Figure 3 displays the results of these calculations. Here, the total rate of *∂*_{t}Δ*G*(*t*) only approaches 0 *k*_{B}*T*/s, as it must at equilibrium, when Eq. (28) is used.

When the copy numbers of the chemical species are treated as continuous, there is no notion of a single occurrence of a reaction; instead, the evolution of the system is parameterized by the continuous variable *ξ* that quantifies the extent of advancement of the reaction.^{52} In this framework, which is adopted in classical thermodynamics, the copy number of any species never jumps instantaneously from *N*_{i} to *N*_{i} + *ν*_{i}, and thus, the premise of the derivation presented above leading to Eq. (16) falls apart. This explains why the seemingly more accurate equation (21) is wrong when applied to chemical dynamics that are described using continuous variables to represent the copy number of species. When the chemical dynamics are modeled this way, the textbook expression [Eq. (28)] for the change in Gibbs free energy is valid. Another way to understand the difference in these expressions is to view Eq. (28) as giving the instantaneous slope of *G* with respect to the degree of advancement of the reaction when these quantities are viewed as continuous variables,^{38,42} whereas Eq. (16) effectively integrates the continuously varying quantity *dG* through a single discrete reaction occurrence.

If the copy numbers of chemical species are not large compared to the stoichiometric coefficients (i.e., *N*_{i} is not much greater than *ν*_{i}), it is best to describe the chemical dynamics by treating the copy numbers as discrete variables participating in stochastically timed chemical reactions. This is the philosophy adopted by several recent models of intracellular environments, where copy numbers of molecules of interest are sometimes quite small. In these cases, adoption of the more exact expression for Δ*G* can not only improve precision but also ensure correct behavior. If concentration gradients are expected to be a strong source of entropy production (e.g., diffusion of monomeric actin along the lengths of filopodia, see Refs. 53–55), then using the nearly exact formulas presented here can ensure that the resulting measurements of dissipation are not strongly biased.

## ACKNOWLEDGMENTS

The authors acknowledge A. Chandresekaran, H. Ni, and Q. Ni for improvement of this manuscript and helpful discussions. This work was supported by the National Science Foundation (Grant Nos. NSF 1632976, NSF DMR-1506969, and NSF CHE-1800418).

### APPENDIX A: DISCRETE VARIABLES AND THE RELATION TO THE GIBBS–DUHEM EQUATION

The Gibbs–Duhem equation from classical thermodynamics states

where *M* represents all chemical species in the system including the solvent, and the last equality holds at constant temperature and pressure.^{27} Applying the product rule^{56} to the expression for the Gibbs free energy, *G* = ∑_{i∈M}*N*_{i}*μ*_{i}, we get

Evaluating ∑_{i∈M}*μ*_{i}*dN*_{i} using stoichiometric coefficients in place of *dN*_{i} can be shown to lead to the expression Δ*G*_{(3)}.

If the copy numbers are small, we are better served using the discrete difference operator Δ and not the differential difference operator *d*. The product rule for the discrete difference operator applied to *G* gives

Here, we do not neglect the cross-term as we do in the differential product rule [Eq. (A2)]. We evaluate this expression term by term,

Comparing these to the expressions for the accuracy of the various approximations given in Appendix B, we have

and thus, combining all the terms in Eq. (A3), we recover our exact expression Δ*G*_{(0)}, and this approach can be seen as an alternate derivation of Eq. (14). To summarize, for small, discrete copy numbers, we obtain corrections to the expression for the change in Gibbs free energy accompanying chemical reactions that are not captured by the constraints imposed by the Gibbs–Duhem equation, which assumes that the copy numbers are continuous quantities.

### APPENDIX B: ACCURACY OF THE APPROXIMATIONS

We calculate the accuracy of the approximations Δ*G*_{(1)}, Δ*G*_{(2)}, and Δ*G*_{(3)} by taking the difference of these quantities with Δ*G*_{(0)}. We have

The accuracy of Δ*G*_{(1)} depends only on *N* for a given reaction, whereas the remaining approximations depend also on the values of *N*_{i} and *N*_{j}. These observations reflect the fact that, in order to arrive at the expression for Δ*G*_{(1)}, we leveraged the size of *N* compared to *σ*, and to arrive at the expressions for Δ*G*_{(2)} and Δ*G*_{(3)}, we further successively leveraged the sizes of *N*_{i}, *N*_{j} compared to *ν*_{i}, *ν*_{j}.

### APPENDIX C: **Δ***G* OF SOLVENT FLUCTUATIONS

Here, we first calculate an approximate expression for the change in Gibbs free energy of the system accompanying a fluctuation of *n* solvent molecules from compartment A to compartment B. We have

where *L* is the set of solute species. After the transfer of *n* solvent molecules from *A* to *B*, we have

Taking the difference and simplifying, we arrive at the exact expression

To understand the magnitude of Δ*G* for typical values of *n*, *N*_{s,A}, and *N*_{s,B} compared to *N*_{A} and *N*_{B}, we first make the assumption that the two compartments initially have the same number of solvent molecules, i.e., *N*_{s,A} = *N*_{s,B} ≡ *N*_{s}. Next, we assume that the solvent molecules dominate the proportion of total molecules, allowing us to write *N*_{A} ≈ *N*_{B} ≡ *N*. These approximations will hold in the limit that the number of solute molecules is much less than the number of solvent molecules for each compartment. We next introduce the small parameters

and

which capture the dilutions of the two compartments, and

which represents the relative size of the fluctuation. For *N* = 10^{9}, we typically have *ξ* ∼ 10^{−4.5} (since *n* ∼ *N*^{1/2}) and *ε* ∼ 10^{−6}. Substituting these parameters into Eq. (C3), we have

We expand this expression to first order in *ε*_{A} and *ε*_{B} and then to second order in *ξ*. The result is

The observed numerical agreement between Eqs. (C3) and (C8) is close for realistic values of the parameters: for *N*_{A} = 10^{9}, *N*_{B} = 5 × 10^{8}, *ε*_{A} = 10^{−6}, *ε*_{B} = 3 × 10^{−6}, and $n=NA1/2$, the prediction of Eq. (C3) is −0.0 632 401 *k*_{B}*T* and the prediction of Eq. (C8) is −0.0 632 436 *k*_{B}*T*. The first term in Eq. (C8) will dominate if the solute dilutions in the two compartments are very different from each other. In this case, we may compare the size of this change in Gibbs free energy to that accompanying the diffusion of a solute from compartment B to compartment A. This latter change in Gibbs free energy will be approximately $kBT\u2061log\epsilon A\epsilon B$. If we now set *ε*_{A} = *aε*_{B}, where *a* is of order 1 (typically, it will fall in the range [1/10, 10]), then Δ*G* for the solvent fluctuation will be *k*_{B}*T*(*a* − 1)*ε*_{B}*n* and Δ*G* for the solute diffusion will be *k*_{B}*T* log *a*. The product *ε*_{B}*n* will be typically on the order of ∼10^{−1.5}, so one can see that for typical values of the parameters the change in Gibbs free energy from a solute diffusion event will be significantly greater in magnitude than that from a solvent fluctuation. If the dilutions are very similar, then *ε*_{A} − *ε*_{B} ≈ 0, and the second term in Eq. (C8) dominates. This term is on the order of (*ε*_{A} + *ε*_{B})*k*_{B}*T* since $n2N\u223c1$. These changes in Gibbs free energy will typically be much smaller than those accompanying a chemical reaction or inter-compartment diffusion of the solute. Thus, we may neglect the activity of the solvent in tracking the Gibbs free energy of the system.

## REFERENCES

The argument of the logarithm, $NA\Theta A$, has dimensions of concentration, however it is understood that such quantities are taken with reference to some standard concentration, typically 1M.

We have $\u2212logN+\sigma N(N+\sigma )=\u2212log1+\sigma N(N+\sigma )\u2248\u2212\sigma $ for large *N*.

The product rule can be derived as *d*(*fg*) = (*f* + *df*) (*g* + *dg*) − *fg* = *fdg* + *gdf* + *dfdg* = *fdg* + *gdf* since the term *dfdg* is assumed to be negligible. We give this reminder to emphasize the presence of the cross term, which we do not neglect in the discrete case.