Macroscopic properties of reacting mixtures are necessary to design synthetic strategies, determine yield, and improve the energy and atom efficiency of many chemical processes. The set of time-ordered sequences of chemical species is one representation of the evolution from reactants to products. However, only a fraction of the possible sequences is typical, holding the majority of the joint probability and characterizing the succession of chemical nonequilibrium states. Here, we extend a variational measure of typicality and apply it to atomistic simulations of a model for hydrogen oxidation over a range of temperatures. We demonstrate an information-theoretic methodology to identify typical sequences under the constraints of mass conservation. Including these constraints improves the ability to learn the chemical sequence mechanism from experimentally accessible data. From these typical sequences, we show that two quantities defining the variational typical set of sequences, the joint entropy rate and the topological entropy rate, increase linearly with temperature. These results suggest that, away from explosion limits, data over a narrow range of thermodynamic parameters could be sufficient to extrapolate these typical features of combustion chemistry to other conditions.

Myriad chemical phenomena involve many parallel and competing reaction steps, from combustion and ozone depletion to the total synthesis of natural products.1 For combustion, the practical need to learn chemical mechanisms has driven experimentalists to probe molecular events with spectroscopic techniques down to femtosecond time scales.2,3 However, whether in the gas or condensed phase, chemical kinetics largely centers on the theoretical treatment of bulk reaction rates. The data now available on short time scales, together with the widespread modeling approach4 to combustion kinetics, leave many theoretical questions unanswered. One important question is how to transform properties of ephemeral molecular events into relevant macroscopic observables for a reacting gaseous mixture through the program of statistical mechanics. Here, we begin to formulate an answer and suggest macroscopic observables other than the traditional model parameters, the rate coefficients.

Given certain physical constraints on a reacting mixture, such as those in the interior of an engine or in the atmosphere, a goal is to quantitatively relate the mechanism of combustion reactions to the temporal profiles of concentrations and physical conditions. A recent alternative to conventional modeling of mass-action kinetics in chemically reactive systems is based on a chemical-pathway representation.5,6 Here, the contracted description we use7 is complementary and more akin to symbolic dynamics; it neglects both elementary reactions, a notion that can break down under high-pressure conditions, and the assignment of rate coefficients, which may only be accurate in a narrow range of conditions.8,9 The focus is on the time-ordered set of sequences of chemical species during the overall reaction, which is another representation of the chemical mechanism. We have previously shown that it is possible to learn these sequences for hydrogen combustion from simulation data that mimic ultrafast spectroscopy experiments.10 We generate these sequences from the time profiles of marginal probabilities for each chemical species as a mixture transforms reactants into products. Ultimately, characterizing these sequences could lead to new routes for controlling and designing energy- and atom-efficient reactions.

The dependence of bulk reaction rates on temperature and pressure can give insights into molecular mechanisms. One hallmark of chemical kinetics is the Arrhenius equation, an empirical finding that rate coefficients have an exponential temperature dependence. The quantitative understanding of chemical reaction rates has historically focused on thermal energy fluctuations, but there has been recent success extending kinetic, transition-state theories to driven, but simple, chemical reactions.11–13 Temperature and other thermodynamic parameters, however, must also affect the symbol sequence mechanism for chemically reacting mixtures. It remains to be seen whether there are also quantitative relationships for the dependence of the sequence mechanism on the thermodynamic state that might be useful in predicting the chemical mechanism at other conditions. This work takes initial steps in this direction.

Instead of constructing a chemical mechanism by estimating rate coefficients for elementary reactions, we identify the sequence mechanism of hydrogen oxidation. From discrete time series of the probability of each chemical species, we enumerate the possible sequences of chemical species and find those that are “typical”, that is, sequences carrying the majority of the probability.14 The reduced set of sequences agrees well with the known chemical sequence mechanism.7 Analyzing a reduced mechanism is a common approach in combustion modeling, especially as the computational burden to model the reaction grows.15–18 The main advantage of the method we use here, the variational typical set,10 is that the set of (possibly time-dependent) rate coefficients is traded for one global parameter per time step. We determine the parameter through a variational method that simultaneously maximizes the probability in the typical sequences and minimizes their number. The work here extends our previous results:10 with data derived from molecular dynamics simulations, we calculate the variational typical set for hydrogen oxidation over a range of temperatures and introduce an extension of this theory to account for mass conservation.

In the combustion of hydrogen-oxygen mixtures, the M = 8 chemical species that occur with significant probability are H2, O2, H, O, OH, HO2, H2O2, and H2O. If we impose no constraints from mass conservation or chemical reactions, and simply enumerate, there are $M^n$ possible sequences in total. That is, the number of sequences grows exponentially with the sequence length n, as $M^n = e^{n\ln M}$. Characterizing the mechanism requires finding those sequences that carry the majority of the probability. Such sequences are the typical subset. More generally, typical sequences are central to information theory, dating back to the seminal work of Shannon.19 Here, we will work with a joint probability distribution over the possible sequences from molecular dynamics simulations and extend a variational formulation of the typical set.

To make this idea of the variational typical set more precise, take a countable set of sequences $\hat{X}^n = \{x_1, x_2, \ldots, x_n\}$, where $x_i \in \{\mathrm{H_2}, \mathrm{O_2}, \mathrm{H}, \mathrm{O}, \mathrm{OH}, \mathrm{HO_2}, \mathrm{H_2O_2}, \mathrm{H_2O}\}$ is the state label for the system on the $i$th measurement. Since our interest is hydrogen combustion, the states are the M = 8 possible chemical species. The random variable $\hat{X}^n$ represents one of the $M^n$ sequences of length n, while $\hat{x}^n \in \hat{X}^n$ is a particular sequence realization. The joint probability distribution over $\hat{X}^n$, $\mu(\hat{X}^n)$, is normalized at each n, $\sum_{\hat{x}^n} \mu(\hat{X}^n = \hat{x}^n) = 1$. From the joint distribution, the typical set of sequences is defined as20

$$A_{\epsilon}^n \equiv \left\{\hat{X}^n : e^{-n(h_\mu+\epsilon)} \le \mu(\hat{X}^n) \le e^{-n(h_\mu-\epsilon)}\right\}. \tag{1}$$

The arbitrary, positive parameter $\epsilon \in \mathbb{R}^+$ defines a neighborhood around the joint entropy rate

$$h_\mu = -\frac{1}{n}\sum_{\hat{x}^n} \mu(\hat{x}^n)\ln\mu(\hat{x}^n). \tag{2}$$

The entropy rate must be finite in the limit $n \to \infty$ for the typical set to exist.21,22 In this limit, $\epsilon$ can be made arbitrarily small and the entropy rate characterizes both the exponential growth of the size of the typical set, $|A_\epsilon^n| \approx e^{+n h_\mu}$, and the exponential decay of sequence probabilities, $\mu(\hat{x}^n) \approx e^{-n h_\mu}$. The total probability in this subset is near one, $\sum_{\hat{x}^n \in A_\epsilon^n} \mu(\hat{x}^n) \approx |A_\epsilon^n|\, e^{-n h_\mu} \approx 1$, and determines the average (here, chemico-dynamical) behavior.14,23 Away from the infinite limit, however, this subset of possible sequences may be empty or contain little joint probability depending on the magnitude of the parameter $\epsilon$. Consequently, even when there is asymptotic concentration of probability in the typical set, another approach is necessary to capture the typical behavior from the finite-length sequences of transient processes.
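The finite-n problem can be made concrete with a short computation. The sketch below finds the probability held by the fixed-$\epsilon$ typical set of Eq. (1) for a toy source whose symbols are independent and identically distributed, so $h_\mu$ is simply the single-symbol Shannon entropy. The i.i.d. assumption, the function name `typical_set_probability`, and the three-symbol distribution are illustrative choices only, not the chemistry studied here. Exhaustive enumeration is feasible only for small M and n.

```python
import itertools
import math


def typical_set_probability(p, n, eps):
    """Probability held by the fixed-eps typical set of Eq. (1).

    Toy assumption: every symbol is drawn independently from the same
    marginal p, so the entropy rate h equals the Shannon entropy of p.
    """
    h = -sum(q * math.log(q) for q in p if q > 0)
    lower = math.exp(-n * (h + eps))
    upper = math.exp(-n * (h - eps))
    total = 0.0
    # Brute-force enumeration of all len(p)**n sequences
    for seq in itertools.product(range(len(p)), repeat=n):
        prob = math.prod(p[i] for i in seq)
        if lower <= prob <= upper:
            total += prob
    return total


for n in (1, 2, 3, 6):
    print(n, typical_set_probability([0.5, 0.25, 0.25], n, eps=0.01))
```

Take p = (0.5, 0.25, 0.25) and ε = 0.01. At n = 1 the set is empty, so it holds zero probability. At n = 2 only the four sequences with probability 0.125 lie inside the neighborhood, so the set holds exactly half of the probability. A small, fixed ε can therefore miss most of the mass at short lengths.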

Here we extend and apply a variational formulation of the typical set.10 Like the asymptotic typical set, the variational typical set (VTS) holds the majority of the probability, but for sequences of (potentially) finite length. Its generality also stems from its only requirement—the existence of the joint entropy rate. With the entropy rate in Eq. (2) for finite n, the VTS is10 

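$$A_{\epsilon^*}^n \equiv \left\{\hat{X}^n : e^{-n(h_\mu+\epsilon^*)} \le \mu(\hat{X}^n) \le e^{-n(h_\mu-\epsilon^*)}\right\}. \tag{3}$$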

The positive, real parameter ϵ* = ϵ*(n) depends on n (though we suppress this dependence for notational convenience) and is defined variationally,

$$\epsilon^* \equiv \epsilon^*(n) : \operatorname*{arg\,max}_{\epsilon}\left[\mu\{A_\epsilon^n\} - W_\epsilon^n\right]. \tag{4}$$

Through this variational method, the probability $\mu\{A_{\epsilon^*}^n\}$ in the typical set is maximized and the fraction of unique typical sequences, $W_{\epsilon^*}^n = |A_{\epsilon^*}^n|/M^n$, is minimized at $\epsilon^*$. The subset of typical sequences then holds the majority of the probability using the minimal number of sequences of length n. Or, more loosely speaking, the VTS maximizes predictive fidelity of the statistical dynamics with the minimum amount of dynamical information.
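When the joint probabilities of all enumerated sequences are available, Eqs. (3) and (4) can be evaluated numerically. The sketch below is a minimal illustration under our own assumptions. It is not the implementation used to produce the results reported here. A sequence belongs to $A_\epsilon^n$ exactly when $|-n^{-1}\ln\mu(\hat{x}^n) - h_\mu| \le \epsilon$, so the objective $\mu\{A_\epsilon^n\} - W_\epsilon^n$ can only change at those distances. It is therefore enough to sort the sequences by distance and scan cumulative sums. The function name `variational_typical_set` and the choice to report the smallest maximizing $\epsilon$ are hypothetical.

```python
import numpy as np


def variational_typical_set(mu, n, total_sequences=None):
    """Sketch of Eqs. (3)-(4) for an enumerated joint distribution.

    mu              : probabilities of the enumerated sequences of length n
    total_sequences : denominator of W (defaults to len(mu), i.e. M**n when
                      every sequence is enumerated)
    Returns (eps_star, h, probability in the set, fraction W, member indices).
    """
    mu = np.asarray(mu, dtype=float).ravel()
    if total_sequences is None:
        total_sequences = mu.size
    nz = mu > 0
    # Joint entropy rate, Eq. (2), at finite n
    h = -np.sum(mu[nz] * np.log(mu[nz])) / n
    # Distance of each sequence's sample entropy from h; zero-probability
    # sequences can never satisfy the lower bound, so they get infinity
    d = np.full(mu.shape, np.inf)
    d[nz] = np.abs(-np.log(mu[nz]) / n - h)
    order = np.argsort(d, kind="stable")
    d_sorted = d[order]
    prob_in_set = np.cumsum(mu[order])                         # mu{A_eps^n}
    frac_in_set = np.arange(1, mu.size + 1) / total_sequences  # W_eps^n
    objective = prob_in_set - frac_in_set
    # Evaluate only at the end of each run of tied distances, so sequences
    # with equal probability enter the set together
    end_of_run = np.append(d_sorted[1:] != d_sorted[:-1], True)
    candidates = np.flatnonzero(end_of_run & np.isfinite(d_sorted))
    best = candidates[np.argmax(objective[candidates])]
    members = np.sort(order[: best + 1])
    return d_sorted[best], h, prob_in_set[best], frac_in_set[best], members
```

Consider a toy joint over four sequences with probabilities (0.7, 0.1, 0.1, 0.1) and n = 1. The objective reaches its maximum of 0.45 when only the first sequence is typical.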

To generate the chemistry and the data to input into the theory, we ran molecular dynamics simulations with the PuReMD simulation package24 and the ReaxFF potential.25–28 All simulations were at constant number of atoms, volume, and temperature (NVT) conditions. Each trajectory was updated with a time step of 0.1 fs. The temperature was controlled with a Nosé-Hoover thermostat29–32 using a coupling time of 1 ps. We ran a set of 50 independent simulations, each from unique initial conditions, at 20 temperatures between 2400 K and 6800 K. A cubic simulation box with a volume of 20 × 20 × 20 Å³ was initially filled with 66 hydrogen and 33 oxygen molecules. Each simulation was seeded with a single OH radical to avoid simulating the time before an initiation reaction.

Over the temperature range we consider, less water is produced at higher temperatures.7 High temperatures can drive water dissociation in this model, which is reflected in the final abundance of water produced by the overall reaction shown in Fig. 1. Note also that the total simulation time varied from 2 ns at 2400 K to 25 ps at 6800 K. At low temperatures, the probability of water, $p(\mathrm{H_2O})$, accumulates slowly over the course of a simulation and saturates at a high abundance of water. Overall, an increase in temperature accelerates the rate of water production, but at the expense of the maximum number of water molecules.

FIG. 1.

The empirical, marginal probability of water as a function of time from molecular dynamics simulations at temperatures between 2400 (purple) and 6800 K (yellow). Data at each temperature are an average over 50 NVT trajectories. Points indicate times where the marginal probabilities of all chemical species are measured. These probabilities are used to compute the variational typical set of chemical sequences.


For the analysis of the variational typical set, we analyze the crossover region of each time series for the probability of water. The times at which we measure marginal probabilities of all chemical species are marked by points in Fig. 1. Because we fix the number of “measurements” at ten, we vary the time resolution to account for the difference in reaction times across the temperature range. To measure marginal probabilities, we first find $t_{1/2}$, the time at which the mixture reaches 50% of the maximum probability $p(\mathrm{H_2O})$. We divide the time from t = 0 to $t_{1/2}$ into half of the number of desired time steps. The same number of time steps is then selected for $t > t_{1/2}$, and we check that this window covers the crossover region. At each selected time, we record the marginal probability of each chemical species. These marginal probabilities are the input into the typical set calculations. We divide each region into ten windows, though we use nine in the calculation of the VTS for each temperature because of the computational expense of generating every sequence. The sequence length here can be thought of as a progress variable: at n = 5, the reaction is approximately 50% complete, and at n = 9, it is around 90% complete.
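One reading of this sampling procedure is sketched below. Two details are hypothetical because the text does not fix them. First, the spacing $t_{1/2}/5$ used before $t_{1/2}$ is assumed to be reused after it, so that $t_{1/2}$ is the fifth of ten equally spaced measurement times. Second, the marginal at each measurement time is taken from the first stored frame at or after that time. The helper names `measurement_times` and `sample_marginals` are illustrative.

```python
import numpy as np


def measurement_times(t, p_water, n_measurements=10):
    """Hypothetical helper: equally spaced times with t_1/2 at the midpoint."""
    t = np.asarray(t, dtype=float)
    p_water = np.asarray(p_water, dtype=float)
    # First time at which p(H2O) reaches half of its maximum
    t_half = t[np.argmax(p_water >= 0.5 * p_water.max())]
    n_before = n_measurements // 2
    dt = t_half / n_before  # assumed: same spacing before and after t_1/2
    times = dt * np.arange(1, n_measurements + 1)
    # Check that the window is covered by the trajectory
    if times[-1] > t[-1]:
        raise ValueError("sampling window extends past the end of the trajectory")
    return t_half, times


def sample_marginals(t, species_probs, times):
    """Rows of species_probs (frames x species) at the first frame >= each time."""
    idx = np.clip(np.searchsorted(t, times), 0, len(t) - 1)
    return np.asarray(species_probs)[idx]
```

The rows returned by `sample_marginals` would then be the per-step marginals $p(x_i)$ that feed the joint estimate.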

In molecular dynamics simulations, we can track individual atoms as they transition from one molecule to another during bimolecular and termolecular reactions. However, tracking atoms assumes the ability to obtain information not accessible through experiments. Methods such as ultrafast laser spectroscopy2,3 can now measure densities of chemical species on the femtosecond time scale. However, the individual atoms in each measurement are indistinguishable from each other. The consequences of this indistinguishability can be seen in a simple example, H2 + OH ⇌ H2O + H. To determine the typical set of sequences, we need the joint probability μ(H, OH) = p(H|OH)p(OH). If we cannot distinguish atoms, we cannot know which H atom in H2 joins water upon reaction. As a consequence, we cannot infer the conditional probability or the joint probability, and we must estimate the joint probability of a sequence with the information available: independent marginal probabilities.

We estimate the joint probability of a sequence from the simulations as

$$\Phi(\hat{x}^n) = \prod_{i=1}^{n} p(x_i). \tag{5}$$

This joint probability can be estimated from experimental or simulation data. The variational typical set becomes

$$A_{\epsilon^*}^n = \left\{\hat{X}^n : e^{-n(h_\Phi+\epsilon^*)} \le \Phi(\hat{X}^n) \le e^{-n(h_\Phi-\epsilon^*)}\right\} \tag{6}$$

with the entropy rate

$$h_\Phi = -\frac{1}{n}\sum_{\hat{x}^n} \Phi(\hat{x}^n)\ln\Phi(\hat{x}^n). \tag{7}$$

Here again, the positive, real parameter ϵ* = ϵ*(n) depends on n and is defined variationally,

$$\epsilon^* \equiv \epsilon^*(n) : \operatorname*{arg\,max}_{\epsilon}\left[\Phi\{A_\epsilon^n\} - W_\epsilon^n\right]. \tag{8}$$

What we show below is that the typical sequences we identify with only this experimentally accessible information include those sequences of chemical species also found by tracking atoms. We stress that the variational typical set and the entropy rate can be defined directly on the joint distributions and that we use independent distributions here to mimic experimental data.
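Because $\Phi$ in Eq. (5) is a product of marginals, $\ln\Phi$ is a sum of single-time terms. Write $p_i$ for the marginal measured at the $i$th step. The entropy rate of Eq. (7) then reduces to the average Shannon entropy of the measured marginals, $h_\Phi = n^{-1}\sum_{i=1}^{n}\left[-\sum_{x} p_i(x)\ln p_i(x)\right]$. This holds for the unrestricted product joint, not for the mass-conserving normalization. The sketch below checks the identity against brute-force enumeration. The function names and the random Dirichlet marginals are illustrative assumptions.

```python
import itertools

import numpy as np


def entropy_rate_enumerated(marginals):
    """Eq. (7) by brute force over every sequence of the product joint, Eq. (5)."""
    n = len(marginals)
    h = 0.0
    for seq in itertools.product(*[range(len(p)) for p in marginals]):
        phi = np.prod([marginals[i][s] for i, s in enumerate(seq)])
        if phi > 0:
            h -= phi * np.log(phi)
    return h / n


def entropy_rate_closed_form(marginals):
    """Average single-time Shannon entropy; equal to Eq. (7) for a product joint."""
    h = 0.0
    for p in marginals:
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        h -= np.sum(p * np.log(p))
    return h / len(marginals)


rng = np.random.default_rng(0)
marginals = rng.dirichlet(np.ones(8), size=4)  # four time steps, M = 8 species
print(entropy_rate_enumerated(marginals), entropy_rate_closed_form(marginals))
```

The closed form costs O(nM) instead of O(M^n), which matters once every sequence no longer needs to be enumerated.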

To transform marginal probabilities from an experiment or simulation into joint probabilities, one can enumerate all possible sequences. Here, we will call these sequences “unrestricted” since we assume no additional information about the transitions between chemical species is accessible beyond the marginal probabilities of all chemical species and the direct enumeration of sequences. We first considered these sets of sequences in Ref. 10. Within the context of thermally activated chemical kinetics, a useful heuristic is that an increase in temperature relative to the activation energy will exponentially accelerate the rate of the reaction. From this heuristic, a plausible hypothesis is that the temperature will also accelerate the exponential growth in the number of typical unrestricted sequences and the exponential decay of their probabilities. To test this hypothesis, we ran molecular dynamics simulations and extracted marginal probability distributions for nineteen temperatures uniformly spaced between 2800 K and 6400 K.

For each temperature, we extract the variational typical set of sequences as a function of n from an enumeration of the possible sequences, $\hat{x}^n$, and their joint probabilities, $\Phi(\hat{x}^n)$. Figure 2(a) shows the total probability in this set as a function of n for each temperature. Even for short sequences, with only three or four chemical symbols, this set captures more than 90% of the probability in $\Phi(\hat{X}^n)$. Figure 2(b) shows that this probability resides in only a small fraction, $W_{\epsilon^*}^n$, of the $M^n$ possible sequences. For example, when n = 6, the fraction of variationally typical sequences $W_{\epsilon^*}^n$ is less than 5%. The variational parameter $\epsilon^*$ is roughly constant after n = 3, as shown in Fig. 2(c), which suggests that $\Phi\{A_{\epsilon^*}^n\}$ and $W_{\epsilon^*}^n$ have synchronized. As n grows, the number of sequences increases and the joint probability spreads over the ever-increasing number of sequences. The parameter $\epsilon^*$ measures the relationship between these two functions, so a constant $\epsilon^*$ means that the maximum difference between the joint probability and the fraction of typical sequences is independent of n.

FIG. 2.

(a) The total probability in the variational typical set, $\Phi\{A_{\epsilon^*}^n\}$, (b) the fraction of typical sequences, $W_{\epsilon^*}^n$, and (c) the maximizing value of the variational parameter, $\epsilon^*$, all as a function of the sequence length n. Colors indicate temperature in kelvin.


The variationally typical sequences overlap with the reaction mechanism. To assess the agreement, we identify the sequence representation of the reaction mechanism by tracking atoms in the molecular dynamics simulations. While not experimentally accessible, these atom-tracked sequences are desirable mechanistic information. Let $\mathcal{M}^n$ denote the sequences of length n found from tracking atoms. The subset of these that intersects the variational typical set of the enumerated sequences, $A_{\epsilon^*}^n$, is $\mathcal{M}^n \cap A_{\epsilon^*}^n$. The total probability of this intersection is

$$\Pr(\mathcal{M}^n \cap A_{\epsilon^*}^n) = \sum_{\hat{x}^n \in \mathcal{M}^n \cap A_{\epsilon^*}^n} \mu(\hat{x}^n). \tag{9}$$

We find that $\Pr(\mathcal{M}^n \cap A_{\epsilon^*}^n) \geq 0.84$ for all sequence lengths and all temperatures. By n = 6, when the overall reaction is roughly 60% complete (at each temperature), this probability increases to $\Pr(\mathcal{M}^n \cap A_{\epsilon^*}^n) \geq 0.9$. This amount of overlap means that the variationally typical sequences include the significant atom-tracked sequences. This holds even though the unrestricted sequences include those violating mass conservation and the joint probabilities are estimated from independent marginal probabilities. In other words, if $A_{\epsilon^*}^n$ is used to infer the sequence reaction mechanism, the simulations would produce a sequence in $A_{\epsilon^*}^n$ more than 84% of the time, independent of the progress of the overall reaction.
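Evaluating Eq. (9) amounts to a set intersection followed by a sum. In the minimal sketch below, the atom-tracked distribution is stored as a dictionary that maps species tuples to probabilities. The data layout, the function name `overlap_probability`, and the toy numbers in the example are hypothetical.

```python
def overlap_probability(atom_tracked, typical):
    """Eq. (9): atom-tracked probability carried by sequences that are also typical.

    atom_tracked : dict mapping a sequence (tuple of species labels) to mu(x^n)
    typical      : iterable of sequences in the variational typical set
    """
    typical = set(typical)
    return sum(prob for seq, prob in atom_tracked.items() if seq in typical)


# Toy, made-up atom-tracked distribution over sequences of length two
atom_tracked = {("OH", "H2O"): 0.6, ("OH", "H"): 0.3, ("H2", "H"): 0.1}
print(overlap_probability(atom_tracked, [("OH", "H2O"), ("H2", "H")]))
```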

As we have shown previously,10 and reinforced above, the enumerated sequences that hold most of the probability agree well with the high-probability sequences found by tracking atoms in the simulations. Thus, the typical subset of enumerated sequences is a potential route to learn chemical mechanisms in the sequence representation.

Drawing from the extensive work modeling chemical mechanisms, it is conceivable that restrictions on the possible symbol sequences are known a priori and could improve the overlap of the typical enumerated sequences and atom-tracked sequences. Certain transitions, for example, might be prohibited by the initial conditions, or, more relevant here, mass conservation, stoichiometry, and the constraints of electronic structure. As a first step, imposing mass conservation on the chemical sequences means excluding transitions such as the one from hydrogen atom (molecule) to oxygen atom (respectively, molecule). We will show that incorporating this information into the enumeration of sequences and the typical subset improves both the ability to capture the bulk of the probability and determine the atom-tracked sequences.

We expect mass conservation to significantly prune down the number of allowed sequences. For example, in hydrogen combustion, the transition O → H is forbidden but included in the unrestricted sequences. To impose mass conservation on the enumerated sequences, we divide the space of all sequences into a restricted set, $\hat{X}^n \in R^n$, and an allowed set, $\hat{X}^n \in R_c^n$, the complement of $R^n$. Given the marginal probabilities from periodic measurements, the joint distribution over all sequences can be written using the Kronecker product,

$$\Phi(\hat{X}^n) = p(x_n) \otimes \left(\cdots \otimes \left(p(x_3) \otimes \left(p(x_2) \otimes p(x_1)\right)\right)\right) = \bigotimes_{i=1}^{n} p(x_i), \tag{10}$$

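where $x_i$ is the species observed on the $i$th time step with probability $p(x_i)$. However, after removing the restricted sequences from the enumerated set, conservation of probability no longer holds,

$$1 = \sum_{\hat{x}^n \in R^n} \Phi(\hat{x}^n) + \sum_{\hat{x}^n \in R_c^n} \Phi(\hat{x}^n), \qquad \sum_{\hat{x}^n \in R_c^n} \Phi(\hat{x}^n) < 1. \tag{11}$$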

It is necessary to find a normalization factor for the joint distribution (built from independent distributions) to account for mass conservation in the VTS.
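Eqs. (10) and (11) map directly onto repeated Kronecker products. The sketch below assumes, for illustration, that the forbidden transitions are the symmetric hydrogen/oxygen swaps H ↔ O, H2 ↔ O2, H ↔ O2, and O ↔ H2. With uniform marginals at n = 2, these swaps remove 8 of the 64 ordered pairs, so the allowed sequences retain only 56/64 = 0.875 of the probability. The names `joint_kron` and `allowed_mask` are hypothetical.

```python
import itertools
from functools import reduce

import numpy as np

SPECIES = ["H2", "O2", "H", "O", "OH", "HO2", "H2O2", "H2O"]
# Assumed mass-violating pairs (hydrogen <-> oxygen swaps), treated as symmetric
FORBIDDEN = {frozenset(pair) for pair in
             [("O", "H2"), ("O", "H"), ("H2", "O2"), ("O2", "H")]}


def joint_kron(marginals):
    """Eq. (10): Phi = p(x_n) (x) ( ... (x) (p(x_2) (x) p(x_1)))."""
    first = np.asarray(marginals[0], dtype=float)
    return reduce(lambda acc, p: np.kron(np.asarray(p, dtype=float), acc),
                  marginals[1:], first)


def allowed_mask(n, species=SPECIES, forbidden=FORBIDDEN):
    """Boolean mask over the M**n entries of joint_kron, True on R_c^n."""
    mask = []
    # itertools.product advances the last position fastest, as np.kron does for
    # its right factor, so each tuple here is ordered (x_n, ..., x_1)
    for seq in itertools.product(range(len(species)), repeat=n):
        mask.append(all(frozenset((species[a], species[b])) not in forbidden
                        for a, b in zip(seq, seq[1:])))
    return np.array(mask)


uniform = np.full(len(SPECIES), 1 / len(SPECIES))
phi = joint_kron([uniform, uniform])
mask = allowed_mask(2)
print(phi.sum(), phi[mask].sum())  # total vs. probability left on R_c^2
```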

There is not a unique normalization to form the distribution for the mass conserving sequences, as the normalization depends on the constraints of the distribution. To understand this point, let us look at several cases. The most naive normalization factor is the inverse of

$$Z_n' = \sum_{\hat{x}^n \in R_c^n} \Phi(\hat{x}^n). \tag{12}$$

While this factor does ensure normalization,

$$\sum_{\hat{x}^n \in R_c^n} \tilde{\Phi}'(\hat{x}^n) = \sum_{\hat{x}^n \in R_c^n} \Phi(\hat{x}^n)/Z_n' = 1, \tag{13}$$

contraction does not give the original marginal probabilities

$$p(x_n) \neq \sum_{\hat{x}^{n-1}} \tilde{\Phi}'(x_n, \hat{x}^{n-1}). \tag{14}$$

The marginals are the experimentally measured quantities, and thus any theory that does not preserve these quantities is not a faithful representation of the measurement. We include contraction of the joint, $p(x_n) = \sum_{\hat{x}^{n-1}} \tilde{\Phi}(x_n, \hat{x}^{n-1})$, as a second constraint, where $\tilde{\Phi}$ is the distribution (defined below) normalized so that the mass-conserving sequences respect the input marginals.

To incorporate the constraint on the contraction, we define the (unprimed) normalization factor

$$Z(x_n) = \sum_{\hat{x}^{n-1}} \Phi(\hat{x}^{n-1})\, \delta_{\hat{x}^n, R_c^n}. \tag{15}$$

In this equation, the final state $x_n$ is fixed and the sum runs over all subsequences of length n − 1. Now instead of a single normalization factor, there are up to M unique values, one for each state (here, chemical species). The Kronecker delta function is one if the sequence $\hat{x}^n$ has no violating subsequences of any length and is zero otherwise. Through direct computation it can be seen that the resulting distribution,

$$\tilde{\Phi}(\hat{x}^n) = \frac{1}{Z(x_n)}\, \Phi(\hat{x}^n \in R_c^n), \tag{16}$$

ensures both conservation of probability and contractions to the marginal probabilities. We use the unprimed normalization in the data we present here.
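It can be checked numerically that Eq. (16) preserves the measured marginals. The sketch below arranges the restricted joint as a matrix whose rows are indexed by the final state $x_n$. $Z(x_n)$ in Eq. (15) is then the row sum of the allowed prefix probabilities, and dividing each row by it restores $p(x_n)$ on contraction. All names are illustrative. The forbidden pairs are assumed to be the symmetric hydrogen/oxygen swaps H ↔ O, H2 ↔ O2, H ↔ O2, and O ↔ H2. The sketch enumerates all $M^n$ sequences, so it is practical only for small n.

```python
import itertools
from functools import reduce

import numpy as np

SPECIES = ["H2", "O2", "H", "O", "OH", "HO2", "H2O2", "H2O"]
FORBIDDEN = {frozenset(p) for p in [("O", "H2"), ("O", "H"), ("H2", "O2"), ("O2", "H")]}


def joint_kron(marginals):
    # Flat product joint over (x_k, ..., x_1), with x_1 varying fastest (Eq. 10)
    return reduce(lambda acc, p: np.kron(np.asarray(p, float), acc),
                  marginals[1:], np.asarray(marginals[0], float))


def allowed_mask(n):
    # True where no adjacent pair in (x_n, ..., x_1) is mass-violating
    return np.array([all(frozenset((SPECIES[a], SPECIES[b])) not in FORBIDDEN
                         for a, b in zip(s, s[1:]))
                     for s in itertools.product(range(len(SPECIES)), repeat=n)])


def normalized_joint(marginals):
    """Eqs. (15)-(16) for n >= 2: rows index x_n, columns index the prefix x^{n-1}."""
    n, M = len(marginals), len(SPECIES)
    p_n = np.asarray(marginals[-1], float)
    prefix = joint_kron(marginals[:-1])                # Phi(x^{n-1})
    allowed = allowed_mask(n).reshape(M, prefix.size)  # delta_{x^n, R_c^n}
    Z = (prefix[None, :] * allowed).sum(axis=1)        # Eq. (15), one value per x_n
    phi = p_n[:, None] * prefix[None, :] * allowed     # Phi(x^n in R_c^n)
    with np.errstate(divide="ignore", invalid="ignore"):
        phi_tilde = np.where(Z[:, None] > 0, phi / Z[:, None], 0.0)  # Eq. (16)
    return phi_tilde, Z


rng = np.random.default_rng(1)
marginals = list(rng.dirichlet(np.ones(len(SPECIES)), size=3))
phi_tilde, Z = normalized_joint(marginals)
print(phi_tilde.sum(), np.allclose(phi_tilde.sum(axis=1), marginals[-1]))
```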

Now we turn to our numerical implementation and construction of the normalization factor $Z(x_n)$. Each violating sequence of length n is the result of at least one violating subsequence of length two. This observation leads to a simplification when using $Z(x_n)$. In practice, we gain efficiency when enumerating the sequences $\hat{x}^n$ by propagating forward the allowed sequences of length n − 1 and pruning any new violating sequences. These violating sequences will only have a single forbidden transition as the last, length-two subsequence. In this way, we only need a list of forbidden transitions,

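$$R^2 = \left\{(\mathrm{O},\mathrm{H_2}), (\mathrm{H_2},\mathrm{O}), (\mathrm{O},\mathrm{H}), (\mathrm{H},\mathrm{O}), (\mathrm{H_2},\mathrm{O_2}), (\mathrm{O_2},\mathrm{H_2}), (\mathrm{O_2},\mathrm{H}), (\mathrm{H},\mathrm{O_2})\right\}, \tag{17}$$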

to generate allowed sequences of length n. The iterative process starts at n = 2 with the joint probability

$$\tilde{\Phi}(\hat{x}^2) = \frac{1}{Z(x_2)}\, \Phi(\hat{x}^2 \in R_c^2). \tag{18}$$

The first joint distribution then only requires knowledge of the allowed sequences of length two, $R_c^2$. Using the fact that any sequence of length n is the result of appending a state x to a sequence of length n − 1, i.e., $\hat{x}^n = (x_n, \hat{x}^{n-1})$, the normalization factor is

$$Z(x_2) = \sum_{x_1} p(x_1)\, \delta[(x_2, x_1), R_c^2]. \tag{19}$$

Propagating $\tilde{\Phi}(\hat{x}^2)$ leads to

$$\tilde{\Phi}(\hat{X}^n) = \frac{1}{Z(x_n)}\, p(x_n)\, \tilde{\Phi}(\hat{X}^{n-1})\, \delta[(x_n, x_{n-1}), R_c^2] \tag{20}$$

with the corresponding normalizing function

$$Z(x_n) = \sum_{\hat{x}^{n-1}} \tilde{\Phi}(\hat{x}^{n-1})\, \delta[(x_n, x_{n-1}), R_c^2]. \tag{21}$$

So, by executing this iterative process, we generate all mass-conserving sequences with the desired constraints on the probability distributions.
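The iterative scheme of Eqs. (18) to (21) can be sketched with a dictionary of allowed prefixes. This is a minimal illustration under stated assumptions, not the code used for this work. Sequences are stored as tuples $(x_1, \ldots, x_n)$. The forbidden set is written out as the symmetric hydrogen/oxygen swaps. A species that cannot extend any retained prefix ($Z(x_n) = 0$) is simply dropped, which would break the contraction for that species. Starting from the first marginal at n = 1 and applying Eqs. (20) and (21) once reproduces Eqs. (18) and (19). The function name `propagate` is hypothetical.

```python
import numpy as np

SPECIES = ["H2", "O2", "H", "O", "OH", "HO2", "H2O2", "H2O"]
# Assumed forbidden ordered pairs (x_n, x_{n-1}); symmetric by construction
FORBIDDEN = {("O", "H2"), ("H2", "O"), ("O", "H"), ("H", "O"),
             ("H2", "O2"), ("O2", "H2"), ("O2", "H"), ("H", "O2")}


def propagate(marginals, species=SPECIES, forbidden=FORBIDDEN):
    """Mass-conserving joint {(x_1, ..., x_n): prob} built step by step."""
    # n = 1: the joint is the first measured marginal
    phi = {(s,): float(p) for s, p in zip(species, marginals[0]) if p > 0}
    for p_next in marginals[1:]:
        new_phi = {}
        for x, px in zip(species, p_next):
            if px <= 0:
                continue
            # Prefixes whose last species may be followed by x
            ext = {seq: q for seq, q in phi.items() if (x, seq[-1]) not in forbidden}
            Z = sum(ext.values())  # Eq. (21)
            if Z == 0:
                continue  # x cannot follow any retained prefix
            for seq, q in ext.items():
                new_phi[seq + (x,)] = px * q / Z  # Eq. (20)
        phi = new_phi
    return phi


rng = np.random.default_rng(2)
marginals = rng.dirichlet(np.ones(len(SPECIES)), size=4)
phi = propagate(marginals)
print(len(phi), sum(phi.values()))
```

Only the allowed prefixes are carried forward at each step. This avoids ever materializing the $M^n$ unrestricted sequences.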

Tree diagrams are one means of visualizing the sequences in the variational typical set.10 Here, they illustrate the effect of removing the forbidden transitions on the typical set of sequences, Fig. 3. Figure 3(b) shows a tree diagram for sequences of chemical species up to length n = 5. The three central nodes are the reactants H2, O2, and OH. Edges radiating outwards connect nodes (representing chemical species) to form sequences of increasing length. The 3M edges leading from the three initial species make up all realizable sequences of length n = 2. The species label from any interior node to the M nodes connected on the subsequent outer ring is shown in Fig. 3(a). Using this ordering, the diagram in Fig. 3(b) represents all possible sequences with n = 5.

FIG. 3.

Tree diagrams for visualizing chemical symbol sequences. (a) Labels for the outer nodes connecting the chemical species X at ring n to the M possible species at ring n + 1. This set of nodes and edges radiating outwards represents a possible transition between chemical species (potentially in a longer sequence). (b) Possible sequences of length n = 5 for M = 8 chemical species. (c) Unrestricted sequences in the variational typical set at 4600 K (yellow). (d) Restricted, mass-conserving sequences that are variationally typical (blue). Comparing (c) and (d), conserving mass prunes large branches of sequences emanating from the central reactant nodes, H2 and O2. At n = 7, these branches represent thousands of sequences that can be neglected.


The typical set of unrestricted sequences in Fig. 3(c) is a small fraction of those possible. As shown in Fig. 3(d), even fewer sequences are typical upon imposing mass conservation. Comparing (c) to (d), the sequences {H2, O2, …} are clearly removed from the VTS, along with the sequences {O2, H, …} and {O, H2, …}. As n grows, removing all restricted sequences leads to a substantial reduction in the size and an increase in the probability of the variational typical set. As an example, at 4600 K and n = 7, there are $|A_{\epsilon^*}^n| = 24{,}443$ variationally typical sequences in the unrestricted case. In the restricted case, the variational typical set consists of $|A_{\epsilon^*}^n| = 8{,}974$ sequences, a decrease of about 63% (the restricted set is about 37% the size of the unrestricted one).

For sequences with n > 3, there is more probability in the typical set after imposing mass conservation than in the unrestricted case. While removing the violating sequences for each temperature causes the typical set to hold slightly less probability at n = 2, it also suppresses the minimum in probability between n = 2 and n = 3, Fig. 4(a). The change in the typical set probability of early sequences is mirrored in the fraction of typical sequences, $W_{\epsilon^*}^n$, in Fig. 4(b). This fraction now starts lower and does not decrease as sharply from n = 2 to n = 3. More important is that the amount of probability in the atom-tracked sequences overlapping $A_{\epsilon^*}^n$ is close to one, Fig. 4(c). These data show that the typical set of sequences constructed from the product of marginal probabilities overlaps well with the typical set of sequences from pure joint probabilities. The greater the overlap probability, the better one has learned the sequences of chemical species linking reactants to products. This result is not an obvious consequence of accounting for mass conservation, because mass conservation concerns which sequences are allowed, not the probability of each sequence.

FIG. 4.

For the mass-conserving chemical symbol sequences, (a) the total probability in the typical set, $\tilde{\Phi}\{A_{\epsilon^*}^n\}$, is higher at each temperature than in the unrestricted case. (b) The fraction of variationally typical sequences, $W_{\epsilon^*}^n$, is less than 0.12 and decays monotonically. (c) The amount of probability in the atom-tracked sequences and variational typical set, $\Pr(\mathcal{M}^n \cap A_{\epsilon^*}^n)$, is close to one for all temperatures and sequence lengths. The total overlap probability here is higher than in the unrestricted case. Colors indicate temperature in kelvin.


The variational typical set consists of three components whose interactions characterize the chemical symbolic dynamics of a system: the size of the typical set, the probability in the typical set, and the parameter ϵ* that fixes the relationship between the two. The size and the probability are governed by the topological entropy rate33–35 and the joint entropy rate, Eq. (2), respectively. These quantities characterize the chemical symbol dynamics but can be challenging and expensive to calculate by brute force.

Starting from the three initial species (H2, O2, and OH), the number of possible unrestricted sequences grows exponentially as $3M^{n-1} = 3e^{(n-1)\ln M} = 3e^{(n-1)h_{\mathrm{top}}} \sim e^{n h_{\mathrm{top}}}$. The topological entropy rate, $h_{\mathrm{top}}$, is the exponential growth rate. For all temperatures we consider here, the variational typical set, $A_{\epsilon^*}^n$, also grows exponentially with n, but more slowly, at a rate given by the corresponding topological entropy rate $h_{\mathrm{top}}^*$. We determine $h_{\mathrm{top}}^*$ from the slope of a linear fit to $\ln|A_{\epsilon^*}^n|$ against n for each temperature. From the simulation data, $h_{\mathrm{top}}^*$ scales linearly with temperature, Fig. 5(a), as $h_{\mathrm{top}}^* = \gamma_1 T + \gamma_2$, where $\gamma_1 = 3.387 \times 10^{-5}$ and $\gamma_2 = 1.424$ (T in K, entropy rates in nats/symbol). The significance of this linear relationship is that it could be possible to extrapolate the number of sequences in $A_{\epsilon^*}^n$ to different temperatures without direct enumeration of chemical sequences or further molecular dynamics simulations. To test this hypothesis, we ran additional calculations of $|A_{\epsilon^*}^n|$ at 6800 K and 2400 K. At both temperatures, the estimate of $h_{\mathrm{top}}^*$ from the linear fit in Fig. 5(a) was within 2% of the topological entropy rate from direct enumeration of sequences.

FIG. 5.

(a) The growth rate for the size of the variational typical set $|A_{\epsilon^*}^n|$, a topological entropy rate $h_{\mathrm{top}}^*$, increases linearly with temperature. (b) The joint entropy rate, $h_\Phi$, also increases linearly with temperature, though with a different slope. Data shown are for the restricted case. Both entropy rates are in nats/symbol.


The topological entropy rate for the typical set defines the rate at which sequences join the variational typical set as n grows. The equivalent quantity for the probability in the typical set is $h_\Phi$, Eq. (7). We note that $h_\Phi$ differs from the empirical average of $-n^{-1}\ln\Phi(\hat{x}^n)$. The neighborhood around the empirical mean defining the typical sequences is determined by $\epsilon^*$. We find that the entropy rate $h_\Phi$ also scales linearly with temperature, Fig. 5(b), such that $h_\Phi = \beta_1 T + \beta_2$, where the linear fit (solid blue line) has the best-fit parameters $\beta_1 = 7.24 \times 10^{-5}$ and $\beta_2 = 0.77$. These linear relationships between temperature and the entropy rates governing the size and probability of $A_{\epsilon^*}^n$ for hydrogen combustion also hold for the unrestricted typical sequences.
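The fitting and extrapolation steps described in this section reduce to two ordinary least-squares lines. The sketch below assumes that $h_{\mathrm{top}}^*$ is the slope of $\ln|A_{\epsilon^*}^n|$ against n and that each entropy rate is a straight-line function of T. The function names are hypothetical. The constants are the fitted values $\gamma_{1,2}$ and $\beta_{1,2}$ reported in the text (T in K, rates in nats/symbol).

```python
import numpy as np

# Fitted coefficients reported in the text (T in K, rates in nats/symbol)
GAMMA1, GAMMA2 = 3.387e-5, 1.424  # h_top^* = GAMMA1 * T + GAMMA2
BETA1, BETA2 = 7.24e-5, 0.77      # h_Phi   = BETA1 * T + BETA2


def topological_entropy_rate(ns, sizes):
    """Slope of a least-squares line through ln|A^n| versus n."""
    slope, _intercept = np.polyfit(np.asarray(ns, float), np.log(np.asarray(sizes, float)), 1)
    return slope


def fit_temperature_dependence(temperatures, rates):
    """Least-squares line rate = slope * T + intercept; returns (slope, intercept)."""
    slope, intercept = np.polyfit(np.asarray(temperatures, float), np.asarray(rates, float), 1)
    return slope, intercept


def predict_rate(T, slope, intercept):
    """Evaluate the linear temperature model at T (scalar or array)."""
    return slope * np.asarray(T, float) + intercept


for T in (2400.0, 6800.0):
    print(T, predict_rate(T, GAMMA1, GAMMA2), predict_rate(T, BETA1, BETA2))
```

With the reported coefficients, the linear models give $h_{\mathrm{top}}^* \approx 1.505$ and $1.654$ nats/symbol at 2400 K and 6800 K. At the same temperatures, they give $h_\Phi \approx 0.944$ and $1.262$ nats/symbol, consistent with $h_\Phi < h_{\mathrm{top}}^*$.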

The linear growth of $h_{\rm top}^*$ with temperature reflects the intuition that the number of unique sequences grows more rapidly with $n$ at higher temperatures. Because higher temperatures drive a more rapid expansion of $|A_{\epsilon^*}^{n}|$ with $n$, the joint probability of each sequence decays more quickly with $n$. Comparing Figs. 5(a) and 5(b), we see that $h_\Phi < h_{\rm top}^*$ at all temperatures: the probability of each typical sequence decays more slowly than the number of typical sequences increases. These findings are potentially useful for combustion modeling that seeks predictive power and fidelity at unexplored thermodynamic conditions. The generality of these relationships for other fuels, conditions, and observables, however, remains to be tested.
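
The comparison between the two panels can be made concrete with the reported fit parameters alone. The sketch below (plain Python; function names are hypothetical) evaluates both fitted lines at the two test temperatures mentioned earlier and computes where they would intersect. With these parameters, the gap $h_{\rm top}^* - h_\Phi$ is about 0.56 nats/symbol at 2400 K and about 0.39 nats/symbol at 6800 K; it narrows with $T$ because $\beta_1 > \gamma_1$, and the fitted lines would only cross near $1.7 \times 10^4$ K. That crossing is a property of the fitted lines far above 6800 K, not a prediction about the chemistry.

```python
# Linear fits reported in the text; entropy rates in nats/symbol, T in K.
GAMMA1, GAMMA2 = 3.387e-5, 1.424  # h_top* = GAMMA1 * T + GAMMA2
BETA1, BETA2 = 7.24e-5, 0.77      # h_Phi  = BETA1 * T + BETA2


def h_top_star(temperature_k: float) -> float:
    """Fitted topological entropy rate of the variational typical set."""
    return GAMMA1 * temperature_k + GAMMA2


def h_phi(temperature_k: float) -> float:
    """Fitted joint entropy rate h_Phi."""
    return BETA1 * temperature_k + BETA2


def entropy_rate_gap(temperature_k: float) -> float:
    """h_top* - h_Phi; positive means the number of typical sequences grows
    faster than the probability of each typical sequence decays."""
    return h_top_star(temperature_k) - h_phi(temperature_k)


def crossover_temperature() -> float:
    """Temperature at which the two fitted lines intersect (gap == 0)."""
    return (GAMMA2 - BETA2) / (BETA1 - GAMMA1)


if __name__ == "__main__":
    for T in (2400.0, 6800.0):
        print(f"T = {T:.0f} K: h_top* = {h_top_star(T):.4f}, "
              f"h_Phi = {h_phi(T):.4f}, gap = {entropy_rate_gap(T):.4f}")
    print(f"fitted lines cross near T = {crossover_temperature():.0f} K")
```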

The variational typical set is a systematic framework for determining the chemical sequences that characterize complex chemical reactions. It requires no assumption of equilibrium and no rate coefficients. Using data from molecular dynamics simulations, we have shown excellent agreement between the typical sequences and those found by tracking atoms. For a simulated reaction that is just over half complete, the total probability of atom-tracked sequences that are also typical was greater than 90%, even though the VTS used only experimentally accessible information (probabilities of chemical species). This improved agreement stems from including mass conservation in the variational typical set; in general, this constraint improves the ability to capture probability in the typical set and, thus, to learn the mechanism of the reaction. For data at different temperatures, we find that the characteristic entropy rates governing the exponential growth in the number of sequences and the exponential decay of sequence probabilities are both linear functions of temperature for hydrogen oxidation. These linear relationships could be used to extrapolate the number and probability of sequences needed to characterize a combustion reaction as the temperature is varied. This finding could be useful when comparing different combustion models36,37 or when aiming to minimize costly simulations or experiments.

The variational typical set requires knowing all possible sequences of length $n$. At the moment, generating all possible sequences is computationally expensive, even for reactions such as hydrogen oxidation. Future work is necessary to develop methods that can efficiently calculate the size and probability of the VTS without explicitly checking whether each sequence is typical. For independent and identically distributed (i.i.d.) random variables, progress has been made, and it is possible to propagate the size and probability of the typical set forward without direct enumeration. Further work is necessary to propagate the independent distributions we consider here, which, when combined with the variational typical set, could increase the accuracy of modeling combustion reactions.
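
For the i.i.d. case described in the preceding paragraph, one standard way to avoid enumeration is the method of types, which we swap in here purely as an illustration; it is not necessarily the approach behind the progress mentioned above. It uses the ordinary weakly typical set with a fixed $\epsilon$, not the variational typical set, and ignores mass-conservation constraints. Because every i.i.d. sequence with the same symbol counts has the same probability, the size and probability of the typical set can be accumulated over the $\binom{n+K-1}{K-1}$ count vectors instead of the $K^n$ sequences. The distribution `p`, length `n`, and tolerance `eps` are hypothetical, and the brute-force function serves only as a cross-check for small $n$.

```python
import itertools
import math
from typing import Iterator, Sequence, Tuple


def compositions(n: int, k: int) -> Iterator[Tuple[int, ...]]:
    """Yield every k-tuple of non-negative symbol counts that sums to n.

    Stars and bars: choose k - 1 bar positions among n + k - 1 slots; the
    gaps between consecutive bars are the counts of each symbol.
    """
    for bars in itertools.combinations(range(n + k - 1), k - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(n + k - 2 - prev)
        yield tuple(counts)


def multinomial(n: int, counts: Sequence[int]) -> int:
    """Number of distinct length-n sequences with the given symbol counts."""
    result = math.factorial(n)
    for c in counts:
        result //= math.factorial(c)
    return result


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats of a distribution with all p_i > 0."""
    return -sum(pi * math.log(pi) for pi in p)


def log_prob_of_counts(counts: Sequence[int], p: Sequence[float]) -> float:
    """ln P(x) shared by every i.i.d. sequence x with these symbol counts."""
    return sum(c * math.log(pi) for c, pi in zip(counts, p))


def typical_set_by_types(p: Sequence[float], n: int, eps: float) -> Tuple[int, float]:
    """Size and probability of {x : |-(1/n) ln P(x) - H(p)| <= eps},
    accumulated over types rather than individual sequences."""
    if any(pi <= 0 for pi in p):
        raise ValueError("this sketch assumes every symbol probability is > 0")
    h = entropy(p)
    size, prob = 0, 0.0
    for counts in compositions(n, len(p)):
        lp = log_prob_of_counts(counts, p)
        if abs(-lp / n - h) <= eps:  # a type is either wholly typical or not
            m = multinomial(n, counts)
            size += m
            prob += m * math.exp(lp)
    return size, prob


def typical_set_brute_force(p: Sequence[float], n: int, eps: float) -> Tuple[int, float]:
    """Same quantities by enumerating all len(p)**n sequences (small n only)."""
    h = entropy(p)
    size, prob = 0, 0.0
    for seq in itertools.product(range(len(p)), repeat=n):
        counts = [seq.count(i) for i in range(len(p))]
        lp = log_prob_of_counts(counts, p)
        if abs(-lp / n - h) <= eps:
            size += 1
            prob += math.exp(lp)
    return size, prob


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical i.i.d. symbol probabilities
    n, eps = 8, 0.1      # hypothetical sequence length and tolerance
    print(typical_set_by_types(p, n, eps))     # 45 types examined
    print(typical_set_brute_force(p, n, eps))  # 3**8 = 6561 sequences examined
```

The number of types grows only polynomially in $n$, which is what makes forward propagation feasible in the i.i.d. case; the time-dependent, mass-constrained sequences studied here do not share this simple structure.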

This material is based upon work supported by the U.S. Army Research Laboratory and the U.S. Army Research Office under Grant No. W911NF-14-1-0359. We acknowledge the use of the supercomputing facilities managed by the Research Computing Group at the University of Massachusetts Boston as well as the University of Massachusetts Green High Performance Computing Cluster.

1. M. J. Pilling and P. W. Seakins, Reaction Kinetics, 1st ed. (Oxford University Press, 1995).
2. A. K. Patnaik, I. Adamovich, J. R. Gord, and S. Roy, “Recent advances in ultrafast-laser-based spectroscopy and imaging for reacting plasmas and flames,” Plasma Sources Sci. Technol. 26(10), 103001 (2017).
3. J. R. Gord, T. R. Meyer, and S. Roy, “Applications of ultrafast lasers for optical measurements in combusting flows,” Annu. Rev. Anal. Chem. 1, 663–687 (2008).
4. C. K. Law, Combustion Physics (Cambridge University Press, 2006).
5. S. Bai, M. J. Davis, and R. T. Skodje, “Sum over histories representation for kinetic sensitivity analysis: How chemical pathways change when reaction rate coefficients are varied,” J. Phys. Chem. A 119(45), 11039–11052 (2015).
6. S. Bai and R. T. Skodje, “Simulating chemical kinetics without differential equations: A quantitative theory based on chemical pathways,” J. Phys. Chem. Lett. 8(16), 3826–3833 (2017).
7. M. Alaghemandi and J. R. Green, “Reactive symbol sequences for a model of hydrogen combustion,” Phys. Chem. Chem. Phys. 18(4), 2810–2817 (2016).
8. J. Warnatz, “Resolution of gas phase and surface combustion chemistry into elementary reactions,” Symp. (Int.) Combust. 24, 553–579 (1992).
9. R. Van de Vijver, N. M. Vandewiele, P. L. Bhoorasingh, B. L. Slakman, F. Seyedzadeh Khanshan, H.-H. Carstensen, M. F. Reyniers, G. B. Marin, R. H. West, and K. M. Van Geem, “Automatic mechanism and kinetic model generation for gas- and solution-phase processes: A perspective on best practices, recent advances, and future challenges,” Int. J. Chem. Kinet. 47(4), 199–231 (2015).
10. S. B. Nicholson, M. Alaghemandi, and J. R. Green, “Learning the mechanisms of chemical disequilibria,” J. Chem. Phys. 145(8), 084112 (2016).
11. G. T. Craven, T. Bartsch, and R. Hernandez, “Chemical reactions induced by oscillating external fields in weak thermal environments,” J. Chem. Phys. 142(7), 074108 (2015).
12. G. T. Craven, A. Junginger, and R. Hernandez, “Lagrangian descriptors of driven chemical reaction manifolds,” Phys. Rev. E 96(2), 022222 (2017).
13. F. Revuelta, G. T. Craven, T. Bartsch, F. Borondo, R. M. Benito, and R. Hernandez, “Transition state theory for activated systems with driven anharmonic barriers,” J. Chem. Phys. 147(7), 074104 (2017).
14. R. W. Yeung, A First Course in Information Theory (Springer, 2002), Vol. 1.
15. M. J. Davis and R. T. Skodje, “Geometric investigation of low-dimensional manifolds in systems approaching equilibrium,” J. Chem. Phys. 111(3), 859–874 (1999).
16. J. M. Simmie, “Detailed chemical kinetic models for the combustion of hydrocarbon fuels,” Prog. Energy Combust. Sci. 29(6), 599–634 (2003).
17. U. Maas and T. Blasenbrey, “ILDMs of higher hydrocarbons and the hierarchy of chemical kinetics,” Proc. Combust. Inst. 28(2), 1623–1630 (2000).
18. S. Porras, V. Bykov, V. Gol’dshtein, and U. Maas, “Joint characteristic timescales and entropy production analyses for model reduction of combustion systems,” Entropy 19(6), 264 (2017).
19. C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J. 27, 623–656 (1948).
20. T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. (Wiley, 2006).
21. K. L. Chung, “A note on the ergodic theorem of information theory,” Ann. Math. Stat. 32(2), 612–614 (1961).
22. P. H. Algoet and T. M. Cover, “A sandwich proof of the Shannon-McMillan-Breiman theorem,” Ann. Probab. 16(2), 899–909 (1988).
23. B. McMillan, “The basic theorems of information theory,” Ann. Math. Stat. 24(2), 196–219 (1953).
24. S. B. Kylasa, H. M. Aktulga, and A. Y. Grama, “PuReMD-GPU: A reactive molecular dynamics simulation package for GPUs,” J. Comput. Phys. 272, 343–359 (2014).
25. K. Chenoweth, A. C. T. van Duin, and W. A. Goddard III, “ReaxFF reactive force field for molecular dynamics simulations of hydrocarbon oxidation,” J. Phys. Chem. A 112(5), 1040–1053 (2008).
26. S. Agrawalla and A. C. T. van Duin, “Development and application of a ReaxFF reactive force field for hydrogen combustion,” J. Phys. Chem. A 115(6), 960–972 (2011).
27. T. Cheng, A. Jaramillo-Botero, W. A. Goddard III, and H. Sun, “Adaptive accelerated ReaxFF reactive dynamics with validation from simulating hydrogen combustion,” J. Am. Chem. Soc. 136(26), 9434–9442 (2014).
28. T. P. Senftle, S. Hong, M. Islam, S. B. Kylasa, Y. Zheng, Y. K. Shin, C. Junkermeier, R. Engel-Herbert, M. J. Janik, H. M. Aktulga et al., “The ReaxFF reactive force-field: Development, applications and future directions,” npj Comput. Mater. 2, 15011–15025 (2015).
29. S. Nosé, J. Chem. Phys. 81, 511 (1984).
30. W. G. Hoover, Phys. Rev. A 31, 1695 (1985).
31. D. J. Evans and B. L. Holian, “The Nose-Hoover thermostat,” J. Chem. Phys. 83(8), 4069 (1985).
32. M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids (Clarendon Press, Oxford, UK, 1987).
33. R. L. Adler, A. G. Konheim, and M. H. McAndrew, “Topological entropy,” Trans. Am. Math. Soc. 114(2), 309–319 (1965).
34. R. Bowen, “Entropy for group endomorphisms and homogeneous spaces,” Trans. Am. Math. Soc. 153, 401–414 (1971).
35. C. Beck and F. Schlögl, Thermodynamics of Chaotic Systems (Cambridge University Press, 1993).
36. J. M. Desantes, J. J. López, J. M. García-Oliver, and D. López-Pintor, “Experimental validation and analysis of seven different chemical kinetic mechanisms for n-dodecane using a rapid compression-expansion machine,” Combust. Flame 182, 76–89 (2017).
37. K. K. Kuo, Principles of Combustion (John Wiley, 2005).