We present a lattice model developed to interpret oligonucleotide hybridization experiments beyond the two-state, all-or-none description. Our model is a statistical extension of the nearest-neighbor model in which all possible combinations of broken and intact base pairs in the duplex state are considered explicitly. The conformational degrees of freedom of unpaired nucleotides in the single-strand or duplex state are modeled as self-avoiding walks of the polymer chain on a cubic lattice. Translational entropy and concentration effects are modeled through a coarser lattice of single-strand sized sites. Introducing a single free parameter for the excess entropy per unpaired nucleotide results in reasonable agreement with experiment. While the model provides a generally applicable tool, we illustrate specifically how it is used to interpret equilibrium and nonequilibrium infrared spectroscopy measurements and validate that the model correctly captures sequence and length dependent effects for sequences up to 18 nucleotides. Model predictions are directly related to experiments through computed melting curves. Calculated free energy surfaces offer insight into the interpretation of temperature-jump measurements of oligonucleotide dehybridization. The model captures the interplay between configurational variation and the enthalpic stabilization of base pairing contacts in the context of a minimalist statistical description of DNA hybridization and offers useful insight beyond the simplest all-or-none picture.
The thermodynamics and kinetics of nucleic acid folding have been studied over several decades using a wide variety of experimental techniques that offer complementary information about the hybridization process. These methods include electronic1–6 and vibrational spectroscopies,7–9 nuclear magnetic resonance (NMR),10–12 fluorescence-based measurements,13–17 calorimetry,18–20 and single molecule optical tweezers21–23 to name a few. Regardless of the experimental observable, the simplest and most widely applied model of hybridization thermodynamics and kinetics invokes the so-called all-or-none picture, in which all possible base pairs in a duplex are assumed fully intact and the transition to the single-strand state is marked by a concerted loss of all base pairing contacts. Although this assumption is useful and adequate in many contexts, it has long been recognized that the all-or-none description is an oversimplification that offers little insight into the underlying molecular mechanism. Molecular dynamics (MD) simulations are capable of modeling the finest details of hybridization,24–26 but relating simulated folding trajectories to experimental observables is nontrivial and running high-level MD simulations can present challenges to nonspecialists. As a result, simple statistical models that can help explain experimental results in detail beyond the simplest all-or-none picture, but that lack finer detail and complexity, serve an important role.
Among experimental methods, we and others have found infrared (IR) spectroscopy to be a valuable technique in the study of nucleic acid structure,27–29 hydration,30,31 folding dynamics,8,9,32 and energy relaxation.33,34 The toolbox of IR methods offers label-free structural information, and time-resolved IR spectroscopy can access dynamics down to the picosecond time scale. Across the mid-IR, characteristic frequency ranges report on the sugar conformation, phosphate backbone orientation, glycosidic bond geometry, and the vibrations of the nucleobases themselves.27 The 1500–1800 cm−1 range contains in-plane nucleobase modes and carbonyl stretches that are particularly useful for studying nucleic acid folding since the same physical interactions that mediate hybridization, namely, hydrogen bonding and base stacking, are also largely responsible for shaping the infrared spectrum in this region. As a result, the line shapes, intensities, and frequencies of these IR transitions are sensitive base-specific reporters of nucleic acid structure and, when tracked in time, base pairing dynamics.
Extracting the complete information content of an IR measurement requires a rigorous mapping between the conformation of the system and the spectrum corresponding to a particular distribution of structures. Unfortunately, no simple structure-spectrum correlation exists for nucleic acids due to the highly delocalized nature of the nucleobase vibrations.35 As a consequence, the IR spectrum of DNA cannot be modeled using a basis of local vibrational modes such has been done for protein amide I spectroscopy, where the spectrum can be effectively described in terms of coupled vibrations of the polypeptide backbone.36,37 Theoretical work on identifying suitable basis modes and coupling parameters for simulating the IR spectroscopy of adenine-thymine (AT) and guanine-cytosine (GC) duplexes has been established,38–41 but as of now, there is no general quantitative model for interpreting infrared spectra in terms of nucleic acid structure.
Turning to simplified models based on empirical results offers one route forward. Recently, we have found that a simple lattice model can offer insight into the interpretation of both linear and two-dimensional IR (2D IR) experiments applied to study DNA oligonucleotide dehybridization.29 Lattice models of varying complexity have been used to successfully describe the melting transition and folding dynamics of both DNA42–44 and RNA45 and have been used extensively to simulate various aspects of protein folding.46–48 More generally, these descriptions fall into a broad class of statistical thermodynamic models rooted in formulating a partition function on the basis of structural contacts, such as the Muñoz-Eaton49 and Gō models.50 Our motivation for developing a lattice model as opposed to a more detailed representation is to determine the simplest ensemble description consistent with our experimental results, thereby identifying essential degrees of freedom through a computationally inexpensive and intuitive approach. In the context of IR spectroscopy, since spectra in the 1500–1800 cm−1 range report base-specifically on the extent of paired nucleobase contacts, a simple model that describes hybridization in these terms could be a natural tool for structurally interpreting experiments. In fact, we recently used such a model to discuss spectral features indicative of DNA hybridization and for melting curve analysis.29,32
Here, we present our model in detail and validate its utility by interpreting infrared experiments in terms of the predicted conformational ensemble. At its core, the model is a statistical extension of the nearest-neighbor (NN) model commonly used to predict DNA melting temperatures (Tm).51,52 The NN model decomposes the thermodynamics of the duplex to single-strand transition into additive contributions from the dinucleotide steps that make up a given oligonucleotide sequence. This approach explicitly assumes an all-or-none description of base pairing. Although this assumption proves remarkably successful at predicting hybridization thermodynamics for most nucleobase sequences, it neglects potential structural heterogeneity within the duplex state. Previous models that incorporate the NN parameters into statistical treatments have been effective at simulating nucleic acid ensembles and thermodynamics as well as providing a unified description of the thermal denaturation of both oligomers and polymers.44,53 Recently the RNA NN model has been advanced beyond the all-or-none assumption by deriving revised parameters based on a fit to a partition function model, and this approach has resulted in improved agreement with experimental melting curves.54
In the first half of the paper, we present the details of the model and identify useful connections to experiment. In the second half, we demonstrate the utility of the model in interpreting IR experiments on the melting of DNA oligonucleotides of varying sequence composition and length. Although the model relies on many simplifying assumptions, it has nevertheless proven an effective tool for interpreting DNA dehybridization experiments beyond the all-or-none description.
LATTICE MODEL FOR DNA HYBRIDIZATION
Details of the model
Our model of DNA melting and hybridization uses simple statistical mechanics to calculate thermodynamic variables, melting curves, and equilibrium constants using a partition function obtained from a coarse set of conformational microstates. At the finest level, discretized base pairing is enumerated such that all possible combinations of broken and intact contacts are generated and the enthalpy of a particular base pair configuration is set using the NN parameters. Configurational entropy is determined by self-avoiding random walks (SAWs) of the oligonucleotide chains on a cubic lattice of nucleotide sized sites. Finally, concentration effects associated with the gain in translational entropy upon duplex dissociation are simulated on a separate 3D lattice of single-strand sized sites. Throughout, we use the term “single-strand” to refer to an oligonucleotide that has zero base pair contacts, while a “duplex” refers to configurations involving any number of base pair contacts between two complementary strands.
The model describes the dissociation of a duplex (D) into two self-complementary single-strands (S) such that D ⇌ 2S. The results are readily generalized for heteroduplex dissociation, D ⇌ A + B, which is also summarized. As such, this treatment reflects a traditional two-state model for bimolecular dissociation but with explicitly enumerated single-strand and duplex ensembles. We construct the partition function starting with the assumption that the duplex and single-strand degrees of freedom are independent of one another and that the partition function, Q, can be separated into the product of partition functions for the components of the system, Qi, where i = S, D. We also assume that the external (translational) and internal (conformational) degrees of freedom for each species are separable and decompose the partition function for each species into the product
In our model, each of these degrees of freedom is coarse grained onto a separate lattice.
The external partition function
The external partition functions are used to describe the excess translational entropy available to the single-strand over the duplex as well as the decreasing free energy of the duplex as the total concentration of oligonucleotides is increased. In analogy to lattice gas models,48,55 this partition function is calculated by considering the number of possible single-strand and duplex arrangements on a 3D lattice of oligonucleotide sized sites. The number of lattice sites X is determined by dividing the total system volume V by the volume of the single-strand v. The number of ways of placing NS single-strands on a lattice of X sites is
However, under dilute conditions when NS ≪ X, this expression can be approximated by
Note that this expression is the same as the partition function for an ideal gas in which the quantization volume has been replaced with the volume of the lattice site.
In practice, we calculate the translational lattice site volume from the average radius of gyration for an ideal chain with L nucleotide units separated by length η: . We set the length of a single nucleotide unit to η = 0.34 nm.56 The number of sites on the translational lattice is then determined by X = V/v, where v = 4πRG3/3 = 0.0112 L3/2 nm3.
In the context of the external partition function, we define a duplex as any single-strand with at least one other single-strand occupying any one of b adjacent lattice sites, with b = 6 in the case of a cubic lattice. The number of ways of arranging 2ND oligonucleotides on the lattice such that they are in a duplex configuration is therefore equivalent to arranging ND oligonucleotides in any of the X sites on the lattice and restricting ND oligonucleotides to the b adjacent sites around each occupied site on the lattice. This definition of the duplex also introduces the entropy associated with one rotational degree of freedom of the adjacent single-strands. The expression for the duplex external partition function in the dilute limit for homo-duplexes is then
while for hetero-duplexes, assuming [A] = [B],
We assume that the translational configurational degeneracy Ωi for molecules on a cubic lattice is a sufficient representation of the external partition function Qi,ext for the single-strand and duplex. Then, the partition function for Ni indistinguishable copies of component i, Eq. (1), is
where qi,int is the molecular partition function for component i. Note that the indistinguishability of duplexes or single-strands is accounted for here in the external terms of the partition function.
The molecular partition function
The molecular partition function for component i, qi,int is determined from the sum of Boltzmann weights across all possible conformational microstates
where Ei,α is the energy of microstate α. In principle, qi,int includes contributions from the entire phase space, but our primary focus here is the stabilization due to the formation of base pairing contacts as well as the conformational degrees of freedom of the nucleotide chain, which we represent with a self-avoiding walk on a cubic lattice. We choose the single-strand to be the reference state with ES,α = 0, such that its partition function is the conformational degeneracy of single-strand states qS,int = WS which is set by the number of SAWs on a cubic lattice for a strand of L total nucleotides.
For the duplex, the interplay between energy stabilization due to base pair formation and the increase in conformational entropy due to unpaired nucleotides must be considered. In our model, the free energy of the duplex is defined solely by these terms. Here, we include only those duplex microstates with in-register base pairing, which one would expect to predominate for short and heterogeneous sequences. As we will see, this assumption proves reasonable for the sequences considered here (L 20). The energies for microstates of the duplex are obtained from the experimentally derived parameters of the NN model.51 Although the NN model is parameterized with respect to Gibbs free energy rather than Helmholtz free energy, our model assumes that the differences due to the PV term are negligible and equate the lattice model energy for a duplex configuration ED,α to enthalpies from the NN model.
Since we would like to evaluate sequence effects, we explicitly enumerate all possible in-register base pair configurations. An oligonucleotide sequence is therefore represented as a string of L sites which can exist in either an open (broken base pair) or closed (intact base pair) state. Besides nucleobase sequence, a configuration is characterized by a number of intact base pairs ρ and a number of broken base pairs l = L − ρ. For all 2L combinations of open and closed sites, the energy of a given duplex microstate, ED,α, is assigned by taking the sum of dinucleotide enthalpies from the NN model across the intact dinucleotide subunits of the microstate base pairing configuration, as shown in Fig. 1 for two illustrative microstates,
The experimentally derived dinucleotide enthalpies HNN take into account sequence specific hydrogen-bonding and base-stacking effects.51 For isolated pairing contacts that have a hydrogen bonding contribution in the absence of a stacking neighbor, an additional enthalpic stabilization of −2.1 kJ/mol for an AT pair and −3.1 kJ/mol for a GC pair is assigned based on an experimentally informed estimate for the stabilization due solely to hydrogen bonding.57
In practice, since the microstate energy is set only by the intact base pairs, the internal partition function of the duplex is calculated as a sum over all possible binary permutations of broken and intact base pairs for the strand, which we denote δ,
Here, WD,δ is the conformational degeneracy associated with base pair arrangement δ, and WD,δ is generated by allowing unpaired bases to explore all possible arrangements on a cubic lattice of nucleotide-sized sites through self-avoiding random walks.
Assuming in-register base pairing, a broken base pair or stretch of broken base pairs can exist in only one of two scenarios. A frayed end refers to a stretch of l = 1 to l = (L − 1) broken base pairs that includes a broken terminal pair, while an internal loop refers to a stretch of l = 1 to l = (L − 2) broken base pairs flanked by at least one intact base pair on either side. An example of a microstate with a frayed end of l = 5 nucleotides for an L = 10 duplex is depicted in Fig. 1(a), and an example of a microstate with an internal loop of l = 6 nucleotides is depicted in Fig. 1(b) for the same sequence. For simplicity, frayed ends although self-avoiding are not restricted to avoid clashing with the complementary strand.
The length scaling relationships for self-avoiding walks and self-avoiding polygons (SAPs) are well established.58,59 In practice, we explicitly model beaded polymer chains up to L = 11 on a cubic lattice and then fit the asymptotic expressions in Fig. 2 to determine the appropriate prefactor for generating the number of SAWs and SAPs in the L = 1–18 range.59 A frayed end is modeled as a SAW, while an internal loop is modeled as a SAP. Figure 2 shows the configurational degeneracy for frayed ends and loops calculated in this way. These asymptotic forms were determined from numerical studies of polymer chains on a cubic lattice.58 The results from our own explicit enumeration of walks and the prefactors from the fits are in good agreement with a previous report44 although we divide by a factor of six to separate out rotational degrees of freedom on the lattice. It should be noted that the model does not include any parameters that impose cooperativity effects explicitly, but that these effects enter naturally through the NN parameters.
Connecting the model to experimental observables
One key prediction of the lattice model that we compare to experiments is the equilibrium constant for the dissociation reaction, Kd. Evaluating the chemical potential with Eq. (6) and applying the equilibrium condition, 2μS = μD, results in an expression for the reaction quotient, K,
which is related to the dissociation constant through
Here, cS is the single-strand concentration, cD is the duplex concentration, c0 is a standard state concentration of 1 mol/L, Ni = ciV, and is Avogadro’s number. Equation (12) is a convenient expression for the equilibrium constant in terms of the molecular partition functions, the contact number for the translational lattice, and the single-strand volume.
Spectroscopic thermal melting curves are one of the most common experimental approaches to studying the DNA duplex to single-strand transition. In this context, the extent of dehybridization is often expressed in terms of the fraction of intact base pairs as a function of temperature θ(T) under the assumption that changes in absorbance with temperature can be mapped to the number of broken base pairs in the DNA ensemble.60 The overall contact fraction is expressed as the product of an internal and external fraction
Here, θext(T) is the fraction of DNA strands with at least one intact base pair out of the total number of DNA strands, and θint(T) is the average fraction of contacts among those DNA with at least one intact base pair.
With the framework of the lattice model in place, we can calculate the internal fraction of intact base pairs in terms of the internal duplex partition function qD,int
where ρδ corresponds to the number of intact base pairs in base pair configuration δ.
For θext(T), we use the dissociation constant Kd from Eq. (12) and the total concentration of DNA strands [CTot] to arrive at an expression for the fraction of oligonucleotide strands out of the total number of DNA strands that have at least one intact base pair
This expression is equivalent to the duplex fraction θD(T) routinely invoked to model the melting curve when assuming a two-state all-or-none duplex to single-strand transition. Assuming that [A] = [B] in the hetero-duplex case, [CTot]/4 is substituted in place of [CTot] in Eq. (15). The product of Eqs. (14) and (15), θ(T), is assumed to model the melting curve. The melting temperature, Tm, is defined as the temperature where θ(T) = 0.5 and is often reported as a convenient proxy for DNA duplex stability.
Parameterization of the model
In reality, by reducing each nucleotide to a single bead fixed on a cubic lattice, this model neglects many additional degrees of freedom and will clearly underestimate the conformational and vibrational entropy change severely, leading to nonphysical results. For example, modeling the 10-mer 5′-GATATATATC-3′ produces a reasonable sigmoidal melting curve, but Tm and melting transition range are approximately a factor of 10 too large. This motivates an additional correction to the model that can account for additional entropy changes upon dehybridization.
We introduce an excess entropy parameter κ, which accounts for the missing degrees of freedom in an average way on a per-base basis. The magnitude of this entropy correction grows exponentially with the number of free beads in a base-pair arrangement,45 and the internal molecular partition functions become
where κ is raised to the power of the number of free nucleotides l in base pairing arrangement δ. The value of κ is assumed to be independent of temperature and set such that the lattice melting curve is shifted to a temperature range that agrees with experiment. As seen in Fig. 3(a), applying this entropy correction results in agreement with the experimental melting curve measured by IR spectroscopy for this oligonucleotide when κ = 26. The length and sequence dependence of this parameter are discussed in the following sections that describe the validation of the model with respect to these variables. In the absence of experimental data, the NN model Tm can just as easily be used to set the value of κ for a given sequence.
The standard free energy of duplex dissociation as a function of temperature is plotted in Fig. 3(b) for the entropy-corrected lattice model. As a further check on the influence of κ, we compare the predicted ΔF° at 37 °C against the NN model since the latest salt-corrected version of the NN model computes this value using a set of experimentally derived corrections to account for the effects of both monovalent and divalent cations at physiological temperature.52 The agreement between the two models is reasonable despite the fact that the lattice model does not include any explicit correction for cation concentration. However, cation effects on DNA hybridization are known to be primarily entropic, so it is likely that the salt concentration also influences the value of κ when parameterizing against experiment.51,52
VALIDATION OF THE MODEL
Determining the role of sequence composition and nucleobase ordering in DNA hybridization is a question which is nearly as old as the discovery of the DNA double helix itself and which has been investigated in some detail.1,3,4 We have previously studied the sequence-dependent mechanism of DNA dehybridization through both steady-state and time-resolved IR spectroscopy, finding that the placement of GC pairs in an otherwise AT sequence shapes the dehybridization mechanism.29,32 To validate that the lattice model effectively captures the sequence dependent effects observed in IR experiments, we return to our well-studied set of self-complementary DNA oligonucleotides: 5′-ATATGCATAT-3′ (GC-core), 5′-ATGATATCAT-3′ (GC-mix), and 5′-GATATATATC-3′ (GC-ends). We will refer to each of the sequences by the shorthand name in parentheses for convenience.
Figure 4(b) shows an illustrative set of temperature dependent IR spectra for the GC-core sequence acquired between 5 °C and 90 °C. For the purposes of validating the lattice model, we are interested in how well the shape of the melting curve is reproduced as a function of the nucleobase sequence. Experimental melting curves that reflect the global spectral changes as a function of temperature in the 1500–1750 cm−1 range were determined by taking the normalized second singular value decomposition (SVD) component and subtracting off linear fits to the sloping baselines, as described previously.29 The resulting experimental melting curves for these sequences are shown in Fig. 4(a). These melting curves report primarily on the fraction of intact base pairing in the oligonucleotide ensemble and should therefore relate directly to θ(T) computed by the lattice model. Figure 4(c) shows the corresponding θ(T) calculated for each of the sequences after parameterizing κ against experiment. The magnitude of the entropy scaling parameter κ takes on values of 26, 28, and 29 for GC-ends, GC-mix, and GC-core, respectively. Although κ does not vary significantly across the oligonucleotide set, it does appear to track the trend in melting temperatures.
For the simplest description of a duplex to single-strand transition, the so-called all-or-none picture, all possible base pairs are either fully intact (duplex) or fully broken (single-strand). That is, θint(T) can only assume a value of 1. The melting curve in this case is a direct reporter of the duplex fraction and takes the form of a symmetric sigmoid with its inflection point centered on Tm. The GC-ends melting curve best exemplifies such an all-or-none dehybridization. For the other sequences, a more diverse duplex ensemble characterized by duplex fraying prior to dissociation is observed. Since these partially melted duplex configurations are most prevalent at temperatures below Tm, the result is a distortion of the low temperature side of the melting curve away from a symmetric sigmoid along with a more gradual melting transition, as best exemplified by the GC-core sequence. In the context of the lattice model, this corresponds to a scenario in which θint(T) decreases with increasing temperature. The GC-mix sequence represents an intermediate case between the GC-ends and GC-core sequences.
Comparing Figs. 4(a) and 4(c), it appears that the lattice model reasonably reproduces the trend in Tm and the shape of the melting curves across the set, but subtle shifts in the shape of melting curves can be difficult to visualize. Plotting the first derivative with respect to temperature offers a clearer means of comparison since any changes in the shape of the curve are more apparent in the derivative. Figure 4(d) shows the differentiated melting curves normalized to the maximum magnitude of the derivative of the GC-ends melting curve for both the experimental and lattice model curves. In the first derivative, it is easier to identify asymmetry about the inflection point and the width of the differentiated curve is a direct reflection of the sharpness of the dehybridization transition. We see reasonable agreement between the model and experiment with regard to these features as well.
To further evaluate the model’s predictions, it is possible to consider the melting of AT and GC base pair domains independently. For instance, between 1500 and 1590 cm−1, the IR spectrum is dominated by GC features and tracking thermal changes to the spectrum in this frequency range produces a melting curve that reports primarily on the dehybridization of the GC regions of the duplex. Above 1590 cm−1, the spectrum becomes somewhat congested with overlapping absorptions from all four nucleobases. We have previously identified distinct AT and GC cross-peak regions in 2D IR spectra that alleviate this congestion.29 However, because of the 80% AT content in these sequences as well as the large increase in intensity of the 1622 cm−1 adenine ring mode upon the loss of base pairing, it is reasonable in this case to assume that the 1590–1630 cm−1 frequency range is dominated by AT base pairing.
Figures 5(a)–5(c) show the GC and AT specific melting curves measured by restricting the SVD analysis to the respective GC and AT frequency ranges discussed above. Comparing the AT and GC melting curves for the GC-core sequence suggests that the AT termini of this sequence begin to melt at lower temperatures than the GC center of the duplex, as evidenced by the relative drop in the AT curve at lower temperature and increased asymmetry compared to the GC curve. The GC-mix sequence exhibits similar behavior, but the difference between the AT and GC melting curves is less pronounced. For the GC-ends sequence, the AT and GC melting curves are essentially overlaid, suggesting that both the center and termini of the duplex dehybridize at the same temperature. To verify whether or not this sequence-dependent melting behavior measured in IR experiments is captured by the lattice model, we calculate θAT(T) and θGC(T) base pair fractions in analogy to the simulated melting curves θ(T) above, but instead of including all base pairs when computing θint(T), the AT and GC base pairs are considered independently. Figures 5(d)–5(f) show the modeled AT and GC melting curves for each of the sequences. The trends across the oligonucleotide set discussed for the experimental curves are reproduced. Reasonable agreement suggests the simple model not only captures shifts in the overall duplex ensemble but also correctly predicts which base pair contacts dehybridize first with increasing temperature.
Visualizing populated microstates
A benefit of a simple lattice model is that one can decompose and visualize a duplex ensemble directly in order to build an intuition for how the simulated populations relate to experimental signals. For example, Fig. 6(a) plots a population profile for the GC-core sequence across 5–90 °C where microstates have been grouped according to their total number of intact base pairs. Taking a single temperature slice through this profile provides insight into the heterogeneity of the oligonucleotide ensemble at a given temperature point. Plotting the sum over the population of all duplexes with at least one broken base pair (dashed green line) demonstrates how the population of partially dehybridized duplexes gradually accumulates with increasing temperature, peaks around 7 °C below Tm, and then sharply drops off as the temperature is further increased.
At a finer level of detail, the base pairing configurations themselves can be visualized directly through contact plots such as those in Figs. 6(b) and 6(c), which represent all of the possible base pairing arrangements for the GC-core sequence with nine and eight intact base pairs. Each row represents a single base pairing configuration (δ), with the oligonucleotide sequence indicated along the horizontal axis. A white box indicates a broken base pair, while a black box indicates an intact base pair at a given site along the strand. The configurations are arranged top to bottom in order of decreasing contribution to the overall duplex population. The horizontal bar graphs on the right side of the contact plots in Figs. 6(b) and 6(c) indicate the free energy change relative to the highest energy base pairing configuration within the nine and eight base pair contact subensembles at 50 °C. From these plots, it is clear that the duplex configurations with frayed ends are significantly more populated than duplexes with internal loops, suggesting that the initial loss of AT base pairing observed in the experimental melting curve originates from the terminal base pairs. This example serves to illustrate how the lattice model can help rationalize experimental observations and build intuition for the interpretation of melting experiments in terms of a simplified hybridized ensemble.
To further evaluate the validity of the population distributions in Fig. 6(a), we compare the temperature dependent IR spectrum suggested by the model populations against experimental spectra for the GC-core sequence. This approach involves determining suitable component spectra for collections of microstates. A spectrum of the ensemble at a given temperature can then be calculated by using the sum over the appropriately grouped microstate populations to weight the contribution from the corresponding component spectrum, assuming that the experimentally observed spectrum can be expressed as a concentration weighted linear combination of these basis spectra. Microstates are grouped into three categories: duplexes with all ten possible base pairs intact (p1), duplexes with at least one broken base pair (p2), and single-strands that have zero intact base pairs (p3). Figure 7(a) plots these three populations as a function of temperature. Component spectra corresponding to each of these populations were determined using a maximum entropy method described previously29 and adapted from Ref. 61. In brief, this method determines the minimum number of component spectra needed to represent the experimental temperature series by reweighting SVD vectors subject to several physically motivated constraints, including a Shannon entropy term and a penalty term for negative absorbance/population values.
For the GC-core sequence, the method returns the three component spectra in Fig. 7(b). These basis spectra are a reflection of the minimum amount of information needed to represent the experimental data and therefore each component spectrum does not necessarily correspond to a simple physical interpretation. However, it is clear that the basis spectrum d1 plotted in blue and the basis spectrum d3 plotted in red in Fig. 7(b) closely resemble the lowest and highest temperature spectra, respectively, suggesting that d1 contains the component spectrum corresponding to the highly paired duplex and d3 contains the component spectrum corresponding to the single-strand. The remaining basis spectrum d2 falls in-between d1 and d3, suggesting that it is the contribution to the experimentally observed spectrum due to broken base pairs within duplexes. A model IR temperature series SLat(T), where the subscript “Lat” denotes a quantity calculated from the lattice model, can be generated through the matrix multiplication
where the columns of D contain the basis spectra and the rows of P contain the lattice model populations. Figure 7(d) shows the IR spectrum for the GC-core sequence generated in this way at temperature points spaced ∼8 °C apart from 5 to 90 °C. The corresponding experimental spectra are shown in Fig. 7(c). The IR spectra generated by the lattice model show reasonable agreement with experiment, suggesting that our assignment of the basis spectra in terms of the grouped microstates is reasonable and that the population distributions from the lattice model capture the correct behavior with increasing temperature.
Sequence dependent effects are built into the lattice model through explicitly accounting for all possible base pairing configurations and then assigning enthalpies based on the NN dinucleotide parameters. The ability of the model to capture experimentally observed sequence effects is therefore primarily related to the enthalpic contribution to the partition function. With the aim of evaluating the entropic contributions to the model, we tested a L = 8 to 18 length series of oligonucleotides with the sequence 5′-C(AT)nG-3′, where n = 3, 4, 5, 6, 7, or 8. Figure 8(a) shows the experimental melting curves, while Fig. 8(b) shows the corresponding θ(T) calculated from the model for each of the sequences in the length series. Comparing the simulated and experimental melting curves, the model reasonably reproduces the measured trend.
As discussed above, there are two entropic terms in the partition function that are sensitive to oligonucleotide length: the degeneracy of conformational states for unpaired nucleotides obtained from SAWs on a cubic lattice and the excess entropy parameter, κ. Since κ is set against experiment to match the measured Tm for each sequence, we must first evaluate the trend in melting temperatures predicted by the uncorrected model (κ = 1) to assess if the SAWs qualitatively capture the correct length dependence.
Figure 8(d) plots the experimentally measured trend in Tm as a function of length as the dark blue points. The light blue line represents the trend in Tm predicted by the model when κ = 1. For the sake of comparison, the model trend is scaled down by a factor of 12.61 to shift the curve into the same temperature range as the experimental points. The lattice model in this case reproduces the experimental Tm trend, suggesting that the SAWs adequately model the scaling of the conformational entropy as a function of oligonucleotide length. The melting temperature appears to asymptotically approach a large L limit, as expected from the length dependence of cooperative zipper models.48 A similar trend is observed for the κ parameter in Fig. 8(c), suggesting that the entropic term calculated through SAWs of free nucleotide beads and the contribution due to κ share a similar length dependence. This result supports our interpretation of κ as largely accounting for missed degrees of conformational freedom coarse-grained out by reduction of the system onto a cubic lattice. However, the small variation in κ across the set of L = 10 oligonucleotides with varying nucleobase sequence discussed above (κ = 26, 28, and 29) suggests that oligonucleotide length alone is insufficient to fully predict κ. Although in this particular case length appears to provide a reasonable initial guess, there remains at least a small sequence-dependent contribution to the parameter. We therefore elect to handle κ on a case-by-case basis and physically interpret kBln(κ) as the average excess entropy per free base pair that is otherwise lost due to the simplified framework invoked by the model.
Lattice model free energy surfaces
In addition to aiding the interpretation of equilibrium experiments, the lattice model is helpful when measuring nonequilibrium dehybridization kinetics in transient temperature-jump (T-jump) spectroscopy.29,32 In brief, a laser temperature jump heats the sample by up to 20 °C in ∼5 ns and the response of the ensemble is tracked in time through the changing IR spectrum.62 Free energy surfaces, F(x) = −kBT ln P(x), calculated from the lattice model with respect to different order parameters, x, can be useful guides for characterizing dehybridization reaction coordinates. Additionally, for large temperature jumps, a comparison of the changes in the population distribution P(x) and the energy landscape between the initial temperature Ti, which sets the initial ensemble, and the final temperature Tf, which determines the potential of mean force on which the ensemble will evolve, can provide insight into relaxation pathways in T-jump experiments.
As an example, we turn to recently reported T-jump data on the GC-core and GC-ends duplexes that display contrasting dehybridization mechanisms determined by the nucleobase sequence.32 Figure 9(a) shows normalized experimental kinetic traces tracked at the 1614 cm−1 adenine ring mode for these sequences using a 20 °C T-jump that brings the final temperature to Tf = Tm + 4 °C. This frequency is our focus due to the large increase in signal amplitude observed upon dehybridization. Whereas both sequences reveal dissociation kinetics on a 10s-of-μs time scale, the GC-core sequence with its terminal AT base pairs exhibits an additional ∼80 ns response that is absent from the GC-ends sequence. This 80 ns response was assigned to fraying dynamics at the AT termini of the GC-core sequence. For both sequences, the signal is observed to re-equilibrate on a several ms time scale as the temperature returns to Ti and the DNA strands rehybridize.
These experimental observations suggest two logical coordinates for decomposing the lattice model ensemble: one that characterizes the overall extent of dehybridization and the other the extent of duplex fraying. For dehybridization, we use the total number of intact base pairs in a microstate, ρ, while for fraying, we group microstates according to their maximum stretch of broken base pairs that includes a terminal base pair, f. A 2D free energy surface with respect to these coordinates can then be calculated by F(ρ, f) = −kBTln P(ρ, f), where P(ρ, f) is the microstate populations grouped with respect to total contact number, ρ, and maximum fray length, f. Figures 9(b) and 9(c) show the free energy surfaces calculated in this way for the GC-ends and GC-core sequences at Ti and Tf from the T-jump experiments in Fig. 9(a). The single-strand state is set as the reference state by subtracting F(0, 10) from all points on the surface. According to our definitions of ρ and f the single-strand state is only defined at F(0, 10), but in Fig. 9, the ρ = 0 slice is set to zero everywhere in f to avoid the appearance of discontinuities in the surface when crossing from the duplex well to the single-strand state.
The GC-ends surface exhibits a well-defined minimum at (ρ, f) = (10, 0), the fully base paired duplex, at both Ti and Tf, and the minimum energy path across the surface, indicated by the dashed line in Fig. 9(b), follows the diagonal where f = L − ρ. This relatively featureless surface with a well-defined minimum at the maximum contact coordinate, (ρ, f) = (10, 0), suggests a dissociation mechanism consistent with the experimental observation of only a single time scale corresponding to duplex dissociation and is consistent with an essentially all-or-none loss of base pairing. By contrast, the GC-core surface [Fig. 9(c)] displays a comparatively wide and structured duplex well, with a much broader minimum. The minimum path tracks across the center of the surface and reflects a symmetrically fraying duplex about a stable central core of intact base pairs. This reshaping broad minimum between Ti and Tf is characterized by a shift in the population of frayed duplex microstates and is consistent with the additional nanosecond response observed for the GC-core sequence in Fig. 9(a).
The ability to model the Ti and Tf populations and to visualize shifts in the shape of the duplex basin therefore provides valuable insight when interpreting T-jump results. The lattice model offers a more detailed explanation for the lack of amplitude in the nanosecond range for the GC-ends sequence as well as insight into the nature of fraying dynamics within a reshaping duplex well observed for the GC-core sequence. It also provides evidence that the dissociation kinetics for the GC-core sequence are two-state, with a single dominant barrier separating S and D states, rather than a three-state scenario in which the 80 ns response indicates the existence of a metastable intermediate.
It should be noted that these simple surfaces do not accurately capture the barrier height and transition states for the duplex to single-strand reaction since the model lacks any kinetic parameterization and is based solely on enumerating possible duplex contacts with energies set by equilibrium thermodynamic values. Nevertheless, there are well-established correlations between thermodynamic and kinetic trends for both nucleic acids and proteins, such as observed linear relations between activation free energy and thermodynamic free energy.6,63–65 These correlations, in addition to the effectiveness of the NN dinucleotide parameters in accounting for experimentally determined association/dissociation barriers,66,67 suggest that the lattice model may offer additional insight into the activated process of duplex dissociation.
To evaluate this approach, Figs. 10(a) and 10(b) show 1D free energy surfaces calculated at 37 °C with respect to ρ for all of the sequences considered here for which activation free energies have been measured.32,68 The surfaces for the length series are shown in Fig. 10(a), while the sequence-dependent surfaces of the GC-ends and GC-core oligonucleotides are shown in Fig. 10(b). Plotting the experimental activation free energy for dissociation measured by T-jump IR spectroscopy () against the lattice-model barrier height obtained from the difference in free energy of these surfaces at ρ = 2 and their minima at ρ = L (which we denote ΔFLat) results in a linear correlation with a slope of 0.853 and an intercept of 21.7 kJ/mol with R2 = 0.954, as shown in Fig. 10(c). The present model is in reasonable agreement with a previous study that noted a linear correlation between the activation free energy of duplex dissociation and the standard free energy of dehybridization, reporting a slope of 0.991 and an intercept of 33.1 kJ/mol with R2 = 0.996.6
It should be noted that we set the barrier height on the 1D free energy surfaces to ρ = 2 because this is the minimum contact number for which an intact dinucleotide subunit is possible and thus the minimum value of ρ at which the surface is dictated by the NN parameters. Setting the barrier to ρ = 1 or 3 does not significantly deviate the linear correlation, with slopes of 0.799 and 0.749 kJ/mol, respectively, and intercepts of 20.3 and 30.8 kJ/mol, respectively. However, the R2 value is reduced to 0.733 in the ρ = 1 case and 0.909 in the ρ = 3 case. Regardless of how ΔFLat is defined on the 1D surface, an all-or-none assumption in which the barrier height is defined simply as the difference between a minimum at high contact number and a maximum at low contact number is explicitly invoked when relating this value to the experimentally measured dehybridization barrier. Therefore, one would expect that the behavior of those sequences which are most closely described by an all-or-none dehybridization mechanism, such as the GC-ends sequence, would also be best described by the linear correlation between ΔFLat and . Fortunately, multidimensional free energy surfaces constructed with respect to relevant order parameters, such as those discussed in Figs. 9(b) and 9(c), above, can be used in conjunction with the 1D determination of ΔFLat to demonstrate potential deviations from all-or-none kinetics and to offer a more detailed base-by-base description of the dehybridization process.
We have presented a simple oligonucleotide lattice model for informing the interpretation of DNA dehybridization experiments beyond the commonly invoked all-or-none picture. Our model is based on a statistical extension of the NN model where degrees of freedom at each level of detail are coarse grained onto a discrete lattice. Introducing a single free parameter results in reasonable agreement with experiment. We have validated that the model effectively captures both sequence and length dependent effects for model oligonucleotide sequences measured through linear and nonlinear IR spectroscopy. By simulating melting curves, the model can be directly related to experimental measurements. In addition, modeled free energy surfaces offer insight into the interpretation of transient T-jump experiments that track DNA dehybridization in real time.
The primary utility of our lattice model is its simplicity. Although all possible base pairing configurations are considered explicitly, there are still few enough microstates that they can be visualized directly, allowing one to build an intuition for how experimental observables relate to the underlying oligonucleotide ensemble. The current model only considers in-register base pairing for sequences up to 20 base pairs in length. For short sequences this assumption is likely valid, but for longer strands, particularly if the base sequence is uniform, shifted register duplexes should be considered. In addition, the current model considers the free energy of the single-strand state as purely entropic. Again, this assumption is likely supported for short sequences, but for longer sequences, the possibility of stable intramolecular folded states, such as hairpins, should be accounted for as this could shift the duplex-single-strand equilibrium. Although it would be straightforward to incorporate these additional details in the future, at present, the model remains most applicable for short sequences. Even with this restriction, we have found that the insight provided by the simple lattice model presented here is valuable in our studies of the biophysics of DNA dehybridization.
We thank the National Science Foundation (Grant No. CHE-1561888) for support of this research. We thank Ryan Menssen and Xinxing Zhang for sharing their length-dependent dehybridization data.