In this study we derive analytically the equilibrium melting probabilities for basepairs of a DNA molecule with a defect site. We assume that the defect is characterized by a change in the Watson–Crick basepair energy of the defect basepair, and in the associated two stacking energies for the defect, as compared to the remaining parts of the DNA. The defect site could, for instance, occur due to DNA basepair mismatching, cross-linking, or by the chemical modifications when attaching fluorescent labels, such as fluorescent-quencher pairs, to DNA. Our exact solution of the Poland–Scheraga model for DNA melting provides the probability that the labeled basepair, and its neighbors, are open at different temperatures. Our work is of direct importance, for instance, for studies where fluorophore-quencher pairs are used for studying single basepair fluctuations of designed DNA molecules.
I. INTRODUCTION
The three dimensional ground-state structure of double-stranded Deoxyribonucleic acid (dsDNA) is the Watson–Crick double-helix.1,2 When increasing the temperature, the double-stranded double-helical DNA starts to “melt.” Partially melted DNA is an alternating set of intact double-stranded regions and of melted single-stranded DNA regions (“bubbles”).3,4 The biological relevance of DNA melting is, for instance, due to the fact that local denaturation of DNA is necessary for protein binding to DNA single-strand5 and required for transcription initiation.6–9
In Refs. 10 and 11, the melting behavior of DNA is reviewed. Three types of models are discussed: the Poland–Scheraga (PS)/two-state/helix–coil model, the Peyrard–Bishop–Dauxois (PBD) model10 and full-scale molecular dynamics simulations. For each model, numerical methods exist which takes a DNA sequence as input, and provides a melting profile (equilibrium probability that each basepair is open) as output, see Refs. 12 and 13 (PS model), Refs. 14 and 15 (PBD model) and Ref. 16. The case when a force is applied to the end of DNA has been studied within the PBD model,17 PS model,18,19 and through molecular dynamics simulations.16 Finite-size scaling analysis of DNA melting was studied within the PBD model in Ref. 20 and within the PS model in Ref. 12.
In this article, we use the PS model, as recommended in Ref. 10. The PS model of DNA melting21,22 has been proven to well reproduce macroscopic melting data.7,10,23–28 From the theory point of view, the PS model is particularly interesting since, for the case when all basepairs are identical (homo-DNA), it has been shown analytically that the model has a phase transition.29 Analytic prediction of the melting behaviour of homo-DNA within the PBD model are available in Ref. 30.
In the PS model, a given basepair can be in two possible states (closed or open). The energy function for a given state of the DNA molecule is of Ising-type with an additional long-range term21,27 due to the entropy associated with the melted single-stranded regions. Overall, the PS model has the following parameters: two hydrogen bond energies (AT or GC bonds), ten stacking (nearest neighbor) parameters, the loop entropy function g(m) (characterized by a loop exponent c) for melted DNA regions consisting of m basepairs, and the cooperativity parameter. All parameters in the PS model are available from numerous experiments.23,24,31,32
On the computational side, Poland introduced an exact algorithm33 for calculating the melting probability of each base-pair within the PS model. Poland’s method has computational time which scales as N2 for a sequence of length N. By introducing an approximation of g(m), Fixman and Freire34 managed to bring down computational costs to a linear scaling with N.
In here, we study analytically and through numerical computations (using the Poland algorithm), the melting behavior of a DNA with all basepairs identical except at one position (“defect”). A defect here refers to any scenario where DNA stability is locally increased or decreased compared to the remaining part of the DNA. Examples of defects include DNA interstrand crosslinks,35 strongly stabilized sites,36 base-pair mismatches,37 or potential changes on local DNA chemical structures when a fluorophore-quencher pair is attached to the opposite strands of a DNA molecule as in Refs. 38 and 39.
A number of previous studies investigated the effect of defects on the melting of DNA. Changes in melting temperature in short DNA due to the presence of mismatches were experimentally studied in Ref. 40. Molecular dynamics simulations of DNA with a defect was investigated in Ref. 41. Effect of defects on thermal denaturation of short DNA was investigated within the PBD model in Ref. 42. The order of the phase transition for a weakly disordered DNA, by applying a mean-field type approximation, was investigated analytically in Ref. 43. Despite these studies, to our knowledge, no previous work investigated by analytical means how a single defect affects the melting behavior of the PS model. In particular, it remains to be investigated how such a defect affects the phase transition, and how many basepairs away from the defect the melting probability is affected.
In this study, we solve a prototypical model analytically: a DNA molecule which has all identical basepairs, except for one “defect site,” as illustrated in Fig. 1. The hydrogen-bond energy at this defect site is different, and the stacking interactions between this site and its two neighbors are different compared to the rest of the DNA. For such a system, we calculate analytically the basepair melting probabilities within the PS model.
Due to the long-range coupling and the nearest neighbour stacking interaction, the effect of a defect in our system is non-trivial. Our analytic prediction provides a basic understanding of both how the melting temperature changes with defect properties, but also of the “scope” of the defect, i.e., over how long a distance the defect affects basepair melting probabilities. Experimentally, the melting curves predicted herein, and available for arbitrary DNA sequences using our publicly available codes, can be measured using specifically designed DNA sequences.
II. THE POLAND–SCHERAGA MODEL WITH A DEFECT
Let us consider a dsDNA with N basepairs, clamped at both ends (the first and last basepair are always in closed states) for simplicity. Physically, the clamping can be achieved by attaching beads to the end basepairs. We assume that the beads are free to rotate, and hence that the system can release torsional constraints.
In practice, the statistical weight of a bubble of length m (open basepairs) including the defect, like the one in Fig. 1 where the red/pink bases indicate the defect, is given by γ(m) = m−cσβLβRβm−2. If the defect is on the edge of the bubble, either the last closed basepair on the left or the first closed basepair on the right, the statistical weight of the bubble is γ(m) = m−cσRβm or γ(m) = m−cσLβm, respectively. A bubble of the same length which does not include the defect would have a weight γ(m) = m−cσβm. On the other hand, the statistical weight of non-melted regions is 1 by definition. Then, the partition function of the system, from which the melting probabilities are deduced, is obtained by summing over all possible configurations of the dsDNA (number, positions and lengths of the bubbles).
III. RESULTS
The model equations as presented in Sec. II are solved analytically in Appendix B, and our analytical predictions are corroborated using numerical computations.
A. Melting probability at the defect basepair
B. Melting probability in the neighborhood of the defect
We now give an analytic expression for the probability that a basepair in the neighborhood of the defect is closed. The details of our derivation is found in Appendix B.
The left neighborhood of the defect is turned into the right neighborhood of the defect when the DNA sequence is reversed (the first basepair becomes last and vice versa), which also reserves the left and right stacking energies. For this reason, the probability P(kd − d) for a basepair at a distance d = kd − k > 0 to the left of the defect is given by Eq. (12) when replacing βL, βR, σL and σR by , , and , respectively.
In Fig. 2, we compare the theoretical predictions Eqs. (6) and (12) to the numerical solution of the PS model, for the three values of the loop exponent discussed in Sec. II: c = 1.5, c = 1.76 and c = 2.11, below the transition (z0 < 1), and observe an excellent agreement. Let us make three comments on this figure. First, there is a slight asymmetry around the defect in the melting map, coming from the different values for the left and right Boltzmann weights βL and βR. Second, since βL > β and βR > β, it requires less energy to disrupt the defect bond than the background bonds, which explains why the opening probability is larger for the defect basepair. Third, the smaller the loop exponent c the larger the opening probabilities, because a smaller c means less avoidance for the melted regions [Eq. (3)], leading to a higher number of possible three-dimensional configurations (larger entropy), and thus to a higher probability to be in the open state.
C. Phase transition
The nature of the phase transition for homogeneous DNA has been long studied and is well understood.29 It has been shown that no phase transition occurs for c < 1, that the probability for a closed basepair approaches 0 with a critical exponent (2 − c)/(c − 1) when 1 < c < 2, and that the phase transition is first order when c > 2 (see Refs. 29 and 45). In Appendix D, we show that these results hold true for homo-DNA with a defect, and that the position of the basepair (defect, neighboring or background) enters through a prefactor in the melting probability.
We illustrate these results on Fig. 3 by plotting the melting curves for a background basepair, the defect basepair, and a basepair at a distance d = 10 to the right of the defect, using respectively Eqs. (9), (6), and (12), for a sequence “A…A” with a defect G, and for three different values of the loop exponent c. Unlike the example shown on Fig. 2, here the bond at the defect site is stronger than the bonds for the background basepairs (GC: three hydrogen bonds, AT: two hydrogen bonds), therefore the opening probability for the defect basepair is smaller than for background basepairs. Note that depending on the value of c, the neighboring basepair behaves either more similarly to the defect basepair or to a background basepair.
Because of the separation of variables between the basepair position k and the temperature difference T − T* in Eqs. (15) and (16), the helicity, defined as the fraction of bounded basepairs , follows the same phase transitions as T → T*: for 1 < c < 2 and θ → B > 0 for c > 2. Thus the nature of the phase transition and the critical exponent for the present single-defect system are the same as for the PS model without a defect in the thermodynamic limit (N → ∞).
IV. SUMMARY AND OUTLOOK
In this study we investigated how a defect affects the DNA melting transition. Our exact solution of the Poland–Scheraga model provides the melting probabilities for the defect site and its neighbors, as given in Eqs. (6) and (12). The opening probability for the defect site can be either larger or smaller than for the background basepairs, depending on whether the defect effectively weakens or strengthens the bond. Therefore, the presence of a defect can both increase or decrease the probability of bubble formation. We find that the effect of the defect decays with a typical distance d0 as given in Eq. (13), which depends on the distance to the critical temperature.
We also investigated the phase transition behavior in the thermodynamical limit (N → ∞) of our model system. We found that, while the single defect melting probability has a prefactor which is different than the rest of the DNA, the temperature dependence is such that the order of the phase transition, and the critical exponents, are the same as for the case without a defect.
Our findings could potentially be experimentally corroborated using, for instance, a setup with fluorophore-quencher pairs attached to the DNA.38,39
It remains a future challenge to solve analytically the case with multiple defect sites. Our results indicate that such a system might show a rich behavior, where the “interaction” between defects depends on how far from the critical temperature the system is, on the distance between defects, as well as on the value of the loop exponent.
ACKNOWLEDGMENTS
T.A. is grateful to the Swedish Research Council for funding (Grant No. 2014-4305). A.G. thanks Malcolm Hillebrand for is careful review of the manuscript and for useful discussions.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Arthur Genthon: Formal analysis (equal); Writing – original draft (equal); Writing – review & editing (equal). Albertas Dvirnas: Formal analysis (supporting); Supervision (supporting); Writing – review & editing (equal). Tobias Ambjörnsson: Conceptualization (lead); Formal analysis (equal); Funding acquisition (lead); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The Matlab codes that support the findings of this study are openly available at https://github.com/Arthur-Genthon/dna-melting-defect.
APPENDIX A: THE POLAND–SCHERAGA MODEL
In the next section, we solve this equation for homogeneous-DNA with a defect basepair.
APPENDIX B: EXACT SOLUTION OF THE POLAND–SCHERAGA MODEL WITH A DEFECT BASEPAIR
In this appendix we solve the Garel–Orland recursion relation, Eq. (A13), for homo-DNA with a defect basepair described by Eqs. (4) and (5), and we provide analytic expressions for the melting probabilities of the defect basepair, a neighboring basepair, and a background basepair. To do so, we follow along similar lines as the derivation for the homo-DNA case given in Ref. 13.
1. Below the critical temperature
For a basepair k > kd in the right neighborhood of the defect, the left partition function QL(k) is given by the sum of Eqs. (B13) and (B14), where the terms I(k − kd) and I(k − kd − 1) are not negligible since we do not suppose a priori that the neighboring basepair is far from the defect. On the other hand, the right partition function describe the region from basepair k to basepair N + 1, with the constraint that they are both closed. Since this region does not include any defect, QR(k) is given by the formula for homo-DNA without defect. Combining these results with Eqs. (A1) and (B13) for k = N + 1, we recover the probability that a basepair in the right neighborhood of the defect is closed, Eq. (12) in the main text.
2. Above the critical temperature
For a basepair in the right neighboring of the defect, we still have QL(N + 1) ∼ I(N), moreover QR(k) ∼ I(N − k) since there is no defect in the region from k to N. Now the different branch cut contributions appearing in QL(k) are I(k) and terms equivalent to I(kd)I(k − kd), coming from the combination of Eq. (B7) with Eqs. (B8)–(B12). The part proportional to I(k) gives a vanishing contribution to the probability, which follows from the same calculation as for the defect basepair. The probability that a basepair at position k > kd is closed is thus asymptotically given by: P(k) ∼ I(kd)I(k − kd)I(N − k)/I(N). If we are interested in a basepair k that remains at a fixed distance of the defect in the thermodynamic limit where k, kd, N → ∞, then I(k − kd) is simply a constant, I(kd) ∼ I(k), and again the ratio reduces to the case treated for the defect basepair, which gives a vanishing probability in the thermodynamic limit. If k − kd → ∞, then the asymptotic behavior can be used for all four functions I involved in the probability, and the probability reads: , which goes to zero in the thermodynamic limit.
APPENDIX C: BRANCH CUT CONTRIBUTION
APPENDIX D: PHASE TRANSITION
0 < c < 1: the left-hand side of Eq. (D1) diverges as Δz → 0, whereas the right-hand side is finite. Thus, Eq. (D1) has no solutions for z0 close to 1. Instead, all solutions to Eq. (8) occur for z0 < 1, and P(k) is for all temperature given by Eq. (6) for the defect basepair and by Eq. (12) for the neighboring basepairs. The transition is therefore “smooth” (derivatives to any order of P with respect to T exists).
- 1 < c < 2: the left-hand side of Eq. (D1) is dominated by its second term as Δz → 0, so that:Note that we have Δβ < 0 and Γ(z) < 0 for −1 < z < 0, so that their ratio is positive.(D2)
- c > 2: the left-hand side of Eq. (D1) is dominated by its first term as Δz → 0, so that:(D3)