In this study we derive analytically the equilibrium melting probabilities for basepairs of a DNA molecule with a defect site. We assume that the defect is characterized by a change in the Watson–Crick basepair energy of the defect basepair, and in the associated two stacking energies for the defect, as compared to the remaining parts of the DNA. The defect site could, for instance, occur due to DNA basepair mismatching, cross-linking, or by the chemical modifications when attaching fluorescent labels, such as fluorescent-quencher pairs, to DNA. Our exact solution of the Poland–Scheraga model for DNA melting provides the probability that the labeled basepair, and its neighbors, are open at different temperatures. Our work is of direct importance, for instance, for studies where fluorophore-quencher pairs are used for studying single basepair fluctuations of designed DNA molecules.

The three dimensional ground-state structure of double-stranded Deoxyribonucleic acid (dsDNA) is the Watson–Crick double-helix.1,2 When increasing the temperature, the double-stranded double-helical DNA starts to “melt.” Partially melted DNA is an alternating set of intact double-stranded regions and of melted single-stranded DNA regions (“bubbles”).3,4 The biological relevance of DNA melting is, for instance, due to the fact that local denaturation of DNA is necessary for protein binding to DNA single-strand5 and required for transcription initiation.6–9 

In Refs. 10 and 11, the melting behavior of DNA is reviewed. Three types of models are discussed: the Poland–Scheraga (PS)/two-state/helix–coil model, the Peyrard–Bishop–Dauxois (PBD) model10 and full-scale molecular dynamics simulations. For each model, numerical methods exist which takes a DNA sequence as input, and provides a melting profile (equilibrium probability that each basepair is open) as output, see Refs. 12 and 13 (PS model), Refs. 14 and 15 (PBD model) and Ref. 16. The case when a force is applied to the end of DNA has been studied within the PBD model,17 PS model,18,19 and through molecular dynamics simulations.16 Finite-size scaling analysis of DNA melting was studied within the PBD model in Ref. 20 and within the PS model in Ref. 12.

In this article, we use the PS model, as recommended in Ref. 10. The PS model of DNA melting21,22 has been proven to well reproduce macroscopic melting data.7,10,23–28 From the theory point of view, the PS model is particularly interesting since, for the case when all basepairs are identical (homo-DNA), it has been shown analytically that the model has a phase transition.29 Analytic prediction of the melting behaviour of homo-DNA within the PBD model are available in Ref. 30.

In the PS model, a given basepair can be in two possible states (closed or open). The energy function for a given state of the DNA molecule is of Ising-type with an additional long-range term21,27 due to the entropy associated with the melted single-stranded regions. Overall, the PS model has the following parameters: two hydrogen bond energies (AT or GC bonds), ten stacking (nearest neighbor) parameters, the loop entropy function g(m) (characterized by a loop exponent c) for melted DNA regions consisting of m basepairs, and the cooperativity parameter. All parameters in the PS model are available from numerous experiments.23,24,31,32

On the computational side, Poland introduced an exact algorithm33 for calculating the melting probability of each base-pair within the PS model. Poland’s method has computational time which scales as N2 for a sequence of length N. By introducing an approximation of g(m), Fixman and Freire34 managed to bring down computational costs to a linear scaling with N.

In here, we study analytically and through numerical computations (using the Poland algorithm), the melting behavior of a DNA with all basepairs identical except at one position (“defect”). A defect here refers to any scenario where DNA stability is locally increased or decreased compared to the remaining part of the DNA. Examples of defects include DNA interstrand crosslinks,35 strongly stabilized sites,36 base-pair mismatches,37 or potential changes on local DNA chemical structures when a fluorophore-quencher pair is attached to the opposite strands of a DNA molecule as in Refs. 38 and 39.

A number of previous studies investigated the effect of defects on the melting of DNA. Changes in melting temperature in short DNA due to the presence of mismatches were experimentally studied in Ref. 40. Molecular dynamics simulations of DNA with a defect was investigated in Ref. 41. Effect of defects on thermal denaturation of short DNA was investigated within the PBD model in Ref. 42. The order of the phase transition for a weakly disordered DNA, by applying a mean-field type approximation, was investigated analytically in Ref. 43. Despite these studies, to our knowledge, no previous work investigated by analytical means how a single defect affects the melting behavior of the PS model. In particular, it remains to be investigated how such a defect affects the phase transition, and how many basepairs away from the defect the melting probability is affected.

In this study, we solve a prototypical model analytically: a DNA molecule which has all identical basepairs, except for one “defect site,” as illustrated in Fig. 1. The hydrogen-bond energy at this defect site is different, and the stacking interactions between this site and its two neighbors are different compared to the rest of the DNA. For such a system, we calculate analytically the basepair melting probabilities within the PS model.

FIG. 1.

Cartoon of DNA melting with a “defect.” In our prototypical system, a DNA molecule has all identical basepairs, represented here by complementary light and dark green nucleobases on opposite strands, except for one “defect site,” pictured by red and pink nucleobases. The hydrogen-bond energy at this defect site is different, and the stacking interactions between this site and its two neighbors are different compared to the rest of the DNA. A bubble of inner length 9 basepairs is also represented. The statistical weight of this bubble is γ(9) = 9cσβLβRβ7, where the parameters are defined in Eqs. (4) and (5).

FIG. 1.

Cartoon of DNA melting with a “defect.” In our prototypical system, a DNA molecule has all identical basepairs, represented here by complementary light and dark green nucleobases on opposite strands, except for one “defect site,” pictured by red and pink nucleobases. The hydrogen-bond energy at this defect site is different, and the stacking interactions between this site and its two neighbors are different compared to the rest of the DNA. A bubble of inner length 9 basepairs is also represented. The statistical weight of this bubble is γ(9) = 9cσβLβRβ7, where the parameters are defined in Eqs. (4) and (5).

Close modal

Due to the long-range coupling and the nearest neighbour stacking interaction, the effect of a defect in our system is non-trivial. Our analytic prediction provides a basic understanding of both how the melting temperature changes with defect properties, but also of the “scope” of the defect, i.e., over how long a distance the defect affects basepair melting probabilities. Experimentally, the melting curves predicted herein, and available for arbitrary DNA sequences using our publicly available codes, can be measured using specifically designed DNA sequences.

Let us consider a dsDNA with N basepairs, clamped at both ends (the first and last basepair are always in closed states) for simplicity. Physically, the clamping can be achieved by attaching beads to the end basepairs. We assume that the beads are free to rotate, and hence that the system can release torsional constraints.

In the PS model, the opening (or melting) of a basepair involves the disruption of the hydrogen bonds between the complementary facing bases and the disruption of the stacking interactions with the nearest (left and right) neighbors. The PS model is therefore characterized by the following set of parameters. The free energy required for breaking the hydrogen bonds between facing bases at position k is denoted by Ehb(k), and can take two values (AT-bonds or GC-bonds). For disrupting the stacking interactions between consecutive basepairs k − 1 and k there is an free energy cost Est(k − 1, k), which can take ten different values for symmetry reasons.24,25 Note that Ehb(k) and Est(k − 1, k) have energetic as well as entropic contributions. The ring factor for “bubble” (melted region) initiation ξ ≈ 10−3 is the statistical weight associated with boundaries between melted and clamped DNA regions.24,25 For convenience, we group these parameters in two equilibrium Boltzmann weights: the total statistical weight for the opening of basepair k
(1)
and the local cooperativity parameter
(2)
which is assigned to each loop and whose typical value is ϕ ≈ 10−5 (see Ref. 27). Its smallness favors single large bubbles over sequences of small bubbles of the same total length, which accounts for the fact that increasing the length of an existing bubble breaks only one stacking interaction, while starting a new bubble disturbs two stacking interactions.28 
Finally, the number of configurations of melted regions of length m is characterized by the loop factor
(3)
The value of the loop exponent c depends on which model best describes melted DNA regions. For unconfined DNA in three dimensions, an ideal random walk gives c = 1.5; a self-avoiding random walk leads to c = 1.76 (see Ref. 44); and if self-avoidance between intact and melted DNA regions is considered, then c = 2.11 (see Ref. 29). The effects of DNA confinement in nanochannels on the loop factor have been investigated in Ref. 13. Note that in principle Eq. (3) is only valid for large bubbles, however it is common practice to assign it to every bubble regardless of its length. The power-law form for g(m) gives rise to effectively long range interactions for molten DNA regions.27,34
The main novelty of our work is that we go beyond previous studies by introducing a “defect” site into an otherwise homogeneous DNA (melting of homogeneous DNA was studied previously, see for instance Ref. 29). We label the basepairs by an index k (k = 1, …, N) and the defect basepair is positioned at site kd. The statistical weights for the opening of the defect and its right neighbor are βL and βR, and the associated local cooperativity parameters are σL and σR. The rest of the sequence is homogeneous with weights β and σ. Thus,
(4)
(5)
where δk,k is the Kronecker delta. The mathematical details of the model are presented in  Appendix A, which together with Eqs. (4) and (5) define the problem at hand.

In practice, the statistical weight of a bubble of length m (open basepairs) including the defect, like the one in Fig. 1 where the red/pink bases indicate the defect, is given by γ(m) = mcσβLβRβm−2. If the defect is on the edge of the bubble, either the last closed basepair on the left or the first closed basepair on the right, the statistical weight of the bubble is γ(m) = mcσRβm or γ(m) = mcσLβm, respectively. A bubble of the same length which does not include the defect would have a weight γ(m) = mcσβm. On the other hand, the statistical weight of non-melted regions is 1 by definition. Then, the partition function of the system, from which the melting probabilities are deduced, is obtained by summing over all possible configurations of the dsDNA (number, positions and lengths of the bubbles).

The model equations as presented in Sec. II are solved analytically in  Appendix B, and our analytical predictions are corroborated using numerical computations.

In our solution of the PS model, we distinguish between two cases, c ≤ 1 and c > 1. When the loop exponent is smaller than one: c ≤ 1, we show in  Appendix B that the probability Pdf that the defect basepair is closed is continuous from 1 to 0 when increasing temperature, and that for any temperature and in the thermodynamic limit (long DNA strand and defect far from the ends), this probability is given by
(6)
where the polylogarithm is defined as:
(7)
and z0 is determined by the solution of
(8)
The continuity of the probability is ensured by the existence of a solution z0 < 1 to Eq. (8) for all temperatures when c ≤ 1.
In the absence of a defect, βL = βR = β and σL = σR = σ, then Eq. (6) reduces to the case for homogeneous DNA treated in Ref. 13 
(9)
We henceforth refer to Pbg as given above as background probability. As a consistency check of the result above, we have that if that basepairs are independent, i.e., if there is no stacking interaction (σ = 1) and no loop correction (c = 0), then using Li−1(z) = z/(1 − z)2 and Li0(z) = z/(1 − z), together with Eq. (8), we obtain Pbg = 1/(1 + β). This is the expected result for a basepair with two possible states, open and closed, respectively associated with Boltzmann weights β and 1.
Since typically ξ ≈ 10−3 and σL and σR are proportional to ξ, Eq. (6) is well approximated using a first order Taylor expansion in ξ of the denominator:
(10)
Note that in the absence of a defect in the approximate solution Eq. (10), we recover Eq. (9), because at first order in ξ we have that βz01=1+ξLic(z0)1.
For the case c > 1, which is more relevant for real DNA, we observe a phase transition at critical temperature T*, defined by the condition z0 = z* = 1. In this case, we show in  Appendix B that in the thermodynamic limit (long DNA strand and defect kd far from the ends), below the critical temperature the probability that the defect basepair is closed is equal to Pdf given by Eq. (6), and above the critical temperature this probability is null:
(11)
Mathematically, these cases originate from the existence or non-existence of a pole z0 < 1 for the generating function of the system. Please note that while, below the critical temperature, the probability for c > 1 is given by the same formula as the probability for c ≤ 1, their numerical values differ since Eq. (6) depends on the loop exponent c via the polylogarithm. When c > 1, the way Pdf approaches 0 near the critical temperature, which defines the nature of the phase transition, is explored in Sec. III C.

We now give an analytic expression for the probability that a basepair in the neighborhood of the defect is closed. The details of our derivation is found in  Appendix B.

Since we allowed different stacking energies for the right and left neighbors of the defect basepair, DNA is not symmetrical around the defect and so neither are the melting probabilities. Let us first consider the right neighborhood of the defect, that is basepairs at a distance d = kkd > 0 of the defect. In the thermodynamic limit, for all temperatures if c ≤ 1 and for T < T* if c > 1, the exact probability that a basepair at a distance d to the right of the defect is closed is given by:
(12)
where Pbg is given by Eq. (9) and I(k) is an explicit integral given in  Appendix C by Eq. (C4), that can be computed numerically.

The left neighborhood of the defect is turned into the right neighborhood of the defect when the DNA sequence is reversed (the first basepair becomes last and vice versa), which also reserves the left and right stacking energies. For this reason, the probability P(kdd) for a basepair at a distance d = kdk > 0 to the left of the defect is given by Eq. (12) when replacing βL, βR, σL and σR by βL=βLσR/σL, βR=βRσL/σR, σL=σR and σR=σL, respectively.

The probabilities for the defect site and in the neighborhood of the defect, Eqs. (11) and (12), were derived in the presence of a single defect. However, we see from Eq. (12) that, since the integral I(d) is a decreasing function of the distance d as shown  Appendix C, the scope of the perturbation induced by a defect is decaying at least exponentially fast when moving away from the defect, with a characteristic length-scale
(13)
Moreover, the integral I decays as a power law for large d: I(d)dλc, with λc = c when c > 1 and λc = 2 − c when c < 1 (see  Appendix C). Thus, far from the defect the perturbation decays as:
(14)
Therefore, multiple defects can be considered as independent and their contributions can be summed up if they are separated by a distance larger than the length-scale d0. Note that d0 increases with temperature and diverges as we approach the critical temperature (z0 → 1).

In Fig. 2, we compare the theoretical predictions Eqs. (6) and (12) to the numerical solution of the PS model, for the three values of the loop exponent discussed in Sec. II: c = 1.5, c = 1.76 and c = 2.11, below the transition (z0 < 1), and observe an excellent agreement. Let us make three comments on this figure. First, there is a slight asymmetry around the defect in the melting map, coming from the different values for the left and right Boltzmann weights βL and βR. Second, since βL > β and βR > β, it requires less energy to disrupt the defect bond than the background bonds, which explains why the opening probability is larger for the defect basepair. Third, the smaller the loop exponent c the larger the opening probabilities, because a smaller c means less avoidance for the melted regions [Eq. (3)], leading to a higher number of possible three-dimensional configurations (larger entropy), and thus to a higher probability to be in the open state.

FIG. 2.

Melting probabilities for a DNA segment with a defect at the center for three values of the loop exponent: c = 1.5 in purple, c = 1.76 in blue and c = 2.11 in green. The DNA is of length N = 1000 basepairs with a defect at basepair kd = 500. The plain curves have been obtained by solving the PS model numerically, and the dots are the analytical solutions, given by Eq. (6) for the defect basepair and by Eq. (12) for the other basepairs. The values of the parameters are β = 0.95, σ = ξ = 10−3, βL = 1.5, βR = 2, σL = 1.5ξ and σR = 2ξ. The horizontal dash-dotted lines indicate the background probabilities, computed with Eq. (9). The two vertical dashed lines at positions kdd0 and kd + d0 show the scope of the perturbation induced by the defect, with a characteristic length-scale d0 ≈ 19 basepairs for the three values of c, given by Eq. (13), with z0 = 0.9482 for c = 1.5, z0 = 0.9485 for c = 1.76 and z0 = 0.9487 for c = 2.11 defined by Eq. (8).

FIG. 2.

Melting probabilities for a DNA segment with a defect at the center for three values of the loop exponent: c = 1.5 in purple, c = 1.76 in blue and c = 2.11 in green. The DNA is of length N = 1000 basepairs with a defect at basepair kd = 500. The plain curves have been obtained by solving the PS model numerically, and the dots are the analytical solutions, given by Eq. (6) for the defect basepair and by Eq. (12) for the other basepairs. The values of the parameters are β = 0.95, σ = ξ = 10−3, βL = 1.5, βR = 2, σL = 1.5ξ and σR = 2ξ. The horizontal dash-dotted lines indicate the background probabilities, computed with Eq. (9). The two vertical dashed lines at positions kdd0 and kd + d0 show the scope of the perturbation induced by the defect, with a characteristic length-scale d0 ≈ 19 basepairs for the three values of c, given by Eq. (13), with z0 = 0.9482 for c = 1.5, z0 = 0.9485 for c = 1.76 and z0 = 0.9487 for c = 2.11 defined by Eq. (8).

Close modal

The nature of the phase transition for homogeneous DNA has been long studied and is well understood.29 It has been shown that no phase transition occurs for c < 1, that the probability for a closed basepair approaches 0 with a critical exponent (2 − c)/(c − 1) when 1 < c < 2, and that the phase transition is first order when c > 2 (see Refs. 29 and 45). In  Appendix D, we show that these results hold true for homo-DNA with a defect, and that the position of the basepair (defect, neighboring or background) enters through a prefactor in the melting probability.

In detail, when 1 < c < 2, we show that
(15)
where A(k) is a prefactor depending on the position k of the basepair, but independent of the temperature difference TT*. The prefactor A(k) is given explicitly in Eqs. (D13) and (D14). The probability that a basepair is closed thus approaches 0 in a power-law fashion with a critical exponent γ = (2 − c)/(c − 1) and since γ > 0 the transition is continuous. By taking the derivative of P with respect to ΔT = TT* we find that for 3/2 < c < 1 this derivative becomes infinite at T = T*, i.e. the transition is second order in the Ehrenfest sense. Proceeding to compute higher order derivatives we find that for 4/3 < c < 3/2 the transition is third order, for 5/4 < c < 4/3 the transition is fourth order etc. So, in general we have a Nth order transition, in the Ehrenfest sense, if (N + 1)/N < c < N/(N − 1).
When c > 2, our derivation in  Appendix D shows that
(16)
where B(k) > 0 is a position-dependent constant which is independent of the temperature difference TT*. Therefore as we approach T* from below we do not continuously approach 0, and thus the transition is first order. The explicit expression for B(k) is given in Eqs. (D16) and (D17).

We illustrate these results on Fig. 3 by plotting the melting curves for a background basepair, the defect basepair, and a basepair at a distance d = 10 to the right of the defect, using respectively Eqs. (9), (6), and (12), for a sequence “A…A” with a defect G, and for three different values of the loop exponent c. Unlike the example shown on Fig. 2, here the bond at the defect site is stronger than the bonds for the background basepairs (GC: three hydrogen bonds, AT: two hydrogen bonds), therefore the opening probability for the defect basepair is smaller than for background basepairs. Note that depending on the value of c, the neighboring basepair behaves either more similarly to the defect basepair or to a background basepair.

FIG. 3.

Melting curves as a function of temperature for different values of the loop exponent c. We considered an homogeneous sequence “A…A” and a defect G, and used DNA stacking and hydrogen-bond parameters from Ref. 24 with salt concentration 0.05M NaCl. The probabilities are computed using Eq. (9) for a background basepair far from the defect (k) in blue, Eq. (6) for the defect basepair (k = kd) in red, and Eq. (12) for a neighboring basepair at a distance d = kkd = 10 to the right of the defect, in yellow. (b) and (c) The vertical dashed lines indicate the critical temperatures T*, obtained by solving Eq. (7) with z0 = 1. For c = 1.5, T* = 65.55 °C, for c = 2.5, T* = 65.54 °C. (c) The three dots at T = T* are the values B(k) from Eq. (16), explicitly given in Eqs. (D16) and (D17).

FIG. 3.

Melting curves as a function of temperature for different values of the loop exponent c. We considered an homogeneous sequence “A…A” and a defect G, and used DNA stacking and hydrogen-bond parameters from Ref. 24 with salt concentration 0.05M NaCl. The probabilities are computed using Eq. (9) for a background basepair far from the defect (k) in blue, Eq. (6) for the defect basepair (k = kd) in red, and Eq. (12) for a neighboring basepair at a distance d = kkd = 10 to the right of the defect, in yellow. (b) and (c) The vertical dashed lines indicate the critical temperatures T*, obtained by solving Eq. (7) with z0 = 1. For c = 1.5, T* = 65.55 °C, for c = 2.5, T* = 65.54 °C. (c) The three dots at T = T* are the values B(k) from Eq. (16), explicitly given in Eqs. (D16) and (D17).

Close modal

Because of the separation of variables between the basepair position k and the temperature difference TT* in Eqs. (15) and (16), the helicity, defined as the fraction of bounded basepairs θ=k=1NP(k)/N, follows the same phase transitions as TT*: θ(TT*)(2c)/(c1) for 1 < c < 2 and θB > 0 for c > 2. Thus the nature of the phase transition and the critical exponent for the present single-defect system are the same as for the PS model without a defect in the thermodynamic limit (N).

In this study we investigated how a defect affects the DNA melting transition. Our exact solution of the Poland–Scheraga model provides the melting probabilities for the defect site and its neighbors, as given in Eqs. (6) and (12). The opening probability for the defect site can be either larger or smaller than for the background basepairs, depending on whether the defect effectively weakens or strengthens the bond. Therefore, the presence of a defect can both increase or decrease the probability of bubble formation. We find that the effect of the defect decays with a typical distance d0 as given in Eq. (13), which depends on the distance to the critical temperature.

We also investigated the phase transition behavior in the thermodynamical limit (N) of our model system. We found that, while the single defect melting probability has a prefactor which is different than the rest of the DNA, the temperature dependence is such that the order of the phase transition, and the critical exponents, are the same as for the case without a defect.

Our findings could potentially be experimentally corroborated using, for instance, a setup with fluorophore-quencher pairs attached to the DNA.38,39

It remains a future challenge to solve analytically the case with multiple defect sites. Our results indicate that such a system might show a rich behavior, where the “interaction” between defects depends on how far from the critical temperature the system is, on the distance between defects, as well as on the value of the loop exponent.

T.A. is grateful to the Swedish Research Council for funding (Grant No. 2014-4305). A.G. thanks Malcolm Hillebrand for is careful review of the manuscript and for useful discussions.

The authors have no conflicts to disclose.

Arthur Genthon: Formal analysis (equal); Writing – original draft (equal); Writing – review & editing (equal). Albertas Dvirnas: Formal analysis (supporting); Supervision (supporting); Writing – review & editing (equal). Tobias Ambjörnsson: Conceptualization (lead); Formal analysis (equal); Funding acquisition (lead); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal).

The Matlab codes that support the findings of this study are openly available at https://github.com/Arthur-Genthon/dna-melting-defect.

Let us consider a dsDNA with N basepairs, clamped at both ends. Following Refs. 13 and 46 we denote by QL(k) and QR(k) the statistical weights (or partition functions) of the regions to the “left” and to the “right” of basepair k, with the constraint that basepair k is closed. The probability for having basepair k closed is then simply given by
(A1)
A recursive relation on the left partition functions46 can be derived by summing over the length of the right-most loop before the closed basepair k + 1:
(A2)
where γj(kj) is the statistical weight of a loop of length kj, with first open basepair at j + 1 and last open basepair at k. The recursion relation for QR(k) is similar, see Ref. 46. The loop weight is given by:
(A3)
with Uhb the energy required to break the hydrogen bonds (Watson–Crick) between facing bases, Ust the energy cost for disrupting a stacking interaction (nearest neighbor) between two consecutive basepairs, and Ω(m) the number of configurations of the melted region of size m. The number of configurations corresponds to the volume in phase space available for a random walk returning to its origin after 2 m steps, and is given by:44,
(A4)
where A is a non-universal constant and c is the loop factor. The quantity s is the effective coordination number for the random walk and is another non-universal constant. A part of this number of configurations is absorbed as an entropic contribution in the energies:
(A5)
(A6)
(A7)
Note that within the model so far, it does not matter for the statistical mechanics problem at hand, how much of the total entropy ΔS we assign to Ehb and how much we assign to Ehb. Besides the effective coordination number, open basepairs also have rotational degrees of freedom. These degrees of freedom can be absorbed into ΔShb and ΔSst. In practice, Est and Ehb are measured experimentally at different temperatures using designed DNA molecules, see Refs. 24 and 25 for details.
Given the quantities above, we now define the total statistical weight, which includes energetic and entropic contributions, associated with the opening of basepair k:
(A8)
the ring factor for bubble initiation
(A9)
the local cooperativity parameter
(A10)
and the loop factor
(A11)
As in Ref. 13 we find it convenient to rescale QL(k) as
(A12)
Finally, combining all the previous relations, we recover the Garel–Orland recursion relation46 for k = 1, …, N:
(A13)
with “boundary” conditions qL(0) = 1 and qL(1) = 1/Λ(1). A similar recursion relation for the right partition functions qR can also be derived.

In the next section, we solve this equation for homogeneous-DNA with a defect basepair.

In this appendix we solve the Garel–Orland recursion relation, Eq. (A13), for homo-DNA with a defect basepair described by Eqs. (4) and (5), and we provide analytic expressions for the melting probabilities of the defect basepair, a neighboring basepair, and a background basepair. To do so, we follow along similar lines as the derivation for the homo-DNA case given in Ref. 13.

First, we introduce the generating function
(B1)
We multiply Eq. (A13) by zk and sum over k, which leads, after using Eqs. (4) and (5) and the convolution theorem for z-transforms, and after some rearrangements, to:
(B2)
where the polylogarithmic function Lic(z) is defined in Eq. (7).
We now invert Eq. (B2) using contour integration:47,
(B3)
where the integration is for a counter-clockwise closed contour C around z = 0. Then, we deform the contour into several parts C1 to C5, as described in Refs. 13 and 45 and shown in Fig. 4. The parts C2, C4 and C5 are necessary to go around the branch cut of the polylogarithm for Re(z) > 1. Using standard methods from complex analysis, one easily show that the integration on portions C2 and C3 gives vanishing contributions. Portion C1 is a clockwise contour around the pole z0 of q̄L(z), which exists only if z0 < 1. The pole is the solution of Eq. (8), and its existence depends on both the loop factor c and temperature T. If c < 1, a solution z0 < 1 always exists, regardless of the temperature, because the polylogarithm Lic(z) diverges as z → 1. On the other hand if c > 1, a pole z0 < 1 exists only for temperatures below a critical temperature that we call T*. The critical temperature is thus defined as the temperature for which z0 = z* = 1, and is associated with weights β* and σ*, linked by Eq. (8) when z0 = 1:
(B4)
where ζ(c) = Lic(1) is the Riemann zeta function. These behaviors are illustrated in Fig. 5, for two different values of the loop exponent. Finally, the contours C4 and C5 are the circumvention of the branch cut for z ≥ 1, and we call the result of the integration along these contours the branch cut contribution qLBC(k).
FIG. 4.

Contours used in the integral Eq. (B3) for the statistical weight qL(k). The red clockwise contour around z = 0 is the original contour, which is deformed into contours C1 to C5 in blue. Portions C2, C4 and C5 are necessary to circumvent the branch cut of the polylogarithm for Re(z) > 1. The contour C3 is sent to infinity, and C1 is clockwise to subtract the contribution of the pole z0 of q̄L(z) given in Eq. (B2), if z0 < 1.

FIG. 4.

Contours used in the integral Eq. (B3) for the statistical weight qL(k). The red clockwise contour around z = 0 is the original contour, which is deformed into contours C1 to C5 in blue. Portions C2, C4 and C5 are necessary to circumvent the branch cut of the polylogarithm for Re(z) > 1. The contour C3 is sent to infinity, and C1 is clockwise to subtract the contribution of the pole z0 of q̄L(z) given in Eq. (B2), if z0 < 1.

Close modal
FIG. 5.

Determination of the pole z0 of q̄L(z) by the intersection of the polylogarithm Lic(z) in plain black, and F(z) = (βz)/(σz) in colors, for different values of β [see Eq. (8)]. The vertical dotted lines indicate z = 1. For c = 0.7, Lic(z) diverges when z → 1 so that the intersection at z0 < 1 exists for any β (i.e. any temperature). For c = 1.76, Lic(1) = ζ(c) is finite, and thus for T > T* there is no solution z0.

FIG. 5.

Determination of the pole z0 of q̄L(z) by the intersection of the polylogarithm Lic(z) in plain black, and F(z) = (βz)/(σz) in colors, for different values of β [see Eq. (8)]. The vertical dotted lines indicate z = 1. For c = 0.7, Lic(z) diverges when z → 1 so that the intersection at z0 < 1 exists for any β (i.e. any temperature). For c = 1.76, Lic(1) = ζ(c) is finite, and thus for T > T* there is no solution z0.

Close modal
Using Cauchy’s residue theorem47 for pole z0, we obtain:
(B5)
Using the fact that Res[A(z)/B(z), z = z0] = A(z0)/B′(z0) if z0 is a simple pole and A(z) is a regular function, we explicitly compute the residue when k > kd as
(B6)
After some rearrangements, the contribution of the branch cut for k > kd can be expressed as:
(B7)
in terms of the branch cut contribution for homo-DNA without defect: qLhomoBC(k)I(k). We give an expression of I(k) as an explicit integral that can be computed numerically in  Appendix C, and show that it vanishes in the limit k.
To further simplify these expressions, we must compute the partition functions qL in the right hand sides of Eqs. (B6) and (B7). To do so, it is useful to note that, for k = 0, …, kd − 1, the partition functions qL(k) are equal to their counterparts in the absence of defects, which we note q̂L(k). Therefore, by comparing Garel–Orland recursion relations in both situations, we can express the weights in Eqs. (B6) and (B7) as linear combinations of weights for homo-DNA without defect:
(B8)
(B9)
(B10)
(B11)
The weights q̂L are directly obtained from Eqs. (B5)(B7), when canceling the defect by setting βL = βR = β and σL = σR = σ:
(B12)
where the second term in the right hand side appears only if z0 < 1. Combining Eqs. (B8)(B12) with Eq. (B6) and with Eq. (B7), neglecting I(k), I(kd − 1), I(kd) and I(kd + 1) in the thermodynamic limit, using Eq. (8), and after multiplication by βRβLβk−2 we get for k > kd:
(B13)
(B14)
We consider that the defect is far from both ends of the DNA, so that in the thermodynamic limit, we have kd and also Nkd. Therefore, I(N + 1 − kd) and I(Nkd) vanish and QL(N + 1) is only controlled by the pole, and given by Eq. (B13) when z0 < 1.

1. Below the critical temperature

Let us first focus on the case z0 < 1. The partition function at the defect basepair QL(kd) is given by Eq. (B8) after multiplication by βkd1, using Eq. (B12) and neglecting I(kd − 1) and I(kd) in the thermodynamic limit:
(B15)
(B16)
where the right partition function is derived similarly. Intuitively, it is obtained by reversing the DNA strand, so that k has to be replaced by N + 1 − k and σL and σR must be inverted. By combining Eqs. (A1), (B15), (B16), and (B13) for k = N + 1, we obtain the probability that the defect basepair is closed, Eq. (6) in the main text.

For a basepair k > kd in the right neighborhood of the defect, the left partition function QL(k) is given by the sum of Eqs. (B13) and (B14), where the terms I(kkd) and I(kkd − 1) are not negligible since we do not suppose a priori that the neighboring basepair is far from the defect. On the other hand, the right partition function describe the region from basepair k to basepair N + 1, with the constraint that they are both closed. Since this region does not include any defect, QR(k) is given by the formula for homo-DNA without defect. Combining these results with Eqs. (A1) and (B13) for k = N + 1, we recover the probability that a basepair in the right neighborhood of the defect is closed, Eq. (12) in the main text.

2. Above the critical temperature

When T > T*, there is no contribution of pole z0, so that the partition functions are only given by the branch cut contributions. For the defect basepair, the branch cut contributions of the left partition function QL(kd) are I(kd) and I(kd − 1), coming from Eqs. (B8) and (B12), which are equivalent I(kd) ∼ I(kd − 1) in the limit kd. Similarly, the right partition function QR(kd) for the defect basepair is proportional to I(Nkd), and the left partition function QL(N + 1) for the final basepair N + 1 is proportional to I(N). Combining these results, the probability that the defect basepair is closed is asymptotically given by:
(B17)
(B18)
where we used a result derived in  Appendix C: I(k)kλc when k, with λc = c when c > 1 and λc = 2 − c when c < 1. In all cases, λc > 1, therefore Eq. (B18) vanishes in the thermodynamic limit where kd, N, and kd is far from both ends [think for example of kd = pN, with 0 < p < 1, so that P(kd)p(1p)Nλc0].

For a basepair in the right neighboring of the defect, we still have QL(N + 1) ∼ I(N), moreover QR(k) ∼ I(Nk) since there is no defect in the region from k to N. Now the different branch cut contributions appearing in QL(k) are I(k) and terms equivalent to I(kd)I(kkd), coming from the combination of Eq. (B7) with Eqs. (B8)(B12). The part proportional to I(k) gives a vanishing contribution to the probability, which follows from the same calculation as for the defect basepair. The probability that a basepair at position k > kd is closed is thus asymptotically given by: P(k) ∼ I(kd)I(kkd)I(Nk)/I(N). If we are interested in a basepair k that remains at a fixed distance of the defect in the thermodynamic limit where k, kd, N, then I(kkd) is simply a constant, I(kd) ∼ I(k), and again the ratio reduces to the case treated for the defect basepair, which gives a vanishing probability in the thermodynamic limit. If kkd, then the asymptotic behavior I(k)kλc can be used for all four functions I involved in the probability, and the probability reads: P(k)kd(kkd)(Nk)/Nλc, which goes to zero in the thermodynamic limit.

In this appendix, we make explicit the behavior of the branch cut contribution to the melting probability, and show that it vanishes in the thermodynamic limit. The contributions of portions C4 and C5 of the integration contour defined in Fig. 4 are given by:
(C1)
(C2)
where we parameterized the portions C4 and C5 with z = x and z = x + respectively, and we considered the limit δ → 0. The real part of the polylogarithm function is continuous across the branch-cut: Re[Lic(x + )] = Re[Lic(x)] = R(x) when δ → 0, but the the imaginary part makes a jump: Im[Lic(x + )] = −Im[Lic(x)] = π(ln x)c−1/Γ(c) when δ → 0, where
(C3)
is the gamma function. Using these two properties, we obtain
(C4)
From this expression, we get that
(C5)
because Ic is positive. The branch cut contribution is thus a decreasing function of the basepair position k.
We now derive the asymptotic scaling of Ic in the thermodynamic limit, that is when k. First let us make the variable substitution s = (k − 1) ln x:
(C6)
Since we are mostly interested in the behavior near the critical point, we use a general expansion for Lic(z) close to z = 1 obtained in Ref. 48 when c is not a positive integer:
(C7)
where the third term represents higher order corrections involving the Stirling numbers of the first kind Sn(m). We now consider the thermodynamic limit and let k in Eq. (C6). Retaining only the two first terms in Eq. (C7), and replacing es/(k−1) ≈ 1, Eq. (C6) becomes:
(C8)
where A = 2(1 − β*/β)σΓ(1 − c) cos(πc)/β. To derive Eq. (C8), we used the reflection formula Γ(z)Γ(1 − z) = π/sin(πz), the result cos(πzπ) = −cos(πz), the fact that Re[(−1)c−1] = Re[e(c−1)] = cos[π(c − 1)], and the trigonometric identity: cos(z)2 + sin(z)2 = 1. To proceed further we separate between the cases c > 1 and c < 1.
When c > 1, the denominator of Eq. (C8) is dominated by its first term:
(C9)
(C10)
The finite-size correction decays in a power-law fashion kc. At the melting transition ββ*, we have that Ic(k)|c>1 diverges with an exponent = −2.
When c < 1, the third term in the denominator of Eq. (C8) dominates:
(C11)
(C12)
(C13)
where we used the reflection formula for the gamma function again, and the property Γ(z + 1) = zΓ(z). Again the finite-size correction decays in a power-law fashion, in this case k−(2−c). However, no power-law divergence occurs near the transition ββ*.
In this appendix, we investigate the behavior of the probabilities Eqs. (6) and (12) close to the melting transition TT*, i.e. when ββ*, or equivalently when z0z* = 1. We define Δz = 1 − z0 ≪ 1, and we insert this ansatz into Eq. (8) using Eqs. (B4) and (C7) to derive the following equation for the unknown Δz:
(D1)
where we defined Δβ = ββ*. We then need to separate the cases: (i) 0 < c < 1; (ii) 1 < c < 2 and (iii) c > 2.
  • 0 < c < 1: the left-hand side of Eq. (D1) diverges as Δz → 0, whereas the right-hand side is finite. Thus, Eq. (D1) has no solutions for z0 close to 1. Instead, all solutions to Eq. (8) occur for z0 < 1, and P(k) is for all temperature given by Eq. (6) for the defect basepair and by Eq. (12) for the neighboring basepairs. The transition is therefore “smooth” (derivatives to any order of P with respect to T exists).

  • 1 < c < 2: the left-hand side of Eq. (D1) is dominated by its second term as Δz → 0, so that:
    (D2)
    Note that we have Δβ < 0 and Γ(z) < 0 for −1 < z < 0, so that their ratio is positive.
  • c > 2: the left-hand side of Eq. (D1) is dominated by its first term as Δz → 0, so that:
    (D3)
Let us now link Δβ and ΔT = TT*. First, for homo-DNA it is useful to re-write Eq. (A8) as
(D4)
where we define
(D5)
the temperature for which the closed and open states have the same statistical weight β = 1. For homo-DNA with independent basepairs (c = 0 and σ = 1), we showed in the discussion following Eq. (9) that Pbg = 1/(1 + β), thus in this case, TM is the melting temperature, at which half the basepairs are open.
Then,
(D6)
with
(D7)
(D8)
when ΔT → 0. We then insert Eq. (D8) into Eq. (D6), expand it at first order in ΔT, and identify β*, which leads to:
(D9)
with
(D10)
Finally, we need to expand the polylogarithm. Keeping the two first terms in Eq. (C7) and expanding the natural logarithm gives:
(D11)
Now let us write the probability in the vicinity of the transition explicitly. For 1 < c < 2, combining Eqs. (D2) and (D9)(D11) and inserting them in Eq. (6) for the defect basepair and in Eq. (12) for the neighboring basepairs leads to the following leading behavior:
(D12)
with
(D13)
(D14)
For c > 2, we combine Eqs. (D3) and (D9)(D11) and inserting them in Eq. (6) for the defect basepair and in Eq. (12) for the neighboring basepairs leads to the following leading behavior:
(D15)
with
(D16)
(D17)
The values A(k) and B(k) for a background basepair far from the defect (k) are directly obtained from Eqs. (D14) and (D17) by setting I = 0.
1.
J. D.
Watson
and
F. H.
Crick
,
Cold Spring Harbor Symposia on Quantitative Biology
(
Cold Spring Harbor Laboratory Press
,
1953
), Vol.
18
, pp.
123
131
.
2.
B.
Alberts
,
D.
Bray
,
K.
Hopkin
,
A. D.
Johnson
,
J.
Lewis
,
M.
Raff
,
K.
Roberts
, and
P.
Walter
,
Essential Cell Biology
,
4th ed.
(
Garland Science
,
2015
).
3.
D.
Poland
and
H. A.
Scheraga
,
Theory of Helix-Coil Transitions in Biopolymers: Statistical Mechanical Theory of Order-Disorder Transitions in Biological Macromolecules
(
Academic Press
,
New York
,
1970
).
4.
M.
Hillebrand
,
G.
Kalosakas
,
C.
Skokos
, and
A. R.
Bishop
,
Phys. Rev. E
102
,
062114
(
2020
).
5.
K.
Pant
,
R. L.
Karpel
, and
M. C.
Williams
,
J. Mol. Biol.
327
,
571
(
2003
).
6.
C. H.
Choi
,
G.
Kalosakas
,
K. Ø.
Rasmussen
,
M.
Hiromura
,
A. R.
Bishop
, and
A.
Usheva
,
Nucleic Acids Res.
32
,
1584
(
2004
).
8.
G.
Kalosakas
,
K. Ø.
Rasmussen
,
A. R.
Bishop
,
C. H.
Choi
, and
A.
Usheva
,
Europhys. Lett.
68
,
127
(
2004
).
9.
M.
Hillebrand
,
G.
Kalosakas
,
A. R.
Bishop
, and
C.
Skokos
,
J. Chem. Phys.
155
,
095101
(
2021
).
10.
M. D.
Frank-Kamenetskii
and
S.
Prakash
,
Phys. Life Rev.
11
,
153
(
2014
).
11.
A.
Vologodskii
and
M. D.
Frank-Kamenetskii
,
Phys. Life Rev.
25
,
1
(
2018
).
12.
T.
Garel
and
C.
Monthus
,
J. Stat. Mech.
2005
,
P06004
.
13.
M.
Reiter-Schad
,
E.
Werner
,
J. O.
Tegenfeldt
,
B.
Mehlig
, and
T.
Ambjörnsson
,
J. Chem. Phys.
143
,
115101
(
2015
).
14.
S.
Ares
,
N. K.
Voulgarakis
,
K. Ø.
Rasmussen
, and
A. R.
Bishop
,
Phys. Rev. Lett.
94
,
035504
(
2005
).
15.
T. S.
Van Erp
,
S.
Cuesta-López
, and
M.
Peyrard
,
Eur. Phys. J. E
20
,
421
(
2006
).
16.
M.
Santosh
and
P. K.
Maiti
,
J. Phys.: Condens. Matter
21
,
034113
(
2009
).
17.
N.
Singh
and
Y.
Singh
,
Eur. Phys. J. E
17
,
7
(
2005
).
18.
Y.
Kafri
,
D.
Mukamel
, and
L.
Peliti
,
Physica A
306
,
39
(
2002
).
19.
A.
Hanke
,
M. G.
Ochoa
, and
R.
Metzler
,
Phys. Rev. Lett.
100
,
018106
(
2008
).
20.
N.
Theodorakopoulos
,
Phys. Rev. E
68
,
026109
(
2003
).
21.
D.
Poland
and
H. A.
Scheraga
,
J. Chem. Phys.
45
,
1456
(
1966
).
22.
D.
Poland
and
H. A.
Scheraga
,
J. Chem. Phys.
45
,
1464
(
1966
).
23.
J.
SantaLucia
,
Proc. Natl. Acad. Sci. U. S. A.
95
,
1460
(
1998
).
24.
A.
Krueger
,
E.
Protozanova
, and
M. D.
Frank-Kamenetskii
,
Biophys. J.
90
,
3091
(
2006
).
25.
E.
Protozanova
,
P.
Yakovchuk
, and
M. D.
Frank-Kamenetskii
,
J. Mol. Biol.
342
,
775
(
2004
).
26.
R. M.
Wartell
and
A. S.
Benight
,
Phys. Rep.
126
,
67
(
1985
).
27.
R. D.
Blake
,
J. W.
Bizzaro
,
J. D.
Blake
,
G. R.
Day
,
S. G.
Delcourt
,
J.
Knowles
,
K. A.
Marx
, and
J.
SantaLucia
,
Bioinformatics
15
,
370
(
1999
).
28.
R.
Blossey
and
E.
Carlon
,
Phys. Rev. E
68
,
061911
(
2003
).
29.
Y.
Kafri
,
D.
Mukamel
, and
L.
Peliti
,
Phys. Rev. Lett.
85
,
4988
(
2000
).
30.
G.
Florio
and
G.
Puglisi
,
Acta Biomater.
157
,
225
(
2023
).
31.
W.
Reisner
,
N. B.
Larsen
,
A.
Silahtaroglu
,
A.
Kristensen
,
N.
Tommerup
,
J. O.
Tegenfeldt
, and
H.
Flyvbjerg
,
Proc. Natl. Acad. Sci. U. S. A.
107
,
13294
(
2010
).
32.
C.
Freitag
,
C.
Noble
,
J.
Fritzsche
,
F.
Persson
,
M.
Reiter-Schad
,
A. N.
Nilsson
,
A.
Granéli
,
T.
Ambjörnsson
,
K. U.
Mir
, and
J. O.
Tegenfeldt
,
Biomicrofluidics
9
,
044114
(
2015
).
34.
M.
Fixman
and
J. J.
Freire
,
Biopolymers
16
,
2693
(
1977
).
36.
D. Y.
Lando
,
A. S.
Fridman
, and
C.-K.
Hu
,
Europhys. Lett.
91
,
38003
(
2010
).
37.
A.
Kingsland
and
L.
Maibaum
,
J. Phys. Chem. B
122
,
12251
(
2018
).
38.
G.
Altan-Bonnet
,
A.
Libchaber
, and
O.
Krichevsky
,
Phys. Rev. Lett.
90
,
138101
(
2003
).
39.
T.
Ambjörnsson
,
S. K.
Banik
,
O.
Krichevsky
, and
R.
Metzler
,
Phys. Rev. Lett.
97
,
128105
(
2006
).
40.
N. C.
Harris
and
C.-H.
Kiang
,
J. Phys. Chem. B
110
,
16393
(
2006
).
41.
P. R.
Desai
,
S.
Brahmachari
,
J. F.
Marko
,
S.
Das
, and
K. C.
Neuman
,
Nucleic Acids Res.
48
,
10713
(
2020
).
42.
N.
Singh
and
Y.
Singh
,
Phys. Rev. E
64
,
042901
(
2001
).
43.
H.
Kunz
and
R.
Livi
,
Europhys. Lett.
99
,
30001
(
2012
).
44.
M. E.
Fisher
,
J. Chem. Phys.
45
,
1469
(
1966
).
45.
F.
Wiegel
,
Phase Transitions and Critical Phenomena
(
Academic Press
,
London
,
1983
), Vol.
7
, pp.
101
149
.
46.
T.
Garel
and
H.
Orland
,
Biopolymers
75
,
453
(
2004
).
47.
D.
Wunsch
,
Complex Variables with Applications
,
3rd ed.
(
Pearson
,
2004
).
48.
W. H.
Paulsen
,
Complex Variables, Theory Appl.
47
,
815
(
2002
).