In this study we derive analytically the equilibrium melting probabilities for basepairs of a DNA molecule with a defect site. We assume that the defect is characterized by a change in the Watson–Crick basepair energy of the defect basepair, and in the associated two stacking energies for the defect, as compared to the remaining parts of the DNA. The defect site could, for instance, occur due to DNA basepair mismatching, cross-linking, or by the chemical modifications when attaching fluorescent labels, such as fluorescent-quencher pairs, to DNA. Our exact solution of the Poland–Scheraga model for DNA melting provides the probability that the labeled basepair, and its neighbors, are open at different temperatures. Our work is of direct importance, for instance, for studies where fluorophore-quencher pairs are used for studying single basepair fluctuations of designed DNA molecules.

## I. INTRODUCTION

The three dimensional ground-state structure of double-stranded Deoxyribonucleic acid (dsDNA) is the Watson–Crick double-helix.^{1,2} When increasing the temperature, the double-stranded double-helical DNA starts to “melt.” Partially melted DNA is an alternating set of intact double-stranded regions and of melted single-stranded DNA regions (“bubbles”).^{3,4} The biological relevance of DNA melting is, for instance, due to the fact that local denaturation of DNA is necessary for protein binding to DNA single-strand^{5} and required for transcription initiation.^{6–9}

In Refs. 10 and 11, the melting behavior of DNA is reviewed. Three types of models are discussed: the Poland–Scheraga (PS)/two-state/helix–coil model, the Peyrard–Bishop–Dauxois (PBD) model^{10} and full-scale molecular dynamics simulations. For each model, numerical methods exist which takes a DNA sequence as input, and provides a melting profile (equilibrium probability that each basepair is open) as output, see Refs. 12 and 13 (PS model), Refs. 14 and 15 (PBD model) and Ref. 16. The case when a force is applied to the end of DNA has been studied within the PBD model,^{17} PS model,^{18,19} and through molecular dynamics simulations.^{16} Finite-size scaling analysis of DNA melting was studied within the PBD model in Ref. 20 and within the PS model in Ref. 12.

In this article, we use the PS model, as recommended in Ref. 10. The PS model of DNA melting^{21,22} has been proven to well reproduce macroscopic melting data.^{7,10,23–28} From the theory point of view, the PS model is particularly interesting since, for the case when all basepairs are identical (homo-DNA), it has been shown analytically that the model has a phase transition.^{29} Analytic prediction of the melting behaviour of homo-DNA within the PBD model are available in Ref. 30.

In the PS model, a given basepair can be in two possible states (closed or open). The energy function for a given state of the DNA molecule is of Ising-type with an additional long-range term^{21,27} due to the entropy associated with the melted single-stranded regions. Overall, the PS model has the following parameters: two hydrogen bond energies (AT or GC bonds), ten stacking (nearest neighbor) parameters, the loop entropy function *g*(*m*) (characterized by a loop exponent *c*) for melted DNA regions consisting of *m* basepairs, and the cooperativity parameter. All parameters in the PS model are available from numerous experiments.^{23,24,31,32}

On the computational side, Poland introduced an exact algorithm^{33} for calculating the melting probability of each base-pair within the PS model. Poland’s method has computational time which scales as *N*^{2} for a sequence of length *N*. By introducing an approximation of *g*(*m*), Fixman and Freire^{34} managed to bring down computational costs to a linear scaling with *N*.

In here, we study analytically and through numerical computations (using the Poland algorithm), the melting behavior of a DNA with all basepairs identical except at one position (“defect”). A defect here refers to any scenario where DNA stability is locally increased or decreased compared to the remaining part of the DNA. Examples of defects include DNA interstrand crosslinks,^{35} strongly stabilized sites,^{36} base-pair mismatches,^{37} or potential changes on local DNA chemical structures when a fluorophore-quencher pair is attached to the opposite strands of a DNA molecule as in Refs. 38 and 39.

A number of previous studies investigated the effect of defects on the melting of DNA. Changes in melting temperature in short DNA due to the presence of mismatches were experimentally studied in Ref. 40. Molecular dynamics simulations of DNA with a defect was investigated in Ref. 41. Effect of defects on thermal denaturation of short DNA was investigated within the PBD model in Ref. 42. The order of the phase transition for a weakly disordered DNA, by applying a mean-field type approximation, was investigated analytically in Ref. 43. Despite these studies, to our knowledge, no previous work investigated by analytical means how a single defect affects the melting behavior of the PS model. In particular, it remains to be investigated how such a defect affects the phase transition, and how many basepairs away from the defect the melting probability is affected.

In this study, we solve a prototypical model analytically: a DNA molecule which has all identical basepairs, except for one “defect site,” as illustrated in Fig. 1. The hydrogen-bond energy at this defect site is different, and the stacking interactions between this site and its two neighbors are different compared to the rest of the DNA. For such a system, we calculate analytically the basepair melting probabilities within the PS model.

Due to the long-range coupling and the nearest neighbour stacking interaction, the effect of a defect in our system is non-trivial. Our analytic prediction provides a basic understanding of both how the melting temperature changes with defect properties, but also of the “scope” of the defect, i.e., over how long a distance the defect affects basepair melting probabilities. Experimentally, the melting curves predicted herein, and available for arbitrary DNA sequences using our publicly available codes, can be measured using specifically designed DNA sequences.

## II. THE POLAND–SCHERAGA MODEL WITH A DEFECT

Let us consider a dsDNA with *N* basepairs, clamped at both ends (the first and last basepair are always in closed states) for simplicity. Physically, the clamping can be achieved by attaching beads to the end basepairs. We assume that the beads are free to rotate, and hence that the system can release torsional constraints.

*k*is denoted by

*E*

_{hb}(

*k*), and can take two values (AT-bonds or GC-bonds). For disrupting the stacking interactions between consecutive basepairs

*k*− 1 and

*k*there is an free energy cost

*E*

_{st}(

*k*− 1,

*k*), which can take ten different values for symmetry reasons.

^{24,25}Note that

*E*

_{hb}(

*k*) and

*E*

_{st}(

*k*− 1,

*k*) have energetic as well as entropic contributions. The ring factor for “bubble” (melted region) initiation

*ξ*≈ 10

^{−3}is the statistical weight associated with boundaries between melted and clamped DNA regions.

^{24,25}For convenience, we group these parameters in two equilibrium Boltzmann weights: the total statistical weight for the opening of basepair

*k*

*ϕ*≈ 10

^{−5}(see Ref. 27). Its smallness favors single large bubbles over sequences of small bubbles of the same total length, which accounts for the fact that increasing the length of an existing bubble breaks only one stacking interaction, while starting a new bubble disturbs two stacking interactions.

^{28}

*m*is characterized by the loop factor

*c*depends on which model best describes melted DNA regions. For unconfined DNA in three dimensions, an ideal random walk gives

*c*= 1.5; a self-avoiding random walk leads to

*c*= 1.76 (see Ref. 44); and if self-avoidance between intact and melted DNA regions is considered, then

*c*= 2.11 (see Ref. 29). The effects of DNA confinement in nanochannels on the loop factor have been investigated in Ref. 13. Note that in principle Eq. (3) is only valid for large bubbles, however it is common practice to assign it to every bubble regardless of its length. The power-law form for

*g*(

*m*) gives rise to effectively long range interactions for molten DNA regions.

^{27,34}

*k*(

*k*= 1, …,

*N*) and the defect basepair is positioned at site

*k*

_{d}. The statistical weights for the opening of the defect and its right neighbor are

*β*

_{L}and

*β*

_{R}, and the associated local cooperativity parameters are

*σ*

_{L}and

*σ*

_{R}. The rest of the sequence is homogeneous with weights

*β*and

*σ*. Thus,

*δ*

_{k,k′}is the Kronecker delta. The mathematical details of the model are presented in Appendix A, which together with Eqs. (4) and (5) define the problem at hand.

In practice, the statistical weight of a bubble of length *m* (open basepairs) including the defect, like the one in Fig. 1 where the red/pink bases indicate the defect, is given by *γ*(*m*) = *m*^{−c}*σβ*_{L}*β*_{R}*β*^{m−2}. If the defect is on the edge of the bubble, either the last closed basepair on the left or the first closed basepair on the right, the statistical weight of the bubble is *γ*(*m*) = *m*^{−c}*σ*_{R}*β*^{m} or *γ*(*m*) = *m*^{−c}*σ*_{L}*β*^{m}, respectively. A bubble of the same length which does not include the defect would have a weight *γ*(*m*) = *m*^{−c}*σβ*^{m}. On the other hand, the statistical weight of non-melted regions is 1 by definition. Then, the partition function of the system, from which the melting probabilities are deduced, is obtained by summing over all possible configurations of the dsDNA (number, positions and lengths of the bubbles).

## III. RESULTS

The model equations as presented in Sec. II are solved analytically in Appendix B, and our analytical predictions are corroborated using numerical computations.

### A. Melting probability at the defect basepair

*c*≤ 1 and

*c*> 1. When the loop exponent is smaller than one:

*c*≤ 1, we show in Appendix B that the probability

*P*

_{df}that the defect basepair is closed is continuous from 1 to 0 when increasing temperature, and that for any temperature and in the thermodynamic limit (long DNA strand and defect far from the ends), this probability is given by

*z*

_{0}is determined by the solution of

*z*

_{0}< 1 to Eq. (8) for all temperatures when

*c*≤ 1.

*β*

_{L}=

*β*

_{R}=

*β*and

*σ*

_{L}=

*σ*

_{R}=

*σ*, then Eq. (6) reduces to the case for homogeneous DNA treated in Ref. 13

*P*

_{bg}as given above as background probability. As a consistency check of the result above, we have that if that basepairs are independent, i.e., if there is no stacking interaction (

*σ*= 1) and no loop correction (

*c*= 0), then using Li

_{−1}(

*z*) =

*z*/(1 −

*z*)

^{2}and Li

_{0}(

*z*) =

*z*/(1 −

*z*), together with Eq. (8), we obtain

*P*

_{bg}= 1/(1 +

*β*). This is the expected result for a basepair with two possible states, open and closed, respectively associated with Boltzmann weights

*β*and 1.

*ξ*≈ 10

^{−3}and

*σ*

_{L}and

*σ*

_{R}are proportional to

*ξ*, Eq. (6) is well approximated using a first order Taylor expansion in

*ξ*of the denominator:

*ξ*we have that $\beta z0\u22121=1+\xi Lic(z0)\u22481$.

*c*> 1, which is more relevant for real DNA, we observe a phase transition at critical temperature

*T*

_{*}, defined by the condition

*z*

_{0}=

*z*

_{*}= 1. In this case, we show in Appendix B that in the thermodynamic limit (long DNA strand and defect

*k*

_{d}far from the ends), below the critical temperature the probability that the defect basepair is closed is equal to

*P*

_{df}given by Eq. (6), and above the critical temperature this probability is null:

*z*

_{0}< 1 for the generating function of the system. Please note that while, below the critical temperature, the probability for

*c*> 1 is given by the same formula as the probability for

*c*≤ 1, their numerical values differ since Eq. (6) depends on the loop exponent

*c*via the polylogarithm. When

*c*> 1, the way

*P*

_{df}approaches 0 near the critical temperature, which defines the nature of the phase transition, is explored in Sec. III C.

### B. Melting probability in the neighborhood of the defect

We now give an analytic expression for the probability that a basepair in the neighborhood of the defect is closed. The details of our derivation is found in Appendix B.

*d*=

*k*−

*k*

_{d}> 0 of the defect. In the thermodynamic limit, for all temperatures if

*c*≤ 1 and for

*T*<

*T*

_{*}if

*c*> 1, the exact probability that a basepair at a distance

*d*to the right of the defect is closed is given by:

*P*

_{bg}is given by Eq. (9) and

*I*(

*k*) is an explicit integral given in Appendix C by Eq. (C4), that can be computed numerically.

The left neighborhood of the defect is turned into the right neighborhood of the defect when the DNA sequence is reversed (the first basepair becomes last and vice versa), which also reserves the left and right stacking energies. For this reason, the probability *P*(*k*_{d} − *d*) for a basepair at a distance *d* = *k*_{d} − *k* > 0 to the left of the defect is given by Eq. (12) when replacing *β*_{L}, *β*_{R}, *σ*_{L} and *σ*_{R} by $\beta L\u2020=\beta L\sigma R/\sigma L$, $\beta R\u2020=\beta R\sigma L/\sigma R$, $\sigma L\u2020=\sigma R$ and $\sigma R\u2020=\sigma L$, respectively.

*I*(

*d*) is a decreasing function of the distance

*d*as shown Appendix C, the scope of the perturbation induced by a defect is decaying at least exponentially fast when moving away from the defect, with a characteristic length-scale

*I*decays as a power law for large

*d*: $I(d)\u223cd\u2212\lambda c$, with

*λ*

_{c}=

*c*when

*c*> 1 and

*λ*

_{c}= 2 −

*c*when

*c*< 1 (see Appendix C). Thus, far from the defect the perturbation decays as:

*d*

_{0}. Note that

*d*

_{0}increases with temperature and diverges as we approach the critical temperature (

*z*

_{0}→ 1).

In Fig. 2, we compare the theoretical predictions Eqs. (6) and (12) to the numerical solution of the PS model, for the three values of the loop exponent discussed in Sec. II: *c* = 1.5, *c* = 1.76 and *c* = 2.11, below the transition (*z*_{0} < 1), and observe an excellent agreement. Let us make three comments on this figure. First, there is a slight asymmetry around the defect in the melting map, coming from the different values for the left and right Boltzmann weights *β*_{L} and *β*_{R}. Second, since *β*_{L} > *β* and *β*_{R} > *β*, it requires less energy to disrupt the defect bond than the background bonds, which explains why the opening probability is larger for the defect basepair. Third, the smaller the loop exponent *c* the larger the opening probabilities, because a smaller *c* means less avoidance for the melted regions [Eq. (3)], leading to a higher number of possible three-dimensional configurations (larger entropy), and thus to a higher probability to be in the open state.

### C. Phase transition

The nature of the phase transition for homogeneous DNA has been long studied and is well understood.^{29} It has been shown that no phase transition occurs for *c* < 1, that the probability for a closed basepair approaches 0 with a critical exponent (2 − *c*)/(*c* − 1) when 1 < *c* < 2, and that the phase transition is first order when *c* > 2 (see Refs. 29 and 45). In Appendix D, we show that these results hold true for homo-DNA with a defect, and that the position of the basepair (defect, neighboring or background) enters through a prefactor in the melting probability.

*c*< 2, we show that

*A*(

*k*) is a prefactor depending on the position

*k*of the basepair, but independent of the temperature difference

*T*−

*T*

_{*}. The prefactor

*A*(

*k*) is given explicitly in Eqs. (D13) and (D14). The probability that a basepair is closed thus approaches 0 in a power-law fashion with a critical exponent

*γ*= (2 −

*c*)/(

*c*− 1) and since

*γ*> 0 the transition is continuous. By taking the derivative of

*P*with respect to Δ

*T*=

*T*−

*T*

_{*}we find that for 3/2 <

*c*< 1 this derivative becomes infinite at

*T*=

*T*

_{*}, i.e. the transition is second order in the Ehrenfest sense. Proceeding to compute higher order derivatives we find that for 4/3 <

*c*< 3/2 the transition is third order, for 5/4 <

*c*< 4/3 the transition is fourth order etc. So, in general we have a

*N*th order transition, in the Ehrenfest sense, if (

*N*+ 1)/

*N*<

*c*<

*N*/(

*N*− 1).

*c*> 2, our derivation in Appendix D shows that

*B*(

*k*) > 0 is a position-dependent constant which is independent of the temperature difference

*T*−

*T*

_{*}. Therefore as we approach

*T*

_{*}from below we do not continuously approach 0, and thus the transition is first order. The explicit expression for

*B*(

*k*) is given in Eqs. (D16) and (D17).

We illustrate these results on Fig. 3 by plotting the melting curves for a background basepair, the defect basepair, and a basepair at a distance *d* = 10 to the right of the defect, using respectively Eqs. (9), (6), and (12), for a sequence “A…A” with a defect G, and for three different values of the loop exponent *c*. Unlike the example shown on Fig. 2, here the bond at the defect site is stronger than the bonds for the background basepairs (GC: three hydrogen bonds, AT: two hydrogen bonds), therefore the opening probability for the defect basepair is smaller than for background basepairs. Note that depending on the value of *c*, the neighboring basepair behaves either more similarly to the defect basepair or to a background basepair.

Because of the separation of variables between the basepair position *k* and the temperature difference *T* − *T*_{*} in Eqs. (15) and (16), the helicity, defined as the fraction of bounded basepairs $\theta =\u2211k=1NP(k)/N$, follows the same phase transitions as *T* → *T*_{*}: $\theta \u223c(T\u2212T*)(2\u2212c)/(c\u22121)$ for 1 < *c* < 2 and *θ* → *B* > 0 for *c* > 2. Thus the nature of the phase transition and the critical exponent for the present single-defect system are the same as for the PS model without a defect in the thermodynamic limit (*N* → *∞*).

## IV. SUMMARY AND OUTLOOK

In this study we investigated how a defect affects the DNA melting transition. Our exact solution of the Poland–Scheraga model provides the melting probabilities for the defect site and its neighbors, as given in Eqs. (6) and (12). The opening probability for the defect site can be either larger or smaller than for the background basepairs, depending on whether the defect effectively weakens or strengthens the bond. Therefore, the presence of a defect can both increase or decrease the probability of bubble formation. We find that the effect of the defect decays with a typical distance *d*_{0} as given in Eq. (13), which depends on the distance to the critical temperature.

We also investigated the phase transition behavior in the thermodynamical limit (*N* → *∞*) of our model system. We found that, while the single defect melting probability has a prefactor which is different than the rest of the DNA, the temperature dependence is such that the order of the phase transition, and the critical exponents, are the same as for the case without a defect.

Our findings could potentially be experimentally corroborated using, for instance, a setup with fluorophore-quencher pairs attached to the DNA.^{38,39}

It remains a future challenge to solve analytically the case with multiple defect sites. Our results indicate that such a system might show a rich behavior, where the “interaction” between defects depends on how far from the critical temperature the system is, on the distance between defects, as well as on the value of the loop exponent.

## ACKNOWLEDGMENTS

T.A. is grateful to the Swedish Research Council for funding (Grant No. 2014-4305). A.G. thanks Malcolm Hillebrand for is careful review of the manuscript and for useful discussions.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Arthur Genthon**: Formal analysis (equal); Writing – original draft (equal); Writing – review & editing (equal). **Albertas Dvirnas**: Formal analysis (supporting); Supervision (supporting); Writing – review & editing (equal). **Tobias Ambjörnsson**: Conceptualization (lead); Formal analysis (equal); Funding acquisition (lead); Supervision (lead); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The Matlab codes that support the findings of this study are openly available at https://github.com/Arthur-Genthon/dna-melting-defect.

### APPENDIX A: THE POLAND–SCHERAGA MODEL

*N*basepairs, clamped at both ends. Following Refs. 13 and 46 we denote by

*Q*

_{L}(

*k*) and

*Q*

_{R}(

*k*) the statistical weights (or partition functions) of the regions to the “left” and to the “right” of basepair

*k*, with the constraint that basepair

*k*is

*closed*. The probability for having basepair

*k*closed is then simply given by

^{46}can be derived by summing over the length of the right-most loop before the closed basepair

*k*+ 1:

*γ*

_{j}(

*k*−

*j*) is the statistical weight of a loop of length

*k*−

*j*, with first open basepair at

*j*+ 1 and last open basepair at

*k*. The recursion relation for

*Q*

_{R}(

*k*) is similar, see Ref. 46. The loop weight is given by:

*U*

_{hb}the energy required to break the hydrogen bonds (Watson–Crick) between facing bases,

*U*

_{st}the energy cost for disrupting a stacking interaction (nearest neighbor) between two consecutive basepairs, and Ω(

*m*) the number of configurations of the melted region of size

*m*. The number of configurations corresponds to the volume in phase space available for a random walk returning to its origin after 2

*m*steps, and is given by:

^{44},

*A*is a non-universal constant and

*c*is the loop factor. The quantity

*s*is the effective coordination number for the random walk and is another non-universal constant. A part of this number of configurations is absorbed as an entropic contribution in the energies:

*S*we assign to

*E*

_{hb}and how much we assign to

*E*

_{hb}. Besides the effective coordination number, open basepairs also have rotational degrees of freedom. These degrees of freedom can be absorbed into Δ

*S*

_{hb}and Δ

*S*

_{st}. In practice,

*E*

_{st}and

*E*

_{hb}are measured experimentally at different temperatures using designed DNA molecules, see Refs. 24 and 25 for details.

*k*:

*Q*

_{L}(

*k*) as

^{46}for

*k*= 1, …,

*N*:

*q*

_{L}(0) = 1 and

*q*

_{L}(1) = 1/Λ(1). A similar recursion relation for the right partition functions

*q*

_{R}can also be derived.

In the next section, we solve this equation for homogeneous-DNA with a defect basepair.

### APPENDIX B: EXACT SOLUTION OF THE POLAND–SCHERAGA MODEL WITH A DEFECT BASEPAIR

In this appendix we solve the Garel–Orland recursion relation, Eq. (A13), for homo-DNA with a defect basepair described by Eqs. (4) and (5), and we provide analytic expressions for the melting probabilities of the defect basepair, a neighboring basepair, and a background basepair. To do so, we follow along similar lines as the derivation for the homo-DNA case given in Ref. 13.

*z*

^{k}and sum over

*k*, which leads, after using Eqs. (4) and (5) and the convolution theorem for

*z*-transforms, and after some rearrangements, to:

_{c}(

*z*) is defined in Eq. (7).

^{47},

*C*around

*z*= 0. Then, we deform the contour into several parts

*C*

_{1}to

*C*

_{5}, as described in Refs. 13 and 45 and shown in Fig. 4. The parts

*C*

_{2},

*C*

_{4}and

*C*

_{5}are necessary to go around the branch cut of the polylogarithm for Re(

*z*) > 1. Using standard methods from complex analysis, one easily show that the integration on portions

*C*

_{2}and

*C*

_{3}gives vanishing contributions. Portion

*C*

_{1}is a clockwise contour around the pole

*z*

_{0}of $q\u0304L(z)$, which exists only if

*z*

_{0}< 1. The pole is the solution of Eq. (8), and its existence depends on both the loop factor

*c*and temperature

*T*. If

*c*< 1, a solution

*z*

_{0}< 1 always exists, regardless of the temperature, because the polylogarithm Li

_{c}(

*z*) diverges as

*z*→ 1. On the other hand if

*c*> 1, a pole

*z*

_{0}< 1 exists only for temperatures below a critical temperature that we call

*T*

_{*}. The critical temperature is thus defined as the temperature for which

*z*

_{0}=

*z*

_{*}= 1, and is associated with weights

*β*

_{*}and

*σ*

_{*}, linked by Eq. (8) when

*z*

_{0}= 1:

*ζ*(

*c*) = Li

_{c}(1) is the Riemann zeta function. These behaviors are illustrated in Fig. 5, for two different values of the loop exponent. Finally, the contours

*C*

_{4}and

*C*

_{5}are the circumvention of the branch cut for

*z*≥ 1, and we call the result of the integration along these contours the branch cut contribution $qLBC(k)$.

^{47}for pole

*z*

_{0}, we obtain:

*A*(

*z*)/

*B*(

*z*),

*z*=

*z*

_{0}] =

*A*(

*z*

_{0})/

*B*′(

*z*

_{0}) if

*z*

_{0}is a simple pole and

*A*(

*z*) is a regular function, we explicitly compute the residue when

*k*>

*k*

_{d}as

*k*>

*k*

_{d}can be expressed as:

*I*(

*k*) as an explicit integral that can be computed numerically in Appendix C, and show that it vanishes in the limit

*k*→

*∞*.

*q*

_{L}in the right hand sides of Eqs. (B6) and (B7). To do so, it is useful to note that, for

*k*= 0, …,

*k*

_{d}− 1, the partition functions

*q*

_{L}(

*k*) are equal to their counterparts in the absence of defects, which we note $q\u0302L(k)$. Therefore, by comparing Garel–Orland recursion relations in both situations, we can express the weights in Eqs. (B6) and (B7) as linear combinations of weights for homo-DNA without defect:

*β*

_{L}=

*β*

_{R}=

*β*and

*σ*

_{L}=

*σ*

_{R}=

*σ*:

*z*

_{0}< 1. Combining Eqs. (B8)–(B12) with Eq. (B6) and with Eq. (B7), neglecting

*I*(

*k*),

*I*(

*k*

_{d}− 1),

*I*(

*k*

_{d}) and

*I*(

*k*

_{d}+ 1) in the thermodynamic limit, using Eq. (8), and after multiplication by

*β*

_{R}

*β*

_{L}

*β*

^{k−2}we get for

*k*>

*k*

_{d}:

*k*

_{d}→

*∞*and also

*N*−

*k*

_{d}→

*∞*. Therefore,

*I*(

*N*+ 1 −

*k*

_{d}) and

*I*(

*N*−

*k*

_{d}) vanish and

*Q*

_{L}(

*N*+ 1) is only controlled by the pole, and given by Eq. (B13) when

*z*

_{0}< 1.

#### 1. Below the critical temperature

*z*

_{0}< 1. The partition function at the defect basepair

*Q*

_{L}(

*k*

_{d}) is given by Eq. (B8) after multiplication by $\beta kd\u22121$, using Eq. (B12) and neglecting

*I*(

*k*

_{d}− 1) and

*I*(

*k*

_{d}) in the thermodynamic limit:

*k*has to be replaced by

*N*+ 1 −

*k*and

*σ*

_{L}and

*σ*

_{R}must be inverted. By combining Eqs. (A1), (B15), (B16), and (B13) for

*k*=

*N*+ 1, we obtain the probability that the defect basepair is closed, Eq. (6) in the main text.

For a basepair *k* > *k*_{d} in the right neighborhood of the defect, the left partition function *Q*_{L}(*k*) is given by the sum of Eqs. (B13) and (B14), where the terms *I*(*k* − *k*_{d}) and *I*(*k* − *k*_{d} − 1) are not negligible since we do not suppose *a priori* that the neighboring basepair is far from the defect. On the other hand, the right partition function describe the region from basepair *k* to basepair *N* + 1, with the constraint that they are both closed. Since this region does not include any defect, *Q*_{R}(*k*) is given by the formula for homo-DNA without defect. Combining these results with Eqs. (A1) and (B13) for *k* = *N* + 1, we recover the probability that a basepair in the right neighborhood of the defect is closed, Eq. (12) in the main text.

#### 2. Above the critical temperature

*T*>

*T*

_{*}, there is no contribution of pole

*z*

_{0}, so that the partition functions are only given by the branch cut contributions. For the defect basepair, the branch cut contributions of the left partition function

*Q*

_{L}(

*k*

_{d}) are

*I*(

*k*

_{d}) and

*I*(

*k*

_{d}− 1), coming from Eqs. (B8) and (B12), which are equivalent

*I*(

*k*

_{d}) ∼

*I*(

*k*

_{d}− 1) in the limit

*k*

_{d}→

*∞*. Similarly, the right partition function

*Q*

_{R}(

*k*

_{d}) for the defect basepair is proportional to

*I*(

*N*−

*k*

_{d}), and the left partition function

*Q*

_{L}(

*N*+ 1) for the final basepair

*N*+ 1 is proportional to

*I*(

*N*). Combining these results, the probability that the defect basepair is closed is asymptotically given by:

*k*→

*∞*, with

*λ*

_{c}=

*c*when

*c*> 1 and

*λ*

_{c}= 2 −

*c*when

*c*< 1. In all cases,

*λ*

_{c}> 1, therefore Eq. (B18) vanishes in the thermodynamic limit where

*k*

_{d}→

*∞*,

*N*→

*∞*, and

*k*

_{d}is far from both ends [think for example of

*k*

_{d}=

*pN*, with 0 <

*p*< 1, so that $P(kd)\u223cp(1\u2212p)N\u2212\lambda c\u21920$].

For a basepair in the right neighboring of the defect, we still have *Q*_{L}(*N* + 1) ∼ *I*(*N*), moreover *Q*_{R}(*k*) ∼ *I*(*N* − *k*) since there is no defect in the region from *k* to *N*. Now the different branch cut contributions appearing in *Q*_{L}(*k*) are *I*(*k*) and terms equivalent to *I*(*k*_{d})*I*(*k* − *k*_{d}), coming from the combination of Eq. (B7) with Eqs. (B8)–(B12). The part proportional to *I*(*k*) gives a vanishing contribution to the probability, which follows from the same calculation as for the defect basepair. The probability that a basepair at position *k* > *k*_{d} is closed is thus asymptotically given by: *P*(*k*) ∼ *I*(*k*_{d})*I*(*k* − *k*_{d})*I*(*N* − *k*)/*I*(*N*). If we are interested in a basepair *k* that remains at a fixed distance of the defect in the thermodynamic limit where *k*, *k*_{d}, *N* → *∞*, then *I*(*k* − *k*_{d}) is simply a constant, *I*(*k*_{d}) ∼ *I*(*k*), and again the ratio reduces to the case treated for the defect basepair, which gives a vanishing probability in the thermodynamic limit. If *k* − *k*_{d} → *∞*, then the asymptotic behavior $I(k)\u223ck\u2212\lambda c$ can be used for all four functions *I* involved in the probability, and the probability reads: $P(k)\u223ckd(k\u2212kd)(N\u2212k)/N\u2212\lambda c$, which goes to zero in the thermodynamic limit.

### APPENDIX C: BRANCH CUT CONTRIBUTION

*C*

_{4}and

*C*

_{5}of the integration contour defined in Fig. 4 are given by:

*C*

_{4}and

*C*

_{5}with

*z*=

*x*−

*iδ*and

*z*=

*x*+

*iδ*respectively, and we considered the limit

*δ*→ 0. The real part of the polylogarithm function is continuous across the branch-cut: Re[Li

_{c}(

*x*+

*iδ*)] = Re[Li

_{c}(

*x*−

*iδ*)] =

*R*(

*x*) when

*δ*→ 0, but the the imaginary part makes a jump: Im[Li

_{c}(

*x*+

*iδ*)] = −Im[Li

_{c}(

*x*−

*iδ*)] =

*π*(ln

*x*)

^{c−1}/Γ(

*c*) when

*δ*→ 0, where

*I*

_{c}is positive. The branch cut contribution is thus a decreasing function of the basepair position

*k*.

*I*

_{c}in the thermodynamic limit, that is when

*k*→

*∞*. First let us make the variable substitution

*s*= (

*k*− 1) ln

*x*:

_{c}(

*z*) close to

*z*= 1 obtained in Ref. 48 when

*c*is not a positive integer:

*k*→

*∞*in Eq. (C6). Retaining only the two first terms in Eq. (C7), and replacing

*e*

^{s/(k−1)}≈ 1, Eq. (C6) becomes:

*A*= 2(1 −

*β*

_{*}/

*β*)

*σ*Γ(1 −

*c*) cos(

*πc*)/

*β*. To derive Eq. (C8), we used the reflection formula Γ(

*z*)Γ(1 −

*z*) =

*π*/sin(

*πz*), the result cos(

*πz*−

*π*) = −cos(

*πz*), the fact that Re[(−1)

^{c−1}] = Re[

*e*

^{iπ(c−1)}] = cos[

*π*(

*c*− 1)], and the trigonometric identity: cos(

*z*)

^{2}+ sin(

*z*)

^{2}= 1. To proceed further we separate between the cases

*c*> 1 and

*c*< 1.

*c*> 1, the denominator of Eq. (C8) is dominated by its first term:

*k*

^{−c}. At the melting transition

*β*→

*β*

_{*}, we have that

*I*

_{c}(

*k*)|

_{c>1}diverges with an exponent = −2.

*c*< 1, the third term in the denominator of Eq. (C8) dominates:

*z*+ 1) =

*z*Γ(

*z*). Again the finite-size correction decays in a power-law fashion, in this case

*k*

^{−(2−c)}. However, no power-law divergence occurs near the transition

*β*→

*β*

_{*}.

### APPENDIX D: PHASE TRANSITION

*T*→

*T*

_{*}, i.e. when

*β*→

*β*

_{*}, or equivalently when

*z*

_{0}→

*z*

_{*}= 1. We define Δ

*z*= 1 −

*z*

_{0}≪ 1, and we insert this ansatz into Eq. (8) using Eqs. (B4) and (C7) to derive the following equation for the unknown Δ

*z*:

*β*=

*β*−

*β*

_{*}. We then need to separate the cases: (i) 0 <

*c*< 1; (ii) 1 <

*c*< 2 and (iii)

*c*> 2.

0 <

*c*< 1: the left-hand side of Eq. (D1) diverges as Δ*z*→ 0, whereas the right-hand side is finite. Thus, Eq. (D1) has no solutions for*z*_{0}close to 1. Instead, all solutions to Eq. (8) occur for*z*_{0}< 1, and*P*(*k*) is for all temperature given by Eq. (6) for the defect basepair and by Eq. (12) for the neighboring basepairs. The transition is therefore “smooth” (derivatives to any order of*P*with respect to*T*exists).- 1 <
*c*< 2: the left-hand side of Eq. (D1) is dominated by its second term as Δ*z*→ 0, so that:Note that we have Δ(D2)$\Delta z\u223c\Delta \beta \sigma \Gamma (1\u2212c)1/(c\u22121)$*β*< 0 and Γ(*z*) < 0 for −1 <*z*< 0, so that their ratio is positive. *c*> 2: the left-hand side of Eq. (D1) is dominated by its first term as Δ*z*→ 0, so that:(D3)$\Delta z\u223c\u2212\Delta \beta \beta $

*β*and Δ

*T*=

*T*−

*T*

_{*}. First, for homo-DNA it is useful to re-write Eq. (A8) as

*β*= 1. For homo-DNA with independent basepairs (

*c*= 0 and

*σ*= 1), we showed in the discussion following Eq. (9) that

*P*

_{bg}= 1/(1 +

*β*), thus in this case,

*T*

_{M}is the melting temperature, at which half the basepairs are open.

*T*→ 0. We then insert Eq. (D8) into Eq. (D6), expand it at first order in Δ

*T*, and identify

*β*

_{*}, which leads to:

*c*< 2, combining Eqs. (D2) and (D9)–(D11) and inserting them in Eq. (6) for the defect basepair and in Eq. (12) for the neighboring basepairs leads to the following leading behavior:

*c*> 2, we combine Eqs. (D3) and (D9)–(D11) and inserting them in Eq. (6) for the defect basepair and in Eq. (12) for the neighboring basepairs leads to the following leading behavior:

*A*(

*k*) and

*B*(

*k*) for a background basepair far from the defect (

*k*→

*∞*) are directly obtained from Eqs. (D14) and (D17) by setting

*I*= 0.

## REFERENCES

*Cold Spring Harbor Symposia on Quantitative Biology*

*Essential Cell Biology*

*Theory of Helix-Coil Transitions in Biopolymers: Statistical Mechanical Theory of Order-Disorder Transitions in Biological Macromolecules*

*Phase Transitions and Critical Phenomena*