A theoretical approach to the prediction of the sequence and temperature-dependent rate constants for oligonucleotide hybridization reactions has been developed based on the theory of relaxation kinetics. One-sided and two-sided melting reaction mechanisms for oligonucleotide hybridization reactions have been considered, analyzed, modified, and compared to select a physically consistent as well as robust model for prediction of the relaxation times of DNA hybridization reactions that agrees with the experimental evidence. The temperature- and sequence-dependent parameters of the proposed model have been estimated using available experimental data. The relaxation time model that we developed has been combined with the nearest neighbor model of hybridization thermodynamics to estimate the temperature- and sequence-dependent rate constants of an oligonucleotide hybridization reaction. The model-predicted rate constants are compared to experimentally determined rate constants for the same oligonucleotide hybridization reactions. Finally, we consider a few important applications of kinetically controlled DNA hybridization reactions.

## I. INTRODUCTION

Oligonucleotide hybridization is a reversible chemical reaction in which two short single-stranded DNA/RNA molecules (generally <50 base pairs) hybridize to give a double-stranded DNA/RNA molecule. This reaction is an integral part of a wide range of molecular biology technologies ranging from microarrays to the Polymerase Chain Reaction (PCR).^{1–3} Ever since these techniques were invented, DNA hybridization reactions have been studied extensively.^{3} In particular, the thermodynamics of DNA hybridization reactions has been well-studied both experimentally and theoretically.^{4–9} However, kinetic study of DNA hybridization has received only limited attention, with the majority of prior work being focused on homopolymers. For applications such as PCR that are time-dependent, the kinetics of DNA hybridization reactions can play an important role. Methods for the prediction of sequence-dependent rate constants could enable, for example, the optimal choice of temperature cycling protocols for PCR. In the late 1960s and early 1970s, a few experimental methods for the estimation of kinetic parameters of oligonucleotide hybridization reactions were proposed and using these, the rate constants of a series of *A*_{n}*U*_{n} oligonucleotides were estimated at a specified set of temperatures.^{10–12} Though these studies addressed the kinetics of a few homogeneous oligonucleotide hybridization reactions, there was no generalization made to the rate constants of arbitrary homogeneous/heterogeneous oligonucleotide hybridization reactions at specific temperatures. More recent work on modeling hybridization kinetics^{13} was based on simplified models that, due to the nature of the applications, did not demand physical consistency with hybridization thermodynamics.

With methodologies such as PCR being applied to different oligonucleotide sequences every day in modern molecular biology laboratories for various applications,^{14} it is not feasible to experimentally estimate the thermodynamic and kinetic parameters for all possible DNA/RNA sequences of interest. This is one of the main reasons for the development of the Nearest Neighbor (NN) method, which can be used to calculate the thermodynamic properties of the oligonucleotide hybridization reactions.^{5,6,8} However, thermodynamic properties are insufficient to analyze and optimize reactions such as PCR. Therefore, it is essential to develop a theoretical model to estimate the rate parameters for oligonucleotide hybridization reactions.

Although numerous experimental studies of the kinetics of oligonucleotide hybridization have been reported,^{15–18} to the best of our knowledge, no theoretical methods for estimation of the rate constants of arbitrary oligonucleotide sequences based on the fundamental biophysics of DNA hybridization have been developed to date. In this work, we have developed the first theoretical method to predict sequence- and temperature-dependent rate parameters that is applicable to either homogeneous or heterogeneous oligonucleotide hybridization reactions.

Experimentally, the rate constants of an oligonucleotide hybridization reaction are determined by measuring its (a) relaxation time using a temperature jump apparatus, (b) equilibrium melting curve, a relationship between the equilibrium conversion and temperature of the DNA hybridization reaction.^{10–12} The relaxation time of an equilibrium reaction is a measure of time needed for the reaction to return to equilibrium when it is perturbed from its original equilibrium state. Theoretical estimation of hybridization rate constants is based on estimation of the relaxation time and the equilibrium melting curve. The equilibrium melting curve of an oligonucleotide sequence is determined theoretically using the NN model. In this work, we have developed a theoretical model to estimate the relaxation time of any oligonucleotide hybridization reactions at a specified temperature. We have combined our model for relaxation time prediction with the NN model to estimate the hybridization rate constants of arbitrary oligonucleotide sequences. In order to develop a theoretical model for relaxation time prediction, we have considered two different hybridization reaction mechanisms: one-sided melting^{10,12,19,20} and two-sided melting.^{21–23} We have analyzed these two reaction mechanisms and proposed a modified reaction mechanism that is consistent with the laws of chemical reaction kinetics. Our kinetic model can be used not only to estimate appropriate reactions conditions for DNA hybridization, but also for systematic optimization and control of PCR reactions.

In this paper, we (1) describe our theoretical approach to estimate the temperature and sequence-dependent rate parameters of a oligonucleotide hybridization; (2) develop a theoretical method to estimate the relaxation times of oligonucleotide hybridization reactions; (3) estimate the model parameters and experimentally validate our theoretical model; (4) analyze our model and discuss the properties of its solutions; (5) consider illustrative applications of kinetically controlled DNA hybridization reactions.

## II. THEORETICAL MODEL FOR ANNEALING KINETICS

The forward and reverse rate constants of an equilibrium reaction can be expressed as a function of its equilibrium constant (*K*_{eq}) and relaxation time (τ)^{24} and as shown below.

### A. Prediction of equilibrium properties

Consider the following oligonucleotide hybridization reaction:

where *k*_{f} and *k*_{r} are forward and backward reaction rate constants. D and S represent the double stranded and single stranded oligonucleotide. The hybridization reaction, (R1), is assumed to follow “all or none” or “two state” hybridization.^{4–12} Typically, a DNA with less than 50 base pairs follows two state hybridization.^{25} Reaction (R1) can be assumed to be an elementary reaction with respect to the fully single-stranded and double-stranded DNA;^{10–12} hence, the rate expression is written as

where [ ] represents the molar concentration of respective species. Since usually the DNA hybridization reaction mixture is dilute, the following relationship between the equilibrium concentration, equilibrium constant, forward and backward rate constants is written as^{26}^{,}

*K*

_{eq}is an equilibrium constant,

*C*

_{0}is a reference concentration whose value is 1M

^{26,27}and Δ

*G*is the Gibbs free energy of reaction (R1) estimated based on the NN model.

^{5,6,25}We use unified NN model parameters to estimate Δ

*G*of reaction (R1).

^{6}

### B. Relationship between relaxation time and rate constants

As per the definition of the relaxation time, when the reaction (R1) is disturbed from its equilibrium, the following equation is written as

δ is a small perturbation from the equilibrium due to an external disturbance. Expanding Eq. (3) both sides and neglecting δ^{2} term, the following rate equation for the perturbation parameter δ is obtained:

Above equation represents the first order kinetics of the perturbation parameter δ with a characteristic time constant which is called relaxation time. Thus, an expression for the relaxation time is obtained as follows:

If the left hand side of Eqs. (2c) and (5) is known, then these equations can be solved to obtain *k*_{f} and *k*_{r}.

## III. THEORETICAL ESTIMATION OF RELAXATION TIME: UNIMOLECULAR ELEMENTARY STEP MODEL

### A. Nucleation of oligonucleotide hybridization

The kinetics of reaction (R1) depends on the following two processes:

A pre-equilibrium step that forms the first few base pairs to create a stable nucleus.

^{10,12,19}Once a stable nucleus is formed, rapid addition of base pairs to complete the hybridization reaction.

In a DNA hybridization reaction, each base pair has a specific stability (stacking + base pairing free energy). The following equations express the stability constants of AT and GC base pairs as functions of temperature:^{19}

Unlike NN models, these relations approximate the stabilities of different types of base pairs without specification of the identities of the neighboring base pairs.

The formation of the first or first few base pairs is quite different from the formation of other base pairs.^{10,12,19} Once the first base pair is formed, in order for the second base pair to form, the single strands must align in a specific manner. The strands can easily dissociate until a critical number of base pairs (termed a stable “nucleus”) has formed. The stability of a duplex with one or few base pairs is thus smaller than the product of their stability constants. In order to represent the reduction in stability of such duplexes, a parameter σ (different literature uses different notations and we use σ here) is multiplied^{10,12,19} with the product of stability constants. Thus, σ accounts for the resistance to the formation of first few base pairs and hence the following reaction mechanism for a oligomer hybridization reaction was proposed:^{10,12,19}

In the above reaction, *N* denotes the total number of base pairs and *D*_{i} is a hybridized duplex with *i* base pairs in the double-stranded state. *k*_{i} and *k*_{−i} are the forward and reverse rate constants of the *i*th step of (R2). At a fixed temperature, *k*_{i} is roughly constant for sequences of a given length and it can be assumed to be the same for all the steps; hence, we denote it by *k*_{1}.^{19,28} The ratio of *k*_{1} and *k*_{−i} is defined as a stability constant, *s*_{i} of *i*th base pair

As we explained above, once a stable nucleus is formed, it facilitates subsequent base pair formation. Let *x* be the number of base pairs that melt before forming a stable nucleus; then

The overall stability constant of reaction (R1) in terms of individual stability constants of base pairs and σ_{i} is expressed as

In Eq. (11), there are *x* unknown parameters (σ_{i}) that account for the resistance to form a stable nucleus. If these parameters are clubbed together, they can be represented as a single parameter and assigned as a resistance to the formation of the first base pair. Since all the resistances are then assigned to the first base pair formation, this represents a worst case scenario for formation of a stable nucleus. Thus, using Eqs. (6), (7), and (10) and assuming that all resistance to formation of a stable nucleus is associated with the first base pair, the following equation for σ can be written as

#### 1. Estimation of sigma

Equation (12) is rewritten to represent a general form of the equilibrium relationship as follows:

*K*_{eq} is calculated based on the overall Gibbs free energy of hybridization for the sequence, which can be decomposed as follows:

where Δ*G*_{nuc} is the Gibbs free energy of nucleation^{29} and Δ*G*_{dup} is the Gibbs free energy of a duplex. Δ*G*_{dup} can be decomposed into the Gibbs free energy for stacking Δ*G*_{st} and the Gibbs free energy for the formation Δ*G*_{bp}(*i*) of *i*th base pair. NN methods provide the length-independent nucleation enthalpy (Δ*H*_{nuc}) and entropy (Δ*S*_{nuc}) and they are usually called initiation enthalpy and entropy, respectively.^{29} Whereas for DNA, Δ*G*_{nuc} depends on the terminus at which nucleation occurs, for RNA it is sequence and terminus-independent.^{30} For self-complementary sequences, an additional term Δ*G*_{sym} must be added to the right hand side of Eq. (14).^{5}

There are several issues that complicate the use of NN parameters in kinetic models for oligonucleotide hybridization. First, although DNA melting can occur simultaneously from both sides of a sequence, Δ*G*_{nuc} assumes that the first base pair formation occurs at one or the other terminus. Δ*G*_{nuc} parameters reported in the literature are base pair-specific and available only for nucleation at one or the other end of the duplex. This reflects the fact that models for DNA hybridization have typically assumed the hybridization is nucleated at the termini of the polymer. Parameters have thus been estimated under this assumption. However, as will be discussed below, DNA hybridization can in fact be nucleated at any position in the sequence. Typically, the sum of Δ*G*_{nuc} for the two termini is used to calculate the nucleation free energy, and we follow this precedent here in the context of our kinetic models. Second, it is not possible to obtain the stability constants for each base pair in a sequence using NN parameters, since there are only *n* − 1 NN parameters for a *n* base pair sequence. This issue does not arise in thermodynamic models for DNA hybridization since one is only interested in the total hybridization free energy, not the stability of each base pair. In kinetic models, these base pair stabilities are required for the computation of sequence-dependent rate constants, as will be shown below. Due to the computation of Δ*G*_{seq} as sum of *n* − 1 pairwise *NN* parameters rather than a sum of n individual Δ*G*_{bp}(*i*)’s plus a sum of *n* − 1 stacking interactions Δ*G*_{st}(*i*),*i* > 1, in the *NN* approach to modeling free energies, there is no unique way to assign the free energy change due to the hybridization of each successive base. Free energy changes are only available for sequences of two neighboring bases.^{31}

In the absence of this information, we apply an approach that approximates Δ*G*_{dup} in Eq. (14) as follows: Δ*G*_{dup} ≈ ∑_{i} Δ*G*_{st}(*i*) + Δ*G*_{bp}(*i*), where Δ*G*_{st}(*i*) + Δ*G*_{bp}(*i*) = −*RT* ln *s*_{i}, with the *s*_{i} calculated according to Eqs. (6) and (7). ∑_{i} Δ*G*_{st}(*i*) ≠ Δ*G*_{st, NN} because the sequence dependence of stacking free energies is not represented in Eqs. (6) and (7). We then use the accurate NN parameters to compute Δ*G*_{seq} and Δ*G*_{nuc} in Eq. (14). Introducing the notation

we can rewrite Eq. (14) as

Since the product of stability constants is calculated based on ∑_{i} Δ*G*_{st}(*i*) + Δ*G*_{bp}(*i*), the ratio

where

*K*

_{eq}computed according to (2a) using Δ

*G*

_{seq}obtained from NN parameters, will be used in our kinetic models to represent the resistance to formation of the first base pair. Since Δ

*G*

_{st}(

*i*) is much smaller than Δ

*G*

_{bp}(

*i*), we assume the former is the same for all base pairs. This approach maintains thermodynamic consistency between Eqs. (13) and (14). These approximations are validated below through comparison to experimental kinetic data.

Fig. 1 compares the nucleation factor σ_{nuc} calculated based on Δ*S*_{nuc} and Δ*H*_{nuc} to the combined nucleation and stacking factor σ for sequences of length 10 and 14 base pairs. There are only three possible values for σ_{nuc}, since the first base pair for DNA/RNA can be any one of *A* − *U*, *A* − *T*, and *G* − *C*, whereas, σ is length-dependent. The difference between σ and σ_{nuc} is due to Δ*G*_{st, NN} − ∑_{i} Δ*G*_{st}(*i*), which is generally < 0, since the sequence-independent stacking energies represented in Eqs. (6) and (7) generally overestimate the magnitude of the actual stacking free energies. This adds a positive Δ*H*_{σ} to the Δ*G*_{σ}, which contributes to the apparent temperature dependence of sigma. When *N* increases, *x*, the number of base pairs that melt before forming a stable nucleus, also increases. As a result of this, the overall resistance to form a stable nucleus increases. Hence, σ decreases or resistance increases when *N* increases. Δ*G*_{st, NN} − ∑_{i} Δ*G*_{st}(*i*) also contributes to the observed length dependence of σ.

### B. One-sided melting

In one-sided melting mechanisms, hybridization/melting progresses from any one side of a DNA and we assign σ to the first step. Fig. 2 and the reaction (R2)^{10,19,32} show the mechanism of one-sided hybridization. Schwarz and Poland^{21} presented the same mechanism as a biased continuous time one-dimensional random walk (CTRW) with partially reflecting boundary conditions.^{33} The walk is biased because the forward and reverse reaction rates are not equal.

#### 1. Estimation of relaxation time under the steady-state (SS) approximation

Using the steady state approximation,^{32} the concentrations of the intermediate components in (R2) can be assumed to be equal to the same constant value. Therefore, the rate of formation of double helix is equal to the rate of disappearance of single strands. Hence, the rate of formation of double helix is the net forward rate at each intermediate step. To estimate the relaxation time, a perturbation δ is introduced to the equilibrium.

The net forward rate, *r* of *i*th step is

The association of the single strands in the first step of (R2) is a second-order reaction and the rate equation for this step in terms of δ and σ is obtained by linearizing the actual rate equation around the equilibrium concentration of the single strands, which is denoted as [*D*_{0eq}] (the factor 2 that was obtained after linearizing has been clubbed together with [*D*_{0}]_{eq}). Manipulation of Eq. (18) leads to the following equation for the rate *r* in terms of δ_{0} (a detailed derivation is provided in the supplementary material^{34}):

where

Relaxation time is defined as

Substituting

The stability constant *s* in Eq. (21) is a geometric mean of the stability constants of individual base pairs^{19}

Thus, for a given DNA sequence and temperature, the relaxation time can be found using Eq. (21).

#### 2. Estimation of relaxation time: State space representation for non-steady state solution

Baldwin^{32} indicates the following requirements for validity of the steady state approximation:

- $\prod _{i=0}^{n-1} s_i = \prod _{i=0}^{n-1} \frac{k_i}{k_{-i}} \gg n;$$\u220fi=0n\u22121si=\u220fi=0n\u22121kik\u2212i\u226bn;$
σ ≪ 1.

For short sequences, it may be possible that the product of individual stability constants less than *n*. Although the value of σ for the oligomer hybridization is always less than 1, in a long DNA melting problem, nucleation is not required for the hybridization of all domains, and hence σ = 1 for these domains. For these as well as other reasons discussed below, the steady state model is not always useful and a non-steady state solution for the relaxation time is essential as will be shown below.

A detailed unsteady state balance for the chemical reaction network (R2) yields a set of coupled first order differential equations after linearization around the equilibrium

where *x* = [*D*_{0}, …, *D*_{n}]^{T}. The state space matrix *A* is

A is a banded diagonal (tridiagonal) asymmetric matrix (*A* ≠ *A*^{T}). The solution to the initial value problem (23) (or equivalently, the solution to the Chapman-Kolmogorov equation^{33} for the corresponding discrete state space stochastic process) is

*V*= [

*v*

_{0}, …,

*v*

_{N}],

*Av*

_{i}= λ

_{i}

*v*

_{i}denote the matrices of left, right eigenvectors of the fundamental matrix

*A*, respectively, since

*A*is an asymmetric matrix. Λ = diag{λ

_{0}, …, λ

_{N}} is the matrix of eigenvalues. Let

*P*(

*t*) = exp [

*V*Λ

*tW*†], which is called the dynamical propagator or the transition probability matrix. The eigenvectors are modes of motion of the dynamical system. These eigenvectors form a complete set since

*A*is a normal matrix. In the non-orthogonal basis

*W*, we have

In this basis, the solution for the evolution of each species is

One examines the mode with eigenvalue Re(λ_{i, max }) ⩽ 0 to identify the relaxation time for hybridization as the associated time constant

The above expressions are simplified if we exploit the specific structure of *A* as a chemical kinetics transition rate matrix. Further properties of the solution (25) are considered below.

Schwarz and Poland^{21} considered transient dynamics of the homopolymer (but not heteropolymer) melting problem using state space and Kolmogorov master equation methods, for reasons of analytical tractability. As will be discussed below, homopolymer approximations are not typically capable of accurately predicting relaxation times for heteropolymers. The master equation for the continuous-time random walk is a system of partial differential equations for the occupation probability *p*(*i*, *t*) equivalent to the state space system of ordinary differential equations, where *i* denotes the number of hybridized bases. Master equation solutions (obtained, e.g., by the method of characteristic functions^{33}) are expressed in terms of modified Bessel functions rather than a sum of exponential functions, and are not convenient for calculation of relaxation times or rate constants.

#### 3. Comparison between transient and SS model

The relaxation time of a DNA hybridization reaction varies from 10^{−1} to 10^{2} s.^{10} Since this reaction is very fast, steady-state assumptions that do not account for the initial transient behavior may be inaccurate. The system of algebraic equations solved in the steady state formulation is inconsistent when the additional constraint corresponding to total mass balance (conservation of probability) is applied. This inconsistency would be observed if one were to solve for the concentrations of the intermediate species. In order for the steady state approximation to be valid, τ_{ss} ≈ τ_{max } ≫ τ_{i}, *i* ≠ max should hold, where τ_{ss} denotes the relaxation time obtained in the steady state formulation, and τ_{max} should be associated with a mode of motion that converts *x*_{0} to *x*_{n}. The validity of the steady state approximation can thus be evaluated based on the eigenvalues of the transient model. Properties of the eigenvectors and eigenvalues are considered further in Sec. IV.

### C. Two-sided melting

An obvious issue with one-sided melting models is the arbitrariness in the choice of the end of the sequence at which initiation occurs. Schwarz and Poland^{21} and Anshelevich *et al.*^{23} proposed a two-sided melting reaction mechanism for (R1) and analyzed it for a homogeneous DNA. According to this model, DNA melting/hybridization can be initiated anywhere on the single strands and subsequent hybridization can occur simultaneously on both sides of the first formed base pair. In this section, we modify this mechanism and develop a transient model for the heterogeneous oligonucleotide melting/hybridization reactions. Fig. 3 illustrates two-sided melting, and the reaction mechanism of the two-sided melting reaction is given in Fig. 4 below. For clarity, we present the mechanism for *N* = 2. *D*_{m, n} represents the state of a DNA molecule where *m* and *n* denote the number of base pairs melted from the left and right side of the DNA molecule, respectively.

In Fig. 4, the horizontal direction represents right-sided melting and the vertical direction represents left-sided melting. The total number of possible states in a *N* base pair hybridization reaction is (*N* + 1) × (*N* + 2)/2. Out of these, there are *N* states that have only one base pair hybridized, in *N* different positions. The σ parameter is associated with the rate constant for formation of each of these *N* states. The number of states in the one-sided melting model is of *O*(*N*), whereas in the two-sided melting model, it is of *O*(*N*^{2}). Using the convention that states with the same *m* value are grouped together, the state vector for two-sided melting is

and the state space matrix for the reaction in Fig. 4 is given as

Each diagonal block of this two-sided melting matrix corresponds to melting from the right side with a constant number *m* of base pairs dissociated from the left side. In the two-sided melting mechanism, there are *N* + 1 states representing fully dissociated single-stranded DNA (*m* + *n* = *N*). These can be represented using one notation, *D*_{0, N}. By doing this we also correct the rates that were counted twice in the above state space matrix (29). For example, in Fig. 4, *D*_{1, 0} gives *D*_{1, 1} and *D*_{2, 0}, which represent the single-stranded state. Hence, the disappearance rate of the molecule *D*_{1, 0} has been counted twice. Thus, from the above state matrix *A*, two redundant states (in general *N* states) should be removed and the rates that were counted twice should also be corrected to obtain the correct state space matrix for two-sided melting. Note the total number of states is (*N* + 1) × (*N* + 2)/2 − *N*. Hence, the Corrected Reaction Mechanism (CRM) for two-sided melting (considering *N* = 2) is given in Fig. 5.

The state space matrix for two-sided melting (*N* = 2) based on the above modified reaction mechanism is thus given by

Just as we have calculated the relaxation time for one sided melting from its state space matrix, the relaxation time for the two sided melting can be calculated from the above state space matrix (30).

Anshelevich *et al.*^{23} considered the double counting model, but not the more accurate single counting model, and only in the absence of σ, possibly due to the greater simplicity of the former model; due to the omission of σ, the solutions therein are not applicable to oligonucleotide hybridization, and moreover, double counting can result in measurable differences in the predicted reaction times (data not shown).

## IV. PROPERTIES OF SOLUTIONS TO THE TRANSIENT MELTING MODELS

In this section, we analyze the state space matrices and give a physical interpretation to the solutions to the state space models. To simplify our analysis, we assume that size of the state space system is *n* (the corresponding number of states can be back calculated based on one- and two-sided melting) and starting in this section the indices *i*, *j*, *k* will run from 1 to *n*. The transition rate matrices A for both one- and two-sided melting have rank *n* − 1, since the rows of these matrices are subject to the single algebraic constraint ∑_{i} *A*_{ij} = 0, ∀*j*. In addition, these matrices satisfy the property of detailed balance

for some state π. Since

due to the transition rate property of *A*, the state π, which is called the stationary distribution, is the eigenvector associated with the null eigenvalue, which we denote by λ_{1} = 0. Now since

the transition probability matrix *P*(*t*) has one eigenvector (π, a probability vector that is scalar multiple of equilibrium state) with a unit eigenvalue and is a stochastic matrix. If the system starts in this state, it does not evolve. We now derive the general solution for *x*(*t*), including properties of the eigenvectors and eigenvalues. First, note that the detailed balance condition (31) or

since

Let

Since *MA* = *A*^{†}*M*, *Mv*_{i} is a (right) eigenvector of *A*^{†} with eigenvalue λ_{i} and hence we have *w*_{i} = *Mv*_{i} for the *i*th left eigenvector of *A*, i.e., the eigendecomposition of *A* can be written as

*w*_{i} is a right eigenvector of *A* with eigenvalue

*A*with eigenvalue λ

_{i}. Since

*W*†

*V*=

*W*†

*W*=

*I*, the left and right eigenvectors must be associated with the same eigenvalue and hence

*W*=

*MV*for the matrix of left eigenvectors.

The negativity of eigenvalues λ_{2}, …, λ_{n} also follows from the properties of the transition rate matrix *A*. It can be shown that the Kullback-Leibler (K-L) distance between *x*(*t*) and π, namely,

according to the transition rate property of *A* and the properties of the K-L distance.

Based on these properties, the solution (21) to the state space model can then be expressed

Further, due to the stochastic property of the transition probability matrix *P*(*t*), we have

where *v*_{ij} denotes the *j*th element of the *i*th eigenvector, and hence

From this it follows that

*j*,

*j*+ 1 in the one-sided melting model) of

*v*

_{i}are equal to 0.

If *x*(0) is an initial state corresponding to a perturbation of the equilibrium (R1) by, for example, a temperature jump or addition of a small amount of either duplex or single-stranded DNA, Eq. (35) describes the time evolution of concentration perturbations of the various partially hybridized states *D*_{i} as the system attains a new equilibrium consisting of a perturbation of the original equilibrium distribution of these hybridized states represented by π.

## V. ESTIMATION OF THE MODEL PARAMETERS

In Sec. III, we have developed a functional relationship between relaxation time and measurable thermodynamic and kinetics parameters of reaction (R1) using two different reaction mechanisms and model formulations. Generally, such a functional relationship is represented as follows:

### A. Estimation of *k*_{1}

Estimation of *k*_{1} is a nonlinear parameter estimation problem in which a functional relationship between model parameters and experimentally measured dependent variables (37) is available. For a known set of relaxation time vs equilibrium concentration of single strands data, a least square objective function equal to the sum of squared differences between actual and predicted relaxation times can be minimized to estimate *k*_{1}. Given a set of τ vs. [*D*_{eq}] data, such a parameter estimation problem can be formulated and *k*_{1} can be estimated at a specific temperature.

However, at the current time, only limited experimental τ vs. [*D*]_{eq} data are available for oligonucleotides. At a specified temperature, only a single relaxation time data are available. Due to this limitation, we solved (37) algebraically for *k*_{1} given τ for a specified sequence, and used the relaxation time data available for the other sequence for out-of-sample prediction.

Under the steady-state assumption, Craig *et al.*^{10} derived the following equation that relates τ (here expressed in terms of the hybridization rate constant *k*_{f}) to *s*, *N*, σ, and *k*_{1}:

From Eq. (38), it is evident that *k*_{1} is a function of *N* and as long as *N* is of the same order of magnitude, *k*_{1} does not vary much with respect to length of a sequence. In Sec. III A, we have shown that σ decreases up to an order of magnitude when *N* increases from 8 to 14 base pairs. Further, Wetmur and Davidson^{19} and Pörschke and Eigen^{12} showed that *k*_{f} increases with respect to the length of a sequence. Hence, for a fixed temperature, when *N* increases *k*_{1} should increase. This can be explained with more physical arguments and reasoning as follows. For a fixed temperature, a sequence with more base pairs is more stable than a sequence with fewer base pairs. The individual stability constant *s*_{i} is equal to the ratio

*s*is a constant since a shorter sequence dissociates more quickly than a longer sequence,

*k*

_{−i}for a shorter sequence should be higher, and

*k*

_{−1}should be lower than those of a long sequence.

*k*_{1} for the transient models can be calculated by using the characteristic equation of the state space matrix *A*, with the negative reciprocal of the relaxation time τ substituted in it as the eigenvalue λ, for the functional relationship (37). Since the state space matrix is relatively large, we implemented a bisection method that is described below to solve for *k*_{1} based on the transient models.

Assume an interval, [

*a*,*b*] for*k*_{1}. Note that at*a*, the eigenvalue of the*A*should be greater than the actual eigenvalue (λ_{actual}) and at*b*it should be less than the actual value.Let

$c = \frac{a+b}{2}$$c=a+b2$.Find the eigenvalue of

*A*at c.If (λ

_{c}− λ_{actual}< 0),*b*=*c*, else*a*=*c*.Repeat until (λ

_{c}− λ_{actual}) ≈ 0.

Using the above algorithm for the transient models and Eq. (38) for the steady-state model, we have determined *k*_{1} for four different sequences, *A*_{4}*U*_{4}, *A*_{5}*U*_{5}, *A*_{6}*U*_{6}, and *A*_{7}*U*_{7}, using their relaxation time and initial concentration of single strands data that were provided by Craig *et al.*^{10} Henceforth, all our analysis is based on the reaction conditions given in Craig *et al.*,^{10} which are similar to the standard PCR reaction conditions. Fig. 6 shows the variation of *k*_{1} with respect to temperature for the sequences *A*_{5}*U*_{5} and *A*_{7}*U*_{7}. The variation of apparent rate constant *k*_{1} with respect to temperature follows an Arrhenius relationship and it is similar to *k*_{f} with a negative apparent activation energy that is a function of actual activation energy and the heat of base pair formation.^{11,12} At a fixed temperature, the difference between *k*_{1} values that were calculated based on one- and two-sided melting theories is negligible. However, there is an appreciable difference between *k*_{1} values that are calculated based on the steady-state and transient models. Between the one- and two-sided models, the negative activation energy of *k*_{1} based on one-sided melting theory is lower than that from the two-sided melting theory.

Fig. 7 shows the variation of the activation energy with respect to *N*. As we explained above, when *N* increases, *k*_{1} increases, and hence the negative apparent activation energy decreases. For a fixed *N*, though we expect that there will be a difference in apparent activation energy for homogeneous and heterogeneous sequences, this difference may lie within an acceptable error bound. Thus, we use the same *k*_{1} that was estimated using the homogeneous sequences for heterogeneous sequences as long as their lengths are the same. In the absence of experimentally measured relaxation time data for different N, we can extrapolate the reaction parameters for different lengths of DNA sequences. However, more experiments can be conducted with different compositions (see, e.g., Ref. 35) and lengths of DNA sequences to estimate *k*_{1} using least squares estimation rather than nonlinear root finding.

## VI. EXPERIMENTAL VALIDATION OF RELAXATION TIME MODELS

### A. Relaxation time of homogeneous oligonucleotides

In order to validate our theoretical model, we considered *A*_{10}*U*_{10},^{12} which was not one of the DNA sequences that were used to estimate the model parameters. To predict the relaxation time of *A*_{10}*U*_{10}, we used *k*_{1} that was estimated using the relaxation time of (*A*_{5}*U*_{5})_{2}. It should be noted that (*A*_{5}*U*_{5})_{2} (self-complementary sequence) and *A*_{10}*U*_{10} (non-self-complementary) are different sequences. Fig. 8 compares the relaxation time values that were measured using experiments to those predicted by the one-sided melting theory, two-sided melting theory, and the steady-state model, respectively. While one- and two-sided melting models are consistent with the experimental data within an acceptable experimental error limit, the steady-state model deviates significantly from the experimental results. Therefore, the steady state assumption is not generally valid and the transient models are recommended. There is no appreciable difference in the relaxation times that were estimated using the one- and two-sided melting models.

Fig. 9 shows the steps involved in predicting hybridization rate constants.

### B. Relaxation time of heterogeneous oligonucleotides

In this section, we predict the relaxation times of heterogeneous DNA using one-sided and two-sided melting theory and compare these predictions. Since we have established in Sec. VI A that the steady state model is not useful, we neglect it. We consider the effects of N, σ, temperature, and GC content on the relaxation time. As we have mentioned in Sec. III, in oligonucleotide hybridization the σ parameter accounts for the resistance to formation of first base pair. In a specific case of domain melting of long DNA, σ is fixed to be 1. Although we did not undertake a thorough analysis of domain melting here, we investigated the effect of setting the σ to 1 to understand the effect of the nucleation parameter on relaxation times for DNA melting. The sequences of different lengths and GC content that were used in our study are given in Table I. The initial concentration of single strands for this study is 1 μ*M*. We have shown that the kinetic parameter *k*_{1} varies both with respect to length of a sequence and temperature. We therefore use *k*_{1} that was estimated for a *A*_{n}*U*_{n} sequence of the same length. The effects of GC content and temperature on the relaxation time for N = 14 are shown in Fig. 10. For a fixed GC content, when the temperature increases, the relaxation time decreases. At a fixed temperature and concentration, when the GC content of a sequence increases, the relaxation time also increases. When the overall stability of a sequence increases, the relaxation time increases. In contrast to the homopolymers studied in Sec. VI A, the relaxation times predicted by the one- and two-sided models differ for the heteropolymers considered here. The difference between the relaxation times that were obtained using one- and two-sided melting theories increases when the temperature increases.

Sequence . | N . | % GC . |
---|---|---|

GATTGTGTAGATAA | 14 | 57 |

GACTGTGTAGCTCC | 14 | 29 |

GACTGTGC | 8 | 63 |

GATTGTGT | 8 | 38 |

Sequence . | N . | % GC . |
---|---|---|

GATTGTGTAGATAA | 14 | 57 |

GACTGTGTAGCTCC | 14 | 29 |

GACTGTGC | 8 | 63 |

GATTGTGT | 8 | 38 |

All of the above analysis was repeated for σ = 1 and the results are shown in Fig. 11. When σ = 1, there is no resistance to the formation of the first base pair. The qualitative change of behavior in relaxation time with respect to temperature and GC content for σ = 1 is similar to that observed for oligonucleotide hybridization. At a fixed temperature and GC content, however, the relaxation time for σ = 1 is less than that for oligonucleotide hybridization.

We also estimated the relaxation time for a short sequence with 8 base pairs using one- and two-sided melting theories. At a fixed temperature and GC content, the relaxation time of a short sequence is lower than that of a long sequence (Fig. 12). Unlike the long oligonucleotides, for short oligonucleotides there is a significant difference between the predictions of one- and two-sided melting theories at all temperatures.

### C. Unimolecular hybridization dynamics of heterogeneous oligonucleotides

In this section, we study the properties of solutions to the more general two-sided melting model derived in Sec. IV to systematically analyze the dynamics of CRM (Fig. 5). With relaxation time alone, it is impossible to determine whether a particular step of CRM (Fig. 5) controls the kinetics of reaction (R1). The evolution of

*v*

_{i}that are coupled in CRM (Fig. 5) are nonzero. We considered the second sequence that is given in Table I and performed the above explained study for

Oligonucleotide hybridization

In unimolecular DNA melting processes (domain melting), the nucleation parameter σ = 1.

#### 1. Case 1: σ corresponds to oligonucleotide hybridization, low temperature

Fig. 13 shows the evolution of

*v*

_{i}in Eq. (35) has multiple nonzero elements (data not shown), indicating that no single reaction step in CRM (Fig. 5) is rate-limiting. This is consistent with nucleation occurring at any position along the sequence, unlike the case of one-sided melting, where nucleation is associated with only one reaction step in CRM (Fig. 5).

#### 2. Case 2: σ corresponds to oligonucleotide hybridization, high temperature

The oligonucleotide that is considered in this study melts at 70 °C. Therefore, the reverse direction of reaction (R1) is dominant compared to forward direction. Unlike the hybridization reaction, the melting reaction does not have a resistance in the first step. Therefore, no single mode completely dominates the kinetics of CRM (Fig. 5) at high temperatures. Hence, almost all

#### 3. Case 3: σ = 1.

We analyzed the reaction mechanism of reaction (R1) with σ = 1. As expected, both at higher and lower temperatures,

## VII. EXPERIMENTAL VALIDATION OF HYBRIDIZATION MODEL

In this section, we calculate oligonucleotide hybridization rate constants by solving Eqs. (2a) and (5) for *A*_{5}*U*_{5} and compare them with those estimated experimentally. Relaxation times were obtained from the results in Sec. VI. To calculate *K*_{eq}, we did not apply the NN method because experimentally estimated Δ*G*_{seq} were available. Fig. 14 compares theoretically and experimentally estimated rate constants for *A*_{10}*U*_{10}. While the sign of the apparent activation energy is negative for the forward rate constant, it is positive for the reverse rate constant. The apparent activation energy for the forward rate constant is a function of an actual activation energy and reaction enthalpy of base pair formation.^{10} Since the base pair formation is an elementary reaction, the actual activation energy would be positive and small. In other words, the temperature dependence of rate constants is mainly due to the Gibbs free energy. The reverse rate constant, however, shows a strong dependence on the temperature. At any chosen temperature, the difference between experimentally and theoretically estimated rate constants is small. We have already shown in Sec. VI that the difference between the variation of theoretically and experimentally estimated relaxation times with respect to temperature lies within an acceptable error bound. Each experimental data point compared here was estimated by fitting Eq. (5) for various total concentrations, whereas, theoretical rate constants are calculated by solving Eq. (5) with one relaxation time and a corresponding total concentration, because the different total concentration values that were used in the experiments are not available. This could be a reason for the difference between experimental and theoretical results. Assessment of the accuracy of sequence-dependent rate constant prediction for a broad spectrum of sequences should be carried out systematically with larger datasets including sequences with relaxation times spanning a larger range. The availability of more comprehensive experimental relaxation time data, especially for non-self-complementary sequences with a range of GC contents, should facilitate the development of more accurate parameter sets for oligonucleotide hybridization kinetics. The current work is focused on providing a systematic theory for sequence-dependent oligonucleotide hybridization kinetics with a few representative examples.

We also investigated the robustness of our model by studying the effect of the difference between theoretically and experimentally estimated rate constants on the evolution of reaction products. Fig. 15 shows the evolution of product, *D*, of reaction (R1) that was predicted using the experimentally and theoretically estimated rate constants. It can be observed from this figure that only at low temperatures there is a small deviation between experimental and theoretical prediction. At all other temperatures, there is almost no difference. Within the given temperature interval in which the most of the hybridization occurs, the difference between experimentally and theoretically predicted evolution of *D* of reaction (R1) is negligible. The given dimensionless concentration in Fig. 15 should be multiplied with the micromolar or nanomolar units to get the actual concentration; hence, the difference between the experimental and theoretically predicted concentrations is well within experimental error. Although in this case, the system reaches equilibrium within few seconds, it is shown below that there are different cases of hybridization where sequence-dependent kinetics have a strong influence on the evolution of product over experimentally relevant time scales.

## VIII. APPLICATION: PCR SIMPLEX HYBRIDIZATION REACTION

In the context of PCR, oligonucleotide hybridization is called annealing. Annealing reactions can be competitive or noncompetitive depending on the stage of PCR. Here, we consider a specific stage of PCR at which the single strand and primer concentration are equal and the annealing reactions are competitive. The reaction scheme and state equations have been provided in the supplementary material.^{34} In this example, we consider a short primer of length 14 base pairs, since the parameters in our kinetic model were estimated for oligonucleotides of lengths 8–14 base pairs. Fig. 16 shows the evolution of the reaction species for two different set of equimolar primer and single strand concentrations: (1) 1*nM*, (2) 10*nM*. Even though primer and single strands concentrations are equal, since the order of the annealing reactions is 2, their kinetics depends on the initial concentration of the primer. At a higher concentration of primer, the reaction reaches equilibrium faster. Note that the y-axis is dimensionless and it expresses the relative concentration with respect to initial concentration of single strands. This kinetic model can be integrated with models for the other parts of PCR, such as enzyme binding and enzymatic extension reactions, and a state space system for PCR can be developed. Such a state space system is suitable for derivation of optimal reaction temperature vs time profiles for PCR reactions.

## IX. CONCLUSION

In this work, we have presented a theoretical approach to the prediction of sequence and temperature-dependent rate constants for oligonucleotide hybridization reactions using the equilibrium and relaxation theories of chemical reactions. We considered one-sided and two-sided melting theories to determine the relaxation times of oligonucleotide hybridization reactions, with important modifications to those proposed in past literature on homopolymers. The two-sided melting theory that is proposed in the past literature has been modified so that it is consistent with the laws of reaction kinetics. The significance of model parameters has been analyzed in detail and methods for the estimation of these parameters using available experimental data have been presented. Steady-state models were found to be inaccurate, whereas the proposed one-sided and two-sided theoretical models were found to be consistent with the available experimental data. The difference between the predicted and experimental rate constants lies within an acceptable error bound and the difference between the predicted and measured activation energies is negligible. Important differences between one- and two-sided melting model predictions are observed for heteropolymers, especially for shorter sequences, indicating that the more physically realistic two-sided models are generally preferred. Transient dynamical analysis of two-sided models shows that no single elementary reaction step is rate-determining. Finally, we studied the applications of oligonucleotide hybridization reactions in the context of PCR and showed the importance of hybridization kinetics in PCR annealing reactions. Our kinetic model is suitable for the purpose of optimal control studies of PCR, among many other applications.

## ACKNOWLEDGMENTS

K.M. and R.C. would like to thank the School of Chemical Engineering, Purdue University, for providing partial support for this work. K.M. dedicates this article to his nieces Mohitha and Sivarshini and nephew Kumaran.