We introduce oxNA, a new model for the simulation of DNA–RNA hybrids that is based on two previously developed coarse-grained models—oxDNA and oxRNA. The model naturally reproduces the physical properties of hybrid duplexes, including their structure, persistence length, and force-extension characteristics. By parameterizing the DNA–RNA hydrogen bonding interaction, we fit the model’s thermodynamic properties to experimental data using both average-sequence and sequence-dependent parameters. To demonstrate the model’s applicability, we provide three examples of its use—calculating the free energy profiles of hybrid strand displacement reactions, studying the resolution of a short R-loop, and simulating RNA-scaffolded wireframe origami.
I. INTRODUCTION
DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) are sufficiently similar that they can form stable DNA–RNA hybrids.1 In a biological context, an important example of such hybrids is the R-loop, which forms when one of the strands in double-helical DNA is displaced by complementary RNA to create a hybrid duplex and an unpaired DNA strand.2, In vivo, short R-loops form during nuclear DNA replication by RNA primers, as well as during transcription when nascent RNA anneals to the DNA template inside an RNA polymerase.3 The formation of an R-loop is also necessary for the proper functioning of RNA-guided endonucleases in CRISPR-Cas systems, where the guide RNA must fully hybridize with its DNA target for cleavage to take place.4–6 Much longer R-loops (of the order of 1 kilobase) are formed during the replication of mitochondrial DNA and immunoglobulin class-switch recombination.3 R-loops play an important role in gene regulation. Errors in their formation and resolution can cause DNA damage, transcription elongation defects, hyper-recombination, and genome instability,7 and they are also implicated in disease.8,9 Finally, DNA–RNA hybridization underlies the action of antisense oligonucleotide (ASO) drugs, a therapeutic modality that has shown great promise in, for example, the treatment of neurological disorders.10–12
The specificity and predictability of Watson–Crick base-pairing also make DNA and RNA excellent candidate materials for the design of synthetic self-assembled nanostructures, underpinning the growing field of nucleic acid nanotechnology.13 By simply annealing sets of strands with designed patterns of sequence complementarity, DNA has been used to assemble complex shapes,14,15 dynamic nanomachines,16–19 and constructs with potential therapeutic and diagnostic applications.20–23 Due to the presence of non-canonical interactions in RNA, its self-assembly is less well characterized. However, the field of RNA nanotechnology is also advancing rapidly, with many examples of functional nanostructures and methods for their assembly.24–26 The design of nanostructures comprising DNA hybridized to RNA is under-explored, although interest is increasing with exciting potential uses such as the delivery of therapeutic mRNA and artificial ribozyme fabrication.27–29
Many different approaches have been developed to tackle the problem of nucleic acid modeling and simulation. Analytical mathematical models such as the worm-like chain (WLC),30 which treats DNA or RNA as a semi-flexible polymer, can be useful if one is not concerned with details of the structure of the system. Classical molecular dynamics (MD) simulations that consider effective interactions between every atom have yielded useful insights into nucleic acid structure and dynamics, although they can only access microsecond timescales.31–33 Quantum-chemical calculation is the most fine-grained computational technique used to study nucleic acids,34 but this is usually limited to very small systems such as dinucleotides.35 Coarse-grained models, in which groups of atoms are represented as single particles, are a viable intermediate that offers a compromise between speed and detail.36,37 Multi-scale modeling of DNA nanostructures is reviewed by DeLuca et al.38 While many coarse-grained models of DNA and RNA have been developed,39–44 modeling of hybrid systems, coarse-grained or otherwise, is relatively sparse and is mostly limited to atomistic simulations.45–47 Other examples include a mesoscopic model parameterized to reproduce melting temperatures48 and an abstract model for R-loop formation.49
Here, we combine the most up-to-date versions of the models for DNA and RNA developed within the oxDNA framework50 to enable the simulation of DNA–RNA hybrids. The original average-sequence DNA model51 has been extended to introduce sequence-dependent thermodynamic properties,52 improved structural properties, and salt dependence.53 The same coarse-graining methodology has been used to develop an RNA model.50,54 A version of the DNA model with sequence-dependent structural and elastic properties is currently under development. The oxDNA family of models has seen tremendous success as tools for the study of nucleic acids and has improved our understanding of DNA and RNA origami55–61 and strand displacement reactions,62,63 fundamental nucleic acid biophysics,64–72 as well as the thermal fluctuation and reconfiguration of flexible DNA nanostructures.73–75 The introduction of our hybrid model to include DNA–RNA interactions will further expand the range of systems that can be simulated.
II. THE MODEL
Here, we provide a brief overview of the previously developed models for DNA and RNA as well as the introduction of new inter-strand interactions that enable the simulation of hybrids. We then describe in detail how the model was parameterized.
A. oxDNA and oxRNA
B. Incorporating DNA–RNA interactions
C. Parameterization
Parameterization of our hybrid model was constrained by the requirement to avoid changes to UDNA and URNA in order to maintain compatibility with the original oxDNA and oxRNA models; only the hybrid DNA–RNA interactions were modified. In future versions, it would be possible to reparameterize all of Uhybrid and potentially obtain an even better fit to experimentally measured properties.
As for oxDNA and oxRNA, we parameterize the hybrid model by fitting it to the predictions of a nearest-neighbor model of thermodynamic properties, which has itself been calibrated to reproduce experimental observations. Nearest-neighbor models for nucleic duplex formation are built by first conducting melting experiments for a range of sequences and using these data to estimate the thermodynamic parameters (ΔH and ΔS) associated with the formation of every possible nucleotide pair in the context of its nearest neighbors (as well as initiation parameters). Using these parameters, one can estimate the melting temperature (Tm) of an entire duplex, which is defined as the temperature at which the single-stranded (ss) and double-stranded (ds) states are equally probable.
Sugimoto et al. first estimated nearest-neighbor thermodynamic parameters for DNA–RNA hybrids over two decades ago.76 A more recent set of improved parameters (now also with sequence-specific initiation parameters)77 is employed here. The Sugimoto nearest-neighbor model (SNN) predicts melting temperatures to an accuracy of roughly 1 °C; for the purposes of this work, we consider it to be a very good fit to the experiment. Figure 2 highlights the drastic effect that sequence can have on melting temperature in DNA–RNA hybrids, also showing the differences in melting thermodynamics between hybrids, dsDNA and dsRNA. We used the nearest neighbor model of SantaLucia and Hicks78 to estimate dsDNA melting temperatures and the model of Xia et al. for dsRNA.79 In both cases, we employed an empirical salt correction to Tm derived by SantaLucia.80 Melting temperatures were calculated using Biopython 1.75.81 There is a large difference in stability between dA-rU and dT-rA base pairs, which has been attributed to the presence of the C-5 methyl group in thymine and to the location of the 2′-OH group in the purine.82,83 The stabilities of dG-rC and dC-rG base-pairs also differ, as illustrated by the difference between melting curves for poly-dG-rC and poly-dC-rG. The Tm shown are at a monovalent salt concentration of 1M and a total strand concentration of 3.5 × 10−4M—values that were also used in melting simulations.
For the average-sequence model, we assume that all hydrogen bonds have the same strength, ɛHB. We ran melting simulations for a range of bond strengths for duplexes of lengths 6, 8, 10, and 12—in each case, we found a linear relationship between and ɛHB. For each duplex length, we fit a straight line to the data [Fig. 3(a)], enabling an accurate prediction of from ɛHB. We chose the value of ɛHB, which minimizes C over the average-sequence training library, Savg.
In the sequence-dependent case, we have four possible types of hydrogen bonds, thus four parameters to select: ɛdArU, ɛdTrA, ɛdGrC, and ɛdCrG. Note that we distinguish between dG-rC and dC-rG base pairs as well as between dA-rU and dT-rA, as the Sugimoto model suggests that the distribution of bases between the strands affects duplex stability (see Fig. 2). In order to fit sequence-dependent parameters, previous iterations of our coarse-grained models used a histogram reweighting technique to calculate melting temperatures and an annealing algorithm to search the parameter space.52–54 This approach is necessary when one is fitting >10 parameters, many of which, e.g., stacking strengths, have quite subtle effects on Tm. Since the parameter space to be searched is much smaller, we are able to use a simpler method. We first find an approximate linear mapping between the sequence and the melting temperature predicted by our model. We then use this mapping to find the parameters that best reproduce the melting temperatures predicted by the Sugimoto model.
In order to search the parameter space, we used the following initial minimization procedure: (1) initialize parameters to average-sequence values. (2) For every sequence in Sdep, use the current values of ɛdXrY to calculate the average bond strength and, assuming the linear scaling established for the average-sequence model, the corresponding approximation to . Compute C. (3) Randomly perturb parameters to generate new parameters, and repeat step 2 for the new parameter set. (4) If new parameters reduce C, accept them; otherwise, repeat step 3. (5) Repeat steps 2–4 until C converges.
Using our previously obtained estimates of the bonding parameters, we ran melting simulations for 500 random sequences (125 per duplex length) and used these data to fit an MLR model for each length of the duplex [Fig. 3(b)]. We then performed the minimization procedure described earlier—now with the improved VMMC predictions made using the MLR models—to arrive at a refined parameter set. We repeated all of the above (i.e., melting simulations of 500 random sequences, MLR model fitting, followed by minimization) for a final time but found that by this point, the parameters had converged.
Note that initially we also attempted to fit the strength of the cross-stacking interaction for the average-sequence model. We performed preliminary fitting of the hydrogen bonding and cross-stacking strengths simultaneously and found that the value that minimized C was , where is the value used by the DNA model (for reference, ). However, we also found that increasing while decreasing ɛHB accordingly (and vice versa) made little difference to the overall fit. Since the relative strengths of the cross-stacking and hydrogen bonding interactions are not experimentally constrained, different values for and ɛHB could have been chosen without detriment to the model. We chose to set and then selected ɛHB using the procedure outlined earlier in this section.
The coaxial stacking and Debye–Hückel interactions could also, in principle, have been reparameterized. However, to the best of our knowledge, no data for DNA–RNA hybrids exist that could be used to fit these interactions. We set the parameters of the Debye–Hückel interaction for hybrids to the values used by the RNA model. The hybrid model uses the same coaxial stacking interaction as the DNA model.
III. PROPERTIES OF THE MODEL
In this section, we report the physical predictions of our model. These include the structure of double-stranded DNA–RNA hybrid duplexes, the melting behavior of both the average-sequence and sequence-dependent versions of our model, and mechanical properties such as persistence length and force-extension characteristics.
A. Structure
The structures of double-stranded DNA and RNA differ significantly—DNA most commonly folds into a B-form helix, whereas RNA takes up an A-form conformation. The A-form helix is characterized by significant slide (displacement of adjacent base pairs along the long axis of the pair) and roll (the angle by which base-pairs open up toward the minor groove), with the result that base pairs are shifted away from the helical axis and inclined to it.87,88 Figure 4(d) defines the parameters x-displacement and inclination,89 which are used to characterize the structure of the double helix in this work.
Reports on the exact structure of DNA–RNA hybrids vary. Thanks to studies of polymeric hybrids, it is largely accepted that poly-rA-dT can experience an A- to B-form transition with changes in relative humidity.90 Hybrids containing poly-dA-rU or poly-dI-rC have been termed heteromerous, whereby the DNA and RNA strands possess B- and A-form characteristics, respectively.91 The detailed structure of oligomeric hybrids can depend on sequence—it is known, for instance, that the purine/pyrimidine content of the DNA strand can change the backbone conformation.92 A nuclear magnetic resonance (NMR) study by Gyi et al.93 found that the extent of A- or B-form helicity as well as the major/minor groove widths vary with purine/pyrimidine content. They also found that a high-purine DNA strand results in greater conformational diversity as a result of increased sugar flexibility, compared to the case when the RNA strand of the hybrid duplex is high in purine. More recent crystallography studies of oligomeric DNA–RNA hybrids typically characterize them as A-form,94–96 and estimates of their exact x-displacement and inclination obtained from all-atom simulations suggest a structure in-between those of DNA and RNA.47
To determine the structure of a hybrid duplex in our model and compare it to DNA and RNA, we generated 10 000 uncorrelated configurations of each class of a 16 base-pair duplex by performing average-sequence Monte Carlo simulations at 25 °C with a monovalent salt concentration of 0.5M. We then measured the helical parameters of each configuration and calculated their means, as shown in Table I. Note that the values for the RNA model differ from those first reported by Šulc et al.54 since the model used here includes salt-dependent effects that were not included in the original model. Representative structures are shown in Fig. 4. We see that the values of inclination, x-displacement, and pitch are intermediate with respect to those for DNA and RNA. While our coarse-grained models are not primarily designed to achieve structural accuracy, it is encouraging that the high-level structural features of DNA–RNA hybrids emerge without being explicitly imposed.
Parameter . | DNA . | Hybrid . | RNA . |
---|---|---|---|
Inclination (deg) | 5.15 | 8.31 | 13.8 |
x-displacement (nm) | 0.0536 | 0.265 | 0.549 |
Pitch (bp/turn) | 10.6a | 10.8 | 11.0 |
Rise (nm/bp) | 0.347b | 0.343 | 0.280c |
Parameter . | DNA . | Hybrid . | RNA . |
---|---|---|---|
Inclination (deg) | 5.15 | 8.31 | 13.8 |
x-displacement (nm) | 0.0536 | 0.265 | 0.549 |
Pitch (bp/turn) | 10.6a | 10.8 | 11.0 |
Rise (nm/bp) | 0.347b | 0.343 | 0.280c |
In oxRNA, an A-form conformation is imposed on the helix by making the stacking interaction dependent on the angle between the nucleotide orientation vector and the backbone vector connecting neighboring nucleotides, such that the potential energy of the stacking interaction is minimized if the helix adopts an A-form geometry. This angular dependence is not present in the DNA model and, in hybrids, only the RNA strand has this modified stacking interaction. However, the short range of the hydrogen-bonding interaction forces base pairs to lie approximately in the same plane, resulting in a compromise between A- and B-forms. We find that this intermediate helix geometry has an effect on thermodynamic properties, which are discussed in Sec. III B.
B. Thermodynamics
The model parameters selected by the fitting procedure described in Sec. II C and used below are shown in Table II. We note that the hydrogen bonding parameters required to reproduce the correct melting temperatures are substantially larger than in either oxDNA or oxRNA; this point is discussed below.
Parameter . | DNA . | Hybrid . | RNA . |
---|---|---|---|
ɛHB | 1.07 | 1.50 | 0.87 |
ɛAU/AT | 0.89 | 1.21/1.37 | 0.82 |
ɛGC/CG | 1.23 | 1.61/1.77 | 1.06 |
Parameter . | DNA . | Hybrid . | RNA . |
---|---|---|---|
ɛHB | 1.07 | 1.50 | 0.87 |
ɛAU/AT | 0.89 | 1.21/1.37 | 0.82 |
ɛGC/CG | 1.23 | 1.61/1.77 | 1.06 |
The fit of the average-sequence model to target melting temperatures is shown in Fig. 5. While, in general, our model reproduces the melting behavior of short hybrid duplexes quite well, there is a noticeable deviation from target temperatures at short strand lengths—for strands of length 6 and 8, the melting temperature is overestimated by around 7.1 and 3.6 °C, respectively. In the average-sequence DNA and RNA models, corresponding deviations are typically no more than 1 °C.
In order to investigate how hybridization between DNA and RNA affects individual interactions, we computed the mean potential energies associated with stacking and hydrogen bonding using a simulation protocol similar to that used in Sec. III A but with the temperature set to 1 °C in order to reduce fluctuations away from the double-stranded ground state. In general, stacking contributes less to the stability of hybrids than dsDNA or dsRNA duplexes. This is because A- and B-form geometries, respectively, were imposed onto the RNA and DNA models through the forms of the interaction potentials: when part of a hybrid duplex, neither the DNA nor RNA is in its preferred conformation, which has a destabilizing effect. This explains why the fitting procedure described in Sec. II C increases the hydrogen bonding strengths to compensate (cf. Table II). We also find that, as strand length increases, both stacking and hydrogen bonding interactions become, on average, less stabilizing. This can be understood as a consequence of stabilizing relaxation of the strained duplex near the ends, which becomes relatively less important as the duplex increases in length. It is also noteworthy that stacking is more disrupted for the RNA strand of a hybrid duplex than for the DNA strand. We propose that this tendency for (RNA) stacking and hydrogen bonding to weaken with increasing strand length is the reason for the melting temperature overestimation in 6- and 8-mers. The model could be further adapted to include a modified stacking potential that can better accommodate hybrids, enabling an even better fit to experimental melting temperatures. This could be implemented by including a double-well angular/radial dependence in the stacking interactions, such that A- and B-form helicities are maintained in dsRNA and dsDNA, respectively, while also allowing a hybrid duplex to inhabit a second potential energy well, mitigating the destabilizing effect in the current version of the model.
In order to test the sequence-dependent version of the model, we ran melting simulations on 1000 random duplexes of lengths 6, 8, 10, and 12 (250 per length). Sequences with predicted melting temperatures below 1 °C (short, U-rich sequences) were discarded. The results are shown in Fig. 6. Over this 1000-sequence test set, the model achieves a mean ΔTm of 0.0926 °C with a standard deviation of 5.36 °C. While we consider this to be a more than satisfactory fit, we are aware of factors that limit our model’s performance. The first is its overestimation of the stability of short duplexes, as discussed for the average-sequence model. In Fig. 6, there is a noticeable overestimation of Tm in the <30 °C region, which is almost certainly a manifestation of this effect. As discussed in Sec. III A, sequence can affect backbone conformation. Our model does not factor in these structural changes, which likely worsens the overall sequence-dependent fit.
C. Mechanical properties
The mechanical properties of nucleic acids are biologically important97 and determine the mechanical behavior of synthetic constructs like DNA origami.98 For this reason, it is important to check that our model captures the basic mechanics of double-stranded DNA–RNA hybrids. Here, we measure the persistence length and force-extension characteristics of hybrid duplexes within our model and compare the results to available experimental data.
Experimental data on the mechanics of hybrid duplexes are scarce, and the number of all-atom simulation studies is also low. Zhang et al.101 performed a series of magnetic tweezer experiments to measure the mechanical properties of a long (>10 kilobase) hybrid duplex at different salt concentrations. They report a stretching modulus of 660 pN, which does not depend strongly on salt concentration. Conversely, salt does have an effect on persistence length, which ranges from 49 to 63 nm at salt concentrations of 0.5 and 1M, respectively. An all-atom simulation study performed at a 1M monovalent salt concentration estimated a stretching modulus of 834 pN.47 Given that the model is parameterized to reproduce thermodynamic properties, the agreement between calculated and measured elastic properties is satisfactory. We note that for low applied forces (<35 pN or so), the persistence length is more significant than the stretching modulus in determining the mechanical behavior of the duplex.
To put this into perspective, the persistence length Lp of dsDNA at moderate to high salt concentration is in the range 45–50 nm, and the stretching modulus K is around 1050–1250 pN at high salt.51 The first version of the oxDNA model achieves Lp = 43.8 nm and K = 2120 pN. For dsRNA, experimental estimates of Lp are in the range 58–80 nm and K = 615 pN, while for the oxRNA model, Lp = 28.3 nm and K = 296 pN.
IV. APPLICATIONS OF THE MODEL
We provide examples of the application of the model to three hybrid systems—toehold-mediated strand displacement (TMSD), a short R-loop, and RNA-scaffolded wireframe origami, all of which are technologically and/or biologically important.
A. Toehold-mediated strand displacement
Toehold-mediated strand displacement (TMSD) is a process in which one of the strands within a nucleic acid duplex is exchanged for another. The displacement of the incumbent strand is initiated by the binding of the invader to a short single-stranded toehold region on the complementary strand16,102 [Fig. 8(a)]. TMSD has many applications in nanotechnology, including in the construction of synthetic molecular circuits.103 DNA–RNA hybrid TMSD is of particular interest by virtue of its relevance to in vivo applications.104 Strand displacement has also been argued to play an important role in various naturally occurring RNA systems.105
We note that oxDNA has been remarkably successful in reproducing experimental observations related to TMSD, having been used, for example, to study mismatches as a tool for modulating strand displacement kinetics.106,107 RNA strand displacement has likewise been simulated using the oxRNA model.62
A common feature of all of the free energy profiles is the initial downhill trajectory in the range of 1–4 invader-substrate hydrogen bonds. This is associated with toehold binding, which is always favorable as there is no competition between strands. Generally, there is an entropic barrier associated with the formation of a branch junction during strand displacement, which is seen as an activation barrier in the branch migration region (for RNA invading dsDNA and DNA invading hybrids). In the case of DNA invading dsRNA, the landscape is steeply uphill, as on average, dsRNA is substantially more thermally stable than a DNA–RNA hybrid. Conversely, when RNA invades a hybrid, this results in the formation of dsRNA, which is much more thermally stable than a hybrid duplex, resulting in a downhill landscape. This can be understood in terms of the difference in average melting temperature between dsDNA and dsRNA—around 60 and 71 °C for a ten base-pair duplex, respectively. The difference between the free energy landscapes for RNA invading dsDNA and DNA invading a hybrid is more subtle because hybrids and dsDNA are quite close in melting temperature (around 61 °C for a ten base-pair hybrid duplex). It is likely that this relative difference is smaller than the typical effects of varying base sequences.
The simulations performed here only scratch the surface of what can be studied with the model—future work will investigate the effect of sequence on TMSD free energies and kinetics. Preliminary simulations with the model suggest that free energy landscapes, as well as reaction kinetics, are strongly sequence-dependent. We are also looking into how the secondary structure in the RNA strand impacts the reaction. Given the success of previous oxDNA models in studying TMSD, we are confident that our DNA–RNA hybrid model will provide useful insights.
B. R-loop resolution
An R-loop is a three-stranded nucleic acid structure consisting of double-stranded DNA that is partially hybridized with complementary RNA. As discussed in Sec. I, this is possibly the most important naturally occurring DNA–RNA hybrid system.
We use our coarse-grained model to simulate the resolution of an R-loop. While this system appears to be similar to the TMSD studied in Sec. IV A, as both involve DNA–RNA strand displacement, we observe behavior that is quite different. The simulation protocol used closely resembles our TMSD simulations. We study a single R-loop consisting of 55 base-pair double-stranded DNA that is hybridized to a 25-nucleotide RNA strand at its center [Fig. 9(a), top]. As before, in order to prevent strand dissociation, we restrict the system to states with at least one DNA–DNA and one RNA–DNA hydrogen bond and use average-sequence parameters. In this case, we ran separate simulations for two overlapping windows of the order parameter space—one restricted to 1–13 RNA–DNA hydrogen bonds and another to 13–25 bonds. We performed ten independent VMMC simulations per window, each for 3 × 108 time-steps. Temperature and monovalent salt concentration were the same as for our TMSD simulations.
Computed free energy profiles are shown in Fig. 9(b). There is a barrier of around 2kBT associated with the transition from 1 to 2 RNA–DNA bonds. The zoomed-in snapshot of the resolved state in Fig. 9(a) suggests an explanation. In the resolved state, the DNA double helix tends to be fully closed, with the RNA strand forming a weak hydrogen bond with one of the DNA strands. As a result, in order to make the transition from one to two RNA–DNA bonds, two DNA–DNA bonds must be broken, which is energetically costly. This is, in part, an artifact of restricting the simulation to bound states. Without this restriction, the RNA strand would have dissociated completely in the resolved state.
In general, we observe that the formation of the DNA–RNA hybrid in this particular system is significantly less favorable than in the analogous TMSD reaction of RNA invading dsDNA, depicted in Fig. 8(b). Several factors contribute to the difference between the two energy landscapes. In a fully formed R-loop, displacement of the RNA strand can take place from either end; the DNA loop is tethered at both sides, increasing its proximity to the hybrid and making displacement more likely. Resolving an R-loop is clearly entropically favorable, as it entails the exchange of a single strand tethered at both ends for one tethered at only one end in our simulations or fully displaced in practice, thus having much greater conformational freedom.
We also observe an oscillatory component to the free energy, which has minima at R-loop sizes of around 13 and 23 RNA–DNA bonds. When the DNA–RNA hybrid helix is of a size roughly commensurate with its pitch (around 11 base pairs), the ends of the displaced DNA loop are on the same side of the duplex, which entails higher conformational freedom. Conversely, at half a turn away, e.g., around 18, the ends are at opposite sides of the duplex, reducing conformational freedom and leading to a slight additional increase in free energy cost.
The inset in Fig. 9(b) depicts a 2D free-energy landscape that provides additional information about the system. The presence of the R-loop destabilizes the DNA double helix beyond the region of the DNA–RNA hybrid, with states that are not fully hybridized being readily accessible. This is clear from the fact that, at any given number of RNA–DNA bonds, states with numbers of DNA–DNA bonds below what would be expected for a fully hybridized system (55 bonds in total) are sampled.
The stability of an R-loop depends on its length and sequence.108 An obvious future application of our model would be a comprehensive study of the effects of these factors. The kinetics of R-loop resolution could also be studied using specialized sampling techniques.
C. RNA-scaffolded wireframe origami
Nucleic acid origami is one of the most common techniques used for assembling single-stranded DNA/RNA building blocks into a target structure. Origami nanostructures consist of a scaffold, which is a long strand running through the entire assembly, and shorter staple strands that hybridize into two or more scaffold domains to control their spatial arrangement. Domains of the scaffold strand, which are widely separated in their primary sequence, can be held in close spatial proximity in the final structure. This technique has been applied primarily to DNA, although interest in the design of DNA–RNA hybrid nanostructures is increasing.
We have used our model to simulate three hybrid wireframe origami nanostructures from Parsons et al.,29 which consist of an RNA scaffold and DNA staples. The structures were designed assuming a double helix with a pitch of 11 base-pairs per turn, which is roughly reproduced by our model. We performed MD simulations at 4 °C and a monovalent salt concentration of 0.3M to match the experiments. Each structure was simulated for 107 time-steps, and the positions of particles were sampled every 104 time-steps for analysis. We simulated three nanostructures—a tetrahedron, an octahedron, and a pentagonal bipyramid—each having edges 66 base-pairs long. For each, we calculated the mean structure and per-nucleotide RMSF (root-mean-square fluctuation). From these mean structures, we reconstructed all-atom models of the nanostructures using the oxDNA-to-PDB converter on TacoxDNA71 (by superimposing atomic coordinates onto individual nucleotides) and then aligned them with cryo-EM densities, obtained by Parsons et al. and retrieved from EMDB,109 using ChimeraX.110
Figure 10 compares our results to the experimental data. Our model captures the measured structures reasonably well, with no systematic strain build-up. For the tetrahedron and octahedron, it is immediately clear that structural fluctuations are concentrated at edge centers.
V. CONCLUSION
We have introduced a new coarse-grained model based on existing oxDNA and oxRNA models that enables the simulation of DNA–RNA hybrids. As with previous models, we parameterized the hydrogen bonding interaction to reproduce the melting temperatures of short duplexes. Quantitative agreement with the experimentally calibrated nearest-neighbor model of the thermodynamics of hybrid duplexes is nearly as close as that achieved for DNA and RNA duplexes using oxDNA and oxRNA. The persistence length and stretching modulus derived from simulations of longer duplexes are consistent with experimental values, although some uncertainty about their values remains. The conformation of DNA–RNA hybrid duplexes is a compromise between the structures preferred by DNA and RNA alone. As a result, the stabilization of the duplex by stacking interactions is reduced, necessitating an increase in hydrogen bonding strength to produce the desired melting temperatures. One consequence of this choice is that the model overestimates the stability of short double-stranded helices—something that users of the model should keep in mind. Nevertheless, the overall performance of our DNA–RNA hybrid model for the systems we studied gives us confidence that it will be able to capture the sequence-dependent kinetics/thermodynamics of more complex biophysical processes. A future version of the model will include a modified stacking potential that can accommodate the preferred conformations of dsDNA, dsRNA, and DNA–RNA hybrids.
We have demonstrated the versatility and applicability of our model by performing simulations for three different systems. Our study of toehold-mediated strand displacement using the average-sequence model suggests that the relative stabilities of DNA–DNA, RNA–RNA, and DNA–RNA duplexes play a key role in determining the free energy landscapes of hybrid displacement reactions. Our simulations show that the biophysics of R-loop resolution includes geometric effects related to the commensurability of the R-loop length and the pitch of the double helix. Finally, we have shown that our model can help validate DNA–RNA hybrid origami designs.
Future work will focus on DNA–RNA hybrid systems at time and length scales that are inaccessible to all-atom simulations, including the sequence-dependent kinetics of strand displacement reactions and the effects of RNA secondary structure motifs.
SUPPLEMENTARY MATERIAL
In the supplementary material, we provide values of the β parameters from Eq. (4), as well as the DNA sequence used in Sec. IV B.
ACKNOWLEDGMENTS
The authors acknowledge Thomas Ouldridge and Jonathan Bath for useful discussions, Lorenzo Rovigatti and Erik Poppleton for their help with code development, and Erik Winfree for suggesting the name oxNA for the model. E.J.R. acknowledges the financial support provided by the Clarendon Fund, Somerville College (Oxford), and the Engineering and Physical Sciences Research Council (Grant No. EP/W524311/1). We also acknowledge the Advanced Research Computing service at the University of Oxford for computer time. P.Š. acknowledges support from the National Science Foundation under Grant No. CCF 2211794.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Eryk J. Ratajczyk: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Petr Šulc: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Project administration (equal); Supervision (equal); Writing – review & editing (equal). Andrew J. Turberfield: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Project administration (equal); Supervision (equal); Writing – review & editing (equal). Jonathan P. K. Doye: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Project administration (equal); Supervision (equal); Writing – review & editing (equal). Ard A. Louis: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Project administration (equal); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The code implementing the model, along with the supporting documentation, can be found at https://lorenzo-rovigatti.github.io/oxDNA/. A new topology file format supporting DNA–RNA hybrids has been implemented in the official oxDNA code, and the accompanying suite of analysis tools has likewise been extended to enable the analysis of systems containing both DNA and RNA. The online visualization tool oxView.org 111 has been extended to also support viewing of DNA–RNA hybrids. The simulations performed here were run on single CPUs, although a GPU version of the model is a likely future development. The data that support the findings of this study are availablefrom the corresponding author upon reasonable request.