We have developed a multi-scale model describing the dynamics of internal segments of DNA in nanochannels used for genome mapping. In addition to the channel geometry, the model takes as its inputs the DNA properties in free solution (persistence length, effective width, molecular weight, and segmental hydrodynamic radius) and buffer properties (temperature and viscosity). Using pruned-enriched Rosenbluth simulations of a discrete wormlike chain model with circa 10 base pair resolution and a numerical solution for the hydrodynamic interactions in confinement, we convert these experimentally available inputs into the necessary parameters for a one-dimensional, Rouse-like model of the confined chain. The resulting coarse-grained model resolves the DNA at a length scale of approximately 6 kilobase pairs in the absence of any global hairpin folds, and is readily studied using a normal-mode analysis or Brownian dynamics simulations. The Rouse-like model successfully reproduces both the trends and order of magnitude of the relaxation time of the distance between labeled segments of DNA obtained in experiments. The model also provides insights that are not readily accessible from experiments, such as the role of the molecular weight of the DNA and location of the labeled segments that impact the statistical models used to construct genome maps from data acquired in nanochannels. The multi-scale approach used here, while focused towards a technologically relevant scenario, is readily adapted to other channel sizes and polymers.

## I. INTRODUCTION

The advent of genome mapping in nanochannels^{1–6} has spurred interest in understanding the properties of semiflexible polymers confined in channels smaller than their persistence length. In this technology, DNA molecules hundreds of kilobase pairs (kbp) long, containing sequence specific fluorescence probes, are electrokinetically injected into an array of tens of thousands of nanochannels (Fig. 1(a)). The width of these square channels is around 45 nm,^{2} which is roughly the same as the persistence length of DNA in a high ionic strength buffer.^{7} Under such strong confinement, the labels on the DNA backbone order linearly and their locations can be “read” using optical microscopy (Fig. 1(b)). Importantly, the genomic data arising in these experiments do not have any global hairpins in the DNA, which would scramble the ordering of the labels.^{8} The combination of extensive work in theory,^{9–14} simulation,^{12–34} and experiments^{8,34–41} have produced a relatively complete picture of the thermodynamics of DNA confined below its persistence length. However, considerably less is known about the dynamics of DNA in small nanochannels,^{26,29,36,42,43} in particular, the internal segmental dynamics. In the present contribution, we develop a multi-scale model that allows us to study efficiently the internal dynamics of DNA when confined in a nanochannel in the absence of global hairpins.

We are particularly interested in the internal dynamics of DNA due to its relevance to the practice of genome mapping. During these experiments,^{1,2} a two-color image similar to Fig. 1(a) is acquired using a fixed exposure time for each color, typically 150 ms per color with a brief delay in switching the laser line and optical filters. The locations of the fluorescent labels are then measured with respect to a reference point, for example, the edge of the entire DNA molecule or one of the other labels, to remove effects of DNA diffusion or drift. There are several sources of error in the measurement of separation distance in these experiments including ones due to static effects such as topological events,^{41} mutations, channel size variations, and dynamic thermal fluctuations. The latter are the main focus of the present work.

As seen in the kymograph in Fig. 1(c), obtained using a movie of the label positions rather than a single snapshot,^{40} the positions of the labels in genome mapping fluctuate under thermal energy, and these fluctuations can impact the accuracy of the resulting DNA map since a single snapshot of 150 ms would average over 4–5 frames of this kymograph. If the acquisition time for a snapshot image is long compared to the relaxation time (decorrelation time) for the distance between a given label and the reference point, then the measured label position will be close to its thermodynamic average value. This is the optimal result for genome mapping, since the resulting conversion from physical distance (nm) to genomic distance (base pairs, bp) only requires knowledge of the average extension of the chain, which can be obtained by theory,^{10,14} simulation,^{31,32} or experimental calibration with a known molecular weight standard. Unfortunately such long exposure times are prohibitive due to photobleaching, which reduces the fluorescence signal of the labels, and photocleavage of the molecule,^{49} which reduces the integrity of the DNA molecule and effectively terminates the experiment. Moreover, the accuracy inherent in increasing the exposure time must be balanced against the concomitant lower throughput. Since the autocorrelation function (ACF) is presumably an exponentially decaying quantity, there are diminishing returns to increasing the exposure time, even if photobleaching and photocleavage are minimized. Thus, it behooves us to understand how the relaxation time for label-label distances is affected by the genomic distance between them, their location within the DNA chain, and the overall molecular weight of the DNA molecule being imaged.

DNA, as a model polymer, affords the unparalleled ability to study the properties of polymers in confinement in a very direct manner by imaging single molecules. However, there is a paucity of work in the literature focusing on the dynamics of DNA in nanochannels, in particular, channels below the persistence length. When available, such studies tend to focus on the dynamics of the entire molecule. For example, the seminal work by Reisner *et al.*^{36} provided the first data on the relaxation time of *λ*-DNA over a wide range of channel sizes, results that were later captured by simulations and models.^{26,42} Recent work on the internal dynamics has focused on either the equilibrium^{50,51} or non-equilibrium^{52,53} behavior in channels much larger than the persistence length, and thus describe physics in confinement regimes that do not apply to genome mapping. There are a few studies involving the fluctuations of internal segments of barcoded DNA in channel sizes relevant for genomic mapping.^{8,18,40} However, in each case, only the static thermodynamic properties were measured.

Developing a model for the dynamics of internal fluctuations of DNA molecules during genome mapping poses substantial challenges due to the wide separation of length scales and the computational costs for modeling hydrodynamics in confinement. Explicitly, molecules used for genome mapping are typically greater than 150 kbp in length (and up to 1 megabase pair) to provide a sufficient number of fluorescent labels to align the molecule to the consensus genome map. However, in order to accurately model the configuration of such a chain in a 45 nm channel using a discrete wormlike chain model, we require the sub-persistence length resolution of the chain configuration. The persistence length of DNA corresponds to approximately 150 bp; using 15 beads per persistence length implies that modeling even the smallest DNA molecules used for genome mapping requires 15 000 beads. Simulating the dynamics of such a large chain in the absence of confinement, where there is a closed-form result for the hydrodynamic interactions, is already a daunting challenge and normally handled by resorting to some pre-averaging approximation.^{54,55} The situation is even more difficult in a channel due to the hydrodynamic interactions. While these interactions are screened over a length scale close to the channel size,^{44,56} there is no closed-form solution for the Green's function describing the hydrodynamic interactions within the screening volume. Although the methods for including hydrodynamic interactions in confinement have advanced substantially,^{45,57–59} simulating the dynamics of a confined chain with hydrodynamic interactions is considerably slower than its unconfined counterpart.

Fortunately, the internal dynamics do not need to be resolved at such a fine length scale to produce meaningful information for the practice of genome mapping in nanochannels. Barcoded genomic DNA in the genome mapping system used here contain segments defined by barcode labels separated by distances that range from 0.5 to 80 *μ*m on molecules that are at least 17 *μ*m in length and often much longer. We developed the multi-scale approach outlined in Fig. 1 to produce a model that provides dynamic information resolved within this range for DNA that do not have global hairpin folds, which are the ones of interest for genome mapping. Importantly, our model only uses input parameters that can be obtained from bulk experiments or theories for unconfined polyelectrolytes.

Explicitly, our model takes as input the properties of the DNA (molecular weight, persistence length, effective width, and segmental hydrodynamic radius), all of which are readily available from bulk experimental data or theories that describe their dependence as a function of ionic strength.^{60–63} We also require the system temperature and buffer viscosity, again readily available from bulk experimental data. As illustrated in Fig. 1(d), these parameters serve as the inputs to a discrete wormlike chain model, which we have used extensively^{17,19,25,27,28,30,32,39,44,64} to study the properties of confined polymers in channels near the persistence length. This model resolves DNA at the length scale of the model's bond length, which is approximately 3 nm, and easily reaches $ O ( 10 4 ) $ beads when simulated using the pruned-enriched Rosenbluth method (PERM).^{46,47} Keep in mind that these are thermodynamic simulations, which provide, *inter alia*, expectations for the average chain extension and the variance about the average, but no information about the polymer dynamics. We also combine the PERM simulations with a confined hydrodynamics calculation to compute the friction coefficient of the confined chain with sub-persistence length spatial resolution.^{44} This fine-scale, confined DNA model has been tested using experimental data for the chain extension^{17,39,65} and its longest relaxation time^{42} and yields results that agree well with experimental data.

The results of the PERM simulations are then used as inputs to the one-dimensional, Rouse-like model of coupled, overdamped harmonic oscillators seen in Fig. 1(f), analogous to previous work by Riehn and coworkers for DNA under weak confinement.^{50,51} Importantly, our diffusion data for the fine-scale model allows us to pick a coarse-graining length scale where all of the hydrodynamic interactions occur within the Rouse beads, with the walls screening the interaction between Rouse beads. The advantage of this approach to simulating confined DNA,^{50,51} even if small errors are introduced by the coarse-graining, is that the coarse-grained model is substantially simpler than the original fine-scale model, reducing the number of beads by a factor of $ O ( 10 3 ) $ and reducing the dimensionality from 3D to 1D. Equally important, our coarse-grained model captures both the segment-segment and segment-wall hydrodynamic interactions at the local scale but, due to, the screening from the walls, does not require any hydrodynamic interactions between Rouse beads for the modest degrees of coarse-graining that we use here. The model is thus simple to simulate using Brownian dynamics, and many of the interesting properties of the model are readily obtained using a standard normal-mode analysis.^{48}

With this overall approach in mind, we begin with a detailed description for constructing and parameterizing the Rouse-like model. The conclusion of this analysis is a closed-form expression for the relaxation time between two segments on the chain, where all of the parameters in the expression emerge from simulating the fine-scale model. The overall approach we use here is general and readily applied to other channel sizes or shapes, as needed, and we focus on the nanochannels used in recent experiments on time-series data during genome mapping.^{40} We reanalyze these data^{40} to test the ability of the model to predict the trends in the relaxation time as a function of the distance between labels on the DNA and the distance of the labels from the ends of the molecule. Our results indicate that our model, while quite simple, reproduces both the trends in the experimental data, as well as the order-of-magnitude for the relaxation time. We then use the model to investigate in more detail the effects of DNA molecular weight, label separation, and label location, and we connect these results to practical implications for genome mapping. Our discourse concludes with perspectives on the further potential uses of both our approach and the particular model developed here.

## II. MODELING APPROACH

### A. One-dimensional Rouse-like model

Following previous work,^{50,51} the coarse-grained model in Fig. 1 represents the highly confined linear chain by a one-dimensional overdamped harmonic oscillator chain model. This model has an analogy to the Rouse model for polymer chains in free solution,^{48} in which a polymer molecule is represented by beads connected through harmonic springs with zero equilibrium spring length, and where each bead experiences random thermal force and a drag force. One of the differences between the Rouse model and what we call, for notational clarity, a *Rouse-like model* for a nanochannel-confined chain is that the equilibrium length of each spring in the nanochannel confined chain is nonzero.

The model consists of *N _{b}* beads connected by $ N s = N b \u2212 1 $ springs. As such, the equation of motion governing the motion of Rouse bead $ i = 1 , 2 , \u2026 , N b $ is

with hypothetical end beads $ R 0 = R 1 \u2212 X s $ and $ R N b + 1 = R N b + X s $, i.e., with a free-end boundary condition at both ends. Here, *ζ* is the friction coefficient of a bead, *R _{i}* is the axial position of bead

*i*in the one-dimensional model,

*k*is the spring constant, and

_{s}*X*is the equilibrium rest length of the spring. Although the factor

_{s}*X*cancels out in Eq. (1), we retain it to emphasize that the springs have non-zero rest lengths. The random force

_{s}*f*(

*i*) acting on bead

*i*has the properties

and

where *k _{B}* is Boltzmann's constant,

*T*is the absolute temperature, $ \delta ( t 1 \u2212 t 2 ) $ is a Dirac delta function, and

*δ*is the Kronecker delta function. It is clear from the form of Eq. (1) that the Rouse model ignores excluded volume and hydrodynamic interactions between beads, assumptions that will be justified shortly.

_{ik}### B. Parameterization via a fine-scale model

The Rouse-like model involves four parameters: the number of beads *N _{b}*, the bead friction

*ζ*, the spring constant

*k*, and the equilibrium spring length

_{s}*X*. The aim of our multi-scale approach in Fig. 1 is to relate these quantities to known properties of the DNA molecule (contour length

_{s}*L*, persistence length

*l*, effective width

_{p}*w*, and segmental hydrodynamic radius

*a*), the buffer (viscosity

*η*and temperature

*T*), and the channel width

*H*. The number of beads is obtained from

*L*and the corresponding mean extension

*X*of the chain

which can be readily solved for *N _{b}* once

*X*is known. The remaining quantities need to be obtained from a fine-scale model.

_{s}To this extent, we model the DNA as a discrete wormlike chain of $ N b P $ touching beads,^{17} with a bond length equal to the hydrodynamic diameter 2*a*,^{44} and simulate this model using PERM.^{27,46,47} There is a bending penalty between trios of continuous beads of the form

where *θ* is the angle between the beads. The bending penalty *κ* is related to the persistence length via^{44}

Hard-core repulsion is enforced between all beads and between the beads and the walls over a distance *w*. This wall interaction implies an effective channel size $ H eff = H \u2212 w $ available to the beads. Each bead also has a hydrodynamic radius *a* that is generally not equal to the effective width and, for DNA, conveniently satisfies the inequality *a* < *w*.^{66}

The standard PERM simulations for a confined chain^{27} furnish the average extension of the chain in the axial direction, *X*, and the variance about the mean, $ \delta X 2 $, as a function of *L*. We also use PERM simulations to compute a Kirkwood diffusivity *D _{K}* in the axial direction,

^{44}using a grid-based method to obtain the hydrodynamic interactions between segments and between the DNA and the walls.

^{45}We have recently shown that the Kirkwood approximation improves as confinement increases,

^{67}and we obtained a good agreement between experiments and simulations for the relaxation time of the entire DNA chain by invoking the Kirkwood approximation.

^{42}For our purposes here, it proves convenient to recast the Kirkwood diffusivity as an axial friction coefficient

*ζ*, where the superscript denotes a quantity obtained from PERM simulations. PERM is a chain growth algorithm,

^{P}^{46}and natively outputs thermodynamic averages as a function of contour length. As a result, the output(s) of our PERM simulations are the functions $ X ( N b P ) , \u2009 \delta X 2 ( N b P ) $, and $ \zeta P ( N b P ) $, where $ N b P $ is the number of PERM beads.

To construct the map between the discrete wormlike chain model and the Rouse-like model, we need to determine an appropriate number of PERM beads $ N b P * $ to coarse-grain into a single Rouse bead (or a Rouse spring). There are three important criteria that we consider to choose $ N b P * $. First, there should be no hydrodynamic interactions beyond the coarse-graining length scale because the Rouse model does not include any bead-bead hydrodynamic interactions. In other words, the value of $ N b P * $ should be sufficiently large so that all hydrodynamic interactions occur, on average, within a length scale represented by a Rouse bead representing the friction of $ N b P * $ PERM beads. This criterion will be used in setting the lower bound on the choice of $ N b P * $. Second, the number of Rouse beads should be large enough so that end effects emerging from using a fixed value of *ζ* for each Rouse bead in Eq. (1) are minimized. This criterion will set the upper bound on the choice of $ N b P * $. This upper bound depends on the chain length; for example, it can be very small for small chains. This introduces a limitation in our multi-scale model that only the chains of lengths greater than a particular length can be studied (so that the upper bound is always higher than the lower bound). Finally, the third criterion pertains to the static phenomena: There should be enough PERM beads to represent a Rouse spring such that the distribution for the extension of $ N b P * $ PERM beads is approximately Gaussian, as required by the harmonic nature of a Rouse spring. This criterion sets a second lower bound on the choice of $ N b P * $; naturally, the lower of these two lower bounds must be satisfied. The details of the procedure to choose $ N b P * $, by satisfying all three criteria, are discussed in Sec. III A.

Once we have selected a value of $ N b P * $, the mapping between the fine-scale model and the coarse-grained model is straightforward. The friction of a Rouse bead is

which is obtained directly from the PERM data. As is customary in the Rouse model, we assign the same friction *ζ* to each bead, including the end ones to facilitate the normal mode analysis and ultimately arrive at a closed-form result. The mean extension of the spring between two beads is also obtained directly from the PERM data as

To obtain the spring constant, we treat each spring as a dumbbell. From the equipartition theorem, the spring constant is^{42}

While we have focused so far on criteria for selecting $ N b P * $, it is worthwhile to mention two more important limitations to our model. First, the choice of the coarse-grained length scale naturally sets a lower bound on the time resolution of the model. In order for the Rouse-like model to be valid, the degrees of freedom modeled by a bead-spring pair need to be able to relax such that their thermodynamic averages, used to compute the Rouse parameters via Eqs. (7)–(9), are representative values. To a good approximation, this time scale is set by the relaxation time for a bead-spring dumbbell, $ t r \u2248 \zeta / ( 2 k s ) $. In dynamic simulations, Eq. (1) typically needs to be integrated using a time-step smaller than this relaxation time for numerical stability. This does not pose any conceptual problems, provided that one only considers dynamics that take place at a time exceeding *t _{r}*. A similar caveat applies for a normal-mode analysis of the Rouse-like model. Second, the Rouse-like model assumes no backfolding of the chain at the length scale of the coarse-grained model. Experiments and simulations suggest that this is a reasonable assumption. While there does exist a small population of folds and knots in experimental data, they can be easily removed from the ensemble by scanning the backbone signal and looking for anomalous bright regions.

^{41}Moreover, folded chains are short-lived, and tend to be removed simply by waiting several minutes after loading the DNA and acquiring data. It is also possible that there are small folds within the DNA that are not detected in experiments due to the low signal-to-noise ratio for the DNA backbone. Fortunately, simulations of confined DNA using realistic parameters for nanochannel mapping

^{23}suggest that these folds are small compared to our coarse-grained length scale. As a result, small S-loop shapes that appear due to thermal fluctuations are already incorporated into the coarse-grained model, in an average sense, by the equilibrium spring length and the spring constant. The extension data produced these from the aforementioned simulations,

^{23}which used a Metropolis Monte Carlo method, were recently confirmed by PERM simulations.

^{31}Thus, it seems reasonable to assume that the local configurational data (which are not easily obtained in a PERM simulation) are also reproduced.

### C. Extension autocorrelation function

For our analysis, we focus on the extension autocorrelation function (ACF) of the distance between segments in the Rouse-like model. This quantity is related to the relaxation time of this segment of the chain. We provide a brief outline of the derivation here;^{68} the complete derivation is provided in the supplementary material. The analysis is similar to that of the standard Rouse model, with the exception that the conversion to normal modes requires shifting the positions of bead *R _{i}* to $ R i \u2212 i X s $ to map the finite-extension chain onto a chain with zero average extension.

^{51}If the extension of a segment between beads

*a*and

*b*is denoted as $ x a b = R b \u2212 R a $, where

*R*and

_{b}*R*are the positions of beads

_{a}*b*and

*a*, respectively, then the normalized ACF for the mean extension, $ Z a b ( t ) $, of that segment of the chain is

where $ \u27e8 \cdots \u27e9 $ is the mean value. Equation (10) satisfies the properties of a normalized ACF, i.e., $ Z a b ( 0 ) = 1 $ and $ Z a b ( \u221e ) = 0 $. The former result is obvious and the latter result is due to the relaxation at infinite time, i.e., $ \u27e8 x a b ( \u221e ) x a b ( 0 ) \u27e9 = \u27e8 x a b ( \u221e ) \u27e9 \u27e8 x a b ( 0 ) \u27e9 $. If we define $ x \u0303 a b = x a b \u2212 \u27e8 x a b \u27e9 $, then Eq. (10) can be rewritten as

The quantity $ \u27e8 x \u0303 a b ( t ) \u2009 \u2009 x \u0303 a b ( 0 ) \u27e9 $, computed in the supplementary material, is^{68}

where

and

is the relaxation time of the *j*th Rouse-like mode. Clearly, many single exponential decays contribute to the decay of the mean extension normalized ACF. The relaxation time is given by the integral of the normalized ACF from time *t* = 0 to a final time *t* = *T _{e}*

which leads to

In theory, the upper limit on the integral is $ T e = \u221e $.^{68} However, when we compare this model to experimental data, we will treat *T _{e}* as the total experimental acquisition time, i.e., the number of frames multiplied by the frame exposure time

*t*. Note that each independent Rouse-like mode

_{e}*j*represents the local motion of the chain, which includes $ N b / j $ beads with relaxation time approximated by

*τ*. However, the actual relaxation time

_{j}*τ*of segment (

*a*,

*b*) depends on all modes, i.e., all

*τ*.

_{j}## III. RESULTS AND DISCUSSION

### A. Model parameterization

We focus here on the experimental system used in recent experiments on the dynamics of DNA during genome mapping.^{40} The channel size is *H* = 40 nm. The viscosity of the buffer was measured using an AR-G2 rheometer (TA instrument) with the cone-plate geometry^{69} to be approximately 0.0022 Pa s at room temperature. The ionic strength of the buffer is approximately 100 mM. From Dobrynin's theory,^{70,71} the corresponding persistence length is $ l p = 52.2 $ nm and, from Stigter's theory,^{60} the effective width is *w* = 5.6 nm. The resulting effective channel size available to the chain is $ H eff = 34.4 $ nm. For the hydrodynamic radius of a PERM bead, we use the value *a* = 1.47 nm.^{44,66} This value of bead radius was determined previously from the diffusivity of unconfined, short DNA and Yamakawa and Fujii's theory for the diffusivity of an ideal wormlike chain.^{72} Subsequent simulations of the unconfined Kirkwood diffusivity using this value of the hydrodynamic radius and our Kirkwood diffusion methodology led to excellent agreement with experimental data.^{66} The free-solution value of the hydrodynamic radius applies in confinement as well because the wall effects are incorporated through far-field interactions.^{44,45} Using these input parameters, we conducted PERM simulations using 100 000 tours with a total length of $ N b P = 15 \u2009 000 $ beads per tour. For wall hydrodynamic interactions between segments and wall, we use the same grid-based method and the same grid parameters as used in the past.^{44}

Overall, the idea behind the parameterization is to find a reasonable coarse-graining length scale, $ N b P * $, such that the criteria of having (i) no hydrodynamic interactions between Rouse beads, (ii) minimal errors due to extra friction on end Rouse beads, and (iii) harmonic potential for the Rouse spring in the Rouse-like model are satisfied, as outlined in Sec. II B. We first start with finding a lower bound on $ N b P * $ based on criterion (i). Figure 2 shows the results for the friction coefficient per bead, *ζ ^{P}*/ $ N b P $, as a function of $ N b P $ obtained from PERM simulations. The friction coefficient converges very quickly, consistent with arguments that hydrodynamic interactions are screened outside the length scale of the channel.

^{56}From these data, we select the lower bound on $ N b P * $ as 500 PERM beads because, beyond this value, the friction coefficient becomes approximately extensive in $ N b P $.

To find the upper bound set by criterion (ii), we calculate in Fig. 3 the relaxation time of a whole chain of fixed length *X*, using Eq. (17), for varying values of the coarse-graining parameter $ N b P * $. Equivalently, the number of Rouse beads is

which is similar to Eq. (4). Values of other Rouse parameters *k _{s}* and

*ζ*are easily mapped from the PERM results (shown in Figs. 2 and 4) for a given value of

*X*. For example, $ N b P = 7000 $ PERM beads corresponds to $ X = $ 18.18

*μ*m, and resulting $ k s wc = 221.6 $ nN/m and $ \zeta wc = 153.8 $ nN s/m for the whole chain; the superscript “wc” stands for whole chain. Note that the smallest DNA considered in the genomic experiment in this work is 17

*μ*m, and hence we used

*X*= 18.18

*μ*m (which is close to the minimum length used in experiments) to show the end effects. The spring constant is $ k s = ( N b \u2212 1 ) \u2009 k s wc $ and the friction coefficient is $ \zeta = \zeta wc / N b $.

Results for the relaxation time *τ* for *X* = 18.18 *μ*m are shown in the left panel of Fig. 3. There is a strong end effect when $ N b \u2272 10 $ for all chain lengths, indicated by dashed green horizontal line in Fig. 3. Using $ N b = 10 $ as the upper bound implies that the upper bound is $ N b P * \u2248 750 $, which is indicated by the blue dashed vertical line. In the middle and right panel of Fig. 3, we also show the end effects for two other larger chains to verify that the minimum value of $ N b = 10 $ is a good choice, bearing in mind that the smallest chain in our experiments provides the most stringent upper bound.

Finally, to test criterion (iii), the top panel of Fig. 4 shows examples of the probability density functions for the chain extension, $ \psi ( X ) $, for $ N b P = 2 , 20 , 200 , 2000 $. Because the chain has a finite extension, $ \psi ( X ) $ in all cases deviates from the Gaussian behavior, as evidenced by the normal probability plots shown in the bottom panel of Fig. 4 for respective chain lengths. This non-Gaussian behavior has also been seen in experimental studies in the past.^{8,40}

For our purposes, we want to determine the length scale for which the distribution of extension is as close as possible to the Gaussian distribution. In order to see the closeness between the empirical cumulative distribution function (CDF) of our extension data and the standard normal CDF, we calculate the root mean square of error (RMSE) between the two for a range of $ N b P $ in Fig. 5. In general, there is a steady decrease in RMSE all the way up to $ N b P = 15000 $. However, since the reduction in RMSE is quite small (0.09–0.04) over four decades of $ N b P $, and the minimum value of RMSE occurs well above the upper bound we have already set, criterion (i) alone provides a lower bound on $ N b P * $, namely, $ N b P * = 500 $ PERM beads.

There is a very narrow window between the lower bound (500) and the upper bound (750) on the choice of $ N b P * $. A choice of 750 for $ N b P * $ would result in a more accurate value of the friction coefficient (see Fig. 2), but a less accurate value for *τ* due to end effects (see Fig. 3); the opposite would be true if $ N b P * $ was 500. Throughout this paper, we use $ N b P * = 600 $ as the coarse-grained length scale to represent one spring in the Rouse-like model. Similarly, the friction coefficient of 600 PERM beads represents the friction coefficient of a single Rouse bead. The values of Rouse-like model parameters at this coarse-graining length scale are listed in Table I. Unlike the three parameters listed in Table I, the fourth parameter in the model, *N _{b}*, is set by the length of the chain. For clarity, Table II summarizes the conversion from PERM beads to Rouse springs. By way of example, the particular chain in Fig. 1(b) has an extension of 28.62

*μ*m that corresponds to roughly 17 springs or, equivalently, $ N b = 18 $ beads.

Parameter . | Value . | Units . |
---|---|---|

X _{s} | 1.64 | μm |

k _{s} | 2878 | nN/m |

ζ | 13.27 | nN s/m |

Parameter . | Value . | Units . |
---|---|---|

X _{s} | 1.64 | μm |

k _{s} | 2878 | nN/m |

ζ | 13.27 | nN s/m |

Entity . | Contour length . | Extension . | PERM beads . | Rouse springs . |
---|---|---|---|---|

. | [bp] . | [μm]
. | . | . |

PERM bead | 9.8 | … | 1 | … |

Rouse spring | 6492 | 1.64 | 600 | 1 |

Example (Fig. 1(b)) | 113 288 | 28.62 | 11 560 | $\u2245$17 |

Entity . | Contour length . | Extension . | PERM beads . | Rouse springs . |
---|---|---|---|---|

. | [bp] . | [μm]
. | . | . |

PERM bead | 9.8 | … | 1 | … |

Rouse spring | 6492 | 1.64 | 600 | 1 |

Example (Fig. 1(b)) | 113 288 | 28.62 | 11 560 | $\u2245$17 |

### B. Comparison with the experimental data

To test the model, we compared the predictions of Eq. (17) using the parameters in Table I to experimental data on genomic DNA in nanochannels. We recently obtained a large videomicroscopy data set of labeled *E. coli* genomic DNA molecules that were analyzed primarily for static properties.^{40} Here, we return to the same data set^{40} and now extract the dynamic data. In these experiments, the DNA molecules were driven electrophoretically into square nanochannels of size *H* = 40 nm, using the version 1 chip from BioNano Genomics, in a buffer of ionic strength of approximately *I* = 0.1 M. A typical image of the DNA in the nanochannels appears in Fig. 1(a). The extension of the DNA molecules were in the range $ 17 \u2212 124 $ *μ*m. Twenty-seven videos were acquired, each consisting of 330 frames of the label channel with a frame exposure time of 30 ms. An image of the YOYO-stained DNA backbone was taken immediately after the final image of the movie. The custom image processing program DM-static, which is available from BioNano Genomics, was used to analyze the raw images. The resulting time series were filtered for blinking and non-resolved labels using a set of filters developed separately from the DM-static algorithm.^{40} The final data set includes a total of 1718 molecules and 18 589 internal segments.

Each molecule produces a kymograph, similar to the example in Fig. 1(c), that provides the position in the channel for each label as a function of time. The normalized ACF for the experimental data is computed as

where *x _{ab}* is the separation distance between two labels

*a*and

*b*,

*T*is the total number of frames, and

_{f}*j*and

*t*denote the lag/frame index. This definition of the normalized ACF is simply a discretized version of Eq. (10). In Eq. (19), the time series of the separation of label positions are known from experimental data, whereas in the Rouse-like model, normal modes are used to define label or bead positions as described in the supplementary material. Although the buffer used in these experiments contains multiple additives to enhance photostability,

_{f}^{40}some of the labels will blink on and off. In the event that a label blinks off, we set the corresponding terms in the sums in Eq. (19) to zero when they involve that frame. We then fit the normalized ACF to a multiple exponential function to obtain the relaxation time

*τ*. We found that some of the fits were clearly not exponential. Most likely, these fits were to due to transient large compression or extension. However, we cannot make a definitive analysis because we only have a single image of the DNA backbone at the end of the movie. These outliers were removed from the data set; more details about the criteria to remove these outliers are provided in the supplementary material.

We chose to work with genomic DNA extracted from *E*. *coli*,^{40} rather than a model system (e.g., barcoded *λ*-phage DNA) for two reasons. First and foremost, genomic mapping in nanochannels is intended to acquire information from large genomic DNA molecules, so these samples are most relevant to the intended application. Second, it is relatively easy to acquire a very large data set over a wide range of molecular weights and label separation distances with a sample extracted from an organism. The downside of our approach is that essentially every molecule that we image is a unique sample, with a different molecular weight and different label pattern. As a result, we are forced to bin these data in order to obtain sufficient statistics. In what follows, we chose to bin the molecules first as a function of their extension *X*. Then, within a particular *X* bin, we sub-bin the data in terms of the label separation distance *x _{ab}*. We only considered bins which contained at least 20 data points. All of the values involved in the experimental data analysis are listed in Table III. Additional details about the experimental protocol are provided in the supplementary material. While our model suggests that there is a further dependence on the location of the labels with respect to the end of the molecule, we do not have enough data to support a third level of binning. We will consider the effect of label location from the model in Sec. III C.

Experimental parameter . | Value . |
---|---|

X bin size | 9.27 μm (37 kbp) |

x bin size _{ab} | 6.18 μm (25 kbp) |

Number of segments before (after) filtering | 18 589 (12 173) |

Number of molecules before (after) filtering | 1718 (480) |

Final range of x _{ab} | [0.5, 46.5] μm |

Final range of X | [17, 62] μm |

Experimental parameter . | Value . |
---|---|

X bin size | 9.27 μm (37 kbp) |

x bin size _{ab} | 6.18 μm (25 kbp) |

Number of segments before (after) filtering | 18 589 (12 173) |

Number of molecules before (after) filtering | 1718 (480) |

Final range of x _{ab} | [0.5, 46.5] μm |

Final range of X | [17, 62] μm |

By way of example, Fig. 6 displays the normalized ACF, *Z _{ab}*, for small distances between labels on small molecules (Fig. 6(a)) and large distances between labels on large molecules (Fig. 6(b)). Note that the mean for a given time lag is obtained by averaging the value of

*Z*of all segments for a particular combination of

_{ab}*X*and

*x*bins. We chose to represent the uncertainty in the data by the standard deviation over the values of

_{ab}*Z*within a given bin. There are certainly errors in the measurement of

_{ab}*x*for any particular label pair due to the imaging optics and the method for extracting the label positions. In addition to propagation of this error to

_{ab}*Z*, the autocorrelation calculation also has errors arising from the limited sample size and blinking labels. Nevertheless, the dominant source of variance within a given bin likely arises from the distribution of molecular weights and label distances within that bin.

_{ab}Overall, Fig. 6 makes it clear that longer segments relax more slowly than smaller segments, as would be expected. However, in both cases there are substantial variances in the correlation, especially at long lags. We attribute this phenomenon to two likely factors: (i) the finite duration of the videos, which limits the number of periods that can be realized for long Rouse modes, and (ii) transient compression or extension.

Our theoretical analysis produced a result in Eq. (17) for the time constant characterizing the decay in the autocorrelation. To compare this formula with the experimental data, we made a multiple exponential fit

to the first 100 data points in plots similar to Fig. 6 for each label pair combination. In Eq. (20), *N _{e}* is the number of exponentials, which we choose to be 10 in this work, and

*A*and

_{k}*θ*are the coefficients and time constants for the $ k th $ exponential, respectively. After fitting the ACF using multiple exponential function, we use Eq. (16) with

_{k}*T*= 10 s to calculate

_{e}*τ*. Figure 7 shows the relaxation time data for every molecule and every pair of labels with segment lengths that lie within the ranges listed in Table III.

There are only a few data points for $ x a b > $ 37.1 *μ*m, hence they are not shown in the figure. The spread in the data is quite large, consistent with the shading in the typical binned ACFs plotted in Fig. 6. Also, there are substantially more data points for smaller *X* and *x _{ab}* than for larger ones. The solid lines in all panels in Fig. 7(a) show a linear model regression for

*τ*as a function of

*X*. For all six segment bins, we observe a positive correlation between

*τ*and

*X*as quantified by the corresponding

*p*-values. The

*p*-values less than 0.05 indicate the rejection of the null hypothesis that

*τ*does not depend on

*X*, at the

*α*= 0.05 confidence level. We also perform a statistical comparison of the slopes of all segment size bins using an Analysis of Covariance at the

*α*= 0.05 level, followed by multiple individual pair comparisons using Tukey's honest significant difference criterion. These comparisons are shown in Table IV. All but three of the slope pairs are significantly different.

x_{ab} [μm]
. | 9.3 . | 15.5 . | 21.7 . | 27.9 . | 34.1 . |
---|---|---|---|---|---|

3.1 | 0.037 | <0.001 | <0.001 | <0.001 | <0.001 |

9.3 | 0.179 | <0.001 | <0.001 | 0.004 | |

15.5 | 0.002 | 0.016 | 0.023 | ||

21.7 | 0.982 | 0.406 | |||

27.9 | 0.698 |

x_{ab} [μm]
. | 9.3 . | 15.5 . | 21.7 . | 27.9 . | 34.1 . |
---|---|---|---|---|---|

3.1 | 0.037 | <0.001 | <0.001 | <0.001 | <0.001 |

9.3 | 0.179 | <0.001 | <0.001 | 0.004 | |

15.5 | 0.002 | 0.016 | 0.023 | ||

21.7 | 0.982 | 0.406 | |||

27.9 | 0.698 |

Although Fig. 7 shows almost all of the data we obtained, it is difficult to (i) draw clear inferences about the dependencies of *τ* on *X* and *x _{ab}* from all of the data points, and (ii) compare the experimental data with Rouse-like model results. Therefore, we also bin our data in

*X*. For each bin, we then compute the median, and the first and the third quantiles, which are depicted in Fig. 8(a). Poorer statistics are seen for

*τ*in Fig. 8(a) for larger values of

*X*and

*x*because, as evident from Fig. 7, there are relatively fewer data points for larger values of

_{ab}*X*and

*x*.

_{ab}Figure 8 indicates that (i) for a given molecule extension *X*, the relaxation time of a segment within that molecule increases with the length of the segment, and (ii) for a given segment length *x _{ab}*, the relaxation time also increases with increasing size of the entire molecule. Both the trends in the data are captured by the Rouse-like model in Fig. 8(b), where

*τ*is calculated using Eq. (17) with the parameters in Table I. For comparison with the experiments, we use

*T*= 10 s in Eq. (17), which is the experimental video duration. Figure 8 shows a qualitative agreement between the experimental results and the Rouse-like model predictions. A similar level of experimental agreement was achieved by Carpenter

_{e}*et al.*

^{50}and Karpusenko

*et al.*,

^{51}who used a simple one-dimensional overdamped oscillator chain model to predict and compare the dynamics of density fluctuations in DNA in larger channels. Therefore, our work corroborates the use of simplified models to describe these experiments.

Quantitatively, our Rouse-like model results are roughly within a factor of 2 compared to our experimental data. This quantitative discrepancy can be partially attributed to the excess friction coefficient assigned to end beads. This excess friction coefficient leads to overestimation of the relaxation time. Adjusting *ζ* for the end beads would result in a lower value of *τ*, albeit with a substantial increase in the complexity of the normal mode analysis. Moreover, on the experimental side, the limited video duration results in only a lower bound on the *τ* for larger segments and/or molecules. Thus, we expect larger *τ* for those segments and molecules in the experimental data. Additionally, for short segments with very short relaxation times, in many cases our calculated value will be merely an upper bound due to the duration of a frame, here optimized at 0.033 ms.

It is important to understand the effect of the overall length of a molecule on the relaxation dynamics of a given internal segment. Figure 8 shows that a segment of size *x _{ab}* relaxes much faster if it is a part of smaller DNA molecule, as opposed to being part of the larger DNA molecules. In general, this trend is captured by the model because the number of Rouse-like modes, and thus the relaxation time, increases with molecule length. Moreover, the molecule length dependence of the relaxation time increases with segment length,

*x*, as evidenced by the rising slopes in Fig. 8.

_{ab}### C. Effects of molecular weight and label locations

The relatively good agreement between the model and experiments in Fig. 8 allows us to use the model to investigate, in more detail, effects of molecular weight, label separation, and label location. These are challenging parameters to examine by experiment. In principle, one could use modern molecular biology methods to design molecules with precise molecular weights and label locations. Indeed, such a strategy proved useful in the development of genome mapping in nanochannels.^{1} However, constructing a set of molecules sufficient to study the physics over a wide region of the phase space would be prohibitively expensive, let alone the additional costs required for obtaining and analyzing the autocorrelation data. Thus, the model developed here presents an ideal opportunity to explore the phase space, keeping in mind that the accuracy of any conclusions obtained is likely to have an accuracy similar to that seen in Fig. 8.

Using the Rouse-like model, we can study the effect of segment position on *τ* in detail by choosing various values of *a* and *b* in Eq. (17). In this analysis we set $ T e = \u221e $ in order to capture the much longer relaxation time of very long molecules in the theoretical limit. Figure 9(a) plots *τ* as a function of the normalized relative position on the chain from the end

for four different molecule lengths and for *x _{ab}* = 3.29

*μ*m, which is the mean extension of a segment consisting of two Rouse springs. The four different molecules in Fig. 9(a) correspond to Rouse chains of $ N b = 10 , 20 , 40 , 80 $. Since there is no directionality in our model at equilibrium, these results are symmetric about the center of the chain, and we only plot the left side. Similarly, Fig. 9(b) displays the dependence of

*τ*on the position of the segment for various segment lengths

*x*(corresponding to $ b \u2212 a = 2 , 4 , 6 , 8 , 16 , 32 , 64 $) in a fixed chain length of

_{ab}*X*= 129.8

*μ*m (corresponding to $ N b = 80 $). The results indicate that, for a given segment length, locations on the edge of the chain relax much faster than locations in the center of the chain. This dynamic inhomogeneity has also been observed before in the context of polymer melts of polystyrene and polyisoprene.

^{73,74}The latter studies used atomistic simulations to study the influence of position of certain segments of polystyrene and polyisoprene on the dynamics of segments. This end effect becomes stronger as the molecule length increases. In order to see the effect of molecule length on end effects, we plot the ratio of the relaxation time of center segments to that of the outermost segments, $ \tau ratio $, as a function of

*X*in Fig. 9(c). For smaller DNA, this ratio is of the order one and the end effects are not strong. This suggests that the averaging of

*τ*for all positions in our experimental data analysis is valid only for the shorter molecules. This effect likely contributed to the increased variance in the relaxation time for the larger segments in Fig. 8(a), because these segments are more likely to be present on larger molecules.

### D. Impact on genome mapping

The consensus genome map for a biological sample is constructed by obtaining images similar to Fig. 1(b) for a large number of molecules such that the total number of base pairs imaged exceeds at least 30× the size of the genome. For a human genome, this corresponds to approximately 10^{11} base pairs or, approximately, 30 m of DNA. This measurement is accomplished in a single day's experiment using the version 2 chip available from BioNano Genomics. The fluorescent labels on each of the fragments within this football-field worth of DNA are then assembled by finding all of their overlapping regions. The assembly is done in a bioinformatic pipeline, where a set of statistical weights is applied to each individual molecule.^{75} As stated previously, there are number of sources of error, such as topological events, mutations, channel size variations, and dynamic fluctuations, that together constitute an overall error in label distance measurements. This overall error is then used to determine the statistical weight to assign to a particular measurement when constructing the genomic map. The information gained in this work studying the relaxation time as a function of segment length, molecule length, and segment position isolates the effects of dynamic fluctuations, which can improve the determination of the statistical weights. For example, the exposure time of 150 ms used for standard genome mapping roughly bisects the data in Fig. 8(a). This means that a snapshot fails to cover the fluctuation of a single relaxation for most of the larger segments. Whereas for the smaller segments, their extension distribution will be fairly completely sampled within one exposure time. As such, a larger statistical weight can be applied to shorter segments. We give a rough example calculation in the supplementary material that further amplifies this application of our model.

Considering the effect of position, the outermost segments will relax faster than the center segments, as can be seen from the general trends obtained solely from the Rouse-like model in Fig. 9. As a result, the distribution of the barcode distances is narrower for outermost segments. Thus, more statistical weight can be given to the outermost segments and less can be given to the center segments.

Figure 9(a) also shows that the relaxation time for outermost segments displays a weak dependence on *X* for larger *X*. For clarity, this trend is displayed in Fig. 10(a) across a wide range of molecular weights for three different segment lengths. A similar conclusion about outermost segments was made by Stephanou *et al.*^{76} but in the context of entangled polymer melts using primitive-path analysis on the trajectories obtained from molecular dynamics atomistic simulations. These results suggest that the accuracy in the measurements of barcode distances near the outermost segments do not decrease by increasing *X*. In other words, even for very large DNA chains, the accuracy level for outermost segments is the same as it would be in a small DNA chain. However, as shown in Fig. 10(b), the relaxation time strongly depends on *X* for center segments, suggesting a decrease in accuracy with an increase in *X*. This suggests that a larger statistical weight can be applied to outermost segments as compared to center segments.

## IV. CONCLUDING REMARKS

In this work, we have developed a multi-scale approach to model the dynamics of internal segments of nanoconfined DNA that ultimately produces a one-dimensional, Rouse-like model using input parameters that are readily available from the properties of unconfined bulk DNA and its buffer. While the limitations of the experimental data preclude a thorough model validation, this simple one-dimensional model captures the general trends in experimental data for DNA during genome mapping in nanochannels. This parameterized model provides further insights into the effects of molecular weight and label location that would be both expensive and difficult to obtain via experiments. Our results also impact the practice of genome mapping in nanochannels by identifying label pairs where the measurement of the label locations is likely to be impacted by a relaxation process that is slow compared to the imaging time. Indeed, as genome mapping moves towards ever longer molecules, the effects identified here will become increasingly important in optimizing the system for throughput and accuracy.

The modeling approach we used here is generic for channel-confined, semiflexible chains and thus easily extended to study other regimes of confinement or other polymers. In weaker confinement, however, it may prove convenient to move to a less resolved fine-scale model for computational efficiency.^{77} Likewise, while we found that normal-mode analysis was ideal for the problem we considered here, it is quite easy to simulate the Rouse-like model by Brownian dynamics. One should use caution, however, when using the model developed here to study strongly non-equilibrium processes such as the nanodozer assay.^{52} In order for the model to be valid, the statistics of the fine-scale model (which are obtained from equilibrium simulations) need to remain valid during simulation of the coarse-grained model. While this limitation does not impose any constraints on studying the equilibrium or near-equilibrium properties of channel-confined DNA, it constrains the model's use when non-equilibrium effects are important. Finally, the model developed here opens up the opportunity for *in silico* engineering of genome mapping technologies. Simulations of the Rouse-like model, with appropriate parameterization, furnish data on the dynamics of the label positions. Given the speed of these simulations, they offer an enticing avenue for optimizing operating parameters (exposure time, DNA molecular weight, label pattern) and device design for maximum throughput and accuracy.

## V. SUPPLEMENTARY MATERIAL

See supplementary material for a derivation of the autocorrelation time for the Rouse-like model, additional information on the fitting of the experimental data, and the effect of exposure time.

## ACKNOWLEDGMENTS

This work was supported by the National Institutes of Health (Grant No. R01-HG006851). We thank Seunghwan Shin for the measurement of the buffer viscosity. Part of this work was carried out in the College of Science and Engineering Polymer Characterization Facility, University of Minnesota, which has received capital equipment funding from the NSF through the UMN MRSEC program under Award No. DMR-1420013. Computational resources were provided in part by the Minnesota Supercomputing Institute at the University of Minnesota. Jeffrey G. Reifenberger and Han Cao are employees of BioNano Genomics, which is commercializing nanochannel genome mapping.