A central question in epigenetics is how histone modifications influence the 3D structure of eukaryotic genomes and, ultimately, how this 3D structure is manifested in gene expression. The wide range of length scales that influence the 3D genome structure presents important challenges; epigenetic modifications to histones occur on scales of angstroms, yet the resulting effects of these modifications on genome structure can span micrometers. There is a scarcity of computational tools capable of providing a mechanistic picture of how molecular information from individual histones is propagated up to large regions of the genome. In this work, a new molecular model of chromatin is presented that provides such a picture. This new model, referred to as 1CPN, is structured around a rigorous multiscale approach, whereby free energies from an established and extensively validated model of the nucleosome are mapped onto a reduced coarse-grained topology. As such, 1CPN incorporates detailed physics from the nucleosome, such as histone modifications and DNA sequence, while maintaining the computational efficiency that is required to permit kilobase-scale simulations of genomic DNA. The 1CPN model reproduces the free energies and dynamics of both single nucleosomes and short chromatin fibers, and it is shown to be compatible with recently developed models of the linker histone. It is applied here to examine the effects of the linker DNA on the free energies of chromatin assembly and to demonstrate that these free energies are strongly dependent on the linker DNA length, pitch, and even DNA sequence. The 1CPN model is implemented in the LAMMPS simulation package and is distributed freely for public use.

## I. INTRODUCTION

The assembly and compaction of eukaryotic DNA into chromatin represents a critical component of cellular function. Modifications to the processes that dictate chromatin folding, such as histone modifications, represent an important mechanism by which cellular functions such as transcription and replication are regulated. These modifications, known collectively as the epigenome, are central to human health, and a rapidly growing body of literature is finding these processes to be linked with diseases such as cancer.^{1,2} Developing a mechanistic understanding of the processes that dictate DNA compaction, and how these processes can be manipulated, is central to molecular biology, biophysics, and, ultimately, human health.

In recent years, genome-wide maps of histone modifications have provided researchers with molecular-level information about chromatin compaction. In particular, it has been found that many histone modifications do not act in isolation, but instead act collectively to dictate cellular function.^{3} By analyzing these genome-wide datasets, various groups have identified strong correlations between different patterns in histone modifications and the resulting function of chromatin.^{4–6} As foundational as these studies have been, however, they have not yet attempted to translate these correlations into a mechanistic understanding of chromatin that explains how modifications collectively dictate three-dimensional structure and gene expression.

One approach toward such a mechanistic understanding is afforded by so-called “chromatin conformation capture” techniques, which can provide a map of the three-dimensional contacts between different parts of the genome.^{7} These measurements have revealed several surprises, such as the fractal structure of chromatin,^{8,9} and the existence of topologically associated domains.^{10} Though initial experiments could only resolve genome contacts with 1000 kilobase resolution, recent techniques have achieved a resolution of up to one or several nucleosomes.^{11,12} Super-resolution microscopy has also been used to directly visualize chromatin compaction, shedding light on multiple, previously unknown subtle features of this process.^{13–17} Though both chromatin capture and microscopic techniques have significantly advanced our understanding of large scale structures within the genome, they do not possess sufficient resolution to resolve the interactions between individual nucleosomes. Resolving these interactions is critical because it is over these length scales that DNA-protein binding occurs and where histone modifications likely play the largest role.^{18}

The challenges associated with understanding the scales over which multiple nucleosomes interact together represent a continuation of a venerable body of literature trying to understand the “chromatin fiber.”^{19,20} Early work focused on whether this fiber is a “solenoid” one-start helix,^{21–24} where the linker DNA between neighboring nucleosomes is bent, or a “zig-zag” two-start helix,^{25–29} where DNA passes through the center of the fiber. Other work has called the very existence of this fiber *in vivo* into question, suggesting that at physiologically relevant concentrations a well-defined fiber may not even exist^{30,31} and, instead, that *in vivo* chromatin is in a liquidlike state.^{32} Though progress continues to be made in this field,^{33,34} understanding chromatin at the nanoscale and how conformations are altered by epigenetic modifications is a significant challenge.

Computational models provide an opportunity to bridge this knowledge gap by providing a link between intricate, nanometer length-scale processes and those more easily observable in experiments. A particularly successful approach was developed by Arya, Schlick, and co-workers in which a coarse-grained model of the chromatin fiber was used to interpret a wide range of experimental measurements.^{33,35–37} In that model, the nucleosome core is represented by a rigid body decorated with optimized charges and the linker DNA is described by a twistable wormlike chain.^{38} Additionally, it contains flexible histone-tails,^{38} the linker histone^{39,40} which mediates internucleosome interactions, and takes into account interactions between nucleosomes and linker DNA. Another successful approach has instead sought to represent the nucleosome with an anisotropic potential and the conformation of the exiting DNA by a set of angles.^{41–43} This model has been used to interpret the experimentally measured force-extension curves of the chromatin fiber^{44} and the intrinsic energies of different fiber configurations.^{45} In the interest of computational efficiency, both of these models necessarily make assumptions about the nucleosome, namely, that with the exception of the histone tails, the nucleosome can be represented as a rigid body. Many of these previous models also rely on Monte Carlo sampling to provide insights into equilibrium or metastable structures, but the dynamic pathways that connect them have not been investigated.

An alternative strategy is to rely on detailed molecular models of the nucleosome, which have recently emerged as a useful tool to investigate the dynamics and energetics of the histones and their wrapped DNA.^{46–49} By employing detailed coarse-grained models of DNA^{50} and proteins,^{51,52} these models make fewer assumptions about the structure of the nucleosome and do not rely on crystal structures to bias sampling. These models have been used to examine a wide range of phenomena associated with the nucleosome, such as unwrapping,^{47,48} sequence-dependent formation energies,^{46} and nucleosome repositioning.^{49} They are particularly attractive for the study of higher-order structures of chromatin, as they naturally incorporate interactions both within a single nucleosome and between different nucleosomes. Perhaps most importantly, they offer high resolution and can naturally incorporate histone modifications with amino-acid-level detail,^{53} thereby providing the means to examine the wide range of histone modifications known to impact chromatin function.^{3} In the past, however, computational limitations have restricted the use of these models to small systems consisting of several nucleosomes.^{53} A computational approach that can simultaneously capture the physical processes that occur at nanometer scales while also permitting access to the longer length and time scales characteristic of 3D genomic structure could, in principle, unravel the complex relationship between histone modifications and 3D genomic structure.

In this work, a model of chromatin is presented that relies on a multiscale approach to systematically incorporate different length scales of chromatin. This model, which we refer to as “1-Cylinder-per-Nucleosome,” or 1CPN, represents each nucleosome by a single anisotropic site and is sufficiently efficient to simulate kilobases of DNA and hundreds of nucleosomes. Notably, the interaction parameters in 1CPN have been rigorously determined from detailed free-energy measurements obtained with the 3-Site-per-nucleotide (3SPN) Atomic Interaction-based Coarse Grained (AICG) model of the nucleosome,^{46,47} thereby permitting description of subtle features such as histone modifications and the role of DNA sequence. The 1CPN model is demonstrated to accurately reproduce a wide range of processes within chromatin, including internucleosome free energies of interaction (pair-potentials), the free energies of nucleosome unwrapping, and the sedimentation coefficients of short chromatin fibers. Additionally, the 1CPN model provides a description of the dynamics of chromatin and can be employed to examine the dynamic folding and rearrangements that are thought to occur within the chromatin fiber.

The remainder of this manuscript is structured as follows: The details of the 1CPN model topology and force field are presented in Sec. II. Section III discusses the methods used to validate the 1CPN model, including the mapping between 1CPN and the 3SPN-AICG model, the free energy techniques used, and our coarse-graining approach. Finally, in Sec. IV, the results of the 1CPN model are presented along with comparisons to available experimental measurements. Finally, we note that the 1CPN model is implemented in the LAMMPS simulation package^{54} and is freely available online at https://uchic.ag/1cpn.

## II. MODEL

This section provides a detailed description of the 1CPN model topology, the model force field, and the equations of motion governing its dynamics. Before entering into these details, however, we briefly outline the key features of the 1CPN model and the physics that the 1CPN model seeks to incorporate.

**Rigorous validation and mapping between length scales**: Chromatin is a complex hierarchical material, and no single model can be used to examine all length-scales of chromatin biophysics. For this reason, multiscale approaches are particularly helpful. The 1CPN model was designed with a topology that can be validated using the detailed 3SPN-AICG model. Furthermore, the 1CPN topology can smoothly map between 3SPN-AICG configurations (and vice versa), thereby allowing for multistage simulations where part of the simulation is run at one length scale (and model) of interest and then continued using another.**Internucleosome dynamics**: Chromatin is an inherently dynamic structure, where regions of the genome are constantly being expanded or compressed depending on the cellular state, or the levels of protein expression. The 1CPN model was parameterized to quantitatively match the dynamics of chromatin and is well suited to examine the various pathways by which chromatin can rearrange.**Intranucleosome dynamics and nucleosome flexibility**: The dynamics of chromatin go beyond the folding of the chromatin fiber and extend to the motion of nucleosomes themselves. These dynamics can manifest themselves through fluctuations of nucleosomal DNA,^{55,56}the incorporation of loops and twist,^{57,58}repositioning of the nucleosomes themselves,^{59}or dynamic modification of histone tails.^{3}The 1CPN model includes many of these effects and moves beyond a view where the nucleosome is completely rigid, to one where intranucleosome dynamics are considered.**Sequence dependence**: DNA sequence plays an important role in the position and stability of nucleosomes, and very little work has been performed examining the effect of DNA sequence on the three-dimensional structure of chromatin. The 1CPN model includes a “first-order” approach to this sequence dependence and can be used to explore the effect of different DNA sequences on the resulting 3D structure of chromatin.**Computational efficiency**: In order to produce a model that can access the large length- and time-scales of chromatin assembly, 1CPN was designed with performance in mind. When seeking to incorporate the biophysical phenomena described above, each feature of the model was implemented so as to incur the smallest possible computational expense. The 1CPN model is implemented in LAMMPS and exhibits good parallelization performance across hundreds of distributed processors.

### A. Model topology

All sites in 1CPN are anisotropic. Each exhibits 12 degrees of freedom: six corresponding to the site’s position and velocity and six corresponding to the site’s orientation and angular momentum. Though the use of anisotropic sites incurs increased computational costs, this approach dramatically reduces the total number of force sites needed to represent the chromatin fiber.

The orientation of the *i*th site is given uniquely by the quaternion *q*_{i} = (*q*_{i,0}, *q*_{i,1}, *q*_{i,2}, *q*_{i,3}), which corresponds to the rotation quaternion that converts from the body-frame to the lab-frame. When discussing the model in the remainder of this section, the orientation of a site is more easily represented by an orthonormal basis $(\u2009f^,v^,\xfb)$. This basis is obtained by first defining a reference basis corresponding to the body-frame $(\u2009f^0,v^0,\xfb0)$ and then using the quaternion rotations

to obtain the orientation of site *i* in the lab-frame $(\u2009f^i,v^i,\xfbi)$, where *q*^{*} is the conjugate of ** q**, and the reference basis can be chosen arbitrarily. We choose $f^0=(1,0,0)$, $v^0=(0,1,0)$, and

*û*_{0}= (0, 0, 1).

The molecular representation of the 1CPN model is shown in Fig. 1. In the 1CPN model, three different site types are used that correspond to the nucleosomes, the DNA, and the nucleosomal dyad. Nucleosome sites are represented by a red ellipsoid surrounded by a blue cylinder and are centered on a red sphere. DNA sites are represented by blue spheres, and the dyad sites are represented by yellow spheres within the red ellipsoid. A discretization of one DNA site per three base pairs is chosen and will be discussed in Sec. II D. The red, green, and blue arrows correspond to the $f^i,v^i,\xfbi$ vectors of the *i*th site, respectively. Note that for the DNA, $v^$ is omitted for clarity and can be obtained by $v^=\xfb\xd7f^$.

A particularly illustrative visualization of the 1CPN model is provided by overlaying it over the 1KX5 nucleosome crystal structure^{60} [Fig. 1(c)]. From this representation, it can be seen that the geometrical position of the DNA sites is well within the nucleosome core particle. This topology was chosen to permit the partial unwrapping of DNA from the histone core and will be discussed in Sec. II D. This visualization also makes clear that the dyad site (yellow) is located approximately at the histone dyad. The dyad site is used to incorporate the effect of the H3 tails on the DNA strands entering and exiting the nucleosome and will be discussed in Sec. II B 2.

We now proceed to discuss the 1CPN force field and the interactions that govern the potential energy of the model. In the following discussion, we divide the total potential energy of the 1CPN model, *U*_{1CPN} = *U*_{b} + *U*_{nb}, into bonded, *U*_{b}, and nonbonded, *U*_{nb}, interactions.

### B. Nonbonded interactions

Nonbonded interactions are represented by a pairwise function where the total nonbonded energy is the sum over all pairs of sites,

The interaction potential between two sites, *U*_{ij}, depends on the site types *i* and *j*. It is given by

where *r*_{ij} = *r*_{j} − *r*_{i} and $f^i$, *û*_{i}, $f^j$, and *û*_{j} are the orientation vectors of sites *i* and *j*, respectively. The different pair potentials *U*_{Zewdie}, *U*_{Zewdie-LJ}, *U*_{elec}, and *U*_{dyad} are defined in what follows.

#### 1. Zewdie potential

Interactions between nucleosome sites are given by the Zewdie potential,^{61} an extension of the widely used Gay-Berne potential, which has been shown to provide a good representation of the nucleosome.^{42,43} The Zewdie potential between sites *i* and *j* is given by

where *r*_{ij} = |*r*_{ij}|, *σ*_{0} and *ϵ*_{0} are parameters, and the functions $\sigma =\sigma (r^ij,f^i,f^j)$ and $\u03f5=\u03f5(r^ij,f^i,f^j)$ are responsible for the orientation dependence of the potential. Note that $r^ij=rij/rij$. In the Zewdie potential, *σ* and *ϵ* are represented by the first six terms of an S-function expansion^{62} and are defined as

In this formulation, the orientation dependence of the potential is embedded in the terms *S*_{000}, *S*_{022}, *S*_{220}, *S*_{222}, and *S*_{224}, which are defined in Table I. The range and strength of the Zewdie potential is controlled by the parameters *σ*_{0} and *ϵ*_{0}, respectively, while the anisotropic shape of the potential is given by parameters *σ*_{x} and *ϵ*_{x} for *x* = {000, *cc*2, 220, 222, 224}. A key feature of this S-function expansion is that the different *ϵ*_{x} and *σ*_{x} parameters set the energy scales of different directions of approach between particles.^{61} For example, the second and third terms *S*_{022} and *S*_{202} increase as particle orientations ( $f^i,f^j$) align with the interparticle vector *r*_{ij}. As a result, the value of *ϵ*_{cc2} controls the preference of particles to stack columnwise. Similarly, *S*_{220} is only a function of particle orientation (e.g., $f^i,f^j$), and so *ϵ*_{220} controls the preference of particles to form a nematic phase. Higher order effects, such as quadrupoles, are captured by the terms *S*_{222} and *S*_{224}.

$a0=f^i\u22c5f^j$, $a1=f^i\u22c5r^ij$, $a2=f^j\u22c5r^ij$ |

S_{000} = 1, $S202=(3a12\u22121)/25$, |

$S022=(3a22\u22121)/25$, $S220=(3a02\u22121)/25$, |

$S222=170(2\u22123a12\u22123a22\u22123a02+9a0a1a2)$, |

$S224=1470(1+2a02\u22125a12\u22125a22\u221220a0a1a2+35a12a22)$ |

$a0=f^i\u22c5f^j$, $a1=f^i\u22c5r^ij$, $a2=f^j\u22c5r^ij$ |

S_{000} = 1, $S202=(3a12\u22121)/25$, |

$S022=(3a22\u22121)/25$, $S220=(3a02\u22121)/25$, |

$S222=170(2\u22123a12\u22123a22\u22123a02+9a0a1a2)$, |

$S224=1470(1+2a02\u22125a12\u22125a22\u221220a0a1a2+35a12a22)$ |

The flexibility of the expansion makes the Zewdie potential well-suited to represent a complex disklike biomolecule such as the nucleosome. The 12 parameters in the Zewdie potential are fit to anisotropic pair-potentials obtained using the more detailed 3SPN-AICG model of the nucleosome and will be discussed in detail in Sec. III C.

Interactions between DNA and nucleosome sites are also represented using the Zewdie potential. In the 1CPN model, nonbonded interactions between DNA sites are isotropic (i.e., DNA sites are spheres), and a slight modification of the Zewdie potential is required, denoted by *U*_{Zewdie-LJ}. In *U*_{Zewdie-LJ}, forces are applied the same as in *U*_{Zewdie}, but because the DNA sites interact isotropically, the potential is independent of the orientation of the DNA site. In practice, this is achieved by setting $f^DNA=r^ij$, *a*_{0} = *a*_{2}, and then using *U*_{Zewdie} as usual. Additionally, due to the different radii of the nucleosome and DNA sites, different values of *σ*_{0} and *ϵ*_{0} must be chosen, which are denoted by $\sigma 0\u2032$ and $\u03f50\u2032$ in Eq. (2). Here, we use an arithmetic mixing rule to obtain $\sigma 0\u2032$ = (*σ*_{DNA} + *σ*_{0})/2. The value of $\u03f50\u2032$ is chosen to be small, so that nucleosome and DNA sites interact only though an excluded volume. The values of the parameters used in the Zewdie potential are given in Table S3 of the supplementary material.

Past work using the Zewdie potential^{42,43,61} relied on Monte Carlo simulations that, by construction, did not require computation of forces and torques. In order to simulate the Zewdie potential using molecular dynamics (as in the 1CPN model), calculation of the forces and torques is necessary. The derivation of these quantities is provided in Sec. S1 A of the supplementary material.

#### 2. Nonbonded dyad interactions

Although the Zewdie potential provides a good representation of the interactions between nucleosomes and DNA, it neglects additional, significant interactions that occur within chromatin. Importantly, the role of the histone H3 tail in stabilizing DNA entering and exiting the nucleosome is not taken into account. Previous experimental measurements^{63} and simulations^{47} indicate that histone tails H3 and H4 stabilize the outer wrap of nucleosomal DNA and that the histone H3 tail is important for screening repulsive interactions between the entering and exiting DNA.^{64}

In order to incorporate these effects into the 1CPN model, we introduce an additional pairwise interaction between dyad and DNA sites. Coarse-graining these complex interactions between DNA and histone tails into a single pair potential is challenging, especially because they are highly localized near the entering/exiting DNA. In order to arrive at a suitable functional form, we drew inspiration from the base-pairing interactions in the 3SPN.2 model, where hydrogen bonds between complementary bases were represented by an anisotropic Morse potential.^{65} Adopting a similar approach for 1CPN, we define the dyad potential as an anisotropic gaussian potential, *U*_{gauss,aniso} (Fig. 2), which is given by

where *r* is the distance between the DNA and dyad sites, *r*_{0} is the equilibrium distance between the sites, *d*_{0} and *σ* control the depth and width of the gaussian well, and *f*(*K*_{θ}, Δ*θ*) and *f*(*K*_{ϕ}, Δ*ϕ*) are modulating functions that scale from zero to unity, depending on the relative orientation of the DNA and dyad sites. As with the 3SPN.2 model, the modulating function is chosen to be

and similarly for *f*(*K*_{ϕ}, Δ*ϕ*), where Δ*θ* = *θ* − *θ*_{0}, Δ*ϕ* = *ϕ* − *ϕ*_{0}, and *θ*_{0} and *ϕ*_{0} are reference angles. Here, $cos\u2061\theta =\xfb\u22c5r^$ is the polar angle between the normalized bond vector $r^$ and the dyad orientation vector ** û**, and $cos\u2061\varphi =f^\u22c5r^$ is the azimuthal angle between $r^$ and $f^$. Although the choice of the other

*U*

_{gauss,aniso}parameters will be discussed in Sec. IV C, the approximate symmetry of the exiting and entering DNA yields

*ϕ*

_{0}= 90°. To aid in interpreting this potential, different projections of

*U*

_{gauss,aniso}are given in Fig. 2. It is instructive to compare the locations of the energetic minima in Fig. 2(a) to the overlay of 1CPN and the 1KX5 crystal structure in Fig. 1(c). Comparison of these two figures gives a qualitative justification for the functional form of

*U*

_{gauss,aniso}; this potential accounts for the role of the H3 tail in stabilizing DNA that is entering and exiting the nucleosome. A quantitative comparison will be made by matching free energies in the 1CPN and 3SPN-AICG models in Sec. IV C. Before proceeding, we note that the orientation of the DNA site does not enter the potential and

*U*

_{gauss,aniso}is only a function of the vector connecting the two sites

**and the orientation of the dyad site.**

*r*#### 3. Nonbonded DNA interactions

Nonbonded interactions between DNA sites are represented by a Debye-Hückel screened electrostatic potential, *U*_{elec}, given by

where *q*_{i} and *q*_{j} are the charges of sites *i* and *j*, *r*_{ij} is the intersite separation, *λ*_{D} is the Debye length, and *ϵ* is the solution dielectric constant. Each DNA site represents three base-pairs and a charge of −3 is given to each DNA bead. The choice of charge per DNA bead was found to give good agreement with experimental measurements of the salt-dependent persistence length of DNA (see Fig. 7). The Debye length is defined as

where *N*_{A} is Avogadro’s number and *I* is the ionic strength of the solution. The solution dielectric constant is a function of monovalent salt concentration, *C*, and temperature, *T*, such that *ϵ*(*T*, *C*). The salt and temperature dependence are assumed to be independent such that

where

and

### C. Bonded interactions

The bonded potential, *U*_{b}, includes interactions between sites, including bonds, angles, and dihedrals, which maintain the topology of the model. Since all sites in 1CPN have both a position and an orientation, the bonded interactions in 1CPN include interactions that constrain both the relative position and relative orientation of bonded sites. In our discussion here, bonded interactions are presented in two parts, first, within the DNA molecule, *U*_{DNA}, and, second, within the nucleosome, *U*_{Nucl}. The total bonded energy is the sum of these two contributions

#### 1. DNA bonded interactions

Interactions within the DNA molecule are represented by a twistable wormlike chain model.^{66,67} For a DNA chain consisting of *N* beads [see Fig. 3(a)], *U*_{DNA} is given by

where *ℓ*_{i} = |*r*_{i+1} − *r*_{i}| is the distance between adjacent sites, $kbD$ is a force constant, and *ℓ*_{b,0} is the equilibrium bond length; $cos\beta i=r^(i)(i+1)\u22c5r^(i+1)(i+2)$ is the angle between adjacent bonds with force constant *k*_{β}; $coswi=f^i+1\u22c5f^i+v^i+1\u22c5v^i1+u^i\u22c5u^i+1$ is the twist around the polymer backbone between adjacent sites, $kw$ is the corresponding force constant, and $\omega 0$ is the equilibrium twist. The last term, *U*_{align}, constrains *û*_{i} of each site to be aligned with the bond vector, *r*_{ij}, where $cos\psi i=\xfbi\u22c5r^(i)(i+1)$ and *k*_{ψ} is the corresponding force constant. In our implementation, we use a modified form of the twistable wormlike chain that accounts for nonzero equilibrium twist (i.e., *ω*_{0} ≠ 0). The corresponding forces and torques for this modified potential are given in Sec. S1 C of the supplementary material.

#### 2. Nucleosome-DNA bonded interactions

Bonded interactions within the nucleosome, *U*_{Nucl}, are responsible for constraining the location and orientation of both the DNA entering/exiting sites and the dyad site relative to the nucleosome. As a consequence, it is natural to divide *U*_{Nucl} into two terms

where *U*_{NDNA} represents the bonded interactions corresponding to the implicit nucleosomal DNA, and *U*_{dyad} are bonded interactions that control the position and orientation of the dyad.

In the 1CPN model, many of the bonded potentials have the same functional form and only differ in the value of the underlying parameters. As a consequence, to simplify the notation throughout the remainder of this section and Sec. II F, we define the harmonic bonded potentials, $Ubh$, and the harmonic and cosine angle/orientation potentials, $Uah$ and $Uacos$, as

where *ℓ*_{ij} = |*r*_{ij}|, $cos\u2061\theta =x^\u22c5\u0177$ with $x^=x/|x|$, and *k*, *ℓ*_{0} and *θ*_{0} are parameters corresponding to the force constant, equilibrium length, and equilibrium angles, respectively. To further simplify our notation, we label all sites involved in *U*_{Nucl} interactions as *i*, *j*, *p*, *q*, *r*, *s* as shown in Figs. 3(b)–3(d). Throughout the remainder of this section, Eqs. (16)–(18) and the indexing convention from Figs. 3(b)–3(d) will be referenced extensively. A detailed diagram describing the various parameters that arise in the model and the relationship between them is provided in the supplementary material.

The nucleosomal DNA contribution can be subdivided into four terms

The first term, $UNDNAgeom$, ensures that the locations of the DNA leaving the nucleosome are consistent with the locations in the 1KX5 crystal structure. This is shown graphically by the orange and green bonds in Fig. 3(c) and is defined as

The second term, $UNDNAorient$, constrains the angle at which these DNA strands leave the nucleosome and is given by

The parameters in $UNDNAorient$ are determined by comparison to 3SPN-AICG data as described in Sec. IV C. The third term, $UNDNArotation$, incorporates the ability of nucleosomal DNA to rotate around the histone proteins

where the parameters *k*_{θ} and *θ* are obtained in Sec. IV D. Finally, the fourth term $UNDNAtwist$ corresponds to the ability of DNA twist to be stored *inside* of the nucleosome. The storage of twist within the nucleosome is a common feature of nucleosomal crystal structures^{68} and is a hypothesized mechanism by which nucleosomes reposition along a long stretch of DNA. This term is defined as

The position of the dyad site is constrained by two contributions to the potential energy,

A graphical representation of the interactions that contribute to *U*_{dyad} is shown in Fig. 3(d). The first term, $Udyadgeom$, constrains the position of the dyad relative to the nucleosome and the entering/exiting DNA sites and is given by

The second term, $Udyadorient$, constrains the alignment of the dyad to be equivalent to the nucleosome,

where the first three terms constrain $f^$ and the final three terms constrain ** û**.

### D. Nucleosomal DNA

An additional parameter in the 1CPN model is the number of DNA base pairs that are coarse-grained into the nucleosome site. In previous models of chromatin, either all 147 base pairs contained within the nucleosome were mapped onto a single nucleosomal site^{42,43} or assumed to be a rigid body along with the core histone proteins.^{38} The effect of these assumptions is that nucleosomal DNA in these previous models could not unwrap from the histone proteins. However, numerous experimental studies suggest that nucleosomal DNA does indeed unwrap from the histone proteins^{55,56} and recent computational work has calculated the free energies of these DNA openings near the ends of the nucleosome to only be several *k*_{B}*T*.^{47} Partial DNA unwrapping significantly changes the angle that DNA leaves the nucleosome and could significantly alter the three-dimensional packing of short sections of chromatin.

In order to incorporate the effects of partial DNA unwrapping in the 1CPN model, only ≈127 base pairs of DNA are coarse grained into the nucleosome, thereby allowing ≈10 base pairs at each end of the nucleosome to unwrap from the histone proteins [see Fig. 1(c)]. This choice was found to provide good agreement with free energies of nucleosome unwrapping obtained from the 3SPN-AICG model (see Sec. IV C). The number of base pairs that can unwrap from the nucleosome in 1CPN is given by parameter, *n*_{bp,unwrap}.

During development of 1CPN, we found it convenient to allow *n*_{bp,unwrap} to equal 9, 10, or 11 base pairs, depending on the nucleosome repeat length (NRL). This choice permits the 1CPN model to examine any nucleosome repeat length, not only values that are even multiples of the DNA discretization of three DNA base pairs per DNA bead. Details about this choice are given in Sec. S4 of the supplementary material.

### E. Brownian dynamics

Brownian dynamics simulations are used to describe the translational motion of a particle according to

where $vi$(*t*) = *d**r*_{i}/*dt* is the translational velocity of the *i*th site, *r*_{i}(*t*) is its position, *m*_{i} is its mass, *ζ*_{i} is its translational friction tensor, *B*_{i} is a matrix which satisfies $[Bi]\u22c5[Bi]\u22a4=2kBT\zeta i$, and *d**W*_{i}(*t*) is a Wiener increment, which has white noise statistics, $\u27e8dWi(t)dWj(t\u2032)\u27e9eq=\delta \delta i,j\delta (t\u2212t\u2032)$, where ** δ** is the identity tensor and

*δ*(

*t*) is the Dirac delta function.

Similarly, the rotational motion of a particle in solution can be written as^{69}

where *ω*_{i} is the angular velocity of the *i*th site, $\zeta ir$ is its rotational friction tensor, $Bir$ is a matrix which satisfies $[Bir]\u22c5[Bir]\u22a4=2kBT\zeta ir$, and $Ri(t)\u2261f^i(t),u^i(t),v^i(t)\u22a4$ is a rotation matrix in which each row is one of the vectors in the triad of orientation vectors of the particle; *I*_{i} is the moment of inertia tensor of the particle.

Here, we use freely draining Brownian dynamics and the friction tensor is assumed to be diagonal such that $\zeta i=mi\lambda i\delta $ and $\zeta ir=mi\lambda ir\delta $, where *λ*_{i} and $\lambda ir$ are the so-called scalar translational and rotational damping parameters of the *i*th site.

Since the nucleosome sites in 1CPN have an ellipsoidal shape, the frictional forces that they experience should, in principle, be anisotropic (i.e., tensorial). In the 1CPN model, we have simplified the dynamics of nucleosome sites by using an effective scalar friction coefficient that was chosen such that the translational diffusion coefficient of the nucleosome particles matches the value obtained with the full tensorial representation (see Sec. S2 B of the supplementary material). DNA sites in the 1CPN model are spherical, and their friction coefficient is scalar by construction. Since the mass coarse-grained into the nucleosome sites is approximately 100*×* larger than that coarse-grained into the DNA sites, the 1CPN model contains different scalar friction coefficients for the different site types. In 1CPN, nucleosome sites are assigned a damping time of *λ*_{i} = 5.1 ps, spherical (DNA and dyad) sites are assigned a damping time *λ*_{i} = 0.195 ps, and *λ*^{r} = 0.3*λ*_{i}. Though a time step of 20 fs was used in this work, the largest time step that can be used in 1CPN is 60 fs. See Sec. S3 of the supplementary material for an explanation of how this value was determined.

### F. Linker histone

The structure of the chromatin fiber is influenced by DNA-binding proteins, with the H1 or avian variant H5 linker histone being ubiquitous examples.^{21,70–74} Sedimentation analyses of small chromatin fibers have demonstrated that chromatin with linker histones produces more condensed fibers than those without.^{35,72,75–77} The density of positively charged residues in the linker histone reduces the electrostatic repulsion and promotes binding of the linker DNA, favoring a more compact structure. Given its direct influence on chromatin structure, a model of the H1/5 linker histone by Luque *et al.*^{40} has been adapted to fit the 1CPN model.

The 1CPN linker histone model is at a coarse-grained level of description comparable to that of 1CPN and draws information from atomistic structures and charge distributions (Fig. 4). Structurally, it consists of three domains: the central, rigid globular head (GH) domain consisting of ∼80 amino acid residues, the flexible, positively charged C-terminal domain consisting of ∼110 residues, and the short N-terminal domain of ∼25 residues.^{78} In its original model definition, the GH region was coarse-grained to be a fixed rigid-body consisting of 6 charged sites. The positions of these sites relative to the nucleosome center of mass are described in Table S9 of the supplementary material.

#### 1. Bonded interactions

The 1CPN linker histone model is adapted to work with molecular or Brownian dynamics in order to support 1CPN. To adapt the GH to the 1CPN model, stiff harmonic springs are placed to preserve a rigid structure, $UGHgeom$,

where {*r*_{GH}} is the set of all coordinates of the rigid GH sites, {*ℓ*_{GH}} is the set of lengths between the positions of the rigid sites, and {*θ*_{0,GH}} and {*ϕ*_{0,GH}} are the sets of angles and dihedrals of the rigid structure, respectively. In order to ensure that the GH stays rigid, but not too constrained, a subset of the possible bonds, angles, and dihedral restraints are placed on the system and shown in Table S9 of the supplementary material.

Although there is discussion regarding the binding location of the linker histone to the nucleosome, we choose to bind the linker histone GH on the dyad axis.^{71,79} To accomplish this, we employ two harmonic spring potentials to three sites of the GH. We define the potential $UGHbind$ as

Sites 1, 3, and 4 of the GH are constrained by utilizing two harmonic potentials—one to the dyad and the other to the nucleosome center of mass. The use of three constrained sites restricts rotation of the GH and excessive lateral movement of the GH from the nucleosome dyad.

Unlike the rigid globular head, the C-terminal tail domain is long and flexible. The implementation of the C-terminal domain remains mostly unchanged from that of Luque *et al.* We described the bonded forces of the model here for completeness. The geometric bonds of the C-terminal domain $UCTDgeom$ are defined as

where *r*_{i,i+1} is the position vector between any two subsequent CTD beads. In order to bind the CTD to the GH and the GH to the nucleosome, we place a harmonic spring between site *f* of the GH and the first bead of the CTD. This potential $ULHbind$ is defined as

where *ℓ*_{LH} is the distance between site *f* and the first bead of the CTD.

#### 2. Nonbonded interactions

All linker histone sites are treated similarly for nonbonded interactions. Between linker histone sites, an excluded volume interaction is adopted in the form of a Lennard-Jones potential,

For all spherical beads, a geometric average of *σ*_{0} is applied based on the sizes of each species. The energy of interaction, *ϵ*_{0}, is set low to make the interaction purely repulsive. Consistent with the DNA sites, linker histone electrostatics are treated at the level of Debye-Hückel, as described in Sec. II B 3. The electrostatic interactions are only used between the linker histone and DNA. Interactions between the linker histone and the nucleosome are treated similarly as in the DNA with the *U*_{Zewdie-LJ}.

### G. Parameter values

The parameter values used in the 1CPN model are tabulated in the supplementary material. These tables also include references to how (or where) each parameter value was obtained. These references consist of both experimental measurements (when available) and free energy comparisons made in this work between the 1CPN and 3SPN-AICG models. Despite the relatively large number of parameters in the 1CPN model, nearly all of them come from experiments or 3SPN-AICG free energy calculations.

### H. Code availability

The 1CPN model is implemented into the LAMMPS simulation package and is freely available online at https://uchic.ag/1cpn.

## III. METHODS

### A. 3SPN-AICG nucleosome model

Although this manuscript is primarily concerned with the new 1CPN model, many of the parameters in 1CPN were obtained by mapping 1CPN results to those obtained using the 3SPN-AICG model. For this reason, we provide an overview of the 3SPN-AICG model here, and refer interested readers to the detailed descriptions found in prior publications.^{46,47,49} In the 3SPN-AICG model, DNA is represented by the 3SPN.2C model of DNA, the latest version of the 3SPN model^{65,80,81} that has been extended to incorporate the effects of DNA sequence on the shape, local flexibility, and global flexibility of the DNA molecule.^{50} In the 3SPN.2C model, each base pair is represented by three sites, located at the phosphate, sugar, and the base. This mesoscale description of DNA has proven useful in a wide range of studies, ranging from DNA compaction into a viral capsid^{82} to DNA-conjugated nanoparticles.^{83,84} The histone proteins are represented by the Atomic Interaction-based Coarse Grained (AICG) model, a Go-based protein model with sequence-specific interactions parameterized from all-atom energies.^{51} In AICG, each amino acid is represented by a single site, located at the center of mass of the amino acid sidechain. We note that the AICG and 3SPN.2C models were developed independently to be accurate models of proteins and DNA, and consequently, no information about the structure of nucleosomal DNA is encoded directly into the model.

Interactions between the 3SPN and AICG models are represented only through electrostatic interactions between the negatively charged phosphate DNA sites and the charged amino acids on the histone protein. As a result, the configurations sampled in the 3SPN-AICG arise naturally as a balance between the elastic energies with the DNA molecule and the electrostatic interactions between the DNA and histones. This has permitted the 3SPN-AICG model to quantitatively match experimental measurements of the tension-dependent unwrapping of the nucleosome without any additional fit parameters^{47} and to examine the role of sequence-dependent repositioning.^{49} In 3SPN-AICG, electrostatic interactions are treated at the level of Debye-Hückel theory and the solvent implicitly by a Langevin thermostat. All calculations using 3SPN-AICG were performed at 300 K at 150 mM monovalent salt unless otherwise noted.

### B. Mapping between 3SPN-AICG and 1CPN models

In order to match the nucleosome-nucleosome interaction energies in the 1CPN and 3SPN-AICG nucleosome models, it was necessary to develop a mapping function between them (Fig. 5). In general, such a high reduction in dimensionality is accompanied by a high loss of information. We found, however, that the semirigid structure of the histone core enabled a straightforward mapping from the 3SPN-AICG model to the 1CPN model. The position of the 1CPN nucleosome site, ** r**, was chosen as the center of mass of the 3SPN-AICG nucleosome. The orientation vectors of the nucleosome site $f^,v^,u^$ were obtained by defining geometric relationships within the nucleosome such that $u^$ points from the center of mass of the histone to the dyad, $f^$ points along the nucleosomal DNA superhelix, and $u^\xd7f^=v^$.

Once the position and orientation of the nucleosome site are obtained, the dyad site is positioned *ℓ*_{d,0} along ** û**. DNA sites in 1CPN are positioned along the helical axis of 3SPN DNA corresponding to the central base pair of a three base pair segment. The orientation of these DNA sites is chosen such that $f^$ points toward the 3SPN minor groove and

**points along the helical axis. Additional details of this mapping procedure are described in Sec. S5 of the supplementary material.**

*û*### C. Nucleosome-nucleosome pair potentials

After mapping the 3SPN-AICG nucleosome model to a position and orientation, the relative orientation of the two nucleosomes *i* and *j* is given by six angles that represent all the possible combinations of angles between $f^i,f^j,r^ij$ and $\xfbi,\xfbj,r^ij$. In the 1CPN model, our choice of the Zewdie potential assumes uniaxial symmetry about $f^i$ and $f^j$, and therefore, the relative orientation of two nucleosomes only depends on the three vectors $f^i,f^j,r^ij$. The precise definitions of the 3SPN-AICG orientations used in this work are listed in Table II. Note that orientations C and D gave nearly identical free energies, and therefore, orientation D is omitted from Fig. 6 for clarity.

Orientation . | A . | B . | C . | D . |
---|---|---|---|---|

$f^i\u22c5f^j$ | 1 | 1 | 0 | 0 |

$f^i\u22c5r^ij$ | 0 | 1 | 0 | 0 |

$f^j\u22c5r^ij$ | 0 | 1 | 1 | 0 |

û_{i} · û_{j} | 1 | 1 | 0 | 0 |

$\xfbi\u22c5r^ij$ | 0 | 0 | 1 | 0 |

$\xfbj\u22c5r^ij$ | 0 | 0 | 0 | 0 |

Orientation . | A . | B . | C . | D . |
---|---|---|---|---|

$f^i\u22c5f^j$ | 1 | 1 | 0 | 0 |

$f^i\u22c5r^ij$ | 0 | 1 | 0 | 0 |

$f^j\u22c5r^ij$ | 0 | 1 | 1 | 0 |

û_{i} · û_{j} | 1 | 1 | 0 | 0 |

$\xfbi\u22c5r^ij$ | 0 | 0 | 1 | 0 |

$\xfbj\u22c5r^ij$ | 0 | 0 | 0 | 0 |

To compute the effective pair-potential, the 3SPN-AICG nucleosomes were constrained to the specified orientation with strong harmonic angle potentials, and umbrella sampling^{85,86} was performed along the center-of-mass separation, *r*, between the two nucleosomes. The parameters in the 1CPN model’s Zewdie potential (see Table S3 of the supplementary material) were chosen to minimize the total error with the 3SPN-AICG effective pair potentials over all four orientations.

### D. DNA exit angle from nucleosome

As discussed in Sec. II D, the 1CPN model permits the angle at which DNA exits the nucleosome to change dynamically as DNA unwraps from the histone surface. The energy scales of these interactions in 1CPN were determined by matching the free energies of DNA unwrapping obtained with 1CPN to those obtained using the 3SPN-AICG model (see Sec. IV C). These free energies were calculated for both 1CPN and 3SPN-AICG by first generating a configuration consisting of a single nucleosome flanked by 38 extra base-pairs on each side [Fig. 8(a)]. The DNA end-to-end extension was used as an order parameter to sample the unwrapping of the nucleosome, and the free energy as a function of DNA extension is then determined using umbrella sampling. Note that this procedure is identical to that employed in our previous work with the 3SPN-AICG model in the absence of applied tension to the DNA ends.^{47}

### E. Nucleosomal DNA rotation

In order to calculate the free energies of DNA rotation around the nucleosome in the 3SPN-AICG model, we first define an order parameter, *S*_{R}, that quantifies the rotational position of the nucleosomal DNA. This order parameter has been used in previous work with the nucleosome^{46,49} and is described again here. The definition of *S*_{R} is given by

where **B** is a vector from the center of a given base step on the sense strand to its complementary base step on the antisense strand, **P** is a vector from the center of a base step to the center of the protein, and the angle brackets denote an average over base steps at the −15, −5, +5, and +15 positions relative to the dyad. The positive sign is chosen if $P\xd7B\u22c5D\u22640$ (negative if >0), where ** D** is a vector in the 5′ to 3′ direction along the sense strand. Notably, when

*S*

_{R}= −

*π*/2, the minor groove is oriented toward the protein core, and when

*S*

_{R}=

*π*/2, it is oriented away from it. The free energy along

*S*

_{R}was obtained using Metadynamics, an advanced sampling technique where a molecular dynamics trajectory is continuously biased toward unexplored regions of phase-space, thereby ensuring that all of phase space is sampled.

^{87}

DNA rotation is incorporated into the 1CPN model through the choice of parameters in the $UNDNArotation$ potential described in Eq. (22). Since the parameter *S*_{R} in the 3SPN-AICG model corresponds to *θ* in the 1CPN model, the value of *k*_{θ} was obtained by fitting Eq. (22) to the 3SPN-AICG rotation data presented in Fig. 9. Note that *k*_{θ} listed in Table S6 of the supplementary material is half that obtained from the fits in Fig. 9 because $UNDNArotation$ is applied to both the entering and exiting DNA sites in 1CPN.

### F. DNA persistence length

The DNA molecule is characterized by both a bend persistence length, *ℓ*_{p}, and a twist persistence length, *ℓ*_{t}. Both *ℓ*_{p} and *ℓ*_{t} were obtained from simulations of linear chains of 80 DNA beads, corresponding to 240 base pairs. Error bars were obtained from 10 independent simulations. The bend persistence length is extracted from the autocorrelation of $u^$ vectors along a DNA strand

where $u^0$ and $u^j$ are the orientations of two DNA sites separated by *j* beads that point along the polymer backbone, and *ℓ*_{b,0} is the equilibrium bond length given in Eq. (14) [cf. Fig. 3(a)]. The twist persistence length is calculated analogously from the autocorrelation function of the vectors $f^$ that are perpendicular to the polymer backbone

We note that since our model of DNA contains a nonzero equilibrium twist [i.e., *ω*_{0} ≠ 0 in Eq. (14)], the $f^$ vectors rotate around the polymer backbone [cf. Fig. 3(a), red arrows], and the autocorrelation of $f^$ also contains an additional contribution due to the rotating phase of $f^$. Though this contribution can be resolved explicitly by rotating a bead’s reference frame when calculating $\u27e8f^0\u22c5f^j\u27e9$, we instead opt for the simpler, yet equivalent approach of setting *ω*_{0} = 0 in our simulations used to determine *ℓ*_{t}. Note that the bending of the DNA backbone will also lead to decorrelations of $f^0$ and $f^j$ and can effect the calculated value of $\u2113t$. To decouple these effects, the DNA backbone was artificially stiffened to prevent this coupling when we computed $\u2113t$.

### G. Diffusion constants

Diffusion constants in the 1CPN model of DNA or chromatin fibers were obtained by tracking the center of mass as a function of time, *r*_{cm}(*t*), and then computing the mean-squared displacement,

The diffusion coefficient is then obtained from the Einstein relation,

When calculating the diffusion constant of DNA and chromatin fibers (Fig. 10), simulations were performed at 150 mM salt and with a nucleosome repeat length of 207 base pairs. When extracting the diffusion constant from experimental papers,^{88–91} the diffusivity was not always measured at these salt conditions and extrapolation was necessary. In general, the diffusivity was approximately constant at salt concentrations >50 mM, and so this extrapolation was assumed to introduce negligible errors.

### H. DNA-mediated nucleosome-nucleosome pair potential

The DNA-mediated nucleosome-nucleosome pair potential (Sec. IV F) was obtained using umbrella sampling^{85} along an order parameter defined by the center-to-center distance, *r*, between the two nucleosome sites. At short separations, internucleosomal DNA can adopt many different configurations, each of which is separated by relatively large energy barriers. As a consequence, diffusion orthogonal to the order parameter was very slow and obtaining accurate sampling of all possible DNA configurations was challenging. To improve sampling, we used 10 independent umbrellas at each *r* and choose relatively weak umbrella force constants in order to allow a given configuration to sample a wide range of *r*, thereby permitting the two nucleosomes to unfold and refold many times in each umbrella. This procedure facilitated the sampling of many different nucleosome orientations and led to the rapid convergence of the reported free energy surfaces.

### I. Sedimentation coefficients

Sedimentation coefficients were obtained by first initializing a 12 nucleosome, 207 nucleosome repeat length (NRL) fiber in an extended configuration where the nucleosomes were not in contact. These fibers were then relaxed using Brownian dynamics until they condensed (typically 10^{9} time steps, or 20 *μ*s). This condensed configuration was then used as the initial configuration for a replica exchange simulation at the specified salt concentration with 24 temperatures spaced geometrically between 300 K and 700 K. Since the energy barriers between different fiber configurations are relatively large, this replica exchange simulation was necessary to accelerate the exploration of many possible fiber conformations and improve the estimate of the sedimentation coefficient. The replica exchange simulations were typically ≈1 × 10^{8} steps, or 2 *μ*s.

The sedimentation coefficient was determined, as described previously,^{38} according to

where *N*′ is the number of nucleosomes in the fiber, and *R* is the effective radius of the nucleosomes and is assumed to be 54.6 Å. *S*_{1} is the $S20,w$ of a mononucleosome taken as equal to 11.1 Svedberg (S), and *R*_{ij} is the distance between two nucleosomes. The values of *R* and *S*_{1} were chosen to be consistent with the values used in a prior work.^{38} The sedimentation coefficient was monitored throughout the replica exchange simulation (at 300 K) and was determined as the average once the sedimentation coefficient converged. Error bars represent the standard deviation from four independent simulations.

## IV. RESULTS

### A. Nucleosome-nucleosome pair potential

To parameterize the nucleosome-nucleosome interaction in 1CPN, we first compute the anisotropic pair-potential using the 3SPN-AICG nucleosome model (Fig. 6). The pair-potential is strongly dependent on the orientation of the nucleosomes, with the energy and length-scales of the interaction ranging from *U*/*ϵ*_{0} ≈ 0.4 to 1.0 and 65 Å to 100 Å, respectively. The lowest energy configuration corresponds to two nucleosomes stacked face-to-face (orientation B).

Note that 3SPN-AICG can be used to explore the effects of epigenetic markers, such as the acetylation of the histone tails, on the nucleosome-nucleosome pair potential. As a proof-of-principle, we again calculate the anisotropic pair-potential using 3SPN-AICG, but modify the histone proteins so that the histone H4 tail is removed. The H4 tail is thought to mediate interactions between nucleosomes,^{19} and modifications to H4 represent an important epigenetic mechanism by which chromatin compaction is controlled. The removal of the H4 tail is observed to decrease the attraction of nucleosomes in orientation B and leads to a significant widening of the potential in orientation B as well (Fig. 6).

Having obtained these free-energy surfaces with the 3SPN-AICG model, we then parameterize the nucleosome-nucleosome interactions energies in 1CPN (see Sec. III). The resulting pair potentials obtained with 1CPN are shown by the solid lines in Fig. 6, and the parameters are reported in Table S4 of the supplementary material. The agreement between the pair potentials from 3SPN-AICG (points) and those from 1CPN (lines) is good, with all length and energy scales matched between the two models. Notably, good agreement between the models is obtained for both the “full” nucleosome and nucleosomes lacking the H4 tail. This result indicates that both the complex orientation-dependent interactions between nucleosomes and the effects of histone modifications on these interactions can be effectively coarse-grained into the 1CPN model’s Zewdie potential.

When computing these pair potentials in 3SPN-AICG, the maximum energy of attraction, *ϵ*_{0,full}, was found to be 7.9 kcal/mol (≈13*k*_{B}*T*). This value is in good agreement with the experimental measurements of Ref. 92, in which the internucleosome attraction was found to be 13.2 ± 0.7*k*_{B}*T*, but differs from other experiments that estimate these energies to be ≈1 − 3*k*_{B}*T*.^{93–95} The strength of attraction between nucleosomes continues to be an active area of research, and an extensive comparison between the predictions of the 3SPN-AICG model and experimental measurements can be found in our recent publication.^{96}

In the 1CPN model presented here, we choose *ϵ*_{0,full} as an adjustable parameter that sets the energy scale for the nucleosome-nucleosome interactions. This choice allows the 1CPN model to retain the relative energies of the different orientations and histone modifications, yet provides the flexibility to vary the strength of attraction between nucleosomes as additional data becomes available.

### B. DNA persistence length

In addition to the pairwise interactions between nucleosomes, internucleosome interactions within chromatin are also mediated by the elasticity of the DNA that joins neighboring nucleosomes. To ensure that these elastic contributions are represented faithfully in our model, we calculate the bend and twist persistence lengths of the twistable wormlike chain used in 1CPN (see Sec. III). Using the DNA parameters from Refs. 41 and 43 (given in Table S5 of the supplementary material), our DNA model exhibits a bend persistence length, *ℓ*_{p} = 50.5 ± 1.3 nm, and a twist persistence length, *ℓ*_{t} = 1072 ± 164 nm. Both values are in good agreement with experimental data.^{98–100}

Since DNA is a polyanion, the persistence length depends strongly on the concentration of salt in solution. Experiments have shown that it can range from ≈50 nm at salt concentrations >100 mM up to ≈1000 nm at salt concentrations of 2 mM.^{97} In chromatin, this increased rigidity of DNA is thought to cause chromatin fibers to expand at low ionic strength.^{24,101} Because of its impact on chromatin structure, we sought to evaluate whether this salt-dependent persistence length of DNA is reproduced in the 1CPN model.

The persistence length as a function of salt was determined using 1CPN for salt concentrations ranging from 2 to 1000 mM and was compared to experiments by Baumann *et al.*^{97} (Fig. 7). The agreement between the model predictions and the experimental measurements is reasonable and indicates that the effective stiffening of DNA with decreased salt is reproduced by 1CPN.

### C. DNA exit angle from nucleosome

The angle at which DNA exits the nucleosome is an important model parameter that dictates how adjacent nucleosomes along a single DNA strand interact. Adjacent nucleosomes are typically separated by 20–80 base pairs, a length much shorter than the persistence length of DNA, and therefore, bent linker DNA is energetically disfavored.^{102} The angle at which DNA exits the nucleosome dictates the boundary conditions of this bent linker DNA and has a significant effect on the orientations of two associating nucleosomes. Depending on the DNA exit angle, different nucleosome orientations will be either favored or frustrated by the path of the linker DNA.

In previous models of chromatin, this angle was fixed at a constant value, which is either extracted from the nucleosomal crystal structure^{38,39} or chosen as an input parameter in the model.^{42,43} In both cases, the resulting configurations of the chromatin fiber are known to be very sensitive to the choice of these angles. These previous approaches also make the assumption that the DNA exit angle is fixed and that partial DNA unwrapping cannot occur.

As mentioned before, in the 1CPN model we make no such assumptions about the DNA exit angle and instead allow the outermost 10 base pairs of DNA to unwrap from the nucleosome (see Sec. II D). In order to parameterize DNA unwrapping in 1CPN, we rely on recent results using the 3SPN-AICG model where the free energies of nucleosome unwrapping were obtained.^{47} These free energies and the forces associated with them were found to be in good agreement with experimental measurements of fully intact nucleosomes^{103} and nucleosomes with histone tails removed.^{63}

The free energies of DNA unwrapping computed using both the 1CPN and the 3SPN-AICG model are consistent with each other over a large range of extensions (Fig. 8). Both the DNA-DNA repulsion at low extension (*r* < 60) and the free energetic penalty as the extension is increased to *r* = 200 Å are well reproduced by 1CPN. Deviations between the two models only occur at large separations *r* > 225, where more than 10 base pairs of DNA unwrap from the histone surface [see Fig. 8(a), right]. Since the 1CPN model only permits the unwrapping of ∼10 base pairs of DNA, its construction prevents the model from matching the free energies of additional DNA unwrapping. Though 1CPN could be modified to incorporate these larger DNA unwrapping events, our choice of 10 unwrapping base pairs strikes a good balance between matching the free energies of DNA unwrapping and computational efficiency.

As with the nucleosome-nucleosome pair potentials discussed in Sec. IV A, the 1CPN model can also incorporate the effect of histone modifications on the free energies of DNA unwrapping. The most relevant histone tail to partial DNA unwrapping is the tail of histone H3, which is known to stabilize the outer wrap of nucleosomal DNA^{47,63} and to screen repulsive interactions between the entering and exiting DNA.^{64} We calculate the free energies of DNA unwrapping with the 3SPN-AICG model with the H3 tail removed and use these free energies to parameterize 1CPN [Fig. 8(c)]. As with the free energies of full nucleosome, the predictions of 1CPN agree with the results obtained using 3SPN-AICG. This finding also underscores the importance of the anisotropic gaussian potential, *U*_{aniso,gauss}, in governing the exiting configurations of DNA (as discussed in Sec. II B 2). Whereas inclusion of *U*_{aniso-gauss} is sufficient to reproduce the 3SPN-AICG free energies with and without the H3 tail [Figs. 8(b) and 8(c), blue line], removal of this potential results in free energies that are too low [Figs. 8(b) and 8(c), gray line].

### D. Sequence-dependent rotation of nucleosomal DNA

Different DNA sequences bind to the histone proteins with different affinity with different sequences exhibiting mechanical properties or equilibrium shapes that make nucleosome formation more or less favorable.^{46,104} Certain DNA sequences bind more strongly than others because these sequences contain periodic base-step motifs that energetically prefer to bind to the histone proteins. These periodic base-steps impart a rotational phasing of the DNA molecule, with variations in the rotational position of the histone-bound DNA incurring an energetic penalty.

Using the 3SPN-AICG model, we have quantified the free energies of DNA rotation within the nucleosome for several different DNA sequences (points, Fig. 9), as described previously.^{46} These sequences were chosen to range from the weakly binding PolyA sequence to the strongly binding 601 sequence. The orientation of the DNA molecule is given by the order parameter, *S*_{R}, that describes which side of the DNA molecule is facing the histone core (see Sec. III). Notably, a value of *S*_{R} = −*π*/2 corresponds to the minor groove facing the protein, whereas when *S*_{R} = *π*/2, the minor groove is facing away from it. As expected, these DNA sequences exhibit strong rotational preferences. All three sequences have free energy minima at ≈−*π*/2, where the favorable DNA binding motifs are facing the histone protein, and maxima at ≈*π*/2, where these motifs are facing away from the histone protein. The magnitude of the free energy penalty of this rotation is strongly dependent on the sequence, ranging from ≈20*k*_{B}*T* for the c2 sequence to ≈1*k*_{B}*T* for the PolyA sequence.

This sequence-dependent rotation is incorporated into the 1CPN model through the term $UNDNArotation$, defined in Eq. (22). $UNDNArotation$ is a cosine angle potential [see Eq. (18)] that contains two parameters *θ*_{0}, the equilibrium angle between the nucleosome and the entering/exiting orientation vector $f^$, and *k*_{θ} the spring constant corresponding to rotation. By choosing different values of *k*_{θ}, we can effectively incorporate sequence-dependent rotation into the 1CPN model. The comparison between 1CPN predictions (lines) and the free energies from 3SPN-AICG (points) is satisfactory (Fig. 9) and demonstrates that the complex sequence-dependent rotation of nucleosomal DNA is captured in 1CPN.

The rotation of nucleosomal DNA can also manifest itself as DNA twist *stored within* the nucleosome. Many crystal structures of the nucleosome are observed to have over-twisted DNA,^{68,105,106} and the incorporation of twist into the nucleosome has long been hypothesized to be a mechanism by which the nucleosome can reposition.^{49,107} This storage of twist within the 1CPN model is dictated by the value of $ktN$, which enters the expression for $UNDNAtwist$ in Eq. (23), and couples the orientation of the DNA entering and exiting the nucleosome. $ktN$ dictates the strength of coupling between the entering and exiting DNA and is inversely related to how much twist can be stored within the nucleosome. In our initial analysis with the 1CPN model, we have chosen to minimize the possibility of twist storage by choosing $ktN=10$ (kcal/mol)/rad, corresponding to a relatively large energy penalty of ≈10*k*_{B}*T* for a single twist defect. However, in cases where DNA twist can be stored in the nucleosome, a value of $ktN=1$ (kcal/mol)/rad is likely to be more appropriate.

### E. Chromatin dynamics

Chromatin is intrinsically dynamic, and its function depends on the time scales by which the chromatin fiber expands or contracts. Incorporating accurate dynamics into the 1CPN model is, therefore, an important design goal. We evaluate the accuracy of the 1CPN model by comparing the dynamics of both short DNA strands (≈10–100 bp) and small chromatin fibers (≈1–10 nucleosomes) to available experimental measurements.

The diffusion constant of DNA molecules of different lengths was determined (see Sec. III) using the 1CPN model [Fig. 10(a)] and found to be in good agreement with experiments,^{108–112} especially over the range of 20–80 base pairs, the typical range of linker DNA separating nucleosomes. We note that because of the “freely draining” Brownian dynamics employed in the 1CPN model (see Sec. III), the scaling of the 1CPN model, *N*^{−1}, does not match the scaling measured experimentally, *N*^{−0.67}. The experimental scaling,^{108,111} *N*^{−0.67}, has been previously justified using the Zimm model of polymer dynamics adjusted to include the stiffness,^{111} or the helical wormlike chain theory of Yamakawa.^{113} Though these better treatments of hydrodynamics could be incorporated into the 1CPN model, the first-order “freely-draining” Brownian dynamics approach provides a reasonable first step to describe dynamics.

The diffusion constant for short chromatin fibers was also measured using the 1CPN model [Fig. 10(b)]. Though the diffusivity of a single nucleosome is overestimated in 1CPN, the diffusivity of fibers two to four nucleosomes long is well reproduced in our model. Taken together, these results suggest that the 1CPN model should produce reasonable predictions of dynamic rearrangements within the chromatin fiber.

### F. Linker-mediated nucleosome-nucleosome pair potential

We next proceed to investigate the fundamental building block of the chromatin fiber: two nucleosomes connected by linker DNA. Nucleosomes are typically separated by 20–80 bp of DNA, length scales that are small relative to the bending and twisting persistence length of DNA. Thus, it might be expected that the pair-potential between two nucleosomes connected by DNA would be a strong function of various DNA parameters.

The effect of DNA length is presented in Fig. 11(b). The DNA length between nucleosomes is varied from 20 to 60 bp (167–207 nucleosome repeat length or NRL) and is chosen to be an even multiple of the pitch of DNA (10 bp). From these calculations, three minima in the pair-potentials emerge [Fig. 11(a)]. The first minimum corresponds to the two nucleosomes in close face-to-face contact with the DNA strongly bent (reminiscent of configuration B in Fig. 6). The second minimum corresponds to the two nucleosomes aligned side-to-side (configuration A, Fig. 6), whereas the third minimum corresponds to the two nucleosomes far apart, with the DNA relatively straight. As a consequence of this relatively straight DNA, the location of the third minimum increases from 150 Å to 235 Å as the NRL increases. Although these three minima appear for most NRLs examined, their relative free energies vary significantly as the NRL changes from 167 to 207 bp. For example, the NRL of 177 is observed to stabilize the third minimum, whereas for the 197 NRL, the free energies corresponding to each minimum are approximately equal.

We also examine the effect of DNA twist on the pair-potentials between two connected nucleosomes [Fig. 11(c)]. By varying the NRL by a single base pair at a time, the angle between the two nucleosomes changes significantly (36°/bp), while the length of the DNA changes only slightly (3.3Å/bp). This changing angle between nucleosomes significantly affects how much the DNA must twist to accommodate nucleosome-nucleosome contacts. Slight changes to the nucleosome angle are therefore observed to significantly change the free energies of the first (*r* = 70 Å) and second (*r* = 105 Å) minima in the pair potential.

At this point, it is important to recall that the amount of twist that can be stored in the linker DNA is balanced by the nucleosome-bound DNA’s ability to resist rotation (see Sec. IV D). For strongly bound DNA sequences (e.g., the 601 sequence), it is expected that twist would be stored in the linker DNA. However, for weakly bound DNA sequences (e.g., PolyA), this twist might be able to escape the linker DNA by rotating the nucleosome-bound DNA. To examine if this is indeed the case, we explore the effect of DNA sequence on the pair-potential between two nucleosomes [Fig. 11(c)]. In agreement with our expectations, we observe that weakly binding DNA sequences (e.g., TTAGGG, PolyA), significantly lower the energy barrier to the second minimum. The relationship between the DNA sequence and the 3D structure of chromatin remains to be explored in detail, and we envision that the 1CPN model will provide a tool for pursuing such future studies.

### G. Chromatin fiber sedimentation coefficients

As mentioned earlier, 1CPN is well suited to examine the structure and dynamics of chromatin fibers. To illustrate this idea, we prepared chromatin fibers consisting of 12 nucleosomes with a nucleosome repeat length of 207 bp and various salt concentrations. We computed the sedimentation coefficient of these fibers (see Sec. III) and compared our predictions to experimental measurements for an identical system described in the work of Hansen *et al.*^{101} (Fig. 12). Sedimentation coefficients have been used extensively to assess the relative compaction of different chromatin fibers, and these calculations represent an important test of 1CPN.^{35,76,114} As shown in Fig. 12, simulations and experiments agree with each other, particularly for the effects of salt and linker histone on the compaction and expansion of chromatin.

### H. Performance

We have examined the efficiency of 1CPN as simulations are scaled across multiple processors (Fig. 13). For systems of 50 nucleosomes (N = 50), the 1CPN model exhibits excellent scaling across 16 processors, whereas in larger systems (N = 250), near-ideal scaling is obtained across 64 processors. For a 50 nucleosome fiber, on a single 16 core node, a week of simulation time will produce a ≈50 *μ*s trajectory with 1CPN. The efficiency is, therefore, sufficient for a study of a wide range of phenomena associated with chromatin structure and dynamics.

## V. CONCLUSION

A new model of chromatin, 1CPN, has been presented in this work. By relying on a multiscale approach, 1CPN incorporates physics that occur over nanometer length scales, such as histone modifications and DNA sequence. The model is nonetheless computationally efficient and permits simulation of many kilobases of chromatin. The parameterization of 1CPN has relied both on extensive simulations with the detailed 3SPN-AICG model of the nucleosome as well as experimental measurements of the structure and dynamics of chromatin. 1CPN has been parameterized to reproduce the many free energies that govern the interactions within chromatin, such as the interactions between nucleosomes, the interaction between DNA and histone tails and how these interactions can be modulated by histone modifications. 1CPN has also been parameterized to reproduce the salt-dependent stiffness of DNA, the effect of DNA sequence on rotation of DNA within the nucleosome, and the dynamics of short chromatin fibers. We incorporate linker histone H1 into 1CPN using the model of Luque *et al.*^{40} Following this parameterization, 1CPN has been used to examine the free energies of association between two nucleosomes separated by different lengths of DNA. It is found that the length of this DNA, as well as the relative pitch between two nucleosomes, has a dramatic effect on the interactions between nucleosomes. Finally, it was demonstrated that 1CPN achieves quantitative agreement with experimental measurements of sedimentation coefficients of short chromatin fibers, both in the presence and absence of linker histone H1.

Building on this foundation, it is anticipated that 1CPN will be useful for studies of the many dynamic processes that dictate chromatin compaction. The mechanisms that link chromatin structure to gene expression are still poorly understood, and the 1CPN model provides a tool to interrogate these relationships.

## SUPPLEMENTARY MATERIAL

See supplementary material for additional details on interaction potentials and model parameters.

## ACKNOWLEDGMENTS

We are grateful for helpful discussions with Marat Andreev, Alec Bowen, and Jonathan Whitmer. The development of multiscale descriptions of DNA was supported by the National Science Foundation through Grant No BIO/MCB 1818328. The development of advanced sampling codes required for free energy calculations and the implementation of 1CPN in LAMMPS was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, through the Midwest Integrated Center for Computational Materials (MICCoM). We also acknowledge computational resources provided by the Midway computing cluster at the University of Chicago. A.C. acknowledges financial support from CONICYT under FONDECYT Grant No. 11170056.