Synergism between x-ray crystallography and NMR residual dipolar couplings in characterizing protein dynamics

The important role of structural dynamics in protein function is widely recognized. Thermal or B-factors and their anisotropy, seen in x-ray analysis of protein structures, report on the presence of atomic coordinate heterogeneity that can be attributed to motion. However, their quantitative evaluation in terms of protein dynamics by x-ray ensemble refinement remains challenging. NMR spectroscopy provides quantitative information on the amplitudes and time scales of motional processes. Unfortunately, with a few exceptions, the NMR data do not provide direct insights into the atomic details of dynamic trajectories. Residual dipolar couplings, measured by solution NMR, are very precise parameters reporting on the time-averaged bond-vector orientations and may offer the opportunity to derive correctly weighted dynamic ensembles of structures for cases where multiple high-resolution x-ray structures are available. Applications to the SARS-CoV-2 main protease, Mpro, and ubiquitin highlight this complementarity of NMR and crystallography for quantitative assessment of internal motions.


INTRODUCTION
The study of protein internal motions at the atomic level has been a topic of high biophysical interest for many decades. These motions are key to the entropic contribution to the free energy function that is of fundamental importance for characterizing proteinprotein and protein-ligand interactions. 1,2 An improved view of protein dynamics may also shed new light on the age-old question of induced fit vs conformational capture. [3][4][5] Advances in single particle cryo-electron microscopy have resulted in atomic resolution protein structures, with the heterogeneity in observed structures providing evidence for domain motions that can be of high functional importance. 6,7 However, at present the atomic details of protein motions and their time scales are more commonly analyzed by protein x-ray crystallography and NMR spectroscopy. In this perspective, we focus on recent innovations in these fields and discuss how their concerted use may lead to further improvements in characterizing the atomic details of molecular motions that drive protein function.

MOTIONS DERIVED FROM X-RAY CRYSTALLOGRAPHY
Temperature or B-factors contain information on the spatial distribution of dynamic fluctuations experienced by a protein, 8 but their quantitative interpretation in structural terms is complex. [9][10][11] Using molecular dynamics to generate a "protein crystal" perturbed by known multimodal, anharmonic motions, Garcia et al. showed that experimental B-factors can substantially underestimate the true mean square displacements. 12 An analogous, in-depth analysis of simulated data showed that even at high crystallographic resolution, fitted B-factors of some well-resolved atoms can be underestimated by as much as sixfold. 10 On the other hand, for proteins that diffract to very high crystallographic resolution, electron density often can be confidently interpreted in terms of distinct conformations that are populated in the crystalline state, 13,14 an observation sometimes referred to as polysterism. 15 Such structural heterogeneities usually cluster in small regions, typically involving less than half a dozen residues, and the details and relative populations of these conformers vary with temperature. 16 Residues that exhibit such structural heterogeneities in the crystalline state nearly always give rise to single resonances in solution NMR spectra, recorded at room temperature, indicative of conformational averaging between these substates that is rapid on the NMR timescale of milliseconds. 17,18 The functional importance of such dynamic interconversions has been well recognized in enzymatic mechanisms, 15,17,19 and is at the basis of Koshland's induced fit hypothesis which postulated that proteins can switch to alternate discrete conformational states upon ligand binding. 3 For protein crystals that diffract to high resolution, i.e., better than $2-Å , ensemble refinement methods of increasing sophistication have been introduced over the past 30 years. [20][21][22][23] Rather than locally fitting a weighted average of two discrete conformers to the crystallographic data, these methods generate many protein conformers that are sampled from molecular dynamics (MD) trajectories, with these trajectories restrained to yield time-averaged agreement with the x-ray diffraction data. 20 Such ensemble refinement was shown to be prone to over-fitting of the experimental data, as exemplified by worse crossvalidation statistics in terms of the crystallographic free R factor (R free ) 24 during early applications of this method. 25,26 However, the introduction of suitable adjustments in both the MD protocol, improved MD force fields, and deformable elastic net (DEN) restraints, mitigate over-fitting and can result in modest improvements in R free . [21][22][23] An intrinsic advantage of ensemble refinement, over the fitting of multiple discrete local conformers, results from the correlation seen in dynamic excursions in different regions of a protein, as expected for example, during allosteric conformational switching. However, it is important to bear in mind that the correlations between motions depend upon the empirical force field used in the MD simulation, and do not result from the Bragg reflections per se. Scripts for carrying out crystallographic ensemble refinement using the very powerful and widely used Phenix software 27,28 have become available, making the approach accessible to non-expert users with sufficiently powerful computational resources.

DYNAMICS FROM NMR RELAXATION
NMR spectroscopy has long been used for probing the rates and amplitudes of dynamic excursions from the time-averaged structure by measurement of nuclear spin relaxation rates. Motions that are faster than the rotational correlation time of the protein in solution are typically probed by measurement of longitudinal (R 1 ) and transverse (R 2 ) relaxation rates of 15 N or 13 C nuclei, combined with the magnitude of the heteronuclear nuclear Overhauser enhancement (NOE). 29,30 The rotational correlation time scales approximately linearly with the molecular volume or mass of the protein, ranging from 4.2 ns for ubiquitin (8.5 kDa) at room temperature, to ca 32 ns for hemoglobin (64.5 kDa), and therefore only rather fast motions can be quantitatively probed by such relaxation measurements. The type of motion associated with these relaxation parameters usually cannot be extracted from such data and is simply presented as an effective generalized order parameter, S 2 , that reports on the angular amplitude of the excursions of corresponding bond vectors in a model-free manner. [31][32][33] Recent innovative work, relying on transient binding of a protein to slowly tumbling nanoparticles, can extend the timescale over which such dynamic excursions can be probed to values beyond the rotational correlation time of the free protein. 34,35 Unfortunately, the requirement that the time-weighted average rotational correlation time of the free and nanoparticle-bound protein remains shorter than ca 30 ns, the upper limit for which high-resolution amide resonances can be readily observed at adequate sensitivity when using TROSY methods, limits this method to relatively small proteins. Sidechain motions on time scales longer than 30 ns can be studied by the nanoparticle-assisted spin relaxation (NASR) method when using the more favorable methyl group relaxation properties to probe the system. 36 Internal motions on time scales much slower than the rotational tumbling, in the ls-ms regime, are often reflected in chemical exchange contributions, R ex , to the transverse relaxation rates, R 2 . R ex measurement exploits the difference in chemical shifts of the conformationally exchanging substates. For the common situation where the exchange is faster than the difference in chemical shifts (expressed in Hz), the R ex contribution to the transverse relaxation rate scales with the square of this shift difference, and inversely with the exchange rate. 37 In addition to providing precise values of the exchange rate, such measurements also can yield chemical shifts of the substates involved, which contain information regarding the structure of the substate. Although, in practice, it often proves difficult to extract the chemical shifts for more than a single unknown substate, the R ex data can be invaluable for validating models of the exchange process that gives rise to the exchange contributions. [38][39][40] If precise interproton distances are derived from highly quantitative NOE data, using so-called "exact NOEs" or eNOEs, 41 regions in a protein structure can sometimes be identified that cannot simultaneously satisfy all precisely measured distances, indicative of switching between conformational substates. 42 In such cases, multi-conformer ensembles can be derived that show much improved agreement with the NMR data, albeit using an N-fold increase in the degrees of freedom when fitting to an ensemble of N conformers. Like for crystallographic ensemble refinement, stringent cross-validation criteria are, therefore, needed to prevent over-fitting of the data. 43 As will be discussed below, residual dipolar couplings (RDCs) provide an unambiguous set of parameters to validate structural models. For example, RDCs that were not used to derive an NMR structure of a small GB3 domain agreed considerably better with a single-conformer eNOEderived model than with the traditionally NOE-derived structure, validating the power of the eNOE method. However, no significant further improvement was obtained when the structure was refined as a multi-conformer model. 44 Multi-conformer NMR ensembles also have been generated by using the relaxation-derived backbone order parameters as restraints during MD trajectory calculations. 45 This dynamic ensemble refinement (DER) method was demonstrated for ubiquitin, showing considerable conformational heterogeneity throughout the structure, much larger than coordinate uncertainties in either the deposited NMR structure, 46 or calculated from x-ray data by using RAPPER. 47 However, it is important to note that the original NMR structures of ubiquitin represent best fits to the time-averaged coordinates, whereas the DER method aims to depict the ensemble of structures sampled over time. The DER ensemble showed remarkably good agreement with both scalar couplings and residual dipolar couplings that were not used as restraints during the MD calculation, supporting the methodology. However, we note that S 2 order parameters, derived from NMR relaxation data, are very sensitive to the N-H bond length used. Due to the low mass of hydrogen, N-H vectors are subject to ultrafast zero-point liberations of rather large amplitudes that decrease the effective 15 N-1 H dipolar coupling. Therefore, N-H bond lengths should be artificially lengthened to 1.04 Å when interpreting 15 N relaxation in terms of motion of the N and C backbone atoms. 48 The large amplitude dynamics seen in the DER ubiquitin ensemble (PDB entry 1XQQ), forced by imposing compliance with the 15 N S 2 values that were derived for a 1.02-Å NH-bond length, resulted in conformers that transiently lack H-bonds (e.g., I30, Q40, L50, and L56), whereas very slow hydrogen exchange 49 is consistent with their uniform presence in all 190 PDB x-ray structures solved at a resolution 2.0 Å . This observation, therefore, suggests that, despite good validation statistics, the DER ensemble representation may be too dynamic.

DYNAMICS FROM RESIDUAL DIPOLAR COUPLINGS
RDCs contain highly precise information on the time-averaged orientation of bond vectors in the molecular frame of a protein.
Measurement of RDCs requires weak alignment of the protein relative to the external magnetic field, either by its own inherent magnetic susceptibility anisotropy 50 or by the application of an external alignment agent, 51 most commonly a suspension of filamentous phage 52 or anisotropically compressed hydrogel. 53,54 Dipolar coupling between two nuclear spins is described by a rank-2 tensorial interaction, and the averaged RDC for a bond vector that rapidly switches between multiple orientations in the molecular frame therefore will deviate from an RDC calculated for its timeaveraged orientation. [55][56][57] Depending on the precision at which the experimental RDCs can be measured, such deviations only become detectable for root mean square angular excursions that are larger than ca 20 -30 . With the availability of a large set of very precisely measured RDCs and NOEs, ensemble refinement of ubiquitin yielded a small, cross-validated improvement for a two-conformer ensemble over a single conformer, but with local outliers that appeared to be artifactual. 58 Alternatively, if couplings can be measured under five orthogonal protein alignments (in rank-2 tensor space), these couplings allow the generation of NMR ensemble models that reflect the amplitudes and directions of bond vector fluctuations. 55,59,60 A landmark study that applied this technology to ubiquitin found evidence for large amplitude backbone dynamics on a microsecond timescale, 59 termed "recognition dynamics." 59 The multi-alignment approach for extracting dynamics hinges on the inability to satisfy the five experimental RDCs, measured for a given bond vector under the five different alignment conditions, by a single vector orientation. However, each RDC also contains a random measurement error, and perfect fitting of five erroneous RDCs is possible due to the additional degrees of freedom that are available when representing the vector by an ensemble, even if its orientation is static. This problem of introducing false motion is aggravated when the alignment orientations are not sufficiently orthogonal to one another, in which case small measurement errors in the RDCs require a wider distribution of bond vectors, i.e., larger amplitudes of motion, to satisfy the data. In practice, application of the approach, therefore, requires RDCs that have been measured at very high precision, in particular when the five protein alignments are insufficiently orthogonal to one another. For ubiquitin, RDCs subsequently measured under a protein orientation that was poorly represented in the original study yielded better fits for most of the protein when using a single, static model rather than the ensemble representation, 61 suggesting that microsecond motions pertain to a much smaller fraction of the protein than originally reported. We also note that, in practice, five linearly independent alignment orientations are difficult to achieve unless relying on paramagnetic tagging, 62 which introduces other challenges.
Instead, as discussed below, even if measured in only a single alignment medium, the highly precise structural information contained in RDCs can be used to judge the validity and quality of both x-ray ensemble models and conventionally refined structures. Provided that sufficiently many x-ray structures are available for a given protein, RDCs can also be used to select an ensemble representation of such conventional low-energy structures that agrees better with RDC data than any individual structure.

RDCS AND CRYSTAL STRUCTURES
Clearly, NMR relaxation data and x-ray crystal structures are highly complementary: the x-ray data yield precise coordinates for low-energy states, with the relaxation data reporting on the rates at which they interconvert if populated in solution and provided that this exchange occurs on a NMR-accessible timescale. 63 Quantitative information on the number of conformers and their relative populations is often difficult or impossible to extract from such relaxation data. However, as discussed below and recently demonstrated for the SARS-CoV-2 main protease (M pro ), RDCs hold strong potential to provide this missing information. 64 Agreement between RDCs and protein atomic coordinates depends primarily on four factors: (1) The precision at which RDCs can be measured relative to the range spanned by the RDCs. (2) The precision at which the atomic coordinates are known, which is correlated with the crystallographic resolution and the R free at which the structure was solved. 65,66 (3) The accuracy at which hydrogen atom positions can be added to an x-ray structure, which can be challenging for amide groups that sometimes deviate substantially from idealized geometry. (4) True differences between a protein structure in the crystalline and solution states. The latter are often attributed to intermolecular contacts and lattice-packing effects, but well-defined differences between local conformations can also be observed between high-resolution structures determined in the same space group. An example of the latter has been discussed for the orientation of the Thr-198 peptide group in the interdomain linker adjacent to the P5 ligand-binding pocket of numerous M pro x-ray structures, all in the C 121 space group. 67 Fits of M pro RDCs to each of 350 x-ray structures available in the PDB showed good agreement for most of the a-helical C-terminal domain as well as part of the N-terminal b-sheet-rich domain, but considerably poorer fits for residues in the intervening half of the protein, including its active site and many of its substrate-binding pockets. 68 Despite the fact that this homodimeric enzyme is large by NMR standards (68 kDa), which adversely impacts the precision at which RDCs can be measured, the fit of 15 N-1 H RDCs to structure was shown to be limited by the accuracy of the coordinates rather than RDC measurement error. 64 This protein, therefore, presents a good testing ground for evaluating the utility of RDCs to quantify internal dynamics. First, we focus on the application of RDCs to evaluate different x-ray ensemble refinement methods.

RDC FITS TO X-RAY ENSEMBLE REFINEMENT MODELS
Recently, M pro was studied by various crystallographic ensemble refinement procedures, with mixed results. Whereas the standard ensemble refinement procedure, without DEN restraints, applied to the temperature series of x-ray data (PDB entries 7MHF-7MHK) yielded ensembles (7MHL-7MHQ) that, with one exception (7MHL, 100 K), generated small improvements in R free , 67 the dynamic distributions of these ensembles differed considerably. 64 Agreement with RDCs is commonly expressed as a "quality factor," Q, that effectively represents a weighted root mean square difference between N measured (D meas ) and predicted (D pred ) RDCs, with D a and Rh being the alignment strength and rhombicity obtained from the singular value decomposition (SVD) best fit. Q factors did not improve for the ensembles over fits to the corresponding traditionally refined x-ray structures. 64 Importantly, however, for these traditionally refined structures the fits of RDCs to amides that were most disordered in the ensemble refinement were ca 50% worse than for the least disordered half of M pro residues. This result indicated that the ensemble refinement correctly identified the residues with poorer coordinate accuracy, but not the atomic details of the coordinate distributions, i.e., the motions.
Application of DEN restraints during ensemble refinement, starting from crystallographic data recorded at very high resolution (PDB entry 7K3T; 1.2 Å ), 69 resulted in moderately (ca 10%) improved agreement with RDCs over the conventionally refined two-conformer model. 64 The results for M pro obtained with the recently introduced ECHT ensemble refinement protocol 22 have also been reported. 69 ECHT refinement breaks the disorder down to different size scales and provides a more intuitive view of the internal dynamics.
The ECHT-1 ensemble refinement protocol, applied to the PDB-REDO coordinates of the 7K3T M pro structure, yielded a moderately improved correlation with the RDCs, in particular for the more dynamic fraction of residues. This improved correlation indicates that the directions of excursions seen in the ensemble agree with motions sensed by the RDCs, even though Q factors remain about 20%-40% worse than for highly refined small proteins such as ubiquitin, 61 or the third IgG-binding domain of protein G. 70 Another indication that the x-ray ensemble refinement result remains imperfect is obtained when separately considering the most and least dynamic fraction of amides in the ensembles: if motions are correctly represented in the ensemble, separate RDC fits to the most and least disordered vectors in these ensembles should result in very similar D a values. However, a 5%-10% increase in D a value for the most disordered half of the residues indicated that the amplitudes of the dynamics assigned to their amides was considerably too large.

RDC FITS TO ENSEMBLES OF X-RAY STRUCTURES
When an artificial "superensemble" was generated from all 350 M pro PDB x-ray structures (distributed over eight space groups, different ligation states, and ranging from 1.2 to 2.85-Å in resolution), considerably better agreement with RDCs was obtained than for any individual structure or any of the crystallographic ensemble refinements (Fig. 1). 64 Moreover, fits of RDCs to the 50% of residues with the highest disorder (H50%) and lowest disorder (L50%) in this superensemble yielded nearly the same D a value, 64 indicating that the amplitudes of excursions in such an ensemble quantitatively agreed with solution RDCs. For the L50% fraction of residues, the improvement in the ensemble fit over the fit to a single model, hXray Ã i, where each vector adopts the ensemble-averaged vector orientation, is very small (Table I). This result suggests that for these least dynamic residues the improved ensemble RDC fit over the fit to any individual structure is dominated by decreased uncertainty in vector orientations after averaging over 350 different models. In contrast, for the dynamic H50% fraction of residues, the ensemble fit improves by 14% over fitting the single, ensemble-averaged model (Table I), indicating that it is the ensemble distribution, not simply the averaging of uncertainties in orientation, that is responsible for the better fit. For the ensemble fit to the 350 x-ray structures, the Q factor for the most dynamic, H50%, residues is only slightly worse than for the L50% half (Table I). This result indicates that this "random collection" of x-ray structures, the vast majority of which were determined by molecular replacement methods but also representing different ligation states and eight space groups, represents a remarkably good quantitative representation of the protein in its unligated solution state that was used for the RDC measurements.
Ubiquitin has also been studied extensively by both NMR and xray crystallography. Coordinates for more than 350 ubiquitin chains are available in the PDB, distributed across 31 space groups. Below, we focus on 190 ubiquitin chains that were solved at a resolution 2.0 Å . Many of these concern poly-ubiquitin constructs, frequently complexed to other proteins. A large set of highly precise RDCs in solution, measured under four different alignment orientations, is available for free ubiquitin and can be used to evaluate the backbone dynamics of this protein, which often has been used as a benchmark for dynamics analysis by NMR relaxation methods. 34,45,59,[71][72][73] A plot of the RDC Q-factor against crystallographic resolution confirms that, on average, higher resolution crystal structures agree better with RDCs [ Fig. 2(a)]. However, many of the structures solved at above 2-Å resolutions also yield good fits. As was observed for M pro , when considering these chains as a dynamic ensemble with equal weights for all members, the fit to the RDCs improves to well beyond what is obtained for any of the individual structures [dashed line in Fig. 2(a)]. A best fit superposition of these chains shows a fairly tight bundle, with a backbone C a coordinate root mean square deviation (rmsd) of 0.65 Å for residues Q2-L71 [ Fig. 2(b)].
If the collection of 190 ubiquitin x-ray structures in the Protein Data Bank (PDB) is interpreted as a dynamic ensemble, the N-H angular excursion (b n ) of residue i in structure number n relative to its  , obtained from 15 N NMR relaxation 72 (black) and calculated from Eq. (1) (red) for the 190 sets of PDB coordinates to which hydrogens were added using XPLOR-NIH. 74 Structural Dynamics PERSPECTIVE pubs.aip.org/aip/sdy ensemble-averaged orientation can be used to generate a generalized order parameter, given by where the averaging extends over all PDB ubiquitin x-ray structures.
The effect of dynamic disorder on NMR relaxation rates scales with S 2 , and it is this squared generalized order parameter that is normally reported in NMR relaxation studies. 29,31 For regions of secondary structure, these x-ray-derived S 2 values closely parallel those obtained from NMR relaxation but are higher by $0.1 [ Fig. 2(c)], consistent with the expected difference from the effect of N-H librations that impact the NMR relaxation analysis (vide supra). It is also important to note that, in the absence of slow conformational exchange, the 15 N NMR relaxation rates are only affected by internal motions that are faster than the overall tumbling time, i.e., ca 4 ns for ubiquitin. The lower S 2 values seen for the L8-T12 loop in the x-ray ensemble indicates that backbone motions in this loop must include rearrangements on a timescale slower than 4 ns, consistent with recent measurements that used nanoparticles to extend the NMR-visible timescale for internal motions. 34 The high-order parameters seen for most of the protein do not exclude the presence of transient large amplitude motions, such as associated with flipping of aromatic rings or transient unfolding, whose rates are simply too slow to contribute measurably to the ensemble-averaged S 2 values in relaxation analysis. Similarly, the corresponding high energy states cannot be crystallized and are therefore not observed in the PDB.

OPTIMIZATION OF THE ENSEMBLE OF STRUCTURES
When multiple x-ray structures for the same protein are present in the PDB, these constitute a rather arbitrary subset of the conformational space sampled by the protein, with each conformer impacted by intermolecular contacts in the crystalline lattice or in molecular complexes that they may be part of. Even if they span the conformational space sampled by the protein in solution, it is highly unlikely that the population of conformers in solution quantitatively corresponds to those deposited in the PDB. Therefore, the above RDC fitting procedure, carried out by assuming equal weights for each of the PDB depositions, is likely to deviate from the true situation and therefore to be sub-optimal. On the other hand, optimizing the weights for each of N PDB depositions introduces N-1 adjustable parameters in the RDC fit and can result in overfitting. This problem can be mitigated by cross validation, leaving out a small fraction of the RDCs used in the fit, while the remainder is used to optimize the weights and keeping fixed all previously optimized parameters related to alignment tensor orientation and magnitude. Computationally, such cross-validated optimization can become burdensome and improved procedures for ensemble optimization require further development. Efforts at optimizing the weights of the 350 M pro PDB structures, based on a simulated annealing protocol, improved the RDC fit by about 5% over weighing all structures equally. 64 Another question one may ask is: What is the minimum number of PDB structures needed to fit the RDCs satisfactorily? This latter question becomes of particular interest when focusing on small, local regions with increased dynamics. Can the RDCs of such local regions be satisfied by using a linear combination of just a few structural models? If so, this may yield intuitive insights regarding such motions.
For this purpose, we minimize the function with the restraint P i¼1.N w i ¼ 1 for the summed weights over N structures; C is a constant that impacts the weight given to minimizing the number of structures used vs minimizing the difference between predicted and measured RDCs, D pred j and D meas j , for residue j for a protein region that spans from residue P to R; and D a and Rh are the alignment strength and rhombicity, obtained from a global fit of all experimental RDCs to the ensemble of all structures, excluding residues P through R.
Application of this procedure to the M pro linker, A191-T199, bordering the P5 substrate binding pocket, shows that a satisfactory fit to the data cannot be obtained for any single x-ray structure [ Fig. 3(a)]. A fit to all 350 conformers, evenly weighted, agrees better with the RDC data, but not as well as an ensemble of three loop conformers with populations of 20%, 16%, and 64% [ Fig. 3(b)]. These three conformers, which represent the minimum needed to satisfy the RDCs to within their experimental uncertainties, exhibit fairly large differences in local structure, providing insights into the type of dynamic excursions at this functional linker site [ Fig. 3(b)].

OUTLOOK
The high degree of complementarity between RDCs and x-ray structures holds strong potential to provide improved, quantitative insights into structural dynamics. To date, the measurement of RDCs for characterizing dynamics has remained a niche, and their analysis in terms of motions remains challenging in the absence of crystallographic data. However, RDC data can rapidly and unambiguously be measured and assigned, and in conjunction with crystallographic data offer new opportunities to characterize protein motions in greater detail.
Recent improvements in crystallographic ensemble refinement, in particular the use of deformable-elastic-net-based restraints, appears to make the procedure more robust. 22 On the other hand, x-ray ensemble refinements 67,69 that aimed to describe the internal motions in SARS-CoV-2 M pro only provided a moderate improvement in agreement with RDCs, 64 measured by NMR for the apo-enzyme in solution. That analysis indicated that the motional amplitudes of the most dynamic regions in the ensemble refinements were substantially overestimated and inconsistent with the RDCs. These findings, therefore, suggest that RDCs can be used as valuable restraints to further optimize the crystallographic ensemble refinement method. However, analysis of M pro also highlighted areas that will remain impossible to capture by crystallographic ensemble refinement. For example, substantial structural differences adjacent to the P5 binding pocket of M pro were seen among different crystal structures, far outside uncertainties related to their electron densities. 67 Correspondingly, RDCs measured for T198 in this P5 pocket strongly disagree with the highest resolution x-ray structures and one of their ensemble refinements, but agree well with a multi-conformer structure that mostly populates an alternate state seen in several of the other high-resolution structures.
Structural Dynamics PERSPECTIVE pubs.aip.org/aip/sdy We observed that for two proteins that have been very extensively studied by both x-ray crystallography and NMR, M pro and ubiquitin, improved RDC agreement is obtained for pseudo-ensembles of all PDB x-ray structures over fits to any individual structure. Modest improvement in RDC fits for regions of well-defined secondary structure with below average C a coordinate RMSD appears to be dominated by a more accurate orientation of the bond vectors for the ensemble average over any individual structure, rather than by dynamic effects. However, the substantially larger improvement in pseudo-ensemble RDC agreement seen for the loop regions, 64 even when compared to ensemble-averaged vector orientations (Table I), indicates that for these more dynamic regions the RDCs contain valuable information on the distribution of conformations sampled by the protein. The improved RDC fits obtained for large ensembles of M pro and ubiquitin x-ray structures suggest that these ensembles of x-ray structures provide fairly realistic descriptions of the conformations sampled in solution. NMR relaxation data then can provide precise time scales for the rates at which such structures interchange, whereas RDCs may aid in adjusting the relative populations.
For both ubiquitin and M pro , the large diversity of PDB structures results from the work of dozens of laboratories over multiple years. It appears likely, however, that a much smaller number of structures may suffice to represent the sampled conformational space. At that point, RDCs can be used to judge how representative such a "microensemble" is for its solution structure, and weight factors for its members may be optimized to improve quantitative agreement with the data. Such work will represent an important step toward the ultimate goal of pinpointing the functionally relevant conformers and learning how they interconvert.