Protein structural fluctuation, measured by Debye-Waller factors or B-factors, is known to correlate to protein flexibility and function. A variety of methods has been developed for protein Debye-Waller factor prediction and related applications to domain separation, docking pose ranking, entropy calculation, hinge detection, stability analysis, etc. Nevertheless, none of the current methodologies are able to deliver an accuracy of 0.7 in terms of the Pearson correlation coefficients averaged over a large set of proteins. In this work, we introduce a paradigm-shifting geometric graph model, multiscale weighted colored graph (MWCG), to provide a new generation of computational algorithms to significantly change the current status of protein structural fluctuation analysis. Our MWCG model divides a protein graph into multiple subgraphs based on interaction types between graph nodes and represents the protein rigidity by generalized centralities of subgraphs. MWCGs not only predict the B-factors of protein residues but also accurately analyze the flexibility of all atoms in a protein. The MWCG model is validated over a number of protein test sets and compared with many standard methods. An extensive numerical study indicates that the proposed MWCG offers an accuracy of over 0.8 and thus provides perhaps the first reliable method for estimating protein flexibility and B-factors. It also simultaneously predicts all-atom flexibility in a molecule.

The Debye-Waller factor is a measure of x-ray scattering model uncertainty caused by thermal motions. In proteins, one refers to this measure as the beta factor (B-factor) or temperature factor. The strength of thermal motions is proportional to the B-factor and is thus a meaningful metric in understanding the protein structure, flexibility, and function.1 Previous studies have shown that intrinsic structural flexibility is related to important protein conformational variations.2 That is, protein dynamics provides a link between the structure and function.

One of the first methods used for B-factor prediction is normal mode analysis (NMA).2–6 NMA takes a time-independent approach by adopting an interaction Hamiltonian from molecular mechanics (MM). In this model, bond lengths and angles are fixed. Normal mode analysis is computed via the diagonalization of the Hamiltonian on an energy minimized structure. The normal modes are the orthogonal resonant patterns of the MM system. The collective motion of the protein can then be described by a superposition of the normal modes. Low-frequency modes will involve highly cooperative motions which are meaningful in applications such as hinge detection. Previous studies have shown that the low-frequency modes of NMA have a strong correlation with the transition pathways of macromolecules.2 In particular, NMA efficiently characterizes the coarse-grained deformation motion of supramolecular complexes.

In 1996, the elastic network model (ENM)7 was proposed to simplify NMA. The anisotropic network model (ANM) was introduced as another elastic network model which describes a protein as a spring network using a simple spring potential between residues represented by Cα atoms.8 Using this simplified potential, its modes are then obtained from matrix diagonalization. In this method, all springs have the same force constant. The minimalist approach provided by ANM has been shown to provide good insight into the dynamics of a protein with lower computational cost than NMA.

Similar to ANM, the Gaussian network model (GNM)9,10 bypasses detailed force functions and parameters and represents protein Cα interactions by a Kirchhoff matrix, which is a measure of the connectivity of the local environment of each atom. The diagonalization of the Kirchhoff matrix gives rise to eigenmodes and eigenvalues for describing protein B-factors. Like ANM, GNM is a minimalist coarse-grained approximation of protein dynamics.9 GNM is known for its better accuracy and efficiency.11 Additionally, graph theory12 and the rotation translation block method13,14 have been proposed. In addition to their applications in protein fluctuation analysis, these methods are also used for entropy estimation. These approaches have been improved to include crystal periodicity and cofactor corrections,15–17 and density-cluster rotational-translational blocking.14 Applications to stability18 and docking analysis19 have been considered. Many interesting case studies have been reported on hemoglobin,20 F1 ATPase,21,22 chaperonin GroEL,23,24 viral capsids,25,26 and ribosome27,28 as shown in review papers.2,11,29,30

Mathematically, the above-mentioned methods utilize ideas from algebraic graph theory, which is the study of graphs by using algebraic methods such as the characteristic polynomial, eigenvalues, and eigenvectors of Laplacian or adjacency matrices associated with the graphs. Algebraic graph theory has also been widely applied to quantum chemistry.31,32 Due to the matrix-diagonalization procedure, the aforementioned methods have a computational complexity of $O(N^3)$, with N being the number of elements in the involved matrix. Additionally, these methods suffer from limited accuracy. It was reported that for small-sized, medium-sized, and large-sized protein structures, the average Pearson correlation coefficients (PCCs) of B-factor predictions from NMA and GNM are, respectively, below 0.5 and 0.6.33 In fact, both NMA and GNM even deliver negative correlation coefficients for many proteins.33 Therefore, there is a pressing need to develop accurate and reliable methods for protein flexibility analysis and entropy estimation.
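To make the cost of the diagonalization step concrete, the following is a minimal sketch of a GNM-style calculation, not the implementation used in the cited works. The function name `gnm_fluctuations`, the 7 Å default cutoff, and the unit spring constant are assumptions made for illustration; the dense symmetric eigendecomposition is the step that scales as $O(N^3)$.

```python
import numpy as np

def gnm_fluctuations(ca_coords, cutoff=7.0):
    """Relative residue fluctuations from a Gaussian-network-style model.

    The cutoff value and the unit spring constant are illustrative assumptions.
    """
    x = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Kirchhoff (graph Laplacian) matrix: -1 for contacts, degree on the diagonal
    contact = (dist <= cutoff) & ~np.eye(len(x), dtype=bool)
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    # Dense symmetric eigendecomposition: the O(N^3) step
    evals, evecs = np.linalg.eigh(kirchhoff)
    keep = evals > 1e-8  # discard the zero mode(s)
    # Diagonal of the pseudo-inverse, proportional to predicted B-factors
    return (evecs[:, keep] ** 2 / evals[keep]).sum(axis=1)

# Toy input: five nodes on a line, 3.8 apart, so only neighbors are in contact
chain = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
print(gnm_fluctuations(chain, cutoff=4.0))
```

For this toy chain the end nodes come out more flexible than the central one, as expected for a structure whose termini have fewer contacts.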

In the past few years, an alternative mathematical method based on geometric graphs, called flexibility-rigidity index (FRI), was introduced to bypass matrix-diagonalization.34–36 FRI makes use of protein graph connectivity and centrality to analyze protein flexibility. It assumes that protein interactions, including those with its environment, fully determine its structure in a given environment. The protein structure and its environment, in turn, fully determine protein flexibility and function. As a consequence, one does not need to invoke a protein interaction Hamiltonian as in spectral graph theory to analyze protein flexibility when the accurate structure of the protein and its environment is known. The earliest FRI34 has a computational complexity of $O(N^2)$, and the fast FRI (fFRI)35 is of $O(N)$. We have also developed anisotropic FRI (aFRI)35 and generalized FRI (gFRI).37 Multiscale FRI (mFRI) was introduced to capture multiscale interactions in macromolecules,38 leading to multigraphs, i.e., graphs with parallel edges. Impressively, mFRI is about 20% more accurate than GNM on a set of 364 proteins,38 while fFRI is orders of magnitude faster than GNM on a set of 44 proteins.35 fFRI is able to predict the B-factors of the HIV virus capsid (1E6J) with 313 236 residues in less than 30 seconds on a single-core laptop.35 FRI matrices have been used to construct generalized GNM (gGNM), generalized ANM (gANM), multiscale GNM (mGNM), and multiscale ANM (mANM) methods39 to significantly improve their accuracy in protein flexibility analysis. In fact, mathematically, GNM might be regarded as a spectral graph approximation of the geometric graph centrality. Nonetheless, it is still a challenge to accurately predict protein flexibility. The average PCC of B-factor predictions from the aforementioned methods is typically below 0.7, which is insufficient for a reliable protein flexibility analysis. Given the importance of flexibility analysis, this challenge calls for innovative strategies.

However, none of the aforementioned methods is able to accurately predict the B-factors of different types of atoms in a protein. This limitation arises because previous graph theory-based methods do not distinguish the different chemical and biological attributes in a graph. This problem can be addressed by appropriate graph coloring and subgraph division.

The objective of this work is to introduce a multiscale weighted colored graph (MWCG) model for protein flexibility analysis. MWCG is a geometric graph model and offers the most accurate and reliable protein flexibility analysis and B-factor prediction. Our basic idea is to color a protein graph by interaction types between graph nodes and define subgraphs according to colors. Generalized centrality is defined on each subgraph via radial basis functions. In a multiscale setting, graph rigidity at each node in each scale is approximated by the generalized centrality. The proposed MWCG method can be combined with various earlier FRI approaches, such as fFRI, mFRI, and aFRI, to further strengthen its power in protein flexibility analysis. Additionally, we show that MWCG works well not only for residues but also for all the atoms in a protein, i.e., non-Cα carbon, nitrogen, oxygen, and sulfur atoms. Hydrogen atoms can be treated similarly if they are available in the data set. In the following, we present a detailed description of our new method and algorithm in terms of weighted colored graph theory. The performance of our method is validated and compared with that of other methods over various protein data sets. We show that the proposed method is over 40% more accurate than GNM on a set of 364 proteins.

We provide a description of weighted colored graphs. It is convenient to consider proteins as a network and describe FRI in terms of graph theory. A graph G(V, E) is defined by a set of nodes V, called vertices, and a set of edges of the graph, E, which relates pairs of vertices. A protein network is a graph whose nodes and edges have specific attributes. Specifically, individual atoms represent nodes and edges correspond to various distance-based correlation metrics. Many existing methods in B-factor prediction use networks of three-dimensional (3D) spatial atomic coordinate data from the Protein Data Bank (PDB). This approach works because the distance between two atoms in a protein is, in general, strongly related to their interaction strength.

The weighted colored graph (WCG) method converts 3D protein geometric information about atomic coordinates into protein connectivity. The original algorithm considers only Cα atoms in a given protein. In this work, we consider all N atoms in a protein given by a colored graph G(V, E). As such, the jth atom is labeled by its element type $\alpha_j$ and position $\mathbf{r}_j$, and thus $V = \{(\mathbf{r}_j, \alpha_j) \mid \mathbf{r}_j \in \mathbb{R}^3;\ \alpha_j \in \mathcal{C};\ j = 1, 2, \ldots, N\}$, where $\mathcal{C} = \{\mathrm{C, N, O, S}\}$ is a set containing element types in a protein. The hydrogen element is omitted due to its absence from most PDB files and can be added without affecting the present description. To describe the set of edges in a colored protein graph, it is convenient to define directed element-specific pairs (i.e., directed and colored graphs) $\mathcal{P} = \{\mathrm{CC, CN, CO, CS, NC, NN, NO, NS, OC, ON, OO, OS, SC, SN, SO, SS}\}$. For example, subset $\mathcal{P}_2 = \{\mathrm{CN}\}$ contains all directed CN pairs in the protein such that the first atom is a carbon and the second one is a nitrogen. Therefore, E is a set of weighted directed edges describing the potential interactions of various pairs of atoms,

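$$E = \{\Phi^k(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}) \mid (\alpha_i \alpha_j) \in \mathcal{P}_k;\ k = 1, 2, \ldots, 16;\ i, j = 1, 2, \ldots, N\}, \tag{1}$$

where $\|\mathbf{r}_i - \mathbf{r}_j\|$ is the Euclidean distance between the ith and jth atoms, $\eta_{ij}$ is a characteristic distance between the atoms, and $(\alpha_i \alpha_j)$ is a directed pair of element types. Here $\Phi^k$ is a correlation function and is chosen to have the following properties:35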

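$$\Phi^k(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}) = 1 \quad \text{as } \|\mathbf{r}_i - \mathbf{r}_j\| \to 0, \tag{2}$$
$$\Phi^k(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}) = 0 \quad \text{as } \|\mathbf{r}_i - \mathbf{r}_j\| \to \infty, \quad (\alpha_i \alpha_j) \in \mathcal{P}_k. \tag{3}$$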

Our previous work35 has shown that generalized exponential functions,

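$$\Phi^k(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}) = e^{-(\|\mathbf{r}_i - \mathbf{r}_j\| / \eta_{ij})^{\kappa}}, \quad \kappa > 0, \tag{4}$$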

and generalized Lorentz functions,

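$$\Phi^k(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}) = \frac{1}{1 + (\|\mathbf{r}_i - \mathbf{r}_j\| / \eta_{ij})^{\nu}}, \quad \nu > 0, \tag{5}$$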

are good choices which satisfy the assumptions.
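The two kernel families can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the functional forms are assumed to be the usual ones from the FRI literature, $e^{-(r/\eta)^\kappa}$ and $1/[1 + (r/\eta)^\nu]$. The loop checks the two limiting properties required of a correlation function.

```python
import math

def generalized_exponential(r, eta, kappa=1.0):
    """exp(-(r/eta)**kappa) with kappa > 0."""
    return math.exp(-((r / eta) ** kappa))

def generalized_lorentz(r, eta, nu=3.0):
    """1 / (1 + (r/eta)**nu) with nu > 0."""
    return 1.0 / (1.0 + (r / eta) ** nu)

# Both kernels approach 1 at zero distance and 0 at large distance
for phi in (generalized_exponential, generalized_lorentz):
    print(phi.__name__, phi(0.0, 16.0), phi(1e6, 16.0))
```

Both kernels decrease monotonically with distance, and the Lorentz kernel equals 1/2 exactly at r = η, which gives η a direct geometric meaning as a characteristic length.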

Centrality is an important concept in graph theory and has many applications.40 There are many centrality definitions. For example, the normalized closeness centrality41 and harmonic centrality42 of node $\mathbf{r}_i$ in a connected graph are given as $1/\sum_j \|\mathbf{r}_i - \mathbf{r}_j\|$ and $\sum_j 1/\|\mathbf{r}_i - \mathbf{r}_j\|$, respectively. In this context, we extend harmonic centrality to subgraphs with weighted edges defined by generalized correlation functions,

$$\mu_i^k = \sum_{j=1}^{N} w_{ij} \Phi^k(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}), \quad (\alpha_i \alpha_j) \in \mathcal{P}_k, \quad \forall i = 1, 2, \ldots, N, \tag{6}$$

where $w_{ij}$ is a weight function related to the element type. The WCG centrality in Eq. (6) describes the atom-specific rigidity, which measures the stiffness at the ith atom due to the kth set of contact atoms.
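As an illustration of the element-specific rigidity in Eq. (6), the sketch below sums a Lorentz kernel over all atoms of a partner element for each atom of a target element. The function names, the unit weights, and the exclusion of the self term when target and partner elements coincide are assumptions of this sketch rather than details stated in the text.

```python
import numpy as np

def lorentz(r, eta, nu=3.0):
    """Generalized Lorentz kernel 1 / (1 + (r/eta)**nu)."""
    return 1.0 / (1.0 + (r / eta) ** nu)

def wcg_rigidity(coords, elements, target, partner, eta=16.0, kernel=lorentz):
    """Rigidity of every `target` atom due to all `partner` atoms (unit weights)."""
    coords = np.asarray(coords, dtype=float)
    elements = np.asarray(elements)
    ti = np.flatnonzero(elements == target)
    pj = np.flatnonzero(elements == partner)
    # Distances between target atoms (rows) and partner atoms (columns)
    d = np.linalg.norm(coords[ti][:, None, :] - coords[pj][None, :, :], axis=-1)
    phi = kernel(d, eta)
    # Assumption: an atom does not contribute to its own rigidity
    phi[ti[:, None] == pj[None, :]] = 0.0
    return ti, phi.sum(axis=1)

# Toy structure: two carbons 3 apart and one nitrogen
coords = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
elements = ["C", "C", "N"]
print(wcg_rigidity(coords, elements, "C", "C", eta=3.0))
print(wcg_rigidity(coords, elements, "C", "N", eta=4.0))
```

Calling the function once per directed element pair (CC, CN, CO, and so on) yields one rigidity value per atom and per colored subgraph.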

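A standard procedure for constructing a flexibility index from its corresponding rigidity index is to take the reciprocal. Therefore, we have a colored flexibility index on subgraphs,

$$f_i^k = \frac{1}{\mu_i^k} = \frac{1}{\sum_{j=1}^{N} w_{ij} \Phi^k(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij})}, \quad (\alpha_i \alpha_j) \in \mathcal{P}_k. \tag{7}$$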

Our recent work indicates that other forms of flexibility index work equally well.37 The flexibility index at each atom corresponds to temperature fluctuation so we can model the B-factor of the ith atom as

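$$B_i^t = \sum_k c^k f_i^k + b, \tag{8}$$

where $B_i^t$ represents the theoretically predicted B-factor of the ith atom. The coefficients $c^k$ and $b$ are determined by solving the least-squares minimization problem,

$$\min_{c^k, b} \left\{ \sum_i \left| \sum_k c^k f_i^k + b - B_i^e \right|^2 \right\}, \tag{9}$$

where $B_i^e$ is the experimentally measured B-factor of the ith atom.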
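The fit of the coefficients and the offset to the experimental B-factors is an ordinary linear least-squares problem. The sketch below shows one way to solve it with NumPy; the function name `fit_bfactors` and the synthetic data are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def fit_bfactors(flex, b_exp):
    """Least-squares fit of B-factors to flexibility indices.

    flex  : (n_atoms, n_features) array, one column per colored subgraph
    b_exp : (n_atoms,) experimental B-factors
    Returns the coefficients, the offset b, and the predicted B-factors.
    """
    flex = np.asarray(flex, dtype=float)
    b_exp = np.asarray(b_exp, dtype=float)
    # Append a column of ones so that the offset b is fitted as well
    design = np.column_stack([flex, np.ones(len(b_exp))])
    solution, *_ = np.linalg.lstsq(design, b_exp, rcond=None)
    return solution[:-1], solution[-1], design @ solution

# Synthetic check: data generated from known coefficients are recovered
rng = np.random.default_rng(0)
flex = rng.random((50, 3))
b_exp = flex @ np.array([2.0, -1.0, 0.5]) + 3.0
coeffs, offset, b_pred = fit_bfactors(flex, b_exp)
print(coeffs, offset)
```

Because the model is linear in the coefficients, the fit has a closed-form solution and adds negligible cost compared with building the flexibility indices.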

Macromolecular interactions are of a short-range type, i.e., covalent bonds, medium-range type, such as hydrogen bonds, electrostatics, and van der Waals, and long-range type, namely, hydrophobicity. Consequently, protein flexibility is intrinsically associated with multiple characteristic length scales. To characterize protein multiscale interactions, we propose multiscale weighted colored graphs (MWCGs). The flexibility of the ith atom at the nth scale due to the kth set of interaction atoms is given by

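$$f_i^{kn} = \frac{1}{\sum_{j=1}^{N} w_{ij}^n \Phi^k(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}^n)}, \quad (\alpha_i \alpha_j) \in \mathcal{P}_k, \tag{10}$$

where $w_{ij}^n$ is an atomic type dependent parameter, $\Phi^k(\|\mathbf{r}_i - \mathbf{r}_j\|; \eta_{ij}^n)$ a correlation kernel, and $\eta_{ij}^n$ a scale parameter. Minimization takes the form

$$\min_{c^{kn}, b} \left\{ \sum_i \left| \sum_k \sum_n c^{kn} f_i^{kn} + b - B_i^e \right|^2 \right\}, \tag{11}$$

where $B_i^e$ are the experimentally measured B-factors. In this work, we construct three-scale correlation kernels using two generalized Lorentz kernels and a generalized exponential kernel. Once appropriate values for η, ν, and κ are fixed, our method is parameter free.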

While sulfur atoms play an important role in proteins, they are so sparse that their kernels have a negligible effect on the current model. Therefore, it is convenient to consider a subset of P in practical computations,

(12)

While we chose C, N, and O due to their high occurrence frequency and important biological relevance, the method can also be adapted to include any element the user prefers. Additionally, when we compute the B-factors of S atoms, we will consider all possible element pairs, SC, SN, SO, and SS in our WCG calculations.

The proposed method is distinct in its ability to consider the effects of Cα interactions in addition to nitrogen, oxygen, and other carbon atoms. The method creates the three aforementioned correlation kernels for all carbon-carbon (CC), carbon-nitrogen (CN), and carbon-oxygen (CO) interactions. Additionally, we consider three-scale interactions, which give rise to a total of 9 correlation kernels, making up the corresponding graph centralities and flexibilities. Our method is then fitted using those C elements from each of the correlation kernels. The element-specific correlation kernels of the proposed method use existing data about carbon, nitrogen, and oxygen interactions that other methods such as mFRI, GNM, and NMA fail to take into account.
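The construction of the nine features for carbon atoms (three element pairs times three scales) can be sketched as follows. This is a hedged illustration, not the authors' code: the function and variable names are hypothetical, unit weights are used, self terms are excluded by assumption, and the pairing of ν = 3 with the η = 16 Lorentz kernel and ν = 1 with the η = 2 Lorentz kernel is assumed here because the text does not fix that order.

```python
import numpy as np

def lorentz(r, eta, nu):
    return 1.0 / (1.0 + (r / eta) ** nu)

def exponential(r, eta, kappa):
    return np.exp(-((r / eta) ** kappa))

# (kernel, eta, power); which nu goes with which Lorentz kernel is an assumption
SCALES = [(lorentz, 16.0, 3.0), (lorentz, 2.0, 1.0), (exponential, 31.0, 1.0)]

def mwcg_features(coords, elements, target="C", partners=("C", "N", "O")):
    """Flexibility indices of `target` atoms: one column per (partner, scale).

    Assumes every target atom has at least one partner atom of each element.
    """
    coords = np.asarray(coords, dtype=float)
    elements = np.asarray(elements)
    ti = np.flatnonzero(elements == target)
    columns = []
    for partner in partners:
        pj = np.flatnonzero(elements == partner)
        d = np.linalg.norm(coords[ti][:, None, :] - coords[pj][None, :, :], axis=-1)
        not_self = ti[:, None] != pj[None, :]
        for kernel, eta, power in SCALES:
            rigidity = (kernel(d, eta, power) * not_self).sum(axis=1)
            columns.append(1.0 / rigidity)  # flexibility = reciprocal rigidity
    return ti, np.column_stack(columns)

# Toy input: 12 atoms with random coordinates and cycling element labels
rng = np.random.default_rng(1)
toy_coords = rng.random((12, 3)) * 10.0
toy_elements = ["C", "N", "O"] * 4
carbon_idx, features = mwcg_features(toy_coords, toy_elements)
print(features.shape)
```

The nine columns are the regressors whose coefficients are determined by the least-squares minimization of Eq. (11). Swapping the target element to N or O gives the corresponding features for the other heavy atoms.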

By using NC, NN, NO or OC, ON, and OO kernel combinations, one can also use this method to predict the B-factors of these heavy elements in addition to carbon B-factor prediction.

The study uses two data sets, one from Refs. 35 and 38 and the other from the work of Park, Jernigan, and Wu.33 The first contains 364 proteins35,38 and the second contains 3 subsets of small, medium, and large proteins.33 All structures have a resolution of 3 Å or better, with an average resolution of 1.3 Å, and the sets include proteins that range from 4 to 3912 residues.33

We use the proposed method to predict the B-factors of Cα atoms for the given sets of proteins. In addition to the Cα B-factor prediction, we also use the proposed method to predict the B-factors of nitrogen, oxygen, sulfur, and non-Cα carbon atoms.

To quantitatively assess our method for B-factor prediction, we use the Pearson correlation coefficient given by

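$$\mathrm{PCC} = \frac{\sum_{i=1}^{N} (B_i^e - \bar{B}^e)(B_i^t - \bar{B}^t)}{\left[ \sum_{i=1}^{N} (B_i^e - \bar{B}^e)^2 \sum_{i=1}^{N} (B_i^t - \bar{B}^t)^2 \right]^{1/2}}, \tag{13}$$

where $B_i^t$, $i = 1, 2, \ldots, N$, are the B-factors predicted by the proposed method and $B_i^e$, $i = 1, 2, \ldots, N$, are the experimental B-factors from the PDB file. Here $\bar{B}^e$ and $\bar{B}^t$ are the averaged experimental and predicted B-factors, respectively.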
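For reference, the Pearson correlation coefficient between predicted and experimental B-factors can be computed directly from its definition. The helper name `pearson_cc` and the numbers in the example are hypothetical.

```python
import numpy as np

def pearson_cc(b_pred, b_exp):
    """Pearson correlation coefficient between two equal-length sequences."""
    t = np.asarray(b_pred, dtype=float)
    e = np.asarray(b_exp, dtype=float)
    dt = t - t.mean()
    de = e - e.mean()
    return float((dt * de).sum() / np.sqrt((dt ** 2).sum() * (de ** 2).sum()))

# Made-up values for illustration only
predicted = [12.1, 15.3, 20.8, 18.0, 11.5]
experimental = [11.0, 16.2, 22.5, 17.1, 12.3]
print(pearson_cc(predicted, experimental))
```

The coefficient is invariant to a linear rescaling of either sequence, which is why the fitted offset and overall scale do not affect the reported accuracy.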

Using CC, CN, and CO element specific correlation kernels described in Eq. (10) provides a total of 9 unique correlation kernels for the present graph theory-based method. In this work, we set $w_{ij} = w_{ij}^n = 1$ and $\eta_{ij}^n = \eta^n$ in all computations.

We carried out a simplified parameter search using the 364-protein data set to determine near-optimal parameters for Cα B-factor predictions. We choose three kernels: two Lorentz kernels with ν = 1 and ν = 3, and one exponential kernel with κ = 1. Our earlier analysis indicates that when ν = 3 and κ = 1, one can obtain a fast FRI via the k-d tree scheme by choosing an optimal box size R of R = 4.6η.35 As an approximation, we fix R = 4.6η when determining a set of near-optimal ηn values.

The first kernel is chosen to be a Lorentz function, and its optimal η1 was found to be η1 = 16 as shown in Fig. 1. Then, instead of a global search, we fixed η1 = 16 to search for an optimal η2 for another Lorentz kernel and found that η2 = 2 offers the optimal prediction as shown in Fig. 1. Finally, we fixed η1 = 16 and η2 = 2 to search η3 for an exponential kernel. In this case, we find that the average Pearson correlation coefficient (PCC) does not decay even if η3 is as large as 40 as shown in Fig. 1. Nevertheless, this parameter behavior is quite reasonable. When only one kernel is used, a good approximation can be obtained by an intermediate η value in the range of 12–17. The second η favors a small value to capture the small-scale interactions in proteins. The third η appears to pick up large-scale interactions. However, the larger the η value is, the more expensive the calculation becomes. We, therefore, choose η3 = 31 in our parameter-free MWCG method as listed in Table I. The exponential kernel is used to capture large-scale effects because it decays fast. In this work, as a good approximation, we use this set of parameters for all flexibility analysis.
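The one-parameter-at-a-time search described above can be sketched generically. In the sketch below, `sequential_search` and `toy_score` are hypothetical names; `toy_score` is only a stand-in for the average PCC over a data set, which in practice requires building the kernels and refitting the model for each candidate value.

```python
def sequential_search(score, grids):
    """Greedy search: optimize one parameter at a time, keeping earlier ones fixed."""
    chosen = []
    for grid in grids:
        best = max(grid, key=lambda value: score(chosen + [value]))
        chosen.append(best)
    return chosen

def toy_score(etas):
    """Hypothetical stand-in for the average PCC; peaks at eta = (16, 2, 31)."""
    targets = [16, 2, 31]
    return -sum((eta - t) ** 2 for eta, t in zip(etas, targets))

result = sequential_search(toy_score, [range(1, 41)] * 3)
print(result)  # [16, 2, 31]
```

Unlike this toy score, the real PCC for the third kernel does not peak inside the searched range, so the final value has to be picked by weighing accuracy against cost, as done in the text. The greedy scheme evaluates 3 × 40 candidates instead of the 40³ needed for a full grid search, at the price of possibly missing the global optimum.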

FIG. 1.

The average Pearson correlation coefficient (PCC) as found by optimizing individual kernels in the range of ηn = 1, …, 40.

TABLE I.

Parameters used for correlation kernels in a parameter-free MWCG.

Kernel type            κ    ηn    ν
Lorentz (n = 1)        …    16
Lorentz (n = 2)        …    2
Exponential (n = 3)    1    31    …

Visualization is an important part of methodological development. The monotonically decreasing radial basis functions used in the proposed method allow us to create various correlation maps of a given protein. Given that the current method considers carbon, nitrogen, and oxygen atoms, we can create correlation maps that show the relationships not only among carbon atoms but also among nitrogen and oxygen atoms. For each element pair, these maps were calculated using the average of the three kernels described in Sec. III A. The axes of each correlation map correspond to the carbon, nitrogen, or oxygen atoms. As presented in previous work, correlation maps provide a visual display of important structural features.35 Our work extends this concept to more general carbon-carbon, nitrogen-nitrogen, and oxygen-oxygen maps. As examples, we consider four proteins with PDB IDs 1AIE, 1KGM, 5IIV, and 3PSM. Correlation maps can be seen in Figs. 2–5.
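A correlation map of this kind is simply a matrix of kernel values between all pairs of atoms of the chosen elements, averaged over the three scales. The sketch below illustrates the idea for a same-element map such as NN or OO; the function name is hypothetical, and the pairing of ν = 3 with η = 16 and ν = 1 with η = 2 is an assumption.

```python
import numpy as np

def correlation_map(coords, etas=(16.0, 2.0, 31.0)):
    """Average of two Lorentz kernels and one exponential kernel for all atom pairs."""
    x = np.asarray(coords, dtype=float)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    lorentz_1 = 1.0 / (1.0 + (d / etas[0]) ** 3)  # nu = 3 (assumed pairing)
    lorentz_2 = 1.0 / (1.0 + (d / etas[1]) ** 1)  # nu = 1 (assumed pairing)
    exponential = np.exp(-d / etas[2])            # kappa = 1
    return (lorentz_1 + lorentz_2 + exponential) / 3.0

# Toy input: coordinates of ten atoms of one element type, e.g., backbone nitrogens
rng = np.random.default_rng(2)
toy_atoms = rng.random((10, 3)) * 20.0
cmap = correlation_map(toy_atoms)
print(cmap.shape)
```

The resulting symmetric matrix can be passed to any heat-map plotting routine. Entries near the diagonal reflect atoms that are close in sequence, while off-diagonal bands indicate atoms that are distant in sequence but close in space.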

FIG. 2.

(a) VMD representation of PDB ID 1AIE. (b) Correlation maps for nitrogen-nitrogen (NN) and (c) oxygen-oxygen (OO) interactions for protein 1AIE. The thicker band along the main diagonal of (b) and (c) corresponds to the alpha helix secondary structure in 1AIE.

FIG. 3.

(a) VMD representation of PDB ID 1KGM. (b) Correlation maps for nitrogen-nitrogen (NN) and (c) oxygen-oxygen (OO) interactions for protein 1KGM. The bands perpendicular to the main diagonal of (b) and (c) correspond to the anti-parallel beta sheet present in 1KGM.

FIG. 4.

(a) VMD representation of PDB ID 5IIV. (b) Correlation maps for nitrogen-nitrogen (NN) and (c) oxygen-oxygen (OO) interactions for protein 5IIV. The two distinct thick bands along the main diagonal of (b) and (c) correspond to the two alpha helices present in 5IIV. The off-diagonal bands correspond to the bonding interaction between alpha helices.

FIG. 5.

(a) VMD representation of PDB ID 3PSM. (b) Correlation maps for nitrogen-nitrogen (NN) and (c) oxygen-oxygen (OO) interactions for protein 3PSM. The thicker bands along the main diagonal of (b) and (c) correspond to the two alpha helices present in 3PSM while bands perpendicular to the main diagonal correspond to anti-parallel beta sheets.


Table II displays the MWCG results for all 364 proteins in the data set. We compare the results of the proposed method with those from the optimal FRI (opFRI), parameter-free FRI (pfFRI), and GNM. Tables III–V provide the same comparison for proteins of relatively small, medium, and large sizes.

TABLE II.

Correlation coefficients for B-factor prediction obtained by MWCG, optimal FRI (opFRI), parameter-free FRI (pfFRI), and Gaussian network model (GNM) for a set of 364 proteins. N is the number of residues. GNM scores reported here are the result of our tests with a processed set of PDB files as described in Sec. III.

PDB ID  N  MWCG  opFRI  pfFRI  GNM  PDB ID  N  MWCG  opFRI  pfFRI  GNM
1ABA 87 0.855 0.727 0.698 0.613 1PEF 18 0.989 0.888 0.826 0.808 
1AHO 64 0.768 0.698 0.625 0.562 1PEN 16 0.957 0.516 0.465 0.27 
1AIE 31 0.969 0.588 0.416 0.155 1PMY 123 0.701 0.671 0.654 0.685 
1AKG 16 0.945 0.373 0.35 0.185 1PZ4 114 0.921 0.828 0.781 0.843 
1ATG 231 0.843 0.613 0.578 0.497 1Q9B 43 0.957 0.746 0.726 0.656 
1BGF 124 0.834 0.603 0.539 0.543 1QAU 112 0.786 0.678 0.672 0.62 
1BX7 51 0.896 0.726 0.623 0.706 1QKI 3912 0.508 0.809 0.751 0.645 
1BYI 224 0.600 0.543 0.491 0.552 1QTO 122 0.809 0.543 0.52 0.334 
1CCR 111 0.741 0.58 0.512 0.351 1R29 122 0.787 0.65 0.631 0.556 
1CYO 88 0.860 0.751 0.702 0.741 1R7J 90 0.859 0.789 0.621 0.368 
1DF4 57 0.941 0.912 0.889 0.832 1RJU 36 0.805 0.517 0.447 0.431 
1E5K 188 0.848 0.746 0.732 0.859 1RRO 112 0.748 0.435 0.372 0.529 
1ES5 260 0.700 0.653 0.638 0.677 1SAU 114 0.819 0.742 0.671 0.596 
1ETL 12 0.932 0.71 0.609 0.628 1TGR 104 0.810 0.72 0.711 0.714 
1ETM 12 0.941 0.544 0.393 0.432 1TZV 141 0.869 0.837 0.82 0.841 
1ETN 12 0.949 0.089 0.023 −0.274 1U06 55 0.774 0.474 0.429 0.434 
1EW4 106 0.804 0.65 0.644 0.547 1U7I 267 0.885 0.778 0.762 0.691 
1F8R 1932 0.504 0.878 0.859 0.738 1U9C 221 0.764 0.6 0.577 0.522 
1FF4 65 0.933 0.718 0.613 0.674 1UHA 83 0.838 0.726 0.665 0.638 
1FK5 93 0.648 0.59 0.568 0.485 1UKU 102 0.765 0.665 0.661 0.742 
1GCO 1044 0.839 0.766 0.693 0.646 1ULR 87 0.718 0.639 0.594 0.495 
1GK7 39 0.984 0.845 0.773 0.821 1UOY 64 0.769 0.713 0.653 0.671 
1GVD 52 0.849 0.781 0.732 0.591 1USE 40 0.960 0.438 0.146 −0.142 
1GXU 88 0.901 0.748 0.634 0.421 1USM 77 0.819 0.832 0.809 0.798 
1H6V 2927 0.133 0.488 0.429 0.306 1UTG 70 0.745 0.691 0.61 0.538 
1HJE 13 0.931 0.811 0.686 0.616 1V05 96 0.841 0.629 0.599 0.632 
1I71 83 0.798 0.549 0.516 0.549 1V70 105 0.854 0.622 0.492 0.162 
1IDP 441 0.827 0.735 0.715 0.69 1VRZ 21 0.995 0.792 0.695 0.677 
1IFR 113 0.875 0.697 0.689 0.637 1W2L 97 0.747 0.691 0.564 0.397 
1K8U 89 0.856 0.553 0.531 0.378 1WBE 204 0.767 0.591 0.577 0.549 
1KMM 1499 0.740 0.749 0.744 0.558 1WHI 122 0.804 0.601 0.539 0.27 
1KNG 144 0.810 0.547 0.536 0.512 1WLY 322 0.728 0.695 0.679 0.666 
1KR4 110 0.892 0.635 0.612 0.466 1WPA 107 0.797 0.634 0.577 0.417 
1KYC 15 0.971 0.796 0.763 0.754 1X3O 80 0.787 0.6 0.559 0.654 
1LR7 73 0.929 0.679 0.657 0.62 1XY1 18 0.933 0.832 0.645 0.447 
1MF7 194 0.757 0.687 0.681 0.7 1XY2 1.000 0.619 0.57 0.562 
1N7E 95 0.812 0.651 0.609 0.497 1Y6X 87 0.838 0.596 0.524 0.366 
1NKD 59 0.911 0.75 0.703 0.631 1YJO 1.000 0.375 0.333 0.434 
1NKO 122 0.831 0.619 0.535 0.368 1YZM 46 0.970 0.842 0.834 0.901 
1NLS 238 0.799 0.669 0.53 0.523 1Z21 96 0.725 0.662 0.638 0.433 
1NNX 93 0.834 0.795 0.789 0.631 1ZCE 146 0.898 0.808 0.757 0.77 
1NOA 113 0.808 0.622 0.604 0.615 1ZVA 75 0.911 0.756 0.579 0.69 
1NOT 13 0.937 0.746 0.622 0.523 2A50 457 0.704 0.564 0.524 0.281 
1O06 20 0.988 0.91 0.874 0.844 2AGK 233 0.821 0.705 0.694 0.512 
1O08 221 0.516 0.562 0.333 0.309 2AH1 939 0.462 0.684 0.593 0.521 
1OB4 16 1.000 0.776 0.763 0.75 2B0A 186 0.805 0.639 0.603 0.467 
1OB7 16 1.000 0.737 0.545 0.652 2BCM 413 0.695 0.555 0.551 0.477 
1OPD 85 0.607 0.555 0.409 0.398 2BF9 36 0.714 0.606 0.554 0.68 
1P9I 29 0.841 0.754 0.742 0.625 2BRF 100 0.873 0.795 0.764 0.71 
2CE0 99 0.824 0.706 0.598 0.529 2C71 205 0.773 0.658 0.649 0.56 
2CG7 90 0.738 0.551 0.539 0.379 2OLX 1.000 0.917 0.888 0.885 
2COV 534 0.895 0.846 0.823 0.812 2PKT 93 0.762 0.162 0.003 0.193 
2CWS 227 0.756 0.647 0.64 0.696 2PLT 99 0.635 0.508 0.484 0.509 
2D5W 1214 0.448 0.689 0.682 0.681 2PMR 76 0.799 0.693 0.682 0.619 
2DKO 253 0.873 0.816 0.812 0.69 2POF 440 0.743 0.682 0.651 0.589 
2DPL 565 0.721 0.596 0.538 0.658 2PPN 107 0.673 0.677 0.638 0.668 
2DSX 52 0.704 0.337 0.333 0.127 2PSF 608 0.641 0.526 0.5 0.565 
2E10 439 0.808 0.798 0.796 0.692 2PTH 193 0.901 0.822 0.784 0.767 
2E3H 81 0.794 0.692 0.682 0.605 2Q4N 153 0.846 0.711 0.667 0.74 
2EAQ 89 0.817 0.753 0.69 0.695 2Q52 412 0.510 0.756 0.748 0.621 
2EHP 248 0.832 0.804 0.804 0.773 2QJL 99 0.611 0.594 0.584 0.594 
2EHS 75 0.805 0.72 0.713 0.747 2R16 176 0.640 0.582 0.495 0.618 
2ERW 53 0.513 0.461 0.253 0.199 2R6Q 138 0.915 0.603 0.54 0.529 
2ETX 389 0.854 0.58 0.556 0.632 2RB8 93 0.840 0.727 0.614 0.517 
2FB6 116 0.850 0.791 0.786 0.74 2RE2 238 0.711 0.652 0.613 0.673 
2FG1 157 0.719 0.62 0.617 0.584 2RFR 154 0.826 0.693 0.671 0.753 
2FN9 560 0.704 0.607 0.595 0.611 2V9V 135 0.697 0.555 0.548 0.528 
2FQ3 85 0.844 0.719 0.692 0.348 2VE8 515 0.698 0.744 0.643 0.616 
2G69 99 0.850 0.622 0.59 0.436 2VH7 94 0.851 0.775 0.726 0.596 
2G7O 68 0.888 0.785 0.784 0.66 2VIM 104 0.859 0.413 0.393 0.212 
2G7S 190 0.756 0.67 0.644 0.649 2VPA 204 0.757 0.763 0.755 0.576 
2GKG 122 0.748 0.688 0.646 0.711 2VQ4 106 0.776 0.68 0.679 0.555 
2GOM 121 0.874 0.586 0.584 0.491 2VY8 149 0.759 0.77 0.724 0.533 
2GXG 140 0.901 0.847 0.78 0.52 2VYO 210 0.777 0.675 0.648 0.729 
2GZQ 191 0.462 0.505 0.382 0.369 2W1V 548 0.761 0.68 0.68 0.571 
2HQK 213 0.897 0.824 0.809 0.365 2W2A 350 0.819 0.706 0.638 0.589 
2HYK 238 0.728 0.585 0.575 0.51 2W6A 117 0.804 0.823 0.748 0.647 
2I24 113 0.672 0.593 0.498 0.494 2WJ5 96 0.821 0.484 0.44 0.357 
2I49 398 0.766 0.714 0.683 0.601 2WUJ 100 0.919 0.739 0.598 0.598 
2IBL 108 0.919 0.629 0.625 0.352 2WW7 150 0.629 0.499 0.471 0.356 
2IGD 61 0.865 0.585 0.481 0.386 2WWE 111 0.903 0.692 0.582 0.628 
2IMF 203 0.798 0.652 0.625 0.514 2X1Q 240 0.505 0.534 0.478 0.443 
2IP6 87 0.841 0.654 0.578 0.572 2X25 168 0.710 0.632 0.598 0.403 
2IVY 88 0.837 0.544 0.483 0.271 2X3M 166 0.875 0.744 0.717 0.655 
2J32 244 0.878 0.863 0.848 0.855 2X5Y 171 0.799 0.718 0.705 0.694 
2J9W 200 0.741 0.716 0.705 0.662 2X9Z 262 0.726 0.583 0.578 0.574 
2JKU 35 0.926 0.805 0.695 0.656 2XHF 310 0.830 0.606 0.591 0.569 
2JLI 100 0.937 0.779 0.613 0.622 2Y0T 101 0.834 0.778 0.774 0.798 
2JLJ 115 0.811 0.741 0.72 0.527 2Y72 170 0.926 0.78 0.754 0.766 
2MCM 113 0.867 0.789 0.713 0.639 2Y7L 319 0.939 0.928 0.797 0.747 
2NLS 36 0.937 0.605 0.559 0.53 2Y9F 149 0.769 0.771 0.762 0.664 
2NR7 194 0.885 0.803 0.785 0.727 2YLB 400 0.820 0.807 0.807 0.675 
2NUH 104 0.922 0.835 0.691 0.771 2YNY 315 0.836 0.813 0.804 0.706 
2O6X 306 0.825 0.814 0.799 0.651 2ZCM 357 0.723 0.458 0.422 0.42 
2OA2 132 0.703 0.571 0.456 0.458 2ZU1 360 0.753 0.689 0.672 0.653 
2OCT 192 0.673 0.567 0.55 0.54 3A0M 148 0.916 0.807 0.712 0.392 
2OHW 256 0.743 0.614 0.539 0.475 3A7L 128 0.806 0.713 0.663 0.756 
2OKT 342 0.779 0.433 0.411 0.336 3AMC 614 0.758 0.675 0.669 0.581 
2OL9 1.000 0.909 0.904 0.689 3AUB 116 0.650 0.614 0.608 0.637 
3BA1 312 0.827 0.661 0.624 0.621 3B5O 230 0.729 0.644 0.629 0.601 
3BED 261 0.874 0.845 0.82 0.684 3MD4 12 0.999 0.86 0.781 0.914 
3BQX 139 0.900 0.634 0.481 0.297 3MD5 12 0.998 0.649 0.413 −0.218 
3BZQ 99 0.848 0.532 0.516 0.466 3MEA 166 0.872 0.669 0.669 0.6 
3BZZ 100 0.783 0.485 0.45 0.6 3MGN 348 0.742 0.205 0.119 0.193 
3DRF 547 0.781 0.559 0.549 0.488 3MRE 383 0.675 0.661 0.641 0.567 
3DWV 325 0.754 0.707 0.661 0.547 3N11 325 0.736 0.614 0.583 0.517 
3E5T 228 0.731 0.502 0.489 0.296 3NE0 208 0.859 0.706 0.645 0.659 
3E7R 40 0.769 0.706 0.687 0.642 3NGG 94 0.867 0.696 0.689 0.719 
3EUR 140 0.874 0.431 0.427 0.577 3NPV 495 0.855 0.702 0.653 0.677 
3F2Z 149 0.877 0.824 0.792 0.74 3NVG 1.000 0.721 0.617 0.597 
3F7E 254 0.847 0.812 0.803 0.811 3NZL 73 0.713 0.627 0.583 0.506 
3FCN 158 0.741 0.64 0.606 0.632 3O0P 194 0.898 0.727 0.706 0.734 
3FE7 91 0.914 0.583 0.533 0.276 3O5P 128 0.787 0.734 0.698 0.63 
3FKE 250 0.755 0.525 0.476 0.435 3OBQ 150 0.877 0.649 0.645 0.655 
3FMY 66 0.857 0.701 0.655 0.556 3OQY 234 0.807 0.698 0.686 0.637 
3FOD 48 0.725 0.532 0.44 −0.126 3P6J 125 0.689 0.774 0.767 0.81 
3FSO 221 0.906 0.831 0.817 0.793 3PD7 188 0.848 0.77 0.723 0.589 
3FTD 240 0.818 0.722 0.713 0.634 3PES 165 0.861 0.697 0.642 0.683 
3FVA 1.000 0.835 0.825 0.789 3PID 387 0.677 0.537 0.531 0.642 
3G1S 418 0.879 0.771 0.7 0.63 3PIW 154 0.772 0.758 0.744 0.717 
3GBW 161 0.864 0.82 0.747 0.51 3PKV 221 0.731 0.625 0.597 0.568 
3GHJ 116 0.864 0.732 0.511 0.196 3PSM 94 0.914 0.876 0.79 0.745 
3HFO 197 0.825 0.691 0.67 0.518 3PTL 289 0.611 0.543 0.541 0.468 
3HHP 1234 0.830 0.72 0.716 0.683 3PVE 347 0.785 0.718 0.667 0.568 
3HNY 156 0.885 0.793 0.723 0.758 3PZ9 357 0.758 0.709 0.709 0.678 
3HP4 183 0.690 0.534 0.5 0.573 3PZZ 12 0.998 0.945 0.922 0.95 
3HWU 144 0.905 0.754 0.748 0.841 3Q2X 1.000 0.922 0.904 0.866 
3HYD 1.000 0.966 0.95 0.867 3Q6L 131 0.723 0.622 0.577 0.605 
3HZ8 192 0.857 0.617 0.502 0.475 3QDS 284 0.782 0.78 0.745 0.568 
3I2V 124 0.879 0.486 0.441 0.301 3QPA 197 0.616 0.587 0.442 0.503 
3I2Z 138 0.732 0.613 0.599 0.317 3R6D 221 0.854 0.688 0.669 0.495 
3I4O 135 0.767 0.735 0.714 0.738 3R87 132 0.861 0.452 0.419 0.286 
3I7M 134 0.604 0.667 0.635 0.695 3RQ9 162 0.711 0.51 0.403 0.242 
3IHS 169 0.807 0.586 0.565 0.409 3RY0 128 0.790 0.616 0.606 0.47 
3IVV 149 0.866 0.817 0.797 0.693 3RZY 139 0.867 0.8 0.784 0.849 
3K6Y 227 0.817 0.586 0.535 0.301 3S0A 119 0.713 0.562 0.524 0.526 
3KBE 140 0.743 0.705 0.704 0.611 3SD2 86 0.842 0.523 0.421 0.237 
3KGK 190 0.798 0.784 0.775 0.68 3SEB 238 0.879 0.801 0.712 0.826 
3KZD 85 0.789 0.647 0.611 0.475 3SED 124 0.870 0.709 0.658 0.712 
3L41 220 0.776 0.718 0.716 0.669 3SO6 150 0.747 0.675 0.666 0.63 
3LAA 169 0.880 0.827 0.647 0.659 3SR3 637 0.633 0.619 0.611 0.624 
3LAX 106 0.924 0.734 0.73 0.584 3SUK 248 0.721 0.644 0.633 0.567 
3LG3 833 0.701 0.658 0.614 0.589 3SZH 697 0.860 0.817 0.815 0.697 
3LJI 272 0.720 0.612 0.608 0.551 3T0H 208 0.897 0.808 0.775 0.694 
3M3P 249 0.697 0.584 0.554 0.338 3T3K 122 0.803 0.796 0.748 0.735 
3M8J 178 0.813 0.73 0.728 0.628 3T47 141 0.759 0.592 0.527 0.447 
3M9J 210 0.867 0.639 0.574 0.296 3TDN 357 0.668 0.458 0.419 0.24 
3M9Q 176 0.851 0.591 0.51 0.471 3TOW 152 0.722 0.578 0.556 0.571 
3MAB 173 0.770 0.664 0.591 0.451 3TUA 210 0.696 0.665 0.658 0.588 
3U6G 248 0.808 0.635 0.632 0.526 3TYS 75 0.918 0.853 0.8 0.791 
3U97 77 0.819 0.753 0.736 0.712 4DT4 160 0.784 0.776 0.738 0.716 
3UCI 72 0.689 0.589 0.526 0.495 4EK3 287 0.830 0.68 0.68 0.674 
3UR8 637 0.832 0.666 0.652 0.597 4ERY 318 0.801 0.74 0.701 0.688 
3US6 148 0.668 0.698 0.586 0.553 4ES1 95 0.820 0.648 0.625 0.551 
3V1A 48 0.811 0.531 0.487 0.583 4EUG 225 0.592 0.57 0.529 0.405 
3V75 285 0.674 0.604 0.596 0.491 4F01 448 0.883 0.633 0.372 0.688 
3VN0 193 0.889 0.84 0.837 0.812 4F3J 143 0.879 0.617 0.598 0.551 
3VOR 182 0.686 0.602 0.557 0.484 4FR9 141 0.806 0.671 0.655 0.501 
3VUB 101 0.852 0.625 0.61 0.607 4G14 15 1.000 0.467 0.323 0.356 
3VVV 108 0.951 0.833 0.741 0.753 4G2E 151 0.835 0.76 0.755 0.758 
3VZ9 163 0.887 0.785 0.749 0.695 4G5X 550 0.822 0.786 0.754 0.743 
3W4Q 773 0.798 0.737 0.725 0.649 4G6C 658 0.834 0.591 0.59 0.528 
3ZBD 213 0.891 0.651 0.516 0.632 4G7X 194 0.840 0.688 0.587 0.624 
3ZIT 152 0.641 0.43 0.404 0.392 4GA2 144 0.782 0.528 0.485 0.406 
3ZRX 221 0.639 0.59 0.562 0.391 4GMQ 92 0.794 0.678 0.628 0.55 
3ZSL 138 0.903 0.691 0.687 0.526 4GS3 90 0.698 0.544 0.522 0.547 
3ZZP 74 0.692 0.524 0.46 0.448 4H4J 236 0.866 0.81 0.806 0.689 
3ZZY 226 0.804 0.746 0.709 0.728 4H89 168 0.624 0.682 0.588 0.596 
4A02 166 0.730 0.618 0.516 0.303 4HDE 168 0.783 0.745 0.728 0.615 
4ACJ 167 0.827 0.748 0.746 0.759 4HJP 281 0.730 0.703 0.649 0.51 
4AE7 186 0.862 0.724 0.717 0.717 4HWM 117 0.807 0.638 0.622 0.499 
4AM1 345 0.796 0.674 0.619 0.46 4IL7 85 0.719 0.446 0.404 0.316 
4ANN 176 0.562 0.551 0.536 0.47 4J11 357 0.726 0.62 0.562 0.401 
4AVR 188 0.759 0.68 0.605 0.65 4J5O 220 0.817 0.793 0.757 0.777 
4AXY 54 0.973 0.7 0.623 0.72 4J5Q 146 0.851 0.742 0.742 0.689 
4B6G 558 0.804 0.765 0.756 0.669 4J78 305 0.729 0.658 0.648 0.608 
4B9G 292 0.855 0.844 0.816 0.763 4JG2 185 0.889 0.746 0.736 0.543 
4DD5 387 0.850 0.615 0.596 0.351 4JVU 207 0.800 0.723 0.697 0.553 
4DKN 423 0.786 0.781 0.761 0.539 4JYP 534 0.800 0.688 0.682 0.538 
4DND 95 0.829 0.763 0.75 0.582 4KEF 133 0.704 0.58 0.53 0.324 
4DPZ 109 0.837 0.73 0.726 0.651 5CYT 103 0.548 0.441 0.421 0.331 
4DQ7 328 0.776 0.69 0.683 0.376 6RXN 45 0.583 0.614 0.574 0.594 
TABLE III.

Correlation coefficients for B-factor prediction obtained by MWCG, optimal FRI (opFRI), parameter-free FRI (pfFRI), the Gaussian network model (GNM), and normal mode analysis (NMA) for small-size structures. Results for opFRI and pfFRI are taken from Opron et al.35 GNM and NMA values are taken from the coarse-grained (Cα) results reported in Park et al.33 MWCG results are parameter free and use all C, N, and O atoms to predict Cα B-factors.

PDB ID  N  MWCG  opFRI  pfFRI  GNM  NMA
1AIE 31 0.969 0.588 0.416 0.155 0.712 
1AKG 16 0.945 0.373 0.35 0.185 −0.229 
1BX7 51 0.896 0.726 0.623 0.706 0.868 
1ETL 12 0.932 0.71 0.609 0.628 0.355 
1ETM 12 0.941 0.544 0.393 0.432 0.027 
1ETN 12 0.949 0.089 0.023 −0.274 −0.573 
1FF4 65 0.933 0.718 0.613 0.674 0.555 
1GK7 39 0.984 0.845 0.773 0.821 0.822 
1GVD 52 0.849 0.781 0.732 0.591 0.570 
1HJE 13 0.931 0.811 0.686 0.616 0.562 
1KYC 15 0.971 0.796 0.763 0.754 0.784 
1NOT 13 0.937 0.746 0.622 0.523 0.567 
1O06 20 0.988 0.91 0.874 0.844 0.900 
1OB4 16 1.000 0.776 0.763 0.750 0.930 
1OB7 16 1.000 0.737 0.545 0.652 0.952 
1P9I 29 0.841 0.754 0.742 0.625 0.603 
1PEF 18 0.989 0.888 0.826 0.808 0.888 
1PEN 16 0.957 0.516 0.465 0.270 0.056 
1Q9B 43 0.957 0.746 0.726 0.656 0.646 
1RJU 36 0.805 0.517 0.447 0.431 0.235 
1U06 55 0.774 0.474 0.429 0.434 0.377 
1UOY 64 0.769 0.713 0.653 0.671 0.628 
1USE 40 0.960 0.438 0.146 −0.142 −0.399 
1VRZ 21 0.995 0.792 0.695 0.677 −0.203 
1XY2 1.000 0.619 0.57 0.562 0.458 
1YJO 1.000 0.375 0.333 0.434 0.445 
1YZM 46 0.970 0.842 0.834 0.901 0.939 
2DSX 52 0.704 0.337 0.333 0.127 0.433 
2JKU 35 0.926 0.805 0.695 0.656 0.850 
2NLS 36 0.937 0.605 0.559 0.530 0.088 
2OL9 1.000 0.909 0.904 0.689 0.886 
2OLX 1.000 0.917 0.888 0.885 0.776 
6RXN 45 0.583 0.614 0.574 0.594 0.304 
TABLE IV.

Correlation coefficients for B-factor prediction obtained by MWCG, optimal FRI (opFRI), parameter-free FRI (pfFRI), the Gaussian network model (GNM), and normal mode analysis (NMA) for medium-size structures. Results for opFRI and pfFRI are taken from Opron et al.35 GNM and NMA values are taken from the coarse-grained (Cα) results reported in Park et al.33 MWCG results are parameter free and use all C, N, and O atoms to predict Cα B-factors.

PDB ID  N  MWCG  opFRI  pfFRI  GNM  NMA
1ABA 87 0.855 0.727 0.698 0.613 0.057 
1CYO 88 0.860 0.751 0.702 0.741 0.774 
1FK5 93 0.648 0.590 0.568 0.485 0.362 
1GXU 88 0.901 0.748 0.634 0.421 0.581 
1I71 83 0.798 0.549 0.516 0.549 0.380 
1LR7 73 0.929 0.679 0.657 0.620 0.795 
1N7E 95 0.812 0.651 0.609 0.497 0.385 
1NNX 93 0.834 0.795 0.789 0.631 0.517 
1NOA 113 0.808 0.622 0.604 0.615 0.485 
1OPD 85 0.607 0.555 0.409 0.398 0.796 
1QAU 112 0.786 0.678 0.672 0.620 0.533 
1R7J 90 0.859 0.789 0.621 0.368 0.078 
1UHA 83 0.838 0.726 0.665 0.638 0.308 
1ULR 87 0.718 0.639 0.594 0.495 0.223 
1USM 77 0.819 0.832 0.809 0.798 0.780 
1V05 96 0.841 0.629 0.599 0.632 0.389 
1W2L 97 0.747 0.691 0.564 0.397 0.432 
1X3O 80 0.787 0.600 0.559 0.654 0.453 
1Z21 96 0.725 0.662 0.638 0.433 0.289 
1ZVA 75 0.911 0.756 0.579 0.690 0.579 
2BF9 36 0.714 0.606 0.554 0.680 0.521 
2BRF 100 0.873 0.795 0.764 0.710 0.535 
2CE0 99 0.824 0.706 0.598 0.529 0.628 
2E3H 81 0.794 0.692 0.682 0.605 0.632 
2EAQ 89 0.817 0.753 0.690 0.695 0.688 
2EHS 75 0.805 0.720 0.713 0.747 0.565 
2FQ3 85 0.844 0.719 0.692 0.348 0.508 
2IP6 87 0.841 0.654 0.578 0.572 0.826 
2MCM 113 0.867 0.789 0.713 0.639 0.643 
2NUH 104 0.922 0.835 0.691 0.771 0.685 
2PKT 93 0.762 0.162 0.003 −0.193 −0.165 
2PLT 99 0.635 0.508 0.484 0.509 0.187 
2QJL 99 0.611 0.594 0.584 0.594 0.497 
2RB8 93 0.840 0.727 0.614 0.517 0.485 
3BZQ 99 0.848 0.532 0.516 0.466 0.351 
5CYT 103 0.548 0.441 0.421 0.331 0.102 
TABLE V.

Correlation coefficients for B-factor prediction obtained by MWCG, optimal FRI (opFRI), parameter-free FRI (pfFRI), the Gaussian network model (GNM), and normal mode analysis (NMA) for large-size structures. Results for opFRI and pfFRI are taken from Opron et al.35 GNM and NMA values are taken from the coarse-grained (Cα) results reported in Park et al.33 MWCG results are parameter free and use all C, N, and O atoms to predict Cα B-factors.

PDB ID  N  MWCG  opFRI  pfFRI  GNM  NMA
1AHO 64 0.768 0.698 0.625 0.562 0.339 
1ATG 231 0.843 0.613 0.578 0.497 0.154 
1BYI 224 0.600 0.543 0.491 0.552 0.133 
1CCR 111 0.741 0.580 0.512 0.351 0.530 
1E5K 188 0.848 0.746 0.732 0.859 0.620 
1EW4 106 0.804 0.650 0.644 0.547 0.447 
1IFR 113 0.875 0.697 0.689 0.637 0.330 
1NKO 122 0.831 0.619 0.535 0.368 0.322 
1NLS 238 0.799 0.669 0.530 0.523 0.385 
1O08 221 0.516 0.562 0.333 0.309 0.616 
1PMY 123 0.701 0.671 0.654 0.685 0.702 
1PZ4 114 0.921 0.828 0.781 0.843 0.844 
1QTO 122 0.809 0.543 0.520 0.334 0.725 
1RRO 112 0.748 0.435 0.372 0.529 0.546 
1UKU 102 0.765 0.665 0.661 0.742 0.720 
1V70 105 0.854 0.622 0.492 0.162 0.285 
1WBE 204 0.767 0.591 0.577 0.549 0.574 
1WHI 122 0.804 0.601 0.539 0.270 0.414 
1WPA 107 0.797 0.634 0.577 0.417 0.380 
2AGK 233 0.821 0.705 0.694 0.512 0.514 
2C71 205 0.773 0.658 0.649 0.560 0.584 
2CG7 90 0.738 0.551 0.539 0.379 0.308 
2CWS 227 0.756 0.647 0.640 0.696 0.524 
2HQK 213 0.897 0.824 0.809 0.365 0.743 
2HYK 238 0.728 0.585 0.575 0.510 0.593 
2I24 113 0.672 0.593 0.498 0.494 0.441 
2IMF 203 0.798 0.652 0.625 0.514 0.401 
2PPN 107 0.673 0.677 0.638 0.668 0.468 
2R16 176 0.640 0.582 0.495 0.618 0.411 
2V9V 135 0.697 0.555 0.548 0.528 0.594 
2VIM 104 0.859 0.413 0.393 0.212 0.221 
2VPA 204 0.757 0.763 0.755 0.576 0.594 
2VYO 210 0.777 0.675 0.648 0.729 0.739 
3SEB 238 0.879 0.801 0.712 0.826 0.720 
3VUB 101 0.852 0.625 0.610 0.607 0.365 

A comparison of the average correlation coefficients for small, medium, and large proteins, as well as the protein superset, is provided in Table VI. GNM is more accurate than NMA, as analyzed by Park et al.33 Both opFRI and pfFRI are more accurate than GNM. On the superset, the proposed MWCG is about 28% more accurate than pfFRI and 42% more accurate than GNM.
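The two percentages quoted above can be reproduced from the superset averages in Table VI. The following is a minimal check, assuming that "more accurate" means the relative increase in the average correlation coefficient; the variable and function names are illustrative only.

```python
# Superset averages of the Pearson correlation coefficient, copied from Table VI.
superset_avg = {"MWCG": 0.803, "pfFRI": 0.626, "GNM": 0.565}


def relative_gain(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Gain of MWCG over each of the other methods (assumed definition, see lead-in).
gains = {
    name: relative_gain(superset_avg["MWCG"], value)
    for name, value in superset_avg.items()
    if name != "MWCG"
}

for name, gain in gains.items():
    print(f"MWCG vs {name}: {gain:.0f}% higher average correlation")
```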

TABLE VI.

Average correlation coefficients for Cα B-factor prediction with MWCG, FRI, GNM, and NMA for three structure sets from the work of Park et al.33 and a superset of 364 structures. Results for opFRI and pfFRI are taken from Opron et al.35 GNM and NMA values are taken from the coarse-grained (Cα) results reported in Park et al.33 MWCG results are parameter free and use all C, N, and O atoms to predict Cα B-factors.

PDB set  MWCG  opFRI (Ref. 35)  pfFRI (Ref. 35)  GNM  NMA (Ref. 33)
Small  0.921  0.667  0.594  0.541 (Ref. 33)  0.480
Medium  0.795  0.664  0.605  0.550 (Ref. 33)  0.482
Large  0.775  0.636  0.591  0.529 (Ref. 33)  0.494
Superset  0.803  0.673  0.626  0.565 (Ref. 35)  NA

Lastly, Table VII displays the average correlation coefficient using MWCG to predict the B-factors of Cα, non-Cα carbon, nitrogen, oxygen, and sulfur atoms. Note that these predictions were not available for earlier GNM and FRI methods.

TABLE VII.

Correlation coefficients for Cα, non-Cα carbon, nitrogen, oxygen, and sulfur using parameter-free MWCG. Only 215 of the 364 proteins contain sulfur atoms.

Subset  Cα  Non-Cα carbon  Nitrogen  Oxygen  Sulfur
Average 0.803 0.744 0.812 0.789 0.903 
No. of proteins 364 364 364 364 215 

Protein hinge regions play an important role in enzymatic catalysis due to their flexibility. A flexible active site is more likely to accommodate binding ligands or partners. Protein hinges are also important in protein domain separation. As such, the characterization of protein hinges is a valuable application of flexibility methodologies, and our method can be applied as a protein hinge detection tool. We consider calmodulin, a good example of a hinge that affects both the structure and function. We compare the experimental and predicted B-factors of the Cα atoms of three proteins: 1CLL, 1WHI, and 2HQK. Experimental observations are compared with predictions from the single-scale weighted colored graph (WCG) and GNM. To illustrate the utility of the element-specific feature of MWCG, WCG uses only one scale, so that it does not take advantage of the multiscale ability of MWCG. For PDB ID 1CLL, we also include the full MWCG prediction for comparison. The WCG results are generated with the CC, CN, and CO kernels of the exponential type with fixed parameters κ = 1 and η = 3 Å. The results are displayed in Figs. 6–8.
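The single-scale, element-specific construction used for these comparisons can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the generalized exponential kernel exp(-(r/η)^κ), takes the inverse of each subgraph rigidity index as a flexibility feature, and fits the three features (CC, CN, CO) to experimental B-factors by ordinary least squares. All function names and the random toy coordinates are hypothetical.

```python
import numpy as np


def exponential_kernel(r, eta=3.0, kappa=1.0):
    """Generalized exponential kernel exp(-(r/eta)**kappa); eta in angstroms."""
    return np.exp(-(r / eta) ** kappa)


def subgraph_rigidity(targets, sources, eta=3.0, kappa=1.0):
    """Rigidity index of each target atom with respect to one element subgraph.

    targets: (n, 3) coordinates of the atoms to be predicted (e.g., C-alpha).
    sources: (m, 3) coordinates of all atoms of one element type (C, N, or O).
    """
    diff = targets[:, None, :] - sources[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    weights = exponential_kernel(dist, eta, kappa)
    # Drop self-interactions (zero distance), which would only add a constant.
    weights[dist < 1e-8] = 0.0
    return weights.sum(axis=1)


def wcg_features(ca_xyz, atoms_by_element, eta=3.0, kappa=1.0):
    """One flexibility feature (1 / rigidity) per element subgraph: CC, CN, CO."""
    cols = [
        1.0 / subgraph_rigidity(ca_xyz, atoms_by_element[element], eta, kappa)
        for element in ("C", "N", "O")
    ]
    return np.column_stack(cols)


def fit_b_factors(features, b_exp):
    """Least-squares fit of B-factors to the features plus a constant term."""
    design = np.column_stack([features, np.ones(len(features))])
    coeffs, *_ = np.linalg.lstsq(design, b_exp, rcond=None)
    return design @ coeffs


# Toy data only: random points stand in for atomic coordinates and B-factors.
rng = np.random.default_rng(0)
ca_xyz = rng.uniform(0.0, 15.0, size=(30, 3))
atoms_by_element = {
    "C": np.vstack([ca_xyz, rng.uniform(0.0, 15.0, size=(60, 3))]),
    "N": rng.uniform(0.0, 15.0, size=(30, 3)),
    "O": rng.uniform(0.0, 15.0, size=(30, 3)),
}
b_exp = rng.uniform(5.0, 40.0, size=30)

features = wcg_features(ca_xyz, atoms_by_element, eta=3.0, kappa=1.0)
b_pred = fit_b_factors(features, b_exp)
```

With real data, `ca_xyz` and `atoms_by_element` would be read from a PDB file and `b_exp` from its temperature-factor column. Adding further kernels with different η values would extend the feature matrix in the multiscale direction.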

FIG. 6.

Top: from left to right, the structure of calmodulin (PDB ID: 1CLL) visualized in Visual Molecular Dynamics (VMD)18 and colored by experimental B-factors, MWCG-predicted B-factors, WCG-predicted B-factors, and GNM-predicted B-factors, with red representing the most flexible regions. Bottom: the experimental (Exp) and predicted B-factor values plotted per residue for PDB ID 1CLL. GNM denotes predictions from the Gaussian network model with a cutoff distance of 7 Å; GNM clearly misses the flexible hinge region. WCG is parametrized using CC, CN, and CO kernels of the exponential type with fixed parameters κ = 1 and η = 3 Å. MWCG represents B-factor predictions determined from the MWCG method using the fixed parameters listed in Table I.

FIG. 7.

Top: a visual comparison of experimental B-factors (left), WCG-predicted B-factors (middle), and GNM-predicted B-factors (right) for the ribosomal protein L14 (PDB ID: 1WHI). Bottom: the experimental and predicted B-factor values plotted per residue. GNM represents predicted B-factors using GNM with a cutoff distance of 7 Å. WCG is parametrized using CC, CN, and CO kernels of the exponential type with fixed parameters κ = 1 and η = 3 Å.

FIG. 8.

Top: a visual comparison of experimental B-factors (left), WCG-predicted B-factors (middle), and GNM-predicted B-factors (right) for the engineered cyan fluorescent protein, mTFP1 (PDB ID: 2HQK). Bottom: the experimental (Exp) and predicted B-factor values plotted per residue for PDB ID 2HQK. GNM denotes predictions from the Gaussian network model with a cutoff distance of 7 Å. WCG is parametrized using CC, CN, and CO kernels of the exponential type with fixed parameters κ = 1 and η = 3 Å.

When compared to opFRI, pfFRI, GNM, and NMA, the proposed MWCG method provides a significantly better Pearson correlation coefficient. For most proteins in the data set, MWCG improves upon opFRI. On a set of 364 proteins, MWCG is about 28% and 42% more accurate than pfFRI and GNM, respectively. However, the parameters of the current MWCG method were not fully optimized, and a grid search of MWCG parameters is expected to provide even better results. Mathematically, as far as B-factor prediction is concerned, GNM can be regarded as an algebraic graph theory approximation to the graph centrality of geometric graph theory, namely, FRI. In fact, it is a computationally more expensive approximation. Consequently, FRI methods are more accurate and efficient than GNM. Using multiscale analysis, graph coloring, and subgraph division, MWCG is able to significantly outperform all the earlier GNM and FRI methods.
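All accuracies in this comparison are Pearson correlation coefficients between the predicted and experimental B-factors of one structure. As a reminder of the metric, the following is a minimal sketch; the helper name and the toy values are illustrative and are not data from this study.

```python
import math


def pearson_cc(predicted, experimental):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov / math.sqrt(var_p * var_e)


# Toy values (not taken from any structure in this study).
b_exp = [12.0, 15.5, 30.2, 22.1, 18.4]
b_pred = [0.8, 1.1, 2.4, 1.5, 1.4]

cc = pearson_cc(b_pred, b_exp)
# The coefficient is unchanged by a positive linear rescaling of the prediction,
# so a flexibility profile in arbitrary units can be compared with B-factors.
cc_rescaled = pearson_cc([10.0 * b + 3.0 for b in b_pred], b_exp)
print(f"CC = {cc:.3f}, after rescaling = {cc_rescaled:.3f}")
```

Because the coefficient is invariant under such rescaling, it measures how well the shape of the predicted profile follows the experimental one rather than agreement in absolute magnitude.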

Unlike the earlier methods, the MWCG method is not limited to B-factor prediction of Cα atoms. Due to graph coloring and subgraph division techniques, MWCG can also accurately predict the B-factors of other heavy atoms. The results in Table VII show that the method reliably predicts the B-factors of non-Cα carbon, nitrogen, oxygen, and sulfur atoms.

The correlation maps generated in this study provide evidence that, using MWCG, one can recognize tertiary structures from a contact map built not only from Cα atoms but also from nearby double-bonded carbonyl oxygen and amine nitrogen atoms. In this study, we construct nitrogen- and oxygen-based protein correlation maps using the amine nitrogen and the double-bonded carbonyl oxygen from each amino acid. In particular, an alpha helix can be clearly observed in the correlation maps as a thicker band along the diagonal, as seen in Figs. 2, 4, and 5. The parallel bands in the correlation maps for 5IIV seen in Fig. 4 indicate the interaction between the left and right alpha helices. In Figs. 3 and 5, the bands perpendicular to the diagonal represent the interaction between the anti-parallel beta sheets for proteins 1KGM and 3PSM.
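The thicker diagonal band produced by a helix can be illustrated with a toy calculation. The sketch below assumes that a correlation map is the matrix of kernel values between all pairs of atoms of one type, with the generalized exponential kernel and η = 3 Å chosen for illustration. The idealized helix and strand geometries and all function names are hypothetical stand-ins for real coordinates.

```python
import math


def kernel(r, eta=3.0, kappa=1.0):
    """Generalized exponential kernel exp(-(r/eta)**kappa)."""
    return math.exp(-((r / eta) ** kappa))


def correlation_map(coords, eta=3.0, kappa=1.0):
    """Symmetric matrix of kernel values between all pairs of atoms."""
    return [[kernel(math.dist(a, b), eta, kappa) for b in coords] for a in coords]


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Toy alpha-helix-like trace: one point per residue on a helix."""
    step = math.radians(turn_deg)
    return [
        (radius * math.cos(i * step), radius * math.sin(i * step), rise * i)
        for i in range(n)
    ]


def extended_strand(n, rise=3.3):
    """Toy extended trace: one point per residue on a straight line."""
    return [(0.0, 0.0, rise * i) for i in range(n)]


helix_map = correlation_map(ideal_helix(12))
strand_map = correlation_map(extended_strand(12))

# Residues i and i+4 are much closer in a helix than in an extended strand,
# so the helix keeps large values further from the diagonal (a thicker band).
print(f"helix  (0, 4): {helix_map[0][4]:.3f}")
print(f"strand (0, 4): {strand_map[0][4]:.3f}")
```

Off-diagonal bands arise in the same way: any two segments that are distant in sequence but close in space produce large kernel values away from the diagonal.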

Ribosomal protein L14 (PDB ID: 1WHI) is an important component of the 60S ribosomal subunit. Structurally, the protein is diverse, containing an alpha helix, a beta-barrel, parallel beta strands, and a beta-hairpin motif. Because of its hard cutoff, GNM underpredicts B-factors in stiff areas and overpredicts them in flexible regions. This result is typical of GNM, as the hard cutoff required in the Kirchhoff matrix (i.e., Laplacian matrix) of GNM overemphasizes the interactions near the cutoff. This behavior is seen in Figs. 6–8. Figure 6 compares the B-factor predictions for calcium-bound calmodulin. GNM fails to predict the hinge region near the 75th residue. The single-kernel weighted colored graph does show a peak in this region, though it underestimates the magnitude of the flexibility there. The multiscale property of MWCG captures the hinge in this region even better. Figure 7 shows the predicted Cα B-factors of the cyan fluorescent protein. Here the GNM prediction contains a large error near the 60th residue. Changing the GNM cutoff slightly reduces this error, but the error persists regardless of the chosen cutoff. This region corresponds to a small alpha-helical segment within a beta-barrel. In this region, there are at least two important interaction scales, and GNM fails to take both into account. The figure shows that the single-kernel weighted colored graph captures these interactions accurately, indicating that the element specificity accounts for at least some of the multiscale interaction.
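The sensitivity of a hard cutoff to small changes in distance, compared with a smooth kernel, can be seen in a few lines of Python. This is a hedged illustration: the cutoff of 7 Å, the exponential kernel, the two scales in `etas`, and the unweighted sum used in `multiscale_weight` are assumptions for the example; in MWCG the contributions of different scales are combined by fitting rather than by a plain sum.

```python
import numpy as np

def hard_cutoff_weight(dist, cutoff=7.0):
    """GNM-style contact weight: 1 inside the cutoff, 0 outside."""
    return (np.asarray(dist, dtype=float) <= cutoff).astype(float)

def exponential_kernel(dist, eta, kappa=1.0):
    """Smooth, monotonically decaying weight with length scale eta."""
    return np.exp(-((np.asarray(dist, dtype=float) / eta) ** kappa))

def multiscale_weight(dist, etas=(3.0, 12.0)):
    """Plain sum of a short-range and a long-range kernel (illustrative only)."""
    return sum(exponential_kernel(dist, eta) for eta in etas)

# Two atom pairs whose separations differ by only 0.2 angstrom.
for r in (6.9, 7.1):
    print(f"r = {r:.1f}  hard cutoff: {float(hard_cutoff_weight(r)):.0f}  "
          f"multiscale kernel: {float(multiscale_weight(r)):.4f}")
```

A pair just inside the cutoff counts fully and a pair just outside counts not at all, whereas the kernel weights of the two pairs are nearly equal; the second, longer scale also keeps distant interactions from being discarded entirely.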

The work presented here demonstrates the efficacy of modifying the FRI method to include element-specific interactions between other heavy atoms. Compared to the optimized FRI method, the MWCG method provides an 18% increase in the average Pearson correlation coefficient. Moreover, even the single-kernel element-specific FRI provides encouraging results, as seen in Figs. 6–8. The new oxygen- and nitrogen-based correlation maps provide fresh insight into protein topology and may be useful for future work. To the authors’ knowledge, no other method outperforms the current algorithm for Cα B-factor prediction.

Any regression method is prone to overfitting. Table VI shows that the MWCG method has a particularly high Pearson correlation coefficient for small proteins, while the results for the medium and large sets are more consistent with one another. This discrepancy is likely due to overfitting and could be addressed by using fewer correlation kernels. A similar problem exists with the sulfur atom B-factor prediction, as suggested by the large Pearson correlation coefficient in Table VII. Future work will address this issue by combining machine learning and cross-validation techniques to provide a method robust against overfitting. Unlike the FRI methods discussed here, the machine learning approach has the added advantage of blind B-factor prediction.
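The overfitting concern can be made concrete with k-fold cross-validation of a linear least-squares fit. In the Python sketch below, the features and targets are random numbers, so any in-sample correlation is spurious by construction. The sample size, number of features, number of folds, and function names are assumptions for the example and do not correspond to the data sets of this study.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares coefficients for y ~ X plus a constant term."""
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply fitted coefficients (last entry is the constant term)."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

def kfold_predictions(X, y, k=5, seed=0):
    """Predict every sample from a model fitted without that sample's fold."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    held_out = np.empty(len(y), dtype=float)
    for fold in np.array_split(order, k):
        train = np.setdiff1d(order, fold)
        coef = fit_linear(X[train], y[train])
        held_out[fold] = predict_linear(coef, X[fold])
    return held_out

# Few samples and many features, as for a small protein with many kernels.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 9))
y = rng.normal(size=20)            # targets unrelated to the features

in_sample = np.corrcoef(y, predict_linear(fit_linear(X, y), X))[0, 1]
cross_val = np.corrcoef(y, kfold_predictions(X, y))[0, 1]
print("in-sample Pearson correlation:     ", in_sample)
print("cross-validated Pearson correlation:", cross_val)
```

With few samples relative to the number of fitted coefficients, the in-sample correlation is inflated even though the features carry no information, while the held-out correlation gives a more honest estimate. Reducing the number of kernels lowers the number of coefficients and narrows this gap.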

Despite much effort, protein flexibility analysis remains a challenge due to low accuracy in B-factor prediction. For a large set of proteins, none of the current methods delivers an average Pearson correlation coefficient as high as 0.7 for protein B-factor prediction, a level of accuracy that is unreliable for practical applications to hinge detection, domain separation, docking analysis, and entropy calculations. Additionally, earlier methods cannot simultaneously predict the flexibility of different types of atoms in a molecule. This work introduces a geometric graph model, multiscale weighted colored graph (MWCG), to address the aforementioned difficulties and significantly improve the current state-of-the-art approaches in protein flexibility analysis. The weighted colored graph theory describes pairwise interactions near an atom in the protein network. These interactions are organized according to their element types, which leads to subgraphs. The rigidity of each node is represented by subgraph centralities computed at various scales. The present method is validated on a few standard data sets, including relatively small, medium, and large proteins. An extensive comparison is made with a number of standard methods, such as GNM, NMA, and various FRI models. We show that the present MWCG is over 40% more accurate than GNM and delivers an average Pearson correlation coefficient as high as 0.8 in B-factor prediction for 364 proteins, which offers a reliable method for protein flexibility analysis and various applications. MWCG also offers accurate predictions for all atoms in a protein data set. The proposed method can be used to improve the algebraic graph theory analysis of biomolecular algebraic connectivity. A drawback of the present minimization scheme is that it is subject to overfitting in predicting the B-factors of small proteins. This problem can be addressed by using advanced machine learning algorithms.

This work was supported in part by NSF Grant Nos. DMS-1721024 and IIS-1302285 and the MSU Center for Mathematical Molecular Biosciences Initiative.

1. H. Frauenfelder, S. G. Sligar, and P. G. Wolynes, “The energy landscapes and motions of proteins,” Science 254, 1598–1603 (1991).
2. J. P. Ma, “Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes,” Structure 13, 373–380 (2005).
3. N. Go, T. Noguti, and T. Nishikawa, “Dynamics of a small globular protein in terms of low-frequency vibrational modes,” Proc. Natl. Acad. Sci. U. S. A. 80, 3696–3700 (1983).
4. M. Tasumi, H. Takeuchi, S. Ataka, A. M. Dwivedi, and S. Krimm, “Normal vibrations of proteins: Glucagon,” Biopolymers 21, 711–714 (1982).
5. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. States, S. Swaminathan, and M. Karplus, “CHARMM: A program for macromolecular energy, minimization, and dynamics calculations,” J. Comput. Chem. 4, 187–217 (1983).
6. M. Levitt, C. Sander, and P. S. Stern, “Protein normal-mode dynamics: Trypsin inhibitor, crambin, ribonuclease and lysozyme,” J. Mol. Biol. 181(3), 423–447 (1985).
7. M. M. Tirion, “Large amplitude elastic motions in proteins from a single-parameter, atomic analysis,” Phys. Rev. Lett. 77, 1905–1908 (1996).
8. A. R. Atilgan, S. R. Durell, R. L. Jernigan, M. C. Demirel, O. Keskin, and I. Bahar, “Anisotropy of fluctuation dynamics of proteins with an elastic network model,” Biophys. J. 80, 505–515 (2001).
9. I. Bahar, A. R. Atilgan, and B. Erman, “Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential,” Folding Des. 2, 173–181 (1997).
10. I. Bahar, A. R. Atilgan, M. C. Demirel, and B. Erman, “Vibrational dynamics of proteins: Significance of slow and fast modes in relation to function and stability,” Phys. Rev. Lett. 80, 2733–2736 (1998).
11. L. W. Yang and C. P. Chng, “Coarse-grained models reveal functional dynamics–I. Elastic network models–theories, comparisons and perspectives,” Bioinf. Biol. Insights 2, 25–45 (2008).
12. D. J. Jacobs, A. J. Rader, L. A. Kuhn, and M. F. Thorpe, “Protein flexibility predictions using graph theory,” Proteins: Struct., Funct., Genet. 44, 150–165 (2001).
13. F. Tama, F. X. Gadea, O. Marques, and Y. H. Sanejouand, “Building-block approach for determining low-frequency normal modes of macromolecules,” Proteins: Struct., Funct., Bioinf. 41(1), 1–7 (2000).
14. O. N. A. Demerdash and J. C. Mitchell, “Density-cluster NMA: A new protein decomposition technique for coarse-grained normal mode analysis,” Proteins: Struct., Funct., Bioinf. 80, 1766–1779 (2012).
15. S. Kundu, J. S. Melton, D. C. Sorensen, and G. N. Phillips, Jr., “Dynamics of proteins in crystals: Comparison of experiment with simple models,” Biophys. J. 83, 723–732 (2002).
16. D. A. Kondrashov, A. W. Van Wynsberghe, R. M. Bannen, Q. Cui, and G. N. Phillips, Jr., “Protein structural variation in computational models and crystallographic data,” Structure 15, 169–177 (2007).
17. G. Song and R. L. Jernigan, “vGNM: A better model for understanding the dynamics of proteins in crystals,” J. Mol. Biol. 369(3), 880–893 (2007).
18. D. R. Livesay, S. Dallakyan, G. G. Wood, and D. J. Jacobs, “A flexible approach for understanding protein stability,” FEBS Lett. 576, 468–476 (2004).
19. Z. N. Gerek and S. B. Ozkan, “A flexible docking scheme to explore the binding selectivity of PDZ domains,” Protein Sci. 19, 914–928 (2010).
20. C. Xu, D. Tobi, and I. Bahar, “Allosteric changes in protein structure computed by a simple mechanical model: Hemoglobin T ↔ R2 transition,” J. Mol. Biol. 333, 153–168 (2003).
21. W. J. Zheng and S. Doniach, “A comparative study of motor-protein motions by using a simple elastic-network model,” Proc. Natl. Acad. Sci. U. S. A. 100(23), 13253–13258 (2003).
22. Q. Cui, G. J. Li, J. Ma, and M. Karplus, “A normal mode analysis of structural plasticity in the biomolecular motor F(1)-ATPase,” J. Mol. Biol. 340(2), 345–372 (2004).
23. O. Keskin, I. Bahar, D. Flatow, D. G. Covell, and R. L. Jernigan, “Molecular mechanisms of chaperonin GroEL-GroES function,” Biochemistry 41, 491–501 (2002).
24. W. Zheng, B. R. Brooks, and D. Thirumalai, “Allosteric transitions in the chaperonin GroEL are captured by a dominant normal mode that is most robust to sequence variations,” Biophys. J. 93, 2289–2299 (2007).
25. A. J. Rader, D. H. Vlad, and I. Bahar, “Maturation dynamics of bacteriophage HK97 capsid,” Structure 13, 413–421 (2005).
26. F. Tama and C. K. Brooks III, “Diversity and identity of mechanical properties of icosahedral viral capsids studied with elastic network normal mode analysis,” J. Mol. Biol. 345, 299–314 (2005).
27. F. Tama, M. Valle, J. Frank, and C. K. Brooks III, “Dynamic reorganization of the functionally active ribosome explored by normal mode analysis and cryo-electron microscopy,” Proc. Natl. Acad. Sci. U. S. A. 100(16), 9319–9323 (2003).
28. Y. Wang, A. J. Rader, I. Bahar, and R. L. Jernigan, “Global ribosome motions revealed with elastic network model,” J. Struct. Biol. 147, 302–314 (2004).
29. L. Skjaerven, S. M. Hollup, and N. Reuter, “Normal mode analysis for proteins,” J. Mol. Struct.: THEOCHEM 898, 42–48 (2009).
30. Q. Cui and I. Bahar, Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems (Chapman and Hall/CRC, 2010).
31. N. Trinajstic, Chemical Graph Theory (CRC Press, Boca Raton, 1983).
32. H. P. Schultz, “Topological organic chemistry. 1. Graph theory and topological indices of alkanes,” J. Chem. Inf. Comput. Sci. 29(3), 227–228 (1989).
33. J. K. Park, R. Jernigan, and Z. Wu, “Coarse grained normal mode analysis vs. refined Gaussian network model for protein residue-level structural fluctuations,” Bull. Math. Biol. 75, 124–160 (2013).
34. K. L. Xia, K. Opron, and G. W. Wei, “Multiscale multiphysics and multidomain models—Flexibility and rigidity,” J. Chem. Phys. 139, 194109 (2013).
35. K. Opron, K. L. Xia, and G. W. Wei, “Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis,” J. Chem. Phys. 140, 234105 (2014).
36. K. Opron, K. L. Xia, Z. Burton, and G. W. Wei, “Flexibility-rigidity index for protein-nucleic acid flexibility and fluctuation analysis,” J. Comput. Chem. 37, 1283–1295 (2016).
37. D. D. Nguyen, K. L. Xia, and G. W. Wei, “Generalized flexibility-rigidity index,” J. Chem. Phys. 144, 234106 (2016).
38. K. Opron, K. L. Xia, and G. W. Wei, “Communication: Capturing protein multiscale thermal fluctuations,” J. Chem. Phys. 142, 211101 (2015).
39. K. L. Xia, K. Opron, and G. W. Wei, “Multiscale Gaussian network model (mGNM) and multiscale anisotropic network model (mANM),” J. Chem. Phys. 143, 204106 (2015).
40. M. Newman, Networks: An Introduction (Oxford University Press, 2010).
41. A. Bavelas, “Communication patterns in task-oriented groups,” J. Acoust. Soc. Am. 22(6), 725–730 (1950).
42. A. Dekker, “Conceptual distance in social network analysis,” J. Soc. Struct. 6 (2005); http://www.citeulike.org/group/1732/article/913975