Allostery is a constitutive, albeit often elusive, feature of biomolecular systems, which heavily determines their functioning. Its mechanical, entropic, long-range, ligand, and environment-dependent nature creates far from trivial interplays between residues and, in general, the secondary structure of proteins. This intricate scenario is mirrored in computational terms as different notions of “correlation” among residues and pockets can lead to different conclusions and outcomes. In this article, we put on a common ground and challenge three computational approaches for the correlation estimation task and apply them to three diverse targets of pharmaceutical interest: the androgen A2A receptor, the androgen receptor, and the EGFR kinase domain. Results show that partial results consensus can be attained, yet different notions lead to pointing the attention to different pockets and communications.
I. INTRODUCTION
To exert their function, biomolecular systems often require concerted, either global or local, conformational changes. Communication networks within these systems hence arise and represent constitutive features, which collectively go under the name of allosteric regulation.1–3 Since its historical conception,4,5 the term allostery has evolved to indicate any type of long-distance signaling caused by a local perturbation at a distal site, including protein–protein interaction, covalent modifications, mutations, and the classic binding of small molecule effectors at sites other than the orthosteric one.2–5 The result of allosteric regulation is the modulation of biomolecular activity.1,5,6 Despite being conceptually straightforward, the mechanism through which allostery is accomplished can be highly complex. Indeed, several models have been proposed.2–5 Due to this state of affairs, allosteric pathways remain nowadays largely uncharacterized.7 Besides the remarkable implications for fundamental science, characterizing allosteric mechanisms can have practical outcomes of great significance. In drug discovery, for instance, potential advantages associated with the development of allosteric modulators include, among others, high selectivity, safer dosage, and reduced risk of toxicity or side effects.8–10 From an experimental point of view, allostery is rather challenging to investigate through the most commonly used methods, including x-ray crystallography, NMR, and mutagenesis.1,7,11 Additionally, studies to assess allosteric mechanisms are typically conducted to compare the structural dynamics of biomolecules in the presence and absence of the triggering effector promoting the allosteric event. This is also typically the case for computational studies,6–8,10,12 where simulations and analyses of a biomolecular target are, for instance, done both in the presence and in the absence of ligands.13–15 The simulative outcomes are then compared to disentangle the relevant allosteric hotspots. In a dynamics-based view of allostery,2–4,6,16 structural fluctuations in the biomolecule can account for intramolecular communication between distinct binding sites2,17 and can hold true even in situations where no ligand may be present (apo protein).2,6 This suggests that, in some cases, hallmarks of allostery can be an intrinsic feature of biomolecules,18–20 which could be appreciated also in the apo forms.
In this paper, we employ computational methods to study pocket intramolecular communication in different biomolecules in the absence of ligands. We compare the results of three different methods by applying them to the same, diverse set of relevant biomolecules that are reported to display allosteric behaviors. Specifically, we study three pharmaceutically relevant targets: the adenosine A2A receptor (A2A), the androgen receptor (AR), and the epidermal growth factor receptor (EGFR) kinase domain. A2A is a transmembrane receptor involved in pathological conditions, such as inflammation, cancer, and Parkinson’s disease,21,22 AR is a nuclear receptor that plays a crucial role in the growth and progression of prostate cancer,23 while the EGFR plays crucial roles in the signaling pathways controlling cell growth, differentiation, and survival.24 The goal is to assess to what extent some information about allosteric communication can be retrieved from the apo simulations by the three methods, also comparing their outcomes. Our procedure is based on the combination of molecular dynamics (MD) simulations25 with graph theory to achieve a more intuitive interpretation of the results. Figure 1 provides a schematic representation of the pipeline we followed.
First, we perform μs-long plain MD simulations. We then identify pockets on the MD-sampled structures using a pocket-tracking algorithm named Pocketron.13 We further analyze correlations between the identified pockets through three different computational strategies: a pocket crosstalk analysis,13 a Dynamic Network analysis,26 and a distance fluctuation analysis.15 Finally, to allow for a more intuitive interpretation of the results and an effective comparison among the three different approaches, we visualize their results in the form of network diagrams, where we visualize the highest-ranking communications.
II. METHODS
A. MD simulations
Molecular dynamics simulations were conducted on three distinct systems: the androgen A2A receptor (modeled as reported in a previous study13 using PDB IDs 3UZC and 4EIY), the androgen receptor (PDB ID: 1T63), and the EGFR kinase domain (PDB ID: 2JIT). For the EGFR system, the MD trajectory was taken from Ref. 27. To prepare the systems, ligands were removed, while crystal water molecules and ions were retained. The systems were simulated using the ff99SB-ILDN force field,28 with the TIP3P model29 for water molecules and using a cubic box, resulting in systems of around 65 000, 75 000, and 61 000 atoms for A2A, AR, and EGFR, respectively. Na+ and Cl− ions were added for neutralization. All systems were prepared with BiKi Life Sciences.30 The systems were equilibrated in both the NVT, through three subsequent stages at 100, 200, and 300 K, and the NPT ensembles to achieve a temperature of 300 K and pressure of 1 bar. The A2A system, embedded in a POPC (1-palmitoyl-2-oleoylphosphatidylcholine) membrane, underwent a longer equilibration in the NPT ensemble (Fig. S1). Specifically, this system was simulated in the NPT ensemble for 50 ns with restraints on the heavy atoms of the protein, followed by an additional 50 ns simulation without restraints to fully relax the system. Subsequently, molecular dynamic simulations were performed on all three systems for a total of 3 µs for the A2A Receptor and EGFR and 2.6 µs for the androgen receptor. Simulations were conducted using GROMACS 2021.431 with a time step of 2 fs. From the simulations of each system, 50 000 frames were extracted (about 48 000 for the androgen receptor) for pocket detection and tracking through the Pocketron algorithm13 available in BiKi Life Sciences.
B. Pocketron’s pocket crosstalk analysis
Pocketron13 is an algorithm that allows tracking pockets on a molecular dynamics trajectory. The static (per frame) pocket detection algorithm at its basis is NanoShaper32 (freely available at https://gitlab.iit.it/SDecherchi/nanoshaper/) and uses the solvent excluded surface (SES) or Connolly-Richards surface as a basis. This surface is created by rolling a spherical probe over the van der Waals surface of the system. Pockets are identified by comparing the regions enclosed by the SESs obtained with two different probe radii (by default 1.4 and 3 Å). Retained pockets are those with volume values over 34.5 Å3, the equivalent volume of 3 water molecules. The algorithm tracks the atoms within pockets as they change over time in the trajectory. The algorithm tracks pockets through the assignment of unique pocket identifiers (pIDs) to each pocket identified in the first frame, as defined by their constituent atoms. These initial pockets are stored in a list, and for each subsequent frame, the algorithm recalculates the pockets and compares them to the stored ones. In instances where there is a discrepancy between the newly calculated pockets and those stored, the current pocket is added as a new entry in the stored pockets list and assigned a new pID. The algorithm monitors all atom-based events without needing any prior knowledge or structural alignment of the region being analyzed. It tracks the atoms “exchange” between different pockets, which is used as an indicator for “pocket crosstalk.” This analysis includes “merging” and “splitting” events, which are calculated by comparing the sets of atoms within each pocket at different points in time. A “merge” event occurs when atoms that were in separate pockets in a previous frame now belong to the same pocket. A “split” event occurs when atoms that were in a single pocket in a previous frame now belong to multiple pockets. The outcome of the tracking process is a set of pockets ID, the associated atom indices, and a pair of split–merge matrices, which represent the propensity in “exchanging” atoms. As in Ref. 27, to obtain a symmetric and single split–merge matrix of pockets, we first symmetrize the split and merge matrix in isolation and then we get the average matrix. This last matrix is used for the subsequent network analysis. Notably, Pocketron has recently been applied to a structurally complex target (i.e., tubulin) identifying innovative pockets,33 subsequently targeted in the context of an anticancer drug discovery program.33
C. Generalized correlation-based dynamical network analysis
Another method we challenged to study the dynamic motion and communication between residues in a MD simulation is a graph-based approach, named Dynamic Network (DyNet) analysis.26 This network analysis is based on correlated protein motions: it allows modeling of how communication signals travel through the protein structure, thus possibly identifying allosteric pathways.
To construct the protein graph, each node is associated with the alpha-carbon (Cα) of an amino acid residue in the primary sequence. Then, an adjacency matrix (A) is created, which defines the connections (edges) between the nodes.
The connections are determined using a distance cutoff (typically 3.5–5.5 Å) that corresponds to pairs of residues that are at chemical distance. The distances between heavy atoms of the two residues above the cutoff are filtered out. This first geometric cutoff is supplemented by a statistical time-percentage cutoff, which reflects how long two residues are in contact during the MD trajectories. For a fixed cutoff distance, the contacts are evaluated along the trajectories, and their existence is determined based on the percentage of frames (from 65% to 85%) in which these contacts are present.
The generalized mutual information can be used to weight the protein graph. The obtained graph, called dynamical weighted network, models the information exchange within a protein based on its dynamics. To create the dynamical weighted network, the nonzero entries of the adjacency matrix are filled with edge weights wij, which are derived as for all pairs of residues i and j that are connected in the network. Conversely, if two nodes (i and j) are not connected by an edge, the corresponding entry in the matrix (Aij) will be zero. In a graph weighted by wij coefficients, the adjacency matrix Aij represents which pairs of residues exchange information (through correlated motions) and are therefore linked in the dynamical weighted network.
The dynamical weighted network contains information paths and critical nodes that are important for communication within the protein system. Given this architecture, the edge betweenness, EB,35 is defined as the number of shortest pathways (SPs) that cross a given edge, measuring how much this edge is responsible for connecting the other pairs of nodes in the network. The SPs within the protein communication network can be calculated using the Floyd–Warshall algorithm.36 Therefore, the edges with the highest betweennesses are those conveying the highest amount of information exchange.
The EB can be used to find “communities” or, in other words, clusters. These communities can be found according to a clustering procedure.37 In this work, the communities, instead, are directly derived from the residues pocket membership relation; in other words, there is a one-to-one correspondence between pockets and communities. Next, we still take advantage of coarse graining, where linked communities (linked pockets) are connected by edges that are weighted by the inter-communities EB (IEB), i.e., the sum of the EBs associated with the pairs of residues connecting the two communities. The IEBs indicate the strength of the communication flow between pairs of communities and thus represent a simplified way to represent the communication associated with correlated protein motions within a protein.
Practical guidance to perform the dynamical network analysis is implemented in the form of Jupyter notebooks,26 which we used in the present work. The analyses were performed by defining as nodes the Cα carbons of systems (305 for A2A, 250 for AR, and 289 for EGFR), using a distance cutoff of 5.5 Å along with a percentage existence of 75%, and a unique block comprising 50 000 frames.
D. Distance fluctuation analysis
E. Coarse graining of the correlation matrices and construction of the network diagrams
Comparing the results of the three different methods explored in this work is not trivial as they provide results in different forms. Pocketron’s crosstalk analysis produces an M × M matrix where the element ij indicates crosstalk probability between pockets i and j. This crosstalk probability is computed from the number of merging and splitting events occurring during the simulation. The DyNet method generates a weighted network with edge betweenness associated with pairs of nodes corresponding to Cα carbons. Finally, the DF method generates an N × N matrix of pairwise distance fluctuations between the N Cα carbons in the system. To normalize these representations to a common ground, we aggregated the information provided by the DyNet and DF approaches in a pocket-oriented fashion to highlight correlations between entire pockets as done in other works.10,37 Following the procedures reported in these works, we dissected the weighted network generated with DyNet into communities corresponding to the pockets identified by Pocketron. As a result, we obtained an M × M matrix, with M being the number of pockets, where the element ij is the sum of the edge betweenness associated with the pairs of residues connecting pockets i and j. Analogously, the original DF matrix was aggregated into an M × M matrix, where the element ij is the average of the DFs between the residues comprised in pockets i and j.
A practical issue in the construction of the M × M matrices is that the pocket definition by the Pocketron algorithm allows for a residue to be part of two different pockets. This reflects the dynamical nature of pockets, which can undergo merge and split events, as detected by the algorithm. Coarse graining the information produced by the DyNet and DF methods into M × M matrices becomes thus non-straightforward. To address this issue, each time that pockets i and j displayed an overlap in residues, the computation was repeated twice, once assigning to pocket i all the residues in common and once to pocket j. We then took the highest and lowest values in the case of DyNet and DF, respectively, corresponding to the highest correlation in both methods.
For consistency, the M × M matrices obtained with Pocketron and DyNet were also normalized in the range [0,1]. The resulting matrices for the three systems are reported in Figs. S6–S8.
Finally, the correlations contained in the M × M matrices were displayed in the form of network diagrams (Figs. 2, 4, and 6). In our representation, network nodes correspond to pocket identities (pIDs), with radius proportional to the average nonzero volume sampled for the identified pockets during the MD simulations and the color scale indicating pocket persistency, which is defined as the fraction of MD snapshots with nonzero volume. The elements ij of the correlation matrices are represented as edges connecting the nodes corresponding to the involved pockets, with thicker edges indicating higher correlation. For clarity, we displayed in the networks only the highest-ranking (i.e., top 30) correlations for each system, so as to straightforwardly visualize the most relevant communications, and rescaled such edges to the same range for the three systems to improve discrimination between higher and lower correlations. Network diagrams were produced with the NetworkX39 Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. The nodes were projected in a 2D space obtained through multidimensional scaling on the Cartesian coordinates of the pockets’ centers of mass. This allowed obtaining a 2D space that reflected as much as possible the distances of the pockets in the original 3D space. As pockets closeness is preserved, the interpretation of the results is eased.
III. RESULTS
Molecular dynamics simulations were performed on three systems: the adenosine A2A receptor (A2A), the androgen receptor (AR), and the epidermal growth factor receptor (EGFR) kinase domain. 3 µs-long trajectories were generated for each structure once stripped of the respective ligands. Pocketron13 was employed to detect cavities (i.e., pockets) parsing the MD trajectories generated for the three systems. This pocket identification process represented a shared baseline as the residues’ composition was used as the common invariant for subsequent analysis. Indeed, from this pocket definition we can derive, as discussed before, correlation matrices for Pocketron,13 DyNet,26 and DF analysis,15 all of which span the [0,1] domain, where 1 means maximal “correlation.” We obtained, for each system, three such matrices quantifying correlations between the identified pockets (Figs. S6–S8). When analyzing MD simulations, higher values were assigned (i) by Pocketron, to pocket pairs that more often interacted via merge and split events, (ii) by DyNet, to pocket pairs displaying higher edge betweenness, and (iii) by the DF method, to pocket pairs displaying small DF.
As a first preliminary check, we evaluated the convergence rate of the matrix entries values. To do that (see the supplementary material, Figs. S9–S11), we took as reference the EGFR system. We computed the Frobenius norm40 of the difference of the matrices at 1.0, 1.5, 2.0, 2.5, 3.0 µs (total simulation time) and the matrix obtained at 0.5 µs. We found the DyNet method to be the fastest converging one, as after 1.0 µs the norm keeps on oscillating around a well-defined value. The DF and Pocketron methods prove to converge even if they require a higher simulation time; convergence is approximately attained at 2.5 µs. The time variations of the Pocketron matrix also take into account the rate of pockets emergence along time, whereas the other methods get advantage of the final Pocketron pockets’ set. Hence, the Pocketron result is also affected by the pockets’ appearance rate along the trajectory.
A comparison of the final results from the three methods highlighted a different sparseness of the correlation matrices. In particular, the DF method showed an overall homogeneous distribution of higher and lower signals, with regions of higher density of correlation, the DyNet approach returned less signals, mostly concentrated in specific regions of the matrices, finally Pocketron resulted in very few signals. This is related to the way the different methods assign values to the correlation matrices. For instance, Pocketron identifies pocket communication only if there is an exchange of residues comprised in different pockets, which is a rather strict condition to assess the occurrence of an event, and thus resulted in remarkably sparse correlation matrices. For the DyNet method, the betweenness is computed only for pocket pairs with distances between residues that are below a predefined distance cutoff, in the order of 3.5–5.5 Å,26,37 thus reducing by construction the possibility of direct communication for pockets that were distant in the 3D space. Finally, DFs are directly computed for each pocket pair returning nonzero values, thus, although the matrices were usually filtered via a DF upper bound, they typically displayed a larger amount of non-null entries, since also pockets at larger distances can display visible signals.
A closer look into the results revealed similarities in the regions with denser signals. For instance, in the A2A system (Fig. S6), DyNet and Pocketron displayed many signals associated with pocket identifiers (pID) 3, 4, and 5, with the highest values for pairs (3,5). Analogously, although the DF analysis displayed signals that are more homogeneously distributed among all pocket pairs, a similar “L”-shaped pattern formed by pIDs 3, 4, and 5 was visible. Therefore, all the three methods consistently highlighted the significance of these regions of the A2A receptor in displaying pocket correlations.
Concerning the AR (Fig. S7), the signals identified by Pocketron match with the denser region of DyNet. However, discrepancies can also be observed, particularly in the signal associated with the pID pair (14,39), which was determined as significant by the DyNet analysis and null by Pocketron. This suggested that, while the receptor structural dynamics displayed a correlation between regions involved in the pocket formation, the identified pockets had a low probability of undergoing merging and splitting events by interchanging atoms, apart from a few cases, such as (7,13), (8,9), and (13,21). The DF analysis showed an overall pattern that is similar to the one obtained with DyNet, indicating that the observed correlations result from low fluctuations in the distances between the corresponding pocket pairs.
Finally, the results for the EFGR (Fig. S8) revealed an agreement between the pocket correlation observed with DyNet and Pocketron, which resulted in rather sparse matrices of correlation with few signals in consistent regions. Specifically, multiple high signals were associated with pID 1, with strong communication between pairs (1,7), (1,34), (1,44), and (1,57) and a medium-to-high value for pair (7,34). Conversely, the DF analysis revealed correlations throughout the whole pID space.
A more intuitive interpretation of the information contained in the correlation matrices can be achieved by taking advantage of basic graph theory by representing the data as networks of nodes and edges. In particular, the nodes represent the pockets identified in the target’s structure, while the edges connecting the nodes represent the correlation between pocket pairs. In Secs. III A–III C, we provide such insights; we now describe the representation conventions we adopted. In the subsequent graph plots, the node radius is proportional to the average nonzero volume of the pocket during the simulation, while the color scale reports on the pocket persistency. The thickness of the edges between the nodes is proportional to the correlation value between the corresponding pockets as identified via the three different approaches examined in this work. To make the representation clearer, in the networks only the highest (top 30) signals are shown, corresponding to the most relevant communications between pockets. Furthermore, to facilitate the interpretation of the pocket network, nodes are projected in a 2D space obtained through multidimensional scaling on the Cartesian coordinates of the pockets’ centers of mass, which allows for considerably preserving the distances of the original 3D space. We present such results now system-wise.
A. Adenosine A2A receptor (A2A)
Figure 2 shows the results for the A2A receptor (see Fig. S12 for a 3D representation). The largest pocket identified (pID 4) corresponds to the orthosteric site, which is located near the extracellular region and buried surrounded by the seven transmembrane helices. Along with the orthosteric site, pIDs 3, 5, and 6 pockets display the highest persistency, meaning that they were observed in most of the simulation.
Inspection of the resulting pocket correlation network can give us insight into how the protein’s different regions are interconnected, unveiling pathways of communication that can possibly be involved in protein’s function and dynamics.
According to the DyNet results [Fig. 2(b)], the orthosteric site is highly connected with other surrounding pockets, showing the highest communication with pID 11, 13, 19, 33, 37, and 58. Similarly, the results from Pocketron [Fig. 2(c)] also showed connections between the orthosteric site (pID 4) and its surrounding pockets, with a good overlap with the DyNet results. Most remarkably, Pocketron also identified the connection with pID 3. However, in remarkable contrast, the connection is direct in this case and was not mediated by intermediate pockets. Notably, the connection between pIDs 3 and 5 was also preserved.
Finally, the DF approach also identified connections involving the most persistent pockets 3, 4, and 5 [Fig. 2(d)]. However, neither direct not indirect pathways were observed between pIDs 4 and 3. The obtained picture indicated a dense region of connections between the lower portion of the intracellular side of the receptor. Interestingly, the connection between pID 3 and 5 was also preserved in this case, as in DyNet and Pocketron. As already introduced, the DF prioritizes connections involving smaller distance fluctuations, i.e., where pocket residues move in a highly coordinated fashion. As a result, highly structured regions, such as, for instance, residues within an alpha helix structure, were more likely to return high signals. Therefore, a common practice is to filter out the system-specific local fluctuations and inspect the distance fluctuations at increasing values of the distances. We reproduced this approach in our network representation and observed the network obtained when focusing on inter-pocket residue distances larger than predefined cutoff values (Figs. S15–S17; note that only the highest-ranking correlations are shown in the network representation, see Figs. S3–S5 for the complete correlation matrices). For the case of the A2A receptor, filtering out DFs at distances <10 Å (i.e., focusing on DF associated with inter-pocket residue distances larger than 10 Å) revealed several longer-range correlations between the orthosteric site and the surrounding pockets.
Interestingly, a pathway from pID 4 to pID 3 arose, passing through pIDs 14, which corresponds to one of the multiple pathways between pIDs 4 and 3 as identified by DyNet. Increasing the distance lower bound to 15 Å, the connection between pID 4 and 3 was preserved and becomes direct, filtering out the intermediate pockets 13 and 14 that were connected by inter-pocket residue distances <15 Å. As a further note, we highlighted how, while the choice of these distance cutoffs appears arbitrary, it was nevertheless reasonably related to the system size. For instance, while AR and EGFR have a maximum inter-pocket residue distance of 46 and 55 Å, respectively, the A2A receptor reaches 76 Å, thus allowing for investigations of long-range correlations using larger distance cutoffs.
To further assess the significance and contextualize the results obtained with our analyses, we investigate the relation with respect to hotspots in the structures already reported in the literature. Concerning the A2A receptor, previous studies41–43 suggested several non-orthosteric sites that can be suitable hotspots for allosteric modulation. Interestingly, we observed a high level of agreement between the location of these pockets and the ones identified through Pocketron (Fig. 3).
Specifically, we identified pockets in the regions of the “G Protein-Coupling Site” (pID 3), the “Lipid interface” (pID 6), the “C-Terminus Cleft” (pID 8), and the “Extracellular Cleft” (pID 33).42 The position of the “Intracellular Crevice” (Fig. 3, blue residues) is rather extended, as it can vary depending on the receptor's conformation (active, intermediate 1, intermediate 2, inactive). Interestingly, we were able to identify a small pocket (pID 5) in a subpart of this region. Consistently, the pockets located in the region of the “C-Terminus Cleft” (pID 8) and the “Extracellular Cleft” (pID 33), which are only visible in the active conformation, displayed a very small persistency in our analyses, although were found, by the study, to show some degree of correlation with the orthosteric site.
Another known hotspot of the receptor is the ion site (Fig. 3, yellow residues), normally occupied by a sodium ion that can be displaced by allosteric drugs. It has been suggested that sodium binding triggers the deactivation of the system.43,44 In particular, the study of the movement and positioning of the sodium ion indicated that it is very stable when it is linked to the receptor’s inactive form.44 Moreover, the impact of a physiological saline concentration was tested in the simulations, and no variations in the outcomes were noticed. Additionally, none of the sodium ions from the extracellular environment were able to pass through the narrow channel and enter the allosteric pocket. Conversely, when evaluating the active form of the receptor, research has demonstrated that the TM7 partially closes off the sodium ion binding pocket, rendering it incapable of accommodating ions.44 Our MD simulations were performed starting from a crystal structure (PDB ID 3UZC) with the sodium in the ion pocket, which remained in place for the whole simulation. Nevertheless, the ion pocket was absent as an independent pocket in the final pocket list. For pocket identification, we used the standard Pocketron setup, where pockets need to display a suitable volume to host a probe size corresponding to about three water molecules in volume. Notably, in a previous study by some of us,13 where a pocket search was conducted on a 100 ns long simulation of the A2A receptor, the ion pocket was correctly identified. Interestingly, in our MD simulation the ion pocket merged with the neighboring orthosteric site. As a result, the residues comprised in the ion pocket were included in the orthosteric pocket.
B. Androgen receptor (AR)
The pocket search on the AR indicated the presence of three large pockets, namely pID 7, pID 8, and pID 13, as well as a medium-sized pocket, pID 3, all denoted by a significant persistency over the MD simulation.
Comparison of the results obtained from the three approaches provided results that are more inhomogeneous (Fig. 4, and Fig. S13 for a 3D representation). The DyNet method [Fig. 4(b)] indicated correlation of pID 13, corresponding to the orthosteric pocket, with several surrounding pockets. The highest signals are found on the direct connection with pIDs 8 and 39. While there was no direct correlation with the high-volume and persistent pID 7, indirect connections through pIDs 2, 8, or 10 were identified. Furthermore, there was also a discernible path between pID 7 and pID1, passing through pIDs 10 and 11.
Conversely, the crosstalk network obtained with Pocketron [Fig. 4(c)] displayed a more central role of the orthosteric site pID 13, which is connected to several surrounding pockets, including the larger and persistent pIDs 7 and 8, and other smaller and less persistent pockets. Regarding the separate, tough persistent, pID 3, there was only a local communication with surrounding pockets, such as pID 11, 17, and pID, 26, while no connection with any of the biggest pockets was found. The difficulty in identifying communication with Pocketron may be due to the conformation of the system where the presence of helix 4 separated the two pIDs 13 and 7 from pID 8, thus interrupting the possible interchange of pocket’s residues.
Finally, the DF analysis [Fig. 4(d)] suggested the presence of two key clusters with high correlation signals, one around pID 3 and 1, and one around the orthosteric pocket (pID 13), mainly connected by the edge between the relevant node 7 and node 27.
The AR presents three main hotspots, consisting in the orthosteric binding pocket (Fig. 5, gray residues), the BF-3 pocket (cyan residues), and the AF-2 pocket (green residues). The AF2 site is important for the binding of co-activator proteins, which trigger the transcription of genes regulated by AR. Some antagonists have been found to bind to this site, which prevents co-activator proteins from binding and activating AR.45 The BF3 site is a recently discovered site where antagonists can bind to suppress AR activity. When an antagonist binds to any of these sites, it causes a change in the structure of the AF2 site, making it impossible for co-activator proteins to bind to AR due to the conformational changes induced by the movement of H12. Notably, besides the orthosteric site, corresponding to pID 13, several pockets were identified from our simulations in these hotspot regions. These included pIDs 5 and 6, located in the AF-2 regions, which showed low persistence in our simulations. It is interesting to notice that, despite this known communication between the orthosteric site and the AF-2 region, no remarkable communication between the two regions was observed in our analyses. Only the pocket crosstalk analysis with Pocketron [Fig. 4(c)] displayed some communication between pID 13 and the small cluster comprising pockets 6, 5, and 36, passing through pID 39, however, with relatively small edge intensities.
Finally, concerning the BF3 pocket, shown in cyan and corresponding to pIDs 1 and 4 in our results, Pocketron displayed no relevant communication with other pockets, while DyNet showed a pathway starting from pID 1 to pID 5 through pID 11, 10, and the orthosteric pID 13. Finally, Pocketron correctly identified a non-negligible communication between pID 13 and pID 21, which is in the terminal part of H11 and undergoes a conformational change after binding of an agonist or antagonist, confirming the possibility of information exchange between the two regions.
C. EGFR kinase domain
The analyses on the EGFR system (Fig. 6, and Fig. S14 for a 3D representation) showed consistent results in identifying pID 1, which corresponded to the hinge region, as a central hub between two separate regions of the structure that were mainly involved in inter-pocket communication. The DyNet approach [Fig. 6(b)] resulted in several connections between the hinge region (pID 1) and the closer pockets, such as pIDs 30, 44, 53, and 57, as well as the pockets formed by the A loop, such as pIDs 7, 8, 37, and 45. Moreover, it showed a clear connection between pID 57 and pID 44. Comparing these results with those obtained using Pocketron [Fig. 6(c)], we can see consistent connections, such as those connecting pID with pIDs 57, which is located between five beta-strands and the alpha C-helix, and 44, a deep groove located at the interface. The networks showed connections involving the pockets both with the hinge and with respect to each other. Additionally, an interesting signal is observed between the hinge (pID1) and pIDs 7 and 8, which are formed by the A loop in this conformation, in both methods creating a communication network within that region. The DF analysis [Fig. 6(d)] highlighted high correlations within two opposite regions of the system, providing overall information in agreement with what was observed from DyNet and Pocketron. Passing through the main persistent pID 1, the two distant regions of the EGFR, in both of which high communication is observed, had the possibility to communicate and transfer structural dynamics information.
In the literature, it has been shown that there are a few main structures of the receptor mainly responsible for the activation and deactivation of the system. The structure presents a hinge allowing for flexibility and movement in this specific part of the structure. This region is typically composed of a stretch of amino acids that can bend or rotate without disrupting the overall structure. In close vicinity, one can find a small helix called the αC-helix, 5 β-strands, a loop known as the A loop, and the orthosteric ATP (adenosine 5′-triphosphate) pocket that includes the “DFG” motif made by the residues D855, F856, G857. These structures assume different conformations to activate and deactivate the receptor. A thorough study on the behavior of the EFGR kinase was conducted by Qiu et al.,46 which examined the changes in the apo structure caused by the presence of a ligand in the ATP binding pocket (Osimertinib) alone and in combination with a ligand in the allosteric MEK (mitogen-activated protein kinase kinase) pocket (JBJ). The study revealed that the presence of Osimertinib resulted in the system assuming an αC-out conformation and an open A loop. On the other hand, the presence of the allosteric inhibitor JBJ led to an inactive state, closing the loop inside of the hinge that interacts with the out-rotated αC-helix and a 2-turn small helix formed in the loop (“Src-like” inactive state).
Figure 7 presents the location of critical regions in the EGFR structure, namely the ATP binding pocket (gray residues in the figure), the MEK pocket (red residues), and a potential allosteric site identified by Qiu et al.46 Our pocket detection revealed a strong correspondence for the first two known pockets, namely the ATP and MEK pockets, corresponding to pIDs 44 and 57, respectively. Additionally, the A loop was identified as another important area of the structure, which corresponds to the region around pID 7, that mostly displayed in DyNet and Pocketron correlations with mainly pIDs 1 and 57 (directly or with small path). Therefore, our results confirmed the view reported in the literature, with the central hinge mediating the exchange of communication between the different regions of the structure.
IV. DISCUSSION AND CONCLUSIONS
Analyzing the communication patterns within a biomolecular structure is of utmost importance to understand its functioning and to potentially design allosteric binders able to modulate its activity. In this respect, computational methods are enabling tools to achieve an atom-level, holistic picture of the underlying mechanistic details. In this study, we combined molecular dynamics (MD) simulations in the μs timescale with some tools to identify correlations in the structural dynamics of three pharmaceutically relevant targets: the adenosine A2A receptor (A2A), the androgen receptor (AR), and the epidermal growth factor receptor (EGFR) kinase domain. The systems were all simulated in the absence of any co-factor/ligand. The protocol is based on two major steps: (1) pockets detection/tracking and (2) the evaluation of pocket crosstalk. We took advantage of Pocketron13 to address the first step. Then, we performed the second step by employing three different techniques: post-processing27 of Pocketron merge–split matrices, the Dynamic Network Analysis (DyNet), and distance fluctuation (DF) analysis methods. For all the methods, we used the notion of pocket coming from the Pocketron analysis. The three correlation methods are based on different conceptual frameworks; therefore, we aimed to assess to what extent they could provide similar, complementary, or different indications of pocket correlations.
Pocketron has shown that, in all the analyzed three cases, orthosteric pockets are placed in the top-3 largest pockets, such as pID 4 for the A2A case, pID 13 for the AR case, and pID 44 (ATP site) for the EGFR. Overall, the investigated approaches allowed, in most cases, to straightforwardly highlight the most relevant known sites; these typically emerged as persistent and highly connected pockets.
In pockets communication terms, the degree of consistency of the tested methods proved to be system dependent, showing partial orthogonality of the outcomes. This confirms the “many disguises” nature of allostery,6 albeit from the computational standpoint.
The analysis of A2A revealed that DyNet can detect strong communications surrounding the orthosteric site, where also Pocketron finds multiple correlations. Conversely, DF identifies fewer connections from the orthosteric site and points to pockets in different regions compared to the other methods, particularly close to the intracellular portion. The work from the McCammon laboratory and collaborators42 identified five non-orthosteric pockets and all of them were identified by Pocketron (pID 3, 5, 6, 8, and 33).
In the androgen receptor case, similar to the A2A results, higher communication was associated with the orthosteric site (pID 13) in DynNet and Pocketron, with DF displaying one main connection and then a denser area of signals further away close to the BF3 site (pIDs 3 and 4). Analogously, the former methods identified communication with the AF2 allosteric site (pIDs 5 and 6), albeit with different intensities, while no relevant connections were observed through the DF approach.
For the EGFR kinase domain, both DyNet and Pocketron results highlighted several communications originating from the ATP site (pID 44). The MEK pocket (pID 57), which is a known allosteric site,47 showed a high degree of communication with the surroundings and especially the hinge region for all the three methods. Furthermore, Fig. 6(d) (the DF analysis) revealed two main clusters of communications within the receptor, connected by the hinge, resulting in low distance fluctuations in regions not involved in major conformational changes. The study by Qiu et al.46 demonstrated a correlation between the MEK pocket (upon binding of the JBJ ligand) and the closing movement of the A loop, which leads to inactivation of the receptor. Their results identified a potential allosteric pocket in the lower region of the receptor called “Pocket X.” Comparison of the residues that create this pocket revealed it corresponds to our pID 11, which is found, in the case of Pocketron, connected to the hinge and the ATP site through other intermediate pockets. For the first two methods, pID 7 has a higher persistence and a stronger, direct connection to the hinge (and thus the ATP site).
As we selected systems known to bear allosteric links, the results obtained from the application of these three different methods seem to confirm the presence of allosteric phenomena. We stress again that we focused only on the top-ranking communication results to facilitate the interpretation and comparison of the complex interplays emerging from our networks. We cannot hence exclude additional interesting connections that may exist; such system-specific analysis is beyond the scope of the present study, which aims at providing an overall comparison of the three different approaches. The issue of identifying potential allosteric patterns is approached differently by the three methods. The DyNet approach, being based on mutual information, represents a probabilistic viewpoint. Specifically, the correlation within biomolecular structures is quantified by the degree of statistical independence between distributions of observables (e.g., Cartesian coordinates, distances, etc.) suitable to describe structural motions. Conversely, the DF method quantifies the correlation in terms of variance of the observables, namely distances; the variance is a parameter of a distribution and not a distribution itself. Finally, in Pocketron, once the pockets in the structure are identified, the correlation stems from the dynamical sharing of residues observed between neighboring pockets during a MD simulation.
Interestingly, our results showed that Pocketron, despite not manipulating explicitly distributions, is able to achieve good consistency with the overall picture obtained from the mutual information based DyNet approach.
Additionally, these two methods display partial consistency with the variance-based DF analysis. Notably, the DF approach prioritizes correlations corresponding to low fluctuations, i.e., low variance, of the distances; one can interpret this correlation measure as a synchronous motion between two pockets. Furthermore, differently than DyNet and Pocketron that assess correlation between adjacent pockets only, in DF the correlation is instead estimated for all pairs. As a result, the DF method provides a global description of the correlations within the structure. In this framework, the use of a distance threshold thus gains particular relevance in light of identifying genuinely long-range correlations.
Opposite to this global and synchronous spirit, the DyNet and Pocketron methods quantify local and possibly asynchronous correlations. Hence, long-range allosteric communication in these two methods is always a consequence of a path in a graph, whereas in DF direct links rule all the long-range interactions. Within this picture, our results showed, unexpectedly and interestingly, that Pocketron seems to be more consistent with the probabilistic approach, even if there is no explicit probabilistic formulation.
For all these reasons, while we envisage an exploitation of the DF method to identify regions that can be sensitive to long-range and synchronous modulation, DyNet and Pocketron appear more suitable to reconstruct signaling pathways happening through more general communication patterns.
The employed algorithms, as usual, present free hyper-parameters; hence studying the “persistency” of topological features48 of graphs varying the hyper-parameters could be an interesting line of further investigation. Nevertheless, while the persistent homology solution is inspired by topological principles, one could instead envision a more physical approach: for the DF case in particular, as genuine allosteric interactions are inherently long range, here one could take advantage of the rate of decay of the electrostatic interactions to define a proper threshold. Furthermore, the threshold definition can stem from secondary and tertiary structure knowledge of the system.
Although our approach for pocket identification and study of the correlations takes explicitly into account the dynamics of the system, it is nevertheless intrinsically limited by the conformational space accessible to the apo systems, the simulative timeframe, and the water-only buffer. To improve this last aspect for future works, one may envisage the doping of water through hydrophobic molecular probes.49 This can improve the sampling of biomolecular conformations exposing so-called cryptic pockets; the sampled structure conformations could well feed the here introduced computational pipeline.
SUPPLEMENTARY MATERIAL
The supplementary material includes correlation matrices obtained using the three methods for all the three systems, 3D views of the pocket networks, the effect of parameters used in the analyses, and further details about the MD simulations.
ACKNOWLEDGMENTS
Giorgio Colombo and Stefano Serapian are thanked for sharing the code to conduct the distance fluctuation (DF) analysis and providing useful indications on its use. We thank the referees for their useful suggestions.
AUTHOR DECLARATIONS
Conflict of Interest
S.D. and A.C. are share owners of BiKi Technologies s.r.l., a company that commercializes software solutions for medicinal chemistry.
Author Contributions
Riccardo Aguti: Investigation (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Mattia Bernetti: Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Supervision (equal); Validation (equal); Visualization (lead); Writing – original draft (equal); Writing – review & editing (equal). Stefano Bosio: Investigation (supporting); Validation (supporting); Visualization (supporting); Writing – original draft (supporting); Writing – review & editing (equal). Sergio Decherchi: Conceptualization (lead); Formal analysis (equal); Investigation (equal); Methodology (equal); Project administration (equal); Supervision (equal); Validation (supporting); Visualization (equal); Writing – review & editing (equal). Andrea Cavalli: Conceptualization (supporting); Funding acquisition (equal); Project administration (equal); Resources (lead); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
MD trajectories, preprocessed data, and a Jupyter Notebook that can be used to perform our analyses and generate all the figures in the manuscript, along with input files for the MD simulations, are openly available and can be downloaded at https://doi.org/10.48557/TVYQBB.
Preprocessed data and the Jupyter Notebook only can also be found at https://github.com/mbernett/pocketcorrel.
NanoShaper is freely available at https://gitlab.iit.it/SDecherchi/nanoshaper/. Pocketron is available in the commercial software “BiKi Life Sciences” (https://www.bikitech.com). For no profit institutions, the software can be obtained with a nominal fee.