The pathways that a ligand takes en route to the native pocket of its receptor protein are a key determinant of its biological activity. Molecular dynamics (MD) simulations have emerged as the method of choice for modeling protein-ligand binding events. However, the high dimensionality of MD-derived trajectories often remains a barrier to the statistical elucidation of distinct ligand-binding pathways, owing to the stochasticity inherent in the ligand’s fluctuations in solution and around the receptor. Here, we demonstrate that an autoencoder-based deep neural network, trained on an objective input feature consisting of a large matrix of residue–ligand distances, can efficiently produce an optimal low-dimensional latent space that retains the necessary information on the ligand-binding event. In particular, for the L99A mutant of T4 lysozyme interacting with its native ligand, benzene, this deep encoder–decoder framework automatically identifies multiple distinct recognition pathways without requiring user intervention. The intermediates correspond to spatially discrete locations of the ligand at different helices of the protein before it eventually reaches the native pose. The compressed subspace derived from the autoencoder provides a quantitatively accurate measure of the free energy and kinetics of ligand binding to the native pocket. Our investigation also suggests that, while a linear dimensionality-reduction technique such as time-structure-based independent component analysis (tICA) can decompose the state space adequately when the intermediates are long-lived, an autoencoder is the method of choice for systems in which transient, low-populated intermediates can lead to multiple ligand-binding pathways.
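
The idea of compressing a residue–ligand distance matrix into a low-dimensional latent space can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain NumPy rather than a deep-learning library, and the synthetic coordinates, layer sizes, learning rate, and the function names `residue_ligand_distances`, `train_autoencoder`, and `encode` are all assumptions chosen for illustration.

```python
import numpy as np

def residue_ligand_distances(residue_xyz, ligand_xyz):
    """Per-frame residue-ligand distances.
    residue_xyz: (n_frames, n_res, 3); ligand_xyz: (n_frames, n_lig, 3).
    Returns an array of shape (n_frames, n_res * n_lig)."""
    diff = residue_xyz[:, :, None, :] - ligand_xyz[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return dist.reshape(dist.shape[0], -1)

def init_layer(rng, n_in, n_out):
    # Small random weights, zero biases (illustrative choice).
    return rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out)), np.zeros(n_out)

def train_autoencoder(x, n_hidden=16, n_latent=2, lr=0.1, epochs=300, seed=0):
    """Full-batch gradient descent on the mean-squared reconstruction error.
    Architecture (assumed): d -> tanh(n_hidden) -> n_latent -> tanh(n_hidden) -> d."""
    rng = np.random.default_rng(seed)
    sizes = [x.shape[1], n_hidden, n_latent, n_hidden, x.shape[1]]
    params = [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
    use_tanh = (True, False, True, False)
    losses = []
    for _ in range(epochs):
        # Forward pass, keeping every layer's activation for backpropagation.
        acts = [x]
        for (w, b), nonlinear in zip(params, use_tanh):
            z = acts[-1] @ w + b
            acts.append(np.tanh(z) if nonlinear else z)
        err = acts[-1] - x
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate dLoss/dActivation from output to input.
        grad = 2.0 * err / err.size
        for i in reversed(range(len(params))):
            w, b = params[i]
            if use_tanh[i]:
                grad = grad * (1.0 - acts[i + 1] ** 2)
            grad_w, grad_b = acts[i].T @ grad, grad.sum(axis=0)
            grad = grad @ w.T
            params[i] = (w - lr * grad_w, b - lr * grad_b)
    return params, losses

def encode(params, x):
    """Map input features to the latent space (encoder half only)."""
    hidden = np.tanh(x @ params[0][0] + params[0][1])
    return hidden @ params[1][0] + params[1][1]

# Synthetic stand-in for an MD trajectory: 10 "residues" jittering around fixed
# positions and a one-site "ligand" that occupies two different locations.
rng = np.random.default_rng(1)
n_frames, n_res = 200, 10
residues = rng.normal(size=(1, n_res, 3)) + rng.normal(0.0, 0.05, size=(n_frames, n_res, 3))
ligand = np.zeros((n_frames, 1, 3))
ligand[:, 0, 0] = np.where(np.arange(n_frames) < n_frames // 2, 0.0, 2.0)
ligand += rng.normal(0.0, 0.1, size=ligand.shape)

features = residue_ligand_distances(residues, ligand)
features = (features - features.mean(axis=0)) / features.std(axis=0)  # z-score
params, losses = train_autoencoder(features)
latent = encode(params, features)  # (n_frames, 2) compressed representation
```

In a real application, the coordinates would come from the MD trajectory, and the two-dimensional `latent` array would be the space in which binding intermediates are clustered and analyzed.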

