Inspired by recent progress in time-resolved x-ray crystallography and the adoption of time-resolution by cryo-electron microscopy, this article enumerates several approaches developed to become bigger/smaller, faster, and better to gain new insight into the molecular mechanisms of life. This is illustrated by examples where chemical and physical stimuli spawn biological responses on various length and time-scales, from fractions of Ångströms to micrometers and from femtoseconds to hours.
RESPONSES OF THE CELLULAR MACHINERY TO STIMULI
All life responds to environmental stimuli. The majority of trees grow leaves in the spring as a response to increasing temperatures and extended sunlight. Wounds close after an injury because an army of body cells responds in a concerted fashion. Stimuli, therefore, may have transformative implications for the shape of an organism. Failure of the molecular machinery to perceive and transmit stimuli may impact health and wellbeing in unforeseen ways. Receptors are central to the perception of stimuli. They are located not only in the cell membrane but also in a soluble form in the cytoplasm. They orchestrate finely tuned cellular responses, which are subject to intense investigation not only to cure diseases but also to gain insight into the molecular machinery of cellular organisms.
Present-day grand challenges are outlined by big ideas formulated by the National Science Foundation (NSF) through funding of basic research. They are expressed in a quite general form, which allows for an assortment of diverse ideas and approaches. Here is an outlook on a hypothetical example of the big idea “Understanding the Rules of Life.” When a growth factor binds to its cellular receptor, the entire biochemistry of the cell changes dramatically in preparation for advancing through the cell-cycle (Weinberg, 2007). Its entire machinery can switch from a quiet state to a manufacturing plant that rampantly produces new cellular components and degrades others to get ready for cell division. Since failures in the machinery lead to severe diseases, including cancer (Weinberg, 2007), it is of high interest how this switch is orchestrated, starting from the activation and transcription of distinct DNA sequences to the busy production of new proteins and enzymes. To follow events like these on the molecular level, multiple time points, a few seconds apart across a time span of maybe half an hour and across the 3-dimensional volume of an entire single cell, are required. Such an approach is very challenging (Patwardhan et al., 2014; de Jonge and Peckys, 2016; DiCecco et al., 2021; and Loconte et al., 2023). To observe the molecular rearrangements in real time, the cell must be kept in its physiological environment, and it must survive an imaging process that uses ionizing particles to depict a snapshot of its 3D structure with near-atomic resolution. With advances in cryo-electron tomography (Baumeister, 2022), it may be that a 3D view of a cell can also be produced at warm temperatures with sufficient spatial resolution (DiCecco et al., 2021). However, it is not clear whether a living cell can survive an extended period of time while being imaged by strongly ionizing particles (electrons) (de Jonge and Peckys, 2016).
One may envision that only very limited, low dose data with a very low signal-to-noise ratio can be collected at each time point, so that the dose deposited is much below a critical level that would otherwise kill the cell. The entire movie of the cell-cycle then consists of multiples of ultra-low dose 3D tomograms that can only be assembled by using mutual information from many if not all snapshots in all orientations and at all time points, as demonstrated on other systems using geometric machine learning (Fung et al., 2016; Dashti et al., 2020). A successful experiment, however, would reveal the identity, the shape, and the position of a large number of biological macromolecules as a function of time and provide a deep, hitherto unmatched, insight into the mechanism of cell activation that has numerous bio-medical implications.
RESPONSES OF INDIVIDUAL BIOLOGICAL MOLECULES TO STIMULI
The function of individual biological macromolecules can be determined from purified preparations with time-resolved structural methods. When crystals can be obtained, time-resolved macromolecular crystallography (TRX) (Moffat, 1989, 2001) is the method of choice since it provides time-resolutions that are commensurate with the time-scales of the reactions to be investigated. Recent advances in single particle cryo-EM allow structures to be determined with near atomic resolution that rivals that of crystallography. However, time-resolved methods need to be implemented that are fast enough to cover macromolecular reactions in a meaningful way (Maeots and Enchev, 2022).
As TRX at synchrotrons approached maturity (Ren et al., 1999; Schotte et al., 2012; Jung et al., 2013; and Schmidt et al., 2013), the advent of x-ray Free Electron Lasers (XFELs) pushed the possible time-resolution to femtoseconds (fs) (Tenboer et al., 2014; Barends et al., 2015; Pande et al., 2016; and Nogly et al., 2018), the duration of the x-ray pulses at XFELs. Light-stimulated reactions can, therefore, be followed on hitherto unprecedented femtosecond time-scales. In order to ensure uniform illumination, the crystal size should match or be smaller than the absorption length, which is on the order of 5 μm (Tenboer et al., 2014; Poddar et al., 2022) for photoactive yellow protein at the absorption maximum. XFEL pulses are capable of producing strong diffraction patterns from microcrystals whose sizes match the absorption length. Ultrashort x-ray pulses that enable the “diffraction-before-destruction” principle (Chapman et al., 2014) are required to collect a diffraction pattern before the crystal is destroyed. The microcrystal must be replaced with a new, pristine one in a serial fashion. Accordingly, the time-resolved technique has been named ‘time-resolved serial femtosecond crystallography (TR-SFX)’ (Aquila et al., 2012; Tenboer et al., 2014). In order to follow a reaction from the beginning to the end, a time-series of SFX data needs to be collected. Figure 1 shows 4 time points from a time series collected on photoactive yellow protein (PYP) through 11 orders of magnitude from 250 fs to 4 ms. The 4 time points depicted are representative of more than 200 datasets that cover the reaction. These data were collected by multiple groups over a time-period of about 23 years (Genick et al., 1997; Ren et al., 2001; Schmidt et al., 2004; Ihee et al., 2005; Schotte et al., 2012; Jung et al., 2013; Schmidt et al., 2013; Tenboer et al., 2014; Pande et al., 2016; and Pandey et al., 2020) at synchrotrons and XFELs.
Around 2010, the speed of TRX data collection at BioCARS (sector 14, Advanced Photon Source, Argonne National Laboratory) had become fast enough that additional parameters such as temperature (Schmidt et al., 2013), pH (Tripathi et al., 2012), or x-ray dose (Schmidt et al., 2012) could be varied in a reasonable amount of experimental time, supplementing the time information. This speed, however, will be far exceeded by high-repetition XFELs, such as the LCLS-II-HE, that are scheduled to continuously generate fs x-ray pulses with a repetition rate of several hundred kHz. With these machines, the collection of several hundred time-delays that densely cover the time range from fs to ms will likely be possible within one experimental shift (12 h).
The amount of digital data generated by the collection of a time-series using an x-ray Free Electron Laser is immense. The stored data of the world had a volume of about 300 exabytes (300 × 10¹⁸ bytes) in 2007 (Hilbert and Lopez, 2011), and it is currently more than 79 000 exabytes (79 zettabytes), which is projected to grow to 160 000 exabytes in 2025 (www.statistica.com). Since we are concerned with the storage of our own data, the storage requirement for a time-series is estimated here. The time-series might consist of 500 datasets, each dataset at a different time point. For each dataset, about 10 × 10⁶ detector readouts are necessary (most of which do not contain Bragg reflections), which accumulates to 5 × 10⁹ detector readouts for the entire time series. Each detector readout requires 10 MB of storage. Accordingly, 5 × 10¹⁶ bytes of storage (50 petabytes) is required. At future high-repetition rate x-ray light sources, on the order of 100 000 detector readouts will be produced per second. The 5 × 10⁹ images required for the time-series are collected in approximately 1 day, which amounts to about 1/40 000 of all data stored worldwide. This is a significant amount. Storage and analysis require immense computational effort that may be alleviated by discarding, in real time, diffraction patterns that do not contain Bragg reflections. From an individual point of view, such a scenario is a challenge because a previously enjoyable detector image is nothing more than a glimpse in an ocean of data and not worth too much. Suppose we can all successfully navigate the data ocean: is what is gained worth such an effort? Let us assume that a sufficiently large representation of a cellular proteome consists of 1000 proteins. A substantial subset of them (maybe 600) can be crystallized, and their structure and function investigated by time-resolved crystallography at a single high-repetition x-ray source in 2 years.
This would then provide a database to reconstruct a functionally and structurally meaningful cell content during a cell-cycle as outlined above, a priceless treasure for future drug discovery.
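The storage estimate above can be retraced with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in the text; the 10 MB readout is taken as a decimal megabyte, and `HIT_FRACTION` is a purely hypothetical value introduced here to illustrate the effect of discarding empty patterns in real time.

```python
# Back-of-the-envelope estimate with the figures quoted in the text.
DATASETS = 500                       # time points in the series
READOUTS_PER_DATASET = 10 * 10**6    # detector readouts per dataset
BYTES_PER_READOUT = 10 * 10**6       # 10 MB per readout (decimal MB assumed)
READOUT_RATE_HZ = 100_000            # readouts per second at a high-repetition source
HIT_FRACTION = 0.05                  # hypothetical share of readouts with Bragg reflections

total_readouts = DATASETS * READOUTS_PER_DATASET
total_bytes = total_readouts * BYTES_PER_READOUT
collection_hours = total_readouts / READOUT_RATE_HZ / 3600
kept_bytes = total_bytes * HIT_FRACTION   # storage left if non-hits are discarded

print(f"detector readouts : {total_readouts:.1e}")
print(f"raw storage       : {total_bytes / 1e15:.0f} PB")
print(f"collection time   : {collection_hours:.1f} h of uninterrupted readout")
print(f"storage after veto: {kept_bytes / 1e15:.1f} PB (assumed hit fraction)")
```

The pure readout time comes out somewhat below one day; sample changes and other interruptions are not modeled.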
Crystal lattice contacts pose constraints on the dynamics of the crystal's molecules. Accordingly, it would be desirable to cross-check results with another method that provides direct structural information, such as cryo-EM or time-resolved nuclear magnetic resonance (Mock et al., 2017). Imagine that time-resolved cryo-EM would work with sub-millisecond time-resolution and consistently with near-atomic resolution. This type of data would have an edge over crystallographic data, since rather than originating from an ensemble like a crystal, these data are collected from single particles, which would add more dimensions, namely, free energy surfaces (Dashti et al., 2014; Hosseinizadeh et al., 2017), to the results. Yet, time-resolved x-ray crystallography took decades to evolve (Ren et al., 1999; Tenboer et al., 2014) and has the decisive advantage that data can be swiftly collected (i) at physiological temperatures and (ii) with unprecedented time-resolution, both of which are challenging for time-resolved cryo-EM approaches.
In time-resolved crystallography applications, machine learning and artificial intelligence (ML/AI) remain mostly unexplored, although interesting approaches have been reported (Schmidt et al., 2003; Ihee et al., 2005; Schmidt et al., 2013; Pande et al., 2016; and Hosseinizadeh et al., 2021). A large challenge remains the deconvolution of kinetic mixtures to extract unique and authentic reaction intermediates in a user-friendly way. In the field of time-resolved spectroscopy, algorithms to extract intermediates are more common (Henry and Hofrichter, 1992; Hoff et al., 1999; Zimanyi et al., 1999a; and Zimanyi et al., 1999b). With the application of the singular value decomposition (an unsupervised machine learning method) to TRX data (Schmidt et al., 2003), one is able to extract the number of processes (intermediates) and their relaxation times by globally identifying them in the right singular vectors. By applying a kinetic model, the structures of the intermediates as well as a compatible kinetic mechanism can also be determined from the left singular vectors (Rajagopal et al., 2004; Schmidt et al., 2004; Ihee et al., 2005; Rajagopal et al., 2005; Jung et al., 2013; and Schmidt et al., 2013). The limitation is that essentially identical concentration profiles of molecules in intermediate states can be produced from an assortment of compatible mechanisms. A particular selected mechanism is, therefore, not unique. However, since similar, chemically sensible concentrations are obtained from multiple degenerate mechanisms, the structures of the intermediates may, indeed, be unique. This must be examined on a case-by-case basis. It is also desirable to streamline the tedious projection algorithm to extract the intermediates from the left singular vectors (Zimanyi et al., 1999b; Schmidt et al., 2003; and Rajagopal et al., 2004).
Perhaps, ML/AI methods can be used to extract the intermediates directly from the time-resolved difference maps in a user-friendly way by applying constraints that are a-priori known, such as the number of intermediates (from a prior SVD analysis) and by exploiting the fact that crystallographic occupancy is directly equivalent to (fractional) concentration.
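How the singular value decomposition separates the number of processes from their time courses may be illustrated on synthetic data. The sketch below is not the published analysis pipeline: the data matrix, the two-intermediate sequential mechanism, the rate coefficients, the noise level, and the significance threshold are all assumptions chosen for illustration. Columns of the matrix play the role of difference maps at successive time delays.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a time series of difference maps (all values hypothetical).
n_voxels, n_times = 400, 60
t = np.logspace(-9, -2, n_times)      # time delays in seconds
k1, k2 = 1e6, 1e3                     # assumed rate coefficients (1/s), I1 -> I2 -> dark

# Fractional concentrations of the two intermediates for the sequential mechanism.
c1 = np.exp(-k1 * t)
c2 = k1 / (k1 - k2) * (np.exp(-k2 * t) - np.exp(-k1 * t))
C = np.vstack([c1, c2])               # shape (2, n_times)

# Random vectors stand in for the difference density of each intermediate.
D = rng.normal(size=(n_voxels, 2))

# Data matrix: one noisy "difference map" per column (time point).
A = D @ C + 0.01 * rng.normal(size=(n_voxels, n_times))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Count singular values that stand clearly above the noise floor (ad hoc threshold).
n_significant = int(np.sum(s > 5 * np.median(s)))

# The significant right singular vectors should span the concentration profiles:
# fit C as a linear combination of them and check what is left over.
basis = Vt[:n_significant].T                          # (n_times, n_significant)
coeffs, *_ = np.linalg.lstsq(basis, C.T, rcond=None)
relative_residual = np.linalg.norm(basis @ coeffs - C.T) / np.linalg.norm(C.T)

print("significant singular values:", n_significant)
print("relative residual of the fit:", relative_residual)
```

In this orientation of the matrix, the left singular vectors (columns of `U`) carry the spatial information and the right singular vectors (rows of `Vt`) carry the time dependence. The linear combination that maps them back onto individual intermediates is exactly what a kinetic model has to supply.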
Geometric machine learning methods become interesting when multiple datasets at closely spaced time points can be obtained (Fung et al., 2016; Hosseinizadeh et al., 2021). Spatial completeness is of lesser importance since mutual information from nearby time points can be utilized. In extreme cases, only one (quasi-monochromatic) diffraction pattern is necessary per time point (Hosseinizadeh et al., 2021), which covers as little as 0.1% of reciprocal space and which, in addition, is highly partial and, therefore, does not allow for the determination of the required integrated reflection intensities. The information of the remaining ∼20 000 diffraction patterns, which would add up to a crystallographically complete dataset of integrated reflection intensities, can be retrieved from the mutual information extracted from all time points, although mostly from the closest ones (Hosseinizadeh et al., 2021). It remains to be seen how far such an extreme approach carries.
The recent ability to determine structures of molecules on a femtosecond timescale after an ultrashort excitation led to the exciting consequence that Jablonski diagrams (Fig. 2) can now be populated by measured (real) x-ray structures (Barends et al., 2015; Nango et al., 2016; Pande et al., 2016; and Nogly et al., 2018). Information obtained from ultrafast absorption, fluorescence, infrared, or Raman spectroscopy can be structurally interpreted, and molecular dynamics simulations can be used to further understand structural transitions with simplified models (Groenhof et al., 2002a; Groenhof et al., 2002b; Groenhof, 2013; and Hosseinizadeh et al., 2021). The initial, resonant transitions between the various states in the Jablonski diagram depend on rate coefficients (Einstein coefficients). In this way, the population changes are described by coupled differential equations that accommodate all transitions, including absorption and spontaneous and stimulated emission. The initial population dynamics during the reaction initiation is, therefore, dependent on the strength of the light pulse employed, the magnitude and frequency dependence of the Einstein coefficients that describe the transitions, and the time-dependent concentrations of molecules that occupy the Jablonski diagram.
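The coupled differential equations can be made concrete for the simplest case of two levels. The following minimal sketch integrates dn/dt = W(t)(1 − n) − W(t) n − A21 n for the excited-state fraction n, where W(t) = Bρ(t) is the pumping rate of a Gaussian pulse and A21 is the spontaneous emission rate. Equal degeneracies (B12 = B21) are assumed, and all numerical values (pulse width, rates, integration window) are hypothetical and given in fs and 1/fs. The description is incoherent and applies only where such rate equations hold.

```python
import math

def excited_fraction(peak_rate, sigma=50.0, a21=1e-6,
                     t_start=-300.0, t_end=300.0, n_steps=6000):
    """Excited-state fraction after a Gaussian pump pulse (two-level rate equations).

    peak_rate : maximum of W(t) = B * rho(t) in 1/fs (hypothetical value)
    sigma     : pulse width (standard deviation) in fs
    a21       : spontaneous emission rate in 1/fs
    """
    def dndt(t, n):
        w = peak_rate * math.exp(-0.5 * (t / sigma) ** 2)
        # absorption - stimulated emission - spontaneous emission
        return w * (1.0 - n) - w * n - a21 * n

    h = (t_end - t_start) / n_steps
    t, n = t_start, 0.0
    for _ in range(n_steps):          # classical fourth-order Runge-Kutta steps
        k1 = dndt(t, n)
        k2 = dndt(t + 0.5 * h, n + 0.5 * h * k1)
        k3 = dndt(t + 0.5 * h, n + 0.5 * h * k2)
        k4 = dndt(t + h, n + h * k3)
        n += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return n

weak = excited_fraction(0.001)
medium = excited_fraction(0.01)
strong = excited_fraction(0.1)
print(weak, medium, strong)
```

Within this picture, a tenfold stronger pulse does not excite ten times more molecules: stimulated emission competes with absorption, and the excited fraction saturates near one half.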
The states in a Jablonski diagram can be considered as time-independent, “metastable” states that are visited by reacting molecules that are vertically excited or de-excited from other time-independent states, mostly from the ground state. As excitation pulses approach a few fs (the impulsive limit), this picture breaks down. In the established view, an ensemble of molecules is coherently lifted from a ground state potential energy surface to an excited state potential energy surface, forming coherent wavepackets (Dhar et al., 1994). The “stable” states predicted by near-equilibrium considerations might not exist on these ultrafast time-scales. Vibrational states can be excited by impulsively displacing the electron cloud relative to the nuclei. The nuclei do not respond well to an external oscillating electromagnetic field (Bruner et al., 2017). However, the electrons will be able to follow the oscillating electric field. Perhaps for a few fs (Bruner et al., 2017), the electrons are not confined to the potential of the nuclei. This means that, for this very short period of time, the so-called Born–Oppenheimer approximation is not valid. In a structural representation, the electrons are substantially displaced away from the nuclei by the electromagnetic field of the exciting laser pulse. If the displacement is not large enough for the electrons to leave the nuclear potential (e.g., as photoelectrons), the electron cloud begins to oscillate shortly thereafter, with the positions of the nuclei remaining stationary (Bruner et al., 2017). Accordingly, the excitation process itself becomes complicated in the impulsive limit (Dhar et al., 1994). It might involve transient overshoots, oscillations, and an abrupt electric current that populates high energy unoccupied molecular orbitals with excited (valence) electrons. When the excitation is considered vertical, the structure of the nuclear ensemble does not change.
This will produce an energetically unfavorable configuration with the electrons in a new molecular orbital, but the nuclear structure still resembles that of the ground state. However, at this point in time (perhaps a few fs after the excitation event), the Born–Oppenheimer approximation becomes valid again because the electrons are able to quickly adjust to the potential of the nuclei. The atomic configuration relaxes on the excited state energy surface, driven by electrostatic forces, until the molecules return to the electronic ground state energy surface, in some cases radiationless through a conical intersection (Groenhof et al., 2004; Levine and Martinez, 2007; Pande et al., 2016; and Hosseinizadeh et al., 2021). Intense, impulsive, ultrashort laser pulses tend to induce damage (Grunbein et al., 2020; Miller et al., 2020). Electrons are accelerated in intense electric fields that may even exceed the electrical breakdown threshold. Surprisingly little is known about the mechanism of absorption, in particular about absorption cross sections as a function of pulse duration (Hutchison et al., 2016). It has been suggested that unspecific damage and the resonance absorption process must be decoupled (Dhar et al., 1994). In addition, there is very little systematic information on how powerful ultrashort laser pulses couple into protein crystals embedded in stabilizing solutions or viscous media. These are all questions that must be addressed in the future.
Enzymes are biological catalysts that catalyze the functions of life, ranging from the digestion of food sources to motion, perception of stimuli, and cognitive functions. Although their static structures can be determined by x-ray crystallography (Blow and Steitz, 1970) or cryo-EM (Tsai et al., 2022), their functions are difficult to observe directly with atomic precision (Johnson, 2013; Schmidt, 2020; and Wilson, 2022). Time-resolved methods applied to enzyme solutions, such as absorption spectroscopy in combination with stopped-flow mixing experiments (Johnson, 2013) or time-resolved Small and Wide Angle x-ray Scattering (SAXS/WAXS) (Kim et al., 2012; Bjorling et al., 2015; Grant, 2018; and Byer et al., 2022), provide an avenue to investigate enzyme catalysis. However, they all lack atomic resolution. Accordingly, there is uncertainty in the interpretation of the experimental results. One of the most important developments to investigate structures and function of enzymes is the mix-and-inject (Schmidt, 2013) technique that employs diffusion to initiate a reaction (Geremia et al., 2006; Schmidt, 2013). Serial crystallography at intense pulsed x-ray sources in combination with suitable mixing injectors (Calvey et al., 2016; Calvey et al., 2019) enables this technique (Kupitz et al., 2017; Mehrabi et al., 2019a). Since diffusion times depend on the square of the crystal size, very small crystals must be employed to initiate enzymatic reactions much faster than the turnover time. Only then can the enzymatic cycle be followed with x-ray structures. Since the crystals are small, even a single exposure results in unacceptable damage by x-ray radiation, so that a new crystal is required. For fastest diffusion, microcrystals with edge lengths on the order of 2 μm are required (Schmidt, 2013).
A damage free diffraction pattern might not be obtainable, unless the “diffraction-before-destruction principle” (Chapman et al., 2014) is employed, which requires ultrashort, femtosecond x-ray pulses from XFELs. The method of mixing and then injecting into highly intense pulses from an XFEL has been named mix-and-inject serial crystallography (MISC) (Kupitz et al., 2017). In the last few years, several groundbreaking experiments have been performed to demonstrate the feasibility of MISC (Kupitz et al., 2017; Stagno et al., 2017; Olmos et al., 2018; Dasgupta et al., 2019; Ishigami et al., 2019; Pandey et al., 2021; and Murakawa et al., 2022). More results are expected in the (near) future. With MISC, well-ordered complexes between enzymes and substrates are detected in difference maps (Fig. 3). To reach occupancy values >10% during the initial substrate binding phase, a substrate is required that is soluble to concentrations on the order of that of the crystalline enzyme (∼25 mM). Otherwise, the buildup of the enzyme-substrate complex is too slow and may extend to the steady-state regime even when supported by a high second order binding coefficient. The higher the substrate concentration, the faster the enzyme-substrate complex builds up, and the more time is available to observe intermediates in the burst phase that precedes the steady state. In an extreme case, a substrate concentration of 1 M was used to start reactions in bigger crystals that were investigated with synchrotron pulses (Mehrabi et al., 2019a). With MISC, the initial (pre-steady state) phase of substrate binding and processing is explored. Typically, after formation of a non-covalently bound enzyme-substrate complex, the structures of one or multiple intermediate states may be determined. As one can see from Fig. 4, the steady state Michaelis complex (ES) (Cornish-Bowden, 2012) consists of a mixture of several intermediates weighted by their respective occupancies. 
Non-covalently and covalently bound enzyme-substrate complexes as well as covalently and non-covalently bound enzyme-product complexes with catalytically modified substrate molecules can all be present at the same time. MISC can be used to unravel the structures of all or a subset of these complexes depending on the kinetic mechanism. As a rule of thumb, only successively more stable intermediates can be observed. Intermediates with a short lifetime cannot be observed after an intermediate with a long lifetime, simply because molecules will not accumulate sufficiently in the short-lived intermediate to become observable by MISC (or by any other time-resolved method). With MISC, we have a promising and conceptually straightforward method at hand (Schmidt, 2020) that gives insight into enzyme catalysis and ligand binding, and which can complement or even replace similar, but technically more challenging approaches using caged substrates (Mehrabi et al., 2019b; Wilamowski et al., 2022).
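The rule of thumb on observable intermediates can be checked with the simplest irreversible scheme A → B → C with first-order rate coefficients. This is a deliberately reduced sketch, not a model of a particular enzyme: substrate binding is treated as pseudo-first order, back reactions are ignored, and the rate coefficients (arbitrary units) are hypothetical. For this scheme, the maximum fractional occupancy of B has the closed form (k_form/k_decay)^(k_decay/(k_decay − k_form)).

```python
import math

def peak_occupancy(k_form, k_decay):
    """Maximum fractional occupancy of B in the irreversible scheme A -> B -> C.

    k_form  : rate coefficient of A -> B (formation of the intermediate)
    k_decay : rate coefficient of B -> C (decay of the intermediate)
    """
    if math.isclose(k_form, k_decay):
        # Limiting case of equal rate coefficients: B(t) = k t exp(-k t), maximum 1/e.
        return math.exp(-1.0)
    return (k_form / k_decay) ** (k_decay / (k_decay - k_form))

# Long-lived intermediate fed quickly: accumulates almost completely.
stable_after_fast = peak_occupancy(k_form=100.0, k_decay=1.0)

# Short-lived intermediate fed slowly by a long-lived predecessor: barely populated.
fleeting_after_slow = peak_occupancy(k_form=1.0, k_decay=100.0)

print(f"long-lived B after fast step : {stable_after_fast:.3f}")
print(f"short-lived B after slow step: {fleeting_after_slow:.4f}")
```

With these assumed values, the short-lived intermediate never exceeds about 1% occupancy, well below the roughly 10% level mentioned above for the substrate binding phase.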
After decades of time-resolved macromolecular structure determination with macroscopically large single crystals, the time for time-resolved serial crystallography that relies on a large number of microscopic crystals has come. Structure and function of an enormous number of biological macromolecules including enzymes can now be determined in real time and at physiological temperatures. New beamlines such as the ID29 SMX instrument at the European Synchrotron Radiation Facility (ESRF) in Grenoble, France, or the PETRA-III 14-2 TREXX instrument at the Deutsches Elektronen Synchrotron (DESY) in Hamburg, Germany, will help to make time-resolved studies with the pump-probe and the mix-and-inject technologies even more popular. Unrivaled time-resolution can be reached at XFELs with very small (even sub-μm) crystals with the benefit of the essential absence of radiation damage. Ultimately, research is carried out by individuals. It is an unwavering interest in time-resolved crystallography that can make a difference.
This work was enabled by the NSF Science and Technology Center Biology with XFELs (BioXFEL), NSF-STC 1231306, and NSF RAPID 2030466. The author thanks P. Schwander, V. Srajer, and E. Stojkovic for commenting on an earlier version of this manuscript.
Conflict of Interest
The authors have no conflicts to disclose.
Marius Schmidt: Conceptualization (lead); Funding acquisition (lead); Project administration (lead); Resources (lead); Visualization (lead); Writing – original draft (lead); Writing – review & editing (lead).
Data sharing is not applicable to this article as no new data were created or analyzed in this study.