Heterogeneity is intrinsic to the dynamic process of a chemical reaction. As reactants are converted to products via intermediates, the nature and extent of heterogeneity vary temporally throughout the duration of the reaction and spatially across the molecular ensemble. The goal of many biophysical techniques, including crystallography and spectroscopy, is to establish a reaction trajectory that follows an experimentally provoked dynamic process. It is essential to properly analyze and resolve heterogeneity inevitably embedded in experimental datasets. We have developed a deconvolution technique based on singular value decomposition (SVD), which we have rigorously practiced in diverse research projects. In this review, we recapitulate the motivation and challenges in addressing the heterogeneity problem and lay out the mathematical foundation of our methodology that enables isolation of chemically sensible structural signals. We also present a few case studies to demonstrate the concept and outcome of the SVD-based deconvolution. Finally, we highlight a few recent studies with mechanistic insights made possible by heterogeneity deconvolution.
INTRODUCTION
When Keith first introduced time-resolved crystallography to us in the early 1990s, he attracted our attention to a couple of challenges in data analysis, in addition to several scientific problems in biochemistry that would significantly benefit from such technical advances in data processing. One challenge was the crystallographic data reduction from the photographs of polychromatic Laue diffraction (Ren and Moffat, 1994), then commonly recorded on negative films and phosphor imaging plates. The other was the resolution of structural heterogeneity intrinsic to the dynamic process of a biochemical reaction or a protein conformational change—the kind of process that time-resolved crystallography aims to study. During those years when we were students, postdocs, and research associates in Moffat lab, we had gained firsthand experience in dealing with mixed signals in our data. Later in our independent research careers, solving these computational problems became one of our goals. As we demonstrate in this review, proper treatment and resolution of structural heterogeneity directly impact structural findings, therefore scientific conclusions.
Prior to the first wave of time-resolved crystallographic studies largely based on the Laue diffraction technique (Bourgeois and Weik, 2009; Ren , 1999; and Schmidt, 2008), Moffat laid out the premise from the outset that “the evolution of the system is completely stochastic; there is no temporal coherence between the molecules, and molecules of a given conformation are distributed spatially at random in the lattice.” He further pointed out that the independence of each protein molecule in a crystal lattice arises from “the weakness of the intermolecular forces” (Moffat, 1989). After practicing for a decade, Moffat reflected on the challenge and opportunity in time-resolved crystallography (Moffat, 2001): “The changes in structure from one molecule to another are uncorrelated in space and time …. One molecule is in-state [i] does not affect the probability that its neighbors in the crystal lattice are also in-state [i] or in any other state [j].” Time-resolved crystallography, as Moffat proposed, essentially takes advantage of such coexistence of multiple states. “Time-resolved experimental approaches do not attempt chemical or physical trapping of intermediates, accept that chemical and structural heterogeneity is inherent in the sample at all times as the reaction proceeds, and attempt to resolve this time-dependent heterogeneity into the structures of homogeneous intermediates during the subsequent data analysis process.” However, he also predicted that “… resolution of structurally heterogeneous data into an overall mechanism and time-independent structures of intermediates is a challenging conceptual, structural and computational problem.” In a nutshell, Moffat proposed an elegant method without the need for any deliberate experimental trapping to simultaneously determine metastable reaction intermediates that could rather be called “time-unresolved crystallography.” Despite Moffat's articulation on the meaning of “time-resolved crystallography,” this phrase is often interpreted literally in many high-profile publications with merely a layman understanding that two well-resolved intermediate states can be unconditionally observed by electron density maps measured at two different reaction times t1 and t2 regardless of the limited resolving power of time (see below).
On the left side of Eqs. (1) and (2), F(t, H), ρ(t, r), and Δρ(t, r) are experimental observations that contain mixed signals from the reactants, intermediates, and products, while the functions on the right side are the desired solutions. These structure factor sets Fi(H) and their corresponding electron density maps ρi(r) or Δρi(r) represent the distinct structures involved in the biochemical reaction, but isolated from one another, while their populations fi(t) vary as a function of time during the reaction. A simultaneous solution entails datasets collected at far more time delays than the number of unknowns such that a linear system can be established and solved by overdetermination. In other words, it would be an overinterpretation if each observation at a given delay time leads to a unique structure.
Three important implications from these equations are worth noting. First, although it is desirable to solve the electron density maps for all pure conformational species that rise, peak, and fall as a function of time throughout a chemical reaction, it would be more practical and informative to resolve electron density maps that depict structural events, often occurring in different parts of the protein structure. More than one structural event could concur in a pure conformational species (see example below). Second, the time-dependent function fi(t) for each structural species i is shared not only between Eqs. (1) and (2) but also among all reflections H in reciprocal space and every position r in real space. These time-dependent functions are, therefore, space-independent. They depict the population rise and fall of reactants, intermediates, and products throughout a chemical reaction. Third, both the structure factor set Fi(H) and the electron density map ρi(r) or Δρi(r) corresponding to the structural species i are time-independent spatial functions in reciprocal and real spaces, respectively. Therefore, the first step toward the desired solutions is to factorize time-resolved observations, either F(t, H), ρ(t, r), or preferably Δρ(t, r), into space-independent temporal functions and time-independent spatial functions. We shall demonstrate below how singular value decomposition (SVD) is utilized to achieve this first goal elegantly.
If a reaction under investigation conforms to the Arrhenius behavior, the temporal functions fi(t) can be expressed as a sum of several exponential functions so that a kinetic model of the chemical reaction would be in sight (Moffat, 2001). If all fi(t) functions involve only a single exponential in the form of exp(−t/τ), it is a strong indication that the reactant converts to the product in a single chemical step of a time constant τ, and no intermediate species is detected. It is important to note that two time points t1 and t2 do not guarantee a clean resolution of two distinct structural species. When t1 and t2 are well separated on both sides of the time constant τ, they would carry greater resolving power to differentiate the product from the reactant to a certain extent. When an intermediate state exists due to a rate-limiting step, the functions fi(t) would exhibit a biphasic behavior with two different time constants τ1 and τ2, such as fi(t, τ1, τ2). In this case, three time points flanking and interleaving with the time constants would be the best choice to resolve three structural species. Regardless of the choice of time points, signals remain heterogeneous at all times unless the distinct reaction steps occur decades apart such that an equilibrium is reached temporarily in the middle of the reaction. In other words, the resolving power of time is limited by an exponential function.
Although not guaranteed, if Eqs. (1) and (2) can be transformed into another form where at least one of the time-dependent functions involves only a single time constant, say f1(t, τ1), this transformation is said to accomplish a deconvolution of one chemical process from the other. It would be highly informative to figure out what corresponding electron density maps ρ1(r) and ρ2(r) would satisfy the transformation such that at least one of the deconvoluted maps can be visualized as a pure chemical species. This deconvolution after SVD represents the second step in obtaining the desired solutions in Eqs. (1) and (2). We will demonstrate how such deconvolution can be achieved by a rotation after SVD.
In summary, there are two intertwined aspects of resolving structural heterogeneity in any dynamic studies including time-resolved crystallography: to establish the population change of a structural species as a function of a given parameter x in metadata and to resolve structural changes in response to the change of x and x alone. In this review, we discuss key concepts of SVD in the context of spectroscopic and crystallographic applications and demonstrate how a multi-dimensional rotation enables isolation of chemically sensible structural signals in several case studies and real projects.
SINGULAR VALUE DECOMPOSITION (SVD)
An electron density map ρ(X, r), or a difference map Δρ(X, r), consists of an array of density values on grid points within a region of interest. To perform SVD on electron density maps, all M grid points in a three-dimensional map can be serialized into a one-dimensional sequence of density values. What serialization protocol to use is not important as long as it is consistently applied to all maps with the same grid setting and size, and a reverse protocol is available to erect a three-dimensional map from a sequence of M densities. A set of N serialized maps fill the columns of a data matrix A with no specific order, so that the width of A is N columns spanning the space of metadata X, and its length is M rows spanning the real space of r (Fig. 1). Notice that the metadata X consisting of experimental conditions and the real space coordinates r are not part of the data matrix A for SVD. Only the electron density values are entered in A for SVD, hence called the core data, as opposed to the metadata. If a consistent protocol of serialization is used, the corresponding voxels from all N maps would occupy a single row of matrix A. Such strict correspondence in a given row of matrix A is important. Changes of the density values in a row from one map to another could arise from signals, systematic errors, or noises. Although this review mainly concerns electron densities as core data, SVD and its associated analytical procedures are equally applicable to many other types of data, such as spectroscopic data used for case studies below and interatomic distance matrices (Ren, 2016, 2013a, 2013b).
SVD of the data matrix A results in , also known as matrix factorization (Fig. 1). Matrix U has the same shape as A, that is, N columns and M rows. The N columns contain a series of decomposed basis components Uk, known as left singular vectors of M items each, where k = 1, 2, …, N. For crystallographic maps, each component Uk spanning the real space of r can be erected using the reverse protocol to form a three-dimensional map. These decomposed elemental maps Uk can be presented in the same way as the original maps using popular molecular graphics software such as Coot and PyMol. Notice that these decomposed elemental maps or map components Uk are independent of any metadata. That is to say, these map components remain constant while the metadata vary, which satisfies the dependency requirement of the function ρi(r) in Eq. (3).
The second matrix W is a square matrix that contains all zeros except for N non-negative values on its major diagonal, known as singular values wk (Fig. 1). The magnitude of wk is considered as the weight or significance of its corresponding component Uk. The third matrix V is also a square matrix of N × N. Each column of V or row of its transpose , known as the right singular vector Vk, contains the relative compositions of Uk in each of the N original maps in A. Each right singular vector Vk can be considered as a function of the metadata, which satisfies the dependency requirement of function fi(X) in Eq. (3). Effectively, SVD separates the constant components independent of the metadata from their compositions that depend on the metadata. Thus, the first goal in solving the desired unknowns as depicted by Eq. (3) is essentially achieved by SVD, which factorizes the experimental observations in A into a set of spatial functions independent of metadata and their corresponding population changes as a function of metadata. This achievement is further clarified by reconstitution of several most significant components Uk.
RECONSTITUTION
Assuming that the singular values after wn are markedly smaller than those from w1 through wn, the components after Un are considered insignificant, thus often excluded in this approximation. That is, the data matrix A is said to have a low effective rank of n. As a result, the structural information evenly distributed in all N original maps is effectively concentrated into a far fewer number of n significant components, known as information concentration or dimension reduction. On the other hand, the trailing components in matrix U contain inconsistent fluctuations and random noises. Excluding these minor components effectively removes noises (Schmidt , 2003). The least squares property of SVD guarantees that the rejected trailing components sum up to the least squares of the discrepancies between the original core data and the approximation using the accepted components.
However, no clear boundary is guaranteed between signals, systematic errors, and noises. Systematic errors could be more significant than the desired signals. Therefore, it is sometimes necessary to exclude certain components from 1 through n. If systematic errors are correctly identified, they can be effectively removed by eliminating the corresponding components during reconstitution. Since systematic errors are just unwanted signals, deconvolution of systematic errors is equally important as deconvolution of desired signals. While the desired signals arising from a chemical reaction are expected to follow a kinetic model, a variety of systematic errors usually do not.
CASE STUDIES OF SVD IN SPECTROSCOPIC DATA ANALYSIS
For better clarity, we opt to use (difference) absorption spectra instead of electron density maps to illustrate the principles of SVD as they are equally applicable to both electron density maps and structure factor sets.
Dark reversion of bacteriophytochrome
In the first case study, we examine a time series of absorption spectra recorded during the dark reversion process of a full-length bacteriophytochrome RpBphP2 from photosynthetic bacterium Rhodopseudomonas palestras [Fig. 2(b)]. RpBphP2 incorporates a linear tetrapyrrole biliverdin as a chromophore, and undergoes reversible photoconversion between a red-light-absorbing Pr state and a far-red-light-absorbing Pfr state (Giraud , 2005; Yang , 2015). Typical for a canonical bacteriophytochrome, RpBphP2 adopts the Pr state in the dark with λmax = 707 nm. Upon red light illumination, it photoconverts to the Pfr state with λmax = 748 nm. In the absence of any light, RpBphP2 reverts slowly but spontaneously to the Pr state with a half-time of about 1 h.
A time series of absorption spectra is measured at 45 time points evenly spaced over 3 h during the dark reversion of RpBphP2 in solution. The starting sample shows spectral features characteristic to both Pr and Pfr states following red light exposure at 660 nm [reddest curve in Fig. 2(b)]. The absorbance values of each spectrum cover the wavelength range between 250 and 900 nm occupying a column of the data matrix A. SVD of the matrix A results in two significant singular values far greater than the rest [Fig. 2(a)]. The corresponding two most significant spectral components U1 and U2, or the left singular vectors, are shown in Fig. 2(c). The first component U1 resembles the dark resting spectrum of Pr but with a small shoulder at the far-red region somewhat like the light spectrum of Pfr. Therefore, U1 contains mixed signals from both states. The second component U2 features both positive and negative peaks similar to a difference spectrum. When the right singular vectors are plotted against the metadata, here time t [Fig. 2(d)], both seem to follow exponential functions. It would be very informative to plot the right singular vectors against each other, revealing the trace of the compositional changes of the corresponding spectral components and facilitating a subsequent rotation for deconvolution of pure spectra.
The dark reversion of RpBphP2 proceeds linearly from a mixture of Pr and Pfr states back to the Pr state without any detectable intermediate species since c1 correlates with c2 in a straight line [Fig. 2(d)] suggesting that both right singular vectors follow exponential functions of the same time constant. In other words, only two distinct spectral species of the bilin chromophore are detected. However, this observation does not exclude the possibility of any spectrally silent intermediate conformation. Here, the observed smooth evolution in spectral change simply reflects the continuously varying composition between Pfr and Pr states. It would be incorrect to conclude that the temporal smoothness displayed in the time series of absorption spectra originates from the conformational continuity of the bilin chromophore during its dark reversion. In this case study, it seems easy to understand Moffat's warning of “no temporal coherence” among bacteriophytochrome molecules in solution (Moffat, 1989). Yet, similar erroneous conclusions are not uncommon in the literature of time-resolved crystallography, where each and every electron density map in a time series was usually interpreted and refined as a distinct structure while a collection of such structures are said to be “time-resolved.” In reality, there is no guarantee that any time point would represent a pure state during a reaction or process, including those time points at the beginning and end. Furthermore, a simple SVD procedure does not produce pure spectra of the Pfr and Pr states. As discussed later in the review, deconvolution of pure spectra would entail additional data processing known as a rotation in the most significant subspace identified by SVD. This case study continues after we introduce the method of deconvolution.
Photoreaction of photolyase
In the second case study, we examine a more complicated scenario involving the interplay among three different cofactors of a photolyase PhrB where these redox cofactors, namely, a catalytic flavin cofactor FAD, an auxiliary ribolumazine cofactor DMRL, and an inorganic iron–sulfur cluster ([4Fe4S]), exhibit overlapping spectral features (Graf , 2015; Oberpichler , 2011; and Ren , 2023). This time series of difference absorption spectra is measured from a single PhrB crystal under continuous illumination of violet light at 405 nm [Figs. 3(a) and 3(b)]. SVD of the difference spectra at 200 time points evenly spaced in 20 s produces three outstanding singular triplets [Figs. 3(c) and 3(d)]. The first component U1 carries strong signals arising from FAD photoreduction characterized by the negative sharp edge at 485 nm. The other two major components U2 and U3 show a transition from a decent correlation at short wavelengths to an anticorrelation at long wavelengths [Fig. 3(c)]. The four-dimensional space of c1, c2, c3, and t is presented in several orthographical projections in Fig. 4. Only c1 seems to follow an exponential function, while large fluctuations are observed in the top three right singular vectors. Interestingly, these fluctuations are not random as they exhibit a preferred orientation. Once again, this case study shows that a pure mathematical decomposition alone is insufficient for revealing the chemically meaningful property of the sample. This inadequacy has motivated our efforts in recent years to develop a method of deconvolution to extract pure states.
THE ORTHONORMAL PROPERTY OF SVD
A solution set of SVD guarantees that the columns in U and V, namely, the left and right singular vectors Uk and Vk, are orthonormal. That is, Uh•Uk = Vh•Vk = 0 (ortho) and Uk•Uk = Vk•Vk = 1 (normal), where h ≠ k but both range from 1 to N. The orthonormal property also holds for the row vectors. As a result, each component Uk is independent of the other components in matrix U. In other words, one component cannot be represented by a linear combination of any other components. In dynamic experiments, however, protein structural responses to two physical or chemical parameters in the metadata, such as temperature and pH, are not necessarily orthogonal. They could even exhibit some correlation. Therefore, the decomposed left singular vectors Uk from the pure mathematical decomposition of a matrix may not necessarily represent any physically or chemically meaningful changes.
Due to the orthonormal property of SVD, an N-dimensional Euclidean space is established, and its most significant subspace is defined by the first n dimensions. Each coefficient set Pj = (c1j, c2j, …, cnj) of the jth reconstituted map is located in this n-dimensional subspace, such as the two-dimensional subspace of c1 and c2 in Fig. 2(d) and the three-dimensional subspace of c1, c2, and c3 in Fig. 4. All coefficient sets for j = 1, 2, …, N used in the reconstitution of the N original electron density maps (or spectra) in a least squares sense can be represented by N points or vectors P1, P2, …, PN in the Euclidean subspace [Figs. 2(d) and (4)]. This n-dimensional subspace essentially defines a conformational space surveyed by the jointly analyzed core data. While right singular vectors are traditionally presented as functions against some selected metadata, such as time t, here we emphasize the importance to present right singular vectors scaled by their corresponding singular values, that is, the coefficient set as defined in Eq. (5), as a multi-dimensional scatter plot. Each reconstituted electron density map (or spectrum) and the corresponding observed map are presented as a dot in the conformational space at a position determined by the coefficient set Pj. When the subspace has dimensionality greater than two, the conformational space is presented as multiple two-dimensional orthographical projections (Fig. 4). These scatter plots are highly informative for examining the relationship between the electron density maps and their metadata, and for conducting deconvolution by rotation in this Euclidean subspace.
Given two coefficient sets Pi ≈ Pj located close to each other in the conformational space, the corresponding structures i and j are expected to have similar conformations arising from nearly identical reconstitutions. Conversely, two dots far apart from each other in the conformational space suggest two distinct conformations with dissimilar compositions of the map components. For dynamic data collected in a time series, establishing a trajectory is straightforward given that the time stamp of each data point is known. For a collection of electron density maps without a concrete order, it is also possible to establish a trajectory in this conformational space even when the temporal order of the core data is absent, assuming that population change evolves smoothly along a reaction pathway (Ren, 2016, 2013a, 2013b). The order of structures in a series would inform the causation and consequence of structural changes, which in turn reveals structural mechanism. In addition, an off-trajectory location or those between two clusters of observed structures would reveal rare conformations that have never been experimentally captured likely due to energy barriers. Such hypothetical structures can be very informative when they are refined against reconstituted electron density maps (Ren, 2022) or against reconstituted distance matrices using molecular distance geometry (Ren, 2016, 2013a, 2013b).
ROTATION IN EUCLIDEAN SUBSPACE OF SVD
The default solution set of SVD usually carries signals of complicated physical and chemical meanings that are hard to discern. Therefore, SVD alone provides no direct answer to “what-does-it-mean.” A basis component Uk must be interpreted in the context of the metadata. It is important to note that the factorized set of matrices U, W, and V from SVD is not the only solution. The question is: Would alternative solution sets present a physically meaningful interpretation? The idea of rotation following SVD was introduced by Henry and Hofrichter (1992). However, the protocol they put forward does not preserve the orthonormal and least squares properties of SVD. Ren proposed a different rotation protocol that takes metadata into consideration so that the relationship between the core data and metadata can be directly examined (Ren, 2019). This rotation protocol enables a numerical deconvolution of multiple physical and chemical factors following a pure mathematical decomposition, thereby providing a route to answer the question of “what-does-it-mean.” The Ren rotation shall not be confused with a rotation in the three-dimensional real space where molecular structures reside.
This theorem holds because the dot product of two vectors remains unchanged when both vectors rotate by the same angle [Eq. (6)]. It would be straightforward to prove Eq. (6) simply by combining Eqs. (7) and (8) into Eq. (6) and expanding them. All cross terms of sine and cosine would be self-canceled (Ren , 2024).
A rotation in two-dimensional subspace of h and k has no effect on other dimensions, as guaranteed by the orthonormal property of SVD. Furthermore, a multi-dimensional rotation can be achieved by consecutive steps of rotations in many two-dimensional subspaces. The resulting solution set shall retain the orthonormal property of SVD and have no effect on how the core data of protein structures are compared. By transforming one solution set to an alternative solution set , one may find an appropriate perspective that better elucidates the relationship between the core data and metadata in a clear and concise manner (Fig. 1). The theorem of rotation itself does not mention anything related to deconvolution. However, such rotation enables deconvolution by isolating a specific parameter in the metadata from the others.
If two clusters emerge in the conformational space by examining the distribution of the coefficient set Pj, it is desirable to perform a rotation such that these clusters are aligned along a single dimension k without involving other dimensions. As a result, the component Rk would clearly reveal the structural transition from one cluster to the other. A deterministic solution entails a clear relationship between the core data and metadata. In most cases, a proper rotation is achieved via trial-and-error guided by iterative interpretation of the deconvoluted maps in the context of the subject matter. Although a “bad” rotation may not improve data interpretability, it shall not alter the distribution of the observed data and the shape of the reaction trajectory, because a rotation would not introduce nonexistent features nor eliminate any authentic signals. For example, a rotation cannot alter the observation of a large gap between two clusters of core datasets, although it may make these clusters less obvious from that ill-conceived viewpoint. If a fast chemical step follows a slow one, the short-lived intermediate species may not accumulate enough for detection as it is being produced too slowly in the step leading to it. In this scenario, the proposed rotation would not help because it cannot generate signals not captured in the experimental data. However, if weak signals from a short-lived species are indeed present, their clarity can be significantly enhanced after separation from other variations by a proper rotation.
CASE STUDIES OF DECONVOLUTION IN SPECTROSCOPIC DATA ANALYSIS
We continue to use the aforementioned case studies to demonstrate the analytical scheme of deconvolution.
Dark reversion of bacteriophytochrome
In this case study, SVD alone performed on the time series of dark reversion does not yield the pure spectra of the Pfr and Pr states in bacteriophytochrome [Fig. 2(c)]. However, if we apply a counterclockwise rotation of 30.9° in the subspace of the top two dimensions where both c1, c2, and U1, U2 are rotated the same way according to Eqs. (7) and (8), the linear combination of these components remains unchanged according to the theorem of rotation [Eq. (6)]. After rotation, the first left singular vector R1 reveals the absorption spectrum of a pure Pfr state, while the second R2 represents the difference spectrum of Pr–Pfr [Fig. 5(b)]. The straight-line correlation between c1 and c2 becomes perfectly vertical, which is how the rotation angle of 30.9° was determined. Therefore, f1 is a constant, and f2 can be fit with a single exponential function with a time constant τ = 58 min [Fig. 5(c)]. Hence, the pure Pfr state can be derived from an extrapolated point as marked by the red dot on the vertical line. This suggests that the dark reversion experiment did not start with a pure Pfr state; rather the initial sample contained a mixture about 1/3 of Pr and 2/3 of Pfr [reddest curve in Fig. 2(b)]. Similarly, the spectrum of the pure Pr state can be obtained by a slight extrapolation forward beyond the recorded time series at the purple dot in the subspace of f1 and f2 [purple curve in Fig. 5(b)]. As a result of rotation, we are able to achieve deconvolution and obtain the pure spectra for both Pfr and Pr states from the heterogeneous data collected throughout the dark reversion process. We intend to use this minimal case study to illustrate what Moffat had envisioned for time-resolved studies. “Time-resolved experimental approaches … attempt to resolve this time-dependent heterogeneity into the structures of homogeneous intermediates during the subsequent data analysis process” (Moffat, 2001).
We may refer the vertical straight line in the subspace of f1 and f2 as the reaction trajectory of dark reversion from Pfr to Pr, although only the later 2/3 has been experimentally observed [Fig. 5(c)]. It is noteworthy that this reaction trajectory is an invariant, including both the experimentally captured portion by time-resolved absorption spectroscopy and the extrapolated wings beyond the time series, whether such a straight line is vertical along a single dimension of f2 after rotation [Fig. 5(c)] or inclined in the original subspace of c1 and c2 [Fig. 2(d)]. As implied by the theorem of rotation [Eq. (6)], such invariant reaction trajectory depicts the intrinsic absorption property of this bacteriophytochrome protein regardless of our choice of principal axes. In other words, an alternative coordinate system can be chosen as wish to find a viewpoint of greater clarity and conciseness in data interpretation without altering the invariant reaction trajectory.
Photoreaction of photolyase
The second case study showcases a multistep rotation performed in the top five-dimensional subspace. After rotation, it becomes clear that the photoreaction of PhrB under violet light is a biphasic process evidenced by the curved correlation between f1 and f3. By aligning the slower phase II along the third dimension, f1 can be fit nicely with a single exponential function of a time constant τ1 = 0.4 s, while f3 must be described by a two-exponential function with an additional time constant τ2 = 9 s [Fig. 6(a)]. This rotation is achieved by a least squares fitting that determines a first step of −43.8° rotation in the subspace of dimensions 2 and 3 followed by a second step of 54.1° rotation in the subspace of dimensions 1 and 3 with several additional steps involving the dimensions 4 and 5. Alternatively, the faster phase I can be aligned to the first dimension so that f3 can be fit with a single exponential of τ2 = 9 s, while f1 requires both exponentials of τ1 and τ2 (Fig. 7). These two rotations are off by 32.7° in the subspace of dimensions 1 and 3. The pure difference spectra arising from the photoreduction reactions of FAD and DMRL can be deconvoluted from each other in two alternative rotations but not at the same time. The first spectral component R1 after rotation 1 remains as a mixture, while the third component R3 depicts photoreduction of the DMRL cofactor alone (Fig. 8). On the other hand, R1 represents the change only caused by the photoreduction of FAD, and R3 becomes a mixture after rotation 2 (Fig. 8).
This case study underscores an important principle. If two processes are intrinsically interdependent, or if two spectroscopic or structural responses are partially correlated with, instead of orthogonal to, each other, it is impossible to achieve a clean separation in the same Euclidean space established by SVD. The best achievable solution is to use two different rotations to isolate one of the right singular vectors as a single exponential that describes a single chemical process, here f1(t, τ1) after rotation 1 [Fig. 6(a)] and f3(t, τ2) after rotation 2 (Fig. 7). Similarly, only one of the left singular vectors at a time is deconvoluted as a pure component attributed to one single process (Fig. 8). A pure component depicting changes caused by a single reaction step cannot coexist with a pure time course of a single exponential. If they do, this process is independent of, hence orthogonal to the others. In other words, a thorough deconvolution as we formulate in Eq. (9) is not always possible.
The significant fluctuations observed in the difference spectra of PhrB are reoriented into the second dimension so that large fluctuations displayed in f2 are isolated from other dimensions (Fig. 6). R2, the fluctuating component, captures spectral changes only in shorter wavelengths <450 nm, largely matching the absorption range of the iron–sulfur cluster [Fig. 8; see also raw difference spectra in Figs. 3(a) and 3(b)], suggesting that the inorganic cofactor acts as a redox capacitor in the photolyase PhrB. This finding alluded to an intriguing proposal that the iron–sulfur cluster could play a protective role in redox active proteins by smoothing out electronic potential spikes arising from an imbalance of redox reactions on the fly, given its large capacity of four electrons and five valence states (Ren , 2024, 2023).
DECONVOLUTION OF ELECTRON DENSITY MAPS
Compared to the case studies in spectroscopy, SVD-based deconvolution is far more complicated when it is applied to a time series of electron density maps or more broadly to a collection of maps obtained under different experimental conditions. The challenge for SVD of crystallographic data is twofold. First, the number of datasets or maps is inevitably limited, that is, the columns of the data matrix A. Second, the maps obtained from dynamic crystallography experiments often suffer from significant systematic errors. To address the first challenge, we proposed to use alternative reference datasets so that more difference electron density maps can be produced for SVD analysis (see above). In addition, including electron density maps related by non-crystallographic symmetry (NCS) would help increase the total number of independently measured maps. On the other hand, NCS usually introduces an additional source of systematic discrepancy due to crystal packing. Since deconvolution enables a clean separation of both signals and systematic errors, the benefits gained from SVD of a larger collection of maps, such as better signal to noise ratio and improved interpretability, outweigh additional dimensionality required. In this section, we highlight a few recent findings from SVD-based deconvolution. It is important to point out that other than electron densities, this methodology has also been applied to other types of structural data such as interatomic distance matrices that enabled a large-scale meta-analysis of related structures in the Protein Data Bank (Ren, 2016, 2013a, 2013b).
Low-frequency oscillations in protein structure
The advent of x-ray free electron lasers (XFELs) has triggered a second wave of time-resolved crystallography in the last decade. By taking advantage of the intense and ultrashort x-ray pulses, dynamic structural events can be captured before the destruction power of x-rays takes effect (Neutze, 2014). When combined with femtosecond laser pulses as a reaction pump, the achievable and highly desirable time resolution is approaching the XFEL pulse duration. Such greatly improved time resolution of x-ray diffraction has enabled direct observation of low frequency oscillations in protein structures, which have been previously detected by ultrafast spectroscopic methods (Perticaroli , 2015; Xu and Havenith, 2015). Here low frequencies refer to tens to low hundreds wavenumbers or <10 THz, typically 25–45 cm−1 or a period around 1 ps, compared to much higher frequencies of bond stretching and wagging observed by vibrational spectroscopy. These low-frequency signals detected from heme proteins have been attributed to the ligand photodissociation reaction crossing a discontinuity at the intersection of two energy landscapes of the photoexcited state and the ligand dissociated product (Champion, 2005; Gruia , 2008; and Rosca , 2002). As such low-frequency oscillations are usually damped rapidly after a few cycles, they are often explained as vibrational energy relaxation mainly through the solvent in the protein matrix (Fujisaki and Straub, 2005; Okazaki , 2001). Since these spectroscopic signals and theoretical simulations did not offer any molecular image of oscillating structural element, the molecular basis and the functional role of these low-frequency oscillations remain highly speculative.
We applied the SVD-based deconvolution to a time series of XFEL datasets collected by Barends (2015) to probe the structural events in carbonmonoxy myoglobin (MbCO) triggered by the photodissociation of the CO ligand. We were able to isolate a near-perfect oscillation that is distinct from the characteristic signals commonly observed by static and time-resolved crystallography, such as the docked CO ligand, the out-of-plane motion of iron, and the heme doming (Ren, 2019). We extracted oscillatory signals for ∼2/3 of a full period, which appears within ∼30 fs upon photolysis and diminishes by the time point of 3 ps. Two major components are required to depict this oscillation and a least squares fitting results in a low-frequency model of ∼40 cm−1 or 1.2 THz. Such temporal dependency obtained from time-resolved crystallographic data agrees perfectly with the low-frequency oscillations detected by vibrational coherence spectroscopy (Gruia , 2008; Rosca , 2002). An immediate question would be “what is oscillating?” The definitive answer resides in the electron density component maps of the left singular vectors in the two-dimensional subspace. In contrast to other components showing excellent correlation with specific structural elements and describing the characteristic signals of photolysis from MbCO, large difference densities manifested in the two-dimensional subspace of oscillation consistently show no association with the heme, ligand, nor protein (see Fig. 7 of Ren, 2019). This suggests that the low-frequency oscillation associates largely with solvent channels. The oscillatory period of 0.83 ps equivalent to ∼40 cm−1 is the time that a mechanical wave traverses the molecular dimension. We thus postulate that the low-frequency oscillation arises from a shock wave triggered by femtosecond laser pulse. The shock wave may originate at the heme center, propagates mainly in the solvent channels, then bounces off the protein boundary for a few times, before it completely dissipates. Not surprisingly, the oscillation frequency depends on the size of a protein. It was reported that a slightly smaller heme protein of cytochrome c (cyt c) of 12 kDa and a much larger heme protein of cystathionine β-synthase (CBS) of 63 kDa exhibit their dominant low frequencies of 44 and 25 cm−1 or periods of 0.76 and 1.3 ps, respectively (Karunakaran , 2014, 2010). Given that these heme proteins are nearly globular in shape, the ratios between the cubic roots of their molecular weights could be used to assess their relative linear sizes, that is, cyt c:MbCO:CBS = 0.89:1:1.55, which are highly comparable to the ratios between the oscillatory periods as cyt c:MbCO:CBS = 0.91:1:1.6. This remarkable agreement has lent further support to our proposal. It is important to point out that such structural revelation for vibrational energy relaxation is only made possible by proper deconvolution of the heterogeneous XFEL datasets that record many concurrent structural events.
Low-frequency vibrational dynamics had also been observed from bacteriorhodopsin (bR) in purple membrane and lipid nanodiscs (Ferrand , 1993; Johnson , 2014). Femtosecond pump-probe spectroscopy study on visual rhodopsin revealed a characteristic frequency at 60 cm−1 or 1.8 THz, which led to a premature claim that “the primary step in vision is a vibrationally coherent process” (Wang , 1994). In a recent XFEL study on bR, Kovacs (2019) collected time-resolved datasets up to 10 ps, from which they reported oscillatory behaviors of torsion angles and interatomic distances around 100 cm−1 in the protein and its retinal chromophore. When we reexamined their datasets using the SVD-based deconvolution method and conducted a rotation in a 17-dimensional subspace, we found no signal that can be interpreted as photoisomerization of the retinal. Instead, we obtained extensive oscillatory signals in their data occupying ten significant SVD components, among which five frequencies were isolated from one another and from other types of signals (Ren, 2022). The dominant lowest frequency of 61 ± 2 cm−1 matches perfectly with the oscillation reported for visual rhodopsin (Wang , 1994). These strong oscillatory signals extracted in the most significant two-dimensional subspace are further modeled by a rapidly damping sinusoidal function that lasts for nearly two periods (Fig. 9). The other four oscillations range from 150 to 400 cm−1 (Figs. S3 and S4 of Ren, 2022). These two-dimensional oscillations are good examples to illustrate the concept of minimal dimensionality when a single dimension is not sufficient to describe a process [Eq. (9)]. Despite the excellent sinusoidal signals, the corresponding electron density features show no association with any specific structural element, such as the transmembrane helices or the chromophore [compare Figs. 9(a) and 9(b) to 10(a)], similar to the oscillating components of MbCO. Our findings corroborate the notion that the bR protein was pumped into some higher potential energy surfaces after multiphoton absorption by short pump pulses of excessive peak power density and the bR protein may have entered nonproductive pathways that evade the photoisomerization of the retinal, hence irrelevant to its biological function (Miller , 2020). As the oscillating components show, such vibrational energy was oscillating mainly in the solvent before dissipation [Figs. 9(a) and 9(b)]. Twice the wavelength of the shock wave is equivalent to the thickness of the seven-helix bundle of 17 Å given a wave velocity of 1.5 km/s or nm/ps. Here is a critical question: Does the oscillation at the same frequency of 60 cm−1 detected from visual rhodopsin arise from the primary step of vision, or a side effect of the retinal photoisomerization, or is it merely an artifact induced by femtosecond pump pulses of excessive peak power density? Key evidence has yet to show whether the oscillation persists after the rhodopsin molecule absorbs a single photon.
Photoisomerization of double bond and isomerization sampling
Light-triggered double bond isomerization is a primary photochemical event in many rearrangement reactions. Understanding this process at the molecular and electronic levels has been a central topic for many time-resolved projects based on synchrotron Laue diffraction and femtosecond serial crystallography at XFELs. Two model systems have been extensively studied to examine the trans-to-cis photoisomerization of a specific C=C double bond at near atomic resolution. They are the blue light photoreceptor photoactive yellow protein (PYP) incorporated with p-coumaric acid chromophore (Genick , 1997; Jung , 2013; Pande , 2016; Perman , 1998; and Schotte , 2012) and the light-driven proton pump bacteriorhodopsin (bR) mentioned above (Kovacs , 2019; Nango , 2016; and Nogly , 2018). In earlier studies that achieved time resolution of nanoseconds or longer, mixed signals from trans/cis configurations were not a major concern as the photoisomerization was observed at its completion. However, as more recent studies reached time resolution of picosecond and femtosecond at XFELs, the possibility of structural heterogeneity at early time points during photoisomerization and other photoreactions can no longer be ignored.
On PYP, Jung (2013) presented an early intermediate with the double bond C2=C3 of the chromophore twisted at 80°. An equivalent intermediate refined by Schotte (2012) showed a more convincing torsion angle of 33°. Pande (2016) reported the trans-to-cis transition from a torsion angle of 150° to 50° captured around 600 fs. On bR, Nogly (2018) described “the trajectory of retinal isomerization” as the C13=C14 double bond starts to twist from trans configuration to a torsion angle of 135° at early hundreds femtosecond, then evolves to 82° at mid hundreds femtosecond, and remains in a distorted cis conformation of 37° at 10 ps. Kovacs (2019) even tabulated the torsion angles of C13=C14 double bond at many time points before 10 ps and plotted its continuous evolution against the delay time. While some non-ideal torsion angles seem inevitable in structural refinement due to challenges in thorough deconvolution of trans and cis signals from each other, the overall picture of PYP focused on the trans-to-cis transition. However, the interpretation of photoisomerization event as continuous evolution around a double bond is fundamentally flawed.
Here a misconception likely arises from a literal understanding of the phrase “time-resolved crystallography” that disregards the resolving power of time, leading to an overinterpretation of data where every dataset at a given time point gives rise to a uniquely refined conformation. By definition, the refined torsion angle of a double bond is not a crystallographic observation from protein crystals but an interpretation of an observed electron density map; neither do all stereochemical parameters including bond lengths and angles. That is to say, the same experimental map could be interpreted differently. It is a common pitfall to interpret smoothly evolving electron density maps in time-resolved studies as a synchronous conformational change rather than continuous population shifts involving a small number of distinct metastable states. These non-ideal, or even contorted, double bonds most likely resulted from a misinterpretation of trans/cis mixed electron density maps as a single conformation. Structural refinement software based on least squares fitting would have no better choice but produces an odd torsion angle as a compromise between two distinct states.
Perhaps, this important point can be better explained using spectroscopic data. Similar to Figs. 2(b), 3(a), and 3(b), both PYP and bR also showed smooth and gradual color changes in time series of absorption spectra and during pH titration (Meyer, 1985; Meyer , 1987; and Oesterhelt and Hess, 1973). Clearly, such smoothness in color change must not be interpreted as a continuous change in torsion angle of a specific double bond passing 90° during isomerization. It is equally wrong to presume that all C13=C14 double bonds pedal synchronously throughout an ensemble of molecules. From the commonly accepted view of molecular dynamics, the trans and cis configurations of a double bond correspond to two energy wells narrowly centered around the torsion angle of 180° and 0°, respectively. The smooth transition displayed in absorption spectra results from the gradually falling population of the initial state and the gradually rising population of the end state. The populations of a few distinct states remain mixed at all times. Conformational heterogeneity is evidenced from the temporal smoothness. Similarly, the smooth evolution of electron density maps (see, for example, Figs. S3 and S8 of Nogly , 2018) also resulted from relative populational shifts of heterogeneous states. Forceful refinement of a single conformation against a dataset of mixed states would inevitably lead to an erroneous conclusion.
Why does such a straightforward concept in spectroscopy appear deceptive in crystallography? The cause could have originated from these two seemingly conflicting aspects: On one hand, the crystalline periodicity and symmetry presumes that all unit cells in a single crystal are considered identical. Furthermore, all asymmetric units in a unit cell are related by crystallographic symmetry. As such, an entire crystal contains chemically and conformationally homogeneous molecules. On the other hand, the thermodynamic principle affirms that the reactants, intermediates, and products of a chemical reaction initiated in a crystal would disobey the crystalline periodicity and symmetry, the so-called “time-dependent substitutional disorder” (Ren and Moffat, 1994). These scenarios are analogous to a well-rehearsed military march in formation vs a civilian holiday parade. Moffat asserted that the thermodynamic principle supersedes crystallinity so that “all molecules in the crystal lattice behave independently of one another as if they were in dilute solution” (Moffat, 1989). However, the recent time-resolved studies of bR appear to uphold crystallinity over thermodynamics as the photoisomerization of retinal was depicted as a single dominant intermediate species pedaling around C13=C14 double bond continuously from trans to cis and synchronously throughout the crystal. Yet, it is curious why such a cornerstone principle in time-resolved crystallography advocated long ago by Moffat had not been followed nor disputed.
One may argue that refining a single conformation does not exclude the possibility of other coexisting conformations. A theoretically plausible conformation derived from molecular dynamics simulation had often been used to defend the crystallographic refinement result. If a double bond is refined to a torsion angle of ∼90°, it implies that such distorted conformation represents the most dominant population among all possible photoproducts at a given time point, which can only be justified by an energy well near 90° torsion angle upon photon absorption. Yet, no evidence, theoretical or experimental, suggests so. Does each double bond pedal, twist, rotate, and pass the midpoint between trans and cis at 90° torsion angle? One may imagine that a small fraction of the molecular ensemble could happen to be passing 90° torsion angle during the x-ray exposure. Molecular dynamics simulation could also show such passing of an energy barrier. Nevertheless, a transient conformation at ∼90° torsion angle without an energy well cannot out populate other intermediate species in the well-known energy wells of trans and cis configurations. In other words, it is not convincing, could even be misleading, to declare a direct observation of a double bond so contorted at near 90° torsion angle solely based on a refinement result in protein crystallography. However, it is in fact not the central point of these studies to claim that an unexpected energy well had been discovered near 90° torsion angle. We will demonstrate below that a straightforward reinterpretation of the observed maps would satisfactorily explain the seemingly twisted double bonds—both trans and cis are coexisting in the ensemble and only their relative composition is changing.
The same principle also applies to any interpretation of time-resolved electron density maps that capture the rupture or formation of a chemical bond, as in the photolysis of carbonmonoxy myoglobin and hemoglobin. Two atoms are either bonded with a proper bond length or not bonded but engaged in van der Waals contact, or not in contact at all. A refined bond length longer than the proper value but shorter than van der Waals distance strongly suggests a mixture of both bonded and nonbonded species. A claim of direct observation of an unusual bond length in between is very likely erroneous and would require extraordinary evidence beyond structural refinement to prove otherwise.
Do these two contrasting interpretations, continuity in conformation vs continuity in composition, differ only in semantics? Are they equivalent therefore inconsequential to mechanistic understanding? We conducted a joint SVD analysis on a collection of time-resolved XFEL datasets of bR plus a few more synchrotron datasets (Weinert , 2019). By extensively applying the rotation theorem [Eq. (6)], we were able to isolate the concurrent structural events during photoisomerization of retinal along with detailed structural information. Contrasting to “the trajectory of retinal isomerization,” this analysis presents an entirely different story regarding how photoisomerization takes place in bR (Ren, 2022).
Retinal incorporated in bR isomerizes from all-trans to 13-cis upon photon absorption at ∼500 fs specifically at C13=C14 double bond with a high quantum yield (Govindjee , 1990; Tittor and Oesterhelt, 1990). In contrast, photoisomerization of all-trans retinal is slow and sluggish in an organic solvent or in the gas phase. Isomerization to 11-cis is preferred in solution, but other double bonds are also allowed to isomerize although with poor quantum yields (Freedman and Becker, 1986; Hamm , 1996; Koyama , 1991; and Logunov , 1996). Here the deconvoluted structural events explain how the interactions between the retinal chromophore and its protein environment catalyze the productive pathway of isomerization but shut down the other pathways to unwanted byproducts by tuning the potential energy surfaces of the ground and excited states. At the ground state, the all-trans retinal, including the trans C15=Nζ double bond of the Schiff base, adopts a rather flat geometry [Fig. 10(c)]. Both experimental and theoretical studies suggested that a charge separation is established between the β-ionone ring and the Schiff base upon photon absorption (Mathies and Stryer, 1976; Nogly , 2018; and Schenkl , 2005). Their data allowed us to estimate that the separated charge is equivalent to a fraction of an electron thus a pair of attraction forces of several piconewton are exerted on the β-ionone ring and the Schiff base toward each other (Ren, 2022). This photoinduced Coulombic attraction is an intrinsic property of all-trans retinal, regardless of whether it is incorporated in protein and in which protein. As the flat plate of the retinal becomes subjected to a longitudinal compressive force at the Frank–Condon point, that is, 0+ time point, the flat retinal has to be compressed and develop some curvature, which was exactly observed in our deconvoluted maps at ∼30 fs. Our intermediate structures refined against the deconvoluted maps show that the retinal molecule is shortened and curves into a wavy S shape (Fig. 2 of Ren, 2022). The S-shaped retinal is the consequence of twisting all single bonds from C6 to C15 along the polyene chain under compression [Fig. 10(c)]. The largest twist exceeds 60° from the ideal anti conformation of 180°, which occurs early in the photocycle and near the β-ionone ring. The twists subside gradually and diminish completely at a rather late L state [Fig. 1(c) of Ren, 2022]. Such conformational continuity reflects little variation in the energy landscape throughout the torsional rotation of a single bond in contrast to configurational discreteness of a double bond. All double bonds remain in near-perfect trans configuration except that C13=C14 flips to a near-perfect cis configuration before reaching the J state. We did not find any double bond twisted at the midpoint of isomerization necessary for satisfactory map fitting. This study demonstrates that the same crystallographic datasets can be interpreted differently depending on whether mixed structural signals are properly deconvoluted.
Although no double bond was found deviating from the near ideal trans and cis configurations, we postulated that all double bonds along the polyene chain attempt to sample the possibilities of isomerization, but to no avail, except one [Fig. 10(c)]. The common consequence of these attempts was directly observed as a transient expansion of the retinal binding pocket before a successful isomerization [Figs. 10(a) and 10(b)]. The failed isomerization attempts, and restricted single bond twists are due to the same reason—the retinal is tightly boxed in its binding pocket of the protein. The most constrictive site centers at the middle segment of the polyene chain, allowing larger twists toward the β-ionone ring and the only successful isomerization at C13=C14 double bond [Fig. 3(a) of Ren, 2022]. Without this binding pocket, a free all-trans retinal in solution is allowed to isomerize at multiple double bonds, but preferably to 11-cis as the C11=C12 double bond is located at the midpoint of the polyene chain. All isomerization quantum yields in solution are poor because single bond rotation is energetically easier. A cis double bond and a syn single bond would deliver the same outcome—bringing the β-ionone ring and the Schiff base toward each other—except that a syn single bond would revert to anti spontaneously after charge recombination but a cis double bond would not. Recently, multiple isomerizations from all- trans retinal captured during the early photoreaction of another microbial rhodopsin are likely incomplete isomerization attempts driven by the same Coulombic attraction force before settling to a complete isomerization to 11- cis (Kaziannis , 2024). As the protein environment of retinal in bR hinders many pathways to the wasteful byproducts, the productive quantum yield to 13-cis is greatly enhanced because it is the only remaining pathway. The consequence of such tuning of potential energy surfaces was also visualized as one of the SVD components R10 resulting from the rotation theorem [Eq. (6) and Fig. 10(a)]. This component map can be readily modeled by the retinal binding pocket expansion before isomerization [Fig. 10(b)]. The very early pocket expansion event at hundreds femtosecond strongly suggests that the retinal is sampling numerous options of isomerization along with single bond twisting. Although these attempts manage to push the binding pocket outward quite a bit, the elastic strength of the transmembrane helices imposes strong stereochemical restraints on the retinal chromophore, thereby imposing insurmountable energy barriers to suppress any byproducts, hence catalyzing the intended product. Several mutants at the retinal binding pocket fail to do so, which permits by-product formation to various extents, therefore impairing the pumping activity (Ren, 2022).
This study of photoisomerization in bR demonstrates that heterogeneous structural deconvolution offers mechanistic insights into the catalysis of a chemical reaction that cannot be obtained from electron density maps of mixed states. We must point out that the deconvolution is not limited to the separation of distinct conformational species, it also allows the separation of structural events. Here, retinal isomerization and its binding pocket expansion are two structural events that may or may not concur. As the deconvolution formulated in Eq. (3), each observed electron density map ρ(X, r) or difference map Δρ(X, r) is expressed as a linear combination of several maps of pure conformational species. When this linear combination is derived from SVD and the subsequent rotation [Eq. (5)], the components involved in the linear combination may or may not represent pure conformations. Instead, they could depict distinct structural events. It is equally informative, if not better, to present the I state of bR as a conformational state of all-trans retinal linearly combined with an expanded binding pocket, the precursor of the J state as 13-cis retinal combined with a recovered binding pocket, and the J state as 13-cis retinal in a contracted binding pocket (Ren, 2022). In other words, concurrent structural events taking place in different regions of a protein, such as isomerization of the retinal and its binding pocket expansion, are more likely to be orthogonal to one another, hence representing more elemental components compared to those depicting pure conformational species throughout a chemical reaction.
Electronic orbital transitions
Structural signals captured by x-ray crystallography are derived from detected x-rays scattered by the electrons of protein, prosthetic cofactors, and solvent molecules in a crystal. A protein with a molecular mass of m in Da contributes at least m/2 electrons to scatter x-rays. Each of these electrons resides strictly in its own orbital, that is, a probability distribution with a specific shape depending on its quantum state, although this electronic orbital could be extensively delocalized. In essence, a chemical reaction involves electron transfer from one orbital to another. This quantized orbital transition is in theory observable by crystallography as two distinct orbitals before and after the transition would scatter x-rays differently. However, it has been long deemed infeasible in protein crystallography, despite being highly desirable, because the electrons involved in the orbital transitions of a chemical reaction add up to a tiny fraction of m/2 electrons that contribute to the total scattering. It is widely believed that crystallographic observations of electron density distribution of a single or several individual electrons would require ultrahigh spatial resolution such as those demonstrated by charge–density analyses that map out bonding electrons in small molecule crystallography (Coppens, 1998; Koritsanszky and Coppens, 2001). For two different systems, we were able to resolve electron density maps that depict the images of orbital transitions in the respective protein environments. Such visualization of orbital transitions involving a small number of electrons is only made possible by deconvolution of signals and systematic errors from different sources.
It is well established that upon photolysis of carbonmonoxy myoglobin (MbCO), the bound CO ligand dissociates efficiently from an iron –porphyrin–imidazole system. For the heme Fe(II) bonded to the CO ligand, the five 3d orbitals are split into two energy levels so that its six 3d electrons occupy three orbitals at the lower energy level in pairs and leave the other two orbitals empty, resulting in a low-spin state S = 0. As the sixth ligand of CO is lost upon photolysis, the dissociated ligand parks at a nearby docking site, and the heme iron returns to its high-spin state S = 2 by occupying all of its 3d orbitals (Perutz, 1979). Theoretical calculation explained how the newly occupied 3d orbitals break the heme planarity and make the iron drop out of the heme plane (Ugalde , 2004). The structural consequences of the spin crossover are highly reproducible and have been repeatedly observed using static and time-resolved crystallography. However, none of the crystallographic maps showed any clear signal of orbital occupations in different spin states. By jointly analyzing a collection of ultrafast MbCO datasets (Barends , 2015), one of the rotated SVD components U4 (no differentiation between notations U and R) revealed a difference Fourier map that resembles a superposition of several 3d orbitals of the heme Fe(II) (Ren, 2019). This component features a network of positive densities in the shape of a cubic cage around the iron with eight peaks at the corners of the cube, a feature indiscernible in any of the raw difference maps (Fig. 3 of Ren, 2019). The positive cubic densities suggest a net gain of electron density distribution while the five 3d orbitals are all occupied when the sixth ligand is ejected by photolysis. This crossover of the heme Fe(II) from low- to high-spin takes place concurrently with the other events of conformational changes, such as departure of the CO ligand, heme doming, and out-of-plane displacement of the iron. Since the electron density redistribution due to spin crossover is relatively minor compared with atomic motions causing relocation of many tens or even hundreds of electrons, small signals of the spin crossover are completely immersed, hence undetected in the raw difference maps without a numerical deconvolution.
In addition, the coefficient c4 associated with U4 decreases monotonically from its maximum at tens femtosecond to ∼2/3 of its peak value as the reaction proceeds for 150 ps, exhibiting an exponential decay with a time constant of 44 ps (Fig. 11). We postulate that this decay arises from ultrafast occupation of the empty 3d orbitals before 10 fs followed by a slower transition to a medium-spin state S = 1 or an equilibrium between the high- and medium-spin states. This interpretation is supported by theoretical calculations showing that the more energetically favored ground state of an iron –porphyrin –imidazole complex is the triplet spin multiplicity, that is, the medium-spin state S = 1, rather than the high-spin state S = 2 (Rovira , 1997; Ugalde , 2004).
In a recent study, we reported extensive observations of orbital transitions in a four-iron-four-sulfur cluster ([4Fe4S]) incorporated in the DNA repair photolyase PhrB mentioned above (Ren , 2024). Our spectroscopic studies showed that the redox changes in the [4Fe4S] cluster are coupled to the interdependent photoreduction reactions of two organic cofactors simultaneously (Ren , 2023). According to quantum chemical calculations, two parallel faces of the cubic shape of the iron–sulfur cluster are unique among the six cubic faces due to the coupling of spin momenta of electrons between nearby metal centers. When ferrous and ferric ions coexist in the same cluster, they adopt a mixed valence of Fe2.5+ as a result of delocalization of 3d electrons (Noodleman and Case, 1992; Voityuk, 2010). Hence, the iron–sulfur cluster was theoretically predicted to have a layering feature in its electron density distribution. Remarkably, such layered electronic structure arising from the redox response of the iron–sulfur cluster has been uncovered by our SVD-based deconvolution of experimental difference maps (Fig. 12). We captured quantized electron density changes with characteristic molecular symmetries as [4Fe4S]2+ in the ground state is reduced or oxidized to four other valence states. In addition to those features centered on irons, the captured positive or negative peaks of electron density changes associated with all eight sulfurs, including Sγ atoms of the Cys anchors, suggest electron relaxation of the passive orbitals. We also observed several other important electronic events of orbital transition. Specifically, the valence isomers of [4Fe4S]2+ exhibit different orientations of the valence layers; and the iron–sulfur cluster appears to expand in size as it gains sufficient electrons to form a neutral cluster of [4Fe4S]0 (Ren , 2024).
As demonstrated in both MbCO and PhrB, it is possible to visualize electron density changes due to quantized electronic events if such small changes can be effectively isolated from many other concurrent conformational changes involving atomic displacements. Contrary to the common belief, sharp images at subatomic resolution are not necessary. This is because the probability distribution functions of electrons do not exhibit a mathematical discontinuity that could only be properly depicted by high-frequency terms in the Fourier summation, that is, ultrahigh spatial resolution. Given the resolving power of the SVD-based deconvolution, signals and systematic errors from multiple sources can be isolated, from which weak signals of orbital transition could be extracted. In the absence of significant atomic displacement, electronic movements could stand out as a few top SVD components.
CONCLUDING REMARKS
On this special occasion that celebrates the work and achievements of Keith Moffat, we felt compelled to revisit a key concept in time-resolved studies—structural heterogeneity. At the dawn of time-resolved crystallography, Keith laid a cornerstone by articulating that the inherent heterogeneity in a chemical reaction can be exploited for studying reaction mechanism without deliberate trapping. With recent developments in serial femtosecond crystallography at x-ray free electron lasers, it becomes ever more relevant and urgent to reiterate this concept in light of the current practice in the field. We were fortunate to join Moffat's early endeavor to capture protein dynamics by crystallography. Thanks to those struggles and failures, we have been striving for experimental and analytical innovations in our independent careers. In this review, we present the formulation, utility, and outcome of SVD-based deconvolution that enables isolation of distinct molecular events from heterogeneous experimental data. As we have demonstrated in diverse systems, the resolution of structural heterogeneity in dynamic data plays a key role in gaining new insights into some of the most fundamental biochemical processes.
ACKNOWLEDGMENTS
This work is supported in part by grants from the University of Illinois Chicago, National Institutes of Health (Nos. EY024363 and EY035671), National Science Foundation (No. MCB 2017274), Department of Energy (No. DE-SC0024529), and Chicago Biomedical Consortium Catalyst Award (No. C-086) to XY. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Zhong Ren: Conceptualization (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Xiaojing Yang: Conceptualization (equal); Funding acquisition (equal); Investigation (equal); Project administration (equal); Resources (equal); Validation (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding authors upon reasonable request.