The field of time-resolved macromolecular crystallography has expanded rapidly since free electron lasers for hard x rays (XFELs) became available. Techniques to collect and process data from XFELs have spread to synchrotron light sources. Although time-scales and data collection modalities can differ substantially between these types of light sources, the analysis of the resulting x-ray data proceeds essentially along the same pathway. At the base of a successful time-resolved experiment is a difference electron density (DED) map that contains chemically meaningful signal. If such a difference map cannot be obtained, the experiment has failed. Here, a practical approach is presented to calculate DED maps and use them to determine structural models.
THE PHYSICAL BASIS OF DED MAPS, REAL SPACE REPRESENTATION
The need for a practical tutorial on how to analyze time-resolved crystallographic (TRX) data led to a workshop at the 10th Annual International BioXFEL Conference in San Juan, Puerto Rico, in May 2023. The majority of the material used at this workshop is referenced here. This manuscript is intended as a tutorial for a practical approach to producing high-quality difference electron density maps from crystallographic data. In addition, it outlines how to use a difference electron density map to determine a molecular structure, which could be a structure of an intermediate or any other structure of interest. The tutorial is driven by the need to analyze the small signal in difference maps caused by the weak extent of reaction initiation that is common in TRX experiments. In addition, it serves as a refresher and as an entry point to this fascinating field.
Related information is also found in the literature cited below. However, this manuscript does not explain how to process TRX data, whether Laue data (Moffat, 1989; Ren and Moffat, 1995) or data collected by time-resolved serial crystallography (Aquila et al., 2012; Tenboer et al., 2014; and White, 2019). It also does not explain how to globally analyze the TRX data and extract chemical kinetic mechanisms and pure species structures from them. For this, the reader is referred to advanced literature (Schmidt et al., 2003; Rajagopal et al., 2004; Ihee et al., 2005; Schmidt et al., 2005; Schmidt, 2008; Jung et al., 2013; Schmidt et al., 2013; and Schmidt, 2019).
THE PHYSICAL BASIS OF DED MAPS, RECIPROCAL SPACE REPRESENTATION
In general, a time-resolved crystallographic experiment consists of two steps. In step (i), reference structure factor amplitudes are collected from protein crystals in which a reaction has not been started. This can be done, for example, by exposing the crystals in the dark ahead of a pump-probe experiment. In step (ii), time-dependent structure factor amplitudes are collected at a time delay t after a reaction is initiated, for example, by an intense laser light pulse or by mixing with the substrate. Each of these is the amplitude of the time-dependent structure factor, which is the reciprocal space equivalent of Eq. (1). Figure 1 shows how the time-dependent structure factor is constructed from the structure factors of the intermediates and the reference state. After processing, two crystallographic datasets are obtained, one for the reference state and another that probes the progress of the reaction at time-point t. (More datasets can be obtained at any other time-point.) Both datasets consist of a long list of Miller indices, structure factor amplitudes, and their measurement errors. Data are typically stored in a binary format called the mtz format (after the progenitors McLaughlin, Terry, and Zelinka; see also the IUCr Commission on Crystallographic Computing) that is standard to the Collaborative Computational Project Number 4 (CCP4) suite of programs (Winn et al., 2011).
Figure 2 shows a flow chart of how to calculate a difference map from measured data. The mtz file that contains the reference structure factor amplitudes is called dark.mtz, and the one with the time-resolved data is called light.mtz. This mimics the result of a pump-probe TRX experiment where the reaction is started in the crystals with laser light pulses. The mtz files could just as well be called water.mtz and ligand.mtz if both originated from a substrate diffusion experiment (Schmidt, 2013; Olmos et al., 2018), or be given any other name. A third mtz file is required that contains the phases of the reference state, called here φref. This mtz file is obtained by refining a reference state model against the structure factor amplitudes contained in the dark.mtz file using standard refinement programs, such as refmac (Murshudov et al., 2011) or phenix (Adams et al., 2010; Liebschner et al., 2019). The reference model should be as complete as possible, including all water molecules and other ions or ligands. The refinement should provide the best possible set of reference phases. It also provides a set of calculated structure factor amplitudes that fit the observed structure factor amplitudes of the reference state as accurately as possible. The calculated amplitudes are by definition on the absolute scale.
Difference map calculations rely on proper scaling of the time-dependent amplitudes to the reference amplitudes. In general, this is done by applying a resolution-dependent scaling model that can consist of scale factors and isotropic or anisotropic B-factors (Evans, 2006; Dalton et al., 2022). Typically, the quality of the reference dataset is better than that of the time-dependent data. This is because the reaction in the crystals produces disorder that is larger than that in the crystals at rest. The time-dependent intensities (or amplitudes) are corrected for this during scaling, so that Wilson plots of the reference and the time-dependent data have approximately the same slope after scaling. Disorder of individual atoms engaged in the reaction often causes the magnitudes of the positive difference features to be smaller than those of the negative ones.
Monitoring Rscale is important for predicting the success of a time-resolved experiment. When the data quality is poor because (i) not enough diffraction patterns are collected, (ii) unsuitable data processing parameters are employed, (iii) the detector geometry has not been determined correctly, or (iv) the detector itself is not in good working condition, Rscale will be elevated by both systematic and experimental error in the collected data. Then, a meaningful difference electron density map cannot be obtained. Improvements in data processing or collecting more diffraction patterns (whatever is relevant for the particular experiment) may be necessary. From experience, Rscale values near or smaller than 10% are necessary to obtain good difference maps with meaningful signal.
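As a rough illustration of how Rscale might be monitored, the following Python sketch scales a time-dependent dataset to the reference with a single overall least-squares scale factor k and then evaluates Rscale = Σ||Fref| − k|Ft|| / Σ|Fref|. Both the single scale factor (in place of a resolution-dependent scaling model) and this particular definition of Rscale are simplifying assumptions made here; the function name and the example amplitudes are hypothetical.

```python
def scale_and_rscale(f_ref, f_t):
    """Scale f_t to f_ref with one overall factor and return (k, r_scale).

    f_ref, f_t: equal-length lists of amplitudes for matching reflections.
    Assumed definition: r_scale = sum(|f_ref - k*f_t|) / sum(f_ref).
    """
    if len(f_ref) != len(f_t) or not f_ref:
        raise ValueError("need two non-empty lists of equal length")
    # Least-squares scale factor minimizing sum((f_ref - k*f_t)**2)
    k = sum(a * b for a, b in zip(f_ref, f_t)) / sum(b * b for b in f_t)
    numerator = sum(abs(a - k * b) for a, b in zip(f_ref, f_t))
    denominator = sum(f_ref)
    return k, numerator / denominator


# Hypothetical amplitudes for three reflections
k, r_scale = scale_and_rscale([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
acceptable = r_scale <= 0.10  # rule of thumb from the text: about 10% or less
```

A real scaling program would additionally fit isotropic or anisotropic B-factors and would usually weight reflections by their measurement errors.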
THE DIFFERENCE FOURIER APPROXIMATION
Note that, this time, Δρ is on the absolute scale when true difference structure factors ΔFtrue are used with amplitude |ΔFtrue| and phase φΔF,true [Fig. 3(a)]. Here, attention to detail is required: |ΔFtrue| (the vertical bars enclose the entire ΔFtrue) is the amplitude of the true difference structure factor generated by subtraction of two vectors in the complex plane [Fig. 3(a)]. However, as mentioned (Fig. 1), the phase φΔF,true cannot be determined, and the difference structure factor ΔFtrue cannot be calculated. This would be the end of any attempt to determine a difference map, were it not for an approximation that allows an observed difference structure factor amplitude to be used together with the reference (dark) state phase instead of the true difference structure factor.
With the notion that the phase of the difference structure factor ΔFtrue is not correlated with the reference phase, the second term on the right-hand side of Eq. (13) averages out in a Fourier summation. Equation (13) is the mathematical counterpart of the difference Fourier approximation. Equation (8) can be derived this way from first principles. It is now understood why Δρ is only represented on half the absolute scale and why the reference phase can be used: the difference structure factor DF with measured amplitudes Δ|F| and model phases φref is approximately ½ ΔFtrue. Accordingly, a difference map calculated with the observed differences and the phases of the reference state is a true difference map with DED features on ½ the absolute scale and some additional noise caused by the second term on the right-hand side of Eq. (13). Since Δ|F| can be accurately measured [Fig. 3(c)], the experimental DED map is very sensitive to structural and occupancy changes (Henderson and Moffat, 1971).
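The difference Fourier approximation can be checked numerically for a single reflection. The Python sketch below assumes the standard decomposition Δ|F| exp(iφref) ≈ ½ ΔFtrue + ½ ΔFtrue* exp(2iφref), valid when |ΔFtrue| is much smaller than |Fref|; this form is taken here to correspond to Eq. (13). The function name and the numerical values are hypothetical.

```python
import cmath


def difference_fourier_terms(f_ref_amp, phi_ref, delta_f_true):
    """Return (observed, half_true, noise) for one reflection.

    f_ref_amp    : reference amplitude |Fref|
    phi_ref      : reference phase in radians
    delta_f_true : true difference structure factor (complex number)
    """
    f_ref = cmath.rect(f_ref_amp, phi_ref)
    f_t = f_ref + delta_f_true                      # time-dependent structure factor
    delta_amp = abs(f_t) - f_ref_amp                # measurable difference of amplitudes
    observed = delta_amp * cmath.exp(1j * phi_ref)  # difference amplitude with reference phase
    half_true = 0.5 * delta_f_true                  # signal on half the absolute scale
    # Second term: its phase is unrelated to that of delta_f_true
    noise = 0.5 * delta_f_true.conjugate() * cmath.exp(2j * phi_ref)
    return observed, half_true, noise


# |Fref| = 100, |dFtrue| = 1: the approximation should hold closely
observed, half_true, noise = difference_fourier_terms(100.0, 0.7, cmath.rect(1.0, 2.0))
residual = abs(observed - (half_true + noise))
```

The residual is of second order in |ΔFtrue|/|Fref|, which is why the approximation works well for the small differences typical of TRX data.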
NECESSITY TO WEIGHT DIFFERENCE STRUCTURE FACTOR AMPLITUDES
DED MAP FUN
From the new file differenceF10.phs, a DED map (F10) can be calculated and compared to the observed difference map determined from the appropriate amplitudes and weights. Surprisingly, the two difference maps are almost identical [compare Figs. 4(a) and 4(b)]. If one observed the F10 difference map during an LCLS beamtime, one would already be satisfied, since the experiment worked. This clearly shows that the determination of a correct phase-flip pattern (either the dark phase or the phase flipped by 180°) is sufficient to produce a good difference map. The art is to determine the phase-flip pattern correctly in the presence of experimental noise. If this pattern deteriorates because of poorly measured structure factor amplitudes, the signal in the DED map deteriorates accordingly (Schmidt et al., 2003). This is the reason that weighting is important (see above). The goal is to down-weight the contribution of potentially false phase flips to keep the difference signal as strong as possible.
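The phase-flip pattern can be made explicit in a few lines of Python. The sketch below assumes that the F10 file was generated by keeping, for every reflection, only the sign of the observed difference (reference phase for a positive difference, reference phase plus 180° for a negative one) and replacing the amplitude by a constant of 10; this reading of the file name is an assumption, and the function names and reflection values are hypothetical.

```python
def amplitude_and_phase(delta_f_signed, phi_ref_deg):
    """Convert a signed difference into a positive amplitude and a phase in degrees."""
    if delta_f_signed < 0:
        # A negative amplitude equals a positive one with the phase flipped by 180 degrees
        return -delta_f_signed, (phi_ref_deg + 180.0) % 360.0
    return delta_f_signed, phi_ref_deg % 360.0


def constant_amplitude_pattern(reflections, amplitude=10.0):
    """Keep only the phase-flip pattern; every amplitude becomes the same constant.

    reflections: list of (hkl, signed_difference, phi_ref_deg) tuples.
    """
    pattern = []
    for hkl, delta_f_signed, phi_ref_deg in reflections:
        _, phase = amplitude_and_phase(delta_f_signed, phi_ref_deg)
        pattern.append((hkl, amplitude, phase))
    return pattern


# Two hypothetical reflections: one negative and one positive difference
f10 = constant_amplitude_pattern([((1, 0, 0), -3.2, 40.0), ((0, 1, 0), 5.0, 350.0)])
```

A Fourier synthesis with such constant amplitudes carries no information beyond the sign pattern, which is what makes the similarity of the two maps instructive.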
STRUCTURES FROM DED MAPS
It should be mentioned at this point: Δρ is only determined on half the absolute scale using the measured amplitudes and the reference (dark) phases [see above, Eqs. (8) and (13)]. Therefore, for this equation to work with measured DED maps, twice the measured Δρ must be added for the extrapolation to remain related to 1/cI1. This needs to be kept in mind.
This equation is unusual unless one accepts that the Fourier synthesis [Eq. (23)] can be executed with negative amplitudes. Again, from crystallographic first principles (complex number algebra), any negative amplitude can be replaced with a positive amplitude when the associated phase is flipped by 180°. Figure 6 explains the reason using a representation in the complex plane. The extrapolated structure factor amplitudes are calculated by aligning the difference structure factor amplitudes with the dark state structure factor. If the difference amplitudes are positive, an extrapolated structure factor with a magnitude larger than the reference amplitude and with the reference phase emerges. The difference amplitudes can also be negative. Then, an extrapolated structure factor with a smaller magnitude [Fig. 6(b)] can be obtained. However, and this is inevitable, some of the extrapolated structure factor amplitudes calculated from Eq. (22) will become negative. This situation is depicted in Fig. 6(c). The resulting extrapolated structure factor points in the opposite direction compared to the reference structure factor. This means that this extrapolated structure factor is an ordinary structure factor (with a positive amplitude), but its phase is φref + 180°. Of course, it is this positive amplitude that must be submitted to a reciprocal space refinement program, and all extrapolated structure factors (with either the reference phase or the reference phase flipped by 180°) must be used to calculate an extrapolated map.
By omitting the “negative” structure factors from the Fourier summation [Eq. (23)] as suggested recently (De Zitter et al., 2022), an extrapolated map (ρnn, nn for no-negatives) is obtained that at first glance appears quite similar to the correct DED map ρt. Structural refinement against ρnn, though, becomes more difficult. As an example, the ρnn map has been used to real-space refine the structure of the pCA chromophore [Fig. 5(b)] in photoactive yellow protein (PYP) with the goal of reproducing the torsional angle ΦT determined previously (Pande et al., 2016) [Fig. 5(b), red line]. Determining a structure that follows the electron density was more difficult, and the result (ΦT ∼140°) was quite different from that obtained when all structure factor amplitudes are maintained (ΦT ∼35°) (Pande et al., 2016). For the calculation of ρext in this example, the average extrapolated structure factor amplitude pointing in the direction of Fref was 255 × f (f is the Thomson scattering length of an electron), whereas the average amplitude pointing in the opposite direction to Fref was 67 × f, a magnitude that cannot be neglected. When the “negative” amplitudes are dismissed, it is not clear (a) whether an accurate characteristic N (NC) can be determined (see next paragraph for methods to determine NC), (b) whether the obtained extrapolated map is correct, and (c) whether a reciprocal space refinement against the incomplete |Fext| data will provide accurate structural displacements or structural relaxations. In any case, there is no physical reason to dismiss any extrapolated amplitudes. When an NC is determined for the PYP data (Pande et al., 2016; Pandey et al., 2020), the fraction of structure factors pointing in the opposite direction to those of the reference is about 25% of all structure factors in the dataset, whether NC = 16 [Fig. 7(a)] or NC = 29 [Fig. 7(b)] is employed. This large fraction might also have a deeper meaning that merits investigation.
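The three situations of Fig. 6 can be reproduced with a short Python sketch. It assumes that the extrapolation has the form Fext = |Fref| + N × Δ|F| with signed Δ|F|, which is taken here to represent Eq. (22); any weighting is omitted. The function names and numbers are hypothetical.

```python
def extrapolate(f_ref_amp, phi_ref_deg, delta_f_signed, n):
    """Return (amplitude, phase_deg) of one extrapolated structure factor.

    Assumed form: signed value = f_ref_amp + n * delta_f_signed.
    A negative signed value becomes a positive amplitude with phase + 180 degrees.
    """
    signed = f_ref_amp + n * delta_f_signed
    if signed < 0:
        return -signed, (phi_ref_deg + 180.0) % 360.0
    return signed, phi_ref_deg % 360.0


def fraction_flipped(reflections, n):
    """Fraction of reflections whose extrapolated structure factor opposes Fref.

    reflections: list of (f_ref_amp, phi_ref_deg, delta_f_signed) tuples.
    """
    flipped = sum(1 for f, _, d in reflections if f + n * d < 0)
    return flipped / len(reflections)


larger = extrapolate(100.0, 30.0, 2.0, 16)     # positive difference, cf. Fig. 6(a)
smaller = extrapolate(100.0, 30.0, -2.0, 16)   # negative difference, cf. Fig. 6(b)
opposite = extrapolate(100.0, 30.0, -10.0, 16) # sign change, cf. Fig. 6(c)
```

All three results carry a positive amplitude and can therefore be passed to a reciprocal space refinement program; only the third one has its phase flipped.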
SEMI-AUTOMATIC DETERMINATION OF A CHARACTERISTIC FACTOR NC TO CALCULATE AN EXTRAPOLATED MAP
Method 2 uses the extrapolated maps themselves (Tripathi et al., 2012; Schmidt, 2019; and Pandey et al., 2020). A set of extrapolated structure factor amplitudes can be calculated with increasing factor N. Then, regions with strong negative density in the DED map become negative also in the extrapolated map [Fig. 8(b)]. These regions of interest (ROI) can be used to determine an accurate NC. Figure 7 shows real-world examples from TR-SFX experiments at the LCLS and the European XFEL (EXFEL). With increasing N, the negative densities found in the ROIs of the extrapolated maps increase. The results are plotted as a function of N, and NC is determined at the intersection of the two red lines in Fig. 7(a) or Fig. 7(b).
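Locating the intersection of the two lines can be automated. The following Python sketch fits one straight line to the low-N points and one to the high-N points, scans all possible break positions, and returns the intersection for the split with the smallest total squared residual. The break-point scan is an assumption made here for illustration and is not necessarily how the red lines in Fig. 7 were drawn; the function names and the synthetic data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def squared_residual(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def characteristic_n(ns, neg_density):
    """Estimate NC as the intersection of two fitted lines.

    ns          : increasing extrapolation factors N (at least four values)
    neg_density : integrated negative density in the ROI for each N
    """
    best = None
    for split in range(2, len(ns) - 1):  # at least two points on each branch
        s1, i1 = fit_line(ns[:split], neg_density[:split])
        s2, i2 = fit_line(ns[split:], neg_density[split:])
        if s1 == s2:
            continue  # parallel lines do not intersect
        err = (squared_residual(ns[:split], neg_density[:split], s1, i1)
               + squared_residual(ns[split:], neg_density[split:], s2, i2))
        if best is None or err < best[0]:
            best = (err, (i2 - i1) / (s1 - s2))
    if best is None:
        raise ValueError("could not find two intersecting branches")
    return best[1]


# Synthetic example: no negative density up to N = 16, linear growth afterwards
ns = list(range(2, 32, 2))
density = [0.0 if n <= 16 else 3.0 * (n - 16) for n in ns]
nc = characteristic_n(ns, density)
```

With noisy experimental data, the result should be checked visually against the plot, since the best split can depend on a few outlying points.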
Method 3 correlates calculated difference electron density features to the observed difference density features. With increasing N, structural models MN can be refined against the resulting extrapolated structure factor amplitudes. From the resulting model, structure factors can be determined that can be used with the structure factors from the reference model to calculate an MN–Mref difference map. Difference features are compared and correlated with the observed difference features (Claesson et al., 2020). The correct N is found when the correlation between the two sets of difference features (observed and calculated) is optimum. Such an analysis can be performed in a user-friendly way using the program Xtrapol8 (De Zitter et al., 2022). The author acknowledges a poster presentation by De Zitter et al. at the 2023 PSB symposium in Grenoble, France.
Once a characteristic NC is determined, an extrapolated map can be calculated that allows for the determination and a real space refinement of a structural model, e.g., in Coot (Emsley et al., 2010). An example of an extrapolated map from a recent experiment on photoactive yellow protein (PYP) is shown in Fig. 5(b) together with the corresponding difference map [Fig. 5(a)]. The ROI where the negative electron densities in the extrapolated maps are integrated is denoted by the dashed circle in Fig. 5(b). NC = 16 has been determined from Fig. 7(a). If NC had been determined to be very different from 16 (for example, NC = 8), the torsional angle ΦT would likely be substantially different. Since the torsional angle is a functional reaction coordinate for the PYP chromophore trans to cis isomerization, it is of paramount importance to determine this angle correctly. It has been suggested to improve the extrapolated map by density modification, such as solvent flattening and histogram matching, for example, using the program “dm” (Cowtan, 1994). An improved structural model can then be determined from such a map and refined against the |Fext| (see also below).
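The selection step of Method 3 amounts to maximizing a correlation coefficient over N. The Python sketch below assumes that the observed and calculated difference features have already been reduced to lists of numbers sampled at the same positions (for example, integrated peak values) and uses the Pearson correlation coefficient as the figure of merit; both choices are assumptions for illustration and do not reproduce the Xtrapol8 implementation. All names and values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_a * var_b)


def best_n(observed, calculated_by_n):
    """Return the N whose calculated features correlate best with the observed ones.

    calculated_by_n: dict mapping N to a list of calculated difference features.
    """
    return max(calculated_by_n, key=lambda n: pearson(observed, calculated_by_n[n]))


# Hypothetical difference features at four positions for three trial values of N
observed = [1.0, -2.0, 3.0, -1.0]
calculated = {
    8: [1.0, 0.0, 0.0, 0.0],
    16: [2.0, -4.0, 6.0, -2.0],
    24: [-1.0, 2.0, -3.0, 1.0],
}
n_opt = best_n(observed, calculated)
```

Because the correlation coefficient is insensitive to an overall scale, the comparison does not require the calculated and observed difference features to be on the same scale.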
LOVE-HATE RELATIONSHIP WITH LARGE NC
Large extrapolation factors are a nuisance. Reciprocal space refinement against the extrapolated structure factor amplitudes (all of them, see discussion above) usually results in unacceptable Rcryst values of > 40%. Here, a way to remedy this is described.
As a detail: by the application of the estimated phase of the difference structure factor φ|ΔF|,#, Δ|F|obs becomes an ordinary amplitude |ΔF|obs [note the position of the vertical straight lines that denote absolute values of the difference structure factor; compare Eqs. (24) and (26)]. The calculation is particularly easy if the Δ|F|obs are stored as positive values for book-keeping [Eq. (17)]. From these, a phased extrapolated electron density map is calculated. Refinement against the phased extrapolated structure factor amplitudes immediately results in acceptable Rcryst values.
A more sophisticated method to recover the magnitude of the true difference structure factor amplitude that was previously only estimated by projection [Eq. (10)] is shown in Fig. 9. Here, the situation is depicted by an Argand diagram that makes use of the estimated true phase φ|ΔF|,#. The orange difference is the weighted difference structure factor, and the red difference is a corrected difference structure factor that closes the triangle between the measured time-dependent amplitude and the sum of the dark structure factor and the difference structure factor. Extrapolation [Eq. (26)] can then be pursued with the corrected difference structure factor and the phase φ|ΔF|,#. Since the difference Fourier approximation is not applied, the reason for the factor 2 in Eqs. (21) and (22) vanishes. N becomes much smaller, and the extrapolated map appears much improved.
The method shown in Fig. 9 needs closer examination. In particular, the probability of the true difference structure factor given the noise and other systematic errors in the data and the structural models must be evaluated, perhaps in a similar way as has been done previously for partial structural models with errors (Read, 1986, 1997). In addition, the phase bias introduced by a model refined against the extrapolated map needs to be estimated. The best way to do this is by engaging an appropriate simulation using realistic structure factors with noise followed by a statistical analysis as performed previously (Read, 1986; Schmidt et al., 2003).
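One way to read the triangle closure geometrically is as follows: given the reference structure factor (amplitude |Fref|, phase φref), the measured time-dependent amplitude |Ft|, and the estimated phase of the difference structure factor, find the length x of the difference vector such that |Fref exp(iφref) + x exp(iφ)| = |Ft|. This leads to the quadratic x² + 2x|Fref| cos(φ − φref) + |Fref|² − |Ft|² = 0. The Python sketch below solves it and returns the smallest non-negative root. This reading of Fig. 9, the choice of root, and all names are assumptions made for illustration.

```python
import math


def close_triangle(f_ref_amp, phi_ref_deg, f_t_amp, phi_diff_deg):
    """Length of the difference vector that closes the triangle, or None.

    Solves |Fref*exp(i*phi_ref) + x*exp(i*phi_diff)| = f_t_amp for x >= 0.
    Returns None when no real, non-negative solution exists (for example,
    when noise makes the measured amplitude incompatible with the phase).
    """
    theta = math.radians(phi_diff_deg - phi_ref_deg)
    b = f_ref_amp * math.cos(theta)
    discriminant = f_t_amp ** 2 - (f_ref_amp * math.sin(theta)) ** 2
    if discriminant < 0.0:
        return None
    root = math.sqrt(discriminant)
    candidates = [x for x in (-b - root, -b + root) if x >= 0.0]
    return min(candidates) if candidates else None


# Difference vector parallel to Fref: amplitude grows from 100 to 105
parallel = close_triangle(100.0, 0.0, 105.0, 0.0)
# Difference vector antiparallel to Fref: amplitude shrinks from 100 to 95
antiparallel = close_triangle(100.0, 0.0, 95.0, 180.0)
# Difference vector perpendicular to Fref cannot shorten the amplitude
impossible = close_triangle(100.0, 0.0, 50.0, 90.0)
```

The cases without a solution indicate reflections for which the estimated phase and the measured amplitudes are inconsistent, which is one reason why a probabilistic treatment of this method is desirable.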
It is hoped that this tutorial will help to promote the widespread use of TRX methods. In the future, appropriate software solutions will be more functional and user-friendly, with push-button interfaces, so that everyone with general crystallography knowledge can learn and practice structure determination from time-resolved DED maps.
This work was enabled by NSF Science and Technology Center Biology with XFELs (BioXFEL), NSF-STC 1231306. The author thanks P. Schwander and E. Stojkovic for commenting on an earlier version of this manuscript.
Conflict of Interest
The authors have no conflicts to disclose.
Marius Schmidt: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Methodology (equal); Resources (equal); Software (equal); Visualization (equal); Writing – original draft (equal).
No data were generated for this manuscript.