The function of light-harvesting complexes is determined by a complex network of dynamic interactions among all the different components: the aggregate of pigments, the protein, and the surrounding environment. Complete and reliable predictions on these types of composite systems can be only achieved with an atomistic description. In the last few decades, there have been important advances in the atomistic modeling of light-harvesting complexes. These advances have involved both the completeness of the physical models and the accuracy and effectiveness of the computational protocols. In this Perspective, we present an overview of the main theoretical and computational breakthroughs attained so far in the field, with particular focus on the important role played by the protein and its dynamics. We then discuss the open problems in their accurate modeling that still need to be addressed. To illustrate an effective computational workflow for the modeling of light harvesting complexes, we take as an example the plant antenna complex CP29 and its H111N mutant.
I. INTRODUCTION
In photosynthesis, the initial step is accomplished by specialized supercomplexes, photosystems, which comprise two components, light harvesting (LH) pigment–protein complexes, and the reaction center. LH complexes are unique biological systems, which integrate a protein matrix with a generally large aggregate of pigments. The number and type of pigments and the structure of the protein matrix are specific for each different photosynthetic organism.1 Through the multichromophoric aggregate, the LH complex absorbs sunlight and rapidly funnels the energy toward neighboring complexes and ultimately to the reaction center.2–4 The energy delivery by LH complexes is explained in terms of multichromophoric excited electronic states called excitons. Each exciton generally involves more than one pigment and is characterized by an energy slightly different from the others even if the aggregate is composed of the same types of molecules. The resulting energy ladder is not only determined by the strength of the interactions between excitations localized on the single pigments but also significantly tuned by the interactions of each pigment with the surrounding residues and the electric fields created by the whole protein and the external environment. Moreover, the excitons are coupled with the nuclear degrees of freedom of the multichromophoric aggregate and the protein.
Due to the complexity outlined above, an accurate atomistic modeling of LH complexes is extremely challenging, as it requires the integration of very different physical models, which span from quantum mechanical (QM) methods to classical models, dynamic descriptions, and statistical analyses. Moreover, for that integration to work, the different models need to be all coupled together, as the functioning of the LH complex is a real multiscale process in space and time.
In the last few decades, there have been important advances in the atomistic modeling of LH complexes,5–10 which have been largely pushed by the increasing amount of experimental data about their structures, the spectroscopic properties, and the energy transfer processes.11–17 In parallel, there has been an important activity in the development of computational methods that have allowed for simulating properties and processes of LH complexes embedded in different environments.18–21
This Perspective reviews these advances using the typical workflow followed in the computational investigation of a LH complex, which starts from a predicted or experimentally characterized structure, adds the environment, simulates the coupled dynamics of all the parts, calculates the excitonic states, and finally reproduces the optical spectra to be directly compared with experiments. A graphical representation of the workflow with the indication of all the different components is given in Fig. 1. In the presentation and discussion of the different parts of the workflow, we use a minor LH complex present in PSII of plants (CP29) as an example of application. Finally, the optical spectra of the complex will be compared with a recently proposed mutant22 to show the accuracy and completeness that the atomistic modeling can achieve in predicting even small effects as those induced by a single residue mutation.
General workflow for the atomistic modeling of light harvesting complexes. Atomistic simulations of LH complexes start either from experimentally determined structures (determined from, for example, x ray or cryo-EM) or from computational guesses, the latter being either traditional homology modeling or newer neural-network based algorithms, such as AlphaFold 2. Amino acidic mutations can be introduced in order to exploit atomistic simulations for complementing mutagenesis experiments. The starting structure is then protonated, inserted in a membrane, and solvated in order to simulate an environment that mimics the in vivo conditions. Molecular dynamics (MD) simulations are then run to sample LH conformations at the desired external conditions of temperature and pressure. The analysis of the MD simulations (both in terms of global convergence tests and of a more focused analysis of the embedded pigments) is employed to guide the sampling of the pigment conformations to be used in QM/MM calculations. The first part of the trajectory is usually discarded from the sampling (gray film in the “sampling” box) so as to avoid sampling transient, non-equilibrated structures (see Sec. II). QM/MM calculations are run over the sampled pigment conformations in order to extract the exciton parameters for simulating the spectra (see Secs. III and IV). The obtained results are the subject of a careful analysis and are post-processed in order to deal with, e.g., an inaccurate treatment of the vibrational degrees of freedom of the pigments (Sec. IV B). The post-processed data are finally employed to compute the optical spectra, the quality of which is assessed by a comparison with experimental data. An application of this workflow is provided in Sec. V.
General workflow for the atomistic modeling of light harvesting complexes. Atomistic simulations of LH complexes start either from experimentally determined structures (determined from, for example, x ray or cryo-EM) or from computational guesses, the latter being either traditional homology modeling or newer neural-network based algorithms, such as AlphaFold 2. Amino acidic mutations can be introduced in order to exploit atomistic simulations for complementing mutagenesis experiments. The starting structure is then protonated, inserted in a membrane, and solvated in order to simulate an environment that mimics the in vivo conditions. Molecular dynamics (MD) simulations are then run to sample LH conformations at the desired external conditions of temperature and pressure. The analysis of the MD simulations (both in terms of global convergence tests and of a more focused analysis of the embedded pigments) is employed to guide the sampling of the pigment conformations to be used in QM/MM calculations. The first part of the trajectory is usually discarded from the sampling (gray film in the “sampling” box) so as to avoid sampling transient, non-equilibrated structures (see Sec. II). QM/MM calculations are run over the sampled pigment conformations in order to extract the exciton parameters for simulating the spectra (see Secs. III and IV). The obtained results are the subject of a careful analysis and are post-processed in order to deal with, e.g., an inaccurate treatment of the vibrational degrees of freedom of the pigments (Sec. IV B). The post-processed data are finally employed to compute the optical spectra, the quality of which is assessed by a comparison with experimental data. An application of this workflow is provided in Sec. V.
II. SETUP OF THE SYSTEM AND CONFIGURATIONAL SAMPLING
At the present time, LH complexes of different photosynthetic organisms have been structurally characterized through x-ray diffraction or cryo-EM techniques. These structures represent a fundamental ingredient for the atomistic models; however, they also present important limitations. First of all, they do not include any information on the external environment, which for most of the LH complexes is a membrane. Moreover, fluctuations due to the temperature are not accounted for. Finally, the standard resolution of crystal or cryo-EM structures of LH complexes is not high enough to describe the internal geometry of the pigments with the atomic accuracy needed for safely applying QM calculations on them.
For all these reasons, an atomistic modeling of LH complexes based on only the crystal structure can give a qualitative description of the system at very low temperatures, but, in general, it cannot be representative of the room-temperature situation. To go beyond this too simplistic static picture, we can introduce molecular dynamics (MD) simulations. Molecular dynamics has been shown to be a very effective approach for sampling configurations of the LH complex and its environment at the temperature of interest.21
Molecular dynamics simulations of light-harvesting complexes usually started from the available x ray or cryo-EM structures deposited in public repositories.23 Another means of obtaining starting structures for MD simulations when the experimental structures are not available is to employ homology modeling techniques, which have also been employed for LH complexes.24–26 These techniques can be employed if the target protein sequence is sufficiently similar to the sequences of known structures. In addition, recent algorithms based on neural networks, such as AlphaFold 2,27,28 have proven able to determine protein structures as accurately as their experimental counterparts, providing another means of obtaining starting structures for molecular simulations. Especially when the starting structure is chosen based on a computational guess, such as in homology modeling or by employing neural networks, MD simulations and their spectroscopic analysis can be employed to validate the structures and gain insights on the architecture of systems that are still to be structurally resolved experimentally. Finally, the starting point being either an experimental structure or a computational guess, it is easy to slightly alter the initial structure so as to introduce, e.g., point mutations. Especially in the context of investigating LH complexes, the simulation of mutants can be significantly valuable as it complements site-directed mutagenesis experiments. We provide such an example in Sec. V.
Due to the complexity and the large dimension of the whole system (comprising the complex and the surrounding environment), a simplified description is commonly used for the potential energy (and the corresponding forces) in terms of Molecular Mechanics (MM) Force Fields (FFs). At the present times, accurate FFs are available to describe protein structures and their thermal fluctuations and for describing water and lipid membranes. However, the application of the available FFs to the pigments is by far less accurate. This is the reason why a reparameterization of the FF for describing the nuclear degrees of freedom of the pigments is generally introduced. In this reparameterization of the FF, QM calculations of internal geometry and vibrational properties are commonly used as reference data.29–31
Once the potential energy model of all the components has been defined, the system can be assembled and its trajectory can be determined following the same protocols used in MD simulations of biomacromolecules. In order to achieve a meaningful sampling of the configurational space, ns timescales are often not sufficient. We have observed fluctuations on the 100 ns timescale in LH systems, and we, therefore, recommend μs timescales whenever possible.32 We also recommend running several replicas of the same system obtained using different initial velocities. Different replicas are then subject to standard convergence assessment tests, such as monitoring the kinetic energy of the system, the root-mean-square deviation (RMSD) of different parts of the protein, and the average area available to each lipid in the plane of the membrane. Other quantities, such as selected coordinates or order parameters, may be used to assess equilibration. Averages of these quantities may systematically drift during a transient time, indicating that the system has yet to sample the equilibrium ensemble. When the averages reach a plateau or oscillate around a mean value, the trajectory can be considered locally equilibrated. The transient part should be discarded before further analysis and especially before computing excitation properties.
The distinct feature of LH complexes is the dense packing of pigments in highly specific arrangements that maximize energy transfer while avoiding concentration quenching.3 At variance with other biomacromolecular systems, then, the analysis of LH complex trajectories should pay particular attention to the relative arrangement of the embedded pigments, i.e., their relative distances and orientations. Pigment–pigment orientation measures that are directly linked to their interaction strength can be based on the Förster orientational factor κ [see Eq. (5) in Sec. III B]. In addition, analyzing the hydrogen bond interactions between the protein and the bound pigments can provide useful insights on interpreting the spectroscopic properties of the complex.25 This kind of analysis provides information on the interaction network within the LH complex and on how it is influenced by the protein dynamics.
Further effective analysis instruments aiming at disentangling the complex conformational landscapes in LH complexes can be borrowed from the ever-growing field of unsupervised learning.33 In particular, dimensionality reduction methods [such as principal component analysis (PCA)34 and Isomap35] can be employed to consistently reduce the redundancy in the underlying conformational ensemble starting from properly featured protein conformations,36 thereby improving the ease of characterization of protein conformations. In a similar spirit, kinetic-based independent component analysis algorithms, such as time-lagged independent component analysis (tICA),37 can also be used to characterize LH complexes conformations while conserving some degree of information lying in the temporal structure of the trajectory.38 Additionally, clustering techniques can be adopted to group similar conformations together, effectively coarse-graining the explored structures and facilitating the identification of key structural differences between the observed conformers. When running subsequent QM/MM calculations, an a priori division of the explored ensemble into clusters may be useful to reduce the number of calculations while ensuring an accurate sampling of the LH complex conformations. In addition, clustering can be used to connect structural features to spectroscopic observables.26,36
III. THE ELECTRONIC DESCRIPTION OF THE MULTICHROMOPHORIC AGGREGATE
From a strictly computational point of view, the exciton model can be seen as a “divide and conquer” method, where the complexity of the multichromophoric aggregate is reduced by separating the problem into smaller elements. In fact, the matrix elements of the Hamiltonian (1) can be easily computed from quantum chemical calculations on single pigments. However, the exciton model has enormous advantages that go beyond the mere computational efficiency. Strikingly, the exciton coefficients allow for a straightforward interpretation of the aggregate’s excited states and of their energy variations. As we will see in Sec. IV, this model also enables the description and calculation of optical line shapes.
Physically, the exciton model corresponds to neglecting all charge-transfer contributions to the aggregate excited states. This is generally a reasonable assumption as the inter-pigment distances in most of the LH complexes are small enough to have non-negligible excitonic couplings, but at the same time, they are large enough to avoid significant charge transfer. However, when needed, the Hamiltonian (1) can be augmented with charge-transfer states by including their energies and couplings with localized excitations.39,40
A. Site energies
Site energies represent the first main ingredient in the exciton Hamiltonian. Each site energy can be computed as the excitation energy of a pigment embedded in its environment. As such, calculation of these quantities requires an appropriate QM method and a description for the environment. Among the QM methods, a fair compromise between computational cost and accuracy is represented by linear-response time-dependent density functional theory (TD-DFT). Despite its well-known deficiencies, TD-DFT represents a valid and extensively tested method to compute excited states. More accurate QM methods have been employed for light-harvesting complexes, but their use has been mostly limited to benchmark calculations. Multireference methods, such as multireference configuration interaction-DFT (MRCI-DFT) and complete-active-space SCF (CASSCF), possibly in combination with perturbation theory (CASPT2), have been used to complement or assess TD-DFT methods.41–44 More recently, single-reference accurate methods have been employed. Approximate coupled cluster methods, such as second-order coupled cluster (CC2) and pair natural orbital coupled cluster [domain-based local pair natural orbital similarity transformed equation of motion-coupled cluster singles and doubles (DLPNO-STEOM-CCSD)], have been applied to photosynthetic pigments.45–47 Other alternatives are post-SCF methods based on equations of motion or response approaches, such as the algebraic-diagrammatic construction (ADC) and the Bethe–Salpeter equation in the GW approximation (GW/BSE). These methods have found applications for photosynthetic pigments.45,48 Semiempirical methods represent a much less expensive alternative to obtain excitation properties with reasonable accuracy. Among these, time-dependent density functional tight binding (TD-DFTB) and its long-range corrected (LC) version showed promise for describing light-harvesting pigments.49
Pigments in LH complexes are surrounded by a highly anisotropic environment, and they are bound in a protein pocket by establishing specific interactions, such as hydrogen bonds. Only a computational model that describes this anisotropy can reproduce excitation energy variations among the different sites of the complex. The simplest such model is quantum mechanics/molecular mechanics (QM/MM) with electrostatic embedding. In electrostatic embedding QM/MM, the chromophore is treated with the QM level of choice, whereas the environment is described as a set of fixed point charges. Irrespective of the QM description, this model introduces an external one-electron operator analogous to the electron–nucleus Coulomb interaction. The conceptual simplicity and ease of implementation coupled with any QM method have made this model extremely popular for excited-state calculations, also in light-harvesting complexes.47,50,51 While electrostatic embedding QM/MM can describe the effect of the MM environment on the QM part, it cannot describe how the environment is affected by the presence of the QM electron density or by its changes in response to excitation.
A more refined QM/MM flavor is the polarizable embedding QM/MM.52 Polarization has been included in QM/MM using various formulations based, e.g., on Drude oscillators,53–55 fluctuating charges,56–58 or induced point dipoles (IPDs).59–66
All these specificities make polarizable QM/MM methods computationally more expensive than standard electrostatic embedding analogs. New linear-scaling implementations are required to reduce the computational overhead of the formulation.69 The IPD formulation of polarizable QM/MM methods (from now on QM/MMPol) has been extensively used for calculating site energies of pigments within LH complexes coupled with linear-response TD-DFT methods.19,25,70–72
It is important to disentangle the different effects that the environment has on the site energies. The excitation energy of a chromophore can be modulated by the environment directly, through electrochromic effects, and indirectly by changing the internal geometry of the chromophore. These effects can be separated naturally by computing the excitation energy of the chromophore embedded in the MM(Pol) environment and isolated in vacuo.19 The difference between the two calculations represents the direct electrochromic effect due to the environment. We stress that the geometry of the pigment must be the same in both calculations in order to obtain a pure electrochromic effect when taking the difference.
The indirect geometrical effect on the site energies can be defined by comparing the in vacuo site energies of each distinct chromophore with a reference geometry. The environment contribution is very sensitive to the fluctuations of the environment. As such, an adequate sampling of environment degrees of freedom via, e.g., MD simulations, is needed to obtain a realistic description of the chromophores embedded in LH proteins. Indeed, as shown in Fig. 2, an analysis based on the experimental structure only, ignoring thermal effects included with the MD, provides a misleading picture of the influence of the environment on the excitation energy of the embedded chromophores. This kind of analysis allows us to separate “geometrical” and “environment” effects on the site energies and to analyze the origin of the specific tuning of the site energy of each pigment.72
Environment effect on chlorophylls in CP29. The distributions show the environment contribution to the site energy due to the protein environment only (i.e., without the membrane and waters) in the MD simulation of CP29. This contribution is obtained as , where is the site energy obtained through a QM/MM calculation in the protein environment and is the site energy obtained from a vacuum QM calculation on the pigment alone (see also Sec. IV B). A distribution is shown for each chlorophyll present in the CP29 system. The label of each chlorophyll is shown on the left. The red dashed line represents the same contribution computed on the starting (cryo-EM) structure (PDB: 3JCU). Hydrogen atoms are optimized prior to the calculation in the cryo-EM structure. All values have been computed using a QM/MMpol model using the M062X/6-31G(d) level of calculation.
Environment effect on chlorophylls in CP29. The distributions show the environment contribution to the site energy due to the protein environment only (i.e., without the membrane and waters) in the MD simulation of CP29. This contribution is obtained as , where is the site energy obtained through a QM/MM calculation in the protein environment and is the site energy obtained from a vacuum QM calculation on the pigment alone (see also Sec. IV B). A distribution is shown for each chlorophyll present in the CP29 system. The label of each chlorophyll is shown on the left. The red dashed line represents the same contribution computed on the starting (cryo-EM) structure (PDB: 3JCU). Hydrogen atoms are optimized prior to the calculation in the cryo-EM structure. All values have been computed using a QM/MMpol model using the M062X/6-31G(d) level of calculation.
Despite their power, QM/MM calculations of excitation energies suffer from systematic errors stemming mainly from the quantum chemical method and, in part, on the method used for obtaining chromophore geometries. In fact, the expected accuracy of QM methods for excitation energies is at least around 0.2 eV for TD-DFT methods and ∼0.1 eV for more accurate methods, such as ADC(2) and CC2.73 These errors are larger than the typical width of site energies in LH complexes, but they do not prevent a quantitative application of QM/MM methods to computing site energy ladders. In fact, while these errors are large, they are usually systematic, and they do not affect excitation energy differences among chemically identical pigments. For this reason, it has become customary to rigidly shift the calculated spectra, or equivalently the site energies, by a certain amount that takes into account the intrinsic error of the QM level of theory.25,51,72,74 This correction is justified by assuming that the error is systematic. In this respect, the same shift should be applied to different systems if they contain the same pigments and are treated at the same level of theory.
The situation is more complicated when a LH complex contains different types of pigments, such as Chl a and Chl b, or when multiple excited states need to be considered. There, it is not guaranteed that the QM method will give the same error for all excited states. For example, even the DLPNO-CCSD method gives different errors for the different transitions of Chl a.46 A possible solution is to use a different parametric shift for different pigment types. This approach seems quite straightforward, but it introduces a large number of empirical parameters. In some cases, the relative error for two different pigments is small enough not to influence substantially the spectral shape,72 and it can be simply ignored. When the relative error cannot be ignored, the safe choice is to determine the shift parameters from a system other than the system of interest, and possibly simpler. For example, the excitation energies of chromophores in solution could be used to determine the relative error that a given QM method makes on different transitions.
A much simpler approach to environment effects on site energies of pigments in LH complexes is the use of continuum models.75,76 These models lose the fine details of the protein environment, which makes the description necessarily less accurate than an atomistic model. However, continuum models can still represent a useful and inexpensive alternative to QM/MM especially if the QM subsystem is augmented to include the protein residues involved in specific interactions with the pigments.70,77 A recent alternative is to couple both MMPol and a continuum model to the QM description of a chromophore.78,79 This strategy has been applied to phycocyanin complexes of red algae.80
Other approaches are able to go beyond the classical description of the protein environment. The frozen density embedding (FDE) scheme and its extension to excited states (subsystem TD-DFT) divide the entire system into fragments, each described quantum mechanically and experiencing the embedding potential from the other QM fragments.81 This method has been applied to computing site energies and couplings in LH complexes.82
B. Couplings
Electronic couplings between localized excitations determine exciton delocalization effects, which reflect on optical spectra of LH complexes and on the exciton dynamics.
Contrary to multipole expansions, distributed expansions place a charge on each atom of the chromophore in order to reproduce the transition density and calculate the Coulomb coupling as an interaction between collections of point charges. In the TrEsp method,86 these atomic transition charges are determined by fitting the electrostatic potential generated by . This method has proven very effective in approximating the true Coulomb coupling almost exactly.
Importantly, the electronic couplings can also be affected by the surrounding environment. The environment influences the QM density of the chromophore, and therefore, it also affects the transition properties. This results in an indirect effect on the couplings. This effect can be described with any QM/MM embedding (electrostatic or polarizable). If the couplings are calculated in the context of subsystem TDDFT,82,85 the effect of the surrounding environment is also included.
If one approximates the environment as a dielectric, the sum of and can be thought as an interaction screened by the dielectric. This model has been extensively used in the literature, for example, coupled to the TrEsp method in the so-called Poisson-TrEsp strategy87 and to the Polarizable Continuum Model (PCM).88 However, in the highly inhomogeneous environment of LH complexes, the mediated interaction is not always well represented by a dielectric screening. In fact, a different positioning of the environment molecules around the chromophores can lead to substantially different screenings on the Coulomb coupling.9,40,89
As mentioned above, the Coulomb coupling is often a very good approximation of the total electronic coupling, at least for bright transitions. Things change when we consider optically forbidden transitions and chromophores in van der Waals contact with each other. For optically forbidden transitions, the first multipole term of the Coulomb coupling is zero. This does not guarantee that the Coulomb coupling is negligible, but it usually means that this term is quite small. When two chromophores are in contact, the short-range coupling term is nonzero. Combining the two cases, it is possible that the short-range coupling is of the same magnitude as the Coulomb coupling, if not larger.
Unfortunately, short-range contributions to the coupling are not as easy to compute as the Coulomb term.90 Some short-range terms arise from the fact that, at close contact, the electron clouds of the two chromophores experience Pauli repulsion, i.e., distort to satisfy the exclusion principle. As a consequence, perturbation expressions based on the isolated chromophores do not yield accurate results for this contribution.83
A more complete method to account for all contributions in electronic couplings is to use a QM calculation of the chromophoric dimer.40,90,91 In this case, the electronic Hamiltonian has to be transformed to the monomeric “site” basis with a localization procedure. Given that the dimer of chromophores is considered in the QM calculation, this approach gives, in principle, all contributions to the coupling. However, this strategy is clearly more computationally expensive. On the other hand, this strategy allows for deriving both exciton and charge-transfer couplings from the same calculations40,92 and, thus, describe all types of short-range interactions among chromophores.
IV. SIMULATION OF OPTICAL SPECTRA
From now on, for the sake of simplicity, we focus on absorption spectra, but similar considerations can be given for other optical spectroscopies and, at least in part, for two-dimensional electronic spectroscopy.20
Whereas the excitation energies and dipoles describe the position and intensity of the spectra, the band shape is determined by coupling of the electronic transitions to the vibrational degrees of freedom. The most common approach to include effects of the vibrational motion is the second order cumulant expansion in the exciton basis.93 The cumulant expansion formalism is exact for Gaussian fluctuations of the excitation energy, i.e., for a linear coupling of the excitation energy to the vibrations of the system.94
The formalism described above is only one of the possible choices when using the cumulant expansion in an exciton system. In fact, it contains two key approximations. First, the excited-state dynamics is assumed to be Markovian and exciton relaxation is described by an exponential decay function with lifetime τK, which is usually derived with the Redfield rate equation. Various approaches at overcoming this approximation have been devised, either by correcting the Redfield rates95 or by directly considering time-dependent rates in the cumulant expansion.96 The second approximation is the secular approximation, which neglects off-diagonal elements in the cumulant expansion and allows for writing the spectra as a sum over exciton states [Eq. (8)]. The most general approach, the full second-order cumulant expansion, relaxes both of these approximations.97,98
So far, we have introduced only the effects due to the “dynamic” component of excitation energy fluctuations, which defines what is also called “dynamic disorder” and contributes to the homogeneous line shape. However, an additional effect, commonly denoted as “static disorder,” should be considered as well. Here, the motions responsible for the disorder are those that lead to energy fluctuations slower than the absorption process, such as conformational changes of the environment and/or the pigments. Static disorder is usually included by introducing a set of exciton Hamiltonians calculated using site energies and couplings obtained for different configurations of the system (e.g., uncorrelated snapshots extracted from the MD simulation).72,99–101 The set of eigenstates resulting from this ensemble of excitonic Hamiltonians is then used to generate the corresponding set of spectra through Eq. (8). These spectra are finally convoluted together to obtain the final ensemble-averaged spectrum.
When applying such a strategy, however, one has to pay particular attention to how the different excitonic parameters respond to changes in the pigment geometry. The electronic couplings are quite stable with respect to changes in the internal geometry of the pigments depending more on their relative position and orientation, particularly when bright transitions are involved. Site energies, on the other hand, are known to vary significantly upon modest changes of the molecular geometry and even more when the pigment presents an extended conjugated chain as in the case of (bacterio)chlorophylls and carotenoids.25 If a MM-MD trajectory is used for considering static disorder, site energy changes following alterations of the internal geometry of the pigments can be inflated if an inaccurate force field for the pigments is employed. Furthermore, even if an accurate force field is employed, the fluctuations of the site energies with the fast (e.g., molecular vibrations) degrees of freedom must be removed from the ensemble, to avoid double counting of this contribution when computing optical spectra. A viable route for introducing the static disorder in the simulation of the spectra is presented in Sec. IV B.
A. Spectral density
In general, the spectral density of an embedded pigment can be separated into two contributions, one due to the intra-pigment vibrational modes and one arising from the motions of the environment .
The environmental part of the spectral density is commonly represented as a continuous function with low frequency contributions. This function is obtained either from the fitting of the experimental spectra, i.e., the difference between absorption and fluorescence spectra, or from a normal mode analysis of the whole pigment protein complex.102 The intra-pigment part of the spectral density is composed of discrete peaks at higher frequencies representing normal modes localized on the pigment. This part gives rise to what is normally called vibronic coupling. High-resolution spectroscopy, such as low-temperature fluorescence line narrowing (FLN) spectra,103–105 can resolve individual vibronic peaks and give information on their frequencies and corresponding Huang–Rhys factors. Therefore, the contribution has been often extracted from experiments.
In earlier applications, the trajectory of the system was obtained using MM-based MD. The SDs computed in this way, however, were too sensitive to the quality of the underlying force field, which has to match the QM description of the chromophore to give correct results.31,112–116 More recently, the increase in computer power and algorithms has allowed for the use of Born–Oppenheimer QM/MM(Pol) MDs to generate the trajectory.101,117,118 Commonly, a DFT description is used for the ground state QM/MM-MD simulations, which is then combined with a TD-DFT modeling of the excitation energies. Recently, semiempirical methods, such as DFTB, have been proposed as a computationally effective alternative to compute spectral densities of pigments in LH complexes.50,74,119
To give an example of how different models for computing the spectral density influence the corresponding line shape, in Figs. 3(a) and 3(b), we compare the experimentally derived spectral density of CP29 with those obtained from vertical gradient calculations or QM/MM MD simulations. The VG calculations were performed for Chls a604 and a610 of CP29, while the QM/MM MD spectral density refers to Chl a604.74 As one can note, comparing the two VG calculations, it is difficult to distinguish Chl a604 from a610. Moreover, both models agree quite well with the experimental estimate, with the QM/MM MD estimates showing a slightly better fit of the peak positions. This suggests that the inclusion of anharmonicity through QM/MM MD improves the description of intermolecular vibrations.
Spectral densities and spectra of chlorophyll. A comparison of the spectral density of a single Chl a obtained from the fitting of FLN experiments (Expt.)111 with the SD computed within the vertical gradient (VG) approach (a) and from a QM/MM dynamics (b).74 (c) Optical spectra for a single Chl a obtained using different intra-pigment parts of the spectral densities, as illustrated in (a) and (b). The excitation energy is shifted by the corresponding reorganization energy λ. The same environmental contribution Jenv was added to the intra-pigment part of the spectral density to obtain the complete spectral density of the chlorophyll embedded in the LH complex. This term is modeled as an overdamped Brownian oscillator with the reorganization energy equal to 60 cm−1 and a correlation time of 177 fs. No environmental contribution was added to the spectral density obtained from a QM/MM dynamics because in this case, the environmental contribution is inherently included.
Spectral densities and spectra of chlorophyll. A comparison of the spectral density of a single Chl a obtained from the fitting of FLN experiments (Expt.)111 with the SD computed within the vertical gradient (VG) approach (a) and from a QM/MM dynamics (b).74 (c) Optical spectra for a single Chl a obtained using different intra-pigment parts of the spectral densities, as illustrated in (a) and (b). The excitation energy is shifted by the corresponding reorganization energy λ. The same environmental contribution Jenv was added to the intra-pigment part of the spectral density to obtain the complete spectral density of the chlorophyll embedded in the LH complex. This term is modeled as an overdamped Brownian oscillator with the reorganization energy equal to 60 cm−1 and a correlation time of 177 fs. No environmental contribution was added to the spectral density obtained from a QM/MM dynamics because in this case, the environmental contribution is inherently included.
The different SD models have also been used to simulate the absorption line shape. The spectra are computed using the same adiabatic excitation energy [Fig. 3(c)]. From this comparison, it is clear that the differences between the experimentally derived spectral density (SD exp) and the computed ones have a limited impact on the actual line shape. In fact, despite the apparent differences in Figs. 3(a) and 3(b), the line shapes in Fig. 3(d) are all very similar and differ mainly in the relative intensity of the main 0–0 peak and the vibronic band.
Strikingly, among the vertical gradient calculations, the one on Chl a610 more closely resembles the line shape obtained from the experimental spectral density. This suggests that the low-temperature FLN experiment extracts the vibrational frequencies of Chl a610, which belongs to the terminal emitter state. The line shape calculated for Chl a604 deviates from the experimentally derived one and even more so does the one computed from QM/MM MD, which takes into account temperature effects. This comparison suggests that the experimentally derived spectral density is biased toward the low-energy Chls, and an accurate representation of vibronic coupling requires QM/MM calculations.
On the other hand, the line shapes obtained with the different methods are sufficiently similar to each other. For the purpose of computing exciton spectra, the choice of spectral density is probably not critical.
B. Static disorder
As reported above, MD is an efficient route to sample the configurational space of the solvated LH complex. The resulting trajectories can be finally used to extract different configurations of the system on which different excitonic Hamiltonians are calculated to account for static disorder in the simulation of the spectra.
Chlorophyll site energies in CP29. (a) Distribution of the Chl site energies obtained with a MMPol TD-DFT excited state calculation on geometries extracted from the classical MD . (b) Estimated fast contribution of the geometric (vacuum) Chl site energy vs the geometric contribution . (c) Site energy decomposition into fast (top), geometrically slow (middle), and environment (bottom) contributions. The estimated probability density of each contribution is shown. (d) Chlorophyll site energies , Eq. (19), in CP29. Chl a are shown in light green, and Chl b are shown in dark green. The boxplots show the median (central line), the first and third interquartile (extension of the box), and 1.5 × IQR (extension of the whiskers). The underlying points are shown for clarity. The red dashed horizontal lines indicate the error associated with the median of the distribution. The error is computed as , where is the standard deviation computed from 1000 bootstrap samples.
Chlorophyll site energies in CP29. (a) Distribution of the Chl site energies obtained with a MMPol TD-DFT excited state calculation on geometries extracted from the classical MD . (b) Estimated fast contribution of the geometric (vacuum) Chl site energy vs the geometric contribution . (c) Site energy decomposition into fast (top), geometrically slow (middle), and environment (bottom) contributions. The estimated probability density of each contribution is shown. (d) Chlorophyll site energies , Eq. (19), in CP29. Chl a are shown in light green, and Chl b are shown in dark green. The boxplots show the median (central line), the first and third interquartile (extension of the box), and 1.5 × IQR (extension of the whiskers). The underlying points are shown for clarity. The red dashed horizontal lines indicate the error associated with the median of the distribution. The error is computed as , where is the standard deviation computed from 1000 bootstrap samples.
Here, indicates the sum of the site energy calculated on the isolated pigment in its equilibrium geometry and a correction stemming from the variation of its fast nuclear degrees of freedom, such as the bond lengths, at that specific configuration of the system. In parallel, is the further change of the site energy due to change of the slower nuclear degrees of freedom of the pigment, such as angle distortions due to protein environment constraints. The total geometric contribution is directly accessible from a vacuum QM calculation of the pigment using a structure sampled from the MD simulation of the LH complex. Finally, is the explicit environment contribution, obtained by subtracting the total geometric component from a QM/MM(pol) calculation as .
The slow component of the geometrical contribution is then obtained by subtracting the fast component as . Figure 4(c) shows the distribution of , , and . The fast geometrical contribution evidently accounts for the most variance in , with the slow geometrical contribution and the environment contribution presenting a comparable variability.
The site energies computed with Eq. (19) are shown in Fig. 4(d). The distribution of each site energy is represented with a box plot, and we also report the confidence interval obtained via bootstrapping (red dashed lines). The statistical uncertainty on the mean site energy is around 50 cm−1 for most Chls and ensures that the obtained site-energy ladder is statistically significant.
In Eq. (19), the fast geometrical contribution has been averaged on all the chlorophylls and all the configurations of the different replicas. For the slow contribution instead, the average is performed on all the configurations within each replica separately but not on the pigments, as we have assumed that different chlorophylls will have different distortions due to their specific environment constraints.
With this scheme, we (i) correct the intrinsic error of the MM-MD simulation for the pigment bond lengths, (ii) exclude double-counting of bond length contribution to the energy disorder, and (iii) reduce the “noise” in the excitation energies coming only from the limited sampling of pigment geometries. We also point out that other approaches avoiding double-counting of vibrational motions in the final spectra, relying on geometry optimizations of the pigments, are present in the literature.120,121
Another approach to separating static and dynamic disorder in MD simulations is based on the calculation of site energies for a large number of closely spaced frames.122 If the site energy is mediated over a time window T, the fast fluctuations are reduced by a factor , while slower fluctuations are unaffected. The standard deviation of the means can be extrapolated for large T, yielding the magnitude of static disorder. This approach requires many calculations along a time window of several ns, and it is thus not feasible with standard QM/MM calculations. Using interpolation schemes for the potential energy surfaces122 makes this strategy viable. A further strategy involves running separate simulations, both for a single pigment and for an ensemble of pigments. Simulations on single pigments are employed to estimate the fast fluctuations of the site energies. Simulations on the ensemble are then used to extract the static disorder.123
V. APPLICATION TO CP29 AND ITS MUTANT
In Secs. III A, IV A, and IV B, we have presented some results obtained for the excitonic parameters of CP29. Here, we extend the analysis by comparing CP29 with a mutant and give a rationale for the observed changes in the absorption spectra of the two complexes.
The mutant that we are considering has been investigated experimentally by Guardini et al.22 It presents an asparagine (Asn) residue replacing the histidine (His) as the axial ligand of Chl a603 and, in the following, will be denoted as CP29-H111N for simplicity (we adopt here the same numbering of the referred article). This mutant is particularly interesting as it retains the targeted Chl a603, contrary to other mutations at the same H111 site, and it induces an increase in the red edge of the absorption spectrum, which has been connected to a change in the quenching properties of the complex. Authors suggested that this mutation results in a reduced distance/altered orientation between chlorophylls a603 and a609, induced by the lower steric hindrance of Asn as compared to His, and increased excitonic interactions between the two.22
A. Molecular dynamics of CP29-WT and CP29-H111N
MD simulations for both CP29-WT and CP29-H111N were employed for investigating the rearrangements occurring at the level of the Vio-a603-a609 cluster. The simulation protocol employed for the CP29-WT simulations has been presented elsewhere.32 For the CP29-H111N simulations, we have employed the very same protocol, with the only exception that the His residue H111 was mutated into the N111 residue before starting the minimization.
The reduced steric hindrance of Asn as compared to His results in Chl a603 being closer to its axial ligand in CP29-H111N as compared to CP29-WT. In Fig. 5(a), we show the distance between the Mg atom of Chl a603 and the Cα carbon of its axial ligand. The Mg-Cα distance is reduced by in CP29-H111N as compared to CP29-WT (the mean distance is 6.64 Å in CP29-WT and 5.74 Å in CP29-H111N), confirming the rationalizations of Guardini et al.22 The a603 position appears to be very robust to local structural changes induced by the protein dynamics in CP29-WT, with a standard deviation of Å, and less so in CP29-H111N (standard deviation of Å; see also Fig. 5). Thus, the closer Chl a603 in CP29-H111N is more sensitive to local protein fluctuations than in the WT.
Differences between CP29-WT and CP29-H111N. (a) Distance between the Mg atom of Chl a603 and the Cα carbon of its axial ligand. The axial ligand is H111 for CP29-WT and N111 for CP29-H111N. The estimated probability distribution of the distance is shown in the inset on the right. Different shades of orange (CP29-WT) and blue (CP29-H111N) correspond to different replicas. (b) Arrangement of the L2 pocket in CP29-WT. The histidine residue (H111) coordinating Chl a603 is highlighted. (c) Förster orientation factor squared between Chl a603 and a609 in CP29-WT and CP29-H111N. The estimated probability distribution of the orientation factor is shown in the inset on the right. Different shades of orange (CP29-WT) and blue (CP29-H111N) correspond to different replicas. (d) Arrangement of the L2 pocket in CP29-H111N. The asparagine (N111) residue coordinating Chl a603 is highlighted. (e) Site energies [as computed with Eq. (19)] for Chl a603 and Chl a609 in both CP29-WT and CP29-H111N. (f) Coulomb coupling [as computed with Eqs. (4) and (6)] between the Chl pair a603-a609 for both CP29-WT and CP29-H111N. (g) Absorption spectrum of CP29-WT and CP29-H111N. The spectrum of CP29-WT is normalized, and the one of CP29-H111N rescaled by the same normalization factor. The dashed line shows the difference spectrum (WT − H111N). The maximum and minimum of the difference spectrum are shown. In all plots, orange lines correspond to CP29-WT and blue lines correspond to CP29-H111N. The boxplots show the median (central line), the first and third interquartile (extension of the box), and 1.5 × IQR (extension of the whiskers). The underlying points are shown for clarity.
Differences between CP29-WT and CP29-H111N. (a) Distance between the Mg atom of Chl a603 and the Cα carbon of its axial ligand. The axial ligand is H111 for CP29-WT and N111 for CP29-H111N. The estimated probability distribution of the distance is shown in the inset on the right. Different shades of orange (CP29-WT) and blue (CP29-H111N) correspond to different replicas. (b) Arrangement of the L2 pocket in CP29-WT. The histidine residue (H111) coordinating Chl a603 is highlighted. (c) Förster orientation factor squared between Chl a603 and a609 in CP29-WT and CP29-H111N. The estimated probability distribution of the orientation factor is shown in the inset on the right. Different shades of orange (CP29-WT) and blue (CP29-H111N) correspond to different replicas. (d) Arrangement of the L2 pocket in CP29-H111N. The asparagine (N111) residue coordinating Chl a603 is highlighted. (e) Site energies [as computed with Eq. (19)] for Chl a603 and Chl a609 in both CP29-WT and CP29-H111N. (f) Coulomb coupling [as computed with Eqs. (4) and (6)] between the Chl pair a603-a609 for both CP29-WT and CP29-H111N. (g) Absorption spectrum of CP29-WT and CP29-H111N. The spectrum of CP29-WT is normalized, and the one of CP29-H111N rescaled by the same normalization factor. The dashed line shows the difference spectrum (WT − H111N). The maximum and minimum of the difference spectrum are shown. In all plots, orange lines correspond to CP29-WT and blue lines correspond to CP29-H111N. The boxplots show the median (central line), the first and third interquartile (extension of the box), and 1.5 × IQR (extension of the whiskers). The underlying points are shown for clarity.
The same simulations have also been employed to investigate the relative position and orientation of the a603-a609 pair. When looking at the minimum distance between the two chlorophylls, we see that it is shorter in the mutant than in the WT (the average minimum distance is 3.54 and 3.38 Å in CP29-WT and CP29-H111N, respectively, which translates into a 4% decrease in the mutant compared to the WT). Differences in the a603-a609 cluster between CP29-WT and CP29-H111N, however, are not limited to their position, but also their relative orientation is affected by the altered position of a603 resulting from the mutation. Figure 5(c) shows the squared orientational factor κij (i = a603, j = a609) appearing in Eq. (5), which we analyze here as a qualitative indicator of increased/decreased electronic coupling between the two pigments. The plot in Fig. 5 shows a 22% increase of in CP29-H111N as compared to CP29-WT (the average value of is 1.76 and 2.14 for CP29-WT and CP29-H111N, respectively) and a similar sensitivity to the protein dynamics (standard deviation of 0.25 and 0.22 for CP29-WT and CP29-H111N, respectively). The altered position and orientation of this chlorophyll pair can also be observed when looking at the MD [see Figs. 5(b) and 5(d) for a particular snapshot of the WT and the mutant].
B. Site energies, couplings, and optical spectrum
The geometrical analysis of the trajectories strongly suggests that the electronic coupling between the a603-a609 chlorophyll pair will be enhanced in the mutant with respect to WT. Meanwhile, the altered positions of these chlorophylls in the mutant may lead to substantially modified chlorophyll site energies. Assuming for simplicity that the two chlorophylls only make up the lowest or second-lowest exciton in CP29 (which is a reasonable assumption according to previous quantum chemical calculations and spectroscopic results77,124), both their site energies and their coupling are important for determining the resulting excitonic state. Site energies and couplings have been computed at the QM/MMpol level using TD-DFT M062X/6-31G(d) for both the CP29-WT and CP29-H111N, selecting uniformly spaced structures from the last 1 µs of trajectory for each replica, resulting in a total of 179 and 200 calculations for both chlorophylls for CP29-WT and CP29-H111N, respectively. The phytil tail of the chlorophyll was removed from the QM part (and considered in the polarizable MM system) by cutting the bond after the first aliphatic carbon of the tail. Additionally, in order to recover the environment contribution to the site energy and extract the geometrical site energy , we performed the same QM calculations without including the effects of the MM subsystem.
The site energy of each chlorophyll was then obtained by means of Eq. (19), which is essential to avoid double counting of the fast internal motions of the pigments when computing the spectrum, while still retaining the pigment identity arising from its particular dynamic environment. The site energies of Chls a603 and a609 are shown in Fig. 5(e). The difference in mean of chlorophylls a603 and a609 between the WT and the mutant is cm−1 and cm−1 (the mean difference value and the associated standard error pooled among the replicas are reported). These values indicate that we can consider the site energy of both chlorophylls to be practically the same in the mutant and the WT. By contrast, the electronic coupling difference for the a603-a609 pair is estimated as cm−1, which is around a 40% change in the absolute coupling of the WT [see also Fig. 5(f)]. Therefore, corroborating the geometrical insights obtained from the orientational factor [Fig. 5(c)], we obtain a significantly different coupling between the WT and the mutant, with the mutant presenting higher excitonic interactions than the WT. This confirms the intuition of Guardini et al.,22 who proposed a strengthening of excitonic interactions for the a603-a609 pair in CP29-H111N.
Considering the local character of the geometrical changes associated with the mutation (i.e., only the relative arrangement of a603 and a609 is substantially altered) and the increased electronic coupling between the two chlorophylls, we can compute the absorption spectrum of CP29-H111N by employing the very same exciton Hamiltonians employed for CP29-WT, with the only difference that the a603-a609 coupling is increased by 55 cm−1 in the mutant. The absorption spectra of CP29-WT and CP29-H111N are then computed with the cumulant expansion theory truncated at second order (see Sec. IV). The spectral density of chlorophylls was modeled as a linear combination of 48 high-frequency modes and an overdamped Brownian oscillator term, taking the parameters from FLN experimental data.111 This choice was made to remain consistent with our earlier optical spectra simulations of light-harvesting complex II (LHCII) and light-harvesting complex stress-related protein 1 (LHCSR1).26,72
The calculated absorption spectra for CP29-WT and CP29-H111N are reported in Fig. 5(g). Notably, the red shift observed by Guardini et al. (from 674.8 to 685.6 nm) is well reproduced by our calculations (from 670.9 to 688.2 nm), demonstrating the effectiveness of our approach. Therefore, our computational model of CP29-H111N validates the interpretation of Guardini et al.,22 namely, that a strengthening of excitonic interactions of the a603-a609 pair is responsible for the red shift observed in the absorption spectrum. We also note that this computational approach does not just reproduce the experimental results, but it complements them by explicitly tracking down the molecular sources, giving rise to the different observables that have been probed experimentally.
VI. CONCLUDING REMARKS AND FUTURE OUTLOOKS
In this Perspective, we have presented a general overview of the challenges encountered when light-harvesting complexes are modeled through atomistic approaches and highlighted the peculiarities that characterize these pigment–protein complexes. We have presented an effective computational strategy to simulate their optical spectroscopy, which combines molecular dynamics simulation, quantum chemical methodologies, and line shape theories. The predictive power achieved by such an integrated strategy has been demonstrated on the comparison of absorption spectra of wild-type CP29 and a particular CP29 mutant that has been studied recently by experimental means.
Despite the remarkable power offered by this strategy, there is still much room for improvement in many of the computational aspects discussed in this Perspective.
One possibility is to exploit machine learning methods to substantially reduce the computational cost associated with excitonic properties while retaining an accuracy almost equal to that of expensive quantum chemical methods.125–132 Machine learning frameworks have also proven to be able to incorporate the description of the environment for complex systems, including pigment–protein complexes.133,134 This opens a route to accurate calculations of static and dynamics disorders, which require a large number of calculations.
Another important aspect is the extent of MD sampling. In most applications, crystal structures or short MD simulations of the complexes are used, which can lead to significant biases toward the selected initial structure. Instead, to properly explore the whole conformational space of the complex, including all possible configurations of the pigments, enhanced sampling simulations should be used.135–138 In addition, in this case, machine learning techniques can represent a speedup by providing useful reaction coordinates upon which the sampling is improved.139–142
ACKNOWLEDGMENTS
The authors acknowledge funding from the European Research Council, under Grant No. ERC-AdG-786714 (LIFETimeS). The authors thank Ulrich Kleinekathöfer for providing the spectral densities calculated through a DFTB/MM MD simulation.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts of interest to disclose.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.