Geoengineering can control only some climatic variables but not others, resulting in side-effects. We investigate in an intermediate-complexity climate model the applicability of linear response theory (LRT) to the assessment of a geoengineering method. This application of LRT is twofold. First, our objective (O1) is to assess only the *best* possible geoengineering scenario by looking for a suitable modulation of solar forcing that can cancel out or otherwise modulate a climate change signal that would result from a rise in carbon dioxide concentration [CO$2$] alone. Here, we consider only the cancellation of the expected global mean surface air temperature $\Delta \u27e8[Ts]\u27e9$. It is in fact a straightforward inverse problem for this solar forcing, and, considering an infinite time period, we use LRT to provide the solution in the frequency domain in closed form as $fs(\omega )=(\Delta \u27e8[Ts]\u27e9(\omega )\u2212\chi g(\omega )fg(\omega ))/\chi s(\omega )$, where the $\chi $’s are linear susceptibilities. We provide procedures suitable for numerical implementation that apply to *finite* time periods too. Second, to be able to utilize LRT to quantify side-effects, the response with respect to uncontrolled observables, such as regional averages $\u27e8Ts\u27e9$, must be approximately *linear*. Therefore, our objective (O2) here is to assess the linearity of the response. We find that under geoengineering in the sense of (O1), i.e., under combined greenhouse and required solar forcing, the asymptotic response $\Delta \u27e8[Ts]\u27e9$ is actually not zero. This turns out not to be due to nonlinearity of the response under geoengineering, but rather a consequence of inaccurate determination of the linear susceptibilities $\chi $. The error is in fact due to a significant quadratic nonlinearity of the response under system identification achieved by a forced experiment. This nonlinear contribution can be easily removed, which results in much better estimates of the linear susceptibility, and, in turn, in a fivefold reduction in $\Delta \u27e8[Ts]\u27e9$ under geoengineering practice. This correction dramatically improves also the agreement of the spatial patterns of the predicted linear and the true model responses. However, considering (O2), such an agreement is not perfect and is worse in the case of the precipitation patterns as opposed to surface temperature. Some evidence suggests that it could be due to a greater degree of nonlinearity in the case of precipitation.

Geoengineering strategies with the aim of mitigating climate change are receiving increasing attention,^{1–10} not only because of their potential to solve one of the greatest challenges faced by modern society, but also because of the great risk that such an unprecedented endeavor entails. Here, we would like to advocate that the study of climate change in general, and geoengineering, in particular, would benefit from response theory^{11,12} and the theory of nonautonomous dynamical systems.^{13–20} These mathematical tools were introduced into climate science many years ago,^{21–23} but only recently have they started to really gain traction.^{24–37} The first application of response theory to the study and efficient assessment of geoengineering, in particular, was by Kravitz and MacMartin.^{38} They assessed the linearity of the response, but regarding global averages only. However, regional temperature responses to radiative forcing can be nonlinear,^{32,39–41} and there has been an indication^{39} that they can be nonlinear in the case of geoengineering too. We show that it is possible to describe in a concise and general way the response of the climate system to two or more forcings with given time-dependent modulations. In particular—and this is the case of interest in geoengineering—if a forcing is given, one can arrange the time modulation of $N$ other forcings in such a way as to achieve a desired time-dependent change for $N$ climatic observables of interest. The pitfall of this approach is that (a) the response of any other observable is, in principle, uncontrolled and (b) nonlinearities can become more and more relevant as forcings are added to the system. This indicates that there are some fundamental caveats in the setup of geoengineering strategies.

## I. INTRODUCTION: USING RESPONSE THEORY TO FORMULATE GEOENGINEERING STRATEGIES AND OUTCOMES

First, we summarize briefly the existing mathematical tools (Sec. I A) that will provide us with the framework to describe the geoengineering problem as an inverse problem (O1) (Sec. I B). We will then elucidate the utility of this inverse problem approach and compare it with the alternative control problem and other approaches (Sec. I C), and we will subsequently provide the context for the need to assess geoengineering strategies (O2) (Sec. I D).

### A. Elements of response theory

In *nonautonomous dissipative dynamical systems*, like the climate system, given in the form

the *response* of the system to an external forcing $g(x,t)$ can be defined most generically^{42} in terms of the so-called *snapshot attractor*^{15} of the system, and the natural probability distribution or the measure $\mu (x,t)$ supported by it. This applies also to chaotic systems, when the snapshot attractor is a fractal object. Both the attractor and the measure are *unique* objects; they are defined by an *ensemble* of trajectories initialized in the *infinite* past. The time dependence of the snapshot attractor, also called a pullback attractor,^{16,18,43} and its measure give what is often termed the “forced response,” and their geometrical details at any instant describe (statistical aspects of) the *internal variability* in a conceptually sound sense.^{36}

For a scalar observable $\Psi (x)$ too, the (forced) response is uniquely given by a projection of the measure. Response theory^{12,44,45} asserts that the most basic ensemble-based statistics, the mean $\u27e8\Psi \u27e9(t)=\u222bdx\Psi (x)\mu (x)(t)$ can be decomposed into linear ($j=1$) and nonlinear ($j>1$) contributions,

where the first-order, i.e., linear, term can be obtained as

where $\mu \xaf(x)$ is the natural invariant measure of the autonomous system ($g=0$), and the operators are defined as $LF\mu =\u2212div(F\mu )$ and $Lg\mu \xaf=\u2212div(g\mu \xaf)$. In (2), $\u27e8\Psi \u27e90$ is the unperturbed ($\u03f5=0$) expectation value, and the series converges only if the forcing $\u03f5g(x,t)$ is small enough. If the forcing depends on time in a multiplicative fashion, $g(x,t)=g(x)f(t)$, then we can write

Note that the higher-order terms $\u27e8\Psi \u27e9(j)$ can be expressed as multiple *convolution integrals* involving multitime Green’s functions.^{32} Taking the Fourier transform (FT) of Eq. (4), we have, via the convolution theorem,^{46} a response formula in the frequency domain,

where $\chi \Psi (1)(\omega )=FT[G\Psi (1)(t)]$ is called the linear *susceptibility*.

### B. The geoengineering problem

It has been proposed^{3} that the effect of greenhouse forcing can be mitigated by applying another external forcing to the Earth system by some geoengineering means that has, in a way, an opposing effect. There are various types of forcing that can achieve this, but here we will consider those—generically referred to as “solar-radiation management” (SRM)^{47,48}—that can be modeled by a modulation of the solar constant. We will call this simply the “solar forcing.” Clearly, these are means that modulate the shortwave incoming radiation. Readily proposed geoengineering methods include a fleet of reflective satellites of large Sun-facing surface area put into orbit around the Earth, aerosols suspended in the atmosphere, artificially generated clouds, etc. A modulated solar constant model represents these geoengineering scenarios with various degrees of approximation, not necessarily a good approximation.^{5}

Formally, the geoengineering problem involves a forced/nonautonomous system, where at least two terms contribute to the forcing. For simplicity, to start with, we consider the case of only two forcing terms, both of which are additive; that is, the dynamical system of interest is

where the subscripts already indicate the physical meanings of the forcings, $g$ for “greenhouse” and $s$ for “solar.” Except for the need for a convergent series in Eq. (2), an arbitrary value can be assigned to the “small” parameter $\u03f5$, and in order to obtain a result in the uncomplicated form of Eq. (10), we choose the same $\u03f5$ for both forcing components. Equation (4) implies that the first-order contribution $\u27e8\Psi \Sigma \u27e9(1)(t)$ to the *total response*$\Delta \u27e8\Psi \Sigma \u27e9$ under combined forcing, i.e., geoengineering (where the subscript in $\Psi \Sigma $ is to indicate the presence of multiple forcings), can be written as the superposition of first-order contributions to respective responses to the two forcings in two separate scenarios when these forcings are acting alone. Formally, this is expressed as

The FT of this equation is

Note that the nonlinear response is more complicated with multiple forcings present: it is not just a sum of multiple convolution integrals^{32} as in the case of a single forcing scenario.

If the “forward” problem is the prediction of the response to a given forcing, then the *inverse* problem of “predicting” the necessary forcing for a desired response, being our objective (O1), seems to be well defined in view of the above equations. To a linear approximation, the necessary or *required* forcing is

In the above, $\u03f5=1$ is taken.

The case of two forcings, one greenhouse and one geoengineering, can be generalized to $N+1$ forcings when we desire to control $N$ climate observables $\Psi T=(\Psi 1,\u2026,\Psi N)$ by modulating $N$ geoengineering forcings $fsT=(fs1,\u2026,fsN)$. With these, generalizing Eq. (9), we can write in matrix form^{49}

This equation can be inverted to give the vector of geoengineering forcings $fs$,

provided the inverse of the matrix $\chi \Psi ,s$ exists. The problem of static response is considered by Lu *et al.*,^{50} who take the $N$ geoengineering forcings as those acting at $N$ different grid points of a climate model and look for forcing fields to which the climate system is most susceptible.

### C. The utility of the inverse problem and its alternatives

In Appendix C, we provide a procedure to solve the inverse problem with $N=2$ using discrete and finite time series. That situation can be interpreted as a control problem, which is in fact a rather special type of *optimal* control. This way, the required forcing can be predetermined and need not be updated during its application. In a different approach in, for example, Refs. 47 and 48, the solar forcing was constructed on the basis of some models of how much radiative forcing a sudden change of some greenhouse gas concentration or the stratospheric optical depth would yield. In addition, these authors created a scenario ensemble of SRMs, and selected the most effective SRMs. The latter assessment strategy is clearly rather inefficient and inaccurate, and that would still be the case had the ensemble been generated using response theory. We note that MacMartin *et al.*^{51} proposed for the first time to solve an inverse problem as a “design problem” for geoengineering. They invert the analytical solution of a conceptual model for the global average surface temperature, the parameters of which conceptual model are inferred via fitting it to Earth System Model (ESM) simulation data. Our method is more generic in that it does not require analytic inversion or the use of a simplified model, and it can also consider any observable to be controlled, not just the global average temperature.

The inverse problem would have a *direct* practical relevance were we to have $fg(t)$ a given, as assumed. However, this is clearly not the case: predicting greenhouse gas emissions is an extremely complicated task, since it is determined among others by *social* processes, for which we do not have good models. Nevertheless, efforts are under way^{52,53} (see also https://crescendoproject.eu/research/theme-4/). The current standard practice to tackle this challenge, as reflected by the IPCC reports,^{1} is to consider half a dozen so-called “methodologically constructed” 21st century emission scenarios. This way, instead of climate predictions, one produces so-called climate *projections* belonging to hypothetical future emission scenarios. Therefore, the solution to our inverse problem has a rather *indirect* practical relevance; the inverse problem approach would allow us to carry out *scenario analyses*. The reader can find elsewhere^{51,54–56} descriptions and analysis of a *feedback* control problem of *direct* practical relevance, when the solar forcing is being determined in real time with the use of some controller, adapting to a progressing greenhouse forcing, trying to realize the desired response. Note that with feedback control, in a scenario analysis setting, a new simulation needs to be run for each emission scenario, making it very inefficient for an extensive assessment exercise. Note also that for feedback control, what can be observed as a reference (the basis of feedback) is not the forced response in terms of an ensemble, but only a single realization. Therefore, what the feedback and open-loop control strategies would, respectively, realize—the climatology in terms of, say, ensemble means, on the one hand, and the internal variability, on the other—could be very different. Considering any observable $\Psi $ chosen to be controlled, the results from feedback and open-loop control would be more different the stronger the internal variability $\Psi $ features are. Therefore, more difference is expected between the two control strategies when a local quantity is controlled as opposed to a global-mean quantity. In an extreme case, one can consider a geoengineering method *designed*^{7} to extinguish the oscillatory El-Niño phenomenon, presumably in order to prevent floods or droughts that are part of the internal variability of Earth’s climate.

We point out that in, for example, Eq. (10), we write $\Psi $ to denote a generic observable. This means that we can *choose* a particular (scalar) observable that we desire to evolve in a particular way. With reference to the classic term “global warming,” in contrast to “climate change,” we will attempt to enforce cancellation of the global average surface air temperature (Sec. III A). With the increasingly wide-ranging analyses of climate change scenarios, however, it is clear that “climate change” should have a comprehensive meaning, and not just be a synonym for “global warming.”^{57} In fact, physical quantities other than temperature could have a greater social or ecological impact.^{1} Therefore, unlike in the present work, in practice, we might want to choose some variable other than global temperature to control. Besides the *physical type* of the observable quantity, we can make arbitrary choices with respect to the *spatial scale* of the quantity, such as local or regional averages (Sec. III A 2), zonal averages ( Appendix E), global averages (Sec. III A 1), etc.

### D. This paper

We now turn to motivating the need for comprehensive assessment of geoengineering scenarios. Once an observable $\Psi $ is chosen to evolve in a particular way, which determines $fs(t)$ according to Eq. (10), the evolution of any other observable $\Phi $ will be *a given*—the solution of a *forward* problem formally identical to (9),

yet with an $fs$ given by Eq. (10). Note that $\u27e8\Phi \Sigma \u27e9(1)(t)\u2260\u27e8\Psi \Sigma \u27e9(1)(t)$ when $G\Phi ,g(t)\u2260G\Psi ,g(t)$ and/or $G\Phi ,s(t)\u2260G\Psi ,s(t)$, which is the generic case. Regarding the desire for cancellation, $\Delta \u27e8\Psi \Sigma \u27e9=0$, we can frame^{58} geoengineering—considering for simplicity only quasistatically slow changes of $fg(t)$—as a confinement to the 0 isoline of $\Delta \u27e8\Psi \Sigma \u27e9$ over the plane of $fg$ and $fs$. In general, this isoline is different for different observables $\Phi \u2260\Psi $; i.e., under a linear response, these straight isolines fan out from the origin of the $fg$–$fs$ plane. This is illustrated in Fig. 1, where the curvature of the isolines for larger values of $fg$ and $fs$ reflect also the more general situation of nonlinear responses. It is a straightforward implication that when the system is confined to one isoline, it cannot be confined to the different isolines of other variables $\Phi i$, i.e., (unwanted) changes $\Delta \u27e8\Phi i,\Sigma \u27e9\u22600$ will ensue. In other words, the proposed geoengineering method will provide just a partial solution at best. While one aspect of the problem is solved, other aspects can be neglected, or even changed to the worse, possibly with catastrophic consequences.^{59} A long list of studies have to date addressed the issue of side effects (see, e.g., Refs. 60, 47, 48, 5, 51, 61, 38, and 62.) This possibility is the main *motivation* of our present investigation too, concerning, in particular, the question (O2) of whether linear response theory can provide an efficient tool to map out and quantify accurately the various side effects of a variety of scenarios. A comprehensive assessment would consider a variety of geoengineering scenarios, emission scenarios, Earth System Models, and possibly other things, for which the efficiency of computation is crucial. In this study, having enforced (approximately, to various degrees) a cancellation of global average surface air temperature, $\Delta \u27e8\Psi \Sigma \u27e9=\Delta \u27e8[Ts,\Sigma ]\u27e9\u22480$, we will *diagnose* unwanted changes, i.e., the total response, in terms of

Note that we denote spatial averaging by square brackets, subscripted by the spatial variable(s) with respect to which we average over its whole range, e.g., longitudes $\lambda $ for zonal averages. However, for global averaging, we drop the subscripting altogether (instead of writing, e.g., $[Ts]\lambda ,\mu $). Some of these observables have been considered in a number of studies,^{5,38,47,48,61,62} and our results are mostly consistent with the published ones; however, as our novel contribution (O2), we will also investigate carefully whether these responses can be predicted by linear response theory.

The premise of our objective as set out above is that of Robock,^{63} recently quoted in a blog by Kravitz,^{7} in which he asks the questions whether “there is only one thermostat” and whether “the climate can be optimized regionally.” Regarding the second of these, Ban-Weiss and Caldeira^{64} and MacMartin *et al.*^{65} applied different spatial patterns to reduce side effects to surface temperature. This idea is actually already covered by our framework with $N=2$, in which the function $gs(x)$ in Eq. (7) needs to be specified accordingly. Regarding the first question, Kravitz *et al.*^{56,66} proposed that one names *multiple* objectives and looks accordingly for multiple suitable “control knobs” for the climate. They employed a feedback control. The alternative—the inverse problem approach (O1) that we propose here—is shown above to be straightforward to generalize to the multiple-objective–multiple-control-knob situation, simply through the possibility of solving the linear matrix equation (11) by inverting for $fs$. With regard to side effects, however, whatever way we construct the geoengineering forcing, the situation is hardly different from the single-objective–single-control-knob situation: there will be objectives that we could inadvertently miss from the list, or objectives that are not convenient to include, and then we need to assess the side effects in terms of the corresponding uncontrolled observables—or, indeed, assess the climatology as comprehensively as possible or as is desired. Regarding the assessability of side effects using response theory (O2), even if nonlinear response formulas are available, feasibility might be hampered by an increasing number of objectives or control forcings.

We point out that in the Planet Simulator intermediate-complexity GCM (PlaSim in short),^{67} the greenhouse and solar forcings have been found to be approximately equivalent in terms of the stationary response of the global average surface air temperature^{68} insomuch that its isolines are parallel straight lines (even if there is a curvature of the surface). This was found to be the case in rather extensive ranges of the forcings, 90–2880 ppm and 1200–1500 W m$\u22122$, respectively. That is, any curvature of the blue line as shown in Fig. 1 occurs outside of the said ranges. However, in the context of geoengineering, the concern is whether these forcings are equivalent in the same sense in terms of other variables ($\Phi $) too, as discussed. We will demonstrate in PlaSim that with regard to regional averages $Ts$, the correspondence of forcings is still remarkable, but there is nevertheless a residual response with a nontrivial pattern under geoengineering. Furthermore, our analysis hints that (O2) this residual response might not be so linear, and less so for precipitation, which goes beyond the findings of MacMartin and Kravitz,^{38} who demonstrate clearly the linearity only of the global average response under geoengineering, but average the linear predictions of spatial patterns over nine models. On the other hand, Cao *et al.*^{39} do indicate that the local response for several observables under geoengineering might be nonlinear by comparing the prediction of a linear regression model with simulation results for the HadCM3L model. However, we point out that it is possible that the local susceptibilities represented in the simple linear model were inaccurately estimated from model simulation data, and so further analysis would be needed to attribute the said mismatch to nonlinearity. This is what we attempt here.

This work follows Ragone *et al.*^{31} and Lucarini *et al.*^{32} The latter demonstrated that it is convenient for response theory to predict spatial patterns too, which, as outlined above, form the basis for one of the types of diagnostics that we pursue in order to assess the success of the geoengineering method. In both of these papers, the authors used PlaSim^{67} to demonstrate the power of their methodology, but with slightly differing setups of the model. Here, we adopt the setup of Lucarini *et al.*^{32} featuring meridional ocean heat transport. The present work also builds on Gritsun and Lucarini^{69} in adopting a simple technique to obtain a better estimate of the linear susceptibility, which was independently discovered by Liu *et al.*^{70} Clearly, a better susceptibility estimate would be useful in making a linear prediction only if the actual response were linear. Under [CO$2$]-doubling, Ragone *et al.*^{31} and Lucarini *et al.*^{32} found a nonlinear response of the *global average* $\Delta \u27e8[Ts]\u27e9$, and so no linear prediction would be productive in that case; however, under geoengineering, the total response is aimed to be much smaller, and so in principle the response may be linear. This is found to be approximately the case in PlaSim, and, so, as one of the main contributions of this work (O1), by improving the susceptibility estimates, we can improve greatly on our prediction of a solar forcing $fs(t)$ required for cancellation, $\Delta \u27e8\Psi \Sigma \u27e9(t)=0$.

The structure of the remainder of this paper is as follows. Next, in Sec. II, we detail our methodology, listing a set of experiments performed to establish the response characteristics of the climate model and to assess nonlinearities, among other things. Then, in Sec. III, we provide results, first pertaining to objective (O1) about the success of the primary objective of geoengineering, namely, the cancellation (Sec. III A 1), and then our diagnostics of other observables (Secs. III A 2 and III B). Finally, in Sec. IV, in terms of the stationary climate only (O1), we outline an improved method for obtaining the required solar forcing for cancellation, and also (O2) analyze our improved diagnostics with respect to the linearity of the response. In Sec. V, we summarize our results and give our perspective for worthwhile future work. In a series of appendixes, we give details of the notations and algorithms for spectral analysis in discrete time ( Appendix A), the way we obtain the Green’s functions ( Appendix B), our novel solution method to the discrete- and finite-time inverse problem for a required solar forcing ( Appendix C), and the circular convolution theorem ( Appendix D), and we relegate the diagnostics of zonal averages to Appendix E.

## II. METHODOLOGY: FORCING SCENARIOS

The form of the forcing signal $fg$ due to changes in the carbon dioxide concentration $[CO2]$, for which we want to solve the geoengineering inverse problem, is a ramp that was used by Lucarini *et al.*^{32} This is a standard forcing type, also used for the CMIP6 DECK (Diagnostic, Evaluation and Characterization of Klima) protocols.^{71} More precisely, it is not a time-continuous ramp, for the reason detailed in Appendix B, but $[CO2]$, and so $fg$, is kept constant for one year after each incremental increase. The $[CO2][n+1]\u2212[CO2][n]$ increment is a (small) *fraction* of the current value $[CO2][n]$, and therefore increasing in a superlinear fashion with time $[n]$, but, owing to the logarithmic dependence of the radiative forcing on the $[CO2]$ concentration,^{72} it realizes a linear radiative forcing signal^{73} $fg[n]$ (see Appendix A for the notation for a discrete time series), i.e., a constant-in-time ($n$) radiative forcing increment $fg[n+1]\u2212fg[n]$. Hence the name “ramp.” Such a form of the forcing signal is useful in diagnosing or interpreting results. For example, if the response characteristic to solar forcing $fs$ is similar to that of $fg$, then the required solar forcing to cancel global change would also be approximately ramplike.

Note, however, that a linearity of the response characteristic to any forcing is usually checked by a comparison of the linear prediction with the truth in terms of a model simulation subject to the same forcing. The latter we will refer to as “reference” in the following, instead of “truth.” Besides the nonlinearity, another factor that gives rise to a discrepancy is a statistical error due to the finite ensemble size. However, the latter has a very distinct feature that can be visually told apart easily from the contribution of nonlinearity. We reiterate that by applying a staircaselike forcing, we guarantee that the said discrepancy has no contribution due to calculations being performed in discrete time.

We point out that, at asymptotic times, there is no discrepancy at all, because of the way we estimate the Green’s function ( Appendix B): the discrepancy emerges *transiently* only. Its all-time maximum is a useful intuitive measure of nonlinearity in the examined regime. In any case, the larger the response, the greater is the nonlinear contribution to it, and so—in the context of system identification—the more inaccurate our estimates of the susceptibilities ( Appendix B) become. Therefore, besides our *base scenario* of (overall) doubling $[CO2]$, we will also check if we can obtain a more accurate (and so useful for the geoengineering problem) estimate of the Green’s function using a *weaker* identification forcing. In particular, we apply a $[CO2]$ change that results in *half* of the (overall) radiative forcing change of that by doubling $[CO2]$ (this is realized by $[CO2]\u221e/[CO2]0=2$, according to the above-mentioned logarithmic law^{72}). Note that in the case of this weaker forcing, irrespective of the different plateau level, the increments of the $[CO2]$ changes realize the same 1%/year relative change.

We refer the reader to Table I for an overview of the various identification and test forcing scenarios that we used in the present study. Among them, we have CQ2 defined by 0.1%/year relative changes, which makes it a much slower change than the base scenario. The response to such a slow ramp forcing should be ramplike as long as the linear term in Eq. (2) dominates over the nonlinear ones. This forcing scenario will, therefore, provide us another reference in interpreting other results with respect to linearity.

. | Form: . | Step . | Ramp . | Slow ramp . | ||
---|---|---|---|---|---|---|

Forcing . | Plateau: . | 2 . | $2$ . | 2 . | $2$ . | 2 . |

[CO_{2}] | CS2 | CS1 | CR2 | CR1 | CQ2 | |

Solar | SS2 | SS1 | SR2 | SR1 | ||

Combined | BR2 | BR1 |

. | Form: . | Step . | Ramp . | Slow ramp . | ||
---|---|---|---|---|---|---|

Forcing . | Plateau: . | 2 . | $2$ . | 2 . | $2$ . | 2 . |

[CO_{2}] | CS2 | CS1 | CR2 | CR1 | CQ2 | |

Solar | SS2 | SS1 | SR2 | SR1 | ||

Combined | BR2 | BR1 |

In Table I, we have not indicated the plateau level of the solar forcing $fs$ used in conjunction with $fg$. We chose this level such that the response asymptotically in terms of the global average surface air temperature is the same but of opposite sign as that due to the corresponding $fg$. This level can be easily determined to a good approximation by an iterative procedure. Besides those in Table I, we will introduce a few more forcing scenarios in Sec. IV: some forcing scenarios that will give improved results, and other forcing scenarios that aid the interpretation of our results. The abrupt stepwise forcing scenarios, CS1, SS1, CS2, SS2, are employed to numerically determine the Green’s functions as detailed in Appendix B.

## III. RESULTS

### A. Surface air temperature

#### 1. Global average

The global average surface air temperature is the variable with respect to which we *prescribe* the cancellation. We do *not* consider any other variable in this role throughout the present study. Having predicted the solar forcings (SR1, SR2) required to produce no total response used in combination with prescribed $[CO2]$ forcings (CR1, CR2) adopting the methodology described in Appendix C, we plot the predicted linear responses in Fig. 2(a). Clearly, these predictions can be viewed either as components of the *predicted* total response (BR1, BR2), or the predicted response in separate scenarios (CR1, CR2, SR1, SR2). Alongside these predictions, we plot the true response in the scenarios when the forcings are applied separately, i.e., the responses evaluated by direct numerical simulations (CR1, CR2, SR1, SR2). The comparison of prediction and reference reveals that (i) the response to stronger forcing is more nonlinear in the case of greenhouse forcing (CR2) in comparison with solar forcing (SR2) and (ii) with a weaker identification (CS1, SS1) and test forcing (CR1, SR1), the linear prediction for CR1 is much better than that for CR2, while SR1 is seemingly as good as SR2. For the scenarios of combined forcing (BR1, BR2), only the true response is nontrivial if nonlinear, which is displayed in Fig. 2(b). Indeed, because of the nonlinearity, the total asymptotic response is nonzero (note that the fluctuations at asymptotic time are due to the finite ensemble size). It is visibly nonzero even with the weaker forcings. However, it is just about 10% of that with greenhouse forcing solely, even in the case of the stronger forcings.

The pronounced nonlinearity [point (i) above] shows up also in other experiments. With a very slow forcing CQ2, we registered the response as shown in Fig. 3. Although the rate of forcing is unchanged throughout the time period of almost 700 years, the response switches to a slower rate between 400 and 500 years, or, between 3 and 4 K changes in the temperature.^{74} The placement of this change in the rate, compared with the asymptotic temperature change of almost 3 K upon the weaker CR1 forcing seen in Fig. 2(a), is in good agreement with the observation of a much more closely linear response to that weaker forcing as compared with CR2. A crude indicator of non/linearity can be extracted not only from the CQ2 experiment, but also from comparing the asymptotic/stationary responses (denoted by a subscript $\u221e$) in the XX1 and XX2 experiments, as the following ratio:

(note that we write an “X” in place of one of the possible characters in the scenario identification code when it does not matter which of the possible characters is written there). This value is $\rho =0.99$ with solar forcing and 0.85 with greenhouse forcing, in agreement with what the comparison of predicted and true responses, seen in Fig. 2(a), allowed us to conclude above.

#### 2. Spatial pattern

We continue with the *diagnostics* of the geoengineering method in view of *uncontrolled* observables. We begin by looking at the observable of the same physical quantity, the surface air temperature, but of a spatial scale other than the global average. We would like to map out the spatial variation of the total response. A comprehensive view of the spatial variations of the response is given by the distribution over the 2D surface, computing the response at each grid point separately, as done by Lucarini *et al.*^{32} Similarly to zonal averages ( Appendix E), the response patterns to greenhouse and solar forcings are very similar in the stationary climate regimes; see Figs. 4(a) and 4(b) for the strong forcings CR2 and SR2, respectively (see Ref. 75 for such a comparison in a complex model). The patterns in Figs. 4(a) and 4(b) are misaligned slightly, which results in nonzero predicted total responses (BR2) of opposite sign in neighboring regions. This is shown in Fig. 4(c). The picture for the weaker forcings, CR1, SR1 (not shown), and BR1 [Fig. 4(e)], is similar.

Unsurprisingly, large predicted residual total responses occur where the response is large to either greenhouse or solar forcing alone. However, the predicted total response turns out to be grossly erroneous; the reference regarding the surface air temperature, shown in Figs. 4(d) for BR2 and 4(f) for BR1, shows that *significant cancellation is achieved even locally* [we note that the overwhelmingly red and blue colors in Figs. 4(d) and 4(f), respectively, are consistent with the signs of the true residual total global change shown in Fig. 2(b)]. However, looking at the temperatures at the highest model level, nearest the tropopause, the response under combined forcing [BX2, Fig. 5(a)] is comparable in magnitude to the response under, say, solar forcing alone [SX2, Fig. 5(b)]—nothing like the situation on the surface as seen in Figs. 4(b) and 4(f).

### B. Annual precipitation

Here, we present results for another diagnostic observable, the annual precipitation $Py$. In terms of the spatial patterns of the response, very similar conclusions can be drawn for the precipitation as for the surface air temperature, and these conclusions are supported by the set of diagrams in Fig. 6. The difference is that the largest responses are observed at equatorial regions, and it is not clear what mechanism causes this. Most importantly, *significant cancellation is actually achieved as opposed to the much less favorable linear prediction*. This is so even if the solar forcing used is the same as before, i.e., one determined with the aim of canceling global warming (not wettening; in the same spirit as Fig. 4 of Ref. 38). The significant cancellation clearly suggests that the response characteristics of $Py$ to greenhouse and to solar forcing, say, in terms of the respective Green’s functions, are very similar, as with the corresponding Green’s functions of $Ts$. Nevertheless, there are differences too, as seen in Fig. 7, such as the more obvious nonmonotonicity of the evolution, despite a monotonic forcing. Furthermore, the linear prediction for the total response in the stationary climate is nonzero, which is clearly because we are looking at an uncontrolled observable. This linear prediction, however, is quite “unreliable,” as can be expected from the mismatch of the true and predicted spatial patterns. Otherwise, both the predicted and the true total global mean responses to combined forcing look rather negligible compared with the responses to the greenhouse or solar forcings acting separately (see also Table II). Interestingly, the transient responses (not shown) have similar qualities to those of the temperature: nonlinearity is most obvious for CR2 as opposed to CR1, SR1, and SR2.

Forcing . | CX1 . | CX2 . | SX1 . | SX2 . |
---|---|---|---|---|

Δ⟨[P_{y}]⟩_{∞} (mm) | 74 | 124 | −71 | −121 |

Forcing . | CX1 . | CX2 . | SX1 . | SX2 . |
---|---|---|---|---|

Δ⟨[P_{y}]⟩_{∞} (mm) | 74 | 124 | −71 | −121 |

We note that equatorial drying under a similar geoengineering scenario has also been reported by others.^{5,38} However, in these studies, a quadrupling of [CO$2$] was considered. We point out that it does seem to matter what levels of change we consider: under [CO$2$]-doubling, we actually find less drying than in the case of the $2$-fold [CO$2$] increase. This finding can, however, have different reasons. One candidate is that the response under combined forcing is nonlinear; another is that (assuming that the response under combined forcing is approximately linear) the required solar forcing was determined inaccurately [which already resulted in a residual response as seen in Fig. 2(b)]. Note that in Refs. 5 and 38, an exact cancellation of global mean surface temperature was achieved in the stationary climate, like, for example, in the G1 GeoMIP experiment. Given this, Fig. 4 of Ref. 38 indicates that the response of the global mean is approximately linear in most CMIP5 models considered, at least up to a certain forcing level that was actually lower than [CO$2$]-doubling. In the following, we indicate that both of the said effects play a role, i.e., nonlinearity is likely also present in our case; however, it might not be the dominant component. Drying while global average surface temperature was maintained in a model was reported also by Ricke *et al.*^{47,48}

## IV. IMPROVED METHODOLOGY AND RESULTS

### A. Achieving a cancellation of the global mean surface temperature change

This subsection pertains to our objective (O1). The very close resemblance of the patterns seen in Figs. 4(a) and 4(b) hints that the effect of a changing $[CO2]$ on the radiative forcing shaping the surface air temperature is very similar to that due to a changing solar strength. However, with these data, we are not properly informed about just how similar, because, for example, the CR2 and SR2 forcings act in *opposite* directions and, owing to nonlinearities, they do not have to have the same effect even if the effects due to forcings acting in the same direction are indistinguishable. Therefore, we produced just that missing simulation: complementing SS2, for which the applied solar forcing is a step of equal magnitude but opposite sign. For this forcing, the stationary climate is shown in Fig. 8(a), to be referred to as SS2I. It is virtually indistinguishable from the pattern resulting for CS2, seen in Fig. 4(a), including a lack of such misalignment as the comparison of Figs. 4(a) and 4(b) revealed. This goes beyond the report on the (approximate) “equivalence” of greenhouse and solar forcings with respect to (asymptotic in time) *global average* surface temperature;^{68} this is extended now to *regional averages*, i.e., spatial patterns, of that variable with a remarkable degree of approximation [just how close this equivalence is will be indicated by Fig. 9(a)].

The superposition of the stationary climates for SS2 and SS2I, displayed in Fig. 8(b), is in turn almost indistinguishable from the asymptotic total response to combined BR2 forcing, seen in Fig. 4(c). By inspection of Eq. (2), this pattern turns out to be created by even-order nonlinear perturbative terms of the response. The selection of the even-order terms takes exactly the superposition of the responses from two experiments where the forcings are equal but of opposite sign: $\epsilon 1=\u2212\epsilon 2$.

Instead of eliminating the even-order terms by superposition, we can retain only the odd-order terms by subtraction. We proceed in this direction, assuming that the third-order and higher odd-order terms make a negligible contribution. This way, we attempt to improve on the results for the linear susceptibility—and so ultimately on our prediction of the required solar forcing needed for canceling global warming. This is done with the aim of making an advance regarding our objective (O1). We can then apply this forcing in a new experiment coded as BR2C (“C” for “cancel”). For this experiment, we can utilize (although we will not examine the transient^{76}) our finding that the response characteristics to greenhouse and solar, i.e., short-wave and long-wave radiative, forcings are very similar, which would allow the application of a solar forcing that is a simple straight ramp, just like $log\u2061([CO2]/[CO2]0)(t)$, having the same length before the plateau (this should be the rationale behind the G2 experiments of GeoMIP). That is, what we improve on here is only the *level* of the plateau. It is rather straightforward to obtain the following equations for this level $f\u221e,BR2C,s$:

The subscripts $\u221e$ refer to the asymptotic, stationary climate regime; other subscripts refer to the experiment, i.e., the forcing scenario. Observe that we need data from a new experiment, CS2I, where the “I” indicates an experiment related to CS2 analogously to the relation of SS2I to SS2. Since we are interested in the stationary climate regime only, owing to ergodicity we can produce just a single long trajectory instead of an ensemble. The result of this is $\Delta \u27e8[Ts]\u27e9\u221e,CS2I=\u22125.11$ K (while we already have $\Delta \u27e8[Ts]\u27e9\u221e,SS2I=4.36K$, and, from Fig. 11, $\Delta \u27e8[Ts]\u27e9\u221e,SS2=\u2212\Delta \u27e8[Ts]\u27e9\u221e,CS2=\u22124.90K$). Having $|f\u221e,BR2C,g|=|f\u221e,CS2|$, we can express the sought-for forcing in relative terms based on the temperature data only, such as

In fact, we carried out the BR2C experiment independently: *iteratively* determining a solar forcing that cancels to a very good approximation the total response (similarly to how the level for, e.g., SS2 was determined observing the result of CS2). This forcing in the above relative terms was found to be 1.11, agreeing well with our prediction of 1.08.

Given that our prediction is smaller than the forcing actually needed for cancellation, we can predict an *upper bound* on the actual total response to our predicted forcing by substituting the actually needed value $|f\u221e,BR2C,s|/|f\u221e,SS2|=1.11$ into Eq. (17) (assuming that the response under combined forcing is linear). This gives $\Delta \u27e8[Ts]\u27e9\u221e,BR2C<0.134$ K. Considering that the total residual response with the original naïve methodology ( Appendixes B and C) was 0.6 K, this means that with the improved methodology we managed to reduce the total response almost to *one-fifth* or even less of the said first result (of course, the exact reduction can be easily determined by an extra simulation, which we have not run). In fact, some residual total response even with the improved method could be expected, since the simple measure of nonlinearity (14) indicated that linearity is much more violated by an increasing radiative forcing as opposed to a decreasing one. This suggests that the third-order *odd* perturbative term is not very small relative to the second-order one—contrary to the assumption of our improved methodology. Another source of error could be a nonlinear component of the response under combined forcing. This is what we examine next.

### B. Uncontrolled response and its (non)linearity

This subsection pertains to our objective (O2). Even if we managed to achieve a perfect cancellation in terms of the global averages, amounting to a success in terms of our objective (O1), it is still important to examine the total response in terms of any other observables regarding which the cancellation is not enforced, to see whether there is any unwanted residual. To this end, we look at the BR2C data. In particular, in Fig. 9, we show the spatial variations of the stationary climate in terms of (a) the surface air temperature and (b) the annual precipitation. The former looks to be a result of interpolating between the maps of Figs. 4(d) and 4(f), and the latter looks to have the same relationship with the maps seen in Figs. 6(d) and 6(f). This implies that the variances with respect to space for BR2C (exact cancellation), both for temperature and for precipitation, are about the same as those for BR2 (approximate cancellation employing a naïve method), and are much larger than the residual total responses in terms of the respective global averages for BR2. The reason for this must be that typically the respective (not necessarily linear) local response characteristics to greenhouse and solar forcing are somewhat different. Furthermore, comparing the BR2 and BR2C scenarios, we see that the difference in terms of the climatic surface air temperature could be as much as 2 K, which is about 10% of the maximum response under the corresponding greenhouse forcing alone. This justifies very well the application of the improved methodology as described in Sec. IV A. We note that the cooling tropics and warming arctic under global average surface temperature cancellation are in agreement with the findings of other studies using state-of-the-art models, including the first study examining the side-effects of geoengineering by Govindasamy and Caldeira.^{60}

The improved methodology to estimate susceptibilities applies of course to regional averages too. What remains to be seen now is whether linear response theory can predict the residual total responses seen in Figs. 9(a) and 9(b) (O2). The corresponding linear predictions are shown in Figs. 9(c) and 9(d), respectively. These predictions show a dramatic improvement on the first results shown in Figs. 4(d) and Fig. 6(d), respectively. Quantitatively, however, the prediction is not perfect. We can quantify this by, for example, the Pearson correlation coefficient $C$ between the reference $\Delta \u27e8\Psi \u27e9$ and the linear prediction $\u27e8\Psi \u27e9(1)$, the results for which are shown in Table III (note that no weighting of the data points with the area represented by grid points is done). This shows that the prediction skill is better for the temperature than the precipitation.

. | Pearson correlation coefficient . | std(ρ)
. | ρ
. |
---|---|---|---|

T_{s} | 0.78 | 0.26 | 0.73 |

P_{y} | 0.53 | 1.01 | 0.70 |

. | Pearson correlation coefficient . | std(ρ)
. | ρ
. |
---|---|---|---|

T_{s} | 0.78 | 0.26 | 0.73 |

P_{y} | 0.53 | 1.01 | 0.70 |

Whether the imperfection of the linear prediction is due to nonlinearity—as a small error $E=\Delta \u27e8\Psi \u27e9\u2212\u27e8\Psi \u27e9(1)$ would normally suggest—is not clear, because it is possible that the response $\Delta \u27e8\Psi \u27e9$ is linear but errors in the susceptibility estimates determining $\u27e8\Psi \u27e9(1)$ (or rather its estimator) remain. We should, therefore, find a way to check linearity without relying on the linear prediction. This can be done in a naïve way similarly to what we did for single-forcing scenarios, evaluating $\rho $ as defined by Eq. (14). However, this time, we have not one but two forcings present. Because of this, it turns out that a check of linearity requires not two but three data points at least. In fact, we are readily endowed by three dataset candidates resulting from the BR1, BR2, and BR2C experiments. In each scenario, if the response is linear, the asymptotic climate would be given by an equation like

where $i=1,2,3$ stand for, say, BR1, BR2, BR2C, in that order. One can express $\chi \Psi ,s$ from the equation for $i=3$, substitute into the equations for $i=1,2$, and from these latter express $\chi \Psi ,g$. Under linearity, the ratio of these expressions,

would be unity, meaning that Eq. (20) are in fact satisfied. We have evaluated $\rho $ for all grid points and display the results in Fig. 10. This suggests that we do have nonlinearity for both the temperature and precipitation. However, this conclusion can be called into question by noticing that the three data points could be too close to one another, resulting possibly in an inaccurate estimation of the ratio $\rho $, thereby falsely suggesting nonlinearity. Instead of evaluating $\rho $, faced with finite dataset size to evaluate climatic means, a test statistic would be a proper way of indicating nonlinearity at individual grid points.^{77} Nevertheless, based on the available data still, one idea to indicate that deviations from unity of both the correlation coefficient $C$ and $\rho $ are due to nonlinearity would be to demonstrate a correlation between the local errors $E$ of the linear prediction and $\rho $. We have checked the scatterplots of these quantities for both the temperature and precipitation and found no sign of correlations. This, however, does not mean that the response is linear: some unidentified effect could destroy the correlation. Our final idea is that if two situations feature different levels of nonlinearity, even if the two gridpoint-wise quantifiers of nonlinearity, $E$ and $\rho $, have random errors, typically they should indicate in a coordinated way a stronger deviation from linearity in the case when nonlinearity is actually stronger. We propose to capture this typical behavior by the correlation coefficient $C$ on the one hand, and the standard deviation std($\rho $) over the grid points on the other. We note that even if linearity is typical, a smaller std($\rho $) would indicate that it is more typical. We have already given the correlation coefficient in Table III, where we also display std($\rho $). We do indeed see that both quantities suggest that the response of precipitation is more nonlinear.

In the last column of Table III, we show $\rho $ for the global averages $[Ts]$ and $[Py]$ [not the average of the gridpoint-wise $\rho $’s, but having, e.g., $\Psi =[Ts]$ in Eq. (21)]. The steady-state values are estimated by taking the temporal mean of the ensemble means in the last 80 years. These values could be somewhat inaccurate because of the drift seen in Figs. 2(b) and 7(b) for the BR1 simulation. But considering the possible maximum values of $\rho $ for both $\Psi =[Ts]$ and $[Py]$, a degree of nonlinearity still seems very likely. The figures indicate that the response of the global average precipitation, unlike the local values/regional averages, is not significantly more nonlinear than the response of temperature under geoengineering. These results caution us about the reliability of linear predictions of side-effects as part of an assessment exercise:

linear predictions of regional responses are less reliable than the global response;

some quantities can respond more nonlinearly than others.

## V. SUMMARY AND OUTLOOK

As our objective (O1), we defined and solved an inverse problem for finding a solar forcing that can cancel global warming when applied in conjunction with a given greenhouse forcing. Our novel inverse problem approach is generic in two respects. First, we can allow for different choices of the scalar observable to keep under control—different either with respect to the physical quantity, or considering, for example, local variables. Second, we can prescribe an arbitrary time evolution of the chosen observable, assessing perhaps less stringent requirements than total cancellation.^{8} The inverse problem itself was derived in the framework of linear response theory. We have then further generalized the problem to the case when we want to control $N$ climatic observables by including $N$ auxiliary forcings.

Because of a degree of nonlinearity of the response characteristics of interest, the degree of approximation of the solution of the inverse problem, specifically for the cancellation of global average surface air temperature, depended on the accuracy of determining the linear susceptibilities (or Green’s functions) belonging to the different forcings. The inaccurate determination of the linear susceptibilities stems from the fact that for the estimation of the Green’s functions, we used *finite*-magnitude external system identification forcings (see Appendix B), in which case the nonlinearity of the response is already felt, while for the cancellation, i.e., *zero* total response, we would need the linear susceptibilities *exactly*—assuming the response is linear under combined forcing. An inaccurately predicted required solar forcing leads to a nonzero residual true total response. By a simple method, however, also used by Gritsun and Lucarini^{69} and independently by Liu *et al.*,^{70} here, for determining the susceptibilities, we eliminate even-order nonlinearities from the response in the system identification experiments. The price of this is having to run twice as many ensemble simulations for system identification. In the scenario of doubling CO$2$ concentration, by this method, we were able to achieve a fivefold reduction of the unwanted actual total response arising instead of cancellation. Furthermore, the linear prediction of spatial patterns using the improved *local* susceptibilities improved dramatically.

Nevertheless, the prediction is not perfect, and, pursuing our objective (O2), we indicated that the response under combined forcing should be somewhat nonlinear, and the degree of nonlinearity could be typically stronger for some quantities. In particular, we have seen evidence suggesting that in PlaSim the response of precipitation is more nonlinear than that of the surface temperature. This casts a shadow over the use of linear response theory for an efficient assessment. Perhaps there would still be value in this method, since larger-scale quantities are expected to be better predictable. Results reported by Cao *et al.*^{39} seem to indicate that in complex models, this nonlinearity is not more modest. Therefore, it would be desirable in the future to work out a method of predicting the nonlinear response in geoengineering scenarios.

We note that the *nonlinearity seen* in the system identification experiments *is not a local property* of the unforced system. The nonlinearity starts to be manifested at a certain level of the response. This is the case only for the positive radiative forcing, but not for the negative one. This is why—despite the nonlocality of nonlinearity—our improved methodology in Sec. IV A still worked so well: the wrong susceptibility estimated from the positive forcing was averaged by the more correct susceptibility estimated from the negative forcing. That is, the error was mitigated.

Ours is the first such analysis of the linearity of regional response under geoengineering; previous studies^{38,39} only compared the linear prediction and the true model response, and, as we have argued, such a comparison has to be considered inconclusive regarding the nonlinearity of the response. There is a question as to whether our findings on the predictive power of linear response theory in PlaSim carry over to state-of-the-art Earth System Models, because these do respond more weakly. The question certainly seems valid, however, since CMIP5 models also feature nonlinear regional responses under [CO$2$] forcing alone.^{40,41} The responses of global average surface air temperature and precipitation have been reported by MacMartin and Kravitz^{38} to be approximately linear in some CMIP5 models, seemingly more so than in PlaSim, but weaker forcing than [CO$2$]-doubling was considered, and the linearity of regional responses was not analyzed in detail. In apparent contradiction, Cao *et al.*^{39} reported a clear and not insignificant mismatch of the linear prediction and model simulation for local responses. However, only a single state-of-the-art model was considered, the HadCM3L model, and so it was not indicated whether this is typical behavior of state-of-the-art models. Furthermore, to reiterate, regarding the novel contribution (O2) of our work, we claim that, based on these previous results reported, it was not correct to conclude that the mismatch was due to nonlinearity. As the latest result on the linearity of the local precipitation response, however, Ref. 78, analyzing the Geoengineering Large Ensemble of the CESM1 model, states that “While a rather extreme geoengineering scenario has been considered, many, but not all, of the precipitation features scale linearly with the offset global warming.” [in particular, see their Figs. 16(c), 16(g), and 16(k)]. Furthermore, we learn from Figs. 9 and 10 of Ref. 79 that different ESMs feature quite different patterns of local temperature and precipitation response under geoengineering.

We pointed out also that instead of stepwise system identification forcing, it is better to use a Kronecker delta forcing to achieve a better signal-to-noise ratio (SNR). As another gain from using a Kronecker delta forcing, the response would be much more modest in magnitude, and hence it would stay farther away from regimes with more significant contributions of nonlinear terms, and so the linear susceptibilities could be estimated more accurately, even by the naïve method [see Appendix B and Eq. (A2)].

We note that the presented method for predicting a required solar forcing is based on the Green’s functions that are determined by externally forcing the system of interest. This is clearly not a method that could be put in practice in the case of the Earth system, and so we are restricted to using climate models. Therefore, this is another reason, besides the unpredictability of 21st century greenhouse forcing, why the method is suitable only for scenario analyses. However, it might be possible to estimate the Green’s functions without externally forcing the system, just from an observation of unforced fluctuations. The crucial question in this regard is whether the fluctuation–dissipation theorem^{11,21} is applicable.

## ACKNOWLEDGMENTS

We would like to express our gratitude to an anonymous referee of a previous submission of an earlier version of our manuscript for their very thorough feedback and many suggestions to improve the quality of our work, and also for providing references to the relevant literature. We would also like to acknowledge Ken Caldeira for drawing our attention to his paper^{39} relevant to our contribution (O2), and the then editor Ben Kravitz for informing us about his publication.^{80} We would also like to thank an anonymous referee and the editor Georg Gottwald for their helpful suggestions to improve this paper. T.B. would like to thank Jian Lu for inspiring discussions on geoengineering and sharing their manuscript.^{50} This work was part of the EU Horizon 2020 Project CRESCENDO (under Grant No. 641816); financial support is gratefully acknowledged. It also received support from the EU Blue Action Project (under Grant No. 727852) and from the Institute for Basic Science (IBS), South Korea, under No. IBS-R028-D1. V.L. acknowledges support from the DFG Sfb/Transregio TRR181 Project.

### APPENDIX A: COMPUTING THE RESPONSE IN TIME AND FREQUENCY DOMAINS

To be able to carry out (approximate) calculations involving spectral transforms, we need to clarify the formulas and algorithms applicable to *discrete-time* and *finite-size* data. We can approximate the time-continuous forcing $f(t)$ [appearing in Eq. (4)] by a *staircaselike* forcing that is defined by a uniform *sampling* of $f(t)$, called a *sample-and-hold* approximation. It can be represented by a discrete sequence $f[n]=f(t=nT)$, $n=\u2026,\u22121,0,1,\u2026,$ with $T$ being the uniform sampling interval, in which sequence the data points provide the levels of the “stairs.” That is, for an actual staircaselike forcing signal, $f(t=(n+\nu )T)=f[n]$ for all $\nu \u2208[0,1]$, where the noninteger $\nu $ can be viewed as a phase variable—the phase where the sample is taken within the interval where the forcing is constant. For such staircaselike forcings, sample values of the response with the sampling $\Psi [n]=\Psi (t=(n+\nu )T)$ of the time-continuous response at any phase $\nu \u2208[0,1]$ obey

where the discrete-time (DT) impulse response or DT Green’s function $h\Psi [n]$ is, clearly, the response $\u27e8\Psi ^\u22a5\u27e9(1)$ to a Kronecker delta function forcing: $f[n]=\delta [n]=1$ if $n=0$ and 0 otherwise.^{81} Note that we make a distinction in our notation with regard to the special forcing such that we distinguish $\Psi ^$ from $\Psi $; however, for simplicity, we did not subscript $\Psi ^$ by $\nu $, despite its dependence on the phase. Note also that the DT impulse response function cannot be obtained by a straightforward sampling of the Green’s function; that is, in general, $h\Psi [n]\u2260G\Psi (1)[n]=G\Psi (1)(t=(n+\nu )T)$ with the same $\nu $ as $\Psi [n]$ (or $\Psi ^[n]$) is defined with, or with any other fixed $\nu $, for all $n$. If the sampling frequency is not adequate regarding some typical time scales of the forcing, then the calculated discrete response will be also an inadequate approximation. We note further that—unlike the Dirac delta in the continuous-time case—the Kronecker delta *can* be realized for numerical purposes. It is equivalent to applying a step forcing and taking the difference,

This method was used by Lucarini *et al.*^{32} Such external forcings we will refer to as (system) identification forcing.

When faced with the practical situation of having *finite* time series, $f[l]$ and $h\Psi [l]$, $l=0,\u2026,L\u22121$, Eq. (D5) of Appendix D can be used to determine the response $h\Psi \u2217f[l]$, $l=0,\u2026,L\u22121$. The usefulness of this formula is due to the existence of efficient algorithms for evaluating the discrete Fourier transform, DFT; we evaluate the DFT using fft from MATLAB. To use the formula, one can *pad* $f[l]$ and $h\Psi [l]$ by $L\u22121$ zeros in *front* (or rather $L$ zeros^{82}); we will denote these padded sequences by, e.g., $f~[l]$, $l=0,\u2026,2(L\u22121)$. The first useful “half” ($l=0,\u2026,L\u22122$) of the circular convolution $DFT\u22121{DFT{h~\Psi}DFT{f~}}$ resulting from Eq. (D5) will then match the linear convolution $h\Psi \u2217f[l]$, $l=0,\u2026,L\u22121$. Unlike this calculation in the frequency domain, the calculation in the time domain using Eq. (A1) is straightforward.

### APPENDIX B: OBTAINING THE GREEN’S FUNCTION

First, to predict the response (to first order), we need to obtain, for example, the (first-order) Green’s function. As Eq. (5) suggests, it is fully determined by the autonomous system. A direct evaluation of this formula is, however, prone to failure.^{32} Second, we note that in practice, we can study only a discrete-time version of the system. This suggests that for a direct way of determining the Green’s function, instead of Eq. (4), we have to use Eq. (A1) [leading to Eq. (A2)]. It also means that we cannot infer the response of the system just by observing its autonomous dynamics, but we need to force it externally in a suitable way. Third, an ensemble of experiments (appropriately initialized) is needed to obtain the expected value $\u27e8\Psi ^\u27e9$ [with the notation introduced in Appendix A, first appearing in Eq. (A1)]. This was acknowledged also by MacMartin and Kravitz.^{38} However, it is feasible to run only a finite number of experiments, so we obtain an approximation of $\u27e8\Psi ^\u27e9$, where the error is some correlated noise process. This correlation can be made negligible by using a suitably infrequent sampling, allowed by, say, the application of a slow forcing. We use the same data as Lucarini *et al.*,^{32} which consist of some ensembles of 200 members, and we have produced new data belonging to new forcing scenarios, as described in Sec. II, that consist of ensembles of 20 members.

As already spelled out in Appendix A, two identification forcing types are particularly suitable to determine the Green’s function: one is a step forcing, and the other is the Kronecker delta. When a random statistical error is present due to the finite ensemble size, represented, say, by a Gaussian random variable $\xi $, it is actually *better to use a Kronecker delta* forcing for the following reason. Using the step forcing instead, one needs to take the difference of consecutive values—what is sometimes called “differencing”—of the response sequence (A2). This way, at any time, the variance of the error is that of the *difference* of two random variables, $\xi 1$ and $\xi 2$, both distributed identically to the original random variable $\xi $. For Gaussian variables, it is straightforward to show that $Var[\xi 1\u2212\xi 2]=2Var[\xi ]$. Note that we assume that $\xi $ is the same random variable to a good approximation under the delta and step forcings. Despite the advantage of the Kronecker delta forcing, we apply a step forcing also in our new experiments, so that we are able to make use of data produced for the study by Lucarini *et al.*^{32} in a consistent manner. Examples of the response to step forcings are displayed in Fig. 11. The similarity of the responses to greenhouse and solar forcings here, and so the Green’s functions, is consistent with the findings of others^{83–85} and the design of the G2 GeoMIP experiment.^{6}

It is important to appreciate the following trade-off. For a better signal-to-noise ratio (SNR), one can apply a more powerful identification forcing. However, in the presence of nonlinearities, the more powerful the forcing signal, the larger is the error in estimating the Green’s function *belonging to the base state* $\u27e8\Psi \u27e90$ (even without noise). MacMartin and Kravitz^{38} applied a [CO$2$]-quadrupling (and this is a standard forcing level for geoengineering studies^{5,6}); however, they did not determine the solar forcing for cancellation via the Green’s functions ( Appendix C), and they checked the linearity of the response only up to a forcing level lower than [CO$2$]-doubling. Their motivation for applying the high forcing level seems to have been only to be able to determine the Green’s function with a better SNR given that no ensemble data are available from the GeoMIP experiments.

We make here two more comments about the issue with noise. First, instead of instantaneous samples of the observable $\Psi $ and the corresponding Green’s function, we will consider, like Lucarini *et al.*,^{32} *annual averages*, $\Psi \xaf[n]=\u222b01d\nu \Psi ((n+\nu )T)$. This is sensible given the slow rate of change that the applied forcing represents, and it also greatly reduces the noise level. In this regard, we point out that annual averages too obey Eq. (A1) *exactly* if the forcing is constant over a year, because the order of summations can be interchanged, whereby a well-defined DT Green’s function belonging to the annual average emerges. We will use only annually constant staircaselike forcings in our experiments (Sec. II), so that it will be clear that a linear prediction of the response has an error not because Eq. (A1) does not apply exactly, but because of the missing higher-order perturbative terms appearing in the expression (2). Second, the said enhancement of noise by differencing in Eq. (A2) cannot be overcome by working in the frequency domain. The Green’s function, via the frequency domain and applying Eq. (D1) of Appendix D, is expressed as $h\Psi =DTFT\u22121{DTFT{\Delta \u27e8\Psi ^\u231c\u27e9}/DTFT{f\u231c}}$, where $1/DTFT{f\u231c}=1\u2212e\u2212i\omega $. The latter is the very factor arising in the discrete-time Fourier transform (DTFT) of a differenced sequence. As far as we are aware, the only way to avoid the differencing and thereby reduce the noise is by using a Kronecker delta identification forcing as argued above.^{86}

We also note that the application of a stepwise identification forcing implies a specific frequency dependence of the variance of the estimator of the susceptibility. See Ref. 80 for other system identification techniques that imply different such frequency dependences.

### APPENDIX C: THE INVERSE PROBLEM

When different forcings act at the same time, their first-order contributions to the response—as discussed in Sec. I A—can be *superimposed*. Hence, when we desire a certain total response $\Delta \u27e8\Psi \Sigma \u27e9(t)$ to a *combined forcing* when all forcings are given but one, there is a *unique* form of that one required forcing to fulfill our desire. In terms of the geoengineering problem of interest (Sec. I B), the required solar forcing $fs$ to achieve a total response $\Delta \u27e8\Psi \Sigma \u27e9$ given a greenhouse forcing $fg$ can be *expressed*, to a first-order approximation, in the frequency domain as stated in Eq. (10). With the most obvious choice of *cancellation*, $\Delta \u27e8\Psi \Sigma \u27e9=0$, Eq. (10) simplifies to

However, in practice, when finite time series are available, the simplification is not so trivial. As described at the end of Appendix A, in place of the three FTs in Eq. (10), we have to evaluate DFT${f~g}$, DFT${h~\Psi ,s}$, and DFT${h~\Psi ,g}$. *Furthermore*, the DFT in place of $\Delta \u27e8\Psi \Sigma \u27e9(\omega )$ is that of a sequence $\Delta \u27e8\Psi \u02c7\Sigma \u27e9[l]$, only the first useful “half” ($l=0,\u2026,L\u22122$) of which is zero, as dictated by our requirements, but its second half ($l=L\u22121,\u2026,2(L\u22121)$) has nonzero values in general. That is, what we have is

It can be shown that the said nonzero values characterize the total response to combined *step* forcings (to do with the “gap” mentioned in the caption of Fig. 12) and also depend to a certain extent on the particular finite $fg[l]$ presented. The reason for this is that the Green’s function is given only up to a finite time, which becomes clear upon inspection of the workings of the convolution of finite time series. Simply by rearranging Eq. (C2), the said nonzero values are given by

where, however, $f~s$ is not known, being the sought-for object. The idea is that we can look for $f~s$ by an *iterative* procedure, which is initialized, say, by $f~s=f~g$. Note that if $h\Psi ,g$ and $h\Psi ,s$ are not dissimilar, then nor are $fg$ and $fs$; that is, the initial value is not far from the solution, which gives hope that it is within the basin of attraction of the solution. In each iterate, we do the following:

Evaluate Eq. (C3) using the current estimate of $f~s$, but instead of simply substituting this value, first we replace any nonzeros in its first half by zeros.

In the resulting $\Delta \u27e8\Psi \u02c7\Sigma \u27e9[l]$, we replace any nonzeros in its

*first*half by zeros in order to have it in the right form.We then get a new estimate for $f~s$ using Eq. (C2).

Ideally, the first half of the $f~s$ estimates in stage 3 converge to zero, and the second half to some nontrivial form, that is, the solution. In our experience (results not shown), this is the case for systems with fairly simple and smoothly varying Green’s functions. However, when the same Green’s functions are corrupted by noise (which arises in practice from the finiteness of the ensemble size; see Appendix B), our experience is that the procedure does not necessarily converge, but iterates of $f~s$ can develop increasingly large oscillatory features. It is possible to achieve convergence for some smaller but nonzero noise level. However, even then, the limit function retains small oscillatory features over the full length of $f~s$.

We emphasize that the iterative procedure was needed because we could not predict the second nonuseful half of $\Delta \u27e8\Psi \u02c7\Sigma \u27e9[l]$, since we do not have the Green’s functions in full but only with a cutoff in time. This means that by running longer and longer *ensemble* simulations, by which we can determine the Green’s functions further and further in time, the solution can be approximated by a noniterative procedure better and better. This is a numerically more expensive solution.

As an alternative, working in the time domain, the inverse problem leads to the performance of a *deconvolution*,

Note that we have written $\Delta \u27e8\Psi \u02c7\Sigma \u27e9[l]$ in the above, which should correspond exactly to the appropriately defined circular convolution in Eq. (C3) for $l=0,\u2026,2(L\u22121)$. It is straightforward to show that in the time domain too, $fs[l]$, $l=0,\u2026,L\u22121$, is obtained *iteratively* in three stages in a similar manner as outlined above in the frequency domain (except that in stage 1, there is no need to replace any values by zeros, only in stage 2). One can use deconv in MATLAB to perform the deconvolution. We find in simple examples studied (results not shown) that without noise the procedure in the time domain leads to the very same solution as the procedure in the frequency domain. However, this is not the case with additive noise, which means that the deconvolution/inverse problem is *ill-posed* in this case. Nevertheless, the weaker the noise, the closer the outcome is to the true solution, either in the time or the frequency domain. We find that in the time domain, the procedure always converges to some solution; however, with increasing noise strength, this solution features oscillations of increasing amplitude as time advances. Nevertheless, for a certain noise strength when the frequency-domain procedure also converges, we find that the solution in the time domain is smoother and so closer to the true solution *earlier* in time. This is also what we find considering the PlaSim data, as shown in Fig. 12. We conclude, therefore, that *it is preferable to work in the time domain* using Eq. (C4) to produce numerical results. Nevertheless, we will carry out our calculations in the frequency domain, using, for example, the forcing signals shown in Fig. 12(a), in order to make the point that even a rough forcing signal convolved with a rough Green’s function produces a not so rough response, as we see in Sec. III A1.

We emphasize that, in our case, multiple iterations are not needed, because $h[Ts],g$ and $h[Ts],s$ are very similar, as indicated by Fig. 11. The straightforward application of Eq. (C2) is sufficient, substituting an all-zero sequence for $\Delta \u27e8\Psi \u02c7\Sigma \u27e9$. Ours is obviously a special case, however, and the generic iterative procedure might be needed for other geoengineering scenarios of practical relevance.

We note that an anonymous referee of a previous submission of this paper had suggested that, in the time domain, a simpler alternative way of obtaining the solution by a time-marching procedure should exist (not relying on deconvolution). Indeed, one can break down the convolution sum (A1) as $\u27e8\Psi ^\u27e9(1)[n]=\u2211k=2nh\Psi ,s[k]fs[n\u2212k]+h\Psi ,s[1]fs[n\u22121]$, which can be expressed for $fs[n\u22121]$ and consider that $\u27e8\Psi ^\u27e9(1)[n]=\u2211k=1nh\Psi ,g[k]fg[n\u2212k]$ is given for all $n$. Suppose $fg[0]=0$; then the procedure for finding $fs[n\u22121>0]$ can be initialized by $fs[0]=0$ for $n=1$. We have checked (results not shown) that it gives exactly the same result as our iterative procedure, reproducing the time series pattern due to a particular noise realization in a simple example system.

### APPENDIX D: THE CIRCULAR CONVOLUTION THEOREM AND ITS APPLICATION

Taking the discrete-time Fourier transform (DTFT) of Eq. (A1), we have, via the convolution theorem for discrete sequences,^{46} a formally analogous version of Eq. (6), with the individual Fourier transforms approximated by Fourier series,

where, for example, $f2\pi (\omega )=DTFT{Tf[n]}=\u2211n=\u2212\u221e\u221eTf[n]e\u2212i\omega n$ and $f[n]=DTFT\u22121{T\u22121f2\pi (\omega )}=12\pi T\u222b\u2212\pi \pi d\omega f2\pi (\omega )ei\omega n$ with a normalized nondimensional angular frequency $\omega $. Featuring instead the dimensional frequency $f$ measured in hertz (1 Hz = 1 s$\u22121$), the forward and inverse transform pairs are symmetrical: $f1/T(f)=f2\pi (2\pi fT)=\u2211n=\u2212\u221e\u221eTf[n]e\u2212i2\pi fTn$ and $f[n]=T\u222b1/Tdff1/T(f)ei2\pi fTn$. The DTFT, a continuous function of the frequency $f$, is often sampled at $f=k/(NT)$, $k=0,\u2026,N\u22121$,

for any $n0$, which yields the discrete Fourier transform (DFT) of the *finite* sequence $fN[n]$, $n=n0,\u2026,n0+N$, where the full infinite sequence $fN[n]$, $n\u2208R$, turns out to be $N$-periodic, since it has to be, for the equivalence of the two sums in Eq. (D2), in the so-called periodic summation form

Therefore, when $f[n]$ is actually $N$-periodic, its DTFT is nonzero only at $f=k/(NT)$, $k\u2208R$, and also periodic, such that the DFT of a single cycle of $f[n]$ is able to represent its DTFT. For such periodic sequences, to be denoted distinctively using a subscript as $fN[n]$, it can be proved^{46} that

for any nonperiodic sequence $y[n]$. Note that $y\u2217fN$ is referred to as the *circular convolution* of the sequences $y[n]$ and $f[n]$. When the sequences $y[n]$ and $f[n]$ are of finite length, $n=0,\u2026,N\u22121$ with any $N\u22651$, so that, for example, $fN[n]=f[mod(n,N)]$, their circular convolution can be shown to be^{46} (see also https://uk.mathworks.com/help/signal/ug/linear-and-circular-convolution.html)

This equality is called the *circular convolution theorem*. It follows that when $y[n]=0$ and $f[n]=0$ for $n=0,\u2026,Nf\u22121$ and $Ny\u22121$, respectively, then $(y\u2217fN)[mod(n\u22121,N)]=(y\u2217f)[n]$ for $n=N,\u2026,N+min(Nf+Ny,N\u22121)$. Furthermore, $(y\u2217f)[n]$, $n=1+Nf+Ny,\u2026,N+1+Ny$, is the segment that represents the part of the linear convolution that can be considered useful in the sense that it coincides with the occurrence of the finite values of $f$ in a finite time interval of length $N\u2212Nf$. Therefore, the circular convolution $(y\u2217fN)[n]$ captures the useful part of the linear convolution over $n=max(1+Nf+Ny,N),\u2026,N+min(1+Ny,Nf+Ny,N\u22121)$.

Therefore, when faced with the practical situation of having *finite* time series, $f[l]$ and $h\Psi [l]$, $l=0,\u2026,L\u22121$, Eq. (D5) can be used to determine the response $h\Psi \u2217f[l]$, $l=0,\u2026,L\u22121$ (whose usefulness comes from an efficient algorithm for evaluating the DFT, called the fast Fourier transform algorithm, FFT). In particular, if the two sequences are to be *padded* in front by a number $Nf=Nh=N0$ of zeros equally [so that the circular convolution (D5) is well defined], then the reconstructed length of the linear convolution $h\Psi \u2217f$ (the response of a causal system coinciding with the forcing) is $1+N0\u2212max(N0\u2212L+1,0)$. This is a linear function of $N0$ saturating at $N0=L\u22121$ reaching the full length $L$. Therefore, for simplicity, one can pad by $N0=L\u22121$ zeros,^{82} and we will *denote these padded sequences* by, for example, $f~[l]$, $l=0,\u2026,2(L\u22121)$. Note that padding with fewer or no zeros results in a circular convolution that better approximates either the useful or the not useful part of the linear convolution, whichever approximation is better when more zeros are used. In the extreme case of no padding, very little of the useful part can be approximated well. The key to the applicability of Eq. (D4) is that it does not matter how the forcing $f[n]$—and with it the response $\u27e8\Psi ^\u27e9[n]$—continues after our experiment, and so they can be thought of as periodic.

### APPENDIX E: ZONAL AVERAGE SURFACE TEMPERATURE

For the diagnostics of any residual total response, a good starting point is to look at the zonally averaged fields of the surface air temperature. As a reminder, any observable other than the one for which a desired evolution has been enforced (the global average surface air temperature in our case; see Sec. III A 1) can evolve in an uncontrolled fashion. This is something that is not intended and should be checked. First, we show results with the $2$-fold $[CO2]$ increase (CR1, BR1). Following Lucarini *et al.*^{32} (where only the case of $[CO2]$-doubling was treated), treating zonal means in a similar fashion to global means informs us that the response to either greenhouse or solar forcing is strongest at high-latitude/polar regions [see Figs. 13(a) and 13(c)]. This is where the response is most nonlinear, as indicated by Figs. 13(b) and 13(d), showing the difference between reference and prediction. This nonlinearity should be due to albedo saturation and/or nonlinear characteristics of radiation physics, as discussed by others.^{32,40,41,87} We note that in Figs. 13(b) and 13(d), nonzero values under the stationary climate (after a transient) represent finite-ensemble-size statistical errors only. As a consequence of the said nonlinearities, in the high-latitude regions, linear response theory performs very poorly in predicting the total response to combined forcing, as it does also in the regime of stationary climate; compare Figs. 14(a) and 14(b) showing the prediction and reference, respectively.

In addition to such a visual comparison, it is customary to quantify the discrepancy by measuring the error of the prediction *relative* to the true value. However, the true value can be zero at certain latitudes, and this naïve relative error measure then lacks an obvious meaning. In these situations, it is customary^{88} to analyze the following relative error:

This takes values in the interval [0,1] for all values of $\Delta \u27e8\Psi \u27e9BRX$ and $\u27e8\Psi \u27e9BRX(1)$; and a larger value should be considered worse. Clearly, $e1(\mu )$ as a function of latitude would facilitate the comparison of the predictive skill of linear response theory at different latitudes. We note that in Eq. (E1), $\u27e8\Psi \u27e9BRX(1)$ is meant to be an estimator of the actual quantity, and this estimator is biased, but to keep things simple, we do not introduce a separate symbol for the estimator. Another possibility in our situation is that we measure the error of prediction of the response to combined forcing relative to the response to one of the forcings, e.g.,

We evaluate $e1$ and $e2$ only with respect to the stationary climate, in which case the estimation is very accurate since we can take an average also with respect to time. Figure 15(a) shows the result in the case of the weaker forcing (CR1, BR1). Both $e1$ and $e2$ indicate with good agreement that the prediction is poorest at some high-latitude regions.

With $[CO2]$-doubling (CR2, BR2), as shown by the results in Fig. 15(b), the performance has different characteristics in comparison with the case of weak forcing. Both $e1$ and $e2$ are highest at both equatorial and some high-latitude regions and somewhat less at polar and some Southern Hemisphere midlatitude regions.

## REFERENCES

*IPCC Fifth Assessment Synthesis Report–Climate Change 2014 Synthesis Report*, edited by P. Aldunce, T. Downing, S. Joussaume, Z. Kundzewicz, J. Palutikof, J. Skea, K. Tanaka, F. Tangang, C. Wenying, and Z. Xiao-Ye [Intergovernmental Panel on Climate Change (IPCC), 2014], p. 116.

*International Handbook on Responsible Innovation*, edited by R. von Schomberg and J. Hankins (Edward Elgar Publishing, 2019).

*Nonautonomous Dynamical Systems*, Mathematical Surveys and Monographs Vol. 176 (AMS, 2011).

*What’s in a Name? Global Warming vs. Climate Change*(NASA, 2008).

*Models, Simulations, and the Reduction of Complexity*, edited by U. Gähde, S. Hartmann, and J. Wolf (De Gruyter, 2013), pp. 229–254.

^{63}“If geoengineering worked, whose hand would be on the thermostat? How could the world agree on an optimal climate?”

^{69}