Inverse problems are ubiquitous in science and engineering. Two categories of inverse problems concerning a physical system are (1) estimate parameters in a model of the system from observed input–output pairs and (2) given a model of the system, reconstruct the input to it that caused some observed output. Applied inverse problems are challenging because a solution may (i) not exist, (ii) not be unique, or (iii) be sensitive to measurement noise contaminating the data. Bayesian statistical inversion (BSI) is an approach to tackle ill-posed and/or ill-conditioned inverse problems. Advantageously, BSI provides a “solution” that (i) quantifies uncertainty by assigning a probability to each possible value of the unknown parameter/input and (ii) incorporates prior information and beliefs about the parameter/input. Herein, we provide a tutorial of BSI for inverse problems by way of illustrative examples dealing with heat transfer from ambient air to a cold lime fruit. First, we use BSI to infer a parameter in a dynamic model of the lime temperature from measurements of the lime temperature over time. Second, we use BSI to reconstruct the initial condition of the lime from a measurement of its temperature later in time. We demonstrate the incorporation of prior information, visualize the posterior distributions of the parameter/initial condition, and show posterior samples of lime temperature trajectories from the model. Our Tutorial aims to reach a wide range of scientists and engineers.

## I. INTRODUCTION

### A. Mathematical models of physical systems

In science and engineering, we often apply physical principles to develop a mathematical model of a physical system in order to capture some phenomenon of interest. Physics-based models are useful for acquiring insights and understanding about a system, explaining observations, guiding future experiments, discovering new scientific questions, and making predictions.^{1}

Abstractly, a physics-based model of a system is typically an operator *f*_{β}: *x* ↦ *y* that (i) predicts the output *y* in response to any given input *x* and (ii) contains a vector *β* of physical parameters characterizing the system. See Fig. 1.

The input and output could be scalars, vectors, or functions. Evaluating the operator *f*_{β} at an input *x* could constitute evaluating an analytic expression, solving a system of algebraic equations, evaluating an integral, solving a differential equation, or running a computer simulation.^{2,3} Note that the abstraction in Fig. 1 is not predicated on the system having mass and/or energy input/output; broadly, by definition, the input is merely a causal factor set in the beginning of an experiment on the system, and the output is an effect/result/consequence of the input.^{4}

### B. The forward problem

The *forward problem* is, given the model structure *f*_{β}(*x*) and its parameters *β*, predict the output *y* of the physical system in response to a given input *x*—i.e., predict the effect of a cause.

The forward problem is solved by evaluating *f*_{β}(*x*).

### C. Inverse problems

Mathematicians find it difficult to define “inverse problem,” yet most recognize one when they see it.

Charles W. Groetsch^{3}

Two categories of *inverse problems*^{2,3,7–12} are (see Fig. 2)

- *parameter identification*: determine the parameters *β* characterizing a system that produced a set of observed input–output pairs $\{(x_i, y_i)\}_{i=1}^N$;
- *reconstruction*: determine/reconstruct the input *x*′ that produced an observed system output *y*′, given the model parameters *β*; that is, predict the cause of an observed effect.

Inverse problems are ubiquitous in science and engineering. The development of a quantitatively predictive model often involves parameter identification. Reconstruction problems emerge when measurements we ideally wish to make in order to probe a system are infeasible, inaccessible, invasive, dangerous, and/or destructive, e.g., (i) sensing or imaging using indirect measurements (an elementary example: exploiting a model of the thermal expansion of mercury to infer the temperature of the air from a measurement of the volume of mercury in a mercury-in-glass thermometer^{13,14}), (ii) making inferences about the interior of a domain from measurements on its boundary, and (iii) time reversal: reconstructing the past state of a system from a measurement of its current state.^{2,3}

A famous inverse problem, one of reconstruction, is “can one hear the shape of a drum?”^{15–18}

**Can one hear the shape of a drum?** Suppose we strike the head of a drum with a stick at *t* = 0. The force induces transverse waves in the drumhead, which transmit to the air, producing longitudinal waves (sound). Treating the drumhead as a homogeneous, elastic, two-dimensional membrane under tension, the wave equation is a mathematical model for the vertical displacement *u*(*x*, *t*) of the vibrating membrane at a point *x* ∈ Ω and time *t* ≥ 0,

$$\frac{\partial^2 u}{\partial t^2} = c\,\Delta_x u, \quad x \in \Omega,\; t \geq 0 \quad \text{(wave equation)}, \qquad u|_{\partial\Omega} = 0 \quad \text{(clamped boundary condition)}. \tag{1}$$

The parameter *c* > 0 is the ratio of the spatially uniform tension in the membrane to its density, Δ_{x} is the Laplace operator, and $\Omega \subset \mathbb{R}^2$ defines the geometry of the membrane ($\partial\Omega$ is the boundary of Ω). Damping is neglected. Equation (1) is subject to an initial position and velocity set by the striking stick.^{19}

The pure tones the drumhead can produce are dictated by standing-wave solutions of the wave equation, *u*(*x*, *t*) = *e*^{iωt}*ϕ*(*x*). Substituting into Eq. (1), we find that the frequency *ω* of a pure tone of the drumhead is related to an eigenvalue of the Laplace operator on the domain Ω,

$$c\,\Delta_x \phi = -\omega^2 \phi, \quad x \in \Omega \quad \text{(an eigenvalue problem)}, \qquad \phi|_{\partial\Omega} = 0 \quad \text{(clamped boundary condition)}. \tag{2}$$

According to Eq. (2), the membrane, characterized by its physical property *c* and its geometry Ω, can produce a discrete spectrum of vibration frequencies *ω*_{1} ≥ *ω*_{2} ≥ ⋯.^{15}

The conventional *forward problem* is, given the physical property *c* and geometry Ω of the membrane, use Eq. (2) to determine the spectrum of vibration frequencies *ω*_{1} ≥ *ω*_{2} ≥ ⋯ it can produce. In undergraduate mathematics courses, students solve the forward problem analytically for a rectangular or circular drumhead.^{20}

An *inverse problem* is, given the physical property *c* of the membrane and its spectrum of vibration frequencies *ω*_{1} ≥ *ω*_{2} ≥ ⋯, use Eq. (2) to determine its geometry Ω. Certainly, the set of pure tones a drum can produce contains information about the geometry of its membrane, but it is not obvious whether these pure tones (and *c*) *uniquely determine* its geometry. This inverse problem, “can one hear the shape of a drum?,” was popularized by Kac in 1966.^{15} It was not until 1992 that Gordon, Webb, and Wolpert^{17} found two non-isometric domains with an identical spectrum of vibration frequencies, shown in Fig. 3. That is, one cannot necessarily hear the shape of a drum.^{18}
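To make the forward problem concrete: for a rectangular membrane Ω = [0, *a*] × [0, *b*], the eigenfunctions of Eq. (2) are products of sines, giving frequencies $\omega_{mn} = \pi\sqrt{c}\,\sqrt{(m/a)^2 + (n/b)^2}$ for positive integers *m*, *n*. A minimal sketch in Python (the function name and the mode cutoff are our own choices):

```python
import math

def rect_drum_frequencies(c, a, b, n_modes=5):
    """Pure-tone frequencies of a rectangular membrane [0,a]x[0,b], clamped at
    the boundary, from the eigenvalue problem c*Laplacian(phi) = -omega^2*phi:
    omega_mn = pi*sqrt(c)*sqrt((m/a)^2 + (n/b)^2), m, n = 1, 2, ...
    Returns the n_modes smallest frequencies (cutoff of 20 modes per direction
    is ample for the lowest few modes of a near-square membrane)."""
    freqs = []
    for m in range(1, 20):
        for n in range(1, 20):
            freqs.append(math.pi * math.sqrt(c) * math.hypot(m / a, n / b))
    return sorted(freqs)[:n_modes]

# Fundamental tone of a unit square membrane with c = 1: omega_11 = pi*sqrt(2) ≈ 4.443
print(rect_drum_frequencies(c=1.0, a=1.0, b=1.0))
```

By contrast, the irregular Gordon–Webb–Wolpert domains of Fig. 3 have no such closed form; their spectra require a numerical eigensolver.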

Classically, a solution to an inverse problem is sought by adjusting the input *x* (for a reconstruction problem) or parameter *β* (for a parameter identification problem) until the model output(s) match or approximately fit the observed system output(s), achieving maximal consistency between the model and the data,^{21} e.g., via least squares.^{22,23}

#### 1. Challenges in applied inverse problems

As opposed to theoretical, *exact* inverse problems, in *applied* inverse problems,^{24} we must cope with noise contaminating the measurements of the system output. This noise may originate from (i) an imperfect measurement instrument and/or (ii) variance in the system output in response to a repeated input due to (a) inherent stochasticity and/or (b) unrecognized or poorly controlled, and thus unaccounted-for in the model, inputs/conditions that influence the output. In addition, the model *f*_{β}(*x*) may inadequately approximate physical reality—perhaps, in part, owing to (ii-b).^{21}

Applied inverse problems are particularly interesting and challenging because, in contrast to forward problems, they are often ill-posed and/or ill-conditioned.^{6,25}

A solution to an applied inverse problem may not exist owing to noise in the measured output and model inadequacy:

- in reconstruction problems, the observed output may fall outside the image of *f*_{β}, so that no physically viable input is consistent with it;
- in parameter identification, a parameter *β* giving a model *f*_{β}(*x*) that exactly reproduces all observed input–output pairs may not exist.

A solution to an applied inverse problem may not be unique:

- in reconstruction problems, many different inputs may be consistent with the observed output owing to a many-to-one *f*_{β};
- in parameter identification, (i) the number of observed input–output pairs may be insufficient to fully constrain the parameters or (ii) the model may be inherently unidentifiable (i.e., no amount of data can fully constrain the parameters).^{25–27}

A solution to an applied inverse problem may discontinuously depend on or be sensitive to measurement noise contaminating the data. Even if the solution to the inverse problem exists and is unique, noise contaminating measurements of the system output propagates onto and corrupts the solution to the inverse problem. In ill-conditioned inverse problems,^{14} small realizations of noise in the data produce large changes in the solution.

These challenges compel us, when tackling applied inverse problems, to account for the noise contaminating measurements of the output, to reframe our concept of a “solution,” and to quantify our uncertainty in the solution.^{28,29}

### D. Bayesian statistical inversion (BSI)

Bayesian statistical inversion (BSI)^{13,29–37} is a versatile framework to tackle (possibly) ill-posed and/or ill-conditioned applied inverse problems. BSI has two key advantages. First, BSI allows us to incorporate prior (i.e., before data are collected) information and/or beliefs about the input/parameter into our solution to the inverse problem. Second, BSI yields a probabilistic solution to inverse problems that quantifies uncertainty about the input/parameter.

*Key references for BSI*:^{13,38,39}

**Modeling uncertainty**. To model uncertainty about the unknown input/parameter, we treat it as a random variable and model its probability distribution. In this Bayesian view, the probability assigned to each region of input/parameter space reflects our degree of belief that the input/parameter falls in that region. This belief is based on some combination of subjective belief and objective information.^{40,41} Loosely speaking, the spread of the probability density over the input/parameter space reflects our uncertainty about its value, whereas sharp concentration of the density reflects certainty.

**Prior vs posterior distributions**. Our uncertainty about the input/parameter is different before and after we conduct an experiment, collect data, and compare these data with our model. Consequently, the input/parameter has a *prior* density before the data are considered and then an updated, *posterior* density after we are enlightened by the data.

**Modeling the data-generating process.** To acknowledge the unobservable noise that contaminates our measurements, we treat the outcomes of our measurements of the system output (i.e., the data) as realizations of random variables. A model of the stochastic data-generating process follows from combining (i) the mathematical model *f*_{β}(*x*) of the system and (ii) a model of the probability distribution of the noise.

**The two ingredients of BSI: the prior and likelihood**. The two key ingredients for tackling an inverse problem with BSI are as follows:

- The *prior distribution* of the input/parameter, expressing our beliefs about it before the data are collected/observed. The prior we construct depends on the amount of information we have about the input/parameter before the experiment, and it involves a degree of subjectivity and judgment.^{36,40} A prior may be roughly categorized as diffuse, weakly informative, or informative^{39} based on the amount of uncertainty it admits about the input/parameter. An informative prior, e.g., a Gaussian distribution with a small variance, expresses a high degree of certainty about the input/parameter. On the other hand, a diffuse prior, e.g., a uniform distribution, is more appropriate when we lack any prior information about the input/parameter. A diffuse prior adheres to the principle of indifference and spreads its density widely over input/parameter space to express a very high degree of uncertainty.^{39} An informative prior influences the posterior more than a diffuse prior, which “lets the data speak for itself.”^{42} Generally, as we gather more data, the influence of the prior on the posterior tends to weaken as the data override the prior (though not for non-identifiable systems^{26}).
- The *likelihood function* of the input/parameter, giving the probability density of the data conditioned on each value of the input/parameter. The likelihood function is constructed from (i) the data [our observation(s) of the system output(s), paired with inputs for the parameter identification problem] and (ii) our model of the data-generating process, which comprises (a) the forward model *f*_{β} and (b) the probability distribution that models the noise contaminating the data. The likelihood quantifies the support that the data lend to each value of the unknown input/parameter.^{39}

From these two ingredients, we next “turn the Bayesian crank”^{43} to obtain the posterior distribution of the unknown input/parameter.

**The BSI solution to an inverse problem: the posterior**. The BSI “solution” to the inverse problem, the *posterior distribution* of the unknown input/parameter, follows from the prior density and likelihood function of the input/parameter via Bayes’ theorem. The posterior density of the unknown input/parameter gives the probability that the input/parameter falls in any given region of input/parameter space, conditioned on the data. The posterior updates the prior in light of the data, offers a compromise between the prior and likelihood, and constitutes the raw solution to the inverse problem that quantifies uncertainty about the input/parameter through its degree of spread.^{28}

For some combinations of families of likelihood and prior probability distributions [e.g., a Gaussian (conjugate) prior for a Gaussian likelihood density], we obtain a closed-form expression for the posterior distribution. Otherwise, we resort to approximating the posterior distribution (i) as a tractable analytical distribution [e.g., as a Gaussian via the Laplace (quadratic) approximation or some other parameterized distribution via variational inference] or (ii) as an empirical distribution constructed using samples from the posterior obtained from a Markov Chain Monte Carlo (MCMC) method.^{43,44}
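To illustrate option (ii), a bare-bones random-walk Metropolis sampler—a hypothetical sketch for intuition, not a substitute for a mature MCMC library—needs only the unnormalized log-posterior:

```python
import math
import random

def metropolis(log_post, x0, n_samples=10_000, step=0.5, seed=0):
    """Random-walk Metropolis: draw samples from a density known only up to a
    normalizing constant through its log-density, log_post(x)."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        x_new = x + rng.gauss(0.0, step)          # symmetric Gaussian proposal
        lp_new = log_post(x_new)
        if math.log(rng.random()) < lp_new - lp:  # accept w.p. min(1, ratio)
            x, lp = x_new, lp_new
        samples.append(x)                         # keep current state either way
    return samples

# Sanity check on a standard Gaussian "posterior" (log-density up to a constant):
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=20_000, step=1.0)
print(sum(samples) / len(samples))  # close to 0 for a long chain
```

In practice, one monitors convergence diagnostics and discards an initial burn-in portion of the chain.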

We may summarize the posterior distribution of the input/parameter with a *credible region* (e.g., a highest-density region^{45}) that contains some large fraction *α* of the posterior density and thus credibly contains—given the assumptions in the likelihood and prior hold—the input/parameter (precisely, with probability *α*).^{46} Note that a Bayesian credible interval for the parameter/input is distinct from and arguably more intuitive and natural than a frequentist confidence interval.^{47}
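Given posterior samples from MCMC, an equal-tailed credible interval (a simpler alternative to a highest-density region) can be read directly off the sorted samples; a sketch:

```python
def equal_tailed_interval(samples, alpha=0.9):
    """Equal-tailed 100*alpha% credible interval from posterior samples:
    trim (1 - alpha)/2 of the probability mass from each tail."""
    s = sorted(samples)
    n = len(s)
    k = round(0.5 * (1.0 - alpha) * n)  # number of samples to trim per tail
    return s[k], s[n - 1 - k]

# 90% interval for 1000 evenly spaced "samples" on [0, 1):
print(equal_tailed_interval([i / 1000 for i in range(1000)]))  # → (0.05, 0.949)
```

For a unimodal, symmetric posterior the equal-tailed interval and the highest-density region nearly coincide; for skewed or multimodal posteriors they can differ substantially.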

We next present an optional warm-up problem that gives an interpretable, closed-form posterior to illustrate and gain intuition about Bayesian inference.

**Bayesian inference of the pH of a vat of wine.** Suppose we work at a winery and wish to estimate the average pH, *x*, of Pinot Noir in a large, stirred vat. We possess a noisy pH-meter for the task.^{48} We adopt the Bayesian approach and view the pH as a random variable *X* to model our uncertainty about it.

**Prior.** First, we impose a Gaussian prior distribution on *X* based on the history of pH measurements from previous batches of Pinot Noir produced by our winery over many years,

$$\pi_{\text{prior}}(x) = \frac{1}{\sigma_{\text{pr}}\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - x_{\text{pr}}}{\sigma_{\text{pr}}}\right)^2\right], \tag{3}$$

with *x*_{pr} and $\sigma_{\text{pr}}^2$ being the mean and variance of average pH among previous batches. This prior reflects our assumption that we followed closely the fermentation protocols from previous years, sourced our grapes from the same vineyard, employed the same yeast strain for fermentation, etc.

**Probabilistic data-generating model.** We treat the extraction of a sample of wine from the vat and subsequent measurement of its pH, giving an observed pH *x*_{obs}, as a stochastic process. Particularly, we treat *x*_{obs} as a realization of a random variable,

$$X_{\text{obs}} \mid (X = x) = x + E, \tag{4}$$

where $E \sim \mathcal{N}(0, \sigma^2)$ is a normally distributed random variable whose variance *σ*^{2} models the imprecision of the pH-meter and variation in pH among samples from the vat. Assume that *σ* is known from previously observed variance in pH measurements of samples from the same vat (but a different batch).

**Data.** To gather information about the pH *X* of the vat of wine, we take *n* samples of wine from the vat and measure the pH of each sample with our noisy pH-meter, giving data {*x*_{obs,1}, …, *x*_{obs,n}}.

**Likelihood function.** Under our probabilistic data-generating model in Eq. (4), the *likelihood* density—the probability density of the data given that the pH of the wine in the vat is *x* (and, implicitly, given *σ*)—is, assuming independent measurements,

$$\pi_{\text{likelihood}}(x_{\text{obs},1}, \ldots, x_{\text{obs},n} \mid x) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - x_{\text{obs},i}}{\sigma}\right)^2\right]. \tag{5}$$

Since we collected the data, we now see *π*_{likelihood} as a function of *x*, indicating the support the data lend to each value of *x*. Then, with some algebra (including completing the square), the likelihood function

$$\pi_{\text{likelihood}}(x_{\text{obs},1}, \ldots, x_{\text{obs},n} \mid x) \propto \exp\left[-\frac{1}{2}\left(\frac{x - \overline{x_{\text{obs}}}}{\sigma/\sqrt{n}}\right)^2\right], \tag{6}$$

where $\overline{x_{\text{obs}}} \coloneqq n^{-1}\sum_{i=1}^n x_{\text{obs},i}$ is the mean pH among the samples. The likelihood function diminishes as *x* deviates from the sample mean $\overline{x_{\text{obs}}}$, and it diminishes more rapidly as *n* increases.

**Posterior.** The posterior, the Bayesian solution to this task at the winery, is the probability density of the pH of Pinot Noir in the vat *X*, given the data {*x*_{obs,1}, …, *x*_{obs,n}} (and, implicitly, *σ*). It follows from Bayes’ theorem that

$$\pi_{\text{posterior}}(x \mid x_{\text{obs},1}, \ldots, x_{\text{obs},n}) = \frac{\pi_{\text{likelihood}}(x_{\text{obs},1}, \ldots, x_{\text{obs},n} \mid x)\,\pi_{\text{prior}}(x)}{\int_{-\infty}^{\infty} \pi_{\text{likelihood}}(x_{\text{obs},1}, \ldots, x_{\text{obs},n} \mid x')\,\pi_{\text{prior}}(x')\,dx'}. \tag{7}$$

After some algebra (including completing the square; the denominator is a constant we need not handle), we find the posterior is also Gaussian,

$$\pi_{\text{posterior}}(x \mid x_{\text{obs},1}, \ldots, x_{\text{obs},n}) = \frac{1}{\sigma_{\text{post}}\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - x_{\text{post}}}{\sigma_{\text{post}}}\right)^2\right], \tag{8}$$

with an inverse variance, a measure of *concentration* of the Gaussian, of

$$\frac{1}{\sigma_{\text{post}}^2} \coloneqq \frac{1}{\sigma_{\text{pr}}^2} + n\,\frac{1}{\sigma^2} \tag{9}$$

and mean

$$x_{\text{post}} \coloneqq \frac{n/\sigma^2}{n/\sigma^2 + 1/\sigma_{\text{pr}}^2}\,\overline{x_{\text{obs}}} + \left(1 - \frac{n/\sigma^2}{n/\sigma^2 + 1/\sigma_{\text{pr}}^2}\right) x_{\text{pr}}. \tag{10}$$

Intuitively: (1) the concentration of the posterior Gaussian increases (i) as we collect more data (*n* increases) and (ii) if we use a more precise pH-meter (small *σ*), reflecting a reduction in our uncertainty about the pH; (2) the mean of the posterior is a weighted average of the sample mean pH $\overline{x_{\text{obs}}}$ and the prior mean pH *x*_{pr}. With more data and a more precise pH-meter, we weigh the sample mean more than the prior mean. A more informative prior (smaller *σ*_{pr}) means we need more data (larger *n*) to override the prior.^{49–51}

**Example.** Figure 4 illustrates how (top) the diffuseness of the prior influences the posterior and (bottom) more data override the prior and reduce posterior uncertainty.

**Conjugate priors.** Note that the posterior distribution in Eq. (8) belongs to the same family (Gaussian) as the prior distribution. Hence, the Gaussian prior is a *conjugate prior* for the Gaussian likelihood distribution in Eq. (5). Generally, a conjugate prior can simplify Bayesian inference by giving a closed-form expression for the posterior.^{46}
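The conjugate update of Eqs. (9) and (10) is two lines of arithmetic. A sketch with hypothetical numbers (prior pH 3.6 ± 0.1; three measurements with a meter of *σ* = 0.05 are invented for illustration):

```python
import math

def gaussian_posterior(x_pr, sigma_pr, sigma, data):
    """Conjugate Gaussian update, Eqs. (9) and (10):
    posterior precision = prior precision + n/sigma^2;
    posterior mean = precision-weighted average of sample and prior means."""
    n = len(data)
    x_bar = sum(data) / n
    precision = 1 / sigma_pr**2 + n / sigma**2  # Eq. (9): 1/sigma_post^2
    w = (n / sigma**2) / precision              # weight on the sample mean
    x_post = w * x_bar + (1 - w) * x_pr         # Eq. (10)
    return x_post, math.sqrt(1 / precision)

# Hypothetical: prior 3.6 +/- 0.1; three measurements with sigma = 0.05.
x_post, sigma_post = gaussian_posterior(3.6, 0.1, 0.05, [3.71, 3.68, 3.74])
print(x_post, sigma_post)
```

As expected, the posterior mean lands between the prior mean (3.6) and the sample mean (3.71), pulled strongly toward the data because the meter is more precise than the prior.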

### E. Our contribution: a tutorial of BSI, illustrated on inverse problems of heat transfer

We provide a tutorial of BSI to solve inverse problems of both parameter identification and reconstruction while incorporating prior information and quantifying uncertainty. Our Tutorial is by way of examples regarding a simple, intuitive physical process: convective heat transfer from ambient air to a cold lime fruit in our kitchen. We aim to engage a wide range of scientists and engineers.

In Sec. II A, we describe our experimental setup: (i) we take a cold lime fruit out of a refrigerator and allow it to exchange heat with indoor air via natural convection; (ii) a probe inserted into the lime measures its internal temperature. In Sec. II B, we pose a mathematical model governing the time-evolution of the lime temperature based on Newton’s “law” of cooling. The model contains a single parameter. To describe the data-generating process, in Sec. II C, we augment this model with a probabilistic model of taking a noisy measurement of the lime temperature with the imperfect temperature probe. Next, in Sec. II D, we employ BSI to infer the parameter in the dynamic model of the lime temperature using time series data of the lime temperature (an overdetermined inverse problem). We then employ the model of the lime temperature with the inferred parameter to tackle two time reversal problems: reconstruct the initial temperature of the lime, given a measurement of its current temperature and the time it has been outside of the refrigerator (Sec. II E, ill-conditioned), and reconstruct the initial temperature of the lime and the duration it has been out of the refrigerator, given a measurement of its current temperature (Sec. II F, underdetermined). The solution to each inverse problem under BSI expresses uncertainty through the posterior probability density of the parameter/initial condition. We assess the fidelity of the solution to the reconstruction problems by comparing with the measured initial condition held out from the BSI procedure. From these concrete, instructive examples, we intend for readers to recognize inverse problems in their research domain where BSI may be employed to incorporate prior information and quantify uncertainty.

## II. A TUTORIAL OF BAYESIAN STATISTICAL INVERSION (BSI)

As a tutorial of BSI to solve inverse problems while incorporating prior information and quantifying uncertainty, we tackle a variety of inverse problems pertaining to heat exchange between ambient air and a lime fruit via natural convection.

### A. Experimental setup: heat exchange between a lime and the air

We allowed a lime fruit (∼5 cm diameter) to reside in a refrigerator for several days. Then, at time *t* ≔ *t*_{0} (min), we removed the lime from the refrigerator, placed it on a thin slab of insulating styrofoam, and allowed it to exchange heat with the indoor air via natural convection. An electrical-resistance-based temperature sensor inserted into the lime allows us to measure its internal temperature at any given time *t* to generate a data point (*t*, *θ*_{obs}) (“obs” = observed). [To avoid the early-time data reflecting the (short) time dynamics of the temperature probe coming into thermal equilibrium with the lime, owing to its nonzero thermal mass, we inserted the probe into the lime *before* we placed it in the refrigerator, so it begins thermally equilibrated with the lime.] Figure 5 shows our experimental setup, and Sec. S1 explains our temperature sensor and Arduino microcontroller setup.

### B. The mathematical model of the lime temperature

We develop a mathematical model governing *θ* = *θ*(*t*) (°C), the temperature of the lime as a function of time *t* ≥ *t*_{0} (min), as it exchanges heat with the bath of ambient air at bulk (i.e., far from the lime) temperature *θ*^{air} (°C). The mechanism of lime-air heat transfer is natural convection, which is the combined effect^{52} of both heat (i) conduction and (ii) advection, i.e., via motion of the air adjacent to the lime, driven by buoyant forces arising from spatial density-gradients in the air, caused by temperature-gradients.^{53} The initial temperature of the lime is *θ*(*t*_{0}) =: *θ*_{0}.

**Assumptions.** We make several simplifying assumptions:

- The temperature of the lime is spatially uniform. [We estimate the Biot number^{52} for this system, Bi ≔ *hr*/*k* ≈ 0.6, based on (i) our measurement of the radius of the lime, *r* ≈ 2.5 cm, (ii) the reported thermal conductivity of a lime, *k* ≈ 0.595 J/(s m °C),^{54} and (iii) an estimated heat transfer coefficient for natural convection through a gas, *h* ≈ 15 J/(s m^{2} °C).^{52,55}]
- The bulk temperature of the air (a bath) *θ*^{air} is constant (sufficiently far from the lime).
- Heat conduction between the lime and the styrofoam surface on which it sits is negligible.
- The mass of the lime is constant (e.g., a negligible loss of moisture over time).
- The heat capacity of the lime is constant with temperature.
- Heat released/consumed due to chemical reactions (e.g., oxidation) is negligible.
- The temperature probe inserted into the lime has a negligible thermal mass.
- The rate of heat exchange between the air and the lime is governed by Newton’s “law” of cooling.

**Newton’s “law” of cooling**^{56–59} prescribes the rate of heat transfer (J/min) from the air to the lime at time *t* ≥ *t*_{0} as proportional to the difference in temperature between the lime and the air (the thermodynamic driving force for heat transfer) at that time,

$$hA\,[\theta^{\text{air}} - \theta(t)]. \tag{11}$$

This rate depends on two (assumed, constant) parameters: (i) the surface area of the lime in contact with the air, *A* (cm^{2}), and (ii) the (natural) convective heat transfer coefficient between the air and the surface of the lime, *h* [J/(°C min cm^{2})].^{52}

**Differential equation model**. Conservation of energy applied to the lime gives its change in temperature d*θ* = d*θ*(*t*) over a differential change in time d*t*,

$$C\,d\theta = hA\,[\theta^{\text{air}} - \theta(t)]\,dt, \tag{12}$$

with *C* (J/°C) the thermal mass of the lime. Equation (12) balances the amount (J) of sensible heat stored in the lime (left) and the amount (J) of heat transferred to the lime (right) over the time interval [*t*, *t* + d*t*). Thus, a first-order differential equation describes the time-evolution of the lime temperature,^{56,57}

$$\frac{d\theta}{dt} = \frac{1}{\lambda}\,[\theta^{\text{air}} - \theta(t)], \quad t \geq t_0, \qquad \theta(t_0) = \theta_0. \tag{13}$$

The parameter *λ* ≔ *C*/(*hA*) (min) of the model is a time constant governing the dynamics of heat transfer between the lime and the air.

**Analytical solution to the model**. Equation (13) admits an analytical solution through a variable transformation and then integration,

$$\theta(t) = \theta^{\text{air}} + (\theta_0 - \theta^{\text{air}})\,e^{-(t - t_0)/\lambda}, \quad t \geq t_0. \tag{14}$$

From its initial temperature *θ*_{0}, the lime temperature *θ*(*t*) monotonically approaches the air temperature [lim_{t→∞} *θ*(*t*) = *θ*^{air}] as the temperature difference between the lime and the air decays exponentially. The parameter *λ* is a time scale for the lime to reach thermal equilibrium with the air. Specifically, after a duration *t* − *t*_{0} = *λ* out of the refrigerator, the difference between the air temperature and the lime temperature is *e*^{−1} ≈ 37% of the initial difference.

Figure 6 shows the solution to the model of the lime temperature, which we write as *θ*(*t*; *λ*, *t*_{0}, *θ*_{0}, *θ*^{air}) to emphasize its dependence on the parameter *λ*, the initial condition (*t*_{0}, *θ*_{0}), and the air temperature *θ*^{air}.
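Equation (14) translates directly into code; a sketch (the parameter values in the example call are illustrative placeholders, not measurements):

```python
import math

def lime_temperature(t, lam, t0, theta0, theta_air):
    """Eq. (14): theta(t) = theta_air + (theta0 - theta_air)*exp(-(t - t0)/lam),
    for t >= t0, i.e., exponential relaxation toward the air temperature."""
    return theta_air + (theta0 - theta_air) * math.exp(-(t - t0) / lam)

# After one time constant (t - t0 = lam), the lime-air temperature difference
# is e^-1 (about 37%) of its initial value.
theta = lime_temperature(t=60.0, lam=60.0, t0=0.0, theta0=2.0, theta_air=20.0)
print(theta)  # 20 - 18/e ≈ 13.38 °C
```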

### C. The probabilistic model of the data-generating process

Consider the process of employing our temperature probe, an imperfect measurement instrument, to measure the lime temperature at time *t*, giving a data point (*t*, *θ*_{obs}).

We treat the measured temperature *θ*_{obs} as a realization of a random variable Θ_{obs} owing to two sources of stochasticity. First, *measurement noise*: unobservable noise originating from the temperature probe corrupts the measurement. Second, *residual variability*: under repeated experiments with identical conditions (*t*_{0}, *θ*_{0}, *θ*^{air}), the observed lime temperature at time *t* may exhibit variance due to variable conditions/inputs that are poorly controlled or unrecognized and thus unaccounted for in our model for *θ*(*t*).^{21} For example, (i) the air temperature *θ*^{air} is not perfectly controlled and may fluctuate over time and (ii) the opening and closing of doors in the building may introduce fluctuating air currents in the room, making the heat transfer coefficient *h* fluctuate over time.

We model the noise *E*_{σ} as additive to the model prediction, independent among repeated measurements, and having an identical distribution over time. Then, our probabilistic model of the data-generating process is

$$\Theta_{\text{obs}} = \theta(t; \lambda, t_0, \theta_0, \theta^{\text{air}}) + E_\sigma, \tag{15}$$

where the noise *E*_{σ} is a zero-centered Gaussian with variance *σ*^{2},

$$E_\sigma \sim \mathcal{N}(0, \sigma^2). \tag{16}$$

The random variable *E*_{σ} can capture noise emanating from both the measurement instrument and residual variability. However, Eq. (15) assumes that the mean measured lime temperature at time *t* over repeated experiments with the same conditions (*t*_{0}, *θ*_{0}, *θ*^{air}) is given by the model *θ*(*t*; *λ*, *t*_{0}, *θ*_{0}, *θ*^{air}). That is, our data-generating model neglects the possibility of *model discrepancy* (with physical reality), a nonzero difference between (a) the expected measured lime temperature at time *t* over multiple experiments with the same conditions (*t*_{0}, *θ*_{0}, *θ*^{air}) and (b) the model prediction of the lime temperature, *θ*(*t*; *λ*, *t*_{0}, *θ*_{0}, *θ*^{air}).^{21}
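A sketch of simulating the data-generating process of Eq. (15)—model prediction plus Gaussian noise—with illustrative (not measured) conditions:

```python
import math
import random

def lime_temperature(t, lam, t0, theta0, theta_air):
    """Eq. (14): exponential relaxation of the lime toward the air temperature."""
    return theta_air + (theta0 - theta_air) * math.exp(-(t - t0) / lam)

def simulate_measurement(t, lam, t0, theta0, theta_air, sigma, rng):
    """Eq. (15): theta_obs = theta(t; lam, t0, theta0, theta_air) + E_sigma,
    with E_sigma ~ N(0, sigma^2) modeling probe noise and residual variability."""
    return lime_temperature(t, lam, t0, theta0, theta_air) + rng.gauss(0.0, sigma)

rng = random.Random(1)
# Illustrative conditions: lam = 60 min, theta0 = 2 C, theta_air = 20 C, sigma = 0.2 C.
data = [(t, simulate_measurement(t, 60.0, 0.0, 2.0, 20.0, 0.2, rng))
        for t in range(0, 181, 20)]
print(data)
```

Fitting *λ* to synthetic data generated this way, where the true parameter is known, is a useful check of an inference pipeline before applying it to real measurements.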

Consequently, the probability density of the measured lime temperature Θ_{obs} at time *t*, given (i.e., conditioned on) *λ*, *t*_{0}, *θ*_{0}, *θ*^{air}, and *σ*, is obtained by translating the density of the noise in Eq. (16) to center it at the model prediction,

$$\pi(\theta_{\text{obs}} \mid \lambda, t_0, \theta_0, \theta^{\text{air}}, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\theta_{\text{obs}} - \theta(t; \lambda, t_0, \theta_0, \theta^{\text{air}})}{\sigma}\right)^2\right]. \tag{17}$$

### D. Inverse problem I: Parameter identification

**Overview of problem and approach**

**Task**: employ BSI to infer the parameter Λ (i.e., *λ* treated as a random variable) characterizing the lime, appearing in the model of the lime temperature in Eq. (14), using data from a heat transfer experiment.

First, we estimate *λ* with a back-of-the-envelope calculation and use this estimate to construct a weakly informative prior distribution of Λ.

To admit our uncertainty about the variance *σ*^{2} of the measurement noise in our model of the data-generating process in Eq. (15), we also treat it as a random variable Σ^{2} to be inferred from the data. We impose a diffuse prior distribution on Σ^{2}, with its support informed by the noise characteristics of the temperature probe.

Next, we set up a heat transfer experiment with the lime (see Sec. II A). We take two measurements of the conditions of the experiment: (i) the initial temperature of the lime, giving datum (*t*_{0} = 0, *θ*_{0,obs}), and (ii) the air temperature, giving datum $\theta_{\text{obs}}^{\text{air}}$. To admit these are noisy measurements, we treat the initial temperature of the lime Θ_{0} and air temperature Θ^{air} as random variables to be inferred from the data, but impose informative prior distributions on them based on these measurements.

We then measure the lime temperature at different times over the course of the heat transfer experiment, giving time series data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^N$ that provide information about Λ.

Finally, we use Bayes’ theorem to construct the posterior distribution of (Λ, Θ_{0}, Θ^{air}, Σ) in light of the data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^N$, sample from it using a Markov chain Monte Carlo algorithm, and obtain a credible interval for the parameter Λ that quantifies posterior uncertainty about its value.
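Schematically, the unnormalized log-posterior that the MCMC algorithm explores is the sum of the log-likelihood of the time series and the log-priors. A sketch in which the prior means and spreads are placeholders, not the values used in our analysis (a real analysis would build the priors of Sec. II D 2 into a probabilistic-programming library):

```python
import math

def lime_temperature(t, lam, t0, theta0, theta_air):
    """Eq. (14): exponential relaxation of the lime toward the air temperature."""
    return theta_air + (theta0 - theta_air) * math.exp(-(t - t0) / lam)

def log_gauss(x, mu, sigma):
    """Log-density of a Gaussian N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_posterior(lam, theta0, theta_air, sigma, data, theta0_obs, theta_air_obs):
    """Unnormalized log-posterior of (lam, theta0, theta_air, sigma) given time
    series data [(t_i, theta_i_obs), ...]. Placeholder priors: truncated Gaussian
    on lam; informative Gaussians on theta0 and theta_air centered at their
    measurements; flat (diffuse) prior on sigma > 0."""
    if lam <= 0.0 or sigma <= 0.0:                  # truncation: enforce positivity
        return -math.inf
    lp = log_gauss(lam, 60.0, 20.0)                 # placeholder prior on lam (min)
    lp += log_gauss(theta0, theta0_obs, 0.5)        # prior from measured theta0
    lp += log_gauss(theta_air, theta_air_obs, 0.5)  # prior from measured theta_air
    for t, obs in data:                             # likelihood, Eq. (15)
        lp += log_gauss(obs, lime_temperature(t, lam, 0.0, theta0, theta_air), sigma)
    return lp

lp = log_posterior(60.0, 2.0, 20.0, 0.5, [(0.0, 2.1), (60.0, 13.2)], 2.1, 19.8)
print(lp)
```

Passing this function to any MCMC sampler (one parameter at a time, or jointly with a multivariate proposal) yields draws from the posterior of (Λ, Θ_{0}, Θ^{air}, Σ).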

**Quick overview**:

- *measurements of the experimental conditions*: the measured initial (*t* = *t*_{0} = 0) temperature of the lime, *θ*_{0,obs}, and the air temperature, $\theta_{\text{obs}}^{\text{air}}$;
- *data collected during the experiment*: the lime temperature time series data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^N$;
- *random variables to infer from the data*: the parameter Λ, the initial lime temperature Θ_{0}, the air temperature Θ^{air}, and the variance of the measurement noise Σ^{2};
- *sources of priors*: Λ: back-of-the-envelope calculation; Θ_{0}, Θ^{air}: our noisy measurements of them; Σ^{2}: precision of the temperature sensor.

**Summary of results:** See Fig. 7.

Classically, this inverse problem is overdetermined because a parameter *λ* giving a model *θ*(*t*; *λ*, *t*_{0}, *θ*_{0}, *θ*^{air}) that passes through all data points $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ does not exist. [Note that we do not attempt to determine each *C*, *h*, and *A* in Eq. (12) from the data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ but rather the lumped parameter *λ* ≔ *C*/(*hA*). The former attempt would make the system unidentifiable because multiple distinct (*C*, *h*, *A*) give the same *λ* and hence produce the same model *θ*(*t*).]

#### 1. The experimental setup

To characterize the setup of the lime heat transfer experiment, we use the temperature probe to measure the air temperature, giving datum $\theta^{\text{air}}_{\text{obs}}$, and the initial lime temperature at *t* = *t*_{0} ≔ 0, giving datum *θ*_{0,obs}. These data are plotted in Fig. 7(a) as the horizontal dashed line and first point, respectively.

#### 2. The prior distributions

Before the data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ are considered, in BSI, we must express our prior beliefs and information about the value of each variable via a prior probability density function.

**The parameter,** Λ. A back-of-the-envelope estimate of *λ* = *C*/(*hA*) is ≈ 1 h based on the following. The diameter of the lime, approximated as a sphere, is ∼5 cm. The mass of the lime is ∼100 g. The specific heat of a lime is reported as ∼4.0 kJ/(kg °C).^{54,60} A typical heat transfer coefficient *h* for natural convection via gas is 15 J/(s m^{2} °C).^{52,55}

We take the prior density *π*_{prior}(*λ*) as that of a Gaussian distribution (i) centered at our back-of-the-envelope estimate of *λ*, (ii) with a high variance to reflect our low confidence in this rough estimate, and (iii) truncated at zero to enforce positivity.
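As a sanity check on the arithmetic, the back-of-the-envelope estimate can be reproduced in a few lines (a minimal Python sketch using the rough values quoted above; the sphere surface area *A* = π*d*^{2} is an approximation):

```python
import math

# rough values quoted above (all approximate)
m = 0.1     # mass of the lime, kg
c = 4000.0  # specific heat, J/(kg °C)
d = 0.05    # diameter of the lime (approximated as a sphere), m
h = 15.0    # heat transfer coefficient for natural convection via gas, J/(s m^2 °C)

C = m * c             # lumped thermal mass C, J/°C
A = math.pi * d ** 2  # surface area of a sphere of diameter d, m^2
lam = C / (h * A)     # lumped parameter λ = C/(hA), s

print(lam / 3600.0)   # ≈ 0.94, i.e., λ ≈ 1 h
```

The estimate is rough by design; the high-variance prior below admits this.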

**The experimental conditions,** Θ_{0} **and** Θ^{air}. We impose informative prior distributions on the initial lime temperature Θ_{0} and air temperature Θ^{air} based on our (noisy) measurements of them.

**The variance of the measurement noise,** Σ^{2}. Our prior distribution of the standard deviation Σ of the measurement noise reflects our beliefs about the precision of the temperature probe.

**The joint prior distribution**. The joint prior distribution of all of the unknowns for this inverse problem factorizes since we imposed independent priors, corresponding to plausible assumptions, including that the parameter *λ* of the lime has no causal link with the air temperature:

$$\pi_{\text{prior}}(\lambda, \theta_0, \theta^{\text{air}}, \sigma) = \pi_{\text{prior}}(\lambda)\, \pi_{\text{prior}}(\theta_0)\, \pi_{\text{prior}}(\theta^{\text{air}})\, \pi_{\text{prior}}(\sigma).$$

The joint prior *π*_{prior}(*λ*, *θ*_{0}, *θ*^{air}, *σ*) summarizes the information and beliefs we have about the unknowns (*λ*, *θ*_{0}, *θ*^{air}, *σ*) at this stage, before the data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ are considered.
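A minimal sketch of how such a factorized prior could be coded (plain Python for illustration; the hyperparameters below are hypothetical placeholders, not the tutorial's actual values, and the truncated densities are left unnormalized, which suffices for MCMC):

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# hypothetical hyperparameters, for illustration only
lam_est = 1.0         # back-of-the-envelope estimate of λ, h
theta0_obs = 6.5      # measured initial lime temperature, °C (illustrative)
theta_air_obs = 18.0  # measured air temperature, °C (illustrative)
probe_sd = 0.5        # rough precision of the temperature probe, °C

def prior_lam(lam):
    # Gaussian centered at the rough estimate, high variance, truncated at zero
    return normal_pdf(lam, lam_est, 0.5) if lam > 0 else 0.0

def prior_theta0(theta0):
    # informative: centered at the noisy measurement
    return normal_pdf(theta0, theta0_obs, probe_sd)

def prior_theta_air(theta_air):
    return normal_pdf(theta_air, theta_air_obs, probe_sd)

def prior_sigma(sigma):
    # diffuse, heavy-tailed prior on the noise standard deviation, truncated at zero
    return 1.0 / (1.0 + sigma ** 2) if sigma > 0 else 0.0

def joint_prior(lam, theta0, theta_air, sigma):
    # independence => the joint prior factorizes into a product of marginals
    return (prior_lam(lam) * prior_theta0(theta0)
            * prior_theta_air(theta_air) * prior_sigma(sigma))
```

The truncations encode the hard constraints *λ* > 0 and *σ* > 0: the joint prior assigns zero density to any point violating them.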

#### 3. The data and likelihood function

**The data**. We measure and record the lime temperature over the course of the heat transfer experiment, giving the time series data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ displayed in Fig. 7(a).

**The likelihood function.** The likelihood function gives the probability density of the data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ conditioned on each possible value of the parameters Λ = *λ* and Σ = *σ* and experimental conditions Θ_{0} = *θ*_{0} and Θ^{air} = *θ*^{air}. We construct the likelihood from the (i) data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ and (ii) model of the data-generating process in Eq. (17). Because we model the noise corrupting each measurement as an independent random variable, the likelihood function is a product of Gaussian densities, each centered at the model prediction of the lime temperature at the corresponding measurement time,

$$\pi_{\text{likelihood}}(\theta_{1,\text{obs}}, \ldots, \theta_{N,\text{obs}} \mid \lambda, \theta_0, \theta^{\text{air}}, \sigma) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{[\theta_{i,\text{obs}} - \theta(t_i; \lambda, t_0, \theta_0, \theta^{\text{air}})]^2}{2\sigma^2}\right).$$

Note, inherently, that the likelihood is conditioned on the model *structure* [i.e., the functional form/shape for *θ*(*t*) in Eq. (14)] as well.

Since we possess the data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ at this stage, we (i) view the likelihood as a function of the unknowns (*λ*, *θ*_{0}, *θ*^{air}, *σ*) and (ii) interpret it as a measure of the support the data lend to each value of the unknowns (*λ*, *θ*_{0}, *θ*^{air}, *σ*).^{39}
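In practice, one works with the log-likelihood, a sum over data points. A sketch (Python; it assumes the first-order cooling model *θ*(*t*) = *θ*^{air} + (*θ*_{0} − *θ*^{air})e^{−(*t*−*t*_{0})/*λ*} for the lime temperature, and the data values below are made up for illustration):

```python
import math

def lime_temp(t, lam, t0, theta0, theta_air):
    """First-order (Newton's law of cooling) model of the lime temperature."""
    return theta_air + (theta0 - theta_air) * math.exp(-(t - t0) / lam)

def log_likelihood(data, lam, t0, theta0, theta_air, sigma):
    """Log-density of the time series data: each measurement's noise is an
    independent Gaussian with variance sigma^2, so the log-likelihood is a
    sum over the data points (the log is taken for numerical stability)."""
    ll = 0.0
    for t_i, theta_obs_i in data:
        resid = theta_obs_i - lime_temp(t_i, lam, t0, theta0, theta_air)
        ll += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return ll

# made-up data for illustration: (t_i in h, θ_i,obs in °C)
data = [(0.0, 6.5), (0.5, 11.0), (1.0, 13.8), (2.0, 16.5)]
print(log_likelihood(data, lam=1.0, t0=0.0, theta0=6.5, theta_air=18.0, sigma=0.3))
```

Viewed as a function of the unknowns with the data fixed, parameter values whose model trajectory tracks the data closely score a higher log-likelihood.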

#### 4. The posterior distribution

The *posterior density* governs the probability distribution of the unknowns (Λ, Θ_{0}, Θ^{air}, Σ) conditioned on the time series data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$. By Bayes’ theorem,^{46} the posterior density is proportional to the product of the likelihood function and prior density,

$$\pi_{\text{posterior}}(\lambda, \theta_0, \theta^{\text{air}}, \sigma \mid \theta_{1,\text{obs}}, \ldots, \theta_{N,\text{obs}}) \propto \pi_{\text{likelihood}}(\theta_{1,\text{obs}}, \ldots, \theta_{N,\text{obs}} \mid \lambda, \theta_0, \theta^{\text{air}}, \sigma)\, \pi_{\text{prior}}(\lambda, \theta_0, \theta^{\text{air}}, \sigma). \tag{24}$$

The omitted constant of proportionality is not needed for MCMC sampling, which relies only on *ratios* of posterior densities.

We are particularly interested in the posterior distribution of the parameter Λ, with (Θ_{0}, Θ^{air}, Σ) marginalized out.

The posterior density of the unknowns *π*_{posterior}(*λ*, *θ*_{0}, *θ*^{air}, *σ* ∣ *θ*_{1,obs}, …, *θ*_{N,obs})

- is the raw solution to this parameter identification problem, as it assigns a probability to each region of (*λ*, *θ*_{0}, *θ*^{air}, *σ*)-space to reflect our posterior degree of belief that the unknowns (*λ*, *θ*_{0}, *θ*^{air}, *σ*) fall in that region;
- represents an *update* to our prior density *π*_{prior}(*λ*, *θ*_{0}, *θ*^{air}, *σ*) in light of the data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$; and
- reflects a compromise between (i) our prior knowledge and beliefs about (*λ*, *θ*_{0}, *θ*^{air}, *σ*) and (ii) the support the data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ lend to (*λ*, *θ*_{0}, *θ*^{air}, *σ*), according to our model of the data-generating process.

**Remark on sources of posterior uncertainty about** **Λ**. Uncertainty about Λ remains even after the data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ are considered. Sources of this posterior uncertainty are a lack of data (small *N*) coupled with two sources of noise captured through our model of the data-generating process in Eq. (15):^{21} (i) measurements of the lime temperature being corrupted by (unobservable) noise from using an imperfect temperature sensor and (ii) unrecognized and/or poorly controlled inputs/conditions that influence the lime temperature *over the course of the experiment* and result in white noise. However, our posterior distribution does *not* capture uncertainty due to (i) residual variability: unrecognized and/or poorly controlled inputs/conditions that influence the lime temperature and vary *from experiment to experiment* (because we infer Λ using data from only a single experiment), or (ii) model inadequacy: when *θ*(*t*) does not faithfully predict the expected [over many experiments under the same conditions (*t*_{0}, *θ*_{0}, *θ*^{air})] measured lime temperature [perhaps, in part, owing to (i)], which would introduce bias and violate the white noise assumption in Eq. (15).

#### 5. Sampling from the posterior

We employ a *Markov Chain Monte Carlo* (MCMC) algorithm, the No-U-Turn Sampler (NUTS)^{61} implemented in Turing.jl^{62} in Julia,^{63} to obtain samples from the joint posterior distribution in Eq. (24) in order to (i) approximate it with an empirical distribution using kernel density estimation and (ii) compute means and credible intervals of the unknowns. Over four independent chains, we collect 2500 samples/chain, with the first half discarded for warm-up.

**Markov chain Monte Carlo (MCMC) sampling from the posterior distribution**

**The computational challenge of computing the posterior.** In a general setting for parameter inference via BSI, let

- $U \in \mathbb{R}^K$ be the random vector of the *K* unknown parameters in the inverse problem; and
- $d \in \mathbb{R}^N$ be the data vector of noisy measurements/observations.

The posterior density of *U* follows from Bayes’ theorem,^{46}

$$\pi(u) \coloneqq \pi_{\text{posterior}}(u \mid d) = \frac{\pi_{\text{likelihood}}(d \mid u)\, \pi_{\text{prior}}(u)}{\pi_{\text{evidence}}(d)}. \tag{25}$$

The denominator, *the evidence*, is the probability density of the data—a constant that serves as a normalizing factor for the numerator,

$$\pi_{\text{evidence}}(d) = \int \pi_{\text{likelihood}}(d \mid u)\, \pi_{\text{prior}}(u)\, du. \tag{26}$$

Typically, we cannot analytically evaluate this *K*-dimensional integral. If *K* is large, it may be intractable to approximate $\pi_{\text{evidence}}(d)$ with numerical cubature as well. The same difficulty may arise for the integral to (i) compute the mean of the posterior or (ii) marginalize out a subset of the unknowns we are less concerned with.^{43}

**Markov chain Monte Carlo (MCMC) simulation.** MCMC methods^{44} permit us to obtain samples *u*_{1}, …, *u*_{n} from the posterior density *π*(*u*) with access only to evaluations of a function to which the posterior density is *proportional* [as in Eq. (24)]. This circumvents the need to compute *π*_{evidence}(*d*). From the samples *u*_{1}, …, *u*_{n}, we can (i) construct an empirical posterior distribution using kernel density estimation^{64} and (ii) approximate (a) the mean of the posterior from the mean of the samples and (b) an equal-tailed credible interval of any unknown from the percentiles of its samples. The approximation to the posterior improves as more samples are collected. Note that we can construct the empirical *marginal* posterior distribution of a subset of unknowns trivially, by ignoring the remaining dimensions of the sampled vectors *u*_{1}, …, *u*_{n}.

The idea behind an MCMC method is to (i) construct a Markov chain *U*_{1}, *U*_{2}, … whose (a) state space is the parameter space and (b) transition kernel *π*_{t}(*u*′ ∣ *u*), governing the probability of transitioning from one state *u* to another *u*′, endows the chain with a stationary distribution equal to the posterior distribution *π*(*u*) and then (ii) simulate the Markov chain to obtain a realization *u*_{1}, …, *u*_{n}, regarded as (autocorrelated) samples from *π*(*u*).^{39,43,65,66}

**For example, random walk Metropolis.** Perhaps the simplest MCMC simulation algorithm to understand is random walk Metropolis.^{43,65} Here, a realization *u*_{1}, …, *u*_{n} of a Markov chain *U*_{1}, …, *U*_{n} is obtained by iterating a stochastic process of “propose then accept-or-reject” *n* times. We initiate the walk at some state *u*_{1} in a region with reasonably high posterior density. Within the iterations, suppose *u* is the current state of the chain. We *propose* to move to a new state *u*′, chosen randomly according to an isotropic random walk starting at *u*. That is, the proposed new state is a random variable *U*′ = *u* + Δ*U*, where $\Delta U \sim \mathcal{N}(0, \sigma^2 I)$. We *accept* this proposed state transition with probability

$$\min\left(1, \frac{\pi(u')}{\pi(u)}\right) \tag{27}$$

and *reject* it otherwise, staying at *u*. This rule (i) always accepts proposed moves “uphill” to a state *u*′ with higher density than *u* and (ii) occasionally accepts moves “downhill.” Note that the rule requires only the *ratio* of the densities of the two states; hence, the normalization factor *π*_{evidence}(*d*) cancels and is not needed. Together, this proposal distribution and acceptance rule specify a transition kernel *π*_{t}(*u*′ ∣ *u*) that grants the Markov chain *U*_{1}, *U*_{2}, … a stationary density equal to *π*(*u*). Consequently, *u*_{1}, …, *u*_{n} are autocorrelated samples from the posterior *π*(*u*).^{43} The scale parameter *σ* of the random walk proposal distribution dictates the efficiency of the sampling, in terms of the amount of serial correlation in the samples. If *σ* is too small, too many proposed random walk steps are required to explore the state space. If *σ* is too large, too many proposals visit regions with low density and are rejected, making the walker stay in place. Both extremes make the sampler inefficient.^{65,67} NUTS,^{61} an extension of Hamiltonian Monte Carlo (HMC),^{68} tends to be a more efficient MCMC sampler than random walk Metropolis owing to its more judicious state-transition proposal scheme.

**Warm-up, stopping rules, convergence diagnostics.** If it is too difficult to find a high-density region of the posterior in which to initiate the Markov chain with state *u*_{1}, some first fraction of the *n* MCMC samples is typically discarded (i.e., not counted as samples from the posterior) as “warm-up” to allow the walker to find a region of state space with high posterior density.^{39,66} Stopping rules and empirical convergence diagnostics may ascertain, or at least give us confidence, that we have run the Markov chain sufficiently long (i.e., that *n* is large enough) for the sampled chain *u*_{1}, …, *u*_{n} to give a good approximation of the posterior. Moreover, they allow us to avoid wastefully running the chain for longer than needed for our desired accuracy. An example of a stopping rule is to terminate when the *effective* sample size exceeds a pre-specified number; the effective sample size is the number of *independent* samples with the same estimation power as the *autocorrelated* samples from the chain, *u*_{1}, …, *u*_{n}. An example of an empirical convergence diagnostic is a *trace plot* of a coordinate of the sampled chain vs the iteration number, which visualizes the efficiency of the Markov chain. The trace plot should resemble a horizontal, approximately uniform-thickness “hairy caterpillar,” reflecting low autocorrelation between the samples, thorough exploration of the support of the posterior, and sufficient warm-up. Another convergence check is consistency between the empirical posterior distributions from independent chains (perhaps) run in parallel.^{66,69,70}
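To make the “propose then accept-or-reject” iteration concrete, here is a minimal random walk Metropolis sampler (a toy Python sketch targeting a one-dimensional standard Gaussian known only up to its normalizing constant; the tutorial itself uses NUTS via Turing.jl, not this sampler):

```python
import math
import random

def rw_metropolis(log_unnorm_density, u1, proposal_scale, n, seed=0):
    """Random walk Metropolis: n (autocorrelated) samples from the density
    proportional to exp(log_unnorm_density) -- no normalizing constant needed."""
    rng = random.Random(seed)
    u = u1
    log_pi_u = log_unnorm_density(u)
    samples = []
    for _ in range(n):
        u_prop = u + rng.gauss(0.0, proposal_scale)      # propose: isotropic random walk
        log_pi_prop = log_unnorm_density(u_prop)
        # accept with probability min(1, π(u')/π(u)); the evidence cancels in the ratio
        accept_prob = math.exp(min(0.0, log_pi_prop - log_pi_u))
        if rng.random() < accept_prob:
            u, log_pi_u = u_prop, log_pi_prop            # accept: move to u'
        samples.append(u)                                # reject: stay at u
    return samples

# toy target: standard Gaussian, unnormalized (the 1/sqrt(2π) factor is dropped)
samples = rw_metropolis(lambda u: -0.5 * u * u, u1=0.0, proposal_scale=1.0, n=20000)
kept = samples[len(samples) // 2:]   # discard the first half as warm-up
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(mean, var)                     # should be near 0 and 1
```

Working with log-densities, as here, avoids numerical underflow when the (unnormalized) density values are tiny.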

#### 6. Summary of results

Figure 7(a) shows all data from the heat transfer experiment that we use to infer the parameter Λ of the lime with BSI.

Figure 7(b) compares (i) the prior distribution of the parameter Λ and (ii) its updated, (marginal) empirical posterior distribution constructed via kernel density estimation. The bar shows the *equal-tailed 90% posterior credible interval* for Λ, [0.94 h, 1.04 h]. By definition, the true parameter *λ* of the lime is situated in this interval with 90% probability, falls below it with 5% probability, and falls above it with 5% probability. The width of the interval, then, reflects our posterior uncertainty about Λ. This interpretation is predicated upon our model of the data-generating process and prior assumptions holding.
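An equal-tailed 90% credible interval like the one reported here is read off the 5th and 95th percentiles of the posterior samples. A minimal sketch (Python; the synthetic “MCMC samples” below merely stand in for real sampler output):

```python
import random

def equal_tailed_interval(samples, level=0.90):
    """Equal-tailed credible interval from the percentiles of posterior samples."""
    s = sorted(samples)
    tail = (1.0 - level) / 2.0                 # e.g., 5% in each tail for a 90% interval
    lo = s[int(tail * (len(s) - 1))]
    hi = s[int((1.0 - tail) * (len(s) - 1))]
    return lo, hi

# synthetic stand-in for MCMC samples of Λ (in h); illustrative only
rng = random.Random(1)
lam_samples = [rng.gauss(0.98, 0.03) for _ in range(10000)]
lo, hi = equal_tailed_interval(lam_samples, level=0.90)
print(round(lo, 2), round(hi, 2))
```

By construction, 5% of the posterior samples fall below the interval and 5% above it, matching the equal-tailed definition in the text.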

Figure 7(c) illustrates the posterior distribution over functions *θ*(*t*) modeling the lime temperature by showing a random sample of 100 realizations of models for the lime temperature, *θ*(*t*; *λ*, *t*_{0} = 0, *θ*_{0}, *θ*^{air}), with (*λ*, *θ*_{0}, *θ*^{air}) a sample from the posterior distribution. The models fit the data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ well and exhibit little variance, reflecting low uncertainty about the parameter *λ* in light of the data.

We found little correlation (magnitude of Pearson correlation <0.02) between Λ and Σ in the joint posterior distribution. The (marginal) posterior distributions of Λ and Σ are well-approximated as independent Gaussian distributions $\mathcal{N}(0.98\ \text{h}, (0.03\ \text{h})^2)$ and $\mathcal{N}(0.18\ {}^\circ\text{C}, (0.05\ {}^\circ\text{C})^2)$.

**Checks.** To assess the efficiency of the MCMC sampler and gain confidence that the MCMC samples reliably approximate the posterior distribution,^{66} we drew trace plots and visualized the empirical posterior distribution of *λ* over four independent chains in Fig. S2. The trace plots indicate that each sampled chain exhibits little autocorrelation and thoroughly explores an interval of *λ*, and the empirical posterior distributions are consistent with each other.

To visually inspect (i) the amount of prior information we include about *λ* in the prior distribution and (ii) the consistency of the observed data $\{(t_i, \theta_{i,\text{obs}})\}_{i=1}^{N}$ with the prior distribution, we show a prior predictive check^{39,71} in Fig. S3, where we generated synthetic data by simulating the data-generating process with parameters sampled from the prior.

To visually assess the fit of the posterior model of the lime temperature, we perform a posterior predictive check:^{39,71} in Fig. S4, we plot the distribution of residuals between our observed data and synthetic data obtained by simulating the data-generating process with parameters sampled from the posterior. The mean residual is within ±0.25 °C, indicating an excellent fit.
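The mechanics of such a predictive check can be sketched as follows (Python; all numbers are illustrative stand-ins for the tutorial's actual data and posterior samples, and the "observed" data are themselves simulated for self-containment):

```python
import math
import random

def lime_temp(t, lam, t0, theta0, theta_air):
    # first-order cooling model for the lime temperature
    return theta_air + (theta0 - theta_air) * math.exp(-(t - t0) / lam)

def simulate_measurements(times, lam, t0, theta0, theta_air, sigma, rng):
    """Simulate the data-generating process: model prediction plus Gaussian noise."""
    return [lime_temp(t, lam, t0, theta0, theta_air) + rng.gauss(0.0, sigma)
            for t in times]

rng = random.Random(2)
times = [0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0]  # h

# illustrative "observed" data (stand-in for the real experiment)
observed = simulate_measurements(times, 0.98, 0.0, 6.5, 18.0, 0.2, rng)

# one synthetic dataset per (hypothetical) posterior sample of λ
residuals = []
for _ in range(200):
    lam = rng.gauss(0.98, 0.03)
    synthetic = simulate_measurements(times, lam, 0.0, 6.5, 18.0, 0.2, rng)
    residuals.extend(o - s for o, s in zip(observed, synthetic))

mean_resid = sum(residuals) / len(residuals)
print(abs(mean_resid))  # a small mean residual indicates a good fit
```

A large or systematically signed mean residual would instead flag model inadequacy of the kind discussed in the remark above.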

### E. Inverse problem IIa: Time reversal

**Overview of problem and approach**

**Task**: employ BSI to infer the initial temperature of the lime Θ_{0} (i.e., *θ*_{0} treated as a random variable) based on a single measurement of its temperature at a later time *t*′ > *t*_{0} = 0, $\theta'_{\text{obs}}$, and a measurement of the air temperature, $\theta^{\text{air}}_{\text{obs}}$.

First, we impose a diffuse prior distribution on Θ_{0} based on the range of temperatures encountered in refrigerators.

Second, we impose informative prior distributions on the parameter Λ characterizing the lime and the variance of the measurement noise Σ^{2} based on the posterior distributions from our parameter identification phase in Sec. II D. “Yesterday’s posterior is today’s prior.”^{72}

Next, we set up another heat transfer experiment on the same lime. To (partially) characterize the conditions of the experiment, we measure the air temperature, giving datum $\theta^{\text{air}}_{\text{obs}}$, which we use to construct an informative prior distribution on Θ^{air}. [Note that we also measured the initial temperature of the lime, but we hold the data (*t*_{0} = 0, *θ*_{0,obs}) out from the BSI procedure until the end, to test the fidelity of the posterior distribution of Θ_{0}.]

Then, during the experiment, we (indirectly) gather information about Θ_{0} by measuring the temperature of the lime at time *t*′ > *t*_{0} = 0, i.e., a duration *t*′ after the lime was taken out of the refrigerator, giving datum $(t', \theta'_{\text{obs}})$.

Finally, we use Bayes’ theorem to construct the posterior distribution of (Θ_{0}, Λ, Θ^{air}, Σ) in light of the datum $(t', \theta'_{\text{obs}})$, sample from the posterior using an MCMC algorithm, and obtain a credible interval for the initial lime temperature Θ_{0} that quantifies posterior uncertainty about its value.

**Quick overview**:

- *measurement of an experimental condition*: the measured air temperature $\theta^{\text{air}}_{\text{obs}}$;
- *datum collected during the experiment*: the measured lime temperature point $(t', \theta'_{\text{obs}})$ with *t*′ > *t*_{0} and *t*_{0} = 0;
- *random variables to infer from the datum*: the initial lime temperature Θ_{0}, the parameter Λ, the air temperature Θ^{air}, and the variance of the measurement noise Σ^{2}; and
- *sources of priors*: Θ_{0}: range of temperatures encountered in refrigerators; Λ, Σ^{2}: the posterior from our parameter identification phase in Sec. II D; Θ^{air}: our noisy measurement of the air temperature.

Classically, this time reversal problem is ill-conditioned: the reconstructed initial temperature is sensitive to small errors in the measurement of the lime temperature at time *t*′. To see the ill-conditioning, let *δθ*′ be the error in our measurement of the true lime temperature *θ*(*t*′) and *δθ*_{0} be the resulting error in our prediction of the initial temperature, $\hat{\theta}_0 \coloneqq \theta_0 + \delta\theta_0$, with *θ*_{0} being the true initial lime temperature. The errors *δθ*_{0} and *δθ*′ are related through a perturbed version of Eq. (14) (given it holds),

$$\delta\theta_0 = \delta\theta'\, e^{(t' - t_0)/\lambda}, \tag{28}$$

so the error in the reconstructed initial temperature grows exponentially with the time *t*′ > *t*_{0} at which we take the measurement of the lime temperature. This ill-conditioning is apparent from the graphical solution to this time reversal problem, of tracing the model trajectory of the lime temperature backward in time, starting at the point (*t*′, *θ*(*t*′) + *δθ*′), back to $(t_0 = 0, \hat{\theta}_0)$. A small error *δθ*′ in the measured lime temperature at *t*′ results in a large change in the trajectory traced backward, if *t*′ is large enough to place the measured lime temperature close to the air temperature. See Fig. S5.
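A quick numerical illustration of this exponential error amplification (Python sketch; it assumes *λ* = 1 h, the rough value of the lumped parameter, and *t*_{0} = 0):

```python
import math

lam = 1.0  # lumped parameter λ (h), rough value from the parameter identification phase
t0 = 0.0   # h

# amplification factor δθ0/δθ' = exp((t' - t0)/λ) for a few measurement times t'
for t_prime in [0.5, 1.0, 2.0, 5.0]:
    amp = math.exp((t_prime - t0) / lam)
    print(f"t' = {t_prime} h: measurement error amplified {amp:.1f}x")

# e.g., a 0.1 °C measurement error at t' = 5 h corrupts the reconstructed
# initial temperature by roughly 0.1 * e^5 ≈ 15 °C
```

The amplification factor is ≈1.6 at *t*′ = 0.5 h but ≈148 at *t*′ = 5 h, by which time the lime temperature is nearly indistinguishable from the air temperature.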

#### 1. The experimental setup

To characterize the setup of the additional lime heat transfer experiment (different from the one for parameter identification in Sec. II D, but with the same lime), we use the temperature probe to measure the air temperature, giving datum $\theta^{\text{air}}_{\text{obs}}$, plotted in Fig. 8(a) as the horizontal dashed line. (Note, we actually also measured the initial lime temperature at *t* = *t*_{0} ≔ 0, giving datum *θ*_{0,obs} = 6.3 °C, but let us pretend we did not for now.)

#### 2. The prior distributions

**The initial temperature,** Θ_{0}. We impose a diffuse prior distribution on the initial temperature of the lime based on a (generous) range of temperatures encountered in refrigerators.

**The air temperature,** Θ^{air}. We impose an informative prior distribution on the air temperature Θ^{air} based on our (noisy) measurement of it.

**The parameter** Λ **and variance of the measurement noise** Σ^{2}. We exploit the information we obtained about Λ and Σ during our parameter identification phase in Sec. II D to construct informative prior distributions on the parameter Λ characterizing the (same) lime and the variance of the measurement noise Σ^{2} emanating from the (same) temperature sensor. Particularly, we use the posterior distributions from Sec. II D: “yesterday’s posterior is today’s prior.”^{72}

**The joint prior distribution.** Again, the joint prior distribution of all of the unknowns *π*_{prior}(*θ*_{0}, *λ*, *θ*^{air}, *σ*) for this inverse problem factorizes since we impose independent priors.

#### 3. The datum and likelihood function

**The datum**. During the second heat transfer experiment, we take a single measurement of the lime temperature at time *t*′ > *t*_{0} = 0, giving the datum point $(t', \theta'_{\text{obs}})$ displayed in Fig. 8(a). Note that the time the lime was taken out of the refrigerator, *t*_{0} = 0, is known.

**The likelihood function.** The likelihood function gives the probability density of the datum $\theta'_{\text{obs}}$ conditioned on each possible value of the parameters Λ = *λ* and Σ = *σ* and experimental conditions Θ_{0} = *θ*_{0} and Θ^{air} = *θ*^{air}. We construct the likelihood from the (i) datum $(t', \theta'_{\text{obs}})$ and (ii) model of the data-generating process in Eq. (15). The likelihood function is the density of a single Gaussian centered at the model prediction of the lime temperature at time *t*′,

$$\pi_{\text{likelihood}}(\theta'_{\text{obs}} \mid \theta_0, \lambda, \theta^{\text{air}}, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{[\theta'_{\text{obs}} - \theta(t'; \lambda, t_0, \theta_0, \theta^{\text{air}})]^2}{2\sigma^2}\right),$$

which, since we possess the datum, we view as a function of the unknowns (*θ*_{0}, *λ*, *θ*^{air}, *σ*).

#### 4. The posterior distribution

The *posterior density* governs the probability distribution of the unknowns (Θ_{0}, Λ, Θ^{air}, Σ) conditioned on the datum $(t', \theta'_{\text{obs}})$. By Bayes’ theorem, the posterior density is proportional to the product of the likelihood function and prior density.

We are particularly interested in the posterior distribution of the initial lime temperature Θ_{0}, with (Λ, Θ^{air}, Σ) marginalized out.

Again, we employ NUTS to obtain samples from the posterior and then construct an empirical approximation of it.

#### 5. Summary of results

Figure 8(a) shows the data from the heat transfer experiment that we employ to infer the initial lime temperature Θ_{0} with BSI.

Figure 8(b) compares (i) the prior distribution of the initial lime temperature Θ_{0} with (ii) its updated (marginal) empirical posterior distribution constructed via kernel density estimation. The bar shows the 90% equal-tailed posterior credible interval for Θ_{0}. Notably, the hold-out test data, the measured initial lime temperature *θ*_{0,obs}, fall in this credible interval.

Figure 8(c) illustrates the posterior distribution of backward trajectories of the lime temperature by showing a random sample of 100 realizations of models for the lime temperature, *θ*(*t*; *λ*, *t*_{0} = 0, *θ*_{0}, *θ*^{air}), with (*θ*_{0}, *λ*, *θ*^{air}) a sample from the posterior distribution. The true initial condition is covered by the ensemble of backward trajectories.

#### 6. Ill-conditioning

Figure 9 shows the marginal posterior distribution of the initial lime temperature Θ_{0} for various times *t*′ at which we measure the lime temperature to obtain the datum $(t', \theta'_{\text{obs}})$. As *t*′ becomes larger, the posterior distribution of Θ_{0} spreads, reflecting higher uncertainty. For large *t*′, i.e., measurements after the lime has been outside of the refrigerator for a long time, we still have high posterior uncertainty about the initial lime temperature. That the datum $(t', \theta'_{\text{obs}})$ has not much enlightened us about the initial lime temperature Θ_{0} for large *t*′ owes to the ill-conditioned nature of the problem illustrated in Eq. (28). Hence, Fig. 9 demonstrates the ability of BSI to capture ill-conditioning in inverse problems of reconstruction.

### F. Inverse problem IIb: Time reversal

**Overview of problem and approach**

**Task**: employ BSI to infer both the time *T*_{0} (i.e., *t*_{0} treated as a random variable) the lime was taken out of the refrigerator and the initial temperature Θ_{0} of the lime based on a measurement of the lime temperature at time *t*′ > *t*_{0}, $\theta'_{\text{obs}}$, and the measured air temperature $\theta^{\text{air}}_{\text{obs}}$.

First, we impose a diffuse prior distribution on Θ_{0} based on the range of temperatures encountered in refrigerators; a weakly informative prior distribution on *T*_{0} based on our sense of time passing; and informative prior distributions on the parameter Λ characterizing the lime and the variance of the measurement noise Σ^{2} based on the posterior distributions from our parameter identification phase in Sec. II D.

Next, we set up another heat transfer experiment (well, we use the same data from the second heat transfer experiment in Sec. II E; importantly, it is a different experiment from the one used for parameter identification in Sec. II D) on the same lime. To (partially) characterize the condition of the experiment, we measure the air temperature, giving datum $\theta^{\text{air}}_{\text{obs}}$, to construct an informative prior distribution for Θ^{air}. [Note that we also measure the initial temperature of the lime *θ*_{0,obs} and know the time *t*_{0} it was taken out of the refrigerator, but we hold this datum (*t*_{0}, *θ*_{0,obs}) out from the BSI procedure to test the fidelity of the posterior distribution of (*T*_{0}, Θ_{0}).]

Then, during the experiment, to (indirectly) gather information about (*T*_{0}, Θ_{0}), we measure the temperature of the lime at time *t*′ > *t*_{0}, giving datum $\theta'_{\text{obs}}$.

Finally, we use Bayes’ theorem to construct the posterior distribution of (*T*_{0}, Θ_{0}, Λ, Θ^{air}, Σ) in light of the datum, sample from it using an MCMC algorithm, and obtain an empirical (marginal) joint posterior distribution for the initial lime temperature Θ_{0} and the time it was taken out of the refrigerator *T*_{0} that quantifies posterior uncertainty about their values.

**Quick overview**:

- *measurement of an experimental condition*: the measured air temperature $\theta^{\text{air}}_{\text{obs}}$;
- *datum collected during the experiment*: the measured lime temperature point $(t', \theta'_{\text{obs}})$ with *t*′ > *t*_{0} and *t*_{0} unknown;
- *random variables to infer from the datum*: the time the lime was taken out of the refrigerator *T*_{0}, the initial lime temperature Θ_{0}, the parameter Λ, the air temperature Θ^{air}, and the variance of the measurement noise Σ^{2}; and
- *sources of priors*: *T*_{0}: our human judgment of the passing of time; Θ_{0}: range of temperatures encountered in refrigerators; Λ, Σ^{2}: the posterior from our parameter identification phase in Sec. II D; Θ^{air}: our noisy measurement of the air temperature.

Classically, this inverse problem is underdetermined: the datum is consistent with both of the propositions (1) “the lime was initially very cold and has been outside of the refrigerator for a long duration” *and* (2) “the lime was initially not very cold and has been outside of the refrigerator for a short duration.” Mathematically, there is a curve of infinitely many solutions in the (*t*_{0}, *θ*_{0}) plane (the two primary unknowns) that satisfy the model in Eq. (14) with known (*t*′, *θ*(*t*′)) and *θ*^{air},

$$\theta_0 = \theta^{\text{air}} + [\theta(t') - \theta^{\text{air}}]\, e^{(t' - t_0)/\lambda}, \quad t_0 < t'. \tag{34}$$

Still, BSI handles this underdetermined problem gracefully: (i) through the prior distributions of *T*_{0} and Θ_{0}, the posterior distribution in BSI may assign different weights to each of the (inherently, equal-weighted) classical solutions, and (ii) by accounting for measurement noise, BSI entertains solutions off of the curve comprising the classical solutions. This time reversal problem still becomes ill-conditioned for large *t*′ − *t*_{0}, as the curve described in Eq. (34) is sensitive to errors in the measurement of *θ*(*t*′) when *t*′ − *t*_{0} is large.
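The curve of classical solutions can be illustrated numerically: distinct (*t*_{0}, *θ*_{0}) pairs on the curve reproduce exactly the same lime temperature at the measurement time *t*′ (a Python sketch with illustrative values, not the tutorial's measurements):

```python
import math

# illustrative values (not the tutorial's measurements)
theta_air = 18.0    # air temperature, °C
lam = 1.0           # lumped parameter λ, h
t_prime = 1.0       # time of the lime temperature measurement, h
theta_prime = 13.0  # lime temperature measured at t', °C

def theta0_on_curve(t0):
    """The classical-solution curve: the θ0 that, paired with start time t0,
    exactly reproduces the measured θ(t')."""
    return theta_air + (theta_prime - theta_air) * math.exp((t_prime - t0) / lam)

def lime_temp(t, lam_, t0, theta0):
    # first-order cooling model
    return theta_air + (theta0 - theta_air) * math.exp(-(t - t0) / lam_)

# distinct initial conditions (t0, θ0) on the curve all yield the same
# lime temperature at the measurement time t' -- the problem is underdetermined
for t0 in [0.0, 0.25, 0.5, 0.75]:
    theta0 = theta0_on_curve(t0)
    print(t0, round(theta0, 2), round(lime_temp(t_prime, lam, t0, theta0), 2))
```

An earlier removal time *t*_{0} pairs with a colder initial temperature *θ*_{0}, matching the two propositions above.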

#### 1. The experimental setup

To (partially) characterize the experimental setup of this lime heat transfer experiment, we use the temperature probe to measure the air temperature, giving datum $\theta^{\text{air}}_{\text{obs}}$, plotted as the horizontal dashed line in Fig. 10(a).

#### 2. The prior distributions

We impose the same prior distributions on Θ_{0}, Θ^{air}, Λ, and Σ^{2} as in Sec. II E. Additionally, we now impose a prior distribution on the time *T*_{0} at which the lime was taken out of the refrigerator, given in Eq. (35), based on our unreliable—truly, *t*_{0} = 0, so the prior for *T*_{0} is biased, as it is not centered at the true value—judgment of the passing of time. These independent priors give the joint prior density *π*_{prior}(*t*_{0}, *θ*_{0}, *λ*, *θ*^{air}, *σ*) for this inverse problem. We visualize the marginal prior distribution of (*T*_{0}, Θ_{0}) in Fig. 10(b).

#### 3. The datum and likelihood function

**The datum**. The datum from the heat transfer experiment is a single measurement of the lime temperature, $(t', \theta'_{\text{obs}})$ with *t*′ > *t*_{0}, displayed in Fig. 10(a). The time the lime was taken out of the refrigerator, *t*_{0}, is *not* known.

**The likelihood function.** The likelihood function gives the probability density of the datum $\theta'_{\text{obs}}$ conditioned on each possible value of the parameters Λ = *λ* and Σ = *σ* and experimental conditions *T*_{0} = *t*_{0}, Θ_{0} = *θ*_{0}, and Θ^{air} = *θ*^{air}. It takes the same form as the likelihood in Sec. II E, except that the start time *t*_{0} is now an unknown argument of the model rather than a known constant.

#### 4. The posterior distribution

The *posterior density* governs the probability distribution of the unknowns (*T*_{0}, Θ_{0}, Λ, Θ^{air}, Σ) conditioned on the datum $(t', \theta'_{\text{obs}})$. By Bayes’ theorem, the posterior density is proportional to the product of the likelihood function and (joint) prior density.

We are particularly interested in the joint posterior distribution of (*T*_{0}, Θ_{0}), with (Λ, Θ^{air}, Σ) marginalized out.

Again, we employ NUTS to obtain samples from the posterior.

#### 5. Summary of results

Figure 10(a) shows the data from the heat transfer experiment that we employ to infer the initial condition of the lime (*T*_{0}, Θ_{0}) with BSI.

By showing contours, Fig. 10(b) compares (i) the joint prior distribution of the initial condition of the lime, (*T*_{0}, Θ_{0}), with (ii) the updated, empirical, joint posterior distribution of (*T*_{0}, Θ_{0}) constructed via kernel density estimation. The curve drawn in the (*t*_{0}, *θ*_{0}) plane shows the classical solutions, given in Eq. (34) with $\theta^{\text{air}} \coloneqq \theta^{\text{air}}_{\text{obs}}$, $\theta(t') \coloneqq \theta'_{\text{obs}}$, and *λ* set to the mean of its posterior distribution from the parameter identification phase. The ridge of the posterior density follows the curve of classical solutions. However, owing to the non-uniform density and finite support of the prior distribution of (*T*_{0}, Θ_{0}), the classical solutions are weighted differently, and some are not entertained at all. The posterior density spreads orthogonal to (off) the curve of classical solutions because BSI, unlike the classical approach, accounts for noise corrupting our measurements of the air and lime temperature; i.e., BSI entertains solutions off of the curve of classical solutions. The measured initial condition of the lime (*t*_{0} = 0, *θ*_{0,obs}), held out as test data, falls in a region of high posterior density. This result is, in part, a consequence of the mean of our prior for *T*_{0} in Eq. (35) being close to the true *t*_{0} = 0. The marginal posterior distributions of *T*_{0} and Θ_{0} are compared in Fig. 10(b) as well, including their 90% equal-tailed posterior credible intervals. The credible interval for Θ_{0} is much wider than in inverse problem IIa in Fig. 8(b), despite using the same lime temperature measurement $\theta'_{\text{obs}}$ after the same duration *t*′ outside of the refrigerator, because *t*_{0} is now not specified; rather, *T*_{0} is endowed with a spread-out prior distribution, including density at times before the true *t*_{0}.

Figure 10(c) illustrates the posterior distribution of backward trajectories of the lime temperature by showing a random sample of 100 realizations of models for the lime temperature, *θ*(*t*; *λ*, *t*_{0}, *θ*_{0}, *θ*^{air}), with (*t*_{0}, *θ*_{0}, *λ*, *θ*^{air}) being a sample from the posterior distribution. The intuition explaining the correlation of *T*_{0} and Θ_{0} in the joint posterior density in Fig. 10(b) is apparent in Fig. 10(c): the datum $(t', \theta'_{\text{obs}})$ is consistent with both propositions: (i) “the lime was initially not very cold and taken out of the refrigerator recently” and (ii) “the lime was initially very cold and taken out of the refrigerator a while ago.”

### G. Software for BSI

A variety of free and open-source probabilistic programming libraries are available, making Bayesian statistical inversion accessible to practitioners.^{39} Broadly, the probabilistic (computer) programming paradigm^{73,74} allows practitioners to (i) easily specify (a) the prior probability distributions and (b) the probabilistic forward model assumed to generate the data and then (ii) call a Markov chain Monte Carlo routine to draw samples from the posterior distribution that (implicitly) follows from the prior and likelihood. Popular Bayesian inference engines/probabilistic programming libraries include Turing.jl^{62} in Julia,^{63} PyMC3,^{75} Pyro,^{76} and TensorFlow Probability^{77} in Python, and Stan,^{78} which interfaces with several languages, including R.
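To make this workflow concrete without committing to any particular library, the sketch below (Python; the data, priors, and one-parameter model are fabricated for illustration) implements the simplest such MCMC routine, a random-walk Metropolis sampler. A probabilistic programming library automates and improves upon each of these steps.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy data: noisy observations of a decaying signal with unknown rate lam
true_lam = 2.0
t_obs = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
y_obs = np.exp(-t_obs / true_lam) + rng.normal(0, 0.02, t_obs.size)

def log_prior(lam):
    # diffuse prior on (0, 10); -inf outside enforces positivity
    return 0.0 if 0.0 < lam < 10.0 else -np.inf

def log_likelihood(lam, sigma=0.02):
    # Gaussian measurement noise about the forward model prediction
    resid = y_obs - np.exp(-t_obs / lam)
    return -0.5 * np.sum(resid**2) / sigma**2

def log_posterior(lam):
    lp = log_prior(lam)
    return lp + log_likelihood(lam) if np.isfinite(lp) else -np.inf

# random-walk Metropolis: propose a perturbation, accept with
# probability min(1, posterior ratio)
lam, chain = 1.0, []
for _ in range(5000):
    proposal = lam + rng.normal(0, 0.1)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(lam):
        lam = proposal
    chain.append(lam)

posterior_mean = np.mean(chain[1000:])  # discard burn-in
```

The retained `chain` is a set of (correlated) samples from the posterior of the rate parameter; its histogram approximates the posterior density.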

**BSI coding tutorial.** We provide a minimal coding tutorial for BSI to tackle inverse problems I and IIa herein at https://simonensemble.github.io/pluto_nbs/bsi_tutorial.html using the probabilistic programming library Turing.jl in the Julia programming language.

### H. Conclusions

By way of example, we provided a tutorial of Bayesian statistical inversion (BSI) to solve inverse problems while incorporating prior information and quantifying uncertainty. Our focus was a simple, intuitive physical process—heat transfer from ambient indoor air to a cold lime fruit via natural convection. First, we developed a simple mathematical model for the lime temperature, which contains a single parameter. Then, we used a time series dataset of the lime temperature to infer, via BSI, the posterior distribution of the model parameter. Next, we employed the model with the inferred parameter to tackle, via BSI, two time-reversal reconstruction problems. The first task, ill-conditioned, was to predict the initial temperature of the lime from a measurement of its temperature later in time. The second task, underdetermined, was to predict the initial temperature of the lime *and* the time it was taken out of the refrigerator from a measurement of its temperature later in time. We intend for our Tutorial to help scientists and engineers (i) recognize inverse problems in their research domain, (ii) question whether these problems are well-posed and well-conditioned, and (iii) leverage BSI to solve these problems while incorporating prior information and quantifying uncertainty.

Our BSI solutions to the inverse problems involving the lime are subject to limitations. First, our mathematical model of the lime temperature relies on several simplifying assumptions listed in Sec. II B. The model may be more accurate if we relaxed these assumptions and amended it to account for, e.g., the time-dependence of the bulk air temperature *θ*^{air}(*t*) and spatial temperature gradients in the lime in conjunction with its geometry and spatial heterogeneity (skin, flesh, seeds). Second, ideally, we would replicate the heat transfer experiment multiple times and use all of these time series data of the lime temperature for the inference of the parameter Λ in Sec. II D. This would allow the posterior distribution of Σ to capture the full residual variability of the lime temperature over repeated experiments owing to poorly controlled and/or unrecognized inputs/conditions that affect the lime temperature. Third, we neglected the possibility of model discrepancy,^{21} an unaccounted-for source of both systematic bias and uncertainty in the posterior of the parameter/initial condition of the lime.

## III. DISCUSSION

We provided an introductory tutorial on Bayesian statistical inversion as a tool to tackle inverse problems, which may be ill-posed, while (i) incorporating prior information and (ii) delivering a solution in the form of a probability density function over the input/parameter space, which quantifies uncertainty via its spread. Inverse problems are pervasive throughout science and engineering, e.g., in heat or radiation transfer, gravitational fields, wave scattering, tomography, and electromagnetism;^{79–84} vibration of springs and beams;^{85} imaging;^{7,86} fluid mechanics;^{87} physiology;^{88} epidemiology;^{89,90} ecology;^{91} geophysics;^{84} environmental science;^{92,93} palaeoclimatology;^{94} chemical/bio-chemical reactions;^{95–99} and adsorption.^{100} Mathematically, reconstruction problems in these domains often reduce to using data to determine (1) a vector that was linearly transformed by a matrix;^{13,101,102} (2) the initial condition, boundary condition, or forcing term in an ordinary or partial differential equation;^{7} (3) an integrand;^{103,104} or (4) the geometry of a domain.^{9}

### A. Drawbacks of BSI

The Bayesian statistical approach to inverse problems has some drawbacks compared to simpler, classical methods, such as least squares.

**A larger computational cost.** The Bayesian approach to inverse problems typically demands a relatively large computational cost to approximate the posterior distribution via a Markov chain Monte Carlo algorithm—especially when the number of parameters/inputs is large and the likelihood is expensive to evaluate (owing to a large size of the data and/or a large computational expense to evaluate the forward model).^{35,49} To reduce the cost of Bayesian inference at the expense of accuracy/flexibility, we may resort to approximate Bayesian computation (ABC),^{105} make a Laplace (quadratic) approximation of the posterior, or employ a conjugate prior for the likelihood that gives a closed-form expression for the posterior.^{35}
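As a minimal illustration of the Laplace approximation mentioned above, the sketch below (Python; the toy unnormalized log-posterior is chosen so the exact mode is known) locates the posterior mode and approximates the posterior by a Gaussian whose variance is set by the local curvature of the log-posterior.

```python
import numpy as np

# toy unnormalized log-posterior over lam > 0 with a Gamma(3, 1) shape
def log_post(lam):
    return 2.0 * np.log(lam) - lam  # maximized at lam = 2

# 1. locate the MAP (posterior mode) -- here by a fine grid search
grid = np.linspace(0.01, 10.0, 100000)
lam_map = grid[int(np.argmax(log_post(grid)))]

# 2. curvature (negative second derivative) of the log-posterior at the
#    mode, estimated by a central finite difference
h = 1e-4
curv = -(log_post(lam_map + h) - 2 * log_post(lam_map) + log_post(lam_map - h)) / h**2

# 3. Laplace approximation: posterior ~ Normal(lam_map, 1 / curv)
laplace_mean, laplace_var = lam_map, 1.0 / curv
```

No MCMC sampling is required, but the approximation is only as good as the posterior is close to Gaussian near its mode.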

**The subjectivity of the prior.** A common objection to Bayesian inference is the subjectivity involved in constructing the prior distribution (which influences the posterior).^{106} First, we can assess the impact of the prior on the posterior through a sensitivity analysis (i.e., examine how much the posterior changes when the prior changes). Second, through *prior predictive checking*, we may ascertain the consistency of the prior with our observed data by comparing (a) the observed data with (b) synthetic data generated via sampling inputs/parameters from the prior then simulating the probabilistic model of the data-generating process under those inputs/parameters.^{39,71} Third, in the absence of information, we may adopt the principle of indifference and impose a diffuse prior that avoids biasing the posterior. Finally, one intriguing approach to construct a prior is to elicit estimates of the unknown parameter/input from a panel of experts.^{107} In defense of the prior distribution, it provides an *opportunity*/*benefit* to (i) potentially obtain better estimates of the inputs/parameters by exploiting relevant external information or constraints about the input/parameters (e.g., the physical constraint and back-of-the-envelope estimate of *λ* for the lime) and (ii) interpret new data from an experiment in the context of different beliefs held among different scientists (and encoded in their priors) before the data were collected.^{36,40} In addition, the choice for the forward model (its structure, e.g., functional form), upon which non-Bayesian methods also rely, requires subjectivity/judgment as well.^{36}
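A prior predictive check takes only a few lines of code. The sketch below (Python; the priors, temperatures, and exponential cooling model are illustrative stand-ins, not the paper's exact choices) draws parameters from the prior, simulates a synthetic lime temperature data set under each draw, and summarizes the resulting band against which observed data could be compared.

```python
import numpy as np

rng = np.random.default_rng(2)

# illustrative priors on the time constant and noise scale
def sample_prior():
    lam = rng.gamma(shape=4.0, scale=0.25)   # time constant [hr], mean 1.0
    sigma = abs(rng.normal(0.0, 0.5))        # measurement noise scale [deg C]
    return lam, sigma

theta0, theta_air = 6.5, 18.0                # assumed initial/air temps [deg C]
t_grid = np.linspace(0.0, 10.0, 25)

# prior predictive draw: sample parameters from the prior, then simulate a
# synthetic data set from the probabilistic model under those parameters
def prior_predictive_draw():
    lam, sigma = sample_prior()
    mean = theta_air + (theta0 - theta_air) * np.exp(-t_grid / lam)
    return mean + rng.normal(0.0, sigma, t_grid.size)

synthetic = np.array([prior_predictive_draw() for _ in range(200)])

# the check: does the observed time series fall within the bulk of the
# synthetic data sets, e.g., inside the 5th-95th percentile band?
lo, hi = np.percentile(synthetic, [5, 95], axis=0)
```

If the observed data fall far outside the band, the prior (or the model) conflicts with the data and deserves a second look.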

### B. Model discrepancy

*Model discrepancy*^{21} refers to a nonzero difference between the (i) expected measurement of the system output over repeated experiments with the same input/conditions and (ii) the prediction of the system output by the model. If significantly present and not accounted for in the model of the data-generating process, model discrepancy corrupts the BSI solution to an inverse problem.^{108} First, disregarded model discrepancy introduces bias, i.e., the posterior density of the parameter/input will not be centered at the true value even as more and more data are collected. Second, it leads to mis-calibrated uncertainty quantification, i.e., the posterior credible interval will not contain the true value of the parameter/input with the stated probability.

To account for model discrepancy, we may include in the model of the data-generating process a model discrepancy function Δ(*x*) (a random variable dependent on the input, *x*), model Δ(*x*) as a Gaussian process,^{109} and infer it from data.^{21,108,110} That is, we model the measured system output *Y*_{obs} as

*Y*_{obs} = *f*_{β}(*x*) + Δ(*x*) + *E*,

with *f*_{β}(*x*) the (inadequate) forward model and *E* the noise, which by definition has mean zero. The model discrepancy function Δ(*x*) aims to capture the input-dependent difference between reality (which we probe through sampling the output of the system *Y*_{obs} for a given input *x*) and the model predictions.

### C. Model selection

We assumed that the model structure—in our study, the functional form/shape of the lime temperature as a function of time, *θ*(*t*)—is known (see Table 2a). In *model selection*, the task is to select the best model of a physical system among a set of candidate models, given data. Bayesian approaches to model selection include the Bayesian information criterion^{111} and Bayes factors^{39,112} based on the evidence function.^{40,113}
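As a small worked example of the Bayesian information criterion, BIC = *k* ln *n* − 2 ln *L̂* (with *k* fitted parameters, *n* data points, and maximized likelihood *L̂*), the sketch below (Python; toy data, Gaussian likelihood with known noise scale) compares an exponential-decay model against a constant-mean model; the lower BIC identifies the better-supported model.

```python
import numpy as np

rng = np.random.default_rng(4)

# toy data generated by an exponential decay (model A's functional form)
t = np.linspace(0.1, 5.0, 30)
y = np.exp(-t / 1.5) + rng.normal(0, 0.05, t.size)

def bic(n_params, residuals, sigma=0.05):
    """BIC = k ln n - 2 ln L-hat, for a Gaussian likelihood of known sigma."""
    n = residuals.size
    loglik = -0.5 * n * np.log(2 * np.pi * sigma**2) \
             - 0.5 * np.sum(residuals**2) / sigma**2
    return n_params * np.log(n) - 2.0 * loglik

# model A: exponential decay with one fitted rate, fit by grid search
lams = np.linspace(0.5, 3.0, 251)
ssr_a = [np.sum((y - np.exp(-t / lam))**2) for lam in lams]
best_lam = lams[int(np.argmin(ssr_a))]
bic_a = bic(1, y - np.exp(-t / best_lam))

# model B: constant mean, also one fitted parameter
bic_b = bic(1, y - y.mean())
# the lower-BIC model is preferred; here, the exponential decay
```

BIC rewards goodness of fit through ln *L̂* while the *k* ln *n* term penalizes model complexity.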

### D. Other methods to approximate the posterior distribution

Here, we employed the NUTS^{61} Markov chain Monte Carlo method to obtain samples from the posterior distribution to approximate it. Other methods to approximate a posterior distribution include (i) other Monte Carlo sampling methods, including the Gibbs sampler,^{114} adaptive Metropolis algorithm,^{115} sequential Monte Carlo,^{43} and approximate Bayesian computation (ABC),^{116} and (ii) obtaining an analytical expression as an approximation to the posterior, including the Laplace approximation and variational Bayes.^{117} The BSI practitioner must consider accuracy, speed, and ease of implementation^{43} when choosing a method to approximate their posterior at hand.
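Among these, approximate Bayesian computation is perhaps the simplest to sketch: draw a parameter from the prior, simulate a data set, and keep the draw only if a summary of the simulated data lands within a tolerance of the observed summary. The toy example below (Python; a Gaussian mean-estimation problem, not the lime problem) illustrates ABC rejection sampling.

```python
import numpy as np

rng = np.random.default_rng(5)

# observed summary statistic: sample mean of 50 draws from Normal(mu_true, 1)
mu_true = 3.0
y_obs_mean = rng.normal(mu_true, 1.0, 50).mean()

# ABC rejection: draw mu from the prior, simulate a data set of the same
# size, keep mu when the simulated summary lies within tolerance eps of
# the observed summary -- no likelihood evaluation required
accepted = []
eps = 0.05
for _ in range(20000):
    mu = rng.uniform(0.0, 6.0)               # flat prior on [0, 6]
    sim_mean = rng.normal(mu, 1.0, 50).mean()
    if abs(sim_mean - y_obs_mean) < eps:
        accepted.append(mu)

abc_posterior_mean = np.mean(accepted)
```

The accepted draws approximate the posterior; shrinking `eps` improves accuracy at the cost of a lower acceptance rate.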

### E. Classical approaches to inverse problems

A traditional approach to inverse problems is least squares: tune the unknown input/parameter to minimize the mismatch between the data and the model predictions. For overdetermined inverse problems, bootstrapping,^{118} asymptotic theory,^{119} and the Hessian of the loss function^{120} can provide uncertainty estimates for the unknown input/parameters within the least squares framework—and for new predictions by the model. To handle ill-posed and/or ill-conditioned inverse problems, Tikhonov regularization^{121,122} provides a means to incorporate prior assumptions into the solution by augmenting least squares. For example, the ill-posed problem of determining the initial temperature profile of a rod from a measurement of its temperature profile later in time may be tackled by augmenting the least-squares loss function with a regularization term to promote smooth solutions.^{123} Tikhonov regularization is intimately connected with BSI.^{13,14,34,38} However, BSI is more versatile and better-grounded in theory; consequently, BSI provides a more interpretable means of incorporating prior assumptions/information, and BSI entertains and weighs multiple solutions for uncertainty quantification.
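To make the connection concrete, the sketch below (Python; the blurring forward map and noise level are illustrative) solves a toy ill-conditioned linear inverse problem *b* = *Ax* + noise two ways: naive least squares, which amplifies the noise, and Tikhonov (ridge) regularization, *x̂* = (*A*ᵀ*A* + *αI*)⁻¹*A*ᵀ*b*, which suppresses it.

```python
import numpy as np

rng = np.random.default_rng(6)

# an ill-conditioned forward map: a Gaussian smoothing (blurring) matrix A
n = 50
s = np.linspace(0.0, 1.0, n)
A = np.exp(-0.5 * (s[:, None] - s[None, :])**2 / 0.05**2)
A /= A.sum(axis=1, keepdims=True)

x_true = np.sin(2 * np.pi * s)             # unknown input to reconstruct
b = A @ x_true + rng.normal(0, 1e-3, n)    # blurred, noisy measurement

# naive least squares (via SVD): tiny singular values of A amplify the noise
x_naive, *_ = np.linalg.lstsq(A, b, rcond=None)

# Tikhonov-regularized solution: alpha trades fidelity to the data against
# the penalty on ||x||, damping the noise-amplified components
alpha = 1e-4
x_tik = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ b)

err_naive = np.linalg.norm(x_naive - x_true)
err_tik = np.linalg.norm(x_tik - x_true)
```

The regularized reconstruction is far closer to the true input; in the Bayesian view, the penalty α‖*x*‖² plays the role of a zero-mean Gaussian prior on *x*.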

## SUPPLEMENTARY MATERIAL

The supplementary material shows the Arduino setup for the lime temperature measurements, convergence diagnostics for the MCMC sampling, the prior and posterior predictive checks, and the graphical solution to the time-reversal problem.

## ACKNOWLEDGMENTS

F.G.W. acknowledges NSF under Award No. 1920945 for support. C.M.S. acknowledges support from the US Department of Homeland Security Countering Weapons of Mass Destruction under CWMD Academic Research Initiative Cooperative Agreement No. 21CWDARI00043. This support does not constitute an expressed or implied endorsement on the part of the Government. The authors thank Edward Celarier, Luther Mahoney, and the anonymous reviewers for feedback on the manuscript.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Faaiq G. Waqar**: Conceptualization (equal); Data curation (lead); Formal analysis (equal); Investigation (equal); Methodology (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Swati Patel**: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). **Cory M. Simon**: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Methodology (equal); Project administration (equal); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

All data and Julia code to reproduce the results/plots in this article are openly available in Github at https://github.com/faaiqgwaqar/Inverse-Problems.^{124}

## REFERENCES

- *Parameter Estimation and Inverse Problems*
- *Inverse Problems: Activities for Undergraduates*
- *Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering*
- *An Introduction to Inverse Problems with Applications*
- *Inverse Problems: Tikhonov Theory and Algorithms*
- *Elements of the Theory of Inverse Problems*
- *Linear and Nonlinear Inverse Problems with Practical Applications*
- *An Introduction to Data Analysis and Uncertainty Quantification for Inverse Problems*
- *Statistical and Computational Inverse Problems*
- *Mathematical Modelling*
- *Nonlinear Least Squares for Inverse Problems: Theoretical Foundations and Step-by-Step Guide for Applications*
- *Handbook of Uncertainty Quantification*
- *Inverse Problem Theory and Methods for Model Parameter Estimation*
- *Bayesian Methods for the Physical Sciences*
- *An Introduction to Bayesian Analysis: Theory and Methods*
- *Monte Carlo Statistical Methods*
- *Introduction to Bayesian Statistics*
- *Fundamentals of Thermal-Fluid Sciences*
- *Transport Phenomena*
- *Exploring Engineering*
- *Large-Scale Inverse Problems and Quantification of Uncertainty*
- *Future of Software Engineering Proceedings*
- *Inverse Problems for Partial Differential Equations*
- *Introduction to Inverse Problems for Differential Equations*
- *Inverse Problems: Basics, Theory and Applications in Geophysics*
- *Inverse Problems in the Mathematical Sciences*
- *Gaussian Processes for Machine Learning*
- *Machine Learning: A Bayesian and Optimization Perspective*
- *An Adaptive Metropolis Algorithm*
- *Handbook of Approximate Bayesian Computation*
- *Linear Inverse Problems and Tikhonov Regularization*
- *Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics*, PMLR