A tutorial on the Bayesian statistical approach to inverse problems

Inverse problems are ubiquitous in the sciences and engineering. Two categories of inverse problems concerning a physical system are (1) estimate parameters in a model of the system from observed input-output pairs and (2) given a model of the system, reconstruct the input to it that caused some observed output. Applied inverse problems are challenging because a solution may (i) not exist, (ii) not be unique, or (iii) be sensitive to measurement noise contaminating the data. Bayesian statistical inversion (BSI) is an approach to tackle ill-posed and/or ill-conditioned inverse problems. Advantageously, BSI provides a "solution" that (i) quantifies uncertainty by assigning a probability to each possible value of the unknown parameter/input and (ii) incorporates prior information and beliefs about the parameter/input. Herein, we provide a tutorial of BSI for inverse problems, by way of illustrative examples dealing with heat transfer from ambient air to a cold lime fruit. First, we use BSI to infer a parameter in a dynamic model of the lime temperature from measurements of the lime temperature over time. Second, we use BSI to reconstruct the initial condition of the lime from a measurement of its temperature later in time. We demonstrate the incorporation of prior information, visualize the posterior distributions of the parameter/initial condition, and show posterior samples of lime temperature trajectories from the model. Our tutorial aims to reach a wide range of scientists and engineers.


Introduction

Mathematical models of physical systems
In the sciences and engineering, we often apply physical principles to develop a mathematical model of a physical system in order to capture some phenomenon of interest. Physics-based models are useful for acquiring insights and understanding about a system, explaining observations, guiding future experiments, discovering new scientific questions, and making predictions [1].
Abstractly, a physics-based model of a system is typically an operator f_β : x → y that (i) predicts the output y in response to any given input x and (ii) contains a vector β of physical parameters characterizing the system. See Fig. 1. The input and output could be scalars, vectors, or functions. Evaluating the operator f_β at an input x could constitute evaluating a function, solving a system of algebraic equations, multiplying by a matrix, solving a differential equation, evaluating an integral, or running a computer simulation [2,3]. Note, the abstraction of a physical system in Fig. 1 does not require the system to have mass and/or energy input/output; by definition, the input is merely a causal factor set in the beginning of the experiment, and the output is an effect/result of the input [4].

The forward problem
The forward problem is: given the model structure f_β(x) and its parameters β, predict the output y of the physical system in response to a given input x, i.e., predict the effect of a cause.
The forward problem is solved by evaluating f β (x).
Forward problems tend to be (but are not always¹) well-posed and well-conditioned (i.e., have a unique solution that is insensitive to error in the data x input to the problem) [6].

Inverse problems
Mathematicians find it difficult to define "inverse problem", yet most recognize one when they see it.
Inverse problems are ubiquitous in the sciences and engineering. Parameter identification is often involved in the development of a quantitatively predictive model. Reconstruction problems emerge when measurements we ideally wish to make in order to probe a system are infeasible, inaccessible, invasive, dangerous, and/or destructive; e.g., (i) sensing or imaging using indirect measurements², (ii) making inferences about the interior of a domain from measurements on its boundary, and (iii) time reversal: reconstructing the past state of a system from a measurement of its current state [2,3]. A famous inverse problem, one of reconstruction, is "can one hear the shape of a drum?" [15-18].

² An elementary example is exploiting a thermal expansion model of mercury to infer the temperature of the air from a measurement of the volume of mercury in a thermometer [13,14].
Can one hear the shape of a drum?
Suppose we strike a drumhead with a stick at t = 0. The force induces transverse waves in the drumhead, which transmit to the air to produce longitudinal waves in the air (sound).
Treating the drumhead as a homogeneous, elastic membrane, the wave equation is a mathematical model for the vertical displacement u(x, t) of the membrane at a point x ∈ Ω and time t ≥ 0:

∂²u/∂t² = c² Δ_x u for x ∈ Ω, t > 0, with u|_∂Ω = 0 (clamped boundary conditions). (1)

The parameter c > 0 (whose square is the ratio of the spatially-uniform tension in the membrane to its density) sets the wave speed; Δ_x is the Laplace operator; Ω ⊂ R² defines the geometry of the membrane (∂Ω ⊂ R² is the boundary of Ω). Damping is neglected. Eqn. 1 is subject to an initial position and velocity set by the striking drumstick [19]. The pure tones the drumhead can produce are dictated by standing wave solutions of the wave equation, e^{iωt} φ(x). Substituting into eqn. 1, we find the frequency ω of a pure tone of the drumhead is related to an eigenvalue of the Laplace operator on the domain Ω:

−Δ_x φ = (ω/c)² φ for x ∈ Ω, with φ|_∂Ω = 0 (clamped boundary conditions). (2)

According to eqn. 2, the membrane, characterized by its physical property c and its geometry Ω, can produce a discrete spectrum of vibration frequencies ω₁ ≤ ω₂ ≤ ··· [15]. The conventional forward problem is: given the physical property c and geometry Ω of the membrane, use eqn. 2 to determine the spectrum of vibration frequencies ω₁ ≤ ω₂ ≤ ··· it can produce. In undergraduate mathematics courses, students solve the forward problem analytically for a rectangular or circular drumhead [20]. An inverse problem is: given the physical property c of the membrane and its spectrum of vibration frequencies ω₁ ≤ ω₂ ≤ ···, use eqn. 2 to determine its geometry Ω. The set of pure tones a drum can produce contains information about the geometry of its membrane; but it is not obvious whether these pure tones (and c) uniquely determine the geometry of the membrane. This inverse problem, "can one hear the shape of a drum?", was popularized by Kac in 1966, but it was not until 1992 that Gordon, Webb, and Wolpert found two non-congruent domains with an identical spectrum of vibration frequencies [18], shown below.
That is, one cannot hear the shape of a drum [18].
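For a rectangular membrane, the forward problem can be solved in closed form: the clamped eigenfunctions are products of sines, giving frequencies ω_mn = cπ√((m/a)² + (n/b)²). A minimal Python sketch (side lengths and c are hypothetical values):

```python
import math

def rectangular_drum_frequencies(c, a, b, n_modes=5):
    """Forward problem for an a-by-b rectangular membrane with clamped
    edges: eigenfunctions sin(m*pi*x/a) * sin(n*pi*y/b) give pure-tone
    frequencies omega_mn = c * pi * sqrt((m/a)**2 + (n/b)**2)."""
    return sorted(
        c * math.pi * math.sqrt((m / a) ** 2 + (n / b) ** 2)
        for m in range(1, n_modes + 1)
        for n in range(1, n_modes + 1)
    )

# fundamental tone of a unit square membrane with c = 1:
omega_11 = rectangular_drum_frequencies(1.0, 1.0, 1.0)[0]
print(omega_11)  # pi * sqrt(2) ≈ 4.443
```

The inverse direction (recovering a and b from the spectrum) is what may fail to have a unique solution.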
Classically, a solution to an inverse problem is sought by adjusting the input x or parameter β until the model output(s) match(es)/fit(s) the observed system output(s), e.g., via least squares [21,22], to achieve consistency between the model and the data [23].

Challenges in applied inverse problems
As opposed to theoretical, exact inverse problems, in applied inverse problems [24] we must cope with noise contaminating the measurements of the system output. This noise may originate from: (i) an imperfect measurement instrument and/or (ii) variance in the system output in response to a repeated input due to (a) inherent stochasticity and/or (b) unrecognized or poorly-controlled, and thus unaccounted-for in the model, inputs/conditions that influence the output. Moreover, the model f_β(x) may inadequately approximate physical reality, perhaps, in part, owing to (ii)(b) [23]. Applied inverse problems are particularly interesting and challenging because, in contrast to forward problems, they are often ill-posed and/or ill-conditioned [6,25]; a solution to an applied inverse problem:
• may not exist, owing to noise in the measured output and model inadequacy.
-in reconstruction problems, the observed output may fall outside the image of f β , so that no physically-viable input is consistent with it.
-in parameter identification, a parameter β giving a model f β (x) that reproduces all observed input-output pairs may not exist.
• may not be unique.
-in reconstruction problems, many different inputs may be consistent with the observed output, owing to a many-to-one f β .
-in parameter identification, (i) the number of observed input-output pairs may be insufficient to fully constrain the parameters or (ii) the model may be inherently unidentifiable (i.e., no amount of data can fully constrain the parameters) [25-27].
• may depend discontinuously on, or be sensitive to, measurement noise contaminating the data. Even if the solution to the inverse problem exists and is unique, noise contaminating measurements of the system output propagates onto and corrupts the solution to the inverse problem. In ill-conditioned inverse problems [14], small realizations of noise in the data produce large changes in the solution.
To tackle applied inverse problems, these challenges compel us to account for the noise contaminating measurements of the output, reframe our concept of a "solution", and quantify our uncertainty in the solution [28,29].
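To make ill-conditioning concrete, consider a toy time-reversal problem: a quantity decays exponentially, and we reconstruct its initial value from a single noisy late-time measurement. A short Python sketch with hypothetical numbers shows the measurement error amplified by a factor e^{t/λ}:

```python
import math

# toy reconstruction problem: y(t) = x * exp(-t / lam) decays from an
# unknown initial value x; we recover x from one noisy observation of y.
lam = 1.0       # decay time scale (hypothetical)
t = 5.0         # observation time, five time scales after the start
x_true = 10.0   # true initial value we wish to reconstruct

y_clean = x_true * math.exp(-t / lam)
noise = 0.01                                         # small error in the data
x_recovered = (y_clean + noise) * math.exp(t / lam)  # invert the model

amplification = math.exp(t / lam)  # error in y is magnified by e^(t/lam) in x
print(amplification, x_recovered - x_true)  # ≈ 148.4 and ≈ 1.48
```

A 0.01 error in the data becomes an error of roughly 1.5 in the reconstruction; the later the measurement, the worse the conditioning.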

Bayesian statistical inversion (BSI)
Bayesian statistical inversion (BSI) [13,29-34] is a versatile framework to tackle [possibly] ill-posed and/or ill-conditioned applied inverse problems. BSI has two key advantages. First, BSI allows us to incorporate prior (i.e., before data is collected) information and/or beliefs about the input/parameter into our solution to the inverse problem. Second, BSI yields a probabilistic solution to inverse problems that quantifies uncertainty about the input/parameter.
Modeling uncertainty. To model uncertainty about the unknown input/parameter, we treat it as a random variable and model its probability distribution. In this Bayesian view, the probability assigned to each region of input/parameter space reflects our degree of belief that the input/parameter falls in that region. This belief is based on some combination of subjective belief and objective information [36,37]. Loosely speaking, the spread (concentration) of the probability density over the input/parameter space reflects our uncertainty (certainty) about its value.
Prior vs. posterior distributions. Our uncertainty about the input/parameter is different before and after we conduct an experiment, collect data, and compare this data with our model. Consequently, the input/parameter has a prior density before the data are considered, then an updated, posterior density after we are enlightened by the data.
Modeling the data-generating process. To acknowledge the unobservable noise that contaminates our measurements, we view the outcomes of our measurements of the system output (i.e., the data) as realizations of random variables. A model of the stochastic data-generating process follows from combining (i) the mathematical model f_β(x) of the system and (ii) a model of the probability distribution of the noise.
The two ingredients of BSI. The two key ingredients for tackling an inverse problem with BSI are:
1. the prior distribution of the input/parameter, expressing our beliefs about it before the data are collected/observed. Constructing a prior is context-dependent and involves a degree of subjectivity. A prior may be roughly categorized as diffuse, weakly informative, or informative [38], based on the amount of uncertainty it admits about the input/parameter. An informative prior, e.g., a Gaussian distribution with a small variance, expresses a high degree of certainty about the input/parameter. On the other hand, a diffuse prior, e.g., a uniform distribution, spreads its density widely over input/parameter space to express a very high degree of uncertainty [38]. An informative prior influences the posterior more than a diffuse prior, which "lets the data speak for itself" [39]. Generally, as we gather more data, the influence of the prior on the posterior tends to weaken/diminish (though it does not for non-identifiable systems [26]) as the data overrides/overwhelms the prior.
2. the likelihood function of the input/parameter, giving the probability density of the data conditioned on each value of the input/parameter. The likelihood function is constructed from (i) the data and (ii) our model of the data-generating process, which constitutes (a) the forward model and (b) a probability distribution to model the noise corrupting the measurements. The likelihood quantifies the support that the data lend to each value of the unknown input/parameter [38].
From these two ingredients, we next "turn the Bayesian crank" [40] to obtain the posterior distribution of the unknown input/parameter.
The BSI solution to an inverse problem. The BSI "solution" to the inverse problem, the posterior distribution of the unknown input/parameter, follows from the prior density and likelihood function of the input/parameter via Bayes' theorem. The posterior density of the unknown input/parameter gives the probability that the input/parameter falls in any given region of input/parameter space, conditioned on the data. The posterior updates the prior in light of the data, offers a compromise between the prior and likelihood, and constitutes the raw solution to the inverse problem that quantifies uncertainty about the input/parameter through its spread [28]. In practice, some inference engine (e.g., Markov chain Monte Carlo sampling) is usually employed to approximate the posterior [40]. We may summarize the posterior distribution of the input/parameter with a credible region (e.g., a highest-density region [41]) that contains some large fraction α of the posterior density and thus credibly contains the input/parameter (precisely, with probability α, given the assumptions in the likelihood and prior hold) [42]. Note, a Bayesian credible interval for the parameter/input is distinct from, and arguably more intuitive and natural than, a frequentist confidence interval [43].

Our contribution: a tutorial of BSI, illustrated on inverse problems of heat transfer
We provide a tutorial of BSI to solve inverse problems of both parameter identification and reconstruction while incorporating prior information and quantifying uncertainty. Our tutorial is by way of examples regarding a simple, intuitive physical process: convective heat transfer from ambient air to a cold lime fruit. We aim to engage a wide range of scientists and engineers.
In Sec. 2.1, we describe our experimental setup: (i) we take a cold lime fruit out of a refrigerator and allow it to exchange heat with indoor air via natural convection; (ii) a probe inserted into the lime measures its internal temperature. In Sec. 2.2, we pose a mathematical model governing the time-evolution of the lime temperature, based on Newton's "law" of cooling. The model contains a single parameter. To describe the data-generating process, in Sec. 2.3, we augment this model with a probabilistic model of making a noisy measurement of the lime temperature with the imperfect temperature probe. Next, in Sec. 2.4, we employ BSI to infer the parameter in the dynamic model of the lime temperature (an overdetermined inverse problem), using time series data of the lime temperature. We then employ the model of the lime temperature with the inferred parameter to tackle two time reversal problems: (Sec. 2.5, ill-conditioned) reconstruct the initial temperature of the lime, given a measurement of its current temperature and the time it has been outside of the refrigerator, and (Sec. 2.6, underdetermined) reconstruct the initial temperature of the lime and the duration it has been out of the refrigerator, given a measurement of its current temperature. The solution to each inverse problem under BSI expresses uncertainty through the posterior probability density of the parameter/initial condition. We assess the fidelity of the solution to the reconstruction problems by comparing with the measured initial condition held out from the BSI procedure. From these concrete, instructive examples, we intend for readers to recognize inverse problems in their research domain where BSI may be employed to incorporate prior information and quantify uncertainty.

A tutorial of Bayesian statistical inversion (BSI)
As a tutorial of BSI to solve inverse problems while incorporating prior information and quantifying uncertainty, we tackle a variety of inverse problems pertaining to heat exchange between ambient air and a lime fruit via natural convection.

Experimental setup: heat exchange between a lime and the air
We allowed a lime fruit (∼5 cm diameter) to reside in a refrigerator for several days. Then, at time t := t_0 [min], we removed the lime from the refrigerator, placed it on a thin slab of insulating styrofoam, and allowed it to exchange heat with the indoor air via natural convection. A resistance-based temperature sensor inserted into the lime³ allows us to measure its internal temperature at any given time t to generate a data point (t, θ_obs) ("obs" = observed). Fig. 3 shows our experimental setup and Sec. S1.1 explains our temperature sensor and Arduino microcontroller setup.

The mathematical model of the lime temperature
We develop a mathematical model governing θ = θ(t) [°C], the temperature of the lime as a function of time t ≥ t_0 [min], as it exchanges heat with the bath of ambient air at bulk (i.e., far from the lime) temperature θ_air [°C]. The mechanism of lime-air heat transfer is natural convection, which is the combined effect [44] of both (i) heat conduction and (ii) heat transport via motion of the air adjacent to the lime, driven by buoyant forces arising from changes in the density of the air due to changes in its temperature [45]. The initial temperature of the lime is θ(t_0) =: θ_0.
Assumptions. We make several simplifying assumptions:
• the temperature of the lime is spatially uniform⁴
• the bulk temperature of the air (a bath) θ_air is constant (sufficiently far from the lime)
• heat conduction between the lime and the styrofoam surface on which it sits is negligible
• the mass of the lime is constant (e.g., a negligible loss of moisture over time)
• the heat capacity of the lime is constant with temperature
• heat released/consumed due to chemical reactions (e.g., oxidation) is negligible
• the temperature probe inserted into the lime has a negligible thermal mass
• the rate of heat exchange between the air and the lime is governed by Newton's "law" of cooling

Figure 3: Experimental setup. An initially-cold lime fruit rests on a styrofoam slab and exchanges heat with the indoor air via natural convection. A temperature sensor inserted into the lime can measure its internal temperature at any given time t, producing a data point (t, θ_obs). [Panels: the system: a lime fruit; the process: heat transfer via natural convection; the measurement instrument: a temperature probe; a data point: (t, θ_obs).]

Newton's "law" of cooling. Newton's "law" of cooling [48-51] prescribes the rate of heat transfer [J/min] from the air to the lime at time t ≥ t_0 as proportional to the difference in temperature between the lime and the air (the thermodynamic driving force for heat transfer) at that time:

rate of heat transfer from air to lime ∝ θ_air − θ(t). (3)

The precise rate, hA[θ_air − θ(t)], depends on two (assumed, constant) parameters: (i) the surface area A of the lime in contact with the air and (ii) the heat transfer coefficient h for convective heat transfer between the air and the surface of the lime [44].
Differential equation model. Conservation of energy applied to the lime gives its change in temperature dθ = dθ(t) over a differential change in time dt:

C dθ = hA[θ_air − θ(t)] dt, (4)

with C [J/°C] the thermal mass of the lime. Eqn. 4 balances the amount [J] of sensible heat stored in the lime (left) and the amount [J] of heat transferred to the lime (right) over the time interval [t, t + dt). Thus, a first-order differential equation describes the time-evolution of the lime temperature [48,49]:

dθ/dt = (1/λ)[θ_air − θ(t)], θ(t_0) = θ_0. (5)

The single, lumped parameter λ := C/(hA) [min] of the model is a time constant governing the dynamics of heat transfer between the lime and the air.
Analytical solution to the model. Eqn. 5 admits an analytical solution through a variable transformation, then integration:

θ(t) = θ_air + (θ_0 − θ_air) e^{−(t−t_0)/λ}, t ≥ t_0. (6)

Starting at θ_0, the lime temperature θ(t) monotonically approaches the air temperature (lim_{t→∞} θ(t) = θ_air) as the temperature difference between the lime and air decays exponentially. The parameter λ is a time scale for the lime to reach thermal equilibrium with the air. Specifically, the lime temperature reaches a fraction 1 − e^{−1} ≈ 0.63 of the way to the air temperature after a duration t − t_0 = λ out of the refrigerator.
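A minimal Python sketch of the analytical solution (eqn. 6), with hypothetical values for λ, θ_0, and θ_air, confirms that the lime covers a fraction 1 − e^{−1} ≈ 0.63 of the temperature gap after one time constant:

```python
import math

def lime_temperature(t, lam, t0, theta0, theta_air):
    """Analytical solution of the cooling model (eqn. 6): the lime-air
    temperature difference decays exponentially with time constant lam."""
    return theta_air + (theta0 - theta_air) * math.exp(-(t - t0) / lam)

lam, t0, theta0, theta_air = 1.0, 0.0, 5.0, 20.0  # hypothetical values
theta_after_one_lam = lime_temperature(t0 + lam, lam, t0, theta0, theta_air)
frac = (theta_after_one_lam - theta0) / (theta_air - theta0)
print(frac)  # 1 - exp(-1) ≈ 0.632
```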
Fig. 4 shows the solution to the model of the lime temperature, which we write as θ(t; λ, t 0 , θ 0 , θ air ) to emphasize its dependence on the parameter λ, the initial condition (t 0 , θ 0 ), and the air temperature θ air .

The probabilistic model of the data-generating process
Consider the process of employing our temperature probe, an imperfect measurement instrument, to measure the lime temperature at time t, giving a data point (t, θ obs ).
We treat the measured temperature θ_obs as a realization of a random variable Θ_obs owing to two sources of stochasticity. First, measurement noise: unobservable noise originating from the temperature probe corrupts the measurement. Second, residual variability: under repeated experiments with identical conditions (t_0, θ_0, θ_air), the lime temperature at time t may exhibit variance due to variable conditions/inputs that are poorly-controlled or unrecognized, and thus unaccounted for in the model θ(t) [23]. For example, (i) the air temperature θ_air is not perfectly controlled and may fluctuate over time, and (ii) the opening and closing of doors in the building may introduce fluctuating air currents in the room, making the heat transfer coefficient h change over time.
We model the noise in the observed lime temperature as a random variable E_σ additive to the model prediction, independent among repeated measurements, and having an identical distribution over time. Then, our probabilistic model of the data-generating process is:

Θ_obs = θ(t; λ, t_0, θ_0, θ_air) + E_σ, (7)

where E_σ is a zero-centered Gaussian with variance σ²:

E_σ ∼ N(0, σ²). (8)

Eqn. 7 assumes the time scale for the temperature probe to thermally equilibrate with the lime is negligibly small, avoiding a time delay.
The random variable E_σ can capture noise emanating from both the measurement instrument and residual variability. However, eqn. 7 assumes the mean measured lime temperature at time t over repeated experiments with the same conditions (t_0, θ_0, θ_air) is given by the model θ(t; λ, t_0, θ_0, θ_air). I.e., our data-generating model neglects the possibility of model discrepancy: a nonzero difference between (a) the expected measured lime temperature at time t over multiple experiments with the same conditions (t_0, θ_0, θ_air) and (b) the model prediction of the lime temperature, θ(t; λ, t_0, θ_0, θ_air) [23]. Under eqn. 7, the probability density function governing the distribution of the measured lime temperature Θ_obs at time t, given λ, t_0, θ_0, θ_air, σ, is obtained by translating the density of the noise in eqn. 8 to center it at the model prediction:

π(θ_obs | λ, t_0, θ_0, θ_air, σ) = (2πσ²)^{−1/2} exp( −[θ_obs − θ(t; λ, t_0, θ_0, θ_air)]² / (2σ²) ). (9)
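The data-generating model (eqns. 7 and 8) and the resulting Gaussian measurement density can be sketched in a few lines of Python: simulate one noisy measurement, then evaluate its density centered at the model prediction. All numerical values here are hypothetical:

```python
import math
import random

def model_temp(t, lam, t0, theta0, theta_air):
    # model prediction of the lime temperature (eqn. 6)
    return theta_air + (theta0 - theta_air) * math.exp(-(t - t0) / lam)

def density_obs(theta_obs, t, lam, t0, theta0, theta_air, sigma):
    """Gaussian density of a measured temperature, centered at the
    model prediction with variance sigma**2."""
    mu = model_temp(t, lam, t0, theta0, theta_air)
    return math.exp(-(theta_obs - mu) ** 2 / (2.0 * sigma ** 2)) \
        / math.sqrt(2.0 * math.pi * sigma ** 2)

# simulate one noisy data point (eqns. 7-8); all values hypothetical:
random.seed(0)
lam, t0, theta0, theta_air, sigma = 1.0, 0.0, 6.0, 19.0, 0.2
t = 0.5
theta_obs = model_temp(t, lam, t0, theta0, theta_air) + random.gauss(0.0, sigma)
print(theta_obs, density_obs(theta_obs, t, lam, t0, theta0, theta_air, sigma))
```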

Overview of problem and approach
Task: employ BSI to infer the parameter Λ (i.e., λ treated as a random variable) characterizing the lime, appearing in the model of the lime temperature in eqn. 6, using data from a heat transfer experiment.
First, we estimate λ with a back-of-the-envelope calculation and use this estimate to construct a weakly informative prior distribution of Λ.
To admit our uncertainty about the variance of the measurement noise in our model of the data-generating process in eqn. 7, we treat Σ² as an unknown parameter to also be inferred from the data. We impose a diffuse prior distribution on Σ², with its support informed by the noise characteristics of the temperature probe.
Next, we conduct a heat transfer experiment and collect data (see Sec. 2.1). Our measurements defining the conditions of the experiment are: (i) the initial temperature of the lime, giving data (t_0 = 0, θ_{0,obs}), and (ii) the air temperature, giving data θ_{air,obs}. To admit these are noisy measurements, we treat the initial temperature of the lime Θ_0 and air temperature Θ_air as random variables to be inferred from the data, but impose informative prior distributions on them based on these measurements. Over the course of the experiment, we measure the lime temperature, giving time series data {(t_i, θ_{i,obs})}_{i=1}^N that provide information about Λ.
Finally, we use Bayes' theorem to construct the posterior distribution of (Λ, Θ 0 , Θ air , Σ) in light of the data, sample from it using a Markov Chain Monte Carlo algorithm, and obtain a credible interval for the parameter Λ that quantifies posterior uncertainty about its value.
Quick overview:
• data: the measured initial temperature of the lime (t_0 = 0, θ_{0,obs}), air temperature θ_{air,obs}, and lime temperature over the course of the experiment {(t_i, θ_{i,obs})}_{i=1}^N
• random variables to infer from the data: the parameter Λ, the initial lime temperature Θ_0, the air temperature Θ_air, the variance of the measurement noise Σ²
• sources of priors: Λ: back-of-the-envelope calculation; Θ_0, Θ_air: our noisy measurements of them; Σ²: precision of the temperature sensor
Summary of results: see Fig. 5.
Classically, this inverse problem is overdetermined because a parameter λ giving a model θ(t; λ, t_0, θ_0, θ_air) that passes through all data points {(t_i, θ_{i,obs})}_{i=1}^N does not exist⁵.

The prior distributions
Before the data {(t_i, θ_{i,obs})}_{i=1}^N are considered, in BSI, we must express our prior beliefs and information about the value of each variable via a prior probability density function.
The parameter, Λ. A back-of-the-envelope estimate of λ = C/(hA) is ≈ 1 hr, based on the following. The diameter of the lime, approximated as a sphere, is ∼5 cm. The mass of the lime is ∼100 g. The specific heat of a lime is reported as ∼4.0 kJ/(kg·°C) [46,52]. A typical heat transfer coefficient h for natural convection via a gas is ∼15 W/(m²·°C). We specify a weakly informative prior density π_prior(λ) as that of a Gaussian distribution (i) centered at our back-of-the-envelope estimate of λ, (ii) with a high variance to reflect our low confidence in this rough estimate, and (iii) truncated at zero to enforce positivity. See Fig. 5b.
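The back-of-the-envelope arithmetic for λ = C/(hA) can be checked directly; the numbers below are the rough values quoted above:

```python
import math

# rough numbers quoted in the text for the back-of-the-envelope estimate:
m = 0.1        # mass of the lime [kg]
cp = 4000.0    # specific heat [J/(kg·°C)]
r = 0.025      # radius of the lime, approximated as a sphere [m]
h = 15.0       # natural-convection heat transfer coefficient [W/(m²·°C)]

C = m * cp                      # thermal mass [J/°C]
A = 4.0 * math.pi * r ** 2      # surface area of the sphere [m²]
lam_hr = C / (h * A) / 3600.0   # time constant, converted to hours
print(lam_hr)  # ≈ 0.94 hr, i.e., on the order of 1 hr
```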
The experimental conditions, Θ_0 and Θ_air. We impose informative prior distributions on the initial lime temperature Θ_0 and air temperature Θ_air based on our (noisy) measurements of them.

The variance of the measurement noise, Σ². Our prior distribution of the standard deviation of the measurement noise, reflecting our beliefs about the precision of the temperature probe, is uniform, Σ ∼ U(•), where U(•) is a uniform distribution over the interval •.
The joint prior distribution. The joint prior distribution of all of the unknowns for this inverse problem factorizes, since we imposed independent priors:

π_prior(λ, θ_0, θ_air, σ) = π_prior(λ) π_prior(θ_0) π_prior(θ_air) π_prior(σ). (14)

Independence corresponds to plausible assumptions, including that the parameter λ of the lime has no causal link with the air temperature.

The data and likelihood function
The data. The data from the heat transfer experiment are displayed in Fig. 5a:
• the initial condition of the lime (t_0 = 0, θ_{0,obs})
• the air temperature θ_{air,obs}
• the lime temperature over the course of the experiment {(t_i, θ_{i,obs})}_{i=1}^N.
The likelihood function. The likelihood function gives the probability density of the data {(t_i, θ_{i,obs})}_{i=1}^N conditioned on each possible value of the parameters Λ = λ and Σ = σ, and experimental conditions Θ_0 = θ_0 and Θ_air = θ_air. We construct the likelihood from (i) the data {(t_i, θ_{i,obs})}_{i=1}^N and (ii) the model of the data-generating process in eqn. 9. The likelihood function is:

π_likelihood({(t_i, θ_{i,obs})}_{i=1}^N | λ, θ_0, θ_air, σ) = ∏_{i=1}^N π(θ_{i,obs} | λ, t_0, θ_0, θ_air, σ). (15)

The likelihood factorizes because we model the measurement noise E_σ as an independent random variable. Note, inherently, the likelihood is conditioned on the model structure as well.
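The log of the likelihood function above can be sketched in Python, evaluated on hypothetical data; parameter values that fit the data better earn a higher likelihood:

```python
import math

def model_temp(t, lam, t0, theta0, theta_air):
    # model prediction of the lime temperature (eqn. 6)
    return theta_air + (theta0 - theta_air) * math.exp(-(t - t0) / lam)

def log_likelihood(data, lam, t0, theta0, theta_air, sigma):
    """Log of the likelihood: a product of independent Gaussian
    densities, one per data point, centered at the model predictions."""
    ll = 0.0
    for t_i, theta_obs_i in data:
        mu = model_temp(t_i, lam, t0, theta0, theta_air)
        ll += -0.5 * math.log(2.0 * math.pi * sigma ** 2) \
            - (theta_obs_i - mu) ** 2 / (2.0 * sigma ** 2)
    return ll

# hypothetical time series; a well-fitting lam earns a higher likelihood:
data = [(0.0, 6.1), (0.5, 11.0), (1.0, 14.2), (2.0, 17.3)]
good = log_likelihood(data, lam=1.0, t0=0.0, theta0=6.0, theta_air=19.0, sigma=0.3)
bad = log_likelihood(data, lam=3.0, t0=0.0, theta0=6.0, theta_air=19.0, sigma=0.3)
print(good > bad)  # True
```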
Since we possess the data {(t_i, θ_{i,obs})}_{i=1}^N at this stage, we (i) view the likelihood as a function of the unknowns (λ, θ_0, θ_air, σ) and (ii) interpret it as a measure of the support the data lend to each value of the unknowns (λ, θ_0, θ_air, σ) [38].
The posterior distribution. Via Bayes' theorem, the posterior density of the unknowns, conditioned on the data, is proportional to the product of the likelihood and the joint prior:

π_posterior(λ, θ_0, θ_air, σ | {(t_i, θ_{i,obs})}_{i=1}^N) ∝ π_likelihood({(t_i, θ_{i,obs})}_{i=1}^N | λ, θ_0, θ_air, σ) π_prior(λ, θ_0, θ_air, σ). (16)

We are particularly interested in the posterior distribution of the parameter Λ, with (Θ_0, Θ_air, Σ) marginalized out. The posterior:
• reflects a compromise between (i) our prior knowledge and beliefs about (λ, θ_0, θ_air, σ) and (ii) the support the data {(t_i, θ_{i,obs})}_{i=1}^N lend to (λ, θ_0, θ_air, σ), according to our model of the data-generating process.

Remark on sources of posterior uncertainty about Λ. Uncertainty about Λ remains even after the data {(t_i, θ_{i,obs})}_{i=1}^N are considered. Sources of this posterior uncertainty owe to a lack of data (small N) coupled with two causes of noise captured through our model of the data-generating process in eqn. 7 [23]: (i) measurements of the lime temperature being corrupted by [unobservable] noise from using an imperfect temperature sensor and (ii) unrecognized and/or poorly-controlled inputs/conditions that influence the lime temperature and vary over the course of the experiment. However, our posterior distribution does not capture uncertainty due to (i) residual variability: unrecognized and/or poorly-controlled inputs/conditions that influence the lime temperature and vary from experiment-to-experiment (because we conduct only a single experiment), or (ii) model inadequacy: when θ(t) does not faithfully predict the expected (over many experiments under the same conditions (t_0, θ_0, θ_air)) measured lime temperature (perhaps, in part, owing to (i)), which would introduce bias.
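Combining the (log) prior and (log) likelihood gives an unnormalized log-posterior, which is all an MCMC sampler needs. A Python sketch with hypothetical prior parameters and data (the actual priors and data of this experiment differ):

```python
import math

def model_temp(t, lam, t0, theta0, theta_air):
    # model prediction of the lime temperature (eqn. 6)
    return theta_air + (theta0 - theta_air) * math.exp(-(t - t0) / lam)

def log_gauss(x, mu, sigma):
    # log-density of a Gaussian N(mu, sigma^2)
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) \
        - (x - mu) ** 2 / (2.0 * sigma ** 2)

def log_posterior_unnorm(lam, theta0, theta_air, sigma, data):
    """log(prior) + log(likelihood), i.e., eqn. 16 up to the constant
    evidence. All prior parameters below are hypothetical stand-ins."""
    if lam <= 0.0 or not (0.0 < sigma < 1.0):  # truncated / uniform supports
        return -math.inf
    lp = log_gauss(lam, 1.0, 0.3)          # weakly informative prior on lam [hr]
    lp += log_gauss(theta0, 6.0, 0.5)      # informative prior: measured theta0
    lp += log_gauss(theta_air, 19.0, 0.5)  # informative prior: measured theta_air
    for t_i, obs_i in data:                # likelihood of the time series
        lp += log_gauss(obs_i, model_temp(t_i, lam, 0.0, theta0, theta_air), sigma)
    return lp

data = [(0.0, 6.1), (0.5, 11.0), (1.0, 14.2), (2.0, 17.3)]
print(log_posterior_unnorm(1.0, 6.0, 19.0, 0.3, data))
```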
Sampling from the posterior. We employ a Markov chain Monte Carlo (MCMC) algorithm, the No-U-Turn Sampler (NUTS) [53] implemented in Turing.jl [54] in Julia [55], to obtain samples from the joint posterior distribution in eqn. 16 in order to (i) approximate it with an empirical distribution using kernel density estimation and (ii) compute means and credible intervals of the unknowns. Over four independent chains, we collect 2,500 samples/chain, with the first half discarded for burn-in.

Markov chain Monte Carlo (MCMC) sampling from the posterior distribution
In a general setting for parameter inference via BSI, let
• U ∈ R^K be the random vector of the K unknown parameters in the inverse problem
• d ∈ R^N be the data vector of noisy measurements/observations.
The posterior density of U follows from Bayes' theorem [42]:

π(u | d) = π(d | u) π_prior(u) / π_evidence(d).

The denominator, the evidence, is the probability density of the data: a constant that serves as a normalizing factor for the numerator:

π_evidence(d) = ∫_{R^K} π(d | u) π_prior(u) du.

Usually, we cannot analytically evaluate this K-dimensional integral. If K is large, it may be intractable to approximate π_evidence(d) with numerical cubature as well. The same difficulty may arise for the integral to (i) compute the mean of the posterior or (ii) marginalize out a subset of the unknowns we are less concerned with [40]. MCMC methods permit us to obtain samples u_1, ..., u_n from the posterior density π(u | d) while only knowing a function to which it is proportional (as in eqn. 16). This circumvents the need to compute π_evidence(d). From the samples u_1, ..., u_n, we can (i) construct an empirical posterior distribution using kernel density estimation [56] and (ii) approximate (a) the mean of the posterior from the mean of the samples and (b) an equal-tailed credible interval of any unknown from the percentiles of its samples. Note, we can construct the empirical marginal posterior distribution of a subset of unknowns trivially, by ignoring the remaining dimensions of the sampled vectors.
The idea behind an MCMC method is to (i) construct a Markov chain U 1 , U 2 , ... whose (a) state space is the parameter space, and (b) transition kernel governing the probability of transitioning from one state u to another u endows the chain with a stationary distribution equal to the posterior distribution π(u) then (ii) simulate the Markov chain to obtain a realization u 1 , ..., u n , regarded as (serially-correlated) samples from π(u).[38,40,57,58] Perhaps the simplest MCMC simulation algorithm to understand is random walk Metropolis [40,57].Here, a realization u 1 , ..., u n of a Markov chain U 1 , ..., U n is obtained by iterating a stochastic process of "propose then accept-or-reject" n times.Suppose u is the current state in the chain.We propose to move to a new state u , chosen randomly according to an isotropic random walk starting at u. Ie., the proposed new state is a random variable U = u + ∆U where ∆U ∼ N (0, σ 2 I).We accept this proposed state transition with probability and reject it otherwise, staying at u.This rule (i) always accepts proposed moves "uphill" to a state u with higher density than u and (ii) occasionally accepts moves "downhill".Note, the rule only requires the ratio of the densities of the two states.Hence, the normalization factor cancels and π evidence (u) is not needed.Together, this proposal distribution and acceptance rule specifies a transition kernel π(u | u) that grants the Markov chain U 1 , U 2 , ... 
a stationary density equal to π(u). Consequently, u_1, ..., u_n are serially correlated samples from the posterior π(u). [40] The scale parameter σ in the random walk proposal distribution dictates the efficiency of the sampling, in terms of the amount of serial correlation in the samples. If σ is too small, too many random walk steps are required to explore the state space. If σ is too large, too many proposals land in regions of low density and are rejected, making the walker stay in place. Both extremes make the sampler inefficient. [57, 59] NUTS [53], an extension of Hamiltonian Monte Carlo (HMC) [60], is a more efficient MCMC sampler than random walk Metropolis, owing to its more intelligent proposal scheme than a random walk in state space.
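As a concrete illustration of the propose-then-accept-or-reject iteration, here is a minimal random walk Metropolis sketch in Python. It is not the paper's implementation; the target (a standard normal, standing in for an unnormalized posterior), the step scale, and all names are illustrative.

```python
import numpy as np

def rw_metropolis(log_target, u0, n, sigma, rng):
    """Random walk Metropolis: sample a density known only up to a constant."""
    u = np.asarray(u0, dtype=float)
    samples = np.empty((n, u.size))
    log_p = log_target(u)
    for i in range(n):
        # propose an isotropic random walk step, Delta U ~ N(0, sigma^2 I)
        u_prop = u + sigma * rng.standard_normal(u.size)
        log_p_prop = log_target(u_prop)
        # accept with probability min(1, pi(u') / pi(u)); the evidence cancels
        if np.log(rng.uniform()) < log_p_prop - log_p:
            u, log_p = u_prop, log_p_prop
        samples[i] = u  # on rejection, the walker stays at u
    return samples

# stand-in target: an (unnormalized) standard normal log-density
rng = np.random.default_rng(0)
s = rw_metropolis(lambda u: -0.5 * np.sum(u**2), [0.0], 20000, 1.0, rng)
print(s.mean(), s.std())  # should be near 0 and 1
```

Tuning sigma trades off the acceptance rate against the step size, per the discussion above.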

Summary of results
Fig. 5a shows all data from the heat transfer experiment that we employ to infer the parameter Λ with BSI. Fig. 5b compares the prior distribution of Λ with its (marginal) empirical posterior distribution; the bar shows the equal-tailed 90% posterior credible interval for Λ. By definition, the true parameter λ of the lime is situated in this interval with 90% probability, falls below it with 5% probability, and falls above it with 5% probability. The width of the interval, then, reflects our posterior uncertainty about Λ. This interpretation is predicated upon our model of the data-generating process and prior assumptions holding. Fig. 5c illustrates the posterior distribution over functions θ(t) modeling the lime temperature by showing a random sample of 100 realizations of models for the lime temperature, θ(t; λ, t_0 = 0, θ_0, θ_air), with (λ, θ_0, θ_air) a sample from the posterior distribution. The models fit the data {(t_i, θ_{i,obs})}_{i=1}^N well and exhibit little variance (see the residual plot in Fig. S3; the mean posterior model of the lime temperature matches each data point to within ±0.25 °C), reflecting low uncertainty about the parameter λ in light of the data.
We found little correlation (Pearson correlation 0.01) between Λ and Σ in the joint posterior distribution. The (marginal) posterior distributions of Λ and Σ are well-approximated as independent Gaussian distributions, N(0.98 hr, (0.02 hr)²) and N(0.16 …), respectively. To assess the efficiency of the MCMC sampler and gain confidence that the MCMC samples reliably approximate the posterior distribution [58], we drew trace plots and visualized the empirical distribution over four independent chains. See Fig. S2.

Overview of problem and approach
Task: employ BSI to infer the initial (t_0 = 0) temperature of the lime, Θ_0 (i.e., θ_0 treated as a random variable), based on a measurement of its temperature at time t′ > t_0, θ′_obs, and the measured air temperature θ_air,obs.
First, we impose a diffuse prior distribution on Θ 0 based on the range of temperatures encountered in refrigerators.
Second, we impose informative prior distributions on the parameter Λ characterizing the lime and the variance of the measurement noise Σ², based on the posterior distributions from our parameter identification activity in Sec. 2.4. "Yesterday's posterior is today's prior." [61] Next, we conduct another heat transfer experiment on the same lime (see Sec. 2.1). Our measurement defining the condition of the experiment is the measured air temperature, θ_air,obs. To (indirectly) gather information about Θ_0, we measure the temperature of the lime at time t′, i.e., a duration t′ after it was taken out of the refrigerator, giving θ′_obs.
Finally, we use Bayes' theorem to construct the posterior distribution of (Θ 0 , Λ, Θ air , Σ) in light of the data, sample from it using an MCMC algorithm, and obtain a credible interval for the initial lime temperature Θ 0 that quantifies posterior uncertainty about its value.
Note, we also measured the initial temperature of the lime, but we hold the data (t 0 = 0, θ 0,obs ) out from the BSI procedure to test the fidelity of the posterior distribution of Θ 0 .

Quick overview:
• data: the measured air temperature θ_air,obs and lime temperature measurement (t′, θ′_obs), with t′ > t_0 = 0.
• random variables to infer from the data: the initial lime temperature Θ_0, the parameter Λ, the air temperature Θ_air, and the variance of the measurement noise Σ².
• sources of priors: Θ_0: the range of temperatures encountered in refrigerators; Λ, Σ²: the posterior from our parameter identification activity in Sec. 2.4; Θ_air: our noisy measurement of the air temperature.
Summary of results: See Figs. 6 and 7.
Classically, this is a determined time-reversal problem that becomes ill-conditioned for large t′. To see the ill-conditioning, let δθ′ be the error in our measurement of θ(t′) and δθ_0 be the resulting error in our prediction of the initial temperature, θ̂_0 =: θ_0 + δθ_0. The errors δθ_0 and δθ′ are related through a perturbed version of eqn. 5:
θ_0 + δθ_0 = θ_air + (θ(t′) + δθ′ − θ_air) e^{(t′ − t_0)/λ},
implying the error in the predicted initial temperature, δθ_0 = δθ′ e^{(t′ − t_0)/λ}, grows exponentially with the time t′ > t_0 at which we take the measurement of the lime temperature. This ill-conditioning is apparent from the graphical solution to this time-reversal problem, of tracing the model trajectory of the lime temperature backwards in time, starting at the point (t′, θ(t′) + δθ′), back to (t_0 = 0, θ̂_0). A small error δθ′ in the measured lime temperature at t′ results in a large change in the trajectory traced backwards, if t′ is large enough to place the measured lime temperature close to the air temperature. See Fig. S4.
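The exponential error amplification can be made concrete numerically; the values of λ and δθ′ below are illustrative stand-ins, not the paper's estimates.

```python
import numpy as np

lam = 1.0          # hypothetical timescale parameter lambda, in hours
delta_theta = 0.1  # hypothetical measurement error at time t', in deg C

for t_prime in [1.0, 3.0, 6.0]:
    # the error in the reconstructed initial temperature grows as e^{(t' - t0)/lambda}
    delta_theta0 = delta_theta * np.exp(t_prime / lam)  # t0 = 0
    print(f"t' = {t_prime} hr: |delta theta_0| = {delta_theta0:.2f} deg C")
```

A 0.1 °C measurement error at t′ = 6 hr is amplified by a factor e⁶ ≈ 400 in the reconstructed initial temperature, consistent with the graphical argument above.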

The prior distributions
The initial temperature, Θ_0. We impose a diffuse prior distribution on the initial temperature of the lime based on a (generous) range of temperatures encountered in refrigerators.
The air temperature, Θ_air. We impose an informative prior distribution on the air temperature Θ_air based on our (noisy) measurement of it.
The parameter Λ and variance of the measurement noise Σ². We exploit the information we obtained about Λ and Σ during our parameter identification activity in Sec. 2.4 to construct informative prior distributions on the parameter Λ characterizing the (same) lime and the variance of the measurement noise Σ² emanating from the (same) temperature sensor. Particularly, we use the posterior distributions from Sec. 2.4. "Yesterday's posterior is today's prior." [61]
Λ ∼ N(0.98 hr, (0.02 hr)²) (23a)
The joint prior distribution. Again, the joint prior distribution of all of the unknowns, π_prior(θ_0, λ, θ_air, σ), for this inverse problem factorizes, since we impose independent priors.
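Because the priors are independent, the joint prior log-density is a sum of one-dimensional terms. The sketch below illustrates this factorization with scipy.stats; the uniform bounds on θ_0 and the numbers for θ_air and σ are hypothetical placeholders, and only the Λ prior matches the distribution quoted above.

```python
import numpy as np
from scipy import stats

# illustrative priors; bounds/means below are stand-ins, not the paper's values
prior_theta0 = stats.uniform(loc=0.0, scale=10.0)  # refrigerator range, deg C (assumed)
prior_theta_air = stats.norm(loc=18.0, scale=0.5)  # around the measured air temp (assumed)
prior_lam = stats.norm(loc=0.98, scale=0.02)       # "yesterday's posterior", from Sec. 2.4
prior_sigma = stats.halfnorm(scale=0.5)            # noise scale, deg C (assumed)

def log_prior(theta0, lam, theta_air, sigma):
    # independent priors: the joint density factorizes, so the log densities add
    return (prior_theta0.logpdf(theta0) + prior_lam.logpdf(lam)
            + prior_theta_air.logpdf(theta_air) + prior_sigma.logpdf(sigma))
```

Values of θ_0 outside the assumed refrigerator range get zero prior density (log-density of minus infinity), i.e., they are excluded a priori.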

The data and likelihood function
The data. The data from the second heat transfer experiment are displayed in Fig. 6a:
• the measured air temperature, θ_air,obs
• a single measurement of the lime temperature, (t′, θ′_obs), with t′ > t_0 = 0.
Note, the time the lime was taken out of the refrigerator t 0 = 0 is known.
The likelihood function. The likelihood function gives the probability density of the datum θ′_obs conditioned on each possible value of the parameters Λ = λ and Σ = σ, and experimental conditions Θ_0 = θ_0 and Θ_air = θ_air. We construct the likelihood from the (i) datum (t′, θ′_obs) and (ii) model of the data-generating process in eqn. 7. The likelihood function is:
π(θ′_obs | θ_0, λ, θ_air, σ) = (2πσ²)^{−1/2} exp(−[θ′_obs − θ(t′; λ, t_0 = 0, θ_0, θ_air)]² / (2σ²)).
It quantifies the support the datum (t′, θ′_obs) lends to each value of the unknowns (θ_0, λ, θ_air, σ).
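Assuming the Gaussian measurement-noise model above and the exponential-relaxation form of the lime temperature model (eqn. 5), the log-likelihood can be sketched as follows; the function and variable names are ours, not the paper's.

```python
import numpy as np

def lime_temp(t, lam, theta0, theta_air, t0=0.0):
    # eqn. 5: exponential relaxation of the lime temperature to the air temperature
    return theta_air + (theta0 - theta_air) * np.exp(-(t - t0) / lam)

def log_likelihood(theta_obs, t_prime, theta0, lam, theta_air, sigma):
    # Gaussian measurement noise: theta'_obs ~ N(theta(t'; ...), sigma^2)
    mu = lime_temp(t_prime, lam, theta0, theta_air)
    return -0.5 * np.log(2 * np.pi * sigma**2) - (theta_obs - mu)**2 / (2 * sigma**2)
```

Multiplying (adding, in log space) this likelihood by the prior density gives the unnormalized posterior that the MCMC sampler targets.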

The posterior distribution
The (joint) posterior density governs the probability distribution of the unknowns (Θ_0, Λ, Θ_air, Σ) conditioned on the data (t′, θ′_obs). By Bayes' theorem, the posterior density is proportional to the product of the likelihood function and prior density:
π(θ_0, λ, θ_air, σ | θ′_obs) ∝ π(θ′_obs | θ_0, λ, θ_air, σ) π_prior(θ_0, λ, θ_air, σ).
We are particularly interested in the posterior distribution of the initial lime temperature Θ_0, with (Λ, Θ_air, Σ) marginalized out.
Again, we employ NUTS to obtain samples from the posterior then construct an empirical approximation of it.
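Given posterior samples (from NUTS or any other MCMC sampler), the empirical summaries described earlier follow directly; the draws below are synthetic stand-ins for illustration.

```python
import numpy as np
from scipy import stats

# pretend these are MCMC draws of Theta_0 from the posterior (synthetic stand-ins)
rng = np.random.default_rng(1)
theta0_samples = rng.normal(4.0, 1.5, size=5000)  # deg C

# (a) posterior mean and (b) 90% equal-tailed credible interval, from percentiles
mean = theta0_samples.mean()
lo, hi = np.percentile(theta0_samples, [5, 95])

# (c) empirical (marginal) posterior density via kernel density estimation
kde = stats.gaussian_kde(theta0_samples)
print(f"mean = {mean:.2f} deg C, 90% CI = [{lo:.2f}, {hi:.2f}] deg C")
```

Marginalizing over the other unknowns amounts to simply ignoring their columns in the sampled vectors, as noted above.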

Summary of results
Fig. 6a shows the data from the heat transfer experiment that we employ to infer the initial lime temperature Θ_0 with BSI. Fig. 6b compares (i) the prior distribution of the initial lime temperature Θ_0 with (ii) its updated, (marginal) empirical posterior distribution constructed via kernel density estimation. The bar shows the 90% equal-tailed posterior credible interval for Θ_0. Notably, the hold-out test data, the measured initial lime temperature θ_0,obs, falls in this credible interval. Fig. 6c illustrates the posterior distribution of backward trajectories of the lime temperature by showing a random sample of 100 realizations of models for the lime temperature, θ(t; λ, t_0 = 0, θ_0, θ_air), with (θ_0, λ, θ_air) a sample from the posterior distribution.
Ill-conditioning. Fig. 7 shows the marginal posterior distribution of the initial lime temperature Θ_0 for various times t′ at which we measure the lime temperature to obtain (t′, θ′_obs). As t′ becomes larger, the posterior distribution of Θ_0 spreads, reflecting higher uncertainty. This illustrates the ability of BSI to capture ill-conditioning in inverse problems of reconstruction.

Inverse problem IIb: time reversal
Overview of problem and approach
Task: employ BSI to infer both the time T_0 (i.e., t_0 treated as a random variable) at which the lime was taken out of the refrigerator and the initial temperature Θ_0 of the lime, based on a measurement of the lime temperature at time t′ > t_0, θ′_obs, and the measured air temperature θ_air,obs.
First, we impose a diffuse prior distribution on Θ 0 based on the range of temperatures encountered in refrigerators and a weakly informative prior distribution on T 0 based on our sense of time passing.
Second, we impose informative prior distributions on the parameter Λ characterizing the lime and the variance of the measurement noise Σ², based on the posterior distributions from our parameter identification activity in Sec. 2.4.
Next, we conduct another heat transfer experiment^a on the same lime (see Sec. 2.1). Our measurement defining the condition of the experiment is the measured air temperature, θ_air,obs. To (indirectly) gather information about (T_0, Θ_0), we measure the temperature of the lime at time t′, giving θ′_obs. Finally, we use Bayes' theorem to construct the posterior distribution of (T_0, Θ_0, Λ, Θ_air, Σ) in light of the data, sample from it using an MCMC algorithm, and obtain an empirical marginal joint posterior distribution for the initial lime temperature Θ_0 and the time it was taken out of the refrigerator T_0 that quantifies posterior uncertainty about their values.
Note, we also measured the initial temperature of the lime and know when it was taken out of the refrigerator, but we hold this data (t 0 = 0, θ 0,obs ) out from the BSI procedure to test the fidelity of the posterior distribution of (T 0 , Θ 0 ).

Quick overview:
• data: the measured air temperature θ_air,obs and lime temperature measurement (t′, θ′_obs).
• random variables to infer from the data: the time the lime was taken out of the refrigerator T 0 , the initial lime temperature Θ 0 , the parameter Λ, the air temperature Θ air , the variance of the measurement noise Σ 2 .
• sources of priors: T_0: our human judgement of the passing of time; Θ_0: the range of temperatures encountered in refrigerators; Λ, Σ²: the posterior from our parameter identification activity in Sec. 2.4; Θ_air: our noisy measurement of the air temperature.
^a Well, we use the same data from the second heat transfer experiment in Sec. 2.5. Importantly, it is a different experiment from the one used for parameter identification.
Classically, this time reversal problem is underdetermined. Conceptually, the current condition of the lime (t′, θ′_obs) is consistent with both (1) "the lime was initially very cold and has been outside of the refrigerator for a long duration" and (2) "the lime was initially not very cold and has been outside of the refrigerator for a short duration". Mathematically, there is a curve of infinitely many solutions in the (t_0, θ_0) plane (the two primary unknowns) that satisfy the model in eqn. 5 with known (t′, θ′) and θ_air:
θ′ = θ_air + (θ_0 − θ_air) e^{−(t′ − t_0)/λ}. (26)
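The curve of classical solutions can be traced numerically by rearranging eqn. 26 to give, for each candidate t_0, the θ_0 that reproduces the measurement exactly. The measurement, air temperature, and λ below are illustrative values, not the paper's.

```python
import numpy as np

# hypothetical conditions: the single lime measurement (t', theta'), plus theta_air and lambda
t_prime, theta_prime = 1.0, 13.2   # hr, deg C (illustrative)
theta_air, lam = 18.0, 1.0         # deg C, hr (illustrative)

# eqn. 26 rearranged: every candidate t0 pairs with a theta_0 that fits the datum exactly
t0_grid = np.linspace(-1.0, 0.5, 4)
theta0_curve = theta_air + (theta_prime - theta_air) * np.exp((t_prime - t0_grid) / lam)
for t0, th0 in zip(t0_grid, theta0_curve):
    print(f"t0 = {t0:+.1f} hr  ->  theta_0 = {th0:.1f} deg C")
```

Note that the earliest candidate t_0 values pair with unphysically cold θ_0; it is the prior that down-weights such classical solutions in the BSI posterior.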
Contrasting the classical curve of solutions in eqn. 26 with a solution via BSI: (i) depending on the prior distribution, the posterior distribution in BSI may assign different weights to each of the (inherently, equally-weighted) classical solutions, and (ii) by accounting for measurement noise, BSI entertains solutions off of the curve comprising the classical solutions. This time reversal problem still becomes ill-conditioned for large t′, in that the curve described by eqn. 26 becomes sensitive to errors in the measured θ′ as t′ increases.

The prior distributions
We impose the same prior distributions on Θ_0, Θ_air, Λ, and Σ² as in Sec. 2.5. Additionally, we now impose a prior distribution on the time T_0 at which the lime was taken out of the refrigerator, based on our unreliable judgement of the passing of time (eqn. 27). This gives a joint prior distribution π_prior(t_0, θ_0, λ, θ_air, σ) for this inverse problem. We visualize the prior of (T_0, Θ_0) in Fig. 8b.

The data and likelihood function
The data. The data from the second heat transfer experiment are displayed in Fig. 8a:
• the measured air temperature, θ_air,obs
• a single measurement of the lime temperature, (t′, θ′_obs), with t′ > t_0.
The time the lime was taken out of the refrigerator t 0 is not known.
The likelihood function. The likelihood function gives the probability density of the measurement θ′_obs conditioned on each possible value of the parameters Λ = λ and Σ = σ and experimental conditions T_0 = t_0, Θ_0 = θ_0, and Θ_air = θ_air.

The posterior distribution
The (joint) posterior density governs the probability distribution of the unknowns (T_0, Θ_0, Λ, Θ_air, Σ) conditioned on the data (t′, θ′_obs). By Bayes' theorem, the posterior density is proportional to the product of the likelihood function and (joint) prior density:
π(t_0, θ_0, λ, θ_air, σ | θ′_obs) ∝ π(θ′_obs | t_0, θ_0, λ, θ_air, σ) π_prior(t_0, θ_0, λ, θ_air, σ).
We are particularly interested in the posterior distribution of the initial condition of the lime, (T_0, Θ_0), with (Λ, Θ_air, Σ) marginalized out.
Again, we employ NUTS to obtain samples from the posterior.

Summary of results
Fig. 8a shows the data from the heat transfer experiment that we employ to infer the initial condition of the lime (T 0 , Θ 0 ) with BSI.
By showing contours, Fig. 8b compares (i) the joint prior distribution of the initial condition of the lime, (T_0, Θ_0), with (ii) the updated, empirical, joint posterior distribution of (T_0, Θ_0) constructed via kernel density estimation. Because this inverse problem is underdetermined, the posterior density is spread widely over the curve in the (t_0, θ_0) plane comprising the classical solutions, in eqn. 26. However, owing to the non-uniform density and finite support of the prior distribution of (T_0, Θ_0), generally each of the classical solutions is weighted differently in the posterior density. Also, the posterior density spreads orthogonal to the curve of classical solutions, quantifying uncertainty owing to admission of measurement noise in the data. Note, the hold-out test data, the measured initial condition of the lime (t_0 = 0, θ_0,obs), falls in a region of high posterior density. But, this result is in part a consequence of the mean of our prior for T_0 in eqn. 27 being close to the true t_0 = 0. The marginal posterior distributions of T_0 and Θ_0 are compared in Fig. 8b as well, including their 90% equal-tailed posterior credible intervals. The credible interval for Θ_0 is much wider here than in inverse problem IIa in Fig. 6b, despite using the same measurement θ′_obs, because t_0 is not specified in this underdetermined inverse problem. Fig. 8c illustrates the posterior distribution of backward trajectories of the lime temperature by showing a random sample of 100 realizations of models for the lime temperature, θ(t; λ, t_0, θ_0, θ_air), with (t_0, θ_0, λ, θ_air) a sample from the posterior distribution. The intuition explaining the correlation of T_0 and Θ_0 in the joint posterior density in Fig. 8b is apparent in Fig. 8c: the data (t′, θ′_obs) are consistent with both propositions (i) "the lime was initially not very cold and taken out of the refrigerator recently" and (ii) "the lime was initially very cold and taken out of the refrigerator a while ago".

Conclusions
By way of example, we provided a tutorial of Bayesian statistical inversion (BSI) for solving inverse problems while incorporating prior information and quantifying uncertainty. Our focus was a simple, intuitive physical process: heat transfer from ambient indoor air to a cold lime fruit via natural convection. First, we developed a simple mathematical model for the lime temperature, which contains a single parameter. Then, we used a time series data set of the lime temperature to infer, via BSI, the posterior distribution of the model parameter. Next, we employed the model with the inferred parameter to tackle, via BSI, two reconstruction problems of time reversal. The first task, ill-conditioned, was to predict the initial temperature of the lime from a measurement of its temperature later in time. The second task, underdetermined, was to predict both the initial temperature of the lime and the time it was taken out of the refrigerator from a measurement of its temperature later in time. We intend for our tutorial to help scientists and engineers (i) recognize inverse problems in their domain and (ii) employ BSI to solve them, while incorporating prior information and quantifying uncertainty.
Our BSI solutions to the inverse problems involving the lime are subject to limitations. First, our mathematical model of the lime temperature relies on several simplifying assumptions listed in Sec. 2.2. The model may be more accurate if we relaxed these assumptions and amended it to account for, e.g., (i) the time-dependence of the bulk air temperature θ_air(t) and (ii) the spatial temperature gradients in the lime and its geometry and spatial heterogeneity (skin, flesh, seeds). Second, ideally, we would replicate the heat transfer experiment multiple times and use all of this time series data of the lime temperature for the inference of the parameter Λ in Sec. 2.4. This would allow the posterior distribution of Σ to capture the full residual variability of the lime temperature over repeated experiments, owing to poorly-controlled and/or unrecognized inputs/conditions that affect the lime temperature. Third, we neglected the possibility of model discrepancy [23] that could arise from these factors, an unaccounted-for source of both uncertainty and systematic bias in the posterior of the parameter/initial condition of the lime.
Classical approaches to uncertainty quantification. For overdetermined inverse problems tackled via least squares, bootstrapping [88], asymptotic theory [89], and the Hessian of the loss function [90] can provide uncertainty estimates for the unknown input/parameters and for new predictions by the model.
Model discrepancy. Model discrepancy [23] refers to a nonzero difference between the (i) expected measurement of the system output over repeated experiments under the same conditions and (ii) prediction of the system output by the model. If significantly present and not accounted for in the model of the data-generating process, model discrepancy corrupts the BSI solution to an inverse problem [91]. First, disregarded model discrepancy introduces bias, i.e., the posterior density of the parameter/input will not be centered at the true value even as more and more data are collected. Second, it leads to mis-calibrated uncertainty quantification, i.e., the posterior credible interval will not be likely to contain the true value of the parameter/input even as more and more data are collected.

To account for model discrepancy, we can modify the model of the data-generating process to explicitly include a model discrepancy function ∆(x) (a random variable dependent on the input, x), modeled as a Gaussian process [92], and employ data to infer it. I.e., we model the measured system output as Y_obs(x) = f_β(x) + ∆(x) + E, with E the noise, which by definition has mean zero.
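As a sketch of what a Gaussian process discrepancy term looks like, the code below draws realizations of ∆(x) from a GP prior with a squared-exponential kernel; the kernel hyperparameters are illustrative, and in practice ∆(x) would be inferred jointly with the model parameters.

```python
import numpy as np

def sq_exp_kernel(x1, x2, var=1.0, ell=0.5):
    # squared-exponential covariance; var and ell are illustrative hyperparameters
    return var * np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / ell**2)

x = np.linspace(0.0, 1.0, 50)                      # grid of inputs
K = sq_exp_kernel(x, x) + 1e-8 * np.eye(x.size)    # jitter for numerical stability

# prior draws of the discrepancy Delta(x): zero-mean, smooth random functions
rng = np.random.default_rng(2)
delta_draws = rng.multivariate_normal(np.zeros(x.size), K, size=3)
```

Each row of delta_draws is one smooth candidate discrepancy function; conditioning the GP on data concentrates these draws around the systematic model-data mismatch.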
Model selection. We assumed the model structure is known (see Tab. 2a). In model selection, the task is to select the best model of a physical system among a set of candidate models, given data. Bayesian approaches to model selection include the Bayesian information criterion [93] and Bayes factors [38, 94].
Other methods to approximate the posterior distribution. Here, we used the NUTS [53] Markov chain Monte Carlo method to obtain samples from the posterior distribution in order to approximate it. Other methods to approximate the posterior distribution include (i) other Monte Carlo sampling methods, such as approximate Bayesian computation (ABC) [95], sequential Monte Carlo [40], and the adaptive Metropolis algorithm [96], and (ii) obtaining a tractable analytical expression as an approximation to the posterior, e.g., via the Laplace approximation [97] or variational Bayes [97]. The BSI practitioner must weigh accuracy, speed, and ease of implementation [40] when choosing a method to approximate the posterior at hand.
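To illustrate one of these alternatives, here is a minimal Laplace approximation sketch: find the posterior mode and fit a Gaussian whose variance is the inverse Hessian of the negative log posterior at the mode. The one-dimensional toy target below is illustrative (a Gaussian likelihood times a Gaussian prior, for which the approximation happens to be exact).

```python
import numpy as np
from scipy import optimize

def neg_log_post(u):
    # toy unnormalized negative log posterior: Gaussian likelihood x Gaussian prior
    return 0.5 * (u - 2.0)**2 / 0.3**2 + 0.5 * u**2 / 1.0**2

# 1) locate the posterior mode (the MAP estimate)
u_map = optimize.minimize_scalar(neg_log_post).x

# 2) curvature at the mode via a central finite difference (the 1x1 Hessian)
h = 1e-4
hess = (neg_log_post(u_map + h) - 2 * neg_log_post(u_map) + neg_log_post(u_map - h)) / h**2

# Laplace approximation: posterior ~ N(u_map, hess^{-1})
laplace_var = 1.0 / hess
print(f"u_MAP = {u_map:.3f}, Laplace variance = {laplace_var:.4f}")
```

For non-Gaussian posteriors the same recipe gives only a local Gaussian approximation around the mode, which is the trade-off against sampling methods.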

Figure 1 :
Figure 1: The mathematical model y = f β (x), parameterized by β, predicts the output y of a physical system in response to an input x.
The parameter identification problem.
The reconstruction problem.

Figure 2 :
Figure 2: Two categories of inverse problems.

Fig. 5b
Fig. 5b compares (i) the prior distribution of the parameter Λ with (ii) its updated, (marginal) empirical posterior distribution constructed via kernel density estimation. The bar shows the equal-tailed 90% posterior credible interval for Λ, [0.94 hr, 1.01 hr]. By definition, the true parameter λ of the lime is situated in this interval with 90% probability, falls below it with 5% probability, and falls above it with 5% probability.

Figure 6 :
Figure 6: Inverse problem IIa: time reversal: infer the initial (at time t_0 = 0) temperature Θ_0 of the lime from a measurement of its temperature later, (t′, θ′_obs). (a) The data. (b) The prior and posterior distributions of Θ_0. The black bar shows its equal-tailed 90% credible interval. The vertical line shows the held-out measurement θ_0,obs. (c) A sample of 100 model trajectories θ(t; λ, t_0 = 0, θ_0, θ_air) from the posterior.

Figure 7 :
Figure 7: Inverse problem IIa: time reversal: the (empirical, marginal) posterior distribution of Θ_0 as the time at which we measure the lime temperature, t′, increases.

Figure 8 :
Figure 8: Inverse problem IIb: time reversal: infer the initial temperature Θ_0 of the lime and the time T_0 it was taken out of the refrigerator from a measurement of its temperature later, (t′, θ′_obs). (a) The data. (b) The joint and marginal prior and posterior distributions of (T_0, Θ_0). The held-out measurement of the initial condition of the lime is indicated by the black point/dashed lines. The green dashed line shows the classical solutions in eqn. 26, with λ set to the mean of the posterior from Sec. 2.4. The black bars in the marginal plots show the 90% equal-tailed credible intervals. (c) A sample of 100 model trajectories θ(t; λ, t_0, θ_0, θ_air), with (T_0, Θ_0, Λ, Θ_air) a sample from the posterior.

S1.2 Figure S2:
Figure S2: Convergence diagnostics [58] for our NUTS routine for inverse problem I. The four colors correspond to four independent Markov chains. (a) The trace plots show the parameter λ_i at step i of the Markov chain. Note, (i) approximately the same range of λ's is explored in each chain and (ii) we see good mixing of the chain, in that local maxima and minima are frequently encountered. The iterations start at 1000 because the samples from the preceding iterations were discarded as "burn-in". (b) The empirical posterior distribution of Λ. Note, over the four independent chains, the posterior distribution is approximately consistent.