Inverse problems are ubiquitous in science and engineering. Two categories of inverse problems concerning a physical system are (1) estimate parameters in a model of the system from observed input–output pairs and (2) given a model of the system, reconstruct the input to it that caused some observed output. Applied inverse problems are challenging because a solution may (i) not exist, (ii) not be unique, or (iii) be sensitive to measurement noise contaminating the data. Bayesian statistical inversion (BSI) is an approach to tackle ill-posed and/or ill-conditioned inverse problems. Advantageously, BSI provides a “solution” that (i) quantifies uncertainty by assigning a probability to each possible value of the unknown parameter/input and (ii) incorporates prior information and beliefs about the parameter/input. Herein, we provide a tutorial of BSI for inverse problems by way of illustrative examples dealing with heat transfer from ambient air to a cold lime fruit. First, we use BSI to infer a parameter in a dynamic model of the lime temperature from measurements of the lime temperature over time. Second, we use BSI to reconstruct the initial condition of the lime from a measurement of its temperature later in time. We demonstrate the incorporation of prior information, visualize the posterior distributions of the parameter/initial condition, and show posterior samples of lime temperature trajectories from the model. Our Tutorial aims to reach a wide range of scientists and engineers.
I. INTRODUCTION
A. Mathematical models of physical systems
In science and engineering, we often apply physical principles to develop a mathematical model of a physical system in order to capture some phenomenon of interest. Physics-based models are useful for acquiring insights and understanding about a system, explaining observations, guiding future experiments, discovering new scientific questions, and making predictions.1
Abstractly, a physics-based model of a system is typically an operator fβ: x ↦ y that (i) predicts the output y in response to any given input x and (ii) contains a vector β of physical parameters characterizing the system. See Fig. 1.
The input and output could be scalars, vectors, or functions. Evaluating the operator fβ at an input x could constitute evaluating an analytic expression, solving a system of algebraic equations, evaluating an integral, solving a differential equation, or running a computer simulation.2,3 Note that the abstraction in Fig. 1 is not predicated on the system having mass and/or energy input/output; broadly, by definition, the input is merely a causal factor set in the beginning of an experiment on the system, and the output is an effect/result/consequence of the input.4
B. The forward problem
The forward problem is, given the model structure fβ(x) and its parameters β, predict the output y of the physical system in response to a given input x—i.e., predict the effect of a cause.
The forward problem is solved by evaluating fβ(x).
C. Inverse problems
Mathematicians find it difficult to define “inverse problem,” yet most recognize one when they see it.
Charles W. Groetsch3
Two categories of inverse problems2,3,7–12 are (see Fig. 2)
parameter identification: determine the parameters β characterizing a system that produced a set of observed input–output pairs {(xᵢ, yᵢ)};
reconstruction: determine/reconstruct the input x′ that produced an observed system output y′, given the model parameters β; that is, predict the cause of an observed effect.
Inverse problems are ubiquitous in science and engineering. The development of a quantitatively predictive model often involves parameter identification. Reconstruction problems emerge when measurements we ideally wish to make in order to probe a system are infeasible, inaccessible, invasive, dangerous, and/or destructive, e.g., (i) sensing or imaging using indirect measurements (an elementary example: exploiting a model of the thermal expansion of mercury to infer the temperature of the air from a measurement of the volume of mercury in a mercury-in-glass thermometer13,14), (ii) making inferences about the interior of a domain from measurements on its boundary, and (iii) time reversal: reconstructing the past state of a system from a measurement of its current state.2,3
A famous inverse problem, one of reconstruction, is “can one hear the shape of a drum?”15–18
Can one hear the shape of a drum?
Suppose we strike the head of a drum with a stick at t = 0. The force induces transverse waves in the drumhead, which transmit to the air, producing longitudinal waves (sound).
Treating the drumhead as a homogeneous, elastic, two-dimensional membrane under tension, the wave equation is a mathematical model for the vertical displacement u(x, t) of the vibrating membrane at a point x ∈ Ω and time t ≥ 0,

$$\frac{\partial^2 u}{\partial t^2}(\mathbf{x}, t) = c\,\Delta_{\mathbf{x}} u(\mathbf{x}, t), \quad \mathbf{x} \in \Omega,\; t \geq 0; \qquad u(\mathbf{x}, t) = 0, \quad \mathbf{x} \in \partial\Omega. \tag{1}$$

The parameter c > 0 is the ratio of the spatially uniform tension in the membrane to its density, Δx is the Laplace operator, and Ω ⊂ ℝ² defines the geometry of the membrane (∂Ω is the boundary of Ω, where the membrane is fixed). Damping is neglected. Equation (1) is subject to an initial position and velocity set by the striking stick.19
The pure tones the drumhead can produce are dictated by standing wave solutions of the wave equation, u(x, t) = e^{iωt}ϕ(x). Substituting into Eq. (1), we find that the frequency ω of a pure tone of the drumhead is related to an eigenvalue of the Laplace operator on the domain Ω,

$$-\Delta_{\mathbf{x}} \phi(\mathbf{x}) = \frac{\omega^2}{c}\,\phi(\mathbf{x}), \quad \mathbf{x} \in \Omega; \qquad \phi(\mathbf{x}) = 0, \quad \mathbf{x} \in \partial\Omega. \tag{2}$$

According to Eq. (2), the membrane, characterized by its physical property c and its geometry Ω, can produce a discrete spectrum of vibration frequencies ω1 ≤ ω2 ≤ ⋯.15
The conventional forward problem is, given the physical property c and geometry Ω of the membrane, use Eq. (2) to determine the spectrum of vibration frequencies ω1 ≤ ω2 ≤ ⋯ it can produce. In undergraduate mathematics courses, students solve the forward problem analytically for a rectangular or circular drum head.20
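For example, for a rectangular membrane Ω = (0, a) × (0, b), separation of variables solves Eq. (2) with eigenfunctions ϕ(x) = sin(mπx₁/a) sin(nπx₂/b) for positive integers m and n, giving the pure tones

$$\omega_{m,n} = \pi\sqrt{c}\,\sqrt{\frac{m^2}{a^2} + \frac{n^2}{b^2}}, \quad m, n \in \{1, 2, \ldots\}.$$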
An inverse problem is, given the physical property c of the membrane and its spectrum of vibration frequencies ω1 ≤ ω2 ≤ ⋯, use Eq. (2) to determine its geometry Ω. Certainly, the set of pure tones a drum can produce contains information about the geometry of its membrane, but it is not obvious if these pure tones (and c) uniquely determine its geometry. This inverse problem, “can one hear the shape of a drum?,” was popularized by Kac in 1966.15 It was not until 1992 that Gordon, Webb, and Wolpert17 found two non-isometric domains with an identical spectrum of vibration frequencies, shown in Fig. 3. That is, one cannot necessarily hear the shape of a drum.18
Classically, a solution to an inverse problem is (conceptually) sought by adjusting the input x (for a reconstruction problem) or parameter β (for a parameter identification problem) until the model output(s) match(es) or (approximately) fit(s) the observed system output(s) to achieve (maximal) consistency between the model and the data,21 e.g., least squares.22,23
1. Challenges in applied inverse problems
As opposed to theoretical, exact inverse problems, in applied inverse problems,24 we must cope with noise contaminating the measurements of the system output. This noise may originate from (i) an imperfect measurement instrument and/or (ii) variance in the system output in response to a repeated input due to (a) inherent stochasticity and/or (b) unrecognized or poorly controlled, and thus unaccounted-for in the model, inputs/conditions that influence the output. In addition, the model fβ(x) may inadequately approximate physical reality—perhaps, in part, owing to (ii-b).21
Applied inverse problems are particularly interesting and challenging because, in contrast to forward problems, they are often ill-posed and/or ill-conditioned.6,25
A solution to an applied inverse problem may not exist owing to noise in the measured output and model inadequacy:
in reconstruction problems, the observed output may fall outside the image of fβ so that no physically viable input is consistent with it;
in parameter identification, a parameter β giving a model fβ(x) that exactly reproduces all observed input–output pairs may not exist.
A solution to an applied inverse problem may not be unique:
in reconstruction problems, many different inputs may be consistent with the observed output owing to a many-to-one fβ;
in parameter identification, (i) the number of observed input–output pairs may be insufficient to fully constrain the parameters or (ii) the model may be inherently unidentifiable (i.e., no amount of data can fully constrain the parameters).25–27
A solution to an applied inverse problem may depend discontinuously on, or be highly sensitive to, measurement noise contaminating the data. Even if the solution to the inverse problem exists and is unique, noise contaminating measurements of the system output propagates onto and corrupts the solution to the inverse problem. In ill-conditioned inverse problems,14 small realizations of noise in the data produce large changes in the solution.
To tackle applied inverse problems, these challenges compel us to account for the noise contaminating measurements of the output, reframe our concept of a “solution,” and quantify our uncertainty in the solution.28,29
D. Bayesian statistical inversion (BSI)
Bayesian statistical inversion (BSI)13,29–37 is a versatile framework to tackle (possibly) ill-posed and/or ill-conditioned applied inverse problems. BSI has two key advantages. First, BSI allows us to incorporate prior (i.e., before data are collected) information and/or beliefs about the input/parameter into our solution to the inverse problem. Second, BSI yields a probabilistic solution to inverse problems that quantifies uncertainty about the input/parameter.
Key references for BSI:13,38,39
Modeling uncertainty. To model uncertainty about the unknown input/parameter, we treat it as a random variable and model its probability distribution. In this Bayesian view, the probability assigned to each region of input/parameter space reflects our degree of belief that the input/parameter falls in that region. This belief is based on some combination of subjective belief and objective information.40,41 Loosely speaking, the spread of the probability density over the input/parameter space reflects our uncertainty about its value, whereas sharp concentration of the density reflects certainty.
Prior vs posterior distributions. Our uncertainty about the input/parameter is different before and after we conduct an experiment, collect data, and compare these data with our model. Consequently, the input/parameter has a prior density before the data are considered and then an updated, posterior density after we are enlightened by the data.
Modeling the data-generating process. To acknowledge the unobservable noise that contaminates our measurements, we treat the outcomes of our measurements of the system output (i.e., the data) as realizations of random variables. A model of the stochastic data-generating process follows from combining (i) the mathematical model fβ(x) of the system and (ii) a model of the probability distribution of the noise.
The two ingredients of BSI: the prior and likelihood. The two key ingredients for tackling an inverse problem with BSI are as follows:
The prior distribution of the input/parameter, expressing our beliefs about it before the data are collected/observed. The prior we construct depends on the amount of information we have about the input/parameter before the experiment, and it involves a degree of subjectivity and judgment.36,40 A prior may be roughly categorized as diffuse, weakly informative, or informative39 based on the amount of uncertainty it admits about the input/parameter. An informative prior, e.g., a Gaussian distribution with a small variance, expresses a high degree of certainty about the input/parameter. On the other hand, a diffuse prior, e.g., a uniform distribution, is more appropriate when we lack any prior information about the input/parameter. A diffuse prior adheres to the principle of indifference and spreads its density widely over input/parameter space to express a very high degree of uncertainty.39 An informative prior influences the posterior more than a diffuse prior, which “lets the data speak for itself.”42 Generally, as we gather more data, the influence of the prior on the posterior tends to (but does not for non-identifiable systems26) weaken/diminish as the data override/overwhelm the prior.
The likelihood function of the input/parameter, giving the probability density of the data conditioned on each value of the input/parameter. The likelihood function is constructed from (i) the data [our observation(s) of the system output(s), paired with inputs for the parameter identification problem] and (ii) our model of the data-generating process, which constitutes (a) the forward model fβ and (b) the probability distribution that models the noise contaminating the data. The likelihood quantifies the support that the data lend to each value of the unknown input/parameter.39
From these two ingredients, we next “turn the Bayesian crank”43 to obtain the posterior distribution of the unknown input/parameter.
The BSI solution to an inverse problem: the posterior. The BSI “solution” to the inverse problem, the posterior distribution of the unknown input/parameter, follows from the prior density and likelihood function of the input/parameter via Bayes’ theorem. The posterior density of the unknown input/parameter gives the probability that the input/parameter falls in any given region of input/parameter space, conditioned on the data. The posterior updates the prior in light of the data, offers a compromise between the prior and likelihood, and constitutes the raw solution to the inverse problem that quantifies uncertainty about the input/parameter through its degree of spread.28
For some combinations of families of likelihood and prior probability distributions [e.g., a Gaussian (conjugate) prior for a Gaussian likelihood density], we obtain a closed-form expression for the posterior distribution. Otherwise, we resort to approximating the posterior distribution (i) as a tractable analytical distribution [e.g., as a Gaussian via the Laplace (quadratic) approximation or some other parameterized distribution via variational inference] or (ii) as an empirical distribution constructed using samples from the posterior obtained from a Markov Chain Monte Carlo (MCMC) method.43,44
We may summarize the posterior distribution of the input/parameter with a credible region (e.g., a highest-density region45) that contains some large fraction α of the posterior density and thus credibly contains—given the assumptions in the likelihood and prior hold—the input/parameter (precisely, with probability α).46 Note that a Bayesian credible interval for the parameter/input is distinct from and arguably more intuitive and natural than a frequentist confidence interval.47
We next present an optional warm-up problem that gives an interpretable, closed-form posterior to illustrate and gain intuition about Bayesian inference.
Bayesian inference of the pH of a vat of wine
Suppose we work at a winery and wish to estimate the average pH, x, of Pinot Noir in a large, stirred vat. We possess a noisy pH-meter for the task.48 We adopt the Bayesian approach and view the pH as a random variable X to model our uncertainty about it.
Prior. First, we impose a Gaussian prior distribution on X based on the history of pH measurements from previous batches of Pinot Noir produced by our winery over many years,with xpr and being the mean and variance of average pH among previous batches. This prior reflects our assumption that we followed closely the fermentation protocols from previous years, sourced our grapes from the same vineyard, employed the same yeast strain for fermentation, etc.(3)Probabilistic data-generating model. We treat the extraction of a sample of wine from the vat and subsequent measurement of its pH, giving an observed pH xobs, as a stochastic process. Particularly, we treat xobs as a realization of an random variable,where is a normally distributed random variable whose variance σ2 models the imprecision of the pH-meter and variation in pH among samples from the vat. Assume that σ is known from previously observed variance in pH measurements of samples from the same vat (but different batch).(4)Data. To gather information about the pH X of the vat of the wine, we take n samples of wine from the vat and measure the pH of each sample with our noisy pH-meter, giving data {xobs,1, …, xobs,n}.
Likelihood function. Under our probabilistic data-generating model in Eq. (4), the likelihood density—the probability density of the data given the pH of the wine in the vat is x (and, implicitly, given σ)—is, assuming independent measurements,(5)Since we collected the data, we now see πlikelihood as a function of x, indicating the support the data lend for each value of x. Then, with some algebra (including completing the square), the likelihood functionwhere is the mean pH among the samples. The likelihood function diminishes as x deviates from the sample mean , and it diminishes more rapidly as n increases.(6)Posterior. The posterior, the Bayesian solution to this task at the winery, is the probability density of the pH of Pinot Noir in the vat X, given the data {xobs,1, …, xobs,n} (and, implicitly, σ). It follows from Bayes theorem that(7)After some algebra (including completing the square; the denominator is a constant we do not need to handle), we find the posterior is also Gaussian,(8)with an inverse variance, a measure of concentration of the Gaussian, ofand mean(9)Intuitively: (1) The concentration of the posterior Gaussian increases (i) as we collect more data (n increases) and (ii) if we use a more precise pH-meter (small σ), reflecting a reduction in our uncertainty about the pH. (2) The mean of the posterior is a weighted average of the sample mean pH and the prior mean pH xpr. With more data and a more precise pH-meter, we weigh the sample mean more than the prior mean. A more informative prior (smaller σpr) means we need more data (larger n) to override the prior.49–51(10)Example. Figure 4 illustrates how (top) the diffuseness of the prior influences the posterior and (bottom) how more data override the prior and reduce posterior uncertainty.
Conjugate priors. Note that the posterior distribution in Eqs. (8)–(10) belongs to the same family (Gaussian) as the prior distribution. Hence, the Gaussian prior is a conjugate prior for the Gaussian likelihood distribution in Eq. (5). Generally, a conjugate prior can simplify Bayesian inference by giving a closed-form expression for the posterior.46
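As a concrete illustration of the conjugate update in Eqs. (9) and (10), below is a minimal Julia sketch; the function name posterior_pH and the numbers in the example call are ours, chosen for illustration.

```julia
# closed-form conjugate Gaussian update for the wine example, Eqs. (9) and (10)
function posterior_pH(x_obs::Vector{Float64}, x_pr::Float64, σ_pr::Float64, σ::Float64)
    n = length(x_obs)
    x̄_obs = sum(x_obs) / n                    # sample mean pH
    inv_var = n / σ^2 + 1 / σ_pr^2            # Eq. (9): posterior concentration
    x_post = (n * x̄_obs / σ^2 + x_pr / σ_pr^2) / inv_var  # Eq. (10): weighted mean
    return x_post, sqrt(1 / inv_var)          # posterior mean and standard deviation
end

# e.g., prior pH 3.6 ± 0.15 from previous batches; meter noise σ = 0.1; three samples
x_post, σ_post = posterior_pH([3.40, 3.50, 3.45], 3.6, 0.15, 0.1)
```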
E. Our contribution: a tutorial of BSI, illustrated on inverse problems of heat transfer
We provide a tutorial of BSI to solve inverse problems of both parameter identification and reconstruction while incorporating prior information and quantifying uncertainty. Our Tutorial is by way of examples regarding a simple, intuitive physical process: convective heat transfer from ambient air to a cold lime fruit in our kitchen. We aim to engage a wide range of scientists and engineers.
In Sec. II A, we describe our experimental setup: (i) we take a cold lime fruit out of a refrigerator and allow it to exchange heat with indoor air via natural convection; (ii) a probe inserted into the lime measures its internal temperature. In Sec. II B, we pose a mathematical model governing the time-evolution of the lime temperature based on Newton’s “law” of cooling. The model contains a single parameter. To describe the data-generating process, in Sec. II C, we augment this model with a probabilistic model of taking a noisy measurement of the lime temperature with the imperfect temperature probe. Next, in Sec. II D, we employ BSI to infer the parameter in the dynamic model of the lime temperature using time series data of the lime temperature (an overdetermined inverse problem). We then employ the model of the lime temperature with the inferred parameter to tackle two time reversal problems: reconstruct the initial temperature of the lime, given a measurement of its current temperature and the time it has been outside of the refrigerator (Sec. II E, ill-conditioned), and reconstruct the initial temperature of the lime and the duration it has been out of the refrigerator, given a measurement of its current temperature (Sec. II F, underdetermined). The solution to each inverse problem under BSI expresses uncertainty through the posterior probability density of the parameter/initial condition. We assess the fidelity of the solution to the reconstruction problems by comparing with the measured initial condition held out from the BSI procedure. From these concrete, instructive examples, we intend for readers to recognize inverse problems in their research domain where BSI may be employed to incorporate prior information and quantify uncertainty.
II. A TUTORIAL OF BAYESIAN STATISTICAL INVERSION (BSI)
As a tutorial of BSI to solve inverse problems while incorporating prior information and quantifying uncertainty, we tackle a variety of inverse problems pertaining to heat exchange between ambient air and a lime fruit via natural convection.
A. Experimental setup: heat exchange between a lime and the air
We allowed a lime fruit (∼5 cm diameter) to reside in a refrigerator for several days. Then, at time t ≔ t0 (min), we removed the lime from the refrigerator, placed it on a thin slab of insulating styrofoam, and allowed it to exchange heat with the indoor air via natural convection. An electrical-resistance-based temperature sensor inserted into the lime allows us to measure its internal temperature at any given time t to generate a data point (t, θobs) (“obs” = observed). [To avoid the early-time data reflecting the (short) time dynamics of the temperature probe coming into thermal equilibrium with the lime, owing to its nonzero thermal mass, we inserted the probe into the lime before we placed it in the refrigerator, so it begins thermally equilibrated with the lime.] Figure 5 shows our experimental setup, and Sec. S1 explains our temperature sensor and Arduino microcontroller setup.
B. The mathematical model of the lime temperature
We develop a mathematical model governing θ = θ(t) (°C), the temperature of the lime as a function of time t ≥ t0 (min), as it exchanges heat with the bath of ambient air at bulk (i.e., far from the lime) temperature θair (°C). The mechanism of lime-air heat transfer is natural convection, the combined effect52 of (i) heat conduction and (ii) advection, i.e., transport via motion of the air adjacent to the lime, driven by buoyant forces arising from spatial density-gradients in the air, caused by temperature-gradients.53 The initial temperature of the lime is θ(t0) =: θ0.
Assumptions. We make several simplifying assumptions:
The temperature of the lime is spatially uniform. [We estimate the Biot number52 for this system Bi ≔ hr/k ≈ 0.6 based on (i) our measurement of the radius of the lime r ≈ 2.5 cm, (ii) the reported thermal conductivity of a lime k ≈ 0.595 J/(s m °C),54 and (iii) an estimated heat transfer coefficient for natural convection through a gas, h ≈ 15 J/(s m2 °C).52,55]
The bulk temperature of the air (a bath) θair is constant (sufficiently far from the lime).
Heat conduction between the lime and the styrofoam surface on which it sits is negligible.
The mass of the lime is constant (e.g., a negligible loss of moisture over time).
The heat capacity of the lime is constant with temperature.
Heat released/consumed due to chemical reactions (e.g., oxidation) is negligible.
The temperature probe inserted into the lime has a negligible thermal mass.
The rate of heat exchange between the air and the lime is governed by Newton’s “law” of cooling.
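Under these assumptions, an energy balance on the lime, with Newton’s “law” of cooling giving the rate of heat transfer, yields

$$C\, \frac{\mathrm{d}\theta}{\mathrm{d}t} = hA\,[\theta_{\text{air}} - \theta(t)], \quad \theta(t_0) = \theta_0, \tag{12}$$

where C (J/°C) is the heat capacity of the lime, h (J/(s m² °C)) is the lime-air heat transfer coefficient, and A (m²) is the surface area of the lime. Defining the single lumped parameter λ ≔ C/(hA), with units of time, the solution is

$$\theta(t; \lambda, t_0, \theta_0, \theta_{\text{air}}) = \theta_{\text{air}} + (\theta_0 - \theta_{\text{air}})\, e^{-(t - t_0)/\lambda}, \quad t \geq t_0. \tag{14}$$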
Figure 6 shows the solution to the model of the lime temperature, which we write as θ(t; λ, t0, θ0, θair) to emphasize its dependence on the parameter λ, the initial condition (t0, θ0), and the air temperature θair.
C. The probabilistic model of the data-generating process
Consider the process of employing our temperature probe, an imperfect measurement instrument, to measure the lime temperature at time t, giving a data point (t, θobs).
We treat the measured temperature θobs as a realization of a random variable Θobs owing to two sources of stochasticity. First, measurement noise: unobservable noise originating from the temperature probe corrupts the measurement. Second, residual variability: under repeated experiments with identical conditions (t0, θ0, θair), the observed lime temperature at time t may exhibit variance due to variable conditions/inputs that are poorly controlled or unrecognized and thus unaccounted for in our model for θ(t).21 For example, (i) the air temperature θair is not perfectly controlled and may fluctuate over time and (ii) the opening and closing of doors in the building may introduce fluctuating air currents in the room, making the heat transfer coefficient h fluctuate over time.
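Accordingly, we model a measurement of the lime temperature at time t as the model prediction corrupted by additive, zero-mean Gaussian noise,

$$\Theta_{\text{obs}} = \theta(t; \lambda, t_0, \theta_0, \theta_{\text{air}}) + E_\sigma, \quad E_\sigma \sim \mathcal{N}(0, \sigma^2). \tag{15}$$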
The random variable Eσ can capture noise emanating from both the measurement instrument and residual variability. However, Eq. (15) assumes that the mean measured lime temperature at time t over repeated experiments with the same conditions (t0, θ0, θair) is given by the model θ(t; λ, t0, θ0, θair). That is, our data-generating model neglects the possibility of model discrepancy (with physical reality), a nonzero difference between (a) the expected measured lime temperature at time t over multiple experiments with the same conditions (t0, θ0, θair) and (b) the model prediction of the lime temperature, θ(t; λ, t0, θ0, θair).21
D. Inverse problem I: Parameter identification
Overview of problem and approach
Task: employ BSI to infer the parameter Λ (i.e., λ treated as a random variable) characterizing the lime, appearing in the model of the lime temperature in Eq. (14), using data from a heat transfer experiment.
First, we estimate λ with a back-of-the-envelope calculation and use this estimate to construct a weakly informative prior distribution of Λ.
To admit our uncertainty about the variance σ2 of the measurement noise in our model of the data-generating process in Eq. (15), we also treat it as a random variable Σ2 to be inferred from the data. We impose a diffuse prior distribution on Σ2, with support chosen based on the noise characteristics of the temperature probe.
Next, we set up a heat transfer experiment with the lime (see Sec. II A). We take two measurements of the conditions of the experiment: (i) the initial temperature of the lime, giving datum (t0 = 0, θ0,obs), and (ii) the air temperature, giving datum θair,obs. To admit these are noisy measurements, we treat the initial temperature of the lime Θ0 and air temperature Θair as random variables to be inferred from the data, but impose informative prior distributions on them based on these measurements.
We then measure the lime temperature at different times over the course of the heat transfer experiment, giving time series data {(tᵢ, θᵢ,obs)}, i = 1, …, N, that provide information about Λ.
Finally, we use Bayes’ theorem to construct the posterior distribution of (Λ, Θ0, Θair, Σ) in light of the data θ1,obs, …, θN,obs, sample from it using a Markov chain Monte Carlo algorithm, and obtain a credible interval for the parameter Λ that quantifies posterior uncertainty about its value.
Quick overview:
measurements of the experimental conditions: the measured initial (t = t0 = 0) temperature of the lime θ0,obs and the air temperature θair,obs;
data collected during the experiment: the lime temperature time series data {(tᵢ, θᵢ,obs)}, i = 1, …, N;
random variables to infer from the data: the parameter Λ, the initial lime temperature Θ0, the air temperature Θair, and the variance of the measurement noise Σ2;
sources of priors: Λ: back-of-the-envelope calculation; Θ0, Θair: our noisy measurements of them; and Σ2: precision of the temperature sensor.
Summary of results: See Fig. 7.
Classically, this inverse problem is overdetermined because a parameter λ giving a model θ(t; λ, t0, θ0, θair) that passes through all data points does not exist. [Note that we do not attempt to determine each C, h, and A in Eq. (12) from the data but rather the lumped parameter λ ≔ C/(hA). The former attempt would make the system unidentifiable because multiple distinct (C, h, A) give the same λ and hence produce the same model θ(t).]
1. The experimental setup
To characterize the setup of the lime heat transfer experiment, we use the temperature probe to measure the air temperature, giving datum θair,obs, and the initial lime temperature at t = t0 ≔ 0, giving datum θ0,obs. These data are plotted in Fig. 7(a) as the horizontal dashed line and first point, respectively.
2. The prior distributions
Before the data are considered, in BSI, we must express our prior beliefs and information about the value of each variable via a prior probability density function.
The parameter, Λ. A back-of-the-envelope estimate of λ = C/(hA) is ∼1 h, based on the following (see the arithmetic below). The diameter of the lime, approximated as a sphere, is ∼5 cm. The mass of the lime is ∼100 g. The specific heat of a lime is reported as ∼4.0 kJ/(kg °C).54,60 A typical heat transfer coefficient h for natural convection via gas is 15 J/(s m² °C).52,55
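Putting these numbers together, with heat capacity C = mc_p (m the mass, c_p the specific heat) and surface area A = 4πr² of the equivalent sphere of radius r = 2.5 cm:

$$\lambda = \frac{C}{hA} = \frac{m c_p}{4\pi r^2 h} \approx \frac{(0.1\ \mathrm{kg})\,[4000\ \mathrm{J/(kg\,^\circ C)}]}{4\pi (0.025\ \mathrm{m})^2\,[15\ \mathrm{J/(s\,m^2\,^\circ C)}]} \approx 3.4 \times 10^{3}\ \mathrm{s} \approx 1\ \mathrm{h}.$$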
3. The data and likelihood function
The data. We measure and record the lime temperature at N times over the course of the heat transfer experiment, giving the time series data {(tᵢ, θᵢ,obs)}, i = 1, …, N, displayed in Fig. 7(a).
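Assuming the noise corrupting the N measurements is independent, our model of the data-generating process in Eq. (15) implies the likelihood

$$\pi_{\text{likelihood}}(\theta_{1,\text{obs}}, \ldots, \theta_{N,\text{obs}} \mid \lambda, \theta_0, \theta_{\text{air}}, \sigma) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{[\theta_{i,\text{obs}} - \theta(t_i; \lambda, t_0 = 0, \theta_0, \theta_{\text{air}})]^2}{2\sigma^2}\right).$$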
Since we possess the data at this stage, we (i) view the likelihood as a function of the unknowns (λ, θ0, θair, σ) and (ii) interpret it as a measure of the support the data lend to each value of the unknowns, (λ, θ0, θair, σ).39
4. The posterior distribution
We are particularly interested in the posterior distribution of the parameter Λ, with (Θ0, Θair, Σ) marginalized out.
The posterior density of the unknowns πposterior(λ, θ0, θair, σ∣θ1,obs, …, θN,obs)
is the raw solution to this parameter identification problem as it assigns a probability to each region of (λ, θ0, θair, σ)-space to reflect our posterior degree of belief that the unknowns (λ, θ0, θair, σ) fall in that region;
represents an update to our prior density πprior(λ, θ0, θair, σ), in light of the data θ1,obs, …, θN,obs; and
reflects a compromise between (i) our prior knowledge and beliefs about (λ, θ0, θair, σ) and (ii) the support the data lend to (λ, θ0, θair, σ), according to our model of the data-generating process.
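Concretely, Bayes’ theorem gives this posterior density, up to a normalizing constant,

$$\pi_{\text{posterior}}(\lambda, \theta_0, \theta_{\text{air}}, \sigma \mid \theta_{1,\text{obs}}, \ldots, \theta_{N,\text{obs}}) \propto \pi_{\text{likelihood}}(\theta_{1,\text{obs}}, \ldots, \theta_{N,\text{obs}} \mid \lambda, \theta_0, \theta_{\text{air}}, \sigma)\, \pi_{\text{prior}}(\lambda)\, \pi_{\text{prior}}(\theta_0)\, \pi_{\text{prior}}(\theta_{\text{air}})\, \pi_{\text{prior}}(\sigma), \tag{24}$$

where the joint prior factorizes because we impose independent priors.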
Remark on sources of posterior uncertainty about Λ. Uncertainty about Λ remains even after the data are considered. Sources of this posterior uncertainty are a lack of data (small N) coupled with two sources of noise captured through our model of the data-generating process in Eq. (15):21 (i) measurements of the lime temperature being corrupted by (unobservable) noise from using an imperfect temperature sensor and (ii) unrecognized and/or poorly controlled inputs/conditions that influence the lime temperature over the course of the experiment and result in white noise. However, our posterior distribution does not capture uncertainty due to (i) residual variability: unrecognized and/or poorly controlled inputs/conditions that influence the lime temperature and vary from experiment-to-experiment (because we infer Λ using data from only a single experiment), or (ii) model inadequacy: when θ(t) does not faithfully predict the expected [over many experiments under the same conditions (t0, θ0, θair)] measured lime temperature [perhaps, in part, owing to (i)], which would introduce bias and violate the white noise assumption in Eq. (15).
5. Sampling from the posterior
We employ a Markov Chain Monte Carlo (MCMC) algorithm, the No-U-Turn Sampler (NUTS)61 implemented in Turing.jl62 in Julia,63 to obtain samples from the joint posterior distribution in Eq. (24) in order to (i) approximate it with an empirical distribution using kernel density estimation and (ii) compute means and credible intervals of the unknowns. Over four independent chains, we collect 2500 samples/chain, with the first half discarded for warm-up.
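To make this workflow concrete, below is a minimal Turing.jl sketch of the probabilistic model and sampling call. The prior families, hyperparameters, and data placeholders (ts, θs_obs, θ₀_obs, θ_air_obs) are illustrative assumptions, not the exact specification behind our reported results.

```julia
using Turing

# analytical solution to the lime temperature model, Eq. (14), with t₀ = 0
θ_model(t, λ, θ₀, θ_air) = θ_air + (θ₀ - θ_air) * exp(-t / λ)

@model function lime_model(ts, θs_obs, θ₀_obs, θ_air_obs)
    # priors (families and hyperparameters here are illustrative)
    λ     ~ truncated(Normal(1.0, 0.3); lower=0.0)  # [h] weakly informative, ∼1 h estimate
    θ₀    ~ Normal(θ₀_obs, 1.0)                     # [°C] informative, from measurement
    θ_air ~ Normal(θ_air_obs, 1.0)                  # [°C] informative, from measurement
    σ     ~ Uniform(0.0, 1.0)                       # [°C] diffuse, from probe precision

    # likelihood, Eq. (15): model prediction plus i.i.d. Gaussian noise
    for i in eachindex(ts)
        θs_obs[i] ~ Normal(θ_model(ts[i], λ, θ₀, θ_air), σ)
    end
end

# four independent chains; keep the last 1250 of 2500 NUTS draws per chain
chain = sample(lime_model(ts, θs_obs, θ₀_obs, θ_air_obs),
               NUTS(), MCMCThreads(), 1250, 4; discard_initial=1250)
```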
Markov chain Monte Carlo (MCMC) sampling from the posterior distribution
The computational challenge of computing the posterior. In a general setting for parameter inference via BSI, let
U ≔ (U₁, …, U_K) be the random vector of the K unknown parameters in the inverse problem;
d be the data vector of noisy measurements/observations.
The posterior density of U follows from Bayes’ theorem,46

$$\pi_{\text{posterior}}(\mathbf{u} \mid \mathbf{d}) = \frac{\pi_{\text{likelihood}}(\mathbf{d} \mid \mathbf{u})\, \pi_{\text{prior}}(\mathbf{u})}{\pi_{\text{evidence}}(\mathbf{d})}. \tag{25}$$

The denominator, the evidence, is the probability density of the data—a constant that serves as a normalizing factor for the numerator,

$$\pi_{\text{evidence}}(\mathbf{d}) = \int \pi_{\text{likelihood}}(\mathbf{d} \mid \mathbf{u})\, \pi_{\text{prior}}(\mathbf{u})\, \mathrm{d}\mathbf{u}. \tag{26}$$

Typically, we cannot analytically evaluate this K-dimensional integral. If K is large, it may be intractable to use numerical cubature to approximate πevidence(d) as well. The same difficulty may arise for the integral to (i) compute the mean of the posterior or (ii) marginalize out a subset of the unknowns we are less concerned with.43

Markov chain Monte Carlo (MCMC) simulation. MCMC methods44 permit us to obtain samples u1, …, un from the posterior density π(u) with only access to evaluations of a function to which the posterior density is proportional [as in Eq. (24)]. This circumvents the need to compute πevidence(d). From the samples u1, …, un, we can (i) construct an empirical posterior distribution using kernel density estimation64 and (ii) approximate (a) the mean of the posterior from the mean of the samples and (b) an equal-tailed credible interval of any unknown from the percentiles of its samples. The approximation to the posterior improves as more samples are collected. Note that we can construct the empirical marginal posterior distribution of a subset of unknowns trivially by ignoring the remaining dimensions of the sampled vectors u1, …, un.
The idea behind an MCMC method is to (i) construct a Markov chain U1, U2, … whose (a) state space is the parameter space and (b) transition kernel πt(u′∣u) governing the probability of transitioning from one state u to another u′ endows the chain with a stationary distribution equal to the posterior distribution π(u) and then (ii) simulate the Markov chain to obtain a realization u1, …, un, regarded as (autocorrelated) samples from π(u).39,43,65,66
For example, random walk Metropolis. Perhaps the simplest MCMC simulation algorithm to understand is random walk Metropolis.43,65 Here, a realization u1, …, un of a Markov chain U1, …, Un is obtained by iterating a stochastic process of “propose then accept-or-reject” n times. We initiate the walk at some state u1 in a region with reasonably high posterior density. Within the iterations, suppose u is the current state in the chain. We propose to move to a new state u′, chosen randomly according to an isotropic random walk starting at u. That is, the proposed new state is a random variable U′ = u + ΔU, where ΔU ∼ 𝒩(0, σ²I). We accept this proposed state transition with probability

$$\alpha(\mathbf{u}' \mid \mathbf{u}) = \min\left(1, \frac{\pi(\mathbf{u}')}{\pi(\mathbf{u})}\right) \tag{27}$$

and reject it otherwise, staying at u. This rule (i) always accepts proposed moves “uphill” to a state u′ with higher density than u and (ii) occasionally accepts moves “downhill.” Note that the rule only requires the ratio of the densities of the two states. Hence, the normalization factor πevidence(d) cancels and is not needed. Together, this proposal distribution and acceptance rule specify a transition kernel πt(u′∣u) that grants the Markov chain U1, U2, … a stationary density equal to π(u). Consequently, u1, …, un are autocorrelated samples from the posterior π(u).43 The scale parameter σ in the random walk proposal distribution dictates the efficiency of the sampling, in terms of the amount of serial correlation in the samples. If σ is too small, too many proposed random walk steps are required to explore the state space. If σ is too large, too many proposals visit regions with low density and are rejected, making the walker stay in place. Both extremes make the sampler inefficient.65,67

NUTS,61 an extension of Hamiltonian Monte Carlo (HMC),68 tends to be a more efficient MCMC sampler than random walk Metropolis owing to its more judicious state transition proposal scheme, compared to a random walk in state space.
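To ground this description, here is a self-contained Julia sketch of random walk Metropolis; the function name, the log-density interface logπ_unnorm, and the storage scheme are our illustrative choices.

```julia
# minimal random walk Metropolis sketch: draw n (autocorrelated) samples from a
# density π known only up to a constant, supplied via its log,
# logπ_unnorm(u) = log[π_likelihood(d ∣ u) π_prior(u)]
function rw_metropolis(logπ_unnorm, u₁::Vector{Float64}, σ::Float64, n::Int)
    us = Vector{Vector{Float64}}(undef, n)
    us[1] = copy(u₁)  # initiate the walk in a region of high posterior density
    for i in 2:n
        u = us[i - 1]
        u′ = u + σ * randn(length(u))  # isotropic Gaussian random walk proposal
        # accept w.p. min(1, π(u′)/π(u)); the evidence cancels in the ratio
        us[i] = log(rand()) < logπ_unnorm(u′) - logπ_unnorm(u) ? u′ : u
    end
    return us
end
```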
Warm-up, stopping rules, convergence diagnostics. If it is too difficult to find a high-density region of the posterior in which to initiate the Markov chain with state u1, some first fraction of the n MCMC samples is typically discarded (i.e., not counted as samples from the posterior) as “warm-up” to allow the walker to find a region of state space with high posterior density.39,66
Stopping rules and empirical convergence diagnostic tools may ascertain, or at least give us confidence, that we have run the Markov chain sufficiently long (i.e., that n is large enough) for the sampled chain u1, …, un to give a good approximation of the posterior. Moreover, they allow us to avoid wastefully running the chain for longer than needed for our desired accuracy. An example of a stopping rule is to terminate when the effective sample size exceeds a pre-specified number. The effective sample size is the number of independent samples with the same estimation power as the autocorrelated samples from the chain, u1, …, un. An example of an empirical convergence diagnostic tool is a trace plot of a coordinate of the sampled chain u1, …, un vs the iteration number. The trace plot shows the efficiency of the Markov chain. It should resemble a horizontal, approximately uniform-thickness “hairy caterpillar” to reflect low autocorrelation between the samples, thorough exploration of the support of the posterior, and sufficient warm-up. Another convergence check is for consistency between the empirical posterior distributions from independent chains (perhaps) run in parallel.66,69,70
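Once samples are in hand, from any of these samplers, the percentile-based summaries described in this box take only a few lines of Julia; this sketch assumes the samples are stored as a vector of state vectors.

```julia
using Statistics

# posterior mean and equal-tailed credible interval of unknown k, computed from
# MCMC samples us (a vector of sampled state vectors)
function summarize(us::Vector{Vector{Float64}}, k::Int; α::Float64=0.90)
    xs = [u[k] for u in us]  # marginalize: ignore the other coordinates
    return (mean = mean(xs),
            ci = (quantile(xs, (1 - α) / 2), quantile(xs, (1 + α) / 2)))
end
```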
6. Summary of results
Figure 7(a) shows all data from the heat transfer experiment that we use to infer the parameter Λ of the lime with BSI.
Figure 7(b) compares (i) the prior distribution of the parameter Λ and (ii) its updated, (marginal) empirical posterior distribution constructed via kernel density estimation. The bar shows the equal-tailed 90% posterior credible interval for Λ, [0.94 h, 1.04 h]. By definition, the true parameter λ of the lime is situated in this interval with 90% probability, falls below it with 5% probability, and falls above it with 5% probability. The width of the interval, then, reflects our posterior uncertainty about Λ. This interpretation is predicated upon our model of the data-generating process and prior assumptions holding.
Figure 7(c) illustrates the posterior distribution over functions θ(t) modeling the lime temperature by showing a random sample of 100 realizations of models for the lime temperature, θ(t; λ, t0 = 0, θ0, θair), with (λ, θ0, θair) a sample from the posterior distribution. The models fit the data well and exhibit little variance, reflecting low uncertainty about the parameter λ in light of the data.
We found little correlation (magnitude of Pearson correlation 0.02) between Λ and Σ in the joint posterior distribution. The (marginal) posterior distributions of Λ and Σ are each well-approximated by a Gaussian distribution.
Checks. To assess the efficiency of the MCMC sampler and give confidence that the MCMC samples reliably approximate the posterior distribution,66 we drew trace plots and visualized the empirical posterior distribution of λ over four independent chains in Fig. S2. The trace plots indicate that each sampled chain exhibits little autocorrelation and thoroughly explores an interval of λ, and the empirical posterior distributions are consistent with each other.
To visually inspect (i) the amount of prior information we include about λ in the prior distribution and (ii) the consistency of the observed data with the prior distribution, we show a prior predictive check39,71 in Fig. S3, where we generated synthetic data obtained by simulating the data-generating process with parameters sampled from the prior.
To visually assess the fit of the posterior model of the lime temperature, we, in a posterior predictive check,39,71 plot the distribution of residuals between our observed data and synthetic data obtained by simulating the data-generating process with parameters sampled from the posterior in Fig. S4. The mean residual is within ±0.25 °C, indicating an excellent fit.
E. Inverse problem IIa: time reversal
Overview of problem and approach
Task: employ BSI to infer the initial temperature of the lime Θ0 (i.e., θ0 treated as a random variable) based on a single measurement of its temperature at a later time t′ > t0 = 0, giving datum (t′, θ′obs), and a measurement of the air temperature, θair,obs.
First, we impose a diffuse prior distribution on Θ0 based on the range of temperatures encountered in refrigerators.
Second, we impose informative prior distributions on the parameter Λ characterizing the lime and the variance of the measurement noise Σ2 based on the posterior distributions from our parameter identification phase in Sec. II D. “Yesterday’s posterior is today’s prior.”72
Next, we set up another heat transfer experiment on the same lime. To (partially) characterize the conditions of the experiment, we measure the air temperature, giving datum θair,obs, which we use to construct an informative prior distribution on Θair. [Note that we also measured the initial temperature of the lime, but we hold the data (t0 = 0, θ0,obs) out from the BSI procedure until the end, to test the fidelity of the posterior distribution of Θ0.]
Then, during the experiment, we (indirectly) gather information about Θ0 by measuring the temperature of the lime at time t′ > t0 = 0, i.e., a duration t′ after the lime was taken out of the refrigerator, giving datum (t′, θ′obs).
Finally, we use Bayes’ theorem to construct the posterior distribution of (Θ0, Λ, Θair, Σ) in light of the datum (t′, θ′obs), sample from the posterior using an MCMC algorithm, and obtain a credible interval for the initial lime temperature Θ0 that quantifies posterior uncertainty about its value.
Quick overview:
measurement of an experimental condition: the measured air temperature θair,obs;
datum collected during the experiment: the measured lime temperature point (t′, θ′obs) with t′ > t0 and t0 = 0;
random variables to infer from the datum: the initial lime temperature Θ0, the parameter Λ, the air temperature Θair, and the variance of the measurement noise Σ2; and
sources of priors: Θ0: range of temperatures encountered in refrigerators; Λ, Σ2: the posterior from our parameter identification phase in Sec. II D; Θair: our noisy measurement of the air temperature.
1. The experimental setup
To characterize the setup of the additional lime heat transfer experiment (different from the one for parameter identification in Sec. II D, but with the same lime), we use the temperature probe to measure the air temperature, giving datum θair,obs, plotted in Fig. 8(a) as the horizontal dashed line. (Note, we actually also measured the initial lime temperature at t = t0 ≔ 0, giving datum θ0,obs = 6.3 °C, but let us pretend we did not for now.)
2. The prior distributions
The joint prior distribution. Again, the joint prior distribution of all of the unknowns πprior(θ0, λ, θair, σ) for this inverse problem factorizes since we impose independent priors.
3. The datum and likelihood function
The datum. During the second heat transfer experiment, we take a single measurement of the lime temperature at time t′ > t0 = 0, giving the datum (t′, θ′obs) displayed in Fig. 8(a). Note that the time the lime was taken out of the refrigerator, t0 = 0, is known.
4. The posterior distribution
We are particularly interested in the posterior distribution of the initial lime temperature Θ0, with (Λ, Θair, Σ) marginalized out.
Again, we employ NUTS to obtain samples from the posterior and then construct an empirical approximation of it.
5. Summary of results
Figure 8(a) shows the data from the heat transfer experiment that we employ to infer the initial lime temperature Θ0 with BSI.
Figure 8(b) compares (i) the prior distribution of the initial lime temperature Θ0 with (ii) its updated (marginal) empirical posterior distribution constructed via kernel density estimation. The bar shows the 90% equal-tailed posterior credible interval for Θ0. Notably, the hold-out test data, the measured initial lime temperature θ0,obs, fall in this credible interval.
Figure 8(c) illustrates the posterior distribution of backward trajectories of the lime temperature by showing a random sample of 100 realizations of models for the lime temperature, θ(t; λ, t0 = 0, θ0, θair), with (θ0, λ, θair) a sample from the posterior distribution. The true initial condition is covered by the ensemble of backward trajectories.
6. Ill-conditioning
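To see the source of the ill-conditioning, solve the model in Eq. (14) for the initial temperature in terms of a (noise-free) temperature θ′ observed at time t′,

$$\theta_0 = \theta_{\text{air}} + (\theta' - \theta_{\text{air}})\, e^{(t' - t_0)/\lambda}. \tag{28}$$

Any error in the measured θ′ is amplified by the factor e^{(t′−t0)/λ}, which grows exponentially with the elapsed time t′ − t0.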
Figure 9 shows the marginal posterior distribution of the initial lime temperature Θ0 for various times t′ at which we measure the lime temperature to obtain the datum (t′, θ′obs). As t′ becomes larger, the posterior distribution of Θ0 spreads, reflecting higher uncertainty. For large t′, i.e., measurements after the lime has been outside of the refrigerator for a long time, we still have high posterior uncertainty about the initial lime temperature. That the datum has not much enlightened us about the initial lime temperature Θ0 for large t′ owes to the ill-conditioned nature of the problem, illustrated in Eq. (28). Hence, Fig. 9 demonstrates the ability of BSI to capture ill-conditioning in inverse problems of reconstruction.
F. Inverse problem IIb: Time reversal
Overview of problem and approach
Task: employ BSI to infer both the time T0 (i.e., t0 treated as a random variable) the lime was taken out of the refrigerator and the initial temperature Θ0 of the lime, based on a measurement of the lime temperature at time t′ > t0, giving datum (t′, θ′obs), and the measured air temperature θair,obs.
First, we impose a diffuse prior distribution on Θ0 based on the range of temperatures encountered in refrigerators; a weakly informative prior distribution on T0 based on our sense of time passing; and informative prior distributions on the parameter Λ characterizing the lime and the variance of the measurement noise Σ2 based on the posterior distributions from our parameter identification phase in Sec. II D.
Next, we set up another heat transfer experiment (well, we use the same data from the second heat transfer experiment in Sec. II E; importantly, it is a different experiment from the one used for parameter identification in Sec. II D) on the same lime. To (partially) characterize the condition of the experiment, we measure the air temperature, giving datum θair,obs to construct an informative prior distribution for Θair. [Note that we also measure the initial temperature of the lime θ0,obs and know the time t0 it was taken out of the refrigerator, but we hold this datum (t0, θ0,obs) out from the BSI procedure to test the fidelity of the posterior distribution of (T0, Θ0).]
Then, during the experiment, to (indirectly) gather information about (T0, Θ0), we measure the temperature of the lime at time t′ > t0, giving datum (t′, θ′obs).
Finally, we use Bayes’ theorem to construct the posterior distribution of (T0, Θ0, Λ, Θair, Σ) in light of the data, sample from it using an MCMC algorithm, and obtain an empirical marginal joint posterior distribution for the initial lime temperature Θ0 and the time it was taken out of the refrigerator T0 that quantifies posterior uncertainty about their values.
Quick overview:
measurement of an experimental condition: the measured air temperature θair,obs;
datum collected during the experiment: the measured lime temperature point (t′, θ′obs) with t′ > t0 and t0 unknown;
random variables to infer from the datum: the time the lime was taken out of the refrigerator T0, the initial lime temperature Θ0, the parameter Λ, the air temperature Θair, and the variance of the measurement noise Σ2; and
sources of priors: T0: our human judgment of the passing of time; Θ0: range of temperatures encountered in refrigerators; Λ, Σ2: the posterior from our parameter identification phase in Sec. II D; Θair: our noisy measurement of the air temperature.
1. The experimental setup
To (partially) characterize the setup of this lime heat transfer experiment, we use the temperature probe to measure the air temperature, giving datum θair,obs, plotted as the horizontal dashed line in Fig. 10(a).
2. The prior distributions
3. The datum and likelihood function
The datum. The datum from the heat transfer experiment is a single measurement of the lime temperature, (t′, θ′obs) with t′ > t0, displayed in Fig. 10(a). The time the lime was taken out of the refrigerator, t0, is not known.
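Classically, this single datum leaves the reconstruction underdetermined: for any hypothesized removal time t0 < t′, inverting the model in Eq. (14) yields a matching initial temperature, giving a one-parameter family of classical solutions,

$$\theta_0(t_0) = \theta_{\text{air,obs}} + (\theta'_{\text{obs}} - \theta_{\text{air,obs}})\, e^{(t' - t_0)/\lambda}. \tag{34}$$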
4. The posterior distribution
Again, we employ NUTS to obtain samples from the posterior.
5. Summary of results
Figure 10(a) shows the data from the heat transfer experiment that we employ to infer the initial condition of the lime (T0, Θ0) with BSI.
By showing contours, Fig. 10(b) compares (i) the joint prior distribution of the initial condition of the lime, (T0, Θ0), with (ii) the updated, empirical, joint posterior distribution of (T0, Θ0) constructed via kernel density estimation. The curve drawn in the (t0, θ0) plane shows the classical solutions, given in Eq. (34) with θ′ ≔ θ′obs, θair ≔ θair,obs, and λ set to the mean of its posterior distribution from the parameter identification phase. The ridge of the posterior density follows the curve of classical solutions. However, owing to the non-uniform density and finite support of the prior distribution of (T0, Θ0), the classical solutions are weighted differently and some are not entertained at all. The posterior density spreads orthogonal to (off) the curve of classical solutions owing to BSI accounting for noise corrupting our measurements of the air and lime temperature (unlike the classical approach); i.e., BSI entertains solutions off of the curve of classical solutions. The measured initial condition of the lime (t0 = 0, θ0,obs), held out as test data, falls in a region of high posterior density. This result is, in part, a consequence of the mean of our prior for T0 in Eq. (35) being close to the true t0 = 0. The marginal posterior distributions of T0 and Θ0 are compared in Fig. 10(b) as well, including their 90%, equal-tailed posterior credible intervals. The credible interval for Θ0 is much wider than in inverse problem IIa in Fig. 8(b), despite using the same lime temperature measurement after the same duration t′ the lime has been outside of the refrigerator, because t0 now is not specified; rather, T0 is endowed with a spread-out prior distribution, including density at values less than the true t0.
Figure 10(c) illustrates the posterior distribution of backward trajectories of the lime temperature by showing a random sample of 100 realizations of models for the lime temperature, θ(t; λ, t0, θ0, θair), with (t0, θ0, λ, θair) being a sample from the posterior distribution. The intuition explaining the correlation of T0 and Θ0 in the joint posterior density in Fig. 10(b) is apparent in Fig. 10(c): the data are consistent with both propositions: (i) “the lime was initially not very cold and taken out of the refrigerator recently” and (ii) “the lime was initially very cold and taken out of the refrigerator a while ago.”
G. Software for BSI
A variety of free and open-source probabilistic programming libraries are available, making Bayesian statistical inversion accessible to practitioners.39 Broadly, the probabilistic (computer) programming paradigm73,74 allows practitioners to (i) easily specify (a) the prior probability distributions and (b) the probabilistic forward model assumed to generate the data and then (ii) call a Markov chain Monte Carlo routine to draw samples from the posterior distribution that (implicitly) follows from the prior and likelihood. Popular Bayesian inference engines/probabilistic programming libraries include Turing.jl62 in Julia;63 PyMC3,75 Pyro,76 and TensorFlow Probability77 in Python; and Stan,78 which interfaces with several languages, including R.
BSI coding tutorial. We provide a minimal coding tutorial for BSI to tackle inverse problems I and IIa herein at https://simonensemble.github.io/pluto_nbs/bsi_tutorial.html using the probabilistic programming library Turing.jl in the Julia programming language.
H. Conclusions
By way of example, we provided a tutorial of Bayesian statistical inversion (BSI) to solve inverse problems while incorporating prior information and quantifying uncertainty. Our focus was a simple, intuitive physical process—heat transfer from ambient indoor air to a cold lime fruit via natural convection. First, we developed a simple mathematical model for the lime temperature, which contains a single parameter. Then, we used a time series dataset of the lime temperature to infer, via BSI, the posterior distribution of the model parameter. Next, we employed the model with the inferred parameter to tackle, via BSI, two reconstruction problems of time reversal. The first task, ill-conditioned, was to predict the initial temperature of the lime from a measurement of its temperature later in time. The second task, underdetermined, was to predict the initial temperature of the lime and the time it was taken out of the refrigerator from a measurement of its temperature later in time. We intend for our Tutorial to facilitate scientists and engineers to (i) recognize inverse problems in their research domain, (ii) question whether these problems are well-posed and well-conditioned, and (iii) leverage BSI to solve these problems while incorporating prior information and quantifying uncertainty.
Our BSI solutions to the inverse problems involving the lime are subject to limitations. First, our mathematical model of the lime temperature relies on several simplifying assumptions listed in Sec. II B. The model may be more accurate if we relaxed these assumptions and amended it to account for, e.g., the time-dependence of the bulk air temperature θair(t) and spatial temperature gradients in the lime in conjunction with its geometry and spatial heterogeneity (skin, flesh, seeds). Second, ideally, we would replicate the heat transfer experiment multiple times and use all of this time series data of the lime temperature for the inference of the parameter Λ in Sec. II D. This would allow the posterior distribution of Σ to capture the full residual variability of the lime temperature over repeated experiments owing to poorly controlled and/or unrecognized inputs/conditions that affect the lime temperature. Third, we neglected the possibility of model discrepancy,21 an unaccounted-for source of both systematic bias and uncertainty in the posterior of the parameter/initial condition of the lime.
III. DISCUSSION
We provided an introductory tutorial on Bayesian statistical inversion as a tool to tackle inverse problems, which could be ill-posed, while (i) incorporating prior information and (ii) providing a solution in the form of a probability density function over the input/parameter space, which quantifies uncertainty via its spread. Inverse problems are pervasive throughout sciences and engineering, e.g., in heat or radiation transfer, gravitational fields, wave scattering, tomography, and electromagnetism;79–84 vibration of springs and beams;85 imaging;7,86 fluid mechanics;87 physiology;88 epidemiology;89,90 ecology;91 geophysics;84 environmental science;92,93 palaeoclimatology;94 chemical/bio-chemical reactions;95–99 and adsorption.100 Mathematically, reconstruction problems in these domains often reduce to using data to determine (1) a vector that was linearly transformed by a matrix;13,101,102 (2) the initial condition, boundary condition, or forcing term in an ordinary or partial differential equation;7 (3) an integrand;103,104 or (4) the geometry of a domain.9
A. Drawbacks of BSI
The Bayesian statistical approach to inverse problems has some drawbacks compared to simpler, classical methods, such as least squares.
A larger computational cost. The Bayesian approach to inverse problems typically demands a relatively large computational cost to approximate the posterior distribution via a Markov chain Monte Carlo algorithm—especially when the number of parameters/inputs is large and the likelihood is expensive to evaluate (owing to a large size of the data and/or a large computational expense to evaluate the forward model).35,49 To reduce the cost of Bayesian inference at the expense of accuracy/flexibility, we may resort to approximate Bayesian computation (ABC),105 make a Laplace (quadratic) approximation of the posterior, or employ a conjugate prior for the likelihood that gives a closed-form expression for the posterior.35
The subjectivity of the prior. A common objection to Bayesian inference is the subjectivity involved in constructing the prior distribution (which influences the posterior).106 First, we can assess the impact of the prior on the posterior through a sensitivity analysis (i.e., examine how much the posterior changes when the prior changes). Second, through prior predictive checking, we may ascertain the consistency of the prior with our observed data by comparing (a) the observed data with (b) synthetic data generated via sampling inputs/parameters from the prior then simulating the probabilistic model of the data-generating process under those inputs/parameters.39,71 Third, in the absence of information, we may adopt the principle of indifference and impose a diffuse prior that avoids biasing the posterior. Finally, one intriguing approach to construct a prior is to elicit estimates of the unknown parameter/input from a panel of experts.107 In defense of the prior distribution, it provides an opportunity/benefit to (i) potentially obtain better estimates of the inputs/parameters by exploiting relevant external information or constraints about the input/parameters (e.g., the physical constraint and back-of-the-envelope estimate of λ for the lime) and (ii) interpret new data from an experiment in the context of different beliefs held among different scientists (and encoded in their priors) before the data were collected.36,40 In addition, the choice for the forward model (its structure, e.g., functional form), upon which non-Bayesian methods also rely, requires subjectivity/judgment as well.36
B. Model discrepancy
Model discrepancy21 refers to a nonzero difference between (i) the expected measurement of the system output over repeated experiments with the same input/conditions and (ii) the prediction of the system output by the model. If significantly present and not accounted for in the model of the data-generating process, model discrepancy corrupts the BSI solution to an inverse problem.108 First, disregarded model discrepancy introduces bias, i.e., the posterior density of the parameter/input will not be centered at the true value even as more and more data are collected. Second, it leads to mis-calibrated uncertainty quantification, i.e., the posterior credible interval will not contain the true value of the parameter/input with the advertised probability.
C. Model selection
We assumed that the model structure—in our study, the functional form/shape of the lime temperature as a function of time, θ(t)—is known (see Table 2a). In model selection, the task is to select the best model of a physical system among a set of candidate models, given data. Bayesian approaches to model selection include the Bayesian information criterion111 and Bayes factors39,112 based on the evidence function.40,113
D. Other methods to approximate the posterior distribution
Here, we employed the NUTS61 Markov chain Monte Carlo method to obtain samples from the posterior distribution to approximate it. Other methods to approximate a posterior distribution include (i) other Monte Carlo sampling methods, including the Gibbs sampler,114 adaptive Metropolis algorithm,115 sequential Monte Carlo,43 and approximate Bayesian computation (ABC),116 and (ii) obtaining an analytical expression as an approximation to the posterior, including the Laplace approximation and variational Bayes.117 The BSI practitioner must consider accuracy, speed, and ease of implementation43 when choosing a method to approximate their posterior at hand.
E. Classical approaches to inverse problems
A traditional approach to inverse problems is least squares to minimize the mismatch between the data and the model predictions by tuning the unknown input/parameter. For overdetermined inverse problems, bootstrapping,118 asymptotic theory,119 and the Hessian of the loss function120 can provide uncertainty estimates for the unknown input/parameters within the least squares framework—and for new predictions by the model. To handle ill-posed and/or ill-conditioned inverse problems, Tikhonov regularization121,122 provides a means to incorporate prior assumptions into their solutions by augmenting least squares. For example, the ill-posed problem of determining the initial temperature profile of a rod from a measurement of its temperature profile later in time may be tackled by augmenting the least-squares loss function with a regularization term to promote smooth solutions.123 Tikhonov regularization is intimately connected with BSI.13,14,34,38 However, BSI is more versatile and better-grounded in theory; consequently, BSI provides a more interpretable means of incorporating prior assumptions/information, and BSI entertains and weighs multiple solutions for uncertainty quantification.
SUPPLEMENTARY MATERIAL
The supplementary material shows the Arduino setup for lime temperature measurements, convergence diagnostics for MCMC sampling, prior predictive check, posterior predictive check, and graphical solution to time reversal problem.
ACKNOWLEDGMENTS
F.G.W. acknowledges NSF under Award No. 1920945 for support. C.M.S. acknowledges support from the US Department of Homeland Security Countering Weapons of Mass Destruction under CWMD Academic Research Initiative Cooperative Agreement No. 21CWDARI00043. This support does not constitute an expressed or implied endorsement on the part of the Government. The authors thank Edward Celarier, Luther Mahoney, and the anonymous reviewers for feedback on the manuscript.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Faaiq G. Waqar: Conceptualization (equal); Data curation (lead); Formal analysis (equal); Investigation (equal); Methodology (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Swati Patel: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Cory M. Simon: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Methodology (equal); Project administration (equal); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal).
DATA AVAILABILITY
All data and Julia code to reproduce the results/plots in this article are openly available in Github at https://github.com/faaiqgwaqar/Inverse-Problems.124