Dynamical systems are frequently used to model biological systems. When these models are fit to data, it is necessary to ascertain the uncertainty in the model fit. Here, we present prediction deviation, a metric of uncertainty that determines the extent to which observed data have constrained the model's predictions. This is accomplished by solving an optimization problem that searches for a pair of models that each provides a good fit for the observed data, yet has maximally different predictions. We develop a method for estimating *a priori* the impact that additional experiments would have on the prediction deviation, allowing the experimenter to design a set of experiments that would most reduce uncertainty. We use prediction deviation to assess uncertainty in a model of interferon-alpha inhibition of viral infection, and to select a sequence of experiments that reduces this uncertainty. Finally, we prove a theoretical result which shows that prediction deviation provides bounds on the trajectories of the underlying true model. These results show that prediction deviation is a meaningful metric of uncertainty that can be used for optimal experimental design.

Nonlinear dynamical systems are used throughout systems biology to describe the dynamics of biomolecular interactions. These models typically have a number of unknown parameters, such as infection rates and decay rates, which are estimated by fitting the model to measurements from the physical system. Two important questions then arise: What is the uncertainty in the model predictions, and how can that uncertainty be reduced? We describe here a new approach for measuring uncertainty in model predictions, by searching for a pair of model parameters that both provide a good fit for the observed data, but make maximally different predictions. We further show how to estimate the impact on the uncertainty of a candidate experiment that has not yet been done, allowing the experimenter to determine beforehand if an experiment will be valuable. We use prediction deviation to analyze a model of HIV infection which can only be partially observed. With prediction deviation, and with appropriately selected experiments, we are able to provide bounds on the behavior of the unobserved quantities and gain insights into inhibition that are otherwise unavailable.

## I. INTRODUCTION

Systems of nonlinear differential equations are used throughout biology to model the behavior of complex, dynamical systems. These models have proven particularly useful in systems biology for describing networks of biomolecular interactions.^{1} Often the utility of the model depends on being able to estimate a set of unknown parameters, which is typically done by collecting data from the physical system and finding the best-fit parameters. When inferring a dynamical system from data, there are two important questions that arise:

*Uncertainty quantification*: Is the model sufficiently constrained by the data?*Optimal experimental design*: If not, what additional experiments would most reduce the remaining uncertainty?

Uncertainty is often measured by constructing a confidence interval for each parameter estimate. We propose a different approach to the problem of uncertainty quantification and then show that this approach leads naturally to an optimal experimental design strategy. Our fundamental hypothesis is that the purpose of fitting a model is to be able to use it to make predictions. In many situations, the parameter values *per se* are not of interest, rather the goal is to gain insight into the system's behavior. In these situations, the purpose of assessing model uncertainty is to determine if the model's predictions can be trusted.

This paper begins by developing *prediction deviation*, a new measure of uncertainty on predicted behaviors. We then use prediction deviation to measure uncertainty in a partially observed model of human immunodeficiency virus (HIV) infection, where we found that after one experiment there remained substantial uncertainty in the behavior of the unobserved component. We then show that prediction deviation leads naturally to a way to measure *experiment impact*, which is a maximum uncertainty on predicted behaviors if an additional experiment were to be conducted. This approach is used to determine a sequence of experiments that reduces uncertainty in the HIV infection model and ultimately bounds the behavior of the unobserved component. Finally, we provide a theoretical foundation for prediction deviation by showing that, under reasonable assumptions, it bounds the trajectory of the underlying true model.

### A. Confidence intervals do not measure predictive power

Parameter confidence intervals are a poor way of determining if a nonlinear dynamical system's predictions are constrained by the observed data. Sensitive dependence means that tight confidence intervals do not imply constrained predictions. The classic Lorenz system provides an illustration of this phenomenon

Fig. 1(a) shows *x*(*t*) data generated from the Lorenz system with parameters $\theta true=[7,38,5]$, initial conditions $x(0)=10,\u2009y(0)=20$, and $z(0)=3$, and a small amount of normally distributed noise. The best-fit estimates for the parameters, $\theta *$, are very close to the true values $\theta true$, and have seemingly tight confidence intervals: $\theta 1*=7.00$$(6.51\u22127.49),\u2009\theta 2*=38.03$$(36.08\u221240.28)$, and $\theta 3*=5.00$$(4.82\u22125.17)$, with 95% simultaneous likelihood-based intervals in parentheses.^{2} Suppose we wished to use these observed data with $y(0)=20$ to predict the behavior of the system when $y(0)=7$, with all other factors staying constant. Do the tight confidence intervals allow for confidence in the model's predictions at this different initial condition? Fig. 1(b) shows that they do not. This figure compares the predictions made by the best-fit model $\theta *$ to those made by the model with parameters $\theta \xaf=[6.98,38.12,4.99]$. $\theta \xaf$ is well within the confidence intervals of $\theta *$, moreover, the fit error of $\theta \xaf$ is within the 95% confidence interval for the fit error of $\theta *$, meaning $\theta \xaf$ is also a good fit for the observed data. However, $\theta \xaf$ and $\theta *$ make entirely different predictions for the condition we wish to predict. The phase portraits in Figs. 1(c) and 1(d) show that this small change in the parameters is enough to send the trajectory to a different side of the attractor. The Lorenz system is a canonical example of sensitivity, but chaotic dynamics are not required to have tight confidence intervals with unconstrained predictions. For instance, this same result can be had any time a basin boundary lies within the confidence interval.

Tight confidence intervals do not imply constrained predictions, and likewise wide confidence intervals do not imply unconstrained predictions. Parameters in nonlinear dynamical systems may be interrelated such that they individually have large confidence intervals, yet the predictions of interest are actually constrained. The following parameterization of the Lotka-Volterra predator-prey model illustrates this fact:

A symmetry in the parameters renders them all unidentifiable—they have infinite confidence intervals. Suppose we were able to observe *x*(*t*) data and wished to use these data to predict the state *y*(*t*). Fig. 2 shows that despite the infinite confidence intervals, *x*(*t*) data constrain predictions of *y*(*t*). The data in Fig. 2 were generated using $\theta true=[1,0.05,1,1]$ and initial conditions $x(0)=y(0)=10$, with standard normal noise. $\theta \xaf$ is the worst-case of how bad the prediction in *y*(*t*) could be. Specifically, of all parameters that have fit error within the 95% confidence interval of the fit error of the best-fit (that is, all parameters that provide a good fit of the data), $\theta \xaf$ is the one that maximized the squared difference between its prediction of *y*(*t*) and that of the best-fit $\theta *$. Thus any model that fits the *x*(*t*) observations will make a prediction on *y*(*t*) that differs from the best-fit by no more than the difference seen in $\theta \xaf$.

This model contains a structural unidentifiability, which could be identified and corrected by a reparameterization.^{3–5} It is also possible to have model parameters that are structurally identifiable but not *practically* identifiable, given the noise in the collected data.^{2} Gutenkunst *et al.*^{6} show that models with parameters that cannot be well constrained by data are ubiquitous in systems biology and conclude that “modelers should focus on predictions rather than on parameters.”

If the purpose of fitting the model to data is to ascertain the values of the parameters, then confidence intervals provide a useful quantification of uncertainty. However, if the purpose is to use the fitted model to make predictions about unobserved variables or unobserved conditions, then confidence intervals serve no purpose for dynamical systems. We propose putting aside the issue of measuring confidence intervals and instead directly measure the uncertainty in the quantity of interest: the predictions.

## II. PREDICTION DEVIATION AS A MEASURE OF UNCERTAINTY

Figs. 1(b) and 2(b) provide the motivation for our approach to measuring uncertainty. We consider a scenario, or set of scenarios, for which we are interested in predicting the system behavior. In Fig. 1(b) this was a different initial condition and in Fig. 2(b) it was an unobserved variable. We then pose the following question: Of all parameters that are a good fit to the observed data, what is the largest deviation in predicted behaviors for any pair? We call this deviation the *prediction deviation*. A low prediction deviation, such as that in Fig. 2(b), means that the observed data have constrained the prediction of interest. A high prediction deviation, such as that in Fig. 1(b), means that the observed data have not constrained the prediction of interest.

### A. Parameter estimation

We must first introduce notation to make the idea of prediction deviation precise. We suppose that we are learning a system of ordinary differential equations with state variables $x(t)$, unknown parameters $\theta $, and known external factors $\nu $

If the initial conditions are known then they are included in $\nu $, and if unknown in $\theta $.

We now provide notation for the observed data and the data fitting problem. Let $Pj=(Ij,Tj,\nu j)$ represent a particular *experiment*, with *I _{j}* being the set of state variables that are observed, $Tj={Ti,j:i\u2208Ij}$ the sets of time points at which these observations are made for each state variable, and $\nu j$ the external factors. We suppose that a total of

*J*experiments have been performed, resulting in observed data $x\u0303ij(t)$, for $j=1,\u2026,J,i\u2208Ij,$ and $t\u2208Ti,j$. We denote the complete set of observed experiments as $P={P1,\u2026,PJ}$ and the complete set of observed data as $x\u0303$.

The unknown parameters $\theta $ are typically estimated from the observed data $x\u0303$ by minimizing the weighted squared error

where $xi(t;\theta ,\nu j)$ is obtained by integrating (1) and $\sigma ijt2$ is the noise variance. The best-fit parameters $\theta *$ are the solution to the least squares problem

### B. Prediction deviation

To measure prediction deviation, we wish to search over the set of all models that provide a good fit to the observed data. We consider a model $\theta $ to be a “good fit” to the observed data if its fit error $zfit(\theta ;P,x\u0303)$ is not too much worse than that of the best fit, $\theta *$. Specifically, we measure a 95% confidence interval for $zfit(\theta *;P,x\u0303)$, which we denote as $[zl*,zu*]$. In the event of normally distributed observation noise, a parametric estimate for the interval can be obtained using the $\chi 2$ distribution, or, as we do here, a nonparametric confidence interval can be obtained with the bootstrap.^{7} The prediction deviation is defined as the maximum difference on a prediction problem between any pair of models that both have fit error within the 95% confidence interval of the best-fit error.

The prediction problems for which the prediction deviation is to be measured are defined in the same way as the experiment according to which data were collected. We call $Y\u2113=(I\u2113,T\u2113,\nu \u2113)$ a *prediction problem*, where as before $I\u2113$ is the set of state variables to be predicted in problem $\u2113,\u2009T\u2113={Ti,\u2113:i\u2208I\u2113}$ are the sets of time points at which these predictions are made for each state variable, and $\nu \u2113$ are the external factors. Let $Y={Y1,\u2026,YL}$ be the full collection of variables and experiments of interest for prediction. The squared difference between $\theta 1$ and $\theta 2$ on the prediction problems is

As before, $\sigma i\u2113t2$ is an estimate of the noise level for that measurement, which is important primarily for combining multiple measurements of possibly different scales into one metric. Prediction deviation can now be framed as an optimization problem

^{8}results for the Lorenz system from Fig. 1 and now discuss how prediction deviation can be used to understand and reduce uncertainty in a viral infection model.

## III. UNCERTAINTY IN A MODEL OF HIV INFECTION

### A. The model and data

We now use prediction deviation to assess how well observed data constrain the model predictions of an unobserved component in a model of the innate immune response to HIV infection.^{9} The model describes the dynamics of how interferon-alpha (IFN*α*) protects CD4 T cells from infection by HIV. IFN*α* is a signaling protein that endows CD4 T cells with protection from HIV by upregulating genes that disrupt viral replication. In the model, CD4 T cells (C) are infected by HIV (H) and become infected cells (CH) that produce additional viruses. Exposure to IFN*α* (I) induces a refractory state in both uninfected and infected cells (CI and CHI, respectively) which if uninfected are protected from infection, and if infected no longer produce additional viruses. The refractory state is reversible and CI and CHI cells eventually revert to their original state, C and CH, respectively. The dynamical system that describes the interactions of these quantities is

We use here tissue culture data collected by Browne *et al*.,^{9} who provide full details of the experimental methodology. In short, varying levels of IFN*α* were added to tissue cultures with CD4 T cells. After allowing the cells to incubate with the IFN*α* for 6 h, HIV was added to the culture for 1 h and then washed out. The total number of uninfected ($C+CI$) and infected ($CH+CHI$) cells, along with the viral count ($H$) were measured with four replicates every 24 h, for 3 days. This experiment was done separately for a total of 7 initial IFN*α* levels: 0, 0.002, 0.02, 0.2, 2, 20, and 200 ng/ml. In this tissue culture, the IFN*α* activity remained constant and so $I(t)$ was a known, external factor. Details of model fitting and prediction deviation implementation are given in Appendix A.

### B. Prediction deviation after one experiment

To illustrate how prediction deviation changes with additional experiments and how it can be used for experiment selection, we label each combination of variables and IFN*α* level as a separate experiment. For example, $C+CI$ measured at I = 0.002 ng/ml defines one experiment, and $H$ measured at I = 2.0 ng/ml is another. There are a total of 21 such experiments for which data were collected. We begin by using data from only one of these experiments, and then consider the problems of determining uncertainty in model fit, and deciding which additional experiments to add in order to reduce prediction uncertainty.

The purpose of defining the model and collecting experimental data is to understand the dynamics of how IFN*α* provides protection to CD4 T cells during HIV infection. The experimental data themselves do not explicitly show the interaction of IFN*α* and CD4 T cells inasmuch as only the sum $C+CI$ can be observed. The natural prediction problem is to then try to predict the $CI$ timecourse, at the same observation times as the $C+CI$ data. We begin with just one experiment, and let $P$ be the experiment corresponding to $C+CI$ measured at I = 0.002 ng/ml. Let $Y$ be the corresponding prediction problem, $CI$ at I = 0.002 ng/ml.

Using prediction deviation, we can determine if the observations of $C+CI$ at I = 0.002 ng/ml constrain the predictions of $CI$ at I = 0.002 ng/ml, and Fig. 3 shows that they do not. In particular, Fig. 3(a) shows that the two models that maximize prediction deviation both provide a good fit for the observed data, while Fig. 3(b) shows that they provide widely diverging predictions about the $CI$ timecourse: One of the models suggests that nearly all of the CD4 T cells are refractory, while the other one suggests that nearly none of them are. These observed data do not in any way increase our understanding of the IFN*α* dynamics.

## IV. OPTIMAL EXPERIMENT DESIGN

Knowing that the predictions of $CI$ are entirely unconstrained, the question that naturally follows is to determine which of the remaining 20 experiments should be done next in order to maximally reduce the prediction deviation. More generally, we wish to predict the impact that a particular candidate experiment or set of experiments $P\u2032$ will have on the prediction deviation, given that we have already completed experiments $P$, with $P\u2032\u2229P=\u2205$.

Fig. 4 provides some insight into this problem. This figure shows the predictions of the best-fit and prediction deviation models from Fig. 3 on two of the candidate experiments, C + CI at I levels of 0.0 and 200.0 ng/ml. On the candidate experiment in Fig. 4(a), the prediction deviation models are very different. Suppose observations were collected for this experiment and then prediction deviation were recomputed using both these observations and the original set. After collecting data, at least one of the prediction deviation models in Fig. 4(a) would no longer be a good fit. We cannot know *a priori* if the observations will lie close to one of the models and thereby disqualify the other, or if they will lie in the middle, disqualifying both, but at least one model will not be a good fit for the new observations.

Fig. 4(b) shows the alternative situation where the prediction deviation models do not disagree on the candidate experiment. Were this experiment to be done, it is possible that the observations would disqualify both prediction deviation models and there would be a reduction in uncertainty. However, it is also possible for the observations to be such that both models remain feasible, meaning there is no reduction in uncertainty.

The experiment in Fig. 4(a) seems like a good choice for reducing uncertainty, however having a large deviation on the candidate experiment $P\u2032$ does not necessarily mean that the deviation on the prediction problem of interest, in this case CI at I = 0.002 ng/ml, will actually be reduced. Certainly that pair of prediction deviation models will no longer satisfy both constraints (5b) and (5c), however, there may exist yet another pair of models that do not disagree on $P\u2032$ but produce the same prediction deviation on $Y$. A powerful property of prediction deviation as a measure of uncertainty is that we actually can determine if this is the case.

### A. Estimating experiment impact

Collecting observations from experiment $P\u2032$ would change the prediction deviation by requiring the prediction deviation models to be a good fit for the new observations. In essence, there would be two new constraints that must be satisfied

for some *η*, where $x\u0303\u2032$ are the new observations. In Appendix A, we show that these constraints imply

which allows us to get some idea of the impact these constraints would have even without knowledge of $x\u0303\u2032$. The essence of this result is that if the prediction deviation models are a good fit for the new data, they must have close trajectories on the new data. We are unable to restrict the prediction deviation models to be a good fit for the new data until we have collected the new data. We can, however, restrict the prediction deviation models to have close trajectories on the candidate experiment, thus estimating the impact that the candidate experiment would have on the prediction deviation. This is done by solving the prediction deviation problem with the added constraint (6), which we call the *experiment impact problem*

*estimated experiment impact*.

Of all possible outcomes of $P\u2032$, the outcome that reduces uncertainty in $Y$ the least is if the observations follow the trajectories of $\theta \u03021$ and $\theta \u03022$. In this sense, the predicted experiment impact is a worst-case reduction of uncertainty, and we can expect that $P\u2032$ will reduce the prediction deviation *at least* as much as $zdev(\theta \u03021,\theta \u03022;Y)$, subject to the closeness requirement *η* being appropriate. Appendix A describes how *η* can be chosen.

## V. REDUCING UNCERTAINTY OF IFN*α* DYNAMICS

We now continue the results on uncertainty in IFN*α* dynamics and use the predicted experiment impact to find additional experiments that reduce the uncertainty shown in Fig. 3(b). There are 20 candidate experiments consisting of different component measurements and varying IFN*α* levels. The estimated experiment impact optimization problem, (7), was solved for each of these candidate experiments, and results for the experiment that predicted the largest reduction of uncertainty, $C+CI$ at I = 0.0 ng/ml, are shown in Fig. 5. Fig. 5(a) shows the predicted experiment impact models on the candidate experiment, which are forced to have close trajectories. Fig. 5(b) shows that for the prediction problem there is a substantial reduction in uncertainty by requiring the models to produce close trajectories on the candidate experiment—this is the estimated experiment impact. The actual experiment impact is shown in Figs. 5(c) and 5(d): in Fig. 5(c) the actual observations, and in Fig. 5(d) the prediction deviation after including those observations. The actual reduction in prediction deviation was very close to that predicted by the estimated experiment impact in Fig. 5(b).

The estimated experiment impact problem predicted a significant reduction in uncertainty from only one of the 20 candidate experiments. Fig. 6 shows the results of separately adding each of the 20 candidates, and the $C+CI$ at I = 0.0 ng/ml experiment of Fig. 5 provided by far the largest reduction of uncertainty. The other two experiments at I = 0.0 ng/ml provided a moderate reduction in uncertainty, while the remaining 17 experiments provided no reduction in uncertainty. Some experiments actually increased the uncertainty, by increasing the amount of noise in the fitting.

We denote the prediction deviation models after including data from experiment $P\u2032$ as $\theta \xaf1\u2032$ and $\theta \xaf2\u2032$. Fig. 7(a) compares the estimated experiment impact, $zdev(\theta \u03021,\theta \u03022;Y)$, to the actual impact of each candidate experiment, $zdev(\theta \xaf1\u2032,\theta \xaf2\u2032;Y)$. As already seen in Fig. 5, for the one candidate that in actuality significantly reduced uncertainty, the predicted impact was very close to the actual impact. There were two experiments that provided a moderate reduction in uncertainty which was not matched by the estimated experiment impact. For the remaining 17 experiments, solving the estimated experiment impact problem correctly predicted that these experiments would not reduce uncertainty. Because it comes from adding a constraint to the prediction deviation problem, the estimated experiment impact problem cannot predict an increase in uncertainty, rather it can only predict that uncertainty will not decrease. Thus in Fig. 7(a) the estimated experiment impacts for the 17 ineffectual candidates are very close to the previously measured prediction deviation.

The two experiments with a moderate reduction in uncertainty that was not predicted give insight into how the estimated experiment impact problem works. Fig. 8 shows the pre-experiment and post-experiment prediction deviation models, along with the expected experiment impact models, for one of these two experiments. Estimated experiment impact is a worst-case analysis, and for these two experiments the worst-case models did not reduce uncertainty while the post-experiment models did. Each of these experiments had a potential outcome, consistent with the observed data, which would not have reduced uncertainty. Fig. 8(a) shows this worst-case potential outcome for one of the experiments. The actual data did not follow these worst-case trajectories, and in fact were able to moderately reduce uncertainty. Importantly, there were no experiments for which the estimated experiment impact indicated a reduction of uncertainty where in reality there was none. Because estimated experiment impact is a worst-case analysis, if the model is correct, this type of error will not occur and we will not do experiments that end up not reducing uncertainty.

The fact that 17 of the 20 experiments produced no reduction of uncertainty could not have been known without solving the estimated experiment impact problem. In particular, measuring the uncertainty in the candidate experiments themselves, as in Fig. 4, could not predict that all of these experiments would have no impact. Fig. 7(b) compares the deviation on the candidate experiments, $zdev(\theta \xaf1,\theta \xaf2;P\u2032)$, to the actual experiment impact $zdev(\theta \xaf1\u2032,\theta \xaf2\u2032;Y)$. The two candidates with the highest pre-experiment deviation did not actually reduce uncertainty at all. Fig. 7(b) shows that the uncertainty in the candidate experiment does not at all predict the impact that the candidate will have in the uncertainty of the prediction problem.

Fig. 9 shows the outcome of using the expected experiment impact in a sequential experimentation setting. Starting from the data in Fig. 3(a), we sequentially added in the data from the candidate experiment whose estimated experiment impact predicted the largest reduction in uncertainty. Each time after adding data from a candidate, we recomputed the prediction deviation with the new set of observations and recomputed the estimated experiment impact of the remaining candidates. Fig. 9(a) shows that adding in the second set of observations (those in Fig. 5(c)) produced a large drop in prediction deviation, shown in Figs. 5(b) and 5(d). Additional experiments continued to reduce uncertainty, but in much smaller amounts, consistent with the findings of Fig. 6. After adding data from just 3 of the 20 candidate experiments, the uncertainty was at nearly the level that was obtained by including all 20 of the candidate experiments. Fig. 9(b) compares the prediction deviation with only the initial experiment to that obtained after including the first three experiments selected using the estimated experiment impact. Initially, the data supported both the hypothesis that nearly none of the CD4 T cells were refractory, and the hypothesis that nearly all of the CD4 T cells were refractory. With the additional observations, the prediction deviation shows that only a small minority of CD4 T cells are refractory.

## VI. THEORETICAL ANALYSIS

Prediction deviation has a strong theoretical guarantee that further motivates its use as a metric of uncertainty. For the purposes of the theoretical analysis, we assume that there exists a true model $\theta true$, and the observed data equal the output of this true model, plus random noise

where *ϵ _{ijt}* are independent but not necessarily identically distributed random variables. Let

*α*be such that $zu*$ used to measure prediction deviation is the upper-bound on a $1\u2212\alpha $ confidence interval. Under reasonable assumptions on

*ϵ*which are given in Appendix B, the following theorem holds:

_{ijt}**Theorem 1.** *With probability at least* $1\u2212\alpha $

This theorem means that the trajectory of the true model is in a particular sense bounded by that of the prediction deviation models: With high probability, it does not differ from either of the prediction deviation models by an amount larger than the difference in the prediction deviation models themselves. Thus if the prediction deviation is small and the trajectories of the prediction deviation models are close, then the trajectory of the true model can be specified within a narrow window, with high probability. This guarantee shows that prediction deviation corresponds to bounds on the underlying true model and provides additional support for the validity of prediction deviation as a metric of uncertainty. The proof is given in Appendix B.

## VII. RELATED WORKS

There are several related lines of work assessing predictive power in dynamical systems. Kreutz *et al*.^{10} use an optimization approach to measure prediction confidence intervals. Prediction intervals are measured by solving a sequence of minimization problems, separately for each time point in each prediction problem. Prediction intervals differ from the prediction deviation in that there might be different models that provide the upper and lower bounds at each time interval, whereas prediction deviation produces a single pair of models that maximizes the total deviation across all time points. The main strength of using prediction deviation as a measure of uncertainty is the ability to directly predict the impact of an additional experiment on the prediction deviation, via the estimated experiment impact problem. Kreutz *et al*.^{10} propose using the prediction intervals of the candidate experiments to decide which experiment would have the highest impact on the prediction problem. For nonlinear dynamical systems, reducing uncertainty of the model under one condition (the candidate experiment) does not necessarily reduce uncertainty under a different condition (the prediction problem). This is shown clearly in Fig. 7(b), where many candidate experiments had large uncertainty themselves, yet their observations did not reduce the uncertainty in the prediction problem. For prediction deviation, on the other hand, solving the optimization problem in (7) provides a direct estimate of how much reducing uncertainty in the proposed experiment will reduce uncertainty in the prediction problem. Because it is a worst-case analysis, the estimate from solving (7) also will not make the sort of error shown in Fig. 7(b) where the recommended experiments end up not reducing uncertainty. Other approaches to measuring prediction intervals include boostrapping^{11} and MCMC sampling in a Bayesian framework.^{12} Vanlier *et al.*^{13} provide a review of recent approaches to measuring uncertainty both in parameters and in predictions.

Optimal experimental design has typically been studied in the context of parameter estimation^{14–16} or, more recently, model selection and discrimination.^{17–21} Kreutz and Timmer^{22} provide a review of recent approaches to optimal experimental design for these two problems. Our methods here are for optimal experimental design for prediction uncertainty, which generally requires predicting the impact of a proposed experiment on prediction uncertainty. Casey *et al.*^{23} measure prediction uncertainty using a linearization of the prediction problem and then show how to predict the impact of a proposed experiment on the approximated prediction uncertainty. Another approach to optimal experiment design is to simulate the outcome of the proposed experiment using the best-fit model, and to measure the corresponding reduction in uncertainty.^{16} Useful experiments will themselves have high prediction uncertainty, so there will likely be a large range of possible outcomes, only one of which is the best-fit outcome. As shown in Fig. 8, the impact of the experiment on prediction uncertainty may depend strongly on which of the possible outcomes is realized. The actual reduction of uncertainty from an experiment could be much less than that predicted by the best-fit outcome, potentially wasting a valuable experiment. Solving (7) measures uncertainty under the worst-case of the possible outcomes of the experiment, ensuring that the experiment will be useful whatever the outcome may be.

## VIII. CONCLUSIONS

Two important questions that arise when fitting nonlinear dynamical systems to data are uncertainty quantification and optimal experimental design. We presented in this paper a prediction-centered approach for measuring uncertainty in a dynamical system's fit to data. Prediction deviation is able to directly show, via the pair of prediction deviation models, how much uncertainty remains in the prediction problem, thus answering the uncertainty quantification question. Solving the estimated experiment impact problem provides *a priori* a direct estimate of the impact that a candidate experiment would have on uncertainty. This allows the experimenter to choose the additional experiments that are likely to most reduce uncertainty, thus answering the question of optimal experimental design. We used the estimated experiment impact problem to sequentially choose 4 experiments which produced nearly the same reduction in uncertainty as the full set of 20 candidate experiments. In addition to the sequential experimentation setting that was demonstrated here, estimated experiment impact can also be used to predict the impact of simultaneously running a number of experiments by combining them into a single candidate. Finally, we proved a bound that with high probability provides a direct relationship between prediction deviation and how constrained the underlying true model is, providing a theoretical foundation for using prediction deviation as a metric of uncertainty.

## ACKNOWLEDGMENTS

Funding for this project was provided in part by Siemens and the U.S. Army Research Office.

### APPENDIX A: IMPLEMENTATION DETAILS

##### 1. Specifying the parameter *η*

Observations $x\u0303\u2032$ from candidate experiment $P\u2032$ would constrain the prediction deviation models according to the two constraints $zfit(\theta \xaf1;P\u2032,x\u0303\u2032)\u2264\eta $ and $zfit(\theta \xaf2;P\u2032,x\u0303\u2032)\u2264\eta $. The amount that the fit error on the new models would be constrained, *η*, is a parameter in the estimated experiment impact problem, (7). Assuming normally distributed noise and a reasonable estimate of the experiment noise level $\sigma ijt2,\u2009zfit(\theta *;P\u2032,x\u0303\u2032)$ follows a $\chi 2$ distribution whose 95% percentile provides a reasonable choice for *η*. Alternatively, since observations are normalized by their noise level when computing fit error, if all observations contribute equally to the uncertainty then $\eta =zu*|P\u2032|/|P|$ provides a reasonable choice, where $|P|$ is the number of observations in experiment $P$ and $zu*$ is the upper end of the 95% confidence interval for the best-fit error. This is the approach we used in our experiments, and the effect of this *η* through (7d) can be seen in Fig. 5(a).

The following result provides the motivation for constraint (7d) in the estimated experiment impact problem.

**Proposition 1.** $zfit(\theta \xaf1;P\u2032,x\u0303\u2032)\u2264\eta $ *and* $zfit(\theta \xaf2;P\u2032,x\u0303\u2032)\u2264\eta $ *imply* $zdev(\theta \xaf1,\theta \xaf2;P\u2032)\u22642\eta $.

*Proof.*

The third line uses the triangle inequality, and the last line is by supposition. ◻

This result allows for an approximation of the impact of $P\u2032$ that does not require knowledge of the data $x\u0303\u2032$. The triangle inequality is generally loose, and incorporating this constraint into a maximization problem means that the result is the worst-case impact of $x\u0303\u2032$. These two approximations provide room for the additional approximation made above in choosing *η*. Ultimately, Fig. 7(a) shows that the approximations involved in estimating experiment impact are good enough to be useful.

##### 2. Simulation and optimization

SloppyCell^{24,25} was used to integrate the model ODE system. In addition to integrating the model, SloppyCell integrates the forward sensitivity system, which provides gradients of the model trajectories with respect to the parameters, $\u2207\theta xi(t;\theta ,\nu j)$. From these gradients, it is a straightforward calculation to obtain the gradients of the objectives and constraints for the three optimization problems in this paper: the data fitting problem, the prediction deviation problem, and the estimated experiment impact problem. All optimization problems were solved using random restarts of gradient-based optimization methods, with each optimization problem solved from 20 random initializations. The data fitting problem is an unconstrained minimization problem and was solved using the Scipy implementation of the Newton conjugate-gradient algorithm.^{26,27} The prediction deviation and estimated experiment impact problems are constrained maximization problems and were solved using the logarithmic barrier method [Ref. 27, Framework 17.2]. This method solves the constrained problem via a sequence of unconstrained problems, each of which was solved using the Newton conjugate-gradient method. The computational difficulty of each of these unconstrained problems is similar to that of the data-fitting problem. Solving (5) and (7) should thus scale in a similar way as the data fitting problem and have similar challenges. Feasible initial values for the prediction deviation and estimated experiment impact optimization problems were obtained using a Gaussian random walk from the best-fit parameters (which are always feasible), rejecting infeasible steps.

##### 3. Experimental data

The data for the experiment on IFN*α* dynamics were those provided by Browne *et al*.^{9} There, two parameters were measured separately from these data, and we followed and treated these parameters, as well as all initial conditions, as known. One of the known parameters was the IFN*α* decay rate, and so $I(t)$ was thus known. The estimation done in this paper was then on a space of 7 parameters and 5 variables. The noise variance estimate used for weighted least squares and for prediction deviation, $\sigma ijt2$, was taken as the average over time of the sample variances across the four replicates at each time point, separately for each set of variables and IFN*α* level. This is equivalent to the maximum likelihood estimate under a model where the noise is normally distributed with a variance that differs across variables and IFN*α* levels but is constant across time points.

### APPENDIX B: PROOF OF THE THEORETICAL RESULT

The result of Theorem 1 provides a theoretical foundation for using prediction deviation as a metric of uncertainty by showing that it relates directly to bounds on the behavior of the underlying true model. The theorem requires the following assumptions:

**Assumption 1.***The observed data are the output of a true model*$\theta true$,*plus noise:*$x\u0303ij(t)=xi(t;\theta true,\nu j)+\u03f5ijt$.**Assumption 2.***The random variables*ϵ_{ijt}*are independent*.**Assumption 3.***The probability density function of ϵ*$F\u03f5ijt(x)$_{ijt}is symmetric about 0 and unimodal, meaning the distribution function*is convex for*$x\u22640$*and concave for*$x\u22650$.**Assumption 4.***Let*$\theta *$*be the best-fit model under a particular realization of the observations and*$zu*$*fixed. Then, assume*$Px\u0303(zfit(\theta *;P,x\u0303)\u2264zu*)\u22651\u2212\alpha $.

Assumption 2 requires independence, but does not require *ϵ _{ijt}* to be identically distributed, thus the noise level may vary across different observations. Assumption 3 is quite general: it is satisfied by the normal distribution, as well as by other heavy-tailed distributions. In Assumption 4, the model $\theta *$ is held constant and the randomness is over different realizations of

*ϵ*, and thus different realizations of $x\u0303$. This assumption is about how the best-fit to one realization of the data generalizes to other realizations of the data, and requires that $zu*$, used in constraints (5b) and (5c), actually provides a $1\u2212\alpha $ upper bound for the fit error.

_{ijt}For the proof of Theorem 1, we define notation to describe the squared residuals. Let $Rijttrue=(xi(t;\theta true,\nu j)\u2212x\u0303ij(t))2$ be the squared residuals under the true model and $Rijt*$ the squared residuals under the best-fit model, $\theta *$. Let $bijt=xi(t;\theta *,\nu j)\u2212xi(t;\theta true,\nu j)$ be the bias of the best-fit model.

The following result shows that intervals of the noise distribution centered on 0 contain the most probability mass.

**Lemma 1.** *For* $x\u22650$ *and for* $a\u2208\mathbb{R},\u2009F\u03f5ijt(x+a)\u2212F\u03f5ijt(\u2212x+a)\u2264F\u03f5ijt(x)\u2212F\u03f5ijt(\u2212x)$.

*Proof.* This result follows from Assumption 3. When *x* = 0 the result is trivial. For *x* > 0, we first consider the case where $a\u2265x$. For all $x\u22650,\u2009F\u03f5ijt(x)$ is concave, and thus $F\u03f5ijt\u2032(x)$ is monotonically non-increasing. This means $\u2202\u2202a(F\u03f5ijt(x+a)\u2212F\u03f5ijt(\u2212x+a))\u22640$ $\u2200a\u2265x$, and this quantity is maximized when *a* = *x*. Thus,

which is the statement of the lemma. The second line follows directly from the concavity of $F\u03f5ijt(x)$ and the third line uses the symmetry $F\u03f5ijt(x)=1\u2212F\u03f5ijt(\u2212x)$. When $a\u2264\u2212x$, the same argument holds using the convexity of $F\u03f5ijt(x)$ for $x\u22640$.

For the remaining case, $|a|<x$,

by the concavity of $F\u03f5ijt(x)$ on the interval $[x\u2212a,x+a]$. From the symmetry, it then follows that

After rearranging, this proves the lemma. ◻

An important concept for the proof of Theorem 1 is that of a stochastic ordering, which we now define and then use to prove the theorem.

**Definition 1.** *For random variables X and Y,* $X\u227cY$ *if* $P(X>x)\u2264P(Y>x)\u2200x$.

**Lemma 2.** $Rijttrue\u227cRijt*$.

*Proof.*

using Lemma 1. ◻

The next result comes from Shaked and Shanthikumar,^{28} Theorem 1.A.3(b).

**Lemma 3.** *For independent random variables* $X1,\u2026,Xn$ *and* $Y1,\u2026,Yn$*, let* $X=\u2211i=1nwiXi$ *and* $Y=\u2211i=1nwiYi$ *with non-negative weights* $w1,\u2026,wn$*. If* $Xi\u227cYi\u2200i$*, then* $X\u227cY$.

**Corollary 1.** $zfit(\theta true;P,x\u0303)\u227czfit(\theta *;P,x\u0303)$.

*Proof.* The fit error is a weighted sum of the squared residuals, with weights $1\sigma ijt2$, so this result follows directly from Lemmas 2 and 3, and Assumption 2. ◻

**Theorem 1.** *With probability at least* $1\u2212\alpha $

*Proof.* By Corollary 1 and Assumption 4

Thus with probability at least $1\u2212\alpha ,\u2009(\theta true,\theta \xaf1)$ is a feasible solution to problem (5). The proof of the theorem is then by contradiction: If $zdev(\theta true,\theta \xaf1;Y)>zdev(\theta \xaf1,\theta \xaf2;Y)$, then $(\theta \xaf1,\theta \xaf2)$ cannot be an optimal solution to problem (5). However, $(\theta \xaf1,\theta \xaf2)$ are defined to be optimal solutions, and so the theorem holds. The same argument simultaneously holds for $(\theta true,\theta \xaf2)$. ◻