We describe an innovative experimental approach, and a proof of principle investigation, for the application of System Identification techniques to derive quantitative dynamical models of transcriptional regulation in living cells. Specifically, we constructed an experimental platform for System Identification based on a microfluidic device, a time-lapse microscope, and a set of automated syringes all controlled by a computer. The platform allows delivering a time-varying concentration of any molecule of interest to the cells trapped in the microfluidics device (input) and real-time monitoring of a fluorescent reporter protein (output) at a high sampling rate. We tested this platform on the GAL1 promoter in the yeast Saccharomyces cerevisiae driving expression of a green fluorescent protein (Gfp) fused to the GAL1 gene. We demonstrated that the System Identification platform enables accurate measurements of the input (sugars concentrations in the medium) and output (Gfp fluorescence intensity) signals, thus making it possible to apply System Identification techniques to obtain a quantitative dynamical model of the promoter. We explored and compared linear and nonlinear model structures in order to select the most appropriate to derive a quantitative model of the promoter dynamics. Our platform can be used to quickly obtain quantitative models of eukaryotic promoters, currently a complex and time-consuming process.
System identification is a branch of control engineering aimed at developing computational approaches to derive, from measurement data, a quantitative dynamical model of a physical system able to predict its future behaviour. There is a long tradition in the successful application of system identification approaches to medicine and physiology;1–3 however, in molecular biology, only few attempts have been made to infer a quantitative model of gene regulation due to experimental limitations of current techniques. Indeed, whereas in engineering it is now common to measure thousands of time-points at a desired sampling rate for a physical system to be modelled, this has been very difficult in biology, where time-series data consist of very few samples. Here, we constructed an experimental system identification platform that allows to provide a time varying concentration of any molecule of interest (input) to a population of cells and to measure the single-cell response in the form of the fluorescence level of a reporter protein, at a sufficiently high sampling rate, thus making it possible to evaluate the dynamics of the process of interest. The system identification platform consists of a set of automated syringes, containing the “input” molecule, connected via capillary tubes to a microfluidic device, in which the cells are “trapped.” The microfluidic device is a microscope glass slide to which a transparent-polymer is attached. The polymer is etched with channels to transport the content of the syringes to the chamber in which cells are trapped. A time-lapse fluorescence microscope is then used to image the cells in the chamber in order to quantify the fluorescence level of the reporter protein in the single cells. We tested the experimental platform to implement and compare different linear and nonlinear system identification approaches to a simple transcriptional network in the yeast S. cerevisiae. The results we obtained confirm that the experimental system identification platform we developed can successfully be used to infer quantitative models of a eukaryotic promoter in a rapid and efficient manner.
I. INTRODUCTION
The aim of system identification, an important branch of Systems and Control Theory, is to derive a dynamical input-output model of a physical system of interest from measurement data. The model can then be used to predict the behaviour of the system to an unknown input or to derive and implement control strategies to steer at will its dynamic behaviour towards a predefined goal. This field is quite advanced and a well established theory has been developed in the case of Linear Systems (i.e., systems described by a set of ordinary linear differential equations).4 Nevertheless, nonlinearity is generic in nature and many practical examples of nonlinear dynamic behaviour have been reported in the engineering literature;5 modelling and identification of nonlinear dynamic systems is a challenging task because the principle of linear superposition does not (generally) apply to nonlinear systems; therefore, heuristic methods are required for their identification,6 modelling techniques applied to nonlinear dynamics and chaos have been reported in the literature.7
In biology, system identification can be used with a dual purpose in mind: (1) as in the case of control engineering, to design feedback control strategies to steer the biological system towards a desired goal (i.e., a desired protein concentration);8–10 (2) to understand the biological mechanisms underlying the biological process. In both cases, the dynamical nature of biological processes is a crucial feature that needs to be captured by system identification.
Current experimental techniques are, in general, suitable to assess steady-state behaviour of molecular pathways, or to measure a few time points during a time-course experiment, thus making these approaches unsuitable for System Identification purposes. Hence, only few attempts have been made to infer models of transcriptional regulation using system identification techniques.11,12 Indeed, the majority of the models in systems biology have been built using a priori knowledge of the underlying chemical and genetic mechanisms.13–16
The aim of this paper is to propose an experimental approach, and a proof of principle study, to derive quantitative dynamical models of biological system of interest through the application of system identification techniques. Specifically, the goal is that of identifying automatically an input-output model of transcriptional regulation, i.e., a model of the dynamical response of a promoter to a transcription factor (TF) of interest.
Currently, the most common experimental approach to model a promoter is to derive a “response curve.” This curve is obtained by expressing the TF at different levels in the cell and then measuring, at steady-state, the concentration of a reporter gene downstream the promoter of interest controlled by the TF. Usually, the promoter “dose-response” curve is fitted with a Michaelis-Menten (or Hill) function, which can then be incorporated into a dynamical model of the system.12 In order to express the TF at different levels, several techniques have been proposed, but in eukaryotes, they mostly rely on the expression of the TF from promoters of different strengths (although techniques based on single cell imaging exploiting the dilution rate of the TF during cell division have also been proposed in prokaryotes17). Since expressing a TF at different levels in a living cells is quite laborious, especially in eukaryotes, only a few points of the promoter response curve are usually measured, with the result that in the current literature very few quantitative models of promoters are available.
In order to overcome the current limitations, we propose the use of an experimental platform based on a microfluidic device, a time-lapse microscopy apparatus and, a set of automated syringes all controlled by a computer. Microfluidics allows to grow cells and to precisely change their environmental conditions in real-time; moreover, the cells in the device can be imaged with the microscope at high sampling rate (thus overcoming limitations of standard techniques), in order to evaluate the effects of the input provided to the system. This is achieved by measuring over time the fluorescence of a reporter used to track the output of the phenomenon of interest. The number of measured outputs relies on the number of different colours that can be tracked at the same time by fluorescent microscopy (up to 4 can be easily quantified).
We show that such a platform can enable the use of a System Identification approach to derive a dynamical model of a promoter. In particular, the promoter can be described as an input-output system, whose input is the TF level, or in the case of inducible promoters (as the one used in this study), the level of a small molecule able to activate the TF of interest. The system output is the fluorescence level of a reporter protein encoded by a gene expressed from the promoter to be modelled.
We used the experimental platform described above to identify and compare different linear and non-linear models of the GAL1 promoter in yeast cells driving expression of a green fluorescent protein (Gfp) fused to the GAL1 gene.18
II. BIOLOGICAL SYSTEM
The system under investigation is a strain of yeast cells (yGIL337, Gal1-GFP::KanMX,GAL10-mCherry::NatMX) constructed by Lang et al.;18 in these cells, the Green Fluorescent Protein (Gfp) is fused to the Gal1 protein and expressed from the GAL1 promoter and a variant of the red fluorescent protein (mCherry) is fused to the Gal10 protein and expressed from the GAL10 promoter.18
The Gal1 protein is one of the enzymes needed by yeast for galactose utilisation, thus the activity of the GAL1 promoter is related to the presence in the cells' environment of a sugar, galactose, which is sensed by the cells as a “switch on” signal for the expression of the GAL1 gene. On the contrary, the presence of another sugar, glucose, represses the production of Gal1 protein, because glucose is the preferred carbon source requiring much less energy to be metabolised.19 Thus, cells will first consume all the available glucose and then switch to utilise galactose, if any is available in the medium. Because of this, the input provided to the yeast cells can either be glucose (which switched off Gfp production) or galactose (which switched on Gfp production), but not a combination of the two sugars because yeast cells will not respond to galactose if glucose is present.
Moreover, a recent dynamical model of the galactose regulation system in yeast has revealed that the galactose system works as a low-pass filter, thus dampening the effect of switches between galactose and glucose.19 Hence, the frequency of the input signal has to be low enough for the system to respond.
Our strategy is to dynamically modulate the presence of two sugars in the medium in which the cells are grown, as input to the system, and to follow the dynamics of the Gal1 promoter in response to such an input by measuring Gfp fluorescence, which is considered as the output of the system.
III. TECHNOLOGICAL PLATFORM
To measure the dynamics of the GAL1 promoter in yeast cells during in-vivo experiments, we designed and implemented a platform based on a microfluidic device, a time-lapse microscopy apparatus, and a set of actuated syringes, all controlled by a computer, Figure 1-Panel B.
A. Microfluidics
At the core of the platform there is a microfluidic chip, MFD0005a, designed by the lab of Jeff Hasty (UCSD),20 housing a micro-chamber (height: ) which “traps” yeast cells, that can only grow in a mono-layer, thus allowing an easier automated image analysis. We produced replicas of the device designed by Ferry and colleagues20 thanks to the master-mold they kindly provided us as a blueprint, following the protocol here described: prior to the fabrication of the microfluidic devices, the master is exposed to chlorotrimethylsilane (Sigma-Aldrich Co.) vapours for 20 min in order to prevent polydimethylsiloxane (PDMS) from sticking to the mold. A 10:1 mixture of PDMS prepolymer and curing agent (Sylgard 184, Dow Corning) is prepared and degassed under vacuum until all air bubbles are removed from the blend. Then the mixture is poured on the master, and it is cured on a hot plate at for 1 h. After this step, the PDMS layer, containing the microfluidic channels, is peeled from the master and it is cut with a scalpel to separate the single devices; holes are bored through them with a 20-gauge blunt needle in order to create inlets for the access of cells and liquid substances. For each PDMS piece containing micro-channels, a thin glass slide (150 μm) is cleaned in acetone and isopropyl alcohol in a sonic bath for 10 min for each step. Finally, the PDMS layers and glass slides are exposed to oxygen plasma in a RIE (Reactive Ion Etching) machine for 10 s and brought into contact forms a strong irreversible bond between two surfaces.
Once the cells are loaded in the device, they can be provided with any combination of two inducer compounds (galactose and glucose) by simply modulating the difference in hydrostatic pressures at the two inlets, thanks to the Dial-a-Wave system.20 This can be easily achieved by varying the vertical position of syringes filled with sugar-supplemented media using specific motorised linear rails, which we built. On each linear guide, we mounted a stepper motor (Pc Control Ltd.), the rotary-to-linear conversion of the motion is obtained by the use of a timing belt (SDP/SI–—Designatronics, Inc.) coupled with tow pulleys (SDP/SI–—Designatronics, Inc.) one of each is linked to the motor axis. The motors are connected to a StepperBee + electronic board (Pc Control Ltd.), which is used together with a custom code, implemented in matlab (Mathworks, Inc.), running on a PC in order to control the vertical position of the syringes by moving appropriately the axes of the stepper motors.
B. Microscopy and image analysis
Image acquisition was carried out with a Nikon-TI Eclipse microscope equipped with a 40× objective and a Peltier-cooled Andor Clara camera, the entire ensemble is controlled by the NIS Elements software (v. 4.0). The microscope was programmed to acquire two types of images: (a) a bright field image (phase contrast) and (b) two fluorescence images (one for the green spectrum for GFP and one in the red spectrum for Sulforhodamine B). Fluorescence images were taken by using the filters FITC (excitation 460 = 40 nm, emission 510 nm = 50 nm) (green spectrum) and TRITC (excitation 530 = 30 nm, emission 590 nm = 60 nm) (red spectrum).
The red dye Sulforhodamine B was added to the galactose medium and it was used to check for the proper administration of the input to the system. Phase contrast and fluorescent images were acquired at intervals of 5 min. Once cells have been imaged, image analysis methods can be applied to estimate their fluorescence; to this end, we developed a custom image processing algorithm to obtain a measure of the average of the fluorescence of the entire cell population.21
IV. EXPERIMENTAL METHODOLOGY
On day 0 yeast cells (yGIL337 strain) were inoculated in 10 ml galactose (GAL)/raffinose (RAF) (2%) + synthetic complete medium (SC). On day 1, the culture was diluted to reach a final of 0.01. On day 2, 60 ml syringes (Becton, Dickinson and Company, NJ) filled with 10 ml SC + GAL/RAF (2%) + 5 of sulforhodamine B (Sigma-Aldrich) and SC + glucose (GLC) (2%) media were prepared, as well as sink syringes (filled with 10 ml ddH2O); capillaries and needles connected the syringes to the microfluidic device. Temperature in the microenvironment surrounding the moving stage of the microscope was controlled and fixed at . Before connecting media and sink syringes, the microfluidic device MFD0005a wetting is carried out as previously described.20 After removing air bubbles, media and water filled 60 ml syringes were attached to the device and correct functioning was checked by inspecting the red-fluorescence emitted by Sulforhodamine B as a result of the automatic height control of the syringes. This allowed us to carry out a correct calibration of the actuation strategy before the actual experiment is run. At this point, yeast cells are injected in the microfluidic device by pouring the batch culture in a 60 ml syringe similar to the ones used to media and sinks. Once cells were trapped in the defined area (see Ref. 20 for details) Nikon Perfect Focus System (PFS) is activated to assist auto-focusing during the experiment and the microscope is programmed to acquire images at every 5 min interval: phase contrast (40 ms, exposure time) and epifluorescence images (green fluorescence, 300 ms exposure time; red fluorescence, 100 ms exposure time) were acquired to allow the image processing algorithm to (a) locate the cells (bright field images) (b) quantify the synthesised Gfp (green fluorescence) and (c) verify the correct administration of GAL/GLC (red fluorescence). Yeasts were imaged for up to 16 h. During this interval, Galactose and Glucose were alternatively provided to the yeast chamber. The frequency was chosen according to the previous literature,19 specifically cells were provided with a galactose enriched medium for 180 min and then with glucose for the following 180 min and so on until the end of the experiment, as depicted in Figure 2, Panel B.
V. EXPERIMENTAL RESULTS
As described before, the concentration of galactose was tracked with a red fluorescent dye (Sulforhodamine B), so that it was possible to obtain a time profile of the actual input provided to the cells by measuring the fluorescence of the medium in the red spectrum (Figure 2, Panel B). The average Gfp fluorescence of the cells' population, in the green spectrum, is instead taken as the system output (Figure 2, Panel A).
The representative dataset shown in Figure 2 is consistent with the expected behaviour of the promoter under investigation. The presence of galactose induces the expression of Gfp from the GAL1 promoter, by activating the Gal4 transcription factor, whereas glucose represses the activity of the GAL1 promoter and hence of the Gfp production. Therefore, the Gfp fluorescence level is expected to increase or to stay at a high steady state value when the cells are fed with galactose. It should instead decrease, or stay at a low steady state, when glucose is provided. At the beginning of the experiment, cells exhibit a high fluorescence level, since cells were taken from a galactose overnight culture. Gfp level was expected to remain almost constant in the first 180 min of the experiment (Figure 2). However, as can be seen in Figure 2, its value was observed to slightly decrease, probably because of the stress during the loading into the microfluidic device. In the rest of the experiment, the activity of the GAL1 promoter in response to the other two pulses of galactose is consistent with the expected behaviour, as revealed by the Gfp fluorescence level (Figure 2).
VI. IDENTIFICATION
A. Candidate models
The identification of the parameters of a mathematical model able to capture the dynamics of the GAL1 promoter was carried out without assuming any a priori knowledge of the underlying chemical and physical processes occurring inside the cells. Input and output data (i.e., level of galactose quantified by the red dye as input and the level of Gfp fluorescence as output) were used to fit different model structures thus considering the system as a “black box.” We used the platform to test and compare the following common identification methods (a description of the metrics used to assess the effectiveness of the identification is given in Sec. VI B).
1. ARX models
Auto Regressive exogenous models (ARX)4 with a delay from input to output of the form
the output y(t) represents the measured fluorescence level at the current time t and it is assumed to be proportional to the sum of its na past values, a sort of memory of the system, and to the sum of nb past values of the input (i.e., galactose/glucose). In addition, the input is assumed not to act instantaneously on the output but after a delay of samples.
2. First order transfer function with delays
First-order transfer function with a time delay of the form
The transfer function is just a different mathematical representation, using the Laplacian operator, of the first-order linear ordinary differential equation (ODE) reported as
This equation represents the rate of change of the Gfp fluorescence level () as a function of a production term proportional to the input (, i.e., galactose/glucose)), which is assumed to act on the output only after a delay equal to Td. A linear degradation term for the reporter protein () is also present.
Parameters to be estimated, via the Prediction Error Minimisation (PEM) criterion,4 are the transfer function gain Kp, the time constant Tp and the delay Td.
3. State space models
Higher-order (linear time-invariant) state space model, which assumes that more than one differential equation is needed to correctly model the promoter dynamics. These extra equations can either represent physical quantities (i.e., mRNA, protein) or abstract quantities useful to model the system.
The generic state-space model of interest can be written as
where x is the state vector of dimension n, A is a n-by- n matrix, B is a column vector of dimension n, C is a row vector of dimension n and x0 is the vector of initial conditions.
For this class of models, two different algorithms were considered, prediction error minimisation (PEM applied to state space models) and N4SID.4 The system order n, the coefficients of the matrices A, B, and C, and the vector of initial conditions x0 were estimated from the experimental data.
4. Nonlinear model
We considered the use of a nonlinear model of the form
this nonlinear differential equation models the rate of change of the Gfp fluorescence level ( in response to galactose stimulation (the ), delayed of τ minutes) as a non linear function of a production term (the Hill function22) and of a linear degradation term (Dy(t)). In the Hill function, K represents the value of u needed to achieve half of the maximal production rate v, and the H exponent, Hill coefficient, governs the steepness of this function (the higher the value, the more similar the function is to a step function). The α parameter represents the promoter “leakiness,” i.e., the production rate in the absence of galactose (i.e., u = 0).
We added an explicit delay τ to the input, as in the case of the ARX model and the Transfer Function, since from the experimental results (see Figure 2) the response of the system, namely changes in Gfp values, in response to switches between galactose and glucose and vice-versa, appears to be delayed.
The Simulated Annealing algorithm23 was applied to estimate the parameters α, v, H, K, D, τ and the initial condition (IC) for the state variable x starting from an initial guess of all of them. The estimation has been carried out by minimising the following objective function:
where for the i-th datapoint, is the model output in response to the measured input that leads to the real system output yi and, is the average of y.
It is worth pointing out that a systematic experimental comparison of the identification approaches described above when applied to in-vivo biological systems has not been carried out in the existing literature.
B. Metrics and validation
To assess the performance of each identification scenario and to carry out a comparison among different methods, we used the following metrics to evaluate their predictive ability.
1. Akaike's final prediction error and information criterion
Akaike's Final Prediction Error (FPE) or AIC4 can be used to evaluate the quality of a given model by testing how it captures the system response to a known input signal. The metrics can be computed as follows:
and
where is the vector of estimated parameters, m is the number of estimated parameters, N is the dimension of the estimation dataset and are the prediction errors. Both FPE and AIC take their smallest values when the model is the most accurate.
2. Fitting percentage
This index provides a measure of the percentage of the output variation that is reproduced by the model and is given by the following formula:4
where for the i-th datapoint, is the model output in response to the measured input that leads to the real system output yi and, is the average of y. This index can effectively be used also for cross-validation purposes by evaluating the model ability to capture data that are different from those used for the identification of its parameters.
C. Identification strategies
The suitability of models identified with each of the strategies considered in this paper has been tested in two sets of different scenarios. In the first set, both identification and validation of the models were performed on the same dataset (scenarios I and II). In the second set, the predictive ability of each of the models is evaluated by using a dataset different from the one used for identification (scenarios III and IV).
Scenario I
The parameters of each class of models were estimated on the dataset depicted in Figure 2 that consist of experimental measurements for both the input (red dye fluorescence) and output (Gfp fluorescence) filtered via a low pass filter to reduce measurement noise. The same dataset was used for validation.
Scenario II
The data used to identify the mathematical models are shown in Figure 3; differently from scenario I, here the input is the ideal concentration of galactose (i.e., a square waveform) and not the estimated concentration as measured by the red dye fluorescence; the output is the same as in scenario I, i.e., the filtered measured green fluorescence of the cells. The models obtained were then validated on the same dataset used for identification.
Scenario III
In this scenario, only the first 700 min of the data in scenario I (Figure 2) were used to estimate the parameters of the mathematical models described in Sec. VI A, validation was performed on the entire dataset.
Scenario IV
In this scenario, only the first 700 min of the data in scenario II (Figure 3) were used to estimate the parameters of the mathematical models. As in the case of scenario III, validation was performed on the entire dataset.
VII. RESULTS
The results of the identification for the different model structures across the four scenarios are described below. The value of the indices given in Sec. VI B is summarised in Table I.
. | Scenario I . | Scenario II . |
---|---|---|
FPE | ||
AIC | ||
FIT (%) | ||
Scenario III | Scenario IV | |
FPE | ||
AIC | ||
FIT (%) |
. | Scenario I . | Scenario II . |
---|---|---|
FPE | ||
AIC | ||
FIT (%) | ||
Scenario III | Scenario IV | |
FPE | ||
AIC | ||
FIT (%) |
The order of the ARX models used to capture the system behaviour over scenarios I-IV obtained by the System Identification algorithm is the following:
For the delayed transfer function, we obtained the following parameters:
When state space models are used, we obtained the following order (the full matrices were n):
Scenarios I and II: n = 4 with both N4SID and PEM [Figures 4(c) and 5(c)],
Scenarios III and IV: n = 5 with both N4SID and PEM [Figures 6(c) and 7(c)].
Finally for the non linear model, we found the following values for the parameters of Eq. (5) for each scenario:
Scenario I: [0.0000,0.0055,2.3678,1.8792,0.0048,1.2719,54.2024] [Figure 4(d)],
Scenario II: [0.0000,0.0138,1.7930,2.5890,0.0075,1.4449,66.4625] [Figure 5(d)],
Scenario III: [0.0014,1.2429,3.4970,3.8305,0.0126,1.2381,70.9909] [Figure 6(d)],
Scenario IV: [0.0018,0.0801,2.2255,3.3595,0.0101,1.9086,72.1324] [Figure 7(d)].
Both the ARX model and the Transfer Function explicitly include a parameter to account for a delayed response of the promoter to a change in galactose concentration. In the ARX model, the delay is captured by the parameter nk which was estimated, in the different scenarios, to range from 10 to 14 samples; since each sample is measured at 5 min time intervals, the delay can be estimated to be between 50 min and 70 min. Interestingly, the delay (Td) estimated by the transfer function is in the same range, varying from 57.82 min to 68.07 min across the four scenarios. The state space model does not include an explicit time delay, although it is possible to add one, and hence it needs a relatively high number of states (4 or 5) to correctly capture the observed dynamics. For the nonlinear model, the estimated time delay ranges from 54.02 min to 72.13 min, in agreement with the linear models.
Biologically, the presence of the delay may be explained by several biological processes, such as the time needed to activate the Gal4 transcription factor following galactose induction, or for the transcription initiation complex to form. However, since the GAL1 promoter has been very well studied in the literature, it is known that both the Gal4 activation and initiation of transcription from the GAL1 promoter are quite fast, in the order of minutes.24 On the contrary, the estimated delay from the system identification procedure is in the order of an hour; therefore, the most likely explanation is the time required by the fluorescence reporter protein (Gfp) to fold and mature, as well as, its high half-life known to be in the order of hours.25
These observations raise an obvious but important consideration: the reporter protein half-life, i.e., the protein stability, must be commensurate to the dynamics of the promoter to be modelled. Hence, it is important to have at least a rough estimate of the promoter dynamics, otherwise, we may filter out fast dynamics due to the reporter protein.25 In theory, a very unstable protein, with a very short half-life, should be the best choice to avoid filtering out fast promoter dynamics, however unstable proteins have a weak fluorescent signal, and hence they increase measurement errors, therefore in practice a balance between half-life and fluorescent intensity must be found to perform successful experiments.25
Regarding the performance of the different models on the various scenarios, we can draw two main conclusions by inspecting Table I: (1) the FIT index (as defined in Sec. VI B), which is the only one out of the three indices independent of the number m of model parameters, is consistently smaller (i.e., worse) in scenarios III and IV when compared to scenarios I and II. This is to be expected since in scenarios III and IV, the models are identified using a smaller number of samples. Moreover, the prediction error is estimated on samples not used for the identification. The FPE and AIC indices are not so easy to interpret since in each scenario a different model order (i.e., the number m) is chosen (at least for the ARX and the state-space models) and these indices depend also on the model order; (2) by comparing scenarios I and III with scenarios II and IV, we noticed that using the “ideal” input signal (scenarios II and IV) decreased the performance of all the models, but for the ARX. This reduction may indicate that the fluctuations present in the red fluorescent dye measurement, which is proportional to the galactose concentration, are not due to measurement errors but probably contain some relevant information.
The nonlinear model is seemingly the worst performer, however, this has to be attributed to the heuristic algorithm (i.e., simulated annealing) that we used to estimate model parameters, rather than to the model structure. Indeed, unlike linear models, where it is possible to exactly compute the optimal solution which minimises the cost function, in the case of nonlinear models a heuristic approach must be employed. The parameters of the heuristic method have to be carefully chosen for best performance (i.e., the starting temperature, the cooling function, etc.).
VIII. DISCUSSION
A pressing open problem in quantitative biology is to develop integrated experimental and computational system identification approaches to biological processes, such as transcriptional control of gene expression, to derive quantitative dynamical models of complex molecular mechanisms. We believe that the use of a microfluidics based platform, such as the one described here, can be instrumental for the design of innovative identification strategies and address the need for better quantitative models of biological processes.
The results here described confirm that complex mechanisms underlying transcriptional control from a eukaryotic promoter, which requires the coordinated action of several protein complexes and it is still not completely understood,26 can indeed be identified using standard System Identification strategies without requiring any detailed a priori knowledge of the promoter to be modelled.13–16
The experimental platform we developed represents a cheap but accurate technological solution that may be used to analyse and model any promoter of interest. However, in order to use the platform to model promoters which are not inducible by small molecules, an extra step is required as depicted in Figure 8. The transcription factor, together with a reporter fluorescent protein, has to be cloned downstream of an inducible promoter, in order to be able to generate a time-varying concentration of the transcription factor, which can be used as input for the system identification procedure.
One of the limitations of our approach is the need for a fluorescent reporter protein, whose half-life has to be carefully chosen in order to avoid filtering out the dynamic behaviour of the promoter under investigation.
The different identification strategies used in the paper show comparable performances indicating that even a linear model with the simplest structure, such as the linear first-order transfer function with a delay, can be effectively used to model promoter dynamics. A linear model may be preferred to nonlinear ones, because of its simplicity and versatility for control purposes, e.g., Ref. 27. Moreover, since nonlinear model parameters can be identified only using heuristic method, the extra effort required to tune heuristic algorithm parameters is worthwhile only when a linear model fails to satisfactorily capture the promoter dynamics.
Indeed, there will be cases in which a linear model will not be able to capture the promoter behaviour, such as when adaptation to the input occurs.8 For example, the promoter of the STL1 gene, which encodes a glycerol protein symporter, is controlled by the transcription factor Hog1, which is activated by osmotic shock. Our experimental system identification platform may be applied using as input the osmotic stress (i.e., increasing extracellular pressure) and, as output, a Gfp reporter protein downstream the STL1 promoter.8 However, a sustained osmotic stress will at first activate the promoter but then, once enough glycerol is produced to counteract the external pressure, the cell will stop expressing Gfp from STL1 promoter, because of cell adaptation to osmotic stress (for details refer to Ref. 8). Hence, a linear model would fail in capturing the promoter dynamics and a nonlinear model is needed instead.8
The experimental system identification platform here described allows fast prototyping of eukaryotic promoters to probe their dynamical behaviour and to identify input-output quantitative models. We wish to emphasise that our experimental strategy for system identification could be effectively implemented “on-line” at the beginning of a control experiment such as those reported in Refs. 8–10 and 27.
ACKNOWLEDGMENTS
This work was supported by a Human Frontiers Science Program grant (RGP0020/2011) to DdB. We are grateful to Professor Botstein for providing yeast strains (yGIL294, yGIL337, and yGIL339) and to Professor Hasty for providing the microfluidics design and the master silicon wafer to produces the devices.