Search processes are ubiquitous in physical and biological phenomena, often involving the random motion of molecules. In particular, transcription factors (TFs) are proteins that regulate gene expression and need to find their DNA targets quickly, which is difficult to achieve with random motion alone. Nature came up with a remarkable solution known as facilitated diffusion, which combines 1D diffusion along the DNA with “excursions” of 3D diffusion that help the TF quickly reach distant parts of the DNA. In this paper, we show that this process can be analyzed naturally using the concept of conditional probability, providing an alternative intuition for the effectiveness of this mechanism.
I. MOTIVATION
Elchanan Mossel's Dice paradox1 poses a simple probability question that gives rise to neat and non-intuitive results. The paradox is phrased as follows:2
You roll a fair six-sided die until you get 6. What is the expected number of rolls, if we consider only those series of rolls in which each roll in the series is an even number?
It is a common mistake to think that the answer is 3 based on the following logic: if the possible outcomes are either 2, 4 or 6 and we wish to get a 6 then the number of rolls follows a geometric distribution with a parameter of 1/3.
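The naive answer can be checked numerically. The following is a minimal sketch, not part of the original analysis, that estimates the conditional expectation by rejection sampling: it rolls a fair six-sided die until the first 6 and keeps only the sequences in which every roll was even. The function name `conditional_mean_rolls` and its parameters are illustrative choices, and the sketch uses Python's standard `random` module.

```python
import random


def conditional_mean_rolls(trials=100_000, seed=0):
    """Estimate E[number of rolls until the first 6 | all rolls even].

    Rejection sampling: simulate complete sequences and keep only those
    in which no odd number appeared before the first 6.
    Returns (conditional mean, fraction of sequences kept).
    """
    rng = random.Random(seed)
    kept = 0          # number of accepted (all-even) sequences
    total_rolls = 0   # total length of the accepted sequences
    for _ in range(trials):
        n_rolls = 0
        all_even = True
        while True:
            roll = rng.randint(1, 6)  # inclusive on both ends
            n_rolls += 1
            if roll % 2 != 0:
                all_even = False
            if roll == 6:
                break
        if all_even:
            kept += 1
            total_rolls += n_rolls
    return total_rolls / kept, kept / trials


if __name__ == "__main__":
    mean_rolls, accepted = conditional_mean_rolls()
    print("accepted fraction:", accepted)
    print("conditional mean number of rolls:", mean_rolls)
```

The accepted fraction should come out close to 1/4 and the conditional mean close to 3/2 rather than 3. Conditioning on all-even sequences favors short sequences, since every additional roll is another chance to roll an odd number.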
Next, we will describe an important biological process known as “facilitated diffusion,” which, intriguingly, is related to the concept of conditional probability. In fact, we will see that this mechanism shares several key properties with the “paradox” discussed above.
II. FACILITATED DIFFUSION
In principle, the TF search could have been executed by first attaching to the DNA at some point, followed by processive motion until the target is found. For velocities large enough, this could have led to a superior search mechanism. However, such motion inevitably consumes energy (since it breaks detailed balance). For this or other reasons, this solution was not chosen by Nature. Note that other biological scenarios do involve the processive motion of proteins in 1D, see, e.g., Ref. 8.
Facilitated diffusion is a mechanism in which the TF performs 1D diffusion along the DNA and at any given moment can fall off and perform 3D diffusion (a 3D excursion) until it reattaches to the DNA at some random point along it. The randomness of the reattachment point is a key assumption that allows significant simplifications of the mathematical description of the mechanism. The 1D diffusion along the DNA and the 3D excursions alternate until the TF hits its target. This model is well studied, see Refs. 3, 4, and 9 for reviews, and empirical evidence supporting it was found in bacteria.10 Figure 1 shows a cartoon illustrating this mechanism. As we shall see, the dependence of the search time on the underlying microscopic parameters (diffusion constants and dimensions) is fundamentally different from that of the 1D and 3D models discussed above. For a broad parameter regime, it can lead to a significant reduction of the search time.
Fig. 1. Cartoon illustrating the facilitated diffusion search mechanism. The TF (circle) performs 1D diffusion along the DNA (solid line). It then falls off, performs 3D diffusion, and lands somewhere else along the DNA (3D diffusion excursion), which could bring it closer to its target (square) or further away from it. This process repeats itself until the TF finds the target.
We will now present a mathematical description of this model and show that it allows a significant speed-up compared with the results of Eqs. (5) and (6). Namely, we will arrive at a search time that scales like L instead of L^2. The key principle behind the success of this mechanism is analogous to what we described earlier when discussing the dice “paradox”: falling off during a 1D search attempt and retrying at a random point along the DNA resembles the “throwing away” of long die-rolling sequences. In other words, the probability of falling off is analogous to the probability of rolling an odd number. Note that in the dice problem, once we get an odd number we “reset” the counter, while in the case of facilitated diffusion, failed attempts do contribute to the total time. However, in both cases, the “resetting” allows us to avoid lengthy runs and thus shortens the mean first passage time (MFPT). Furthermore, in both cases, we see the dramatic effects of the seemingly benign conditioning process: in the dice problem, it leads to an MFPT that is always smaller than 2 (for any number of faces of the die). In the facilitated diffusion problem, conditioning on a successful search leads to an MFPT that is linear in the distance from the target, rather than quadratic as we might intuitively expect.
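The bound of 2 for the dice problem follows from a short calculation. As a sketch, assume a fair die with an even number of faces $n$, rolled until the value $n$ first appears, and let $T$ be the number of rolls. The symbols $n$, $T$, and $q$ are introduced here for illustration only.

```latex
% Probability that the first n appears on roll k and all rolls are even:
% the first k-1 rolls are even but not n, and the k-th roll is n.
P(T = k,\ \text{all even})
  = \left(\frac{n/2 - 1}{n}\right)^{k-1} \frac{1}{n}
  = q^{k-1}\,\frac{1}{n},
  \qquad q = \frac{1}{2} - \frac{1}{n}.

% Normalizing over k gives a geometric distribution
% with success probability 1 - q:
P(T = k \mid \text{all even}) = (1 - q)\, q^{k-1}.

% Hence the conditional mean is
\mathbb{E}[T \mid \text{all even}] = \frac{1}{1 - q} = \frac{2n}{n + 2} < 2 .
```

For $n = 6$ this gives $3/2$, and the value approaches 2 from below as $n$ grows.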
III. MATHEMATICAL DESCRIPTION
We now present the mathematical description of the facilitated diffusion mechanism. We start from a microscopic point of view and then move on to a continuum description; by properly taking the conditional probability into account, we arrive at a description of the full search process. Our results are closely related to the extensive analytical results obtained in Ref. 11, although our derivation is more elementary and focuses on different aspects of the mathematical description of the mechanism.
We start by solving the one dimensional problem of a TF hitting the target while being bound to the DNA. Afterwards we'll take the 3D excursions into account.
A. The 1D problem
Equation (12) is analytically solvable. Even so, this solution is of little significance given that the search mechanism is not “measured” at the level of a single search attempt but, as we shall see later, at the level of numerous search attempts. In other words, what we will be interested in is the MFPT associated with Eq. (12) (not to be confused with the MFPT of the entire facilitated diffusion process, which we calculate in Sec. III B). For the sake of completeness, we present the calculation of the FPT distribution in Appendix A.
B. Taking 3D excursions into account
Up to this point we've only considered a single 1D search attempt. Taking the 3D excursions into account, we are able to model the whole process. We will do so by assuming a finite DNA of length 2L with the TF target at its center, and by assuming that the reattachment point after a 3D excursion is uniformly distributed along the DNA. Moreover, we assume that the search starts with the TF bound to the DNA at a random point along it; relaxing this assumption is inconsequential.
We see that the search time now scales linearly with L instead of quadratically! As we mentioned before, this is reminiscent of the dice “paradox”—we do not “keep” long and unsuccessful sequences. Moreover, if we optimize the search time with respect to Γ we arrive at a neat conclusion that the optimal search time is obtainable by taking and then the TF spends half of its time in 1D and half in 3D with the overall search time of . Using typical values for bacteria , and as could be found, for example, in Refs. 3 and 4, we indeed arrive at an overall search time of the order of tens of seconds.
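The linear scaling can be illustrated with a small lattice simulation. This is a minimal sketch under simplifying assumptions that are not taken from the equations of the text: the DNA is a lattice of sites -L..L with the target at 0, each 1D hop costs one time unit, the TF falls off with probability `p_off` before each hop, every 3D excursion costs a fixed time `tau_3d`, and reattachment is uniform over the lattice. The function name and all parameter values are hypothetical.

```python
import random


def mean_search_time(L, p_off=0.01, tau_3d=100.0, trials=400, seed=0):
    """Mean search time on a lattice of sites -L..L with the target at 0.

    Assumptions of this sketch: unit time per 1D hop, fall-off with
    probability p_off before each hop, a fixed duration tau_3d for each
    3D excursion, and uniform reattachment along the lattice.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.randint(-L, L)  # random initial binding site
        t = 0.0
        while x != 0:
            if rng.random() < p_off:
                # Fall off: 3D excursion, then uniform reattachment.
                t += tau_3d
                x = rng.randint(-L, L)
            else:
                # 1D hop to a neighboring site; the walker stays put
                # if the hop would take it past either end.
                x += 1 if rng.random() < 0.5 else -1
                x = max(-L, min(L, x))
                t += 1.0
        total += t
    return total / trials


if __name__ == "__main__":
    for L in (100, 200):
        print("L =", L, "mean search time:", mean_search_time(L))
```

Doubling L should roughly double the returned time, in contrast to the roughly fourfold increase expected for purely 1D diffusion. The estimate is noisy because the search time is broadly distributed, so a larger `trials` value gives a cleaner comparison.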
IV. DISCUSSION
In this paper, we revisited a well-known and well-studied mechanism for how TFs search for their target genes. We showed how the mathematical description of the mechanism naturally utilizes the basic concept of conditional probability.
The facilitated diffusion problem we discussed here is mathematically related to the class of problems of first passage time under restart. In these problems, one is interested in the first passage time of a random walker that is “reset” at some rate, typically to a particular site (in contrast to the random resetting encountered in the facilitated diffusion problem). Intriguingly, for these problems, there is an optimal restart rate that can speed up the search dramatically. Reference 18 studies the statistical properties of the absorption of a searcher by a static target under constant-rate resetting of the searcher position. A generic treatment of first passage under resetting is given in Ref. 19, and a review thoroughly studying different cases and generalizations of the resetting time is found in Ref. 20. Such processes are deeply related to the inspection paradox of probability theory, where a sampling bias may distort the statistics in counter-intuitive ways. For instance, in a famous example of this paradox, the average waiting time that a person measures when arriving at the bus station at a random, uniformly distributed time can be greater than the average time between consecutive buses. In fact, for heavy-tailed distributions, the former can be infinite even when the latter is finite! Resetting overcomes this sampling bias and in some cases may even shorten the waiting time compared with the distribution's mean. A review studying the relations between stochastic processes under resetting and the inspection paradox is found in Ref. 21. This study also characterizes the processes for which resetting enables a speed-up compared with the simple mean of the distribution.
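The bus example can be made concrete with a few lines of code. The sketch below is an illustration only: the function name `mean_wait` and the three bus schedules are hypothetical choices, each with a mean gap of about 1 between buses.

```python
import random


def mean_wait(intervals):
    """Mean waiting time for a passenger arriving at a uniformly random time.

    A passenger who lands in a gap of length x waits x/2 on average and
    lands in that gap with probability proportional to x, so the mean
    wait is sum(x^2) / (2 * sum(x)).
    """
    return sum(x * x for x in intervals) / (2.0 * sum(intervals))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    # Buses exactly on schedule.
    regular = [1.0] * n
    # Exponentially distributed gaps with mean 1.
    poisson = [rng.expovariate(1.0) for _ in range(n)]
    # Bursty schedule with mean gap 1: mostly short gaps, sometimes a long one.
    bursty = [0.1 if rng.random() < 0.9 else 9.1 for _ in range(n)]
    for name, gaps in (("regular", regular),
                       ("poisson", poisson),
                       ("bursty", bursty)):
        print(name, "mean gap:", sum(gaps) / n, "mean wait:", mean_wait(gaps))
```

For the regular schedule the mean wait is half the gap, for exponential gaps it is about equal to the mean gap, and for the bursty schedule it is roughly four times the mean gap. The sampling bias grows with the variance of the gaps, which is why heavy-tailed distributions are the extreme case.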
While facilitated diffusion is a powerful mechanism for shortening the search time, both extensions of this mechanism and other, completely different mechanisms are worth mentioning. Still within the framework of facilitated diffusion, one may take into account the dynamics of the DNA molecule itself, as discussed in Ref. 22, or the energy landscape the TF experiences while moving along the DNA, as discussed in the reviews.3,4,23 Recently, Ref. 24 related diffusion on such a disordered landscape to the phenomenon of Anderson localization and discussed its implications for facilitated diffusion. Another intriguing work directly related to facilitated diffusion is Ref. 25, in which the authors use a facilitated-diffusion-based model to study the architecture of bacterial genomes.
Although facilitated diffusion is suitable for describing the search mechanism of some TFs in bacteria, other types of TFs (and proteins in general) could have a significantly different structure, leading to inherently different dynamics. Reference 26 discusses a protein extended in one dimension in a manner that enables it to interact with many sequences along the DNA in parallel, which effectively reduces the dimensionality of the search (from three to two dimensions) and causes a remarkable speed-up of the search process that is distinct from the mechanism we explored here. The concept of dimensionality reduction in biological search processes dates back to the seminal work of Delbrück and Adam in 1968 (Ref. 27). This work was recently revisited and extended.28 Another distinct example is given in Refs. 29–31, which discuss the search mechanism TFs use in eukaryotic cells. In this case, the TFs often have a long polymeric tail called the intrinsically disordered region that plays a major role in the search, though the theoretical framework for this scenario has yet to be developed.
ACKNOWLEDGMENTS
The authors thank Wencheng Ji, Naama Barkai, Yariv Kafri, Ulrich Gerland, Shlomi Reuveni, Sarah Kostinski, and Raphael Voituriez for helpful discussions and comments.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
APPENDIX A: CALCULATING THE FPT
In the following, we calculate the full FPT distribution, as opposed to the MFPT calculated in the main text.
APPENDIX B: DERIVING THE MFPT EQUATION DIRECTLY FROM THE EQUATION FOR THE FPT
In the main text, we derived the equation for the MFPT using the appropriate recursion relation following fundamental principles. In the following, we shall present an alternative derivation starting from the equation for the FPT, namely, Eq. (12) (also presented in the previous appendix as Eq. (A1)).
Note that we expect this time to be finite even on an infinite domain, since the finite fall-off rate would prevent the mean time from diverging—in contrast, for example, to the diverging MFPT associated with normal random walks in 1D (we have also shown this directly from the FPT distribution in the previous appendix).
APPENDIX C: DICE PARADOX CODE
The following Python script demonstrates the result of the dice paradox. It could be used to help the reader to develop intuition towards the paradox's result. There are two input parameters: “sides” refers to the number of sides the die has and “runs” refers to the number of runs included in the estimation of the expected value.
import numpy as np

np.random.seed(42)  # fixed seed for reproducibility

sides = 1000  # number of sides of the die (must be even)
runs = 1000   # number of runs averaged over

T_w_cond = 0   # mean number of rolls, accounting for conditioning
T_wo_cond = 0  # mean number of rolls, without conditioning

for i in range(runs):
    hit_tgt = False
    t_w_cond = 0
    t_wo_cond = 0
    while not hit_tgt:
        t_w_cond = t_w_cond + 1
        t_wo_cond = t_wo_cond + 1
        roll = np.random.randint(1, sides + 1)  # uniform on 1..sides
        if roll % 2 != 0:
            t_w_cond = 0  # odd roll: discard the current sequence
        if roll == sides:
            hit_tgt = True
    T_w_cond = T_w_cond + t_w_cond / runs
    T_wo_cond = T_wo_cond + t_wo_cond / runs

print('Number of rolls w/o accounting for'
      + ' conditioning:', str(T_wo_cond))
print('Number of rolls when accounting for'
      + ' conditioning:', str(T_w_cond))
For each run, the script “rolls” the die until it lands on “sides” while keeping track of two counters: the total number of rolls and the number of rolls since the last odd value (the latter is reset whenever an odd value is rolled). The script ends by printing the mean number of rolls without taking the conditioning into account (which we expect to be about the same as the input “sides”) and the mean number of rolls when taking the conditioning into account.