Search processes are ubiquitous in physical and biological phenomena, often involving the random motion of molecules. In particular, transcription factors (TFs) are proteins that regulate gene expression and need to find their DNA targets quickly, which is difficult to achieve with random motion alone. Nature came up with a remarkable solution known as facilitated diffusion, which combines 1D diffusion along the DNA with 3D diffusive “excursions” that help the TF quickly reach distant parts of the DNA. In this paper, we show that this process can be analyzed naturally using the concept of conditional probability, providing an alternative intuition for the effectiveness of this mechanism.

## I. MOTIVATION

Elchanan Mossel's Dice paradox^{1} poses a simple probability question that gives rise to neat and non-intuitive results. The paradox is phrased as follows:^{2}

You roll a fair six-sided die until you get 6. What is the expected number of rolls, if we consider only those series of rolls in which each roll in the series is an even number?

It is a common mistake to think that the answer is 3 based on the following logic: if the possible outcomes are either 2, 4 or 6 and we wish to get a 6 then the number of rolls follows a geometric distribution with a parameter of 1/3.

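However, there is a *conditioning* present in the formulation of the paradox that should lead us to make use of a fundamental concept in probability theory, *conditional* probability. All rolls are still possible, but the *conditioning* on rolling only even numbers makes us eliminate a series from our statistics as soon as an odd number is rolled. The basic concept of conditional probability is summarized by the following equation:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \tag{1}$$

where $P(A|B)$ is the probability that *A* occurs, given that *B* has occurred, and $P(A \cap B)$ is the probability that both *A* and *B* occur. As mentioned, the answer of 3 is wrong because it does not properly take the conditioning into account. We can perform the correct calculation using Eq. (1). Instead of *A* we'll have $X_i$, which stands for the event of getting a 6 on the *i*th roll, so that $P(X_i) = 1/6$. The event *B* will occur whenever a series has only even numbers (2s and 4s) before the first 6. *P*(*B*) will allow us to calculate the fraction of series that qualify for consideration because they meet the condition. First, we'll calculate $P(X_i \cap B)$, the probability of rolling only 2s and 4s on the first $i-1$ rolls and a 6 on the *i*th roll:

$$P(X_i \cap B) = \left(\frac{1}{3}\right)^{i-1} \frac{1}{6}. \tag{2}$$

*P*(*B*), the fraction of all possible series that qualify for consideration, is simply the sum of $P(X_i \cap B)$ over all possible values of *i*,

$$P(B) = \sum_{i=1}^{\infty} \left(\frac{1}{3}\right)^{i-1} \frac{1}{6} = \frac{1}{4}. \tag{3}$$

The expected number of rolls, conditioned on *B*, then follows from Eq. (1):

$$\sum_{i=1}^{\infty} i\, P(X_i|B) = \sum_{i=1}^{\infty} i\, \frac{P(X_i \cap B)}{P(B)} = \frac{2}{3} \sum_{i=1}^{\infty} i \left(\frac{1}{3}\right)^{i-1} = \frac{3}{2}. \tag{4}$$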
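As a quick sanity check of the conditional-probability calculation, the sums over $P(X_i \cap B)$ can be evaluated numerically. The sketch below is only an illustration: the function name `conditional_expected_rolls` and the truncation parameter `max_rolls` are assumptions introduced here, and the die is assumed to have an even number of sides with its highest face as the target.

```python
from fractions import Fraction


def conditional_expected_rolls(sides=6, max_rolls=200):
    """Return (P(B), expected number of rolls given B) for an even-sided die.

    The infinite sums are truncated after `max_rolls` terms (an assumption of
    this sketch; the neglected tail is geometrically small).
    """
    p_target = Fraction(1, sides)                    # rolling the target face
    p_other_even = Fraction(sides // 2 - 1, sides)   # even, but not the target
    p_b = Fraction(0)        # accumulates sum_i P(X_i and B)
    weighted = Fraction(0)   # accumulates sum_i i * P(X_i and B)
    for i in range(1, max_rolls + 1):
        joint = p_other_even ** (i - 1) * p_target
        p_b += joint
        weighted += i * joint
    return float(p_b), float(weighted / p_b)


p_b, expected = conditional_expected_rolls(sides=6)
print("P(B) =", p_b, " expected rolls given B =", expected)
```

For a six-sided die this reproduces a qualifying fraction of about 1/4 and an expected number of rolls of about 3/2, rather than the naive answer of 3.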

Next, we will describe an important biological process known as “facilitated diffusion,” which, intriguingly, is related to the concept of conditional probability. In fact, we will see that this mechanism shares several key properties with the “paradox” discussed above.

## II. FACILITATED DIFFUSION

If the TF were to search for its target by 1D diffusion along the DNA alone, the search time could be estimated as

$$t_{search} \sim \frac{L^2}{D_{1D}}, \tag{5}$$

with *L* being the DNA total length and $D_{1D}$ the 1D diffusion coefficient. For typical values relevant for bacteria, $L = 10^6\,\mathrm{nm}$ and $D_{1D} = 10^4$–$10^5\,\mathrm{nm^2/s}$ (taken from Refs. 3 and 4), this results in a search time of thousands of hours. The estimate for 3D diffusion is as follows:

$$t_{search} \sim \frac{V}{D_{3D}\, r}, \tag{6}$$

with *V* being the volume restricting the TF and its target, $D_{3D}$ the 3D diffusion coefficient, and *r* the typical spatial size of the target. Equation (6) can be obtained by dimensional analysis assuming that the mean search time has to be inversely proportional to the concentration of searchers (forcing the mean search time to be proportional to the search volume) or, more rigorously, by solving a first-passage-time problem of a random walker hitting a target in a restricted volume, as found in Refs. 5 and 6. For typical values relevant for bacteria, $V = 10^9\,\mathrm{nm^3}$, $D_{3D} = 10^7\,\mathrm{nm^2/s}$, and $r = 0.34\,\mathrm{nm}$ (taken from Refs. 3 and 4), this results in a search time of hundreds of seconds. This result is much better than the 1D case, but in reality it is likely that the protein would spend a finite length of time bound to a non-target site, which would further increase the value of $t_{search}$, making this estimate optimistic.^{7}
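The two order-of-magnitude estimates can be reproduced with a few lines of arithmetic. The sketch below assumes the scaling forms $t \sim L^2/D_{1D}$ and $t \sim V/(D_{3D} r)$ with prefactors of order one dropped; the function and variable names are illustrative choices, and the numerical values are the typical bacterial values quoted above.

```python
# Order-of-magnitude search times for pure 1D and pure 3D diffusion.
# Assumption: prefactors of order one are dropped in both estimates.

L_DNA = 1e6               # total DNA length, nm
D_1D_RANGE = (1e4, 1e5)   # 1D diffusion coefficient, nm^2/s
VOLUME = 1e9              # confining volume, nm^3
D_3D = 1e7                # 3D diffusion coefficient, nm^2/s
TARGET_SIZE = 0.34        # typical target size, nm


def search_time_1d(length, d_1d):
    """1D estimate: time to diffuse over the full DNA length, in seconds."""
    return length ** 2 / d_1d


def search_time_3d(volume, d_3d, target_size):
    """3D estimate: time to find a small target in a confined volume, in seconds."""
    return volume / (d_3d * target_size)


t_1d_hours = [search_time_1d(L_DNA, d) / 3600.0 for d in D_1D_RANGE]
t_3d_seconds = search_time_3d(VOLUME, D_3D, TARGET_SIZE)

print("1D search time (hours):", [round(t) for t in t_1d_hours])
print("3D search time (seconds):", round(t_3d_seconds))
```

With these inputs the 1D estimate comes out at thousands to tens of thousands of hours, and the 3D estimate at a few hundred seconds.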

In principle, the TF search could have been executed by first attaching to the DNA at some point, followed by processive motion until the target is found. For velocities large enough, this could have led to a superior search mechanism. However, such motion inevitably consumes energy (since it breaks detailed balance). For this or other reasons, this solution was not chosen by Nature. Note that other biological scenarios do involve the processive motion of proteins in 1D, see, e.g., Ref. 8.

Facilitated diffusion is a mechanism in which the TF performs 1D diffusion along the DNA and at any given moment can fall off and perform 3D diffusion until reattaching to the DNA (a 3D excursion) at some random point along it; this is a key assumption that allows significant simplifications of the mathematical description of the mechanism. The 1D diffusion along the DNA and the 3D excursions alternate until the TF hits its target. This model is well studied (see Refs. 3, 4, and 9 for reviews), and empirical evidence supporting it was found in bacteria.^{10} Figure 1 shows a cartoon illustrating this mechanism. As we shall see, the dependence of the search time on the underlying microscopic parameters (diffusion constants and dimensions) is fundamentally different from that of the 1D and 3D models discussed above. For a broad parameter regime, it can lead to a significant speed-up of the search.

We will now show a mathematical description of this model that allows a significant speed-up compared with the results of Eqs. (5) and (6). Namely, we'll arrive at a search time that scales like *L* instead of $L^2$. The key principle for the success of this mechanism is analogous to what we described earlier when discussing the dice “paradox”: falling off during a 1D search attempt and retrying at a random point along the DNA resembles the “throwing away” of long die-rolling sequences. Namely, the probability of falling off is analogous to the probability of getting an odd number. Note that in the dice problem once we get an odd number we “reset” the counter, while in the case of facilitated diffusion failed attempts do contribute to the total time. However, in both cases, the “resetting” allows us to avoid lengthy runs and, thus, shortens the mean first passage time (MFPT). Furthermore, in both cases, we see the dramatic effects of the *a priori* benign conditioning process: in the dice problem, this leads to an MFPT always smaller than 2 (for any number of facets of the die). In the facilitated diffusion problem, conditioning on a successful search leads to an MFPT that is linear in the distance from the target, rather than quadratic as we might expect intuitively.

## III. MATHEMATICAL DESCRIPTION

We now present the mathematical description of the facilitated diffusion mechanism. We start from a microscopic point of view, then move on to a continuum description and by properly taking the conditional probability into account we arrive at a description of the full search process. Our results are closely related to the extensive analytical results obtained in Ref. 11, even though our derivation is more elementary and focuses on different aspects of the mathematical description of the mechanism.

We start by solving the one dimensional problem of a TF hitting the target while being bound to the DNA. Afterwards we'll take the 3D excursions into account.

### A. The 1D problem

We start with the probability that the TF hits the target during a single 1D search attempt, given that it starts at a distance *x* from the target, $\tilde{p}(x)$ (hereinafter, we mark a quantity with a tilde when it describes a discrete function; we use the same notation for its continuous counterpart, with the tilde omitted). We can write the recursion relation for $\tilde{p}(x)$ as follows:

$$\tilde{p}(x) = \frac{1-\gamma}{2} \left[ \tilde{p}(x + \delta x) + \tilde{p}(x - \delta x) \right], \tag{7}$$

where *γ* is the probability to fall off at any step and *δx* is the step size. This recursion relation holds for $x \neq 0$, together with $\tilde{p}(0) = 1$, since at each step the TF falls off with probability *γ*, and otherwise has a probability of 1/2 of moving to either of the neighboring sites. In Eq. (7), we neglect the effect of having finite boundaries: given that the DNA is long compared with the size of the TF we may, without significantly affecting the results, solve the problem on an infinite domain, $x \in (-\infty, \infty)$, while assuming the target is at *x* = 0.

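Expanding the right-hand side of Eq. (7) to second order in *δx* gives

$$p(x) = (1-\gamma) \left[ p(x) + \frac{(\delta x)^2}{2} \frac{d^2 p}{dx^2} \right]. \tag{8}$$

Rearranging and dividing by *δt* yields

$$(1-\gamma) \frac{(\delta x)^2}{2\,\delta t} \frac{d^2 p}{dx^2} - \frac{\gamma}{\delta t}\, p(x) = 0, \tag{9}$$

with *δt* being the time step. We now take the continuum limit, $\delta x \to 0$, $\delta t \to 0$, and $\gamma \to 0$, while defining $D \equiv (\delta x)^2 / 2\delta t$ and $\Gamma \equiv \gamma / \delta t$. This brings us to the following ordinary differential equation:

$$D \frac{d^2 p}{dx^2} - \Gamma\, p(x) = 0, \tag{10}$$

whose solution, satisfying $p(0) = 1$ and decaying away from the target, is $p(x) = e^{-|x| \sqrt{\Gamma / D}}$.

The same reasoning applies to the time-dependent problem. The discrete recursion relation now reads

$$\tilde{g}(x, t) = \frac{1-\gamma}{2} \left[ \tilde{g}(x + \delta x, t - \delta t) + \tilde{g}(x - \delta x, t - \delta t) \right], \tag{11}$$

and its continuum limit is

$$\frac{\partial g}{\partial t} = D \frac{\partial^2 g}{\partial x^2} - \Gamma\, g. \tag{12}$$

Equation (12) can be interpreted as an equation for the probability that the TF is at position *x* at time *t*, or for the probability that the TF would hit the target by time *t* given that it started at position *x*; see Ref. 12 for additional explanation.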
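The recursion relation for the hitting probability and its continuum limit can be compared numerically. The sketch below works in lattice units ($\delta x = \delta t = 1$, so that $D = 1/2$ and $\Gamma = \gamma$), truncates the infinite lattice at a far site where the probability is set to zero, and solves the recursion by Gauss-Seidel iteration. The truncation, the solver, and all names are assumptions of this illustration.

```python
import math


def hitting_probability(gamma, n_sites=200, tol=1e-13):
    """Solve p(n) = (1 - gamma)/2 * [p(n+1) + p(n-1)] for n >= 1 with p(0) = 1.

    The lattice is truncated at n_sites, where p is set to 0 (an assumption
    standing in for the infinite domain).
    """
    p = [0.0] * (n_sites + 1)
    p[0] = 1.0
    while True:
        change = 0.0
        for n in range(1, n_sites):            # one Gauss-Seidel sweep
            new = 0.5 * (1.0 - gamma) * (p[n + 1] + p[n - 1])
            change = max(change, abs(new - p[n]))
            p[n] = new
        if change < tol:
            return p


gamma = 0.005
p = hitting_probability(gamma)

# Exact decay factor of the discrete recursion: p(n) = lam**n.
lam = (1.0 - math.sqrt(1.0 - (1.0 - gamma) ** 2)) / (1.0 - gamma)

for n in (1, 5, 10, 20):
    continuum = math.exp(-n * math.sqrt(2.0 * gamma))  # exp(-|x| sqrt(Gamma/D))
    print(n, p[n], lam ** n, continuum)
```

For small *γ* the three columns agree closely, which illustrates that the exponential decay of the hitting probability is already present at the level of the discrete recursion.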

Equation (12) is analytically solvable. Even so, this solution is of little significance given that the search mechanism is not “measured” at the level of a *single* search attempt but, as we shall see later, at the level of numerous search attempts. In other words, what we will be interested in is the MFPT associated with Eq. (12) (not to be confused with the MFPT of the entire facilitated diffusion process, which we calculate in Sec. III B). For the sake of completeness, we present the calculation of the first-passage-time (FPT) distribution in Appendix A.

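We denote by $T(x)$ the MFPT to the target for a TF starting at position *x*, weighted by the probability of actually hitting the target, so that the MFPT conditioned on a successful attempt is $T(x)/p(x)$. Its discrete counterpart obeys the recursion relation

$$\tilde{T}(x) = \frac{1-\gamma}{2} \left[ \tilde{T}(x + \delta x) + \tilde{T}(x - \delta x) \right] + \delta t\, \tilde{p}(x). \tag{13}$$

The last term arises because, at each time step *δt*, the time contributed to $T(x)$ is the product of *δt* and the probability of actually hitting the target. Taking the continuum limit of Eq. (13) in a similar manner to what was done in Eqs. (8) and (9) gives

$$D \frac{d^2 T}{dx^2} - \Gamma\, T(x) = -p(x).$$

With the hitting probability $p(x) = e^{-|x| \sqrt{\Gamma / D}}$, which solves the continuum limit of Eq. (7) with $p(0) = 1$, and with $T(0) = 0$, the solution is

$$T(x) = \frac{|x|}{2 \sqrt{D \Gamma}}\, e^{-|x| \sqrt{\Gamma / D}}.$$

The MFPT conditioned on hitting the target, $T(x)/p(x) = |x| / (2 \sqrt{D \Gamma})$, is therefore linear in the distance from the target, while the probability of hitting the target starting from *x* decays exponentially with $|x|$.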
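The linear dependence of the conditional MFPT on the starting distance can be illustrated with a small Monte Carlo simulation, in the same spirit as the dice script of Appendix C: attempts in which the walker falls off are simply discarded. The sketch below works in lattice units ($\delta x = \delta t = 1$, so $D = 1/2$ and $\Gamma = \gamma$), in which the continuum prediction $|x| / (2\sqrt{D\Gamma})$ becomes $x_0 / \sqrt{2\gamma}$. The function name, the number of walkers, and the seed are arbitrary choices made for this illustration.

```python
import math
import random


def conditional_mfpt(x0, gamma, walkers=10000, seed=0):
    """Simulate 1D search attempts starting at lattice site x0.

    Returns (fraction of successful attempts,
             mean number of steps among the successful attempts only).
    """
    rng = random.Random(seed)
    successes = 0
    total_steps = 0
    for _ in range(walkers):
        x, steps = x0, 0
        while True:
            if rng.random() < gamma:       # the walker falls off: discard attempt
                break
            x += 1 if rng.random() < 0.5 else -1
            steps += 1
            if x == 0:                     # target reached
                successes += 1
                total_steps += steps
                break
    return successes / walkers, total_steps / successes


gamma = 0.02
results = {}
for x0 in (5, 10):
    results[x0] = conditional_mfpt(x0, gamma)
    p_hit, mfpt = results[x0]
    print(x0, p_hit, mfpt, x0 / math.sqrt(2.0 * gamma))
```

Doubling the starting distance roughly doubles the conditional MFPT, while the fraction of successful attempts drops sharply, in line with the exponential decay of the hitting probability.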

### B. Taking 3D excursions into account

Up to this point we've only considered a single 1D search attempt. Taking the 3D excursions into account, we are able to model the whole process. We will do so by assuming a finite DNA of length 2*L* with the TF target at its center and that the reattachment point after a 3D excursion is distributed uniformly on the DNA. Moreover, we assume that the search starts with the TF bound to the DNA at some random point along it; relaxing the assumption is inconsequential.

We denote by $x_i$ the position at which the TF starts the *i*th 1D search attempt; $t_{1D}(x_i)$ is the time spent on the *i*th attempt assuming it was successful, while $t^{f}_{1D}(x_i)$ denotes the time assuming that it wasn't, all conditioned on starting from position $x_i$; and $t^{i}_{3D}$ is the time spent on 3D diffusion assuming the $(i-1)$th 1D search attempt has failed. The search time then follows: for a search that succeeds on the *N*th attempt,

$$t_{search} = \sum_{i=1}^{N-1} \left[ t^{f}_{1D}(x_i) + t^{i+1}_{3D} \right] + t_{1D}(x_N).$$

Averaging over the number of attempts and over $x_i$ (assumed to be uniformly distributed) we arrive at the following, for $L \gg \sqrt{D/\Gamma}$:

$$\langle t_{search} \rangle \simeq L \sqrt{\frac{\Gamma}{D}} \left( \frac{1}{\Gamma} + \langle t_{3D} \rangle \right).$$

Here $L\sqrt{\Gamma/D}$ is the mean number of 1D search attempts, and $1/\Gamma$ is the mean duration of a single attempt.

We see that the search time now scales linearly with *L* instead of quadratically! As we mentioned before, this is reminiscent of the dice “paradox”: we do not “keep” long and unsuccessful sequences. Moreover, if we optimize the search time with respect to Γ we arrive at the neat conclusion that the optimal search time is obtained by taking $\Gamma = 1 / \langle t_{3D} \rangle$; the TF then spends half of its time in 1D and half in 3D, with an overall search time of order $L \sqrt{\langle t_{3D} \rangle / D}$. Using typical values for bacteria, $2L = 10^6\,\mathrm{nm}$, $D = 10^4$–$10^5\,\mathrm{nm^2/s}$, and $\langle t_{3D} \rangle = 10^{-4}\,\mathrm{s}$, as can be found, for example, in Refs. 3 and 4, we indeed arrive at an overall search time of the order of tens of seconds.
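The optimization over Γ can be explored numerically. The sketch below assumes the large-*L* estimate of the mean search time as the mean number of attempts, $L\sqrt{\Gamma/D}$, multiplied by the mean time per attempt, $1/\Gamma + \langle t_{3D} \rangle$. This closed form, the scan grid, and all names are assumptions of the illustration, chosen to be consistent with the optimum $\Gamma = 1/\langle t_{3D} \rangle$ quoted above.

```python
import math


def mean_search_time(gamma_rate, half_length, d_1d, t_3d):
    """Large-L estimate: (mean number of attempts) x (mean time per attempt)."""
    attempts = half_length * math.sqrt(gamma_rate / d_1d)
    return attempts * (1.0 / gamma_rate + t_3d)


HALF_LENGTH = 0.5e6   # L in nm, so that the DNA length is 2L = 1e6 nm
T_3D = 1e-4           # mean duration of a 3D excursion, s

# Scan the fall-off rate on a logarithmic grid from 1e2 to 1e6 per second.
rates = [10 ** (k / 100) for k in range(200, 601)]

optimum = {}
for d_1d in (1e4, 1e5):
    best_rate = min(
        rates, key=lambda g: mean_search_time(g, HALF_LENGTH, d_1d, T_3D)
    )
    best_time = mean_search_time(best_rate, HALF_LENGTH, d_1d, T_3D)
    optimum[d_1d] = (best_rate, best_time)
    print("D =", d_1d, " optimal Gamma =", best_rate, " search time (s) =", best_time)
```

Within this estimate the optimal fall-off rate coincides with the inverse of the mean 3D excursion time, and the resulting search times are tens of seconds up to about a hundred seconds for the quoted range of *D*.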

## IV. DISCUSSION

In this paper, we revisited a well-known and well-studied mechanism for how TFs search for their target genes. We showed how the mathematical description of the mechanism naturally utilizes the basic concept of conditional probability.

The facilitated diffusion problem we discussed here is mathematically related to the class of problems of first passage time under restart. For these problems, one is interested in the first passage time of a random walker, with a rate to “reset” the particle, typically to a particular site (in contrast to the random resetting encountered in the facilitated diffusion problem). Intriguingly, for these problems, there is an optimal restart rate that can speed up the search dramatically. Reference 18 studies the statistical properties of the absorption of a searcher by a static target under constant-rate resetting of the searcher position. A generic treatment of first passage under resetting is given in Ref. 19, and a review thoroughly studying different cases and generalizations of the resetting time is found in Ref. 20. Such processes are deeply related to the inspection paradox of probability theory, where a sampling bias may distort the statistics in counter-intuitive ways. For instance, in a famous example of this paradox, the average waiting time for a bus that a person measures when they arrive at the bus station at some random, uniformly distributed, time can be greater than the average time between consecutive buses. In the case of heavy-tailed distributions, in fact, the former can be infinite even when the latter is finite! Resetting overcomes this sampling bias and in some cases may even shorten the waiting time compared with the distribution's mean. A review studying the relations between stochastic processes under resetting and the inspection paradox is found in Ref. 21. That review also characterizes the processes for which resetting enables a speed-up compared with the simple mean of the distribution.
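The bus example can be made concrete with the standard renewal-theory result that a person arriving at a uniformly random time waits on average $\langle \tau^2 \rangle / (2 \langle \tau \rangle)$, where $\tau$ is the time between consecutive buses. The sketch below evaluates this for a hypothetical timetable invented for illustration (mostly short gaps with occasional long ones) and for a perfectly regular one.

```python
def mean_gap_and_wait(gaps, probabilities):
    """Return (mean gap between buses, mean wait of a uniformly random arrival).

    Uses the renewal-theory formula: mean wait = E[tau^2] / (2 * E[tau]).
    """
    mean_gap = sum(p * t for t, p in zip(gaps, probabilities))
    second_moment = sum(p * t * t for t, p in zip(gaps, probabilities))
    return mean_gap, second_moment / (2.0 * mean_gap)


# Hypothetical timetable (minutes): 90% of gaps are 5, 10% are 65.
irregular = mean_gap_and_wait([5.0, 65.0], [0.9, 0.1])

# Perfectly regular buses every 10 minutes, for comparison.
regular = mean_gap_and_wait([10.0], [1.0])

print("irregular: mean gap, mean wait =", irregular)
print("regular:   mean gap, mean wait =", regular)
```

In the irregular case the mean wait exceeds the mean gap because a random arrival is more likely to land inside a long gap; in the regular case the mean wait is half the gap.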

While facilitated diffusion is a powerful mechanism for shortening the search time, there are both extensions to this mechanism and other, completely different, mechanisms worth mentioning. Still within the framework of facilitated diffusion, one may take into account the dynamics of the DNA molecule itself, as discussed in Ref. 22, or the energy landscape the TF experiences while moving along the DNA, as discussed in the reviews.^{3,4,23} Recently, Ref. 24 related diffusion on such a disordered landscape to the phenomenon of Anderson localization and discussed its implications for facilitated diffusion. Another intriguing work directly related to facilitated diffusion is given in Ref. 25, where the authors use a facilitated-diffusion-based model to study the architecture of bacterial genomes.

Although facilitated diffusion is fit to describe the search mechanism of some TFs in bacteria, other types of TFs (and proteins in general) could have a significantly different structure leading to inherently different dynamics. Reference 26 discusses a protein extended in one dimension in a manner that enables it to interact with many sequences along the DNA in parallel, which effectively reduces the dimensionality of the search (from three to two dimensions) and causes a remarkable speed-up of the search process, distinct from the mechanism we explored here. The concept of dimensionality reduction in biological search processes dates to the seminal work of Delbrück and Adam in 1968, Ref. 27. This work was recently revisited and extended.^{28} Another distinct example is given in Refs. 29–31, which discuss the search mechanism TFs use in eukaryotic cells. In this case, the TFs often have a long polymeric tail called the intrinsically disordered region that plays a major role in the search, though the theoretical framework for this scenario has yet to be developed.

## ACKNOWLEDGMENTS

The authors thank Wencheng Ji, Naama Barkai, Yariv Kafri, Ulrich Gerland, Shlomi Reuveni, Sarah Kostinski, and Raphael Voituriez for helpful discussions and comments.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### APPENDIX A: CALCULATING THE FPT

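In the following, we calculate the FPT distribution, as opposed to the MFPT calculated in the main text.

We denote by $g(x, t)$ the probability density that the TF hits the target at time *t* given that it started at position *x*,

$$\frac{\partial g}{\partial t} = D \frac{\partial^2 g}{\partial x^2} - \Gamma\, g, \tag{A1}$$

with $g(x, 0) = 0$ for $x \neq 0$ and $g(0, t) = \delta(t)$. Following Ref. 6, we will Laplace transform the equation. Denoting $\mathcal{L}[g(x, t)] \equiv G(x, s)$, we obtain

$$s\, G(x, s) = D \frac{\partial^2 G}{\partial x^2} - \Gamma\, G(x, s),$$

whose solution, satisfying $G(0, s) = 1$ and decaying as $|x| \to \infty$, is $G(x, s) = e^{-|x| \sqrt{(s + \Gamma)/D}}$. Inverting the Laplace transform gives

$$g(x, t) = \frac{|x|}{\sqrt{4 \pi D t^3}} \exp\left( -\frac{x^2}{4 D t} - \Gamma t \right).$$

The exponential factor $e^{-\Gamma t}$ ensures that all moments of the FPT, and in particular the MFPT, are finite. The corresponding cumulative distribution function (CDF), $\int_0^t g(x, t')\, dt'$, tends to $e^{-|x| \sqrt{\Gamma/D}} < 1$ as $t \to \infty$ (since there is a finite probability *not* to hit the target of course). If, on the other hand, we look at the probability to hit the target *conditioned* on hitting it, the corresponding CDF is

$$\frac{\int_0^t g(x, t')\, dt'}{e^{-|x| \sqrt{\Gamma/D}}} = \frac{1}{2} \left[ \mathrm{erfc}\left( \frac{|x|}{\sqrt{4 D t}} - \sqrt{\Gamma t} \right) + e^{2 |x| \sqrt{\Gamma/D}}\, \mathrm{erfc}\left( \frac{|x|}{\sqrt{4 D t}} + \sqrt{\Gamma t} \right) \right].$$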
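The relation between the FPT density and its CDF can be checked numerically. The sketch below assumes the FPT density $g(x,t) = |x| (4\pi D t^3)^{-1/2} \exp(-x^2/4Dt - \Gamma t)$ and the error-function form of its time integral; both expressions, the midpoint integration rule, and the parameter values are assumptions of this illustration.

```python
import math


def fpt_density(x, t, d, gamma_rate):
    """Assumed FPT density for 1D diffusion with fall-off rate gamma_rate."""
    return (abs(x) / math.sqrt(4.0 * math.pi * d * t ** 3)
            * math.exp(-x * x / (4.0 * d * t) - gamma_rate * t))


def fpt_cdf(x, t, d, gamma_rate):
    """Closed-form time integral of fpt_density from 0 to t."""
    a = abs(x) / math.sqrt(4.0 * d * t)
    b = math.sqrt(gamma_rate * t)
    k = abs(x) * math.sqrt(gamma_rate / d)
    return 0.5 * (math.exp(-k) * math.erfc(a - b) + math.exp(k) * math.erfc(a + b))


def integrate_density(x, t_max, d, gamma_rate, steps=20000):
    """Midpoint-rule integral of fpt_density from 0 to t_max."""
    dt = t_max / steps
    return sum(fpt_density(x, (i + 0.5) * dt, d, gamma_rate) for i in range(steps)) * dt


x, d, gamma_rate = 1.0, 1.0, 1.0
numeric = integrate_density(x, 2.0, d, gamma_rate)
closed = fpt_cdf(x, 2.0, d, gamma_rate)
p_hit = math.exp(-abs(x) * math.sqrt(gamma_rate / d))   # long-time limit of the CDF

print("CDF at t = 2:", numeric, closed)
print("CDF at t = 50:", fpt_cdf(x, 50.0, d, gamma_rate), " hitting probability:", p_hit)
print("conditional CDF at t = 50:", fpt_cdf(x, 50.0, d, gamma_rate) / p_hit)
```

The unconditioned CDF saturates at the hitting probability rather than at 1, whereas dividing by the hitting probability yields a properly normalized distribution.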

### APPENDIX B: DERIVING THE MFPT EQUATION DIRECTLY FROM THE EQUATION FOR THE FPT

In the main text, we derived the equation for the MFPT using the appropriate recursion relation following fundamental principles. In the following, we shall present an alternative derivation starting from the equation for the FPT, namely, Eq. (12) (also presented in the previous appendix as Eq. (A1)).

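Since $g(x, t)$ is the probability density of hitting the target at time *t* given that we start at *x*, the MFPT is obtainable from $g(x, t)$ as

$$T(x) = \int_0^{\infty} t\, g(x, t)\, dt.$$

Multiplying Eq. (12) by *t* and integrating over time, the left-hand side gives $\int_0^{\infty} t\, \partial_t g\, dt = -\int_0^{\infty} g\, dt = -p(x)$ after integration by parts, where $p(x)$ is the probability of hitting the target. We thus recover the equation for the MFPT,

$$D \frac{d^2 T}{dx^2} - \Gamma\, T(x) = -p(x).$$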

Note that we expect this time to be finite even on an infinite domain, since the finite fall-off rate would prevent the mean time from diverging—in contrast, for example, to the diverging MFPT associated with normal random walks in 1D (we have also shown this directly from the FPT distribution in the previous appendix).
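The finiteness of this time can be illustrated by integrating $t\, g(x,t)$ numerically and comparing with a closed form. The sketch below assumes the FPT density $g(x,t) = |x| (4\pi D t^3)^{-1/2} \exp(-x^2/4Dt - \Gamma t)$ and the result $T(x) = |x|\, e^{-|x|\sqrt{\Gamma/D}} / (2\sqrt{D\Gamma})$; both expressions, the integration cutoff, and the parameter values are assumptions of this illustration.

```python
import math


def t_times_density(x, t, d, gamma_rate):
    """Integrand t * g(x, t), with the assumed FPT density g."""
    density = (abs(x) / math.sqrt(4.0 * math.pi * d * t ** 3)
               * math.exp(-x * x / (4.0 * d * t) - gamma_rate * t))
    return t * density


def mfpt_from_density(x, d, gamma_rate, t_max=60.0, steps=60000):
    """Midpoint-rule estimate of the integral of t * g(x, t) over [0, t_max]."""
    dt = t_max / steps
    return sum(t_times_density(x, (i + 0.5) * dt, d, gamma_rate) for i in range(steps)) * dt


def mfpt_closed_form(x, d, gamma_rate):
    """Assumed closed form: |x| exp(-|x| sqrt(Gamma/D)) / (2 sqrt(D Gamma))."""
    return (abs(x) / (2.0 * math.sqrt(d * gamma_rate))
            * math.exp(-abs(x) * math.sqrt(gamma_rate / d)))


d, gamma_rate = 1.0, 1.0
comparison = {}
for x in (1.0, 2.0):
    comparison[x] = (mfpt_from_density(x, d, gamma_rate), mfpt_closed_form(x, d, gamma_rate))
    print(x, comparison[x])
```

The integral converges quickly because of the factor $e^{-\Gamma t}$, which is the numerical counterpart of the statement that the finite fall-off rate keeps the mean time from diverging.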

### APPENDIX C: DICE PARADOX CODE

The following Python script demonstrates the result of the dice paradox. It could be used to help the reader to develop intuition towards the paradox's result. There are two input parameters: “sides” refers to the number of sides the die has and “runs” refers to the number of runs included in the estimation of the expected value.

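```python
import numpy as np

np.random.seed(42)

sides = 1000  # number of sides of the die (even, so that the target face is even)
runs = 1000   # number of runs included in the estimate of the expected value

T_w_cond = 0   # mean number of rolls when accounting for the conditioning
T_wo_cond = 0  # mean number of rolls without accounting for the conditioning

for i in range(runs):
    hit_tgt = False
    t_w_cond = 0
    t_wo_cond = 0
    while not hit_tgt:
        t_w_cond = t_w_cond + 1
        t_wo_cond = t_wo_cond + 1
        roll = np.random.randint(1, sides + 1)
        if roll % 2 != 0:
            # An odd roll disqualifies the series: restart the conditioned counter.
            t_w_cond = 0
        if roll == sides:
            hit_tgt = True
    T_w_cond = T_w_cond + t_w_cond / runs
    T_wo_cond = T_wo_cond + t_wo_cond / runs

print('Number of rolls w/o accounting for'
      + ' conditioning:', str(T_wo_cond))
print('Number of rolls when accounting for'
      + ' conditioning:', str(T_w_cond))
```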

For each run, the script “rolls” the die until it lands on “sides” while keeping track of both the total number of rolls and the number of rolls since the last odd value; the latter counter is reset whenever an odd value is rolled. The script ends by printing the mean number of rolls without taking the conditioning into account (which we expect to be about the same as the input “sides”) and the mean number of rolls when taking the conditioning into account.

## REFERENCES

*A Guide to First-Passage Processes*

The interpretation as an equation for the probability to be at *x* at time *t* is straightforward by looking at Eq. (11): the probability to be at *x* at time *t* is the probability that at $t - \delta t$ we were at either $x - \delta x$ or $x + \delta x$, times $1/2$, and times the probability that the TF did not fall off. The interpretations as an equation for the probability that the TF would either hit the target *by time t* or *at time t* given that it started at position *x* follow the same reasoning: the probability that the TF started at *x* and took $t / \delta t$ steps to hit the target should be the same as starting at $x \pm \delta x$ and arriving after $(t / \delta t) - 1$ steps, times the probability of taking one more step (and taking into account the chance of falling off).

*A First Course in Stochastic Processes*

*Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences*

*Stochastic Processes in Physics and Chemistry*

*Thinking Probabilistically: Stochastic Processes, Disordered Systems, and Their Applications*

*Ordinary Differential Equations and Dynamical Systems*