The quantum internet is one of the frontiers of quantum information science. It will revolutionize the way we communicate and do other tasks, and it will allow for tasks that are not possible using the current, classical internet. The backbone of a quantum internet is entanglement distributed globally in order to allow for such novel applications to be performed over long distances. Experimental progress is currently being made to realize quantum networks on a small scale, but much theoretical work is still needed in order to understand how best to distribute entanglement, especially with the limitations of near-term quantum technologies taken into account. This work provides an initial step toward this goal. In this work, we lay out a theory of near-term quantum networks based on Markov decision processes (MDPs), and we show that MDPs provide a precise and systematic mathematical framework to model protocols for near-term quantum networks that is agnostic to the specific implementation platform. We start by simplifying the MDP for elementary links introduced in prior work and by providing new results on policies for elementary links in the steady-state (infinite-time) limit. Then, we show how the elementary link MDP can be used to analyze a complete quantum network protocol. We then provide an extension of the MDP formalism to two elementary links. Here, as new results, we derive linear programing relaxations that allow us to obtain optimal steady-state policies with respect to the expected fidelity and waiting time of the end-to-end link.

## I. INTRODUCTION

The quantum internet^{1–5} is envisioned to be a global-scale interconnected network of devices which exploits the uniquely quantum-mechanical phenomenon of entanglement. By operating in tandem with today's Internet, it will allow people all over the world to perform quantum communication tasks, such as quantum key distribution (QKD),^{6–11} quantum teleportation,^{12–14} quantum clock synchronization,^{15–18} distributed quantum computation,^{19} and distributed quantum metrology and sensing.^{20–22} A quantum internet will also allow for exploring fundamental physics^{23} and for forming an international standard time.^{24} Quantum teleportation and QKD are perhaps the primary use cases of the quantum internet in the near term. In fact, there are several metropolitan-scale QKD systems already in place.^{25–32}

Scaling up beyond the metropolitan level toward a global-scale quantum internet is a major challenge. All of the aforementioned tasks require the use of shared entanglement between distant locations on the Earth, which typically has to be distributed using single-photonic qubits sent through either the atmosphere or optical fibers. It is well known that optical signals transmitted through either the atmosphere or optical fibers undergo an exponential decrease in the transmission success probability with distance,^{33–35} limiting direct transmission distances to roughly hundreds of kilometers. Therefore, one of the central research questions in the theory of quantum networks is how to overcome this exponential loss, thus distributing entanglement over long distances efficiently and at high rates.

A quantum network can be modeled as a graph $G=(V,E)$, where the vertices *V* represent the nodes in the network and the edges in *E* represent quantum channels connecting the nodes, see Fig. 1. Then, the task of entanglement distribution is to transform *elementary links*, i.e., entanglement shared by neighboring nodes, to *virtual links*, i.e., entanglement between distant nodes, see the right-most panel of Fig. 1. In this context, nodes that are not part of the virtual links to be created can act as *quantum repeaters*, i.e., helper nodes whose purpose is to mitigate the effects of loss and noise along a path connecting the end nodes, thereby making the quantum information transmission more reliable. Specifically, quantum repeaters perform entanglement distillation^{36–38} (or some other form of quantum error correction), entanglement swapping,^{12,39} and possibly some form of routing, in order to create the desired virtual links. Protocols for entanglement distribution in quantum networks have been described from an information-theoretic perspective in Refs. 40–47, and limits on communication in quantum networks have been explored in Refs. 40–54. Linear programs, and other techniques for obtaining optimal entanglement distribution rates in a quantum network, have been explored in Refs. 53 and 55–57. However, information-theoretic analyses are agnostic to physical implementations, and generally speaking, the protocols and the rates derived apply in an idealized scenario in which quantum memories have high coherence times, and quantum gate operations have no error.

What are the fundamental limitations on *near-term quantum networks*? Such quantum networks are characterized by the following elements:

- Small number of nodes;

- Imperfect sources of entanglement;

- Non-deterministic elementary link generation and entanglement swapping;

- Imperfect measurements and gate operations;

- Quantum memories with short coherence times;

- No (or limited) entanglement distillation/error correction.

A theoretical framework taking these practical limitations into account would act as a bridge between statements about what can be achieved in principle (which can be answered using information-theoretic methods) and statements that are directly useful for the purpose of implementation. The purpose of this work is to present the initial elements of such a theory of near-term quantum networks.

The main contribution of this work is to frame quantum network protocols in terms of Markov decision processes (MDPs) and to place the Markov decision process for elementary links introduced in Ref. 58 within an overall quantum network protocol. More specifically, the contributions of this work are as follows:

In Sec. II, we start by recapping the model for elementary link generation presented in Ref. 58. Along the way, we present Lemma II.1. While the result of Lemma II.1 is generally well known, to the best of our knowledge, its proof is not readily accessible, and thus, we provide the proof here. Then, as a new contribution, we show that the Markov decision process (MDP) for elementary links introduced in Ref. 58 can be written in a simpler manner in terms of different variables. Furthermore, we emphasize that the figure of merit associated with the MDP, as introduced in Ref. 58, takes into account both the fidelity of the elementary link and its success probability. To the best of our knowledge, such a figure of merit has not been considered in prior work. The simplified form of the MDP allows us to derive two new results. The first new result is Theorem II.4, which gives us an analytic expression for the steady-state value of an elementary link undergoing an arbitrary time-homogeneous policy. The second new result is Theorem II.5, which allows us to determine the optimal steady-state value of the elementary link using a linear program. We demonstrate the usefulness of the MDP approach to modeling elementary links in Sec. II D, in which we provide an extended example of elementary links generated via satellite-to-ground transmission.

In Sec. III, we describe entanglement distillation protocols and protocols for joining elementary links (in order to create virtual links) in general terms as local operations and classical communication (LOCC) quantum instrument channels. We then present three joining protocols and write them down explicitly as LOCC channels. Doing so allows us to determine the output state of the protocol for *any* set of input states, including input states that are noisy as a result of device imperfections, etc. This, in turn, allows us to compute the fidelity of the output state with respect to the ideal target state that would be obtained if the input states were ideal. Formulas for the fidelity at the output of the protocols are presented as Proposition III.1, Proposition III.2, and Proposition III.3. In particular, Proposition III.1 provides a formula for the fidelity at the output of the usual entanglement swapping protocol, which, to the best of our knowledge, is not explicitly found in prior works. Prior works typically use (as an approximation) the product of the individual elementary link fidelities in order to obtain the fidelity after entanglement swapping.

In Sec. IV, we present a quantum network protocol that combines the Markov decision process for elementary links with known routing and path-finding algorithms. Then, we provide a general method for determining waiting times and key rates for the quantum key distribution for this protocol.

In Sec. V, we provide a first step toward extending the elementary link MDP by defining an MDP for two elementary links with entanglement swapping. We then show how to approximate waiting times using a linear program, and we find that this linear programing approximation reproduces exactly the known analytic results on the waiting time for such a scenario.^{59} However, our result is more general, allowing us to compute waiting times for arbitrary parameter regimes, while the analytic results are true only for restricted parameter regimes. Broadly speaking, having linear-programing approximations to the waiting time and other important quantities of interest (such as fidelity) will be important when considering MDPs for larger networks.

This work is one in a long line of work on quantum repeaters that takes device imperfections and noise into account, beginning with the initial theoretical proposal,^{60,61} and then resulting in a vast body of work.^{56,57,59,62–91} (See also Refs. 92–95 and the references therein.) All of these proposals deal almost exclusively with a single transmission line connecting a sender and a receiver. However, for a quantum internet, we need to go beyond a single transmission line, and we need to consider multiple transmission lines operating in parallel. A unified and self-consistent theoretical framework will help to guide real-world implementations. It is our hope that this work provides a good starting point along this line of thought and leads to a better understanding of how realistic, near-term quantum devices could be used to realize large-scale quantum networks and, eventually, a global-scale quantum internet.

## II. A MARKOV DECISION PROCESS FOR ELEMENTARY LINKS

We start by presenting a Markov decision process (MDP) for elementary links, as introduced in Ref. 58. To be specific, this is an MDP for an arbitrary edge of the graph corresponding to a quantum network. However, unlike Ref. 58, we present the MDP in much simpler terms in which we need not explicitly keep track of the quantum state. Through this simplification, we are able to establish a new result, Theorem II.4, which gives us the steady-state fidelity of an elementary link undergoing an arbitrary time-homogeneous (stationary) policy. We start by describing the physical model of elementary link generation, considering two specific examples of transmission channels. Then, we define the MDP corresponding to this model of elementary link generation.

### A. Generating elementary links

Our model for elementary link generation is the one considered in Ref. 58 and illustrated in Fig. 2, based on the same model considered in prior work.^{76,96–98} Consider an arbitrary physical link in the network. For every such physical link, there is a source station that prepares and distributes an entangled state to the corresponding nodes. In general, all of these source stations operate independently of each other, distributing entangled states as they are requested. Specifically, we have the following.

- The source produces a $k$-partite quantum state $\rho^S$, $k\geq 2$, and sends it to the nodes via a quantum channel $\mathcal{S}$, leading to the state $\mathcal{S}(\rho^S)$. Here, $k$ is the number of nodes belonging to an edge, with $k=2$ corresponding to ordinary, bipartite edges (such as the red edges in Fig. 1) and $k\geq 3$ corresponding to hyperedges (such as the blue bubbles in Fig. 1).

- The nodes perform a heralding procedure, which is a protocol involving local operations and classical communication. It can be described by a quantum instrument $\{\mathcal{M}_0,\mathcal{M}_1\}$, where $\mathcal{M}_0$ and $\mathcal{M}_1$ are completely positive trace non-increasing maps such that $\mathcal{M}_0+\mathcal{M}_1$ is trace preserving. These maps capture not only the probabilistic nature of the heralding procedure but also the various imperfections of the devices that are used to perform the procedure. The map $\mathcal{M}_0$ corresponds to failure of heralding and $\mathcal{M}_1$ corresponds to success. The probability of successful transmission and heralding is

  $$p=\mathrm{Tr}[(\mathcal{M}_1\circ\mathcal{S})(\rho^S)], \tag{1}$$

  and the states conditioned on success and failure are, respectively,

  $$\sigma^0:=\frac{1}{p}(\mathcal{M}_1\circ\mathcal{S})(\rho^S), \tag{2}$$

  $$\tau^{\varnothing}:=\frac{1}{1-p}(\mathcal{M}_0\circ\mathcal{S})(\rho^S). \tag{3}$$

  The superscript “0” in $\sigma^0$ indicates that, upon success of the heralding procedure, the quantum systems have been immediately stored in local quantum memories at the nodes and have not yet suffered from any decoherence.

- The state of the quantum systems after $m\in\{0,1,2,\ldots\}$ time steps in the quantum memories is given by

  $$\sigma(m):=\mathcal{N}^{\circ m}(\sigma^0), \tag{4}$$

  where $\mathcal{N}$ is a quantum channel that describes the decoherence of the individual quantum memories at the nodes.

#### 1. Ground-based transmission

The most common medium for quantum information transmission for communication purposes is photons traveling through either free space or fiber-optic cables. These transmission media are modeled well by a bosonic pure-loss/attenuation channel $\mathcal{L}_\eta$,^{105} where $\eta\in(0,1]$ is the transmittance of the medium, which, for fiber-optic or free-space transmission, has the form $\eta=e^{-L/L_0}$,^{33–35} where $L$ is the transmission distance and $L_0$ is the attenuation length of the fiber.

Before the $k$ quantum systems corresponding to the source state $\rho^S$ are transmitted through the pure-loss channel, they are each encoded into $d$ bosonic modes with $d\geq 2$. A simple encoding is the following:

$$|0_d\rangle:=|1,0,0,\ldots,0\rangle, \tag{5}$$

$$|1_d\rangle:=|0,1,0,\ldots,0\rangle, \tag{6}$$

$$\vdots$$

$$|(d-1)_d\rangle:=|0,0,0,\ldots,1\rangle, \tag{7}$$

sometimes called the *d-rail encoding*. In other words, using $d$ bosonic modes, we form a qudit quantum system by defining the standard basis elements of the associated Hilbert space by the states corresponding to a single photon in each of the $d$ modes. We let

$$|\mathrm{vac}\rangle:=|0,0,\ldots,0\rangle \tag{8}$$

denote the vacuum state of the $d$ modes, which is the state containing no photons.

In the context of photonic state transmission, the source state $\rho^S$ is typically of the form $|\psi^S\rangle\langle\psi^S|$, where

where $|\psi_n^S\rangle$ is a state vector with $n$ photons in total for each of the $k$ parties, and the numbers $p_n^S\geq 0$ are probabilities, so that $\sum_{n=0}^{\infty}p_n^S=1$. For example, in the case $k=2$ and $d=2$, the following source state is generated from a parametric down-conversion process (see, e.g., Refs. 106 and 107):

where $r$ and $q$ are parameters characterizing the process. One often considers a truncated version of this state as an approximation, so that^{107}

where $p_0+p_1+p_2=1$.

Typically, the encoding into bosonic modes is not perfect, which means that a source state of the form (9) is not ideal: the desired state is given by one of the state vectors $|\psi_j^S\rangle$, and the other terms arise from the naturally imperfect nature of the source. For example, for the state in (12), the desired bipartite state is the maximally entangled state

Once the source state is prepared, each mode is sent through the pure-loss channel. Letting

denote the quantum channel that acts on the $d$ modes of each of the $k$ systems, the overall quantum channel through which the source state $\rho^S$ is sent is

where $\vec{\eta}=(\eta_1,\eta_2,\ldots,\eta_k)$ and $\eta_j$ is the transmittance of the medium to the $j$th node in the edge. The quantum state shared by the $k$ nodes after transmission from the source is then $\rho^{S,\text{out}}=\mathcal{S}_{\vec{\eta}}^{(k;d)}(\rho^S)$.

Now, it is known (see, e.g., Ref. 108) that the action of the bosonic pure-loss channel on any linear operator $\sigma_d$ encoded in $d$ modes according to the encoding in (7) is equivalent to the output of an erasure channel.^{109,110} In general, a $d$-dimensional quantum erasure channel $\mathcal{E}_p^{(d)}$, with $p\in[0,1]$, is defined as follows. Consider the vector space $\mathbb{C}^d$ with orthonormal basis elements $\{|0\rangle,|1\rangle,\ldots,|d-1\rangle\}$ and the vector space $\mathbb{C}^{d+1}$ with orthonormal basis elements $\{|0\rangle,|1\rangle,\ldots,|d-1\rangle,|d\rangle\}$. Then, for every linear operator $X\in\mathrm{L}(\mathbb{C}^d)$, $\mathcal{E}_p^{(d)}(X)=pX+(1-p)\mathrm{Tr}[X]\,|d\rangle\langle d|$, where the factor $\mathrm{Tr}[X]$ makes the map linear and trace preserving. Note that the output is an element of $\mathrm{L}(\mathbb{C}^{d+1})$. In particular, note that the vector $|d\rangle$ is orthogonal to the input vector space $\mathbb{C}^d$.
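The erasure channel defined above can be illustrated with a minimal sketch. It assumes matrices are stored as nested Python lists and uses the trace-weighted form $pX+(1-p)\mathrm{Tr}[X]|d\rangle\langle d|$; the function name `erasure_channel` and the example input are hypothetical choices made only for this illustration.

```python
def erasure_channel(X, p):
    """Apply a d-dimensional erasure channel to a d x d matrix X (nested lists).

    Returns the (d+1) x (d+1) matrix with p*X in the upper-left block and
    (1-p)*Tr[X] on the extra diagonal entry that represents |d><d|.
    """
    d = len(X)
    trace = sum(X[i][i] for i in range(d))
    out = [[0.0] * (d + 1) for _ in range(d + 1)]
    for i in range(d):
        for j in range(d):
            out[i][j] = p * X[i][j]
    # The erasure flag |d> is orthogonal to the input space, so it only
    # contributes to the last diagonal entry.
    out[d][d] = (1.0 - p) * trace
    return out


# Example input (an assumption for illustration): the qubit state |+><+|.
plus_state = [[0.5, 0.5], [0.5, 0.5]]
erased = erasure_channel(plus_state, 0.8)
```

With this representation, the trace of the output equals the trace of the input, which is the trace-preservation property expected of a channel.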

**Lemma II.1** (Pure-loss channel with a *d*-rail encoding^{108}). Let $d\geq 2$. For every linear operator $X$ acting on a $d$-dimensional Hilbert space defined by the basis elements in (5)–(7), we have that

*Proof*. To start, the bosonic pure-loss channel has the following Kraus representation:^{111,112}

where $a$ and $a^\dagger$ are the annihilation and creation operators of the bosonic mode, respectively, which are defined as $a|n\rangle=\sqrt{n}\,|n-1\rangle$ for all $n\geq 1$ (with $a|0\rangle=0$) and $a^\dagger|n\rangle=\sqrt{n+1}\,|n+1\rangle$ for all $n\geq 0$.

Now, every linear operator $X$ acting on a $d$-dimensional space that is encoded into $d$ bosonic modes as in (5)–(7) can be written as

for $\alpha_{\ell,\ell'}\in\mathbb{C}$. Using (18), it is straightforward to show that

Using this, we find that

Therefore,

as required. □

After transmission from the source to the nodes, the heralding procedure typically involves doing measurements at the nodes to check whether all of the photons arrived. In the ideal case, the quantum instrument $\{\mathcal{M}_0,\mathcal{M}_1\}$ for the heralding procedure corresponds simply to a measurement in the single-photon subspace defined by (5)–(7). To be specific, let

where $\Pi^{(d)}$ is the projection onto the $d$-dimensional single-photon subspace defined by (5)–(7) and $\mathbb{1}_{\mathcal{H}_d}$ is the identity operator of the full Hilbert space $\mathcal{H}_d$ of the $d$ bosonic modes. Then, letting $\vec{x}\in\{0,1\}^k$ and defining

the maps $\mathcal{M}_0$ and $\mathcal{M}_1$ have the following form:

These maps correspond to perfect photon-number-resolving detectors. However, the detectors are typically noisy due to dark counts and other imperfections (see, e.g., Ref. 107), so that in practice, the maps $\mathcal{M}_0$ and $\mathcal{M}_1$ will not have the ideal forms presented in (31) and (32).

Let

Then, if the source produces the ideal quantum state, such as the state in (13), so that $\rho^S=\Psi^+=|\Psi^+\rangle\langle\Psi^+|$, and if the heralding procedure is also ideal, then using (16), we obtain

which means that the transmission-heralding success probability as defined in (1) is simply $p=\mathrm{Tr}[\tilde{\sigma}(1)]=\eta_1\eta_2$.

**Remark II.2** (Multiplexing). In practice, in order to increase the transmission-heralding success probability, multiplexing strategies are used. The term “multiplexing” here refers to the use of a single transmission channel to send multiple signals simultaneously, with the signals being encoded into distinct (i.e., orthogonal) frequency modes, see, e.g., Ref. 113. If $M\geq 1$ distinct frequency modes are used, then the source state being transmitted is $(\rho^S)^{\otimes M}$. If $p$ denotes the probability that any single one of the signals is received and heralded successfully, then the probability that at least one of the $M$ signals is received and heralded successfully is $1-(1-p)^M$.
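The quantities in this subsection can be combined in a short numerical sketch. It assumes an ideal source and ideal heralding for a bipartite link, so that $p=\eta_1\eta_2$, and it assumes an attenuation length of 22 km, a typical value for telecom fiber that is not taken from the text. The function names and the 25 km link lengths are hypothetical.

```python
import math


def fiber_transmittance(length_km, attenuation_length_km=22.0):
    """eta = exp(-L / L0); the default L0 = 22 km is an assumed value."""
    return math.exp(-length_km / attenuation_length_km)


def heralding_success_probability(eta_1, eta_2):
    """Ideal source and ideal heralding for k = 2: both photons must arrive."""
    return eta_1 * eta_2


def multiplexed_success_probability(p, num_modes):
    """Probability that at least one of M independent signals is heralded."""
    return 1.0 - (1.0 - p) ** num_modes


# Source placed midway on a 50 km link (hypothetical geometry).
eta_1 = fiber_transmittance(25.0)
eta_2 = fiber_transmittance(25.0)
p_single = heralding_success_probability(eta_1, eta_2)
p_multi = multiplexed_success_probability(p_single, 100)
```

Under these assumptions, `p_single` equals $e^{-50/22}$, and `p_multi` shows how quickly multiplexing over $M=100$ modes pushes the per-time-step success probability toward one.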

#### 2. Transmission from satellites

Let us now consider the model of elementary link generation proposed in Ref. 114 in which the entanglement sources are placed on satellites orbiting the Earth. For further information on satellite-based quantum communication, we refer to Ref. 115 for a review, and we refer to Refs. 116–119 for more detailed modeling of the satellite-to-ground quantum channel than what we consider here.

When modeling photon transmission from satellites to ground stations, we must take into account background photons. Here, we analyze the scenario in which a source on board a satellite generates an entangled photon pair and distributes the individual photons to two parties, Alice ($A$) and Bob ($B$), on the ground. We allow the distributed photons to mix with background photons from an uncorrelated thermal source. Also, as before, we use the bosonic encoding defined in (5)–(7), but we stick to $d=2$, i.e., qubit source states and, thus, bipartite elementary links. In this scenario, it is common for the two modes to represent the polarization degrees of freedom of the photons, so that

represent the state of one horizontally and vertically polarized photon, respectively.

Let $\bar{n}$ be the average number of background photons. Then, as done in Ref. 114, we can define an approximate thermal background state as

The transmission channel from the satellite to the ground stations is then

where $U^{\eta_{\text{sg}}}$ is the beamsplitter unitary (see, e.g., Ref. 105) and $A_1$ and $A_2$ refer to the horizontal and vertical polarization modes, respectively, of the dual-rail quantum system being transmitted, and similarly for $E_1$ and $E_2$. Note that for $\bar{n}=0$, the transformation in (40) reduces to the one in (16) with $d=2$.

The transmittance $\eta_{\text{sg}}$ generally depends on atmospheric conditions (such as turbulence and weather conditions) and on orbital parameters (such as altitude and zenith angle).^{117–119} In general, if the satellite is at the altitude $h$ and the path length from the satellite to the ground station is $L$, then

where

and

with $\eta_{\text{atm}}^{\text{zen}}$ the transmittance at zenith ($\zeta=0$). In general, the zenith angle $\zeta$ is given by

for a circular orbit of altitude $h$, with $R_\oplus\approx 6378$ km being the Earth's radius. The following parameters, thus, characterize the total transmittance from satellite to ground: the initial beam waist $w_0$, the receiving aperture radius $r$, the wavelength $\lambda$ of the satellite-to-ground signals, and the atmospheric transmittance $\eta_{\text{atm}}^{\text{zen}}$ at zenith. Throughout the rest of this section, we take^{114} $r=0.75$ m, $w_0=2.5$ cm, $\lambda=810$ nm, and $\eta_{\text{atm}}^{\text{zen}}=0.5$ at 810 nm.^{116}

For a source state $\rho_{AB}^S$, with $A\equiv A_1A_2$ and $B\equiv B_1B_2$, the quantum state shared by Alice and Bob after the transmission of the state $\rho_{AB}^S$ from the satellite to the ground stations is

where $\eta_{\text{sg}}^{(1)}$ and $\eta_{\text{sg}}^{(2)}$ are the transmittances to the ground stations and $\bar{n}_1$ and $\bar{n}_2$ are the corresponding thermal background noise parameters. In Sec. II D, we look at a specific example of a source state $\rho_{AB}^S$ and, thus, provide an explicit form for the state $\rho_{AB}^{S,\text{out}}$. We also consider the heralding procedure defined by (28)–(32) and, thus, provide explicit forms for the states $\sigma^0$ and $\tau^{\varnothing}$ in (2) and (3) corresponding to success and failure, respectively, of the heralding procedure.

### B. Definition of the MDP

Having described the physical model of elementary link generation in Sec. II A, let us now proceed to the definition of the Markov decision process (MDP) for an elementary link. Note that while the formalism of Sec. II A gives us a mathematical description of the quantum state of an elementary link immediately after it is successfully generated, the MDP formalism provides us with a systematic framework to define actions on an elementary link and their effects on the quantum state over time.

Before starting, let us briefly summarize the definition of a Markov decision process (MDP); we refer to Appendix A for more details and a detailed explanation of the notation being used. An MDP is a mathematical model of an agent performing actions on a system (usually called the environment). The system is described by a set $S$ of *(classical) states*, and the agent picks actions from a set $A$. Corresponding to every action $a\in A$ is an $|S|\times|S|$ *transition matrix* $T^a$, such that the matrix element $T^a(s';s)$ is equal to the probability of transitioning to the state $s'\in S$, given that the current state is $s\in S$ and the action $a\in A$ is taken.

The results of Ref. 58 show us that, for the purposes of tracking the quantum state of an elementary link over time as well as its fidelity to a target pure state, it is enough to keep track of the time that the quantum systems of the elementary link reside in their respective quantum memories. With this observation, we can define a simpler MDP for elementary links.

- *States*: The states in our elementary link MDP are defined by the set $S=\{-1,0,1,\ldots,m^\star\}$, whose elements correspond to the number of time steps that the quantum systems of the elementary link have been sitting in their respective quantum memories. The state $-1$ corresponds to the elementary link being inactive, and $m^\star$ corresponds to the coherence time of the quantum memory. Specifically, if $t_{\text{coh}}$ is the coherence time of the quantum memory (say, in seconds), and the duration of every time step (in seconds) is $\Delta t$ (based on the classical communication time between the nodes in the elementary link), then $m^\star=(t_{\text{coh}}/\Delta t)$. From now on, we refer to $m^\star$ as the *maximum storage time* of the elementary link. We use $M(t)$, $t\in\mathbb{N}$, to refer to the random variables (taking values in $S$) corresponding to the state of the MDP at time $t$. We also associate to the elements in $S$ orthonormal vectors $\{|m\rangle\}_{m\in S}$, and we emphasize that these vectors should not be thought of as representing quantum states but as representing the extreme points of a probability simplex associated with the set $S$, see Appendix A for details.

- *Actions*: The set of actions is $A=\{0,1\}$, where 0 corresponds to the action “wait” and 1 corresponds to “request.” In other words, at every time step, the agent can decide to keep their quantum systems currently in memory or to discard the quantum systems and perform the elementary link generation procedure again. The transition matrices $T^0$ and $T^1$ corresponding to the two actions are defined as follows:

  $$T^0=\mathbb{1}^{(-)}+B^{(+)}, \tag{48}$$

  $$T^1=|g_p\rangle\langle\gamma|, \tag{49}$$

  where

  $$\mathbb{1}^{(-)}:=|-1\rangle\langle -1|, \tag{50}$$

  $$B^{(+)}:=\sum_{m=0}^{m^\star-1}|m+1\rangle\langle m|+|-1\rangle\langle m^\star|, \tag{51}$$

  $$|g_p\rangle:=(1-p)|-1\rangle+p|0\rangle, \tag{52}$$

  $$|\gamma\rangle=\sum_{m=-1}^{m^\star}|m\rangle. \tag{53}$$

  (Note that we define our transition matrices such that probability vectors are applied to them from the right, see Appendix A for details.) The transition matrix $T^0$ describes what happens to the elementary link when the action $a=0$ (wait) is taken by the agent: if the elementary link is currently inactive, then it stays inactive; if the elementary link is active and it is in memory for less than $m^\star$ time steps, then the memory time is incremented by one; if the elementary link is active and it has been in memory for $m^\star$ time steps, then because the coherence time of the memory has been reached (as per the definition of $m^\star$), the elementary link becomes inactive. If the action $a=1$ (request) is taken, then regardless of the current state of the elementary link, the state changes to $-1$ (inactive) with probability $1-p$, meaning that the elementary link generation failed, or it changes to 0 with probability $p$, meaning that the elementary link generation succeeded. These two possibilities are captured by the probability vector $|g_p\rangle$. We use $A(t)$, $t\in\mathbb{N}$, to refer to the random variable (taking values in the set $A$) corresponding to the action taken at time $t$.

- We let $H(t)=(M(1),A(1),M(2),A(2),\ldots,A(t-1),M(t))$ be the *history*, consisting of a sequence of states and actions, up to time $t$, with $H(1)=M(1)$.

- *Figure of merit*: Our figure of merit for an elementary link is the following function:

  $$f(m):=\begin{cases}\langle\psi|\sigma(m)|\psi\rangle & \text{if } m\in\{0,1,2,\ldots,m^\star\},\\ 0 & \text{if } m=-1\end{cases} \tag{54}$$

  $$\phantom{f(m):}=(1-\delta_{m,-1})\langle\psi|\sigma(m)|\psi\rangle, \tag{55}$$

  where $\sigma(m)$ is defined in (4) and $|\psi\rangle$ is a target state vector for the elementary link. (For example, if the elementary link contains two nodes, then $|\psi\rangle$ could be the state vector for the two-qubit maximally entangled state.) We emphasize that the function $f$ is not just the fidelity of the elementary link: it also depends implicitly on the probability that the elementary link is active. If $f$ were simply the fidelity of the elementary link, then instead of the definition $f(-1)=0$, we would have $f(-1)=\langle\psi|\tau^{\varnothing}|\psi\rangle$, where $\tau^{\varnothing}=(1/(1-p))(\mathcal{M}_0\circ\mathcal{S})(\rho^S)$ is the quantum state corresponding to failure of the heralding procedure, see (3). We illustrate the importance of this distinction, and therefore the usefulness of this figure of merit for designing and evaluating protocols, in Sec. II D 2 a, specifically Fig. 7. To the best of our knowledge, this figure of merit has not been considered in prior work.
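The transition matrices $T^0$ and $T^1$ of (48)–(53) can be written out explicitly. The following is a minimal sketch in which the state $m\in\{-1,0,\ldots,m^\star\}$ is stored at list index $m+1$ and matrices are column-stochastic, so that the entry `T[s_next][s]` is the probability of moving from `s` to `s_next`. The function names and the example parameters are hypothetical.

```python
def transition_matrices(p, m_star):
    """Build T0 (wait) and T1 (request) as nested lists, following (48)-(53)."""
    n = m_star + 2

    def idx(m):
        return m + 1  # state m in {-1, 0, ..., m_star} -> list index

    T0 = [[0.0] * n for _ in range(n)]
    T0[idx(-1)][idx(-1)] = 1.0           # inactive stays inactive
    for m in range(m_star):
        T0[idx(m + 1)][idx(m)] = 1.0     # memory time m -> m + 1
    T0[idx(-1)][idx(m_star)] = 1.0       # coherence time reached -> inactive

    T1 = [[0.0] * n for _ in range(n)]
    for s in range(n):
        T1[idx(-1)][s] = 1.0 - p         # request failed
        T1[idx(0)][s] = p                # request succeeded
    return T0, T1


def apply(T, v):
    """Apply a transition matrix to a probability vector from the right."""
    n = len(v)
    return [sum(T[s_next][s] * v[s] for s in range(n)) for s_next in range(n)]


T0, T1 = transition_matrices(p=0.3, m_star=2)
after_request = apply(T1, [1.0, 0.0, 0.0, 0.0])  # start inactive, then request
```

Every column of `T0` and `T1` sums to one, which is the property needed for them to map probability vectors to probability vectors.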

A *policy* is a sequence $\pi=(d_1,d_2,\ldots)$ of *decision functions* $d_t:S\times A\to[0,1]$, which indicate the probability of performing a particular action conditioned on the state of the system,

For a particular policy $\pi=(d_1,d_2,\ldots,d_{t-1})$, the probability of a particular history $h^t=(m_1,a_1,m_2,a_2,\ldots,a_{t-1},m_t)$ of states and actions is (see Appendix A 2)

Then, the quantum state of the elementary link is^{58}

where we recall that $\sigma(m_t)$ is given by (4).

We are interested primarily in the expected value of the function $f$ defined in (55) at times $t\in\mathbb{N}$,

for policies $\pi=(d_1,d_2,\ldots,d_{t-1})$. We are also interested in the probability that the elementary link is active at time $t\in\mathbb{N}$, which is given by

From this, the expected fidelity of the elementary link is given by

We are interested in the maximum value of the function $\tilde{F}^{\pi}(t)$ defined in (60) among all policies $\pi$,

A policy $\pi$ achieving the supremum is called an optimal policy.

In the steady-state (infinite-time) limit, we are interested in the maximum value of $\tilde{F}^{\pi}(t)$ among all time-homogeneous (stationary) policies $\pi=(d,d,\ldots)$, i.e., policies in which a fixed decision function $d$ is used at every time step,

if the limit exists.

### C. Policies

In Ref. 58, it was shown that a policy that achieves the optimal value in (63) can be determined using a backward recursion algorithm. We restate this algorithm here for completeness.

**Theorem II.3** (Optimal finite-time policy for an elementary link^{58}). For all $t\in\mathbb{N}$, the optimal expected fidelity of an elementary link with success probability $p\in[0,1]$ is given by

where

for all $j\in\{2,3,\ldots,t-1\}$, and

Furthermore, the optimal policy is deterministic and given by $\pi=(d_1^*,d_2^*,\ldots,d_{t-1}^*)$, where

Intuitively, the result of Theorem II.3 tells us that, for finite times, the optimal policy can be found by optimizing the individual actions going “backwards in time,” by first optimizing the final action at time $t-1$, then optimizing the action at time $t-2$, etc., and then finally optimizing the action at time $t=1$. This is, indeed, the case because from (68), we see that the optimal action at the first time step is obtained using the function $w_2$, but from (66), we see that to calculate $w_2$, we need $w_3$, and to calculate $w_3$, we need $w_4$, etc., until we get to the function $w_t$ for the final time step, which we can calculate using (67).
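The “backwards in time” idea can be sketched with standard finite-horizon backward induction on the simplified MDP. This is a generic dynamic-programming illustration and not a transcription of the formulas (65)–(68): it assumes the reward is $f$ evaluated at the final time $t$, it works on states rather than full histories, and it uses a hypothetical figure of merit $f(m)=0.9^m$ for active links. All function names are assumptions.

```python
def transition_matrices(p, m_star):
    """Column-stochastic T0 (wait) and T1 (request); state m sits at index m + 1."""
    n = m_star + 2
    T0 = [[0.0] * n for _ in range(n)]
    T0[0][0] = 1.0                       # inactive stays inactive
    for m in range(m_star):
        T0[m + 2][m + 1] = 1.0           # memory time m -> m + 1
    T0[0][m_star + 1] = 1.0              # coherence time reached -> inactive
    T1 = [[0.0] * n for _ in range(n)]
    for s in range(n):
        T1[0][s] = 1.0 - p               # request failed
        T1[1][s] = p                     # request succeeded
    return T0, T1


def backward_induction(p, m_star, f, t):
    """Return (w_1, policy), where policy[j][s] is the action at time j + 1."""
    n = m_star + 2
    T0, T1 = transition_matrices(p, m_star)
    w = [f(s - 1) for s in range(n)]     # value at the final time t
    policy = []
    for _ in range(t - 1):               # decisions at times t-1, t-2, ..., 1
        new_w, decision = [], []
        for s in range(n):
            q = [sum(T[s_next][s] * w[s_next] for s_next in range(n))
                 for T in (T0, T1)]
            action = 0 if q[0] >= q[1] else 1
            decision.append(action)
            new_w.append(q[action])
        w = new_w
        policy.insert(0, decision)       # earlier decisions go to the front
    return w, policy


def f_example(m):
    """Hypothetical figure of merit: 0 when inactive, 0.9**m otherwise."""
    return 0.0 if m == -1 else 0.9 ** m


w_1, policy = backward_induction(p=0.5, m_star=3, f=f_example, t=3)
```

In this example the final decision (at time 2) is to request only when the link is inactive or has reached $m^\star$, while the earlier decision (at time 1) already requests at $m=2$, because waiting from there would leave the link with too little remaining memory time.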

While the optimal policy for finite times was determined in Ref. 58, the steady-state value of the expected fidelity under arbitrary stationary policies [i.e., the value in (64)] was not determined. We now show that the limit in (64) exists, and we determine its value for arbitrary decision functions.

**Theorem II.4** (Steady-state expected value of an elementary link). Let *p* be the success probability of generating an elementary link in a quantum network, and let *d* be a decision function such that $d(m)(0)=\alpha(m)$ is the probability of executing the action wait and $d(m)(1)=1-d(m)(0)=\overline{\alpha}(m)$ is the probability of executing the action request. If the elementary link undergoes the stationary policy $(d,d,\dotsc)$, then

where

with

*Proof*. See Appendix D. □

Using Theorem II.4, we can determine the optimal steady-state value of the function $\widetilde{F}^{(d,d,\dotsc)}$, and thus the optimal decision function *d*, by optimizing the quantity in (69) with respect to the $m^{\star}+2$ independent variables $\alpha(-1),\alpha(0),\dotsc,\alpha(m^{\star})$ subject to the constraints $\alpha(m)\in[0,1]$ for all $m\in\{-1,0,1,\dotsc,m^{\star}\}$. [Recall from the statement of Theorem II.4 that the variables $\alpha(m)$ are directly related to the decision function *d*.] Alternatively, we can use the following linear program in order to obtain an optimal policy.

**Theorem II.5** (Linear program for the optimal steady-state value of an elementary link). Consider an elementary link in a quantum network with maximum memory time $m^{\star}$. Let $|f\rangle:=\sum_{m=-1}^{m^{\star}}f(m)|m\rangle$. Then, the optimal steady-state value of the elementary link, namely, the quantity in (64), is equal to the solution of the following linear program:

where the optimization is with respect to the $(m^{\star}+2)$-dimensional vectors $|v\rangle,|w_0\rangle,|w_1\rangle$, and the inequality constraints on the vectors are componentwise. For every feasible point of this linear program, we obtain a decision function *d* as follows: $d(m)(a)=\langle m|w_a\rangle/\langle m|v\rangle$ for all $m\in\{-1,0,1,\dotsc,m^{\star}\}$ and $a\in\{0,1\}$. If $\langle m|v\rangle=0$, then we set $d(m)(0)=\alpha(m)$ and $d(m)(1)=1-\alpha(m)$ for an arbitrary $\alpha(m)\in[0,1]$.

**Proof.** The linear program in (74) is a special case of the linear program presented in Proposition A.2 in Appendix A. The main assumption of that result is that the MDP be ergodic, which is true in this case by Theorem II.4. □

#### 1. The memory-cutoff policy

An example of a stationary policy is the *memory-cutoff policy*, which has been considered extensively in prior work.^{58,59,63–65,97,98,100,120–123} This is a deterministic policy that is defined by a cutoff time $t^{\star}\in\mathbb{N}_0\cup\{\infty\}$, where $\mathbb{N}_0:=\{0,1,2,\dotsc\}$, such that $t^{\star}\leq m^{\star}$. Then,

Then, by Theorem II.4, we have $N_d=1+t^{\star}p$, so that

for all $t^{\star}\in\mathbb{N}_0$, which agrees with Ref. 58 [Eq. (4.15)], which was obtained using different methods. We also obtain

for all $t^{\star}\in\mathbb{N}_0$.
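As a numerical sanity check of the normalization $N_d=1+t^{\star}p$, the following sketch runs a power iteration on the Markov chain induced by a memory-cutoff policy. The transition rule used here (request when the link is inactive or when the memory time equals the cutoff, wait otherwise) is an assumed reading of the policy, and the function name `cutoff_stationary_distribution` is hypothetical. Under that assumption, the stationary probability is $(1-p)/(1+t^{\star}p)$ for the inactive state and $p/(1+t^{\star}p)$ for each memory time $0,1,\dotsc,t^{\star}$.

```python
def cutoff_stationary_distribution(p, t_star, iterations=10_000):
    """Stationary distribution over memory states (-1, 0, ..., t_star).

    Index 0 of the returned list is the inactive state m = -1, and index k
    is the memory time m = k - 1. Assumes 0 < p < 1 so that the chain is
    aperiodic and the power iteration converges.
    """
    n = t_star + 2
    dist = [1.0] + [0.0] * (n - 1)  # start with an inactive link
    for _ in range(iterations):
        new = [0.0] * n
        # Assumed policy: states m = -1 and m = t_star issue a request.
        requesting = dist[0] + dist[-1]
        new[0] = (1.0 - p) * requesting  # request failed
        new[1] = p * requesting          # request succeeded, m = 0
        for k in range(1, n - 1):        # 0 <= m < t_star: wait, m -> m + 1
            new[k + 1] += dist[k]
        dist = new
    return dist


example_dist = cutoff_stationary_distribution(p=0.3, t_star=3)
normalization = 1.0 + 3 * 0.3  # N_d = 1 + t_star * p
```

For $p=0.3$ and $t^{\star}=3$, each active state then carries weight $0.3/1.9$ and the inactive state carries $0.7/1.9$.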

For $t^{\star}=\infty$, we have, for all $t\geq 1$,^{58}

In what follows, we make use of the following definitions for the deterministic decision functions corresponding to the memory-cutoff policy:

### D. Example: Satellite-to-ground entanglement distribution

#### 1. Quantum state of an elementary link

In Sec. II A 2, we defined the transmission channel corresponding to the transmission of entanglement from a satellite to two ground stations. In particular, if we consider two ground stations, one corresponding to Alice and one corresponding to Bob, then given a state $\rho_{AB}^{S}$ produced by the source on the satellite, the state after the transmission of the system *A* to Alice and the system *B* to Bob is given by (48)

where $\eta_{\text{sg}}^{(1)}$ and $\eta_{\text{sg}}^{(2)}$ are the transmittances to the ground stations and $\overline{n}_1$ and $\overline{n}_2$ are the corresponding thermal background noise parameters.

After transmission, we assume a heralding procedure defined by post-selecting on coincident events using (perfect) photon-number-resolving detectors. One can justify this assumption because, in the high-loss and low-noise regimes ($\eta_{\text{sg}}^{(1)},\eta_{\text{sg}}^{(2)},\overline{n}\ll 1$), the probability of four-photon and three-photon occurrences is negligible compared to two-photon events. Therefore, upon successful heralding, the (unnormalized) quantum state shared by Alice and Bob is

where

is the projection onto the two-photon-coincidence subspace. Note that the projection Π_{AB} is exactly the projection $\Lambda^{1}\otimes\Lambda^{1}$, with $\Lambda^{1}$ defined in (28). Then, the transmission-heralding success probability is, as per the definition in (1),

Now, let us take the source state $\rho_{AB}^{S}$ to be the following:

where $f_S\in[0,1]$ and

**Proposition II.6** (Quantum state of a satellite-to-ground elementary link^{114}). Let $\eta_{\text{sg}}^{(1)},\eta_{\text{sg}}^{(2)},\overline{n}_1,\overline{n}_2\in[0,1]$, and consider the source state $\rho_{AB}^{S}$ given by (88). Then, after successful heralding, the (unnormalized) state $\widetilde{\sigma}_{AB}(1)$ given by (84) is equal to

where

and

for $i\in\{1,2\}$.

From (93), we have that the transmission-heralding success probability is given by

so that the quantum state shared by Alice and Bob conditioned on successful heralding is, as per the definition in (2),

##### a. Success probability and fidelity

Let us now evaluate the quality of entanglement transmission from a satellite to two ground stations. For illustrative purposes, and for simplicity, we focus primarily on the simple scenario depicted in Fig. 3, in which a satellite passes over the midpoint between two ground stations, although the same analysis can be done even when this is not the case. Since the satellite is an equal distance away from both ground stations, we have $\eta_{\text{sg}}^{(1)}=\eta_{\text{sg}}^{(2)}$. We also let $\overline{n}_1=\overline{n}_2$. This means that $x_1=x_2\equiv x$, $y_1=y_2\equiv y$, and $z_1=z_2\equiv z$, so that

In this scenario, given a distance *d* between the ground stations and an altitude *h* for the satellite, by simple geometry, the distance *L* between the satellite and either ground station is given by

where $R_{\oplus}$ is the radius of Earth.
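The "simple geometry" mentioned above can be sketched with the law of cosines applied to the triangle formed by Earth's center, a ground station, and the satellite. The sketch assumes that *d* is measured along Earth's surface (so each ground station subtends an angle $d/(2R_{\oplus})$ from the sub-satellite point) and uses an assumed Earth radius of 6378 km; the function name `slant_range_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6378.0  # assumed value for the radius of Earth


def slant_range_km(d_km, h_km, radius_km=EARTH_RADIUS_KM):
    """Distance L from the satellite to either ground station.

    Assumes the satellite is directly above the midpoint of the two ground
    stations and that d_km is the separation measured along the surface.
    """
    half_angle = d_km / (2.0 * radius_km)  # angle at Earth's center
    orbit_radius = radius_km + h_km
    return math.sqrt(
        radius_km ** 2
        + orbit_radius ** 2
        - 2.0 * radius_km * orbit_radius * math.cos(half_angle)
    )


example_L = slant_range_km(d_km=2000.0, h_km=500.0)
```

In the limit of vanishing ground separation, the expression reduces to *L* = *h*, as expected for a satellite directly overhead.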

Now, let us consider the transmission-heralding success probability *p* in (98). Due to the altitude of the satellites, there typically has to be multiplexing of the signals (see Remark II.2) in order to maintain a high probability of both ground stations receiving the entangled state. In Fig. 4, we plot the success probability with multiplexing, which is given by $1-(1-p)^{M}$, where *M* is the number of distinct frequency modes used for multiplexing.
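The multiplexed success probability $1-(1-p)^{M}$ is straightforward to evaluate, but for very small *p* and very large *M* a direct evaluation can lose precision. The following sketch (with the hypothetical function name `multiplexed_success_probability`) uses `math.log1p` and `math.expm1` as one numerically careful way to compute it.

```python
import math


def multiplexed_success_probability(p, num_modes):
    """Probability that at least one of num_modes independent attempts,
    each succeeding with probability p, succeeds: 1 - (1 - p)^M."""
    if p >= 1.0:
        return 1.0
    # 1 - exp(M * log(1 - p)), evaluated stably for small p.
    return -math.expm1(num_modes * math.log1p(-p))


single_mode = multiplexed_success_probability(1e-6, 1)
many_modes = multiplexed_success_probability(1e-6, 10 ** 5)
```

With a per-mode success probability of $10^{-6}$, going from one mode to $10^{5}$ modes raises the success probability to roughly $1-e^{-0.1}$.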

We also plot in Fig. 4 the fidelity of the initial state, which is given by

The fidelity of $\sigma_{AB}^{0}$ with respect to $\Phi_{AB}^{+}$ is related in a simple way to the entanglement of $\sigma_{AB}^{0}$. In particular, by the positive partial transpose (PPT) criterion,^{124,125} $\sigma_{AB}^{0}$ is entangled if and only if its fidelity with respect to $\Phi_{AB}^{+}$ is strictly greater than $1/2$, and this leads to constraints on the loss and noise parameters of the satellite-to-ground transmission.

**Proposition II.7**. The quantum state $\sigma_{AB}^{0}$ after the successful satellite-to-ground transmission, as defined in (99), is entangled if and only if the fidelity of the source state in (88) satisfies $f_S>1/2$, and

*Proof*. Observe that the state $\sigma_{AB}^{0}$ is a Bell-diagonal state of the form

where $\alpha,\beta,\gamma\geq 0$ [when $f_S>1/2$]. Indeed, the coefficient of $\Phi_{AB}^{+}$ in (93) can be written as

and the coefficient of $\Phi_{AB}^{-}$ in (94) can be written as

We can, thus, make the following identifications:

Now, using the PPT criterion,^{124,125} we have that $\sigma_{AB}^{0}$ is entangled if and only if $\langle\Phi^{+}|\sigma_{AB}^{0}|\Phi^{+}\rangle>1/2$. Then, from (102), we have that

so we require

Simplifying this leads to

as required. □

Now, for the scenario depicted in Fig. 3, we have that $x_1=x_2=x$, $y_1=y_2=y$, and $z_1=z_2=z$, so that from (100), we have $a=x^2+y^2$, $b=z^2$, and $c=2xy$. Substituting this into (105) leads to $2(f_S-1)(x^2+y^2)+(4f_S-1)z^2-2(1+2f_S)xy>0$ as the condition for $\sigma_{AB}^{0}$ to be entangled. We plot this condition in Fig. 5. The inequality gives us the colored regions, and the values within the regions are obtained by evaluating the fidelity according to (102).

##### b. Key rates for QKD

Let us also consider key rates for quantum key distribution (QKD) between Alice and Bob, who are at the ends of the elementary link whose quantum state is $\sigma_{AB}^{0}$ (conditioned on successful transmission and heralding), as given by (100). We consider the BB84, six-state, and device-independent (DI) QKD protocols, and we calculate the secret key rates using known asymptotic secret key rate formulas, which we review (along with other necessary background on QKD) in Appendix C.

Recalling from the proof of Proposition II.7 that $\sigma_{AB}^{0}$ is a quantum state of the form

with *α*, *β*, and *γ* defined in (109)–(111), it is easy to show using (C2)–(C6) that the quantum bit-error rates (QBERs) for the BB84 and six-state protocols are

For the device-independent protocol, we assume that the correlation is such that the quantum bit-error rate is $Q_{\text{DI}}(d,h)=Q_{\text{6-state}}(d,h)$ and $S(d,h)=2\sqrt{2}(1-2Q_{\text{DI}}(d,h))$. Then, assuming that *M* signals per second are transmitted from the satellite, the secret-key rate (in units of secret key bits per second) is given by $\widetilde{K}=pMK$, where $p=a+c$ is the success probability of elementary link generation and *K* is the asymptotic secret key rate per copy of the state $\sigma_{AB}^{0}$, which depends on the protocol under consideration. Using the formulas in Appendix C, we obtain

We plot these secret key rates in Fig. 6.

In Fig. 6, notice that the region of non-zero secret key rate is largest for the six-state protocol, with the region for the BB84 protocol being smaller and the region for the DI protocol being even smaller. This is due to the fact that the error threshold for the DI protocol is the smallest among the three protocols, with the error threshold for the BB84 protocol slightly larger, and the error threshold for the six-state protocol the largest.

##### c. Quantum memory model

Having examined the quantum state immediately after successful transmission and heralding, let us now consider a particular model of decoherence for the quantum memories in which the transmitted qubits are stored. For illustrative purposes, we consider a simple amplitude damping decoherence model for the quantum memories. The amplitude damping channel $\mathcal{A}_{\gamma}$ is a qubit channel, with $\gamma\in[0,1]$, such that^{126}

Note that for *γ* = 0, we recover the noiseless (identity) channel. We can relate *γ* to the coherence time of the quantum memory, which we denote by $t_{\text{coh}}$, as follows (Ref. 127, Sec. 3.4.3):

Note that infinite coherence time corresponds to an ideal quantum memory, meaning that the quantum channel is noiseless. Indeed, by relating the noise parameter *γ* to the coherence time as in (125), we have that $t_{\text{coh}}=\infty\Rightarrow\gamma=0$.

For $m\in\mathbb{N}_0$ applications of the amplitude damping channel, it is straightforward to show that

where $\lambda_m:=e^{-m/t_{\text{coh}}}=(1-\gamma)^{m}$. Then, for all $m\in\mathbb{N}_0$,

where *α* and *β* are given by (110) and (111), respectively. Note that we have assumed that the memories corresponding to systems *A* and *B* have the same coherence time. It follows that

Note that $f(m)\leq f(0)$ for all $m\in\mathbb{N}_0$.
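The identity $\lambda_m=e^{-m/t_{\text{coh}}}=(1-\gamma)^{m}$ implies $\gamma=1-e^{-1/t_{\text{coh}}}$, and it reflects the fact that *m* applications of the amplitude damping channel compose into a single amplitude damping channel with parameter $1-\lambda_m$. The sketch below checks this on a single-qubit density matrix, assuming the standard Kraus operators $K_0=\text{diag}(1,\sqrt{1-\gamma})$ and $K_1=\sqrt{\gamma}\,|0\rangle\langle 1|$; the function names are hypothetical.

```python
import math


def gamma_from_coherence_time(t_coh):
    """Damping parameter per time step, from lambda_1 = exp(-1/t_coh) = 1 - gamma."""
    return 1.0 - math.exp(-1.0 / t_coh)


def amplitude_damping(rho, gamma):
    """Apply K0 rho K0^dagger + K1 rho K1^dagger to a 2x2 density matrix.

    Assumed Kraus operators: K0 = diag(1, sqrt(1 - gamma)) and
    K1 = sqrt(gamma) |0><1|.
    """
    s = math.sqrt(1.0 - gamma)
    return [
        [rho[0][0] + gamma * rho[1][1], s * rho[0][1]],
        [s * rho[1][0], (1.0 - gamma) * rho[1][1]],
    ]


t_coh, m = 10.0, 5
gamma = gamma_from_coherence_time(t_coh)
rho = [[0.5, 0.5], [0.5, 0.5]]  # the state |+><+|

stepwise = rho
for _ in range(m):
    stepwise = amplitude_damping(stepwise, gamma)

lambda_m = math.exp(-m / t_coh)
one_shot = amplitude_damping(rho, 1.0 - lambda_m)
```

Here `stepwise` and `one_shot` agree up to floating-point error, which is the composition property used to obtain the state after *m* time steps in memory.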

#### 2. Policies

##### a. Memory-cutoff policy

Let us now consider the memory-cutoff policy, which we defined in Sec. II C. Using (77) and (78), along with the expression for *f*(*m*) in (138), for every cutoff $t^{\star}\in\mathbb{N}_0$, we obtain

Then, using the fact that $\lambda_m=e^{-m/t_{\text{coh}}}$, it is straightforward to show that

Therefore, in the steady-state limit,

For $t^{\star}=\infty$, from (78), we obtain

for all $t\geq 1$. Evaluating the sums leads to

Then, for all $p\in(0,1]$, we obtain $\lim_{t\to\infty}\widetilde{F}^{\infty}(t)=1/2$.

Let us now focus primarily on the $t^{\star}=\infty$ memory-cutoff policy by considering an example. Consider the situation depicted in Fig. 3, in which we have two ground stations separated by a distance *d* and a satellite at the altitude *h* that passes over the midpoint between the two ground stations. Now, given that the ground stations are separated by a distance *d*, it takes time at least $2d/c$ to perform the heralding procedure, as this is the round-trip communication time between the ground stations (*c* is the speed of light). We, thus, take the duration of each time step in the decision process for the elementary link to be $2d/c$. If the coherence time of the quantum memories is *x* seconds, then $t_{\text{coh}}=xc/(2d)$ time steps. In Fig. 7, we plot the quantities $\widetilde{F}^{\infty}(t)$ (solid lines), $F^{\infty}(t)$ (dashed lines), and $X^{\infty}(t)$ (dotted lines) for the $t^{\star}=\infty$ memory-cutoff policy under this scenario.
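The conversion between physical time and time steps described above is a one-line computation. The following sketch uses the vacuum speed of light, which is an assumption, since the text does not specify the communication medium; the function names are hypothetical.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # vacuum value, assumed here


def time_step_seconds(d_km):
    """Duration 2d/c of one time step (round-trip heralding time)."""
    return 2.0 * d_km / SPEED_OF_LIGHT_KM_PER_S


def coherence_time_in_steps(x_seconds, d_km):
    """Coherence time t_coh = x c / (2 d), expressed in time steps."""
    return x_seconds * SPEED_OF_LIGHT_KM_PER_S / (2.0 * d_km)


step = time_step_seconds(2000.0)               # about 13 ms
t_coh = coherence_time_in_steps(1.0, 2000.0)   # about 75 time steps
```

For ground stations 2000 km apart and a memory coherence time of 1 s, the memory therefore survives for roughly 75 heralding rounds.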

In Fig. 7, we can see the trade-off among the quantities $\widetilde{F}$, *F*, and *X*. On the one hand, the fidelity $F^{\infty}(t)$ is always highest at time *t* = 1, as we expect, but at this point, the probability $X^{\infty}(t)$ that the elementary link is active is simply *p*. Since we want not only a high fidelity for the elementary link but also a high probability that the elementary link is active, by optimizing $\widetilde{F}$, it is possible to achieve a higher elementary link activity probability at the expense of a slightly lower fidelity. Specifically, in Fig. 7, we see that for every choice of *d* and *h*, there exists a time step $t_{\text{crit}}\geq 1$ at which $\widetilde{F}$ is maximal. At this point, the elementary link activity probability is $1-(1-p)^{t_{\text{crit}}}$, which, in many cases, is dramatically greater than *p*, while the fidelity $F^{\infty}(t_{\text{crit}})$ is only slightly lower than the fidelity at time *t* = 1. Therefore, by waiting until time $t_{\text{crit}}$, it is possible to obtain an elementary link that is almost deterministically active, while incurring only a slight decrease in the fidelity. The time $t_{\text{crit}}$, which is obtained by optimizing the quantity $\widetilde{F}^{\infty}(t)$ with respect to time *t* and can be found using the formula in (145), can be viewed as the optimal time *t* that should be chosen for the quantum network protocol presented in Fig. 13. We refer to Ref. 128 for a similar argument, except that in Ref. 128, the time $t_{\text{crit}}$ is obtained by considering a desired value of the fidelity $F^{\infty}(t)$ rather than by optimizing $\widetilde{F}^{\infty}(t)$ with respect to *t*, which is what we do here.

##### b. Forward recursion policy

The forward recursion policy is defined as the time-homogeneous policy, such that the action at time *t* is equal to the one that maximizes the quantity $\widetilde{F}^{\pi}(t+1)$ at the next time step. The corresponding decision function is^{58}

Observe that if *p* = 1, then the second condition in (146) is always false because of the fact that $f(m)\leq f(0)$ for all $m\in\mathbb{N}_0$, see (137). Therefore, when *p* = 1, we have that $d_t^{\text{FR}}=d_t^{0}$, i.e., the forward recursion policy is equal to the $t^{\star}=0$ memory-cutoff policy, see (82). We now show that the forward recursion policy reduces to a memory-cutoff policy even when *p* < 1.

**Proposition II.8.** Consider the satellite-to-ground bipartite elementary link generation with $\overline{n}_1=\overline{n}_2=0$ and *f*_{S} = 1, and let $p\in(0,1)$ be the transmission-heralding success probability, as given by (98). Let $t_{\text{coh}}$ be the coherence time of the quantum memories, as defined in Sec. II D 1. Then, for all $t\geq 1$

where

In other words, if $p\leq 1/2$, then the forward recursion policy is equal to the $t^{\star}=\infty$ memory-cutoff policy; if $p>1/2$, then the forward recursion policy is equal to the $t^{\star}$ memory-cutoff policy, with $t^{\star}$ given by (148).

**Remark II.9**. The result of Proposition II.8 goes beyond elementary link generation with satellites because we assumed that $\overline{n}_1=\overline{n}_2=0$ and *f*_{S} = 1. As a result of these assumptions, the result of Proposition II.8 applies to every elementary link generation scenario (such as ground-based elementary link generation as described in Sec. II A 1) in which the transmission channel is a pure-loss channel, the heralding procedure is described by (28)–(32), the source state is equal to the target state, and the quantum memories are modeled as in Sec. II D 1.

**Proof.** For the state $\sigma_{AB}^{0}$ as given by (2), using (138), the second condition in (147) translates to

In the case $\overline{n}_1=\overline{n}_2=0$ and *f*_{S} = 1, we have that $\alpha=\beta=1/2$, so that the inequality in (150) becomes

Now, this inequality is satisfied for all $m\in\mathbb{N}_0$ if and only if $p\leq 1/2$. In other words, if $p\leq 1/2$, then for all possible memory times, the action is to wait if the elementary link is currently active, meaning that the decision function in (146) becomes

which is precisely the decision function $d^{\infty}$ for the $t^{\star}=\infty$ memory-cutoff policy, see (82).

For $p\in(1/2,1)$, whether or not the inequality in (151) is satisfied depends on the memory time *m*. Consider the largest value of *m* for which the inequality is satisfied and denote that value by $m_{\max}$. Since the action is to wait, at the next time step, the memory value will be $m_{\max}+1$, which by definition will not satisfy the inequality in (149). This means that for all memory times strictly less than $m_{\max}+1$, the forward recursion policy dictates that the wait action should be performed if the elementary link is currently active. As soon as the memory time is equal to $m_{\max}+1$, then the forward recursion policy dictates that the request action should be performed. This means that $m_{\max}+1$ is a cutoff value. In particular, by rearranging the inequality in (151), we obtain

which means that

and

as required. □

Observe that the cutoff in (148) is equal to zero for all $p\geq(1/2)(1+e^{-2/t_{\text{coh}}})$. This means that *p* = 1 is not the only transmission-heralding success probability for which the forward recursion policy is equal to the $t^{\star}=0$ memory-cutoff policy. Intuitively, for $(1/2)(1+e^{-2/t_{\text{coh}}})\leq p\leq 1$, the transmission-heralding success probability is high enough that it is not necessary to store the quantum state in memory: for the purpose of maximizing the expected value of $\widetilde{F}$, it suffices to request a new quantum state at every time step. At the other extreme, for $0\leq p\leq 1/2$, the probability is too low to keep requesting: for the purpose of maximizing the expected value of $\widetilde{F}$, it is better to keep the quantum state in memory indefinitely.
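The two regimes described above can be reproduced with a direct search for the cutoff. The sketch below rests on two assumptions that are not quoted from the text: the fidelity after *m* time steps is taken to be $f(m)=(1/2)(1+e^{-2m/t_{\text{coh}}})$, which is what one obtains by applying amplitude damping with parameter $1-\lambda_m$ to both qubits of $\Phi^{+}$, and the forward recursion rule is read as "wait if and only if $f(m+1)>p\,f(0)$." Both function names are hypothetical.

```python
import math


def ideal_fidelity(m, t_coh):
    """Assumed fidelity of a stored Bell pair after m time steps."""
    return 0.5 * (1.0 + math.exp(-2.0 * m / t_coh))


def forward_recursion_cutoff(p, t_coh, max_m=10 ** 6):
    """Smallest memory time at which requesting beats waiting.

    Returns math.inf when waiting is always preferred (p <= 1/2).
    Assumed decision rule: wait iff f(m + 1) > p * f(0).
    """
    if p <= 0.5:
        return math.inf
    m = 0
    while m < max_m and ideal_fidelity(m + 1, t_coh) > p * ideal_fidelity(0, t_coh):
        m += 1
    return m


t_coh = 10.0
threshold = 0.5 * (1.0 + math.exp(-2.0 / t_coh))  # about 0.909
cutoffs = {p: forward_recursion_cutoff(p, t_coh) for p in (0.4, 0.6, 0.9, 0.95)}
```

With $t_{\text{coh}}=10$ time steps, this search gives an infinite cutoff for $p=0.4$, a cutoff of 8 for $p=0.6$, a cutoff of 1 for $p=0.9$, and a cutoff of 0 for $p=0.95$, which lies above the threshold $(1/2)(1+e^{-2/t_{\text{coh}}})$.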

##### c. Backward recursion policy

Finally, to end this section, let us consider the backward recursion policy, which we know to be optimal from Theorem II.3. We perform the policy optimization for small times, just as a proof of concept.

In Fig. 8, we plot optimal values of $\widetilde{F}^{\pi}(t+1)$ for a single elementary link, except now we plot them as a function of the ground station distance *d* and the satellite altitude *h* as per the situation depicted in Fig. 3. We also plot the elementary link activity probability $X^{\pi}(t+1)$ and the expected fidelities $F^{\pi}(t+1)$ associated with the optimal policies. As before, we assume that *f*_{S} = 1, but unlike before, we assume that $\overline{n}_1=\overline{n}_2=10^{-4}$, and we consider multiplexing with $M=10^{5}$ distinct frequency modes per transmission. We assume a coherence time of 1 s throughout. For small distance-altitude pairs, we find that the optimal value is reached within five time steps. For these cases, it is worth pointing out that the optimal value of $\widetilde{F}^{\pi}(t+1)$ corresponds to an elementary link activity probability $X^{\pi}(t+1)$ of nearly one, while the fidelity (although it drops, as expected) does not drop significantly, meaning that the elementary link can still be useful for performing entanglement distillation of parallel elementary links or for creating virtual links. It is also interesting to point out that for a ground distance separation of *d* = 2000 km, the optimal values for satellite altitude *h* = 1000 km are higher than for *h* = 500 km. This result can be traced back to the top-left panel of Fig. 4, in which we see that the transmission-heralding success probability curves for *h* = 500 km and *h* = 1000 km cross over at around 1700 km, so that *h* = 1000 km has a higher probability than *h* = 500 km when *d* = 2000 km.

## III. ENTANGLEMENT DISTILLATION AND JOINING PROTOCOLS

In Sec. II, we discussed elementary links in a quantum network, how to model the generation of elementary links, and how to model them in time in terms of a Markov decision process. The description of an elementary link in terms of a Markov decision process allows us to determine, as a function of time, the quantum state of an elementary link. Keeping in mind the overall goal of entanglement distribution, i.e., the creation of long-distance virtual links, the next step in an entanglement distribution protocol is to take elementary links, to improve their fidelity using entanglement distillation, and then to join them in order to create the virtual links (using, e.g., entanglement swapping). In this section, we explain how to model entanglement distillation protocols and joining protocols using LOCC channels. We refer to Appendix B 2 for a detailed explanation of LOCC channels. The explicit description of these protocols as LOCC channels is important because, as we saw in Sec. II, the quantum state of an elementary link will not always be the ideal entangled state with respect to which joining protocols are typically defined. It is, therefore, important to understand how the protocols will act when the input states are not ideal.

### A. Entanglement distillation

The term “entanglement distillation” refers to the task of taking many copies of a given quantum state *ρ*_{AB} and transforming them, via an LOCC protocol, to several (fewer) copies of the maximally entangled state $\Phi_{AB}$. Typically, with only a finite number of copies of the initial state *ρ*_{AB}, it is not possible to perfectly obtain copies of the maximally entangled state, so we aim, instead, for a state *σ*_{AB} whose fidelity $F(\Phi_{AB},\sigma_{AB})$ to the maximally entangled state is higher than the fidelity $F(\Phi_{AB},\rho_{AB})$ of the initial state. Mathematically, the task of entanglement distillation corresponds to the transformation

where $n,m\in\mathbb{N}$, *m* < *n*, and $\mathcal{L}_{A^nB^n\to A^mB^m}$ is an LOCC channel.

Typically, in practice, we have *n* = 2 and *m* = 1, with the task being to transform two two-qubit states $\rho_{A_1B_1}^{1}$ and $\rho_{A_2B_2}^{2}$ to a two-qubit state $\sigma_{A_1B_1}$ having a higher fidelity to the maximally entangled state than the initial states. Protocols achieving this aim are typically probabilistic in practice, meaning that the state $\sigma_{A_1B_1}$ with higher fidelity is obtained only with some non-unit probability.

We are not concerned with any particular entanglement distillation protocol in this work. All we are concerned with is their mathematical structure. In particular, entanglement distillation protocols that are probabilistic can be described mathematically as an LOCC instrument, which we now demonstrate with a simple example, depicted in Fig. 9, which comes from Ref. 36. In this protocol, Alice and Bob first apply the CNOT gate to their qubits and follow it with a measurement of their second qubit in the standard basis. They then communicate the results of their measurement to each other. The protocol is considered successful if they both obtain the same outcome and a failure otherwise. This protocol has the following corresponding LOCC instrument channel:

where

Furthermore, the states $\rho_{A_jB_j}^{\text{iso},j}$, $j\in\{1,2\}$, are defined as

where $TU$ is the *isotropic twirling channel*, see, e.g., Ref. 129 (Example 7.25).

It is a straightforward calculation to show that if $f_1=\langle\Phi|\rho_{A_1B_1}^{1}|\Phi\rangle$ and $f_2=\langle\Phi|\rho_{A_2B_2}^{2}|\Phi\rangle$ are the fidelities of the initial states with the maximally entangled state, then the protocol depicted in Fig. 9, with corresponding LOCC channel given by (157), succeeds with probability

and the fidelity of the output state $\sigma_{A_1B_1}$ with the maximally entangled state (conditioned on success) is

The above example illustrates a general principle, which is that entanglement distillation protocols that are probabilistic (and heralded) can be described using LOCC instrument channels. Specifically, let $G=(V,E)$ be the graph corresponding to the physical links in a quantum network. Given an element $e\in E$ with *n* parallel edges $e^1,e^2,\dotsc,e^n$, every probabilistic entanglement distillation protocol has the form of an LOCC instrument channel of the following form:

where $\mathcal{D}_{e^1\cdots e^n\to e^1\cdots e^{n'}}^{e;0}$ and $\mathcal{D}_{e^1\cdots e^n\to e^1\cdots e^{n'}}^{e;1}$ are completely positive trace non-increasing LOCC maps, such that $\mathcal{D}_{e^1\cdots e^n\to e^1\cdots e^{n'}}^{e;0}+\mathcal{D}_{e^1\cdots e^n\to e^1\cdots e^{n'}}^{e;1}$ is a trace-preserving map, thus an LOCC quantum channel. Specifically, $\mathcal{D}_{e^1\cdots e^n\to e^1\cdots e^{n'}}^{e;0}$ corresponds to failure of the protocol and $\mathcal{D}_{e^1\cdots e^n\to e^1\cdots e^{n'}}^{e;1}$ corresponds to success of the protocol.
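For the two-copy protocol of Fig. 9, the success probability and output fidelity can be evaluated with the standard recurrence formulas for two isotropic input states with fidelities $f_1$ and $f_2$. The sketch below assumes that these standard formulas are the ones referred to above; the function name `distill_isotropic` is hypothetical.

```python
def distill_isotropic(f1, f2):
    """Success probability and output fidelity of the two-to-one protocol
    (bilateral CNOT followed by a coincidence check) on isotropic inputs.

    Assumption: the inputs have been twirled to isotropic states, so each
    non-target Bell component of input j has weight (1 - f_j) / 3.
    """
    g1 = (1.0 - f1) / 3.0
    g2 = (1.0 - f2) / 3.0
    p_succ = f1 * f2 + f1 * g2 + g1 * f2 + 5.0 * g1 * g2
    fidelity = (f1 * f2 + g1 * g2) / p_succ
    return p_succ, fidelity


p_succ, f_out = distill_isotropic(0.75, 0.75)
```

For two inputs with fidelity 0.75, the protocol succeeds with probability of about 0.72 and outputs a state with fidelity of about 0.79, which illustrates the trade of success probability for fidelity.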

### B. Joining protocols

Let us now discuss joining protocols, such as entanglement swapping. We can describe such protocols using LOCC instrument channels, just as with entanglement distillation protocols. As above, let $G=(V,E)$ be the graph corresponding to the physical links in a quantum network. A path in a graph is a sequence $w=(v_1,e_1,v_2,e_2,\dotsc,e_{n-1},v_n)$ of vertices and edges that specifies how to get from the vertex *v*_{1} to the vertex *v*_{n}. Given a path *w* of active elementary links in the network, the joining channel $\mathcal{L}_{w\to e'}$ that forms the new virtual link $e'$ is given in the probabilistic setting by

where $\mathcal{L}_{w\to e'}^{0}$ and $\mathcal{L}_{w\to e'}^{1}$ are completely positive trace non-increasing LOCC maps, such that $\mathcal{L}_{w\to e'}^{0}+\mathcal{L}_{w\to e'}^{1}$ is a trace-preserving map, thus an LOCC quantum channel. Specifically, $\mathcal{L}_{w\to e'}^{0}$ corresponds to failure of the joining protocol, and $\mathcal{L}_{w\to e'}^{1}$ corresponds to success of the joining protocol. Given an input state *ρ*_{w} corresponding to the given path *w*, the success probability of the joining protocol is $p_{\text{succ}}=\operatorname{Tr}[\mathcal{L}_{w\to e'}^{1}(\rho_w)]$, and the state conditioned on success is

Note that as input states to the maps $\mathcal{L}_{w\to e'}^{0}$ and $\mathcal{L}_{w\to e'}^{1}$, we could have arbitrary states of the elementary links along the path *w*. In particular, depending on the elementary link policy, they could be states of the form (59), which take into account the noise in the quantum memories and other device imperfections arising during the process of generating the elementary links.

The precise joining protocol, and thus, the explicit form for the maps $\mathcal{L}_{w\to e'}^{0}$ and $\mathcal{L}_{w\to e'}^{1}$, depends on the type of entanglement that is to be created. For bipartite entanglement, we consider entanglement swapping in Sec. III B 1. For tripartite GHZ entanglement, we describe a protocol in Sec. III B 2, and for multipartite graph states, we describe a protocol in Sec. III B 3.

#### 1. Entanglement swapping protocol

Let $\rho_{A\vec{R}_1\vec{R}_2\cdots\vec{R}_nB}$ be a multipartite quantum state, where $n\geq 1$ and $\vec{R}_j\equiv R_j^1R_j^2$ is an abbreviation for the two quantum systems $R_j^1$ and $R_j^2$. The entanglement swapping protocol with *n* intermediate nodes is defined by a Bell-basis measurement of the systems $\vec{R}_j$, i.e., a measurement described by the positive operator-valued measure (POVM) $\{\Phi^{z,x}:z,x\in[d]\}$, where $[d]=\{0,1,\dotsc,d-1\}$, $\Phi^{z,x}=|\Phi^{z,x}\rangle\langle\Phi^{z,x}|$, and

are the qudit Bell state vectors, with

The operators *Z* and *X* are the discrete Weyl operators,^{129} which are defined as

Conditioned on the outcomes (*z*_{j}, *x*_{j}) of the Bell measurement on $\vec{R}_j$, the unitary $Z_B^{z_1+\cdots+z_n}X_B^{x_1+\cdots+x_n}$ is applied to the system *B*, where the addition is performed modulo *d*. Let $\vec{z},\vec{x}\in[d]^{\times n}$ and define

where the addition in the second line is performed modulo *d*. Then, the LOCC quantum channel corresponding to the entanglement swapping protocol with $n\geq 1$ intermediate nodes is

The standard entanglement swapping protocol^{39} corresponds to the input state

This scenario is shown in Fig. 10. Indeed, it can be shown that

Furthermore, the standard teleportation protocol^{12} corresponds to *n* = 1 and the input state

where $A=\varnothing$ is a trivial (one-dimensional) system and $\sigma_{R_1^1}$ is an arbitrary *d*-dimensional quantum state, so that

as expected.

**Proposition III.1** (Fidelity after entanglement swapping). For all $n\geq 1$ and all states $\rho_{AR_1^1}^{1},\rho_{R_1^2R_2^1}^{2},\dotsc,\rho_{R_n^2B}^{n+1}$, the fidelity of the maximally entangled state with the state after entanglement swapping of $\rho_{AR_1^1}^{1},\rho_{R_1^2R_2^1}^{2},\dotsc,\rho_{R_n^2B}^{n+1}$ is given by

where $z'=-z_1-z_2-\cdots-z_n$ and $x'=-x_1-x_2-\cdots-x_n$.

**Proof.** See Appendix E 1. □
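The qudit Bell basis that underlies the entanglement swapping measurement can be checked numerically. The sketch below assumes the common convention $|\Phi^{z,x}\rangle=(Z^zX^x\otimes\mathbb{1})|\Phi\rangle$ with $Z|k\rangle=e^{2\pi\mathrm{i}k/d}|k\rangle$, $X|k\rangle=|k+1 \bmod d\rangle$, and $|\Phi\rangle=d^{-1/2}\sum_k|k,k\rangle$; the ordering of the operators in the text's own definition may differ by a phase, which does not affect orthonormality. The function names are hypothetical.

```python
import cmath


def weyl_bell_state(d, z, x):
    """State vector (Z^z X^x tensor identity)|Phi> as a list of d*d amplitudes.

    Basis ordering: index j * d + k corresponds to |j>|k>.
    """
    omega = cmath.exp(2j * cmath.pi / d)
    vec = [0j] * (d * d)
    for k in range(d):
        j = (k + x) % d  # X^x |k> = |k + x mod d>
        vec[j * d + k] = omega ** (z * j) / d ** 0.5  # Z^z |j> = omega^(z j) |j>
    return vec


def inner(u, v):
    """Inner product <u|v>."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


d = 3
basis = {(z, x): weyl_bell_state(d, z, x) for z in range(d) for x in range(d)}
gram_is_identity = all(
    abs(inner(basis[a], basis[b]) - (1.0 if a == b else 0.0)) < 1e-12
    for a in basis
    for b in basis
)
```

The $d^2$ vectors are orthonormal, so the projectors onto them form a valid measurement on the two systems held by each intermediate node.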

A simple way to make the entanglement swapping protocol probabilistic is to modify the measurement operators $M_{\vec{R}_1\cdots\vec{R}_n}^{\vec{z},\vec{x}}$ in the ideal protocol as follows:

where $\{\Lambda_{\vec{R}_j}^{z_j,x_j,\alpha_j}\}_{z_j,x_j,\alpha_j\in\{0,1\}}$, $j\in\{1,2,\dotsc,n\}$, are POVMs, such that

The values $q_j\in[0,1]$ represent the success probability of the Bell-basis measurement at the $j^{\text{th}}$ intermediate node. We then define the LOCC instrument channel for the probabilistic entanglement swapping protocol as follows:

where

and

Then, the success probability of the protocol is

for every state $\rho_{A\vec{R}_1\cdots\vec{R}_nB}$.

#### 2. GHZ entanglement swapping protocol

The previous example takes a chain of Bell states and transforms them into a Bell state shared by the end nodes of the chain. In this example, we look at a protocol that takes the same chain of Bell states and transforms them instead to a multi-qubit GHZ state, which is defined as^{130}

We call this protocol the *GHZ entanglement swapping protocol*.

The protocol for transforming a chain of two Bell states to a three-party GHZ state is shown in Fig. 11. First, the two qubits $R_1^1$ and $R_1^2$ in the central node are entangled with a CNOT gate, followed by a measurement of $R_1^2$ in the standard basis (with corresponding POVM $\{|0\rangle\langle 0|,|1\rangle\langle 1|\}$). The result $x\in\{0,1\}$ is communicated to *B*, where the correction operation $X_B^{x}$ is applied. The LOCC channel corresponding to this protocol is

where

The protocol shown in Fig. 11, with the corresponding LOCC quantum channel in (187), can be easily extended to a scenario with *n* > 1 intermediate nodes. In this case, the node $\vec{R}_1$ starts by applying the gate $\text{CNOT}_{\vec{R}_1}$ to its qubits and then measuring the qubit $R_1^2$ in the standard basis. The outcome of this measurement is sent to the node $\vec{R}_2$, and the corresponding correction operation is applied to the qubit $R_2^1$. Then, the gate $\text{CNOT}_{\vec{R}_2}$ is applied to the qubits at $\vec{R}_2$, followed by a standard-basis measurement of $R_2^2$ and communication of the outcome to $\vec{R}_3$ and a correction operation on $R_3^1$. This proceeds in sequence until the $n^{\text{th}}$ intermediate node $\vec{R}_n$, which sends its measurement outcome to *B*, which applies the appropriate correction operation. The LOCC channel for this protocol is

where

for all $\vec{x}\in\{0,1\}^{n}$. If the input state to this channel is

then the output is an $(n+2)$-party GHZ state given by the state vector $|\text{GHZ}_{n+2}\rangle_{AR_1^1\cdots R_n^1B}$ as defined in (186), i.e.,

**Proposition III.2** (Fidelity after GHZ entanglement swapping). For all $n\geq 1$ and for all states $\rho_{AR_1^1}^{1},\rho_{R_1^2R_2^1}^{2},\dotsc,\rho_{R_n^2B}^{n+1}$, the fidelity of the $(n+2)$-party GHZ state with the state after the GHZ entanglement swapping of $\rho_{AR_1^1}^{1},\rho_{R_1^2R_2^1}^{2},\dotsc,\rho_{R_n^2B}^{n+1}$ is

*Proof.* See Appendix E 2. □
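As a numerical sanity check of the protocol with one intermediate node, the following minimal NumPy sketch prepares two ideal Bell states, applies the CNOT gate at the central node, projects the second central qubit onto each standard-basis outcome, and applies the corresponding correction at *B*. The qubit ordering $(A, R_1^1, R_1^2, B)$, the helper `kron`, and all variable names are illustrative choices of ours, and we assume the standard form $|\text{GHZ}_3\rangle=\frac{1}{\sqrt{2}}(|000\rangle+|111\rangle)$ for the target state.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]  # |0><0| and |1><1|

def kron(*ops):
    """Tensor product of a sequence of operators (left to right)."""
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Two ideal Bell states; qubit order is (A, R11, R12, B).
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
psi = np.kron(bell, bell)

# CNOT at the central node: control R11, target R12.
cnot = kron(I2, P[0], I2, I2) + kron(I2, P[1], X, I2)
psi = cnot @ psi

# Target three-party GHZ state on (A, R11, B).
ghz = np.zeros(8)
ghz[0] = ghz[7] = 1 / np.sqrt(2)

results = []
for x in (0, 1):
    bra = np.zeros((1, 2))
    bra[0, x] = 1.0
    post = kron(I2, I2, bra, I2) @ psi        # project R12 onto <x| (unnormalized)
    prob = float(post @ post)                 # probability of outcome x
    corr = kron(I2, I2, np.linalg.matrix_power(X, x))  # X^x on B
    out = corr @ post / np.sqrt(prob)
    results.append((prob, float(abs(ghz @ out) ** 2)))
```

Under these assumptions, each entry of `results` holds the outcome probability and the fidelity of the corrected state with the GHZ state, so noisy inputs could be studied by replacing `psi` with a density-matrix version of the same steps.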

The GHZ entanglement swapping protocol can be made probabilistic in a manner similar to the entanglement swapping protocol. We start by writing (190) as follows:

where

Then, to make the protocol probabilistic, we can make the following simple modification:

where $\{\Lambda_{R_j^2}^{x_j,\alpha_j}\}_{x_j,\alpha_j\in\{0,1\}}$, $j\in\{1,2,\dotsc,n\}$, are POVMs, such that

The values $q_j\in[0,1]$ represent the success probability of the standard-basis measurement at the $j$th intermediate node. Then, we define the LOCC quantum instrument channel for the GHZ entanglement swapping protocol as follows:

where

and

Then, the success probability of the protocol is

for every state $\rho_{A\vec{R}_1\cdots\vec{R}_nB}$.

#### 3. Graph state distribution protocol

We now consider an example of distributing an arbitrary graph state, which can be viewed as a special case of the procedure considered in Ref. 73. A graph state^{131–133} is a multi-qubit quantum state defined using graphs.

Consider a graph $G=(V,E)$, which consists of a set *V* of vertices and a set *E* of edges. For the purpose of this example, *G* is an undirected graph, and *E* is a set of two-element subsets of *V*. The graph state $|G\rangle$ is an *n*-qubit quantum state $|G\rangle_{A_1\cdots A_n}$ with $n=|V|$, which is defined as

where *A*(*G*) is the adjacency matrix of *G*, which is defined as

and $\vec{\alpha}$ is the column vector $(\alpha_1,\dotsc,\alpha_n)^{\mathsf{T}}$. It is easy to show that

where $|+\rangle:=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)$ and

with $\text{CZ}_{A_iA_j}:=|0\rangle\langle 0|_{A_i}\otimes\mathbb{1}_{A_j}+|1\rangle\langle 1|_{A_i}\otimes Z_{A_j}$ being the controlled-*Z* gate.
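The equivalence between the two descriptions of a graph state can be checked numerically. The sketch below assumes the standard amplitude form $|G\rangle=2^{-n/2}\sum_{\vec{\alpha}\in\{0,1\}^n}(-1)^{\frac{1}{2}\vec{\alpha}^{\mathsf{T}}A(G)\vec{\alpha}}|\vec{\alpha}\rangle$ (our reading of the definition in terms of the adjacency matrix) and compares it with the state obtained by applying one controlled-*Z* gate per edge to $|+\rangle^{\otimes n}$. The function names and the example graph (a four-vertex cycle) are hypothetical choices made for illustration.

```python
import itertools
import numpy as np

def graph_state_from_adjacency(adj):
    """Amplitudes (-1)^{(1/2) a^T A a} / sqrt(2^n), assuming the standard form."""
    n = adj.shape[0]
    amps = []
    for alpha in itertools.product((0, 1), repeat=n):
        a = np.array(alpha)
        # a^T A a is even for a symmetric adjacency matrix with zero diagonal.
        exponent = int(a @ adj @ a) // 2
        amps.append((-1.0) ** exponent)
    return np.array(amps) / np.sqrt(2 ** n)

def graph_state_via_cz(adj):
    """Apply a CZ gate for every edge {i, j} to the state |+>^{(x) n}."""
    n = adj.shape[0]
    state = np.ones(2 ** n) / np.sqrt(2 ** n)
    for idx, alpha in enumerate(itertools.product((0, 1), repeat=n)):
        for i in range(n):
            for j in range(i + 1, n):
                # CZ flips the sign exactly when both qubits are in |1>.
                if adj[i, j] and alpha[i] and alpha[j]:
                    state[idx] = -state[idx]
    return state

# Example graph: a cycle on four vertices.
adj_cycle4 = np.array([[0, 1, 0, 1],
                       [1, 0, 1, 0],
                       [0, 1, 0, 1],
                       [1, 0, 1, 0]])
state_sum = graph_state_from_adjacency(adj_cycle4)
state_cz = graph_state_via_cz(adj_cycle4)
```

Both constructions enumerate basis strings in the same order, so the two vectors can be compared entry by entry.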

Now, consider the scenario depicted in Fig. 12 in which *n* = 4 nodes share Bell states with a central node. The task is for the central node to distribute the graph state $|G\rangle$ to the outer nodes. One possible procedure is for the central node to locally prepare the graph state and then to teleport the individual qubits using the Bell states. However, it is possible to perform a slightly simpler procedure that does not require the additional qubits needed to prepare the graph state locally. In fact, the following deterministic procedure produces the required graph state $|G\rangle$ shared by the nodes $A_1,\dotsc,A_n$.

1. The central node applies $\text{CZ}(G)$ to the qubits $R_1,\dotsc,R_n$.

2. On each of the qubits $R_1,\dotsc,R_n$, the central node performs the measurement defined by the POVM $\{|+\rangle\langle +|,|-\rangle\langle -|\}$, where $|\pm\rangle=\frac{1}{\sqrt{2}}(|0\rangle\pm|1\rangle)$. The outcome is an *n*-bit string $\vec{x}=(x_1,\dotsc,x_n)$, where $x_i=0$ corresponds to the “+” outcome and $x_i=1$ corresponds to the “−” outcome. The central node communicates outcome $x_i$ to the node $A_i$.

3. The nodes $A_i$ apply $Z^{x_i}$ to their qubit. In other words, if $x_i=0$, then $A_i$ does nothing, and if $x_i=1$, then $A_i$ applies *Z* to their qubit.
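The three steps above can be simulated directly for ideal Bell states. In the following NumPy sketch, the joint state of $A_1\cdots A_n$ and $R_1\cdots R_n$ is stored as a $2^n\times 2^n$ matrix whose rows index the outer qubits and whose columns index the central qubits; this storage layout, the function names, and the example graph (a triangle) are our own illustrative assumptions. The target graph state is taken to be $\text{CZ}(G)|+\rangle^{\otimes n}$.

```python
import itertools
import numpy as np

def cz_phases(adj):
    """Bit strings and the diagonal (+1/-1) entries of CZ(G) in the standard basis."""
    n = adj.shape[0]
    bits = np.array(list(itertools.product((0, 1), repeat=n)))
    # Number of edges {i, j} (i < j) whose two endpoint bits both equal 1.
    counts = np.einsum("ki,ij,kj->k", bits, np.triu(adj, 1), bits)
    return bits, (-1.0) ** counts

def distribute_graph_state(adj):
    """Return (probability, fidelity with |G>) for every outcome string x."""
    n = adj.shape[0]
    dim = 2 ** n
    bits, phases = cz_phases(adj)
    target = phases / np.sqrt(dim)            # |G> = CZ(G)|+>^n
    state = np.eye(dim) / np.sqrt(dim)        # n Bell pairs: rows A, columns R
    state = state * phases[np.newaxis, :]     # step 1: CZ(G) on the R qubits
    outcomes = []
    for x in bits:
        signs = (-1.0) ** (bits @ x)
        bra = signs / np.sqrt(dim)            # step 2: <+/-| on each R_i
        post = state @ bra                    # unnormalized state on A_1...A_n
        prob = float(post @ post)
        post = post * signs                   # step 3: Z^{x_i} on each A_i
        fid = float(abs(target @ post) ** 2) / prob
        outcomes.append((prob, fid))
    return outcomes

triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
outcomes = distribute_graph_state(triangle)
```

In this idealized setting, every outcome string should occur with probability $1/2^n$ and yield unit fidelity after correction, in agreement with the claim that the protocol is deterministic.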

Let us prove that this protocol achieves the desired outcome. First, we observe that

Then, after the first step, the state is

where we have used the fact that

Then, we find that for every outcome string $(x_1,\dotsc,x_n)$ of the measurement on the qubits $R_1,\dotsc,R_n$, the corresponding (unnormalized) post-measurement state is

Then, using the fact that $Z^x|\alpha\rangle=(-1)^{\alpha x}|\alpha\rangle$ for all $x,\alpha\in\{0,1\}$, we find that at the end of the second step, the (unnormalized) state is

for all $(x_1,\dotsc,x_n)\in\{0,1\}^n$. From this, we see that up to local Pauli-*z* corrections, the post-measurement state is equal to the desired graph state $|G\rangle$ with probability $1/2^n$ for every measurement outcome string $(x_1,\dotsc,x_n)$. Once all of the nodes $A_i$ receive their corresponding outcome $x_i$ and apply the correction $Z_{A_i}^{x_i}$, the nodes $A_1,\dotsc,A_n$ share the graph state $|G\rangle$. As a result of the classical communication of the measurement outcomes and the subsequent correction operations, the protocol is deterministic.

The protocol described above has the following representation as an LOCC channel:

for every state $\rho_{A_1^nR_1^n}$, where $H=|+\rangle\langle 0|+|-\rangle\langle 1|$ is the Hadamard operator, and we have let

We have also used the abbreviation $A_1^n\equiv A_1A_2\dotsm A_n$, and similarly for $R_1^n$. Using the fact that

for all $\vec{x}\in\{0,1\}^n$, and letting

we can write the channel in the following simpler form:

From this, we see that the protocol can be thought of as measuring the systems $R_1,\dotsc,R_n$ according to the POVM $\{|G^{\vec{x}}\rangle\langle G^{\vec{x}}|\}_{\vec{x}\in\{0,1\}^n}$ and, conditioned on the outcome $\vec{x}$, applying the correction operation $Z^{\vec{x}}$ to the systems $A_1,\dotsc,A_n$. Note that $\{|G^{\vec{x}}\rangle\langle G^{\vec{x}}|\}_{\vec{x}\in\{0,1\}^n}$ is, indeed, a POVM due to the fact that

**Proposition III.3** (Fidelity after graph state distribution). For all $n\geq 2$, every graph *G* with *n* vertices, and all two-qubit states $\rho^1_{A_1R_1},\rho^2_{A_2R_2},\dotsc,\rho^n_{A_nR_n}$, the fidelity of the graph state $|G\rangle$ with the state after the graph state distribution protocol applied to $\rho^1_{A_1R_1},\rho^2_{A_2R_2},\dotsc,\rho^n_{A_nR_n}$ is

where the column vector $\vec{z}=(z_1,\dotsc,z_n)^{\mathsf{T}}$ is given by $\vec{z}=A(G)\vec{x}$, with *A*(*G*) the adjacency matrix of *G*.

**Proof.** See Appendix E 3. □

In order to make the graph state distribution protocol probabilistic, we can make the following modification:

where $\{\Lambda_{R_1^n}^{\vec{x},\alpha}\}_{\vec{x}\in\{0,1\}^n,\alpha\in\{0,1\}}$ is a POVM, such that

The value $q\in[0,1]$ represents the success probability of the measurement defined by the POVM $\{|G^{\vec{x}}\rangle\langle G^{\vec{x}}|_{R_1^n}\}_{\vec{x}\in\{0,1\}^n}$. Then, we define the LOCC quantum instrument channel for the graph state distribution protocol as follows:

where

and

Then, the success probability of the protocol is

for every state $\rho_{A_1^nR_1^n}$.

## IV. ANALYSIS OF A QUANTUM NETWORK PROTOCOL

In Secs. II and III, we described in detail how to model elementary links in a quantum network using Markov decision processes. Then, we showed how to model entanglement distillation protocols and joining protocols (such as entanglement swapping) as LOCC channels. The upshot of these developments is that they give us a method for determining the quantum states of elementary and virtual links in a quantum network that depends explicitly on the underlying device parameters and noise processes that characterize the device, thereby allowing us to perform a more realistic analysis of entanglement distribution protocols, as we now show in this section.

In this section, we analyze a simple entanglement distribution protocol. Recall from Sec. I that entanglement distribution refers to the task of creating virtual links (entanglement between non-adjacent nodes) from elementary links, which are entangled states shared by adjacent (physically connected) nodes. An entanglement distribution protocol can be thought of as a graph transformation, as done in Refs. 128 and 134 and depicted in Fig. 1. Starting with the graph $G=(V,E)$ of physical links in the network, the goal is to realize a new graph $G_{\text{target}}=(V,E_{\text{target}})$ consisting of virtual links in addition to elementary links, such as the graph in the right-most panel of Fig. 1.

The protocol that we consider consists of two steps: generate elementary links and then perform joining protocols based on the given target graph. The protocol is described more formally in Fig. 13. Starting with the graph $G=(V,E)$ of elementary links, all of the elementary links independently undergo policies $\pi_e$, with $e\in E$. After $t\geq 1$ time steps, an algorithm^{128,134,135} finds paths for creating the virtual links specified by the target graph $G_{\text{target}}$, and the corresponding joining protocols are performed. If the entire target network cannot be achieved in *t* time steps, then a decision is made to either conclude the protocol with the current configuration or to continue for another *t* time steps under the same policies.

**Remark IV.1.** Note that in the protocol described in Fig. 13, the virtual links are created only when all of the required elementary links are active. This is of course not the most general procedure because it is in general possible to join some of the elementary links along a path while waiting for others to become active. Handling such general procedures requires developing MDPs for systems of multiple elementary links. While this is the subject of ongoing work, we provide an example of how to extend the elementary-link MDP framework of Sec. II to a system of two elementary links, in which entanglement swapping is included, in Sec. V. We also note that the protocol in Fig. 13 uses fixed routing and path-finding algorithms from Refs. 128, 134, and 135. It is possible, in principle, to develop an MDP that takes into account routing. Doing so would allow us to obtain protocols that simultaneously optimize the actions of the elementary links, the joining operations, and the actions corresponding to routing, either directly using dynamic programing algorithms such as the one in Theorem II.3, or through reinforcement learning. These possibilities, and other possibilities for developing more sophisticated protocols using MDPs, are interesting directions for future work.

### A. Fidelity

In order to quantify the performance of the protocol described in Fig. 13, it is natural to ask what the fidelities of the resulting states of the elementary and virtual links are with respect to prescribed target states. Thus, let us begin by showing, in general terms, how we could calculate the fidelity after *t* time steps of our protocol.

First, we note that all of the elementary links are independent of each other. This is due to the fact that we assume that every node has a separate quantum system for every one of the elementary links associated with that node. Furthermore, we assume that every elementary link undergoes its own policy independent of the other elementary links. Therefore, after *t* time steps, the quantum state of the network is

where $\vec{\pi}=\{\pi_e:e\in E\}$ is a collection of policies for the individual elementary links, and every state $\rho_e^{\pi_e}(t)$ is given by (59), namely,

Recall from (57) that $\Pr[H_e(t)=h^t]_{\pi_e}$ is the probability of the history $h^t$ with respect to the policy $\pi_e$, and $\sigma_e(t|h^t)$ is the quantum state of the elementary link conditioned on the history $h^t$, given by (60).

The state in (230) is a classical-quantum state that contains both classical information about the history of the elementary link and the quantum state of the elementary link conditioned on every history. If we condition on an elementary link corresponding to $e\in E$ being active at time *t*, then the expected quantum state of the elementary link at time *t* is^{58}

From these states, we can calculate the quantum states of the virtual links in the target graph that are created via joining protocols. In general, the states are of the form (166). As a concrete example, let us consider the usual entanglement swapping protocol from Sec. III B 1. Let $w=(v_1,e_1,v_2,e_2,\dotsc,e_n,v_{n+1})$ be a path between two non-neighboring nodes $v_1$ and $v_{n+1}$, such that the entanglement swapping protocol along this path creates the virtual link given by the edge $\{v_1,v_{n+1}\}$. The quantum state at the input of the entanglement swapping protocol is $\bigotimes_{j=1}^{n}\overline{\rho}_{e_j}^{\pi_{e_j}}(t)$, and the output state conditioned on the success of the protocol is $\mathcal{L}^{\text{ES};n}\big(\bigotimes_{j=1}^{n}\overline{\rho}_{e_j}^{\pi_{e_j}}(t)\big)$, where we recall the definition of $\mathcal{L}^{\text{ES};n}$ in (172).

After the appropriate joining protocols are performed, and conditioned on their success, we obtain the target graph $G_{\text{target}}=(V,E_{\text{target}})$, and the corresponding quantum state has the form $\bigotimes_{e\in E_{\text{target}}}\omega_e$, where if *e* is a virtual link, obtained via a joining protocol, then $\omega_e$ is given by (166). Now, the target quantum state is simply a tensor product of the target states corresponding to the edges of the target graph, i.e., $\bigotimes_{e\in E_{\text{target}}}\omega_e^{\text{target}}$. Therefore, by multiplicativity of fidelity with respect to the tensor product, the fidelity of the quantum state after the protocol is equal to $\prod_{e\in E_{\text{target}}}F(\omega_e,\omega_e^{\text{target}})$. For the virtual links, individual fidelities in this product can be calculated using the formulas presented in Sec. III B.
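The multiplicativity property used above is easy to illustrate numerically when the target states are pure, in which case $F(\omega,|\psi\rangle\langle\psi|)=\langle\psi|\omega|\psi\rangle$. The sketch below uses two-qubit isotropic states as stand-ins for the link states $\omega_e$; this choice of state family, the function names, and the fidelity values 0.9 and 0.8 are purely illustrative assumptions and are not taken from the analysis above.

```python
import numpy as np

PHI = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # maximally entangled state

def isotropic(f):
    """Two-qubit isotropic state with fidelity f to the state PHI."""
    proj = np.outer(PHI, PHI)
    return f * proj + (1 - f) * (np.eye(4) - proj) / 3

def fidelity_with_pure(rho, psi):
    """Fidelity <psi|rho|psi> of a state rho with a pure target state."""
    return float(psi @ rho @ psi)

# Two independent links with individual fidelities 0.9 and 0.8.
omega_1, omega_2 = isotropic(0.9), isotropic(0.8)
network_state = np.kron(omega_1, omega_2)
network_target = np.kron(PHI, PHI)

joint_fidelity = fidelity_with_pure(network_state, network_target)
product_of_fidelities = (fidelity_with_pure(omega_1, PHI)
                         * fidelity_with_pure(omega_2, PHI))
```

Here `joint_fidelity` and `product_of_fidelities` agree up to floating-point error, which is the statement that the network fidelity factorizes over the edges of the target graph.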

### B. Waiting time

In addition to the fidelity, another relevant figure of merit is the *expected waiting time*, which indicates how long it takes (on average) to establish an elementary or a virtual link. This figure of merit has been considered in prior work in the context of both a linear chain of quantum repeaters and general quantum networks.^{59,65,98,121,136–138}

When defining the waiting times, we imagine a scenario in which elementary link generation is continuously occurring in the network,^{128} and that an end-user request for entanglement occurs at a time $t_{\text{req}}\geq 0$. The waiting time is then the number of time steps from time $t_{\text{req}}$ onward that it takes to establish the entanglement.

**Definition IV.2** (Elementary link waiting time). Let $G=(V,E)$ be the graph corresponding to the elementary links of a quantum network and let $e\in E$. For all $t_{\text{req}}\geq 0$, the waiting time for the elementary link corresponding to the edge *e* is defined to be

Then, the expected waiting time is

where *π* is an arbitrary policy for the elementary link corresponding to the edge *e*.

We make the following definition for the waiting time for a collection of elementary links.

**Definition IV.3** (Collective elementary link waiting time). Let $G=(V,E)$ be the graph corresponding to the elementary links of a quantum network, and let $t_{\text{req}}\geq 0$. For every subset $E'\subseteq E$, the waiting time for the elementary links corresponding to the elements of $E'$ is defined to be

where $X_{E'}(t):=\prod_{e\in E'}X_e(t)$.

In other words, the collective elementary link waiting time is the time it takes for all of the elementary links given by $E'$ to be simultaneously active, and its expected value is

where $\vec{\pi}=(\pi_e:e\in E')$ is an arbitrary collection of policies for the elementary links corresponding to $E'$. If we consider a collection of elementary links, all undergoing the $t^{\star}=\infty$ memory-cutoff policy, then

Proofs of this result using various different techniques can be found in Refs. 65, 98, and 139. In Appendix F, we prove this result within the framework introduced here by explicitly evaluating the formula in (235).
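To build intuition for this quantity, the following sketch treats a simplified situation that we assume for illustration: *M* elementary links, each succeeding independently with the same probability *p* per time step, with links held indefinitely once generated and the request made at the start. Under these assumptions, the collective waiting time is the maximum of *M* independent geometric random variables, whose mean has the well-known inclusion-exclusion form $\sum_{k=1}^{M}\binom{M}{k}\frac{(-1)^{k+1}}{1-(1-p)^k}$. Whether this coincides exactly with the expression quoted above depends on conventions that we do not verify here; the function names are our own.

```python
from math import comb

def expected_max_geometric_closed_form(num_links, p):
    """Inclusion-exclusion formula for E[max of num_links i.i.d. geometric variables]."""
    return sum(comb(num_links, k) * (-1) ** (k + 1) / (1 - (1 - p) ** k)
               for k in range(1, num_links + 1))

def expected_max_geometric_tail_sum(num_links, p, tol=1e-14):
    """Direct evaluation via E[W] = sum_{t >= 0} Pr[W > t]."""
    total, t = 0.0, 0
    while True:
        # Pr[W > t] = 1 - Pr[every link has succeeded within t steps].
        tail = 1 - (1 - (1 - p) ** t) ** num_links
        if tail < tol:
            return total
        total += tail
        t += 1

closed = expected_max_geometric_closed_form(3, 0.3)
direct = expected_max_geometric_tail_sum(3, 0.3)
single_link = expected_max_geometric_closed_form(1, 0.3)  # reduces to 1/p
```

The two evaluations agree to numerical precision, and for a single link the formula reduces to the familiar value $1/p$.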

**Definition IV.4** (Virtual link waiting time). Let $G=(V,E)$ be the graph corresponding to the elementary links of a quantum network, and let $t_{\text{req}}\geq 0$. Given a pair $v_1,v_n\in V$ of distinct non-adjacent vertices and a path $w=(v_1,e_1,v_2,e_2,\dotsc,e_{n-1},v_n)$ between them for some $n\geq 2$, the *virtual link waiting time* along this path is defined to be the amount of time it takes to establish the virtual link given by the edge $\{v_1,v_n\}$,

where $E_w=\{e_1,e_2,\dotsc,e_{n-1}\}$ is the set of edges corresponding to the path *w*, $W_{E_w}(t_{\text{req}})$ is the collective elementary link waiting time from Definition IV.3, and $Y_{E_w}$ is a binary random variable for the success of the joining protocol along the path *w*, so that $Y_w=1$ corresponds to success of the joining protocol and $Y_w=0$ to failure. We define $Y_w$ and $W_{E_w}$ to be independent random variables.

The formula for the virtual link waiting time in Definition IV.4 is based on the formula in Ref. 59. It corresponds to the simple strategy of waiting for all of the elementary links along the path *w* to be established, and performing the measurements for the joining protocol. Note that this strategy is consistent with our overall quantum network protocol in Fig. 13.

### C. Key rates for quantum key distribution

In order to determine secret key rates between arbitrary pairs of nodes in a quantum network, we need to keep track of the quantum state of the relevant elementary links as a function of time. The following discussion and formulas for secret key rates are based on Ref. 113.

Suppose that *K* is a function that gives the number of secret key bits per entangled state shared by the nodes of either an elementary link or a virtual link. (*K* is, for example, the formula for the asymptotic secret key rate of the BB84, six-state, or device-independent protocol.) Then, suppose that $G=(V,E)$ is the graph corresponding to the elementary links of a quantum network. Consider a collection $e':=\{v_1,\dotsc,v_k\}\notin E$ of distinct nodes corresponding to a virtual link for some $k\geq 2$, and let *w* be a path in the physical graph leading to the virtual link given by $e'$. An entanglement swapping protocol is performed along the path *w* in order to establish the bipartite virtual link. Conditioned on the success of the joining protocol, the quantum state of the virtual link is given by (166), namely,

where

is the success probability of the joining protocol. Then, the secret key rate (in units of secret key bits per second) for the virtual link along the path *w* is

Here, *K* is calculated using the state in (238). The repetition rate $\nu_{e'}^{\text{rep}}$ in this case is a function of the end-to-end classical communication time required for executing the joining protocol.
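As one concrete choice for the function *K*, the following sketch evaluates the standard asymptotic BB84 key fraction $K(Q)=\max\{0,1-2h_2(Q)\}$, where $h_2$ is the binary entropy and *Q* is the quantum bit error rate, assumed here to be the same in both measurement bases. How *Q* is obtained from the state of the virtual link is not modeled in this sketch, and the function names are hypothetical.

```python
from math import log2

def binary_entropy(q):
    """Binary entropy h_2(q) in bits, with h_2(0) = h_2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)

def bb84_key_fraction(qber):
    """Asymptotic BB84 secret key bits per shared pair, for equal error rates in both bases."""
    return max(0.0, 1 - 2 * binary_entropy(qber))

key_noiseless = bb84_key_fraction(0.0)    # one secret bit per pair
key_near_threshold = bb84_key_fraction(0.11)
key_above_threshold = bb84_key_fraction(0.12)
```

The key fraction vanishes at an error rate of roughly 11%, so a virtual link only contributes to the secret key rate if the noise accumulated along the path *w* stays below this threshold.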

## V. A MARKOV DECISION PROCESS BEYOND THE ELEMENTARY LINK LEVEL

The developments so far in this work constitute an analysis of quantum networks using a Markov decision process (MDP) for elementary links. As we have seen, the framework of MDPs is useful because it allows us to model noise processes and imperfections that are present in near-term quantum technologies, and thus allows us to understand the limits on the performance of near-term quantum networks. An important question is how useful the MDP formalism will be in practice when scaling up to model systems of more than one elementary link. In this section, we provide an MDP for a system of two elementary links, taking entanglement swapping into account. We note that in recent work,^{140} MDPs for repeater chains with two, three, and four elementary links have been considered, but the definition of the MDP here differs from the one in Ref. 140 because here we take decoherence of the quantum memories into account.

We start this section by defining the basic elements of the MDP, and then, we show how to obtain optimal policies using linear programing. In particular, we formulate the optimal expected waiting time to obtain the end-to-end virtual link and the optimal expected fidelity of the end-to-end virtual link as linear programs. Then, we show that prior analytical results on the expected waiting time for two elementary links under the memory-cutoff policy,^{59} known only in the “symmetric” scenario when the two elementary links have the same transmission-heralding success probability and the same memory cutoff, can be reproduced. However, we note that our linear programing procedure can be applied even in non-symmetric scenarios.

### A. An MDP for two elementary links

Let *p*_{1} and *p*_{2} be the success probabilities for generating the two elementary links, and let *q* be the probability of successful entanglement swapping. Note that *p*_{1} and *p*_{2} are defined exactly as in Sec. II A. In particular,

where $\mathcal{M}_j^1$, $j\in\{1,2\}$, are the completely positive maps corresponding to the success of the heralding procedure for the $j$th elementary link, $\mathcal{S}_j$ is the transmission channel from the source to the nodes for the $j$th elementary link, and $\rho_j^S$ is the state produced by the source associated with the $j$th elementary link, see Fig. 14. We also define the states

where $\mathcal{N}_j$ is the quantum channel describing the decoherence of the quantum memories associated with the $j$th elementary link.

Now, recall that in the case of the one elementary link considered in Sec. II B, the state variable was just the memory time *M*(*t*), referring to the time for which the quantum state of the elementary link was held in the memories of the nodes, and the actions consisted of either keeping the elementary link or discarding it and generating a new one. Now, in the case of two elementary links, we must keep track of the memory time of both elementary links, and we also store information about whether or not the virtual (end-to-end) link is active. The actions are similar to before, consisting of the same elementary link actions as before, but now, we define an additional action for performing the entanglement swapping operation. Formally, we have the following:

*States*: The states of the MDP are elements of the set $\mathsf{S}=\mathsf{X}\times\mathsf{M}_1\times\mathsf{M}_2$, where $\mathsf{X}=\{0,1\}$ indicates whether or not the end-to-end link is active, $\mathsf{M}_1=\{-1,0,1,\dotsc,m_1^{\star}\}$ is the set of possible states of the first elementary link (with the elements of the set having the same interpretation as in the elementary link MDP), and $\mathsf{M}_2=\{-1,0,1,\dotsc,m_2^{\star}\}$ is the set of possible states of the second elementary link. In particular, $m_1^{\star}$ and $m_2^{\star}$ are the maximum storage times of the two elementary links, corresponding to their coherence times, see Sec. II B. To these states, we associate the (standard) probability simplex spanned by the orthonormal vectors $|x\rangle\otimes|m_1\rangle\otimes|m_2\rangle$, with $x\in\mathsf{X}$, $m_1\in\mathsf{M}_1$, and $m_2\in\mathsf{M}_2$, and we often use the abbreviation $|s\rangle\equiv|x,m_1,m_2\rangle\equiv|x\rangle\otimes|m_1\rangle\otimes|m_2\rangle$ for every $s=(x,m_1,m_2)\in\mathsf{S}$. We use $S(t)=(X(t),M_1(t),M_2(t))$, $t\in\mathbb{N}$, to refer to the random variables (taking values in $\mathsf{S}$) corresponding to the state of the MDP.

*Actions*: The set of actions is $\mathsf{A}=\{00,01,10,11,\bowtie\}$, where the different actions have the following meanings:

00: Keep both elementary links.

01: Keep the first elementary link; discard and regenerate the second.

10: Discard and regenerate the first elementary link; keep the second.

11: Discard and regenerate both elementary links.

$\bowtie$: Perform entanglement swapping.

We use *A*(*t*), $t\in\mathbb{N}$, to refer to the random variables (taking values in the set $\mathsf{A}$) corresponding to the actions taken.

We let $H(t)=(S(1),A(1),S(2),A(2),\dotsc,A(t-1),S(t))$ be the history, consisting of a sequence of states and actions, up to time $t\in\mathbb{N}$, with $H(1)=S(1)$.

*Figure of merit*: For the elementary link MDP defined in Sec. II B, recall that the figure of merit was essentially the fidelity of the elementary link, but scaled by a factor corresponding to the probability that the elementary link is active. We define the figure of merit here in an analogous fashion as follows:

$$f(x,m_1,m_2)=\begin{cases}\langle\psi|\mathcal{L}^{\text{ES};1}(\sigma_1(m_1)\otimes\sigma_2(m_2))|\psi\rangle & \text{if } x=1,\ m_1,m_2\geq 0,\\ 0 & \text{otherwise},\end{cases}\tag{245}$$

where we recall that $\mathcal{L}^{\text{ES};1}$ is the entanglement swapping channel for one intermediate node, as defined in Sec. III B 1, and $|\psi\rangle$ is a target pure state vector, which, in this context, is typically the maximally entangled state $|\Phi\rangle$ as defined in (168).
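The state and action sets defined above are small enough to enumerate explicitly, which is useful when assembling transition matrices or linear programs numerically. The sketch below builds them with the Python standard library; the function name, the string label `"swap"` standing in for the entanglement swapping action, and the example storage times are illustrative assumptions of ours.

```python
from itertools import product

# Actions: keep/discard choices for the two links, plus entanglement swapping.
ACTIONS = ("00", "01", "10", "11", "swap")

def build_state_space(m1_star, m2_star):
    """All states (x, m1, m2) with x in {0, 1} and m_j in {-1, 0, ..., m_j_star}."""
    end_to_end_status = (0, 1)
    memory_1 = range(-1, m1_star + 1)
    memory_2 = range(-1, m2_star + 1)
    return list(product(end_to_end_status, memory_1, memory_2))

states = build_state_space(m1_star=3, m2_star=2)
# Transient states have an inactive end-to-end link (x = 0).
transient_states = [s for s in states if s[0] == 0]
# Absorbing states have an active end-to-end link (x = 1).
absorbing_states = [s for s in states if s[0] == 1]
```

With maximum storage times $m_1^{\star}$ and $m_2^{\star}$, there are $(m_1^{\star}+2)(m_2^{\star}+2)$ states for each value of *x*, so the state space grows only with the product of the two coherence times.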

Let us now proceed to the definition of the transition matrices for our MDP. Unlike the elementary link scenario, in this scenario of two elementary links, we want not only for the fidelity and success probability of the end-to-end link to be high but also for the average amount of time it takes to generate the end-to-end link to be low; in other words, we want the expected waiting time to be low as well. Therefore, in order to address the expected waiting time in our MDP, we define the transition matrices in such a way that states corresponding to an active end-to-end link [i.e., states $s=(x,m_1,m_2)\in\mathsf{S}$ such that *x* = 1] are *absorbing states*. By doing this, the expected waiting time is nothing but the expected time to absorption, whose calculation is a standard result in the theory of Markov chains, see, e.g., Ref. 141. We note that this idea of relating the expected waiting time of a quantum repeater chain to the absorption time of a Markov chain has already been used in Ref. 121; however, here, we apply this idea in the more general context of an MDP, while also taking memory decoherence and other device imperfections explicitly into account.
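For a fixed stationary policy, the expected time to absorption follows from the standard fundamental-matrix formula: if *Q* contains the transition probabilities among the transient states, then the vector of expected absorption times solves $(\mathbb{1}-Q)\vec{\tau}=\vec{1}$. The sketch below applies this to a toy chain that we assume for illustration only: two links with the same success probability *p*, unlimited memory, and a swap that always succeeds, so that the transient states are "no link active" and "one link active". Note that this example uses the row-stochastic convention `Q[i, j] = Pr[i -> j]`, which differs from the column convention used for the transition matrices in this work.

```python
import numpy as np

def expected_absorption_times(Q):
    """Solve (I - Q) tau = 1 for the expected number of steps to absorption."""
    n = Q.shape[0]
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))

p = 0.5
# Transient states: 0 = no link active, 1 = exactly one link active.
Q = np.array([
    [(1 - p) ** 2, 2 * p * (1 - p)],  # from state 0: stay, or gain one link
    [0.0,          1 - p],            # from state 1: wait for the other link
])
tau = expected_absorption_times(Q)

# Known value for the maximum of two geometric random variables.
expected_from_scratch = (3 - 2 * p) / (p * (2 - p))
```

In this toy model, `tau[0]` matches the expected maximum of two geometric random variables and `tau[1]` equals $1/p$; the same linear solve applies to any policy once its transient block *Q* has been assembled.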

Let $T_j^a$ denote the transition matrix for the $j$th elementary link, as defined in (48) and (49), for $a\in\{0,1\}$. Then, using those elementary link transition matrices, we define the transition matrices for our MDP for two elementary links as follows:

where

and $|g_{p_j}\rangle$, $j\in\{1,2\}$, is defined exactly as in (53).

First, let us observe that every transition matrix has a block structure, with the blocks defined by the transitions of the status of the end-to-end link. Specifically, we can write every transition matrix $T^a$ as

where the sub-blocks $T^a_{x\to x'}$ are the blocks corresponding to the transition of the status of the virtual link from $x\in\{0,1\}$ to $x'\in\{0,1\}$. (We note, as before, that probability vectors are applied to transition matrices from the right, see Appendix A.) From this, we see that for the actions 00, 01, 10, and 11, the transition matrices are of the following block-diagonal form:

Therefore, for these transition matrices, because the entanglement swapping action is not performed, the transition from *x* = 0 to *x* = 1 is not possible. Consequently, if the end-to-end link is initially inactive (*x* = 0), then it stays inactive, and each elementary link transitions independently according to the elementary link transition matrices from Sec. II B. If the end-to-end link is initially active (*x* = 1), then nothing happens to the states of the elementary links, in accordance with the definition of an absorbing state. For the action $\bowtie$ of entanglement swapping, we have three non-zero blocks. The block $T^{\bowtie}_{0\to 0}$ means that the end-to-end link is initially inactive and stays inactive, which can happen in one of several ways:

Both elementary links are initially active, but the entanglement swapping fails, after which both elementary links are regenerated. This possibility is given by the term $(1-q)|g_{p_1},g_{p_2}\rangle\langle\gamma_1^+,\gamma_2^+|$.

Both elementary links are initially inactive. In this case, they both remain inactive after the entanglement swapping action, and this is given by the term $|-1,-1\rangle\langle -1,-1|$.

One of the elementary links is active, but the other is not. In this case, the memory time of the active elementary link is incremented by one, corresponding to the “shift” operator $S_j$ on the active elementary link, while the inactive elementary link remains inactive. These possibilities are given by the terms $S_1\otimes|-1\rangle\langle -1|$ and $|-1\rangle\langle -1|\otimes S_2$.

One of the elementary links is inactive, and the other has reached its maximum memory time. In this case, the inactive elementary link remains inactive, and the other elementary link transitions to the −1 state because the maximum time $m_j^{\star}$ was reached. These possibilities are given by the terms $|-1,-1\rangle\langle m_1^{\star},-1|$ and $|-1,-1\rangle\langle -1,m_2^{\star}|$.

The block $T^{\bowtie}_{0\to 1}$ corresponds to a transition from the end-to-end link initially being inactive to being active, which happens when the entanglement swapping succeeds. Since the entanglement swapping is possible only when both elementary links are active, and because we want to keep track of the memory times of the elementary links at the moment the entanglement swapping is performed, this block is given by $q\,\mathbb{1}_1^+\otimes\mathbb{1}_2^+$. Finally, the block $T^{\bowtie}_{1\to 1}$ corresponds to the end-to-end link being active already; thus, in accordance with the definition of an absorbing state, this block is given simply by $\mathbb{1}_1\otimes\mathbb{1}_2$, as with the other actions.

Now, just as we defined a memory-cutoff policy for elementary links in Sec. II C 1, we can define a memory-cutoff policy for the system of two elementary links that we are considering here. Suppose that the first elementary link has cutoff time $t_1^{\star}\leq m_1^{\star}$, and the second elementary link has cutoff time $t_2^{\star}\leq m_2^{\star}$. Then, we define the decision function such that if both elementary links are active, then an entanglement swap is attempted; otherwise, one of the actions 01, 10, or 11 is performed, depending on which elementary links are active. This leads to the following definition of the deterministic decision function:

for all $m_1\in\mathsf{M}_1$ and $m_2\in\mathsf{M}_2$. Note that it is only necessary to define the decision function on the transient states $(0,m_1,m_2)$ and not the absorbing states $(1,m_1,m_2)$ because the figures of merit that we are concerned with [such as the expected value of the function *f* in (245) and the expected waiting time to absorption] do not depend on the values of the decision function on absorbing states.

### B. Optimal policies via linear programing

Having defined the basic elements of the MDP for two elementary links with entanglement swapping, let us now look at optimal policies. We are concerned both with the figure of merit defined in (245) and with the expected waiting time to obtain an end-to-end link. In Appendix A 4, we show that both quantities can be bounded using linear programs. In fact, the results in Appendix A 4 go beyond the MDP for two elementary links that we consider here because the linear programs apply to general MDPs with arbitrary state and action sets and transition matrices.

**Theorem V.1** (Linear program for the optimal expected value for two elementary links). Given a system of two elementary links, along with the associated MDP defined in Sec. V A, the optimal expected value of the function *f* defined in (245) is bounded from above by the following linear program:

where the optimization is with respect to the $(m_1^{\star}+2)\cdot(m_2^{\star}+2)$-dimensional vectors $|x\rangle$, $|y\rangle$, $|w_a\rangle$, $|v_a\rangle$, $a\in\mathsf{A}$, and the inequality constraints are component-wise. Every set of feasible points $|x\rangle$, $|y\rangle$, $|w_a\rangle$, $|v_a\rangle$, $a\in\mathsf{A}$, of this linear program defines a stationary policy with the decision function *d*, whose values for the transient states $(0,m_1,m_2)$ are as follows:

for all $m_1\in\mathsf{M}_1$, $m_2\in\mathsf{M}_2$, and $a\in\mathsf{A}$. If $\langle 0,m_1,m_2|x\rangle=0$, then we can set $d(0,m_1,m_2)$ to be an arbitrary probability distribution over the set $\mathsf{A}$ of actions.

**Remark V.2**. Note that in the theorem statement above, we defined the action of the decision function only for the transient states. For the absorbing states, we can set the decision function to be arbitrary because neither the expected value of the MDP nor the expected waiting time to absorption is affected by the value of the decision function on absorbing states, see Appendix A 3.

**Theorem V.3** (Linear program for the optimal expected waiting time for two elementary links). Given a system of two elementary links, along with the associated MDP defined in Sec. V A, the optimal expected waiting time is bounded from below by the following linear program:

where the optimization is with respect to the $(m_1^{\star}+2)\cdot(m_2^{\star}+2)$-dimensional vectors $|x\rangle$ and $|w_a\rangle$, $a\in\mathsf{A}$, and the inequality constraints are component-wise. Every set of feasible points $|x\rangle$, $|w_a\rangle$, $a\in\mathsf{A}$, of this linear program defines a stationary policy with decision function *d*, whose values for the transient states $(0,m_1,m_2)$ (see Remark V.2) are as follows:

for all $m_1\in\mathsf{M}_1$, $m_2\in\mathsf{M}_2$, and $a\in\mathsf{A}$. If $\langle 0,m_1,m_2|x\rangle=0$, then we can set $d(0,m_1,m_2)$ to be an arbitrary probability distribution over the set $\mathsf{A}$ of actions.

We now show that the linear program in (260) reproduces the known analytical result in Ref. 59, Eq. (5), for the expected waiting time for two elementary links with the same success probability *p* and cutoff time $t^{\star}$,

In Fig. 15, we plot this function along with the optimal value obtained for the linear program in (260). We find that the two curves coincide for all values of the transmission-heralding probability *p* and the entanglement swapping success probability *q* considered. This provides not only a sanity check on the linear program but also evidence that the memory-cutoff policy in (257) is optimal, at least in the symmetric scenario. We also note that the result in (262) holds only in the symmetric scenario, in which both elementary links have the same transmission-heralding success probability, while the linear program in (260) can be used to determine the optimal expected waiting time in arbitrary parameter regimes.

## VI. SUMMARY AND OUTLOOK

The central topic of this work is the theory of near-term quantum networks: specifically, how to describe them and how to develop protocols for entanglement distribution in practical scenarios with near-term quantum technologies. The goal in this area of research is to develop protocols that can handle multiple-user requests, work for any given network topology, and adapt to changes in topology and attacks to the network infrastructure, with the ultimate goal being the realization of the quantum internet. In this work, we have laid some of the foundations for this research program. The core idea is that Markov decision processes (MDPs) provide a natural setting in which to analyze near-term quantum network protocols. We illustrated this idea by first analyzing the MDP for elementary links introduced in Ref. 58, simplifying its formulation and presenting some new results about it. We then considered the example of satellite-to-ground elementary link generation under the lens of the elementary link MDP. We then showed how the elementary link MDP can be used as part of an overall quantum network protocol. Finally, we provided a first step toward using the MDP formalism for more realistic, larger networks, by providing an MDP for two elementary links. We showed that important figures of merit such as the fidelity of the end-to-end link as well as the expected waiting time for the end-to-end link can be obtained using linear programs.

Moving forward, there are many interesting directions to pursue. The MDPs introduced in this work are not entirely general because they do not model protocols for arbitrary repeater chains and arbitrary networks. Thus, to start with, extending the MDP for two elementary links to repeater chains of arbitrary length is an interesting direction for future work. In this direction, we expect that linear and possibly even semi-definite relaxations of the expected value of the end-to-end link and of the expected waiting time, such as those in Theorem V.1 and Theorem V.3, are going to be crucial in the analysis of longer repeater chains because the size of the MDP (the number of states and actions) will grow exponentially with the number of elementary links.

Going beyond repeater chains to general quantum networks, it is of interest to examine protocols involving multiple cooperating agents. When we say that agents “cooperate,” we mean that they are allowed to communicate with each other. In the context of quantum networks, agents who cooperate have knowledge beyond that of their own nodes. If every agent cooperates with an agent corresponding to a neighboring elementary link, then the agents would have knowledge of the network in their local vicinity, and this would, in principle, improve waiting times and rates for entanglement distribution. Furthermore, the quantum state of the network would not be a simple tensor product of the quantum states corresponding to the individual edges, as we have in (229) when all the agents are independent. See Refs. 128 and 135 for a discussion of nodes with local and global knowledge of a quantum network in the context of routing.

Finally, another interesting direction for future work is to develop quantum network protocols based on decision processes that incorporate queuing models for requests for links of a specific type between specific nodes; see, e.g., Refs. 90 and 142. Then, one can calculate quantities such as the time needed to fulfill all requests. We can also calculate the “capacity” of the network, defined in the context of queuing systems as the maximum number of requests that can be fulfilled per unit time.

## ACKNOWLEDGMENTS

Much of this work is based on the author's Ph.D. thesis research,^{143} which was conducted at the Hearne Institute for Theoretical Physics, Department of Physics and Astronomy, Louisiana State University. During this time, financial support was provided by the National Science Foundation and the National Science and Engineering Research Council of Canada Postgraduate Scholarship. The plots in this work were made using the Python package matplotlib.^{144}

## AUTHOR DECLARATIONS

### Conflict of Interest

The author has no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

### APPENDIX A: OVERVIEW OF MARKOV DECISION PROCESSES

In this section, we provide a brief overview of the concepts from the theory of Markov decision processes (MDPs) that are relevant for this work. We mostly follow the definitions and results as presented in Ref. 145 while using the notation defined in Appendix A 1.

##### 1. Notation

Throughout this work, we deal with probability distributions defined on a discrete, finite set of points. It is very helpful to write these probability distributions as vectors in a (standard) probability simplex. We do this as follows. Consider a finite set $X$. To this set, we associate the orthonormal vectors $\{|x\rangle\}_{x\in X}$ in $\mathbb{R}^{|X|}$, which means that $\langle x|x'\rangle=\delta_{x,x'}$ for all $x,x'\in X$. The probability simplex corresponding to $X$ is then formally defined as all convex combinations of the vectors in $\{|x\rangle\}_{x\in X}$,

$$\Delta_X:=\left\{\sum_{x\in X}p_x|x\rangle:\ p_x\geq 0\ \text{for all}\ x\in X,\ \sum_{x\in X}p_x=1\right\}.$$

This set is in one-to-one correspondence with the set of all probability distributions defined on $X$. Specifically, let $P:X\to[0,1]$ be a probability distribution (probability mass function) on $X$, i.e., $P(x)\in[0,1]$ for all $x\in X$ and $\sum_{x\in X}P(x)=1$. The unique probability vector $|P\rangle_X\in\Delta_X$ corresponding to *P* is

$$|P\rangle_X:=\sum_{x\in X}P(x)|x\rangle.$$

We drop the subscript $X$ from $|P\rangle_X$ whenever the underlying set $X$ is clear from the context. It is important to emphasize that the vector $|P\rangle$ does not represent a quantum state; the braket notation is used merely for convenience. Normalization of the probability vector is then captured by defining the following vector:

$$|\gamma_X\rangle:=\sum_{x\in X}|x\rangle.$$

We often omit the subscript $X$ in $|\gamma_X\rangle$ when the underlying set $X$ is clear from the context. Then

$$\langle\gamma|P\rangle=\sum_{x\in X}P(x)=1.$$

It is often the case that a probability distribution is associated with a random variable *X* taking values in $X$, so that $P(x)\equiv P_X(x)=\Pr[X=x]$ for all $x\in X$. In this case, for brevity, we sometimes write the probability vector as

Now, consider another random variable *Y* taking values in the finite set $Y$. We regard stochastic matrices mapping *X* to *Y* (i.e., matrices of conditional probabilities $\Pr[Y=y|X=x]$) as linear operators with domain $\Delta_X$ and codomain $\Delta_Y$,

and we denote the matrix elements by

We then have, by definition of a stochastic matrix,

which captures the fact that the columns of a stochastic matrix sum to one. If $|P_X\rangle\in\Delta_X$ is a probability distribution corresponding to *X*, then the action of the matrix $T_{Y|X}$ on $|P_X\rangle$, which results in the probability distribution $|P_Y\rangle\in\Delta_Y$ corresponding to *Y*, can be written as

In particular, for all $y\in Y$

Finally, we discuss joint probability distributions. Consider two finite sets $X$ and $Y$ and the set $\Delta_{X\times Y}\subset\mathbb{R}^{|X\times Y|}$ of all (joint) probability distributions on $X\times Y$. Now, because $\mathbb{R}^{|X\times Y|}\cong\mathbb{R}^{|X|}\otimes\mathbb{R}^{|Y|}$, we can regard $\Delta_{X\times Y}$ as the convex span (convex hull) of tensor product orthonormal vectors $|x\rangle\otimes|y\rangle,\,x\in X,\,y\in Y$. Thus, every $|Q\rangle_{XY}\in\Delta_{X\times Y}$ can be written as

We frequently use the abbreviation $|x,y\rangle\equiv|x\rangle\otimes|y\rangle$ in this paper. Then, marginal distributions can be obtained as follows:

where

These concepts for probability distributions defined on two sets can be readily extended to probability distributions defined on sets of the form $X_1\times X_2\times\dotsb\times X_n$ for all $n\geq 2$.
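The notation above maps directly onto arrays. The following is a minimal NumPy sketch, assuming small made-up sets ($|X|=3$, $|Y|=2$) and illustrative numbers that do not come from this work. It checks normalization via $\langle\gamma|P\rangle=1$, applies a column-stochastic matrix to a probability vector, and recovers marginals of a joint distribution by contracting one tensor factor with the all-ones vector, which is one standard way to express marginalization in this notation.

```python
import numpy as np

# Hypothetical example: X has 3 elements, Y has 2 elements.
P_X = np.array([0.5, 0.3, 0.2])   # probability vector |P_X>
gamma_X = np.ones(3)              # |gamma_X> = sum_x |x>
gamma_Y = np.ones(2)              # |gamma_Y> = sum_y |y>

# Column-stochastic matrix: entry [y, x] = Pr[Y = y | X = x].
T_YX = np.array([[0.9, 0.4, 0.0],
                 [0.1, 0.6, 1.0]])

# Normalization <gamma|P> = 1, and unit column sums of the stochastic matrix.
norm_P = gamma_X @ P_X
col_sums = gamma_Y @ T_YX

# Action of the stochastic matrix on a probability vector: |P_Y> = T_{Y|X}|P_X>.
P_Y = T_YX @ P_X

# Joint distribution Q(x, y) = Pr[Y = y | X = x] P_X(x), stored as a vector
# in R^|X| (tensor) R^|Y| with basis vectors |x, y> = |x> (tensor) |y>.
Q = np.zeros(6)
for x in range(3):
    for y in range(2):
        Q += T_YX[y, x] * P_X[x] * np.kron(np.eye(3)[x], np.eye(2)[y])

# Marginals: contract the other tensor factor with the all-ones vector.
Q_X = np.kron(np.eye(3), gamma_Y) @ Q   # sums over y
Q_Y = np.kron(gamma_X, np.eye(2)) @ Q   # sums over x

print(P_Y, Q_X, Q_Y)
```

In this sketch the marginal over $Y$ reproduces `P_X` and the marginal over $X$ reproduces `P_Y`, as expected for a joint distribution built from a prior and a conditional.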

##### 2. Definitions

A *Markov decision process (MDP)* is a stochastic process that models the evolution of a system with which an agent is allowed to interact. Formally, an MDP is defined as a collection,

consisting of the following elements:

- A set $S$ of the allowed *states* of the system. We consider finite state sets throughout this work. The sequence $(S(t):t\in\mathbb{N})$ of random variables taking values in $S$ describes the state of the system at all times $t\in\mathbb{N}$.
- A set $A$ of *actions* that the agent is allowed to perform on the system. We consider finite action sets throughout this work. The sequence $(A(t):t\in\mathbb{N})$ of random variables taking values in $A$ describes the action taken by the agent at all times $t\in\mathbb{N}$.
- A set $\{T^a\}_{a\in A}$ of *transition matrices*, which are stochastic matrices with domain $\Delta_S$ and codomain $\Delta_S$. Specifically,

  $$T^a=\sum_{s,s'\in S}\Pr[S(t+1)=s'|S(t)=s,A(t)=a]\,|s'\rangle\langle s|, \tag{A18}$$

  for all $t\in\mathbb{N}$. These matrices determine how the system evolves from one time to the next conditioned on the actions of the agent.
- A function $r:S\times A\to\mathbb{R}$ that quantifies the *reward* that the agent receives at every time step based on the current state of the system and the action that it takes.

The *history* up to time $t\in\mathbb{N}$ of an MDP is the random sequence $H(t):=(S(1),A(1),\dotsc,A(t-1),S(t))$, with $H(1)=S(1)$. By the Markovian nature of an MDP, the probability distribution of every history $h^t=(s_1,a_1,\dotsc,a_{t-1},s_t)$ is equal to

$$\Pr[H(t)=h^t]=\Pr[S(1)=s_1]\prod_{j=1}^{t-1}\Pr[S(j+1)=s_{j+1}|S(j)=s_j,A(j)=a_j]\,d_j(s_j)(a_j),$$

where

$$d_j(s_j)(a_j):=\Pr[A(j)=a_j|S(j)=s_j] \tag{A20}$$

is the probability distribution of actions at time *j* conditioned on the current state of the system. We refer to $d_j:S\times A\to[0,1]$ as a *decision function*. Note that $\sum_{a\in A}d_j(s)(a)=1$ for all $s\in S$. The sequence

$$\pi:=(d_1,d_2,\dotsc)$$

of decision functions at all times $t\in\mathbb{N}$ is known as a *policy* of the agent. In the context of this work, policies should be thought of as synonymous with *protocols* for quantum networks.
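To make the definitions of history, decision function, and policy concrete, the following is a minimal sketch of sampling a history $(s_1,a_1,\dotsc,a_{t-1},s_t)$ from an MDP. The function name `sample_history`, the two-state, two-action MDP, and all numerical values are hypothetical choices for illustration and do not correspond to any MDP defined in this work. Transition matrices are taken to be column-stochastic, as in (A18).

```python
import numpy as np

def sample_history(T, policy, p1, t, rng):
    """Sample a history (s_1, a_1, ..., a_{t-1}, s_t) of an MDP.

    T[a][s_next, s] = Pr[S(j+1)=s_next | S(j)=s, A(j)=a] (column-stochastic),
    policy(j) returns an array d_j with d_j[s, a] = Pr[A(j)=a | S(j)=s],
    p1[s] = Pr[S(1)=s] is the initial distribution.
    """
    n_states = len(p1)
    s = int(rng.choice(n_states, p=p1))
    history = [s]
    for j in range(1, t):
        d_j = policy(j)
        # Draw the action from the decision function, then the next state
        # from the column of T^a indexed by the current state.
        a = int(rng.choice(d_j.shape[1], p=d_j[s]))
        s = int(rng.choice(n_states, p=T[a][:, s]))
        history += [a, s]
    return history

# Hypothetical MDP: state 0 is transient, state 1 is absorbing.
T = [np.array([[0.7, 0.0], [0.3, 1.0]]),   # transition matrix for action 0
     np.array([[0.2, 0.0], [0.8, 1.0]])]   # transition matrix for action 1
d = np.array([[0.25, 0.75],                # decision function in state 0
              [1.0, 0.0]])                 # decision function in state 1

def stationary_policy(j):
    # A stationary policy uses the same decision function at every time.
    return d

rng = np.random.default_rng(seed=0)
history = sample_history(T, stationary_policy, np.array([1.0, 0.0]), t=6, rng=rng)
print(history)
```

A history up to time $t$ contains $2t-1$ entries, alternating states and actions. Because state 1 is absorbing in this example, every sampled history stays in state 1 once it gets there.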

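Given a decision function *d*, we define the following linear operators acting on $\Delta_S$:

$$D_a^d:=\sum_{s\in S}d(s)(a)\,|s\rangle\langle s|\quad\text{for all}\ a\in A. \tag{A22}$$

Then, it is straightforward to show that the linear operator

$$P^d:=\sum_{a\in A}T^aD_a^d \tag{A23}$$

from $\Delta_S$ to $\Delta_S$ is a stochastic matrix with elements

$$\langle s'|P^d|s\rangle=\sum_{a\in A}\Pr[S(t+1)=s'|S(t)=s,A(t)=a]\,d(s)(a)$$

for all $t\in\mathbb{N}$ and all $s,s'\in S$.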

**Remark A.1.** Observe that for a fixed decision function *d*, the set $\{D_a^d\}_{a\in A}$ of linear operators defined in (A22) forms a positive operator-valued measure (POVM). Indeed, by definition, all of the operators are positive semidefinite; furthermore, by definition of the decision function in (A20),

$$\sum_{a\in A}D_a^d=\mathbb{1}_S.$$

The transition matrices $P^d$ as defined in (A23) allow us to determine the probability distribution of the state of the system at every time $t\in\mathbb{N}$ for a given policy. Specifically, for a policy $\pi=(d_1,d_2,\dotsc)$

where

is the probability distribution for the system at the initial time $t=1$.
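The construction of $P^d$ from a decision function, and its use to propagate the state distribution, can be sketched numerically as follows. This is an illustration under stated assumptions: the helper names `decision_operators` and `policy_transition_matrix` and the two-state, two-action MDP are hypothetical, the operators $D_a^d$ are taken to be diagonal with entries $d(s)(a)$, and the policy is stationary, so the same matrix is applied at every step.

```python
import numpy as np

def decision_operators(d):
    """Return the diagonal operators D_a^d, one per action a.

    d[s, a] is the probability of taking action a in state s.
    """
    return [np.diag(d[:, a]) for a in range(d.shape[1])]

def policy_transition_matrix(T, d):
    """Return P^d = sum_a T^a D_a^d, with each T[a] column-stochastic."""
    return sum(T_a @ D_a for T_a, D_a in zip(T, decision_operators(d)))

# Hypothetical MDP with two states and two actions (illustrative numbers).
T = [np.array([[0.7, 0.0], [0.3, 1.0]]),   # transition matrix for action 0
     np.array([[0.2, 0.0], [0.8, 1.0]])]   # transition matrix for action 1
d = np.array([[0.25, 0.75],                # decision function in state 0
              [1.0, 0.0]])                 # decision function in state 1

D = decision_operators(d)
P_d = policy_transition_matrix(T, d)

# Propagate the initial distribution through three time steps
# under the stationary policy (d, d, d, ...).
p = np.array([1.0, 0.0])
for _ in range(3):
    p = P_d @ p

print(P_d)
print(p)
```

The operators in `D` sum to the identity, which is the POVM property noted in Remark A.1, and each column of `P_d` sums to one, so `P_d` is again a stochastic matrix.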

###### a. MDPs with absorbing states

We call a state $s\in S$ *absorbing* if $T^a|s\rangle=|s\rangle$ for all $a\in A$. In other words, once the system reaches the state *s*, it always stays there, meaning that $P^d|s\rangle=|s\rangle$ for all decision functions *d*. Every state that is not absorbing is called *transient* if there is non-zero probability that, starting from such a state, the system will eventually reach an absorbing state. We can partition the set $S$ of all states into disjoint sets: $S=S_{\text{tra}}\cup S_{\text{abs}}$, where $S_{\text{abs}}$ is the set of absorbing states and $S_{\text{tra}}$ is the set of transient states. We can then rewrite the set $\{|s\rangle\}_{s\in S}$ as $\{|0,s\rangle\}_{s\in S_{\text{tra}}}\cup\{|1,s\rangle\}_{s\in S_{\text{abs}}}$, leading to the following block structure for the transition matrices $T^a$:

where $T^a_{0\to 0}$ is the block describing transitions between transient states, $T^a_{1\to 0}$ is the block describing transitions from an absorbing state to a transient state, $T^a_{0\to 1}$ is the block describing transitions from a transient state to an absorbing state, and $T^a_{1\to 1}$ is the block describing transitions between absorbing states. Note that by our definition of an absorbing state, $T^a_{1\to 0}=0$ and $T^a_{1\to 1}=\mathbb{1}_{S_{\text{abs}}}$ for all $a\in A$. Similarly, for a decision function *d*, we can write the matrices $D_a^d,\,a\in A$, in block form as

Consequently, the transition matrix $P^d$ in (A23) has the following form:

where

##### 3. Figures of merit

While the primary figure of merit in a Markov decision process is the expected reward, in this work, we are mostly interested in what we call *functions of state* (such as the fidelity) and the absorption time (corresponding to the waiting time for a virtual link).
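As a rough numerical illustration of the absorption time, the sketch below uses the textbook formula for absorbing Markov chains: if $Q$ denotes the transient-to-transient block of a column-stochastic transition matrix, then the expected number of steps before absorption, starting from a distribution $p_0$ on the transient states, is $\langle\gamma|(\mathbb{1}-Q)^{-1}|p_0\rangle$. This is a standard result and not necessarily the expression or the time-indexing convention used in this work. The function name `expected_absorption_time` and the three-state chain are hypothetical.

```python
import numpy as np

def expected_absorption_time(P, transient, p0):
    """Expected number of steps until absorption for a column-stochastic P.

    transient: list of indices of the transient states,
    p0: initial probability vector over all states.
    """
    Q = P[np.ix_(transient, transient)]                # transient -> transient block
    N = np.linalg.inv(np.eye(len(transient)) - Q)      # fundamental matrix
    return float(np.ones(len(transient)) @ N @ p0[transient])

# Hypothetical chain under a fixed stationary policy:
# states 0 and 1 are transient, state 2 is absorbing.
P = np.array([[0.5, 0.0, 0.0],
              [0.3, 0.6, 0.0],
              [0.2, 0.4, 1.0]])

t_from_0 = expected_absorption_time(P, [0, 1], np.array([1.0, 0.0, 0.0]))
t_from_1 = expected_absorption_time(P, [0, 1], np.array([0.0, 1.0, 0.0]))
print(t_from_0, t_from_1)
```

Starting from state 1, the chain leaves with probability 0.4 per step, so the expected time is $1/0.4=2.5$. Starting from state 0, solving $t_0=1+0.5\,t_0+0.3\,t_1$ gives $t_0=3.5$, which matches the matrix computation.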

*Functions of state.* In this work, we are also interested in functions $f:S\to\mathbb{R}$ of the state of the system. We can associate to such functions the vector

Then, for a policy $\pi=(d_1,d_2,\dotsc)$, we are interested in the expected value of the random variable $f(S(t))$ for all $t\in\mathbb{N}$, i.e., the quantity