The quantum internet is one of the frontiers of quantum information science. It will revolutionize the way we communicate and do other tasks, and it will allow for tasks that are not possible using the current, classical internet. The backbone of a quantum internet is entanglement distributed globally in order to allow for such novel applications to be performed over long distances. Experimental progress is currently being made to realize quantum networks on a small scale, but much theoretical work is still needed in order to understand how best to distribute entanglement, especially with the limitations of near-term quantum technologies taken into account. This work provides an initial step toward this goal. In this work, we lay out a theory of near-term quantum networks based on Markov decision processes (MDPs), and we show that MDPs provide a precise and systematic mathematical framework to model protocols for near-term quantum networks that is agnostic to the specific implementation platform. We start by simplifying the MDP for elementary links introduced in prior work and by providing new results on policies for elementary links in the steady-state (infinite-time) limit. Then, we show how the elementary link MDP can be used to analyze a complete quantum network protocol. We then provide an extension of the MDP formalism to two elementary links. Here, as new results, we derive linear programing relaxations that allow us to obtain optimal steady-state policies with respect to the expected fidelity and waiting time of the end-to-end link.
I. INTRODUCTION
The quantum internet1–5 is envisioned to be a global-scale interconnected network of devices which exploits the uniquely quantum-mechanical phenomenon of entanglement. By operating in tandem with today's Internet, it will allow people all over the world to perform quantum communication tasks, such as quantum key distribution (QKD),6–11 quantum teleportation,12–14 quantum clock synchronization,15–18 distributed quantum computation,19 and distributed quantum metrology and sensing.20–22 A quantum internet will also allow for exploring fundamental physics23 and for forming an international standard time.24 Quantum teleportation and QKD are perhaps the primary use cases of the quantum internet in the near term. In fact, there are several metropolitan-scale QKD systems already in place.25–32
Scaling up beyond the metropolitan level toward a global-scale quantum internet is a major challenge. All of the aforementioned tasks require the use of shared entanglement between distant locations on the Earth, which typically has to be distributed using single-photonic qubits sent through either the atmosphere or optical fibers. It is well known that optical signals transmitted through either the atmosphere or optical fibers undergo an exponential decrease in the transmission success probability with distance,33–35 limiting direct transmission distances to roughly hundreds of kilometers. Therefore, one of the central research questions in the theory of quantum networks is how to overcome this exponential loss, thus distributing entanglement over long distances efficiently and at high rates.
A quantum network can be modeled as a graph $G = (V, E)$, where the vertices in $V$ represent the nodes in the network and the edges in $E$ represent quantum channels connecting the nodes (see Fig. 1). Then, the task of entanglement distribution is to transform elementary links, i.e., entanglement shared by neighboring nodes, to virtual links, i.e., entanglement between distant nodes (see the right-most panel of Fig. 1). In this context, nodes that are not part of the virtual links to be created can act as quantum repeaters, i.e., helper nodes whose purpose is to mitigate the effects of loss and noise along a path connecting the end nodes, thereby making the quantum information transmission more reliable. Specifically, quantum repeaters perform entanglement distillation36–38 (or some other form of quantum error correction), entanglement swapping,12,39 and possibly some form of routing, in order to create the desired virtual links. Protocols for entanglement distribution in quantum networks have been described from an information-theoretic perspective in Refs. 40–47, and limits on communication in quantum networks have been explored in Refs. 40–54. Linear programs, and other techniques for obtaining optimal entanglement distribution rates in a quantum network, have been explored in Refs. 53 and 55–57. However, information-theoretic analyses are agnostic to physical implementations, and generally speaking, the protocols and the rates derived apply in an idealized scenario in which quantum memories have high coherence times, and quantum gate operations have no error.
FIG. 1. Graphical depiction of a quantum network and entanglement distribution. (Left) The physical layout of the quantum network is described by a hypergraph $G$, which should be thought of as fixed, in which the vertices represent the nodes (senders and receivers) in the network and the (hyper)edges represent quantum channels that are used to distribute entangled states (elementary links) shared by the corresponding nodes. (Center) At any point in time, only a certain number of elementary links in the network may be active. By “active,” we mean that an entangled state has been distributed successfully to the nodes and the corresponding quantum systems stored in the respective quantum memories. Active bipartite links are indicated by a red line, and active $k$-partite elementary links, $k \geq 3$, corresponding to the hyperedges are indicated by a blue bubble. (Right) An entanglement distribution protocol transforms elementary links to virtual links, which are indicated in orange, thus leading to a new graph for the network. The protocol is described mathematically by an LOCC channel.
What are the fundamental limitations on near-term quantum networks? Such quantum networks are characterized by the following elements:
Small number of nodes;
Imperfect sources of entanglement;
Non-deterministic elementary link generation and entanglement swapping;
Imperfect measurements and gate operations;
Quantum memories with short coherence times;
No (or limited) entanglement distillation/error correction.
A theoretical framework taking these practical limitations into account would act as a bridge between statements about what can be achieved in principle (which can be answered using information-theoretic methods) and statements that are directly useful for the purpose of implementation. The purpose of this work is to present the initial elements of such a theory of near-term quantum networks.
The main contribution of this work is to frame quantum network protocols in terms of Markov decision processes (MDPs) and to place the Markov decision process for elementary links introduced in Ref. 58 within an overall quantum network protocol. More specifically, the contributions of this work are as follows:
In Sec. II, we start by recapping the model for elementary link generation presented in Ref. 58. Along the way, we present Lemma II.1. While the result of Lemma II.1 is generally well known, to the best of our knowledge, its proof is not readily accessible, and thus, we provide the proof here. Then, as a new contribution, we show that the Markov decision process (MDP) for elementary links introduced in Ref. 58 can be written in a simpler manner in terms of different variables. Furthermore, we emphasize that the figure of merit associated with the MDP, as introduced in Ref. 58, takes into account both the fidelity of the elementary link and its success probability. To the best of our knowledge, such a figure of merit has not been considered in prior work. The simplified form of the MDP allows us to derive two new results. The first new result is Theorem II.4, which gives us an analytic expression for the steady-state value of an elementary link undergoing an arbitrary time-homogeneous policy. The second new result is Theorem II.5, which allows us to determine the optimal steady-state value of the elementary link using a linear program. We demonstrate the usefulness of the MDP approach to modeling elementary links in Sec. II D, in which we provide an extended example of elementary links generated via satellite-to-ground transmission.
In Sec. III, we describe entanglement distillation protocols and protocols for joining elementary links (in order to create virtual links) in general terms as local operations and classical communication (LOCC) quantum instrument channels. We then present three joining protocols and write them down explicitly as LOCC channels. Doing so allows us to determine the output state of the protocol for any set of input states, including input states that are noisy as a result of device imperfections, etc. This, in turn, allows us to compute the fidelity of the output state with respect to the ideal target state that would be obtained if the input states were ideal. Formulas for the fidelity at the output of the protocols are presented as Proposition III.1, Proposition III.2, and Proposition III.3. In particular, Proposition III.1 provides a formula for the fidelity at the output of the usual entanglement swapping protocol, which, to the best of our knowledge, is not explicitly found in prior works. Prior works typically use (as an approximation) the product of the individual elementary link fidelities in order to obtain the fidelity after entanglement swapping.
In Sec. IV, we present a quantum network protocol that combines the Markov decision process for elementary links with known routing and path-finding algorithms. Then, we provide a general method for determining waiting times and key rates for quantum key distribution under this protocol.
In Sec. V, we provide a first step toward extending the elementary link MDP by defining an MDP for two elementary links with entanglement swapping. We then show how to approximate waiting times using a linear program, and we find that this linear programing approximation reproduces exactly the known analytic results on the waiting time for such a scenario.59 However, our result is more general, allowing us to compute waiting times for arbitrary parameter regimes, while the analytic results are true only for restricted parameter regimes. Broadly speaking, having linear-programing approximations to the waiting time and other important quantities of interest (such as fidelity) will be important when considering MDPs for larger networks.
This work is one in a long line of works on quantum repeaters that take device imperfections and noise into account, beginning with the initial theoretical proposal60,61 and continuing with a vast body of subsequent work.56,57,59,62–91 (See also Refs. 92–95 and the references therein.) All of these proposals deal almost exclusively with a single transmission line connecting a sender and a receiver. However, for a quantum internet, we need to go beyond a single transmission line, and we need to consider multiple transmission lines operating in parallel. A unified and self-consistent theoretical framework will help to guide real-world implementations. It is our hope that this work provides a good starting point along this line of thought and leads to a better understanding of how realistic, near-term quantum devices could be used to realize large-scale quantum networks and, eventually, a global-scale quantum internet.
II. A MARKOV DECISION PROCESS FOR ELEMENTARY LINKS
We start by presenting a Markov decision process (MDP) for elementary links, as introduced in Ref. 58. To be specific, this is an MDP for an arbitrary edge of the graph corresponding to a quantum network. However, unlike Ref. 58, we present the MDP in much simpler terms in which we need not explicitly keep track of the quantum state. Through this simplification, we are able to establish a new result, Theorem II.4, which gives us the steady-state fidelity of an elementary link undergoing an arbitrary time-homogeneous (stationary) policy. We start by describing the physical model of elementary link generation, considering two specific examples of transmission channels. Then, we define the MDP corresponding to this model of elementary link generation.
A. Generating elementary links
Our model for elementary link generation is the one considered in Ref. 58 and illustrated in Fig. 2, based on the same model considered in prior work.76,96–98 Consider an arbitrary physical link in the network. For every such physical link, there is a source station that prepares and distributes an entangled state to the corresponding nodes. In general, all of these source stations operate independently of each other, distributing entangled states as they are requested. Specifically, we have the following.
- The source produces a $k$-partite quantum state $\rho^S$, $k \geq 2$, and sends it to the nodes via a quantum channel $\mathcal{S}$, leading to the state $\rho^{S,\text{out}} := \mathcal{S}(\rho^S)$. Here, $k$ is the number of nodes belonging to an edge, with $k = 2$ corresponding to ordinary, bipartite edges (such as the red edges in Fig. 1) and $k \geq 3$ corresponding to hyperedges (such as the blue bubbles in Fig. 1).
- The nodes perform a heralding procedure, which is a protocol involving local operations and classical communication. It can be described by a quantum instrument $\{\mathcal{M}^0, \mathcal{M}^1\}$, where $\mathcal{M}^0$ and $\mathcal{M}^1$ are completely positive trace non-increasing maps such that $\mathcal{M}^0 + \mathcal{M}^1$ is trace preserving. These maps capture not only the probabilistic nature of the heralding procedure but also the various imperfections of the devices that are used to perform the procedure. The map $\mathcal{M}^0$ corresponds to failure of heralding and $\mathcal{M}^1$ corresponds to success. The probability of successful transmission and heralding is
$$p := \mathrm{Tr}\!\left[\mathcal{M}^1(\rho^{S,\text{out}})\right], \tag{1}$$
and the states conditioned on success and failure are, respectively,
$$\sigma^0 := \frac{1}{p}\, \mathcal{M}^1(\rho^{S,\text{out}}), \tag{2}$$
$$\tau^{\varnothing} := \frac{1}{1-p}\, \mathcal{M}^0(\rho^{S,\text{out}}). \tag{3}$$
The superscript “0” in $\sigma^0$ indicates that, upon success of the heralding procedure, the quantum systems have been immediately stored in local quantum memories at the nodes and have not yet suffered from any decoherence.
- The state of the quantum systems after $m \geq 0$ time steps in the quantum memories is given by
$$\sigma(m) := \mathcal{N}^{\circ m}(\sigma^0), \tag{4}$$
where $\mathcal{N}$ is a quantum channel that describes the decoherence of the individual quantum memories at the nodes and $\mathcal{N}^{\circ m}$ denotes $m$ successive applications of $\mathcal{N}$.
FIG. 2. Our model for elementary link generation in a quantum network consists of source stations, one associated with every elementary link, that distribute entangled states to the corresponding nodes.76,96–98
1. Ground-based transmission
The most common medium for quantum information transmission for communication purposes is photons traveling through either free space or fiber-optic cables. These transmission media are modeled well by a bosonic pure-loss/attenuation channel $\mathcal{L}^{(\eta)}$,105 where $\eta \in (0, 1]$ is the transmittance of the medium, which, for fiber-optic or free-space transmission, has the form $\eta = e^{-L/L_0}$,33–35 where $L$ is the transmission distance and $L_0$ is the attenuation length of the fiber.
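To give a feel for the exponential loss just described, the following minimal Python sketch evaluates $\eta = e^{-L/L_0}$ and the equivalent loss in decibels. The function names and the default attenuation length of 22 km (roughly what a fiber loss of 0.2 dB/km corresponds to) are illustrative assumptions, not values taken from this work.

```python
import math


def fiber_transmittance(distance_km: float, attenuation_length_km: float = 22.0) -> float:
    """Transmittance eta = exp(-L / L0) of a lossy medium.

    The default L0 = 22 km is an assumed, illustrative value
    (approximately 0.2 dB/km); replace it with the value for your medium.
    """
    if distance_km < 0 or attenuation_length_km <= 0:
        raise ValueError("distance must be >= 0 and attenuation length > 0")
    return math.exp(-distance_km / attenuation_length_km)


def loss_in_db(eta: float) -> float:
    """Convert a transmittance 0 < eta <= 1 to a loss in decibels."""
    return -10.0 * math.log10(eta)


if __name__ == "__main__":
    for distance in (10, 50, 100, 500):
        eta = fiber_transmittance(distance)
        print(f"L = {distance:4d} km: eta = {eta:.3e}, loss = {loss_in_db(eta):.1f} dB")
```

Under these assumed parameters, the transmittance falls by many orders of magnitude between tens and hundreds of kilometers, which is the scaling problem that quantum repeaters are meant to address.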
Before the $k$ quantum systems corresponding to the source state $\rho^S$ are transmitted through the pure-loss channel, they are each encoded into $d$ bosonic modes with $d \geq 2$. A simple encoding is the following:
$$|0_d\rangle := |1, 0, 0, \dotsc, 0\rangle, \tag{5}$$
$$|1_d\rangle := |0, 1, 0, \dotsc, 0\rangle, \tag{6}$$
$$\vdots$$
$$|(d-1)_d\rangle := |0, 0, \dotsc, 0, 1\rangle, \tag{7}$$
sometimes called the $d$-rail encoding. In other words, using $d$ bosonic modes, we form a qudit quantum system by defining the standard basis elements of the associated Hilbert space by the states corresponding to a single photon in each of the $d$ modes. We let
$$|\mathrm{vac}\rangle := |0, 0, \dotsc, 0\rangle \tag{8}$$
denote the vacuum state of the $d$ modes, which is the state containing no photons.
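In the context of photonic state transmission, the source state $\rho^S$ is typically of the form $\rho^S = |\psi^S\rangle\langle\psi^S|$, where
$$|\psi^S\rangle = \sum_{n=0}^{\infty} \sqrt{p_n}\, |\psi_n^S\rangle, \tag{9}$$
where $|\psi_n^S\rangle$ is a state vector with $n$ photons in total for each of the $k$ parties, and the numbers $p_n \geq 0$ are probabilities, so that $\sum_{n=0}^{\infty} p_n = 1$. For example, in the case $k = 2$ and $d = 2$, the following source state is generated from a parametric down-conversion process (see, e.g., Refs. 106 and 107):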
where r and q are parameters characterizing the process. One often considers a truncated version of this state as an approximation, so that107
where .
Typically, the encoding into bosonic modes is not perfect, which means that a source state of the form (9) is not ideal, and that the desired state is given by one of the state vectors , and the other terms arise due to the naturally imperfect nature of the source. For example, for the state in (12), the desired bipartite state is the maximally entangled state
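Once the source state is prepared, each mode is sent through the pure-loss channel. Letting
$$\mathcal{L}_d^{(\eta)} := \left(\mathcal{L}^{(\eta)}\right)^{\otimes d} \tag{14}$$
denote the quantum channel that acts on the $d$ modes of each of the $k$ systems, the overall quantum channel through which the source state $\rho^S$ is sent is
$$\mathcal{L}_d^{(\vec{\eta})} := \mathcal{L}_d^{(\eta_1)} \otimes \mathcal{L}_d^{(\eta_2)} \otimes \dotsb \otimes \mathcal{L}_d^{(\eta_k)}, \tag{15}$$
where $\vec{\eta} = (\eta_1, \eta_2, \dotsc, \eta_k)$ and $\eta_j$ is the transmittance of the medium to the $j$th node in the edge. The quantum state shared by the $k$ nodes after transmission from the source is then $\rho^{S,\text{out}} = \mathcal{L}_d^{(\vec{\eta})}(\rho^S)$.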
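Now, it is known (see, e.g., Ref. 108) that the action of the bosonic pure-loss channel on any linear operator $\sigma_d$ encoded in $d$ modes according to the encoding in (5)–(7) is equivalent to the output of an erasure channel.109,110 In general, a $d$-dimensional quantum erasure channel $\mathcal{E}_d^{(p)}$, with $p \in [0, 1]$, is defined as follows. Consider the vector space $\mathbb{C}^d$ with orthonormal basis elements $\{|0\rangle, |1\rangle, \dotsc, |d-1\rangle\}$ and the vector space $\mathbb{C}^{d+1}$ with orthonormal basis elements $\{|0\rangle, |1\rangle, \dotsc, |d-1\rangle, |d\rangle\}$. Then, for every linear operator $X$ acting on $\mathbb{C}^d$, $\mathcal{E}_d^{(p)}(X) := pX + (1-p)\mathrm{Tr}[X]\, |d\rangle\langle d|$. Note that the output is a linear operator acting on $\mathbb{C}^{d+1}$. In particular, note that the vector $|d\rangle$ is orthogonal to the input vector space $\mathbb{C}^d$.
Lemma II.1 (Pure-loss channel with a d-rail encoding108). Let $\eta \in (0, 1]$. For every linear operator $X$ acting on a $d$-dimensional Hilbert space defined by the basis elements in (5)–(7), we have that
$$\mathcal{L}_d^{(\eta)}(X) = \eta X + (1 - \eta)\, \mathrm{Tr}[X]\, |\mathrm{vac}\rangle\langle\mathrm{vac}|. \tag{16}$$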
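Proof. To start, the bosonic pure-loss channel has the following Kraus representation:111,112
$$\mathcal{L}^{(\eta)}(\rho) = \sum_{\ell=0}^{\infty} K_\ell\, \rho\, K_\ell^{\dagger}, \tag{17}$$
$$K_\ell := \sqrt{\frac{(1-\eta)^{\ell}}{\ell!}}\; \eta^{\frac{1}{2} a^{\dagger} a}\, a^{\ell}, \tag{18}$$
where $a$ and $a^{\dagger}$ are the annihilation and creation operators of the bosonic mode, respectively, which are defined as $a|n\rangle = \sqrt{n}\, |n-1\rangle$ for all $n \geq 1$ (with $a|0\rangle = 0$) and $a^{\dagger}|n\rangle = \sqrt{n+1}\, |n+1\rangle$ for all $n \geq 0$.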
Now, every linear operator X acting on a d-dimensional space that is encoded into d bosonic modes as in (5)–(7) can be written as
for . Using (18), it is straightforward to show that
Using this, we find that
Therefore,
as required. □
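The statement of Lemma II.1 can be checked numerically in the simplest case $d = 2$. The sketch below is not the proof above; it is a small, self-contained consistency check under the following assumptions: each mode is truncated to at most one photon (which is exact here because a dual-rail input contains a single photon and loss never adds photons), the per-mode Kraus operators are the standard "no photon lost" and "one photon lost" operators of the pure-loss channel, and the input operator $X$ is real, so the adjoint reduces to a transpose. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def kron(a, b):
    """Kronecker product; row (i, k) is stored at index i * len(b) + k."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


# Two-mode Fock basis |n1, n2> with n1, n2 in {0, 1}, stored at index 2 * n1 + n2.
VACUUM = 0          # |0, 0>
DUAL_RAIL = [2, 1]  # logical |0> = |1, 0>, logical |1> = |0, 1>


def embed(x, scale=1.0):
    """Place a real 2x2 operator on the single-photon (dual-rail) subspace."""
    rho = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            rho[DUAL_RAIL[i]][DUAL_RAIL[j]] = scale * x[i][j]
    return rho


def pure_loss_dual_rail(x, eta):
    """Apply the pure-loss channel with transmittance eta to both modes."""
    k0 = [[1.0, 0.0], [0.0, math.sqrt(eta)]]        # no photon lost
    k1 = [[0.0, math.sqrt(1.0 - eta)], [0.0, 0.0]]  # one photon lost
    rho = embed(x)
    out = [[0.0] * 4 for _ in range(4)]
    for ka in (k0, k1):
        for kb in (k0, k1):
            k = kron(ka, kb)
            term = matmul(matmul(k, rho), transpose(k))
            out = [[out[r][c] + term[r][c] for c in range(4)] for r in range(4)]
    return out


def erasure_form(x, eta):
    """Right-hand side of the lemma: eta * X + (1 - eta) * Tr[X] |vac><vac|."""
    out = embed(x, scale=eta)
    out[VACUUM][VACUUM] = (1.0 - eta) * (x[0][0] + x[1][1])
    return out


def max_abs_difference(a, b):
    return max(abs(a[r][c] - b[r][c]) for r in range(len(a)) for c in range(len(a[0])))


if __name__ == "__main__":
    x = [[0.5, 0.5], [0.5, 0.5]]  # the state |+><+| in the dual-rail basis
    print(max_abs_difference(pure_loss_dual_rail(x, 0.3), erasure_form(x, 0.3)))
```

The two sides agree up to floating-point rounding for any real $2 \times 2$ input, which is what the lemma predicts for $d = 2$.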
After transmission from the source to the nodes, the heralding procedure typically involves doing measurements at the nodes to check whether all of the photons arrived. In the ideal case, the quantum instrument for the heralding procedure corresponds simply to a measurement in the single-photon subspace defined by (5)–(7). To be specific, let
where is the projection onto the d-dimensional single-photon subspace defined by (5)–(7) and is the identity operator of the full Hilbert space of the d bosonic modes. Then, letting and defining
the maps and have the following form:
These maps correspond to perfect photon-number-resolving detectors. However, the detectors are typically noisy due to dark counts and other imperfections (see, e.g., Ref. 107), so that in practice, the heralding maps $\mathcal{M}^0$ (failure) and $\mathcal{M}^1$ (success) will not have the ideal forms presented in (31) and (32).
Let
Then, if the source produces the ideal quantum state, such as the state in (13), so that , and if the heralding procedure is also ideal, then using (16), we obtain
which means that the transmission-heralding success probability as defined in (1) is simply .
Remark II.2 (Multiplexing). In practice, multiplexing strategies are used to increase the transmission-heralding success probability. The term “multiplexing” here refers to the use of a single transmission channel to send multiple signals simultaneously, with the signals being encoded into distinct (i.e., orthogonal) frequency modes; see, e.g., Ref. 113. If $M \geq 1$ distinct frequency modes are used, then the source state being transmitted is $(\rho^S)^{\otimes M}$. If $p$ denotes the probability that any single one of the signals is received and heralded successfully, then the probability that at least one of the $M$ signals is received and heralded successfully is $1 - (1 - p)^M$.
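The effect of multiplexing in Remark II.2 is easy to tabulate. The following short Python sketch assumes that the $M$ signals succeed independently, each with probability $p$, so that at least one succeeds with probability $1 - (1 - p)^M$; the function name is hypothetical.

```python
def multiplexed_success_probability(p: float, num_modes: int) -> float:
    """Probability that at least one of num_modes independent signals succeeds.

    Assumes independent attempts, each succeeding with probability p.
    """
    if not 0.0 <= p <= 1.0 or num_modes < 1:
        raise ValueError("need 0 <= p <= 1 and num_modes >= 1")
    return 1.0 - (1.0 - p) ** num_modes


if __name__ == "__main__":
    for num_modes in (1, 10, 100, 1000):
        print(num_modes, multiplexed_success_probability(0.01, num_modes))
```

For small $p$, the success probability grows roughly linearly in $M$ at first and then saturates toward one.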
2. Transmission from satellites
Let us now consider the model of elementary link generation proposed in Ref. 114 in which the entanglement sources are placed on satellites orbiting the Earth. For further information on satellite-based quantum communication, we refer to Ref. 115 for a review, and we refer to Refs. 116–119 for more detailed modeling of the satellite-to-ground quantum channel than what we consider here.
When modeling photon transmission from satellites to ground stations, we must take into account background photons. Here, we analyze the scenario in which a source on board a satellite generates an entangled photon pair and distributes the individual photons to two parties, Alice (A) and Bob (B), on the ground. We allow the distributed photons to mix with background photons from an uncorrelated thermal source. Also, as before, we use the bosonic encoding defined in (5)–(7), but we stick to $d = 2$, i.e., qubit source states and, thus, bipartite elementary links. In this scenario, it is common for the two modes to represent the polarization degrees of freedom of the photons, so that
$$|H\rangle := |1, 0\rangle, \qquad |V\rangle := |0, 1\rangle$$
represent the states of one horizontally polarized photon and one vertically polarized photon, respectively.
Let be the average number of background photons. Then, as done in Ref. 114, we can define an approximate thermal background state as
The transmission channel from the satellite to the ground stations is then
where is the beamsplitter unitary (see, e.g., Ref. 105) and A1 and A2 refer to the horizontal and vertical polarization modes, respectively, of the dual-rail quantum system being transmitted, similarly for E1 and E2. Note that for , the transformation in (40) reduces to the one in (16) with d = 2.
The transmittance generally depends on atmospheric conditions (such as turbulence and weather conditions) and on orbital parameters (such as altitude and zenith angle).117–119 In general, if the satellite is at the altitude h and the path length from the satellite to the ground station is L, then
where
and
with the transmittance at zenith (ζ = 0). In general, the zenith angle ζ is given by
for a circular orbit of altitude h, with km being the Earth's radius. The following parameters, thus, characterize the total transmittance from satellite to ground: the initial beam waist w0, the receiving aperture radius r, the wavelength λ of the satellite-to-ground signals, and the atmospheric transmittance at zenith. Throughout the rest of this section, we take114,r = 0.75 m, cm, λ = 810 nm, and at 810 nm.116
For a source state , with and , the quantum state shared by Alice and Bob after the transmission of the state from the satellite to the ground stations is
where and are the transmittances to the ground stations and and are the corresponding thermal background noise parameters. In Sec. II D, we look at a specific example of a source state and, thus, provide an explicit form for the state . We also consider the heralding procedure defined by (28)–(32) and, thus, provide explicit forms for the states and in (2) and (3) corresponding to success and failure, respectively, of the heralding procedure.
B. Definition of the MDP
Having described the physical model of elementary link generation in Sec. II A, let us now proceed to the definition of the Markov decision process (MDP) for an elementary link. Note that while the formalism of Sec. II A gives us a mathematical description of the quantum state of an elementary link immediately after it is successfully generated, the MDP formalism provides us with a systematic framework to define actions on an elementary link and their effects on the quantum state over time.
Before starting, let us briefly summarize the definition of a Markov decision process (MDP); we refer to Appendix A for more details and a detailed explanation of the notation being used. An MDP is a mathematical model of an agent performing actions on a system (usually called the environment). The system is described by a set $\mathcal{S}$ of (classical) states, and the agent picks actions from a set $\mathcal{A}$. Corresponding to every action $a \in \mathcal{A}$ is a transition matrix $T^a$, such that the matrix element $(T^a)_{s', s}$ is equal to the probability of transitioning to the state $s' \in \mathcal{S}$, given that the current state is $s \in \mathcal{S}$ and the action $a$ is taken.
The results of Ref. 58 show us that, for the purposes of tracking the quantum state of an elementary link over time as well as its fidelity to a target pure state, it is enough to keep track of the time that the quantum systems of the elementary link reside in their respective quantum memories. With this observation, we can define a simpler MDP for elementary links.
- States: The states in our elementary link MDP are defined by the set $\mathcal{S} = \{-1, 0, 1, \dotsc, m^\star\}$, whose elements correspond to the number of time steps that the quantum systems of the elementary link have been sitting in their respective quantum memories. The state $-1$ corresponds to the elementary link being inactive, and $m^\star$ corresponds to the coherence time of the quantum memory. Specifically, if $t_{\text{coh}}$ is the coherence time of the quantum memory (say, in seconds), and the duration of every time step (in seconds) is $\Delta t$ (based on the classical communication time between the nodes in the elementary link), then $m^\star = t_{\text{coh}}/\Delta t$. From now on, we refer to $m^\star$ as the maximum storage time of the elementary link. We use $M(t)$, $t \geq 1$, to refer to the random variables (taking values in $\mathcal{S}$) corresponding to the state of the MDP at time $t$. We also associate to the elements in $\mathcal{S}$ orthonormal vectors $\{|m\rangle\}_{m \in \mathcal{S}}$, and we emphasize that these vectors should not be thought of as representing quantum states but as representing the extreme points of a probability simplex associated with the set $\mathcal{S}$; see Appendix A for details.
- Actions: The set of actions is $\mathcal{A} = \{0, 1\}$, where 0 corresponds to the action “wait” and 1 corresponds to the action “request.” In other words, at every time step, the agent can decide to keep their quantum systems currently in memory or to discard the quantum systems and perform the elementary link generation procedure again.
- The transition matrices $T^0$ and $T^1$ corresponding to the two actions are defined as follows:
$$T^0 := |{-1}\rangle\langle{-1}| + \sum_{m=0}^{m^\star - 1} |m+1\rangle\langle m| + |{-1}\rangle\langle m^\star|, \tag{48}$$
$$T^1 := \sum_{m \in \mathcal{S}} |g_p\rangle\langle m|, \tag{49}$$
where $|g_p\rangle := (1-p)\,|{-1}\rangle + p\,|0\rangle$ and $p$ is the transmission-heralding success probability defined in (1). (Note that we define our transition matrices such that probability vectors are applied to them from the right; see Appendix A for details.) The transition matrix $T^0$ describes what happens to the elementary link when the action $a = 0$ (wait) is taken by the agent: if the elementary link is currently inactive, then it stays inactive; if the elementary link is active and it has been in memory for less than $m^\star$ time steps, then the memory time is incremented by one; if the elementary link is active and it has been in memory for $m^\star$ time steps, then because the coherence time of the memory has been reached (as per the definition of $m^\star$), the elementary link becomes inactive. If the action $a = 1$ (request) is taken, then regardless of the current state of the elementary link, the state changes to $-1$ (inactive) with probability $1 - p$, meaning that the elementary link generation failed, or it changes to 0 with probability $p$, meaning that the elementary link generation succeeded. These two possibilities are captured by the probability vector $|g_p\rangle$. We use $A(t)$, $t \geq 1$, to refer to the random variable (taking values in the set $\mathcal{A}$) corresponding to the action taken at time $t$.
- We let $H(t) := (M(1), A(1), M(2), A(2), \dotsc, A(t-1), M(t))$ be the history, consisting of a sequence of states and actions, up to time $t$, with $H(1) = M(1)$.
- Figure of merit: Our figure of merit for an elementary link is the following function:
$$f(-1) := 0, \tag{54}$$
$$f(m) := \langle\psi|\sigma(m)|\psi\rangle \quad \text{for all } m \in \{0, 1, \dotsc, m^\star\}, \tag{55}$$
where $\sigma(m)$ is defined in (4) and $|\psi\rangle$ is a target state vector for the elementary link. (For example, if the elementary link contains two nodes, then $|\psi\rangle$ could be the state vector for the two-qubit maximally entangled state.) We emphasize that the function $f$ is not just the fidelity of the elementary link; it also depends implicitly on the probability that the elementary link is active. Indeed, if $f$ were simply the fidelity of the elementary link, then instead of the definition $f(-1) = 0$, we would have $f(-1) = \langle\psi|\tau^{\varnothing}|\psi\rangle$, where $\tau^{\varnothing}$ is the quantum state corresponding to failure of the heralding procedure; see (3). We illustrate the importance of this distinction, and therefore the usefulness of this figure of merit for designing and evaluating protocols, in Sec. II D 2 a, specifically Fig. 7. To the best of our knowledge, this figure of merit has not been considered in prior work.
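The wait and request dynamics described in the list above can be written down as explicit matrices. The following Python sketch builds them for a given success probability and maximum storage time, using the column convention stated in the text (the entry in row $s'$ and column $s$ is the probability of moving from $s$ to $s'$). The function name and the index shift (state $m$ is stored at index $m + 1$, so that the inactive state $-1$ sits at index 0) are choices made here for illustration.

```python
def transition_matrices(p: float, m_star: int):
    """Return (t0, t1) for the actions 'wait' (0) and 'request' (1).

    State m in {-1, 0, ..., m_star} is stored at index m + 1 (an indexing
    choice of this sketch). Entry [row][col] is Pr[next = row | current = col].
    """
    n = m_star + 2
    t0 = [[0.0] * n for _ in range(n)]
    t1 = [[0.0] * n for _ in range(n)]

    # Wait: an inactive link stays inactive.
    t0[0][0] = 1.0
    # Wait: an active link with memory time m < m_star ages by one step.
    for m in range(m_star):
        t0[m + 2][m + 1] = 1.0
    # Wait: at the maximum storage time, the link is discarded.
    t0[0][m_star + 1] = 1.0

    # Request: from any state, fail with probability 1 - p, succeed with p.
    for col in range(n):
        t1[0][col] = 1.0 - p
        t1[1][col] = p
    return t0, t1


if __name__ == "__main__":
    t0, t1 = transition_matrices(p=0.25, m_star=2)
    for row in t0:
        print(row)
    for row in t1:
        print(row)
```

Every column of both matrices sums to one, as required for matrices that act on probability vectors from the left of the vector.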
A policy is a sequence of decision functions , which indicate the probability of performing a particular action conditioned on the state of the system,
For a particular policy , the probability of a particular history of states and actions is (see Appendix A 2)
Then, the quantum state of the elementary link is58
where we recall that is given by (4).
We are interested primarily in the expected value of the function f defined in (55) at times ,
for policies . We are also interested in the probability that the elementary link is active at time , which is given by
From this, the expected fidelity of the elementary link is given by
We are interested in the maximum value of the function defined in (60) among all policies π,
A policy π achieving the supremum is called an optimal policy.
In the steady-state (infinite-time) limit, we are interested in the maximum value of among all time-homogeneous (stationary) policies , i.e., policies in which a fixed decision function d is used at every time step,
if the limit exists.
C. Policies
In Ref. 58, it was shown that a policy that achieves the optimal value in (63) can be determined using a backward recursion algorithm. We restate this algorithm here for completeness.
Theorem II.3 (Optimal finite-time policy for an elementary link58). For all , the optimal expected fidelity of an elementary link with success probability is given by
where
for all , and
Furthermore, the optimal policy is deterministic and given by , where
Intuitively, the result of Theorem II.3 tells us that, for finite times, the optimal policy can be found by optimizing the individual actions going “backwards in time,” by first optimizing the final action at time t – 1, then optimizing the action at time t − 2, etc., and then finally optimizing the action at time t = 1. This is, indeed, the case because from (68), we see that the optimal action at the first time step is obtained using the function w2, but from (66), we see that to calculate w2, we need w3, and to calculate w3, we need w4, etc., until we get to the function wt for the final time step, which we can calculate using (67).
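The "backwards in time" idea can be illustrated with a generic backward-induction routine for this MDP. The sketch below is not a transcription of (66)–(68); it is the textbook finite-horizon recursion applied to the wait/request dynamics described in Sec. II B, under these assumptions: the objective is the expected value of the figure of merit at the final time $t$, the link state at time 1 is the outcome of an initial request (inactive with probability $1 - p$, fresh with probability $p$), and ties are broken in favor of waiting. The decaying figure of merit in `example_figure_of_merit` is a made-up placeholder, and all names are hypothetical.

```python
import math


def example_figure_of_merit(m_star: int):
    """Placeholder values f(-1), f(0), ..., f(m_star); NOT from the paper."""
    return [0.0] + [0.25 + 0.75 * math.exp(-m / (m_star + 1)) for m in range(m_star + 1)]


def optimal_finite_time_value(f, p: float, horizon: int):
    """Backward induction for the elementary-link MDP.

    f[i] is the figure of merit of the state with index i (index 0 is the
    inactive state -1, index i >= 1 is memory time i - 1).
    Returns (optimal expected value at time `horizon`, policy), where
    policy[j][i] is the action (0 = wait, 1 = request) at time j + 1 in state i.
    """
    n = len(f)

    def next_index_after_wait(i: int) -> int:
        # Inactive stays inactive; the last memory time expires to inactive.
        return 0 if i == 0 or i == n - 1 else i + 1

    values = list(f)  # value function at the final time
    policy = []
    for _ in range(horizon - 1):
        request_value = (1.0 - p) * values[0] + p * values[1]
        new_values, actions = [], []
        for i in range(n):
            wait_value = values[next_index_after_wait(i)]
            if request_value > wait_value:
                new_values.append(request_value)
                actions.append(1)
            else:
                new_values.append(wait_value)
                actions.append(0)
        values = new_values
        policy.insert(0, actions)  # earlier times go to the front

    # Average over the assumed initial distribution at time 1.
    return (1.0 - p) * values[0] + p * values[1], policy


if __name__ == "__main__":
    f = example_figure_of_merit(m_star=3)
    for horizon in (1, 2, 5, 10):
        value, _ = optimal_finite_time_value(f, p=0.5, horizon=horizon)
        print(horizon, round(value, 4))
```

The loop computes the last decision first and works toward the first decision, which mirrors the dependence of each step on the function for the following step described above.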
While the optimal policy for finite times was determined in Ref. 58, the steady-state value of the expected fidelity under arbitrary stationary policies [i.e., the value in (64)] was not determined. We now show that the limit in (64) exists, and we determine its value for arbitrary decision functions.
Theorem II.4 (Steady-state expected value of an elementary link). Let p be the success probability of generating an elementary link in a quantum network, and let d be a decision function, such that is the probability of executing the action wait and is the probability of executing the action request. Then, if the elementary link undergoes the stationary policy , then
where
with
Proof. See Appendix D. □
Using Theorem II.4, we can determine the optimal steady-state value of the function , and thus the optimal decision function d, by optimizing the quantity in (69) with respect to independent variables subject to the constraints for all . [Recall from the statement of Theorem II.4 that the variables are directly related to the decision function d.] Alternatively, we can use the following linear program to obtain an optimal policy.
Theorem II.5 (Linear program for the optimal steady-state value of an elementary link). Consider an elementary link in a quantum network with maximum memory time . Let . Then, the optimal steady-state value of the elementary link, namely, the quantity in (64), is equal to the solution of the following linear program:
where the optimization is with respect to the -dimensional vectors , and the inequality constraints on the vectors are componentwise. For every feasible point of this linear program, we obtain a decision function d as follows: for all and . If , then we set and for an arbitrary .
Proof. The linear program in (74) is a special case of the linear program presented in Proposition A.2 in Appendix A. The main assumption of that result is that the MDP be ergodic, which is true in this case by Theorem II.4. □
1. The memory-cutoff policy
An example of a stationary policy is the memory-cutoff policy, which has been considered extensively in prior work.58,59,63–65,97,98,100,120–123 This is a deterministic policy that is defined by a cutoff time , where , such that . Then,
Then, by Theorem II.4, we have , so that
for all , which agrees with Ref. 58 [Eq. (4.15)], which was obtained using different methods. We also obtain
for all .
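As a sanity check on the steady-state behavior of the memory-cutoff policy, the following sketch computes the stationary distribution of the induced Markov chain by power iteration and compares the link-activity probability with the closed form (t_star + 1) p / (1 + t_star p), which follows from the balance equations of the chain assumed here. The transition structure is an assumption made for illustration (request when the link is inactive or when the memory time equals the cutoff, wait otherwise), and all function names are hypothetical.

```python
def cutoff_stationary(p, t_star, iters=5000):
    """Stationary distribution over states -1, 0, ..., t_star by power iteration."""
    states = list(range(-1, t_star + 1))
    dist = {m: 1.0 / len(states) for m in states}
    for _ in range(iters):
        new = {m: 0.0 for m in states}
        for m, weight in dist.items():
            if m == -1 or m == t_star:
                # Request: fresh link with probability p, inactive otherwise.
                new[0] += p * weight
                new[-1] += (1 - p) * weight
            else:
                # Wait: the memory time increases by one step.
                new[m + 1] += weight
        dist = new
    return dist


def activity_closed_form(p, t_star):
    """Steady-state probability that the link is active (assumed chain)."""
    return (t_star + 1) * p / (1 + t_star * p)


def steady_state_value(p, t_star, f):
    """Steady-state expectation of f(m) over active states (assumed chain)."""
    return p / (1 + t_star * p) * sum(f(m) for m in range(t_star + 1))
```

In this assumed chain, every active state carries the same stationary weight p / (1 + t_star p), which is why the steady-state value reduces to a plain sum of f(m) over the memory times up to the cutoff.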
For , we have, for all ,58
In what follows, we make use of the following definitions for the deterministic decision functions corresponding to the memory-cutoff policy:
D. Example: Satellite-to-ground entanglement distribution
1. Quantum state of an elementary link
In Sec. II A 2, we defined the transmission channel corresponding to the transmission of entanglement from a satellite to two ground stations. In particular, if we consider two ground stations, one corresponding to Alice and one corresponding to Bob, then given a state produced by the source on the satellite, the state after the transmission of the system A to Alice and the system B to Bob is given by (48)
where and are the transmittances to the ground stations and and are the corresponding thermal background noise parameters.
After transmission, we assume a heralding procedure defined by post-selecting on coincident events using (perfect) photon-number-resolving detectors. One can justify this assumption because, in the high-loss and low-noise regimes (), the probability of four-photon and three-photon occurrences is negligible compared to two-photon events. Therefore, upon successful heralding, the (unnormalized) quantum state shared by Alice and Bob is
where
is the projection onto the two-photon-coincidence subspace. Note that the projection ΠAB is exactly the projection , with defined in (28). Then, the transmission-heralding success probability is, as per the definition in (1),
Now, let us take the source state to be the following:
where and
Proposition II.6 (Quantum state of a satellite-to-ground elementary link114) Let , and consider the source state given by (88). Then, after successful heralding, the (unnormalized) state given by (84) is equal to
where
and
for .
From (93), we have that the transmission-heralding success probability is given by
so that the quantum state shared by Alice and Bob conditioned on successful heralding is, as per the definition in (2),
a. Success probability and fidelity
Let us now evaluate the quality of entanglement transmission from a satellite to two ground stations. For illustrative purposes, we focus primarily on the scenario depicted in Fig. 3, in which a satellite passes over the midpoint between two ground stations, although the same analysis can be done when this is not the case. Since the satellite is an equal distance away from both ground stations, we have . We also let . This means that , and , so that
Optical satellite-to-ground transmission.114 Two ground stations g1 and g2 are separated by a distance d with a satellite at an altitude h at the midpoint. Both ground stations are the same distance L away from the satellite, so that the total transmittance for two-qubit entanglement transmission (one qubit to each ground station) is , where , with given by (42) and given by (45). Reprinted with permission from Khatri et al., npj Quantum Inf. 7, 4 (2021). Copyright 2021 Author(s), licensed under a Creative Commons License.
In this scenario, given a distance d between the ground stations and an altitude h for the satellite, by simple geometry, the distance L between the satellite and either ground station is given by
where is the radius of Earth.
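The geometric relation can be sketched as follows. The sketch assumes that d is measured along the surface of Earth (arc length), that Earth is a sphere, and that the satellite sits directly above the midpoint between the ground stations. Under these assumptions, the law of cosines gives L^2 = h^2 + 4 R (R + h) sin^2(d / (4R)), where R is the radius of Earth. The function name and the numerical value used for the radius are illustrative.

```python
import math

R_EARTH_KM = 6378.0  # assumed value for the radius of Earth, in km


def slant_distance_km(d_km, h_km, r_km=R_EARTH_KM):
    """Distance from the satellite to either ground station.

    d_km: arc-length separation of the ground stations (assumption).
    h_km: satellite altitude above the midpoint between the stations.
    """
    # Angle at Earth's center between the midpoint and one ground station.
    theta = d_km / (2.0 * r_km)
    return math.sqrt(
        h_km ** 2 + 4.0 * r_km * (r_km + h_km) * math.sin(theta / 2.0) ** 2
    )
```

For d = 0 the expression reduces to L = h, as expected for a satellite directly overhead.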
Now, let us consider the transmission-heralding success probability p in (98). Due to the altitude of the satellites, there typically has to be multiplexing of the signals (see Remark II.2) in order to maintain a high probability of both ground stations receiving the entangled state. In Fig. 4, we plot the success probability with multiplexing, which is given by , where M is the number of distinct frequency modes used for multiplexing.
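A minimal sketch of the multiplexing calculation, assuming that the M frequency modes succeed independently, each with probability p, so that at least one of them succeeds with probability 1 − (1 − p)^M. The independence assumption and the function names are illustrative.

```python
import math


def multiplexed_success_probability(p, num_modes):
    """Probability that at least one of num_modes independent attempts succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return 1.0 - (1.0 - p) ** num_modes


def modes_needed(p, target):
    """Smallest number of modes with success probability >= target.

    Requires 0 < p < 1 and 0 < target < 1.
    """
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
```

For small p, the number of modes needed to reach a fixed target scales roughly as 1/p, which is why heavy multiplexing is required in the high-loss regime.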
Plots of the transmission-heralding success probability as well as the initial fidelity of the quantum state conditioned on successful heralding for the situation depicted in Fig. 3 in which and . Indicated is the threshold fidelity of beyond which the state is entangled (see Proposition II.7). The success probability is shown in a multiplexing setting with (see Remark II.2). Also, we have let and fS = 1.
We also plot in Fig. 4 the fidelity of the initial state, which is given by
The fidelity of with respect to is related in a simple way to the entanglement of . In particular, by the positive partial transpose (PPT) criterion,124,125 is entangled if and only if its fidelity with respect to is strictly greater than , and this leads to constraints on the loss and noise parameters of the satellite-to-ground transmission.
Proposition II.7. The quantum state after the successful satellite-to-ground transmission, as defined in (99), is entangled if and only if the fidelity of the source state in (88) satisfies , and
Proof. Observe that the state is a Bell-diagonal state of the form
where [when ]. Indeed, the coefficient of in (93) can be written as
and the coefficient of in (94) can be written as
We can, thus, make the following identifications:
Now, using the PPT criterion,124,125 we have that is entangled if and only if . Then, from (102), we have that
so we require
Simplifying this leads to
as required. □
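The PPT argument used in the proof can be checked numerically for two-qubit Bell-diagonal states. The sketch below builds such a state from four weights, takes the partial transpose on the second qubit, and tests for a negative eigenvalue; for Bell-diagonal states this happens exactly when the largest weight exceeds 1/2. The ordering of the Bell states and the function names are choices made for illustration.

```python
import numpy as np


def bell_vectors():
    """Two-qubit Bell state vectors in the order Phi+, Phi-, Psi+, Psi-."""
    s = 1.0 / np.sqrt(2.0)
    return [
        np.array([s, 0.0, 0.0, s]),
        np.array([s, 0.0, 0.0, -s]),
        np.array([0.0, s, s, 0.0]),
        np.array([0.0, s, -s, 0.0]),
    ]


def bell_diagonal_state(weights):
    """Mixture of Bell-state projectors with the given weights."""
    return sum(w * np.outer(v, v) for w, v in zip(weights, bell_vectors()))


def partial_transpose(rho):
    """Transpose the second qubit of a two-qubit density matrix."""
    return rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)


def is_entangled_ppt(weights, tol=1e-12):
    """True if the partial transpose has a negative eigenvalue."""
    eigenvalues = np.linalg.eigvalsh(partial_transpose(bell_diagonal_state(weights)))
    return bool(eigenvalues.min() < -tol)
```

For two qubits the PPT criterion is necessary and sufficient, so the test above is a complete entanglement check in this setting.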
Now, for the scenario depicted in Fig. 3, we have that , and , so that from (100), we have , and . Substituting this into (105) leads to as the condition for to be entangled. We plot this condition in Fig. 5. The inequality gives us the colored regions, and the values within the regions are obtained by evaluating the fidelity according to (102).
Plots of the entanglement region for the state obtained after a successful satellite-to-ground transmission for the scenario depicted in Fig. 3. The regions are defined by the condition , with F(1) the fidelity of the state with the maximally entangled state, see (102) and Proposition II.7. For both plots, we assume fS = 1. For the right-hand plot, we take .
b. Key rates for QKD
Let us also consider key rates for quantum key distribution (QKD) between Alice and Bob, who are at the ends of the elementary link whose quantum state is (conditioned on successful transmission and heralding), as given by (100). We consider the BB84, six-state, and device-independent (DI) QKD protocols, and we calculate the secret key rates using known asymptotic secret key rate formulas, which we review (along with other necessary background on QKD) in Appendix C.
Recalling from the proof of Proposition II.7 that is a quantum state of the form
with α, β, and γ defined in (109)–(111), it is easy to show using (C2)–(C6) that the quantum bit-error rates (QBERs) for the BB84 and six-state protocols are
For the device-independent protocol, we assume that the correlation is such that the quantum bit-error rate is and . Then, assuming that M signals per second are transmitted from the satellite, the secret-key rate (in units of secret key bits per second) is given by , where is the success probability of elementary link generation and K is the asymptotic secret key rate per copy of the state , which depends on the protocol under consideration. Using the formulas in Appendix C, we obtain
We plot these secret key rates in Fig. 6.
Asymptotic secret key rates for the BB84, six-state, and device-independent (DI) quantum key distribution protocols for the scenario depicted in Fig. 3. When calculating the error rates in (116) and (117), we take fS = 1. To calculate the key rates in (118)–(120), we have taken .
In Fig. 6, notice that the region of non-zero secret key rate is largest for the six-state protocol, with the region for the BB84 protocol being smaller and the region for the DI protocol being even smaller. This is due to the fact that the error threshold for the DI protocol is the smallest among the three protocols, with the error threshold for the BB84 protocol slightly larger, and the error threshold for the six-state protocol the largest.
c. Quantum memory model
Having examined the quantum state immediately after successful transmission and heralding, let us now consider a particular model of decoherence for the quantum memories in which the transmitted qubits are stored. For illustrative purposes, we consider a simple amplitude damping decoherence model for the quantum memories. The amplitude damping channel is a qubit channel, with , such that126
Note that for γ = 0, we recover the noiseless (identity) channel. We can relate γ to the coherence time of the quantum memory, which we denote by , as follows (Ref. 127, Sec. 3.4.3):
Note that infinite coherence time corresponds to an ideal quantum memory, meaning that the quantum channel is noiseless. Indeed, by relating the noise parameter γ to the coherence time as in (125), we have that .
For applications of the amplitude damping channel, it is straightforward to show that
where . Then, for all ,
where α and β are given by (110) and (111), respectively. Note that we have assumed that the memories corresponding to systems A and B have the same coherence time. It follows that
Note that for all .
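The amplitude damping model of this subsection can be sketched numerically. The sketch uses the standard Kraus operators of the amplitude damping channel and checks that m applications with parameter γ act as a single application with parameter 1 − (1 − γ)^m. The relation γ = 1 − exp(−1/t_coh), with t_coh measured in time steps, is an assumed form of the relation between γ and the coherence time; it is chosen so that γ tends to zero for infinite coherence time. The function names are hypothetical.

```python
import numpy as np


def gamma_from_coherence_time(t_coh):
    """Assumed relation between the damping parameter and the coherence time."""
    return 1.0 - np.exp(-1.0 / t_coh)


def amplitude_damping(rho, gamma):
    """Apply the qubit amplitude damping channel once."""
    k0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
    k1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T


def amplitude_damping_repeated(rho, gamma, m):
    """Apply the channel m times in sequence."""
    for _ in range(m):
        rho = amplitude_damping(rho, gamma)
    return rho


def effective_gamma(gamma, m):
    """Damping parameter of the m-fold composition."""
    return 1.0 - (1.0 - gamma) ** m
```

With the assumed relation, the effective parameter after m time steps is 1 − exp(−m/t_coh), so the memory decays exponentially in the number of steps for which the state is stored.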
2. Policies
a. Memory-cutoff policy
Let us now consider the memory-cutoff policy, which we defined in Sec. II C. Using (77) and (78), along with the expression for f(m) in (138), for every cutoff , we obtain
Then, using the fact that , it is straightforward to show that
Therefore, in the steady-state limit,
For , from (78), we obtain
for all . Evaluating the sums leads to
Then, for all , we obtain .
Let us now focus primarily on the memory-cutoff policy by considering an example. Consider the situation depicted in Fig. 3, in which we have two ground stations separated by a distance d and a satellite at the altitude h that passes over the midpoint between the two ground stations. Now, given that the ground stations are separated by a distance d, it takes time at least to perform the heralding procedure, as this is the round-trip communication time between the ground stations (c is the speed of light). We, thus, take the duration of each time step in the decision process for the elementary link to be . If the coherence time of the quantum memories is x seconds, then time steps. In Fig. 7, we plot the quantities (solid lines), (dashed lines), and (dotted lines) for the memory-cutoff policy under this scenario.
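The conversion from a coherence time in seconds to a number of time steps can be sketched as follows. The sketch assumes that the duration of one time step is the round-trip light travel time 2d/c over the ground distance d, and it reports the coherence time as a real number of steps without rounding. Both of these choices, as well as the function names, are assumptions made for illustration.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def time_step_seconds(d_km):
    """Round-trip light travel time over a distance of d_km kilometers."""
    return 2.0 * d_km / SPEED_OF_LIGHT_KM_PER_S


def coherence_time_in_steps(coherence_s, d_km):
    """Coherence time expressed in units of the time step (not rounded)."""
    return coherence_s / time_step_seconds(d_km)
```

For example, with d = 1000 km the time step is about 6.7 ms, so a coherence time of 1 s corresponds to roughly 150 time steps.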
The memory-cutoff policy for satellite-to-ground elementary link generation for various ground distances d and satellite altitudes h, according to the situation depicted in Fig. 3. The solid lines are [as given by (145)], the dashed lines are , and the dotted lines are [see (79)], where , with a and c given by (100) and , respectively. We let fS = 1 be the fidelity of the source, we let be the average number of background photons, and we take the memory coherence times to be 1 s (top) and 60 s (bottom). The dots are placed at the maxima of the curves for .
In Fig. 7, we can see the trade-off among the quantities , F, and X. The fidelity is always highest at time t = 1, as we expect, but at this point, the probability that the elementary link is active is simply p. Since we want not only a high fidelity for the elementary link but also a high probability that the elementary link is active, by optimizing , it is possible to achieve a higher elementary link activity probability at the expense of a slightly lower fidelity. Specifically, in Fig. 7, we see that for every choice of d and h, there exists a time step at which is maximal. At this point, the elementary link activity probability is , which, in many cases, is dramatically greater than p, while the fidelity is only slightly lower than the fidelity at time t = 1. Therefore, by waiting until time , it is possible to obtain an elementary link that is almost deterministically active, while incurring only a slight decrease in the fidelity. The time , which is obtained by optimizing the quantity with respect to time t and which can be found using the formula in (145), can be viewed as the optimal time t that should be chosen for the quantum network protocol presented in Fig. 13. We refer to Ref. 128 for an argument similar to the one presented here, except that in Ref. 128, the time is obtained by considering a desired value of the fidelity rather than by optimizing with respect to t, which is what we do here.
b. Forward recursion policy
The forward recursion policy is defined as the time-homogeneous policy, such that the action at time t is equal to the one that maximizes the quantity at the next time step. The corresponding decision function is58
Observe that if p = 1, then the second condition in (146) is always false because of the fact that for all , see (137). Therefore, when p = 1, we have that , i.e., the forward recursion policy is equal to the memory-cutoff policy, see (82). We now show that the forward recursion policy reduces to a memory-cutoff policy even when p < 1.
Proposition II.8. Consider the satellite-to-ground bipartite elementary link generation with and fS = 1, and let be the transmission-heralding success probability, as given by (98). Let be the coherence time of the quantum memories, as defined in Sec. II D 1. Then, for all
where
In other words, if , then the forward recursion policy is equal to the memory-cutoff policy; if , then the forward recursion policy is equal to the memory-cutoff policy, with given by (148).
Remark II.9. The result of Proposition II.8 goes beyond elementary link generation with satellites because we assumed that and fS = 1. As a result of these assumptions, the result of Proposition II.8 applies to every elementary link generation scenario (such as ground-based elementary link generation as described in Sec. II A 1) in which the transmission channel is a pure-loss channel, the heralding procedure is described by (28)–(32), the source state is equal to the target state, and the quantum memories are modeled as in Sec. II D 1.
In the case and fS = 1, we have that , so that the inequality in (150) becomes
Now, this inequality is satisfied for all if and only if . In other words, if , then for all possible memory times, the action is to wait if the elementary link is currently active, meaning that the decision function in (146) becomes
which is precisely the decision function for the memory-cutoff policy, see (82).
For , whether or not the inequality in (151) is satisfied depends on the memory time m. Consider the largest value of m for which the inequality is satisfied and denote that value by . Since the action is to wait, at the next time step, the memory value will be , which by definition will not satisfy the inequality in (149). This means that for all memory times strictly less than , the forward recursion policy dictates that the wait action should be performed if the elementary link is currently active. As soon as the memory time is equal to , then the forward recursion policy dictates that the request action should be performed. This means that is a cutoff value. In particular, by rearranging the inequality in (151), we obtain
which means that
and
as required. □
Observe that the cutoff in (148) is equal to zero for all . This means that p = 1 is not the only transmission-heralding success probability for which the forward recursion policy is equal to the memory-cutoff policy. Intuitively, for , the transmission-heralding success probability is high enough that it is not necessary to store the quantum state in memory: for the purpose of maximizing the expected value of , it suffices to request a new quantum state at every time step. At the other extreme, for , the probability is too low to keep requesting: for the purpose of maximizing the expected value of , it is better to keep the quantum state in memory indefinitely.
c. Backward recursion policy
Finally, to end this section, let us consider the backward recursion policy, which we know to be optimal from Theorem II.3. We perform the policy optimization for small times, just as a proof of concept.
In Fig. 8, we plot optimal values of for a single elementary link, except now we plot them as a function of the ground station distance d and the satellite altitude h as per the situation depicted in Fig. 3. We also plot the elementary link activity probability and the expected fidelities associated with the optimal policies. As before, we assume that fS = 1, but unlike before, we assume that , and we consider multiplexing with distinct frequency modes per transmission. We assume a coherence time of 1 s throughout. For small distance-altitude pairs, we find that the optimal value is reached within five time steps. For these cases, it is worth pointing out that the optimal value of corresponds to an elementary link activity probability of nearly one, while the fidelity (although it drops, as expected) does not drop significantly, meaning that the elementary link can still be useful for performing entanglement distillation of parallel elementary links or for creating virtual links. It is also interesting to point out that for a ground distance separation of d = 2000 km, the optimal values for satellite altitude h = 1000 km are higher than for h = 500 km. This result can be traced back to the top-left panel of Fig. 4, in which we see that the transmission-heralding success probability curves for h = 500 km and h = 1000 km cross over at around 1700 km, so that h = 1000 km has a higher probability than h = 500 km when d = 2000 km.
Optimal values of , along with the associated values of and fidelities , for a single elementary link distributed by a satellite to two ground stations, according to the symmetric situation depicted in Fig. 3. We assume that fS = 1 and that , and we assume that the quantum memories have a coherence time of 1 s. We also assume multiplexing with distinct frequency modes per transmission.
III. ENTANGLEMENT DISTILLATION AND JOINING PROTOCOLS
In Sec. II, we discussed elementary links in a quantum network, how to model the generation of elementary links, and how to model them in time in terms of a Markov decision process. The description of an elementary link in terms of a Markov decision process allows us to determine, as a function of time, the quantum state of an elementary link. Keeping in mind the overall goal of entanglement distribution, i.e., the creation of long-distance virtual links, the next step in an entanglement distribution protocol is to take elementary links, to improve their fidelity using entanglement distillation, and then to join them in order to create the virtual links (using, e.g., entanglement swapping). In this section, we explain how to model entanglement distillation protocols and joining protocols using LOCC channels. We refer to Appendix B 2 for a detailed explanation of LOCC channels. The explicit description of these protocols as LOCC channels is important because, as we saw in Sec. II, the quantum state of an elementary link will not always be the ideal entangled state with respect to which joining protocols are typically defined. It is, therefore, important to understand how the protocols will act when the input states are not ideal.
A. Entanglement distillation
The term “entanglement distillation” refers to the task of taking many copies of a given quantum state ρAB and transforming them, via an LOCC protocol, to several (fewer) copies of the maximally entangled state . Typically, with only a finite number of copies of the initial state ρAB, it is not possible to perfectly obtain copies of the maximally entangled state, so we aim, instead, for a state σAB whose fidelity to the maximally entangled state is higher than the fidelity of the initial state. Mathematically, the task of entanglement distillation corresponds to the transformation
where , m < n, and is an LOCC channel.
In practice, we typically have n = 2 and m = 1, with the task being to transform two two-qubit states and to a two-qubit state having a higher fidelity to the maximally entangled state than the initial states. Protocols achieving this aim are usually probabilistic, meaning that the state with higher fidelity is obtained only with some non-unit probability.
We are not concerned with any particular entanglement distillation protocol in this work. All we are concerned with is their mathematical structure. In particular, entanglement distillation protocols that are probabilistic can be described mathematically as an LOCC instrument, which we now demonstrate with a simple example, depicted in Fig. 9, which comes from Ref. 36. In this protocol, Alice and Bob first apply the CNOT gate to their qubits and follow it with a measurement of their second qubit in the standard basis. They then communicate the results of their measurement to each other. The protocol is considered successful if they both obtain the same outcome and a failure otherwise. This protocol has the following corresponding LOCC instrument channel:
where
Depiction of the simple entanglement distillation protocol as described in Ref. 36. The protocol takes two isotropic states [see (161)] and transforms them probabilistically to a state with higher fidelity.
Furthermore, the states are defined as
where is the isotropic twirling channel, see, e.g., Ref. 129 (Example 7.25).
It is a straightforward calculation to show that if and are the fidelities of the initial states with the maximally entangled state, then the protocol depicted in Fig. 9, with corresponding LOCC channel given by (157), succeeds with probability,
and the fidelity of the output state with the maximally entangled state (conditioned on success) is
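The success probability and output fidelity for this example can be evaluated by tracking Bell-state labels. The sketch below assumes that both inputs are isotropic states with fidelities f1 and f2 to the maximally entangled state, that the protocol succeeds exactly when the two measurement outcomes agree, and that the output fidelity is evaluated before any further twirling. The resulting expressions are the standard ones for this type of protocol and are given here as an illustration, not as a restatement of the expressions referred to above. The function name is hypothetical.

```python
def distill(f1, f2):
    """Success probability and output fidelity for two isotropic inputs.

    Each input is Phi+ with probability f and each of the other three
    Bell states with probability (1 - f) / 3 (isotropic-state assumption).
    """
    # Probability that a pair is in {Phi+, Phi-}, i.e., has even parity.
    even1 = f1 + (1.0 - f1) / 3.0
    even2 = f2 + (1.0 - f2) / 3.0
    # The outcomes agree when both pairs have the same parity.
    p_success = even1 * even2 + (1.0 - even1) * (1.0 - even2)
    # The output is Phi+ when both inputs are Phi+ or both are Phi-.
    f_out = (f1 * f2 + (1.0 - f1) * (1.0 - f2) / 9.0) / p_success
    return p_success, f_out
```

Under these assumptions, the fidelity increases only when the input fidelities exceed 1/2; at f1 = f2 = 1/2 the output fidelity equals the input fidelity.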
The above example illustrates a general principle, which is that entanglement distillation protocols that are probabilistic (and heralded) can be described using LOCC instrument channels. Specifically, let be the graph corresponding to the physical links in a quantum network. Given an element with n parallel edges , every probabilistic entanglement distillation protocol has the form of an LOCC instrument channel of the following form:
where and are completely positive trace non-increasing LOCC maps, such that is a trace-preserving map, thus an LOCC quantum channel. Specifically, corresponds to failure of the protocol and corresponds to success of the protocol.
B. Joining protocols
Let us now discuss joining protocols, such as entanglement swapping. We can describe such protocols using LOCC instrument channels, just as with entanglement distillation protocols. As above, let be the graph corresponding to the physical links in a quantum network. A path in a graph is a sequence of vertices and edges that specifies how to get from the vertex v1 to the vertex vn. Given a path w of active elementary links in the network, the joining channel that forms the new virtual link is given in the probabilistic setting by
where and are completely positive trace non-increasing LOCC maps, such that is a trace-preserving map, thus an LOCC quantum channel. Specifically, corresponds to failure of the joining protocol, and corresponds to success of the joining protocol. Given an input state ρw corresponding to the given path w, the success probability of the joining protocol is , and the state conditioned on success is
Note that as input states to the maps and , we could have arbitrary states of the elementary links along the path w. In particular, depending on the elementary link policy, they could be states of the form (59), which take into account the noise in the quantum memories and other device imperfections arising during the process of generating the elementary links.
The precise joining protocol, and thus, the explicit form for the maps and , depends on the type of entanglement that is to be created. For bipartite entanglement, we consider entanglement swapping in Sec. III B 1. For tripartite GHZ entanglement, we describe a protocol in Sec. III B 2, and for multipartite graph states, we describe a protocol in Sec. III B 3.
1. Entanglement swapping protocol
Let be a multipartite quantum state, where and is an abbreviation for the two quantum systems and . The entanglement swapping protocol with n intermediate nodes is defined by a Bell-basis measurement of the systems , i.e., a measurement described by the positive operator-valued measure (POVM) , where , and
are the qudit Bell state vectors, with
The operators Z and X are the discrete Weyl operators,129 which are defined as
Conditioned on the outcomes (zj, xj) of the Bell measurement on , the unitary is applied to the system B, where the addition is performed modulo d. Let and define
where the addition in the second line is performed modulo d. Then, the LOCC quantum channel corresponding to the entanglement swapping protocol with intermediate nodes is
The standard entanglement swapping protocol39 corresponds to the input state
This scenario is shown in Fig. 10. Indeed, it can be shown that
A chain of five nodes corresponding to the entanglement swapping protocol with n = 3 intermediate nodes. The red lines represent maximally entangled states. The goal of the entanglement swapping protocol is to establish entanglement between A and B. The protocol proceeds by first performing a Bell-basis measurement on the systems at the nodes , and communicating the results of the measurement to B, who applies a correction operation based on the outcomes.
Furthermore, the standard teleportation protocol12 corresponds to n = 1 and the input state
where is a trivial (one-dimensional) system and is an arbitrary d-dimensional quantum state, so that
as expected.
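The ideal protocol with a single intermediate node can be checked numerically. The sketch below assumes the conventions Z|k⟩ = exp(2πik/d)|k⟩ and X|k⟩ = |k + 1 mod d⟩ for the Weyl operators, Bell vectors of the form (Z^z X^x ⊗ 1)|Φ⟩, and a correction Z^z X^x applied to B. These conventions may differ from those in the text by phases or by the ordering of the operators. The function names are hypothetical.

```python
import numpy as np


def weyl_operators(d):
    """Return (Z, X) with Z|k> = omega^k |k> and X|k> = |k+1 mod d>."""
    omega = np.exp(2j * np.pi / d)
    z_op = np.diag(omega ** np.arange(d))
    x_op = np.roll(np.eye(d), 1, axis=0)
    return z_op, x_op


def swap_outcomes(d):
    """Entanglement swapping of two maximally entangled qudit pairs.

    Returns a list of (z, x, probability, fidelity) for each Bell outcome.
    """
    z_op, x_op = weyl_operators(d)
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)
    # Joint state with tensor indices (A, R1, R2, B).
    psi = np.kron(phi, phi).reshape(d, d, d, d)
    results = []
    for z in range(d):
        for x in range(d):
            w = np.linalg.matrix_power(z_op, z) @ np.linalg.matrix_power(x_op, x)
            bell = (np.kron(w, np.eye(d)) @ phi).reshape(d, d)
            # Project systems R1 and R2 onto the Bell vector.
            out = np.einsum("ij,aijb->ab", bell.conj(), psi)
            prob = np.linalg.norm(out) ** 2
            # Apply the correction w to system B and normalize.
            corrected = (out @ w.T) / np.sqrt(prob)
            fidelity = abs(np.vdot(phi, corrected.reshape(d * d))) ** 2
            results.append((z, x, prob, fidelity))
    return results
```

Under these conventions, every outcome occurs with probability 1/d^2 and the corrected state of A and B has unit fidelity with the maximally entangled state.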
Proposition III.1 (Fidelity after entanglement swapping). For all and all states , the fidelity of the maximally entangled state with the state after entanglement swapping of is given by
where and .
Proof. See Appendix E 1. □
A simple way to make the entanglement swapping protocol probabilistic is to modify the measurement operators in the ideal protocol as follows:
where are POVMs, such that
The values represent the success probability of the Bell-basis measurement at the intermediate node. We then define the LOCC instrument channel for the probabilistic entanglement swapping protocol as follows:
where
and
Then, the success probability of the protocol is
for every state .
2. GHZ entanglement swapping protocol
The previous example takes a chain of Bell states and transforms them into a Bell state shared by the end nodes of the chain. In this example, we look at a protocol that takes the same chain of Bell states and transforms them instead to a multi-qubit GHZ state, which is defined as130
We call this protocol the GHZ entanglement swapping protocol.
The protocol for transforming a chain of two Bell states to a three-party GHZ state is shown in Fig. 11. First, the two qubits and in the central node are entangled with a CNOT gate, followed by a measurement of in the standard basis (with corresponding POVM ). The result is communicated to B, where the correction operation is applied. The LOCC channel corresponding to this protocol is
where
The GHZ entanglement swapping protocol with one intermediate node. The two qubits in the central node are entangled using the CNOT gate, after which the qubit is measured in the standard basis. The result of the measurement is communicated to B, where the gate is applied.
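The protocol with one intermediate node can be verified numerically. The sketch assumes that the first qubit in the central node is the control of the CNOT gate, that the second (target) qubit is the one measured, and that the correction on B is the Pauli X gate raised to the power of the measurement outcome. These role assignments and the function name are assumptions made for illustration.

```python
import numpy as np


def ghz_swap():
    """Turn two Bell pairs (A-R1 and R2-B) into a GHZ state on A, R1, B.

    Returns a list of (outcome, probability, fidelity with the GHZ state).
    """
    phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
    # Joint state with tensor indices (A, R1, R2, B).
    psi = np.kron(phi, phi).reshape(2, 2, 2, 2)
    # CNOT with control R1 and target R2; indices (R1', R2', R1, R2).
    cnot = np.array(
        [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float
    ).reshape(2, 2, 2, 2)
    psi = np.einsum("ijkl,aklb->aijb", cnot, psi)
    ghz = np.zeros((2, 2, 2))
    ghz[0, 0, 0] = ghz[1, 1, 1] = 1.0 / np.sqrt(2.0)
    pauli_x = np.array([[0.0, 1.0], [1.0, 0.0]])
    results = []
    for x in (0, 1):
        # Standard-basis measurement of R2 with outcome x.
        out = psi[:, :, x, :]
        prob = float(np.sum(np.abs(out) ** 2))
        # Correction X^x on B, followed by normalization.
        correction = np.linalg.matrix_power(pauli_x, x)
        out = np.einsum("cb,arb->arc", correction, out) / np.sqrt(prob)
        results.append((x, prob, abs(np.vdot(ghz, out)) ** 2))
    return results
```

Both outcomes occur with probability 1/2, and after the correction the three remaining qubits are in the GHZ state.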
The protocol shown in Fig. 11, with the corresponding LOCC quantum channel in (187), can be easily extended to a scenario with n > 1 intermediate nodes. In this case, the node starts by applying the gate to its qubits and then measuring the qubit in the standard basis. The outcome of this measurement is sent to the node , and the corresponding correction operation is applied to the qubit . Then, the gate is applied to the qubits at , followed by a standard-basis measurement of and communication of the outcome to and a correction operation on . This proceeds in sequence until the intermediate node , which sends its measurement outcome to B, which applies the appropriate correction operation. The LOCC channel for this protocol is
where
for all . If the input state to this channel is
then the output is a -party GHZ state given by the state vector as defined in (186), i.e.,
Proposition III.2 (Fidelity after GHZ entanglement swapping). For all and for all states , the fidelity of the -party GHZ state with the state after the GHZ entanglement swapping of is
Proof. See Appendix E 2. □
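The one-intermediate-node case can be checked numerically with a short state-vector sketch. The extracted text has lost the qubit labels, so the following assumes the qubit ordering (A, R1, R2, B), a CNOT with control R1 and target R2, a standard-basis measurement of R2, and the correction X^x on B for outcome x. The function names are illustrative and are not taken from the paper.

```python
import math
from itertools import product


def bell_chain():
    """Amplitudes of |Phi+>_{A,R1} (x) |Phi+>_{R2,B}, keyed by bits (a, r1, r2, b)."""
    return {(a, a, b, b): 0.5 for a, b in product((0, 1), repeat=2)}


def cnot(state, control, target):
    """Apply a CNOT gate to a state stored as {bit tuple: amplitude}."""
    out = {}
    for bits, amp in state.items():
        new = list(bits)
        if bits[control] == 1:
            new[target] ^= 1
        out[tuple(new)] = out.get(tuple(new), 0.0) + amp
    return out


def measure_and_remove(state, qubit, outcome):
    """Project `qubit` onto |outcome>, drop it, and return (probability, normalized state)."""
    kept = {}
    for bits, amp in state.items():
        if bits[qubit] == outcome:
            key = bits[:qubit] + bits[qubit + 1:]
            kept[key] = kept.get(key, 0.0) + amp
    prob = sum(abs(v) ** 2 for v in kept.values())
    return prob, {k: v / math.sqrt(prob) for k, v in kept.items()}


def apply_x(state, qubit):
    """Apply a Pauli-X gate (bit flip) to one qubit."""
    return {bits[:qubit] + (bits[qubit] ^ 1,) + bits[qubit + 1:]: amp
            for bits, amp in state.items()}


def ghz_swap(outcome):
    """Run the assumed protocol for a fixed measurement outcome of R2."""
    state = cnot(bell_chain(), control=1, target=2)
    prob, state = measure_and_remove(state, qubit=2, outcome=outcome)
    if outcome == 1:
        state = apply_x(state, 2)  # remaining qubits are (A, R1, B); B has index 2
    return prob, state


def fidelity_with_ghz(state, n):
    """Squared overlap of a pure state with the n-qubit GHZ state."""
    amp = (state.get((0,) * n, 0.0) + state.get((1,) * n, 0.0)) / math.sqrt(2)
    return abs(amp) ** 2
```

Under these assumptions, both outcomes occur with probability 1/2, and after the correction the nodes A, R1, and B hold the three-party GHZ state in either case.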
The GHZ entanglement swapping protocol can be made probabilistic in a manner similar to the entanglement swapping protocol. We start by writing (190) as follows:
where
Then, to make the protocol probabilistic, we can make the following simple modification:
where are POVMs, such that
The values represent the success probability of the standard-basis measurement at the intermediate node. Then, we define the LOCC quantum instrument channel for the GHZ entanglement swapping protocol as follows:
where
and
Then, the success probability of the protocol is
for every state .
3. Graph state distribution protocol
We now consider an example of distributing an arbitrary graph state, which can be viewed as a special case of the procedure considered in Ref. 73. A graph state131–133 is a multi-qubit quantum state defined using graphs.
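Consider a graph $G = (V, E)$, which consists of a set V of vertices and a set E of edges. For the purpose of this example, G is an undirected graph, and E is a set of two-element subsets of V. The graph state $|G\rangle$ is an n-qubit quantum state with $n = |V|$, which is defined as
$$|G\rangle := \frac{1}{\sqrt{2^n}} \sum_{\alpha \in \{0,1\}^n} (-1)^{\frac{1}{2} \vec{\alpha}^{\mathsf{T}} A(G) \vec{\alpha}} |\alpha\rangle,$$
where A(G) is the adjacency matrix of G, which is defined as
$$A(G)_{i,j} := \begin{cases} 1 & \text{if } \{v_i, v_j\} \in E, \\ 0 & \text{otherwise}, \end{cases}$$
and $\vec{\alpha}$ is the column vector $(\alpha_1, \alpha_2, \ldots, \alpha_n)^{\mathsf{T}}$. It is easy to show that
$$|G\rangle = \mathrm{CZ}(G) |+\rangle^{\otimes n},$$
where $|+\rangle := \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and
$$\mathrm{CZ}(G) := \prod_{\{v_i, v_j\} \in E} \mathrm{CZ}_{v_i v_j},$$
with $\mathrm{CZ}_{v_i v_j}$ being the controlled-Z gate acting on the qubits labeled by $v_i$ and $v_j$.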
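The equivalence of the two descriptions of a graph state can be checked numerically for small graphs. The sketch below assumes the standard definitions: amplitudes (-1)^{x^T A(G) x / 2} / sqrt(2^n) in the adjacency-matrix form, and controlled-Z gates applied along every edge to |+>^{⊗n} in the gate form. Vertices are labeled 0, ..., n-1, and the function names are illustrative.

```python
import math
from itertools import product


def graph_state_from_adjacency(n, edges):
    """Amplitudes (-1)^{x^T A x / 2} / sqrt(2^n), keyed by the bit string x."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    state = {}
    for x in product((0, 1), repeat=n):
        # x^T A x is even because A is symmetric with zero diagonal.
        quad = sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))
        state[x] = (-1) ** (quad // 2) / math.sqrt(2 ** n)
    return state


def graph_state_from_cz(n, edges):
    """Start from |+>^n and apply a controlled-Z gate along every edge."""
    state = {x: 1 / math.sqrt(2 ** n) for x in product((0, 1), repeat=n)}
    for i, j in edges:
        # CZ flips the sign of the amplitude exactly when both qubits are 1.
        state = {x: (-a if x[i] and x[j] else a) for x, a in state.items()}
    return state
```

For example, for the four-vertex cycle with edges `[(0, 1), (1, 2), (2, 3), (3, 0)]`, the two functions return the same 16 amplitudes.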
Now, consider the scenario depicted in Fig. 12 in which n = 4 nodes share Bell states with a central node. The task is for the central node to distribute the graph state to the outer nodes. One possible procedure is for the central node to locally prepare the graph state and then to teleport the individual qubits using the Bell states. However, it is possible to perform a slightly simpler procedure that does not require the additional qubits needed to prepare the graph state locally. In fact, the following deterministic procedure produces the required graph state shared by the nodes .
The central node applies to the qubits .
On each of the qubits , the central node performs the measurement defined by the POVM , where . The outcome is an n-bit string , where xi = 0 corresponds to the “+” outcome and xi = 1 corresponds to the “−” outcome. The central node communicates outcome xi to the node Ai.
The nodes Ai apply $Z^{x_i}$ to their qubit. In other words, if xi = 0, then Ai does nothing, and if xi = 1, then Ai applies Z to their qubit.
Depiction of a protocol for distributing a graph state among four nodes , all of which initially share Bell states with the central node.
Let us prove that this protocol achieves the desired outcome. First, we observe that
Then, after the first step, the state is
where we have used the fact that
Then, we find that for every outcome string of the measurement on the qubits , the corresponding (unnormalized) post-measurement state is
Then, using the fact that for all , we find that at the end of the second step, the (unnormalized) state is
for all . From this, we see that up to local Pauli-Z corrections, the post-measurement state is equal to the desired graph state with probability for every measurement outcome string . Once all of the nodes Ai receive their corresponding outcome xi and apply the correction , the nodes share the graph state . As a result of the classical communication of the measurement outcomes and the subsequent correction operations, the protocol is deterministic.
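The three-step procedure can also be checked numerically for a small graph. The sketch below is not the paper's derivation; it assumes that each outer node A_i shares the Bell state (|00> + |11>)/sqrt(2) with a central qubit R_i, so that the joint state is supported on strings in which the outer and central bits agree. Vertices are labeled 0, ..., n-1, and the function names are illustrative.

```python
import math
from itertools import product


def graph_state(n, edges):
    """Amplitudes of CZ(G)|+>^n, keyed by bit strings."""
    return {a: (-1) ** sum(a[i] * a[j] for i, j in edges) / math.sqrt(2 ** n)
            for a in product((0, 1), repeat=n)}


def distribute(n, edges, outcome):
    """Run the three steps for one measurement outcome string.

    The n Bell pairs give sum_a |a>_A |a>_R / sqrt(2^n), so the joint state is
    stored as one amplitude per string a (outer and central bits agree).
    """
    # Step 1: CZ(G) on the central qubits multiplies |a>_A |a>_R by the graph-state sign.
    joint = graph_state(n, edges)

    def sign(a):
        return (-1) ** sum(x * b for x, b in zip(outcome, a))

    # Step 2: project central qubit i onto |+> (x_i = 0) or |-> (x_i = 1), using
    # <+|a_i> = 1/sqrt(2) and <-|a_i> = (-1)^{a_i}/sqrt(2).
    post = {a: amp * sign(a) / math.sqrt(2 ** n) for a, amp in joint.items()}
    prob = sum(v * v for v in post.values())
    # Step 3: node A_i applies Z^{x_i}; the result is then normalized.
    final = {a: v * sign(a) / math.sqrt(prob) for a, v in post.items()}
    return prob, final


def overlap(u, v):
    """Inner product of two real-amplitude states stored as dictionaries."""
    return sum(u[k] * v[k] for k in u)
```

In this model, every one of the 2^n outcome strings occurs with probability 2^{-n}, and the corrected state of the outer nodes has unit overlap with the target graph state, which is the sense in which the protocol is deterministic.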
The protocol described above has the following representation as an LOCC channel:
for every state , where is the Hadamard operator, and we have let
We have also used the abbreviation , and similarly for . Using the fact that
for all , and letting
we can write the channel in the following simpler form:
From this, we see that the protocol can be thought of as measuring the systems according to the POVM and, conditioned on the outcome , applying the correction operation to the systems . Note that is, indeed, a POVM due to the fact that
Proposition III.3 (Fidelity after graph state distribution). For all , every graph G with n vertices, and all two-qubit states , the fidelity of the graph state with the state after the graph state distribution protocol applied to is
where the column vector is given by , with A(G) the adjacency matrix of G.
Proof. See Appendix E 3. □
In order to make the graph state distribution protocol probabilistic, we can make the following modification:
where is a POVM, such that
The value represents the success probability of the measurement defined by the POVM . Then, we define the LOCC quantum instrument channel for the graph state distribution protocol as follows:
where
and
Then, the success probability of the protocol is
for every state .
IV. ANALYSIS OF A QUANTUM NETWORK PROTOCOL
In Secs. II and III, we described in detail how to model elementary links in a quantum network using Markov decision processes. Then, we showed how to model entanglement distillation protocols and joining protocols (such as entanglement swapping) as LOCC channels. The upshot of these developments is that they give us a method for determining the quantum states of elementary and virtual links in a quantum network that depends explicitly on the underlying device parameters and noise processes that characterize the device, thereby allowing us to perform a more realistic analysis of entanglement distribution protocols, as we now show in this section.
In this section, we analyze a simple entanglement distribution protocol. Recall from Sec. I that entanglement distribution refers to the task of creating virtual links (entanglement between non-adjacent nodes) from elementary links, which are entangled states shared by adjacent (physically connected) nodes. An entanglement distribution protocol can be thought of as a graph transformation, as done in Refs. 128 and 134 and depicted in Fig. 1. Starting with the graph of physical links in the network, the goal is to realize a new graph consisting of virtual links in addition to elementary links, such as the graph in the right-most panel of Fig. 1.
The protocol that we consider consists of two steps: generate elementary links and then perform joining protocols based on the given target graph. The protocol is described more formally in Fig. 13. Starting with the graph of elementary links, all of the elementary links independently undergo policies πe, with . After time steps, an algorithm128,134,135 finds paths for creating the virtual links specified by the target graph , and the corresponding joining protocols are performed. If the entire target network cannot be achieved in t time steps, then a decision is made to either conclude the protocol with the current configuration or to continue for another t time steps under the same policies.
Outline of a quantum network protocol based on Markov decision processes. Every elementary link in the network follows a policy for time steps. At the end of the t time steps, the appropriate paths in the network are found, and the corresponding joining protocols are performed in order to achieve the network corresponding to the target graph .
Remark IV.1. Note that in the protocol described in Fig. 13, the virtual links are created only when all of the required elementary links are active. This is of course not the most general procedure because it is in general possible to join some of the elementary links along a path while waiting for others to become active. Handling such general procedures requires developing MDPs for systems of multiple elementary links. While this is the subject of ongoing work, we provide an example of how to extend the elementary-link MDP framework of Sec. II to a system of two elementary links, in which entanglement swapping is included, in Sec. V. We also note that the protocol in Fig. 13 uses fixed routing and path-finding algorithms from Refs. 128, 134, and 135. It is possible, in principle, to develop an MDP that takes into account routing. Doing so would allow us to obtain protocols that simultaneously optimize the actions of the elementary links, the joining operations, and the actions corresponding to routing, either directly using dynamic programing algorithms such as the one in Theorem II.3, or through reinforcement learning. These possibilities, and other possibilities for developing more sophisticated protocols using MDPs, are interesting directions for future work.
A. Fidelity
In order to quantify the performance of the protocol described in Fig. 13, it is natural to ask what the fidelities of the resulting states of the elementary and virtual links are with respect to prescribed target states. Thus, let us begin by showing, in general terms, how we could calculate the fidelity after t time steps of our protocol.
First, we note that all of the elementary links are independent of each other. This is due to the fact that we assume that every node has a separate quantum system for every one of the elementary links associated with that node. Furthermore, we assume that every elementary link undergoes its own policy independent of the other elementary links. Therefore, after t time steps, the quantum state of the network is
where is a collection of policies for the individual elementary links, and every state is given by (59), namely,
Recall from (57) that is the probability of the history ht with respect to the policy πe, and is the quantum state of the elementary link conditioned on the history ht, given by (60).
The state in (230) is a classical-quantum state that contains both classical information about the history of elementary link and the quantum state of the elementary link conditioned on every history. If we condition on an elementary link corresponding to being active at time t, then the expected quantum state of the elementary link at time t is58
From these states, we can calculate the quantum states of the virtual links in the target graph that are created via joining protocols. In general, the states are of the form (166). As a concrete example, let us consider the usual entanglement swapping protocol from Sec. III B 1. Let be a path between two non-neighboring nodes v1 and , such that the entanglement swapping protocol along this path creates the virtual link given by the edge . The quantum state at the input of the entanglement swapping protocol is , and the output state conditioned on the success of the protocol is , where we recall the definition of in (172).
After the appropriate joining protocols are performed, and conditioned on their success, we obtain the target graph , and the corresponding quantum state has the form , where if e is a virtual link, obtained via a joining protocol, then ωe is given by (166). Now, the target quantum state is simply a tensor product of the target states corresponding to the edges of the target graph, i.e., . Therefore, by multiplicativity of fidelity with respect to the tensor product, the fidelity of the quantum state after the protocol is equal to . For the virtual links, individual fidelities in this product can be calculated using the formulas presented in Sec. III B.
B. Waiting time
In addition to the fidelity, another relevant figure of merit is the expected waiting time, which indicates how long it takes (on average) to establish an elementary or a virtual link. This figure of merit has been considered in prior work in the context of both a linear chain of quantum repeaters and general quantum networks.59,65,98,121,136–138
When defining the waiting times, we imagine a scenario in which elementary link generation is continuously occurring in the network,128 and that an end-user request for entanglement occurs at a time . The waiting time is then the number of time steps from time onward that it takes to establish the entanglement.
Definition IV.2 (Elementary link waiting time). Let be the graph corresponding to the elementary links of a quantum network and let . For all , the waiting time for the elementary link corresponding to the edge e is defined to be
Then, the expected waiting time is
where π is an arbitrary policy for the elementary link corresponding to the edge e.
We make the following definition for the waiting time for a collection of elementary links.
Definition IV.3 (Collective elementary link waiting time). Let be the graph corresponding to the elementary links of a quantum network, and let . For every subset , the waiting time for the elementary links corresponding to the elements of is defined to be
where .
In other words, the collective elementary link waiting time is the time it takes for all of the elementary links given by to be simultaneously active, and its expected value is
where is an arbitrary collection of policies for the elementary links corresponding to . If we consider a collection of elementary links, all undergoing the memory-cutoff policy, then
Proofs of this result using various different techniques can be found in Refs. 65, 98, and 139. In Appendix F, we prove this result within the framework introduced here by explicitly evaluating the formula in (235).
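Two limiting cases of the collective waiting time are easy to evaluate and give a useful sanity check. The sketch below does not reproduce the general memory-cutoff formula referred to above. It assumes k independent elementary links with the same success probability p per time step, counts the waiting time as the number of attempts (starting from one), and considers only (i) links that are never discarded, so that the waiting time is the maximum of k geometric random variables, and (ii) links that cannot be stored at all, so that all k attempts must succeed in the same time step.

```python
from math import comb


def expected_wait_infinite_cutoff(p, k):
    """E[max of k i.i.d. geometric(p) variables], by inclusion-exclusion."""
    return sum((-1) ** (j + 1) * comb(k, j) / (1 - (1 - p) ** j)
               for j in range(1, k + 1))


def expected_wait_series(p, k, tol=1e-15):
    """Same quantity from E[W] = sum_{t >= 0} Pr[W > t], with Pr[W <= t] = (1 - (1-p)^t)^k."""
    total, t = 0.0, 0
    while True:
        tail = 1 - (1 - (1 - p) ** t) ** k
        if tail < tol:
            return total
        total += tail
        t += 1


def expected_wait_zero_cutoff(p, k):
    """All k links must succeed simultaneously, so W is geometric with parameter p^k."""
    return 1 / p ** k
```

For k = 1 both limits reduce to 1/p, and for k = 2 the first limit gives (3 - 2p) / (p (2 - p)). Any finite cutoff lies between the two limits, which is one way to sanity-check an implementation of the general formula.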
Definition IV.4 (Virtual link waiting time). Let be the graph corresponding to the elementary links of a quantum network, and let . Given a pair of distinct non-adjacent vertices and a path between them for some , the virtual link waiting time along this path is defined to be the amount of time it takes to establish the virtual link given by the edge ,
where is the set of edges corresponding to the path w, is the collective elementary link waiting time from Definition IV.3, and is a binary random variable for the success of the joining protocol along the path w, so that Yw = 1 corresponds to success of the joining protocol and Yw = 0 to failure. We define Yw and to be independent random variables.
The formula for the virtual link waiting time in Definition IV.4 is based on the formula in Ref. 59. It corresponds to the simple strategy of waiting for all of the elementary links along the path w to be established, and performing the measurements for the joining protocol. Note that this strategy is consistent with our overall quantum network protocol in Fig. 13.
C. Key rates for quantum key distribution
In order to determine secret key rates between arbitrary pairs of nodes in a quantum network, we need to keep track of the quantum state of the relevant elementary links as a function of time. The following discussion and formulas for secret key rates are based on Ref. 113.
Suppose that K is a function that gives the number of secret key bits per entangled state shared by the nodes of either an elementary link or a virtual link. (K is, for example, the formula for the asymptotic secret key rate of the BB84, six-state, or device-independent protocol.) Then, suppose that is the graph corresponding to the elementary links of a quantum network. Consider a collection of distinct nodes corresponding to a virtual link for some , and let w be a path in the physical graph leading to the virtual link given by . An entanglement swapping protocol is performed along the path w in order to establish the bipartite virtual link. Conditioned on the success of the joining protocol, the quantum state of the virtual link is given by (166), namely,
where
is the success probability of the joining protocol. Then, the secret key rate (in units of secret key bits per second) for the virtual link along the path w is
Here, K is calculated using the state in (238). The repetition rate in this case is a function of the end-to-end classical communication time required for executing the joining protocol.
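As an illustration of what the function K might look like, the following sketch implements the commonly quoted asymptotic key fraction of the BB84 protocol, 1 - 2h(Q), where h is the binary entropy and Q is the quantum bit error rate. This is only one possible choice of K. The sketch assumes that the error rate is the same in both bases, and it does not show how Q is obtained from the state of the virtual link, which depends on the state in (238).

```python
import math


def binary_entropy(q):
    """Binary entropy h(q) in bits, with h(0) = h(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def bb84_key_fraction(qber):
    """Asymptotic BB84 key fraction 1 - 2 h(Q), clipped at zero."""
    return max(0.0, 1.0 - 2.0 * binary_entropy(qber))
```

The key fraction equals one for an error-free link and drops to zero at an error rate of roughly 11%, so a virtual link whose state is too noisy contributes no key regardless of how quickly it is generated.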
V. A MARKOV DECISION PROCESS BEYOND THE ELEMENTARY LINK LEVEL
The developments so far in this work constitute an analysis of quantum networks using a Markov decision process (MDP) for elementary links. As we have seen, the framework of MDPs is useful because it allows us to model noise processes and imperfections that are present in near-term quantum technologies, and thus, allows us to understand the limits on the performance of near-term quantum networks. An important question is how useful the MDP formalism will be in practice when scaling up to model systems of more than one elementary link. In this section, we provide an MDP for a system of two elementary links, taking entanglement swapping into account. We note that in the recent work,140 MDPs for repeater chains with two, three, and four elementary links have been considered, but the definition of the MDP here differs from the one in Ref. 140 because here we take decoherence of the quantum memories into account.
We start this section by defining the basic elements of the MDP, and then, we show how to obtain optimal policies using linear programing. In particular, we formulate the optimal expected waiting time to obtain the end-to-end virtual link and the optimal expected fidelity of the end-to-end virtual link as linear programs. Then, we show that prior analytical results on the expected waiting time for two elementary links under the memory-cutoff policy,59 known only in the “symmetric” scenario when the two elementary links have the same transmission-heralding success probability and the same memory cutoff, can be reproduced. However, we note that our linear programing procedure can be applied even in non-symmetric scenarios.
A. An MDP for two elementary links
Let p1 and p2 be the success probabilities for generating the two elementary links, and let q be the probability of successful entanglement swapping. Note that p1 and p2 are defined exactly as in Sec. II A. In particular,
where are the completely positive maps corresponding to the success of the heralding procedure for the elementary link, is the transmission channel from the source to the nodes for the elementary link, and is the state produced by the source associated with the elementary link, see Fig. 14. We also define the states
where is the quantum channel describing the decoherence of the quantum memories associated with the elementary link.
Two elementary links with entanglement swapping at the central node.
Now, recall that in the case of the one elementary link considered in Sec. II B, the state variable was just the memory time M(t), referring to the time for which the quantum state of the elementary link was held in the memories of the nodes, and the actions consisted of either keeping the elementary link or discarding it and generating a new one. Now, in the case of two elementary links, we must keep track of the memory time of both elementary links, and we also store information about whether or not the virtual (end-to-end) link is active. The actions are similar to before, consisting of the same elementary link actions as before, but now, we define an additional action for performing the entanglement swapping operation. Formally, we have the following:
States: The states of the MDP are elements of the set , where indicates whether or not the end-to-end link is active, is the set of possible states of the first elementary link (with the elements of the set having the same interpretation as in the elementary link MDP), and is the set of possible states of the second elementary link. In particular, and are the maximum storage times of the two elementary links, corresponding to their coherence times, see Sec. II B. To these states, we associate the (standard) probability simplex spanned by the orthonormal vectors , with , and , and we often use the abbreviation for every .
We use , to refer to the random variables (taking values in ) corresponding to the state of the MDP.
Actions: The set of actions is , where the different actions have the following meanings:
00: Keep both elementary links.
01: Keep the first elementary link; discard and regenerate the second.
10: Discard and regenerate the first elementary link; keep the second.
11: Discard and regenerate both elementary links.
: Perform entanglement swapping.
We use A(t), , to refer to the random variables (taking values in the set ) corresponding to the actions taken.
We let be the history, consisting of a sequence of states and actions, up to time , with .
Figure of merit: For the elementary link MDP defined in Sec. II B, recall that the figure of merit was essentially the fidelity of the elementary link, but scaled by a factor corresponding to the probability that the elementary link is active. We define the figure of merit here, in (245), in an analogous fashion as follows:
where we recall that is the entanglement swapping channel for one intermediate node, as defined in Sec. III B 1, and is a target pure state vector, which, in this context, is typically the maximally entangled state as defined in (168).
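The state and action sets described above can be enumerated directly. The sketch below assumes that each elementary link takes states in {-1, 0, 1, ..., m*}, with -1 meaning "inactive" and m* the maximum storage time, as in the elementary link MDP. The label `"swap"` is a hypothetical stand-in for the entanglement swapping action, whose symbol is not reproduced here.

```python
from itertools import product

SWAP = "swap"  # hypothetical label for the entanglement swapping action
ACTIONS = ("00", "01", "10", "11", SWAP)


def link_states(m_star):
    """States of one elementary link: -1 (inactive) or a memory time 0..m_star."""
    return tuple(range(-1, m_star + 1))


def mdp_states(m1_star, m2_star):
    """All triples (x, m1, m2), where x = 1 means the end-to-end link is active."""
    return tuple(product((0, 1), link_states(m1_star), link_states(m2_star)))


def is_absorbing(state):
    """States with an active end-to-end link are the absorbing states."""
    return state[0] == 1
```

With maximum storage times m1* and m2*, this gives 2 (m1* + 2)(m2* + 2) states, half of which are absorbing, so the MDP remains small for two links but grows quickly as links are added.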
Let us now proceed to the definition of the transition matrices for our MDP. Unlike the elementary link scenario, in this scenario of two elementary links, we want not only for the fidelity and success probability of the end-to-end link to be high but also for the average amount of time it takes to generate the end-to-end link to be low—in other words, we want the expected waiting time to be low as well. Therefore, in order to address the expected waiting time in our MDP, we define the transition matrices in such a way that states corresponding to an active end-to-end link [i.e., states such that x = 1] are absorbing states. By doing this, the expected waiting time is nothing but the expected time to absorption, which is a standard result in the theory of Markov chains, see, e.g., Ref. 141. We note that this idea of relating the expected waiting time of a quantum repeater chain to the absorption time of a Markov chain has already been used in Ref. 121; however, here, we apply this idea in the more general context of an MDP, while also taking memory decoherence and other device imperfections explicitly into account.
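The expected time to absorption can be computed by solving a linear system: if Q is the block of transition probabilities among the transient states, then the vector t of expected absorption times satisfies (I - Q) t = 1. The sketch below illustrates this on a toy chain that is not the MDP of this section. It has two transient states, "no link" and "link ready", with a generation success probability p and a swapping success probability q; a failed swap returns the chain to "no link". It uses the row-stochastic convention, in which Q[i][j] is the probability of moving from state i to state j, which is the transpose of the convention in Appendix A.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def expected_absorption_times(q_block):
    """Solve (I - Q) t = 1 for the expected number of steps until absorption."""
    n = len(q_block)
    lhs = [[(1.0 if i == j else 0.0) - q_block[i][j] for j in range(n)]
           for i in range(n)]
    return solve(lhs, [1.0] * n)


def toy_chain(p, q):
    """Transient block of the toy chain; the missing row weight goes to the absorbing state."""
    return [[1 - p, p],      # no link: stay with prob. 1 - p, become ready with prob. p
            [1 - q, 0.0]]    # link ready: failed swap (prob. 1 - q) returns to "no link"
```

For this toy chain the expected time to absorption from "no link" is (1 + p) / (p q), so that `expected_absorption_times(toy_chain(0.5, 0.5))` gives approximately `[6.0, 4.0]`.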
Let denote the transition matrix for the elementary link, as defined in (48) and (49), for . Then, using those elementary link transition matrices, we define the transition matrices for our MDP for two elementary links as follows:
where
and is defined exactly as in (53).
First, let us observe that every transition matrix has a block structure, with the blocks defined by the transitions of the status of the end-to-end link. Specifically, we can write every transition matrix Ta as
where the sub-blocks are the block corresponding to the transition of the status of the virtual link from to . (We note, as before, that probability vectors are applied to transition matrices from the right, see Appendix A.) From this, we see that for the actions , the transition matrices are of the following block-diagonal form:
Therefore, for these transition matrices, because the entanglement swapping action is not performed, the transition from x = 0 to x = 1 is not possible. Consequently, if the end-to-end link is initially inactive (x = 0), then it stays inactive, and each elementary link transitions independently according to the elementary link transition matrices from Sec. II B. If the end-to-end link is initially active (x = 1), then nothing happens to the states of the elementary links, in accordance with the definition of an absorbing state. For the action of entanglement swapping, we have three non-zero blocks. The block means that the end-to-end link is initially inactive and stays inactive, which can happen in one of several ways:
Both elementary links are initially active, but the entanglement swapping fails, after which both elementary links are regenerated. This possibility is given by the term .
Both elementary links are initially inactive. In this case, they both remain inactive after the entanglement swapping action, and this is given by the term .
One of the elementary links is active, but the other is not. In this case, the memory time of the active elementary link is incremented by one, corresponding to the “shift” operator Sj on the active elementary link, while the inactive elementary link remains inactive. These possibilities are given by the terms and .
One of the elementary links is inactive, and the other has reached its maximum memory time. In this case, the inactive elementary link remains inactive, and the other elementary link transitions to the −1 state because the maximum time was reached. These possibilities are given by the terms and .
The block corresponds to a transition from the end-to-end link initially being inactive to being active, which happens when the entanglement swapping succeeds. Since the entanglement swapping is possible only when both elementary links are active, and because we want to keep track of the memory times of the elementary links at the moment the entanglement swapping is performed, this block is given by . Finally, the block corresponds to the end-to-end link being active already; thus, in accordance with the definition of an absorbing state, this block is given simply by , as with the other actions.
Now, just as we defined a memory-cutoff policy for elementary links in Sec. II C 1, we can define a memory-cutoff policy for the system of two elementary links that we are considering here. Suppose that the first elementary link has cutoff time , and the second elementary link has cutoff time . Then, we define the decision function such that if both elementary links are active, then an entanglement swap is attempted; otherwise, one of the actions 01, 10, or 11 is performed, depending on which elementary links are active. This leads to the following definition of the deterministic decision function:
for all and . Note that it is only necessary to define the decision function on the transient states and not the absorbing states because the figures of merit that we are concerned with [such as the expected value of the function f in (245) and the expected waiting time to absorption] do not depend on the values of the decision function on absorbing states.
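The verbal description of the memory-cutoff decision function can be turned into a short sketch. Because the defining equation is not reproduced here, the boundary conventions below are assumptions: a link is taken to be active when its memory time is non-negative, it is kept while its memory time is strictly below its cutoff, and it is discarded and regenerated when it is inactive (memory time -1) or has reached the cutoff. The label `"swap"` is a hypothetical stand-in for the entanglement swapping action.

```python
SWAP = "swap"  # hypothetical label for the entanglement swapping action


def link_bit(m, cutoff):
    """Per-link action bit: "0" = keep, "1" = discard and regenerate."""
    return "0" if 0 <= m < cutoff else "1"


def memory_cutoff_decision(m1, m2, cutoff1, cutoff2):
    """Action for the transient state (x = 0, m1, m2) under the assumed conventions."""
    if m1 >= 0 and m2 >= 0:
        return SWAP  # both elementary links are active: attempt the swap
    # At least one link is inactive, so the result is one of "01", "10", "11".
    return link_bit(m1, cutoff1) + link_bit(m2, cutoff2)
```

For example, with both cutoffs equal to 3, the state (m1, m2) = (0, -1) maps to "01" (keep the first link, regenerate the second), while (3, -1) maps to "11" because the first link has reached its cutoff.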
B. Optimal policies via linear programing
Having defined the basic elements of the MDP for two elementary links with entanglement swapping, let us now look at optimal policies. We are concerned both with the figure of merit defined in (245) and with the expected waiting time to obtain an end-to-end link. In Appendix A 4, we show that both quantities can be bounded using linear programs. In fact, the results in Appendix A 4 go beyond the MDP for two elementary links that we consider here because the linear programs apply to general MDPs with arbitrary state and action sets and transition matrices.
Theorem V.1 (Linear program for the optimal expected value for two elementary links). Given a system of two elementary links, along with the associated MDP defined in Sec. V A, the optimal expected value of the function f defined in (245) is bounded from above by the following linear program:
where the optimization is with respect to the -dimensional vectors , and the inequality constraints are component-wise. Every set of feasible points of this linear program defines a stationary policy with the decision function d, whose values for the transient states are as follows:
for all , and . If , then we can set to be an arbitrary probability distribution over the set of actions.
Remark V.2. Note that in the theorem statement above, we defined the action of the decision function only for the transient states. For the absorbing states, we can set the decision function to be arbitrary because neither the expected value of the MDP nor the expected waiting time to absorption is affected by the value of the decision function on absorbing states, see Appendix A 3.
Theorem V.3 (Linear program for the optimal expected waiting time for two elementary links). Given a system of two elementary links, along with the associated MDP defined in Sec. V A, the optimal expected waiting time is bounded from below by the following linear program:
where the optimization is with respect to the -dimensional vectors and , and the inequality constraints are component-wise. Every set of feasible points of this linear program defines a stationary policy with the decision function d, whose values for the transient states (see Remark V.2) are as follows:
for all , and . If , then we can set to be an arbitrary probability distribution over the set of actions.
We now show that the linear program in (260) reproduces the known analytical result in Ref. 59, Eq. (5), for the expected waiting time for two elementary links with the same success probability p and cutoff time ,
In Fig. 15, we plot this function along with the optimal value obtained for the linear program in (260). We find that the two curves coincide for all values of the transmission-heralding probability p and the entanglement swapping success probability q considered. This provides us not only a sanity check on the linear program but also evidence that the memory-cutoff policy in (257) is optimal, at least in the symmetric scenario. We also note that the result in (262) holds only in the symmetric scenario in which both elementary links have the same transmission-heralding success probability, while the linear program in (260) can be used to determine the optimal expected waiting time in arbitrary parameter regimes.
The expected waiting time for an end-to-end link for a system of two elementary links, as depicted in Fig. 14. We let be the transmission-heralding success probability for both elementary links, and we denote by q the success probability for entanglement swapping. We compare the known analytical result for this scenario [Ref. 59, Eq. (5)], with cutoff [see (262)], to the solution obtained by the linear program in (260), with maximum storage time .
VI. SUMMARY AND OUTLOOK
The central topic of this work is the theory of near-term quantum networks, specifically how to describe them and how to develop protocols for entanglement distribution in practical scenarios with near-term quantum technologies. The goal in this area of research is to develop protocols that can handle multiple-user requests, work for any given network topology, and adapt to changes in topology and attacks to the network infrastructure, with the ultimate goal being the realization of the quantum internet. In this work, we have laid some of the foundations for this research program. The core idea is that Markov decision processes (MDPs) provide a natural setting in which to analyze near-term quantum network protocols. We illustrated this idea in this work by first analyzing the MDP for elementary links introduced in Ref. 58, simplifying its formulation and presenting some new results about it. We then considered the example of satellite-to-ground elementary link generation under the lens of the elementary link MDP. We then showed how the elementary link MDP can be used as part of an overall quantum network protocol. Finally, we provided a first step toward using the MDP formalism for more realistic, larger networks, by providing an MDP for two elementary links. We showed that important figures of merit such as the fidelity of the end-to-end link as well as the expected waiting time for the end-to-end link can be obtained using linear programs.
Moving forward, there are many interesting directions to pursue. The MDPs introduced in this work are not entirely general because they do not model protocols for arbitrary repeater chains and arbitrary networks. Thus, as a starting point, extending the MDP for two elementary links to repeater chains of arbitrary length is an interesting direction for future work. In this direction, we expect that linear and possibly even semidefinite relaxations of the expected value of the end-to-end link and of the expected waiting time, such as those in Theorem V.1 and Theorem V.3, will be crucial in the analysis of longer repeater chains because the size of the MDP (the number of states and actions) grows exponentially with the number of elementary links.
Going beyond repeater chains to general quantum networks, it is of interest to examine protocols involving multiple cooperating agents. When we say that agents “cooperate,” we mean that they are allowed to communicate with each other. In the context of quantum networks, agents who cooperate have knowledge beyond that of their own nodes. If every agent cooperates with an agent corresponding to a neighboring elementary link, then the agents would have knowledge of the network in their local vicinity, and this would, in principle, improve waiting times and rates for entanglement distribution. Furthermore, the quantum state of the network would not be a simple tensor product of the quantum states corresponding to the individual edges, as we have in (229) when all the agents are independent. See Refs. 128 and 135 for a discussion of nodes with local and global knowledge of a quantum network in the context of routing.
Finally, another interesting direction for future work is to develop quantum network protocols based on decision processes that incorporate queuing models for requests for links of a specific type between specific nodes; see, e.g., Refs. 90 and 142. One could then calculate quantities such as the time needed to fulfill all requests, as well as the “capacity” of the network, defined in the context of queuing systems as the maximum number of requests that can be fulfilled per unit time.
ACKNOWLEDGMENTS
Much of this work is based on the author's Ph.D. thesis research,143 which was conducted at the Hearne Institute for Theoretical Physics, Department of Physics and Astronomy, Louisiana State University. During this time, financial support was provided by the National Science Foundation and the Natural Sciences and Engineering Research Council of Canada Postgraduate Scholarship. The plots in this work were made using the Python package matplotlib.144
AUTHOR DECLARATIONS
Conflict of Interest
The author has no conflicts to disclose.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX A: OVERVIEW OF MARKOV DECISION PROCESSES
In this section, we provide a brief overview of the concepts from the theory of Markov decision processes (MDPs) that are relevant for this work. We mostly follow the definitions and results as presented in Ref. 145 while using the notation defined in Appendix A 1.
1. Notation
Throughout this work, we deal with probability distributions defined on a discrete, finite set of points. It is very helpful to write these probability distributions as vectors in a (standard) probability simplex. We do this as follows. Consider a finite set . To this set, we associate the orthonormal vectors in , which means that for all . The probability simplex corresponding to is then formally defined as all convex combinations of the vectors in ,
This set is in one-to-one correspondence with the set of all probability distributions defined on . Specifically, let be a probability distribution (probability mass function) on , i.e., for all and . The unique probability vector corresponding to P is
We drop the subscript from whenever the underlying set is clear from the context. We emphasize that the vector does not represent a quantum state; the bra-ket notation is used merely for convenience. Normalization of the probability vector is then captured by defining the following vector:
We often omit the subscript in when the underlying set is clear from the context. Then
It is often the case that a probability distribution is associated with a random variable X taking values in , so that for all . In this case, for brevity, we sometimes write the probability vector as
Now, consider another random variable Y taking values in the finite set . We regard stochastic matrices mapping X to Y (i.e., matrices of conditional probabilities ) as linear operators with domain and codomain ,
and we denote the matrix elements by
We then have, by definition of a stochastic matrix,
which captures the fact that the columns of a stochastic matrix sum to one. Then, if is a probability distribution corresponding to X, then the action of the matrix on , which results in the probability distribution corresponding to Y, can be written as
In particular, for all
Finally, we discuss joint probability distributions. Consider two finite sets and and the set of all (joint) probability distributions on . Now, because , we can regard as the convex span (convex hull) of tensor product orthonormal vectors . Thus, every can be written as
We frequently use the abbreviation in this paper. Then, marginal distributions can be obtained as follows:
where
These concepts for probability distributions defined on two sets can be readily extended to probability distributions defined on sets of the form for all .
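To make the notation concrete, the following is a minimal Python sketch, not part of the formalism itself. It represents probability vectors as lists, represents a stochastic matrix as a list of rows whose columns sum to one, and obtains marginals from a joint distribution. The example sets, the numerical values, and all function names are hypothetical and chosen only for illustration. Exact rational arithmetic is used so that normalization can be checked without rounding error.

```python
from fractions import Fraction as F

# Hypothetical finite sets on which X and Y take values.
X = ["x0", "x1", "x2"]
Y = ["y0", "y1"]

def is_probability_vector(p):
    """True if all entries are non-negative and they sum to one."""
    return all(v >= 0 for v in p) and sum(p) == 1

def is_column_stochastic(T):
    """True if every column of T (stored as a list of rows) is a probability vector."""
    return all(is_probability_vector(col) for col in zip(*T))

def apply_matrix(T, p):
    """Matrix-vector product T p, giving the distribution of Y from that of X."""
    return [sum(t * v for t, v in zip(row, p)) for row in T]

# Probability vector of X: the entry at position i is Pr[X = X[i]].
p_X = [F(1, 2), F(1, 4), F(1, 4)]

# T[j][i] = Pr[Y = Y[j] | X = X[i]]; rows are indexed by Y, columns by X.
T = [
    [F(1, 2), F(1, 3), F(0)],
    [F(1, 2), F(2, 3), F(1)],
]

p_Y = apply_matrix(T, p_X)

# Joint distribution P_XY[i][j] = Pr[X = X[i], Y = Y[j]].
P_XY = [[T[j][i] * p_X[i] for j in range(len(Y))] for i in range(len(X))]

# Marginals: sum out the other variable.
marginal_X = [sum(row) for row in P_XY]
marginal_Y = [sum(col) for col in zip(*P_XY)]
```

In this sketch, summing the joint distribution over one variable recovers the probability vector of the other, which is the list-based counterpart of contracting the joint probability vector with the normalization vector on one factor.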
2. Definitions
A Markov decision process (MDP) is a stochastic process that models the evolution of a system with which an agent is allowed to interact. Formally, an MDP is defined as a collection,
consisting of the following elements:
A set of the allowed states of the system. We consider finite state sets throughout this work. The sequence of random variables taking values in describes the state of the system at all times .
A set of actions that the agent is allowed to perform on the system. We consider finite action sets throughout this work. The sequence of random variables taking values in describes the action taken by the agent at all times .
A set of transition matrices, which are stochastic matrices with domain and codomain . Specifically, (A18)
for all . These matrices determine how the system evolves from one time to the next conditioned on the actions of the agent.
A function that quantifies the reward that the agent receives at every time step based on the current state of the system and the action that it takes.
The history up to time of an MDP is the random sequence , with . By the Markovian nature of an MDP, the probability distribution of every history is equal to
where
is the probability distribution of actions at time j conditioned on the current state of the system. We refer to as a decision function. Note that for all . The sequence
of decision functions at all times is known as a policy of the agent. In the context of this work, policies should be thought of as synonymous with protocols for quantum networks.
Given a decision function d, we define the following linear operators acting on :
Then, it is straightforward to show that the linear operator
from to is a stochastic matrix with elements
for all and all .
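As an illustration, the following Python sketch builds the transition matrix associated with a decision function for a hypothetical MDP with two states and two actions. It assumes the standard element-wise form, in which the entry for a transition from state s to state s' is the sum over actions a of the transition probability under a, weighted by the probability that the decision function assigns to a in state s. The state labels, action names, and numerical values are invented for this example.

```python
from fractions import Fraction as F

# Hypothetical MDP: two states (used directly as matrix indices) and two actions.
states = [0, 1]
actions = ["wait", "request"]

# T[a][s_next][s] = Pr[next state is s_next | current state s, action a].
# Each matrix is column-stochastic: every column sums to one.
T = {
    "wait": [[F(1), F(1, 4)],
             [F(0), F(3, 4)]],
    "request": [[F(1, 2), F(1, 2)],
                [F(1, 2), F(1, 2)]],
}

# Decision function: d[s][a] = probability of taking action a in state s.
d = {
    0: {"wait": F(0), "request": F(1)},
    1: {"wait": F(2, 3), "request": F(1, 3)},
}

def policy_transition_matrix(T, d, states, actions):
    """Return P with P[s_next][s] = sum over a of T[a][s_next][s] * d[s][a]."""
    return [
        [sum(T[a][s_next][s] * d[s][a] for a in actions) for s in states]
        for s_next in states
    ]

P_d = policy_transition_matrix(T, d, states, actions)

# Each column of P_d is a convex combination of columns of the T[a],
# so P_d is again column-stochastic.
column_sums = [sum(col) for col in zip(*P_d)]
```

Because the weights in each state sum to one, every column of the resulting matrix is a convex combination of the corresponding columns of the transition matrices, which is why the result is again a stochastic matrix.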
Remark A.1. Observe that for a fixed decision function d, the set of linear operators defined in (A22) forms a positive operator-valued measure (POVM). Indeed, by definition, all of the operators are positive semidefinite; furthermore, by definition of the decision function in (A20)
The transition matrices Pd as defined in (A23) allow us to determine the probability distribution of the state of the system at every time for a given policy. Specifically, for a policy
where
is the probability distribution for the system at the initial time t = 1.
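The evolution of the state distribution under a policy can be sketched in a few lines of Python. The sketch assumes that the distribution at a later time is obtained by applying the stochastic matrices of the successive decision functions, in order, to the initial probability vector. The matrix and initial distribution below are hypothetical, and the same decision function is used at both time steps for simplicity.

```python
from fractions import Fraction as F

def apply_matrix(P, p):
    """Matrix-vector product P p for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, p)) for row in P]

def state_distribution(policy_matrices, p_initial):
    """Apply the matrices of the decision functions d_1, d_2, ... in order.

    policy_matrices[0] is applied first, so the result is
    P_k ... P_2 P_1 p_initial for a list of k matrices.
    """
    p = list(p_initial)
    for P in policy_matrices:
        p = apply_matrix(P, p)
    return p

# Hypothetical column-stochastic matrix for a single decision function.
P_d = [[F(1, 2), F(1, 3)],
       [F(1, 2), F(2, 3)]]

# Initial distribution at t = 1: the system starts in state 0 with certainty.
p_1 = [F(1), F(0)]

p_2 = state_distribution([P_d], p_1)
p_3 = state_distribution([P_d, P_d], p_1)
```

A time-dependent policy is handled by passing a list of different matrices, one per time step.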
a. MDPs with absorbing states
We call a state absorbing if for all . In other words, once the system reaches the state s, it always stays there, meaning that for all decision functions d. Every state that is not absorbing is called transient if there is non-zero probability that, starting from such a state, the system will eventually reach an absorbing state. We can partition the set of all states into disjoint sets: , where is the set of absorbing states and is the set of transient states. We can then rewrite the set as , leading to the following block structure for the transition matrices Ta:
where is the block describing transitions between transient states, is the block describing transition between an absorbing state and a transient state, is the block describing transitions between a transient state and an absorbing state, and is the block describing transitions between absorbing states. Note that by our definition of an absorbing state, and for all . Similarly, for a decision function d, we can write the matrices , in block form as
Consequently, the transition matrix Pd in (A23) has the following form:
where
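The partition into absorbing and non-absorbing states described in this subsection can be sketched in Python as follows. The three-state MDP, its action names, and the helper functions are hypothetical. The sketch identifies absorbing states as those that are mapped to themselves with probability one under every action, and then extracts the four blocks of a transition matrix. It does not check that the remaining states can actually reach an absorbing state, so they are labeled "non-absorbing" here rather than "transient."

```python
from fractions import Fraction as F

# Hypothetical MDP with three states (used as matrix indices) and two actions.
states = [0, 1, 2]
actions = ["wait", "request"]

# T[a][s_next][s] = Pr[next state is s_next | current state s, action a].
T = {
    "wait": [[F(1), F(0), F(0)],
             [F(0), F(1), F(0)],
             [F(0), F(0), F(1)]],
    "request": [[F(1, 2), F(0), F(0)],
                [F(1, 2), F(1, 2), F(0)],
                [F(0), F(1, 2), F(1)]],
}

def absorbing_states(T, states, actions):
    """States that map to themselves with probability one under every action."""
    return [s for s in states if all(T[a][s][s] == 1 for a in actions)]

def split_blocks(M, non_absorbing, absorbing):
    """Split M[s_next][s] into blocks, keyed as '<from>_to_<to>'."""
    def sub(rows, cols):
        return [[M[r][c] for c in cols] for r in rows]
    return {
        "non_absorbing_to_non_absorbing": sub(non_absorbing, non_absorbing),
        "absorbing_to_non_absorbing": sub(non_absorbing, absorbing),
        "non_absorbing_to_absorbing": sub(absorbing, non_absorbing),
        "absorbing_to_absorbing": sub(absorbing, absorbing),
    }

absorbing = absorbing_states(T, states, actions)
non_absorbing = [s for s in states if s not in absorbing]
blocks = {a: split_blocks(T[a], non_absorbing, absorbing) for a in actions}
```

For every action, the block for transitions out of an absorbing state into a non-absorbing state is zero and the absorbing-to-absorbing block is the identity, which matches the observation in the text.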
3. Figures of merit
While the primary figure of merit in a Markov decision process is the expected reward, in this work, we are mostly interested in what we call functions of state (such as the fidelity) and the absorption time (corresponding to the waiting time for a virtual link).
Functions of state. In this work, we are also interested in functions of the state of the system. We can associate to such functions the vector
Then, for a policy , we are interested in the expected value of the random variable for all , i.e., the quantity