Irreversibility is usually captured by a comparison between the process that happens and a corresponding “reverse process.” In the last decades, this comparison has been extensively studied through fluctuation relations. Here, we revisit fluctuation relations from the standpoint, suggested decades ago by Watanabe, that the comparison should involve the prediction and the retrodiction on the unique process, rather than two processes. We identify a necessary and sufficient condition for a retrodictive reading of a fluctuation relation. The retrodictive narrative also brings to the fore the possibility of deriving fluctuation relations based on various statistical divergences, and clarifies some of the traditional assumptions as arising from the choice of a reference prior.
I. PROCESSES VERSUS INFERENCES
Quantitative approaches to irreversibility traditionally involve a comparison between the process physically happening, usually called forward process, and a corresponding reverse (or backward) process. The definition of the latter is intuitive only for some processes: somewhat ironically, not for those that are paradigmatic of irreversibility. Indeed, consider the erasure channel, which sends every possible input state to a unique, fixed output state: what should one take as its reverse process?
In a previous paper,1 two of us proposed to look at irreversibility as arising out of our logical inference, rather than out of physical processes. Specifically, we proposed to define the reverse process in terms of Bayesian retrodiction. This is a universal recipe. This retrodictive element can be identified a posteriori in all previously reported fluctuation relations that we checked, including the most famous ones, both classical2–5 and quantum,6 that are highlighted in the many available reviews.7–10 Besides recovering “intuitive” reverse processes, retrodiction provides a definition for the non-intuitive ones, which smoothly removes anomalies that were reported with other tentative definitions. Thus, it seems plausible that all fluctuation relations can be understood in terms of retrodiction (though the literature is too vast and sparse to make a definitive call, we shall strengthen the evidence with Result 3 below).
In the pursuit of this line of research, we recognize that our previous paper was not radical enough. If the retrodictive origin of irreversibility is assumed, the narrative of the two processes becomes superfluous: using retrodiction to define a reverse process is an unnecessary step. There is only one process, the one that happens; what is being compared are our forward and backward inferences on it: prediction and retrodiction.
The replacement of irreversibility with irretrodictability was pioneered by Watanabe,11,12 though prior to our previous work no connection had been drawn with the fluctuation theorems derived in the last twenty years. Under this change of viewpoint, it is the same physics that is being described, freed from excess baggage in the narrative (and thus, possibly, in the interpretation). Besides epistemological economy, we are going to show that this viewpoint is fruitful as it opens previously unnoticed vistas.
The plan of this paper is as follows. In Sec. II, we present a self-contained introduction to retrodiction, both classical and quantum; and Sec. III describes two case studies in detail. Section IV deals with fluctuation relations: we show that these relations are intimately related to statistical distances (“divergences”) and that Bayesian retrodiction arises from the requirement that the fluctuating variable can be computed locally. We also compare the fluctuation relations obtained in the retrodictive narrative with those obtained in the reverse-process narrative. Section V reflects back on the structure of retrodiction, elaborating on the role of the unavoidable reference prior.
A word on the presentation. This paper covers topics from statistics, thermodynamics, and quantum information. We have tried to keep the presentation self-contained. We have also adopted a compact approach to references: besides those that prove specific results, we shall cite mostly reference books and reviews, and occasionally a few works that we consider clear and exemplary, useful as entry points for the reader, without any expectation of being exhaustive.
II. RETRODICTION: GENERALITIES
In this paper, we consider processes with discrete alphabets. The input of the channel is denoted $x \in \mathcal{X}$ and the output $y \in \mathcal{Y}$. We shall always have in mind $\mathcal{Y} = \mathcal{X}$, with $|\mathcal{X}| = d$, keeping the notation different only when clarity demands it.
Also, throughout the paper, we assume that all probability vectors have strictly positive elements, and also all channels have only strictly positive entries (with the exception of the permutation channels studied in Subsection III A). Arbitrarily small entries would be indistinguishable from an exact zero, certainly in practice, and perhaps also in principle depending on one's understanding of probabilities.
A. Bayesian retrodiction on classical information
As the basic setting of retrodiction, consider the most elementary form of statistical inference: at the output of a known channel φ(y|x), one observes the outcome $y_0$, and wants to infer something about the input x. In this paper, we focus on Bayesian retrodiction, whose goal is to update one's belief on the distribution of x. This requires a prior belief, the reference prior, denoted ξ(x). The total prior knowledge is therefore captured by the joint probability distribution $P(x,y) = \xi(x)\,\varphi(y|x)$; in particular, the prior knowledge about y is $(\varphi\xi)(y) \equiv \sum_x \varphi(y|x)\,\xi(x)$. When the knowledge on y is updated to $y = y_0$, one performs the Bayes' update

$$P(x,y) \;\longrightarrow\; P(x|y)\,\delta_{y,y_0}, \qquad P(x|y) = \frac{\varphi(y|x)\,\xi(x)}{(\varphi\xi)(y)}, \tag{1}$$

on the total knowledge, whence the updated knowledge on x follows as $\sum_y P(x|y)\,\delta_{y,y_0} = P(x|y_0)$. This is the most elementary example of retrodiction.
Slightly less basic, though also widely discussed in the statistical literature, is the retrodiction on x based on "soft evidence" on y. This refers to the situation in which the update on y is not a sharp value $y_0$, but a distribution u(y). In real life, soft evidence may arise by sheer uncertainty (e.g., reading the outcome y in very dim light) or by virtual evidence (e.g., the doctor told me a definite result for my test, but I saw that he was tired and I fear that he may have misread the actual result y written on the sheet). The translation of such uncertainties into a quantitative u(y) is not trivial,13,14 but we take it for granted. For such situations, Bayes' update (1) is generalized to Jeffrey's update15

$$P(x,y) \;\longrightarrow\; P(x|y)\,u(y), \tag{2}$$

whence the updated knowledge on x is $\sum_y P(x|y)\,u(y)$.
In the case of virtual evidence, Jeffrey's update is a direct consequence16,17 of Bayes' update starting from a joint distribution P(x, y, z), under the assumption that the variable z influences directly only y and not x (cf. the example above: the tiredness of the doctor has no direct influence on whether I am actually sick). In other cases, it may be considered as an actual addition to the rules of Bayesian inference (this was Jeffrey's own view).
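To make the two updates concrete, the following minimal Python (NumPy) sketch implements (1) and (2) for a finite alphabet. It is an illustration under our own assumptions, not code from any cited work: the function name `jeffrey_update` is hypothetical, the storage convention `phi[y, x] = φ(y|x)` is chosen here, and the numbers describing the medical test are made up.

```python
import numpy as np

def jeffrey_update(phi, xi, u):
    """Jeffrey's update (2): return sum_y P(x|y) u(y).

    phi[y, x] = phi(y|x) is the column-stochastic channel matrix,
    xi[x] is the reference prior, u[y] is the (soft) evidence on y.
    """
    phi = np.asarray(phi, dtype=float)
    xi = np.asarray(xi, dtype=float)
    u = np.asarray(u, dtype=float)
    joint = phi * xi[None, :]                                # joint[y, x] = xi(x) phi(y|x)
    p_x_given_y = joint / joint.sum(axis=1, keepdims=True)   # Bayes: divide by (phi xi)(y)
    return p_x_given_y.T @ u                                 # sum over y of P(x|y) u(y)

# Hypothetical test: x = 0 healthy, x = 1 sick; y = 0 negative, y = 1 positive.
phi = np.array([[0.9, 0.2],
                [0.1, 0.8]])
xi = np.array([0.95, 0.05])

hard = jeffrey_update(phi, xi, [0.0, 1.0])   # sharp evidence y0 = 1: Bayes' update (1)
soft = jeffrey_update(phi, xi, [0.2, 0.8])   # soft evidence u(y)
print(hard, soft)
```

With sharp evidence, u(y) is a Kronecker delta and the same function reproduces Bayes' update, which is why a single routine covers both cases.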
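Thus, the conditional probability P(x|y) plays the role of channel for the retrodiction, in short retrodiction channel. For the remainder of the paper, we change the notation to

$$\hat\varphi_\xi(x|y) \equiv P(x|y) = \frac{\varphi(y|x)\,\xi(x)}{\sum_{x'}\varphi(y|x')\,\xi(x')}. \tag{3}$$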
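We shall make use at our convenience of a matrix representation. The channel is represented by the column-stochastic matrix

$$\boldsymbol{\varphi} = \big[\varphi(y|x)\big]_{y,x}, \qquad \sum_y \varphi(y|x) = 1.$$

Similarly, the retrodiction channel is represented by the column-stochastic matrix

$$\hat{\boldsymbol{\varphi}}_\xi = \big[\hat\varphi_\xi(x|y)\big]_{x,y}, \qquad \sum_x \hat\varphi_\xi(x|y) = 1.$$

In this notation, we represent input and output distributions, such as p(x), as column vectors $v_p$. For instance, the relations that define the reference prior can be written as

$$\boldsymbol{\varphi}\,v_\xi = v_{\varphi\xi}, \qquad \hat{\boldsymbol{\varphi}}_\xi\,v_{\varphi\xi} = v_\xi. \tag{4}$$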
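A short numerical sketch of (3) and (4), assuming NumPy and the same storage convention `phi[y, x] = φ(y|x)`, may help fix the index conventions. The helper name `retrodiction_matrix` and the example channel and prior are our own choices for illustration.

```python
import numpy as np

def retrodiction_matrix(phi, xi):
    """Bayesian retrodiction channel (3) in matrix form.

    phi[y, x] = phi(y|x) (column-stochastic), xi[x] = reference prior.
    Returns hat[x, y] = phi(y|x) xi(x) / (phi xi)(y).
    """
    phi = np.asarray(phi, dtype=float)
    xi = np.asarray(xi, dtype=float)
    out_prior = phi @ xi                                  # (phi xi)(y)
    return (phi * xi[None, :]).T / out_prior[None, :]

phi = np.array([[0.7, 0.2, 0.1],
                [0.2, 0.5, 0.3],
                [0.1, 0.3, 0.6]])                         # columns sum to 1
xi = np.array([0.5, 0.3, 0.2])

hat = retrodiction_matrix(phi, xi)
v_out = phi @ xi                                          # first relation in (4)
print(hat.sum(axis=0))                                    # each column sums to 1
print(np.allclose(hat @ v_out, xi))                       # second relation in (4)
```

The retrodiction matrix is again column-stochastic, and it maps the predicted reference output back onto the reference prior, as stated in (4).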
B. Two remarks
Before continuing, we bring up two crucial remarks. The first is about the reference prior. It is well known,11,12,17 and our presentation above makes it clear once again, that this element of subjectivity is an unavoidable feature of Bayesian retrodiction for a generic channel. The question of the choice of the prior is a recurring topic in Bayesian statistics. The literature on fluctuation relations does not mention it as such, the assumption being stated in more physical language. We shall get back to this point in Sec. V. Here, for the sake of definiteness we just mention two possible choices. One is the uniform prior ξ(x) = 1/d for all x. Another is the steady state γ of φ, defined by $\sum_x \varphi(y|x)\,\gamma(x) = \gamma(y)$, i.e., $\boldsymbol{\varphi}\,v_\gamma = v_\gamma$. Every stochastic map has at least one steady state, and exactly one if all its entries are strictly positive. It follows immediately from this definition that the uniform prior is a steady state if and only if φ is bistochastic, i.e., $\sum_x \varphi(y|x) = 1$ for all y.
The second remark, which others have also felt the need to highlight,18 is that retrodiction is not inversion. A channel φ has a linear inverse if there exists a matrix M such that $M\boldsymbol{\varphi} = \mathbb{1}$. In the case of an invertible channel, given a valid output distribution $\boldsymbol{\varphi}\,v_p$, one is able to recover the input distribution p(x) as $v_p = M\boldsymbol{\varphi}\,v_p$. But for most invertible channels, M is not a channel itself: there exist u(y) such that $M v_u$ has negative entries. In particular, since the image of the probability simplex by φ is convex, unless φ is a permutation there exists $y_0$ such that no input distribution p(x) is mapped by φ to $\delta_{y,y_0}$, while retrodicting from $y_0$ is the most basic example of Bayesian inference. Ultimately, of course, the difference is in the task: retrodiction does not aim at reconstructing the prior through repeated sampling, but at updating one's belief after a single run of the process. This is particularly clear in the example of the test for a sickness given above. See Refs. 42 and 43 for studies of retrodiction of quantum temporal dynamics.
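The difference between inversion and retrodiction can be seen on a single invertible bit channel. In the minimal sketch below (NumPy, with an arbitrarily chosen channel and a uniform reference prior), applying the linear inverse to the sharp evidence $\delta_{y,0}$ produces a vector with a negative entry, whereas Bayesian retrodiction (3) returns a valid distribution.

```python
import numpy as np

def retrodiction_matrix(phi, xi):
    """hat[x, y] = phi(y|x) xi(x) / sum_x' phi(y|x') xi(x')."""
    phi = np.asarray(phi, dtype=float)
    xi = np.asarray(xi, dtype=float)
    return (phi * xi[None, :]).T / (phi @ xi)[None, :]

phi = np.array([[0.8, 0.3],
                [0.2, 0.7]])                  # invertible bit channel (det = 0.5)
M = np.linalg.inv(phi)                        # linear inverse: M @ phi = identity
delta_y0 = np.array([1.0, 0.0])               # sharp evidence y = y0 = 0

inverted = M @ delta_y0                       # not a probability vector
retrodicted = retrodiction_matrix(phi, [0.5, 0.5]) @ delta_y0   # uniform reference prior
print(inverted, retrodicted)
```

Here `inverted` has a negative component, so M is not a channel, while `retrodicted` is the posterior on x after observing $y_0$.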
In Subsection III A, we shall see a remarkable coincidence: the channels for which the retrodiction channel coincides with the inverse, and those for which the retrodiction channel is independent of the reference prior, are exactly the same.
C. Retrodiction on “quantum-inside” classical channels
According to our current knowledge, the most general description of the inner working of any classical input–output channel is given by quantum theory. The quantum-inside description of a classical channel is as follows (Fig. 1). The classical input x prepares a system in a state ρx. The system is then sent through a quantum channel [a completely positive, trace-preserving (CPTP) map] $\mathcal{E}$, and eventually measured with the positive operator-valued measure (POVM) $\{\Pi_y\}$, leading to the classical outcome y. All in all,

$$\varphi(y|x) = \mathrm{Tr}\big[\Pi_y\,\mathcal{E}(\rho_x)\big]. \tag{5}$$
A quantum-inside classical channel and the corresponding retrodiction channel. The construction, described in Eqs. (7)–(10), is valid for every set of states $\{\rho_x\}$, every CPTP map $\mathcal{E}$ and every POVM $\{\Pi_y\}$.
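We want to derive the quantum description of the associated classical retrodiction channel (3): that is, finding states $\{\hat\rho_y\}$, a CPTP map $\hat{\mathcal{E}}$, and a POVM $\{\hat\Pi_x\}$, such that

$$\hat\varphi_\xi(x|y) = \mathrm{Tr}\big[\hat\Pi_x\,\hat{\mathcal{E}}(\hat\rho_y)\big]. \tag{6}$$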
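For this, we first need to define the adjoint $\mathcal{E}^\dagger$ of the channel, that is the map such that $\mathrm{Tr}[X\,\mathcal{E}(Y)] = \mathrm{Tr}[\mathcal{E}^\dagger(X)\,Y]$ for all operators X, Y. Inserting this definition in (3), one finds

$$\hat\varphi_\xi(x|y) = \frac{\mathrm{Tr}\big[\xi(x)\,\rho_x\,\mathcal{E}^\dagger(\Pi_y)\big]}{(\varphi\xi)(y)}. \tag{7}$$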
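This looks like (6), but in general the $\Pi_y$ are not normalized states, the operators $\xi(x)\rho_x$ do not form a POVM, and $\mathcal{E}^\dagger$ is a CPTP map only if $\mathcal{E}$ is unital. In order to identify proper states, channel and measurement, one has to introduce a reference state

$$\Xi = \sum_x \xi(x)\,\rho_x. \tag{8}$$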
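As in the classical case, we assume that Ξ and $\mathcal{E}(\Xi)$ have full rank, to skip caveats for situations of measure zero. Then one possible construction of the quantum elements of (6) uses

$$\hat{\mathcal{E}}_\Xi = \mathcal{D}_{\Xi}\circ\mathcal{E}^\dagger\circ\mathcal{D}_{\mathcal{E}(\Xi)^{-1}}, \tag{9}$$

$$\hat\rho_y = \frac{\mathcal{D}_{\mathcal{E}(\Xi)}(\Pi_y)}{(\varphi\xi)(y)}, \qquad \hat\Pi_x = \xi(x)\,\mathcal{D}_{\Xi^{-1}}(\rho_x), \tag{10}$$

where we have introduced the notation $\mathcal{D}_A(X) \equiv A^{1/2}XA^{1/2}$ for a positive operator A. Starting from this basic construction, one can obtain others as follows:

$$\hat\rho'_y = \mathcal{V}(\hat\rho_y), \qquad \hat\Pi'_x = \mathcal{U}(\hat\Pi_x), \tag{11}$$

$$\hat{\mathcal{E}}' = \mathcal{U}\circ\hat{\mathcal{E}}_\Xi\circ\mathcal{V}^\dagger, \tag{12}$$

also leading to (6) for any pair of unitary channels $\mathcal{U}$, $\mathcal{V}$.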
The key observation is that the retrodiction channel turns out to be the Petz recovery or Petz transpose map19,20 of $\mathcal{E}$ for the reference state Ξ [Eq. (9)], or a rotated version thereof [Eq. (12)].
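The construction can be checked numerically on a qubit. The sketch below is our own example, not the authors' code: it assumes an amplitude-damping channel given by Kraus operators, two non-commuting input states, and a computational-basis measurement (all arbitrary choices), and compares $\mathrm{Tr}[\hat\Pi_x\,\hat{\mathcal{E}}_\Xi(\hat\rho_y)]$ with the classical retrodiction (3) of the channel (5). The helper names `mpow`, `channel`, `adjoint` and `petz` are hypothetical.

```python
import numpy as np

def mpow(A, power):
    """A**power for a positive-definite Hermitian matrix A (eigendecomposition)."""
    w, V = np.linalg.eigh(A)
    return (V * w**power) @ V.conj().T

# Hypothetical example: amplitude-damping channel on a qubit, via Kraus operators.
g = 0.3
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - g)]]),
         np.array([[0.0, np.sqrt(g)], [0.0, 0.0]])]

def channel(rho):          # E(rho) = sum_k K rho K^dagger
    return sum(K @ rho @ K.conj().T for K in kraus)

def adjoint(X):            # E^dagger(X) = sum_k K^dagger X K
    return sum(K.conj().T @ X @ K for K in kraus)

def petz(Y, Xi):           # Petz map (9): Xi^1/2 E^dagger(E(Xi)^-1/2 Y E(Xi)^-1/2) Xi^1/2
    s, t = mpow(Xi, 0.5), mpow(channel(Xi), -0.5)
    return s @ adjoint(t @ Y @ t) @ s

rhos = [np.array([[0.95, 0.0], [0.0, 0.05]]),       # input states rho_x (non-commuting)
        np.array([[0.5, 0.45], [0.45, 0.5]])]
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]   # measurement Pi_y
xi = np.array([0.6, 0.4])                           # reference prior

# Classical channel (5) and its Bayesian retrodiction (3).
phi = np.array([[np.trace(P @ channel(r)).real for r in rhos] for P in povm])
out_prior = phi @ xi
classical_hat = (phi * xi[None, :]).T / out_prior[None, :]

# Quantum elements (8)-(10).
Xi = sum(x * r for x, r in zip(xi, rhos))
s_out, t_in = mpow(channel(Xi), 0.5), mpow(Xi, -0.5)
rho_hat = [s_out @ P @ s_out / out_prior[y] for y, P in enumerate(povm)]
povm_hat = [xi[x] * t_in @ r @ t_in for x, r in enumerate(rhos)]

quantum_hat = np.array([[np.trace(povm_hat[x] @ petz(rho_hat[y], Xi)).real
                         for y in range(2)] for x in range(2)])
print(np.allclose(quantum_hat, classical_hat))
```

The eigendecomposition-based `mpow` is used instead of a general matrix square root because all operators raised to fractional powers here are positive definite by assumption.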
The Petz map, a widely used tool in quantum information,21–23 was previously identified on formal grounds as the generalization of retrodiction within the quantum formalism.24–27 First of all, in the case where all the states and the channels are diagonal in the same basis, (9) reduces to (3). Furthermore, just as the Bayesian retrodiction depends on a reference prior ξ, the Petz map depends on a reference state α.28 Interestingly, the Petz map was also used for quantum fluctuation relations,29 but the connection with retrodiction was not noticed.
III. RETRODICTION: TWO CASE STUDIES
In this section, we present first retrodiction on Hamiltonian channels (both classical and quantum), which are provably the only ones for which the retrodictive map is independent of the reference prior and is identical to the inverse. Then, we discuss retrodiction for all classical bit channels (d = 2): precisely because it is elementary, this case study is useful to clarify features and dispel possible confusion about retrodiction.
A. Case study: Hamiltonian channels
We call Hamiltonian channels, both classical and quantum, channels that are both deterministic and invertible (Watanabe12 referred to these channels as “bilaterally deterministic”). The flows do not cross, and each state belongs to one and only one trajectory.
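For classical information, we have

$$\varphi(y|x) = \delta_{y,f(x)} \tag{13}$$

with f a bijection (in the discrete case, a permutation), and so $f^{-1}$ is uniquely defined. In this case, it is absolutely natural to expect

$$\hat\varphi_\xi(x|y) = \delta_{x,f^{-1}(y)}, \tag{14}$$

independent of the reference prior. It is readily verified that this is indeed the case from Eq. (3), since for a bijection we have $\sum_{x'}\varphi(y|x')\,\xi(x') = \xi\big(f^{-1}(y)\big)$.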
This result has very appealing features: the retrodiction channel coincides with the inverse and is independent of the arbitrary choice of reference prior. Appealing as they are, these features cannot be taken as paradigmatic, because they are actually unique to this case.
Result 1. The following three statements are equivalent:
- I
The channel φ is a permutation.
- II
The retrodiction channel $\hat\varphi_\xi$ is independent of the reference prior.
- III
There exists a reference prior ξ, for which the retrodiction channel is the inverse channel $\hat\varphi_\xi = \varphi^{-1}$.
Proof. We present a full proof here, putting on record that the equivalence of (I) and (II) was already proved in Watanabe's pioneering study.12
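Equation (14) proves (I) ⇒ (II, III). The implication (II) ⇒ (III) goes as follows: Eq. (4) implies $\hat{\boldsymbol{\varphi}}_\xi\,\boldsymbol{\varphi}\,v_\xi = v_\xi$. If $\hat{\boldsymbol{\varphi}}_\xi = \hat{\boldsymbol{\varphi}}$ for all ξ, then $\hat{\boldsymbol{\varphi}}\,\boldsymbol{\varphi}\,v = v$ for all vectors (strictly positive probability vectors span the whole space): then $\hat{\boldsymbol{\varphi}}\,\boldsymbol{\varphi} = \mathbb{1}$, that is $\hat{\boldsymbol{\varphi}} = \boldsymbol{\varphi}^{-1}$.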
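We are left to prove (III) ⇒ (I). Let us assume that there is a reference prior ξ such that $\hat\varphi_\xi = \varphi^{-1}$, i.e., $\hat{\boldsymbol{\varphi}}_\xi\,\boldsymbol{\varphi} = \mathbb{1}$. Let us spell out this condition:

$$\sum_y \hat\varphi_\xi(x|y)\,\varphi(y|x') = \sum_y \frac{\varphi(y|x)\,\xi(x)\,\varphi(y|x')}{(\varphi\xi)(y)} = \delta_{x,x'}. \tag{15}$$

All the terms are products of non-negative numbers, ξ(x) > 0 for all x by assumption, and $(\varphi\xi)(y) > 0$ for all y. Thus, for all the off-diagonal terms to be zero, we need

$$\varphi(y|x)\,\varphi(y|x') = 0 \qquad \text{for all } y \text{ and all } x \neq x'.$$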
This means the product of any two entries of a given y-row will always be zero. Hence, there can be at most one non-zero entry for that row, which means that the matrix can have at most d non-zero entries. But there are d columns, and the sum of all the elements of each column must be 1. Thus, the only possibility is that each row and each column have exactly one non-zero entry, and the value of the entry is 1. This defines a permutation matrix and concludes the proof. ◻
Incidentally, the argument based on condition (15) does not depend on the specific reference prior ξ; so, at that point we had proved directly (III) ⇒ (II).
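Result 1 is easy to probe numerically. The following sketch (NumPy; the helper name and the example matrices are our own) retrodicts a permutation channel and a strictly positive channel with two different reference priors: for the permutation, the retrodiction matrix is the same for both priors and equals the inverse, while for the other channel it changes with the prior and never inverts the channel.

```python
import numpy as np

def retrodiction_matrix(phi, xi):
    """hat[x, y] = phi(y|x) xi(x) / sum_x' phi(y|x') xi(x')."""
    phi = np.asarray(phi, dtype=float)
    xi = np.asarray(xi, dtype=float)
    return (phi * xi[None, :]).T / (phi @ xi)[None, :]

priors = [np.full(3, 1 / 3), np.array([0.5, 0.3, 0.2])]

# Permutation channel phi(y|x) = delta_{y, f(x)} with f(0) = 2, f(1) = 0, f(2) = 1.
perm = np.array([[0, 1, 0],
                 [0, 0, 1],
                 [1, 0, 0]], dtype=float)
r1, r2 = (retrodiction_matrix(perm, xi) for xi in priors)

# A channel with strictly positive entries (not a permutation).
phi = np.array([[0.7, 0.2, 0.1],
                [0.2, 0.5, 0.3],
                [0.1, 0.3, 0.6]])
s1, s2 = (retrodiction_matrix(phi, xi) for xi in priors)

print(np.allclose(r1, r2), np.allclose(r1, np.linalg.inv(perm)))   # (II) and (III) hold
print(np.allclose(s1, s2), np.allclose(s1 @ phi, np.eye(3)))       # both fail
```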
The same result holds for retrodiction on quantum information—in fact, Result 1 was presented first for reasons of clarity, but can be seen as a special case of the following:
Result 2. The following three statements are equivalent:
- I
The channel $\mathcal{E}$ is unitary.
- II
The retrodiction channel $\hat{\mathcal{E}}_\alpha$ is independent of the reference state.
- III
There exists a reference state α, for which the retrodiction channel is the inverse channel $\hat{\mathcal{E}}_\alpha = \mathcal{E}^{-1}$.
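Proof. The implications (I) ⇒ (II, III) follow from the direct calculation of (9) for a unitary channel $\mathcal{U}(\rho) = U\rho U^\dagger$:

$$\hat{\mathcal{U}}_\alpha(Y) = \alpha^{1/2}\,U^\dagger\big(U\alpha U^\dagger\big)^{-1/2}\,Y\,\big(U\alpha U^\dagger\big)^{-1/2}\,U\,\alpha^{1/2} = U^\dagger Y U,$$

where we have used $(U\alpha U^\dagger)^{-1/2} = U\alpha^{-1/2}U^\dagger$ and $U^\dagger U = \mathbb{1}$. Thus $\hat{\mathcal{U}}_\alpha = \mathcal{U}^\dagger$ for every α, and $\mathcal{U}^\dagger$ is $\mathcal{U}^{-1}$.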
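The proof of (II) ⇒ (III) is analogous to that for classical information. Trivially, $\hat{\mathcal{E}}_\alpha\big(\mathcal{E}(\alpha)\big) = \alpha$ holds by definition of $\hat{\mathcal{E}}_\alpha$. Therefore, if $\hat{\mathcal{E}}_\alpha = \hat{\mathcal{E}}$ for all α, then $\hat{\mathcal{E}}\big(\mathcal{E}(\rho)\big) = \rho$ for all ρ; whence $\hat{\mathcal{E}} = \mathcal{E}^{-1}$.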
Finally, for the proof of (III) ⇒ (I): since any Petz map is CPTP, the starting assumption implies that $\mathcal{E}^{-1}$ is a CPTP map. But it is known that a CPTP map with the same input and output space has a CPTP inverse (that is, it is invertible, and the inverse is itself a channel) if and only if it is unitary.30,31 ◻
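The first step of the proof, that the Petz map of a unitary channel is $\mathcal{U}^\dagger$ whatever the reference state, can be checked numerically. The sketch below assumes NumPy, an arbitrary complex qubit unitary, two arbitrary full-rank reference states and an arbitrary Hermitian test operator; `petz_unitary` is a hypothetical helper that specializes (9) to $\mathcal{E}^\dagger(X) = U^\dagger X U$.

```python
import numpy as np

def mpow(A, power):
    """A**power for a positive-definite Hermitian matrix A."""
    w, V = np.linalg.eigh(A)
    return (V * w**power) @ V.conj().T

def petz_unitary(Y, U, alpha):
    """Petz map (9) of rho -> U rho U^dagger with reference state alpha."""
    t = mpow(U @ alpha @ U.conj().T, -0.5)        # U(alpha)^(-1/2)
    s = mpow(alpha, 0.5)
    return s @ U.conj().T @ t @ Y @ t @ U @ s     # adjoint channel: X -> U^dagger X U

theta = 0.7
U = np.diag([1, 1j]) @ np.array([[np.cos(theta), -np.sin(theta)],
                                  [np.sin(theta),  np.cos(theta)]])
alphas = [np.diag([0.7, 0.3]), np.array([[0.6, 0.2], [0.2, 0.4]])]
Y = np.array([[0.3, 0.1 - 0.2j], [0.1 + 0.2j, 0.7]])

results = [petz_unitary(Y, U, a) for a in alphas]
print([np.allclose(r, U.conj().T @ Y @ U) for r in results])   # same for every alpha
```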
B. Classical one-bit channels
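As a second case study, we consider classical stochastic processes for d = 2 (Fig. 2). We write a generic channel as

$$\boldsymbol{\varphi} = \begin{pmatrix} 1-a & b \\ a & 1-b \end{pmatrix}, \tag{16}$$

with $a, b \in [0,1]$, i.e., $\varphi(1|0) = a$ and $\varphi(0|1) = b$. Its steady state is

$$\gamma = \frac{1}{a+b}\begin{pmatrix} b \\ a \end{pmatrix},$$

unique unless a = b = 0 (this being expected, since every state is a steady state for the identity channel).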
Bit channels (a) and their retrodiction (b) can be depicted as the respective maps above.
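The corresponding retrodiction channel with generic reference prior $\xi = (\xi_0, \xi_1)^T$ is

$$\hat{\boldsymbol{\varphi}}_\xi = \begin{pmatrix} 1-\hat a & \hat b \\ \hat a & 1-\hat b \end{pmatrix},$$

with $\hat a = \dfrac{b\,\xi_1}{(1-a)\,\xi_0 + b\,\xi_1}$ and $\hat b = \dfrac{a\,\xi_0}{a\,\xi_0 + (1-b)\,\xi_1}$. Interestingly, the retrodiction channel built on the steady state has the same stochastic matrix as the channel itself,

$$\hat{\boldsymbol{\varphi}}_\gamma = \boldsymbol{\varphi}. \tag{18}$$

This can be verified without calculation, noticing that $\varphi(1|0)\,\gamma(0) = \varphi(0|1)\,\gamma(1)$ (stationarity of γ for d = 2), so that the off-diagonal entries $\hat\varphi_\gamma(x|y) = \varphi(y|x)\,\gamma(x)/\gamma(y)$ coincide with those of $\boldsymbol{\varphi}$, and that $\hat{\boldsymbol{\varphi}}_\gamma$ must also be column-stochastic.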
The channel (16) is invertible if and only if a + b ≠ 1, since $\det\boldsymbol{\varphi} = 1 - a - b$. Result 1 of course holds: the retrodiction channel will be the inverse if and only if a = b = 0 (identity channel) or a = b = 1 (bit-flip channel). For all the other invertible channels, retrodiction and inversion do not coincide, whatever the choice of the reference prior.
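The statements about bit channels can be verified in a few lines. In this sketch (NumPy; the values a = 0.2, b = 0.3 and the uniform reference prior are arbitrary choices, and the helper names are hypothetical), we check that γ is stationary, that (18) holds, that the matrix form of the retrodiction agrees with the closed-form $\hat a$, $\hat b$ obtained from (3), and that $\det\boldsymbol{\varphi} = 1 - a - b$.

```python
import numpy as np

def retrodiction_matrix(phi, xi):
    """hat[x, y] = phi(y|x) xi(x) / sum_x' phi(y|x') xi(x')."""
    phi = np.asarray(phi, dtype=float)
    xi = np.asarray(xi, dtype=float)
    return (phi * xi[None, :]).T / (phi @ xi)[None, :]

def bit_channel(a, b):
    """Matrix (16): phi(1|0) = a, phi(0|1) = b."""
    return np.array([[1 - a, b],
                     [a, 1 - b]])

def steady_state(a, b):
    """Steady state (b, a)/(a + b), unique unless a = b = 0."""
    return np.array([b, a]) / (a + b)

a, b = 0.2, 0.3
phi = bit_channel(a, b)
gamma = steady_state(a, b)
xi = np.array([0.5, 0.5])

hat_gamma = retrodiction_matrix(phi, gamma)          # Eq. (18): equals phi
hat_xi = retrodiction_matrix(phi, xi)                # generic reference prior
a_hat = b * xi[1] / ((1 - a) * xi[0] + b * xi[1])    # hat{a} = phi(0|1) xi_1 / (phi xi)(0)
b_hat = a * xi[0] / (a * xi[0] + (1 - b) * xi[1])    # hat{b} = phi(1|0) xi_0 / (phi xi)(1)
print(np.linalg.det(phi), 1 - a - b)                 # invertible iff a + b != 1
```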
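The non-invertible channels, a + b = 1, make for an interesting case study; we change the notation to a = ε, b = 1 − ε, so that

$$\boldsymbol{\varphi} = \begin{pmatrix} 1-\varepsilon & 1-\varepsilon \\ \varepsilon & \varepsilon \end{pmatrix}.$$

First notice that

$$\boldsymbol{\varphi}\,v_p = \begin{pmatrix} 1-\varepsilon \\ \varepsilon \end{pmatrix}$$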
for all input p. In other words, these channels erase whatever information is present in the input, and produce a fixed output distribution (which, of course, coincides with their steady state). In this sense, they could all be called erasure channels, though the name is usually given to the case ε = 0.
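Because at the output all information on the input has been destroyed, one may naively expect the retrodiction channel to produce a completely random outcome. But this is forgetting the importance of the reference prior in retrodiction. Plugging the expressions in the equations, one readily finds, with $\xi = (\xi_0, \xi_1)^T$,

$$\hat{\boldsymbol{\varphi}}_\xi = \begin{pmatrix} \xi_0 & \xi_0 \\ \xi_1 & \xi_1 \end{pmatrix}, \qquad \text{i.e.,} \qquad \hat\varphi_\xi(x|y) = \xi(x).$$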
The retrodiction channel of an erasure channel is the erasure channel that returns the reference prior, a result that can be easily extended to any alphabet dimension.32 In agreement with (18), if the reference prior is the steady state, the retrodiction channel is the same erasure channel. The fact that one can associate a thermodynamically reversible process to a logically irreversible channel like the erasure channel was noted, for instance, by Sagawa (Ref. 44). The identity between the forward and the reverse erasure channels was noticed by Riechers and co-workers (Ref. 45) for a specific toy model of physical erasure. We see here that it is an unavoidable feature once the reverse process is defined from the steady state. These observations are summarized in Fig. 3.
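A minimal sketch of the erasure result, assuming NumPy and arbitrarily chosen output distributions and priors: the retrodiction of an erasure channel has every column equal to the reference prior, both for d = 2 and for a d = 3 example, and with the steady state as reference prior it returns the erasure channel itself. The helper `erasure_channel` is a hypothetical name introduced here.

```python
import numpy as np

def retrodiction_matrix(phi, xi):
    """hat[x, y] = phi(y|x) xi(x) / sum_x' phi(y|x') xi(x')."""
    phi = np.asarray(phi, dtype=float)
    xi = np.asarray(xi, dtype=float)
    return (phi * xi[None, :]).T / (phi @ xi)[None, :]

def erasure_channel(out):
    """Channel sending every input to the fixed output distribution `out`."""
    out = np.asarray(out, dtype=float)
    return np.tile(out[:, None], (1, out.size))

eps = 0.1
cases = [(erasure_channel([1 - eps, eps]), np.array([0.3, 0.7])),      # bit case
         (erasure_channel([0.5, 0.3, 0.2]), np.array([0.1, 0.6, 0.3]))]  # d = 3

for phi, xi in cases:
    hat = retrodiction_matrix(phi, xi)
    print(np.allclose(hat, np.tile(xi[:, None], (1, xi.size))))   # every column is xi
```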
(a) An ϵ-erasure channel, (b) the retrodiction channel with steady state as reference prior, (c) the retrodiction channel for a generic reference prior.
IV. FLUCTUATION RELATIONS FROM RETRODICTION
The topic of this section, fluctuation relations, originated in statistical thermodynamics. As we shall see, the formal structure of these relations can be derived without any reference to that branch of physics. As it happens, we shall mention thermodynamics only in the very last paragraph of the section. The explicit application of these formulas to important situations in thermodynamics was discussed in our previous paper.1
A. The process and its statistics
As we noted in the introduction, it is customary in studies of irreversibility to define the physical process as the forward process, and to compare it to its corresponding reverse process. Here, we adopt a different narrative as follows:
- There is only one process, the one that is happening.
- A (forward) prediction on the process starts with a prior p(x) on the input, and infers the predicted distribution
  $$P_F(x,y) = p(x)\,\varphi(y|x). \tag{22}$$
- A retrodiction on the process starts with a prior q(y) on the output, and infers the retrodicted distribution
  $$P_R(x,y) = q(y)\,\hat\varphi_\xi(x|y). \tag{23}$$
The explicit mention of ξ will be dropped for simplicity in the remainder of this section and resumed in Sec. V.
We proceed to derive fluctuation relations with our narrative, and later we show the comparison with the reverse-process narrative.
B. Derivation of the fluctuation relations
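Consider a variable ω(x, y) that depends on the initial and final states, and may be determined by the process. Its predicted distribution is

$$\mu_F(\omega) = \sum_{x,y} P_F(x,y)\,\delta\big(\omega - \omega(x,y)\big), \tag{24}$$

while its retrodicted distribution is

$$\mu_R(\omega) = \sum_{x,y} P_R(x,y)\,\delta\big(\omega - \omega(x,y)\big) = \sum_{x,y} P_F(x,y)\,R(x,y)\,\delta\big(\omega - \omega(x,y)\big), \tag{25}$$

where

$$R(x,y) = \frac{P_R(x,y)}{P_F(x,y)}. \tag{26}$$

So, the difference between μF and μR is encoded in this ratio of probabilities, which is exactly the quantity that appears in the statistical f-divergence33,34

$$D_f(P_R\Vert P_F) = \sum_{x,y} P_F(x,y)\,f\big(R(x,y)\big), \tag{27}$$

where the function f(r) must be convex for r > 0 and satisfy f(1) = 0. The "entropy production," on which the thermodynamical literature bases fluctuation relations, uses f(r) = −ln r, which generates the reverse Kullback–Leibler distance $D(P_F\Vert P_R) = \sum_{x,y} P_F(x,y)\ln\frac{P_F(x,y)}{P_R(x,y)}$. But we do not need to choose that particular function at this stage: for any function f(r) invertible35 for r > 0, if we set

$$\omega(x,y) = f\big(R(x,y)\big), \tag{28}$$

we have by definition

$$R(x,y) = f^{-1}\big(\omega(x,y)\big), \tag{29}$$

and therefore

$$\mu_R(\omega) = f^{-1}(\omega)\,\mu_F(\omega), \tag{30}$$

that is the generalization of Crooks' theorem.4 By integrating over ω, one obtains the integral fluctuation relation

$$\big\langle f^{-1}(\omega)\big\rangle_F \equiv \int d\omega\,\mu_F(\omega)\,f^{-1}(\omega) = 1, \tag{31}$$

that depends only on the process. This is the generalization of Jarzynski's equality.3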
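The derivation can be checked on a small discrete example. The sketch below is our own illustration under stated assumptions: NumPy, an arbitrary 3×3 channel, arbitrary priors p, q and reference prior ξ, and three invertible choices of f taken as representatives of the reverse Kullback–Leibler, squared Hellinger and Neyman divergences (f-divergences are unchanged by adding c(r − 1) to f, which is what makes these choices invertible). Since ω is discrete, μ is a sum of point masses, and the hypothetical helper `mu_at` returns the mass sitting at a given value of ω.

```python
import numpy as np

def mu_at(weights, omega, w, tol=1e-9):
    """Mass of the delta peak of mu at w: sum of P(x, y) over pairs with omega(x, y) close to w."""
    return weights[np.abs(omega - w) < tol].sum()

phi = np.array([[0.7, 0.2, 0.1],         # phi[y, x] = phi(y|x)
                [0.2, 0.5, 0.3],
                [0.1, 0.3, 0.6]])
xi = np.array([0.5, 0.3, 0.2])           # reference prior
p = np.array([0.2, 0.2, 0.6])            # prior on the input (prediction)
q = np.array([0.3, 0.4, 0.3])            # prior on the output (retrodiction)

hat = (phi * xi[None, :]).T / (phi @ xi)[None, :]   # retrodiction channel (3), [x, y]
PF = p[:, None] * phi.T                              # (22): P_F(x, y) = p(x) phi(y|x)
PR = hat * q[None, :]                                # (23): P_R(x, y) = q(y) hat(x|y)
R = PR / PF                                          # (26)

# Assumed invertible choices of f, each paired with its inverse.
choices = {
    "reverse KL": (lambda r: -np.log(r), lambda w: np.exp(-w)),
    "Hellinger":  (lambda r: 1 - np.sqrt(r), lambda w: (1 - w) ** 2),
    "Neyman":     (lambda r: 1 / r - 1, lambda w: 1 / (1 + w)),
}
results = {}
for name, (f, f_inv) in choices.items():
    omega = f(R)                                                     # (28)
    crooks = all(np.isclose(mu_at(PR, omega, w), f_inv(w) * mu_at(PF, omega, w))
                 for w in np.unique(omega))                          # (30)
    integral = np.sum(PF * f_inv(omega))                             # (31)
    results[name] = (crooks, integral)
print(results)
```

For f(r) = −ln r, the average of ω under P_F is the reverse Kullback–Leibler distance, which is non-negative, while (31) reduces to the familiar ⟨e^{−ω}⟩ = 1.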
C. Comparison between retrodiction and reverse process
In all the literature we are aware of, fluctuation relations are presented as a measure of the statistical difference between the forward and the reverse process, not between the predicted and retrodicted distributions of a single process. The difference between the two narratives has mathematical manifestations that we are going to discuss now.
For the sake of definiteness, let us start with a canonical example. Suppose that the variable of interest is entropy, and that in the process under study it changes by ΔS. In a retrodictive approach, (23) defines the retrodicted distribution for that same process. But if one looks at (23) as defining a reverse process, for that process the change of entropy will rather be −ΔS.
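Generalizing this observation, the distribution of the variable ω in the reverse process reads

$$\tilde\mu_R(\omega) = \sum_{x,y} P_R(x,y)\,\delta\big(\omega - \tilde\omega(x,y)\big), \tag{32}$$

where, under assumption (28),

$$\tilde\omega(x,y) = f\!\left(\frac{P_F(x,y)}{P_R(x,y)}\right) = f\!\left(\frac{1}{f^{-1}\big(\omega(x,y)\big)}\right), \tag{33}$$

because the roles of PF and PR are exchanged between the forward and the reverse process [for the choice f(r) = −ln r, there follows the expected minus sign $\tilde\omega = -\omega$]. The resulting fluctuation relation then reads1

$$\tilde\mu_R\big(\tilde\omega(\omega)\big)\left|\frac{d\tilde\omega}{d\omega}\right| = f^{-1}(\omega)\,\mu_F(\omega), \qquad \tilde\omega(\omega) \equiv f\big(1/f^{-1}(\omega)\big). \tag{34}$$

As expected, μF evaluated at ω is now related to $\tilde\mu_R$ evaluated at $\tilde\omega(\omega)$. The Jacobian factor, which comes from the change of variable in the δ-function, ensures that the integral fluctuation relation takes exactly the same form as (31).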
A comparison of (30) and (34) for various choices of f is given in Table I. For the thermodynamical case f(r) = −ln r, we have $\tilde\omega = -\omega$, and therefore the only difference between (30) and (34) is that μR is evaluated at ω while $\tilde\mu_R$ is evaluated at −ω. Thus, in thermodynamics not only the Jarzynski equality but also the Crooks fluctuation theorem is the same in both narratives (up to that sign change). Interestingly, even when reporting experiments in which the reverse process was actually implemented, it is the retrodictive version that is usually plotted for its visual convenience: see for instance the pioneering verification of Crooks' fluctuation theorem with folding and unfolding of RNA.36
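The reverse-process version (34) can be checked in the same setting as the retrodictive one. This sketch assumes NumPy, the thermodynamical choice f(r) = −ln r (for which the Jacobian in (34) equals one), and the same arbitrary channel, priors and hypothetical `mu_at` helper as in a purely illustrative example; it compares the mass of $\tilde\mu_R$ at −ω with $e^{-\omega}$ times the mass of μF at ω.

```python
import numpy as np

def mu_at(weights, omega, w, tol=1e-9):
    """Mass of the delta peak at w: sum of P(x, y) over pairs with omega(x, y) close to w."""
    return weights[np.abs(omega - w) < tol].sum()

def f(r):
    """Thermodynamical choice f(r) = -ln r."""
    return -np.log(r)

phi = np.array([[0.7, 0.2, 0.1],          # phi[y, x] = phi(y|x)
                [0.2, 0.5, 0.3],
                [0.1, 0.3, 0.6]])
xi = np.array([0.5, 0.3, 0.2])
p = np.array([0.2, 0.2, 0.6])
q = np.array([0.3, 0.4, 0.3])

hat = (phi * xi[None, :]).T / (phi @ xi)[None, :]
PF = p[:, None] * phi.T                   # (22)
PR = hat * q[None, :]                     # (23)

omega = f(PR / PF)                        # (28): value of omega in the process
omega_rev = f(PF / PR)                    # (33): roles of P_F and P_R exchanged

# (34) with omega~ = -omega and unit Jacobian: mu~_R(-w) = exp(-w) mu_F(w).
checks = [np.isclose(mu_at(PR, omega_rev, -w), np.exp(-w) * mu_at(PF, omega, w))
          for w in np.unique(omega)]
print(all(checks), np.allclose(omega_rev, -omega))
```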
Fluctuation relations (FRs) obtained in the retrodictive and in the reverse-process narratives, for a few choices of ω satisfying (29) for the corresponding f-divergence. The fourth column is kept in the form (34) without possible algebraic simplifications, to facilitate the identification of and .
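| | f-Divergence | FR for retrodiction (30) | FR for reverse process (34) | Integral FR (31) |
|---|---|---|---|---|
| Reverse Kullback–Leibler | $f(r) = -\ln r$: $D(P_F\Vert P_R)$ | $\mu_R(\omega) = e^{-\omega}\mu_F(\omega)$ | $\tilde\mu_R(-\omega) = e^{-\omega}\mu_F(\omega)$ | $\langle e^{-\omega}\rangle_F = 1$ |
| Squared Hellinger | $f(r) = 1 - \sqrt{r}$: $\tfrac12\sum_{x,y}\big(\sqrt{P_F} - \sqrt{P_R}\big)^2$ | $\mu_R(\omega) = (1-\omega)^2\mu_F(\omega)$ | $\tilde\mu_R\!\left(-\frac{\omega}{1-\omega}\right)\frac{1}{(1-\omega)^2} = (1-\omega)^2\mu_F(\omega)$ | $\langle(1-\omega)^2\rangle_F = 1$ |
| Neyman | $f(r) = \frac{1}{r} - 1$: $\sum_{x,y}\frac{(P_F - P_R)^2}{P_R}$ | $\mu_R(\omega) = \frac{\mu_F(\omega)}{1+\omega}$ | $\tilde\mu_R\!\left(-\frac{\omega}{1+\omega}\right)\frac{1}{(1+\omega)^2} = \frac{\mu_F(\omega)}{1+\omega}$ | $\left\langle\frac{1}{1+\omega}\right\rangle_F = 1$ |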
D. Fluctuation relations and Bayesian retrodiction
In the retrodictive narrative, the fluctuation relation (30) and its integrated form (31) are statistical properties of the random variable ω defined by (29). They are formally valid for the statistical comparison between arbitrary PF and PR, with no reference to the notion of retrodiction, let alone to its mathematical expression (3). In the reverse-process narratives, one studies the distribution of the values of the variable when the roles of PF and PR are swapped [Eq. (33)]; but even then, the fluctuation relations follow without having specified any mathematical relation between PF and PR. So, what is the role of Bayesian retrodiction, or that of a proper definition of the reverse process? We are going to prove that it singles out a specific structure for R(x, y), and that this simple result has far-reaching consequences in the context of thermodynamics.
Result 3. The ratio R(x, y) [Eq. (26)] is of the form R(x, y) = F(x) G(y), for some functions F and G, if and only if PF and PR are related as (22) and (23), with the latter constructed from Bayesian retrodiction [Eq. (3)]. In this case,

$$R(x,y) = \frac{\xi(x)}{p(x)}\,\frac{q(y)}{\sum_{x'}\varphi(y|x')\,\xi(x')}. \tag{35}$$
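Proof. If PF and PR are given by (22) and (23), using (3) it is trivial to derive (35). In the other direction: without loss of generality we keep the form (22) for PF and, using the product rule of joint probabilities, we write $P_R(x,y) = q(y)\,\eta(x|y)$ for the conditional distribution (channel) η and marginal q. The assumption $q(y)\,\eta(x|y) = p(x)\,\varphi(y|x)\,F(x)\,G(y)$ reads

$$\eta(x|y) = \varphi(y|x)\,\tilde F(x)\,\tilde G(y),$$

with $\tilde F(x) = p(x)F(x)$ and $\tilde G(y) = G(y)/q(y)$. Since the LHS is strictly positive,37 $\tilde F(x)\,\tilde G(y) > 0$ must hold for all (x, y). Now, being a channel, η must satisfy $\sum_x \eta(x|y) = 1$, that is $\tilde G(y) = 1/\sum_x \varphi(y|x)\,\tilde F(x)$. So, finally

$$\eta(x|y) = \frac{\varphi(y|x)\,\tilde F(x)}{\sum_{x'}\varphi(y|x')\,\tilde F(x')} = \frac{\varphi(y|x)\,\xi(x)}{\sum_{x'}\varphi(y|x')\,\xi(x')} = \hat\varphi_\xi(x|y),$$

where $\xi(x) = \tilde F(x)/\sum_{x'}\tilde F(x')$ is a valid probability distribution because all the $\tilde F(x)$ have the same sign. ◻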
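Result 3 lends itself to a quick numerical illustration. In the sketch below (NumPy; the channel, priors, reference prior and the competing channel `eta` are arbitrary choices of ours, and `is_product` is a hypothetical helper), the ratio R built with Bayesian retrodiction factorizes as F(x)G(y), the ratio built with another channel does not, and the reference prior is recovered from the factor $\tilde F(x) = p(x)F(x)$ as in the proof.

```python
import numpy as np

phi = np.array([[0.7, 0.2, 0.1],
                [0.2, 0.5, 0.3],
                [0.1, 0.3, 0.6]])        # phi[y, x] = phi(y|x)
p = np.array([0.2, 0.2, 0.6])
q = np.array([0.3, 0.4, 0.3])
xi = np.array([0.5, 0.3, 0.2])
PF = p[:, None] * phi.T                  # P_F(x, y) = p(x) phi(y|x)

def is_product(R):
    """True if R(x, y) = F(x) G(y), i.e. R[x, y] R[0, 0] == R[x, 0] R[0, y] for all x, y."""
    return np.allclose(R * R[0, 0], np.outer(R[:, 0], R[0, :]))

# Bayesian retrodiction: eta = hat(phi)_xi.
bayes = (phi * xi[None, :]).T / (phi @ xi)[None, :]
R_bayes = bayes * q[None, :] / PF

# Some other channel eta(x|y) (columns sum to 1), chosen arbitrarily.
eta = np.array([[0.5, 0.2, 0.3],
                [0.3, 0.5, 0.3],
                [0.2, 0.3, 0.4]])
R_other = eta * q[None, :] / PF

# Recover the reference prior from the factorization, as in the proof.
F_tilde = p * R_bayes[:, 0]              # proportional to p(x) F(x)
xi_recovered = F_tilde / F_tilde.sum()
print(is_product(R_bayes), is_product(R_other), xi_recovered)
```

The factorization test uses a cross-ratio identity rather than a numerical rank computation, to avoid depending on a rank tolerance.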
While this result may look purely anecdotal or formal, let us recall that in the usual thermodynamical interpretation −ln R(x, y) is the (non-adiabatic) stochastic entropy production.38 Thus, whenever the stochastic entropy production can be computed locally (that is, independently of the correlations between microstates x and y), a structure of Bayesian retrodiction is unavoidable (in the reverse-process narrative: the reverse process must be defined through Bayesian retrodiction).
V. ON THE CHOICE OF THE REFERENCE PRIOR
In the literature on fluctuation relations, based on thermodynamics, the wording "reference prior" is absent. Its role is usually taken by an assumption of "detailed balance." In all the examples that we have looked into,1 this corresponds to the choice of the steady state as reference prior. The operational interpretation of this choice is very physical: one takes as reference the process in which nothing changes. It also has a very neat consequence when it comes to fluctuation relations: the ratio R(x, y) given in (35), and thus the variable that enters the fluctuation relations, depends on the channel only through its steady state γ, since $\sum_{x'}\varphi(y|x')\,\gamma(x') = \gamma(y)$. With this choice, one is clearly studying fluctuations around equilibrium.39
Inspired by statistical comparisons, one may opt for a different definition of the reference prior. One possibility is trying to keep prediction and retrodiction as close as possible. With such goals, let us take as a figure of merit
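$$D(P_F\Vert P_R) = D(\varphi p\Vert q) + \big[D(p\Vert \xi) - D(\varphi p\Vert \varphi\xi)\big], \tag{36}$$

which we called the reverse Kullback–Leibler distance in Sec. IV. Only the second term depends on the reference prior; besides, it does not depend on q(y), but does depend on p(x), which may be arbitrary. We may then choose ξ so as to minimize the average of $D(p\Vert\xi) - D(\varphi p\Vert\varphi\xi)$ over all possible choices of p. The terms of this expression that depend on ξ, namely $-\sum_x p(x)\ln\xi(x) + \sum_y (\varphi p)(y)\ln(\varphi\xi)(y)$, are linear in p; upon such averaging, p(x) is therefore replaced by its average $\langle p(x)\rangle = 1/d$. Thus we want to find ξ* with

$$\xi^* = \arg\min_\xi \Big[-\frac{1}{d}\sum_x \ln\xi(x) + \sum_y w(y)\ln(\varphi\xi)(y)\Big], \qquad w(y) = \frac{1}{d}\sum_x \varphi(y|x). \tag{37}$$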
Interestingly, we have:
Result 4. The reference prior that minimizes (36), averaged over p as in (37), is the uniform prior, for every channel φ.
We present the proof in Appendix.
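A simple numerical check of Result 4 at d = 2, in the spirit of the checks mentioned below, can be sketched as follows. It assumes NumPy, a grid of candidate priors ξ = (t, 1 − t), and three arbitrarily chosen channels (the last one an erasure channel); `objective` is a hypothetical name for the bracket in (37).

```python
import numpy as np

def objective(phi, xi):
    """Bracket of (37): -(1/d) sum_x ln xi(x) + sum_y w(y) ln (phi xi)(y),
    with w = phi applied to the uniform distribution."""
    d = len(xi)
    w = phi @ np.full(d, 1.0 / d)
    return -np.mean(np.log(xi)) + np.sum(w * np.log(phi @ xi))

grid = np.linspace(0.01, 0.99, 99)               # candidate priors xi = (t, 1 - t)
channels = [np.array([[0.8, 0.3], [0.2, 0.7]]),
            np.array([[0.95, 0.6], [0.05, 0.4]]),
            np.array([[0.9, 0.9], [0.1, 0.1]])]    # the last one is an erasure channel

best = []
for phi in channels:
    values = [objective(phi, np.array([t, 1 - t])) for t in grid]
    best.append(grid[int(np.argmin(values))])
print(best)   # the grid minimizer for each channel
```

A grid search only probes the claim numerically; the general statement for every d and every channel is the content of the proof in the Appendix.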
In the same spirit, one can study the reference priors that minimize other figures of merit averaged over the possible priors p(x) and q(y). We ran some simple numerical checks at d = 2 for two other figures of merit. For the Kullback–Leibler distance $D(P_R\Vert P_F)$, the steady state is generically not optimal, while the uniform prior seems to be optimal again, even though the dependence on ξ is different from (36). For the guessing probability, neither the steady state nor the uniform prior is generically optimal.
VI. FINAL CONSIDERATIONS: DO WE NEED A REVERSE PHYSICAL PROCESS?
The everyday meaning of (ir)reversibility in nature is captured by the perceived “arrow of time”: if the video of the evolution played backward makes sense, the process is reversible; if it does not make sense, it is irreversible.
Science has gone very far in bringing this intuition on quantitative ground. The standard underlying narrative still involves two processes: the one that we observe, and the associated reverse process (not deemed to be strictly impossible, but very unlikely). This reverse process is generically not the video played backward: to cite an extreme example, nobody conceives of bombs that fly upward to their airplanes while cities are being built from rubble.40,41 In the case of controlled protocols in the presence of an unchanging environment, the reverse process is implemented by reversing the protocol. If the environment were to change (in an uncontrolled way, by definition of environment), the connection between the physical process and the associated reverse one would become thinner.
With our line of research, we are exploring the possibility that the narrative of the reverse process may not be needed at all. In the wording pioneered by Watanabe, irreversibility may be rather irretrodictability. So far, this program has found no obstacle, and has even clarified situations that were deemed puzzling in the case of some quantum channels.1 The vistas opened by this approach also allow us to expand the scope of fluctuation relations (Sec. IV) and to discuss the choice of a reference prior (Sec. V).
Barring surprises à la John Bell, this conflict of narratives will not be discriminated by experiments. Indeed, on the one hand, the retrodiction channels (both classical and quantum) are by construction valid channels: nothing forbids the physical implementation of the corresponding processes, as indeed was done in the experimental verifications of Crooks' theorem.36 On the other hand, to falsify the retrodictive narrative, one would have to find a reverse process related to its original process in a way that cannot be expressed by (or worse, contradicts) logical reasoning: it is hard to see how such a claim could ever be made. So, one's narrative of choice will depend on the fruitfulness of the intuition, the economy of concepts, the elegance of the formulas… In this paper, we have hinted at the superiority of the retrodictive narrative in all these respects.
ACKNOWLEDGMENTS
The authors acknowledge the help of Eugene Koh in completing the proof of Result 4. F.B. acknowledges support from the Japan Society for the Promotion of Science (JSPS) KAKENHI, Grant Nos. 19H04066 and 20K03746, and from MEXT Quantum Leap Flagship Program (MEXT Q-LEAP), Grant No. JPMXS0120319794. V.S. acknowledges support from the National Research Foundation and the Ministry of Education, Singapore, under the Research Centres of Excellence programme.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
DATA AVAILABILITY
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
APPENDIX: THE REFERENCE PRIOR THAT MINIMIZES THE AVERAGE KULLBACK–LEIBLER DISTANCE
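Consider a generic classical channel φ with d-dimensional input and output alphabets, and the function minimized in (37),

$$J(\xi) = -\frac{1}{d}\sum_x \ln\xi(x) + \sum_y w(y)\ln(\varphi\xi)(y), \qquad w(y) = \frac{1}{d}\sum_x \varphi(y|x).$$

Denote the reference prior as $\xi(x) = (1+u_x)/d$ with $\sum_x u_x = 0$ and $u_x > -1$. With this parametrization, $(\varphi\xi)(y) = w(y)\,(1+v_y)$ with

$$v_y = \frac{1}{d\,w(y)}\sum_x \varphi(y|x)\,u_x,$$

so that

$$J(\xi) = \ln d - \frac{1}{d}\sum_x \ln(1+u_x) + \sum_y w(y)\ln w(y) + \sum_y w(y)\ln(1+v_y).$$

On the uniform prior ($u_x = 0$ for all x), this takes the value

$$J_0 = \ln d + \sum_y w(y)\ln w(y).$$

Thus, $J(\xi) - J_0$ is equal to

$$\sum_y w(y)\ln(1+v_y) - \frac{1}{d}\sum_x \ln(1+u_x) \;\ge\; \sum_y w(y)\sum_x \frac{\varphi(y|x)}{d\,w(y)}\ln(1+u_x) - \frac{1}{d}\sum_x \ln(1+u_x) = 0,$$

where the inequality is Jensen's inequality on the first term, applied to the concave logarithm with weights $\varphi(y|x)/[d\,w(y)]$, which sum to one over x; the last equality uses $\sum_y \varphi(y|x) = 1$. Thus, we have proved that $J(\xi) \ge J_0$ for every reference prior ξ, i.e., the uniform prior minimizes (37).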