Irreversibility is usually captured by a comparison between the process that happens and a corresponding “reverse process.” In recent decades, this comparison has been extensively studied through fluctuation relations. Here, we revisit fluctuation relations from the standpoint, suggested decades ago by Watanabe, that the comparison should involve prediction and retrodiction on one and the same process, rather than two processes. We identify a necessary and sufficient condition for a retrodictive reading of a fluctuation relation. The retrodictive narrative also brings to the fore the possibility of deriving fluctuation relations based on various statistical divergences, and clarifies some of the traditional assumptions as arising from the choice of a reference prior.

Quantitative approaches to irreversibility traditionally involve a comparison between the process physically happening, usually called the forward process, and a corresponding reverse (or backward) process. The definition of the latter is intuitive only for some processes: somewhat ironically, not for those that are paradigmatic of irreversibility. Indeed, consider the erasure channel, which sends every possible input state to a unique, fixed output state: what should one take as its reverse process?

In a previous paper,1 two of us proposed to look at irreversibility as arising out of our logical inference, rather than out of physical processes. Specifically, we proposed to define the reverse process in terms of Bayesian retrodiction. This is a universal recipe. This retrodictive element can be identified a posteriori in all previously reported fluctuation relations that we checked, including the most famous ones, both classical2–5 and quantum,6 that are highlighted in the many available reviews.7–10 Besides recovering “intuitive” reverse processes, retrodiction provides a definition for the non-intuitive ones, which smoothly removes anomalies that were reported with other tentative definitions. Thus, it seems plausible that all fluctuation relations can be understood in terms of retrodiction (though the literature is too vast and sparse to make a definitive call, we shall strengthen the evidence with Result 3 below).

In the pursuit of this line of research, we recognize that our previous paper was not radical enough. If the retrodictive origin of irreversibility is assumed, the narrative of the two processes becomes superfluous: using retrodiction to define a reverse process is an unnecessary step. There is only one process, the one that happens; what is being compared are our forward and backward inferences on it: prediction and retrodiction.

The replacement of irreversibility with irretrodictability was pioneered by Watanabe,11,12 though prior to our previous work no connection had been drawn with the fluctuation theorems derived in the last twenty years. Under this change of viewpoint, it is the same physics that is being described, freed from excess baggage in the narrative (and thus, possibly, in the interpretation). Besides epistemological economy, we are going to show that this viewpoint is fruitful as it opens previously unnoticed vistas.

The plan of this paper is as follows. In Sec. II, we present a self-contained introduction to retrodiction, both classical and quantum; and Sec. III describes two case studies in detail. Section IV deals with fluctuation relations: we show that these relations are intimately related to statistical distances (“divergences”) and that Bayesian retrodiction arises from the requirement that the fluctuating variable can be computed locally. We also compare the fluctuation relations obtained in the retrodictive narrative with those obtained in the reverse-process narrative. Section V reflects back on the structure of retrodiction, elaborating on the role of the unavoidable reference prior.

A word on the presentation. This paper covers topics from statistics, thermodynamics, and quantum information. We have tried to keep the presentation self-contained. We have also adopted a compact approach to references: besides those that prove specific results, we shall cite mostly reference books and reviews, and occasionally a few works that we consider clear and exemplary, useful as entry points for the reader, without any expectation of being exhaustive.

In this paper, we consider processes with discrete alphabets. The input of the channel is denoted $x \in \{1, 2, \ldots, d_x\}$ and the output $y \in \{1, 2, \ldots, d_y\}$. We shall always have in mind $d_x = d_y = d$, keeping the notation different only when clarity demands it.

Also, throughout the paper, we assume that all probability vectors have strictly positive elements, and that all channels have only strictly positive entries (with the exception of the permutation channels studied in Subsection III A). Arbitrarily small entries would be indistinguishable from an exact zero, certainly in practice, and perhaps also in principle depending on one's understanding of probabilities.

As basic setting of retrodiction, consider the most elementary form of statistical inference: at the output of a known channel $\varphi(y|x)$, one observes the outcome $y = y^*$, and wants to infer something about the input $x$. In this paper, we focus on Bayesian retrodiction, whose goal is to update one's belief on the distribution of $x$. This requires a prior belief, the reference prior, denoted $\xi(x)$. The total prior knowledge is therefore captured by the joint probability distribution $P_\xi(x,y) = \xi(x)\,\varphi(y|x)$; in particular, the prior knowledge about $y$ is $\hat\xi(y) = \sum_x \xi(x)\,\varphi(y|x)$. When the knowledge on $y$ is updated to $y = y^*$, one performs the Bayes' update

$P_\xi(x,y) \;\xrightarrow{\;y\,=\,y^*\;}\; P'_\xi(x,y) = P_\xi(x|y^*)\,\delta_{y,y^*}$
(1)

on the total knowledge, whence the updated knowledge on $x$ follows as $P_\xi(x|y^*) = \xi(x)\,\varphi(y^*|x)/\hat\xi(y^*)$. This is the most elementary example of retrodiction.

Slightly less basic, though also widely discussed in the statistical literature, is the retrodiction on $x$ based on “soft evidence” on $y$. This refers to the situation in which the update on $y$ is not a sharp value $y = y^*$, but a distribution $u(y)$. In real life, soft evidence may arise by sheer uncertainty (e.g., reading the outcome $y$ in very dim light) or by virtual evidence (e.g., the doctor told me a definite result $z = z^*$ for my test, but I saw that he was tired and fear that he may have misread the actual result $y$ written on the sheet). The translation of such uncertainties into a quantitative $u(y)$ is not trivial,13,14 but we take it for granted. For such situations, Bayes' update (1) is generalized to Jeffrey's update15

$P_\xi(x,y) \;\xrightarrow{\;u(y)\;}\; P'_\xi(x,y) = P_\xi(x|y)\,u(y).$
(2)

In the case of virtual evidence, Jeffrey's update is a direct consequence16,17 of Bayes' update starting from z = z *, under the assumption that the variable z influences directly only y and not x (cf. the example above: the tiredness of the doctor has no direct influence on whether I am actually sick). In other cases, it may be considered as an actual addition to the rules of Bayesian inference (this was Jeffrey's own view).
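
For readers who like to see the updates in action, here is a minimal numerical sketch of Bayes' update (1) and Jeffrey's update (2); the channel, the reference prior, and the soft evidence are arbitrary numbers chosen only for illustration.

```python
import numpy as np

# Channel phi(y|x) as a column-stochastic matrix (rows: y, columns: x);
# the numerical values are arbitrary, for illustration only.
phi = np.array([[0.7, 0.2],
                [0.3, 0.8]])
xi = np.array([0.6, 0.4])            # reference prior xi(x)

P_joint = phi * xi                   # P_xi(x,y) = xi(x) phi(y|x), indexed [y, x]
xi_hat = P_joint.sum(axis=1)         # xi_hat(y) = sum_x xi(x) phi(y|x)

# Bayes' update, Eq. (1): sharp evidence y = y*
y_star = 1
posterior = P_joint[y_star, :] / xi_hat[y_star]   # P_xi(x|y*)
print("Bayes posterior on x:", posterior)

# Jeffrey's update, Eq. (2): soft evidence u(y)
u = np.array([0.1, 0.9])
P_cond = P_joint / xi_hat[:, None]               # P_xi(x|y), indexed [y, x]
P_updated = P_cond * u[:, None]                  # P'(x,y) = P_xi(x|y) u(y)
print("Jeffrey-updated marginal on x:", P_updated.sum(axis=0))
```

With a sharp soft evidence $u(y) = \delta_{y,y^*}$, the Jeffrey-updated marginal reduces to the Bayes posterior, as expected.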

Thus, the conditional probability $P_\xi(x|y)$ plays the role of a channel for retrodiction, in short the retrodiction channel. For the remainder of the paper, we change the notation to

$\hat\varphi_\xi(x|y) = \dfrac{\xi(x)}{\hat\xi(y)}\,\varphi(y|x).$
(3)

Where convenient, we shall use a matrix representation. The channel $\varphi(y|x)$ is represented by the column-stochastic matrix

$M_\varphi = \begin{pmatrix} \varphi(1|1) & \cdots & \varphi(1|d_x) \\ \vdots & \varphi(y|x) & \vdots \\ \varphi(d_y|1) & \cdots & \varphi(d_y|d_x) \end{pmatrix}.$

Similarly, the retrodiction channel $\hat\varphi_\xi(x|y)$ is represented by the column-stochastic matrix

$M_{\hat\varphi_\xi} = \begin{pmatrix} \hat\varphi_\xi(1|1) & \cdots & \hat\varphi_\xi(1|d_y) \\ \vdots & \hat\varphi_\xi(x|y) & \vdots \\ \hat\varphi_\xi(d_x|1) & \cdots & \hat\varphi_\xi(d_x|d_y) \end{pmatrix}.$

In this notation, we similarly represent input and output distributions $p(x)$ as column vectors $v_p$. For instance, the relations that define the reference prior can be written as

$M_\varphi v_\xi = v_{\hat\xi}, \qquad M_{\hat\varphi_\xi} v_{\hat\xi} = v_\xi.$
(4)
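
As a sketch (not part of the original derivation), Eqs. (3) and (4) can be checked numerically: starting from an arbitrary column-stochastic matrix $M_\varphi$ and an arbitrary reference prior $\xi$, one builds $M_{\hat\varphi_\xi}$ entrywise and verifies that it is column-stochastic and maps $v_{\hat\xi}$ back to $v_\xi$.

```python
import numpy as np

def retrodiction_matrix(M_phi, xi):
    """Return M with entries M[x, y] = xi(x) phi(y|x) / xi_hat(y), as in Eq. (3).
    M_phi is column-stochastic with M_phi[y, x] = phi(y|x)."""
    xi_hat = M_phi @ xi                      # Eq. (4), first relation
    return (xi[:, None] * M_phi.T) / xi_hat[None, :]

# Arbitrary 3x3 example with strictly positive entries
rng = np.random.default_rng(0)
M_phi = rng.random((3, 3)) + 0.1
M_phi /= M_phi.sum(axis=0)                   # make it column-stochastic
xi = np.array([0.5, 0.3, 0.2])

M_hat = retrodiction_matrix(M_phi, xi)
assert np.allclose(M_hat.sum(axis=0), 1.0)            # column-stochastic
assert np.allclose(M_hat @ (M_phi @ xi), xi)           # Eq. (4), second relation
```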

Before continuing, we bring up two crucial remarks. The first is about the reference prior. It is well known,11,12,17 and our presentation above makes it clear once again, that this element of subjectivity is an unavoidable feature of Bayesian retrodiction for a generic channel. The question of the choice of the prior is a recurring topic in Bayesian statistics. The literature on fluctuation relations does not mention it as such, the assumption being stated in more physical language. We shall get back to this point in Sec. V. Here, for the sake of definiteness we just mention two possible choices. One is the uniform prior $\xi(x) = \frac{1}{d}$ for all $x$. Another is the steady state of $\varphi$, defined by $\gamma = \hat\gamma$. Every stochastic map has at least one steady state, and exactly one if all its entries are strictly positive. It follows immediately from $\hat\varphi_\gamma(x|y) = \frac{\gamma(x)}{\gamma(y)}\,\varphi(y|x)$ that the uniform prior is a steady state if and only if $\varphi$ is bistochastic.
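
The steady state can be computed as the eigenvector of $M_\varphi$ with eigenvalue 1. A minimal sketch, with an arbitrary positive channel and an arbitrary bistochastic example to illustrate the last remark:

```python
import numpy as np

def steady_state(M_phi):
    """Steady state gamma with M_phi @ gamma = gamma (eigenvector of eigenvalue 1)."""
    evals, evecs = np.linalg.eig(M_phi)
    k = np.argmin(np.abs(evals - 1.0))
    gamma = np.real(evecs[:, k])
    return gamma / gamma.sum()

M_phi = np.array([[0.9, 0.3],
                  [0.1, 0.7]])       # arbitrary positive column-stochastic matrix
gamma = steady_state(M_phi)
assert np.allclose(M_phi @ gamma, gamma)

# A bistochastic example (rows and columns both sum to 1): steady state is uniform
M_bi = np.array([[0.8, 0.2],
                 [0.2, 0.8]])
print(steady_state(M_bi))            # -> [0.5, 0.5]
```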

The second remark, which others have also felt the need to highlight,18 is that retrodiction is not inversion. A channel has a linear inverse if there exists $M$ such that $M M_\varphi = \mathbb{1}$. In the case of an invertible channel, given a valid output distribution $\hat p(y) = \sum_x \varphi(y|x)\,p(x)$, one is able to recover the input distribution $p(x)$. But for most invertible channels, $M$ is not a channel itself: there exist $u(y)$ such that $v_u \neq M_\varphi v_p$ for every input distribution $p$. In particular, since the image of the probability simplex by $M_\varphi$ is convex, there exists $y^*$ such that no input distribution $p(x)$ is mapped by $M_\varphi$ to $\delta_{y,y^*}$, whereas retrodicting from $\delta_{y,y^*}$ is the most basic example of Bayesian inference. Ultimately of course the difference is in the task: retrodiction does not aim at reconstructing the prior through repeated sampling, but at updating one's belief after a single run of the process. This is particularly clear in the medical-test example given above. See Refs. 42 and 43 for studies of retrodiction of quantum temporal dynamics.

In Subsection III A, we shall see a remarkable coincidence: the channels for which the retrodiction channel coincides with the inverse, and those for which the retrodiction channel is independent of the reference prior, are exactly the same.

According to our current knowledge, the most general description of the inner workings of any classical input–output channel is given by quantum theory. The quantum-inside description of a classical channel is as follows (Fig. 1). The classical input $x$ prepares a system in a state $\rho_x$. The system is then sent through a quantum channel [a completely positive, trace-preserving (CPTP) map] $\mathcal{E}$, and eventually measured with the positive operator-valued measure (POVM) $\{\Pi_y\}$, leading to the classical outcome $y$. All in all,

$\varphi(y|x) = \mathrm{Tr}\!\left(\Pi_y\,\mathcal{E}[\rho_x]\right).$
(5)
Fig. 1. A quantum-inside classical channel and the corresponding retrodiction channel. The construction, described in Eqs. (7)–(10), is valid for every set of states $\{\rho_x\}$, every CPTP map $\mathcal{E}$, and every POVM $\{\Pi_y\}$.

We want to derive the quantum description of the associated classical retrodiction channel (3): that is, finding states $\hat\rho_y$, a CPTP map $\hat{\mathcal{E}}$, and a POVM $\{\hat\Pi_x\}_x$, such that

$\hat\varphi_\xi(x|y) = \mathrm{Tr}\!\left(\hat\Pi_x\,\hat{\mathcal{E}}[\hat\rho_y]\right).$
(6)

For this, we first need to define the adjoint $\mathcal{E}^\dagger$ of the channel, that is, the map such that $\mathrm{Tr}(Y\,\mathcal{E}[X]) = \mathrm{Tr}(\mathcal{E}^\dagger[Y]\,X)$ for all operators $X$, $Y$. Inserting this definition in (3), one finds

$\hat\varphi_\xi(x|y) = \mathrm{Tr}\!\left(\big(\xi(x)\,\rho_x\big)\,\mathcal{E}^\dagger\!\left[\Pi_y/\hat\xi(y)\right]\right).$

This looks like (6), but in general one has $\mathrm{Tr}\!\left(\Pi_y/\hat\xi(y)\right) \neq 1$, $\sum_x \xi(x)\,\rho_x \neq \mathbb{1}$, and $\mathcal{E}^\dagger$ is a CPTP map only if $\mathcal{E}$ is unital. In order to identify proper states, channel, and measurement, one has to introduce a reference state

$\Xi = \sum_x \xi(x)\,\rho_x.$
(7)

As in the classical case, we assume that $\Xi$ and $\hat\Xi = \mathcal{E}[\Xi]$ have full rank, to skip caveats for situations of measure zero. Then one possible construction of the quantum elements of (6) uses

$\hat\rho_y = \dfrac{1}{\hat\xi(y)}\,\mathcal{S}\!\left(\mathcal{E}[\Xi]\right)[\Pi_y],$
(8)
$\hat{\mathcal{E}} \equiv \hat{\mathcal{E}}_\Xi = \mathcal{S}(\Xi)\circ\mathcal{E}^\dagger\circ\mathcal{S}^{-1}\!\left(\mathcal{E}[\Xi]\right),$
(9)
$\hat\Pi_x = \xi(x)\,\mathcal{S}^{-1}(\Xi)[\rho_x],$
(10)

where we have introduced the notation $\mathcal{S}(A)[B] = \sqrt{A}\,B\,\sqrt{A}$ for a positive operator $A$. Starting from this basic construction, one can obtain others as follows:

$\hat\rho_y \to \mathcal{U}_s[\hat\rho_y],$
(11)
$\hat{\mathcal{E}}_\Xi \to \mathcal{U}_m\circ\hat{\mathcal{E}}_\Xi\circ\,\mathcal{U}_s^{-1},$
(12)
$\hat\Pi_x \to \hat\Pi_x\circ\,\mathcal{U}_m^{-1},$
(13)

also leading to (6) for any pair of unitary channels $(\mathcal{U}_s, \mathcal{U}_m)$.

The key observation is that the retrodiction channel $\hat{\mathcal{E}}$ turns out to be the Petz recovery or Petz transpose map19,20 of $\mathcal{E}$ for the reference state $\Xi$ [Eq. (9)], or a rotated version thereof [Eq. (12)].

The Petz map, a widely used tool in quantum information,21–23 was previously identified on formal grounds as the generalization of retrodiction within the quantum formalism.24–27 First of all, in the case where all the states and the channels are diagonal in the same basis, (9) reduces to (3). Furthermore, just as the Bayesian retrodiction $\hat\varphi_\xi$ depends on a reference prior $\xi$, the Petz map $\hat{\mathcal{E}}_\alpha$ depends on a reference state $\alpha$.28 Interestingly, the Petz map was also used for quantum fluctuation relations,29 but the connection with retrodiction was not noticed.
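
As an illustration of Eq. (9), here is a minimal numpy sketch of the Petz transpose map for a channel given in Kraus form; the amplitude-damping channel, the reference state, and the numerical values are arbitrary choices for the example, and only the defining recovery property $\hat{\mathcal{E}}_\Xi[\mathcal{E}[\Xi]] = \Xi$ is checked.

```python
import numpy as np
from scipy.linalg import sqrtm, inv

def apply(kraus, rho):
    """Channel E[rho] = sum_i K_i rho K_i^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def apply_adjoint(kraus, X):
    """Adjoint map E^dagger[X] = sum_i K_i^dagger X K_i."""
    return sum(K.conj().T @ X @ K for K in kraus)

def petz_map(kraus, Xi):
    """Petz transpose map of Eq. (9): S(Xi) o E^dagger o S^{-1}(E[Xi])."""
    sqrt_Xi = sqrtm(Xi)
    inv_sqrt_EXi = inv(sqrtm(apply(kraus, Xi)))
    def E_hat(sigma):
        return sqrt_Xi @ apply_adjoint(kraus, inv_sqrt_EXi @ sigma @ inv_sqrt_EXi) @ sqrt_Xi
    return E_hat

# Arbitrary qubit example: an amplitude-damping channel and a full-rank reference state
p = 0.3
kraus = [np.array([[1, 0], [0, np.sqrt(1 - p)]]),
         np.array([[0, np.sqrt(p)], [0, 0]])]
Xi = np.array([[0.6, 0.1], [0.1, 0.4]])

E_hat = petz_map(kraus, Xi)
# Sanity check: the Petz map always recovers the reference state
assert np.allclose(E_hat(apply(kraus, Xi)), Xi)
```

In the commuting (diagonal) case, the same construction reproduces the classical formula (3), as stated in the text.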

In this section, we first present retrodiction on Hamiltonian channels (both classical and quantum), which are provably the only ones for which the retrodictive map is independent of the reference prior and is identical to the inverse. Then, we discuss retrodiction for all classical bit channels ($d = 2$): precisely because it is elementary, this case study is useful to clarify features and dissipate possible confusions about retrodiction.

We call Hamiltonian channels those channels, both classical and quantum, that are both deterministic and invertible (Watanabe12 referred to these channels as “bilaterally deterministic”). The flows do not cross, and each state belongs to one and only one trajectory.

For classical information, we have $y = f(x)$ with $f$ a bijection (in the discrete case, a permutation), and so $x = f^{-1}(y)$ is uniquely defined. In this case, it is absolutely natural to expect

$\varphi(y|x) = \delta_{y,f(x)} \;\Longrightarrow\; \hat\varphi_\xi(x|y) = \delta_{x,f^{-1}(y)},$
(14)

independent of the reference prior. It is readily verified that this is indeed the case from Eq. (3), since for a bijection we have $\hat\xi(y) = \sum_x \xi(x)\,\delta_{y,f(x)} = \xi(f^{-1}(y))$.

This result has very appealing features: the retrodiction channel coincides with the inverse and is independent of the arbitrary choice of reference prior. Appealing as they are, these features cannot be taken as paradigmatic, because they are actually unique to this case.

Result 1. The following three statements are equivalent:

  • (I) The channel $\varphi$ is a permutation.

  • (II) The retrodiction channel $\hat\varphi_\xi$ is independent of the reference prior $\xi$.

  • (III) There exists a reference prior $\xi$, for which the retrodiction channel $\hat\varphi_\xi$ is the inverse channel $\varphi^{-1}$.

Proof. We present a full proof here, putting on record that the equivalence of (I) and (II) was already proved in Watanabe's pioneering study.12 

Equation (14) proves (I) $\Rightarrow$ (II, III). The implication (II) $\Rightarrow$ (III) goes as follows: Eq. (4) implies $M_{\hat\varphi_\xi} M_\varphi v_\xi = v_\xi$. If $M_{\hat\varphi_\xi} = M_{\hat\varphi}$ for all $\xi$, then $M_{\hat\varphi} M_\varphi v_\xi = v_\xi$ for all such vectors; since these span the whole space, $M_{\hat\varphi} M_\varphi = \mathbb{1}$, that is, $\hat\varphi = \varphi^{-1}$.

We are left to prove (III) $\Rightarrow$ (I). Let us assume that there is a reference prior such that $\hat\varphi_\xi = \varphi^{-1}$, i.e., $M_{\hat\varphi_\xi} M_\varphi = \mathbb{1}$. Let us spell out this condition:

$M_{\hat\varphi_\xi} M_\varphi = \mathbb{1} \;\Leftrightarrow\; \sum_y \big[M_{\hat\varphi_\xi}\big]_{x'y}\,\big[M_\varphi\big]_{yx} = \delta_{x'x} \;\;\forall\, x, x' \;\Leftrightarrow\; \sum_y \hat\varphi_\xi(x'|y)\,\varphi(y|x) = \delta_{x'x} \;\;\forall\, x, x' \;\Leftrightarrow\; \sum_y \frac{\xi(x')}{\hat\xi(y)}\,\varphi(y|x')\,\varphi(y|x) = \delta_{x'x} \;\;\forall\, x, x'.$

All the terms are products of non-negative numbers: $\xi(x) > 0$ for all $x$ by assumption, and $1/\hat\xi(y) > 0$ since $0 < \hat\xi(y) \leq 1$. Thus, for all the off-diagonal terms to be zero, we need

$\varphi(y|x)\,\varphi(y|x') = 0 \quad \forall\, x \neq x',\;\forall\, y.$
(15)

This means that the product of any two entries of a given row $y$ is always zero. Hence, there can be at most one non-zero entry in that row, which means that the matrix $M_\varphi$ can have at most $d$ non-zero entries. But there are $d$ columns, and the sum of all the elements of each column must be 1. Thus, the only possibility is that each row and each column have exactly one non-zero entry, and the value of that entry is 1. This defines a permutation matrix and concludes the proof. ◻

Incidentally, condition (15) shows that $M_{\hat\varphi_\xi} M_\varphi = \mathbb{1}$ does not depend on the reference prior; so, at that point we had proved directly (III) $\Rightarrow$ (II).

The same result holds for retrodiction on quantum information—in fact, Result 1 was presented first for reasons of clarity, but can be seen as a special case of the following:

Result 2. The following three statements are equivalent:

  • (I) The channel $\mathcal{E}$ is unitary.

  • (II) The retrodiction channel $\hat{\mathcal{E}}_\alpha$ is independent of the reference state $\alpha$.

  • (III) There exists a reference state $\alpha$, for which the retrodiction channel $\hat{\mathcal{E}}_\alpha$ is the inverse channel $\mathcal{E}^{-1}$.

Proof. The implications (I) $\Rightarrow$ (II, III) follow from the direct calculation of (9) for a unitary channel:

$\hat{\mathcal{U}}_\alpha = \mathcal{S}(\alpha)\circ\,\mathcal{U}^\dagger\circ\mathcal{S}^{-1}(\mathcal{U}[\alpha]) = \big(\mathcal{U}^\dagger\circ\,\mathcal{U}\big)\circ\mathcal{S}(\alpha)\circ\,\mathcal{U}^\dagger\circ\mathcal{S}^{-1}(\mathcal{U}[\alpha]) = \mathcal{U}^\dagger = \mathcal{U}^{-1},$

where we have used $\mathcal{U}^\dagger\circ\,\mathcal{U} = \mathcal{I}$, the identity channel, and $\mathcal{U}\circ\mathcal{S}(\alpha)\circ\,\mathcal{U}^\dagger = \mathcal{S}(\mathcal{U}[\alpha])$, that is, $U\sqrt{\alpha}\,U^\dagger = \sqrt{U\alpha U^\dagger}$.

The proof of (II) $\Rightarrow$ (III) is analogous to that for classical information. Trivially, $\hat{\mathcal{E}}_\alpha\circ\mathcal{E}[\alpha] = \alpha$ holds by definition of $\hat{\mathcal{E}}_\alpha$. Therefore, if $\hat{\mathcal{E}}_\alpha = \hat{\mathcal{E}}$ for all $\alpha$, then $\hat{\mathcal{E}}[\mathcal{E}[\rho]] = \rho$ for all $\rho$; whence $\hat{\mathcal{E}} = \mathcal{E}^{-1}$.

Finally, for the proof of (III) $\Rightarrow$ (I): since any Petz map is CPTP, the starting assumption $\hat{\mathcal{E}}_\alpha = \mathcal{E}^{-1}$ implies that $\mathcal{E}^{-1}$ is a CPTP map. But it is known that a CPTP map $\mathcal{E}$ with the same input and output space has a CPTP inverse (that is, it is invertible, and the inverse is itself a channel) if and only if it is unitary.30,31 ◻

As a second case study, we consider classical stochastic processes for d =2 (Fig. 2). We write a generic channel as

$M_\varphi = \begin{pmatrix} \varphi(0|0) & \varphi(0|1) \\ \varphi(1|0) & \varphi(1|1) \end{pmatrix} \equiv \begin{pmatrix} 1-a & b \\ a & 1-b \end{pmatrix},$
(16)

with $0 \leq a, b \leq 1$. Its steady state is

$\gamma(0) = \dfrac{b}{a+b}, \qquad \gamma(1) = \dfrac{a}{a+b},$

unique unless a = b = 0 (this being expected, since every state is a steady state for the identity channel).

Fig. 2. Bit channels (a) and their retrodiction (b) can be depicted as the respective maps above.

The corresponding retrodiction channel with generic reference prior is

$M_{\hat\varphi_\xi} = \begin{pmatrix} \hat\varphi_\xi(0|0) & \hat\varphi_\xi(0|1) \\ \hat\varphi_\xi(1|0) & \hat\varphi_\xi(1|1) \end{pmatrix} = \begin{pmatrix} (1-a)\,\frac{\xi(0)}{\hat\xi(0)} & a\,\frac{\xi(0)}{\hat\xi(1)} \\[2pt] b\,\frac{\xi(1)}{\hat\xi(0)} & (1-b)\,\frac{\xi(1)}{\hat\xi(1)} \end{pmatrix},$
(17)

with $\hat\xi(0) = (1-a)\,\xi(0) + b\,\xi(1)$ and $\hat\xi(1) = 1 - \hat\xi(0)$. Interestingly, the retrodiction channel built on the steady state has the same stochastic matrix as the channel itself,

$M_{\hat\varphi_\gamma} = M_\varphi \qquad [d = 2].$
(18)

This can be verified without calculation, noticing that $\hat\varphi_\gamma(x|x) = \varphi(x|x)$ and that $M_{\hat\varphi_\gamma}$ must also be column-stochastic.

The channel (16) is invertible if and only if $a + b \neq 1$. Result 1 of course holds: the retrodiction channel will be the inverse if and only if $a = b = 0$ (identity channel) or $a = b = 1$ (bit-flip channel). For all the other invertible channels, retrodiction and inversion do not coincide, whatever the choice of the reference prior.

The non-invertible channels, $a + b = 1$, make for an interesting case study; we change the notation to $a \equiv \varepsilon$, so that

$M_\varphi = \begin{pmatrix} 1-\varepsilon & 1-\varepsilon \\ \varepsilon & \varepsilon \end{pmatrix}.$
(19)

First notice that

$M_\varphi v_p = \begin{pmatrix} 1-\varepsilon \\ \varepsilon \end{pmatrix},$
(20)

for all input p. In other words, these channels erase whatever information is present in the input, and produce a fixed output distribution (which, of course, coincides with their steady state). In this sense, they could all be called erasure channels, though the name is usually given to the case ε = 0.

Because at the output all information on the input has been destroyed, one may naively expect the retrodiction channel to produce a completely random outcome. But this forgets the importance of the reference prior in retrodiction. Plugging the expressions in the equations, one readily finds

$M_{\hat\varphi_\xi} = \begin{pmatrix} \xi(0) & \xi(0) \\ \xi(1) & \xi(1) \end{pmatrix}.$
(21)

The retrodiction channel of an erasure channel is the erasure channel that returns the reference prior, a result that can be easily extended to any alphabet dimension.32 In agreement with (18), if the reference prior is the steady state, the retrodiction channel is the same erasure channel. The fact that one can associate a thermodynamically reversible process to a logically irreversible channel like the erasure channel was noted for instance by Sagawa (Ref. 44). The identity between the forward and the reverse erasure channels was noticed by Riechers and co-workers (Ref. 45) for a specific toy model of physical erasure. We see here that it is an unavoidable feature, once the reverse process is defined from the steady state. These observations are summarized in Fig. 3.
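
The two claims of this case study, Eqs. (18) and (21), are easy to check numerically with the same recipe as Eq. (3); the values of $a$, $b$, $\varepsilon$, and $\xi$ below are arbitrary.

```python
import numpy as np

def retrodiction_matrix(M_phi, xi):
    """M[x, y] = xi(x) phi(y|x) / xi_hat(y), as in Eq. (3)."""
    xi_hat = M_phi @ xi
    return (xi[:, None] * M_phi.T) / xi_hat[None, :]

# Generic bit channel, Eq. (16), with arbitrary a, b
a, b = 0.25, 0.55
M_phi = np.array([[1 - a, b],
                  [a, 1 - b]])
gamma = np.array([b, a]) / (a + b)             # steady state
assert np.allclose(retrodiction_matrix(M_phi, gamma), M_phi)          # Eq. (18)

# Epsilon-erasure channel, Eq. (19): retrodiction returns the reference prior, Eq. (21)
eps = 0.2
M_erase = np.array([[1 - eps, 1 - eps],
                    [eps, eps]])
xi = np.array([0.7, 0.3])
assert np.allclose(retrodiction_matrix(M_erase, xi), np.array([xi, xi]).T)  # Eq. (21)
```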

Fig. 3. (a) An $\varepsilon$-erasure channel, (b) the retrodiction channel with the steady state as reference prior, (c) the retrodiction channel for a generic reference prior.

The topic of this section, fluctuation relations, originated in statistical thermodynamics. As we shall see, the formal structure of these relations can be derived without any reference to that branch of physics. As it happens, we shall mention thermodynamics only in the very last paragraph of the section. The explicit application of these formulas to important situations in thermodynamics was discussed in our previous paper.1 

As we noted in the introduction, it is customary in studies of irreversibility to define the physical process as the forward process, and to compare it to its corresponding reverse process. Here, we adopt a different narrative as follows:

  • There is only one process, the one that is happening.

  • A (forward) prediction on the process starts with a prior p(x) on the input, and infers the predicted distribution
    $P_F(x,y) = p(x)\,\varphi(y|x).$
    (22)
  • A retrodiction on the process starts with a prior q(y) on the output, and infers the retrodicted distribution
    $P_{R[\xi]}(x,y) = q(y)\,\hat\varphi_\xi(x|y).$
    (23)

The explicit mention of ξ will be dropped for simplicity in the remainder of this section and resumed in Sec. V.

We proceed to derive fluctuation relations with our narrative, and later we show the comparison with the reverse-process narrative.

Consider a variable $\Omega(x,y)$ that depends on the initial and final states, and may be determined by the process. Its predicted distribution is

$\mu_F(\omega) = \sum_{x,y} \delta\big(\omega - \Omega(x,y)\big)\,P_F(x,y),$
(24)

while its retrodicted distribution is

$\mu_R(\omega) = \sum_{x,y} \delta\big(\omega - \Omega(x,y)\big)\,P_R(x,y) = \sum_{x,y} \delta\big(\omega - \Omega(x,y)\big)\,R(x,y)\,P_F(x,y),$
(25)

where

$R(x,y) = \dfrac{P_R(x,y)}{P_F(x,y)}.$
(26)

So, the difference between $\mu_F(\omega)$ and $\mu_R(\omega)$ is encoded in this ratio of probabilities, which is exactly the quantity that appears in the statistical f-divergence33,34

$D_f(P_R\,\|\,P_F) = \sum_{x,y} P_F(x,y)\,f\big(R(x,y)\big),$
(27)

where the function $f(r)$ must be convex for $r \in \mathbb{R}^+$ and satisfy $f(1) = 0$. The “entropy production,” on which the thermodynamical literature bases fluctuation relations, uses $f(r) = -\ln(r)$, which generates the reverse Kullback–Leibler distance $D_{KL}(P_F\,\|\,P_R)$. But we do not need to choose that particular function at this stage: for any function $f(r)$ invertible35 for $r \in \mathbb{R}^+$, if we set

$\Omega(x,y) = f\big(R(x,y)\big),$
(28)

we have by definition

$\langle\omega\rangle_F = D_f(P_R\,\|\,P_F).$
(29)

Besides, there immediately follows from (24) and (25) the fluctuation relation

$\mu_R(\omega) = f^{-1}(\omega)\,\mu_F(\omega),$
(30)

that is the generalization of Crooks' theorem.4 By integrating over ω, one obtains the integral fluctuation relation

$\big\langle f^{-1}(\omega)\big\rangle_F = \sum_{x,y} P_R(x,y) = 1,$
(31)

that depends only on the process. This is the generalization of Jarzynski's equality.3 
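
A numerical sketch of this derivation for the choice $f(r) = -\ln(r)$: for an arbitrary channel and arbitrary priors $p$, $q$, and $\xi$, the detailed relation (30) can be checked on each point of the (discrete) support of $\omega$, and the integral relation (31) on the predicted average.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
phi = rng.random((d, d)) + 0.1
phi /= phi.sum(axis=0)                       # column-stochastic phi(y|x)

p = rng.dirichlet(np.ones(d))                # prior on the input, Eq. (22)
q = rng.dirichlet(np.ones(d))                # prior on the output, Eq. (23)
xi = rng.dirichlet(np.ones(d))               # reference prior
xi_hat = phi @ xi

P_F = phi * p                                # P_F(x,y) = p(x) phi(y|x), indexed [y, x]
phi_hat = (xi[:, None] * phi.T) / xi_hat[None, :]   # Eq. (3), indexed [x, y]
P_R = (phi_hat * q).T                        # P_R(x,y) = q(y) phihat_xi(x|y), indexed [y, x]

R = P_R / P_F                                # Eq. (26)
Omega = -np.log(R)                           # Eq. (28) with f(r) = -ln(r)

# Detailed fluctuation relation, Eq. (30): mu_R(w) = exp(-w) mu_F(w) on each support point
for w in np.unique(np.round(Omega, 12)):
    mask = np.isclose(Omega, w)
    assert np.isclose(P_R[mask].sum(), np.exp(-w) * P_F[mask].sum())

# Integral fluctuation relation, Eq. (31): <exp(-omega)>_F = 1
assert np.isclose((P_F * np.exp(-Omega)).sum(), 1.0)
```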

In all the literature we are aware of, fluctuation relations are presented as a measure of the statistical difference between the forward and the reverse process, not between the predicted and retrodicted distributions of a single process. The difference between the two narratives has mathematical manifestations that we are going to discuss now.

For the sake of definiteness, let us start with a canonical example. Suppose that the variable of interest is entropy, and that in the process under study it changes by $\Delta S$. In a retrodictive approach, (23) defines the retrodicted distribution for that same process. But if one looks at (23) as defining a reverse process, for that process the change of entropy will be rather $-\Delta S$.

Generalizing this observation, the distribution of the variable ω in the reverse process reads

$\tilde\mu_R(\omega) = \sum_{x,y} \delta\big(\omega - \tilde\Omega(x,y)\big)\,P_R(x,y),$
(32)

where, under assumption (28),

$\tilde\Omega(x,y) = f\!\left(\dfrac{1}{R(x,y)}\right) \equiv g\big(\Omega(x,y)\big),$
(33)

because the roles of $P_F$ and $P_R$ are exchanged between the forward and the reverse process [for the choice $f(r) = -\ln(r)$, there follows the expected minus sign $g(\omega) = f(1/r) = -f(r) = -\omega$]. The resulting fluctuation relation then reads1

$\tilde\mu_R\big(g(\omega)\big)\,\big|g'(\omega)\big| = f^{-1}(\omega)\,\mu_F(\omega).$
(34)

As expected, $\mu_F$ evaluated at $\omega$ is now related to $\tilde\mu_R$ evaluated at $g(\omega)$. The Jacobian factor, which comes from the change of variable in the $\delta$-function, ensures that the integral fluctuation relation takes exactly the same form as (31).

A comparison of (30) and (34) for various choices of $f$ is given in Table I. For the thermodynamical case $f(r) = -\ln(r)$, we have $|g'(\omega)| = 1$, and therefore the only difference between (30) and (34) is that $\mu_R$ is evaluated at $\omega$ while $\tilde\mu_R$ is evaluated at $-\omega$. Thus, in thermodynamics not only the Jarzynski equality but also the Crooks fluctuation theorem is the same in both narratives (up to that sign change). Interestingly, even when reporting experiments in which the reverse process was actually implemented, it is the retrodictive version that is usually plotted for its visual convenience: see for instance the pioneering verification of Crooks' fluctuation theorem with folding and unfolding of RNA.36

Table I.

Fluctuation relations (FRs) obtained in the retrodictive and in the reverse-process narratives, for a few choices of $\omega$ satisfying (29) for the corresponding f-divergence. The fourth column is kept in the form (34) without possible algebraic simplifications, to facilitate the identification of $g(\omega)$ and $|g'(\omega)|$.

$\omega = f(r)$ | f-divergence | FR for retrodiction (30) | FR for reverse process (34) | Integral FR (31)
$-\ln(r)$ | Reverse Kullback–Leibler | $\mu_R(\omega) = e^{-\omega}\,\mu_F(\omega)$ | $\tilde\mu_R(-\omega) = e^{-\omega}\,\mu_F(\omega)$ | $\langle e^{-\omega}\rangle_F = 1$
$(1-\sqrt{r})^2$ | Squared Hellinger | $\mu_R(\omega) = (1-\sqrt{\omega})^2\,\mu_F(\omega)$ | $\tilde\mu_R\!\left(\frac{\omega}{(1-\sqrt{\omega})^2}\right)\frac{1}{(1-\sqrt{\omega})^{3}} = (1-\sqrt{\omega})^2\,\mu_F(\omega)$ | $\langle(1-\sqrt{\omega})^2\rangle_F = 1$
$1/r - 1$ | Neyman $\chi^2$ | $\mu_R(\omega) = \frac{1}{1+\omega}\,\mu_F(\omega)$ | $\tilde\mu_R\!\left(\frac{-\omega}{1+\omega}\right)\frac{1}{(1+\omega)^2} = \frac{1}{1+\omega}\,\mu_F(\omega)$ | $\left\langle\frac{1}{1+\omega}\right\rangle_F = 1$

In the retrodictive narrative, the fluctuation relation (30) and the derived relation (31) are statistical properties of the random variable $\omega$ defined by (28). They are formally valid for the statistical comparison between arbitrary $P_F$ and $P_R$, with no reference to the notion of retrodiction, let alone to its mathematical expression (3). In the reverse-process narrative, one studies the distribution of the values of the variable when the roles of $P_F$ and $P_R$ are swapped [Eq. (33)]; but even then, the fluctuation relations follow without having specified any mathematical relation between $P_F$ and $P_R$. So, what is the role of Bayesian retrodiction, or that of a proper definition of the reverse process? We are going to prove that it singles out a specific structure for $R(x,y)$, and that this simple result has far-reaching consequences in the context of thermodynamics.

Result 3. The ratio $R(x,y)$ [Eq. (26)] is of the form $F(x)\,G(y)$, for some functions $F$ and $G$, if and only if $P_F$ and $P_R$ are related as (22) and (23), with the latter constructed from Bayesian retrodiction [Eq. (3)]. In this case,

$R(x,y) = \dfrac{\xi(x)}{p(x)}\,\dfrac{q(y)}{\hat\xi(y)}.$
(35)

Proof. If $P_F$ and $P_R$ are given by (22) and (23), using (3) it is trivial to derive (35). In the other direction: without loss of generality we keep the form (22) for $P_F$ and, using the product rule of joint probabilities, we write $P_R(x,y) = q(y)\,\eta(x|y)$ for the conditional distribution (channel) $\eta$ and marginal $q$. The assumption reads

$\dfrac{\eta(x|y)}{\varphi(y|x)} = \tilde F(x)\,\tilde G(y) \quad \forall\, x, y,$

with $\tilde F(x) = p(x)\,F(x)$ and $\tilde G(y) = G(y)/q(y)$. Since the LHS is strictly positive,37 $\mathrm{sign}\big(\tilde F(x)\big) = \mathrm{sign}\big(\tilde G(y)\big)$ must hold for all $(x,y)$. Now, being a channel, $\eta$ must satisfy $\sum_x \eta(x|y) = 1$, that is $1/\tilde G(y) = \sum_x \tilde F(x)\,\varphi(y|x)$. So, finally

$\eta(x|y) = \dfrac{\tilde F(x)}{\sum_{x'} \tilde F(x')\,\varphi(y|x')}\,\varphi(y|x) \equiv \hat\varphi_\xi(x|y),$

where $\xi(x) = \tilde F(x)/\sum_{x'} \tilde F(x')$ is a valid probability distribution because all the $\tilde F(x)$ have the same sign. ◻

While this result may look purely anecdotal or formal, let us recall that in the usual thermodynamical interpretation $\Omega(x,y) = -\ln\big(R(x,y)\big)$ is the (non-adiabatic) stochastic entropy production.38 Thus, whenever the stochastic entropy production can be computed locally (that is, independently of the correlations between microstates $x$ and $y$), a structure of Bayesian retrodiction is unavoidable (in the reverse-process narrative: the reverse process must be defined through Bayesian retrodiction).

In the literature on fluctuation relations, based on thermodynamics, the wording “reference prior” is absent. Its role is usually taken by an assumption of “detailed balance.” In all the examples that we have looked into,1 this corresponds to the choice of the steady state as reference prior. The operational interpretation of this choice is very physical: one takes as reference the process in which nothing changes. It also has a very neat consequence when it comes to fluctuation relations: the ratio $R(x,y)$ given in (35), and thus the variable that enters the fluctuation relations, depends on the channel $\varphi$ only through its steady state $\gamma$. With this choice, one is clearly studying fluctuations around equilibrium.39

Inspired by statistical comparisons, one may opt for a different definition of the reference prior. One possibility is to try to keep prediction and retrodiction as close as possible. With this goal, let us take as a figure of merit

$D_{KL}\big(P_F\,\|\,P_{R[\xi]}\big) = \sum_{x,y} P_F(x,y)\,\ln\!\left(\dfrac{P_F(x,y)}{P_{R[\xi]}(x,y)}\right) = \sum_{x,y} P_F(x,y)\,\ln\dfrac{p(x)}{q(y)} + \sum_{x,y} P_F(x,y)\,\ln\dfrac{\hat\xi(y)}{\xi(x)},$

which we called the reverse Kullback–Leibler distance in Sec. IV. Only the second term depends on the reference prior; besides, it does not depend on $q(y)$, but does depend on $p(x)$, which may be arbitrary. We may then choose $\xi$ so as to minimize the average of $D_{KL}(P_F\,\|\,P_{R[\xi]})$ over all possible choices of $p$. Upon such averaging, $p(x) \to \frac{1}{d}$. Thus we want to find $\xi_D = \operatorname{argmin}_\xi F[\xi]$ with

$F[\xi] = \sum_{x,y} \varphi(y|x)\,\ln\dfrac{\hat\xi(y)}{\xi(x)}.$
(36)

Interestingly, we have:

Result 4. The reference prior that minimizes (36) is $\xi_D(x) = \frac{1}{d}$, the uniform prior, for every $\varphi$.

We present the proof in the Appendix.

In the same spirit, one can study the reference priors that minimize other figures of merit averaged over the possible priors $p(x)$ and $q(y)$. We ran some simple numerical checks at $d = 2$ for two other figures of merit. For the Kullback–Leibler distance $D_{KL}(P_{R[\xi]}\,\|\,P_F)$, the steady state is generically not optimal, while the uniform prior seems to be optimal again, even though the dependence on $\xi$ is different from (36). For the guessing probability, i.e., the probability that $\operatorname{argmax}_{x,y} P_F(x,y) = \operatorname{argmax}_{x,y} P_{R[\xi]}(x,y)$, neither the steady state nor the uniform prior is generically optimal.
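
Result 4, and the kind of numerical check just mentioned, can be illustrated with a simple grid search at $d = 2$: for an arbitrary channel with positive entries, the functional $F[\xi]$ of Eq. (36) is minimized at the uniform prior.

```python
import numpy as np

def F(M_phi, xi):
    """Figure of merit of Eq. (36): sum_{x,y} phi(y|x) ln( xi_hat(y) / xi(x) )."""
    xi_hat = M_phi @ xi
    d = len(xi)
    total = 0.0
    for x in range(d):
        for y in range(d):
            total += M_phi[y, x] * np.log(xi_hat[y] / xi[x])
    return total

# Arbitrary bit channel with strictly positive entries
a, b = 0.3, 0.6
M_phi = np.array([[1 - a, b],
                  [a, 1 - b]])

grid = np.linspace(0.01, 0.99, 981)
values = [F(M_phi, np.array([t, 1 - t])) for t in grid]
best = grid[int(np.argmin(values))]
print(best)   # approximately 0.5: the uniform prior minimizes F for every channel tried
```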

The everyday meaning of (ir)reversibility in nature is captured by the perceived “arrow of time”: if the video of the evolution played backward makes sense, the process is reversible; if it does not make sense, it is irreversible.

Science has gone very far in putting this intuition on quantitative ground. The standard underlying narrative still involves two processes: the one that we observe, and the associated reverse process (not deemed to be strictly impossible, but very unlikely). This reverse process is generically not the video played backward: to cite an extreme example, nobody conceives bombs that fly upward to their airplanes while cities are being built from rubble.40,41 In the case of controlled protocols in the presence of an unchanging environment, the reverse process is implemented by reversing the protocol. If the environment were to change (in an uncontrolled way, by definition of environment), the connection between the physical process and the associated reverse one would become thinner.

With our line of research, we are exploring the possibility that the narrative of the reverse process may not be needed at all. In the wording pioneered by Watanabe, irreversibility may rather be irretrodictability. So far, this program has found no obstacle, and has even clarified situations that were deemed puzzling in the case of some quantum channels.1 The vistas opened by this approach also allow us to expand the scope of fluctuation relations (Sec. IV) and to discuss the choice of a reference prior (Sec. V).

Barring surprises à la John Bell, this conflict of narratives will not be discriminated by experiments. Indeed, on the one hand, the retrodiction channels (both classical and quantum) are by construction valid channels: nothing forbids the physical implementation of the corresponding processes, as indeed was done in the experimental verifications of Crooks' theorem.36 On the other hand, to falsify the retrodictive narrative, one would have to find a reverse process related to its original process in a way that cannot be expressed by (or worse, contradicts) logical reasoning: it is hard to see how such a claim could ever be made. So, one's narrative of choice will depend on the fruitfulness of the intuition, the economy of concepts, the elegance of the formulas… In this paper, we have hinted at the superiority of the retrodictive narrative in all these respects.

The authors acknowledge the help of Eugene Koh in completing the proof of Result 4. F.B. acknowledges support from the Japan Society for the Promotion of Science (JSPS) KAKENHI, Grant Nos. 19H04066 and 20K03746, and from MEXT Quantum Leap Flagship Program (MEXT Q-LEAP), Grant No. JPMXS0120319794. V.S. acknowledges support from the National Research Foundation and the Ministry of Education, Singapore, under the Research Centres of Excellence programme.

The authors have no conflicts to disclose.

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Consider a generic classical channel $\varphi$ with $d$-dimensional input and output alphabets. Denote the reference prior as $\xi(x) = \frac{1}{d}(1 + u_x)$ with $-1 \leq u_x \leq d-1$ and $\sum_x u_x = 0$. With this parametrization, $F[\xi] \equiv F(\bar u)$ with

$F(\bar u) = \sum_x \sum_y \varphi(y|x)\,\ln\!\left(\dfrac{\sum_{x'} \varphi(y|x')\,(1 + u_{x'})}{1 + u_x}\right).$

On the uniform prior ($u_x = 0$ for all $x$), this takes the value

$F(\bar 0) = \sum_x \sum_y \varphi(y|x)\,\ln\!\left(\sum_{x'} \varphi(y|x')\right).$

Thus, $\Delta \equiv F(\bar u) - F(\bar 0)$ is equal to

$\Delta = \sum_{x,y} \varphi(y|x)\,\ln\!\left(\dfrac{\sum_{x'} \varphi(y|x')\,(1+u_{x'})}{\sum_{x'} \varphi(y|x')}\right) - \sum_x \ln(1+u_x) \;\geq\; \sum_{x,y} \varphi(y|x)\,\dfrac{\sum_{x'} \varphi(y|x')\,\ln(1+u_{x'})}{\sum_{x'} \varphi(y|x')} - \sum_x \ln(1+u_x) = \sum_{x,y} \varphi(y|x)\,\ln(1+u_x) - \sum_x \ln(1+u_x) = 0,$

where the inequality is Jensen's inequality on the first term. Thus, we have proved that

$F(\bar u) - F(\bar 0) \geq 0 \quad \text{for all } \bar u.$
(A1)
1. F. Buscemi and V. Scarani, Phys. Rev. E 103, 052111 (2021).
2. G. N. Bochkov and Y. E. Kuzovlev, Sov. Phys. JETP 45, 125 (1977).
3. C. Jarzynski, Phys. Rev. Lett. 78, 2690 (1997).
4. G. E. Crooks, J. Stat. Phys. 90, 1481 (1998).
5. C. Jarzynski, J. Stat. Phys. 98, 77 (2000).
6. H. Tasaki, e-print arXiv:cond-mat/0009244 (2000).
7. M. Campisi, P. Hänggi, and P. Talkner, Rev. Mod. Phys. 83, 771 (2011).
8. C. Jarzynski, Annu. Rev. Condens. Matter Phys. 2, 329 (2011).
9. U. Seifert, Rep. Prog. Phys. 75, 126001 (2012).
10. K. Gawedzki, e-print arXiv:1308.1518 (2013).
11. S. Watanabe, Rev. Mod. Phys. 27, 179 (1955).
12. S. Watanabe, Progr. Theor. Phys. Suppl. E65, 135 (2013).
13. H. Chan and A. Darwiche, Artif. Intell. 163, 67 (2005).
14. B. Jacobs, J. Artif. Intell. Res. 65, 783 (2019).
15. R. Jeffrey, The Logic of Decision (McGraw-Hill, 1965).
16. J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Morgan Kaufmann, 1988).
17. E. T. Jaynes, Probability Theory: The Logic of Science, edited by G. L. Bretthorst (Cambridge University Press, 2003).
18. S. M. Barnett, J. Jeffers, and D. T. Pegg, Symmetry 13, 586 (2021).
19. D. Petz, Commun. Math. Phys. 105, 123 (1986).
20. D. Petz, Q. J. Math. 39, 97 (1988).
21. M. M. Wilde, Quantum Information Theory (Cambridge University Press, Cambridge, 2013).
22. D. Sutter, O. Fawzi, and R. Renner, Proc. R. Soc. A 472, 20150623 (2016).
23. A. M. Alhambra and M. P. Woods, Phys. Rev. A 96, 022118 (2017).
24. C. A. Fuchs, e-print arXiv:quant-ph/0205039 (2002).
25. M. S. Leifer and R. W. Spekkens, Phys. Rev. A 88, 052130 (2013).
26. G. E. Crooks, Phys. Rev. A 77, 034101 (2008).
27. D. Fields, A. Sajia, and J. A. Bergou, e-print arXiv:2006.15692 (2020).
28. Petz discovered this map in the context of the monotonicity property of the quantum relative entropy $D(\rho\|\alpha) = \mathrm{Tr}[\rho\ln\rho - \rho\ln\alpha]$. For any CPTP linear map $\mathcal{E}$, one has $D(\rho\|\alpha) - D(\mathcal{E}[\rho]\,\|\,\mathcal{E}[\alpha]) \geq 0$. Petz showed that $D(\rho\|\alpha) = D(\mathcal{E}[\rho]\,\|\,\mathcal{E}[\alpha])$ if and only if $\hat{\mathcal{E}}_\alpha\circ\mathcal{E}[\rho] = \rho$. In other words, $\hat{\mathcal{E}}_\alpha$ reconstructs not only $\alpha$ after the channel, but every state whose relative entropy with respect to $\alpha$ has not changed.
29. H. Kwon and M. S. Kim, Phys. Rev. X 9, 031029 (2019).
30. F. Buscemi, G. M. D'Ariano, M. Keyl, P. Perinotti, and R. Werner, J. Math. Phys. 46, 082109 (2005).
31. A. Nayak and P. Sen, e-print arXiv:quant-ph/0605041 (2006).
32. Consider a generic erasure channel $\varphi(y|x) = \beta(y)$ for all $x, y \in \{1, \ldots, d\}$, that sends any input state to the state $\beta$. This will be the fate of any reference prior too: $\hat\xi(y) = \beta(y)$. Thus Eq. (3) yields $\hat\varphi_\xi(x|y) = \xi(x)$.
33. I. Csiszár, Stud. Sci. Math. Hung. 2, 299 (1967).
34. F. Liese and K.-J. Miescke, Statistical Decision Theory (Springer, New York, 2008).
35. Not all $f$ that are used for f-divergences are invertible. For instance, $f(r) = r\ln(r)$ for the Kullback–Leibler distance, or $f(r) = \frac{1}{2}|r-1|$ for the total variation distance. For such choices, one can still study the comparison between $\mu_R(\omega)$ and $\mu_F(\omega)$; it just won't take the compact form Eq. (30).
36. D. Collin, F. Ritort, C. Jarzynski, S. B. Smith, I. Tinoco, and C. Bustamante, Nature 437, 231 (2005).
37. Recall our assumption that all the entries of the channel are positive; they could be as small as desired.
38. M. Esposito and C. Van den Broeck, Phys. Rev. Lett. 104, 090601 (2010).
39. We used the wording “steady state” instead of “equilibrium” because the latter has become almost synonymous with thermal equilibrium. Our channels have nothing thermal a priori, and even for channels involving contact with a thermal bath, the thermal state may not be the steady state (e.g., if the Hamiltonian changes during the process).
40. J. Earman, Philos. Sci. 41, 15 (1974).
41. M. Barrett and E. Sober, Br. J. Philos. Sci. 43, 141 (1992).
42. S. Gammelmark, B. Julsgaard, and K. Mølmer, Phys. Rev. Lett. 111, 160401 (2013).
43. H. Bao, S. Jin, J. Duan, S. Jia, K. Mølmer, H. Shen, and Y. Xiao, Nat. Commun. 11, 5658 (2020).
44. T. Sagawa, J. Stat. Mech. 2014, P03025.
45. P. M. Riechers, A. B. Boyd, G. W. Wimsatt, and J. P. Crutchfield, Phys. Rev. Res. 2, 033524 (2020).