In statistical mechanics, entropy is defined as a fundamental quantity. However, its unit, the joule per kelvin ($\mathrm{J\,K^{-1}}$), involves that of temperature, which is only subsequently defined—and defined in terms of entropy. This circularity arises with the introduction of Boltzmann's constant into the very expression of entropy. The temperature unit carried by the constant prevents entropy from finding a unit of its own while simultaneously obfuscating its informational nature. Following the precepts of information theory, we argue that entropy is well measured in bits and coincides with information capacity at thermodynamic equilibrium. Consequently, not only is the temperature of a system in equilibrium expressed in joules per bit, but it acquires a clear meaning: It is the cost in energy to increase its information capacity by 1 bit. Viewing temperature as joules per bit uncovers the strong duality, exhibited by Gibbs long ago, between available capacity and free energy. It also simplifies Landauer's cost and clarifies that it is a cost of displacement, not of erasure. Replacing the kelvin with the bit as an SI unit would remove Boltzmann's constant from the seven defining constants.
I. MOTIVATION
Entropy is a fundamental quantity in statistical mechanics, and its SI unit is the joule per kelvin ($\mathrm{J\,K^{-1}}$). Yet, in statistical mechanical terms, what does a joule per kelvin represent? Tentative answers must confront the circularity mentioned in the abstract, where entropy is defined first yet incorporates a logically posterior temperature unit. Articulating entropy in terms of an as-yet-undefined unit demands an explanation that necessarily breaks with entropy's otherwise simple definition. It must foreshadow a logically posterior definition, like that of temperature, or a theorem, like equipartition. Moreover, to think in joules per kelvin introduces into our view of entropy all the parochial dependencies encompassed in the development of the Kelvin scale. For instance, this includes the aggregate behavior of water and the number 100—and with it, our physiology: the number of our fingers.
The flipside of the aforementioned circularity is that, upon defining temperature from energy and entropy, the kelvin conspicuously crops up alone, as if it were the unit of a fundamental quantity. For instance, Schroeder1 wrote: “Thanks to the factor of Boltzmann's constant in the definition of entropy, the slope of a system's entropy vs. energy graph has the units of $(\mathrm{J/K})/\mathrm{J} = \mathrm{K}^{-1}$. If we take the reciprocal of this slope, we get something with units of kelvins, just what we want for temperature.”
In Sec. II, we first explain how the circular inconsistency regarding the entropy units arises in statistical mechanics as a result of its historical ties to phenomenological thermodynamics. The significant role of information in thermodynamics is emphasized in Sec. III, which motivates the information-theoretic account of entropy. As explained in Sec. IV, at thermodynamic equilibrium, entropy amounts to information capacity. In Sec. V, we show how temperature arises as a bridge between energy and information capacity and how it acquires a clear meaning when formulated in units of joules per bit. Not only does it uncover the notion of available information capacity completely analogous to free energy, but it also sheds light on Landauer's erasure. We also discuss how our proposal alters the current SI.
II. THE TIES OF HISTORY
“Careful, that's hot!” Temperature is one of the most intuitive physical concepts, as we can vividly feel it through our skin. Arguably, temperature is more intuitive than energy; and definitely more than entropy. Perhaps unsurprisingly, the historical development of the three concepts has also followed that order.
The first thermometers were built in the 17th century, by the end of which some precursor of kinetic energy, the so-called vis viva, was introduced by Leibniz.2 As Young3 coined the term energy at the beginning of the 19th century, its existence in various forms was quickly realized. Notably, in 1824, Carnot4 studied engines that could use energy of the caloric type to produce energy of the work type. Clausius5 then characterized the form of energy that is inevitably lost during a thermodynamical cycle in such engines. Only then did he introduce entropy, a monotonically increasing quantity that assured the irreversibility of some processes.
The formalization of these concepts at a macroscopic level, rooted in empirical observations, established the first theory of heat: thermodynamics.*
The first theory of heat is a theory of thermal processes relating macroscopic quantities. Its starting point is temperature. Indeed, the zeroth law entails grouping systems into equivalence classes based on thermal equilibrium. Temperature then serves as a real-valued label for these equivalence classes, endowing them with an ordering that dictates the direction of possible heat flow. Clausius' formula for entropy is $\Delta S = Q/T$, where Q is the amount of heat entering or escaping the system and T is the absolute temperature of the system.
Boltzmann's entropy in this raw form, $\ln W$, is a combinatorial object, a log-count—a very different quantity from the entropy of Clausius. Had $\ln W$ been given a life of its own, perhaps Boltzmann's entropy would eventually have found its own unit—in hindsight, the natural unit of information (nat).
However, it did not. Planck prefactored $\ln W$ with what is now known as Boltzmann's constant, $k_B$. The constant harmonizes Boltzmann's statistical mechanical entropy with Clausius's thermodynamic entropy, both in $\mathrm{J\,K^{-1}}$, while also assuring a statistical mechanical temperature in kelvins.
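For concreteness, here is a minimal side-by-side of the two conventions (the microstate-count symbol $W$ is our notational choice; it plays the same role as $\Omega(E)$ in Sec. IV):

```latex
S_{\text{Planck}} \;=\; k_B \ln W \quad [\mathrm{J\,K^{-1}}],
\qquad
S_{\text{info}} \;=\; \log_2 W \quad [\mathrm{bit}],
\qquad
S_{\text{Planck}} \;=\; (k_B \ln 2)\, S_{\text{info}},
```

with $k_B = 1.380649\times 10^{-23}\,\mathrm{J\,K^{-1}}$ (exact in the current SI).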
We suggest that this introduction of the constant into Boltzmann's entropy, as important as it was for the early development of statistical mechanics, tied the infant theory of statistical mechanics to its predecessor, thermodynamics. However, the theory of thermodynamics should be obtained as a limiting case of statistical mechanics and not be used as an overarching framework for statistical mechanics. This is because statistical mechanics supersedes thermodynamics; it claims more explanatory power. For instance, it yields fluctuation results, it accommodates quantum statistics, it explains phase transitions from first principles, and it better predicts low-temperature behaviors. The chronological order in which ideas were discovered does not, therefore, settle their logical priority.
III. INFORMATION IN THERMODYNAMICS
Crucial insights and results followed the introduction of information-processing considerations into the picture of thermodynamics. For modern reviews, see Refs. 7 and 8. A serious challenge to the second law was proposed by Maxwell9–11 with his famous demon, who sorts a system via measurements and manipulations, thereby reducing its entropy. Szilard's engine12 has become the canonical system on which a demon is imagined to perform its puzzling manipulations. It consists of a one-molecule gas container that can be bisected by a partition, which thereupon acts as a bidirectional piston. After inserting the partition, the demon observes on which side the molecule is trapped and then lets the piston expand adiabatically toward the opposite side. This extracts $k_B T \ln 2$ of free energy from the molecule. After thermalization with the environment, the process can be repeated if the demon has kept his ability to measure and act on the system. Szilard suggested that the second law should be saved by the act of measurement of the demon, which, he thought, should unavoidably create $k_B T \ln 2$ of heat, namely, just enough to compensate. The idea of tying a thermodynamic cost to the act of measurement per se has been followed by many,13–15 and was likely stimulated by the confusion that arose with measurement in quantum theory.
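For reference, here is a minimal sketch of where the $k_B T \ln 2$ figure comes from, assuming the common textbook variant in which the expansion is carried out isothermally against a bath at temperature $T$ (the adiabatic-expansion-then-rethermalization account above has the same free-energy budget):

```latex
W_{\text{extracted}}
  \;=\; \int_{V/2}^{V} P\,\mathrm{d}V'
  \;=\; \int_{V/2}^{V} \frac{k_B T}{V'}\,\mathrm{d}V'
  \;=\; k_B T \ln 2 .
```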
In 1961, Landauer16 correctly identified information erasure—and not measurement—as the precise thermodynamically irreversible step that needs to be compensated by heat dissipation. The erasure of one bit of information must be accompanied by an amount $k_B T \ln 2$ of free energy lost to the environment or to the non-information-bearing degrees of freedom of a computer. The reason for heat dissipation is purely physical: Information cannot be processed independently of real devices, or as Landauer would have it, “Information is physical.”
The fact that erasure has a thermodynamic cost does not a priori preclude measurement from also having such a cost. This concern was part of the larger misconception that held that computing devices should unavoidably involve logical irreversibility and, with it, heat dissipation. Upon demonstrating that universal computation can be done via logically reversible steps, Bennett17 paved the way for developing thermodynamically reversible models of computation, the most spectacular of which is perhaps Fredkin and Toffoli's ballistic computer.18 In this light, the status of measurement, a very special kind of computation, has been clarified:19 An apparatus initialized in a ready state can measure non-dissipatively. This yields a detailed and satisfactory resolution of Maxwell's problem: With an initialized memory, the demon can measure the system and act on it to reduce its entropy. However, this is no paradox, as it merely displaces the entropy of the system onto its memory. To operate in a cycle, the demon needs to reset the memory to its initial state, that is, to get rid of the information stored—an erasure that, by Landauer's bound, dissipates a quantity of heat greater than or equal to the entropy reduction of the system times the environmental temperature.
The logical reversibility of dynamical laws applies to all physical systems, including those with the ability to store information and act based on it. Information-processing agents cannot avoid the second law. Yet, the point of view of information processing has offered limits to what can or cannot be done, thermodynamically. In particular, Landauer's erasure cost can be considered one of the many expressions of the second law, which, as eloquently stated by Schumacher,20 mandates that “No physical process has as its sole result the erasure of information.”
The resolution of Maxwell's demon has been highly influential in promoting the role of information theory in thermodynamics. The scientific literature on the topic has been booming and ramifying in many ways. The advent of Shannon's mathematical theory of communication21 generated insights22,23 (and debates24,25) on the nature of entropy. Algorithmic information theory26–28 permits a quantification of information based on individual objects, yielding more sophisticated notions of entropy29,30 and macrostates.31 The significance of quantum information was realized32,33 and incorporated into various thermodynamical analyses.34 Axiomatic reconstructions of thermodynamics have been suggested.35–37
IV. THERMODYNAMIC EQUILIBRIUM: ENTROPY AS INFORMATION CAPACITY
Energy is a fundamental physical quantity whose conservation principle has permitted significant theoretical advancements and has repeatedly proven consistent with experimental tests. In a conversation about thermodynamics, we shall be satisfied with its SI unit, the joule (J), itself expressed in terms of the kilogram, the meter, and the second.
However, we expressed the problematic circularity of the SI unit of entropy, the joule per kelvin (Sec. I), and advocated instead for the information-theoretic view of entropy (Sec. III). Many such entropy measures have been proposed, most of which are tied to bits. For instance, the Shannon entropy $H(X)$ is the expected number of bits required to communicate the outcome of a random variable X in an optimal prefix code.21 More convoluted measures have been proposed; yet, for our purpose, we fall back on what is perhaps the simplest: information capacity.
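As a reminder of the Shannon entropy just mentioned (a standard definition, not specific to this work), for a random variable $X$ distributed according to $p$:

```latex
H(X) \;=\; -\sum_{x} p(x)\,\log_2 p(x) \quad [\mathrm{bits}],
```

which is maximized, at $\log_2 N$ bits, by the uniform distribution over $N$ outcomes—the case that will matter for information capacity below.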
Like a distance in space measured by the number of meters that can fit in that space, the information capacity of a system is measured by the number of bits that can fit in that system. In this light, $S(E) = \log_2 \Omega(E)$ is the number of bits required to label all distinguishable states at energy E, or equivalently, the number of bits that can be encoded in the system if it is seen as a storage resource. At the macroscopic scale, physical systems have a very large number of microstates, which translates to a large value for information capacity. The divide between a two-level system and a macroscopic system can be bridged, for instance, by measuring information capacity in yottabits (Ybit), where $1\,\mathrm{Ybit} = 10^{24}\,\mathrm{bits}$. This unit more conveniently expresses the information capacity of macroscopic systems.
In modern days, it has become second nature to quantify the storage capacity of devices in bits. In a similar way, any physical system can be viewed as an information-storage system. Compared to the memory of a computing device, which has been engineered to be stable in some relevant set of environments, a generic physical system has microstates that, for all practical purposes, cannot be prepared or maintained in any chosen configuration. Despite this technological (but not fundamental) barrier to user interaction, information can still be encoded in a physical system, in an amount bounded by its information capacity. The assumption of thermodynamic equilibrium amounts to viewing a system's entropy as maximal and, therefore, as reaching its full capacity.§
V. TEMPERATURE AS JOULES PER BIT
In this section, we develop the logical implication of taking entropy as fundamental and assigning it its own unit: temperature should then have units of energy per bit. The information capacity S of a system is a function of its internal energy E. Temperature is obtained as the reciprocal of the slope of information capacity with respect to energy, $T = (\partial S/\partial E)^{-1}$, or, when the slope is well defined, directly as $T = \partial E/\partial S$. Therefore, with entropy measured in bits, temperature is in joules per bit ($\mathrm{J/bit}$).
Non-exotic systems are of positive temperature and positive heat capacity, yielding both positive first and positive second derivatives of E with respect to S, as displayed in Fig. 1(a).** In thermodynamic contexts, where large systems are concerned, 1 bit is negligible compared to the capacity of the system, so the slope is well approximated by a finite difference. Thus, temperature can be interpreted as the increase in internal energy (in J) required to increase the information capacity by 1 bit.
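As an illustrative sketch of this slope (our example: a standard monatomic ideal gas of $N$ particles, additive constants suppressed):

```latex
S(E) \;=\; \tfrac{3N}{2}\,\log_2 E + \text{const} \quad [\mathrm{bits}]
\;\;\Longrightarrow\;\;
T \;=\; \frac{\partial E}{\partial S} \;=\; \frac{2\ln 2}{3N}\,E \quad [\mathrm{J/bit}],
```

which is exactly $(k_B \ln 2)$ times the familiar kelvin temperature $2E/(3N k_B)$ of the same gas.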
Graphs of energy vs capacity. Systems usually have a concave up curve (Fig. 1(a)). A heat bath is a linear idealization (Fig. 1(b)).
The statistical mechanical definition of temperature has many advantages, one of which is the possibility of making sense of negative temperature.38 In joules per bit, it is to be interpreted as the amount of energy that needs to be extracted from the system in order to increase its capacity by one bit. Furthermore, temperature as such is independent of an a priori notion of heat baths, which instead can be understood as systems whose $E(S)$ curve is of constant slope, namely, of constant temperature, as displayed in Fig. 1(b). Such systems can be thought of as idealizations of very large systems or as close-ups of some $E(S)$ curve, which then appears linear. The energy cost to increase the information capacity of a heat bath at temperature T (in $\mathrm{J/bit}$) by X (in bits) is XT (in J). However, for systems that are not well approximated by a heat bath, the energy cost of an additional bit need not be linear: The energy cost must be integrated over the interval of increased information capacity, as shown below.
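In symbols (a direct restatement of the last sentence, with $T(S')$ denoting the slope of the system's $E(S)$ curve):

```latex
\Delta E \;=\; \int_{S}^{S+X} T(S')\,\mathrm{d}S'
\;\;=\;\; X\,T \quad \text{only when } T(S') \equiv T \text{ (a heat bath)}.
```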
A. Available energy and capacity
In 1873, Gibbs39 found that systems whose entropy is not maximal have available energy—now known as free energy—as the system can be used to produce work. This energy is extracted from the system by transforming it into a state of lower energy, while conserving its entropy, until the capacity curve is reached. It is thus the amount of energy that can be extracted from the system with no need to store an excess of entropy. In the same breath, Gibbs presented what he called the capacity for entropy, or as we like to view it, available capacity.†† It is the number of bits that can be encoded in the system at no extra energy cost. Available capacity quantifies the amount of structure in the system, like blanks on a tape, or the empty registers in the memory of Maxwell's demon. The view advocated here, in which energy and entropy are more fundamental than temperature, highlights well the duality captured by Gibbs between available energy and available capacity. Figure 2 illustrates these quantities in the energy vs capacity plane of an isolated system.
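In symbols (our notation: $E_{\mathrm{cap}}(S)$ is the capacity curve of Fig. 2 read as energy vs capacity, and $S_{\mathrm{cap}}(E)$ is its inverse), the two dashed distances of Fig. 2 are

```latex
\underbrace{E - E_{\mathrm{cap}}(S)}_{\text{available energy [J]}}
\qquad\text{and}\qquad
\underbrace{S_{\mathrm{cap}}(E) - S}_{\text{available capacity [bits]}} .
```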
The point's coordinates are given by the system's entropy and energy. It represents a nonequilibrium state, as its entropy is not maximal, i.e., the point is not on the capacity curve. As an example, it could correspond to the state of an isolated gas in a box, where all the particles occupy a smaller region than that of the whole box. The vertical red dashed line represents the available energy while the horizontal blue dashed line represents the available capacity.
Moreover, that duality is also captured by idealized systems that are efficient at storing either energy or entropy. On one end, a battery is a system designed to keep the energy available; namely, the internal energy of the system can be changed with no significant change in entropy. As an example, consider a weight in a gravitational field. The (potential) energy of the weight can easily be changed without affecting the information that is encoded in it. When changing the position of the weight, all the intrinsic properties of the system except its energy are unchanged. On the other end, some systems are efficient at storing information (or entropy) with no energy change—we call them tapes. Degenerate ground states and ideal hard disk drives are instances of such tapes.
B. Landauer's “displacement”
Be it classical or quantum, information processing follows the logical reversibility of the physical laws of motion. As a consequence, the information (i.e., entropy) of an isolated system does not decrease spontaneously. When considering the entire universe as one isolated system, the conclusion is straightforward: “No information is ever lost.” This statement has been recognized as a formulation of the second law of thermodynamics.42,43
The term “information erasure” is, therefore, a misnomer: Information is never erased, as if free energy had the power of fundamental erasure. Rather, information is displaced. When information leaves the relevant degrees of freedom of an information-storage device, it moves to nearby systems, its environment. The environment, therefore, should be seen as a memory which is itself described by an energy vs information capacity curve of the sort shown in Fig. 1. Landauer's principle then reflects a clear application of temperature as joules per bit: To accommodate one additional bit of information, the environment's capacity must be expanded via a precise quantity of energy. By definition, this quantity is the temperature T of the environment times one bit. No unnecessary $k_B$. No unnecessary $\ln 2$.
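Stated as a bound, and assuming the environment acts as a heat bath at temperature $T$ in $\mathrm{J/bit}$ (equivalently $T_{\mathrm{K}}$ on the kelvin scale), Landauer's cost for displacing one bit reads

```latex
Q_{\text{dissipated}} \;\ge\; T \times (1\ \mathrm{bit}) \;=\; k_B\, T_{\mathrm{K}} \ln 2 .
```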
Relatedly, Schumacher advocated20 that environmental temperature and erasure cost are interchangeable. Namely, one can define temperature as the erasure cost.
C. Implications of redefining the unit of temperature
Adopting the primacy of the bit over the kelvin has implications. Taken seriously, our proposal leads to the elimination of one of the seven defining constants of the SI, namely, Boltzmann's constant $k_B$.
Seen in this light, Boltzmann's constant is not a fundamental constant, but a conversion factor between those different standards.‡‡ As pointed out by Callen:44 “The constant prefactor [in Boltzmann's entropy] merely determines the scale of [Boltzmann's entropy]; it is chosen to obtain agreement with the Kelvin scale of temperature.” Therefore, Boltzmann's constant embodies the same parochial dependencies as those of the kelvin.
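Concretely, with the exact value fixed by the 2019 SI redefinition, the conversion factor reads

```latex
k_B \;=\; 1.380649\times 10^{-23}\ \mathrm{J\,K^{-1}}
\quad\Longrightarrow\quad
1\ \mathrm{K} \;\;\widehat{=}\;\; k_B \ln 2 \;\approx\; 9.57\times 10^{-24}\ \mathrm{J/bit}.
```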
Should the SI adopt the bit instead of the kelvin, the other base units would be left unchanged, as can be readily seen by inspecting Fig. 3. Indeed, no other SI unit depends on the current definition of the kelvin. Moreover, the connection between entropy, energy, and temperature is made obvious in the diagram of Fig. 3(b): Temperature makes the bridge between energy (in J, defined by the arrows coming from the meter, the second, and the kilogram) and entropy (in bits).
The result of altering the International System of Units (SI) to recognize the primacy of the bit and, with it, temperature as joules per bit. Figure 3(a) displays the current defining constants and SI units, and their relationships to one another. Figure 3(b) displays our modification. While Boltzmann's constant has been removed and the kelvin replaced, no other SI constant or unit is affected by the modification. Note that the units of temperature are no longer an SI base unit. Those of entropy are.
Obviously, we do not expect the joule per bit to feature in everyday discussions about temperature. After all, even if the kelvin is the scientific unit, the common usage of the Fahrenheit and Celsius scales is still widespread. Still, we could describe our preferred pool temperature (around $300\,\mathrm{K}$) as roughly $2.9\times 10^{-21}\,\mathrm{J/bit}$. As another example, let us consider the entropy of one mole of helium at room temperature and atmospheric pressure. As computed from the Sackur–Tetrode equation (Ref. 1, Chapter II), its value is 126 $\mathrm{J\,K^{-1}}$. This corresponds to approximately 13 Ybit.
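For readers who wish to reproduce these numbers, here is a minimal numeric sketch (our own check, not code from this article; the Sackur–Tetrode inputs—one mole of helium-4 at 300 K and 1 atm—are standard textbook values):

```python
import math

k_B = 1.380649e-23                   # J/K (exact in the current SI)
J_PER_BIT_PER_K = k_B * math.log(2)  # 1 K corresponds to k_B ln 2 J/bit

# Pool temperature of ~300 K expressed in joules per bit
T_pool = 300.0                                                # K
print(f"{T_pool} K -> {T_pool * J_PER_BIT_PER_K:.2e} J/bit")  # ~2.9e-21 J/bit

# Sackur-Tetrode entropy of one mole of helium-4 at 300 K and 1 atm
N_A = 6.02214076e23                  # 1/mol (exact)
h = 6.62607015e-34                   # J s (exact)
m = 4.0026 * 1.66053907e-27          # kg, mass of a helium-4 atom
T, P = 300.0, 101325.0               # K, Pa
V = N_A * k_B * T / P                # molar volume of an ideal gas, m^3
# S = N k_B [ ln( (V/N) * (2 pi m k_B T / h^2)^(3/2) ) + 5/2 ]
S_JK = N_A * k_B * (math.log((V / N_A) * (2 * math.pi * m * k_B * T / h**2) ** 1.5) + 2.5)
S_bits = S_JK / J_PER_BIT_PER_K
print(f"S = {S_JK:.0f} J/K = {S_bits:.2e} bits = {S_bits / 1e24:.0f} Ybit")
```

Running it prints roughly $2.9\times 10^{-21}\,\mathrm{J/bit}$ for the 300 K conversion and, for the helium entropy, about 126 J/K, i.e., roughly 13 Ybit.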
VI. CONCLUSION
By taking seriously the well-accepted logical priority of entropy over temperature in statistical mechanics, we suggest that entropy ought to find a unit of its own, the bit. This offers an interdisciplinary interpretation of thermodynamics centered on the duality between energy and information. Temperature arises as the nexus between them, linking joules to bits—and hence measured in joules per bit. When a system's entropy is maximal, its information capacity is a useful proxy for its entropy, as they coincide. The system's temperature is then given by the energy cost to increase the information capacity by one bit. When the system's entropy is not maximal, it has available energy, or available capacity, like batteries and tapes do. Viewing the environment as a storehouse of information explains Landauer's “displacement”; namely, since a bit is never erased, the environment's capacity needs to be increased by one bit to accommodate it, which is achieved at the energy cost (in joules) of $T \times 1\,\mathrm{bit}$, where T is the temperature of the environment in joules per bit.
The ideas in this paper were developed under the assumption of classical physical laws. However, this limitation does not detract from the significance of our proposal. We conjecture that the interpretation of temperature as energy per unit of information remains valid, regardless of whether the laws of physics are quantum or of any other form yet to be discovered.
Like energy, which takes very different forms across the many domains of physics, information might also be characterized in different ways. Investigating them and interlinking information-theoretic notions with many of the traditional concepts of thermodynamics shall be fruitful.
ACKNOWLEDGMENTS
The authors are grateful to Charles H. Bennett, Gilles Brassard, David Deutsch, Paul Erker, Hlér Kristjánsson, Richard MacKenzie, Bryan W. Roberts, Tommaso Toffoli, and Maria Violaris for fruitful discussions and comments on earlier versions of this work. The authors also thank the Jude the Obscure Pub for its enlightening atmosphere. Additionally, the authors acknowledge Gilles Brassard for generously enabling the publication of this work in open access. Figure 3 was generated from an adaptation of the Mathematica notebook of Emilio Pisanty (dated 2016–2018); made available under the CC BY-SA 4.0 license. X.C.-R. thanks the Fonds National Suisse (FNS) for financial support through the Postdoc Mobility fellowship program. X.C.-R. acknowledges funding from the BMW endowment fund. C.A.B.'s work is supported by the Fonds de recherche du Québec—Nature et technologie, the FNS, and the Hasler Foundation. S.W. acknowledges support from FNS through Project No. 214808.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
While in this work we use the term “thermodynamics” in its general sense encompassing also statistical mechanics, the present section assumes “thermodynamics” to specifically denote the phenomenological and macroscopic framework that remains agnostic to any underlying microscopic physics, i.e., not statistical mechanics.
A classic interpretation of $\Omega(E)$ is the cardinality of a macrostate, which here is solely determined by the system's energy.
A valid defense of the natural logarithm is its convenience when the machinery of differential calculus is used to elaborate the theory. While true, this is easily resolved by the conversion $\ln x = (\ln 2)\,\log_2 x$.
The fact that a system at maximal entropy saturates its capacity can be easily recognized when quantifying entropy in the probabilistic setting: In this case, the Gibbs–Shannon entropy of a distribution over a configuration space of $N$ possible values is maximal for the uniform distribution, and is then equal to $\log_2 N$, the capacity.
In most contexts, $S(E)$ can be inverted into a function $E(S)$. We opted for the graph of the latter, so as to have the slope directly equal to the temperature (instead of the inverse temperature).
It is often said that Boltzmann's constant links the average kinetic energy with the temperature, but this contingency in the equipartition theorem is explained by the exponential relation between S and E that arises when we consider degrees of freedom that contribute quadratically to the energy.