One of the most powerful laws in physics is the second law of thermodynamics, which states that the entropy of any system remains constant or increases over time. In fact, the second law is applicable to the evolution of the entire universe, and Clausius stated, “The entropy of the universe tends to a maximum.” Here, we examine the time evolution of information systems, defined as physical systems containing information states within Shannon’s information theory framework. Our observations allow the introduction of the second law of information dynamics (infodynamics). Using two different information systems, digital data storage and a biological RNA genome, we demonstrate that the second law of infodynamics requires the information entropy to remain constant or to decrease over time. This is exactly opposite to the evolution of the physical entropy, as dictated by the second law of thermodynamics. The surprising result obtained here has massive implications for future developments in genomic research, evolutionary biology, computing, big data, physics, and cosmology.
I. INTRODUCTION
The research field of information dynamics (infodynamics) has its origins in a few significant scientific developments that include the seminal information theory developed by Shannon in 1948 (Ref. 1) and the pioneering work on information physics by Brillouin in 1953 (Ref. 2) and Landauer in 1961 (Ref. 3). A more recent development is the introduction of the mass-energy-information (M-E-I) equivalence principle formulated by Vopson in 2019.4 Using thermodynamic considerations, Landauer introduced his 1961 principle stating that information, as defined in Shannon’s framework, is not just a mathematical construct but is physical, having a small energy associated with it that is detectable upon information erasure. Backed by multiple experimental confirmations reported in the literature,5–8 Landauer’s principle has long since moved beyond the purely theoretical realm, and the scientific community today broadly accepts it as valid. The M-E-I equivalence principle proposed in 2019 is an extension of Landauer’s principle stating that, if information is equivalent to energy, according to Landauer, and if energy is equivalent to mass, according to Einstein’s special relativity, then the triad of mass, energy, and information must all be equivalent, too (i.e., if M = E and E = I, then M = E = I). The M-E-I equivalence principle has generated a number of interesting ramifications in physics,9–11 but still awaits experimental confirmation.12 The Landauer and M-E-I equivalence principles are necessary in order to fulfill the thermodynamic laws of physics. These principles were initially proposed in the context of digital information and computing technologies, because any computational or memory device is essentially a physical system that is part of the universe and must obey the universal laws of physics, including thermodynamics. Due to these considerations, Landauer suggested that logical irreversibility must be equivalent to physical irreversibility.
Because irreversible processes are also dissipative, i.e., they take place with dissipation of energy, and because the erase operation that deletes a bit of information is irreversible, erasure must dissipate a small energy that comes from the memory bit itself. Hence, Landauer deduced that a bit of information is physical or, more generally, that any form of information as defined in Shannon’s framework is physical. The M-E-I equivalence principle proposes that the Landauer energy of an information bit condenses into its equivalent mass-energy when the information is stored at equilibrium. These fundamental ideas have created a bridge between pure mathematics and physics, essentially “physicalizing” mathematics. The concept of physicalizing mathematics has profound implications for the way that we think about the whole universe, because it suggests that the universe is fundamentally mathematical and can be seen as emerging from information, i.e., “it from bit,” a concept coined by the legendary physicist Wheeler.13
Here, we examine the entropy and the time dynamics of information systems and, in analogy to the second law of thermodynamics, we formulate the second law of infodynamics.
II. ENTROPY OF INFORMATION
Let us assume a physical system is in its virgin state with no information stored in it [Fig. 1(a)]. We now assume that the system undergoes the process of encoding digital bits of information via a given process of digital information storage. The technology deployed to encode digital information is irrelevant to our discussion, but we will demonstrate our argument here using a magnetic data storage system. The total entropy of the system is a measure of all its possible physical microstates compatible with the macrostate, and we call this the physical entropy of the system, Sphys. The physical entropy of the system is characteristic of the non-information bearing microstates within the system. We now assume that N digital bits of information are created within the physical body. This is equivalent to the “write” operation of a digital data storage device. The additional N bits of information created within our test system represent N additional microstates superimposed onto the existing physical microstates.
These additional microstates are information bearing states, and the additional entropy associated with them is called the entropy of information, Sinf.
The total entropy of the system is now the sum of the initial physical entropy and the newly created entropy of information, Stot = Sphys + Sinf. Hence, an important observation is that the process of creating information increases the overall entropy of a given system. In our example, we write digitally onto our hypothetical system the word INFORMATION using magnetic data recording, so a digital 0 is blue (magnetization up) and a digital 1 is red (magnetization down) [Figs. 1(b) and 1(c)].
In binary code, this results in 11 bytes, so N = 88 bits of 0 and 1 states are encoded [Fig. 1(c)]. The evolution of the physical entropy and the total entropy of our test system are both governed by the second law of thermodynamics. The second law of thermodynamics has many alternative formulations, but in this context, we will use the one stating that the entropy of an isolated system undergoing any transformation always remains constant or increases over time. When applied to the whole universe, Clausius’s formulation states, “The entropy of the universe tends to a maximum.” Mathematically, this formulation of the second law is written as ∂S/∂t ≥ 0, where S is the total entropy and t is time.
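The bit count quoted above for the word INFORMATION can be checked with a short sketch. It assumes an 8-bit ASCII encoding, which the text does not specify beyond "binary code"; the function names are illustrative only. It also evaluates the two-state Shannon entropy of the resulting bit string.

```python
import math


def to_bits(text: str) -> str:
    """Return the 8-bit ASCII encoding of `text` as a string of 0s and 1s.

    The 8-bit ASCII encoding is an assumption made for this illustration.
    """
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


def binary_shannon_entropy(bits: str) -> float:
    """Shannon entropy, in bits per symbol, of a two-state (0/1) string."""
    n = len(bits)
    entropy = 0.0
    for symbol in "01":
        p = bits.count(symbol) / n
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


word_bits = to_bits("INFORMATION")
N = len(word_bits)                     # 11 bytes x 8 bits per byte
H = binary_shannon_entropy(word_bits)  # close to, but below, the maximum of 1

print("N =", N, "bits; H =", round(H, 4), "bits per state")
```

Because the numbers of 0 and 1 states in such a short message are not exactly equal, H falls slightly below its maximum value of 1 bit per state.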
Units of bits are obtained when b = 2, trits when b = 3, and nats when b = e, i.e., Euler’s number. The choice of b = 2, which yields bits and is adopted in this article, is a matter of convenience dictated by current digital technologies. For an arbitrary base b, the information function can be expressed in different units using the logarithm change-of-base formula, $\log_b x = \log_a x / \log_a b$. For example, to convert information expressed in nats into bits, we set b = 2 and a = e, so that $\log_2 x = \ln x / \ln 2$.
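As a brief numerical illustration of the base-change rule, the sketch below converts an information quantity between units. The helper name `convert_information` is hypothetical and introduced only for this example.

```python
import math


def convert_information(value: float, from_base: float, to_base: float) -> float:
    """Convert an information quantity between logarithm bases.

    Follows log_b(x) = log_a(x) / log_a(b): a quantity measured with base
    `from_base` is multiplied by ln(from_base) / ln(to_base).
    """
    return value * math.log(from_base) / math.log(to_base)


one_nat_in_bits = convert_information(1.0, math.e, 2)  # equals 1 / ln 2
one_trit_in_bits = convert_information(1.0, 3, 2)      # equals log2(3)

print(one_nat_in_bits, one_trit_in_bits)
```

One nat therefore corresponds to 1/ln 2 bits, and one trit to log2(3) bits.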
Our objective in this study is to examine the time evolution of Sinf. According to (4), only two variables, N and H(X), can drive any changes in Sinf. For the binary case considered here, the Shannon function H(X) has a maximum value of 1, which it approaches for large N.
III. TIME EVOLUTION OF DIGITAL INFORMATION STATES
This relaxation time is the average time it takes for a magnetic grain of volume V within a magnetic bit state to undergo a spontaneous magnetization flip due to thermal activation. Hence, after a sufficiently long time, we expect magnetic grains to lose their magnetization state, leading to magnetic bit states undergoing self-erasure and, therefore, reducing the number of information states N. The implication of this analysis is that the entropy of the information bearing states tends to decrease over time.
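The relaxation-time expression itself is not reproduced in this section. As a rough illustration of how strongly thermal stability depends on grain volume, the sketch below assumes the standard Néel-Arrhenius form, τ = τ0 exp(K V / (kB T)). The anisotropy constant, attempt time, temperature, and grain sizes are hypothetical values chosen only for illustration and are not taken from the simulation discussed here.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def relaxation_time(anisotropy: float, volume: float, temperature: float,
                    attempt_time: float = 1e-9) -> float:
    """Neel-Arrhenius relaxation time in seconds (assumed form).

    anisotropy is in J/m^3, volume in m^3, temperature in K.
    attempt_time (tau_0) is an assumed order-of-magnitude value.
    """
    energy_barrier = anisotropy * volume
    return attempt_time * math.exp(energy_barrier / (K_B * temperature))


# Hypothetical grain parameters, for illustration only.
K_U = 2.0e5   # anisotropy constant, J/m^3
T = 300.0     # temperature, K

for edge_nm in (6, 8, 10):
    grain_volume = (edge_nm * 1e-9) ** 3  # cubic grain of the given edge length
    tau = relaxation_time(K_U, grain_volume, T)
    print(f"{edge_nm} nm grain: tau = {tau:.3e} s")
```

Under these assumptions, a modest change in grain size shifts the relaxation time by many orders of magnitude, which is why self-erasure may be negligible on laboratory time scales and yet inevitable over sufficiently long times.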
This new law of infodynamics must not violate the second law of thermodynamics, so the entropy reduction in the information states must be compensated by an entropy increase in the physical states, via a dissipation mechanism. This rationale was behind Landauer’s principle that information is physical, which was also derived similarly and expanded to the mass-energy-information equivalence principle by Vopson.4
The simulation performed on our test sample resulted in a simultaneous reduction of the magnetization of all the magnetic information bit states up to the point when N = 0. In reality, however, this process can take place gradually, so that N decreases in steps until it eventually reaches zero. Re-examining relation (4), it is easily seen that a reduction in N leads to a reduction of the information entropy, thereby confirming the second law of infodynamics (7).
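The dependence of the information entropy on N can be illustrated numerically. Relation (4) is not reproduced in this section, so the sketch below assumes it has the form Sinf = N kB ln 2 · H(X). The stepwise sequence of N values is hypothetical and only mimics a gradual self-erasure of the 88-bit example.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def information_entropy(n_bits: int, shannon_h: float = 1.0) -> float:
    """Information entropy in J/K.

    Assumed form of relation (4): S_inf = N * k_B * ln(2) * H(X).
    """
    return n_bits * K_B * math.log(2) * shannon_h


# Hypothetical stepwise self-erasure of the 88-bit example, with H(X) held at 1.
remaining_bits = [88, 66, 44, 22, 0]
entropies = [information_entropy(n) for n in remaining_bits]

for n, s in zip(remaining_bits, entropies):
    print(f"N = {n:2d}: S_inf = {s:.3e} J/K")
```

With H(X) fixed, Sinf falls in proportion to N and vanishes when the last information state is lost.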
IV. TIME EVOLUTION OF BIOLOGICAL INFORMATION STATES
In order to verify the universal validity of the second law of infodynamics, we need to examine the time evolution of the information entropy of a system, in which the number of information states N remains constant and the reduction of the information entropy comes from Shannon’s information entropy function.
A natural information coding system that fulfills this requirement is the genetic DNA/RNA code, because the information is encoded in the sequence of nucleotides and its time evolution is described by genetic mutations. Genetic mutations are changes in the nucleotide sequence, and these changes can take place via three mechanisms: (i) single nucleotide polymorphisms (SNPs), where changes occur so that the number of nucleotides remains constant; (ii) deletions, where N decreases; and (iii) insertions, which result in N increasing. Of the three possible cases, only the SNP mutations are of interest to us, because they keep the value of N constant.
The ideal test system is a virus genome that undergoes frequent mutations in a short period of time. In this study, we examined the RNA sequence of the novel SARS-CoV-2 virus, which emerged in December 2019 resulting in the current COVID-19 pandemic.
A DNA sequence can be represented as a long string of the letters A, C, G, and T. These represent the four nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T) [replaced with uracil (U) in RNA sequences]. Therefore, within Shannon’s information theory framework, a typical genome sequence can be represented as a 4-state probabilistic system, with n = 4 distinctive events and probabilities $\{p_A, p_C, p_G, p_T\}$. Using digital information units and Eq. (2), for n = 4, the maximum Shannon information entropy is 2 bits per nucleotide ($H = \log_2 4 = 2$), attained when all four nucleotides are equally probable, so each nucleotide can encode a maximum of 2 bits: A = 00, C = 01, G = 10, and T = 11. For a given genomic sequence containing N nucleotides, the total Shannon information entropy can be at most 2N bits.
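A minimal sketch of this 4-state description is given below. It estimates the nucleotide probabilities from single-letter frequencies, which is an assumption about how the probabilities are obtained, and it is not the GENIES implementation. The toy sequence and function names are hypothetical.

```python
import math
from collections import Counter

# 2-bit encoding of the four nucleotides, as listed in the text.
TWO_BIT_CODE = {"A": "00", "C": "01", "G": "10", "T": "11"}


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy in bits per nucleotide, from single-letter frequencies."""
    counts = Counter(sequence)
    total = len(sequence)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def encode(sequence: str) -> str:
    """Encode a nucleotide string as a binary string, 2 bits per nucleotide."""
    return "".join(TWO_BIT_CODE[base] for base in sequence)


toy_sequence = "ACGTACGTAACC"  # hypothetical toy sequence, not real genome data
h = shannon_entropy(toy_sequence)
print("H =", round(h, 4), "bits per nucleotide")
print("upper bound =", 2 * len(toy_sequence), "bits for", len(toy_sequence), "nucleotides")
print("encoded:", encode(toy_sequence))
```

A sequence in which all four nucleotides occur equally often reaches the maximum of 2 bits per nucleotide, whereas any imbalance in the frequencies lowers the entropy.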
The reference RNA sequence of the SARS-CoV-2, representing a sample of the virus collected early in the pandemic in Wuhan, China in December 2019 (MN908947),15 has 29 903 nucleotides, so N = 29 903. For this reference sequence, we computed the Shannon information entropy using relation (2).
The value obtained represents the reference Shannon information entropy at time zero before any mutations took place. Using the National Center for Biotechnology Information (NCBI) database, we searched and extracted a number of SARS-CoV-2 variants sequenced at various locations around the globe, at different times, starting from January 2020 to October 2021 (Table I).
Genome | References | SNPs | Time (months) | Location | Shannon IE |
---|---|---|---|---|---|
MN908947 | 15 | 0 | 0 | China | 1.957 024 3 |
LC542809 | 19 | 4 | 3 | Japan | 1.956 919 7 |
MT956915 | 20 | 7 | 5 | Spain | 1.956 923 0 |
MW466798 | 21 | 9 | 7 | South Korea | 1.956 932 7 |
MW294011 | 22 | 19 | 10 | Ecuador | 1.956 705 8 |
MW679505 | 23 | 25 | 14 | USA | 1.956 663 0 |
MW735975 | 24 | 26 | 14 | USA | 1.956 571 4 |
OK546282.1 | 25 | 32 | 16 | USA | 1.956 567 5 |
OK104651.1 | 26 | 40 | 20 | Egypt | 1.956 459 1 |
OL351371.1 | 27 | 49 | 22 | Egypt | 1.956 261 4 |
By searching for complete genome sequences, containing the same number of nucleotides as the reference sequence, we carefully selected variants that displayed an incremental number of SNP mutations with time, and we computed the Shannon information entropy for each variant. The calculations have been performed using previously developed software, GENIES,16,17 designed to study genetic mutations using Shannon’s information theory.18
The full dataset, including genome data references/links, collection times, number of SNP mutations, and the Shannon information entropy value of each genome, is shown in Table I. Figure 3 shows the time evolution of the number of SARS-CoV-2 SNP mutations and the time evolution of each variant’s information entropy, Sinf, computed using (4) and normalized to kB. The data indicate that, as expected, the number of mutations increases linearly as a function of time [Fig. 3, bottom graph, coefficient of determination (COD) = 99%]. Remarkably, for the same dataset, the Shannon information entropy (see Table I) and the overall information entropy of the SARS-CoV-2 variants (Sinf), computed using (4), decrease approximately linearly over time (Fig. 3, top graph, COD = 97%). The observed correlation between the information entropy and the time dynamics of the genetic mutations is remarkable because it reconfirms the second law of infodynamics and also points to a possible deterministic approach to genetic mutations, which are currently believed to be purely random events. The existence of an entropic force that governs genetic mutations, rather than pure randomness, would be a powerful result that could lead to the future development of predictive algorithms for genetic mutations before they occur.
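The linear trends described above can be explored with an ordinary least-squares fit to the values listed in Table I. The sketch below is not the fitting procedure used to produce Fig. 3, which is not described in this section. It fits the Shannon information entropy directly rather than Sinf, so the coefficients of determination it returns need not match the quoted COD values exactly.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Values transcribed from Table I.
months = [0, 3, 5, 7, 10, 14, 14, 16, 20, 22]
snps = [0, 4, 7, 9, 19, 25, 26, 32, 40, 49]
shannon_ie = [1.9570243, 1.9569197, 1.9569230, 1.9569327, 1.9567058,
              1.9566630, 1.9565714, 1.9565675, 1.9564591, 1.9562614]

snp_slope, snp_intercept, snp_r2 = linear_fit(months, snps)
ie_slope, ie_intercept, ie_r2 = linear_fit(months, shannon_ie)

print(f"SNPs per month: {snp_slope:.3f}, R^2 = {snp_r2:.3f}")
print(f"Shannon IE change per month: {ie_slope:.3e}, R^2 = {ie_r2:.3f}")
```

Since N is the same for all the selected variants, Sinf computed from (4) would differ from the Shannon information entropy only by a constant factor, so a linear fit to either quantity has the same coefficient of determination.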
V. CONCLUSIONS
In this study, we introduced the second law of infodynamics, which is universally applicable to any information system, including biological systems where the number of information states remains constant. We demonstrated that the information bearing states evolve over time in such a way that their associated entropy remains constant or decreases. Hence, all physical systems containing information states should obey not only the second law of thermodynamics but also the second law of infodynamics, as demonstrated in this article. The introduction of the second law of infodynamics is of fundamental importance because it will aid future studies and developments in a diverse range of sciences, including genetics, evolutionary biology, virology, computing, big data, physics, and cosmology. However, in this article, we do not address the implications of the second law of infodynamics for fundamental issues such as the evolution of information in the universe, the overall balance of physical and information entropies in the universe, or the growth of biological information in the terrestrial biosphere and beyond. We also do not explain how the second law of infodynamics relates to the relaxation times of the information states and the observation time, nor do we address the question of the possible existence of fluctuations of information states when the minimal information entropy state occurs. We, therefore, hope that these unanswered questions will be addressed in future studies stimulated by this work.
ACKNOWLEDGMENTS
M.M.V. acknowledges the financial support received for this research from the School of Mathematics and Physics, University of Portsmouth. S.L. also acknowledges the support received for this research from the Jeremiah Horrocks Institute for Mathematics, Physics, and Astronomy, University of Central Lancashire.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.