Contextuality is a key feature of quantum mechanics, as was first brought to light by Bohr [*Albert Einstein: Philosopher-Scientist*, Library of Living Philosophers Vol. VII, edited by P. A. Schilpp (Open Court, 1998), pp. 199–241] and later realized more technically by Kochen and Specker [J. Math. Mech. **17**, 59 (1967)]. Isham and Butterfield put contextuality at the heart of their topos-based formalism and gave a reformulation of the Kochen–Specker theorem in the language of presheaves [Int. J. Theor. Phys. **37**, 2669 (1998)]. Here, we broaden this perspective considerably (partly drawing on existing but scattered results) and show that, apart from the Kochen–Specker theorem, Wigner’s theorem, Gleason’s theorem, and Bell’s theorem also relate fundamentally to contextuality. We provide reformulations of the theorems using the language of presheaves over contexts and give general versions valid for von Neumann algebras. This shows that a very substantial part of the structure of quantum theory is encoded by contextuality.

## I. INTRODUCTION

### A. Structural theorems of quantum theory

There are a small number of key theorems in the foundations of quantum theory which throw the differences between classical and quantum into sharp relief. The first and oldest of these is Wigner’s theorem^{1} from 1931, which shows that each transformation on the set of pure states of a quantum mechanical system that preserves transition probabilities is given by conjugation with a unitary or anti-unitary operator. Hence, the pure state space of a quantum system has a very specific structure.

In 1957, Gleason^{2} proved that any assignment of probabilities to projection operators such that probabilities assigned to orthogonal projections add up must already be given by a quantum state. Since projections represent propositions about the values of physical quantities^{3} by the spectral theorem, Gleason’s result justified the use of the Born rule—30 years after its introduction—when calculating expectation values in quantum mechanics.

Bell^{4} proved in 1964 that in local hidden variable theories, there is an upper bound on correlations that can exist between spatially separated subsystems of a composite system. Quantum theory violates this upper bound and hence cannot be (or be replaced by) a local hidden variable theory. There are a number of fine interpretational points, but it is largely accepted today that the violation of Bell’s inequality has been confirmed experimentally.

Finally, in 1967, Kochen and Specker^{5} showed that under mild and natural conditions, it is mathematically impossible to assign values to all physical quantities simultaneously. Usually, this is phrased as saying that there are no non-contextual value assignments. Since in classical physics, states do assign values to all physical quantities at once, this is a rather strong result and a severe obstacle to any realist interpretation of the quantum formalism.

Each of these landmark theorems singles out a central aspect of quantum theory that distinguishes it from classical physics. The contents of the theorems are very distinct; each one concerns a different structural aspect of quantum theory. Yet, in this article, we will show that, in fact, all these theorems have a common source, which is *contextuality*. This strongly suggests that contextuality is at the heart of quantum theory and is largely responsible for the structural differences between classical and quantum.

### B. Contextuality and its mathematical formalization

Contextuality, which was introduced by Bohr,^{6,7,75} is a deep concept, and like many other deep concepts in physics, it has taken on a range of meanings and interpretations in the literature.^{8} This has led to a certain danger of talking vaguely and at cross-purposes. In order to avoid this, we define precisely what we mean by contextuality and we give a rather minimal and conservative definition that comes with little interpretational baggage.

We say that a physical system has *physical contextuality* if it has some incompatible physical quantities, that is, quantities that cannot be measured simultaneously in an arbitrary state. For all we know today, physical contextuality in this sense is a characteristic feature of all quantum systems, but not of classical systems, and the restrictions on co-measurability are fundamental and not just due to a lack of experimenters’ (or theoreticians’) finesse.

Even in a physical system with physical contextuality, there are families of compatible, co-measurable physical quantities. (In an extreme case, such a family could consist of a single physical quantity, although this does not occur for quantum systems.) A family of compatible physical quantities is called a *physical context*. We remark the following:

Our definition of physical contextuality does not refer to actual measurement setups, but we could, as is often done, identify a measurement setup with a physical context, i.e., with the family of physical quantities that is measured by the setup.

Our definition does not refer to values measured and/or possessed by physical quantities, nor to probabilities. This is not necessary for our purposes, and it avoids many interpretational issues that often cloud the discussion. In particular, we can consider whether non-contextual assignments of values, probabilities, etc., are possible in our kind of contextual theory.

In order to be compatible according to our definition, two physical quantities must be co-measurable in all states. Hence, two physical quantities that are co-measurable in some states can still be incompatible and lie in different contexts. We are not concerned with weak measurements, weak values, etc.

The definition of physical contextuality is given in a somewhat intuitive manner, since no precise mathematical formalization of “physical quantities” and “states” has been provided so far.

We remark—following Kochen and Specker^{5} and especially Conway and Kochen^{9}—that much less than the full Hilbert space formalism needs to be given: as long as the physical system under consideration (or some subsystem of it) has the physical quantity usually called “spin-1,” which can be measured in different directions in space, the system is contextual, since measurements in different directions cannot be performed simultaneously. The mathematical formalism required to describe this situation is a modest part of projective geometry in three dimensions.

Yet, following standard practice, we will assume the usual Hilbert space formalism in which the set of (bounded) physical quantities is mathematically represented by the set $B(H)_{\mathrm{sa}}$ of bounded self-adjoint operators on the Hilbert space $H$ of the system. The self-adjoint operators form the real part of the complex, noncommutative algebra of bounded operators on the Hilbert space of the system. A context is mathematically formalized as a commutative subalgebra of this noncommutative algebra. This can be generalized to von Neumann algebras of physical quantities. We will quickly recall the necessary mathematical background in Sec. II. Moreover, we will introduce the context category and some minimal background on presheaves in order to make this article largely self-contained.

### C. Structural theorems and contextuality

We will show that each of the key theorems mentioned above has an *equivalent* reformulation in the language of presheaves over contexts, thus showing the close and sometimes surprising connections between these theorems and contextuality. The prototype of such results is found in the work by Isham, Butterfield, and Hamilton^{10,11} on the Kochen–Specker theorem. We will extend their results to the other fundamental theorems by Wigner, Gleason, and Bell and will provide a bigger, more coherent picture of the role of contextuality in the foundations of quantum mechanics.

In Sec. III, Wigner’s theorem is treated. The relevant presheaf is trivial, since Wigner’s theorem is based on the mere order of contexts, as we will show. The Kochen–Specker theorem and its reformulation are presented in Sec. IV. Here, the so-called spectral presheaf plays the key role. It can be seen as a generalized state space for a quantum system, and the Kochen–Specker theorem is equivalent to the fact that this space has no points in a suitable sense. Gleason’s theorem is treated in Sec. V. Its reformulation is based on the so-called probabilistic presheaf, which does have points, i.e., global sections, and these correspond exactly with quantum states. Finally, we consider Bell’s theorem and its relations to contextuality in Sec. VI. The relevant presheaf is a bipartite version of the probabilistic presheaf. This presheaf is based on a simple way of composition via contexts, yet it is rich enough to encode all quantum correlations (and not more). In fact, by adding a notion of time orientation in subsystems, it singles out quantum states unambiguously. Section VII concludes this paper.

## II. MATHEMATICAL PRELIMINARIES

### A. Algebras of physical quantities

Throughout this work, we will take the perspective of *algebraic quantum mechanics*, that is, we emphasize the role of the physical quantities, or observables, and the algebra they form. This means no departure from standard textbook quantum mechanics, just a slightly different perspective that allows for substantial generalizations. We will assume that the physical quantities generate a von Neumann algebra (standard references are, e.g., Refs. 12–14). In the following, we quickly fix notations.

#### 1. Von Neumann algebras

Let $H$ be the complex Hilbert space of the quantum system under consideration. The algebra of all bounded linear operators on $H$ is denoted $B(H)$. If $H$ is of finite dimension *n*, then $H=\mathbb{C}^n$ and $B(H)=M_n(\mathbb{C})$, the algebra of all *n* × *n*-matrices with complex entries. If $H$ is infinite-dimensional, then $B(H)$ carries several interesting topologies (which all coincide in finite dimensions). We will need the weak (operator) topology, the ultraweak topology, and the norm topology. For details on topologies on $B(H)$ and how they relate to each other, see, e.g., Ref. 14.

In particular, $B(H)$ is closed in the weak topology and hence is a von Neumann algebra. More generally, every weakly closed, unital subalgebra of $B(H)$ is a von Neumann algebra, and every von Neumann algebra is of this form for a suitable Hilbert space $H$. The physical quantities of a quantum system are represented by the bounded *self-adjoint operators* in a von Neumann algebra $N$. The real vector space of self-adjoint operators in $N$ is denoted $N_{\mathrm{sa}}$.

By the *spectral theorem*, each self-adjoint operator *a* is the norm limit of finite real linear combinations of projection operators, i.e., *a* can be approximated by operators of the form $\sum_{i=1}^{n}A_i p_i$, where the *A*_{i} are real numbers and the *p*_{i} are mutually orthogonal projections, that is, *p*_{i}*p*_{j} = *δ*_{ij}*p*_{i}.
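As a small illustration of such a finite decomposition, the following sketch checks it for the Pauli matrix $\sigma_x$ on $\mathbb{C}^2$. The choice of operator and the helper names `matmul` and `lincomb` are assumptions made for this example only; they do not come from the text.

```python
# Toy check of a finite spectral decomposition a = sum_i A_i p_i.
# The operator sigma_x is an illustrative choice, not taken from the article.

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(coeffs, mats):
    """Real linear combination sum_i coeffs[i] * mats[i]."""
    n = len(mats[0])
    return [[sum(c * m[i][j] for c, m in zip(coeffs, mats)) for j in range(n)]
            for i in range(n)]

identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
sigma_x = [[0.0, 1.0], [1.0, 0.0]]

# Spectral projections of sigma_x for the eigenvalues +1 and -1.
p_plus = lincomb([0.5, 0.5], [identity, sigma_x])
p_minus = lincomb([0.5, -0.5], [identity, sigma_x])

# a = (+1) * p_plus + (-1) * p_minus
reconstructed = lincomb([1.0, -1.0], [p_plus, p_minus])
```

Here `p_plus` and `p_minus` are mutually orthogonal projections that sum to the identity, and `reconstructed` reproduces `sigma_x`.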

Let *S* be a family of bounded operators on $H$. The commutant of *S*, denoted *S*′, is the set of all operators in $B(H)$ that commute with all operators in *S*,

$$S'=\{\,b\in B(H)\mid ab=ba\ \text{for all } a\in S\,\}.$$

Von Neumann’s *double commutant theorem* (see, e.g., Ref. 12) shows that a subalgebra $N$ of $B(H)$ is weakly closed, i.e., a von Neumann algebra, if and only if $N=N''$. Here, $N''=(N')'$ is the commutant of the commutant of $N$.

The projections in a von Neumann algebra $N$ form a complete orthomodular lattice, denoted $P(N)$. The least upper bound (*join*) of a family of projections $(p_i)_{i\in I}$ in $P(N)$ is denoted ⋁_{i∈I}*p*_{i}, and the greatest lower bound (*meet*) is denoted ⋀_{i∈I}*p*_{i}. In quantum logic, the projection operators are interpreted as propositions about the values of physical quantities. Let *a* be a self-adjoint operator representing some physical quantity. For simplicity, assume that $a=\sum_{i=1}^{n}A_i p_i$. Then, the projection *p*_{i} represents the proposition “the physical quantity (represented by) *a* has the value *A*_{i}.”

#### 2. States on a von Neumann algebra

A state on a von Neumann algebra is a positive linear functional of norm 1. The states form a convex set $S(N)$, whose extreme points are called *pure* states. In the case of $N=B(H)$, the pure states are the familiar vector states. A state *ρ* is called *multiplicative* if *ρ*(*ab*) = *ρ*(*a*)*ρ*(*b*) for all $a,b\in N_{\mathrm{sa}}$. This is a strong condition: non-trivial multiplicative states exist only on commutative von Neumann algebras. For these, they are exactly the pure states. A state *ρ* is called *normal* if *ρ*(*a*_{i}) → *ρ*(*a*) for every monotone increasing net of operators $a_i\in N$ with least upper bound *a*.
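The following sketch illustrates the multiplicativity condition in the smallest case. It assumes a vector state on $M_2(\mathbb{C})$ given by the first basis vector and compares its behaviour on the commutative subalgebra of diagonal matrices with its behaviour on all of $M_2(\mathbb{C})$. The helper `vector_state` and the test matrices are hypothetical choices for this example.

```python
# A vector state rho(a) = <psi, a psi> is multiplicative on the commutative
# algebra of diagonal 2x2 matrices but not on the full matrix algebra.
# All names here are illustrative assumptions, not notation from the article.

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def vector_state(psi):
    """Return the functional a -> <psi, a psi> for a unit vector psi."""
    def rho(a):
        n = len(psi)
        return sum(psi[i].conjugate() * a[i][j] * psi[j]
                   for i in range(n) for j in range(n))
    return rho

rho = vector_state([1 + 0j, 0j])  # first standard basis vector

d1 = [[2, 0], [0, 3]]             # two commuting (diagonal) operators
d2 = [[5, 0], [0, 7]]
sigma_x = [[0, 1], [1, 0]]        # does not commute with d1 or d2

multiplicative_on_context = rho(matmul(d1, d2)) == rho(d1) * rho(d2)
multiplicative_on_m2 = (rho(matmul(sigma_x, sigma_x))
                        == rho(sigma_x) * rho(sigma_x))
```

On the diagonal matrices, `rho` simply reads off the first diagonal entry, which is why it is multiplicative there; on `sigma_x` it fails, since `rho(sigma_x)` vanishes while `sigma_x` squares to the identity.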

Just as in standard quantum mechanics, in algebraic quantum theory, (mathematical) states on a von Neumann algebra are interpreted as physical states of the quantum system, assigning expectation values to physical quantities.

#### 3. Jordan algebras and associativity

Instead of the usual multiplication (by composition) of self-adjoint operators, one can also use another product, defined by

$$a\circ b:=\frac{1}{2}(ab+ba)\quad\text{for all } a,b\in B(H)_{\mathrm{sa}}.$$

This is the *Jordan product*, given by the anti-commutator of *a* and *b* up to the conventional factor $\frac{1}{2}$. In contrast to the usual product *ab*, the Jordan product *a*◦*b* is always self-adjoint, even if *a* and *b* do not commute. Hence, there is a *real Jordan algebra* $J(H)_{\mathrm{sa}}=(B(H)_{\mathrm{sa}},\circ)$. Its complexification is $J(H)=(B(H),\circ)$.

Clearly, the Jordan product is commutative, *a*◦*b* = *b*◦*a* for all $a,b\in B(H)_{\mathrm{sa}}$. More generally, we can associate a weakly closed Jordan algebra $J(N)=(N,\circ)$ to every von Neumann algebra $N$ by replacing the generally noncommutative product in $N$ by the commutative Jordan product. There is a “shadow” of noncommutativity left: $J(N)$ is associative if and only if $N$ is commutative.
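A quick numerical sanity check of these properties can be sketched as follows, assuming the standard formula for the Jordan product as half the anti-commutator. The Pauli matrices and the diagonal test matrices are illustrative choices, and the helper names are hypothetical.

```python
# Jordan product a∘b = (ab + ba)/2 on 2x2 matrices: it is self-adjoint for
# self-adjoint inputs, and it fails to be associative for noncommuting inputs.
# Test matrices and helper names are assumptions made for this sketch.

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def jordan(a, b):
    """Jordan product: half the anti-commutator of a and b."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[0.5 * (ab[i][j] + ba[i][j]) for j in range(n)] for i in range(n)]

def is_self_adjoint(a):
    return a == adjoint(a)

zero = [[0.0, 0.0], [0.0, 0.0]]
sigma_x = [[0.0, 1.0], [1.0, 0.0]]
sigma_z = [[1.0, 0.0], [0.0, -1.0]]

# Noncommuting case: the two bracketings of sigma_x, sigma_x, sigma_z differ.
lhs = jordan(jordan(sigma_x, sigma_x), sigma_z)
rhs = jordan(sigma_x, jordan(sigma_x, sigma_z))

# Commuting case: diagonal matrices, where the Jordan product is associative.
d1, d2, d3 = [[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, 5.0]], [[7.0, 0.0], [0.0, 11.0]]
assoc_commuting = jordan(jordan(d1, d2), d3) == jordan(d1, jordan(d2, d3))
```

In this example the ordinary product of `sigma_x` and `sigma_z` is not self-adjoint, while their Jordan product is (it is the zero matrix, because the two matrices anti-commute).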

### B. Contexts and the context category

#### 1. Mathematical contexts

We begin with the basic definitions of a (mathematical) context and the partially ordered set of all contexts of a quantum system.

*Let* $H$ *be the Hilbert space of the quantum system under consideration, and let* $N\subseteq B(H)$ *denote the von Neumann algebra of physical quantities of the system. A context is a commutative von Neumann subalgebra of* $N$*.*

This definition of a (mathematical) context simply encodes the idea that within a (physical) context, which can be identified with a chosen experimental setup, all the physical quantities are compatible, co-measurable, and, hence, are represented mathematically by commuting self-adjoint operators. We denote contexts as $V,\tilde{V},\dot{V},\ldots$.

#### 2. The context category

There is a natural partial order on contexts: some contexts are maximal, that is, one cannot add any further self-adjoint operators to them without destroying commutativity. Other contexts are non-maximal; they are commutative von Neumann subalgebras that are properly contained in a larger context. In fact, if *V* is a non-maximal context, there are many different maximal contexts containing *V*.

Hence, any quantum system has many different contexts, with smaller ones contained in larger ones. The key idea for all that follows is that one should not just consider a single context of a quantum system, or a small number, but all of them simultaneously. This idea goes back to Isham^{10} and has become a fruitful perspective and powerful tool over the last 20 years. The topos approach to quantum theory and many subsequent developments are based on this idea; for an introduction, see, e.g., Ref. 15. The key definition is as follows:

*Let* $H$ *be the Hilbert space of the quantum system under consideration, and let* $N\subseteq B(H)$ *denote the von Neumann algebra of physical quantities. The context category of the system is the set of all contexts, i.e., the set of all commutative von Neumann subalgebras of* $N$*, equipped with inclusion as partial order. The context category is denoted* $V(N)$*. If* $N=B(H)$*, then we will simply write* $V(H)$ *[instead of* $V(B(H))$*].*

The name *context category* comes from the fact that every partially ordered set (or poset, for short) can also be regarded as a category.^{16} The objects are the elements of the poset, and there is an arrow *a* → *b* if and only if *a* ≤ *b*. Hence, in a poset seen as a category, arrows express the order, so there is at most one arrow between any two objects. We will actually need very little category theory in the following, but we will feel free to use some simple and well-established categorical notions where appropriate. The reader not familiar with category theory can equally well read “context poset” for “context category” throughout. If a context $\tilde{V}$ is contained in another context *V*, we will write $i_{\tilde{V}V}:\tilde{V}\hookrightarrow V$ for the inclusion map. Alternatively, one could just write $\tilde{V}\subset V$.

The following, powerful result is due to Harding and Navara:^{17}

*Let* $N$ *be a von Neumann algebra not isomorphic to* $\mathbb{C}\oplus\mathbb{C}$ *or to* $M_2(\mathbb{C})$*. Then, the context category* $V(N)$ *of* $N$ *determines the projection lattice* $P(N)$ *as an orthomodular lattice up to isomorphism. Conversely, the projection lattice* $P(N)$ *determines the poset* $V(N)$ *up to isomorphism.*

In fact, Harding and Navara’s proof holds more generally for orthomodular lattices with no maximal Boolean sublattices with only four elements [this is why we exclude the cases $N=\mathbb{C}\oplus\mathbb{C}$ and $N=M_2(\mathbb{C})$]. The result shows that the context category, i.e., the set of contexts together with the information of how contexts are contained within each other, encodes exactly the same amount of information as the projection lattice. In this sense, contextuality determines quantum logic and vice versa.

Note that the context category $V(N)$ is just a poset. Its elements, the contexts $V\subseteq N$, are just “points” within $V(N)$, without the inner structure. In particular, from the perspective of $V(N)$, we do not have access to the commutative von Neumann subalgebras, much less to the operators contained within each context. All the structure of $V(N)$ lies in the order, that is, in the information of how some contexts are contained within others. This makes the Harding–Navara result quite remarkable, since the mere order structure on contexts determines the full structure of the projection lattice.

A brief remark on two-dimensional systems: while the concept of physical contextuality also applies to the two-dimensional case $N=M_2(\mathbb{C})$, the noncontextuality constraints between observables, i.e., the order relations between contexts, are trivial in this case: the only order relations involve the trivial context, generated by the identity in $M_2(\mathbb{C})$. Once the trivial context is removed, there are no order relations left in $V(M_2(\mathbb{C}))$. Since the order relations are at the heart of physical contextuality and since in this work we argue that the latter is behind many key theorems in quantum foundations, in our reformulations of these theorems, we will exclude the two-dimensional case. One way to justify this is to view any such system as part of a larger system (including an “environment”), in which case it immediately falls within the realm of validity of the theorems as given below.

#### 3. Contexts without (non)commutativity

The context category $V(N)$ of a von Neumann algebra $N$ can be defined without any reference to (non)commutativity: every weakly closed associative Jordan subalgebra of $N$ is a commutative von Neumann subalgebra and vice versa. Hence, we can regard $N$ as a weakly closed Jordan algebra and consider the set of its weakly closed associative Jordan subalgebras, partially ordered by inclusion. This poset is (isomorphic to) $V(N)$.

### C. Presheaves over the context category

#### 1. The concept of a presheaf: Local data glued together

We saw in Subsection II B that the context category $V(N)$ already encodes a lot of information about a quantum system. Now, we will build further structures upon the context category in order to make it an even more useful tool. Concretely, given the context category $V(N)$, we are interested in assigning data to each context. Moreover, since $V(N)$ is a poset, we want to relate the data assigned to a context *V* to the data assigned to a smaller context $\tilde{V}\subset V$.

For example, for each context $V\in V(N)$, one may consider the set Σ(*V*) of all pure states of *V*. If $\tilde{V}\subset V$, then every pure state of *V* gives a pure state of $\tilde{V}$ simply by restriction,

$$\Sigma(V)\longrightarrow\Sigma(\tilde{V}),\qquad\lambda\longmapsto\lambda|_{\tilde{V}}.$$

(Pure states of commutative von Neumann algebras are traditionally denoted *λ*.) In this way, we obtain a natural map from the pure states of *V* to the pure states of $\tilde{V}$, hence relating the data assigned to *V* to the data assigned to $\tilde{V}$.

This is an example of a general construction, viz., a presheaf. This naming is traditional and has no particular meaning for us. The general definition of a presheaf over the context category $V(N)$ is as follows:

*Let* $N$ *be a von Neumann algebra, and let* $V(N)$ *be its context category. A presheaf over* $V(N)$ *is a contravariant functor* $\underline{P}:V(N)\to\mathbf{Set}$*. That is,* $\underline{P}$ *is given as follows:*

*on objects: for all* $V\in V(N)$*,* $\underline{P}_V$*, the component of* $\underline{P}$ *at* $V$*, is some set;*

*on arrows: for all inclusions* $i_{\tilde{V}V}:\tilde{V}\hookrightarrow V$*, there is a restriction map*

$$\underline{P}(i_{\tilde{V}V}):\underline{P}_V\longrightarrow\underline{P}_{\tilde{V}},\qquad x\longmapsto\underline{P}(i_{\tilde{V}V})(x).$$

Of course, this is a very general notion. The idea is that the “local” data assigned to each context can vary from context to context, but there are “connecting maps” relating the local data at *V* and $\tilde{V}$ whenever $\tilde{V}\subset V$. The mathematical language of presheaves is a convenient tool for book-keeping.
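To make the book-keeping concrete, here is a minimal toy sketch of the pure-state example in a very restricted setting. It assumes $H=\mathbb{C}^3$ with a fixed orthonormal basis and considers only the contexts contained in the maximal context of diagonal matrices. Each such context corresponds to a partition of the basis labels $\{0,1,2\}$ (the diagonal matrices that are constant on each block), and its pure states correspond to the blocks. The encoding and the names `part`, `is_subcontext`, and `restrict` are assumptions for illustration, not constructions from the text.

```python
# Toy presheaf of pure states over the contexts below one maximal context
# of M_3(C). Contexts are encoded as partitions of the basis labels {0, 1, 2};
# the component at a context is its set of blocks. (Illustrative encoding.)

def part(*blocks):
    """A partition of {0, 1, 2}, stored as a frozenset of frozenset blocks."""
    return frozenset(frozenset(b) for b in blocks)

CONTEXTS = [
    part((0, 1, 2)),           # trivial context: multiples of the identity
    part((0,), (1, 2)),
    part((1,), (0, 2)),
    part((2,), (0, 1)),
    part((0,), (1,), (2,)),    # maximal context: all diagonal matrices
]

def is_subcontext(coarse, fine):
    """coarse <= fine: every block of `fine` lies inside a block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)

def restrict(fine, coarse, block):
    """Restriction map from the component at `fine` to the component at
    `coarse`: send a block of `fine` to the block of `coarse` containing it."""
    assert is_subcontext(coarse, fine) and block in fine
    (target,) = [c for c in coarse if block <= c]
    return target

# Contravariant functoriality: restricting in two steps equals one step.
functorial = all(
    restrict(v2, v3, restrict(v1, v2, b)) == restrict(v1, v3, b)
    for v1 in CONTEXTS for v2 in CONTEXTS for v3 in CONTEXTS
    if is_subcontext(v2, v1) and is_subcontext(v3, v2)
    for b in v1
)
```

For instance, `restrict(CONTEXTS[4], CONTEXTS[1], frozenset({1}))` returns the block `{1, 2}`: the pure state of the maximal context labelled by basis vector 1 restricts to the pure state of the smaller context that no longer distinguishes the labels 1 and 2.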

*The trivial presheaf. The simplest (non-empty) presheaf over* $V(N)$ *is the trivial presheaf* $\underline{1}$*, which is given as follows:*

*on objects: for all* $V\in V(N)$*,* $\underline{1}_V:=\{*\}$*, the one-element set;*

*on arrows: for all inclusions* $i_{\tilde{V}V}:\tilde{V}\hookrightarrow V$*,*

$$\underline{1}(i_{\tilde{V}V}):\underline{1}_V\longrightarrow\underline{1}_{\tilde{V}},\qquad *\longmapsto *.$$

We will make use of the trivial presheaf when we consider Wigner’s theorem in Sec. III. For the treatment of the Kochen–Specker theorem in Sec. IV, we will use the *spectral presheaf* (already sketched above); for Gleason’s theorem (in Sec. V), we will use the *probabilistic presheaf*; and for Bell’s theorem (in Sec. VI), we will use the *Bell presheaf*, a version of the probabilistic presheaf adapted to composite systems. Each of these presheaves is tailor-made for reformulating the respective theorem. The spectral presheaf is built from pure states in each context, and the probabilistic presheaf is built from mixed states.

#### 2. Contravariance and coarse-graining

One may wonder why we are using contravariant functors rather than covariant ones. If $V,\tilde{V}$ are contexts such that $\tilde{V}\subset V$, why not map the local data assigned to $\tilde{V}$ into the local data assigned to *V*? Covariant functors do have their place in the bigger scheme,^{18} but as it turns out, for our purposes, we only need contravariant functors. Generically, the idea is that the data assigned to a larger context *V* is richer, more informative, and can be coarse-grained or restricted to the data assigned to a smaller context $\tilde{V}$. Conversely, there is often no canonical way to “fine-grain” or extend data assigned to $\tilde{V}$ to data assigned to *V*. It is always possible to discard information, but it is often impossible to create information, at least not in a unique way. For example, every pure state of *V* gives a pure state of $\tilde{V}$ by restriction, but a pure state of $\tilde{V}$ can usually be extended in many different ways to a pure state of *V*. (The problem here is that there is no *canonical* extension.)

#### 3. Mapping a presheaf into itself

In order to relate Wigner’s theorem to the trivial presheaf in Sec. III, we have to consider automorphisms of the trivial presheaf, that is, reversible mappings of $\underline{1}$ into itself. There are a number of possible definitions and conventions, but we will focus on a very simple notion of automorphism that suits our purposes.

Let $\underline{P}$ be a presheaf over the context category $V(N)$. Roughly speaking, we can map $\underline{P}$ to itself by first shifting the components around and then mapping each (shifted) component into itself in a way that is compatible with the restriction maps (natural transformation). More precisely, the shifting around of components is achieved by a morphism of the base category, which is $V(N)$ in our case, acting by pullback. This means that if $\tilde{\varphi}:V(N)\to V(N)$ is a morphism of the base category, then it acts as follows: $\underline{P}\circ\tilde{\varphi}$ is the presheaf over $V(N)$ with component $(\underline{P}\circ\tilde{\varphi})_V=\underline{P}_{\tilde{\varphi}(V)}$ at *V*. That is, the new component at *V* is the old component at $\tilde{\varphi}(V)$ for every $V\in V(N)$. The restriction maps of $\underline{P}\circ\tilde{\varphi}$ are given in the obvious way by $(\underline{P}\circ\tilde{\varphi})(i_{\tilde{V}V})=\underline{P}(i_{\tilde{\varphi}(\tilde{V})\tilde{\varphi}(V)})$.

An automorphism $\Theta:\underline{P}\to\underline{P}$ of a presheaf $\underline{P}$ over $V(N)$ then consists of the following:

an automorphism $\tilde{\varphi}:V(N)\to V(N)$ of the base category acting by pullback, thus mapping $\underline{P}$ to $\underline{P}\circ\tilde{\varphi}$,

followed by, for each $V\in V(N)$, an isomorphism $\theta_V:(\underline{P}\circ\tilde{\varphi})_V\to(\underline{P}\circ\tilde{\varphi})_V$ such that, whenever $\tilde{V}\subset V$, one has $\underline{P}(i_{\tilde{\varphi}(\tilde{V})\tilde{\varphi}(V)})\circ\theta_V=\theta_{\tilde{V}}\circ\underline{P}(i_{\tilde{\varphi}(\tilde{V})\tilde{\varphi}(V)})$.
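For the trivial presheaf, every component is a one-element set, so each component isomorphism is necessarily the identity and an automorphism in this sense reduces to an order automorphism of the base poset. The sketch below enumerates order automorphisms by brute force on a toy poset. It assumes the five contexts of $M_3(\mathbb{C})$ that lie below a fixed maximal context of diagonal matrices, encoded as partitions of the basis labels $\{0,1,2\}$. This is only a small piece of the full context category, and all names are hypothetical.

```python
# Brute-force enumeration of the order automorphisms of a small poset.
# The poset is a toy stand-in: contexts below one maximal context of M_3(C),
# encoded as partitions of {0, 1, 2}. (Illustrative assumption.)
from itertools import permutations

def part(*blocks):
    """A partition of {0, 1, 2}, stored as a frozenset of frozenset blocks."""
    return frozenset(frozenset(b) for b in blocks)

CONTEXTS = [
    part((0, 1, 2)),           # trivial context
    part((0,), (1, 2)),
    part((1,), (0, 2)),
    part((2,), (0, 1)),
    part((0,), (1,), (2,)),    # maximal context
]

def is_subcontext(coarse, fine):
    """coarse <= fine: every block of `fine` lies inside a block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)

def order_automorphisms(elements, leq):
    """All bijections of `elements` that preserve and reflect the order `leq`."""
    autos = []
    for image in permutations(elements):
        f = dict(zip(elements, image))
        if all(leq(a, b) == leq(f[a], f[b]) for a in elements for b in elements):
            autos.append(f)
    return autos

AUTOS = order_automorphisms(CONTEXTS, is_subcontext)
```

In this toy case there are six order automorphisms. Each fixes the trivial and the maximal context and permutes the three intermediate contexts, matching the six permutations of the basis labels.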

#### 4. Local and global sections of a presheaf

Presheaves over the context category $V(H)$ [or $V(N)$] are not just sets but collections of sets (one for each context), which are interconnected by the restriction maps. Hence, the notion of an “element” of a presheaf must be defined suitably. Let $\underline{P}$ be a presheaf over $V(N)$, and let *D* be a downward closed subset of $V(N)$, i.e., if *V* ∈ *D* and $\tilde{V}\subset V$, then $\tilde{V}\in D$. A *local section* *γ* of $\underline{P}$ *over* *D* consists of a choice of one element from the component $\underline{P}_V$ for each *V* ∈ *D*, denoted *γ*_{V}, such that whenever $V,\tilde{V}\in D$ with $\tilde{V}\subset V$, then $\underline{P}(i_{\tilde{V}V})(\gamma_V)=\gamma_{\tilde{V}}$. This condition means that the elements *γ*_{V} that we pick from the sets $\underline{P}_V$ (where *V* ∈ *D*) fit together under the restriction maps of the presheaf $\underline{P}$.

One should think of a local section *γ* of $\underline{P}$ over *D* as a “partial” element of $\underline{P}$. If one has a local section over $D=V(N)$, then *γ* is called a *global section* (or *global element*) of the presheaf $\underline{P}$. This is the analog of an element of a set or a point of a space.

For a given presheaf $\underline{P}$, global sections may or may not exist (while local sections always exist, just make *D* small enough). In fact, finding a global section amounts to fitting specified local data, one element from each component of $\underline{P}$, together into a whole. We will see that the presheaf reformulations of the Kochen–Specker theorem (following Isham, Butterfield, and Hamilton), Gleason’s theorem, and also Bell’s theorem are statements about the existence or nonexistence of global sections of certain presheaves.
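The compatibility condition for sections can be tried out on a toy example. The sketch below assumes $H=\mathbb{C}^3$ and takes as *D* the downward closed set of contexts below one fixed maximal context of diagonal matrices, encoded as partitions of the basis labels $\{0,1,2\}$, with the blocks of a partition playing the role of the pure states of that context. It enumerates the local sections over *D* by brute force. It says nothing about global sections over the full context category, and all names are hypothetical.

```python
# Enumerating local sections of a toy pure-state presheaf over a downward
# closed set D of contexts. Contexts are partitions of {0, 1, 2}; the
# component at a context is its set of blocks. (Illustrative encoding.)
from itertools import product

def part(*blocks):
    """A partition of {0, 1, 2}, stored as a frozenset of frozenset blocks."""
    return frozenset(frozenset(b) for b in blocks)

CONTEXTS = [
    part((0, 1, 2)),           # trivial context
    part((0,), (1, 2)),
    part((1,), (0, 2)),
    part((2,), (0, 1)),
    part((0,), (1,), (2,)),    # maximal context
]

def is_subcontext(coarse, fine):
    """coarse <= fine: every block of `fine` lies inside a block of `coarse`."""
    return all(any(b <= c for c in coarse) for b in fine)

def restrict(fine, coarse, block):
    """Send a block of `fine` to the block of `coarse` that contains it."""
    (target,) = [c for c in coarse if block <= c]
    return target

def sections(domain):
    """All choices gamma_V in the component at V (V in domain) that are
    compatible with every restriction map between members of the domain."""
    found = []
    for choice in product(*domain):
        gamma = dict(zip(domain, choice))
        if all(restrict(v, w, gamma[v]) == gamma[w]
               for v in domain for w in domain if is_subcontext(w, v)):
            found.append(gamma)
    return found

over_all = sections(CONTEXTS)                    # D = everything below the top
over_small = sections([CONTEXTS[0], CONTEXTS[1]])
```

Over this *D* there are exactly three local sections, one for each basis label: the element chosen at the maximal context determines all the others by restriction.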

## III. REFORMULATION OF WIGNER’S THEOREM

### A. Wigner’s theorem, Dye’s theorem, and Jordan *-automorphisms

We first recall Wigner’s theorem:^{1}

*(Wigner’s theorem). Let* $H$ *be a Hilbert space,* $\dim(H)\geq 2$*, and let* $P_1(H)$ *be the set of rank-1 projections on* $H$ *[equivalently,* $P_1(H)$ *is the projective Hilbert space]. Every bijective map*

$$\phi:P_1(H)\longrightarrow P_1(H)$$

*such that* $\mathrm{tr}[\phi(p)\phi(q)]=\mathrm{tr}[pq]$ *for all* $p,q\in P_1(H)$ *(i.e., transition probabilities are preserved) is implemented by conjugation with a unitary or anti-unitary operator* $u$*,*

$$\phi(p)=upu^{*}\quad\text{for all } p\in P_1(H).$$
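The easy direction behind this statement, namely that conjugation with a unitary or anti-unitary operator preserves transition probabilities, can be checked numerically. The following sketch does so on $\mathbb{C}^2$. It assumes the anti-unitary operator is written as a unitary matrix composed with complex conjugation in the standard basis, and the particular vectors and matrix are arbitrary illustrative choices. It does not test the converse, which is the actual content of the theorem.

```python
# Numerical check that p -> u p u* preserves tr[pq] on rank-1 projections,
# for u unitary and for u anti-unitary (unitary matrix times complex
# conjugation in the standard basis). Vectors and U are illustrative choices.
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def entrywise_conj(a):
    return [[x.conjugate() for x in row] for row in a]

def outer(psi):
    """Rank-1 projection onto the unit vector psi."""
    return [[x * y.conjugate() for y in psi] for x in psi]

def transition_probability(p, q):
    """tr[pq] for two projections p and q."""
    pq = matmul(p, q)
    return complex(sum(pq[i][i] for i in range(len(pq)))).real

def by_unitary(u, p):
    return matmul(matmul(u, p), adjoint(u))

def by_antiunitary(u, p):
    # (u K) p (u K)^{-1} = u conj(p) u*, with K entrywise complex conjugation
    return matmul(matmul(u, entrywise_conj(p)), adjoint(u))

s = 1 / math.sqrt(2)
U = [[s + 0j, s * 1j], [s * 1j, s + 0j]]           # a unitary 2x2 matrix
psi, chi, eta = [s + 0j, s * 1j], [1 + 0j, 0j], [0.6 + 0j, 0.8j]
states = [outer(v) for v in (psi, chi, eta)]

max_deviation = max(
    abs(transition_probability(f(U, p), f(U, q)) - transition_probability(p, q))
    for f in (by_unitary, by_antiunitary)
    for p in states for q in states
)
```

Up to floating-point rounding, `max_deviation` vanishes for both kinds of conjugation.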

Various nice proofs can be found in the literature; for a modern perspective, see, e.g., Ref. 19. These authors also prove the following: let $\mathrm{Aut}(P_1(H))$ denote the group of automorphisms of $P_1(H)$ [i.e., bijective maps $\phi:P_1(H)\to P_1(H)$ that preserve transition probabilities]. If $\dim(H)\geq 3$, then $\mathrm{Aut}(P_1(H))$ is isomorphic to the group $\mathrm{Aut}(P(H))$ of automorphisms of the projection lattice $P(H)$, i.e., maps

$$\varphi:P(H)\longrightarrow P(H)$$

that (i) are bijective, (ii) preserve complements, $\forall p\in P(H):\varphi(1-p)=1-\varphi(p)$, and (iii) preserve and reflect order, $\forall p,q\in P(H):(p\leq q)\Leftrightarrow(\varphi(p)\leq\varphi(q))$.

Geometrically, *p* ≤ *q* means that the closed subspace that *p* projects onto is contained in the closed subspace that *q* projects onto. Algebraically, (*p* ≤ *q*) ⇔ (*pq* = *p*). Since an automorphism $\varphi\in\mathrm{Aut}(P(H))$ preserves the order, it also preserves all meets (greatest lower bounds) and all joins (least upper bounds) in $P(H)$. Since *ϕ* also preserves complements, it is an automorphism of the complete orthomodular lattice $P(H)$.

Under the group isomorphism $\mathrm{Aut}(P_1(H))\to\mathrm{Aut}(P(H))$, the automorphism $\varphi\in\mathrm{Aut}(P(H))$ corresponding to a given $\phi\in\mathrm{Aut}(P_1(H))$ is an extension of *φ* to all projections that preserves joins and complements (and hence also meets; for details, see Ref. 19). Of course, we have $\varphi|_{P_1(H)}=\phi$.

Hence, if the Hilbert space is at least three-dimensional, Wigner’s theorem is equivalent to the fact that every automorphism of the projection lattice $P(H)$ is implemented by conjugation with a unitary or anti-unitary operator,

$$\varphi(p)=upu^{*}\quad\text{for all } p\in P(H).$$

There is a generalization of Wigner’s theorem to von Neumann algebras, which is closer to the formulation with automorphisms of $P(H)$ than automorphisms of $P_1(H)$. This is Dye’s theorem.^{20} Before we formulate the theorem, we recall that given a von Neumann algebra $N$, we can form the associated Jordan algebra $J(N)=(N,\circ)$, which has the same elements and linear structure as $N$, and the Jordan product given by

$$a\circ b=\frac{1}{2}(ab+ba)\quad\text{for all } a,b\in N.$$

Recall that the Jordan product is always commutative but is associative if and only if the von Neumann algebra $N$ is commutative. A Jordan *-automorphism of $J(N)$ is a bijective map $\Phi:J(N)\to J(N)$ such that both Φ and Φ^{−1} preserve Jordan product and involution (_)^{*},

$$\Phi(a\circ b)=\Phi(a)\circ\Phi(b),\qquad\Phi(a^{*})=\Phi(a)^{*}\quad\text{for all } a,b\in J(N),$$

and analogously for Φ^{−1}. We can now formulate Dye’s theorem.

*Let* $N$ *be a von Neumann algebra with no direct summand of type* *I*_{2}*. For every automorphism* $\varphi:P(N)\to P(N)$ *of the projection lattice of* $N$*, there exists a unique Jordan* $*$*-automorphism* $\Phi:J(N)\to J(N)$ *such that* Φ(*p*) = *ϕ*(*p*) *for all projections* $p\in P(N)$*.*

It is easy to see that the Jordan *-automorphism Φ induced by an automorphism $\varphi:P(N)\to P(N)$ is *ultraweakly continuous* (or normal), i.e., it preserves (countable) joins of projections.^{21} Conversely, every ultraweakly continuous Jordan *-automorphism $\Phi:J(N)\to J(N)$ induces an automorphism *ϕ* of the complete orthomodular lattice $P(N)$ by $\varphi:=\Phi|_{P(N)}$.

The ultraweakly continuous Jordan *-automorphisms of $J(N)$ form a group denoted $\mathrm{Aut}(J(N))$. Dye’s theorem, hence, shows that, provided $N$ has no type *I*_{2} summand, there is a group isomorphism

$$\mathrm{Aut}(P(N))\longrightarrow\mathrm{Aut}(J(N)),\qquad\varphi\longmapsto\Phi,$$

between the group of automorphisms of the projection lattice and the group of ultraweakly continuous Jordan *-automorphisms of $N$.

One may wonder how Dye’s theorem and Jordan *-automorphisms relate to unitary and anti-unitary operators as in Wigner’s theorem. To see this, we first need the following well-known result (see, e.g., Ref. 22):

*Every Jordan* $*$*-automorphism* $\Phi:N\to N$ *of a von Neumann algebra* $N$ *can be decomposed as the sum of a* $*$*-isomorphism and a* $*$*-anti-isomorphism.*

More concretely, there are projections *p*, *q* in the center of $N$ such that $N$ is unitarily equivalent to both $pN\oplus(1-p)N$ and $qN\oplus(1-q)N$, and $\Phi|_{pN}:pN\to qN$ is a *-isomorphism, while $\Phi|_{(1-p)N}:(1-p)N\to(1-q)N$ is a *-anti-isomorphism. Moreover, we need the following proposition (see, e.g., Ref. 22):

*Every* **-automorphism* $\Phi :B(H)\u2192B(H)$ *is implemented by conjugation with a unitary operator and every* **-anti-automorphism is implemented by conjugation with an anti-unitary operator.*

By Dye’s theorem and Proposition 4, every (ultraweakly continuous) Jordan *-automorphism $\Phi: B(H) \to B(H)$ decomposes into a *-isomorphism on $B(H)p$ and a *-anti-isomorphism on $B(H)(1-p)$, where *p* is a *central* projection in $P(H)$. Since $B(H)$ is a factor (a von Neumann algebra with trivial center), the only central projections are 0 and 1. Thus, a Jordan *-automorphism $\Phi: B(H) \to B(H)$ is either a *-automorphism or a *-anti-automorphism. Hence, by Proposition 5, it is of the form

$$\Phi(a) = u a u^{*}$$

for a unitary or anti-unitary operator *u* acting on $H$. We see that Wigner’s theorem is a special case of Dye’s theorem [depending on some special features of $B(H)$], and we can rephrase Wigner’s theorem as follows:

*(Wigner’s theorem in “Jordan formulation”). Let* $H$ *be a Hilbert space,* $dimH\u22653$*. Every automorphism* $\varphi :P(H)\u2192P(H)$ *is implemented by a unique ultraweakly continuous Jordan* **-automorphism* $\Phi :J(H)\u2192J(H)$ *such that* *ϕ*(*p*) = Φ(*p*) *for all* $p\u2208P(H)$*.*

This shows that, contrary to the usual hand-waving arguments, there is a good mathematical reason to consider both unitary and anti-unitary operators in Wigner’s theorem, since we actually have a statement about the structure of $B(H)$ as a Jordan algebra $J(H) = (B(H), \circ)$. The Jordan structure is preserved by the action of both unitary and anti-unitary operators. For a related treatment of symmetries in quantum theory, see also Ref. 23.
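The role of the Jordan product can be checked on a small example. The following is a minimal sketch, assuming real 2 × 2 matrices and taking transposition as the anti-automorphism (in the standard basis, transposition agrees with $a \mapsto K a^{*} K$ for *K* the complex conjugation, an anti-unitary operator). The helper names `matmul`, `jordan`, and `transpose` are ours and purely illustrative.

```python
from fractions import Fraction

def matmul(a, b):
    """Ordinary (noncommutative) matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def jordan(a, b):
    """Jordan product (ab + ba)/2, computed exactly with fractions."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[Fraction(ab[i][j] + ba[i][j], 2) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Two matrices that do not commute.
a = [[1, 2], [0, 1]]
b = [[0, 1], [1, 3]]

# Transposition reverses the ordinary product ...
reverses_product = transpose(matmul(a, b)) == matmul(transpose(b), transpose(a))
# ... so it does not preserve it ...
preserves_product = transpose(matmul(a, b)) == matmul(transpose(a), transpose(b))
# ... but it does preserve the symmetrized (Jordan) product.
preserves_jordan = transpose(jordan(a, b)) == jordan(transpose(a), transpose(b))

print(reverses_product, preserves_product, preserves_jordan)  # True False True
```

The symmetrization in the Jordan product is what makes it blind to the order reversal, which is why automorphisms and anti-automorphisms appear on the same footing.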

### B. Contextuality and Jordan structure

So far, all this does not relate to contexts in any obvious way. The result that connects Wigner’s theorem (and Dye’s theorem) with contextuality is the following theorem:

*(Döring and Harding*^{24} *). Let* $N$ *be a von Neumann algebra not isomorphic to* $C\u2295C$ *or to* $M2(C)$*. For every order automorphism* $\varphi \u0303:V(N)\u2192V(N)$ *of the context category of* $N$*, there is a unique (ultraweakly continuous) Jordan* **-automorphism* $\Phi :J(N)\u2192J(N)$ *such that* $\varphi \u0303(V)=\Phi [V]$ *for all* $V\u2208V(N)$*.*

This shows that the mere order structure of contexts determines the algebra of observables as a Jordan algebra up to isomorphism. The proof proceeds in two steps: first, using the result by Harding and Navara^{17} already mentioned in Sec. II B, one shows that an order automorphism $\varphi \u0303:V(N)\u2192V(N)$ induces a unique automorphism $\varphi :P(N)\u2192P(N)$ of the projection lattice; second, by Dye’s theorem, this gives a Jordan *-automorphism $\Phi :J(N)\u2192J(N)$. As a shorthand, contextuality determines the Jordan algebra of physical quantities and vice versa.

As remarked elsewhere,^{21} it is easy to see that the Jordan *-automorphism Φ induced by an order isomorphism $\tilde{\varphi}: V(N) \to V(N)$ is ultraweakly continuous. Hence, there is a group isomorphism

$$\mathrm{Aut}(V(N)) \cong \mathrm{Aut}(J(N)).$$

The Döring–Harding result can be regarded as a reformulation of Dye’s theorem (Theorem 3), explicitly showing that it is contextuality that determines the Jordan algebra structure of von Neumann algebras (and vice versa). Specializing to the algebra $N=B(H)$ and using Proposition 5, we have the following theorem:

*(Wigner’s theorem in contextual form). Let* $H$ *be a Hilbert space,* $\dim(H) \geq 3$*. For every order automorphism* $\tilde{\varphi}: V(H) \to V(H)$*, there is a unique unitary or anti-unitary operator* *u* *such that*

$$\tilde{\varphi}(V) = u V u^{*} \quad \text{for all } V \in V(H).$$

*Conversely, every unitary or anti-unitary operator* *u* *induces an order automorphism of the context category* $V(H)$ *by conjugation.*

This is our first reformulation of Wigner’s theorem. Remarkably, any bijective map *ϕ* that preserves the order on the collection of contexts must be implemented by a unitary or anti-unitary operator.

### C. The trivial presheaf

Obviously, the structure of the context category $V(H)$ is sufficient to reformulate Wigner’s theorem. Extra information, as may be provided by presheaves over $V(H)$, is not necessary. In this sense, Wigner’s theorem is the simplest of the theorems that we consider.

Yet, there is a reformulation of Wigner’s theorem, more generally Dye’s theorem, that does use a presheaf. Since we need exactly the information provided by the context category $V(N)$, which is a partially ordered set, the presheaf must mirror this partial order (and nothing more). The trivial presheaf $\underline{1}$ over $V(N)$ does this: the component at $V \in V(N)$ is the one-element set $\{*\}$, and for every inclusion $i_{\tilde{V}V}: \tilde{V} \hookrightarrow V$, there is a restriction $\underline{1}(i_{\tilde{V}V}): \underline{1}_V \to \underline{1}_{\tilde{V}}$, sending $\{*\}$ to $\{*\}$. Hence, we have both the elements of the poset $V(N)$ and the order relation encoded by $\underline{1}$.^{25}

As discussed in Sec. II C, an automorphism Θ of a presheaf $\underline{P}$ consists of two things (according to our convention): a shifting around of components, induced by an automorphism $\tilde{\varphi}$ of the base category acting by pullback, followed by an isomorphism $\theta_V: (\underline{P} \circ \tilde{\varphi})_V \to (\underline{P} \circ \tilde{\varphi})_V$ for each $V \in V(N)$ such that, whenever $\tilde{V} \subset V$, one has $\underline{P}(i_{\tilde{\varphi}(\tilde{V})\tilde{\varphi}(V)}) \circ \theta_V = \theta_{\tilde{V}} \circ \underline{P}(i_{\tilde{\varphi}(\tilde{V})\tilde{\varphi}(V)})$.

In our case, an order automorphism $\tilde{\varphi}: V(N) \to V(N)$ acts by pullback on the trivial presheaf $\underline{1}$ over $V(N)$ in the following way: for all $V \in V(N)$, $(\underline{1} \circ \tilde{\varphi})_V = \underline{1}_{\tilde{\varphi}(V)} = \{*\}$.

Since each component $\underline{1}_V$ of the trivial presheaf $\underline{1}$ is just a one-element set, $\underline{1}_V = \{*\}$, the only isomorphism $\theta_V: (\underline{1} \circ \tilde{\varphi})_V \to (\underline{1} \circ \tilde{\varphi})_V$ is the identity map [for each $V \in V(N)$]. Hence, an automorphism of the trivial presheaf $\underline{1}$ over $V(N)$ is simply given by a shifting around of components, induced by an (order) automorphism of the base category $V(N)$.

*Let* $N$ *be a von Neumann algebra. There is a bijective correspondence between automorphisms* Θ *of the trivial presheaf* $1\u0332$ *over* $V(N)$ *and order automorphisms* $\varphi \u0303$ *of the context category* $V(N)$*.*
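To make the correspondence concrete, here is a small sketch on a hypothetical finite poset, which stands in for a context category: one bottom context contained in three maximal contexts. The poset, the labels, and the function names are illustrative assumptions and do not describe the context category of any particular algebra. The code enumerates all order automorphisms by brute force; since every component of the trivial presheaf is a one-element set, each order automorphism carries exactly one presheaf automorphism.

```python
from itertools import permutations

# Hypothetical toy poset: a bottom context contained in three maximal contexts.
contexts = ["bottom", "V1", "V2", "V3"]
leq = {(c, c) for c in contexts} | {("bottom", v) for v in ("V1", "V2", "V3")}

def is_order_automorphism(phi):
    """A bijection phi (given as a dict) that preserves and reflects the order."""
    return all(((phi[x], phi[y]) in leq) == ((x, y) in leq)
               for x in contexts for y in contexts)

order_automorphisms = []
for image in permutations(contexts):
    phi = dict(zip(contexts, image))
    if is_order_automorphism(phi):
        order_automorphisms.append(phi)

def trivial_presheaf_automorphism(phi):
    """Pair (phi, components): each component is the unique map {*} -> {*},
    so the base automorphism phi is the only datum."""
    components = {V: {"*": "*"} for V in contexts}
    return phi, components

presheaf_automorphisms = [trivial_presheaf_automorphism(phi)
                          for phi in order_automorphisms]

print(len(order_automorphisms), len(presheaf_automorphisms))  # 6 6
```

In this toy case, the six automorphisms are the permutations of the three maximal contexts; the bottom element is necessarily fixed.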

From this and Theorem 7, we have the following corollary:

*(Dye’s theorem in presheaf form). Let* $N$ *be a von Neumann algebra not isomorphic to* $C\u2295C$ *or to* $M2(C)$*. For every* *automorphism* Θ *of the trivial presheaf* $1\u0332$ *over the context category* $V(N)$*, there is a unique (ultraweakly continuous) Jordan* **-automorphism* $\Phi :J(N)\u2192J(N)$ *such that* $\varphi \u0303(V)=\Phi [V]$ *for all* $V\u2208V(N)$*, where* $\varphi \u0303:V(N)\u2192V(N)$ *is the automorphism of the context category corresponding to* Θ*.*

Finally, from Lemma 9 and Theorem 8 we obtain the following corollary:

*(Wigner’s theorem in presheaf form). Let* $H$ *be a Hilbert space,* $\dim(H) \geq 3$*. For every automorphism* Θ *of the trivial presheaf* $\underline{1}$ *over the context category* $V(H)$*, there is a unique unitary or anti-unitary operator* *u* *such that*

$$\tilde{\varphi}: V(H) \to V(H), \quad V \mapsto u V u^{*}$$

*is the order automorphism of* $V(H)$ *inducing* $\Theta: \underline{1} \to \underline{1}$*. Conversely, every unitary or anti-unitary operator* *u* *induces an automorphism* Θ *of the trivial presheaf* $\underline{1}$ *over* $V(H)$*.*

Note that the trivial presheaf contains exactly the right amount of information: every automorphism of $1\u0332$ gives a unitary or anti-unitary *u* and vice versa. More physically speaking, every “rearrangement” of contexts that preserves the order (i.e., preserves how contexts are contained within each other) determines a unitary or anti-unitary operator and vice versa.

## IV. REFORMULATION OF THE KOCHEN–SPECKER THEOREM

The Kochen–Specker theorem^{5} is deeply connected to contextuality. The usual interpretation of the theorem amounts to a negative statement of the kind “there are no non-contextual assignments of values to physical quantities.”^{26}

Kochen and Specker’s result excludes certain state space models for quantum theory. This makes it difficult to interpret the Kochen–Specker theorem in geometric terms. In addition, it is not straightforward to see the exact nature of the connection between contextuality in our sense (commutative subalgebras of compatible physical quantities, arranged into a poset) and the nonexistence of valuation functions. Both aspects were clarified by Isham and Butterfield in a beautiful series of papers,^{10,11,27,28} with Hamilton as a co-author of the third paper. In fact, the context category first shows up in these papers, and so does the spectral presheaf $\Sigma \u0332$, which will be defined below. The latter plays a central role in the topos approach to quantum theory^{15} and serves as a generalized state space for a quantum system, notwithstanding the Kochen–Specker theorem. In fact, the Kochen–Specker theorem is equivalent to the fact that the quantum state space $\Sigma \u0332$ has no points (technically, it has no global sections). This reformulation serves as the prototype for the reformulations of the other fundamental theorems of quantum theory discussed in this article.

In Sec. IV A, we will give a quick overview of the Kochen–Specker theorem and its background. Then, in Sec. IV B, we make some connections with contextuality and present the presheaf reformulation of the Kochen–Specker theorem by Isham, Butterfield, and Hamilton. Finally, we extend their results to von Neumann algebras.

### A. Valuation functions and the Kochen–Specker theorem

In their seminal paper,^{5} Kochen and Specker considered the question of whether assignments of values to the physical quantities of a quantum system exist. Let $H$ be a separable Hilbert space, let $B(H)$ be the algebra of bounded linear operators on $H$, and let $B(H)_{sa}$ be the real vector space of bounded self-adjoint operators. A *valuation function* is a function

$$v: B(H)_{sa} \to \mathbb{R}$$

such that

- for all $a \in B(H)_{sa}$, it holds that *v*(*a*) ∈ sp(*a*) (*spectrum rule*),
- for all continuous functions $f: \mathbb{R} \to \mathbb{R}$, it holds that *v*(*f*(*a*)) = *f*(*v*(*a*)) (*functional composition principle*).

*(Kochen–Specker theorem*^{5} *). Let* $H$ *be a Hilbert space,* $dimH\u22653$*. There exist no valuation functions* $v:B(H)sa\u2192R$*.*

In the proof, a certain family of rays, i.e., rank-1 projections, is considered. Each of these must be assigned either 0 or 1 according to the spectrum rule, and in every orthogonal triple *p*_{1}, *p*_{2}, *p*_{3} of rank-1 projections, exactly one projection is assigned 1 and the others are assigned 0. By carefully choosing the family of projections, Kochen and Specker constructed an explicit counterexample: they showed that no consistent assignment of values 0, 1 to the projections in their family is possible. The original proof used a configuration of 117 rays in $H = \mathbb{R}^3$ (the real Euclidean space), which could later be reduced to 31 and even fewer in $\mathbb{C}^4$. The proof of the result in real, three-dimensional Hilbert space implies the result in higher-dimensional, real and complex Hilbert spaces.
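The combinatorial core of such a proof can be checked by brute force. The sketch below does not use the original 117 rays. It assumes instead a much smaller configuration in four dimensions: 18 real vectors arranged in nine orthogonal bases, of the kind found by Cabello, Estebaranz, and García-Alcaine. The code verifies orthogonality itself rather than relying on the table, and then searches all 0/1 assignments for one that puts exactly one 1 in every basis (the four-dimensional analog of the rule for orthogonal triples).

```python
from itertools import product

# Nine orthogonal bases (contexts) in R^4; vectors are left unnormalized.
CONTEXTS = [
    [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 0), (1, -1, 0, 0)],
    [(0, 0, 0, 1), (0, 1, 0, 0), (1, 0, 1, 0), (1, 0, -1, 0)],
    [(1, -1, 1, -1), (1, -1, -1, 1), (1, 1, 0, 0), (0, 0, 1, 1)],
    [(1, -1, 1, -1), (1, 1, 1, 1), (1, 0, -1, 0), (0, 1, 0, -1)],
    [(0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 1), (1, 0, 0, -1)],
    [(1, -1, -1, 1), (1, 1, 1, 1), (1, 0, 0, -1), (0, 1, -1, 0)],
    [(1, 1, -1, 1), (1, 1, 1, -1), (1, -1, 0, 0), (0, 0, 1, 1)],
    [(1, 1, -1, 1), (-1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, -1)],
    [(1, 1, 1, -1), (-1, 1, 1, 1), (1, 0, 0, 1), (0, 1, -1, 0)],
]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

vectors = sorted({v for ctx in CONTEXTS for v in ctx})
index = {v: i for i, v in enumerate(vectors)}

# Sanity check: the vectors within each context are pairwise orthogonal.
all_orthogonal = all(dot(u, v) == 0
                     for ctx in CONTEXTS
                     for i, u in enumerate(ctx) for v in ctx[i + 1:])

def count_valuations():
    """Count 0/1 assignments with exactly one 1 in every context."""
    count = 0
    for bits in product((0, 1), repeat=len(vectors)):
        if all(sum(bits[index[v]] for v in ctx) == 1 for ctx in CONTEXTS):
            count += 1
    return count

n_valuations = count_valuations()
print(len(vectors), all_orthogonal, n_valuations)  # 18 True 0
```

The search is exhaustive but not needed for this particular set: each vector lies in exactly two of the nine bases, so the total number of 1s counted over all bases would have to be both even and equal to nine.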

Bell provided a proof of the same result, i.e., there are no non-contextual assignments of values to physical quantities.^{29} His proof uses a continuity argument and Gleason’s theorem and hence is not “discrete” like Kochen and Specker’s proof.

### B. The spectral presheaf

Kochen and Specker emphasized that an earlier proof of nonexistence of certain value assignments by von Neumann^{30} was flawed, since it posed conditions on noncompatible physical quantities (represented by noncommuting self-adjoint operators), which Kochen and Specker regarded as unjustified. In contrast, the functional composition principle employed by Kochen and Specker seemingly is just a condition on commuting operators, since *a* and *f*(*a*) commute.

Yet, there are triples {*a*, *b*, *c*} of self-adjoint operators and functions *f*, *g* such that

$$c = f(a) = g(b),$$

i.e., *c* is both a function of *a* and a function of *b*. Such a triple is called a Kochen–Specker triple. Crucially, *c* = *f*(*a*) commutes with *a* and *c* = *g*(*b*) commutes with *b*, but *a* need not commute with *b*. In this way, the functional composition principle does pose conditions on noncommuting operators, too: since *f*(*v*(*a*)) = *v*(*f*(*a*)) = *v*(*g*(*b*)) = *g*(*v*(*b*)), the (hypothetical) values *v*(*a*) and *v*(*b*) assigned to *a* (respectively, *b*) are not independent.
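A Kochen–Specker triple is easy to write down in three dimensions. The sketch below is an illustrative choice of ours, not an example taken from the cited papers: *a* and *b* both have eigenvalues 1, 2, 3, share the eigenvector for eigenvalue 3, and differ on its orthogonal complement. For simplicity, the same polynomial is used for both *f* and *g*.

```python
from fractions import Fraction as F

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(a, t):
    """Return a - t * identity."""
    n = len(a)
    return [[a[i][j] - (t if i == j else 0) for j in range(n)]
            for i in range(n)]

def f(a):
    """f(x) = (x - 1)(x - 2)/2, so f(1) = f(2) = 0 and f(3) = 1."""
    prod = matmul(shift(a, 1), shift(a, 2))
    return [[x / 2 for x in row] for row in prod]

# a is diagonal in the standard basis e1, e2, e3.
a = [[F(1), F(0), F(0)], [F(0), F(2), F(0)], [F(0), F(0), F(3)]]
# b has eigenvectors (e1 + e2), (e1 - e2), e3 with eigenvalues 1, 2, 3.
b = [[F(3, 2), F(-1, 2), F(0)], [F(-1, 2), F(3, 2), F(0)], [F(0), F(0), F(3)]]
# c is the projection onto e3.
c = [[F(0), F(0), F(0)], [F(0), F(0), F(0)], [F(0), F(0), F(1)]]

is_function_of_a = f(a) == c
is_function_of_b = f(b) == c
a_b_commute = matmul(a, b) == matmul(b, a)

print(is_function_of_a, is_function_of_b, a_b_commute)  # True True False
```

Any hypothetical valuation *v* would then have to satisfy *f*(*v*(*a*)) = *v*(*c*) = *f*(*v*(*b*)), which couples the values of the noncommuting operators *a* and *b*.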

The relations induced by Kochen–Specker triples led Isham and Butterfield to introduce the *spectral presheaf*. In their first paper^{10} (of four), the self-adjoint operators themselves served as “stages,” and in the third paper,^{11} the step to commutative subalgebras of $B(H)$ and the context category was taken. We will focus on the latter. First, consider a single commutative subalgebra *V* of $B(H)$. We assume for the moment that *V* is closed in the norm topology, and hence, *V* is a *C*^{*}-algebra. [For details on the norm topology and *C*^{*}-algebras, see, e.g., Ref. 12. In finite dimensions, any subalgebra of $B(H)$ is a *C*^{*}-algebra.] Not surprisingly, there are valuation functions on *V*_{sa}, the self-adjoint operators in *V*: every character, that is, every multiplicative linear functional of norm 1,

$$\lambda: V \to \mathbb{C},$$

fulfills both the spectrum rule and the functional composition principle. Conversely, every valuation function is a character of *V*. Recall from Sec. II A that multiplicative linear functionals of norm 1 on a commutative von Neumann algebra *V* are exactly the pure states of *V*. Hence, we consider the set

$$\Sigma(V) := \{\lambda: V \to \mathbb{C} \mid \lambda \text{ is a character of } V\}$$

of characters (or pure states) of *V*, traditionally called the Gelfand spectrum of *V*.^{31} In physical terms, Σ(*V*) is the (pure) state space of the physical system described by the physical quantities in *V*. As expected, the points of the state space Σ(*V*) correspond exactly with valuation functions on *V*_{sa}.

If $\tilde{V} \subset V$ is a unital *C*^{*}-subalgebra, then every character of $\tilde{V}$ arises as the restriction of some character of *V*, that is, there is a surjective map

$$\Sigma(V) \to \Sigma(\tilde{V}), \quad \lambda \mapsto \lambda|_{\tilde{V}}.$$

Hence, there is a canonical map from the state space of the bigger algebra *V* to the state space of the smaller algebra $\tilde{V}$. Every valuation function on *V* can be restricted to a valuation function on $\tilde{V}$.
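In finite dimensions, the restriction map can be written out explicitly. The following sketch assumes the context of diagonal 3 × 3 matrices (stored as tuples of diagonal entries) and the subcontext of matrices of the form diag(*x*, *y*, *y*); all names in the code are illustrative. The three characters of the larger context are the coordinate evaluations, and restricting them yields the two characters of the smaller one.

```python
# Context V: diagonal 3x3 matrices, stored as tuples of diagonal entries.
# Its characters are the three coordinate evaluations.
characters_V = [lambda a, i=i: a[i] for i in range(3)]

# Subcontext: matrices diag(x, y, y), generated by p0 and p1 + p2.
# It has two characters: diag(x, y, y) -> x and diag(x, y, y) -> y.
characters_sub = [lambda a: a[0], lambda a: a[1]]

# The two generating projections span the subcontext as a vector space,
# so a linear functional on it is determined by its values on them.
generators = [(1, 0, 0), (0, 1, 1)]

def restrict(lam):
    """Index of the subcontext character that agrees with lam on the subcontext."""
    for k, chi in enumerate(characters_sub):
        if all(lam(a) == chi(a) for a in generators):
            return k
    raise ValueError("restriction is not a character of the subcontext")

restriction_map = [restrict(lam) for lam in characters_V]
print(restriction_map)  # [0, 1, 1]
```

The map is surjective but not injective: the second and third characters of the larger context become indistinguishable once only the coarser quantities are available.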

In infinite dimensions, it is useful to work with commutative von Neumann subalgebras instead of the more general *C*^{*}-subalgebras, and we will do so from now on. (In finite dimensions, there is no difference.) Isham, Butterfield, and Hamilton’s key idea^{10,11} was to combine all the state spaces for commutative subalgebras of a quantum system into one global object. This is the spectral presheaf.

*Let* $H$ *be a Hilbert space. The spectral presheaf* $\underline{\Sigma}$ *of the algebra* $B(H)$ *is the presheaf over the context category* $V(H) := V(B(H))$ *given*

- *on objects: for all commutative von Neumann subalgebras* $V \in V(H)$*, let* $\underline{\Sigma}_V = \Sigma(V)$*, the Gelfand spectrum of* $V$*;*
- *on arrows: for all inclusions* $i_{\tilde{V}V}: \tilde{V} \hookrightarrow V$*, let* $\underline{\Sigma}(i_{\tilde{V}V}): \underline{\Sigma}_V \to \underline{\Sigma}_{\tilde{V}}$*,* $\lambda \mapsto \lambda|_{\tilde{V}}$*.*

It is clear by construction that the spectral presheaf $\Sigma \u0332$ is a kind of state space for the quantum system, built from all the state spaces Σ(*V*) of the commuting, compatible parts $V\u2208V(H)$ of the noncommutative algebra $B(H)$ of physical quantities.

As we saw in Sec. II B, for a presheaf, the analog of a point is a global section. What would a global section of the spectral presheaf $\underline{\Sigma}$ be? For every context $V \in V(H)$, we have to pick one element $\lambda_V \in \underline{\Sigma}_V$, the Gelfand spectrum of *V*. *λ*_{V} is a valuation function for the physical quantities in *V*, i.e., it assigns a value *λ*_{V}(*a*) to all *a* ∈ *V*_{sa} such that the spectrum rule and functional composition hold.

Moreover, if $\dot{V}$ is another commutative subalgebra that contains *a*, then *a* is also contained in $\tilde{V} := V \cap \dot{V}$. The value we assign to *a* in *V* is *λ*_{V}(*a*) and the value we assign to *a* in $\dot{V}$ is $\lambda_{\dot{V}}(a)$. Since the components of a global section are compatible with the restriction maps, the value we assign to *a* in $\tilde{V} = V \cap \dot{V}$ is

$$\lambda_{\tilde{V}}(a) = \lambda_V|_{\tilde{V}}(a) = \lambda_V(a)$$

and also

$$\lambda_{\tilde{V}}(a) = \lambda_{\dot{V}}|_{\tilde{V}}(a) = \lambda_{\dot{V}}(a),$$

and hence,

$$\lambda_V(a) = \lambda_{\dot{V}}(a).$$

The structure of a global section, therefore, guarantees that the value assigned to a physical quantity, represented by the self-adjoint operator *a*, is the same, independent of the context in which it lies. Since also the spectrum rule and the functional composition principle hold, every global section of $\Sigma \u0332$ would provide a valuation function on all of $B(H)$. Conversely, a valuation function would give a global section of $\Sigma \u0332$.

Since the Kochen–Specker theorem shows that there are no valuation functions, i.e., no non-contextual value assignments, Isham, Butterfield, and Hamilton could give the following reformulation:

*The Kochen–Specker theorem is equivalent to the fact that the spectral presheaf* $\Sigma \u0332(H)$ *has no global sections whenever* $dim(H)\u22653$*.*

In more physical terms, the Kochen–Specker theorem is equivalent to the fact that the quantum state space $\underline{\Sigma}$ has no points. This does not mean, however, that $\underline{\Sigma}$ is “empty”: it still has plenty of subobjects (which are the presheaf analog of subsets). One just cannot “focus down” to points, which would be (nonexistent) microstates.

We note that the nonexistence of global sections is not just a consequence of Kochen–Specker but is exactly equivalent. This shows that the context category and the spectral presheaf contain just the right amount of information and that the Kochen–Specker theorem indeed is encoded by our notion of contextuality.

The Kochen–Specker theorem was generalized to von Neumann algebras in Ref. 32.

*(Generalized Kochen*–*Specker theorem). Let* $N$ *be a von Neumann algebra with no direct summand of type* *I*_{1}, *I*_{2}*. Then, there are no valuation functions* $v:Nsa\u2192R$*.*

The condition that $N$ has no summand of type *I*_{1}, *I*_{2} generalizes the condition that $dimH\u22653$ in the original proof of the theorem.^{33,76,77} Since we can easily define the spectral presheaf of a von Neumann algebra $N$ [simply replace $B(H)$ by $N$ and $V(H)$ by $V(N)$ in Definition 4], we also have the following reformulation of the generalized Kochen–Specker theorem:

*Let* $N$ *be a von Neumann algebra with no direct summand of type* *I*_{1}, *I*_{2}*, let* $V(N)$ *be the context category of* $N$*, and let* $\underline{\Sigma}$ *be its spectral presheaf. The generalized Kochen*–*Specker theorem is equivalent to the fact that* $\underline{\Sigma}$ *has no global sections.*

## V. REFORMULATION OF GLEASON’S THEOREM

### A. The Born rule and Gleason’s theorem

Gleason’s theorem,^{2} proven in 1957, shows that the Born rule follows from very modest assumptions. Let $H$ be a Hilbert space of dimension 3 or greater. Assume that there is a function assigning probabilities to projection operators,

$$\mu: P(H) \to [0,1],$$

such that

- (a) *μ*(1) = 1,
- (b) for all $p, q \in P(H)$, if *pq* = 0, then *μ*(*p* + *q*) = *μ*(*p*) + *μ*(*q*).

Condition (a) is the obvious *normalization* condition and (b) is *finite additivity* on mutually orthogonal projections. Clearly, if one aims to have any probabilistic formalism relating to projections (representing propositions about a quantum system), having such a function *μ* that assigns probabilities to projections is the minimal and natural requirement. There is a built-in non-contextuality condition: every projection *p* lies in many different contexts, but *μ* assigns just one probability to *p*, independently of contexts.

There is an obvious strengthening of finite additivity to infinite families of mutually orthogonal projections, called *complete additivity*:

- (b′) for any family $(p_i)_{i \in I}$ of mutually orthogonal projections (i.e., *p*_{i}*p*_{j} = *δ*_{ij}*p*_{i} for all *i*, *j* ∈ *I*), it holds that $\mu\left(\bigvee_{i \in I} p_i\right) = \sum_{i \in I} \mu(p_i)$.

Note that if the underlying Hilbert space $H$ is separable, then the index set *I* is at most countable.^{34}

Gleason showed the following theorem, partly answering an earlier problem posed by Mackey.

*(Gleason’s theorem). Let* $\dim H \geq 3$*. Given a completely additive probability measure* *μ* *on projections, there always exists a unique positive trace-class operator with trace* 1, *written* *ρ*_{μ}*, such that*

$$\mu(p) = \mathrm{tr}[\rho_\mu p] \quad \text{for all } p \in P(H).$$

In finite dimensions, *ρ*_{μ} is nothing but a density matrix. As mentioned in Sec. II A, this means that every completely additive probability measure on projections determines a unique normal state of $B(H)$. Conversely, every normal state, equivalently every positive trace-class operator *ρ* of trace 1 (or, in finite dimensions, every density matrix), determines a unique completely additive probability measure by

$$\mu_\rho(p) := \mathrm{tr}[\rho p] \quad \text{for all } p \in P(H).$$

Obviously, $\mu_{\rho_\mu} = \mu$ and $\rho_{\mu_\rho} = \rho$. Hence, Gleason’s theorem justifies the use of density matrices and the Born rule in quantum mechanics. The condition that the Hilbert space is at least three-dimensional is essential, and we will assume $\dim H \geq 3$ from now on.

In order to understand the power of Gleason’s theorem, note that the definition of a completely additive probability measure $\mu: P(H) \to [0,1]$ only poses conditions on mutually orthogonal, hence commuting, projections. In other words, for any family $(p_i)_{i \in I}$ of mutually orthogonal projections, condition (b′) above is a condition within a context *V* that contains all the *p*_{i}, *i* ∈ *I*. If there are several contexts that contain all the *p*_{i}, it does not matter which context we consider, since probabilities are assigned directly to projections, independently of the contexts they lie in.

For simplicity, let us assume that the Hilbert space $H$ is finite-dimensional, $H = \mathbb{C}^n$. Let *V* be a context of $B(H) = M_n(\mathbb{C})$, the complex *n* × *n*-matrix algebra. Let {*p*_{1}, …, *p*_{m}} denote the unique set of minimal projections in *V*. Then, the *p*_{i} are mutually orthogonal and *V* is generated by them, $V = \{p_1, \ldots, p_m\}''$. Every self-adjoint operator *a* ∈ *V*_{sa} is a unique real linear combination of the *p*_{i}, that is, $a = \sum_{i=1}^{m} A_i p_i$. We extend the finitely additive probability measure *μ* to a function $\mu: V_{sa} \to \mathbb{R}$ by

$$\mu(a) := \sum_{i=1}^{m} A_i\, \mu(p_i).$$

This implies directly that if $r \in \mathbb{R}$, then *μ*(*ra*) = *rμ*(*a*) and if *a*, *b* ∈ *V*_{sa}, then *μ*(*a* + *b*) = *μ*(*a*) + *μ*(*b*). Hence, $\mu: V_{sa} \to \mathbb{R}$ is a real-linear function. Importantly, if *a* and *b* do not commute, it is not obvious at all whether *μ*(*a* + *b*) = *μ*(*a*) + *μ*(*b*) holds or not. A finitely additive probability measure $\mu: P(H) \to [0,1]$ gives a function $\mu: V_{sa} \to \mathbb{R}$ that is *linear in every context* *V* in a straightforward way, but it is not clear initially why this function should also be linear across contexts, i.e., on noncommuting operators. Traditionally, a function that is linear on commuting operators is called *quasi-linear*.

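Gleason’s result shows that there always exists a density matrix *ρ*_{μ} such that *μ*(*p*) = tr[*ρ*_{μ}*p*], and clearly, we also have *μ*(*a*) = tr[*ρ*_{μ}*a*] due to linearity of the trace. Crucially, the map

$$\mathrm{tr}[\rho_\mu\, \_\,]: B(H)_{sa} \to \mathbb{R}, \quad a \mapsto \mathrm{tr}[\rho_\mu a]$$

is linear on *all* (self-adjoint) operators,

$$\mathrm{tr}[\rho_\mu (a + b)] = \mathrm{tr}[\rho_\mu a] + \mathrm{tr}[\rho_\mu b] \quad \text{for all } a, b \in B(H)_{sa},$$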

which implies that *μ* is also linear on all operators. Hence, the quasi-linear function *μ* is, in fact, linear. In this way, Gleason’s theorem solves a local-to-global problem, where “local” here means “on commuting operators” (or “within contexts”) and global means “on all operators.”
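The passage from “linear within contexts” to “linear on all operators” can be illustrated numerically for a measure that already comes from a density matrix. The sketch below assumes $H = \mathbb{C}^3$, a diagonal density matrix chosen by us, and real matrices with exact rational entries; the names `rho`, `mu`, and the specific projections are illustrative. It only demonstrates the easy direction (Born-rule measures are linear across contexts), not Gleason’s theorem itself.

```python
from fractions import Fraction as F

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def scale(t, a):
    return [[t * x for x in row] for row in a]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# An assumed density matrix on C^3 (positive, trace 1).
rho = [[F(1, 2), 0, 0], [0, F(1, 3), 0], [0, 0, F(1, 6)]]

def mu(a):
    """Born rule: mu(a) = tr[rho a]."""
    return trace(matmul(rho, a))

p1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
p2 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
p3 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
h = F(1, 2)
q = [[h, h, 0], [h, h, 0], [0, 0, 0]]   # projection onto (e1 + e2)/sqrt(2)

identity = add(add(p1, p2), p3)
normalized = mu(identity) == 1                       # condition (a)
additive = mu(add(p1, p2)) == mu(p1) + mu(p2)        # condition (b)

# Extension within the context generated by p1, p2, p3.
a = add(add(scale(2, p1), scale(5, p2)), scale(-1, p3))
quasi_linear = mu(a) == 2 * mu(p1) + 5 * mu(p2) - mu(p3)

# p1 and q do not commute, yet mu is additive on them as well.
noncommuting = matmul(p1, q) != matmul(q, p1)
linear_across_contexts = mu(add(p1, q)) == mu(p1) + mu(q)

print(normalized, additive, quasi_linear, noncommuting, linear_across_contexts)
```

The content of Gleason’s theorem is the converse: any *μ* satisfying only conditions (a) and (b′) is already of this form when the dimension is at least 3.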

By the efforts of many people, Gleason’s theorem has been generalized to von Neumann algebras (see Ref. 35 and references therein).

*(Generalized Gleason’s theorem). Let* $N$ *be a von Neumann algebra with no direct summand of type* *I*_{2}*, and let* $\mu: P(N) \to [0,1]$ *be a finitely additive probability measure on the projections of* $N$*. There exists a unique state* *ρ*_{μ} *of* $N$ *such that*

$$\rho_\mu(p) = \mu(p) \quad \text{for all } p \in P(N).$$

Note that here $\rho_\mu: N \to \mathbb{C}$ denotes the state itself (i.e., a positive linear functional of norm 1), while before the state was denoted $\mathrm{tr}[\rho_\mu\, \_\,]: B(H) \to \mathbb{C}$ and *ρ*_{μ} was just the positive trace-class operator (or density matrix). The reason is that the state *ρ*_{μ} need not be normal, and hence, there may be no density matrix. In fact, *ρ*_{μ} is normal if and only if the probability measure *μ* is completely additive.

### B. The probabilistic presheaf

In order to relate Gleason’s theorem (in its generalized form), Theorem 17, more explicitly to contextuality, we consider a certain presheaf that encodes probability assignments to projections. The obvious definition is as follows:

*Let* $N$ *be a von Neumann algebra with context category* $V(N)$*. The probabilistic presheaf* $\underline{\Pi}$ *of* $N$ *over* $V(N)$ *is the presheaf given*

- *on objects: for all* $V \in V(N)$*, let* $\underline{\Pi}_V := \{\mu_V: P(V) \to [0,1] \mid \mu_V \text{ is a finitely additive probability measure}\}$*,*
- *on arrows: for all inclusions* $i_{\tilde{V}V}: \tilde{V} \hookrightarrow V$*, let* $\underline{\Pi}(i_{\tilde{V}V}): \underline{\Pi}_V \to \underline{\Pi}_{\tilde{V}}$*,* $\mu_V \mapsto \mu_V|_{\tilde{V}}$*.*

*Here, the restriction* $\mu_V|_{\tilde{V}}$ *of the function* $\mu_V: P(V) \to [0,1]$ *to* $P(\tilde{V}) \subset P(V)$ *is simply marginalization.*

Note that this is the simplest possible definition of a presheaf built from finitely additive probability measures (FAPMs) on contexts. An element $\mu V\u2208\Pi \u0332V$ is a FAPM for the projections in *V*, so it only assigns probabilities to projections in *V*, not to all projections (unlike the FAPM $\mu :P(N)\u2192[0,1]$ in the generalized Gleason’s theorem (Theorem 17), which assigns probabilities to all projections in $N$).

The probabilistic presheaf $\Pi \u0332$ can be seen as a generalization of the spectral presheaf $\Sigma \u0332$ in the following way: at each context $V\u2208V(N)$, the component $\Sigma \u0332V$ of $\Sigma \u0332$ is the set of pure states $\lambda :V\u2192C$; see Definition 4. In the probabilistic presheaf $\Pi \u0332$, on the other hand, the component $\Pi \u0332V$ is given by finitely additive probability measures $\mu V:P(V)\u2192[0,1]$. The latter are positive linear functionals on *V* of norm 1, i.e., convex linear combinations of elements in $\Sigma \u0332V$. In other words, the elements of $\Pi \u0332V$ correspond with mixed states of *V*, while the elements of $\Sigma \u0332V$ correspond with pure states of *V*, equivalently, extreme points of $\Pi \u0332V$.

What about global sections of the probabilistic presheaf? Prima facie, we do not know whether global sections exist or not, but we now show that every quantum state $\rho: N \to \mathbb{C}$ gives a global section *γ*_{ρ} of $\underline{\Pi}$. Define

$$\gamma_\rho(V) := \rho|_{P(V)} \quad \text{for all } V \in V(N).$$

Here, $\rho|_{P(V)}$ is the restriction of the quantum state to the projections in the context *V*. Since *ρ* is linear, $\rho|_{P(V)}$ is a finitely additive probability measure on the projections in *V*. If a projection *p* is contained in a context *V* and a subcontext $\tilde{V} \subset V$, then

$$\gamma_\rho(V)(p) = \rho(p) = \gamma_\rho(\tilde{V})(p),$$

so *γ*_{ρ} is indeed a global section.
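The compatibility of the components can be checked on a small example. The sketch below assumes $N = M_3(\mathbb{C})$, an illustrative density matrix `rho`, and two different maximal contexts that overlap in the context generated by $p_1 + p_2$ and $p_3$. The function `gamma` plays the role of the component of the global section at the context generated by a list of minimal projections; all names are ours.

```python
from fractions import Fraction as F
from itertools import combinations

def mat(rows):
    """Hashable matrix with exact rational entries."""
    return tuple(tuple(F(x) for x in row) for row in rows)

def add(a, b):
    return tuple(tuple(x + y for x, y in zip(r, s)) for r, s in zip(a, b))

def trace_prod(a, b):
    """tr(ab) for square matrices."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

ZERO = mat([[0] * 3] * 3)

def projections(atoms):
    """All projections in the context generated by mutually orthogonal atoms."""
    result = set()
    for r in range(len(atoms) + 1):
        for subset in combinations(atoms, r):
            total = ZERO
            for p in subset:
                total = add(total, p)
            result.add(total)
    return result

p1 = mat([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
p2 = mat([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
p3 = mat([[0, 0, 0], [0, 0, 0], [0, 0, 1]])
h = F(1, 2)
q1 = mat([[h, h, 0], [h, h, 0], [0, 0, 0]])     # onto (e1 + e2)/sqrt(2)
q2 = mat([[h, -h, 0], [-h, h, 0], [0, 0, 0]])   # onto (e1 - e2)/sqrt(2)

# An assumed (non-diagonal) density matrix: positive with trace 1.
rho = mat([[F(1, 2), F(1, 4), 0], [F(1, 4), F(1, 3), 0], [0, 0, F(1, 6)]])

def gamma(atoms):
    """Restriction of the state to the projections of one context."""
    return {p: trace_prod(rho, p) for p in projections(atoms)}

gamma_V = gamma([p1, p2, p3])    # one maximal context
gamma_W = gamma([q1, q2, p3])    # a different maximal context
shared = projections([add(p1, p2), p3])   # projections in both contexts

consistent = all(gamma_V[p] == gamma_W[p] for p in shared)
print(consistent)  # True
```

Both maximal contexts assign the same probabilities to the projections they share, which is exactly the marginalization condition imposed by the restriction maps.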

Conversely, let *γ* be a global section of the probabilistic presheaf $\Pi \u0332$. In every context $V\u2208V(N)$, we have a FAPM $\gamma (V):P(V)\u2192[0,1]$ on the projections of *V*. The restriction maps $\Pi \u0332(iV\u0303V)$ guarantee that whenever a context $V\u0303$ is contained in another context *V*, a projection *p* is assigned the same probability, no matter whether we regard *p* as a projection in *V* or in $V\u0303$. Hence, a global section *γ* of the probabilistic presheaf $\Pi \u0332$ gives a finitely additive probability measure *μ* on all projections in $N$. By Gleason’s theorem for von Neumann algebras, this determines a unique state *ρ*_{γ} of the algebra $N$, provided $N$ has no type *I*_{2} summand. The latter condition is akin to the condition that the Hilbert space must be at least three-dimensional.

Before we state Gleason’s theorem in its contextual reformulation, we discuss the following slight variation of Definition 5. Note that we may interpret the probability measures $\mu_V \in \underline{\Pi}_V$ as positive operator-valued measures. In fact, by Gelfand duality, every commutative von Neumann algebra $V \in V(N)$ corresponds with an (extremally disconnected) compact Hausdorff space, whose *σ*-algebra of open (and closed) sets corresponds with the projection lattice $P(V)$. Identifying $\mathbb{R}$ with real-valued (1 × 1)-matrices, *μ*_{V} trivially becomes a positive operator-valued measure, and by Naimark’s theorem,^{36} we can find a dilation of the form *μ*_{V} = *v*^{*}*φ*_{V}*v*, where $v: \mathbb{C} \to K$ is a bounded linear map into some Hilbert space $K$ [by scalar multiplication, *v* corresponds to a vector $v \in K$ given by *v* = *v*(1)] and $\phi_V: P(V) \to P(K)$ is an embedding (or spectral measure).^{37} In this reading, we obtain a finitely additive probability measure *μ*_{V} by setting $\phi_V|_{\tilde{V}} = \phi_{\tilde{V}}$ whenever $\tilde{V} \subset V$. It is easy to see that the latter then defines an orthomorphism $\phi: P(N) \to P(K)$, which by Dye’s theorem^{20,38} lifts to a unique Jordan *-homomorphism $\Phi: J(N) \to B(K)$.

Importantly, collections of dilations over contexts $(\mu_V = v^{*}\phi_V v)_{V \in V(N)}$ still correspond with quantum states *ρ* = *v*^{*}Φ*v*. In finite dimensions, the latter is easily recognized as a purification of *ρ*.^{39} In particular, restricting to pure states, we may choose $K = H$ and $\phi: P(H) \to P(H)$ the identity map such that $|v\rangle \in H$ is the pure state corresponding to *ρ*(*p*) = ⟨*v*|*p*|*v*⟩ = tr[|*v*⟩⟨*v*|*p*] for all $p \in P(N)$. As always, mixed states correspond with convex combinations of pure states; in this sense, applying Naimark’s theorem in contexts amounts to a type of intrinsic convexity condition with respect to the set of pure states (for more details, see Refs. 40 and 41). Taking the latter into account, we refine Definition 5 as follows:

*Let* $N$ *be a von Neumann algebra with context category* $V(N)$ *and* $K$ *be a Hilbert space.*

*The dilated probabilistic presheaf* $\underline{\Pi}_D: V(N)^{op} \to \mathrm{Set}$ *of* $N$ *over* $V(N)$ *is the presheaf given*

- *on objects: for all* $V \in V(N)$*, let* $(\underline{\Pi}_D)_V := \{\mu_V: P(V) \to [0,1] \mid \mu_V = v^{*}\phi_V v \text{ for } v \in K,\ \phi_V: P(V) \hookrightarrow P(K), \text{ and } \mu_V(1) = 1\}$*,*
- *on arrows: for all inclusions* $i_{\tilde{V}V}: \tilde{V} \hookrightarrow V$*, let* $\underline{\Pi}_D(i_{\tilde{V}V}): (\underline{\Pi}_D)_V \to (\underline{\Pi}_D)_{\tilde{V}}$*,* $v^{*}\phi_V v = \mu_V \mapsto \mu_{\tilde{V}} = v^{*}\phi_V|_{\tilde{V}} v$*.*

One can consider a presheaf that is closely related to the (dilated) probabilistic presheaf $\Pi \u0332$ $(\Pi \u0332D)$ but has as component at $V\u2208V(N)$ only completely additive probability measures. It is easy to check that if a von Neumann algebra $N$ has no type *I*_{2} summand, there is a bijective correspondence between normal states on $N$ and global sections of the normal (dilated) probabilistic presheaf over $V(N)$. To simplify notation, we will denote the normal (dilated) probabilistic presheaf of $N$ over $V(N)$ also by $\Pi \u0332$ $(\Pi \u0332D)$.

*(Generalized Gleason’s theorem in contextual form). Let* $N$ *be a von Neumann algebra with no direct summand of type* *I*_{2}*. There is a bijective correspondence between quantum states, that is, states on* $N$*, and global sections of the (dilated) probabilistic presheaf* $\Pi \u0332$ $(\Pi \u0332D)$ *over* $V(N)$*.*

This is our reformulation of Gleason’s theorem, which connects it explicitly with contextuality. In contrast to the spectral presheaf, the (dilated) probabilistic presheaf does have global sections and they correspond exactly with quantum states. It is remarkable that the very simple definition of the probabilistic presheaf $\Pi \u0332$, with FAPMs in every context, connected by the obvious restriction maps in the form of marginalization, suffices to guarantee this (for single systems). In Sec. VI, we will see that FAPMs have to be refined to dilations in contexts, as defined in Definition 6, in order to guarantee a similar correspondence also for composite systems.

As usual, the power of the construction lies in the restriction maps (and, of course, Gleason’s theorem). In particular, no further local or global data are needed. In physical terms, there is no need for hidden variables. More importantly, there is no room for hidden variables: as soon as a theory assigns probabilities to all projections in dimension 3 or greater in the obvious way, i.e., finitely additively on orthogonal projections, there exists a quantum state that provides this assignment of probabilities. There are no other (finitely additive) assignments of probabilities to projections apart from those given by quantum states.

Any hidden variables or other extra data could at best give further restrictions. It is worthwhile mentioning the case $\mathcal{N} = \mathcal{B}(\mathcal{H})$ explicitly.

*(Gleason’s theorem in contextual form). Let* $\mathcal{B}(\mathcal{H})$ *be the algebra of all bounded operators on a Hilbert space* $\mathcal{H}$*. If* $\dim \mathcal{H} \geq 3$*, then there is a bijective correspondence between quantum states, that is, states on* $\mathcal{B}(\mathcal{H})$*, and global sections of the (dilated) probabilistic presheaf* $\underline{\Pi}$ $(\underline{\Pi}_D)$ *over* $\mathcal{V}(\mathcal{H})$*.*

The fact that Gleason’s theorem is closely linked with contextuality in this manner was first observed by Döring,^{32} made more explicit by de Groote,^{42} and then in a form very similar to the one above by Döring.^{43} In a different vein, Gleason’s theorem has also been extended to effect algebras,^{44,45} which further allows us to cover the two-dimensional case.

## VI. BELL’S THEOREM AND CONTEXTUALITY

Bell’s seminal paper^{4} responds to a long-standing conjecture by Einstein, Podolsky, and Rosen (EPR),^{46} who claim that quantum theory is only a statistical version of a more fundamental theory, similar to the relation between thermodynamics and statistical mechanics. Besides the probabilistic nature of quantum theory, this idea is motivated by certain nonlocal features present in the quantum formalism, believed to be resolved within the more fundamental theory. As a response to EPR’s famous thought experiment, Bell formalizes EPR’s assumption of an underlying space of hidden variables and derives a constraint for the maximal amount of correlations possible in such theories under the additional assumption of locality.^{4,47} However, some quantum mechanically predicted and experimentally verified correlations^{48–50} do not obey these constraints and, thus, cannot be reproduced by any local hidden variable model.

Pitowsky has pointed out that Bell inequalities are a special case of Boole’s “conditions of possible experience”—consistency constraints on correlations under the assumption of a common underlying measure space.^{51,52} Here, we extend this perspective in several ways. In Sec. VI A, we identify the assumption of a common underlying measure space with the classical case of trivial physical contextuality and stress the role of composition (of measure spaces) as the relevant notion of locality in Bell inequalities. In Sec. VI B, we use Gelfand duality to show that, in the single-context case, composition of measure spaces and observable algebras coincide. Shifting focus to the latter and their order structure imposed by physical contextuality, in Sec. VI C, we generalize the locality constraint inherent to Bell inequalities from one to many contexts by defining the Bell presheaf as a version of the probabilistic presheaf over product contexts. Surprisingly, global sections of the Bell presheaf correspond with quantum states up to a choice of time orientation in subsystems. This is interpreted as a reformulation and extension of Bell’s theorem in contextual form.

### A. Correlations in classical theories

#### 1. Classical state spaces

We first give an account of what we mean by a *classical* theory. For our purposes, it will be enough to consider the kinematics and so we start with a set (soon to be upgraded to an algebra) of observables $\mathcal{O}$. We take as a defining property of a classical theory that all its observables are *simultaneously measurable*; from the perspective of physical contextuality, we are thus considering the trivial case of a single context.^{53,54,78} Observables $a \in \mathcal{O}$ in classical theories are mathematically represented by functions $f_a : \Sigma \to \mathbb{R}$ on some measure space (Σ, *σ*, *ds*). We call Σ the (*single-context*) *state space* of the theory. Every microstate *s* ∈ Σ assigns truth values to propositions of the form “$a \in \Delta$” (read “*the physical quantity* *a* *has a value within the Borel subset* $\Delta \subset \mathbb{R}$”),
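A form of this truth-value assignment that is consistent with the later references to the indicator function $\Theta$ in Eq. (1) is sketched below. The typesetting is a reconstruction from the surrounding text and should be read as an assumption.

```latex
\Theta(\text{``}a \in \Delta\text{''}, s) :=
\begin{cases}
  1 & \text{if } f_a(s) \in \Delta, \\
  0 & \text{otherwise.}
\end{cases}
\tag{1}
```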

We can therefore speak of the *value of an observable* *v*_{s}(*a*), *given the state* *s* ∈ Σ in the intuitive sense, i.e., through evaluation of the corresponding function,
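Read literally, evaluation of the function representing an observable suggests the following form for the valuation referred to as Eq. (2). This is a reconstruction from the surrounding text, stated as an assumption.

```latex
v_s(a) := f_a(s) \qquad \text{for all } a \in \mathcal{O},\ s \in \Sigma.
\tag{2}
```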

The *valuation functions* $v_s : \mathcal{O} \to \mathbb{R}$ in Eq. (2), which we already discussed in connection with the Kochen–Specker theorem in Sec. IV, are defined on all observables; in other words, every observable has an intrinsic (sharp) value in every state. [Note that the spectrum rule, *v*_{s}(*a*) ∈ sp(*a*) = Im(*f*_{a}), is trivially satisfied.] The observation that all observables simultaneously take deterministic values justifies modeling *physical states* by points in some space Σ and observables by functions $f_a : \Sigma \to \mathbb{R}$ in the first place. Of course, this inductive reasoning has to be revisited for non-classical theories, that is, theories with non-trivial physical contextuality. We will do so in Secs. VI B and VI C.

It is natural to equip the set of observables $\mathcal{O}$ with the structure of an algebra.^{55} It is straightforward to extend the definition of valuation functions in Eq. (2) to this extra algebraic structure: for all $a, b \in \mathcal{O}$, $r \in \mathbb{R}$, and *s* ∈ Σ, we set

From this perspective, classical states *s* ∈ Σ correspond with *algebra homomorphisms* $v_s : \mathcal{O} \to \mathbb{R}$. Note that when $\mathcal{O} = \mathcal{N}$ is a commutative von Neumann algebra, Eq. (3) holds as a consequence of the functional composition principle *v*_{s}(*f*(*a*)) = *f*(*v*_{s}(*a*)), in which $f : \mathbb{R} \to \mathbb{R}$ is a continuous function and $a, f(a) \in \mathcal{N}_{\mathrm{sa}}$ are self-adjoint operators (see Sec. IV).
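Since the next paragraph identifies classical states with algebra homomorphisms, the relations referred to as Eq. (3) presumably take the following pointwise form. This is a reconstruction under that assumption, not a quotation.

```latex
v_s(a + b) := v_s(a) + v_s(b), \qquad
v_s(ab) := v_s(a)\, v_s(b), \qquad
v_s(ra) := r\, v_s(a).
\tag{3}
```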

Given two subsystems with measure spaces (Σ_{1}, *σ*_{1}, *ds*_{1}) and (Σ_{2}, *σ*_{2}, *ds*_{2}), the composite state space is defined as the Cartesian product Σ_{1&2} ≔ Σ_{1} × Σ_{2} with product *σ*-algebra *σ*_{1&2} generated by elements *B*_{1} × *B*_{2}, *B*_{1} ∈ *σ*_{1}, *B*_{2} ∈ *σ*_{2} and product measure *ds*_{1&2} ≔ *ds*_{1} × *ds*_{2} satisfying the condition^{56}
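The standard defining condition of a product measure on measurable rectangles is the natural candidate for the condition meant here. It is written below in the notation of this section, as an assumption about the omitted display.

```latex
ds_{1\&2}(B_1 \times B_2) = ds_1(B_1)\, ds_2(B_2)
\qquad \text{for all } B_1 \in \sigma_1,\ B_2 \in \sigma_2.
```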

In a similar way, we obtain composite state spaces with multiple subsystems. If the algebra $\mathcal{O}$ of the composite system is generated from the algebras of its subsystems, i.e., by functions $f_a : \Sigma \to \mathbb{R}^n$ on the composite state space $\Sigma = \times_{i=1}^{n} \Sigma_i$ (see Sec. VI B), then evaluation on elements *s* ∈ Σ again yields algebra homomorphisms similarly to Eq. (3). Hence, we obtain composite valuation functions $v_s : \mathcal{O} \to \mathbb{R}^n$ from the obvious generalization of Eq. (2) to composite observables. Moreover, we obtain a generalization of the truth values in Eq. (1) by considering tuples $a = (a_1, \ldots, a_n) \in \mathcal{O}$ with $a_i \in \mathcal{O}_i$ for *i* ∈ {1, …, *n*} as well as functions $f_a : \Sigma \to \mathbb{R}^n$, $f_a(s) := (f_{a_1}(s_1), \ldots, f_{a_n}(s_n))$ with $s \in \Sigma = \times_{i=1}^{n} \Sigma_i$. The truth value of the proposition “$a \in \Delta$” with the Borel set $\Delta := \times_{i=1}^{n} \Delta_i$ is defined as

#### 2. Factorizability of joint probability distributions

The spectrum rule [Eq. (2)] and the algebraic relations in Eq. (3) are characteristic of pure states. Mixed states are convex mixtures of pure states, i.e., probability measures $\mu : \Sigma \to \mathbb{R}$,

The probability for the event corresponding to the Borel set $\Delta \subset \mathbb{R}$ when measuring the observable $a \in \mathcal{O}$ of a system in the mixed state *μ* is given by^{57}

Note that in the last step we have used the indicator function $\Theta(\text{``}a \in \Delta\text{''}, s)$ in Eq. (1). For instance, the probability for obtaining a particular outcome *A* corresponds to Δ_{A} ≔ {*A*}. Analogously, for joint probability distributions on a bipartite system (represented by the product of measure spaces Σ = Σ_{1} × Σ_{2}), we obtain
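One way to write this probability, consistent with the remark that the last step uses the indicator function of Eq. (1), is the following. The symbol $P_\mu$ and the treatment of *μ* as a density with respect to *ds* are assumptions introduced for this sketch.

```latex
P_\mu(a \in \Delta)
= \int_{f_a^{-1}(\Delta)} ds\, \mu(s)
= \int_{\Sigma} ds\, \mu(s)\, \Theta(\text{``}a \in \Delta\text{''}, s).
```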
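Under the same assumptions (a density *μ* on the product space and the hypothetical symbol $P_\mu$), the bipartite expression referred to as Eq. (6) plausibly reads as follows. The product of indicator functions is what the text below describes as the splitting of supports.

```latex
P_\mu(a_1 \in \Delta_1, a_2 \in \Delta_2)
= \int_{\Sigma_1 \times \Sigma_2} ds_1\, ds_2\; \mu(s_1, s_2)\,
  \Theta(\text{``}a_1 \in \Delta_1\text{''}, s_1)\,
  \Theta(\text{``}a_2 \in \Delta_2\text{''}, s_2).
\tag{6}
```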

This condition on joint probability distributions is called *factorizability*. From it, one derives the Bell inequalities in the usual way.^{4,58}

Note that Eq. (6) expresses merely the splitting of supports of indicator functions $\Theta(\text{``}a \in \Delta\text{''}, s)$. We, thus, conclude that factorizability is a consequence of the following two assumptions:

(a) trivial physical contextuality, i.e., a single common underlying measure space, and

(b) the composite system is given by the Cartesian product of individual measure spaces.

Assumption (b) is the natural way to compose independent (single-context) measure spaces. It encodes the locality condition in the derivation of Bell inequalities, which we identify with local consistency conditions on correlations, interpreted over a common single composite measure space.^{51,59} In turn, the violation of Bell inequalities, thus, becomes a no-go-result for the existence of (a joint probability distribution in) a composite classical state space.
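The two assumptions can be made concrete with a small Python sketch. It uses the CHSH combination as one familiar instance of a Bell inequality, which is a choice made for illustration and is not singled out in the text. Each point of the product state space fixes the values of two ±1-valued observables per subsystem. The quantum comparison uses the textbook singlet correlation, quoted here rather than derived, and all function names are hypothetical.

```python
from itertools import product
from math import cos, pi, sqrt

def chsh(a0, a1, b0, b1):
    """CHSH combination at one point s = (s1, s2) of the product state space."""
    return a0 * b0 + a0 * b1 + a1 * b0 - a1 * b1

# s1 = (a0, a1) fixes both observables of subsystem 1; s2 = (b0, b1) likewise.
single_system_states = product([-1, 1], repeat=2)
deterministic_values = [chsh(a0, a1, b0, b1)
                        for (a0, a1), (b0, b1) in product(single_system_states, repeat=2)]

# A mixed state is a convex mixture of points, so it cannot exceed this maximum.
classical_bound = max(abs(v) for v in deterministic_values)

def singlet_correlation(theta, phi):
    """Textbook singlet-state correlation for spin measurements at angles theta, phi."""
    return -cos(theta - phi)

angles_1 = (0.0, pi / 2)
angles_2 = (pi / 4, -pi / 4)
E = [[singlet_correlation(t, p) for p in angles_2] for t in angles_1]
quantum_value = abs(E[0][0] + E[0][1] + E[1][0] - E[1][1])
```

Here `classical_bound` evaluates to 2, whereas `quantum_value` equals 2√2 up to rounding, so no probability measure on a single product space reproduces these correlations.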

We remark that the above argument is not restricted to outcome deterministic models, and the same conclusion obtains if not all *s* are (experimentally) distinguishable.^{60} Note also that we have not yet specified the algebra of functions. For instance, the above results apply to commutative von Neumann as well as *C*^{*}-algebras (cf. Ref. 61). In Sec. VI B, we extend the constraints on correlations in Eq. (6) by relaxing assumption (a), i.e., by allowing for non-trivial physical contextuality.

### B. Locality and composition of contexts

In Sec. VI A, we have argued that factorizability can be seen as a consequence of the assumption of a common underlying measure space, which by locality decomposes as the product of individual measure spaces. However, by the Kochen–Specker theorem (Theorem 12), this picture is hardly available for quantum systems.^{62,79} In order to impose local “conditions of possible experience” (in the sense of Refs. 51 and 52) also for quantum systems, in this section, we shift perspective from measure spaces to observable algebras and their partial order of contexts. In particular, we define a notion of composition for the latter, which extends the locality constraint in Bell inequalities beyond the classical case, i.e., to systems with non-trivial physical contextuality.

#### 1. Composition of classical state spaces

Recall that in Sec. VI A we defined composition of classical systems in terms of their state spaces, namely, via the product of the corresponding measure spaces. Alternatively, we can define the composite state space in terms of the corresponding algebras of observables.

To this end, we represent observables as measurable functions in a commutative von Neumann algebra $\mathcal{N}$. By Gelfand duality, we may identify $\mathcal{N}$ with a measure space over its Gelfand spectrum $\Sigma(\mathcal{N})$.^{63,80} Recall from Theorem 18 that global sections of the probabilistic presheaf $\underline{\Pi}$ correspond with (mixed) states on $\mathcal{N}$, i.e., $\mathcal{S}(\mathcal{N}) \cong \Gamma(\underline{\Pi}(\mathcal{V}(\mathcal{N})))$. Furthermore, if we define the product of two commutative von Neumann algebras $\mathcal{N}_1, \mathcal{N}_2$ in terms of the (spatial) tensor product $\mathcal{N}_1 \otimes \mathcal{N}_2$,^{64} it follows that

where $\underline{\Pi}$ denotes the normal probabilistic presheaf. It is easy to see that Eq. (7) is not true for noncommutative von Neumann algebras (see also comment below Definition 7). Nevertheless, it holds in every commutative von Neumann subalgebra $V \subset \mathcal{N}$, i.e., in every context $V \in \mathcal{V}(\mathcal{N})$. This suggests extending the locality assumption in Bell inequalities by evaluating it with respect to *all product contexts*.

#### 2. Composite context category

Let $\mathcal{N}_1, \mathcal{N}_2$ be generally noncommutative von Neumann algebras and $\mathcal{V}(\mathcal{N}_1), \mathcal{V}(\mathcal{N}_2)$ be the respective context categories. We define the composite context category by
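In line with the product context category named in the definition of the Bell presheaf, the definition referred to as Eq. (8) can be written as follows. The display is a reconstruction from that later statement.

```latex
\mathcal{V}_{\mathcal{N}_1 \& \mathcal{N}_2}
:= \mathcal{V}(\mathcal{N}_1) \times \mathcal{V}(\mathcal{N}_2),
\tag{8}
```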

where for all $\tilde{V}_1, V_1 \in \mathcal{V}(\mathcal{N}_1)$, $\tilde{V}_2, V_2 \in \mathcal{V}(\mathcal{N}_2)$: $(\tilde{V}_1, \tilde{V}_2) \subset_{1\&2} (V_1, V_2) :\Leftrightarrow (\tilde{V}_1 \subset_1 V_1, \tilde{V}_2 \subset_2 V_2)$. If $\mathcal{N}_1 = \mathcal{B}(\mathcal{H}_1)$, $\mathcal{N}_2 = \mathcal{B}(\mathcal{H}_2)$, we also write $\mathcal{V}_{\mathcal{H}_1 \& \mathcal{H}_2}$ for $\mathcal{V}_{\mathcal{N}_1 \& \mathcal{N}_2}$. Note that, by Eq. (7), we may identify every composite context (*V*_{1}, *V*_{2}) ≔ *V*_{1} ⊗ *V*_{2} with the (spatial) tensor product of the respective commutative von Neumann subalgebras. By comparison with the derivation in Sec. VI A, Eq. (8) is a generalization of the locality constraint in Bell inequalities from one to many contexts.

With $\mathcal{V}_{\mathcal{N}_1 \& \mathcal{N}_2}$ as the base category, we may define the normal probabilistic presheaf $\underline{\Pi}(\mathcal{V}_{\mathcal{N}_1 \& \mathcal{N}_2})$. Each component $\underline{\Pi}_{(V_1, V_2)}$ consists of completely additive probability measures $\mu : \mathcal{P}(V_1 \otimes V_2) \to [0,1]$ together with the obvious restriction maps. More importantly, we define the *Bell presheaf* as the normal dilated probabilistic presheaf $\underline{\Pi}_D(\mathcal{V}_{\mathcal{N}_1 \& \mathcal{N}_2})$ according to Definition 6. This subtle difference plays a crucial role in Lemma 20 below (cf. Refs. 39–41). Physically, it represents the idea that we can think of a system as part of a larger system (e.g., including an environment).

*Let* $\mathcal{N}_1, \mathcal{N}_2$ *be von Neumann algebras with context categories* $\mathcal{V}(\mathcal{N}_1)$*,* $\mathcal{V}(\mathcal{N}_2)$*, respectively. Then, we call the normal dilated probabilistic presheaf* $\underline{\Pi}_D(\mathcal{V}_{\mathcal{N}_1 \& \mathcal{N}_2})$ *over the product context category* $\mathcal{V}_{1\&2} := \mathcal{V}(\mathcal{N}_1) \times \mathcal{V}(\mathcal{N}_2)$ *the* Bell presheaf *of* $\mathcal{N}_1$ *and* $\mathcal{N}_2$*.*

Definition 7 provides a natural setting to study extensions of Bell inequalities, i.e., consistency constraints on correlations in different measure spaces that are related to one another by physical contextuality. Importantly, we point out that $\mathcal{V}_{\mathcal{N}_1 \& \mathcal{N}_2}$ assumes much less structure than the (spatial) tensor product of (noncommutative) von Neumann algebras $\mathcal{N}_1 \otimes \mathcal{N}_2$. In particular, $\mathcal{V}_{\mathcal{N}_1 \& \mathcal{N}_2}$ contains far fewer contexts than $\mathcal{V}(\mathcal{N}_1 \otimes \mathcal{N}_2)$, namely, only product contexts.

### C. The Bell presheaf

For the sake of clarity of argument, we will restrict the present discussion to the case $\mathcal{N}_i = \mathcal{B}(\mathcal{H}_i)$, *i* = 1, 2, with $\dim(\mathcal{H}_i) \geq 3$ finite, and refer to Ref. 39 for the general case. Recall from Gleason’s theorem in contextual form (Theorem 18) that quantum states of the composite system $\mathcal{H}_1 \otimes \mathcal{H}_2$ bijectively correspond with global sections of the probabilistic presheaf $\underline{\Pi}(\mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2))$. Since $\mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2)$ is a much richer poset than $\mathcal{V}_{\mathcal{H}_1 \& \mathcal{H}_2}$, there are many more restriction maps in the (dilated) probabilistic presheaf $\underline{\Pi}_D(\mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2))$ than in the Bell presheaf $\underline{\Pi}(\mathcal{V}_{\mathcal{H}_1 \& \mathcal{H}_2})$. It is easy to see that every global section of $\underline{\Pi}(\mathcal{V}(\mathcal{H}_1 \otimes \mathcal{H}_2))$, that is, every quantum state, induces a global section of $\underline{\Pi}(\mathcal{V}_{\mathcal{H}_1 \& \mathcal{H}_2})$, but it is not clear *a priori* whether the converse holds. The Bell presheaf $\underline{\Pi}(\mathcal{V}_{\mathcal{H}_1 \& \mathcal{H}_2})$ could potentially have many more global sections than those corresponding with quantum states. Remarkably, this is not the case.^{65,81,82} In order to see this, the following lemma is crucial; for details, we refer to Refs. 39 and 41.

*Let* $\mathcal{H}_i$ *be Hilbert spaces with* $\dim(\mathcal{H}_i) \geq 3$ *finite,* $\mathcal{B}(\mathcal{H}_i)$ *be the algebras of physical quantities, and* $\mathcal{V}(\mathcal{H}_i)$ *be the corresponding context categories. Then, for every global section* $\gamma \in \Gamma(\underline{\Pi}(\mathcal{V}_{\mathcal{H}_1 \& \mathcal{H}_2}))$ *of the Bell presheaf in Definition 7, there exists a unique linear map* $\varphi^\gamma : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$*. Moreover, there exists a Hilbert space* $\mathcal{K}$*, a linear map* $v : \mathcal{H}_2 \to \mathcal{K}$*, and a Jordan* ^{*}*-homomorphism* $\Phi^\gamma : \mathcal{J}(\mathcal{H}_1) \to \mathcal{J}(\mathcal{K})$ *such that*
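By analogy with the Stinespring form quoted in the discussion that follows, the relation closing the lemma is presumably the dilation identity below. It is given here as a reconstruction, with the Jordan ^{*}-homomorphism taking the place of the ^{*}-homomorphism.

```latex
\varphi^\gamma = v^* \Phi^\gamma v .
```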

The map $\varphi^\gamma$ in Lemma 20 is a decomposable map.^{66} In contrast, by Choi’s theorem,^{67} every density matrix on the composite system $\mathcal{H}_1 \otimes \mathcal{H}_2$ corresponds with a completely positive, trace-preserving map $\varphi : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$, and by Stinespring’s theorem,^{68} every such completely positive map $\varphi$ is of the form $\varphi = v^* \Phi v$ with $v : \mathcal{H}_2 \to \mathcal{K}$ a linear map and $\Phi : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{K})$ a ^{*}-homomorphism. It follows that a global section of the Bell presheaf $\gamma \in \Gamma(\underline{\Pi}(\mathcal{V}_{\mathcal{N}_1 \& \mathcal{N}_2}))$ corresponds with a quantum state if and only if the Jordan ^{*}-homomorphism in Lemma 20 lifts to a ^{*}-homomorphism.
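The gap between decomposable and completely positive maps can be illustrated numerically with the partial transpose of a maximally entangled state on two three-dimensional systems. The choice of example and all names in the Python sketch below are assumptions made for illustration. The partially transposed operator has unit trace and assigns non-negative values to product vectors, hence to all product projections, yet it is not a positive operator and therefore not a density matrix.

```python
import random

d = 3  # local dimension of both subsystems

def rho(i, j, k, l):
    """Matrix element <ij|rho|kl> of the maximally entangled state sum_i |ii>/sqrt(d)."""
    return 1.0 / d if (i == j and k == l) else 0.0

def rho_pt(i, j, k, l):
    """Partial transpose on subsystem 1: exchange the first-system indices i and k."""
    return rho(k, j, i, l)

def expectation(op, psi):
    """Real part of <psi|op|psi> for a vector psi given as {(i, j): amplitude}."""
    total = sum(psi[r].conjugate() * op(r[0], r[1], c[0], c[1]) * psi[c]
                for r in psi for c in psi)
    return complex(total).real

def random_product_vector(rng):
    """Unnormalized product vector x (tensor) y with random complex amplitudes."""
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
    y = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
    return {(i, j): x[i] * y[j] for i in range(d) for j in range(d)}

rng = random.Random(0)
product_values = [expectation(rho_pt, random_product_vector(rng)) for _ in range(100)]

# Antisymmetric (entangled) vector (|01> - |10>)/sqrt(2) witnesses the negativity.
antisymmetric = {(0, 1): 2 ** -0.5, (1, 0): -(2 ** -0.5)}
entangled_value = expectation(rho_pt, antisymmetric)
trace = sum(rho_pt(i, j, i, j) for i in range(d) for j in range(d))
```

On product vectors the value works out to |⟨x|y⟩|²/d, which is non-negative, while `entangled_value` is −1/3. Probability assignments of this kind are what the time orientations introduced below are meant to separate from genuine quantum states.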

Not every Jordan ^{*}-homomorphism is also a ^{*}-homomorphism. It turns out that for the special case of $\mathcal{N} = \mathcal{B}(\mathcal{H})$, there are exactly two ways to lift a Jordan algebra $\mathcal{J}(\mathcal{H}) = (\mathcal{B}(\mathcal{H}), \circ)$ to a von Neumann algebra: by augmenting the symmetric product (anticommutator) to an associative product $a \cdot_\pm b = \frac{1}{2}\{a, b\} \pm \frac{1}{2}[a, b]$. Moreover, for every Jordan ^{*}-homomorphism $\Phi : \mathcal{J}(\mathcal{H}_1) \to \mathcal{J}(\mathcal{K})$, it holds that Φ([*a*, *b*]) = ±[Φ(*a*), Φ(*b*)], whereas Φ is a ^{*}-homomorphism only if it also preserves the commutator, Φ([*a*, *b*]) = [Φ(*a*), Φ(*b*)] for all $a, b \in \mathcal{B}(\mathcal{H}_1)$ (cf. Ref. 69). The difference between the two associative algebras with products ·_{±} has a clear physical interpretation in terms of the direction of time. This can be made precise in the form of time orientations on the context category. (For more details, we refer to Refs. 41 and 70 and, specifically, Refs. 21 and 22.)
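The two associative products and the sign ambiguity for the commutator can be checked on small matrices. The Python sketch below uses exact integer 2×2 matrices and matrix transposition as a standard example of a map that preserves the Jordan product but reverses the commutator. The helper names are hypothetical, and the specific matrices are arbitrary.

```python
from fractions import Fraction

def mul(a, b):
    """Matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lin(x, a, y, b):
    """Entrywise linear combination x*a + y*b."""
    return [[x * p + y * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def anticommutator(a, b):
    return lin(1, mul(a, b), 1, mul(b, a))

def commutator(a, b):
    return lin(1, mul(a, b), -1, mul(b, a))

def product_pm(a, b, sign):
    """a ._(+/-) b = (1/2){a, b} +/- (1/2)[a, b]."""
    half = Fraction(1, 2)
    return lin(half, anticommutator(a, b), sign * half, commutator(a, b))

def transpose(a):
    return [list(row) for row in zip(*a)]

a = [[1, 2], [3, 4]]
b = [[0, 1], [2, 5]]

plus_is_ab = product_pm(a, b, +1) == mul(a, b)    # the "+" product is ab
minus_is_ba = product_pm(a, b, -1) == mul(b, a)   # the "-" product is ba
jordan_preserved = (transpose(anticommutator(a, b))
                    == anticommutator(transpose(a), transpose(b)))
commutator_flipped = (transpose(commutator(a, b))
                      == lin(-1, commutator(transpose(a), transpose(b)), 0, a))
```

All four flags evaluate to `True`: the products ·_{+} and ·_{−} are the ordinary product and its opposite, and transposition realizes the minus sign in Φ([*a*, *b*]) = ±[Φ(*a*), Φ(*b*)].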

*Let* $\mathcal{H}$ *be a Hilbert space,* $\mathcal{B}(\mathcal{H})$ *be the algebra of physical quantities, and* $\mathcal{V}(\mathcal{H})$ *be the corresponding context category. The canonical time orientation on the context category is the map into the automorphisms on* $\mathcal{V}(\mathcal{H})$*, denoted by* $\mathrm{Aut}(\mathcal{V}(\mathcal{H}))$*,*
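The surrounding text describes Eq. (9) as a canonical evolution with time parameter *t* that acts by automorphisms on contexts and is generated by exponentiation. A form compatible with that description is sketched below. The domain $\mathbb{R} \times \mathcal{B}(\mathcal{H})_{\mathrm{sa}}$ and the conjugation by $e^{ita}$ are assumptions and not a verbatim reproduction.

```latex
\Psi : \mathbb{R} \times \mathcal{B}(\mathcal{H})_{\mathrm{sa}} \longrightarrow \mathrm{Aut}(\mathcal{V}(\mathcal{H})),
\qquad
\Psi(t, a)(V) := e^{ita}\, V\, e^{-ita}.
\tag{9}
```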

*The context category* $\mathcal{V}(\mathcal{H})$ *together with the canonical time orientation* Ψ *on it is called the time-oriented context category and denoted* $\widetilde{\mathcal{V}(\mathcal{H})} = (\mathcal{V}(\mathcal{H}), \Psi) \cong \mathcal{B}(\mathcal{H})$*.*

By Theorem 8, every order automorphism on $\mathcal{V}(\mathcal{H})$ corresponds to conjugation by a unitary or anti-unitary operator. Recall that every anti-unitary can be decomposed into the time-reversal operator and a unitary operator, where the time-reversal operator causes a sign change of the time parameter $t \in \mathbb{R}$ in the canonical evolution in Eq. (9).

Moreover, by differentiation of Eq. (9), time-reversal corresponds with a change in sign for the commutator in the two associative algebras with products ·_{±}, whose common Jordan algebra is $\mathcal{J}(\mathcal{H})$. In line with Definition 8, we denote by Ψ^{*} the reverse time orientation on $\mathcal{V}(\mathcal{H})$, where Ψ^{*} is given by the inversion *t* ↦ −*t* in Eq. (9), i.e., Ψ^{*}(*t*, *a*) = Ψ(−*t*, *a*).^{71,83} Succinctly, time orientations encode the forward time direction in a quantum system.

We remark that for general Jordan algebras there are more ways to lift them to von Neumann algebras and, thus, also more possible time orientations on $\mathcal{N}$.^{72}

Next, we introduce a notion of *time-oriented global sections* with respect to the relative time orientation $\Psi = (\Psi_1^*, \Psi_2)$ between systems $(\mathcal{V}(\mathcal{H}_1), \Psi_1^*)$ and $(\mathcal{V}(\mathcal{H}_2), \Psi_2)$. We remark that this relative orientation is implicit in Choi’s theorem^{67} (for details, see Ref. 70).
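The differentiation step can be spelled out as follows, assuming that the canonical evolution acts on an operator *b* by conjugation with $e^{ita}$. That form is an assumption about Eq. (9) and is not quoted from it. The second line records that the commutator of the product ·_{±} is ±[*a*, *b*], which follows directly from its definition.

```latex
\frac{d}{dt}\Big|_{t=0} e^{ita}\, b\, e^{-ita} = i[a, b],
\qquad
\frac{d}{dt}\Big|_{t=0} e^{-ita}\, b\, e^{ita} = -i[a, b],
\\[4pt]
a \cdot_\pm b - b \cdot_\pm a
= \tfrac{1}{2}\bigl(\{a, b\} - \{b, a\}\bigr) \pm \tfrac{1}{2}\bigl([a, b] - [b, a]\bigr)
= \pm [a, b].
```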

*Let* $\mathcal{H}_i$*,* *i* = 1, 2, *be Hilbert spaces,* $\mathcal{B}(\mathcal{H}_i)$ *be the algebras of physical quantities on either subsystem, and* $\widetilde{\mathcal{V}(\mathcal{H}_i)}$ *be the corresponding context categories with respective canonical time orientations* Ψ_{i}*. A global section of the Bell presheaf* $\gamma \in \Gamma(\underline{\Pi}(\mathcal{V}_{1\&2}))$ *is called time-oriented with respect to* $\Psi = (\Psi_1^*, \Psi_2)$ *if*

*where* Φ^{γ} *is the Jordan* ^{*}*-homomorphism in Lemma 20 and* $\Psi_2'$ *is the unique time-orientation on* $\mathcal{B}(\mathcal{K})$ *induced by the canonical time orientation* Ψ_{2} *on* $\mathcal{B}(\mathcal{H}_2)$*. We denote by* $\Gamma(\underline{\Pi}(\widetilde{\mathcal{V}(\mathcal{H}_1)}^* \times \widetilde{\mathcal{V}(\mathcal{H}_2)}))$ *the set of time-oriented global sections with respect to* $\Psi = (\Psi_1^*, \Psi_2)$*.*

Combining the above, we arrive at a contextual reformulation and extension of Bell’s theorem. For the proof and more details, see Refs. 39 and 41.

*(Bell’s theorem in contextual form). Let* $\mathcal{H}_i$*,* *i* = 1, 2, *be Hilbert spaces with* $\dim(\mathcal{H}_i) \geq 3$ *finite,* $\mathcal{B}(\mathcal{H}_i)$ *be the algebra of physical quantities,* $\widetilde{\mathcal{V}(\mathcal{H}_i)}$ *be the corresponding context categories with respective canonical time orientations* Ψ_{i}*, and* $\mathcal{S}(\mathcal{H}_i) \cong \Gamma(\underline{\Pi}(\mathcal{V}(\mathcal{B}(\mathcal{H}_i))))$ *be the corresponding state spaces. Then, the state space of the composite system is given by*
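Using the notation introduced above for the set of time-oriented global sections, the identification asserted here presumably takes the form below. The symbol $\mathcal{S}(\mathcal{H}_1 \otimes \mathcal{H}_2)$ for the composite state space is an assumption made for this reconstruction.

```latex
\mathcal{S}(\mathcal{H}_1 \otimes \mathcal{H}_2)
\cong
\Gamma\bigl(\underline{\Pi}\bigl(\widetilde{\mathcal{V}(\mathcal{H}_1)}^{*} \times \widetilde{\mathcal{V}(\mathcal{H}_2)}\bigr)\bigr).
```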

*Moreover, for commutative von Neumann algebras* $\mathcal{N}_1, \mathcal{N}_2$*,*

Theorem 21 evaluates the constraints on correlations bound by physical contextuality and locality in the form of context composition. It bounds classical correlations by Bell inequalities [cf. Eq. (6) in the commutative (single-context) case as a consequence of Eq. (7)]. More generally, Theorem 21 classifies correlations bound by the natural generalization of Bell inequalities from one to many contexts and their order relations given by inclusion (coarse-graining). It is in this sense that Theorem 21 is a reformulation and extension of Bell’s theorem in contextual form. Remarkably, the resulting correlations already correspond with quantum states up to the relative time orientation between local subsystems in Definition 9. The latter is a genuine quantum phenomenon, which is intimately related to entanglement.^{73}

The proof of Theorem 21 as presented here is given in Refs. 40 and 41, together with a thorough discussion on the relation between contextuality and no-signaling. The generalization to general von Neumann algebras [replace $\mathcal{B}(\mathcal{H}_i)$ with $\mathcal{N}_i$ in Theorem 21] is proved in Ref. 39, where it is discussed in reference to Gleason’s theorem and as a generalization thereof to composite systems. We remark that Theorem 21 provides a (first step towards a) notion of composition in the topos approach to quantum theory.

Finally, we note that we have defined locality in the form of composition of local observable algebras in Eq. (8). This is somewhat reminiscent of the construction of the local observable algebra in algebraic quantum field theory^{74} and motivates a further generalization of Theorem 21 to multiple systems. We leave this and similar problems for future research.

## VII. CONCLUSION AND OUTLOOK

In this article, we have shown that important structural theorems of quantum theory—Wigner’s theorem, Gleason’s theorem, the Kochen–Specker theorem, and Bell’s theorem—fundamentally relate to contextuality.

Wigner’s theorem can be rephrased in terms of automorphisms on the partially ordered set of contexts, that is, maps preserving the context order, which are implemented by conjugation with unitary or anti-unitary operators. Hence, instead of demanding transition probabilities between pure states to be preserved, one can equivalently demand the order on contexts to be preserved instead.

The Kochen–Specker theorem is equivalent to the fact that the spectral presheaf has no global sections: given a “local” pure state in every context (each such state assigns sharp values to all observables in its context), there is no way of fitting these together in a consistent way. In other words, there are no dispersion-free quantum states. The “fitting together” here refers to the non-contextuality condition asserting that if an observable is contained in different contexts, then the value assigned to it by the different pure states must be the same.

Gleason’s theorem answers a similar local-to-global problem, yet instead of valuation functions, it considers measures on the physical quantities in a quantum system. From the perspective of presheaves, this is easily achieved by replacing pure states with mixed states *locally*, that is, in every context, and by extending the restriction maps (from the spectral presheaf) to all probability measures, which, thus, become marginalization constraints. Again, one asks for global sections, that is, probability assignments that are consistent *globally* or across contexts. In contrast to the spectral presheaf, global sections do exist in the case of the probabilistic presheaf and by Gleason’s theorem bijectively correspond with quantum states (density matrices in finite dimensions). Gleason’s theorem, therefore, lifts quasi-linearity of probability measures in contexts to linearity on states.

Finally, Bell’s theorem attains a reformulation and substantial extension in terms of (non-trivial) contextuality. Bell inequalities naturally arise as constraints on correlations over a common underlying measure space, where locality demands that it is of composite form, i.e., a product of individual measure spaces. From this perspective, Bell inequalities are restricted to individual contexts only. Our contextual reformulation of Bell’s theorem extends Bell inequalities to many contexts. Locality enters as a notion of composition for contexts, and correlations satisfying the collection of all Bell inequalities over the composite context category become global sections of the Bell presheaf. Surprisingly, the latter correspond with quantum states up to a choice of time orientation in subsystems.

## ACKNOWLEDGMENTS

This work was supported through a studentship in the Center for Doctoral Training on Controlled Quantum Dynamics at Imperial College funded by the EPSRC.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

## REFERENCES

We use “physical quantity” synonymously with “observable.”

See also Ref. 75 about Grete Hermann’s lucid early contribution in interpreting Bohr’s notion of complementarity and her influence on her contemporaries and the subject in general.

^{*}-Products

Strictly speaking, because $\underline{1}$ is a contravariant functor, we encode the opposite order, hence the poset $\mathcal{V}(\mathcal{N})^{\mathrm{op}}$, but this gives exactly the same amount of information as $\mathcal{V}(\mathcal{N})$.

It is less clear which positive statement can be given and whether “contextual value assignments” exist or would even make sense.

The Gelfand spectrum Σ(*V*) is equipped with the *Gelfand topology*, which is the topology of pointwise convergence.

The type *I*_{1} factors in the factor decomposition of $\mathcal{N}$ correspond with its commutative part. In general, commutative von Neumann algebras admit many valuation functions.^{32,76,77}

There is an obvious notion of countable additivity: *complete additivity* reduces to countable additivity for a separable and to finite additivity for a finite-dimensional Hilbert space.

Note that we dropped the dependence on contexts for the Hilbert space $\mathcal{K}$ and the linear map $v : \mathbb{C} \to \mathcal{K}$. This can be done by choosing $\mathcal{K}$ sufficiently large,^{36,68} and by absorbing the context dependence on *v* into $\phi_V$. Alternatively, one can impose constraints under restriction as in Definition 6 for both $v_V$ and $\phi_V$ individually.

^{*}-algebras and time evolution of quantum systems

We are not considering additional non-commuting operators, such as in the Koopman–von Neumann formalism for classical mechanics (see Ref. 78, for instance).

We include the “trivial” observable $e \in \mathcal{O}$ represented by the constant function *f*_{e} = 1. This observable simply asks the question “Is the system there?” and the answer is always “yes.” When $\mathcal{O} = \mathcal{N}$ is a commutative von Neumann algebra, *e* is the identity.

A product measure always exists. It is unique if the individual measures are *σ*-finite, which is the case for commutative von Neumann algebras on separable Hilbert spaces.

In particular, it also holds if no dispersion-free states (truth values) exist; note, e.g., that commutative von Neumann algebras $\mathcal{N}_1, \mathcal{N}_2$ admit dispersion-free states if and only if they possess a minimal projection.^{76,77}

For a related discussion on the connection between contextuality and Bell’s theorem, see, e.g., Ref. 79. We remind the reader that in this treatment we exclude the two-dimensional case.

For technical details on the Gelfand duality for commutative von Neumann algebras, see Ref. 80.

In this case, every normal product state extends to a unique normal state on the composite system, and the normal state space of the composite system is generated from normal product states (Propositions 11.27 and 11.28 in Ref. 13).

The canonical time orientation $\Psi = \exp \circ\, \psi$ in Definition 8 is the exponential of the (canonical) dynamical correspondence on $\mathcal{B}(\mathcal{H})$ (cf. Ref. 83). The reverse time orientation is $\Psi^* = {}^* \circ \exp \circ\, \psi$.

The notion of time-oriented presheaves over the context category was introduced in Ref. 21.