We present a method for assigning probabilities to the solutions of initial value problems that have a non-Lipschitz singularity. To illustrate the method, we focus on the following toy example: $\frac{d^2 r}{dt^2} = r^{\,p}$, $r(0) = 0$, and $\frac{dr}{dt}(0) = 0$, with $p \in (0, 1)$. This example has a physical interpretation as a mass in a uniform gravitational field on a frictionless, rigid dome of a particular shape; the case with $p = 1/2$ is known as Norton’s dome. Our approach is based on (1) finite difference equations, which are deterministic; (2) elementary techniques from alpha-theory, a simplified framework for non-standard analysis that allows us to study infinitesimal perturbations; and (3) a uniform prior on the canonical phase space. Our deterministic, hyperfinite grid model allows us to assign probabilities to the solutions of the initial value problem in the original, indeterministic model.
Some simple mechanical systems are characterized by indeterministic initial value problems. One such example is “Norton’s dome”: a mass at rest on a hill of a specific shape in a gravitational field may stay at rest or start sliding off at an arbitrary time. Around a decade ago, this example caught the attention of philosophers of physics, but similar examples were already discussed by mathematicians and physicists in the nineteenth century. We present a numerical simulation study of a class of such mechanical systems. Our approach is to discretize the time variable and to apply results from a branch of mathematics called “non-standard analysis,” which allows us to work with infinitesimals in the strict sense (i.e., numbers larger than zero but smaller than $1/n$ for every natural number $n$). This approach also allows us to assign probabilities to the solutions of the indeterministic Cauchy problem, without introducing any non-infinitesimal perturbations (which would defeat the purpose). The methodology may be applicable to study more realistic problems, such as turbulent flows, shock waves, and $N$-body collisions.
The nineteenth-century mathematician and physicist Poisson was the first to search for a mechanical interpretation of indeterministic Cauchy problems.1,2 Later that same century, Boussinesq gave a gravitational interpretation of a broad class of such indeterministic Cauchy problems by considering a mass placed at rest at the apex of a frictionless surface from a particular family of hill shapes.3 This work seems to have been largely forgotten, but we want to alert physicists and applied mathematicians to a recent revival of this issue in the philosophical literature: this question was raised again by a contemporary philosopher of science, Norton,4,5 who focused on a particularly simple case, now often referred to as Norton’s dome. Malament generalized Norton’s example to a family of problems that we will call Malament’s mounds (presented in Sec. II).6 These examples involve initial value problems with a differential equation that exhibits a non-Lipschitz singularity. Such non-Lipschitz Cauchy problems are prevalent in the context of physical applications, such as turbulent flows and associated dispersion,7 shock waves,8 and collisions in Newtonian $N$-body problems.9 They are also of interest for the foundations of physics, as a case study in determinism and causality, and may be suitable for didactic purposes, to illustrate the role of uniqueness conditions in Newtonian mechanics. In the context of fluid turbulence, this form of indeterminism has been called “classical spontaneous stochasticity,” and it has been proposed that the phenomenon has a quantum-mechanical analog (see Ref. 10, where the connection to Norton’s dome is also mentioned). The main goal of the present paper is to demonstrate a method for assigning probabilities to trajectories that are solutions to non-Lipschitz Cauchy problems. To demonstrate the method, we focus on the toy problem of Malament’s mounds throughout.
Shortly before Norton’s work,4,5 probabilistic approaches had already been applied to non-Lipschitz Cauchy problems;11,12 see, more recently, also Refs. 13–15. (We are grateful to an anonymous referee for pointing us to these papers.) Indeterministic theories can be supplemented by hidden variables to arrive at deterministic theories, which are empirically equivalent and which can be used to assign probabilities to the former (see, e.g., Refs. 16 and 17). This is the approach we take in this paper, where we let the hidden variables take on infinitesimal values (in the sense of non-standard analysis). Some physicists presuppose the existence of one unique solution (obtainable, e.g., via physical regularization). However, there are genuine cases of indeterminism, where multiple solutions obtain with various probabilities; therefore, in general, it cannot be taken for granted at the outset that a unique solution dominates. Our method does not start from any a priori assumption of a unique solution, although it turns out that probability one should be assigned to a single solution in our case study.
To be clear, our aim is to analyze a class of initial value problems, not any natural phenomena (at least not directly). We start from toy problems, which have inherited the usual idealizations and limits to applicability native to classical physics (e.g., point masses and perfectly frictionless, rigid surfaces). Hence, we are not focusing on when the description is a useful one, and the discussion cannot be settled by direct comparison to experiments.
Our approach relies on an elementary application of non-standard analysis, a branch of mathematics first developed by Robinson18,19 as an alternative framework for differential and integral calculus. The name contrasts with “standard” analysis in terms of the standard real numbers and associated concepts, such as the standard limit (introduced in terms of an epsilon–delta definition) and the derivative and integral defined in terms of this limit. Alternatively, non-standard analysis extends the set of real numbers with infinitely large numbers and infinitesimals, which obey the same algebraic rules as the standard reals. Infinitely large numbers are numbers larger than $n$ for all natural numbers $n$; non-zero infinitesimals are their multiplicative inverses. The theory uses notions from model theory (a branch of mathematical logic) and is built upon so-called non-standard models of real-closed fields. To keep our paper self-contained yet accessible to non-logicians, we use the framework of alpha-theory, which we introduce in Sec. IV.
Non-standard analysis is often used to obtain results about the real numbers, and it contains a theorem that guarantees that the results will agree with those of standard analysis. However, non-standard results can also be studied in their own right, without the end-goal of obtaining a result in terms of standard reals. Moreover, non-standard analysis captures some of the guiding intuitions from the historical development of the calculus, in particular, the ideas of Leibniz, some of which were lost during the development of standard analysis in the nineteenth century. (For a brief introduction to the history, see, e.g., Ref. 20.) The idea of infinitesimals in the context of calculus and analysis was long believed to be irreparably confused or intrinsically paradoxical, but this assumption was proven wrong by the work of Robinson in the 1960s19 and later developments. Finally, non-standard analysis is close to physical praxis and didactics, which in some regards stay close to the Leibnizian appeal to infinitesimals. Indeed, non-standard techniques have been applied to physics in a variety of applications, including Brownian motion, perturbation theory for differential equations, etc.21 Many of these applications involve a discrete model with infinitesimal steps of quantities that are taken to be continuous in the standard model. In particular, we will consider difference equations on discrete grids with infinitesimal time steps.
Since our method relies on non-standard analysis, it can yield genuinely new results only as long as the results are presented in terms of hyperreal numbers. Once the results are interpreted in the context of standard real numbers, the method cannot yield anything that cannot be obtained via methods of standard analysis. However, even in that case, it may still be relevant, since the non-standard approach may be shorter, easier to obtain, or more instructive than the standard one. Here, we argue that the non-standard approach suggests a way of assigning probabilities to the standard solutions of indeterministic initial value problems. Perhaps this will inspire future work that achieves similar results without having to introduce non-standard methods.
Our paper is structured as follows. Section II reviews the shape, initial value problem, and standard solutions for Malament’s mounds. In Sec. III, we refine our research questions and specify our working hypothesis. In Sec. IV, we introduce concepts from alpha-theory, a simplified approach to non-standard analysis. In Sec. V, we apply this to build an alternative model for Malament’s mounds, in which we can consider infinitesimal perturbations. This approach allows us to assign probabilities to the standard solutions in Sec. VI. We offer some discussion and review our main conclusions in Sec. VII.
II. MALAMENT’S MOUNDS
Norton’s problem represents a mass placed with zero velocity at the apex of a particular dome in a uniform gravitational field. The shape of the dome is chosen such that Newton’s second law applied to the mass takes on a particularly simple form, as we will see below. Malament generalized Norton’s dome to the following family of shapes,6 which yields a family of indeterministic Cauchy problems:
where $p$ is any real number in $(0, 1)$, $x$ is the horizontal axis (orthogonal to the gravitational field), $y$ is the vertical height (anti-parallel to the gravitational field), and the apex is at the origin, $(0, 0)$. See Fig. 1 for five examples of hill shapes. Observe that the above expression becomes undefined beyond a maximal coordinate value; therefore, the mounds have a finite maximal height and a finite unilateral width.
Define $r$ as the arc distance measured along the dome from the apex. Then, we find
Expressing the height $y$ as a function of the arc length $r$ measured from the apex yields
$$y(r) = -\frac{1}{(1+p)\,g}\, r^{\,p+1}.$$
We assume that the gravitational field is constant with magnitude $g$ and that a unit mass moves on a frictionless hill of the specified family. Then, Newton’s second law yields a second-order, non-linear ordinary differential equation (ODE) involving a non-Lipschitz continuous function. For each choice of $p$, the Cauchy problem for the corresponding Malament mound is given by
$$\frac{d^2 r}{dt^2}(t) = \left(r(t)\right)^{p}, \qquad r(0) = 0, \qquad \frac{dr}{dt}(0) = 0. \qquad (1)$$
This problem is the second-order analog of a textbook example commonly used to illustrate a failure of Lipschitz continuity. As is well known, the solution of such problems is non-unique. Besides the trivial, singular solution, $r(t) \equiv 0$, there is a one-parameter family of regular solutions (see, e.g., Theorem 2 in Ref. 22, due to Kneser23), which can be represented geometrically as a Peano broom (see Fig. 2),
$$r(t) = \begin{cases} 0 & t \leq T, \\[2pt] \left[\dfrac{(1-p)^2}{2(1+p)}\right]^{\frac{1}{1-p}} (t - T)^{\frac{2}{1-p}} & t \geq T, \end{cases} \qquad (2)$$
where $T$ is a positive real number, which can be interpreted as the time at which the mass starts sliding off the hill. The solution can be verified by substitution into (1).
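The power-law form of the regular solutions can be checked by direct substitution. The following sketch (using $r$ for the arc length, $p$ for the shape parameter, and $T$ for the release time, as assumed throughout this rewrite) derives the exponent and coefficient from the equation of motion $\ddot{r} = r^{\,p}$:

```latex
% Ansatz: r(t) = A\,(t-T)^q for t \ge T, and r(t) = 0 otherwise.
\ddot{r} = A\,q(q-1)\,(t-T)^{q-2},
\qquad
r^{\,p} = A^{p}\,(t-T)^{qp}.
% Matching exponents and coefficients in \ddot{r} = r^{\,p}:
q - 2 = qp \;\Rightarrow\; q = \frac{2}{1-p},
\qquad
A^{1-p} = \frac{1}{q(q-1)} \;\Rightarrow\;
A = \left[\frac{(1-p)^{2}}{2(1+p)}\right]^{\frac{1}{1-p}}.
% Norton's dome (p = 1/2): q = 4 and A = 1/144,
% i.e., r(t) = (t-T)^4/144 for t \ge T.
```

For $p = 1/2$, this reproduces the familiar quartic solutions of Norton’s dome.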
In the three-dimensional case, there is an additional continuum of possibilities regarding the direction of descent. Throughout this paper, we limit ourselves to the two-dimensional case (as depicted in Fig. 1) such that this indeterminacy is reduced to two possible directions.
III. RESEARCH QUESTIONS AND WORKING HYPOTHESIS
Faced with indeterminism due to a lack of Lipschitz continuity, some authors search for arguments that single out a unique solution, e.g., by regularization (smoothing the system close to the singularity) or by adding physical principles or heuristics not encoded in the Cauchy problem itself. The assumption that there is one correct solution (motivated by additional physical constraints besides the mathematical equation) is widely—though perhaps not univocally—held in the field of fluid dynamics. For instance, the authors of Ref. 24 aim to regularize the solutions of Cauchy problems with non-Lipschitz indeterminism to select a unique global solution.
Our current approach is slightly different: lacking a unique solution, we look for a probabilistic description for the trajectory of the mass. This approach may be alien to Newtonian physics (the context in which the problems of Sec. II arose), but it is accepted in classical physics more generally (i.e., in statistical physics). Hence, we start with the following research question:
Given that there are multiple solutions to Cauchy problem (1), is there a well-supported way to assign probabilities to them?
This research question leads us to two more specific questions:
Can we assign relative probabilities to the singular solution vs the family of regular solutions?
Can we assign relative probabilities to the various regular solutions (regarding $T$ and the direction of movement)?
Since our questions are aimed at finding probabilities, the usual approach of physical regularization is of no avail here (but see Sec. VII A). Alternatively, we aim to represent the trajectories on Malament’s mounds using a discrete, deterministic model from non-standard analysis that can help us to measure the sought probabilities.
IV. ALPHA-THEORY, α-LIMITS, AND HYPERFINITE GRID DIFFERENTIAL EQUATIONS
In this section, we first introduce a simplified framework for non-standard analysis: Alpha-theory, which was developed by Benci and Di Nasso.25 We then show how their notion of hyperfinite grid differential equations is used to find all solutions to indeterministic Cauchy problems and how to assign probabilities to them.
Alpha-theory defines a new, ideal number, $\alpha$, which can be thought of as an infinite number (larger than every natural number) that captures the rate of divergence of the linear sequence $(1, 2, 3, \ldots)$. The theory also defines a new type of limit, the $\alpha$-limit, which can be thought of as the value a sequence would take if it were extended all the way to position $\alpha$. This limit is not to be confused with the standard limit operation: except for the special case of constant sequences of real numbers, they do not agree. (See the end of Sec. IV A for some examples.) Moreover, the $\alpha$-limit is broader in scope: it applies not only to sequences of real numbers but to sequences of any type of objects, including sets and hyperreal numbers. We choose alpha-theory because it suffices to introduce the notion of hyperfinite grid functions and associated differential equations.25,26 These hyperfinite grid differential equations behave much like finite difference equations except that the number of steps is infinite and each step is infinitesimal.
Alpha-theory assumes most of standard mathematics (more specifically, it is based on Zermelo–Fraenkel set theory with the axiom of choice, but without the axiom of regularity) to which it adds six new axioms (see pp. 77–78 in Ref. 25), which we reproduce here for self-containedness:
Every sequence $\varphi$ has a unique $\alpha$-limit, denoted by $\lim_{n \uparrow \alpha} \varphi(n)$. If $\varphi$ is a sequence of atoms (i.e., primitive elements that are not sets), then $\lim_{n \uparrow \alpha} \varphi(n)$ is an atom, too.
If $\varphi$ is a constant sequence with the real number $r$ as its value, then $\lim_{n \uparrow \alpha} \varphi(n) = r$.
The $\alpha$-limit of the identity sequence $\varphi(n) = n$ is a new number, $\alpha$, such that $\alpha \notin \mathbb{N}$.
The set of all $\alpha$-limits of real sequences is called the set of hyperreal numbers, denoted by $\mathbb{R}^{*}$. It forms a field, called the hyperreal field, where $\mathbb{R} \subset \mathbb{R}^{*}$ and $\alpha \in \mathbb{R}^{*} \setminus \mathbb{R}$.
The $\alpha$-limit of the constant sequence with value the empty set equals the empty set; i.e., $\lim_{n \uparrow \alpha} \emptyset = \emptyset$. The $\alpha$-limit of a sequence of non-empty sets $(A_n)_n$ is the set of $\alpha$-limits of all sequences $\psi$ such that $\psi(n) \in A_n$ for all $n$.
If $\varphi$ and $\psi$ are two sequences such that $\lim_{n \uparrow \alpha} \varphi(n) = \lim_{n \uparrow \alpha} \psi(n)$ and $f$ is a function such that the compositions $f \circ \varphi$ and $f \circ \psi$ make sense, then $\lim_{n \uparrow \alpha} f(\varphi(n)) = \lim_{n \uparrow \alpha} f(\psi(n))$.
Observe that axiom 4 ensures that the usual algebraic operations on the reals are well-defined for the hyperreals, too.
Some additional definitions are helpful.25
A hyperreal number $\xi$ is called infinite if there exists no standard real number $r$ such that $|\xi| < r$; otherwise, the hyperreal is called finite. A hyperreal number $\xi$ is called infinitesimal if there exists an infinite hyperreal $\zeta$ such that $|\xi| \leq 1/\zeta$.
For any object $A$, its hyper-image (or $\alpha$-transform) is defined as the $\alpha$-limit of the constant sequence with value $A$: $A^{*} = \lim_{n \uparrow \alpha} A$. This extends the meaning of the “hyper-” (or $*$-) prefix already introduced in axiom 4 for the set of hyperreals to all sets and other objects.
A set $A$ is hyperfinite if there exists a sequence of finite sets $(A_n)_n$ such that $A = \lim_{n \uparrow \alpha} A_n$.
The hyperfinite sum of a hyperfinite set of hyperreal numbers, $A = \lim_{n \uparrow \alpha} A_n$, is defined as $\sum_{\xi \in A} \xi = \lim_{n \uparrow \alpha} \sum_{x \in A_n} x$.
Finally, we include three theorems that will be useful for our problem. (For proofs, see p. 9, pp. 288–289, and p. 92 of Ref. 25, respectively.)
Every finite hyperreal number $\xi$ is infinitesimally close to a unique real number $r$, called its standard part: $r = \operatorname{st}(\xi)$. In particular, if $\xi$ is infinitesimal, then its standard part equals zero: $\operatorname{st}(\xi) = 0$. Therefore, taking the standard part can be thought of as “rounding off” the infinitesimal part.
$\alpha$ can consistently be assumed to be a multiple of each natural number and each natural power.
For any function $f$, its hyper-image $f^{*}$ is a function such that for every sequence $\varphi$, it holds that $f^{*}\!\left(\lim_{n \uparrow \alpha} \varphi(n)\right) = \lim_{n \uparrow \alpha} f(\varphi(n))$.
It is instructive to compare the $\alpha$-limits of a few sequences. The constant sequence $(0, 0, 0, \ldots)$, the linearly decreasing sequence $(1/n)_n$, and the quadratically decreasing sequence $(1/n^2)_n$ all have the same standard limit, 0. However, they have different $\alpha$-limits: $0$, $1/\alpha$, and $1/\alpha^2$, respectively. The latter two are infinitesimals, with $1/\alpha^2 < 1/\alpha$. Still, the standard part of both these hyperreals is zero. Therefore, one way to interpret the hyperreals is that—compared to real numbers—they retain information about the asymptotic behavior of the sequences by which they were constructed. Likewise, infinite hyperreals contain information about the rate of divergence of the sequence by which they are obtained. For example, the $\alpha$-limit of the sequence $(n^2)_n$ is $\alpha^2$, the square of that of $(n)_n$, which is $\alpha$. This is similar to Landau’s symbols (small-$o$ and big-$O$ notation), but the hyperreal field provides a richer algebraic structure.
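The $\alpha$-limit itself is not computable, but the asymptotic information it retains can be illustrated numerically. A minimal Python sketch (the function names are ours) compares three sequences with the same standard limit but different rates, mirroring the distinct $\alpha$-limits $0$, $1/\alpha$, and $1/\alpha^2$:

```python
# Three sequences with the same standard limit 0 but different
# asymptotic behavior, mirroring the alpha-limits 0, 1/alpha, 1/alpha^2.
def constant(n):
    return 0.0

def linear(n):
    return 1.0 / n

def quadratic(n):
    return 1.0 / n**2

n = 10_000
# All three are (close to) 0 in the standard sense:
print(constant(n), linear(n), quadratic(n))
# But the ratio of the last two vanishes as n grows, which is exactly
# the information the hyperreals keep: 1/alpha^2 < 1/alpha.
print(quadratic(n) / linear(n))  # equals 1/n mathematically
```

The ratio plays the role of the comparison $1/\alpha^2 < 1/\alpha$: both terms are "rounded off" to zero by the standard part, yet they differ by an infinite factor.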
B. Functions, derivatives, and differential equations on hyperfinite grids
To construct a hyperfinite grid, which we will use to model time, we need to consider the $\alpha$-limit of a sequence of sets. A first example of the $\alpha$-limit applied to a sequence of sets (proven on p. 81 of Ref. 25) is $\lim_{n \uparrow \alpha} \{1, \ldots, n\} = \{1, \ldots, \alpha\}$, the set of hypernatural numbers up to $\alpha$. This set is hyperfinite, but not finite.25
A sequence of sets of interest to our problem at hand is
$$\mathbb{T}_n = \left\{ \frac{k}{n} : k = 0, 1, \ldots, n^2 \right\}.$$
We may think of each $\mathbb{T}_n$ as a set of discrete moments (a finite grid). As $n$ increases, the sets contain more moments per unit of time and span a longer time. The $\alpha$-limit of this sequence is a hyperfinite set, which we call a hyperfinite grid,
$$\mathbb{T} = \lim_{n \uparrow \alpha} \mathbb{T}_n.$$
$\mathbb{T}$ contains infinitely many moments per unit of time and infinitely many such units. The infinitesimal time step between two consecutive elements of $\mathbb{T}$ has length $1/\alpha$. Intuitively, then, the discrete set $\mathbb{T}$ can be used to represent the positive direction of time, just like the continuous set $[0, \infty)$ often plays this role. If we assume $\alpha$ to be a multiple of each natural number and each natural power, then $\mathbb{N} \subset \mathbb{T}$.
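At each finite level $n$, the grid is an ordinary finite set that can be generated directly. A minimal Python sketch (we take the finite grids to consist of the points $k/n$ for $k = 0, \ldots, n^2$, an assumption about the exact grid used here):

```python
def finite_grid(n):
    """Finite time grid with step 1/n, spanning the interval [0, n]."""
    return [k / n for k in range(n * n + 1)]

for n in (2, 4, 8):
    g = finite_grid(n)
    # As n grows, the grid gets finer (step 1/n) and longer (up to t = n),
    # prefiguring the hyperfinite grid with step 1/alpha in the alpha-limit.
    print(n, g[1] - g[0], g[-1])
```

The $\alpha$-limit of this sequence of finite sets is then the hyperfinite grid: infinitesimal spacing, infinite span.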
We will call a function that is the $\alpha$-limit of a sequence of functions defined on the finite grids a grid function.
For our problem, we also need to define the first- and second-order grid derivatives of a grid function (see pp. 160–161 in Ref. 25; the definitions are analogous to Euler’s method). First, consider a family of functions $f_n$ on the finite grids, with the grid function $f$ as their $\alpha$-limit. The right-hand grid derivative is then defined as the $\alpha$-limit of the sequence of first-order finite differences on the $f_n$, which, at position $k/n$, equal $n \left[ f_n\!\left(\frac{k+1}{n}\right) - f_n\!\left(\frac{k}{n}\right) \right]$. The factor $n$ comes from division by the discrete time step, $1/n$. Hence, in the $\alpha$-limit,
$$\mathrm{D}f(t) = \alpha \left[ f\!\left(t + \tfrac{1}{\alpha}\right) - f(t) \right]$$
for all $t \in \mathbb{T}$. Likewise, the second-order grid derivative is defined as the $\alpha$-limit of the sequence of second-order finite differences on the $f_n$, which, at position $k/n$, equal $n^2 \left[ f_n\!\left(\frac{k+2}{n}\right) - 2 f_n\!\left(\frac{k+1}{n}\right) + f_n\!\left(\frac{k}{n}\right) \right]$. Hence,
$$\mathrm{D}^2 f(t) = \alpha^2 \left[ f\!\left(t + \tfrac{2}{\alpha}\right) - 2 f\!\left(t + \tfrac{1}{\alpha}\right) + f(t) \right].$$
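At finite $n$, the grid derivatives are ordinary finite differences, which can be checked numerically against a smooth function. A Python sketch (the function names are ours):

```python
def grid_d1(f, k, n):
    """Right-hand first difference at grid point k/n (step 1/n)."""
    return n * (f((k + 1) / n) - f(k / n))

def grid_d2(f, k, n):
    """Second-order difference at grid point k/n (step 1/n)."""
    return n**2 * (f((k + 2) / n) - 2 * f((k + 1) / n) + f(k / n))

f = lambda t: t**2   # f'(t) = 2t and f''(t) = 2
n, k = 1000, 500     # grid point t = 0.5
print(grid_d1(f, k, n))  # close to 1.0, up to an O(1/n) error term
print(grid_d2(f, k, n))  # close to 2.0
```

The $O(1/n)$ discrepancy in the first difference is the finite-$n$ shadow of the infinitesimal error term in the $\alpha$-limit: after taking standard parts, it disappears.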
With each standard function $f$, we may associate a grid function by restricting its hyper-image $f^{*}$ to $\mathbb{T}$ (cf. p. 160 in Ref. 25). It can be proven that, where $f'$ and $f''$ are defined, the first- and second-order grid derivatives of this grid function are infinitesimally close to the corresponding hyper-images $(f')^{*}$ and $(f'')^{*}$. (Proof is given on p. 161 of Ref. 25.)
With a given standard Cauchy problem, we can now associate a hyperfinite grid differential equation with two initial conditions on the grid function. This “association” is one-to-many because the standard initial conditions can be precisified in infinitely many ways. In particular, there are infinitely many hyperreal initial positions, initial velocities, and initial moments whose standard parts equal the standard initial values. Each such choice leads to a different solution in terms of a grid function. When the standard Cauchy problem has a unique solution, all these grid functions are infinitesimally close; hence, they have the same standard part. It can be shown that the standard part of the grid solution equals the solution of the standard Cauchy problem. However, when the standard Cauchy problem fails uniqueness, the associated grid functions contain pairs that differ by more than an infinitesimal from each other; hence, they have different standard parts. The approach we sketched here is based on the idea of the non-standard proof of the standard Peano theorem (see, e.g., pp. 165–167 in Ref. 25, p. 32 in Ref. 21, and Ref. 27), which—unlike the standard proof—shows us how to construct all these solutions. We apply this to our case in Sec. V.
Moreover, we can use this approach to associate a probability measure to the different standard solutions. Recall that the different infinitesimal precisifications can be obtained as -limits of different converging sequences, which also correspond to all the different ways one could take the standard limit (and which may lead to different standard outcomes). Rather than arguing for one limit process as the correct one, we take a measure over all possible ways of converging. This idea is not entirely novel, although its application to non-Lipschitz continuity is: a similar approach was proposed in the context of stochastic differential equations.26 We will explain this in Sec. VI.
V. HYPERFINITE GRID DIFFERENTIAL EQUATION WITH INITIAL CONDITIONS FOR MALAMENT’S MOUNDS
Let us now apply the approach outlined in Sec. IV to our standard Cauchy problem. We will construct all solutions to this problem, i.e., functions $r$ such that the set of equations (1) holds, using a hyperfinite grid equation.
Each hyperfinite grid Cauchy problem associated with our standard problem looks as follows:
$$\alpha^2 \left[ X\!\left(t + \tfrac{2}{\alpha}\right) - 2 X\!\left(t + \tfrac{1}{\alpha}\right) + X(t) \right] = \left(X(t)\right)^{p}, \qquad X(t_0) = X_0, \qquad \alpha \left[ X\!\left(t_0 + \tfrac{1}{\alpha}\right) - X(t_0) \right] = V_0, \qquad (3)$$
where the initial position $X_0$, the initial velocity $V_0$, and the initial moment $t_0$ are infinitesimals. A solution to this problem is a grid function $X$ such that (3) holds.
In order to find these solutions, we construct a sequence of finite difference equations on the sets of moments $\mathbb{T}_n$ with associated initial conditions. The finite difference equation on $\mathbb{T}_n$ associated with our ODE is, for all admissible $k$,
$$n^2 \left( X_{n,k+1} - 2 X_{n,k} + X_{n,k-1} \right) = X_{n,k}^{\,p}.$$
To make explicit that the solution can be obtained by iteration, we rewrite this as a recurrence relation, isolating $X_{n,k+1}$ on the left-hand side. Adding the initial conditions, we obtain the following sequence of discrete initial value problems:
$$X_{n,k+1} = 2 X_{n,k} - X_{n,k-1} + \frac{1}{n^2}\, X_{n,k}^{\,p}, \qquad X_{n,0} \text{ given}, \qquad X_{n,1} = X_{n,0} + \frac{V_{n,0}}{n}, \qquad (4)$$
where $(X_{n,0})_n$ and $(V_{n,0})_n$ are real-valued sequences that converge to 0 as $n$ goes to infinity; therefore, their $\alpha$-limits are infinitesimal. Notice that we can continue the iteration for arbitrarily large $X$ values, i.e., beyond the maximal height of the mounds. However, we are interested only in the region around the apex.
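The discrete initial value problem can be iterated directly at finite $n$. A minimal Python sketch for Norton’s case $p = 1/2$ (the names and the velocity discretization $X_1 = X_0 + V_0/n$ are our choices):

```python
def solve_grid(n, x0, v0, p=0.5, steps=None):
    """Iterate X_{k+1} = 2*X_k - X_{k-1} + X_k**p / n**2,
    with X_0 = x0 and X_1 = x0 + v0/n (time step 1/n)."""
    if steps is None:
        steps = n * n
    xs = [x0, x0 + v0 / n]
    for _ in range(steps - 1):
        xs.append(2 * xs[-1] - xs[-2] + xs[-1] ** p / n**2)
    return xs

# Zero perturbation: the mass stays at the apex (the singular solution).
print(solve_grid(100, 0.0, 0.0, steps=10))
# A tiny initial displacement: the mass slides off monotonically.
xs = solve_grid(100, 1e-8, 0.0, steps=1000)
print(xs[-1] > xs[0])  # True
```

With exactly zero initial data, every iterate is zero, which is the discrete shadow of the singular solution; any non-zero perturbation, however small, produces a growing sequence.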
In general, one should also consider sequences of initial moments that converge to 0 (instead of fixing the initial moment at 0 for all $n$), but for autonomous equations such as ours, this does not lead to more than an infinitesimal difference in the $\alpha$-limit; therefore, it does not impact the standard solution.
For each choice of the initial conditions, $X_{n,0}$ and $V_{n,0}$, the solution is a unique sequence that can be obtained recursively. When $X_{n,0} = V_{n,0} = 0$, we obtain the constant sequence $X_{n,k} = 0$ for all $k$, which corresponds to $X = 0$ in the $\alpha$-limit and leads to the singular standard solution after taking the standard part.
What if $X_{n,0}$ or $V_{n,0}$ or both are non-zero? Unfortunately, no analytic solution is known for the non-linear, second-order difference equation in (4); therefore, we have to examine its behavior numerically. However, notice that we can read off the scaling behavior of the non-zero solutions from the difference equation directly, by noticing that terms of the form $X$ are added to terms of the form $n^{-2} X^{\,p}$. Hence, by the principle of dimensional homogeneity, expressing $X$ as multiples of $n^{-2/(1-p)}$ is helpful because this factor then cancels in all terms. This scale factor will also play a crucial role in taking the $\alpha$-limit.
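The dimensional argument can be made explicit (a sketch, assuming the central-difference discretization of $\ddot{r} = r^{\,p}$ with time step $1/n$):

```latex
% Difference equation (time step 1/n):
X_{k+1} - 2X_k + X_{k-1} = n^{-2}\, X_k^{\,p}.
% Substitute X_k = n^{-s}\, U_k:
n^{-s}\left(U_{k+1} - 2U_k + U_{k-1}\right) = n^{-2}\, n^{-sp}\, U_k^{\,p},
% which is independent of n iff s = 2 + sp, i.e.,
s = \frac{2}{1-p} \qquad (s = 4 \text{ for Norton's dome, } p = \tfrac{1}{2}),
% leaving the parameter-free recursion U_{k+1} = 2U_k - U_{k-1} + U_k^{\,p}.
```

In other words, measuring displacements in units of the scale factor $n^{-2/(1-p)}$ removes $n$ from the recursion entirely, which is why a single scaled computation captures all grid refinements at once.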
A. Numerical study of the family of finite difference equations
Numerically, we find that for any value of $n$, $X_{n,0}$, and $V_{n,0}$, we can fit the numerical solutions of (4) to the form of the regular solutions (2), evaluated at the grid times $t = k/n$,
for some real value of $T$, and this fit improves as $k$ increases. Therefore, from now on, we will only consider the final value of $k$, which equals $n^2$. Hence, we can estimate $T$ by solving the fit at this final grid point for $T$,
$$T \approx \frac{k}{n} - \left[ \frac{X_{n,k}}{A_p} \right]^{\frac{1-p}{2}} \Bigg|_{k = n^2}, \qquad (5)$$
where $A_p = \left[(1-p)^2 / (2(1+p))\right]^{1/(1-p)}$ is the coefficient of the regular solutions (2).
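For Norton’s dome ($p = 1/2$), where the regular solutions are $(t-T)^4/144$, the fitting step can be sketched in a few lines of Python (all names are ours; we simply invert the fit at the final grid point):

```python
def solve_grid(n, x0, v0, steps):
    """Iterate the central-difference scheme for x'' = sqrt(x), step 1/n."""
    xs = [x0, x0 + v0 / n]
    for _ in range(steps - 1):
        xs.append(2 * xs[-1] - xs[-2] + xs[-1] ** 0.5 / n**2)
    return xs

def estimate_T(n, x0, v0, steps):
    """Estimate the apparent release time T by inverting the fit
    x(t) ~ (t - T)**4 / 144 at the final grid point t = steps/n."""
    xs = solve_grid(n, x0, v0, steps)
    t_final = (len(xs) - 1) / n
    return t_final - (144.0 * xs[-1]) ** 0.25

# A mass released at rest from a displacement of one scale factor (n**-4):
T1 = estimate_T(100, 1e-8, 0.0, 5000)
# A much smaller displacement lingers near the apex and exits later:
T2 = estimate_T(100, 1e-10, 0.0, 5000)
print(T1, T2, T2 > T1)
```

The larger perturbation yields the more negative apparent release time, reproducing qualitatively the non-linear dependence of $T$ on the perturbation size discussed below.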
1. Results for $V_{n,0} = 0$
First, we study the special case where $V_{n,0} = 0$; hence, the initial grid velocity is zero.
For example, if we consider an initial displacement $X_{n,0}$ equal to the scale factor, the fitted $T$-value is negative, meaning that, at large $k$, the fitted curve looks as though the mass left the top before $t = 0$. This is exactly what one would expect for starting positions that are relatively far from the apex. In terms of direction, the mass will slide off exactly in the direction of the initial displacement.
For smaller values of $X_{n,0}$, we find other fitted solutions with a less negative or even positive $T$-value. Positive $T$-values correspond to fitted curves that, at large $k$, look as though the mass left the top later than $t = 0$. Therefore, the smaller the perturbation, the larger the apparent delay, as expected. Moreover, the relation between the size of the perturbation and the $T$-value of the corresponding solution is highly non-linear. This is illustrated in Fig. 3.
In Figs. 4–6, we compare a larger initial displacement (blue curves) with a smaller one (orange curves), both with zero initial grid velocity. Figure 4 shows the pairs of subsequent positions and grid velocities. These discrete curves can be thought of as parameterized by time: subsequent data points are a temporal distance of $1/n$ apart. Figure 5 shows the same two sequences as a function of time for large $k$: at this scale, the sequences on the one hand and the fitted continuous curves on the other hand nearly coincide, allowing an excellent fit between them. We see that the orange curve reaches a given distance (e.g., 50) at larger $k$ (i.e., later in time) than the blue curve. Therefore, the orange curve is delayed compared to the blue one, which is consistent with the curves’ $T$-values. Figure 6 shows the two sequences as a function of time, now for small $k$: at this scale, the sequences and the continuous curves are qualitatively different, although the fit between them is excellent for large $k$, as we saw in Fig. 5. The fitted values of $T$ do not correspond to the minimum of the sequences, which occurs at the first grid points for both.
2. Results for a general case
We need to vary both initial conditions, $X_{n,0}$ and $V_{n,0}$, independently to study the dependence of $T$ on the discrete perturbation. In this section, we study this dependence systematically; therefore, we no longer require the initial grid velocity to be zero.
We wrote a program in visual Pascal (Delphi), which allows us to study the effect of the initial conditions in the recurrence equation (4) on the fitted $T$-value, understood as in (5).28 We use our program to determine the $T$-values and to represent them using a color scale: see Fig. 7 for an example of the output. The legend in the figures indicates the range of the $T$-values. In practice, the $X_{n,0}$ and $V_{n,0}$ intervals start at a number slightly higher than zero (and much smaller than the upper bound): otherwise, the singular solution (at the origin) is in the field of view, dominating the $T$-scale.
It is instructive to see the results of our program combined with particular sequences or trajectories. This is shown in Fig. 7. The part of the trajectory below the main diagonal corresponds with a mass moving toward the apex and coincides with positive $T$-values.
In general, we see that solutions with a positive $T$-value, visible as a narrow red band in the figures, mainly occur below the main diagonal. This is to be expected: if the (sufficiently small) initial grid velocity is negative (i.e., directed toward the top), the mass first moves toward the apex before sliding off, thus increasing the (apparent) $T$. However, for fixed $X_{n,0}$, the grid velocity $V_{n,0}$ cannot be chosen arbitrarily small; otherwise, we select a trajectory that goes over the top and slides off on the other side, resulting in a smaller positive or even negative $T$. For positive grid velocities, the mass slides off the mound monotonically, with a smaller or more negative apparent $T$ at large $k$ as compared to the same $X_{n,0}$ with $V_{n,0} = 0$.
We also studied the dependence of $T$ on $V_{n,0}$ (keeping $X_{n,0}$ fixed). As an example, we considered the initial conditions corresponding to the right-hand edge in Fig. 7(d). When $V_{n,0}$ is varied across this interval, $T$ monotonically increases toward an asymptote and then monotonically decreases. This is shown in Fig. 8. (Since we do not have an analytic solution to the recurrence equation, we cannot determine the position of the asymptote analytically either.)
Continuing with the example, the initial condition $V_{n,0} = 0$ corresponds to a mass that is released at rest from a fixed arc distance. As we already discussed in Sec. V A 1, it immediately starts sliding off from the initial side. This leads to a negative $T$-value. Initial conditions with $V_{n,0}$ between zero and the asymptote correspond to a mass that is released from the same distance but now with a velocity toward the top. As $V_{n,0}$ approaches the asymptote in this interval, the mass moves closer to the apex before sliding down, leading to a monotonic increase in the $T$-value.
The other end of the interval corresponds to a mass that is released from the same distance with a velocity directed toward the apex that is large enough for the mass to reach the top quickly: this leads to a mass that rapidly slides off the dome at the other side, also corresponding with a negative $T$-value. As $V_{n,0}$ is varied up to the aforementioned asymptote, the speed at the top decreases, leading to a slower descent on the other side, hence the monotonically increasing $T$-values.
The slope of the $T$-curve in the interval between $V_{n,0} = 0$ and the asymptote is characterized by an infinite sequence of points of inflection. The position of the first point of inflection can be computed exactly from the recurrence relation. In principle, the positions of the other points of inflection can be computed from the general expression as well, but this becomes impractical quickly. Instead of the analytic approach, we determined the positions of the third and fourth points of inflection numerically. The position of the asymptote can be regarded as the limit of this sequence of inflection-point positions, but this does not yield a more practical way of computing it.
So far, all numerical results we have shown were for Norton’s dome ($p = 1/2$). Figure 9 presents results obtained by varying $p$. As $p$ increases, the region with positive $T$-values becomes narrower and its slope approaches the main diagonal. In other words, increasing $p$ looks like zooming out and decreasing $p$ looks like zooming in, as compared to the intermediate case where $p = 1/2$. Quantitatively, this scaling behavior is consistent with the scale factor, $n^{-2/(1-p)}$.
B. Results in the α-limit
For our purposes, it is crucial that we do not introduce any finite perturbations (as in Sec. V A) but that we keep the standard parts of the initial values $X_0$ and $V_0$ in (3) exactly zero. In order to achieve this, we take the $\alpha$-limit of sequences of finite grid Cauchy problems, for which the perturbations become infinitesimal in this limit. We focus on initial displacements and velocities proportional to the scale factor because we know the scaling behavior of the finite discrete functions on the corresponding sequence of finite intervals. Computationally, the results do not change, even when we change $n$ (and the scale factor), as long as we rescale the plots accordingly. Since the result on this scale is exactly the same for every finite $n$ above a certain threshold, alpha-theory guarantees that this scaling behavior holds in the $\alpha$-limit as well. This way, we can plot panel (a) of Fig. 10 using standard numerical simulations.
For a finite number of grid points, the solution to (3) is a finite sum of infinitesimals; therefore, its standard part is zero. For an infinite number of grid points, the solution is a hyperfinite sum that in general need not be infinitesimal, even when all the terms are. For example, for a particular sequence of finite difference equations, we determined the rate of convergence of the solutions numerically. When we take the α-limit of such a sequence of finite grid Cauchy problems, we find the hyperfinite grid solution. Taking standard parts, this solution corresponds with the general solution (2), which is one of the solutions to the continuous Cauchy problem. In particular, we find the undelayed solution. As we see in Fig. 11, to obtain a standard solution with a non-zero delay, we have to pick an initial velocity that is finely tuned to the initial position. This observation is directly relevant for the assignment of probabilities in Sec. VI.
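To make the contrast between exactly-zero and perturbed initial data concrete, the following standard (finite-precision) sketch integrates a forward-Euler discretization of the dome equation. We assume the dimensionless form r'' = √r commonly used for Norton's dome; the step size, stopping radius, and perturbation sizes are our own illustrative choices, not values from the paper.

```python
import math

def escape_time(r0, v0, h=1e-4, r_stop=1.0, t_max=10.0):
    """Forward-Euler integration of the (assumed) dome equation r'' = sqrt(r),
    started from (r0, v0); returns the first time with r >= r_stop, else None."""
    r, v, t = r0, v0, 0.0
    while t < t_max:
        r, v = r + h * v, v + h * math.sqrt(max(r, 0.0))
        t += h
        if r >= r_stop:
            return t
    return None

# Initial data exactly zero: the discrete mass stays at the apex forever.
assert escape_time(0.0, 0.0) is None

# Tiny positive perturbations: the mass slides off in a finite time that is
# close to the analytic value 2*sqrt(3) and depends only weakly on the
# perturbation size -- the hallmark of the non-Lipschitz singularity.
t_a = escape_time(1e-8, 0.0)
t_b = escape_time(1e-12, 0.0)
assert t_a is not None and t_b is not None
assert abs(t_a - t_b) < 0.5
```

For r'' = √r, energy conservation gives an escape time of ∫₀¹ dr/√((4/3)r^{3/2}) = 2√3 ≈ 3.46 from a start at rest near the apex, to leading order independent of the perturbation; the discrete runs land close to this value.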
As explained in Sec. IV, the indeterminism in the standard model (continuous, without infinitesimals) can be interpreted as being due to rounding off the infinitesimals from the hyperfinite model. We have now seen an example of this in terms of our toy model. Finally, this attribution or correspondence allows us to assign probabilities to the standard solutions.
VI. USING THE HYPERFINITE GRID DIFFERENTIAL EQUATION TO ASSIGN PROBABILITIES TO THE STANDARD SOLUTIONS
Faced with the family of regular solutions (2) to (1), it might be tempting to impose a probability density on the delay parameter directly: a uniform probability measure, perhaps? This approach faces two problems. The first is an immediate consequence of the fact that the delay may be arbitrarily large: there is no standard countably additive probability measure that is uniform over an infinite support. This problem may be overcome by adopting a merely finitely additive probability function. The second problem is that the measure is not robust under reparameterizations; therefore, one needs an argument to favor a uniform measure over the parameter itself rather than over some transformation of it.
Instead of imposing a probability measure on the delay parameter directly, for which we know no principled way of choosing one, we approach the issue differently. Starting from the hyperfinite model, we first consider hyperreal initial conditions that are randomly chosen from a suitable interval, which guarantees that they are both infinitesimal, and then compute the resulting probabilities for obtaining the singular solution and for the delay values associated with the family of regular solutions. The first step amounts to assuming a uniform prior on the phase space. Now, we only face the second problem: arbitrary coordinate transformations lead to infinitely many representations of the same phase space. Our next question, then, is how to make a principled choice here.
A. Phase space for uniform, random sampling
In (1) and (3), we have used the Newtonian formalism with two generalized coordinates: the arc length and the arc (grid) velocity. It is well known that measures on the phase space change under coordinate transformations. To report the results of our numerical experiments, we have used the arc length at two different times as the phase space coordinates, rather than the initial arc length and the grid velocity, which can be viewed as an example of such a transformation.
To make a principled choice, we take our cue from statistical mechanics. In Ref. 29, Goldstein reviewed arguments (going back to Boltzmann) to the effect that a non-arbitrary choice for the probability measure is a uniform measure that is invariant under the dynamics. Statistical physicists had to resolve this issue because they often appeal to the notion of typicality in the sense of “almost all” trajectories. Clearly, such statements only make sense relative to a specific measure, which turns out to be the Lebesgue measure on the phase space. This choice is motivated by Liouville’s theorem, which implies that the Lebesgue measure on the phase space conserves probability. Outside of statistical mechanics, the approach of associating a unique probability measure to a random selection of initial conditions via invariance requirements is also well known from the work of Jaynes,30 which led to many applications of such maximum information entropy (MaxEnt) methods.
In other words, we need coordinates such that the density of states is conserved on the phase space. For this, we have to consider the Lebesgue measure on the canonical coordinates from Hamiltonian mechanics (such that Liouville’s theorem applies). For our problem, the canonical coordinates are the arc length and the conjugate momentum, which equals mass times the arc velocity. Since we assumed a unit mass, the phase space spanned by the arc length and the arc velocity is the proper starting point for a probabilistic analysis that allows us to start from a uniform probability distribution (MaxEnt). Hence, we use these coordinates to represent the initial conditions, as shown in panel (b) of Fig. 10.
Therefore, our prior probability is motivated by the dynamics, which privileges the uniform measure on the canonical phase plane. This is what we take selecting “random” initial conditions to mean and how we reason about “typical” results. However, this is a defeasible choice: if there is any background information on how the initial position and velocity are realized (due to preparation or post-selection of the system), we should adapt the prior in light of it. For instance, one might consider a process that aims to place the mass as close to the top as possible with a velocity as close to zero as possible and then selects systems of which the real-valued initial position and velocity are indeed zero. In such a case, it is clear that the prior does not come from the dynamics of the system under study, but from an independent placement mechanism. In the case of such an external influence, the Cauchy problem does not contain all the information about the system; therefore, it does not suffice to determine the probabilities. For this example, we may want to consider a Gaussian distribution around the origin of the phase plane instead of a uniform distribution. Fortunately, as we will see in Sec. VI B, our main results (in terms of probabilities for the delay values) turn out to be quite robust: they are valid for a uniform measure as well as for any Gaussian or other finite transformation of it.
The vector field on this phase plane is shown in Fig. 12, where the symmetry between the upper and lower half of the plane shows that the dynamics is time reversible. If we reinterpret the arc length as a signed position and include the opposite side of the slope as negative position values (not shown), the vector field shows the signature of a saddle point at the singularity (indicating an unstable equilibrium).
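The area-preservation property underlying this choice of coordinates can be checked numerically on the discretized dynamics: a symplectic-Euler step for the Hamiltonian H = p²/2 − (2/3)r^{3/2} (the assumed dimensionless dome form, so that r'' = √r) has unit Jacobian determinant, in line with Liouville's theorem. A minimal sketch, with all numerical values our own illustrative choices:

```python
import math

def sym_euler_step(r, p, h=1e-3):
    """One symplectic-Euler step for H = p**2/2 - (2/3)*r**1.5,
    i.e. the (assumed) dimensionless dome equation r'' = sqrt(r)."""
    p = p + h * math.sqrt(r)   # kick: p' = p + h * (-dV/dr)
    r = r + h * p              # drift: r' = r + h * p'
    return r, p

def flow(r, p, n=1000):
    """Iterate the map n times (total time n*h)."""
    for _ in range(n):
        r, p = sym_euler_step(r, p)
    return r, p

def tri_area(pts):
    """Area of a triangle from three phase-space points (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = pts
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0

# Transport a tiny phase-space triangle and compare areas: the symplectic
# map has Jacobian determinant exactly 1, so the area ratio stays near 1.
d = 1e-6
tri0 = [(0.5, 0.1), (0.5 + d, 0.1), (0.5, 0.1 + d)]
tri1 = [flow(r, p) for r, p in tri0]
ratio = tri_area(tri1) / tri_area(tri0)
assert abs(ratio - 1.0) < 1e-3
```

The step's Jacobian determinant is (1 + h²f′(r)) · 1 − h · hf′(r) = 1 exactly, so the only deviation from unit area ratio comes from the quadratic terms neglected by tracking a finite (if tiny) triangle.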
B. From measuring initial conditions to probabilities
We now propose to measure the probability of sets of standard solutions as the standard part of the normalized area of the set of corresponding initial values in the hyperfinite grid model. Since there is no such thing as the “largest infinitesimal,” we have to normalize on the initial position and velocity lying in a particular interval of infinitesimals. We select such an interval for each coordinate; the normalization factor is then the area of the resulting rectangle. We can now represent any event as a subset of this rectangle and consider its probability as the standard part of the area of the set divided by the normalization factor. (This approach is similar to that of Ref. 26, where it is connected to a Loeb measure.31)
All the events we have discussed in Sec. V B were contained in a strict subset of the proposed reference class with an infinitesimal normalized area.
We already observed that exactly one combination of hyperreal-valued initial conditions leads to the equilibrium solution: initial position and velocity both exactly zero. This means that the singular solution has zero area. Although it is not logically impossible, it almost surely does not happen. This assignment is very robust: it holds not only for the uniform prior, but for any prior that does not assign more than an infinitesimal portion to this singleton. All other initial conditions are associated with a regular solution. They carry unit probability; therefore, a regular solution happens almost surely. This settles our first research question.
Our second research question asked how to assign relative probabilities to the delay values in the family of regular solutions. The key to answering this lies in our observation that the relation between the infinitesimal initial conditions and the delay parameter in the corresponding continuous solution is strongly non-linear. Figure 11 shows that the interval where the delay is positive covers a non-infinitesimal fraction (about 10%) of the sub-region we examined, but this interval itself is only an infinitesimal fraction of the entire range. Therefore, the normalized area corresponding to positive delay values is infinitesimal. Moreover, almost all positive delay values are themselves infinitesimal: since the delay increases so fast in the region where it is positive, the interval of initial values that correspond to a non-infinitesimal delay is infinitesimal compared to the interval where the delay is positive (Fig. 11). (Due to this highly non-linear dependence, it was also difficult in the numerical experiments with the corresponding finite difference equations to find explicit examples of large, positive delay values.) The normalized area corresponding to negative values is unity minus an infinitesimal, and these values are all infinitesimal. Therefore, we find the undelayed standard solution almost surely.
These observations hold in general, across all of Malament’s mounds: almost all infinitesimal initial conditions correspond to the undelayed standard solution, and only an infinitesimal proportion of all infinitesimal initial conditions correspond to standard solutions with a positive delay. Hence, if we assume a uniform distribution of the infinitesimal initial conditions, we arrive at the following probabilities for the standard solutions:
The probability of the mass staying at the apex of any of Malament’s mounds indefinitely (singular solution) is zero.
The normalized area of initial conditions in the hyperfinite grid model corresponding to the mass staying at the apex of any of Malament’s mounds for some observable time is infinitesimal; therefore, the probability is zero.
The normalized area of initial conditions in the hyperfinite grid model corresponding to the mass immediately starting to slide off the apex of Norton’s dome or any of Malament’s mounds is one minus an infinitesimal; therefore, the probability is unity.
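These assignments are consistent with a standard (finite-precision) Monte Carlo check: sampling small initial conditions uniformly, essentially every sample descends with no observable delay relative to the undelayed solution, whose escape time in the assumed dimensionless signed form x'' = sign(x)·√|x| is 2√3. The sampling box, seed, and step size are our own choices; real floats can only caricature the genuinely infinitesimal statement.

```python
import math
import random

TWO_SQRT3 = 2 * math.sqrt(3)  # escape time of the undelayed solution

def escape_time(x0, v0, h=1e-3, t_max=20.0):
    """Euler integration of the signed (assumed) dome equation
    x'' = sign(x) * sqrt(|x|); returns the first time with |x| >= 1."""
    x, v, t = x0, v0, 0.0
    while t < t_max and abs(x) < 1.0:
        a = math.copysign(math.sqrt(abs(x)), x)
        x, v = x + h * v, v + h * a
        t += h
    return t

random.seed(1)
delays = [escape_time(random.uniform(0.0, 1e-8),
                      random.uniform(-1e-6, 1e-6)) - TWO_SQRT3
          for _ in range(200)]

# Every sampled initial condition yields an essentially undelayed descent:
# observable delays would require fine-tuning far beyond the sampling box.
assert max(delays) < 0.5 and min(delays) > -0.5
```

Shrinking the sampling box shrinks the delays further, mirroring the claim that only an infinitesimal proportion of infinitesimal initial conditions yields a non-infinitesimal delay.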
In conclusion, a point mass with velocity zero at the apex of any frictionless Malament’s mound in a uniform gravitational field will immediately start sliding off the mound almost surely. If the initial conditions are external to the Cauchy problem, other priors can be considered and our methodology may yield another result. For instance, for an asymmetrical measure on the phase plane, the conclusion will be different. However, the above conclusion continues to hold if the uniform measure on the phase plane is replaced by a symmetric Gaussian (cut off for non-infinitesimal values, since those contradict the conditions set by the standard initial value problem).
In addition, for a mass sliding toward the apex (from a finite distance and with finite velocity), reaching it with a standard velocity of exactly zero, it will either slide off from the opposite side immediately or slide back immediately (depending on the precise infinitesimal position and velocity values), almost surely. In this case, the initial values are sampled from a different part of the phase plane, but it remains the case that those corresponding to any measurable delay are of infinitesimal measure as compared to those yielding no measurable delay.
So far, we have only presented equations and results that rely on distances to the apex. By adding sign information, we can keep track of which side of the two-dimensional cross section of the dome the mass is on. We find that the probability of sliding off on either side of the dome is 1/2. This can be seen directly from the symmetry of the extended phase plane (not shown) in combination with the uniform prior. Moreover, observing the initial infinitesimal position and velocity allows us to predict the final descent direction. Therefore, unlike the original model, the hyperfinite model does not exhibit spontaneous symmetry breaking: either the symmetry is already broken by the infinitesimal initial conditions, or the solution remains symmetric (when both initial values are exactly zero). Hence, the symmetry breaking in the standard model may be thought of as being due to rounding off the infinitesimals in the hyperfinite model.
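The equiprobability of the two descent directions can likewise be illustrated by symmetric sampling in a standard simulation. The signed dimensionless form x'' = sign(x)·√|x| is an assumption, as are the seed and box sizes:

```python
import math
import random

def final_side(x0, v0, h=1e-3, t_max=20.0):
    """Euler integration of the signed (assumed) dome equation
    x'' = sign(x) * sqrt(|x|); returns -1 or +1 for the descent side."""
    x, v, t = x0, v0, 0.0
    while t < t_max and abs(x) < 1.0:
        a = math.copysign(math.sqrt(abs(x)), x)
        x, v = x + h * v, v + h * a
        t += h
    return 1 if x > 0 else -1

random.seed(2)
n = 200
right = sum(final_side(random.uniform(-1e-8, 1e-8),
                       random.uniform(-1e-6, 1e-6)) == 1
            for _ in range(n))

# Symmetric sampling gives descents on either side with frequency near 1/2,
# and the outcome of each run is fixed by its initial data (no spontaneous
# symmetry breaking in the discretized model).
assert abs(right / n - 0.5) < 0.15
```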
VII. DISCUSSION AND CONCLUSION
First, we comment on possible applications of this work. Then, we draw general conclusions.
A. Relation to contemporary hydrodynamics literature
In our paper, we focused on a toy example. However, differential equations with a non-Lipschitz singularity are prevalent in the context of physical applications, such as shock formation and turbulent flows; a widely studied case is that of the Burgers equation (a first-order, non-linear partial differential equation) in the inviscid limit (i.e., the limit of vanishing viscosity, which is equivalent to the limit of infinite Reynolds number). In Sec. I, we already mentioned some publications that used probabilistic approaches to such problems. Here, we briefly review results from this literature. Where possible, we connect their results to ours. Fully comparing, “translating,” and contrasting the cited works with our methodology, however, would warrant a separate study, much more extensive than the current paper. In any case, our study shows that it is crucial to pay due attention to the order of the (standard) limits: when the limit of finite perturbations to zero is taken as the first step, there is no way to recover the probabilities associated with various rates of convergence afterward. Alpha-limits have the advantage of retaining this asymptotic behavior automatically by encoding it into distinct infinitesimals.
In the context of passive scalar transport in a turbulent velocity field, E and Vanden-Eijnden11 introduced “generalized flows,” which are families of probability distributions on the space of solutions to a non-Lipschitz ODE. They started from a first-order ODE for a non-Lipschitz velocity field and considered probability distributions on the set of solutions: either as a probability measure on the path-space or as transition probabilities (which degenerates to unit probability mass at the unique path in the case of Lipschitz continuity). They considered the analogy to stochastic ODEs with a random (white noise) velocity field and also two natural regularizations of the problem, which do not always give identical results.
Building on this, E and Vanden-Eijnden13 presented some examples, including a first-order analog of the second-order ODE that we discussed. Applied to our case, their approach relies on a stochastic process to determine the time and the initial direction of the descent from the top. As in our approach, the singular solution has measure zero, and both directions of descent are equiprobable. They considered transition probability distributions to characterize the random field, which they used to define a generalized flow for the non-Lipschitz ODE.
Falkovich et al.12 compared chaotic (deterministic) behavior with exponential separation and truly turbulent (stochastic) behavior with explosive separation (with power law scaling): only in the second case do infinitesimally close trajectories separate in finite time. In the inviscid limit, the ODE becomes non-Lipschitz, allowing for multiple Lagrangian trajectories. They considered a statistical description of the trajectories, in terms of a stochastic Lagrangian flow, which allows, for instance, the study of averages.
E and Vanden-Eijnden11,13 showed that different regularization processes give rise to different generalized flows, without one of them being uniquely well-motivated by the underlying physical context. In more recent work, however, Mailybaev et al. did propose a way to assign a unique (i.e., independent of the regularization method) statistical probability distribution to the Burgers equation in the inviscid limit,14 as well as a class of ODEs that also become non-Lipschitz in this limit.15
After a trajectory encounters such a singularity in finite time (known as “blowup”), the evolution is no longer deterministic: there are continuum-many solutions. One way to understand this is that the initial state represents a Dirac-delta distribution of initial conditions: whereas this remains a Dirac-delta distribution for fully deterministic evolutions, non-Lipschitz singularities make the delta distribution spread out into a “spontaneous” probability distribution, a phenomenon known as “spontaneous stochasticity.” This phenomenon may also have a quantum-mechanical analog.10
Using methods related to those in our paper, we can re-interpret the Dirac-delta distribution as a function from non-standard analysis (a Gaussian with infinitesimal width), non-singular points as places of finite dispersion (such that infinitesimal differences remain infinitesimal), and the singular point as a place of infinite dispersion (such that infinitesimal differences become non-infinitesimal). Viewed as such, the non-Lipschitz indeterminism of the continuous model can be connected to a case of deterministic chaos in a corresponding hyperfinite model.
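The contrast between exponential (chaotic) and explosive (spontaneously stochastic) separation can be seen in a few lines of standard numerics: two trajectories of the assumed dimensionless dome equation started 10⁻¹² apart separate by many orders of magnitude within finite time, while a Lipschitz comparison system x'' = x amplifies the same difference only by a factor of order cosh(t). All parameter values here are illustrative assumptions.

```python
import math

def run(f, x0, t_end=3.4, h=1e-4):
    """Forward-Euler integration of x'' = f(x) from rest at x0;
    returns x(t_end)."""
    x, v = x0, 0.0
    for _ in range(int(t_end / h)):
        x, v = x + h * v, v + h * f(x)
    return x

d0 = 1e-12  # initial separation of the two trajectories

# Non-Lipschitz (assumed) dome equation: explosive, power-law separation.
amp_dome = abs(run(lambda x: math.sqrt(abs(x)), 2e-12)
               - run(lambda x: math.sqrt(abs(x)), 1e-12)) / d0

# Lipschitz comparison system x'' = x: merely exponential separation.
amp_lin = abs(run(lambda x: x, 2e-12) - run(lambda x: x, 1e-12)) / d0

assert amp_dome > 1e4  # amplified by many orders of magnitude in finite time
assert amp_lin < 1e3   # amplified only by roughly cosh(3.4)
```

In the α-limit picture, the Lipschitz system keeps infinitesimal differences infinitesimal, whereas the non-Lipschitz singularity turns them non-infinitesimal, which is what allows the hyperfinite model to treat spontaneous stochasticity as deterministic chaos.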
In this paper, we presented a method for assigning probabilities to the solutions of initial value problems that lack Lipschitz continuity. First, we linked the differential equation to sequences of finite grid differential equations, which are deterministic and allow for systematic numerical studies. Second, to avoid introducing any non-infinitesimal perturbations, we considered the -limit of the sequences and their solutions as hyperfinite grid functions. Third, starting from a uniform prior on the phase space spanned by the canonical coordinates, we assigned probabilities to the standard part of these hyperfinite grid functions, which equal the solutions of the corresponding, continuous Cauchy problem. Although we set out to find a probability distribution over the solutions, we found unit probability for one single solution (the non-delayed, regular solution). Hence, while we did not assume uniqueness at the outset, we do find ourselves in agreement with authors who set out to find a unique continuation beyond the non-Lipschitz discontinuity.
Our methodology and results are analogous to the study of fully deterministic chaotic systems without any singularities: in classical chaos, when two systems have initial conditions that differ by a non-infinitesimal amount below the measurement precision, they cannot be distinguished empirically at the start. Yet, the resulting trajectories measurably diverge at some point, and these later states can be used to determine their initial positions beyond the measurement precision available at the time. (However, see Ref. 17, which suggests reinterpreting this as indeterminism after all.) Likewise, in the case of indeterministic Cauchy problems, the changes at later times can be attributed to infinitesimal differences present at the start in the corresponding hyperfinite model. (These infinitesimal differences are not measurable at any real-valued measurement precision.) The hyperfinite model produces asymmetric results only when the initial conditions are asymmetric; therefore, we find that the symmetry breaking in the standard model can be thought of as the result of rounding off the infinitesimals.
The current paper focused on toy examples (Malament’s mounds), but our methodology is applicable to study other initial value problems that lack Lipschitz continuity. Hence, we hope our approach will be fruitful for application to the study of singularities in hydrodynamics and other blowup phenomena.
We are grateful to an anonymous referee for a very instructive report that greatly helped us to improve the presentation of our methodology, to Vieri Benci for helpful discussions on this topic, and to Christian Maes for feedback on an earlier version. Part of S.W.’s work was supported by the Research Foundation Flanders (Fonds Wetenschappelijk Onderzoek, FWO) (Grant No. G066918N).
Conflict of Interest
The authors have no conflicts to disclose.
The program binary as well as the source code for the numerical study are openly available in GitHub at https://github.com/DannyVanpoucke/NortonDomeExplorer, Ref. 28.