Stellar systems—globular and nuclear star clusters, elliptical and spiral galaxies and their surrounding dark matter haloes, and so on—are ubiquitous characters in the evolutionary tale of our Universe. This tutorial article is an introduction to the collective dynamical evolution of the very large numbers of stars and/or other self-gravitating objects that comprise such systems, i.e., their kinetic theory. We begin by introducing the basic phenomenology of stellar systems, and explaining why and when we must develop a kinetic theory that transcends the traditional two-body relaxation picture of Chandrasekhar. We then study the individual orbits that comprise stellar systems, how those orbits are modified by linear and nonlinear perturbations, how a system responds self-consistently to fluctuations in its own gravitational potential, and how one can predict the long-term evolutionary fate of a stellar system in both quasilinear and nonlinear regimes. Though our treatment is necessarily mathematical, we develop the formalism only to the extent that it facilitates real calculations. Each section is bolstered with intuitive illustrations, and we give many examples throughout the text of the equations being applied to topics of major astrophysical importance, such as radial migration, spiral instabilities, and dynamical friction on galactic bars. Furthermore, in the 1960s and 1970s, the kinetic theory of stellar systems was a fledgling subject which developed in tandem with the kinetic theory of plasmas. However, the two fields have long since diverged as their practitioners have focused on ever more specialized and technical issues. This tendency, coupled with the famous obscurity of astronomical jargon, means that today relatively few plasma physicists are aware that their knowledge is directly applicable in the beautiful arena of galaxy evolution, and relatively few galactic astronomers know of the plasma-theoretic foundations upon which a portion of their subject is built. Yet, once one has become fluent in both Plasmaish and Galacticese, and has a dictionary relating the two, one can pull ideas directly from one field to solve a problem in the other. Therefore, another aim of this tutorial article is to provide our plasma colleagues with a jargon-light understanding of the key properties of stellar systems, to offer them the theoretical minimum necessary to engage with the modern stellar dynamics literature, to point out the many direct analogies between stellar- and plasma-kinetic calculations, and ultimately to convince them that stellar dynamics and plasma kinetics are, in a deep, beautiful and useful sense, the same thing.

The evolution of our Universe is a great unfolding drama in which galaxies play a fundamental role. These galaxies grow and die; they pulse and throb; they swallow gas and they belch it back out again; and they merge together and they tear one another apart. The study of the collective dynamical behavior that occurs inside galaxies—and of the violent lives they lead, the tumultuous dark matter environments in which they sit, and the star clusters they cradle—is best termed the kinetic theory of stellar systems. This is the subject with which we are concerned in this tutorial article.

The kinetic theory of stellar systems rests upon two main theoretical pillars, and our tutorial article aims to introduce colleagues to both of them. The first pillar is the theory of orbits. By this, we mean the orbital motion of stars and/or dark matter particles in gravitational fields, or, more broadly, the dynamics of point particles subject to some Hamiltonian flow. The basic language of this topic is angle–action coordinates. These are not only the natural variables with which to describe smooth orbits in quiescent gravitational fields but also to understand how these orbits are modified by perturbations of various kinds. These perturbations tend to have the most dramatic effect on orbits near resonances, where the frequency of the perturbation matches the natural frequency of the orbit. The first aim of our tutorial article is to introduce colleagues to these concepts. We employ as little heavy formalism as possible, assuming that readers already have a passing familiarity with Hamiltonian mechanics, our aim being to help them understand how they apply in the stellar dynamical context.

The second pillar upon which our subject is built is the collective dynamics of many bodies, and here our theoretical foundation is plasma kinetics. Why is this the case in a treatise on gravity? Well, every high school student knows that electromagnetism and gravity are intellectually adjacent subjects: up to a sign, the Coulombic electrical force and the Newtonian gravitational force are equivalent. What is less well appreciated is that this adjacency extends much further than simple ±1/r2 force laws. Indeed, the mathematical methods developed to probe the collective behavior of protons and electrons in astrophysical and fusion plasmas are in many cases precisely the same as those needed to describe the dynamics of stars, spiral arms, and dark matter in galaxies like the Milky Way. Concepts like two-body relaxation, linear response theory, Landau damping, resonant wave–particle interactions, and phase mixing all had origins in plasma theory but are now part of the standard lexicon of galactic dynamics. However, these two fields long ago went their separate ways, and in much of today's literature are separated by a near-impenetrable wall of jargon. In the second part of this tutorial article, we develop the tools needed to describe collective behavior in galaxies. We aim to provide colleagues with the theoretical minimum needed to appreciate galactic-dynamical problems. In so doing, we aim to convey—to those on both the stellar-dynamical and plasma-theoretic sides of the aisle—just some of the deep connections between the two fields, and to encourage much greater crosstalk between these two estranged, yet intimately related, subjects.

In contrast with most other treatises on stellar (or “galactic”) dynamics and galaxy evolution, we approach our subject from the point of view of theoretical physics rather than practical astronomy. Thus, we spill almost no ink over such crucial concepts as mass-to-light ratios and chemical evolution and dust extinction (though we happily refer the reader to the many excellent texts that do).1–3 Our models are necessarily rather simple, in some cases even crude, since our goal here is to provide physical insight rather than to accurately reproduce observational data. On the other hand, we do not simply develop mathematical formalism for its own sake. Almost every formal calculation we present is accompanied by a concrete example of its application. We hope this approach underscores our key message, which is that the kinetic theory of stellar systems is not merely the mathematical fancy of a few specialists. Instead, it can and should aid our understanding of galaxy evolution; and it passes the basic test of all good physics, by telling us things about the real world that we did not already know.

To set the context for this tutorial article, and to get an idea of the characteristic scales involved, we invite you to consider Fig. 1. Panel (a) shows an image of a disk galaxy called the Southern Pinwheel, captured by the la Silla Observatory in the mountains of Chile. We have chosen this image not because it is visually stunning—though who could deny that?—but because it exhibits several of the paradigmatic features of disk galaxies, and is in fact thought to resemble our own Galaxy, the Milky Way. Like the Milky Way, the Southern Pinwheel has an elongated or barred structure at its center and exhibits beautiful spiral arms. The red knots strung along the arms are H α emission from regions of recent star formation, meaning stars have been born there in the last few tens of Myr (where 1 Myr 106 years), while the bluer light comes from much older stars, up to ages of 10 Gyr (where 1 Gyr 109 years; the Big Bang happened 13.7 Gyr ago, and galaxies formed in the first few Gyr that followed).3 The image is haphazardly obscured by thin, dark dust lanes, regions wherein visible starlight has been absorbed. The whole galaxy is a few tens of kiloparsecs (kpc) across, where 1 pc 3×1016 m. (The distance from the Sun to the next-closest star, Proxima Centauri, is of order 1 pc.) From our perspective, we are seeing the Southern Pinwheel galaxy more or less face-on, but if we were to see it from the side we would find that it was very thin, with a central bulge—like two fried eggs glued back to back. It consists of around N1011 stars, most of which are orbiting in this very thin, nearly two-dimensional disk plane, and they mostly move on (quasi-)circular orbits around the center of the galaxy, all rotating in the same direction. This means that the motion of stars in the galaxy is highly ordered, a.k.a. dynamically cold. Another way to say this is that the stellar disk, like a plasma beam, has low-velocity dispersion. The key dynamical timescale in this disk is the azimuthal orbital period of one of the stars, which is typically tdyn100 Myr, and this timescale increases with orbital radius. Since galaxies are 10 Gyr old, we deduce that most stars have completed fewer than 100 orbits in the lifetime of the galaxy.

FIG. 1.

(a) The spiral galaxy M83 (the Southern Pinwheel). Credit: ESO. (b) The globular cluster M80. Credit: NASA, The Hubble Heritage Team, STScI, AURA.

FIG. 1.

(a) The spiral galaxy M83 (the Southern Pinwheel). Credit: ESO. (b) The globular cluster M80. Credit: NASA, The Hubble Heritage Team, STScI, AURA.

Close modal

Now let us contrast this with panel (b) of Fig. 1, which is an image of the globular cluster M80. This is another optical image, taken with the Hubble Space Telescope. Globular clusters are much smaller than galaxies, with typical radii of 10pc—in fact, a galaxy like the Southern Pinwheel typically contains many hundreds or even thousands of globular clusters. Despite their relative smallness, globular clusters have a much higher density of stars than do galaxies. The cluster M80 contains something like N105 stars, so the typical distance between stars is (105/pc3)1/30.02pc, of order 50 times smaller than the distance from the Sun to Proxima Centauri. Globular clusters typically contain very little gas and hence exhibit almost no ongoing star formation. This is why there is a much narrower range of colors in this image compared to the one beside it. Moreover, unlike disk galaxies, globular clusters are (nearly) spherically symmetric, so the image of M80 would not look very different if we were able to view it from another angle. Also unlike disk galaxies, they are disordered, dynamically hot systems, meaning that the motion of their stars has no strongly preferred direction (like the isotropic Maxwellian in an electrostatic plasma). More mathematically, at any given point r in the cluster, the local phase space distribution function f(r,v) depends primarily on the magnitude, rather than the direction, of the velocity v. Though they are of a comparable age to galaxies—having formed 10 Gyr ago—globular clusters are much more “dynamically old” than galaxies, since the typical orbital period of their stars is only tdyn105 yr. Thus, each star in a globular cluster has undergone 105 orbits in its lifetime.

To summarize: disk galaxies are dynamically young and cold, globular clusters are dynamically old and hot.

The brief astronomy lesson above motivates us to introduce two simple mental models which we will refer to throughout this tutorial article: the cold, rotating axisymmetric disk and the hot, nonrotating spherical cluster—see Fig. 2 for an illustration.

FIG. 2.

Schematic illustration of typical orbits in (a) cold disks and (b) hot spheres (idealized versions of the archetypical systems shown in Fig. 1). In a cold disk, most orbits are nearly circular and confined to low inclinations relative to the disk plane. By contrast, in a hot sphere the orbits are near-isotropically distributed and have a broad range of eccentricities.

FIG. 2.

Schematic illustration of typical orbits in (a) cold disks and (b) hot spheres (idealized versions of the archetypical systems shown in Fig. 1). In a cold disk, most orbits are nearly circular and confined to low inclinations relative to the disk plane. By contrast, in a hot sphere the orbits are near-isotropically distributed and have a broad range of eccentricities.

Close modal

These are really just (over-)simplified versions of disk galaxies and globular clusters, respectively, and constitute clean examples of stellar systems that exhibit completely different dynamical behaviors. Some of the characteristic properties of these two prototypical systems are summarized in Table I.

TABLE I.

Characteristic scales for the two prototypical systems considered in this article, the disk galaxy and the globular cluster.

Disk galaxy Globular cluster
Mental model  Cold, rotating axisymmetric disk  Hot, nonrotating sphere 
Size   10s of kpc  10 pc 
Age   10 Gyr   10 Gyr 
Number of stars N  1011  105 
Orbital period tdyn   100 Myr   0.1 Myr 
Disk galaxy Globular cluster
Mental model  Cold, rotating axisymmetric disk  Hot, nonrotating sphere 
Size   10s of kpc  10 pc 
Age   10 Gyr   10 Gyr 
Number of stars N  1011  105 
Orbital period tdyn   100 Myr   0.1 Myr 

We will find that the cold disk is analogous to a nonthermal plasma (with collective oscillations/instabilities and wave–particle interactions playing a key role), while the hot sphere is more analogous to a thermal plasma or a classical gas (with a kinetic evolution primarily driven by discrete, local two-body encounters). However, there are many subtleties and differences to be encountered along the way, and gravitational systems are capable of behaviors that have no analog in gas or plasma kinetics.

It is also worth mentioning that throughout this article, for concreteness, we often refer to the motion of “stars.” However, we are also very much interested in the motion of dark matter particles, which obey the same set of equations as stars, and are just as important a component of galactic dynamics.

The rest of this tutorial article is organized as follows. In Sec. II, we introduce some of the basic properties of N-body self-gravitating systems, discuss the extent to which self-gravitating systems play by the same thermodynamic rules as gases and plasmas, and give a qualitative idea of how these systems evolve on long timescales. Then, we begin with the theory of orbits. In Sec. III, we approximate a galaxy's gravitational potential by a smooth “mean field” potential and study the orbits of individual stars in this mean field, with the help of angle–action variables. In Sec. IV, we show how individual orbits respond to both linear and nonlinear perturbations and describe the theory of resonant trapping. Our proper study of kinetic theory begins in Sec. V, where we introduce the Klimontovich equation for an N-body system's phase space distribution function (DF) and derive a generic set of equations governing its evolution. In Sec. VI, we study the self-consistent linear response of stellar systems to weak perturbations, and then, in Sec. VII, we use those linear solutions to construct a quasilinear theory of statistically averaged evolution, culminating in the Balescu–Lenard equation. Our final topic, in Sec. VIII, is nonlinear kinetic theory, particularly in the presence of orbit trapping and with applications to bar-halo friction and the saturation of spiral instabilities.

Throughout these sections, we continually draw direct analogies between gravitational and plasma systems, as well as giving many concrete examples of the equations working in practice. Finally, in Sec. IX, we touch on some other areas of galactic dynamics that have plasma-theoretic analogues, and give references for further reading.

We mention here that the standard reference in this subject is the textbook Galactic Dynamics4 by Binney and Tremaine, which we will hereafter refer to as BT08. While our article is far narrower in scope than BT08, much of what we discuss cannot be found there or in any other textbook or publication. The main exceptions to this rule are the introductory sections Secs. II and III, where by necessity there is significant overlap with BT08. Where helpful, we point the reader to specific sections of BT08 in which more details can be found.

Finally, this tutorial grew out of notes that we both compiled as lecturers of the Kinetic Theory of Stellar Systems course at Oxford. Lecture materials, problem sets, etc., can be found online.5 

The simplest theoretical approach to describing a stellar system is to imagine it as a gas of stars, and to apply the traditional methods of equilibrium thermodynamics and two-body collision theory. In this section, we will attempt to do just that, and we will find that while these classical notions can be useful in characterizing stellar systems, they also break down completely in some circumstances—and that in the strict sense, in self-gravitating systems there is no such thing as equilibrium. The insights we gain from this failure will point us toward the more complete kinetic description that we will develop in later sections.

The first thing to mention is that the speeds of stars in galaxies and star clusters are nearly always of order tens or hundreds of km s−1, much less than the speed of light. This means that relativistic corrections to Newtonian gravity can almost always be ignored. There are exceptions, particularly for stars orbiting around supermassive black holes in galactic centers,6 but for our purpose, Newtonian gravity is an excellent approximation. Also, while stars do exhibit a spectrum of different masses, for simplicity we will always assume that our system consists of a single mass species, unless we explicitly state otherwise.

We therefore consider a system of N1 stars of equal mass m (usually, but not necessarily, taken to be of order a solar mass M), interacting only via Newtonian gravity. The gravitational potential energy due to the interaction between two stars with positions r and r is
(1)
where G=4.3×103 pc M1 km s−1 is Newton's gravitational constant (here M=2.0×1030 kg is the mass of the Sun). Newton's second law then tells us that the force on particle i is
(2)
where vidri/dt is the ith star's velocity.

Gravity is an exclusively attractive force—two masses always attract and never repel—which means that unlike in plasma, there is no shielding, nor is there any sensible notion of a “Debye length” over which forces are screened. Since N is large (1011 in the cold disk, 105 in the hot sphere), and since every star is attracting every other star, what stops the entire system from collapsing under its own weight and forming a huge black hole? The answer depends on the context. In the hot sphere, the random motions of the stars counterbalance the gravitational attraction (and can be thought of as an effective “pressure”). In the case of a cold disk, it is the rotational support (i.e., the fact that the disk is spinning) that counterbalances gravity.

To put these notions on a more mathematical basis, we now derive the virial theorem. For an isolated N-body system, we can choose the origin of our coordinates to be the fixed center of mass. Then, the moment of inertia of the N-body system with respect to the origin is
(3)
Let us now take the second derivative of this quantity with respect to time
(4)
Using the equation of motion (2), we can write the second term on the right-hand side as
(5)
For an equilibrium system, the moment of inertia can fluctuate around some average value taken over a few dynamical times, but that average value must not itself change. Denoting such a time average by ., we have that for a system in virial equilibrium
(6)
where
(7)
are the averaged total kinetic and total potential energies of the system, respectively. In other words, in virial equilibrium, the kinetic and potential parts of the energy of the system balance each other to within an order-unity factor.

A globular cluster (right panel of Fig. 1) is an excellent example of a system in virial equilibrium. Its smooth, near-spherical shape reflects the fact that it is not, on average, undergoing any violent bulk motions, despite the stars inside it whizzing around like a swarm of bees on the timescale tdyn. Note that even though a cluster is in virial equilibrium, that equilibrium does still evolve; but the evolution of this “quasi-stationary state” occurs over a timescale much longer than tdyn, as we will see (Sec. II C).

The virial theorem allows us to make quick order-of-magnitude estimates about typical random velocities of stars, σ. For instance, consider that a globular cluster has a mass MNm105M and a radius R of a few parsecs. Using the virial theorem, we can calculate the typical velocity dispersion of stars in this cluster. Roughly, we have KMσ2/2 and WGM2/R, so using (6), we get
(8)
in agreement with what is observed (and much less than c, justifying our use of Newtonian gravity).

Finally, let us contrast globular clusters with an example of a system, which is very much not in virial equilibrium. Figure 3 shows the Antennae system, which consists of two galaxies that have collided and are in the process of merging.

FIG. 3.

The Antennae galaxies NGC4038 and NGC4039. These two colliding galaxies are in the process of merging into a single structure. It is easy to see by eye that this system is going to undergo significant morphological changes on the typical timescale tdyn. Hence, it is not in virial equilibrium. Credit: ESA/Hubble.

FIG. 3.

The Antennae galaxies NGC4038 and NGC4039. These two colliding galaxies are in the process of merging into a single structure. It is easy to see by eye that this system is going to undergo significant morphological changes on the typical timescale tdyn. Hence, it is not in virial equilibrium. Credit: ESA/Hubble.

Close modal

For this system, Eq. (6) does not hold; instead, energy is rapidly being transferred between kinetic and potential forms on the timescale tdyn as the two galaxies perform their cosmic dance. However, we know from observations and numerical simulations of such galaxies that they ultimately do virialize and that this virialization is nearly complete after only a few tdyn (which in this case still means a few 109 years or more). This rather rapid evolution toward a virialized state is called violent relaxation.7 The precise mechanics of violent relaxation, and the question of which particular virial equilibrium state the system will fall into, are a matter of ongoing research not only in stellar dynamics but also in plasma physics.8 

For the rest of this article, we will only consider virialized stellar systems whose bulk properties are evolving on timescales tdyn. Isolated disk galaxies and globular clusters both fall into this category.

We mentioned above that virial equilibria are not true equilibria of stellar systems—rather, they are just quasi-equilibria that evolve on timescales much longer than the dynamical time tdyn. Yet why is this the case? After all, this is not what happens in either a gas or an electrostatic plasma: there, the natural thermal equilibrium distribution of particles' positions and velocities is a homogeneous Maxwellian. Unfortunately, gravity does not play by those thermodynamic rules.

To see why, suppose we wished to describe a globular cluster as a “stellar gas” in thermal equilibrium. We could make a natural definition of the cluster's temperature T by relating it to the total kinetic energy
(9)
where N is the number of stars. Then, the total energy of the cluster Etot is the sum of the kinetic and potential energies
(10)
However, from the virial theorem (6), we know that W=2K, so Etot=K<0. The fact that Etot is negative is not unusual: it just means that the system is gravitationally bound. The heat capacity is then
(11)
Notice that C is negative. This is strange: if you add heat to the system (increase its total energy Etot), it cools down! In other words, the more energy is given to the cluster, the lower the amplitude of the random velocities inside it.

The negativity of the heat capacity implies that it is impossible for a globular cluster ever to reach a thermal equilibrium. For instance, consider a cluster at temperature T immersed in a heat bath of the same temperature. Suppose there is some infinitesimal negative fluctuation in the cluster's energy, |δE|. By Eq. (11), this will cause the cluster to achieve a higher temperature by δT=(|δE|)/C>0, i.e., with higher velocity dispersion. Since the temperature is now higher, the system loses more heat to the bath, which then makes its temperature increase even further, and so on without limit.

Nor can one achieve thermal equilibrium in an isolated stellar system (i.e., in the absence of any heat bath). The reason is that any star with a sufficiently large velocity is unbound from the system and hence will escape to infinity, a process known as evaporation. This precludes the system from ever attaining a Maxwellian velocity distribution function (DF), since the cluster would otherwise have a finite phase space density out to v.

One might argue at this point that the Maxwellian is only an idealization, and not being able to exactly realize a Maxwellian DF is a rather pedantic argument for the nonexistence of thermal equilibrium. (Indeed, all speeds are ultimately bounded by the speed of light, but this does not stop us from employing a Maxwellian DF when describing plasmas!) To justify our claim further, let us calculate the rate at which stars would evaporate from a hypothetical Maxwellian stellar system. As we will argue in Sec. III, the motion of individual stars on the timescale tdyn is very accurately described by a mean field approximation, in which we ignore the complexities of the N-body system and instead consider the stars to be moving in a smooth gravitational potential Φ(r), in precisely the same way as one imagines the unperturbed motion in a homogeneous electrostatic plasma to be a straight line. Then, the energy per unit mass of any star with velocity v and position r is v2/2+Φ(r), meaning that any star with v>vesc(r)2Φ(r) will evaporate. Averaging this over the entire system gives us the typical escape speed
(12)
where ρ0(r) is the mass density of stars at position r and M is the total cluster mass. This is obviously related to the total potential energy of the system W. Being careful not to over-count the pairwise interactions, and using the virial theorem (6), we get
(13)
where σ2=N1i=1Nvi2 is the cluster's velocity dispersion. Supposing the cluster did have a Maxwellian DF with the same velocity dispersion, we can estimate the fraction of unbound stars by
(14)
In other words, we would expect about 1% of an isolated stellar system to evaporate away on the timescale of refilling of the Maxwellian tail, which is roughly a relaxation time (see Sec. II C for the definition of this term). The equivalent situation in a collisional plasma would be that 1% of the electrons disappeared every mean-free time! Thus, even if the cluster “wanted” to reach a Maxwellian steady-state, it could not do so because stars are constantly evaporating from the high-velocity tail.

Evaporation is a real and important effect in cluster dynamics; indeed, there are certain star clusters called open clusters with N103 (some of which you can see with the naked eye, like the Pleiades) whose lifetime is limited by evaporation to a few hundred Myr. In other words, there were star clusters visible to the dinosaurs that have since evaporated into the vastness of space and are no longer visible to us. Yet at no point was their DF ever Maxwellian.

With minor modifications, the above arguments also hold for cold, differentially rotating disks, as well as other virialized self-gravitating structures. The bottom line is that strict thermal equilibrium is impossible in a self-gravitating system, and there is no universal requirement for stellar systems to “tend toward the Maxwellian” as gases and plasmas do. Fundamentally, this stems from the purely attractive nature of gravity: since the potential energy of a very loosely bound particle decays like 1/r, there are infinitely many ways to rearrange the phase space coordinates of very distant particles with no energy cost. In the language of statistical mechanics, this means that in isolated self-gravitating systems there is no microcanonical ensemble, no canonical ensemble, energy is not extensive, and entropy is not bounded (so there exists no maximum entropy state4,9). Concepts like entropy are still useful in stellar dynamics, particularly if the phase space is bounded by, e.g., tidal truncation (meaning the system is not truly isolated) or by the approximate conservation of certain invariants (meaning there is a quasi-maximum entropy state on short enough timescales).10 However, in general, one has to be very careful about naive application of “standard” statistical mechanical results to gravitational systems.

We have argued that there exists no unevolving thermal equilibrium state for stellar systems. What, then, does happen to stellar systems after they reach virialization? Roughly speaking, the answer is that they move slowly from one virialized quasi-equilibrium state to another on a timescale that we call the relaxation time, trelax.

How do we calculate the relaxation time? Roughly speaking, it is the timescale over which a typical star “forgets” its initial orbit. A proper theory of relaxation requires an understanding of stellar orbits (Sec. III), but for now the intuition from gas or plasma kinetics is good enough. Hence, we simply ask, supposing an unperturbed star was moving in a straight line, how long would it take for its velocity v to de-correlate from its initial value, i.e., to accumulate an impulse |Δv||v|?

In a gas, these impulses occur because a given molecule collides with some other molecule roughly once per mean free time. Hence, after a few such collisions, this molecule has completely forgotten its initial state [see Fig. 4(a)].

FIG. 4.

(a) Relaxation in a gas involves physical contact collisions (or at least, very short-range interactions) between molecules, meaning a molecule's velocity can be changed significantly after a single collision. (b) In a stellar system, by contrast, two-body encounters are mostly weak, long-range nudges. Later in the article, we will deal with collective effects, whereby one must think beyond bare two-body processes and account for the coherent oscillations of very large numbers of stars at once.

FIG. 4.

(a) Relaxation in a gas involves physical contact collisions (or at least, very short-range interactions) between molecules, meaning a molecule's velocity can be changed significantly after a single collision. (b) In a stellar system, by contrast, two-body encounters are mostly weak, long-range nudges. Later in the article, we will deal with collective effects, whereby one must think beyond bare two-body processes and account for the coherent oscillations of very large numbers of stars at once.

Close modal

This is not so in a stellar system. Even in a dense system like a globular cluster, the distance between single stars is such that direct physical collisions are very rare.11 Instead, in a cluster, the evolution of a star's orbit is mostly driven by a very large number of weak nudges from all the other stars it is interacting with simultaneously [Fig. 4(b)]. In the coming subsections, we will follow Chandrasekhar and estimate the relaxation timescale due to a series of such nudges (Sec. II C 1), a calculation completely analogous to the Spitzer two-body collision calculation from plasma kinetics. We will also pause to comment on the confusing words “collisional” and “collisionless” as applied to stellar systems (Sec. II C 2). Then (Sec. II C 3), we will show that Chandrasekhar's theory is very successful at describing the evolution of hot spheres (in the same way that Spitzer's theory works for describing collisional plasmas) but fails completely to describe the evolution of cold disks (as does Spitzer's theory when dealing with collisionless plasma out of equilibrium). This will be the jumping-off point where stellar dynamics ceases to consist of a series of rough estimates; thereafter, things get quantitative.

1. Chandrasekhar's theory of two-body relaxation

Here, we give the simplest version of Chandrasekhar's calculation of the relaxation time,12 based on his theory of two-body relaxation, and primarily following Sec. 1.2.1 of BT08. (For a more complete treatment of Chandrasekhar theory, see Chap. 7 of BT08.) What we will present is a gravitational version of Rutherford scattering.

To begin, we will ignore all the complications associated with real stellar systems and instead consider a single “test” star moving in a statistically homogeneous box of static “field” stars of mass m. Let the test star travel in a perfectly straight line until it approaches a field star at impact parameter b. The goal now is to calculate the nudge to the test star's velocity, δv(b).

Since we are going to assume that the interaction between the two stars is weak, to lowest order we can approximate the trajectory of the test star as a straight line throughout the encounter (see Fig. 5). Suppose without loss of generality that the closest approach of the test star to the field star occurs at t=0. We may then calculate the total impulse it receives by integrating from t= to t=. Then, the acceleration the test star feels in the direction parallel to v while flying toward the field star (t<0) is exactly canceled by the retardation it feels as it flies away (t>0), meaning that there is no overall velocity change in the parallel direction. Hence, we need only to calculate the total perpendicular velocity nudge δv.

FIG. 5.

A test star encounters a field star of mass m with impact parameter b. Assuming the encounter to be weak, the lowest-order nudge to the test star's velocity, δv, can be calculated by integrating along the unperturbed trajectory (straight dotted line).

FIG. 5.

A test star encounters a field star of mass m with impact parameter b. Assuming the encounter to be weak, the lowest-order nudge to the test star's velocity, δv, can be calculated by integrating along the unperturbed trajectory (straight dotted line).

Close modal
The perpendicular acceleration field felt by the test star is (see Fig. 5 for the definition of r and θ)
(15)
Integrating over all time, we get
(16)
As one might expect, the more massive the field star, and the smaller the impact parameter b, the larger the velocity impulse the test star experiences. Moreover, it is sensible that the impulse decays with v, since the larger is v the shorter is the duration of the encounter. On the other hand, the formula (16) also predicts its own failure: if δv is sufficiently large that it is comparable to the original speed v, then the assumed straight-line trajectory is obviously a poor approximation. Note that for a fixed m and v, this breakdown occurs at a critical impact parameter b90=2Gm/v2, so-called because this in fact produces a 90-degree deflection. For impact parameters bb90, one ought really to solve the two-body encounter problem without making any approximations (which is not difficult—see Sec. 3.2 of BT08). Luckily, such strong encounters are rare and make little difference to the bulk evolution of the system. We will ignore them from now on.
Chandrasekhar's next step is to assume that relaxation in a stellar system consists of nothing more than an uncorrelated series of these weak two-body encounters. Under this assumption, we can calculate the relaxation time as follows. Let the number density of stars in the box be n. Then, in a time T, the field star passes a total of nvT2πbdb field stars at impact parameter [b,b+db]. The mean square kick to the velocity that the test star receives in time T is therefore
(17)
where lnΛln(bmax/bmin) is the Coulomb logarithm, which is familiar from the corresponding plasma calculation of charged particle scattering.13 

We seem to have a problem here, since taking either bmin0 or bmax, or both, causes the total velocity impulse to diverge (albeit in the most undramatic way, i.e., logarithmically). So, what values of bmin and bmax should we take? Usually, one takes bmax to be comparable to the scale of the system R, since there cannot be any encounters more distant than this (this is different from plasma, in which one can safely take bmax of order the Debye screening length13). Moreover, one typically takes bminb90 because, as we mentioned above, strong encounters are usually rare and unimportant. In fact, while setting the Coulomb logarithm accurately is an important problem for those modelling globular cluster evolution,14 from a theorist's standpoint the answer is it does not matter too much because the model is only heuristic anyway. For instance, encounters with impact parameters b on the order of the size of the system certainly cannot be thought of as straight lines, since they must take into account the curved geometry of the true orbits of the stars (Sec. II C 3). The standard choice ΛR/b90 is good enough for obtaining order-of-magnitude estimates, which is all we care about here.

If we now set δv2 equal to the original squared speed v2, we can identify T=t2BR as the two-body relaxation time
(18)
Let us relate this more clearly to the number of stars in the system N. Let the side-length of the box be R—which is a proxy for the radius of the star cluster or galaxy in which we are interested. Then, we know from the virial theorem that typically v2GNm/R [see Eq. (8)]. We may also set the typical crossing time to be tdynR/v and the typical number density to be nN/(4πR3/3). Then, taking bmax=R and bmin=b90=2Gm/v2, we get ΛRv2/(Gm)N, so that
(19)
To be even more heuristic, we will often drop the factors of 0.1 and lnN and simply state that the two-body relaxation time is on the order of t2BRNtdyn.

Plugging in characteristic numbers from Table I, we find that for a globular cluster t2BR10 Gyr, comparable to the age of the cluster. This suggests that the simple physics of uncorrelated two-body encounters is of major importance for determining the evolution of globular clusters. However, for a disk galaxy, this theory suggests evolution should only occur on the excruciatingly long timescale t2BR1010 Gyr, nine orders of magnitude longer than the age of the Universe.15 Yet, we know from observations and simulations that disk galaxies can evolve significantly on a timescale of a few Gyr. The estimate (19) tells us that star-star encounters are much too inefficient to have produced this evolution, so a more sophisticated kinetic theory is required (Sec. II C 3).

Note that Eq. (19) is directly analogous to the inverse two-body collision time in an electrostatic plasma, tcoll(Λp/logΛp)ωp1, where Λp is the plasma parameter (the number of particles per Debye sphere) and ωp is the plasma frequency.16 Thus, there is a direct analogy between the plasma parameter Λp and the number of stars N (at least for stellar systems whose evolution is driven purely by finite-N fluctuations, with little collective amplification). In particular, the inverse of these quantities, 1/Λp1 and 1/N1, are the small numbers with which one can expand the BBGKY hierarchy in weakly coupled plasmas and stellar systems, respectively.17 These numbers being small are equivalent to the statement that the kinetic energy of each particle is large compared to the fluctuations in the potential energy (recall that the mean potential energy in a stellar system is of order the kinetic energy, see Sec. II A).

2. Collisional and collisionless

It is worth commenting here on the (often extremely confusing) use of the words collisional and collisionless when describing stellar systems. These words were partly inherited from plasma theory, where they have a fairly concrete meaning: roughly, interactions that occur on scales smaller than the Debye scale are collisions, and if these can be ignored, then the system is collisionless. In a stellar system, however, there is no shielding, no Debye length, and no natural way to decouple small-scale and large-scale gravitational potential fluctuations—every star is interacting gravitationally with every other star, always.

Sometimes in the gravitational context, the word collisionless is used as a shorthand for “a system whose naive two-body relaxation time (19) is much longer than its age” (e.g., a disk or an elliptical galaxy), while collisional systems are those with t2BR less than or comparable to their age (basically, star clusters). Alternatively, the word “collisional” is sometimes used to refer to self-interacting dark matter (SIDM),18 a model of dark matter involving a finite cross section for two dark matter particles to annihilate, by contrast with standard collisionless dark matter, which has no such cross section. We could give many other examples of these words being used for slightly different purposes. It is therefore no surprise that this semantic issue frequently causes confusion among nonexperts.

There are two sensible ways to deal with the issue. The first way is to call a system collisional if and only if the finite number of particles N matter to its evolution, and to call it collisionless otherwise. The second way is to avoid using the words collisional and collisionless altogether. We advocate the second way, although these words are so ingrained in the literature that sometimes this is impossible.

3. Where do we go from here?

It turns out that when applied to the evolution of hot spherical clusters, Chandrasekhar's picture does a remarkably good job of estimating the true relaxation time trelax and various other evolutionary properties,14 although the value of the Coulomb logarithm has to be tweaked.22,23 Moreover, one can use Chandrasekhar's ideas to spin up a full kinetic theory—i.e., a kinetic equation of the form f/t=C[f], with C calculated by assuming relaxation consists of two-body encounters as above—and this works well in hot spheres too.24 To demonstrate this, we present Fig. 6 (adapted from Joshi et al.19), which shows the evolution of “Lagrange radii” of a spherical globular cluster in a numerical simulation. These Lagrange radii are the radii enclosing a fixed percentage of the cluster's mass at any given time. First of all, we see that the inner Lagrange radii gradually decrease with time (corresponding to the contraction of the cluster core, leading ultimately to the “gravothermal catastrophe”25) and the outer radii slowly increase (corresponding to the expansion of the cluster's envelope). This evolution happens very slowly (the time unit on the horizontal axis is the two-body relaxation time t2BRtdyn), corresponding to the gradual evolution from one virialized state to another. Looking more closely, one can see that for each Lagrange radius in the panel there are actually two lines plotted. One of these was calculated from a direct N-body simulation of the “exact” equations of motion of all the stars, whereas the other employed a Monte–Carlo calculation based upon Chandrasekhar's theory of two-body encounters. The fact that across much of the figure, these two lines are indistinguishable is testament to the success of Chandrasekhar theory in describing hot spherical clusters.

FIG. 6.

Long-term evolution of a globular cluster. Adapted from Joshi et al., Astrophys. J. 540, 969 (2000).Copyright 2000 IOP.19 The lines show the evolution of the “Lagrange radii” which enclose 0.35%, 1%, 3.5%, 5%, 7%, 10%, 14%, 20%, 30%, 40%, 50%, 60%, 70%, and 80% of the cluster mass, respectively. In each case, both the results of a direct N-body simulation and of a Monte–Carlo calculation20 based on Chandrasekhar's theory are shown. For most times and Lagrange radii, the two sets of results are indistinguishable.

FIG. 6.

Long-term evolution of a globular cluster. Adapted from Joshi et al., Astrophys. J. 540, 969 (2000).Copyright 2000 IOP.19 The lines show the evolution of the “Lagrange radii” which enclose 0.35%, 1%, 3.5%, 5%, 7%, 10%, 14%, 20%, 30%, 40%, 50%, 60%, 70%, and 80% of the cluster mass, respectively. In each case, both the results of a direct N-body simulation and of a Monte–Carlo calculation20 based on Chandrasekhar's theory are shown. For most times and Lagrange radii, the two sets of results are indistinguishable.

Close modal

On the other hand, as mentioned at the end of Sec. II C 1, Chandrasekhar's theory fails completely to describe the evolution of cold disk galaxies. First of all, disk galaxies develop long-range collective structures like the spiral arms in Fig. 1(a), and these have no place in Chandrasekhar's picture. Also, Chandrasekhar's theory drastically overestimates the timescale on which these systems relax. As an example of this, consider Fig. 7 (adapted from Frankel et al.21). This figure was generated by fitting a simple diffusion model to the orbital actions (to be introduced in Sec. III C) of stars in the Milky Way using data from Gaia, a space observatory, which has been mapping the kinematics of stars in our Galaxy since 2014. Panel (a) shows the diffusion of the stellar angular momenta (essentially the guiding radius of near-circular orbits around the galaxy), while panel (b) shows the evolution of those same stars' mean radial actions (essentially their orbital “eccentricities”). In both panels, we see that significant evolution can occur on the timescale of a few Gyr. To get an estimate for the true relaxation time in this system, note that the orbital angular momenta of stars on circular orbits (speed vϕ250 km s−1) near the Solar radius in the Milky Way (R8 kpc) is L=Rvφ2000 kpc km s−1. The relaxation time we infer from Fig. 7(a) is therefore of order trelax(2000/619)2×6 Gyr 60 Gyr, and many orders of magnitude shorter than would be suggested by the Chandrasekhar estimate (19). Even though this relaxation time is still several times longer than the age of the Galaxy (10 Gyr), it can still have a major effect on galaxy evolution, as we will see in Sec. VIII B 1.

FIG. 7.

Adapted from Frankel et al., Astrophys. J. 896, 15 (2020). Copyright 2020 IOP.21 Each panel shows the evolution of an orbital integral of motion [namely, (a) the component of angular momentum perpendicular to the galactic disk and (b) the radial action, which will be introduced properly in Sec. III C] of stars in the Milky Way over 6 Gyr, as inferred from Gaia data. In the right panel, different colors correspond to groups of stars with different initial Galactocentric radii.

FIG. 7.

Adapted from Frankel et al., Astrophys. J. 896, 15 (2020). Copyright 2020 IOP.21 Each panel shows the evolution of an orbital integral of motion [namely, (a) the component of angular momentum perpendicular to the galactic disk and (b) the radial action, which will be introduced properly in Sec. III C] of stars in the Milky Way over 6 Gyr, as inferred from Gaia data. In the right panel, different colors correspond to groups of stars with different initial Galactocentric radii.

Close modal

Chandrasekhar's theory fails because it is based on a number of false assumptions, for instance:

  • Stellar systems are inherently inhomogeneous (see Fig. 1) but the theory assumed homogeneity.

  • Because of the inhomogeneity, unperturbed stellar orbits are not endless straight lines, but instead are quasiperiodic (Sec. III). This leads to resonant interactions.

  • Stars do not behave in an independent, uncorrelated way but instead tend to oscillate collectively. Thus, one cannot replace all interactions between stars with a superposition of uncorrelated two-body encounters.

These statements apply to both globular clusters and disk galaxies, which raises the question: Why does Chandrasekhar's theory work to describe the former but not the latter? The answer lies in the fact that globular clusters are dynamically hot, whereas disk galaxies are dynamically cold. The hotness of clusters means that they act like thermal plasmas: collective effects are weak and successive fluctuations are mostly uncorrelated, as different parts of the system have no way to efficiently communicate with one another. On the other hand, the highly ordered orbital structure of disk galaxies means that great numbers of stars can have long-lived coherent dynamical interactions. The result is that disk galaxies, like nonequilibrium plasmas, are replete with large-scale collective oscillations and possible instabilities. Such effects are able to redistribute energy and angular momentum much more efficiently—and in a fundamentally different way—than the mechanism of incoherent two-body relaxation.

It follows that to describe the behavior of disk galaxies, we need to leave Chandrasekhar's theory behind and develop a new kinetic theory, one capable of dealing with the complications listed above. That is one of the aims for the rest of this tutorial article. However, before we can do kinetics, we must learn how to describe the orbits of stars in inhomogeneous, but still fixed and smooth, mean field gravitational potentials (Sec. III), and then figure out what happens to those orbits when they are perturbed (Sec. IV).

As we mentioned in Sec. II B, on the timescale tdyn the orbits of stars in a virialized N-body system can be very well described by ignoring the discrete nature of the system and instead treating individual stars as test particles orbiting in a smooth potential. This is a good approximation for two reasons: (i) the gravitational force between two objects is long range, decreasing as the square of the distance between them, and (ii) the number of stars in the system, N, is very large. The consequence of (i) is that the force acting on any individual star is typically not dominated by its nearest neighbors, but rather by the collective gravity of simultaneous distant interactions. The consequence of (ii) is that the Poisson fluctuations in the number of stars in a given sub-volume tends to be small, so the collective gravitational force due to stars in this sub-volume fluctuates only weakly on the timescale tdyn. The average value of this collective force, combining all the sub-volumes, is then typically very smooth (see also Sec. 1.2 of BT08).

Another key feature of virialized stellar systems is that they have rather simple bulk geometry: they are normally approximately spherical (in the case of globular clusters, some dwarf galaxies, etc.), or thin and axisymmetric (like a disk galaxy), or possibly ellipsoidal/triaxial (like some massive elliptical galaxies). This simple geometry leads to (mostly) simple, regular orbits, and not many chaotic orbits. In this section, we will describe how stars orbit within a simple class of smooth mean field potentials.

In elementary courses on plasma physics, there is no great heed paid to “mean field dynamics” because those dynamics are trivial: in a homogeneous electrostatic plasma, mean field “orbits” are just straight lines: r(t)=vt+r(0) and v=v(0)= cst. On the contrary, stellar systems are inherently inhomogeneous, which means that the unperturbed mean field orbits of stars are not straight lines. This is something plasma physicists also have to deal with when considering orbits in systems with nontrivial geometry and especially with magnetic fields, such as in toroidal tokamak fusion plasmas or the Earth's magnetosphere.26–28 

If the gravitational potential is Φ(r), the equations of motion of a test star are
(20a)
(20b)
In practice, we cannot solve these equations of motion explicitly except in a few very simple potentials (e.g., the Kepler potential Φ(r)1/|r|, and the harmonic potential Φ(r)|r|2). However, we can solve the equations via numerical integration—this is called “integrating the orbit.” Modern numerical packages like Galpy,29, Gala,30 and AGAMA31 allow one to integrate and analyze mean field orbits in pre-defined potentials with ease.

In general, numerical orbit integration reveals a wide array of possible orbit families—see Chap. 3 of BT08 for an overview. However, for simplicity, throughout this article, we will only consider one simple orbit family, which is the planar rosette orbit shown in Fig. 8.

FIG. 8.

Nearly all orbits in central potentials Φ(r) look like this: a two-dimensional planar “rosette.” Using in-plane polar coordinates (ϕ,R), each orbit is a combination of an azimuthal oscillation in the ϕ direction and a radial oscillation between Rp and Ra. The thick black line shows the trajectory over one radial period, from Ra to Rp and back again.

FIG. 8.

Nearly all orbits in central potentials Φ(r) look like this: a two-dimensional planar “rosette.” Using in-plane polar coordinates (ϕ,R), each orbit is a combination of an azimuthal oscillation in the ϕ direction and a radial oscillation between Rp and Ra. The thick black line shows the trajectory over one radial period, from Ra to Rp and back again.

Close modal

This is (nearly) the only type of orbit that is possible in central potentials Φ(r), where r|r|. It is therefore the overwhelmingly dominant orbit family in both our mental models, the hot sphere and the cold axisymmetric disk (Fig. 2). By symmetry, motion in central potentials conserves each component of the specific orbital angular momentum vector, L=r×p, and so each such orbit is confined to a two-dimensional plane (which we can take to be the plane Z=0 without loss of generality). From now on, we describe orbits in this plane with a standard (ϕ,R) polar coordinate system centered at r=0.

1. Orbits in central potentials

We will now show how to describe the rosette-type orbit of Fig. 8 quantitatively. We see from Fig. 8 that the orbit consists of a combination of an azimuthal circulation in the positive ϕ direction, and a radial oscillation in R between the fixed pericenter Rp and apocenter Ra. We can calculate Rp and Ra in terms of the two conserved quantities of the star's motion, namely, its specific energy and specific angular momentum (we will drop the word “specific” from now on)
(21a)
(21b)
The peri/apocentre are the radial turning points of the orbit; that is, they are solutions to vR(Rp)=vR(Ra)=0, where vR=dR/dt is the radial velocity
(22)
In central potentials, there is then a one-to-one correspondence between the integrals of motion (E,L) and the peri/apocentric distances (Rp,Ra):
(23a)
(23b)
We can also calculate the frequencies of the radial and azimuthal oscillations as follows. First, the radial frequency is
(24)
where TR is the radial period
(25)
The azimuthal frequency is
(26)
where |Δϕ| is the increment in azimuthal angle over one radial period reading
(27)
(In all sensible spherical potentials, the increment |Δϕ(Rp,Ra)| lies between π and 2π). In other words, all the characteristic quantities describing the orbit can be calculated given the two numbers (Rp,Ra), or equivalently the two constants of motion (E,L). This is in contrast to a homogeneous plasma, where one can label an orbit just by its velocity v.
To further illustrate what these rosette orbits look like, consider motion in the spherical Plummer potential, which is often used as a simple model for the gravitational potentials in globular clusters:
(28)
Here, b is the scale radius of the potential (for a globular cluster b1 pc —see Table I). The Plummer potential has very simple asymptotic forms. For rb, it approximates the Keplerian potential of a point mass, ΦGM/r. In this potential, all orbits are closed ellipses with focus at the origin, i.e., they have |Δϕ(Rp,Ra)|=2π. For rb, the (nonconstant part of the) Plummer potential is quadratic Φr2, which corresponds to motion in a spherical “harmonic core” of constant density (see BT08, Sec. 3.1). All orbits in the harmonic core are also closed ellipses, but this time, they are centered on the origin, and have |Δϕ(Rp,Ra)|=π.

In Fig. 9, we show three orbits integrated in the potential (28) in the (ϕ,R) plane, each with the same Ra=b but different Rp, as labeled above each panel.

FIG. 9.

Three orbits in the Plummer potential (28). In each case, the orbit starts at its apocentric distance at the location (X,Y)=(b,0). Each orbit has the same apocenter Ra=b (red dashed circles) but different pericenters Rp (blue dashed circles), as indicated above each panel. As a result, orbit (a) is almost radial (JRL), orbit (c) is almost circular (JRL), and orbit (b) is somewhere in-between (JRL)—see Sec. III C 3 for the definition of the radial action JR.

FIG. 9.

Three orbits in the Plummer potential (28). In each case, the orbit starts at its apocentric distance at the location (X,Y)=(b,0). Each orbit has the same apocenter Ra=b (red dashed circles) but different pericenters Rp (blue dashed circles), as indicated above each panel. As a result, orbit (a) is almost radial (JRL), orbit (c) is almost circular (JRL), and orbit (b) is somewhere in-between (JRL)—see Sec. III C 3 for the definition of the radial action JR.

Close modal

Orbit (a) is nearly radial, with RpRa, whereas orbit (c) is nearly circular, with RpRa, and orbit (b) is somewhere in-between.

The orbit labels (Rp,Ra), or equivalently (E,L), contain no information about the orbit's instantaneous radial or azimuthal phase, just as labeling an orbit by v in a plasma does not tell us its instantaneous position in the box r. However, it turns out that the phase of the orbits also exhibit rather simple behavior, as we investigate next.

A crucial feature of orbits in many smooth, regular galactic potentials revealed by numerical orbit integration is that they are quasiperiodic. By this we mean that, very often, one can Fourier-analyze a numerically integrated orbit and write it as
(29)
Here, the sum is over all d-dimensional vectors comprised of integers, where d is the number of dimensions, e.g., (0,0,1), (0,1,1), … etc., in the case of d=3. The vector Ω is made up of d frequencies that are characteristic of the orbit in question. Quasiperiodicity is also a common feature of particle orbits in tokamaks, magnetospheres, and so on.

That the orbits in Fig. 9 are quasiperiodic might have been guessed by eye, but this fact also extends to a much broader range of orbits in realistic galactic potentials.32,33 Mathematically, that an orbit is quasiperiodic can be proven in cases where the number of known independent isolating integrals of motion is (at least) equal to the number of dimensions (which is always true for orbits in spherically symmetric potentials). However, it is a nontrivial empirical fact that quasiperiodicity also extends to orbits in many realistic axisymmetric potentials Φ(R,Z) for which the orbit is fully three-dimensional and there are only two obvious independent constants of motion (E and the component of angular momentum along the symmetry axis, LZ).32,33 Normally in a system with more dimensions than there are independent integrals of motion, one would expect some degree of chaos; instead, it is believed that orbits in axisymmetric potentials almost always possess a third independent integral of motion, whose analytic form we do not know but which makes the orbits regular and hence quasiperiodic (though this is not generically guaranteed34).

While triaxial potentials (like ellipsoids) can also harbor families of quasiperiodic orbits, they also tend to have many more chaotic orbits too, which arise in the parts of phase space between the regions of distinct quasiperiodic families. As such, in the rest of this tutorial article, we will suppose that the exact potential of the system is close to, or exactly, axisymmetric, and that in the limit of pure axisymmetry, there is only one regular (quasiperiodic) family of orbits we need to worry about, and no chaos. The central potentials Φ(r) are of this nature.

The upshot of quasiperiodicity is that stellar orbits can be thought of as a superposition of oscillators with fixed frequencies Ω. Here, we will show how this fact allows us to construct a coordinate system in which this mean field orbital motion is trivial, namely, angle–action coordinates. An alternative and more detailed introduction to these coordinates and how they are used can be found in Sec. 3.5 of BT08.

Let (q,p) be a set of canonical coordinates and momenta on phase space (a reader who is not familiar with canonical coordinates can consult Sec. D.4 of BT08). Now consider a star moving according to the generic Hamiltonian H(q,p,t). The equations of motion for that star are
(30)
(30a)
(30b)
We can also use these canonical coordinates to introduce a bilinear antisymmetric functional on phase space called the Poisson bracket. For any two functions g(q,p) and h(q,p), we define
(31)
This will be useful later.

A perfectly valid choice for the canonical coordinates (q,p) is the position and velocity (r,v). However, the lack of straight line orbits in stellar systems means that (r,v) are awkward variables with which to formulate a kinetic theory. We would like instead to work as much as possible with quantities that are invariant along orbits. An integral of motion I(r,v) is any function of phase space coordinates that is constant along an orbit. As argued in Sec. III B, for all the mean field orbits we care about in this tutorial article (that is, orbits that are quasiperiodic) and there exist at least as many independent integrals of motion as there are degrees of freedom in the system (normally 2 or 3). One can construct as many mutually dependent integrals of motion as one wishes, since any function of integrals of motion is itself an integral of motion.

The obvious next question is: given that these integrals of motion exist, can we use them as coordinates for our phase space? The answer is yes, but if we want to preserve the Hamiltonian structure of our system—i.e., if we want the new coordinates to be canonical—then we have to be careful about which integrals to use. It turns out that there is a special set of integrals called actions J, which are canonical momenta on phase space,35 and can be complemented by canonical coordinates. Specifically, these actions have the special property that the Hamiltonian of the underlying quasiperiodic system H(q,p) can be written exclusively in terms of them, H=H(J), and they can be complemented by canonical coordinates or angles θ, which are 2π-periodic. In these special angle–action coordinates (q,p)=(θ,J), Hamilton's equations (30) take the simple form
(32)
(32a)
(32b)
where we introduced the orbital frequencies Ω(J)=H/J. In these coordinates, the orbits are straight, horizontal lines, as one has
(33a)
(33b)
Some further properties of the angle–action coordinates are as follows. Since they are canonical, they leave the Poisson bracket (31) invariant, so that one generically has
(34)
As a result, angle–action coordinates also leave the infinitesimal phase space volumes invariant so that
(35)
Furthermore, the periodicity of the angles in d dimensions means that any phase space function can be Fourier expanded as
(36)
with inverse
(37)
Actions are also adiabatic invariants: roughly speaking, if H evolves on a timescale longer than the dynamical time, tdyn1/|Ω|, an orbit governed by H will evolve in such a way that J=cst. (but see Weinberg36–38 for a more precise statement).

Note that we have not proven the existence of the angle–action variables, nor have we explained how to generate them from scratch (via the process of writing down a suitable canonical generating function). All this is standard material which is explained rather neatly (if heuristically) in Sec. 3.5 of BT08, or much more rigorously in the books by Lichtenberg and Lieberman39 and Arnold.40 Throughout the rest of this tutorial article, we will simply assume that the angle–action coordinates can be constructed for the problem at hand.

Also, it is important to note that angles and actions are just coordinates, and that once we have constructed them, we can use them to describe the motion of particles in Hamiltonians other than H(J). In particular, we often care about systems in which the motion is nearly, but not perfectly, quasiperiodic (such as a spherical stellar system perturbed by an infalling satellite). In such a case, we can construct the angle–action variables (θ,J) corresponding to the corresponding perfectly periodic system and then describe the real (perturbed) system by expressing its Hamiltonian in these coordinates, H(θ,J), deriving the corresponding equation of motion dθ/dt=H/J and dJ/dt=H/θ, etc. We will see how this construction works in practice starting in Sec. V B.

1. Example: Angle–action variables for a multi-periodic homogeneous system

As a first example, we consider a fictitious homogeneous stellar system, which provides a direct analogy with homogeneous plasmas. To this end, let us assume that our stellar system is placed within a periodic three-dimensional box of side-length L. We also assume that the mean potential vanishes, i.e., Φ0=0, so that unperturbed trajectories are straight lines. In this case, the system's angle–action coordinates and the associated orbital frequencies are given by
(38)
This makes clear a crucial correspondence between (fictitious) homogeneous stellar systems and homogeneous electrostatic plasmas. In both systems, mean field orbits are labeled by their velocity v, and, up to a prefactor, these velocities are also the orbit's intrinsic frequencies Ω. This means that (i) the gravitational/electrostatic potential Φ(r) is a function only of angles θ and not actions J, and (ii) the orbital frequencies Ω(J) are trivial (linear) functions of the actions. These two features are special to homogeneous systems, and greatly simplify the kinetic description (see, e.g., Sec. VI B 1).

2. Example: Angle–action variables for the 1D harmonic oscillator

Let us now consider the case of a one-dimensional harmonic oscillator whose Hamiltonian is H=(v2+ω2x2)/2. As illustrated in Fig. 10, in this case, orbits take the form of concentric circles in the phase space (x,v/ω).

FIG. 10.

Phase space diagram of a harmonic oscillator, H=(v2+ω2x2)/2. Panel (a) shows particle trajectories in the physical phase space (x,v/ω), which take the form of concentric circles. Here, the action J should therefore be seen as a label for the circle associated with the orbit, and the angle θ should be seen as the position along the circle. Panel (b) shows the same trajectories in angle–action space (θ,J). In these coordinates, the mean field motions are straight lines. The action J is conserved, while the angle θ evolves linearly in time.

FIG. 10.

Phase space diagram of a harmonic oscillator, H=(v2+ω2x2)/2. Panel (a) shows particle trajectories in the physical phase space (x,v/ω), which take the form of concentric circles. Here, the action J should therefore be seen as a label for the circle associated with the orbit, and the angle θ should be seen as the position along the circle. Panel (b) shows the same trajectories in angle–action space (θ,J). In these coordinates, the mean field motions are straight lines. The action J is conserved, while the angle θ evolves linearly in time.

Close modal
We define the action via H=Jω, along with the angle θ so that
(39)
Following Eq. (34), one can check that [θ,J]=1; thus, (θ,J) are canonical coordinates. Finally, orbits which followed circles in (x,v/ω) space now traverse straight horizontal lines in (θ,J) space.

3. Example: Angle–action variables for orbits in a central potential

For central potentials, Φ(R), the two-dimensional rosette motion can be described using action variables J1=L and J2=JR, where L is the usual angular momentum (see Eq. 21) and JR is the radial action
(40)
with the radial velocity, vR, given by Eq. (22). This is directly analogous to the longitudinal adiabatic invariant J=mvds for particle motion in a magnetic mirror.41 The conjugate angles are the azimuthal and radial angles
(41a)
(41b)
These angles evolve at the rate of the azimuthal frequency Ωϕ and radial frequency ΩR, respectively.

Figure 11 (taken from Ref. 42) presents recent data from the spacecraft Gaia, showing the distribution of millions of stellar orbits in our local patch of the Milky Way disk, plotted in various slices through angle–action space.

FIG. 11.

The distribution of stars' orbits in our local patch of the Milky Way disk (a.k.a. the Solar neighborhood), plotted in angle–action space, using data from the Gaia satellite. This figure was reproduced from Hunt et al., Mon. Not. R. Astron. Soc. 490, 1026 (2019). Copyright 2019 Royal Astronomical Society.42 Another detailed guide to features in the action space (top left panel) can be found in Ref. 43.

FIG. 11.

The distribution of stars' orbits in our local patch of the Milky Way disk (a.k.a. the Solar neighborhood), plotted in angle–action space, using data from the Gaia satellite. This figure was reproduced from Hunt et al., Mon. Not. R. Astron. Soc. 490, 1026 (2019). Copyright 2019 Royal Astronomical Society.42 Another detailed guide to features in the action space (top left panel) can be found in Ref. 43.

Close modal

Each panel is full of structure, some of which is due to selection effects (since we are unable to access data across the whole Galaxy) but much of which is due to dynamical interactions between stars, spiral arms, the Galaxy's central bar, and so on. Though we will not pursue any details here, understanding structures like those in Fig. 11 is a major goal of modern Galactic astronomy.

4. Example: Angle–action variables for near-circular orbits

We have introduced angle–action coordinates as the natural variables with which to describe quasiperiodic orbits, since those orbits are superpositions of (generally anharmonic) oscillations at discrete frequencies Ω(J). However, a disadvantage of angle–action coordinates is that we can rarely make explicit the mapping (r,v)(θ,J). On the other hand, we can write down an explicit mapping if we restrict ourselves to near-circular orbits (which is a highly relevant regime for describing cold disks). There, we can employ the epicyclic approximation. In this limit, each orbit consists of circular motion at the guiding center radius Rg, plus a superposition of harmonic oscillations in the radial and azimuthal coordinates. In  Appendix A, we show how to construct explicit angle–action coordinates in the epicyclic approximation.

For the remainder of this tutorial article, we assume that we can always construct angle–action variables (θ,J), such that the mean field Hamiltonian depends only on J (which is itself a nontrivial technical problem44). It is then tempting to think of actions as the “velocity” variables in our new phase space and the angles as our new “positions.” Indeed, many formulas from plasma kinetics will more-or-less hold in stellar dynamics just by making the substitutions vJ and rθ (see especially Table II). However, there are two key mathematical differences that make stellar kinetics much harder than plasma kinetics. The first is that unlike in a homogeneous plasma, the canonical coordinates change at rates that are in general, nonlinear functions of the canonical momenta, i.e., Ω(J) is typically a complicated (nonlinear) function of J, whereas in plasma we have ΩJv. The other key difference is that the fluctuating gravitational potential in these variables (which we will introduce in Sec. IV) depends on both the canonical coordinates and momenta, i.e., δΦ(r,t)=δΦ(θ,J,t). As we will see, these differences make the physics of stellar systems in some ways richer dynamically than that of electrostatic plasmas, but also make explicit calculation more difficult.

TABLE II.

Comparison of various pieces entering the Balescu–Lenard equation for homogeneous [Eq. (126)] and inhomogeneous [Eq. (125)] stellar systems.

Homogeneous Inhomogeneous
Orbital distribution  f0(v,t)  f0(J,t) 
Basis decomposition  ψ(r,r)keik·(rr)/k2  ψ(r,r)pΦ(p)(r)Φ(p)*(r) 
Response matrix  1k2dvk·f0/vωk·v  ndJn·f0/Jωn·ΩΦn(p)*(J)Φn(q)(J) 
Dressing of perturbations  1/|ϵk(k·v)|2  |ψnnd(J,J,n·Ω)|2 
Resonance condition  k·(vv)=0  n·Ω=n·Ω 
Homogeneous Inhomogeneous
Orbital distribution  f0(v,t)  f0(J,t) 
Basis decomposition  ψ(r,r)keik·(rr)/k2  ψ(r,r)pΦ(p)(r)Φ(p)*(r) 
Response matrix  1k2dvk·f0/vωk·v  ndJn·f0/Jωn·ΩΦn(p)*(J)Φn(q)(J) 
Dressing of perturbations  1/|ϵk(k·v)|2  |ψnnd(J,J,n·Ω)|2 
Resonance condition  k·(vv)=0  n·Ω=n·Ω 

Now that we know how to describe stellar orbits in smooth central potentials, we are ready to consider what happens if we perturb those orbits. In particular, in this section we discuss how stellar orbits in an axisymmetric, razor-thin (i.e., two-dimensional) disk are modified when we introduce a rigidly rotating nonaxisymmetric perturbation (e.g., a spiral arm pattern). The discussion will be familiar to plasma physicists who have studied, for example, the interaction of ions with toroidal Alfvén waves in tokamaks.26,28

In order to construct angle–action variables, we need a system that is integrable. Since we know that orbits in axisymmetric potentials are nearly always integrable, we will first consider an axisymmetrized version of our (in general, nonaxisymmetric) system, and construct the angle–action variables associated with motion in that axisymmetrized system. We then treat nonaxisymmetries as perturbations.

To begin our construction, at a given time let the exact, in general, nonaxisymmetric, gravitational potential of the whole two-dimensional system (including external perturbations, if they are present) be Φ(r)=Φ(ϕ,R). Then, the exact Hamiltonian of a test star orbiting in this system is
(42)
The azimuthal average of H is
(43)
where for any function q(r), we define
(44)
We know that we can construct angle–action variables (θ,J) such that H¯ only depends on J. Given these angle–action variables, we can express any function on phase space in terms of them via r=r(θ,J) and v=v(θ,J). In particular, we can return to our original exact Hamiltonian H and split it into a θ-independent “mean-field” part and a θ-dependent “fluctuation”:
(45)
where the θ average of some quantity q(θ,J) is
(46)
Here, the “0” subscript reminds us that q0 is equivalent to the zeroth Fourier mode of q when expanded in angles [see Eq. (36) above].
One can show that the “mean-field Hamiltonian” H0 and the “axisymmetric Hamiltonian” H¯ are equivalent (even though this is not generally true of the potential alone, i.e., Φ0Φ¯). Furthermore, we have that
(47)
i.e., the angle-dependent part of the Hamiltonian is just the nonaxisymmetric part of the potential. Hence, from now on, we write
(48)
and refer to H0 and δΦ as the mean-field Hamiltonian and the potential perturbation, respectively.
We now assume that the nonaxisymmetric potential perturbation δΦ rotates in the ϕ direction with fixed pattern speed Ωp,
(49)
Given that the perturbation is periodic in ϕΩpt, using Eq. (41a) and the fact that R is a function only of θR and J (through the implicit relation 41b), we can always expand δΦ as a Fourier series [c.f. Eq. (36)]
(50)
where n=(nϕ,nR). For a perturbation with fixed azimuthal harmonic number m>0 (e.g., an m-armed spiral with a sinusoidal azimuthal profile), the only terms that survive in the expansion (50) are those with nϕ=±m. More generally, bar/spiral perturbations can usually be built as a superposition of a small number of m-harmonic components.
A star's orbit evolves according to Hamilton's equations (32). Thus, using the Fourier expansion (50), its action evolves as J=J(0)+δJ(t), where
(51)
Given this equation, it will not surprise the reader that the majority of interesting orbital modifications by rigidly rotating potential perturbations occur at resonances, i.e., locations J in phase space where
(52)
for some vector of integers N=(Nϕ,NR). That is because these are locations where, substituting the unperturbed trajectories (33a) and (33b), the argument of the exponential on the right-hand side of Eq. (51) is constant.

For concreteness, suppose the perturbation in question has azimuthal harmonic number m. Then, the most important values of N—which are sometimes referred to as the perturbation's major resonances—are

(53)
(53a)
(53b)
(53c)
These resonances tend to dominate the in-plane evolution of disk galaxies driven by large-scale perturbations like spirals and bars. Resonances with |NR|2 also exist, but are usually subdominant because bars and spirals have Fourier spectra dominated by the lowest few nR. On the other hand, resonances with |NR|2 are very important for understanding, e.g., the dynamical friction felt by an orbiting point mass.45–49 

The resonances (53) have a simple physical interpretation in the important case of near-circular orbits (see Sec. III C 4 and  Appendix A). In that case, the corotation resonance is located at the orbital radius RCR such that an orbit with guiding radius Rg=RCR advances in azimuth at exactly the same rate as the perturbation, i.e., Ωc(RCR)=Ωp, with Ωc the azimuthal frequency for circular orbits. If we now imagine keeping that star's orbit near-circular but increasing Rg slightly, it will begin to circulate at a frequency slightly smaller than Ωp. As we continue this process, eventually Rg will be sufficiently large that the difference ΩpΩc is equal to the frequency of radial epicycles κ [Eq. (A7b)]; this is the outer Lindblad resonance. Had we performed a similar experiment but instead gradually made the circular orbit smaller than Rg, the star would begin to orbit faster than Ωp, until ΩpΩc coincided with κ; this is the inner Lindblad resonance. In summary, for epicyclic orbits, we can write the resonance condition as
(54)
with n=0,1,1 for CR, OLR, and ILR, respectively. For an illustration of these resonant orbits, see, e.g., Fig. 6.10 of BT08.

1. Example: Orbits perturbed by a spiral potential

To illustrate the morphology of perturbed orbits and the importance of resonances, consider the motion of test stars in the potential Φ=Φ¯+δΦ, where Φ¯ is the logarithmic halo potential
(55)
and R0 is an arbitrary scale length. This potential gives rise to a flat rotation curve, meaning that all stars on circular orbits (JR=0) have the same orbital speed: vϕ2=RΦ¯/R=V02=cst. This is in fact a reasonably realistic model for many galactic disks.3 For δΦ, we choose the potential of a steadily rotating logarithmic spiral
(56a)
(56b)
This corresponds to a spiral with m arms, pattern speed Ωp, and pitch angle p (so p0 is a very tightly wrapped trailing spiral, which gets less and less tightly wound as p90°—for much more detail, see Sec. 6.1 of BT08). The rotation period of this spiral is simply, Tp=2π/Ωp. In Fig. 12, we plot contours of δΦ/(εV02) for m=2 and p=60° (a rather loosely wrapped arm).
FIG. 12.

Contours of the spiral potential, Eq. (56), for m=2 and p=60° (the azimuthal phase is arbitrary). The dotted and dashed white curves show the radial locations of the ILR, CR, and OLR, respectively, assuming Ωp=V0/(6.5R0), but this is only meant to guide the eye—the logarithmic spiral (56) and the logarithmic halo (55) are both scale-free, so nothing about the figure would change if we modified Ωp apart from the radii of these white circles.

FIG. 12.

Contours of the spiral potential, Eq. (56), for m=2 and p=60° (the azimuthal phase is arbitrary). The dotted and dashed white curves show the radial locations of the ILR, CR, and OLR, respectively, assuming Ωp=V0/(6.5R0), but this is only meant to guide the eye—the logarithmic spiral (56) and the logarithmic halo (55) are both scale-free, so nothing about the figure would change if we modified Ωp apart from the radii of these white circles.

Close modal

We now fix the pattern speed to be Ωp=0.154V0/R0, so that the corotation radius of circular orbits is RCR=6.5R0. We take the spiral strength to be ε=0.01, the pitch angle p=60°, and the initial phase ϕp=0. We initialized the orbits of nine test stars in this potential, putting each star on a circular orbit with speed V0 and azimuthal location ϕ(t=0)=0. The initial conditions of the stars therefore differed only in their initial orbital radii R(0), which we sampled linearly between RILR and ROLR inclusive. We integrated each orbit for a total time 20Tp.

In Figs. 13 and 14, we show the resulting orbits. In each panel, the black line shows the trajectory of a star whose initial radius R(0) is given at the top of the panel. we also show the corotation radius of near-circular orbits RCR with a dashed red line, and the corresponding Lindblad radii RILR/OLR with dotted red lines. In Fig. 13, we plot the trajectories in the inertial frame of the galaxy, while in Fig. 14, we plot them in a frame that rotates azimuthally at the rate Ωp [so the spiral potential (56) is fixed].

FIG. 13.

Orbits of stars in the combined logarithmic halo (55) and spiral (56) potentials. All stars are initially on circular orbits with speed V0 but differ in their initial orbital radii R(0). The dotted and dashed lines show the Lindblad and corotation radii, as in Fig. 12. In particular, in panels (a) (e), and (i) the orbit is trapped at the ILR, CR, and OLR, respectively (see Fig. 14).

FIG. 13.

Orbits of stars in the combined logarithmic halo (55) and spiral (56) potentials. All stars are initially on circular orbits with speed V0 but differ in their initial orbital radii R(0). The dotted and dashed lines show the Lindblad and corotation radii, as in Fig. 12. In particular, in panels (a) (e), and (i) the orbit is trapped at the ILR, CR, and OLR, respectively (see Fig. 14).

Close modal
FIG. 14.

As in Fig. 13 except in a frame that rotates in the ϕ direction at rate Ωp (so the spiral pattern is stationary).

FIG. 14.

As in Fig. 13 except in a frame that rotates in the ϕ direction at rate Ωp (so the spiral pattern is stationary).

Close modal

These figures reveal that initially circular orbits are not dramatically distorted if the star sits far from a resonance. In this case, the orbits are well described with linear perturbation theory which we introduce in Sec. IV E. On the other hand, the orbits are changed qualitatively if they are located near a resonance. In particular, panel (a) shows an orbit that is “trapped” at the ILR, which is the key resonance responsible for the formation of stellar bars. Meanwhile, panel (e) shows an orbit that is trapped at corotation. When viewed in the rotating frame, these orbits do not complete any azimuthal loops, but are rather “stuck” on one side of the spiral, and librate back and forth in the trough of the spiral potential. Trapped orbits at corotation are especially important for determining the slowdown rate of spinning stellar bars (Sec. VIII B 1), the saturation of spiral instabilities (Sec. VIII B 2), and so on. (They are also very familiar to plasma physicists, who are used to particles “bouncing” in the troughs of waves if the wave's phase velocity matches the particle velocity.) Finally, panel (i) shows an orbit trapped at the OLR.

The rest of this section is devoted to a mathematical description of how stellar orbits respond to a perturbation of the form (50). We start by discussing a quantity that is rigorously conserved for such orbits (Sec. IV D). We then describe how one can approximate the evolution of linearly perturbed orbits (Sec. IV E). Finally, we show how the nonlinearly distorted orbits that can develop near resonances can be treated analytically in the pendulum approximation (Sec. IV F).

One thing we can immediately say about test star motion in the Hamiltonian (48) is that it preserves neither angular momentum L nor energy E (since δΦ is both nonaxisymmetric and time-dependent). Yet, it does conserve the Jacobi integral (see Sec. 3.3.2 of BT08):
(57)
This follows because HJHΩpL is just the Hamiltonian in the frame co-rotating with the perturbation, and in that frame, the Hamiltonian is time-independent and hence conserved (see Sec. 3.3.2 of BT08 for proof). Since EJ=cst., any interaction of a test star with a constant amplitude, rigidly rotating potential perturbation must produce changes to the star's energy ΔE and angular momentum ΔL that are related by
(58)
If ΔE is sufficiently small, we can approximate it by ΔE(E/L)ΔL+(E/JR)ΔJR=ΩϕΔL+ΩRΔJR, following the generic definition of orbital frequencies in Eq. (32a). This allows us to relate the change in radial action ΔJR to the change in angular momentum ΔL,
(59)
For instance, stars that corotate with the perturbation, Ωϕ=Ωp, do not experience changes to their radial actions to first order.
Suppose the perturbation strength |δΦ|/|H0|ε1 [just as in Eq. (56)]. Then, to zeroth order in ε, we have δJ(0)=0, i.e., the star's trajectory is just a straight line through angle–action space, J=J0 and θ=θ0+Ωt. Plugging this back in to the right-hand side of Eq. (51) and integrating forward in time, we get
(60)
Thus, to first order in ε, the perturbation (50) nudges the star's orbital action by an amount δJ(1), which is proportional to ε in magnitude and oscillatory in time, except in the case where the star's mean field orbit is in resonance with the perturber, namely, when there exists some pair of integers n=N=(Nϕ,NR) such that (52) holds for J=J0. In the resonant case, (60) predicts that δJ will grow linearly in time, since the Nth Fourier component of the forcing (50) is constant in the frame moving with the unperturbed stellar orbit.

In reality, it is precisely near these resonant locations that linear perturbation theory will eventually break down. In that case, one should instead turn to an alternative, nonlinear description of the dynamics.

To describe the near-resonant orbits analytically, we turn to the “pendulum approximation.” The main idea of the pendulum approximation is that near resonances, the linear approximations used in Sec. IV E only truly break down along certain special “resonant directions” in angle–action space, namely, those directions where the argument of the exponential in Eq. (50) varies slowly for some resonance n=N. This allows us to average out the oscillatory motion in the “nonresonant” directions and thereby reduce the complicated resonant dynamics to the integrable problem of a one-dimensional pendulum.

For definiteness, let us set the number of dimensions to d=2. Then, θ=(θϕ,θR) and J=(Jϕ,JR). We now make a canonical transformation (see Sec. D.4 of BT08) to a new set of coordinates, which are again angle–action coordinates of the unperturbed problem.39,50 Precisely, we map (θ,J)(θ,J), where θ=(θf,θs) consists of the “fast” and “slow” angles
(61)
(61a)
(61b)
and J=(Jf,Js) consists of the corresponding fast and slow actions
(62)
(62a)
(62b)
(In plasma, the analog of, e.g., Eq. (61) would be a coordinate transform x=xvpht, where vph is the phase velocity of a wave moving along the x axis). Having made this transformation, we may rewrite the Hamiltonian H [Eqs. (48) and (49)] in terms of the new coordinates:
(63)
where k=(kf,ks) is a vector of integers and we have expanded δΦ as a Fourier series in the new angles θ, i.e., written δΦ(r,t)=kΨk(J)eik·θ. The coefficients Ψk are easily computed for a simple bar or spiral model.51,52 The special thing about the form (63) of the Hamiltonian is that it has no explicit time dependence [or rather, the time dependence has been absorbed into the definition of the angle θs; see Eq. (61)].
The fast angles θf evolve on the orbital timescale, whereas θs evolves on the much longer timescale (N·ΩNϕΩp)1. Thus, we choose to average H over the unimportant fast motion and work instead with the simpler Hamiltonian h(2π)1dθfH at fixed Jf. We also expand h around the resonance. To do this, we let the resonant action be Jres [meaning that Eq. (52) is satisfied for J=Jres]. In the transformed coordinates, this is Jres=(Jf,Js,res), and so we can define
(64)
with φk[π,π] (and we used the shorthand Ψ(0,ks)=Ψk). Expanding h for small I, and assuming that a single Fourier component k dominates the perturbation, we find that at each fixed Jf the dynamics reduces to motion in the “slow plane” (θs,Js) dictated by the pendulum Hamiltonian
(65)
where φ=φk and
(66)
Here, Js,res(Jf) is the resonant value of the slow action at fixed fast action, |Ψk|Js,res is the strength of the bar perturbation on resonance, and φ[π,π] is just a phase-shifted slow angle. Note that in contrast with linear perturbation theory (Sec. IV E), in this case if the perturbation strength is |δΦ|/|H0|ε1, then the “perturbing” part of the Hamiltonian (65) has relative strength |F/h|O(ε1/2), and the timescale for evolution due to this perturbation is therefore tlibε1/2tdyn. Appreciating this difference is crucial when it comes to developing an appropriate kinetic theory (see Sec. V E).
The variables (φ,I) are canonical variables for the pendulum Hamiltonian (65). The pendulum moves at constant “energy” h, either on an untrapped “circulating” orbit with h>F or a trapped “librating” orbit with h<F (the separatrix between these two families is h=F). The libration period for oscillations around (φ,I)=(0,0) is
(67)
The maximum width in I of the librating “island” is at φ=0, where it spans I[Ih,Ih] and Ih is the island half-width:
(68)
which is independent of k. The contours of this Hamiltonian are plotted in Fig. 15 for the case with k=1.
FIG. 15.

Pendulum phase space. Colors show contours of constant h(φ,I) [see Eq. (65)] for k=1. The black dashed line shows the separatrix between librating (i.e., trapped) and circulating (i.e., untrapped) orbit families.

FIG. 15.

Pendulum phase space. Colors show contours of constant h(φ,I) [see Eq. (65)] for k=1. The black dashed line shows the separatrix between librating (i.e., trapped) and circulating (i.e., untrapped) orbit families.

Close modal

The separatrix between librating and circulating orbit families is shown with a black dashed line.

1. Example: Corotation resonance, flat rotation curve

Let us illustrate the pendulum dynamics with a concrete example. This requires stipulation of the resonance N under consideration. Let us consider the corotation resonance (CR), as defined in Eq. (53a), with the resonance location at the angular momentum value L=LCR(JR) for a given radial action JR. Then, we have

(69a)
(69a)
(69b)
(69c)
We must also stipulate a background potential; the simplest choice is the logarithmic halo potential (55), which gives rise to a flat rotation curve with circular speed V0. With this choice, for initially circular orbits (Jf=JR=0), we get
(70)
(70a)
(70b)
(70c)
We now combine these two choices, i.e., we consider circular, corotating orbits in a logarithmic halo. In this case, the slow action Js is just proportional to the guiding radius R, with the resonant value Js,res occurring at radius RCR, and the slow angle is (up to a phase shift) equivalent to m[ϕ0tdtΩp(t)]. Thus, in this case, one can think of the transformation into slow and fast angle–action variables simply as “moving into the rotating frame” (or in the plasma context, shifting to a frame in which the wave's phase velocity is zero).

With these choices, the libration time (67) can be estimated as
(71a)
(71b)
where tdyn=2πR/V0 is the circular orbital period at radius R, and all quantities must be evaluated at the resonance location R=RCR. Clearly, tlib is always several times longer than the orbital period tdyn. Moreover, Eq. (68) lets us estimate the radial width of the resonant island for these orbits
(72a)
(72b)
Realistic spirals like those in the Milky Way today have ε several percent or more.53 Thus, the resonant island at corotation can span one or more kiloparsecs in radius. This must be borne in mind whenever one makes a “narrow resonance” approximation, since Rh can also be comparable to the radial scale over which the galaxy's underlying density profile changes significantly (3 kpc for the stellar disk of the Milky Way).

As a more quantitative example, let us apply our pendulum formalism to the nine orbits shown in Fig. 14. In Fig. 16, we show these nine trajectories through the space of azimuthal angle ϕΩpt+ln(RCR/R0)cot p and angular momentum L.

FIG. 16.

The same orbits as in Fig. 14, but plotted in the space of azimuthal angle (in the rotating frame) and angular momentum (relative to the corotation resonance value). Up to constants, these are equivalent to the slow angle and slow action at corotation [Eqs. (61) and (62)]. The background colored lines are contours of the associated pendulum Hamiltonian (c.f. Fig. 15). The dashed and dotted red lines have the same meaning as in Fig. 14.

FIG. 16.

The same orbits as in Fig. 14, but plotted in the space of azimuthal angle (in the rotating frame) and angular momentum (relative to the corotation resonance value). Up to constants, these are equivalent to the slow angle and slow action at corotation [Eqs. (61) and (62)]. The background colored lines are contours of the associated pendulum Hamiltonian (c.f. Fig. 15). The dashed and dotted red lines have the same meaning as in Fig. 14.

Close modal

Up to constants, these are equivalent to the slow angle–action pair (φ,I) near corotation. With the colored lines, we plot contours of the pendulum Hamiltonian (c.f. Fig. 15) using the parameters for the corotation resonance [Eqs. (69) and (70)]. The dashed and dotted red lines again show the ILR, CR, and OLR from bottom to top, respectively. We see that the pendulum approximation does a good job of predicting the orbit shapes near corotation. The slight mismatch in the orbit shapes is mostly because the pendulum theory ignores all but one Fourier harmonic k (as such, the agreement with theory gets better (respectively worse) if we use more open (respectively tight) spirals, i.e., take p>60° [respectively p<60°)]. The pendulum approximation can even work quite well far away from the CR. Mostly, this is because away from corotation, orbits are not perturbed much in angular momentum so the approximation L= const. is a good one. However, we see that the pendulum approximation begins to break down close to the ILR and OLR, where resonant effects not associated with corotation start to play an important role in the dynamics. Near those resonances, angular momentum is clearly not conserved, and to describe the dynamics accurately, one must instead shift to the proper slow angle–action coordinates appropriate to each resonance.

2. Example: Radial migration

Radial migration is the name given to any dynamical process that takes a star on a circular or nearly circular orbit in a galactic disk and alters its angular momentum (or equivalently its guiding radius) while producing little or no change in its radial action. The ubiquity of radial migration in galactic disks, and the importance of this especially for the evolution of chemical gradients, was first appreciated by Sellwood and Binney.54 Here, we perform two numerical experiments similar to the ones shown in their paper, in order to demonstrate the difference between linear (Sec. IV E) and nonlinear (Sec. IV F) perturbations.

We consider a similar setup to the one we described in Sec. IV C 1 by integrating an array of test particles in the logarithmic halo potential Φ¯ with an accompanying spiral potential perturbation δΦ, except this time we modify the potential perturbation (56) by multiplying it by a Gaussian in time
(73)
This corresponds to a transient spiral which grows and decays on the timescale τ, and whose amplitude reaches its peak at t=tpeak. All other parameters are set to the same values as in Fig. 13. With these parameters, we can use Eq. (71a) to estimate the libration time at peak spiral amplitude to be tlib5Tp. For the initial conditions of the stars, we use circular orbits with radii randomly sampled from the range R(0)/R0[4,9], which is centered on the corotation radius but does not reach as far as the ILR/OLR. We choose the initial azimuthal angles ϕ(0) randomly in [0,2π]. We integrate for a total time of 10Tp, and always set the time of peak amplitude to be halfway through the integration, tpeak=5Tp.

First, we integrated the equations of motion of 5000 test stars in this spiral potential, choosing the growth/decay time to be τ=3Tp. In panels (a) and (b) of Fig. 17, we show the resulting evolution of the angular momenta of a randomly selected subset of these stars, shown with different colors, as a function of azimuthal angle and time. In panel (c), we show a scatterplot of the change in angular momentum, ΔLL(10Tp)L(0), for every star, as well as the mean ΔL at each fixed L(0). The red dashed line in these panels shows the corotation resonance, while the solid pink lines are displaced from corotation by Lh [Eq. (68)], the approximate resonance width at peak spiral amplitude. We see that close to the resonance, the spiral traps stars and causes them to undergo large shifts in their angular momenta (up to the resonance width Lh). As the spiral's amplitude decays, it “drops off” these stars at angular momenta, which are often rather different from their initial values. The blue line in panel (c) makes it clear that there is a preference for stars initially inside corotation (L(0)<LCR) to receive positive ΔL, and vice versa. To understand this, note that in the limit where the libration time at peak amplitude is very short, a star which was originally located inside corotation will librate back and forth across L=LCR several times before being deposited at some new L=L(0)+ΔL. This new L could be either inside or outside LCR with roughly equal probability, so on average the star will have moved out (ΔL>0). Meanwhile, far from the resonance, the stars respond roughly adiabatically to the perturbation and so ΔL is clustered around zero. We also mention that in panel (c), many stars gather on the line with slope 2 that passes through ΔL=0 at corotation. This line corresponds to stars that are initially located at L(0)=LCR being deposited at LCR+.55,56

FIG. 17.

Radial migration in a galactic disk. We integrate the orbits of 5000 stars that are initially on circular orbits with random azimuths ϕ(0) and radii chosen randomly in the range R(0)/R0[4,9]. The stars are perturbed by a transient spiral potential (73) with lifetime τ=3Tp. Panels (a) and (b) show the evolution of the angular momenta of a handful of these stars as functions of azimuthal angle in the rotating frame and time, respectively. Panel (c) shows the change in angular momentum experienced by each particle by the end of the simulation as a function of its initial angular momentum L(0). In each panel, the red dashed line corresponds to corotation, and the solid pink lines are offset from corotation by approximately the resonance half-width Lh=mIh [Eq. (68)] at peak spiral amplitude.

FIG. 17.

Radial migration in a galactic disk. We integrate the orbits of 5000 stars that are initially on circular orbits with random azimuths ϕ(0) and radii chosen randomly in the range R(0)/R0[4,9]. The stars are perturbed by a transient spiral potential (73) with lifetime τ=3Tp. Panels (a) and (b) show the evolution of the angular momenta of a handful of these stars as functions of azimuthal angle in the rotating frame and time, respectively. Panel (c) shows the change in angular momentum experienced by each particle by the end of the simulation as a function of its initial angular momentum L(0). In each panel, the red dashed line corresponds to corotation, and the solid pink lines are offset from corotation by approximately the resonance half-width Lh=mIh [Eq. (68)] at peak spiral amplitude.

Close modal

Second, in Fig. 18, we rerun the same experiment except with a much shorter-lived spiral, τ=0.3Tp. We see from panel (a) that in this case, no orbits manage to undergo a full libration inside the trapping region. As a result, there is no preferential direction of the transport of stellar angular momenta across the resonance, so the mean ΔL in panel (c) is approximately zero for all initial L(0). In fact, since the perturbation lasts for so short a time, the resonance location is not so significant, meaning the distribution of ΔL values does not depend particularly strongly on L(0). In fact, the envelope of ΔL values is significantly broader far away from corotation than it was in Fig. 17 (so the variance (ΔL)2 is relatively large even though ΔL is close to zero).

FIG. 18.

As in Fig. 17, except the spiral grows and decays much more quickly, τ=0.3Tp.

FIG. 18.

As in Fig. 17, except the spiral grows and decays much more quickly, τ=0.3Tp.

Close modal

From the results of these numerical experiments, we can extract two key physical principles, which apply ubiquitously in the kinetic theory of both stellar systems and plasmas:

  • The efficiency with which different perturbations shuffle the integrals of motion of a distribution of particles depends on the spatio-temporal correlation spectrum of those perturbations (just as in, e.g., the classic stochastic acceleration study of Sturrock57). It is therefore imperative to develop a theory, which can account for the statistical evolution of a distribution of particles when exposed to such a perturbation spectrum, be it externally or self-consistently generated. This is what we will do in the rest of this tutorial article, starting in Sec. V.

  • If a single coherent perturbation dominates the potential δΦ over a timescale tlib, then it will trap resonant orbits nonlinearly, and this may drive the system down a substantially different evolutionary path compared to the case with no trapping. But if some process is able to interrupt the trapping process on a timescale tlib, then nonlinear trapping can be ignored and one can apply linear perturbation theory (Sec. IV E) with confidence. This “interruption” can be due to the rapid decay of the perturbation as in the example above, or because of a time-dependent Ωp and the accompanying movement of the resonance location through phase space,51,58 or it can be due to some collisional/diffusive process, which kicks particles in and out of the resonance region. We address the last of these possibilities in Sec. VIII B. We also mention that the interruption of nonlinear trapping by time-dependent wave amplitudes, collisionality, and so on is very commonly studied in plasma theory.59–62 

In Sec. II, we introduced the idea that an arbitrary gravitationally bound ensemble of stars reaches virial equilibrium after a few dynamical timescales tdyn. It then moves through a series of such equilibria on the relaxation timescale trelaxtdyn. Then, in Secs. III and IV, we saw how to describe the quasiperiodic orbits of stars in smooth, regular, time-independent potentials Φ¯, became familiar with the machinery of angle–action variables, and discussed the modification of orbits by some perturbation δΦ. Here, we will set up the remainder of the article by bringing these two sets of ideas together. Our aim is to describe a quasi-equilibrium stellar system as a collection of quasiperiodic orbits and then to develop a quantitative theory of how fluctuations in the true gravitational potential cause that quasi-equilibrium to change slowly over time.

We start with the microscopic Klimontovich (DF), or empirical DF,
(74)
(Here and in what follows, the subscript “M” stands for “microscopic.”) This function encodes the exact position ri(t) and velocity vi(t) of every star i at time t, which are themselves the exact solutions to the Newtonian equations of motion (2). Therefore, for indistinguishable particles, the Klimontovich function (74) contains complete information about the system. Yet, at this stage, it is totally useless because if we knew all this information, we would not need kinetic theory. Nevertheless, describing the system in this way is what will allow us to most conveniently formulate that kinetic theory.
Note that f is normalized such that drdvf=Nm=M, the total mass of the stars. Thus, if there is no external forcing, the exact microscopic gravitational potential is
(75)
where ψ(r,r)=G/|rr|. This is simply the formal solution of the Poisson equation
(76)
where the microscopic mass density is ρm=dvfm. It follows that every particle i satisfies Hamilton equations (30), namely,63 
(77)
(77a)
(77b)
where Hm is the microscopic one-particle Hamiltonian
(78)
Using Eqs. (74)–(78), it is now easy to check that fm(r,v,t) exactly satisfies the Klimontovich equation
(79)
which we notice can also be written in a pleasingly coordinate-free form as
(80)
This form of the equation is particularly convenient when we wish to perform a canonical coordinate transform, e.g., from position and velocity to angle–action variables. Physically, Eq. (80) just says that each infinitesimal “pixel” of phase space fluid is invariant if one follows it along the trajectory prescribed by the exact Hamiltonian Hm (Liouville's theorem).
We now take the formal step of splitting all quantities into an angle-averaged part and an angle-dependent fluctuation (which at this stage is not necessarily small). Thus, we write the microscopic DF as
(81)
where
(82)
Obviously, if we were to convert back to real space coordinates, then f0, for a galactic disk, would be axisymmetric (independent of ϕ), while all nonaxisymmetric fluctuations would be contained in δf. The equivalent in a homogeneous plasma would be a DF f0 that depended only on velocity, with all position-dependent fluctuations carried in δf. Similarly, the scalar potential Φm as defined in Eq. (75) is a linear functional of the DF fm, as is the microscopic Hamiltonian Hm defined in Eq. (78). Thus, in correspondence with (81), we may write
(83)
By construction, H0 is the part of the Hamiltonian that only depends on actions J, while any θ dependence (and hence any nonaxisymmetry of the potential) is contained in δΦ (see Sec. IV A). In a homogeneous system, H0=v2/2. Since the transformation to angle–action coordinates (Sec. III C) is canonical, it conserves the infinitesimal volumes drdv=dθdJ and so the Klimontovich DF (74) can be expressed as
(84)
We note the subtlety that even though all nonaxisymmetry of the DF fm is contained in δf, the quantity δf may also contain an axisymmetric piece. Related to this is the fact that f0 is not equivalent to the axisymmetric part of fm. This means that, for example, one cannot in general construct the axisymmetric part of the exact surface density distribution Σ¯m(R)=(2π)1dϕdvfm of a disk from f0(J) alone: one also needs to know the θR-dependent part of the DF.
Let us briefly consider what happens if there are no nonaxisymmetric fluctuations (which in reality would require N). Setting δf=δΦ=0 in Eqs. (81)–(83) and plugging the results into (80), we get64 
(85)
The left-hand side of this equation can be thought of as a “mean field operator” which advects particles along their mean field trajectories; the equation therefore tells us that in the absence of fluctuations, the phase space density along mean field trajectories is conserved, as we would expect. In fact, since f0 and H0 only depend on J, the Poisson bracket is zero [see Eq. (34)]:
(86)
This means that f0/t=0, i.e., f0 cannot change with time even at a fixed phase space location. Put differently, any f0 that is only a function of J is an equilibrium distribution in the absence of fluctuations, or what is called a collisionless equilibrium, or equivalently a quasi-stationary distribution—a result known as Jeans' theorem. In a homogeneous stellar system or electrostatic plasma, Eq. (86) is equivalent to the requirement that [f0,12v2]=0, which is trivially true of any f0 which only depends on velocity. In this case and in the case of inhomogeneous stellar systems, the fact that f0 satisfies (86) does not mean that it is a stable equilibrium; in general, determining whether some f0 is linearly stable requires the linear response theory that we will develop in Sec. VI.
Let us now reinstate the fluctuations, in particular allowing f0 to be time-dependent. Plugging the expansions (81), (83) into (80) and writing the Poisson bracket out explicitly as in Eq. (34), we get, without any approximations,
(87a)
(87b)
where Ω(J)H0/J. Equation (87a) tells us how the (nonlinear) coupling of nonaxisymmetric fluctuations δf and δΦ drives evolution of the angle-averaged part of the DF f0. Note that this equation takes the form of a continuity equation in action space, as it must, since we are not allowing stars to be created or destroyed. Meanwhile, Eq. (87b) tells us how the fluctuating part of the DF evolves over time.
If we are doing the problem self-consistently, then we must also couple these two equations to the Poisson equation for the fluctuations, namely,
(88)
In plasma, the interaction kernel ψ has no dependence on the “actions” (velocities), which makes the Poisson equation much easier to solve in practice.
At this point, it is convenient to introduce a Fourier representation for our angle-dependent quantities. That is, we use the periodicity of the angle variables θ to express δf and δΦ as Fourier series, like in Eq. (36), namely,
(89a)
(89b)
Let us now rewrite (87a) and (87b) in Fourier form so that
(90a)
(90b)
Meanwhile, the Fourier-transformed version of Eq. (88) reads
(91)
The (bare) coupling coefficients ψnn(J,J) encode, in essence, Poisson's equation in angle–action space, through the decomposition
(92)
In the limit of an homogeneous multi-periodic plasma (Sec. III C 1), these coefficients are greatly simplified and become ψnnδnn/|n|2. In that limit, note that (i) the only “resonances” permitted are for n=n and (ii) the coupling between particles does not depend on the actions (J,J). The fact that neither (i) or (ii) hold in general for realistic stellar systems makes linear response calculations much harder than in homogeneous plasmas (Sec. VI).

The only assumption we have made in deriving Eqs. (90a) and (91) is that H0 is integrable, i.e., that there exists a set of global angle–action coordinates (θ,J) such that H0=H0(J). If this is true, then Eqs. (90a) and (91) are exact. Sadly, they are also very difficult to solve.

To render them solvable, we will often make an additional assumption that angle-dependent fluctuations are small, i.e., δff0 and δΦH0. With this, it follows naturally from Eqs. (90a) and (90b) that the fluctuation δf evolves on the dynamical timescale tdyn1/|Ω|, whereas the angle-averaged DF f0—whose rate of change is quadratic in the small fluctuations—evolves on some much longer timescale trelaxtdyn. This timescale separation is what allows the system to move through a series of virial equilibria, each of which approximately satisfies Eq. (85), as we described qualitatively in Sec. II C 3. For self-consistent problems, H0[f0] will also be changing on the timescale trelax, and in practice every time H0 changes significantly, we have to recompute the mapping (r,v)(θ,J).

Our assumption of weak fluctuations means that we now have a well-defined program for solving the system (90a) and (91). On the timescale tdyn, we may linearize equation (90b) by throwing away the small terms on the right-hand side
(93)
and then solve it under the assumption that f0 is fixed. If we are doing a self-consistent problem, we must couple this linearized equation with the Poisson equation (91). The simultaneous solution of these two linear equations is a nontrivial problem and is the subject of Sec. VI. Supposing we can do this successfully, we can then plug the solutions δf, δΦ into the right-hand side of Eq. (90a)—or, if appropriate, its ensemble-averaged cousin (118)—and solve for the evolution of f0 on the much longer timescale trelax. Such evolution of f0 is called quasilinear, since it involves the multiplication of two small quantities, each of which is determined using linear theory.
We emphasize that the linear equation (93) is just the kinetic version of the linear perturbation theory that we wrote down for the orbits of individual stars in Sec. IV E. To see this, note that the formal solution to Eq. (93) is just
(94)
The first term on the right-hand side represents the ballistic evolution (i.e., the mean-field motion, along straight lines in angle–action space) of some initial fluctuation δf(θ,J,0) as governed by the mean field Hamiltonian H0, i.e., the phase mixing of unperturbed orbits J= cst, θ=Ωt+ cst. Phase mixing causes the initial fluctuation to oscillate rapidly in action (or, more properly, frequency) space for times t1/|Ω|—see Fig. 19 for illustration. [Analogously in plasma, phase mixing consists of particles moving along v= cst, r=vt+ cst and therefore δf(r,v,0) winds up to ever smaller scales in velocity space].

The second term on the right-hand side of Eq. (94) tells us how the DF is modified by the potential perturbation δΦ as evaluated along the unperturbed trajectory; in other words, it is the result of applying an equation like (60) to every star. Put differently, in any linear or quasilinear kinetic theory, the only particle trajectories that are accounted for are the ballistic unperturbed trajectories set by H0 (straight lines in angle–action space) plus the small (sinusoidal) wiggles instigated by δΦ. Quasilinear transport is then a consequence of resonances between the oscillatory forcing δΦ and these first-order wiggles—hence enters perturbation theory at second order.

FIG. 19.

Schematic illustration of phase mixing in angle–action space as a function of time. The gray blob represents some phase space DF fluctuation δf evolving in the absence of external forces [first term on the right-hand side of equation (94)]. Unperturbed trajectories follow horizontal lines in angle–action space [Fig. 10(b)]. Since the orbital frequencies Ω(J) depend on J, particles of different actions dephase. This leads to the appearance of ever finer structures in phase space.

FIG. 19.

Schematic illustration of phase mixing in angle–action space as a function of time. The gray blob represents some phase space DF fluctuation δf evolving in the absence of external forces [first term on the right-hand side of equation (94)]. Unperturbed trajectories follow horizontal lines in angle–action space [Fig. 10(b)]. Since the orbital frequencies Ω(J) depend on J, particles of different actions dephase. This leads to the appearance of ever finer structures in phase space.

Close modal

One can imagine iterating the above procedure to higher orders in perturbation theory by plugging (94) back into the right hand side of (90b), calculating δfn to second order in the perturbation strength, and so on.65 At the level of particle orbits, this would correspond to adding “sinusoidal disturbances on top of sinusoidal disturbances” over and over with ever smaller amplitude. In the case where the perturbations δΦ are stochastic, this approach is usually given the name weak turbulence theory. Weak turbulence is the subject of many a plasma monograph such as the excellent one by Kadomtsev66 and underlies the resonance broadening theories pioneered by Dupree67 and others.16,68,69 On the other hand, in the case where δΦ is deterministic, the second-order calculation leads, e.g., to the classic plasma echo phenomenon first pointed out by Gould et al.70 

However, no matter how many iterations of this perturbative scheme one performs, it cannot account for fundamentally nonperturbative effects such as particle trapping by strong potential perturbations (Sec. IV F). One way to see this is to note that the perturbative iteration scheme accounts for effects that are of order ε, ε2, ε3, etc., where ε|δΦ/H0| measures the strength of the potential perturbations. Nonlinear trapping, though, enters the dynamics at order ε1/2 (Sec. IV F), and so cannot be recovered by summing any finite number of perturbative terms. Another way to arrive at the same conclusion is to note that trapped orbits [like the one in Fig. 14(e)] are topologically distinct from untrapped ones [c.f. Fig. 14(f)], and no amount of iterative sinusoidal deformations of the latter can ever reproduce the former.

These fundamentally nonperturbative effects can be very important for the evolution of the system as a whole, but a kinetic theory that accounts for them cannot start from Eq. (93). In the case where δΦ is stochastic, this regime is termed strong turbulence, and formulating an accurate kinetic theory turns out to be very difficult even in the simplest plasmas.68 Luckily, in galaxies, δΦ is often dominated by a single coherent perturbation, such as a strong bar or spiral arm. In this case, a simple kinetic theory incorporating trapping can be developed using the pendulum approximation (Sec. IV F), and this will be the subject of Sec. VIII. Moreover, just as we discussed at the end of Sec. IV F 2, if nonlinear effects are somehow quenched or interrupted on a short enough timescale, then the problem is relinearized and (94) is rendered (approximately) valid. We will see a concrete example of this in Sec. VIII B.

In this section, we study the self-consistent response of a system to weak perturbations, using the linear approximation (Sec. V D). We assume throughout that the angle-averaged DF f0 is time-independent, which means that we need to solve Eqs. (91) and (93) simultaneously, but we can ignore Eq. (90a).

For simplicity, we will develop the linear theory in the absence of any possible external perturbations, i.e., we will assume the only potential fluctuations are derived self-consistently internally from the angle-dependent part of the DF δf. However, the case where external perturbations are present is not essentially different (see, e.g., Chap. 5 of BT08), and we will show a relevant example in Sec. VI C 1.

Equations (91) and (93) constitute an initial value problem and are naturally solved via Laplace transform.71 Let us therefore define the Laplace transform of a function h(t), and its inverse via
(95a)
(95b)
Recall that the Laplace transform (95a) is defined for complex ω in the upper-half plane, or more precisely for Imω>σ where σ is chosen large enough for the integral to converge at t. In the inverse transform (95b), the “Bromwich contour” runs along the line Im ω=σ from Reω= to Reω=+, but can be deformed at will, provided it never crosses over any singularities of h̃(ω).
Laplace transforming Eq. (93), we get
(96)
with δfn(J,0) is the initial fluctuation in the DF. The solution to this equation is
(97)
[Note that by taking the inverse Laplace transform of this, we can recover the formal solution (94)]. Meanwhile, the Laplace-transformed version of Eq. (91) is
(98)
Let us now put Eqs. (97) and (98) together in order to eliminate δf̃n(J,ω). The result is the following integral equation for the potential fluctuation δΦ̃n(J,ω):
(99)
where ΩH0(J)/J.
Equation (99) is a linear operator equation of the form
(100)
where we have written the “state” of the potential fluctuations at frequency ω as |δΦ̃(ω) and introduced the abstract linear operator M̂(ω). Note that (100) is not merely a schematic representation of Eq. (99); instead, it is what is known in quantum statistical mechanics as a Dyson equation.28,72 One can in fact develop the entire response theory of inhomogeneous systems on the basis of the abstract Dyson equation and operator theory73–76 without reference to the explicit angle–action representation. We will not go into the details here, but instead just discuss the physics that can be gleaned from Eq. (100).

On the right-hand side of Eq. (100), we have the source term |δΦ̃source(ω), which corresponds to the second term on the right-hand side of Eq. (99). This is the part of the potential fluctuations that arises purely from the inevitable, finite-N discreteness noise δfn(J,0), propagated ballistically along mean field trajectories. (Had we included an external potential perturbation in our calculation, it would just be added to this source term.) The other term involves the operator M̂(ω) acting on the potential fluctuation |δΦ̃(ω). It tells us the extent to which potential fluctuations arise because of perturbations in the DF which were themselves induced by potential fluctuations. Thus we are discussing collective effects: perturbations to the DF that seed their own gravitational fields. Glancing back at (99), we expect these collective effects to be significant if there are fluctuations at frequencies ω which are close to some of the available stellar frequencies n·Ω. These fluctuations will be the all more efficient provided they come from stars at locations in action space J where the mean field DF has a large gradient f0/J.

Another way to understand (100) is as follows. We start with an initial fluctuation |δΦ̃source(ω) and then ask when we ask what is the self-consistent response of the system to itself. First, the fluctuation |δΦ̃source(ω) provokes a response M̂(ω)|δΦ̃source(ω). The response to this response is M̂(ω)M̂(ω)|δΦ̃source(ω), and so on. Equating this string of self-consistent responses to the overall potential, we have |δΦ̃(ω)=[I+M̂(ω)+M̂(ω)M̂(ω)+]|δΦ̃source(ω)=[IM̂(ω)]1|δΦ̃source(ω), which we can multiply by [IM̂(ω)] to arrive at (100).

How do we solve equation (100)? In homogeneous plasma theory, it is easy: in that case, the operator notation we have employed here is unnecessary because M̂(ω) is a scalar (up to proportionality constants, it is equal to 1 minus the dielectric function ϵk(ω)), which “acts” on the potential fluctuation by simple multiplication. In inhomogeneous systems, we are not so lucky. Ultimately, this is because, as we warned in Sec. III D, the potential fluctuation δΦ(θ,J) depends on both phase space coordinates and momenta, and so, M̂(ω) acts upon these fluctuations in a nontrivial long-range fashion. In practice, this means we must solve the operator equation by casting [IM̂(ω)] in a matrix representation.

The usual technique for doing this is the so-called biorthogonal basis method originally developed by Kalnajs.77 The idea is that we can expand an arbitrary potential fluctuation in real space in terms of some set of basis functions Φ(p)(r), and similarly for a density fluctuation using some ρ(p)(r). For self-consistent problems, it is intelligent to choose these functions in pairs such that they solve the Poisson equation, 2Φ(p)=4πGρ(p), and are orthogonal, i.e., drΦ(p)*(r)ρ(q)(r)=0 unless p=q. Many such sets are known, and the precise set one should use depends on the problem at hand (but they usually consist of familiar objects like spherical harmonics, Bessel functions, etc.). Physically, each discrete pair (Φ(p),ρ(p)) corresponds to a way in which the stellar system can oscillate self-consistently, and the full response of the system can be built up by superposing these discrete oscillations. This is reminiscent of elementary quantum mechanics, in which one builds the wavefunction of an electron in the hydrogen atom as a superposition of orthogonal eigenfunctions. Mathematically, once we have constructed such a basis, we can project our linear response equations onto it, thereby translating a problem of coupled differential equations into one of linear algebra.

The details of the biorthogonal basis method are rather technical, and unimportant for most of what follows, so we relegate them to  Appendix B, but the explicit matrix form of the operator M̂(ω) that results is
(101)
where Φn(p)(J) is the Fourier transform of Φ(p)(r) in angle space [see Eq. (36)], and E is an arbitrary constant with units of energy. In Eq. (101), M(ω) is usually called the response matrix. There is a striking similarity between Eq. (101) and the corresponding quantity in plasma theory, and we explore this connection in Sec. VI B 1. Moreover, just as in plasma theory, although we derived (101) for Im ω>0, we can analytically continue this function to the lower-half ω plane by taking the action space integral along the Landau contour. In practice, for any given n, stellar systems typically sustain only a finite range of resonance frequencies, n·Ω. This leads to branch cuts in the response matrix,78,79 as is also the case, for example, in magnetized relativistic plasmas,80 in the toroidal ion temperature gradient mode,81,82 or truncated Maxwellians.83 We will not pursue any details here, but refer the reader to Chap. 5 of BT08 and to Fouvry and Prunet84 for more information.

Having expressed M̂(ω) as a matrix M(ω), the nature of the solution to Eq. (100) depends crucially on whether det[IM(ω)]=0 or det[IM(ω)]0:

  • If ω is such that det[IM(ω)]0 then the inverse matrix [IM(ω)]1 exists, and the solution to Eq. (100) is simply
    (102)

    This is true for the great majority of ω values. Equation (102) equation tells us that at these frequencies, the noise source |δΦ̃source(ω) gets amplified by collective effects. Sometimes this is referred to as the noise being dressed by a gravitational wake. In the equivalent electrostatic plasma calculation, IM is just the dielectric function ϵk(ω); for near-equilibrium plasmas, the solution (102) tells us that each “bare” charge is locally screened by the Debye cloud of opposite charges that it gathers round itself. But in the gravitational context, there is no screening, and collective effects will often make gravitational perturbations on large scales stronger, not weaker. Loosely speaking, whereas Debye clouds tend to reduce the effective charge, gravitational wakes tend to increase the effective mass hence typically shortening the relaxation time (see Sec. VI C 1 for an example). This behavior was illustrated beautifully in the context of a local shearing portion of a galactic disk by Toomre and Kalnajs.85 

  • Alternatively, for ω such that det[IM(ω)]=0, the inverse of [IM(ω)] does not exist, which tells us that |δΦ̃(ω) can be nonzero even in the absence of a source term on the right-hand side of Eq. (100). These are the so-called Landau modes of the system,86 and they tend to occur for a discrete set of (generally) complex frequencies {ωm}. The system is then able to support a discrete set of collective oscillations which, just as electromagnetic waves in vacuum, do not need to be continually driven by some noise source. Moreover, under fairly weak assumptions one can show that the solution of Eq. (100) at this frequency will have time-dependence eiωmt (see Kalnajs75 for a precise statement). Thus, a system is linearly unstable if there exists a Landau mode whose imaginary part Imωm>0. Otherwise, the system is stable.

This confluence of a “dressed continuum” of fluctuations superposed with a discrete set of “Landau modes” is characteristic of both plasmas and stellar systems, although the typical responses of these systems to perturbations can be very different, as we will illustrate with some examples below.

Note that although we use the term Landau mode above, these are in general not the normal modes of the linearized system, because ω is in general not real. An alternative approach to linear response theory is to use purely real frequencies, in which case the fundamental objects in the theory are (singular) van Kampen modes.87 One can show that Landau modes are equivalent to a superposition of a continuum of these van Kampen modes. We will not refer to van Kampen modes again in this article—we only note that they offer an alternative way to formulate everything we present in this section.

1. Example: Jeans instability in a homogeneous periodic cube

First, let us address directly the analogy between stellar systems and homogeneous plasmas, by calculating the response matrix M(ω) for a fictitious three-dimensional homogeneous stellar system placed inside a periodic cube. As noted in Sec. III C 1, in that case, orbits are simple (multi-periodic) straight lines, and we can take θr and JΩv [see Eq. (38)]. The system's instantaneous potential and DF are linked by
(103)
where ψ(r,r)=G/|rr| is the gravitational pairwise interaction. Making use of the periodicity of this system one has
(104)
Glancing at (B4), we find that, in homogeneous systems, the natural basis elements are Φ(p)eip·θ/|p|, namely, plane waves. This is unsurprising, since the system is translationally invariant. In Eq. (104), the factor 1/|p|2 is naturally reminiscent of the 1/|k|2 factor that arises when Fourier transforming the electrostatic Poisson equation for plasmas.
With this, the response matrix from Eq. (101) becomes
(105)
where ω¯(2π)1Lω. Assuming88 that the system's mean DF follows the Maxwellian distribution
(106)
where ρ0 is the system's mean density, and σ is the velocity dispersion, we find from Eq. (105) that
(107)
where ζ=ω¯/(2|p|σ) is a dimensionless frequency and Z is the usual plasma dispersion function89 
(108)
and LJ is the so-called Jeans length
(109)
The main difference the expression (107) for a self-gravitating system, and the analogous one for an electrostatic plasma, is an overall minus sign.

The dispersion relation for Landau modes is then δpqMpq(ωm)=0. By looking for solutions with Re ωm=0, one can show that instability occurs (Im ωm>0) if the system is larger than a critical scale, L>LJ (see Sec. 5.2.4 of BT08 for details). Thus, unlike in a plasma, a homogeneous Maxwellian distribution of stars is not necessarily stable to linear perturbations, but instead can be made unstable, e.g., by making the system dynamically colder (reducing σ) or denser (increasing ρ0). The physics underlying this Jeans instability is the competition between gravity and velocity dispersion, which we sketch in Fig. 20.

FIG. 20.

Jeans instability as a competition between velocity dispersion and gravitational attraction. A density fluctuation ρ0 over a length scale L will collapse under its own gravity if the collapse timescale 1/Gρ0 is shorter than the crossing time L/σ where σ is the velocity dispersion.

FIG. 20.

Jeans instability as a competition between velocity dispersion and gravitational attraction. A density fluctuation ρ0 over a length scale L will collapse under its own gravity if the collapse timescale 1/Gρ0 is shorter than the crossing time L/σ where σ is the velocity dispersion.

Close modal

In a stellar system without any velocity dispersion, a density fluctuation ρ0 over a length scale L would collapse under its own gravity on the timescale 1/Gρ0. However, if the stars have some velocity dispersion σ, then they can escape the region of size L on the timescale L/σ. If the latter timescale is longer than the former, the system cannot avoid collapse: this happens if Lσ/Gρ0, which, up to a prefactor, is the Jeans length (109). The Jeans instability is the basic mechanism underlying large-scale structure formation in the Universe.3 

1. Example: Spiral Landau modes in cold stellar disks

Another important example of a Landau mode is a spiral mode in a disk galaxy. In truth, it is still not completely clear what gives rise to the spiral structure that we see in real galaxies like that in the top panel of Fig. 1.

Rather than attempt a review of this fascinating subject here,92,93 let us restrict ourselves to considering a more artificial system, namely, an isolated, two-dimensional stellar disk consisting of equal-mass stars and no gas. In this case, spirals can arise manifestly as linear instabilities (i.e., Landau modes with Im ωm>0) of an underlying DF f0(L,JR).

A classic example of such a spiral mode was discovered by Zang,90 which we reproduce in Fig. 21 (adapted from Petersen et al.91) The model Zang employed was a “Mestel disk”—a simple, scale-free, two-dimensional disk embedded in a rigid dark matter halo—to which he added inner and outer radial truncations. Just as with the Jeans instability (Sec. VI B 1), one can always make a disk unstable by making it sufficiently dynamically cold and/or dense. In Zang's case, this was achieved by modifying the active fraction of the disk, essentially the ratio between the self-gravitating mass in the disk and that in the unresponsive halo. Panel (a) of Fig. 21 shows the contours of det[IM(ω)] [Eq. (101)] in the complex frequency plane. The yellow cross shows the location where this determinant equals zero, which gives the frequency ωm of an unstable spiral Landau mode. Precisely, the mode rotates with pattern speed 0.88Ω0 and growth rate 0.12Ω0, where Ω0 is a typical dynamical frequency comparable to 2π/tdyn. The spatial structure of the spiral mode is shown in panel (b); this is calculated by finding the eigenvector Xm with components Xmp, such that M(ωm)·Xm=0 and then writing
(110)
(Since this is a linear eigenmode, the overall normalization is arbitrary.)
FIG. 21.

Determination of an unstable spiral Landau mode in a truncated Mestel disk following Ref. 90. Panel (a) shows the determinant of IM(ω) in the complex frequency plane; the yellow cross shows where this function passes through zero. Panel (b) shows contours of the corresponding spiral Landau mode. Figure adapted from Petersen et al., Mon. Not. R. Astron. Soc. 530, 4378 (2024). Copyright 2024 Royal Astronomical Society.91 

FIG. 21.

Determination of an unstable spiral Landau mode in a truncated Mestel disk following Ref. 90. Panel (a) shows the determinant of IM(ω) in the complex frequency plane; the yellow cross shows where this function passes through zero. Panel (b) shows contours of the corresponding spiral Landau mode. Figure adapted from Petersen et al., Mon. Not. R. Astron. Soc. 530, 4378 (2024). Copyright 2024 Royal Astronomical Society.91 

Close modal

The two examples we have given so far both involved smooth mass distributions that were made sufficiently cold and/or dense that they became unstable. However, there are other ways of producing instabilities which are perhaps more familiar to plasma physicists used to the bump-on-tail paradigm.16 For instance, De Rijcke et al.94 recently performed a linear stability analysis, like the one we have outlined in this section, for a grooved Mestel disk. That is, they took a Mestel disk with a stable DF—the JR=0 slice through which is shown with a dotted line in Fig. 22(a)—and then carved a “groove” in it by removing a portion of the stars in a given range of angular momentum [solid line in Fig. 22(a)].

FIG. 22.

Groove-induced instability in a Mestel disk. Panel (a) shows the JR=0 cut through the DF in the stable (ungrooved) and unstable (grooved) case. Panel (b) shows the unstable spiral mode as calculated from linear stability analysis of the grooved disk. The solid circle shows the location of the groove for circular orbits, while the three dashed circles show the ILR, CR, and OLR resonance locations of the resulting spiral mode. Figure adapted from De Rijcke et al., Mon. Not. R. Astron. Soc. 484, 3198 (2019). Copyright 2019 Royal Astronomical Society.94 

FIG. 22.

Groove-induced instability in a Mestel disk. Panel (a) shows the JR=0 cut through the DF in the stable (ungrooved) and unstable (grooved) case. Panel (b) shows the unstable spiral mode as calculated from linear stability analysis of the grooved disk. The solid circle shows the location of the groove for circular orbits, while the three dashed circles show the ILR, CR, and OLR resonance locations of the resulting spiral mode. Figure adapted from De Rijcke et al., Mon. Not. R. Astron. Soc. 484, 3198 (2019). Copyright 2019 Royal Astronomical Society.94 

Close modal

This grooved disk is also unstable to spiral Landau modes, and Fig. 22(b) shows the shape of the dominant mode.

Simulations show that the Landau mode spectrum of stellar disks is often dominated by one, or at most a few, isolated frequencies.95 This means that spirals in unstable disks are perhaps more analogous to, e.g., Alfvenic eigenmodes in tokamak plasmas than to classic bump-on-tail instabilities in homogeneous plasmas, since the latter tend to harbor a whole spectrum of unstable modes (at least one at every unstable wavenumber k). Because of this, the evolution of unstable stellar disks might be more aptly described by something like the single-wave kinetics of O'Neil,96 Berk and Breizman,97,98 etc., than by the typical many-wave quasilinear theory of Drummond and Pines,99 and Kaufman.100 See Sec. VIII for more.

We now suppose that our system is stable, meaning that all Landau modes are damped. An astrophysical example of such a (weakly) damped mode is the (dipole) =1 mode of globular clusters,101 which represents long-lasting sloshing oscillations.102 We further assume that we may wait long enough that these modes no longer contribute to the response (i.e., we wait for t|Imωm|1, where ωm is the system's least strongly damped Landau mode). In this regime, we can solve (99) directly under the assumption that ω is not associated with a Landau mode. Having constructed the solution, we will then perform an inverse Laplace transform to get the dressed wake part of the response in the time domain.

The easiest—and most physically enlightening—way to solve (99) is to guess what form the solution should have, and then work out the exact details post hoc. So, let us look at Eq. (99). Supposing for a moment that we were to ignore collective effects (M̂=0), we would find
(111)
which we call the “bare” solution. Inverse Laplace transforming this according to Eq. (95b), deforming the Bromwich contour so that it encircles the pole at ω=n·Ω (see Fig. f.1 of Fouvry and Bar-Or103 for illustration) and closing the contour with a large semicircle in the lower-half plane, we would find
(112)
The physical interpretation of this result is very clear: it is the (Fourier transform of the) potential at location r(θ,J) that is created by summing the contributions from every star with initial locations r(0)=r(θ(0),J(0)), assuming those stars simply travel along mean field trajectories θ=Ωt+θ(0) and J=J(0).
The ansatz we now make is that the solution to the full Eq. (99) is exactly of this functional form, except with the bare interaction potential ψnn(J,J) replaced with some dressed (and frequency-dependent) interaction potential ψnnd(J,J,ω), whose form we do not yet know. That is, we assume the solution has the form
(113)
This ansatz is actually very natural if one is acquainted with Rostoker's principle in plasma kinetics, which essentially says that one can consider a hot plasma as a set of completely uncorrelated particles, provided one replaces the bare Coulombic potential of each particle with its dressed, or screened, version.16 (We will use this idea to derive the Balescu–Lenard kinetic equation for stable stellar systems in Sec. VII B.) Plugging the ansatz (113) into (99), we can read off an implicit equation for ψd:
(114)
where ΩH0(J)/J. Thus, we will have constructed a valid solution to Eq. (99) if we can solve Eq. (114) for ψd. While at first sight it may not seem like we have made much progress with this rewriting, it turns out that (114) is actually quite easy to solve if we project the various terms onto the biorthogonal basis elements Φn(p)(J), just as in Eq. (101). We leave the details to  Appendix B, and here just write down the final answer:
(115)
We can usefully compare this to the bare interaction, which as we show in  Appendix B can be expanded as
(116)
Thus, we see that in passing from ψψd, we have simply “divided” the bare interaction by [IM(ω)]. This is analogous to dividing by the dielectric function ϵk(ω) in electrostatic plasma theory.
We can now inverse Laplace transform Eq. (113). The assumption that Landau modes can be ignored (i.e., t|Imωm|1) means that when we deform the Bromwich contour we need only pick up contributions from the poles at ω=n·Ω, and can ignore any singularities of ψd [i.e., zeroes of [IM(ω)], see Eq. (115)]. The result is
(117)
which is the same as (112) with ψψd. Thus, we have realized Rostoker's idea: the dressed potential fluctuation is equivalent to that carried by the initial fluctuation (δfn(J,0)), transported along the mean field (ein·Ωt), but with the Newtonian potential of each star replaced with the dressed potential (ψd) evaluated at the mean field frequency (n·Ω), with contributions summed over all stars (ndJ). Furthermore, (117) can be plugged into the right hand side of Eq. (94) to give an explicit solution for the fluctuations in the DF δfn(J,t). Note that the basis functions Φ(p) play explicit no role in Eq. (117); they are merely a calculational tool which is only used to construct the dressed interaction ψnnd(J,J,ω) for a given mean field f0(J).

1. Example: Response of a stellar system to a perturbing mass

So far in this section we have concentrated on the response of a stellar system to internal noise δf(t=0). However, an entirely analogous calculation shows that even externally imposed potential fluctuations will be “dressed” by the system's self-gravity—just as an externally imposed point charge will gather its own Debye cloud in an electrostatic plasma. Here, we will not pursue any quantitative details, but instead show a few figures that capture the characteristic features of such a response.

First, we present Fig. 23, which is adapted from Gibbon.104 This is a cartoon of a quasineutral electrostatic plasma made of blue positive charges and red negative charges. When a pair of electrical probes is inserted into the plasma, they gather around them Debye clouds consisting of particles of the opposite charge. In this way, the electrostatic potential of the probes is screened over a Debye length. This cloud is typically much smaller than the scale of the whole system.

FIG. 23.

Cartoon [adapted from P. Gibbon, arXiv, 2007.04783 (2020). Copyright 2020 Author.104] of the Debye shielding that occurs when electrical probes are placed inside a quasineutral plasma.

FIG. 23.

Cartoon [adapted from P. Gibbon, arXiv, 2007.04783 (2020). Copyright 2020 Author.104] of the Debye shielding that occurs when electrical probes are placed inside a quasineutral plasma.

Close modal

Next, we turn to Fig. 24, which is adapted from Magorrian.105 This is an analogous stellar-dynamical scenario, in which a heavy point mass is driven on a straight line orbit at constant speed through a homogeneous stellar system with Maxwellian f0 (Sec. VI B 1). The size of the box in this case has size L0.9LJ, so the system is (weakly) stable (recall that instability occurs if L>LJ). The contours in panel (a) show the density response of the system according to linear theory in the case that stars only respond to the point-mass perturber but not to each other. This we call the bare response. By contrast, panel (b) shows the response density in the dressed case in which stars interact both with the perturber and with each other [the contour levels are the same as in panel (a)]. Let us make two simple observations which contrast the behavior of the stellar system shown in these panels with the plasma system sketched in Fig. 23. (i) The overdense wake in the gravitational system acts to reinforce the perturbation rather than shield it, since there are no opposite charges. Heuristically therefore, one can think of gravitational polarization as a kind of “anti-Debye-shielding.” (ii) The scale of the induced wake, especially in the dressed case, can be comparable to the scale of the system itself, in contrast with an electrostatic plasma where the Debye length is usually much smaller than the system.

FIG. 24.

Response of a Maxwellian box of stars to a point mass perturber, adapted from J. Magorrian, Mon. Not. R. Astron. Soc. 507, 4840 (2021). Copyright 2020 Royal Astronomical Society.105 The point mass (black dot) is driven through the box of size 0.9LJ at constant speed, and the wake response is computed according to linear theory. Solid blue contours denote positive overdensity, and red dotted contours denote negative overdensity, while the blue dashed curve corresponds to zero overdensity. In panel (a), the stars only feel the gravitational potential of the perturber but not of each other, while in panel (b), they interact with both the perturber and each other. The same linearly spaced contour levels are used in both panels.

FIG. 24.

Response of a Maxwellian box of stars to a point mass perturber, adapted from J. Magorrian, Mon. Not. R. Astron. Soc. 507, 4840 (2021). Copyright 2020 Royal Astronomical Society.105 The point mass (black dot) is driven through the box of size 0.9LJ at constant speed, and the wake response is computed according to linear theory. Solid blue contours denote positive overdensity, and red dotted contours denote negative overdensity, while the blue dashed curve corresponds to zero overdensity. In panel (a), the stars only feel the gravitational potential of the perturber but not of each other, while in panel (b), they interact with both the perturber and each other. The same linearly spaced contour levels are used in both panels.

Close modal

Since a homogeneous box is not a particularly realistic model for a stellar system, we finally present Fig. 25, which is adapted from D'Onghia et al.106 This figure shows the response density of a stellar disk to an embedded point-mass perturber, driven on a circular orbit with constant speed for one azimuthal period (250 Myr). The disk is chosen to be fairly dynamically cold, but not so cold as to be unstable. The response was calculated using a full N-body simulation in which stars interact both with the perturber and with each other. The perturbing mass was much less than 1% of the whole system's mass. Despite its relatively small mass, the perturber provokes a strongly amplified response in the disk surface density which extends over a large azimuthal range. This phenomenon, which was first discovered by Julian and Toomre107 using linear theory, reflects the fact that cold stellar disks are very effective amplifiers of perturbations, as we discussed in Sec. II C 3. As a result, it is typical of weakly stable stellar systems to develop collective responses on a global scale.

FIG. 25.

Response of a stellar disk to a massive perturber driven on a circular orbit [adapted from D'Onghia et al., Astrophys. J. 766, 34 (2013). Copyright 2013 IOP.106] The perturber mass is much less than 1% of the mass of the system. Still, it manages to produce a significant response over a large swathe of the disk.

FIG. 25.

Response of a stellar disk to a massive perturber driven on a circular orbit [adapted from D'Onghia et al., Astrophys. J. 766, 34 (2013). Copyright 2013 IOP.106] The perturber mass is much less than 1% of the mass of the system. Still, it manages to produce a significant response over a large swathe of the disk.

Close modal

In many cases, we do not know, nor do we really care, about the details of the microscopic DF fm of a particular stellar system. What we often do care about, however, is the structure and evolution of some suitably averaged DF, fm. If we are interested in the evolution of a system driven by random fluctuations, be they internally or externally generated, then the appropriate average is an ensemble average over all possible microscopic realizations of those fluctuations. The central idea of kinetic theory is to use the statistics of the realizations of the fluctuations to derive an equation for the evolution of fm.

We expect for an ensemble average of fm over microscopic realizations to be equivalent to an average over angles and over time (on timescales trelax). Thus, fm=f0 and the relevant equation describing the evolution is [c.f. Eq. (90a)]
(118)
In Sec. V D, we spelled out the basic idea of quasilinear kinetic theory, which is that if the fluctuations δΦ and δf are sufficiently small, then we can predict the evolution of f0 on the timescale trelaxtdyn by plugging the solutions from linear response theory (Sec. VI) into the right-hand side of Eq. (118). Quasilinear theory works provided such a timescale separation exists, and provided we can ignore higher order nonlinear and nonperturbative effects like trapping, as discussed in Sec. V E.

In this section, we will develop quasilinear kinetic theory more quantitatively. Our study of internally driven evolution will culminate in the Balescu–Lenard equation which describes the slow evolution of an isolated, stable N-star system. We first describe the basic physics underlying the Balescu–Lenard equation, then simply write it down without a proper derivation and discuss some of its properties (Sec. VII A). Then, we offer a simple derivation of the Balescu–Lenard equation based on Rostoker's principle (Sec. VII B). We give several examples of this equation being used in practice (Sec. VII C), before discussing where the theory breaks down and how it might be improved (Sec. VII D).

As in Sec. VI, we consider systems in which the fluctuations are internally generated by the finite-N noise, but we note that a very similar formalism can be developed to account for externally driven quasilinear evolution.103,108

As a first step, we take the formal solution for the perturbation to the DF in linear theory, Eq. (94), and plug it into the right hand side of Eq. (118). The result is
(119)
where the two contributions to the “action space flux” F=F1+F2 on the right-hand side are
(120)
and
(121)
These equations are not yet written in a form that is useful for calculation, but their physical meaning is already apparent. In order to make this discussion as clear as possible, let us deal with them in reverse order.
  • The second flux F2 is a diffusion term. This is obvious since it is proportional to the gradient of the DF f0(J) in action space. That is to say, this term answers the following question: “I am a star with action J, moving along my orbit within a bath of potential fluctuations δΦ. If I think of myself as a massless test particle, then on average, how are those potential fluctuations going to nudge me around?” It is natural that the answer involves the correlation of fluctuations sampled at action space location J. Nothing about the expression (121) would formally change had we insisted that δΦ was externally imposed, but here we are going to use the solution for δΦ derived in Sec. VI C assuming the fluctuations are internally generated and that the system is stable. Thus, we say that the flux F2 describes orbital diffusion due to dressed discreteness noise.

  • On the other hand, the first flux F1 provides the answer to the obvious rebuttal: “Wait! I'm not really a test particle, so diffusion cannot be the whole story. I have finite mass, and so I drive potential fluctuations of my own. To what extent is my motion altered by the part of δΦ that I have myself produced?” Physically, this term represents dynamical friction, which Chandrasekhar intuited back in the 1940s. The key idea is that when any massive body moves through a sea of other bodies, it creates a “wake” behind it in response (just as in Figs. 24 and 25). This wake then backreacts, or drags, on the massive body, slowing it down. It is therefore natural that the flux F1 involves the correlation between δΦ at action space location J and the initial discreteness noise at the same location, δfn(J,0). The existence of such a term is also a necessary consequence of energy (and momentum) conservation: since F2 scatters stars to larger energies on average, in a self-consistent system, there must be another term dragging them back to lower energies.

Although we have taken all stars to be of equal mass for simplicity, one can show that in a multi-mass system, the more massive a body is the larger the wake it induces, and so the stronger the drag force it feels. This leads for instance to massive bodies losing energy and spiraling in toward the center of their host galaxy. Mathematically, for a subsystem of very massive bodies, the frictional term in Eq. (119) is typically much more important than the diffusive term. By contrast, a very light subsystem will contribute only weakly to the total gravitational potential fluctuation, and hence will experience little dynamical friction; for this subsystem, the diffusive term in Eq. (119) can dominate over the frictional one. This is also an astrophysically relevant scenario: for instance, a key problem of galactic evolution is to understand the diffusion of stellar orbits (Fig. 7) due to interactions with transient spiral waves and/or giant molecular clouds.108 For the present single-mass system, though, F1 and F2 are typically comparable in magnitude.

The next step in deriving a proper quasilinear kinetic equation is to take the solution for the dressed potential from linear theory δΦ[δf(0)], Eq. (117), and plug it into the right hand sides of Eq. (120) and (121). One then further assumes that stellar orbits are initially uncorrelated and takes ttdyn. The details of this procedure are rather lengthy and do not convey much physical insight, so we refer the reader to, e.g., the works by Chavanis,109 Fouvry and Bar-Or,103 and here just write down the final answer. Equation (119) takes the Fokker–Planck form
(122)
with the friction vector
(123)
and diffusion tensor
(124)
We can put these expression together to arrive at the Balescu–Lenard equation
(125)

1. Properties of the Balescu–Lenard equation

At first glance, the Balescu–Lenard equation might look implausibly complicated, but the physics underlying it is simple. Indeed, with hindsight, one could argue that this is the simplest equation one could write down with all the desired properties (particularly that it must overcome the shortcomings of Chandrasekhar theory that we listed in Sec. II C 3). For instance:

  • The equation is expressed in angle–action variables and therefore naturally accounts for the inhomogeneity of the system and its associated nontrivial orbital structure.

  • Related to this, unlike in Chandrasekhar's theory, in which we had to stipulate a maximum impact parameter bmax to fix the Coulomb logarithm [e.g., Eq. (17)], the Balescu–Lenard equation suffers from no large scale divergence and no such cutoff is needed. The largest scale of fluctuations in the system is accounted for naturally through the lowest-order basis elements Φ(p) and wavenumbers n.

  • It takes the form of a Fokker–Planck equation in action space, with both friction and diffusion terms present, as it must following the physical arguments below equation (121).

  • The right-hand side is an antisymmetric functional of f0(J) and f0(J). This tells us that the system is driven by pairwise interactions. In other words, the Balescu–Lenard equation describes a collection of orbits at action space location J interacting with orbits at all other locations J. It is remarkable that in this approximation (truncating at second order in small fluctuations), the collective evolution of an amazingly complicated N-star system acts as if it were driven by uncorrelated two-body encounters—though they interact not through the usual bare coupling ψ=G/|rr| but through the dressed pairwise interaction potential ψd. This is a direct manifestation of Rostoker's principle, which serves as the basis for the alternative derivation shown in Sec. VII B.

  • The Dirac delta function on the right-hand side encodes the fact that interactions are not only pairwise but also resonant. In other words, the only meaningful interactions occur if some linear combination of frequencies of the star at J matches a linear combination of the frequencies of the star at J. This makes sense, since pairs of stars that do not satisfy this condition will undergo an interaction that vanishes on average—each star will feel an equal “pull” and “push” from the other as it traverses many mean field orbits, with a net result of zero.

  • The equation conserves mass, energy, and satisfies a H-theorem for Boltzmann entropy. In addition, the right-hand side exactly vanishes for inhomogeneous Boltzmann DFs (i.e., thermal distributions) of the form f0(J)eβH0(J) (although these DFs are not realizable in actual galaxies since they contain infinite mass).

  • In the limit where collective amplification is ignorable [ M=0, Eq. (101)], we can replace ψdψ and recover the Landau equation which describes the evolution of an N-body system driven by “bare” resonant interactions.110 

Finally, in the limit of a homogeneous system, one can use the angle–action mapping (38) and thereby recover (up to constants) the classic Balescu–Lenard collision operator from electrostatic plasma kinetics
(126)
In Table II, we provide a direct comparison between the various terms in the homogeneous equation (126) and the corresponding inhomogeneous equation (125). In particular, for a homogeneous system, we have |ψkkd|2δkkk4|ϵk|2, where ϵ is the dielectric function. This is why the homogeneous equation (126) involves only a single sum over wavenumbers k, rather than the double sum over n,n that we had in the inhomogeneous equation (125). The factor |ϵk(k·v)|2 in Eq. (126) is the classic dielectric shielding factor. In near-Maxwellian plasmas, this factor tends to suppress two-body interactions on large (bigger than Debye) scales, but in stellar systems, it can strongly amplify such interactions—see Sec. VII D for a detailed discussion.

2. An historical aside

The Balescu–Lenard equation was first derived in the context of homogeneous electrostatic plasmas.111,112 The main mathematical difference in the stellar dynamical case is that one must formulate the equation in angle–action variables.

The basic idea of formulating a quasilinear diffusion equation in action space seems to have been first arrived at by Kaufman and Nakayama,113 Kaufman,100,114 in the context of inhomogeneous plasmas. The idea of doing gravitational kinetic theory in angle–action variables was introduced around the same time by Kalnajs,75 who was predominantly interested in the linear properties of disk galaxies and their associated spiral modes (Sec. VI). A proper diffusion equation of the form tf0=J·[D·Jf0] was not written down in the stellar-dynamical context until a few years later, by Dekker.115 For a thorough overview of the history and of linear and quasilinear kinetics in angle–action variables (including their plasma precedents), see Chavanis.116 

The first derivation of the full Balescu–Lenard kinetic equation for stellar systems was in a somewhat opaque 1987 treatise by Luciani and Pellat.117 The same equation was then arrived at in 2010 by Heyvaerts118 using the BBGKY hierarchy; then in 2012 by Chavanis109 using the Klimontovich description; again in 2017 by Heyvaerts et al.119 using a Fokker–Planck description; then in 2018 by Fouvry and Bar-Or103 using stochastic equations and Novikov's theorem; and finally in 2021 by Hamilton120 using Rostoker's principle, as we now detail.

Rostoker's principle16,121,122 tells us that we can think of a secularly evolving plasma or stellar system as a collection of N uncorrelated particles undergoing two-body encounters (à la Chandrasekhar), provided we replace the bare Coulombic/Newtonian interaction ψ(r,r) of each two-body encounter with its dressed counterpart ψd. Roughly speaking, this is because particles are weakly coupled, and because the main effect of collective interactions upon the potential fluctuation spectrum is to dress each particle with its collective wake [Eq. (117)]. Here, we use this principle to derive the Balescu–Lenard equation (125) in a short and simple way following Hamilton.120 (Note we do not actually need to deal explicitly with fluctuations here, so we will drop the “0” subscript on the ensemble-averaged DF).

First, consider a “test” star with coordinates (θ,J) and a “field” star with coordinates (θ,J). Mathematically, Rostoker's principle tells us to forget about all the other stars and treat the interaction of these two stars as if they were an isolated system with specific two-body Hamiltonian [units of (velocity)2]
(127)
where H0(J) is the mean field Hamiltonian. Here, mψd(θ,J,θ,J) is the dressed specific potential energy between a star at phase space location (θ,J) and a star at (θ,J). (It consists of the usual Newtonian attraction plus collective effects; if we ignore these, then ψdG/|rr|). Let us expand ψd as a Fourier series in the angle variables:
(128)
Here, ψnnd(J,J,n·Ω) is the dressed potential interaction we derived in Sec. VI C, and we have put in the correct frequency dependence (ω=n·Ω), since it is physically clear that these are the only frequencies available to the system. (Alternatively, one can cheat: just think of ψd/m as some function of θ,θ,J,J, and then identify the correct functional form at the end by comparison with 125.)
Let us treat the two-body interaction as a perturbation to each star's motion. Using Hamilton's equation dJ/dt=h/θ, we find that to zeroth order in this perturbation the test and field stars just follow their mean field trajectories θ=θ0+Ωt and θ=θ0+Ωt indefinitely. To first order, similarly to Eq. (60), the result of their interaction is to nudge each other to new actions J+δJ and J+δJ, respectively, where
(129)
The result for δJ is the same as (129) except we replace the first factor inin. Note that because the two stars nudge each other, this pairwise interaction conserves the total energy of the pair. This is different from considering an interaction between a test star and a background star (as in Sec. II C 1).
Next, following Rostoker, we consider the relaxation of our entire system to consist of nothing more than an uncorrelated set of dressed two-body encounters. Then, it is easy to write down a master equation for the DF f(J). To do so, we account for (a) test stars being nudged out of the state J and in to some new state J+ΔJ, as illustrated in Fig. 26(a), and (b) test stars being nudged in to the state J from JΔJ, as in Fig. 26(b). Processes (a) and (b) are both deterministic and time-reversible, so we must also account for their inverses, which would be represented by the same diagrams but with the arrows pointing in the opposite direction. Let the transition rate density be w(ΔJ,ΔJ|J,J). This quantity is defined such that w(ΔJ,ΔJ|J,J)dΔJdΔJτ is the probability that a given test star with initial action J is scattered to the volume of phase space dΔJ around J+ΔJ, by a given field star with action J that is itself scattered to the volume element dΔJ around J+ΔJ, in a time interval τ that is much longer than an orbital period but much shorter than the relaxation time. Assuming the system's equilibrium state is invariant under time reversal, f must satisfy the master equation
(130)
(The factor of 1/2 accounts for the fact that, when we integrate over all possible J, ΔJ and ΔJ, the contribution from each diagram in Fig. 26 accounts for all possible interactions, and both diagrams are equivalent. The “accounting trick” of using both diagrams and then halving the answer leads very simply to the Balescu–Lenard equation.)
FIG. 26.

Two possibilities for “scattering” in action coordinates. A test star's action changes by ΔJ during an “encounter” with a field star whose action changes from J to J+ΔJ. The DF f(J) will be decremented if the test star is kicked out of state J [process (a)], but will be incremented if it is kicked in to state J from JΔJ [process (b)]. To write down a master equation, we must also account for the inverse processes (by reversing the directions of the arrows).

FIG. 26.

Two possibilities for “scattering” in action coordinates. A test star's action changes by ΔJ during an “encounter” with a field star whose action changes from J to J+ΔJ. The DF f(J) will be decremented if the test star is kicked out of state J [process (a)], but will be incremented if it is kicked in to state J from JΔJ [process (b)]. To write down a master equation, we must also account for the inverse processes (by reversing the directions of the arrows).

Close modal
By expanding the integrand in Eq. (130) for weak interactions, i.e., for ΔJ,ΔJJ,J, up to second order in small quantities, we can show that
(131)
where A is the 3×3 matrix
(132)
where ΔJΔJτ is the expectation value of ΔJΔJ after a time interval τ for a given test star action J and field star action J. The matrix B is identical to A except we replace ΔJΔJτΔJΔJτ. In a homogeneous system, J and J can be taken to be the (suitably normalized) linear momenta of the test and field star; then, momentum conservation implies ΔJ=ΔJ so that A=B. With this, one can easily recover equation (2.36) of the analogous calculation for a homogeneous plasma by Diamond et al.16 
Now we put the above results together. Since by assumption stars at a given action are randomly distributed in angles, we can calculate the expectation value ΔJΔJτ by averaging over the initial phases θ0,θ0. [This is analogous to the integral over impact parameters b in Chandrasekhar's theory, as in Eq. (17).] Thus, we have
(133)
where δJ is given in Eq. (129). Plugging (133) and (129) in to (132) and taking the limit τ, and making use of the following two identities:
(134a)
(134b)
we find the following expression for the matrix A:
(135)
The result for B is identical to Eq. (135) except we replace the factor nn with nn. Putting the explicit formulas for A(J,J) and B(J,J) in to the kinetic equation (131), we recover the Balescu–Lenard equation (125).

Thus, just as in plasma physics,16 the Balescu–Lenard collision operator can be interpreted as arising from the uncorrelated pairwise scattering of dressed stars as they traverse their (otherwise unperturbed) mean field orbits.

Implementing the Balescu–Lenard equation (125) in practice is hard, and has only been attempted a handful of times. Here, we mention some of these implementations, starting with an artificial one-dimensional gravitational system (Sec. VII C 1), then moving on to a hot sphere (Sec. VII C 2) and finally a cold disk (Sec. VII C 3). Other applications include the inhomogeneous Hamiltonian mean-field model,123 as well as the relaxation of the eccentricities124 and orientations125 of stellar orbits around supermassive black holes.

1. Example: Balescu–Lenard in one-dimensional “gravity”

Recently, Roule et al.126 considered the evolution of a one-dimensional “gravitational” N-body system. Their system is gravitational in the sense that density and potential are linked through Poisson's equation. Yet, because it is one-dimensional the resulting bare interaction between particles is ψ(x,x)|xx|, and the force between any two particles is proportional to sgn(xx). In this simple setup, there is a one-to-one mapping between action J and energy E, and the authors tend to use E to characterize orbits. The initial DF the authors choose for the particles is closely analogous to the isotropic DF that generates the three-dimensional Plummer sphere whose orbits we studied in Sec. III A 1. Crucially for the application of Balescu–Lenard theory, the DF is linearly stable.

Roule et al.126 simulated this one-dimensional N-body system directly many times over with random initial conditions, in order to produce an ensemble of systems over which to average. This was to be compared with the prediction from the Balescu–Lenard equation (125). They also performed simulations “without collective effects.” To do this they first integrated the mean field motion of random ensembles of massive but noninteracting particles in their chosen model. They then used the fluctuating gravitational potential generated by these particles as the input to a test particle simulation. Since test particles, by definition, do not produce potential fluctuations of their own, they experience no friction [so F1=0 in Eq. (119)], but they do experience diffusion (F20). These test particle simulations were to be compared to the predictions of “Landau” theory, i.e., diffusion along the lines of Balescu–Lenard but without collective amplification (Sec. VII A 1).

Figure 27 shows Roule et al.'s126 results. The red crosses in this figure show measurements of the diffusion coefficient in energy of particles over a time long compared to the dynamical time tdyn but short compared to the relaxation time trelax. The red solid line shows the theoretical “Balescu–Lenard” prediction. It demonstrates an excellent fit, confirming that Eq. (125) is capturing the physics of this system. Meanwhile, the blue crosses and blue solid line shows that the “Landau” theory is an excellent predictor of the diffusion coefficient in the case without collective effects.

FIG. 27.

Diffusion coefficients of particles in a one-dimensional self-gravitating system (Sec. VII C 1). Crosses show the results of N-body simulations, while solid curves show the theoretical results based on the Balescu–Lenard equation (125) (when collective interactions are switched on) or the corresponding Landau equation (when they are switched off). Note that the vertical axis scale is different for red and blue points. Figure reproduced from Roule et al., Phys. Rev. E 106, 044118 (2022). Copyright 2022 American Physical Society.126 

FIG. 27.

Diffusion coefficients of particles in a one-dimensional self-gravitating system (Sec. VII C 1). Crosses show the results of N-body simulations, while solid curves show the theoretical results based on the Balescu–Lenard equation (125) (when collective interactions are switched on) or the corresponding Landau equation (when they are switched off). Note that the vertical axis scale is different for red and blue points. Figure reproduced from Roule et al., Phys. Rev. E 106, 044118 (2022). Copyright 2022 American Physical Society.126 

Close modal

Let us emphasize that the red and blue data in Fig. 27 are plotted on different y axes. In particular, the Landau diffusion coefficient is 1 in these units, which would correspond to a relaxation time trelaxNtdyn, similar to Chandrasekhar's theory [Eq. (19)]. What is very notable is that over a large range of energies E, the Balescu–Lenard diffusion coefficient is ten times smaller than the Landau one, corresponding to a relaxation time trelax10Ntdyn. In other words, this is a self-gravitating system (to the extent that this is “gravity” at all) in which collective effects tend to quench, rather than amplify, the rate of evolution.

While this result is unusual, it is not unique. A similar result was obtained by Weinberg,45 who studied the dynamical friction drag on a satellite falling into a spherical galaxy (a similar calculation to the one we will present in Sec. VIII B 1). Weinberg found that when collective effects were included, the wake that the satellite induced in the galaxy was highly symmetric and that this led to a drag which was weaker than in the bare case without collective effects. The lesson we draw from these examples is that collective effects can have a major impact upon the evolution of inhomogeneous stellar systems, but it may be difficult to guess in advance what that impact will be.

2. Example: Balescu–Lenard in a hot sphere

On the surface, Balescu–Lenard theory is much more general than Chandrasekhar's theory of two-body relaxation (Sec. II C 1). One might therefore ask (i) whether it contains Chandrasekhar's theory as a limiting case, and (ii) given that Chandrasekhar's theory works very well in hot spheres (Sec. II C 3), does a direct application of the results of Balescu–Lenard theory to those systems give an answer that differs from Chandrasekhar's theory?

The answer to question (i) is yes: one can recover Chandrasekhar's theory from the Balescu–Lenard formalism by ignoring collective effects and confining oneself to interactions on small scales over which the curvature of mean field orbits can be neglected (replacing them with straight lines110). To answer question (ii) requires a lot more work. Fouvry et al.24 investigated it by evaluating the right hand side of Eq. (125) for an isochrone sphere, which is a toy model of a spherical star cluster. Using a known (stable) DF, they calculated the Balescu–Lenard flux F as a sum of contributions F from spherical harmonic quantum numbers =0,1,2,. As illustrated in Fig. 28, they found that on small scales 1, the modulus |F| scales as 1. Hence, a sum over recovers the small-scale logarithmic divergence familiar from Chandrasekhar's theory, a result already obtained by Weinberg127 when considering dynamical friction (see Fig. 5 therein). Such a divergence is always present in a kinetic theory based on weak ±1/r2 interactions. This strongly suggests that Chandrasekhar's theory—which is much simpler to implement than the Balescu–Lenard theory—is being recovered by Eq. (125) on small scales. This motivated Fouvry et al.24 to calculate the Balescu–Lenard flux as accurately as possible on large scales (crit, with crit12, say), and to calculate the Chandrasekhar flux on small scales (>crit), and then add the two contributions together. The advantage of this over just using Chandrasekhar's theory is that one is not plagued by any large-scale divergence (which is normally cured by imposing a maximum impact parameter bmax), and in principle, one accounts for any resonant and/or collective phenomena that are ignored by Chandrasekhar. The result of this exercise, which is shown in Fig. 29, is a slightly improved agreement between the contour map of f0/t in action space computed from theory as compared to N-body simulations. As we discussed in Sec. II C 3, the agreement based on Chandrasekhar's theory alone is already good, and as such the large-scale Balescu–Lenard “correction” is small.

FIG. 28.

Balescu–Lenard flux, F [see Eqs. (119)–(121)], for fluctuations with spherical harmonic quantum number (an angular scale 1). The magnitude of this flux scales as 1 for large 1. This confirms that the Balescu–Lenard equation recovers the Coulomb logarithm behavior from Chandrasekhar's theory in the limit of local encounters (Sec. II C 1). Figure reproduced from Fouvry et al., Mon. Not. R. Astron. Soc. 508, 2210 (2021). Copyright 2021 Royal Astronomical Society.24 

FIG. 28.

Balescu–Lenard flux, F [see Eqs. (119)–(121)], for fluctuations with spherical harmonic quantum number (an angular scale 1). The magnitude of this flux scales as 1 for large 1. This confirms that the Balescu–Lenard equation recovers the Coulomb logarithm behavior from Chandrasekhar's theory in the limit of local encounters (Sec. II C 1). Figure reproduced from Fouvry et al., Mon. Not. R. Astron. Soc. 508, 2210 (2021). Copyright 2021 Royal Astronomical Society.24 

Close modal
FIG. 29.

Contour plot of the evolution rate f0/t in action space, for a simple model of a spherical cluster as measured from (a) direct N-body simulations and (b) a theory which combines the Balescu–Lenard equation (125) on large scales with the Chandrasekhar theory on small scales. Figure reproduced from Fouvry et al., Mon. Not. R. Astron. Soc. 508, 2210 (2021). Copyright 2021 Royal Astronomical Society.24 

FIG. 29.

Contour plot of the evolution rate f0/t in action space, for a simple model of a spherical cluster as measured from (a) direct N-body simulations and (b) a theory which combines the Balescu–Lenard equation (125) on large scales with the Chandrasekhar theory on small scales. Figure reproduced from Fouvry et al., Mon. Not. R. Astron. Soc. 508, 2210 (2021). Copyright 2021 Royal Astronomical Society.24 

Close modal

In summary, perhaps the most intellectually satisfying approach to spherical cluster kinetics is to couple a large-scale Balescu–Lenard flux with a small-scale Chandrasekhar flux. However, in practice, Chandrasekhar's theory applied to all scales (with an additional cutoff at large scales bmax) is almost as good a description and is both much simpler to understand and easier to implement.

3. Example: Balescu–Lenard in a cold disk

As a final example, let us discuss the dynamics of an initially stable, razor thin stellar disk, following the N-body simulations of Sellwood128 and the kinetic predictions from Fouvry et al.129 The model employed by these authors is the stable Mestel disk, with a DF f0 whose JR=0 slice we plotted with a dotted black curve back in Fig. 22(a). Notably, this DF is very smooth in action space. However, Sellwood's128 simulations of this disk shows that after many dynamical times tdyn, a grooved/ridged feature always develops in the action space DF [see Fig. 30(a)]. The time taken for this feature to develop increases linearly with the number of particles N employed, suggesting it is driven by the finite-N noise. Remarkably, this newly modified DF is found in the simulations to be unstable to the formation of exponentially growing spiral waves.

FIG. 30.

Change in the DF f0(L,JR) for an initially smooth, razor thin stellar disk, (a) as measured in an N-body simulation, (b) as predicted by the Balescu–Lenard equation (125) [Figure adapted from Fouvry et al., Astron. Astrophys. 584, A129 (2015). Copyright 2015 European Southern Observatory.129] The DF develops a ridge/groove feature which renders it unstable to spiral Landau modes—see Fig. 22.

FIG. 30.

Change in the DF f0(L,JR) for an initially smooth, razor thin stellar disk, (a) as measured in an N-body simulation, (b) as predicted by the Balescu–Lenard equation (125) [Figure adapted from Fouvry et al., Astron. Astrophys. 584, A129 (2015). Copyright 2015 European Southern Observatory.129] The DF develops a ridge/groove feature which renders it unstable to spiral Landau modes—see Fig. 22.

Close modal

In fact, this modified DF is precisely the one studied by De Rijcke et al.94 in their linear stability analysis (Sec. VI B 2), the JR=0 slice through which is shown with a solid black curve in Fig. 22(a). The corresponding spiral Landau mode they calculated from linear theory [which is shown in Fig. 22(b)] matches that extracted by Sellwood128 from his simulations, confirming that his spiral waves are true linear instabilities of the modified DF. Thus, our stable, isolated stellar disk has driven itself unstable!

We can usefully contrast this behavior with that of a plasma. If we initiated, e.g., a finite-N electrostatic plasma with a Maxwellian DF, then the plasma would exhibit thermal fluctuations δΦ,δfN1/2, but its mean-field Maxwellian f0 would simply never evolve. That is because the Maxwellian is the ultimate maximum entropy state in a homogeneous plasma, a perfect thermodynamic equilibrium. Yet, for self-gravitating systems, there is no such thermodynamic equilibrium (Sec. II B), and so even a linearly a s