We present methods and algorithms that allow the Vlasiator code to run global, three-dimensional hybrid-Vlasov simulations of Earth's entire magnetosphere. The key ingredients that make Vlasov simulations at magnetospheric scales possible are the sparse velocity space implementation and spatial adaptive mesh refinement. We outline the algorithmic improvement of the semi-Lagrangian solver for six-dimensional phase space quantities, discuss the coupling of Vlasov and Maxwell equations' solvers in a refined mesh, and provide performance figures from simulation test runs that demonstrate the scalability of this simulation system to full magnetospheric runs.
I. INTRODUCTION
Vlasiator is a hybrid-Vlasov space plasma simulation system, specifically designed to perform kinetic simulations of the near-Earth environment.1 Its goal has been to perform global, three-dimensional simulations of Earth's magnetosphere and its interaction with the solar wind without a fixed prescription of the shape of the particles' velocity distribution function [as would be the case in magnetohydrodynamics (MHD)]. As an implementation of a hybrid kinetic method, Vlasiator simulates the phase-space evolution of ion species by propagating the phase space density on a Cartesian grid, modeling ions as a distribution function in six (three spatial and three velocity) dimensions. Electrons are treated indirectly, their effective physical action reduced to charge neutralization, Hall currents, and contributions to Ohm's law via the electron pressure gradient.2 In Vlasiator's numerical implementation, the deliberate choice of an Eulerian representation of phase space, instead of the common particle-in-cell (PIC) approximation,3 means that the simulations are computationally very heavy, typically exceeding millions of CPU-hours for a few tens of minutes of simulated physical time. On the other hand, this choice yields simulation results that are unencumbered by the stochastic particle sampling noise inherent to the PIC simulation approach. In the end, high-resolution PIC simulations typically use particle counts roughly equivalent to Vlasiator's phase space grid point numbers.4
Vlasiator has been employed to study Earth's foreshock,5–12 magnetosheath waves,13,14 magnetosheath jets,15–17 dayside magnetopause magnetic reconnection,18–20 transient foreshocks,21 magnetotail reconnection site locations,22,23 fast magnetotail flows,24 magnetotail current sheet flapping,25 and ionospheric precipitation.26,27 In addition, Refs. 28 and 29 presented full electrostatic Vlasov simulations including kinetic electrons. Common to all previous publications was a limitation to two spatial dimensions: For reasons of computational complexity and cost, the simulation runs were chosen to operate in spatially two-dimensional cuts through Earth's magnetosphere, along the equatorial plane (for foreshock and magnetosheath studies) or the noon-midnight meridional plane (for studies of dayside and tail reconnection, among others). The dimension perpendicular to this plane extended only a single simulation cell in thickness, with periodic boundary conditions. The geomagnetic field consequently had to be modeled as a line dipole, which prevented a fully realistic mapping of inner magnetosphere locations to ionospheric altitudes.
A notable exception is given by Ref. 30, where the simulation setup was a slab of finite extent, providing a third spatial dimension on a limited scale (yet large enough to comfortably exceed ion scales), so that the local kinetic behavior of dayside magnetopause reconnection could be studied without the limitation of strict two-dimensionality. Still, the overall geometry of the simulation domains, and thus the magnetic topology and physical degrees of freedom, remained essentially two-dimensional, so that each of the aforementioned studies required a careful validation of its physical results to ensure that they were not negatively affected by the limitation to two spatial dimensions.
Thanks to recent algorithmic improvements to the code, these limitations have been overcome, fulfilling the original goal outlined in Ref. 31. In the present paper, we describe the technological and conceptual evolution of the Vlasiator code that reduces the computational overhead to such a degree that global simulations in full three spatial dimensions (plus three velocity dimensions) have become possible. Scientific results from the first 3D + 3V global hybrid-Vlasov simulation of Earth's magnetosphere performed with the methods presented here will be reported in an upcoming publication.
The structure of this publication is such that the remainder of Sec. I outlines the length- and timescales that magnetospheric hybrid-kinetic simulations need to resolve and how these resolution requirements lead to difficult computational demands. Section II describes Vlasiator's data structures and solvers, with a special focus on the recent developments that enable the combination of spatial adaptive mesh refinement and sparse velocity space structure. Section III, then, presents verification test cases, which show the capabilities of the new solvers. Discussion and outlook toward future developments are presented in Sec. IV.
A. Spatiotemporal scales and computational requirements
As the goal of Vlasiator is to perform hybrid-kinetic simulations of the whole magnetospheric domain, the involved spatiotemporal scales place stringent requirements on the grid resolution and size. Kinetic electron dynamics are neglected in the hybrid approach, so the characteristic length scales of interest are dictated by ion kinetic effects. They include, at the small end, the ion inertial length $d_\mathrm{i} = c/\omega_\mathrm{i}$, where c is the speed of light and $\omega_\mathrm{i}$ is the ion plasma frequency. Wavelengths of mirror modes and electromagnetic ion cyclotron (EMIC) waves in the magnetosheath set a comparable kinetic scale.13,14 The Larmor radius provides yet another kinetic length scale, one that varies strongly throughout the magnetospheric domain (from its solar wind value down to tens of kilometers in the cusp regions).
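For reference, these two ion kinetic length scales are given by the standard definitions
$$
d_\mathrm{i} = \frac{c}{\omega_\mathrm{i}}, \qquad \omega_\mathrm{i} = \sqrt{\frac{n_\mathrm{i} q_\mathrm{i}^{2}}{\epsilon_0 m_\mathrm{i}}}, \qquad r_\mathrm{L} = \frac{m_\mathrm{i} v_\perp}{|q_\mathrm{i}| B},
$$
where $n_\mathrm{i}$, $m_\mathrm{i}$, and $q_\mathrm{i}$ are the ion number density, mass, and charge, $v_\perp$ is the ion velocity component perpendicular to the magnetic field, and $B$ is the magnetic field strength.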
In addition to these smallest length scales that need to be resolved, the minimum outer size of the simulation domain is dictated by the extents of the magnetospheric flow and current systems. With the simulation domain shaped as a rectangular cuboid in the GSM (geocentric-solar-magnetospheric) coordinate system, the bow shock standoff distance32 provides a minimum bound in the sunward direction. If the simulation is expected to generate a foreshock extending toward the Sun, the box must be extended further to contain the majority of the generated foreshock wave power. In the opposite direction, the magnetotail extends far in −x. The whole tail, however, cannot be encompassed in the simulation domain, so the constraint is to include the plasma domain relevant to, and driving, the features of interest. We have chosen a tailward simulation domain extent that reaches past the lunar orbit (in contrast to the much longer tails of MHD simulations, such as Ref. 33), allowing simulation data to be compared directly with spacecraft data from the ARTEMIS mission.34
In the two remaining spatial dimensions, the system size is determined by the bow shock shape: The box needs to be large enough to contain a significant part of the shock surface. Downstream of locations where the shock surface intersects the box boundary, the pressure balance has proven to be unreliable, which can negatively influence the model's physical validity. In Vlasiator global runs, the extents in y and z are, thus, chosen wide enough to provide generous leeway around the inner magnetosphere and magnetotail.
In near-Earth space, the velocity space extents are dictated by the typical bulk velocity of the solar wind and by non-thermal velocity structures (such as non-relativistic particle beams accelerated at the bow shock or by magnetic reconnection). Satellite observations and previous simulation runs6,35 motivate velocity space limits of several thousand kilometers per second under regular solar wind conditions. For the velocity space resolution, i.e., the size of a velocity space cell, a systematic study by Pfau-Kempf et al.36 established a value that yields reasonable results for the physical processes relevant to magnetospheric simulations.
Given the resolution and spatial extent requirements discussed above, Table I tabulates the number of grid points that the phase space needs to comprise in each of its six dimensions. From the table, we conclude that a fully resolved simulation would require a total of 3822³ three-dimensional simulation cells, each of which contains a velocity space of 200³ velocity space cells. In a naïve uniform grid discretization, the 6D phase space would, thus, contain over 10^17 phase space cells, which leads to a memory requirement of about 72 exabytes. Obviously, this is far beyond anything a computing facility can be expected to provide in the foreseeable future.
Listing of typical simulation extent and resolution requirements for global kinetic magnetospheric simulations in all six spatial and velocity dimensions. A straightforward Cartesian grid without refinement would lead to unfeasible simulation cell counts.
| Dim. (GSM) | Physical extent L | Resolution requirement Δx | Required cell count n |
|---|---|---|---|
| x | | | 3822 |
| y | | | 3822 |
| z | | | 3822 |
| vx, vy, vz (each) | | | 200 |
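Written out, the per-dimension cell counts of Table I multiply to the naïve phase space size quoted above:
$$
N_\mathrm{6D} = \underbrace{3822^{3}}_{\approx 5.6\times10^{10}\ \text{spatial cells}} \times \underbrace{200^{3}}_{=8\times10^{6}\ \text{velocity cells}} \approx 4.5\times10^{17}\ \text{phase space cells}.
$$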
The key strategy hitherto employed in Vlasiator to achieve a significant reduction of computational complexity is the sparse velocity space representation.37 It makes use of the observed property that plasma velocity distribution functions in near-Earth space tend to be compactly localized in the velocity domain. Very often, high phase space densities are concentrated in (near-)Maxwellian core populations around the bulk velocity, sometimes with beam distributions clearly separated from the thermal core. Long tails of the distribution, such as power-law or kappa distribution functions, are present and important for high-energy particle transport studies, but their overall kinematic impact on global plasma dynamics is small compared to the bulk and beam structures. Large volumes of velocity space are filled with phase space density values that are many orders of magnitude lower than the dynamically relevant peaks. Vlasiator deliberately discards those parts and stores and processes only the parts of velocity space in which the phase space density exceeds a set sparsity threshold. In magnetospheric setups, we found that a suitable threshold value led to a 98% reduction in modeled phase space volume,36 while incurring only a small overall mass loss. This directly translates to a 98% reduction in computational requirements, which made the spatially 2D Vlasiator simulations and scientific studies possible. The chosen threshold value was also empirically found to retain foreshock beam distributions, thus leaving ion-kinetic beam instabilities in the foreshock intact.9 For a more detailed description of the sparse velocity space mechanism, see Refs. 37 and 1.
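To illustrate the sparsity principle in isolation (this is not Vlasiator's actual block-based data structure, which is described in Ref. 37; the grid extents, threshold, and block size below are arbitrary placeholders), a minimal sketch of threshold-based retention of velocity-space blocks could look as follows:

```python
import numpy as np

def sparse_blocks(f, f_min, block=4):
    """Return indices of cubic velocity-space blocks worth keeping.

    f     : 3D array of phase-space density on a uniform velocity grid
            (axis lengths assumed divisible by `block`)
    f_min : sparsity threshold; blocks whose maximum density stays below
            it are discarded entirely
    """
    nb = np.array(f.shape) // block
    keep = []
    for i in range(nb[0]):
        for j in range(nb[1]):
            for k in range(nb[2]):
                chunk = f[i*block:(i+1)*block,
                          j*block:(j+1)*block,
                          k*block:(k+1)*block]
                if chunk.max() >= f_min:
                    keep.append((i, j, k))
    return keep

# Example: a drifting Maxwellian core on a coarse 64^3 velocity grid.
v = np.linspace(-2e6, 2e6, 64)                       # m/s
vx, vy, vz = np.meshgrid(v, v, v, indexing="ij")
f = 1e-12 * np.exp(-((vx - 4e5)**2 + vy**2 + vz**2) / (2 * (1e5)**2))
kept = sparse_blocks(f, f_min=1e-15)
print(f"{len(kept)} of {(64 // 4)**3} blocks retained")
```

Because the Maxwellian core occupies only a small corner of the full velocity range, the overwhelming majority of blocks fall below the threshold, mirroring the phase space volume reduction quoted above.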
However, this mechanism alone is not sufficient to make 3D + 3V global magnetospheric simulations feasible, as the memory requirement for the simulation data would still be of the order of exabytes. To enable spatially 3D simulations, a further reduction in computational intensity is required. A common approach, especially in magnetohydrodynamic (MHD) modeling,33,38,39 is to discretize the spatial grid non-uniformly. Typically, a distinction is made between adaptive mesh refinement (AMR) approaches and unstructured sparse grids.
In an AMR setup, the simulation domain is initially discretized on a coarse Cartesian base mesh. The mesh resolution in certain areas of the domain is successively doubled by splitting simulation cells into smaller subcells. Thus, parts of the domain have double mesh resolution, parts have quadruple resolution, etc.
The choice for when to refine must be made judiciously based on the problem at hand: In some cases, the refinement will be made based on predefined geometric constraints (such as the choice of higher resolution for certain areas of interest). More generally, however, the adaptivity of the refinement process is expressed through a run-time evaluated refinement criterion. This refinement criterion continuously reevaluates whether any given simulation cell should be coarser or finer than it is at the moment.
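As a schematic example of such a run-time criterion (not the implemented criterion of any particular code; the function name, indicator, and thresholds are illustrative only), a normalized density-gradient indicator could be evaluated per cell as follows:

```python
import numpy as np

def refinement_decision(rho, dx, refine_above=0.2, coarsen_below=0.05):
    """Per-cell refine/coarsen/keep decision from a normalized density gradient.

    rho : 3D array of mass density on a uniform patch of cells
    dx  : cell size of that patch
    The indicator |grad(rho)| * dx / rho is dimensionless, so the same
    thresholds can be reused across refinement levels.
    """
    grad = np.gradient(rho, dx)                       # d(rho)/dx, /dy, /dz
    indicator = np.sqrt(sum(g**2 for g in grad)) * dx / rho
    decision = np.full(rho.shape, "keep", dtype=object)
    decision[indicator > refine_above] = "refine"
    decision[indicator < coarsen_below] = "coarsen"
    return decision

# Example: a sharp density jump triggers refinement around the interface.
rho = np.ones((32, 32, 32))
rho[16:, :, :] = 4.0
print(np.unique(refinement_decision(rho, dx=1.0), return_counts=True))
```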
A simulation code that employs a refined mesh needs to be designed such that it can operate on different grid resolutions and, in particular, handle situations at refinement interfaces, where simulation cells have multiple neighbors of higher or lower resolution in one or multiple directions. Section II describes the details of how Vlasiator's solvers have been adapted and optimized for this task. Using these optimizations, the number of simulation cells for a global 3D + 3V Vlasiator simulation is reduced to a level that leads to a total memory requirement of 33.7 terabytes, which is well within the capabilities of modern supercomputers.
Section II details the design choices and algorithmic changes to the Vlasiator data structures and solvers that make computationally efficient 3D + 3V simulations with a refined mesh geometry possible.
II. METHODS
A. The grid structure
Vlasiator's basic simulation grid is based on the distributed Cartesian cell-refinable grid (DCCRG) library,40 which provides the memory layout, mesh geometry, refinement operations, and a parallel message passing interface (MPI) communication framework.41
The DCCRG framework enables adaptive mesh refinement at arbitrary locations of a Cartesian simulation domain by splitting a cubic simulation cell into eight cubic subcells with twice the spatial resolution (due to its choice of solvers, Vlasiator currently requires exactly cubical cells). This refinement process can be iterated multiple times. The domain decomposition is neither tree nor patch based:42,43 every cell can be refined or unrefined individually, independently of its local neighborhood (with the limitation that only one jump of refinement level is allowed at a time, and refinement interfaces need to be separated by at least two cells). As a result, the refinement domains can be freely located and the refinement interfaces can have arbitrary shapes. The solvers operating on the domain therefore need to be able to handle changes of refinement level on a cell-by-cell basis.
The current refinement strategy for Vlasiator global simulations is based on regions of interest. These regions are identified in a low-resolution run (and/or estimated using well-established empirical relations for the bow shock and magnetopause positions44) and are thus predefined for the high-resolution simulation. Figure 1 shows an example of the mesh refinement geometry employed in such a situation. For future runs, however, dynamic refinement criteria are being investigated, based on local physical parameters and their gradients, as is commonly done in global MHD simulations (e.g., Ref. 33).
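A region-of-interest refinement map of this kind can be encoded with a few geometric primitives; in the sketch below, the shapes, extents, and level assignments are illustrative placeholders and not the configuration of the run shown in Fig. 1:

```python
import numpy as np

R_E = 6371.0  # Earth radius in km

def target_refinement_level(x, y, z):
    """Illustrative region-of-interest refinement map (GSM coordinates, km).

    Returns 0 for the coarse solar wind background and higher integers for
    successively refined near-Earth and magnetotail regions.
    """
    r = np.sqrt(x**2 + y**2 + z**2)
    level = 0
    if r < 30 * R_E:                                    # near-Earth sphere
        level = 1
    if -30 * R_E < x < 15 * R_E and abs(y) < 12 * R_E and abs(z) < 12 * R_E:
        level = max(level, 2)                           # dayside + inner-tail box
    if -20 * R_E < x < -5 * R_E and abs(y) < 8 * R_E and abs(z) < 5 * R_E:
        level = max(level, 3)                           # tail current sheet slab
    return level

print(target_refinement_level(-15 * R_E, 0.0, 0.0))     # a tail location -> 3
```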
Snapshot of a Vlasiator global magnetospheric simulation, showing the plasma density color-coded. The adaptive mesh refinement (AMR) is visible in the large contrast of cell sizes between the inflowing solar wind (coarsest cells) and the magnetotail (finest cells). In this simulation, refinement regions are predefined through simple geometric primitives.
While the DCCRG grid of Vlasiator stores the phase space density data as solved by the Vlasov equation, the electromagnetic fields are not discretized on the same mesh. They are propagated with a finite difference upwind field solver,45 which stores its data on a simple uniform Cartesian array grid.46 This grid does not employ mesh refinement (as the field solver does not need it), but is discretized at a constant resolution matching the finest AMR refinement level of the DCCRG grid.
As the computational demands of these simulations require running on large distributed-memory systems, both the Vlasov and field solver grids use a spatial domain decomposition approach. The strongly varying computational complexity of the sparse velocity space and AMR implementations makes it necessary for the Vlasov solver grid to dynamically rebalance the load of its decomposition domains. A combined action by the DCCRG grid and the Zoltan library47 takes care of this process. The field solver, on the other hand, has a constant computational load per simulation cell, so that a simple MPI Cartesian communicator is used for initial grid partition; no rebalancing is required during run time.
This multi-mesh structure is referred to as heterologous domain decomposition. The intricacies of coupling between these two grids are discussed in a recent publication.48
B. The 6D Vlasov solver
As the near-Earth plasmas modeled by Vlasiator are, to a good approximation, non-relativistic, relativistic effects are neglected in the implementation. This has several benefits for the semi-Lagrangian structure of the translation step: Transport in all three spatial dimensions can be treated independently, since Newtonian kinematics does not introduce the velocity-dependent Lorentz factor of relativistic kinematics. For each dimension i, transport in the (xi, vi) subspace hence forms a linear shear of the distribution, as illustrated in Fig. 2.
Illustration of the elementary semi-Lagrangian shear step that underlies the Vlasiator Vlasov solver. Panel (a) shows the example of a 1D + 1D real space/velocity space cut through the simulation domain, as it shears during a translation update. Panel (b) presents how phase space density information from adjacent cells in the update direction is extracted into a linear pencil array. This pencil is split to match its finest resolution cells in panel (c). An interpolating polynomial is reconstructed through the phase space density control points at cell centers, shown in panel (d). This polynomial is advected, and the resulting target phase space values are integrated over the volume of the pencil intersecting the cell and written back into the phase space data structure.
In a similar way, certain properties of the Lorentz force acting on the plasma medium allow the acceleration step to be simplified. In the nonrelativistic Vlasov equation, the action of the Lorentz force transforms the phase space distribution within one simulation time step like a solid rotator in velocity space: The magnetic field causes a gyration of the distribution function corresponding to the Larmor motion, while any non-zero electric field accelerates the charged particles, which amounts to a simple linear shift of velocity space.
Hence, the overall velocity space transformation is a rotation whose axis direction matches the local magnetic field vector, with the gyration center velocity given by the convective plasma velocity. This transformation can be decomposed into three consecutive shear motions53,54 along the coordinate axes, thus repeating the same fundamental algorithmic structure as the spatial translation update. The acceleration update affects the velocity space locally within each spatial simulation cell, so its properties are unaffected by the introduction of spatial AMR. This paper will, hence, focus on the changes to the translation solver step.
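Written out in the plane perpendicular to the rotation axis, one standard form of such a three-shear factorization of a rotation by the gyration angle θ (accumulated over one acceleration substep) reads
$$
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
=
\begin{pmatrix} 1 & -\tan(\theta/2) \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ \sin\theta & 1 \end{pmatrix}
\begin{pmatrix} 1 & -\tan(\theta/2) \\ 0 & 1 \end{pmatrix},
$$
so that each factor displaces phase space content along a single velocity coordinate only and can be handled by the same conservative, one-dimensional semi-Lagrangian machinery as the spatial translation.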
Iterating through the storage data structure of Vlasiator's sparse phase space carries considerable overhead. In contrast, the access patterns of the solver are predictable and linear: the semi-Lagrangian solver performs the shear transformation in order, one dimension at a time. To exploit this, phase space data are reorganized into a so-called pencil structure at the beginning of each solver step, in which spatially adjacent data, in one Cartesian direction at a time, are stored linearly in memory. These pencils share no data dependencies during the shear and can be processed in parallel.
In a refined mesh, construction of these pencils carries an additional complication, since neighboring cells at mesh refinement boundaries can have different sizes. The algorithm performs the following steps for every spatial update direction (as illustrated in Fig. 2; a schematic code sketch is given after the list):
- Find pencil seed cells: The goal is to build pencils of spatially adjacent cells. This process starts with the seed cells at one boundary of the domain. These are all cells that do not have a local neighbor in the backward direction of the chosen coordinate.
- Pencil construction: The pencil is built by identifying the neighbors of the seed cell in the positive coordinate direction and appending them to the pencil data structure. This process continues until no further neighbor cells are found (either at the opposite simulation boundary or at the edge of the local MPI domain).
- Pencil splitting: When a refinement interface is encountered in this process, the next neighbors are four cells at higher mesh resolution. Here, the existing pencil is terminated, and four new separate pencils continue. To minimize the amount of data movement, the split pencils also contain cells from the coarser domain corresponding to the semi-Lagrangian update's stencil size (two cells in common simulation runs). This process proceeds recursively across any further refinement interface. Where a refined pencil encounters a re-coarsening of the mesh, the pencil construction stops, again taking padding by the stencil size into account, and a new seed point (and hence a new pencil) continues at the coarser level.
- Result merging and remote neighbor contributions: Even a completed pencil still has a mass outflow toward a further neighbor cell. In many cases, this means that a boundary of the spatial decomposition over parallel processes was encountered, and a remote neighbor cell exists into which the transported phase space density flows. This is called the remote neighbor contribution of the pencil. Where the end of a pencil coincides with a coarsening refinement interface, four pencils feed their remote neighbor contributions into the same target cell; the target cell therefore provides a buffer for up to four incoming phase space density values, which are summed after the transport step is completed.
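The sketch below illustrates the pencil-building logic described above in schematic Python. The `mesh` interface (cell list and neighbor queries) is a stand-in abstraction rather than the DCCRG API, and re-coarsening as well as remote-neighbor bookkeeping are omitted for brevity:

```python
def build_pencils(mesh, dimension, stencil=2):
    """Schematic pencil construction along one coordinate direction.

    Assumed (hypothetical) mesh interface:
      mesh.cells                          -- iterable of cell ids
      mesh.has_backward_neighbor(c, dim)  -- False for seed cells
      mesh.neighbors(c, dim)              -- face neighbors in the positive
                                             direction: [] at a boundary, one
                                             id at equal resolution, four ids
                                             across a refinement interface
    """
    # Step 1: seed cells are those without a backward neighbor.
    seeds = [c for c in mesh.cells
             if not mesh.has_backward_neighbor(c, dimension)]

    pencils, stack = [], [[s] for s in seeds]
    while stack:
        pencil = stack.pop()
        while True:
            nbrs = mesh.neighbors(pencil[-1], dimension)
            if len(nbrs) == 0:        # domain or MPI-domain boundary: done
                pencils.append(pencil)
                break
            if len(nbrs) == 1:        # same resolution: extend the pencil
                pencil.append(nbrs[0])
                continue
            # Refinement interface: close this pencil and spawn four finer
            # pencils, each padded with `stencil` cells of the coarser level.
            pencils.append(pencil)
            padding = pencil[-stencil:]
            stack.extend([padding + [n] for n in nbrs])
            break
    return pencils
```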
Once the complete set of pencils has been constructed, meaning that the whole 3D space is organized into pencil structures, the next step of the solver acts on one phase space cell at a time, interpolating the discrete phase space values of the pencils by a reconstruction polynomial (depicted as the red curve in Fig. 2). The phase space integral values of each cell are taken as control points for a polynomial using the nonuniform piecewise parabolic reconstruction method of Ref. 55. A slope limiter is used at this point, to ensure the total variation diminishing (TVD) property of the overall solver step.
The shifted polynomial is integrated over the target cell extent to obtain updated phase space density values. These are transferred back from the pencil structure into the phase space grid data structure.
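For a uniform grid, the piecewise parabola inside cell i takes the standard piecewise parabolic (PPM) form
$$
f_i(\xi) = f_{L,i} + \xi\,\bigl[\Delta f_i + f_{6,i}\,(1-\xi)\bigr], \qquad \xi = \frac{x - x_{i-1/2}}{\Delta x_i} \in [0,1],
$$
with $\Delta f_i = f_{R,i} - f_{L,i}$ and $f_{6,i} = 6\bigl[\bar f_i - \tfrac{1}{2}(f_{L,i}+f_{R,i})\bigr]$, where $f_{L,i}$ and $f_{R,i}$ are the (limited) interface values and $\bar f_i$ is the cell-averaged phase space density; by construction, the parabola integrates back to $\bar f_i$ over the cell, which guarantees conservation. The method of Ref. 55 generalizes the interface-value estimates to nonuniform cell widths, as required on the mixed-resolution pencils across refinement interfaces.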
This entire process is repeated for all three spatial dimensions, leading to an effective three-dimensional spatial transport step. As a performance optimization, the pencilization is only performed once and its metadata are retained until MPI load balancing changes the spatial decomposition of the simulation domain.
C. Coupling between Vlasov and field solver
To complete the description of a plasma system in the hybrid-kinetic regime, the Vlasov equation [Eq. (1)] is solved together with Maxwell's equations in the Darwin approximation.1 The electric field and magnetic field act on the distribution functions, and in return, statistical moments of the particle distribution functions (overall mass density ρm, charge density ρq, bulk velocity V, and pressure tensor P) are fed into the field solver. Hence, data need to be exchanged between the two equations' solver procedures.48
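For completeness, these moments are the standard velocity-space integrals of the ion distribution functions $f_s(\mathbf{x},\mathbf{v},t)$, summed over the simulated ion species $s$:
$$
\rho_m = \sum_s m_s \int f_s \,\mathrm{d}^3v, \quad
\rho_q = \sum_s q_s \int f_s \,\mathrm{d}^3v, \quad
\mathbf{V} = \frac{1}{\rho_m} \sum_s m_s \int \mathbf{v}\, f_s \,\mathrm{d}^3v, \quad
\mathcal{P} = \sum_s m_s \int (\mathbf{v}-\mathbf{V})(\mathbf{v}-\mathbf{V})\, f_s \,\mathrm{d}^3v .
$$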
As Vlasiator's field solver and Vlasov solver operate on separate grids, their data must be coupled during each simulation time step, so that each solver's output can serve as source terms for the other's update. The association between the two grids is calculated at initialization time and updated after any load balancing step. Reference 48 presents an in-depth discussion of the coupling procedure between the two meshes, including the spatial filtering step employed to smooth out discontinuities in areas of resolution mismatch.
D. Boundary conditions
Vlasiator offers a selection of boundary conditions to handle different simulation scenarios, described in detail in Ref. 1. Inflow boundary conditions enable solar wind to enter the simulation with arbitrary, time-varying plasma conditions.56 Periodic boundary conditions are used to enable simulations in one and two spatial dimensions, and outflow boundary conditions allow plasma and electromagnetic fields to exit the simulation domain. Outflow boundary conditions are implemented as homogeneous Neumann boundaries, essentially copying the values from the neighboring simulation cells. The inner boundary of the magnetospheric setup is terminated by an ionospheric shell, set at a geocentric distance of a few Earth radii and modeled as a perfect conductor. The ionospheric boundary cells' phase space density is initialized with a Maxwellian distribution and remains constant throughout the run. The electric field at the ionosphere is set to zero, and the magnetic field is initialized with values corresponding to the Earth's dipole field at those locations.
III. RESULTS
Section III presents a number of validation test runs probing the performance of the Vlasov solver. A full discussion and in-depth analysis of the physical properties of our large-scale 3D magnetospheric simulations are the subject of an upcoming publication.
A. Performance of the semi-Lagrangian solver
To ensure that the propagation algorithm performs well across simulation refinement interfaces, three tailored simulations were performed, in which a test distribution was propagated in a periodic box. In these tests, the semi-Lagrangian solver was run in isolation from the rest of the simulation system (in particular, without any interaction with the field solver). A monoenergetic plasma (moving at constant velocity v) with a spatial density modulation of square, sinusoidal, or triangular form was initialized in the center of the domain, with two levels of mesh refinement implemented in the same region. A fixed time step (corresponding to a spatial Courant–Friedrichs–Lewy (CFL) value of 0.8 at the finest refinement level) was used. In 100 simulation time steps, the density modulation propagates once around the box's periodic boundaries and arrives back at its starting position. Figures 3(a)–3(c) show the spatial profiles for all three signals at the start of the simulation and after 100, 1000, and 5000 time steps (corresponding to 1, 10, and 50 complete transits of the simulation box). In panels (d) and (e), the triangular setup has been repeated with a uniform grid discretization at the highest and lowest resolution level, respectively.
Accuracy of translation method with square, sinusoidal, and triangular density modulations. In panels (a)–(c), they were propagated over a partially refined periodic simulation domain. Panels (d) and (e) present the behavior of the same triangular modulation in a fully refined and completely unrefined mesh, for comparison. The original signal is shown in black, and the signal after propagation is shown in blue (having traveled once around the periodic boundaries), red (ten crossing times), and green (50 crossing times). Vertical lines indicate the simulation resolution.
It is apparent that the use of slope limiters caused some diffusion, but overall the signals are conserved adequately. In particular, there are no distortions or artifacts created at the interfaces between refinement regions. Comparison of the mesh-refined triangle run [panel (c)] with the high- and low-resolution uniform mesh runs [panels (d) and (e)] shows that the AMR mechanism does not add any extra diffusivity, and its effective diffusive behavior lies between the high- and low-resolution cases. The small asymmetries visible in the final result depend on the selected propagation direction and were mirrored when the tests were repeated in the opposite flow direction. They stem from the parabolic polynomial fitting method, which uses four data points that cannot be placed symmetrically around the source cell.
B. Speedup and computational effort
As a benchmark problem for realistic plasma simulations, a shock-in-a-box test was run, in which an elongated rectangular simulation domain is initially filled with two plasma regions that correspond to the Rankine–Hugoniot conditions of an interplanetary shock in the de Hoffmann–Teller frame.57,58 These simulations use a setup similar to that employed in Ref. 36, with the important distinction that they employ fully three-dimensional meshes. Figure 4 shows the simulation box, mesh refinement structure, and resulting density profiles, and Ref. 59 contains the full simulation configuration and output files.
Setup and results of shock-in-a-box simulations used for computational scaling tests. Panel (a) shows the initial simulation state: The box is an elongated rectangle, inside which a standing shock wave is initialized. Panel (b) presents the situation at a later time: A foreshock has formed, and kinetic effects breaking the Rankine–Hugoniot conditions have shifted the position of the shock. Panel (c) outlines a 1D cut through the mesh geometry (showing the simulation cell sizes) and the density profile at two different times (black and red curves) of simulation run AMR-18. In panel (d), the same information is presented for the unrefined run low-1.
The parameters are chosen such that the shock has zero velocity in the simulation frame and remains quasi-stationary in the central region of the box. Note that within the short simulated time frame, complex 3D shock dynamics do not have time to form. The ion inertial length takes different values upstream and downstream of the shock. Table II compares five different runs of this scenario. In the “low-1” run, the simulation was discretized with an unrefined mesh of 16 000 cells (yielding a spatial resolution of 200 km). For “med-8” and “high-64,” this was increased to 128 000 cells (100 km) and 1 024 000 cells (50 km), respectively, so these three simulations provide a weak-scaling set of the solver implementation without AMR.
Memory usage and wall time requirements of the shock-in-a-box simulation test runs. Nc is the number of spatial cells, and Nf is the overall number of phase space cells in the simulation. “Mem” denotes the simulations' total resident memory as measured by the PAPI library.60 The wall time twall and the core count give the total computational resource use in CPUh. All simulations ran for 100 s of simulated time, using different numbers of simulation steps Nsteps. The efficiency parameter Eff denotes the processing time spent per step per phase-space cell.
| Run | Nc | Nf | Δx (km) | Cores | Mem (GB) | twall (s) | CPUh | Nsteps | Eff (μs) |
|---|---|---|---|---|---|---|---|---|---|
| low-1 | 16 000 | | 200 | 32 | 15.74 | 5500 | 48.9 | 495 | 0.33 |
| med-8 | 128 000 | | 100 | 256 | 182.7 | 12883 | 916 | 989 | 0.43 |
| high-64 | 1 024 000 | | 50 | 2048 | 1545.9 | 31695 | 18030 | 1978 | 0.53 |
| AMR-18 | 296 000 | | 50–200 | 608 | 524.9 | 28955 | 4890 | 1758 | 0.55 |
| eqv-18 | 281 216 | | 76.92 | 608 | 429.5 | 17548 | 2963 | 1286 | 0.49 |
The results were obtained on the “Vorna” cluster at the University of Helsinki, consisting of 180 nodes with Intel Xeon E5–2670 CPUs at 2.60 GHz and an InfiniBand QDR interconnect.
Simulation “AMR-18” from Table II was set up with the same low base resolution as “low-1,” but the central part of the domain underwent two levels of mesh refinement, giving that region the same effective resolution as “high-64.” Furthermore, simulation “eqv-18” was initialized with uniform cells, resulting in a spatial cell count (281 216) close to that of the AMR run (296 000) and a uniform resolution of 76.92 km throughout the box.
The simulation time step was determined from the advection CFL condition. As this value depends on the size of the spatial cells, each run had to perform a different number of time steps in order to reach the simulation target time of 100 s.
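In its usual form, the advection CFL condition limits the time step to
$$
\Delta t \le C\,\frac{\Delta x_\mathrm{min}}{v_\mathrm{max}}, \qquad C < 1,
$$
where $\Delta x_\mathrm{min}$ is the smallest spatial cell size present in the run and $v_\mathrm{max}$ the fastest propagation speed in phase space. Halving the cell size therefore doubles the required number of steps for the same simulated interval, which is reflected in the Nsteps column of Table II.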
As the figures in the table show, the AMR simulation's cell count Nc lies far below that of “high-64,” and its memory consumption and total computational resource use are reduced accordingly. As the AMR pencilization process is more complex than that of the unrefined case, the effective computational effort per phase-space cell, here expressed as Eff (in μs), is noticeably higher. This can also be inferred directly by comparing the resources spent by “AMR-18” and “eqv-18.” The increased computational complexity of the AMR solver, however, is offset by the increase in effective physical phase space volume per computational unit, so that AMR Vlasov simulations clearly outperform unrefined setups in terms of overall computational effort per physical volume, or conversely allow for a much finer resolution in the physically relevant regions at the same computational cost.
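These comparisons can be read off Table II directly; a minimal arithmetic sketch (all numbers copied from the table):

```python
# CPU-hour and per-cell cost comparison, values taken from Table II.
cpuh = {"low-1": 48.9, "med-8": 916, "high-64": 18030,
        "AMR-18": 4890, "eqv-18": 2963}
eff_us = {"low-1": 0.33, "med-8": 0.43, "high-64": 0.53,
          "AMR-18": 0.55, "eqv-18": 0.49}

# Refining only the shock region instead of the whole box saves resources...
print(f"high-64 / AMR-18 CPUh ratio:   {cpuh['high-64'] / cpuh['AMR-18']:.1f}")    # ~3.7

# ...at the price of a modestly higher per-phase-space-cell cost of the AMR solver.
print(f"AMR-18 / eqv-18 per-cell cost: {eff_us['AMR-18'] / eff_us['eqv-18']:.2f}")  # ~1.12
```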
C. Global magnetospheric simulations
Finally, the scientific use case for adaptive mesh refinement in Vlasiator is the study of Earth's entire magnetosphere in global, three-dimensional hybrid-kinetic simulations.
Figures 1 and 5 present different visualizations of the mesh refinement structure currently employed in Vlasiator global runs. This run modeled the plasma around Earth in a box specified in geocentric solar ecliptic (GSE) coordinates, with an incoming, steady solar wind flowing in the −x direction. The solar wind carried an interplanetary magnetic field pointing purely southward, resulting in spatially well-known positions for the day- and nightside reconnection regions. The simulation's spatial grid had a coarsest resolution of 8000 km with three subsequent levels of refinement, yielding a finest resolution of 1000 km. The refinement regions were chosen to give maximum resolution coverage to the dayside magnetopause and tail reconnection regions. Note that the ion inertial length in this simulation is smaller than even the finest grid resolution. This was likewise the case in previous publications,14,30,36 which have nonetheless provided valuable insights into magnetospheric physics that would be hard to obtain using other simulation methods. As this manuscript focuses on the implementation details of adaptive mesh refinement, the interested reader is referred to Ref. 61, which presents a recent study further quantifying the physical fidelity of Vlasiator simulations at low resolution and presenting a subgrid method to mitigate some of its effects.
Two-dimensional meridional plane cut through a 3D global magnetospheric simulation of Earth's magnetosphere during southward IMF, showing the plasma number density (in grayscale). Overlaid in red are the contours of successive mesh refinement regions. The coarsest refinement level has a spatial resolution of 8000 km, and the subsequent red contour levels mark the 4000, 2000, and 1000 km refinement regions, respectively. The blue and green curves mark the nominal magnetopause and bow shock locations according to the analytic models of Refs. 44 and 32, respectively.
In Fig. 5, the blue line provides an analytic estimate of the magnetopause location using the model of Ref. 44, calculated from the solar wind parameters of the simulation. The magnetospheric geometry that has self-consistently developed in the simulation matches this prediction quite well, although the magnetopause standoff distance appears slightly smaller than predicted. The authors of Ref. 62, however, have argued that the model of Ref. 44 overestimates the radial distance, an effect that is most likely further exacerbated by the unusually high solar wind temperature used in this simulation, which enhances the thermal pressure in the magnetosheath, compresses the magnetosphere somewhat more strongly, and results in a smaller magnetopause standoff distance. Most importantly, the simulation shows no sudden discontinuity or departure from the prediction across the different refinement regions.
The green line in Fig. 5 presents the analytic bow shock location model by Ref. 32. It again shows that the Vlasiator bow shock geometry matches well with observational behavior. Some departure from the ideal parabolic shape of the model is visible, but this is likely due to kinetic effects of the cusps and transient magnetosheath phenomena, as previously presented in Refs. 19 and 21, and stays within the confidence range given in Ref. 32.
A direct performance comparison with a high-resolution 2D Vlasiator production run using the Vlasiator solver without AMR was performed on the Mahti supercomputer at the CSC—Center for Scientific Computing in Kajaani, Finland, a BullSequana XH2000 system with AMD Rome 7H12 CPUs at 2.6 GHz and an InfiniBand HDR100 interconnect. Table III outlines the cell counts, memory requirements, and effective calculation time per core per cell of both runs. As before, the AMR solver carries a performance penalty in terms of computing time per simulated phase space cell. There also remains a resolution gap between the 2D simulation (at 300 km) and the 3D run (at 1000 km), but the overall reduction in required phase space cells means that a simulated 3D volume, instead of a 2D slice, is now possible and meaningful scientific results can be attained.
Memory usage and wall time requirement of global magnetospheric simulation runs. Column labels are identical to Table II.
| Run | Nc | Nf | Δx (km) | Cores | Mem (TiB) | Eff (μs) |
|---|---|---|---|---|---|---|
| 2D | | | 300 | 128 000 | 11.89 | 4.18 |
| 3D | | | 1000 | 64 000 | 4.82 | 7.08 |
IV. CONCLUSIONS AND OUTLOOK
Studies of the near-Earth plasma environment rely strongly on space plasma simulations, which continue to consume a major share of supercomputing resources worldwide. The methods and results presented in this paper show that, using a combination of sparse velocity space representation and spatial adaptive mesh refinement, it is possible to simulate the entire outer extent of Earth's magnetosphere close to hybrid-kinetic scales in three dimensions using a Vlasov simulation code. This outcome opens the door for modeling studies of magnetospheric plasma phenomena in three dimensions, in particular complex phenomena that benefit from a global view, without the previously required coupling of different simulation approaches63 or the limitation to stochastic sampling of distribution functions inherent to the PIC approach.4
Section II introduced the dimensional splitting approach used in Vlasiator's solver and described the dimension-by-dimension dynamic pencilization and semi-Lagrangian solver strategy. The combination of adaptive mesh refinement and a massively parallelized, low-diffusive phase space solver is novel, and its fields of applicability are only just starting to be explored.
The scaling results in Subsection III B demonstrate how the AMR implementation provides a very effective reduction of memory usage and overall computation time compared to unrefined simulations. The introduction of the AMR solver incurred a computation time overhead of about 50% per computational cell. Yet, this is counterbalanced by the fact that, relative to a uniformly fine grid, the number of simulation cells is reduced by a factor of eight for every coarsened refinement level, leading to a significant overall performance gain.
Global magnetospheric simulations performed with these new methods have been validated against the magnetopause model of Ref. 44 and the bow shock model of Ref. 32, showing that the magnetospheric dynamics remains intact and consistent across all refinement levels. The simulation extents and resolutions that Vlasiator achieves using the methods outlined here now approach those of global MHD simulations64 and are able to encompass kinetic phenomena that previously escaped global modeling. Yet, even with the current improvements, resolving the ion inertial length in the unrefined parts of the simulation domain remains beyond the numerical capabilities of supercomputers. Improvements of simulation techniques and implementations are ongoing.
The current mesh refinement process is only adaptive in its spatial dimensions, while the velocity resolution remains fixed. One potential future optimization would be the implementation of an adaptively refined mesh also in the velocity domain of each cell, thus yielding a fully six-dimensional mesh refinement and hopefully providing another order-of-magnitude improvement of computational performance. Another avenue is to decouple the time stepping in regions of different resolution and different physical properties (as in Ref. 65), enabling longer time steps where possible and focusing the computational power on the regions with dynamics at small spatiotemporal scales.
ACKNOWLEDGMENTS
This work was performed as part of the European Research Council Consolidator Grant No. 682068-PRESTISSIMO. The Academy of Finland supported this work through the PROFI4 Grant No. 318913.
The authors gratefully also acknowledge the Academy of Finland (Grant Nos. 1339327, 1335554, 1347795, 1345701, 267186, 309937, 322544, 328893, 338629, and 339756). The Finnish Centre of Excellence in Research of Sustainable Space, funded through the Academy of Finland Grant Nos. 312351 and 1336805, supports Vlasiator development and science as well.
The simulations for this publication were run on the “Hawk” Machine at the HLRS—Höchstleistungsrechenzentrum Stuttgart as part of the PRACE Tier-0 grant “Frodo” (Grant No. 2019204998), the “Mahti” Machine at the CSC—Centre for Scientific Computing in Kajaani, and on the “Vorna” Cluster of the University of Helsinki. The authors wish to thank the Finnish Grid and Cloud Infrastructure (FGCI) for supporting this project with computational and data storage resources.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Urs Ganse: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (lead); Methodology (supporting); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (lead). Harriet George: Writing – review & editing (supporting). Evgeny Gordeev: Writing – review & editing (supporting). Maxime Grandin: Writing – review & editing (supporting). Konstantinos Horaites: Writing – review & editing (supporting). Jonas Suni: Writing – review & editing (supporting). Vertti Tarvus: Writing – review & editing (supporting). Fasil Tesema Kebede: Writing – review & editing (supporting). Lucile Turc: Writing – review & editing (supporting). Hongyang Zhou: Writing – review & editing (supporting). Minna Palmroth: Funding acquisition (equal); Project administration (lead); Resources (lead); Writing – review & editing (supporting). Tuomas Koskela: Conceptualization (equal); Methodology (lead); Software (lead); Writing – review & editing (supporting). Markus Battarbee: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (supporting); Software (equal); Visualization (equal); Writing – review & editing (supporting). Yann Pfau-Kempf: Conceptualization (equal); Formal analysis (equal); Methodology (supporting); Software (equal); Visualization (supporting); Writing – review & editing (supporting). Konstantinos Papadakis: Software (supporting); Validation (supporting); Writing – review & editing (supporting). Markku Alho: Writing – review & editing (supporting). Maarja Bussov: Writing – review & editing (supporting). Giulia Cozzani: Writing – review & editing (supporting). Maxime Dubart: Writing – review & editing (supporting).
DATA AVAILABILITY
The Vlasiator simulation code is distributed under the GPL-2 open-source license at https://github.com/fmihpc/vlasiator.66 Full simulation data for the presented analysis are stored in the University of Helsinki Datacloud. Data presented in this paper can be accessed by following the data policy on the Vlasiator website https://helsinki.fi/vlasiator. Data handling and visualization are supported through the Analysator python package https://github.com/fmihpc/analysator as well as a plugin for the VisIt visualization system.67 Simulation configuration files and raw performance data are available for download from Ref. 59.