We present methods and algorithms that allow the Vlasiator code to run global, three-dimensional hybrid-Vlasov simulations of Earth's entire magnetosphere. The key ingredients that make Vlasov simulations at magnetospheric scales possible are the sparse velocity space implementation and spatial adaptive mesh refinement. We outline the algorithmic improvements of the semi-Lagrangian solver for six-dimensional phase space quantities, discuss the coupling of the Vlasov and Maxwell equations' solvers in a refined mesh, and provide performance figures from simulation test runs that demonstrate the scalability of this simulation system to full magnetospheric runs.
I. INTRODUCTION
Vlasiator is a hybrid-Vlasov space plasma simulation system, specifically designed to perform kinetic simulations of the near-Earth environment.^{1} Its goal has been to perform global, three-dimensional simulations of Earth's magnetosphere and its interaction with the solar wind without a fixed prescription of the shape of the particles' velocity distribution function [as would be the case in magnetohydrodynamics (MHD)]. Being an implementation of a hybrid-kinetic method, Vlasiator simulates the phase-space evolution of ion species by propagating the phase space density on a Cartesian grid, modeling ions as a distribution function in six (three spatial and three velocity) dimensions. Electrons are treated in an indirect way, their effective physical action reduced to charge neutralization, Hall currents, and contributions to Ohm's law via the electron pressure gradient.^{2} In Vlasiator's numerical implementation, the deliberate choice of an Eulerian representation of phase space, instead of the common particle-in-cell (PIC) approximation,^{3} means that the simulations are computationally very heavy, typically exceeding millions of CPU-hours for a few tens of minutes of simulated physical time. On the other hand, this choice yields simulation results that are unencumbered by the stochastic particle sampling noise inherent to the PIC simulation approach. In the end, high-resolution PIC simulations typically end up using particle counts ($n \approx 10^{12}$) roughly equivalent to Vlasiator's phase space grid point numbers.^{4}
Vlasiator has been employed to study Earth's foreshock,^{5–12} magnetosheath waves,^{13,14} magnetosheath jets,^{15–17} dayside magnetopause magnetic reconnection,^{18–20} transient foreshocks,^{21} magnetotail reconnection site locations,^{22,23} fast magnetotail flows,^{24} magnetotail current sheet flapping,^{25} and ionospheric precipitation.^{26,27} In addition, Refs. 28 and 29 presented full electrostatic Vlasov simulations that included kinetic electrons. Common to all previous publications was a limitation to two spatial dimensions: For reasons of computational complexity and cost, the simulation runs were chosen to operate in spatially two-dimensional cuts through the Earth's magnetosphere, along the equatorial plane (for foreshock and magnetosheath studies) or the noon–midnight meridional plane (for studies of dayside and tail reconnection, among others). The dimension perpendicular to this plane extended only a single simulation cell in thickness, with periodic boundary conditions. In these setups, the geomagnetic field had to be modeled as a line dipole, which prevented a fully realistic mapping of inner magnetosphere locations to ionospheric altitudes.
A notable exception is Ref. 30, where the simulation setup was a slab of finite extent, providing a third spatial dimension on a limited scale (yet large enough to comfortably exceed ion scales), so that the local kinetic behavior of dayside magnetopause reconnection could be studied without the limitation of strict two-dimensionality. Still, the overall geometry of the simulation domains, and thus the magnetic topology and physical degrees of freedom, remained essentially two-dimensional, so that each of the aforementioned studies required comprehensive validation of physical results to ensure that their outcomes were not negatively affected by the limitation to two spatial dimensions.
Thanks to recent algorithmic improvements to the code, these limitations have been overcome, fulfilling the original goal outlined in Ref. 31. In the present paper, we describe the technological and conceptual evolution of the Vlasiator code that reduces the computational overhead to such a degree that global simulations in full three spatial dimensions (plus three velocity dimensions) have become possible. Scientific results from the first 3D + 3V global hybrid-Vlasov simulation of Earth's magnetosphere performed with the methods presented here will appear in an upcoming publication.
The remainder of Sec. I outlines the length scales and timescales that magnetospheric hybrid-kinetic simulations need to resolve and how these resolution requirements translate into steep computational demands. Section II describes Vlasiator's data structures and solvers, with a special focus on the recent developments that enable the combination of spatial adaptive mesh refinement and sparse velocity space structure. Section III then presents verification test cases, which show the capabilities of the new solvers. Discussion and an outlook toward future developments are presented in Sec. IV.
A. Spatiotemporal scales and computational requirements
As the goal of Vlasiator is to perform hybrid-kinetic simulations of the whole magnetospheric domain, the involved spatiotemporal scales place stringent requirements on the grid resolution and size. Kinetic electron dynamics are neglected in the hybrid approach, so the characteristic length scales of interest are dictated by ion kinetic effects. They include, at the small end, the ion inertial length (whose value in the solar wind, for example, is $d_i = c/\omega_i \approx 230$ km, where $c$ is the speed of light and $\omega_i$ is the ion plasma frequency). Wavelengths of mirror modes and electromagnetic ion cyclotron (EMIC) waves in the magnetosheath are $\lambda_M \approx 300$ km.^{13,14} The Larmor radius $r_L = m v_\perp / (q B)$ provides yet another kinetic length scale that varies strongly throughout the magnetospheric domain (from $r_L \approx 1500$ km in the solar wind to tens of kilometers in the cusp regions).
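As a quick sanity check on these scales, the sketch below recomputes $d_i$ and $r_L$ from first principles. The solar wind density ($10^6\,\mathrm{m^{-3}}$), speed (750 km/s), and magnetic field strength (5 nT) are the values quoted for the global run in Sec. III C, used here as assumed inputs.

```python
import math

# Physical constants (SI)
c    = 2.998e8       # speed of light [m/s]
e    = 1.602e-19     # elementary charge [C]
m_p  = 1.673e-27     # proton mass [kg]
eps0 = 8.854e-12     # vacuum permittivity [F/m]

def ion_inertial_length(n):
    """d_i = c / omega_pi for protons at number density n [1/m^3]."""
    omega_pi = math.sqrt(n * e**2 / (eps0 * m_p))
    return c / omega_pi

def larmor_radius(v_perp, B):
    """r_L = m v_perp / (q B) for a proton."""
    return m_p * v_perp / (e * B)

# Assumed solar wind values: n = 1e6 m^-3, v = 750 km/s, B = 5 nT
d_i = ion_inertial_length(1e6)
r_L = larmor_radius(750e3, 5e-9)
print(f"d_i = {d_i/1e3:.0f} km, r_L = {r_L/1e3:.0f} km")
```

Running this reproduces the order of magnitude quoted above: an ion inertial length of roughly 230 km and a solar wind proton Larmor radius of roughly 1500 km.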
In addition to these smallest length scales that need to be resolved, the minimum outer size of the simulation domain is dictated by the extents of the magnetospheric flow and current systems. With the simulation domain shaped as a rectangular cuboid in the GSM (geocentric solar magnetospheric) coordinate system, the bow shock standoff distance at $x \approx 12$ Earth radii $(R_E)$^{32} provides a minimum bound for the sunward extent. If the simulation is expected to generate a foreshock extending toward the Sun, the box must be enlarged to contain the majority of the generated foreshock wave power. In the opposite direction, the magnetotail extends far in $-x$. The whole tail, however, cannot be encompassed in the simulation domain, so the constraint is to include the plasma domain relevant to and driving the features of interest. We have chosen tailward simulation domain extents down to $x \approx -100\,R_E$ (in contrast to much longer tails in MHD simulations, such as Ref. 33), allowing simulation data to be directly compared to spacecraft data from the ARTEMIS mission.^{34}
In the two remaining spatial dimensions, the system size is determined by the bow shock shape: The box needs to be large enough to contain a significant part of the shock surface. Downstream of locations where the shock surface intersects the box boundary, the pressure balance has proven to be unreliable, which can negatively influence the model's physical validity. In Vlasiator global runs, the extents are thus typically chosen as $y, z \in [-60; 60]\,R_E$ to provide generous leeway around the inner magnetosphere and magnetotail.
In near-Earth space, the velocity space extents are dictated by the typical bulk velocity of the solar wind and nonthermal velocity structures (such as nonrelativistic particle beams accelerated at the bow shock or by magnetic reconnection). Satellite observations and previous simulation runs^{6,35} motivate velocity limits of $v_x, v_y, v_z \in [-4000; 4000]$ km/s under regular solar wind conditions. For the velocity space resolution, i.e., the size of a velocity space cell, a systematic study by Pfau-Kempf et al.^{36} found that $\Delta v \approx 40$ km/s yields reasonable results for the physical processes relevant to magnetospheric simulations.
Given the resolution and spatial extent requirements discussed above, Table I lists the number of grid points that the phase space needs to comprise in each of its six dimensions. From the table, we conclude that a fully resolved simulation would require a total of $N_r = n_x \times n_y \times n_z = 765^3 \approx 4.5 \times 10^8$ 3D simulation cells, each of which contains a velocity space of $N_v = n_{v_x} \times n_{v_y} \times n_{v_z} = 200^3 = 8 \times 10^6$ velocity space cells. In a naïve uniform grid discretization, the 6D phase space would thus contain $N_c = N_r \times N_v \approx 3.6 \times 10^{15}$ phase space cells, which leads to a memory requirement of about 72 exabytes. Obviously, this is far beyond anything a computing facility can be expected to provide in the foreseeable future.
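The cell counts follow directly from the extents and resolutions listed in Table I; a minimal sketch of the bookkeeping (assuming $R_E = 6371$ km):

```python
R_E = 6371e3  # Earth radius [m]

def cells_per_dim(extent_m, dx_m):
    """Number of cells needed to cover a given extent at resolution dx."""
    return round(extent_m / dx_m)

# Spatial grid: each dimension spans 120 R_E at 1000 km resolution
n_x = cells_per_dim(120 * R_E, 1000e3)
N_r = n_x ** 3                        # total 3D spatial cells

# Velocity grid: [-4000, 4000] km/s at 40 km/s resolution
n_v = cells_per_dim(8000e3, 40e3)
N_v = n_v ** 3                        # velocity cells per spatial cell

N_c = N_r * N_v                       # naive uniform 6D phase-space cell count
print(f"{n_x=} {n_v=} {N_r=:.2e} {N_c=:.2e}")
```

This yields 765 cells per spatial dimension, 200 per velocity dimension, and a naive 6D cell count of a few times $10^{15}$.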
Dim. (GSM)       Physical extent $L$           Resolution requirement $\Delta l$   Required cell count $n$
$x$              $[-100; 20]\,R_E$             1000 km $(0.156\,R_E)$              765
$y$              $[-60; 60]\,R_E$              1000 km $(0.156\,R_E)$              765
$z$              $[-60; 60]\,R_E$              1000 km $(0.156\,R_E)$              765
$v_x, v_y, v_z$  $[-4000; 4000]$ km/s          40 km/s                             200
The key strategy hitherto employed in Vlasiator to achieve a significant reduction of computational complexity is the sparse velocity space representation.^{37} It makes use of the observed property that plasma velocity distribution functions in near-Earth space tend to be compactly localized in the velocity domain. Very often, high phase space densities are present in (near-)Maxwellian core populations around the bulk velocity, sometimes with beam distributions clearly separated from the thermal core. Long tails of the distribution, such as power-law or kappa distribution functions, are present and important for high-energy particle transport studies, but their overall kinematic impact on global plasma dynamics is small compared to the bulk and beam structures. Large volumes of the velocity space are filled with phase space density values that are many orders of magnitude lower than the dynamically relevant peaks. Vlasiator deliberately discards those parts and stores and processes only the parts of velocity space in which the phase space density exceeds a set threshold value $f_{\min}$. In magnetospheric setups, we found that a sparsity threshold value of $f_{\min} = 10^{-15}\,\mathrm{s^3/m^6}$ led to a 98% reduction in modeled phase space volume,^{36} while resulting in an overall mass loss of $<1\%$. This directly translates to a 98% reduction in computational requirements, which made the spatially 2D Vlasiator simulations and scientific studies possible. This threshold value was also empirically found to retain foreshock beam distributions, thus leaving ion-kinetic beam instabilities in the foreshock intact.^{9} For a more detailed description of the sparse velocity space mechanism, see Refs. 37 and 1.
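The effect of the sparsity threshold can be illustrated with a toy velocity space: the sketch below fills a grid with an idealized Maxwellian core (the density and thermal speed are illustration values assumed here, not taken from the text) and counts how few cells exceed $f_{\min} = 10^{-15}\,\mathrm{s^3/m^6}$. A single compact core discards even more than the quoted 98%; real distributions with beams and heated populations retain more cells.

```python
import math

def maxwellian(vx, vy, vz, n=1e6, vth=50e3):
    """Phase-space density [s^3/m^6] of an isotropic Maxwellian with
    number density n [1/m^3] and thermal speed vth [m/s] (assumed values)."""
    norm = n / (math.sqrt(math.pi) * vth) ** 3
    return norm * math.exp(-(vx*vx + vy*vy + vz*vz) / vth**2)

f_min = 1e-15   # sparsity threshold [s^3/m^6]
dv = 40e3       # velocity cell size [m/s]

# Scan a [-2000, 2000] km/s box (a subset of the full extent, for brevity)
# and keep only cells whose center value exceeds f_min.
centers = [-2000e3 + (i + 0.5) * dv for i in range(100)]
kept = sum(1 for vx in centers for vy in centers for vz in centers
           if maxwellian(vx, vy, vz) >= f_min)
total = len(centers) ** 3
print(f"kept {kept}/{total} cells = {100 * kept / total:.3f}%")
```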
However, this mechanism alone is not sufficient to make 3D + 3V global magnetospheric simulations feasible, as the memory requirement for the simulation data would still be of the order of (tens of) exabytes. To enable spatially 3D simulations, a further reduction in computational intensity is required. A common approach, especially in MHD modeling,^{33,38,39} is to discretize the spatial grid nonuniformly. Typically, a distinction is made between adaptive mesh refinement (AMR) approaches and unstructured sparse grids.
In an AMR setup, the simulation domain is initially discretized on a coarse Cartesian base mesh. The mesh resolution in certain areas of the domain is successively doubled by splitting simulation cells into smaller subcells. Thus, parts of the domain have double mesh resolution, parts have quadruple resolution, etc.
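The refinement operation itself is simple to sketch: each refinement step replaces one cubic cell by eight half-size children. The following is a schematic illustration of that splitting, not DCCRG's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    """A cubic AMR cell: corner position, edge length, refinement level."""
    x: float
    y: float
    z: float
    dx: float
    level: int = 0

def refine(cell):
    """Split a cubic cell into eight cubic children with twice the resolution."""
    h = cell.dx / 2
    return [Cell(cell.x + i * h, cell.y + j * h, cell.z + k * h, h, cell.level + 1)
            for i in (0, 1) for j in (0, 1) for k in (0, 1)]

root = Cell(0.0, 0.0, 0.0, dx=8000e3)   # 8000 km base cell, as in Sec. III C
children = refine(root)                  # double resolution
grandchildren = [g for c in children for g in refine(c)]  # quadruple resolution
print(len(children), len(grandchildren), grandchildren[0].dx)
```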
The choice of when and where to refine must be made judiciously based on the problem at hand: In some cases, the refinement is based on predefined geometric constraints (such as higher resolution for certain areas of interest). More generally, however, the adaptivity of the refinement process is expressed through a refinement criterion evaluated at runtime, which continuously reevaluates whether any given simulation cell should be coarser or finer than it currently is.
A simulation code that employs a refined mesh needs to be designed such that it can operate on different grid resolutions and, especially, handle situations at refinement interfaces, where simulation cells have multiple neighbors of higher or lower resolution in one or multiple directions. Section II describes how Vlasiator's solvers have been adapted and optimized for this task. Using these optimizations, the number of simulation cells for a global 3D + 3V Vlasiator simulation is reduced to $N_c \approx 1.3 \times 10^{12}$ cells, leading to a total memory requirement of 33.7 terabytes, which is well within the capabilities of modern supercomputers.
Section II details the design choices and algorithmic changes to the Vlasiator data structures and solvers that make computationally efficient 3D + 3V simulations with a refined mesh geometry possible.
II. METHODS
A. The grid structure
Vlasiator's basic simulation grid is based on the distributed Cartesian cell-refinable grid (DCCRG) library,^{40} which provides the memory layout, mesh geometry, refinement functionality, and a parallel message passing interface (MPI) communication framework.^{41}
The DCCRG framework enables adaptive mesh refinement at arbitrary locations of a Cartesian simulation domain by splitting a cubic simulation cell into eight cubic subcells with twice the spatial resolution (due to its choice of solvers, Vlasiator currently requires exactly cubic cells). This refinement process can be iterated multiple times. The domain decomposition is neither tree- nor patch-based:^{42,43} every cell can be individually refined or unrefined without any cooperation of its local neighborhood (with the limitation that only one jump of refinement level is allowed at a time, and refinement interfaces need to be separated by at least two cells). As a result, the refinement domains can be freely located, and the refinement interfaces can have arbitrary shapes. The solvers operating on this domain therefore need to be able to handle changes of refinement level on a cell-by-cell basis.
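The neighbor-level constraint can be illustrated in one dimension: along any row of cells, adjacent refinement levels may differ by at most one. The checker below is a hypothetical illustration of that rule, not part of the DCCRG API (which enforces the analogous condition in 3D).

```python
def valid_refinement(levels):
    """Check (in 1D) that adjacent cells differ by at most one refinement
    level, i.e., that no interface jumps two or more levels at once."""
    return all(abs(a - b) <= 1 for a, b in zip(levels, levels[1:]))

print(valid_refinement([0, 1, 2, 2, 1, 0]))  # gradual interfaces: allowed
print(valid_refinement([0, 2, 0]))           # two-level jump: not allowed
```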
The current refinement strategy for Vlasiator global simulations is based on regions of interest. These regions are identified in a low-resolution run (and/or using estimates from well-established empirical relations for the bow shock and magnetopause positions^{44}) and are thereby predefined for the high-resolution simulation. Figure 1 shows an example of the mesh refinement geometry employed in such a situation. For future runs, however, dynamic refinement criteria are being investigated, based on local physical parameters and their gradients, as is commonly done in global MHD simulations (e.g., Ref. 33).
While the DCCRG grid of Vlasiator stores the phase space density data as solved by the Vlasov equation, the electromagnetic fields are not discretized on the same mesh. They are propagated with a finite difference upwind field solver,^{45} which stores its data on a simple uniform Cartesian array grid.^{46} This grid does not employ mesh refinement (as the field solver does not need it), but is discretized at a constant resolution matching the finest AMR refinement level of the DCCRG grid.
As the computational demands of these simulations require running on large distributed-memory systems, both the Vlasov and field solver grids use a spatial domain decomposition approach. The strongly varying computational load of the sparse velocity space and AMR implementations makes it necessary for the Vlasov solver grid to dynamically rebalance the load of its decomposition domains; this process is handled jointly by the DCCRG grid and the Zoltan library.^{47} The field solver, on the other hand, has a constant computational load per simulation cell, so a simple MPI Cartesian communicator is used for the initial grid partition, and no rebalancing is required during runtime.
This multimesh structure is referred to as heterologous domain decomposition. The intricacies of coupling between these two grids are discussed in a recent publication.^{48}
B. The 6D Vlasov solver
As the near-Earth plasmas modeled by Vlasiator are to a good approximation nonrelativistic, relativistic effects are neglected in the implementation. This has several benefits for the semi-Lagrangian structure of the translation step: Transport in the three spatial dimensions can be treated independently, as Newtonian kinematics introduces no Lorentz factor coupling them. For each dimension, transport in the $(x_i, v_{x_i})$ subspace hence forms a linear shear of the distribution, as illustrated in Fig. 2.
In a similar way, certain properties of the Lorentz force acting on the plasma medium allow the acceleration step to be simplified. In the nonrelativistic Vlasov equation, the action of the Lorentz force $\vec{F}_L = q_s (\vec{v} \times \vec{B} + \vec{E})$ transforms the phase space distribution inside one simulation time step $\Delta t$ like a solid rotator in velocity space: The magnetic field causes a gyration of the distribution function by the Larmor motion $\Delta \vec{v} = (\partial \vec{v} / \partial t)\, \Delta t = \frac{q_s}{m_s} (\vec{v} \times \vec{B} + \vec{E})\, \Delta t$. Any nonzero electric field accelerates electrically charged matter, which amounts to a simple linear shift of velocity space.
Hence, the overall velocity space transformation is a rotation whose axis direction matches the local magnetic field vector. The gyration center velocity is given by the convective plasma velocity $\vec{E} \times \vec{B} / B^2$. This transformation can be decomposed into three consecutive shear motions^{53,54} along the coordinate axes, thus repeating the same fundamental algorithmic structure as the spatial translation update. The acceleration update affects the velocity space locally within each spatial simulation cell, so its properties are unaffected by the introduction of spatial AMR. This paper will hence focus on the changes to the translation solver step.
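The shear decomposition of a rotation can be verified in the 2D plane perpendicular to $\vec{B}$: three axis-aligned shears with factors built from $\tan(\theta/2)$ and $\sin\theta$ reproduce the rotation matrix exactly. This is a sketch of the idea behind Refs. 53 and 54; the velocity-space solver applies the analogous shears to the distribution function rather than to vectors.

```python
import math

def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotation_by_shears(theta):
    """Rotation by theta decomposed into three axis-aligned shears:
    R = X-shear(-tan(theta/2)) * Y-shear(sin(theta)) * X-shear(-tan(theta/2))."""
    t = -math.tan(theta / 2)
    s = math.sin(theta)
    shear_x = [[1, t], [0, 1]]   # shifts the x-coordinate by t * y
    shear_y = [[1, 0], [s, 1]]   # shifts the y-coordinate by s * x
    return matmul(matmul(shear_x, shear_y), shear_x)

theta = 0.3
R = rotation_by_shears(theta)
exact = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta),  math.cos(theta)]]
err = max(abs(R[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print(f"max deviation from exact rotation: {err:.2e}")
```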
Iterating through the storage data structure of Vlasiator's sparse phase space is computationally expensive. In contrast, the access patterns of the solver are predictable and linear: The semi-Lagrangian solver performs the shear transformation in order, in one dimension at a time. To exploit this, phase space data are organized into a so-called pencil structure at the beginning of each solver step, in which spatially adjacent data, in one Cartesian direction at a time, are stored linearly in memory. These pencils share no data dependencies during the shear and can be processed in parallel.
In a refined mesh, construction of these pencils carries an additional complication, since neighboring cells at the mesh refinement boundaries can have different sizes. The algorithm performs the following steps for every spatial update direction (as illustrated in Fig. 2):

Find pencil seed cells
The goal is to build pencils of spatially adjacent cells. This process starts with the seed cells at one boundary of the domain: all cells that do not have a local neighbor in the negative direction of the chosen coordinate.

Pencil construction
The pencil is built by identifying the neighbors of the seed cell in the positive coordinate direction and appending them to the pencil data structure. This process continues until no further neighbor cells are found (either at the opposite simulation boundary or at the edge of the local MPI domain).

Pencil splitting
When a refinement interface is encountered in this process, the next neighbors are four cells at higher mesh resolution. Here, the existing pencil structure is stopped, and four new separate pencils continue. To minimize the amount of data movement, the split pencils also contain cells from the coarser domain corresponding to the semiLagrangian update's stencil size (two cells in common simulation runs). This process advances recursively across any further refinement interface.
In the case where a refined pencil encounters a recoarsening of the mesh, the pencil construction stops, again taking padding by the stencil size into account. A new seed point (and hence, a new pencil) is started to continue at the coarser level.

Result merging and remote neighbor contributions
Even a completed pencil still has a mass outflow toward a further neighbor cell. In many cases, this means that a boundary of the spatial decomposition over parallel processes was encountered, and a remote neighbor cell exists into which the transported phase space density flows. This is called the remote neighbor contribution of the pencil.
In situations where the end of the pencil coincides with a coarsening refinement interface, four pencils will feed their remote neighbor contributions into the same target cell, which therefore provides a buffer for up to four incoming phase space density values; these are summed after the transport step is completed.
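The construction and splitting steps above can be condensed into a schematic 1D sketch, in which a row of cells along $+x$ contains coarse cells and, at refinement interfaces, tuples of side-by-side finer cells. All names and the data layout are illustrative, not Vlasiator's actual implementation; in particular, the stencil padding is mimicked simply by copying the coarse cells crossed before a split into every child pencil.

```python
def build_pencils(row):
    """Build pencils along one direction, splitting at refinement interfaces.

    `row` lists cells along +x: a coarse position is a single cell id,
    and a refined position is a tuple of ids (the side-by-side finer cells).
    """
    finished, active = [], [[]]
    for entry in row:
        refined = isinstance(entry, tuple)
        if refined and len(active) == 1:
            # Refinement interface: the single pencil splits into one
            # pencil per fine cell, each keeping the coarse cells so far.
            active = [active[0][:] for _ in entry]
        elif not refined and len(active) > 1:
            # Coarsening interface: fine pencils finish; a fresh coarse
            # pencil (new seed) starts at the coarser level.
            finished.extend(active)
            active = [[]]
        for pencil, cell in zip(active, entry if refined else (entry,)):
            pencil.append(cell)
    return finished + active

pencils = build_pencils(["A", "B", ("c0", "c1", "c2", "c3"),
                         ("d0", "d1", "d2", "d3"), "E"])
for p in pencils:
    print(p)
```

Two coarse cells followed by a refined region yield four fine pencils (each carrying the coarse prefix) plus a new coarse pencil after the coarsening interface.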
Once the complete set of pencils has been constructed, meaning that the whole 3D space is organized into pencil structures, the next step of the solver acts on one phase space cell at a time, interpolating the discrete phase space values of the pencils by a reconstruction polynomial (depicted as the red curve in Fig. 2). The phase space integral values of each cell are taken as control points for a polynomial using the nonuniform piecewise parabolic reconstruction method of Ref. 55. A slope limiter is used at this point, to ensure the total variation diminishing (TVD) property of the overall solver step.
The shifted polynomial is integrated over the target cell extent to obtain updated phase space density values. These are transferred back from the pencil structure into the phase space grid data structure.
This entire process is repeated for all three spatial dimensions, leading to an effective three-dimensional spatial transport step. As a performance optimization, the pencilization process is only performed once, and its metadata are retained until an MPI load balancing step changes the spatial decomposition of the simulation domain.
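The translation update can be condensed into a one-dimensional, uniform-grid sketch. For brevity it uses a minmod-limited linear reconstruction instead of the nonuniform piecewise parabolic method of Ref. 55, but it shares the key properties of the solver step: conservative flux form, TVD limiting, and exact mass conservation.

```python
def advect(f, c):
    """One conservative semi-Lagrangian step on a uniform periodic grid.

    Shifts the cell averages f by c cell widths (0 <= c <= 1) using a
    minmod-limited linear reconstruction (a simplified stand-in for the
    nonuniform PPM reconstruction used by Vlasiator).
    """
    n = len(f)

    def minmod(a, b):
        if a * b <= 0:
            return 0.0
        return min(a, b) if a > 0 else max(a, b)

    # Flux through the right face of each cell: the mass contained in the
    # rightmost fraction c of the (linearly reconstructed) upwind cell.
    flux = []
    for i in range(n):
        slope = minmod(f[i] - f[i - 1], f[(i + 1) % n] - f[i])
        flux.append(c * (f[i] + 0.5 * (1 - c) * slope))
    return [f[i] - flux[i] + flux[i - 1] for i in range(n)]

f = [0.0] * 20
f[8:12] = [1.0, 2.0, 2.0, 1.0]    # triangular-ish pulse
mass0 = sum(f)
for _ in range(100):               # 100 steps at c = 0.8: four box transits
    f = advect(f, 0.8)
print(abs(sum(f) - mass0))         # mass drift (at rounding level)
```

The limiter diffuses the pulse somewhat, mirroring the behavior seen in the verification tests of Sec. III A, while total mass stays conserved to rounding error and no new extrema appear.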
C. Coupling between Vlasov and field solver
To complete the description of a plasma system in the hybrid-kinetic regime, the Vlasov equation [Eq. (1)] is solved together with Maxwell's equations in the Darwin approximation.^{1} The electric field $\vec{E}$ and magnetic field $\vec{B}$ act on the distribution functions, and in return, statistical moments of the particle distribution functions (overall mass density $\rho_m$, charge density $\rho_q$, bulk velocity $\vec{V}$, and pressure tensor $\overleftrightarrow{P}$) are fed into the field solver. Hence, data need to be exchanged between the two equations' solver procedures.^{48}
As Vlasiator's field solver and Vlasov solver operate on separate grids, data must be mapped between them during each simulation time step so that each solver's output can serve as source terms for the other's update. The association between the two grids is calculated at initialization time and updated after any load balancing step. Reference 48 presents an in-depth discussion of the coupling procedure between the two meshes, including the spatial filtering step employed to smooth out discontinuities in areas of resolution mismatch.
D. Boundary conditions
Vlasiator offers a selection of boundary conditions to handle different simulation scenarios, described in detail in Ref. 1. Inflow boundary conditions enable solar wind to enter the simulation, with arbitrary time-varying plasma conditions.^{56} Periodic boundary conditions are used to enable simulation in one and two spatial dimensions, and outflow boundary conditions allow plasma and electromagnetic fields to exit the simulation domain. Outflow boundary conditions are implemented as homogeneous Neumann boundaries, essentially copying the values from the neighboring simulation cells. The inner boundary of the magnetospheric setup is terminated by an ionospheric shell, set at about $4.7\,R_E$ and modeled as a perfect conductor. The ionospheric boundary cells' phase space density is initialized with a Maxwellian distribution and remains constant throughout the run. The electric field at the ionosphere is set to zero, and the magnetic field component is initialized with values corresponding to the Earth dipole at those locations.
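The homogeneous Neumann (outflow) copy operation is straightforward; below is a 1D sketch with a hypothetical two-cell ghost layer (the layer width is an assumption for illustration, not a value given in the text).

```python
def apply_outflow(grid, ghost=2):
    """Fill `ghost` boundary layers of a 1D array by copying the nearest
    interior value (homogeneous Neumann / zero-gradient condition)."""
    for g in range(ghost):
        grid[g] = grid[ghost]            # low-index boundary
        grid[-1 - g] = grid[-1 - ghost]  # high-index boundary
    return grid

# Two ghost cells on each side of a three-cell interior:
g = apply_outflow([0, 0, 5.0, 3.0, 7.0, 0, 0])
print(g)  # ghost cells now copy the adjacent interior values
```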
III. RESULTS
Section III presents a number of validation test runs, in which the performance of the Vlasov solver has been tested. A full discussion and in-depth analysis of the physical properties of our large-scale 3D magnetospheric simulations are the subject of an upcoming publication.
A. Performance of the semiLagrangian solver
To ensure that the propagation algorithm performs well across simulation refinement interfaces, three tailored simulations were performed, in which a test distribution was propagated in a periodic box with a size of $20 \times 1 \times 1$ cells at the coarsest refinement level. In these tests, the semi-Lagrangian solver was run in isolation from the rest of the simulation system (in particular, without any interaction with the field solver). A monoenergetic plasma (moving at constant velocity $v$) with a spatial density modulation of square, sinusoidal, or triangular form was initialized in the center of the domain, with two levels of mesh refinement applied in the same region. A fixed time step $\Delta t = 0.8\, \Delta x / v$ [corresponding to a spatial Courant–Friedrichs–Lewy (CFL) value of 0.8 at the finest refinement level] was used. In 100 simulation time steps, the density modulation propagates once around the box's periodic boundaries and arrives back at its starting position. Figures 3(a)–3(c) show the spatial profiles for all three signals at the start of the simulation and after 100, 1000, and 5000 time steps (corresponding to 1, 10, and 50 complete transits of the simulation box). In panels (d) and (e), the triangular setup has been repeated with a uniform grid discretization at the highest and lowest resolution level, respectively.
It is apparent that the use of slope limiters caused some diffusion, but overall the signals are conserved adequately. In particular, no distortions or artifacts are created at the interfaces between refinement regions. Comparison of the mesh-refined triangle run [panel (c)] with the high- and low-resolution uniform mesh runs [panels (d) and (e)] shows that the AMR mechanism does not add any extra diffusivity; its effective diffusive behavior lies between the high- and low-resolution cases. The small asymmetries visible in the final result stem from the selected propagation direction ($+x$) and were mirrored when the tests were repeated in the opposite flow direction. These asymmetries stem from the applied parabolic polynomial fitting method, which uses four data points that cannot be selected symmetrically around the source cell.
B. Speedup and computational effort
As a benchmark problem for realistic plasma simulations, a shock-in-a-box test was run, in which an elongated rectangular simulation domain (with a size of $L_x = 3.2 \times 10^4$ km and $L_y = L_z = 2 \times 10^3$ km) is initially filled with two plasma regions that correspond to the Rankine–Hugoniot conditions of an interplanetary shock in the de Hoffmann–Teller frame.^{57,58} These simulations have a setup similar to that employed in Ref. 36, with the important distinction that they use fully three-dimensional meshes. Figure 4 shows the simulation box, mesh refinement structure, and resulting density profiles, and Ref. 59 contains the full simulation configuration and output files.
The parameters are chosen such that the shock has zero velocity in the simulation frame and remains quasi-stationary in the central region of the box. Note that within the short simulated time frame, complex 3D shock dynamics do not have time to form. The ion inertial length is $c/\omega_{pi} = 223$ km upstream and $148$ km downstream of the shock. Table II compares five different runs of this scenario. In the “low1” run, the simulation was discretized with an unrefined mesh of $160 \times 10 \times 10$ cells (yielding a spatial resolution of $\Delta x = 200$ km). For “med8” and “high64,” this was increased to $320 \times 20 \times 20$ cells ($\Delta x = 100$ km) and $640 \times 40 \times 40$ cells ($\Delta x = 50$ km), respectively, so these three simulations provide a weak-scaling set of the solver implementation without AMR.
Run      $N_c$      $N_f$             $\Delta x$ (km)   $N_\mathrm{CPU}$   Mem (GB)   $t_\mathrm{wall}$ (s)   CPUh     $N_\mathrm{tsteps}$   Eff (μs)
low1     16 000     1.1 × 10^9        200               32                 15.74      5500                    48.9     495                   0.33
med8     128 000    7.7 × 10^9        100               256                182.7      12 883                  916      989                   0.43
high64   1 024 000  6.1 × 10^{10}     50                2048               1545.9     31 695                  18 030   1978                  0.53
AMR18    296 000    1.8 × 10^{10}     50–200            608                524.9      28 955                  4890     1758                  0.55
eqv18    281 216    1.7 × 10^{10}     76.92             608                429.5      17 548                  2963     1286                  0.49
The results were obtained on the “Vorna” cluster at the University of Helsinki, consisting of 180 nodes with Intel Xeon E5-2670 CPUs at 2.60 GHz and an InfiniBand QDR interconnect.
Simulation “AMR18” from Table II was set up with the same low base resolution as “low1,” but the central part of the domain ($x \in [-4 \times 10^6, 4 \times 10^6]$ m) underwent two levels of mesh refinement, giving that region the same effective 50 km resolution as “high64.” Furthermore, simulation “eqv18” was initialized with $416 \times 26 \times 26$ uniform cells, resulting in a spatial cell count (281 216) close to that of the AMR run (296 000) and a uniform resolution of $\Delta x = 76.92$ km throughout the box.
The simulation time step $\Delta t$ was determined from the advection CFL condition. As this value depends on the size of the spatial cells, each run had to perform a different number of time steps to reach the simulation target time of 100 s.
As the figures in the table show, the number of spatial cells N_c of the AMR simulation lies far below that of “high64,” and its memory consumption and total computational resource use are reduced accordingly. As the AMR solver's pencilization process is more complex than in the unrefined case, the effective computational effort per phase-space cell, here expressed as $t_\mathrm{wall} \cdot N_\mathrm{CPU} / (N_f \cdot N_\mathrm{tsteps})$ (μs), is noticeably higher. This can also be inferred directly from comparing the resources spent by “AMR18” and “eqv18.” The increased computational complexity of the AMR solver, however, is offset by the increase in effective physical phase space volume per computational unit, so that AMR Vlasov simulations clearly outperform unrefined setups in terms of overall computational effort per physical volume, or conversely allow for a much finer resolution in the physically relevant regions at the same computational cost.
C. Global magnetospheric simulations
Finally, the scientific use case for adaptive mesh refinement in Vlasiator is the performance of global, three-dimensional hybrid-kinetic simulations of Earth's entire magnetosphere.
Figures 1 and 5 present different visualizations of the mesh refinement structure currently employed in Vlasiator global runs. This run modeled the plasma around Earth in a box extending, in geocentric solar ecliptic (GSE) coordinates, over $x \in [-110, 50]\,R_E$ and $y, z \in [-57, 57]\,R_E$, with an incoming steady solar wind of density $n_\mathrm{SW} = 10^{6}\,\mathrm{m}^{-3}$, flowing at a speed of $v_\mathrm{SW} = 750$ km/s in the $-x$ direction. The solar wind carried an interplanetary magnetic field of $B_z = -5$ nT in the purely southward direction, resulting in well-defined positions for the dayside and nightside reconnection regions. The simulation's spatial grid had a coarsest resolution of $\Delta x = 8000$ km with three subsequent levels of refinement, yielding a finest resolution of $\Delta x = 1000$ km. The refinement regions were chosen to give maximum resolution coverage of the dayside magnetopause and the tail reconnection regions. Note that the ion inertial length $c/\omega_{pi} \approx 230$ km in this simulation is smaller than even the finest grid resolution. This was likewise the case in previous publications,^{14,30,36} which have nonetheless provided valuable insights into magnetospheric physics that would be hard to obtain using other simulation methods. As this manuscript focuses on the implementation details of adaptive mesh refinement, the interested reader is referred to Ref. 61, which presents a recent study further quantifying the physical fidelity of Vlasiator simulations at low resolution and presenting a subgrid method to mitigate some of its effects.
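The quoted ion inertial length follows directly from the solar wind density of the run; as a quick cross-check using the standard expression $d_i = c/\omega_{pi}$ with $\omega_{pi} = \sqrt{n q^2 / (\epsilon_0 m_p)}$ (SI units; constants rounded to four digits):

```python
import math

C = 2.998e8        # speed of light (m/s)
EPS0 = 8.854e-12   # vacuum permittivity (F/m)
QE = 1.602e-19     # elementary charge (C)
MP = 1.673e-27     # proton mass (kg)

def ion_inertial_length_km(n_per_m3):
    """d_i = c / omega_pi for a proton plasma of number density n."""
    omega_pi = math.sqrt(n_per_m3 * QE**2 / (EPS0 * MP))
    return C / omega_pi / 1e3

d_i = ion_inertial_length_km(1e6)  # solar wind density of the run: ~230 km
```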
In Fig. 5, the blue line provides an analytic estimate of the magnetopause location using the model of Ref. 44, calculated from the solar wind parameters of the simulation. The magnetospheric geometry that has self-consistently developed in the simulation matches this prediction quite well, although the magnetopause standoff distance deviates by about $2\,R_E$. The authors of Ref. 62, however, have argued that the model of Ref. 44 overestimates the radial distance by about $1\,R_E$, an effect likely exacerbated further by the fact that this simulation has an unusually high solar wind temperature of $T = 0.5$ MK, leading to an enhanced thermal pressure in the magnetosheath, somewhat stronger compression of the magnetosphere, and hence a smaller magnetopause standoff distance. Most importantly, the simulation shows no sudden discontinuity or departure from the prediction across the different refinement regions.
The green line in Fig. 5 presents the analytic bow shock location model of Ref. 32. It again shows that the Vlasiator bow shock geometry matches observed behavior well. Some departure from the ideal parabolic shape of the model is visible, but this is likely due to kinetic effects of the cusps and transient magnetosheath phenomena, as previously presented in Refs. 19 and 21, and stays within the confidence range given in Ref. 32.
A direct performance comparison with a high-resolution 2D Vlasiator production run using the Vlasiator solver without AMR was performed on the Mahti supercomputer at the CSC—Center for Scientific Computing in Kajaani, Finland, a BullSequana XH2000 system with AMD Rome 7H12 CPUs at 2.6 GHz and an InfiniBand HDR100 interconnect. Table III outlines the cell counts, memory requirements, and effective calculation time per core per cell of both runs. As before, the AMR solver carries a performance penalty in terms of computing time per simulated phase space cell. A resolution gap also remains between the 2D simulation (at $\Delta x = 300$ km) and the 3D run (at $\Delta x = 1000$ km), but the overall reduction in required phase space cells means that a simulated 3D volume, instead of a 2D slice, is now possible and meaningful scientific results can now be attained.
TABLE III.

| Run | N_c | N_f | Δx (km) | N_CPU | Mem (TiB) | Eff (μs) |
|-----|----------|-----------|------|---------|-------|------|
| 2D  | 6×10^6   | 1.9×10^11 | 300  | 128 000 | 11.89 | 4.18 |
| 3D  | 1.3×10^6 | 1.56×10^9 | 1000 | 64 000  | 4.82  | 7.08 |
IV. CONCLUSIONS AND OUTLOOK
Studies of the near-Earth plasma environment are strongly dependent on space plasma simulations, which continue to consume a major share of supercomputing resources worldwide. The methods and results presented in this paper show that, using a combination of sparse velocity space representation and spatial adaptive mesh refinement, it is possible to simulate the entire outer extent of Earth's magnetosphere close to hybrid-kinetic scales in three dimensions using a Vlasov simulation code. This outcome opens the door for modeling studies of magnetospheric plasma phenomena in three dimensions, including complex phenomena that benefit from a global view (without the previously required coupling of different simulation approaches^{63} or the limitation to stochastic sampling of distribution functions inherent to the PIC approach^{4}).
Section II introduced the dimensional splitting approach used in Vlasiator's solver and described the dimension-by-dimension dynamic pencilization and semi-Lagrangian solver strategy. The combination of adaptive mesh refinement and a massively parallelized, low-diffusive phase space solver is novel, and its fields of applicability are only just starting to be explored.
The scaling results in Subsection III B demonstrate how the AMR implementation provides a very effective reduction of memory usage and overall computation time compared to unrefined simulations. The introduction of the AMR solver incurred a computation time overhead of about 50% per computational cell. Yet this is counterbalanced by the fact that every region left one level coarser requires a factor of eight fewer simulation cells, leading to a significant overall performance gain.
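This trade-off can be illustrated with a toy cost model (a sketch; the refined fraction and the two refinement levels are assumed example values, only the ~50% per-cell overhead comes from the measurements above, and differing time-step lengths across levels are ignored):

```python
def relative_cost(refined_fraction, levels=2, overhead=1.5):
    """Cell-update cost of an AMR run relative to a uniformly fine run:
    a fraction of the domain is kept at the finest level (8**levels cells
    per base cell), the rest stays at base resolution (1 cell per base
    cell), and every AMR cell pays the solver overhead factor."""
    fine = 8**levels
    cells_amr = refined_fraction * fine + (1 - refined_fraction) * 1
    return overhead * cells_amr / fine

# Refining only 10% of the domain by two levels costs ~17% of the
# uniformly fine run, i.e. roughly a 6x saving despite the overhead.
cost = relative_cost(0.1)
```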
Global magnetospheric simulations performed with these new methods have been validated against the Ref. 44 magnetopause and the Ref. 32 bow shock model and show that magnetospheric dynamics remains intact and consistent across all refinement levels. The simulation extents and resolutions that Vlasiator achieves using the methods outlined here are now approaching those of global MHD simulations^{64} and are able to encompass kinetic phenomena that before escaped global modeling. Yet, even with the current improvements, resolution of the ion inertial length is still beyond the numerical capabilities of supercomputers in the unrefined parts of the simulation domain. Improvement of simulation techniques and implementations are ongoing.
The current mesh refinement process is only adaptive in its spatial dimensions, while the velocity resolution remains fixed. One potential future optimization would be the implementation of an adaptively refined mesh also in the velocity domain of each cell, thus yielding a fully six-dimensional mesh refinement and potentially providing another order-of-magnitude improvement in computational performance. Another avenue is to decouple the time stepping in regions of different resolution and different physical properties (as in Ref. 65), enabling longer time steps where possible and focusing the computational power on regions with dynamics at small spatiotemporal scales.
ACKNOWLEDGMENTS
This work was performed as part of the European Research Council Consolidator Grant No. 682068-PRESTISSIMO. The Academy of Finland supported this work through the PROFI4 Grant No. 318913.
The authors gratefully also acknowledge the Academy of Finland (Grant Nos. 1339327, 1335554, 1347795, 1345701, 267186, 309937, 322544, 328893, 338629, and 339756). The Finnish Centre of Excellence in Research of Sustainable Space, funded through the Academy of Finland Grant Nos. 312351 and 1336805, supports Vlasiator development and science as well.
The simulations for this publication were run on the “Hawk” machine at the HLRS—Höchstleistungsrechenzentrum Stuttgart as part of the PRACE Tier-0 grant “Frodo” (Grant No. 2019204998), the “Mahti” machine at the CSC—Centre for Scientific Computing in Kajaani, and the “Vorna” cluster of the University of Helsinki. The authors wish to thank the Finnish Grid and Cloud Infrastructure (FGCI) for supporting this project with computational and data storage resources.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Urs Ganse: Conceptualization (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (lead); Methodology (supporting); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (lead); Writing – review & editing (lead). Harriet George: Writing – review & editing (supporting). Evgeny Gordeev: Writing – review & editing (supporting). Maxime Grandin: Writing – review & editing (supporting). Konstantinos Horaites: Writing – review & editing (supporting). Jonas Suni: Writing – review & editing (supporting). Vertti Tarvus: Writing – review & editing (supporting). Fasil Tesema Kebede: Writing – review & editing (supporting). Lucile Turc: Writing – review & editing (supporting). Hongyang Zhou: Writing – review & editing (supporting). Minna Palmroth: Funding acquisition (equal); Project administration (lead); Resources (lead); Writing – review & editing (supporting). Tuomas Koskela: Conceptualization (equal); Methodology (lead); Software (lead); Writing – review & editing (supporting). Markus Battarbee: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (supporting); Software (equal); Visualization (equal); Writing – review & editing (supporting). Yann PfauKempf: Conceptualization (equal); Formal analysis (equal); Methodology (supporting); Software (equal); Visualization (supporting); Writing – review & editing (supporting). Konstantinos Papadakis: Software (supporting); Validation (supporting); Writing – review & editing (supporting). Markku Alho: Writing – review & editing (supporting). Maarja Bussov: Writing – review & editing (supporting). Giulia Cozzani: Writing – review & editing (supporting). Maxime Dubart: Writing – review & editing (supporting).
DATA AVAILABILITY
The Vlasiator simulation code is distributed under the GPL-2 open-source license at https://github.com/fmihpc/vlasiator.^{66} Full simulation data for the presented analysis are stored in the University of Helsinki Datacloud. Data presented in this paper can be accessed by following the data policy on the Vlasiator website https://helsinki.fi/vlasiator. Data handling and visualization are supported through the Analysator Python package https://github.com/fmihpc/analysator as well as a plugin for the VisIt visualization system.^{67} Simulation configuration files and raw performance data are available for download from Ref. 59.