We present a scheme that spatially couples two gyrokinetic codes using first principles. The coupled equations are presented, and a necessary and sufficient condition for ensuring accuracy is derived. This new scheme couples both the field and the particle distribution function. The coupling of the distribution function is only performed once every few time steps, using a five-dimensional (5D) grid to communicate the distribution function between the two codes. This 5D grid interface enables the coupling of different types of codes and models, such as particle and continuum codes, or delta-f and total-f models. Transferring information from the 5D grid to the marker particle weights is achieved with a new resampling technique. The coupling scheme is demonstrated with two XGC gyrokinetic simulations, one for the core and one for the edge. We also apply the coupling scheme to two continuum simulations of a one-dimensional advection–diffusion problem.

The Exascale High-Fidelity Whole-Device-Modeling (WDM) project aims at delivering an application, WDMApp, composed of multiple physics components coupled using first principles. The long-term goal of the WDM project is to combine the physics needed to understand and predict the performance of ITER and future fusion reactor experiments. This paper describes a first step toward developing the high-fidelity WDMApp, by coupling two first-principles gyrokinetic codes simulating the physics in the core and edge of a tokamak plasma. To guarantee that the system of coupled codes provides a solution based on first principles, the coupling between these components must itself be based on first principles. This paper extends our previous study in Ref. 1 to kinetic coupling and illustrates our approach to achieving a first-principles coupling of a core and an edge gyrokinetic code.

The first step addressed in this project is the spatial coupling of two gyrokinetic codes, one for the core (such as GENE4,5 or GEM6) and one for the edge (XGC7), computing on separate spatial domains, see Fig. 1. The domain of coupled simulations is decomposed into three main regions: the core, the edge, and the overlap region, where the core and edge simulations co-exist, see Fig. 1(b). Beyond the overlap region, each simulation has a buffer region of variable width, which ends at an artificial boundary. The overlap region mitigates the error caused by numerical differences between the core and edge simulations in the coupling region.

FIG. 1.

Whole volume core–edge coupled simulation. (a) Figure produced with VisIt.2 Data from a coupled simulation of a DIII-D-like plasma. See Ref. 3 for the visualization effort within the Exascale Computing Project. (b) Cartoon of the core–edge domain decomposition.


We present a generalized scheme that enables the coupling of core and edge gyrokinetic simulations. This new scheme combines the coupling of the field (Poisson equation1) with the coupling of the particle distribution function in the buffer regions. The tight coupling of the Poisson equation ensures that the core and edge simulations time-advance the same gyrokinetic Vlasov equation. The coupling of the distribution function is necessary for correctly evolving the distribution function near the artificial boundary at the end of the buffer region. This coupling of the distribution function is the main contribution of this new work, going beyond the field coupling in Ref. 1.

This study provides basic insights into the problem of first-principles spatial code coupling. The principles developed here can be applied to other code coupling problems. For instance, we illustrate the scheme in the simpler context of the coupling of the one-dimensional (1D) advection–diffusion equation. This scheme can also be applied and adapted to the coupling of 5D and even 6D codes.

To enable the coupling of different codes, we have implemented an abstract interface that communicates the particle distribution function information on a 5D grid. For particle-in-cell (PIC) codes, this requires the transfer of information between a grid and a set of marker particles. This transfer from the 5D grid to the marker particles is based on a moment preserving constrained resampling (MPCR) algorithm.8 Verification is performed with the core–edge coupled simulation using the XGC code in both domains.

In comparison with Ref. 1, where we only coupled the Poisson equation, the addition of the coupling of the distribution function in the buffers enables more complex simulations, for example, simulations including a heat source in the core, neutral recycling in the edge, or simply kinetic electrons and the associated turbulent fluxes. The new kinetic coupling scheme allows f on one side (core or edge) to be transferred to the other side via the coupling of f in the buffer.9 

Finally, as we will discuss, this new core–edge coupling scheme enables one to use a much less computationally intensive model in the core (e.g., delta-f, or with fewer impurities) than in the edge. Such an optimization would speed up a coupled core–edge gyrokinetic simulation compared to one global code calculation. This comes at the price of addressing the computational load-balancing problem between the core and edge simulations, since the core–edge coupled simulation requires fewer compute nodes for the simpler delta-f core code than for the total-f edge code.

The remainder of this paper is organized as follows. In Sec. II, we illustrate the problem of coupling f in a simple 1D advection–diffusion problem. In Sec. III, we present the full system of core–edge coupled gyrokinetic equations, i.e., the coupling of both the field and the particle distribution function. In Sec. IV, we introduce how we use a resampling technique to transfer the particle distribution function information between core and edge simulations. In Sec. V, we present a core–edge coupled simulation where we use this new coupling algorithm that employs the resampling of the particle distribution function. In Sec. VI, we discuss two optimization scenarios that exercise the core–edge coupling algorithm to achieve a computational speed-up of 2× to 10×. Finally, we close with conclusions and future work in Sec. VII.

In this section, we present the spatial coupling of a 1D advection–diffusion equation

$$\frac{\partial f}{\partial t} + V\frac{\partial f}{\partial x} = D\frac{\partial^2 f}{\partial x^2}.$$
(1)

The reader only interested in the spatial coupling of gyrokinetic codes can directly go to Sec. III. However, this section provides valuable insight into the underlying principles of the scheme, which are more easily elucidated in a simplified context.

The simulations are performed with a continuum code on a 1D radial grid. The spatial derivatives are computed with centered stencils, and the discrete advection–diffusion equation reads

$$\begin{cases} f_i^{t+\Delta t} = f_i^t - \Delta t\, V\, \dfrac{f_{i+1}^t - f_{i-1}^t}{2h} + \Delta t\, D\, \dfrac{f_{i+1}^t - 2 f_i^t + f_{i-1}^t}{h^2}, \\[4pt] f_i^0 = y_i^0, \end{cases}$$
(2)

with $y^0$ being the initial condition, $t$ the time index, and $i$ the spatial index. We have verified that the mass of the 1D system, $M = \int \mathrm{d}x\, f(x) := \sum_n f_n$, is conserved to round-off error.
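
To make the discrete scheme concrete, the following is a minimal Python/NumPy sketch of the explicit centered update of Eq. (2); it is not the production solver, and the periodic boundary and parameter values are illustrative assumptions used only to check mass conservation.

```python
import numpy as np

def step_advection_diffusion(f, V, D, dt, h):
    """One explicit step of Eq. (2): centered advection and centered diffusion.
    A periodic domain is assumed here (as in the reference run of Sec. II);
    the coupled core/edge sub-domains instead end at artificial boundaries."""
    fp = np.roll(f, -1)   # f_{i+1}
    fm = np.roll(f, +1)   # f_{i-1}
    return f - dt * V * (fp - fm) / (2.0 * h) + dt * D * (fp - 2.0 * f + fm) / h**2

# Usage sketch: the mass M = sum_n f_n should be conserved to round-off error.
nx, L, V, D = 100, 1.0, 0.5, 1.0e-3
h = L / nx
dt = 0.2 * min(2.0 * D / V**2, h**2 / (2.0 * D))   # heuristic stability margin
x = (np.arange(nx) + 0.5) * h
f0 = np.exp(-((x - 0.5) / 0.05) ** 2)              # smooth illustrative blob
f = f0.copy()
for _ in range(1000):
    f = step_advection_diffusion(f, V, D, dt, h)
print("relative mass change:", abs(f.sum() - f0.sum()) / f0.sum())
```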

The coupling of core and edge advection–diffusion equations through the overlap region is based on the composition of f (see Ref. 1), by defining

$$\check{f} = \varpi f^C + (1 - \varpi) f^E,$$
(3)

where $f^C$ and $f^E$ are the core and edge “distribution functions,” respectively.

The function $\varpi$ is a monotonic function of the radial coordinate, see Fig. 1(b). It is typically 1 in the core, 0 in the edge, and it varies continuously from 1 to 0 in the overlap. The overlap is the region where both $f^C$ and $f^E$ co-exist. Beyond this overlap are regions called buffers, which are used to provide suitable boundary conditions to the overlap. We discuss the subtleties of the $\varpi$ function for gyrokinetic simulations in Sec. III B.
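
A minimal sketch of one possible choice of $\varpi$, piecewise linear in the radial coordinate; the overlap limits x_in and x_out are illustrative parameters, not values taken from the paper.

```python
import numpy as np

def blending_weight(x, x_in, x_out):
    """Piecewise-linear weight: 1 on the core side (x < x_in), 0 on the edge
    side (x > x_out), and a linear ramp across the overlap [x_in, x_out]."""
    return np.clip((x_out - x) / (x_out - x_in), 0.0, 1.0)

# Composite solution of Eq. (3) on a common grid (f_core, f_edge are arrays):
# w = blending_weight(x, x_in=0.45, x_out=0.55)
# f_check = w * f_core + (1.0 - w) * f_edge
```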

Given that $f^C$ and $f^E$ are two independent solutions of the advection–diffusion equation (1), it is simple to show by direct substitution that their combination $\check{f} = \varpi f^C + (1-\varpi) f^E$ is also a solution of Eq. (1) if and only if the following necessary and sufficient condition10 is fulfilled:

$$0 = (f^C - f^E) \times \left(\frac{\partial\varpi}{\partial t} + V\frac{\partial\varpi}{\partial x} - D\frac{\partial^2\varpi}{\partial x^2} - 2D\frac{\partial\varpi}{\partial x}\frac{\partial \ln(f^C - f^E)}{\partial x}\right).$$
(4)

Note that $\varpi$ is a function of the radial direction only, so $\partial\varpi/\partial t = 0$. Relaxing this constraint is left for future studies, when, for example, the position of the overlap will move in time.

There are two trivial solutions to Eq. (4). The first trivial solution consists in taking $f^C = f^E$ everywhere. The second solution consists in taking $\varpi$ as a fixed constant.

An alternate solution consists in using either one of these two conditions at each given position $x$. In our case, and as shown in Fig. 1(b), $\varpi = 1$ in the core and $\varpi = 0$ in the edge, such that $\partial\varpi/\partial x = 0$ in these regions. Therefore, the condition $f^C = f^E$ is only necessary in the overlap region, where $\varpi$ is varying and $\partial\varpi/\partial x \neq 0$, see Sec. III D for gyrokinetic simulations.

Note that the last term in the parentheses of Eq. (4) requires

$$2D\frac{\partial\varpi}{\partial x}\frac{\partial(f^C - f^E)}{\partial x} = 0,$$

which means that we do not need $f^C - f^E = 0$ in the overlap but only $\partial(f^C - f^E)/\partial x = 0$, which is less restrictive, as it holds for any constant difference, including 0.

Obviously, the coupling of different codes will lead to interpolation errors or marker particle noise effects, such that we will only be able to ensure $f^C - f^E = \epsilon$ with $\epsilon \neq 0$. In Ref. 1, we discussed (in the case of gyrokinetic simulations) that a larger overlap width, $L$, mitigates the effect of particle noise because it decreases the nonphysical term

$$\mathrm{Err} = (f^C - f^E)\,(\dot{X}\cdot\nabla\varpi).$$

As $\varpi$ is linear, one has

$$\mathrm{Err} = \epsilon\,\dot{X}/L.$$

In the case of the advection–diffusion equation, using a linear $\varpi$ implies $\partial^2\varpi/\partial x^2 = 0$, which leads to a similar conclusion regarding the width of the overlap. Overall, $\epsilon$ acts like a random error with similar amplitude at all positions of the overlap; hence, we take $\partial\varpi/\partial x$ constant. Intuitively, one can understand that if $\varpi$ is not linear, it should at least be a monotonic function. A detailed study of the effect of $\epsilon$ on the correctness of the solution, i.e., on Eq. (4), is left for a future study.

To keep the coupled solution $\check{f}$ correct in the overlap, we need to accurately determine its evolution. For this, we need to correctly push $f^C$ and $f^E$ with the advection–diffusion equation. The difficulty of correctly pushing in the buffers is related to the spatial derivatives that connect $f_i$ to $f_{i+1}$ and $f_{i-1}$ in the discrete advection–diffusion equation (2).

During the first push, the value of $f$ at the artificial boundary of the core simulation, which we denote $f_n^C$ with $n$ the position of the artificial boundary, cannot be evolved correctly because the core simulation does not have the knowledge of $\check{f}_{n+1}$. The quantity $f_n^C$ thus becomes corrupted after the first push, i.e., $(f^C)_n^{t+1}$ is incorrect. During the second push, the update of $f_{n-1}^C$ is incorrect, because $(f^C)_n^{t+1}$ is corrupted. The error propagates by one interval per time step, because of our choice of stencil.

In this example, the error reaches the overlap boundary after $N+1$ time steps, given that the buffer extends over $N$ radial intervals. Therefore, if we synchronize the buffer every $N$ time steps, the error will never reach the overlap and the coupled solution will remain correct.11 

The width of the buffer is flexible and should be chosen wisely. In standard parallel domain decomposition within a single code, one would typically choose N = 1. However, inter-code communication is subject to latency considerations that may become a bottleneck for the entire simulation. To minimize this latency, it makes sense to send larger messages less frequently, i.e., N > 1. The larger the buffer, the less frequent the synchronization. Moreover, the total volume of information exchanged is independent of N, because transferring N grid points every N time steps corresponds to transferring one grid point per time step, on average. More details about the buffer synchronization will be discussed in Sec. III E.

In the following tests, we choose $V = 1/2$, the system size is $L = 1$, a grid with 100 intervals is used in $x$, and a Dirac delta function is initialized in the edge, $f(x, t=0) = \delta(x-1)$. A timeline of the simulations is shown in Fig. 2, where the reference (black) and coupled (red) simulations are compared in the left column. The Dirac delta function rapidly diffuses into a “blob”-like structure, which moves inward from the edge to the core and crosses the coupling region. The overlap and buffer regions are identified in the top right subplot of Fig. 2. The buffers are periodically synchronized with the period $T_\mathrm{sync}$.
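
For concreteness, here is a minimal sketch of how the coupled core/edge update and the periodic buffer synchronization could be organized for this 1D problem. The sub-domain indices, the frozen-value treatment of the artificial boundaries, and the sign of V (chosen so the blob moves from the edge toward the core in this layout) are illustrative assumptions, not the exact setup of the runs shown in Figs. 2–6.

```python
import numpy as np

# Illustrative layout (grid indices): the core solution lives on [0, i_cb],
# the edge solution on [i_eb, nx-1]; the overlap is [i_o0, i_o1]; the core
# buffer is (i_o1, i_cb] and the edge buffer is [i_eb, i_o0).
nx, L, V, D, dt = 100, 1.0, -0.5, 1.0e-3, 0.004
h = L / nx
N_sync = 5                                   # synchronize buffers every N_sync steps
i_o0, i_o1, i_eb, i_cb = 45, 55, 40, 60

x = (np.arange(nx) + 0.5) * h
w = np.clip((x[i_o1] - x) / (x[i_o1] - x[i_o0]), 0.0, 1.0)   # blending weight
f_core = np.zeros(nx)                        # only [0, i_cb] is meaningful
f_edge = np.exp(-((x - 0.9) / 0.03) ** 2)    # blob initialized in the edge

def substep(f, lo, hi):
    """Centered update of Eq. (2) on the interior of [lo, hi]; the end values
    are frozen, which is one (incorrect) artificial-boundary treatment and is
    what slowly corrupts the buffer between synchronizations."""
    g = f.copy()
    i = np.arange(lo + 1, hi)
    g[i] = (f[i] - dt * V * (f[i + 1] - f[i - 1]) / (2 * h)
                 + dt * D * (f[i + 1] - 2 * f[i] + f[i - 1]) / h ** 2)
    return g

for step in range(1, 401):
    f_core = substep(f_core, 0, i_cb)        # core pushes up to its boundary
    f_edge = substep(f_edge, i_eb, nx - 1)   # edge pushes down to its boundary
    if step % N_sync == 0:
        # Buffer synchronization: each side overwrites its buffer with the other
        # side's solution before the boundary error can reach the overlap.
        f_core[i_o1 + 1:i_cb + 1] = f_edge[i_o1 + 1:i_cb + 1]
        f_edge[i_eb:i_o0] = f_core[i_eb:i_o0]

f_check = w * f_core + (1.0 - w) * f_edge    # composite solution, Eq. (3)
```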

FIG. 2.

Timeline of the evolution of an advection–diffusion problem. A blob is initialized in the edge and moves inward in time. Each row is for a given time, as indicated in the figure. The left column shows the reference (black) and the coupled (red) simulations. The right column shows the core and edge simulations of the coupled simulation. Only one buffer is shown. This buffer is synchronized every 20 steps.


We test the influence of the synchronization period by comparing simulations that use different synchronization periods, $T_\mathrm{sync}$. The accuracy of the coupling is estimated in Fig. 3, where we plot, in (a), the time evolution of the system mass

$$M(t) = \int_0^L \mathrm{d}x\, \check{f}(t),$$

and, in (c) and (d), the time evolution of the error

$$\|f - \check{f}\|^2 = \int_0^L \mathrm{d}x\, \left[f(t) - \check{f}(t)\right]^2,$$

with $f$ being the reference solution. Conservation of mass is a necessary but not a sufficient condition; therefore, we also look at the error, which is a rigorous way of measuring the correctness of the coupled solution.
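
These two diagnostics are straightforward to evaluate numerically; a minimal sketch, assuming f_ref and f_check are the reference and composite solutions sampled on the same grid of spacing h:

```python
import numpy as np

def mass(f, h):
    """System mass M(t), a Riemann-sum approximation of the integral of f."""
    return h * f.sum()

def l2_error(f_ref, f_check, h):
    """Squared L2 distance between the reference and the coupled solutions."""
    return h * ((f_ref - f_check) ** 2).sum()
```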

FIG. 3.

Estimating the error due to using different $T_\mathrm{sync}$. (a) Time evolution of the system mass for four coupled simulations. (b) State of the four simulations at t = 1.2. (c) Difference between the reference and the coupled simulation using $T_\mathrm{sync} = N\Delta t = 0.2$. (d) Same error for the four simulations. Parameters are given in (a).


Let us analyze the results (in blue in Fig. 3) of the coupled simulation carried out with a small enough synchronization period, $T_\mathrm{sync} = N\Delta t = 0.2$. The mass is conserved, as shown in (a), and the difference between the reference and coupled simulations is zero up to round-off error, as shown in (c). Note that the round-off error is largest when the blob is in the overlap region, which is sensible, as the overlap is the only place in which the computational methods differ.

Then, to show that too large a period leads to a larger error, we gradually increase the synchronization period in three simulations plotted with yellow, red, and purple curves. The mass of the system (a) is not conserved when using too large a synchronization period $T_\mathrm{sync}$, and the error (d) is large. Note that the error rapidly increases and then slowly decreases in time (to a nonzero value), because the sharp structures due to an inappropriate buffer synchronization are diffusing in the system.

For illustration, we plot with a black dashed line in (a) the expected result of a purely advective problem ($D = 0$) during which we do not synchronize the buffer ($T_\mathrm{sync} = \infty$), and which is also initialized with a Dirac delta at the edge, $\check{f}(t=0) = \delta(x-1)$. The solution of this incorrectly coupled system reads $\check{f} = (1-\varpi)\,\delta(x - 1 - Vt)$. The rate of mass loss of $f$ inside the overlap region is

$$\frac{1}{f}\frac{\mathrm{d}f}{\mathrm{d}t} = -V\frac{\partial\varpi}{\partial x}.$$

The mass, moving at constant velocity and without diffusion, decreases linearly when crossing the overlap, as we chose $\varpi$ linear in $x$ in the overlap. This incorrect coupling acts as if a perfect sink were present.

In comparison, the purple curve is the result of the 1D advection–diffusion simulation with $T_\mathrm{sync} = \infty$. The agreement with the black dashed line is very good. The slight difference between the black dashed line and the purple curve is due to the effect of diffusion. The initial Dirac perturbation tends to converge toward a symmetric Gaussian with a front tail that is faster than $V$ and a back tail that is slower than $V$.

The results in Fig. 4 estimate the error on the system mass conservation (1 means 100% loss) at the final simulation times $T_\mathrm{end} = 2$ (blue), 8 (red), and 20 (black). The blob passes many times through the overlap because the system is made periodic, with $f(x=0) = f(x=1)$, for the purpose of our tests.

FIG. 4.

Nonconservation of the system mass with respect to the period $T_\mathrm{sync}$ between two buffer synchronizations. We measure the error after the simulation times $T_\mathrm{end} = 2$, 8, and 20.


In this figure, the error, “Loss of mass,” is plotted with respect to the synchronization period $T_\mathrm{sync}$. The shorter the period, the better the system conserves mass. Three regions can be identified in this figure:

  1. $T_\mathrm{sync} < N\Delta t$ (exact conservation).

  2. $N\Delta t < T_\mathrm{sync} < 0.4$.

  3. $T_\mathrm{sync} > 0.4$.

We recall that N is the radial width of the buffer (in number of grid intervals).

When $T_\mathrm{sync} > 0.4$, region (3), the synchronization period is so large that the blob can pass almost entirely through the buffer without being synchronized. The error is large, and the system rapidly loses most of the blob mass, as shown in (b) after 4 or 10 “turns.”

In between, when $N\Delta t < T_\mathrm{sync} < 0.4$, region (2), the accuracy depends on the diffusion. Increasing the diffusion coefficient $D$ increases the loss of mass caused by the too infrequent synchronization of the buffer, as shown in Fig. 5. When the diffusion is low, we can even use a period larger than $T_\mathrm{sync} = N\Delta t$ without losing mass (see the yellow curve).

FIG. 5.

Nonconservation of the system mass with respect to the synchronization period Tsync. We scan the diffusion coefficient D.


When $T_\mathrm{sync} < N\Delta t$, region (1), the synchronization period is small enough to ensure exact coupling, as we have shown in Fig. 3(c). We have further scanned the $\Delta t$ parameter to verify this aspect, see Fig. 6.

FIG. 6.

Nonconservation of the system mass with respect to the synchronization period Tsync. We scan the time step Δt. Obviously, the smaller the time step the smaller the period Tsync needs to be.


In this section, we report on the spatial coupling of gyrokinetic models. We consider, for simplicity, a single gyrokinetic species represented with the gyrocenter distribution function f. The generalization to multi-species is trivial.

The gyrocenter distribution function f is evolved with the 5D gyrokinetic equation

$$\frac{\partial f}{\partial t} + \dot{\mathbf{X}}[\phi]\cdot\frac{\partial f}{\partial \mathbf{X}} + \dot{v}_\parallel[\phi]\,\frac{\partial f}{\partial v_\parallel} = C[f] + S,$$
(5)

where $\mathbf{X}$ is the gyrocenter position, $v_\parallel$ is the parallel velocity, $C$ is the collision operator, and $S$ is a source term. The electrostatic potential $\phi$ is solved with the gyrokinetic Poisson equation

$$L\phi = \bar{n},$$
(6)

where the right-hand side is computed with

$$\bar{n}[f](\mathbf{x}) = \int_{-\infty}^{+\infty} \mathrm{d}v_\parallel \int_0^{+\infty} \mathrm{d}\mu \int \mathrm{d}\alpha\; f(\mathbf{x} - \boldsymbol{\rho}, v_\parallel, \mu).$$
(7)

The gyrokinetic distribution function $f(\mathbf{X}, v_\parallel, \mu)$ is defined in gyrocenter space, while the fields $\phi(\mathbf{x})$ and $\bar{n}(\mathbf{x})$ are defined in particle position space; recall $\mathbf{x} = \mathbf{X} + \boldsymbol{\rho}$. This distinction must be handled carefully when spatially coupling two gyrokinetic models. A review of gyrokinetic theory can be found in Ref. 12.

Note that, here, we do not explicitly describe the implementation of the Poisson equation, but we assume that $L$ is fixed in time and does not depend on $f$. This is usually the case in the various implementations of the gyrokinetic Poisson equation, which can have differential or integral forms.13–15 Generalization to more elaborate field equations (where the left-hand side is updated) is left for future work, but should be relatively straightforward, as any required coupled field information on the left-hand side would be constructed in the same manner as we do for the right-hand side.

Just as in Sec. II, the coupling of core and edge gyrokinetic simulations is based on the use of a composite distribution function1 

$$\check{f} = \varpi f^C + (1 - \varpi) f^E,$$
(8)

with $f^C$ and $f^E$ being the core and edge distribution functions, respectively. The function $\varpi = \varpi(\mathbf{X})$, shown in Fig. 7, is a monotonic function of the gyrocenter position $\mathbf{X}$, because we couple two gyrocenter distribution functions and not two particle distribution functions. Obviously, $\varpi = 1$ in the core region and 0 in the edge region. In between these two regions, there is an overlap where $0 < \varpi < 1$. In addition to the core, overlap, and edge regions, there are two buffer regions. As we will discuss in Sec. IV, these buffer regions are necessary for the consistency of the particle distribution function coupling.

FIG. 7.

Core/edge decomposition of the simulation volume. Heating and sink are included in the heat source term S.


From the definition of the blended distribution function in Eq. (8), we can obtain the coupled equations. For example, the gyroaveraging of f in Eq. (7) is computed with

$$\int \mathrm{d}\alpha\, \check{f}(\mathbf{x}-\boldsymbol{\rho}, \mathbf{v}) = \int \mathrm{d}\alpha\, \varpi(\mathbf{x}-\boldsymbol{\rho})\, f^C(\mathbf{x}-\boldsymbol{\rho}, \mathbf{v}) + \int \mathrm{d}\alpha\, \big(1 - \varpi(\mathbf{x}-\boldsymbol{\rho})\big)\, f^E(\mathbf{x}-\boldsymbol{\rho}, \mathbf{v}),$$
(9)

where we define $\mathbf{v} = (v_\parallel, \mu)$ to lighten the notation.

The set of coupled equations is summarized in Table I. We will detail those equations in Secs. III C and III D. We use the “check” symbol to identify coupled quantities; for instance, $\check{f}$ is the coupled distribution function and $\check{\phi}$ the coupled field.

TABLE I.

Quantities that need to be coupled. We note $\mathrm{d}Z = \mathrm{d}\mathbf{X}\, \mathrm{d}v_\parallel\, \mathrm{d}\mu\, \mathrm{d}\alpha\, J$ to simplify the notation. The calculation of the spatial derivative $\partial f/\partial X(y) = \lim_{\epsilon\to 0}\,(f(y+\epsilon) - f(y))/\epsilon$ is nonlocal, as it needs the knowledge of $f$ at different positions. GK stands for “gyrokinetic.”

Distribution function: $\check{f} = \varpi f^C + (1-\varpi) f^E$, with $\varpi = \varpi(\mathbf{X})$ monotonic.
Collision (or source $S$): $C[\check{f}]$ is a linear, local operation on $\check{f}$.
Charge density (linear and nonlocal): $\bar{n}[\check{f}] = \int \mathrm{d}Z\, \delta(\mathbf{X}+\boldsymbol{\rho}-\mathbf{x})\,[\varpi f^C + (1-\varpi) f^E]$.
GK Poisson (depends on the charge density): $L\check{\phi} = \bar{n}[\check{f}]$ is linear in $\check{f}$.
GK Vlasov operator (depends on $\check{\phi}$): $\frac{D}{Dt} = \frac{\partial}{\partial t} + \dot{\mathbf{X}}[\check{\phi}]\cdot\frac{\partial}{\partial\mathbf{X}} + \dot{v}_\parallel[\check{\phi}]\,\frac{\partial}{\partial v_\parallel}$ is nonlinear ($\nabla\phi\cdot\nabla f$) and nonlocal (calculation of $\dot{\mathbf{X}}\cdot\partial f/\partial\mathbf{X}$).

The same electrostatic potential $\check{\phi}$ is used in both the core and edge simulations. This unique electrostatic potential $\check{\phi}$ is obtained by using the composite distribution function of Eq. (8) in the right-hand side of the Poisson equation.1 This coupled gyrokinetic Poisson equation reads, in our case,

$$L\check{\phi} = \bar{n}^C + \bar{n}^E,$$

with

$$\bar{n}^C = \int \mathrm{d}\mathbf{X}\, \mathrm{d}v_\parallel\, \mathrm{d}\mu\, \mathrm{d}\alpha\, J\, \delta(\mathbf{X}+\boldsymbol{\rho}-\mathbf{x})\, \varpi f^C, \qquad \bar{n}^E = \int \mathrm{d}\mathbf{X}\, \mathrm{d}v_\parallel\, \mathrm{d}\mu\, \mathrm{d}\alpha\, J\, \delta(\mathbf{X}+\boldsymbol{\rho}-\mathbf{x})\, (1-\varpi) f^E.$$

A unique field guarantees that the core and edge simulations push the same nonlinear terms in the Vlasov equation, $Df/Dt = 0$, with

$$\frac{D}{Dt} = \frac{\partial}{\partial t} + \dot{\mathbf{X}}[\check{\phi}]\cdot\frac{\partial}{\partial\mathbf{X}} + \dot{v}_\parallel[\check{\phi}]\,\frac{\partial}{\partial v_\parallel}.$$

Solving for a unique coupled field avoids turbulence suppression by nonlinear phase-mixing, see Ref. 1.

The field solve is performed at each Runge–Kutta stage, but it only requires the exchange of 3D information, the charge density $\bar{n}^C$ and the potential $\check{\phi}$, between the core and edge, see Ref. 1. Therefore, even if these communications are frequent, their computational cost is mostly negligible. The main steps of this coupling algorithm are the following (a schematic sketch is given after the list):

  1. The core and edge compute their charge density.

  2. The core simulation sends its density to the edge.

  3. The edge simulation solves the field and sends it to the core simulation.

  4. Both sides push the Vlasov equation.

All communications are performed using ADIOS technologies.16,17
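
To make the per-stage data flow concrete, here is a schematic serial mock of one coupled field solve; density, poisson_solve, and push are hypothetical placeholders for the codes' internal routines, and in the real workflow the two halves run in separate executables with the 3D arrays exchanged through ADIOS.

```python
def field_coupling_stage(f_core, f_edge, density, poisson_solve, push):
    """Schematic serial mock of steps 1-4 above (not the actual implementation)."""
    # Core side: compute its gyroaveraged charge density and "send" it to the edge.
    n_core = density(f_core)
    # Edge side: receive n_core, add its own density, and solve L*phi = n_core + n_edge
    # for the unique coupled field, which is then "sent" back to the core.
    phi = poisson_solve(n_core + density(f_edge))
    # Both sides push the Vlasov equation with the same field phi.
    return push(f_core, phi), push(f_edge, phi)
```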

Now that we are guaranteed to use the same gyrokinetic Vlasov operator $D/Dt$ in the core and edge simulations, we want to be sure that if $f^C$ and $f^E$ are solutions of the gyrokinetic equation, then their composition $\check{f}$ is also a solution of the gyrokinetic equation, such that

$$\frac{D\check{f}}{Dt} = C[\check{f}] + S.$$
(10)

To prove this, we insert the definition $\check{f} = \varpi f^C + (1-\varpi) f^E$ into the gyrokinetic equation, leading to

$$\frac{D(\varpi f^C)}{Dt} + \frac{D\big((1-\varpi) f^E\big)}{Dt} = C[\varpi f^C + (1-\varpi) f^E] + S,$$

which can be rewritten

$$0 = (f^C - f^E)\,\frac{D\varpi}{Dt} + \underbrace{\varpi\left(\frac{D f^C}{Dt} - C[f^C] - S\right)}_{=0} + \underbrace{(1-\varpi)\left(\frac{D f^E}{Dt} - C[f^E] - S\right)}_{=0}.$$
(11)

Since $D f^C/Dt = C[f^C] + S$ and $D f^E/Dt = C[f^E] + S$, $\check{f}$ is a solution of Eq. (10) if and only if the following necessary and sufficient condition is fulfilled:

$$(f^C - f^E)\left(\frac{\partial\varpi}{\partial t} + \dot{\mathbf{X}}\cdot\frac{\partial\varpi}{\partial\mathbf{X}} + \dot{v}_\parallel\,\frac{\partial\varpi}{\partial v_\parallel}\right) = 0,$$

where we used the definition $\frac{D}{Dt} = \frac{\partial}{\partial t} + \dot{\mathbf{X}}\cdot\frac{\partial}{\partial\mathbf{X}} + \dot{v}_\parallel\,\frac{\partial}{\partial v_\parallel}$.

This necessary and sufficient condition has two trivial solutions: $f^C = f^E$, or $\varpi$ a fixed constant. In our case, we wind up using these two conditions, just using either one or the other in different parts of the domain, as we will see now.

For coupling core and edge simulations, the monotonic function $\varpi = \varpi(\mathbf{X})$ is a function of the radial coordinate only. Therefore, $\partial\varpi/\partial t = 0$ and $\partial\varpi/\partial v_\parallel = 0$, which leads to the simplified condition

$$(f^C - f^E)\left(\dot{\mathbf{X}}\cdot\frac{\partial\varpi}{\partial\mathbf{X}}\right) = 0.$$

This condition was mentioned in Ref. 1 without further details. We now study the implication of this condition in the context of the coupling of core and edge distribution functions.

By design, we have $\varpi = 1$ in the core and $\varpi = 0$ in the edge, such that $\partial\varpi/\partial\mathbf{X} = 0$ in these regions. Therefore, the condition $f^C = f^E$ is only necessary in the overlap region, where $\varpi$ varies with respect to the radial direction and $\partial\varpi/\partial\mathbf{X} \neq 0$.

In the overlap, one actually has $f^C - f^E \simeq \epsilon$ and $\partial\varpi/\partial X|_\mathrm{overlap} = 1/L$, such that one needs to minimize the quantity

$$\mathrm{Err} = \epsilon\,\dot{X}/L,$$

by minimizing either $\epsilon$, with more marker particles, or by using a large overlap width $L$, see Sec. II B and Ref. 1.

This design allows the distribution function to be incorrect in the buffer, where $\partial\varpi/\partial\mathbf{X} = 0$, as long as it does not contaminate the overlap region. Indeed, the buffer must provide information that is accurate enough for correctly evolving the distribution function in the overlap region. Mathematically, it means that the derivative of $f$ must be defined on both sides of the interface between the overlap and the buffer. Numerically, this translates into different constraints depending on whether $f$ is represented on a grid or with marker particles, as we will discuss now.

The problem due to an artificial boundary can be easily pictured with the presence of a heat source in the region outside of the edge simulation domain, as proposed in Fig. 7. Without a proper boundary condition at the artificial boundary, the heat will not enter the edge simulation. A similar issue would happen if a strong flux of particles is generated in the core [during an edge localized mode (ELM) crash for example]. Note that particle flux requires kinetic electrons.

In our study, both core and edge simulations have a buffer near the overlap. So, let us focus on the core simulation and its artificial boundary at the end of the buffer. It is trivial to reach the same conclusions for the edge simulation.

To understand how we need to handle the boundary condition, we study the evolution of the distribution function, $f$, at the position of the artificial boundary, see Fig. 8. More particularly, we look at how to handle the term $\dot{\mathbf{X}}\cdot\nabla f$ when pushing the coupled gyrokinetic Vlasov equation. In the following, we first treat the case of a continuum code and then the case of a PIC code.

FIG. 8.

Radial domain of the core simulation and its artificial boundary condition.


1. Continuum code

In a continuum code, pushing $f$ with the gyrokinetic Vlasov equation requires a finite-difference approximation for estimating the derivative of $f$, using, for example, the centered stencil

$$\frac{\partial f_n}{\partial X} := \frac{f_{n+1} - f_{n-1}}{2h},$$

where $n$ is the position of the artificial boundary, $f_n$ is the value of the distribution at this radial position, and $h$ is the width of a radial interval, see Fig. 8.

In our case, the core code does not have the information $f^C_{n+1}$. Therefore, it cannot estimate $\partial f^C_n/\partial X$ correctly, and it will incorrectly update $f^C_n$ during the first push operation. The information $f^C_n$ in the buffer is then corrupted. During the next push operation, the core code cannot estimate $\partial f^C_{n-1}/\partial X$ correctly because $f^C_n$ is corrupted. It will then incorrectly update $f^C_{n-1}$, which becomes corrupted after two push operations. The error originally located at $f^C_{n+1}$ propagates inward by one radial interval per push operation. If the buffer width is $N$ intervals, it takes $N+1$ push operations to corrupt the overlap region. In this case, the solution consists in updating the buffer distribution function with that of the edge simulation every $N$ time steps. See Sec. II for an example of the spatial coupling of continuum codes (1D advection–diffusion equation), and more particularly Sec. II C, where we discuss the synchronization of the buffer of continuum codes.

2. PIC code

In a PIC code, things are more complicated. A PIC code evolves marker particles defined by their position $Z_p$ in the 5D gyrocenter space and their weight $w_p$. The evolution of the distribution function near the artificial boundary is computed through the motion of the marker particles.18 The evolution of the distribution in a cell of the artificial boundary results from the evolution of incoming and outgoing marker particles, as well as from the evolution of the marker particles remaining inside. The problem at the artificial boundary is that the particles outside the simulation domain that should move into the cell are missing; see Fig. 8. Their contribution to the evolution of $f$ is missing. Therefore, $f$ becomes corrupted in this cell.

The propagation of this error is also more delicate in a PIC code. There are multiple ways for this error to propagate radially:

  1. The collision operations radially couple the system because markers have neoclassical orbits spanning multiple radial surfaces with an orbit width δr. Therefore, by local collisions along their neoclassical orbits, they propagate the error radially. The corrupted markers, in turn, also propagate this error radially along their neoclassical orbits.

    We typically compute the collisions every 10 ion time steps in XGC. So synchronizing f every 10 time steps should be enough, as long as the buffer width is large enough compared to the largest neoclassical orbit widths δr.

  2. The turbulent $E\times B$ drift, $\mathbf{v}_E$, which is the same for all species, moves the marker particles radially. Assuming this field is mostly constant between two updates of the buffer, i.e., over a few time steps $N$, the radial width of the buffer needs to be larger than $l = \left(\frac{\nabla\psi}{|\nabla\psi|}\cdot\mathbf{v}_E\right) N\Delta t$, with $\Delta t$ the simulation time step, $\psi$ the poloidal magnetic flux that labels the magnetic surfaces, and $\nabla\psi$ its gradient (a rough sizing sketch is given after this list). The fact that the $E$ field is gyroaveraged with a Bessel factor $J_0 < 1$ does not change this conclusion.

    Even in a configuration of zero (or small) particle flux, the error due to the artificial boundary could propagate radially, because hot and cold particles can have large fluxes in opposite directions.14 

  3. The parallel nonlinearity can change the orbit of a marker and thus its radial excursion. This can enhance the radial propagation of the error.
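
As a rough illustration of the constraints in items 1 and 2 above, the buffer must be wider than both the largest neoclassical orbit width and the radial E×B excursion accumulated between two synchronizations; all numbers below are hypothetical and for illustration only.

```python
# Rough buffer sizing (illustrative numbers, not taken from an actual XGC run).
delta_r = 1.0e-2      # widest neoclassical orbit width [m] (hypothetical)
v_E     = 2.0e2       # typical radial E x B drift speed [m/s] (hypothetical)
dt      = 5.0e-7      # ion time step [s] (hypothetical)
N_sync  = 10          # buffer synchronized every N_sync time steps

l_drift = v_E * N_sync * dt            # radial E x B excursion between syncs
buffer_width = max(delta_r, l_drift)   # the buffer must exceed both scales
print(f"required buffer width > {buffer_width:.3e} m")
```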

In the future, one could imagine tracking a few marker particles, initiated near the boundary or outside the domain, in order to detect when they reach the overlap region (or a region just before it). This could be used to automatically trigger the buffer synchronization before the overlap is corrupted. This is left for a future study.

The coupling of the core and edge distribution functions requires their synchronization in the buffer once every few time steps. Indeed, as we have shown in Secs. III D and III E, the distribution function does not need to be correct in the buffer, as long as the error it contains does not reach the overlap. The artificial boundary corrupts the buffer distribution function, but it takes time for this error to propagate through the buffer and reach the overlap. The solution we described consists in synchronizing the distribution function in the buffer region once every $N$ time steps, before the error penetrates the overlap region. The period of synchronization, $T = N\Delta t$, needs to be small enough that the error coming from the artificial boundary does not reach the overlap. Obviously, a correct period of synchronization depends on the buffer radial width, as it takes more time for the error to cross a large buffer. A cartoon of the buffer synchronization is shown in Fig. 9. Note that the core and edge buffers can have different widths and periods of synchronization.

FIG. 9.

Synchronization of the buffer distribution functions, every N time steps. The core and edge buffers can have different widths and be updated with different periods T=NΔt.


We will now discuss in Sec. IV how one can couple the distribution function using a resampling technique. Such a technique uses an intermediate 5D grid for representing the information that is transferred between the core and edge simulations. Thanks to this intermediate grid, one can couple different models or even different types of codes, like PIC and continuum, together.

We now describe how to use the resampling technique for coupling the core and edge particle distribution functions in the buffer regions. This resampling technique allows information to be transferred between the core and edge simulations. More interestingly, it enables the coupling of codes using different models (delta-f and total-f) or different numerical schemes (PIC and continuum), because an abstract interface is used for communicating the 5D data. Using an intermediate 5D grid is possible because the resampling technique transfers the information from a 5D grid to marker particle weights, and vice versa. The 5D grid data are communicated between the two codes using ADIOS communication.16,17 Note that if we couple two continuum codes, the resampling of marker particles is unnecessary and it suffices to interpolate the information between the grids (if they are different).

The original use of the resampling algorithm is to reduce the particle noise by reducing the variance of the marker weights while preserving the correct physics. Indeed, the distribution function represented by marker particles contains a noise component, $\tilde{f}$, such that the marker particles supposed to represent $f$ actually represent $F = f + \tilde{f}$. Many marker particles are used to maximize the signal-to-noise ratio $r = f/\tilde{f}$. Nonetheless, the noise component $\tilde{f}$ increases in time until the simulation is dominated by noise. The resampling scheme addresses this issue by decreasing the noise component $\tilde{f}$ while preserving the physics component $f$.

Preserving the physics contained in $f$ while minimizing the weight variance is ensured by solving a quadratic programming (QP) optimization problem under constraints (moment-preserving constrained resampling8,19). The quadratic optimization decreases the variance, and the constraints preserve the physics. Note that in Ref. 8 the resampling was implemented for a full-f scheme where the marker weights are fixed. In our work, we use an implementation for the total-f scheme of XGC, where the marker weights vary (as in a delta-f scheme). This was introduced in Ref. 19 and will be presented in detail in a forthcoming publication. We will now describe the resampling algorithm.

The resampling algorithm groups the marker particles into velocity bins $(v_{\parallel i}, \mu_j)$ inside each Voronoi cell. A Voronoi cell is defined around each vertex of the XGC 3D position grid. Once the marker particles are sorted in this 5D grid, the algorithm can perform three actions:

  • If there are not enough markers per bin (excessive noise), new markers are created prior to resampling their weights. This is called “up-sampling.” The QP optimization problem also needs a high enough number of marker particles to be successful.

  • If there are too many markers per bin, many of them are destroyed and the weights of the remaining markers are resampled. This is called “down-sampling.”

  • Finally, if the level of noise, measured with the variance of the weights inside the bin, is higher than a given parameter, the weights are resampled. This maintains a “good” signal-to-noise ratio.

When creating or destroying marker particles, one has to be careful and ensure conservation of the phase space volume by renormalizing the weights, because each marker represents a subvolume of this phase space and blindly changing their number would be an issue for the total-f scheme (as well as for the classic delta-f scheme).

When resampling marker particles, the algorithm minimizes the variance (noise reduction) of the marker particle weights inside a velocity bin. For this purpose, the algorithm solves a constrained quadratic programming optimization problem (Goldfarb–Idnani, see Ref. 8), with the optimization target being the minimization of the marker weight variance. Currently, the constraint consists in preserving five moments of the distribution function, which are the mass $M[1]$, the parallel momentum $M[v_\parallel]$, the magnetic moment $M[\mu]$, the perpendicular energy $M[\mu B]$, and the parallel kinetic energy $M[v_\parallel^2]$, with

$$M[g](i,j) = \sum_{p\,\in\,\mathrm{bin}} g\,\omega_p\, P_{(i,j)}(v_{\parallel,p}, \mu_p).$$
(12)

$P_{(i,j)}$ is the projection on the velocity bins $(v_{\parallel i}, \mu_j)$. These bins are used to discretize the velocity space. A minimum number of marker particles per bin is necessary to guarantee the success of the optimization problem. For this purpose, the algorithm up-samples the bins if necessary. More details will be given in a future publication dedicated to the resampling technique for the total-f scheme.
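
To make the per-bin bookkeeping concrete, here is a minimal sketch that evaluates the five moments of Eq. (12) for the markers of one bin and then reassigns the weights through an equality-constrained least-squares step. It is a simplified stand-in for the Goldfarb–Idnani QP of the actual MPCR algorithm, and the marker arrays and the per-marker field strength B are hypothetical inputs.

```python
import numpy as np

def bin_moments(w, vpar, mu, B):
    """Moments of Eq. (12) for one velocity bin: mass, parallel momentum,
    magnetic moment, perpendicular energy, and parallel kinetic energy."""
    G = np.vstack([np.ones_like(vpar), vpar, mu, mu * B, vpar ** 2])
    return G @ w, G

def reweight_bin(w_old, vpar, mu, B):
    """Reassign weights so the five moments are preserved exactly while keeping
    the weights as close as possible to their mean (low variance). This is a
    simplified stand-in for the constrained QP used by the MPCR scheme."""
    b, G = bin_moments(w_old, vpar, mu, B)
    w0 = np.full_like(w_old, w_old.mean())               # low-variance starting point
    dw = np.linalg.lstsq(G, b - G @ w0, rcond=None)[0]   # minimum-norm correction
    return w0 + dw

# Usage sketch with hypothetical marker data for a single bin:
rng = np.random.default_rng(0)
vpar, mu = rng.normal(size=50), rng.exponential(size=50)
B = 2.0 + 0.05 * rng.normal(size=50)   # local field strength per marker (hypothetical)
w = rng.normal(1.0, 0.3, size=50)
w_new = reweight_bin(w, vpar, mu, B)
print(np.allclose(bin_moments(w, vpar, mu, B)[0], bin_moments(w_new, vpar, mu, B)[0]))
```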

When coupling the core and edge distribution functions, the algorithm may up-sample, down-sample, or resample the bins. By default, the algorithm resamples all bins without testing their variance, because we want to import the entire distribution function information. Indeed, the goal of resampling for coupling is to transfer the particle distribution function information. We can deactivate the down-sampling option, but the up-sampling option is necessary for increasing the success rate of the resampling operation.

As we will discuss now, applying the resampling technique for coupling the core and edge distribution functions brings extra difficulties compared to its original use for reducing the particle noise.

First of all, the algorithm requires communications between the core and edge simulations. For instance, the resampling of the core distribution function fC in the core buffer region can be described in three steps: (1) we compute the five moments M by projecting the edge marker particle weights on a 5D grid; (2) we transfer these 5D moments M from the edge to the core; (3) we use these moments to resample the core marker particles in the core buffer. This communication algorithm is illustrated in the cartoon in Fig. 10. A simple verification test is shown in subplots (a) and (b). This test illustrates the transfer of 5D information and the synchronization of the buffers. In this test, we purposely use different initial conditions in the two simulations shown in (a) and (b), in order to more clearly visualize the buffers after their synchronization. The two buffer rings clearly stand out in (a) and (b).

FIG. 10.

Illustration of the exchange of information in the radial buffers with the resampling technique between two XGC simulations. Simulations in (a) and (b) are initialized with different initial conditions. Then we synchronize the buffer without pushing the particles in time. This is a basic test.


There is a second extra difficulty when resampling the marker particles of a PIC code using the 5D information from another PIC code, because the marker particles have different positions and thus a different noise in the two simulations. Indeed, the distribution function represented by particles contains a noise component, $\tilde{f}$, such that the particles supposed to represent $f$ actually represent $F = f + \tilde{f}$. If we consider two different sets of particles, labeled 1 and 2, they will have different noise components, $\tilde{f}_1$ and $\tilde{f}_2$, respectively. Then, when resampling the buffer of marker particle set 1 with the information of marker particle set 2, we resample set 1 with the information $F_2 = f + \tilde{f}_2$. In the end, if successful, set 1 will represent $F_1 = (f + \tilde{f}_2) + \tilde{f}_1$, where $\tilde{f}_1$ is now the noise of marker particle set 1 on the information $F_2 = f + \tilde{f}_2$; this noise is different from the marker particle noise over the information $f$ alone. Two problems arise: (1) this extra noise makes the optimization problem more difficult, because of nonphysical grid-scale information, and (2) the noise of the two sets of marker particles accumulates, to some extent, in the buffer. To avoid these two problems, we use a large number of particles per cell for resampling in the buffer and a large number of particles per cell to reduce the noise on the 5D grid. Studying the use of noise filtering techniques to mitigate the error due to particle noise is left for future work. Filtering can be based on a field-aligned Fourier filter20 or on a smoothing function.

With our abstract interface, the coupling of different models, such as delta-f and total-f, is possible. In a delta-f model like GEM (or the delta-f version of XGC), the distribution is represented with $f = f_a + f_p$, where $f_a$ is an analytical function and $f_p$ is the particle-in-cell representation. In a total-f code like XGC, the distribution function is represented with $f = f_a + f_g + f_p$, where $f_g$ is a grid component that contains the slow relaxation of the distribution function. Coupling these different representations is not a problem, because the moments $M$ can be computed with $f$. Therefore, the component $f_p$, which is the one we want to represent with particles when resampling, is recovered with $f_p = f - f_a$ or $f_p = f - f_a - f_g$, depending on the model we use. The coupling of codes using a different number of weights or different weight evolution equations is, in principle, also possible with this coupling scheme.
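
A small sketch of this model abstraction, recovering the particle component from the total f communicated on the interface grid; the function name and arguments are illustrative, not part of any code's API.

```python
def particle_component(f_total, f_analytic, f_grid=None):
    """Recover the part of f carried by marker particles on the interface grid:
    f_p = f - f_a for a delta-f model, and f_p = f - f_a - f_g for a total-f
    model that also carries a slowly evolving grid component f_g."""
    f_p = f_total - f_analytic
    if f_grid is not None:
        f_p = f_p - f_grid
    return f_p
```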

This scheme also enables the coupling of PIC (XGC) and continuum (GENE) codes because it exchanges distribution function information (the five moments M) on a 5D grid. A continuum code like GENE can take the 5D grid data from XGC and update its gyrokinetic grid data. Then it can send its gyrokinetic information on the interface 5D grid to XGC, which in turn resamples its marker particle weights. The bins used for resampling can be represented on any velocity grid with any coordinate system if it facilitates the coupling. The only constraint is that there needs to be enough marker particles in the bins for reducing noise or enabling the success of the optimization while resampling. Note that different velocity grids can be used for the core buffer and for the edge buffer. Such a study is left for future work.

There are three aspects of the coupling algorithm to consider when estimating the cost of coupling core and edge simulations:

  • Cost of the Poisson coupling and communications.

  • Cost of the f-coupling and resampling operation.

  • Ratio between the simulation volume of a core–edge coupled simulation and a reference simulation.

The cost of the Poisson coupling is mostly negligible. The main constraint is that the coupling scheme needs the communication of the density and potential between the core and edge simulations. Such information is already communicated within a single code using MPI communications between different tasks. The difference in a coupled simulation is that we use ADIOS communications to transfer this information with files or in-memory communications between two different executables.

The cost of the f-coupling with resampling is also almost negligible, as long as it is well configured. Indeed, resampling all the velocity bins of all Voronoi cells in the buffer takes a significant fraction of one time step, roughly 20%–40%, but we resample only once every N = 10 time steps, which decreases the total relative cost of the resampling operation to 2%–4%. The computational time taken for resampling a large buffer is the same as that for resampling a narrow buffer, because the XGC code is efficiently parallelized in the radial direction.

In core–edge coupled simulations, part of the plasma volume is simulated twice, because the scheme uses overlap and buffer regions at the interface between the two codes. At first sight, the coupling scheme makes a coupled simulation more expensive, but there are at least two ways of speeding up a simulation thanks to this new scheme. One way is to use different models in the core and edge, with a cheaper model in the core. Another way is to take advantage of the fact that the core and edge turbulence transport timescales are different. We will discuss how to take advantage of these aspects in Sec. VI.

We now test the new spatial coupling scheme described in Sec. III by carrying out coupled core–edge gyrokinetic simulations of a simple circular geometry. We couple a core simulation with an edge simulation, both carried out with XGC. The core domain covers the radial region going from $\psi_n = 0$ to $\psi_n = 0.65$. The edge domain covers the radial region going from $\psi_n = 0.3$ to $\psi_n = 0.9$. The synchronization of the distribution function in the buffer uses the resampling technique described in Sec. IV. The communication of the 5D grids is performed using the ADIOS16,17 framework, which enables both file-based and in-memory communication. Both the core and edge simulations are run with XGC, and we use the same delta-f representation of the perturbation, i.e., $f = f_a + f_p$. Nonetheless, we use a different density of marker particles in these two simulations, and two different radial domains, as already mentioned when commenting on Fig. 7.

A successful coupled simulation is shown in Fig. 11, where we compare a reference simulation (a) with a coupled simulation (b), and we plot their difference (c). We show the $E\times B$ flux of (gyrocenter) particles, $\Gamma$, which is a combination of the field $\phi$ and the distribution function $f$, and is defined by

$$\Gamma(\mathbf{X}) = \int \mathrm{d}v_\parallel\, \mathrm{d}\mu\; \frac{\nabla_\alpha\phi \times \mathbf{B}}{B^2}\, f(\mathbf{X}, v_\parallel, \mu),$$

where $\nabla_\alpha\phi = \nabla_\alpha\phi(\mathbf{X})$.

FIG. 11.

Gyrocenter flux in a reference (a) and a coupled (b) simulation. Their difference is shown in (c). $\psi_n$ is the radial coordinate labeling the magnetic flux surfaces. For the coupled simulation, we synchronize $f$ in the buffers with the resampling technique between two XGC simulations, as described in Sec. IV.


An excellent agreement is found between the reference and coupled simulations, with an error of a few percent. Let us emphasize that during this simulation the buffers have been resampled 80 times (every 10 time steps). This is a difficult problem, because we use significantly narrower buffers compared to our previous study.1 This choice makes the coupling of the distribution function in the buffer more critical, and it demonstrates the efficiency of the coupling technique. The buffer regions are shown in Fig. 12.

FIG. 12.

Illustration of the exchange of information in radial buffers with the resampling technique between two XGC simulations. (a) core simulation, (b) edge simulation, and (c) their coupled diagnostics. (d) In the bottom left is a zoom on the buffer of the core simulation. Graph (e) is a zoom on the surface 118 inside the core buffer. The edge simulation flux is in blue and the core simulation flux is in red markers.


We will now look, in more detail, in Fig. 12, at the role of the dynamic boundary condition, i.e., the synchronization of the buffer distribution function. We will show that it prevents corrupted data from penetrating the overlap region, where the solution has to be correct.

In Fig. 12, the core–edge coupled simulation (c) is decomposed into its core simulation (a) and its edge simulation (b). Each simulation is split into four radial regions:

  1. the core (respectively, the edge),

  2. the overlap,

  3. the buffer, and

  4. the outside (which is mostly empty).

In this simulation, the outside is not exactly empty, because we let the marker particles follow their neoclassical orbits beyond the simulation boundary. So even if they are created inside the artificial boundary, they may follow an orbit crossing the artificial boundary, or they may simply drift into this region. Research on how to handle this boundary in a different manner is left for a future study. This region is nonphysical and, as we can see, a large nonphysical flux grows there.

The role of the buffer is to actively apply correct boundary conditions to the overlap, so that the nonphysical dynamics in the empty region does not penetrate the overlap. We discussed the coupling of these equations in detail in Sec. III.

In the bottom-left subplot (d), we zoom in on the buffer of the core simulation. As we can see, the buffer data are reset every 10 time steps; indeed, in this simulation we resample the buffer every 10 time steps. The blue structure in the outside region does not have time to contaminate the overlap region.

To visualize the effect of the resampling, we zoom, in subplot (e), on the flux of surface 118, which is inside the core buffer. In blue, we plot the flux from the reference edge simulation, and with red markers, we plot the flux in the core buffer. Overall, the two curves agree quite well, thus showing that the synchronization of the distribution function keeps the buffer close to the correct solution. It is also clear, in the zoom of subplot (e), that the core's buffer diverges from the edge simulation, but that the resampling brings it back to the correct solution every 10 time steps.

Finally, in Fig. 13, we show the fluxes of the coupled simulations in the middle of the overlap region. If we do not synchronize the buffers, as in (b), the core and edge simulations significantly deviate from each other, whereas with the synchronization of the buffers, as in (a), the two simulations are in very good agreement in the overlap region. Moreover, the fact that the gyrocenter $E\times B$ fluxes of the core and edge simulations deviate significantly from each other in (b), despite both simulations using the exact same potential and thus the same $E\times B$ drift, means that $f^C$ and $f^E$ are significantly different in the core and edge simulations when the narrow buffers are not synchronized.

FIG. 13.

Comparison of the core and edge fluxes of coupled simulation in the middle of the overlap region (surface 100): (a) with the coupling of the distribution function in buffers and (b) without coupling the distribution function in buffers.


We are now interested in discussing how one can use this new core–edge coupling scheme to speed up a simulation. This is an important discussion, as we have shown in Sec. IV D that the XGC–XGC coupled simulation is more expensive if nothing is done. We propose two simple solutions, illustrated in the cartoon in Fig. 14.

FIG. 14.

Cartoon of the scenarios that can speed up a simulation thanks to the core–edge coupling scheme.


The first simple optimization (b) consists in using different models for the core and edge simulations, with a much cheaper gyrokinetic model in the core. This could be achieved for simulations that include impurities, because impurity physics is computationally more expensive in the edge than in the core. For instance, tungsten impurities are modeled with several bundles in XGC because the ionization state of tungsten ions strongly varies in the edge pedestal. These many bundles, which are necessary in the edge, increase the computational cost of a simulation because of inter-species collisions, whose complexity is proportional to the square of the number of kinetic species. When using seven tungsten bundles with deuterium and electrons, the simulation is almost 10 times more expensive than when simulating one tungsten bundle with deuterium and electrons ($9^2/3^2 = 9$). Therefore, taking advantage of the fact that only one tungsten bundle is necessary in the core, one can speed up a whole device simulation by using different numbers of species in the core and edge simulations. The cartoon in Fig. 14(b) assumes a 2× factor, as we split the simulation volume in two.

A second optimization consists in taking advantage of the different timescales of the turbulence saturation in the core and edge regions. This optimization is illustrated in the cartoon shown in Fig. 14(c). If the most expensive part of the simulation is the edge, which saturates faster than the core, then we can saturate the core simulation first with the optimized core model and switch on the expensive edge simulation later, by coupling the edge simulation to an already saturated core simulation. For instance, we could run a core simulation for 9 ms and then start a coupled simulation for 1 ms. This would make the edge simulation 10× cheaper. It is nonetheless not certain that a core–edge simulation will saturate in only 1 ms, as the core and edge physics may interact on a longer timescale. A dedicated study is necessary. More complex algorithms, with multiple iterations between core and core–edge simulations, may be necessary.

By applying these two optimizations enabled by the coupling scheme, we could reach a 10× speed-up when running a coupled core–edge simulation instead of a classical single simulation. These scenarios will be investigated in future studies.

We have presented a new scheme for the spatial kinetic coupling of two gyrokinetic simulations. The details of the coupled equations were given. Additionally, we presented a necessary and sufficient condition for ensuring accurate coupling. The scheme presented here extends our previous work1 to include the sharing of kinetic information between codes. The distribution function is shared in the buffer regions21 by using a recently developed resampling technique.8,19

The new coupling scheme is more generally applicable to kinetic problems because we have now addressed the coupling of both the fields and the particle distribution function. The transfer of the distribution function information between the core and edge simulations is done on an intermediate 5D grid using resampling. This way, the scheme and its implementation can be used to couple different models together, such as delta-f and total-f, and also to couple codes based on different representations of the distribution function, such as PIC (XGC) and continuum (GENE) codes.

The flexibility of the buffer width presents an opportunity for code optimization because there is a trade-off between the frequency of synchronization and the size of the buffer region where the distribution function is resampled. The resampling of the distribution function can be computationally expensive, so it is advantageous to decrease the frequency of resampling. More work is needed to pick optimal parameters for coupled core and edge gyrokinetic simulations.
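As a rough way to reason about this trade-off, the toy model below (all parameters hypothetical, not the XGC timing model) assumes that coupling errors propagate roughly one buffer cell per time step, so the synchronization interval can grow with the buffer width, while the cost of one synchronization is taken to be roughly independent of that width.

```python
# Toy trade-off model: buffer width vs. synchronization frequency.
# Assumptions (illustrative only): errors propagate ~one cell per step,
# so the buffer must be resynchronized before they cross its width, and
# the cost of a single synchronization is independent of the width.

def resampling_overhead(buffer_width_cells: int, n_steps: int,
                        cost_per_sync: float = 1.0) -> float:
    """Total resampling overhead over n_steps for a given buffer width."""
    sync_interval = max(1, buffer_width_cells)
    n_syncs = n_steps // sync_interval
    return n_syncs * cost_per_sync

for width in (2, 5, 10, 20):
    print(width, resampling_overhead(width, n_steps=1000))
# Wider buffers allow less frequent synchronization, reducing the overhead.
```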

Because of XGC's parallelization, the wall-clock time taken for synchronization does not vary much with the buffer width in our preliminary tests. Therefore, using a larger buffer to decrease the frequency of synchronization reduces the overall computational cost. For the simulation we performed, we synchronized the buffer every 10 time steps, and the overall cost of resampling was only a few percent of the total computing time. This optimization problem will be studied carefully in the framework of the CODAR co-design project.22 Future work also includes the coupling of electromagnetic simulations by applying the same approach to Ampère's equation as is used for the gyrokinetic Poisson equation.

Finally, we discussed how the coupling scheme can enable the speed-up of a whole volume simulation. A first optimization consists of using different models for the core and edge simulations, with a much cheaper gyrokinetic model in the core. In such a scenario, there is an important load-balancing problem between the core and edge simulations. A second optimization consists of taking advantage of the different saturation times of turbulence in the core and edge regions. If the most expensive part of the simulation is the edge, whose physics saturates faster than the core's, then we can saturate the core simulation first with the optimized core model and switch on the expensive edge simulation later, by coupling the edge to an already saturated core simulation.

A next step in the project concerns the coupling of GENE and XGC, as well as GEM and XGC, with this new scheme for coupling the distribution function. GENE and XGC have been carefully cross-verified,23 and recent progress on their coupling based on the Poisson equation coupling is reported in Ref. 24. Similarly, progress on the Poisson coupling of GEM and XGC is reported in Ref. 25.

This work was supported by the U.S. DOE under the Exascale Computing Project (No. 17-SC-20-SC).

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231.

This manuscript has been authored by Princeton University under U.S. Department of Energy Contract No. DE-AC02-09CH11466. The publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

Data used in preparing this article will be made available in the PPPL Theory Department ARK, see Ref. 26.

1. J. Dominski, S. Ku, C.-S. Chang, J. Choi, E. Suchyta, S. Parker, S. Klasky, and A. Bhattacharjee, "A tight-coupling scheme sharing minimum information across a spatial interface between gyrokinetic turbulence codes," Phys. Plasmas 25, 072308 (2018).
2. H. Childs, E. Brugger, B. Whitlock, J. Meredith, S. Ahern, D. Pugmire, K. Biagas, M. Miller, C. Harrison, G. H. Weber, H. Krishnan, T. Fogal, A. Sanderson, C. Garth, E. W. Bethel, D. Camp, O. Rübel, M. Durant, J. M. Favre, and P. Navrátil, "VisIt: An end-user tool for visualizing and analyzing very large data," in High Performance Visualization–Enabling Extreme-Scale Scientific Insight (2012), pp. 357–372.
3. M. Kim, J. Kress, J. Choi, N. Podhorszki, S. Klasky, M. Wolf, K. Mehta, K. Huck, B. Geveci, S. Phillip, R. Maynard, H. Guo, T. Peterka, K. Moreland, C.-S. Chang, J. Dominski, M. Churchill, and D. Pugmire, "In situ analysis and visualization of fusion simulations: Lessons learned," in High Performance Computing, edited by R. Yokota, M. Weiland, J. Shalf, and S. Alam (Springer International Publishing, Cham, 2018), pp. 230–242.
4. F. Jenko, W. Dorland, M. Kotschenreuther, and B. N. Rogers, "Electron temperature gradient driven turbulence," Phys. Plasmas 7, 1904–1910 (2000).
5. T. Görler, X. Lapillonne, S. Brunner, T. Dannert, F. Jenko, F. Merz, and D. Told, "The global version of the gyrokinetic turbulence code GENE," J. Comput. Phys. 230, 7053–7071 (2011).
6. Y. Chen and S. E. Parker, "Electromagnetic gyrokinetic particle-in-cell turbulence simulation with realistic equilibrium profiles and geometry," J. Comput. Phys. 220, 839–855 (2007).
7. S. Ku, C. S. Chang, R. Hager, R. M. Churchill, G. R. Tynan, I. Cziegler, M. Greenwald, J. Hughes, S. E. Parker, M. F. Adams, E. D'Azevedo, and P. Worley, "A fast low-to-high confinement mode bifurcation dynamics in the boundary-plasma gyrokinetic code XGC1," Phys. Plasmas 25, 056107 (2018).
8. D. Faghihi, V. Carey, C. Michoski, R. Hager, S. Janhunen, C. Chang, and R. Moser, "Moment preserving constrained resampling with applications to particle-in-cell methods," J. Comput. Phys. 409, 109317 (2020).
9. Take the example of an edge simulation that does not include the core region where a heat source is localized. This edge simulation will receive the information about the heat source from the core simulation in its buffer region.
10. Equivalent to the condition we discuss in Sec. III D for gyrokinetic simulations.
11. Note that if we use higher order stencils, the error would propagate more rapidly, in which case one needs to synchronize the buffer more frequently.
12. A. J. Brizard and T. S. Hahm, "Foundations of nonlinear gyrokinetic theory," Rev. Mod. Phys. 79, 421–468 (2007).
13. A. Mishchenko, A. Koenies, and R. Hatzky, "Particle simulations with a generalized gyrokinetic solver," Phys. Plasmas 12, 062305 (2005).
14. J. Dominski, B. F. McMillan, S. Brunner, G. Merlo, T.-M. Tran, and L. Villard, "An arbitrary wavelength solver for global gyrokinetic simulations. Application to the study of fine radial structures on microturbulence due to non-adiabatic passing electron dynamics," Phys. Plasmas 24, 022308 (2017).
15. A. Mishchenko, R. Hatzky, E. Sonnendrücker, R. Kleiber, and A. Könies, "An iterative approach to an arbitrarily short-wavelength solver in global gyrokinetic simulations," J. Plasma Phys. 85, 905850116 (2019).
16. Q. Liu, J. Logan, Y. Tian, H. Abbasi, N. Podhorszki, J. Y. Choi, S. Klasky, R. Tchoua, J. Lofstead, R. Oldfield, M. Parashar, N. Samatova, K. Schwan, A. Shoshani, M. Wolf, K. Wu, and W. Yu, "Hello ADIOS: The challenges and lessons of developing leadership class I/O frameworks," Concurrency Comput.: Pract. Exp. 26, 1453–1473 (2014).
17. W. F. Godoy, N. Podhorszki, R. Wang, C. Atkins, G. Eisenhauer, J. Gu, P. Davis, J. Choi, K. Germaschewski, K. Huck, A. Huebl, M. Kim, J. Kress, T. Kurc, Q. Liu, J. Logan, K. Mehta, G. Ostrouchov, M. Parashar, F. Poeschel, D. Pugmire, E. Suchyta, K. Takahashi, N. Thompson, S. Tsutsumi, L. Wan, M. Wolf, K. Wu, and S. Klasky, "ADIOS 2: The adaptable input output system. A framework for high-performance data management," SoftwareX 12, 100561 (2020).
18. In the total-f version of XGC, the marker weight is evolved by using the "direct delta-f" method, such that the weight evolution is deduced from the conservation of phase-space volume and dδf/dt = −df0/dt (collisionless gyrokinetic equation). Collision effects are computed separately on the weights with a collision operator.
19. R. Hager, V. Carey, J. Dominski, S. Ku, and C. S. Chang, "Realization of moment-preserving resampling in a particle-in-cell code," in International Conference for the Numerical Simulation of Plasma (2019).
20. B. McMillan, S. Jolliet, A. Bottino, P. Angelino, T. Tran, and L. Villard, "Rapid Fourier space solution of linear partial integro-differential equations in toroidal magnetic confinement geometries," Comput. Phys. Commun. 181, 715–719 (2010).
21. Communicating the distribution function between the core and edge in narrow buffer regions is necessary when one simulation does not cover the full domain. This can be easily understood in a case where a source of heat is in the core region, i.e., outside of the edge simulation domain. The knowledge of the source term is then communicated by the core simulation to the edge simulation in the interface buffer.
22. I. Foster, M. Ainsworth, B. Allen, J. Bessac, F. Cappello, J. Y. Choi, E. Constantinescu, P. E. Davis, S. Di, W. Di et al., "Computing just what you need: Online data analysis and reduction at extreme scales," in European Conference on Parallel Processing (Springer, Cham, 2017), pp. 3–19.
23. G. Merlo, J. Dominski, A. Bhattacharjee, C. S. Chang, F. Jenko, S. Ku, E. Lanti, and S. Parker, "Cross-verification of the global gyrokinetic codes GENE and XGC," Phys. Plasmas 25, 062308 (2018).
24. G. Merlo, S. Janhunen, F. Jenko, A. Bhattacharjee, C. Chang, J. Cheng, P. Davis, J. Dominski, K. Germaschewski, R. Hager, S. Klasky, S. Parker, and E. Suchyta, "First coupled GENE-XGC microturbulence simulations," Phys. Plasmas 28, 012303 (2020).
25. J. Cheng, J. Dominski, Y. Chen, H. Chen, G. Merlo, S. Ku, R. Hager, C. Chang, E. Suchyta, E. D'Azevedo, S. Sreepathi, S. Klasky, F. Jenko, A. Bhattacharjee, and S. Parker, "Spatial core-edge coupling of the particle-in-cell gyrokinetic codes GEM and XGC," Phys. Plasmas 27, 122510 (2020).
26. J. Dominski, J. Cheng, G. Merlo, V. Carey, R. Hager, L. Ricketson, J. Choi, K. Germaschewski, S. Ku, A. Mollen, N. Podhorszki, D. Pugmire, E. Suchyta, P. Trivedi, R. Wang, C. S. Chang, J. Hittinger, F. Jenko, S. Klasky, S. E. Parker, and A. Bhattacharjee (2020), "Data from figures in 'Spatial coupling of gyrokinetic simulations, a generalized scheme based on first-principles,'" Dataset.