The traditional explicit electrostatic momentum-conserving particle-in-cell algorithm requires strict resolution of the electron Debye length to deliver numerical stability and accuracy. The explicit electrostatic energy-conserving particle-in-cell algorithm alleviates this constraint with minimal modification to the traditional algorithm, retaining its simplicity, ease of parallelization, and acceleration on modern supercomputing architectures. In this article, we apply the algorithm to model a one-dimensional radio-frequency capacitively coupled plasma discharge relevant to industrial applications. The energy-conserving approach closely matches the results from the momentum-conserving algorithm and retains accuracy even for cell sizes up to 8 times the electron Debye length. For even larger cells, the algorithm loses accuracy due to poor resolution of steep gradients within the radio-frequency sheath. Accuracy can be recovered by adopting a nonuniform grid, which resolves the sheath and allows for cell sizes up to 32 times the electron Debye length in the quasineutral bulk of the discharge. The effect is an up to 8 times reduction in the number of required simulation cells, an improvement that can compound in higher-dimensional simulations. We therefore consider the explicit energy-conserving algorithm a promising approach to significantly reduce the computational cost of full-scale device simulations and a pathway to delivering kinetic simulation capabilities of use to industry.
I. INTRODUCTION
Delivering advanced semiconductor manufacturing capabilities for the coming decades will require unprecedented control of plasma chemistry and kinetic behavior at the wafer surface.^{1} Developing equipment that can enable this control will necessitate increasing reliance on predictive modeling tools that accurately capture this physics.^{2} Such tools, perhaps in combination with artificial intelligence approaches, will also enable the generation of digital twins for plasma equipment and reduce the time required to optimize integrated circuit fabrication recipes.^{3,4}
Due to its intuitive nature, a long history of use within the community, and adaptability to high-performance computing, the Particle-in-Cell (PIC) method^{5–7} has proven to be the most commonly used technique for kinetic simulations of low-temperature plasmas, including the low-pressure radio-frequency capacitively coupled plasma discharges (RFCCPs) used in semiconductor etching, doping, and cleaning.^{1,2,4,8–22} In the electrostatic limit, the most common PIC algorithm solves the Poisson equation for the electric potential, and hence the electric field, using the finite-difference method, and updates the particles via the phase-space-volume-preserving Boris algorithm.^{23,24} When set on a uniform grid, this algorithm has the desirable property of conserving momentum up to machine precision,^{6} and we will therefore refer to this canonical algorithm as MCPIC. The MCPIC algorithm must satisfy the following criteria to produce a stable and accurate simulation:^{6,20,21,25}

(1a) the cell size must resolve the electron Debye length, $\Delta x \lesssim \lambda_{De}$;
(1b) the time step must resolve electron plasma oscillations, $\omega_{pe}\Delta t \lesssim 0.2$;
(1c) no particle may cross more than one cell per time step, $v_{\mathrm{max}}\Delta t \leq \Delta x$;
(1d) a sufficient number of macroparticles must be used to control statistical noise.
Criterion (1d) is somewhat vague, primarily because the required number of particles is problem dependent and therefore difficult to determine a priori. The criterion is usually expressed as a need to include a sufficient number of macroparticles per Debye sphere or per cell. One approach for determining accuracy is to ensure that the increased numerical thermalization rate, caused by adopting macroparticles of artificially large charge, is small compared to competing physical processes.^{21,28} More commonly, however, PIC users will rely on intuition, guidance from the literature, or, perhaps most decisively, convergence tests to determine whether the criterion is satisfied.
This paper focuses primarily on overcoming the restriction on cell size [criterion (1a)], although we will use the other criteria to draw conclusions regarding the accuracy and stability of our results. In MCPIC simulations where the electron Debye length is under-resolved, the resulting instability, known as the finite-grid instability, manifests as an artificial growth rate in the plasma wave dispersion relation, leading to heating of the plasma.^{25} This heating continues until condition (1a) is satisfied, making it a particularly pernicious instability, since the numerical scientist or engineer may mistake it for real physics within the simulation.
Perhaps unsurprisingly, these constraints make the computational cost of resolving the full six-dimensional phase space of a realistic plasma discharge enormous. Typical RFCCP discharges may have dimensions on the order of thousands of $\lambda_{De}$ and equilibrium time scales spanning millions of plasma oscillations ($1/\omega_{pe}$). This has severely limited the use of computer-aided design software for low-pressure industrial plasma devices, which operate in the kinetic regime, necessitating an over-reliance on costly and time-consuming experiments as an alternative. Advances in computing power have gone a long way toward alleviating these issues; however, realistic fully resolved 2D simulations remain challenging, and large 3D simulations are still impossible, even on the latest supercomputers, making them prohibitive for industrial computer-aided design. Therefore, high-performance computing cannot be relied upon as the only means to address this challenge; research and development of algorithms which can alleviate the stability constraints of criteria (1a)–(1d) are also essential.
There have been several attempts to overcome the numerical constraints of the MCPIC algorithm. These include semi-implicit methods, such as the direct-implicit or implicit-moment methods, which apply a single linearized correction and can be generalized to additional corrections.^{30–44} These techniques can prove accurate when tuned correctly; however, the optimal parameters to avoid numerical heating or cooling can be time and space dependent, making them challenging to maintain.^{44,45} More recently, work on fully implicit schemes has been driven by Chacon and coworkers,^{46–51} Lapenta and coworkers,^{52–55} and others,^{56} with Eremin et al.^{17,57–60} adapting them for low-temperature applications. These approaches remain stable, as well as charge and energy conserving, for arbitrary cell and time step sizes, although some care must be taken with the particle update to accurately capture field gradients. While powerful, these techniques often require a fully nonlinear solve with a well-tuned preconditioner, increasing code complexity and potentially affecting parallel performance.^{61,62} Since industry-relevant RFCCPs exhibit physics at, or near, electron time scales, the performance improvements associated with a larger time step may be limited by the requirement to resolve these dynamics. In particular, many equipment manufacturers are considering multi-frequency,^{63–68} high-frequency,^{14,69} or tailored waveforms,^{70–74} which can induce complex electron-time-scale phenomena in the plasma sheath. Even in the simple case of a relatively low, single-frequency discharge (as considered in this article), electron acceleration in the sheath can produce high-frequency oscillations which couple to the bulk discharge properties via stochastic heating.^{19}
An alternative algorithm, attributable to a 1970 paper by Lewis,^{75} applies a variational approach to derive an explicit PIC method that is energy conserving in the zero-time-step limit. Recently, these derivations have been put on a firmer theoretical footing through research into geometric PIC approaches, specifically through discrete exterior calculus and Whitney interpolating forms.^{76–78} In practice, the method tightly bounds energy growth when the time step satisfies condition (1b) and can, for practical purposes, be considered energy conserving. Throughout this paper, we will refer to this explicit electrostatic energy-conserving technique as ECPIC. The advantage of energy conservation is that it prevents the finite-grid instability from occurring and therefore alleviates the constraint of criterion (1a), allowing for $\Delta x \gg \lambda_{De}$. Additionally, ECPIC is explicit, requiring only a slight modification to the MCPIC method, and is therefore easy to implement and highly amenable to parallelization and acceleration on modern heterogeneous supercomputing architectures. The ECPIC algorithm was initially viewed in a less than positive light due to the undesirable emergence of a nonphysical cold-beam instability;^{79} however, recent work by Barnes and Chacon has shown that the technique is accurate in thermal plasmas dominated by ambipolar forces.^{80} More specifically, the Debye length can be safely under-resolved when the velocity of a particle beam is less than the species thermal velocity.^{80} This makes the ECPIC method a strong candidate for modeling many low-temperature plasma devices.
It is important to clarify that the ECPIC approach alleviates a numerical constraint of the MCPIC algorithm and will therefore not necessarily be accurate for $\Delta x > \lambda_{De}$ if resolution of Debye-scale physics is required for physical accuracy. ECPIC is thus best suited to cases where $\lambda_{De}$-scale physics is unimportant throughout much of the plasma. An example of such a case is an RFCCP discharge, where the bulk of the plasma, while kinetic, is quasineutral and relatively quiescent, making this system a suitable test for the method. Additional caveats to consider are that the ECPIC method does not conserve momentum to machine precision and that, in its simplest form, the algorithm implemented here is first-order rather than second-order accurate in space.^{81} Therefore, as with any numerical method, it is important to check the accuracy of ECPIC for the system of interest.
In this paper, we demonstrate that the approach can deliver accurate low-temperature plasma RFCCP simulations while adopting cell sizes much larger than the Debye length. Section II details the algorithm, software package, and simulation configuration used to explore the accuracy of the method. Section III presents results comparing the MCPIC and ECPIC approaches in the fully resolved and under-resolved cases, demonstrates that a nonuniform grid can preserve ECPIC accuracy at even lower resolution, and discusses the possible performance gains enabled by this approach. Finally, Sec. IV offers some concluding thoughts and pathways toward realistic simulations of industrial devices using the ECPIC algorithm.
II. METHODOLOGY
To avoid the ambiguities of complex geometry and boundary conditions, and to focus attention on the effects of the algorithm, the ECPIC method is tested on a simple 1D RFCCP model using the minipic code. minipic is a one-dimensional object-oriented Python package developed at the Princeton Plasma Physics Laboratory for rapid prototyping of new algorithms. All routines are written using the NumPy framework, which offers performance approaching that of compiled languages.^{82} Although the code runs on only a single CPU, all routines can be offloaded to a single GPU using the CuPy package.^{83} minipic incorporates periodic or Dirichlet boundary conditions with a time-varying RF potential and solves the Poisson equation on the CPU via LU decomposition with the SciPy sparse linear algebra package.^{84} Particles are updated with the leapfrog algorithm, and Monte Carlo collisions are implemented following the techniques of Vahedi and Surendra.^{27} Specific modifications are made to the collision algorithm in order to match the 2013 benchmark of Turner et al.,^{9} with further details provided in Appendix B. The code is benchmarked against all cases of Turner et al., showing excellent agreement (also in Appendix B).
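As an illustration of the null-collision selection at the heart of the Vahedi and Surendra scheme, the sketch below selects colliding particles with a fixed probability derived from a bounding collision frequency, then assigns each a process in proportion to its speed-dependent frequency. This is our own simplified function, not the minipic source; the names `select_collisions`, `nu_list`, and `nu_max` are assumptions for illustration.

```python
import numpy as np

def select_collisions(speeds, dt, nu_list, nu_max, rng):
    """Null-collision selection: pick colliders with probability
    1 - exp(-nu_max * dt), then choose a process in proportion to its
    speed-dependent frequency. nu_max must bound sum_j nu_j(v) for all
    speeds. Returns a list of (particle_index, process_index) events."""
    p_coll = 1.0 - np.exp(-nu_max * dt)
    candidates = np.nonzero(rng.random(speeds.size) < p_coll)[0]
    events = []
    for i in candidates:
        # cumulative collision frequencies evaluated at this particle's speed
        cum = np.cumsum([nu(speeds[i]) for nu in nu_list])
        j = np.searchsorted(cum, rng.random() * nu_max)
        if j < cum.size:  # j == cum.size means a null collision: do nothing
            events.append((i, j))
    return events
```

In a production code the per-particle loop would itself be vectorized, and each selected process would then scatter, excite, or ionize the corresponding particles.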
A. Explicit energy-conserving PIC algorithm on a uniform grid
The time loop of the one-dimensional explicit energy-conserving PIC (ECPIC) algorithm is considered on a uniform grid with node locations $x_n$, where $n \in [0, N]$, and cell size $\Delta x$.^{75} Later in this article, we explore the use of ECPIC on a nonuniform grid,^{47,50} the derivation for which can be found in Appendix A. Where unspecified below, the time level ($k$) is the same for all variables. The algorithm is similar to the momentum-conserving (MC)PIC algorithm, so we will clearly denote where the differences lie.
1. Interpolate the charge density to the grid using a first-order shape function,
$$\rho_n = \frac{1}{\Delta x}\sum_s q_s \sum_i S_1\!\left(\frac{X_i - x_n}{\Delta x}\right),$$
$$S_1(x) = \begin{cases} 1 + x, & -1 \leq x < 0, \\ 1 - x, & 0 \leq x \leq 1, \\ 0, & \text{otherwise}, \end{cases}$$
where, in one dimension, $q_s$ is the charge per unit area of species $s$, $\rho_n$ is the charge density at grid node $n$, and $X_i$ is the location of particle $i$.
2. Solve the Poisson equation on the grid using the three-point stencil for the Laplacian,
$$\frac{\varphi_{n+1} - 2\varphi_n + \varphi_{n-1}}{\Delta x^2} = -\frac{\rho_n}{\epsilon_0}, \qquad (4)$$
where $\varphi_n$ is the electric potential at grid node $n$. In this paper, we set Dirichlet boundary conditions by assigning the potential at the end-point grid nodes $\varphi_0$ and $\varphi_N$, and Eq. (4) is applied for all $n \in [1, N-1]$.
3. (Different from MCPIC.) The electric field is stored at cell centers and calculated from the electric potential using a centered finite-difference scheme,
$$E_{n+1/2} = -\frac{\varphi_{n+1} - \varphi_n}{\Delta x},$$
where $E_{n+1/2}$ is the electric field at the center of cell $n$, with location $x_{n+1/2}$.
4. (Different from MCPIC.) Interpolation of the electric field back to the particles is performed via the nearest-grid-point shape function with respect to the cell center,
$$E_i = \sum_n E_{n+1/2}\, S_0\!\left(\frac{X_i - x_{n+1/2}}{\Delta x}\right), \qquad (6)$$
$$S_0(x) = \begin{cases} 1, & -1/2 \leq x < 1/2, \\ 0, & \text{otherwise}, \end{cases}$$
where $E_i$ is the electric field at particle $i$, and $S_0(x)$ is the nearest-grid-point shape function.
5. Update the particle phase-space coordinates $(X_i, V_i)$ via the leapfrog algorithm,
$$\frac{V_i^{k+1/2} - V_i^{k-1/2}}{\Delta t} = \frac{q_s}{m_s} E_i^k,$$
$$\frac{X_i^{k+1} - X_i^k}{\Delta t} = V_i^{k+1/2},$$
where $m_s$ is the mass of species $s$.
6. Collide particles via the chosen Monte Carlo collision algorithms.

7. Return to step 1.
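The steps above can be condensed into a single time step. The following is a minimal serial NumPy sketch under our own naming and simplifications (dense Poisson solve for clarity, no boundary-particle handling or collisions), not the minipic implementation:

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)

def ecpic_step(X, V, q, m, dx, dt, N, phi_left, phi_right):
    """One ECPIC time step on a uniform grid of N cells (nodes 0..N).
    X, V: particle positions and velocities; q, m: macroparticle charge
    (per unit area in 1D) and mass."""
    cell = np.floor(X / dx).astype(int)        # index of the containing cell
    w = X / dx - cell                          # linear weight within the cell
    # Step 1: first-order (linear) charge deposition to the nodes
    rho = np.zeros(N + 1)
    np.add.at(rho, cell,     q * (1.0 - w) / dx)
    np.add.at(rho, cell + 1, q * w / dx)
    # Step 2: Poisson solve with Dirichlet BCs (dense solve for clarity;
    # minipic instead uses a sparse LU factorization)
    A = (np.diag(-2.0 * np.ones(N - 1))
         + np.diag(np.ones(N - 2), 1) + np.diag(np.ones(N - 2), -1))
    b = -rho[1:N] * dx**2 / EPS0
    b[0] -= phi_left
    b[-1] -= phi_right
    phi = np.empty(N + 1)
    phi[0], phi[N] = phi_left, phi_right
    phi[1:N] = np.linalg.solve(A, b)
    # Step 3: cell-centered electric field (different from MCPIC)
    E_half = -(phi[1:] - phi[:-1]) / dx
    # Step 4: nearest-grid-point gather at the cell center (different from
    # MCPIC): each particle sees the field of the cell it occupies
    E_i = E_half[cell]
    # Step 5: leapfrog push
    V = V + (q / m) * E_i * dt
    X = X + V * dt
    return X, V, phi
```

Note that the gather in step 4 equals the exact derivative of the linearly interpolated potential at the particle position, which is what makes the force conservative.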
Due to the use of the nearest-grid-point shape function in Eq. (6), this implementation of ECPIC is expected to be first-order accurate in space, whereas the standard MCPIC algorithm, using a linear interpolating function, is second order.^{81}
The nearest-grid-point gather of the cell-centered field in Eq. (6) is equivalent to taking the exact derivative of the linearly interpolated potential at the particle position,
$$E_i = -\frac{\partial}{\partial X_i}\sum_n \varphi_n S_1\!\left(\frac{X_i - x_n}{\Delta x}\right). \qquad (10)$$
The form of Eq. (10) also demonstrates how higher-order ECPIC methods could be derived by replacing $S_1$ with a shape function $S_n$ of arbitrary order. Examples of such shape functions, or Whitney interpolating forms as they are known, can be found in Ref. 78. Alternatively, since the current density is also a 1-form, the electric field could be updated via Ampère's law, using the same shape function for current deposition and electric field interpolation,^{51,57} although in higher dimensions this would require a procedure to eliminate the rotational component of the updated electric field.^{51,85} Reference 52 relies on this approach to deliver a second-order accurate ECPIC method.

Since the derivative of the discrete potential in Eq. (10) is exact, the force on the particle is, by definition, conservative. This property is key to demonstrating the energy-conserving nature of ECPIC in the $\Delta t \to 0$ limit; the full proof can be found in Refs. 75, 79, and 86 for uniform grids and in Refs. 47 and 50 for nonuniform grids. An empirical study of the heating rates of the MCPIC and ECPIC algorithms in a collisionless periodic system at different grid resolutions is presented in Appendix C, demonstrating the superior energy-conservation properties of ECPIC.
B. Configuration of the radio-frequency capacitively coupled plasma discharge
To test the accuracy of ECPIC for realistic low-temperature plasma physics, the helium RFCCP simulations of Turner et al. are modified to represent a larger discharge. The simulation is initialized with a uniform plasma density and temperature, then evolved until a steady state is achieved, after which the plasma properties are averaged over a given time period. We utilize helium collision algorithms and cross section tables identical to those of the Turner et al. benchmark,^{9} including electron–neutral elastic collisions, two electron–neutral excitation collisions (at 19.82 and 20.61 eV), electron–neutral ionization (24.59 eV), ion–neutral elastic collisions, and ion–neutral charge exchange. The physical and numerical parameters of the discharge are listed in Table I.
TABLE I. Physical and numerical parameters of the RFCCP discharge.

| Parameter name | Symbol (units) | Value |
| Electrode separation | $L$ (m) | $0.3$ |
| Neutral density | $n_n$ (1/m$^3$) | $9.64\times10^{20}$ (30 mTorr) |
| Neutral temperature | $T_n$ (K) | $300$ |
| RF frequency | $f$ (MHz) | $13.56$ |
| RF voltage amplitude | $V_0$ (V) | $1000$ |
| Simulation time | $T_\mathrm{sim}$ (s) | $5100/f$ |
| Total averaging time | $T_a$ (s) | $100/f$ |
| Averaging time interval | $T_f$ (s) | $1/(10f)$ |
| Initial plasma density | $n_0$ (1/m$^3$) | $7\times10^{15}$ |
| Initial electron temperature | $T_e$ (eV) | $2.585$ (30 000 K) |
| Initial ion temperature | $T_i$ (K) | $300$ |
| Reference number of cells | $N_{x,\mathrm{ref}}$ | $2048$ |
| Reference cell size | $\Delta x_\mathrm{ref}$ (m) | $1.465\times10^{-4}$ |
| Number of time steps | $N_t$ | $20\,400\,000$ |
| Time step size | $\Delta t$ (s) | $1/(4000f)$ |
| Averaging time steps | $N_a$ | $400\,000$ |
| Time steps between averaging | $N_f$ | $400$ |
| Initial particles-per-cell per species | $N_\mathrm{PPC}$ | $512$ |
In the results presented below, we measure agreement between simulations in an empirical sense. Specifically, we compare the 1norm difference between grid quantities and consider the simulations in agreement if the difference is less than 5%. Where needed, we rely on linear interpolation to map between different simulation grids. A simulation is considered accurate if it is in agreement with another simulation that was previously determined to be accurate.
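As a minimal sketch of this agreement metric (the function names are our own; minipic's internal implementation may differ in detail):

```python
import numpy as np

def relative_l1_difference(x_ref, f_ref, x_test, f_test):
    """Relative 1-norm difference between a test profile and a reference
    profile, after linearly interpolating the test data onto the reference
    grid. Returns a fraction (0.05 -> 5%)."""
    f_interp = np.interp(x_ref, x_test, f_test)
    return np.sum(np.abs(f_interp - f_ref)) / np.sum(np.abs(f_ref))

def in_agreement(x_ref, f_ref, x_test, f_test, tol=0.05):
    """Two profiles 'agree' if their relative 1-norm difference is < 5%."""
    return relative_l1_difference(x_ref, f_ref, x_test, f_test) < tol
```
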
Each simulation was performed on a single NVIDIA V100 GPU and a single core of an IBM POWER9 CPU on Princeton University's Traverse cluster,^{87} allowing multiple simulations to be run concurrently.
III. RESULTS AND DISCUSSION
A. Comparison of momentum- and energy-conserving PIC methods in the fully resolved case
The simulation is initialized and evolved for 5000 RF periods, or $20\times10^6$ time steps, until a quasi-steady state is reached. Time-averaged plasma moments, phase-space data, and field quantities are then recorded over an additional 100 RF periods at intervals of 1/10th of an RF period, i.e., every 400 steps over a total of 400 000 steps. At each output time, all electron phase-space data within a 1 mm region at the center of the discharge are recorded for producing electron energy distribution functions (EEDFs). Additionally, all helium ions that reach the RF electrode within this averaging time are recorded for the generation of ion impact distribution functions. The x-component of temperature at each grid point is computed from the pressure tensor (as opposed to the stress tensor). Figure 1 plots key plasma, field, and energy distribution properties for both the fully resolved MCPIC and ECPIC simulations.
The electron and ion densities exhibit the typical symmetric profile of an RFCCP discharge, with peak density at the center decreasing monotonically toward the electrodes. The charge density reveals that the plasma is indeed quasineutral everywhere except within the sheaths near the electrodes. The potential shows a flat-topped profile, with a peak plasma potential of 423 V, resulting in a near-zero electric field in the plasma bulk tending to larger values in the sheath region. The x-component of electron temperature is raised above the initial simulation temperature to around 3.2 eV in the plasma bulk, peaking at the edge of the sheath where power is being deposited. The x-component of ion temperature is lowest in the bulk but increases in the sheath regions due to acceleration, although the notion of a temperature breaks down for both species in this kinetic region. The EEDFs at the center of the discharge show a typical bi-Maxwellian profile, with the higher-energy tail important for driving ionization in this discharge and for the generation of radicals in more complex chemistries.^{8,88} The ion impact distribution functions exhibit a peaked profile corresponding to successive charge-exchange collisions during acceleration to the wall through the sheath potential.^{89} The angular distribution is relatively narrow, with a minimum at $\theta = 0^\circ$ caused by off-axis scattering during ion–neutral collisions in the sheath.
With consideration to the stability and accuracy criteria of Eq. (1), we calculate that, at the quasi-steady state:

(1a) $\lambda_{De,\mathrm{min}} \approx 1.11\,\Delta x_\mathrm{ref}$;
(1b) $\omega_{pe,\mathrm{max}}\,\Delta t \approx 0.085$;
(1c) no particles exceed $v_\mathrm{max} = \Delta x_\mathrm{ref}/\Delta t$;
(1d) convergence studies demonstrate that the number of particles-per-cell, $N_\mathrm{ppc} = 512$, is sufficient for statistical accuracy.
On this final point, we performed a convergence study with $N_\mathrm{ppc} = 16, 32, 64, 128, 256, 512$ and saw agreement within 1.7% between the results with $N_\mathrm{ppc} = 256$ and $512$, indicating satisfactory convergence for our case. Note that, since $\Delta x \approx \lambda_{De}$, the number of particles-per-Debye-sphere is also approximately 512. For Monte Carlo collisional processes, we had $\nu_t\Delta t \approx 0.002$ for electrons and $0.0005$ for ions. We therefore consider the RFCCP simulations using the well-established MCPIC algorithm to be accurate within the limitations of the scheme.
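Criteria (1a) and (1b) can be checked directly from the parameters of Table I. The short script below is our own sanity check, using the initial rather than steady-state density and temperature, which is why the numbers differ slightly from the in-text figures quoted above:

```python
import numpy as np

# Physical constants
QE, ME, EPS0 = 1.602176634e-19, 9.1093837015e-31, 8.8541878128e-12

# Parameters from Table I
n0    = 7.0e15                    # initial plasma density (1/m^3)
Te_eV = 2.585                     # initial electron temperature (eV)
f_rf  = 13.56e6                   # RF frequency (Hz)
dt    = 1.0 / (4000.0 * f_rf)     # time step (s)
dx    = 1.465e-4                  # reference cell size (m)

# Electron Debye length and plasma frequency
lambda_De = np.sqrt(EPS0 * Te_eV / (n0 * QE))
omega_pe  = np.sqrt(n0 * QE**2 / (EPS0 * ME))

print(f"lambda_De / dx_ref = {lambda_De / dx:.2f}")   # criterion (1a)
print(f"omega_pe * dt      = {omega_pe * dt:.3f}")    # criterion (1b)
```
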
Figure 1 demonstrates excellent agreement between the MCPIC and ECPIC algorithms, with the 1-norm differences in plasma density and electron temperature across the entire domain calculated as 1.1% and 0.76%, respectively. Most importantly for industrial purposes, the central EEDF and the ion impact energy and angular distribution functions also show close agreement. We therefore conclude that the ECPIC simulation of the RFCCP agrees with the MCPIC simulation and is thus considered accurate by our criterion. Both of these solutions will be referred to when assessing accuracy in the analysis below.
B. Stability and accuracy for spatially under-resolved simulations
To determine the stability and accuracy of the two algorithms in spatially under-resolved cases, we progressively increase the cell size by powers of 2, up to a maximum of $\Delta x = 32\,\Delta x_\mathrm{ref}$, corresponding to $N_x = 64$ simulation cells in the coarsest case. For all simulations, the total number of initial particles is kept the same, resulting in the same number of particles per Debye sphere (assuming the density and temperature profiles remain accurate). This choice ensures that the numerical collision frequency induced by the artificially large macroparticle weight is constant across all simulations.^{21} It should therefore be noted that the initial number of particles-per-cell increases with cell size. Figures 2 and 3 plot the same quantities as Fig. 1 for the MCPIC and ECPIC algorithms, respectively. Agreement between simulations is determined by interpolation onto the grid with $\Delta x = \Delta x_\mathrm{ref}$; all results are summarized in Table II.
TABLE II. Errors for the spatially under-resolved MCPIC and ECPIC simulations. Crosses denote simulations that became unstable and crashed.

| Cell size | Total number of cells | MCPIC ion density error (1-norm) | MCPIC electron temperature error (1-norm) | ECPIC ion density error (1-norm) | ECPIC electron temperature error (1-norm) |
| $\Delta x_\mathrm{ref}/2$ | 4096 | 0.26% | 1.1% | 0.26% | 1.0% |
| $\Delta x_\mathrm{ref}$ | 2048 | 0% | 0% | 0% | 0% |
| $2\Delta x_\mathrm{ref}$ | 1024 | 0.50% | 1.7% | 0.46% | 0.90% |
| $4\Delta x_\mathrm{ref}$ | 512 | 4.6% | 5.0% | 0.52% | 1.4% |
| $8\Delta x_\mathrm{ref}$ | 256 | 22.8% | 17.7% | 1.8% | 2.9% |
| $16\Delta x_\mathrm{ref}$ | 128 | X | X | 7.9% | 7.5% |
| $32\Delta x_\mathrm{ref}$ | 64 | X | X | 60% | 26% |
For the MCPIC algorithm, the simulations with $\Delta x = 16\,\Delta x_\mathrm{ref}$ and $32\,\Delta x_\mathrm{ref}$ become unstable and crash after 139 and 56 RF periods, respectively (denoted by crosses in Table II and the Fig. 2 legend). Investigating the data reveals that the crash occurs when the 32 GB of GPU RAM becomes saturated. This indicates a greater than 100-fold increase in memory utilization relative to the initial conditions of the simulation which, based on the code design, can only be attributed to the creation of new particle phase-space data. This particle creation is presumed to be caused by artificial heating from the finite-grid instability due to under-resolution of the Debye length, leading to an increase in ionization and a corresponding increase in density. This is then exacerbated when the density yields a plasma frequency too large to satisfy criterion (1b), triggering an additional numerical instability and therefore further heating and ionization. Eventually, this process leads to runaway particle growth, memory exhaustion, and finally simulation failure.
Regarding the stable results, we observe that the peak plasma density and electron temperature increase with increasing $\Delta x$. The ion temperature increases toward the sheath region for the lowest-resolution simulation ($\Delta x = 8\,\Delta x_\mathrm{ref}$), which likely explains the shift to higher energies of the ion impact energy distribution function for the same case. With reference to Table II and based on our accuracy criterion, the MCPIC algorithm becomes inaccurate for $\Delta x \geq 4\,\Delta x_\mathrm{ref}$.
In contrast, the ECPIC simulations remain stable for all cases, presumably because the finite-grid instability is suppressed. Although stable, the density does increase significantly in the coarsest simulation, with a surprising drop in electron temperature in the plasma bulk but an increase in the sheath region, driving additional ionization and therefore plasma density. The ion temperatures, however, remain extremely accurate for all cases, and the EEDF and ion impact distribution functions are accurate for all but the coarsest case. With reference to Table II and based on our accuracy criterion, the ECPIC algorithm is stable for all cases but inaccurate for $\Delta x \geq 16\,\Delta x_\mathrm{ref}$, offering a significant improvement over the MCPIC algorithm.
It should also be pointed out that decreasing the number of cells reduces the noise in all results, which is especially noticeable in the charge density and electron power deposition [Figs. 2(c), 3(c), 2(f), and 3(f)]. This is because the number of particles-per-cell increases with cell size, leading to improved statistical resolution of the plasma moments and field quantities.
To investigate why the ECPIC algorithm loses accuracy, Fig. 4 plots the charge density ($\rho$) and electric field ($E$) within the left-hand sheath region ($x < 3$ cm) at various phases of the RF bias for the different resolution simulations. It is clear that the gradients of charge density and electric field become quite steep during certain phases of the RF period, particularly at $t = 4/(5f)$, and that the simulations with larger $\Delta x$ struggle to resolve them.
C. Improving accuracy with nonuniform grids
Nonuniform grids introduce additional computational and accuracy challenges for PIC codes. In one dimension, the ordering of cells is straightforward; however, higher dimensions raise questions about how best to order the cells for optimal memory access and communication patterns. Fully unstructured grids can also prevent the use of algorithms such as the geometric multigrid method, which often performs better than the more general algebraic multigrid method for solving the Poisson equation. Nonuniform grids also introduce accuracy errors through particle self-forces, which break momentum conservation (even for MCPIC).^{90,91} It therefore remains important to proceed with numerical experiments to characterize the accuracy of RFCCP simulations under various nonuniform grid conditions.
With this in mind, minipic was updated to incorporate nonuniform grids, with the following grid design logic:

1. Within a buffer length $L_\mathrm{buff}$ of each electrode, the cell size is set to $\Delta x_\mathrm{min}$ such that sheath gradients are suitably resolved.
2. Outside of this region, cell sizes can increase up to $\Delta x_\mathrm{max}$.
3. Cells are increased (or decreased) in size only by doubling (or halving) the cell's size.
4. No cell can have neighbor cells which are larger (or smaller) by more than a factor of 2.

An example of how this refinement strategy might look is shown in Fig. 6 for $N_x = 128$ without coarsening, $L_\mathrm{buff} = 10\,\Delta x_\mathrm{ref}$, and $\Delta x_\mathrm{max} = 1, 2, 4, 8\,\Delta x_\mathrm{ref}$.
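The grid design logic above can be sketched as follows. This is an illustrative function of our own, not the minipic source; node positions are rescaled at the end so the grid spans the electrode gap exactly:

```python
import numpy as np

def build_nonuniform_grid(L, dx_min, dx_max, L_buff):
    """Node positions for a symmetric 1D grid: cells of size dx_min within
    L_buff of each electrode, then sizes doubling cell-by-cell up to dx_max
    in the bulk. Neighboring cells never differ by more than a factor of 2."""
    half, x, dx = [], 0.0, dx_min
    while x < L_buff:                      # sheath-resolving buffer region
        half.append(dx_min); x += dx_min
    while dx < dx_max and x < 0.5 * L:     # geometric coarsening region
        dx *= 2.0
        half.append(dx); x += dx
    while x < 0.5 * L:                     # uniform bulk at dx_max
        half.append(dx_max); x += dx_max
    sizes = np.array(half + half[::-1])    # mirror for the right electrode
    nodes = np.concatenate(([0.0], np.cumsum(sizes)))
    return nodes * (L / nodes[-1])         # rescale to span exactly [0, L]
```

For $L = 0.3$ m, $\Delta x_\mathrm{min} = \Delta x_\mathrm{ref}$, $\Delta x_\mathrm{max} = 32\,\Delta x_\mathrm{ref}$, and $L_\mathrm{buff} = 5$ cm, this produces roughly 730 cells, comparable to the 736 reported in Table III (the exact count depends on rounding at the region boundaries).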
From Fig. 4, we observe that the maximum sheath length during the RF phase is approximately 2.2 cm, and we therefore set $L_\mathrm{buff}$ in our RFCCP simulations to just over two times the sheath length, or 5 cm. The total number of cells for each simulation is provided in Table III. Figure 7 plots the results of the ECPIC simulations with the refined grid and increasing maximum cell sizes. The same total number of simulation particles is used in all cases.
TABLE III. Errors for the ECPIC simulations on nonuniform grids with different minimum and maximum cell sizes.

| Minimum cell size | Maximum cell size | Total number of cells | Ion density error (1-norm) | Electron temperature error (1-norm) |
| $\Delta x_\mathrm{ref}$ | $\Delta x_\mathrm{ref}$ | 2048 | 0% | 0% |
| $\Delta x_\mathrm{ref}$ | $2\Delta x_\mathrm{ref}$ | 1366 | 0.34% | 0.45% |
| $\Delta x_\mathrm{ref}$ | $4\Delta x_\mathrm{ref}$ | 1026 | 0.62% | 0.49% |
| $\Delta x_\mathrm{ref}$ | $8\Delta x_\mathrm{ref}$ | 858 | 0.64% | 0.71% |
| $\Delta x_\mathrm{ref}$ | $16\Delta x_\mathrm{ref}$ | 776 | 2.1% | 2.0% |
| $\Delta x_\mathrm{ref}$ | $32\Delta x_\mathrm{ref}$ | 736 | 4.2% | 3.7% |
| $2\Delta x_\mathrm{ref}$ | $2\Delta x_\mathrm{ref}$ | 1024 | 0.46% | 0.90% |
| $2\Delta x_\mathrm{ref}$ | $4\Delta x_\mathrm{ref}$ | 684 | 0.58% | 0.87% |
| $2\Delta x_\mathrm{ref}$ | $8\Delta x_\mathrm{ref}$ | 516 | 0.58% | 1.1% |
| $2\Delta x_\mathrm{ref}$ | $16\Delta x_\mathrm{ref}$ | 434 | 2.1% | 2.5% |
| $2\Delta x_\mathrm{ref}$ | $32\Delta x_\mathrm{ref}$ | 394 | 3.9% | 4.1% |
| $4\Delta x_\mathrm{ref}$ | $4\Delta x_\mathrm{ref}$ | 512 | 0.52% | 1.4% |
| $4\Delta x_\mathrm{ref}$ | $8\Delta x_\mathrm{ref}$ | 342 | 0.65% | 1.7% |
| $4\Delta x_\mathrm{ref}$ | $16\Delta x_\mathrm{ref}$ | 258 | 2.5% | 3.4% |
| $4\Delta x_\mathrm{ref}$ | $32\Delta x_\mathrm{ref}$ | 218 | 4.3% | 5.1% |
We clearly observe a significant increase in accuracy for the larger cell sizes when compared with the uniform grid cases, a result confirmed by comparing the 1-norm errors of ion density and electron temperature, as shown in Table III. Although the minimum cell size was set to the reference cell size in Fig. 7, the accuracy of ECPIC in the results of Fig. 3 indicates that larger minimum cell sizes may also be safely adopted, further reducing the cell count. Figure 8 shows additional simulations with different minimum and maximum cell sizes, the results of which are also listed in Table III.
Using our definition of accuracy, we conclude that the ECPIC method accurately reproduces the RFCCP behavior in almost all cases. The only exception is the case with $\Delta x_{\min} = 4\Delta x_{\text{ref}}$ and $\Delta x_{\max} = 32\Delta x_{\text{ref}}$ (the most under-resolved case), for which the 1-norm of electron temperature slightly exceeds the criterion at 5.1%. Perhaps most importantly for applications to semiconductor etching, the EEDF at the center of the discharge and the ion impact energy and angle distributions are in excellent agreement for all cases. This shows that the RFCCP discharge considered here can be accurately modeled using the ECPIC approach on nonuniform grids, providing up to an 8 times reduction in the number of cells and allowing for up to 32 times larger cells in the plasma bulk. Appendix D shows that accuracy is not improved when MCPIC is also modified to include a nonuniform grid, indicating that the observed improvements result from the properties of ECPIC rather than from the choice of a nonuniform grid alone. Section III D discusses how the reduction in the number of cells could improve performance for one- and higher-dimensional CCP simulations.
D. Opportunity for PIC performance improvements with the explicit electrostatic energyconserving algorithm
Figure 9 shows $C_R$ against $L_{\text{buff}}/L$ for various refinement ratios, demonstrating that, unsurprisingly, the reduction in the number of cells is linearly proportional to $L_{\text{buff}}/L$ and decreases asymptotically toward zero at $L_{\text{buff}}/L = 0$ as $R \to \infty$. ECPIC will therefore only prove beneficial for reducing the number of simulation cells when a large portion of the length (or area/volume in higher dimensions) can be suitably under-resolved, and when this region can be under-resolved by at least an order of magnitude (ideally $R \geq 10$). If $L_{\text{buff}}/L$ is small, the reduction in the number of cells compounds in higher dimensions, scaling approximately as $C_R^d$, where $d$ is the dimension of the problem.
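Under the idealization that the buffer regions retain all of their cells while the rest of the domain is coarsened by the full factor $R$ (our simplification, which ignores the transition cells), the cell-count ratio behaves as:

```python
def cell_count_ratio(f_buff, R):
    """Approximate ratio of nonuniform to uniform grid cell counts.

    f_buff : fraction of the domain kept at fine resolution (L_buff / L)
    R      : refinement ratio (dx_max / dx_min) in the coarse region

    An idealized model, not an exact count: the buffer keeps all of its
    cells, and the remaining fraction keeps only 1/R of them.
    """
    return f_buff + (1.0 - f_buff) / R
```

Consistent with the trend described above, this model is linear in $L_{\text{buff}}/L$ and tends to zero as $R \to \infty$ only when the buffer fraction vanishes.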
What improvements might this reduction in cells provide for PIC codes in one and higher dimensions? Considering the entire PIC algorithm, reducing the number of cells is expected to lower the computational cost of, and therefore increase the performance of, grid operations. In an electrostatic PIC code, the cost of grid operations is generally dominated by the elliptic Poisson solve. Nonetheless, the time spent on the Poisson solve at each time step is generally much smaller than that spent on particle routines (i.e., interpolation, phase space update, and collisions), simply because a large number of particles per cell is required for statistical resolution [criterion (1d)]. Therefore, since we initialized all simulations with the same number of particles, we saw only a marginal reduction in runtime for simulations with significantly under-resolved grids. This is also demonstrated in Fig. 10(a), which shows 2D weak scaling of the high-performance LTP-PIC code^{92,93} with up to 16 384 cores on the CPU partition of the modern Perlmutter supercomputer.^{94} Here, the GMRES solver with a geometric multigrid preconditioner from the Hypre package is used,^{95} which is significantly faster than the particle routines.
As shown in Fig. 10(b), the picture is quite different on the heterogeneous CPU+GPU partition of the machine. We believe it is fairest to compare results on a node-to-node basis, where each node comprises 4× NVIDIA A100 GPUs and a single 64-core AMD EPYC CPU. Therefore, we equate 1 GPU to 16 CPU cores, scaling up to a total of 1024 GPUs. With this architecture, and using a CUDA implementation of the same algorithms in Hypre, the 2D Poisson solve becomes comparable in computational cost to the particle routines. This can be explained by noting that GPU chipsets offer the greatest performance improvement when handling a very large throughput of data. In the context of a high-performance PIC code, this is generally satisfied for particle routines, which handle tens or hundreds of millions of phase space variables per GPU. On the other hand, since a large number of particles per cell is required, the grid data in memory must necessarily be orders of magnitude smaller than the particle phase space data and therefore receive a reduced performance improvement on the GPU. The dashed blue line in Fig. 10(b) shows the equivalent solve time for the 2D Poisson equation on CPUs, demonstrating that it may often be better to perform the solve on the CPU rather than the GPU. It should be pointed out that, despite the relative performance reduction of the Poisson solve on the GPU, Fig. 10 clearly indicates that an order of magnitude speedup can still be obtained for a PIC code on heterogeneous architectures.
Where ECPIC can provide an advantage is in reducing the number of cells in a simulation, and therefore the solve time for the Poisson equation. On a purely CPU architecture, or for 1D simulations, this may provide little performance improvement; however, as Fig. 10 shows, on a heterogeneous CPU+GPU architecture, a noticeable (although not order of magnitude) speedup could be obtained in two and presumably higher dimensions. As mentioned, constructing and optimizing a nonuniform grid in more than one dimension is not a straightforward task, and investigating the speedup obtained by reducing the number of cells will be the topic of future research. A structured quadtree/octree approach,^{96} as opposed to a fully unstructured mesh, would likely be most efficient since it still allows for the continued use of geometric multigrid preconditioners.^{95}
Where ECPIC may be able to offer orders of magnitude speedup is if the reduction in the number of cells can be translated into an equivalent reduction in the number of simulation particles. On a nonuniform grid, this would necessitate the use of particle splitting, merging, or remapping^{100–111} to realize a performance gain, and it remains an open problem how this affects accuracy when the number of particles per Debye sphere is reduced on large cells. Implementing and studying these approaches is beyond the scope of this paper and is reserved for future work.
IV. CONCLUSION
Building on the research of Lewis, as well as Barnes and Chacon, the explicit energy-conserving (EC)PIC algorithm is applied to model an RF capacitively coupled plasma discharge relevant to industrial applications. The ECPIC method is observed to accurately reproduce the behavior of the discharge with respect to momentum-conserving (MC)PIC simulations. When under-resolving the simulation with larger cell sizes, the ECPIC method is more stable and accurate than MCPIC, allowing for uniform cells up to 8 times the electron Debye length. The loss of accuracy of the ECPIC method for more highly under-resolved simulations is due to poor resolution of the plasma sheath region, which can be remedied by adopting a nonuniform grid. The nonuniform ECPIC method shows both excellent accuracy and stability, allowing for cell sizes up to 32 times the electron Debye length in the quasineutral region of the discharge. This leads to up to an 8 times reduction in the number of cells and therefore reduced computational cost for PIC grid-based routines, with even greater benefit expected for two- or three-dimensional simulations. Future work will investigate the approach in higher dimensions, as well as how best to implement variable particle weight schemes to realize further performance improvements while maintaining accuracy. This work presents one step toward realizing PIC as a useful tool for engineering prototyping of low-temperature plasma devices relevant to a wide range of industries.
ACKNOWLEDGMENTS
This research was funded by the Department of Energy's Laboratory Directed Research and Development (LDRD) program through grant number R112.
This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DEAC0205CH11231 using NERSC Award No. BESERCAP0021158.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
Andrew Tasman Powis: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (equal). Igor D. Kaganovich: Funding acquisition (equal); Project administration (equal); Resources (equal); Supervision (equal); Writing – review & editing (equal).
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
APPENDIX A: EXPLICIT ENERGYCONSERVING PIC ALGORITHM ON A NONUNIFORM GRID
The one-dimensional explicit energy-conserving (EC)PIC algorithm time loop is considered on a nonuniform grid with node locations $x_n$, where $n \in [0, N]$. When unspecified, the time level ($k$) is the same for all variables. The procedure is as follows:
1. Interpolate the charge density to the grid using first-order shape functions,
$$\rho_n = \frac{1}{x_{n+1} - x_{n-1}} \sum_s q_s \sum_i \left[ S^+\!\left(\frac{X_i - x_n}{x_{n+1} - x_n}\right) + S^-\!\left(\frac{X_i - x_n}{x_n - x_{n-1}}\right) \right],$$
$$S^+(x) = \begin{cases} 1 - x, & 0 \leq x \leq 1, \\ 0, & \text{otherwise}, \end{cases}$$
$$S^-(x) = \begin{cases} 1 + x, & -1 \leq x < 0, \\ 0, & \text{otherwise}, \end{cases}$$
where, in one dimension, $q_s$ is the charge per unit area of species $s$, $\rho_n$ is the charge density at grid node $n$, and $X_i$ is the location of particle $i$.

2. Solve the Poisson equation on the grid using the nonuniform 3-point stencil for the Laplacian,
$$\frac{2}{(x_{n+1} - x_{n-1})(x_n - x_{n-1})}\,\varphi_{n-1} - \frac{2}{x_{n+1} - x_{n-1}}\left(\frac{1}{x_{n+1} - x_n} + \frac{1}{x_n - x_{n-1}}\right)\varphi_n + \frac{2}{(x_{n+1} - x_{n-1})(x_{n+1} - x_n)}\,\varphi_{n+1} = -\frac{\rho_n}{\epsilon_0},$$
where $\varphi_n$ is the electric potential at grid node $n$. Dirichlet boundary conditions are set by assigning the potential at the end point grid nodes $\varphi_0$ and $\varphi_N$, and Eq. (A3) is applied for all $n \in [1, N-1]$.

3. Calculate the electric field from the potential via a cell-centered finite difference,
$$E_{n+1/2} = -\frac{\varphi_{n+1} - \varphi_n}{x_{n+1} - x_n},$$
where $E_{n+1/2}$ is the electric field at the center of cell $n$, with location $x_{n+1/2}$.

4. Interpolate the electric field back to the particles via the nearest-grid-point shape function,
$$E_i = \sum_n E_{n+1/2}\, S^0\!\left(\frac{X_i - x_{n+1/2}}{x_{n+1} - x_n}\right),$$
$$S^0(x) = \begin{cases} 1, & -1/2 \leq x < 1/2, \\ 0, & \text{otherwise}, \end{cases}$$
where $E_i$ is the electric field for particle $i$, and $S^0(x)$ is the nearest-grid-point shape function.

5. Update the particle phase space data ($X_i$, $V_i$) via the standard leapfrog algorithm,
$$\frac{V_i^{k+1/2} - V_i^{k-1/2}}{\Delta t} = \frac{q_s}{m_s} E_i^k,$$
$$\frac{X_i^{k+1} - X_i^k}{\Delta t} = V_i^{k+1/2},$$
where $m_s$ is the mass of species $s$.

6. Collide particles via the chosen Monte Carlo collision algorithms.

7. Return to step 1.
As for the uniform grid case, one could derive higher-order ECPIC methods on a nonuniform grid, although this can be considerably more involved.
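As an illustration, the time loop above can be condensed into a single-species step (a simplified sketch using a dense Dirichlet Poisson solve; particle boundary handling, multiple species, and collisions are omitted, and this is not the authors' production code):

```python
import numpy as np

def ecpic_step(x, phi0, phiN, X, V, q, m, dt, eps0=8.854e-12):
    """One explicit energy-conserving PIC time step on a nonuniform 1D grid.

    x    : node locations x_0..x_N (strictly increasing)
    X, V : particle positions and velocities (one species, for brevity)
    q, m : charge per unit area and mass of the species
    """
    N = len(x) - 1
    # Step 1: deposit charge with first-order shape functions S+/S-.
    rho = np.zeros(N + 1)
    idx = np.clip(np.searchsorted(x, X, side="right") - 1, 0, N - 1)
    w = (X - x[idx]) / (x[idx + 1] - x[idx])      # fractional position in cell
    np.add.at(rho, idx, q * (1.0 - w))
    np.add.at(rho, idx + 1, q * w)
    rho[1:-1] /= x[2:] - x[:-2]                   # prefactor 1/(x_{n+1}-x_{n-1})
    # Step 2: Poisson solve with the nonuniform 3-point stencil, Dirichlet BCs.
    hn = np.diff(x)                               # cell sizes x_{n+1} - x_n
    a = 2.0 / ((x[2:] - x[:-2]) * hn[:-1])        # coefficient of phi_{n-1}
    c = 2.0 / ((x[2:] - x[:-2]) * hn[1:])         # coefficient of phi_{n+1}
    b = -(a + c)                                  # coefficient of phi_n
    A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
    rhs = -rho[1:-1] / eps0
    rhs[0] -= a[0] * phi0
    rhs[-1] -= c[-1] * phiN
    phi = np.concatenate(([phi0], np.linalg.solve(A, rhs), [phiN]))
    # Step 3: cell-centered electric field.
    E = -np.diff(phi) / hn
    # Step 4: nearest-grid-point gather -- each particle sees the field of
    # the cell containing it.
    Ei = E[idx]
    # Step 5: leapfrog push.
    V = V + (q / m) * Ei * dt
    X = X + V * dt
    return X, V, phi, E
```

The dense `np.linalg.solve` stands in for a tridiagonal or multigrid solver; since the 3-point stencil is exact for linear potentials, a charge-free run with fixed electrode potentials recovers the linear vacuum solution on any nonuniform grid.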
APPENDIX B: BENCHMARKING OF minipic
minipic was benchmarked against the RFCCP simulations of Turner et al.,^{9} which provide plots of plasma density for comparison. To match this benchmark, minor modifications were made to the collision algorithms of Vahedi and Surendra:^{27}

1. Adopting an isotropic scattering angle $\chi$ for electron-neutral collisions: $\cos\chi = 1 - 2r$, where $r \in [0, 1]$ is a uniformly sampled random number.

2. Removing the small change in electron velocity due to momentum transfer with neutrals.

3. Assuming an equal split of the remaining kinetic energy between the incident and created electrons during ionization.
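The isotropic scattering modification can be sketched as follows (a hypothetical helper, not minipic's implementation; the particle's speed is preserved while its direction is redistributed uniformly over the sphere):

```python
import math
import random

def isotropic_scatter(vx, vy, vz):
    """Return a velocity with the same speed pointing in a uniformly random
    direction, sampling cos(chi) = 1 - 2r with r uniform on [0, 1)."""
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    cos_chi = 1.0 - 2.0 * random.random()        # isotropic polar angle
    sin_chi = math.sqrt(1.0 - cos_chi * cos_chi)
    phi = 2.0 * math.pi * random.random()        # uniform azimuthal angle
    return (speed * sin_chi * math.cos(phi),
            speed * sin_chi * math.sin(phi),
            speed * cos_chi)
```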
The results from minipic and the benchmark are shown in Fig. 11, demonstrating excellent agreement for all cases. Based on these data, we consider minipic an accurate PIC-MCC code for the purpose of this paper.
APPENDIX C: EMPIRICAL HEATING RATES OF THE EXPLICIT MOMENTUM AND ENERGY CONSERVING PIC ALGORITHMS
To demonstrate the excellent energy-conserving properties of ECPIC for under-resolved simulations, we model a uniform helium plasma up to $t = 10\,000/\omega_{pe}$ with time step $\Delta t = 0.1/\omega_{pe}$ on a grid with 2048 cells and 512 particles per cell per species. The boundary conditions are periodic, and no collisions are included; therefore, the total number of particles and the total energy should remain constant for an accurate simulation. The spatial length of the simulation is determined by the cell size, which is scaled as a multiple of the electron Debye length $\lambda_{De}$ at initialization.
From Fig. 12, it is clear that for the MCPIC algorithm, the finite-grid instability significantly heats the plasma until stability criterion 1a is satisfied, leading to large numerical errors in under-resolved simulations. For the ECPIC case, however, the error is significantly smaller, even for highly under-resolved simulations. The maximum growth rate, for the $\Delta x = 1024\lambda_{De}$ case, is $9.5 \times 10^{-7}\,\omega_{pe}$, which is due to the finite-time-step leapfrog algorithm. The mean electron density and temperature within the RFCCP discharge modeled in this text are $4.4 \times 10^{15}\,\mathrm{m}^{-3}$ and $3.1\,\mathrm{eV}$, respectively, corresponding to a uniform artificial power input of approximately $7.7\,\mathrm{W/m^3}$. This is three orders of magnitude lower than the multiple $\mathrm{kW/m^3}$ electron heating rate in the discharge [see, for example, Fig. 1(f) in the main text] and is therefore acceptable for our purpose. If required, the artificial heating could be further decreased by adopting a smaller time step or a larger number of simulation particles.
Energy oscillations are observed at short times in all simulations due to noise from the random initialization of macroparticles.^{6} These oscillations are eventually dissipated by Landau damping. A "quiet start" initial distribution could eliminate this behavior;^{109} however, we do not pursue this here since, for our very long-time simulations, the initial condition plays little to no role in the final quasi-steady-state outcome.
Despite the advantages of ECPIC in energy conservation, the user should take care to check how under-resolution of Debye-scale physics or the lack of momentum conservation affects their specific configuration before relying completely on the algorithm.
APPENDIX D: MOMENTUM CONSERVING PIC WITH NONUNIFORM GRIDS
Note that referring to the momentum-conserving (MC)PIC algorithm as such when using a nonuniform grid is a misnomer, since the nonuniform cells introduce particle self-forces that compromise momentum conservation. However, we continue to refer to it as MCPIC in this section for continuity.
To verify that the improvements in accuracy for the RFCCP simulations with ECPIC on a nonuniform grid are a result of both changes, it is important to check whether the accuracy of MCPIC is also significantly improved on a nonuniform grid. Figure 13 shows plots of the RFCCP discharge similar to those within the main body of the text. Even though we do see an increase in accuracy with this approach, likely because part of the simulation domain is appropriately resolved to satisfy stability criterion 1a, the method remains inaccurate for $\Delta x_{\max} \geq 8\Delta x_{\text{ref}}$ and fails for $\Delta x_{\max} \geq 16\Delta x_{\text{ref}}$. This indicates that it is not the introduction of a nonuniform grid alone that produces more accurate simulations with ECPIC.