Modeling and forecasting the dynamics of complex systems, such as moderate pressure capacitively coupled plasma (CCP) systems, remains a challenge due to the interactions of physical and chemical processes across multiple scales. Historically, optimization for a given application would be accomplished via a design of experiment (DOE) study across the various external control parameters. Machine learning (ML) techniques show the potential to “forecast” process conditions not tested in a traditional DOE study and thereby allow better optimization and control of a plasma tool. In this article, we have used standard DOE as well as ML predictions to analyze I-V data in a moderate-pressure CCP system. We have demonstrated that supervised regression ML techniques can be a useful tool for extrapolating data even when a plasma system is undergoing a transition in the heating mode, in this case from the alpha to gamma mode. Classification analysis of control parameters is another possible application of ML techniques that can be deployed for system control. Here, we show that given a large set of measured data, the models can identify the gas ratio in the feed gas as well as correctly identify the operating pressure and electrode gap in almost all the cases.

## I. INTRODUCTION

Due to their wide use in the semiconductor industry, capacitively coupled plasmas (CCPs) are among the most commonly used laboratory plasmas. CCPs are commonly categorized based on their operational pressure and the resulting “heating mode.”^{1–5} A low-pressure discharge typically falls within the range of tens of $\mu$Torr to tens of mTorr, while the moderate pressure range extends from 1 to 100 Torr.^{5,6} These two pressure regimes exhibit notable distinctions regarding breakdown, heating mechanisms, and sustaining the plasma discharge.^{3–6} Although the general pressure dependence of CCPs is known, studies of these systems in the moderate pressure regime are limited. This is in part because “traditional” diagnostic tools, such as Langmuir probes, do not work at these moderate pressures.^{7,8} Nonetheless, such moderate pressure CCP systems are widely utilized in industrial applications, including carbon nanotube and diamond-like carbon deposition processes,^{9} flat panel display, and solar panel fabrication industries,^{10} as well as an active medium for $\mathrm{CO_2}$ lasers.^{5} Despite their widespread use, a lack of understanding of radio frequency (RF) CCP discharges in these pressure ranges has resulted in most designs being based on empirical studies.^{5}

Modeling and forecasting the dynamics of complex systems, such as moderate pressure CCP systems, remains a challenge due to the interactions of physical and chemical processes across multiple scales. Observational data, including multifidelity data from sensors, can provide valuable insights, but integrating it into existing models is difficult.^{11} Under some conditions, it is possible to deploy a large suite of diagnostics and via those results develop effective models of the plasma system. Under other conditions, such as moderate or high pressures, traditional diagnostics (Langmuir probes, etc.) will not function correctly, and detailed models of the plasmas are harder to develop. On the other hand, ML approaches, particularly deep learning, can extract features from massive amounts of data. Machine learning proves exceptionally valuable in scenarios where we lack a comprehensive theory, yet we seek to discern meaningful trends. Essentially, machine learning automates the scientific method,^{12} mirroring the sequence of hypothesis generation, testing, and either rejection or refinement. This technology equips us with a neutral toolkit for streamlining the process of discovery. Consequently, it is no wonder that machine learning is presently catalyzing revolutions across numerous domains, including science, technology, business, and medicine.^{13}

Machine learning holds the potential to expand our understanding of laboratory plasmas operating under conditions that are largely unexplored.^{14–17} The influence of ML in the realm of low-temperature plasmas (LTPs) is especially noteworthy for emerging applications such as plasma treatment in microelectronics production, quantum materials processing, LTP-based advancements in the chemical industry, as well as in medicine and biotechnology.^{18} Machine learning and data-driven approaches show promise in enhancing plasma research by addressing challenges in modeling plasma–surface interactions, enabling real-time diagnostics, and developing predictive controllers for efficient and automated LTP treatments on complex biological surfaces.^{19} So far, they have been used to accurately predict plasma parameters like density and electron temperature in LTPs^{20,21} as well as in fusion plasma research.^{22,23} In plasma medicine research, ML combined with standard dosimetric techniques has been found to offer a highly tunable and beneficial dose rate of LTPs for controlled radiobiological effects.^{24}

LTPs, particularly those used in the semiconductor industry, are complex environments that are influenced by many external “control parameters.”^{25,26} These control parameters include external power (power deposition method, power level, supplied power frequency); neutral gas (species, pressure, flow rate); and chamber geometry/materials. For many years, researchers deploying plasma systems in the semiconductor industry made use of design of experiment (DOE) studies to ascertain control parameter regimes that would result in the effective processing of a wafer or other substrates. Specifically, DOE studies have been used to narrow in on the “correct parameters” needed in a large number of processes, including hydrophilicity of fabric,^{27} gas utilization in semiconductor manufacturing,^{28} wire bonding,^{29} and many others. A basic review of DOEs and how they are used is given in the online NIST/SEMATECH handbook.^{30} DOE studies do not allow one to build models of a process—just find which combination of parameters will give rise to the “proper” processing of a wafer. Early results with ML already show promise in bridging this gap.^{18,31–33}

In this article, we will examine the use of ML to predict trends in heating mode transition^{4,5,34–36} of moderate-pressure CCPs, specifically between 0.1 and 3.5 Torr for argon, nitrogen, and oxygen plasmas. Levitski^{34} was the first to point out this transition between two distinct but stable regimes in a capacitive RF discharge in the moderate pressure regime and named them alpha and gamma mode plasmas after the Townsend ionization coefficients.^{4} Here, we will make use of the data collected from an experimental study^{36} of the current-voltage (I–V) characteristics at 0.5–2.5 Torr. We will show results and compare the prediction accuracy of different ML models for both supervised regression and supervised classification approaches. We will start by describing the experimental system and related diagnostic tools in Sec. II. This will include a review of how the measured data are analyzed to arrive at our database. In Sec. III, we will describe the DOE and present the resultant structure of the database that will be used in our ML study. In Sec. IV A, we will deploy four common ML regression analysis models to examine our experimental data. We will explore the efficacy of various ML models on the resultant I–V data sets in predicting I–V data under varying conditions. In Sec. IV B, we will use ML to explore the classification of measured data based on their control parameters. Finally, we will provide our conclusions in Sec. V.

## II. EXPERIMENTAL SETUP

The experiments in this study were conducted using the modified gaseous electronics chamber (mGEC).^{37–41} The mGEC reactor’s design has been discussed in detail by Goeckner *et al.*^{37} Originally, the mGEC had an inductively coupled plasma (ICP) source, which was later converted into a CCP source.^{38} A general schematic of the CCP system is depicted in Fig. 1. The plasma-facing powered and grounded aluminum electrodes are surrounded by grounded electrode shields, maintaining a gap of about 2.5 mm between the electrode and shield. The powered electrode measures 11.4 cm in diameter, while the grounded electrode is 15 cm in diameter. The gap between them is adjustable from 2 to 12.5 cm. Although the mGEC has internal walls to control chamber diameter, they were not utilized in these studies.

As shown in Figs. 1 and 2, an RF signal generator (Keysight 33600 waveform generator) produced a 13.56 MHz signal, which was amplified by an ENI A-300 RF Power Amplifier to create the input power. Baseline measurements of supplied and reflected power were taken with a Bird Model 43 wattmeter. The power then passed through an L-type match network, equipped with two adjustable capacitors that were set to minimize the reflected power, before reaching the powered electrode. The load encompassed the plasma, DC bias circuit, current and voltage (I–V) probes, the powered electrode, the 50 $\Omega$ transmission line, and the grounded chamber with a grounded electrode/chuck.

Frequency (MHz) | Z_{1} (Ω) | Z_{2} (Ω) | Z_{g1} (Ω) | Z_{g2} (Ω) |
---|---|---|---|---|
13.56 | 28.97i | 4.96−113.87i | 1.1+11.95i | −71.79i |
27.12 | 0.12+7.6i | 2.39−18.68i | 1.1+23.91i | −35.89i |
40.68 | 11.48+81.83i | 2.52+35.96i | 1.1+35.86i | −23.93i |

As is also shown in Figs. 1 and 2, the DC self-bias, current, and voltage were measured on the 50 $\Omega$ transmission line between the matching network and powered electrode. The DC self-bias measurement setup featured an 84 $\mu$H choke followed by a capacitor to ground for measuring bias. The current probe utilized a Rogowski coil (Pearson Electronics 2877), while the voltage probe was built in-lab by capacitively coupling the transmission line’s powered lead through Teflon. An identical Rogowski coil measured current through the grounded electrode. High-speed data acquisition was performed using a Teledyne LeCroy HDO 6104B oscilloscope with 12-bit vertical resolution and $10^{10}$ samples per second. To enhance voltage probe sensitivity to plasma sheath harmonics, it was connected to the oscilloscope’s 50 $\Omega$ input. A fast Fourier transform (FFT) analysis discerned fundamental and higher harmonics within waveforms, crucial for power and impedance calculations of the sheath and plasma bulk.
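The harmonic extraction step can be illustrated with a short sketch. This is not the authors' analysis code: it evaluates a discrete Fourier sum at one chosen harmonic over an integer number of RF cycles, and the function name `harmonic_amplitude_phase`, the synthetic waveform, and the choice of 738 points per cycle over 13 cycles (figures quoted later in Sec. III) are assumptions made for the example.

```python
import cmath
import math


def harmonic_amplitude_phase(samples, n_cycles, harmonic):
    """Return (peak amplitude, phase in radians) of one harmonic.

    `samples` must span exactly `n_cycles` periods of the fundamental.
    The phase is referenced to a cosine: A*cos(k*w*t + phase).
    """
    n = len(samples)
    k = harmonic * n_cycles  # DFT bin that holds the requested harmonic
    acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
              for i, s in enumerate(samples))
    coeff = 2.0 * acc / n  # factor 2 folds in the negative-frequency bin
    return abs(coeff), cmath.phase(coeff)


# Synthetic waveform (illustrative only): fundamental plus a second harmonic
points_per_cycle, n_cycles = 738, 13
theta = [2 * math.pi * i / points_per_cycle
         for i in range(points_per_cycle * n_cycles)]
wave = [100.0 * math.cos(t - 0.3) + 10.0 * math.cos(2 * t + 0.5) for t in theta]

a1, p1 = harmonic_amplitude_phase(wave, n_cycles, 1)
a2, p2 = harmonic_amplitude_phase(wave, n_cycles, 2)
print(a1, p1, a2, p2)
```

Applying the same function to the current and voltage waveforms would give the per-harmonic magnitudes and the relative phase needed for the power and impedance calculations.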

Calibrating both current and voltage probes involved obtaining amplitude and phase factors as functions of frequency, including relative phase. The oscilloscope’s 50 $\Omega $ input provided a known resistive load for probe calibration. Calculating I–V magnitude and phase at the electrode relied on FFT data from the probes and the parasitic impedances listed in Table I. The electrical length of the transmission line between the probe and electrode, as measured by a network analyzer, was accounted for when reconstructing the I–V waveform at the powered electrode. Additionally, an equivalent circuit, see Fig. 2, incorporating measured parasitics, Table I, was used to calculate current and voltage at the electrodes from raw data at the probes. The methods of parasitic impedance, delay measurements, and calibration are described by Press.^{41} Typical measured and reconstructed I–V traces are shown in Fig. 3.

$V(t)$ is the voltage across the sheath ($V(t)<0$) and is given by^{42–44}

Because the temporal change in the voltage across the sheath is determined by the applied voltage, the temporal change in the conduction current across the sheath will also follow the applied voltage. This implies that at the electrode^{42}

This observation becomes apparent when considering the definition of displacement current,^{45}

Thus, if $V(t)=V_0\cos(\omega t)$ and the total current is $I(t)=I_0\cos(\omega t-\nu)$, with $\nu$ being the phase shift, we can write the even and odd components of $I(t)$ as

$$I_{\mathrm{even}}(t)=I_0\cos(\nu)\cos(\omega t),\qquad I_{\mathrm{odd}}(t)=I_0\sin(\nu)\sin(\omega t).$$
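The even/odd split of $I(t)$ follows from the angle-difference identity for the cosine and can be checked numerically. The sketch below is illustrative only: the function name `split_even_odd` and the numerical values are assumptions, not measured data. If, as the argument above suggests, the conduction current follows the applied voltage, the even amplitude would correspond to the conduction current and the odd amplitude to the displacement current; that identification is treated here as an assumption.

```python
import math


def split_even_odd(i0, nu):
    """Amplitudes of the cos(wt) (even) and sin(wt) (odd) parts of I0*cos(wt - nu)."""
    return i0 * math.cos(nu), i0 * math.sin(nu)


i0, nu = 2.0, math.radians(60.0)  # illustrative values, not measured data
even_amp, odd_amp = split_even_odd(i0, nu)

# Verify I0*cos(wt - nu) == even*cos(wt) + odd*sin(wt) at a few phases
max_error = 0.0
for wt in (0.0, 0.7, 2.1, 4.0):
    total = i0 * math.cos(wt - nu)
    rebuilt = even_amp * math.cos(wt) + odd_amp * math.sin(wt)
    max_error = max(max_error, abs(total - rebuilt))

print(even_amp, odd_amp, max_error)
```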

## III. mGEC MODERATE PRESSURE PLASMA DATABASE

A study of moderate pressure plasma can consist of an indefinite number of experiments involving many adjustable control parameters, such as feed gas, power, pressure, interelectrode gap, etc. To understand the effects of these control parameters, currents and voltages (I–V) were measured at combinations of four control parameters: applied power (10–70 W), electrode gap (20–28 mm), operating pressure (0.5–2.5 Torr), and ratios of feed gases ($\mathrm{Ar/N_2}$, $\mathrm{Ar/O_2}$, $\mathrm{N_2/O_2}$). Specific combinations of these parameters were determined using a commercial software package (JMP). The order in which the runs were taken was randomized. The experiments were repeated four times, allowing for either the direct measurement or calculation of the parameters shown in Table II. In Table III, we show the p-values of the ten primary I–V quantities vs the six external control parameters as calculated using JMP. A p-value is a measure of the probability that a value is due to random chance.^{46} In general, $p<0.05$, or a $5\%$ chance of random occurrence, is considered statistically significant. As seen in Table III, variation in the electrode gap was found to have an insignificant correlation with all the measured quantities, and thus the gap was kept fixed at 24 mm for subsequent experiments.
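The screening decision for the electrode gap can be reproduced directly from the tabulated values. The sketch below copies the "Gap" column of Table III and applies the 0.05 threshold quoted above; the dictionary keys are plain-text labels chosen for this example.

```python
ALPHA = 0.05  # significance threshold quoted in the text

# p-values from the "Gap" column of Table III
gap_p_values = {
    "V_DC": 0.7025, "V_rf,1": 0.7869, "V_rf,2": 0.9940, "V_rf,3": 0.8299,
    "I_rf,1": 0.8046, "I_rf,2": 0.9933, "I_rf,3": 0.8446,
    "I_gnd,1": 0.1812, "I_gnd,2": 0.3369, "I_gnd,3": 0.5578,
}

# Quantities whose dependence on the gap would count as significant
significant = [name for name, p in gap_p_values.items() if p < ALPHA]
print(significant)  # prints []
```

None of the ten quantities falls below the threshold, which is the basis for fixing the gap at 24 mm in the later datasets.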

No. | Parameter | Description |
---|---|---|
Measured parameters | | |
1 | V_{DC} | DC self-bias |
2 | V_{rf,1} | Total 1st harmonic peak to peak voltage |
3 | V_{rf,2} | Total 2nd harmonic peak to peak voltage |
4 | V_{rf,3} | Total 3rd harmonic peak to peak voltage |
5 | I_{rf,1} | Total driven electrode peak to peak current (1st harmonic) |
6 | I_{rf,2} | Total driven electrode peak to peak current (2nd harmonic) |
7 | I_{rf,3} | Total driven electrode peak to peak current (3rd harmonic) |
8 | I_{gnd,1} | Total ground electrode peak to peak current (1st harmonic) |
9 | I_{gnd,2} | Total ground electrode peak to peak current (2nd harmonic) |
10 | I_{gnd,3} | Total ground electrode peak to peak current (3rd harmonic) |
11 | ν_{1,p} | Phase difference at the probe (1st harmonic) |
12 | ν_{2,p} | Phase difference at the probe (2nd harmonic) |
13 | ν_{3,p} | Phase difference at the probe (3rd harmonic) |
Calculated parameters | | |
14 | I_{cond,1} | Driven electrode peak to peak conduction current (1st harmonic) |
15 | I_{cond,2} | Driven electrode peak to peak conduction current (2nd harmonic) |
16 | I_{cond,3} | Driven electrode peak to peak conduction current (3rd harmonic) |
17 | I_{disp,1} | Driven electrode peak to peak displacement current (1st harmonic) |
18 | I_{disp,2} | Driven electrode peak to peak displacement current (2nd harmonic) |
19 | I_{disp,3} | Driven electrode peak to peak displacement current (3rd harmonic) |
20 | R_{1} | Resistive impedance at the driven electrode (1st harmonic) |
21 | R_{2} | Resistive impedance at the driven electrode (2nd harmonic) |
22 | R_{3} | Resistive impedance at the driven electrode (3rd harmonic) |
23 | X_{1} | Reactive impedance at the driven electrode (1st harmonic) |
24 | X_{2} | Reactive impedance at the driven electrode (2nd harmonic) |
25 | X_{3} | Reactive impedance at the driven electrode (3rd harmonic) |
26 | P_{1,e} | Average power at the driven electrode (1st harmonic) |
27 | P_{2,e} | Average power at the driven electrode (2nd harmonic) |
28 | P_{3,e} | Average power at the driven electrode (3rd harmonic) |
29 | P_{1,p} | Average power at the probe (1st harmonic) |
30 | P_{2,p} | Average power at the probe (2nd harmonic) |
31 | P_{3,p} | Average power at the probe (3rd harmonic) |
32 | ν_{1,e} | Phase difference at the driven electrode (1st harmonic) |
33 | ν_{2,e} | Phase difference at the driven electrode (2nd harmonic) |
34 | ν_{3,e} | Phase difference at the driven electrode (3rd harmonic) |

Control | Ar flow (%) | N_{2} flow (%) | O_{2} flow (%) | Pressure (Torr) | Gap (mm) | Power (W) |
---|---|---|---|---|---|---|
Powered electrode | | | | | | |
V_{DC} | 0.0024 | 0.0311 | 0.4070 | <0.0001 | 0.7025 | <0.0001 |
V_{rf,1} | 0.0012 | 0.0043 | 0.7295 | 0.0252 | 0.7869 | <0.0001 |
V_{rf,2} | 0.0001 | 0.0444 | 0.0971 | 0.0007 | 0.9940 | <0.0001 |
V_{rf,3} | 0.9278 | 0.8050 | 0.8758 | <0.0001 | 0.8299 | <0.0001 |
I_{rf,1} | <0.0001 | 0.0083 | 0.0911 | 0.9634 | 0.8046 | <0.0001 |
I_{rf,2} | 0.0001 | 0.0450 | 0.0970 | 0.0007 | 0.9933 | <0.0001 |
I_{rf,3} | 0.8136 | 0.7093 | 0.8912 | <0.0001 | 0.8446 | <0.0001 |
Ground electrode | | | | | | |
I_{gnd,1} | 0.0009 | 0.0225 | 0.3313 | 0.0031 | 0.1812 | <0.0001 |
I_{gnd,2} | 0.3328 | 0.4769 | 0.7984 | <0.0001 | 0.3369 | <0.0001 |
I_{gnd,3} | 0.0585 | 0.1756 | 0.5999 | 0.0012 | 0.5578 | <0.0001 |

Dataset | Factorial | No. of runs | Mixture | Flow ratio (%) | Pressure (Torr) | Power (W) | Gap (mm) |
---|---|---|---|---|---|---|---|
1 | Level 2 | 4 | Ar:O_{2}, Ar:N_{2}, N_{2}:O_{2} | 0:100, 33:67, 67:33, 100:0 | 0.5, 1.5, 2.5 | 10, 25, 40, 55, 70 | 20, 24, 28 |
2 | Full | 4 | Ar:O_{2}, Ar:N_{2}, N_{2}:O_{2} | 0:100, 33:67, 67:33, 100:0 | 0.5, 1.5, 2.5 | 10, 25, 40, 55, 70 | 24 |
3 | Full | 1 | Ar:O_{2}, Ar:N_{2}, N_{2}:O_{2} | 0:100, 33:67, 67:33, 100:0 | 1, 2 | 10, 25, 40, 55, 70 | 24 |

Once the initial screening was completed, comprehensive experiments were conducted across power, pressure, and gas mixtures. DC self-bias as well as magnitudes and phases of electrode voltages and currents were measured at the first three harmonics as a function of these control parameters. Power, impedance, conduction, and displacement currents were calculated from the measured magnitudes and phases. Experiments were run for all combinations of pressures of 0.5, 1.5, and 2.5 Torr and nominal powers of 10, 25, 40, 55, and 70 W (read from the wattmeter) for pure argon, nitrogen, and oxygen as well as different mixtures between them; see Table IV. Each experiment was run four times, and 13 full RF cycles were processed for each run. Each cycle consisted of about 738 data points acquired by the oscilloscope, giving a phase resolution of about one-half degree. A $13\times 4$ matrix was built for all parameters of interest for each run. For each of the four iterations, average parameter values as well as the error bars were calculated by taking the mean and the standard deviation of the 13 acquired cycles. To study interpolation using machine learning, additional data were later collected at 1 and 2 Torr for the same gap, gas mixtures, and power ranges. Data at 1 and 2 Torr were collected only once and postprocessed in a similar way.
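The quoted number of points per cycle and the phase resolution follow from the sampling rate and drive frequency given in Sec. II. A minimal arithmetic check, with variable names chosen for this example:

```python
sample_rate = 1e10       # samples per second (oscilloscope setting, Sec. II)
rf_frequency = 13.56e6   # Hz, drive frequency of the RF generator

points_per_cycle = sample_rate / rf_frequency   # samples in one RF period
phase_step_deg = 360.0 / points_per_cycle       # phase advance per sample

print(points_per_cycle, phase_step_deg)
```

The result is a little under 738 points per cycle and a step slightly below half a degree, consistent with the figures stated above.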

The results of these experiments were used to construct three separate matrices from the available datasets (Datasets 1, 2, and 3) for machine learning and statistical analysis with JMP. The first matrix consists of the ten measured phase-independent parameters, namely, voltage and current magnitudes at all three harmonics and DC bias (parameters 1–10 in Table II), from all three datasets in Table IV. For each parameter of the first two datasets, the mean of the four runs was used. The second matrix has all 34 parameters from all three datasets, including all four runs for Datasets 1 and 2, giving a size of $894\times 34$. This is the comprehensive matrix that includes all experimental data, phase dependent or not. Finally, the third matrix consists of all 34 parameters, with only the run corresponding to the maximum electrode power of the four runs for the first two datasets, giving a size of $201\times 34$. This matrix was prepared to represent the “best” of the four runs since power loss in the transmission line/match network was minimal for this run.
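The row counts of these matrices can be cross-checked against Table IV. The sketch below enumerates the full-factorial conditions of Datasets 2 and 3 under one assumption that is not stated in the text: pure-gas endpoints shared between two mixture pairs (for example, 100:0 Ar:O_{2} and 100:0 Ar:N_{2}) are counted once. The figure of 201 conditions at 0.5, 1.5, and 2.5 Torr is taken from the text rather than derived, because the exact composition of Dataset 1 is not recoverable from Table IV alone.

```python
from itertools import product

# Gas compositions as (Ar %, N2 %, O2 %); shared pure-gas endpoints collapse
# automatically because `compositions` is a set (assumption of this sketch).
ratios = [(0, 100), (33, 67), (67, 33), (100, 0)]
compositions = set()
for a, b in ratios:
    compositions.add((a, 0, b))   # Ar:O2
    compositions.add((a, b, 0))   # Ar:N2
    compositions.add((0, a, b))   # N2:O2

powers = [10, 25, 40, 55, 70]  # nominal power levels (W)
dataset2 = list(product(compositions, [0.5, 1.5, 2.5], powers))
dataset3 = list(product(compositions, [1.0, 2.0], powers))

n_main = 201  # conditions at 0.5, 1.5, 2.5 Torr, as quoted in the text
matrix1_rows = n_main + len(dataset3)       # means of runs plus Dataset 3
matrix2_rows = 4 * n_main + len(dataset3)   # all four runs plus Dataset 3

print(len(compositions), len(dataset2), len(dataset3), matrix1_rows, matrix2_rows)
```

Under that assumption, Dataset 3 contributes 90 conditions, and the totals reproduce the 291 and 894 observations quoted for Matrices 1 and 2.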

## IV. MACHINE LEARNING ANALYSIS

Empirical modeling, characterized by its focus on understanding observed data patterns, is a cornerstone of scientific exploration. It delves into datasets, uncovers hidden trends, and generates insights into the underlying processes. Through statistical techniques, visualization, and exploratory data analysis, empirical modeling aids in developing hypotheses and theories. It is particularly useful in initial data exploration. On the other end of the spectrum, deterministic modeling thrives on well-defined rules and equations. By simulating real-world processes and interactions, deterministic models offer precise insights into cause-and-effect relationships. Despite the precision of deterministic modeling, its accuracy is often limited by computational resources. Unlike empirical modeling, deterministic modeling is often susceptible to limitations in the underlying approximations of the model itself, which makes it unwieldy especially where the physics is not well understood.

Machine learning has the potential to change how industrial plasmas are optimized for a given use.^{14} In general, ML relies on advanced statistical analysis of data sets to generate insights into underlying processes. A number of these analysis techniques have been developed and are readily available. In our ML-based examination of moderate pressure CCP discharges, we will make use of both “classification” and “regression” ML techniques available in MATLAB (R2023a). Specifically, MATLAB provides a built-in set of both “supervised” (for labeled input, X, and output, Y, data) and “unsupervised” (for unlabeled data) ML techniques. Supervised learning results in a mapping from the input variables to the output values via $Y=f(X)$. Supervised techniques can be further divided into “classification” and “regression” techniques. Classification techniques are those that seek to apply a label to a given input (e.g., cat vs dog picture), while regression provides a continuous numerical output value.

### A. Supervised regression analysis

Four different supervised ML regression models available in MATLAB were compared with each other in our studies:

1. Deep Neural Network (DNN) regression with Levenberg–Marquardt backpropagation (MATLAB function *trainlm*).^{47,48}
2. Tree Ensemble (TE) regression^{49–51} with an ensemble of decision trees (MATLAB function *treebagger*).^{50,52}
3. Naive Bayes (NB) (MATLAB function *fitrgp*).^{53–56}
4. Support Vector (SV) regression model (MATLAB function *fitrsvm*).^{57–61}

Each of these regression techniques focuses on learning relationships between input features and output variables using labeled training data. This approach enables the creation of predictive models capable of making accurate forecasts on new, unseen data. The strength of supervised regression lies in its adaptability to diverse scenarios. Through algorithms like Neural Networks, Tree Ensembles, Naive Bayes, and Support Vector Machines, it can capture intricate, nonlinear relationships present in data. By training on existing data, supervised regression extracts patterns, enabling it to generalize to new data while providing accurate predictions.

Neural Networks^{62} comprise layers of nodes that learn relationships by adjusting weights through iterative training. Neural Networks excel at handling complex, nonlinear data and are suitable for large datasets where nuanced relationships are key. Specifically, the Levenberg–Marquardt method is an algorithm that makes use of both the Gauss–Newton method and the steepest descent method to solve nonlinear least squares problems.^{47,48} On the other hand, TEs operate through ensemble learning, combining multiple decision trees to enhance predictive accuracy.^{63} Decision trees split data based on features, and Random Forests aggregate their outputs. This approach reduces overfitting and provides robustness, making it suitable for various data types and large datasets. Naive Bayes,^{64} another widely used regression method, is a probabilistic algorithm rooted in Bayesian probability. Despite its assumption of feature independence, Naive Bayes performs remarkably well in text and categorical data analysis. It is particularly efficient for high-dimensional data and quick predictions. Finally, Support Vector Machines aim to find the best hyperplane that separates data points or predicts a continuous value.^{58} They excel in scenarios with distinct class separation or complex feature interactions, often utilizing kernel functions to capture nonlinear relationships. Support Vector Machines prioritize generalization and perform well even in high-dimensional spaces.
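For reference, the way the Levenberg–Marquardt method blends the two schemes can be written in its standard textbook form. The symbols below are introduced here for illustration and are not taken from the article: $w$ is the weight vector, $r(w)$ the vector of residuals between model output and data, $J=\partial r/\partial w$ the Jacobian, $\lambda>0$ the damping parameter, and $\delta$ the weight update.

```latex
\left(J^{\mathsf T}J+\lambda I\right)\delta=-J^{\mathsf T}r,
\qquad w\leftarrow w+\delta .
% \lambda \to 0:      \delta \to -(J^{\mathsf T}J)^{-1}J^{\mathsf T}r   (Gauss--Newton step)
% \lambda \to \infty: \delta \approx -\tfrac{1}{\lambda}J^{\mathsf T}r  (small steepest-descent step
%                     on \tfrac12\lVert r\rVert^2, whose gradient is J^{\mathsf T}r)
```

In typical implementations, $\lambda$ is decreased after a step that lowers the squared error and increased otherwise, so the update moves toward Gauss–Newton behavior near a minimum and toward gradient descent far from it.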

These regression models were used to train and predict the 34 different parameters shown in Table II vs the plasma control parameters shown in Table IV. The “training” serves to adjust the weights of each input parameter, thereby adjusting the results reaching the output layer. The input variables/parameters were standardized before training, i.e., data for each variable were rescaled to have a mean of 0 and a standard deviation of 1. Additionally, the model hyperparameters^{65} were optimized during training. Hyperparameters are essential settings or configurations that are not learned from the training data but are predefined by the model developer. These parameters govern the speed, quality, and performance of the machine-learning model during the training process. Examples of hyperparameters include learning rates in gradient descent, batch size, the depth of decision trees in TEs, and the number of hidden layers in neural networks. The choice of hyperparameters significantly impacts a model’s ability to learn and generalize to new, unseen data. Therefore, hyperparameter tuning, which involves systematically optimizing these settings, is a crucial step in building effective machine-learning models. For example, the TE model was optimized with Bayesian optimization^{66} using quantile error.^{67} The predictor importance object “OOBPermutedPredictorDeltaError” in the “treebagger” function in MATLAB was used to infer the minimum number of inputs required while predicting high-error bar parameters. On the other hand, the neural network model was optimized by using 15% of the data for validation and 15% for testing, while the rest was used for training. Additionally, the depth of the neural network for each parameter was determined by finding the number of hidden layers between 10 and 50 giving the least mean squared error (MSE) during training. An example of MSE variation with the number of hidden layers is shown in Fig. 4.
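The standardization and data-partition steps described above can be sketched in a few lines. This is an illustration in Python, not the MATLAB workflow used in the study; the helper names `standardize` and `split_indices`, the use of the sample standard deviation, and the random seed are assumptions made for the example.

```python
import random
import statistics


def standardize(column):
    """Rescale values to zero mean and unit (sample) standard deviation."""
    mu = statistics.mean(column)
    sigma = statistics.stdev(column)
    return [(x - mu) / sigma for x in column]


def split_indices(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices 0..n-1 and split them into train/validation/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = round(n * val_frac)
    n_test = round(n * test_frac)
    return idx[n_val + n_test:], idx[:n_val], idx[n_val:n_val + n_test]


power = [10, 25, 40, 55, 70]  # nominal power levels (W) from Table IV
z_power = standardize(power)

# 201 observations, the size of the training set at 0.5, 1.5, and 2.5 Torr
train_idx, val_idx, test_idx = split_indices(201)
print(len(train_idx), len(val_idx), len(test_idx))
```

Standardizing each input separately keeps quantities with large numerical ranges, such as power in watts, from dominating inputs with small ranges, such as pressure in Torr.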

Both interpolation and extrapolation of the data sets were studied using our supervised regression models. For interpolation, the models were trained at 0.5, 1.5, and 2.5 Torr and tested at 1 and 2 Torr. For extrapolation, prediction accuracy was checked at both the high and low end of the pressure range, namely, at 2.5 and 0.5 Torr. For the extrapolation to 2.5 Torr, the models were trained with data at the lower pressures. Similarly, for the extrapolation to 0.5 Torr, the models were trained with data at higher pressures. This was done to check how the models perform at a pressure where the physics may be different than where they were trained. For predicting phase-independent parameters, only the control parameters: the gas ratio, electrode gap, pressure, and nominal power were used as inputs. Since these parameters are known to have a low error bar,^{36} mean values of the measured parameters from Matrix 1 as shown in Table V were used for training and testing the machine learning models. On the other hand, all available data from the four repeated experiments (Matrix 2) were used to train and predict the phases and phase-dependent parameters as they may have a high error bar. Training the models with data from all experiments enables them to learn from an increased number of observations as well as from additional input parameters, the values of which may vary between repeated runs, unlike the control parameters. These additional inputs can be used given they can be predicted accurately beforehand. This allows one to use cumulative inputs with the help of predictor importance ranking by the TE model until the desired prediction accuracy is achieved.
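The interpolation and extrapolation tests differ only in which pressures are held out of training. A minimal sketch of that partition, using hypothetical records for a single gas composition (the field names `pressure` and `power` and the helper `split_by_pressure` are assumptions for this example):

```python
def split_by_pressure(rows, test_pressures):
    """Return (train, test); rows at any of `test_pressures` go to the test set."""
    held_out = set(test_pressures)
    train = [r for r in rows if r["pressure"] not in held_out]
    test = [r for r in rows if r["pressure"] in held_out]
    return train, test


# Hypothetical records: one row per pressure/power combination
rows = [{"pressure": p, "power": w}
        for p in (0.5, 1.0, 1.5, 2.0, 2.5)
        for w in (10, 25, 40, 55, 70)]

# Interpolation: train at 0.5, 1.5, 2.5 Torr and test at 1 and 2 Torr
interp_train, interp_test = split_by_pressure(rows, [1.0, 2.0])

# Extrapolation: hold out one end of the pressure range, train on the rest
extrap_hi_train, extrap_hi_test = split_by_pressure(rows, [2.5])
extrap_lo_train, extrap_lo_test = split_by_pressure(rows, [0.5])

print(len(interp_train), len(interp_test), len(extrap_hi_train), len(extrap_hi_test))
```

Holding out an entire pressure level, rather than a random subset of rows, is what makes the test a check of behavior at conditions the model has never seen.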

| Matrix 1 | Matrix 2 | Matrix 3 |
|---|---|---|
| 10 phase-independent parameters (e.g., voltage and current magnitudes at all harmonics, and DC bias) and the corresponding control parameters from all three sets above. For the first two datasets, the mean of the four runs was used. Number of observations: 291. | All 34 parameters (both phase independent and dependent) and the corresponding control parameters from all three datasets above, including all four runs for Datasets 1 and 2. Number of observations: 894. | All 34 parameters (both phase independent and dependent) and the corresponding control parameters, with only the experiment corresponding to the maximum electrode power of the four runs for the first two sets. Number of observations: 201. |

#### 1. Prediction accuracy check with regression models

Phase-independent parameters were examined first, with both interpolation and extrapolation. Matrix 1 was used to predict and check the fit for the parameters at 1 and 2 Torr. A total of 201 observations at 0.5, 1.5, and 2.5 Torr were used for training, with the control parameters (a $201\times 6$ matrix) as inputs. The predictions were then checked against the remaining 90 independent observations at 1 and 2 Torr (see Table VI). Similarly, for extrapolation at 2.5 and 0.5 Torr, 216 observations were used for training, and the predictions were checked against the remaining 75 independent observations (see Table VI). For the voltages and the currents to both the powered and grounded electrodes at the first and second harmonics, at least one model gave a correlation coefficient of 0.95 or higher between the measured and predicted data. The only exception was the second harmonic current to the ground electrode at 0.5 Torr, with a correlation coefficient of 0.84. On the other hand, the predictions for the third harmonic values, despite their low error bar,^{36} were in most cases less accurate than the others ( $r<0.95$).
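The accuracy metric reported in Table VI is a correlation coefficient between measured and predicted values. The Python sketch below assumes that this is the Pearson correlation coefficient, which the text does not state explicitly; the function name and the example values are illustrative only.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    covariance = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance / math.sqrt(var_m * var_p)


# Hypothetical measured and predicted values, not data from this study.
measured_values = [10.0, 12.0, 15.0, 19.0, 24.0]
predicted_values = [10.5, 11.5, 15.5, 18.0, 25.0]
r_value = pearson_r(measured_values, predicted_values)
```

A value close to 1 indicates that the predictions track the measurements closely, while a negative value indicates that the predicted trend is opposite to the measured one.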

| Phase-independent parameters | Interpolation (1–2 Torr) | | | | Extrapolation (0.5 Torr) | | | | Extrapolation (2.5 Torr) | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameter | TE | NB | DNN | SV | TE | NB | DNN | SV | TE | NB | DNN | SV |
| V_{DC} | 0.98 | 0.97 | 0.95 | 0.97 | 0.89 | 0.94 | 0.32 | 0.92 | 0.96 | 0.88 | 0.86 | 0.86 |
| V_{rf,1} | 0.99 | 0.99 | 0.99 | 0.99 | 0.98 | 0.99 | 0.99 | 0.96 | 0.99 | 0.99 | 0.98 | 0.97 |
| V_{rf,2} | 0.99 | 0.99 | 0.96 | 0.98 | 0.93 | 0.94 | 0.95 | 0.97 | 0.99 | 0.99 | 0.98 | 0.94 |
| V_{rf,3} | 0.92 | 0.92 | 0.93 | 0.94 | 0.94 | 0.94 | 0.96 | 0.97 | 0.91 | 0.88 | 0.63 | 0.82 |
| I_{rf,1} | 0.99 | 0.99 | 0.99 | 0.99 | 0.95 | 0.96 | 0.86 | 0.97 | 0.99 | 0.98 | 0.97 | 0.98 |
| I_{rf,2} | 0.99 | 0.99 | 0.97 | 0.99 | 0.93 | 0.94 | 0.82 | 0.96 | 0.99 | 0.98 | 0.97 | 0.96 |
| I_{rf,3} | 0.93 | 0.91 | 0.92 | 0.95 | 0.93 | 0.93 | 0.91 | 0.96 | 0.91 | 0.86 | 0.76 | 0.46 |
| I_{gnd,1} | 0.98 | 0.99 | 0.99 | 0.99 | 0.93 | 0.94 | 0.94 | 0.96 | 0.98 | 0.99 | 0.96 | 0.96 |
| I_{gnd,2} | 0.98 | 0.96 | 0.92 | 0.95 | 0.84 | 0.84 | 0.79 | 0.81 | 0.98 | 0.97 | 0.37 | 0.83 |
| I_{gnd,3} | 0.90 | 0.86 | 0.82 | 0.79 | 0.68 | 0.76 | 0.78 | 0.82 | 0.92 | 0.95 | 0.38 | 0.84 |

Measured phases were found to have a smaller signal-to-noise ratio (SNR) than the phase-independent parameters due to the tuning variability of the matching network.^{36} Here, the SNR is the average measured parameter value divided by the standard deviation. The accuracy of prediction in such cases could be related to the error bar magnitude of that parameter. Hence, interpolation and extrapolation were studied separately for the phase-dependent parameters, specifically the phase difference, conduction current, and displacement current at the powered electrode, using all available data from Matrix 2. For interpolating these parameters, predictions were checked against 90 independent observations at 1 and 2 Torr (see Table VII), with models trained on the remaining 804 observations at 0.5, 1.5, and 2.5 Torr. For extrapolation at 2.5 and 0.5 Torr, 594 observations were used for training and the remaining 300 independent observations for testing (see Table VII). When using only the control parameters as inputs (Table VII), only the displacement current predictions were accurate, with a correlation coefficient >0.95 in most cases. Although the displacement current depends on phase, its low error bar^{36} could explain its prediction accuracy with only control parameters as inputs. Because the value of sine is small at the measured phases of the cycle [see Eq. (10)], the variation in displacement current magnitude is less sensitive to a change in phase, resulting in a lower error bar. Nonetheless, prediction accuracy increased even more with the total current as an additional input. The first harmonic current to the powered electrode was chosen as the additional input since it was ranked as the most important predictor by the TE model for both the interpolation and extrapolation cases. Once the total and displacement currents were predicted accurately, high error bar parameters such as the conduction current and phase were predicted next.
Predictions for the high error bar parameters were less accurate with only the control parameters as inputs. With the total and displacement currents as additional inputs, however, the machine learning models were able to predict the conduction current and phase with great accuracy (Fig. 5). This is not surprising, given that the phase shift and conduction current magnitude can be calculated in a single step once the total and displacement currents are known [see Eqs. (9) and (10)].
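The cumulative-input strategy described above (start from the control parameters, then add previously predicted quantities in order of predictor importance until the target accuracy is reached) can be sketched in Python as follows. The function `select_cumulative_inputs`, the input names, the 0.95 target, and the placeholder scores are all assumptions for illustration; in practice the score would come from retraining and testing a model with each input set.

```python
def select_cumulative_inputs(base_inputs, ranked_extra_inputs, score_fn, target_score=0.95):
    """Add extra inputs in ranked order until score_fn reaches target_score."""
    selected = list(base_inputs)
    score = score_fn(selected)
    for name in ranked_extra_inputs:
        if score >= target_score:
            break  # desired accuracy reached, stop adding inputs
        selected.append(name)
        score = score_fn(selected)
    return selected, score


CONTROL_INPUTS = ["gas_ratio", "electrode_gap", "pressure", "nominal_power"]
# Extra inputs in descending order of (hypothetical) predictor importance.
RANKED_EXTRAS = ["I_rf_1", "I_disp_1"]


def toy_score(inputs):
    """Placeholder scores indexed by the number of extra inputs; not measured results."""
    n_extras = len(inputs) - len(CONTROL_INPUTS)
    return [0.70, 0.90, 0.99][n_extras]


chosen_inputs, final_score = select_cumulative_inputs(
    CONTROL_INPUTS, RANKED_EXTRAS, toy_score
)
```

With these placeholder scores, both extra inputs are added before the target is met; a real scoring function may stop earlier or never reach the target.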

| | Interpolation (1–2 Torr) | | | | Extrapolation (0.5 Torr) | | | | Extrapolation (2.5 Torr) | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameter | TE | NB | DNN | SV | TE | NB | DNN | SV | TE | NB | DNN | SV |
| First harmonic phase-dependent parameters | | | | | | | | | | | | |
| I_{cond,1} | 0.67 | 0.67 | 0.65 | 0.69 | 0.59 | 0.81 | 0.05 | 0.84 | 0.82 | 0.91 | 0.19 | 0.85 |
| I_{disp,1} | 0.99 | 0.96 | 0.99 | 0.92 | 0.95 | 0.96 | 0.88 | 0.97 | 0.99 | 0.96 | 0.94 | 0.89 |
| R_{1} | 0.92 | 0.88 | 0.84 | 0.77 | 0.84 | 0.85 | 0.49 | 0.83 | 0.93 | 0.92 | 0.29 | 0.84 |
| X_{1} | 0.76 | 0.8 | 0.74 | 0.89 | 0.88 | 0.85 | 0.37 | 0.92 | 0.78 | 0.73 | 0.48 | 0.76 |
| P_{1,e} | 0.73 | 0.72 | 0.71 | 0.84 | 0.82 | 0.96 | 0.42 | 0.95 | 0.81 | 0.92 | 0.65 | 0.93 |
| P_{1,p} | 0.86 | 0.85 | 0.78 | 0.94 | 0.91 | 0.98 | 0.5 | 0.96 | 0.9 | 0.94 | 0.1 | 0.95 |
| ν_{1,e} | 0.36 | 0.29 | 0.29 | 0.46 | 0.37 | 0.57 | 0.21 | 0.59 | 0.59 | 0.56 | 0.17 | 0.7 |
| ν_{1,p} | 0.69 | 0.64 | 0.61 | 0.7 | 0.35 | 0.32 | <0 | 0.32 | 0.86 | 0.76 | 0.61 | 0.63 |
| Second harmonic phase-dependent parameters | | | | | | | | | | | | |
| I_{cond,2} | 0.91 | 0.98 | 0.94 | 0.78 | 0.9 | 0.94 | 0.93 | 0.92 | 0.99 | 0.97 | 0.91 | 0.88 |
| I_{disp,2} | 0.93 | 0.99 | 0.98 | 0.82 | 0.92 | 0.94 | 0.88 | 0.97 | 0.99 | 0.94 | 0.75 | 0.94 |
| R_{2} | 0.33 | 0.38 | 0.35 | 0.32 | 0.32 | 0.35 | 0.11 | 0.36 | 0.56 | 0.45 | 0.10 | <0 |
| X_{2} | <0 | <0 | <0 | <0 | 0.63 | 0.46 | <0 | 0.47 | 0.24 | <0 | <0 | 0.55 |
| P_{2,e} | 0.92 | 0.99 | 0.95 | 0.99 | 0.85 | 0.9 | 0.8 | 0.86 | 0.99 | 0.79 | 0.85 | 0.96 |
| P_{2,p} | 0.71 | 0.66 | 0.77 | 0.52 | 0.72 | 0.81 | 0.66 | 0.74 | 0.88 | 0.7 | 0.82 | 0.44 |
| ν_{2,e} | 0.38 | 0.42 | 0.31 | 0.42 | 0.26 | 0.24 | 0.05 | 0.31 | 0.69 | 0.44 | <0 | 0.41 |
| ν_{2,p} | 0.28 | 0.31 | 0.05 | 0.33 | 0.34 | 0.16 | 0.18 | 0.28 | 0.21 | 0.48 | 0.14 | 0.07 |
| Third harmonic phase-dependent parameters | | | | | | | | | | | | |
| I_{cond,3} | 0.75 | 0.80 | 0.76 | 0.76 | 0.83 | 0.91 | 0.53 | 0.90 | 0.73 | 0.62 | 0.17 | <0 |
| I_{disp,3} | 0.85 | 0.86 | 0.92 | 0.87 | 0.93 | 0.94 | 0.83 | 0.95 | 0.90 | 0.78 | 0.55 | 0.82 |
| R_{3} | 0.05 | <0 | 0.01 | 0.02 | 0.25 | 0.13 | 0.11 | 0.16 | 0.12 | 0.55 | <0 | 0.41 |
| X_{3} | 0.4 | 0.44 | 0.4 | 0.53 | 0.41 | 0.27 | 0.02 | 0.37 | 0.37 | 0.33 | 0.09 | 0.47 |
| P_{3,e} | 0.75 | 0.76 | 0.51 | 0.69 | 0.82 | 0.83 | 0.73 | 0.89 | 0.77 | 0.01 | 0.08 | 0.75 |
| P_{3,p} | 0.6 | 0.58 | 0.54 | 0.51 | 0.65 | 0.83 | 0.17 | 0.8 | 0.62 | <0 | <0 | 0.65 |
| ν_{3,e} | 0.08 | 0.04 | 0.02 | ≈0 | 0.25 | 0.17 | <0 | 0.17 | 0.15 | 0.2 | <0 | 0.39 |
| ν_{3,p} | 0.17 | 0.15 | 0.13 | 0.30 | 0.25 | 0.29 | <0 | 0.32 | 0.11 | 0.24 | <0 | 0.46 |

#### 2. Extrapolation beyond mGEC database with regression models

Based on the above effort, we recognize that ML can be used to extrapolate beyond the known results. Once prediction accuracy has been checked at the high and low ends of the pressure range, namely, at 2.5 and 0.5 Torr, one can use ML to extrapolate further to pressures greater than 2.5 Torr and smaller than 0.5 Torr, where experimental data are unavailable. To demonstrate this, Matrix 3 was used to extrapolate voltage and current parameters at 3.5 and 0.1 Torr (Fig. 6). Predictions were made at 3.5 Torr with the model giving the best fit at 2.5 Torr. The best model was trained for each respective parameter with all available data for that parameter at 0.5, 1.5, and 2.5 Torr. Similarly, predictions were made at 0.1 Torr using the best-fitting model at 0.5 Torr. The ML predictions at both 3.5 and 0.1 Torr were found to be congruent with the I–V trends at increasing/decreasing pressure. This can be seen in Fig. 6, where the predictions for the fundamental voltage and total fundamental current for pure argon, nitrogen, and oxygen at 0.1 and 3.5 Torr are compared to those at 0.5, 1, 2, and 2.5 Torr as well as to the experimental I–V curves at 0.5, 1, 2, and 2.5 Torr.
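The per-parameter choice of the best-fitting model can be illustrated with the correlation coefficients already listed in Table VI. The Python sketch below copies the 2.5 and 0.5 Torr values for three parameters; the dictionary layout, the function name `best_model`, and the tie-breaking rule (first model listed wins) are assumptions, since the text does not state how ties were resolved.

```python
# Correlation coefficients at the extrapolation pressures, copied from Table VI.
R_AT_2P5_TORR = {
    "V_DC": {"TE": 0.96, "NB": 0.88, "DNN": 0.86, "SV": 0.86},
    "V_rf,1": {"TE": 0.99, "NB": 0.99, "DNN": 0.98, "SV": 0.97},
    "I_rf,1": {"TE": 0.99, "NB": 0.98, "DNN": 0.97, "SV": 0.98},
}
R_AT_0P5_TORR = {
    "V_DC": {"TE": 0.89, "NB": 0.94, "DNN": 0.32, "SV": 0.92},
    "V_rf,1": {"TE": 0.98, "NB": 0.99, "DNN": 0.99, "SV": 0.96},
    "I_rf,1": {"TE": 0.95, "NB": 0.96, "DNN": 0.86, "SV": 0.97},
}


def best_model(scores_by_model):
    """Return the model name with the highest correlation coefficient.

    Ties go to the model listed first; this tie-break is an assumption.
    """
    return max(scores_by_model, key=scores_by_model.get)


# Model to use for the 3.5 Torr prediction (best fit at 2.5 Torr) ...
models_for_3p5_torr = {name: best_model(s) for name, s in R_AT_2P5_TORR.items()}
# ... and for the 0.1 Torr prediction (best fit at 0.5 Torr).
models_for_0p1_torr = {name: best_model(s) for name, s in R_AT_0P5_TORR.items()}
```

For V_{DC}, for example, this selects the TE model for the high-pressure extrapolation and the NB model for the low-pressure extrapolation, so the best model can differ between the two ends of the pressure range.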

The predictions shown here are consistent with the transition from a low to moderate pressure discharge. This transition was proposed to be a consequence of the alpha-mode sheath breakdown due to an increased emission of secondary electrons into the bulk plasma^{35} and is accompanied by a sharp growth in the electron density and a rise in the steepness of the I–V curve.^{35} Consistent with this, in Fig. 6 we see a similar rise in steepness in argon and oxygen between 0.1 and 2 Torr, the rate of increase being much higher in argon. These predictions are consistent with the known alpha–gamma transitions for argon^{68} and oxygen.^{3} Additionally, between 2 and 3.5 Torr, we see no significant change in the I–V curves for argon and oxygen. Thus, ML predicts that once the transition is complete ($p>2$ Torr), no significant change in plasma production/ionization mechanisms occurs.

### B. Classification of control parameters

Five different classification models were used for comparison for the supervised classification studies. A linear discriminant analysis (LDA) was used for supervised classification in addition to the four models used for regression. LDA aims to find a linear combination of features or variables that best separates or discriminates between two or more classes or groups.^{69,70} It does this by modeling the distribution of each class, making assumptions about their underlying probability distributions (usually Gaussian), and then calculating the likelihood of a new data point belonging to each class based on these distributions. The key idea behind LDA is to maximize the between-class variance while minimizing the within-class variance, resulting in a set of discriminant functions. These functions are then used to project new data points into a lower-dimensional space, making it easier to classify them into their respective classes. LDA is particularly useful when dealing with multiple classes and can be a powerful tool for dimensionality reduction while preserving class separability.
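The variance criterion described above can be written compactly in the standard Fisher form. The notation is introduced here only for illustration and does not appear elsewhere in this article: $K$ classes $C_k$ with $N_k$ observations $\mathbf{x}_i$ each, class means $\boldsymbol{\mu}_k$, overall mean $\boldsymbol{\mu}$, and projection direction $\mathbf{w}$.

```latex
\begin{align*}
S_W &= \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k}
       (\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^{\mathsf{T}}
       && \text{(within-class scatter)} \\
S_B &= \sum_{k=1}^{K} N_k\,
       (\boldsymbol{\mu}_k - \boldsymbol{\mu})(\boldsymbol{\mu}_k - \boldsymbol{\mu})^{\mathsf{T}}
       && \text{(between-class scatter)} \\
J(\mathbf{w}) &= \frac{\mathbf{w}^{\mathsf{T}} S_B\, \mathbf{w}}
                      {\mathbf{w}^{\mathsf{T}} S_W\, \mathbf{w}}
       && \text{(criterion to be maximized)}
\end{align*}
```

For two classes, the direction that maximizes $J$ is $\mathbf{w} \propto S_W^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$; with $K$ classes, at most $K-1$ discriminant directions exist, which is the source of the dimensionality reduction mentioned above.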

The ML models were used to classify the data by three of the four control parameters, namely, the operating pressure, the electrode gap, and the gas content of the plasma. The designated MATLAB functions for Neural Network, Tree Ensemble, Naive Bayes, Support Vector, and Linear Discriminant Analysis were “patternnet,” “fitcensemble,” “fitcnb,” “fitcsvm,” and “fitcdiscr,” respectively. For the classification studies, Matrix 2 was used to check the prediction accuracy for a given control parameter using all 34 parameters as well as the remaining control parameters as inputs. The models were trained on a randomly selected 80% of the 894 observations and tested on the remaining 20%, which were independent of the training set. The models were optimized during training. The Neural Network model was optimized using 15% of the data for validation and 15% for testing, while the rest was used for training. The number of hidden layers was 30. The shallow models, on the other hand, were optimized by tuning all of the hyperparameters of the respective models. Moreover, predictors were ranked based on their relative importance using the TE model. For a given control parameter, a standard least-squares fitting optimization was also done with JMP using the remaining control parameters as well as all the measured and calculated parameters listed in Table II as predictors. The predictors were ranked in descending order of their p-values. The JMP ranking was then compared to the relative importance ranking from the TE model; see Fig. 7. Here, the relative importance ranking generated by the TE model was found to be very different from the p-value (logworth) ranking generated by the combined least-squares fitting method using JMP.
The least-squares fit method of analyzing DOE data has been used successfully for many years to find the optimal parameter space for material processing.^{27–30} Because the ML predictor importance ranking is radically different from the DOE p-value (logworth) ranking of the parameters, it seems unlikely that one will be able to use ML relative importance ranking values to optimize processes. However, it is worth pointing out that neither DOE nor ML studies can produce an accurate ranking of the optimal parameter space for processing a wafer, as neither is based on a physical model of the discharge.
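The random 80/20 holdout used for the classification studies can be sketched in Python as follows. The function `holdout_split`, the fixed seed, and the choice to round the test-set size down are assumptions for illustration; they are not taken from the MATLAB workflow used in this study.

```python
import random


def holdout_split(n_observations, test_fraction=0.2, seed=0):
    """Return disjoint (train, test) index lists drawn at random."""
    indices = list(range(n_observations))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(n_observations * test_fraction)  # rounds down (assumption)
    return indices[n_test:], indices[:n_test]


# Matrix 2 contains 894 observations.
train_idx, test_idx = holdout_split(894)
```

Because the indices are shuffled before being cut, every class has a chance of appearing in both sets, although a stratified split would be needed to guarantee it.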

#### 1. Results for classification analysis

Classification accuracy for the ML models was checked by plotting confusion matrices^{71} as well as receiver operating characteristic (ROC) curves.^{72} A confusion matrix (Tables IX–XI) summarizes the model’s predictions compared to the actual outcomes. Each table shows the number of experimental or “true” control parameters compared to the number of predicted control parameters. It is observed from these tables that ML classification analysis is able to correctly classify >95% of the control parameters. A ROC curve (Figs. 8–10), on the other hand, is a graphical representation used to evaluate the performance of a binary classification model or a diagnostic test. It plots the true positive rate (TPR) against the false positive rate (FPR) across different threshold values for classifying the positive and negative classes. The area under the curve (AUC) measures the probability that the classifier assigns a higher score to a randomly chosen positive example than to a randomly chosen negative example. Thus, the closer the AUC is to 1, the better the classification, and an AUC of 1 implies a perfect classification by the model. The AUC obtained for each of the ML models and each of the examined control parameters is given in Table VIII. It is observed that the AUC is close to 1 in almost all cases. This implies that the models were able to predict the correct operating pressure, gas ratio, and electrode gap of a given set of measurements almost all the time. This gives high confidence that one can use these techniques in monitoring plasma systems for rapid fault detection.
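Both metrics can be made concrete with a short Python sketch. The AUC is computed directly from its probabilistic definition (the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half), and the total accuracy is the trace of the confusion matrix divided by the number of observations. The function names, scores, labels, and matrix entries are hypothetical; for a multi-class problem, treating each class in a one-vs-rest fashion is an assumption about how the per-class AUC values would be obtained.

```python
def pairwise_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative example."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def accuracy_from_confusion(matrix):
    """Fraction of observations on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical classifier scores and true labels (1 = positive class).
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_labels = [1, 1, 0, 1, 0]
example_auc = pairwise_auc(example_scores, example_labels)

# Hypothetical 2x2 confusion matrix: rows are true classes, columns predictions.
example_confusion = [[50, 2], [3, 45]]
example_accuracy = accuracy_from_confusion(example_confusion)
```

In this toy case, five of the six positive/negative pairs are ordered correctly, and 95 of the 100 observations lie on the diagonal of the confusion matrix.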

| | AUC | | | | | Total % accuracy | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Class name | TE | NB | DNN | SV | LDA | TE | NB | DNN | SV | LDA |
| Pressure classification | | | | | | 98.88 | 88.76 | 98.3 | 98.88 | 96.63 |
| 500 mTorr | 1 | 0.99 | 1 | 1 | 1 | | | | | |
| 1000 mTorr | 0.99 | 0.97 | 1 | 1 | 1 | | | | | |
| 1500 mTorr | 1 | 1 | 1 | 0.99 | 0.99 | | | | | |
| 2000 mTorr | 0.99 | 0.96 | 0.96 | 0.98 | 0.91 | | | | | |
| 2500 mTorr | 1 | 1 | 0.99 | 1 | 0.99 | | | | | |
| Electrode gap classification | | | | | | 97.19 | 97.75 | 97.2 | 97.75 | 97.75 |
| 20 mm | 0.91 | 0.99 | 0.97 | 0.90 | 1 | | | | | |
| 24 mm | 0.96 | 0.99 | 0.99 | 0.92 | 0.99 | | | | | |
| 28 mm | 0.99 | 1 | 1 | 1 | 0.99 | | | | | |
| Feed gas classification | | | | | | 91.01 | 87.64 | 95.5 | 93.26 | 93.82 |
| Ar (pure) | 1 | 0.99 | 1 | 1 | 1 | | | | | |
| Ar:N_{2} ∼ 2:1 | 0.99 | 0.98 | 1 | 0.97 | 0.98 | | | | | |
| Ar:N_{2} ∼ 1:2 | 1 | 0.99 | 1 | 1 | 0.99 | | | | | |
| N_{2} (pure) | 0.99 | 0.99 | 0.99 | 0.99 | 0.95 | | | | | |
| N_{2}:O_{2} ∼ 2:1 | 0.99 | 0.99 | 0.99 | 0.99 | 0.97 | | | | | |
| N_{2}:O_{2} ∼ 1:2 | 0.99 | 0.99 | 1 | 1 | 0.98 | | | | | |
| O_{2} (pure) | 0.99 | 0.99 | 0.99 | 0.99 | 0.98 | | | | | |
| O_{2}:Ar ∼ 2:1 | 0.98 | 0.98 | 0.91 | 0.91 | 0.82 | | | | | |
| O_{2}:Ar ∼ 1:2 | 1 | 0.99 | 1 | 0.99 | 0.98 | | | | | |

## V. CONCLUSIONS AND FUTURE WORK

In this article, we have used statistical as well as machine learning predictions to analyze I–V data in a moderate-pressure CCP. We have demonstrated that DNNs, as well as other commonly used machine learning models, can be a useful tool for extrapolating data, even data with a high error bar in a transitional regime. Specifically, phase data with high error bars can be predicted with great accuracy, which can be used to automatically tune matching networks. Classification of control parameters is another possible application of these models, provided that a large set of measured data is available. The models were able to identify the gas ratio in the feed gas as well as the operating pressure and electrode gap in almost all cases. The importance of the predictors was ranked for these classification predictions. While the input parameter importance ranking can give some insight into the physics, comparison with physics-informed models is necessary. Physics-informed learning can integrate domain knowledge and physical laws into ML models, improving their interpretability and accuracy, even with imperfect data.^{11} In future work, physics-informed learning, such as physics-informed neural networks (PINNs),^{11,15} will be explored on the mGEC moderate pressure database.

## ACKNOWLEDGMENTS

This work was supported by a generous donation from Applied Materials Inc.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

### Author Contributions

**Shadhin Hussain:** Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Validation (equal); Writing – original draft (equal); Writing – review & editing (equal). **David J. Lary:** Formal analysis (equal); Methodology (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal). **Ken Hara:** Formal analysis (equal); Methodology (equal); Supervision (equal); Validation (equal); Writing – review & editing (equal). **Kallol Bera:** Funding acquisition (equal); Project administration (equal); Resources (equal); Supervision (equal). **Shahid Rauf:** Funding acquisition (equal); Project administration (equal); Resources (equal); Supervision (equal). **Matthew Goeckner:** Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Supervision (equal); Writing – original draft (equal); Writing – review & editing (equal).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.

## REFERENCES

*Radio-Frequency Capacitive Discharges*

*The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake our World*

*Journal of Physics: Conference Series* (IOP Publishing, Bristol, UK, 2023), Vol. 2439, p. 012016.

*Glow Discharge Processes: Sputtering and Plasma Etching*

*Plasma Etching: An Introduction*, edited by D. Manos and D. Flamm (Academic, New York, 1989).

*2015 12th International Conference on Service Systems and Service Management (ICSSSM)*, Guangzhou, China, 22–24 June 2015 (IEEE, New York, 2015), pp. 1–5.

*Handbook 151: NIST/SEMATECH e-Handbook of Statistical Methods* (National Institute of Standards and Technology, Gaithersburg, MD, 2002), http://www.itl.nist.gov/div898/handbook/mpc/mpc.htm.

*Principles of Plasma Discharges and Materials Processing*

*Classification and Regression Trees*, The Wadsworth Statistics/Probability Series (Wadsworth International Group, Belmont, CA, 1984).

*Gaussian Processes for Machine Learning*

*Estimation of Dependences Based on Empirical Data*, Springer Series in Statistics (Springer, New York, 1982).

*The Nature of Statistical Learning Theory*

*The Nature of Statistical Learning Theory*, 2nd ed., Statistics for Engineering and Information Science (Springer, New York, 2000).

*Estimation of Dependences Based on Empirical Data; Empirical Inference Science: Afterword of 2006*, 2nd ed., Information Science and Statistics (Springer, New York, 2006).

*Neural Networks and Learning Machines*

*Proceedings of 3rd International Conference on Document Analysis and Recognition*

*Optimization Techniques IFIP Technical Conference*, Novosibirsk, July 1–7, 1974 (Springer, New York, 1975), pp. 400–404.

*Quantile Regression*

*Discriminant Analysis and Statistical Pattern Recognition*