Machine learning is a useful tool for extracting hidden information from complex measurement data obtained via surface analysis, as in secondary ion mass spectrometry. Flexible learning methods often require considerable effort to adjust parameters, as these parameters may have a significant effect on the results. However, machine learning methods enable the extraction of new information that cannot be found by manual analysis. This paper presents some examples of complex data analyses using conventional multivariate analysis methods based on linear combinations (principal component analysis and multivariate curve resolution), an unsupervised learning method based on artificial neural networks (sparse autoencoder), and a supervised learning method based on decision trees (random forest). To obtain reproducible and useful results from machine learning applications to surface analysis data, the preparation of data sets—including the selection of variables and the raw data conversion process—is crucial. Moreover, for supervised learning, the data set must contain sufficient information representing the analytical purposes, such as the chemical structures of unknown samples, material types, and physical or chemical properties of particular materials.

With the development of surface analysis techniques such as secondary ion mass spectrometry (SIMS), data sets obtained from sophisticated machines have become extremely rich and complex. Multivariate analysis1–10 and machine learning11–15 methods facilitate the interpretation of complex data. For example, time-of-flight SIMS (ToF-SIMS) is a powerful analytical tool for complex samples including biological cells and tissues that contain completely unknown materials and provides 2D and 3D distribution images of individual materials and mass spectra containing chemical structural information.16 We primarily introduce analytical applications of ToF-SIMS data as examples of complex spectral and image data analysis. 2D or 3D image data with spectral information—such as electron microscope, infrared, and Raman mass imaging data—can be converted into matrix-style data (Fig. 1).17–23 Various numerical analysis methods, including conventional multivariate analysis and machine learning methods, are readily applicable to matrix data sets.
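The spectrum–image conversion of Fig. 1 amounts to unfolding a hyperspectral datacube so that each row of the matrix is one pixel's spectrum. A minimal sketch with numpy follows; the array sizes and synthetic counts are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical hyperspectral datacube: 128 x 128 pixels,
# 511 mass-channel intensities (counts) per pixel
rng = np.random.default_rng(0)
cube = rng.poisson(lam=2.0, size=(128, 128, 511)).astype(float)

# Unfold into matrix-style data (Fig. 1): one row per pixel,
# one column per mass channel
n_x, n_y, n_ch = cube.shape
matrix = cube.reshape(n_x * n_y, n_ch)

# Any column can be folded back into an image for a single mass channel
channel_image = matrix[:, 0].reshape(n_x, n_y)
```

Once in this matrix form, any of the multivariate analysis or machine learning methods discussed below can be applied directly.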

FIG. 1.

General concept of machine learning application to complex data interpretation and spectrum–image data conversion for machine learning.


Multivariate analysis methods—including principal component analysis (PCA), independent component analysis (ICA),23 maximum autocorrelation factor (MAF) analysis,6–8,23 cluster analysis, and nonnegative matrix factorization (NMF) approaches such as multivariate curve resolution (MCR)6,7—help in identifying the main mass peaks of materials within complex ToF-SIMS data, evaluating the distributions of target materials, and investigating materials related to a region of interest. For example, multivariate analysis is useful for evaluating the ToF-SIMS data of protein samples.1,2,9,24–32 The ToF-SIMS spectra of protein-adsorbed samples cannot be readily differentiated by the straightforward presence or absence of unique peaks because all proteins consist of 20 amino acids, and amino acid fragment ions from proteins are generally detected in ToF-SIMS spectra even with gas cluster ion beams.33–35 PCA can be used to classify amino acid fragment ions from multiple protein samples and, subsequently, to distinguish individual proteins in protein mixture samples.1,2,9 This technique is also useful for evaluating the orientations of immobilized proteins and structural changes on the surface.24–32 When PCA is applied to ToF-SIMS image data, the score of each principal component (PC) reflects the distribution of the corresponding material group, and significant mass peaks related to the PC are indicated by PC loadings. Subtle changes in the surface chemical structures of polymer materials can also be evaluated from ToF-SIMS data using multivariate analysis.3,36 The analysis of ToF-SIMS raw data containing images and spectra facilitates the evaluation of complex biological samples. Mass peaks from biomolecules are classified by applying multivariate analysis, and then distribution images of individual functions in tissues are obtained.4,37 Furthermore, partial least squares (PLS) regression is useful for extracting significant ToF-SIMS peaks corresponding to particular factors.3,38
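The NMF-style decomposition underlying MCR can be sketched as follows: the data matrix is approximated as the product of a concentration matrix and a spectrum matrix, both nonnegative. This illustrative example uses synthetic data and scikit-learn's generic NMF, not the MCR implementation employed later in this study:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(5)
# Synthetic nonnegative mixture data: 3 pure "component spectra"
# mixed with random "concentrations" at 400 pixels, plus small noise
pure = rng.random((3, 100))                   # component spectra
conc = rng.random((400, 3))                   # per-pixel concentrations
X = conc @ pure + 0.01 * rng.random((400, 100))

# MCR-style decomposition via nonnegative matrix factorization:
# X ~ W @ H, with W the concentration matrix and H the spectrum matrix
model = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=1000)
W = model.fit_transform(X)
H = model.components_
```

Because both factors are constrained to be nonnegative, each row of `H` can be read as a pure-component spectrum and each column of `W` as its concentration image after folding back to the pixel grid.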

Thus, multivariate analysis is a powerful tool for studying and comparing raw data sets. However, general multivariate analysis methods are based on linear combinations, whereas more complex ToF-SIMS data sets may contain nonlinear factors owing to matrix effects. Such cases warrant nonlinear multivariate analysis via kernel methods and nonlinear dimensionality reduction methods, such as t-distributed stochastic neighbor embedding (t-SNE)23 and uniform manifold approximation and projection (UMAP).23 An artificial neural network (ANN)-based method was employed to chemically classify the SIMS spectra of adsorbed protein films in 2002.12 When experimentally compared with PCA in terms of the characterization of protein spectra, the ANN technique exhibited superior performance, successfully distinguishing the spectra of all of the adsorbed protein films using the entire mass spectrum. However, the computers of the time were not sufficiently powerful to implement complex ANN systems with large numbers of middle layers or to manage data sets exceeding several GB in size. As higher-performing computers and more flexible methods—including ANN-based methods—have been developed since then, these approaches have become increasingly useful for interpreting complex ToF-SIMS data.

More recently, various machine learning methods,13–15,39–48 including those based on ANNs, have also been employed to interpret surface analysis data sets, including mass and spectroscopic imaging data. These numerical analysis methods are powerful tools for extracting more detailed information from raw data than conventional manual analysis. However, some machine learning methods are so flexible that tuning their parameters may be difficult, complicating the analytical results. To obtain reproducible results from machine learning, the principles of the surface analysis method used to obtain the raw data, as well as the purpose of the analysis, must be understood. Although tutorials for multivariate analysis applications to surface analysis are abundant,4,5 this is not the case for machine learning and deep learning, as these approaches have only recently been applied to surface analysis. In this study, the benefits and concerns associated with the use of machine learning for surface analysis data, in terms of the reproducibility of numerical analyses, are summarized.

Applications of machine learning to complex surface analysis data often demand more effort than manual analyses. For example, ANN-based methods, such as the autoencoder11,14,15,40,49 and self-organizing maps (SOMs),42,43,45–47 require complex parameter-tuning procedures. However, applying machine learning methods to data can extract new information that cannot be found by manual analysis. A summary of currently popular methods would be of limited lasting value, as new machine learning methods are constantly being developed. Instead, this study examines the autoencoder as a representative ANN-based method to identify important factors in the application of unsupervised methods. The interpretation procedure for the features obtained by the autoencoder is introduced, and the autoencoder's performance is then compared with that of popular multivariate analysis methods, namely, principal component analysis (PCA) and nonnegative matrix factorization (NMF). All three algorithms are unsupervised learning methods, which are useful for obtaining the outline of data and extracting pure component information, as well as relationships between variables, without any a priori information about the samples. They are specifically useful in the analysis of samples containing unknown factors.

Furthermore, a decision-tree-based supervised learning method, namely, random forest, is introduced and discussed as a representative supervised learning method. Supervised learning methods are useful for finding important variables related to a particular factor within the data. As shown in Fig. 1, supervised learning methods require labels for identifying the data. Annotation and label-setting processes are essential in obtaining the desired prediction results, when using a supervised learning method.41 

Conversion of surface analysis raw data into numerical data is essential for obtaining effective and reproducible results from machine learning methods because all information to be extracted by machine learning must be contained within the converted numerical data. Table I presents a sample data set for supervised learning methods, comprising two primary categories: labels and descriptors. The labels generally reflect the properties of the corresponding data; i.e., they encompass factors expressing materials, chemical structures, and chemical and physical properties, whereas the descriptors correspond to numerical data obtained using surface analysis methods, such as the signals or intensities of energies, wavelengths, and masses. Each subset (a row in Table I) may represent the spectrum of a pixel or voxel from the imaging data.

TABLE I.

Sample data set for supervised learning methods (one-hot encoding labels).

Sample No. | Label 1 | Label 2 | … | Label m | Descriptor 1 | Descriptor 2 | … | Descriptor n
— | — | — | … | — | 0.010 | 0.002 | … | 0.100
— | — | — | … | — | 0.000 | 0.004 | … | 0.001
… | … | … | … | … | … | … | … | …
— | — | — | … | — | 0.040 | 0.000 | … | 0.030

In the raw data conversion process of surface analysis, as in the case of SIMS, the peak picking method may be an issue, as some peaks may overlap or be too weak to be discriminated from noise. The peak auto-search function of the SIMS operation software is generally very useful, although it can omit some of the weak peaks. Binning, wherein a spectrum is divided so that many data points are allocated into each bin, is useful for converting spectral peak intensities into numerical data. If the bin width is very narrow, a large number of peak intensities are generated; in such cases, this number must be reduced by integrating some of the peaks prior to the machine learning process. Moreover, scaling is often necessary to analyze large quantities of data obtained with different machines or under different measurement conditions. Auto scaling, variance scaling, and min–max normalization5,7 are generally effective ways to evaluate different types of data simultaneously. Data correction based on measurement error theories is generally useful in obtaining appropriate results from data analysis; for example, Poisson scaling is generally suitable for ToF-SIMS data because such data typically exhibit approximately Poisson-distributed noise.44
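The conversion steps described above—binning, total-ion-count normalization, min–max normalization, and Poisson scaling—can be sketched as follows. The counts are synthetic, and the bin width and array sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic raw spectra: 100 spectra x 2000 fine mass channels (counts)
raw = rng.poisson(lam=1.5, size=(100, 2000)).astype(float)

# Binning: merge every 4 adjacent channels into one bin (2000 -> 500 variables)
binned = raw.reshape(100, 500, 4).sum(axis=2)

# Total-ion-count normalization (as applied to the data sets in this study)
tic = binned.sum(axis=1, keepdims=True)
normalized = binned / tic

# Min-max normalization per variable (one common scaling option)
mn, mx = normalized.min(axis=0), normalized.max(axis=0)
minmax = (normalized - mn) / np.where(mx > mn, mx - mn, 1.0)

# Poisson scaling: divide each variable by the square root of its mean count,
# which approximately equalizes Poisson-distributed noise across variables
poisson_scaled = binned / np.sqrt(binned.mean(axis=0).clip(min=1e-12))
```

In practice, the choice among these scalings depends on the measurement conditions and noise characteristics of the instrument, as discussed above.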

Once the raw data have been converted into numerical matrix data, any learning method can be applied. When supervised learning methods are employed, the data set format (e.g., the contents of labels and variables in descriptors) must be defined in advance to obtain reproducible results after applying new data to the model built by the method. Determining the labels necessary to obtain the desired results and the variables required to provide the necessary information is crucial in the development of an appropriate and reproducible machine learning model.

Data sets for unsupervised learning do not require labels. When raw data contain spectra and images, as shown in Fig. 1, the spectrum at each pixel is converted into numerical data, and the corresponding information is listed. Preprocessed data sets are generally favorable for machine learning to minimize the influence of errors and noise in raw data, as models constructed by learning methods may be based on physical or chemical theories that can be used to interpret the measurement results. PCA, one of the most widely used multivariate analysis methods, provides the outline information of a data set, as well as the significant variables (mass peaks for SIMS data) that contribute to each component suggested by the method. This type of information is helpful for interpreting the structure of a data set and the relationships between the variables. The autoencoder, an unsupervised learning method based on ANNs (Fig. 2), similarly indicates important variables that contribute significantly to each feature extracted from the hidden (middle) layer. For example, when raw data—including image data—are analyzed, PCA provides PC scores and PC loadings; the loadings indicate the contribution level of each variable (generally providing spectral information). Similarly, the autoencoder provides features in the middle layer, which appear as images when image-containing data are analyzed, while the encoder and decoder weights indicate each variable's contribution, which generally represents spectral information.
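The score/loading picture described above can be sketched with a generic PCA implementation. This example uses synthetic data and scikit-learn, not the PLS toolbox employed in this study; the array sizes are arbitrary:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Synthetic unfolded image data: 64 x 64 pixels, 200 mass-channel variables
X = rng.poisson(lam=3.0, size=(64 * 64, 200)).astype(float)
X = X / X.sum(axis=1, keepdims=True)          # total-ion-count normalization

pca = PCA(n_components=5)
scores = pca.fit_transform(X)                  # pixel-wise PC scores
loadings = pca.components_                     # per-variable contributions

# Score images: fold each PC score vector back onto the pixel grid
score_images = scores.T.reshape(5, 64, 64)

# Most significant variables for PC1, analogous to the loading tables below
top_channels = np.argsort(np.abs(loadings[0]))[::-1][:10]
```

The score images play the role of the distribution maps, and the ranked loadings identify the mass peaks that drive each component.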

FIG. 2.

Example of an autoencoder.

FIG. 3.

Schematics of model samples. The outline, total ion, and secondary ion images specific to three polymers: polystyrene (PS), polyethylene terephthalate (PET), and polycarbonate (PC).


In addition, when data are affected by nonlinear factors such as matrix effects, PCA and MCR may provide results that do not account for the nonlinearity. In contrast, the autoencoder enables nonlinear dimensionality reduction;21,24 a previous study21 indicated that autoencoder features exhibit higher linearity with respect to concentration in the ToF-SIMS data of two-organic-compound mixtures.

Data sets for supervised learning methods require labels to describe the samples. Such labels may include material names, material types, chemical structures, chemical bonds, and chemical and physical properties. These factors can be predicted using descriptors generally corresponding to peak intensities in certain spectra, including SIMS, mass spectrometry, infrared, Raman, and x-ray photoelectron spectroscopy spectra.17–23 Both single-label and multilabel formats may be effective for supervised learning, and labels can be expressed using binary one-hot encoding, class numbers, actual values, or class values. The one-hot encoding label format listed in Table I can be converted to class numbers as listed in Table II. When actual values, such as molecular weights, transition temperatures, viscosity, and hydrophilicity, are used as labels, class values may be more appropriate than actual values, as the latter may cause overfitting.
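The conversion from one-hot labels (Table I format) to class numbers (Table II format) is a simple argmax over the label columns; a minimal sketch with hypothetical label values:

```python
import numpy as np

# Hypothetical one-hot labels: each row marks one of three classes
one_hot = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

# Convert to class numbers (one column instead of m columns)
class_numbers = one_hot.argmax(axis=1)

# The conversion is reversible, recovering the original encoding
recovered = np.eye(3, dtype=int)[class_numbers]
```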

TABLE II.

Sample data for supervised learning methods.

Sample No. | Label class No. | Descriptor 1 | Descriptor 2 | … | Descriptor n
— | — | 0.010 | 0.002 | … | 0.100
— | — | 0.000 | 0.004 | … | 0.001
… | … | … | … | … | …
— | — | 0.040 | 0.000 | … | 0.030

Random forest (RF), a supervised learning method based on decision trees, is useful for classifying data sets according to labels.14,23 When a single label is used in RF-facilitated analysis, the most important variables (features) are suggested by the model. This function is helpful for determining important variables related to a particular property expressed by the label. This is one of the most powerful functions of RF for analyzing surface analysis datasets.
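The variable-importance function described above can be sketched with a generic RF implementation. This example uses synthetic descriptors and scikit-learn, not the RF setup of the cited studies; the label is constructed to depend on one known peak so that the importance ranking can be checked:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
# Synthetic descriptors: 200 spectra x 50 peak intensities;
# the single binary label depends only on peak 7
X = rng.random((200, 50))
y = (X[:, 7] > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Feature importances suggest which variables (peaks) relate to the label
importances = rf.feature_importances_
ranked = np.argsort(importances)[::-1]
```

Here the model correctly ranks the informative peak first; on real surface analysis data, the top-ranked variables indicate the mass peaks most strongly related to the labeled property.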

Learning methods based on ANNs are generally useful for interpreting complex data sets that include nonlinear factors—as in the case of matrix effects—because ANNs are sufficiently flexible to express nonlinear phenomena.24 Although deep ANN-based methods are powerful in obtaining suitable results, the processes used to obtain these results are usually not readily apparent. Understanding these processes is essential for further investigation of the surface analysis data, as they can indicate important variables and relationships between them. Shallow ANN-based methods containing only one or a few middle layers can indicate relationships between important variables and particular factors (labels).21

The model sample for unsupervised learning methods contained three polymers on an aluminum-coated glass substrate (Fig. 3): polyethylene terephthalate (PET), polystyrene (PS), and polycarbonate (PC).8 The polymer sample was measured using a TOF-SIMS 5 instrument (ION-TOF GmbH, Germany) with 30 kV Bi3++ before and after sputtering with 5 kV Ar1000+ while maintaining the total primary ion dose below 1012 ions/cm2. The secondary ion images were obtained over a 300 × 300 μm2 region of the sample (Fig. 3). The mass peaks in all sample data were auto-searched, and 511 peaks were selected. The intensities of the 511 mass peaks at each pixel in each sample were converted into matrix data and normalized to the total ion count at each pixel. The ToF-SIMS surface analysis data of the polymer sample before and after sputtering with 5 kV Ar1000+ were combined and analyzed using unsupervised methods: PCA; MCR, which is an NMF method; and the sparse autoencoder (SAE). PCA and MCR were implemented using the PLS toolbox (Eigenvector Research Inc., WA, USA) in MATLAB (MathWorks, MA, USA). The SAE was implemented using the default trainAutoencoder settings of the Deep Learning toolbox in MATLAB. Each ToF-SIMS data set spanned 128 × 128 pixels over a 300 × 300 μm2 region, and the combined (before and after sputtering) image size was 128 × 256 pixels.
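The study itself used the MATLAB trainAutoencoder workflow. Purely as an illustrative analog, the extraction of middle-layer features can be sketched in Python by training a small network to reproduce its input; this sketch omits the sparsity regularization of a true SAE, and all data are synthetic:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
# Synthetic normalized pixel spectra: 500 pixels x 60 mass channels
X = rng.random((500, 60))
X = X / X.sum(axis=1, keepdims=True)

# Train the network to reproduce its input through a small middle layer;
# the middle-layer activations play the role of the autoencoder features
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="logistic",
                  max_iter=500, random_state=0)
ae.fit(X, X)

# Encoder weights (input -> middle layer); large-magnitude weights mark
# the mass channels contributing most to each feature
encoder_w = ae.coefs_[0]           # shape: (60 channels, 8 features)
features = 1.0 / (1.0 + np.exp(-(X @ encoder_w + ae.intercepts_[0])))
```

Folding each column of `features` back onto the pixel grid yields feature images analogous to those in Fig. 6, and the encoder (or decoder) weights provide the associated spectral information.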

The other model samples for supervised learning included six peptides: physalaemin [amino acid sequence (A. A. S.) ADPNKFYGLM; molecular weight (M. W.) 1155.81, Scrum, Japan], synthesized peptide No. 1 (A. A. S. ESTHQWCK; M. W. 1018.32, Scrum, Japan), angiotensin II (A. A. S. DRVYIHPF; M.W. 1046.49, Scrum, Japan), oxytocin (A. A. S. CYIQNCPLG; M. W. 1008.25, Wako-Fujifilm, Japan), bradykinin (A. A. S. RPPGFSPFR; M. W. 1060.27, Scrum, Japan), and synthesized peptide No. 2 (A. A. S. AEMTHWCK; M.W. 1005.18, Scrum, Japan). The six peptides were measured by VAMAS project participants (Technical Working Area 2, Surface Chemical Analysis, A26).14 The intensities of the peaks in ToF-SIMS spectra were exported using a list encompassing 4230 peaks ranging from m/z 13 to 1214, which were prepared for the previous study.14 These intensities were normalized to the total ion count and, subsequently, used as descriptors. The presence of each amino acid was used as a label and described via one-hot encoding (1 or 0).
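The amino-acid presence labels described above can be generated directly from the sequences given in the text; a minimal sketch (the dictionary keys are shortened peptide names introduced only for this example):

```python
# One-hot amino acid presence labels (1 or 0) for the six peptides,
# using the sequences listed in the text
amino_acids = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard one-letter codes
peptides = {
    "physalaemin": "ADPNKFYGLM",
    "peptide1": "ESTHQWCK",
    "angiotensinII": "DRVYIHPF",
    "oxytocin": "CYIQNCPLG",
    "bradykinin": "RPPGFSPFR",
    "peptide2": "AEMTHWCK",
}

labels = {name: [1 if aa in seq else 0 for aa in amino_acids]
          for name, seq in peptides.items()}
```

Each 20-element row then serves as the label vector for every spectrum of the corresponding peptide, while the normalized peak intensities serve as the descriptors.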

In the analysis of relatively simple sample data, such as the polymer sample data considered in this study, similar information may be extracted using typical unsupervised learning methods such as PCA, MCR, and SAE. An image and a spectrum of each material are indicated by score images and loadings in PCA, concentration and spectrum matrices in MCR, and feature images and decoder (or encoder) weights in SAE (Fig. 4). Pure polymer information was clearly extracted from the data obtained after sputtering in each analytical result because the sample surface contained more contaminants prior to sputtering by Ar cluster ions. Although the multivariate analysis results were described in a previous study,10 we briefly summarize them here. The substrate, PS, PET, and PC were directly extracted by MCR as components 1, 3, 5, and 6, respectively (Fig. 5 and Table S2 in the supplementary material49). By contrast, PCA extracted the common features of multiple components, as in the case of PC4, which is a common feature of PET and PC (Fig. 5). The mass peaks related to each polymer—m/z 91 (C7H7+) from PS, m/z 104 (C7H4O+) from PET, and m/z 135 (C9H11O+) from PC—are indicated by the spectrum matrix of MCR and the loadings of PCA (see Table S1 in the supplementary material).10,49 The PC1 loadings suggested contaminant siloxane-related peaks [m/z 73, Si(CH3)3; m/z 147, Si(CH3)3OSi(CH3)2] and the PS-related fragment ion at m/z 91. After sputtering (the right side of the PC1 score image in Fig. 5), the difference between PS and the other polymers was clearly shown. The PCA results reflected the features related to the polymers and contaminants and indicated that one of the most important factors for the ToF-SIMS data of the polymer sample was the effect of sputtering (the contamination at the sample surface). On the other hand, the MCR results were not strongly influenced by sputtering because MCR indicated the individual ingredients in the sample, including the polymers and the contaminants.

FIG. 4.

Concept of unsupervised learning methods.

FIG. 5.

PCA score and MCR concentration matrix images for the polymer sample ToF-SIMS data before and after sputtering by an Ar cluster ion beam.


Furthermore, the SAE extracted pure component features alongside the common features of multiple components (Fig. 6 and Table S3 in the supplementary material49): for example, PC, PS, PET, and the common features of PET and PC were extracted as features 3 [decoder weights indicated m/z 135 and 107 from PC, and m/z 165 (C13H9+) and 178 (C14H10+) from PC and PS], 6 [decoder weights indicated m/z 91, 193 (C15H13+), and 117 (C8H9+) from PS], 7 [decoder weights indicated m/z 148 (C9H8O2+), 76 (C6H4+), 105 (C8H9+), and 149 (C9H9O2+) from PET], and 4 [decoder weights indicated m/z 135, 189 (C15H9+), and 63 from PC, and m/z 77 (C6H5+) from PC and PET], respectively. If subtle unknown information, such as contaminants or damage, must be identified within a sample, PCA and SAE can be used. If information pertaining to each main component in a sample is prioritized instead, MCR may be more suitable, although PCA and SAE may also be useful.

FIG. 6.

SAE feature images of the polymer sample ToF-SIMS data before and after sputtering by an Ar cluster ion beam.


In terms of spectral information, the PCA loadings, MCR spectrum matrix, and SAE weights suggest important mass peaks contributing to each component or feature. Table III lists PS-related results from PCA, MCR, and SAE: the high-magnitude negative loadings of PC4 from PCA, the spectrum matrix elements with high values for component 3 from MCR, and the decoder weights with high values for feature 6 from SAE. These results all indicate similar mass peaks associated with PS.8,25,26 In addition, the SAE results may change when learning parameters such as the feature number, transfer function, regularization conditions, and epoch number are adjusted. In this study, the default settings of the MATLAB function for the SAE (trainAutoencoder) were employed for the polymer sample analysis. Neural-network-based methods such as the autoencoder can provide more flexible results than linear-combination-based methods such as PCA and NMF. However, these methods may require more complex parameter adjustments than conventional multivariate analysis.

TABLE III.

PC4 loadings, spectrum matrix elements of component 3 for MCR, and decoder weights for feature 6 of SAE. PC4, component 3, and feature 6 are all associated with PS.

PC 4 (3.13%) | Mass (Da) | Comp. 3 (14.71%) | Mass (Da) | DW06 | Mass (Da)
−0.26 | 91.04 | 0.31 | 91.04 | 0.82 | 91.04
−0.17 | 117.04 | 0.23 | 105.05 | 0.79 | 193.06
−0.15 | 105.05 | 0.21 | 117.04 | 0.61 | 117.04
−0.15 | 129.04 | 0.20 | 193.06 | 0.56 | 104.04
−0.14 | 193.06 | 0.19 | 129.04 | 0.53 | 129.04
−0.13 | 92.04 | 0.18 | 115.03 | 0.52 | 92.04
−0.12 | 104.04 | 0.17 | 103.03 | 0.48 | 103.03
−0.11 | 181.07 | 0.17 | 132.88 | 0.47 | 167.04
−0.11 | 167.04 | 0.17 | 92.04 | 0.47 | 181.07
−0.10 | 115.03 | 0.15 | 104.04 | 0.45 | 115.03
−0.10 | 103.03 | 0.15 | 128.03 | 0.44 | 128.03
−0.09 | 130.05 | 0.15 | 181.07 | 0.39 | 116.03
−0.09 | 143.07 | 0.14 | 167.04 | 0.36 | 23.99
−0.09 | 116.03 | 0.14 | 178.01 | 0.35 | 194.06
−0.09 | 128.03 | 0.14 | 116.03 | 0.34 | 177.78

The 1037 ToF-SIMS spectra of the six peptides with amino acid number labels (Table IV) were predicted using the RF model described in the previous study.41 To predict unknown peptides that are not encompassed by the training data set, the 20 amino acids contained in the peptides were employed as labels for supervised learning, as unknown peptides cannot be predicted from the peptide names in the training data set. Annotation and label setting are essential to obtain the desired prediction results by machine learning. In terms of interpolation, randomly split test data sets (approximately 10% of the 1037 spectra) were predicted perfectly. In the previous study,41 the presence of each amino acid was almost perfectly predicted using one-hot encoding labels in the same RF model. This study demonstrated that both the presence and the number of a particular chemical structure (an amino acid, in this case) can be predicted by the RF model. Similar results may be obtained using ANN-based models, although the parameter-tuning process may be more complex than that of an RF.

TABLE IV.

Amino acid number labels for six peptides: physalaemin (ADPNKFYGLM), synthesized peptide No. 1 (ESTHQWCK), angiotensin II (DRVYIHPF), oxytocin (CYIQNCPLG), bradykinin (RPPGFSPFR), and synthesized peptide No. 2 (AEMTHWCK).

Label | ADPNKFYGLM | ESTHQWCK | DRVYIHPF | CYIQNCPLG | RPPGFSPFR | AEMTHWCK

This study presents examples of machine learning applications for the extraction of information from surface analysis data, as well as details regarding appropriate data set preparation. Although machine learning methods do not always make the data analysis process easier, they can yield more detailed information that cannot be extracted by conventional manual analysis. They are extremely helpful in the investigation of data sets that include completely unknown samples. Thus, the introduction of machine learning may be similar to the introduction of a new machine that provides different information from conventional machines.

When a multicomponent sample (e.g., the polymer sample considered in this study) without severe matrix effects was analyzed, the multivariate analysis methods, namely, PCA and MCR, were demonstrated to provide information similar to that obtained by a neural-network-based unsupervised learning method, namely, the autoencoder. Although neural-network-based methods may produce more flexible results than linear-combination-based methods such as PCA, they often require significantly more complex parameter-tuning procedures to obtain the desired results. Moreover, the appropriate selection of variables (e.g., mass peaks for ToF-SIMS data) and the conversion processing of raw data for machine learning are crucial to ensure that reproducible and essential information is extracted from the data sets.

In terms of supervised learning methods, labeling is essential to obtain the desired results. Labels must contain sufficient information representing analytical purposes, such as the chemical structures of unknown samples, material types, and physical or chemical properties of particular materials. Machine learning may provide novel and important information when the data sets (descriptors) and labels are sufficient and adequately prepared.

The author acknowledges the participants of the VAMAS project “Technical Working Area 2, Surface Chemical Analysis, A26” and Tomoko Kawashima for their support regarding ToF-SIMS measurement.

The author has no conflicts to disclose.

Satoka Aoyagi: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Software (equal); Supervision (equal); Validation (equal); Visualization (equal); Writing – original draft (equal).

The data that support the findings of this study are available from the corresponding author upon reasonable request.

1. M. S. Wagner and D. G. Castner, Langmuir 17, 4649 (2001).
2. M. S. Wagner, D. J. Graham, and D. G. Castner, Appl. Surf. Sci. 252, 6575 (2006).
3. A. J. Urquhart, M. Taylor, D. G. Anderson, R. Langer, M. C. Davies, and M. R. Alexander, Anal. Chem. 80, 135 (2008).
4. S. Vaidyanathan, J. S. Fletcher, R. Goodacre, N. P. Lockyer, J. Micklefield, and J. C. Vickerman, Anal. Chem. 80, 1942 (2008).
5. J. L. S. Lee, I. S. Gilmore, and M. P. Seah, Surf. Interface Anal. 40, 1 (2008).
6. V. S. Smentkowski, S. G. Ostrowski, and M. R. Keenan, Surf. Interface Anal. 41, 88 (2009).
7. J. L. S. Lee, I. S. Gilmore, and M. P. Seah, Surf. Interface Anal. 41, 653 (2009).
8. A. Henderson, J. S. Fletcher, and J. C. Vickerman, Surf. Interface Anal. 41, 666 (2009).
9. D. J. Graham and D. G. Castner, Biointerphases 7, 49 (2012).
10. Y. Yokoyama, T. Kawashima, M. Ohkawa, H. Iwai, and S. Aoyagi, Surf. Interface Anal. 47, 439 (2015).
11. T. Kawashima, T. Aoki, Y. Taniike, and S. Aoyagi, Biointerphases 15, 031013 (2020).
12. O. D. Sanni, M. S. Wagner, D. Briggs, D. G. Castner, and J. C. Vickerman, Surf. Interface Anal. 33, 715 (2002).
13. H. M. Rostam, P. M. Reynolds, M. R. Alexander, N. Gadegaard, and A. M. Ghaemmaghami, Sci. Rep. 7, 3521 (2017).
14. K. Matsuda and S. Aoyagi, Biointerphases 15, 021013 (2020).
15. S. Aoyagi and K. Matsuda, Rapid Commun. Mass Spectrom. 37, e9445 (2023).
16. I. S. Gilmore, J. Vac. Sci. Technol. A 31, 050819 (2013).
17. C. M. Parish and L. N. Brewer, Ultramicroscopy 110, 134 (2010).
18. S. Aoyagi, D. Hayashi, Y. Murase, N. Miyauchi, and A. N. Itakura, e-J. Surf. Sci. Nanotechnol. 21, 128 (2023).
19. R. Gautam, S. Vanga, F. Ariese, and S. Umapathy, EPJ Tech. Instrum. 2, 8 (2015).
20. A. Beratto-Ramos, C. Agurto-Munoz, J. P. Vargas-Montalba, and R. P. Castillo, Carbohydr. Polym. 230, 115561 (2020).
21. J. P. Smith et al., Analyst 145, 7571 (2020).
22. S. Pylypenko, K. Artyushkova, and J. E. Fulghum, Appl. Surf. Sci. 256, 3204 (2010).
23. N. Verbeeck, R. M. Caprioli, and R. Van de Plas, Mass Spectrom. Rev. 39, 245 (2020).
24. N. Xia, C. J. May, S. L. McArthur, and D. G. Castner, Langmuir 18, 4090 (2002).
25. J.-W. Park, I.-H. Cho, D. W. Moon, S.-H. Paek, and T. G. Lee, Surf. Interface Anal. 43, 285 (2011).
26. H. Wang, D. G. Castner, B. D. Ratner, and S. Jiang, Langmuir 20, 1877 (2004).
27. F. Liu, M. Dubey, H. Takahashi, D. G. Castner, and D. W. Grainger, Anal. Chem. 82, 2947 (2010).
28. M. Dubey, K. Emoto, H. Takahashi, D. G. Castner, and D. W. Grainger, Adv. Funct. Mater. 19, 3046 (2009).
29. H. E. Canavan, D. J. Graham, X. Cheng, B. D. Ratner, and D. G. Castner, Langmuir 23, 50 (2007).
30. R. Michel and D. G. Castner, Surf. Interface Anal. 38, 1386 (2006).
31. S. Aoyagi, M. Dohi, N. Kato, M. Kudo, S. Iida, M. Tozu, and N. Sanada, e-J. Surf. Sci. Nanotechnol. 4, 614 (2006).
32. Y.-P. Kim, M.-Y. Hong, J. Kim, E. Oh, H. K. Shon, D. W. Moon, H.-S. Kim, and T. G. Lee, Anal. Chem. 79, 1377 (2007).
33. S. Aoyagi, J. S. Fletcher, S. Sheraz, T. Kawashima, I. Berrueta Razo, A. Henderson, N. P. Lockyer, and J. C. Vickerman, Anal. Bioanal. Chem. 405, 6621 (2013).
34. S. Aoyagi, T. Kawashima, and Y. Yokoyama, Rapid Commun. Mass Spectrom. 29, 1687 (2015).
35. Y. Yokoyama et al., Anal. Chem. 88, 3592 (2016).
36. Y.-T. R. Lau, L.-T. Weng, K.-M. Ng, and C.-M. Chan, Anal. Chem. 82, 2661 (2010).
37. L. A. Klerk, P. Y. W. Dankers, E. R. Popa, A. W. Bosman, M. E. Sanders, K. A. Reedquist, and R. M. A. Heeren, Anal. Chem. 82, 4337 (2010).
38. L. Yang, A. G. Shard, J. L. S. Lee, and S. Ray, Surf. Interface Anal. 42, 911 (2010).
39. S. A. Thomas, A. M. Race, R. T. Steven, I. S. Gilmore, and J. Bunch, in 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 6–9 December 2016 (IEEE, New York, 2016), pp. 1–7.
40. M. Ito, Y. Kuga, T. Yamagishi, M. Fujita, and S. Aoyagi, Biointerphases 15, 021010 (2020).
41. S. Aoyagi et al., Anal. Chem. 93, 4191 (2021).
42. W. Gardner, A. L. Hook, M. R. Alexander, D. Ballabio, S. M. Cutts, B. W. Muir, and P. J. Pigram, Anal. Chem. 92, 6587 (2020).
43. W. Gardner, D. A. Winkler, D. Ballabio, B. W. Muir, and P. J. Pigram, Biointerphases 15, 061004 (2020).
44. J. Tang, A. Henderson, and P. Gardner, Analyst 146, 5880 (2021).
45. W. Gardner, D. A. Winkler, B. W. Muir, and P. J. Pigram, Biointerphases 17, 020802 (2022).
46. W. Gardner, D. A. Winkler, S. M. Cutts, S. A. Torney, G. A. Pietersz, B. W. Muir, and P. J. Pigram, Anal. Chem. 94, 7804 (2022).
47. W. Gardner, R. Maliki, S. M. Cutts, B. W. Muir, D. Ballabio, D. A. Winkler, and P. J. Pigram, Anal. Chem. 92, 10450 (2020).
48. K. Matsuda and S. Aoyagi, Anal. Bioanal. Chem. 414, 1177 (2022).
49. See the supplementary material online for PCA loadings (the highest or the lowest ten), MCR spectrum matrix (the highest ten), and decoder weights (the highest ten).

Supplementary Material