This study was undertaken to understand the extent and nature of problems in x-ray photoelectron spectroscopy (XPS) data reported in the literature. It first presents an assessment of the XPS data in three high-quality journals over a six-month period. This analysis of 409 publications showing XPS spectra provides insight into how XPS is being used, identifies the common mistakes or errors in XPS analysis, and reveals which elements are most commonly analyzed. More than 65% of the 409 papers showed fitting of XP spectra. An ad hoc group (herein identified as “the committee”) of experienced XPS analysts reviewed these spectra and found that peak fitting was a common source of significant errors. The papers were ranked based on the perceived seriousness of the errors, which ranged from minor to major. Major errors, which, in the opinion of the ad hoc committee, can render the interpretation of the data meaningless, occurred when fitting protocols ignored the underlying physics and chemistry or when the analysis contained other serious flaws. Consistent with other materials analysis data, ca. 30% of the XPS data or analysis was identified as having major errors. Of the publications with fitted spectra, ca. 40% had major errors. The elements most commonly analyzed by XPS, as determined both from the papers sampled and from queries to an online XPS database, include carbon, oxygen, nitrogen, sulfur, and titanium. Scrutiny of the papers showing carbon and oxygen XPS spectra revealed the classes of materials being studied and the extent of problems in these analyses. As might be expected, C 1s and O 1s analyses are most often performed on sp2-type materials and inorganic oxides, respectively. These findings have helped focus a series of XPS guides and tutorials that deal with common analysis issues. The extent of problematic data is larger than the authors had expected. Quantification of the problem, examination of some of the common problem areas, and the development of targeted guides and tutorials may provide both the motivation and resources that enable the community to improve the overall quality and reliability of XPS analysis reported in the literature.
I. INTRODUCTION
The issues of reproducibility and the quality of data and data analysis in the current scientific literature have recently received considerable attention.1–10 Careful examinations of published data for several types of measurements have suggested that 20%–30% of multiple types of data and analysis have significant flaws.11–13 X-ray photoelectron spectroscopy (XPS) is used in a growing number of publications, but with mounting evidence of erroneous and misleading data analysis.7,14–16 This current study reports results from a systematic examination of XPS data appearing over a six-month period in three renowned journals. This examination has sought to address two questions: first, what is the extent of faulty XPS data and data analysis appearing in the literature and, second, what types of errors or problems are appearing? This analysis of publications was undertaken in parallel with the preparation of a series of XPS guides and tutorials, whose development has been informed by the nature of the common errors identified in XPS data analysis.17
Reproducibility and data reliability are at the heart of scientific endeavor.1 A survey involving more than 1500 scientists reported in Nature in 2016 indicated that 90% of them thought reproducibility was an issue and 50% thought it was a significant problem.2 In an assessment of 1000 articles reporting thermophysical data, roughly 33% were found to have significant problems.12 A study of metal organic framework (MOF) isotherm data showed wide scatter in the results and that 20% of the data was “out of bounds.”13 Nonreproducible data appear to be found across a wide spectrum of journals.11 Discussions among surface scientists have suggested that similar problems are occurring in published XPS data and analysis. This current study was undertaken to assess the magnitude of this problem and, importantly, to seek to understand the types of errors that most commonly occur so that approaches may be devised to address them.
A. Nature and importance of XPS data
XPS, as demonstrated by continuous growth in the number of publications for which it is a keyword, is the most widely used method for chemically analyzing surfaces; it has a multitude of applications in many areas of research and technology.18 The essence of the technique is the energy analysis of photoelectrons generated from surfaces by x-radiation via the photoelectric effect. The energies of the photoelectrons can be used to identify the elements present at a surface, and the relatively short distances that electrons can travel in solids, without losing their identifying energy, provide surface sensitivity. Relatively small changes in the energies of the photoelectron peaks provide information about the chemical environments of the elements at a surface. Other advantages of XPS that have contributed to its rising popularity include its increased availability in both stand-alone instruments and via synchrotron sources, the widespread acceptance of the technique, the possibility of high precision and sensitivity (often down to parts per thousand of a monolayer),19 its quantitative nature, and the morphological information it can provide.20,21 XPS data are collected in the form of lower energy-resolution scans over wide energy ranges (ca. 1000 eV), referred to as survey or wide scans,22 and higher resolution scans over narrower energy ranges (typically 20–30 eV), referred to as “narrow or detail scans,” which focus on specific photoelectron peaks, Auger signals, or valence band signals. The ability of XPS to provide quantitative information about surfaces and interfaces is critical to many areas of science and technology.
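As a reminder, the relation underlying the technique (the standard photoelectric energy balance, written here for a solid sample with binding energies referenced to the Fermi level) is

E_B = hν − E_K − φ_spec,

where E_B is the binding energy, hν is the photon energy, E_K is the measured kinetic energy, and φ_spec is the spectrometer work function. Both element identification and chemical-state analysis rest on this simple energy balance.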
B. XPS and reproducibility
The type of information that XPS can provide is often essential for understanding the surfaces and interfaces that play a critical role in material behavior. It can also be critical for enabling the synthesis of materials with advanced properties. As one example, in a study of the synthesis of nanoparticles for medical applications, XPS revealed that inconsistent surface chemistry was the source of nonreproducible and inconsistent behaviors of the particles.23 In such cases, XPS is a tool for understanding and enhancing reproducibility. However, uninformed use or analysis of XPS data and incomplete reporting of processes and analysis can be a source of reproducibility problems.7,24 Unfortunately, in parallel with the proliferation of the technique, there appears to have been an increase in incorrect XPS data analyses and interpretations entering the scientific literature. There is increasing concern that inappropriate XPS data analysis is now self-propagating with new incorrect results referencing older incorrect or incomplete results.7 Indeed, analysts in some labs are asked to “replicate” analytical approaches found in the literature that, upon examination, are found to be significantly flawed or misleading.
The essence of the photoemission process is relatively easy to understand. Indeed, the simple spectra used to demonstrate XPS data to new users are generally easy to explain, understand, and even quantify. However, there are multiple sources of potential issues and complications that impact the quality of XPS data and analysis. Among the important areas here are instrument setup and calibration, sample preparation and mounting, and the collection and analysis of data.16 Incomplete reporting of important information in each of these areas makes data replication and/or data analysis difficult if not impossible.
XPS data and analysis can be highly precise and quantitative.25 Indeed, the current capability to produce high-quality XPS data with appropriate analysis has required a considerable amount of community effort to develop, with contributions from instrument vendors, analysts, and metrologists. Based on interlaboratory comparison studies in the late 1970s,26,27 it became clear that there were, at that time, significant uncontrolled variables related to instrument performance and sample setup that were not yet understood by the community and that caused significant variation in relative amplitudes and peak energies. This understanding led to major efforts in instrument development and the formation of standards committees (ASTM E42 on Surface Analysis and ISO TC201 on Surface Chemical Analysis) that focused on both measuring instrument performance and developing best practices for instrument operation. These efforts have been highly successful in producing both high-quality instruments and a foundation of standards and guides for XPS analysis.
Beyond instrument setup and operation, a good deal of XPS data is not trivial to appropriately analyze, requiring knowledge of both the physical processes associated with XPS signals and some understanding of the chemical nature of the sample under examination. For example, the peak-shape distortions caused by surface charging28 and the presence of spin–orbit splitting and other “final state effects”29 are too frequently not recognized, leading to the incorrect identification of unexpected spectral features as additional chemical states on a sample surface.
A common problem for most XPS users is the presence of overlapping peaks; XPS peak widths (typically 0.5–2 eV) are often comparable to the chemical shifts that indicate the chemical states of an element.30,31 Thus, signals from the different chemical states of a given element typically overlap, making peak fitting an indispensable part of much XPS data analysis. Useful peak fitting requires recognition of the physical and chemical processes noted above. It is also essential to select appropriate synthetic peak shapes,31–33 widths, and energies, as well as spectral backgrounds.34,35 Misinterpretation and misuse of peak shapes and energies, inappropriate use of fitting parameters, and a lack of reporting of the processes and parameters used for fitting are some of the most common obstacles to reliable and reproducible XPS data analysis.
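As an illustration of the kind of constrained fitting described above, the following minimal Python sketch (illustrative only; it relies on NumPy and SciPy, and the function names, the two-component model, and the starting values are hypothetical choices rather than recommended settings) fits two pseudo-Voigt components that share a single width and mixing parameter above an iterative Shirley background:

```python
import numpy as np
from scipy.optimize import curve_fit

def shirley_background(be, counts, n_iter=100, tol=1e-6):
    """Iterative Shirley background.  Assumes `be` is in ascending binding energy
    and that the first and last data points lie on the background."""
    y = np.asarray(counts, dtype=float)
    i_low, i_high = y[0], y[-1]
    bg = np.linspace(i_low, i_high, y.size)      # initial guess: straight line
    for _ in range(n_iter):
        peak = y - bg
        # cumulative (trapezoidal) peak area from the low-binding-energy end
        cum = np.concatenate(([0.0],
                              np.cumsum(0.5 * (peak[1:] + peak[:-1]) * np.diff(be))))
        total = cum[-1] if cum[-1] != 0 else 1.0
        new_bg = i_low + (i_high - i_low) * cum / total
        if np.max(np.abs(new_bg - bg)) < tol * max(abs(i_high - i_low), 1.0):
            return new_bg
        bg = new_bg
    return bg

def pseudo_voigt(x, area, center, fwhm, eta):
    """Area-normalized pseudo-Voigt: a linear mix of Gaussian and Lorentzian
    line shapes sharing the same FWHM; eta is the Lorentzian fraction."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    gauss = np.exp(-0.5 * ((x - center) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    lorentz = (fwhm / (2.0 * np.pi)) / ((x - center) ** 2 + (fwhm / 2.0) ** 2)
    return area * ((1.0 - eta) * gauss + eta * lorentz)

def two_state_model(x, a1, c1, a2, c2, fwhm, eta):
    """Two chemical states constrained to share one width and one mixing ratio."""
    return pseudo_voigt(x, a1, c1, fwhm, eta) + pseudo_voigt(x, a2, c2, fwhm, eta)

# Usage sketch: `be` and `counts` are a measured narrow scan (ascending binding energy).
# bg = shirley_background(be, counts)
# p0 = [1e4, 284.8, 3e3, 286.3, 1.2, 0.3]   # illustrative starting values only
# popt, pcov = curve_fit(two_state_model, be, counts - bg, p0=p0)
```

Tying the component widths and line shapes together in this way is one simple means of keeping a fit consistent with the physics and chemistry of the sample; more elaborate constraints (asymmetric line shapes, fixed area ratios) follow the same pattern.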
Until the relatively recent increase in the number of XPS instruments, XPS measurements were usually undertaken by a surface analysis group or under the guidance of an experienced analyst who had considerable knowledge about data collection and analysis. However, as the acceptance and adoption of XPS has grown, along with much instrumental automation, many users may have been left without immediate access to the expertise or training needed to perform an analysis, or they may not recognize that it is important.
C. Understanding and addressing the problems
Some within the international community of XPS experts and the ASTM and ISO surface-related standards committees anticipated the widespread use of XPS among nonexperts and the problems that are occurring today. Accordingly, the standards committees increased efforts to provide guides, technical reports, standards, and other information for less experienced XPS analysts.36,37 Various websites have also been created that contain significant amounts of explanatory/tutorial information.38–40 The journal Surface Science Spectra is dedicated to the careful archiving of XPS data, as well as that of other surface analytical methods.
This paper represents an effort by a group of experienced XPS analysts to quantify the degree to which incorrect XPS data and analysis are appearing in the literature and to identify the nature or causes of these issues. The study is one aspect of the response16,17 to a 2018 American Vacuum Society survey41 that revealed a need and desire for protocols, best practices, and standards to address reproducibility issues. Information provided by the current study to further understand the nature of common errors has focused on and identified topics that the current series of tutorial articles addresses, while also identifying additional topics for consideration.
II. ASSESSMENT APPROACH AND DATA MINING
A. Approach and rating system
XPS data published over a six-month period, January to July 2019, in three high-quality journals were evaluated in this study: Journal A, an energy/battery materials journal with an impact factor of ca. 25 (153 papers); Journal B, a surface and materials journal with an impact factor of ca. 4 (153 papers—it is by chance that Journals A and B had the same number of papers that showed XPS spectra); and Journal C, a general scientific journal with a great deal of material content and an impact factor of ca. 4 (103 papers). The XPS data in these journals are expected to be representative of those in the literature generally. These journals are not identified to avoid the association of the issues identified and the related discussion with only these specific journals. Indeed, in our experience, the treatment/reporting of XPS in these journals is the same as what we are seeing in other scientific journals. We do not identify the publishers of these journals, except to say that (i) each of the three journals comes from a different publisher and (ii) these publishers are recognized as three of the best in science and engineering.
XPS spectra that had appeared in the three journals were identified and evaluated by an ad hoc group of six experienced XPS analysts located at five institutions in three different countries. Note that this ad hoc committee was not commissioned to do this work by any professional organization or society. However, the authors are actively involved in the community of scientists, attempting to bring attention to the issues of reproducibility and the quality of data and data analysis in the current scientific literature related to XPS. Based upon their collective experience in collecting, analyzing, and publishing XPS spectra, the authors reviewed and classified the articles in the following manner. The XPS spectra that were evaluated were classified according to a “green, yellow, orange, or red” scheme (Fig. 1), which was agreed upon by the group prior to the start of the assessment. Green and yellow ratings suggest that the data and accompanying data interpretation are fundamentally correct and most likely contribute to and support the conclusions in the paper. An orange rating indicates that the error(s) in the data analysis raise significant concerns regarding the processing, analyzing, and/or reporting of the XPS spectra but they may or may not compromise the validity of the work. A red rating corresponds to one or more “catastrophic” errors that demonstrate a fundamental lack of understanding of the technique, or the data, or possibly the reliance on a previously published erroneous analysis. In all likelihood, red errors compromise the validity of the conclusions in the paper. Most unfitted data landed in the green or yellow categories, as little interpretation was given or required. A more detailed discussion of the types of errors that were observed is presented below.
This assessment focused on published spectra appearing in the main body of a paper or in its supplementary material. In other words, this assessment was based on the data in figures and the information in the corresponding figure captions, including the identification of the peaks and information related to their fitting, which were present in more than 65% of the papers with XPS spectra. Where it seemed essential, additional portions of the papers were examined to verify the appropriateness of the assessment. This assessment did not examine other important aspects of quality XPS data generation and reporting, such as data quantification, system calibration, or reporting of the instrument setup and configuration. Karen Gaskell has researched the degree to which XPS data acquisition parameters and other experimental conditions are adequately reported in the literature. Such information is important for reproducing reported work. In her talk on this topic at the 66th AVS International Symposium in Columbus, OH in 2019, she indicated that a significant number of the papers in the literature fail to adequately report the conditions under which XPS data were acquired—in some cases, authors do not even report the type of instrument that had been employed. Thus, our error assessment will almost certainly underestimate the prevalence of erroneous XPS analyses in the literature.
The spectra in our study were first evaluated separately by five members of the ad hoc committee. Their individual analyses then served as the basis of group tele-discussions in which a consensus on every paper was reached. These consensus results were then evaluated independently by a sixth ad hoc committee member who had not participated in the previous discussions. The sixth committee member was specifically asked to verify that the red classifications were appropriate and not an overreaction by the initial group. Any discrepancies between the ad hoc committee ratings and those of the sixth committee member were discussed in a conference call to come to final, consensus ratings of the XPS data reported in each paper. In general, there was very good agreement in the initial ratings by the committee: there was no disagreement among the five panelists for ca. 60% of the initial rankings; in ca. 33% of all cases, the rankings of the ad hoc committee members fell into two neighboring color categories; and in only 7% of ranked papers did initial rankings differ by a greater amount. We note that the sixth independent reviewer mostly agreed with the red classifications but also recommended that a few papers that had been classified as orange be recategorized as red. One dynamic of the group is worth noting, as it impacts the journal review process generally. Each ad hoc committee member was highly knowledgeable in XPS, but each also had significant experience in specific areas. Often, it was found that those experts in specific areas were able to identify issues that others, less knowledgeable in the specific area, did not recognize.
Finally, the ratings given to each publication in this study pertain only to the quality of the XPS spectra and spectral analysis—no consideration was given to the apparent merits, validity, impact, or importance of the papers.
B. Mining additional data
After rating the XPS spectra and accompanying spectral analyses in the papers in our study, it was possible to examine the papers for additional information. Topics for this analysis included the nature of the observed errors, the elements analyzed, and the appearance of wide scans versus narrow scans. The frequency and types of common errors in the analysis of specific chemical elements are likely to be particularly important in addressing problems in the literature.
In addition to identifying the most common chemical elements analyzed by XPS in our sample of the literature, we examined the frequency of requests for information about the different elements from the XPS Simplified webpage and database created and run by Thermo Fisher Scientific Inc. This site contains tutorial information for each element, listing its photoelectron and Auger peak positions, its most commonly analyzed peaks/spectral regions,42 and specific details for fitting its signals. It also contains general/tutorial information about XPS. The information considered herein consisted of the number of page views for each element over the course of two months (a total of 48 996 unique page views). Of course, it is not possible to know someone's intent in looking up information about a particular element on this website. Nevertheless, these specific page views seem to provide an indication of the elements that are most commonly researched and analyzed by XPS. Note that a subset of this data was previously shown and discussed in an online article.43 Our survey of the literature and the results from XPS Simplified both identify carbon and oxygen as the two chemical elements most frequently shown and researched. Additional analysis was then performed to determine the types of carbon- and oxygen-containing materials currently being analyzed by XPS. Carbon- and oxygen-containing materials were classified broadly as polymers, sp2 carbon, inorganic oxides, etc.
III. RESULTS AND DISCUSSION
A. Paper assessments and common errors
In addressing our first question, regarding the quality of the data analysis of the published spectra, our evaluation, shown in Fig. 2(a), is that, on average for the three journals, ca. 30% of all the papers (with fitted or unfitted spectra) fall in the red category and another 30% fall in the orange category. We immediately note that these values are consistent with those reported for other characterization methods.12,13
In general, unfitted data were ranked green or yellow [see Fig. 2(b)], as little interpretation was required or given, although peaks were sometimes misidentified and the data collection was inadequate in some ways. As has been noted, more than 65% of the papers showing XPS spectra also showed some degree of fitting, and this was the source of the majority of the errors [see Fig. 2(c)]. Much of the following discussion focuses on these fitted data.
In the process of evaluating papers and spectra, several common issues and errors appeared. Common issues that placed papers in the different categories are listed below. A similar list of the common errors associated with fitting is published in a guide to XPS peak fitting.44
1. Green category
No significant errors. By all indications, the data were worked up in a scientifically reasonable way and peaks identified appropriately.
The data may have minor issues. For example, they may have been plotted opposite the convention in XPS, which is for binding energy to increase to the left, not to the right.
2. Yellow category
Modest truncation of the edges of a peak envelope, i.e., not acquiring or showing data over a large enough energy range, but not to the extent that reasonable interpretation or fitting of the data becomes impossible.
Neglecting to include the sum of the fit components and/or the residuals to the fit (or some other figure of merit for peak fitting; a minimal example of such a figure of merit is sketched after this list), making it somewhat difficult to assess the quality of the fit. Nevertheless, the fit components appeared to be a good approximation to the peak envelope.
Not including/showing the background/baseline for the fit, but, again, the fit/data analysis otherwise seemed reasonably sound, and a reasonable background was implied by the fit components.
Some concerns about the selection of the background/baseline relative to the noise level.
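Regarding the figure of merit mentioned in the list above, a residual trace and a reduced chi-square are inexpensive to compute and report. A minimal sketch (illustrative only; it assumes Poisson counting statistics and that the fitted model has already been evaluated on the same energy grid as the data) is:

```python
import numpy as np

def fit_quality(counts, model, n_params):
    """Residuals and a reduced chi-square, assuming Poisson counting statistics
    (channel variance approximated by the measured counts)."""
    counts = np.asarray(counts, dtype=float)
    model = np.asarray(model, dtype=float)
    residuals = counts - model
    chi_sq = np.sum(residuals ** 2 / np.clip(counts, 1.0, None))
    dof = counts.size - n_params          # degrees of freedom
    return residuals, chi_sq / dof
```

A reduced chi-square far from unity, or residuals with obvious structure, indicates that the fit does not adequately describe the data.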
3. Orange category
Significant truncation of the peak envelope in a narrow scan—data acquisition was over too narrow a range.
Neglecting to include the sum of the fit components and/or the residuals to the fit, where the sum of the fit components did not appear to be a good approximation of the peak envelope.
Using an incorrect background for a fit. For example, linear backgrounds are generally inappropriate for the strongly rising backgrounds commonly observed under many XPS peaks, e.g., the application of a linear background to an Ni 2p spectrum.
Failure to pin the background to match or approximate the surrounding noise—fits were regularly observed in which the background was fixed to a local noise peak or trough at one side of the peak envelope.
Employing varying peak widths in a fit when there was no good chemical reason for doing so. For example, it is often the case that peaks from oxide materials are broader than those of the corresponding metals, e.g., the Al 2p peaks from Al2O3 versus from metallic Al. However, large variations in peak widths across all components of a C 1s or N 1s narrow scan have no scientific basis.
Adding too many synthetic peaks to a fit and then optimizing them with software to better match the peak envelope, ignoring the physics and chemistry of the sample.
Attempting to fit and interpret noisy data when it was clear that little meaningful information could be extracted from the data.
4. Red category
A paper could receive a red rating if it contained a significant number of orange errors or particularly egregious orange errors—some of the errors below are more extreme examples of the errors in the orange category.
Extreme truncation of the peak envelope in a narrow scan.
Gross failure to make the background match or be appropriately close to the noise surrounding the peak envelope such that the resulting peak areas/quantitation would be meaningless, e.g., the background line may cut through the spectral envelope.
Employing wildly varying peak widths in a fit when there was no good chemical or physical reason for doing so.
Adding far too many synthetic peaks to a fit and then optimizing them with software to better match the peak envelope, ignoring the physics and chemistry of the sample.
Attempting to fit extremely noisy data. The number of possible fits to a spectrum increases significantly as it becomes noisier—at some point, however, there is too much uncertainty in the fit of a noisy spectrum for it to have any statistical meaning.
Disregarding/neglecting spin–orbit splitting when it was present, not using proper spin–orbit splitting ratios, or labeling a pair of spin–orbit peaks as separate chemical states (a fitting model with these constraints built in is sketched after this list).
Failure to include the original data, i.e., showing only the synthetic peaks for a fit and perhaps their sum, or using heavily smoothed data—it is nearly impossible to know under these circumstances whether a fit is reasonable. That is, in some cases, authors report “XPS spectra” when they were showing only the sum of the fit components or the fit components themselves.
Gross mislabeling of chemical states, labeling noise as chemical states, omitting chemical states, or proposing impossible chemical states. For example, in their C 1s peak fitting, authors sometimes (i) mislabel (switch) the C–O and C=O chemical states, (ii) omit the C=O state, (iii) try to fit the natural asymmetry (tailing) in the C 1s signal of sp2-type carbon, e.g., from graphene or carbon nanotubes, as multiple carbon-oxygen type components, even when there is not enough oxygen in the material to justify these synthetic peaks, as indicated by a small or nonexistent O 1s peak from the sample—here, it might be better to first fit the C 1s spectrum from an unfunctionalized sp2-containing material with an asymmetric line shape and then use this line shape to fit the functionalized material,45 and (iv) try to fit (and label) the shake-up signal(s) from materials containing sp2 carbon as carbon–oxygen type chemical states.
There are obviously many more ways that XPS spectra can be inappropriately fit.
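Several of the errors listed above (widely varying peak widths, incorrect spin–orbit ratios, and labeling the two components of a doublet as separate chemical states) can be avoided by building the relevant physics directly into the fitting model. The sketch below is illustrative only; it reuses the pseudo_voigt function from the earlier sketch and assumes the statistical 2:1 area ratio for 2p levels and a nominal S 2p splitting of ca. 1.2 eV.

```python
def p_doublet(x, area, center_3_2, splitting, fwhm, eta):
    """A 2p spin-orbit doublet: the 2p1/2 partner is tied to the 2p3/2 component
    with half the area, the same width, and a fixed energy separation."""
    main = pseudo_voigt(x, area, center_3_2, fwhm, eta)
    partner = pseudo_voigt(x, 0.5 * area, center_3_2 + splitting, fwhm, eta)
    return main + partner

def s2p_two_state_model(x, a1, c1, a2, c2, fwhm, eta):
    """Two sulfur chemical states, each a constrained S 2p doublet; the nominal
    ~1.2 eV splitting is held fixed rather than fitted."""
    return (p_doublet(x, a1, c1, 1.2, fwhm, eta)
            + p_doublet(x, a2, c2, 1.2, fwhm, eta))
```

Because the ratio, splitting, and width are tied rather than fitted, the optimizer cannot return a doublet that violates the underlying physics, and the 2p1/2 partner cannot be mistaken for an additional chemical state.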
As shown in Fig. 2(c), on average, just over 40% of the papers with fitted data received red rankings and another ca. 40% received orange rankings. There were, however, some differences here among the journals. The percentage of red rankings is the lowest in the journal focusing on surfaces and interfaces (Journal B, 35%, see also Table I), which suggests that authors in this journal are generally more familiar with XPS—it seems reasonable to expect that surface and interface researchers would be somewhat more familiar with a surface analytical method. In contrast, the percentage of red errors was the highest for the most general journal (Journal C, 47%, see also Table I). The percentage of red errors for the energy/battery journal was somewhere between these values: 43%. Regarding the number of papers in the three journals that show XPS spectra, Table I reveals that a very high fraction of the papers in the energy/battery journal showed XPS spectra (39%), and fitted spectra appeared in 76% of these papers, indicating the importance of XPS to energy technologies, and possibly something about the complexity of the materials being examined. The use of XPS in the surface and interface chemistry journal is also significant but not as high as that for the energy/battery journal (18% of the papers in Journal B showed XPS spectra, and 60% of these papers showed fitted spectra). In the more general science journal, the use of XPS is significantly lower and, as just noted, the quality of data analysis appears to be lower as well (1.2% of the papers in Journal C showed XPS spectra, and 78% of these papers showed fitted spectra). It is interesting that the percentage of red rankings increases with the percentage of fitted data, and the group that appears to know most about XPS (the surface and interface group) also seems to do the least peak fitting. XPS peak fitting is an indispensable part of some aspects of XPS data analysis. However, not every XPS narrow scan needs to be fit. Sometimes, simply showing and discussing the raw spectra, or integrating them and comparing peak areas, provides enough information for a study. Finally, note that the 39%, 18%, and 1.2% values (of publications with XPS spectra) here are most likely underestimates for the frequency with which XPS was used in these studies, because this survey only considered papers that showed actual XPS spectra—certainly some papers in these journals would have mentioned XPS analysis and/or discussed results obtained from the technique without showing spectra.
TABLE I. Prevalence of XPS spectra, fitted spectra, and red/orange rankings in the three journals surveyed.

|  | Journal A, Energy and Battery Related | Journal B, Surface and Interface Chemistry | Journal C, General Science |
|---|---|---|---|
| Total No. of papers | 397 | 863 | 8359 |
| % Papers with XPS spectra | 39 | 18 | 1.2 |
| % Papers with fitted XPS spectra | 76 | 60 | 78 |
| % Papers with fitted spectra and red rankings | 43 | 35 | 47 |
| % Papers with fitted spectra and orange rankings | 41 | 40 | 36 |
It is clear from these results that significant parts of the community that uses and produces XPS spectra have problems with inappropriate data acquisition, fitting, presentation, and/or analysis. In her assessment of the reporting of parameters related to XPS peak fitting, Karen Gaskell46 found many additional problems: less than 50% of the papers identified the software being used, less than 40% provided information about peak widths and other relevant parameters, and less than 10% described the fitting process. Several recent efforts have been made to increase the amount of information related to peak fitting that is disclosed in papers. ISO has a standard that covers reporting parameters related to peak fitting,47 Sherwood has published an excellent review of the use and misuse of peak fitting,31 and a guide has been prepared to help address this quality challenge.44
B. Looking deeper—what elements are being analyzed and where are the problems?
1. Identifying the most frequently analyzed elements
As with any measurement, an XPS analysis is an effort to address research or technological questions. It is, therefore, useful to explore more broadly the elements that are being analyzed by XPS, and, to the extent possible, information about the materials being analyzed and where problems are appearing. To this end, the number of times spectra from each element appeared as XPS narrow scans in Journals A, B, and C was determined, and this information was compared to the number of hits for each element over a two-month period at the XPS Simplified web-based database (note that this approach to quantifying the results, in which individual spectra were counted, differs from the per-paper approach taken to this point in this work). Figure 3 shows the total number of page views and narrow scans of the top ten elements from these two sources. From these plots, three observations are apparent: (i) carbon and then oxygen are the top two elements on both lists—these two elements appear to be the most researched and also the most shown in XPS; (ii) carbon and oxygen are not only the top two elements on both lists, they lead the other elements by a significant margin; and (iii) with the exception of copper (on the XPS Simplified list) and fluorine (on the publications list), the elements on the two top ten lists are the same. This agreement suggests that the publications analyzed in this study constitute a reasonable, representative sample of XPS in the scientific literature.
The quality of the analysis in the papers that included the most highly reported and researched elements in XPS was reviewed. Note that the overall rating for each paper was used to categorize/rank all the narrow scans included in that paper; i.e., if a paper had previously been given a yellow score, all narrow scans in that paper received a yellow score. The color-coded quality rankings for the first five elements in the literature survey in Fig. 3 are shown in Fig. 4.
A few points are of interest in this figure. First, the rankings for the top five elements shown in the literature are somewhat to noticeably better than the average rankings for all the elements. However, as noted above, Figs. 3 and 4 were obtained by counting every XPS narrow scan of each element in our sample of the literature, while Fig. 2 shows overall percentages for these papers—one would expect the percentages in Fig. 4 to be higher. Second, Fig. 4 suggests that, ironically, among the top five elements shown in narrow scans in the literature, the most commonly shown element (carbon as C 1s) is also the most poorly analyzed. Third, while analysis of the top three elements, C, O, and N, involves s signals, which are less complex than those produced by p, d, and f shells (s orbitals in XPS produce only one signal, while p, d, and f orbitals yield two), the quality ratings, indicated by the percentage of red and orange ratings, are somewhat worse for the C 1s, O 1s, and N 1s envelopes than for the fourth and fifth elements on the list, S and Ti, which are most commonly analyzed in S 2p and Ti 2p narrow scans. Part of the reason for the errors in these “simpler” 1s spectra may be that their envelopes often contain multiple chemical states, which requires peak fitting with multiple components (see common errors listed above). Furthermore, while somewhat better, the fitting of S 2p spectra was also problematic in many cases. To elaborate, the S 2p spectrum has a rather well-defined spin–orbit doublet that is too often ignored, leading to inappropriate labeling of chemical states. The Ti 2p signal is not the most complex transition metal spectrum: Ti(II) and Ti(III) do not show significant multiplet splitting, some tailing is expected for the metallic Ti(0) signal, and the Ti(IV) signal is rather straightforward to analyze (no multiplet splitting).48 Nevertheless, it seems somewhat surprising to have a transition metal with fewer issues than C, O, and N. One reason for the better analysis of Ti may be that many spectra in the literature for titanium are of TiO2, which is a very common material with Ti(IV); i.e., the diversity of Ti-containing materials in the literature appears to be lower than that for C, O, N, and S. Another possibility is that analysts recognize the complexity and take additional care. These results suggest that guides focusing on peak fitting and identification of the chemical states in C 1s and O 1s (and probably N 1s and S 2p, and perhaps Ti 2p and Fe 2p) spectra would be of value to the community.
2. Types of C- and O-containing materials being analyzed
Because of the high level of interest in carbon and oxygen in XPS, we categorized the types of carbon- and oxygen-containing materials currently being analyzed in the literature. In the case of carbon, a large fraction of the C 1s narrow scans came from either (i) sp2-type materials, i.e., graphene, carbon nanotubes, graphite, etc., (ii) organic polymers, (iii) ultrathin films, e.g., organic monolayers on gold, silicon oxide, or silicon, or (iv) adventitious carbon. (Of course, adventitious carbon is everywhere—adventitious carbon is probably included in the other carbon signals listed here. This category was for signals that were specifically identified as this form of carbon.) Carbon-containing materials that fell outside of these categories were put in an “Other” category, which included materials like carbon steel, carbonates, diamond, carbides, kerogen, and MOFs. Figure 5 shows the number of papers in each of these categories and the color rankings they received. Organic polymers have been an important focus of XPS research for decades.49 Nevertheless, the number of C 1s spectra in the literature of sp2-type materials is more than twice that of any other category, which is clear evidence for the strong interest today in these materials for batteries, catalysis, filtration membranes, etc. More than 50% of the C 1s papers with spectra of sp2-type materials contained red errors. This high proportion of research where there are questions regarding the interpretation of the data is probably explained, at least in part, by the fact that sp2-type materials produce some of the most complex C 1s envelopes, which often include two types of carbon (sp2 and sp3), noticeable asymmetry in some of the signals, chemically shifted (oxidized) forms of carbon, the need to use information from other regions of the spectrum (often the O 1s region), and shake-up signals. Determining appropriate backgrounds for these envelopes, which are often extended, can also be challenging, and one must watch out for the few overlaps with other signals that may occur, e.g., the Ru 3d5/2 and 3d3/2 peaks overlap directly with the C 1s signal, and a small K 2p3/2 signal can be mistaken for a carbon shake-up peak. In contrast, the number of green rankings for the ultrathin films is quite high, and the number of red rankings for the adventitious carbon is quite low (see Fig. 6, bottom plot). These more favorable results may be because (i) those who work on ultrathin films/monolayers and/or who analyze the adventitious carbon on their surfaces tend to be part of the surface community and are, therefore, more knowledgeable about XPS (this possibility is consistent with the lower number of red errors noted in Journal B, the surface journal, see Table I) or (ii) many alkyl monolayers and most adventitious carbon are hydrocarbon in nature, which are relatively easy to fit: these peak envelopes often only show signals from sp3 hybridized carbon (no shake-up signals and little or no asymmetry in their component signals) and only one or two oxidation states for carbon. However, it is also the case that the numbers of spectra in these categories are quite low, which makes these latter results, statistically speaking, tentative at best.
The O 1s narrow scans in the literature were also categorized by material (see Fig. 6). The resulting categories, in order of frequency, are inorganic oxides, oxygen in sp2 carbon, oxygen in organic polymers, and oxygen in ultrathin films. Perhaps surprisingly, O 1s narrow scans of inorganic oxides are more than twice as prevalent as those of any other material, which is probably a reflection of the general importance of these types of materials—inorganic oxides are present in many substrates including as oxide films on semiconductors and metals, and oxides are regularly deposited by atomic layer deposition (ALD), chemical vapor deposition (CVD), and sputtering. Corrosion of most inorganic materials results in oxides. The study of catalysts, often inorganic oxides, for applications such as water splitting is just one of several emerging “hot” areas of research that rely heavily on the interpretation of O 1s spectra. The rather high number of O 1s narrow scans of oxygen-containing sp2-type carbon appears to follow from the importance of sp2-based carbon materials in general (see above). Orange or red errors identified in O 1s fits frequently involved either overfitting of spectra (adding too many fit components) or incorrect peak assignments. Common examples of the latter were studies into the significance of oxygen defects. In these cases, the O 1s signal of oxygen defects was commonly identified at a binding energy just above the regular metal oxide, but other contributions that occur at similar peak energies were ignored, e.g., hydroxides and oxygen-carbon based functional groups. In examples of the former, authors would identify, fit, and assign multiple components within a very small binding energy range without any error analysis or assessment of the uniqueness of the resulting fit. In these cases, it appeared that component intensities were set to yield a favorable result, conveniently supporting the authors’ hypotheses but being clearly based on a biased (and often undescribed) fit protocol.
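The missing error analysis and uniqueness assessment noted above can be addressed with very little effort. One simple approach (a sketch only, assuming SciPy's curve_fit and a model function such as those sketched earlier) is to refit the same spectrum from many perturbed starting guesses and examine the spread of the recovered parameters:

```python
import numpy as np
from scipy.optimize import curve_fit

def uniqueness_check(model, x, y, p0, n_starts=50, spread=0.2, seed=0):
    """Refit from randomly perturbed starting guesses; a large spread in the
    recovered parameters (e.g., component areas) signals a poorly determined fit."""
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(n_starts):
        trial = np.asarray(p0, dtype=float) * (1.0 + spread * rng.standard_normal(len(p0)))
        try:
            popt, _ = curve_fit(model, x, y, p0=trial, maxfev=20000)
            fits.append(popt)
        except RuntimeError:
            continue                      # discard starting points that fail to converge
    fits = np.asarray(fits)
    return fits.mean(axis=0), fits.std(axis=0)
```

Parameter standard deviations comparable to the parameter values themselves indicate that the reported component intensities are not uniquely determined by the data and should not be used to support chemical conclusions.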
The results in Figs. 5 and 6 point to a need within the technical community for tutorial information about fitting C 1s and O 1s narrow scans, which should probably place special emphasis on fitting/understanding the C 1s and O 1s signals in sp2-type carbon and the O 1s signals in inorganic oxides. Some attention should also be paid to peak fitting both the C 1s and O 1s spectra of organic polymers. These results suggest that additional tutorial type information regarding fitting of specific elements or classes of elements may be useful.
This section is concluded with a brief discussion of the types of carbon- and oxygen-containing materials being analyzed in the three journals we sampled. As expected, the types of carbon- and oxygen-containing materials differ by journal. Regarding the analysis of carbon, Fig. 7 shows that (i) the majority of carbon-containing materials analyzed in Journal A (the energy/battery journal) contain sp2 hybridized carbon, although a good fraction of the materials that were analyzed also fall outside of one of our main categories, (ii) sp2 carbon-containing materials, organic polymers, and ultrathin films (in this order) are all analyzed to a fairly high degree in Journal B (the surface and interface journal), suggesting that it is more balanced in its coverage of materials, and (iii) a strong majority of the C 1s spectra in Journal C (the general science journal) are of sp2 hybridized carbon-containing materials. Regarding the analysis of oxygen, Fig. 7 reveals that inorganic oxides are the most analyzed class of materials in all three journals, with sp2 hybridized carbon being the next most analyzed type of material in Journals A and C, and Journal B again showing a greater diversity in the types of materials analyzed in it.
C. Additional discussion and moving forward
It is emphasized that this effort to track and document errors and problems in the literature has not been an attempt to discredit or marginalize any journals, authors, or techniques, but rather to assess the degree of the problem so that it can be addressed. Anecdotal evidence is simply not adequate to guide educational or organizational efforts to improve the quality of XPS peak fitting, or that of other techniques—the information summarized here has been presented to help identify the extent of the problem and guide efforts to solve it. As described below, data from this study have motivated the development of guides focused on XPS peak fitting in general and also on extracting information from C 1s spectra. This study also provides motivation for the development of other guides such as for O 1s and N 1s peak fitting and, perhaps, an argument for tutorials on the XPS analyses of other elements.
1. Useful introductory guides, tutorials, and perspectives that address common errors in XPS data analysis
Identification of common errors often helps researchers spot problems in the literature and avoid them in their own work. However, it is not necessarily useful to identify common errors without providing tools to address them. An important component of the focus topic collection of papers related to Reproducibility Challenges and Solutions17 is intended to provide such tools, many of which directly respond to issues discussed in this paper. It includes a collection of papers designed to provide introductory XPS information and guides and, as appropriate, a slightly higher-level discussion of important topics related to XPS analysis. The objective is to provide less experienced XPS users easy access to the information needed to obtain and report reliable information from their data. These papers focus on issues that are important to users and on those that frequently pose problems or challenges.
In this study, many reports of XPS data were found to have multiple types of errors. The special topic collection includes several papers that directly address important elements of curve fitting, including a Practical guide for curve fitting,44 which introduces the important concepts, identifies many of the ways fitting can fail, and provides examples of ways to fit spectra of increasing complexity. Also important is the Introductory guide to backgrounds in XPS.50 This guide introduces the differing nature of background spectra observed in XPS and the types of backgrounds used in fitting data. Both guides directly address many of the common problems identified during the data analysis described in this paper. In addition, two of the guides deal with the specific types of issues observed in reports of C 1s spectra. The first focuses on XPS analysis of polymers (Practical guides for x-ray photoelectron spectroscopy: Analysis of polymers51) and the second on C 1s narrow scans [Practical guides for x-ray photoelectron spectroscopy (XPS): Interpreting the carbon 1s spectrum52]. The latter was developed in direct response to the problems in fitting C 1s spectra found in this study. Other introductions and guides deal with incorrect peak identification, loss peaks, spin–orbit splitting, and peak identification problems introduced by surface charging.28,50,53 Additional papers address other sources of the XPS analysis problem such as instrument operation,54 quantitative analysis,55 and sample preparation.56
2. Everyone has a role in addressing the challenge
Unfortunately, the problems created by faulty analysis of materials by XPS and other techniques are complex and in many ways self-perpetuating. By recognizing some of the issues and drivers, it may be possible to avoid or at least minimize their impact.
Sarewitz57 describes a “destructive feedback between the production of poor-quality science, the responsibility to cite previous work, and the compulsion to publish.” Breaking the chain of publishing and referencing incorrect data or analysis requires the vigilance of the entire community.
The current degree of inadequate XPS data and analysis occurs despite more than thirty years of community efforts to develop guides and standards for surface analysis. Both ASTM International Committee E42 on Surface Analysis and the International Organization for Standardization ISO Technical Committee 201 on Surface Chemical Analysis have devoted considerable efforts to produce standards and guides for XPS. However, one problem with consensus standards is their limited availability. The series of Reproducibility Challenges and Solutions guides directly addresses this issue by providing information critical to quality XPS use and pointing toward the standards that are available.
Many other issues relate to the quality of analysis of both XPS data and that from other techniques. Data accessibility and archiving can enable others to analyze or reanalyze data to verify or correct analyses. Equipment vendors have an important role in educating users. They also need to continue to develop software that facilitates data archiving by recording a full range of instrument and operational parameters, which assists in the reporting, replicating, and understanding of data sets. Providing instrument profiles that are referenceable can assist users in accurately reporting instrument information. As always, data quality, analysis, presentation, and accuracy remain a core responsibility of researchers, analysts, and authors.
IV. SUMMARY AND CONCLUSIONS
Consistent with reported trends for other types of materials characterization, an analysis of papers published in three high-quality journals over a six-month period has revealed that roughly 30% of the analysis of the XPS spectra published in these journals is sufficiently flawed that it calls into question the conclusions of these papers.
Beyond simply identifying the presence of errors, which provides an important motivation for improvement, the identification of the types of errors that commonly occur provides information useful in spotting errors in the published literature. It was found that the curve fitting of XP spectra was a significant problem in XPS analysis. This has motivated the development of new guides for peak fitting, including for the analysis of C 1s spectra. These guides, and others in the collection, are intended to be useful for the XPS community in producing and recognizing quality XPS data and analysis.
The intent of this study is not to be critical of the technique, which has the proven ability to provide important and highly useful data for many types of research. Rather, an attempt was made to offer information that can motivate improvements in the analysis and reporting of XPS data.
The increasing need for research studies to apply a wide range of tools is a likely contributor to the observed problems. It is not possible for individual researchers or even research groups to have the expertise required in all areas. This same issue impacts journal reviewers. Collaboration and interactions with analysis experts are encouraged.
The XPS community has developed standards and guidelines for data collection, analysis, and reporting that have improved the ability of the analyst to perform quality XPS analysis and that address the issues identified in this work. However, as the use of XPS has grown at a very rapid pace, the transfer of this information has been limited. Increased efforts are needed, and are underway, to enhance the availability of this information.
AUTHORS’ CONTRIBUTIONS
The ad hoc group of senior experts responsible for the spectral evaluation in this study consisted of W.S., T.R.G., C.D.E., A.H.-G., D.R.B., and M.R.L. All of these members contributed in a consequential way to the writing of this document. As noted, T.S.N. performed the original analysis of the elements that are most frequently analyzed and researched in XPS. He contributed the data from the XPS Simplified website that was analyzed herein. The student authors on this work performed the data mining and workup of the data, including a significant amount of analysis that is not shown herein and the creation of a searchable database. G.H.M., T.G.A., and B.M. evaluated most of the spectra with M.R.L. prior to the teleconferences with the other committee members. These graduate students also participated in these discussions. G.H.M. and T.G.A. directed work performed by other students. B.M. suggested that we evaluate Journal A. G.H.M. contributed significantly to the writing, production of figures, referencing, and summarizing/analyzing of the data in this document.
ACKNOWLEDGMENTS
The BYU authors acknowledge the Department of Chemistry and Biochemistry and College of Physical and Mathematical Sciences at Brigham Young University for their support.
DATA AVAILABILITY
The statistical results that support the findings of this study may be made available on request from the corresponding author. The specific evaluations of spectra that were made in this study will not be made available due to privacy and ethical restrictions.
APPENDIX: UNIFIED LIST OF COMMON ERRORS
The common errors noted for the different paper classifications/categories used in this study are presented here as one list for clarity. This list was used in a “Common Errors Poster” presented in 2019 at the European Conference on Surface and Interface Analysis, Practical Surface Analysis, and American Vacuum Society, along with examples of data analyses containing multiple errors, with the challenge for poster viewers to identify as many errors as possible.
No indication that instrument performance or appropriate calibration was completed before analysis.
No consideration of the relevant physics and chemistry of the spectra when doing analysis, peak identification (including satellite and multiplet splitting), or peak fitting.
Not plotting the data according to the international convention, i.e., binding energy increasing to the left.
Presenting and interpreting data that are far too noisy to be useful.
Labeling noise as chemical components.
Not showing the original data—only showing the synthetic (fit) peaks and perhaps their sum.
Using inappropriate background shapes or not showing any background in a fit.
Not providing the sum of the fit components, making it difficult to determine the quality of a fit.
Having widely varying peak widths in a fit, e.g., using extremely broad and extremely narrow peaks when there is no chemical reason for doing so.
Having a baseline completely miss the noise/background on either side of the peak.
Not collecting data over a wide enough energy window to see a reasonable amount of baseline on both sides of the peak envelope.
Mislabeling chemical states, e.g., assigning a component at higher binding energy to a lower oxidation state; for example, in a fit to a C 1s spectrum, reversing the labels of the C–O and C=O fit components.
Not taking spin–orbit splitting into account when it is necessary, and/or using inappropriate ratios for these pairs of peaks.
In a comparison of related spectra, employing widely different peak widths and/or positions for components that are supposed to represent the same chemical states.