This article strives to expand on existing work to demonstrate advancements in data processing made available using event mode measurements. Most spallation neutron sources in the world have data acquisition systems that provide event recording. The new science that is enabled by utilizing event mode has only begun to be explored. In the past, these studies were difficult to perform because histograms forced dealing with either large chunks of time or a large number of files. With event based data collection, data can be explored and rebinned long after the measurement has completed. This article will review some of the principles of event data and how the method opens up new possibilities for *in situ* measurements, highlighting techniques that can be used to explore changes in the data. We also demonstrate the statistical basis for determining data quality and address the challenge of determining how long to measure mid-measurement. Finally, we demonstrate a model independent method of grouping data via hierarchical clustering methods that can be used to improve calibration, reduction, and data exploration.

## I. INTRODUCTION

By utilizing time-of-flight methods, spallation neutron sources (SNSs) can continuously gather much wider energy-dependent spectra than monochromatic approaches. These time-of-flight sources are increasingly reliant upon the so-called event based data measurement modes,^{1} where the detection of each individual neutron count is recorded, as opposed to being recorded only within a histogrammed spectrum. Although the individual events are later rebinned into classical histogram representations (e.g., measured intensity vs. d-spacing), the event based data format remains stored at the facility such that it can be re-reduced or examined in new ways at a later date.

An obvious advantage of event based data modes is the ability to rebin and examine the measured scattering in different schemes *ex post facto*, which can be particularly useful for *in situ* studies. For example, if a material unexpectedly undergoes a structural change or transition at an unexpected time (e.g., during a heating ramp), the data can be parsed selectively to identify scattering before and after such a change and not forced into a collectively merged single spectrum. There are numerous such recent examples from the NOMAD^{2} and POWGEN^{3} beamlines at the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL).^{4–8} Beyond slicing data in time, scattered data can be reduced according to the sample environment condition recorded at the time of the measurement (referred to herein as a sample log) such as temperature, pressure, applied electric field, etc. Event based data have the potential to be utilized in advanced data reduction and analysis analytics, particularly through the use of machine-learning beamline methods, for example, automatic grouping of measured data via hierarchical clustering.

In this paper, we demonstrate some of the fundamental considerations in efficiently grouping event based data, introducing the concept of “fat events.” We also review the underlying statistics which govern the ultimate data quality for neutron diffraction measurements and correlate these with considerations of measurement time. Finally, we demonstrate a model-independent method of automatically grouping spectra and its utility for both error checking and data clustering.

## II. BRIEF REVIEW OF POISSON DISTRIBUTION

The Poisson distribution describes the probability, *P*, of observing a given number of events (i.e., *ν* detected neutron counts) within a given time period if the average number of events expected in such a time period is *μ* such that

This is a simplified form of binomial distribution in the case of a large number of trials. The variance for the Poisson distribution is simply *σ*^{2} = *μ*. Note that in relation to event based neutron scattering data, this will hold true only for raw detector counts and not necessarily for the fully reduced data.

A correct characterization of “time period” is required for this representation to make sense. The number of detected neutrons depends not only on the scattering cross section but also on the incident flux at the sample. This is typically accounted for by dividing data by a characteristic of the number of incident neutrons, such as the integrated proton charge (e.g., micro-amp hours) or integrated incident monitor flux. In effect, “time” speeds up or slows down depending on the incident flux on the sample. Hence, many data acquisition systems will characterize a single measurement by the accumulated proton charge on the target, as instantaneous beam power delivered to the target at a spallation neutron source frequently varies.

It follows naturally that any observed value which obeys a Poisson distribution will have a measured uncertainty that falls as *T*^{−1/2}, where *T* is the measurement period.^{9} As both the number of neutron counts at detectors and the accumulated incident flux increase with time, neutron scattering event detection can, to first order, be considered to behave according to Poisson statistics. This is the origin of the general beam line adage that one must measure 4 times as long to improve statistics by a factor of 2. Note that there are potentially other systematic sources of error in a neutron scattering measurement, and this rule only applies to the improvement of uncertainty which are derived from the quantities obeying Poisson statistics.^{10}

## III. EVENT MEASUREMENTS

With neutron event mode data collection,^{1,11} the time-of-flight, detector identifier, and wall clock time of the proton pulse the event is associated with are recorded for every event. This last value is commonly referred to as the event’s pulse time. Using the pulse time of every event and correlating it with the sample environment logs, one can split or slice the data after the acquisition is completed. The SNS at ORNL operates at a nominal 60 Hz. This means that time slicing schemes that are significantly larger than $160$ s can employ this simple approach. For faster varying information (e.g., high-frequency electric field, pulse magnets), not only a correction must be made to calculate the time when the neutron was at the sample position but also a correction must be made for the incident flux on the sample. However, most sample environments have a characteristic time scale which is much larger than an individual time-of-flight frame associated with a proton pulse. In this contribution, we focus on slicing data from these “slower” sample logs.

Data stored in raw event mode can be cumbersome to reduce to traditional forms, largely based on the inherent memory requirements associated with the data format. Significant compression and process speed-up comes from storing the data as a sparse histogram of “weighted events,” where each bin represents a given time-of-flight value, with a weight equal to the number of events that fall within that binning scheme.^{11} Beyond time-of-flight, weight (or counts), and associated uncertainty, each weighted event is associated with the detector pixel it was measured at. With this strategy, the pulse time is discarded.

For *in situ* measurements, where reduction schemes may be varied *ex post facto*, it is useful to introduce an additional data storage scheme that functionally sits between the raw event format and the fully histogrammed weighted events. We here introduce the concept of “fat events,” which are collections of sparse histograms that are only partially compressed against pulse time. As the characteristic timing associated with many sample environment logs vary on the order of seconds (or longer), which is much greater than the associated pulse time measurement tolerance (100 ns), significant gains in compression can be achieved by selecting binning schemes of weighted data in accordance with the characteristic time of sample environment log variance, which naturally encompasses many individual event pulse times. Thus, the “fat event” representation is a collection of sparse histograms, one for each effective pulse time. This additional field for pulse time can provide savings in memory over raw events but still allows for slicing the data later. Figure 1 is a modified version of Fig. 3 from the work of Peterson^{11} with additional lines for fat events with varying wall clock tolerances. The vertical axis describes a “relative compression factor” which is the ratio of the number of compressed to number of uncompressed (or raw) events (i.e., compressed/uncompressed). This is limited to be between zero (no events at all) and one (each event is unique and cannot be compressed).

As an example of fat events, imagine an experiment where all of the neutrons are observed in a single narrow peak. In this case, each newly arrived event’s time-of-flight will be the same as an existing weighted event’s. The existing weighted event’s weight and uncertainty would be changed, and the new event would be discarded as it is accounted for.

Empirically the relative compression factor follows the form of

with *T* being the wall-clock time and the constants *A*_{i} being freely fit. The main point is that the compression performs proportional to *T*^{−1/2}. This is in agreement with the traditional view that neutron scattering adheres to Poisson statistics as described in Sec. II. An important feature of compressing data is that the data will only compress if new events have the same, within tolerance, time-of-flight as data that are already recorded. Compressed data requires fewer unique time of flight values to describe it.

## IV. STATISTICS OF RECORDED SAMPLE LOGS

Assume that a sample environment log value (e.g., temperature) has a time dependence given by *L*(*t*). For older, histogram based neutron data acquisition systems, it is common to record these log values at a single point in the total measurement time (often either at the beginning or at the end of the measurement). In event based measurements, one typically records *L*(*t*) over all times, to allow for later data filtering. This section will explore different ways of summarizing the sample environment log value of using three different techniques: simple mean (*μ*), time averaged mean (*μ*_{T}), and flux averaged mean (*μ*_{P}).

Since the records are made at discrete times *t*_{i}, a common way to calculate the simple mean, *μ*, and variance, *σ*^{2}, of *L*(*t*) is by inspecting the sample log values themselves,^{13}

and the associated variance^{10}

Here *L*_{i} = *L*(*t*_{i}). This will give the wrong result for quantities that are not recorded with a constant frequency. Instead, the time averaged mean will produce a more accurate result

and the corresponding variance

Filtering, or splitting up the measurement, can be accounted for as well by introducing a masking function, *M*(*t*). Here, we assume that the mask will have values of either 1 (when the neutron events are relevant) or 0 (otherwise) and is applied to both the neutrons and the sample environment logs. In the presence of a mask, Eqs. (5) and (6) should be rewritten by replacing *dt* with *M*(*t*)*dt*.

In real situations, the underlying measurement is tied to the incident flux Φ(*t*) rather than to a clock on the wall. Frequently, experimenters will normalize the diffraction pattern by the total incident flux but will use a time-averaged value of a log. This is inconsistent, but generally of minor consequence. To introduce the flux weighted average of log, first we must define the normalized flux

Using this function allows for determining mean and variance using traditional techniques for continuous variables^{13}

and the associated variance

These reduce to the time average and variance in the case of constant flux, Φ(*t*) = *P*, using all of the recorded information *M*(*t*) = 1 for 0 ≤ *t* ≤ *T*_{1} ≤ *T* and zero elsewhere. Then the normalized probability is simply

The mean and variance are then of the expected form

and similarly,

For the case of constant incident flux, this is the time average value and the time average variance.

We demonstrate with an example where *L*(*t*) is taken as the total measured neutron counts per second (Fig. 2, middle) over all detectors on the POWGEN instrument during the initial run cycle turn-on period, when the delivered power to the target is being slowly ramped up (Fig. 2, top). To first order, there is a linear relationship between proton power on the target and neutrons generated, as seen by the normalized ratio of these two quantities (Fig. 2, bottom). Note that this behavior deviates slightly at very early times (*t* < 1.5 h) when the incident spectrum generated by the moderator is not yet in equilibrium (observed as an increase in delivered neutrons per pulse).

Using the counts per second as *L*(*t*), one can see the difference in employing the various averages in Fig. 3. As with many other examples in this article, when the incident neutron flux is constant, all three measurements of the uncertainty follow a trend of *T*^{−1/2}.

Since the neutron count rate was recorded at a constant rate of once per second, it is not surprising that the average *μ* and time average *μ*_{T} are the same (green and blue lines, respectively). The incident flux weighted average has a different dependence, due to the behaviour in the beginning of the time interval, when the accelerator was ramping up after an extended shutdown. This is also when the ortho/para-hydrogen ratio is being optimized for the run cycle.^{14} After the first hour, once the proton charge (and incident spectrum) has stabilized, the values and uncertainties follow the *T*^{−1/2} trend. In practice, one would filter/remove the data from the first hour. Figure 3 highlights how accepting a relatively small amount of unstable data into an otherwise stable measurement has a strong effect. Another example is presented in Appendix A with greater detail.

## V. ESTABLISHING STOPPING CRITERIA

The question of “how long to measure” in a neutron scattering experiment is often answered by estimating expected scattering rates via the sample mass and expected scattering power (from literature values^{15,16}) or by measuring a characteristic sample for a short time and extrapolating. Other heuristics can be applied such as measuring to a specific statistical cutoff (e.g., Toby and Egami^{17} suggests 10^{6} counts over a range of data to obtain an acceptable uncertainty in reduced data) or simply comparing the current sample to how long a related sample was previously measured.

It has been long suggested that neutron diffraction data are over-measured. This is understandable, as the consequences of having insufficient counting statistics often outweigh the potential gains of dividing limited beam time among additional samples. However, if one can determine a way to match the measurement time to the statistical requirements of a sample, this will enable both a more efficient use of beam time and an appropriate degree of confidence in the measured data.

Ideally, the correct criterion for stopping a measurement is when the uncertainty in a parameter from fit or refinement^{18} is below a defined level. For example, consider the ideal case during powder diffraction measurements and Rietveld refinements.^{19,20} Here, one could imagine that data would be reduced and refined during the measurement itself, and dynamic review of the resulting Rwp and structure parameters would influence the measurement stopping criteria (e.g., when the Rwp gets below an expected value). At present, this strategy is often impractical. Reducing the data, getting it to an appropriate analysis program, automating that program, and feeding information back to the data acquisition require several different systems to act in concert in addition to having a reasonably good understanding of the structure. Figure 4 shows this strategy applied to the first 4 h of diamond measurement presented in Sec. IV (while the accelerator was changing power) long after the measurement was complete with the results compared to refining the full 12 h measurement providing the “known” structure. As described above, during the first hours of the measurement, the moderator was out of equilibrium, changing the incident spectrum of delivered neutrons. This is the reason for the variance in the fit 1 min slice results. While Fig. 4 demonstrates that the parameters themselves are greatly affected, the uncertainties in the parameters are driven more by the number of neutrons counted. Note that the parameters of the refinement vary until the incident spectrum has stabilized and then move quickly to their asymptotic values. Accepting “bad” data for the sake of measuring more neutrons has an adverse effect on the data as a whole.

Note that this degree of sensitivity to relative changes in data does not actually require a full Rietveld refinement. Indeed, for metrics defining convergence of data quality, much simpler data inspection methods can potentially be employed. We demonstrate this here by performing single-peak fits on this diamond dataset, specifically looking at the peak at d = 0.48 Å. This peak was fit using a pseudo-Voigt peak shape, in which the peak position, amplitude, and width were refined for each slice of data. The results of the fit are shown in Fig. 5.

From these single peak fits, the characteristic changes due to shifting the incident moderator spectrum can be seen to occur up through the first hour of data collection. The simplicity of this approach lends its use to more applications while still providing more insight than traditional heuristics. Depending on the particulars of the phenomenon being studied, one could select a parameter of a characteristic peak or any such related metric.

## VI. MODEL INDEPENDENT COMPARISON OF SPECTRA

Demonstrated here is an approach well suited for combining event-based data based on how similar it is. It has been described elsewhere^{21–23} that the exact method for clustering data is much less important than the choice of distance or similarity criteria used to inform that clustering. Once an appropriate criterion is chosen, further processing may be required in order to be used with a particular clustering technique, but the resultant clusters generated (often referred to as the different species) will be generally the same.

With diffraction patterns, it is desirable to employ a distance metric that can overlook subtle differences in scale that preserve relative intensities. Additionally, small uniform shifts in peak position, such as those due to temperature dependent unit cell expansion, are often of less interest in comparison to significant changes in structure due to processes such as phase-change or decomposition. de Gelder *et al.*^{24} explored the concept of spectra similarity metrics in relationship to powder diffraction data and introduced a cosine-type metric to describe the similarity between two spectra *f*(*x*) and *g*(*x*),

where $w$_{fg}(*r*) is a weighting function and *c*_{fg}(*r*) is the cross-correlation function,

The auto-correlation function, *c*_{ff}(*r*), is calculated when *f*(*x*) = *g*(*x*) and will have a peak at *r* = 0. The cross correlation is an asymmetric function where *c*_{fg}(*r*) = *c*_{gf}(−*r*). The weights in Eq. (13) must be symmetric ($w$_{fg}(*r*) = $w$_{fg}(−*r*)). Here, they are set to be a triangle function

when |*r*| ≤ *l* and zero elsewhere. The parameter, *l*, is a characteristic distance within which shifted features are not as strictly penalized. Other weighting functions may be selected, but this simple triangle form used here is effective. When two patterns are identical, the similarity function becomes unity, *S*_{fg} = 1.

de Gelder’s similarity metric is insensitive to absolute scale fluctuations between data due to the denominator in Eq. (13). The weighting function employed in the cross correlation can accommodate a degree of tolerance in absolute shifting of feature positions, such as those often encountered due to thermal expansion.

Most clustering algorithms require that the similarity matrix is symmetric and that the diagonal (self-similarity) is unity. Accumulation of numerical errors can cause slight errors along the diagonal that can lead to such algorithms failing. As such, we here explicitly set the diagonal to one and ensure symmetry by averaging with the transpose such that

In the cases presented below, the absolute difference introduced by these two consistency operations from the explicit calculation is found to be of the order 10^{−16} and thus insignificant.

With a similarity metric calculated, it is straightforward to perform hierarchical clustering on the data. Many clustering algorithms consider a distance function rather than a similarity metric, which is related simply as

A distance of *D*_{fg} = 0 would indicate identical patterns.

Continuing to show the first 4 h of the diamond measurement above, Fig. 6 shows the symmetrized similarity matrix with species determined using the DBSCAN (density-based spatial clustering of applications with noise) method.^{25–27} This also highlights the period before the incident flux stabilizes as different from the rest of the measurement. With the selected density parameter, those regions are marked as “noise” by DBSCAN. We note the potential of utilizing such an approach to significantly enhance and automate calibration and reduction procedures on neutron diffraction beamlines, as the data are essentially self-interrogating for outliers.

As a further example of using hierarchical clustering^{28,29} to explore data, we employ this method to a temperature dependent dataset of barium titanate (BaTiO_{3}) measured on the NOMAD diffractometer^{2} from 100 °C to 500 °C at constant 10 °C intervals. Barium titanate is a canonical ferroelectric with a notable difference in local vs. long-range structural transitions, which has received significant attention in recent years.^{30,31} Previous work has shown the utility of employing model-free analysis methods to *in situ* studies of total scattering via the Combinatorial Appraisal of Transition States (CATS) method,^{32} which in part utilized a direct residual between data to determine distance metrics. We here take a similar approach by employing de Gelder’s similarity matrix. The results of calculating the similarity matrix across the dataset are shown in Fig. 7, where the expected phase-transitions (from R3m at low temperature to Amm2, P4mm, and finally Pm$3\xaf$m at high temperature) are noted as dashed vertical lines.

In hierarchical clustering, the different species of data can be determined by setting a characteristic distance cutoff such that data with distances less than the cutoff are clustered together. In the case of the barium titanate data, a cutoff of 0.01 was employed, which produced four distinct species. The calculated distances and resultant clusters are shown as a dendrogram in Fig. 8. A subset of the average scattering patterns, based on these four species clustering and colored correspondingly with the dendrogram, are shown in Fig. 9. We note that the method has correctly separated the four distinct phases. This demonstrates the utility of such an approach not only for calibration and reduction procedures on instruments but also for data exploration and model-free analysis.

## VII. CLOSING REMARKS

Since the initial results of neutron event data processing were published,^{33,34} experience has provided new insights into utilizing this capability. We herein introduced the so-called fat events and demonstrated their utility to more readily inspect partially reduced data. A physically meaningful way to average sample environment measurements, accounting for the incident neutron flux, has been shown herein as well as a guidance to approach the question of when to stop a measurement.

The method of employing hierarchical clustering to automatically discriminate event based data has been demonstrated via neutron diffraction data of diamond and barium titanate samples. This approach shows a model free technique to aid in data inspection and reduction methods, particularly when faced with large datasets from *in situ* experiments. The choice of the distance function can be made such that subtle changes in data (such as peak shifts due to thermal effects) are ignored when compared to more dramatic changes such as phase transitions. This work demonstrates the utility of further exploring such machine learning methods for use in scattering techniques, particularly when addressing large datasets and complex materials under *in situ* study.

## ACKNOWLEDGMENTS

The work of D.O. was funded by the BES Early Career Award, Exploiting Small Signatures: Quantifying Nanoscale Structure and Behavior KC04062, under Contract No. DE-AC05-00OR22725. Neutron scattering measurements are sponsored by the User Facilities Division, Office of Basic Energy Sciences, U.S. Department of Energy. Portions of this research used data from the NOMAD, POWGEN, and VISION instruments at the Spallation Neutron Source. The authors would like to thank Ashfia Huq for providing the POWGEN data, Katharine Page for providing the NOMAD data, and John Faber for insights into pattern matching metrics.

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doepublic-access-plan).

### APPENDIX: TEMPERATURE EXAMPLE

In this appendix, the authors provide a second example to assist the reader in understanding Sec. IV using a more traditional sample environment log, temperature. The data provided here are artificial, but a reasonable analog of a real experiment. Here, we simulate a measurement using a sample environment that should fix the temperature at 200 K for 3 h. However, assume an unanticipated hardware failure causes the temperature to increase rapidly to 220 K. The issue is discovered after 15 min, and the temperature is brought back down but only reaches 205 K. Unfortunately, once the temperature issue is fixed, an unexpected issue in the proton accelerator stops neutron production for 30 min before it is brought back online and continues at one third of its initial power, *P*. Figure 10 shows this information graphically for the simulated temperature log *L*(*t*), a simplified flux Φ(*t*), and mask *M*(*t*). As the data were measured at a facility that records data in event mode, the data need not be remeasured.

Assume that the data from the sample at 220 K are considered outside the prescribed temperature range of interest and are to be discarded. The question remains what to do with the remaining data. The two most obvious approaches would be to either split the data between that measured at 200 K and 205 K or combine it all together to reduce overall uncertainty in the measured spectra. Since the decision will depend heavily on the phenomena being studied, we will present the mean and variance calculations for both the time averaged and flux average methods.

In calculating the time average values of Eqs. (5) and (6), the incident flux, Φ(*t*), is not required. Therefore, the values can be calculated by splitting into three regions

or *μ*_{T} = 203.7 K with an associated *σ*_{T} = 3.0 K. The integrals are not exact because they need to be evaluated numerically, in this case using Newton’s method.

Calculating the flux weighted values of Eqs. (8) and (9) requires splitting into more regions. Here both Φ(*t*) and *M*(*t*) are used. Using the function shown in Fig. 10, the denominator of Eq. (7) becomes

Then Eq. (8) is

or *μ*_{P} = 201.4 K and *σ*_{P} = 2.2 K. From this example, one can see that *μ*_{T} treats all time independent of incident flux, where *μ*_{P} weights the first part of the measurement more because of the higher flux. This is understood because the low temperature region will have accumulated *P*·1 h neutrons, while the high temperature region will have only *P*·0.5 h neutrons. The difference in values is not statistically significant, but the weighting of the different temperatures is more intuitive.

If the decision is made to split the data into above and below temperature changes, the equations are similarly evaluated. However, our fictional data will result in *μ*_{T} = *μ*_{P} = 200.0 K with *σ*_{T} = *σ*_{P} = 0.3 K for the lower temperature. These values are equivalent because *ϕ*(*t*) = 1 in this region. This is not the case for the higher temperature where despite the longer measurement time, the incident flux is significantly smaller, $\varphi (t)=23$. The averages are the same within one significant digit, *μ*_{T} = *μ*_{P} = 205.0 K, but *σ*_{T} = 0.4 K and *σ*_{P} = 0.1 K. The increased precision from splitting into two temperatures is paired with, effectively, having two measurements.

## REFERENCES

_{2}adsorption in zeolite x using stroboscopic isotope contrasted neutron total scattering

_{4}Mo

_{2}O

_{11}and Ce

_{4}Mo

_{2}O

_{11}: Revealing unexpected mixed-valent transient intermediates and determining the sequence of events during crystal growth

_{3}from local symmetry-breaking distortions

_{4}under pulsed high magnetic fields