Non-negative matrix factorization (NMF) is an appealing class of methods for performing unsupervised learning on streaming spectral data, particularly in time-sensitive applications such as *in situ* characterization of materials. These methods seek to decompose a dataset into a small number of components and weights that can compactly represent the underlying signal while effectively reconstructing the observations with minimal error. However, canonical NMF methods have no underlying requirement that the reconstruction uses components or weights that are representative of the true physical processes. In this work, we demonstrate how constraining a subset of the NMF weights or components as rigid priors, provided as known or assumed values, can provide significant improvement in revealing true underlying phenomena. We present a PyTorch-based method for efficiently applying constrained NMF and demonstrate its application to several synthetic examples. Our implementation allows an expert researcher-in-the-loop to provide and dynamically adjust the constraints during a live experiment involving streaming spectral data. Such interactive priors allow researchers to specify known or identified independent components, as well as functional expectations about the mixing or transitions between the components. We further demonstrate the application of this method to measured synchrotron x-ray total scattering data from *in situ* beamline experiments. In such a context, constrained NMF can result in a more interpretable and scientifically relevant decomposition than canonical NMF or other decomposition techniques. The details of the method are provided, along with general guidance for employing constrained NMF in the extraction of critical information and insights during time-sensitive experimental applications.

## I. INTRODUCTION

High-throughput experiments, which involve measurements exceeding the rate of conventional analysis,^{1–6} and *in situ* experiments, which involve measurements made during a dynamic process or sample change,^{7,8} have both become ubiquitous in the discovery of modern materials. Automation and engineering have increased the capabilities for data streaming and *in situ* experimentation in many settings, including university laboratories,^{9} industrial research,^{10} and central facilities.^{11} These technological advancements have afforded new opportunities for interrogating the dynamic processes of systems, including phase transitions^{12} and reactive processes.^{7} However, such opportunities bear the burden of an increased rate of data production, as well as a commensurate increase in the volume of data to analyze.

This is abundantly clear at central user facilities such as synchrotron light sources, where available photon flux for beamline measurements has been increasing at a rate even faster than Moore's law.^{13} Largely driven by advances in accelerator and detector technologies, such facilities are generating more data now than ever before. For example, the National Synchrotron Light Source II (NSLS-II) produced over 1 PB of raw data in the last year, and that rate is expected to increase as the facility matures.^{11} This measured raw data often cannot be immediately converted into material knowledge, but must first be processed and analyzed to extract key information. Unfortunately, the speed of conventional data analysis methods has not kept pace with the increased measurement speeds. This mismatch has led to a scenario where scientists performing *in situ* studies are effectively “flying-blind” while at the beamline, unable to fully understand their data until well after their experiment has completed.

This information bottleneck is particularly pronounced during total scattering experiments, where both the local and average atomic structure of a sample can be measured in fractions of a second.^{14} These experiments—combining both x-ray powder diffraction (XRD) and pair distribution function (PDF) methods—are often critical to understanding the origin of structure/property relationships in energy materials such as batteries,^{15,16} catalysts,^{17,18} photovoltaics,^{13} and multi-ferroics.^{19} Conventional analysis in total scattering studies typically involves iterative fitting of models to reciprocal^{20} and real space^{21} data in order to extract relevant physical properties (e.g., crystallographic phase, lattice constants, atomic positions, site occupancies, and atomic displacement parameters). These refinements may take minutes to hours for a skilled expert to perform, but can be difficult for novices and do not scale well to large data streams.^{22} For amorphous, disordered, or nanoscale hierarchical materials, it is common to employ even more complex “large-box” refinements on total scattering data,^{23,24} but these methods of analysis take even longer (often days to weeks of compute) and are thus completely unsuitable for rapid analysis.

Machine learning (ML) tools offer great promise for monitoring and analyzing large volumes of streaming data.^{11} These tools have been demonstrated for classifying diffraction patterns,^{25–32} or decomposing large diffraction datasets over state variables into fundamental components.^{12,33–37} Particularly, unsupervised learning is useful for offering rapid diagnostics and analysis without prior training.^{15,34,38} Because unsupervised decomposition or data reduction algorithms do not require any prior data labeling, they are exceptionally useful for interpreting datasets in which there are potentially new or unexpected features or phases emerging. Two such algorithms, principal component analysis (PCA)^{39,40} and non-negative matrix factorization (NMF),^{38,41} have been employed to segregate distinct components from diffraction datasets. The algorithms aim to describe a dataset of *m* experimental patterns (XRD or PDF) as a weighted linear composition of *k* fundamental components. Of these, NMF is particularly well suited to the analysis of x-ray scattering or diffraction data because its decomposition aligns with the underlying physics: the component signals are enforced to remain greater than or equal to zero, and are strictly additive to one another in the dataset reconstruction.
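As a concrete illustration of the factorization these methods perform, the following sketch decomposes a small synthetic dataset of mixed Gaussian "patterns" using scikit-learn's off-the-shelf `NMF` (an illustrative stand-in, not the PyTorch implementation described later in this work); all parameters and names here are choices made for the example.

```python
import numpy as np
from sklearn.decomposition import NMF

x = np.linspace(0, 1, 50)

# Two non-negative "pure" patterns (k = 2) of feature length n = 50.
H_true = np.stack([np.exp(-((x - 0.3) ** 2) / 0.01),
                   np.exp(-((x - 0.7) ** 2) / 0.01)])

# m = 20 synthetic measurements: non-negative mixtures of the pure patterns.
rng = np.random.default_rng(0)
W_true = rng.random((20, 2))
X = W_true @ H_true

# Canonical NMF: X ≈ WH with W, H >= 0 and k chosen by the user.
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=1000)
W = model.fit_transform(X)       # (m, k) weights
H = model.components_            # (k, n) components
mse = float(np.mean((X - W @ H) ** 2))
```

Note that nothing in the fit requires `W` or `H` to match `W_true` or `H_true`; only the non-negative reconstruction is optimized, which is exactly the limitation addressed below.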

Despite the attractive properties of non-negativity and additive superposition, NMF as canonically implemented suffers from several significant drawbacks in its application to scattering or diffraction experiments. First, it is incapable of interpreting translational variance: because the canonical NMF model uses only a linear combination of components, a given component cannot effectively reproduce data that is identical except for translation. Unfortunately, this is a common occurrence in XRD and PDF experiments due to material thermal expansion, changes in nanostructure, and lattice strain arising from Vegard's law.^{34} Efforts to overcome this translational variance have included discarding translationally redundant components, although this requires additional hyperparameters for detecting the redundancy that are not effective across different experimental challenges.^{25,34} A second challenge of canonical NMF decomposition is that the optimization is non-deterministic polynomial-time hard (NP-hard), and thus most non-trivial applications will lack a unique solution. This means that the results of the model fit may vary depending on initialization, and are not guaranteed to converge to a global minimum of the reconstruction error. Finally, though non-negativity is a sensible constraint for x-ray diffraction data, it is not sufficient to ensure that the output components correspond to physically interpretable entities.
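The translational-variance limitation is easy to reproduce numerically. In the hypothetical sketch below, two patterns that are identical up to a peak shift cannot be reconstructed by a rank-one NMF, forcing the model to spend one component per peak position rather than one per physical phase (scikit-learn's `NMF` is used here purely for illustration).

```python
import numpy as np
from sklearn.decomposition import NMF

x = np.linspace(0, 1, 200)

def peak(center):
    """A narrow Gaussian 'diffraction peak' at the given center."""
    return np.exp(-((x - center) ** 2) / 0.002)

# Two patterns that are identical up to a shift (e.g., thermal expansion).
X = np.stack([peak(0.45), peak(0.55)])

def recon_mse(k):
    model = NMF(n_components=k, init="nndsvda", random_state=0, max_iter=2000)
    W = model.fit_transform(X)
    return float(np.mean((X - W @ model.components_) ** 2))

# A single component cannot absorb the shift, so its error stays large;
# adding a second (translationally redundant) component fixes the fit.
err1, err2 = recon_mse(1), recon_mse(2)
```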

It can be argued that there exists a set of components and weights that is most physically meaningful in the analysis of diffraction datasets.^{38} From the standpoint of the researcher seeking physical or chemical information, the derived NMF components would ideally correspond to the fundamental physical components in the material, and the respective derived weights would correspond to the volume fractions of those physical components. It is common in such studies to have some prior knowledge about these physical components. For example, *in situ* experiments commonly follow a trajectory across a state variable where, at the start of the experiment, the sample begins as a known phase that will later undergo a transition into other known or unknown phases.^{42} In this instance, a physically meaningful analysis would include components that correspond to the initial, known phase. In other instances, both end-member states of an *in situ* study may be well characterized before the measurement, with unknown intermediate phases or mixtures that need identification.^{12,38} Alternatively, there may be no knowledge of the component phase identity, but an expectation for the underlying composition. Metropolis matrix factorization attempts to address these problems by constraining select components during optimization, but remains too computationally costly to run in real time during an experiment involving streaming data.^{15}

Therefore, an ideal decomposition method for total scattering and diffraction datasets must resolve components and weights that are physically meaningful, and operate efficiently on the timescale of the measurement. The resulting model should not necessarily be the one that best fits the data but the one that is most likely within the physical constraints of the system. By increasing the likelihood that the learned components and weights represent physically relevant features and processes occurring within the sample, these results are more likely to be relevant and interpretable to the researcher performing the measurement. Moreover, an effective technique should operate quickly enough to provide real-time insights to the researcher, thus enabling a feedback loop between knowledge extraction and human action. Therefore, in order to be productive and insightful, decomposition methods for such data should be able to incorporate known weights or components as informative constraints on the problem, at the time of the measurement.

In this contribution, we present our advances in developing constrained NMF methods for use on spectral data, specifically *in situ* studies of total scattering and powder diffraction beamline data. The algorithms were developed using the PyTorch^{43} framework for speed and scalability, allowing constrained NMF to behave as a companion agent in concert with the researcher, leveraging the expertise of the scientist while providing real-time insights via physically interpretable decompositions of streaming datasets. We first demonstrate the capabilities of constrained NMF on several synthetic examples, in which the limitations of canonical NMF are explored. We then use constrained NMF to decompose temperature-dependent total scattering datasets of a ferroelectric perovskite and a molten salt, wherein there is no *a priori* knowledge of components in the latter. Finally, we provide best practices on how to apply these methods during *in situ* experiments and interpret the results as they relate to commonly encountered scenarios in the analysis of energy materials. These developments are extensible to applications in spectroscopy and microscopy, and can also be deployed at university and industrial laboratories.

## II. METHODS

### A. NMF

Given a non-negative data matrix **X** of dimensions *m* × *n*—where *m* is the number of independent measurements and *n* is the length of the feature dimension—the goal of NMF is to find a factorization such that

$$\mathbf{X} \approx \mathbf{W}\mathbf{H}, \tag{1}$$

where **W** and **H** are non-negative matrices of dimension *m* × *k* and *k* × *n*, respectively, and *k* is the user-specified number of components. As such, the problem reduces to solving

$$\min_{\mathbf{W} \geq 0, \; \mathbf{H} \geq 0} D(\mathbf{X}, \mathbf{W}\mathbf{H}), \tag{2}$$

where *D* is a separable, positive measure of fit and the notation $\mathbf{M} \geq 0$ indicates non-negativity of the entries of matrix **M**.

We employ the *β*-divergence $D_\beta$ as a loss function during minimization. The *β*-divergence is a popular cost function for NMF minimization parameterized by a single shape parameter *β*.^{44} For the purposes of this study, we consider two special cases of the *β*-divergence—squared Euclidean distance, also known as the squared error (SE) (*β* = 2), given by

$$D_{2}(\mathbf{X}, \mathbf{W}\mathbf{H}) = \frac{1}{2} \sum_{i,j} \left( X_{ij} - (\mathbf{W}\mathbf{H})_{ij} \right)^{2}, \tag{3}$$

and the Kullback–Leibler divergence (*β* = 1), given by

$$D_{1}(\mathbf{X}, \mathbf{W}\mathbf{H}) = \sum_{i,j} \left( X_{ij} \log \frac{X_{ij}}{(\mathbf{W}\mathbf{H})_{ij}} - X_{ij} + (\mathbf{W}\mathbf{H})_{ij} \right), \tag{4}$$

though the methods presented here may generalize to any choice of *β*.
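As a reference for these special cases, a small sketch of the loss (our own helper for illustration, not code from the published repository) might read:

```python
import numpy as np

def beta_divergence(X, Y, beta):
    """Beta-divergence D_beta(X, Y), summed over all entries.

    beta = 2 gives the squared Euclidean (SE) loss, beta = 1 the
    generalized Kullback-Leibler divergence.  For beta = 1, entries of
    X and Y are assumed strictly positive.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    if beta == 2:
        return 0.5 * np.sum((X - Y) ** 2)
    if beta == 1:
        return np.sum(X * np.log(X / Y) - X + Y)
    # General case (valid for beta not in {0, 1}).
    return np.sum((X ** beta + (beta - 1) * Y ** beta
                   - beta * X * Y ** (beta - 1)) / (beta * (beta - 1)))
```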

We employ the alternating non-negative least squares (ANLS) algorithm in order to minimize the *β*-divergence subject to the NMF constraints. Briefly, this algorithm operates by alternately minimizing $D_\beta(\mathbf{X}, \mathbf{W}\mathbf{H})$ with respect to **W** and **H** under the constraints that all matrices remain non-negative. This can be solved by gradient descent with non-negativity constraints on the parameter updates, as described in Algorithm 1 as follows:

    1:  procedure ANLS(X, W0, H0, β, λ1, λ2, iter, tol)
    2:      i ← 0,  L_prev ← ∞,  Δ ← ∞
    3:      W ← W0,  H ← H0
    4:      while i ≤ iter and Δ > tol do
    5:          X′ ← THRESHOLD(WH, 0, 10^−8)
    6:          L ← D_β(X, X′)
    7:          μ_W ← Σ_{j=1..n} H_{·,j}   if β = 1;   X′^(β−1) Hᵀ   otherwise
    8:          W ← W · max(μ_W − ∂L/∂W, 0) / (μ_W + λ1 + λ2 W)    ▷ Non-negative W update
    9:          X′ ← THRESHOLD(WH, 0, 10^−8)
    10:         L ← D_β(X, X′)
    11:         μ_H ← Σ_{i=1..m} W_{i,·}   if β = 1;   Wᵀ X′^(β−1)   otherwise
    12:         H ← H · max(μ_H − ∂L/∂H, 0) / (μ_H + λ1 + λ2 H)    ▷ Non-negative H update
    13:         i ← i + 1,  Δ ← L_prev − L,  L_prev ← L
    14:     end while
    15:     return W, H
    16: end procedure

    17: function THRESHOLD(M, t, ε)    ▷ Threshold each element in matrix M
    18:     for all M_{i,j} in M do
    19:         M_{i,j} ← M_{i,j}   if M_{i,j} > t;   ε   otherwise
    20:     end for
    21: end function
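A compact NumPy translation of the alternating updates above might look as follows. This is a simplified sketch, not the paper's PyTorch implementation: the scaled-gradient steps of Algorithm 1 are written in their equivalent multiplicative form with the λ regularization terms omitted, and `fixed_W`/`fixed_H` are hypothetical arguments for holding rigid priors constant.

```python
import numpy as np

EPS = 1e-8  # the thresholding floor from Algorithm 1

def anls(X, W, H, beta=2, n_iter=500, tol=1e-7, fixed_W=(), fixed_H=()):
    """Alternating non-negative updates minimizing D_beta(X, WH).

    fixed_W / fixed_H: indices of columns of W / rows of H that are
    held constant throughout optimization (rigid priors).
    """
    W, H = W.astype(float).copy(), H.astype(float).copy()
    free_W = [j for j in range(W.shape[1]) if j not in set(fixed_W)]
    free_H = [i for i in range(H.shape[0]) if i not in set(fixed_H)]
    loss_prev = np.inf
    for _ in range(n_iter):
        Xh = np.maximum(W @ H, EPS)
        # Multiplicative beta-divergence update for the free columns of W.
        num = (X * Xh ** (beta - 2)) @ H.T
        den = Xh ** (beta - 1) @ H.T + EPS
        W[:, free_W] *= (num / den)[:, free_W]

        Xh = np.maximum(W @ H, EPS)
        # Symmetric update for the free rows of H.
        num = W.T @ (X * Xh ** (beta - 2))
        den = W.T @ Xh ** (beta - 1) + EPS
        H[free_H, :] *= (num / den)[free_H, :]

        Xh = np.maximum(W @ H, EPS)
        if beta == 2:
            loss = 0.5 * np.sum((X - Xh) ** 2)
        else:  # beta = 1: generalized KL divergence
            loss = np.sum(X * np.log(np.maximum(X, EPS) / Xh) - X + Xh)
        if loss_prev - loss < tol:
            break
        loss_prev = loss
    return W, H
```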


To allow the import of prior knowledge, Algorithm 1 allows the user to supply initial values for **W** and **H** in the form of the inputs $W_0$ and $H_0$. To fix certain columns of **W** or rows of **H** throughout optimization in order to reflect ground-truth knowledge, one may fix the gradients for the known components to zero in the following manner:

$$\frac{\partial L}{\partial \mathbf{M}_{i}} \leftarrow 0 \quad \forall \, i \in S_{\mathbf{M}}, \tag{5}$$

where $S_{\mathbf{M}}$ is the set of all known components of matrix **M**.
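One minimal way to realize this gradient masking in PyTorch is a tensor hook that zeroes the gradient rows of known components. The snippet below is an illustrative sketch only (the published code may organize this differently), using plain SGD with a clamp to maintain non-negativity; all sizes and the choice of fixed component are assumptions for the example.

```python
import torch

torch.manual_seed(0)
m, n, k = 12, 40, 3
X = torch.rand(m, n)
W = torch.rand(m, k, requires_grad=True)
H = torch.rand(k, n, requires_grad=True)

# Rigid prior: pretend component (row) 0 of H is known.  Zeroing its
# gradient means the optimizer never updates it.
mask = torch.ones(k, 1)
mask[0] = 0.0
H.register_hook(lambda g: g * mask)  # broadcasts over the feature axis

H_init = H.detach().clone()
opt = torch.optim.SGD([W, H], lr=1e-4)
with torch.no_grad():
    loss_init = torch.sum((X - W @ H) ** 2).item()
for _ in range(100):
    opt.zero_grad()
    loss = torch.sum((X - W @ H) ** 2)
    loss.backward()
    opt.step()
    with torch.no_grad():  # project back onto the non-negative orthant
        W.clamp_(min=0)
        H.clamp_(min=0)
loss_final = loss.item()
```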

During the analysis of experimental results, we present the MSE alongside the mean R-factor,

$$\langle R \rangle = \frac{1}{m} \sum_{i=1}^{m} \sqrt{\frac{\sum_{j} \left( X_{ij} - (\mathbf{W}\mathbf{H})_{ij} \right)^{2}}{\sum_{j} X_{ij}^{2}}}, \tag{6}$$

a commonly used metric in crystallographic and scattering methods.
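For concreteness, one common (unweighted) convention for the R-factor can be computed as below; treat `r_factor` and `mean_r_factor` as assumed helpers, since the exact convention used in any given study may differ slightly.

```python
import numpy as np

def r_factor(x_obs, x_calc):
    """Unweighted crystallographic R-factor between one observed
    pattern and its reconstruction (assumed convention)."""
    x_obs = np.asarray(x_obs, dtype=float)
    x_calc = np.asarray(x_calc, dtype=float)
    return float(np.sqrt(np.sum((x_obs - x_calc) ** 2) / np.sum(x_obs ** 2)))

def mean_r_factor(X, X_hat):
    """Average the per-pattern R-factor over the rows of a dataset."""
    return float(np.mean([r_factor(x, xh) for x, xh in zip(X, X_hat)]))
```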

Algorithm 1 has been implemented in PyTorch,^{43} and is publicly available at https://github.com/bnl/pub-Maffettone_2021_04.

For the measured BaTiO_{3} dataset, the as-fit canonical and constrained NMF components and weights are shown in Figs. 7 and 8, respectively. For the measured molten salt dataset, the as-fit canonical and constrained NMF components and weights are shown in Figs. 9 and 11, respectively, including how the model varies as a function of the number of components.

### B. Total scattering data collection

Total scattering measurements were performed at the PDF beamline at the National Synchrotron Light Source II at Brookhaven National Laboratory. The wavelength of incident x-rays for both measurements was found to be 0.1671 Å (74.21 keV). Data were collected in transmission mode with a flat-panel Perkin–Elmer detector. Sample-to-detector distance and detector orientation were calibrated using a Ni powder standard. All raw 2D images were background subtracted using a dark image collected immediately prior to each scan. Beamstop and detector aberrations were removed with masking prior to radial integration, which was performed using pyFAI,^{45} with PDFs then being generated using PDFgetX3.^{46}

For the molten salt measurements, the sample-to-detector distance was found to be 282.74 mm. The sample was held in 1 mm quartz capillaries resistively heated using Nichrome wire, with temperatures measured via a K-type thermocouple. Measurements were collected with 10 s data collection time per temperature. For the BaTiO_{3} measurements, samples were held in 1 mm quartz capillaries, with temperature controlled through a liquid nitrogen cryostream directed at the sample. Diffraction (XRD) measurements were collected at 1 m from the sample with 60 s data collection time per temperature.

## III. RESULTS

The goal of NMF is to find an *m* × *k* non-negative matrix **W** of weights and a *k* × *n* non-negative matrix **H** of components, to approximate a dataset $\mathbf{X} \approx \mathbf{W}\mathbf{H}$, where *m* is the number of patterns in a dataset, *n* is the number of points in a pattern, and *k* is the chosen number of components. The details of this optimization are presented in Sec. II. We augmented this approach to allow the user—or an algorithm—to initialize any column of **W** or row of **H**, and to subsequently constrain those vectors from being updated during the optimization. It is worth noting that the algorithm is often presented with the contents of **W** and **H** reversed; however, we find it more intuitive to have **W** refer to weights and have done so herein.

### A. Synthetic and experimental datasets

To compare the capabilities of canonical and constrained NMF, we assembled two synthetic datasets, which are shown in Fig. 1. The first highlights a common case where two unique, overlapping components mix linearly with coefficients that sum to one [Fig. 1(a)]. This occurs in binary mixtures when the measured response function is a weighted sum of the pure response functions, as is the case for diffraction.

The second synthetic dataset contains three unique functions that are mixed non-monotonically as the dataset index progresses [Fig. 1(b)]. The three functions, chosen to be generic and visually clear, are a Lorentzian, a Gaussian, and a box function. The Gaussian and box function overlap to create a non-differentiable result. This dataset explores a limiting and challenging use case where underlying physical components originate from mixed contributions of both smoothly varying and discontinuous physical features in measured data, such as aberrations introduced by data processing procedures. The weights represent some commonly encountered ways that physical processes—and therefore components—can evolve naturally in chemical and physical processes (gradual decay, transitory existence, sudden onset). This family of challenges highlights the scenario wherein *a priori* knowledge of one or all components exists, but not necessarily knowledge of the weights.

We next examined two distinct experimental systems in order to validate our approach to constrained NMF, the full datasets of which are shown in Fig. 2. First, we analyzed the subtle temperature-driven phase transitions of barium titanate, BaTiO_{3} [Fig. 2(a)]. BaTiO_{3} is a well-studied material that has been used to probe the mechanism of ferroelectricity because its phase transitions exist at relatively low temperature.^{47,48} At high temperature, BaTiO_{3} exists as a paraelectric cubic phase, where on average the titanium (Ti) atoms are octahedrally coordinated by six oxygen (O) atoms. At room temperature, the average displacement of the Ti atom elongates along the [001] crystal axis, resulting in a ferroelectric tetragonal phase. Further cooling results in an orthorhombic phase with polarization along the [011] axis. At low temperature, the material reaches a rhombohedral ground state, with all Ti atoms polarized along the [111] axis. This symmetry-breaking transformation of the crystal lattice results in subtle peak splitting of the XRD pattern. It is further obfuscated by thermal expansion, which drives peaks in the XRD pattern to shift. The differences between patterns are so subtle that classification across these temperature-induced phase transitions is not possible without expert intervention during analysis.^{47}

In our final experimental system, we examined the structural evolution of a near-eutectic mixture of sodium chloride and chromium chloride, NaCl:CrCl_{3}, when it is heated beyond its melting temperature [Fig. 2(b)]. Such molten salts are of interest in the development of accident-tolerant nuclear fuels.^{42} Unlike the changes observed in BaTiO_{3}, the crystalline–amorphous transitions in this salt are abrupt and easily detectable. We used constrained NMF to understand the behavior of NaCl:CrCl_{3} across a temperature range of 300–963 K, which included its melting transition. In this experiment, there was a residual solid phase after the melting transition. Without any additional analysis, it remained unclear to the research team whether that solid was a distinct high-temperature phase or an experimental artifact. Taken together with the barium titanate dataset and the two previously outlined synthetic datasets, these four datasets present the clear utility of constrained NMF for on-the-fly and *in situ* analysis of diffraction data.

### B. Constraining weights with synthetic datasets

Through examination of the first synthetic dataset [Fig. 1(a)], we demonstrated the challenge of retaining physically relevant components from NMF even when the weights are known *a priori*. This is a common situation in diffraction studies of variable composition or phase mixing, where complementary experimental information (preparation conditions or spectroscopy) may indicate the concentration of individual phases without specifying the diffraction patterns of the phases. We considered a continuous mixture of two Gaussian functions, where noise is added both to the composition and to the final function.
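A dataset of this kind can be synthesized in a few lines. The parameters below (peak centers, widths, noise levels, series length) are illustrative choices, not the exact values used to build Fig. 1(a).

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)

# Two overlapping "pure" Gaussian components (illustrative parameters).
g1 = np.exp(-((x + 0.2) ** 2) / 0.05)
g2 = np.exp(-((x - 0.2) ** 2) / 0.05)

# Simplex weights (w1 + w2 = 1) varying linearly across the series, with
# noise added both to the composition and to the measured signal.
m = 30
w1 = np.clip(np.linspace(0, 1, m) + rng.normal(0, 0.02, m), 0, 1)
W_true = np.stack([w1, 1 - w1], axis=1)            # (m, 2), rows sum to 1
X = W_true @ np.stack([g1, g2]) + rng.normal(0, 0.01, (m, 200))
X = np.clip(X, 0, None)                            # keep the data non-negative
```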

The results of using canonical NMF to model this dataset are summarized in Fig. 3, which shows the modeled components, weights, and three representative reconstructions from the full dataset—a presentation format consistent for all synthetic examples discussed herein. In decomposing the Gaussian dataset using canonical NMF, the reconstruction loss as defined by SE [Eq. (3)] achieved a mean value (MSE) of 5.788 over the dataset. The common crystallographic R-factor metric [Eq. (6)] of the reconstruction achieved a mean value of 0.066. The decomposition into two components created a model of the mixtures that fits the data well. Even though the learned components are similar to the true functions, their respective magnitudes are not obviously linked to the magnitudes of those functions [Fig. 3(d)]. Furthermore, the weights must compensate for this increased predicted magnitude, and inevitably over-fit the noise in the dataset [Fig. 3(e)]. This is the expected behavior of canonical NMF, since it is naive to any prior knowledge or physical constraints we can impose on the decomposition. From this, we can see the baseline utility of canonical NMF for dimensionality reduction; however, even in this simple dataset we see how it is possible to obtain physically implausible solutions.

We next sought to inform the NMF model with prior knowledge of weights in the form of optimization constraints. In this use case, the weights were limited to the coordinates of a simplex (that is, they sum to unity) and expected to vary linearly. The results of this constrained NMF model are shown in Fig. 4. By initializing the NMF with the “true” weights, and constraining their update in the optimization algorithm, we find a comparably performing decomposition (MSE of 6.903 and mean R-factor 0.0722) that was much more likely given the physical constraints of the system [Fig. 4(d)]. By design, the performance metrics (MSE and R-factor) are slightly worse than those for the canonical NMF because we do not allow the decomposition to over-fit the noise in the weights (which is not exactly linear). This forces the weights to follow the prior and keeps the reconstruction robust to noise or outliers. Importantly, the resulting components from the constrained NMF have the same amplitude as the “true” function. In a crystallographic or spectroscopic context, this is important because peak intensity carries physical significance.
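When the weights are fully fixed as a rigid prior, the component update reduces to an ordinary non-negative least-squares problem per feature column. The sketch below (a hypothetical helper built on SciPy's `nnls`, not the paper's optimizer) makes this explicit.

```python
import numpy as np
from scipy.optimize import nnls

def components_given_weights(X, W):
    """With the weights W held fixed, each feature column of H is an
    independent non-negative least-squares solve:
    min_{H >= 0} ||X - W H||^2, column by column."""
    H = np.empty((W.shape[1], X.shape[1]))
    for j in range(X.shape[1]):
        H[:, j], _ = nnls(W, X[:, j])
    return H
```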

The results from this constrained NMF are obtained rapidly, in a matter of seconds on a personal laptop, and can be produced on-the-fly during measurement. The output components offer immediate insight into the underlying dataset, while depending on a predicted approximate prior. These components can be used as starting conditions for exact refinement^{20} of each pattern or spectrum in a dataset. When analysis is less time constrained, other methods for further tuning the optimization of constrained NMF can be explored with continual human input. Some such methods include using non-rigid constraints and Markov chain Monte Carlo optimization,^{38} which would enable a flexible, non-exact, linear prior.

### C. Constraining components with synthetic datasets

As a second controlled example, we created a synthetic dataset from three diverse functions with smooth, non-monotonically varying weights in a linear combination [Fig. 1(b)]. Those functions included an overlapping box and Gaussian function, and a Lorentzian function that overlaps neither. In this scenario, prior knowledge of any of these components significantly constrains the NP-hard optimization problem of NMF.

As can be seen in Fig. 5, this dataset was not effectively decomposed by canonical NMF. The resulting reconstruction MSE was 25.441 with a mean R-factor of 0.034. One reason for this failure was the non-differentiable box function. This property makes decomposition particularly challenging for the gradient descent methods used in the optimization. As shown in Fig. 5(d), the model found a local minimum in the optimization where the learned components were partial mixtures of the true components. This is both physically inaccurate and results in poor reconstruction. In a case such as this, the decomposition from canonical NMF results in a weak, non-physical, dimensionality reduction, which offers no new knowledge to the scientist.

As a corollary to the previous example (where prior knowledge of the weights enabled a physically meaningful extraction of components), we sought to inform NMF of the components as a prior to decompose these composite functions. The results of modeling this dataset with constrained NMF are shown in Fig. 6. Despite the non-trivial nature of the weights, prior knowledge of the component functions enabled the model to learn those weights to a high degree of accuracy [Fig. 6(e)]. Again, we see the addition of priors on the components guiding the model away from overfitting. In this instance, the constrained NMF outperformed canonical NMF even by the standard metrics (MSE = 1.336, $\langle R\text{-factor}\rangle = 0.013$)—improving the MSE by nearly a factor of 20. This approach is also applicable if only some of the components are known, albeit less robust for this use case (Figs. S1–S6). This suggests that a procedure that allows for iterative inclusion of more priors through human or automatic intervention would allow for even the most complex datasets to be successfully resolved using constrained NMF.
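The mirror-image case (fully known components, unknown weights) likewise reduces to a per-pattern non-negative least-squares solve. Again, this is an assumed helper for illustration rather than the paper's implementation.

```python
import numpy as np
from scipy.optimize import nnls

def weights_given_components(X, H):
    """With all components H known a priori, each measured pattern's
    weights come from a non-negative least-squares solve:
    min_{w >= 0} ||x_i - H^T w||^2 for each row x_i of X."""
    W = np.empty((X.shape[0], H.shape[0]))
    for i, xi in enumerate(X):
        W[i], _ = nnls(H.T, xi)
    return W
```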

### D. Constraints enable physically meaningful phase segregation of BaTiO_{3}

Following this validation of constrained NMF in controlled systems, we sought to test its ability to construct a meaningful decomposition in a subtle and challenging crystallographic problem with experimentally measured data. The BaTiO_{3} dataset presented in Fig. 2 contains four distinct crystallographic phases ($Pm\bar{3}m$, $P4mm$, $Amm2$, and $R3m$) across the temperature range measured. These transitions are subtle and not easily discernible using conventional analysis methods without expert intervention. Between these order–disorder phase transitions, there is also continuous peak shifting due to thermal expansion.

As seen in Fig. 7, canonical NMF is poorly suited to address translational variance (peak shifting) in decomposing a dataset into composite patterns.^{34,36,37} Additionally, it was incapable of identifying the clear second order phase transitions within the data. Instead, it learned mixtures of the “true” components as its fundamental components in the output decomposition. From Fig. 7(b), one might falsely conclude that the high temperature phase (blue) increases approximately linearly in composition with temperature, or that there are three components that contribute significantly to the low temperature phase. A fastidious materials expert inspecting the NMF fit may disregard the analysis outright as it violates the Gibbs phase rule by suggesting the coexistence of four phases. Despite this, the reconstruction loss from NMF was quite low, with an MSE of 0.150 and a mean R-factor of 0.039. This demonstrates that canonical NMF is not well suited for crystallographic datasets with subtle transitions, a common case for diffraction studies.

We next considered constraining the components of NMF to extract a better, more physically realistic decomposition of the BaTiO_{3} dataset. We sought to approach this in a way that made no presumption of a specific phase, and only operated according to the knowledge that the dataset was measured along a single state variable (temperature). While our synthetic experiment mixing diverse functions demonstrated that constraining all of the components led to the optimal solution, such complete system knowledge may not be expected in all experimental applications. Previous studies on diffraction data have demonstrated that partial constraints may enable the extraction of physically sensible intermediates.^{38,49} To construct a physically meaningful solution without explicit knowledge of the underlying phases, we designed an iterative algorithm that wraps our constrained NMF to handle datasets over state variables such as temperature or composition.

The iterative application of constrained NMF automatically selects components to constrain, thus depending only on the user-selected number of components. It begins by assuming the end members of a dataset (in this case, the lowest and highest temperature patterns) are unique, fixing those members as components, and optimizing a partially constrained NMF. In the limiting case of a rank-two NMF, this amounts to combinatorial decomposition, which has been successful in interpreting total scattering data.^{12} When more components are used in the decomposition, the algorithm selects the variable component corresponding to the highest weights and finds the location where this component is most prevalent in the dataset. The selected component is then fixed to the dataset member at this location, and the NMF optimization proceeds with one additional constraint. This process continues until all components of the NMF are constrained. Critically for total scattering datasets, this results in weights that are bounded between 0 and 1, thus possessing a meaningful mapping to fractional composition.
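The iterative constraint-selection loop just described can be sketched as follows. This is a simplified stand-in under stated assumptions: alternating NNLS replaces the PyTorch optimizer, and `fit_partial`/`iterative_constrained_nmf` are hypothetical helper names. End members are pinned first, and the dominant free component is then greedily fixed to the measured pattern where its weight peaks.

```python
import numpy as np
from scipy.optimize import nnls

def fit_partial(X, H_fixed, k, n_iter=100, seed=0):
    """Alternating NNLS with the first rows of H pinned to H_fixed
    (a toy substitute for a partially constrained NMF optimizer)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    f = H_fixed.shape[0]
    H = (np.vstack([H_fixed, rng.random((k - f, n))])
         if k > f else H_fixed.astype(float))
    W = np.empty((m, k))
    for _ in range(n_iter):
        for i in range(m):                      # weights: one NNLS per pattern
            W[i], _ = nnls(H.T, X[i])
        if k == f:                              # every component is pinned
            break
        for j in range(n):                      # update free component rows only
            H[f:, j], _ = nnls(W[:, f:], X[:, j] - W[:, :f] @ H[:f, j])
    return W, H

def iterative_constrained_nmf(X, k):
    """Pin components to measured patterns: start from the two end
    members, then repeatedly fix the dominant free component to the
    dataset row where its weight peaks, until all k are constrained."""
    X = np.asarray(X, dtype=float)
    H_fixed = X[[0, -1]].copy()                 # end members as rigid priors
    while H_fixed.shape[0] < k:
        W, _ = fit_partial(X, H_fixed, k)
        free = W[:, H_fixed.shape[0]:]          # weights of free components
        j = int(np.argmax(free.max(axis=0)))    # most prominent free component
        i = int(np.argmax(free[:, j]))          # pattern where it peaks
        H_fixed = np.vstack([H_fixed, X[i]])
    return fit_partial(X, H_fixed, k)
```

By construction, every returned component is literally a measured pattern, which is what makes the weights interpretable as fractional compositions.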

The results of applying this iterative constrained NMF to the barium titanate dataset are shown in Fig. 8. While this approach yielded similar reconstruction performance to canonical NMF (MSE = 0.349, ⟨R-factor⟩ = 0.051), the resulting components and weights can be unambiguously interpreted. Because the fixed components were chosen directly from the measured dataset, they can be used directly as reference diffraction patterns of pure phases for further refinement [Fig. 8(a)]. Furthermore, the weights can be interpreted as molar fractions, revealing the phase mixtures present in the sample during transitions [Fig. 8(b)]. The application of constraints, even without prior knowledge of what those constraints should be, enabled a substantial improvement over canonical NMF, providing both physically meaningful components and weights.
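For reference, the reconstruction metrics quoted here are simple to compute. The sketch below assumes the conventional profile definition R = Σ|y_obs − y_calc| / Σ|y_obs|, averaged over patterns; the paper does not spell out its exact convention, so this is an assumption:

```python
import numpy as np

def mean_r_factor(X, W, H):
    """Mean R-factor of an NMF reconstruction over all patterns, assuming
    the profile definition R = sum|y_obs - y_calc| / sum|y_obs|."""
    recon = W @ H
    num = np.abs(X - recon).sum(axis=1)
    den = np.abs(X).sum(axis=1)
    return float(np.mean(num / den))

def mse(X, W, H):
    """Mean squared reconstruction error of the decomposition."""
    return float(np.mean((X - W @ H) ** 2))
```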

There is still room for improvement of this method. The transitions provided by the constrained NMF remain too gradual for second order phase transitions: while they occur in approximately the correct location, they indicate more mixing than is actually observed. This is a direct result of the translation dependence of NMF: peak shifts in pure phases are likely cast as mixtures between minimally and maximally shifted components that are otherwise identical. Supervised or semi-supervised convolutional deep learning methods have exceeded this performance.^{28,29,50} Nonetheless, to the authors' knowledge, this is the best performance of any fully unsupervised ML approach at segregating the phases in such a dataset.

### E. Constraints enable physically meaningful *in situ* analysis of melting salts

We then used NMF to monitor an experimental transition of solid crystalline NaCl:CrCl_{3} into a molten amorphous state, in which the number of component phases is unknown. Using the elbow method^{51} to examine the MSE and Kullback–Leibler (KL)-divergence loss of canonical NMF, we noted diminishing returns in the reconstruction at around four components (Fig. 9). As shown in Fig. 10, canonical NMF using four components produces a decomposition with low reconstruction error (MSE = 3.355, R-factor = 0.020) that captures the solid-to-amorphous phase boundary, with a clear second order phase transition at 676 K. However, as in our study of BaTiO_{3}, the results are physically unrealistic: the high temperature components are not consistent with real diffraction patterns, and the weights suggest a 3–4 phase mixture throughout the temperature series.
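An elbow scan of this kind is straightforward to reproduce. A minimal sketch using a plain multiplicative-update NMF in NumPy (the paper's implementation is PyTorch-based; `nmf_mse` and `elbow_curve` are hypothetical names of our own):

```python
import numpy as np

def nmf_mse(X, k, n_iter=400, seed=0):
    """Canonical multiplicative-update NMF at rank k; returns the MSE."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    eps = 1e-12
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return float(np.mean((X - W @ H) ** 2))

def elbow_curve(X, max_components=6):
    """Reconstruction MSE versus number of components; the point of
    diminishing returns marks the elbow."""
    return [nmf_mse(X, k) for k in range(1, max_components + 1)]
```

In practice, the same scan would be repeated with the KL-divergence loss; the qualitative shape of the curve is what guides the choice of rank.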

Under the time-sensitive conditions of such an experiment at a central facility, it is important to extract meaningful scientific insight in real time. Integrating the NMF into the data acquisition pipeline^{52} enables the researcher to note the transition temperature on-the-fly; however, the canonical approach is unable to extract meaningful components. Without additional processing, the solid material that persisted past the transition temperature could not be characterized. It is crucial to know whether this phase is unique and warrants further use of the limited measurement time available. For any experiment that is irreversible, destructive, or expensive to reproduce, immediate access to this type of information is paramount.

The constrained NMF approach offered this immediate analysis capability. The elbow method produced similar learning curves, with a four-component MSE of 9.193 and a mean R-factor of 0.0334. Due to the uncertain effects of peak shifting and thermal expansion, we examined a range of values around this inflection point, performing constrained NMF using 3–5 components in parallel as shown in Fig. 11.

Using the same automatic iterative constraining procedure from our study of BaTiO_{3}, the constrained NMF produced a significantly more physically meaningful decomposition than canonical NMF, readily distinguishing between a component unique to a phase and shifting features due to thermal expansion. In the rank-three constrained NMF, two solid components and one amorphous component arose in the decomposition [Figs. 11(a) and 11(b)]. The solid components were nearly identical and showed a linear variation in weight (composition), which was 0.5/0.5 at the midpoint of the temperature range prior to the first order phase transition. A small and decaying amount of the high temperature solid phase was present shortly after the transition. This linear exchange between the two similar components below 400 K arises from the translational dependence of NMF: the algorithm compensates by considering two patterns, identical apart from a shift, at both ends of the relevant temperature range. From the rank-three constrained NMF alone, we can see the gradual impact of thermal expansion followed by a second order transition in which some of the material remained solid.

Considering higher orders of constrained NMF showed the uniqueness of the high temperature solid material and further validated the assertion that smooth, linear variations of similar patterns are an indication of peak shifting. The rank-four NMF extracts similar components to the rank-three NMF, albeit with a new high temperature component describing the fine structure observed after the transition temperature [Figs. 11(c) and 11(d)]. This pattern [orange in Fig. 11(c)] was a result of an amorphous background (later fully pronounced at high temperature in dark gray) and the unexpected presence of persistent, strained, transitory crystallites of the original solid that had a preferred orientation with respect to the x-ray beam. Finally, the rank-five NMF [Figs. 11(e) and 11(f)] decomposed the dataset into the same components with an additional subdivision of the low temperature solid phase. This subdivision occurred at the midpoint of the previous two end members, demonstrating the behavior of the constrained NMF when compensating for pattern shift. This response to an imperceptible shift in the data is a useful and predictable benefit of the iterative and constrained approach: even without translational invariance, this behavior enables the user to confidently note when components belong to the same, albeit shifting, phase.

The results from the decomposition using constrained NMF on a variable temperature diffraction dataset are immediately more interpretable than a similar decomposition using canonical NMF. We quantify this assertion by performing a Le Bail refinement in TOPAS^{53} of the true crystal structures against each of the components produced by NMF. The weighted-profile R-factor (R_{wp}) from these refinements is summarized in Fig. 12. The resulting error is consistently smaller (by a factor of four) for the constrained components, confirming that these components can be better interpreted as diffraction patterns from the physical system.

As with the BaTiO_{3} example, the weights correspond to interpretable compositions. Without the solid–solid phase transitions seen in BaTiO_{3}, the linear response to pattern shifting is apparent. While this ameliorates the lack of translational invariance of NMF, an ideal solution would map a single component to each phase regardless of the translational variations occurring during an experiment. Moreover, the recognition of persistent solids by the NMF is useful for adjusting an experimental protocol on-the-fly. Through user or algorithm intervention, the identification of a second order transition could be used to slow the rate of temperature change, allowing the system time to equilibrate. In such experiments, there is often a lag between the measured temperature and the material temperature, especially around a transition that involves latent heat. Indicating this transition on-the-fly enables a more robust and informative experiment.

## IV. DISCUSSION AND CONCLUSIONS

Here, we developed a constrained NMF approach that interprets streaming spectral data on-the-fly, promptly providing critical insight and analysis from time-sensitive experiments such as XRD and PDF. This tool is now regularly deployed at the PDF beamline at National Synchrotron Light Source II. We have continually refined how we utilize and interpret constrained NMF with our user community and outline some best practices here.

We have considered several approaches for constraining components and weights. Constraints can be specified *a priori* based on system knowledge, such as known material states, or in a data-driven manner during the experiment.

For diffraction, one *a priori* approach is to use simulated patterns of known or anticipated phases as constraints, forcing the unconstrained components to describe new or unknown phases. Alternatively, experimental patterns drawn directly from the dataset can be used as fundamental components. This is a viable approach for a variable temperature series where the end members are expected to be distinct (Fig. 11). For such cases, we have implemented a completely data-driven approach to derive the constrained components. This approach operates iteratively, seeking out points in the dataset that are maximally described by a single component (high weight) and constraining the value of that component at that point, thus reducing the degrees of freedom for NMF to explore in subsequent iterations. This method was illustrated in our study of BaTiO_{3}, where it recovered the known pure phases and approximate transition temperatures (Fig. 8). We have also considered the case where the mixture weights (i.e., volume fraction) are known, but not the components themselves, and demonstrated this capability on synthetic data (Fig. 4). Future work in this area should allow for soft constraints, such as enforcing monotonicity of weights or allowing the initial constraints to adjust with a penalty.
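The complementary case, in which mixture weights are known but the components are not, can be sketched the same way: the known weight columns are held fixed while the components (and any additional free weights) are updated. This is again a NumPy illustration under assumed names, not the paper's PyTorch implementation:

```python
import numpy as np

def weight_constrained_nmf(X, W_known, n_free, n_iter=500, seed=0):
    """NMF in which some weight columns are pinned to known mixture
    fractions (e.g., volume fractions from another probe) while every
    component, and any additional free weight columns, are learned."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    k_known = W_known.shape[1]
    W = np.hstack([W_known, rng.random((n, n_free))])
    H = rng.random((k_known + n_free, m))
    eps = 1e-12
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # components update
        Wn = W * (X @ H.T) / (W @ H @ H.T + eps)
        W[:, k_known:] = Wn[:, k_known:]      # known columns stay pinned
    return W, H
```

With all weight columns pinned (`n_free=0`), the update reduces to fitting components against fixed fractions, which is the synthetic scenario of Fig. 4.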

Because the decomposition is hyperparameter dependent, multiple instances of NMF can be run in parallel to efficiently explore the parameter space, enabling the scientist to investigate different constraints or numbers of components while the measurement is ongoing. This is particularly advisable when the user is uncertain about the number of fundamental components and allows the NMF agent to generate elbow plots as seen in Fig. 9. This also allows for real-time comparisons between constrained NMF and canonical NMF as shown in Figs. 7 and 8. Conditioning and re-plotting the NMF outputs occurs on the order of seconds to minutes depending on the size of the dataset. This allows for a feedback loop between the scientist, the analysis, and the experiment that is beneficial for a variety of studies. We have found the speed of this implementation sufficient for all experiments encountered so far while performing the recalculation and plotting on the CPU alone; however, the decision to implement in PyTorch enables facile Graphics Processing Unit (GPU) acceleration using Compute Unified Device Architecture (CUDA) libraries if required by future applications.
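Running several candidate ranks side by side is a simple fan-out. A hedged sketch using a thread pool (the paper does not specify its deployment mechanism; `parallel_ranks` and `nmf` are our own illustrative names):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def nmf(X, k, n_iter=300, seed=0):
    """One canonical NMF instance at rank k (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    eps = 1e-12
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return k, W, H

def parallel_ranks(X, ranks=(3, 4, 5)):
    """Run several candidate ranks concurrently so their decompositions
    can be compared while the measurement is still streaming."""
    with ThreadPoolExecutor() as pool:
        return {k: (W, H) for k, W, H in pool.map(lambda k: nmf(X, k), ranks)}
```

Each instance is independent, so the same pattern extends to constrained variants or to GPU workers without changing the fan-out logic.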

While there are certain properties of NMF that encourage physicality in the output for x-ray scattering data (e.g., non-negativity and additive components), these characteristics alone do not assure physical interpretability. Although the features that make a decomposition immediately interpretable will depend on the nature of the experiment and the discretion of the user employing the method, we have described some emergent features that make XRD decomposition readily interpretable: satisfying the Gibbs phase rule, weights that sum to unity, and components that represent unique phases rather than mixtures. From our synthetic examples (Figs. 3–6) and the elbow method (Fig. 9), we can see that simply reducing the reconstruction error is not sufficient to create a more interpretable decomposition. It is also not informative to compare the reconstruction error between different datasets. In practice, we compare the order of magnitude of the reconstruction error between different instances of NMF (canonical, *k*-components, etc.) on the same dataset. However, this metric only informs the user whether the method is capturing the information of the full dataset, not whether the output itself is physically interpretable.

The constrained NMF approach processes large datasets in an unsupervised fashion with an interactive prior: the intuition and decision-making of the human-in-the-loop. These constraints can be chosen explicitly by the researcher, or implicitly from the dataset by an algorithm. This gives the streaming analysis added value that cannot be expected from canonical NMF. We demonstrated the advantages of using constrained NMF to aid in revealing subtle phase transitions of a ferroelectric perovskite and in identifying experimental complexities in molten salts. This approach efficiently handles the drastic increase in data generation rates—critical at modern beamlines—where conventional analysis approaches cannot keep up with the experimental pace required for optimized data collection. Some overall limitations of NMF that still apply to constrained approaches are the inability to retain the partial translational invariance of XRD and PDF patterns,^{34,41} as well as other common aberrations that impact a pattern without impacting the underlying phase.^{28,50} Constrained NMF can be extended to a broad range of spectral data applications such as x-ray photoelectron spectra, x-ray absorption near edge spectra, photoluminescence spectra, nuclear magnetic resonance spectra, or mass spectra.^{49} These agents can be deployed as part of a federation that includes multiple NMF instances running asynchronously with supervised ML, and can potentially inform adaptive experiments.^{52} This maintains the agency of the researcher by leveraging their expertise, while producing real-time analysis and actionable results during an experiment.

## SUPPLEMENTARY MATERIAL

See the supplementary material for examples of partially constrained NMF components in synthetic datasets and demonstrations of applying constrained NMF to pair-distribution function (PDF) data from *in situ* melting salt experiments.

## ACKNOWLEDGMENTS

We acknowledge financial support from the BNL Laboratory Directed Research and Development (LDRD) projects 20–032 “Accelerating materials discovery with total scattering via machine learning” (P.M.M. and D.O.), and the Simons Foundation Center for Computational Biology (A.C.D.). This research utilized the PDF (28-ID-1) Beamline and resources of the National Synchrotron Light Source II, a U.S. Department of Energy (DOE) Office of Science User Facility operated for the DOE Office of Science by Brookhaven National Laboratory under Contract No. DE-SC0012704.

## AUTHOR DECLARATIONS

### Conflict of Interest

The authors have no conflicts to disclose.

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.