Referencing schemes are commonly used in heterodyned spectroscopies to mitigate correlated baseline noise arising from shot-to-shot fluctuations of the local oscillator. Although successful, these methods rely on careful pixel-to-pixel matching between the two spectrographs. A recent scheme introduced by Feng *et al.* [Opt. Express **27**(15), 20323–20346 (2019)] employed a correlation matrix to allow free mapping between dissimilar spectrographs, leading to the first demonstration of floor-noise-limited detection on a multichannel array used in heterodyned spectroscopy. In addition to their primary results using a second reference spectrometer, Feng *et al.* briefly demonstrated the flexibility of their method by referencing to same-array pixels at the two spectral edges. We present a comprehensive study of this approach, which we term edge-pixel referencing, including optimization of the approach, assessment of the performance, and determination of the effects of background responses. We show that, within some limitations, the distortions due to background signals will not affect the 2D IR line shape or amplitude and can be mitigated by band narrowing of the pump beams. We also show that the performance of edge-pixel referencing is comparable to that of referencing to a second spectrometer in terms of noise suppression and that the line shapes and amplitudes of the spectral features are, within the measurement error, identical. Altogether, these results demonstrate that edge-pixel referencing is a powerful approach for noise suppression in heterodyned spectroscopies, which requires no new hardware and, so, can be implemented as a software solution by anyone already performing heterodyned spectroscopy with multichannel array detectors.

## INTRODUCTION

Optical heterodyne detection is a phase-sensitive approach that mixes a weak signal (*E*_{sig}) and a strong local oscillator (*E*_{LO}) on a square-law detector ($I_{\mathrm{det}} = |E_{\mathrm{LO}} + E_{\mathrm{sig}}|^{2} = I_{\mathrm{LO}} + 2\,\mathrm{Re}[E_{\mathrm{LO}}^{*}E_{\mathrm{sig}}] + I_{\mathrm{sig}}$), resulting in substantial signal amplification ($2\,\mathrm{Re}[E_{\mathrm{LO}}^{*}E_{\mathrm{sig}}] \gg I_{\mathrm{sig}}$) in addition to phase information ($E_{\mathrm{LO}}^{*}E_{\mathrm{sig}}$ being a phase-sensitive interference). In pump–probe spectroscopy (e.g., TA and 2D IR), or even linear spectroscopy more generally, the probe, following the transmission through the analyte, later acts as the “LO” due to its interference with the signal at the detector. Hence, these are termed “self-heterodyned” spectroscopies in which the distinction between the “probe” and “LO” is unnecessary. However, for other heterodyned spectroscopies, such as four-wave mixing (which also falls within the scope of this paper), the nonlinear signal is scattered into a different direction from the probe, thereby requiring overlap with a separate LO for phase-sensitive detection. In either case, extraction of the signal (*E*_{sig}) from the LO background, often reported as a change in optical density (ΔOD), requires chopping or phase cycling of the signal over two or more pulses.^{1,2} Shot-to-shot fluctuations of the LO, however, appear as correlated fluctuations in the spectral baseline^{2,3} and remain the largest source of noise for most heterodyned spectroscopies.
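The amplification provided by the cross term can be illustrated numerically. The sketch below is only an illustration: the field amplitudes are hypothetical, and the helper name `heterodyne_terms` is introduced here and does not appear in the original work.

```python
import cmath


def heterodyne_terms(e_lo, e_sig):
    """Split |E_LO + E_sig|^2 into LO, cross (heterodyne), and signal terms."""
    i_lo = abs(e_lo) ** 2
    cross = 2.0 * (e_lo.conjugate() * e_sig).real
    i_sig = abs(e_sig) ** 2
    return i_lo, cross, i_sig


# Hypothetical amplitudes: the signal field is 1000x weaker than the LO.
e_lo = 1.0 + 0.0j
for phase in (0.0, cmath.pi / 2, cmath.pi):
    e_sig = 1e-3 * cmath.exp(1j * phase)
    i_lo, cross, i_sig = heterodyne_terms(e_lo, e_sig)
    print(f"phase={phase:.2f}  cross={cross:+.2e}  I_sig={i_sig:.2e}")
```

For in-phase fields the cross term exceeds the homodyne signal intensity by a factor of 2|*E*_{LO}|/|*E*_{sig}|, and its sign follows the relative phase, which is the phase sensitivity described above.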

In the small signal limit, ΔOD ∝ Δ*I*/*I*, and the noise for the measurement, which contributes to Δ*I*, may be categorized into three types based on their scaling with the spectral intensity *I*(*ω*). The first type is *Detector Noise*, which is independent of the number of photons detected [Δ*I* ∝ *const*. and ΔOD ∝ 1/*I*(*ω*)] and includes dark current and readout noise.^{2,4,5} The second type is *Shot Noise*, which (aside from instances of squeezed light^{6,7}) scales as the square root of the number of photons^{5,7} [Δ*I* ∝ $\sqrt{I(\omega)}$ and ΔOD ∝ $1/\sqrt{I(\omega)}$]. Together, shot noise and detector noise form the *Noise Floor* of the apparatus (added in a root-sum-square way). The third type is *LO Noise* (also known as laser noise or shot-to-shot noise), which scales linearly with the spectral intensity [Δ*I* ∝ *I*(*ω*) and ΔOD ∝ *const*.] and is associated with the shot-to-shot intensity fluctuations of the LO.^{3,5} On the ΔOD spectrum, LO noise appears as a random but continuous perturbation of the baseline and, hence, is often described as “baseline wobbling.” This behavior contrasts with floor noise, which is generally assumed to be uncorrelated from pixel-to-pixel. In addition to these three forms of “additive” noise, there is also *Multiplicative-Pump Noise* (also known as convolutional noise), which arises from shot-to-shot fluctuations of the *pump* beam and manifests as scaling fluctuations of the nonlinear signal in the ΔOD spectrum,^{8} but is inherently ∼10^{2} times smaller than the third order signal itself and is, therefore, negligible in most considerations (see Appendix B). In terms of the repetition rate of the experiment, the power spectrum of the LO noise is characteristically 1/*f*, and therefore, it is standard practice to isolate weak optical signals by chopping, phase cycling, or filtering with a lock-in-amplifier.^{9,10} To that end, the residual noise that remains after that filtering is the focus of this manuscript.
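The three scaling laws can be compared with a short sketch. All parameter values below are hypothetical, and the function name `delta_od_noise` is an illustrative choice; the only relation taken from the text is the small-signal limit ΔOD ∝ Δ*I*/*I*, written here with a 1/ln 10 prefactor.

```python
import math


def delta_od_noise(intensity, detector_sigma, shot_gain, lo_fraction):
    """Return (detector, shot, LO, floor) noise in ΔOD units for one pixel.

    Small-signal limit: ΔOD noise ~ ΔI / (I ln 10). Intensities are in
    arbitrary but consistent units (e.g., electrons); all values hypothetical.
    """
    ln10 = math.log(10.0)
    detector = detector_sigma / (intensity * ln10)                 # ΔI const     -> 1/I
    shot = math.sqrt(shot_gain * intensity) / (intensity * ln10)  # ΔI ~ sqrt(I) -> 1/sqrt(I)
    lo = lo_fraction / ln10                                        # ΔI ~ I       -> const
    floor = math.hypot(detector, shot)                             # root-sum-square
    return detector, shot, lo, floor


for intensity in (1e4, 1e5, 1e6):
    d, s, l, f = delta_od_noise(intensity, detector_sigma=200.0,
                                shot_gain=1.0, lo_fraction=0.01)
    print(f"I={intensity:.0e}: detector={d:.2e} shot={s:.2e} LO={l:.2e} floor={f:.2e}")
```

Quadrupling the intensity reduces the detector term fourfold and the shot term twofold while leaving the LO term unchanged, which is why LO noise tends to dominate at high light levels.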

Of the three categories of additive noise under consideration, LO noise is typically the dominant noise source in the ΔOD spectrum for heterodyned spectroscopies running at or below 1–10 kHz. Typical values range from 0.1% to 0.5% (0.4–2 mΔOD) for most modern Ti:Sapphire amplifiers^{3,5,9,11–14} and from 1% to 2% (4–9 mΔOD) for most OPAs (both collinear and non-collinear).^{3,15–17} One notable exception is that two consecutive OPAs are anticorrelated in noise, where values as low as 0.2% (0.9 mΔOD) have been reported^{12} (note that Ref. 12 refers to the second nonlinear conversion stage as an “AgGaS_{2} OPA,” which may be known as “DFG,” although these two understandings are equivalent by conservation of energy). More recently, OPAs pumped by 100 kHz amplifiers utilizing Yb doped gain media have reported nearly detector-limited noise values of 0.1% (0.5 mΔOD) without referencing,^{18–21} mainly attributed to the 1/*f* scaling of the LO noise with the laser repetition rate. While Yb systems will likely pave the way for time-resolved infrared spectroscopy,^{18,22} many spectroscopies requiring cameras, or high pulse energies for studying weak and/or dilute reporters, are likely to remain at 1–10 kHz for the foreseeable future.

LO noise is generated at the laser source(s) and especially during parametric down-conversion in OPAs. The LO noise can be corrected using a beam splitter and a referencing scheme,^{2} and conventional referencing schemes have resulted in significant noise reduction factors for both single-channel^{9,15,23} and multi-channel detectors.^{3,11,13,16,24–26} Nevertheless, conventional referencing schemes still struggle to reach the noise floor with large well-depth detectors. As a case in point, the ratiometric scheme published by Werley *et al.* reduced the LO noise by a factor of 30× down to 0.074% (320 μΔOD), yet this was still a factor of 10× above the theoretical noise floor of their single-channel detector.^{9} One interesting consequence of conventional referencing schemes, where two arrays are subtracted (or divided) from each other, is that the two noise floors then add in a root-sum-square way, leading to a larger *conventional noise floor* than the signal array would otherwise have.^{2} Moreover, application of conventional referencing to multi-channel detectors usually requires similar mapping between the reference and signal spectrographs, which can be both time-consuming and costly.

Recently, a new class of referencing was introduced by Feng, Vinogradov, and Ge that uses a correlation matrix to allow free mapping between arbitrarily different spectrographs.^{8} As a consequence of calibration, the uncorrelated floor noise from the reference array is attenuated before carrying over into the signal array, thereby surpassing the conventional noise floor limit [more explanation may be found in Ref. 8 or here following Eq. (6)]. Indeed, with this scheme, they were able to achieve noise suppression down to ∼0.035% (150 μΔOD), which corresponds to the noise floor of their 32 pixel liquid-nitrogen cooled MCT array. A follow-up study on CMOS arrays showed similar performance as well.^{27} To emphasize the flexibility of their method, Feng *et al.* also briefly demonstrated how they could reference to edge pixels on the same signal array. The requirements for edge-pixel referencing are that the signal of interest be confined to the center of the LO spectrum and that the edge pixels are available for sacrifice.

Edge-pixel referencing takes advantage of the spectral correlations present in LO noise by subtracting an interpolated (curvilinear) baseline between the two edges of the ΔOD spectrum. When stated simply, the idea of subtracting an interpolated (or modeled) baseline from a difference-spectrum is a well-known approach.^{28–39} However, when applying the correlation matrix to edge pixels, a noise calibration is required, and it is here that edge-pixel referencing differs from previous forms of baseline interpolation. By calibrating the correlation matrix, a basis set is formed, which most optimally maps the subtle fluctuations in the edge pixels to the broader “modes” of the LO noise across the pixel array. For example, in the limiting case where LO noise is modeled as a random (straight) line, referencing to two edge pixels is formally equivalent to a linear interpolation of the baseline between those two pixels. To first order, this is a reasonable description of the LO noise, and, for convenience, how we have illustrated it in Fig. 1. In practice, however, LO noise is more complicated with stochastic curvilinear features, and so, more edge pixels are required to better capture these fluctuations. In that respect, edge-pixel referencing assumes no basis set (or even a model) for approximating the LO noise. Rather, it determines a natural basis set during calibration.
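The limiting case of a random straight-line baseline can be checked numerically. The sketch below uses synthetic noise and hypothetical pixel indices, and restricts the least-squares calibration to two reference pixels so that the 2 × 2 normal equations can be written out by hand; the helper names are illustrative.

```python
import random


def calibrate_two_refs(sig, ref_a, ref_b):
    """Least-squares weights (w_a, w_b) such that sig ~ w_a*ref_a + w_b*ref_b."""
    saa = sum(a * a for a in ref_a)
    sbb = sum(b * b for b in ref_b)
    sab = sum(a * b for a, b in zip(ref_a, ref_b))
    ssa = sum(s * a for s, a in zip(sig, ref_a))
    ssb = sum(s * b for s, b in zip(sig, ref_b))
    det = saa * sbb - sab * sab
    return (ssa * sbb - ssb * sab) / det, (ssb * saa - ssa * sab) / det


rng = random.Random(0)
left, right, test = 0, 10, 3  # hypothetical edge and test pixel indices
shots = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.1)) for _ in range(1000)]


def baseline(pixel):
    """Straight-line noise (offset + slope * pixel), redrawn on every shot."""
    return [offset + slope * pixel for offset, slope in shots]


w_left, w_right = calibrate_two_refs(baseline(test), baseline(left), baseline(right))

# Linear-interpolation weights for the same geometry.
interp_left = (right - test) / (right - left)
interp_right = (test - left) / (right - left)
print(w_left, w_right, interp_left, interp_right)
```

In this idealized model the calibrated weights coincide with the linear-interpolation weights to within numerical precision. With curvilinear noise and floor noise present, the calibrated weights would generally differ from simple interpolation.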

The idea of edge-pixel referencing replacing a second reference spectrometer is significant in a few respects. First, it greatly simplifies the experimental apparatus where space may be limited and beam alignment is time consuming.^{2} Second, in the case of mid-infrared detection, eliminating a second spectrometer would make experiments more affordable since MCT arrays are considerably more expensive than CMOS arrays (i.e., upconverting infrared light to visible). The high price tag of MCT arrays also comes with limited resolution (e.g., 32–64 pixels) relative to commercial CMOS arrays in the visible region (e.g., 1024–4096 pixels). Hence, by combining CMOS/upconversion detection with edge-pixel referencing, one may obtain low-noise, high resolution mid-IR pump probe spectra at lower cost. Finally, because it requires no additional hardware and uses information that is already present in the measurements, edge-pixel referencing can be implemented as a software-only approach to noise correction.

Despite these potential advantages, Feng *et al.* raised reasonable concerns that edge-pixel referencing may improperly add a background to the ΔOD signal spectrum if the nonlinear signal is present at the edge pixels. As we show, however, narrowing the spectral bandwidth of the pump offers an easy solution to this problem. Furthermore, we also show that in some cases, it is possible to isolate these artifacts from the true signal.

Here, we present a comprehensive study of edge-pixel referencing on CMOS arrays, which includes performance, optimization, and background effects. We begin with evaluation strategies for selecting the optimal set of edge pixels for referencing. We then discuss our implementation of edge-pixel referencing in the context of two-dimensional infrared (2D IR) spectroscopy, where we evaluate the effects of edge-pixel referencing on both the peak intensities and the line shapes in the 2D IR data as a function of waiting time to test for any distortions of the signal as a result of referencing. We evaluate the effects of background responses and the kinds of distortions that they can introduce. We also demonstrate that it is possible to reduce or eliminate these effects by band narrowing the pump pulses. Ultimately, we show that the performance of edge-pixel referencing is comparable to that of referencing to a second spectrometer in terms of noise suppression and that the line shapes and amplitudes of the spectral features are identical, within the measurement error, for the two approaches. We also show that, even in fairly extreme cases, the distortions due to background signals will not affect the 2D IR line shape or amplitude, within certain limits, and, in favorable cases, can be fully mitigated by band narrowing of the pump beams. Together, these results demonstrate that edge-pixel referencing is a powerful approach to noise suppression in heterodyne-detected spectroscopies that requires no new hardware and, so, can be implemented as a software solution for anyone performing heterodyned spectroscopy with multichannel array detectors already.

## MATERIALS AND METHODS

### Baseline referencing algorithm

We closely follow the algorithm established by Feng *et al.*^{8} in our referencing implementation. We acquire pump–probe spectra in a 4-pulse phase cycle^{10} as defined by the following equation:

where *S*_{tot} denotes the nonlinear spectrum, *I* denotes an individual LO spectrum collected on the signal spectrometer (of size *N*_{S} pixels), and superscripts ^{*} and ′ denote pump-pulse phases 0 and *π*, respectively. We prefer the formulation using the 4-pulse phase cycle over the 2-pulse phase cycle since the 4-pulse phase cycle will remove any potential pump–probe scatter that might arise at the edge pixels.^{10} The factor of ½ normalizes the two difference-spectra added in the 4-pulse phase cycle. For a fair comparison across the literature, one must keep in mind that the 4-pulse noise (which we report in our experiments below) is already 1/√2 smaller than the 2-pulse noise, which is more typical in conventional pump–probe measurements and in line with the values reported in the Introduction.
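The factor of 1/√2 follows from averaging two difference spectra. As a short worked step, assume that the noise in the two difference spectra (labeled $d_1$ and $d_2$ here purely for illustration) is independent with equal variance $\sigma^2$:

```latex
\mathrm{Var}\!\left[\tfrac{1}{2}\,(d_1 + d_2)\right]
  = \tfrac{1}{4}\left(\sigma^{2} + \sigma^{2}\right)
  = \tfrac{\sigma^{2}}{2}
\quad\Longrightarrow\quad
\sigma_{\text{4-pulse}} = \frac{\sigma_{\text{2-pulse}}}{\sqrt{2}}
```

If the noise in the two difference spectra were partially correlated, the reduction would be smaller than this estimate.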

In practice, pump–probe spectra are accompanied by the signal array noise $n_{\mathrm{sig}} = n_{\mathrm{sig}}^{\mathrm{LO}} + n_{\mathrm{sig}}^{\mathrm{flr}}$, where $n_{\mathrm{sig}}^{\mathrm{LO}}$ is the LO noise arising from shot-to-shot intensity fluctuations of the LO and $n_{\mathrm{sig}}^{\mathrm{flr}}$ is the floor noise of the signal array detector. In Appendix B, *n*_{sig} is equivalent to Eq. (B8), Eq. (B14), or a corresponding 4-pulse equivalent, depending on the assumptions made about the types of noises present and the type of phase cycling. The total ΔOD spectrum collected on the signal spectrometer is given by the sum in the following equation, where *S*_{true} is the true underlying (noiseless) pump–probe signal:

$$S_{\mathrm{tot}} = S_{\mathrm{true}} + n_{\mathrm{sig}} \qquad (2)$$

One common approach is to split off a reference beam from the LO before the sample with a 50:50 beam splitter. A separate but nearly identical reference spectrometer then measures $n_{\mathrm{ref}} = n_{\mathrm{ref}}^{\mathrm{LO}} + n_{\mathrm{ref}}^{\mathrm{flr}}$, where $n_{\mathrm{ref}}^{\mathrm{LO}}$ and $n_{\mathrm{ref}}^{\mathrm{flr}}$ are the LO and floor noise of the reference spectrometer. In an ideal world, $n_{\mathrm{sig}}^{\mathrm{LO}} = n_{\mathrm{ref}}^{\mathrm{LO}}$, and subtracting *n*_{ref} from Eq. (2) yields a pump–probe spectrum $S_{\mathrm{tot}} - n_{\mathrm{ref}} = S_{\mathrm{true}} + n_{\mathrm{sig}}^{\mathrm{flr}} - n_{\mathrm{ref}}^{\mathrm{flr}}$, which is completely free of LO noise. However, it also carries a larger floor noise than before since the two floor noises are (mostly) independent random variables. In practice, “identical” spectrographs do not really exist and there will always be some pixel-to-frequency mismatch between the two spectrographs.
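The floor-noise penalty of conventional subtraction can be seen in a small simulation. This is a sketch with synthetic Gaussian noise and hypothetical amplitudes, assuming an ideal reference whose LO noise matches the signal array exactly.

```python
import random
import statistics

rng = random.Random(1)
n_shots = 20000
sigma_lo, sigma_floor = 1.0, 0.1  # hypothetical noise amplitudes (arbitrary units)

lo = [rng.gauss(0.0, sigma_lo) for _ in range(n_shots)]   # LO noise shared by both arrays
n_sig = [x + rng.gauss(0.0, sigma_floor) for x in lo]     # signal array: LO + floor
n_ref = [x + rng.gauss(0.0, sigma_floor) for x in lo]     # ideal matched reference

unreferenced = statistics.pstdev(n_sig)
referenced = statistics.pstdev([s - r for s, r in zip(n_sig, n_ref)])
print(f"unreferenced: {unreferenced:.3f}")
print(f"referenced:   {referenced:.3f} "
      f"(floor: {sigma_floor}, sqrt(2)*floor: {2 ** 0.5 * sigma_floor:.3f})")
```

Subtraction removes the shared LO term, but the residual settles near √2 times the single-array floor because the two independent floor noises add in quadrature.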

A recent approach by Feng *et al.*^{8} relaxes the conventional constraint that *n*_{sig} ≅ *n*_{ref} by employing a noise correlation matrix ***B*** (of size *N*_{S} × *N*_{R}) such that *n*_{sig} ≅ ***B***·*n*_{ref}. This simple modification allows seamless mapping between dissimilar spectrometers and/or alignments at a small cost of calibrating the matrix ***B***. Stated more formally, the pump–probe spectrum after referencing *S*_{ref} (which should ideally approximate *S*_{true} within the floor noise) is given by the following equation:

$$S_{\mathrm{ref}} = S_{\mathrm{tot}} - \mathbf{B}\cdot n_{\mathrm{ref}} \qquad (3)$$

where *S*_{tot} is the measured 4-pulse phase cycle response on the signal spectrometer and *n*_{ref} (of size *N*_{R} pixels) is the corresponding 4-pulse noise measured on the reference spectrometer. The residual noise after referencing (denoted by Δ*n* = *S*_{true} − *S*_{ref}) is given in the following equation in terms of the variance where the square is computed pixel-wise:

According to Eq. (2), if *S*_{true} is unknown, then it is impossible to actively measure *n*_{sig} on the signal spectrometer while simultaneously collecting a nonlinear spectrum. However, *n*_{sig} may be measured by blocking the pump beam and therefore making *S*_{true} = 0. The procedure for calibrating ** B** consists of collecting a series of “blank” probe shots where the pump beam is physically blocked, and least-squares minimization is enforced on the mean residual noise in the following equation:

Throughout this manuscript, angled brackets ⟨⋯⟩ denote the mean. The solution to Eq. (5) is given by the following equation:
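The least-squares step can be written out explicitly. The following is a sketch consistent with the definitions in the text (blank shots with *S*_{true} = 0, so that the residual is *n*_{sig} − ***B***·*n*_{ref} up to sign); the notation is an assumption and may differ from the original display equations. Setting the derivative of the mean squared residual with respect to ***B*** to zero gives the normal equations and their solution:

```latex
\begin{aligned}
\langle \Delta n^{2} \rangle
  &= \left\langle \left( n_{\mathrm{sig}} - \mathbf{B}\, n_{\mathrm{ref}} \right)^{2} \right\rangle, \\
\frac{\partial}{\partial \mathbf{B}} \sum_{\text{pixels}} \langle \Delta n^{2} \rangle
  &= -2 \left\langle \left( n_{\mathrm{sig}} - \mathbf{B}\, n_{\mathrm{ref}} \right) n_{\mathrm{ref}}^{T} \right\rangle = 0, \\
\mathbf{B}
  &= \left\langle n_{\mathrm{sig}}\, n_{\mathrm{ref}}^{T} \right\rangle
     \left\langle n_{\mathrm{ref}}\, n_{\mathrm{ref}}^{T} \right\rangle^{-1}.
\end{aligned}
```

The last line requires $\langle n_{\mathrm{ref}}\, n_{\mathrm{ref}}^{T} \rangle$ to be invertible, which in turn requires at least as many calibration shots as reference pixels. Each row of ***B*** is an independent least-squares fit of one signal pixel to the reference pixels.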

We pause here to discuss how the correlation matrix attenuates uncorrelated noise from carrying through to the signal array. Suppose the reference beam is blocked during calibration, and *n*_{ref} consists of white noise perfectly uncorrelated with *n*_{sig}. Then, the outer product ⟨*n*_{sig} · *n*_{ref}^{T}⟩, in Eq. (6), averages to zero for every element in the matrix, and hence, all noise is blocked from carrying over into the signal array. If the reference beam is unblocked, then correlation will exist between the two arrays and ***B*** is nonzero. In this case, the uncorrelated noise from the reference array must pass through a linear transform and adds by incoherent addition along the rows of ***B***. However, because ***B*** is defined to minimize Eq. (5), calibration will carefully balance the norm of ***B*** to statistically guarantee that the reduction in correlated noise is always greater than or equal to the addition of incoherent noise from the reference array.

Following calibration of ***B***, the effectiveness of the referencing scheme may be tested by computing the RMSE noise of *S*_{ref} for a “test” dataset (collected separately from the dataset used for calibration, denoted as $S_{\mathrm{ref}}^{\mathrm{test}}$), which is given by the following equation:

For implementing this referencing scheme, we have modified our data collection code to automatically recalibrate ***B*** every 20 min. During this time, looping through a null mask on the AOM blocks the pump beam, and we collect 10 000 “blank” 4-pulse phase cycles (i.e., 40 000 shots) to calibrate ***B***. The program then returns to collecting 2D IR data as it references individual spectra along the free induction decay.
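The calibrate-then-test workflow can be sketched for a single signal pixel. This is an illustrative toy model, not the acquisition code described above: the noise is synthetic (one shared LO mode plus independent floor noise), the gains and amplitudes are hypothetical, and the helper names are introduced here. Only the Python standard library is used, so the normal equations are solved with a small hand-written elimination routine.

```python
import math
import random


def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def calibrate_row(sig, refs):
    """One row of B: least-squares weights mapping reference pixels to one signal pixel."""
    k = len(refs[0])
    cov_rr = [[sum(r[i] * r[j] for r in refs) for j in range(k)] for i in range(k)]
    cov_sr = [sum(s * r[i] for s, r in zip(sig, refs)) for i in range(k)]
    return solve(cov_rr, cov_sr)


def rmse(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def blank_shots(rng, n, gains, floor):
    """Simulated pump-blocked noise: one shared LO mode plus independent floor noise."""
    sig, refs = [], []
    for _ in range(n):
        lo = rng.gauss(0.0, 1.0)
        sig.append(lo + rng.gauss(0.0, floor))
        refs.append([g * lo + rng.gauss(0.0, floor) for g in gains])
    return sig, refs


rng = random.Random(2)
gains, floor = [0.8, 0.9, 1.0, 1.1], 0.1  # hypothetical edge-pixel gains and floor noise
cal_sig, cal_refs = blank_shots(rng, 4000, gains, floor)    # calibration set
test_sig, test_refs = blank_shots(rng, 2000, gains, floor)  # separate test set

b_row = calibrate_row(cal_sig, cal_refs)
residual = [s - sum(w * r for w, r in zip(b_row, ref))
            for s, ref in zip(test_sig, test_refs)]
print(f"unreferenced RMSE: {rmse(test_sig):.3f}")
print(f"referenced RMSE:   {rmse(residual):.3f} (floor: {floor})")
```

In this toy model the referenced RMSE on the test set falls between the single-pixel floor and √2 times the floor, which illustrates how calibration attenuates the reference floor noise rather than passing it through at full weight. In practice, a linear-algebra library routine such as `numpy.linalg.lstsq` would replace the hand-written solver.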

### 2D IR measurements

The details of the apparatus are described elsewhere^{40} with a few notable modifications mentioned here. Figure 1 illustrates the apparatus and referencing schemes. A 2 kHz Ti:Sapphire amplifier outputs 1.7 mJ pulses centered at 800 nm with 90 fs pulse duration (magenta solid arrow). The full output pumps a home-built optical parametric amplifier (OPA) followed by a difference frequency generation (DFG) stage that outputs pulses of 150 fs duration at an energy of 15 *µ*J per pulse centered at ∼2155 cm^{−1} or 4.64 *µ*m (black solid arrow). The mid-IR light splits three ways at a 3° CaF_{2} wedge (incident with s-polarization): a ∼7% primary reflection (black solid arrow) serves as the probe/LO, a ∼5% back reflection (black dashed–dotted arrow) serves as the reference beam for the second spectrometer, and the remaining transmitted light is sent to a pulse shaper to serve as the pump beam (black dashed arrow). Meanwhile, the depleted 800 nm light discarded after pumping the OPA propagates to a zero-dispersion stretcher where we isolate two separate ∼1.5 cm^{−1} bands at the Fourier plane for upconversion, one for the probe/LO and the other for the second spectrometer reference beam.

Immediately prior to focusing into the sample, the pump and probe beams pass through separate waveplate/polarizer pairs each rotated such that the polarizations of the resulting pump and probe beams differ by 54.7° (the magic angle) as measured at the sample. A parabolic mirror focuses the pump (below-axis, vertically polarized) and probe (on-axis, polarized 54.7° to pump) into the sample. After the sample, another parabolic mirror recollimates the two beams. An iris discards the pump beam and the probe beam passes through a detection polarizer, followed by a waveplate to rotate the polarization back to vertical for type I phase-matching at the upconversion crystal. Note that rotating the off-axis pump beam to 54.7°, rather than keeping it vertically polarized, would result in mixing s- and p-polarization components due to the angle of incidence on the sample. In contrast, the on-axis probe beam preserves a single linear polarization at the sample, so we rotate the polarization of the probe to the magic angle rather than the pump, which necessitates the waveplate after the sample to rotate the polarization of the probe for proper phase matching in the upconversion crystal. The sample cell includes two 13 × 2 mm round CaF_{2} windows with a ∼100 *µ*m PTFE spacer for DMSO samples. For all measurements, we use the cyano stretching vibration of methyl thiocyanate (MeSCN) as the probe vibration. This oscillator is notoriously weak, with a molar absorptivity of ∼150M^{−1} cm^{−1}. We collected 80 000 “blank” (pump blocked) spectra (or 20 000 4-pulse phase cycles) for optimizing and performance testing the referencing methods shown below. In addition to these 80 000 “blank” spectra designated for calibration, an additional 2000 nonlinear transient absorption spectra (or 500 4-pulse phase cycles) were collected separately and used as the designated “test” data for evaluating Eq. (7).

The upconversion of probe and reference beams involves similar optics, although they vary slightly in the particular portion of the spectrum sampled from the depleted 800 nm light and significantly in their beam path (for ∼4 m after the CaF_{2} wedge). Due to path length constraints and available space on our apparatus, keeping the two beams on a similar path is impractical. With regard to the narrow bands of ∼800 nm light used for upconversion, simply splitting one upconversion pulse with a 50:50 beam splitter would leave the upconverted light too weak to leverage the full dynamic range on both CMOS arrays. Both the probe/LO and reference beams are overlapped with their respective narrow-band upconversion beams using a 2 mm CaF_{2} 800 nm high reflector. Both upconversion crystals are identical 5 × 5 × 3 mm^{3} wedge-cut MgO:LiNbO_{3} (5% doping, θ = 46.5°, type I, Crylight Photonics). The upconverted probe/LO light passes through a 300 mm focal length spectrometer (Princeton Instruments 2300i) for detection using a 14-bit, 1024-pixel, single-line CMOS array (Imaging Solutions Group LW-ELIS-1394A). The upconverted reference light goes through a home-built spectrometer consisting of a 1800g/mm grating, a concave mirror (f = 200 mm), and an identical CMOS array.

## RESULTS AND DISCUSSION

### Performance and optimization of edge-pixel referencing

We first assess how the number of edge pixels affects the performance of the algorithm. We use a series of calibrations on the transient absorption data (see Materials and Methods above) using varying numbers of contiguous edge pixels. Each matching set of contiguous edge pixels is chosen with the same common center pixels (No. 400 and No. 800), as illustrated by the vertical bands in Fig. 2(a). Pixel No. 600 is located midway between any set of edge pixels in the series (which is also the approximate midway point between the ground-state bleach/stimulated emission and excited-state-absorption signals in the pump–probe spectrum of MeSCN), and thus, we choose pixel No. 600 as the designated test pixel for observing how the noise depends on the number of edge pixels that we use in the calibration. Figure 2(b) shows the residual noise [see Eq. (7)] for different numbers of edge pixels [following the color bar above Fig. 2(a)] as a function of the calibration set size, i.e., the number of sets of 4-pulse cycles we use for calibration.

Figure 2 exhibits a few notable trends. First, the performance of edge-pixel referencing increases (noise decreases) with increasing numbers of edge pixels, which is not surprising. Repeated analysis on a 5X binned version of this same dataset (not shown here) produces essentially the same asymptotic limits, as shown in Fig. 4(b). In other words, it would be more accurate to say that the residual noise depends on the ratio of the number of edge pixels to center pixels, rather than on the number of edge pixels alone. Second, the minimum calibration set size, which we infer as the point at which the residual noise reaches an asymptote, increases with the number of edge pixels. It appears that edge-pixel referencing approaches the noise floor of our detector between 300 and 500 pixels. Third, with respect to the calibration set size, the last few trend lines (100–500 edge pixels) show that the noise approaches the asymptotic limit once the calibration set size reaches ∼10× the number of edge pixels. Finally, it is notable that the residual noise after referencing with just two edge pixels is already an 8× improvement from the unreferenced noise (shown later to be ∼4.5 mΔOD), which suggests that edge-pixel referencing with just 2–4 pixels may still be very helpful even for low pixel density arrays such as MCT arrays.

To explore the value of choosing edge pixels with higher spectral intensity closer to the center of the probe/LO spectrum, we run a series of calibrations with 20 edge pixels (10 on each side) in varying locations both near and far from the test pixel, as illustrated in Fig. 3(a). Figure 3(b) shows a plot of the test-pixel noise vs the average edge-pixel intensity. The plot shows that the test-pixel noise is, indeed, inversely related to the average edge-pixel intensity for the chosen set of referencing pixels. Fitting to a log–log plot yields a slope of −0.998 and a y-intercept of 4.810 (data not shown).
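A log–log fit of this kind can be sketched as follows. The data below are synthetic and hypothetical (they obey an exact inverse law and are not the measured values), and the helper name `loglog_fit` is an illustrative choice.

```python
import math


def loglog_fit(x, y):
    """Ordinary least-squares line through (log10 x, log10 y); returns (slope, intercept)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic, hypothetical data obeying noise = k / intensity exactly.
k = 5.0e4
intensity = [2000.0, 4000.0, 6000.0, 8000.0, 10000.0]
noise = [k / i for i in intensity]
slope, intercept = loglog_fit(intensity, noise)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A slope near −1 corresponds to an inverse proportionality between noise and intensity. For comparison, the shot-noise scaling described in the Introduction would give a slope of −1/2.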

Finally, Fig. 4(a) compares the mean residual noise [see Eq. (7)] for unreferenced (green), second spectrometer referencing (blue), and edge-pixel referencing (red). The edge pixels chosen (indicated by the gray shading) are typical for our experiments with MeSCN. Given the poorer noise suppression using the second spectrometer relative to edge-pixel referencing, we should emphasize that the optical scheme employed here does differ from that used by Feng *et al.*^{27} where they showed that second spectrometer referencing was able to reach the floor noise of another CMOS array. Our apparatus differs in that both the probe and reference beams undergo independent SFG (upconversion) before detection. Unfortunately, we are unable to afford 50:50 splitting of the same SFG pump beam without sacrificing the full dynamic range of our CMOS detectors or losing the frequency resolution (i.e., broadening the upconversion pump in the zero-order dispersion stretcher). Thus, the two pump beams for each upconversion process (each ∼1.5 cm^{−1} in bandwidth) are separately extracted from the depleted 800 nm OPA light. Due to their different spectral origins, we do not expect that the two upconversion pump beams exhibit perfectly correlated noise. This reality should fundamentally limit how well the second spectrometer can remove the LO noise from the signal spectrometer, which, we believe, is the reason for the lower performance seen in Fig. 4 for second spectrometer referencing. Had each upconversion beam been derived from the same origin and just split by a 50:50 beam splitter, we expect that second spectrometer referencing would perform as well as edge-pixel referencing. For the rest of this paper, we only use the second spectrometer data as a control measurement for the comparison of the amplitudes and line shapes obtained by edge-pixel referencing.

Looking closely at the zoomed-in view in Fig. 4(b), we first note that the large peak in the residual noise around pixel No. 625 is a consequence of the MeSCN hole in the transmission spectrum. In fact, the ∼3:2 peak-to-baseline ratio of the residual noise in Fig. 4 is consistent with 1/*I*(*ω*) noise scaling with respect to the transmission spectrum [see the black trace in Figs. 2(a) and 3(a)]. As mentioned in the Introduction, 1/*I*(*ω*) noise scaling is characteristic of the detector noise. From the data sheet provided by the manufacturer, the full-well depth of each CMOS pixel is 8 × 10^{5} *e*^{−} at 2^{14} − 1 = 16 383 counts, and the detector noise is ∼235 *e*^{−}. At 11 360 counts (e.g., test pixel No. 600), this should translate to a detector noise of ∼180 μΔOD, which is consistent with the asymptotic noise limit in Fig. 2(b). For the 4-pulse phase cycle, the photon shot noise is equal to $1/(\ln 10\,\sqrt{N})$ (see Appendix C), where *N* is the number of electrons detected. At 11 360 counts and a conversion ratio of 8 × 10^{5} *e*^{−}/16 383 counts, this corresponds to a shot noise limit of ∼580 μΔOD (or a total noise floor of ∼610 μΔOD), which is ∼3× larger than the residual noise reported in Figs. 2(b) and 3 (for pixel No. 600). This result is surprising because it suggests that we have achieved a noise floor that is below the shot noise limit. However, we believe that this effect is artificial and related to the effective resolution of the spectrograph.
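The noise floor estimates quoted above can be reproduced with a few lines of arithmetic. The detector parameters are taken from the text; the expression used for the detector term, σ/(*N* ln 10), is an assumption chosen by analogy with the shot noise expression and is not stated explicitly in the text.

```python
import math

FULL_WELL_E = 8e5                # electrons at full scale (from the text)
FULL_SCALE_COUNTS = 2 ** 14 - 1  # 16 383 counts
READ_NOISE_E = 235.0             # detector noise in electrons (from the text)
counts = 11360                   # test pixel No. 600 (from the text)

electrons = counts * FULL_WELL_E / FULL_SCALE_COUNTS
ln10 = math.log(10.0)

# Assumed 4-pulse ΔOD noise for an intensity noise sigma (in electrons): sigma / (N ln 10).
detector_od = READ_NOISE_E / (electrons * ln10)
shot_od = 1.0 / (ln10 * math.sqrt(electrons))   # sigma = sqrt(N) for shot noise
floor_od = math.hypot(detector_od, shot_od)     # root-sum-square noise floor

for name, value in [("detector", detector_od), ("shot", shot_od), ("floor", floor_od)]:
    print(f"{name:>8}: {value * 1e6:.0f} μΔOD")
```

Under this assumption the three values agree with the rounded figures quoted in the text (∼180, ∼580, and ∼610 μΔOD).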

Photon shot noise may be understood as the interference of the electromagnetic vacuum with the LO resulting in a modulation of the photon flux.^{41,42} For a light source that is both coherent and relatively dim, the photon noise is modeled using Poisson statistics.^{43} However, for brighter light sources, identical photons tend to clump together due to boson symmetry, which is known as photon “bunching.” In this case, light is said to be chaotic and a different model is required for describing the photon noise. It has been shown that filtering a light source based on frequency (e.g., a spectrometer) can produce correlated photon bunching of adjacent frequencies.^{44} We believe this effect may apply to upconversion in the sense that phase-matching is a frequency filtering process in and of itself. As a sum-frequency process, during upconversion, the mid-IR light is spectrally convolved (or filtered, in this context) with the ∼1.5 cm^{−1} (800 nm) pump, thereby limiting our spectral resolution well before the spectrometer. Vacuum modes, being the origin of photon noise in the wave picture, accompany the LO along the way, and we argue that those too are spectrally convolved during upconversion.

On the other hand, the apparent resolution of our array detector is ∼0.3 cm^{−1} per pixel, which means that the spectral response of 5 adjacent pixels should be highly correlated. This correlation would mean that these 5 pixels function, effectively, as a single pixel, which means that it would be inappropriate to estimate the shot noise limit on a per pixel basis. Rather, it should be estimated on a 5-pixel basis. We test this hypothesis by binning every 5 pixels of the same calibration and test datasets, then subsequently recalibrating ***B*** (now with 5× fewer edge pixels), and computing the residual noise after edge-pixel referencing. The results are shown in Fig. 4(b) where the red (dashed) trace is the residual noise after edge referencing the 5× binned spectra, and the black (dashed) trace is the corresponding noise floor estimate (decreasing by a factor of $\sqrt{5}$). For comparison, the edge referenced noise trace without binning is replotted as the solid red trace and the corresponding noise floor without binning as the solid black trace. The result shows a negligible change in residual noise after 5× binning, which is expected if the shot noise is redundant on a 5-pixel basis.
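The logic of the binning test can be illustrated with synthetic noise. The sketch below compares two idealized, hypothetical extremes: noise that is independent from pixel to pixel, and noise that is identical across the 5 binned pixels.

```python
import random
import statistics

rng = random.Random(3)
n_shots, bin_width = 5000, 5


def binned_std(correlated):
    """Std of the 5-pixel average when per-pixel noise is shared or independent."""
    out = []
    for _ in range(n_shots):
        if correlated:
            pixels = [rng.gauss(0.0, 1.0)] * bin_width  # same value on all 5 pixels
        else:
            pixels = [rng.gauss(0.0, 1.0) for _ in range(bin_width)]
        out.append(sum(pixels) / bin_width)
    return statistics.pstdev(out)


independent = binned_std(correlated=False)
shared = binned_std(correlated=True)
print(f"independent: {independent:.3f} (1/sqrt(5) = {5 ** -0.5:.3f})")
print(f"correlated:  {shared:.3f}")
```

Independent noise drops by about √5 on binning, whereas fully correlated noise is unchanged. A negligible change in measured residual noise after binning is therefore the signature expected when adjacent pixels carry redundant noise.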

### 2D IR line shape comparison

We conduct several experiments to measure what effect(s) edge-pixel referencing may have on 2D IR amplitudes and line shapes. Each experiment consists of 30–35 repeated trials of 2D IR waiting-time-series spectra of 200 mM MeSCN in DMSO for which we evaluate and compare line shapes between the second spectrometer (i.e., the control) and edge-pixel measurements. In the interest of keeping the comparison as fair as possible, both sets of data are derived from the same unreferenced dataset: for every 2D IR spectrum collected, we calculate and save two referenced versions of the same unreferenced measurement which we refer to as a “trial pair.”

A Python script (written in-house) analyzes the volume and the center line slope (CLS) of the 0 → 1 (ground-state bleach and stimulated emission) peak. Peak volumes come from the analytical integration of a 2D Gaussian fit,^{45} the functional form of which includes correlation of the line shape and an offset. We calculate the CLS as described previously.^{45,46} Figure 5 shows the line shape analysis for 200 mM MeSCN in DMSO collected with a pump bandwidth of 100 cm^{−1}. Figures 5(a) and 5(b) show the isotropic lifetime decay and CLS for edge referenced (red) and second spectrometer referenced data (blue). Error bars on each point indicate the standard error of the mean (SEM). The difference plots below each respective graph [Figs. 5(c) and 5(d)] provide a more quantitative comparison. Purple dots indicate the mean of the difference between individual trial pairs in the measured values at each waiting time (edge-pixel minus second spectrometer). The null hypothesis is that the two referencing schemes yield identical peak volumes and CLSs for a given waiting time (i.e., ΔVol = 0 and ΔCLS = 0). Black lines trace the two-sided 95% confidence intervals for testing of the null hypothesis.

Based on the overlapping SEM bars in Fig. 5(a), it may be tempting to conclude that both referencing schemes yield indistinguishable lifetime decays. Statistical analysis of this sort (i.e., Welch’s *t*-test) assumes that the two datasets are independent. However, the two referenced datasets originate from the same unreferenced data, and therefore, the underlying noise in the peak volumes and CLSs between the two datasets will be correlated to some degree. For example, small fluctuations in the peak volume arising from multiplicative pump noise, or long-term power drifts, will appear equally in the edge-pixel referenced spectrum and the second spectrometer referenced spectrum for any given trial pair. The confidence intervals shown in Figs. 5(c) and 5(d) are computed based on a paired *t*-test, which is immune to the correlated noise between matching data points.^{47} More precisely, the two datasets presented in Fig. 5(a) are susceptible to multiplicative pump noise, while the single dataset in Fig. 5(c) is not. Had the two datasets been collected separately in time, multiplicative pump noise would no longer be correlated on a paired basis and we could no longer claim that the data in Fig. 5(c) are immune to multiplicative pump noise. Consequently, this analysis would lead to larger confidence intervals and the erroneous conclusion that the two referencing schemes are indistinguishable. With that background, Fig. 5(c) shows that edge-pixel referencing consistently favors a ∼1% larger peak volume over second spectrometer referencing, which is a systematic error. This error arises from a background response induced by the pump beam that the edge-pixel reference scheme attempts to correct as if it were LO noise. The effect is small but not negligible, and we address it in more quantitative detail later in this manuscript. In contrast, Fig. 5(d) shows that nearly all the values for the CLS differences fall within the interval for accepting the null hypothesis, indicating that, at least for the CLS, the two measurements are truly equivalent within experimental uncertainty.
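The difference between the unpaired and paired analyses can be sketched numerically. The example below uses synthetic peak volumes, not the measured data: each trial pair shares a common fluctuation (standing in for multiplicative pump noise or drift) plus a small independent term for each referencing scheme. The noise magnitudes and the random seed are hypothetical values chosen for illustration.

```python
import math
import random
import statistics

random.seed(1)
n_trials = 35

# Hypothetical peak volumes: a shared 6% fluctuation per trial pair plus a
# small independent fitting noise for each referencing scheme.
edge, second = [], []
for _ in range(n_trials):
    common = random.gauss(0.0, 0.06)  # appears in both members of a trial pair
    edge.append(1.0 + common + random.gauss(0.0, 0.005))
    second.append(1.0 + common + random.gauss(0.0, 0.005))

# Unpaired (Welch) standard error of the difference of the two means.
welch_se = math.sqrt(statistics.variance(edge) / n_trials
                     + statistics.variance(second) / n_trials)

# Paired standard error: difference each trial pair first, so the shared
# fluctuation cancels before the spread is estimated.
diffs = [e - s for e, s in zip(edge, second)]
paired_se = statistics.stdev(diffs) / math.sqrt(n_trials)
mean_diff = statistics.mean(diffs)

print(f"mean difference    = {mean_diff:+.4f}")
print(f"Welch standard err = {welch_se:.4f}")
print(f"paired standard err = {paired_se:.4f}")
print(f"paired t statistic = {mean_diff / paired_se:+.2f}")
```

Because the shared term dominates the spread of each dataset, the unpaired standard error is much larger than the paired one, which is why a small systematic offset can be hidden by an unpaired comparison yet resolved by a paired one.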

Interestingly, the SEM bars in Figs. 5(a) and 5(b) are almost equal in magnitude between the two referenced datasets, yet Fig. 4 might lead you to believe that the two methods should differ by a factor of ∼5× in noise. With respect to the SEM bars in Fig. 5(a), the long-term drift in the pump power will also act as a form of noise. Separate analysis of the 35 trials (data not shown) reveals that the peak volume drifted ∼12% over the course of the experiment, while the relative standard deviation of the peak volume is consistently ∼6% (or ∼12% as two-sided) across all waiting times. Although we have separately confirmed that the second spectrometer data are indeed ∼5× noisier, it appears that the SNR of the second spectrometer data is sufficiently high such that the variability in fitting the 2D Gaussian between the trial pair is still much smaller than 6%. Therefore, we conclude that the limiting source of noise in Fig. 5(a) is due to the long-term pump power drift over the course of the 35 trials. With respect to Fig. 5(b), we remark that the ∼5× excess noise present in the second spectrometer data is the LO noise, which appears as vertical banding (i.e., correlated fluctuations) along the probe axis in the 2D IR spectra. As the CLS is a measure of the pixel position of peak intensity along the probe axis at a given pump frequency, the vertical banding is approximately orthogonal to the CLS measurement, and therefore, the excess LO noise in the second spectrometer data should have little impact on the SEM bars in Fig. 5(b).

A paired *t*-test is a high standard of analysis capable of removing correlated noise between two datasets. However, it is only applicable in situations where a second dataset containing correlated noise (i.e., the second spectrometer data) is available. In most experiments, this approach would not be feasible. Instead, the conventional method of analyzing simple 2D IR spectra involves multi-component fitting of the line shape and CLS to extract a handful of characteristic parameters. As a result, for practical purposes, comparing the parameters that result from the two methods is the only analysis that matters: if one cannot distinguish the fitting parameters from the two methods, then it does not matter which referencing scheme we use. Based on this reasoning, we fit the lifetime (peak volume vs waiting time) and CLS decays from each referencing scheme to a double exponential model, weighting each point by the standard deviation. Table I shows that the fit parameters for lifetime and CLS decays are indistinguishable between the two methods. Hence, from a practical standpoint, edge-pixel referencing yields results that are indistinguishable from the second spectrometer referencing, meaning that the small distortion in the peak volume that arises in the edge-pixel referencing scheme does not affect the ability to accurately extract lifetime information from the data.

| Referencing method | Peak volume: A_{1} (prob.) | Peak volume: τ_{1} (ps) | Peak volume: A_{2} (prob.) | Peak volume: τ_{2} (ps) | CLS: A_{1} (prob.) | CLS: τ_{1} (ps) | CLS: A_{2} (prob.) | CLS: τ_{2} (ps) |
|---|---|---|---|---|---|---|---|---|
| Edge pixel | 0.24 ± 0.16 | 0.4 ± 0.4 | 0.76 ± 0.04 | 76 ± 4 | 0.43 ± 0.03 | 0.55 ± 0.08 | 0.57 ± 0.03 | 5.8 ± 0.3 |
| Second spect. | 0.24 ± 0.15 | 0.4 ± 0.4 | 0.76 ± 0.04 | 75 ± 4 | 0.42 ± 0.04 | 0.50 ± 0.09 | 0.58 ± 0.03 | 5.6 ± 0.4 |
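One simple way to quantify "indistinguishable" for the Table I fit parameters is to express each difference in units of the combined uncertainty. The sketch below does this with the tabulated values. Combining the two uncertainties in quadrature treats them as independent, which is an assumption made here for illustration; because both fits derive from the same unreferenced data, the true uncertainties are correlated.

```python
import math

# (value, uncertainty) pairs copied from Table I.
edge_pixel = {
    "vol A1": (0.24, 0.16), "vol tau1": (0.4, 0.4),
    "vol A2": (0.76, 0.04), "vol tau2": (76, 4),
    "cls A1": (0.43, 0.03), "cls tau1": (0.55, 0.08),
    "cls A2": (0.57, 0.03), "cls tau2": (5.8, 0.3),
}
second_spect = {
    "vol A1": (0.24, 0.15), "vol tau1": (0.4, 0.4),
    "vol A2": (0.76, 0.04), "vol tau2": (75, 4),
    "cls A1": (0.42, 0.04), "cls tau1": (0.50, 0.09),
    "cls A2": (0.58, 0.03), "cls tau2": (5.6, 0.4),
}

def z_score(a, b):
    """Absolute difference divided by the quadrature sum of the uncertainties."""
    (value_a, sigma_a), (value_b, sigma_b) = a, b
    return abs(value_a - value_b) / math.hypot(sigma_a, sigma_b)

z = {name: z_score(edge_pixel[name], second_spect[name]) for name in edge_pixel}
for name, value in z.items():
    print(f"{name:9s} z = {value:.2f}")
```

Every parameter differs by less than one combined standard deviation, consistent with the statement that the two sets of fit parameters cannot be distinguished.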


### Pump band narrowing as a supporting strategy

As noted in the introduction, Feng *et al.*^{27} raised concerns that edge-pixel referencing may introduce an artificial background to the spectrum if the edge pixels reside in regions with pump-induced responses such as a weak solvent background response. Indeed, the results shown in Fig. 5(c) confirm their hypothesis. To address this problem, we propose narrowing the pump bandwidth to ∼50% of the probe bandwidth. This strategy, which we will refer to as “band narrowing,” should reduce the nonlinear response at the edge pixels. Provided that the pump bandwidth is still sufficiently large to encompass the line shape of interest, the only potential cost in band narrowing the pump is a loss in time resolution for events occurring on the order of the pulse duration.

We test this strategy by applying a 40 cm^{−1} Gaussian mask centered about the MeSCN 0 → 1 transition to all AOM shapes for the 2D IR measurement. Figure 6 and Table II show the results of these measurements following the same analysis as for the data in Fig. 5. The decay data in Fig. 6 show that the peak volumes and line shapes collected using a narrow band pump are indistinguishable between edge-pixel and second spectrometer referencing using both methods of analysis.
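A Gaussian mask of this kind can be sketched as follows. The function name, the interpretation of 40 cm^{−1} as the full width at half maximum, and the center frequency of 2155 cm^{−1} (the approximate MeSCN position quoted elsewhere in this paper) are assumptions for illustration; the actual mask definition used with the pulse shaper may differ.

```python
import math

def gaussian_mask(freq_cm, center_cm=2155.0, fwhm_cm=40.0):
    """Gaussian mask value at freq_cm (cm^-1), normalized to 1 at the center.

    Assumes fwhm_cm is the full width at half maximum of the mask itself.
    """
    sigma = fwhm_cm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((freq_cm - center_cm) / sigma) ** 2)

# Mask value at the line center, at the half-maximum point, and further
# out toward where edge pixels might sit (offsets chosen for illustration).
for offset in (0.0, 20.0, 50.0):
    print(f"offset {offset:4.0f} cm^-1 -> mask {gaussian_mask(2155.0 + offset):.3f}")
```

Under these assumptions the pump amplitude is strongly suppressed a few tens of wavenumbers away from the line center, which is the intended effect at the edge pixels.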

| Referencing method | Peak volume: A_{1} (prob.) | Peak volume: τ_{1} (ps) | Peak volume: A_{2} (prob.) | Peak volume: τ_{2} (ps) | CLS: A_{1} (prob.) | CLS: τ_{1} (ps) | CLS: A_{2} (prob.) | CLS: τ_{2} (ps) |
|---|---|---|---|---|---|---|---|---|
| Edge pixel | 0.21 ± 0.07 | 0.6 ± 0.3 | 0.79 ± 0.02 | 77 ± 2 | 0.46 ± 0.02 | 0.37 ± 0.04 | 0.54 ± 0.02 | 5.7 ± 0.3 |
| Second spect. | 0.21 ± 0.06 | 0.6 ± 0.3 | 0.79 ± 0.02 | 77 ± 2 | 0.46 ± 0.02 | 0.37 ± 0.04 | 0.54 ± 0.02 | 5.6 ± 0.3 |


### Referencing to large background signals

It is reasonable to wonder about the limits of edge-pixel referencing in cases where there are large background signals that arise from additional transitions in the spectrum. To address that question, we prepare a sample with an additional chromophore added such that we know that the signal from that chromophore will appear in the middle of the referencing region. In this case, we choose to spike our MeSCN in DMSO samples with benzonitrile. The line shape for benzonitrile falls 50–70 cm^{−1} to the blue of the MeSCN transition, occupying most of the right-hand edge-pixel referencing region. Figure 7 shows pump–probe spectra for a sample with 200 mM MeSCN and 200 mM benzonitrile in DMSO using both edge-pixel (red) and second spectrometer (blue) referencing. For the edge-pixel referenced data, there is a severe baseline distortion in the transient pump–probe spectrum when referencing to the benzonitrile peak, as expected.

Figure 8 shows a side-by-side comparison of the 2D IR spectra collected by the second spectrometer (left) and edge pixel (right) referencing for this same mixed sample. Both spectra show the MeSCN line shape in the center (∼2155 cm^{−1}), while the benzonitrile line shape (∼2230 cm^{−1}) is present only in the second spectrometer referenced spectrum and is, unsurprisingly, absent from the edge-pixel referenced spectrum. Although distortions do appear in the edge-pixel spectrum, they strictly manifest as vertical bands in the pump axis region *ω*_{1} = [2190 cm^{−1}, 2240 cm^{−1}], leaving the MeSCN line shape apparently unperturbed. Figure 9 shows further analyses for the spiked solution as before, confirming that referencing to the nitrile line shape does not significantly perturb the MeSCN line shape. This conclusion is further supported by the lifetime and CLS fits provided in Table III, which also show indistinguishable fit parameters between second spectrometer and edge-pixel referencing and are consistent with the results for the solution without benzonitrile.

| Referencing method | Peak volume: A_{1} (prob.) | Peak volume: τ_{1} (ps) | Peak volume: A_{2} (prob.) | Peak volume: τ_{2} (ps) | CLS: A_{1} (prob.) | CLS: τ_{1} (ps) | CLS: A_{2} (prob.) | CLS: τ_{2} (ps) |
|---|---|---|---|---|---|---|---|---|
| Edge pixel | 0.23 ± 0.25 | 0.5 ± 0.9 | 0.77 ± 0.07 | 74 ± 6 | 0.4 ± 0.1 | 0.3 ± 0.2 | 0.59 ± 0.07 | 5.3 ± 0.8 |
| Second spect. | 0.24 ± 0.27 | 0.5 ± 0.9 | 0.76 ± 0.07 | 75 ± 9 | 0.4 ± 0.1 | 0.3 ± 0.2 | 0.60 ± 0.07 | 5.3 ± 0.8 |


Note that achieving a pump bandwidth that spans both the MeSCN and benzonitrile line shapes requires us to shift the mid-IR spectrum to a higher energy and broaden it to the point that it significantly overlaps the CO_{2} antisymmetric stretching transition. Because we did not purge the beam path of CO_{2}, the beams are susceptible to chromatic dispersion. Because the effects of temporal chirp show up on both referenced spectra equally, this effect is inconsequential with respect to the comparison of the two referencing methods. Thus, we do not concern ourselves with obtaining transform-limited pulses, even though performing this same experiment with a transform-limited pump and probe would lead to slightly different fit values than those shown in Table III. We also note that the 3× reduction in signal observed between Figs. 5(a) and 9(a) is due to the movement of the mid-IR center frequency from 2155 cm^{−1} to 2190 cm^{−1} in order to excite both line shapes. The decrease in signal also results in the increase in the relative magnitudes of the SEM uncertainties and, therefore, the uncertainties of the fit parameters.

Although the results shown in Figs. 8 and 9 and Table III may appear contrary to the transient absorption spectrum in Fig. 7, a better understanding may be gained by framing the problem in terms of the Fourier Transform. Consider the application of Eq. (3) to a scenario where noise *n*_{edge} and background signal *f*_{edge} are present at the edge pixels. The edge pixel referenced 2D IR spectrum *S*_{edge} is then given by the following equation:

After the substitution of Eq. (3), with the understanding that **B** · *n*_{edge} = **B** · *n*_{ref}, we arrive at the following equation:

Because the pump and probe axes are orthogonal and the operator **B** strictly acts along the probe axis, the Fourier transform along the pump axis (*t*_{1} → *ω*_{1}) necessarily commutes with **B**. After applying the Fourier transform to Eq. (9), we arrive at the following equation, which shows that the edge referenced spectrum in the frequency–frequency domain is equal to the second spectrometer referenced 2D IR spectrum minus a term of **B** · *f*_{edge}(*ω*_{1}, *ω*_{3}):
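The chain of relations described in this derivation can be sketched as follows. The symbols $S$ (the unreferenced spectrum) and $S_{\mathrm{ref}}$ (the second spectrometer referenced spectrum), and the assumed form $S_{\mathrm{ref}} = S - \mathbf{B}\cdot n_{\mathrm{ref}}$ for the referencing step, are notation adopted here for illustration and may differ from the notation of Eq. (3).

```latex
\begin{aligned}
S_{\mathrm{edge}}(t_1,\omega_3)
  &= S(t_1,\omega_3) - \mathbf{B}\cdot\left[\,n_{\mathrm{edge}} + f_{\mathrm{edge}}(t_1,\omega_3)\right] \\
  &= S_{\mathrm{ref}}(t_1,\omega_3) - \mathbf{B}\cdot f_{\mathrm{edge}}(t_1,\omega_3)
  \qquad \text{using } \mathbf{B}\cdot n_{\mathrm{edge}} = \mathbf{B}\cdot n_{\mathrm{ref}}, \\
S_{\mathrm{edge}}(\omega_1,\omega_3)
  &= S_{\mathrm{ref}}(\omega_1,\omega_3) - \mathbf{B}\cdot f_{\mathrm{edge}}(\omega_1,\omega_3)
  \qquad \text{after } t_1 \rightarrow \omega_1 .
\end{aligned}
```

The last line follows from the second because the Fourier transform along the pump axis commutes with **B**.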

We point back to Fig. 8 as a visual aid for interpreting Eq. (10). In both spectra, we have highlighted the vertical regions encompassing the line shapes of MeSCN (region V_{1}) and benzonitrile (region V_{2}). Edge pixel regions are highlighted in black (Edge 1 and Edge 2) on the right-hand spectrum. For edge-pixel referencing to add distortion to vertical region V_{1}, there must be a nonzero signal at the overlap of V_{1} and Edge 1 and/or Edge 2 (represented by striped regions). In terms of Eq. (10), this requirement reduces to whether *f*_{edge}(*ω*_{1}, *ω*_{3}) is nonzero within the striped regions that indicate the overlap between the edge and signal regions. For the case of V_{1}, the background signal at the overlap regions is negligible, so no distortion is observed. For the case of V_{2}, the benzonitrile line shape is present in the V_{2}/Edge 1 overlap region, causing distortion in V_{2}. The reason distortion generally appears as vertical banding is due to the nature of operator **B** in Eq. (10), which (by definition) only acts along the probe axis.
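The commutation property that underlies this interpretation can be checked numerically. The sketch below uses a tiny made-up background array and a made-up 3 × 2 matrix standing in for **B**; neither is taken from the experiment. It applies a plain discrete Fourier transform along the pump (time) index and a matrix product along the probe (pixel) index in both orders.

```python
import cmath

def dft(column):
    """Plain discrete Fourier transform of a 1D sequence (t1 -> w1)."""
    n = len(column)
    return [sum(column[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def dft_along_pump(data):
    """Apply the DFT down each probe-pixel column of data[t1][pixel]."""
    columns = [dft([row[p] for row in data]) for p in range(len(data[0]))]
    return [[columns[p][k] for p in range(len(columns))] for k in range(len(data))]

def apply_b(b, data):
    """Apply B along the probe axis: out[row][i] = sum_j b[i][j] * data[row][j]."""
    return [[sum(b[i][j] * row[j] for j in range(len(row))) for i in range(len(b))]
            for row in data]

# Hypothetical background f_edge[t1][edge pixel] (4 time points, 2 edge pixels)
# and a hypothetical matrix mapping 2 edge pixels onto 3 signal pixels.
f_edge = [[1.0, 0.5], [0.2, -0.3], [-0.7, 0.1], [0.4, 0.9]]
b = [[0.5, 0.1], [0.3, 0.3], [0.1, 0.6]]

left = dft_along_pump(apply_b(b, f_edge))   # transform after applying B
right = apply_b(b, dft_along_pump(f_edge))  # apply B after transforming
max_diff = max(abs(x - y) for row_l, row_r in zip(left, right)
               for x, y in zip(row_l, row_r))
print(f"max difference between the two orderings = {max_diff:.1e}")
```

The two orderings agree to within floating-point rounding, so whatever the edge pixels contain at a given pump frequency is spread by **B** along the probe axis at that same pump frequency, producing a vertical band.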

This interpretation is helpful for predicting what effect edge-pixel referencing will have on a given region of a 2D IR spectrum in various circumstances. Based on Fourier Transform pairs, we note four limiting cases of edge-pixel referencing with additive background signals and what the consequence will be on the 2D spectrum.

Case No. 1: Referencing to a constant offset in the time domain will add a vertical band to the spectrum at *ω*_{1} = 0 (or more generally, the frequency-shifted DC component).

Case No. 2: Referencing to a sinusoidal signal of the form sin(*ω*_{0}*t*_{1} + *ϕ*) will add a vertical band to the frequency spectrum at *ω*_{1} = *ω*_{0}.

Case No. 3: Referencing to a signal of the form δ(*t*_{1}) will add a constant vertical background to the spectrum.

Case No. 4: Referencing to Gaussian white-noise will add Gaussian white-noise to the 2D frequency spectrum.
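The first three cases follow directly from standard Fourier transform pairs, which can be verified with a short discrete example. The sketch below uses a 16-point discrete Fourier transform with made-up test signals; the signal length, the sinusoid frequency (3 cycles per record), and the phase are arbitrary choices for illustration. It lists which pump-frequency bins are occupied for each kind of time-domain background.

```python
import cmath
import math

def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform of a 1D sequence."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def occupied_bins(signal, tol=1e-9):
    """Indices of frequency bins whose magnitude exceeds tol."""
    return [k for k, m in enumerate(dft_magnitude(signal)) if m > tol]

n = 16
constant = [1.0] * n                                                    # Case No. 1
sinusoid = [math.sin(2 * math.pi * 3 * t / n + 0.4) for t in range(n)]  # Case No. 2
delta = [1.0] + [0.0] * (n - 1)                                         # Case No. 3

print("constant ->", occupied_bins(constant))  # [0]
print("sinusoid ->", occupied_bins(sinusoid))  # [3, 13]
print("delta    ->", occupied_bins(delta))     # all 16 bins
```

The constant occupies only the zero-frequency bin, the real sinusoid occupies its own frequency bin and the mirror-image negative-frequency bin, and the delta function spreads equally over every bin.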

With this new insight, we revisit the results presented in Fig. 5. The volume discrepancy reported in Fig. 5(c) is most consistent with Case No. 3. Specifically, a background response from the solvent that is well approximated as a delta function at *t*_{1} = 0 is consistently observed at each waiting time. Note that although this added background in the frequency–frequency domain is constant along the pump axis, it is probably more complicated along the LO/probe axis since the product **B** · *f*_{edge}(*ω*_{1}, *ω*_{3}) is nontrivial (see the V_{2} region in Fig. 8). We believe that this contribution to the spectrum is responsible for the subtle 1% discrepancy in the peak volume fit between edge-pixel and second spectrometer referencing. The delta function behavior at *t*_{1} = 0 independent of waiting time would be consistent with a non-resonant response of the solvent resulting from the overlap of the two pump beams that causes a slight modulation of the probe beam resulting in what appears to be a weak signal sharply peaked at *t*_{1} = 0.

In addition to these effects, another consideration is the effect of scattered light. During heterodyne detection, electric fields mix on a square law detector, producing phase sensitive interferences. In addition to the LO field (*E*_{LO}) and the 2D IR field (*E*_{sig}), both pump (*E*_{1} and *E*_{2}) and probe (*E*_{3}) fields may scatter into the direction of the detector due to a scratched window or small particles present in the sample. The various scattering contributions have been analyzed and considered thoroughly before, and almost all of them are removed using the four-pulse phase cycling scheme that we employ with the pulse shaper.^{48} The only remaining scattering contribution is the involvement of one scattering interaction from each pump beam, $E_1^{*}E_2$. Because the $E_1^{*}E_2$ scatter arises from the interference of the two pump fields, it carries a temporal phase of $e^{-i\omega t_1}$ and appears in the 2D IR spectrum along the diagonal. Based on field strengths, however, $E_1^{*}E_2$ interference will always appear as a factor of $\sqrt{I_2/I_{\mathrm{LO}}}$ smaller than the $E_1^{*}E_{\mathrm{LO}}$ interference, where the intensities in the radical are those observed at the spectrometer. Because the pump beam does not propagate toward the SFG crystal and, ultimately, the detector like the LO does, this factor should be very small. In any case, the only remedy for removing $E_1^{*}E_2$ interference is optical chopping of the probe beam, which along with the existing 4-pulse phase cycle results in an effective 8-pulse phase cycle. If this scheme is not employed, then referencing to the $E_1^{*}E_2$ scatter will produce vertical bands on the left and right side of the 2D spectrum in the frequency region corresponding to the edge pixels used for referencing, much as was seen for the benzonitrile experiment.

One unique feature of upconversion detection is that all mid-IR signals that appear on the spectrometer *must* have undergone sum-frequency mixing with the narrow-band 800 nm pulse. At a bandwidth of 1.5 cm^{−1}, the narrow-band upconversion pulse has a pulse duration encompassing events occurring within ±5 ps of the probe. Because the pump always precedes the probe, this temporal overlap requirement effectively limits the time over which the pump scatter can upconvert to the first 5 ps of waiting time, which does reduce the impact of this contribution somewhat.
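The ±5 ps window can be checked with a quick time-bandwidth estimate. The sketch below assumes a transform-limited Gaussian pulse and treats 1.5 cm^{−1} as the intensity full width at half maximum; both are assumptions for illustration, since the actual shape of the upconversion pulse is not specified here.

```python
import math

C_CM_PER_S = 2.99792458e10                # speed of light in cm/s
GAUSSIAN_TBP = 2 * math.log(2) / math.pi  # ~0.441, transform-limited Gaussian

def transform_limited_fwhm_ps(bandwidth_cm):
    """FWHM duration (ps) of a transform-limited Gaussian pulse.

    bandwidth_cm is the spectral FWHM in wavenumbers (cm^-1).
    """
    bandwidth_hz = bandwidth_cm * C_CM_PER_S
    return GAUSSIAN_TBP / bandwidth_hz * 1e12

duration_ps = transform_limited_fwhm_ps(1.5)
print(f"FWHM duration = {duration_ps:.1f} ps")  # 9.8 ps, i.e. roughly +/-5 ps
```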

A final consideration involves the applicability of edge-pixel referencing to 2D IR spectroscopy based on other experimental arrangements such as hole-burning (sometimes referred to as double resonance) or four-wave mixing methods. The hole-burning method is really just a transient absorption experiment with a narrow-band pump pulse. As a result, everything that we have done with edge-pixel referencing will translate nicely to that experimental approach. In the four-wave mixing experiment, spectra often contain a phasing error *e*^{iϕ} in both the rephasing and nonrephasing spectra that must be corrected after the fact. If we assume that whatever signal is present at the edge pixels *f*_{edge}(*ω*_{1}, *ω*_{3}) also contains the same phase error, then this factor should appear equally in both of the right-hand terms of Eqs. (9) and (10). In this case, we predict that edge-pixel referencing should be fully compatible with the projection-slice theorem. For example, if repeating the benzonitrile experiment presented here, the pump–probe calibration spectrum would appear like the red trace shown in Fig. 7. Although unpleasant, in principle, this spectrum should be obtained when projecting the properly phased 2D spectrum onto the probe axis. Clearly, we have not demonstrated the practical application of edge-pixel referencing in this context, but there is no reason, in principle, why it should fail. The practical utility of this approach is, therefore, a potential area for future study.

## CONCLUSION

We have studied the performance, optimization, and potential distortions of edge-pixel referencing on a CMOS array spectrometer in 2D IR applications. Based on these results, we provide the following recommendations for optimal performance: (1) Choose edge pixels closer to the center of the spectrum with higher intensity but not so close that the edge pixels overlap with the signal from the line shape of interest. (2) For optimal noise reduction, it is best to choose as many edge pixels as possible. (3) Nonlinear background signals at the edge pixel may cause spectral distortions, although these may or may not affect the spectral region of interest depending on the nature of the background. (4) The effects of background signal distortions may be mitigated by band-narrowing the pump pulse. With these performance characteristics in mind, it is clear that edge-pixel referencing is a promising approach for noise reduction in many applications of heterodyned spectroscopy, especially in 2D IR.

In comparing edge-pixel referencing to other referencing schemes, we would note several advantages and some limitations. Among the most notable advantages is the fact that the implementation of edge-pixel referencing is entirely a matter of software. It requires no additional spectrometer or array detector and can be set up with some simple modifications to the usual data collection code for 2D IR or other heterodyned spectroscopies. That makes it an essentially free implementation. In addition, as we showed in Fig. 4, edge-pixel referencing is capable of achieving shot-noise limited detection, which means that it is at least as good as referencing to another spectrometer. Finally, however, there are some limitations. Edge-pixel referencing does perform much better with a high pixel count, making it less accessible when using MCT array detectors that may only have 128 pixels. Background signals that contribute in the regions of the edge pixels can lead to distortions of the 2D IR spectrum although we showed that for some sharp transitions, these effects are not significant, depending on the particular spectral region in which they appear, and for broader transitions, we can use band-narrowing of the pump beams to mitigate this effect somewhat.

## ACKNOWLEDGMENTS

We would like to thank Ilya Vinogradov and Nien-Hui Ge for insightful discussions on this project. The authors would also gratefully acknowledge funding for this work from the U.S. National Science Foundation under Grant No. CHE-1707598.

### APPENDIX A: PROPAGATION OF ERROR

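Given a function *f*(*x*_{1}, *x*_{2}), we wish to know how the fluctuations *x*_{1} = ⟨*x*_{1}⟩ + *δx*_{1} and *x*_{2} = ⟨*x*_{2}⟩ + *δx*_{2} propagate through *f*. Assuming that these fluctuations are relatively small, the deviation of *f*(*x*_{1}, *x*_{2}) from *f*(⟨*x*_{1}⟩, ⟨*x*_{2}⟩) is well-approximated by the first order Taylor series expansion

$$\delta f \approx \frac{\partial f}{\partial x_1}\,\delta x_1 + \frac{\partial f}{\partial x_2}\,\delta x_2, \tag{A1}$$

where the partial derivatives are evaluated at (⟨*x*_{1}⟩, ⟨*x*_{2}⟩). The mean squared error of *f*(*x*_{1}, *x*_{2}) is obtained by averaging over the square of Eq. (A1),

$$\langle \delta f^{2} \rangle = \left(\frac{\partial f}{\partial x_1}\right)^{2} \langle \delta x_1^{2} \rangle + \left(\frac{\partial f}{\partial x_2}\right)^{2} \langle \delta x_2^{2} \rangle + 2\,\frac{\partial f}{\partial x_1}\frac{\partial f}{\partial x_2}\,\langle \delta x_1 \delta x_2 \rangle, \tag{A2}$$

where $\langle \delta x_1^{2} \rangle$, $\langle \delta x_2^{2} \rangle$, and ⟨*δx*_{1}*δx*_{2}⟩ are the variances and covariance of *x*_{1} and *x*_{2}. The root mean squared error of *f*(*x*_{1}, *x*_{2}) is obtained by taking the square root of Eq. (A2).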

### APPENDIX B: ERROR OF THE 2-PULSE PHASE CYCLE

Δ*OD* for the 2-pulse phase cycle in absorbance mode is defined as

The differential of the 2-pulse Δ*OD* is given by

The mean squared error of the 2-pulse Δ*OD* is obtained by averaging over the square of Eq. (B2),

To simplify the notation, we assume that *I*_{1} and *I*_{2} are statistically equivalent. The MSE and RMSE of the 2-pulse Δ*OD* are then given by the following equations:
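The steps of this appendix can be sketched by applying the propagation formula of Appendix A to a log-ratio. The sign convention $\Delta OD = -\log_{10}(I_1/I_2)$ is an assumption made here for illustration; the error formulas do not depend on the overall sign.

```latex
\begin{aligned}
\Delta OD &= -\log_{10}\!\left(\frac{I_1}{I_2}\right), \\
\delta(\Delta OD) &\approx -\frac{1}{\ln 10}\left(\frac{\delta I_1}{\langle I_1\rangle} - \frac{\delta I_2}{\langle I_2\rangle}\right), \\
MSE(\Delta OD) &= \frac{1}{(\ln 10)^{2}}\left[\frac{\langle \delta I_1^{2}\rangle}{\langle I_1\rangle^{2}} + \frac{\langle \delta I_2^{2}\rangle}{\langle I_2\rangle^{2}} - \frac{2\,\langle \delta I_1\,\delta I_2\rangle}{\langle I_1\rangle\langle I_2\rangle}\right], \\
MSE(\Delta OD) &= \frac{2\left(\langle \delta I^{2}\rangle - \langle \delta I_1\,\delta I_2\rangle\right)}{(\ln 10)^{2}\,\langle I\rangle^{2}}
\quad \text{for statistically equivalent } I_1 \text{ and } I_2, \\
RMSE(\Delta OD) &= \frac{\sqrt{2\left(\langle \delta I^{2}\rangle - \langle \delta I_1\,\delta I_2\rangle\right)}}{\ln 10\,\langle I\rangle}.
\end{aligned}
```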

When the unreferenced LO noise is present, the consecutive-shot covariance ⟨*δI*_{1}*δI*_{2}⟩ can be significant.^{8} For our apparatus, the 2-pulse covariance ⟨*δI*_{1}*δI*_{2}⟩ ranges from 0.7⟨*δI*^{2}⟩ to 0.9⟨*δI*^{2}⟩ depending on the spectral frequency and OPA alignment. Although Eq. (B5) may suggest that the shot-to-shot covariance term ⟨*δI*_{1}*δI*_{2}⟩ is required for computing *RMSE*(Δ*OD*), note that ⟨Δ*I*^{2}⟩ = ⟨(*δI*_{1} − *δI*_{2})^{2}⟩ = 2⟨*δI*^{2}⟩ − 2⟨*δI*_{1}*δI*_{2}⟩. Hence, the *MSE*(Δ*OD*) and *RMSE*(Δ*OD*) may still be computed using just the difference-noise Δ*I* alone, as given by the following equations:
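The equivalence between the covariance form and the difference-noise form can be checked numerically. The sketch below assumes the standard first-order result for a log-ratio, *RMSE*(Δ*OD*) = √⟨Δ*I*^{2}⟩ / (ln 10 · ⟨*I*⟩), and uses made-up intensity statistics; the function names and numerical values are hypothetical.

```python
import math

def rmse_from_covariance(var_i, cov_12, mean_i):
    """RMSE of the 2-pulse dOD from the single-shot variance and the
    consecutive-shot covariance."""
    return math.sqrt(2.0 * (var_i - cov_12)) / (math.log(10) * mean_i)

def rmse_from_difference(mean_sq_diff, mean_i):
    """RMSE of the 2-pulse dOD from the difference-noise <(dI_1 - dI_2)^2> alone."""
    return math.sqrt(mean_sq_diff) / (math.log(10) * mean_i)

mean_i = 1000.0  # hypothetical mean LO counts
var_i = 100.0    # hypothetical single-shot variance (1% relative noise)

results = {}
for rho in (0.0, 0.7, 0.9):  # covariance as a fraction of the variance
    cov_12 = rho * var_i
    mean_sq_diff = 2.0 * var_i - 2.0 * cov_12
    results[rho] = (rmse_from_covariance(var_i, cov_12, mean_i),
                    rmse_from_difference(mean_sq_diff, mean_i))
    print(f"covariance = {rho:.1f} * variance -> RMSE(dOD) = {results[rho][0]:.2e}")
```

Both routes give the same number, and a covariance of 0.7 to 0.9 times the variance lowers the 2-pulse noise well below what uncorrelated shots would give, as expected when consecutive shots share much of their LO noise.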

#### 1. Multiple noise sources

The formulas above do not specify the multiple noise sources that comprise the single-shot noise *δI*. As discussed in the Introduction, *δI* = *δI*_{LO} + *δI*_{shot} + *δI*_{det} includes the local-oscillator noise (*δI*_{LO}), photon shot noise (*δI*_{shot}), and detector noise (*δI*_{det}). Assuming ⟨*I*_{1}⟩ = ⟨*I*_{2}⟩ and ignoring multiplicative pump-noise, Eq. (B2) may be rewritten as

As first noted by Feng *et al.*^{8,27} and remarked above following Eq. (6), we emphasize that when referencing with the correlation matrix **B**, the uncorrelated floor noise from the other reference spectrometer may not add to the signal array any more than the correlated LO noise that is subtracted. Hence, for convenience, we assume *δI*_{shot} and *δI*_{det} are just those values on the signal array, and the corresponding floor noise added from the detector array through **B** is irrelevant in that it may be interpreted as the remaining LO noise after referencing. On the other hand, note that we redefined the LO noise as Δ*I*_{LO}, which may be interpreted either as (1) Δ*I*_{LO} = *δI*_{LO,1} − *δI*_{LO,2} in the case of unreferenced LO noise or (2) the residual LO noise after referencing with **B**. Although we could have reformulated *δI*_{shot} and *δI*_{det} as difference-noise too, these two noise sources are commonly characterized in terms of single laser shots.

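After averaging over the square of Eq. (B6) and assuming that $\langle \delta I_{\mathrm{shot},1}^{2} \rangle = \langle \delta I_{\mathrm{shot},2}^{2} \rangle$ and $\langle \delta I_{\mathrm{det},1}^{2} \rangle = \langle \delta I_{\mathrm{det},2}^{2} \rangle$ and that the remaining covariances are negligible, we have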

After taking the square root of Eq. (B9), we arrive at the RMSE formula for the 2-pulse Δ*OD* noise (ignoring pump noise),

When both pump and LO noises are negligible, the 2-pulse Δ*OD* noise floor is
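The noise decomposition described in this subsection can be sketched as follows, under the stated assumptions (⟨*I*_{1}⟩ = ⟨*I*_{2}⟩ = ⟨*I*⟩, equal shot and detector variances on the two pulses, negligible remaining covariances, and pump noise ignored). The overall sign follows the assumed convention $\Delta OD = -\log_{10}(I_1/I_2)$ and does not affect the error formulas.

```latex
\begin{aligned}
\delta(\Delta OD) &\approx -\frac{1}{\ln 10\,\langle I\rangle}
  \left[\Delta I_{\mathrm{LO}} + \left(\delta I_{\mathrm{shot},1} - \delta I_{\mathrm{shot},2}\right)
  + \left(\delta I_{\mathrm{det},1} - \delta I_{\mathrm{det},2}\right)\right], \\
MSE(\Delta OD) &= \frac{\langle \Delta I_{\mathrm{LO}}^{2}\rangle + 2\,\langle \delta I_{\mathrm{shot}}^{2}\rangle + 2\,\langle \delta I_{\mathrm{det}}^{2}\rangle}{(\ln 10)^{2}\,\langle I\rangle^{2}}, \\
RMSE(\Delta OD) &= \frac{\sqrt{\langle \Delta I_{\mathrm{LO}}^{2}\rangle + 2\,\langle \delta I_{\mathrm{shot}}^{2}\rangle + 2\,\langle \delta I_{\mathrm{det}}^{2}\rangle}}{\ln 10\,\langle I\rangle}, \\
RMSE_{\mathrm{floor}}(\Delta OD) &= \frac{\sqrt{2\left(\langle \delta I_{\mathrm{shot}}^{2}\rangle + \langle \delta I_{\mathrm{det}}^{2}\rangle\right)}}{\ln 10\,\langle I\rangle}
\quad \text{when } \Delta I_{\mathrm{LO}} \text{ is negligible.}
\end{aligned}
```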

#### 2. Multiplicative pump noise

The heterodyned probe response measured on the signal array for an analyte pumped by a pulse pair with relative phase difference Δ*ϕ* is given by

where *χ*^{(3)}*I*_{Pu} is the third order response we wish to isolate, a factor of $2\,\mathrm{Re}\{\,\}$ has been ignored for brevity, and we have assumed |*E*_{sig}| ≪ |*E*_{LO}|. Applying Eq. (B12) to Eq. (B1) while making use of the approximation log_{10}(1 + *x*) ≈ *x*/ln(10) for *x* ≪ 1 and setting Δ*ϕ* = 0 for *I*_{1} and Δ*ϕ* = *π* for *I*_{2}, we find that the 2-pulse Δ*OD* is well approximated by

Taking the differential of Eq. (B13), we have

In the last term of Eq. (B14), we have multiplied by ⟨*I*_{Pu}⟩/⟨*I*_{Pu}⟩ to show that the pump noise is proportional to the 2D IR signal *χ*^{(3)}⟨*I*_{Pu}⟩. Thus, Eq. (B14) shows that pump noise manifests as a multiplicative scaling of the third order signal, but is always a factor of *δI*_{Pu}/⟨*I*_{Pu}⟩ smaller than the signal itself. As discussed in the Introduction, *δI*/⟨*I*⟩ is at worst 1%–2% for most well-behaved laser systems running at 1–10 kHz.

### APPENDIX C: ERROR OF THE 4-PULSE PHASE CYCLE

We now turn to the 4-pulse Δ*OD* function as defined in Eq. (C1). To preserve normalization, a factor of ½ has been appended to the formula to now account for the two difference-spectra contained in the following equation:

Applying similar steps and assumptions as those in the lead-up to Eq. (B9), we arrive at the MSE of the 4-pulse phase cycle,

Taking the square root of Eq. (C2), we find that the RMSE of the 4-shot Δ*OD* noise is given by

The 4-pulse Δ*OD* noise floor is given by the following equation where Δ*I*_{LO} is assumed to be negligible:
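The 4-pulse results can be sketched by treating the 4-pulse Δ*OD* as the average of two 2-pulse difference spectra, here labeled $\Delta OD_{a}$ and $\Delta OD_{b}$. These labels, and the assumption that the noise in the two difference spectra is uncorrelated, are introduced here for illustration; the exact phase-cycle sign pattern of Eq. (C1) is not reproduced.

```latex
\begin{aligned}
\Delta OD_{4} &= \tfrac{1}{2}\left(\Delta OD_{a} + \Delta OD_{b}\right), \\
MSE(\Delta OD_{4}) &= \tfrac{1}{4}\left[MSE(\Delta OD_{a}) + MSE(\Delta OD_{b})\right]
  = \frac{\tfrac{1}{2}\langle \Delta I_{\mathrm{LO}}^{2}\rangle + \langle \delta I_{\mathrm{shot}}^{2}\rangle + \langle \delta I_{\mathrm{det}}^{2}\rangle}{(\ln 10)^{2}\,\langle I\rangle^{2}}, \\
RMSE(\Delta OD_{4}) &= \frac{\sqrt{\tfrac{1}{2}\langle \Delta I_{\mathrm{LO}}^{2}\rangle + \langle \delta I_{\mathrm{shot}}^{2}\rangle + \langle \delta I_{\mathrm{det}}^{2}\rangle}}{\ln 10\,\langle I\rangle}, \\
RMSE_{\mathrm{floor}}(\Delta OD_{4}) &= \frac{\sqrt{\langle \delta I_{\mathrm{shot}}^{2}\rangle + \langle \delta I_{\mathrm{det}}^{2}\rangle}}{\ln 10\,\langle I\rangle}.
\end{aligned}
```

Under these assumptions the 4-pulse noise floor is a factor of $\sqrt{2}$ lower than the 2-pulse noise floor, as expected for averaging two independent difference spectra.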