This work proposes a deep learning (DL)-based framework, namely Sim2Real, for spectral signal reconstruction in reconstructive spectroscopy, focusing on efficient data sampling and fast inference time. The work focuses on the challenge of reconstructing real-world spectral signals in an extreme setting where only device-informed simulated data are available for training. Such device-informed simulated data are much easier to collect than real-world data but exhibit large distribution shifts from their real-world counterparts. To leverage such simulated data effectively, a hierarchical data augmentation strategy is introduced to mitigate the adverse effects of this domain shift, and a corresponding neural network for the spectral signal reconstruction with our augmented data is designed. Experiments using a real dataset measured from our spectrometer device demonstrate that Sim2Real achieves significant speed-up during the inference while attaining on-par performance with the state-of-the-art optimization-based methods.

Optical spectroscopy is a versatile technique for various scientific, industrial, and consumer applications. Recently, computational spectroscopy using reconstructive algorithms1–8 has been rapidly developed owing to its potential to enable a miniaturized spectrometer.9,10 A spectrometer encodes optical spectral information spatially or temporally and then measures the encoded information using a series of photodetectors. The transformation between the input signal x and the photo-detector readout y in a spectrometer design can be either linear or nonlinear. However, a linear design is typically preferred. An example of nonlinear encoding is observed in Fourier-transform infrared (FTIR) spectrometers, which produce an auto-correlation of the input through a variable time-delay interferometer. For a linear system, x and y can be represented using column vectors, while the mapping is represented by a responsivity matrix R such that y = Rx. For example, a monochromator spatially disperses different spectral components with a block-diagonal R matrix. In a reconstructive spectrometer, R is generally much more complex. Each photo-detector’s readout depends on all instead of a few spectral components. While the complex structure of R means a lot of computational resources are generally required to recover the spectrum x from the photo-detector readouts y, it also opens up new opportunities. These include the ability to tailor R to encode the spectral information directly to parameters of interest in the application, such as the spectral peak positions and the relative intensities between the peaks; the ability to increase the accuracy of spectral reconstruction by focusing on the most important spectral information even when y’s dimension is much smaller than x’s;11 and the ability to miniaturize the spectrometer.

A wealth of research has been conducted to increase the efficiency and accuracy of spectral reconstruction. The approaches can generally be categorized as optimization-based or data-driven. The optimization-based approaches formulate the inverse problem as a convex optimization problem in terms of the signal to be reconstructed. To deal with the non-uniqueness of the solution, different types of regularization (e.g., nonnegativity, sparsity, and smoothness) have been considered.1,12,13 The effectiveness of these optimization-based techniques often hinges on the precise adjustment of regularization parameters and hyper-parameters, a task that can be challenging.14 Moreover, optimization-based approaches that rely on iterative procedures can be expensive in both energy and computation for critical applications such as Internet-of-Things or time-sensitive tasks. Despite the challenges, miniaturized spectrometers based on these spectral-reconstruction techniques have reported good performance. For example, we recently developed a chip-scale spectrometer concept by integrating 16 semiconductor photo-detectors, each with a different spectral response.13 We designed the photo-detectors using nanostructured materials to minimize the dependence on the incident angle of light and, therefore, eliminated the need for any external optical components. The entire spectrometer, sans the electronic readout circuitry, is only a few micrometers thick. As the photodetector’s response was broadband, the device was able to recover basic spectral information, including the positions and relative intensities of peaks across the entire visible spectrum. For a multi-peak spectrum, the error in locating the peaks was 0.97% (RMS), comparable to a monochromator with twice the number of detectors. Enabled by the rich interaction between different spectral components in each photo-detector, we also showed that only seven detectors were sufficient for input signals that had up to three peaks and relatively smooth spectra.

In comparison, deep learning (DL)-based data-driven approaches8,15,16 showed the potential to address the challenges of an optimization-based approach. First, deep overparametrized networks trained by gradient descent methods tend to enforce the regularization implicitly,17 avoiding the need to adjust regularization parameters. Second, while training neural networks can be computationally expensive, reconstructing a new measurement during inference requires only a single forward pass through the network. In contrast, optimization-based approaches can be costly in terms of energy and computing resources during inference. Various data-driven methods of reconstructive spectroscopy have been developed, including utilizing a simple, fully connected network on a plasmonic spectral encoder on the device,8 minimizing potential noise before the reconstruction process on a colloidal quantum dot spectrometer,16 and a specifically tailored residual convolutional neural network aiming to improve the reconstruction performance.15 

A significant challenge with the current deep learning methodologies is their dependence on extensive volumes of real, precisely annotated training data to achieve optimal performance.8,14 In a practical setting, gathering such real, labeled data for spectrometer reconstruction is both costly and time-intensive. Additionally, the actual labeled data pairs collected often contain considerable noise, which may result in poor performance during testing when used to train deep learning models.

In this work, we introduce a method to tackle these challenges in deep learning-based approaches for spectral reconstruction. As illustrated in Fig. 1, we propose a Sim2Real framework in which we train the deep learning models solely based on simulated datasets and then deploy the model on a real dataset. Our method contains the following two key components:

  • Hierarchical Data Augmentation (HDA): To mitigate the domain gap between simulated and real data, we introduce a data augmentation technique to perturb both the response matrix R and the encoded signal. Training the model with these data improves its robustness.

  • ReSpecNN: In line with the HDA, we developed a new lightweight network architecture specifically designed for the Sim2Real framework, which is applied to the spectral reconstruction problem with the aforementioned HDA approach.

FIG. 1.

The diagram of our proof-of-concept Sim2Real framework in the reconstructive spectroscopy. The orange arrows denote existing methods, which require collecting and training on real-world data. The blue box and arrows denote our proposed Sim2Real framework that effectively addresses the domain shift between simulated and real-world data. The domain shift is visualized through PCA in Fig. 2 (see Sec. II for details). The response matrix plot is reprinted with permission from T. Sarwar, C. Yaras, X. Li, Q. Qu, and P.-C. Ku, “Miniaturizing a chip-scale spectrometer using local strain engineering and total-variation regularized reconstruction,” Nano Lett. 22, 8174–8180. Copyright 2022 American Chemical Society.


Unlike conventional methods, marked by orange arrows in Fig. 1, which require collecting and training on real-world data, our strategy eliminates the need for extensive real-world data collection, only requiring a single measurement of the response matrix R. Moreover, by utilizing our augmented simulated training data, we effectively close the domain gap between simulated and real data, leading to high-quality spectral reconstruction during testing on real data.

The rest of the paper is organized as follows. In Sec. II, we introduce the mathematical formulation of the spectral reconstruction problem while identifying the limitations of an existing method. In Sec. III, we present our main contributions and discuss how they address the limitations of the existing method. In Sec. IV, we validate the performance of our proposed method on real-world data by comparing it with the state-of-the-art method and discussing the implications.

In this section, we introduce the mathematical formulations of the spectrum reconstruction problem. We also identify the challenges and limitations of existing optimization and deep learning methods.

For spectrometer signal reconstruction, the encoded signal vector $y = (y_1, \dots, y_K) \in \mathbb{R}_+^K$ is the output produced by the spectrometer given the signal of interest x as input and is modeled mathematically as
$$y_i = \int_a^b R_i(\lambda)\, x(\lambda)\, d\lambda + \varepsilon_i \tag{1}$$
$$\phantom{y_i} \approx \sum_{j=1}^{L} R_i(\lambda_j)\, x(\lambda_j) + \varepsilon_i, \tag{2}$$
where Eq. (2) is the discrete approximation of the model. Here, we denote the spectral signal of interest at the wavelength λ by $x(\lambda) \in \mathbb{R}_+$, where λ belongs to some predetermined wavelength range of interest [a, b]. The signal is encoded, i.e., measured, by K distinct spectral encoders with various responsivity at different wavelengths as shown in Fig. 1. We denote $R_i(\lambda)$ as the relative spectral power density of the ith spectral encoder and $\varepsilon_i$ as the error of the ith encoder, where i ∈ {1, …, K}. The error term $\varepsilon_i$ is a random quantity whose dependency on x is not assumed to be known.
As shown in Eq. (2), the sampling wavelengths $\{\lambda_1, \dots, \lambda_L\}$ result in an equally spaced discretization of [a, b] with a spectral resolution $(b - a)/L$. Stacking up the relative spectral power density for each encoder gives us the response matrix $R \coloneqq [R_{ij}]_{i=1,j=1}^{K,L} \in \mathbb{R}^{K \times L}$, where $R_{ij} \coloneqq R_i(\lambda_j)$. As such, the discretized relationship between y and x can be expressed through R in a cleaner form
$$y = Rx + \varepsilon. \tag{3}$$

As such, our goal is to recover x from the observed y and R under the setting where the number of encoders K is much smaller than the number of sampled wavelengths L. In other words, the system in Eq. (3) is highly under-determined with non-unique solutions.

Note that given the response matrix R, the reconstruction problem in Eq. (3) can be approached using (non-negative) least squares, as discussed in the introduction. Due to the ill-posedness of the problem, explicit regularization techniques are used to ensure unique solutions with desirable additional structures. Within optimization-based methods, the least-squares (LS) estimate $\hat{x} \in \arg\min_x \|y - Rx\|_2^2$ fails to consider the signal’s nonnegativity, often leading to suboptimal solutions. This issue can be addressed by employing non-negative least squares (NNLS).1 To deal with an under-determined system, regularization strategies like $\ell_1$-norm, $\ell_2$-norm, or total-variation (TV) norm regularization have also been proven effective.12,13
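As a rough illustration of the NNLS baseline mentioned above, the sketch below approximately solves $\min_{x \ge 0} \tfrac{1}{2}\|y - Rx\|_2^2$ by projected gradient descent. It is a generic solver chosen for brevity and is not the TV-regularized method of Ref. 13; the function name, the step-size rule, the iteration count, and the toy problem sizes are all assumptions made for this example.

```python
import numpy as np


def nnls_projected_gradient(R, y, num_iters=2000):
    """Approximately solve min_{x >= 0} 0.5 * ||y - R x||_2^2 (illustrative only)."""
    # Step size 1 / ||R||_2^2, the inverse Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(R, 2) ** 2
    x = np.zeros(R.shape[1])
    for _ in range(num_iters):
        grad = R.T @ (R @ x - y)              # gradient of the least-squares term
        x = np.maximum(0.0, x - step * grad)  # projection onto the constraint x >= 0
    return x


# Toy under-determined system (hypothetical sizes): K = 16 encoders, L = 64 wavelengths.
rng = np.random.default_rng(0)
R = rng.random((16, 64))
x_true = np.zeros(64)
x_true[[10, 40]] = [1.0, 0.5]
y = R @ x_true
x_hat = nnls_projected_gradient(R, y)
```

Because K is smaller than L, an estimate that fits y well need not coincide with `x_true`, which is the non-uniqueness that motivates the additional regularizers discussed above.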

However, solving these optimization problems can be time- and memory-consuming, and their performance is sensitive to the choice of the regularization hyperparameter, so they are often too expensive to deploy on chip-scale devices.

Recently, deep learning-based approaches have gained popularity due to their modeling flexibility and fast inference speed. However, a major bottleneck for effectively leveraging deep neural network models is gathering a large amount of real spectral data pairs $(y_{\text{real}}, x_{\text{real}})$, a task that is both expensive and time-consuming in reconstructive spectroscopy.

Therefore, based on the problem formulation in Eq. (3), previous approaches have focused on using the response matrix R to produce simulated spectral data pairs $(y_{\text{sim}}, x_{\text{sim}})$, aiming to mimic the distribution of real data obtained in laboratory settings. However, we observe that the simulated data and the real measured data exhibit an undesirable phenomenon known as “domain shift,” leading to the poor performance of models trained solely on simulated data when applied to real data. To provide a straightforward visual assessment of the domain shift in our problem, we ran PCA on both simulated data and real data (preprocessed by log-min-max as mentioned in Sec. IV) and projected them onto their first two principal components. As depicted in Fig. 2, the real data present a much larger spread and variation along both axes compared to the simulated data, which demonstrates the existence of the domain shift.

FIG. 2.

The PCA projection of simulated and real data. The clear separation between clusters along the first two principal components highlights the distribution differences between the simulated data ysim and real data yreal, indicating a significant domain shift.


Later in Sec. III, we delve into this issue and detail our approach to bridging the domain shift. For the reader’s convenience, we first review previous approaches in detail.

1. Training on simulated data

In practice, the simulated spectral signals $x_{\text{sim}}$ can be viewed as the sum of Lorentzian distribution functions.13,15 Given the response matrix R, a simulated spectral signal $x_{\text{sim}}$ and its corresponding encoded signal $y_{\text{sim}}$ can be generated as follows:

  1. Generate M single-peak Lorentzian distribution functions independently with various widths.

  2. Sum those Lorentzian curves and then normalize their heights to the range [0, 1]. We denote the resulting spectrum as $x_{\text{sim}}$.

  3. Multiply $x_{\text{sim}}$ with the response matrix R to produce the encoded signal $y_{\text{sim}} = R x_{\text{sim}}$.

Specifically, in the first step, each (Lorentzian) peak is characterized by three parameters: the location parameter μ, the width constant γ, and the intensity constant I. The parameters correspond to the peak location, spectral width, and intensity magnitude, respectively. Each parameter is sampled i.i.d. uniformly from a range chosen to match the specific characteristics of the spectrometer device. Under our problem setting, the ranges are set to μ ∈ [0, 205], γ ∈ [15, 20], and I ∈ [0.25, 1], respectively.
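The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the Lorentzian is parameterized with γ as a half-width and I as the peak height, the wavelength axis is an integer index grid, and the names `lorentzian` and `simulate_pair` are hypothetical. Only the sampling ranges come from the text.

```python
import numpy as np


def lorentzian(grid, mu, gamma, intensity):
    """Single Lorentzian peak with height `intensity` at `mu` (assumed parameterization)."""
    return intensity * gamma**2 / ((grid - mu) ** 2 + gamma**2)


def simulate_pair(R, num_peaks, rng):
    """Generate one (y_sim, x_sim) pair from the response matrix R."""
    num_wavelengths = R.shape[1]
    grid = np.arange(num_wavelengths, dtype=float)  # index grid (assumption)
    # Step 1: sample M independent peaks from the ranges quoted in the text.
    mu = rng.uniform(0.0, 205.0, size=num_peaks)
    gamma = rng.uniform(15.0, 20.0, size=num_peaks)
    intensity = rng.uniform(0.25, 1.0, size=num_peaks)
    # Step 2: sum the peaks and normalize the heights into [0, 1].
    x_sim = sum(lorentzian(grid, m, g, a) for m, g, a in zip(mu, gamma, intensity))
    x_sim = x_sim / x_sim.max()
    # Step 3: encode the spectrum with the response matrix.
    y_sim = R @ x_sim
    return y_sim, x_sim


rng = np.random.default_rng(0)
R_demo = rng.random((16, 206))  # placeholder for a measured response matrix
y_sim, x_sim = simulate_pair(R_demo, num_peaks=3, rng=rng)
```

In an actual pipeline, `R_demo` would be replaced by the measured response matrix of the device.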

2. Challenges in deploying trained models on real data

Equation (3) is a simplified model of the actual spectroscopy system. As such, the procedure in Sec. II C 1 for generating simulated spectral signals xsim and their corresponding encoded signals ysim may produce distributions that differ from the distribution of real encoded signals yreal. This may be attributed to the circuit design, which, in reality, may introduce diverse types of unknown noise into the system, resulting in this distributional difference between real and simulated data. When applying machine learning algorithms, this difference, or “domain shift,” could cause degraded performance.

We illustrate this in Fig. 3, where we train a model exclusively on simulated data that can successfully reconstruct the spectrum given simulated data as input. However, the model’s performance degrades significantly on real-world data, even when the real-world and simulated data appear visually similar. Detailed empirical evidence is discussed in Sec. IV D.

FIG. 3.

The testing RMSE on the simulated and real data13 for ResCNN15 and our proposed model ReSpecNN. Both models follow the Sim2Real training setting, i.e., they are trained solely on simulated data. Our model further incorporates the hierarchical data augmentation during training, while ResCNN does not.


The challenges of “domain shift” are common for deep learning-based approaches. The “domain shift” or “domain gap” issue was first observed in transfer learning,18 where a model trained on a large source dataset is fine-tuned on a smaller, more specific dataset. To mitigate the performance degradation due to distributional discrepancy between source and target datasets, many techniques have been proposed, including domain adaptation,19–21 meta-learning,22 and few-shot learning.23,24 These techniques have been shown to be effective in reducing the domain gap in various fields, including computer vision25,26 and natural language processing.27–29 

However, when applied to the spectroscopy reconstruction, those proposed methods fall short of solving the domain gap for the following reasons. First, we do not assume access to any data from the target distribution. Existing methods still require a certain amount of data from the targeted domain for fine-tuning, which is not applicable under our assumption. Second, most domain adaptation methods have been designed for classification-based tasks, making them less directly applicable to our spectral reconstruction problem in spectroscopy, which is fundamentally a (non-negative) regression problem. While a few existing results focus on practical regression problems,30,31 they often deal with simple regression problems rather than tackling the more challenging inverse problem setting that we consider. Third, regarding the topic of deep learning for inverse problems, for tasks like image reconstruction, existing research primarily leverages the inductive bias of neural network architectures suitable for image signals.32–34 In this work on spectral reconstruction, we deal with one-dimensional spectral signals, which require a different model design.

While deep learning approaches have become commonly used for spectral reconstruction problems, few studies tackle such domain shift issues directly. A recent work8 successfully reconstructed spectra with up to 14 peaks using a model trained on spectra with up to eight peaks under blind testing conditions; however, the training and test data are still drawn from similar conditions and distributions. Therefore, a specialized method is required to tackle the domain shift issue in solving the spectral reconstruction problem. We now introduce the proposed Sim2Real framework.

In this section, we introduce the Sim2Real framework to bridge the domain shift between the simulated encoded signal data and the data from real-world spectrometers. Our method tackles the domain shift with two key components: (i) hierarchical data augmentation for training data generation, and (ii) a lightweight network architecture designed for the spectrum reconstruction problem with our HDA.

Although we measure the response matrix R in advance and consider it to be fixed and known, our hierarchical data augmentation strategy acknowledges the potential uncertainty in our measurement of R and the encoded signal vector y to improve model robustness and minimize the domain gap. For every pair of simulated training data (ysim,xsim) introduced in Sec. II C, we systematically introduce noise as outlined in Algorithm 1.

ALGORITHM 1.

Hierarchical data augmentation (HDA).

Input:
  $x_{\text{sim}}$: a simulated spectral signal from Sec. II C
  $R$: the response matrix
  $p(R)$: the distribution of noise perturbation on $R$
  $q(y_{\text{aug}}^{(s)})$: the distribution of noise perturbation on $y_{\text{aug}}^{(s)}$
  $S, T$: the number of perturbations
Output:
  $(y_{\text{aug}}^{(1,1)}, x_{\text{sim}}), \dots, (y_{\text{aug}}^{(S,T)}, x_{\text{sim}})$: training data pairs.
for s = 1, …, S do
  Sample $\Delta_s \sim p(R)$
  $R_s = R + \Delta_s$
  $y_{\text{aug}}^{(s)} = R_s x_{\text{sim}}$
  for t = 1, …, T do
    Sample $\varepsilon_t \sim q(y_{\text{aug}}^{(s)})$
    $y_{\text{aug}}^{(s,t)} = y_{\text{aug}}^{(s)} + \varepsilon_t$
  end for
end for

To illustrate this process, we have provided a visualization in Fig. 4 using an example of 3LED samples (number of peaks M = 3) explained in detail below. Instead of simply multiplying by the response matrix R, our hierarchical data augmentation extends each original simulated data sample $(y_{\text{sim}}, x_{\text{sim}})$ to S × T augmented ones $(y_{\text{aug}}^{(s,t)}, x_{\text{sim}})$ by adding different noise to the measured response matrix R (aquamarine green traces in the middle) and to each intermediate augmented encoded signal (illustrated by $y_{\text{aug}}^{(1)}$ only, light purple traces on the right), respectively.

FIG. 4.

The diagram of the simulated data generation procedure and our proposed hierarchical data augmentation (HDA) scheme. The simulated spectral signal $x_{\text{sim}}$ is generated through the sum of Lorentzian distributions. For one given $x_{\text{sim}}$, we generate many corresponding augmented encoded signals $y_{\text{aug}}^{(s,t)}$ by adding noise $\Delta_s$ to R before multiplying with the spectral signal and adding noise $\varepsilon_t$ afterward. The two noise distributions could be chosen flexibly. This HDA process is summarized in detail by Algorithm 1.


To perturb the response matrix R, we simply add Gaussian noise with zero mean and a standard deviation proportional to the entries of R. That is, for each noisy perturbation matrix $\Delta_s$ (s = 1, …, S), its (i, j)th entry is sampled independently from the distribution $\mathcal{N}(0, \sigma_{ij}^2)$ with $\sigma_{ij} = \alpha R_{ij}$, where α is a hyperparameter controlling the intensity of perturbation on R. For perturbing the encoded signal y, we inject non-negative noise $\varepsilon_t$ (t = 1, …, T). Here, each entry of $\varepsilon_t$ is sampled i.i.d. from a Gaussian distribution with standard deviation $\sigma_\varepsilon$ and then passed through the $\mathrm{ReLU}(z) = \max\{0, z\}$ operator to enforce non-negativity. In practice, we determine those hyper-parameters through empirical experiments. Details can be found in Sec. IV A.

In the case above, we considered adding Gaussian noise to perturb the data, which is simple yet effective in practice. However, should specific information about the noise be accessible under certain conditions, we can further refine both distributions $p(R)$ and $q(y_{\text{aug}}^{(s)})$ for the noise sampled on the response matrix and the intermediate augmented encoded signal, respectively. This demonstrates the generalizability and flexibility of our data augmentation method. As a result, for each simulated ground-truth spectrum $x_{\text{sim}}$, S × T corresponding augmented encoded signal vectors $y_{\text{aug}}^{(s,t)}$ are generated. Because structured noise is incorporated level by level into the device-informed simulated data generation, we term this process Hierarchical Data Augmentation (HDA).
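The Gaussian instantiation of Algorithm 1 described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name `hierarchical_augment` is hypothetical, the noise on the encoded signal is assumed to be zero-mean before the ReLU, and the default values of S, T, α, and σ_ε are the ones quoted in Sec. IV.

```python
import numpy as np


def hierarchical_augment(x_sim, R, S=2, T=4, alpha=5e-2, sigma_eps=1e-5, rng=None):
    """Return S * T augmented pairs (y_aug, x_sim) for one simulated spectrum."""
    rng = np.random.default_rng() if rng is None else rng
    pairs = []
    for _ in range(S):
        # Outer level: Delta_s[i, j] ~ N(0, (alpha * R[i, j])^2).
        delta = rng.standard_normal(R.shape) * (alpha * np.abs(R))
        y_s = (R + delta) @ x_sim
        for _ in range(T):
            # Inner level: Gaussian noise (zero mean assumed) passed through a ReLU.
            eps = np.maximum(0.0, rng.normal(0.0, sigma_eps, size=y_s.shape))
            pairs.append((y_s + eps, x_sim))
    return pairs


rng = np.random.default_rng(0)
R_demo = rng.random((16, 205))  # placeholder for a measured response matrix
x_demo = rng.random(205)        # placeholder for a simulated spectrum
pairs = hierarchical_augment(x_demo, R_demo, rng=rng)
```

Swapping in a different `p(R)` or `q` only requires changing the two sampling lines, which is the flexibility the text refers to.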

For training with augmented data generated by HDA, we propose a deep neural network architecture tailored for spectrometer signal reconstruction, hereinafter referred to as ReSpecNN. The architecture is visualized in Fig. 5 and explained in detail below.

FIG. 5.

The architecture of our proposed neural network. ReSpecNN comprises two fully connected modules (dubbed rec_fc and rf_fc) and a three-layer convolutional neural network (dubbed conv). Note that rec_fc and rf_fc have residual connections. For each linear layer in fully connected modules, the number above represents its output dimension, where L denotes the number of input wavelengths. For each 1D convolutional layer, the tuple below specifies the number of filters and the kernel size.


The ReSpecNN model is specifically tailored for our HDA scheme. In contrast to the previous model by Kim et al.,15 we incorporate an additional fully connected block with skip connections to enhance the adaptability of our model in handling HDA-perturbed data. Instead of directly absorbing the information from the measured response matrix R into the calculations, which could be misleading due to inaccuracies in the measurement of R, we aim for the extra fully connected block to learn a more robust inverse of R from our HDA-perturbed data. This, in turn, enhances the overall model’s robustness, leading to improved reconstruction performance. In particular, the previous model by Kim et al.15 initialized the input data with information taken directly from the measured R, which might contain noise or misleading information. In comparison, through HDA, our ReSpecNN learns a more robust inverse mapping of R during training.

Our model consists of two fully connected modules and a 1D convolutional module. The first fully connected module (rec_fc) aims to construct each spectrum at a gross level. To avoid the possible overfitting at this stage, dropout layers are incorporated after each fully connected layer. This fully connected module is followed by a convolutional module with three 1D convolutional layers (Conv1d in PyTorch), each followed by a max-pooling layer and a ReLU activation, serving to extract the potential spatial features from each wavelength value.

Subsequently, another fully connected module (rf_fc), consisting of fully connected layers and dropout layers mirroring the rec_fc module, is employed for the detailed reconstruction of finer spectral information. Furthermore, a residual connection links the initial output from rec_fc with the detailed output from rf_fc to improve the quality of the final prediction without losing the key spectral features from the initial reconstruction. A sigmoid function is applied at the end so that the final output spectrum is bounded within (0, 1), matching the normalized range of the training spectra.
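The module layout described above can be sketched in PyTorch as follows. This is a hedged structural sketch only: the class name `ReSpecNNSketch`, the hidden width, channel counts, kernel sizes, pooling configuration, and dropout rate are assumptions (the actual values are given in Fig. 5 and are not reproduced in the text), and K = 16 encoders with 205 wavelengths are used purely for illustration.

```python
import torch
from torch import nn


def fc_block(in_dim, hidden_dim, out_dim, p_drop):
    """Fully connected layers, each followed by dropout (sizes are assumptions)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(hidden_dim, out_dim), nn.Dropout(p_drop),
    )


def conv_block(in_ch, out_ch):
    """Conv1d -> max-pooling -> ReLU, configured to preserve the signal length."""
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2),
        nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
        nn.ReLU(),
    )


class ReSpecNNSketch(nn.Module):
    def __init__(self, num_encoders=16, num_wavelengths=205, hidden=256, channels=8, p_drop=0.1):
        super().__init__()
        self.rec_fc = fc_block(num_encoders, hidden, num_wavelengths, p_drop)
        self.conv = nn.Sequential(
            conv_block(1, channels), conv_block(channels, channels), conv_block(channels, 1)
        )
        self.rf_fc = fc_block(num_wavelengths, hidden, num_wavelengths, p_drop)

    def forward(self, y):
        coarse = self.rec_fc(y)                            # gross reconstruction, shape (B, L)
        feats = self.conv(coarse.unsqueeze(1)).squeeze(1)  # 1D convolutional features, (B, L)
        fine = self.rf_fc(feats)                           # finer spectral detail, (B, L)
        return torch.sigmoid(coarse + fine)                # residual link, output in (0, 1)


model = ReSpecNNSketch()
model.eval()  # disable dropout for inference
with torch.no_grad():
    spectrum = model(torch.rand(4, 16))  # batch of 4 encoded signals
```

Inference is a single forward pass, which is the property the text contrasts with iterative optimization-based solvers.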

In this section, we illustrate the performance of our Sim2Real approach on real-world data by comparing both the test accuracy and the inference time with a state-of-the-art optimization-based method: NNLS with Total Variation (NNLS-TV) regularization.13 

Our proposed DL-based model, ReSpecNN, in our Sim2Real setting was solely trained on the device-informed simulated data, generated according to Algorithm 1. To prepare the model for subsequent experiments, a total of 20 000 simulated data pairs $(y_{\text{aug}}, x_{\text{sim}})$ were used. To address the potential scaling difference between simulated and real input encoded signal values, the log-min-max normalization is applied to convert the raw inputs $y$ to $\hat{y}$ before they are fed into the neural network,
$$\hat{y} = \frac{\log y - \min(\log y)}{\max(\log y) - \min(\log y)}, \tag{4}$$
where the logarithm is applied entrywise.

In the simulated data generation process, a batch size of 256 is utilized. During the hierarchical data augmentation process, we set S = 2 and T = 4. For the noise control parameters, we chose $\alpha = 5 \times 10^{-2}$ and $\sigma_\varepsilon = 1 \times 10^{-5}$. In practice, both S and T could be determined empirically. For instance, we trained one model for each S = 0, 1, 2, 3 until convergence and obtained corresponding MAEs of 16.1, 1.96, 1.23, and 1.94. Consequently, we selected S = 2. It is worth noting that as S increases, the size of the training data also increases. Therefore, for more efficient training, our model favors smaller values of S. The noise levels $\alpha$ and $\sigma_\varepsilon$ are determined similarly, either by sweeping over candidate values or by employing a binary search within an interval to find an appropriate value. During the training stage, we employed the MSELoss function in PyTorch as our training loss. The optimizer used was Adam with a learning rate of $3 \times 10^{-4}$.

To demonstrate the performance of our Sim2Real framework, we conducted evaluations using a real-world dataset, which was collected with a portable spectrometer device.13 This real-world dataset includes nine single-peak and five multiple-peak spectral signal samples, covering the wavelength range from 400 to 650 nm needed for the spectrometer’s targeted mobile application. The diverse peak profiles and the wide wavelength span make the dataset well-suited for testing the model under realistic conditions.

To evaluate the effectiveness of our proposed Sim2Real framework in comparison with the optimization-based method NNLS-TV,13 we focused mainly on two critical aspects: the accuracy of the spectral peak location and that of the relative peak intensity. We do not attempt to predict the entire spectral shape, which would require a large number of deployed spectral encoders; in fact, the number of spectral encoders is the most important factor determining the spectral resolution, which in turn dictates the criterion of the relative peak position error. Instead, our goal is to efficiently predict the most useful parameters for the mobile application, the spectral peak locations, using the fewest spectral encoders. Therefore, we chose the following relative peak position error as our evaluation criterion, defined as
$$\text{relative peak position error} = \frac{\lambda_{\text{reconstructed}} - \lambda}{\lambda}$$
for each reconstructed spectral signal. Here, $\lambda_{\text{reconstructed}}$ denotes the wavelength position of the peak for the reconstructed spectral signal, while $\lambda$ corresponds to the position of the ground truth.
Specific to multi-peak spectral data, we also measured the relative peak intensity error defined as
$$\text{relative peak intensity error} = \frac{I_{\text{reconstructed}} - I}{I},$$
where $I_{\text{reconstructed}}$ denotes the normalized peak intensity of the reconstructed spectral signal, while $I$ corresponds to the ground truth.
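A minimal sketch of these two metrics for a single-peak spectrum is given below. It assumes signed relative errors of the form (reconstructed − ground truth)/ground truth, takes the peak position as the wavelength of maximum intensity, and uses hypothetical function names and synthetic Gaussian-shaped test spectra; multiply the results by 100 to express them in percent.

```python
import numpy as np


def relative_peak_position_error(wavelengths, x_rec, x_true):
    """Signed relative error of the main peak position (assumed definition)."""
    lam_rec = wavelengths[np.argmax(x_rec)]
    lam_true = wavelengths[np.argmax(x_true)]
    return (lam_rec - lam_true) / lam_true


def relative_peak_intensity_error(i_rec, i_true):
    """Signed relative error of a normalized peak intensity (assumed definition)."""
    return (i_rec - i_true) / i_true


def mean_absolute_error(errors):
    """Average of the absolute values of a collection of relative errors."""
    return float(np.mean(np.abs(errors)))


# Synthetic example on a 1 nm grid from 400 to 650 nm.
wavelengths = np.arange(400.0, 651.0)
x_true = np.exp(-(((wavelengths - 500.0) / 10.0) ** 2))  # peak at 500 nm
x_rec = np.exp(-(((wavelengths - 505.0) / 10.0) ** 2))   # peak at 505 nm
pos_err = relative_peak_position_error(wavelengths, x_rec, x_true)
```

For multi-peak spectra, a peak-finding step would be needed to match each reconstructed peak to its ground-truth counterpart; that step is omitted here.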

Figure 6 presents two spectrum reconstruction examples using our proposed deep neural network and the NNLS-TV13 approach. When compared with the ground truths obtained by a commercial spectrometer, our approach exhibits a reliable performance on our miniature chip-scale spectrometer.

FIG. 6.

Visualization of the reconstructed spectral signals using real-world data (left: a single-peak sample, right: a double-peak sample) produced by NNLS-TV13 and our Sim2Real approach. The corresponding ground-truth spectra are measured by a commercial spectrometer. Our Sim2Real approach achieves a comparable performance with NNLS-TV. For any results with asymmetric peaks, the peak position is defined as the wavelength at which the maximum intensity occurs.


For peak position predictions, Fig. 7 shows that our approach yields results comparable with those of NNLS-TV13 on the aforementioned real-world dataset.13 In Fig. 7, we report the Mean Absolute Error (MAE), defined as the average of the absolute values of the relative peak position error across all the spectral data samples. While offering a significantly faster inference speed (details to be discussed in Sec. IV C), our model still keeps the relative errors within ±5%.

FIG. 7.

The relative peak position errors (vs wavelength in nm) for single-peak real data (top) and multiple-peak real data (bottom). The plot title shows the mean absolute error (abbrev. “MAE”) for each approach.

Predicting the relative peak intensity is a more difficult challenge. As illustrated in Fig. 8, our method keeps the maximum relative intensity error below 50%. In contrast, the NNLS-TV approach occasionally exhibits significant prediction errors on certain spectral data samples.

FIG. 8.

The relative intensity errors (vs wavelength in nm) for the subset of “minor” peaks with relatively lower intensities.

Our miniature chip-scale spectrometers have been integrated into wearable health monitoring devices. In such scenarios, the inference time for reconstruction is often a critical factor for real-time health monitoring. Additionally, our pre-trained Sim2Real method offers the potential to conserve battery life, as it requires only one forward pass for inference, unlike NNLS-TV, which must be solved iteratively by a numerical optimization routine. Overall, fast inference benefits both real-time monitoring and battery life because of the low computational and energy costs involved.

To demonstrate our superior inference time, we compared the execution time of our pre-trained model with the NNLS-TV solver using the real dataset.13 The results in Table I show that our model significantly reduces inference time. When conducting the execution time experiments in Table I, we ensured fairness by utilizing only the CPU for both the NNLS-TV and our DL approach. Moreover, tests were performed on identical hardware setups to guarantee that the comparison solely reflects the computational efficiency of each method.

TABLE I.

The average execution time (per sample, in milliseconds) for the NNLS-TV13 method and our proposed approach on the real dataset.13 The NNLS-TV method13 was originally implemented with the MATLAB solver lsqnonneg in the paper, and to ensure a fair comparison with our model implemented in PyTorch, we also implemented their method using scipy.optimize.nnls in SciPy. Boldface denotes the mean execution time of our model implemented with PyTorch.

                  NNLS+TV (MATLAB)    NNLS+TV (SciPy)    Ours (PyTorch)
Exec. Time (ms)   10.634 ± 0.126      7.965 ± 0.230      1.134 ± 0.330
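A per-sample CPU timing comparison of this kind can be set up along the following lines. This is only a sketch under stated assumptions: the iterative baseline is plain `scipy.optimize.nnls` without the TV term, the "network" is a stand-in made of two dense layers with random weights rather than the trained model, and the matrix sizes other than the 16 encoders are hypothetical. The timings it prints depend on the machine and are not expected to reproduce Table I.

```python
import time

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# 16 encoders as stated in the text; the 64-point wavelength grid is hypothetical.
n_encoders, n_wavelengths = 16, 64
R = rng.random((n_encoders, n_wavelengths))   # stand-in responsivity matrix
y = R @ rng.random(n_wavelengths)             # stand-in detector readout

# Stand-in for a trained network: two dense layers with random weights.
W1 = rng.standard_normal((128, n_encoders))
W2 = rng.standard_normal((n_wavelengths, 128))

def iterative_solve():
    """Iterative baseline: plain NNLS, without the TV term."""
    x, _residual_norm = nnls(R, y)
    return x

def forward_pass():
    """One forward pass: dense layer, ReLU, dense layer."""
    return W2 @ np.maximum(W1 @ y, 0.0)

def mean_time_ms(fn, repeats=200):
    """Average wall-clock time of fn() in milliseconds, after one warm-up call."""
    fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return 1e3 * (time.perf_counter() - start) / repeats

t_iterative = mean_time_ms(iterative_solve)
t_forward = mean_time_ms(forward_pass)
print(f"NNLS:         {t_iterative:.3f} ms per sample")
print(f"forward pass: {t_forward:.3f} ms per sample")
```

Running both callables through the same timing helper on the same CPU keeps the comparison focused on the cost of the method itself rather than on differences in measurement.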

As previously discussed in Sec. II D, a domain shift usually exists between the device-informed simulated data ysim and the real-world data yreal. Many works focus only on improving the reconstruction accuracy within datasets of the same distribution (i.e., training and testing exclusively on either simulated data or specific real-measured datasets) without attempting to bridge the domain gap between them. For instance, ResCNN15 has been demonstrated to achieve excellent spectral reconstruction performance and maintain stable results even under some noisy conditions. It preprocesses the input data from y to x̃ = R†y (where R† is the pseudoinverse of R) for improved results. However, ResCNN trained only on simulated data does not perform optimally when directly applied to real-world data.13 
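The pseudoinverse preprocessing step mentioned above can be sketched in a few lines of NumPy. The matrix sizes and random data are hypothetical stand-ins; only the operation of mapping the readout y to a spectrum-sized vector through the pseudoinverse of R is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 16 encoders, 64 spectral bins.
R = rng.random((16, 64))       # stand-in responsivity matrix
x = rng.random(64)             # stand-in spectrum
y = R @ x                      # detector readout

R_pinv = np.linalg.pinv(R)     # Moore-Penrose pseudoinverse, shape (64, 16)
x_tilde = R_pinv @ y           # minimum-norm least-squares estimate of x

# x_tilde is consistent with the readout, yet generally differs from x
# because the system has fewer measurements than unknowns.
print(np.allclose(R @ x_tilde, y))   # True
print(np.allclose(x_tilde, x))       # False
```

The preprocessed vector has the same length as the spectrum, so a network that takes it as input only has to refine an initial estimate instead of learning the full mapping from y to x.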

Figure 3 illustrates a comparison of the root mean square error (RMSE) under the Sim2Real setting between the spectral signals reconstructed by ResCNN and our ReSpecNN against the ground truths for both the device-informed simulated data and the real data collected from our spectrometer.13 While ResCNN nearly perfectly fits the simulated data (with a lower RMSE), its performance on the real dataset13 tends to drop significantly under the Sim2Real training setting, which confirms the domain shift phenomenon.

In this paper, we introduce a novel Sim2Real framework for spectral signal reconstruction in spectroscopy, focusing on the sampling efficiency and inference time. Throughout the training process, only a single measurement of the response matrix R for the spectrometer device is required, with all other training data being simulated. To address the domain shift between the real-world data and the device-informed simulated data, our Sim2Real framework introduces a hierarchical data augmentation approach to train our deep learning model. Furthermore, our neural network, ReSpecNN, which is trained exclusively on such augmented simulated data, is specifically designed for the reconstruction of real spectral signals. In our experiments, even with the simplest Gaussian noise augmentation, our Sim2Real method achieves a roughly ten-fold speed-up during inference compared to the state-of-the-art optimization-based solver NNLS-TV,13 while demonstrating on-par performance in terms of the solution quality. Using only Gaussian noise for augmentation is a limitation, but the flexibility of our data augmentation suggests that a better perturbation model of the response matrix R, for example one based on an experimentally determined noise distribution, may further improve our Sim2Real framework.
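The idea of generating training pairs from a single measured R with Gaussian perturbations can be sketched as follows. This is one possible reading and not the exact augmentation scheme of this work: the sum-of-Gaussians spectrum model, the entrywise perturbation scaled by |R|, the three noise levels, and all function names are assumptions made for illustration.

```python
import numpy as np

def simulate_spectrum(wavelengths, rng, max_peaks=2):
    """Random sum of Gaussian peaks, normalized to unit maximum (assumed model)."""
    n_peaks = rng.integers(1, max_peaks + 1)
    x = np.zeros_like(wavelengths)
    for _ in range(n_peaks):
        center = rng.uniform(wavelengths[0], wavelengths[-1])
        width = rng.uniform(5.0, 20.0)
        height = rng.uniform(0.3, 1.0)
        x += height * np.exp(-0.5 * ((wavelengths - center) / width) ** 2)
    return x / x.max()

def perturbed_readout(R, x, noise_level, rng):
    """Readout y = (R + dR) x with dR entrywise Gaussian, scaled by |R|."""
    dR = noise_level * np.abs(R) * rng.standard_normal(R.shape)
    return (R + dR) @ x

rng = np.random.default_rng(0)
wavelengths = np.linspace(400.0, 700.0, 64)   # hypothetical wavelength grid
R = rng.random((16, wavelengths.size))        # stand-in for the single measured R
noise_levels = (0.0, 0.01, 0.05)              # hypothetical perturbation levels

# Build (readout, spectrum) training pairs at every noise level.
pairs = []
for level in noise_levels:
    for _ in range(100):
        x = simulate_spectrum(wavelengths, rng)
        pairs.append((perturbed_readout(R, x, level, rng), x))

Y = np.stack([y for y, _ in pairs])   # network inputs, shape (300, 16)
X = np.stack([x for _, x in pairs])   # regression targets, shape (300, 64)
print(Y.shape, X.shape)
```

Because the perturbation is applied to R rather than to y, replacing the Gaussian draw with an experimentally determined noise distribution would only require changing `perturbed_readout`.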

In short, our hierarchical data augmentation strategy significantly improves the model robustness on spectral signal reconstruction in spectroscopy by realistically simulating the noise that can occur within the spectrometer device. However, it does have limitations. For example, in cases of extreme outliers that arise in some of the real spectral data, even after extensive data augmentation (corresponding to the high noise level in our method), those spectral data may still not be represented adequately. This leads to suboptimal performance in spectrum reconstruction for these extreme cases. Nevertheless, this limitation is a common challenge across data augmentation techniques.

Moving forward, to better handle outliers and further improve the robustness and accuracy of our model, we plan to pursue the following approaches. First, it would be interesting to explore improved noise patterns within our data augmentation process, for instance, by incorporating the idea of adversarial training of deep networks.35 Furthermore, while we focus on scenarios without labeled real training data, limited labeled data may be available in practice. Such data can be used to fine-tune our model, but their small size risks overfitting. To mitigate this risk, we can use simulated training data for regularization during fine-tuning, as suggested in a recent study.36 

Regarding the model scalability, the input size in the spectral reconstruction problem depends on the number of spectral encoders used within the spectrometer device, which varies based on specific problem setup and device design. In this paper, the number of spectral encoders is fixed at 16. Looking ahead, we could still explore our model scalability under potentially different problem setups, for instance, the reconstruction with polarized spectral encoded signals.

J.C., P.L., Q.Q., and Y.W. acknowledge support from NSF CAREER CCF Grant No. 2143904, NSF CCF Grant No. 2212066, MIDAS Propelling Original Data Science (PODS) Grant, and a gift grant from KLA. P.-C.K. acknowledges the support from NSF ECCS Grant No. 2317047 for device fabrication and spectral data collection. Y.W. also acknowledges support from the Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, a Schmidt Futures program.

The authors have no conflicts to disclose.

J.C. and P.L. contributed equally to the work.

Jiyi Chen: Conceptualization (equal); Data curation (equal); Investigation (equal); Methodology (equal); Software (equal); Validation (equal); Visualization (equal); Writing – original draft (equal). Pengyu Li: Conceptualization (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Visualization (equal); Writing – original draft (equal). Yutong Wang: Visualization (supporting); Writing – review & editing (supporting). Pei-Cheng Ku: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Funding acquisition (equal); Investigation (equal); Methodology (equal); Project administration (equal); Resources (equal); Supervision (lead); Validation (equal); Visualization (equal); Writing – original draft (equal); Writing – review & editing (supporting). Qing Qu: Supervision (lead); Writing – review & editing (supporting).

The testing spectral data and our experimental results are available at https://github.com/j1goblue/Rec_Spectrometer.

1. C.-C. Chang and H.-N. Lee, “On the estimation of target spectrum for filter-array based spectrometers,” Opt. Express 16, 1056–1061 (2008).
2. U. Kurokawa, B. I. Choi, and C. C. Chang, “Filter-based miniature spectrometers: Spectrum reconstruction using adaptive regularization,” IEEE Sens. J. 11, 1556–1563 (2011).
3. J. Bao and M. G. Bawendi, “A colloidal quantum dot spectrometer,” Nature 523, 67–70 (2015).
4. Z. Wang and Z. Yu, “Spectral analysis based on compressive sensing in nanophotonic structures,” Opt. Express 22, 25608 (2014).
5. Z. Wang, S. Yi, A. Chen, M. Zhou, T. S. Luk, A. James, J. Nogan, W. Ross, G. Joe, A. Shahsafi, K. X. Wang, M. A. Kats, and Z. Yu, “Single-shot on-chip spectral sensors based on photonic crystal slabs,” Nat. Commun. 10, 1020 (2019).
6. C. Kim, W.-B. Lee, S. K. Lee, Y. T. Lee, and H.-N. Lee, “Fabrication of 2D thin-film filter-array for compressive sensing spectroscopy,” Opt. Lasers Eng. 115, 53–58 (2019).
7. T. Sarwar, S. Cheekati, K. Chung, and P.-C. Ku, “On-chip optical spectrometer based on GaN wavelength-selective nanostructural absorbers,” Appl. Phys. Lett. 116, 081103 (2020).
8. C. Brown, A. Goncharov, Z. S. Ballard, M. Fordham, A. Clemens, Y. Qiu, Y. Rivenson, and A. Ozcan, “Neural network-based on-chip spectroscopy using a scalable plasmonic encoder,” ACS Nano 15, 6305–6315 (2021).
9. J. Meng, J. J. Cadusch, and K. B. Crozier, “Plasmonic mid-infrared filter array-detector array chemical classifier based on machine learning,” ACS Photonics 8, 648–657 (2021).
10. S. Zhang, Y. Dong, H. Fu, S.-L. Huang, and L. Zhang, “A spectral reconstruction algorithm of miniature spectrometer based on sparse optimization and dictionary learning,” Sensors 18, 644 (2018).
11. E. J. Candes, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Commun. Pure Appl. Math. 59, 1207–1223 (2006).
12. L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D 60, 259–268 (1992).
13. T. Sarwar, C. Yaras, X. Li, Q. Qu, and P.-C. Ku, “Miniaturizing a chip-scale spectrometer using local strain engineering and total-variation regularized reconstruction,” Nano Lett. 22, 8174–8180 (2022).
14. P. Li, C. Yaras, T. Sarwar, P.-C. Ku, and Q. Qu, “Accelerating deep learning in reconstructive spectroscopy using synthetic data,” in CLEO 2023 (Optica Publishing Group, 2023), p. JTu2A.71.
15. C. Kim, D. Park, and H.-N. Lee, “Compressive sensing spectroscopy using a residual convolutional neural network,” Sensors 20, 594 (2020).
16. J. Zhang, X. Zhu, and J. Bao, “Denoising autoencoder aided spectrum reconstruction for colloidal quantum dot spectrometers,” IEEE Sens. J. 21, 6450–6458 (2021).
17. S. Arora, N. Cohen, W. Hu, and Y. Luo, “Implicit regularization in deep matrix factorization,” in Proceedings of the 33rd International Conference on Neural Information Processing Systems (Curran Associates, Inc., 2019), Vol. 32.
18. S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2010).
19. E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discriminative domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 7167–7176.
20. Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in International Conference on Machine Learning (PMLR, 2015), pp. 1180–1189.
21. W. Zhang, W. Ouyang, W. Li, and D. Xu, “Collaborative and adversarial network for unsupervised domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 3801–3809.
22. C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in International Conference on Machine Learning (PMLR, 2017), pp. 1126–1135.
23. O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra et al., “Matching networks for one shot learning,” in Advances in Neural Information Processing Systems 29 (Curran Associates, Inc., 2016).
24. J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” in Advances in Neural Information Processing Systems 30 (Curran Associates, Inc., 2017).
25. R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2014), pp. 580–587.
26. B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 8697–8710.
27. A. M. Dai and Q. V. Le, “Semi-supervised sequence learning,” in Advances in Neural Information Processing Systems 28 (Curran Associates, Inc., 2015).
28. J. Howard and S. Ruder, “Universal language model fine-tuning for text classification,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), edited by I. Gurevych and Y. Miyao (Association for Computational Linguistics, Melbourne, Australia, 2018), pp. 328–339.
29. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), edited by J. Burstein, C. Doran, and T. Solorio (Association for Computational Linguistics, Minneapolis, MN, 2019), pp. 4171–4186.
30. X. Chen, S. Wang, J. Wang, and M. Long, “Representation subspace distance for domain adaptation regression,” in Proceedings of the 38th International Conference on Machine Learning (ICML, 2021), pp. 1749–1759.
31. I. Nejjar, Q. Wang, and O. Fink, “DARE-GRAM: Unsupervised domain adaptation regression by aligning inverse gram matrices,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2023), pp. 11744–11754.
32. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell. 38, 295–307 (2016).
33. K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “ReconNet: Non-iterative reconstruction of images from compressively sensed measurements,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 449–458.
34. G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, “Deep learning techniques for inverse problems in imaging,” IEEE J. Sel. Areas Inf. Theory 1, 39–56 (2020).
35. I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in International Conference on Learning Representations ICLR, 2015.
36. N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, “DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2023), pp. 22500–22510.