We demonstrate an optical machine learning method in the terahertz domain, which allows the recognition of objects with a single measurement. As many materials are transparent in the terahertz spectral region, objects hidden within such materials can be identified. In contrast to typical object recognition methods, our method only requires a single pixel detector instead of a focal plane array. The core of the calculation is performed by a quantum cascade laser generated terahertz beam, which is spatially modulated by a near-infrared encoded silicon wafer. We show that this method is robust against displacements of the objects and against noise. Additionally, the method is flexible and, due to the optically performed recognition task, inherently fast.

The terahertz (THz) region is the part of the electromagnetic spectrum with frequencies between radio and optical frequencies (0.1 THz–10 THz). Countless materials, including most dielectrics, are transparent to THz radiation,^{1} which makes it the ideal spectral region for hidden object imaging.^{2,3} However, due to the technical difficulty of generating, modulating, and detecting THz radiation, this potential has not yet been fully exploited.^{4}

The transparency of many materials in the THz region has triggered applications in medicine, industry, and safety. Typically, these are imaging applications with the purpose of finding and recognizing structures or objects that are hidden inside or behind a THz transparent material, such as potentially dangerous, concealed objects^{5} or the water distribution in biological substances.^{6} For the detection of such objects, typically an image is recorded, digitized, and processed on a digital processor. Recently, the performance of object recognition methods substantially increased due to the development of modern machine learning algorithms and specialized hardware. While this approach works well in the visible spectral range, due to the availability of high resolution, low noise charge-coupled device (CCD) sensors, the THz region suffers from a lack of efficient detection methods. Available two-dimensional focal plane arrays, e.g., micro-bolometer cameras, provide only low sensitivity in the THz region, and recorded images are thus degraded by noise, which leads to long acquisition times and causes image recognition methods to fail. This limitation can be lifted by using a compressed sensing technique,^{2,7} which allows the reconstruction of an image from an array of intensity values recorded with a single pixel detector. Therefore, a detector with a significantly higher sensitivity than the available camera systems, in combination with a THz spatial light modulator (SLM), replaces the camera.^{7,8} However, single pixel imaging requires one measurement per pixel to obtain a low-noise image that can subsequently be processed on a computer, which slows down the process drastically.

Here, we present an optical object recognition method in the THz domain, which simultaneously overcomes both limitations of the available methods: the low sensitivity and the slow acquisition speed. While the highly sensitive single pixel THz detector resolves the signal to noise ratio limitations, the machine learning algorithm allows the recognition of objects with only a single measurement. Since the most computationally complex part of the image recognition process, the matrix multiplication of the weight matrix and the input vector, which represents the object, is performed optically and in parallel, the recognition is inherently fast. Acquisition and detection rates are thus only limited by the setup. Examples of such limitations include the acquisition speed of the detector, the speed of the light modulation, and the speed of the stage, which is used to move the objects. Our method is flexible and can be extended without major modifications. The number of different detectable objects can, for example, be increased by taking one measurement per object class with a distinctive weight matrix per output node. With the demonstrated optical recognition method, we can identify metal objects, which are placed in the THz beam path, with a 10 Hz repetition rate and near perfect accuracy.

The central part of our method is the spatial intensity modulation of a parallel THz beam. This modulation is crucial as it performs the matrix multiplication between the input vector, which is the object in question, and the weight matrix, which contains the trainable parameters. As there are no SLMs available in the THz region, we employ a modulation scheme based on NIR induced electron–hole pair generation in a high-resistivity silicon wafer to modulate the THz light.^{7,9} A wafer-bonded high-power THz quantum cascade laser (QCL)^{10,11} with an emission frequency of 3.8 THz acts as a light source for the recognition system. The QCL, which is cryogenically cooled to 10 K, is electrically driven by short pulses with a pulse length of 880 ns and repetition rates ranging from 200 Hz to 40 kHz, depending on the desired output power. At full power, the single-facet peak emitted power of the QCL is 470 mW, and the corresponding average output power is 16 mW. A parabolic mirror transforms the diverging THz beam into a parallel beam in which the objects, which are to be recognized, are positioned [see Fig. 1(a)]. The objects block parts of the terahertz beam and therefore change its spatial intensity profile. The objects in question may be hidden behind various THz transparent materials such as paper or polyethylene (PE) foil. A high-resistivity silicon wafer is used to spatially modulate the THz light. A 1.3 W near infrared (NIR) beam at a wavelength of 800 nm is expanded to a beam diameter of 1 cm, propagated through a *λ*/2 waveplate, and projected onto a Hamamatsu X10468-02 LCOS spatial light modulator (SLM). The SLM imprints a spatially varying polarization onto the NIR beam, which is converted into an intensity modulation by a linear polarizer. 
The spatially modulated NIR light is directed onto the silicon and absorbed, which changes the material from resistive to conductive by the generation of electron–hole pairs.^{7,9} For the maximum NIR intensity of 800 mW cm^{−2}, an estimated carrier density of 2 · 10^{17} cm^{−3} is achieved. The free carriers increase the THz absorption of the silicon wafer and therefore modulate the THz transmission. Note that the pixel size (≈0.6 × 0.6 mm^{2}) of our modulation patterns is chosen to be on the order of the carrier diffusion length of the silicon wafer. As a result, the dimensions of the objects to be recognized have to be significantly larger than a single pixel in order to ensure an effective modulation. Smaller pixels can be achieved by employing materials with shorter carrier diffusion lengths. The modulation depth of the THz transmission is proportional to the NIR intensity, as depicted in Fig. 1(b). The maximum measured modulation, with an estimated NIR power density of 800 mW cm^{−2}, is found to be just above 30%. The spatially modulated THz light is focused onto a pyroelectric THz single pixel detector with a detectivity of 8.3 · 10^{8} $\mathrm{cm}\sqrt{\mathrm{Hz}}/\mathrm{W}$ and a noise equivalent power of 333 $\mathrm{pW}/\sqrt{\mathrm{Hz}}$. The relative responsivity of the detector has its 3 dB roll off just below 20 Hz. Even though the SLM allows modulation rates up to 60 Hz, our system is limited by the speed of the detector. We achieve detection rates of up to 10 Hz. The dynamic range of our setup is given by the maximum signal without an object in the THz path (720 mV) and the noise level (<1 mV). As any single pixel THz detector suitable for the application in question can be used, this speed limit can be lifted, if required, by using, e.g., fast photoconductive detectors. If more sensitivity is required, a cryogenically cooled silicon bolometer may be used instead. The signal voltage from the detector is fed into a lock-in amplifier with integration times of 10 ms–100 ms. 
A bias value is added by means of software to the result of the matrix multiplication [see Eq. (1)], and the activation of the neural network is calculated.

The employed machine learning algorithm can most intuitively be understood as a shallow artificial neural network, which is identical to a logistic regression,^{12} and consists of an input layer with *M* = 256 neurons and an output layer of a single neuron [see Fig. 2(a)]. Each input neuron corresponds to a pixel of the discretized object and is activated by the THz intensity at that position. The weights *W* between the input and output neurons correspond to the pixel brightness of the spatially modulated NIR beam, which is directed onto the Si wafer. The output node has a sigmoid activation function $\sigma(x) = \frac{1}{1 + e^{-x}}$, which is useful for classification problems, as it is highly nonlinear and limits the output to values between 0 and 1. A trainable bias value *b* is added to the output by means of software. The trainable parameters of the algorithm, except for the bias, are kept in the weight matrix *W*, which contains *M* weights, one for each pixel. In contrast to software implementations of machine learning algorithms, the input vector, which is required for training, is initially measured by means of single pixel imaging. This vector contains the geometric information of the object as well as the spatial intensity distribution of the THz beam. To this end, a 256 × 256 Hadamard matrix is generated, and its column vectors are selected and reshaped into 16 × 16 matrices. Each of the resulting matrices *M*_{i} is projected onto the Si wafer to measure a corresponding THz intensity value *z*_{i}. By using a least-squares numerical solver, the equation system *z* = *H* · *x* is solved, where *x* is the desired image and *H* is the Hadamard matrix. The resulting image *x*, which consists of the values *x*_{i}, is stored. Once trained, the predictions $\hat{y}$ of the algorithm for an input *x* are given by the forward propagation,
$$\hat{y} = \sigma\left(W^{T}x + b\right). \qquad (1)$$
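This acquisition step can be illustrated with a short numpy sketch, in which a synthetic scene stands in for the unknown THz intensity distribution; the scene, the noise level, and all variable names are illustrative assumptions, not values from the experiment:

```python
import numpy as np

# 256 x 256 Hadamard matrix via the Sylvester construction;
# each column, reshaped to 16 x 16, is one projected NIR pattern.
H = np.array([[1.0]])
for _ in range(8):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))

rng = np.random.default_rng(0)
# Synthetic stand-in for the THz intensity behind the object
# (unknown in the experiment, probed only through the patterns).
x_true = rng.uniform(0.0, 1.0, 256)

# One single-pixel detector reading per pattern, with a little noise.
z = H @ x_true + rng.normal(0.0, 1e-3, 256)

# Least-squares solution of z = H . x recovers the 16 x 16 image.
x_rec, *_ = np.linalg.lstsq(H, z, rcond=None)
image = x_rec.reshape(16, 16)
```

Note that a physical NIR pattern cannot contain the negative entries of a Hadamard matrix; in practice, such patterns would have to be realized with shifted or differential measurements, a detail omitted in this sketch.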
The most computationally complex part of the object recognition is the matrix multiplication $W^{T}x = \sum_{i=1}^{M} W_i x_i$, which consists of *M* multiplications and the summation of the resulting terms. In our experiment, this operation is performed optically by spatially modulating the transmitted THz intensity and detecting it with a single pixel detector. In fact, it is performed optically not only during the prediction phase but also during the training phase. Another advantage is that the optical implementation of the matrix multiplication is fully parallel, making it inherently fast. The main limitation of this method is the ability to physically modulate the THz transmission. As a result, the possible values of the weight matrix *W* are limited as well. This can be counteracted by introducing a software scaling after the measurement but proves not to be necessary in our experiment. A bias *b* that is added to the matrix multiplication and the sigmoid activation *σ* of the total sum are calculated by software. The output of the activation function is at the same time the prediction $\hat{y}$ of the artificial neural network.

The measurements are referenced to measurements of the same object with all values of the weight matrix set to 1. This makes the result independent of the transmitted intensity, as only the ratio of the intensities is used. As a result, the area of the objects, which are placed in the THz beam, has no influence on the prediction result. This referencing is also useful for compensating slow drifts during the experiment, as the reference value is measured periodically.
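A minimal sketch of the referenced forward pass, assuming illustrative values for the input, the weights, and the bias, shows how a uniform scaling of the transmitted intensity cancels out:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, W, b):
    # W @ x is the part performed optically: spatial modulation of the
    # THz beam followed by the single pixel detector. The reference is
    # the same measurement with all weights set to 1, i.e., sum(x).
    reference = np.sum(x)
    return sigmoid(W @ x / reference + b)

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 256)   # THz intensity with the object in place
W = rng.uniform(0.0, 0.3, 256)   # non-negative weights (NIR intensities)
b = -0.15                        # software bias

y_hat = predict(x, W, b)
# A uniform scaling of the transmitted intensity leaves the result unchanged.
same = np.isclose(predict(0.5 * x, W, b), y_hat)
```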

For training, the weight matrix *W* is initialized with Gaussian distributed random values [see Fig. 3(d)]. The training set consists of the two objects, which are positioned in the THz beam in a random sequence. Note that, apart from minor random positional imprecisions caused by the stage and the mounting of the objects as well as detection noise, no transformations are applied to the objects. For different recognition tasks, it may be useful to add transformations such as random shifts and rotations in order to increase the generalization of the machine learning algorithm. The predictions are calculated and a cross-entropy loss-function
$$L = -\frac{1}{N}\sum_{j=1}^{N}\left[y_j \ln \hat{y}_j + \left(1 - y_j\right)\ln\left(1 - \hat{y}_j\right)\right] \qquad (2)$$
is calculated, where *y* is the ground-truth (the actually present object) and $\hat{y}$ is the prediction of the artificial neural network. Note that the predictions $\hat{y}$ are continuous values between 0 and 1, whereas the ground-truth takes discrete values (either 0 or 1). Therefore, the loss-function yields continuous, positive values. The perfect loss would be 0, where the predicted values would be only zeros and ones, identical to the ground-truth. Note that the loss-function does not necessarily yield information about the prediction accuracy, as the real system returns continuous values between 0 and 1, which can also lie close to 0.5. This can result in a high prediction accuracy while simultaneously yielding a loss value that is far from zero. The loss-function is minimized by stochastic gradient descent,^{13} a gradient based optimization method. The gradient ∇_{i} of the *i*-th weight value is estimated by back-propagation,^{13} which here, as our chosen model architecture is identical to a logistic regression, simplifies to
$$\nabla_i = \left(\hat{y} - y\right) x_i, \qquad (3)$$
where *x*_{i} are the components of the training samples previously recorded by means of compressed imaging. To stabilize and accelerate the training, a Nesterov momentum^{14} based approach is employed. To this end, the intermediate value
$$v_{t,i} = \beta\, v_{t-1,i} + \left(1 - \beta\right) \nabla_i \qquad (4)$$
is calculated, where *β* is a parameter that describes the magnitude of the first order momentum. It determines how strongly the effective gradient changes between iterations: if *β* is small, the gradient changes quickly, whereas if it is large, it reacts inertly. If *β* = 1, the gradient does not change at all and no useful training takes place. Here, *β* is chosen to be 0.5. A learning rate *α* is used to control how much the weight values *W*_{t,i} change during each iteration *t*,
$$W_{t+1,i} = W_{t,i} - \alpha\, v_{t,i}. \qquad (5)$$
Note that even though higher learning rates accelerate the learning process, learning rates that are too high lead to oscillations of the optimizer, resulting in bad convergence.^{15} For this experiment, the learning rate is empirically chosen to be 50.

The neural network is trained to differentiate two metal objects in the shape of the letter “T” and of an arrow pointing to the north-east “↗.” The first step during the forward propagation is the recording of the reference value, which is done by applying a homogeneous NIR pattern (all weights are set to one). The resulting detection value is recorded and stored as the reference. Next, the NIR beam is patterned by the SLM with the trained weight matrix values and projected onto the Si wafer. This spatially modulates the THz transmission accordingly and changes its total transmitted power. The THz beam is focused onto the detector, which integrates its intensity and returns a single signal value. Data acquisition consists of reading the detector value after waiting for a single integration time. As all other steps during the forward propagation are much faster, the sampling time is dominated by the integration time. The activation of the output of the artificial neural network is calculated, as seen in Eq. (1), where the matrix multiplication, which is performed optically, is divided by the previously recorded reference value. The addition of a bias and the calculation of the sigmoid activation function are performed by software. For each training step, consisting of ten measurements and the corresponding ground-truth, a loss value is calculated [see Eq. (2)]. Subsequently, the gradient for the optimization is estimated, and the weights are updated as seen in Eq. (5). This results in an overall reduction of the training loss as depicted in Fig. 2(b). Note that even though there are 257 trainable parameters in the model, which is more than the 180 presented samples, it is trained successfully. The back-propagation leads to an overall improvement of the predictive capabilities. At a certain point, depending on the requirements and the complexity of the problem, the performance of the model is sufficient. 
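The training procedure can be condensed into a short simulation. In the sketch below, two synthetic binary masks stand in for the recorded single-pixel images of the objects, the bias is kept fixed, and the noise level is an assumption; the momentum *β* = 0.5, learning rate *α* = 50, ten measurements per step, and 18 training steps follow the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
M = 256

# Synthetic stand-ins for the recorded single-pixel images of the
# two objects ("T" -> class 0, arrow -> class 1).
obj_T = (rng.uniform(size=M) > 0.5).astype(float)
obj_arrow = (rng.uniform(size=M) > 0.5).astype(float)

W = rng.normal(0.0, 0.1, M)   # Gaussian-initialized weight matrix
b = 0.0                       # bias kept fixed in this sketch
v = np.zeros(M)               # momentum buffer
beta, alpha = 0.5, 50.0       # momentum and learning rate from the text

for step in range(18):                       # 18 training steps
    for _ in range(10):                      # 10 measurements per step
        y = int(rng.integers(0, 2))          # random object sequence
        x = (obj_T if y == 0 else obj_arrow) + rng.normal(0.0, 0.01, M)
        y_hat = sigmoid(W @ x / x.sum() + b)   # referenced forward pass
        grad = (y_hat - y) * x / x.sum()       # logistic-regression gradient
        v = beta * v + (1.0 - beta) * grad     # momentum smoothing
        W -= alpha * v                         # weight update

# Evaluate the trained weights on the noise-free masks.
p_T = sigmoid(W @ obj_T / obj_T.sum() + b)
p_arrow = sigmoid(W @ obj_arrow / obj_arrow.sum() + b)
```

With these settings, the simulated predictions for the two masks separate across the 0.5 decision threshold, mirroring the behavior described above.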
Equation (1) reveals that the standard deviation of the output is dominated by the matrix multiplication *W*^{T}*x*. The weights are physically limited by the maximum possible modulation depth and cannot be negative, as there are no negative NIR intensities. As a result, the standard deviation of the matrix multiplication is limited, and hence, the activation is limited as well. This explains why the loss converges nicely, without reaching near zero values. For 100 test samples, the mean output value is 0.503, and the standard deviation is only 0.005. Even though the predictions of the model are correct, the output values are close to the decision threshold. Performing a post-measurement scaling on the measured matrix multiplication values can compensate for the low standard deviation. However, as this does not affect the accuracy, it is not necessary to perform such a scaling. An evaluation after 18 training steps, which corresponds to only 180 samples, yields a perfect score without a single misclassification. This can be seen in Fig. 3(a), which shows a close-up sketch of the objects and the corresponding results, plotted in a confusion matrix.^{16} The confusion matrix compares the ground-truth, which are the objects that are in the beam, and the predictions of the model. All classifications lying in the diagonal, extending from the bottom left to the top right, are correct, whereas all other classifications represent misclassifications. As demonstrated, the system is highly capable of differentiating the trained objects.
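The evaluation can be sketched as follows; the output values below are hypothetical, chosen close to the reported mean of 0.503 to illustrate that predictions near the decision threshold can still be perfectly accurate:

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    # 2 x 2 matrix: rows index the ground-truth class,
    # columns index the predicted class.
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical network outputs: correct, but close to 0.5.
y_hat = np.array([0.496, 0.507, 0.501, 0.498, 0.509])
y_true = np.array([0, 1, 1, 0, 1])       # 0 = "T", 1 = arrow

y_pred = (y_hat > 0.5).astype(int)       # threshold the continuous output
cm = confusion_matrix(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()       # diagonal entries are correct
```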

A major benefit of this setup is demonstrated by hiding the objects behind materials that are not transparent in the visible region. To this end, the objects are hidden behind a polyethylene (PE) foil and a sheet of white paper. As shown in Figs. 3(b) and 3(c), the addition of these materials to the THz path does not significantly impact the performance of the object recognition system. The system was trained without the materials in the path but is nonetheless well capable of detecting the hidden objects, even though the THz light is attenuated significantly in the case of the object being hidden behind paper. This illustrates that the system can be trained without any prior knowledge of the materials within which the objects are hidden, provided they are sufficiently transparent to THz radiation. Looking at the weight matrices of different training steps, reshaped into 16 × 16 pixel images, reveals the spatial structure of the learned weights, which corresponds to the strength of the spatial THz transmission modulation, as shown in Fig. 3(d). Note that for higher training steps, parts of the vertical and horizontal bars of the T shape and of the vertical and diagonal bars of the arrow shape can be identified.

Our measurements show that the neural network is resilient to spatial displacements of the objects. For systematic horizontal shifts of 2 mm and 4 mm [see Fig. 4(a)], with 15 samples measured for each shift, not a single misclassification is recorded.

To ensure that the network is performing as expected and indeed relies on the objects and the trained weights, the objects are removed, which results in a constant classification of the arrow class [see Fig. 4(b)]. This is attributed to the fact that the modulation in this case always yields the same intensity value, as the THz beam is not spatially blocked by an object. Setting all weights to zero except for the software bias, with the objects present in the THz beam, leads to a constant classification of the T class [see Fig. 4(c)]. It can hence be concluded that the system indeed learns to recognize the objects and that the weights, which are optically projected onto the silicon wafer, play a crucial role.

A great advantage of modern machine-learning algorithms is their noise resilience.^{17} This is also observed in our system by systematically reducing the emission power of the QCL and measuring the prediction accuracy for 20 samples for each power value. The results of this measurement series are illustrated in Fig. 4(d). The accuracy of the method is constant for a large interval of QCL emitted THz powers and only rapidly degrades when the noise floor of the system is reached.

We demonstrate an optical machine learning algorithm in the THz domain based on the spatial transmission modulation of a THz beam. The modulation is performed by modifying the optical response of a high resistivity silicon wafer by illuminating it with a spatially modulated near-infrared beam with a peak intensity of 800 mW cm^{−2}. We demonstrate a 10 Hz object recognition rate, limited by the employed pyroelectric detector, with near 100% accuracy for objects hidden behind various materials and objects in plain view. The recognition speed can be improved by employing faster detectors such as large array field-effect transistors^{18} and by replacing the SLM with a faster digital micromirror device.

The physical pixel size of our modulation scheme is limited by the carrier diffusion length. This can be improved by employing materials with shorter carrier diffusion lengths or by separating the pixels on the modulation material. As a result, smaller objects may be detected and higher resolutions can be reached. It is also possible to create a system with the ability of spectrally resolved object recognition. To this end, e.g., a color-switching QCL^{19} may be employed as a controllable light source, and the different spectral components would be measured sequentially. Each component would correspond to its own weight matrix. The measurements would then be combined on the computer and yield a prediction that depends on the spatial distribution of the spectral transmission of the objects in question.

The algorithm can easily be expanded such that any desired number of objects can be recognized and distinguished. This requires multiple output nodes, which can be realized by introducing multiple weight matrices, one for each output node. The outputs would then be calculated sequentially, increasing the measurement time only linearly. There are also different ways of expanding the neural network to a deep neural network. The most flexible approach is to use our method as the hardware implementation of the input layer of a deep neural network, where the remaining layers are implemented in software. This significantly reduces the number of required measurements compared to a compressed imaging approach and eliminates the need for a THz camera, while at the same time keeping the flexibility of modern deep neural network topologies. The implementation of a convolutional layer by using this method may also increase the performance of the algorithm when handling problems with large displacements of the objects.
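A minimal sketch of this multi-class extension, with an assumed number of classes and random stand-in weights, could look as follows:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_multiclass(x, weight_matrices, biases):
    # One optical measurement per output node: each node uses its own
    # weight matrix (NIR pattern), so measurement time grows only
    # linearly with the number of classes.
    reference = np.sum(x)  # all-ones reference measurement
    return np.array([sigmoid(W @ x / reference + b)
                     for W, b in zip(weight_matrices, biases)])

rng = np.random.default_rng(3)
K, M = 4, 256                                  # 4 classes, 256 pixels
weights = [rng.uniform(0.0, 0.3, M) for _ in range(K)]
biases = np.zeros(K)

x = rng.uniform(0.0, 1.0, M)                   # object's THz pattern
scores = predict_multiclass(x, weights, biases)
predicted_class = int(np.argmax(scores))       # winning output node
```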

The authors would like to thank Lukas Mennel (TU Wien) and David Müller (TU Wien) for insightful and stimulating discussions and acknowledge financial support from the Austrian Science Fund FWF (Grant Nos. DK CoQuS W1210, DK Solids4Fun W1243, and DiPQCL P30709-N27) and the Air Force Office of Scientific Research AFOSR (Grant No. FA9550-17-1-0340).

## DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author upon reasonable request.