Photonic accelerators for Artificial Intelligence (AI) are rapidly advancing, promising revolutionary computational speed for modern AI architectures. By leveraging photons with a bandwidth higher than 100 THz, photonic accelerators tackle the computational demands of AI tasks that GHz electronics alone cannot meet. Photonic accelerators integrate circuitry for matrix–vector operators and ultra-fast feature extractors, enabling energy-efficient and parallel computations that prove crucial for the training and inference of AI models in various applications, including classification, segmentation, and feature extraction. This Perspective discusses the challenges and opportunities that optical computation opens in AI for research and industry.
WHY OPTICAL ACCELERATION?
Modern Artificial Intelligence (AI) architectures are confronting a computational power challenge that underpins present and future applications of this technology, especially the recently developed foundation models.1–4 The latest report from OpenAI5 divides the development of AI into two distinct phases, according to the training computational power required. Before 2012, AI computing power doubled approximately every two years, following the empiric paradigm of Moore’s law6 that satisfied the required demand (Fig. 1, white area). The subsequent AI stage witnessed a sharp acceleration, with computing power requirements doubling every three months (Fig. 1, darker area).
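The gap between the two doubling rates compounds quickly. The short sketch below works through the arithmetic, using only the two doubling times quoted above; the function and variable names are illustrative and do not come from the cited report.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Multiplicative growth after `elapsed_months` for a given doubling time."""
    return 2.0 ** (elapsed_months / doubling_months)


# Doubling times quoted in the text: about two years before 2012,
# about three months afterwards.
MOORE_DOUBLING_MONTHS = 24.0
MODERN_AI_DOUBLING_MONTHS = 3.0

elapsed = 24.0  # compare both trends over a single two-year window
hardware_growth = growth_factor(elapsed, MOORE_DOUBLING_MONTHS)    # 2x
demand_growth = growth_factor(elapsed, MODERN_AI_DOUBLING_MONTHS)  # 256x
gap = demand_growth / hardware_growth                              # 128x
```

Under these two rates, demand outgrows a Moore's-law supply by a factor of 128 within two years, and the ratio doubles again roughly every 3.4 months thereafter.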
The escalating demand for computational resources has profound implications for the training of foundation models, rapidly becoming the new cornerstone of AI capabilities.5,7,8 Foundation models refer to pre-trained neural network architectures designed to capture generic and broad patterns in data.9,10 These neural architectures are now fundamental building blocks for various tasks and applications, providing the starting point for further specialization. While the size of foundation models grows to accommodate the complexities of modern tasks in speech recognition, image classification, and natural language processing, the training process becomes increasingly resource-intensive.5,11,12 In this process, the training iterations require adjusting hundreds of millions of parameters to minimize errors, demanding a substantial amount of computation on the order of thousands of petaflop/s-days5,11 (Fig. 1).
Traditional electronic central processing units (CPUs) struggle to accommodate such a rapid growth of computational demands, creating a bottleneck for future AI expansion.13–15 The first issue is that traditional CPUs follow Moore’s law, which lags behind the second phase of modern AI.3,5,16 The second issue is that the semiconductor industry is approaching physical limits as it transitions between nanometer nodes, struggling to maintain the basic scaling of Moore’s law and approaching a plateau.17 In this context, the training of foundation models faces the challenge of accommodating increasingly sophisticated architectures while finding available resources to meet the required computational demand. Addressing this problem is becoming paramount to avoid a future second winter in the AI industry.5
Optical accelerators, such as analog optical processors, are emerging as promising solutions to address this problem.13,15,16,18–23 Photonic accelerators are the front end for digital systems, providing an alternative to traditional digital software processing units.22,24,25 Optical computation processes the information carried by light without the need to convert it into slower electronic signals.26 Photonic processing offers numerous advantages, including speed with rates of hundreds of terabits per second (Tb/s),13 enhanced power efficiency due to minimized heat generation from optical signals, and long-distance transmission with minimal energy loss.15,18 The inherent parallelism of light, which can concurrently traverse multiple pathways, empowers optical computations to perform multiple operations in parallel at low power requirements, on the order of a few watts per teraflop.13,27
The complex-valued phase profile of optical waves enables complex-valued optical neural networks (CV-ONNs), another notable development in photonic computing.28 These networks extend the capabilities of conventional neural networks by incorporating complex numbers, encompassing both real and imaginary components, into their computations. This enables CV-ONNs to model and process a broader range of data patterns, including phase information and interference effects, which are often crucial in various scientific and engineering applications.
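The role of phase can be illustrated with the simplest interference calculation: when coherent fields overlap on a detector, the recorded intensity depends on their relative phases and not only on their amplitudes. The snippet below is a minimal numerical sketch of this effect; the function name and the two-beam example are illustrative assumptions, not part of any cited CV-ONN implementation.

```python
import cmath
import math


def detected_intensity(amplitudes, phases):
    """Intensity |sum_k a_k * exp(i * phi_k)|^2 of overlapping coherent fields."""
    total_field = sum(a * cmath.exp(1j * phi) for a, phi in zip(amplitudes, phases))
    return abs(total_field) ** 2


# Two unit-amplitude beams: identical amplitudes, different relative phase.
in_phase = detected_intensity([1.0, 1.0], [0.0, 0.0])          # constructive
out_of_phase = detected_intensity([1.0, 1.0], [0.0, math.pi])  # destructive
```

The same pair of amplitudes yields four times the single-beam intensity in phase and essentially zero out of phase, which is the extra degree of freedom that a real-valued network cannot represent directly.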
Within the domain of deep neural networks (DNNs), optical accelerators exhibit high performance in executing fundamental tasks, such as matrix multiplications, spatial 2D convolutions, including first- and second-order derivatives for edge detection, and element-wise nonlinear activation functions.14,15,19,24,25,27,29,30 Photonic acceleration enhances computing speed by 3 to 6 orders of magnitude over electronic devices.14 The speed enhancement allows designing high-performance linear vector–vector (V–V) and matrix–vector (M–V) multiplications, the backbone of convolution operations in DNNs. These operations tend to be the most computationally intensive components in common DNN architectures, accounting for over 90% of their floating-point operations (FLOPs).13,14,19
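To make the link between convolutions and M–V products concrete, the sketch below applies a discrete Laplacian (a second-order derivative kernel used for edge detection) in two equivalent ways: as a sliding window and as a single matrix–vector product on the flattened image. The kernel choice, the helper names, and the 5 × 5 test image are illustrative assumptions; the code is plain Python and does not model any optical hardware.

```python
# Discrete Laplacian kernel. It is symmetric, so cross-correlation
# and convolution give the same result.
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def conv2d_valid(image, kernel):
    """Sliding-window form, keeping only fully overlapping positions."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(kernel[a][b] * image[r + a][c + b]
                 for a in range(k) for b in range(k))
             for c in range(cols)]
            for r in range(rows)]


def conv2d_as_matvec(image, kernel):
    """Same operation written as one matrix-vector product y = W x."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    rows, cols = height - k + 1, width - k + 1
    x = [pixel for row in image for pixel in row]  # flattened image
    W = [[0] * (height * width) for _ in range(rows * cols)]
    for r in range(rows):
        for c in range(cols):
            for a in range(k):
                for b in range(k):
                    # Output pixel (r, c) reads input pixel (r + a, c + b).
                    W[r * cols + c][(r + a) * width + (c + b)] = kernel[a][b]
    y = [sum(w_ij * x_j for w_ij, x_j in zip(w_row, x)) for w_row in W]
    return [y[r * cols:(r + 1) * cols] for r in range(rows)]


# 5x5 image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edges_window = conv2d_valid(image, LAPLACIAN)
edges_matvec = conv2d_as_matvec(image, LAPLACIAN)
```

Both routes return the same 3 × 3 map, with a nonzero response on either side of the step and zero in the flat region, which is why a fast M–V multiplier also accelerates convolutional layers.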
Optical processor configurations exist both in free-space15,16,19,20,23 and integrated on-chip,21,22 yielding already mature implementations.13,18,26 Enhanced processing capabilities enable accelerated training and inference for neural networks, empowering researchers to train AI models faster with larger networks while tackling problems previously beyond the reach of conventional electronics, such as hyperspectral video understanding.31,32
OPTICAL MATRIX–VECTOR PRODUCT
At present, free-space optical architectures are among the best-performing approaches for optical M–V multiplication.14,19,23 These systems exploit dynamical spatial multiplexing, which can easily integrate within photodetector systems.15,16,19,20,23 The main computational framework for V–V and M–V products in free space unfolds in two fundamental steps. The first step is element-wise multiplication [Fig. 2(a)]. Each input pixel representing $x_j$ aligns spatially and projects pixel-by-pixel onto a transmissive element with dynamically controlled optical characteristics, usually a Spatial Light Modulator (SLM). The SLM sets its transmissivity proportionally to $w_{ij}$, resulting in the scalar multiplication $w_{ij}x_j$. The second step is optical fan-in [Fig. 2(b)]. The modulated pixels are combined physically by directing the transmitted light onto a single detector, leading to a photon count directly proportional to the resulting dot product $y_i$.
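The two steps map directly onto a numerical model. The following sketch reproduces them in plain Python as an idealized, noise-free calculation; the function names and the sample values are illustrative assumptions and do not describe a specific experimental setup.

```python
def elementwise_multiply(x, w_row):
    """Step 1: input pixel x_j passes a modulator pixel of transmissivity w_ij."""
    return [w_ij * x_j for w_ij, x_j in zip(w_row, x)]


def fan_in(modulated_pixels):
    """Step 2: all modulated pixels reach one detector, which sums them."""
    return sum(modulated_pixels)


def optical_matvec(W, x):
    """One detector per output neuron: y_i = sum_j w_ij * x_j."""
    return [fan_in(elementwise_multiply(x, w_row)) for w_row in W]


x = [4.0, 2.0, 1.0]        # input intensities
W = [[0.5, 0.25, 1.0],     # transmissivities between 0 and 1
     [1.0, 0.0, 0.5]]
y = optical_matvec(W, x)   # [3.5, 4.5]
```

In the optical system, every product in step 1 happens simultaneously as light crosses the modulator, and the sum in step 2 costs no additional operations because the detector integrates the incoming light.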
In a recent study,14 the authors propose a linear coherent matrix-multiplication framework based on spatial multiplexing via SLM. The computation process entails the evaluation of V–V dot products, denoted as $y_i = \mathbf{x} \cdot \mathbf{w}_i = \sum_j x_j w_{ij}$, where $\mathbf{x}$ represents the input vector of neural activations from the preceding layer and $\mathbf{w}_i$ represents the weight vector connecting neuron nodes in $\mathbf{x}$ to the $i$th neuron in the subsequent layer. Each component $x_j$ of $\mathbf{x}$ is encoded in the intensity of an individual spatial mode illuminated by an organic light-emitting diode (OLED) pixel, while each weight $w_{ij}$ is encoded in the transmissivity of a modulator pixel. This architectural design permits parallel computation of up to 711 × 711 = 505 521 scalar multiplications and additions. The optical strategy minimizes energy consumption by summing the spatial modes of each dot product on a single detector. For a large vector size $N$, the signal-to-noise ratio (SNR) scales in proportion to $\sqrt{N}$ at the shot-noise limit, facilitating accurate determination of the dot-product output, even when individual modes exhibit relatively low average photon counts.14 This capacity to perform dot products on large vectors with minimal energy consumption constitutes a distinctive advantage of optical methods.
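At the shot-noise limit, the SNR of the summed signal grows as the square root of the vector size. A short derivation makes this explicit under two simplifying assumptions that are introduced here for illustration and are not taken from the cited study: the detected photon number follows Poisson statistics, and each of the $N$ modes contributes the same mean number of detected photons $\bar{n}$.

$$
\langle n_{\mathrm{tot}} \rangle = N\bar{n}, \qquad
\sigma_{n_{\mathrm{tot}}} = \sqrt{N\bar{n}}, \qquad
\mathrm{SNR} = \frac{\langle n_{\mathrm{tot}} \rangle}{\sigma_{n_{\mathrm{tot}}}} = \sqrt{N\bar{n}} \propto \sqrt{N}.
$$

As an illustrative case, $N = 10^4$ modes with $\bar{n} = 1$ give $\mathrm{SNR} = 100$ for the dot product, even though each individual mode has $\mathrm{SNR} = 1$.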
In the follow-up work,23 the authors propose a more advanced architecture that expands the previous linear framework and incorporates an element-wise Optoelectronic Opaque Nonlinearity Activation (OONA) layer situated between two fully connected optical linear layers, introducing nonlinearity to the model [Fig. 2(c)]. The linear layers implement M–V multiplications using broadband, incoherent light as direct inputs, utilizing techniques similar to those of previous studies. Natural input images are distributed via microlens-based fan-out, with multiplication achieved through spatial light modulation of attenuated image copies based on weight matrix components [see Fig. 2(d)]. The OONA layer processes the resulting vectors, using saturating nonlinear responses through an image intensifier tube to preserve the ONN’s spatial parallelism [Fig. 2(e)]. The OONA layer optimizes computational efficiency by avoiding read-out/read-in costs associated with separate electronic processing of nonlinear activations. The second ONN layer follows a similar optical M–V multiplier configuration, culminating in output detection through a camera or photodetector array and extending the architecture’s capacity to include nonlinearity. In this study, the authors showcase the capabilities of these networks by applying them to a variety of image understanding tasks, such as Modified National Institute of Standards and Technology (MNIST) digit classification, image reconstruction, or anomaly cell-organelle classification. Significantly, this methodology extends beyond image analysis, as demonstrated in a recent study by Valensise et al.,33 which employs a similar approach to implement optical encoding for natural language sentences.
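The layer ordering described above (linear, element-wise saturating nonlinearity, linear) can be sketched numerically as follows. The exponential saturation curve is a hypothetical stand-in chosen only because it is linear for weak inputs and levels off for strong ones; the measured response of the image intensifier is not given here, and all names and weights are illustrative.

```python
import math


def matvec(W, x):
    """Linear layer: y_i = sum_j w_ij * x_j."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(w_row, x)) for w_row in W]


def saturating_activation(u, saturation=1.0):
    """Hypothetical saturating response (assumption, not the measured curve)."""
    return saturation * (1.0 - math.exp(-u / saturation))


def two_layer_onn(x, W1, W2):
    hidden = matvec(W1, x)                                  # first linear layer
    activated = [saturating_activation(h) for h in hidden]  # element-wise nonlinearity
    return matvec(W2, activated)                            # second linear layer


# Non-negative weights, as for intensity transmissivities.
W1 = [[1.0, 0.5],
      [0.2, 0.8]]
W2 = [[0.5, 0.5]]

dark = two_layer_onn([0.0, 0.0], W1, W2)        # no light in, no light out
bright = two_layer_onn([100.0, 100.0], W1, W2)  # hidden units fully saturated
```

Because the nonlinearity acts on each hidden element independently, it can be applied to all elements at once, which is the property that lets an optical implementation keep its spatial parallelism between the two linear layers.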
EXTRACTING FEATURES WITH OPTICAL HARDWARE
The potential of optical accelerators for enabling new applications of artificial intelligence (AI) extends beyond computational acceleration, opening frontiers in feature extraction and analysis.32,34–37 In addition to accelerating standard AI computations, these accelerators augment the dimensionality of feature spaces beyond the grasp of contemporary electronic architectures. This paradigm shift begins in machine vision, where recent work demonstrated optical accelerators integrated within imaging systems for extracting features at speeds exceeding 10 Tb/s.32,35 These accelerators enable operations such as M–V products,34 spectral encoding,32,36 depth estimation,35 and polarization37 measurements by directly interfacing with optical signals.
A recent example of such a system is a real-time, high-resolution hyperspectral video acquisition and processing system.32 This system leverages the universal approximation ability of inverse-designed metasurfaces composed of nanoresonator units38–40 as optimal spectral encoder hardware for hyperspectral information-related video and imaging tasks. The system comprises two core components: a hardware spectral encoder and a complementary software decoder [Fig. 2(f)]. The encoder transforms high-dimensional hyperspectral data β into a lower-dimensional multispectral feature tensor, while the subsequent decoder transforms this feature tensor into customized outputs tailored to specific tasks. This approach implements unsupervised dimensionality reduction, which involves finding the best-encoded representation of β through a set of projectors Λ(ω).
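The encoding step can be pictured as projecting each pixel's spectrum onto a small set of spectral response curves. The sketch below shows this reduction for a single pixel, assuming a spectrum sampled at six frequencies and three projectors; the sample values, the projector shapes, and the function name are made up for illustration, whereas the projectors Λ(ω) of the actual system are obtained by inverse design of the metasurface.

```python
def encode_spectrum(beta, projectors):
    """Project one pixel spectrum onto each projector:
    s_k = sum over sampled frequencies of Lambda_k(w) * beta(w)."""
    return [sum(l * b for l, b in zip(projector, beta)) for projector in projectors]


# Hypothetical spectrum of one pixel, sampled at 6 frequencies.
beta = [1.0, 2.0, 4.0, 8.0, 4.0, 2.0]

# Hypothetical transmission curves of 3 encoder elements (values in [0, 1]).
projectors = [
    [1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 1.0],
]

features = encode_spectrum(beta, projectors)  # 6 spectral samples -> 3 features
```

In hardware, each of these weighted sums is carried out by light passing through one encoder element before reaching the sensor, so the reduction from many spectral channels to a few features requires no electronic computation.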
The output from the hardware encoder, comprising an array of sub-pixels integrated into a single camera pixel, is captured by a monochromatic camera that acts as an imaging readout layer. The software decoder interprets the feature tensor differently depending on the machine vision task, such as hyperspectral reconstruction or semantic segmentation. In the case of hyperspectral reconstruction, the objective is to minimize the Root Mean Squared Error (RMSE) between ground truth and reconstructed spectral responses. Semantic segmentation is a pixel-level classification of an image. The decoder automatically creates a segmentation mask that identifies different classes as they move across the image in real time. The AI-enhanced hyperspectral technology delivers high reconstruction quality, achieving a reconstruction error below 2% while maintaining high image quality at 30 frames per second and 1-megapixel spatial resolution. This approach enables high-quality hyperspectral imaging at elevated resolutions, opening the way to real-time applications at video rates of 30 frames per second and above.
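For reference, the RMSE objective used for hyperspectral reconstruction can be written in a few lines. This is the standard definition applied to two equally sampled spectra; the function name and the sample values are illustrative.

```python
import math


def rmse(ground_truth, reconstructed):
    """Root Mean Squared Error between two equally sampled spectra."""
    if len(ground_truth) != len(reconstructed):
        raise ValueError("spectra must have the same number of samples")
    squared_errors = [(g - r) ** 2 for g, r in zip(ground_truth, reconstructed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


perfect = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])      # identical spectra
uniform_offset = rmse([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])
```

A perfect reconstruction gives zero, and a uniform offset of one unit across all spectral samples gives an RMSE of one.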
The journey toward optical accelerators in high-performance computing introduces opportunities that open new avenues in the future of artificial intelligence. Optical accelerators meet the demands for the escalating computational power of modern AI architectures, harnessing photons’ speed to overcome electronic computing bottlenecks, promising enhanced computational efficiency, improved energy utilization, and intrinsic parallel data processing.
However, alongside the promise of transformation, potential challenges lie ahead that create essential opportunities for researchers. Integrating photonics into AI accelerators demands the synchronization of hardware and software components, an open problem that requires novel design paradigms. Ensuring seamless compatibility between photonics and electronics while mitigating latency and signal-integrity issues is also challenging. In addition to these issues, the transition to optical accelerators must proceed without compromising the accessibility and affordability of current integrated electronic systems. Bridging the gap between cutting-edge results and practical deployment will require concerted efforts to overcome manufacturing complexities while fostering a broader ecosystem that can support the mass adoption of new optical technologies. Another critical challenge is integrating optoelectronic AI hardware with cost-effective and energy-efficient broadband nonlinear responses that can go beyond initial lab-level prototypes. This issue is important to unlocking the full potential of optical accelerators for advanced computational tasks, offering efficient solutions toward fully optical neural networks.
Addressing these questions can enable future AI systems to transcend the boundaries of traditional image-based analysis and delve into richly informative multichannel video streams containing hundreds of information channels. Foundation models, known for their prowess in capturing long-range dependencies within data,11,41,42 are uniquely positioned to exploit the multi-dimensional nature of hyperspectral video, unlocking its latent insights. Future AI systems equipped with hyperspectral video understanding abilities can unravel intricate relationships and patterns embedded within the data, offering enhanced potential for applications in fields as diverse as medical diagnostics,43 environmental monitoring,31,44 security,45,46 and beyond.
As the exponential growth in computational demands and the quest for energy-efficient computing converge, the future trajectory of optical accelerators for AI is poised to shape the foundation of high-performance computing and the successful development of a new, modern era of AI.
Conflict of Interest
The authors have no conflicts to disclose.
Maksim Makarenko: Writing – original draft (equal); Writing – review & editing (equal). Qizhou Wang: Visualization (equal); Writing – review & editing (equal). Arturo Burguete-Lopez: Visualization (equal); Writing – review & editing (equal). Andrea Fratalocchi: Writing – review & editing (equal).
Data sharing is not applicable to this article as no new data were created or analyzed in this study.