We experimentally demonstrate the world’s first field-programmable gate-array-based real-time fiber nonlinearity compensator (NLC) using sparse K-means++ machine learning clustering in an energy-efficient 40-Gb/s 16-quadrature amplitude modulated self-coherent optical system. Our real-time NLC shows up to 3 dB improvement in Q-factor compared to linear equalization at 50 km of transmission.
I. INTRODUCTION
Surging data traffic, driven by bandwidth-hungry Internet services such as video streaming and cloud computing, poses a significant challenge to the underlying fiber-optic communication systems. The only viable solution for increasing data rates is the employment of advanced modulation formats [e.g., 16 quadrature amplitude modulation (16-QAM)]. The core difficulty in the transmission of such signals is the Kerr-induced fiber nonlinearity,1 which is responsible for nonlinear optical effects such as four-wave mixing (FWM).2 The optical Kerr effect gives rise to the so-called nonlinear Shannon capacity limit,2 which sets an upper bound on the achievable data rate in optical fiber communications when traditional linear transmission techniques are used. There have been extensive efforts to surpass the nonlinear Shannon limit through several fiber nonlinearity compensators (NLCs),1–5 the most well-known being optical phase conjugation,1 digital back-propagation,2,3 phase-conjugated twin waves,4 and the nonlinear Fourier transform (NFT).5 However, besides being either complex1–3,5 or spectrally inefficient,4 these solutions traditionally compensate only deterministic nonlinearities, thus ignoring critical stochastic nonlinear effects such as the interplay between amplified spontaneous emission from optical amplification and fiber nonlinearity. Recently, digital machine learning has been under the spotlight for the compensation of nonlinear distortions, harnessing various algorithms that perform classification, such as supervised artificial neural networks6 and support vector machines,7 or unsupervised clustering, such as K-means, fuzzy-logic C-means, hierarchical,8,9 and affinity propagation.10 Unsupervised machine learning clustering algorithms are more attractive than supervised ones because they are completely blind and do not need any training signals, which would limit signal capacity.
Machine-learning-based digital blocks are independent of mathematically tractable models and can be optimized for a specific hardware configuration and channel.11 Hitherto, reported machine learning based NLCs have been implemented offline using conventional software platforms (e.g., Matlab®). However, there is a tremendous need for a practical algorithm for real-time communication links that can maximize transmission performance without sacrificing network latency or energy efficiency. Recent advances in field-programmable gate arrays (FPGAs) have made it possible to process high-bandwidth digital signals in next-generation telecommunication systems.
In this work, we experimentally demonstrate the first FPGA-based NLC using sparse K-means++ clustering for 40-Gb/s 16-QAM energy-efficient self-coherent systems12 (see Fig. 1). We show that our proposed NLC can offer up to 3 dB Q-factor enhancement for transmission at 50 km of standard single mode fiber (SSMF).
II. FPGA DESIGN AND EXPERIMENTAL SETUP
K-means clustering is a method of vector quantization that is popular for cluster analysis in data mining. It aims to partition n observations into K clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. However, K-means is computationally inefficient, especially for real-time signal processing, where many calculations are required over thousands or millions of data points within a very limited timeframe (ms). Here, we propose a sparse K-means++ based NLC for real-time self-coherent optical signals. In the algorithm, we repeatedly sample data points at a regular time interval (the sparsity) from the real-time incoming signal to estimate the centers of the clusters. The time interval can depend on the memory size of the FPGA. The updated centers’ coefficients are applied to the data signal in the hard-decision process to help reduce the bit error rate (BER). As the calculation speed of the FPGA is very limited compared to the high-speed incoming data signal, the processing time for estimating the cluster centers’ coefficients introduces a time delay and therefore system latency. To reduce such delays, the previously updated cluster centers’ coefficients are saved in latch registers and applied to the incoming data signal until new cluster centers are generated. The clustering algorithm is based on Lloyd’s approach,8,9 and its objective is to minimize the total intra-cluster variance, i.e., the squared error function J = Σ_{j=1}^{K} Σ_{i=1}^{n} ‖x_i^(j) − c_j‖², where J, K, and n are the objective function, number of clusters, and number of symbols, respectively, x_i^(j) is the ith symbol assigned to cluster j, and c_j is the centroid of cluster j. The term inside the norm defines the distance between a symbol and a centroid. In the algorithm, we have prior knowledge of the centers (i.e., the “++” term) of the adopted modulation format to enable faster convergence and adaptivity.
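The intra-cluster variance J can be computed directly from the assigned symbols and centroids; the following minimal NumPy sketch illustrates this (the function name and the toy data are ours, not from the paper):

```python
import numpy as np

def kmeans_objective(symbols, centroids, labels):
    """Total intra-cluster variance J = sum_j sum_{i in cluster j} ||x_i - c_j||^2."""
    # symbols: complex received symbols; centroids: complex cluster centers;
    # labels[i] gives the cluster index assigned to symbols[i].
    return float(np.sum(np.abs(symbols - centroids[labels]) ** 2))

# Toy check: two symbols, each offset by 0.1 from its own centroid,
# so J = 0.1**2 + 0.1**2 = 0.02.
syms = np.array([1.1 + 1j, -0.9 - 1j])
cents = np.array([1 + 1j, -1 - 1j])
J = kmeans_objective(syms, cents, np.array([0, 1]))
```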
The steps of our algorithm are given below, while the conceptual diagram in Fig. 2(a) shows an example of the 16 centers’ updates of a 16-QAM signal:
1. Initiate K cluster centers (centroids) by averaging the received data per cluster after linear equalization.
2. Compute point-to-cluster-centroid distances of all observations to each centroid.
3. Assign each observation to the cluster with the closest centroid (Batch update).
4. Compute the average of the observations in each cluster to obtain K new centroid locations.
5. Repeat steps 2 through 4 until cluster assignments do not change or the maximum number of iterations is reached.
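The steps above can be sketched in NumPy as follows. This is a minimal software model, not the FPGA implementation: the decimation factor (sparsity), the toy 16-QAM data, and all variable names are our assumptions for illustration.

```python
import numpy as np

def sparse_kmeans_pp(rx, ideal_centers, sparsity=8, max_iter=10):
    """Sparse K-means++ NLC sketch: estimate centroids from a decimated
    subset of received symbols, starting from the ideal constellation
    centers (the '++' prior), then hard-decide the full signal."""
    sample = rx[::sparsity]                            # sparse subset
    centers = np.asarray(ideal_centers, dtype=complex).copy()
    for _ in range(max_iter):
        # steps 2-3: assign each observation to its nearest centroid
        labels = np.abs(sample[:, None] - centers[None, :]).argmin(axis=1)
        # step 4: recompute each centroid as the mean of its cluster
        new = np.array([sample[labels == k].mean() if np.any(labels == k)
                        else centers[k] for k in range(len(centers))])
        if np.allclose(new, centers):                  # step 5: converged
            break
        centers = new
    # hard decision on the full signal with the updated centers
    return centers, np.abs(rx[:, None] - centers[None, :]).argmin(axis=1)

# Toy 16-QAM demo: 800 noisy symbols around the ideal grid.
grid = np.array([x + 1j * y for x in (-3, -1, 1, 3) for y in (-3, -1, 1, 3)])
rng = np.random.default_rng(0)
rx = np.repeat(grid, 50) + 0.1 * (rng.standard_normal(800)
                                  + 1j * rng.standard_normal(800))
centers, labels = sparse_kmeans_pp(rx, grid)
```

Only every eighth symbol participates in the centroid update, mirroring the paper's idea that the slow center estimation runs on a sparse subset while the hard decision runs on the full-rate signal.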
In our NLC, the number of iterations was fixed, as further iterations did not improve the Q-factor. In Fig. 2(a), we show the design of the NLC. As shown, in order to update the centers in K-means, only information from one of the N parallel FPGA channels (32 in total) is required for the I and Q components; the updated center coefficients from that channel are then used for the other channels, thus significantly reducing the processing time and resource utilization. The system is split into two parallel processors. In the upper diagram, the minimum Euclidean distance between the sparse received data per constellation cluster M and the ideal centers is first calculated (i.e., the “++” term). The inset of Fig. 2(a) shows an example of the first ideal center’s initialization of a 16-QAM signal. Afterward, hard decision is executed and the BER/Q-factor is calculated. After this initialization, the lower arm of the design (the center-coefficient update block) is activated. This is the key part of the sparse K-means++ clustering algorithm: it calculates the K-means coefficients following a similar procedure as before (i.e., via minimum Euclidean distance calculation per cluster), estimates the mean of each center, and then updates the centers in each iteration according to the renewed cluster mean. In the last iteration of the process, we re-assign symbols until all of them are properly assigned, at which point the minimum BER value has been reached (convergence). The FPGA used in our experiment was a Xilinx Virtex UltraScale+ VCU118 evaluation platform. The self-coherent system (Fig. 1) employs a 10 GBaud 16-QAM signal at 40 Gb/s. The transmitter digital signal processing (DSP) was performed offline in Matlab using a look-up table (LUT), in which pre-distortion was used to mitigate the impairments of the opto-electronic components, similarly to Ref. 13.
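The latch-register scheme, where the last computed centers keep serving the full-rate data until a new update arrives, can be modeled as below (class and method names are ours, purely illustrative):

```python
import numpy as np

class CenterLatch:
    """Hold the most recently updated cluster centers so that the hard
    decision on incoming symbols is never stalled by the (slower)
    center-estimation process; new centers replace old ones atomically."""

    def __init__(self, initial_centers):
        self.centers = np.asarray(initial_centers, dtype=complex)

    def update(self, new_centers):
        # Called whenever the center-update block finishes an iteration.
        self.centers = np.asarray(new_centers, dtype=complex)

    def hard_decision(self, symbols):
        # Nearest-center decision using whatever centers are latched now.
        s = np.asarray(symbols, dtype=complex)
        return np.abs(s[:, None] - self.centers[None, :]).argmin(axis=1)
```

In the paper's design, one channel's update feeds the latch, and all 32 parallel channels read the same latched coefficients for their hard decisions.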
A narrow-linewidth (<100 kHz) external cavity laser (ECL) was tuned to 1549.5 nm, and using an arbitrary waveform generator (AWG) operating at 20 GS/s, two uncorrelated pseudo-random bit sequences (2^15 − 1) were applied to the IQ modulator to generate the 16-QAM signal. After IQ modulation, the optical signal was transmitted over 50 km of SSMF. At the receiver, the optical signal was converted to an electrical one using a self-homodyne coherent receiver. Afterward, the signal was captured by a real-time oscilloscope sampled at 50 GS/s, and directly after the analog-to-digital converters (ADCs), the resultant electrical signal entered our FPGA board. First, the real-time signal passed through the constant modulus algorithm (CMA) combined with the multi-modulus algorithm (MMA) for signal equalization. Afterward, the signal went through a least-mean-squares (LMS) digital filter for carrier phase recovery, and finally, our proposed machine learning algorithm was applied, in which the hard-decision calculations were carried out. The design structure of the CMA/MMA and LMS for real-time application is identical to that reported in Ref. 14. In our FPGA design, 32 parallel channels (Ch. N) were used to meet the required 20 GHz bandwidth, considering that the available clock rate per channel is 312.5 MHz. From Ch. 1 to Ch. N, we selected only one channel and used it to update the center information in K-means. However, for each channel, the I and Q components were processed separately.
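The CMA stage of the receiver chain updates equalizer taps by driving the output modulus toward a target radius. A minimal single-output sketch of one CMA tap update follows (step size, target radius, and the toy single-tap example are our assumptions; the paper's actual CMA/MMA structure follows Ref. 14):

```python
import numpy as np

def cma_step(w, x, R2=1.0, mu=0.01):
    """One constant-modulus algorithm (CMA) tap update for output y = w . x.
    Gradient descent on the cost J = (|y|^2 - R2)^2 drives |y|^2 toward R2."""
    y = np.dot(w, x)                     # equalizer output
    e = R2 - np.abs(y) ** 2              # constant-modulus error
    w = w + mu * e * y * np.conj(x)      # stochastic-gradient tap update
    return w, y

# Toy single-tap check: the channel scales a unit-modulus symbol by 2,
# so the tap magnitude should converge to 0.5, restoring |y| = 1.
w = np.array([0.25 + 0j])
for _ in range(2000):
    x = np.array([2.0 * np.exp(1j * 0.3)])   # constant-modulus input, gain 2
    w, y = cma_step(w, x)
```

For 16-QAM, the MMA extends this by switching among several target radii, one per constellation ring, instead of the single R2 used here.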
III. RESULTS
Figure 2(b) and Table I present the resource utilization of the receiver DSP after FPGA synthesis and implementation, and Fig. 2(c) shows the on-chip power, which totals 10.219 W. It is evident that the sparse K-means++, which includes the hard decision (yellow) and the coefficient update (blue), accounts for >60% of the total complexity of the DSP receiver [e.g., LUTs, configurable logic blocks (CLBs), 8-bit carry chains per CLB (CARRY8)]. The rest of the complexity is allocated to the CMA (pink) and the LMS (green). We also calculated the on-chip power consumption for the case using only a linear equalizer in order to compare it with the sparse K-means++ algorithm. In the linear-equalization-only case, the signal equalization (CMA and MMA), the LMS, and the hard-decision modules are included; note that the hard decision is also needed before the BER calculation. The total on-chip power for this case is 9.756 W, compared with 10.219 W when using the sparse K-means++ algorithm; the difference is not significant, which makes the proposed real-time sparse K-means++ algorithm competitive from a power consumption perspective. While this result applies to a self-coherent system, it is worth noting that in a conventional coherent system, additional linear equalization is required to compensate the frequency offset between the transmitter and receiver lasers. In addition, since our system was tested for short-reach transmission at 50 km, chromatic dispersion compensation was omitted from the linear equalization process. We thus envisage that the total complexity of the linear equalizer in traditional coherent homodyne systems would be much larger than that of the proposed machine learning algorithm.
DSP port | LUTs | Registers | CARRY8 | CLB
---|---|---|---|---
LMS | 26 630 | 5 940 | 3 404 | 5 655
Sp. K-means++ | 431 149 | 60 316 | 32 444 | 67 973
CMA | 164 538 | 28 336 | 11 264 | 25 970
Finally, in Fig. 2(d), we show the transmission performance of the 16-QAM 10 Gbaud signal at 50 km when using linear equalization and sparse K-means++ for launched optical powers (LOPs) of up to 14 dBm. We show that our NLC enhances the Q-factor by 3 dB at 14 dBm of LOP (a constellation diagram is also included). This is attributed to the compensation of self-phase modulation as a single-channel/carrier is transmitted.
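The paper does not spell out its BER-to-Q-factor conversion; a common choice is the Gaussian approximation BER = 0.5·erfc(Q/√2), i.e., Q = Φ⁻¹(1 − BER), which the sketch below implements using only the Python standard library (the function name is ours):

```python
import math
from statistics import NormalDist

def q_factor_db(ber):
    """Convert a measured BER to a Q-factor in dB under the Gaussian
    approximation BER = 0.5*erfc(Q/sqrt(2)), i.e. Q = Phi^{-1}(1 - BER)."""
    q = NormalDist().inv_cdf(1.0 - ber)   # inverse standard-normal CDF
    return 20.0 * math.log10(q)

# Example: BER = 1e-3 corresponds to roughly Q ~ 9.8 dB.
q_db = q_factor_db(1e-3)
```

Under this convention, a 3 dB Q-factor gain, such as the one reported at 14 dBm LOP, corresponds to a substantial reduction in BER.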
IV. CONCLUSION
We experimentally demonstrated a novel, practical fiber-induced nonlinearity compensator, the sparse K-means++, for 40-Gb/s real-time 16-QAM energy-efficient self-coherent systems. At 50 km of commercial fiber transmission, our approach improved the Q-factor by up to 3 dB over linear equalization. This is the world’s first implementation of machine learning in an FPGA board for compensation of nonlinearities in optical networks. We believe that our technique can also be valuable for other modulation techniques such as fast optical orthogonal frequency division multiplexing.15–17
AUTHORS’ CONTRIBUTIONS
E.G. and Y.L. contributed equally to this work.
ACKNOWLEDGMENTS
This work was supported by SFI (Grant Nos. 13/RC/2077, 12/RC/2276, and 15/US-C2C/I3132) and the HEA INSPIRE.