We experimentally demonstrate the world’s first field-programmable gate-array-based real-time fiber nonlinearity compensator (NLC) using sparse K-means++ machine learning clustering in an energy-efficient 40-Gb/s 16-quadrature amplitude modulated self-coherent optical system. Our real-time NLC shows up to 3 dB improvement in Q-factor compared to linear equalization at 50 km of transmission.

Surging data traffic demands, caused by bandwidth-hungry Internet services such as video streaming or cloud computing, pose a significant challenge to the underlying fiber-optic communication systems. The only viable solution for increasing the data rates is the employment of advanced modulation formats [e.g., 16 quadrature amplitude modulation (16-QAM)]. The core difficulty in transmitting such signals is the Kerr-induced fiber nonlinearity,1 which is responsible for nonlinear optical effects such as four-wave mixing (FWM).2 The optical Kerr effect gives rise to the so-called nonlinear Shannon capacity limit,2 which sets an upper bound on the achievable data rate in optical fiber communications when using traditional linear transmission techniques. There have been extensive efforts to surpass the nonlinear Shannon limit through several fiber nonlinearity compensators (NLCs),1–5 the most well known being optical phase conjugation,1 digital back-propagation,2,3 phase-conjugated twin waves,4 and the nonlinear Fourier transform (NFT).5 However, besides being either complex1–3,5 or spectrally inefficient,4 these solutions traditionally compensate only deterministic nonlinearities, thus ignoring critical stochastic nonlinear effects such as the interplay between amplified spontaneous emission from optical amplification and fiber nonlinearity. Recently, digital machine learning has been under the spotlight for the compensation of nonlinear distortions, harnessing various algorithms that perform classification, such as supervised artificial neural networks6 and support vector machines,7 or unsupervised clustering, such as K-means, fuzzy-logic C-means, hierarchical,8,9 and affinity propagation.10 Unsupervised machine learning clustering algorithms are more attractive than supervised algorithms because they are completely blind and do not need any training signals, which limit signal capacity.
Machine learning digital blocks are independent of mathematically tractable models and can be optimized for a specific hardware configuration and channel.11 The machine learning based NLCs reported to date have been implemented offline using conventional software platforms (e.g., Matlab®). However, there is a strong need for a practical algorithm for real-time communication links that can maximize transmission performance without sacrificing network latency and energy efficiency. Recent advances in field-programmable gate arrays (FPGAs) have made it possible to process high-bandwidth digital signals in next-generation telecommunication systems.

In this work, we experimentally demonstrate the first FPGA-based NLC using sparse K-means++ clustering for 40-Gb/s 16-QAM energy-efficient self-coherent systems12 (see Fig. 1). We show that our proposed NLC can offer up to 3 dB Q-factor enhancement for transmission over 50 km of standard single-mode fiber (SSMF).

FIG. 1.

Self-coherent system for 50 km transmission incorporating the machine learning based FPGA receiver.

K-means clustering is a method of vector quantization that has been popular for cluster analysis in data mining. It aims to partition n observations into K clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. However, K-means is computationally inefficient, especially for real-time signal processing, where many calculations are required for thousands or millions of data points within a very limited timeframe (on the order of ms). Here, we propose a sparse K-means++ based NLC for real-time self-coherent optical signals. In the algorithm, we repeatedly take data points at a regular time interval (sparsity) from the real-time incoming signal to estimate the centers of the clusters. The time interval can depend on the memory size of the FPGA. The updated centers’ coefficients are applied to the data signal in the hard-decision process to help reduce the bit error rate (BER). As the calculation speed of the FPGA is very limited compared to the high-speed incoming data signal, the processing time for estimating the cluster centers’ coefficients results in a time delay and therefore system latency. To reduce such delays, the most recently updated cluster centers’ coefficients are saved in latch registers and applied to the incoming data signal until new cluster centers are generated. The clustering algorithm is based on Lloyd’s approach,8,9 and its objective is to minimize the total intra-cluster variance, or the squared error function $J=\sum_{j=1}^{K}\sum_{i=1}^{n}\left\|x_i^{(j)}-c_j\right\|^{2}$, where J, K, and n are the objective function, the number of clusters, and the number of symbols, respectively, $x_i^{(j)}$ is the ith symbol (assigned to cluster j), and $c_j$ is the centroid of cluster j. The term inside the norm defines the distance between a symbol and a centroid. In the algorithm, we use prior knowledge of the ideal centers of the adopted modulation format (i.e., the “++” term) to enable faster convergence and adaptivity.
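The sparsity and latching ideas described above can be illustrated with a minimal software sketch. It is written in Python for readability and is not the FPGA implementation; the names `sparse_pick` and `LatchedCenters`, the sampling interval, and the toy two-point constellation are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def sparse_pick(symbols: Sequence[complex], interval: int) -> List[complex]:
    """Keep one symbol every `interval` symbols (the sparsity)."""
    return list(symbols[::interval])


class LatchedCenters:
    """Holds the last completed set of cluster centers (the 'latch')."""

    def __init__(self, initial: Sequence[complex]) -> None:
        self.active = list(initial)

    def decide(self, symbol: complex) -> int:
        """Hard decision: index of the nearest currently latched center."""
        return min(range(len(self.active)),
                   key=lambda k: abs(symbol - self.active[k]))

    def update(self, new_centers: Sequence[complex]) -> None:
        """Swap in freshly estimated centers once they are ready."""
        self.active = list(new_centers)


# Toy two-point constellation (assumption, for illustration only).
latch = LatchedCenters([-1 + 0j, 1 + 0j])
before = latch.decide(0.2 + 0.1j)   # decided with the old, latched centers
latch.update([0.3 + 0j, 2 + 0j])    # a new estimate arrives later
after = latch.decide(0.2 + 0.1j)    # same symbol, decided with new centers
```

The point of the latch is that `decide` never waits for `update`: incoming symbols keep being decided against the last completed estimate while the next one is being computed.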

The steps of our algorithm are given below, while the conceptual diagram in Fig. 2(a) shows an example of the 16 centers’ updates of a 16-QAM signal:

  1. Initialize K cluster centers (centroids) by averaging the received data per cluster after linear equalization.

  2. Compute point-to-cluster-centroid distances of all observations to each centroid.

  3. Assign each observation to the cluster with the closest centroid (Batch update).

  4. Compute the average of the observations in each cluster to obtain K new centroid locations.

  5. Repeat steps 2 through 4 until cluster assignments do not change or the maximum number of iterations is reached.
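The five steps above can be sketched in software as follows. This is a plain Python illustration under stated assumptions, not the authors' hardware design: the ideal 16-QAM grid on the levels ±1, ±3, the function names, and the default `max_iter` are hypothetical. The initial per-cluster averaging uses the ideal centers to form the first clusters, which is one reading of the “++” initialization.

```python
from typing import List, Sequence, Tuple


def ideal_16qam() -> List[complex]:
    """Ideal 16-QAM centers on a {-3,-1,1,3} grid (assumed normalization)."""
    levels = (-3.0, -1.0, 1.0, 3.0)
    return [complex(i, q) for i in levels for q in levels]


def assign(symbols: Sequence[complex], centers: Sequence[complex]) -> List[int]:
    """Steps 2-3: label each symbol with the index of its nearest centroid."""
    return [min(range(len(centers)), key=lambda k: abs(s - centers[k]))
            for s in symbols]


def update(symbols: Sequence[complex], labels: Sequence[int],
           centers: Sequence[complex]) -> List[complex]:
    """Step 4: per-cluster mean; an empty cluster keeps its old centroid."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [s for s, lab in zip(symbols, labels) if lab == k]
        new_centers.append(sum(members) / len(members) if members else old)
    return new_centers


def objective(symbols: Sequence[complex], labels: Sequence[int],
              centers: Sequence[complex]) -> float:
    """Squared-error objective J: sum of |x_i - c_j|^2 over all symbols."""
    return sum(abs(s - centers[lab]) ** 2 for s, lab in zip(symbols, labels))


def kmeans_16qam(symbols: Sequence[complex],
                 max_iter: int = 5) -> Tuple[List[complex], List[int]]:
    ideal = ideal_16qam()
    labels = assign(symbols, ideal)
    centers = update(symbols, labels, ideal)          # step 1
    for _ in range(max_iter):                         # step 5 (iteration cap)
        new_labels = assign(symbols, centers)         # steps 2-3
        centers = update(symbols, new_labels, centers)  # step 4
        if new_labels == labels:                      # step 5 (no change)
            break
        labels = new_labels
    return centers, labels
```

For example, if every cluster is shifted by the same small offset, `kmeans_16qam` moves each centroid onto the shifted cluster mean, and `objective` evaluated with the returned centers is lower than with the ideal grid.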

FIG. 2.

(a) Design of sparse K-means++ on the FPGA. (Inset) First initialization step with ideal centers in a 16-QAM signal. N: number of parallel FPGA channels; M: number of clusters in the constellation diagram. (b) Floorplan of the DSPs on the FPGA: CMA (pink); K-means hard decision (yellow) and coefficients update (blue); LMS (green). (c) On-chip power report (power estimation from Xilinx synthesized netlist). (d) Performance of sparse K-means++ for 16-QAM at 50 km.

In our NLC, the number of iterations was fixed, as further iterations did not improve the Q-factor. Figure 2(a) shows the design of the NLC. To update the centers in K-means, only the information from one of the N parallel FPGA channels (32 in total) is required for the I and Q components; the updated center coefficients from that channel are then used for the other channels, thus significantly reducing the processing time and the resource utilization. The system is split into two parallel processors. In the upper arm of the diagram, the minimum Euclidean distance between the sparse received data per constellation cluster M and the ideal centers is first calculated (i.e., the “++” term). The inset of Fig. 2(a) shows an example of this first initialization with ideal centers for a 16-QAM signal. Afterward, the hard decision is executed and the BER/Q-factor is calculated. After this initialization, the lower arm of the design (the centers’ coefficients update block) is activated. It is the key part of the sparse K-means++ clustering algorithm: it calculates the coefficients of the K-means following a similar procedure as before (i.e., via the minimum Euclidean distance calculation per cluster), estimates the mean for each center, and then updates the centers in each iteration according to the renewed cluster mean. In the last iteration of the process, we re-assign symbols until all of them are properly assigned, considering that the minimum BER value has been reached (convergence). The FPGA used in our experiment was a Xilinx Virtex UltraScale+ VCU118 Evaluation FPGA Platform. The self-coherent system (Fig. 1) employs a 10 GBaud 16-QAM signal at 40 Gb/s. The transmitter digital signal processing (DSP) was performed offline in Matlab using a look-up table (LUT), in which pre-distortion was used to mitigate the opto-electronic components’ impairments, similarly to Ref. 13.
A narrow-linewidth (<100 kHz) external cavity laser (ECL) was tuned to 1549.5 nm, and using an arbitrary waveform generator (AWG) operating at 20 GS/s, two uncorrelated pseudo-random level signals ($2^{15}-1$) were applied to the IQ modulator to generate the 16-QAM signal. After IQ modulation, the optical signal was transmitted over 50 km of SSMF. At the receiver, the optical signal was converted to an electrical one using a self-homodyne coherent receiver. Afterward, the signal was captured by a real-time oscilloscope sampling at 50 GS/s, and directly after the analog-to-digital converters (ADCs), the resultant electrical signal entered our FPGA board. First, the real-time signal passed through the constant modulus algorithm (CMA) combined with the multi-modulus algorithm (MMA) for signal equalization. Afterward, the signal went through a least-mean-squares (LMS) digital filter for carrier phase recovery, and finally, it was processed by our proposed machine learning block, in which the hard-decision calculations were carried out. The design structure of the CMA/MMA and LMS for real-time application is identical to that reported in Ref. 14. For our FPGA design, 32 parallel channels (Ch. N) were used to meet the required 20 GHz bandwidth, considering that the available clock rate per channel is 312.5 MHz. From Ch. 1 to Ch. N, we selected only one channel and used it to update the information of the centers in K-means. However, for each channel, we processed the I and Q components separately.
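The rate figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that each parallel channel handles one sample per clock cycle, which the text does not state explicitly; the variable names are illustrative.

```python
import math

BAUD_RATE = 10e9          # symbol rate of the 16-QAM signal, in baud
CONSTELLATION_SIZE = 16   # 16-QAM
N_CHANNELS = 32           # parallel FPGA channels
CLOCK_HZ = 312.5e6        # available clock rate per channel

# Bits carried by each 16-QAM symbol: log2(16) = 4.
bits_per_symbol = math.log2(CONSTELLATION_SIZE)

# Line rate: 10 GBaud x 4 bit/symbol = 40 Gb/s.
bit_rate = BAUD_RATE * bits_per_symbol

# Aggregate throughput of the parallel channels, assuming one sample
# per channel per clock cycle: 32 x 312.5 MHz = 10 GS/s.
aggregate_rate = N_CHANNELS * CLOCK_HZ
```

Under that assumption the 32 channels together sustain 10 GS/s per component, which equals the 10 GBaud symbol rate. Counting the separately processed I and Q components gives 20 GS/s in total; this is one possible reading of the 20 GHz figure, not something the text confirms.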

Figure 2(b) and Table I present the resource utilization of the receiver DSP after FPGA synthesis and implementation, and Fig. 2(c) shows the on-chip power, which is 10.219 W in total. It is evident that the sparse K-means++, which includes the hard decision (yellow) and the coefficients update (blue), accounts for >60% of the total complexity of the DSP receiver [e.g., LUTs, configurable logic blocks (CLBs), and 8-bit carry chains per CLB (CARRY8)]. The rest of the complexity is allocated to the CMA (pink) and the LMS (green). We also calculated the on-chip power consumption when using only a linear equalizer in order to compare it with the case of the sparse K-means++ algorithm. The linear-equalization-only case includes the signal equalization (CMA and MMA), the LMS, and the hard-decision modules; note that the hard decision is also needed before the BER calculation. The total on-chip power for this case is 9.756 W, whereas the on-chip power required when using the sparse K-means++ algorithm is 10.219 W; thus, the difference between using the sparse K-means++ algorithm and using only the conventional linear equalizer is small (0.463 W, or about 4.7%). This makes the proposed real-time sparse K-means++ algorithm competitive from a power consumption perspective. While this work targets a self-coherent system, it is worth noting that in a conventional coherent system, additional linear equalization is required to compensate the frequency offset between the transmitter and receiver lasers. In addition, since our system was tested for short-reach transmission at 50 km, chromatic dispersion compensation was omitted from the linear equalization process. We thus envisage that the total complexity of the linear equalizer in traditional coherent homodyne systems is much larger than that of the proposed machine learning algorithm.
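The complexity share and the power overhead discussed above can be reproduced from the reported numbers. The sketch below uses the values of Table I and the two on-chip power figures; it assumes that the three listed blocks (LMS, sparse K-means++, CMA) make up the whole receiver DSP, and the names `RESOURCES` and `share` are illustrative.

```python
# Resource counts as reported in Table I.
RESOURCES = {
    "LMS":           {"LUTs": 26630,  "Registers": 5940,  "CARRY8": 3404,  "CLB": 5655},
    "Sp. K-means++": {"LUTs": 431149, "Registers": 60316, "CARRY8": 32444, "CLB": 67973},
    "CMA":           {"LUTs": 164538, "Registers": 28336, "CARRY8": 11264, "CLB": 25970},
}


def share(block: str, resource: str) -> float:
    """Fraction of one resource type used by `block` among the listed blocks."""
    total = sum(counts[resource] for counts in RESOURCES.values())
    return RESOURCES[block][resource] / total


# Share of each resource type taken by the sparse K-means++ block.
kmeans_shares = {res: share("Sp. K-means++", res)
                 for res in ("LUTs", "Registers", "CARRY8", "CLB")}

# On-chip power with and without the sparse K-means++ block, in watts.
POWER_NLC_W = 10.219
POWER_LINEAR_W = 9.756
extra_power_w = POWER_NLC_W - POWER_LINEAR_W
extra_power_pct = 100.0 * extra_power_w / POWER_LINEAR_W
```

With these inputs, every entry of `kmeans_shares` lies between roughly 0.64 and 0.69, consistent with the >60% statement, while the extra power is 0.463 W, or about 4.7% of the linear-equalizer case.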

TABLE I.

Key FPGA resources of the developed DSP.

| DSP port | LUTs | Registers | CARRY8 | CLB |
| --- | --- | --- | --- | --- |
| LMS | 26 630 | 5 940 | 3 404 | 5 655 |
| Sp. K-means++ | 431 149 | 60 316 | 32 444 | 67 973 |
| CMA | 164 538 | 28 336 | 11 264 | 25 970 |

Finally, in Fig. 2(d), we show the transmission performance of the 10 GBaud 16-QAM signal at 50 km when using linear equalization and sparse K-means++ for launched optical powers (LOPs) of up to 14 dBm. Our NLC enhances the Q-factor by 3 dB at 14 dBm of LOP (a constellation diagram is also included). This is attributed to the compensation of self-phase modulation, since a single channel/carrier is transmitted.
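To give a feel for what a 3 dB Q-factor gain means in terms of BER, the sketch below uses the commonly quoted conversion $Q_{\mathrm{dB}} = 20\log_{10}\!\left[\sqrt{2}\,\mathrm{erfc}^{-1}(2\,\mathrm{BER})\right]$. The text does not state which Q-factor definition was used, so this conversion is an assumption, and the function names and the starting BER of 10^-2 are illustrative only, not measured values.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def q_db_from_ber(ber: float) -> float:
    """Q-factor in dB from BER, using BER = 0.5 * erfc(Q / sqrt(2)).

    Since 0.5 * erfc(Q / sqrt(2)) equals the standard normal CDF at -Q,
    the linear Q is minus the inverse CDF evaluated at the BER.
    """
    q_linear = -_STD_NORMAL.inv_cdf(ber)
    return 20.0 * math.log10(q_linear)


def ber_from_q_db(q_db: float) -> float:
    """Inverse conversion: BER corresponding to a Q-factor given in dB."""
    q_linear = 10.0 ** (q_db / 20.0)
    return _STD_NORMAL.cdf(-q_linear)


# Hypothetical starting point (not a value reported in the text).
ber_before = 1e-2
ber_after = ber_from_q_db(q_db_from_ber(ber_before) + 3.0)
```

Because the BER falls steeply with Q, a 3 dB Q-factor gain lowers the BER by more than an order of magnitude from this assumed starting point.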

We experimentally demonstrated a novel, practical fiber-induced nonlinearity compensator, the sparse K-means++, for 40-Gb/s real-time 16-QAM energy-efficient self-coherent systems. At 50 km of commercial fiber transmission, our approach improved the Q-factor by up to 3 dB over linear equalization. This is the world’s first implementation of machine learning on an FPGA board for the compensation of nonlinearities in optical networks. We believe that our technique can also be valuable for other modulation techniques such as fast optical orthogonal frequency division multiplexing.15–17

E.G. and Y.L. contributed equally to this work.

This work was supported by SFI (Grant Nos. 13/RC/2077, 12/RC/2276, and 15/US-C2C/I3132) and the HEA INSPIRE.

1. M. A. Z. Al-Khateeb, M. E. McCarthy, C. Sánchez, and A. D. Ellis, “Nonlinearity compensation using optical phase conjugation deployed in discretely amplified transmission systems,” Opt. Express 26(18), 23945–23959 (2018).
2. E. Temprana, E. Myslivets, B. P.-P. Kuo et al., “Overcoming Kerr-induced capacity limit in optical fiber transmission,” Science 348(6242), 1445–1448 (2015).
3. R. Maher, L. Galdino, M. Sato et al., “Linear and nonlinear impairment mitigation in a Nyquist spaced DP-16QAM WDM transmission system with full-field DBP,” in Proceedings of the European Conference on Optical Communication (ECOC), Cannes, France, September 2014 (IEEE, 2014), p. 5.10.
4. X. Liu, A. R. Chraplyvy, P. J. Winzer et al., “Phase-conjugated twin waves for communication beyond the Kerr nonlinearity limit,” Nat. Photonics 7, 560–568 (2013).
5. S. T. Le, V. Aref, and H. Buelow, “Nonlinear signal multiplexing for communication beyond the Kerr nonlinearity limit,” Nat. Photonics 11, 570–576 (2017).
6. Z. Shaoliang, F. Yaman, K. Nakamura et al., “Field and lab experimental demonstration of nonlinear impairment compensation using neural networks,” Nat. Commun. 10, 3033 (2019).
7. M. Li, S. Yu, J. Yang et al., “Nonparameter nonlinear phase noise mitigation by using M-ary support vector machine for coherent optical systems,” IEEE Photonics J. 5(6), 7800312 (2013).
8. E. Giacoumidis, A. Matin, J. Wei et al., “Blind nonlinearity equalization by machine learning based clustering for single- and multi-channel coherent optical OFDM,” J. Lightwave Technol. 36(3), 721–727 (2018).
9. J. Zhang, W. Chen, M. Gao, and G. Shen, “K-means-clustering-based fiber nonlinearity equalization techniques for 64-QAM coherent optical communication system,” Opt. Express 25(22), 27570–27580 (2017).
10. E. Giacoumidis, I. Aldaya, J. L. Wei et al., “Affinity propagation clustering for blind nonlinearity compensation in coherent optical OFDM,” in Proceedings of Conference on Lasers and Electro-Optics (CLEO), San Jose, CA, USA, November 2018 (IEEE, 2018), p. STh1C.5.
11. D. Zibar, M. Piels, R. Jones, and C. G. Schäeffer, “Machine learning techniques in optical communication,” J. Lightwave Technol. 34(6), 1442–1452 (2016).
12. E. Giacoumidis, A. Choudhary, E. Magi et al., “Chip-based Brillouin processing for carrier recovery in coherent optical communications,” Optica 5(10), 1191–1199 (2018).
13. J. Zhang, J. Yu, and H.-C. Chien, “Advanced linear and nonlinear compensations for 16QAM SC-400G unrepeatered transmission system,” Opt. Commun. 409, 34–38 (2018).
14. G. Liu, K. Zhang, R. Zhang et al., “Demonstration of a carrier frequency offset estimator for 16-/32-QAM coherent receivers: A hardware perspective,” Opt. Express 26(4), 4853–4862 (2018).
15. E. Giacoumidis, S. K. Ibrahim, J. Zhao et al., “Experimental and theoretical investigations of intensity-modulation and direct-detection optical fast-OFDM over MMF-links,” IEEE Photonics Technol. Lett. 24(1), 52–54 (2012).
16. E. Giacoumidis, A. Tsokanos, C. Mouchos et al., “Extensive comparisons of optical fast-OFDM and conventional optical OFDM for local and access networks,” J. Opt. Commun. Networking 4(10), 724–733 (2012).
17. E. Giacoumidis, S. K. Ibrahim, J. Zhao et al., “Experimental demonstration of cost-effective intensity-modulation and direct-detection optical fast-OFDM over 40km SMF transmission,” in Proceedings of Conference on Optical Fiber Communications and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), Los Angeles, California, USA, March 2012 (IEEE, 2012), p. JW2A.65.