A central challenge in satellite-to-ground optical communication, including free-space satellite quantum key distribution (QKD), is achieving sufficient accuracy in positioning, navigation, and optical stabilization. Proportional-integral-derivative (PID) controllers handle a wide range of control tasks in optical systems, and recent research shows promising results for composite control systems that combine classical PID control with a reinforcement learning (RL) approach. In this work, we apply an RL agent to an experimental test bench of the optical stabilization system of a QKD terminal. From the agent's control history we extract more precise PID parameters, and we also present an effective combined RL-PID dynamic control approach for optical stabilization of the satellite-to-ground communication system.
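The combined approach can be illustrated with a minimal sketch: a PID controller whose gains are exposed so that an external RL agent (e.g., an actor network outputting gain values) can retune them online. Everything below, including the toy first-order beam-offset plant and all gain values, is an illustrative assumption, not the authors' implementation.

```python
class PID:
    """Discrete PID controller with externally adjustable gains."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update_gains(self, kp, ki, kd):
        # In the combined scheme, an RL agent's action would call this,
        # replacing or refining the hand-tuned gains.
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, error):
        # Standard discrete PID law: u = kp*e + ki*∫e dt + kd*de/dt.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid, setpoint=0.0, offset=1.0, steps=200):
    """Toy stable first-order plant: the beam offset relaxes toward the
    control input. Purely a stand-in for the real optical stabilization loop."""
    for _ in range(steps):
        u = pid.step(setpoint - offset)
        offset += (u - offset) * pid.dt  # assumed plant dynamics
    return offset


pid = PID(kp=2.0, ki=0.5, kd=0.05, dt=0.01)
residual = simulate(pid)
print(abs(residual))  # residual offset after the controller acts
```

In a full RL-PID scheme, `update_gains` would be driven by the agent's policy at each control interval, while the PID loop itself keeps the millisecond-scale stabilization deterministic and stable.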
