We present a numerical study of training a self-propelling agent to migrate in an unsteady flow environment. Using a reinforcement learning algorithm, we train the agent to exploit the background flow structure so as to minimize its energy consumption. We consider the agent migrating in two types of flow: a simple periodic double-gyre flow as a proof-of-concept example, and a complex turbulent Rayleigh–Bénard convection as a paradigm for migration in the convective atmosphere or the ocean. The results show that, in both flows, the smart agent learns to migrate from one position to another while utilizing background flow currents as much as possible to minimize energy consumption, as is evident from comparing the smart agent with a naive agent that moves straight from the origin to the destination. In addition, we find that, compared with the double-gyre flow, the flow field in turbulent Rayleigh–Bénard convection exhibits more substantial fluctuations, and the agent in training is more likely to explore different migration strategies; thus, training converges less readily. Nevertheless, we can still identify an energy-efficient trajectory that corresponds to the strategy with the highest reward received by the agent. These results have important implications for many migration problems, such as unmanned aerial vehicles flying in a turbulent convective environment, where planning energy-efficient trajectories is often involved.
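To make the proof-of-concept setting concrete, the sketch below evaluates the velocity field of the widely used double-gyre benchmark and advects a naive agent that always swims straight toward its destination. It is an illustration under stated assumptions, not the setup of the study: the parameter values (A, EPS, OMEGA) are the commonly quoted benchmark values, and the function names, the constant swim speed, and the forward-Euler time stepping are all hypothetical choices made here.

```python
import math

# Assumed parameters: commonly quoted values for the double-gyre benchmark,
# not necessarily those used in the study.
A = 0.1                       # velocity amplitude
EPS = 0.25                    # strength of the periodic perturbation
OMEGA = 2.0 * math.pi / 10.0  # angular frequency of the perturbation


def double_gyre_velocity(x, y, t):
    """Velocity (u, v) of the double-gyre flow on the domain [0, 2] x [0, 1]."""
    s = EPS * math.sin(OMEGA * t)
    f = s * x * x + (1.0 - 2.0 * s) * x     # f(x, t)
    dfdx = 2.0 * s * x + (1.0 - 2.0 * s)    # df/dx
    u = -math.pi * A * math.sin(math.pi * f) * math.cos(math.pi * y)
    v = math.pi * A * math.cos(math.pi * f) * math.sin(math.pi * y) * dfdx
    return u, v


def naive_trajectory(start, target, swim_speed, dt=0.01, t_max=50.0, tol=0.02):
    """Advect an agent that always swims straight toward the target.

    Hypothetical baseline: constant swim speed, forward-Euler time stepping.
    Returns the visited positions and the arrival time (None if the target
    is not reached within t_max).
    """
    x, y = start
    t = 0.0
    path = [(x, y)]
    while t < t_max:
        dx, dy = target[0] - x, target[1] - y
        dist = math.hypot(dx, dy)
        if dist < tol:
            return path, t
        u, v = double_gyre_velocity(x, y, t)
        # Agent velocity = background flow + self-propulsion toward the target.
        x += (u + swim_speed * dx / dist) * dt
        y += (v + swim_speed * dy / dist) * dt
        t += dt
        path.append((x, y))
    return path, None


if __name__ == "__main__":
    path, arrival = naive_trajectory((0.5, 0.5), (1.5, 0.5), swim_speed=1.0)
    print(f"steps: {len(path) - 1}, arrival time: {arrival}")
```

A learned policy would replace the fixed "swim toward the target" rule with a heading chosen from the agent's state, and an energy measure (not specified here) would then be compared against this baseline.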
