The rapid development of quantitative portfolio optimization in financial engineering has produced promising results for AI-based algorithmic trading strategies. However, the complexity of financial markets makes comprehensive simulation difficult owing to factors such as abrupt regime transitions, unpredictable hidden causal factors, and heavy-tailed return distributions. This paper addresses these challenges by employing heavy-tail-preserving normalizing flows to simulate the high-dimensional joint probability distribution of the complex trading environment within a model-based reinforcement learning framework. In experiments with stocks drawn from three market indices (Dow, NASDAQ, and S&P), the Dow portfolio outperforms the other two on multiple evaluation metrics in our testing system. Notably, the proposed method mitigates the impact of the unforeseen market crisis during the COVID-19 pandemic, yielding a lower maximum drawdown. We also examine the interpretability of our reinforcement learning algorithm: we apply the pattern causality method to study interactive relationships among stocks, analyze the training dynamics of the loss functions to verify convergence, visualize high-dimensional state-transition data with t-SNE to uncover patterns useful for portfolio optimization, and use eigenvalue analysis to study the convergence properties of the environment model.
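Why a heavy-tail-preserving flow matters can be seen in a minimal sketch (assuming only NumPy; the affine layer and the `tail_ratio` diagnostic below are illustrative stand-ins, not the paper's model): Lipschitz flow layers applied to a Gaussian base distribution cannot create heavy tails, whereas the same layers applied to a Student-t base retain them.

```python
import numpy as np

rng = np.random.default_rng(0)

def affine_flow(z, scale=2.0, shift=0.5):
    # One Lipschitz (affine) flow layer: x = scale * z + shift.
    # Such layers rescale tails but cannot change their decay rate.
    return scale * z + shift

# Gaussian base: tails stay light after the flow.
x_gauss = affine_flow(rng.standard_normal(100_000))
# Student-t base (3 degrees of freedom): heavy tails survive the flow.
x_t = affine_flow(rng.standard_t(3, 100_000))

def tail_ratio(x, k=4.0):
    # Fraction of samples farther than k robust (IQR-scaled) units
    # from the median; a simple heavy-tail diagnostic.
    q25, q75 = np.percentile(x, [25, 75])
    z = (x - np.median(x)) / (q75 - q25)
    return np.mean(np.abs(z) > k)

print(tail_ratio(x_gauss), tail_ratio(x_t))
```

With a Gaussian base the tail fraction is essentially zero, while the Student-t base leaves a clearly positive tail mass, which is why the base distribution, not the flow layers, must carry the heavy-tail property.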

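The eigenvalue analysis of the environment model's convergence can likewise be sketched on a toy linearized dynamics model (the random Jacobian `A` here is hypothetical, not the paper's learned model): if all eigenvalues of the one-step transition Jacobian lie inside the unit circle, model rollouts contract toward a fixed point rather than diverging.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linearized environment model: s_{t+1} ~ A @ s_t.
A = 0.1 * rng.normal(size=(8, 8))  # small random Jacobian

# Spectral radius < 1 implies the linearized dynamics are stable.
spectral_radius = max(abs(np.linalg.eigvals(A)))
print(f"spectral radius: {spectral_radius:.3f}")

# Roll the model forward: the state norm should shrink when rho < 1.
s = rng.normal(size=8)
norms = [np.linalg.norm(s)]
for _ in range(50):
    s = A @ s
    norms.append(np.linalg.norm(s))
print(norms[0], norms[-1])
```

The same check applied to a learned model's Jacobian gives a quick stability diagnostic: eigenvalues creeping toward or beyond the unit circle warn that long model rollouts will amplify prediction error.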