Complex systems consist of many interacting entities whose interactions give rise to emergent behaviors. Data-driven modeling and control of such systems are especially important when observational data are abundant but interventions are costly. Traditional methods rely on precise dynamical models or require extensive intervention data, and they often fall short in real-world applications. To bridge this gap, we consider a specific setting of the complex system control problem: how to control a complex system through a few online interactions on a subset of intervenable nodes when abundant observational data from its natural evolution are available. We introduce a two-stage model predictive complex system control framework, comprising an offline pre-training phase that leverages rich observational data to capture spontaneous evolutionary dynamics and an online fine-tuning phase that uses a variant of model predictive control to implement intervention actions. To address the high dimensionality of the state-action space, we employ action-extended graph neural networks to model the Markov decision process of the complex system and design a hierarchical action space for learning intervention actions. The approach performs well in three complex system control environments: Boids, Kuramoto, and the Susceptible-Infectious-Susceptible (SIS) metapopulation model. Compared with the baseline algorithm, it converges faster, generalizes robustly, and incurs lower intervention costs. This work provides insights into controlling complex systems with high-dimensional state-action spaces and limited intervention data, with promising applications to real-world challenges.
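To make the online stage concrete, the following is a minimal sketch of model predictive control with a cross-entropy-method planner on a toy Kuramoto system. It is an illustration under stated assumptions, not the authors' implementation: the function names (`kuramoto_step`, `order_parameter`, `cem_plan`), the additive frequency-nudge action, the synchronization reward, and all hyperparameters are hypothetical, and the analytic step function stands in for the pre-trained action-extended graph neural network that would act as the dynamics model.

```python
import numpy as np

def kuramoto_step(theta, action, adj, omega, coupling=0.5, dt=0.1):
    """One Euler step of a toy Kuramoto model.
    `action` is an assumed additive frequency nudge per node (zero where no intervention)."""
    diff = theta[None, :] - theta[:, None]          # diff[i, j] = theta_j - theta_i
    interaction = (adj * np.sin(diff)).sum(axis=1)  # coupling input to each node i
    return theta + dt * (omega + coupling * interaction + action)

def order_parameter(theta):
    """Kuramoto order parameter: close to 1 when phases are synchronized."""
    return np.abs(np.exp(1j * theta).mean())

def cem_plan(theta, model, controllable, horizon=5, pop=64, elites=8,
             iters=3, max_u=1.0, rng=None):
    """Cross-entropy-method planner: returns the first action of the best plan found.
    `model(state, action)` is any one-step predictor; here it is a stand-in for a learned model."""
    rng = np.random.default_rng() if rng is None else rng
    k = len(controllable)
    mean = np.zeros((horizon, k))
    std = np.full((horizon, k), max_u)
    for _ in range(iters):
        # Sample candidate action sequences for the intervenable nodes only.
        samples = np.clip(rng.normal(mean, std, size=(pop, horizon, k)), -max_u, max_u)
        returns = np.empty(pop)
        for p in range(pop):
            state, total = theta.copy(), 0.0
            for t in range(horizon):
                action = np.zeros_like(theta)
                action[controllable] = samples[p, t]
                state = model(state, action)
                total += order_parameter(state)     # assumed reward: synchronization
            returns[p] = total
        # Refit the sampling distribution to the highest-return sequences.
        elite = samples[np.argsort(returns)[-elites:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    first = np.zeros_like(theta)
    first[controllable] = mean[0]
    return first

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    adj = np.ones((n, n)) - np.eye(n)               # fully connected toy graph
    omega = rng.normal(size=n)
    theta = rng.uniform(0, 2 * np.pi, size=n)
    model = lambda s, a: kuramoto_step(s, a, adj, omega)
    for _ in range(10):                             # receding-horizon control loop
        u = cem_plan(theta, model, controllable=[0, 3], rng=rng)
        theta = kuramoto_step(theta, u, adj, omega)
    print("order parameter after control:", order_parameter(theta))
```

At each step the planner re-optimizes a short action sequence from the current state and applies only its first action, which is the receding-horizon pattern that model predictive control relies on. In the setting described above, `model` would be replaced by a predictor trained offline on observational data and refined with the few online interactions.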
