Increasingly complex nonlinear World-Earth system models are used to describe the dynamics of the biophysical Earth system and the socioeconomic and sociocultural World of human societies, together with their interactions. Identifying pathways toward a sustainable future in these models to inform policymakers and the wider public, e.g., pathways leading to robust mitigation of dangerous anthropogenic climate change, is a challenging and widely investigated task in climate research and broader Earth system science. The problem is particularly difficult when constraints on avoiding transgressions of planetary boundaries and social foundations must be taken into account. In this work, we propose to combine recently developed machine learning techniques, namely deep reinforcement learning (DRL), with classical analysis of trajectories in the World-Earth system. Based on the concept of the agent-environment interface, we develop an agent that can act and learn in a variety of manageable environment models of the Earth system. We demonstrate the potential of our framework by applying DRL algorithms to two stylized World-Earth system models. We thereby explore, conceptually, the feasibility of finding novel global governance policies that lead into a safe and just operating space constrained by certain planetary and socioeconomic boundaries. The artificially intelligent agent learns that the timing of a specific mix of taxes on carbon emissions and subsidies for renewables is of crucial relevance for finding World-Earth system trajectories that are sustainable in the long term.
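The agent-environment interface described above can be illustrated with a minimal sketch. The environment below is a hypothetical toy model, not one of the two stylized World-Earth system models from the paper: it tracks a single carbon stock and a single economic output, with a planetary boundary on the former and a social foundation on the latter, and offers stylized management options (carbon tax, renewables subsidy, both). All dynamics, parameter values, and boundary positions are illustrative assumptions, and a tabular Q-learning agent stands in for the deep Q-networks used in the paper, since the coarse discretized state space fits in a lookup table.

```python
import random

random.seed(0)  # make the training run reproducible


class StylizedWorldEarthEnv:
    """Hypothetical toy environment (not the paper's models): a carbon
    stock `a` and an economic output `y`, both kept in [0, 1]."""

    # stylized management options, loosely mirroring the policy levers
    ACTIONS = ("default", "carbon tax", "subsidy", "tax + subsidy")

    def reset(self):
        self.a, self.y = 0.5, 0.5
        return self._obs()

    def _obs(self):
        # coarse discretization so a tabular agent can learn on the state
        return (round(self.a, 1), round(self.y, 1))

    def step(self, action):
        tax = action in (1, 3)       # a carbon tax cuts emissions strongly
        subsidy = action in (2, 3)   # a subsidy cuts emissions, boosts growth
        emission = 0.08 * self.y * (0.3 if tax else 1.0) * (0.6 if subsidy else 1.0)
        self.a = min(1.0, max(0.0, self.a + emission - 0.05 * self.a))
        growth = 0.03 * (0.5 if tax else 1.0) + (0.02 if subsidy else 0.0)
        self.y = min(1.0, max(0.0, self.y + growth - 0.04 * self.a))
        # "safe and just" region: planetary boundary on carbon (a < 0.7),
        # social foundation on economic output (y > 0.4)
        inside = self.a < 0.7 and self.y > 0.4
        return self._obs(), (1.0 if inside else 0.0), (not inside)


def rollout(env, policy, horizon=100):
    """Number of steps a fixed policy stays inside the safe and just space."""
    s, steps = env.reset(), 0
    for _ in range(horizon):
        s, _, done = env.step(policy(s))
        if done:
            break
        steps += 1
    return steps


# Tabular epsilon-greedy Q-learning: reward 1 per step inside the safe
# and just space, episode ends on any boundary transgression.
env = StylizedWorldEarthEnv()
q = {}
alpha, gamma = 0.1, 0.95
for episode in range(500):
    s = env.reset()
    eps = max(0.05, 1.0 - episode / 300)  # decaying exploration rate
    for _ in range(100):
        qs = q.setdefault(s, [0.0] * 4)
        a = random.randrange(4) if random.random() < eps else qs.index(max(qs))
        s2, r, done = env.step(a)
        q2 = q.setdefault(s2, [0.0] * 4)
        qs[a] += alpha * (r + gamma * max(q2) * (not done) - qs[a])
        s = s2
        if done:
            break


def greedy(s):
    qs = q.setdefault(s, [0.0] * 4)
    return qs.index(max(qs))


default_len = rollout(env, lambda s: 0)   # no intervention: transgresses
combined_len = rollout(env, lambda s: 3)  # tax + subsidy: stays inside
learned_len = rollout(env, greedy)        # policy learned by the agent
```

In this toy setup the do-nothing policy lets the carbon stock cross its boundary within the horizon, while the combined tax-and-subsidy policy stays inside the safe and just space indefinitely; the learned policy should survive at least as long as the do-nothing baseline. The specific thresholds and rates are free choices made only to reproduce that qualitative structure.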
