Reinforcement learning (RL) is a powerful machine learning technique that has been successfully applied to a wide variety of problems. However, it can be unpredictable and produce suboptimal results in complicated learning environments, especially when multiple agents learn simultaneously, which creates a complex system that is often analytically intractable. Our work considers the fundamental framework of Q-learning in public goods games, where RL agents must work together to achieve a common goal. This setting allows us to study the tragedy of the commons and free-rider effects in cooperative artificial intelligence, an emerging field with the potential to resolve challenging obstacles to the wider application of AI. While this social dilemma has mainly been investigated through traditional and evolutionary game theory, our work connects the two approaches by studying agents with an intermediate level of intelligence. We consider the influence of learning parameters on cooperation levels, both in simulations and in a limiting system of differential equations, and we examine the effect of evolutionary pressure on the exploration rate in both models. We find selection for higher and for lower levels of exploration, as well as attracting values of the exploration rate, and a condition that separates these regimes in a restricted class of games. Our work enhances the theoretical understanding of recent techniques that combine evolutionary algorithms with Q-learning and extends our knowledge of the evolution of machine behavior in social dilemmas.
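To make the setup concrete, the following is a minimal sketch, not the authors' implementation, of stateless Q-learning agents with Boltzmann (softmax) exploration repeatedly playing a linear public goods game. The group size, synergy factor, learning rate, discount factor, and temperature used here are illustrative assumptions, as is the specific payoff form.

```python
# Minimal sketch (assumed parameters, not the paper's code): N Q-learning agents
# with Boltzmann exploration play a repeated linear public goods game.
import numpy as np

N = 4          # group size (assumed)
r = 3.0        # synergy factor (assumed r < N, so free-riding is tempting)
cost = 1.0     # cost of contributing
alpha = 0.1    # learning rate
gamma = 0.9    # discount factor
tau = 0.5      # exploration rate (Boltzmann temperature)

rng = np.random.default_rng(0)
Q = np.zeros((N, 2))  # per-agent Q-values: action 0 = defect, 1 = cooperate

def boltzmann_probs(q, tau):
    """Softmax action probabilities at temperature tau."""
    z = (q - q.max()) / tau
    e = np.exp(z)
    return e / e.sum()

for step in range(50_000):
    # Each agent samples an action from its softmax policy.
    probs = [boltzmann_probs(Q[i], tau) for i in range(N)]
    actions = np.array([rng.choice(2, p=probs[i]) for i in range(N)])

    # Linear public goods payoff: contributions are multiplied by r and shared equally.
    pool = r * cost * actions.sum()
    payoffs = pool / N - cost * actions

    # Stateless Q-learning update for each agent's chosen action.
    for i in range(N):
        a = actions[i]
        Q[i, a] += alpha * (payoffs[i] + gamma * Q[i].max() - Q[i, a])

coop_rate = np.mean([boltzmann_probs(Q[i], tau)[1] for i in range(N)])
print(f"average cooperation probability: {coop_rate:.3f}")
```

In this kind of sketch, varying the temperature tau (and letting it evolve across a population of such groups) is one simple way to probe how exploration interacts with the free-rider incentive; the paper's actual models and parameterization may differ.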
