Existing research on the dynamics of cooperative behavior under reinforcement learning either assumes global interaction, in which each agent interacts with all other agents in the population, or directly studies how relevant factors influence the evolution of cooperation under local interaction on a network. What it has not formally examined is how restricting agents to local interaction affects their strategy choice. In this paper, we therefore study the cooperative behavior of agents in a typical social decision-making environment in which individual and collective interests conflict. On the one hand, a standard game-theoretic model, the prisoner’s dilemma game, is used to capture the essence of real-world dilemmas. On the other hand, the effects of local and global strategy learning on the evolution of cooperation are investigated separately, and the nature of spatial reciprocity under the reinforcement learning mechanism is identified. Specifically, when there is no inherent connection between the interacting agents and the learning agents within the system, the network structure has only a limited effect on promoting cooperation. Only when the interacting agents and the learning agents overlap can the spatial reciprocity effect observed in traditional evolutionary game theory be fully realized.
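As a minimal illustration of the conflict between individual and collective interests that the prisoner’s dilemma captures, the following sketch uses assumed payoff values (T = 5, R = 3, P = 1, S = 0); these numbers and the helper names `best_response` and `collective_payoff` are hypothetical and are not taken from the paper. It only shows that defection is the individually best reply to either action, while mutual cooperation yields the higher joint payoff.

```python
# Illustrative prisoner's dilemma payoffs (assumed values, not from the paper).
# The ordering T > R > P > S is what defines the dilemma.
T, R, P, S = 5, 3, 1, 0

# PAYOFF[(my_action, partner_action)] -> my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): R,  # reward for mutual cooperation
    ("C", "D"): S,  # sucker's payoff
    ("D", "C"): T,  # temptation to defect
    ("D", "D"): P,  # punishment for mutual defection
}


def best_response(partner_action):
    """Return the action that maximizes my own payoff against a fixed partner action."""
    return max(("C", "D"), key=lambda a: PAYOFF[(a, partner_action)])


def collective_payoff(a, b):
    """Return the sum of both players' payoffs for the action pair (a, b)."""
    return PAYOFF[(a, b)] + PAYOFF[(b, a)]


if __name__ == "__main__":
    for partner in ("C", "D"):
        print("best response to", partner, "is", best_response(partner))
    print("joint payoff (C, C):", collective_payoff("C", "C"))
    print("joint payoff (D, D):", collective_payoff("D", "D"))
```

With these assumed values, each agent is individually better off defecting whatever the partner does, yet both agents earn more under mutual cooperation than under mutual defection, which is the tension the abstract refers to.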
