We present the use of modern machine learning approaches to suppress self-sustained collective oscillations typically signaled by ensembles of degenerative neurons in the brain. The proposed hybrid model relies on two major components: an environment of oscillators and a policy-based reinforcement learning block. We report a model-agnostic synchrony control scheme based on proximal policy optimization and two artificial neural networks in an Actor–Critic configuration. We propose a class of physically meaningful reward functions that enable suppression of the collective oscillatory mode. Synchrony suppression is demonstrated for two models of neuronal populations: ensembles of globally coupled limit-cycle Bonhoeffer–van der Pol oscillators and ensembles of bursting Hindmarsh–Rose neurons, using rectangular and charge-balanced stimuli.
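As a rough illustration of the "environment of oscillators" component, the sketch below steps a small ensemble of globally coupled Bonhoeffer–van der Pol units and returns the mean field together with a reward. Everything specific here is an assumption made for illustration and is not taken from the text above: the class name `OscillatorEnsembleEnv`, the parameter values, the Euler integration, and the reward form (a penalty on the deviation of the mean field from its running average plus a penalty on stimulus amplitude). The learning block itself (proximal policy optimization with Actor and Critic networks) is not shown.

```python
import numpy as np


class OscillatorEnsembleEnv:
    """Hypothetical toy environment: N globally coupled Bonhoeffer-van der Pol units.

    All parameter values are illustrative assumptions, not the paper's settings.
    """

    def __init__(self, n=100, coupling=0.03, dt=0.01, substeps=20,
                 max_action=0.5, penalty=0.1, window=200, seed=0):
        rng = np.random.default_rng(seed)
        self.coupling = coupling      # global coupling strength (assumed)
        self.dt = dt                  # Euler step size (assumed)
        self.substeps = substeps      # integration steps per control action
        self.max_action = max_action  # stimulus amplitude limit (assumed)
        self.penalty = penalty        # weight of the stimulus-cost term (assumed)
        self.window = window          # length of the running average
        self.currents = rng.normal(0.6, 0.1, n)  # heterogeneous drive (assumed)
        self.x = rng.uniform(-2.0, 2.0, n)
        self.y = rng.uniform(-1.0, 1.0, n)
        self.history = []

    def mean_field(self):
        """Observable collective signal: the ensemble average of x."""
        return float(self.x.mean())

    def step(self, action):
        """Apply a constant (rectangular) stimulus for one control interval."""
        a = float(np.clip(action, -self.max_action, self.max_action))
        for _ in range(self.substeps):
            mean_x = self.x.mean()
            dx = (self.x - self.x ** 3 / 3.0 - self.y
                  + self.currents + self.coupling * mean_x + a)
            dy = 0.1 * (self.x - 0.8 * self.y + 0.7)
            self.x = self.x + self.dt * dx
            self.y = self.y + self.dt * dy
        obs = self.mean_field()
        self.history.append(obs)
        baseline = float(np.mean(self.history[-self.window:]))
        # Assumed reward: small mean-field oscillation and small stimulus are both rewarded.
        reward = -(obs - baseline) ** 2 - self.penalty * abs(a)
        return obs, float(reward)


if __name__ == "__main__":
    env = OscillatorEnsembleEnv()
    total = 0.0
    for _ in range(50):
        _, r = env.step(0.0)  # no stimulation: reward reflects oscillation only
        total += r
    print("accumulated reward without control:", total)
```

In a setup of this kind, an agent would observe the mean field, choose the stimulus amplitude at each control interval, and be trained to maximize the accumulated reward; the stimulus-cost term pushes the learned policy toward weak stimulation once the collective mode is suppressed.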
