Reinforcement learning is a foundational machine-learning paradigm, and an outstanding problem in it is achieving an optimal balance between exploration and exploitation. Specifically, exploration enables an agent to discover optimal policies in unknown domains of the environment and thereby gain potentially large future rewards, while exploitation relies on already acquired knowledge to maximize immediate rewards. We articulate an approach to this problem by treating the dynamical process of reinforcement learning as a Markov decision process that can be modeled as a nondeterministic finite automaton, and by defining a subset of states in the automaton to represent the preference for exploring unknown domains of the environment. Exploration is prioritized by assigning higher transition probabilities to these states. We derive a mathematical framework that systematically balances exploration and exploitation by formulating the problem as a mixed integer programming (MIP) problem, which optimizes the agent’s actions to maximize the discovery of novel preferential states. Solving the MIP problem yields a trade-off point between exploiting known states and exploring unexplored regions. We validate the framework computationally with a benchmark system and argue that the articulated automaton is effectively an adaptive network with a time-varying connection matrix, in which the states of the automaton are the nodes and the transitions among the states are the edges. The network is adaptive because the transition probabilities evolve over time. This connection between the adaptive automaton arising from reinforcement learning and adaptive networks opens the door to applying theories of complex dynamical networks to frontier problems in machine learning and artificial intelligence.
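The abstract does not give the paper's equations, so the following is only a minimal sketch, under stated assumptions, of the adaptive-network picture it describes: automaton states are nodes, transition probabilities form a row-stochastic connection matrix, and edges into a preferred subset of states receive extra weight while those states remain unvisited, so the matrix changes as the agent explores. The function names `bias_toward_preferred` and `simulate`, the reweighting factor (1 + beta), and the rule that the boost disappears once a preferred state has been visited are hypothetical illustrations. The sketch does not reproduce the paper's MIP formulation, which is what actually sets the trade-off point there.

```python
import random

def bias_toward_preferred(P, preferred, visited, beta):
    """Hypothetical reweighting rule (not the paper's): return a row-stochastic
    copy of the baseline matrix P in which every edge pointing into a
    preferred, not-yet-visited state is multiplied by (1 + beta) before the
    row is renormalized. Requires beta > -1."""
    n = len(P)
    A = []
    for i in range(n):
        row = [
            P[i][j] * (1.0 + beta) if (j in preferred and j not in visited) else P[i][j]
            for j in range(n)
        ]
        total = sum(row)  # > 0 because each row of P sums to 1 and beta > -1
        A.append([w / total for w in row])
    return A

def simulate(P, preferred, beta, steps, start=0, seed=0):
    """Random walk on the automaton. The connection matrix A(t) is rebuilt
    at every step from the current set of visited states, so the edge
    weights of the network evolve over time (an adaptive network)."""
    rng = random.Random(seed)
    visited = {start}
    state = start
    history = []  # A(0), A(1), ..., A(steps - 1)
    for _ in range(steps):
        A = bias_toward_preferred(P, preferred, visited, beta)
        history.append(A)
        # Sample the next state from row `state` of the current matrix.
        state = rng.choices(range(len(P)), weights=A[state])[0]
        visited.add(state)
    return visited, history

if __name__ == "__main__":
    # Hypothetical 4-state automaton; states 2 and 3 form the preferred subset.
    P = [
        [0.70, 0.20, 0.05, 0.05],
        [0.30, 0.60, 0.05, 0.05],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25],
    ]
    visited, history = simulate(P, preferred={2, 3}, beta=4.0, steps=10)
    print("visited states:", sorted(visited))
    print("row 0 of A(0):", [round(w, 3) for w in history[0][0]])
```

In this sketch beta acts as a hand-tuned exploration preference: beta = 0 leaves the baseline matrix unchanged (pure exploitation of the known transition structure), and larger values bias the walk more strongly toward unexplored preferred states. Once every preferred state has been visited, A(t) relaxes back to P, which is one simple way the connection matrix can be time-varying.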
Research Article, December 2024 issue, published December 03 2024
Adaptive network approach to exploration–exploitation trade-off in reinforcement learning
Special Collection: Advances in Adaptive Dynamical Networks
Mohammadamin Moradi¹ (Conceptualization, Formal analysis, Investigation, Methodology, Writing – original draft)
Zheng-Meng Zhai¹ (Conceptualization, Investigation, Writing – original draft)
Shirin Panahi¹ (Conceptualization, Investigation)
Ying-Cheng Lai¹,²,a) (Conceptualization, Funding acquisition, Project administration, Supervision, Writing – original draft, Writing – review & editing)

¹School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, Arizona 85287, USA
²Department of Physics, Arizona State University, Tempe, Arizona 85287, USA
a)Author to whom correspondence should be addressed: [email protected]
Chaos 34, 123120 (2024)
Article history: Received June 03 2024; Accepted November 09 2024
Citation: Mohammadamin Moradi, Zheng-Meng Zhai, Shirin Panahi, Ying-Cheng Lai; Adaptive network approach to exploration–exploitation trade-off in reinforcement learning. Chaos 1 December 2024; 34 (12): 123120. https://doi.org/10.1063/5.0221833