Integrating residential photovoltaic generation and energy storage into grid-connected systems is essential for reducing the electricity that households draw from the grid. However, obtaining a reliable and optimal control policy is highly challenging because of the intrinsic uncertainties in renewable energy sources and the fluctuating demand profile. In this work, we designed an ensemble deep reinforcement learning (DRL) algorithm combined with risk evaluation to solve the energy optimization problem under uncertainty. Attention and masking layers, state-of-the-art techniques from natural language processing, were incorporated into the algorithm to handle the hard constraints frequently encountered in renewable energy optimization. To the best of our knowledge, this work is the first attempt to tackle the energy optimization problem under uncertainty using a scenario-based ensemble DRL approach with risk evaluation. In a carefully designed single-household microgrid energy management system, we found that the attention and masking layers played a crucial role in satisfying the hard constraints. Increasing the number of agents in the ensemble DRL significantly improved the energy management policy, yielding a ∼75% cost reduction compared with conventional single-agent DRL. The risk evaluation revealed that the present ensemble DRL approach has a high-risk/high-profit character, which could be substantially improved by designing a risk-aware reward function in future work.
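
To illustrate the masking idea mentioned above, the following is a minimal sketch (not the authors' implementation, which also involves attention layers) of how an action mask can enforce a hard constraint in a discrete-action DRL policy; the class name, network sizes, and the example state/mask are illustrative assumptions only.

```python
# Minimal sketch: masking infeasible actions so a hard constraint holds by construction.
import torch
import torch.nn as nn

class MaskedPolicy(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor, mask: torch.Tensor):
        # mask: 1 for feasible actions, 0 for actions that would violate a hard constraint
        logits = self.net(state)
        # Infeasible actions receive -inf logits, so their sampling probability is exactly zero.
        logits = logits.masked_fill(mask == 0, float("-inf"))
        return torch.distributions.Categorical(logits=logits)

# Hypothetical usage: e.g., a "discharge" action is masked out when the battery is empty,
# so the policy can never select it in that state.
policy = MaskedPolicy(state_dim=4, n_actions=3)
state = torch.randn(1, 4)
mask = torch.tensor([[1, 1, 0]])          # third action infeasible in this state
action = policy(state, mask).sample()     # masked action is never sampled
```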
