This study proposes a modified gap-filling method that extends column mean imputation. The method was evaluated using randomly generated missing values comprising 5%, 10%, 15%, and 20% of the original power output data. The XGBoost algorithm was implemented as the forecasting model using the original and processed datasets together with two sources of solar radiation data, namely, Shortwave Radiation (SWR) from Advanced Himawari Imager 8 (AHI-8) and Surface Solar Radiation Downward (SSRD) from ERA5 global reanalysis data. The accuracy of the two sets of forecasted power output was evaluated using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Results show that, with the proposed gap-filling method and SWR as the radiation input for forecasting solar photovoltaic (PV) output, RMSE improves by 12.52% to 24.30% and MAE improves by 21.10% to 31.31%. With SSRD, RMSE improves by 14.01% to 28.54% and MAE improves by 22.39% to 35.53%. To further assess its accuracy, the proposed method could be validated using different datasets and other forecasting methods. Future studies could also apply the method to datasets in which more than 20% of the data are missing.
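The evaluation setup described above can be illustrated with a minimal sketch. It covers only the baseline column mean imputation that the proposed method extends, because the modification itself is not specified here. The synthetic `power` series, the function names, and the definition of improvement as the relative reduction in error are all assumptions made for illustration, not details taken from the study.

```python
import numpy as np

def mask_random(values, fraction, rng):
    """Return a copy of a 1-D series with `fraction` of its entries set to NaN."""
    out = values.astype(float).copy()
    n_missing = int(round(fraction * out.size))
    idx = rng.choice(out.size, size=n_missing, replace=False)
    out[idx] = np.nan
    return out

def mean_impute(values):
    """Baseline column mean imputation: fill NaNs with the mean of observed entries."""
    out = values.copy()
    out[np.isnan(out)] = np.nanmean(values)
    return out

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def improvement_pct(baseline_error, new_error):
    """Relative error reduction in percent (assumed definition of 'improvement')."""
    return 100.0 * (baseline_error - new_error) / baseline_error

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a PV power output series; not the study's data.
    power = np.clip(np.sin(np.linspace(0.0, 20.0 * np.pi, 1000)), 0.0, None) * 100.0
    for fraction in (0.05, 0.10, 0.15, 0.20):
        gappy = mask_random(power, fraction, rng)
        filled = mean_impute(gappy)
        print(f"{fraction:.0%} missing: RMSE={rmse(power, filled):.2f}, "
              f"MAE={mae(power, filled):.2f}")
```

In this sketch the errors are computed between the original series and its gap-filled version. The study instead reports RMSE and MAE on forecasted power output, so `improvement_pct` would be applied to the forecast errors obtained with and without the gap-filling step.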
