High-precision wind power prediction provides an important reference for the optimal dispatch and stable operation of the power system. This paper proposes an adaptive hybrid optimization algorithm that integrates decomposition and reconstruction to effectively explore the latent characteristics and related factors of wind power output and improve the accuracy of short-term wind power prediction. First, extreme-point symmetric mode decomposition is used to analyze the periodicity, trend, and abrupt-change characteristics of the original wind power sequence and form multiple intrinsic mode functions with local time-domain characteristics. Then, considering the similarity of the feature sequences and the efficiency of the prediction algorithm, permutation entropy is used to reconstruct the components with close time-domain characteristics into subsequences that reflect different spectral characteristics. Next, the improved maximum relevance minimum redundancy-long short-term memory-adaptive boosting (mRMR-LSTM-Adaboost) model is used to determine the prediction model structure, parameters, and optimal feature factors for each subsequence. Finally, the prediction results of the subsequences are integrated to obtain the final wind power. Taking a wind farm in northern Shaanxi as the application object, the prediction accuracy and efficiency of the proposed method are compared in terms of the decomposition method, prediction model, and prediction timeliness. The results show that in the 15 min to 3 h forecast periods, compared with other models, the mean absolute error and root mean square error of the proposed model are reduced. Moreover, as the forecast period grows, the superiority of the proposed method becomes more prominent.
I. INTRODUCTION
As China pays increasing attention to clean energy, the proportion of wind power in the power system has gradually increased. However, the inherent uncertainty of wind power may affect the safety, stability, and economic operation of power systems. Therefore, it is particularly important to improve the accuracy of short-term wind power prediction.1 More accurate wind power prediction results provide a reference for the dispatching department to adjust the dispatch plan in time and formulate a reasonable control strategy, which is conducive to reducing the spinning reserve capacity of the grid, lowering power generation costs, and improving the safety of wind power integration.2
Commonly used wind power prediction methods include the neural network method,3–5 the time series method,6,7 the support vector machine method,8–10 and other machine learning methods, which have the advantages of simple modeling and rapid calculation. Although these machine learning methods can predict wind power in the short term, their accuracy is not high enough. With the rise of deep learning, Long Short-Term Memory (LSTM) has gradually been applied to wind power prediction,11 overcoming the inability of traditional neural networks to learn long-range dependencies and improving forecast accuracy. Therefore, this paper builds on the advantages of the LSTM network to establish an effective wind power prediction model.
Due to the randomness and uncertainty of wind power, it is difficult to obtain satisfactory results using only a single prediction method. To solve this problem, a hybrid model combining different techniques is usually used to improve the prediction accuracy. So far, combining a data decomposition strategy with a prediction technique has achieved better results. Commonly used time-frequency decomposition methods mainly include empirical mode decomposition (EMD),12,13 ensemble empirical mode decomposition (EEMD),14,15 variational mode decomposition (VMD),16,17 and so on. Among them, EMD is highly adaptive, but it suffers from the end effect and modal aliasing. Although EEMD reduces the influence of modal aliasing by adding white noise, repeated noise processing may change the original signal and affect the accuracy of the algorithm. VMD recursively decomposes the signal into a series of sequences with natural frequencies, effectively suppressing noise, but the parameter K (the number of intrinsic mode functions) needs to be determined in advance, and empirical values are used in most cases. Therefore, this paper adopts the Extreme-point Symmetric Mode Decomposition (ESMD) method. ESMD creatively proposed the Direct Interpolation (DI) method for data, which basically solves the frequency-crossover problem. Starting from the signal characteristics, it borrows the idea of least squares to optimize the final remaining mode into the "adaptive global moving average" of the entire data, from which the optimal number of screenings is determined, making the trend and fluctuation characteristics of the time series more evident.18,19 Currently, ESMD has been used in many research fields, such as climate prediction20 and fault diagnosis,21 but few studies apply it to wind power prediction. Addressing the problem of similar components arising in the signal decomposition process, Ref. 22 used sample entropy (SE) to reconstruct components with similar complexity. However, SE produces inaccurate estimates when processing short time series, so combining it with ESMD would not enhance the advantage of the decomposition. Therefore, permutation entropy (PE) is used for reconstruction to enhance the periodic characteristics of the components and reduce the prediction time.
Since wind power is affected by factors such as wind speed, wind direction, air pressure, and temperature, considering only the original wind power data and ignoring these factors reduces prediction accuracy. Moreover, each subcomponent after decomposition and reconstruction may be affected by different factors; simply adding all factors to the prediction of each component may be counterproductive, decreasing both calculation accuracy and speed. Reference 23 used the distance correlation coefficient (DCC) to analyze the non-linear correlation between wind power and numerical weather prediction (NWP). Reference 24 used principal component analysis (PCA) to extract the principal components of the influencing factors but did not analyze the correlation between these factors and wind power. Reference 25 used mutual information (MI) to analyze the correlation between input features and wind power, but this method does not consider the relationships among the features themselves, which increases the number of input features. To overcome these shortcomings, maximum relevance minimum redundancy (mRMR) is applied to wind power prediction, which ensures that the input features have greater relevance and less redundancy and improves the prediction efficiency.
In response to the above problems, this paper proposes a new hybrid short-term wind power forecasting method, which uses ESMD-PE to decompose and reconstruct the original wind power sequence, uses mRMR to select input features for each sub-sequence, and uses LSTM-Adaboost for prediction. The main novelties of this article are as follows:
Applying ESMD to the data decomposition of wind power prediction, effectively extracting some nonlinear features and hidden trends in the original wind power sequence, and reducing the complexity and non-stationarity of the original sequence.
Using PE to reconstruct the decomposed components, which enhances the periodic characteristics and reduces the calculation time.
Using the adaptive boosting (Adaboost) integrated algorithm to optimize the LSTM model to improve the prediction accuracy.
A novel hybrid forecasting model (ESMD-PE and mRMR-LSTM-Adaboost) is proposed and used for short-term wind power forecasting over different forecast periods. A number of experiments and comparative analyses were carried out to evaluate the effectiveness of the method in different forecast periods.
The main structure of this paper is as follows: Sec. II briefly introduces the theory, including ESMD-PE, mRMR algorithm, and LSTM-Adaboost model; Sec. III introduces the prediction model and prediction performance evaluation index; Sec. IV carries out case study applications and analyzes the results; and the research conclusions are given in Sec. V.
II. METHODS
A. Sequence decomposition—ESMD
ESMD is a method proposed by Wang and Li in 2014 for non-linear and non-stationary signal decomposition.18 It is a new development of the Hilbert–Huang transform. The method consists of two parts: the first is modal decomposition, which generates several modes and an optimal adaptive global average, and the second is time-frequency analysis. Borrowing the idea of least squares, the final remaining mode is optimized into the "adaptive global moving average" of the entire data, from which the optimal number of screenings is determined. Considering that all integral transforms, including the Hilbert transform, have inherent defects in analyzing time-frequency changes, the "direct interpolation (DI) method" for data was creatively proposed, which can intuitively reflect the time variation of each mode's amplitude and frequency. ESMD considers only one-dimensional observation data and treats the original wind power data sequence as a non-linear and non-stationary signal. The specific decomposition process is as follows:
Step 1: Find all extreme points of sequence P (including maximum and minimum points).
Step 2: Connect adjacent extreme points with line segments and mark the midpoints of the line segments as F1, F2, …, Fn−1.
Step 3: Supplement the midpoints F0 and Fn of the left and right borders in a certain way.
Step 4: Use the obtained n + 1 midpoints to construct p interpolation lines L1, L2, …, Lp (p ≥ 1).
Step 5: Calculate their mean curve L* = (L1 + L2 + ⋯ + Lp)/p.
Step 6: Repeat Steps 1–5 on P − L* until |L*| ≤ ɛ (ɛ is the preset allowable error) or the number of screenings reaches the preset maximum; at this point, the first empirical mode M1 is obtained.
Step 7: Repeat Steps 1–6 on the residual to obtain M2, M3, …, until the final margin R retains only a preset number of extreme points.
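As an illustration only, Steps 1–5 can be sketched as a single simplified sifting pass. This sketch makes two assumptions that differ from full ESMD: it uses one linear interpolation curve instead of the p curves of Step 4, and it clamps the boundary midpoints instead of applying ESMD's actual boundary rule in Step 3.

```python
import numpy as np

def sift_once(p):
    """One simplified ESMD-style sifting pass: find interior extrema, take
    the midpoints of segments joining adjacent extrema, interpolate a mean
    curve L*, and subtract it from the signal."""
    p = np.asarray(p, dtype=float)
    d1, d2 = np.diff(p)[:-1], np.diff(p)[1:]
    ext = np.where(d1 * d2 < 0)[0] + 1            # Step 1: interior extrema
    xm = (ext[:-1] + ext[1:]) / 2.0               # Step 2: segment midpoints
    ym = (p[ext[:-1]] + p[ext[1:]]) / 2.0
    xm = np.concatenate([[0.0], xm, [len(p) - 1.0]])   # Step 3 (simplified):
    ym = np.concatenate([[ym[0]], ym, [ym[-1]]])       #   clamped boundaries
    mean_curve = np.interp(np.arange(len(p)), xm, ym)  # Steps 4-5: one curve
    return p - mean_curve                         # candidate mode

# Fast oscillation riding on a slow trend; one pass recovers the fast mode.
t = np.linspace(0.0, 1.0, 200)
signal = np.sin(2 * np.pi * 20 * t) + t
mode = sift_once(signal)
```

Iterating this pass on the residual, as in Steps 6 and 7, peels modes off from fast to slow until only the margin R remains.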
B. Partial reconstruction—PE
Entropy can characterize the complexity of signals and the uncertainty of measurement information and is suitable for processing non-linear problems. Permutation entropy detects dynamic changes in a time series by comparing adjacent values. Using ESMD to decompose the original sequence produces multiple IMFs, some of which have similar randomness. PE is used to calculate the entropy value of each IMF component, and sequences with similar entropy values are reconstructed, which reduces the complexity of the model and improves the prediction speed. The algorithm is as follows:
Step 1: Regard the obtained component as a time series X = {x(1), x(2), …, x(N)} and perform phase-space reconstruction to obtain the matrix

Y =
x(1)    x(1 + t)    …    x(1 + (m − 1)t)
x(2)    x(2 + t)    …    x(2 + (m − 1)t)
⋮
x(n)    x(n + t)    …    x(n + (m − 1)t),

where m is the embedding dimension, t is the delay time, and n = N − (m − 1)t. Each row of Y is a reconstruction component, and there are n reconstruction components in total.
Step 2: Rearrange the elements of each reconstruction component in ascending order and record the column index of each element's original position, forming a symbol sequence S(l) = (j1, j2, …, jm) for each row l = 1, 2, …, n.
An m-dimensional phase-space mapping has at most m! different symbol sequences.
Step 3: Calculate the probability Pg of each symbol sequence as its number of occurrences divided by the total number n of symbol sequences, so that P1 + P2 + ⋯ + Pk = 1, where k ≤ m! is the number of distinct symbol sequences that occur.
Step 4: The permutation entropy of the time series X is calculated as

Hp(m) = −Σ (g = 1 to k) Pg ln Pg. (1)
Step 5: The maximum permutation entropy is ln(m!), attained when all Pg are equal, so the entropy is normalized as

Hp = Hp(m)/ln(m!). (2)
The size of the permutation entropy value indicates the degree of randomness of time series X: the smaller the entropy value, the simpler and more regular the time series; conversely, the larger the entropy value, the more complex and random the time series.
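Steps 1–5 above amount to counting ordinal patterns; a minimal pure-Python sketch:

```python
import math

def permutation_entropy(x, m=3, t=1):
    """Normalized permutation entropy of sequence x with embedding
    dimension m and delay t: 0 for a perfectly regular series,
    approaching 1 for a fully random one."""
    n = len(x) - (m - 1) * t               # number of reconstruction vectors
    counts = {}
    for i in range(n):
        window = x[i:i + (m - 1) * t + 1:t]
        # ordinal pattern: the column indices that sort the window ascending
        pattern = tuple(sorted(range(m), key=lambda k: window[k]))
        counts[pattern] = counts.get(pattern, 0) + 1
    # -sum(Pg ln Pg), written with ln(n/c) to keep the terms non-negative
    h = sum((c / n) * math.log(n / c) for c in counts.values())
    return h / math.log(math.factorial(m))   # divide by the maximum ln(m!)

# A monotone series produces a single ordinal pattern, so its entropy is 0.
print(permutation_entropy(list(range(100))))  # → 0.0
```

Components whose entropy values are close under this measure are the ones merged in the reconstruction step.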
C. Feature selection—mRMR
The mRMR algorithm is a filter-type feature selection method. It balances relevance and redundancy, using mutual information as the criterion to measure the redundancy among features and the dependence between features and the target variable.
The principle of maximum relevance refers to selecting the features most strongly correlated with the target: the greater the relevance, the stronger the problem-solving ability of the trained model. The maximum relevance criterion is expressed as

max D(S, c), D = (1/|S|) Σ (xi ∈ S) I(xi; c),

where xi is the ith feature, c is the target (class) variable, I(·; ·) denotes mutual information, and S is the feature subset.
Since greater correlation between features implies higher redundancy, the redundancy between features should be minimized so that each selected feature remains representative; this is the principle of minimum redundancy, expressed as

min R(S), R = (1/|S|²) Σ (xi, xj ∈ S) I(xi; xj).

The two criteria are combined by selecting, at each step, the feature that maximizes D − R.
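The greedy incremental search behind the D − R criterion can be sketched as follows for discretized features; the toy data and feature names are illustrative, not from the paper.

```python
import math
from collections import Counter

def mutual_info(a, b):
    """Mutual information I(a; b) in nats for two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def mrmr_select(features, target, k):
    """Greedy incremental mRMR: repeatedly pick the feature maximizing
    relevance I(f; target) minus mean redundancy with features already chosen."""
    chosen, remaining = [], dict(features)
    while remaining and len(chosen) < k:
        def score(name):
            rel = mutual_info(remaining[name], target)
            red = (sum(mutual_info(remaining[name], features[s]) for s in chosen)
                   / len(chosen)) if chosen else 0.0
            return rel - red
        best = max(remaining, key=score)
        chosen.append(best)
        del remaining[best]
    return chosen

# Toy data: f2 duplicates f1 exactly, f3 carries complementary information,
# so mRMR picks f1 first and then prefers f3 over the redundant f2.
b1 = [0] * 10 + [1] * 10
b2 = ([0] * 5 + [1] * 5) * 2
target = [2 * u + v for u, v in zip(b1, b2)]
print(mrmr_select({"f1": b1, "f2": b1, "f3": b2}, target, 2))  # → ['f1', 'f3']
```

The redundancy penalty is what separates mRMR from plain mutual-information ranking, which would happily keep both duplicated features.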
D. Optimizing prediction—LSTM-Adaboost
1. LSTM
The long short-term memory network (LSTM) is an efficient RNN architecture introduced by Hochreiter and Schmidhuber in 1997 and refined by many researchers since. LSTM was designed to overcome the vanishing-gradient problem that arises when a standard RNN handles long-term dependencies. In a standard RNN, the entire network is a chain of repeated modules, each with a simple structure such as a single sigmoid layer. Compared with a standard RNN, the hidden layer of the LSTM has a more complex structure. Specifically, the LSTM introduces gates and memory cells in each hidden layer. The memory block is mainly composed of four parts: an input gate, a forget gate, an output gate, and a self-connected memory cell. The internal structure of the LSTM is shown in Fig. 1.
2. Adaboost
The Adaboost algorithm was proposed in 1995. Its basic idea is to train different weak learners on the same training sample set and combine the trained weak learners into a strong learner. Adaboost constructs sample diversity by changing the sample distribution: the weight of each sample is determined by whether its prediction in the previous training round was correct and by the accuracy of that round. Samples with larger prediction errors from the previous weak learner have their weights strengthened, and the reweighted samples are used to train the next weak learner. A new weak learner is added in each round of training until the predetermined minimum error rate or the maximum allowable number of iterations is reached.
3. LSTM-Adaboost
The LSTM-Adaboost model is constructed as follows; the overall process is shown in Fig. 2.
Input: training samples (x1, y1), (x2, y2), …, (xn, yn), where xi is the input data, yi is the target value, and n is the number of samples.
Output: the final strong predictor.
Initialize the weights. The initial weight distribution of the samples is D1 = (w11, w12, …, w1n), with w1i = 1/n.
The number of iterations is represented by k = 1, 2, …, K.
At the k-th iteration, the weight distribution is Dk, and the weak predictor hk is obtained by LSTM training.
Calculate the prediction error of the weak predictor hk on each training sample: eki = |hk(xi) − yi|/Ek, where Ek = max i |hk(xi) − yi|, so that eki lies in the interval [0, 1].
Calculate the total error on the training samples: εk = Σ (i = 1 to n) wki eki.
Calculate the coefficient of the current LSTM weak predictor: βk = εk/(1 − εk).
Update the weight distribution of the training set: wk+1,i = (wki/Zk) βk^(1 − eki), where Zk = Σi wki βk^(1 − eki) is a normalization factor.
At the end of the loop, save the weight of each weak predictor: αk = ln(1/βk), normalized so that Σk αk = 1.
Build the strong predictor: H(x) = Σ (k = 1 to K) αk hk(x).
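The loop above can be sketched as follows. For runnability, the LSTM weak predictor is replaced by a stand-in weighted-mean learner, and the error normalization and β update follow the AdaBoost.R2 convention, which matches the [0, 1] error interval stated above; this is a sketch under those assumptions, not the paper's exact implementation.

```python
import math

def adaboost_r2(X, y, train_weak, K=3):
    """Boosting loop: reweight samples by normalized prediction error and
    combine the K weak predictors with weights proportional to ln(1/beta_k)."""
    n = len(y)
    w = [1.0 / n] * n                               # initial distribution D1
    models, betas = [], []
    for _ in range(K):
        h = train_weak(X, y, w)                     # weak predictor h_k
        err = [abs(h(x) - t) for x, t in zip(X, y)]
        emax = max(err) or 1.0
        e = [d / emax for d in err]                 # e_ki scaled into [0, 1]
        eps = sum(wi * ei for wi, ei in zip(w, e))  # weighted total error
        eps = min(max(eps, 1e-9), 0.499)            # guard: keep beta in (0, 1)
        beta = eps / (1.0 - eps)
        w = [wi * beta ** (1.0 - ei) for wi, ei in zip(w, e)]
        z = sum(w)
        w = [wi / z for wi in w]                    # renormalize the weights
        models.append(h)
        betas.append(beta)
    alphas = [math.log(1.0 / b) for b in betas]
    asum = sum(alphas)
    return lambda x: sum(a * h(x) for a, h in zip(alphas, models)) / asum

# Stand-in weak learner: predicts the weighted mean of the targets.
def weighted_mean_learner(X, y, w):
    c = sum(wi * t for wi, t in zip(w, y))
    return lambda x: c
```

In the paper's model each `train_weak` call trains an LSTM on the reweighted samples; the combination step is unchanged.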
III. FORECAST MODEL AND EVALUATION INDEX
A. A novel hybrid model based on ESMD-PE and mRMR-LSTM-Adaboost
Based on the above analysis, this section proposes a novel hybrid model based on ESMD-PE and mRMR-LSTM-Adaboost. The overall framework is shown in Fig. 2. The specific process is as follows:
Part 1: Sequence decomposition. Regard the original wind power sequence (training data) as a non-linear and non-stationary signal and decompose it into a number of relatively stable and distinctive IMFs and an optimal adaptive global average (R).
Part 2: Partial reconstruction. According to the entropy value of each IMF and R, each sequence with a similar entropy value is reconstructed to obtain the reconstructed subsequence.
Part 3: Feature selection. According to the mRMR value of the features in each subsequence, the input feature set of each component is established to reduce the mutual influence between different features while ensuring the correlation, and the ten-fold cross-validation is used to obtain the optimal input feature.
Part 4: Optimizing prediction. Use Adaboost to optimize the LSTM to obtain the LSTM-Adaboost prediction model, optimize the parameters of each subsequence prediction model, and superimpose the prediction results of the subsequences to obtain the final prediction result.
B. Evaluation index
This paper selects the mean absolute error (MAE), root mean square error (RMSE), and relative root mean square error (RRMSE) as the criteria for evaluating the prediction accuracy of each model:

MAE = (1/N) Σ (i = 1 to N) |yi − ŷi|,

RMSE = sqrt[(1/N) Σ (i = 1 to N) (yi − ŷi)²],

RRMSE = (RMSE/ȳ) × 100%,

where N is the number of samples, yi is the measured value, ŷi is the predicted value, ȳ is the mean measured value, and RRMSE represents the accuracy of the model. According to the metric in Ref. 26, the accuracy of the model is considered excellent at RRMSE < 10%, good at 10% < RRMSE < 20%, fair at 20% < RRMSE < 30%, and bad at RRMSE > 30%.
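The three indices can be written directly; normalizing RRMSE by the mean measured value is an assumption consistent with the definitions above.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rrmse(y_true, y_pred):
    """RMSE relative to the mean measured value, in percent."""
    return 100.0 * rmse(y_true, y_pred) / (sum(y_true) / len(y_true))

measured, predicted = [10.0, 20.0, 30.0], [12.0, 18.0, 33.0]
print(round(mae(measured, predicted), 3))    # → 2.333
print(round(rrmse(measured, predicted), 1))  # → 11.9
```

With these toy numbers the RRMSE of 11.9% would fall in the "good" band of Ref. 26's scale.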
IV. CASE ANALYSIS
To verify the usability and practicability of the proposed model, this paper selects data (collected every 15 min) from a wind farm in northern Shaanxi, China, from September to October 2009. The dataset contains 3600 samples in total. The rated installed capacity of the wind farm is 49.5 MW. The climate in this area is temperate continental, and the elevation of the wind farm is between 1100 and 1370 m. Due to the special topography of the Loess Plateau in northern Shaanxi and the influence of atmospheric circulation, the region has abundant wind energy resources. The basic information of the wind farm is shown in Table I. To improve the generalization ability of the network, some hyperparameters are adjusted based on the verification set. 60% of the total dataset (i.e., 2160 samples) is used as the training set, 20% (i.e., 720 samples) as the verification set, and the remaining 20% (i.e., 720 samples) as the test set,27 and the specific division is shown in Fig. 3.
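The chronological 60/20/20 split described above can be sketched as:

```python
def chronological_split(series, train=0.6, val=0.2):
    """Split a time series in order (no shuffling, so no future leakage)
    into training, validation, and test segments."""
    n = len(series)
    i = round(n * train)
    j = i + round(n * val)
    return series[:i], series[i:j], series[j:]

data = list(range(3600))          # stand-in for the 3600 wind power samples
tr, va, te = chronological_split(data)
print(len(tr), len(va), len(te))  # → 2160 720 720
```

Keeping the split chronological matters for forecasting: shuffling before splitting would let the model see samples from the future of the test period during training.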
Basic information of a wind power plant in northern Shaanxi.
Index | Maximum value | Minimum value | Average value | Standard deviation
---|---|---|---|---|
Wind power (MW) | 46.5 | 6.45 | 26.47 | 20.03 |
Wind speed (m/s) | 10.9 | 5.4 | 8.15 | 2.75 |
Wind direction (deg) | 349 | 7 | 172.89 | 74.13 |
Temperature (°C) | 30.80 | 12.68 | 20.91 | 3.72 |
Air pressure (kPa) | 82.60 | 81.60 | 82.06 | 0.20 |
Time series of the wind power value of training, validation, and testing sets.
A. Sequence decomposition and reconstruction based on ESMD and PE
According to the prediction model established in Sec. III A, ESMD is used to decompose the training data (including the training set and the verification set, a total of 2880 datasets) to reduce the complexity and fluctuation of the wind energy sequence. The decomposed sequence has a more significant trend and periodicity. The optimal number of screenings needs to be determined before decomposition, and the screening results are shown in Fig. 4.
It can be seen from Fig. 4 that the variance ratio is smallest when the number of screenings is 22; therefore, the optimal number of screenings is 22. With this number of screenings, the training data are decomposed by ESMD into 9 IMFs and a margin R. The decomposition result is shown in Fig. 5, and the amplitude and frequency fluctuations are shown in Figs. 6 and 7, respectively.
It can be seen from Fig. 5 that the components decomposed by ESMD are independent of each other and that the margin R reflects the overall trend of wind power. Figure 6 shows that the overall fluctuation range increases from A1 to A7 and decreases from A8 to A9. Examining the amplitude changes of each IMF separately, A1–A3 all change rapidly, within 5–50 min, with large fluctuations and no obvious periodicity. The changes of A4–A7 are slower, smaller, and relatively more regular. A8 and A9 fluctuate little and have small amplitudes, reflecting the long-term trend of wind power. Figure 7 shows that the overall fluctuation range gradually decreases from F1 to F9. F1 varies little over the original sequence, and its changes should be mainly related to meteorological factors. The frequency trends of F2 and F3 are very similar and should be affected by similar factors. F4 and F5 vary more than the other components, and the curves of IMF4 and IMF5 are closer to the original sequence. F6–F9 change very little, within a range of 0.2 units, and the time intervals between changes are relatively large.
In summary, the stability of the sequence from IMF1 to IMF9 has gradually increased, and the fluctuations have gradually decreased. The proportion of IMFs with large frequency changes in the original sequence is very small, that is to say, the violent oscillation of the high frequency part will not affect the overall trend of the original sequence. On the contrary, the low frequency part with a smooth and stable trend occupies a large proportion in the original sequence. It can be seen that the decomposed sequence is more stable than the original sequence, which is more conducive to the feature extraction and related prediction analysis of the related sequence.
In this paper, the permutation entropy of each modal function is calculated and the modal function is reconstructed according to the entropy value. The entropy value of each sequence is shown in Fig. 8.
Based on the above analysis, the sequences with similar entropy values are superimposed. That is to say, IMF1, IMF4, and IMF5 are superimposed and reconstructed as CIMF1, IMF2–3 are superimposed and reconstructed as CIMF2, IMF6–8 are superimposed and reconstructed as CIMF3, and IMF9 and R are superimposed and reconstructed as CIMF4. The reconstruction result is shown in Fig. 9.
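The merging rule can be sketched as grouping components whose entropy values lie within a tolerance of each other; the entropy values and the 0.1 tolerance below are illustrative, not the paper's.

```python
def group_by_entropy(entropies, tol=0.1):
    """Group component indices whose permutation entropy values differ by
    less than `tol`; each group is then summed into one reconstructed CIMF."""
    order = sorted(range(len(entropies)), key=lambda i: entropies[i])
    groups, current = [], [order[0]]
    for i in order[1:]:
        if entropies[i] - entropies[current[-1]] < tol:
            current.append(i)      # close to the previous entropy: same group
        else:
            groups.append(current) # gap exceeds tolerance: start a new group
            current = [i]
    groups.append(current)
    return groups

# e.g. hypothetical entropy values for IMF1..IMF5 (index 0 = IMF1)
print(group_by_entropy([0.95, 0.93, 0.60, 0.58, 0.20]))  # → [[4], [3, 2], [1, 0]]
```

Summing the members of each group yields the reconstructed subsequences (CIMF1–CIMF4 in this paper's case).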
It can be seen from Fig. 9 that the reconstructed subsequences differ greatly in their trends. CIMF1 and CIMF2 fluctuate considerably without obvious periodicity and may be affected by factors that change markedly over time (such as wind speed); the fluctuation frequency of CIMF2 is much higher than that of CIMF1, suggesting that it is affected by more factors. CIMF3 changes relatively slowly but with obvious periodicity, roughly corresponding to wind power fluctuations with an average period of about 10 days; this behavior may be driven by meteorological factors such as temperature or air pressure. CIMF4 shows the monthly variation trend of the sequence, indicating that it is affected more strongly by meteorological factors. The above analysis shows that the ESMD-PE data processing method can separate wind power variation trends of different periods from the observation sequence: the faster-changing subsequences may be affected by wind speed and similar factors, the subsequence with obvious periodicity may be related to meteorological factors, and the long-term variation trend can be separated from a long wind power observation sequence, which may be related to climate change.
B. Feature selection based on mRMR
According to the above analysis, each sequence after decomposition and reconstruction may be affected by different factors. Therefore, this section will use mRMR to select the feature set of each component. The initial feature set F and its representation method are shown in Table II, and the specific steps of feature selection are shown in Fig. 10.
Feature name and representation method.
Feature name | Representation method
---|---|
Temperature | T |
Air pressure | P |
Wind direction | D |
Wind speed | S |
Historical wind speed | St |
Historical wind power | Pt |
The wind power data in this paper are collected every 15 min. In Table II, St and Pt represent the wind speed and wind power t min earlier; for example, S15 represents the wind speed value 15 min earlier, and P45 represents the wind power value 45 min earlier. In addition, the wind speed, air pressure, and temperature are measured values.
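Building the historical features Pt and St amounts to pairing each target with lagged values of the series; a sketch, where the lag set and the toy values are illustrative:

```python
def lagged_features(power, speed, lags=(1, 2, 3, 4), step_min=15):
    """Build historical features Pt/St: the value `lag` steps (lag*15 min)
    back. Returns one feature row per predictable step, aligned with the
    target wind power value."""
    maxlag = max(lags)
    rows, targets = [], []
    for i in range(maxlag, len(power)):
        row = {}
        for lag in lags:
            row[f"P{lag * step_min}"] = power[i - lag]   # e.g. P15, P30, ...
            row[f"S{lag * step_min}"] = speed[i - lag]   # e.g. S15, S30, ...
        rows.append(row)
        targets.append(power[i])
    return rows, targets

power = [10, 12, 11, 14, 13, 15]
speed = [5.0, 5.5, 5.2, 6.0, 5.8, 6.1]
rows, targets = lagged_features(power, speed)
print(rows[0]["P15"], rows[0]["P60"], targets[0])  # → 14 10 13
```

The first `maxlag` samples cannot be predicted because their full history is unavailable, which is why the rows start at index `maxlag`.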
After the feature matrix is established, the incremental search method is used to establish the candidate feature set for CIMF1–CIMF4, and the mRMR value of each feature is calculated. The results of the mRMR value of each feature in descending order are shown in Table III.
Results of the mRMR value of each feature in descending order.
Sort | CIMF1 | CIMF2 | CIMF3 | CIMF4
---|---|---|---|---|
1 | P45 (4.74) | P30 (6.79) | T (3.79) | T (7.19) |
2 | S30 (4.69) | P45 (6.65) | P30 (3.38) | D (6.78) |
3 | P30 (4.65) | P15 (6.61) | D (3.30) | P (6.29) |
4 | P60 (4.48) | P60 (6.52) | P (3.24) | S60 (2.42) |
5 | P15 (3.85) | S30 (6.32) | P15 (3.16) | S15 (2.32)
6 | S45 (3.84) | S45 (6.23) | P60 (2.98) | P45 (2.13) |
7 | S15 (3.70) | S15 (6.18) | P45 (2.40) | S60 (2.08) |
8 | S60 (3.62) | S60 (6.13) | S30 (2.32) | P60 (1.92) |
9 | S (1.09) | S (5.38) | S45 (2.24) | S45 (1.84) |
10 | T (0.86) | D (2.50) | S15 (2.18) | S30 (1.71) |
11 | P (0.43) | P (2.38) | S60 (2.06) | P15 (1.65) |
12 | D (0.40) | T (2.18) | S (1.13) | S (0.94) |
In Table III, for CIMF1 and CIMF2, the mRMR values of the historical wind power and historical wind speed rank near the top and are much larger than the others, whereas for CIMF3 and CIMF4, the temperature, wind direction, and air pressure rank higher.
Ten-fold cross-validation28 is used to obtain the best input feature set of CIMF1–CIMF4, as shown in Table IV.
Input feature set of CIMF1–CIMF4.
Subcomponent | Input feature set
---|---|
CIMF1 | P45, S30, P30, P60 |
CIMF2 | P30, P45, P15, P60, S30, S45, S15, S60, S |
CIMF3 | T, P30, D, P |
CIMF4 | T, D, P |
It can be seen from Table IV that the input feature sets of CIMF1 and CIMF2 mainly include the historical wind power and historical wind speed, and that of CIMF2 also includes the current wind speed. Wind speed is closely related to wind power: its uncertainty and intermittency give wind power strong randomness and fluctuation. CIMF3 and CIMF4 mainly include meteorological features such as temperature, wind direction, and air pressure. Wind speed and wind direction are related to climate, and the horizontal pressure-gradient force is the direct cause of wind formation. Therefore, CIMF3 and CIMF4 reflect the variation trend of wind power over different time scales.
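Selecting the best prefix of the mRMR-ranked features by cross-validated error can be sketched as follows; `cv_error` is a hypothetical stand-in for training and validating the model ten-fold on a given feature subset, and the toy error surface is illustrative.

```python
def best_feature_prefix(ranked, cv_error):
    """Try the top-1, top-2, ... prefixes of the mRMR-ranked feature list
    and keep the prefix with the lowest cross-validated error."""
    best, best_err = None, float("inf")
    for k in range(1, len(ranked) + 1):
        err = cv_error(ranked[:k])     # ten-fold CV of the model (stand-in)
        if err < best_err:
            best, best_err = ranked[:k], err
    return best

# Toy error surface: validation error is minimized with the top 4 features.
print(best_feature_prefix(list("ABCDEFGH"), lambda s: abs(len(s) - 4)))
# → ['A', 'B', 'C', 'D']
```

Because mRMR already orders the features, only the prefix length needs to be cross-validated, not every subset.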
C. Hyperparameter adjustment of LSTM-Adaboost
Before applying the LSTM-Adaboost model for prediction, the parameters are adjusted based on the performance of the verification set.
Adam is selected as the optimizer in the LSTM, the maximum number of training epochs is set to 20, the gradient threshold is set to 1, and the initial learning rate is set to 0.005; after 5 epochs, the learning rate is reduced by 80%.29 The optimal input feature vector of each subsequence is determined by the tailing property of the PACF and is set to the three previous points of each component and their corresponding features.30 CIMF1 is taken as an example to study the influence of the LSTM hyperparameters on the model.
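The learning-rate setting can be expressed as a schedule; this sketch assumes the 80% reduction recurs every 5 epochs (the text could also mean a single drop after epoch 5):

```python
def learning_rate(epoch, initial=0.005, drop=0.2, period=5):
    """Piecewise-constant schedule: start at `initial` and multiply by
    `drop` (an 80% reduction) every `period` epochs."""
    return initial * drop ** (epoch // period)

# Rates at epochs 0, 5, 10, 15 over the 20-epoch training run.
print([round(learning_rate(e), 6) for e in range(0, 20, 5)])
# → [0.005, 0.001, 0.0002, 4e-05]
```

Such a schedule would typically be passed to the Adam optimizer's learning-rate callback in whichever framework is used; the function itself is framework-agnostic.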
The dropout value affects the model's prediction accuracy. It indicates the strength of dropout used to prevent overfitting: the larger the value, the greater the probability that each neuron's output is set to 0. Dropout is applied between layers of the model, not between neurons within the same layer. The MAE and RMSE values obtained when the dropout value is set to 0, 0.3, 0.5, and 0.7 are shown in Fig. 11.
It can be seen from Fig. 11 that setting the dropout value to 0.3 gives the best results, improving the generalization ability of the model and the prediction accuracy. When the dropout value becomes very large, the capacity of the model is reduced, weakening its ability to fit the relationships between features, and underfitting occurs.
Regarding the number of hidden layers and hidden layer nodes, increasing the number of hidden layers causes a rapid increase in time cost. Considering the strict timeliness requirements of ultra-short-term and short-term wind power prediction, one and two hidden layers are considered.31 The number of hidden layer nodes is directly related to the complexity of the solution and the output accuracy: too many nodes degrade the generalization ability of the network and may even cause "overfitting," while too few nodes reduce the training ability.32 Therefore, the trial-and-error method33 is used to evaluate networks with different numbers of nodes and find a better network structure for each component. The number of nodes in each hidden layer is set to 50, 100, 150, and 200 in turn.
First, a single-hidden-layer LSTM model with 16 input nodes and 1 output node is initialized. Each configuration of hidden layer nodes is run three times, and the average value of each performance index is calculated; the results are shown in Table V.
Prediction results of the CIMF1 component of the single hidden layer with different numbers of nodes.
Network structure | MAE | RMSE
---|---|---|
16-50-1 | 1.12 | 1.35 |
16-100-1 | 1.02 | 1.15 |
16-150-1 | 0.95 | 1.10 |
16-200-1 | 0.86 | 1.06 |
It can be seen from Table V that the learning performance of the model is best when hidden layer 1 has 200 nodes. On this basis, a double-hidden-layer LSTM model is trained, and the optimal number of nodes in hidden layer 2 is determined in turn. The prediction performance is shown in Table VI.
Prediction results of the CIMF1 component of the double hidden layers with different numbers of nodes.
| Number of hidden layer nodes | MAE | RMSE |
|---|---|---|
| 16-200-50-1 | 0.78 | 1.01 |
| 16-200-100-1 | 0.73 | 0.92 |
| 16-200-150-1 | 0.71 | 0.90 |
| 16-200-200-1 | 0.70 | 0.89 |
It can be seen from Table VI that with 50, 100, 150, or 200 nodes in hidden layer 2, the performance indicators are all better than those of the single hidden layer. Since adding nodes slows down execution, 100 nodes are selected for hidden layer 2. For CIMF1, a four-layer LSTM structure of "16-200-100-1" is therefore adopted (input layer: 16 nodes, hidden layer 1: 200 nodes, hidden layer 2: 100 nodes, and output layer: 1 node).
After the number of LSTM network layers, the number of nodes, and the remaining parameters are determined, the network is used as a weak predictor within the Adaboost framework. The number of weak predictors K affects both accuracy and computation time. Therefore, to reduce the randomness of the optimization, this paper runs the strong predictor multiple times for each K value, calculates the average MAE and RMSE, and then determines the most reasonable number of weak predictors. The results are shown in Table VII.
MAE and RMSE values of strong predictors with different K values.
| K value | MAE | RMSE |
|---|---|---|
| K = 1 | 0.73 | 0.92 |
| K = 2 | 0.70 | 0.88 |
| K = 3 | 0.64 | 0.76 |
| K = 4 | 0.62 | 0.76 |
| K = 5 | 0.59 | 0.75 |
| K = 6 | 0.59 | 0.75 |
It can be seen from Table VII that as K increases, both MAE and RMSE decrease. The decrease is large as K goes from 1 to 3, whereas from 3 to 6 MAE and RMSE change very little. This shows that although increasing K helps reduce the error and improve the prediction accuracy, an excessively large K merely increases the prediction time. Therefore, this paper uses three LSTM weak predictors to form the strong predictor. The dropout value, network structure, and prediction performance obtained in this way for CIMF1–CIMF4 are shown in Table VIII.
Dropout value, network structure, and performance of CIMF1–CIMF4.
| Subcomponent | Dropout value | K value | LSTM structure | MAE | RMSE |
|---|---|---|---|---|---|
| CIMF1 | 0.3 | 3 | 16-200-100-1 | 0.64 | 0.76 |
| CIMF2 | 0.3 | 3 | 36-100-50-1 | 0.57 | 0.83 |
| CIMF3 | 0.3 | 2 | 16-50-50-1 | 0.29 | 0.35 |
| CIMF4 | 0.3 | 2 | 12-50-50-1 | 0.02 | 0.03 |
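The sample-weight update at the heart of the Adaboost framework (the AdaBoost.R2 scheme commonly used for regression; shown here with a linear loss, one of several variants) can be sketched in numpy as:

```python
import numpy as np

def adaboost_r2_round(weights, y_true, y_pred):
    """One AdaBoost.R2 boosting round: weighted average loss, predictor
    confidence beta, and updated sample weights (poorly fit samples gain weight)."""
    loss = np.abs(y_true - y_pred)
    loss = loss / loss.max()               # linear loss scaled to [0, 1]
    avg_loss = float(np.sum(weights * loss))
    beta = avg_loss / (1.0 - avg_loss)     # small beta => confident predictor
    new_w = weights * beta ** (1.0 - loss)  # shrink weights of well-fit samples
    return beta, new_w / new_w.sum()

w = np.full(4, 0.25)
y = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.0, 2.0, 3.0, 2.0])      # last sample badly fit
beta, w2 = adaboost_r2_round(w, y, pred)
```

The badly fit sample ends up with the largest weight, so the next weak predictor concentrates on it; the per-round `beta` values then weight the K weak predictors when forming the strong predictor.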
D. Different experiments and relative analysis
In order to verify the prediction accuracy and stability of the proposed model, a variety of wind power prediction models were established, and the results were compared in the following two categories.
1. One-step (15 min) ahead forecasting
One-step ahead forecasting uses the first three actual datasets to predict the fourth. Once the actual value of the fourth dataset is available, the second to fourth datasets are used to predict the fifth, and so on. For models that use a decomposition algorithm, whenever new actual data arrive, the new training data (still 2880 datasets) must be re-decomposed and reconstructed34 before prediction. If RRMSE > 15, the relevant parameters are adjusted.
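This rolling scheme (refit on a fixed 2880-point window each time a new measurement arrives) can be sketched as below; `fit_predict` is a hypothetical stand-in for the full decomposition-reconstruction-prediction pipeline, illustrated with a trivial persistence baseline.

```python
import numpy as np

def rolling_one_step(series, fit_predict, train_len=2880, n_lags=3):
    """Roll through the series; at each step the last `train_len` points form
    the training window (re-decomposed and reconstructed in the paper) and the
    last `n_lags` observed values feed the one-step-ahead prediction."""
    preds = []
    for t in range(train_len, len(series)):
        window = series[t - train_len:t]   # fixed-size training window
        lags = series[t - n_lags:t]        # most recent actual values
        preds.append(fit_predict(window, lags))
    return np.array(preds)

persistence = lambda window, lags: lags[-1]  # naive stand-in model
preds = rolling_one_step(np.arange(3000.0), persistence)
```

Because the window is refit at every step, each one-step prediction always uses actual (not predicted) inputs, which is what distinguishes this setup from the multi-step case later in the section.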
Three sets of experiments are used to compare the predictive performance between the proposed model and other comparable models. In Experiment 1, three individual models (BP, ARMA, and LSTM) are selected as the control group. In Experiment 2, EEMD and VMD are selected as the control group to compare the prediction ability of models based on different decomposition methods. In Experiment 3, MI is selected to compare the prediction ability of models based on different feature selection methods.
Experiment 1: Comparison with other individual models
Table IX shows the comparison of the results of the optimized model and the other individual models. Figure 12 shows the forecasting results of individual forecasting models. Figure 12(a) shows the predicted results for all forecasting models. Figure 12(b) shows the scatter diagram of each model.
According to the evaluation criteria shown in Table IX, the optimized model outperformed the individual models. In terms of MAE, LSTM-Adaboost is 12.31% less than BP, 5.97% less than ARMA, and 3.86% less than LSTM. In terms of RMSE, LSTM-Adaboost is 9.20% less than BP, 2.71% less than ARMA, and 1.50% less than LSTM.
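The percentage reductions quoted above follow directly from the Table IX metrics; a quick check (values taken from the table, small rounding differences in the reported percentages are possible):

```python
def pct_reduction(baseline, improved):
    """Relative error reduction of the improved model vs a baseline, in %."""
    return 100.0 * (baseline - improved) / baseline

# MAE of LSTM-Adaboost (2.99) vs ARMA (3.18) and LSTM (3.11), from Table IX
print(round(pct_reduction(3.18, 2.99), 2))  # ≈ 5.97
print(round(pct_reduction(3.11, 2.99), 2))  # ≈ 3.86
```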
Experiment 2: Comparison with other models using different decomposition methods
This experiment demonstrated the performance of different decomposition methods by comparing ESMD-PE-LSTM-Adaboost with models using EEMD and VMD. The comparison results are listed in Table X and Fig. 13.
It can be seen from Fig. 13 and Table X that the accuracy of the decomposition-and-reconstruction models is higher than that of the individual models. In terms of predictive performance, ESMD and VMD are better than EEMD.
Experiment 3: Comparison with other models using different feature selection methods
In order to verify the influence of meteorological factors on wind power, this experiment applies different feature selection methods to the prediction. The comparison results are listed in Table XI and Fig. 14.
According to the evaluation criteria shown in Table XI, compared with the model using MI as the feature selection method, the model proposed in this paper has improved MAE and RMSE by 8.81% and 8.43%, respectively.
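A greedy mRMR ranking of candidate meteorological features can be sketched as follows. For brevity, |Pearson correlation| stands in for the mutual-information estimates used by the actual mRMR criterion; the data here are synthetic.

```python
import numpy as np

def mrmr_rank(X, y, n_select):
    """Greedy max-relevance min-redundancy feature ranking.
    Relevance: |corr(feature, target)|; redundancy: mean |corr| with selected."""
    n_feat = X.shape[1]
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_feat)])
    selected = [int(np.argmax(rel))]       # start from the most relevant feature
    while len(selected) < n_select:
        scores = {}
        for j in set(range(n_feat)) - set(selected):
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            scores[j] = rel[j] - red       # relevance minus redundancy
        selected.append(max(scores, key=scores.get))
    return selected

rng = np.random.default_rng(0)
y = rng.normal(size=200)
X = np.column_stack([rng.normal(size=200),              # irrelevant feature
                     y + rng.normal(scale=0.1, size=200),  # relevant, noisy copy
                     y])                                 # perfectly relevant
order = mrmr_rank(X, y, 2)                # picks the perfectly relevant feature first
```

The redundancy penalty is what separates mRMR from plain MI ranking: a second feature that duplicates an already-selected one scores poorly even if its own relevance is high.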
Comparison of forecasting performances of the optimized model and other individual models.
| Method | MAE | RMSE | RRMSE |
|---|---|---|---|
| BP | 3.41 | 4.35 | 18.70 |
| ARMA | 3.18 | 4.06 | 17.50 |
| LSTM | 3.11 | 4.01 | 17.24 |
| LSTM-Adaboost | 2.99 | 3.95 | 17.01 |
Comparison of forecasting performances of the models using different decomposition methods.
| Method | MAE | RMSE | RRMSE |
|---|---|---|---|
| EEMD-PE-LSTM-Adaboost | 2.27 | 3.26 | 14.04 |
| VMD-PE-LSTM-Adaboost | 1.99 | 2.58 | 11.11 |
| ESMD-PE-LSTM-Adaboost | 2.02 | 2.60 | 11.20 |
Comparison of forecasting performances of the models using different feature selection methods.
| Method | MAE | RMSE | RRMSE |
|---|---|---|---|
| ESMD-PE-MI-LSTM-Adaboost | 1.28 | 1.66 | 7.12 |
| ESMD-PE-mRMR-LSTM-Adaboost | 1.18 | 1.52 | 6.54 |
Results of each individual prediction model: (a) predicted results of models and (b) scatter diagram of models.
Results of using different decomposition methods: (a) predicted results of models and (b) scatter diagram of models.
Results of using different feature selection methods: (a) predicted results of models and (b) scatter diagram of models.
The above analysis shows that using LSTM-Adaboost for wind power prediction is better than using BP, ARMA, and LSTM; using decomposition methods, such as ESMD, can further improve the prediction accuracy; using reconstruction algorithms, such as PE, reduces repetitive modeling and hardly affects the prediction accuracy; and using mRMR for feature selection also has a better effect.
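The PE-based reconstruction mentioned above groups IMFs with similar permutation-entropy values; a minimal sketch of normalized permutation entropy (the Bandt-Pompe ordinal-pattern measure) is:

```python
import math
import numpy as np

def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy: Shannon entropy of ordinal-pattern
    frequencies in embedded windows, scaled to [0, 1] by log(m!)."""
    counts = {}
    for i in range(len(x) - (m - 1) * tau):
        pattern = tuple(np.argsort(x[i:i + m * tau:tau]))  # ordinal pattern
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log(p)) / math.log(math.factorial(m)))

rng = np.random.default_rng(0)
pe_trend = permutation_entropy(np.arange(100.0))       # one pattern only -> 0
pe_noise = permutation_entropy(rng.normal(size=5000))  # near 1 for white noise
```

Slowly varying trend-like components score near 0 and noisy high-frequency components near 1, which is why PE values make a natural grouping key for IMFs with close time-domain characteristics.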
2. Multi-step ahead forecasting
The above research shows that the proposed model performs satisfactorily in one-step forecasting. However, the step-by-step re-decomposition increases the forecasting time. Therefore, while ensuring forecasting accuracy, appropriately lengthening the forecast period can reduce the number of decompositions and save forecasting time. Moreover, the length of the forecast period matters greatly to the dispatching department when adjusting the dispatch plan in time and formulating a reasonable control strategy. The two models that perform best in one-step forecasting (using ESMD and VMD, respectively) are taken as examples to illustrate the impact of different forecast periods (2-, 4-, 6-, 8-, 10-, and 12-step ahead) on the prediction results. In this stage, if RRMSE > 20, the relevant parameters are adjusted. Figure 15 shows the prediction results of the proposed model for the 4-, 8-, and 12-step-ahead forecast periods. Figure 16 shows the changes in the MAE and RMSE of the two models across all forecast periods (2- to 12-step ahead).
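Multi-step-ahead forecasting feeds each prediction back as an input, which is why errors accumulate with the horizon; a sketch with a hypothetical one-step predictor:

```python
import numpy as np

def recursive_multistep(history, one_step, n_lags, steps):
    """Recursive multi-step forecast: each new prediction replaces an actual
    value in the lag window, so one-step errors compound with the horizon."""
    buf = list(history[-n_lags:])
    out = []
    for _ in range(steps):
        nxt = one_step(np.array(buf[-n_lags:]))
        out.append(nxt)
        buf.append(nxt)                   # feed the prediction back in
    return np.array(out)

# hypothetical one-step model: linear extrapolation from the last two lags
linear = lambda lags: 2.0 * lags[-1] - lags[-2]
forecast = recursive_multistep(np.array([1.0, 2.0, 3.0]), linear, n_lags=2, steps=3)
```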
Prediction results of the model proposed in this paper in different forecast periods (4-step ahead, 8-step ahead, and 12-step ahead): (a) predicted results in different forecast periods and (b) scatter diagram in different forecast periods.
Changes in the MAE and RMSE of the two models in different forecast periods (2-step ahead, 4-step ahead, 6-step ahead, 8-step ahead, 10-step ahead, and 12-step ahead): (a) MAE and (b) RMSE.
It can be seen from Fig. 15 that the prediction results differ across forecast periods: the 4-step-ahead prediction is the best and the 12-step-ahead prediction is the worst. As the lead time increases, errors accumulate, reducing prediction accuracy. Figure 16 shows that the MAE and RMSE of both models increase with the forecast period. For 2-, 4-, and 6-step-ahead forecasting, there is little difference between the two models. However, from 6-step to 12-step ahead, the gap gradually widens and the proposed model is consistently better than VMD-PE-mRMR-LSTM-Adaboost. This is because each component obtained by ESMD-PE decomposition and reconstruction reflects, to a certain extent, the change in wind power over different periods and can respond in time as the forecast period grows, giving it an advantage over other decomposition methods. The evaluation index values of the two prediction models in different forecast periods are shown in Table XII.
Performance evaluation using ESMD and VMD in different forecast periods.
| Forecast period | Evaluation index | ESMD | VMD |
|---|---|---|---|
| 2-step ahead | MAE | 1.29 | 1.23 |
| 2-step ahead | RMSE | 1.67 | 1.55 |
| 2-step ahead | RRMSE | 7.17 | 6.66 |
| 4-step ahead | MAE | 1.45 | 1.49 |
| 4-step ahead | RMSE | 1.86 | 1.91 |
| 4-step ahead | RRMSE | 8.00 | 8.25 |
| 6-step ahead | MAE | 1.88 | 1.92 |
| 6-step ahead | RMSE | 2.43 | 2.48 |
| 6-step ahead | RRMSE | 10.45 | 10.69 |
| 8-step ahead | MAE | 2.19 | 2.33 |
| 8-step ahead | RMSE | 2.85 | 3.06 |
| 8-step ahead | RRMSE | 12.27 | 13.16 |
| 10-step ahead | MAE | 2.71 | 2.79 |
| 10-step ahead | RMSE | 3.50 | 3.62 |
| 10-step ahead | RRMSE | 15.06 | 15.54 |
| 12-step ahead | MAE | 2.85 | 3.08 |
| 12-step ahead | RMSE | 3.66 | 3.98 |
Table XII shows that as the forecast period grows, the MAE and RMSE of both models increase. However, the RRMSE of the proposed model always remains below 20, indicating good prediction results. On the whole, ESMD performs better than VMD, indicating that the proposed model is more effective in short-term wind power forecasting.
V. CONCLUSIONS
With the growth of population and economy, the development and utilization of new energy have become an inevitable trend. Therefore, accurate wind power prediction is particularly important. In this research, an adaptive hybrid optimization model integrating decomposition and reconstruction is proposed. The model combines ESMD-PE and mRMR, as well as improved LSTM, for wind power prediction. The main conclusions are as follows:
An adaptive hybrid optimization model combining decomposition and reconstruction is proposed, which is applied to wind power prediction. ESMD-PE is used to decompose the original sequence into subsequences that can reflect different spectrum characteristics, Adaboost is used to optimize LSTM, and the improved LSTM is used to predict each decomposed subsequence. The final prediction result is obtained by integrating the prediction results of all subsequences.
The meteorological data have a great influence on the performance of the wind power forecast model. The corresponding influence factors are selected based on mRMR. The input variables, including the decomposed wind power sub-sequence and other suitable meteorological factors, are determined, and the prediction is made based on the input variables and the LSTM-Adaboost model.
Compared with the eight existing prediction models, the proposed model obtains better and more stable prediction results, and compared with hybrid models based on other decomposition methods, its advantages in multi-step forecasting are more significant. The results show that LSTM is the best individual prediction model and that optimizing LSTM with Adaboost has a significant effect. Both ESMD-PE and Adaboost improve the performance of LSTM, with the contribution of ESMD-PE far greater than that of Adaboost; used together, they achieve even greater improvements.
In addition, it should be noted that although ESMD alleviates modal aliasing to a certain extent, it does not eliminate it completely. How to further reduce modal aliasing and the number of sifting iterations by improving the decomposition method is the focus of the next stage of work. Similarly, using Adaboost to improve the LSTM raises the prediction accuracy but does not completely solve the exploding-gradient and vanishing-gradient problems, which also need to be addressed in the next stage.
ACKNOWLEDGMENTS
The authors acknowledge the financial support provided by the Research Fund of the State Key Laboratory of Eco-Hydraulics in Northwest Arid Region, the Xi’an University of Technology (Grant No. 2019KJCXTD-5), and the Key Research and Development Plan of Shaanxi Province (Grant No. 2018-ZDCXL-GY-10-04).
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
Author Contributions
K.Z. conceptualized the study and methodology, investigated the study, validated the data, and wrote the original draft. Y.Z. investigated the study; performed data analysis and software analysis; and wrote, reviewed, and edited the manuscript. G.Z. provided resources and performed data analysis and formal analysis. X.H. and J.Y. wrote, reviewed, and edited the manuscript. All authors have read and agreed to the published version of the manuscript.
DATA AVAILABILITY
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy reasons.