A critical step in stochastic optimization models of power system analysis is to select a set of appropriate scenarios and significant numbers of scenario generation methods exist in the literature. This paper develops a clustering based scenario generation method, which aims to improve the performance of existing scenario generation techniques by grouping a set of correlated wind sites into clusters according to their cross-correlations. Copula based models are utilized to model spatiotemporal correlations and the Gibbs sampling is then used to generate scenarios for day-ahead markets. Our results show that the generated scenarios based on clustered wind sites outperform existing approaches in terms of reliability and sharpness and can reduce the total computational time for scenario generation and reduction significantly. The clustering-based framework can therefore provide a better support for real-world market simulations with high wind penetration.

Stochastic programming represents a potentially promising technique to address uncertainties associated with wind power1–3 and has been studied widely in the recent decades4–6 in a variety of fields.7–10 However, one challenge in its practice is to select a set of appropriately weighted scenarios to represent the space of uncertainty. Such techniques usually involve fitting forecasted wind power or forecast errors to specific distributions,11 where scenarios are then generated by sampling the derived distributions.12 Such distributions can be either parametric distributions, such as Gaussian5 and Generalized Gaussian distributions,13 the Beta distribution,14,15 the Weibull distribution,16 and the logit-normal distribution,12 or non-parametric distributions.17 Ma et al.18 characterized forecast errors using empirical distributions and the inverse transformation method is applied to obtain a set of scenarios. Cui et al.13 developed a generalized Gaussian mixture model to fit forecast errors of aggregated wind power from hundreds of wind farms and the fitted distribution is used to sample scenarios for probabilistic wind ramp forecasting.

While sampling methods can be applied to these distributions to generate scenarios for a single wind farm or aggregated power output of multiple wind farms,5,11,13,15,17,18 an increasing number of studies have placed emphasis on spatiotemporal correlation in scenario generation.19–21 Typically, such correlation is modeled with multivariate joint distributions. For example, Pinson et al.12 employed a multivariate Gaussian distribution to describe correlations between wind power forecasts made at different lead times, and this method has been widely adopted in numerous studies recently. Other commonly used methods include Cholesky decomposition,22 moment matching techniques,23 and the generalized dynamic factor model.24 In addition, machine learning-based methods are also employed to generate scenarios in the latest studies. Such methods include artificial neural network (ANN),25 quantile regression convolutional neural network (QRCNN),26 radial basis function neural network (RBFNN),18 generative adversarial networks (GAN),27,28 and support vector machine (SVM).29 Particularly, techniques that are used to generate probabilistic forecasts can be used to produce discrete scenarios by sampling the predictive distributions to avoid reliance on prior distributions.30 Nevertheless, modeling high dimensional multivariate non-Gaussian distributions can be challenging, and a common approach is to use copula.31 By applying the marginal cumulative distribution functions to stochastic variables, the original variables are transformed from the original space into a common uniform domain, in which correlations among the original variable can be further characterized using copulas. The copula-based methods have been used extensively in the literature recently.12,32–35 For example, Zhang et al.19 modeled spatiotemporal correlations of clustered wind farms using a copula based model and implemented a scenario generation method. A similar method is developed by Tang et al.20 and the Gibbs sampling method is adopted to generate scenarios for a stochastic economic dispatch (ED) model.

The above methods represent a considerable number of scenario generation approaches. However, real-world power systems usually involve hundreds or even thousands of wind farms, while sampling random variables of such high dimensions is usually computationally prohibitive, especially when spatial-temporal correlation is explicitly considered. Recently, data mining techniques have been applied to power system studies to process large volumes of data in a diverse set of literature studies.36,37 Typical applications include cable layout design, evaluation of transfer capabilities, reliability of networks, and also wind power forecasting.38 In the study by Vallee et al.,37 a clustering method is developed to group 94 wind areas into 14 clusters, which are then used to generate samples of wind speed to evaluate system reliability through a non-sequential Monte Carlo simulation.

In this paper, we present a scenario generation framework based on cluster analysis, where wind sites are grouped into clusters by correlation. The framework is aimed to improve the quality of generated scenarios at less time expense by maximizing intra-cluster correlation and minimizing inter-cluster correlation. By clustering wind sites into a number of smaller clusters, existing methods can be employed to generate scenarios for each cluster. The generated scenarios of wind power are then reduced and used as inputs to a stochastic day-ahead unit commitment (DAUC) model, which uses a hedging strategy to clear the day-ahead market to determine commitment statuses of thermal units. Given the commitment statuses of thermal generators, a real-time ED model is then solved to evaluate the system cost-effectiveness and reliability. A synthetic, 2000-bus network built on the footprint of the Electric Reliability Council of Texas (ERCOT)39 is employed to demonstrate the applicability of the proposed method in real-world power systems. Our approach represents an enhanced framework based on existing methods and can be applied to real-world power systems with high wind penetration.

The main contributions of our work include (i) enhancing existing scenario generation methods by substantially improving the quality of generated scenarios with reduced time consumption, (ii) demonstrating the proposed framework with a real-world sized power system and scrutinizing the trade-off between the number of clusters and a variety of evaluation metrics, and (iii) using the generated scenarios as inputs to a stochastic power market simulation of a synthetic network built on the footprint of the Texas power system and assessing its impact in market operation and scheduling. The proposed framework can be extended by integrating with other scenario generation methods and applied to real-world power systems with high wind penetration.

The remainder of this paper is organized as follows: Sec. II details the developed clustering-based framework. Section III formulates the power market simulation model. Section IV summarizes the data and assumptions. Sections V and VI present the generated scenarios and the power market simulation, respectively. Section VII concludes this paper.

While numerous studies suggest that including correlations across wind farms can provide better support for existing scenario generation methods,19–21 these studies typically include all wind farms in a single group. However, wind farms in a real-world power network usually scatter over a wide area since geographical diversification can effectively decrease fluctuations in aggregate output and improve overall economic viability.40,41 In addition, correlations normally decay with distances, which results in a weak correlation between two wind farms if they are remotely located. Therefore, it is questionable whether the generated scenarios can benefit from including weakly correlated wind farms and a more appropriate approach is to group wind farms into clusters based on their cross-correlations.42 

Cluster analysis groups data objects based only on information provided by the data itself, where data objects in one cluster often share similar characteristics and are distinct from other clusters. The similarity or distance between data objects is usually expressed in terms of proximity measures, such as p norms, or the cohesion of the cluster. In this study, we use Pearson's correlation coefficient ρ(xi,xj) as the proximity measure since the wind farms are clustered based on correlation, i.e., wind farms in the same cluster should be more closely correlated with each other, while correlations between two clusters are less strong. The formula for ρ(xi,xj) is given by the equation below:

ρ(xi,xj)=cov(xi,xj)σxiσxj.
(1)

A variety of methods exist in cluster analysis.43 In this study, we use the k-means method. The k-means method is a widely used prototype-based, partitional method that selects the mean of all data objects in a cluster as the prototype. Let C denotes a set of data objects, which can be grouped into K clusters and let xCk denotes the data objects in the kth cluster, the objective function of the k-means method becomes

mink=1KxCkdist(ck,x),
(2)

where dist() represents the selected proximity measure. Note that since higher ρ represents stronger correlation, hence higher similarity, the objective function minimizes 1ρ instead:

mink=1KxCk[1ρ(ck,x)].
(3)

To solve the optimization problem, one can exhaustively enumerate all possible ways of grouping objects into K clusters. However, this is often computationally prohibitive and heuristic iteration-based algorithms are typically used. Such heuristic algorithms often require initial conditions (i.e., centroids) as input and iteratively converge to a solution where the objective value no longer decreases. Although the heuristic algorithms can converge within an acceptable amount of time, they are sensitive to initial conditions and often converge to a local minimum rather than a global minimum. Therefore, multiple runs with randomly selected initial conditions are typically required. In this paper, the k-means function in MATLAB is employed and repeated ten times using different initial cluster centroids.

After all wind sites are grouped into K clusters, existing scenario generation methods can be applied to each cluster. Given a cluster with N wind sites, the spatial correlations of wind power production across all sites in the cluster can be modeled using a multivariate joint probability distribution, whose cumulative density function (CDF) and probability density function (PDF) are given below:

Fa1aNf1fN(x1,ta,,xN,ta,x1,tf,,xN,tf)=Prob(X1,tax1,ta,,XN,taxN,ta,X1,tfx1,tf,,XN,tfxN,tf),
(4a)
fa1aNf1fN(x1,ta,,xN,ta,x1,tf,,xN,tf)=2NFa1aNf1fNX1aXNaX1fXNf,
(4b)

where Xi,ta and Xi,tf denote actual and forecasted wind power at site i during time t, respectively. Note that the forecasts can be deterministic forecasts with any lead time and in our analysis, day-ahead forecasts are used. Therefore, scenario generation given wind power forecast becomes sampling the distribution of actual wind power (X1,ta,,XN,ta) conditioned on the forecasted wind power (x1,tf,,xN,tf),

Fa1aN|f1fN(x1,ta,,xN,ta)=Prob(X1,tax1,ta,,XN,taxN,ta|X1,tf=x1,tf,,XN,tf=xN,tf),
(5a)
fa1aN|f1fN(x1,ta,,xN,ta)=NFa1aN|f1fNX1aXNa.
(5b)

Here, we follow Tang et al.20 by modeling the multivariate distribution with marginal distributions of individual variables and a copula to link their interdependence. The spatial correlation is modeled with the copula, while the temporal correlation is modeled with a multivariate Gaussian distribution. Next, Gibbs sampling is adopted to simulate sampling of the multivariate random variable by sequentially sampling each component.

1. Gibbs sampling

The Gibbs sampling method is a Markov chain Monte Carlo algorithm44,45 and the idea is to iteratively sample only one variable or a block of variables at a time from its distribution conditioned on the remaining variables. Suppose we have a two-dimensional random variable (X, Y), the Gibbs sampler generates a sequence (known as a Gibbs sequence) by sampling each component conditioned on the other component:

y(0),x(0),y(1),x(1),y(2),x(2),,y(k),x(k).

The initial value y(0) is specified and the rest of the sequence is obtained by iteratively sampling,

X(j)f(X|Y(j)=y(j)),Y(j+1)f(Y|X(j)=x(j)).

The process of generating a Gibbs sequence is Gibbs sampling. It shows that under certain conditions, the distributions of X(k) and Y(k) converge to f(X) and f(Y) (i.e., their marginal distributions), respectively, as k. Without knowing the joint distribution, we can generate a set of samples that follow their marginal distributions. It can also be extended to high dimensional space as well. Therefore, Gibbs sampling converts sampling a multivariate distribution into sequentially sampling a set of univariate distributions, i.e., sampling the multivariate distribution in Eq. (5a) becomes sampling Fai|a1,,ai1ai+1,,aNf1,,fN iteratively. A detailed procedure is given in Algorithm 1. Note that, in this analysis, we use the forecasted wind power xi,tf to initialize xi,ta.

Algorithm 1. Gibbs sampling.

procedure Gibbs sampler 
 Initialize xi,ta,0xi,tf,i=1,,N 
for scenario ξΞdo Sample: 
  X1,ta|x2,ta,ξ1,,xN,ta,ξ1,x1,tf,,xN,tf 
  X2,ta|x1,ta,ξ,x3,ta,ξ1,,xN,ta,ξ1,x1,tf,,xN,tf 
  … 
  XN,ta|x1,ta,ξ,,xN1,ta,ξ,x1,tf,,xN,tf 
end for 
end procedure 
procedure Gibbs sampler 
 Initialize xi,ta,0xi,tf,i=1,,N 
for scenario ξΞdo Sample: 
  X1,ta|x2,ta,ξ1,,xN,ta,ξ1,x1,tf,,xN,tf 
  X2,ta|x1,ta,ξ,x3,ta,ξ1,,xN,ta,ξ1,x1,tf,,xN,tf 
  … 
  XN,ta|x1,ta,ξ,,xN1,ta,ξ,x1,tf,,xN,tf 
end for 
end procedure 

2. Spatial correlation

To obtain the univariate distributions required in Algorithm 1, we start from the conditioned probability distributions from Eq. (4a). By applying Bayes' theorem, the CDF of Xi,ta conditioned on other variables in Algorithm 1 is

Fai|a1ai1ai+1aNf1fN(xi,ta)=Prob(Xi,taxi,ta,Xji,ta=xji,ta,X1,tf=x1,tf,,XN,tf=xN,tf)Prob(Xji,ta=xji,ta,X1,tf=x1,tf,,XN,tf=xN,tf),j{1,,N}.
(6)

By applying the copula function, the equation becomes

Fai|a1ai1ai+1aNf1fN(xi,ta)=2N1C2NFa1Fai1Fai+1FaNFf1FfN|0Fai(xi,ta)2N1C2NFa1Fai1Fai+1FaNFf1FfN|01,
(7)

which only includes a copula function C2N and marginal distribution functions associated with each variable Fai. Note that detailed derivation can be found in  Appendix.

Once the univariate CDF is determined, Algorithm 1 is implemented sequentially to produce samples. In this study, the logit-normal distribution is used to fit the marginal distributions of wind power,11 and the copula is selected from among Gaussian, student t, and the Archimedean family based on the Bayesian information criterion (BIC).13 

3. Temporal correlation

Samples produced from the above method represent simultaneous power production from multiple spatially correlated wind farms. However, as suggested by Pinson et al.,12 power production from a wind farm at different times are also temporally correlated, which can be modeled as follows: Given wind farm i and let ui,ta=Fai(xi,ta)(0,1) denote the CDFs associated with its actual power production at t = 1 to T, suppose Yi=(Yi,1,Yi,2,,Yi,T) is a T-dim variable whose entries are given by Yi,t=Φ1(ui,ta), then, Yi can be modeled using a multivariate Gaussian distribution

YiN(0,Σi),

where Φ1 is the inverse of the standard Gaussian CDF function, 0 is a T-dim vector of zeros, and Σi is a covariance matrix whose entries are given by

Σi(t1,t2)=cov(Yi,t1,Yi,t2)=exp(|t2t1|v).
(8)

Therefore, Eq. (8) implies that the temporal correlation decays exponentially with time and is dictated by the parameter v. In this study, the value of v is selected by fitting Σi to historical data using the method of least squares.

After |Ξ| scenarios are generated for each of the K clusters, the generated scenarios can be directly used as inputs to the stochastic programming models. However, the exhaustive combination of all scenarios from K clusters results in |Ξ|K scenario, which can be computationally prohibitive as K increases. Therefore, the number of generated scenarios must be reduced to an adequate number before the model can be solved in a time efficient manner. Numerous scenario reduction techniques exist in the literature,46–49 where similar scenarios are aggregated based on particular distance measures. However, most of these reduction techniques do not involve clusters. In addition, the process of scenario generation itself can also be time-consuming, particularly given the great number of scenarios if the scenarios in all clusters are exhaustively combined. Therefore, we adopt the method of forward reduction from Dupacova et al.49 and apply it in a heuristic way to reduce the number of scenarios from |Ξ|K to 10.

For convenience, we define a T-dim vector xka,ξ to represents scenario ξ in the kth cluster,

xka,ξ=(i=1Nkxi,1a,ξ,,i=1Nkxi,Ta,ξ),ξΞ,
(9)

where Nk is the number of wind sites in the kth cluster. Therefore, each scenario in a cluster is represented by the sum of wind power from all wind sites in that cluster. Let's denote the original set of sums as Sk={xka,ξ,k=1,,K,ξΞ} and use S as the reduced set. Using the above definition, the detailed implementation is given in Algorithm 2.

Algorithm 2. Heuristic scenario reduction.

procedure Scenario reduction 
 Initialize S empty set  
fork=1,,Kdo 
  Add scenarios from cluster k: SS×Sk 
  Reduce S into 10 scenarios using forward reduction 
end for 
 Return the final scenarios S={xa,j,j=1,,10} 
end procedure 
procedure Scenario reduction 
 Initialize S empty set  
fork=1,,Kdo 
  Add scenarios from cluster k: SS×Sk 
  Reduce S into 10 scenarios using forward reduction 
end for 
 Return the final scenarios S={xa,j,j=1,,10} 
end procedure 

Note that during the kth iteration, a new cluster Sk is added and S is updated by taking the Cartesian product of S and Sk, i.e., the element in S is updated by taking all ordered pairs (xa,j,xka,ξ), where xa,jS,j=1,,10 and xka,ξSk,ξΞ. Therefore, the dimension of set S increases by T after each iteration and after all iterations, dim S=K·T. Upon completion of Algorithm 2, each single element in set S includes the sums of wind power output from all clusters. Based on the indices of the selected scenarios in S, site-specific wind power output can then be picked up and used in the power market simulation.

To evaluate the performance of the generated scenarios, two prediction interval (PI) based metrics are introduced: reliability and sharpness. These metrics are used widely in probabilistic forecast to assess the quality of the PIs.50 Probabilistic forecasts can take a variety of forms,51,52 such as PDFs, quantiles, and scenarios. While the evaluation metrics are often defined based on PDFs, other forms can be converted into PDFs as needed. Since the generated scenarios can be viewed as realizations of probabilistic wind power forecasts, the PIs are estimated based on the generated scenarios and the PI-based metrics are then adopted to evaluate the quality. In addition, the continuous ranked probability score (CRPS) is also employed to measure the quality of our scenarios.

1. Prediction interval

Given the forecasted mean x and variance σ2 of a random variable X probabilistically forecasted random variable x and its variance σ2, a PI with 100(1α)% confidence level can be given by [Lα,Uα],

Lα=xz1α/2σ,
(10)
Uα=x+z1α/2σ,
(11)

where z1α/2 is the 100(1α/2) percentile of the standard Gaussian distribution. Note that the random variable X here can be the wind power production at site i in time t, i.e., Xi,ta, while x and σ2 in Eqs. (10) and (11) are the estimated first and second moments from the generated scenarios. Since the PI of normalized wind power is bounded in [0,1], anything out of the interval will be floored at 0 and capped at 1, i.e., the resulting distribution is assumed to be censored Gaussian. Note that a previous study50 suggests that even if the actual distribution is non-Gaussian, the assumption of a censored Gaussian can still be applied with satisfactory performance.

2. Reliability

Given confidence level 100(1α)%, PIs can be derived from the generated scenarios for each random variable. The future measured wind power is expected to lie within the PI. The given confidence level is also known as the PI nominal confidence (PINC), and it is expected that the probability of the obtained PIs covering the measured wind power xi,ta will asymptotically reach the nominal level of confidence. Therefore, the reliability of the PIs is measured by PI coverage probability (PICP),

PICPi,tα=1N·Ti=1Nt=1TIi,tα×100%,
(12)

where the indicator of PICP, Ii,tα, is defined by

Ii,tα={1,xi,ta[Lxα,Uxα]0,xi,ta[Lxα,Uxα].
(13)

Ideally, the PICP should equal the PINC, i.e., the average coverage error (ACE)50 should be as close to zero as possible. In our analysis, we use the average absolute coverage error over the entire forecast time horizon T and across all N wind sites as the indicator of reliability,

ACE¯α=1N·Ti=1Nt=1T|PINCPICPi,tα|,
(14)

where PINC=100(1α).

3. Sharpness

The definition of PICP implies that a high PICP can be realized by widening PIs. Therefore, another metric, sharpness, is introduced to measure the size of PIs. Sharpness is evaluated in the form of interval score. Given the width of a PI: ωi,tα=Ui,tαLi,tα, the interval score Sci,tα is

Sci,tα={2αωi,tα4(Li,tαxi,ta),ifxi,ta<Li,tα2αωi,tα,ifxi,ta[Li,tα,Ui,tα]2αωi,tα4(xi,taUi,tα),ifxi,ta>Ui,tα,

where Sci,tα rewards a narrow PI and gives penalty if the PI fails to cover the realized value xi,ta.

Similar to the definition of ACE, we use an average interval score (AIS) to evaluate the sharpness of the generated scenarios over time horizon T and across all N wind sites,

AIS¯α=1N·Ti=1Nt=1TSci,tα.
(15)

4. Overall metric

Clearly, there is a trade-off between reliability and sharpness, since a wider PI may result in a higher coverage rate at the cost of a lower interval score, and vice versa. We, therefore, introduce a synthetic evaluation metric (SEM) as the linear combination of ACE and AIS to give a comprehensive evaluation of the PI,

SEM¯α=λ1ACE¯αλ2AIS¯α.
(16)

Note that a lower SEM indicates a better overall performance since a negative AIS is used here. λ1 and λ2 are weighting coefficients, and we assume λ1=λ2=0.5 to put equal emphasis on both reliability and sharpness. The weighting coefficients can be varied by decision makers to put more weights on either aspect.

5. Score

There are numerous numerical scores proposed to measure the quality of probabilistic forecasts.53 We use the CRPS score, a proper score that is designed to measure both the reliability and sharpness of a probablistic forecast.54 Suppose the predicted CDF of the wind power production in site i at time t is F̂i,t(x) and the observed wind power is xi,ta, the definition of CRPS can be given by the following equation by measuring the difference between the predicted and observed CDFs,

CRPSi,t=[F̂i,t(x)1xxi,ta]2dx,
(17)

where 1xxi,ta:{x|x}{0,1} is the indicator function of event xxi,ta. We further take the average over all wind sites and all time intervals to measure the overall performance,

CRPS=1N·Ti=1Nt=1TCRPSi,t.
(18)

Note that the predicted CDF is required before calculating the CRPS. In the realm of weather prediction, the predicted CDF can be obtained by three methods: parametric method, non-parametric method, and ensemble forecast.51 By giving a set of discrete scenarios, our study falls in the category of ensemble forecast. Therefore, to estimate the predicted CDF, the empirical CDF is derived from the generated scenarios by using the following equation:

F̂i,t(x)=EξΞ(1xxi,ta,ξ),
(19)

where xi,ta,ξ denotes the generated wind power scenarios for wind site i at time t.

A critical motivation of this study is to apply the enhanced approach of scenario generation to real-world power systems with high renewable penetration. Power markets in real world are typically operated on multiple timescales ranging from days ahead to real time. Typically, as the markets evolve forward, forecasts with shorter lead times are used as they become available. In most real-world power markets, the day-ahead market is often cleared one day before using day-ahead forecasts and the schedules are updated in the real-time market several hours ahead of the actual operation by using more up-to-date forecasts. The imbalance between the forecasts and the actual values is then offset by deploying operating reserves, which is known as the load frequency control (LFC) process. The purpose of the power market simulation is to show the market benefits from the improvement in forecasts due to clustering. However, we believe that including forecasts with multiple lead times would be unnecessarily complex to illustrate the benefits of the methods utilized in this work. Therefore, although forecasts with different lead times are used in real-world markets, we only focus on day-ahead forecasts and assume that the real-time market is cleared based on actual values to neglect the LFC process. In addition, we assume that both markets are operated on a 24-h horizon at an hourly resolution. To simulate the two markets, we develop a day-ahead unit commitment (DAUC) model and a real-time economic dispatch (RTED) model. The DAUC model takes in day-ahead wind power forecasts and determines the commitment statuses of all thermal units. Next, the RTED model is solved based on actual wind power production to determine the final financial settlement. In addition, the DAUC model is solved both deterministically and stochastically. The stochastic run solves for a hedging strategy that explicitly considers future uncertainties associated with wind power, which is represented by scenarios. By contrast, the deterministic run solves each wind scenario independently and all decision variables vary by scenarios. Once the DAUC model is solved, commitment statuses of all thermal units are fixed and the RTED model is solved by using the actual wind output. We then compute the financial settlement only based on the solutions from the RTED model.

The DAUC model is formulated as a mixed integer program by following Ref. 55 and enhanced by constraints associated with regulating and responsive reserves. In addition, active power flow on transmission lines is calculated using shift factors and bounded by a set of pre-defined line limits. The objective of the model is to minimize total costs, which include actual production costs, fixed costs, and load and curtailment costs. The full DAUC model formulation is given below. Note that decision variables in each scenario (ξ) are independent of other scenarios in the deterministic runs.

System load balance

gGpg,tξ+bBLδb,tξ=bBLDb,t,tT.
(20)

Maximum and minimum power generation

Pgmin·vg,tpg,tξp¯g,tξPgmax·vg,t,gGT,
(21a)
pg,tξp¯g,tξ,gG,
(21b)
p¯g,tξPg,tF,ξ,gGGT.
(21c)

Note that if wind unit gGW is installed at site i, the maximum available wind power Pg,tF,ξ is given by multiplying the normalized wind power xi,ta,ξ with the rated capacity of unit g.

Maximum available power of thermal generators accounting for ramp rates

p¯g,tξpg,t1ξRgU·vg,t1ξ+RgSU(vg,tξvg,t1ξ)+Pgmax(1vg,tξ),gGT,tT,
(22a)
p¯g,tξPgmax·vg,t+1ξ+RgSD(vg,tξvg,t+1ξ),gGT,tT.
(22b)

Power generation during ramp-down or shut-down

pg,t1ξpg,tξRgD·vg,tξ+RgSD(vg,t1ξvg,tξ)+Pgmax(1vg,t1ξ),gGT,tT.
(23)

Active power flow, L,

ΛbBSF,b(gGbpg,tξδb,tξDb,t)Λ.
(24)

Up-time of thermal generators, gGT,

(25)
t=1TgU0(1vg,tξ)=0,
(25a)
τ=tt+TgUvg,τξTgU(vg,tξvg,t1ξ),t[TgU0+1,TTgU+1]
(25b)
τ=tT[vg,τξ(vg,tξvg,t1ξ)]0,t[TTgU+2,T].
(25c)

Down-time of thermal generators, gGT,

(26)
t=1TgD0vg,tξ=0,
(26a)
τ=tt+TgD(1vg,τξ)TgD(vg,t1ξvg,tξ),t[TgD0+1,TTgD+1],
(26b)
τ=tT[(1vg,τξ)(vg,t1ξvg,tξ)]0,t[TTgD+2,T].
(26c)

Reserve availability, gGT,

(27)
rg,tS,ξ+rg,tRU,ξp¯g,tξpg,tξ,
(27a)
rg,tRD,ξpg,tξPgmin·vg,tξ.
(27b)

Spinning reserve balance, tT,

gGTrg,tS,ξ+δtS,ξ=RQtS.
(28)

Regulating up/down reserve balance, tT,

(29)
gGTrg,tRU,ξ+δtRU,ξ=RQtRU,
(29a)
gGTrg,tRD,ξ+δtRD,ξ=RQtRD.
(29b)

Cost of thermal generators, gGT,tT,

zg,tP,ξ=Cg0·vg,tξ+Cg·pg,tξ,
(30a)
zg,tSU,ξCgSU(vg,tξvg,t1ξ),
(30b)
zg,tSD,ξCgSD(vg,t1ξvg,tξ).
(30c)

Curtailment cost, tT,

ztΔ,ξ=CΔLbBLδb,tξ+CΔR(δtRU,ξ+δtRD,ξ)+CΔS·δtS,ξ.
(31)

Constraint (20) ensures system-wide load balance. Different from Ref. 55, we include slack variable δb,tξ for each load bus bBL to reduce the chances of infeasibilities. Equations (21a) and (21b) allow thermal units to operate within their output limits, while the output of renewable units is limited by power forecast in Eq. (21c). In addition, Eqs. (22a) and (22b) limit the maximum output from thermal generators during start-up and shut-down. The output of thermal units is further limited by Eq. (23) during ramp-down or shut-down. In Eq. (24), the active power flow on transmission line is calculated using shift factors SF,b and is limited by line limit Λ. The minimum online/offline time of thermal generators are limited by Eqs. (25) and (26). The amount of available reserves (rg,tS,ξ for responsive reserve, rg,tRU,ξ and rg,tRD,ξ for up/down regulating reserves) from thermal units is provided in Eq. (27) and requirements of reserves are met in Eqs. (28) and (29). Similar to the load balance constraint, slack variables are also introduced to account for reserve shortages.

Costs of thermal generators consist of production costs and fixed costs. Typically, production costs of thermal generators are represented by quadratic functions; here for simplicity, they are modeled as linear functions of power production pg,tξ, as shown in Eq. (30a). The fixed costs, as given by Eqs. (30b) and (30c), include start-up and shut-down costs, which are determined solely by commitment statuses. In addition to the costs of thermal units, the objective function also includes curtailment costs, which are used to avoid load shedding and reserve shortages, as given in Eq. (31). Finally, the objective function in the deterministic run is given in Eq. (32) by summing up all costs:

min:zξ=ztΔ,ξ+gGT,tT(zg,tP,ξ+zg,tSU,ξ+zg,tSD,ξ).
(32)

To account for uncertainties associated with wind power, a two-stage stochastic DAUC model is developed based on the deterministic DAUC model. We assume that the commitment statuses (vg,tξ) are the only first-stage decision variables and must remain equal across all scenarios, while all other variables are the second-stage decision variables and are thus scenario-dependent. Therefore, a non-anticipativity constraint is introduced in the following equation:

vg,tξ=vg,tξ,gGT,tT,ξξ.
(33)

The objective function of the stochastic run is given by Eq. (34), where the probability-weighted costs are minimized. The fixed costs are equal across all scenarios, since they are only dependent on the first stage variables. By contrast, the production costs and curtailment costs are functions of the second stage variables, which are scenario dependent; therefore, Eq. (34) minimizes their probability-weighted expectation. Note that probabilities of scenarios (Prξ) are determined in the scenario reduction process

minẑ=gGT,tT(zg,tSU+zg,tSD)+ξPrξ·(ztΔ,ξ+gGT,tTzg,tP,ξ).
(34)

After the DAUC model is solved, the RTED model is solved by fixing the commitment status of all thermal units and replacing the forecasted wind power with actual values. In addition, the minimum online/offline time constraints in Eqs. (25) and (26) are also removed. Therefore, the RTED model is formulated as a continuous linear program, due to the removal of all binary variables. The full model consists of Eqs. (20)–(24) and (27)–(31), and the objective function is represented by Eq. (32). Note that while each deterministic scenario results in an independent set of commitment status, hence an independent RTED run, the stochastic run returns only one set of commitment status for all scenarios and is followed by one single RTED run.

To simulate real-world power systems, we use a well-developed, publicly available synthetic network developed by Ref. 39, which is built on the footprint of the Electric Reliability Council of Texas (ERCOT). The 24-h load profiles from the IEEE 118 bus system are mapped into load profiles of the Texas system. Note that the load profile of one load bus in the 118 bus system may be mapped into multiple load buses in the Texas system, due to the latter's significantly larger size. In addition, the load profiles are scaled up or down to match the original load magnitudes of the Texas system.

Generating units in the Texas system can be grouped into six categories based on their fuel types, as shown in in Table I. Most techno-economic parameters of thermal units, such as the maximum and minimum output and ramp rates, are directly drawn from the original dataset. The production cost functions are mapped from the units with similar fuel types and rated capacities in the IEEE 118 bus system. The minimum online and offline time of a thermal units is determined primarily by its prime driver, while the start-up and shut-down costs are also affected by fuel costs. Since the original dataset does not give the types of prime drivers, the prime drivers are determined by their capacities following Ref. 56. Fuel costs are drawn from US EIA's Monthly Energy Review.57 Note that we assume that all nuclear units are constantly online. In addition, wind and solar units have zero production and start-up/shut-down costs.

TABLE I.

Number and capacity of units by fuel types in the real-world Texas power system61 and the synthetic power system from Ref. 39. Note that the data from EIA reflect the whole Texas power system, while ERCOT covers over 90% of Texas load, hence the disprepancy.

Synthetic ERCOT systemReal-world Texas system
Fuel typeNo. of unitsCapacity (MW)ShareCapacity (MW)Share
Coal 39 14 502 15% 23 589 19% 
NG 367 63 810 66% 69 386 56% 
Nuclear 5139 5% 4960 4% 
Wind 87 9587 10% 22 583 18% 
Solar PV 22 651 1% 1240 1% 
Hydro 25 2603 3% 670 1% 
Biomass 0% 425 0% 
Other 0% 659 1% 
Total 546 96 292 100% 123 512 100% 
Synthetic ERCOT systemReal-world Texas system
Fuel typeNo. of unitsCapacity (MW)ShareCapacity (MW)Share
Coal 39 14 502 15% 23 589 19% 
NG 367 63 810 66% 69 386 56% 
Nuclear 5139 5% 4960 4% 
Wind 87 9587 10% 22 583 18% 
Solar PV 22 651 1% 1240 1% 
Hydro 25 2603 3% 670 1% 
Biomass 0% 425 0% 
Other 0% 659 1% 
Total 546 96 292 100% 123 512 100% 

For simplicity, we assume that outputs from all solar and hydro units are zero. Neglecting solar and hydro power may affect the commitment statuses of thermal units; however, we do not think it introduces a systematic error given their small capacity shares. As can be seen from Table I, both hydro power and solar PV only contribute marginally to the total capacity in both the real-world and the synthetic systems. In addition, it allows us to focus on the uncertainties associated with wind power.

We assume that load curtailment cost is $9000/MWh, according to the value of load loss (VOLL) used by ERCOT.58 We use $2000/MWh, the low system-wide offer cap (LCAP) provided by the Texas Public Utility Commission,59 as the penalty price for responsive reserve shortage and $5500/MWh, the average of LCAP and VOLL, as the penalty price for the regulating up/down reserve shortages. The penalty prices are selected such that responsive reserves are curtailed ahead of regulating reserves, and load will be the last to be curtailed.60 Historical data from ERCOT indicate that the regulating up/down reserves account for 1%–2% of system-wide load, while the responsive reserves range from 5% to 11%. Therefore, we assume 10% for responsive reserve margin and 2% for both up and down regulating reserve margins.

The data drawn from NREL's Wind Integration National Dataset (WIND) toolkit62 are used as input to the scenario generation model. The WIND toolkit provides synthetic wind turbine output power as well as meteorological data for more than 126 000 sites in the contiguous United States from 2007 to 2012. The data from 2007 to 2011 are used as input to the cluster analysis. The WIND toolkit simulates wind power production every 5 min and applies an NWP model to provide hourly power forecasts at multiple lead times. In our study, the day-ahead forecast is selected to generate scenarios for the day-ahead market. Since the temporal granularity is every 5 min for the actual power data and hourly for all forecasts, we calculate the hourly average of the actual wind power in the training dataset to match the time resolution of the day-ahead forecast. To capture seasonal variations of load profiles in the power market simulation, one day is selected from each season in 2012: February 10, 2012 (winter), May 10, 2012 (spring), August 10, 2012 (summer), and November 10, 2012 (fall). The Gibbs sampler is trained using actual wind power and wind power forecast from January 1, 2007 to the day before each target day. Next, scenarios are generated by the Gibbs sampler based on the day-ahead forecasts for the target day. The 87 wind units in the Texas system are mapped from 45 sites (shown in Fig. 1) in the WIND toolkit based on their geographical coordinates. Therefore, all wind units in one site have identical normalized wind power, although they may have different rated capacities. Figure 2 shows a heat map of the cross-site correlation coefficients across all 45 wind sites in this study, indicating strong correlations.

FIG. 1.

Locations of all wind sites in the synthetic ERCOT power grid.

FIG. 1.

Locations of all wind sites in the synthetic ERCOT power grid.

Close modal
FIG. 2.

Correlation coefficients across all 45 wind sites in the study.

FIG. 2.

Correlation coefficients across all 45 wind sites in the study.

Close modal

In this section, the clustering based framework developed in Sec. II is applied to the 45 wind sites from the Texas system over the four target days. The wind sites are first divided into a given number (K) of clusters and the Gibbs sampler is applied to each cluster independently. Since Gibbs sampling requires a large number of iterations before the samples reach a stationary distribution, 9000 samples are generated for each cluster and the first 8000 samples are discarded (known as the “burn-in” process) and only the last 1000 samples are kept for market simulation. The performance metrics are then derived from the generated scenarios and scrutinized. The goal is to examine the relation between the number of clusters and the quality of generated scenarios. The optimal number of clusters is jointly determined by the performance metrics and time consumption. In this study, the number of clusters K ranges from 1 to 20. Note that since when K = 1, all wind sites are put into one single group, it is equivalent to the case where cluster analysis is not applied. Therefore, the case when K = 1 represents our baseline case. The implementation is coded in MATLAB and executed on a high performance computing platform to exploit its parallel computing capability. A compute node equipped with a 20-core Xeon E5–2698 CPU and 256 GB memory is used, where the model training and scenario generation are both parallelized.

Figures 3 and 4 show the results from an ex-post validation analysis by scrutinizing the autocorrelation coefficients and cross correlation coefficients. The autocorrelation coefficients of historical wind power from site i are given by the following equation:

ρi(Δt)=cov(xi,ta,xi,t+Δta)σ(xi,ta)·σ(xi,t+Δta),
(35)

where Δt is time lag and xi,ta and xi,t+Δta are taken from the training dataset. Since the generated scenario xi,ta,ξ can be viewed as an estimator of xi,ta, the estimated autocorrelation coefficient ρ̂i(Δt) can be given by the above equation when xi,ta is substituted by xi,ta,ξ. The autocorrelation is caused by temporal correlations in the wind power time series. By comparing ρi(Δt) and ρ̂i(Δt) from all 45 sites, Fig. 3 indicates good agreement between the model input and output. The declining trends indicate weaker temporal correlations as time lag increases, as modeled in Eq. (8).

FIG. 3.

Autocorrelation coefficients as a function of time lag. Blue curves represent autocorrelation coefficients from the generated scenarios, while black curves represent the coefficients from the training dataset.

FIG. 3.

Autocorrelation coefficients as a function of time lag. Blue curves represent autocorrelation coefficients from the generated scenarios, while black curves represent the coefficients from the training dataset.

Close modal
FIG. 4.

Relative differences between the modeled cross-variable correlation coefficients and the original correlation coefficients in Fig. 2.

FIG. 4.

Relative differences between the modeled cross-variable correlation coefficients and the original correlation coefficients in Fig. 2.

Close modal

As presented in Fig. 2, the wind sites are strongly correlated. As spatial correlation is modeled by copula functions, the resulting scenarios should present strong correlations across sites. We examine the linear correlation coefficients using the generated scenarios and compare it with the original coefficients from historical data. The correlation coefficient is given by

ρi,j=cov(xi,ta,xj,ta)σ(xi,ta)σ(xj,ta).
(36)

Similar to the autocorrelation coefficients, the estimated cross correlation coefficients ρ̂i,j are given when xi,ta are substituted by xi,ta,ξ. Due to the large number of sites, we calculate the relative differences between ρi,j and ρ̂i,j,

|ρi,jρ̂i,j|ρi,j.
(37)

Figure 4 visualizes the relative differences as heat maps, where each pixel represents one pair of sites. By comparing it with Fig. 2, it shows approximation errors using the copula functions. However, it also indicates that while greater errors are presented when the spatial correlations are weak, most errors for pairs with strong spatial correlations are not significant, indicating good agreement between the model input and output.

The results from the winter day (February 10, 2012) are shown in Fig. 5. As illustrated in Fig. 5(a), the diagonal indicates the ideal results where PICP equals PINC; therefore, greater distance between the PICP and the diagonal implies lower reliability. It shows that the reliability is affected by both the number of clusters and PINC. In most cases, the distance between PICP and the diagonal is greater when PINC = 50%. This inspires us to examine the average distance between the PICP and the diagonal over the entire PINC range, i.e., ACE, which measures the overall reliability by taking the mean. Figure 5(c) shows the ACE as a function of K. As can be seen, the ACE drops from around 15% to less than 1% when K increases from 2 to 9, and rebounds to 4% when K increases to 20. The considerable drop of ACE before K = 9 implies improved reliability due to clustering, which can be further attributed to the exclusion of sites that are weakly correlated. By contrast, the rise of ACE beyond K = 9 indicates less reliable scenarios with further refined clusters, probably due to the exclusion of those sites that present strong correlations.

FIG. 5.

Performance evaluation of the generated scenarios for the winter day (February 10, 2012): (a) PICP as a function of PINC. The diagonal (dashed line) represents the ideal result, where PICP = PINC. The red and blue curves represent the closest and farthest PICP profiles to the diagonal. (b) Interval scores as a function of PINC. The red and blue curves represent the profiles with the highest and lowest mean interval scores, respectively. (c) ACE and (d) AIS as a function of K.

FIG. 5.

Performance evaluation of the generated scenarios for the winter day (February 10, 2012): (a) PICP as a function of PINC. The diagonal (dashed line) represents the ideal result, where PICP = PINC. The red and blue curves represent the closest and farthest PICP profiles to the diagonal. (b) Interval scores as a function of PINC. The red and blue curves represent the profiles with the highest and lowest mean interval scores, respectively. (c) ACE and (d) AIS as a function of K.

Close modal

Similar results are presented by interval scores. Figure 5(b) exhibits monotonically increasing interval scores as PINC increases. In addition, the interval scores are also largely affected by K. Similar to ACE, AIS gives a comprehensive evaluation to the interval scores of a given K by taking the mean over the entire PINC range. As presented in Fig. 5(d), the AIS reaches its peak at K = 6 and declines thereafter. Since higher interval scores indicate better performance in terms of sharpness, the results suggest K = 6 for the optimal sharpness.

As displayed in Fig. 5, both metrics suggest that a medium number of clusters results in the optimal performance of the generated scenarios. However, the two metrics do not agree on the optimal number of clusters: the highest reliability is present when K = 9 while K = 6 maximizes sharpness. Therefore, SEM is introduced and used as a comprehensive evaluation metric by factoring in both reliability and sharpness. Figure 6 shows the SEM values and the CRPS scores as a function of K in all four representative days. As can be seen, the winter, spring, and fall days exhibit very similar profiles, where the SEM is minimized when a medium number of clusters is selected. For example, the SEM is minimized when K= 10, 12, and 10 in the winter, spring, and fall day, respectively. Although the summer day presents a slightly different profile, where the SEM is minimized at K = 15 and only rebounds slightly when K increases to 20, the minimum SEM still indicates the optimal overall performance of the generated scenarios when a medium number of clusters is selected. Interestingly, the CRPS scores present different profiles from the SEM: the scores remain low before K = 10 and increase when K > 10. Therefore, both the SEM and CRPS scores suggest against a high number of clusters, which can reduce the quality of the generated scenarios due to failure to account for correlations across wind farms. In addition, the CRPS scores also suggest benefits from clustering: the CRPS scores are at their lowest when K ranges from 2 to 4 in all seasons except for winter.

FIG. 6.

SEM and CRPS as a function of the number of clusters (K) for all four cases in our study. (a) February 10, 2012, (b) May 10, 2012, (c) August 10, 2012, and (d) November 10, 2012.

FIG. 6.

SEM and CRPS as a function of the number of clusters (K) for all four cases in our study. (a) February 10, 2012, (b) May 10, 2012, (c) August 10, 2012, and (d) November 10, 2012.

Close modal

The generated scenarios can be directly used as inputs to the stochastic DAUC models. However, the number of generated scenarios must be reduced to an adequate number before the model can be solved in a time efficient manner. We follow the procedures in Algorithm 2 to reduce the generated scenarios. Figure 7 shows the time consumption as a function of K. The total time consumption is composed of scenario generation time and scenario reduction time. As shown in all four representative days, the total time consumption sees a trade-off between scenario generation and scenario reduction. The time consumed by scenario generation contributes to over 90% of the total time consumption when K < 3, while scenario reduction consumes over 65% of the total time when K increases to 20. In addition, as K increases, the time consumption of scenario generation presents a decreasing trend, while scenario reduction presents an increasing trend. For example, the average time consumption of scenario generation across all four days decreases from 140 min to 25 min when K increases from 1 to 20. By contrast, the average time consumption of scenario reduction increases from less than 1 min to more than 50 min over the same range of K. It is not surprising to observe the trade-off, since as K increases, each cluster is more likely to see a smaller number of wind farms, which reduces the scenario generation time. However, the number of iterations in scenario reduction is directly related to the number of clusters, and an increased K leads to longer scenario reduction time.

FIG. 7.

Time consumption as a function of the number of clusters (K) for all four cases in our study.

FIG. 7.

Time consumption as a function of the number of clusters (K) for all four cases in our study.

Close modal

The trade-off of time consumption between scenario generation and reduction again suggests that a medium number of clusters should be selected. In addition, a medium number of clusters can also significantly reduce the total time consumption. As shown in Fig. 7, the total time consumption tends to be minimized when 5K10.

In addition, Table II compares the baseline (K = 1) with the clustered cases (K > 1). The comparison indicates that the clustering-based framework can significantly improve the generated scenarios in terms of SEM and time consumption. As can be seen, the SEM in the clustered cases (K = 2–20) can be reduced by up to 33%, while the total time consumption can be reduced by over 60%. In addition, the CRPS score can also benefit marginally from clustering. Therefore, the clustering-based framework can significantly enhance the quality of the generated scenarios and reduce the total time consumption.

TABLE II.

Comparison between the baseline (K = 1) and the clustered cases (K > 1). Note the minimum SEM, CRPS, and time consumptions in the clustered cases are listed. The optimal K only applies to the clustered cases, i.e., K > 1.

WinterSpringSummerFall
SEM Optimal K 10 12 16 10 
 Baseline 32% 32% 30% 23% 
 Clustered 26% 27% 20% 16% 
 Improvement 20% 16% 33% 28% 
CRPS Optimal K 
 Baseline 0.249 0.242 0.277 0.191 
 Clustered 0.259 0.231 0.268 0.186 
 Improvement −4% 5% 3% 3% 
Total time (min) Optimal K 
 Baseline 123 125 134 178 
 Clustered 49 49 51 54 
 Improvement 61% 61% 62% 70% 
WinterSpringSummerFall
SEM Optimal K 10 12 16 10 
 Baseline 32% 32% 30% 23% 
 Clustered 26% 27% 20% 16% 
 Improvement 20% 16% 33% 28% 
CRPS Optimal K 
 Baseline 0.249 0.242 0.277 0.191 
 Clustered 0.259 0.231 0.268 0.186 
 Improvement −4% 5% 3% 3% 
Total time (min) Optimal K 
 Baseline 123 125 134 178 
 Clustered 49 49 51 54 
 Improvement 61% 61% 62% 70% 

As mentioned in Sec. III, the DAUC model will be solved deterministically and stochastically. Each deterministic run solves only one wind scenario, while the stochastic run solves all scenarios simultaneously. The 1000 wind scenarios generated in Sec. V with the minimum SEM values are reduced to ten scenarios and are represented with “1” to “10.” Note that the stochastic run with scenarios generated from the clustered case is denoted by “SC” and we use “SU” to denote the stochastic run with scenarios generated from the baseline, i.e., the unclustered method. In the clustered cases, the numbers of clusters are selected based on the minimum SEM. In addition, another run (denoted by “A”) takes the actual wind power as inputs to the DAUC model as if we have perfect foresight into the future. In summary, the DAUC model is solved 13 times, including 11 deterministic solves and two stochastic solves. Next, the RTED model is solved by using the actual wind power and committing all thermal units, where the commitment statuses are given by the DAUC model. In this study, Pyomo Stochastic Programming (PySP),63 an open-source Python library, is employed to implement the stochastic programming models and Gurobi is used to solve them. On the same compute node as used in Sec. V, it takes 20–25 min to solve a deterministic run, around 8 h to solve a stochastic run, and less than 5 min to solve a RTED model.

Results of the power market simulations are presented in Fig. 8 and Table III. As shown in Fig. 8, scenario A is always the least expensive in all seasons due to perfect foresight of wind power. Meanwhile, the more expensive total costs in other scenarios can be largely attributed to uncertainties associated with the forecast. In addition, great variations of total costs are presented in the deterministic scenarios. For example, in the fall day, the total cost from scenario 2 is 66% more expensive than that of scenario A, while scenario 3 is only 7% more expensive. Overall, the increased costs in the deterministic runs range from 2% to 103% across all seasons and scenarios, implying great uncertainties when the day-ahead market is cleared based on the deterministic wind power forecast. In contrast, the total costs of scenario SC only increase by 4%, 2%, and 4% in the spring, summer, and fall days, respectively. Note that although scenario SC results in a 50% more expensive total cost in the winter day, it is still lower than most of the deterministic runs in the same day, which range from 49% to 103%. The comparison between scenarios SC and SU indicates that the total costs are increased when the clustering-based framework is applied. However, the increases are only marginal in most cases: the total costs increase by 1%–3% in seasons 2, 3, and 4.

FIG. 8.

Cost breakdown of market simulations for all four representative days by scenarios. White represents production costs and black represents reserve shortage costs.

FIG. 8.

Cost breakdown of market simulations for all four representative days by scenarios. White represents production costs and black represents reserve shortage costs.

Close modal
TABLE III.

Shortages of responsive reserve in all seasons (MWh).

Season12345678910ASCSU
Winter 6321 4034 5824 3047 3146 4095 5474 6411 4471 5141 3134 
Spring 4432 1586 4571 2632 2062 3795 4716 3921 1556 4326 803 1253 803 
Summer 735 2935 2949 1336 3972 905 523 978 432 191 177 
Fall 2479 3583 365 807 3304 1498 2758 2124 2891 2138 223 
Season12345678910ASCSU
Winter 6321 4034 5824 3047 3146 4095 5474 6411 4471 5141 3134 
Spring 4432 1586 4571 2632 2062 3795 4716 3921 1556 4326 803 1253 803 
Summer 735 2935 2949 1336 3972 905 523 978 432 191 177 
Fall 2479 3583 365 807 3304 1498 2758 2124 2891 2138 223 

A closer examination of the cost breakdown indicates that the differences in total costs are primarily driven by variations in reserve shortages. As shown in Fig. 8, production costs are almost equal across all scenarios in the same day, while costs of reserve shortages vary significantly. Note that regulating reserve requirements are well met in all scenarios and shortages of responsive reserves are shown in Table III. Similar to the results of total costs, scenario A always results in the least amount of shortages and scenario SC results in less shortages than most deterministic scenarios. Therefore, the stochastic program represents a better strategy for market scheduling than the deterministic approach in terms of reliability and economy.

This paper presents an enhanced scenario generation framework by integrating existing approaches with cluster analysis. As demonstrated in the results, the generated scenarios can benefit significantly from the clustering based framework in terms of reliability, sharpness, and time consumption. While only one scenario generation approach is investigated in this study, the framework can also be extended to many other scenario generation techniques. Therefore, this framework can provide proper support to real-world power system analysis with large-scale wind penetration.

Several insights can be drawn from our study. First, as shown in Sec. V, the SEM of the generated scenarios is optimal when a medium number of clusters is used, due to trade-off between intra-cluster and inter-cluster correlations. In addition, the time consumption is also affected by the number of clusters: a small cluster with less members requires less time for scenario generation; however, the increased number of clusters leads to an increased amount of time for scenario generation. Although the CRPS scores suggest a smaller optimal number of clusters, they also indicate that the quality of the generated scenarios clustering can be improved by clustering. The results also imply that it may negatively affect the quality of generated scenarios when all wind farms are grouped into one single cluster regardless of the strength of correlations, which highlights the necessity of the clustering-based framework.

In addition, as indicated in the power market simulation, the hedging strategy given by the stochastic DAUC model presents great advantage over most deterministic runs in terms of both economy and power system reliability. The reserve shortages are effectively reduced without significantly increasing the total costs. Note that even though some deterministic runs may result in lower total costs than the hedging strategy, i.e., scenario 4 in the winter day, it is more likely to result in greater loss by adhering to the deterministic strategy, since all other deterministic runs lead to greater total costs and reserve shortages. By contrast, the stochastic strategy outperforms most deterministic runs in terms of both economy and power system reliability.

Several caveats exist. Figure 8 indicates reserve shortages are present across all scenarios, which can be attributed to multiple factors. First, we assume ancillary services are only provided by thermal units, while other types of qualified resources, such as hydro or even load resources, are excluded. In addition, the economic loss due to reserve shortages may be underestimated as a result of the fixed penalizing prices, since the dynamic scarcity pricing mechanism in ERCOT allows the prices to rise when a shortage of operating reserves is present.58 Finally, ERCOT may procure replacement ancillary service capacity from a supplemental ancillary services market (SASM),60 which is not co-optimized with the energy market. Therefore, the reserve shortages do not necessarily reflect resource insufficiency. However, although this study may benefit from including the above factors, the main emphasis is placed on demonstration of the clustering-based scenario generation framework, and modeling of the above factors is beyond our scope.

Besides, we should note that although the stochastic run provides a comparatively better strategy for the power market operation, the time consumption is less acceptable to the operators. In this study, it takes around 8 h to solve its extensive form using state-of-the-art commercial solvers, while it takes 1.5 h to execute day-ahead reliability unit commitment (RUC) in ERCOT.60 The slow performance is largely intrinsic to mixed integer programs (MIPs), which is proved to be NP-hard.64 Therefore, in the future, additional techniques such as in Refs. 65 and 66 should be employed for a better and more stable time performance.

Another caveat, as shown in Fig. 8, is the more expensive total costs when the clustering-based framework is used to generate scenarios for stochastic runs. However, this does not indicate that the clustering-based framework is outperformed by the baseline in terms of solution quality, since only one number of clusters (K) are examined in each season due to the extremely long computational time of stochastic runs. Ideally, the solution quality should be examined over a range of cluster numbers. In addition, the optimal K is selected when the SEM is minimized and an underlying hypothesis is that a better SEM of probabilistic scenarios is related to a better solution quality of stochastic optimization. The fact that SEM is weakly correlated with the solution quality implies that further examination of other evaluation metrics and comprehensive studies should be conducted to examine the relationship between the evaluation metrics of probabilistic forecasting and the solution quality of stochastic optimization. Moreover, a real-world decision-maker may need a unique and comprehensive evaluation metric to determine the best set of scenarios, which should account for a variety of factors, such as scenario quality (reliability and sharpness), computational time, and solution quality. In fact, there has been little research into the trade-offs between the quality of stochastic optimization solutions and the quality of probabilistic scenarios and such studies are critical given the discrepancies presented in our study.67 

Finally, the clustering strategy in Sec. II is based purely on the cross-correlations across wind sites, which only captures the spatial correlation and fails to capture time dependencies. While temporal correlations are accounted for in the scenario generation in the form of autocorrelation, the quality of the generated scenarios could benefit from taking into account more complex dependence structures. Particularly, the spatiotemporal dependencies across wind sites can be complex, and a localized understanding of this structure may provide more targeted information to improve forecasts. For example, previous research suggests that the characterization of spatial correlation across wind sites can be improved by identifying the wind regime that is dominated by a wind direction since sites that are located along the wind direction are more likely to be correlated.68 Future studies will explore more clustering methods and criteria, such as distance-based methods, as studies have suggested superior performance when advanced techniques are adopted.69,70

This work was supported by the National Renewable Energy Laboratory under Subcontract No. XAT-8–82151-01 (under the U.S. Department of Energy Prime Contract No. DE-AC36–08GO28308). This work was authored in part by Alliance for Sustainable Energy, LLC, the manager and operator of the National Renewable Energy Laboratory for the U.S. Department of Energy (DOE) under Contract No. DE-AC36- 08GO28308. Funding provided by U.S. Department of Energy Office of Energy Efficiency and Renewable Energy Wind Water Power Technologies Office. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Sets
B

Buses

BGB

Buses with generators

BLB

Load buses

G

Generators

GbG

Generators on bus bBG

GTG

Thermal generators

GWG

Wind generators

L

Transmission lines, or branches

T

Time slices

Ξ

Wind power scenarios

Parameters
Cg

Slope of the cost function of generator gGT, in $/MWh

CΔL

System-wide load curtailment cost, $/MWh

CΔR

System-wide regulating up/down reserve curtailment cost, $/MWh

CΔS

System-wide spinning reserve curtailment cost, $/MWh

CgSU,CgSD

Start-up and shut-down cost of generator gGT

Cg0

Intercept of the cost function of generator gGT, in $

Db,t

Demand at bus bBL in time tT

K

Number of clusters during scenario generation

pg,0

Initial power production of generator g

Pg,tF

Forecasted power at time t from non-thermal generator gGGT

Pgmax,Pgmin

Upper and lower bounds of power production from thermal generator gGT

RgSU,RgSD

Start-up/shut-down ramp rates of generator g

RgU,RgD

Ramp up/down rates of generator g

RQtRD

System regulation down reserve requirement at time t

RQtRU

System regulation up reserve requirement at time t

RQtS

System spinning reserve requirement at time t

SF,b

Power shift factor, power flow increment on branch due to power injected at bus b

TgU,TgD

The minimum online/offline time of generator g

TgU0,TgD0

Number of initial time slices generator g must be online/offline

ui,ta

Marginal CDF of xi,ta

ui,tf

Marginal CDF of xi,tf

vg,0

Initial commitment status of generator g

Λ

Power limit of branch L

Variables
pg,t+

Power production from generator gG in time tT

p¯g,t+

Maximum power available from generator gG in time tT

rg,tRD+

Regulation down reserve provided by generator gGT in time t

rg,tRU+

Regulation up reserve provided by generator gGT in time t

rg,tS+

Spinning reserve provided by generator gGT in time t

vg,t

Commitment status of generator gGT in time tT, binary

xi,ta,xi,tf

Samples of random variables Xi,ta and Xi,tf

xi,ta,ξ

Realization of variables Xi,ta in scenario ξ

Xi,ta,Xi,tf

Random variables indicating actual and forecasted wind power at site i in time t

zg,tP+

Production cost of generator gGT in time tT

zg,tSD+

Shut-down cost of generator gGT in time tT

zg,tSU+

Start-up cost of generator gGT in time tT

ztΔ+

Load and reserve curtailment cost in time tT

δb,t+

Load curtailment at bus b in time t

δtRD+

System-wide regulating down reserve curtailment in time t

δtRU+

System-wide regulating up reserve curtailment in time t

δtS+

System-wide spinning reserve curtailment in time t

Copulas are employed to model the multivariate distributions in the denominator and numerator of Eq. (6). According to Sklar's theorem,71 any multivariate joint distribution can be written in terms of the marginal distributions of each component and a copula that describes the dependence structure between the components. Therefore, the joint CDF in Eq. (4a) becomes

Fa1aNf1fN(x1,ta,,x1,Na,x1,tf,,x1,Nf)=C2N(u1,ta,,uN,ta,u1,tf,,uN,tf),
(A1)

where the copula C2N:[0,1]2N[0,1] is a continuous, real-valued function and uj,ta,uj,tf[0,1] are marginal CDFs associated with Xj,ta and Xj,tf,

uja=Faj(xj,ta)=Prob(Xj,taxj,ta),
(A2a)
ujf=Ffj(xj,tf)=Prob(Xj,tfxj,tf).
(A2b)

Similarly, the joint PDF in Eq. (4b) can be expressed as

fa1aNf1fN=2NFa1aNf1fNX1,taX1,NaX1,tfX1,Nf=c2N·j=1N(fajffj),
(A3)

where c2N:[0,1]2N+ is the 2Nth order partial derivative of C2N,

c2N=2NC2NFa1FaNFf1FfN,
(A4)

and faj and ffj are marginal PDFs associated with Xj,ta and Xj,tf,

faj=dFajdXj,ta,
(A5a)
ffj=dFfjdXj,tf.
(A5b)

Using the above copula representation, the numerator in (6) can be expressed as follows using the copula function,

Prob(Xiaxia,Xji,ta=xji,ta,X1,tf=x1,tf,,XN,tf=xN,tf)=0xiafa1aNf1fNdXia=0xiac2N·j=1N(fajffj)dXia=2N1C2NFa1Fai1Fai+1FaNFf1FfN|0Fai(xia)·j=1N(fajffj)fai,j{1,,N}.
(A6)

Similarly, the denominator can be written as the following equation:

Prob(Xji,ta=xji,ta,X1,tf=x1,tf,,XN,tf=xN,tf)=Prob(Xia+,Xji,ta=xji,ta,X1,tf=x1,tf,,XN,tf=xN,tf)=2N1C2NFa1Fai1Fai+1FaNFf1FfN|01·j=1N(fajffj)fai,j{1,,N}.
(A7)

Finally, the equation in Eq. (6) becomes the form in Eq. (7),

Fai|a1ai1ai+1aNf1fN(xi,ta)=2N1C2NFa1Fai1Fai+1FaNFf1FfN|0Fai(xi,ta)2N1C2NFa1Fai1Fai+1FaNFf1FfN|01.
(A8)
1.
S.
Takriti
,
J. R.
Birge
, and
E.
Long
, “
A stochastic model for the unit commitment problem
,”
IEEE Trans. Power Syst.
11
,
1497
1508
(
1996
).
2.
J. R.
Birge
and
F.
Louveaux
,
Introduction to Stochastic Programming
(
Springer Science & Business Media
,
2011
).
3.
E. M.
Constantinescu
,
V. M.
Zavala
,
M.
Rocklin
,
S.
Lee
, and
M.
Anitescu
, “
A computational framework for uncertainty quantification and stochastic optimization in unit commitment with wind power generation
,”
IEEE Trans. Power Syst.
26
,
431
441
(
2011
).
4.
Q. P.
Zheng
,
J.
Wang
, and
A. L.
Liu
, “
Stochastic optimization for unit commitment—A review
,”
IEEE Trans. Power Syst.
30
,
1913
1924
(
2015
).
5.
J.
Wang
,
A.
Botterud
,
R.
Bessa
,
H.
Keko
,
L.
Carvalho
,
D.
Issicaba
,
J.
Sumaili
, and
V.
Miranda
, “
Wind power forecasting uncertainty and unit commitment
,”
Appl. Energy
88
,
4014
4023
(
2011
).
6.
J.
Wang
,
M.
Shahidehpour
, and
Z.
Li
, “
Security-constrained unit commitment with volatile wind power generation
,”
IEEE Trans. Power Syst.
23
,
1319
1327
(
2008
).
7.
F.
Bouffard
,
F. D.
Galiana
, and
A. J.
Conejo
, “
Market-clearing with stochastic security-part I: Formulation
,”
IEEE Trans. Power Syst.
20
,
1818
1826
(
2005
).
8.
F.
Bouffard
,
F. D.
Galiana
, and
A. J.
Conejo
, “
Market-clearing with stochastic security-part II: Case studies
,”
IEEE Trans. Power Syst.
20
,
1827
1835
(
2005
).
9.
B.
Wang
and
B. F.
Hobbs
, “
Real-time markets for flexiramp: A stochastic unit commitment-based analysis
,”
IEEE Trans. Power Syst.
31
,
846
860
(
2016
).
10.
A.
Papavasiliou
,
S. S.
Oren
, and
R. P.
O'Neill
, “
Reserve requirements for wind power integration: A scenario-based stochastic programming framework
,”
IEEE Trans. Power Syst.
26
,
2197
2206
(
2011
).
11.
P.
Pinson
, “
Very-short-term probabilistic forecasting of wind power with generalized logit–normal distributions
,”
J. R. Stat. Soc.
61
,
555
576
(
2012
).
12.
P.
Pinson
,
H.
Madsen
,
H. A.
Nielsen
,
G.
Papaefthymiou
, and
B.
Klöckl
, “
From probabilistic forecasts to statistical scenarios of short-term wind power production
,”
Wind Energy
12
,
51
62
(
2009
).
13.
M.
Cui
,
J.
Zhang
,
Q.
Wang
,
V.
Krishnan
, and
B.-M.
Hodge
, “
A data-driven methodology for probabilistic wind power ramp forecasting
,”
IEEE Trans. Smart Grid
10
,
1326
1338
(
2017
).
14.
H.
Bludszuweit
,
J. A.
Domínguez-Navarro
, and
A.
Llombart
, “
Statistical analysis of wind power forecast error
,”
IEEE Trans. Power Syst.
23
,
983
991
(
2008
).
15.
K.
Bruninx
and
E.
Delarue
, “
A statistical description of the error on wind power forecasts for probabilistic reserve sizing
,”
IEEE Trans. Sustainable Energy
5
,
995
1002
(
2014
).
16.
M. J.
Morshed
,
J. B.
Hmida
, and
A.
Fekih
, “
A probabilistic multi-objective approach for power flow optimization in hybrid wind-PV-PEV systems
,”
Appl. Energy
211
,
1136
1149
(
2018
).
17.
E.
Xydas
,
M.
Qadrdan
,
C.
Marmaras
,
L.
Cipcigan
,
N.
Jenkins
, and
H.
Ameli
, “
Probabilistic wind power forecasting and its application in the scheduling of gas-fired generators
,”
Appl. Energy
192
,
382
394
(
2017
).
18.
X.-Y.
Ma
,
Y.-Z.
Sun
, and
H.-L.
Fang
, “
Scenario generation of wind power based on statistical uncertainty and variability
,”
IEEE Trans. Sustainable Energy
4
,
894
904
(
2013
).
19.
N.
Zhang
,
C.
Kang
,
Q.
Xu
,
C.
Jiang
,
Z.
Chen
, and
J.
Liu
, “
Modelling and simulating the spatio-temporal correlations of clustered wind power using copula
,”
J. Electr. Eng. Technol.
8
,
1615
1625
(
2013
).
20.
C.
Tang
,
Y.
Wang
,
J.
Xu
,
Y.
Sun
, and
B.
Zhang
, “
Efficient scenario generation of multiple renewable power plants considering spatial and temporal correlations
,”
Appl. Energy
221
,
348
357
(
2018
).
21.
Z.
Wang
,
C.
Shen
, and
F.
Liu
, “
A conditional model of wind power forecast errors and its application in scenario generation
,”
Appl. Energy
212
,
771
785
(
2018
).
22.
F.
Hafiz
,
A. R.
de Queiroz
, and
I.
Husain
, “
Coordinated control of PEV and PV-based storage system under generation and load uncertainties
,” in
2018 IEEE Industry Applications Society Annual Meeting (IAS)
(
2018
), pp.
1
5
.
23.
K.
Høyland
,
M.
Kaut
, and
S. W.
Wallace
, “
A heuristic for moment-matching scenario generation
,”
Comput. Optim. Appl.
24
,
169
185
(
2003
).
24.
D.
Lee
and
R.
Baldick
, “
Load and wind power scenario generation through the generalized dynamic factor model
,”
IEEE Trans. Power Syst.
32
,
400
410
(
2017
).
25.
S. I.
Vagropoulos
,
E. G.
Kardakos
,
C. K.
Simoglou
,
A. G.
Bakirtzis
, and
J. P.
Catalão
, “
ANN-based scenario generation methodology for stochastic variables of electric power systems
,”
Electr. Power Syst. Res.
134
,
9
18
(
2016
).
26.
Y.
Xu
,
L.
Shi
, and
Y.
Ni
, “
Deep-learning-based scenario generation strategy considering correlation between multiple wind farms
,”
J. Eng.
2017
,
2207
2210
.
27.
H.
Wei
,
Z.
Hongxuan
,
D.
Yu
,
W.
Yiting
,
D.
Ling
, and
X.
Ming
, “
Short-term optimal operation of hydro-wind-solar hybrid system with improved generative adversarial networks
,”
Appl. Energy
250
,
389
403
(
2019
).
28.
Y.
Chen
,
Y.
Wang
,
D.
Kirschen
, and
B.
Zhang
, “
Model-free renewable scenario generation using generative adversarial networks
,”
IEEE Trans. Power Syst.
33
,
3265
3275
(
2018
).
29.
G.
Sideratos
and
N. D.
Hatziargyriou
, “
Probabilistic wind power forecasting using radial basis function neural networks
,”
IEEE Trans. Power Syst.
27
,
1788
1796
(
2012
).
30.
J.-F.
Toubeau
,
J.
Bottieau
,
F.
Vallée
, and
Z.
De Grève
, “
Deep learning-based multivariate probabilistic forecasting for short-term scheduling in power markets
,”
IEEE Trans. Power Syst.
34
,
1203
1215
(
2019
).
31.
G.
Papaefthymiou
and
D.
Kurowicka
, “
Using copulas for modeling stochastic dependence in power system uncertainty analysis
,”
IEEE Trans. Power Syst.
24
,
40
49
(
2009
).
32.
J. E. B.
Iversen
and
P.
Pinson
, “
Resgen: Renewable energy scenario generation platform
,” in
2016 IEEE Power Engineering Society General Meeting
(
IEEE
,
2016
).
33.
F.
Golestaneh
,
H. B.
Gooi
, and
P.
Pinson
, “
Generation and evaluation of space—time trajectories of photovoltaic power
,”
Appl. Energy
176
,
80
91
(
2016
).
34.
W.
Hu
,
Y.
Min
,
Y.
Zhou
, and
Q.
Lu
, “
Wind power forecasting errors modelling approach considering temporal and spatial dependence
,”
J. Mod. Power Syst. Clean Energy
5
,
489
498
(
2017
).
35.
D. L.
Woodruff
,
J.
Deride
,
A.
Staid
,
J.-P.
Watson
,
G.
Slevogt
, and
C.
Silva-Monroy
, “
Constructing probabilistic scenarios for wide-area solar power generation
,”
Sol. Energy
160
,
153
167
(
2018
).
36.
M.
Kazerooni
,
H.
Zhu
, and
T. J.
Overbye
, “
Literature review on the applications of data mining in power systems
,” in
Power and Energy Conference at Illinois (PECI), 2014
(
IEEE
,
2014
), pp.
1
8
.
37.
F.
Vallée
,
G.
Brunieau
,
M.
Pirlot
,
O.
Deblecker
, and
J.
Lobry
, “
Optimal wind clustering methodology for adequacy evaluation in system generation studies using nonsequential Monte Carlo simulation
,”
IEEE Trans. Power Syst.
26
,
2173
2184
(
2011
).
38.
L.
Dong
,
L.
Wang
,
S. F.
Khahro
,
S.
Gao
, and
X.
Liao
, “
Wind power day-ahead prediction with cluster analysis of NWP
,”
Renewable Sustainable Energy Rev.
60
,
1206
1212
(
2016
).
39.
A. B.
Birchfield
,
T.
Xu
,
K. M.
Gegner
,
K. S.
Shetye
, and
T. J.
Overbye
, “
Grid structural characteristics as validation criteria for synthetic networks
,”
IEEE Trans. Power Syst.
32
,
3258
3265
(
2017
).
40.
B.
Drake
and
K.
Hubacek
, “
What to expect from a greater geographic dispersion of wind farms?—A risk portfolio approach
,”
Energy Policy
35
,
3999
4008
(
2007
).
41.
F.
Roques
,
C.
Hiroux
, and
M.
Saguan
, “
Optimal wind power deployment in Europe—A portfolio approach
,”
Energy Policy
38
,
3245
3256
(
2010
).
42.
Y.
Ma
,
T.
Runolfsson
, and
J.
Jiang
, “
Cluster analysis of wind turbines of large wind farm with diffusion distance method
,”
IET Renewable Power Gener.
5
,
109
116
(
2011
).
43.
P.
Tan
,
M.
Steinbach
, and
V.
Kumar
,
Introduction to Data Mining
, Pearson New International Edition (
Pearson Education Limited
,
2013
).
44.
C.
Robert
and
G.
Casella
,
Monte Carlo Statistical Methods
(
Springer Science & Business Media
,
2013
).
45.
G.
Casella
and
E. I.
George
, “
Explaining the Gibbs sampler
,”
Am. Stat.
46
,
167
174
(
1992
).
46.
Y.
Dvorkin
,
Y.
Wang
,
H.
Pandzic
, and
D.
Kirschen
, “
Comparison of scenario reduction techniques for the stochastic unit commitment
,” in
2014 IEEE PES General Meeting
(
2014
), pp.
1
5
.
47.
J. M.
Morales
,
S.
Pineda
,
A. J.
Conejo
, and
M.
Carrion
, “
Scenario reduction for futures market trading in electricity markets
,”
IEEE Trans. Power Syst.
24
,
878
888
(
2009
).
48.
D.
Xu
,
Z.
Chen
, and
L.
Yang
, “
Scenario tree generation approaches using K-means and LP moment matching methods
,”
J. Comput. Appl. Math.
236
,
4561
4579
(
2012
).
49.
J.
Dupačová
,
N.
Gröwe-Kuska
, and
W.
Römisch
, “
Scenario reduction in stochastic programming
,”
Math. Program.
95
,
493
511
(
2003
).
50.
C.
Wan
,
Z.
Xu
,
P.
Pinson
,
Z. Y.
Dong
, and
K. P.
Wong
, “
Probabilistic forecasting of wind power generation using extreme learning machine
,”
IEEE Trans. Power Syst.
29
,
1033
1044
(
2014
).
51.
P.
Lauret
,
M.
David
, and
P.
Pinson
, “
Verification of solar irradiance probabilistic forecasts
,”
Sol. Energy
194
,
254
271
(
2019
).
52.
H.
Quan
,
A.
Khosravi
,
D.
Yang
, and
D.
Srinivasan
, “
A survey of computational intelligence techniques for wind power uncertainty quantification in smart grids
,”
IEEE Trans. Neural Networks Learn. Syst.
(in press).
53.
T.
Gneiting
and
A. E.
Raftery
, “
Strictly proper scoring rules, prediction, and estimation
,”
J. Am. Stat. Assoc.
102
,
359
378
(
2007
).
54.
H.
Hersbach
, “
Decomposition of the continuous ranked probability score for ensemble prediction systems
,”
Weather Forecasting
15
,
559
570
(
2000
).
55.
M.
Carrión
and
J. M.
Arroyo
, “
A computationally efficient mixed-integer linear formulation for the thermal unit commitment problem
,”
IEEE Trans. Power Syst.
21
,
1371
1378
(
2006
).
56.
N.
Kumar
,
P.
Besuner
,
S.
Lefton
,
D.
Agan
, and
D.
Hilleman
, “
Power plant cycling costs
,”
Technical Report No. NREL/SR-5500-55433
(National Renewable Energy Laboratory (NREL), Golden, CO, USA,
2012
).
57.
EIA
,
Monthly Energy Review
(
U.S. EIA
,
2018
).
58.
R.
Surendran
,
W.
Hogan
,
H.
Hui
, and
C.-N.
Yu
,
Scarcity Pricing in ERCOT
(FERC Technical Conference,
2016
).
59.
Public Utility Commission of Texas,
Chapter 25: Rules
,”
Electric Substantive Rules
(Public Utility Commission of Texas,
2018
).
60.
ERCOT
,
ERCOT Nodal Protocols
(
ERCOT
,
2019
).
61.
U.S. Energy Information Administration
,
State Electricity Profiles 2019
(
U.S. Energy Information Administration
,
2019
).
62.
C.
Draxl
,
A.
Clifton
,
B.-M.
Hodge
, and
J.
McCaa
, “
The wind integration national dataset (wind) toolkit
,”
Appl. Energy
151
,
355
366
(
2015
).
63.
J.-P.
Watson
,
D. L.
Woodruff
, and
W. E.
Hart
, “
PySP: Modeling and solving stochastic programs in Python
,”
Math. Program. Comput.
4
,
109
149
(
2012
).
64.
C.
Tseng
, “
On power system generation unit commitment problems
,” Ph.D. thesis, University of California, Berkeley (
1998
).
65.
J. F.
Benders
, “
Partitioning procedures for solving mixed-variables programming problems
,”
Comput. Manage. Sci.
2
,
3
19
(
2005
).
66.
J. R.
Birge
, “
Decomposition and partitioning methods for multistage stochastic linear programs
,”
Oper. Res.
33
,
989
1007
(
1985
).
67.
B.
Rachunok
,
A.
Staid
,
J.-P.
Watson
,
D. L.
Woodruff
, and
D.
Yang
, “
Stochastic unit commitment performance considering Monte Carlo wind power scenarios
,” in
2018 IEEE International Conference on Probabilistic Methods Applied to Power Systems (PMAPS)
(
IEEE
,
2018
), pp.
1
6
.
68.
K.
Kazor
and
A. S.
Hering
, “
Assessing the performance of model-based clustering methods in multivariate time series with application to identifying regional wind regimes
,”
J. Agric., Biol., Environ. Stat.
20
,
192
217
(
2015
).
69.
E.
Keogh
and
C. A.
Ratanamahatana
, “
Exact indexing of dynamic time warping
,”
Knowl. Inf. Syst.
7
,
358
386
(
2005
).
70.
Y.
Fu
,
Human Activity Recognition and Prediction
(
Springer
,
2016
).
71.
A.
Sklar
, “
Random variables, joint distribution functions, and copulas
,”
Kybernetika
9
,
449
460
(
1973
).