Computational time in optimization models scales with the number of time steps. To save time, solver time resolution can be reduced and input data can be down-sampled into representative periods such as one or a few representative days per month. However, such data reduction can come at the expense of solution accuracy. In this work, the impact of reduction of input data is systematically isolated considering an optimization which solves an energy system using representative days. A new data reduction method aggregates annual hourly demand data into representative days which preserve demand peaks in the original profiles. The proposed data reduction approach is tested on a real energy system and real annual hourly demand data where the system is optimized to minimize total annual costs. Compared to the full-resolution optimization of the energy system, the total annual energy cost error is found to be equal to or less than 0.22% when peaks in customer demand are preserved. Errors are significantly larger for reduction methods that do not preserve peak demand. Solar photovoltaic data reduction effects are also analyzed. This paper demonstrates a need for data reduction methods which consider demand peaks explicitly.

## I. INTRODUCTION

Microgrid planning can reduce investment and operational costs, through reducing demand and energy charges or increasing revenue for energy services provided to the utility.^{1} Optimization models can determine a cost-optimal solution without the need to iterate through a manually defined search space of solutions. The trade-off between accuracy and computational speed in optimization models is an important research question as it benefits the planning and design of efficient microgrids.

Reducing the computational time required to solve optimization problems can be at odds with the need for accurate solutions. Time series input data such as demand, solar insolation, and utility rates vary seasonally, daily, and hourly, and commercial utility rates often include demand charges, which can make up a disproportionate amount of annual electric charges. As interactions between fluctuating electricity prices, demand charges, and distributed energy resource (DER) dispatch significantly impact microgrid design, accurate modeling requires demand, solar insolation, and utility data to be represented with a high level of granularity, including demand peaks. Yet a single run of optimization models using an annual horizon and hourly granularity for time series data can take multiple days,^{2,3} a problem further exacerbated when considering multiple years^{4} or topology sizing.

A common technique to increase the computation speed of microgrid optimization is to reduce high resolution time series input data to (sets of) representative periods through data reduction methods. However, the choice of reduction method varies, as do the number, length, and resolution of the representative periods, indicating that there is little consensus on which method is most appropriate for any given microgrid model. Some researchers use 3–6 representative days per year with 1, 2, or 4-h resolution to capture seasonal and daily trends.^{5–9} Others aggregate time series data into 12 monthly representative days with 1-h resolution.^{9–12}

The methods for including representative peaks also vary. In addition to typical seasonal days, Refs. 13–15 capture seasonal (winter and summer) peak behavior in separate peak days, whereas in Ref. 9 only the single time instant of seasonal maximum demand is included in the typical seasonal days. Reference 16 uses two representative 1-h resolution days for each month: one to represent work days, and one to represent weekends and holidays. Reference 17 constructs and alters typical days, testing the number of days used, and selectively adds maximum demand values into the profiles based on the operational results of the full annual hourly resolution optimization of the system designed by the representative periods. Because it requires the full-resolution optimization, this approach increases the run-time and computational complexity. Both Refs. 18 and 19 cluster multiple time series (such as heating demand and photovoltaic (PV) data) into a tested range of typical days, but while Ref. 18 adds one annual peak day to represent extreme heating demand, Ref. 19 uses three separate peak profiles to account for demand extremes of each time series that is reduced. References 18 and 19 each note that the peak days are necessary for correct system sizing, but they do not quantify the impact of removing peak days.

To address the question of how accuracy and computational efficiency are impacted by the use of representative periods, the papers discussed through the remainder of the introduction focus explicitly on input data reduction approaches. These works each evaluate a single data reduction method, testing the number, length, and resolution of representative periods. The performance of a k-means method integrated with a parametric *ε*-constraint optimization technique is evaluated in Ref. 18. Both Refs. 3 and 20 evaluate a k-means method that maintains the minimum and maximum values of the original demand profiles. In Refs. 19 and 21, a k-medoids clustering method to reduce demand data is tested. The methods show high accuracy compared to reference results modeled with annual hourly time series,^{3,19–21} demonstrating the validity of using reduced time series data in energy system optimization. However, the use of only a single approach in each paper prevents generalization of the results.

Recently, researchers have begun to directly compare the performance of multiple methods of data reduction, seeking to identify whether certain techniques consistently outperform across a range of energy system models. A k-means algorithm clusters electricity demand and wind output data for Great Britain in Ref. 22, and 15 methods of selecting representative days from within each cluster are tested. A systematic analysis of downsampling, clustering, and heuristic techniques is presented in Ref. 23. These studies focus on large power systems with high shares of renewable generation, but microgrids and DERs have different optimization objectives and are subject to higher relative variability in input data.

There are two studies which deal with smaller systems, comparing multiple data reduction methods (including averaging and k-means) for optimizing generation and storage DER sizing for residential systems^{2,24} and an islanded system.^{24} Both Refs. 2 and 24 find that medoid-based methods tended to perform better than centroid-based or averaging methods. Testing both representative days and representative periods of lengths up to a week, Ref. 24 conclude that they are inadequate for modeling energy systems that rely on long-term storage, and that data reduction method performance primarily depends on the system being modeled.

To the best of the authors' knowledge, only Refs. 2 and 24 directly compare demand data reduction methods in microgrid optimization. However, neither studies peak demand in systems subject to demand charges or time of use (TOU) rates, and neither evaluates the optimization results for reducing different data types, such as solar PV vs demand data. Reference 24 observes differences by data type in the Root Mean Squared Error (RMSE) between the original time series and the reduced data equivalents, but does not extend this comparison of different data types to the optimization results. By simultaneously reducing multiple time series rather than reducing only demand data or only PV performance data, previous works may miss insight on the number of representative days needed to represent typical and extreme behavior of different data types.

The primary contribution of this paper is testing the importance of retaining peak demand profiles in data reduction, while isolating the impact of the reduction technique on microgrid design and cost considering both PV and natural gas generators. A reference microgrid optimization, using a full annual hourly resolution time step (8760 h), is used for validation. Nine different demand data reduction approaches across two classes of techniques are compared. The most important class is a novel approach developed for this paper, Monthly Peak Preservation (MPP). MPP constructs representative weekday and weekend profiles, while explicitly preserving demand peaks. Our testing methodology directly isolates the impact of data reduction methods for demand and solar PV, avoiding confounding results caused by data reduction of separate input time series.

The layout of the paper is as follows: Section II A introduces the data reduction methods. The simplified mixed-integer linear program (MILP) formulations are introduced in Sec. III A, followed by an explanation of the testing procedure in Sec. III B. Case studies and energy systems modeled are described in Sec. III C. Results are then discussed in Sec. IV.

## II. DATA AGGREGATION METHODOLOGY

### A. Demand data reduction overview

This paper uses two classes of demand data reduction methods: Monthly Peak Preservation (MPP), newly developed for this research, and k-means clustering. For both classes, the reduced demand data represent weekday and weekend day types separately. Representative hourly weekday and weekend demand profiles allow for variations in TOU rates to be captured, as utilities frequently differentiate between weekdays and weekends when setting such rates. MPP also includes representative profiles for peak days. Each representative daytype (dt = peak, week, weekend) in a month *m* has a 24-h, hourly representative demand profile, $R_{m,dt,h}$. The three day types are up-scaled to the full month using scaling factors $n_{m,dt}$, which represent the number of times each daytype is expected to occur in a given month *m*, and sum to the number of calendar days $ND_m$ ($\sum_{dt} n_{m,dt} = ND_m$). Equation (1) shows how annual energy consumption is calculated using the representative days, and its equivalence to the total actual consumption, the hourly demand *L*_{H} summed over the entire year.
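The energy equivalence described above can be written out as follows. This is a reconstruction from the surrounding definitions rather than a verbatim copy of Eq. (1); the hourly demand is written as $L_{m,d,h}$, the indexed form used later in the paper.

$$
\sum_{m=1}^{12} \sum_{dt} n_{m,dt} \sum_{h=1}^{24} R_{m,dt,h} \;=\; \sum_{m=1}^{12} \sum_{d=1}^{ND_m} \sum_{h=1}^{24} L_{m,d,h}
$$

The left-hand side scales each 24-h representative profile by the number of days it stands for; the right-hand side is the actual annual consumption.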

### B. New Monthly Peak Preservation Method (MPP)

#### 1. Peak demand profile

MPP is specifically formulated for this paper to preserve total annual energy demand and diurnal demand peaks by also considering a third type of representative profile, called the peak day.

To create these peak day profiles $R_{m,pk,h}$, the maximum demand for month *m* and hour *h* over all days *d* in the month is selected (Fig. 1).
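Written as a formula, and assuming the hourly demand is indexed as $L_{m,d,h}$ (month, day, hour), the selection described above would read:

$$
R_{m,pk,h} \;=\; \max_{d \,\in\, \{1,\dots,ND_m\}} L_{m,d,h}, \qquad h = 1,\dots,24
$$

Because the maximum is taken separately for every hour, the peak day profile is an envelope of the month and need not coincide with any single observed day.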

To test the optimal number of peaks to use in the MPP method, the number of peak days is varied from 0 to 5. Six sets of representative profiles ($R_{m,dt,h}$) and their corresponding values of $n_{m,dt}$ for each month are created. For each set, the peak day profile for each month is $R_{m,pk,h}$, and the weekday and weekend demand profiles are generated as described in Sec. II B 2 and in Algorithm 1 (see Appendix). The set of demand profiles without any peak days is referred to as *M*_{0}, the set with one peak day is referred to as *M*_{1}, etc.

#### 2. Weekday and weekend nonpeak demand profiles

Weekday and weekend demand profiles are generated to satisfy the following conditions: (1) Represent the average weekday and weekend demand profiles. (2) Maintain the monthly total consumption. Conditions (1) and (2) conflict as the peak day(s) have been separated out; the average weekday and weekend demand profiles therefore have to be adjusted downward such that the sum of all weekday, weekend, and peak demand profiles equals the total monthly consumption.

To construct $R_{m,wd,h}$ and $R_{m,we,h}$, the total annual hourly demand $L_{m,d,h}$ is first separated into data sets of weekday demand and weekend demand for each month, and each set is summed. Then, demand peak values at hour *h* are removed from the demand data sets, based on the occurrence of the peak on a weekday or a weekend, as well as on the number of peak days $n_{m,pk}$ represented in the month. For example, if 3 peak days are represented for month *m*, and the monthly peak at hour *h* fell on a weekday, then $3 \times R_{m,pk,h}$ is subtracted from the total demand summed over all weekdays for month *m* at hour *h*. This ensures that the addition of the third daytype to represent peak behavior avoids overestimating the annual energy demand, as calculated in Eq. (1). Negative nonpeak demand is avoided by limiting the number of peak days $n_{m,pk}$ (see the Appendix for details).

The number of times a weekday and weekend daytype occur, $n_{m,wd}$ and $n_{m,we}$ respectively, is calculated by subtracting the number of peak days $n_{m,pk}$ from the actual number of weekdays and weekends in a calendar month ($ND_{m,wd}$ and $ND_{m,we}$, respectively), as shown in Eq. (4). The subtraction is weighted by *η*_{m}, the ratio of hourly peak values that occurred on a weekday [defined in Eq. (3)], to ensure that Eq. (1) holds and that the sum of $n_{m,wd}$, $n_{m,we}$, and $n_{m,pk}$ equals the actual number of days in month *m*.
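The construction in this subsection can be sketched for a single month as below. This is an illustrative sketch, not the authors' Algorithm 1 (which is not reproduced in this section). The function name `mpp_month`, its arguments, and the returned dictionary are hypothetical. The sketch assumes *η*_{m} is the fraction of the 24 hourly peaks that fall on a weekday, that $n_{m,wd} = ND_{m,wd} - \eta_m n_{m,pk}$ and $n_{m,we} = ND_{m,we} - (1-\eta_m)\, n_{m,pk}$, and that the adjusted hourly sums are divided by these scaling factors. The limit on $n_{m,pk}$ that prevents negative nonpeak demand is not implemented.

```python
import numpy as np

def mpp_month(load, is_weekday, n_pk=1):
    """Sketch of Monthly Peak Preservation for one month (assumed formulas).

    load       : array (n_days, 24), hourly demand of every day in the month
    is_weekday : boolean array (n_days,), True for weekdays
    n_pk       : number of peak days represented in the month
    """
    load = np.asarray(load, dtype=float)
    is_weekday = np.asarray(is_weekday, dtype=bool)

    # Peak day profile: hourly maximum over all days of the month.
    r_pk = load.max(axis=0)
    # For each hour, did the peak occur on a weekday?
    peak_on_wd = is_weekday[load.argmax(axis=0)]
    # Assumed definition of eta_m: share of hourly peaks on weekdays.
    eta = peak_on_wd.mean()

    # Scaling factors (assumed form of the weighted subtraction).
    n_wd = is_weekday.sum() - eta * n_pk
    n_we = (~is_weekday).sum() - (1.0 - eta) * n_pk

    # Hourly sums per day type, with n_pk peak values removed from the
    # day type on which the hourly peak occurred.
    wd_total = load[is_weekday].sum(axis=0) - n_pk * r_pk * peak_on_wd
    we_total = load[~is_weekday].sum(axis=0) - n_pk * r_pk * (~peak_on_wd)

    # Dividing by the single scaling factor keeps monthly energy exact
    # in this sketch; the paper's own procedure may differ.
    return {
        "r_pk": r_pk, "r_wd": wd_total / n_wd, "r_we": we_total / n_we,
        "n_pk": n_pk, "n_wd": n_wd, "n_we": n_we, "eta": eta,
    }
```

With `n_pk=0` the sketch reduces to plain weekday and weekend averages, which matches the description of *M*_{0}. Under these assumptions the monthly energy is conserved exactly, whereas the paper reports a small nonzero consumption error for *M*_{1} to *M*_{5}, so the published procedure evidently differs in detail.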

### C. Demand clustering: K-means

K-means is a commonly used approach in the manipulation of large data sets to create representative, but smaller subsets of the data.^{25} It is formulated as a greedy optimization algorithm that assigns points among clusters by calculating cluster centroids which minimize the sum of the within-cluster sums of point-to-centroid distance. The distance measure *d*(*x*_{i}, *μ*_{k}) is the squared Euclidean distance between the points *x*_{i} within a cluster and the cluster centroid *μ*_{k}. K-means iterates the assignment of points to clusters and the calculation of cluster centroids, decreasing the total sum of distances and the number of reassignments until the algorithm reaches a minimum. Each cluster centroid is the empirical mean of all cluster members.

As with MPP, the weekday demand and the weekend demand of every month are each grouped separately into *c* clusters, and the cluster centroids are used as the representative profiles $R_{m,dt,c,h}$. Unlike for MPP, demand peaks are not separated in the k-means approach. Total annual consumption is conserved exactly by setting $n_{m,dt,c}$ as the number of cluster members in cluster *c*.

K-means is implemented using the built-in MATLAB k-means function. The number of clusters is defined *a priori* and varied as *c* = 1, 2, 3. The corresponding sets of representative profiles are referred to as *K*_{1}, *K*_{2}, and *K*_{3}, respectively. (As both *K*_{1} and *M*_{0} average all weekdays into one representative weekday and all weekend days into one representative weekend for each month, they are identical by definition.) An example of the clustering approach is shown in Fig. 2.
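The clustering step can be illustrated with a plain NumPy version of Lloyd's algorithm applied to the 24-h profiles of one day type in one month. This is a minimal sketch and not the MATLAB implementation used in the paper: the function name `kmeans_profiles`, the random initialization, and the stopping rule are assumptions made here for illustration.

```python
import numpy as np

def kmeans_profiles(days, c, n_iter=100, seed=0):
    """Cluster daily profiles (n_days, 24) into c representative profiles.

    Returns (centroids, counts): centroids has shape (c, 24) and counts[k]
    is the number of days in cluster k (the scaling factor of profile k).
    """
    days = np.asarray(days, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: c distinct days picked at random.
    centroids = days[rng.choice(len(days), size=c, replace=False)]

    for _ in range(n_iter):
        # Squared Euclidean distance of every day to every centroid.
        dist = ((days[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Centroid = empirical mean of members (unchanged if cluster empty).
        new_centroids = np.array([
            days[labels == k].mean(axis=0) if np.any(labels == k)
            else centroids[k]
            for k in range(c)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break

    counts = np.bincount(labels, minlength=c)
    return centroids, counts
```

Because each centroid is the mean of its members, weighting the centroids by the member counts reproduces the summed demand of all input days, which is the conservation property stated above. With `c=1` the single centroid is the plain average profile.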

### D. PV data reduction method

Both seasonal and day-to-day variations in PV system performance interact with demand variations, impacting optimal system design of microgrids. While this paper focuses on the effect of demand data reduction, the inclusion of PV as a technology option in the test cases motivates testing PV data reduction. PV system performance depends on local climate data and is typically calculated from solar radiation available as 8760 h Typical Meteorological Year (TMY) files. To capture seasonal variations, twelve representative days with hourly resolution $RPV_{m,h}^{avg}$ are created by averaging the daily PV system performance $PV_{m,d,h}$ over all days *d* in month *m*.

Additionally, we create sets of representative profiles to characterize the minimum, $RPV_{m,h}^{min} = \min_d(PV_{m,d,h})$, and the maximum monthly PV system generation, $RPV_{m,h}^{max} = \max_d(PV_{m,d,h})$, which are constructed by taking the minimum and the maximum values of the daily solar generation, respectively (Fig. 3). PV system performance data are provided to the optimization as AC power output normalized by system capacity.
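The three PV reductions described above amount to an hour-by-hour mean, minimum, and maximum over the days of each month. The sketch below shows one way to compute them. The function name `reduce_pv` and the assumption of a non-leap year starting on 1 January with 8760 hourly values are illustrative choices, not details taken from the paper.

```python
import numpy as np

# Calendar days per month, assuming a non-leap year.
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def reduce_pv(pv_hourly):
    """Reduce 8760 hourly PV values to 12 avg/min/max 24-h profiles.

    Returns three arrays of shape (12, 24): average, minimum, maximum.
    """
    pv = np.asarray(pv_hourly, dtype=float).reshape(365, 24)
    starts = np.cumsum([0] + DAYS_PER_MONTH[:-1])  # first day of each month

    avg, low, high = [], [], []
    for start, n_days in zip(starts, DAYS_PER_MONTH):
        month = pv[start:start + n_days]  # (n_days, 24)
        avg.append(month.mean(axis=0))    # averaged profile
        low.append(month.min(axis=0))     # hourly minimum over days
        high.append(month.max(axis=0))    # hourly maximum over days
    return np.array(avg), np.array(low), np.array(high)
```

As with the demand peak profile, the minimum and maximum profiles are hourly envelopes and generally do not correspond to a single observed day.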

### E. 8760-timeseries reconstruction

For validation of the demand data reduction methods, the reference optimization uses 8760 hourly time series data for demand, i.e., the complete demand dataset. But testing demand data reduction for microgrid investment decisions considering PV technologies also requires an annual hourly PV system profile with realistic intraday variations and total annual solar energy generation. Therefore, an 8760 h PV system performance time series (RepP) is reconstructed so that each day within month *m* has the same representative profile $RPV_{m,h}$. Removing day-to-day variations in PV generation within a month allows isolating the impact of demand data reduction.
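The RepP reconstruction described above can be sketched by repeating each month's 24-h profile once per calendar day. The function name `reconstruct_8760` and the non-leap-year calendar are assumptions made for this illustration.

```python
import numpy as np

# Calendar days per month, assuming a non-leap year.
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def reconstruct_8760(monthly_profiles):
    """Build an 8760-value series from 12 representative 24-h profiles.

    monthly_profiles : array (12, 24), one profile per month
    Every day in month m receives the same profile monthly_profiles[m].
    """
    profiles = np.asarray(monthly_profiles, dtype=float)
    if profiles.shape != (12, 24):
        raise ValueError("expected an array of shape (12, 24)")
    return np.concatenate([
        np.tile(profiles[m], n_days)  # repeat the month's profile daily
        for m, n_days in enumerate(DAYS_PER_MONTH)
    ])
```

When the averaged monthly profiles are used, the monthly and annual PV generation of the reconstructed series equals that of the original data, since each month's sum is its day count times the mean daily profile.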

Similarly, testing PV data reduction methods calls for a reconstructed 8760 annual hourly timeseries RepL for the demand data. Each RepL weekday in a month *m* has the same representative demand profile $R_{m,wd,h}$, each weekend in *m* has the same profile $R_{m,we,h}$, and $R_{m,pk,h}$ is placed in RepL to match the monthly demand peaks. RepL exhibits intradaily variations and total monthly consumption identical to the model using reduced demand data.

## III. TESTING METHODOLOGY

### A. Optimization schemes

#### 1. Representative days optimization

The demand data reduction methods are tested using a mixed-integer linear program (MILP), based on the Distributed Energy Resources Customer Adoption Model (DER-CAM)^{26–28} and further developed by XENDEE—Bankable Energy. The MILP minimizes the total annual costs *C* of providing energy services to a system by optimizing the technology portfolio and operation. The optimization takes input of demand in the form of representative demand profiles $R_{m,dt,h}$ and the corresponding number of days $n_{m,dt}$, as well as the monthly average 24 h PV system performance profile described in Eq. (5). Decision variables, which describe the unknown quantities of a mathematical model such as the number of generators and their dispatch schedule, are optimized for each time step *h* ∈ (1,24) within each of the representative days. Monthly and annual quantities are determined by scaling up the daily variables using $n_{m,dt}$.

The objective function considers all costs associated with meeting system energy demand, including monthly fixed utility costs, volumetric electricity purchases, demand charges (aggregated into $c_{utility}$), annualized technology investment costs $c_{invest}$, and technology operation and maintenance (O&M) costs $c_{O\&M}$. The optimization is subject to over 500 constraints, including operational, economic, and an energy balance constraint, where at every time step the demand $L_{m,d,h}$ must equal the sum of the electricity purchased $u_{m,d,h}$ and the electricity $g_{j,m,d,h}$ produced or consumed by technology *j*. A highly simplified form of the objective function and the energy balance is shown in
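Based only on the verbal description in this paragraph, the simplified objective function and energy balance can be written as below. This is a reconstruction and not the authors' typeset equations; the sign convention for $g_{j,m,d,h}$ (positive when technology *j* produces electricity, negative when it consumes) is an assumption.

$$
\min \; C \;=\; c_{utility} + c_{invest} + c_{O\&M}
$$

$$
L_{m,d,h} \;=\; u_{m,d,h} + \sum_{j} g_{j,m,d,h} \qquad \forall\, m,\, d,\, h
$$

In the representative days formulation, the day index *d* ranges over the day types rather than over calendar days.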

The utility costs *c*_{utility} are modeled by specifying the volumetric electricity and demand charge tariffs for each month. Technologies are modeled by specifying installation costs, O&M costs, and operating characteristics. In the first of two test cases, the generation technology is limited to PV, and in the second test case, the choice includes both PV and a natural gas generator. The optimizer selects and sizes the PV capacity as a continuous variable, and selects and sizes the number of gas generator units as a discrete variable.

#### 2. 8760-timestep optimization

An 8760 hourly time step optimization benchmarks the data reduction method performance. The 8760-time step optimization is created by increasing the resolution of the representative days MILP, replacing time steps $m \in (1,12)$, $d \in (1, ND_m)$, and *h* ∈ (1,24) with time step *t* ∈ (1,8760). The resulting optimization takes demand data input and PV data input in the form of annual hourly time series. This increases the run-time considerably, as will be demonstrated in the results, motivating the testing of representative days in this paper.

### B. Testing framework

#### 1. Overview

Data reduction method performance is tested by comparing results for the same energy system in an 8760-time step optimization and a representative day optimization. Apart from the time series data for the reduction method being tested, all input data representing the energy system are identical between the two models. Therefore, all discrepancies in performance metrics must be caused by data reduction (either for demand, when testing demand data reduction methods, or for PV, when testing PV data reduction).

Figure 6 provides a flow chart of the testing procedure. An overview of the system demand and PV system performance inputs is provided in Table I. Annual hourly time series data for system energy demand (ActL) and PV system performance (ActP) are reduced into sets of representative demand and PV profiles by applying the data reduction methods as described in Secs. II A and II D. Reconstructed annual hourly profiles RepL and RepP are generated by populating the representative demand and PV profiles across the year. The generator sizing and operation are determined by the 8760-time step and the representative day optimizations. The results of each energy system designed by the 8760-time step and the representative day optimizations are compared using the performance metrics described in Sec. III B 2.

#### 2. Performance metrics

Data reduction method performance will be evaluated based on objective function error, and discrepancies in generator sizing, demand charges, and energy charges. The primary metric for the data reduction method is the objective function error. Discrepancies in demand charges, energy charges, and generator sizing will be useful in understanding solver differences. The discrepancy between the representative day optimization results and the 8760-time step optimization results will be reported as a percent difference, relative to the 8760-time step optimization results.
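The percent difference used for all metrics can be stated as a one-line function. The sketch below is illustrative; the function name `percent_difference` is hypothetical and the numbers in the usage note are made up, not results from the paper.

```python
def percent_difference(representative, reference):
    """Percent difference of a representative-day result relative to
    the 8760-time step result (the reference)."""
    if reference == 0:
        raise ValueError("reference value must be nonzero")
    return 100.0 * (representative - reference) / reference
```

For example, a representative-day cost of 91.1 against a reference cost of 100.0 (hypothetical values) gives about -8.9%, where the negative sign indicates an underestimate relative to the 8760-time step optimization.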

### C. Energy system and case study description

#### 1. Building

The main gym building on the University of California (UC) San Diego campus in La Jolla, California is chosen for testing. The gym is open all days of the week, but is closed during holidays and school breaks. Historical 15-min resolution real power data from 2018 are aggregated to hourly resolution. Figure 7 shows a snapshot of the data and the annual load duration curve. The demand is highly variable, especially during vacations (at ≈40 kW or 20% of peak load) and periods of high use (weekday evenings at 200 kW). The demand data do not include cooling or heating which are supplied through district heating. All other energy needs for the gym are met through electricity to operate ventilation fans, lighting, and exercise equipment.

#### 2. Utility tariff

The tariff structure selected is SDGE Schedule AL-TOU Secondary, which applies to nonresidential customers with an average monthly demand exceeding 20 kW.^{29} Table II shows summer and winter energy charges and demand charges for on-peak, semipeak, and off-peak periods. The noncoincident demand charge is applied to the maximum hourly demand for a given month while the on-peak demand charge is applied to the maximum hourly demand during the peak period only. The on-peak period is from 1600 to 2100 h for all days of the year.
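The two demand charge components described above can be illustrated for one month of hourly data. This is a sketch under stated assumptions: the function name `monthly_demand_charge` is hypothetical, the rates are placeholder arguments rather than the values in Table II, and the on-peak window of 1600 to 2100 h is interpreted as the hours beginning at 16, 17, 18, 19, and 20.

```python
import numpy as np

def monthly_demand_charge(load_kw, noncoincident_rate, on_peak_rate,
                          on_peak_hours=range(16, 21)):
    """Demand charge for one month of hourly demand (n_days * 24 values).

    noncoincident_rate, on_peak_rate : currency per kW (placeholders)
    on_peak_hours : hour-of-day indices treated as on-peak (assumed)
    """
    load = np.asarray(load_kw, dtype=float).reshape(-1, 24)
    # Noncoincident charge: maximum hourly demand at any time in the month.
    noncoincident_kw = load.max()
    # On-peak charge: maximum hourly demand within the on-peak window only.
    on_peak_kw = load[:, list(on_peak_hours)].max()
    return noncoincident_rate * noncoincident_kw + on_peak_rate * on_peak_kw
```

A monthly peak that falls inside the on-peak window is billed under both components, which is why averaging away an evening peak can remove a large share of the modeled cost. In the optimization the charges apply to utility purchases, so the series passed to such a function would be demand net of on-site generation.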

#### 3. Generators

The PV system modeled is a fixed ground-mounted array with a nominal efficiency of 19%. PV system installation cost is set to $1700 per kW of capacity, and O&M costs are set to $1.417 per kW of capacity per month.^{30} The annual hourly PV system performance data are modeled AC power output per kilowatt of system capacity, calculated using NREL's PVWatts tool for San Diego at a tilt of 33° and an azimuth of 180°. System capacity is determined by the optimization.

The natural gas generator (NGGen) is based on the Generac SG100 with a power rating of 100 kW. Installation costs are $200 000 per generator, with variable O&M costs of $0.02 per kWh of output energy. The cost of natural gas varies from 7.63 to 9.16 dollars per thousand cubic feet, depending on the month. For reference, the annual cost of supplying energy entirely from the modeled generator is $0.052 per kWh. The lifetime of PV and natural gas generators is set to 30 years.

## IV. RESULTS

### A. Total annual consumption

All methods were designed to maintain total annual consumption. Figure 8 shows the percent error of the calculated annual consumption against the actual annual consumption, $\sum_{m=1}^{12} \sum_{d=1}^{ND_m} \sum_{h=1}^{24} L_{m,d,h}$. The total annual consumption error shown in Fig. 8 applies to both the PV only and PV and gas generator optimization, as the same demand profiles were used for testing in each case.

K1, K2, K3, and M0 simply average actual demand data to obtain mean representative demand profiles (without including peak days), and therefore show zero error in total annual consumption by definition. The error in total consumption in M1–M5 is nonzero but less than 0.7%. The error is a consequence of the peak day calculations, where demand peaks are subtracted from weekend and weekday data sets to construct peak profiles on an hour-by-hour basis, but total annual consumption is calculated by multiplying $R_{m,dt,h}$ by a single value $n_{m,dt}$ per profile.

### B. Demand data reduction

#### 1. Case 1: PV only

Figure 9 compares the error in the optimized objective function, which is the total annual energy cost (see Sec. III A 2). The data reduction methods that include at least one peak day profile for each month, M1–M5, perform the best (Table III, Fig. 9), with an objective function value that deviates from the 8760 objective function by less than 0.3%. The variation in objective function error with the number of peak days is minimal. Demand data reduction methods that do not include peak demand profiles show more significant objective function errors, as high as 8.9% for K1 and M0. The objective function error improves gradually from K1 to K2 (5.7%), and then to K3 (4.5%).

The objective function error is largely due to misrepresenting the demand charges. A clear correlation exists in Fig. 9 between the objective function error and demand charge discrepancy over various demand data reduction methods, and the demand charge error is dominant compared to other cost errors, as shown in Fig. 10. The significant contribution of demand charges to the total annual cost (48% for the 8760 case in Table III) is caused by the timing of the monthly demand peaks to occur primarily within the on-peak period and the absence of dispatchable generation. Since peak demand occurs in the evenings, i.e., at times without significant PV production, there is no opportunity for demand charge reduction through PV investment. Therefore, the underestimation in total annual energy costs is correlated with the underestimation of monthly peak demand, and therefore utility demand charges.

PV sizing (Table III) deviates significantly from the PV capacity in the 8760-time step optimization. Each of the reduced representative profile optimizations oversizes the PV system: M1–M5 select between 21% and 37% additional PV compared to the 8760-time step optimization, which is preferable for ensuring sufficient DER capacity for microgrid planning purposes. Yet M1–M5 have an objective function error of less than 0.3%, and 0% discrepancy in annual demand charges. For the daytype optimization, the solver minimizes the objective function by investing in more PV and reducing energy charges, whereas for the 8760-time step optimization, it reduces PV capacity and annualized investment costs rather than reducing energy charges. The trade-off between energy charges and investment costs is apparent in Fig. 10.

#### 2. Case 2: PV and natural gas generator

As in case 1, the demand data reduction methods that consider peak days outperform those that do not (Fig. 11, Table IV). The inclusion of a peak demand profile more significantly reduces the objective function error than increasing the number of representative profiles for K-means or increasing the number of representative peak profiles for MPP. The same trend for the objective function error observed in the results for case 1 is seen in the results for case 2. M1–M5 again have the smallest objective function error, ranging from 0.4% (M4) to 0.6% (M1), and there is little variation in the objective function error with the number of peak days. K1 and M0 have higher objective function errors of 6.7%, and as in case 1, gradual improvements are seen when comparing K1 to K2, then K3, and a steep improvement when comparing M0 to M1.

Demand charge discrepancy (Figs. 11 and 12) also shows the same trend as in case 1, although the correlation between the objective function error and the demand charge discrepancy is not as strong here. This is due to the dispatchable gas generator technology, which allows the solver to reduce demand charges through generator investments; this is unlike in case 1 where demand charges were independent of PV investment and therefore, the trade-off was only between energy charges and investment costs.

Including a peak demand profile $R_{m,pk,h}$ also affects generator sizing for case 2 as shown in Fig. 13. Again, there is a clear distinction between the demand data reduction methods that include a peak demand profile and the methods that do not. M1 through M5 select the same number of natural gas generators as the 8760-time step optimization, whereas K1, K2, K3, and M0 all select one fewer generator. The PV sizing discrepancy is reduced by the addition of dispatchable natural gas generators, particularly for M1 through M5, which now have a PV sizing discrepancy of less than 2.3%. The difference in generator selection also has an effect on the error for different cost components (Fig. 12). Unlike for case 1, the demand charge discrepancy is no longer dominant; with the selection of one fewer generator, K1–K3 and M0 save investment and O&M costs, but must meet more of the demand through utility purchases as reflected in the increased energy and demand charges.

#### 3. Discussion of demand data reduction results

Comparing the demand data reduction methods, the results indicate that the most significant impact on method performance is the inclusion of a peak demand profile. Methods that capture peak demand behavior have a low objective function error (M1–M5), and methods that only represent typical weekday and weekend demand have higher error (K1–K3, M0). The annual demand charge discrepancy indicates that the objective function error is largely due to misrepresenting the demand charges.

The significance of the peak demand profile is highlighted by observing that the inclusion of a peak demand profile plays a greater role in accurately representing the system demand than the number of daily profiles used. The error is greatest for K1 and M0, which only use two representative daily demand profiles for each month. K-means clustering does not capture demand peaks explicitly, but using four (K2) and six (K3) representative daily profiles per month, K-means improves the objective function error by increasing the maximum demand (as seen in the K1, K2, and K3 clusters in Fig. 2) and by capturing more variation in demand with respect to the TOU pricing periods. In contrast, M1–M5 use half as many representative demand profiles per month as K3, instead capturing extreme demand behavior with one peak day per month, yet significantly outperform K3. Within M1–M5, the number of peak days has little effect on changes in the objective function error.

In case 2, the generator sizing discrepancy is a direct consequence of the inclusion of demand peaks; dispatchable energy is economically beneficial because it reduces demand charges during evening peaks. Since they do not capture the peak demand, the optimizations using either k-means or M0 underestimate the demand charge reduction by a second generator. This further highlights that demand data reduction methods must be evaluated for energy systems subject to varying TOU rates and demand charges.

### C. PV data reduction

The impact of PV data reduction is isolated by comparing the representative profile optimization (RepL, RepP) against (RepL, ActP), using the same energy system as in case 2. M3 was chosen as the representative profile set for constructing RepL. Averaging the PV system performance data ($R_{PV,m,h}^{avg}$) performs best, with an objective function error of 0.9%, compared to a more significant underestimation of total annual cost by 6.1% using maximized PV system performance data ($R_{PV,m,h}^{max}$). These cost underestimations are due to overinvestment in PV capacity which then underperforms since the resource was overestimated. The minimization PV data reduction method causes costs to be overestimated by 3.9%, and no PV is selected in the solution as PV energy becomes more expensive than grid energy.
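The three PV reduction variants compared above can be sketched as hour-by-hour reductions over the days of a month. This is a simplified illustration rather than the paper's implementation: the function name `reduce_pv`, the four-hour day, and the normalized PV output values are assumptions.

```python
def reduce_pv(daily_profiles, how):
    """Collapse one month's daily PV profiles into a single profile, hour by hour.

    how: "avg", "max", or "min" (averaged, maximized, or minimized reduction).
    """
    reducers = {
        "avg": lambda values: sum(values) / len(values),
        "max": max,
        "min": min,
    }
    reduce_hour = reducers[how]
    # zip(*...) groups the values of all days that fall on the same hour.
    return [reduce_hour(hour_values) for hour_values in zip(*daily_profiles)]


# Hypothetical normalized PV output (per unit of installed capacity), 4 hours/day.
month = [
    [0.0, 0.6, 0.8, 0.2],  # clear day
    [0.0, 0.2, 0.4, 0.0],  # overcast day
    [0.0, 0.4, 0.6, 0.1],  # partly cloudy day
]

pv_avg = reduce_pv(month, "avg")
pv_max = reduce_pv(month, "max")
pv_min = reduce_pv(month, "min")
```

Only the averaged profile preserves the monthly PV energy; the maximized profile overstates the resource and the minimized profile understates it, which is consistent with the direction of the cost errors reported above.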

Reducing PV data has a greater impact on both the objective function and the sizing results than reducing demand data [cf. $R_{PV,m,h}^{avg}$ validated with (RepL, ActP) in Fig. 14, vs M3 in Fig. 11]. Reducing both demand and PV data increases the objective function error over either demand data reduction or PV data reduction. But the objective function error for reducing both demand and PV data is still under 1.5% for the averaged PV data reduction.

### D. Run times

Using representative demand profiles reduces the optimization run time by over 90% compared to the full 8760 h simulations. There is no meaningful or consistent difference that would motivate a choice of one of the reduction methods solely on the basis of computational speed, as seen in Fig. 15. Therefore, M1 through M5 are again identified as the methods with best performance, losing less than 1% accuracy in the objective function in exchange for significant run time savings.

## V. CONCLUSIONS

This paper assesses the accuracy of various demand and PV system performance data reduction methods for use in microgrid design optimization. We present the Monthly Peak Preservation (MPP) method, a new approach to averaging demand data while preserving demand peaks, and compare it against existing clustering techniques. MPP reduces annual hourly demand data to 36 representative 24-h demand profiles, using one peak profile per month to preserve peak demand, and two profiles per month to capture average weekday and weekend demand.
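The structure of the three profiles per month can be sketched as follows. This is a deliberately simplified illustration, assuming the peak profile is the hourly maximum over the month and the other two profiles are plain weekday and weekend means; it does not reproduce the bookkeeping of Algorithm 1, which also determines how many days each profile represents. The function name `mpp_month` and the sample data are hypothetical.

```python
def mpp_month(days):
    """Build three representative profiles for one month.

    days: list of (is_weekday, hourly_profile) tuples.
    Returns a dict with "peak", "weekday", and "weekend" profiles.
    """
    def hourly_mean(profiles):
        return [sum(hour) / len(hour) for hour in zip(*profiles)]

    all_profiles = [profile for _, profile in days]
    weekday = [profile for is_wd, profile in days if is_wd]
    weekend = [profile for is_wd, profile in days if not is_wd]
    return {
        # Highest demand observed at each hour across the whole month.
        "peak": [max(hour) for hour in zip(*all_profiles)],
        "weekday": hourly_mean(weekday),
        "weekend": hourly_mean(weekend),
    }


# Hypothetical month with four days and three hours per day (kW).
sample = [
    (True, [10.0, 20.0, 30.0]),
    (True, [10.0, 40.0, 30.0]),
    (False, [5.0, 10.0, 50.0]),
    (False, [5.0, 10.0, 10.0]),
]
profiles = mpp_month(sample)
n_profiles_per_year = 12 * len(profiles)  # 3 profiles per month
```

Applied to each of the 12 months, three profiles per month give the 36 representative days described above.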

The testing methodology is designed to isolate the effects of reducing demand time series from PV time series, providing new insight into the impact of using representative demand profiles and the comparative effects of various data reduction methods. Data reduction performance compared to 8760 h results is evaluated on the basis of objective function error, as well as discrepancies in demand charges, technology sizing, and energy charges.

Methods which include a peak demand profile outperform those that do not, demonstrating the importance of accounting for extreme demands and validating the MPP method. For a grid-connected building subject to TOU rates and demand charges, MPP (with 36 representative days) has less than 1% objective function error. Comparatively, methods which use 48 (K2) or 72 (K3) representative profiles to capture weekday and weekend demand without preserving demand peaks show an error between 1.83% and 5.7%. The number of peak days per month (M1 vs M5) had relatively little effect on the objective function error, compared to the effect of including vs removing peak demand profiles entirely.

The objective function error shows strong correlation with demand charge discrepancy in the cases studied, further highlighting the importance of including peak demand profiles in demand data reduction. For the case that includes options for dispatchable generation, DER sizing also changes in the absence of peak days. Methods that do not preserve peak demand consistently undersize both generator and PV capacity, which is unsurprising, given that systems subject to TOU rates and demand charges especially benefit from demand charge reduction through DER deployment. Therefore, DER cost savings increase with the inclusion of peak demand profiles. The results of our paper therefore also have implications for decarbonization and microgrid resiliency, as undersizing DERs is detrimental to both objectives.

Our results cover a relatively simple energy system without thermal demand or storage. While recent works on data reduction methods in energy system optimization have modeled systems with both electrical and thermal loads as well as a wide selection of generation and storage options, these papers have lumped together the impact of data reduction, without identifying the error contributions of demand and PV data reduction and discrepancies in different cost components. We clearly show that isolating different inputs is important, and we recommend that future research apply a similar approach in dissecting the impacts of representative profiles in MILP formulations.

Further work is necessary to continue to develop understanding of how data reduction influences design of more complex energy systems. Our MILP formulation does not model storage dispatch across representative days, which may have significant consequences for scenarios in which that is important. Follow-up research will address this challenge and present an evaluation of the impact of modeling uncoupled representative days on storage dispatch and sizing.

## ACKNOWLEDGMENTS

We would like to acknowledge Adib Nasle and the rest of the XENDEE team for their fruitful discussion and support, as well as WorleyParsons—Advisian for soliciting our consultation for microgrid projects, and providing valuable experience on real microgrid projects.

### APPENDIX: UPPER BOUND ON NUMBER OF PEAK DAYS

The maximum number of peak days that can occur in a month, $\overline{n_{m,pk}}$, is calculated as the minimum number of peaks that can be subtracted out across hours for a given month, depicted in Fig. 16 and detailed in Algorithm 1.

The shaded segments each represent the peak demand at each hour ($R_{m,pk,h}$), and are stacked in multiples, where each shaded area represents an occurrence of a peak day ($n_{m,pk}$). The dashed line is the sum of the demand data, based on whether the peak at hour *h* occurred on a weekday. For instance, at hour 5, the peak occurred on a weekend, so the value of the marker is the sum of the energy consumption across all weekends in March at hour 5. The solid line indicates the boundary before the stacked areas first intersect the marker line. $\overline{n_{m,pk}}$ is the number of segments under the solid line, depicted here as 6 peak days possible.
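One reading of the bound described above is that, at each hour, the hourly peak can be stacked only as many times as fits under the summed demand of the matching day type, and the tightest hour sets the limit. The sketch below implements that reading; it is an interpretation of the figure description, not a transcription of Algorithm 1, and the function name `max_peak_days` and the sample data are hypothetical.

```python
import math


def max_peak_days(days):
    """Upper bound on the number of peak days for one month.

    days: list of (is_weekday, hourly_profile) tuples.
    Returns the minimum over hours of floor(marker_h / peak_h), or None if
    every hourly peak is zero.
    """
    n_hours = len(days[0][1])
    bound = None
    for h in range(n_hours):
        # Day on which the peak at hour h occurred, and whether it is a weekday.
        peak_is_weekday, peak_profile = max(days, key=lambda d: d[1][h])
        peak = peak_profile[h]
        if peak <= 0:
            continue  # an hour with no demand cannot constrain the bound
        # Marker: summed demand at hour h over all days of the same day type.
        marker = sum(p[h] for is_wd, p in days if is_wd == peak_is_weekday)
        n_h = math.floor(marker / peak)
        bound = n_h if bound is None else min(bound, n_h)
    return bound


# Hypothetical month with five days and two hours per day (kW).
sample = [
    (True, [10, 40]),
    (True, [10, 20]),
    (True, [10, 20]),
    (False, [5, 10]),
    (False, [4, 10]),
]
n_bar = max_peak_days(sample)
```

In this example the first hour allows three stacked peaks (30 / 10) and the second hour allows two (80 / 40), so the bound is set by the second hour.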

Also of note, $\eta_{March} = \frac{15}{24}$, as 15 peak demand values (circle hours) occurred on a weekday. The number of peak days is varied from 0 to $\overline{n_{m,pk}}$. Algorithm 1 shows the details of how MPP uses $\overline{n_{m,pk}}$ to construct $\max_m(\overline{n_{m,pk}})+1$ sets of representative profiles ($R_{m,dt,h}$) and their corresponding values of $n_{m,dt}$ for each month.