Activity: Service station, with coffee shop, shop and restaurant.
- In the case of a service station, we could discuss if the representative period of activity is a week or a year. Probably the consumption pattern is repeated weekly so with some weeks of data would be enough. On the other hand, if the station has a lot of seasonal climate, it would be convenient to have one year of data.
- As consumption data, we will need, at least, the main consumption of the location
- As external data or consumption variables, it would be ideal to have degree days (CDD, HDD) information and an indicative parameter about the occupation of the service station (tickets or sales).
• Response variable:
- Main consumption of the installation since January 2012 to December 2012 (1 year), hourly frequency
- Heating and cooling degree days (obtained from degreedays.net) (HDD, CDD)
- Daily sales (S)
- Daily tickets (T)
- It is interesting to generate to generate new data sets transforming existing variables and study if correlated with consumption. For this example, we will create square and cubic variables of the tickets (T) and Sales (S) and the product of Sales*Heating degree days and Sales*Cooling degree days.
- HDD². HDD³
- CDD², CDD³
- T², S²
- S*HDD, S*CDD
Resolution or baseline frequency
The resolution or baseline resolution is affected by the resolution of the available variables. In our case, Degree Days, tickets ans Sales has "Daily" resolution, so this will be the resolution of our formula.
Calculating the formula with Mintab 16
Minitab 16 is a statistical software useful for this types of problems. There exists more software which can help us, but Minitab 16 is one of the most used in the field of engineering statistics.
Once we have installed, we open and we introduce our data in columns as we can see in the following figure:
Note: In the image above there is an error with the first HDD^2 values, as they are HDD^3.
The tool we are going to use in order to calculate the formula is called "Regression". If we want an exhaustive analysis where we need more accuracy, we can use "Regression step by step".
We click in "Statistics" > "Regression" > "Regression..."
As Response we select "Main [kWh]" and as a "Predictors" the rest of variables expect the "Date", because it's not necessary for calculate the formula. The variables are selected by double-clicking on them.
Note: If you are interested in observing the residual values in graphical format, in "Graphics" you can activate the "four in one" option. See residues (error) in a graphical format will allow to detect outliers which can be eliminated from the analysis.
We click on "Accept" and Minitab will calculate automatically the equation based on the predictors selected, the correlation coeficient (R-cuad). Then, we have to refine the process.
Refining the equation - statistical P value
The first equation estimated is not necessarily the best. In fact, we have introduced in the model a set of interactions between variables and transformations that maybe don't correlate with the model.
One way to understand if a variable correlated or not is to look at the P value. If it's greater than 0.05 indicates that the variable in the model does not correlate. We can see our first model we have up to 7 variables that do not correlate:
We don't have to eliminate all the variables which not correlate, because there they are related between them. We have to remove one by one, from highest to lowest P value and analyse iteration by iteration what values of P we receive.
Moreover, it may happen that after removal one of the variables, if we introduce it again, we will give a P value less than 0.05.
Therefore, the refinement process it's not concrete, and can exists infinite combinations. We must decide when the model is correct for our purpose and the iterations.
One of our results could be this one:
We can see that the formula has been simplified, using only Cooling degree days (CDD) and sales (S) as predictors. We see how the formula is a nonlinear grade 3 polynomial. We have the cubic CDD variable and the interaction of CDD with Sales (S), indicating more heating is related with more sales.
The correlation value is so good (94.4%), which indicates an error in our savings about a 5.6%
With the residual chart we will observe if exists some outliers in the model or not, indicating a lower correlation coefficient and a bigger error. If so, you can reach an agreement with the client to delete this data from the model. Sometimes it can happen that up to 30% of the data received are outliers and has to be removed.
Once we have calculated our formula, it's time to insert it to our Measure & Verification project in DEXCell Energy Manager!
HOWTO - Generate my baseline based on a formula