Solar energy prediction

omar

#1
Hi,
I am predicting one-day-ahead solar energy output using 30 days of historical data. The data sets are hourly, so the prediction is done hourly from sunrise to sunset.
I have done the prediction using a sliding window technique. When I am predicting 01/06, I use 30 days of historical data (from 02/05 to 31/05) as the training dataset to build the model. The training dataset includes weather variables (global horizontal irradiance, direct normal irradiance, temperature and humidity) as inputs to the model and the actual solar energy output (not the predicted) as the dependent variable. I do this for 360 days.
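To make the window bookkeeping concrete, here is a minimal Python sketch of how the training window moves with the target day. The function name `sliding_windows` and the year 2019 are only assumptions for illustration; fitting the model itself is not shown.

```python
from datetime import date, timedelta

def sliding_windows(first_target, n_targets, window_days=30):
    """Yield (train_start, train_end, target_day) for each forecast day.

    The training window is the `window_days` days immediately before
    the target day, so it slides forward by one day per forecast.
    """
    for i in range(n_targets):
        target = first_target + timedelta(days=i)
        train_end = target - timedelta(days=1)
        train_start = target - timedelta(days=window_days)
        yield train_start, train_end, target

# Hypothetical year; the first target day is 01/06 as in the post.
windows = list(sliding_windows(date(2019, 6, 1), n_targets=360))
first = windows[0]
print(first)  # training 02/05 to 31/05, target 01/06
```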
I got this comment: "So many lagged dependent variables will raise a multicollinearity issue in a linear regression model. Did you check it?"

What is the answer to this?

I hope someone can help me with this.

Regards,
 
#2
Hi Omar,

Generally, you should think about which variables to put into the model; don't just insert every possible variable.
You should use some theoretical knowledge when choosing the predictors.

Multicollinearity happens when some of the predictors are highly correlated.

For example, suppose X1 causes Y and X2 doesn't cause Y, but X1 is highly correlated with X2.
X1 may then come out as an insignificant predictor, even though it should be significant.
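One common way to quantify this is the variance inflation factor (VIF). Here is a rough sketch for the special case of only two predictors, where VIF = 1 / (1 - r^2) and r is their Pearson correlation. The irradiance numbers are made up for illustration, not real measurements, and the function names are my own. With more than two predictors you would regress each predictor on all the others and use that R^2 instead.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Made-up hourly values (W/m^2) for illustration only.
ghi = [100, 250, 400, 520, 610, 580, 430, 260]  # global horizontal irradiance
dni = [90, 240, 410, 500, 640, 560, 450, 240]   # direct normal irradiance

vif = vif_two_predictors(ghi, dni)
print(f"VIF = {vif:.1f}")
```

A VIF close to 1 means the predictor is nearly uncorrelated with the other one; a common rule of thumb treats values above about 5 to 10 as a sign of problematic multicollinearity.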

http://blog.minitab.com/blog/unders...ling-multicollinearity-in-regression-analysis
 
#3
Hi,
Thank you for your answer.
But I am talking about the dependent variable, not the independent variables, so I am talking about correlation between the Y's and not the X's.
If I have autocorrelation between the Y's (found using the acf function), what is the solution? Is it just to include the lagged Y's as inputs to the model?
Regards,
Omar
 
#5
Yes, the prediction is done hourly for 24 hours ahead, using 30 days of historical data.
I am using MLR (multiple linear regression); the input variables are the weather variables, and the output is the energy.
 
#6
Hi Omar,

One of the regression assumptions is independence of errors.
Since the Y values over the last 30 days are probably correlated, you should check this assumption.
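One standard check for first-order autocorrelation in the residuals is the Durbin-Watson statistic. Below is a minimal sketch in plain Python; the residual values are toy numbers chosen to show the two extremes, not output from a real model. A value near 2 suggests no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Toy residuals, in time order (illustrative only).
persistent = [1, 1, 1, -1, -1, -1]    # errors keep their sign for a while
alternating = [1, -1, 1, -1, 1, -1]   # errors flip sign every step

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
print(dw_persistent, dw_alternating)
```

The residuals must be kept in time order for this to be meaningful. If you prefer a library routine, statsmodels provides `durbin_watson` in `statsmodels.stats.stattools`.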
 
#7
Hi,
I did, and there is a relation between the errors, as I can see from the Q-Q plot. Is there another way to check that? And if there is a relation, what is the solution?
Many thanks.