Selecting Variables in Regression Analysis

#1
Hello, I am working on generating a regression analysis, but I am not sure if my setup is accurate.

I would like to predict how long it will take to preheat my oven based on certain variables. My dependent variable will be the time it takes for the oven to preheat. For my independent variables, I have the temperature I wish to preheat the oven to, the ambient temperature in my home, and the time of day.

I have my data collected; however, I am not sure how to quantify "time of day" as a value for the analysis. I played around with it, and for one analysis I used the value 1 for tests run between 8am and 12pm, 2 for tests run between 12pm and 4pm, and 3 for tests run between 4pm and 8pm.

Can anyone tell me if this is a good way to use "time of day" as an effective variable when running a regression analysis?

Thank you very much!
 

rogojel

TS Contributor
#2
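Hi,
Coding time of day as 1, 2, 3 is definitely a bad idea, as it is a completely arbitrary scale: it forces the effect of moving from block 1 to block 2 to equal the effect of moving from block 2 to block 3. You could try dummy variables instead.
Regards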
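
A minimal sketch of the dummy-variable idea, assuming the three time blocks from post #1 (8am-12pm, 12pm-4pm, 4pm-8pm). The names `BLOCKS`, `time_block` and `dummy_code` are made up for illustration, and treating the morning block as the baseline is an arbitrary choice:

```python
# Hypothetical time blocks taken from post #1, as (start_hour, end_hour) in 24h time.
BLOCKS = {"morning": (8, 12), "afternoon": (12, 16), "evening": (16, 20)}


def time_block(hour):
    """Return the name of the block that contains the given hour."""
    for name, (start, end) in BLOCKS.items():
        if start <= hour < end:
            return name
    raise ValueError(f"hour {hour} is outside the tested range 8-20")


def dummy_code(hour, baseline="morning"):
    """Return one 0/1 indicator per block, leaving out the baseline block.

    Dropping one block avoids perfect collinearity with the intercept;
    each coefficient is then read as a difference from the baseline.
    """
    block = time_block(hour)
    return {f"is_{name}": int(name == block) for name in BLOCKS if name != baseline}


print(dummy_code(9))   # a morning run: both indicators are 0
print(dummy_code(13))  # an afternoon run: is_afternoon is 1
```

The two indicator columns would go into the regression in place of the single 1/2/3 column, so each block gets its own coefficient instead of being forced onto an evenly spaced scale.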
 
#3
I am not sure why you want time of day in the model at all. But since you do, try the sine of the hour, with the hour converted to an angle first.

1. Don't categorize. Categorizing continuous variables loses information and invokes magical thinking.
2. Time as a linear predictor makes no sense because time of day is not linear: 23:59 is very close to 00:01.
3. So, use trigonometry, which is periodic and so repeats every 24 hours.

You could map the 24 hours onto 360 degrees (so 00:00 = 0, 12:00 = 180, etc.), then take the sine and put that in your model.
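
A rough sketch of that conversion, using only the standard library. The function names `encode_time_of_day` and `circular_distance` are invented for this example. One addition that is my assumption rather than part of the suggestion above: the sine alone gives the same value to two different times (01:00 and 11:00, for instance), so the sketch also returns the cosine, which is a common way to give every time of day its own point on the circle.

```python
import math

MINUTES_PER_DAY = 24 * 60


def encode_time_of_day(hour, minute=0):
    """Map a clock time onto the unit circle and return (sine, cosine).

    00:00 corresponds to 0 degrees, 12:00 to 180 degrees, and so on.
    """
    minutes = (hour * 60 + minute) % MINUTES_PER_DAY
    angle = 2 * math.pi * minutes / MINUTES_PER_DAY  # radians
    return math.sin(angle), math.cos(angle)


def circular_distance(a, b):
    """Straight-line distance between two encoded times."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


late = encode_time_of_day(23, 59)
early = encode_time_of_day(0, 1)
noon = encode_time_of_day(12)

# 23:59 and 00:01 end up close together, while noon is far from both.
print(circular_distance(late, early))
print(circular_distance(late, noon))
```

Both columns (sine and cosine) would then go into the model as two ordinary predictors.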
 
#4
Thanks for the reply! My idea is to see how time of day affects an experiment. I suppose in my oven example, time of day would not have much effect. In another example, say, commuting time, I think it would matter more. For example, if I left my house for work 10 minutes later, how would this affect a normal commute? It's not for any real study; I am just curious about the concept more than anything.

I'll give it a try, thanks for the advice!