Hello!
In the context of regression, in what situations am I allowed to group separate variables into a single categorical predictor? I'll use R code as an example since it's what I'm most familiar with.
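For example, I could fit a model with day1 and day2 as separate predictors (first code block below), but I could also reshape dat into long format and fit a model with a single categorical predictor (second code block below).
The two approaches give very different results, and I'm not sure whether one is considered more valid than the other.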
Code:
dat <- data.frame(ind = c(1,2,3,4,5,6), y = c(40,63,23,66,74,45), day1 = c(4,6,3,6,1,3), day2 = c(6,4,7,9,8,9))
dat
  ind  y day1 day2
1   1 40    4    6
2   2 63    6    4
3   3 23    3    7
4   4 66    6    9
5   5 74    1    8
6   6 45    3    9
mod <- lm(y ~ day1 + day2 + day1:day2, data = dat)
summary(mod)
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -133.873    230.464  -0.581    0.620
day1          33.121     41.781   0.793    0.511
day2          23.062     29.095   0.793    0.511
day1:day2     -4.046      5.289  -0.765    0.524
Code:
library(reshape2)  # assumed source of melt(); the post does not show which package was loaded
dat2 <- melt(dat, id.vars = c("ind","y"))
dat2
   ind  y variable value
1    1 40     day1     4
2    2 63     day1     6
3    3 23     day1     3
4    4 66     day1     6
5    5 74     day1     1
6    6 45     day1     3
7    1 40     day2     6
8    2 63     day2     4
9    3 23     day2     7
10   4 66     day2     9
11   5 74     day2     8
12   6 45     day2     9
mod2 <- lm(y ~ variable + value + variable:value, data = dat2)
summary(mod2)
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)         47.7965    20.7469   2.304   0.0502 .
variableday2        -1.7345    41.7837  -0.042   0.9679
value                1.0531     4.9129   0.214   0.8356
variableday2:value  -0.2478     6.9479  -0.036   0.9724
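One way to see why the two fits differ is to check what the long-format model is actually estimating. The sketch below is not part of the original output. It assumes the long layout shown above and rebuilds it in base R so that it runs without any package. The names `fit1`, `fit2`, `long_day1` and `long_day2` are introduced here purely for illustration. It compares the long-format coefficients with two separate simple regressions of y on each day.

```r
# Rebuild the example data so this block runs on its own.
dat <- data.frame(ind  = 1:6,
                  y    = c(40, 63, 23, 66, 74, 45),
                  day1 = c(4, 6, 3, 6, 1, 3),
                  day2 = c(6, 4, 7, 9, 8, 9))

# Base-R stand-in for the melt() call (assumption: same long layout as dat2 above).
dat2 <- data.frame(ind      = rep(dat$ind, 2),
                   y        = rep(dat$y, 2),
                   variable = factor(rep(c("day1", "day2"), each = nrow(dat))),
                   value    = c(dat$day1, dat$day2))

# Long-format model from the post.
mod2 <- lm(y ~ variable + value + variable:value, data = dat2)

# Two separate simple regressions on the wide data (illustrative names).
fit1 <- lm(y ~ day1, data = dat)
fit2 <- lm(y ~ day2, data = dat)

# Recover one intercept/slope pair per day from the long-format coefficients.
b <- coef(mod2)
long_day1 <- unname(c(b["(Intercept)"], b["value"]))
long_day2 <- unname(c(b["(Intercept)"] + b["variableday2"],
                      b["value"] + b["variableday2:value"]))

# Compare with the separate fits.
all.equal(long_day1, unname(coef(fit1)))
all.equal(long_day2, unname(coef(fit2)))
```

If both comparisons agree, the long-format model is fitting one line of y on day1 and another of y on day2. Neither line adjusts for the other day, which is a different question from the one the wide-format model asks. The long data also has 12 rows but only 6 distinct responses, so its standard errors treat each repeated y as an independent observation.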
Thanks!