Right-skewed variable in a regression model

Hey everybody, I have a variable (year) that I was supposed to include in my regression model as a fixed effect. I just found out that it is heavily right-skewed (most observations fall in the last 4 years of the 20-year sample period) and that it ruins my model fit. Every other control variable that is significant when year is excluded loses its significance when I include year.
I was told I may leave the year variable out of the regression if I give a good explanation for excluding it. Am I right that including year would overstate the significance of observations in the later years and "overfit" the model?


Well-Known Member
Hi Derperino,

You should build the model based on subject-matter knowledge, not only on the statistical results.
You shouldn't include or exclude IVs based only on their significance level.
If you believe that year should be part of the model, you should include it.

The normality assumption in linear regression applies to the residuals, not to each IV.
If the residuals do show a problem, you may use a transformation or a generalized linear model.
What kind of model are you using? Is it a fixed-effects model or something else? Also, how are you treating your year variable: as numerical or categorical?
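
To illustrate the point that the normality assumption concerns the residuals and not the IVs, here is a minimal simulation sketch. Everything in it is made up for illustration: the sample size, the 80% share of observations in the last four years, the coefficients, and the helper name `skewness` are assumptions, not the poster's data. Year is treated as numerical here only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: about 80% of observations fall in years 17-20
# of a 20-year sample period, the rest are spread over years 1-16.
late = rng.random(n) < 0.8
year = np.where(late, rng.integers(17, 21, n), rng.integers(1, 17, n)).astype(float)

# Assumed true model: linear in year with symmetric (normal) noise.
y = 2.0 + 0.5 * year + rng.normal(0.0, 1.0, n)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), year])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta


def skewness(v):
    """Sample skewness: third central moment over the cubed standard deviation."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


print("skewness of year:     ", round(skewness(year), 2))
print("skewness of residuals:", round(skewness(resid), 2))
```

In this setup the year variable is strongly asymmetric, yet the residuals stay close to symmetric, because their shape depends on the noise term and not on how the predictor is distributed. With real data you would run the same check on the residuals of your own fitted model.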


TS Contributor
Have you checked for multicollinearity? Plot all of your IVs by year. I suspect that you may find that some of them changed during the last four years. Unless your DV is affected by aging, time is usually a surrogate for other changes and would cause a multicollinearity problem.
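
One way to put a number on the multicollinearity suggested above is the variance inflation factor (VIF). The sketch below is only an illustration on simulated data: `x1` is a hypothetical control that shifted in the last four years, `x2` is a hypothetical control unrelated to time, and the function `vif` is written here by hand with NumPy, not taken from a particular package.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical data: about 80% of observations fall in years 17-20.
late = rng.random(n) < 0.8
year = np.where(late, rng.integers(17, 21, n), rng.integers(1, 17, n)).astype(float)

# x1: hypothetical control that shifted upward in the last four years.
x1 = 3.0 * late + rng.normal(0.0, 0.5, n)
# x2: hypothetical control unrelated to time.
x2 = rng.normal(0.0, 1.0, n)


def vif(predictors):
    """Return VIF_j = 1 / (1 - R^2_j) for each predictor, where R^2_j comes
    from regressing predictor j on all other predictors plus an intercept."""
    names = list(predictors)
    out = {}
    for name in names:
        target = predictors[name]
        others = [predictors[k] for k in names if k != name]
        X = np.column_stack([np.ones(len(target))] + others)
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        r2 = 1.0 - resid.var() / target.var()
        out[name] = 1.0 / (1.0 - r2)
    return out


vifs = vif({"year": year, "x1": x1, "x2": x2})
for name, value in vifs.items():
    print(f"{name}: VIF = {value:.2f}")
```

A VIF close to 1 means a predictor shares almost no variance with the others. Values well above 1 for both year and a control suggest the two carry overlapping information, which is one reason a control can lose significance once year enters the model.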