I was wondering whether there is any systematic rationale for deciding which category of a factor variable (e.g. level of education: primary, secondary, tertiary) should be used as the reference category in multiple regression models?
Well, it doesn't *really* matter, as you can answer any question you have using any of the categories as the reference. But if there is one level that you're most interested in comparing to the others, then it might make things slightly easier on you to choose that as the reference. With that said, it's also entirely possible to use a coding scheme other than reference/dummy coding: https://en.wikipedia.org/wiki/Categorical_variable#Categorical_variables_and_regression
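To illustrate the point that the choice of reference level doesn't change the model itself, here is a minimal sketch in Python using only NumPy. The data are simulated, and the names (`edu`, `x`, `fit`) and the extra covariate are assumptions made up for this example, not anything from the thread. It fits the same dummy-coded regression twice, once with secondary and once with primary as the reference, and the fitted values should agree; only the meaning of the coefficients shifts.

```python
import numpy as np

rng = np.random.default_rng(0)
levels = ["primary", "secondary", "tertiary"]

# Simulated data (illustrative only): a 3-level factor plus one covariate.
n = 200
edu = rng.choice(levels, size=n)
x = rng.normal(size=n)  # hypothetical continuous covariate
effect = np.where(edu == "primary", -1.0, np.where(edu == "tertiary", 2.0, 0.0))
y = 1.0 + 0.5 * x + effect + rng.normal(size=n)


def fit(reference):
    """OLS with dummy coding, leaving `reference` out as the baseline level."""
    others = [lv for lv in levels if lv != reference]
    dummies = [(edu == lv).astype(float) for lv in others]
    X = np.column_stack([np.ones(n), x] + dummies)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs = dict(zip(["intercept", "x"] + others, beta))
    return coefs, X @ beta


coef_sec, fitted_sec = fit("secondary")
coef_pri, fitted_pri = fit("primary")

# Same model, different parameterisation: predictions are identical.
print(np.allclose(fitted_sec, fitted_pri))

# Any pairwise contrast can be recovered from either fit, e.g. tertiary vs primary.
print(coef_sec["tertiary"] - coef_sec["primary"], coef_pri["tertiary"])
```

The last line shows why any reference works: the tertiary-vs-primary difference is available directly when primary is the reference, and as a difference of two coefficients when secondary is the reference. What does change is which comparisons come with a ready-made standard error and p-value in the default output.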
Thanks for your reply Dason.
I would like to give an actual example that made me wonder about the choice of reference categories.
I ran a linear regression model that includes two factor variables as IVs. One of them measures respondents' level of education (primary, secondary, tertiary).
In a pre-regression step, I ran correlation analyses, which indicate that primary school correlates negatively with the DV and tertiary education correlates positively with the DV. Secondary school does not correlate significantly with the DV.
Hence I was wondering which of the three categories is best used as reference category in the multiple regression model. I thought of using secondary school since this category is not significantly related to the DV.
Does that make sense?
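To make concrete what using secondary school as the reference would mean, here is a small Python sketch with made-up DV values (the numbers and variable names are purely hypothetical, and the model contains only the education factor, with no other IVs). In that simplified setting the intercept is the mean of the reference group and each dummy coefficient is the difference between that level's mean and the reference mean; with additional predictors the same reading holds, but as adjusted differences.

```python
import numpy as np

# Made-up DV values per education level, purely for illustration.
groups = {
    "primary": np.array([2.0, 3.0, 2.5, 3.5]),
    "secondary": np.array([4.0, 5.0, 4.5, 5.5]),
    "tertiary": np.array([6.0, 7.5, 7.0, 6.5]),
}
y = np.concatenate(list(groups.values()))
edu = np.concatenate([[name] * len(vals) for name, vals in groups.items()])

# Dummy coding with "secondary" as the reference: it gets no column of its own.
X = np.column_stack(
    [np.ones(len(y)), edu == "primary", edu == "tertiary"]
).astype(float)
intercept, b_primary, b_tertiary = np.linalg.lstsq(X, y, rcond=None)[0]

means = {name: vals.mean() for name, vals in groups.items()}

# Intercept vs the secondary-group mean.
print(intercept, means["secondary"])
# Each coefficient vs the corresponding difference in group means.
print(b_primary, means["primary"] - means["secondary"])
print(b_tertiary, means["tertiary"] - means["secondary"])
```

Note that a dummy's simple correlation with the DV compares that level with all other levels pooled, whereas the regression coefficients compare each level with the reference only, so the two analyses answer slightly different questions.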
What is your hypothesis? Once you have the output, you can run it each way and see which may be easiest for your audience to understand. There is no right or wrong way; some choices just make more intuitive sense than others.