Categorical multiple linear regression analysis - what would you do?

#1
Noob here. I'm doing a project to determine how different factors (race, insurance status) are related to distance from pediatric urologists.

I'm thinking of tabulating the mean of each of these factors for several successive distance brackets, e.g. 0-10 miles, 10-20 miles, etc. So distance isn't really continuous, even though it is my dependent variable. Would you use multiple linear regression for this? Could I simply assign a number to each bracket and interpret it that way, e.g. 0-10 miles = 1, 10-20 miles = 2?
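If you do bin distance, the bracket coding described above can be sketched as follows. This is a minimal Python sketch; the bracket edges and the open-ended top bracket (30+ miles) are assumptions for illustration. The codes are ordinal: if you feed them into a linear regression as plain numbers, the step from bracket 1 to 2 is treated as exactly equal to the step from 3 to 4.

```python
import bisect

# Hypothetical bracket edges in miles (assumption, not from the thread):
# [0, 10) -> 1, [10, 20) -> 2, [20, 30) -> 3, 30 and above -> 4
EDGES = [10, 20, 30]


def bracket_code(distance_miles: float) -> int:
    """Map a distance in miles to an ordinal bracket code starting at 1.

    A distance exactly on an edge (e.g. 10.0) goes into the higher
    bracket, because bisect_right places ties to the right.
    """
    if distance_miles < 0:
        raise ValueError("distance must be non-negative")
    return bisect.bisect_right(EDGES, distance_miles) + 1


distances = [2.5, 9.9, 10.0, 14.2, 27.0, 55.0]
codes = [bracket_code(d) for d in distances]
print(codes)  # [1, 1, 2, 2, 3, 4]
```

Binning also discards information: 10.0 and 19.9 miles receive the same code.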

Does it make sense to test whether these factors are predictive of distance? I'm simply trying to see if there are any correlations between distance and these factors, and so far MLR seems to be the way to go.
 

kiton

#2
Regression analysis lets you estimate how much the outcome changes with each predictor (although on its own it does not establish causality), whereas correlation only indicates whether variables are related. What is your goal?
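To make the distinction concrete: on the same data, Pearson's r is a unitless measure of how tightly two variables line up, while the least-squares slope says how many miles the outcome changes per unit of the predictor. The two are linked by slope = r × (s_y / s_x). A minimal Python sketch with made-up numbers (the data below are hypothetical):

```python
import math


def _centered_sums(x, y):
    """Return (Sxy, Sxx, Syy): sums of centered cross-products and squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Unitless strength of the linear association, between -1 and 1."""
    sxy, sxx, syy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    sxy, sxx, _ = _centered_sums(x, y)
    slope = sxy / sxx
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return slope, intercept


# Hypothetical toy data: percent uninsured (x) vs. miles to nearest surgeon (y)
x = [2, 4, 6, 8, 10]
y = [5, 9, 12, 18, 21]

r = pearson_r(x, y)
slope, intercept = ols_fit(x, y)
print(f"r = {r:.3f}")  # r = 0.994
print(f"slope = {slope:.2f} miles per percentage point, intercept = {intercept:.2f}")
```

The r value answers "are they related, and how tightly?"; the slope answers "by how much?".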

With distance "as is", you are looking at a linear regression. However, if you transform it into "brackets", your DV will be on an ordinal scale, so you would need a different model (e.g., ordered logistic regression). Also, when you say MLR, do you mean multinomial logistic regression? If so, that model is for an unordered (nominal) categorical outcome, not an ordinal one.
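For the bracketed version, an ordered logistic regression models the cumulative probability of falling in bracket k or below. The sketch below only shows how such a model turns predictor values into bracket probabilities; it does not fit anything, and the coefficients and cut-points are made-up assumptions (in practice you would estimate them with a statistics package). It uses the common parameterization logit P(Y ≤ k) = θ_k − x·β; some software writes the sign the other way.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic (inverse-logit) function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(x, beta, thresholds):
    """Bracket probabilities under a cumulative-logit (proportional odds) model.

    P(Y <= k | x) = logistic(thresholds[k] - x . beta) for the first K-1
    brackets, and P(Y <= K | x) = 1. Each bracket's probability is the
    difference between consecutive cumulative probabilities.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor x . beta
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


# Hypothetical coefficients and cut-points for 4 distance brackets
beta = [1.5, -0.8]             # e.g. proportion uninsured, proportion Hispanic
thresholds = [-0.5, 0.7, 1.8]  # 3 cut-points -> 4 brackets

low = ordered_logit_probs([0.10, 0.30], beta, thresholds)
high = ordered_logit_probs([0.40, 0.30], beta, thresholds)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

With a positive coefficient, raising that predictor shifts probability toward the farther brackets, which is how the model expresses "more X, more distance" for an ordinal outcome.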

Why don't you run an ANOVA with a continuous DV and two categorical predictors?
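An ANOVA compares how much the continuous DV varies between categories with how much it varies within them. Below is a minimal one-way sketch in Python, i.e. one categorical factor only (the two-factor version suggested above adds a second factor and usually an interaction term); the grouping and distances are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, each holding the DV values for one category.
    """
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group SS: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group SS: spread of values around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical miles to nearest surgeon, grouped by one categorical factor
groups = [
    [10, 12, 14],  # category A
    [20, 22, 24],  # category B
    [15, 16, 17],  # category C
]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(2, 6) = 25.33
```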
 
#3
Thanks for your help. I want to see if race, insurance status, etc. are predictive of distance from surgeons. So I guess that means doing a regression analysis. I think ordered logistic regression, as you suggested, is the best way for me to go. (What do you think? Does it make more sense to determine correlation, as opposed to predictive value?)

I can't do a continuous distance because my data is not that fine. I am relying on zip code tabulation areas, and they can vary in size.

I know this is a stupid question, but can my predictors themselves be means? For example, the dependent variable is distance. I am calculating the average racial composition of all ZCTAs in each distance bracket, so each bracket's value is a mean with its own SD. Do I just disregard this SD when conducting my ordered logistic regression?
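To see what averaging per bracket does to the data, here is a minimal Python sketch with hypothetical ZCTA rows. Collapsing to one mean per bracket leaves a model with only as many observations as there are brackets, and the within-bracket spread (the SD in question) is not used at all; keeping one row per ZCTA retains it.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical ZCTA-level rows: (distance bracket code, percent uninsured)
zcta_rows = [
    (1, 4.0), (1, 6.0), (1, 5.0),
    (2, 8.0), (2, 12.0),
    (3, 9.0), (3, 15.0), (3, 12.0),
]

# Group the ZCTA values by bracket
by_bracket = defaultdict(list)
for bracket, pct_uninsured in zcta_rows:
    by_bracket[bracket].append(pct_uninsured)

# One (mean, SD) pair per bracket; stdev needs at least 2 ZCTAs per bracket
summary = {b: (mean(v), stdev(v)) for b, v in sorted(by_bracket.items())}

print(len(zcta_rows), "ZCTA rows ->", len(summary), "bracket rows")  # 8 ZCTA rows -> 3 bracket rows
for b, (m, sd) in summary.items():
    print(f"bracket {b}: mean = {m:.1f}, SD = {sd:.2f}")
```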
 

kiton

#4
> Thanks for your help. I want to see if race, insurance status, etc. are predictive of distance from surgeons. So I guess that means doing a regression analysis. I think ordered logistic regression, as you suggested, is the best way for me to go. (What do you think? Does it make more sense to determine correlation, as opposed to predictive value?)
It is not one or the other; researchers typically look at both. Correlation indicates whether your variables are related, whereas regression estimates how much one changes with another.

> I can't do a continuous distance because my data is not that fine. I am relying on zip code tabulation areas, and they can vary in size.
Can you please clarify "data is not that fine"? What bothers you exactly?

> I know this is a stupid question, but can my predictors themselves be means? For example, the dependent variable is distance. I am calculating the average racial composition of all ZCTAs in each distance bracket, so each bracket's value is a mean with its own SD. Do I just disregard this SD when conducting my ordered logistic regression?
Is there a specific reason to go this route? Based on your goal, ANOVA seems appropriate.
 
#5
Thanks for your help. After thinking about it, I think correlation would be better for me. I went ahead and calculated the distance of each ZCTA from the nearest surgeon. That way my response variable is continuous.
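A nearest-surgeon distance per ZCTA can be sketched as a minimum over straight-line (great-circle) distances. This minimal Python sketch assumes you have a latitude/longitude point for each ZCTA (e.g. a centroid) and for each surgeon; the coordinates and names below are made up, and straight-line miles will differ from driving distance or travel time.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against tiny floating-point overshoots above 1
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def nearest_surgeon_miles(zcta_point, surgeon_points):
    """Distance from one ZCTA point to the closest surgeon point."""
    lat, lon = zcta_point
    return min(haversine_miles(lat, lon, s_lat, s_lon) for s_lat, s_lon in surgeon_points)


# Hypothetical coordinates (degrees)
surgeons = [(40.0, -75.0), (41.0, -75.0)]
zctas = {"ZCTA_A": (40.1, -75.0), "ZCTA_B": (40.9, -75.0)}

for name, point in zctas.items():
    print(name, round(nearest_surgeon_miles(point, surgeons), 1))
```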

I am trying to use ANOVA > General Linear Model in Minitab, and it shows a p-value of 0.000 for basically every variable. I also see large numbers in the "Lack-of-Fit" and "Pure Error" rows. Why is this?

Source                               DF    Adj SS   Adj MS  F-Value  P-Value
Uninsured children                    1    455112   455112   193.34    0.000
Percent; HISPANIC OR LATINO AND       1     63696    63696    27.06    0.000
Perc_NonHispLat - White alone         1    149961   149961    63.70    0.000
Perc_NonHispLat - BlackAAalone        1     13536    13536     5.75    0.016
Perc_NonHispLat - Asian alone         1    532260   532260   226.11    0.000
Perc_NonHispLat - American Indi       1    430559   430559   182.91    0.000
Error                             31319  73725005     2354
  Lack-of-Fit                     28527  65561158     2298     0.79    1.000
  Pure Error                       2792   8163847     2924
Total                             31325  84922149
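The numbers in this table can be checked by hand, which also shows where the tiny p-values and the large Lack-of-Fit/Pure Error entries come from. A minimal Python sketch using only values copied from the output above. Each F-Value is the term's Adj MS divided by the Error Adj MS, and with 31,319 error degrees of freedom even modest effects give large F statistics and p-values that display as 0.000; R-squared shows how much of the variation in distance the predictors explain together. It assumes the Lack-of-Fit row is the pure-error test, which is conventionally computed from rows that share identical predictor values (replicates); under that assumption the degrees of freedom imply about 28,534 distinct predictor combinations among 31,326 ZCTAs, so most of the Error row is labelled Lack-of-Fit.

```python
# Values copied from the Minitab output above
SS_TOTAL, DF_TOTAL = 84922149, 31325
SS_ERROR, DF_ERROR = 73725005, 31319
SS_LOF, DF_LOF = 65561158, 28527       # Lack-of-Fit
SS_PE, DF_PE = 8163847, 2792           # Pure Error
ADJ_SS_UNINSURED, DF_UNINSURED = 455112, 1

# F for one term = term mean square / error mean square
ms_error = SS_ERROR / DF_ERROR
f_uninsured = (ADJ_SS_UNINSURED / DF_UNINSURED) / ms_error

# Share of the variation in distance explained by all predictors together
r_squared = 1 - SS_ERROR / SS_TOTAL

# Lack-of-fit test: Lack-of-Fit mean square compared against Pure Error mean square
f_lack_of_fit = (SS_LOF / DF_LOF) / (SS_PE / DF_PE)

# Counts implied by the degrees of freedom (assumes Total DF = n - 1)
n_obs = DF_TOTAL + 1             # number of ZCTAs
n_params = n_obs - DF_ERROR      # intercept + 6 predictors
n_distinct_x = n_obs - DF_PE     # distinct predictor combinations

print(f"Error MS = {ms_error:.0f}")
print(f"F (uninsured) = {f_uninsured:.2f}")
print(f"R-squared = {r_squared:.3f}")
print(f"Lack-of-fit F = {f_lack_of_fit:.2f}")
print(n_obs, n_params, n_distinct_x)
```

Read this way, the predictors are "significant" but jointly explain only about 13% of the variation in distance, and a Lack-of-Fit F of 0.79 (p = 1.000) gives no evidence that the linear model misfits.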