- Thread starter noetsi

My theory is that with only 400 data points, 62 in the least common level, and 62 predictors, I simply have too many predictors. With the same data and 31 predictors the model runs fine. Regardless, I was wondering if I even have to test linearity when all my predictors are ordinal (none are linear, they are all 4-point Likert data).

She also said I should test for a binomial distribution.

I would run the 32 explanatory variables. (But maybe I would use the lasso. It is a sort of variable selection method.) For those which are significant or have a "large" parameter estimate I would test interactions.

You could run one factor at a time as a factor variable with 4 levels (and all the others as linear effects). If the 4 levels are on a line then it is fine with the regression model. A linear regression model can be defended as an approximation, in the sense of a first-order Taylor series in several variables.
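A rough way to see what "the 4 levels are on a line" means is to look at the observed log-odds at each level of one item. This is only a one-variable descriptive sketch, not the adjusted model suggested above (one item as a 4-level factor, the rest as linear effects). The counts are hypothetical, and a level with zero events or zero non-events would break the log.

```python
import math

# Hypothetical counts for ONE 4-point item: level -> (events, respondents).
counts = {1: (5, 100), 2: (10, 100), 3: (19, 100), 4: (33, 100)}

def empirical_logit(events, total):
    """Log-odds of the observed proportion at one level."""
    return math.log(events / (total - events))

logits = {level: empirical_logit(e, n) for level, (e, n) in counts.items()}
levels = sorted(logits)

# Change in log-odds from each level to the next.
steps = [logits[hi] - logits[lo] for lo, hi in zip(levels, levels[1:])]

for level in levels:
    print(f"level {level}: log-odds = {logits[level]:.3f}")
print("steps between adjacent levels:", [round(s, 3) for s in steps])
```

If the steps are roughly equal, a single slope for that item is a reasonable approximation. If they differ a lot, the item probably needs to stay as a factor.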

If all explanatory variables are 4-level Likert items, then all variables have the same scale. (I would say that going from 2 to 4 is an important change.) The most important variable is the one with the highest regression coefficient (in absolute value). But you need to recompute the scale because your bosses will not understand log(p/(1-p)) = a + b*x. So you need to compute p = 1/(1 + exp(-(a + b*x1 + ...))).
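As a small sketch of that conversion, the function below turns the linear predictor back into a probability. The intercept and slope are made-up numbers for a single 4-point item, not estimates from this thread.

```python
import math

def predicted_probability(intercept, coefs, xs):
    """Convert a logistic-regression linear predictor into a probability."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical intercept a and slope b for one 4-point Likert item.
a, b = -2.0, 0.6

probs = {x: predicted_probability(a, [b], [x]) for x in (1, 2, 3, 4)}
for x, p in probs.items():
    print(f"x = {x}: p = {p:.3f}")
```

Reporting the change in p between, say, x = 2 and x = 4 is usually easier for a non-technical audience than a coefficient on the log-odds scale.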

Don't make it too complicated. Simplify for the bosses.

I am simplifying it for the bosses, believe me. The primary things I report for them are 1) which variables are significant and 2) which have the most importance [aka impact]. That is a ranking from high to low. I have used 3 different approaches which I have found recommended:

1) Odds ratio [for values below 1 I use 1/OR, which effectively works like an absolute value].

2) Highest Wald values

3) A standardized coefficient [one of many that exist, it's the only one SAS generates].

My problem is that the rankings are somewhat different from highest to lowest depending on which you use. And I have found no consensus on which generates the better results.
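To see why the three rankings can disagree, here is a minimal sketch with made-up estimates. The names q1 to q3 and all numbers are hypothetical. The standardized coefficient uses one common form, b * sd(x) / (pi/sqrt(3)); that this matches what SAS prints is an assumption on my part.

```python
import math

# Hypothetical estimates: name -> (coefficient, standard error, predictor SD).
estimates = {
    "q1": (0.80, 0.40, 0.6),
    "q2": (-0.60, 0.20, 1.0),
    "q3": (0.50, 0.15, 0.9),
}

def folded_odds_ratio(b):
    """Odds ratio, with values below 1 replaced by 1/OR."""
    odds_ratio = math.exp(b)
    return odds_ratio if odds_ratio >= 1 else 1 / odds_ratio

def wald_chi_square(b, se):
    """Wald chi-square statistic for a single coefficient."""
    return (b / se) ** 2

def standardized_coefficient(b, sd_x):
    # Assumed form: scale b by the predictor SD over the logistic SD, pi/sqrt(3).
    return b * sd_x / (math.pi / math.sqrt(3))

def ranking(score):
    """Names sorted from highest to lowest score."""
    return sorted(estimates, key=lambda name: score(*estimates[name]), reverse=True)

by_or = ranking(lambda b, se, sd: folded_odds_ratio(b))
by_wald = ranking(lambda b, se, sd: wald_chi_square(b, se))
by_std = ranking(lambda b, se, sd: abs(standardized_coefficient(b, sd)))

print("odds ratio:  ", by_or)
print("Wald:        ", by_wald)
print("standardized:", by_std)
```

The folded odds ratio depends only on the size of the coefficient, the Wald statistic also depends on how precisely it is estimated, and the standardized coefficient also depends on how spread out the predictor is. So the three are answering slightly different questions, and different rankings are not a sign that something went wrong.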

I am not sure what you mean by this. Treat one predictor as categorical [dummy coding so you would have 3 dummies] and the rest as linear predictors. Then see if the three dummies are significant?

"You could run one factor at a time as a factor variable with 4 levels (and all the others as linear effects). If the 4 levels are on a line then it is fine with the regression model."

Only 4 of my predictors actually are significant at the .05 level.

You have Likert data (4 points in this case). Formally the linearity assumption does not apply to ordinal data. But if you are treating the ordinal variables as continuous (that is, using a single odds ratio for them rather than creating a series of dummy variables), do you have to test linearity for them regardless of this? The Likert scale variables might be ordinal, but you are treating them as if continuous. (I think our Likert data is continuous in that it's reasonable to assume the difference between each point is the same, even though formally it is not continuous in nature. This is commonly done although some disagree.)

While I am asking many questions: if you have ordinal variables and you assume they have to be tested for non-linearity (I used Box-Tidwell), what do you do to determine their importance if they are non-linear? Four of my variables turned out to be non-linear (although 2 came close to passing the Box-Tidwell test). I am not sure how to determine their importance relative to other variables if they are non-linear.

Splines or loess don't help because you can't use those to compare a variable to other predictor variables in terms of their impact on the outcome.