Necessary to rank transform ANCOVA?

StatsN00bie

New Member
Hi all, I'm hoping you can shed some light. My data are non-parametric, so I thought I'd have to use ranks in order to do ANCOVA, but then I read that, because ANCOVA falls under multiple regression, the variables don't need to be normally distributed. Can someone please clarify? TIA!

Dason

Ambassador to the humans
Can you describe what you mean when you say that your "data are non-parametric"?

StatsN00bie

New Member
It is not normally distributed. The sample size is about 50.

Dason

Ambassador to the humans
A few things to note. "Non-parametric" refers to modeling techniques. Data is what data is and there are more distributions than just the normal distribution, so it doesn't make much sense to say that the data is non-parametric. The second thing is that there is no assumption about the distribution of the raw data. The normal distribution assumption is on the residuals in the model (technically it's on the error term, but the residuals are a stand-in for the error terms). You can't assess this until you actually fit the model and obtain the residuals.
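To illustrate the point about raw data versus residuals, here is a minimal sketch in plain Python. The two groups, their means, and the sample sizes are made up for illustration and are not the poster's data. The raw outcome is clearly non-normal (bimodal) because the groups differ, yet the residuals from the simplest model (one mean per group) look roughly normal.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: two groups with very different means and normal errors.
groups = {
    "A": [random.gauss(10, 1) for _ in range(50)],
    "B": [random.gauss(20, 1) for _ in range(50)],
}

# Raw outcome, pooled over groups (this is what a histogram of "the data" shows).
raw = [y for ys in groups.values() for y in ys]

# Fit the simplest model (one mean per group) and collect the residuals.
residuals = []
for ys in groups.values():
    group_mean = statistics.mean(ys)
    residuals.extend(y - group_mean for y in ys)


def excess_kurtosis(values):
    """Sample excess kurtosis; roughly 0 for normally distributed values."""
    m = statistics.mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


print("raw data excess kurtosis: ", round(excess_kurtosis(raw), 2))
print("residual excess kurtosis: ", round(excess_kurtosis(residuals), 2))
```

The raw values come out strongly platykurtic (two humps), while the residuals sit much closer to zero. In practice you would look at a Q-Q plot of the residuals rather than a single summary number.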

Dragan

Super Moderator
The traditional rank-ANCOVA procedure is referred to as Quade's test. It proceeds by separately ranking the variate and covariate and then regressing the variate on the covariate and subsequently saving the residuals. An ANOVA is then performed on the residuals.
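The steps just described can be sketched in plain Python as follows. This is only an illustration of the procedure for a single covariate, not a validated implementation; the function names `average_ranks` and `quade_f` and the example numbers are made up.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def quade_f(y, x, group):
    """Return (F, df1, df2) for a rank-ANCOVA with one covariate."""
    n = len(y)
    # Step 1: rank the variate and the covariate separately, ignoring groups.
    ry, rx = average_ranks(y), average_ranks(x)
    # Step 2: regress ranked variate on ranked covariate and keep residuals.
    mean_ry, mean_rx = sum(ry) / n, sum(rx) / n
    sxx = sum((a - mean_rx) ** 2 for a in rx)
    sxy = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    slope = sxy / sxx
    resid = [b - mean_ry - slope * (a - mean_rx) for a, b in zip(rx, ry)]
    # Step 3: one-way ANOVA on the residuals with group as the factor.
    by_group = {}
    for g, r in zip(group, resid):
        by_group.setdefault(g, []).append(r)
    grand = sum(resid) / n
    ss_between = sum(len(v) * (sum(v) / len(v) - grand) ** 2
                     for v in by_group.values())
    ss_within = sum((r - sum(v) / len(v)) ** 2
                    for v in by_group.values() for r in v)
    df1, df2 = len(by_group) - 1, n - len(by_group)
    return (ss_between / df1) / (ss_within / df2), df1, df2


# Hypothetical data: three groups of four observations each.
y = [12, 15, 11, 14, 22, 25, 21, 27, 18, 16, 19, 17]
x = [3, 5, 2, 4, 6, 8, 5, 9, 4, 3, 6, 5]
group = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

f_stat, df1, df2 = quade_f(y, x, group)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

The F statistic is then compared against an F distribution with (df1, df2) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, df1, df2)` gives the p-value.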

Note that ANCOVA is a combination of ANOVA and regression.

StatsN00bie

New Member
Hi Dragan, thank you for your response.

I read that Quade's test assumes an equal distribution of the covariate amongst the groups. How strict is this assumption? I'm confused because my covariates are significantly associated with the independent variable (group), therefore violating the homogeneity of regression slopes, which is another reason why I thought I should not use ANCOVA in the first place. The second reason is I have a slight violation of homogeneity of variance, which I cannot get around with very unequal sample sizes.

Can you offer any advice on what test is appropriate? In short, I have three covariates (two continuous, one categorical), and the two continuous covariates are very significantly associated with the grouping. TIA!

Dragan

Super Moderator
How many groups do you have? And yes, Quade's test does indeed make the assumption that there is no slope/treatment interaction (i.e., equality of slopes).

StatsN00bie

New Member
I have three groups (sample sizes about 100, 25 and 15). I am interested in finding out the effect of the group on my dependent variable. The two covariates that are continuous are highly associated with the groups (p < 0.01). The third covariate is categorical and not associated. I am okay with taking that out of the analysis if it simplifies things. Would it be appropriate to include these covariates as factors, then? Could I still use ANCOVA or would I have to use a more generalized regression model? Thank you very much for your help.

StatsN00bie

New Member
Does anyone else have suggestions, please? Basically, I would like to know what is the proper regression to use for my project (summarized below).

I am looking at the variability of thickness measurements on a retinal scan. Two measurements are made on each scan and I am using the absolute difference of the measurements as the outcome variable. Each scan also has an associated disease/diagnosis, as well as what type of fluid is present. The predictors would be 1) baseline thickness of tissue A, 2) baseline thickness of tissue B, 3) type of fluid present, and 4) diagnosis/disease.

My primary interest is looking at type of fluid as the predictor, and baseline thickness and diagnosis as covariates, but my data violate the assumptions of homogeneity of variances and regression slopes.

Any advice would be greatly appreciated!

kiton

New Member
Could you please clarify the following. (1) "absolute difference of the measurements as the outcome variable" -- what are the min and max values, mean, and SD? and (2) do you have cross-sectional or longitudinal data (I assume the former)?

Based on the statistical properties of your outcome, we can advise you on the proper estimator. Note, heteroskedasticity shouldn't be of great concern (could be "fixed" with robust SEs). However, there are other important issues -- e.g., multicollinearity and non-normal residuals -- that could get in the way of reliable estimation and inference.
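As a rough illustration of what robust SEs do, here is a sketch for the simplest case (one predictor plus an intercept), comparing the classical standard error of the slope with White's heteroskedasticity-consistent (HC0) version. The function name `slope_standard_errors` and the simulated data are made up for this example.

```python
import math
import random


def slope_standard_errors(x, y):
    """Return (slope, classical SE, HC0 robust SE) for OLS of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical: one common error variance for every observation.
    classical_var = sum(e ** 2 for e in resid) / (n - 2) / sxx
    # HC0: each squared residual stands in for that observation's own variance.
    robust_var = sum((xi - x_bar) ** 2 * e ** 2
                     for xi, e in zip(x, resid)) / sxx ** 2
    return slope, math.sqrt(classical_var), math.sqrt(robust_var)


# Hypothetical heteroskedastic data: the error SD grows with x.
random.seed(1)
x = [i / 10 for i in range(1, 101)]
y = [2 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]

slope, se_classical, se_robust = slope_standard_errors(x, y)
print(f"slope={slope:.3f}  classical SE={se_classical:.3f}  robust SE={se_robust:.3f}")
```

For real models with several predictors you would normally let a library do this; in statsmodels, for instance, `sm.OLS(y, X).fit(cov_type="HC3")` requests a heteroskedasticity-robust covariance.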

StatsN00bie

New Member
kiton,

The range is 0-352 (there are a few outliers), mean is 35.6 and SD is 52.5. It is cross-sectional data. Thanks very much!

kiton

New Member
Let me double-check -- is it count data (i.e., 0-352 are only non-negative integers)?

StatsN00bie

New Member
Yes, non-negative (it is absolute difference of two measurements).

kiton

New Member
In that case, I would run a series of models fitting Poisson, negative-binomial, or zero-inflated (if you have a ton of zero values) distributions and examine their fit and coefficient (and standard errors) performance. Additionally, you can run an OLS with log-transformed DV to give you some points for comparison. Note that you will have to use robust (or clustered) standard errors (as you cannot fully assume a given distribution). Also note that with count models the interpretation of the coefficients will be different -- they are on the log scale, so the exponentiated coefficients indicate the multiplicative (rate) change in the outcome. Don't forget to check your IVs for collinearities.
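On the interpretation of count-model coefficients: with the usual log link, a coefficient is a change in the log of the expected outcome, so exponentiating it gives a rate ratio. A tiny sketch, where the coefficient value 0.2 is made up and not estimated from any data in this thread:

```python
import math

# Hypothetical coefficient from a log-link count model (e.g., Poisson).
beta = 0.2

# A one-unit increase in the predictor multiplies the expected outcome by exp(beta).
rate_ratio = math.exp(beta)
percent_change = (rate_ratio - 1) * 100

print(f"rate ratio = {rate_ratio:.3f}, i.e. about {percent_change:.0f}% higher per unit")
```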

P.S. If you come across the notion of over-dispersion and its "easy fix" with negative-binomial model -- don't believe that. Just estimate a few and compare the performance.

StatsN00bie

New Member
That's a bit beyond my abilities, but I will try it! Sorry, I read your last post too quickly. The DV is not counts (they are non-negative, but not integers). Does that change anything?

kiton

New Member
It does. Try experimenting with OLS then. Note the required assumptions.