There is no relationship if you keep the measurements 'paired'.... but I've got to dash so will post later 2 nite....
But you have simulated it yourself, and Dason has shown analytically, that there is a relation between change and the baseline value. Of course there is a relation!
So, depth change is finishing depth - starting depth. If you do the regression Change~starting depth there is NO correlation using these simulated values.
But maybe I misunderstand something here.
A paired t-test is a repeated measurement study. A very simple, very concrete and often very useful one, but still a repeated measurement study (where the individual acts as its own control).
Maybe the baseline should not be used as an explanatory variable, but used in a repeated measurement model.
Dason, can you prove that two completely random variables (from a random and not a normal distribution) when regressed as Y-X~X will have a correlation proportional to -1*X? My simulation of two completely random variables seemed to have this correlation just as you predicted for two normally distributed variables. I was interested in whether this could be proved mathematically.
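For what it's worth, the argument does not seem to need normality. If X and Y are independent with finite variances, then Cov(Y-X, X) = Cov(Y, X) - Var(X) = -Var(X), so the slope of Y-X on X is -1 and the correlation is -sd(X)/sqrt(Var(X)+Var(Y)), which is about -0.71 when the two variances are equal. Below is a minimal Python sketch to check this by simulation. The uniform and exponential distributions, the seed, the sample size and the helper name `change_vs_baseline` are my own choices for illustration, not anything from the posts above.

```python
import numpy as np


def change_vs_baseline(x, y):
    """Return (slope, correlation) for the regression (y - x) ~ x."""
    change = y - x
    # OLS slope = sample covariance of (change, x) / sample variance of x
    slope = np.cov(change, x)[0, 1] / np.var(x, ddof=1)
    corr = np.corrcoef(change, x)[0, 1]
    return slope, corr


rng = np.random.default_rng(0)  # arbitrary seed (assumption)
n = 100_000                     # arbitrary sample size (assumption)

# Two independent, non-normal "baseline" and "finishing" variables.
x_u = rng.uniform(0.0, 1.0, n)
y_u = rng.uniform(0.0, 1.0, n)
slope_u, corr_u = change_vs_baseline(x_u, y_u)

x_e = rng.exponential(1.0, n)
y_e = rng.exponential(1.0, n)
slope_e, corr_e = change_vs_baseline(x_e, y_e)

print("uniform:     slope", slope_u, "corr", corr_u)
print("exponential: slope", slope_e, "corr", corr_e)
print("theory:      slope -1, corr", -1 / np.sqrt(2))
```

If X and Y have different variances the slope should still be close to -1, but the correlation moves away from -0.71 according to the formula above.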
Another question has popped into my head.
The central limit theorem should suggest that if my sample size is large enough then my residuals should tend to a normal distribution. So, if they don't, then does that mean I can't say that the central limit theorem will help?
On the other hand, the vast majority of my sample lies on the diagonal line of the QQ plot. Since the confidence intervals are estimated based on a normal distribution, does it matter that, say 40 of 260 points are off the scale when the rest lie on the plot? In other words, if 80% of my residuals are normally distributed, should this be OK for the calculation of confidence intervals?
How best should I convince a reviewer that my linear model is OK when my residuals do not pass the Shapiro-Wilk normality test?
I am also eager to know the answer to this question. I have seen many studies that published confidence intervals for differences (including some of my own), and have encountered journal editors or reviewers asking for CIs for differences. However, differences between two populations are not necessarily normally distributed. The point is that despite that fact, authors, editors and reviewers (and readers) still seem indifferent to the type of sample for which the CI is computed. Or, at least, they might have no other option.
If I were you, rather than mentioning the P value (which can give a biased reviewer the excuse to invalidate your results), I would try to base my model selection on my QQ plot, as its subjective nature leaves some room for you to maneuver. Besides, I think you could use the CLT due to the high number of your observations (as stated previously by Dason, I think)?
You appear to be showing a fundamental misunderstanding of the central limit theorem. The CLT doesn't apply to the data itself. It applies to sample means or in this case the estimated parameters in the model. We can show that with enough data the sampling distribution of the estimated parameters will be approximately normal even if the original errors aren't normally distributed.
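To make that concrete, here is a minimal Python sketch (not anyone's actual analysis) that refits a simple linear regression many times with strongly skewed errors and looks at the sampling distribution of the estimated slope. The sample size of 260 is taken from the question; the true intercept and slope, the exponential error distribution, the number of replicates and the helper `skewness` are all assumptions made up for illustration. Skewness is only a rough check of normality, so treat this as a sanity check rather than a proof.

```python
import numpy as np


def skewness(v):
    """Sample skewness: mean of cubed z-scores (0 for a symmetric distribution)."""
    v = np.asarray(v, dtype=float)
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))


rng = np.random.default_rng(1)   # arbitrary seed (assumption)
n, reps = 260, 2000              # n from the thread; reps is an assumption
true_intercept, true_slope = 2.0, 0.5  # hypothetical values

x = rng.uniform(0.0, 10.0, n)    # one fixed design, reused in every replicate
xc = x - x.mean()                # centred predictor

# Errors: exponential shifted to mean zero, so clearly not normal.
errors = rng.exponential(1.0, size=(reps, n)) - 1.0
y = true_intercept + true_slope * x + errors   # shape (reps, n)

# OLS slope for each replicate: sum(xc * y) / sum(xc ** 2)
slopes = (y @ xc) / (xc @ xc)

print("skewness of the errors:         ", skewness(errors.ravel()))
print("skewness of the slope estimates:", skewness(slopes))
print("mean of the slope estimates:    ", slopes.mean())
```

The errors should come out heavily skewed while the slope estimates come out close to symmetric around the true slope, which is the sense in which the CLT helps here: it is about the estimates, not the residuals.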
This appears to be nonsense. Well... I guess not completely nonsense, but I think you're misunderstanding the qqplot. You don't have a situation in which "80% of the residuals are normally distributed" - you just have a situation in which the residuals probably aren't perfectly normally distributed. Even if you had a situation in which your error term was a mixture of a normal distribution and something else, the qqplot wouldn't be able to tell you exactly which points weren't from the normal distribution. And that really doesn't matter anyways. We might care about outliers, but using the qqplot isn't the way to identify them.
You don't actually say what you're taking differences of. But the CLT can still apply to differences in many regards so I don't have a (major) problem with using normal based methods if the sample size is large enough.
I don't have emotions and sometimes that makes me very sad.
SiBorg (10-24-2012)