Given your qq plot I strongly suggest you do a skewness test and DFBETA. The data appear to be non-normal, with a lot of outliers in the tail. It's best to check.
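A minimal R sketch of the two checks suggested here. The data are simulated purely for illustration (the names `x`, `y` and the error distribution are assumptions, not the thread's data), skewness is computed by hand as a moment coefficient rather than as a formal test, and the 2/sqrt(n) cut-off is only the usual rule of thumb for scaled DFBETAs.

```r
set.seed(1)
n <- 200
x <- rnorm(n, mean = 3, sd = 0.4)          # hypothetical baseline depth
y <- 0.5 * x + 0.2 * (rexp(n, 1) - 1)      # hypothetical response with right-skewed errors

fit <- lm(y ~ x)
res <- residuals(fit)

# Moment coefficient of skewness of the residuals (0 for a symmetric distribution)
skew <- mean((res - mean(res))^3) / mean((res - mean(res))^2)^1.5

# Scaled DFBETAs: change in each coefficient when one observation is dropped
dfb <- dfbetas(fit)
flagged <- which(abs(dfb[, "x"]) > 2 / sqrt(n))  # rule-of-thumb threshold

skew
head(dfb[flagged, , drop = FALSE])
```

Observations listed in `flagged` are the ones worth inspecting individually before deciding whether the tail points drive the slope.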
In other words, is this 'wrong'? Or does it help to know that the starting conditions predict change? Even though they don't predict anything, it just arises out of the maths?

"Facts are stubborn things, but statistics are more pliable." Mark Twain
Now, I am really embarrassed! Very embarrassed!
Ok it is nice that we say friendly words to each other. Thank you!
Sometimes we, I mean myself, make stupid comments. Then maybe it is good if we are not too frank.
If we regress: (random number – A) versus A, of course the regression coefficient will be around –1.
I mean, that’s what the left-hand side of the equation says: there is a –1 in front of A on the left-hand side, as Dason later pointed out. (So I don’t know where I dreamed up the –0.5 coefficient. Stupid guess of mine!)
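The –1 can also be read off the least-squares slope formula. In the short derivation below, B stands for the later value (the "random number" above), and ρ, σ_A and σ_B are the correlation between A and B and their standard deviations; this notation is introduced here only for illustration.

```latex
b \;=\; \frac{\operatorname{Cov}(B - A,\, A)}{\operatorname{Var}(A)}
  \;=\; \frac{\operatorname{Cov}(A, B)}{\operatorname{Var}(A)} - 1
  \;=\; \rho\,\frac{\sigma_B}{\sigma_A} - 1
```

If B is unrelated to A then ρ = 0 and the slope is exactly –1; any real association between baseline and follow-up moves the slope away from –1.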
Anyway, this is an important model that is used again and again. It is very common to take the difference to the baseline and to use the baseline as an explanatory factor. I say again: it is not wrong, but is it relevant?
Would it be better to just use the later period’s value as the response and the baseline as the explanatory variable? (I don’t think so but I don’t know why.)
It is very natural to take the difference so that “the individual acts as its own control”.
There is a simple solution to this and at the moment I can’t see it.
Explain to us!
(A note: It was nice of Noetsi to explain where he saw the heteroscedasticity. I can’t see that. Anyway, thanks!)
Is this simply an example of 'regression to the mean'? It's just regression to a lower mean, which is why the effects look significant. What we really need is a way of looking at the slopes of the lines between each baseline and follow-up depth, corrected for the difference in means, to see whether there is any 'true' effect.
Greta this is why I said recently that interpreting the qq plot is subjective. We can see that the result differs from person to person, or from time to time in one person.
Besides, I too agree on that G thing! GGG = GretaGarboGenius!
Or GGGG = GretaGreatGarboGenius!
Just before I sleep. If the expected regression coefficient for random interaction is -1, then is it the change from -1 that we are interested in? So I got -0.5 so does that mean that deeper chambers do not shallow as much as we would expect them to from simple regression to the mean? I.e. the opposite of what I concluded before (that deeper chambers shallow more...).
Or does my -0.5 simply reflect that these were not sampled from a perfectly normal distribution?
OK, I honestly think that's it. So, -1.00 is the expected coefficient if there is no association, and any deviation from -1.00 indicates more or less change than expected. Whereas, for the other coefficients, we are interested in the difference from 0, which would indicate no association.
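One way to test a slope against -1 instead of 0 is sketched below in R. It is only an illustration under the thread's working assumption that the follow-up value is unrelated to the baseline; `A`, `B`, the sample size and the distribution parameters are made up for the example.

```r
set.seed(1)
n <- 200
A <- rnorm(n, mean = 3.00, sd = 0.4)   # hypothetical baseline values
B <- rnorm(n, mean = 2.85, sd = 0.4)   # hypothetical follow-up, independent of A

fit <- lm(I(B - A) ~ A)                # change regressed on baseline
est <- coef(summary(fit))["A", "Estimate"]
se  <- coef(summary(fit))["A", "Std. Error"]

# t statistic and two-sided p-value for H0: slope = -1 (rather than slope = 0)
t_vs_minus1 <- (est - (-1)) / se
p_vs_minus1 <- 2 * pt(-abs(t_vs_minus1), df = df.residual(fit))

# For comparison: the ordinary t statistic for the slope of B on A
t_level <- coef(summary(lm(B ~ A)))["A", "t value"]

c(slope = est, t_vs_minus1 = t_vs_minus1, t_level = t_level, p = p_vs_minus1)
```

The two t statistics are identical, so testing "slope = -1" in the change model is the same test as "slope = 0" when the follow-up value itself is regressed on the baseline.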
Do we agree??
I am even more embarrassed!
(Still: friendly words! Thanks!)
But there are a number of people here who are really good at this. (I am not one of them. But I am reading, listening and learning.)
(@victorxstc, I understood that you would hit on this about the impression of the graphs. But still, what we observe is an objective fact! Let's talk about that later.)
Suppose the error term is small enough to be negligible, and add A (the baseline measurement) to both sides:
B - A + A = a + b*A + A
B = a + (1+b)*A
So if b = -0.6 then
B = a + (1-0.6)*A
B = a + 0.4*A
Therefore A will have predictive value for B whenever b differs from -1.0.
But is it a good model?
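The algebra above, B = a + (1+b)*A, can be checked numerically. The R sketch below uses simulated values (the intercept 0.5, slope 0.8 and noise level are arbitrary assumptions) and compares the two fitted models.

```r
set.seed(42)
n <- 200
A <- rnorm(n, mean = 3, sd = 0.4)           # hypothetical baseline
B <- 0.5 + 0.8 * A + rnorm(n, sd = 0.2)     # hypothetical follow-up related to baseline

fit_change <- lm(I(B - A) ~ A)   # change score on baseline: slope b
fit_level  <- lm(B ~ A)          # follow-up on baseline:    slope 1 + b

b_change <- coef(fit_change)[["A"]]
b_level  <- coef(fit_level)[["A"]]

c(b_change = b_change, b_level = b_level, one_plus_b = 1 + b_change)
```

The identity holds exactly for the least-squares fit, not just approximately, and the two models share the same intercept and the same residuals.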
Last edited by GretaGarbo; 09-28-2012 at 09:55 PM.
The adjusted R-squared for the random model I created is 0.56 (Adjusted R-squared: 0.5637). Interestingly, even if you sample from exactly the same normal distribution you get the same result (as predicted by Dason). The next test is to see whether you get the same result when sampling from any distribution.
So, an additional question is this. How do you adjust the R2 to reflect that this IV needs to have -1 as its comparator rather than 0 as for all the other ones?
Ok, so I've done this one too. You get the same regression coefficient (roughly -1.0) even with two uniformly distributed random variables. The only difference is that the QQ plot becomes very sigmoid shaped (and is not fixed by taking logs then taking the difference).
Code below and plots attached...
Code:
```r
ACD5 <- runif(200, 0, 10)
ACD6 <- runif(200, 0, 10)
ACDtestR <- data.frame(ACD5, ACD6)
ACDtestR$ACDdiff <- ACDtestR$ACD5 - ACDtestR$ACD6
t.test(ACD5, ACD6)
#>  Welch Two Sample t-test
#>
#> data:  ACD5 and ACD6
#> t = 0.6981, df = 397.287, p-value = 0.4855
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  -0.3725438  0.7828126
#> sample estimates:
#> mean of x mean of y
#>  5.107567  4.902433

ACDtestR$ACDdiff <- ACDtestR$ACD6 - ACDtestR$ACD5
model.test <- lm(ACDdiff ~ ACD5, data = ACDtestR)
summary(model.test)
#> Call:
#> lm(formula = ACDdiff ~ ACD5, data = ACDtestR)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -4.8977 -2.5170  0.0591  2.7380  5.0182
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  5.14488    0.40274   12.78   <2e-16 ***
#> ACD5        -1.04747    0.06803  -15.40   <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 2.879 on 198 degrees of freedom
#> Multiple R-squared:  0.5449, Adjusted R-squared:  0.5426
#> F-statistic:   237 on 1 and 198 DF,  p-value: < 2.2e-16
```
Greta, I agree that what we observe is objective (although one can argue that there is nothing objective in the whole universe, as anything we perceive is only a subjective image of something which may or may not exist in the real world (a real world which itself may or may not exist at all) [but that's another story and I think you are familiar with it too]).
However, if we exclude this true but philosophical and non-practical fact, and agree (in fact assume) that the QQ plot itself is something 100% objective, we still have to finally confirm that its interpretation is not objective at all, since there are no clear-cut ways of interpreting it. And by "clear-cut", I mean: as exact as a P value, which changes from non-significant to significant (or back) when it crosses 0.05.
Sure looking forward to discussing it.
--------------------------------
About the technical parts on the model and correlation coefficients, all I can say is that I'm totally lost!!
Let's consider a slightly different situation, i.e. I assume that ACD change is generally negative and follows a normal distribution [i.e. ACDchange<-rnorm(200,-0.159,0.215)].
If I subtract this from the baseline measurement (i.e. I am now keeping the values 'paired'), my regression works, showing that there is no correlation at all between starting depth and shallowing.
I feel that my situation is more akin to this, since I have kept the measurements 'paired' between patients.
However, let's assume that there is a 20% measurement error proportional to the actual depth measured. Is it possible that THIS is regressing toward the mean and that is what is causing the apparent correlation?
So what I am asking is whether I should do a sensitivity analysis where I add a random measurement error of, say, 20% and see what happens when there is no correlation other than this random error. Then I can say whether the effect I have found is more or less than this random error would produce.
Does this sound like a good idea?
It works!!! A 20% measurement error will give a correlation of -0.75!!! Eureka!
I think it is nice to try to participate in SiBorg’s work. Not only is SiBorg a fellow member in this community but I also think that this is a problem that appears frequently.
Don’t care about R2. It is largely irrelevant anyway.
I don’t understand this. If it (the 20% error) were added to the dependent variable it would just increase the random error. If it were added to the independent variable it would create measurement errors in the independent variable and cause biased and inconsistent estimates.
Besides, it is obvious that if the baseline measurement is included as an explanatory variable, there will be a “significant” R2.
If this study had more than two time periods, say three or more, then it would have been natural to model it as a repeated measures series. We could do that now also with just two time periods. Note that in that case we would not use the baseline measurement as an “explanatory variable”.
Then there would be a between-subject-random error (among the circa 200 patients) and a within subject random error. All the individual measurements would have such an individual level.
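A minimal sketch of what such a repeated measures formulation could look like in base R, using `aov()` with an `Error()` term for the patient. All data are simulated and every name (`patient`, `time`, `depth`) and parameter value is an assumption made for illustration; this is one possible formulation, not the analysis used in the thread.

```r
set.seed(3)
n <- 200
patient_level <- rnorm(n, mean = 3, sd = 0.4)   # between-subject variation (assumed)

# Long format: one row per patient per time period
long <- data.frame(
  patient = factor(rep(seq_len(n), times = 2)),
  time    = factor(rep(c("baseline", "followup"), each = n)),
  depth   = c(patient_level + rnorm(n, sd = 0.1),           # within-subject error
              patient_level - 0.159 + rnorm(n, sd = 0.1))   # mean shallowing of 0.159
)

# Between-subject error is absorbed by Error(patient);
# the time effect is tested against the within-subject error
fit <- aov(depth ~ time + Error(patient), data = long)
summary(fit)
```

With only two time periods the F test for `time` is equivalent to a paired t-test (F equals t squared); the formulation becomes more useful once a third period is added.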
Another point:
The variable name “Racd_screean_median”, with the word “median”, suggests that several measurements were made and, since the median was (maybe) used, that the measured variable had a skewed distribution. This problem might have been cured by taking the logarithm that we have been talking about. It also indicates a sort of multi-level formulation, in that each patient is measured several times. This does not matter much if we go into the world of normally distributed models, where several random components are just lumped together in a common normally distributed random error. But for other distributions it might matter.
SiBorg (10-02-2012)
Hi Greta. What I did was take 200 random starting cACD depths. I then subtracted a random sample of 200 depth changes sampled from a normal distribution. That then gives a finishing cACD depth.
So, depth change is finishing depth - starting depth. If you do the regression Change~starting depth there is NO correlation using these simulated values.
However, if you add a 20% random error to the starting AND finishing depths, then use this to estimate change and then do the regression Change~starting depth, suddenly there is a correlation. So what I am saying is that the random error from the start measurement and the end measurement is regressing toward the mean and giving an apparent correlation.
You get the same effect (but less so) if you add a 0.4mm random error to the start and end measurements that does not depend on the value measured.
So, it's not the regression that's at fault (because if I do the regression with 'perfect' simulated values I don't see a correlation). It's the random error in the 'real' measurements that's causing the problem.
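The experiment described above can be sketched in R roughly as follows. The change distribution `rnorm(200, -0.159, 0.215)` is taken from the thread; the baseline distribution (mean 3 mm, SD 0.4 mm) and the reading of "20% error" as a multiplicative error with SD 0.20 are assumptions made for this illustration, so the exact correlations will differ from the ones reported.

```r
set.seed(2012)
n <- 200
start_true  <- rnorm(n, mean = 3.0, sd = 0.4)        # assumed baseline depths (mm)
change_true <- rnorm(n, mean = -0.159, sd = 0.215)   # depth changes, as in the thread
end_true    <- start_true + change_true              # so change = finish - start

# 1) 'Perfect' measurements: change is unrelated to starting depth
r_clean <- cor(end_true - start_true, start_true)

# 2) 20% proportional error on both start and finish measurements
start_prop <- start_true * (1 + rnorm(n, sd = 0.20))
end_prop   <- end_true   * (1 + rnorm(n, sd = 0.20))
r_prop <- cor(end_prop - start_prop, start_prop)

# 3) Additive 0.4 mm error that does not depend on the value measured
start_add <- start_true + rnorm(n, sd = 0.4)
end_add   <- end_true   + rnorm(n, sd = 0.4)
r_add <- cor(end_add - start_add, start_add)

round(c(clean = r_clean, proportional = r_prop, additive = r_add), 2)
```

The error in the start measurement appears with a plus sign in the predictor and a minus sign in the change score, which is what induces the negative correlation even though the true change is independent of the true starting depth.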
I suspect that a repeated measures design would be more robust to the random error.... but I don't know anything about repeated measures designs...