About me: Medical student with minimal statistical knowledge or research experience
Question related to: statistical regression/correlation analysis of a medical trial
Trial design: Double blind, randomized controlled trial. Placebo vs vitamin with a few months between baseline and follow-up measurements. Follow-up and baseline measurements are identical.
Measurements performed, possibly relevant for answering my questions:
In total 6 tests are performed twice. Once at baseline, once at follow-up.
Insulin sensitivity testing; 2 tests.
- First provides 3 values usable for analysis (one of these is important, other 2 secondary)
- Second test provides 4 quite comparable values usable for analysis (differences are minor and due to different calculation methods)
Vascular function testing; 4 tests.
- First provides one important usable value
- Second provides 4 values
- Third provides 3 values (1 important, 2 secondary)
- Fourth provides 4 equally important variables
Important! Current situation:
- Goal of study: 120 participants
- Current status: 13 participants included with baseline measurements taken, 8 of them have also completed follow-up measurements.
- Double blinded study, with no option to de-blind (until trial completion) to see effect of the vitamin. Solution for me: Looking at the correlations between insulin sensitivity (insulin resistance) and vascular function (vascular dysfunction).
What I did so far:
Baseline correlation (N=13)
Using the 13 baseline measurements, I performed Spearman correlation (in SPSS) using all variables of insulin sensitivity and vascular function. I also included screening data such as BMI, age etc. I figured Pearson would be inferior because of the small sample size (and thus not a normal distribution).
Follow-up/baseline difference (N=8)
I made a separate database where I subtracted the baseline (BL) data from the follow-up (FU) data. This results in a database with both positive and negative values (because sometimes the values improved at follow-up, and sometimes they deteriorated).
I did this, because I want to know if there is a relationship between the change of insulin sensitivity and vascular functioning between BL and FU. In other words; does a decrease (between BL/FU) in insulin sensitivity cause an increase in vascular dysfunction?
To analyse this, I used the Spearman correlation again. Also, I tried some linear regression analysis (in SPSS) on these numbers, using insulin sensitivity parameters as the independent variable.
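For anyone who wants to reproduce the subtract-then-correlate step outside SPSS, here is a minimal Python sketch. The numbers are made up purely for illustration (they are not trial data), and the helper names `deltas`, `ranks` and `spearman` are my own, not anything from SPSS.

```python
import math

def deltas(baseline, follow_up):
    """Change score per participant: follow-up minus baseline."""
    return [fu - bl for bl, fu in zip(baseline, follow_up)]

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical values for 8 participants (NOT real trial data).
insulin_bl = [4.1, 3.8, 5.0, 4.4, 3.2, 4.9, 4.0, 3.6]
insulin_fu = [4.5, 3.5, 5.2, 4.1, 3.3, 5.4, 3.7, 3.9]
vascular_bl = [6.0, 5.5, 7.1, 6.4, 5.0, 6.9, 6.2, 5.8]
vascular_fu = [6.3, 5.1, 7.4, 6.3, 5.2, 7.6, 5.8, 6.0]

d_insulin = deltas(insulin_bl, insulin_fu)
d_vascular = deltas(vascular_bl, vascular_fu)
print("Spearman rho of the change scores:", spearman(d_insulin, d_vascular))
```

The sign of each delta depends on the direction of subtraction (FU minus BL here), so it helps to note for every variable whether a higher value means better or worse before interpreting the sign of the correlation.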
My questions:
Regarding baseline correlation:
(1) Is Spearman correlation indeed the best (and easiest) way to assess the correlation between insulin resistance and vascular dysfunction in this situation?
Regarding the Follow-up/baseline difference analysis:
(2) Does it even make sense to use Spearman correlation in this situation?
(3) Is linear regression the best way to analyse the difference between follow-up and baseline in this situation, using the subtracted data?
(4) Would it make sense to correct for age, sex or other variables in such a small sample size (N=8) with regression analysis?
General questions:
(5) After reading this, any other suggestions regarding the statistical analysis approach? Or any specific references I could use which deal with situations like this specifically?
Thank you in advance!
PS. Note to administrators: this was my first time on this forum. After spending a decent amount of time typing this relatively long thread (in the latest version of Google Chrome), I clicked on preview and was prompted to log in again because of an expired session. After logging in again, I got a white screen. Pressing back did not work. Everything I wrote disappeared. Very frustrating.
I wrote this post again in Word and then pasted it into the forum to prevent this from happening again. Known issue?
Last edited by DutchMedicalStudent; 05-22-2017 at 09:48 AM.
I'm currently doing a research internship which only lasts a few months. The study will last another year or two, and is done by my supervisor here. Only helping with the current study does not meet the university requirements so I also have to do my own research questions. This is the only data available from this study so far and there's only 2 months left before the thesis has to be complete so this is all the data I can use, unfortunately.
How is your dependent variable measured? That determines if Spearman makes sense or not.
It's common to run ANCOVA to control for things in a randomized test.
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
To be totally honest I would suggest talking to someone else in your university (not just your supervisor) for advice about how to proceed here, whether this study will really be sufficient for a passing grade, and if not whether you have any alternative options. A correlational study with N=13/8 has very weak power (i.e., even if there is a substantial correlation, you'd have little chance of detecting it with a sample size this small) - meaning that there probably isn't much value in an analysis like this. It sounds like a tricky situation though, so make sure you sound out your options carefully.
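To put a rough number on "very weak power", here is a small Python sketch using the Fisher z approximation for a two-sided test of a Pearson correlation. The assumed true correlation of 0.5 and the function name `approx_power_correlation` are illustrative choices of mine, and the approximation itself is crude at sample sizes this small, so treat the output as an order-of-magnitude guide only.

```python
from math import atanh, sqrt
from statistics import NormalDist

def approx_power_correlation(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0.

    Uses the Fisher z transform: atanh(sample r) is roughly normal with
    mean atanh(rho) and standard deviation 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("the Fisher z approximation needs n > 3")
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)
    # Probability of landing in either rejection tail.
    return z.cdf(shift - critical) + z.cdf(-shift - critical)

# Assumed true correlation of 0.5 (an illustrative value, not from the study).
for n in (8, 13, 120):
    print(n, round(approx_power_correlation(0.5, n), 2))
```

Under these assumptions the power comes out at roughly 0.2 for N=8 and roughly 0.4 for N=13, against close to 1 for the planned N=120. The same approximation does not carry over exactly to Spearman, but the qualitative picture is similar.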
Re. the Pearson/Spearman choice: A small sample size does not imply a non-normal distribution of your variables. I'd probably use Pearson, since it doesn't throw away information by turning observations into ranks, and thus should have slightly higher power. If you're worried about non-normality you can calculate confidence intervals via bootstrapping instead of normal theory.
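As a rough illustration of the bootstrapping idea, here is a minimal Python sketch of a percentile bootstrap confidence interval for Pearson's r. The data are invented (not trial data), the function names and the choices of 2000 resamples and a fixed seed are my own assumptions, and with only 13 pairs a bootstrap interval is itself fairly unstable, so this shows the mechanics rather than a recommended recipe.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI: resample participants (pairs) with replacement."""
    if len(set(x)) < 2 or len(set(y)) < 2:
        raise ValueError("both variables need at least two distinct values")
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples in which a variable has no variance.
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        estimates.append(pearson(xs, ys))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical baseline values for 13 participants (NOT real trial data).
insulin = [4.1, 3.8, 5.0, 4.4, 3.2, 4.9, 4.0, 3.6, 4.7, 3.9, 5.3, 4.2, 3.4]
vascular = [6.0, 5.9, 6.8, 6.1, 5.4, 7.2, 5.7, 6.0, 6.5, 5.5, 7.0, 6.6, 5.2]

low, high = bootstrap_ci(insulin, vascular)
print("r =", round(pearson(insulin, vascular), 2), "95% CI:", (low, high))
```

Resampling whole participants (both values together) is what keeps the pairing intact. A wide interval here is simply the low power showing up in a different form.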
I wouldn't use multiple regression in the follow-up analysis: The use of difference scores controls for individual difference variables implicitly, and you don't have degrees of freedom to burn on including other predictor variables.
Matt aka CB | twitter.com/matthewmatix
Doesn't that require me to know which group had what? All my variables are continuous/scale.
There is no other option. This is what I have to use. Luckily the correlation and the linear regression analysis have both resulted in multiple interesting significant values, so apparently the correlations are strong enough to become significant in this small population.
I have tested for non-normal distribution, and the majority of the variables are non-normally distributed. So, using Pearson for higher power is in my opinion a bad choice (also, the advantage of higher power is not necessary, as Spearman has provided sufficient significant correlations).
I see your point with the multiple regression analysis. Thanks!
It requires you to have variables that vary on some dimension you are interested in. But if you don't have that then any analysis would be impossible anyway. I don't know what you mean by knowing which group had what.
The same problem CWB mentions for regression, low power, applies to ANCOVA.
I thought that ANCOVA required you to split the population in groups. Since I don't know who had treatment or placebo, I cannot do that (population is still double blinded). I am simply looking at the correlation between variables.
Even the baseline and follow-up measurements I really can't treat as two separate groups, because they are the same people. I subtracted these from each other to get a delta which I can use for linear regression (ANOVA).
What exactly do you mean by "variables that vary on some dimension"? I have many variables for each part I'm interested in, and they vary quite a lot haha.
What I mean is that to run ANOVA or regression you have to know 1) how your dependent variable varied (did they get better, did they die, or whatever you are measuring) and 2) how the predictor variable varied, for example whether someone had a treatment or not. If you don't know this information it's impossible to run any statistic I know of; you have nothing to compare. The point of standard statistics is to see how the dependent variable varied with the predictors. If you don't know how the predictors varied, obviously you cannot run such tests.
I am not aware of any statistics you can run where you don't know if the predictor took on a specific level. But they may well exist, I do not do medical research.
(1) Is Spearman correlation indeed the best (and easiest) way to assess the correlation between insulin resistance and vascular dysfunction in this situation?
It is appropriate for small sample sizes such as yours. But, as was already mentioned, statistical power to detect effects will be very low.
Regarding the Follow-up/baseline difference analysis:
(2) Does it even make sense to use spearman correlation in this situation?
Yes. And you could do some scatterplots.
(3) Is linear regression the best way to analyse the difference between follow-up and baseline in this situation, using the subtracted data?
(4) Would it make sense to correct for age, sex or other variables in such a small sample size (N=8) with regression analysis?
Multiple regression is inappropriate regarding your extremely small sample size.
With kind regards
K.
»Now the Führer can kiss my ass.« (Ernst Kuzorra, 1941)
Now I think I understand you. Pretty much all measurements are simply numerical. With all variables I know whether higher is "better" or "worse". I just wanted to say, there is no categorical value to divide people in groups. In the linear regression analysis I just wanted to know, for example, if people who had improved insulin sensitivity after 8 weeks also had better vascular function (and those who deteriorated after 8 weeks also had a worsened vascular function). That was the goal of this analysis. This means I compared two numerical/continuous/scale values. So I suppose I do know how the values "are varied", right?
Luckily and apparently, the correlations are very strong, because I found numerous significant results!
Yeah, I think I will add some graphs to my results section as well!
That's what I thought. After using one independent variable I got a P of 0.053. After adding another variable, for which I should actually correct, I got P's of around 0.01 and 0.03 for those 2 variables!
Does this mean the significance is probably overestimated (and the P-values should be higher) because of the small sample size?
Could I put these results in the thesis (because they are clearly significant now), with a side note that multiple regression analysis is not ideal in such a small sample? Or are 2 variables so "not done" with N=8 that I shouldn't even dare to put it in?
Thanks
In general, there's the problem of overfitting. Statistical models tend to fit the sample too perfectly if there are many predictors and only a few observations. That means it is doubtful whether the results can be generalized to new data. On the other hand, you seem to have extremely strong associations, which makes me wonder why this wasn't known beforehand. By the way, you could have mentioned the size of the correlation coefficients and of the regression coefficients (and the adjusted R² of your multiple regression model). Admittedly, I am not sure whether overfitting is still a serious issue if associations are so strong.
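To show why the adjusted R² matters so much here, this is a small Python sketch of the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - p - 1), applied to N=8 with 2 predictors. The R² values fed in are hypothetical, since none were reported in this thread, and `adjusted_r2` is just a helper name.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a linear model with n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical R-squared values (none were reported in the thread).
for r2 in (0.50, 0.70, 0.90):
    print(r2, "->", round(adjusted_r2(r2, n=8, p=2), 2))
```

With n=8 and p=2 the unexplained share (1 - R²) is inflated by a factor of 7/5, so an R² of 0.70 shrinks to an adjusted value of about 0.58, and every extra predictor makes the penalty steeper.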
With kind regards
Karabiner