
Thread: Master thesis, statistics check!

  1. #1
    Points: 71, Level: 1
    Level completed: 43%, Points required for next Level: 29

    Location
    the Netherlands
    Posts
    5
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Master thesis, statistics check!




    About me: Medical student with minimal statistical knowledge or research experience

    Question related to: statistical regression/correlation analysis of a medical trial

    Trial design: Double blind, randomized controlled trial. Placebo vs vitamin with a few months between baseline and follow-up measurements. Follow-up and baseline measurements are identical.

    Measurements performed, possibly relevant for answering my questions:
    In total 6 tests are performed twice. Once at baseline, once at follow-up.
    Insulin sensitivity testing; 2 tests.
    - First provides 3 values usable for analysis (one of these is important, other 2 secondary)
    - Second test provides 4 quite comparable values usable for analysis (differences are minor and come from different calculation methods)
    Vascular function testing; 4 tests.
    - First provides one important useable value
    - Second provides 4 values
    - Third provides 3 values (1 important, 2 secondary)
    - Fourth provides 4 equally important variables

    Important! Current situation:
    - Goal of study: 120 participants
    - Current status: 13 participants included with baseline measurements taken, 8 of them have also completed follow-up measurements.
    - Double blinded study, with no option to de-blind (until trial completion) to see effect of the vitamin. Solution for me: Looking at the correlations between insulin sensitivity (insulin resistance) and vascular function (vascular dysfunction).

    What I did so far:
    Baseline correlation (N=13)
    Using the 13 baseline measurements, I performed Spearman correlation (in SPSS) using all variables of insulin sensitivity and vascular function. I also included screening data such as BMI, age etc. I figured Pearson would be inferior because of the small sample size (and thus not a normal distribution).

    Follow-up/baseline difference (N=8)
    I made a separate database where I subtracted the baseline (BL) data from the follow-up (FU) data. This results in a database with both positive and negative values (because sometimes the values improved at follow-up, and sometimes they deteriorated).
    I did this because I want to know whether there is a relationship between the change in insulin sensitivity and the change in vascular function between BL and FU. In other words: does a decrease (between BL and FU) in insulin sensitivity cause an increase in vascular dysfunction?
    To analyse this, I used the Spearman correlation again. I also tried some linear regression analysis (in SPSS) on these numbers, using insulin sensitivity parameters as the independent variable.
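
    A minimal sketch of the change-score analysis described above, written in Python rather than SPSS. The numbers are made up for illustration and are not trial data; the variable names and the helpers `ranks`, `pearson`, and `spearman` are assumptions introduced here. The sketch subtracts baseline from follow-up for each participant and then computes Spearman's rho as the Pearson correlation of the ranks.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for 8 participants (illustration only, not trial data).
insulin_bl = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 3.5, 4.9]
insulin_fu = [4.6, 5.0, 4.4, 5.2, 4.9, 5.8, 3.1, 5.0]
vascular_bl = [6.2, 5.1, 7.0, 4.8, 5.9, 6.5, 5.5, 6.1]
vascular_fu = [6.9, 5.0, 7.4, 4.1, 6.0, 7.6, 5.3, 6.3]

# Change scores: follow-up minus baseline, one value per participant.
delta_insulin = [fu - bl for bl, fu in zip(insulin_bl, insulin_fu)]
delta_vascular = [fu - bl for bl, fu in zip(vascular_bl, vascular_fu)]

rho = spearman(delta_insulin, delta_vascular)
print(f"Spearman rho on change scores (N={len(delta_insulin)}): {rho:.2f}")
```

    Because only the ranks enter the calculation, the result does not depend on the measurement scale of either variable, but with N=8 a single participant can still move rho considerably.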

    My questions:
    Regarding baseline correlation:
    (1) Is Spearman correlation indeed the best (and easiest) way to assess the correlation between insulin resistance and vascular dysfunction in this situation?
    Regarding the Follow-up/baseline difference analysis:
    (2) Does it even make sense to use Spearman correlation in this situation?
    (3) Is linear regression the best way to analyse the difference between follow-up and baseline in this situation, using the subtracted data?
    (4) Would it make sense to correct for age, sex or other variables in such a small sample size (N=8) with regression analysis?
    General questions:
    (5) After reading this, any other suggestions regarding the statistical analysis approach? Or any specific references I could use which deal with situations like this specifically?

    Thank you in advance!

    PS. Note to administrators: this was my first time on this forum. I wrote this relatively long post in the latest version of Google Chrome and, after a decent amount of time typing, clicked Preview. The forum then prompted me to log in again because my session had expired. After logging in, I got a white screen, and pressing Back did not work. Everything I wrote had disappeared. Very frustrating.
    I wrote this post again in Word and then pasted it into the forum to prevent this from happening again. Known issue?
    Last edited by DutchMedicalStudent; 05-22-2017 at 09:48 AM.

  2. #2
    TS Contributor
    Points: 18,889, Level: 87
    Level completed: 8%, Points required for next Level: 461
    CowboyBear's Avatar
    Location
    New Zealand
    Posts
    2,062
    Thanks
    121
    Thanked 427 Times in 328 Posts

    Re: Master thesis, statistics check!

    Hi there!

    Quote Originally Posted by DutchMedicalStudent View Post
    Important! Current situation:
    - Goal of study: 120 participants
    - Current status: 13 participants included with baseline measurements taken, 8 of them have also completed follow-up measurements.
    Can you elaborate on why you're analysing data at this stage, given that your data collection is incomplete and your current sample is so tiny?
    Matt aka CB | twitter.com/matthewmatix

  3. #3

    Re: Master thesis, statistics check!

    Quote Originally Posted by CowboyBear View Post
    Can you elaborate on why you're analysing data at this stage, given that your data collection is incomplete and your current sample is so tiny?
    I'm currently doing a research internship which only lasts a few months. The study will last another year or two, and is done by my supervisor here. Only helping with the current study does not meet the university requirements so I also have to do my own research questions. This is the only data available from this study so far and there's only 2 months left before the thesis has to be complete so this is all the data I can use, unfortunately.

  4. #4
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: Master thesis, statistics check!

    How is your dependent variable measured? That determines if Spearman makes sense or not.

    It's common to run ANCOVA to control for things in a randomized test.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  5. #5
    CowboyBear's Avatar

    Re: Master thesis, statistics check!

    Quote Originally Posted by DutchMedicalStudent View Post
    I'm currently doing a research internship which only lasts a few months. The study will last another year or two, and is done by my supervisor here. Only helping with the current study does not meet the university requirements so I also have to do my own research questions. This is the only data available from this study so far and there's only 2 months left before the thesis has to be complete so this is all the data I can use, unfortunately.
    To be totally honest I would suggest talking to someone else in your university (not just your supervisor) for advice about how to proceed here, whether this study will really be sufficient for a passing grade, and if not whether you have any alternative options. A correlational study with N=13/8 has very weak power (i.e., even if there is a substantial correlation, you'd have little chance of detecting it with a sample size this small) - meaning that there probably isn't much value in an analysis like this. It sounds like a tricky situation though, so make sure you sound out your options carefully.
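
    A rough sketch of the power argument above, using the Fisher z approximation for a two-sided test of zero correlation. The true correlation of 0.5 and the function name `correlation_power` are assumptions chosen for illustration, and the approximation itself is crude at sample sizes this small, so the output should be read as an order of magnitude only.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(rho, n, alpha=0.05):
    """Approximate two-sided power to detect a true correlation `rho`
    with `n` pairs, based on Fisher's z transformation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Fisher's z of the sample correlation is roughly normal with
    # mean atanh(rho) and standard error 1 / sqrt(n - 3).
    shift = atanh(rho) * sqrt(n - 3)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Sample sizes from the thread: 8 (follow-up), 13 (baseline), 120 (target).
for n in (8, 13, 120):
    print(f"N={n:3d}: approximate power = {correlation_power(0.5, n):.2f}")
```

    Under these assumptions, the same true correlation is far more likely to be detected at the planned N=120 than at N=13 or N=8.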

    Re. The pearson/spearman choice: A small sample size does not imply a non-normal distribution of your variables. I'd probably use Pearson, since it doesn't throw away information by turning observations into ranks, and thus should have slightly higher power. If you're worried about non-normality you can calculate confidence intervals via bootstrapping instead of normal theory.
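
    One way the bootstrapping suggestion could look in practice is a percentile bootstrap: resample participants with replacement, recompute Pearson's r each time, and take the middle 95% of the resampled values. This is a sketch under assumptions. The 13 data pairs are invented for illustration, the function names and the choice of 2000 resamples are arbitrary, and the percentile interval is only one of several bootstrap interval types.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for Pearson's r."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        # Resample whole participants so each (x, y) pair stays together.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # r is undefined when a resample is constant
        estimates.append(pearson(xs, ys))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Hypothetical baseline values for 13 participants (not trial data).
x = [22.1, 25.4, 27.8, 30.2, 24.6, 28.9, 31.5, 26.3, 23.7, 29.4, 33.0, 27.1, 25.9]
y = [5.8, 5.1, 4.9, 3.7, 5.5, 4.2, 3.9, 4.6, 5.9, 4.4, 3.1, 4.8, 5.0]

lower, upper = bootstrap_ci(x, y)
print(f"r = {pearson(x, y):.2f}, 95% bootstrap CI = [{lower:.2f}, {upper:.2f}]")
```

    The interval does not rely on normal theory, but with N=13 the resamples draw on very few distinct participants, so it should still be interpreted cautiously.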

    I wouldn't use multiple regression in the follow-up analysis: The use of difference scores controls for individual difference variables implicitly, and you don't have degrees of freedom to burn on including other predictor variables.
    Matt aka CB | twitter.com/matthewmatix

  7. The Following User Says Thank You to CowboyBear For This Useful Post:

    ondansetron (05-30-2017)

  8. #7

    Re: Master thesis, statistics check!

    Quote Originally Posted by noetsi View Post
    How is your dependent variable measured? That determines if Spearman makes sense or not.

    It's common to run ANCOVA to control for things in a randomized test.
    Doesn't that require me to know which group had what? All my variables are continuous/scale.


    Quote Originally Posted by CowboyBear View Post
    To be totally honest I would suggest talking to someone else in your university (not just your supervisor) for advice about how to proceed here, whether this study will really be sufficient for a passing grade, and if not whether you have any alternative options. A correlational study with N=13/8 has very weak power (i.e., even if there is a substantial correlation, you'd have little chance of detecting it with a sample size this small) - meaning that there probably isn't much value in an analysis like this. It sounds like a tricky situation though, so make sure you sound out your options carefully.

    Re. The pearson/spearman choice: A small sample size does not imply a non-normal distribution of your variables. I'd probably use Pearson, since it doesn't throw away information by turning observations into ranks, and thus should have slightly higher power. If you're worried about non-normality you can calculate confidence intervals via bootstrapping instead of normal theory.

    I wouldn't use multiple regression in the follow-up analysis: The use of difference scores controls for individual difference variables implicitly, and you don't have degrees of freedom to burn on including other predictor variables.
    There is no other option. This is what I have to use. Luckily the correlation and the linear regression analysis have both resulted in multiple interesting significant values, so apparently the correlations are strong enough to become significant in this small population.

    I have tested for non-normal distribution, and the majority of the variables are non-normally distributed. So, using Pearson for higher power is in my opinion a bad choice (also, the advantage of higher power is not necessary, as Spearman has provided sufficient significant correlations).

    I see your point with the multiple regression analysis. Thanks!

  9. #8
    noetsi's Avatar

    Re: Master thesis, statistics check!

    It requires you to have variables that vary on some dimension you are interested in. But if you don't have that then any analysis would be impossible anyway. I don't know what you mean by knowing which group had what.
    The same problem CWB mentions for regression, low power, applies to ANCOVA.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  10. #9

    Re: Master thesis, statistics check!

    Quote Originally Posted by noetsi View Post
    It requires you to have variables that vary on some dimension you are interested in. But if you don't have that then any analysis would be impossible anyway. I don't know what you mean by knowing which group had what.
    The same problem CWB mentions for regression, low power, applies to ANCOVA.
    I thought that ANCOVA required you to split the population in groups. Since I don't know who had treatment or placebo, I cannot do that (population is still double blinded). I am simply looking at the correlation between variables.
    I can't really treat baseline and follow-up as two separate groups either, because they are the same people. I subtracted one from the other to get a delta, which I can use for linear regression (ANOVA).

    What exactly do you mean by "variables that vary on some dimension"? I have many variables for each part I'm interested in, and they vary quite a lot haha.

  11. #10
    noetsi's Avatar

    Re: Master thesis, statistics check!

    What I mean is that to run ANOVA or regression you have to know 1) how your dependent variable varied (did they get better, did they die, or whatever you are measuring) and 2) how the predictor variable varied, for example whether someone had a treatment or not. If you don't know this information it's impossible to run any statistic I know; you have nothing to compare. The point of standard statistics is to see how the dependent variable varied with the predictors. If you don't know how the predictors varied, obviously you cannot run such tests.

    I am not aware of any statistics you can run where you don't know if the predictor took on a specific level. But they may well exist, I do not do medical research.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  12. #11
    TS Contributor
    Points: 17,749, Level: 84
    Level completed: 80%, Points required for next Level: 101
    Karabiner's Avatar
    Location
    FC Schalke 04, Germany
    Posts
    2,540
    Thanks
    56
    Thanked 639 Times in 601 Posts

    Re: Master thesis, statistics check!

    (1) Is Spearman correlation indeed the best (and easiest) way to assess the correlation between insulin resistance and vascular dysfunction in this situation?
    It is appropriate for small sample sizes such as yours. But, as was already mentioned, statistical power to detect effects will be very low.
    Regarding the Follow-up/baseline difference analysis:
    (2) Does it even make sense to use spearman correlation in this situation?
    Yes. And you could do some scatterplots.

    (3) Is linear regression the best way to analyse the difference between follow-up and baseline in this situation, using the subtracted data?
    (4) Would it make sense to correct for age, sex or other variables in such a small sample size (N=8) with regression analysis?
    Multiple regression is inappropriate regarding your extremely small sample size.

    With kind regards

    K.
    »Jetzt kann mich der Führer mal am Arsch lecken.« (Ernst Kuzorra, 1941)

  13. The Following User Says Thank You to Karabiner For This Useful Post:

    noetsi (05-31-2017)

  14. #12

    Re: Master thesis, statistics check!

    Quote Originally Posted by noetsi View Post
    What I mean is that to run ANOVA or regression you have to know 1) how your dependent variable varied (did they get better, did they die, or whatever you are measuring) and 2) how the predictor variable varied, for example whether someone had a treatment or not. If you don't know this information it's impossible to run any statistic I know; you have nothing to compare. The point of standard statistics is to see how the dependent variable varied with the predictors. If you don't know how the predictors varied, obviously you cannot run such tests.

    I am not aware of any statistics you can run where you don't know if the predictor took on a specific level. But they may well exist, I do not do medical research.
    Now I think I understand you. Pretty much all measurements are simply numerical. With all variables I know whether higher is "better" or "worse". I just wanted to say, there is no categorical value to divide people in groups. In the linear regression analysis I just wanted to know, for example, if people who had improved insulin sensitivity after 8 weeks also had better vascular function (and those who deteriorated after 8 weeks also had a worsened vascular function). That was the goal of this analysis. This means I compared two numerical/continuous/scale values. So I suppose I do know how the values "are varied", right?

    Quote Originally Posted by Karabiner View Post
    It is appropriate for small sample sizes such as yours. But, as was already mentioned, statistical power to detect effects will be very low.
    Luckily and apparently, the correlations are very strong, because I found numerous significant results!

    Quote Originally Posted by Karabiner View Post
    Yes. And you could do some scatterplots.
    Yeah, I think I will add some graphs to my results section as well!

    Quote Originally Posted by Karabiner View Post
    Multiple regression is inappropriate regarding your extremely small sample size.
    That's what I thought. After using one independent variable I got a P of 0.053. After adding another variable, for which I should actually correct, I got P's of around 0.01 and 0.03 for those 2 variables!
    Does this mean these P-values are probably too optimistic (and should really be higher) because of the small sample size?
    Could I put these results in the thesis (because they are clearly significant now), with a side note that multiple regression analysis is not ideal in such a small sample? Or are 2 variables so "not done" with N=8 that I shouldn't even dare to put them in?

    Quote Originally Posted by Karabiner View Post
    With kind regards
    K.
    Thanks

  15. #13
    Karabiner's Avatar

    Re: Master thesis, statistics check!


    In general, there's the problem of overfitting. Statistical models tend to fit the sample too perfectly when there are many predictors and only a few observations. That means it is doubtful whether the results can be generalized to new data. On the other hand, you seem to have extremely strong associations (which makes me wonder why this wasn't known beforehand). By the way, you could have mentioned the size of the correlation coefficients and of the regression coefficients (and the adjusted R² of your multiple regression model). Admittedly, I am not sure whether overfitting is still a serious issue if associations are this strong.
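
    The overfitting concern can be illustrated with a small simulation, sketched here under assumptions: the outcome and both predictors are independent normal noise, so there is no real association at all, and the function names and the number of simulated datasets are arbitrary. With N=8 and 2 predictors, ordinary least squares still produces a sizeable R² on average; for normal data with no true relationship the expected value is p/(N-1), which is 2/7 here.

```python
import random


def r_squared_two_predictors(x1, x2, y):
    """R-squared of an OLS fit of y on x1 and x2 (with intercept)."""
    n = len(y)

    def center(v):
        m = sum(v) / n
        return [a - m for a in v]

    # Centering the variables absorbs the intercept.
    x1, x2, y = center(x1), center(x2), center(y)
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * b for a, b in zip(x1, y))
    s2y = sum(a * b for a, b in zip(x2, y))
    syy = sum(a * a for a in y)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / syy


def simulate_null_r2(n=8, n_sim=5000, seed=0):
    """Mean R-squared over datasets where y is unrelated to x1 and x2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        total += r_squared_two_predictors(x1, x2, y)
    return total / n_sim


mean_r2 = simulate_null_r2()
print(f"Mean R-squared with pure noise, N=8, 2 predictors: {mean_r2:.2f}")
print(f"Theoretical expectation p/(N-1): {2 / 7:.2f}")
```

    This is why the adjusted R² and the size of the coefficients are more informative here than the raw R² or the P-values alone.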

    With kind regards

    Karabiner
    »Jetzt kann mich der Führer mal am Arsch lecken.« (Ernst Kuzorra, 1941)

  16. The Following User Says Thank You to Karabiner For This Useful Post:

    DutchMedicalStudent (06-02-2017)
