
Thread: Sample Size Calc for Right Skewed Data

  1. #1
    hlsmith

    Sample Size Calc for Right Skewed Data




    I have a request from a person for a sample size calculation. When I asked for information about the potential study data, they provided two means and standard deviations from a prior study. Given the values (x1 = 7, s1 = 6 and x2 = 0.5, s2 = 2), it appears the data are right skewed. I can simulate those data to work toward a simulation study, though the values actually represent before and after values, so the observations are paired. That means I will have to simulate two variables, and they will have to be correlated. The data also have a lower bound of zero.


    I guess I can do this, but I have no idea how correlated the paired data are. Any suggestions?


    Also, are there any basics that I am missing, e.g., the difference of two paired skewed data equal...?


    My current plan is to simulate two skewed variables and correlate them, I guess, since a bigger pre-value may allow a greater decrease than lower pre-values, which have little wiggle room to decrease. So, any suggestions would be appreciated. Or can I simulate a variable and subtract a constant from it? That sounds good too.


    Plan: simulate the data, normalize, run say 10,000 t-tests, and play around with sample sizes. I can also do the same thing but run Wilcoxon signed-rank tests on the unnormalized data. Though if the former seems feasible, it may be a good approach, because the final study analyses may require controlling for covariates, though that was not done in the flimsy example those parameters came from.




    Stop cowardice, ban guns!

  2. #2
    hlsmith

    If I simulate a skewed sample using a very large n-value and the parameters align with my target parameters, I am guessing I can then shrink the n-value and assume the smaller sample is a realization of my target.

    So I can test sample sizes for the Wilcoxon signed-rank test, which is straightforward.

    What about a one-sample t-test (vs 0) of the differences of two lognormal variables? Zeros may be in the sample, so the log transformation may require adding a constant. Any advice?

    When I back-transform I will be in the median realm, but how does the constant come into play?

  3. #3
    spunky

    Quote Originally Posted by hlsmith View Post
    Given the values (x1=7; s1=6 and x2=0.5; s2 = 2) it appears the data are right skewed.
    Hi. I don't quite follow how you can deduce that the data are right-skewed just from those two pieces of information. Or did they show you some histograms or some other stuff?
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/


  5. #4
    hlsmith

    They are left bounded by zero. So am I just making **** up, I mean, making too big of an assumption? The study used a t-test, and the sample size was 20.

    Not too much to work from, right? What approach would you take?

  6. #5
    rogojel

    hi,
    from (7,6) to (0.5,2) looks like a huge effect - a simple permutation test with a quite low sample size might be sufficient, no? Simulating that should be quite easy.

    regards


  8. #6
    hlsmith

    Thanks. Agreed. And the permutation test is considered non-parametric, so skewness isn't a concern? Of note, I plan to look at the differences between the pre/post values, so is there a one-sample permutation test? With only one group, you wouldn't just be switching group assignments for the observations as in the two-group case.
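
On the one-sample question: one common paired version is the sign-flip (randomization) test on the differences. A minimal sketch follows, assuming the null hypothesis makes each difference as likely to be +d as -d. So the test leans on symmetry of the differences under the null rather than ignoring shape entirely. The function name `sign_flip_test` and the example data are hypothetical illustrations, not values from the prior study.

```r
# One-sample (paired) permutation test by random sign flipping.
# Under H0 the pre/post labels are exchangeable within each pair,
# so every difference d_i can be replaced by -d_i with probability 1/2.
sign_flip_test <- function(d, n_perm = 9999) {
  obs   <- abs(mean(d))
  # Each row of `signs` is one random relabelling of the pairs
  signs <- matrix(sample(c(-1, 1), n_perm * length(d), replace = TRUE),
                  nrow = n_perm)
  perm  <- abs(as.vector(signs %*% d)) / length(d)
  # Two-sided Monte Carlo p-value; the +1 counts the observed labelling
  (1 + sum(perm >= obs - 1e-12)) / (1 + n_perm)
}

set.seed(42)
# Hypothetical illustration data: independent log-normal draws whose
# parameters roughly match mean 7 / SD 6 and mean 0.5 / SD 2
pre  <- rlnorm(20, meanlog = 1.67,  sdlog = 0.74)
post <- rlnorm(20, meanlog = -2.11, sdlog = 1.68)
p_value <- sign_flip_test(pre - post)
print(p_value)
```

With only 9,999 random relabellings the smallest attainable p-value is 1/10,000, so the resolution of the test is set by `n_perm` as well as by the sample size.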


    A lingering issue in my mind was that the pre and post measures should be correlated, not just two independent samples, but I don't know how strongly they are correlated. A generic workaround, if I don't get the covariance structure right, may be to simulate two sets, sort them individually, and then match them by rank. However, I would imagine that would be too optimistic a version of the actual scenario.
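
One way to get correlated, zero-bounded, right-skewed pairs without knowing the covariance structure is to correlate two normal variables and exponentiate them. The sketch below assumes log-normal margins moment-matched to the reported means and SDs; the log-normal choice, the value `rho = 0.5`, and the helper names `ln_par` and `sim_pairs` are all assumptions for illustration. Note that `rho` is the correlation of the underlying normals, so the Pearson correlation of the simulated raw values comes out lower. Log-normal draws are strictly positive, so exact zeros never occur.

```r
set.seed(1)

# Log-normal parameters that reproduce a target mean m and SD s
ln_par <- function(m, s) {
  v <- log(1 + (s / m)^2)                  # variance on the log scale
  list(meanlog = log(m) - v / 2, sdlog = sqrt(v))
}

# Draw n pre/post pairs. rho is the correlation of the latent normals
# (assumed, since the true dependence is unknown).
sim_pairs <- function(n, rho, pre = c(7, 6), post = c(0.5, 2)) {
  p1 <- ln_par(pre[1], pre[2])
  p2 <- ln_par(post[1], post[2])
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # corr(z1, z2) = rho
  data.frame(pre  = exp(p1$meanlog + p1$sdlog * z1),
             post = exp(p2$meanlog + p2$sdlog * z2))
}

big <- sim_pairs(1e5, rho = 0.5)
print(colMeans(big))                        # should sit near 7 and 0.5
print(sapply(big, sd))                      # near 6 and 2, but noisy for 'post'
r_pearson  <- cor(big$pre, big$post)
r_spearman <- cor(big$pre, big$post, method = "spearman")
print(c(pearson = r_pearson, spearman = r_spearman))

# The sort-and-match workaround forces a perfect rank correlation,
# which is the most optimistic pairing possible.
sorted <- data.frame(pre = sort(big$pre), post = sort(big$post))
r_sorted <- cor(sorted$pre, sorted$post, method = "spearman")
print(r_sorted)
```

Running `sim_pairs()` over a few values of `rho` shows how sensitive the planned test is to the unknown dependence, and the sorted version gives the upper bound.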

  9. #7
    spunky

    Well, if I learned anything from CBear's blog:

    http://thepathologicalscience.blogsp...not-cause.html

    was that the whole non-normality brouhaha is mostly overblown, particularly for simple (and, usually, quite robust) tests such as the t-test and whatnot.

    I honestly wouldn't freak out too much if people are using known-and-tried power analysis methods (like those from G*Power).

    As far as the correlation aspect goes, maybe you can try a few like say 0, .1, .3, .5 for "independence", "small", "medium" and "large" effect sizes a la Cohen and see how bad things can get?

    It's easy to simulate correlated, non-normal data in something like lavaan or semTools. And I know SAS has a macro out there somewhere that uses the same method as lavaan, in case you need it.

    I'd provide R code but I'm not sure if it would be particularly useful to you (?)

  11. #8
    Dason

    Ehhhh I wouldn't be so quick to dismiss the non-normality here.

    Quote Originally Posted by hlsmith
    also the sample size was 20
    and

    Quote Originally Posted by The linked article
    That technical note aside, the net effect is that the headline figure of a Type I error rate of 17% is based on a tiny sample size (18) and an extremely unusual degree of non-normality
    So depending on the severity of the skew it could have a decent impact with sample sizes this small.
    I don't have emotions and sometimes that makes me very sad.

  12. #9
    spunky

    OMG 20!?!?!!

    I totally missed that part. Yeah, then it seems like you have a case of the ugly here (where ugly means small N )

  13. #10
    hlsmith

    Let's see that R code Spunky!

  14. #11
    hlsmith

    I feel like this is a silly question, and Dason alluded to it in a post I had a couple of years ago, but alas, without my Advanced Search button, here I am.


    If I am doing a power simulation are the following correct:


    -sample size: is whatever I am using in the simulation
    -alpha: is the level of significance I am using for cut off in the simulation
    -power: is the number of times the null is rejected given the above parameters


    So if I am doing this with a t-test, for example, I set my sample size and alpha, then I get my "power" from the number of times, out of the number of samples, that I rejected the null (e.g., p-value <= 0.05), correct?

  15. #12
    Dason

    You *estimate* your true power based on the proportion of trials in which you reject the null at your chosen alpha level. So you had the gist of it right but make sure you're talking about proportion because it doesn't make sense to say the power is 838 and if you're going to specify an alpha then you can't just always compare against 0.05 ... unless you *always* use 0.05
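
A minimal sketch of that estimate, assuming normally distributed paired differences with a hypothetical mean of 1 and SD of 2 (placeholder values, not the study's). The function name `estimate_power` is made up for illustration. Power is the proportion of simulated p-values at or below whichever alpha is passed in, and it comes with a Monte Carlo standard error because it is only an estimate.

```r
set.seed(123)

# Estimate power of a one-sample t-test on n differences by simulation.
# delta and sd are hypothetical placeholders for the true mean and SD
# of the differences.
estimate_power <- function(n, alpha = 0.05, n_sim = 2000, delta = 1, sd = 2) {
  p    <- replicate(n_sim, t.test(rnorm(n, mean = delta, sd = sd))$p.value)
  phat <- mean(p <= alpha)                  # proportion of rejections
  c(power = phat, mc_se = sqrt(phat * (1 - phat) / n_sim))
}

est_05 <- estimate_power(n = 20, alpha = 0.05)
est_01 <- estimate_power(n = 20, alpha = 0.01)
print(rbind(alpha_0.05 = est_05, alpha_0.01 = est_01))

# Sanity check against the closed-form answer, which exists for normal data
analytic <- power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.05,
                         type = "one.sample")$power
print(analytic)
```

For normal data the simulated and analytic values should agree to within a couple of Monte Carlo standard errors. The point of simulating is that the same loop still works when the data are skewed and no closed form is available.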

  17. #13
    spunky

    Well, it really is very simple. It would look like this:

    Code: 
    
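    library(lavaan)  # simulateData()
    library(psych)   # describe()

    # Population model in lavaan syntax:
    #   x1 ~~ 6*x2                covariance of 6, i.e. correlation 6 / (6 * 2) = 0.5
    #   x1 ~~ 36*x1, x2 ~~ 4*x2   variances (SD squared)
    #   x1 ~ 7*1, x2 ~ 0.5*1      means
    mod <- "x1 ~~ 6*x2

            x1 ~~ 36*x1
            x2 ~~ 4*x2

            x1 ~ 7*1
            x2 ~ 0.5*1"

    N    <- 100       # sample size per simulated data set
    skew <- c(2, 2)   # target univariate skewness for x1 and x2
    kurt <- c(7, 7)   # target univariate kurtosis for x1 and x2

    # Draw N correlated, non-normal observations from the model
    data <- simulateData(mod, sample.nobs = N, skewness = skew, kurtosis = kurt)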
    describe(data)
       vars   n mean   sd median trimmed  mad   min   max range skew kurtosis   se
    x1    1 100 6.58 5.83   6.32    6.33 1.50  3.64 13.48  9.85 1.46     2.57 0.19
    x2    2 100 0.47 1.87  -0.18   -0.04 0.75 -1.31  7.30  8.61 2.71    10.45 0.13
    Notice that I had to square your 6 and your 2 in the SD section because lavaan takes in variances to create the variance-covariance matrix.

    So the mod part specifies a correlation of 0.5 (the covariance of 6 divided by the product of the SDs, 6 × 2) along with the means and standard deviations that you mentioned, and the skew and kurt vectors ask for univariate skewnesses of 2 and kurtoses of 7.

    Then you could do something like:

    Code: 
    
    > t.test(data$x1,data$x2)
    
            Welch Two Sample t-test
    
    data:  data$x1 and data$x2
    t = 9.8325, df = 125.79, p-value < 2.2e-16
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     4.380432 6.588089
    sample estimates:
    mean of x mean of y 
    6.0769192 0.5926587
    And I guess repeat that a gazillion times to see what the power looks like.
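
A sketch of what that repetition could look like for before/after data. Two things differ from the code above and are assumptions of this sketch. First, the call shown is a Welch two-sample test, which treats the columns as independent, so the loop below uses `paired = TRUE` instead. Second, to keep the example in base R it swaps lavaan's generator for log-normal margins built on correlated normals. The helper names (`ln_par`, `sim_pairs`, `power_one`) and the grid of sample sizes are hypothetical; the correlations 0, .1, .3, .5 are the ones suggested earlier in the thread.

```r
set.seed(2017)

# Log-normal parameters matched to a target mean m and SD s
ln_par <- function(m, s) {
  v <- log(1 + (s / m)^2)
  list(meanlog = log(m) - v / 2, sdlog = sqrt(v))
}

# n pre/post pairs; rho is the correlation of the latent normals
sim_pairs <- function(n, rho) {
  p1 <- ln_par(7, 6)
  p2 <- ln_par(0.5, 2)
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
  list(pre  = exp(p1$meanlog + p1$sdlog * z1),
       post = exp(p2$meanlog + p2$sdlog * z2))
}

# Proportion of rejections for the paired t-test and the
# Wilcoxon signed-rank test at one (n, rho) combination
power_one <- function(n, rho, alpha = 0.05, n_sim = 1000) {
  rej <- replicate(n_sim, {
    d <- sim_pairs(n, rho)
    c(t_test   = t.test(d$pre, d$post, paired = TRUE)$p.value <= alpha,
      wilcoxon = wilcox.test(d$pre, d$post, paired = TRUE)$p.value <= alpha)
  })
  rowMeans(rej)
}

grid <- expand.grid(n = c(5, 10, 20), rho = c(0, 0.1, 0.3, 0.5))
res  <- cbind(grid, t(mapply(power_one, grid$n, grid$rho)))
print(res)
```

One thing the grid makes visible: with n = 5 the exact two-sided signed-rank test cannot reach p <= 0.05 at all (its smallest possible p-value is 2/32), so its estimated power there is zero regardless of the effect size.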
    Last edited by spunky; 06-13-2017 at 11:50 AM.

  19. #14
    GretaGarbo

    What is the "delta" that your client wishes to detect with power = 0.80 and significance level 0.05?

    Why not just assume log-normal and get a sample size based on that (and from, say, the two SDs)?

    Possibly assume a gamma distribution and simulate from that (the gamma and the log-normal are relatively similar).
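
For either distribution, the parameters can be backed out of a reported mean and SD by the method of moments. A small sketch, assuming the two reported summary statistics are all there is to work with; the function names `lognormal_params` and `gamma_params` are made up for illustration.

```r
# Method-of-moments parameters from a reported mean m and SD s

lognormal_params <- function(m, s) {
  sdlog2 <- log(1 + (s / m)^2)            # variance on the log scale
  list(meanlog = log(m) - sdlog2 / 2,     # mean on the log scale
       sdlog   = sqrt(sdlog2))
}

gamma_params <- function(m, s) {
  list(shape = (m / s)^2,                 # shape = mean^2 / variance
       rate  = m / s^2)                   # rate  = mean / variance
}

pre_ln  <- lognormal_params(7, 6)
post_ln <- lognormal_params(0.5, 2)
pre_g   <- gamma_params(7, 6)
post_g  <- gamma_params(0.5, 2)
str(list(pre_ln = pre_ln, post_ln = post_ln, pre_g = pre_g, post_g = post_g))

# Skewness implied by each fit, written in terms of the
# coefficient of variation cv = SD / mean
cv <- c(pre = 6 / 7, post = 2 / 0.5)
skew_lognormal <- (cv^2 + 3) * cv         # (exp(sdlog^2) + 2) * sqrt(exp(sdlog^2) - 1)
skew_gamma     <- 2 * cv                  # 2 / sqrt(shape)
print(rbind(skew_lognormal, skew_gamma))

# Draws for a simulation would then come from, e.g.,
# rlnorm(n, pre_ln$meanlog, pre_ln$sdlog) or rgamma(n, pre_g$shape, pre_g$rate)
```

With these inputs the two families agree on the mean and SD but not on the tail: the implied skewness is roughly 3.2 (log-normal) versus 1.7 (gamma) for the pre values, and roughly 76 versus 8 for the post values, so the choice matters more when the SD is several times the mean.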

  20. #15
    hlsmith


    Thanks Greta. I have since resolved the questions related to the project. They actually wanted to do a two independent sample test. Though, they gave me an example study that they wanted to emulate, which was a before and after study, so I presumed that is what they wanted. I was able to bang something out for both scenarios in SAS. Though for the correlated simulation, I just kept trying values for location, etc. in a huge sample until I got two distributions that were close enough.


    I had wondered if I could use the Gamma. I had also wondered if I could surmise a delta and dispersion parameter using the two means and SDs from the possibly skewed data.


    So is there any rule about the difference of two lognormals equaling something like a lognormal, with or without dependency between the values? I had thought it would be easier to just work with the differences, though I was only given the two means and SDs to work from. Though, as Miner pointed out, the two groups were very different in values, and would actually require fewer than 20 patients to test the hypothesis.