Thread: testing rates through time

  1. #1

    Hello all, this seems like it should be a simple question, but our team cannot find an answer. Suppose I have data set up like the following (the numbers are the percentages that react positively to the drug):
    Code: 
    Drug 2007 2008 2009 2010
       A   59   62   61   62
       B   50   49   58   61
       C   67   70   69   83
    My goal is to see whether there is a significant difference in the observed trends in rates over time. How can I test this? Unfortunately, in my line of work this is all we get; we cannot get the sample sizes or any other information. Any help would be greatly appreciated, as I have run into this problem many times.

    Thank you so much!
    Last edited by Dason; 10-11-2011 at 11:22 AM.

  2. #2
    trinker

    Without sample sizes or standard deviations, I think you may have to rely on simple mean differences.

  3. The Following User Says Thank You to trinker For This Useful Post:

    shanehall.m (10-11-2011)

  4. #3
    d21e7x11

    shanehall.m, you may want to first apply the arcsine transformation to the percentages. Then you can carry out a repeated-measures analysis on the transformed outcome.
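
    For reference, a minimal sketch of that transformation in R, using the drug A percentages from the first post. The variable names (p, z, p_back) are only illustrative. For a binomial proportion based on n observations, the transformed value has a variance of roughly 1/(4n) whatever the proportion is, which is why it is described as variance-stabilizing.

```r
# Drug A percentages for 2007-2010, converted to proportions
p <- c(59, 62, 61, 62) / 100

# Arcsine square-root transformation (result is in radians)
z <- asin(sqrt(p))

# Back-transformation to the proportion scale
p_back <- sin(z)^2

print(round(z, 3))
print(all.equal(p, p_back))
```

    Any model fitted to z gives estimates on the transformed scale, so fitted values have to go through sin(.)^2 before they can be read as proportions again.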

  5. The Following User Says Thank You to d21e7x11 For This Useful Post:

    shanehall.m (10-11-2011)

  6. #4

    So there is no sort of time series analysis? We can construct a line graph with a separate line for each drug, but we don't know of a way to compare them. So you're saying to apply an arcsine transformation to the percentages and then run a repeated-measures analysis. Won't that negate the trend effect? Are you suggesting the arcsine transformation as a more precise way of treating the percentages as true continuous numbers?

  7. #5

    Slide 9 of this PowerPoint is almost identical to my question; there, I'd be trying to compare the age groups. If I had nothing other than the rates/proportions given for each year, how could I find a difference in the trends of the age groups? Thank you all for your help.
    http://www.hsph.harvard.edu/means-ma...cideTrends.ppt

  8. #6
    Link

    One method is to just throw the numbers into a linear regression model:

    Percentage ~ DrugA*time + DrugB*time + DrugC*time

    This will tell you whether there is a significant increasing linear trend in the percentages over time. One problem with this, though, is that you have so few observations. A way around this could be to assume that all the drugs have the same growth over time.

    PS. If you are planning to present this professionally, I strongly recommend bringing someone onto your team who knows what they're doing.
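
    As a rough sketch of how the two options above could be compared in R: fit one model with a shared slope and one with a slope per drug, then compare them with an F test. The data frame layout and the object names (dat, common, separate, cmp) are assumptions made for illustration, and time is coded as years since 2007.

```r
# Percentages from the first post, in long format
dat <- data.frame(drug = rep(c("A", "B", "C"), each = 4),
                  time = rep(0:3, 3),
                  vals = c(59, 62, 61, 62, 50, 49, 58, 61, 67, 70, 69, 83))

# Same growth over time for every drug (different intercepts only)
common <- lm(vals ~ drug + time, data = dat)

# A separate slope for each drug
separate <- lm(vals ~ drug * time, data = dat)

# F test of the extra slope terms; a large p-value means the data
# give little evidence that the slopes differ
cmp <- anova(common, separate)
print(cmp)
```

    With 12 observations the test has very little power, so a non-significant result here should not be read as evidence that the trends are the same.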

  9. #7
    Dason

    I did a quick little thing to see whether, assuming there is a linear trend, we have any evidence that the slope differs between the drugs. We end up concluding that we don't have enough evidence. Once again, though, if we actually had sample sizes we could do quite a bit more with the data.
    Code: 
    dat <- data.frame(drug = rep(c("A","B","C"), each = 4),
                      year = rep(2007:2010, 3),
                      vals = c(59,62,61,62,50,49,58,61,67,70,69,83))
    # I'd rather work with smaller numbers for the predictor
    dat$time = dat$year - 2007
    # Apply the arcsin-squareroot transformation
    dat$transvals <- asin(sqrt(dat$vals/100))
    
    # Plotting to see what the data looks like
    library(ggplot2)
    # Plot of actual data
    qplot(time, vals, colour = drug, data = dat, geom = "line")
    # Plot of transformed data
    qplot(time, transvals, colour = drug, data = dat, geom = "line")
    
    # Fit a line for each drug (actual data)
    o.full <- lm(vals ~ drug + time + drug:time, data = dat)
    # Fit a line for each drug (transformed data)
    o.trans <- lm(transvals ~ drug + time + drug:time, data = dat)
    
    # Check the interaction term to see if there is a "significant"
    # difference
    anova(o.full) # Interaction isn't significant
    anova(o.trans) # Interaction isn't significant
    
    # Not entirely sure the transformation is completely appropriate
    # since the point is to stabilize the variance but it partially
    # depends on sample size which we don't know.  So if the sample
    # sizes are approximately equal it doesn't matter.  But then again
    # all the observations are in a relatively small range anyways so
    # it doesn't really matter... and that's probably why we don't
    # see any big changes in the analysis.
    And the code along with the output
    Code: 
    > dat <- data.frame(drug = rep(c("A","B","C"), each = 4),
    +                   year = rep(2007:2010, 3),
    +                   vals = c(59,62,61,62,50,49,58,61,67,70,69,83))
    > # I'd rather work with smaller numbers for the predictor
    > dat$time = dat$year - 2007
    > # Apply the arcsin-squareroot transformation
    > dat$transvals <- asin(sqrt(dat$vals/100))
    > 
    > # Plotting to see what the data looks like
    > library(ggplot2)
    > # Plot of actual data
    > qplot(time, vals, colour = drug, data = dat, geom = "line")
    > # Plot of transformed data
    > qplot(time, transvals, colour = drug, data = dat, geom = "line")
    > 
    > # Fit a line for each drug (actual data)
    > o.full <- lm(vals ~ drug + time + drug:time, data = dat)
    > # Fit a line for each drug (transformed data)
    > o.trans <- lm(transvals ~ drug + time + drug:time, data = dat)
    > 
    > # Check the interaction term to see if there is a "significant"
    > # difference
    > anova(o.full) # Interaction isn't significant
    Analysis of Variance Table
    
    Response: vals
              Df Sum Sq Mean Sq F value    Pr(>F)    
    drug       2 645.17  322.58 28.5052 0.0008634 ***
    time       1 156.82  156.82 13.8571 0.0098231 ** 
    drug:time  2  45.03   22.52  1.9897 0.2173416    
    Residuals  6  67.90   11.32                      
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
    > anova(o.trans) # Interaction isn't significant
    Analysis of Variance Table
    
    Response: transvals
              Df   Sum Sq  Mean Sq F value   Pr(>F)   
    drug       2 0.073098 0.036549 24.8271 0.001253 **
    time       1 0.018545 0.018545 12.5975 0.012082 * 
    drug:time  2 0.005863 0.002932  1.9914 0.217123   
    Residuals  6 0.008833 0.001472                    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
    > 
    > # Not entirely sure the transformation is completely appropriate
    > # since the point is to stabilize the variance but it partially
    > # depends on sample size which we don't know.  So if the sample
    > # sizes are approximately equal it doesn't matter.  But then again
    > # all the observations are in a relatively small range anyways so
    > # it doesn't really matter... and that's probably why we don't
    > # see any big changes in the analysis.

  10. #8
    d21e7x11

    Originally posted by shanehall.m:
    We can construct a line graph with both the different drugs having different lines, but we don't know of a way to compare them.
    Actually, repeated measurements analysis would apply if outcomes at different time points were obtained from the same "entity". If you can assume independence then you could fit regression lines within each group and obtain estimates of the slope and their standard errors. Then I think it will be possible to test if there is a difference between the slopes since you'll have an estimate and a standard error.
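
    A minimal sketch of that slope comparison in R, assuming the yearly values are independent. The object names (dat, slopes, diff_AC and so on) are made up for illustration, and the Welch-Satterthwaite degrees of freedom are one possible choice of reference distribution, not the only one.

```r
# Percentages from the first post, with time as years since 2007
dat <- data.frame(drug = rep(c("A", "B", "C"), each = 4),
                  time = rep(0:3, 3),
                  vals = c(59, 62, 61, 62, 50, 49, 58, 61, 67, 70, 69, 83))

# Fit a separate line to each drug; keep the slope and its standard error
slopes <- t(sapply(split(dat, dat$drug), function(d) {
  coef(summary(lm(vals ~ time, data = d)))["time", c("Estimate", "Std. Error")]
}))
print(slopes)

# Compare two slopes (here A and C) using their separate standard errors
diff_AC <- slopes["C", "Estimate"] - slopes["A", "Estimate"]
se_AC   <- sqrt(slopes["A", "Std. Error"]^2 + slopes["C", "Std. Error"]^2)
t_AC    <- diff_AC / se_AC

# Each line has only 2 residual df, so use an approximate
# Welch-Satterthwaite df for the t reference distribution
df_AC <- se_AC^4 / (slopes["A", "Std. Error"]^4 / 2 +
                    slopes["C", "Std. Error"]^4 / 2)
p_AC  <- 2 * pt(-abs(t_AC), df = df_AC)
print(c(t = t_AC, df = df_AC, p = p_AC))
```

    With four points per line the standard errors are themselves very poorly estimated, so this is better read as a description of the slopes than as a firm test.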

    Originally posted by shanehall.m:
    Are you suggesting the arcsine transformation as a more precise way of treating the percentages as true continuous numbers?
    Yes, that's right.

  11. #9
    d21e7x11

    In SAS:

    Code: 
    data test;
    input Drug $ t2007 t2008 t2009 t2010;
    datalines;
       A   59   62   61   62
       B   50   49   58   61
       C   67   70   69   83
    ;
    run;
    
    data test7; set test(keep=drug t2007 rename=(t2007=outc)); time=2007; 
    data test8; set test(keep=drug t2008 rename=(t2008=outc)); time=2008;
    data test9; set test(keep=drug t2009 rename=(t2009=outc)); time=2009;
    data test10; set test(keep=drug t2010 rename=(t2010=outc)); time=2010;
    
    data test2; set test7 test8 test9 test10;
     troutc=arsin(sqrt(outc*0.01));
    run;
    
    proc glm data=test2; /*untransformed outcome, time categorical*/
       class drug time;
       model outc=time drug;
    run;
    
    proc glm data=test2; /*transformed outcome, time categorical*/
       class drug time;
       model troutc=time drug;
    run;
    Output (ANOVA table), untransformed outcome, time categorical:
    Code: 
    Source                      DF     Type III SS     Mean Square    F Value    Pr > F
    
    time                         3     172.2500000      57.4166667       3.53    0.0881
    Drug                         2     645.1666667     322.5833333      19.85    0.0023
    Output (ANOVA table), transformed outcome, time categorical:
    Code: 
    Source                      DF     Type III SS     Mean Square    F Value    Pr > F
    
    time                         3      0.02074889      0.00691630       3.32    0.0983
    Drug                         2      0.07309807      0.03654903      17.55    0.0031

    Dason, I don't think we can test the time*drug interaction with these data. We have a single observation in each time*drug cell, so there is simply no error term left to test the interaction against.
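
    The degrees-of-freedom argument can be checked with a short R sketch (the object names dat, sat and lin are made up for illustration). With time as a categorical factor the interaction model has one parameter per cell; with time as a number the interaction costs only two parameters.

```r
dat <- data.frame(drug = rep(c("A", "B", "C"), each = 4),
                  year = rep(2007:2010, 3),
                  vals = c(59, 62, 61, 62, 50, 49, 58, 61, 67, 70, 69, 83))
dat$time <- dat$year - 2007

# Time categorical: 3 drugs x 4 years = 12 cells, one observation in each
sat <- lm(vals ~ drug * factor(year), data = dat)
print(df.residual(sat))   # 0: nothing left over to estimate the error

# Time continuous: 6 parameters for 12 observations
lin <- lm(vals ~ drug * time, data = dat)
print(df.residual(lin))   # 6
```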
    Last edited by d21e7x11; 10-12-2011 at 12:17 PM.

  12. The Following User Says Thank You to d21e7x11 For This Useful Post:

    shanehall.m (10-13-2011)

  13. #10
    Dason

    If we look for a linear trend and treat time as continuous then we can look for an interaction. It's not ideal but it's probably the best we could do with this dataset.

  14. The Following User Says Thank You to Dason For This Useful Post:

    shanehall.m (10-12-2011)

  15. #11
    d21e7x11

    Yes, that's true. I'm not proficient enough in R, so I couldn't tell from your code whether time was continuous or categorical.

    Here is what I got with time continuous and the time*drug interaction included:

    SAS code with time continuous:
    Code: 
    proc glm data=test2; /*untransformed outcome, time continuous*/
       class drug;
       model outc=time drug time*drug;
    run;
    
    proc glm data=test2; /*transformed outcome, time continuous*/
       class drug;
       model troutc=time drug time*drug;
    run;
    Output - untransformed outcome:
    Code: 
    Source                      DF     Type III SS     Mean Square    F Value    Pr > F
    
    time                         1     156.8166667     156.8166667      13.86    0.0098
    Drug                         2      44.9826521      22.4913261       1.99    0.2176
    time*Drug                    2      45.0333333      22.5166666       1.99    0.2173
    Output - transformed outcome:
    Code: 
    Source                      DF     Type III SS     Mean Square    F Value    Pr > F
    
    time                         1      0.01854527      0.01854527      12.60    0.0121
    Drug                         2      0.00585338      0.00292669       1.99    0.2176
    time*Drug                    2      0.00586314      0.00293157       1.99    0.2171
    Last edited by d21e7x11; 10-12-2011 at 12:25 PM.

  16. The Following User Says Thank You to d21e7x11 For This Useful Post:

    shanehall.m (10-12-2011)

  17. #12
    Link

    In case you guys are interested, I became more curious and set up a model assuming the same growth over time in all three drugs (editing Dason's code):

    Code: 
    1> o.full <- lm(vals ~ drug + time, data = dat)
    
    1> summary(o.full)
    
    Call:
    lm(formula = vals ~ drug + time, data = dat)
    
    Residuals:
       Min     1Q Median     3Q    Max 
    -4.867 -2.175 -0.025  2.067  5.900 
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  56.1500     2.3763  23.629 1.09e-08 ***
    drugB        -6.5000     2.6568  -2.447  0.04015 *  
    drugC        11.2500     2.6568   4.234  0.00286 ** 
    time          3.2333     0.9701   3.333  0.01034 *  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
    
    Residual standard error: 3.757 on 8 degrees of freedom
    Multiple R-squared: 0.8766,	Adjusted R-squared: 0.8303 
    F-statistic: 18.94 on 3 and 8 DF,  p-value: 0.0005423


    Looks like there may be enough evidence to say that there IS an overall growth over time. I still feel like we have too few observations though.

  18. The Following User Says Thank You to Link For This Useful Post:

    shanehall.m (10-12-2011)

  19. #13

    d21e7x11, in your SAS code, should that be model troutc instead of model outc, or does the step before it mean that whenever data=test2 is used, the transformed data are used? Thanks, everyone, for your help! What I am getting out of this is to apply the arcsine transformation to the proportions, then fit a line for each drug along with an ANOVA model to check for an interaction. But I guess the main point is that, with this type of data, we need more than 4 years of data.

  20. #14
    Dason

    Do you have any idea if the sample sizes for each data point are similar? Or even what the relative size of the sample sizes is?

  21. The Following User Says Thank You to Dason For This Useful Post:

    shanehall.m (10-12-2011)

  22. #15

    Drug A would have a sample size > 200,000
    Drug B would have a sample size > 7,000
    Drug C would have a sample size > 600

    That is as accurate as I can get.
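
    To get a feel for what these sizes imply, here is a rough R sketch of the binomial standard error of each yearly percentage. It treats the lower bounds above as if they were the actual sample size in every year, which is purely an assumption; the names n_min, pct and se_pct are illustrative.

```r
# Lower bounds on the sample sizes, used as if they were the real n
# for every year (an assumption, not known values)
n_min <- c(A = 200000, B = 7000, C = 600)

# Percentages from the first post, one row per drug
pct <- rbind(A = c(59, 62, 61, 62),
             B = c(50, 49, 58, 61),
             C = c(67, 70, 69, 83))
colnames(pct) <- 2007:2010
p <- pct / 100

# Binomial standard error of each percentage, in percentage points.
# n_min has one entry per row, so it is recycled down each column.
se_pct <- 100 * sqrt(p * (1 - p) / n_min)
print(round(se_pct, 2))
```

    Under that assumption the standard errors come out at roughly 0.1 percentage points for drug A, about 0.6 for drug B and about 1.5 to 2 for drug C, so the three series are nowhere near equally precise and an unweighted fit treats them as if they were.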
