
Thread: Testing arbitrary contrasts based on summary statistics

  1. #1
    Jake (Cookie Scientist), Austin, TX

    Testing arbitrary contrasts based on summary statistics




    Following the discussion in this recent thread (Getting F statistic from multiple t-statistics?) and spunky's request therein, here is a neat trick one can use to test arbitrary contrasts for ANOVA models, and even multiple-degree-of-freedom tests, using only basic summary statistics of the kind that would be reported in a manuscript, without needing access to the raw data.

    Setup

    We need the following three pieces of information to do these tricks:
    1. the cell means
    2. the sample sizes per cell
    3. at least one F-ratio (or t statistic) for any single contrast from the full ANOVA model

    So let's say we have a manuscript on our desk in which the authors conducted a 2 × 2 factorial ANOVA, with factors A (having levels A_{+1} and A_{-1}) and B (having levels B_{+1} and B_{-1}). They give a table of means and sample sizes by cell, but for whatever reason only report the test of the interaction. So we have the following information:
    Code: 
    > # cell means
    > round(tapply(dat$y, list(A = dat$A,B = dat$B), mean), 2)
        B
    A        -1      1
      -1 103.54  99.28
      1   99.59 102.61
    > 
    > # cell sample sizes
    > table(A = dat$A,B =  dat$B)
        B
    A    -1  1
      -1 18 23
      1  23 16
    > 
    > # t statistic for interaction contrast
    > summary(lm(y ~ A + B + AB, data=dat))$coef["AB",]
       Estimate  Std. Error     t value    Pr(>|t|) 
    1.819051097 0.570385136 3.189162871 0.002073302

    Case 1: single degree of freedom tests

    Perhaps as curious readers we are interested in knowing whether the A_{+1}, B_{+1} cell differs from the average of the other three cells. In other words, we want to test the contrast labeled "new" below:
    Code: 
    > cbind(contr, new = c(3,-1,-1,-1))
          A  B AB new
    [1,]  1  1  1   3
    [2,]  1 -1 -1  -1
    [3,] -1  1 -1  -1
    [4,] -1 -1  1  -1
    Using the standard formula for comparing nested models, the F-ratio can be computed as
    F = \frac{MSR}{MSE} = \frac{SSR/(p_{large} - p_{small})}{SSE/(N - p_{large})}
    where p_{large} and p_{small} are the numbers of parameters in the full model and the nested model, respectively; and N is the total sample size.
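
    For readers who want to follow along in R, the formula above can be wrapped in a small helper. This is a minimal sketch: the name `model_comparison_F` is hypothetical, and the p-value comes from the upper tail of base R's `pf()`. The example plugs in the Case 1 quantities worked out below.

```r
# Hypothetical helper for the nested-model F-ratio:
# F = [SSR / (p_large - p_small)] / [SSE / (N - p_large)]
model_comparison_F <- function(SSR, SSE, p_large, p_small, N) {
  df1 <- p_large - p_small   # numerator df: extra parameters in the large model
  df2 <- N - p_large         # denominator df: residual df of the large model
  f_stat <- (SSR / df1) / (SSE / df2)
  p_val  <- pf(f_stat, df1, df2, lower.tail = FALSE)  # upper-tail p-value
  c(F = f_stat, df1 = df1, df2 = df2, p = p_val)
}

# Case 1 quantities: SSR for "new", recovered SSE, 4 cells, N = 80
res <- model_comparison_F(SSR = 41.67, SSE = 1931.84, p_large = 4, p_small = 3, N = 80)
round(res, 4)
```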

    So the only two missing quantities here are SSR and SSE. If we can get those we can compute the desired F-ratio.

    Given a particular contrast \lambda,
    SSR_{\lambda} = \frac{(\sum_{j=1}^J \lambda_j \bar{Y_j})^2}{\sum_{j=1}^J (\lambda_j^2/n_j)}
    where \lambda_j is the contrast weight for group j, \bar{Y_j} is the mean for group j, J is the number of groups, and n_j is the number of observations in group j.
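
    The contrast SSR formula translates directly into R. A minimal sketch (the function name `contrast_ssr` is hypothetical), with the cells ordered as in the contrast matrix above, i.e. (A+1,B+1), (A+1,B-1), (A-1,B+1), (A-1,B-1):

```r
# SSR for a single contrast from cell means and cell sizes:
# (sum_j lambda_j * Ybar_j)^2 / sum_j (lambda_j^2 / n_j)
contrast_ssr <- function(lambda, means, n) {
  stopifnot(length(lambda) == length(means), length(means) == length(n))
  sum(lambda * means)^2 / sum(lambda^2 / n)
}

# Cells ordered (A+1,B+1), (A+1,B-1), (A-1,B+1), (A-1,B-1)
means <- c(102.61, 99.59, 99.28, 103.54)
n     <- c(16, 23, 23, 18)

contrast_ssr(c(3, -1, -1, -1), means, n)  # "new" contrast, about 41.67
contrast_ssr(c(1, -1, -1,  1), means, n)  # interaction (AB) contrast, about 258.51
```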

    So in this data we have
    SSR_{new} = \frac{[3(102.61) - 99.59 - 99.28 - 103.54]^2}{9/16 + 1/23 + 1/23 + 1/18} = 41.67.

    Now we need to get SSE. To do this, we can use the same formula to compute SSR for a contrast for which we already know F, and then rearrange the F-ratio formula to solve for SSE.

    So for the known interaction contrast we have
    SSR_{interaction} = \frac{(102.61 - 99.59 - 99.28 + 103.54)^2}{1/16 + 1/23 + 1/23 + 1/18} = 258.51.

    Solving the F formula for SSE gives
    SSE = \frac{SSR(N - p_{large})}{F(p_{large} - p_{small})}

    Since F = t^2 for a single-degree-of-freedom test, F = 3.189^2 = 10.17, and we can plug in the numbers to get
    SSE = \frac{258.51(80 - 4)}{10.17(4 - 3)} = 1931.84
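
    The same rearrangement in R, as a sketch that recovers SSE from the reported interaction t. Because t is squared here without first rounding it to 10.17, the result comes out slightly below 1931.84 (about 1931.7); the variable names are illustrative only.

```r
# Recover SSE (full-model residual sum of squares) from one known single-df
# test: rearranging F = (SSR/df1) / (SSE/df2) gives SSE = SSR * df2 / (F * df1)
means <- c(102.61, 99.59, 99.28, 103.54)  # (A+1,B+1), (A+1,B-1), (A-1,B+1), (A-1,B-1)
n     <- c(16, 23, 23, 18)
N       <- sum(n)  # 80
p_large <- 4       # full cell-means model
p_small <- 3       # full model minus the tested contrast

lambda_AB <- c(1, -1, -1, 1)
ssr_AB <- sum(lambda_AB * means)^2 / sum(lambda_AB^2 / n)

t_AB <- 3.189162871  # reported t for the interaction
F_AB <- t_AB^2       # a single-df F equals t squared

sse <- ssr_AB * (N - p_large) / (F_AB * (p_large - p_small))
sse
```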

    So that finally we have
    F_{new} = \frac{41.67/(4 - 3)}{1931.84/(80 - 4)} = 1.64
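
    Putting the steps together, here is a hypothetical wrapper (`test_contrast_from_summary` is not from any package) that takes cell means, cell sizes, one known single-df F, and the weights of the known and new contrasts. It assumes the known F comes from the full cell-means model, so the error df is N minus the number of cells.

```r
# Hypothetical wrapper: single-df contrast test from summary statistics only
test_contrast_from_summary <- function(lambda_new, lambda_known, F_known, means, n) {
  # SSR for one contrast: (sum lambda_j * Ybar_j)^2 / sum(lambda_j^2 / n_j)
  ssr <- function(lambda) sum(lambda * means)^2 / sum(lambda^2 / n)
  df_error <- sum(n) - length(means)  # cell-means model: one parameter per cell
  # Recover SSE from the known single-df test: SSE = SSR_known * df_error / F_known
  sse <- ssr(lambda_known) * df_error / F_known
  ssr_new <- ssr(lambda_new)
  f_new <- ssr_new / (sse / df_error)  # numerator df = 1
  c(SSR = ssr_new, SSE = sse, F = f_new,
    p = pf(f_new, 1, df_error, lower.tail = FALSE))
}

res <- test_contrast_from_summary(
  lambda_new   = c(3, -1, -1, -1),
  lambda_known = c(1, -1, -1, 1),
  F_known      = 3.189162871^2,
  means        = c(102.61, 99.59, 99.28, 103.54),
  n            = c(16, 23, 23, 18)
)
round(res, 3)
```

    Since SSE depends only on the full model, the same recovered value can be reused for any number of new single-df contrasts.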

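    And we can check our work by running anova() on the dataset after recoding the factors into the new contrast plus two other contrast codes (other1 and other2) that, together with new, make up the full model; because new is entered last, its sum of squares is adjusted for the other two: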
    Code: 
    > anova(lm(y ~ other1 + other2 + new, data=dat))
    Analysis of Variance Table
    
    Response: y
              Df  Sum Sq Mean Sq F value   Pr(>F)   
    other1     1   38.06  38.056  1.4988 0.224639   
    other2     1  191.60 191.604  7.5462 0.007505 **
    new        1   41.50  41.496  1.6343 0.205002   
    Residuals 76 1929.70  25.391                    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    Aside from some minimal rounding error, we have it.

    Case 2: multiple degree of freedom tests

    Given the same data as above, suppose now that for some reason we wish to see the 2-degree-of-freedom test comparing the full ANOVA model (factors A, B, and their interaction) to a model that only includes factor A.

    We already saw above how to solve for SSE (and in fact we have already computed it: we are using the same full model, so SSE will be the same). However, here we will get SSR in a slightly different way, using
    SSR = \sum_{i=1}^N(\hat{Y}_{iSmall} - \hat{Y}_{iLarge})^2
    where \hat{Y}_{iSmall} is the predicted value for observation i under the smaller or reduced model, \hat{Y}_{iLarge} is the predicted value for observation i under the larger or full model, and N is the number of observations. Essentially, we are treating the predicted values from the more complex model as the data to be predicted and then computing the sum of squared errors in the normal fashion.

    In the ANOVA case, this formula can be written more simply as
    SSR = \sum_{j=1}^Jn_j(\hat{Y}_{jSmall} - \hat{Y}_{jLarge})^2
    where n_j is the number of observations in group j, \hat{Y}_{jSmall} is the predicted value for group j under the smaller or reduced model, \hat{Y}_{jLarge} is the predicted value for group j under the larger or full model, and J is the number of groups.
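
    A quick numerical check, sketched in R, that the observation-level and group-level SSR formulas agree for the A-only versus full-model comparison used in this case. It assumes, as the thread does, that the small model's prediction for each cell is the cell-size-weighted mean of the cells at the same level of A; `ave()` is base R.

```r
cells <- data.frame(
  A    = c(-1, -1,  1,  1),
  mean = c(103.54, 99.28, 99.59, 102.61),
  n    = c(18, 23, 23, 16)
)
# Predicted value per cell under each model
yhat_large <- cells$mean                             # full model: the cell means
yhat_small <- ave(cells$mean * cells$n, cells$A, FUN = sum) /
              ave(cells$n, cells$A, FUN = sum)       # A-only model: weighted marginal means

# Observation-level form: repeat each cell's predictions n_j times
ssr_obs <- sum((rep(yhat_small, cells$n) - rep(yhat_large, cells$n))^2)
# Group-level form: weight each cell's squared difference by n_j
ssr_grp <- sum(cells$n * (yhat_small - yhat_large)^2)

c(observation_level = ssr_obs, group_level = ssr_grp)
```

    The two sums are identical because every observation within a cell shares the same pair of predicted values.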

    The predicted values from the Large model are straightforward: they are the group means. For the Small model there are only two distinct predicted values, one for A_{+1} and one for A_{-1}; each is the average of the two cell means at that level of A (i.e., collapsing across the B factor), weighted by cell size.

    For A_{+1}:
    \hat{Y}_{A_{+1}Small} = \frac{[23(99.59) + 16(102.61)]}{23 + 16} = 100.83

    For A_{-1}:
    \hat{Y}_{A_{-1}Small} = \frac{[18(103.54) + 23(99.28)]}{18 + 23} = 101.15
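
    The two weighted averages can be reproduced with base R's `weighted.mean()`; a sketch, with the cell table typed in by hand from the summary above:

```r
# Cell summaries, one row per cell of the 2 x 2 design
cells <- data.frame(
  A    = c(-1, -1,  1,  1),
  B    = c(-1,  1, -1,  1),
  mean = c(103.54, 99.28, 99.59, 102.61),
  n    = c(18, 23, 23, 16)
)

# Small (A-only) model: one predicted value per level of A, namely the
# average of that level's cell means weighted by cell size
pred_small <- sapply(split(cells, cells$A),
                     function(d) weighted.mean(d$mean, d$n))
round(pred_small, 2)
```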

    So using the simplified SSR formula, we have
    SSR = 18(101.15 - 103.54)^2 + 23(101.15 - 99.28)^2
    + 23(100.83 - 99.59)^2 + 16(100.83 - 102.61)^2 = 269.31

    Which makes our F-ratio
    F = \frac{269.31/(4 - 2)}{1931.84/(80 - 4)} = 5.30
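
    The 2-df F-ratio and its p-value in R, using the rounded SSR and SSE from above (a sketch; `pf()` with `lower.tail = FALSE` gives the upper-tail probability). The result should agree with the anova() output below up to rounding.

```r
# 2-df test of the full model against the A-only model
ssr     <- 269.31   # SSR from the weighted-difference formula
sse     <- 1931.84  # SSE recovered in Case 1 (same full model)
N       <- 80
p_large <- 4        # full model: intercept, A, B, AB
p_small <- 2        # reduced model: intercept, A
df1 <- p_large - p_small
df2 <- N - p_large
f_stat <- (ssr / df1) / (sse / df2)
p_val  <- pf(f_stat, df1, df2, lower.tail = FALSE)
c(F = f_stat, df1 = df1, df2 = df2, p = p_val)
```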

    Checking our work:
    Code: 
    > anova(lm(y ~ A, data=dat),
    +       lm(y ~ A + B + AB, data=dat))
    Analysis of Variance Table
    
    Model 1: y ~ A
    Model 2: y ~ A + B + AB
      Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
    1     78 2198.8                                
    2     76 1929.7  2    269.06 5.2983 0.007013 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    And again we have it, save for minimal rounding error.
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  2. The Following 2 Users Say Thank You to Jake For This Useful Post:

    CowboyBear (10-10-2016), spunky (07-24-2012)

  3. #2
    spunky (TS Contributor), Vancouver, Canada

    Re: Testing arbitrary contrasts based on summary statistics

    oh w-o-w!!! thanks Jake!!! instant subscription to this thread for future references now. i'm gonna have to start adding your posts in my reference sections, heh...
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  4. #3
    Jake (Cookie Scientist)

    Re: Testing arbitrary contrasts based on summary statistics

    I'm not sure how useful this stuff is when you have the actual dataset in hand, but with these procedures now you can really be the Reviewer From Hell...

  5. #4
    spunky (TS Contributor)

    Re: Testing arbitrary contrasts based on summary statistics


    Quote Originally Posted by Jake
    the Reviewer From Hell...
    mwahahahahahahahahha!!!! > : )
