Say that Rater 1 agrees in 13/19 of the cases, and Rater 2 agrees in 14/19 of the cases. Why is the calculation 13/19 * 14/19? That product is then added to the corresponding product for the non-agreement observations (6/19 * 5/19), computed in the same way, to get the "total agreement by chance."

Seems like something is lurking behind the scenes I am not quite understanding. Are there any sort of basic probability rules I am missing or that are at play here? I am just not sensing how "chance" agreement is found from doing these simple calculations on what was actually observed.

I know the chance agreement is only a part of the final Kappa statistic...it is just the chance calculation that confuses me.
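For reference, the calculation I am describing can be written out as a minimal sketch. It assumes that 13/19 and 14/19 are the proportions of cases each rater put in the same category (say "yes"), and that the two raters are treated as statistically independent, so the probability of both saying "yes" is the product of their individual proportions. The variable names are mine, chosen for illustration only.

```python
from fractions import Fraction

n_cases = 19

# Assumption: these are each rater's marginal "yes" proportions.
rater1_yes = Fraction(13, n_cases)
rater2_yes = Fraction(14, n_cases)

# The complementary "no" proportions (6/19 and 5/19).
rater1_no = 1 - rater1_yes
rater2_no = 1 - rater2_yes

# If the raters were independent, P(both yes) = P(yes1) * P(yes2),
# and likewise for "no". These are the two ways to agree by chance.
chance_both_yes = rater1_yes * rater2_yes
chance_both_no = rater1_no * rater2_no

# The two events are mutually exclusive, so their probabilities add.
chance_agreement = chance_both_yes + chance_both_no

print(chance_agreement, float(chance_agreement))
```

Under those assumptions, the two rules at play would be the multiplication rule for independent events and the addition rule for mutually exclusive events.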

I have a data set with 3 proposed dichotomous independent variables: sex (male or female), side (left or right), and region (lower or upper). I want to know which are significant / affect my dependent variable: distance.

Apparently, in the medical research community, it is common to use a univariate test to filter out independent variables that are ultimately thrown into a multivariate test.

In order to abide by this, my initial idea was to do a single factor ANOVA as a variable filter and then do a linear regression on the passed variables. However I am getting caught up on a few things...

1) What is the value of a post hoc test in this case? Looking at each category, it is clear that their individual distance distributions are not normal (I mean.. they are roughly normal, but are definitely left skewed).

2) If a post hoc test is necessary.. which would I use? Wilcoxon seems to be the most appropriate non-parametric test for such data, but how "non-normal" does the data really have to be?

3) Assuming two variables are passed as being potentially significant and I can throw them into a multivariate test.. should I just use a simple linear regression?

4) As a bonus: is this a simple task in Excel? :)

Thanks to whomever answers. The overlap in statistical methods and general concepts has made me overthink this!

5) My understanding is that ANOVA is just a less powerful, but more general, form of regression analysis. Why am I even supposed to use it?!
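To make point 5 concrete, here is a minimal sketch comparing a single-factor ANOVA on one dichotomous variable with a linear regression on the same variable coded as 0/1. The distances are made-up illustrative numbers, not my real data, and the helper names `anova_f` and `regression_f` are hypothetical. It only covers one two-level factor at a time.

```python
from statistics import mean

# Hypothetical distances for the two levels of one factor (e.g. side).
left = [4.1, 5.0, 5.6, 6.2, 6.8]
right = [5.9, 6.4, 7.1, 7.7, 8.3]


def anova_f(groups):
    """F statistic of a single-factor ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def regression_f(x, y):
    """F statistic of a least-squares regression of y on one predictor x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_regression = sum((yi - y_bar) ** 2 for yi in y) - ss_residual
    return ss_regression / (ss_residual / (len(y) - 2))


# Code the factor as a 0/1 dummy variable for the regression.
x = [0] * len(left) + [1] * len(right)
y = left + right

f_anova = anova_f([left, right])
f_reg = regression_f(x, y)
print(f_anova, f_reg)
```

In this two-level case the two F statistics come out the same up to floating-point error, which is part of why I am unsure what the ANOVA filter adds.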

I've been working on this problem for a couple hours now, and I'm stumped.

Essentially what we have are 4 groups of 25 people each. The groups are formed around a sales goal for each group ($10k, $20k, $30k, $40k). We don't know the individual scores within those groups, but we have 4 group means (15.5, 16.2, 17.9, 20). We've also been given a Mean Square Error (40). The question asks for the correlation between the individual scores in the groups and the sales goals of the groups. Originally my thinking was to multiply each group mean by 25 to get a group total, then find the correlation (using R), but that gives me the same answer as the correlation between the means and the sales goals (which the question specifically says is not what is wanted).

Would very much appreciate any thoughts. Thanks.
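One direction I have been considering, written as a rough sketch rather than a confirmed solution: rebuild the total variation in the individual scores from the group means and the MSE. This assumes the MSE is the within-group mean square from a one-way ANOVA with N - k = 96 degrees of freedom, so that the within-group sum of squares is MSE * 96. All variable names are my own.

```python
from math import sqrt

n_per_group = 25
goals = [10, 20, 30, 40]          # sales goals in $k
means = [15.5, 16.2, 17.9, 20.0]  # group means of the scores
mse = 40.0                        # assumed: within-group mean square

k = len(goals)
n_total = n_per_group * k

goal_bar = sum(goals) / k
score_bar = sum(means) / k  # equal group sizes, so this is the grand mean

# Every person in a group shares that group's goal, so the cross-product
# sum over individuals depends only on the group means.
s_xy = n_per_group * sum((g - goal_bar) * (m - score_bar)
                         for g, m in zip(goals, means))
s_xx = n_per_group * sum((g - goal_bar) ** 2 for g in goals)

# Total variation in scores = between-group part + within-group part.
ss_between = n_per_group * sum((m - score_bar) ** 2 for m in means)
ss_within = mse * (n_total - k)  # assumption: df_within = N - k
s_yy = ss_between + ss_within

r_individual = s_xy / sqrt(s_xx * s_yy)

# For comparison: the correlation using only the four group means.
r_means = s_xy / sqrt(s_xx * ss_between)

print(r_individual, r_means)
```

If the assumption about the MSE holds, the individual-level correlation comes out much smaller than the correlation between the means and the goals, because the within-group variation only enters the denominator.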