Chi-squared hypothesis test of homogeneity using R

#1
A survey of drivers was taken to see whether they had been in an accident during the previous year and, if so, whether it was a minor or major accident. The results are tabulated by age group:

\(
\begin{array}{c|lcr}
\text{Age} & \text{None} & \text{Minor} & \text{Major} \\
\hline
\text{Under 18} & 67 & 10 & 5 \\
\text{18-25} & 42 & 6 & 5 \\
\text{26-40} & 75 & 8 & 4 \\
\text{40-65} & 56 & 4 & 6 \\
\text{65+} & 57 & 15 & 1 \\
\end{array}
\)
Do a chi-squared hypothesis test of homogeneity using `R` to see if there is a difference in the distributions based on age, and draw a column-wise normal Q-Q plot of these data by age.

I used the function `chisq.test`, but I do not understand how to interpret its output to decide whether there is a difference in the distributions based on age.
Also, if I were asked to check independence, I would use the same function `chisq.test`. Am I wrong to use `chisq.test` for a chi-squared hypothesis test of "homogeneity"?
 

gianmarco

TS Contributor
#2
Hi!
I am a little confused by your terminology (i.e., "homogeneity"), but since you have been given a cross-tabulation, I guess you are meant to evaluate whether there is an association between the levels (i.e., categories) of the two categorical variables being compared (age class and type of accident). This is where the chi-squared test comes into play.

The test indicates no significant association between age class and type of accident (the p-value of 0.1269 is above the usual 0.05 level):
Code:
Pearson's Chi-squared test
data:  mydata 
X-squared = 12.5862, df = 8, p-value = 0.1269
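For reference, output like the above can be reproduced with a short sketch along these lines. The thread does not show how `mydata` was built, so the matrix construction below is an assumption, as is the 0.05 significance level. For a two-way table of counts, `chisq.test` computes the same Pearson statistic and degrees of freedom whether the question is framed as homogeneity or as independence, so the same call serves both.

```r
# Counts from the table in post #1: rows are age groups, columns are accident types.
# The name `mydata` matches the output above; this construction is an assumption.
mydata <- matrix(
  c(67, 10, 5,
    42,  6, 5,
    75,  8, 4,
    56,  4, 6,
    57, 15, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    age = c("under18", "18-25", "26-40", "40-65", "over65"),
    accident = c("none", "minor", "major")
  )
)

# Pearson's chi-squared test on the 5 x 3 table.
# R warns here that the approximation may be inaccurate,
# because some expected counts are below 5.
result <- chisq.test(mydata)
print(result)

# Decision rule: reject the null hypothesis (same accident distribution
# in every age group) only if the p-value is below the chosen level.
alpha <- 0.05  # assumed significance level
reject <- result$p.value < alpha
```

Here `reject` is `FALSE`, which is what "not significant" means in this context: the data do not give enough evidence that the accident distribution differs between age groups.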
It can be seen that the expected frequencies are not far from the observed ones:
Code:
          none     minor    major
under18 67.46260  9.767313 4.770083
18-25   43.60388  6.313019 3.083102
26-40   71.57618 10.362881 5.060942
40-65   54.29917  7.861496 3.839335
over65  60.05817  8.695291 4.246537
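As a sketch of where these expected frequencies come from: each one is (row total × column total) / grand total, which is what the cell would hold if every age group had the same accident distribution. The object names `counts`, `expected` and `test` below are illustrative, not taken from the thread.

```r
# Observed counts from the table in post #1 (object name is illustrative).
counts <- matrix(
  c(67, 10, 5,
    42,  6, 5,
    75,  8, 4,
    56,  4, 6,
    57, 15, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("under18", "18-25", "26-40", "40-65", "over65"),
    c("none", "minor", "major")
  )
)

# Expected count per cell under the null hypothesis:
# row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
print(expected)

# chisq.test() stores the same values in its `expected` component.
# suppressWarnings() hides the small-expected-count warning.
test <- suppressWarnings(chisq.test(counts))
print(all.equal(expected, test$expected, check.attributes = FALSE))

# The test statistic is the sum of (observed - expected)^2 / expected.
statistic <- sum((counts - expected)^2 / expected)
print(statistic)

# Pearson residuals show which cells deviate most from expectation.
print(round(test$residuals, 2))
```

The residuals are a convenient way to see which cells contribute most to the statistic, even when the overall test is not significant.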
Hope this helps.
Regards,
Gm