Hi,
I hope I'm not asking a simple question, but I'm not exactly a statistician, so I was hoping someone can point me in the right direction.
Let's say I have data on SAT scores, BMI, and 40-yard times of students in Wyoming, New York, and Texas, but the three metrics don't come from the same students. We can assume SAT scores, BMI, and 40-yard times are independent. The attached file shows what the data might look like.
Obviously BMI, SAT, and 40y are on completely different scales, but if necessary we can assume they are each normally distributed.
Now, here is where I start to get vague, and I apologize for not having better terms, but I want to figure out how "similar" states are based on these metrics. If all three metrics are wildly different between two states, the states are not similar, and if all three metrics are similarly distributed, then the states are similar. If SAT scores are similar but 40y times are different, the similarity measure should be somewhere in between.
If someone can point me in the right direction on what kind of analysis I need to use, I would greatly appreciate it.
Thank you in advance.
How about trying a chi-square test to establish a relationship there, in case you treat all your Xs and Y as discrete (categorical)?
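To make the chi-square suggestion concrete, here is a minimal sketch, assuming one metric (SAT) has been binned into low/mid/high bands and counted per state. The counts and the band boundaries are made up for illustration; nothing here comes from the attached file. It computes only the Pearson statistic and its degrees of freedom, not a p-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the band proportions were the same in every state
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = states, columns = SAT bands (low, mid, high)
counts = [
    [20, 50, 30],  # Wyoming
    [30, 40, 30],  # New York
    [25, 45, 30],  # Texas
]
stat, dof = chi_square_statistic(counts)
print(f"chi-square = {stat:.3f} on {dof} degrees of freedom")
```

A larger statistic relative to the degrees of freedom means the states' band proportions differ more. Binning throws away information, so this is only one way to read the suggestion.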
Your data is confusing. Are these averages or individual measurements? Why does the number of observations per variable vary (e.g., Texas has only one SAT score [hard not to make a Texas joke], but other states have more)? The previous reply was suggesting a chi-square test (for comparing two categorical variables), though if we don't know what these data represent, it is hard to propose anything. If they are means, standard deviations would be helpful for calculating t-tests. If these were means, then you may also be able to look at correlations. More information is needed.
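If the values do turn out to be means with standard deviations and sample sizes, a t-test can be computed from those summaries alone. Below is a rough sketch of Welch's unequal-variance version; the function name `welch_t` and the summary numbers are hypothetical, chosen only to show the arithmetic.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, dof

# Hypothetical SAT summaries (mean, sd, n) for two states
t, dof = welch_t(1050, 200, 500, 1020, 210, 800)
print(f"t = {t:.2f}, dof = {dof:.1f}")
```

The same call would be repeated for each metric and each pair of states, since the metrics are not measured on the same students.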
There just doesn't seem to be much data here. Is the dataset larger than this? Comparing a single BMI to three from another state does not make for a meaningful comparison. Typically you would compare measures of central tendency while paying attention to their measures of dispersion (e.g., means with standard deviations). I do not have any direct suggestions with this small dataset, and I wonder about the representativeness of a couple of people from a larger state. If I say that my age is 33 and my daughter's is 1, are our ages different?
Yes, this is only a sample of the data. The actual data is much larger (say 1M records total). I know I could do a goodness of fit test within each category to see if they are significantly different, but I'm not sure how to combine the results from each category to come up with one aggregate metric.
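One possible way to get a single aggregate number, offered as a sketch rather than an established answer: measure how far apart the two states' distributions are on each metric with the two-sample Kolmogorov-Smirnov statistic (the largest gap between the empirical CDFs, always between 0 and 1), then average over the metrics. Because it works on CDFs, it does not care that SAT, BMI, and 40y are on different scales or that the sample sizes differ. The names `ks_distance` and `state_dissimilarity`, the equal weighting, and the tiny data below are all assumptions for illustration.

```python
from bisect import bisect_right

def ks_distance(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def state_dissimilarity(state1, state2):
    """Equal-weight average of per-metric KS distances (0 = identical, 1 = no overlap)."""
    shared = [m for m in state1 if m in state2]
    return sum(ks_distance(state1[m], state2[m]) for m in shared) / len(shared)

# Hypothetical data: same SAT and 40y values, completely different BMI
wyoming = {"sat": [1000, 1100, 1200], "bmi": [22.0, 25.0], "forty": [4.8, 5.1, 5.3]}
texas = {"sat": [1000, 1100, 1200], "bmi": [30.0, 31.0], "forty": [4.8, 5.1, 5.3]}
print(state_dissimilarity(wyoming, texas))
```

In this toy case two metrics match and one does not overlap at all, so the score lands in between, which is the behavior asked for in the original post. With around 1M records, a significance test will tend to flag even tiny differences, so a distance like this may be more informative than a p-value.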