# Thread: Not sure what method to use with this data

1. ## Not sure what method to use with this data

Hi,

I hope I'm not asking a simple question, but I'm not exactly a statistician, so I was hoping someone can point me in the right direction.

Let's say I have data on SAT scores, BMI, and 40-yard times of students in Wyoming, New York, and Texas, but the three metrics are not measured on the same students. We can assume SAT scores, BMI, and 40-yard times are independent. The attached file shows what the data might look like.

Obviously BMI, SAT, and 40y are on completely different scales, but if necessary we can assume they are each normally distributed.

Now, here is where I start to get vague, and I apologize for not having better terms, but I want to figure out how "similar" states are based on these metrics. If all three metrics are wildly different between two states, the states are not similar, and if all three metrics are similarly distributed, then the states are similar. If SAT scores are similar but 40y times are different, the similarity measure should be somewhere in between.

If someone can point me in the right direction on what kind of analysis I need to use, I would greatly appreciate it.

2. ## Re: Not sure what method to use with this data

How about trying a chi-square test to establish a relationship there, in case you treat all your Xs and Y as discrete (categorical)?

3. ## Re: Not sure what method to use with this data

Originally Posted by venkat5557
How about trying a chi-square test to establish a relationship there, in case you treat all your Xs and Y as discrete (categorical)?
I'm sorry, I'm not sure what you mean by that...

4. ## Re: Not sure what method to use with this data

Your data is confusing. Are these averages or individual measurements? Why does the number of observations per variable vary (e.g., Texas only has one SAT score [hard not to make a Texas joke], but other states have more)? The previous reply was asking you about using a chi-square test (for comparing two categorical variables), though if we don't know what these data represent it is hard to make suggestions. If they are means, standard deviations would be helpful in calculating t-tests. If these are means, then you may also be able to look at correlations. More information is needed.

5. ## Re: Not sure what method to use with this data

Originally Posted by hlsmith
Your data is confusing. Are these averages or individual measurements? Why does the number of observations per variable vary (e.g., Texas only has one SAT score [hard not to make a Texas joke], but other states have more)? The previous reply was asking you about using a chi-square test (for comparing two categorical variables), though if we don't know what these data represent it is hard to make suggestions. If they are means, standard deviations would be helpful in calculating t-tests. If these are means, then you may also be able to look at correlations. More information is needed.
They are individual measurements, not means. That explains why the number of observations varies. Suppose the 40y, SAT, and BMI data came from different surveys. Then you would get a varying number of responses for each category (and yes, there is an implied Texas joke).

6. ## Re: Not sure what method to use with this data

There just doesn't seem to be much data here. Is the dataset larger than this? Comparing a single BMI to three from another state does not make for a good comparison. Typically you would compare measures of central tendency while paying attention to their measures of dispersion (e.g., means with standard deviations). I do not have any direct suggestions with this small dataset, and I wonder about the representativeness of a couple of people from a larger state. If I say that my age is 33 and my daughter's is 1, can I conclude that our ages are different?
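
To make "compare means while paying attention to dispersion" concrete, here is a minimal sketch of a standardized mean difference (Cohen's d with a pooled standard deviation). This is one common way to do it, not necessarily what was meant above. The function name and the sample values are made up for illustration, and the sketch assumes each group has at least two observations, which is exactly what a single BMI value cannot provide.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation (Cohen's d)."""
    if len(a) < 2 or len(b) < 2:
        # A standard deviation is undefined for a single observation.
        raise ValueError("each group needs at least two observations")
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical BMI values for two states (illustration only, not real data).
bmi_state_a = [21.0, 24.0, 27.0, 30.0]
bmi_state_b = [23.0, 26.0, 29.0, 32.0]
d = standardized_mean_difference(bmi_state_a, bmi_state_b)
```

Because the result is expressed in standard-deviation units, values for SAT, BMI, and 40y times can be compared with each other even though the raw scales differ.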

7. ## Re: Not sure what method to use with this data

Originally Posted by hlsmith
There just doesn't seem to be much data here. Is the dataset larger than this? Comparing a single BMI to three from another state does not make for a good comparison. Typically you would compare measures of central tendency while paying attention to their measures of dispersion (e.g., means with standard deviations). I do not have any direct suggestions with this small dataset, and I wonder about the representativeness of a couple of people from a larger state. If I say that my age is 33 and my daughter's is 1, can I conclude that our ages are different?
Yes, this is only a sample of the data. The actual data is much larger (say 1M records total). I know I could do a goodness of fit test within each category to see if they are significantly different, but I'm not sure how to combine the results from each category to come up with one aggregate metric.
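
One possible way to combine the per-category comparisons into a single number, sketched under assumptions rather than offered as the established answer: compute a two-sample Kolmogorov-Smirnov distance (the largest gap between the two empirical CDFs) for each metric, then average the distances. Each distance lies between 0 and 1 regardless of the metric's scale, so the average is 0 when all three metrics are identically distributed, near 1 when all differ completely, and somewhere in between otherwise. The function names, the equal weighting, and the sample values are hypothetical. With around 1M records a significance test will tend to flag even tiny differences, which is why this sketch uses the distance itself and not a p-value.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def state_dissimilarity(state_a, state_b):
    """Average KS distance over the metrics both states have (0 = same, 1 = disjoint)."""
    shared = sorted(set(state_a) & set(state_b))
    if not shared:
        raise ValueError("the two states share no metrics")
    # Equal weighting of metrics is an assumption; other weights are possible.
    return sum(ks_distance(state_a[m], state_b[m]) for m in shared) / len(shared)


# Hypothetical samples (illustration only). Sample sizes may differ per metric,
# and the metrics need not come from the same students.
wyoming = {"sat": [980, 1050, 1120, 1200], "bmi": [22.1, 24.8, 27.3], "forty": [4.9, 5.2, 5.6]}
texas = {"sat": [1010, 1090, 1150], "bmi": [23.0, 25.5, 28.9, 31.2], "forty": [4.7, 5.0]}
score = state_dissimilarity(wyoming, texas)
```

If a similarity score is preferred, `1 - score` gives one on the same 0 to 1 scale.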
