# Nominal & Ratio Variables: How to correlate?

#### sjs016

##### New Member
The following is the hypothesis I'm trying to work with:
"Students who do not achieve at least 70% in high school English will not pass a college admissions test."

I was given a dataset with more than 800 high school English grades expressed as percentages. The dataset also includes whether each student "passed" or "failed" the college admissions test. For example, the data looks like this (times 800 cases):
Student 1: 79% English grade; PASS
Student 2: 56% English grade; PASS
Student 3: 80% English grade; FAIL
Student 4: 93% English grade; PASS
Student 5: 60% English grade; FAIL

I just don't know where to even begin with analyzing this hypothesis because the high school English grades are ratio, but the admission test outcomes are nominal. What statistical tests can I run with the provided data to analyze the hypothesis?
Also note: The only software I have to analyze the data is Excel.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Do you have any other variables that may explain the outcome and would bias the results if not included, say school or socio-economic status factors?

What are you going to use these results for?

You likely need to use logistic regression, which is probably not readily available in Excel without finding code online or an add-in package.
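For anyone who wants to see what a logistic regression does with this kind of data, here is a minimal sketch in plain Python (not Excel, and not a substitute for a proper statistics package). The data are a made-up toy sample, not the poster's dataset, and the names `grades`, `passed`, `predict` and `fit_logistic` are purely illustrative. The grade is centred at 70% so the intercept is the log-odds of passing at the cutoff in the hypothesis.

```python
import math

# Hypothetical toy data, NOT the real dataset: grade in percent, 1 = pass, 0 = fail.
grades = [45, 52, 56, 58, 60, 63, 65, 68, 70, 72, 75, 79, 80, 84, 88, 93]
passed = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

def predict(b0, b1, grade):
    """Modelled probability of passing at a given grade (grade centred at 70)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * (grade - 70))))

def fit_logistic(x, y, iterations=25):
    """Fit intercept b0 and slope b1 by Newton-Raphson on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            c = xi - 70
            p = predict(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * c
            h00 += w
            h01 += w * c
            h11 += w * c * c
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(grades, passed)
print("log-odds of passing at 70%:", round(b0, 3))
print("change in log-odds per grade point:", round(b1, 3))
print("modelled P(pass) at 60% and 80%:",
      round(predict(b0, b1, 60), 2), round(predict(b0, b1, 80), 2))
```

A positive slope means the modelled chance of passing rises with the English grade. The sketch gives no standard errors or p-values, so a real analysis would still need software that reports them.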

#### sjs016

##### New Member
Thanks for your reply. Unfortunately, I do not have any other variables to work with. I am working for a college and I have been asked to solve this for them - I am not entirely sure what they are going to do with the results.

So far I have coded the "pass" and "fail" outcomes in Excel as "1" and "0", respectively. This way, I'm able to run a t-test. I'm completely new to statistics though, so I am really not sure if that was appropriate for me to do.

I was also thinking of performing a chi-square test by collapsing the Grade 12 English grades into two categories: "Below 70%" and "70% and Above." I believe that a chi-square test done this way will be valid, but I am just not sure if it will help me to answer the hypothesis.
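To make the collapsing idea concrete, here is a rough sketch in Python (used only for illustration, since the same arithmetic can be done in a spreadsheet). The data are a made-up toy sample, and `two_by_two` and `chi_square_2x2` are hypothetical helper names. It uses the plain Pearson chi-square without a continuity correction. With one degree of freedom the p-value can be obtained from the complementary error function.

```python
import math

# Hypothetical toy data, NOT the real dataset: grade in percent, 1 = pass, 0 = fail.
grades = [45, 52, 56, 58, 60, 63, 65, 68, 70, 72, 75, 79, 80, 84, 88, 93]
passed = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

def two_by_two(x, y, cutoff=70):
    """Counts as (below & fail, below & pass, at/above & fail, at/above & pass)."""
    below_fail = sum(1 for g, p in zip(x, y) if g < cutoff and p == 0)
    below_pass = sum(1 for g, p in zip(x, y) if g < cutoff and p == 1)
    above_fail = sum(1 for g, p in zip(x, y) if g >= cutoff and p == 0)
    above_pass = sum(1 for g, p in zip(x, y) if g >= cutoff and p == 1)
    return below_fail, below_pass, above_fail, above_pass

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

table = two_by_two(grades, passed)
chi2, p_value = chi_square_2x2(*table)
print("table:", table)
print("chi-square:", round(chi2, 3), "p-value:", round(p_value, 4))
```

Note that the test only says whether passing is associated with being above or below 70%. The hypothesis as worded ("will not pass") is absolute, so the count in the "below 70% and passed" cell is worth reporting by itself: any student in that cell is a counterexample. The toy table is also too small for the chi-square approximation to be trustworthy; with 800 cases that is less of a concern.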

#### sjs016

##### New Member
Also, my manager wants me to perform a Pearson's Correlation but I just don't see how that would be possible with the data I was provided with. Am I missing something here?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
No, your manager is missing something. Pearson correlation would be inappropriate and uninformative. The chi-square test may be an option, since treating the percentage as a continuous variable gets wonky because it is bounded.

I would plot all of these data first (in a scatterplot and maybe also a bar graph) and color code the points based on which group they are in. This will drastically help you understand your data. Watch out, though, for dots or markers sitting directly on top of each other; make the fill transparent or add a little jitter to the percentage values. The plot will help you understand any themes that may be going on. Also, I believe there is theory about grading and how instructors bias the marks by letting a lot of people slide by right about the passing threshold.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Yes, I had thought about the correlation measure as well, but questioned whether you would meet all of the assumptions. Make sure you still plot these data to make sure you are not missing something. One issue that will interfere with many of the assumptions is a cluster of scores right near, say, the top bound: a bunch of 97s, 98s, 99s, even 100s. If you calculated the mean and SD in that case, the mean + 2 SD would be nonsense. About 95% of normally distributed data should fall within +/- 2 SDs of the mean, and in a hypothetical example 99 + 2 SD would be well above 100%. So bounded data mess up the normality assumption, and obviously also equal variances between groups if one group has lower scores that are not as close to the top bound.
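A quick numeric illustration of the bounded-scores point, using a handful of made-up grades clustered near the top (these values are purely hypothetical):

```python
import statistics

# Hypothetical grades clustered near the 100% ceiling.
scores = [88, 92, 95, 97, 98, 99, 99, 100]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
upper = mean + 2 * sd

print("mean:", mean)
print("SD:", round(sd, 2))
print("mean + 2 SD:", round(upper, 2))  # lands above 100, an impossible grade
```

The "mean + 2 SD" value falls above the maximum possible grade, which is a sign that a symmetric, normal-style summary does not describe scores piled up against the ceiling.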

Does this make sense?

Another option may be the Wilcoxon Rank Sum test, though it has assumptions as well.
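As a rough sketch of what the rank-sum test computes, the snippet below compares the grades of students who passed with those who failed. The data are made up, and `rank_sum_test` is a hypothetical helper name. It uses the Mann-Whitney U form of the statistic with a large-sample normal approximation, and it assumes no tie correction and no continuity correction, so treat the p-value as approximate.

```python
import math

# Hypothetical toy data, NOT the real dataset.
pass_grades = [56, 65, 70, 75, 79, 84, 88, 93]
fail_grades = [45, 52, 58, 60, 63, 68, 72, 80]

def rank_sum_test(group1, group2):
    """Mann-Whitney U for group1 vs group2 with a two-sided normal approximation."""
    n1, n2 = len(group1), len(group2)
    # U counts the pairs in which the group1 value is larger (ties count one half).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a in group1 for b in group2)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # ignores any tie correction
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p_value

u, z, p_value = rank_sum_test(pass_grades, fail_grades)
print("U:", u, "z:", round(z, 3), "p-value:", round(p_value, 4))
```

U divided by the number of pairs (here 8 x 8 = 64) estimates the probability that a randomly chosen passing student has a higher English grade than a randomly chosen failing student, which is often easier to explain to a manager than the test statistic.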

#### ondansetron

##### TS Contributor
I have now tried a Point-Biserial Correlation and I think it may have worked. My data seem to qualify for Point-Biserial according to this source:
https://statistics.laerd.com/spss-tutorials/point-biserial-correlation-using-spss-statistics.php#assumptions
With one nominal (and dichotomous) variable and one continuous variable, the point-biserial correlation is equivalent to the Pearson correlation computed with the dichotomous variable coded as 0 and 1. As hlsmith said, though, it depends on whether you've sufficiently met the assumptions to use the statistic (or claim equivalence).
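The equivalence can be checked numerically. The sketch below, on made-up data, computes the Pearson correlation with pass/fail coded 1/0 and, separately, the textbook point-biserial formula (difference in group means, divided by the overall population SD, times the square root of the product of the group proportions). The function names are illustrative only.

```python
import math

# Hypothetical toy data, NOT the real dataset: grade in percent, 1 = pass, 0 = fail.
grades = [45, 52, 56, 58, 60, 63, 65, 68, 70, 72, 75, 79, 80, 84, 88, 93]
passed = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(x, y):
    """Point-biserial correlation from group means and the overall population SD."""
    ones = [a for a, b in zip(x, y) if b == 1]
    zeros = [a for a, b in zip(x, y) if b == 0]
    n = len(x)
    mean_all = sum(x) / n
    sd_pop = math.sqrt(sum((a - mean_all) ** 2 for a in x) / n)  # n denominator
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd_pop * math.sqrt(len(ones) * len(zeros) / n ** 2)

r = pearson_r(grades, passed)
r_pb = point_biserial(grades, passed)
print("Pearson r with 0/1 coding:", round(r, 6))
print("point-biserial formula:   ", round(r_pb, 6))
```

The two numbers agree up to floating-point rounding, so in Excel the built-in `CORREL` function applied to the grade column and the 0/1 column should give the point-biserial value directly. Whether that value is a meaningful summary still depends on the assumptions discussed above.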