Statistical analysis between two different data sets


I am doing research on teaching courses within graduate programs across different majors.

I looked at several things in this research, and the one I'm stuck on is whether there is a correlation between teaching as part of the core curriculum and teaching as part of the schools' mission statements.

I looked at the top 50 schools for each major and recorded the percentage of these programs that had teaching as part of the core curriculum. That gives five data sets (one per major) of 50 programs each, for a total of 250 data points. These are reported as percentages.

I then looked for teaching keywords (codes) in the mission statements. I chose five keywords and compiled the absolute number of these teaching codes found for each program. Since each of the 50 schools for each major could contain all five keywords, each major (math, physics, etc.) had up to 250 possible codes, for a total of 1250 possible data points. These were reported as raw counts, not percentages. I put them together in a chart, attached here.
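To make the two data sets described above concrete, here is a minimal sketch of how they might be laid out with one record per program. It assumes each program can be stored with a yes/no flag for teaching in the core curriculum and a count (0 to 5) of the keywords found in its mission statement. The `Program` class, the `summarize` helper, and the sample rows are hypothetical placeholders, not the actual data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Program:
    major: str
    school: str
    teaching_in_core: bool  # True if a teaching course is in the core curriculum
    keyword_count: int      # how many of the 5 teaching keywords appear (0-5)


# Placeholder rows for illustration only; the real data would have 250 rows.
programs = [
    Program("math", "School A", True, 3),
    Program("math", "School B", False, 1),
    Program("physics", "School A", True, 4),
    Program("physics", "School C", False, 0),
]


def summarize(rows):
    """Return {major: (percent with teaching in core, total keyword count)}."""
    by_major = defaultdict(list)
    for row in rows:
        by_major[row.major].append(row)
    summary = {}
    for major, group in by_major.items():
        pct_core = 100.0 * sum(r.teaching_in_core for r in group) / len(group)
        total_keywords = sum(r.keyword_count for r in group)
        summary[major] = (pct_core, total_keywords)
    return summary


print(summarize(programs))
```

Keeping one row per program preserves the pairing between the two variables. The per-major percentage and the per-major keyword total are then just two summaries of the same 250 rows.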

There appears to be a clear correlation between the two, but I don't know which statistical test to use to support it with a p-value rather than the eyeball test. What I'm specifically stuck on is that each variable comes from a different data set.
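One approach that might apply is to correlate a 0/1 core-curriculum flag with the keyword count for each program and get a p-value from a permutation test. This assumes the two data sets can be matched program by program, since both describe the same 250 programs. With a binary variable, the Pearson correlation computed here is the point-biserial correlation. The function names and the sample values below are hypothetical placeholders, and this is a sketch under those assumptions rather than a recommendation of the right test.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; assumes neither x nor y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between x and y
        if abs(pearson_r(x, shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Placeholder values for illustration only, one entry per program.
core_flags = [1, 1, 1, 0, 0, 0, 1, 0]      # teaching in core curriculum?
keyword_counts = [4, 3, 5, 1, 0, 2, 3, 1]  # keywords found in mission statement

r, p = permutation_p_value(core_flags, keyword_counts)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

Correlating the five per-major summaries instead would leave only five paired points, which gives very little power, so the per-program pairing seems preferable if the records can be matched.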

Thank you!