# Odds ratio calculation

#### starpen

##### New Member
Hey guys.

I've been doing some work during the holidays and there's one thing I've been having trouble with.

I'm doing a personal study comparing social survey answers between two different countries to make a comparative analysis of religion's effect on education. To compare, I need to do some odds ratio calculations, but I'm not sure exactly how to do it.

Basically, my data is shaped like this:

Country 1:

Low level of education - Yes to specific question (11% yes)
Mid level of education - Yes to specific question (3% yes)
High level of education - Yes to specific question (2% yes)

Country 2:
Low level of education - Yes to specific question (60% yes)
Mid level of education - Yes to specific question (46% yes)
High level of education - Yes to specific question (37% yes)

I get that the formula is:
(p/(1- p))/(q/(1-q))

but how do I get the total odds ratio for country one to compare to country two? Do I use the total span and just use low education / high education?

#### trinker

##### ggplot2orBust
I don't think you can simply add the percents together to get a new whole. Hopefully I'm wrong on this. Do you have the original raw values (n)?

#### victorxstc

##### Pirate
I think you have to merge the "moderate education" group into one of the "low" or "high" education groups (in both countries) [so you will have two groups in each country], in order to be able to apply the formula of odds ratio.

#### trinker

##### ggplot2orBust
That's what the poster is asking to do, but they have percentages rather than raw data.

#### starpen

##### New Member
But isn't it the span of the data I need? As far as I understand odds ratio (and this could be wrong), I need probability A and probability B to do the calculations, right? Couldn't I just use 11% as prob. A and 2% as prob. B?

#### starpen

##### New Member
I'm sorry. The numbers were just examples. They sum up to 100% in the set. Here is a link http://minus.com/lbpLByZb9H8EB9.

I have the raw data as well, but isn't this enough? It's in Danish but the idea is, that it's comparing Denmark and Poland in specific points. Here it's primary education, secondary education and tertiary education down and agree/disagree across. Hope it makes sense?

#### trinker

##### ggplot2orBust
Yes it's a whole new ballgame now. I thought your percents didn't sum to 100. I assume victorxstc is working you up a response right now. If victorxstc hasn't responded in a day I'll come back to this.

#### starpen

##### New Member
So it's like this:

(0.1145(1-0.1145))/(0.0247(1-0.0247)) = 5,10573408285119

Is this correct?

#### Dason

That doesn't look right for an odds ratio calculation. Note that the formula (which you gave) is (p/(1- p))/(q/(1-q)). Notice that it's division in the numerator and denominator, not multiplication.

Edit: But it seems like when you actually did the calculation you used division...
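A quick way to check which operation produced the posted number is to evaluate both versions. This is only a sketch: the names `p` and `q` follow the formula in the thread, and `with_division` / `with_multiplication` are labels made up for this illustration.

```python
# Proportions taken from the calculation posted above.
p, q = 0.1145, 0.0247

# Odds ratio as defined by the formula: odds of p divided by odds of q.
with_division = (p / (1 - p)) / (q / (1 - q))

# What the expression would give if it were read literally as multiplication.
with_multiplication = (p * (1 - p)) / (q * (1 - q))

print(round(with_division, 4), round(with_multiplication, 4))
```

Only the division version comes out near the 5.1057 that was posted, which fits the edit above.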

#### GretaGarbo

##### Human
What is the response variable in this case? “Religions” or ”Yes to specific question” (what is that then?) or maybe something else? It is not “country” or “level of education” is it?

For me odds ratios are something that is (mostly) used after a logistic regression (also called logit regression) has been estimated. And then the odds ratios are evaluated by some estimated parameter in the regression model. That will also give confidence intervals for the odds ratio.

There is something I haven’t understood here.

#### Dason

> What is the response variable in this case? “Religions” or ”Yes to specific question” (what is that then?) or maybe something else? It is not “country” or “level of education” is it?
>
> For me odds ratios are something that is (mostly) used after a logistic regression (also called logit regression) has been estimated. And then the odds ratios are evaluated by some estimated parameter in the regression model. That will also give confidence intervals for the odds ratio.
>
> There is something I haven’t understood here.

The idea is the same, but you don't need to run a logistic regression to get estimates of the odds ratios when you have a table of the responses.

#### starpen

##### New Member
Perhaps I'm using it wrong, but I'm using it to establish the relative differences in the span between education in the specific countries. I see that 11.45% is less different from 2.47% than 54.98% is to 16.54% in absolute numbers, but I wanted to show, that the difference is smaller in relative terms. Since the total number of respondents isn't the same across the survey, I needed something else to compare odds for "agree".

This specific question (among loads like it) is: "In times of few jobs, men should have more right to jobs than women" Agree/disagree. This is then paired to education level (primary, secondary and tertiary).
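To see how the "absolute" and "relative" views can differ, the two pairs of percentages quoted above can be put through three common measures. This is a rough sketch: it assumes each pair is the share agreeing in the lowest versus the highest education group, and the labels `pair_a`, `pair_b` and the helper names are invented for the illustration.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)


def compare(p_low, p_high):
    """Return three ways of expressing the gap between two proportions."""
    return {
        "difference": p_low - p_high,              # absolute gap
        "risk_ratio": p_low / p_high,              # ratio of proportions
        "odds_ratio": odds(p_low) / odds(p_high),  # ratio of odds
    }


pair_a = compare(0.1145, 0.0247)  # first pair of percentages quoted above
pair_b = compare(0.5498, 0.1654)  # second pair of percentages quoted above

for name, result in (("pair A", pair_a), ("pair B", pair_b)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With these figures the ratio of proportions is larger for the first pair, while the odds ratio is larger for the second, so which gap looks relatively smaller depends on the measure chosen.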

#### starpen

##### New Member
Yeah, the chi2 was the main part of the data processing. I did the countries one by one and did the set that I linked to as a summary. All I wanted to do (to understand how relative differences work in data sets with uneven numbers of respondents) was to figure out how odds ratios work in this regard. I figured that comparing the odds ratios of the two countries would give me an idea of the relative difference, because the total number of respondents drops out of the calculation.

EDIT: and why would I merge the sets? All I need is the high/low values, right?

#### victorxstc

##### Pirate
> For me odds ratios are something that is (mostly) used after a logistic regression (also called logit regression) has been estimated. And then the odds ratios are evaluated by some estimated parameter in the regression model. That will also give confidence intervals for the odds ratio.

I guess he wants to calculate OR for a 2x2 contingency table, analyzed using a chi-squared.

----------

> Perhaps I'm using it wrong, but I'm using it to establish the relative differences in the span between education in the specific countries. I see that 11.45% is less different from 2.47% than 54.98% is to 16.54% in absolute numbers, but I wanted to show, that the difference is smaller in relative terms. Since the total number of respondents isn't the same across the survey, I needed something else to compare odds for "agree".
>
> This specific question (among loads like it) is: "In times of few jobs, men should have more right to jobs than women" Agree/disagree. This is then paired to education level (primary, secondary and tertiary).

I think after merging two of those three educational profiles, you can run a chi-squared test on the numbers (raw data needed) to obtain a P value, and if the result is significant, an OR would be good as well (otherwise the OR is not so useful, as it might be close to 1 [edit: or the difference might not be representative of a true difference in the population]).

------------------------

I think it depends on how you have calculated the percentages in the first place. I see that in the second country the sum of the percentages exceeds 100%, so it seems that each percentage is the number of individuals who answered yes to that question divided by the number of individuals in that class. Since we don't know how many individuals were in each class, we are unable to base our calculations on these percentages.

For example, if the first, second and third groups of the second country contained 10, 100, and 1000 individuals, the numbers of individuals answering yes would be 6, 46, and 370. But if we had 100, 10000, and 10 individuals in those groups, the numbers would differ (based on the given percentages). So we can't sum up the percentages of any two groups in order to merge them. Nor can we use these for odds ratio calculations.
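The dependence on group sizes can be sketched numerically. The yes shares below are the example percentages for the second country from the first post (46% and 37% for the middle and high groups); the group sizes are the made-up ones from this post, and `pooled_yes_share` is a hypothetical helper name.

```python
def pooled_yes_share(sizes, yes_shares):
    """Share answering yes after merging groups, weighting each group by its size."""
    yes_counts = [n * share for n, share in zip(sizes, yes_shares)]
    return sum(yes_counts) / sum(sizes)


# Example yes shares for the middle and high education groups.
mid_high_shares = [0.46, 0.37]

# Same shares, two different (hypothetical) sets of group sizes.
merged_a = pooled_yes_share([100, 1000], mid_high_shares)
merged_b = pooled_yes_share([10000, 10], mid_high_shares)

print(round(merged_a, 3), round(merged_b, 3))
```

The merged share moves from roughly 38% to roughly 46% purely because the group sizes changed, which is why the raw counts are needed before merging.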

-------------------

> EDIT: and why would I merge the sets? All I need is the high/low values, right?

It is fine to run your test with low and high only, but then a considerable part of your valuable data would be discarded. Merging would give a larger sample and would also bring the moderately educated people into your study, although it might reduce the impact of the extreme groups.

As a suggestion, you can test both conditions (merging or high/low only), see which one gives you a better-looking result, and use it.

---------------

BTW what was your P value?

#### GretaGarbo

##### Human
> The idea is the same but you don't need to go a logistic regression to get estimates of the odds ratios when you have a table of the responses.

I understand that. But how to get confidence intervals?

Edit: I just realized that it could be estimated with that good old model. I had forgotten about that one. But I think something is lost here, because that is like fitting a simple linear regression model rather than a multiple regression model (which takes account of both explanatory factors).

And is it possible to evaluate if there are any interaction effects in the absence of a model?

> I guess he wants to calculate OR for a 2x2 contingency table, analyzed using a chi-squared.

No, I don’t want to analyse it with a chi-squared test. And the contingency table would be 2x2x3.

I want to analyse it with a logit model. Or possibly a probit model or a complementary log-log model. Or possibly with a new model that you and I, victorxstc, had discussed before – a discussion that was interrupted!

“he” ????


#### victorxstc

##### Pirate
> No, I don’t want to analyse it with a chi-squared test. And the contingency table would be 2x2x3.
>
> I want to analyse it with a logit model. Or possibly a probit model or complementary log log model. Or possibly with an new a model that you and I, victorxstc, had discussed before – a discussion that was interrupted!
>
> “he” ????

Dear Greta

By "he" I meant "starpen". I guessed maybe he wants to do a chi-squared in this way:

In country 1:

| Response | Low education | High education |
|---|---|---|
| Yes | 2,800 persons | 7,500 persons |
| No | 100,000 persons | 45,000 persons |

In this example, P would be < 0.001.
OR = 5.9524 (meaning that the odds of answering Yes to that question are about 6 times higher among highly educated people than among people with low education).
95% CI for OR = 5.6915 to 6.2252

The same could be done for the second country.
Also the values in both countries could be summed up and the same process could be repeated for the combination.
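For anyone wanting to redo this with their own counts, here is a minimal sketch of the odds ratio and a confidence interval for a 2x2 table. It assumes the usual Wald interval on the log scale; the post does not say which method produced the interval above, so that choice and the function name `odds_ratio_ci` are assumptions.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(yes_a, no_a, yes_b, no_b, level=0.95):
    """Odds ratio of group A versus group B with a Wald interval on the log scale."""
    estimate = (yes_a / no_a) / (yes_b / no_b)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se = math.sqrt(1 / yes_a + 1 / no_a + 1 / yes_b + 1 / no_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = math.log(estimate)
    return estimate, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts from the example table: high education as group A, low education as group B.
estimate, lower, upper = odds_ratio_ci(7500, 45000, 2800, 100000)
print(round(estimate, 4), round(lower, 4), round(upper, 4))
```

With the example counts this should land very close to the OR and interval quoted above. Running the same function on the second country's counts gives the second OR and interval.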

BTW, truly sorry for the interruption

#### starpen

##### New Member
I did P-values for Poland and Denmark separately and they were both smaller than 0.001 (as far as I remember, the larger of the two was 0.0000008, but I was satisfied with <0.05).

Just to be clear; we are talking about the odds ratio being possible for the "extreme" numbers, right?

#### victorxstc

##### Pirate
> Just to be clear; we are talking about the odds ratio being possible for the "extreme" numbers, right?

The OR can be calculated for the two extremes only (e.g., my example above), or after you merge your middle group into one of the extremes. Both are possible, and both would lead to ORs. Did you then compare the confidence intervals of the ORs from the two countries?

By the way, congrats on the great P values. ORs can be very large in that case.

#### starpen

##### New Member
Thanks. Yeah, this study has been loads of fun. Seems like I've found something I need to work further on. Yeah, I think I have something pretty interesting when comparing the CIs of the ORs. I guess this summer's self-learning from my own article wasn't all for nothing after all.