Comparative study and too many variables

Hey guys.

I've been in sociology for quite some time, but I'm still pretty inexperienced in statistics and Stata. Here's my problem:

I'm working on a comparative study of Poland and Denmark with regard to social life and education. What I'd like to do is compare the respondents' level of education to their responses in specific social categories. I can easily compare the answers within, for example, Denmark alone (say, education against the social answer), but that's not what I want. I want to compare education and answer between Poland and Denmark, and that's where I get stuck: I then have three variables I need to combine, and I haven't a clue how to do it.


What I have --> Level of education compared to answer in social category, within a single country.

What I need --> Level of education compared to answer in social category and the difference between Denmark and Poland.

I hope it makes sense.


I don't really understand. Do you mean that you have 3 variables:
- education level
- social category
- country

... and that you're trying to assess the impact of social category and country on level of education?

How are education level and social category measured? How many respondents do you have?


Thanks for the fast reply.

Yes, I have three variables, but my thesis is something along the lines of: "There is a correlation between country, education level and how a certain group of respondents reproduces specific social values." The idea is that education level will have a different impact on the social factor depending on the country.

I've coded education with 5 labels, and the social category has 5 labels as well: "Never, hardly ever, sometimes, often, always".

I have around 3500 useful respondents.


So if I understand correctly, the social variable is ordinal, i.e. it's a number ranging from 1 to 5 where 1 is "never" and 5 is "always". Is education also ordinal?

If so, I would suggest an ordinal logistic regression with education as the outcome variable and, as predictors, social and country plus their interaction. For example (with placeholder variable names):
ologit education i.social##i.country

The significance of the interaction term tests the null hypothesis that the relationship between social level & education level does not differ by country.
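Because social has 5 categories, the interaction is made up of several coefficients rather than a single one, so a joint test is one way to look at it. A minimal sketch, assuming the variables are called education, social and country (placeholder names, not taken from your dataset):

```stata
* Placeholder variable names: education (1-5), social (1-5), country (0/1)
ologit education i.social##i.country

* Joint Wald test that all social-by-country interaction coefficients are zero
testparm i.social#i.country
```

A small p-value from that test would suggest that the relationship between social and education differs between the two countries.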


Yeah, social and education are ordinal variables and I see your point, but as the data is now, it looks like this:

udddk = answer to education in 5 categories, Denmark
uddpl = answer to education in 5 categories, Poland
jobfam = answer to social question

In this way, I only have education paired with country, so I don't know exactly what to do. I'm sorry if this is really basic stuff, but I'm learning as fast as I can :)


Why are they paired? Is that part of the research design (eg do you have some kind of matched cohort), or is that just how they're recorded in the dataset?


Yeah, that's how they're recorded. If I tab, for example, "udddk", I get the distribution of education among the Danish respondents. If I tab "udddk" with "jobfam", I get education level compared to the social factor. If I tab "udddk", "uddpl" and "jobfam", it gets confused and tells me I have too many variables, and I get why. I just don't know what to do about it :)


In that case you should be able to -reshape- the dataset into a long format that you can analyse:
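* give both education variables a common stub; the suffix becomes the country code
rename udddk udd0
rename uddpl udd1
* respondent identifier required by reshape
gen n=_n
* one row per respondent and country: poland = 0 for Denmark, 1 for Poland
reshape long udd, i(n) j(poland)
* rows with missing udd (the other country's question) are left out of the estimation
ologit udd i.jobfam##i.poland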
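
To see what the reshape does before running it on the real data, here is a small sketch on three made-up respondents. The values are invented purely for illustration; only the variable names come from your dataset, and I'm assuming each respondent answered just one of the two education questions:

```stata
* Toy data: hypothetical values, only to show the effect of -reshape long-
clear
input udddk uddpl jobfam
3 . 2
. 4 5
1 . 1
end

rename udddk udd0
rename uddpl udd1
gen n = _n
reshape long udd, i(n) j(poland)

* Each respondent now has two rows; keep the one holding an actual answer
drop if missing(udd)
list n poland udd jobfam
```

The listing should show one row per respondent, with poland equal to 0 for the two Danish respondents and 1 for the Polish one, and jobfam carried along unchanged.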