What sort of data is this? In a pickle

Hi, sorry first of all if this would be better suited to a different category.

Basically I've done a questionnaire and one question is:

3. Highest formal education.

GCSE's (School or FE college)
A-Levels (Or equivalent)
Graduate Degree (Higher education)
Postgraduate or higher

They are tick boxes where you have to choose one.

In SPSS I have entered them and given them values as 1=None, 2=GCSE and so on.

What I want to know is: what kind of data is it?

Is it nominal/categorical, since it is not scale data (the gap between each level is not equal)?

Or is it Likert/ordinal data, since each subsequent option is a higher level than the one before?

I would like to compare this question to another question: "Do you actively seek climate change information?" (Yes or No)

My theory is that people with a higher education level DO actively seek info more than the lower educated. (You can actually see this by looking at the data in its raw form.)

But what test would show this best? Am I right that to get the best answer to my question I would have to treat education level as Likert data and maybe do a Mann-Whitney test? But that would create a median of the education level, right (which technically is incorrect)?

Any help is greatly appreciated! I'm tying myself in knots!

P.S. I have done crosstabs, and they do show that people with a degree most actively seek info, but what I really want to say is something like: as education increases, people are more likely to actively seek info, not just that people with degrees do (people with a degree only have the highest percentage because more people with that qualification actually did the survey!).
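
One way to get around the "more graduates answered the survey" problem is to look at the percentage of Yes answers within each education level rather than raw counts. Here is a minimal sketch in Python; the responses and the 1 to 4 level codes are made up purely for illustration and are not the survey data.

```python
from collections import Counter

# Hypothetical (education_code, answer) pairs; invented for illustration only.
responses = [
    (1, "No"), (1, "No"), (1, "Yes"),
    (2, "No"), (2, "Yes"),
    (3, "Yes"), (3, "Yes"), (3, "No"), (3, "Yes"),
    (4, "Yes"), (4, "Yes"),
]

def row_percentages(data):
    """Return {education_code: percent answering Yes within that code}."""
    totals = Counter(level for level, _ in data)
    yes = Counter(level for level, answer in data if answer == "Yes")
    return {level: 100.0 * yes[level] / totals[level] for level in sorted(totals)}

pct_yes = row_percentages(responses)
for level, pct in pct_yes.items():
    print(f"education code {level}: {pct:.1f}% actively seek info")
```

Because each percentage is computed within its own row, a group with many respondents does not automatically look more engaged than a small one.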



Fortran must die
If you assume they are ordered based on the level of education then they are ordinal. The levels of ordinal data are not assumed to be the same distance apart. The difference between ordinal and nominal data is that you can logically order the levels in ordinal but not in nominal data. Likert is a common form of ordinal data (although, to make things confusing, Likert data is commonly treated as interval data by many researchers, who call it interval-like).

The best (if not the easiest) test would be to make climate information your dependent variable and education level an independent variable, and run logistic regression.

I don't know why you would do a non-parametric test when you can do logistic regression instead.
Thank you for the reply. That's the thing I'm getting confused about: the way researchers treat Likert data as interval when the distance between each level is not technically equal. I've been reading many things with conflicting opinions.

It's just that, wouldn't a logistic regression create medians and standard deviations? Which is wrong if I'm using education as Likert and thus ordinal? Or would it just be the best-case scenario really, and I'm overcomplicating things?

Sorry, I think I've just read so many conflicting papers on stats that I've now complicated the matter! Cheers for your help, I really am no stats expert.
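
To make the suggestion concrete, here is a rough sketch of a binary logistic regression with one predictor, fitted by Newton-Raphson in plain Python. The counts are invented, and entering the education code (1 to 4) as a single numeric predictor is an assumption: it treats each step up in education as the same change in log-odds. Entering the levels as categories is the alternative if that seems too strong.

```python
import math

def _score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and entries of the 2x2 information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # predicted P(Yes)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11

def fit_logistic(x, y, iterations=25):
    """Return (intercept, slope, standard error of slope)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: coefficients += inverse(information) * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Re-evaluate the information matrix at the final estimates for the SE
    _, _, h00, h01, h11 = _score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1

# Hypothetical (yes, no) counts per education code; not real survey data.
counts = {1: (3, 7), 2: (4, 6), 3: (6, 4), 4: (8, 2)}
education, seeks = [], []
for level, (n_yes, n_no) in counts.items():
    education += [level] * (n_yes + n_no)
    seeks += [1] * n_yes + [0] * n_no

b0, b1, se_b1 = fit_logistic(education, seeks)
print(f"slope = {b1:.3f}, standard error = {se_b1:.3f}")
```

The output is a slope and a standard error, not a median, which is the point of using this rather than a rank-based test. A positive slope here would mean the odds of answering Yes rise with each education level.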


Fortran must die
As far as Likert data being treated like interval, the argument is that when you have enough levels it behaves enough like interval data to work (roughly) in procedures that use interval data. The method is robust enough to handle it (or so the argument goes; not all agree). They are not arguing that it is interval data, simply that it is close enough to use that way.

Logistic regression won't create medians; it will generate slopes and standard errors. There is nothing wrong with treating Likert data as ordinal at all; in fact that is what I was suggesting you do. The reason to use logistic regression is that, when you can use them, parametric methods are preferred to non-parametric ones such as your initial suggestion.

I have come to the painful conclusion that no two statisticians ever agree....which is why you usually go with generally accepted rules.
OK, thank you ever so much for that. I shall try and see if I can get my head around logistic regression.

I've just done a linear regression, and it comes up with a Sig. (in the coefficients box) of 0.54, so that means there is no statistical relationship, right?

Just to mess around, I entered a load of data along the lines of lots of people with little education not actively seeking and people with high education answering Yes to actively seeking, and the Sig. level then became 0.00, so now there is a statistically significant relationship between education level and actively seeking info.

Is what I said correct?

Thank you very much for your continued help, I'm rather a fish out of water in this department.....


Fortran must die
If the .54 is your p-value then you are right. It's not significant. But I don't think you have a dependent variable that is suitable for OLS, so I am not sure that value means anything.

I am not sure what you ran, based on your comments above. What method did you use, and what were your dependent and independent variables?

Don't feel bad. When I work with dason, trinker, spunky et al. on statistics questions I feel like a total idiot (because they know so much more stats than I do and I ask **** questions). The more you learn, the more you realize what you don't know.
Thank you for the reassuring words!

Right, well, to get those results I did: Analyze - Regression - Linear, with the dependent as actively seeking info and the independent as education level.
With my original data it came out as 0.54 in the Sig. column and education row, but when I manipulated the data to extremes (just to test), making all lower-educated people not actively seek and higher-educated people always seek, the Sig. changed to 0.00, so I guessed it worked.....

And just to confirm, I have the education levels as ordinal and actively seeking info [Yes or No] as nominal.

I think I'm starting to see why regression makes sense. It's just that last year I helped a friend with their project, and when we asked for help we got told to use Mann-Whitney U when comparing Likert (ordinal) to nominal (yes or no).


Fortran must die
I am not sure how the manipulation worked, but I don't think you can use linear regression with that dependent variable. Your data is not interval in nature. I would run logistic regression and see what your results are.
Aaah yes, I see. I think I was getting confused by differences in terminology. I think what you refer to as interval is what I have learnt as scale. And yes, obviously my data is not on a scale.

I just ran binary logistic regression and, to be honest, I've never seen any of the output before, so I'm not sure where to look for the sig levels that I need. There are so many things, e.g. the 'Variables in the Equation' box, 'Model Summary' with things like Cox & Snell and Nagelkerke, and 'Omnibus Tests of Model Coefficients'.

Sorry for all the questions, feel free just to ignore me if you are busy or anything. I feel as though I should be paying you back in some way for the help!


Fortran must die
What software did you run? I can interpret the output if it's SPSS or SAS. If you want to know whether the independent variable is significant, you should look for the associated Wald test. The overall model is most often tested by the -2 log likelihood test. Both should have sig values associated with them.

If you ran SPSS or SAS let me know and I can tell you what specifically to look for.


Fortran must die
I forgot how clunky SPSS is for logistic regression.

For the model test (to see if the model has any predictive value), look in Block 1 (you ignore Block 0) for the "Omnibus Tests of Model Coefficients". The sig value tells you if your model adds predictive value (if you reject the null, which normally you want). If it's less than .05 you normally assume it does have predictive value and reject the null. If this is not the case (if the sig value is above .05) you stop; your variable tells you nothing.

The Nagelkerke R square is an attempt to duplicate an OLS R square (although it does not show you how much explained variance the model accounts for). Many don't like it, and it will generally be much lower than an OLS R square.
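
For anyone wondering where those two numbers come from, here is a small sketch using the standard formulas: the omnibus chi-square is the drop in -2 log likelihood from the intercept-only model to the fitted model, and Nagelkerke rescales the Cox & Snell value by its maximum. The log-likelihoods and sample size below are invented, and the p-value shortcut only holds for a single predictor (1 degree of freedom).

```python
import math

def omnibus_and_pseudo_r2(ll_null, ll_model, n):
    """Likelihood-ratio chi-square plus Cox & Snell and Nagelkerke R square.

    ll_null  : log-likelihood of the intercept-only model (Block 0)
    ll_model : log-likelihood of the model with the predictor (Block 1)
    n        : number of cases
    """
    chi_square = 2.0 * (ll_model - ll_null)  # difference in -2LL
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return chi_square, cox_snell, nagelkerke

def chi_square_p_value_1df(chi_square):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))

# Hypothetical values for illustration only.
chi_square, cox_snell, nagelkerke = omnibus_and_pseudo_r2(-27.7, -25.0, 40)
p_value = chi_square_p_value_1df(chi_square)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
print(f"Cox & Snell = {cox_snell:.3f}, Nagelkerke = {nagelkerke:.3f}")
```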

Look for "Variables in the Equation" to analyze your individual variable. This will have a "Sig." value that is your p-value. It is for the Wald test, which is the logistic regression equivalent of the t-test (as close as logistic regression comes to that). The Exp(B) value at the end of this table is your odds ratio.
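
As a rough illustration of how the columns in that table relate to each other: the Wald statistic is (B / S.E.) squared, and Exp(B) is just e raised to the power B. The B and S.E. values below are hypothetical, and the p-value assumes a single coefficient (1 degree of freedom).

```python
import math

def wald_and_odds_ratio(b, se):
    """Return (Wald chi-square, p-value with 1 df, odds ratio) for one coefficient."""
    wald = (b / se) ** 2
    p_value = math.erfc(math.sqrt(wald / 2.0))  # chi-square upper tail, 1 df
    odds_ratio = math.exp(b)                    # the Exp(B) column
    return wald, p_value, odds_ratio

# Hypothetical coefficient and standard error for education level.
wald, p_value, odds_ratio = wald_and_odds_ratio(0.75, 0.32)
print(f"Wald = {wald:.2f}, p = {p_value:.4f}, Exp(B) = {odds_ratio:.2f}")
```

An odds ratio above 1 would mean the odds of answering Yes go up with each step in education level; below 1 would mean they go down.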

That is most helpful. I sadly have no time left on this computer, so I will have to pick it up again tomorrow. I will let you know how it went. I'm just worried I'm overcomplicating things, as I have to justify my reasoning for any tests I use that are not the most obvious ones, and since the marker and the aim of this project are more social science related (I'm doing an Environmental Science degree), the depth this is going to may be unnecessary. At the end of the day, for this specific question all I want to find out is "As education level increases, people will actively seek information about climate change more often". Thus my hypothesis is: increased education, more seeking! This is just a small part of my wider project, which is along the lines of: people with higher education are currently more engaged with climate change, and if that is true I want to look into how to get other people more interested.

Thank you once again,


Fortran must die
I don't think you are overcomplicating it. And using logistic regression in this case is recommended over a non-parametric test as long as your number of cases is high enough.