Normality and tests

#1
Hi all,
I am working on a project where I have three theoretical constructs, each with binary coding for positives and negatives (attached is a snapshot of my data, but total N=159 and age ranges from 5-19). A couple of questions:
1. How can I test this data for normality, so I know whether to use parametric or non-parametric tests?
2. What test should I use to compare each construct (columns E-J) across age, gender, and location? And across whether or not they know a scientist (last column)?

TIA
 


noetsi

Fortran must die
#3
All of the formal tests of normality have serious power issues and thus I stay away from them. A QQ plot is the best approach I have found to test normality.

You have to be careful about what you are checking for normality. In regression, for example, the normality assumption concerns the residuals, not the raw data.
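As a rough illustration of the QQ approach, here is a minimal sketch that computes the points of a normal QQ plot using only the Python standard library. The function name `qq_points`, the plotting positions (i + 0.5)/n, and the residual values are assumptions made for the example, not something taken from the poster's data.

```python
from statistics import NormalDist

def qq_points(values):
    """Return (theoretical, observed) pairs for a normal QQ plot.

    `values` would typically be model residuals, not raw data.
    Plotting positions (i + 0.5) / n are one common convention (an assumption here).
    """
    n = len(values)
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    observed = sorted(values)
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

# Hypothetical residuals, for illustration only
residuals = [-1.2, 0.3, 0.8, -0.4, 0.1, 1.5, -0.9]
points = qq_points(residuals)

# A raw 0/1 variable, by contrast, can only ever show two observed levels
binary_points = qq_points([0, 1, 1, 0, 1])
```

Plotting the pairs and looking for an approximately straight line is the graphical check. For a raw 0/1 variable the observed values sit on two horizontal levels rather than along a line, whatever the sample size.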
 
#4
Hi Lee,

I don't really understand what you want to compare.
What is the dependent variable?
Are the predictors: Embodiment, Attainability, ...Desirability, Sci, Age?

When the sample size is 30 or more, you can usually assume that the sample mean is approximately normally distributed.
http://www.talkstats.com/threads/n-30-or-not-this-is-the-question.74376/#post-217220
The dependent variables would be embodiment, attainability, and desirability (positive and negative for each), so six different tests. The predictors would be age (categorical), gender (binary), location (there are two different schools, so binary), and whether they know a scientist (binary). I'm assuming I would sum the "scores" and find the average in the dependent column for each predictor level (e.g., the total for embodiment at age 5) and use that as my continuous dependent variable.
Does that help clarify?
 

obh

Active Member
#6
Hi Noetsi :)

I think the common practice is to combine a normality test with a graphical method like the QQ plot.
 
#7
Hi Lee,

Why is age categorical?
Did you try using linear regression (if it meets the assumptions)?
I am treating age as categorical because we do not assume there will be a trend; we just need to see whether there is a significant difference in which constructs each age associates with. When I actually run the tests, I will group ages together (5-7, 8-10, etc.).
 

obh

Active Member
#8
Why not just put the real age?

Anyway, even with an ordinal age variable you can run a linear regression (if it meets the assumptions).
If, for example, you run a one-way ANOVA on the age variable alone, you may miss differences due to the other predictors and may get a wrong answer ...
 
#9
Thanks. With a regression model, then, would logistic regression be a better choice since the dependent is binary?
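For a 0/1 outcome such as one of the construct indicators, logistic regression is the usual regression choice. Below is a minimal sketch, in plain Python, of fitting a one-predictor logistic model by Newton-Raphson. The function name `fit_logistic` and the toy data are hypothetical; a real analysis would normally use a statistics package and include all four predictors.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    x and y are equal-length sequences; y must be coded 0/1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: x = knows a scientist (0/1), y = construct present (0/1)
x = [0] * 40 + [1] * 40
y = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 20
b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)  # odds of y=1 when x=1, relative to x=0
```

With a single binary predictor, the fitted slope equals the log of the odds ratio computed directly from the 2x2 table of counts, which gives a quick sanity check on the fit.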
 
#13
Yep, I'll try:
I am using a theory of role modeling to evaluate student responses to a survey about scientists. The role model theory has three constructs: goal embodiment, attainability, and desirability. I went through the survey and coded student responses that indicated a particular construct. The responses could be coded as positive or negative. For example, positive codes for desirability would be terms like "cool" and "fun" while negative codes would be words like "boring" and "nerd." It is a qualitative study in that we are evaluating the words students used, but now we want to statistically compare across age, gender, location, and whether they know a scientist.
For statistical comparison, if a student anywhere indicated a particular construct (positive or negative) they were given a '1' for that construct. If nothing was present, they were given a '0'. Even if they had multiple coded responses for one construct, they were still only given a '1' to indicate presence rather than quantity. This is the standard coding practice for studies in this vein. In the end, I have something that looks like the picture on the original post (only with 159 students from ages 5 - 19).
In the end, I will have six DVs: one for each construct, both positive and negative. I want to compare those against the four predictors (age, gender, location, and whether they know a scientist).
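To make the presence coding concrete, here is a small sketch of how one student's coded responses could be collapsed into the six 0/1 indicators. The column names, the `(construct, valence)` tuple format, and the function name `presence_indicators` are hypothetical and only for illustration.

```python
CONSTRUCTS = ("embodiment", "attainability", "desirability")
VALENCES = ("pos", "neg")

def presence_indicators(coded_responses):
    """Collapse one student's codes into six presence/absence indicators.

    coded_responses: iterable of (construct, valence) tuples, one per coded
    response. Repeated codes still yield a single 1 (presence, not quantity).
    """
    present = set(coded_responses)
    return {
        f"{construct}_{valence}": int((construct, valence) in present)
        for construct in CONSTRUCTS
        for valence in VALENCES
    }

# Hypothetical student: two positive desirability codes, one negative embodiment code
row = presence_indicators([
    ("desirability", "pos"),
    ("desirability", "pos"),
    ("embodiment", "neg"),
])
```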

Does that help?
 

obh

Active Member
#14
So the same student could get a 1 for both negative and positive? So Positive Embodiment and Negative Embodiment are actually two separate dependent variables? That is, all of these combinations are possible for one student (Pos Emb, Neg Emb): (1,1), (1,0), (0,1), (0,0)?
 
#17
It seems that way for age, at least. But what about gender, location, and knowing a scientist (also all binary)?
Also, are there tests for normality I need to run before I can use logistic regression?
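For two binary variables (for example, a construct indicator against gender, location, or knowing a scientist), one common option is a Pearson chi-square test on the 2x2 table of counts. Neither this test nor logistic regression assumes that the 0/1 outcome is normally distributed. Below is a minimal stdlib-only sketch with a hypothetical function name and hypothetical counts. It applies no continuity correction, and with small expected cell counts Fisher's exact test is usually preferred.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom and no continuity
    correction. For 1 df, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = does not know / knows a scientist,
# columns = construct present / absent
chi2, p_value = chi_square_2x2(10, 30, 20, 20)
```

The same function can be reused for each of the six indicators against each binary predictor, keeping in mind that running many tests raises the multiple-comparison question.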