Page 1 of 2 (results 1 to 15 of 19)

Thread: Normality

  1. #1

    Normality




    I did a test of normality in Stata on my dependent variable. The test was significant, indicating a violation of normality. After taking the log, there is still a violation of normality. What is the next step to take?

  2. #2
    hlsmith

    Re: Normality

    Why are you trying to normalize the variable?


    Can you post a histogram and qq-plot of your data?


    Large datasets will typically fail a normality test even when the departures are very small. What is your sample size? Also, feel free to upload the normality test results.
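    To illustrate the sample-size point, here is a small simulation sketch. It is in Python rather than Stata, and it uses the Jarque-Bera test instead of Stata's Shapiro-Wilk `swilk` only because JB is easy to code by hand. The data come from a Gamma distribution with shape 50, which is only mildly skewed; the seed, shape and sample sizes are arbitrary assumptions. With the same mild departure, the p-value is usually unremarkable at small n and essentially zero at n = 50,000.

```python
import math
import random


def jarque_bera(xs):
    """Return (JB statistic, asymptotic p-value) for a list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 df,
    # whose upper-tail probability is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def p_value_for_sample_size(n, seed=1):
    """p-value of the JB test on n draws from a mildly skewed distribution."""
    rng = random.Random(seed)
    # Gamma(shape=50) has skewness 2 / sqrt(50), about 0.28: close to normal.
    xs = [rng.gammavariate(50.0, 1.0) for _ in range(n)]
    return jarque_bera(xs)[1]


if __name__ == "__main__":
    for n in (50, 500, 50_000):
        print(f"n = {n:>6}: p = {p_value_for_sample_size(n):.3g}")
```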

  3. #3

    Re: Normality

    Quote Originally Posted by hlsmith
    Why are you trying to normalize the variable?


    Can you post a histogram and qq-plot of your data?


    Large datasets will typically fail a normality test even when the departures are very small. What is your sample size? Also, feel free to upload the normality test results.
    Very good questions to raise. Also, what is the dependent variable, including its units of measure? A log transformation might not be the most appropriate one.

  4. #4
    GretaGarbo

    Re: Normality

    Quote Originally Posted by Jazz3
    I did a test of normality in Stata on my dependent variable. The test was significant, indicating a violation of normality. After taking the log, there is still a violation of normality. What is the next step to take?
    The next step is to check normality of the residuals (i.e., of the dependent variable given the explanatory variables). The distribution of the dependent variable itself is irrelevant.
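    A minimal Python sketch of this point, assuming a hypothetical model with a single 0/1 explanatory variable and normal errors: the outcome itself is bimodal (its kurtosis is far from the normal value of 3), while the residuals from the fitted regression look normal. The data, seed and effect size of 10 are made up for illustration.

```python
import random


def ols_fit(x, y):
    """Least-squares fit of y = a + b * x for a single regressor; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def kurtosis(xs):
    """Moment-based kurtosis; about 3 for normal data."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((v - m) ** 2 for v in xs) / n
    m4 = sum((v - m) ** 4 for v in xs) / n
    return m4 / m2 ** 2


rng = random.Random(42)
n = 2000
d = [i % 2 for i in range(n)]  # hypothetical 0/1 explanatory variable
y = [10.0 * di + rng.gauss(0.0, 1.0) for di in d]  # normal errors around each group mean

a, b = ols_fit(d, y)
residuals = [yi - (a + b * di) for di, yi in zip(d, y)]

print("kurtosis of y:        ", round(kurtosis(y), 2))          # bimodal, well below 3
print("kurtosis of residuals:", round(kurtosis(residuals), 2))  # close to 3
```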

  5. The Following 2 Users Say Thank You to GretaGarbo For This Useful Post:

    CowboyBear (04-26-2017), ondansetron (04-21-2017)

  6. #5

    Re: Normality

    Quote Originally Posted by GretaGarbo
    The next step is to check normality of the residuals (i.e., of the dependent variable given the explanatory variables). The distribution of the dependent variable itself is irrelevant.
    Good catch. I didn't think to make sure the OP was doing that, since I kind of assumed that's what had been done :O

  7. #6

    Re: Normality

    The more I post in this forum the more I learn (THANKS) and the more I realize I don't know anything (CRIES)

    I'm on my way out, but I will definitely go over everything.

  8. #7

    Re: Normality

    My dependent variable is on a scale from 0 to 100 (a percentage), but I took the log (it seemed logical at the time).

    My main drivers are also on a scale from 0 to 100, but I had to transform them so that I only keep values of fifty and higher.

    Then I have some economic control variables (GDP) and a nominal dummy control variable (1/0).

    I also checked for outliers. I've attached the kdensity (histogram?)/qqplot and the other tests I used for normality.
    Attached: stem r.JPG, qnorm.JPG, kdensity.JPG, iqr.JPG

  9. #8
    GretaGarbo

    Re: Normality

    That seems to be OK. And with the large sample size, the parameter estimates will be approximately normal by the central limit theorem anyway, so it is OK to go ahead with inference based on normal theory. (In my view, but we can all be wrong.)
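    A rough simulation of the central limit argument, in the simplest possible case: for an intercept-only model the least-squares estimate is just the sample mean. The errors below are exponential (population skewness 2, clearly non-normal), and the sketch reports how skewed the estimate's sampling distribution is at two sample sizes. The sample sizes, number of replications and seed are arbitrary assumptions; the skewness should shrink roughly like 2/sqrt(n).

```python
import random


def skewness(xs):
    """Moment-based skewness; 0 for a symmetric distribution."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((v - m) ** 2 for v in xs) / n
    m3 = sum((v - m) ** 3 for v in xs) / n
    return m3 / m2 ** 1.5


def sampling_skewness(n, reps=2000, seed=7):
    """Skewness of the least-squares estimate (here the sample mean) across
    `reps` simulated samples of size n with exponential errors."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # population skewness is 2
        estimates.append(sum(sample) / n)
    return skewness(estimates)


if __name__ == "__main__":
    for n in (5, 400):
        print(f"n = {n:>3}: skewness of the estimate = {sampling_skewness(n):.2f}")
```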

  10. The Following User Says Thank You to GretaGarbo For This Useful Post:

    Jazz3 (04-23-2017)

  11. #9

    Re: Normality

    Quote Originally Posted by GretaGarbo
    That seems to be OK. And with the large sample size, the parameter estimates will be approximately normal by the central limit theorem anyway, so it is OK to go ahead with inference based on normal theory. (In my view, but we can all be wrong.)
    I agree with you that the departure from normality doesn't appear to be large enough to cause an appreciable issue, especially at that sample size.

    Would you be able to show us a histogram and normal probability plot of your untransformed residuals?

  12. The Following User Says Thank You to ondansetron For This Useful Post:

    Jazz3 (04-23-2017)

  13. #10

    Re: Normality

    The untransformed dependent variable:
    Attached: kdensity2.JPG, qnorm2.JPG, stemb2.jpg, iqr2.JPG

  14. #11

    Re: Normality

    I ran the test on both the untransformed and the transformed variable, and both were significant.
    Attached: swilk.JPG, swilk2.JPG

  15. #12

    Re: Normality

    Quote Originally Posted by Jazz3
    I ran the test on both the untransformed and the transformed variable, and both were significant.
    Formal tests for normality aren't really a great idea, for several reasons. In particular, they're often very sensitive to slight departures from normality, and this problem gets worse as the sample size increases. Personally, I would not even run a formal test of normality; I would just rely on the histogram/stem-and-leaf/normal probability plot approach. If you check your standardized residuals and investigate the suspect outliers (absolute standardized value between 2 and 3) and the outliers (absolute standardized value greater than 3), as well as other regression diagnostics (influential observations), I think you'll be better off than with the formal normality tests, especially since regression methods can perform pretty well in the presence of outliers and moderate non-normality.
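    The 2-to-3 and above-3 thresholds can be applied mechanically. This Python sketch uses the simple standardized residual e_i / s, where s is the residual standard error with n_params estimated coefficients; leverage-adjusted (studentized) versions also divide by sqrt(1 - h_i), which is skipped here. The function name and the example residuals are hypothetical.

```python
import math


def classify_standardized(residuals, n_params):
    """Label residuals by |e_i / s|: 'ok' below 2, 'suspect' from 2 to 3, 'outlier' above 3.

    s is the residual standard error sqrt(SSE / (n - n_params)); no leverage adjustment.
    """
    n = len(residuals)
    s = math.sqrt(sum(e * e for e in residuals) / (n - n_params))
    labels = []
    for e in residuals:
        z = abs(e) / s
        if z > 3:
            labels.append("outlier")
        elif z >= 2:
            labels.append("suspect")
        else:
            labels.append("ok")
    return labels


# Hypothetical residuals from a model with an intercept and one slope (n_params = 2).
resid = [1.0, -1.0] * 20 + [4.0, -7.0]
labels = classify_standardized(resid, n_params=2)
print(labels[-2:])  # ['suspect', 'outlier']
```

    Note that a large residual also inflates s, so a single extreme point can partly mask others; that is one reason leverage-adjusted and deleted residuals exist.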

    The difference between your transformed and untransformed plots doesn't look too big. I would see how much the conclusions vary between the two models. Did you happen to look into the constant variance assumption using a plot of residuals vs. predicted y values in the transformed and untransformed models? I think the homoscedasticity assumption is more important than the normality issue.
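    A numeric companion to the residuals-vs-fitted plot, as a hedged sketch: sort the observations by fitted value, cut them into equal-size bins, and compare the residual standard deviation across bins. A steady increase corresponds to the "fanning out" shape that suggests non-constant variance. The bin count and function name are arbitrary choices, not a standard diagnostic from any package, and the example data are made up.

```python
import statistics


def spread_by_fitted_bins(fitted, residuals, n_bins=4):
    """Residual standard deviation within equal-count bins of the fitted values.

    Assumes at least one observation per bin; the last bin takes any remainder.
    """
    pairs = sorted(zip(fitted, residuals))
    size = len(pairs) // n_bins
    spreads = []
    for k in range(n_bins):
        end = (k + 1) * size if k < n_bins - 1 else len(pairs)
        chunk = [r for _, r in pairs[k * size:end]]
        spreads.append(statistics.pstdev(chunk))
    return spreads


# Hypothetical fitted values and residuals whose spread grows with the fit.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [1, -1, 2, -2, 3, -3, 4, -4]
print(spread_by_fitted_bins(fitted, resid))  # [1.0, 2.0, 3.0, 4.0]
```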

  16. #13

    Re: Normality

    Quote Originally Posted by ondansetron
    Formal tests for normality aren't really a great idea, for several reasons. In particular, they're often very sensitive to slight departures from normality, and this problem gets worse as the sample size increases. Personally, I would not even run a formal test of normality; I would just rely on the histogram/stem-and-leaf/normal probability plot approach. If you check your standardized residuals and investigate the suspect outliers (absolute standardized value between 2 and 3) and the outliers (absolute standardized value greater than 3), as well as other regression diagnostics (influential observations), I think you'll be better off than with the formal normality tests, especially since regression methods can perform pretty well in the presence of outliers and moderate non-normality.

    The difference between your transformed and untransformed plots doesn't look too big. I would see how much the conclusions vary between the two models. Did you happen to look into the constant variance assumption using a plot of residuals vs. predicted y values in the transformed and untransformed models? I think the homoscedasticity assumption is more important than the normality issue.
    Ohh I see. Good to know!

    I didn't check for homoscedasticity using plots; I used the Breusch-Pagan test. In both cases there was heteroscedasticity, so I ran the regression with robust standard errors.
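    For reference, a minimal Python sketch of both ideas for a single-regressor model: the n*R^2 (Koenker) form of the Breusch-Pagan test, which regresses the squared residuals on x, and an HC1 heteroskedasticity-robust standard error for the slope (the HC0 sandwich scaled by n/(n - k)). This is an illustration under those assumptions, not a reproduction of Stata's `estat hettest` or `vce(robust)` output, whose defaults may differ in detail. The simulated data, where the error SD grows with x, are made up.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b * x by least squares; return (a, b, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return a, b, [yi - a - b * xi for xi, yi in zip(x, y)]


def breusch_pagan_koenker(x, y):
    """n * R^2 from regressing squared OLS residuals on x, with a chi-square(1) p-value."""
    _, _, resid = simple_ols(x, y)
    e2 = [e * e for e in resid]
    _, _, aux_resid = simple_ols(x, e2)
    mean_e2 = sum(e2) / len(e2)
    r2 = 1.0 - sum(u * u for u in aux_resid) / sum((v - mean_e2) ** 2 for v in e2)
    lm = len(x) * r2
    # Upper tail of a chi-square with 1 df: P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    return lm, math.erfc(math.sqrt(lm / 2.0))


def slope_se(x, y, robust=True):
    """Standard error of the slope: HC1 sandwich if robust, else the classical formula."""
    n = len(x)
    _, _, resid = simple_ols(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if robust:
        meat = sum(((xi - mx) * e) ** 2 for xi, e in zip(x, resid))
        var = meat / sxx ** 2 * n / (n - 2)  # HC0 scaled by n / (n - k), k = 2
    else:
        var = sum(e * e for e in resid) / (n - 2) / sxx
    return math.sqrt(var)


# Made-up data whose error standard deviation grows with x.
rng = random.Random(3)
x = [float(i) for i in range(1, 201)]
y = [2.0 + 0.5 * xi + xi * rng.gauss(0.0, 1.0) for xi in x]

lm, p = breusch_pagan_koenker(x, y)
print(f"BP (n*R^2) = {lm:.1f}, p = {p:.2g}")
print(f"slope SE: classical {slope_se(x, y, robust=False):.3f}, robust {slope_se(x, y):.3f}")
```

    The robust standard error changes only the uncertainty estimate; the slope itself is the same OLS estimate either way.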

    But I've attached the plots: rvplotUNtrans.JPG, rvplotTrans.JPG

  17. #14

    Re: Normality

    Quote Originally Posted by Jazz3
    Ohh I see. Good to know!

    I didn't check for homoscedasticity using plots; I used the Breusch-Pagan test. In both cases there was heteroscedasticity, so I ran the regression with robust standard errors.

    But I've attached the plots
    Understood that you used BP and then robust SEs. As you can see in the plot, the vertical spread of the residuals is not approximately constant, which indicates the errors might not have constant variance (as you already saw with the BP test). Are you using any categorical variables? If so, it would probably help to plot the residuals using the categorical variable as a grouping variable (this gives the same plot, but uses a different symbol or color for each group). This way you can get an idea of why the heteroscedasticity is occurring (i.e., whether each group shows the same pattern or the groups just have different variances from one another).
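    The grouping idea can also be checked numerically. A minimal Python sketch, with hypothetical group labels and residuals: collect the residuals by category and compare their sample variances. Clearly different variances across groups would point to the second explanation (groups with different spreads) rather than a shared pattern within every group.

```python
import statistics
from collections import defaultdict


def residual_variance_by_group(groups, residuals):
    """Sample variance of the residuals within each category (needs 2+ residuals per group)."""
    buckets = defaultdict(list)
    for g, r in zip(groups, residuals):
        buckets[g].append(r)
    return {g: statistics.variance(rs) for g, rs in sorted(buckets.items())}


# Hypothetical dummy variable (0/1) and residuals from an already fitted model.
groups = [0, 0, 0, 1, 1, 1]
resid = [1.0, -1.0, 0.0, 3.0, -3.0, 0.0]
print(residual_variance_by_group(groups, resid))  # {0: 1.0, 1: 9.0}
```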

  18. #15

    Re: Normality


    Thanks, I will do that, but how do I get different symbols or colors?
