
Thread: Multiple regression with non-normal distribution?

  1. #1

    Multiple regression with non-normal distribution?




    I have some data that I would like to analyse by using multiple regression, as I am interested in the predictive ability of a model (variables influencing the likelihood of being bullied). I have just completed an initial dataset and have been looking at some of the possible variables.

    Unfortunately, three of my main variables are far from being normally distributed. These are questionnaire totals (e.g. on bullying [DV], where most children are not bullied, so there is a very strong positive skew). This will not be the case with other data, which is likely to be more normally distributed (e.g. academic data).

    My question is whether there is a way of having a predictive model that can cater for a number of seriously skewed variables, while others are not. I understand that it is possible to transform data, but wonder if I then need to transform all variables (even those with a normal distribution) in order to keep consistency across variables. Or am I going to be limited to non-parametric correlations?

    All advice gratefully received - I am pretty new to this so the learning curve is steep.
    Many thanks in advance

  2. #2
    Dason

    Are you saying the predictor variables are non-normal or that the response is non-normal?

  3. #3

    The distribution is non-normal, i.e. there is no "bell curve": for the three questionnaire variables the scores are piled up on the left, with a long tail to the right (positive skew). One (bullying) is the DV and the other two are IVs (I'm looking at totals for the 3 questionnaire sections, so it is the distribution of the response totals that is non-normal). Hope I've not totally misunderstood something here!

  4. #4
    Dason

    Well, typically we don't actually care whether the DV or the IVs are normally distributed. What we care about is whether the residuals are normally distributed. (Technically it's the 'error' term that we want to be normally distributed, but we can never observe that directly, so we use the residuals as an adequate substitute for assessing whether the assumption is met.)

    So it's hard to say before you actually do the regression if you'll have a problem with something being 'non-normal'.
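
    To make the point above concrete, here is a minimal sketch in Python (standard library only). It is not the poster's data: the exponential predictor, the coefficients 1.0 and 2.0, and the helper names `skewness` and `fit_line` are all assumptions chosen for illustration, and it uses a single predictor to keep the arithmetic short. The predictor and the response are both strongly skewed, yet the residuals are not, because the errors were generated as normal.

```python
import random
from statistics import mean


def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the variance."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(42)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical data: a strongly right-skewed predictor, normal errors.
x = [rng.expovariate(1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"skewness of predictor: {skewness(x):.2f}")
print(f"skewness of response:  {skewness(y):.2f}")
print(f"skewness of residuals: {skewness(residuals):.2f}")
```

    The comparison to look at is the last printed line against the first two: the skew in the variables does not, by itself, show up in the residuals.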

  5. The Following User Says Thank You to Dason For This Useful Post:

    judith_sw (06-18-2011)

  6. #5

    I've had a go at a regression ... what I was worried about was that any results would not be valid if there was a non-normal distribution at the outset - so thanks for that! Is there a paper that could be referenced for that?

    That then brings me on to stage 2: in terms of the regression, what should I be looking out for? I pulled it up again (I'm using SPSS) and worked through an example using the Pallant text. So far I can see:
    Multicollinearity = OK. Tolerance and VIF are fine; the DV correlates with both IVs, and the IVs are not too highly correlated with each other
    P-P plot = points close to, but not exactly on, the line
    Scatterplot = mostly clustered around the middle, but with one "strange"-looking trend
    Everything else looks OK: both IVs make a significant contribution, one much more than the other, as expected. I've attached it, just in case anyone has time to look at the "strange" plot! Prelim.doc
    Other IVs that I will be using at a later stage will be less problematic (I hope). I note in the Pallant SPSS Survival Manual that this non-normal distribution is common in the social sciences ... but what to do about it seems less straightforward. I have a good sample size and an interesting area (education and special educational needs).

    Many thanks for help and advice - much appreciated
    Last edited by judith_sw; 06-18-2011 at 05:41 PM.
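
    For readers wondering what the tolerance and VIF figures in the post above measure, here is a small Python sketch for the two-IV case. With exactly two predictors, the R-squared from regressing one IV on the other equals their squared Pearson correlation, so tolerance is 1 - r^2 and VIF is its reciprocal. The simulated data and the helper names `pearson_r` and `tolerance_and_vif` are illustrative assumptions, not the poster's SPSS output.

```python
import random
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def tolerance_and_vif(iv1, iv2):
    """Tolerance and VIF for a model with exactly two predictors."""
    r = pearson_r(iv1, iv2)
    tolerance = 1.0 - r ** 2  # share of an IV's variance not explained by the other IV
    return tolerance, 1.0 / tolerance


rng = random.Random(0)
# Hypothetical IVs with a moderate correlation between them.
iv1 = [rng.gauss(0.0, 1.0) for _ in range(500)]
iv2 = [0.5 * a + rng.gauss(0.0, 1.0) for a in iv1]

tolerance, vif = tolerance_and_vif(iv1, iv2)
print(f"tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
```

    With more than two IVs the same idea applies, but each IV's R-squared comes from regressing it on all of the other IVs rather than from a single correlation.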

  7. #6
    Dason

    Yeah, your residual-by-predicted plot doesn't look too good. It looks like the response is bounded below. Can you describe the response a little bit more?
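
    One common way a response that is bounded below shows up in a residual-by-predicted plot can be sketched in Python. This is a simulation under assumptions (a hypothetical latent score floored at zero, one predictor, the helper name `fit_line`), not a reconstruction of the attached plot. Every case sitting at the floor has residual = 0 - fitted, so those cases fall on a straight line with slope -1, and no residual can fall below that line.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(1)
n = 1000

# Hypothetical setup: an unobserved latent score, recorded with a floor at 0.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
latent = [xi + rng.gauss(0.0, 1.0) for xi in x]
y = [max(0.0, v) for v in latent]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Cases at the floor: residual + fitted is exactly zero for each of them.
at_floor = [(f, r) for yi, f, r in zip(y, fitted, residuals) if yi == 0.0]
share_at_floor = len(at_floor) / n
max_gap = max(abs(r + f) for f, r in at_floor)

print(f"share of cases at the floor: {share_at_floor:.2f}")
print(f"largest |residual + fitted| among floor cases: {max_gap:.2e}")
```

    Plotting `residuals` against `fitted` for this simulation gives a sharp diagonal lower edge, which is the kind of pattern that suggests a floor in the response.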

  8. #7

    The DV and the 2 IVs are total scores on a questionnaire. Each response was scored 0, 1, 2 or 3 (Likert scale) and a total score generated, but due to the nature of the topic there is a real skew in how people answer (i.e. for bullying, most are not bullied, so there are lots of totals of 0; it is similar for behaviour; the reverse is true for positive relationships).

    This could be of interest: http://gradworks.umi.com/32/43/3243035.html
    And this, but it is getting beyond me: (wileyonlinelibrary.com) DOI: 10.1002/sim.4155

    It's getting late here now, so I will have to turn in. Thank you for the advice so far; if there are any other pointers you can add, I would be very grateful. I'm seeing my supervisor on Monday, so the more info I have the better. I am prepared to put in whatever work is necessary to expedite the analysis, but I can see that analysing this dataset could be a lengthy and complex process. This is for a preliminary (tentative) analysis to be presented at a conference soon, but the full dataset will be compiled in August. Looks like the steep learning curve will continue.
    Last edited by judith_sw; 06-18-2011 at 06:35 PM.

  9. #8


    Hi,
    Any further comments on the viability of my non-normal variables? I can run correlations using a non-parametric test, but would very much like to do a full regression if there is any way of doing it in a robust manner. If I go down the route of transforming my skewed variables, does it then mean that I have to transform all of my variables in the same way to keep consistency?
    Thank you.
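
    On the consistency question, a transformation is applied variable by variable, so nothing in the regression itself requires the same transformation on every variable. The Python sketch below illustrates this with made-up data: a hypothetical log-normal IV is transformed with `log1p`, while a roughly normal IV is left alone. The variable names and the choice of `log1p` are assumptions for illustration, not a recommendation for this dataset. Note also that a monotone transformation cannot spread out a pile of identical zero totals, so it would not remove the floor described earlier in the thread.

```python
import math
import random
from statistics import mean


def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the variance."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(7)
n = 2000

# Hypothetical IVs: one strongly right-skewed, one roughly normal.
skewed_iv = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
normal_iv = [rng.gauss(50.0, 10.0) for _ in range(n)]

# Transform only the skewed variable; the other is used as it is.
transformed_iv = [math.log1p(v) for v in skewed_iv]

print(f"skewed IV, raw:         {skewness(skewed_iv):.2f}")
print(f"skewed IV, log1p:       {skewness(transformed_iv):.2f}")
print(f"normal IV, untouched:   {skewness(normal_iv):.2f}")
```

    The trade-off is in interpretation: a coefficient on a transformed IV describes change per unit of the transformed scale, so each transformed variable needs its own wording when results are reported.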
