
Thread: Multivariate normality

  1. #46 (noetsi)

    Re: Multivariate normality

    An interesting approach to outliers for logistic regression

    First, we run a baseline model including all cases
    Second, we run a model excluding outliers (whose standardized residual is greater than 3.0 or less than -3.0) and influential cases (whose Cook's distance is greater than 1.0).
    If the model excluding outliers and influential cases has a classification accuracy rate that is better than that of the baseline model, we will interpret the revised model. If the revised model's accuracy rate is less than 2% higher than the baseline's, we will interpret the baseline model.
    I assume that when they talk about classification accuracy they mean the Hosmer-Lemeshow goodness-of-fit test, although I am not sure.
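    For concreteness, here is a minimal sketch of that exclude-and-refit procedure using statsmodels (the data frame df and the column names y, x1, x2 are placeholders, not from the original source):

    Code:
    import numpy as np
    import statsmodels.api as sm

    # df: a pandas DataFrame with binary outcome "y" and predictors "x1", "x2" (placeholder).
    def accuracy(res, X, y, cutoff=0.5):
        """Proportion of cases classified correctly at the given probability cutoff."""
        pred = (res.predict(X) >= cutoff).astype(int)
        return float((pred == y).mean())

    X = sm.add_constant(df[["x1", "x2"]])
    y = df["y"]

    # 1. Baseline model including all cases.
    baseline = sm.GLM(y, X, family=sm.families.Binomial()).fit()

    # 2. Drop outliers (|standardized residual| > 3) and influential cases
    #    (Cook's distance > 1) identified from the baseline fit.
    infl = baseline.get_influence()
    keep = (np.abs(infl.resid_studentized) <= 3) & (infl.cooks_distance[0] <= 1)
    revised = sm.GLM(y[keep], X[keep], family=sm.families.Binomial()).fit()

    # 3. Interpret the revised model only if its accuracy beats the baseline's
    #    by at least 2 percentage points.
    gain = accuracy(revised, X[keep], y[keep]) - accuracy(baseline, X, y)
    print("interpret revised model" if gain >= 0.02 else "interpret baseline model")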
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  2. #47 (Dason)

    Re: Multivariate normality

    I'm guessing by classification accuracy they mean that you predict the outcome to be 0 if the predicted probability is less than .5 and predict it to be 1 if the predicted probability is >= .5 (I guess one could toy with the cutoff to find the optimal accuracy). Then you can look at the proportion of cases you correctly predicted - call that your classification accuracy.
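    For example, the accuracy calculation (and the optional cutoff search) is only a few lines; p and y here are placeholder names for the fitted probabilities and the observed 0/1 outcomes:

    Code:
    import numpy as np

    # p: fitted probabilities from the logistic model; y: observed 0/1 outcomes.
    def classification_accuracy(p, y, cutoff=0.5):
        pred = (p >= cutoff).astype(int)   # predict 1 iff the probability is >= cutoff
        return float(np.mean(pred == y))   # proportion of cases predicted correctly

    # Optionally scan cutoffs to find the one with the best in-sample accuracy.
    cutoffs = np.linspace(0.05, 0.95, 19)
    best = max(cutoffs, key=lambda c: classification_accuracy(p, y, c))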
    I don't have emotions and sometimes that makes me very sad.

  3. #48 (noetsi)

    Re: Multivariate normality

    Probably. Do you think that is a reasonable approach?
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  4. #49 (noetsi)

    Re: Multivariate normality

    An interesting take on outlier analysis, similar I think to what Lazar suggested....

    The Figure is a residual plot for the adjusted model. The horizontal axis shows the predicted probability of angina for each observation; the vertical axis shows the Pearson residual. The size of the plotted circle is proportional to the Cook’s distance for the observation. The higher curve is of subjects who developed angina, and the lower curve is of subjects who did not. Because the number of subjects who developed angina is smaller, their observations are generally more influential, and their circles tend to be larger. From the Figure, we can identify several possible problems. First, there are 2 observations with predicted probabilities of angina between 0.75 and 0.80. These come from 2 subjects with unusually high cholesterol values (600 and 696 mg/dL). The subject with 696 mg/dL did not develop angina, making a rather poor fit to the model and the most influential observation in these data, shown by having the largest circle. There are also subjects who developed angina despite having a very low predicted probability in the model. The low predicted probabilities for these subjects were primarily due to low cholesterol values. The mismatch between the observed angina rates and low predicted probability of angina in the regression model for these subjects creates large residuals, and these are the points in the upper left region of the Figure. A substantial number of these subjects have residual values >3 and might be considered outliers.
    They looked for outliers and then tried to find what was unusual about them (or rather why they were unusual).

    http://circ.ahajournals.org/content/117/18/2395.full

    They then suggest a sensitivity analysis by removing the points associated with the features that make the outliers unusual [such as the unusual cholesterol values] and seeing what that does to the regression results.
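    A rough sketch of that kind of plot, assuming a fitted statsmodels binomial GLM called fit (the names here are placeholders, not taken from the article):

    Code:
    import matplotlib.pyplot as plt

    # fit: a fitted statsmodels binomial GLM (placeholder name).
    p_hat = fit.predict()                           # predicted probability for each subject
    resid = fit.resid_pearson                       # Pearson residuals
    cooks = fit.get_influence().cooks_distance[0]   # Cook's distance per observation

    # Circle area scales with Cook's distance, as in the article's figure.
    plt.scatter(p_hat, resid, s=2000 * cooks, alpha=0.5, edgecolors="k")
    plt.axhline(3, linestyle="--", color="gray")    # rough outlier threshold
    plt.xlabel("Predicted probability of angina")
    plt.ylabel("Pearson residual")
    plt.show()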
    Last edited by noetsi; 07-08-2013 at 03:48 PM.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  5. #50 (noetsi)

    Re: Multivariate normality

    From two of Dason's favorite authors, Tabachnick and Fidell. It's on page 74, which deals with data cleanup.

    "Transformations are undertaken to improve the normality of the distributions and to pull univariate outliers closer to the center of the distribution, therefore reducing their impact. Transformations, if acceptable, are undertaken prior to the search for multivariate outliers because the statistics used to reveal them [multivariate outliers] (Mahalanobis distance and its variants) are also sensitive to failure of normality."

    Transformations, such as logging, are therefore tied primarily to univariate analysis (which assumes, of course, that univariate normality matters; otherwise you would not transform based on it), which is pretty common in texts.

    This comes from the 5th edition of "Using Multivariate Statistics." The fact that it is in its fifth edition suggests it is pretty popular with professors, who drive much of the textbook market.

    They go on to say on p. 87: "With almost every data set in which we have used transformations, the results of analysis have been substantially improved. This is particularly true when some variables are skewed and others are not, or variables are skewed very differently prior to transformation."

    So even when you are not required to deal with normality, they feel it improves the results to do so (and again, their transformations are tied to univariate analysis, not multivariate analysis, it would appear).
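    A small sketch of that ordering with numpy/scipy: transform the badly skewed columns first, then screen for multivariate outliers with Mahalanobis distance (the data here are made up for illustration):

    Code:
    import numpy as np
    from scipy import stats

    # Made-up data: one skewed and one roughly normal variable, 200 cases each.
    rng = np.random.default_rng(0)
    X = np.column_stack([rng.lognormal(size=200), rng.normal(size=200)])

    # 1. Log-transform badly skewed columns, pulling univariate outliers
    #    toward the center of the distribution.
    skewed = np.abs(stats.skew(X, axis=0)) > 1
    X_t = X.copy()
    X_t[:, skewed] = np.log1p(X_t[:, skewed])

    # 2. Then look for multivariate outliers with Mahalanobis distance,
    #    which is itself sensitive to non-normality.
    diff = X_t - X_t.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X_t, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)       # squared distances
    outliers = d2 > stats.chi2.ppf(0.999, df=X_t.shape[1])   # p < .001 criterion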
    Last edited by noetsi; 07-09-2013 at 01:53 PM.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  6. #51
    Points: 25, Level: 1
    Level completed: 49%, Points required for next Level: 25

    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Multivariate normality


    Take a look at Henze-Zirkler's Multivariate Normality Test.
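    One readily available implementation is in the pingouin package, assuming it is installed (the data here are made up for illustration):

    Code:
    import numpy as np
    import pingouin as pg

    # Made-up data: 100 observations of 3 variables.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))

    # Henze-Zirkler test; the null hypothesis is multivariate normality.
    hz, pval, normal = pg.multivariate_normality(X, alpha=0.05)
    print(hz, pval, normal)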
