
Thread: discriminant linear analysis, coefficient interpretation

#1 parsec2011 (Amsterdam, The Netherlands)

    discriminant linear analysis, coefficient interpretation

    Hi everyone,

    I am trying to weigh the effect of two independent variables (age, gender) on a response variable (pass or fail on a maths test). Age is numeric, while gender and pass/fail are both binary. My chosen method of analysis is linear discriminant analysis (LDA) using R.

    The results I got from LDA (linear discriminant analysis) in R are:

    Coefficients of linear discriminants:
    LD1
    test$age 0.3805229
    test$gender 0.8154062

    With my rather poor knowledge of the LDA method, I am assuming that, although both independent variables are good predictors of passing the test, gender is the better one. Is this true?

    I am rather lost as to how to assess LDA results. Should the coefficients be close to 1 for me to accept that the x's are good predictors of y? I am unsure whether there is a threshold here.

    On a separate, much larger dataset weighing similar effects, I got:

    Coefficients of linear discriminants:
    LD1
    test$age 1.32
    test$gender -0.21

    I am clueless about how this should be interpreted.

    Hereunder is a summary of what I have been doing in R.

    Many thanks in advance for any help on this.

    Ramon

    > test <-read.csv("ldatest.csv", stringsAsFactors = FALSE,strip.white = TRUE, na.strings = c("NA","") )
    > test
    pass.fail age gender
    1 0 20 0
    2 0 21 0
    3 0 22 0
    4 0 23 0
    5 0 24 0
    6 0 25 0
    7 0 26 0
    8 0 27 0
    9 0 28 0
    10 0 29 0
    11 0 30 0
    12 1 31 0
    13 1 32 0
    14 1 33 0
    15 1 34 0
    16 1 35 0
    17 0 20 1
    18 0 21 1
    19 0 22 1
    20 0 23 1
    21 0 24 1
    22 0 25 1
    23 0 26 1
    24 0 27 1
    25 1 28 1
    26 1 29 1
    27 1 30 1
    28 1 31 1
    29 1 32 1
    30 1 33 1
    31 1 34 1
    32 1 35 1
    >
    > library(MASS)
    > lda(test$pass.fail ~ test$age + test$gender)

    Call:
    lda(test$pass.fail ~ test$age + test$gender)

    Prior probabilities of groups:
    0 1
    0.59375 0.40625

    Group means:
    test$age test$gender
    0 24.36842 0.4210526
    1 32.07692 0.6153846

    Coefficients of linear discriminants:
    LD1
    test$age 0.3805229
    test$gender 0.8154062
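    The posted output can be reproduced with the formula-plus-data-frame interface, which sidesteps the invalid `pass/fail` column name entirely (a minimal sketch, reconstructing the 32-row dataset listed above; `read.csv` turns the header "pass/fail" into the valid name `pass.fail`):

    ```r
    library(MASS)  # provides lda()

    # Reconstruction of the 32-row dataset from the transcript above
    test <- data.frame(
      pass.fail = c(rep(0, 11), rep(1, 5), rep(0, 8), rep(1, 8)),
      age       = c(20:35, 20:35),
      gender    = rep(c(0, 1), each = 16)
    )

    fit <- lda(pass.fail ~ age + gender, data = test)
    fit$prior    # group priors: 19/32 = 0.59375 and 13/32 = 0.40625
    fit$scaling  # coefficients of linear discriminants (the LD1 column)
    ```

    Fitting with `data =` also keeps the model usable with `predict()` on new data frames later on.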

#2 terzi, TS Contributor (Mexico City, Mexico)

    Re: discriminant linear analysis, coefficient interpretation

    Hi parsec2011,

    Your interpretation is correct. Fisher's discriminant function coefficients are proportional to the coefficients from a multiple regression with group membership as the dependent variable, so yes, larger coefficients (further from zero) indicate stronger predictors. Just don't forget that discriminant analysis is focused on discrimination/classification, so it is usually not the best technique for testing which variables are most related to the response.

    Now, regarding the change in coefficients: it could be due to a violation of the assumptions of LDA. There is a multivariate normality assumption on the independent variables. I know you have a binary variable, but discriminant models are quite robust against non-normality, and I've read that some dichotomous predictors may be used without problems. There is also an assumption of a constant covariance matrix across the two groups. If that assumption is not met, a quadratic discriminant function is required instead of a linear one. I'd fit both models and assess the fit in order to find the one that produces the best results. By the way, the command qda, also in the MASS library, fits quadratic discriminant functions.
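    The fit-both-and-compare idea can be sketched as follows (synthetic data, since only the shape needs to match the thread's layout; on real data a held-out sample or `lda(..., CV = TRUE)` gives a fairer comparison than in-sample accuracy):

    ```r
    library(MASS)  # lda() and qda()

    set.seed(1)
    # Synthetic stand-in for the thread's data: binary response, two predictors
    test <- data.frame(
      pass.fail = rep(c(0, 1), each = 50),
      age       = c(rnorm(50, mean = 24, sd = 3), rnorm(50, mean = 32, sd = 3)),
      gender    = rbinom(100, size = 1, prob = 0.5)
    )

    fit.lda <- lda(pass.fail ~ age + gender, data = test)  # linear boundary
    fit.qda <- qda(pass.fail ~ age + gender, data = test)  # quadratic boundary

    # In-sample classification accuracy of each model
    mean(predict(fit.lda)$class == test$pass.fail)
    mean(predict(fit.qda)$class == test$pass.fail)
    ```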

    Hope this helps
    Statisticians are engaged in an exhausting but exhilarating struggle with the biggest challenge that philosophy makes to science: how do we translate information into knowledge?

The Following User Says Thank You to terzi For This Useful Post:

    parsec2011 (03-19-2013)

#3 parsec2011 (Amsterdam, The Netherlands)


    Re: discriminant linear analysis, coefficient interpretation


    Hello terzi,

    Your comments are very useful and will allow me to distinguish between linear and quadratic applications of discriminant analysis.
    The thought hadn't crossed my mind and I am grateful for your help.

    After doing some follow-up on the matter, I made some new findings, which I would like to share for anyone who might find them useful.

    -In my previous post, I was missing what I call the predicted coefficients for each individual in the sample. These expected values are printed by R using the script line test.lda.values <- predict(test.lda, test[2:3]), where test.lda holds the LDA coefficient results for the independent variables whose discriminant effect I would like to examine against my dependent (y) variable, and test[2:3] is the independent-variable part of my Excel-made dataset, which starts in the second column and ends in the third, hence 2:3.

    -Once I get the list of expected values, I need to recentre them (because R presents them with mean 0), which I have done in Excel by subtracting the overall mean of the expected values from each expected value, in a separate column. It is quite interesting to see how detailed discriminant analysis is, in the sense that you only need to check whether the values in the y column (the observed ones) correspond with the expected values. Quick inspection would of course not suffice to ascertain any results, so I have run a test comparing the expected values for the individuals with a 0 in the dependent-variable column against the mean of the expected values across all 0 responses in the y column.
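    The spreadsheet step can also be done directly in R: predict() returns one LD1 score per individual, already centred so the overall mean is approximately zero, and tapply() gives the per-group means (a sketch, again reconstructing the 32-row dataset from the first post):

    ```r
    library(MASS)

    # Same 32-row dataset as in the first post
    test <- data.frame(
      pass.fail = c(rep(0, 11), rep(1, 5), rep(0, 8), rep(1, 8)),
      age       = c(20:35, 20:35),
      gender    = rep(c(0, 1), each = 16)
    )

    fit    <- lda(pass.fail ~ age + gender, data = test)
    scores <- predict(fit, test)$x[, "LD1"]  # one discriminant score per row

    # Mean LD1 score within each observed group; the scores come out centred,
    # so no manual mean subtraction in a spreadsheet is needed
    tapply(scores, test$pass.fail, mean)
    ```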

    Using this method, I am able to present statistically significant results for the test and the predictive factors.

    If anyone would like to use this method, or encounters any difficulties with R, the following script provides a step-by-step process to obtain both the coefficients from LDA and the predicted values that need to be weighed against the observed ones.

    library(MASS) #loads the library needed for linear discriminant analysis

    options(max.print=100000000) #raises R's print limit so large datasets display in full

    test <- read.csv("ldatest.csv", stringsAsFactors = FALSE, strip.white = TRUE, na.strings = c("NA","")) #gets rid of the annoying empty-cell issue and imports my Excel file in csv format

    test.lda <- lda(test$results ~ test$age + test$gender) #fits the LDA model

    test.lda.values <- predict(test.lda, test[2:3]) #predicts the expected (LD1) value for each individual from the independent variables, which start in the second column

    plot(test.lda) #shows the graph in which, by simple inspection, we can see whether there is any discriminant effect
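    As a follow-up check on a fitted model like the one in the script, the predicted classes from predict() can be cross-tabulated against the observed ones (a sketch with a toy data frame; the column names results, age, gender follow the script above):

    ```r
    library(MASS)

    set.seed(42)
    # Toy data frame in the same shape as the script assumes
    test <- data.frame(
      results = rep(c(0, 1), each = 25),
      age     = c(rnorm(25, mean = 24, sd = 3), rnorm(25, mean = 32, sd = 3)),
      gender  = rbinom(50, size = 1, prob = 0.5)
    )

    test.lda        <- lda(results ~ age + gender, data = test)
    test.lda.values <- predict(test.lda, test[2:3])

    # Confusion table: rows are observed classes, columns predicted classes
    table(observed = test$results, predicted = test.lda.values$class)
    ```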

    If anyone would like to add to or comment on my post, please do so. While I have knowledge of descriptive and parametric statistics, my non-parametric statistics is still under construction.

    Further, the best references I have found about linear discriminant analysis are:

    Fisher's linear discriminant analysis (STAT3366: Applied Statistical Modelling 3S6, SMS, UWA, Semester 1, 2011): school.maths.uwa.edu.au/~nazim/3S6/.../LDA.pd...

    A Little Book of R for Multivariate Analysis, by Avril Coghlan, Wellcome Trust Sanger Institute, Cambridge, U.K.: little-book-of-r-for-multivariate-analysis.readthe...
