# Thread: linear discriminant analysis, coefficient interpretation

1. ## linear discriminant analysis, coefficient interpretation

Hi everyone,

I am trying to weigh the effect of two independent variables (age, gender) on a response variable (pass or fail on a maths test). Age is numeric (in years), and gender and pass/fail are both binary. My chosen method of analysis is linear discriminant analysis (LDA) using R.

The results I got from LDA in R are:

Coefficients of linear discriminants:
LD1
test$age    0.3805229
test$gender 0.8154062

With my rather poor knowledge about the LDA method, I am assuming that, although both independent variables are good predictors of passing the test, gender is a better one. Is this true?

I am rather lost on how to assess LDA results. Should the coefficients be close to 1 for me to accept that the x's are good predictors of y? I am unsure whether there is a threshold here.

On a separate (much larger) dataset, weighing similar effects, I got:

Coefficients of linear discriminants:
LD1
test$age     1.32
test$gender -0.21

I am clueless about how this should be interpreted.

Below is a summary of what I have been doing in R.

Many thanks in advance for any help on this.

Ramon

> test <- read.csv("ldatest.csv", stringsAsFactors = FALSE, strip.white = TRUE, na.strings = c("NA", ""))
> test
pass/fail age gender
1 0 20 0
2 0 21 0
3 0 22 0
4 0 23 0
5 0 24 0
6 0 25 0
7 0 26 0
8 0 27 0
9 0 28 0
10 0 29 0
11 0 30 0
12 1 31 0
13 1 32 0
14 1 33 0
15 1 34 0
16 1 35 0
17 0 20 1
18 0 21 1
19 0 22 1
20 0 23 1
21 0 24 1
22 0 25 1
23 0 26 1
24 0 27 1
25 1 28 1
26 1 29 1
27 1 30 1
28 1 31 1
29 1 32 1
30 1 33 1
31 1 34 1
32 1 35 1
>
> library(MASS)
> lda(test$pass.fail ~ test$age + test$gender)

Call:
lda(test$pass.fail ~ test$age + test$gender)

Prior probabilities of groups:
0 1
0.59375 0.40625

Group means:
test$age test$gender
0 24.36842 0.4210526
1 32.07692 0.6153846

Coefficients of linear discriminants:
LD1
test$age    0.3805229
test$gender 0.8154062

2. ## Re: linear discriminant analysis, coefficient interpretation

Hi parsec2011,

Your interpretation is correct. Fisher's discriminant function coefficients are proportional to the coefficients from a multiple regression with group membership as the dependent variable, so larger coefficients (further from zero) indicate stronger predictors, provided the predictors are on comparable scales. Just don't forget that discriminant analysis is focused on discrimination/classification, so it is usually not the best technique for testing which variables are most related to the response.

Now, regarding the change in coefficients, it could be due to a violation of the assumptions of LDA. There is a multivariate normality assumption on the independent variables. I know you have a binary variable, but discriminant models are quite robust against non-normality; I've read that some dichotomous predictors may be used without problems. There is also an assumption of a constant covariance matrix across the two groups. If that assumption is not met, a quadratic discriminant function is required instead of a linear one. I'd fit both models and assess the fit to find the one that produces the best results. By the way, the `qda()` function, also in the MASS library, fits quadratic discriminant functions.
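For instance, here is a quick sketch of the scale issue, reconstructing the 32-row toy dataset from the first post (`read.csv` turns the "pass/fail" header into the valid R name `pass.fail`):

```r
library(MASS)  # for lda()

# Reconstruction of the toy dataset from the first post
test <- data.frame(
  pass.fail = rep(c(0, 1, 0, 1), c(11, 5, 8, 8)),
  age       = c(20:35, 20:35),
  gender    = rep(0:1, each = 16)
)

# Raw coefficients depend on each predictor's scale...
fit.raw <- lda(pass.fail ~ age + gender, data = test)
fit.raw$scaling  # same coefficients as in the output above

# ...so standardize the predictors if you want to compare their contributions
test.std <- test
test.std[c("age", "gender")] <- scale(test.std[c("age", "gender")])
fit.std <- lda(pass.fail ~ age + gender, data = test.std)
fit.std$scaling  # now comparable across predictors
```

After standardizing, age (spread over 16 years) turns out to dominate, even though its raw coefficient is smaller: a one-unit step in a 0/1 variable is a much bigger move than a one-year step in age.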

Hope this helps

3. ## The Following User Says Thank You to terzi For This Useful Post:

parsec2011 (03-19-2013)

4. ## Re: linear discriminant analysis, coefficient interpretation

Hello terzi,

Your comments are very useful and will allow me to distinguish between linear and quadratic applications of discriminant analysis.
The thought hadn't crossed my mind, and I am grateful for your help.

After following up on the matter, I made some new findings, which I would like to share for anyone who might find them useful.

- In my previous post, I was missing what I call the predicted coefficients (scores) for each individual in the sample. These expected values are produced in R by the line `test.lda.values <- predict(test.lda, test[2:3])`, where `test.lda` holds the LDA coefficient results for the independent variables whose discriminant effect I want to examine against my dependent (y) variable, and `test[2:3]` is the independent-variable part of my Excel-made dataset, which starts in the second column and ends in the third, hence 2:3.

- Once I get the list of expected values, I need to convert them (because R presents them with mean = 0), which I have done in Excel by subtracting each expected value from the overall mean of the expected-values list, in a separate column. It is quite interesting to see how detailed discriminant analysis is, in the sense that you only need to check whether the values in the y column (the observed ones) correspond with the expected values. Quick inspection would of course not suffice to ascertain any result, so I ran a test comparing the expected values for the observations whose observed y is 0 against the overall mean of the expected values.

Using this method, I am able to present statistically significant results for the test and the predictive factors.
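As an aside, the Excel step is not strictly necessary: the score-versus-group comparison can be sketched directly in R (here using the toy data from the first post as a stand-in; `predict()` already returns LD1 scores centred on zero):

```r
library(MASS)

# Stand-in for the dataset from the first post
test <- data.frame(
  pass.fail = rep(c(0, 1, 0, 1), c(11, 5, 8, 8)),
  age       = c(20:35, 20:35),
  gender    = rep(0:1, each = 16)
)

fit    <- lda(pass.fail ~ age + gender, data = test)
scores <- predict(fit)$x[, "LD1"]  # one discriminant score per row, mean ~ 0

tapply(scores, test$pass.fail, mean)  # mean LD1 score per observed group
t.test(scores ~ test$pass.fail)       # two-sample comparison of the scores
```

Bear in mind the scores were constructed to separate the groups, so a significant difference here is descriptive rather than an independent confirmation; tabulating `predict(fit)$class` against the observed groups is a more honest summary of classification performance.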

If anyone would like to use this method, or encounters any difficulties with R, the following script provides a step-by-step process to get both the coefficients from LDA and the predicted values that need to be weighed against the observed ones.

library(MASS) # loads the library needed for linear discriminant analysis

options(max.print = 100000000) # raises R's print limit so large outputs are not truncated

test <- read.csv("ldatest.csv", stringsAsFactors = FALSE, strip.white = TRUE, na.strings = c("NA", "")) # gets rid of the annoying empty-cell issue and imports my Excel file in CSV format

test.lda <- lda(test$results ~ test$age + test$gender) # fits the LDA model

test.lda.values <- predict(test.lda, test[2:3]) # predicts the expected value (discriminant score) for each individual, using the two independent variables in columns 2:3

plot(test.lda) # shows the graph in which, by simple inspection, we can see whether there is any discriminant effect
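One caution on the script above: with `test$...` inside the formula, `predict()` cannot match the columns of the data you pass as its second argument and silently falls back to the original variables. A more robust sketch (same toy data as the first post standing in for the CSV) uses `data =` and bare column names:

```r
library(MASS)

# In real use: test <- read.csv("ldatest.csv", stringsAsFactors = FALSE,
#                               strip.white = TRUE, na.strings = c("NA", ""))
# (read.csv turns the "pass/fail" header into the valid name "pass.fail")
test <- data.frame(
  pass.fail = rep(c(0, 1, 0, 1), c(11, 5, 8, 8)),
  age       = c(20:35, 20:35),
  gender    = rep(0:1, each = 16)
)

# data= and bare names let predict() match newdata columns correctly
test.lda        <- lda(pass.fail ~ age + gender, data = test)
test.lda.values <- predict(test.lda, newdata = test[, c("age", "gender")])

head(test.lda.values$class)  # predicted group for each row
head(test.lda.values$x)      # LD1 score for each row
plot(test.lda)               # LD1 histograms by group
```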

If anyone would like to add to or comment on my post, please do so. While I have a good grounding in descriptive and parametric statistics, my non-parametrics is still under construction.

Further, the best reference I have found about linear discriminant analysis is:

Fisher's linear discriminant analysis, STAT3366: Applied Statistical Modelling, SMS, UWA, Semester 1, 2011 (PDF lecture notes).

A Little Book of R for Multivariate Analysis, by Avril Coghlan, Wellcome Trust Sanger Institute, Cambridge, U.K. (alc@sanger.ac.uk).

