I edited the second part of the question to make it (perhaps) more clear....
Hello,
I have been reading different resources about regression diagnostics, in particular for logistic regression.
As for leverage, the sources suggest looking for observations with higher-than-average leverage.
Now, where I am confused is how the mean leverage is calculated.
One source suggests: (k+1)/N
where k = number of predictors, N = sample size
My questions:
1) if one of the predictors is categorical, do we also have to count the levels of the categorical predictor in k?
2) do we also have to count the intercept (I think not)?
As for a practical example, given the dataset and the model below, how would you calculate the average leverage?
What I am wondering is whether, when counting the number of predictors (i.e., determining k), we also have to count the number of levels of categorical predictors.
Code:
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
> head(mydata)
  admit gre  gpa rank
1     0 380 3.61    3
2     1 660 3.67    3
3     1 800 4.00    1
4     1 640 3.19    4
5     0 520 2.93    4
6     1 760 3.00    2
fit <- glm(admit ~ gre + gpa + rank, data=mydata, family=binomial(logit))
In other words, if we have 1 continuous predictor and 1 categorical predictor with 3 levels, k would be:
2 (i.e., 1 continuous predictor + 1 categorical predictor)
or
3 (i.e., 1 continuous predictor + 2 [i.e., the levels of the categorical predictor minus one, due to dummy coding])?
Thanks for any clarification
gm
http://cainarchaeology.weebly.com/
"leverage Measures the potential impact of an individual case on the results, which is directly proportional to how far an individual case is from the centroid in the space of the predictors. Leverage is computed as the diagonal elements, $h_{ii}$, of the "Hat" matrix, $\mathbf{H}$,
$$\mathbf{H} = \mathbf{X}^{*} (\mathbf{X}^{*\prime} \mathbf{X}^{*})^{-1} \mathbf{X}^{*\prime}$$
where $\mathbf{X}^{*} = \mathbf{V}^{1/2} \mathbf{X}$, and $\mathbf{V} = \operatorname{diag}\{\hat{P}(1 - \hat{P})\}$. As in OLS, leverage values are between 0 and 1, and a leverage value, $h_{ii} > 2k/n$ is considered "large"; $k$ = number of predictors, $n$ = number of cases."
Taken from: http://www.datavis.ca/courses/grcat/grc6.html
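The mean leverage follows from the hat matrix in the quote by taking its trace. This is a short sketch, assuming the design matrix $\mathbf{X}$ has full column rank; the symbol $p$ (the number of columns of $\mathbf{X}$, i.e. the number of estimated coefficients including the intercept) is introduced here and is not part of the quoted source.

```latex
\[
\sum_{i=1}^{n} h_{ii}
  = \operatorname{tr}(\mathbf{H})
  = \operatorname{tr}\!\left(\mathbf{X}^{*}(\mathbf{X}^{*\prime}\mathbf{X}^{*})^{-1}\mathbf{X}^{*\prime}\right)
  = \operatorname{tr}\!\left((\mathbf{X}^{*\prime}\mathbf{X}^{*})^{-1}\mathbf{X}^{*\prime}\mathbf{X}^{*}\right)
  = \operatorname{tr}(\mathbf{I}_{p})
  = p,
\qquad\text{so}\qquad
\bar{h} = \frac{1}{n}\sum_{i=1}^{n} h_{ii} = \frac{p}{n}.
\]
```

Under dummy coding, a factor with c levels contributes c - 1 columns to $\mathbf{X}$. With 1 continuous predictor and 1 three-level factor, the columns are the intercept, the continuous predictor, and 2 dummies, so p = 4. That matches (k+1)/N with k = 3.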
In my opinion, you would not include the intercept in the count, and yes, you would account for the levels of categorical predictors with 3 or more groups. So TS status (human, bot, raptor) would count as 2 predictors. I am still regularly using SAS, so feel free to post R code for my edification.
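Since R code was asked for, here is a minimal sketch that checks the counting rule numerically. It uses simulated data rather than the UCLA file (the variable names x, g, y and the simulation coefficients are made up for illustration), with 1 continuous predictor and 1 three-level factor. It relies on hatvalues() from base R's stats package, which for a glm fit returns the diagonal of the weighted hat matrix.

```r
# Simulated data (hypothetical, not the UCLA admissions file):
# one continuous predictor and one factor with 3 levels.
set.seed(1)
n <- 200
x <- rnorm(n)
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x + 0.6 * (g == "b")))

fit <- glm(y ~ x + g, family = binomial(link = "logit"))

h <- hatvalues(fit)      # leverage of each case
p <- length(coef(fit))   # intercept + x + 2 dummies for g = 4 coefficients

# The mean leverage equals (number of coefficients) / n
mean_leverage <- mean(h)
isTRUE(all.equal(mean_leverage, p / n))

# Cases above twice the mean leverage (a common rule of thumb)
high_leverage <- which(h > 2 * p / n)
```

For the model in the question, the same check would compare mean(hatvalues(fit)) with length(coef(fit)) / nrow(mydata). In other words, k counts the dummy columns (levels minus one), and the "+1" in (k+1)/N is the intercept.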
Stop cowardice, ban guns!
gianmarco (09-30-2016)
I was just looking here and there. This thread looked interesting, and it is. Informative. Nice post.