# Thread: calculation of the average leverage when predictor(s) is categorical

1. ## calculation of the average leverage when predictor(s) is categorical

Hello,
I was reading different resources about regression diagnostics, in particular for logistic regression.
As for leverage, the sources suggest looking for observations with higher-than-average leverage.

Now, what confuses me is how the mean leverage is calculated.
One source suggests: (k+1)/N,
where k = number of predictors and N = sample size.

My questions:
1) If one of the predictors is categorical, do we also have to count the levels of the categorical predictor in k?
2) Do we also have to count the intercept (I think not)?
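A quick check of what the "+1" counts, as a sketch on simulated data (the variables `x` and `y` and the coefficient 0.5 are made up for illustration). In R the average of `hatvalues()` equals the number of estimated coefficients divided by N, so it comes out as (k+1)/N when the model has an intercept and k/N when the intercept is removed.

```r
# Simulated data (hypothetical): one continuous predictor, binary outcome.
set.seed(1)
N <- 200
x <- rnorm(N)
y <- rbinom(N, size = 1, prob = plogis(0.5 * x))
k <- 1  # number of predictors, intercept not included

fit_int   <- glm(y ~ x,     family = binomial(link = "logit"))  # intercept + x
fit_noint <- glm(y ~ 0 + x, family = binomial(link = "logit"))  # x only

mean(hatvalues(fit_int))    # (k + 1) / N: the "+1" is the intercept column
mean(hatvalues(fit_noint))  # k / N once the intercept is dropped
```

So the intercept is already in the formula: it is the "+1", not part of k.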

As a practical example, given the dataset and model below, how would you calculate the average leverage?

```r
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)

head(mydata)
#   admit gre  gpa rank
# 1     0 380 3.61    3
# 2     1 660 3.67    3
# 3     1 800 4.00    1
# 4     1 640 3.19    4
# 5     0 520 2.93    4
# 6     1 760 3.00    2

fit <- glm(admit ~ gre + gpa + rank, data = mydata, family = binomial(logit))
```
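A self-contained sketch of the calculation for this model. So that it runs offline, it uses simulated data with the same column names as binary.csv; the values, the coefficients used to simulate `admit`, and n = 400 are assumptions made for illustration. With the real file, substitute `mydata` and `nrow(mydata)`. The count is taken from the fitted coefficients, so each non-reference level of `rank` contributes one column.

```r
# Simulated stand-in with the columns of binary.csv (admit, gre, gpa, rank);
# only the model structure matters here, not the values.
set.seed(3)
n <- 400
sim <- data.frame(
  gre  = round(rnorm(n, 580, 115), -1),
  gpa  = round(pmin(pmax(rnorm(n, 3.4, 0.38), 2), 4), 2),
  rank = factor(sample(1:4, n, replace = TRUE))
)
sim$admit <- rbinom(n, size = 1,
                    prob = plogis(-3 + 0.002 * sim$gre + 0.5 * sim$gpa))

fit <- glm(admit ~ gre + gpa + rank, data = sim, family = binomial(logit))

p <- length(coef(fit))  # (Intercept), gre, gpa, rank2, rank3, rank4 -> 6
k <- p - 1              # 5 predictor columns: rank counts as 3, not 1
c(by_formula = (k + 1) / n, by_hatvalues = mean(hatvalues(fit)))  # both 6/400
```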
What I am wondering is whether, when counting the number of predictors (i.e., determining k), we also have to count the levels of categorical predictors.
In other words, if we have 1 continuous predictor and 1 categorical predictor with 3 levels, would k be:

- 2 (i.e., 1 continuous predictor + 1 categorical predictor), or
- 3 (i.e., 1 continuous predictor + 2 dummy variables, that is, the 3 levels of the categorical predictor minus one for the reference level under dummy coding)?
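For the "1 continuous + 1 three-level categorical" case, the number of columns R actually fits can be read off the design matrix. A sketch on simulated data; the names `x`, `g`, `y` and the levels `a`, `b`, `c` are hypothetical.

```r
# Hypothetical data: one continuous predictor and one 3-level factor.
set.seed(2)
N <- 300
d <- data.frame(
  x = rnorm(N),
  g = factor(sample(c("a", "b", "c"), N, replace = TRUE))
)
d$y <- rbinom(N, size = 1, prob = plogis(0.3 * d$x))

# Design matrix: intercept, x, and 2 dummies for g
# (default treatment coding drops the reference level "a").
X <- model.matrix(~ x + g, data = d)
colnames(X)       # "(Intercept)" "x" "gb" "gc"
k <- ncol(X) - 1  # 3 non-intercept columns

fit <- glm(y ~ x + g, data = d, family = binomial(link = "logit"))
mean(hatvalues(fit))  # (k + 1) / N = 4 / 300
```

Under this sketch the factor counts as (levels - 1) columns, so k = 3 rather than 2.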

Thanks for any clarification
gm

2. ## Re: calculation of the average leverage when predictor(s) is categorical

I edited the second part of the question to make it (perhaps) clearer...

3. ## Re: calculation of the average leverage when predictor(s) is categorical

"Leverage measures the potential impact of an individual case on the results, which is directly proportional to how far an individual case is from the centroid in the space of the predictors. Leverage is computed as the diagonal elements, $h_{ii}$, of the "Hat" matrix, $\mathbf{H}$,
$$\mathbf{H} = \mathbf{X}^{*}(\mathbf{X}^{*\prime}\mathbf{X}^{*})^{-1}\mathbf{X}^{*\prime}$$
where $\mathbf{X}^{*} = \mathbf{V}^{1/2}\mathbf{X}$, and $\mathbf{V} = \operatorname{diag}\{\hat{P}(1-\hat{P})\}$. As in OLS, leverage values are between 0 and 1, and a leverage value $h_{ii} > 2k/n$ is considered "large"; $k$ = number of predictors, $n$ = number of cases."

Taken from: http://www.datavis.ca/courses/grcat/grc6.html
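The quoted hat matrix also explains where the average comes from. A short derivation, assuming $\mathbf{X}^{*}$ has $n$ rows and $p$ linearly independent columns, and using the cyclic property of the trace:

$$
\bar{h} = \frac{1}{n}\sum_{i=1}^{n} h_{ii} = \frac{1}{n}\operatorname{tr}(\mathbf{H})
= \frac{1}{n}\operatorname{tr}\!\left[(\mathbf{X}^{*\prime}\mathbf{X}^{*})^{-1}\mathbf{X}^{*\prime}\mathbf{X}^{*}\right]
= \frac{1}{n}\operatorname{tr}(\mathbf{I}_p) = \frac{p}{n}
$$

Multiplying the rows by $\mathbf{V}^{1/2}$ does not change the number of columns (every weight $\hat{P}(1-\hat{P})$ is positive), so $p$ is the number of columns of $\mathbf{X}$: the column of 1s for the intercept plus one column per continuous predictor and one per dummy variable. With an intercept, $p = k + 1$ when $k$ counts the non-intercept columns, a categorical predictor with $L$ levels contributing $L - 1$ of them.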

In my opinion, you would not include the intercept in the count, and yes, you would account for categorical predictors with 3 or more groups: each contributes (number of levels minus 1) predictors. So TS status (human, bot, raptor) would count as 2 predictors. I still regularly use SAS, so feel free to post R code for my edification.
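Since R code was asked for, here is a sketch that builds $\mathbf{H}$ directly from the quoted formula and compares it with R's built-in `hatvalues()`. Assumptions: simulated data with hypothetical variable names (`x`, `status`, `y`), and the IRLS working weights stored in the fitted `glm` are used as $\mathbf{V}$ (for a logit link they equal $\hat{P}(1-\hat{P})$ from the final iteration). The flagging cutoff uses p = number of coefficients including the intercept, i.e. twice the mean leverage; that is one convention, and the quoted 2k/n would use p - 1 in its place.

```r
# Hypothetical simulated data: one continuous predictor plus a 3-level
# factor whose levels mirror the TS status example (human, bot, raptor).
set.seed(4)
n <- 250
d <- data.frame(
  x      = rnorm(n),
  status = factor(sample(c("human", "bot", "raptor"), n, replace = TRUE))
)
d$y <- rbinom(n, size = 1, prob = plogis(0.4 * d$x))
fit <- glm(y ~ x + status, data = d, family = binomial(link = "logit"))

# H = X* (X*' X*)^-1 X*', with X* = V^(1/2) X and V taken as the IRLS
# working weights, which for a logit link are p_hat * (1 - p_hat).
X  <- model.matrix(fit)             # (Intercept), x, statushuman, statusraptor
w  <- weights(fit, type = "working")
Xs <- sqrt(w) * X                   # scales row i by sqrt(w[i])
H  <- Xs %*% solve(crossprod(Xs)) %*% t(Xs)
h  <- diag(H)

all.equal(unname(h), unname(hatvalues(fit)))  # agrees with R's built-in leverages

p <- ncol(X)                     # 4: intercept + x + 2 status dummies
mean(h)                          # p / n
flagged <- which(h > 2 * p / n)  # cases above twice the mean leverage
```

The status factor enters as two dummy columns (`bot` is the reference level because factor levels sort alphabetically), which matches counting it as 2 predictors.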

4. ## The Following User Says Thank You to hlsmith For This Useful Post:

gianmarco (09-30-2016)

5. ## Re: calculation of the average leverage when predictor(s) is categorical

I was just looking around. This thread looked interesting, and it is: informative. Nice post.
