# Thread: Article on logistic regression - don't understand logic of statement.

1. ## Article on logistic regression - don't understand logic of statement.

The author, Menard, has written a good deal on this topic, and this is from a recent (2011) journal article. This makes sense:

According to Chebychev's inequality theorem (Bohrnstedt and Knoke 1988), for any distribution, even for a very non-normal distribution, at least 93.75 percent of all cases will lie within eight standard deviations of the mean and 96 percent will lie within 10 standard deviations.
But I don't understand the leap in the section below. How do you go from the statement above to the view that one standard deviation contains 10 to 15 percent of the data? For one thing, the amount of data in a given deviation differs. In a normal distribution the first deviation contains about 34 percent of the data (on one side of the mean), while the second contains only about 13.5 percent. The paper deals with standardized logistic coefficients, where the units are standard deviations, so maybe my logic is wrong.

Thus, regardless of the distribution of a predictor, we can count on a one standard deviation difference or change being typically 10-15 percent of the total range of the independent variable.

2. ## Re: Article on logistic regression - don't understand logic of statement.

When I saw this I thought the authors badly mangled Chebyshev's inequality. So I checked out the paper you're talking about, because I thought there might be more context around the quotes that would explain this... but there wasn't...

However, I think they used Chebyshev's inequality correctly but just did a piss-poor job of writing what they meant. Chebyshev's inequality tells us that 93.75% of the data will fall within 4 standard deviations of the mean (not the 8 in the paper) and 96% will fall within 5 standard deviations of the mean (not the 10 in the paper). Notice that they incorrectly doubled the numbers. I think they meant to say that 93.75% of the observations fall within 4 standard deviations of the mean - which implies that over a range of 8 standard deviations we observe 93.75% of the data, and over a range of 10 standard deviations we observe 96% of the data. So if we treat those ranges as essentially containing the entire data, then without knowing which particular standard deviation we're talking about we might expect between 1/10 and 1/8 of the observations to fall within any specific one. This gets us sort of close to what they wrote, because it gives 10-12.5% (not the 10-15% reported in the paper).
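The arithmetic above is easy to check directly. A minimal Python sketch of Chebyshev's bound, P(|X − μ| < kσ) ≥ 1 − 1/k², and the rough per-SD share derived from it:

```python
def chebyshev_lower_bound(k: float) -> float:
    """Chebyshev's inequality: at least 1 - 1/k^2 of any distribution
    lies within k standard deviations of the mean."""
    return 1 - 1 / k**2

print(chebyshev_lower_bound(4))  # 0.9375 -> 93.75% within +/-4 SD, an 8-SD range
print(chebyshev_lower_bound(5))  # 0.96   -> 96% within +/-5 SD, a 10-SD range

# Treating those ranges as "essentially all" of the data gives the
# rough per-SD share described above: between 1/10 and 1/8.
print(1 / 10, 1 / 8)  # 0.1 0.125
```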


4. ## Re: Article on logistic regression - don't understand logic of statement.

When you say 4 standard deviations of the mean, do you mean four to the left and four to the right (which would actually be 8), or just two to the left and two to the right (which would be four)? I assume the latter from your comments - it seems strange that a well-known expert missed this, or for that matter that the editor and the peer review panel did.

5. ## Re: Article on logistic regression - don't understand logic of statement.

I thought this statement interesting

3. Is the impact of X on Z greater than (or less than) the impact of Y on Z in both (or all) of two (or more) samples or populations?
I thought the answer was that, if the variance of the samples differs, you could not do this whether you used standardized or unstandardized coefficients.

but I was wrong (shock)

the answer may not be as obvious, but because our focus is not on the precise value of a single parameter, but rather on the comparison of two different parameters (again, with two different predictors measured on two different scales), the unstandardized coefficients fail to tell us which of the two predictors has the stronger influence on the dependent variable in either sample. Comparing the relative strength of the two variables, even across samples, requires the use of standardized coefficients.
I never even knew there was a rule

When deciding whether to focus on standardized or unstandardized coefficients, as a general guideline, the rule that standardized coefficients
should be used for comparing the effects of different predictors measured on different scales takes precedence over the rule that unstandardized coefficients should be used for comparisons of predictors across different samples or populations.

6. ## Re: Article on logistic regression - don't understand logic of statement.

Originally Posted by noetsi
When you say 4 standard deviations of the mean, do you mean four to the left and four to the right (which would actually be 8), or just two to the left and two to the right (which would be four)? I assume the latter from your comments - it seems strange that a well-known expert missed this, or for that matter that the editor and the peer review panel did.
No I mean the first one. That's the common interpretation of "within k standard deviations of the mean". This makes sense. For example pretend I have a mean 0, standard deviation 1 random variable. Does the value 4 fall "within 4 standard deviations of the mean"? Yes it does. How about the value -4? Yes, that one does too. So sure it's defining something that has a total length of 8 standard deviations.

7. ## Re: Article on logistic regression - don't understand logic of statement.

So aren't his comments on 8 standard deviations the same as yours on 4, if by 8 he means four standard deviations to the left and right? That is, when he says 8 standard deviations I think he means four to the left and four to the right for a total of 8 (or maybe that is what you meant by poor writing).

If by
at least 93.75 percent of all cases will lie within eight standard deviations of the mean
he means four to the right and four to the left (which is 8 total) isn't that the same thing you mean by four standard deviations?

8. ## Re: Article on logistic regression - don't understand logic of statement.

Yes I guess that is my complaint. That isn't an accurate description of the region he is describing. It's misleading. If he wanted to emphasize the length of 8 standard deviations he should have said that instead of the incorrect way he phrased it.

Which is essentially what I was saying - I thought he mangled Chebyshev's theorem because he did a poor, misleading job of expressing what he actually meant...

9. ## Re: Article on logistic regression - don't understand logic of statement.

Ok, I misunderstood your comments. I am surprised no one at the journal raised this point (or maybe they did and his reputation was great enough to get away with it). He appears to be one of the most important figures in standardized logistic regression coefficients, and wrote one of the Sage monographs on logistic regression.

My sig is a short portion of one of his comments (all that would fit). I have not a clue in the world what it means (especially the part about kurtosis).

10. ## Re: Article on logistic regression - don't understand logic of statement.

I'm wondering how closely they looked at that one statement. I mean it raised my eyebrows because I've used Chebyshev's theorem enough times to know that those values seemed low for "within 8 standard deviations". But if you don't really deal with that and you're reviewing the paper, you might just assume they got that minor detail right.

And kurtosis is sort of a measure of how heavy a distribution's tails are - it's often loosely described as how "peaked" or "pointy" the distribution is.

11. ## Re: Article on logistic regression - don't understand logic of statement.

Do you want to post the citation for the article?

Also, I agree that typically many people make statements such as within 2 standard deviations, but actually mean within +/- 2 standard deviations.

12. ## Re: Article on logistic regression - don't understand logic of statement.

I did not think anyone would be interested or I would have. Since Dason read it, I guess that was a bad assumption on my part...

The citation is

Menard, Scott. 2011. "Standards for Standardized Logistic Regression Coefficients." Social Forces 89(4):1409-1428. (I know that is not APA style, but I left my copy somewhere.)

It's actually an interesting article. He discusses the problems with standardized logistic regression coefficients, noting...

In logistic regression, the calculation of standardized coefficients is complicated by the fact that it is not the value of Y, but the probability that Y has one or the other of its possible values, that is predicted by the logistic regression equation. The actual dependent variable in logistic regression is not Y, but logit(Y), whose observed values (for each single case) of logit(0) = -∞ and logit(1) = +∞ do not permit the calculation of means or standard deviations. Solutions to this problem have been (1) to ignore the variance in logit(Y), (2) to make up a number for the variance in logit(Y), and (3) to estimate the variance in logit(Y). The first two solutions lead to partially standardized logistic regression coefficients, which do provide the rank ordering of the strengths of the relationships of the predictors to the outcome, but cannot otherwise be interpreted or used in the same way as standardized coefficients in multiple regression.

The last solution produces a fully standardized logistic regression coefficient, which can also provide a rank ordering of the strengths of the relationships of the predictors to the outcome, and which, at least for one such coefficient, can also be used and interpreted in exactly the same way as a standardized coefficient in multiple regression.
1415-6 (as far as I have gotten). He argues there is now a consensus on the best way to generate a standardized logistic regression coefficient - something that certainly has not been the case historically...

13. ## Re: Article on logistic regression - don't understand logic of statement.

For those of us who use SAS...

A third approach, implemented in SAS PROC LOGISTIC (SAS Institute 1995; see also Allison 1999; Hilbe 2009) and presented as the "Standardized Estimate" in SAS output, takes b*a and divides by the standard deviation of the standard logistic distribution, π/SQRT(3) [where "SQRT(3)" is the square root of 3], regardless of the actual variation in the dependent variable, to obtain b*.
Essentially this takes the coefficient times the predictor's standard deviation and divides by the standard deviation of the standard logistic distribution (which is about 1.814). This is what Menard refers to above as a partially standardized regression coefficient.

One could argue that one is assuming a latent variable with a standard logistic distribution [regardless of what the distribution of the observed dependent variable would be]... Dividing by π/SQRT(3) adds no real information compared to b*a. Its sole virtue is that it is readily available in SAS output, and it produces exactly the same rank ordering as b*a.
Allison, in his comment on this, notes that it does not really matter what you divide by [given the SAS logic]. I am not really sure why they do the division at all, given that they are effectively dividing by a constant - unless you can reasonably assume the latent variable (if one exists) always has a logistic distribution. Which seems a bit unlikely.
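To make the dividing-by-a-constant point concrete, here is a small Python sketch of the SAS-style computation. The slopes and predictor SDs are invented for illustration; only the π/SQRT(3) divisor comes from the quoted passage:

```python
import math

# SD of the standard logistic distribution, pi/sqrt(3) (~1.814).
SD_STD_LOGISTIC = math.pi / math.sqrt(3)

# Hypothetical (b, s_x) pairs for two predictors - made-up numbers.
coefs = {"x1": (0.9, 1.2), "x2": (0.2, 10.0)}

for name, (b, s_x) in coefs.items():
    partially_std = b * s_x / SD_STD_LOGISTIC
    print(name, round(partially_std, 3))

# Dividing every b*s_x by the same positive constant cannot change
# their rank order, which is why the division adds no real information.
```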

14. ## Re: Article on logistic regression - don't understand logic of statement.

When deciding whether to focus on standardized or unstandardized coefficients, as a general guideline, the rule that standardized coefficients should be used for comparing the effects of different predictors measured on different scales takes precedence over the rule that unstandardized coefficients should be used for comparisons of predictors across different samples or populations.
I'm not sure if this really makes sense. Studies may often intentionally recruit a sample or manipulate the X variable so as to produce a lot of variation in X. Won't this mean that the standardized coefficient depends heavily on how much variation in X the researchers managed to produce?
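A toy numeric sketch of that worry (all numbers invented): the per-unit slope b is held fixed, but the standardized effect b·s_x moves with however much spread in X the design produces:

```python
# Same per-unit effect of X in both designs (invented value).
b = 0.5

s_x_observational = 1.0  # SD of X in a passively sampled study
s_x_engineered = 3.0     # SD of X when the design forces extra variation

# The standardized effect scales with s_x even though b is unchanged.
print(b * s_x_observational)  # 0.5
print(b * s_x_engineered)     # 1.5
```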

15. ## Re: Article on logistic regression - don't understand logic of statement.

I would think so, since the standardized coefficients in all the models he presents use the standard deviation of the predictor to standardize that predictor. I am not sure why a study would deliberately increase variation - usually, I thought, you are trying to limit that to limit standard errors.

Menard proposes the following. Note the goal here is not just to rank-order the predictors (there are simpler standardized coefficients, such as the SAS one, that do this) but to comment on the specific impact of the predictor in standardized form, as OLS standardized coefficients do.

b*m = b(sx)R/slogit(Y)

b*m is the standardized logistic regression coefficient
b is the slope of the predictor
sx is the standard deviation of the predictor
R is the square root of the R squared value (there are several in logistic regression, but it probably does not matter which one you use)
slogit(Y) is the standard deviation of the predicted values of logit(Y) - I can't post the correct symbols here. (I am not sure how you would obtain this in SAS.)

The interpretation of this fully standardized logistic regression coefficient is straightforward, and closely parallels the interpretation of standardized coefficients in linear regression: a one standard deviation increase in X is associated with a b*m standard deviation difference or change in logit(Y).

Does that make sense to posters? And if it does, does anyone know where to find slogit(Y) in SAS?
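A minimal arithmetic sketch of Menard's formula in Python. All four inputs are invented for illustration (they do not come from the article); the point is just how the pieces combine:

```python
import math

b = 0.85             # unstandardized logistic slope for X (invented)
s_x = 2.3            # sample SD of X (invented)
R = math.sqrt(0.27)  # square root of a pseudo R-squared (invented)
s_logit_y = 1.9      # SD of the predicted values of logit(Y) (invented)

# Menard's fully standardized logistic regression coefficient:
# b*m = b * s_x * R / s_logit(Y)
b_star_m = b * s_x * R / s_logit_y

# Interpretation: a one-SD increase in X is associated with a
# b*m-SD change in logit(Y).
print(round(b_star_m, 3))  # 0.535
```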

16. ## Re: Article on logistic regression - don't understand logic of statement.

While I am at it, I have never understood why the following is true (because the metric for odds ratios is exactly the same for every coefficient - the likelihood of being in a given level of Y for a change in X). I know it is true, but I don't understand the logic of why - if X1 has an odds ratio of 2 and X2 has an odds ratio of 4, why isn't X2 having more impact on Y than X1?

It is worth emphasizing in this context that the odds ratio is not an acceptable substitute for a standardized coefficient. While for many statisticians this will be common knowledge, too many cases occur in which odds ratios are treated as though they conveyed information about the magnitude of the effect of the variable, different from the unstandardized logistic regression coefficient, and could be interpreted as standardized coefficients.
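One concrete way to see why a raw odds ratio cannot serve as a standardized effect measure: its size depends on the units of X. A Python sketch with an invented slope:

```python
import math

# The same underlying relationship, with X measured in two units.
b_per_year = 0.7               # logistic slope, X in years (invented)
b_per_month = b_per_year / 12  # identical effect, X in months

odds_ratio_year = math.exp(b_per_year)    # OR for a 1-year increase
odds_ratio_month = math.exp(b_per_month)  # OR for a 1-month increase

# The ORs differ (about 2.01 vs 1.06) even though the effect is the
# same, so comparing raw ORs across differently scaled predictors
# says nothing about relative impact.
print(round(odds_ratio_year, 2), round(odds_ratio_month, 2))
```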