What is the term for when your outcome variable is truncated and that may be dampening your results?
For example, you use a dichotomous outcome in your regression analysis instead of a continuous one.
I think you might be thinking of restriction of range. (Though models for truncated DVs are also a thing.)
Matt aka CB | twitter.com/matthewmatix
I have never heard a term for this, but if the variation in the DV [or, for that matter, the IV] is extremely limited, it will affect the slopes [they will be smaller than they should be]. It has to be pretty extreme for this to matter.
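A minimal simulation sketch of the DV case, assuming a simple linear model with normal errors (the data, the true slope of 0.5, and the |y| < 0.5 cutoff are all made up for illustration). It fits an OLS slope on the full sample and again after keeping only cases whose outcome falls in a narrow band.

```python
import numpy as np

def ols_slope(x, y):
    """Unstandardized OLS slope of y on x (simple regression with intercept)."""
    x_c = x - x.mean()
    return np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)   # true slope is 0.5 (assumed for the demo)

full_slope = ols_slope(x, y)

# Restrict the range of the DV: keep only cases with |y| < 0.5 (hypothetical cutoff)
keep = np.abs(y) < 0.5
restricted_slope = ols_slope(x[keep], y[keep])

print(f"full sample slope:      {full_slope:.3f}")
print(f"restricted-range slope: {restricted_slope:.3f}")
```

With the outcome squeezed into a narrow band, the fitted slope comes out far below the true 0.5.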
When people talk about turning a naturally interval variable into a categorical one (normally frowned on), they sometimes call it "loss of information."
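To see that "loss of information" concretely, here is a small simulation sketch. It assumes bivariate normal data with a population correlation of 0.5 (an arbitrary choice), dichotomizes the outcome at its median, and recomputes the correlation. Under bivariate normality a median split is known to shrink r by a factor of about sqrt(2/pi) ≈ 0.80.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
rho = 0.5                                   # assumed population correlation
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)

r_continuous = np.corrcoef(x, y)[0, 1]

# Dichotomize the outcome at its median (a 50/50 split)
y_binary = (y > np.median(y)).astype(float)
r_dichotomized = np.corrcoef(x, y_binary)[0, 1]   # point-biserial correlation

print(f"continuous outcome:   r = {r_continuous:.3f}")
print(f"dichotomized outcome: r = {r_dichotomized:.3f}")
```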
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
On page 61 of "Using Multivariate Statistics" by Tabachnick and Fidell, it says in part:
"Sample correlations may be lower than population correlations when there is restricted range in sampling of cases or very uneven splits in the categories of dichotomous variables... A falsely small correlation between two continuous variables is obtained if the range of responses to one or both of the variables is restricted in the sample." On the next page they add: "The correlation between a continuous variable and a dichotomous variable, or between two dichotomous variables (unless they have the same peculiar splits), is also too low if most (say over 90%) responses to the dichotomous variable fall into one category."
This does not seem to apply to standardized slopes, which you would normally not use for dichotomous variables anyway. It is in a chapter on data cleanup, not on regression per se, although logically it applies to regression as well.
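Both situations in the quoted passage can be reproduced with a short simulation. This is a sketch under assumed conditions (bivariate normal data, population r = 0.5, a restriction to |x| < 1, and 50/50 versus 90/10 splits of y, all arbitrary choices), not anything taken from the book itself.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.normal(size=n)
y = 0.5 * x + np.sqrt(0.75) * rng.normal(size=n)  # population r = 0.5

def r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# 1. Restricted range: keep only cases with x in the middle of its distribution
mid = np.abs(x) < 1.0
# 2. Dichotomize y with an even 50/50 split vs. an uneven 90/10 split
y_even = (y > np.quantile(y, 0.5)).astype(float)
y_uneven = (y > np.quantile(y, 0.9)).astype(float)

results = {
    "full range": r(x, y),
    "restricted x range": r(x[mid], y[mid]),
    "50/50 split of y": r(x, y_even),
    "90/10 split of y": r(x, y_uneven),
}
for label, value in results.items():
    print(f"{label:>20}: {value:.3f}")
```

Every variant falls below the full-range correlation, and the uneven split loses more than the even one.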
I never realized that was the case. It's often stated that you use Pearson for interval variables, Spearman for ordinal, and polychoric for binary data. In practice that is way too simple. For example, Pearson assumes a linear relationship, so two variables with a curvilinear relationship won't fit it well [although I don't know if Spearman or polychoric will either].
Yeah, you're right; it's more complicated than that rule suggests.
But yeah, the Pearson's correlation = standardised slope thing (in simple regression with one predictor) is a nice property. It shows a bit more clearly what the magnitude of Pearson's correlation actually tells you (i.e., for a one standard deviation increase in one variable, the expected change in the other, in standard deviations, is r).
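A quick numerical check of that property, as a sketch on made-up data (NumPy assumed): standardize both variables, fit a least-squares line, and compare its slope with Pearson's r.

```python
import numpy as np

def standardize(v):
    """Convert to z-scores (mean 0, SD 1)."""
    return (v - v.mean()) / v.std()

rng = np.random.default_rng(3)
x = rng.normal(loc=50, scale=10, size=1_000)     # simulated predictor
y = 2.0 * x + rng.normal(scale=15, size=1_000)   # simulated outcome

# Least-squares slope of z_y on z_x; polyfit returns [slope, intercept] for deg=1
zx, zy = standardize(x), standardize(y)
std_slope = np.polyfit(zx, zy, deg=1)[0]
r = np.corrcoef(x, y)[0, 1]

print(f"standardized slope = {std_slope:.6f}")
print(f"Pearson r          = {r:.6f}")
```

The two numbers agree to floating-point precision.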
So what correlation do you use for non-linear relationships? I have long wondered.
I think as long as the p value is high enough you can use Pearson.
Spearman's rho is good for monotonic but non-linear relationships (although it's really describing just the strength of the relationship and not exactly its form).
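Spearman's rho is Pearson's r computed on the ranks, which is why it captures any monotonic relationship, linear or not. A minimal sketch (NumPy assumed; the exponential curve is an arbitrary example, and the simple ranking helper assumes no ties):

```python
import numpy as np

def ranks(v):
    """Ranks 1..n (assumes no ties, which holds for this example)."""
    order = np.argsort(v)
    out = np.empty(len(v))
    out[order] = np.arange(1, len(v) + 1)
    return out

def spearman_rho(a, b):
    # Spearman's rho = Pearson's r on the ranks
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

x = np.linspace(0.1, 5, 200)
y = np.exp(x)          # strictly increasing but strongly non-linear

pearson = np.corrcoef(x, y)[0, 1]
spearman = spearman_rho(x, y)
print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```

Spearman's rho is exactly 1 here because the ranks line up perfectly, while Pearson's r is noticeably lower.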
For relationships that aren't monotonic, you'd need a more complex model (e.g., quadratic regression, piecewise regression, spline models, loess, etc.). That in turn means you won't really be able to summarise the model in a single number the way you can with a correlation (though I suppose you might still report the R^2 as a summary of the strength of the relationship in some cases).
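As an illustration of why a single correlation can miss a non-monotonic relationship, here is a sketch with a U-shaped curve (simulated data; NumPy's `polyfit` used for the quadratic regression). Pearson's r comes out near zero, while the R^2 of a quadratic fit is high.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 500)
y = x**2 + rng.normal(scale=0.5, size=x.size)   # U-shaped: not monotonic

pearson = np.corrcoef(x, y)[0, 1]

def r_squared(y, fitted):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

coefs = np.polyfit(x, y, deg=2)        # quadratic regression: y ~ b2*x^2 + b1*x + b0
fitted = np.polyval(coefs, x)
r2_quadratic = r_squared(y, fitted)

print(f"Pearson r     = {pearson:.3f}")       # near zero despite a strong relationship
print(f"quadratic R^2 = {r2_quadratic:.3f}")
```

Raising `deg` gives cubic and higher-order fits in the same way.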
Essentially, if you want to model non-linear relationships, you do regression with a quadratic, cubic, etc. term.
A couple of comments: I have also heard it referred to as "loss of information." If you are talking about turning a continuous variable into a binary or categorical one, you can use the term dichotomized or discretized, as applicable.
Is this standardized slope/correlation equivalence also why R^2 can be interpreted on a percentage scale?
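One way to look at it: in simple linear regression with an intercept, R^2 = 1 - SS_res/SS_tot, the proportion of the variance in y accounted for by the fitted line, and it equals the square of Pearson's r. A sketch that checks this on simulated data (NumPy assumed; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=500)
y = 0.6 * x + rng.normal(size=500)

slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept

ss_res = np.sum((y - fitted) ** 2)          # variation left unexplained
ss_tot = np.sum((y - y.mean()) ** 2)        # total variation around the mean
r_squared = 1 - ss_res / ss_tot             # proportion of variance explained

r = np.corrcoef(x, y)[0, 1]
print(f"R^2 = {r_squared:.6f}, r^2 = {r**2:.6f}")
```

With more than one predictor, R^2 instead equals the squared correlation between y and the fitted values.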
Stop cowardice, ban guns!