scratching my head over some regression output i'm getting from Excel. as you can see from below, the best fit line on my indeptendent variable DSCR (Act) is negative slope, but the beta is positive. used log transformations.
my problem is those are the exact regression results i got from Excel, produced simultaneously with that best fit line plot for the predicted Loss given DSCR (Act) and the actual loss given DSCR (Act).
I do not understand why i'd be getting a coefficient of +0.139 for DSCR (Act) given the plot.
The plot is not for the regression equation you are commenting on. The plot shows a single IV when you have two in your model. None of the paramaters for the graph and the equation are the same (not just the slope, the R squared anything). LTV is not being addressed by the plot at all but it is in you equation.
You need to look at the documentation of whatever produced the plot in Excel. My guess is it is producing a univariate plot of one of the variables in your equation. So you are comparing the line for one of the independent variables and the dependent variable in the plot to a line with both independent variables. In honesty Excel is not a great tool for regression (having used it).
sorry, maybe i should've clarified. i understand it's a univariate plot, and i don't expect any of the paramaters to be the same.
my confusion simply lies in the basic binary (positive/negative) relationship between Loss & DSCR (Act). clearly, given the graph of the data, one would not guess the coefficient for DSCR (Act) to be positive.
in other words, larger DSCR (Act) does not result in higher Loss in real life or in the data, so why is my regression equation telling me this is so?
edit: this is an extremely simple regression, so i would think Excel could handle it just fine.
The slope of a univerate relationship can be totally different than its multivariate one including direction. It is dangerous to guess how the marginal slope (from the multiple regression) will look based on univerate analysis. Remember that only the unique relationship between Loss and DSCR (that is the overlap between the two not shared with the other independent variable in the model) is considered in the multiple regression. That partial slope of multiple regression is, as you found, not always anything at all like a univariate relationship where the entire overlap between the independent and dependent variable is used to calculate the slope.
In the data of the multivariate reality, with both independent variables considered, DSCR is not resulting in the same relationship it does by itself. That is not unusual.
I don't know what LTV is but hopefully this plot can help you see why the coefficient can change. The two different plotting colors and shapes represents data from different groups. If we ignore group then it appears that as x increases so does y. But if you look within a single group you'll see that trend reversed - as x increase it appears that y decreases.
This is just for the simple case where we add groups but the same type of situation can occur when you add another continuous variable - it's just a little harder to visualize in those situations.
Think of Y as your dependent variable and W as DSCR with LTV X. Then in the univariate relationship both Red and Green would reflect the slope. But in the multiple regression only Green would. That difference in how slopes are calculated can make a huge difference.
right. i understand that, too (even though i'm basically a novice with regression modeling). i've certainly seen sign changes in betas when all variables are regressed. i just didn't think it was possible that the directional component of the partial slope could change to such a large degree, particularly in this case, given what i know about the real life behavior of these variables. and there is a pretty strong downward trend in the data (univariately speaking). and intuitively, it makes no sense that a higher given DSCR results in a higher predicted Loss, regardless of the statistical, mechanical explanation.
that being said, let me give it a shot to try to wrap a simpleton's brain around this.
given that when running a univariate regression with DSCR only, i get a large, negative beta and t-stat. whereas if i incorporate LTV, as in the above summary output, the t-stat and beta for the DSCR become small and positive. would it be correct to say, in basic terms, that the LTV is overwhelming the DSCR contribuition to the change in the dependent variable, rendering its contribution insignificant?
edit: there were a couple of replies before mine. i'll review those, but if you could let me know if my thinking is along the right lines, i'd appreciate it. i have to explain this to some even more lay laypeople than me, and i think they'll get it if i word it that way.
edit2: let me put it another way. all this is data is saying is: as LTV (Loan To Value, how high of leverage there is in a loan) goes UP, Losses are expected to be HIGHER, and as DSCR (debt service coverage ratio, how much income you have to cover the mortgage) goes UP, Losses are expected to be LOWER. so, in real life, if you have a higher LTV loan, but you have a really high DSCR, your Losses will probably be ZERO. but the equation will tell me the losses will be HIGHER.
Based on your initial regression equation as either DSCR and LTV go up the dependent variable goes up because the slope for both is positive. That seems different than what you stated above which has losses going up with LTV and down with DSCR. The dependent variable is moving in the same direction for both independent variables (up based on the slope).
When you talk about the smell test you raise a key point. The comments Dason and I have made are tied purely to the statistics presented - they are purely methodological comments. Neither of us, as far as I know, have any substantive expertise in the area you are raising at all and we are not saying that regression equation is modeling reality correctly. Many problems in the regression model, e.g., too little data, the wrong form for a particular variable, leaving other key variables out of the model etc can all lead to wrong statistical results. For example the true relationship between the variables might be non-linear and the linear regression you used might be entirely wrong as a result.
As the subject matter expert if you think the model does not make substantive sense you should consider changes in the model that would better capture the true results.
Oh, when i was referring to the UP and DOWN thing, i was referring to real life. the problem is the model says different.
that being said, would it be correct to say that in generic terms, that the LTV variable is overwhelming the DSCR variable's contribuition to the change in the dependent variable, rendering the DSCR's contribution insignificant (ie much lower t-stat and beta)?
anyway, given that i know i have good data, i think you're right, i'm going to have to figure out either A) what other control variables are necessary and/or B) if linear regression is really appropriate for this.
finally, thanks very much for the repsonses, noetsi and Dason.
The best way to tell the relative impact of the two variables is to look not at the raw slopes you presented but their standardized beta weights. The problem with comparing raw slopes is that the units they are measured in come into play so you are comparing apples and oranges. Regardless of which is more important the results you find might not be due to their relative impact, but the model you specified.
From your comments I think your last point is critical. I think your model needs to be corrected. I would look at the academic and/or trade literature in the field on this, commonly someone (often many people) have written on these issues before.
Good call. Originally I just had different plotting characters but thought there could be a little more distinction and in R black is color = 1 and red is color = 2 and my groups were coded 1 and 2 so I just used those.