Help on interpreting linear regression estimates
I'm currently working on a larger assignment, and I need some input on how to interpret the results from my linear regression. I'm pretty sure I've got it right, but as a precaution some input would be lovely.
I've searched the web and speculated a lot about this, but I can't seem to reach a final take on it.
This is part of my regression output (which is enough to illustrate the point):
```
> summary(fittest1)

Call:
lm(formula = homicides_any_method ~ gdp * education, data = raw_data)

Residuals:
  Min    1Q Median    3Q   Max
-3000 -2082  -1399  -124 42265

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    3.442e+03  1.234e+03   2.790  0.00593 **
gdp           -6.125e-02  1.686e-01  -0.363  0.71692
education     -1.201e+02  1.690e+02  -0.711  0.47828
gdp:education  1.845e-03  1.528e-02   0.121  0.90404
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5372 on 158 degrees of freedom
  (15 observations deleted due to missingness)
Multiple R-squared:  0.03071,  Adjusted R-squared:  0.0123
F-statistic: 1.669 on 3 and 158 DF,  p-value: 0.176

> summary(fittest2)

Call:
lm(formula = log_any_homicide_rate ~ log_gdp * log_education,
    data = raw_data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.2483 -0.6268 -0.0109  0.4907  2.7438

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)             1.3101     1.8372   0.713  0.47684
log_gdp                 0.4819     0.2856   1.687  0.09355 .
log_education           2.6292     0.8446   3.113  0.00220 **
log_gdp:log_education  -0.3949     0.1245  -3.173  0.00181 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.998 on 158 degrees of freedom
  (15 observations deleted due to missingness)
Multiple R-squared:  0.3181,  Adjusted R-squared:  0.3052
F-statistic: 24.57 on 3 and 158 DF,  p-value: 4.201e-13
```
The first model uses the raw numbers, and the second applies a logarithm to all variables. I quickly saw that not transforming the data would make everything pretty useless. As we can see, model 1 is far from significant, neither overall nor for any of the listed variables. Model 2 is the opposite, except for log_gdp, which is only borderline (p ≈ 0.094). So there's really no doubt that model 2 is much better. But I'm confused about the estimates of the individual variables and their interaction.
See, as I'm interpreting it, in model 1 we have an estimate of -6.125e-02 on gdp, -1.201e+02 on education, and (positive) 1.845e-03 on their interaction. This is where I'm unsure: for every one-unit increase in gdp, we have a 6.125e-02 decrease in homicides, and for education a 1.201e+02 decrease. This makes sense, since we would assume that wealth and education mean less tendency to commit homicide. But what about the interaction? Does gdp:education mean a 1.845e-03 increase in homicides? So both of these in combination mean an increase in homicides? That makes no sense, at least with regard to our assumption/theory that these two factors should reduce crime/homicides.
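With an interaction term in the model, the gdp and education coefficients are no longer "the" effect of each variable on its own. Model 1 is homicides = b0 + b1·gdp + b2·education + b3·gdp·education, so the slope of gdp is b1 + b3·education and the slope of education is b2 + b3·gdp. Each main-effect coefficient is the slope when the other variable equals zero, and the interaction coefficient is how much that slope changes per unit of the other variable. The sketch below only plugs the estimates printed in summary(fittest1) into those formulas; the education values it evaluates at are hypothetical, since the units and ranges of the real variables aren't shown. Keep in mind that none of the model 1 estimates are significant, so even their signs can't be distinguished from zero.

```r
# Estimates copied from summary(fittest1) (model 1, raw scale)
b_gdp <- -6.125e-02
b_edu <- -1.201e+02
b_int <-  1.845e-03

# With y = b0 + b_gdp*gdp + b_edu*education + b_int*gdp*education,
# the slope of gdp depends on education, and the slope of education on gdp
gdp_slope <- function(education) b_gdp + b_int * education
edu_slope <- function(gdp) b_edu + b_int * gdp

# Hypothetical education values, for illustration only
education_values <- c(0, 10, 20, 40)
print(data.frame(education = education_values,
                 gdp_slope = gdp_slope(education_values)))

# Values of the other variable at which each conditional slope changes sign
edu_break <- -b_gdp / b_int  # gdp slope is zero at this education level
gdp_break <- -b_edu / b_int  # education slope is zero at this gdp level
cat("gdp slope turns positive once education exceeds", round(edu_break, 1), "\n")
cat("education slope turns positive once gdp exceeds", round(gdp_break), "\n")
```

Read this way, the positive 1.845e-03 is not a separate "both together" effect: it says the negative gdp slope becomes less negative as education rises (and the education slope becomes less negative as gdp rises). Whether either slope actually flips sign within the range of the data depends on the units of gdp and education, which the output doesn't show.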
It's essentially the same problem in model 2, just inverted: now log_gdp:log_education is negative, but log_gdp and log_education are positive. So in model 2, log_gdp and log_education each mean an x% increase in homicides, but their interaction means an x% decrease?
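The same logic applies to model 2, only on the log scale. Assuming log_gdp, log_education and log_any_homicide_rate are natural logs of the raw variables (the post doesn't say), the slope of log_gdp is 0.4819 - 0.3949·log_education. In a log-log model that slope is approximately the % change in the homicide rate for a 1% increase in gdp, at that level of education. So the interaction isn't a separate "x% decrease"; it says the gdp elasticity shrinks as education grows, and the education elasticity shrinks as gdp grows. The sketch plugs in the estimates from summary(fittest2); the education values are hypothetical.

```r
# Estimates copied from summary(fittest2) (model 2, log-log)
b_lgdp <-  0.4819
b_ledu <-  2.6292
b_lint <- -0.3949

# Conditional elasticities d log(rate) / d log(x).
# Assumption: log_gdp = log(gdp) and log_education = log(education), natural logs.
gdp_elasticity <- function(education) b_lgdp + b_lint * log(education)
edu_elasticity <- function(gdp) b_ledu + b_lint * log(gdp)

# Hypothetical education values, for illustration only
education_values <- c(1, 2, 5, 10)
print(data.frame(education = education_values,
                 gdp_elasticity = gdp_elasticity(education_values)))

# Raw-scale levels at which each elasticity crosses zero
edu_zero <- exp(-b_lgdp / b_lint)
gdp_zero <- exp(-b_ledu / b_lint)
cat("gdp elasticity is zero at education =", signif(edu_zero, 3), "\n")
cat("education elasticity is zero at gdp =", signif(gdp_zero, 3), "\n")
```

If the variables were logged with a different base (e.g. log10), the elasticity interpretation still holds on the log scale, but the exp() back-transformations above would need to use that base instead.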
And why does the log transformation seemingly cause this inversion? Is it because that's the true interaction/effect?
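Part of the answer is that in an interaction model the "main effect" coefficients are slopes at one specific point, where the other variable is 0. In model 1 that means education = 0 (or gdp = 0); in model 2, if the logs are natural logs, log_education = 0 means education = 1. Such points may lie far from most of the data, so those coefficients can have surprising signs that change when the scale changes. Note also that the two models differ by more than the log transform: model 1's response is homicides_any_method while model 2's is log_any_homicide_rate, so they don't model the same quantity. A common way to make main effects interpretable is to center the predictors, so each main effect becomes the slope at the average of the other variable. The sketch below uses simulated data (made-up numbers, not the data from the post) to show that centering moves the main-effect coefficients, here even flipping their signs, while the interaction estimate and the fitted values stay the same.

```r
set.seed(1)

# Simulated data, for illustration only (not the data from the post)
n <- 200
x <- runif(n, 5, 10)
z <- runif(n, 1, 4)
y <- 1 + 0.5 * x + 2.5 * z - 0.4 * x * z + rnorm(n, sd = 0.5)
d <- data.frame(y, x, z)

# Uncentered fit: main effects are slopes at x = 0 and z = 0
fit_raw <- lm(y ~ x * z, data = d)

# Centered fit: main effects are slopes at the mean of the other variable
d$x_c <- d$x - mean(d$x)
d$z_c <- d$z - mean(d$z)
fit_cen <- lm(y ~ x_c * z_c, data = d)

print(round(coef(fit_raw), 3))
print(round(coef(fit_cen), 3))

# Same interaction estimate and same fitted values; only the main effects move
stopifnot(isTRUE(all.equal(coef(fit_raw)[["x:z"]], coef(fit_cen)[["x_c:z_c"]])))
stopifnot(isTRUE(all.equal(unname(fitted(fit_raw)), unname(fitted(fit_cen)))))
```

On the real data, a centered version of model 2 could be written as `lm(log_any_homicide_rate ~ scale(log_gdp, scale = FALSE) * scale(log_education, scale = FALSE), data = raw_data)`; this is untested here, and `scale()` centers each variable on the mean of its own non-missing values rather than only the 162 complete cases.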
Any help appreciated.