Hi guys, please help. I need to do regression analysis for demographic data. I am not sure whether linear or logistic regression would be appropriate... Also I would like to be able to extrapolate trends...
I can use SPSS or STATA.
thanks,
S
The earth is round: P<0.05
Thanks. Basically, I am trying to do population projections. I have data from 1980 to 2010 and I want to project it into the future. I keep reading on the internet that autocorrelation could be something to consider, but I am not sure. And what if I wanted to add other variables to the model, like economic growth? Thanks for helping with the model.
Sylvia
It's not quite that simple ...
You could just do an ordinary regression. It probably wouldn't be very wrong. The problem is that for most time series the residuals will be autocorrelated (each value correlated with the previous one), so they are not independent, and the standard errors of the regression coefficients will be underestimated (the Durbin-Watson test can be used to check for this condition, if I remember correctly).
So, one solution is to first do a regression,
then calculate the residuals,
then estimate their AR(1) coefficient rho (in essence the degree of autocorrelation) by regressing all residuals onto their previous values,
then use this value to remove the autocorrelation (this is called pre-whitening),
then regress again (couple of extra tricks there)
All this has to be done with rather meticulous attention to detail, so unless you feel heroic it would be best to use a standard package for it ... don't know if it's in SPSS or STATA but surely there is a package for R!
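For anyone who does feel heroic, the steps above can be sketched roughly as follows. This is only a minimal single-pass sketch in plain Python (the idea is usually known as the Cochrane-Orcutt procedure), not a replacement for a proper package. The function names and the toy series are made up for illustration, and the "couple of extra tricks" (iterating until rho settles, handling the first observation, corrected standard errors) are left out.

```python
import math


def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def prewhitened_trend(t, y):
    """One pass of regress -> residuals -> rho -> pre-whiten -> regress."""
    # Steps 1-2: ordinary regression and its residuals.
    a0, b0 = ols(t, y)
    resid = [yi - (a0 + b0 * ti) for ti, yi in zip(t, y)]

    # Step 3: AR(1) coefficient rho, from regressing each residual
    # on the previous one (no intercept).
    num = sum(resid[i] * resid[i - 1] for i in range(1, len(resid)))
    den = sum(r * r for r in resid[:-1])
    rho = num / den

    # Step 4: pre-whiten both variables (the first observation is dropped).
    t_star = [t[i] - rho * t[i - 1] for i in range(1, len(t))]
    y_star = [y[i] - rho * y[i - 1] for i in range(1, len(y))]

    # Step 5: regress again; the transformed intercept equals a * (1 - rho).
    a_star, slope = ols(t_star, y_star)
    intercept = a_star / (1.0 - rho)
    return intercept, slope, rho


# Hypothetical toy series: linear trend plus a smooth (autocorrelated) wiggle.
steps = list(range(30))
series = [100.0 + 5.0 * k + math.sin(0.5 * k) for k in steps]
intercept, slope, rho = prewhitened_trend(steps, series)
print(intercept, slope, rho)
```

The slope should come out close to the ordinary regression slope; what changes with a full implementation is mainly the honesty of the standard errors.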
Dear Ohammer, dear all,
This indeed is not easy. Is there a way to do something simpler, like doing a normal regression and then extrapolating the trend? (If so, what should I do with the years: can year be treated as the independent variable, or does it need to be transformed?)
And how about a logistic curve? Does it make sense to do logistic regression and then extrapolate (and how)?
THANKS
Yes forget the pre-whitening, I don't think it would make any practical difference, and people are regressing time-series without it all the time. (Only, if you are going to publish this, and get one of those pedantic reviewers we all hate, it might be criticized).
So, all you need then is an ordinary least squares regression with time as independent variable (no transformation). Write down the equation for the modeled function (linear, logit, whatever), plug in the year 2020, and tell us our future
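A minimal sketch of that recipe in plain Python, in case it helps to see it outside SPSS. The four data points are hypothetical (chosen to lie exactly on a line), and `fit_trend` is just a name made up for this example.

```python
def fit_trend(years, values):
    """Least-squares line: value = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical series, NOT real population data.
years = [1980, 1990, 2000, 2010]
values = [100.0, 120.0, 140.0, 160.0]

intercept, slope = fit_trend(years, values)

# Extrapolate by plugging the future year into the fitted equation.
forecast_2020 = intercept + slope * 2020
print(slope, intercept, forecast_2020)
```

Year goes in untransformed; the very large negative intercept you get from using calendar years is harmless, it is just the line extended back to year 0.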
This is giving me a strange result
Coefficients(a)
Model            Unstandardized Coefficients    Standardized Coefficients    t          Sig.
                 B              Std. Error      Beta
1  (Constant)    -1.631E8       4801028.350                                  -33.973    .000
   Year          83544.071      2407.111        .989                         34.707     .000
a. Dependent Variable: Singapore
B (the unstandardized coefficient) for Year is 83,544
how should I treat variable "year" in the equation?
You have done a linear regression with year t as independent variable? It looks like you have slope 83544 and intercept -163100000:
y=83544*t-163100000
Plug t=2010 into this, and get about 4.8 million, which is (Wikipedia ... Singapore ... hang on ...) quite close to the 2010 population of Singapore?
BUT: The population before ca. 1952 was negative according to this model - maybe you need to consider e.g. an exponential instead ...
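To check the arithmetic, here is a small sketch that plugs the coefficients from the SPSS table into the linear equation. The variable names are mine; the intercept is used as printed (-1.631E8), which is already rounded, so the results are approximate.

```python
# Coefficients as reported in the SPSS output above (intercept is rounded).
slope = 83544.071       # people per year
intercept = -1.631e8


def predicted_population(year):
    """Linear model y = slope * t + intercept."""
    return slope * year + intercept


pop_2010 = predicted_population(2010)

# Year at which the fitted line crosses zero; earlier years give
# negative "populations", which is the problem with the linear model.
zero_year = -intercept / slope

print(round(pop_2010), round(zero_year, 1))
```

This gives roughly 4.8 million for 2010 and a zero crossing around 1952, matching the numbers above.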
Bless you Ohammer. So it gives around 6.1 mln in 2025 which is very reasonable. But I will add x square to the model as suggested.
Another way of looking at it would be that for each year we would get 83,544 extra people, correct?
I will also try logistic reg to compare.
I've sort of been wondering this the entire thread... How are you going to use logistic regression here? What outcome are you modeling?
Yes, logistic regression doesn't seem to work, as it is for binary variables only. Linear regression yielded plausible results. This makes me wonder how demographers make their logistic curves...
If you have a suggestion for a not very complex model, please do let me know.
I think there may be some confusion about the word "logistic" here. Logistic regression usually refers to (GLM) regression of binary data using a logit link; maybe this is what you tried? You may instead be thinking of fitting a logistic (sigmoid) function, often used for population growth, something like
y=a/(1+b*exp(-cx))
for parameters a, b and c ?
This model is difficult to linearize by transformation (at least if all three parameters have to be estimated), so you may have to use a nonlinear regression method.
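One simple workaround, sketched below in plain Python, uses the fact that the model does become linear once a is fixed: ln(a/y - 1) = ln(b) - c*x. So you can try a list of candidate values for a, fit b and c by ordinary regression for each, and keep the candidate with the smallest squared error. This is a crude grid search that I am substituting for a real nonlinear least-squares routine, not what a statistics package does internally. The function names, the candidate grid, and the test data are all assumptions for illustration.

```python
import math


def fit_line(x, z):
    """Least-squares line z = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


def logistic(x, a, b, c):
    return a / (1.0 + b * math.exp(-c * x))


def fit_logistic(x, y, a_candidates):
    """Grid search over the ceiling a; b and c from a linearized fit."""
    best = None
    for a in a_candidates:
        if a <= max(y):          # the ceiling must lie above all data (y > 0)
            continue
        z = [math.log(a / yi - 1.0) for yi in y]
        intercept, slope = fit_line(x, z)
        b, c = math.exp(intercept), -slope
        sse = sum((yi - logistic(xi, a, b, c)) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]


# Hypothetical noise-free data generated from a=10, b=50, c=0.3.
# x is "years since the first observation", not the calendar year,
# to keep exp(-c * x) in a sensible numeric range.
x = list(range(30))
y = [logistic(xi, 10.0, 50.0, 0.3) for xi in x]

a_hat, b_hat, c_hat = fit_logistic(x, y, [k / 10 for k in range(100, 151)])
print(a_hat, b_hat, c_hat)
```

Note that the inner fit minimizes error on the transformed scale, so with noisy real data the result will differ somewhat from a true nonlinear least-squares fit; it can still serve as a starting value for one.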
Yes.
This is the output:
Logistic
Model Summary
R       R Square    Adjusted R Square    Std. Error of the Estimate
.995    .990        .990                 .014
The independent variable is Year.
ANOVA
              Sum of Squares    df    Mean Square    F           Sig.
Regression    .569              1     .569           2771.558    .000
Residual      .006              27    .000
Total         .575              28
The independent variable is Year.
Coefficients
              Unstandardized Coefficients    Standardized Coefficients    t           Sig.
              B             Std. Error       Beta
Year          .984          .000             .370                         3166.533    .000
(Constant)    537904.449    338891.324                                    1.587       .124
The dependent variable is ln(1 / SEAsia).
Is a the constant and b the coefficient for Year? Where is c?