Hi all! I'm new here and I need some help.
I have a regression analysis assignment to do using SPSS and stats knowledge, and I'm not sure what I'm supposed to do. I've only answered, or tried my best to answer, the 1st of the 6 questions. I feel a little dumb right now, but I don't know what to do...
For context, here is the problem statement; below it I'll tell you what I've done so far:
Regression analysis assignment:
The data for this assignment are found in the following SPSS file, FemalePrivateWage.sav, which is available at CampusNet.
The data set contains labour market data collected by Statistics Denmark (Danmarks Statistik). The sample is collected randomly and consists of information for the year 1990 on economic, demographic and socio-economic variables for 255 female private employees. Qualitative as well as quantitative variables are involved. The variables are as follows:
child: 1 = children 0-6 years old, 0 = no children 0-6 years old
married :1 = married or living with a partner, 0 = single
age : age in number of years
province : 1 = living in the province, 0 = living in the capital area
education : education in number of years
exper : experience in number of years
hourwage : average hourly wage rate in DKK
According to the Human Capital theory, the salary of individuals depends on knowledge and skills which can be acquired through education, experience and job-specific training. A simple wage model (known in the literature as the “Mincer-model”) predicts that
ln(hourwage_i) = f(education_i, exper_i) + ε_i.
In a thorough analysis of this model it will be important to control for differences in geography, and other socio-economic variables.
In the following questions you will be taken through the various steps that are necessary in order to conduct a valid analysis of the problem at hand.
1) Give a brief, descriptive analysis of the data. [Hint: you need to state what the scales are for all the variables. With the problem in mind it will make sense to make scatter plots of the dependent variable (hourwage) against all the explanatory variables. Here you should realize that it is better to use the log of hourwage. Also, a scatter plot of an interval scaled variable against nominal variables is not so helpful … can you think of a better plot … it starts with a b?]
2) Based on the above model you are to make an analysis of the salary structure in Denmark based on the data involved. It is important that all variables available are examined. [Hint: Start by regressing ln(hourwage) on all the other variables. Before you can assess the significance of the estimates, you must decide if the model satisfies its design criteria. Assess the assumption of homoscedasticity by making various plots and comparing the HRSE standard errors and the regular standard errors. Compare the simple correlations to partial correlations to detect possible problems with respect to multicollinearity. Save residuals and make histogram, QQ or PP plots to assess whether the residuals are approximately normal. Now you are ready to remove insignificant variables from the model. For your final model, plot the residuals against the predicted values, and the remaining regressors in separate scatter plots. This will help you capture possible misspecification of the regression function … In my final model I was left with province, education and exper. Make sure that you give a thorough interpretation of the regression coefficient estimates.]
Now we will try different kinds of things in order to make sure that you know how to do them.
3) In order to perform WLS estimation instead of using HRSE we will now search for a good weight. [Hint: regress the squared residuals on the levels and squares of the regressors of your final model from 2) and also include age and age squared. Explain why it would be stupid to include both the level and the square of province in this regression (or any regression for that matter). The variable that does best in this regression is the right choice for weight.]
4) Use age (if you didn’t make this choice above, then you did something wrong) as weight in a WLS of your final model from 2). [Hint: follow the instructions in the corresponding slide show.]
5) Try to determine whether the returns to education and exper are different between the province and the capital. [Hint: include province*education and province*exper in your final model from 2) and test for significance. Try making a test where you test the significance of both the new variables jointly. Interpret the outcome.]
6) Consider whether there are diminishing returns to education or exper. [Hint: Add the squares of education and exper to your final model from 2). Assess the outcome.]
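To make the hints in 3) and 4) concrete, here is a rough sketch in Python (not SPSS) of the mechanics: an auxiliary regression of the squared residuals to look for a weight, then a WLS fit. Everything in it is an assumption for illustration: the data are simulated, not the real FemalePrivateWage.sav; the helper name `fit` is made up; and the weights are taken as 1/age, which corresponds to assuming the error variance is proportional to age. Whether the course slides want age or 1/age entered as the weight variable is something to check there.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 255  # sample size from the assignment; all values below are simulated

education = rng.integers(7, 19, n).astype(float)
exper = rng.integers(0, 31, n).astype(float)
age = education + exper + rng.integers(6, 12, n)
province = rng.integers(0, 2, n).astype(float)
# Assumption: error variance proportional to age (built in by construction)
eps = rng.normal(0.0, 1.0, n) * 0.03 * np.sqrt(age)
lnwage = 4.0 + 0.05 * education + 0.01 * exper - 0.08 * province + eps


def fit(X, y, w=None):
    """Least squares via QR; with weights w it minimises sum(w * resid**2).

    Returns the coefficients and their (classical) t statistics.
    """
    if w is not None:
        sw = np.sqrt(w)
        X, y = X * sw[:, None], y * sw
    q, r = np.linalg.qr(X)
    beta = np.linalg.solve(r, q.T @ y)
    resid = y - X @ beta
    r_inv = np.linalg.inv(r)
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    # diag((X'X)^-1) equals the row sums of squares of R^-1
    se = np.sqrt(sigma2 * (r_inv ** 2).sum(axis=1))
    return beta, beta / se


const = np.ones(n)
X = np.column_stack([const, province, education, exper])  # final model from 2)
b_ols, _ = fit(X, lnwage)
e2 = (lnwage - X @ b_ols) ** 2  # squared OLS residuals

# A 0/1 dummy squared is the same column again, so adding province**2
# next to province would be perfectly collinear.
print("province**2 identical to province:", np.array_equal(province ** 2, province))

# Auxiliary regression: squared residuals on levels and squares
aux_names = ["const", "province", "education", "education^2",
             "exper", "exper^2", "age", "age^2"]
Z = np.column_stack([const, province, education, education ** 2,
                     exper, exper ** 2, age, age ** 2])
g, t_aux = fit(Z, e2)
fitted = Z @ g
r2_aux = 1.0 - ((e2 - fitted) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
for name, t in zip(aux_names, t_aux):
    print(f"{name:12s} t = {t: .2f}")
print(f"auxiliary R^2 = {r2_aux:.3f}")

# WLS of the final model, weights 1/age (assumes Var(error) proportional to age)
b_wls, t_wls = fit(X, lnwage, w=1.0 / age)
print("OLS:", np.round(b_ols, 4))
print("WLS:", np.round(b_wls, 4))
```

In a small sample the levels and squares of the same variable are strongly correlated, so the individual t statistics in the auxiliary regression can be noisy; comparing separate auxiliary regressions, one candidate variable at a time, is another way to read the hint.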
What I've done:
So far I've only answered question 1, but I'm not even sure it's right. I described each variable (ordinal, nominal or scale), checked them for normality using P-P plots and discussed the variables. I made histograms with normal curves and also got the descriptive stats output from SPSS. I also made scatter diagrams for the appropriate variables and box plots for the others.
Then I ran a multiple regression on the data set. R² was 10%, and the observed F statistic made me conclude that there was a linear relationship between my independent variables and the hourly wage.
Then I tested all the b coefficients. According to the t-tests, only experience and education were linearly related to the hourly wage.
Now... I'm lost... I know the dependent variable should be the log of hourwage, but what do I do exactly? Do I still use all the variables or only education and experience? Can you let me know if what I've done so far is right or wrong? What would you do if you were me? Thanks so much!
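For reference, the step that the hint in question 2 describes (take the log of hourwage, regress it on all the other variables, and compare regular with heteroscedasticity-robust standard errors) can be sketched outside SPSS like this. This is only a sketch in Python under stated assumptions: the data are simulated stand-ins for the real file, the function name `ols_with_se` is made up, and the robust standard errors are the basic White (HC0) version, which may not be the exact variant the course calls HRSE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 255  # sample size from the assignment; every value below is simulated

education = rng.integers(7, 19, n).astype(float)
exper = rng.integers(0, 31, n).astype(float)
age = education + exper + rng.integers(6, 12, n)
child = rng.integers(0, 2, n).astype(float)
married = rng.integers(0, 2, n).astype(float)
province = rng.integers(0, 2, n).astype(float)
hourwage = np.exp(4.0 + 0.05 * education + 0.01 * exper
                  - 0.08 * province + rng.normal(0.0, 0.2, n))

lnwage = np.log(hourwage)  # dependent variable on the log scale

names = ["const", "child", "married", "age", "province", "education", "exper"]
X = np.column_stack([np.ones(n), child, married, age, province, education, exper])


def ols_with_se(X, y):
    """OLS coefficients with classical and White (HC0) standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    n_obs, k = X.shape
    sigma2 = resid @ resid / (n_obs - k)
    se_classic = np.sqrt(np.diag(sigma2 * xtx_inv))
    # Sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    se_robust = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_classic, se_robust, resid


beta, se_classic, se_robust, resid = ols_with_se(X, lnwage)

print(f"{'variable':10s} {'b':>8s} {'se':>8s} {'robust':>8s} {'t':>7s}")
for name, b, s1, s2 in zip(names, beta, se_classic, se_robust):
    print(f"{name:10s} {b:8.4f} {s1:8.4f} {s2:8.4f} {b / s1:7.2f}")

# In a log-linear model, one more year of education changes the wage by
# 100 * (exp(b) - 1) percent, holding the other regressors fixed.
pct_education = 100.0 * (np.exp(beta[names.index("education")]) - 1.0)
print(f"one more year of education: {pct_education:.1f}% change in hourly wage")
```

The order in the hint is to keep all variables in this first regression, check the assumptions, and only then drop the insignificant ones; if the two standard-error columns differ a lot, that points towards heteroscedasticity.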