Regression with missing data

Status
Not open for further replies.
#1
Hi all

I just want to run a quick linear regression, but my first attempt failed with an error (r(2000), no observations). I've realised the issue is that several of my variables have missing values.

I have a 3-level dependent variable (1, 2, 3). Then I have several variables that indicate presence (0 or 1), which I enter as dummy variables using the 'i.' prefix. When something is 'present', its quality is also recorded. So if x1 is present, it is also given a score (x2) of 1-3. I want to look at the effect of both presence and quality on the DV.

So my command might be: regress DV i.x1 i.x2 i.y1 i.y2. The problem is that x2 and y2 have missing values wherever x1 or y1 is not present.
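To make the problem concrete, here is a minimal sketch of how I checked it, assuming the variable names from the command above. regress drops any row that has a missing value on any variable in the model, so the count below is the number of rows it would actually use.

```stata
* Count rows with no missing values on any model variable.
* regress only uses these rows (listwise deletion).
count if !missing(DV, x1, x2, y1, y2)

* Cross-tabulate presence against quality, including missing values,
* to check that x2 is missing exactly where x1 == 0.
tabulate x1 x2, missing
tabulate y1 y2, missing
```

If the count is 0, that would explain the r(2000) error.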

I want to ask - how can I account for this? I thought that I could recode the x2 and y2 missing values as 0, but I feel my coefficients will be somewhat skewed if I did that.
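For reference, the recode I have in mind would look roughly like the sketch below. The names x2_r and y2_r are made up for illustration, and it assumes quality is always recorded whenever the item is present.

```stata
* Copy quality into new variables so the originals stay untouched.
* x2_r and y2_r are hypothetical names used only in this sketch.
generate x2_r = x2
replace x2_r = 0 if x1 == 0 & missing(x2)

generate y2_r = y2
replace y2_r = 0 if y1 == 0 & missing(y2)

* With 0 = "not present" as the base level, i.x2_r already carries
* the presence information, so i.x1 and i.y1 are left out here
* (they would be collinear with the quality dummies).
regress DV i.x2_r i.y2_r
```

As far as I can tell, the coefficients on levels 1-3 would then be comparisons against "not present" rather than against another quality level, which may be part of why the results look odd to me.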

The other option might be to do a sort of pairwise regression (no idea how in Stata, but I could figure it out!), but then I end up with different numbers of observations for different parts of the model, which makes interpretation tricky.

Anyone got any ideas?