# Thread: multiple linear regression: negative Y ?!

So I have my equation and it looks weird. The intercept is a small positive value and a lot of the coefficients have large negative values. So that means my outcome, dose, is going to be negative?! How's that supposed to make sense? Or do I have it wrong?

Thanks.

Not necessarily. You can't say until you have the final predictions. A prediction is the result not just of the parameters but of the sum-product of the parameters and the IVs. What if the IVs are also negative? It would help if you provided more information about your equation. First of all, I'd make the predictions and see whether the outcome really is negative.

What negative slopes mean is that the DV will go down as the IV goes up (holding the other IVs fixed). If the negative slopes are small in magnitude relative to the range of the IVs (meaning Y won't fall much), Y need not be negative even with a small intercept.
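To make that concrete, here is a small Python sketch with made-up numbers (the intercept, slopes and IV values are purely illustrative, not the OP's model). The prediction is the intercept plus the sum of each slope times its IV, so negative slopes by themselves don't force a negative Y:

```python
def predict(intercept, slopes, ivs):
    """Return intercept + sum(slope * IV), i.e. the sum-product form."""
    return intercept + sum(b * x for b, x in zip(slopes, ivs))

# Hypothetical model: small positive intercept, two negative slopes.
intercept = 0.5
slopes = [-2.0, 3.0, -0.1]
ivs = [1.5, 2.0, 10.0]          # one observation's IV values

y_hat = predict(intercept, slopes, ivs)
print(y_hat)                    # positive despite the negative slopes

# A negative slope times a negative IV contributes a positive amount.
y_hat_neg_iv = predict(0.5, [-2.0], [-3.0])
print(y_hat_neg_iv)
```

Whether the actual model produces negative doses depends on the real coefficients and the real range of the IVs, which is why checking the predicted values is the first step.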

If you find predictions of Y that are nonsensical (like a negative dosage), several factors could be in play. If your Y is not interval-like and you are using a method like OLS regression, this can generate nonsensical results. Also, if an IV cannot take on a value of 0 (say the IV is the height of an adult; no adult has zero height), that can make your intercept essentially meaningless, since all the intercept is is the value of Y when all X are 0. I don't know for sure how that affects predictions of Y, although it is common to center an X that cannot take on a meaningful value of 0.
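On the centering point, a minimal Python sketch with made-up height data (the numbers and the helper `fit_line` are hypothetical, for illustration only). For OLS with an intercept, centering an X changes only the intercept, which becomes the predicted Y at the mean of X; the slope and the fitted values stay the same:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data: adult heights in cm and some response y.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
y = [12.0, 11.0, 9.5, 8.0, 7.5]

mean_h = sum(heights) / len(heights)
centered = [h - mean_h for h in heights]

b0_raw, b1_raw = fit_line(heights, y)    # intercept = Y at height 0 (meaningless)
b0_ctr, b1_ctr = fit_line(centered, y)   # intercept = Y at the mean height

pred_raw = [b0_raw + b1_raw * h for h in heights]
pred_ctr = [b0_ctr + b1_ctr * c for c in centered]

print(b0_raw, b1_raw)
print(b0_ctr, b1_ctr)
```

So centering helps with interpreting the intercept, but it would not by itself remove negative predictions.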

What do I do now? My Y is continuous, not interval. A couple of slopes have a LARGE negative value. What do I need to remedy? The model seemed to have evolved okay based on p-values and general sense.

First of all, let's see whether the predictions are actually negative. Do you remember I told you to use the output statement to calculate residuals and Cook's D statistics? You can calculate the predicted values the same way:

Output out=statsclue1 p=predicted_dv;

Then run a univariate analysis on the variable predicted_dv and see what % of the values are negative:

proc univariate data=statsclue1;
var predicted_dv;
run;

Umm..don't think I get it. Do you mean:

proc print data=statsclue out=statsclue1 p=predicted_dose;
run; /*this actually didn't run*/

proc univariate data=statsclue1; /*and how is this taking into account the regression equation that I have?*/
var predicted_dose;
run;

Working on it.

You were supposed to add the output statement to the proc statement when you fit your model...

Just to clarify what Dason said, use the output statement in proc GLM.

proc glm data=stats out=stats1 p=predicted_dose;
class gender;
model dose=gender bmi A B C A*C/solution clparm;
run;

ERROR 22-322: Syntax error, expecting one of the following: ;, (, ALPHA, DATA, MANOVA, MULTIPASS,
NAMELEN, NOPRINT, ORDER, OUTSTAT, PLOTS.
ERROR 76-322: Syntax error, statement will be ignored.

Trying to figure.

Perhaps you can use a suitable link function to force your prediction to be on the upper right hand quadrant. That is map [-Inf,Inf] -> [0,Inf]. The log function comes to mind.

Originally Posted by dmancevo
Perhaps you can use a suitable link function to force your prediction to be on the upper right hand quadrant. That is map [-Inf,Inf] -> [0,Inf]. The log function comes to mind.
Let's not jump quite there yet when the OP hasn't even successfully been able to see if the predictions are negative or not.
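For reference, the idea dmancevo raises can be sketched in Python with made-up data. Note the assumption: this swaps in a simpler technique, a log transform of Y followed by OLS, rather than a true GLM with a log link; the two are related but not the same model. The helper `fit_line` and all numbers are hypothetical:

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up positive response that falls steeply as x grows.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [8.0, 4.0, 2.0, 1.0, 0.5]
new_x = [5.0, 6.0]

# Straight OLS on Y: the fitted line can cross below zero.
a, b = fit_line(x, y)
raw_pred = [a + b * v for v in new_x]

# OLS on log(Y), then exponentiate: predictions are always positive.
la, lb = fit_line(x, [math.log(v) for v in y])
log_pred = [math.exp(la + lb * v) for v in new_x]

print(raw_pred)
print(log_pred)
```

Back-transformed predictions from a log-Y model estimate something closer to a median than a mean, so this is worth reaching for only once the negative predictions are confirmed to be a real problem.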

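Oh thanks! I got an output. Had to put the output statement (output out=stats1 p=predicted_dose;) AFTER the model statement instead of putting out= and p= on the proc glm line.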

It gives a table of extreme observations. TWO of the lowest observations are NEGATIVE. ?

Trying to figure.

edit: and there's ONE missing value...dunno why. All my dose cells had a value though some values for other variables were missing.
edit: no, the missing count is 40.

Don't worry about the missing values for now; see how many negative values you have and what % they are of the total predicted values.

All predicted values aren't printed. There are quantiles and there are extreme observations. The extreme observations show only TWO negative values and the Quantiles show one big negative value at 0% and another negative value at 1%. At all other levels---5%, 10%,25%,50%,90%,95%,99%,100%, the values are ALL POSITIVE.

So I guess negatives are 1% of the predicted values..?

PS: figured the missing thing, I think. The other variables had some data missing and so those observations were deleted for the analysis. Values used were 40 less than values read. Makes sense.

That means somewhere between 1 and 5% of your values are negative. You can also find the exact number by:
Code:
```
proc sql;
select count(predicted_dose)
from stats1
where predicted_dose<0;
quit;
```
So this shows that only a very small % of your values are predicted to be negative, contrary to your initial thought. If this is a concern to you, then you'll have to think about transformations. But first decide: is this small number a concern?
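The same count-and-percentage check, sketched in Python on made-up predicted values (the list below is hypothetical, with None standing in for observations that got no prediction because of missing IVs). The denominator is the number of non-missing predictions:

```python
# Hypothetical predicted doses; None = no prediction (missing IVs).
predicted = [12.3, -0.8, 5.1, None, 7.7, 3.2, -2.4, 9.9, None, 4.0]

non_missing = [p for p in predicted if p is not None]
n_negative = sum(1 for p in non_missing if p < 0)
pct_negative = 100.0 * n_negative / len(non_missing)

print(f"{n_negative} of {len(non_missing)} predictions are negative "
      f"({pct_negative:.1f}%)")
```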

