Linear Regression: how to tell if factors "disagree"?

#1
I use a regression as my predictor. Let's say my regression is

[TEX]y=a_1 x_1+a_2 x_2+a_3 x_3[/TEX]

I realized that in practice, when my prediction is way off, it's usually because one factor significantly skewed the prediction. For example, x1 and x2 are both slightly negative, while x3 is very positive.

In this case (when factors "disagree" with each other), I would rather my predictor not do anything than report a "controversial" prediction.

What is the best way of identifying if the factors "disagree"?

Thanks!

By the way, how do I post formulas properly on this forum?
 
ledzep

Point Mass at Zero
#2
Hey, welcome to TS. Glad that you've posted your query here. Just a few questions to get a bit more information on your problem.

I realized that in practice, when my prediction is way off, it's usually because one factor significantly skewed the prediction.
Does your model fit the data well? Have you performed residual analysis? Are all the variables in your model significant?

Also, not every prediction will be good. Some values (e.g. outliers) are hard to predict accurately.

In this case (when factors "disagree" with each other),..
Can you please explain a bit more what you mean by "disagreement" between the factors? Do the 3 factors have to agree in sign (all positive or all negative)?

By the way, how do I post formulas properly on this forum?
If you want to type a formula, put your LaTeX code inside math tags.

For example, to type f(x) = mu/2, put the LaTeX code inside the math tags like this:

\( f(x)= \frac{\mu}{2}\)
 
#3
Thanks for your reply. Yes, all the factors are significant and the regression works fine in most cases.

I agree that it is hard to predict outliers, which is exactly what I am trying to do here. In my particular application, I don't have to generate a prediction every single time (it can simply return "no clue"), but when I do, I would rather it be correct.

From my observation, the predicted values are usually off when the factors' signs are different, or when their magnitudes are significantly different. So I want to attack the problem from here. I am looking for a quantitative way of measuring the "disagreement" (instead of the sign rule I gave as an example), or any way to tell "an outlier is likely to happen with this input; don't trust the regression result."
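To make concrete what I mean, here is a rough sketch of one possible score. It is only a heuristic I am assuming for illustration, not an established method: take each term's contribution c_i = a_i x_i, then compare |sum of c_i| with the sum of |c_i|. The ratio is 1 when all terms push the same way and drops towards 0 when they cancel. A second number, the share of the largest term, covers the "one factor dominates" case. The function names and the thresholds (0.8 and 0.7) are made up and would need tuning on real data.

```python
import numpy as np

def agreement_score(coefs, x):
    """Return |sum(c)| / sum(|c|) for the contributions c_i = a_i * x_i.

    1.0 means every term pushes the prediction in the same direction;
    values near 0.0 mean the terms largely cancel each other out.
    """
    contrib = np.asarray(coefs, dtype=float) * np.asarray(x, dtype=float)
    total = np.abs(contrib).sum()
    if total == 0.0:
        return 0.0  # no signal at all, treated as "no clue"
    return float(abs(contrib.sum()) / total)

def dominance(coefs, x):
    """Share of the total absolute contribution owed to the largest term."""
    contrib = np.abs(np.asarray(coefs, dtype=float) * np.asarray(x, dtype=float))
    total = contrib.sum()
    return float(contrib.max() / total) if total > 0.0 else 0.0

def predict_or_abstain(coefs, x, min_agreement=0.8, max_dominance=0.7):
    """Return the prediction, or None ("no clue") when the terms disagree.

    Both thresholds are hypothetical defaults, not recommended values.
    """
    if agreement_score(coefs, x) < min_agreement:
        return None  # signs conflict enough that the terms partly cancel
    if dominance(coefs, x) > max_dominance:
        return None  # a single factor is driving the whole prediction
    return float(np.dot(coefs, x))
```

With coefficients (1, 1, 1) and inputs (-0.2, -0.3, 2.0), like my example in the first post, the agreement score is 0.6 and the dominance is 0.8, so the sketch returns None instead of a prediction.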

[BTW, I tried to use LaTeX, but what you see in the main post is what I got.]
 

ledzep

Point Mass at Zero
#4
It is a very interesting task to predict the outliers. I know of no method. You may make an educated guess depending on your data.

Remember that an observation is an outlier if it is atypical on the residuals plot (not on the raw y values). This is because an unusually large (or small) value of y could simply be the result of a certain (synergistic) combination of factors, which the model may still fit well.
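If it helps, checking the residuals rather than the raw y values could be sketched roughly as below. This assumes an ordinary least squares fit with NumPy and no intercept, matching the equation in the first post; the function name is made up for illustration, and the cutoff of 3 in the usage note is only a common rule of thumb.

```python
import numpy as np

def studentized_residuals(X, y):
    """Internally studentized residuals of an OLS fit of y on X.

    X is an (n, p) design matrix; add a column of ones if the model
    should have an intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverage h_ii: diagonal of the hat matrix H = X @ pinv(X)
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    # Residual variance estimate with n - p degrees of freedom
    sigma2 = resid @ resid / (n - p)
    return resid / np.sqrt(sigma2 * (1.0 - leverage))
```

Observations with an absolute studentized residual above about 3 are the ones worth inspecting; a large y by itself does not qualify if the model explains it.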

Adding your equation inside the math tags should work fine, like this:
\(y=a_1 x_1+a_2 x_2+a_3 x_3\)