# P-value of intercept = 0 meaning? Can age be independent var in simple regression?

#### Sandbox007

##### New Member
Hi, I recently got into statistics. I am not sure whether this is the correct place to ask; I have included pictures as well.
1. Can I test "age of used vehicle" as the independent variable against "selling price" as the dependent variable in a simple linear regression?
2. Is my residual plot weird in any way? "Age" in my dataset only takes values from 0 to 25, so the residuals generally line up vertically. I was wondering whether that is okay, compared to when I use other variables such as "mileage", where the residual plot looks more scattered and the points do not line up.
3. The p-value of my intercept is 0. Is that considered significant and okay, or is there something wrong?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Given your sample size, everything seems fine. A Q-Q plot of the residuals would also be nice. The direction of the model makes sense: age predicts price, not the other way around.

The significant intercept just means that the average price of a brand new car (age = 0) is not equal to zero, which makes sense, right? A p-value reported as 0 is almost certainly a very small value rounded for display, not a sign that something is wrong. You have a couple of cars that don't completely fit the model (larger residuals), but the sample size is reasonable. You could always look up the cars with irregular residuals, try to figure out why they are off, and add that variable to the model.
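To make the intercept test concrete, here is a minimal sketch in plain Python. The data are made up (ages 0 to 17 and a hypothetical price formula), not the poster's dataset, and the p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for a large sample. Regression software would normally use the t distribution with n - 2 degrees of freedom.

```python
import math
import random
from statistics import NormalDist


def fit_simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares; return estimates and standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se_intercept = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx))
    se_slope = math.sqrt(sigma2 / sxx)
    return intercept, slope, se_intercept, se_slope


# Made-up data for illustration only: price drops with age, plus noise.
rng = random.Random(0)
ages = [rng.randint(0, 17) for _ in range(200)]
prices = [30000 - 1200 * a + rng.gauss(0, 3000) for a in ages]

b0, b1, se_b0, se_b1 = fit_simple_ols(ages, prices)

# Test of H0: intercept = 0. The statistic is estimate / standard error.
t_intercept = b0 / se_b0
# Normal approximation to the two-sided p-value (assumption: large n).
p_intercept = 2 * (1 - NormalDist().cdf(abs(t_intercept)))

print(f"intercept = {b0:.0f}, se = {se_b0:.0f}, t = {t_intercept:.1f}")
print(f"p-value shown to 4 decimals: {p_intercept:.4f}")
```

When the intercept is many standard errors away from zero, the p-value is smaller than the printed precision, which is why the output table shows it as 0.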

Welcome to the forum.

#### Miner

##### TS Contributor
The vertical lines are caused by the fact that age is probably in integer form rather than truly continuous. That causes the residuals to group together on the x-axis rather than spread out.
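A quick way to see this is to count how many distinct x positions the residuals occupy. The sketch below uses made-up data with whole-year ages (the price formula and sample size are assumptions for illustration), fits a line by least squares, and groups the residuals by age.

```python
import random
from collections import defaultdict

# Made-up data for illustration only: ages are recorded in whole years.
rng = random.Random(1)
ages = [rng.randint(0, 17) for _ in range(200)]
prices = [30000 - 1200 * a + rng.gauss(0, 3000) for a in ages]

# Least-squares line for price on age.
n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(prices) / n
sxx = sum((a - x_bar) ** 2 for a in ages)
sxy = sum((a - x_bar) * (p - y_bar) for a, p in zip(ages, prices))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Group residuals by age: every car of the same age shares one x position,
# so the residual plot shows one vertical strip per distinct age.
residuals_by_age = defaultdict(list)
for a, p in zip(ages, prices):
    residuals_by_age[a].append(p - (intercept + slope * a))

distinct_positions = len(residuals_by_age)
print(f"{n} residuals fall on only {distinct_positions} distinct x positions")
```

A variable like mileage takes a different value for almost every car, so its residual plot has nearly as many x positions as observations and looks scattered instead.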

#### Sandbox007

##### New Member
Thank you! Does this mean my residual plot fulfills the assumption of linearity as well? I am also quite worried about the outlier (the 25-year-old car). Should it be removed or kept? I read online that outliers should not be removed unless they are clear errors.

#### Sandbox007

##### New Member
Such a residual plot is fine for a simple linear regression, right?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Yeah, you shouldn't remove outliers unless they are erroneous. But in your set, if you have just one car that is 8 years older than the next oldest car, I would drop it and report that if disseminating your results. Also, per @Miner's comment, the residuals clustered on years are what they are, and unless you know the day each car was sold, they will stay that way. Not a real issue for the model.

#### Sandbox007

##### New Member
Would it be correct to leave the outlier in and just write something like "Hey! That's an outlier, an interesting find; hopefully future analysis with a bigger sample size can find out more"?
Edit: or would it be better to remove the outlier and rerun?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
You have a large enough sample that it shouldn't come into play. However, outliers at the end of a fit can sometimes have 'leverage'. I would fit the model with and without it and see if anything changes. If the estimates shift and you get a better fit, I would remove it and document that.
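The with-and-without comparison can be sketched as below. The data are made up: 199 cars aged 0 to 17 plus one hypothetical 25-year-old car with an unusually high price, so the numbers are illustrative assumptions, not the poster's results. For simple regression the leverage of observation i is 1/n + (x_i - x̄)² / Sxx, so a car far from the mean age gets the largest value.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leverages(x):
    """Leverage of each observation in a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Made-up data for illustration only.
rng = random.Random(2)
ages = [rng.randint(0, 17) for _ in range(199)]
prices = [30000 - 1200 * a + rng.gauss(0, 3000) for a in ages]

# Hypothetical outlier: one car 8 years older than the next oldest,
# priced well above what the trend of the other cars suggests.
ages.append(25)
prices.append(15000.0)

h = leverages(ages)
b0_all, b1_all = fit_line(ages, prices)             # outlier kept
b0_drop, b1_drop = fit_line(ages[:-1], prices[:-1])  # outlier dropped

print(f"leverage of the 25-year-old car: {h[-1]:.3f} (largest other: {max(h[:-1]):.3f})")
print(f"slope with outlier:    {b1_all:.1f}")
print(f"slope without outlier: {b1_drop:.1f}")
```

If the two slopes and intercepts are close, the point has little influence and keeping it is harmless; if they differ noticeably, reporting both fits (or the fit without the car, with a note) is the transparent option.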