Check if dependency between two variables is linear

#1
I've posted the question:
https://stats.stackexchange.com/que...near/424215?noredirect=1#comment791817_424215
I have a set of values for a dependent variable (call it Y) and the same number of corresponding values for an independent variable (call it X).
Below is just a toy example:

X=2,4,7,11,15,20,25,30,33,42,45,50,55,60,70
Y=0,0,0,0,100,100,200,200,200,500,500,900,950,950,1000


How can I check whether the dependency Y(X) is linear?

In addition, I have a more theoretical question. Suppose my independent variable (X) is binary, i.e. takes only the values 0 or 1, while Y is discrete (e.g. takes the same values as in the example above). Is it possible for the dependency Y(X) to be linear? Why?



One of the replies to my question was:

A different interpretation of "linearity" is that alternative non-linear models aren't worth the additional complexity. There are two standard, textbook approaches to this: add a quadratic term or bin the independent variable(s). Run an ANOVA on the nested model. If it's not significant, conclude you haven't detected any nonlinearity. These are often called "goodness of fit" tests.


Unfortunately, the reply was not very clear to me. Is there a good explanation or tutorial for these two methods (adding a quadratic term and binning the independent variable), if possible in R? What I understand so far is that I have to fit some linear models (perhaps with the lm function) and then compare them with ANOVA. How many models? It's not clear to me which models I can build with a single independent variable. Could you help, please?
 

hlsmith

Not a robit
#2
The best way to examine linearity is to construct a scatterplot of the data and look at the relationship. You can also fit a line to the data using linear regression and check whether the fit is reasonable and the residuals show no pattern.
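
As a rough sketch of those two checks in R, using the toy data from the first post (the object name fit_lin is just illustrative):

```r
# Toy data from the first post
x <- c(2, 4, 7, 11, 15, 20, 25, 30, 33, 42, 45, 50, 55, 60, 70)
y <- c(0, 0, 0, 0, 100, 100, 200, 200, 200, 500, 500, 900, 950, 950, 1000)

# Straight-line fit
fit_lin <- lm(y ~ x)

# Scatterplot with the fitted line drawn on top
plot(x, y, xlab = "X", ylab = "Y", main = "Y against X with linear fit")
abline(fit_lin, col = "red")

# Residuals against fitted values: a curve or trend in this plot
# hints that a straight line is missing something
plot(fitted(fit_lin), resid(fit_lin),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Overall fit of the straight line
print(summary(fit_lin)$r.squared)
```

A high R-squared on its own does not prove linearity, so the residual plot is the more informative of the two.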

What I like to do, which isn't as intro level, is fit a spline to the data and see how many degrees of freedom (knots) it needs. Another basic option is to fit a Loess curve and look at the shape of the curve while playing around with the smoothness setting. What the other person was saying is that you can add terms to the linear regression model: polynomials, so X^2 or X^3, and see if they better explain the dependent variable. You can then run a test to compare the nested models (both are linear regression models). So compare y = x versus y = x + x^2. This is what they were referencing.
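
The nested-model comparison and the Loess look can be sketched in R roughly like this, again on the toy data from the first post. The names fit_lin, fit_quad and lo_fit are illustrative, and span = 0.75 is simply the loess default, not a recommendation:

```r
# Toy data from the first post
x <- c(2, 4, 7, 11, 15, 20, 25, 30, 33, 42, 45, 50, 55, 60, 70)
y <- c(0, 0, 0, 0, 100, 100, 200, 200, 200, 500, 500, 900, 950, 950, 1000)

# Two nested models: the linear one is the quadratic one with the x^2 term dropped
fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))

# F-test comparing the nested models; a small p-value suggests the
# quadratic term explains variation the straight line misses
cmp <- anova(fit_lin, fit_quad)
print(cmp)
p_quad <- cmp[["Pr(>F)"]][2]

# Loess curve over the scatterplot, with the straight line for comparison
lo_fit <- loess(y ~ x, span = 0.75)
plot(x, y, xlab = "X", ylab = "Y")
lines(x, predict(lo_fit), col = "blue")   # x is already sorted
abline(fit_lin, lty = 2)
```

So for one independent variable there are just two models to fit. If the test is not significant, the conclusion is only that no curvature was detected, not that the relationship is proven linear.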

If X is binary, then the model produces a linear relationship: the coefficient is the effect of a one-unit increase in X (from 0 to 1), much like in the X1 model, where the IV is continuous. See below:

When the relationship between DV and IV is linear:

1567170576260.png

When the DV is linear and the IV is categorical:

1567170602611.png
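
To see why a binary X cannot show nonlinearity, here is a small R sketch. The 0/1 variable x_bin is a made-up split of the toy X at 30, purely for illustration, and is not part of the original data:

```r
# Toy data from the first post
x <- c(2, 4, 7, 11, 15, 20, 25, 30, 33, 42, 45, 50, 55, 60, 70)
y <- c(0, 0, 0, 0, 100, 100, 200, 200, 200, 500, 500, 900, 950, 950, 1000)

# Hypothetical binary IV: 0 if X <= 30, 1 otherwise
x_bin <- as.numeric(x > 30)

fit_bin <- lm(y ~ x_bin)
group_means <- tapply(y, x_bin, mean)

# The intercept equals the mean of Y at x_bin = 0, and the slope equals
# the difference between the two group means
print(coef(fit_bin))
print(group_means)

# x_bin^2 is identical to x_bin, so a quadratic term adds nothing:
# lm() reports its coefficient as NA (aliased)
fit_bin_quad <- lm(y ~ x_bin + I(x_bin^2))
print(coef(fit_bin_quad))
```

Two points always lie on a straight line, so the fit passes exactly through the two group means. Scatter of the points around those means is variance, not nonlinearity.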

Let me know if this is confusing; I wrote it all pretty quickly.
 
#6
Hi,
Thanks for your reply.
Are you an R user? I'm asking because it might be easier for me to understand the concept from R code examples. The regression of Y on X2 is not linear (in the second plot you provided it doesn't look linear, i.e. the majority of the points are far from the line). So it's not possible to obtain a linear dependency for a binary IV and a discrete DV. Am I right?
 
#7
Where did he say so?

If so, I really dislike this kind of hypothetical fluffiness. Show us the real stuff!
Hi,
Thanks for your response and for the typo you found. I'm working with very large data sets and it's not possible to post them here, but the example mimics the real data I have. Sorry for the misunderstanding.
 
#8
"The regression Y on X2 is not linear"
Read about linearity.

"from the plot 2 you provided it doesn't look like linear, i.e. the majority of the points are far from the line"
That is rather an example of a large variance.

"I'm working with very large data sets and it's not possible to post it here, but the example mimics the real data I have."
Very few real large data sets are exactly linear.

A simple method could be to use a running mean over small intervals on the x-axis. Or, as suggested, do a lowess plot or fit a GAM, e.g. with the R package mgcv. You can also use goodness-of-fit or lack-of-fit tests, as in ANOVA.
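
A minimal R sketch of these options on the toy data from the first post. The break points 0, 20, 40, 70 and the basis size k = 5 are arbitrary choices for illustration, and the object names are made up:

```r
# Toy data from the first post
d <- data.frame(
  x = c(2, 4, 7, 11, 15, 20, 25, 30, 33, 42, 45, 50, 55, 60, 70),
  y = c(0, 0, 0, 0, 100, 100, 200, 200, 200, 500, 500, 900, 950, 950, 1000)
)

# Running mean: average Y within a few X intervals (break points are arbitrary)
d$bin <- cut(d$x, breaks = c(0, 20, 40, 70))
bin_means <- tapply(d$y, d$bin, mean)
print(bin_means)

# Lowess smooth drawn over the scatterplot
plot(d$x, d$y, xlab = "X", ylab = "Y")
lines(lowess(d$x, d$y), col = "blue")

# Lack-of-fit style test: do the bins improve on the straight line?
# The linear model is nested in the model that also contains the bins.
fit_lin <- lm(y ~ x, data = d)
fit_bin <- lm(y ~ x + bin, data = d)
lof <- anova(fit_lin, fit_bin)
print(lof)

# GAM via mgcv, if the package is installed: an effective degrees of
# freedom (edf) close to 1 means the smooth is close to a straight line
if (requireNamespace("mgcv", quietly = TRUE)) {
  library(mgcv)
  fit_gam <- gam(y ~ s(x, k = 5), data = d, method = "REML")
  print(summary(fit_gam)$edf)
}
```

With real data the intervals would be much narrower and there would be many more points per interval; with only 15 points all of these checks have little power.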