Thread: Do I have a non-linear regression?

1. Do I have a non-linear regression?

Hello everyone,

I want to start by saying that I don't have much experience working with continuous dependent variables; I have worked more with logistic regression.

I want to test whether the number of Use of Force (UOF) incidents a staff member of a mental clinic had in 2014 predicts the number of UOF incidents the same staff member had in 2015. I hypothesize that members who had a high number of UOF in 2014 will also have a high number of UOF in 2015. My rationale (perhaps incorrect) is that this group may simply be more violent. I have a dataset with the same members for 2014 and 2015 (paired).

When I regress UOF_15 (DV) on UOF_14 (IV), the coefficient is .4641591 and significant (p < 0.001), suggesting a positive relationship. The distributions of both UOF_14 and UOF_15 are skewed right. I didn't fix this, but let's set it aside for a moment.

I suspect that the relationship between UOF 14 and 15 is not linear. After the regress command I ran a predict command to observe the predicted y values. For example, when an officer had 1 UOF in 2014 the predicted Y is 2.5, for 2 it is 2.9, and for 3 it is 3.5. But when an officer had 4 UOF in 2014 the predicted Y is 3.9, and when UOF is 30 the predicted Y is 16. Is this normal? I would expect a member with 40 UOF in 2014 to have a predicted Y of 40 or more. Am I missing something here?

It seems that y increases at the same rate for each value of x. That's why I think I may have a not linear line.

To address this, I created a quadratic term, UOF_2014_2 = UOF_2014*UOF_2014. I included this term in the regression and it is significant, with a coefficient of -.0067852. However, the R-squared of the model improved only slightly, from 0.210 to 0.215.
1. Am I confusing terms and principles here?
2. Do I really have a non-linear relationship?
3. If I have a non-linear (quadratic) relationship, how can I know at what point my effect starts to decrease (inflection point)?
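For question 3, a minimal Python sketch, assuming the quadratic model is y = b0 + b1*x + b2*x^2 with x = UOF_2014. Strictly speaking a parabola has no inflection point; the relevant point is its turning point (vertex), where the slope b1 + 2*b2*x reaches zero. With a negative b2 the slope shrinks steadily from the start, and the predicted value peaks at x = -b1/(2*b2). Only b2 = -0.0067852 is given in the post; the linear coefficient B1 below is a hypothetical placeholder, to be replaced with the value from your own regression output.

```python
def marginal_effect(x: float, b1: float, b2: float) -> float:
    """Slope of y = b0 + b1*x + b2*x**2 at x, i.e. dy/dx = b1 + 2*b2*x."""
    return b1 + 2 * b2 * x

def turning_point(b1: float, b2: float) -> float:
    """x at which dy/dx = 0 (the vertex of the parabola): x = -b1 / (2*b2)."""
    if b2 == 0:
        raise ValueError("b2 is 0: the model is linear and has no turning point")
    return -b1 / (2 * b2)

B2 = -0.0067852  # quadratic coefficient reported in the post
B1 = 0.6         # HYPOTHETICAL linear coefficient; use the value from your own output

x_star = turning_point(B1, B2)
print(f"predicted 2015 UOF peaks at UOF_2014 = {x_star:.1f}")
print(f"slope at 10: {marginal_effect(10, B1, B2):.3f}, at 60: {marginal_effect(60, B1, B2):.3f}")
```

If the turning point lies beyond the largest UOF_2014 in the data, the fitted curve is still rising over the whole observed range, and any "decline" is an extrapolation.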

I would appreciate any comments. Thank you so much!
Marvin

2. Re: Do I have a non-linear regression?

I know nothing about UOF in mental clinics so bear that in mind with my answer.

I would have thought that you could classify the factors that affect UOF into those that stay with a staff member between 2014 and 2015 (e.g. how violent they are) and those that don't (e.g. essentially random factors: a staff member might treat the same patient the same way, and sometimes UOF is required and sometimes it isn't). Some factors, like the section they work in, might stay the same for some staff but not for others.

In a single-factor model you are assuming that each staff member's count is made up of factors due to that staff member plus a random element.

The staff members who had the most UOF instances in 2014 are probably more likely to have UOF in 2015, but they also probably had higher-than-expected levels in 2014 by chance.

To simplify things, say staff members fall into two types, violent and non-violent. Violent types have a number of UOF that is (approximately) normally distributed with mean 10 and s.d. 4; non-violent types have a mean of 4 and an s.d. of 2. The staff member with the most UOF might have had 30: he is almost certainly of the violent type, but it was just chance that he exceeded the violent-type average of 10. The expected number of UOF the following year for that staff member would still be 10 (or a little lower, as there is a small chance he is non-violent). An R-squared of 0.21 (a correlation of about 0.46) implies that the relationship is not very strong and that random (or unmodelled) factors have a greater effect on the dependent variable than the independent variable does.

Of course, in reality each staff member has their own mean level, but it makes sense that the staff with the fewest UOF were not only less violent but also lucky (if UOF incidents are considered a bad thing), and those with the most were not only more violent but also unlucky. This assumption means that for a linear model
UOF15 = a + b*UOF14
I would expect a to be positive (someone with 0 UOF in 2014 has an expected value greater than 0 in 2015) and b < 1 (if the total number of UOF is similar in each year, the people with the most in 2014 would be expected to have more than average in 2015, but less than they had in 2014).
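The two-type argument can be simulated. A minimal Python sketch, assuming (hypothetically) that 30% of staff are the violent type, that each staff member's type stays fixed across years while the yearly count is redrawn, and that the normal draws are clipped at 0 so counts cannot be negative. It fits UOF15 = a + b*UOF14 by ordinary least squares and compares the top 10% of 2014 staff across the two years.

```python
import random
import statistics

random.seed(1)

P_VIOLENT = 0.3                               # HYPOTHETICAL share of violent-type staff
TYPE_PARAMS = {True: (10, 4), False: (4, 2)}  # (mean, s.d.) per type, as in the post

def draw_count(violent: bool) -> float:
    """One year's UOF level for a staff member of the given type (clipped at 0)."""
    mean, sd = TYPE_PARAMS[violent]
    return max(0.0, random.gauss(mean, sd))

def ols(xs, ys):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Each staff member keeps their type; only the yearly noise is redrawn.
types = [random.random() < P_VIOLENT for _ in range(2000)]
uof14 = [draw_count(t) for t in types]
uof15 = [draw_count(t) for t in types]

a, b = ols(uof14, uof15)
print(f"UOF15 = {a:.2f} + {b:.2f} * UOF14")

# Staff in the top 10% of 2014: their 2015 average falls back towards the type mean.
cutoff = sorted(uof14)[int(0.9 * len(uof14))]
top = [(x, y) for x, y in zip(uof14, uof15) if x >= cutoff]
top14 = statistics.fmean(x for x, _ in top)
top15 = statistics.fmean(y for _, y in top)
print(f"top 10% in 2014: mean {top14:.1f} in 2014, {top15:.1f} in 2015")
```

Nobody's underlying behaviour changes between the two simulated years, yet the fitted line has a positive intercept and a slope below 1, and the top group's average drops: regression to the mean.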

All that said, for the usual regression inference to be reasonable, the residuals need to be independent, normal random variables. If the relationship is non-linear, the residuals (the differences between the actual and predicted 2015 values) will not be independent of the 2014 values: they will show a systematic pattern. If the actual relationship is quadratic, most of the staff who had very low or very high UOF in 2014 will have negative residuals and those in the middle will have positive residuals (or vice versa). This is easiest just to plot and judge by eye, but if you have enough data you can put the staff into groups based on 2014 UOF and test whether the mean residual in each group is close enough to 0 to have occurred by chance.
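A rough Python version of the grouping check, on simulated data (hypothetical: a concave truth y = 1 + 0.8x - 0.02x^2 plus normal noise, fitted with a straight line). The bin edges are arbitrary choices, and the t statistic is only a rough guide, since residuals from the same fit are slightly correlated.

```python
import math
import random
import statistics

def ols(xs, ys):
    """Intercept and slope of the least-squares line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def binned_residuals(xs, resids, edges):
    """For each bin [lo, hi) of x: count, mean residual and a rough t = mean / SE."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        r = [e for x, e in zip(xs, resids) if lo <= x < hi]
        if len(r) < 2:
            continue  # too few points to estimate a standard error
        mean = statistics.fmean(r)
        se = statistics.stdev(r) / math.sqrt(len(r))
        rows.append((lo, hi, len(r), mean, mean / se))
    return rows

# HYPOTHETICAL data: the true curve bends (quadratic), but we fit a straight line.
random.seed(2)
xs = [random.uniform(0, 30) for _ in range(600)]
ys = [1 + 0.8 * x - 0.02 * x * x + random.gauss(0, 1) for x in xs]
a, b = ols(xs, ys)
resids = [y - (a + b * x) for x, y in zip(xs, ys)]

EDGES = [0, 5, 10, 20, 25, 30.01]
bins = binned_residuals(xs, resids, EDGES)
for lo, hi, n, mean, t in bins:
    print(f"x in [{lo}, {hi}): n={n:3d}  mean residual={mean:+.2f}  t={t:+.1f}")
```

The pattern described above shows up directly: negative mean residuals in the lowest and highest bins, positive ones in the middle.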

Whether to include a factor or not is not an exact science. If you have few data points you don't want many factors in the model. You also need to consider whether, although the quadratic improves the fit a little, another function might improve it more.
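One way to weigh a small gain in R-squared against an extra term is adjusted R-squared, which penalizes each added predictor. A sketch in Python; the sample size N is hypothetical because the thread does not report it, and the R-squared values are the rounded ones from the post.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors (excluding the constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

N = 500  # HYPOTHETICAL sample size; replace with your own
linear = adjusted_r2(0.210, N, 1)     # UOF_14 only
quadratic = adjusted_r2(0.215, N, 2)  # UOF_14 and its square
print(f"linear {linear:.4f}, quadratic {quadratic:.4f}")
```

With a large sample the penalty for one extra term is small, so the comparison barely changes; with only a few dozen staff the penalty can outweigh a 0.005 gain.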

With your data I would consider taking logs. Without logs your residuals have a floor, because counts cannot be negative (someone who had 1 UOF in 2014 has a predicted 2015 value of about 2.5, so their residual cannot be less than about -2.5), and I get the impression from your post that the standard deviation of the residuals will be high enough that assuming normal residuals would give many staff a significant probability of a negative UOF count in 2015.

Taking logs would usually make your model
Ln(UOF15) = Ln(UOF14) + c
but in your data you can have 0 UOF (just not negative), whereas a log-based model assumes all the variables are positive. I don't see why you couldn't change the model to
Ln(UOF15 + 1) = Ln(UOF14 + 1) + c
if you do have zeros in the data.
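A minimal Python sketch of the shifted-log model. It fits the model as written (slope on ln(UOF14 + 1) fixed at 1, so the least-squares estimate of c is the mean of ln(UOF15 + 1) - ln(UOF14 + 1)) and, for comparison, a version with the slope estimated, which is the more usual log-log regression and can reflect a slope below 1. The paired counts are hypothetical.

```python
import math
import statistics

def fit_fixed_slope(xs, ys):
    """Least-squares c in ln(y + 1) = ln(x + 1) + c: the mean log difference."""
    return statistics.fmean(math.log1p(y) - math.log1p(x) for x, y in zip(xs, ys))

def fit_free_slope(xs, ys):
    """Least-squares a, b in ln(y + 1) = a + b * ln(x + 1)."""
    us = [math.log1p(x) for x in xs]
    vs = [math.log1p(y) for y in ys]
    mu, mv = statistics.fmean(us), statistics.fmean(vs)
    b = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / sum((u - mu) ** 2 for u in us)
    return mv - b * mu, b

# HYPOTHETICAL paired counts (2014, 2015), zeros included
uof14 = [0, 0, 1, 2, 3, 5, 8, 12, 20, 30]
uof15 = [1, 0, 2, 1, 4, 3, 6, 9, 11, 15]

c = fit_fixed_slope(uof14, uof15)
a, b = fit_free_slope(uof14, uof15)
print(f"fixed slope: c = {c:.3f}; free slope: a = {a:.3f}, b = {b:.3f}")
for x in (0, 4, 30):
    # Back-transform to the count scale: exp(...) - 1
    fixed = (x + 1) * math.exp(c) - 1
    free = math.exp(a + b * math.log1p(x)) - 1
    print(x, round(fixed, 1), round(free, 1))
```

Back-transforming with exp(...) - 1 gives a geometric-mean-type prediction, which by Jensen's inequality is no higher than the conditional mean of UOF15, so the back-transformed values are better read as typical rather than average counts.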

4. Re: Do I have a non-linear regression?

I suspect that the relationship between UOF 14 and 15 is not linear. After the regress command I ran a predict command to observe the predicted y values. For example, when an officer had 1 UOF in 2014 the predicted Y is 2.5, for 2 it is 2.9, and for 3 it is 3.5. But when an officer had 4 UOF in 2014 the predicted Y is 3.9, and when UOF is 30 the predicted Y is 16. Is this normal?
I do not know what "normal" means. However, you have a linear model with one IV and a constant term. Judging by the values you report, your constant term must be approximately 2, so it seems right.

It seems that y increases at the same rate for each value of x. That's why I think I may have a not linear line.
I'm sorry, I do not understand this. If you have a linear model y = a + bx, is it not the case that b is the constant rate at which y increases for each unit of x? I do not know what you mean by a "not linear line".
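Jesper's point can be checked with a few lines of Python, assuming the fitted line is y = 2.0 + 0.4641591*x (2.0 is the rounded constant Marvin confirms later in the thread, so these predictions may differ slightly from Stata's). Each extra 2014 incident adds exactly b to the prediction, and the line crosses y = x at x = a/(1 - b): below that point the prediction exceeds the 2014 count, above it the prediction falls short.

```python
A = 2.0          # constant, as rounded in the thread
B = 0.4641591    # slope on UOF_14 reported in the thread

def predict(x: float) -> float:
    """Predicted 2015 UOF from the linear model y = A + B*x."""
    return A + B * x

for x in (1, 2, 3, 4, 30):
    # Each extra 2014 incident adds B to the prediction.
    print(x, round(predict(x), 2))

# Where the line crosses y = x: A + B*x = x  =>  x = A / (1 - B).
crossover = A / (1 - B)
print(f"prediction exceeds the 2014 count only below x = {crossover:.2f}")
```

This is why a member with 30 or 40 UOF in 2014 gets a prediction well below their 2014 count: with a slope under 1, the constant only compensates for small values of x.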

Do I really have a non-linear relationship?
You have a statistically significant non-linear relationship between IV and DV; whether it is practically significant is another story. -.0067852 may not seem large, but if you really have values as large as 30 on the IV, then -0.0067852*30^2 ≈ -6.1, which may or may not matter a lot.
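To see how much the squared term contributes at different values of UOF_14, a short sketch using the coefficient reported in the thread (the x values are illustrative, not taken from the data):

```python
B2 = -0.0067852  # quadratic coefficient reported in the thread

def quadratic_contribution(x: float) -> float:
    """Part of the prediction coming from the squared term, B2 * x**2."""
    return B2 * x ** 2

for x in (5, 10, 20, 30):
    print(x, round(quadratic_contribution(x), 2))
```

This term alone is not the whole story: adding x^2 to the model also changes the linear coefficient, so comparing the full predicted curves of the two models is more informative than looking at the squared term in isolation.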

5. Re: Do I have a non-linear regression?

Hi Jesper,

1. Yes, my constant term is 2.0. I got confused at first because I was expecting the predicted 2015 UOF to always be higher than the 2014 UOF, but with a coefficient of .46 it isn't: for example, 2 UOF gives 2.9, while 30 UOF gives 16. After doing some reading I realized that the constant plays a role. Is that right? Also, I should see an equal increase in my predicted Y values for every one-unit increase in X, right? And this increase should be my coefficient, .46. Am I right?

2. Disregard this question. How can I test for a non-linear relationship?

3. How can I look for outliers? Is there a systematic way to do this other than looking at a graph? For example, I have a couple of officers who had 6 UOF in 2014 and 40 UOF in 2015.

Best,
Marvin

6. Re: Do I have a non-linear regression?

Hi Marvin,

I imagine UoF data as being the type of data we call counts. Look at the Poisson family of distributions to see whether your data resemble them. In the Poisson distribution the variance is equal to the mean; when this assumption does not hold, we look at related distributions such as the negative binomial. Modelling count data (2015) with count data (2014) is unfamiliar territory for me.
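A quick Python check of the Poisson variance-equals-mean property, using made-up counts for illustration (not Marvin's data). A ratio well above 1 points towards overdispersion and a negative binomial model. This marginal check is only a first look; the more relevant question is the dispersion left after conditioning on UOF_14, which a fitted Poisson or negative binomial regression would assess.

```python
import statistics

def dispersion_index(counts):
    """Sample variance divided by sample mean: near 1 for Poisson counts,
    well above 1 for overdispersed counts."""
    mean = statistics.fmean(counts)
    return statistics.variance(counts) / mean

# Made-up yearly UOF counts for ten staff members (illustration only)
uof15 = [0, 1, 1, 2, 2, 3, 4, 6, 9, 40]
print(round(dispersion_index(uof15), 1))
```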
2. To test whether there is a non-linear relationship, you can look at the regression R^2 and see whether it increases when you use transformed values of the IV and DV. (I suspect that for positive skew you will need a square-root or log transform, and you may or may not need to reflect the data first; look online for information on this.) Basically, a transform towards a normal distribution is what you would need for linear regression. Then go back to the R^2 value: did it increase? If so, you're on to a winner (maybe).
3. Once you have your mean outcomes, you will have the standard errors of those mean estimates. In theory, outliers are those with UoFs more than 2 standard errors from the mean. But this does not account for individual variation, which has much larger error margins. Anyway, as a rule of thumb, given that it is a small sample, you should probably consider an individual more than 4 standard errors from the mean to be an outlier.
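A commonly used alternative to comparing counts with the standard error of the mean is to standardize each residual by the regression's own residual spread. The Python sketch below computes externally studentized (leave-one-out) residuals: each staff member is dropped, the line is refitted, and the gap between their actual and predicted 2015 count is divided by its prediction standard error. The data are hypothetical, with one staff member mimicking the "6 in 2014, 40 in 2015" case, and the |t| > 3 cut-off is a rule of thumb rather than a formal test.

```python
import math
import statistics

def ols(xs, ys):
    """Least-squares intercept, slope, mean of x and Sxx for y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b, mx, sxx

def studentized_deleted_residuals(xs, ys):
    """Externally studentized residuals: drop point i, refit, and scale
    y_i - yhat_(i) by the prediction standard error from the reduced fit."""
    out = []
    for i in range(len(xs)):
        rx, ry = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        a, b, mx, sxx = ols(rx, ry)
        m = len(rx)
        s = math.sqrt(sum((y - a - b * x) ** 2 for x, y in zip(rx, ry)) / (m - 2))
        se = s * math.sqrt(1 + 1 / m + (xs[i] - mx) ** 2 / sxx)
        out.append((ys[i] - (a + b * xs[i])) / se)
    return out

# HYPOTHETICAL (2014, 2015) counts; the last staff member mimics the 6 -> 40 case.
uof14 = [0, 1, 1, 2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 20, 6]
uof15 = [1, 2, 3, 2, 3, 5, 4, 6, 5, 7, 8, 9, 10, 12, 40]

t = studentized_deleted_residuals(uof14, uof15)
flagged = [i for i, ti in enumerate(t) if abs(ti) > 3]  # rule-of-thumb cut-off
print(flagged, [round(t[i], 1) for i in flagged])
```

Several similar outliers (such as the couple of officers Marvin mentions) can mask each other, because each one stays in the fit when the other is dropped, so it is still worth plotting the data.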

Cheers
Michael
