Interpretation help for my negative binomial regression --> Testing for curvilinearity

Hello everyone,

I am a newbie to this forum, so I hope my question is not redundant. I am currently working on my master's thesis, in which I investigate the effect that diversity in teams has on publications.
For this, I have a dependent variable 'publications', which is a count variable. I have two control variables, 'Team_Size' and 'Start_Year'. For my independent variables I have generated four diversity indices (I don't want to go into too much detail) as well as the average experience per team member (five independent variables in total).

As I assume a curvilinear relationship between diversity in teams and the publication rate, I would like to test whether there is an inverted U-shaped relationship between the dependent and independent variables. Hence, I have generated the squares of my independent variables in Stata.

I have used the -nbreg- (negative binomial regression) command, as my dependent variable 'Pub' is a count variable with many zeros (--> overdispersion).

In my first model, I included only my control variables; then I added my independent variables and their squares stepwise.
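The commands looked roughly like this (variable names here are placeholders, not my actual ones):

```stata
* squared term for one of the diversity indices (illustrative names)
gen Div_2_sq = Div_2^2

* Model 1: controls only
nbreg Pub Team_Size Start_Year
estimates store m1

* Model 2: add a diversity index
nbreg Pub Team_Size Start_Year Div_2
estimates store m2

* Model 3: add its square to test for curvilinearity
nbreg Pub Team_Size Start_Year Div_2 Div_2_sq
estimates store m3
```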

Now I am not sure how to interpret my results. Could somebody help me? Attached below is a screenshot of my output (it is not pretty yet, just exported to Excel).

As far as I see it, I would say the following:

1) There is a significant relationship between Team_Size and Publications, such that bigger teams produce fewer publications.
2) The control variable 'Start_Year' also has a significant association with the publication rate, which seems to decrease over time.
3) Diversity Index 2 has a significant association with the publication rate. It is not a curvilinear relationship, as the square of Diversity Index 2 is not significant; instead, the relationship is linear and negative.
4) Diversity Indices 4 and 5 are insignificant.
5) Average_Experience is significant, and there is a U-shaped relationship between this independent variable and the dependent variable.

Is that all correct?

How come my control variable 'Team_Size' is significant in some models but not in others? Can I still say that Team_Size is significant?
Diversity_Index_2 is only significant in model 2. Can I still say that it is significant overall?

Re: Interpretation help for my negative binomial regression --> Testing for curvilinearity

Hello there!

Firstly, note that many zeros is not the same thing as over-dispersion: NB accounts for over-dispersion, but it does not by itself handle an excess of zeros (the two are commonly confused). If you have many zeros, then zero-inflated NB (-zinb- in Stata) should be your choice. Further, raw NB regression coefficients are not as directly interpretable as those in linear regression. Therefore, I suggest you include the option that displays incidence-rate ratios (the -irr- option in Stata). The exponentiated coefficients then represent multiplicative changes in the expected rate of the DV: an IRR of 1.10, for example, means a 10 percent increase in the rate per unit increase in the predictor.
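In Stata syntax that would look something like this (variable names are placeholders; the choice of x1 as the inflation predictor is purely illustrative):

```stata
* negative binomial with incidence-rate ratios reported
nbreg y x1 x2 x3 x4 x5, irr

* zero-inflated negative binomial; inflate() specifies the predictors
* of the excess-zero process (here illustratively x1)
zinb y x1 x2 x3 x4 x5, inflate(x1) irr
```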

Now, to tell you the truth, I cannot see your picture, as the text is too small. Assuming you have a simplified model with 5 IVs of the following form:

y = x1 + x2 + x3 + x4 + x5, where x5 is suspected to be curvilinear. To test for curvilinearity, you square x5 (call it x5_sq) and add it to the model: y = x1 + x2 + x3 + x4 + x5 + x5_sq. You do not need to square the other predictors if they are not suspected to have a curvilinear relationship with the DV.
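In Stata you can either generate the square by hand or use factor-variable notation; the latter fits the same model but keeps postestimation commands such as -margins- aware that the two terms belong together:

```stata
* by hand
gen x5_sq = x5^2
nbreg y x1 x2 x3 x4 x5 x5_sq

* equivalently, with factor-variable notation
nbreg y x1 x2 x3 x4 c.x5##c.x5
```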

If the estimated squared term is significant, then you have the first sign of a curvilinear effect. If its sign is negative, the curve is concave downward (inverted-U); otherwise it opens upward (U-shaped). Finally, you should calculate the turning point of the curve, x* = -b_x5 / (2 * b_x5_sq), where b_x5 and b_x5_sq are the estimated coefficients on x5 and its square. If the resulting value falls within the reasonable range of observed x5 values, then you do indeed have a curvilinear effect. Plot it and explore the effect.
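Assuming the model was fitted with hand-generated variables named x5 and x5_sq, the turning point (with a delta-method standard error) can be obtained directly after estimation with -nlcom-:

```stata
* turning point x* = -b_x5 / (2 * b_x5_sq)
nlcom -_b[x5] / (2*_b[x5_sq])
```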

Note that in interpreting your results you should rely on those estimated in the full model -- i.e., y = x1 + x2 + x3 + x4 + x5 + x5_sq -- and not the step-wise ones. Also, if you do use a step-wise approach, it would make sense to report the log likelihood to see how the model "behaves" as new regressors are added.
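For the nested step-wise models, you can also compare fits formally with a likelihood-ratio test (model names below are illustrative):

```stata
* store the restricted and full models, then test the restriction
nbreg y x1 x2 x3 x4 x5
estimates store base
nbreg y x1 x2 x3 x4 x5 x5_sq
estimates store full
lrtest base full
```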