Let me rephrase my question: I would like to know what R is telling me exactly with the following output.
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)      -4.87321    1.90100  -2.564 0.010362 *
ConsDur           0.08247    0.02309   3.572 0.000355 ***
ConsDur:Sonority -0.05816    0.01015  -5.730 1.00e-08 ***
Where does the coefficient for ConsDur:Sonority even come from? Is it some sort of data manipulation of ConsDur? If so, how can I get a predicted probability from these values?
Understandable; I personally am not comfortable using interactions unless one of the variables is dichotomous (0/1); for me the interpretation/assumption gets a little tricky. The coefficient for ConsDur:Sonority is estimated the same way as the coefficient for ConsDur. You can see that by doing this: in your data set make a new variable, call it inter = ConsDur x Sonority. Then run your model adjusting for ConsDur and inter. You should see the same result for inter as you did for ConsDur:Sonority. At its base the interaction is just another covariate, albeit one whose values are determined entirely by two other covariates; it is not a covariate that you can change by itself.
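To make that concrete, here is a minimal sketch in R. The data are simulated purely for illustration (the sample size, value ranges, and true coefficients are all made up and are not your data); only the variable names ConsDur, Sonority, and inter follow the post. It also assumes your model has no Sonority main effect, since none appears in the output you posted.

```r
# Simulated data for illustration only; none of these numbers come from the real data set.
set.seed(1)
n <- 500
d <- data.frame(ConsDur = runif(n, 50, 150), Sonority = runif(n, 0, 3))
eta <- -2 + 0.04 * d$ConsDur - 0.02 * d$ConsDur * d$Sonority
d$y <- rbinom(n, 1, plogis(eta))

# Model written with R's interaction syntax (ConsDur plus the ConsDur:Sonority product)
m1 <- glm(y ~ ConsDur + ConsDur:Sonority, family = binomial, data = d)

# Same model, but with the product computed by hand as its own column
d$inter <- d$ConsDur * d$Sonority
m2 <- glm(y ~ ConsDur + inter, family = binomial, data = d)

# The two sets of estimates should agree; only the label of the last term differs
print(coef(m1))
print(coef(m2))
```

If the two printed coefficient vectors match, that confirms ConsDur:Sonority is nothing more than the product column.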
What I find tricky is that the interaction value is the same when ConsDur=0.1 and Sonority=30 as when ConsDur=3 and Sonority=1 (the product is 3 in both cases). When one of the variables is dichotomous you do not need to worry about this.
When I said "interpret the interaction", I meant that I would not know how to calculate the predicted probability based on a particular value of consonant duration in this model because the interaction term needs a coefficient from consonant duration and sonority (interaction coefficient*ConsDur coef*Sonority coef).
You need particular values of both consonant duration and sonority. You can't calculate a predicted probability if you only input a particular value for consonant duration, because the predicted probability varies according to the level of sonority too.
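As a rough sketch of that calculation in R, using the estimates from the posted output: the function name pred_prob and the input values (ConsDur = 60, Sonority = 1) are hypothetical, chosen only to show the mechanics. It assumes the three rows shown are the whole model, i.e. there is no separate Sonority main effect.

```r
# Estimates copied from the posted output
b0    <- -4.87321   # (Intercept)
b_dur <-  0.08247   # ConsDur
b_int <- -0.05816   # ConsDur:Sonority

# Hypothetical helper: predicted probability for given ConsDur and Sonority
pred_prob <- function(cons_dur, sonority) {
  # Linear predictor on the log-odds scale; the interaction term is the product
  eta <- b0 + b_dur * cons_dur + b_int * cons_dur * sonority
  # Inverse logit turns log-odds into a probability
  plogis(eta)
}

# Hypothetical input values, for illustration only
p <- pred_prob(cons_dur = 60, sonority = 1)
print(p)
```

With the fitted model object at hand, predict() with type = "response" and a newdata data frame holding both ConsDur and Sonority should give the same number without any hand calculation.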
Working through an example calculation with your output: if ConsDur=0.1 and Sonority=30 (so ConsDur*Sonority=3), then increasing ConsDur by 1 to 1.1 and decreasing Sonority by 26.4 to 3.6 raises ConsDur*Sonority by 0.96 to 3.96, and the "value that leads to predicted probability" (the linear predictor, on the log-odds scale) changes by 1*0.082 + 0.96*(-0.058) = +0.026. I will just repeat here that it is tricky to me when neither of the variables is dichotomous. With a +1 increase in ConsDur and a +1 increase in Sonority, the end effect depends on what levels of ConsDur and Sonority you started with, because the interaction term will change by varying amounts.
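The same arithmetic can be checked in R. This is only a sketch: the helper name lin_pred is made up, and it uses the unrounded estimates from the output, so the first result comes out slightly different from the hand calculation with rounded coefficients. The second part compares a +1 increase in both variables from the two starting points mentioned earlier, which share the same product of 3.

```r
# Estimates copied from the posted output
b0    <- -4.87321
b_dur <-  0.08247
b_int <- -0.05816

# Hypothetical helper: linear predictor (log-odds) for given values
lin_pred <- function(cons_dur, sonority) {
  b0 + b_dur * cons_dur + b_int * cons_dur * sonority
}

# Change from (ConsDur = 0.1, Sonority = 30) to (ConsDur = 1.1, Sonority = 3.6)
delta <- lin_pred(1.1, 3.6) - lin_pred(0.1, 30)
print(delta)

# +1 in both variables, starting from two points whose product is 3 in both cases
delta_a <- lin_pred(1.1, 31) - lin_pred(0.1, 30)  # product goes from 3 to 34.1
delta_b <- lin_pred(4, 2)    - lin_pred(3, 1)     # product goes from 3 to 8
print(c(delta_a, delta_b))
```

The two changes in the last line differ a lot even though both start from the same interaction value, which is exactly the dependence on starting levels described above.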
Yeah, I thought of that, but I am afraid of overfitting. The second-best model, ranked by lowest AIC, is the one with just ConsDur. P.S. I'm not surprised that Sonority has very little effect and is not significant when the interaction is added. Sonority is a measure of voicing, which is related to consonant duration for inertial motor reasons. In some languages voicing is contrastive by itself, but I have reason to believe this is not the case for Ojibwe.
Right, adding Sonority to the model may be only adding a statistically non-significant term. But people like to at least see that you considered it as well. It will give a fuller picture, show that you are not trying to obscure something to get a certain result. Maybe you could present two models, then point out that the one with Sonority is really rubbish, so you're just going to go forward with predicted probabilities based on the model without Sonority?