Hello dear forum members,

My study examines the impact of (a) online reputational status and (b) online reviews on physicians’ performance, proxied by the number of new patient referrals (non-negative, over-dispersed counts).
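
A quick way to see what "over-dispersed" means here: under a Poisson model the variance of the counts equals their mean, so a variance-to-mean ratio well above 1 points toward a negative binomial rather than a Poisson specification. The minimal Python sketch below uses made-up referral counts (not data from this study) and is only a crude unconditional check, since it ignores covariates.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean of a list of counts.

    Under a Poisson model the variance equals the mean (ratio near 1);
    a ratio well above 1 signals over-dispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; ratio is undefined")
    return variance(counts) / m  # variance() uses the n - 1 denominator


# Hypothetical referral counts for five physicians (not study data).
referrals = [0, 0, 1, 5, 10]
print(round(dispersion_ratio(referrals), 3))  # 5.844
```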

In case (a), to address the endogeneity of reputational status (a binary variable), I use an instrumental variable (IV) approach and estimate the following equations:

y1 = exp(y2 + x1 + x2 + x3) + u1 (1) --> panel negative binomial model (FE)

y2 = x1 + x2 + x3 + z1 + z2 + z3 + u2 (2) --> panel probit model (RE)
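
Equations (1) and (2) describe the two stages but not how they are linked. One common way to link a binary first stage with a count second stage is a control-function (two-stage residual inclusion) approach: fit the probit in (2), compute each observation's generalized residual, and add it as an extra regressor in (1). I am assuming this is the intended linkage, and the function names below are hypothetical. The stdlib-only Python sketch shows just the generalized-residual step; it ignores the random effect in the panel probit, and the second-stage standard errors would still need to account for the estimated residual (for example, by bootstrapping both stages).

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_sf(z):
    """Upper tail 1 - Phi(z), computed directly to avoid cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def probit_generalized_residual(y2, xb):
    """Generalized residual of a probit for one observation.

    y2 : observed binary endogenous variable (0 or 1)
    xb : fitted linear index from the first-stage probit
    Returns phi(xb)/Phi(xb) if y2 == 1, and -phi(xb)/(1 - Phi(xb)) if y2 == 0.
    """
    if y2 not in (0, 1):
        raise ValueError("y2 must be 0 or 1")
    if y2 == 1:
        return norm_pdf(xb) / norm_cdf(xb)
    return -norm_pdf(xb) / norm_sf(xb)


# Hypothetical (y2, fitted index) pairs; real indices would come from the probit.
for y2, xb in [(1, 0.0), (0, 0.0), (1, 1.5), (0, -1.0)]:
    print(y2, xb, round(probit_generalized_residual(y2, xb), 4))
```

The residual is positive whenever y2 = 1 and negative whenever y2 = 0, so it captures the part of reputational status not explained by the probit index.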

Let me briefly describe case (b): using Leximancer, I analyzed the text reviews and extracted four primary emergent themes (interpersonal manners, staff, office, and knowledge), and for each theme I exported the total count of occurrences. Let me call these variables n1 through n4.

Note that, on their own, n1 through n4 should not directly affect referrals: a higher count of, say, office- or staff-related concepts says nothing about whether those mentions are favorable or unfavorable. For that reason, I also extracted the overall sentiment of the reviews for each physician (let’s call it “s”). It is then plausible that a higher count with favorable sentiment increases referrals, whereas a higher count with unfavorable sentiment decreases them.
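
The review-based regressors described above (four theme counts, sentiment, and their products) can be assembled per physician as in this Python sketch. It assumes s is a signed score (positive for favorable, negative for unfavorable); the function name and the example values are hypothetical.

```python
# Theme-count variables; order follows the themes listed above:
# n1 = interpersonal manners, n2 = staff, n3 = office, n4 = knowledge.
THEMES = ("n1", "n2", "n3", "n4")


def review_regressors(counts, s):
    """Build the review-based regressors for one physician.

    counts : dict mapping "n1".."n4" to Leximancer occurrence counts
    s      : overall review sentiment, ASSUMED signed
             (> 0 favorable, < 0 unfavorable)
    Returns a dict with the 4 main effects, s, and the 4 products n_k * s.
    """
    missing = [k for k in THEMES if k not in counts]
    if missing:
        raise KeyError(f"missing theme counts: {missing}")
    row = {k: counts[k] for k in THEMES}
    row["s"] = s
    for k in THEMES:
        row[f"{k}*s"] = counts[k] * s  # flips sign with the tone of the reviews
    return row


# Hypothetical physician: counts and sentiment are made up for illustration.
example = review_regressors({"n1": 12, "n2": 3, "n3": 5, "n4": 7}, s=0.5)
print(len(example))     # 9
print(example["n1*s"])  # 6.0
```

Because s is signed, each product term takes the sign of the overall tone, which is the mechanism by which the same count can push referrals up or down.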

Given the above, the model to be estimated is:

y1 = exp(x1 + x2 + x3
         + n1 + n2 + n3 + n4 + s
         + n1*s + n2*s + n3*s + n4*s) + u3   (3)
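
Because equation (3) is written without coefficients, let gamma_k and delta_k (hypothetical labels) denote the coefficients on n_k and n_k*s. Holding everything else fixed, including s, one additional mention of theme k multiplies expected referrals by exp(gamma_k + delta_k*s). At s = 0 this reduces to exp(gamma_k), so the main-effect coefficient is the effect of the count when sentiment is zero. A short Python sketch with made-up coefficients:

```python
import math


def rate_ratio_per_mention(gamma_k, delta_k, s):
    """Multiplicative change in expected referrals from one more mention of theme k.

    Under E[y1 | x] = exp(... + gamma_k*n_k + delta_k*n_k*s + ...), raising n_k
    by one (holding everything else, including s, fixed) multiplies the
    expected count by exp(gamma_k + delta_k * s).
    """
    return math.exp(gamma_k + delta_k * s)


# Hypothetical coefficients, chosen only to illustrate the sign flip.
gamma_k, delta_k = 0.0, 0.10
for s in (-1.0, 0.0, 1.0):
    print(s, round(rate_ratio_per_mention(gamma_k, delta_k, s), 4))
```

With these made-up values the ratio is below 1 when s = -1, exactly 1 when s = 0, and above 1 when s = 1. Where s = 0 falls on the sentiment scale therefore determines what gamma_k means; if s were centered at its sample mean, gamma_k would describe the effect of an extra mention at average sentiment.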

Note that I included the main effects in the model, even though n1 through n4 are not theoretically expected to affect y1 on their own. Is this correct, or should I drop the main effects?

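Further, to “correct” equation (3) for endogeneity (as I did in case (a)), I have only the same three instruments available (z1 through z3, as in equation (2)). This makes it impossible to apply the IV approach to all nine potentially endogenous regressors in equation (3) (the five main effects plus the four interactions), because identification requires at least as many excluded instruments as endogenous regressors.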
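
The counting argument behind this is the order condition: each endogenous regressor needs at least one excluded instrument (a necessary, not sufficient, condition for identification). A minimal Python tally, treating all nine review-based regressors in equation (3) as endogenous as in the question; the function name is hypothetical.

```python
def identification_status(n_endogenous, n_excluded_instruments):
    """Order condition for an IV model.

    Each endogenous regressor needs at least one excluded instrument;
    with fewer instruments than endogenous regressors the model is
    under-identified (necessary, not sufficient, condition).
    """
    if n_excluded_instruments < n_endogenous:
        return "under-identified"
    if n_excluded_instruments == n_endogenous:
        return "exactly identified"
    return "over-identified"


# Case (a): one endogenous regressor (y2), instruments z1..z3.
print(identification_status(1, 3))  # over-identified

# Equation (3): main effects, s, and interactions all treated as endogenous.
endogenous_eq3 = ["n1", "n2", "n3", "n4", "s", "n1*s", "n2*s", "n3*s", "n4*s"]
print(identification_status(len(endogenous_eq3), 3))  # under-identified
```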

Is there any way to handle such a situation (perhaps some augmented or stepwise model)? I’d sincerely appreciate your feedback.