My study examines the impact of (a) online reputational status and (b) online reviews on physicians’ performance, proxied by new patient referrals (a non-negative, over-dispersed count).

In case (a), to address the endogeneity of reputational status *(binary)*, I use an instrumental variable (IV) approach and estimate the following equations:

y1 = exp(y2 + x1 + x2 + x3) + u1 (1) --> panel negative binomial model (FE)

y2 = x1 + x2 + x3 + z1 + z2 + z3 + u2 (2) --> panel probit model (RE)

Let me briefly describe case (b): using Leximancer, I analyzed the text reviews and extracted four primary emergent themes (i.e., interpersonal manners, staff, office, knowledge), and for each theme I exported the total count of occurrences. Let me call these variables n1 through n4 (n1/n4 for short).

Note that, “as is”, n1/n4 do not directly affect referrals: a higher count of, say, office- or staff-related concepts has by itself nothing to do with referrals. For that reason, I additionally extracted the overall *sentiment* accumulated across the reviews for a given physician *(let’s call it “s”)*. It is then plausible to assume that a higher count with *favorable* sentiment could increase referrals, whereas a higher count with *unfavorable* sentiment could reduce them.

Having said the above, the model to be estimated is:

y1 = exp(x1 + x2 + x3 + n1 + n2 + n3 + n4 + s + n1*s + n2*s + n3*s + n4*s) + u3 (3)

Note that I included the main effects in the model even though n1/n4 are not theoretically related to y1. Is this correct, or should I drop the main effects?
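To make the main-effects question concrete, here is a minimal Python sketch of what each choice implies in an exponential-mean model like (3). With a main-effect coefficient b and an interaction coefficient g for a given theme count, a one-unit increase in that count changes log E[y1] by b + g*s. The coefficient values and the coding of s (negative = unfavorable, 0 = neutral, positive = favorable) are assumptions made purely for illustration; they are not estimates from my data.

```python
def log_mean_effect(b_main, b_inter, s):
    """Change in log E[y1] per one-unit increase in a theme count n_k,
    at sentiment s, for a mean of the form exp(... + b*n_k + g*n_k*s)."""
    return b_main + b_inter * s

# Hypothetical coefficients, for illustration only (not estimates).
b_main, b_inter = 0.02, 0.10

# Assumed coding of s: -1 = unfavorable, 0 = neutral, 1 = favorable.
for s in (-1.0, 0.0, 1.0):
    kept = log_mean_effect(b_main, b_inter, s)      # main effect kept
    dropped = log_mean_effect(0.0, b_inter, s)      # main effect dropped (b forced to 0)
    print(f"s={s:+.0f}  kept={kept:+.3f}  dropped={dropped:+.3f}")
```

In other words, dropping a main effect amounts to imposing that the count has exactly zero effect when s = 0, so whether that restriction is sensible seems to depend on how s is coded and centered.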

Further, to “correct” equation (3) for endogeneity (as I did in case (a)), I only have three instruments available (z1/z3, as in (2)), which makes it impossible to instrument all 9 terms (i.e., main effects + interactions) in equation (3) with an IV approach.
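The counting problem can be sketched as follows. This is only a check of the order condition (at least as many excluded instruments as endogenous regressors), under my assumption that all nine review-related terms in (3) are treated as endogenous; the helper name `order_condition_holds` is made up for this illustration.

```python
# Terms in equation (3) that I would need to instrument (assumed all endogenous).
themes = ["n1", "n2", "n3", "n4"]
endogenous = themes + ["s"] + [f"{n}*s" for n in themes]

# Excluded instruments available from equation (2).
instruments = ["z1", "z2", "z3"]

def order_condition_holds(n_endogenous, n_instruments):
    """Necessary (not sufficient) condition for identification:
    at least one excluded instrument per endogenous regressor."""
    return n_instruments >= n_endogenous

print(len(endogenous), "endogenous terms vs", len(instruments), "instruments")
print("order condition holds:", order_condition_holds(len(endogenous), len(instruments)))
```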

Is there any way to handle such a situation (maybe some augmented or step-wise models)? I’d sincerely appreciate your feedback.