I ran a 2-stage fixed-effects panel model in R to estimate the effect of strategic alliance participation on firm performance. Alliance participation is not random: firms self-select (and are selected by their future partners) into alliances, which is why I used a 2-stage approach.

First, I ran a fixed-effects plm model in which I regressed log(number of alliances) on a set of variables that should affect the propensity of firms to participate in alliances (I have 11 years' worth of data on alliances for about 500 firms).

Second, I plugged the results of the first stage into a second fixed-effects panel model, in which I estimate firm performance as a function of log(number of alliances) and other variables.
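
In shorthand, the two stages look roughly like this. The notation is my own, introduced only for this post: $A_{it}$ is the number of alliances of firm $i$ in year $t$, $P_{it}$ is firm performance, $Z_{it}$ and $X_{it}$ are the covariates used in each stage, and $\alpha_i$ and $\mu_i$ are firm fixed effects.

$$
\begin{aligned}
\text{Stage 1:}\quad \log(A_{it}) &= \alpha_i + Z_{it}'\gamma + v_{it} \\
\text{Stage 2:}\quad P_{it} &= \mu_i + \beta\,\log(A_{it}) + X_{it}'\delta + \varepsilon_{it}
\end{aligned}
$$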

Now for the question. When I do the second stage, should I plug in the fitted values of log(number of alliances) or the actual values of log(number of alliances) from the dataset? I have read that 2-stage models call for the fitted values in place of the actual values. Is that correct?

I understand that I also need to plug in the residuals obtained in the first stage as the correction for self-selection. Is that correct? How should I interpret the coefficients for the value itself (fitted or actual) and for the residual?

I tend to interpret them this way: the coefficient for the fitted value is the effect of alliance participation on firm performance that is expected for the average firm. The coefficient for the residual is the effect of deviation from the predicted value. Please let me know if this sounds like a correct way to interpret my results.
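
To make the two options concrete, here is a toy simulation sketch of what I am comparing. It is written in plain Python rather than R so that it runs without any packages, and it is not my actual model: the variable names, the single first-stage variable `z`, the unobserved factor `u`, and all parameter values are invented for illustration. Option A uses the first-stage fitted values in place of the actual values; option B keeps the actual values and adds the first-stage residuals as an extra regressor.

```python
import random

random.seed(0)

N_FIRMS, N_YEARS = 500, 11  # panel dimensions similar to my data
TRUE_BETA = 0.5             # invented "true" effect of x on y


def demean_by_firm(rows):
    """Within (fixed-effects) transformation: subtract each firm's mean."""
    out = []
    for firm_values in rows:
        mean = sum(firm_values) / len(firm_values)
        out.extend(v - mean for v in firm_values)
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Simulated data (all hypothetical): z appears only in the first stage,
# u is an unobserved factor driving both x (alliances) and y (performance).
z_rows, x_rows, y_rows = [], [], []
for _ in range(N_FIRMS):
    firm_effect = random.gauss(0, 1)
    z_i, x_i, y_i = [], [], []
    for _ in range(N_YEARS):
        z_t = random.gauss(0, 1)
        u_t = random.gauss(0, 1)
        x_t = firm_effect + z_t + u_t
        y_t = 2 * firm_effect + TRUE_BETA * x_t + u_t + random.gauss(0, 1)
        z_i.append(z_t)
        x_i.append(x_t)
        y_i.append(y_t)
    z_rows.append(z_i)
    x_rows.append(x_i)
    y_rows.append(y_i)

z, x, y = (demean_by_firm(rows) for rows in (z_rows, x_rows, y_rows))

# Stage 1: within regression of x on z, then fitted values and residuals.
pi_hat = dot(z, x) / dot(z, z)
x_hat = [pi_hat * z_t for z_t in z]
v_hat = [x_t - xh_t for x_t, xh_t in zip(x, x_hat)]

# Naive within OLS of y on the actual x, ignoring self-selection.
beta_ols = dot(x, y) / dot(x, x)

# Option A: second stage on the fitted values only.
beta_fitted = dot(x_hat, y) / dot(x_hat, x_hat)

# Option B: second stage on the actual x plus the first-stage residual.
# Solve the 2x2 normal equations for (beta_cf, rho_resid).
sxx, svv, sxv = dot(x, x), dot(v_hat, v_hat), dot(x, v_hat)
sxy, svy = dot(x, y), dot(v_hat, y)
det = sxx * svv - sxv * sxv
beta_cf = (sxy * svv - sxv * svy) / det
rho_resid = (sxx * svy - sxv * sxy) / det

print("naive OLS on actual x:      ", round(beta_ols, 3))
print("option A (fitted values):   ", round(beta_fitted, 3))
print("option B (actual + resid):  ", round(beta_cf, 3))
print("option B residual coef:     ", round(rho_resid, 3))
```

In this simple linear setting the coefficient on the alliance variable is numerically the same under options A and B, because the fitted values and residuals are orthogonal by construction, and the residual coefficient picks up the part of `y` that moves with the unexplained part of `x`. I am not sure how far this carries over to my real model with additional covariates, and I understand that standard errors from a hand-rolled second stage are not valid as printed.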

I will greatly appreciate any ideas. Thanks!