using prediction results in a separate model


I'm looking for some feedback on an idea, if you're interested. The issue at hand is how to deal with the variable "age" as a predictor of likelihood to make a donation (and how much of a donation, at that).

I have been including "age," along with a number of other variables, in multiple linear regression models to help segment a donor base. The problem that I've been considering is that of cumulative giving (which for better or for worse, is my DV): older people have had a longer time to give, and a younger person giving at the same rate will not have cumulative giving as high, but that doesn't make one donor more valuable than the other.

So one thing that I tried was to create two new variables: largest single gift and that person's age at the time of said gift. Using largest gift amount as the DV and age at the time of the gift as the IV, I scored a data set, and then included that predicted score as a variable in my original regression, alongside all my usual variables, to predict giving. This variable performed very well.

I looked at a scatter plot of the "largest gift predicted scores" against age, and it shows a strong correlation that looks like a wide band: about 15 years of age spread all along the upward line. By comparison, the scatter plot of age against cumulative giving looks like a very narrow line with a high R².

So my conclusion is that including the raw scores in my model rather than the "age" variable is better, because I am measuring something other than the cumulative giving effect. But am I misreading this? Or, are there things to consider if I'm using model results as a variable in a model?

I hope this makes some sense, and I hope it interests somebody out there!




TS Contributor
Hi phoebe,

If I understood correctly, you are using the predicted values from one regression model as an independent variable in another model. In that case, wouldn't it be better to include the original dependent variable itself? The predicted values carry an associated error, whereas the dependent variable is (supposedly) measured without error.

Now, for your problem, you could create a new variable defined as "average donations per year"; this would remove the advantage that older donors have.
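A minimal sketch of that variable, assuming "per year" means years since the donor's first recorded gift (that definition, the `Donor` fields, and the figures are all hypothetical; years since some eligibility age would be another reasonable choice):

```python
from dataclasses import dataclass


@dataclass
class Donor:
    donor_id: str
    cumulative_giving: float  # total dollars given to date
    first_gift_year: int      # year of the donor's first recorded gift (assumed available)


def average_giving_per_year(donor, current_year):
    """Cumulative giving divided by the number of years the donor has been giving."""
    # Count the first-gift year itself, so a brand-new donor gets a 1-year window.
    years_active = max(current_year - donor.first_gift_year + 1, 1)
    return donor.cumulative_giving / years_active


# Two made-up donors: very different cumulative totals, different giving histories.
older = Donor("A", cumulative_giving=10_000.0, first_gift_year=2000)
younger = Donor("B", cumulative_giving=2_500.0, first_gift_year=2015)

for d in (older, younger):
    print(d.donor_id, average_giving_per_year(d, current_year=2019))
```

In this made-up example the two donors differ fourfold in cumulative giving but come out with the same yearly rate, which is the situation described in the original question.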

Good luck