My pain point is regression exercises. Below is a sample exercise, followed by my attempted answers. I hope you can help me frame the questions and the corresponding answers correctly:

REGRESSION EXERCISE

The manager in charge of setting the credit policy for clients (which are firms) gets information on new clients from a public database that records their past behavior. Let X be an index summarizing the variables retrieved from this database. It takes values from 1 (‘highly unreliable client’) to 10 (‘highly reliable client’).

In addition, the manager maintains his own database on the behavior of his clients. Let Y be the delay (in days) of his clients relative to the date set for repaying their debt. Regressing Y on X, he obtains a model to predict the future behavior Y of a new client based on X:

Y = 62.2 - 5.3*X + e, R2 = 0.52, var{e} = 16.8,

with the standard error of the estimated coefficient on X, (1.4), in parentheses.

1) Derive the predicted value of Y for a new client with X=5 and the 0.95 confidence interval for that prediction.

The manager wonders whether he could improve the performance of this model by adding a new explanatory variable to the regression. He considers two options: W is the size of the firm (1 if number of employees>150, 0 otherwise) while Z is the access to international markets of the firm (1 for firms with relevant export, 0 for firms with little/no export). This is the new regression:

Y = 52.3 - 4.2*X + 4.7*W + 5.6*Z + e, R2 = 0.62, var{e} = …,

standard errors: (1.2) for X, (2.1) for W, (3.4) for Z.

2) Would you add W and Z to the model?

3) Derive the value of var{e}.

In the past, this manager tried to reduce the delay of his clients by offering a discount on the debt, provided that it was paid back with a delay of less than 10 days. To test whether this incentive works, he randomly selected some clients: D=1 for those who received the incentive, D=0 for those who didn’t. This is what the regression looks like with D included as an additional explanatory variable:

Y = 53.7 - 4.1*X + 4.8*W + 5.7*Z - 3.5*D + e,

standard errors: (1.2) for X, (2.1) for W, (3.4) for Z, (0.3) for D.

4) Would you say that the incentive works? R2 here is missing, is it relevant to your answer?

5) Would your answer change if the standard error of the estimated coefficient on D were 1.5 instead of 0.3?

6) (this is a bit more tricky…) Now suppose the incentive is not randomly allocated. Instead, it is assigned to firms with a value of X smaller than 5 and only to them. Do you expect to find an estimated coefficient on D more or less like the one above? Or substantially different from it?

ANSWERS

1. Substituting X=5 into the equation, I get Y = 35.7, and the confidence interval should be 35.7 ± 1.96*sqrt(16.8).
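As a quick check of the arithmetic in answer 1, here is a minimal Python sketch. It assumes a normal critical value of 1.96 and uses only the residual variance var{e} = 16.8; the uncertainty in the estimated coefficients is ignored because the exercise gives neither the sample size nor the mean of X, so a full prediction interval would be somewhat wider. The variable names are mine, not part of the exercise.

```python
import math

# Coefficients and residual variance as reported in the exercise.
intercept, slope = 62.2, -5.3
var_e = 16.8
x_new = 5

# Point prediction from the fitted line.
y_hat = intercept + slope * x_new

# Approximate 95% interval: normal critical value, residual variance only.
# Assumption: estimation error in the coefficients is ignored.
z = 1.96
half_width = z * math.sqrt(var_e)
lower, upper = y_hat - half_width, y_hat + half_width

print(f"prediction: {y_hat:.1f}")
print(f"approx. 95% interval: [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions the interval runs from roughly 27.7 to 43.7 days.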

2. I would add them because R^2 is higher (0.62 versus 0.52), so the model explains more of the variation in Y.
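Since R^2 cannot decrease when regressors are added to an OLS model fitted on the same sample, a higher R^2 alone may be weak evidence. A complementary check is the t-ratio of each new coefficient (estimate divided by its standard error). The sketch below assumes a large sample, so 1.96 is used as the two-sided 5% critical value; the exact value would depend on the sample size, which the exercise does not give.

```python
# Coefficients and standard errors of W and Z from the second regression.
estimates = {"W": (4.7, 2.1), "Z": (5.6, 3.4)}

# Assumption: large sample, so 1.96 approximates the two-sided 5% critical value.
critical = 1.96

# t-ratio = estimated coefficient / standard error.
t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_ratios.items():
    verdict = "significant" if abs(t) > critical else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict} at the 5% level)")
```

With these numbers, W has a t-ratio of about 2.24 and Z of about 1.65, so under the stated assumption only W clears the 5% threshold.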

3. How do I calculate it without the sample size?
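One possible route that does not need the sample size is to go through R^2. The sketch below assumes that R^2 = 1 - var{e}/var{Y}, that both variances use the same divisor (no degrees-of-freedom correction), and that both regressions are fitted on the same sample, so var{Y} is the same in both. Whether the exercise intends this definition is an assumption on my part.

```python
# Values reported for the first regression.
r2_old, var_e_old = 0.52, 16.8
# R^2 reported for the regression that adds W and Z.
r2_new = 0.62

# Assumption: R^2 = 1 - var{e} / var{Y}, so var{Y} can be backed out.
var_y = var_e_old / (1 - r2_old)

# Same var{Y} in the second regression (same sample assumed).
var_e_new = var_y * (1 - r2_new)

print(f"implied var(Y): {var_y:.1f}")
print(f"implied var(e) in the new regression: {var_e_new:.1f}")
```

Under these assumptions var{Y} comes out at 35.0 and the missing var{e} at about 13.3.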

4. The standard errors of the other coefficients stay the same and the coefficients change only slightly. The variable D is not related to any other variable, so I think it didn’t change the results much and the incentive is not working.

5. No, because there was no correlation in the first place...?
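Questions 4 and 5 can also be approached through a confidence interval for the coefficient on D, which needs only the estimate and its standard error and not R^2. The sketch below is a rough check, assuming a large sample so that 1.96 is an acceptable critical value; the helper `conf_interval` is a hypothetical name introduced for illustration.

```python
# Estimated coefficient on D from the third regression.
coef_d = -3.5


def conf_interval(coef, se, z=1.96):
    """Approximate 95% interval; z = 1.96 assumes a large sample."""
    return coef - z * se, coef + z * se


# Standard error as reported (0.3) and the alternative from question 5 (1.5).
intervals = {se: conf_interval(coef_d, se) for se in (0.3, 1.5)}

for se, (low, high) in intervals.items():
    excludes_zero = high < 0 or low > 0
    print(f"se = {se}: [{low:.2f}, {high:.2f}], excludes zero: {excludes_zero}")
```

Under this assumption both intervals lie entirely below zero, which would point to a shorter delay for firms that received the incentive in either case, although the evidence is much weaker with a standard error of 1.5.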

6. I’m not sure how to answer that.