Need some tutoring for my next statistics exam

#1
Hello everyone,
I’m a university student looking for help to pass my next statistics exam with a good grade. I’m sure this is nothing new to you, but I hope I don’t come across as just another lazy guy looking for shortcuts. My professor is new to the course and doesn’t really have a method, so I’m struggling to prepare for his exams: his lectures are theory oriented, but the exams are more practical. Before I post any example exercises, my question is: could some of you experts give me some guidance over the next week, before the exam? Is there a better place to ask? Thanks in advance. I can promise you a willing student who will follow up on the threads.
 
#3
My pain point is regression exercises. Here is an example exercise, followed by my attempted answers. I hope you can help me frame the questions and the corresponding answers in the right way:

REGRESSION EXERCISE
The manager in charge of deciding the credit policy for clients (they are firms) gets information on new clients from a public database that records their past behavior. Let X be an index summarizing the variables retrieved from this database. It takes values from 1 (‘highly unreliable client’) to 10 (‘highly reliable client’).
In addition, the manager maintains his own database on the behavior of his clients. Let Y be the delay (in days) of his clients relative to the date set for paying back their debt. Regressing Y on X, he gets a model to predict the future behavior Y of a new client based on X:

Y = 62.2 - 5.3*X + e, R2 = 0.52, var{e} = 16.8,
           (1.4)
standard error for the estimated coefficient in parentheses.
1) Derive the predicted value of Y for a new client with X=5 and the 0.95 confidence interval for that prediction.

The manager wonders whether he could improve the performance of this model by adding a new explanatory variable to the regression. He considers two options: W is the size of the firm (1 if number of employees>150, 0 otherwise) while Z is the access to international markets of the firm (1 for firms with relevant export, 0 for firms with little/no export). This is the new regression:

Y = 52.3 - 4.2*X + 4.7*W + 5.6*Z + e, R2 = 0.62, var{e} = …,
           (1.2)   (2.1)   (3.4)
2) would you add W and Z to the model?
3) derive the value of var{e}.

In the past this manager tried to reduce the delay of his clients by offering a discount on the debt, provided that it was paid back with a delay of less than 10 days. To test whether this incentive works he randomly selected some clients: D=1 for those who received the incentive, D=0 for those who didn’t. This is what the regression looks like when D is included as an additional explanatory variable:

Y = 53.7 - 4.1*X + 4.8*W + 5.7*Z - 3.5*D + e,
           (1.2)   (2.1)   (3.4)   (0.3)

4) Would you say that the incentive works? R2 here is missing, is it relevant to your answer?
5) Would your answer change if the standard error for the estimated coefficient were 1.5?
6) (this is a bit more tricky…) Now suppose the incentive is not randomly allocated. Instead, it is assigned to firms with a value of X smaller than 5 and only to them. Do you expect to find an estimated coefficient on D more or less like the one above? Or substantially different from it?

ANSWERS
1. Substituting X=5 into the equation I get Y = 35.7, and the confidence interval should be 35.7 ± 1.96*sqrt(16.8)
2. I would add them because R^2 is higher, so it improves the relevance of the analysis
3. How do I calculate it without the sample number?
4. The standard errors stay the same. The coefficients vary a bit. The variable D is not related to any other variable, so I think it didn’t change the result much and the incentive is not working
5. No, because there was no correlation in the first place...?
6. I’m not sure how to answer that.
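
The arithmetic behind answers 1, 3, 4 and 5 can be sketched as follows. This is only a rough check under stated assumptions, not the official solution: it uses the normal critical value 1.96 (the sample size is not given), it ignores the uncertainty in the estimated coefficients when building the interval, and for question 3 it assumes var{e} = (1 - R2) * var(Y) on the same sample with no degrees-of-freedom correction. All function names are made up for illustration.

```python
import math

# Numbers taken from the exercise text.
INTERCEPT, SLOPE = 62.2, -5.3   # first regression: Y = 62.2 - 5.3*X + e
RESID_VAR_1 = 16.8              # var{e} in the first regression
R2_1, R2_2 = 0.52, 0.62         # R2 of the first and second regressions
Z_95 = 1.96                     # assumed two-sided 95% normal critical value


def predict(x):
    """Point prediction of the delay Y from the first regression."""
    return INTERCEPT + SLOPE * x


def approx_interval(x, z=Z_95):
    """Prediction +/- z * sqrt(var{e}).

    Assumption: uncertainty in the estimated coefficients is ignored.
    """
    half_width = z * math.sqrt(RESID_VAR_1)
    y_hat = predict(x)
    return y_hat - half_width, y_hat + half_width


def implied_resid_var(resid_var_old, r2_old, r2_new):
    """Residual variance implied by a new R2 on the same sample.

    Assumption: var{e} = (1 - R2) * var(Y), no degrees-of-freedom correction.
    """
    var_y = resid_var_old / (1.0 - r2_old)
    return (1.0 - r2_new) * var_y


def t_ratio(coef, se):
    """Estimated coefficient divided by its standard error."""
    return coef / se


low, high = approx_interval(5)
print(f"Q1: prediction {predict(5):.1f}, interval ({low:.2f}, {high:.2f})")
print(f"Q3: implied var{{e}} = {implied_resid_var(RESID_VAR_1, R2_1, R2_2):.1f}")
print(f"Q4: t-ratio for D with se 0.3 = {t_ratio(-3.5, 0.3):.2f}")
print(f"Q5: t-ratio for D with se 1.5 = {t_ratio(-3.5, 1.5):.2f}")
```

Under these assumptions the ratio for D is larger than 1.96 in absolute value with either standard error, which is worth comparing against answers 4 and 5 above.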
 

Karabiner

TS Contributor
#4
Y = 52.3 - 4.2*X + 4.7*W + 5.6*Z + e, R2 = 0.62, var{e} = …,
           (1.2)   (2.1)   (3.4)
2) would you add W and Z to the model?
3) derive the value of var{e}.
2. I would add them because R^2 is higher, so it improves the relevance of the analysis
This is tricky. Logically, adding a variable can never decrease R² in the sample, so a higher R² alone does not justify adding it.
But if you look at the commonly used 95% confidence intervals for the 3 regression weights, you will find that one of them includes 0, and therefore one could maybe assume that this variable is not related to the outcome in the population.
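
The confidence-interval check described above can be sketched in a few lines. This is a minimal illustration, assuming the normal critical value 1.96 (the sample size is not given, so the exact t critical value is unknown); the helper names are hypothetical.

```python
Z_95 = 1.96  # assumed two-sided 95% normal critical value

# (estimate, standard error) for each regression weight in the second model
coefs = {"X": (-4.2, 1.2), "W": (4.7, 2.1), "Z": (5.6, 3.4)}


def conf_int(estimate, se, z=Z_95):
    """Approximate 95% confidence interval: estimate +/- z * se."""
    return estimate - z * se, estimate + z * se


def includes_zero(interval):
    """True if 0 lies inside the interval."""
    low, high = interval
    return low <= 0.0 <= high


for name, (estimate, se) in coefs.items():
    ci = conf_int(estimate, se)
    print(f"{name}: ({ci[0]:.3f}, {ci[1]:.3f}) includes 0: {includes_zero(ci)}")
```

With these numbers only the interval for Z contains 0. The lower bound for W is fairly close to 0, so a larger critical value from a small-sample t distribution could change the picture for W as well.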

With kind regards

Karabiner
 
#5
Thank you, Karabiner. So you suggest I answer that yes, adding them improves R2, but one of the two new variables risks being unrelated to the outcome?
Any suggestions for the other questions?