Multiple regression with interaction and dummy variables?

#1
Hi guys!

So, here's the deal. I'm in a bit of a pickle here.
I'm working on a take-home exam and can't figure out one question about interaction variables and dummy variables.

The assignment concerns sales forecasting for a medical company in the UK. I have come up with the following regression to forecast sales:

Sales = b0 + b1(Advertising) + b2(Bonus) + b3(Compet) + b4(SOUTH)

Where:
Advertising = amount spent on advertising (in units of £1000).
Bonus = total amount of bonuses paid (in units of £100).
Compet = the largest competitor's sales in the territory (in units of £1000).
SOUTH = dummy variable coded 1 if the territory is in Southern England, and 0 otherwise.

Now, the exact question I'm struggling with is this:
"Mr X thinks the impact of competitor's sales may only be significant in territories in Southern England. Explain the steps you would follow to test this belief"

I have personally come up with the following model in order to answer the question:
Sales = b0 + b1(Advertising) + b2(Bonus) + b3(Compet) + b4(SOUTH)D + b5(COMPET*SOUTH)D

where D marks the terms that involve the SOUTH dummy (the dummy itself and its interaction with COMPET).
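Writing the interaction model out makes the role of b5 explicit. A short derivation, using the variable names from the data description: the slope of Sales with respect to COMPET is b3 in territories outside the South and b3 + b5 in Southern England.

```latex
\[
\text{Sales} = b_0 + b_1\,\text{ADV} + b_2\,\text{BONUS} + b_3\,\text{COMPET}
  + b_4\,\text{SOUTH} + b_5\,(\text{COMPET} \times \text{SOUTH})
\]
\[
\frac{\partial\,\text{Sales}}{\partial\,\text{COMPET}} = b_3 + b_5\,\text{SOUTH} =
\begin{cases}
  b_3       & \text{if SOUTH} = 0 \text{ (other territories)} \\
  b_3 + b_5 & \text{if SOUTH} = 1 \text{ (Southern England)}
\end{cases}
\]
```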

With the new b5 parameter, I should be able to run hypothesis tests to determine whether competitors' sales are significant only in the Southern territories and not in the other territories.
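A minimal sketch of fitting that model and testing b5, assuming NumPy is available. The data are simulated because the exam data are not posted: the sample size, the "true" coefficients and the helper name `ols` are all hypothetical. A package such as statsmodels would also report p-values; here only the t statistic is computed.

```python
import numpy as np


def ols(X, y):
    """Fit y = X @ beta by ordinary least squares.

    Returns the coefficient vector and its estimated covariance matrix.
    """
    n, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - k)          # unbiased estimate of the error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # Var(beta_hat) = sigma^2 (X'X)^-1
    return beta, cov


# Hypothetical simulated territories (NOT the exam data).
rng = np.random.default_rng(0)
n = 200
adv = rng.uniform(0, 50, n)
bonus = rng.uniform(0, 30, n)
compet = rng.uniform(0, 20, n)
south = rng.integers(0, 2, n).astype(float)

# Assumed data-generating process: COMPET only matters in the South (b3 = 0, b5 = -4).
sales = (100 + 2.0 * adv + 1.5 * bonus + 0.0 * compet
         + 10.0 * south - 4.0 * compet * south + rng.normal(0, 5, n))

# Columns: intercept, ADV, BONUS, COMPET, SOUTH, COMPET*SOUTH  ->  b0 .. b5
X = np.column_stack([np.ones(n), adv, bonus, compet, south, compet * south])
beta, cov = ols(X, sales)
se = np.sqrt(np.diag(cov))

t_b5 = beta[5] / se[5]   # H0: b5 = 0, i.e. the COMPET slope is the same in both regions
print(f"b5 = {beta[5]:.2f} (SE {se[5]:.2f}), t = {t_b5:.1f}")
```

The t statistic is compared with a t distribution on n - 6 degrees of freedom. A significant b5 says the COMPET slope differs between the regions; on its own it does not say which of the two slopes is non-zero.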

However, several of my classmates have come up with different suggestions, and we are all a bit unsure here.

How would you approach this question?
 
#2
Your solution works very well.

The interaction term and its associated beta value will tell you the strength of the difference and its direction.

Good luck on your exam.

Out of curiosity, what were the other solutions?
 
#3
Hi the42up.

The other solution my mate came up with is the following:


Sales = b0 + b1X1 + b2X2 + (b3X3 + b4)*X4

Sales = b0 + b1(ADV) + b2(BONUS) + (b3(SOUTH)+b4)*(Competitor's Sales)
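Multiplying out the bracket shows how this relates to the interaction model in post #1 (using the names from the second line, with SOUTH as X3 and competitor's sales, COMPET, as X4):

```latex
\[
\begin{aligned}
\text{Sales} &= b_0 + b_1\,\text{ADV} + b_2\,\text{BONUS} + (b_3\,\text{SOUTH} + b_4)\,\text{COMPET} \\
             &= b_0 + b_1\,\text{ADV} + b_2\,\text{BONUS} + b_4\,\text{COMPET} + b_3\,(\text{SOUTH} \times \text{COMPET})
\end{aligned}
\]
```

So b4 here plays the role of b3 in the first model (the COMPET slope outside the South) and b3 plays the role of b5 (the extra slope in the South). The only difference is that this version has no separate SOUTH term, so it forces both regions to share the same intercept.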

The attached image clarifies it better: [image not available]



Would you say this is more correct?
 

noetsi

Fortran must die
#4
An interaction term will tell you whether the impact of competitors' sales on sales varies with the level of SOUTH. It won't tell you that the effect is statistically significant only in Southern England. Or at least I have never seen an interaction interpreted that way. You can see how much the effect size varies and assess the substantive significance of that, but this is not statistical significance in the way the term is usually used.
 
#5
Really? I thought that by comparing the significance of b3(COMPET) with that of b5(COMPET*SOUTH)D you would be able to answer that question.
For example, if b3 turns out to be insignificant but b5 is significant, then you could conclude that the impact of competitors' sales is significant only in the southern territories (since the reference category is the non-southern territories)?
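A sketch of the two hypotheses the belief actually involves, in the notation of the interaction model (b3 = COMPET slope outside the South, b5 = extra slope in the South). A significant b5 only says the two slopes differ; the South slope itself is b3 + b5, and its standard error needs the covariance between the two estimates (n is the number of territories, and 6 coefficients b0 to b5 are estimated):

```latex
\[
H_0^{\text{other}}:\ b_3 = 0 \qquad\qquad H_0^{\text{South}}:\ b_3 + b_5 = 0
\]
\[
t = \frac{\hat b_3 + \hat b_5}
         {\sqrt{\widehat{\operatorname{Var}}(\hat b_3) + \widehat{\operatorname{Var}}(\hat b_5)
                + 2\,\widehat{\operatorname{Cov}}(\hat b_3, \hat b_5)}}
  \;\sim\; t_{n-6} \quad \text{under } H_0^{\text{South}}
\]
```

Note also that an insignificant b3 means the data cannot rule out a zero effect outside the South, not that the effect there has been shown to be zero.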
 
#6
I interpreted the question as meaning that the boss is only interested in Southern England, rather than wanting to know whether the effect is significant only in Southern England. So your friend's solution is better.

Sometimes reading comprehension is more important than statistical know-how.
 

noetsi

#7
That may be, although if they only want results for Southern England, it does not make a lot of sense to have data from other places :p

It's certainly true that knowing what is being asked is critical to any analysis.
 
#8
Well, the exact problem formulation goes like this:

"Mr. Philip thinks the impact of competitor's sales may only be significant in territories in Southern England. Explain the steps you would follow to test this belief."

Data information:
SALES: Sales over the past year (in units of 1000).
ADV: Amount spent on advertising (in units of 100).
BONUS: The total amount of bonuses paid (in units of 100).
SHARE: The market share held by Cohop (the company) in the territory (in %).
COMPET: The largest competitor's sales in the territory (in units of 1000).
SOUTH: Dummy variable that is coded 1 if territory is in Southern England, and 0 otherwise.

That's basically all we have.
Any suggestions?
 

rogojel

TS Contributor
#9
hi,
I think the term "significant" may be causing the problem. If it is meant as "practically significant" rather than "statistically significant", which would be the expected meaning coming from a boss, then the statement translates to: the impact of the competition is large enough to matter only in the South. With this interpretation you only need to include the interaction term in the model and compare the impact of competition in the two areas: beta3 (other territories) against beta3 + beta5 (the South).
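The comparison of beta3 against beta3 + beta5 can be sketched as below, again assuming NumPy and hypothetical simulated data (not the exam data). The helper `linear_combination` is a name made up for this example: it returns the estimate, standard error and t statistic of c'b, so the same code gives the COMPET slope outside the South (b3) and in the South (b3 + b5), with the covariance term included.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients and their estimated covariance matrix."""
    n, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - k)
    return beta, sigma2 * np.linalg.inv(X.T @ X)


def linear_combination(beta, cov, c):
    """Estimate, standard error and t statistic of c @ beta."""
    c = np.asarray(c, dtype=float)
    est = float(c @ beta)
    se = float(np.sqrt(c @ cov @ c))   # Var(c'b) = c' Cov(b) c, covariance terms included
    return est, se, est / se


# Hypothetical simulated data (NOT the exam data); assumed true b3 = 0, b5 = -4.
rng = np.random.default_rng(1)
n = 200
adv, bonus, compet = rng.uniform(0, 50, n), rng.uniform(0, 30, n), rng.uniform(0, 20, n)
south = rng.integers(0, 2, n).astype(float)
sales = (100 + 2.0 * adv + 1.5 * bonus + 10.0 * south
         - 4.0 * compet * south + rng.normal(0, 5, n))

X = np.column_stack([np.ones(n), adv, bonus, compet, south, compet * south])
beta, cov = ols(X, sales)

# Coefficient order: b0, b1 (ADV), b2 (BONUS), b3 (COMPET), b4 (SOUTH), b5 (COMPET*SOUTH)
slopes = {
    "not South (b3)": linear_combination(beta, cov, [0, 0, 0, 1, 0, 0]),
    "South (b3 + b5)": linear_combination(beta, cov, [0, 0, 0, 1, 0, 1]),
}
for region, (est, se, t) in slopes.items():
    print(f"{region}: slope = {est:.2f}, SE = {se:.2f}, t = {t:.1f}")
```

Under this reading, the belief is supported if the South slope is clearly different from zero (and large enough to matter) while the slope elsewhere is not.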

regards
 

noetsi

#10
There is a difference between substantive and statistical significance. An interaction term will tell you whether the impact of competitors' sales on sales varies significantly by region, which really means that the difference you find in the sample is unlikely to be due to sampling error alone.
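The test of whether the COMPET effect varies by region can also be run as a nested-model F-test: fit the original model without the interaction and the model with it, then compare their residual sums of squares. A minimal NumPy sketch on hypothetical simulated data (the helper name `fit` and all coefficients are assumptions); with a single added term, the F statistic equals the square of the interaction's t statistic.

```python
import numpy as np


def fit(X, y):
    """Return OLS coefficients, their covariance matrix and the residual sum of squares."""
    n, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    cov = rss / (n - k) * np.linalg.inv(X.T @ X)
    return beta, cov, rss


# Hypothetical simulated data (NOT the exam data).
rng = np.random.default_rng(2)
n = 200
adv, bonus, compet = rng.uniform(0, 50, n), rng.uniform(0, 30, n), rng.uniform(0, 20, n)
south = rng.integers(0, 2, n).astype(float)
sales = (100 + 2.0 * adv + 1.5 * bonus - 1.0 * compet + 10.0 * south
         - 3.0 * compet * south + rng.normal(0, 5, n))

X_reduced = np.column_stack([np.ones(n), adv, bonus, compet, south])  # original model
X_full = np.column_stack([X_reduced, compet * south])                 # plus COMPET*SOUTH

_, _, rss_reduced = fit(X_reduced, sales)
beta, cov, rss_full = fit(X_full, sales)

q = X_full.shape[1] - X_reduced.shape[1]   # number of restrictions tested (here 1)
df_resid = n - X_full.shape[1]
F = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
t_b5 = beta[5] / np.sqrt(cov[5, 5])
print(f"F({q}, {df_resid}) = {F:.1f}; t(b5)^2 = {t_b5 ** 2:.1f}")
```

A large F only says the difference in slopes is unlikely to be sampling error; whether a slope of that size matters for sales is still a substantive judgment.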

This does not mean that there are substantively large differences in sales tied to competitors' sales in Southern England, or anywhere else for that matter. To answer that, you have to look at sales in Southern England and elsewhere and at how they relate to competitors' sales. And you have to make the substantive judgment of whether competitors' sales have enough impact on sales in either region to be substantively important.
 