Problem interpreting Multiple Regression on Video Game Sales

#1
Hi everyone,

I am a new member, and I am so glad to have found the forum. I'm tearing my hair out about this problem. These are my results:



"Views" = Hits on our company Facebook page.
"Reviews" = Number of 3rd party reviews published about our game that link to our website.
"Coupon" = Discount coupons for the game.
"Shareware" = Unique listings of our product on Shareware websites that link to our website.
"Price" = Pricing of our product. The product pricing was changed three times, starting at $19.99, down to $9.99, and then $6.99.
"Downloads" = Downloads of the trial version of the product from our website.

Am I right in interpreting this data to say that each additional game review decreases product units sold by 6.94 (a coefficient of -6.94)? Also, it is true that an increased price decreases total unit sales, but the amount it decreases by seems larger than reality. Is the "Day" value skewing the results?

Here's how I got the results using R:

results <- summary(lm(Sales ~ Downloads + Price + Coupon + Shareware + Views + Reviews + Day))
where "Sales" is the dependent variable and the other terms are independent variables.


Can someone tell me what I could have possibly done wrong to produce these counterintuitive results? Are there any ways to test their validity? I am so sorry if these questions are stupid. I have to learn multiple regression for work, with very little background in math, and no teacher! Any help is appreciated, thanks!
 

terzi

TS Contributor
#2
Hi aplfalcon,

There are certain things that could be producing wrong results:

* Some variables in the model appear to be non-significant, that is, these measures don't help you explain sales. Before adjusting the model, try analyzing the relationships with scatter plots and correlations.

* The response variable may be skewed (this is very common when dealing with money :)), so you should analyze it first.

* Certain assumptions may not be met in your model. The most important are normality in residuals and common variance across the observations. Those assumptions must be checked graphically.

* Since your response variable seems to be measured over time, your data may be autocorrelated, which causes trouble in an ordinary least squares regression model.
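
A minimal sketch in R of the checks listed above. The data from the original post are not shown in the thread, so the data frame `game` below is simulated and every value in it is hypothetical; only the column names are taken from the post.

```r
# Simulated stand-in data: all numbers are made up so the example can run.
set.seed(1)
n <- 120
game <- data.frame(
  Day       = 1:n,
  Downloads = rpois(n, 2000),
  Views     = rpois(n, 500),
  Price     = rep(c(19.99, 9.99, 6.99), each = n / 3)
)
game$Sales <- 5 + 0.01 * game$Downloads - 0.5 * game$Price + rnorm(n, sd = 3)

# 1. Pairwise relationships: scatter plot matrix and correlation matrix.
pairs(game)
cors <- cor(game)
print(round(cors, 2))

# 2. Skew of the response: a histogram is enough for a first look.
hist(game$Sales, main = "Sales", xlab = "Units sold per day")

# 3. Fit the model, then check the residuals graphically.
fit <- lm(Sales ~ Downloads + Price + Views + Day, data = game)
res <- residuals(fit)
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")  # constant variance?
abline(h = 0, lty = 2)
qqnorm(res)                                                         # normality?
qqline(res)

# 4. Autocorrelation of the residuals, taken in time order.
acf_res <- acf(res, main = "ACF of residuals")
```

In the residuals-versus-fitted plot, a funnel shape suggests non-constant variance. In the ACF plot, bars outside the dashed band at lags above 0 suggest autocorrelation.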

As you can see, your data appear to be a little tricky. Try doing a deeper exploratory analysis; I would also suggest fitting a robust regression model.
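
One way to try this in R, as a sketch: `rlm()` from the MASS package fits a robust linear model (Huber M-estimation by default). It is only one of several robust methods, and the data below are simulated, with a few artificial outliers added, so none of the numbers relate to the real sales data.

```r
library(MASS)  # ships with standard R installations; provides rlm()

# Simulated stand-in data (hypothetical values, not the poster's data).
set.seed(2)
n <- 100
game <- data.frame(
  Downloads = rpois(n, 2000),
  Price     = sample(c(19.99, 9.99, 6.99), n, replace = TRUE)
)
game$Sales <- 5 + 0.01 * game$Downloads - 0.5 * game$Price + rnorm(n, sd = 2)
game$Sales[1:5] <- game$Sales[1:5] + 60   # a few artificial outlier days

ols    <- lm(Sales ~ Downloads + Price, data = game)
robust <- rlm(Sales ~ Downloads + Price, data = game)  # Huber M-estimator by default

# Compare the two sets of coefficients side by side.
print(cbind(OLS = coef(ols), Robust = coef(robust)))

# Observations with a weight below 1 were down-weighted as unusual.
print(head(sort(robust$w), 5))
```

If the two columns of coefficients differ noticeably, a handful of unusual observations is probably driving the ordinary least squares fit.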
 
#3
Hi Terzi, thanks so much for your reply. It's really helpful.

I noticed that some of the variables do appear non-significant, but can they be significant in large numbers? 1 download doesn't do much for sales, but my data set includes thousands of downloads per day. Same goes for views.

I did get the sense that the time variable did not fit in the model. Thanks for suggesting the Robust Regression model, I will look into it. Also, are "residuals" the same as minimum sum of squared errors (SSE)? And when you say, "normality," do you mean a normal distribution? Thanks for your help.

It looks like I'm going to have to take a statistics class next quarter. There are so many things to learn, I don't think I can learn it all on my own.
 

terzi

TS Contributor
#4
If a variable is non-significant in the model, it means the data show no evidence of a linear relationship with the response once the other variables in the model are accounted for. Downloads is significant, although Views doesn't seem to be. But before drawing conclusions, you should analyze the relationships individually first, in order to detect their shape, direction and strength.
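
On the question of whether a variable can matter "in large numbers": a coefficient is the change in the response per one-unit change in the predictor, so its size depends on the units, while its significance does not. A minimal sketch on simulated data (all values hypothetical), fitting the same model per download and per 1000 downloads:

```r
# Simulated data for illustration only.
set.seed(3)
n <- 100
Downloads <- rpois(n, 2000)
Sales <- 5 + 0.01 * Downloads + rnorm(n, sd = 2)

# Same model, with the predictor in two different units.
per_download <- summary(lm(Sales ~ Downloads))$coefficients
per_thousand <- summary(lm(Sales ~ I(Downloads / 1000)))$coefficients

print(per_download)   # slope = change in Sales per single download
print(per_thousand)   # slope = change in Sales per 1000 downloads
```

The slope in the second table is 1000 times the slope in the first, but the t value and p-value of the slope are identical, so a tiny per-unit coefficient can still describe a large effect over thousands of downloads.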

Now, regarding the assumptions of the model: by normality I do mean the normal distribution of your residuals. Residuals are not the same as your SSE. A residual is the difference between the actual value of your DV and the value predicted by the model. There is a residual for every observation, and the SSE is the sum of the squares of those residuals. It is assumed that the residuals are normally distributed.
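
The distinction between residuals and the SSE can be sketched in R as follows. The data are simulated and purely illustrative; R's `residuals()` uses the convention actual minus predicted.

```r
# Simulated data for illustration only.
set.seed(4)
n <- 50
Downloads <- rpois(n, 2000)
Sales <- 5 + 0.01 * Downloads + rnorm(n, sd = 2)
fit <- lm(Sales ~ Downloads)

# One residual per observation: actual value minus the model's prediction.
res_manual <- Sales - fitted(fit)
res <- residuals(fit)
print(all.equal(unname(res_manual), unname(res)))

# The SSE is a single number: the sum of the squared residuals.
sse <- sum(res^2)
print(sse)
print(deviance(fit))  # for an lm fit, deviance() returns the same quantity
```

Least squares chooses the coefficients that make this single SSE value as small as possible; the normality assumption concerns the whole vector `res`, not the SSE.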

I'm almost certain that the model you fitted is not meeting all assumptions since that is the most common reason for "weird", illogical results.

It is indeed a great field, so if you have the opportunity to take a statistics course, that will be really helpful. Of course, feel free to come back with any questions you may have.

Good luck