# Suggestions to improve this sports model?

#### Norm517

##### New Member
Based on the second table, "January window spend by place...", in this article, I came up with an LM to determine whether, in the English soccer league, there's a relationship between the average end-of-season ranking in the league table (there are 20 teams, three of which change every season) and two predictors: average expenditure on transfers in January and team ranking in January. After trying a few variants, the best model I came up with looked like this:

season_final_ranking ~ log(avg_january_spending + 1) + ranking_on_January_31

One glaring problem with the article is that it didn't mention at which point in January the rankings were measured, adding to the difficulty of using this model for predictions. The model summary was as follows:

ranking_on_January_31 was significant, with p < 0.001
avg_january_spending was not significant
Overall:
Residual standard error: 1.82 on 17 degrees of freedom
Multiple R-squared: 0.8939, Adjusted R-squared: 0.8814
F-statistic: 71.62 on 2 and 17 DF, p-value: 5.226e-09
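
For anyone who wants to reproduce the setup, here is a minimal sketch of how a fit of this form could be run in R. The data frame is synthetic placeholder data generated only so the code runs. It is not the article's table, and the column names simply follow the model terms listed above.

```r
# Synthetic placeholder data (NOT the BBC table): one row per league position.
set.seed(1)
d <- data.frame(
  ranking_on_January_31 = 1:20,
  avg_january_spending  = round(runif(20, min = 0, max = 10), 1)  # hypothetical units
)

# Hypothetical response: final rank tracks January rank plus noise.
d$season_final_ranking <- d$ranking_on_January_31 + rnorm(20, sd = 2)

# In an R formula the response and the predictors are separated by `~`.
fit <- lm(
  season_final_ranking ~ log(avg_january_spending + 1) + ranking_on_January_31,
  data = d
)

summary(fit)      # coefficients, residual standard error, R-squared, F-statistic
df.residual(fit)  # 17: 20 rows minus 3 estimated coefficients
```

The 17 residual degrees of freedom match the summary above, which is what 20 rows and two predictors plus an intercept would give.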

Anyone have any suggestions for improving the model, or replacing it with a better variant?

#### noetsi

##### No cake for spunky
I can't open the link because it is blocked on my end. By LM, do you mean linear model?

I would think one problem could be high multicollinearity between your two IVs, the log of spending and the January ranking, because spending likely influences not just the final season ranking but also the ranking in January. There are a variety of solutions, none of them ideal. The most commonly recommended one, combining the affected IVs into an index, probably would not make conceptual sense here. You should in any case check the VIF (variance inflation factor), which is available in all statistical software. John Fox wrote Regression Diagnostics, which deals with this issue extensively.
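
As a rough sketch of what that check involves: with only two predictors, each one's VIF is 1 / (1 - R²), where R² comes from regressing one predictor on the other. The helper name `vif_two` and the vectors below are made up for illustration and are not the article's data. A packaged VIF function would normally do this job.

```r
# Hypothetical helper: VIF for a model with exactly two numeric predictors.
vif_two <- function(x1, x2) {
  r2 <- summary(lm(x1 ~ x2))$r.squared  # R-squared of one predictor on the other
  1 / (1 - r2)
}

# Illustrative vectors only (not the article's data).
rank_jan  <- 1:20
log_spend <- log(c(5, 3, 8, 1, 0, 2, 9, 4, 7, 6, 1, 0, 3, 2, 5, 8, 4, 6, 9, 7) + 1)

# Values near 1 indicate little collinearity; large values flag a problem.
vif_two(log_spend, rank_jan)
```

Because the R² of a one-predictor regression is just the squared correlation, the result is the same whichever predictor is treated as the response.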

Another improvement would be to add other independent variables. It's not surprising that early-season success would predict late-season success, since the former is an inherent part of the latter. So looking at other factors would add to your model. Journals in particular would probably want more. Could you somehow measure a team's strength of schedule and/or its past managerial success?

#### Norm517

##### New Member
Thanks, noetsi. I'd try to paste in the table, but I'm not sure if it would get this thread removed. If the link itself is not showing, it's: http://www.bbc.co.uk/sport/0/football/20793698.

Yes, by LM, I meant linear model. It does seem reasonable to ask whether January spending would have some kind of influence even on January results. Assessing this is complicated by the fact that the article doesn't indicate whether the January rankings it presents are from January 1st (in which case they are not influenced by January spending at all) or from later in January (in which case they may be). I ran R's vif function (car package) on the model, and the variance inflation factor for my IVs was 1.000029, which, if I understand correctly, doesn't suggest a multicollinearity problem. It would be great to measure strength of schedule, but so far I've been working strictly from the data posted in the article. I guess some of these issues could be addressed if I were able to determine the source of their data and see whether there's anything else to be gained from it (guess I should try to do that). Thanks for the suggestions!
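
A footnote on the VIF figure, in case it helps anyone reading later: with only two numeric predictors, both share the same VIF, 1 / (1 - r²), where r is the correlation between them. Inverting that for the reported value gives a rough sense of how correlated the two IVs are:

$$
\mathrm{VIF} = \frac{1}{1 - r^2}
\quad\Longrightarrow\quad
r^2 = 1 - \frac{1}{1.000029} \approx 2.9 \times 10^{-5},
\qquad |r| \approx 0.0054
$$

So, taking the reported VIF at face value, log spending and January ranking are essentially uncorrelated in this table.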