(My own questions for you are the ones marked Question 1 and Question 2 below.)
''Hello, I am conducting a regression to predict a tennis player's service point win %, i.e. the percentage of points he wins when he is the server. ----
Model 1: If my DV (dependent variable) data lie in the range 0.3-0.9, does it make sense to use a logistic regression?
------- Question 1: (Why does he ask whether a logistic regression makes sense when the data lie in the 0.3-0.9 range? How is this range calculated?)
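To make Question 1 concrete, here is a minimal sketch of the logit (log-odds) transform that a logistic regression applies to a proportion. It assumes, as one plausible reading of the post, that 0.3-0.9 is simply the smallest and largest serve win % observed in the data rather than something calculated. The function names `logit` and `inv_logit` are illustrative only.

```python
import math

def logit(p):
    """Log-odds of a proportion p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map a log-odds value back to a proportion between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# The DV range mentioned in the post (0.3 to 0.9), in steps of 0.1.
proportions = [round(0.3 + 0.1 * i, 1) for i in range(7)]
log_odds = [logit(p) for p in proportions]

# Print each proportion next to its log-odds value.
for p, z in zip(proportions, log_odds):
    print(f"p = {p:.1f}  ->  logit(p) = {z:+.3f}")
```

A logistic model keeps every prediction between 0 and 1, which matters most when the data sit near those bounds. When all values are well inside the interval, a linear model on the raw percentage may behave similarly, which could be the reason the poster asks.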
If using logistic, I would endeavor to build a model with serve win % as my dependent variable and my independent variables (IVs) as:
+ average serve win % of last n matches (maybe n = 5 or 10) to account for form
..... Would this be a good model to use? Preliminary logistic regressions just involving serve win % regressed on surface + player ranking + opponent ranking ... are showing some strange results, so I'm losing faith in logistic for this data.
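One thing worth checking before giving up on the logistic model (this is a guess, since the post does not show the output) is whether the "strange" numbers are just coefficients reported on the log-odds scale. The sketch below uses made-up point totals per surface. With surface as the only predictor, the fitted probabilities of a binomial logistic regression equal the observed proportions, so the coefficients can be computed by hand without a fitting routine.

```python
import math

# Hypothetical totals: (service points won, service points played) per surface.
totals = {"clay": (580, 1000), "hard": (640, 1000), "grass": (690, 1000)}

def log_odds(won, played):
    """Log-odds of winning a service point, from raw counts."""
    return math.log(won / (played - won))

# Treat clay as the reference level, as a regression with dummy variables would.
baseline = "clay"
intercept = log_odds(*totals[baseline])
coefs = {s: log_odds(*t) - intercept for s, t in totals.items() if s != baseline}

def predicted_pct(surface):
    """Back-transform intercept + surface coefficient to a serve win %."""
    z = intercept + coefs.get(surface, 0.0)
    return 1.0 / (1.0 + math.exp(-z))

for surface in totals:
    print(surface, round(predicted_pct(surface), 3))
```

The coefficients in `coefs` are differences in log-odds, not differences in percentage points, so they only look sensible next to a linear model after being passed back through the inverse logit.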
An alternative I'm considering is to use raw variables in a linear-regression-type model with interactions, along the lines of Aiken & West (1991). My dependent variable will be the number of service points won in the match, and my independent variables will be:
+ no. of service points played in the match
+ the surface the match was played on
+ the player's ranking points
+ the opponent's ranking points
+ an interaction between player and opponent ranking points
+ an interaction between surface and no. points played
+ average service points won in last n matches
+ average % of service points won in last m matches
Does either of these models stand out as a smart or appropriate way to model this data?
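For the interaction between player and opponent ranking points, the approach associated with Aiken & West is to mean-center the continuous predictors before multiplying them. A minimal sketch with invented ranking points follows. The data and the names `center` and `design` are hypothetical, and the actual fit would be done in a regression routine.

```python
from statistics import mean

# Hypothetical match rows: (player ranking points, opponent ranking points).
matches = [(3200, 1500), (3200, 4100), (2900, 800), (3050, 2600)]

player_pts = [m[0] for m in matches]
opp_pts = [m[1] for m in matches]

def center(values):
    """Subtract the mean so the predictor has mean zero."""
    m = mean(values)
    return [v - m for v in values]

player_c = center(player_pts)
opp_c = center(opp_pts)

# The interaction term is the product of the centered predictors.
interaction = [p * o for p, o in zip(player_c, opp_c)]

# Design matrix rows: intercept, player, opponent, player x opponent.
design = [[1.0, p, o, po] for p, o, po in zip(player_c, opp_c, interaction)]
```

Centering does not change the interaction coefficient itself, but it makes the two main effects interpretable as effects at the average value of the other variable.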
For context, for each player I have between 100 and 350 matches' worth of data. I would love to hear what you guys think, and if you have any other suggestions on how to predict serve win % using the stated variables, I would really appreciate them. I'm conducting this analysis in R, so any code/package suggestions would also be great.''
''For my avg. serve win % in last n matches variable, I wanted to standardize serve % based on surface and player rank/opponent rank for all of my data, for better accuracy. To do that I needed to gauge the effect of surface/rank, so I ran simple linear and logistic regressions along the following lines: serve win % = surface + player ranking + opponent ranking. I also ran each of these IVs in a regression of its own. The linear model results were pretty much as expected and were in line with tennis knowledge/theory. The logistic regression results were pretty wild and inaccurate.''
------- Question 2: (Why is he trying to standardize the serve % variable based on surface? And how can he do this? By running a multiple linear regression? By assigning an arbitrary value to the serve % variable? How can he standardize it?)
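One plausible reading of "standardize serve % based on surface" (an assumption, since the post does not spell out the method) is to express each match's serve win % relative to what is typical on that surface, so that matches on fast and slow courts can be averaged together. The sketch below does this with within-surface z-scores on invented data.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (surface, serve win %) observations.
rows = [("clay", 0.58), ("clay", 0.62), ("clay", 0.60),
        ("grass", 0.68), ("grass", 0.72), ("grass", 0.70)]

# Group the serve win % values by surface.
by_surface = defaultdict(list)
for surface, pct in rows:
    by_surface[surface].append(pct)

# Mean and (population) standard deviation of serve win % on each surface.
stats = {s: (mean(v), pstdev(v)) for s, v in by_surface.items()}

# z-score: how far a match is from the typical value on its own surface.
standardized = [(s, (pct - stats[s][0]) / stats[s][1]) for s, pct in rows]

for s, z in standardized:
    print(f"{s:5s} z = {z:+.2f}")
```

Here 0.58 on clay and 0.68 on grass get the same score, because each is equally far below its surface average. The regression route described in the post would be similar in spirit: fit serve win % on surface and rankings, then use each match's residual (actual minus fitted) as the adjusted value.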