# Standardization of a variable

#### luchins

##### Member
Hello, before fitting a linear regression, how can I know whether I have to standardize a variable?

#### hlsmith

##### Not a robit
Depends on the purpose of your analysis. Feel free to provide more details. Standardizing isn't required in most models, beyond regularization models and centering interaction terms in regression.
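To illustrate why standardizing is optional for a plain (unregularized) regression, here is a minimal sketch in Python. The data are made up for illustration, and the helper names `zscore`, `fit_line`, and `r_squared` are hypothetical, not from any package. With a single predictor, z-scoring rescales the slope by the predictor's standard deviation but leaves the fit (R²) unchanged.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize to mean 0 and standard deviation 1 (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    my = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up illustration data: ranking points and serve win proportion.
rank_points = [500, 1200, 2500, 4000, 6500, 9000]
serve_win = [0.58, 0.60, 0.63, 0.64, 0.68, 0.70]

raw = fit_line(rank_points, serve_win)          # (intercept, slope) on raw scale
std = fit_line(zscore(rank_points), serve_win)  # (intercept, slope) on z-scale

r2_raw = r_squared(rank_points, serve_win, *raw)
r2_std = r_squared(zscore(rank_points), serve_win, *std)

print("slope per ranking point:", raw[1])
print("slope per 1 SD of ranking points:", std[1])
print("R^2 raw vs standardized:", r2_raw, r2_std)
```

The standardized slope equals the raw slope times the standard deviation of the predictor, so the choice mainly affects how the coefficient is read, not how well the model fits.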

#### luchins

##### Member
For example, I saw this discussion:

https://stats.stackexchange.com/que...players-service-point-win-percentage-which-of

This person asked on Stack Exchange:

(My own questions for you are Question 1 and Question 2.)

''Hello, I am conducting a regression in order to predict a tennis player's service point win %, i.e. the percentage of points he wins when he is the server.

Model 1: If my DV (dependent variable) data lies in the range 0.3-0.9, does it make sense to use a logistic regression?

-------Question 1: (Why does he ask whether it makes sense to use logistic regression when the data lies in the 0.3-0.9 range? How is this range calculated?)

If using logistic regression, I would endeavor to build a model with serve win % as my dependent variable and the following independent variables:

+ average serve win % of last n matches (maybe n = 5 or 10) to account for form
+ surface
+ player ranking
+ opposition ranking

..... Would this be a good model to use? Preliminary logistic regressions just involving serve win % regressed on surface + player ranking + opponent ranking ... are showing some strange results, so I'm losing faith in logistic regression for this data.

An alternative I'm considering is to use raw variables in a linear regression type model with interactions.... Along the lines of Aiken & West 1991. My dependent variable will be the number of service points won in the match, and my independent variables will be:

+ no. of service points played in the match
+ the surface the match was played on
+ the player's ranking points
+ the opponent's ranking points
+ an interaction between player and opponent ranking points
+ an interaction between surface and no. of points played
+ average service points won in last n matches
+ average % of service points won in last m matches

Do either of these models stand out as smart or appropriate ways to model this data?

For context, for each player I have between 100 and 350 matches' worth of data. I would love to hear what you guys think, or if you have any other suggestions on how to predict serve win % using the stated variables I would really appreciate it. I'm conducting this analysis in R, so any code/package suggestions would also be great''

''For my avg. serve win % in last n matches variable, I wanted to standardize serve % based on surface and player rank/opponent rank for all of my data, for better accuracy. To gauge the effect of surface/rank, I ran simple linear and logistic regressions along the following lines: serve win % = surface + player ranking + opponent ranking. I also ran each of these IVs in a regression of its own. The linear model results were pretty much as expected and were accurate/in line with tennis knowledge/theory. The logistic regression results were pretty wild and inaccurate.''

-------Question 2: (Why is he trying to standardize the serve % variable based on surface? And how can he do this? By running a multiple linear regression? By assigning an arbitrary value to the serve % variable? How can he standardize it?)
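One possible reading of "standardize serve % based on surface" is to express each match's serve % relative to what is typical on that surface, so that matches on fast and slow courts become comparable. The quoted post does not say exactly how this was done, so the following is only a sketch under that assumption. It is written in Python rather than R, the match data are made up, and the variable names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up illustration data: (surface, serve win proportion) per match.
matches = [
    ("clay", 0.58), ("clay", 0.62), ("clay", 0.60),
    ("grass", 0.68), ("grass", 0.72), ("grass", 0.70),
]

# Collect the serve win proportions observed on each surface.
by_surface = defaultdict(list)
for surface, pct in matches:
    by_surface[surface].append(pct)

# Mean and (population) standard deviation of serve % per surface.
stats = {s: (mean(v), pstdev(v)) for s, v in by_surface.items()}

# Z-score each match against its own surface: 0 means "typical for this
# surface", positive means better than typical, negative means worse.
standardized = [
    (surface, (pct - stats[surface][0]) / stats[surface][1])
    for surface, pct in matches
]

for surface, z in standardized:
    print(surface, round(z, 2))
```

Subtracting the surface mean is the same as taking the residuals from a regression of serve % on surface alone, which may be why the quoted poster ran regressions to gauge the surface effect. Adjusting for player and opponent ranking as well would need a regression with those terms included, and no arbitrary value has to be assigned to serve %.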