Hello,

I am working on a statistic to predict which players are most likely to get a hit on any given day. It combines five different predictive factors, to which I will assign relative weights to improve the accuracy of the prediction.

The five factors are:
  • Batter Hits/Game (Last 30 Games)
  • Batter Hits/Game (Last 3 Games)
  • Opposing Pitcher Hits/9 Innings (Last 15 Games)
  • Opposing Pitcher Hits/9 Innings (Last 2 Games)
  • Ballpark Hit Factor (Current Season)

The factor weighting part is easy once I figure out how to account for the different ranges of the factors themselves. I don't want to force a specific min and max on each factor, and I don't want any of the "standardized" or "normalized" or "whatever" values to end up negative.

Of course, outliers happen (and happen frequently with the smaller sample sets), but here are the approximate ranges of each factor:
  • Batter Hits/Game (Last 30 Games): ~0-1.5
  • Batter Hits/Game (Last 3 Games): ~0-3
  • Opposing Pitcher Hits/9 Innings (Last 15 Games): ~6-15
  • Opposing Pitcher Hits/9 Innings (Last 2 Games): ~2-20
  • Ballpark Hit Factor (Current Season): ~0.5-2.0

I thought about ranking the players within each factor (each factor covers the same set of players) and then weighting from there, but ranks don't account for how the values are actually distributed within each factor. I also thought about rescaling so that the max of each factor equals 1, forcing each factor into a 0-1 range, but I don't know if that's right either. I've thought about a lot of things, but I honestly just get stuck spinning my wheels.
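
To make those two ideas concrete, here is a rough Python sketch of both (the sample numbers are made up, and ties get arbitrary order in the rank version):

    import numpy as np

    def percentile_rank(x):
        # Rank-based scaling: each value becomes its percentile in [0, 1].
        # Never negative, but it throws away the shape of the distribution.
        ranks = np.argsort(np.argsort(x))   # 0..n-1 rank of each value
        return ranks / (len(x) - 1)

    def min_max(x):
        # Min-max scaling: linearly map the observed min..max onto [0, 1].
        # Never negative, but one big outlier squashes everyone else together.
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min())

    # Made-up Hits/Game (Last 30) values for five hypothetical batters
    hpg_30 = np.array([0.6, 0.9, 1.1, 1.4, 0.3])
    print(percentile_rank(hpg_30))  # [0.25 0.5  0.75 1.   0.  ]
    print(min_max(hpg_30))          # [0.27 0.55 0.73 1.   0.  ] (rounded)

Both stay non-negative, which is why I keep coming back to them; the rank version is exactly the "doesn't account for the distribution" problem I mentioned.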

Basically, I'm wondering what the best way is to "normalize" or "standardize" or "whatever" each individual factor so that I can then assign weights to each of them and calculate my final hit predictor statistic. I love doing things like this to occupy my time, but it's been entirely too long since my last statistics class.
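
For concreteness, the final combination I have in mind looks something like this (the weights, factor names, and numbers are all placeholders, and normalize() is a stand-in for whatever scaling turns out to be right):

    import numpy as np

    # Placeholder weights -- tuning these is the part I'm not worried about
    weights = {
        "batter_hpg_30": 0.30,
        "batter_hpg_3":  0.15,
        "pitcher_h9_15": 0.25,
        "pitcher_h9_2":  0.10,
        "park_factor":   0.20,
    }

    def normalize(x):
        # Stand-in: any non-negative, roughly 0-to-1 scaling would go here
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min())

    def hit_predictor(factors, weights):
        # Weighted sum of normalized factors; stays in [0, 1] as long as
        # the weights sum to 1 and each normalized factor is in [0, 1]
        total = np.zeros(len(next(iter(factors.values()))))
        for name, values in factors.items():
            total += weights[name] * normalize(values)
        return total

    # Tiny made-up example: three players
    factors = {
        "batter_hpg_30": [0.8, 1.2, 0.5],
        "batter_hpg_3":  [1.0, 2.3, 0.0],
        "pitcher_h9_15": [8.5, 11.0, 9.2],
        "pitcher_h9_2":  [6.0, 14.0, 3.0],
        "park_factor":   [0.9, 1.3, 1.1],
    }
    print(hit_predictor(factors, weights))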

Thank you for your help!

Best regards,
Eric