Player Pool Analysis

Hi, :wave: loving this forum already - hoping to become a regular user for both work and hobby (online poker!!)!!

Anyway, my current conundrum is that I want to "rate" or "categorise" different types of players (very precisely) based on the relevant statistics sitting in my online poker database. I've exported these stats to a CSV file and opened it in Excel, and I want to apply some kind of scientific model to identify player types appropriately.

Looking for any help; hopefully this will be a very interesting piece of work, and I'm happy to share results and findings!! My sample at the moment is ~2000 players in my database who have each played more than 500 hands, about 6 million online poker hands in total. I've done this before, about a year ago, using a very simplistic =COUNTIF method in Excel, but I want to take it much further this time. I'll be using Excel again but do have limited access to SAS.

Feel free to ask any questions. I don't want to kill the topic by posting way too much in the OP, but I do have lots of datamined info to work from for those who don't understand online poker, which I presume would generate the first few questions, i.e. how accurate/reliable my starting data is.


New Member
What kind of variables do you have? Or would you need to calculate them from the data first? If so, a lot of the predictive quality of your model will probably depend on the quality of the "features" that you extract from the data. These would describe the player's behavior at a somewhat higher level than the raw data. For instance, proportion of hands folded or something (no idea if that would be a good feature though).

Some things you could look into are factor analysis or cluster analysis. You could even try tinkering around with neural networks if you're feeling adventurous.
Or would you need to calculate them from the data first?
No, we don't need to calculate the variables as they're already stored in a PostgreSQL database, which my front-end DB runs off. In terms of player statistics, we have approx. 130 individual stats available, but tbh I don't think we need that many for this piece. We could calculate some extra stats to highlight the ones that correlate most strongly with winrate? But I think as a player I have a solid grasp of this already.
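If it helps to check which stats track winrate most closely, here is a rough Python sketch that ranks stats by their Pearson correlation with BB/100. The player rows and the column names (`vpip`, `pfr`, `bb100`) are made up for illustration and are not taken from the real export. Correlation only picks up linear relationships, so treat it as a first look rather than a verdict.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no information
    return cov / (spread_x * spread_y)


def rank_by_correlation(rows, target, stats):
    """Sort stats by the absolute size of their correlation with the target."""
    ys = [row[target] for row in rows]
    scores = {s: pearson([row[s] for row in rows], ys) for s in stats}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical players: these numbers are invented, not from the database.
players = [
    {"vpip": 18.0, "pfr": 15.0, "bb100": 3.0},
    {"vpip": 22.0, "pfr": 17.0, "bb100": 2.0},
    {"vpip": 35.0, "pfr": 10.0, "bb100": -1.0},
    {"vpip": 50.0, "pfr": 8.0, "bb100": -4.0},
    {"vpip": 60.0, "pfr": 20.0, "bb100": -6.0},
]

ranked = rank_by_correlation(players, "bb100", ["vpip", "pfr"])
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```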

What kind of variables do you have?
I believe I've extracted the most relevant ones, which are:

  • VP$IP - Voluntary put $ in pot
  • PFR - Pre Flop Raise (basically raising instead of calling)
  • PreFlopColdCall - called cold w/o prev investing money in pot
  • PFCall - Like above+when calling from the blinds, ie dead money already invested or calling a reraise
  • Aggression Factor - ( Total Times Bet + Total Times Raised ) / (Total Times Called )
  • Aggression Frequency - [ ( Total Times Bet + Total Times Raised ) / (Total Times Bet + Total Times Raised + Total Times Called + Total Times Folded ) ] * 100
  • BB/100 - ie Winrate, Big Bets won per 100 hands
  • Amount Won
  • WTSD - Went to Showdown %
  • W$WSF - Won $ When Saw Flop %
  • W$wWSD - Won $ at Showdown %
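
For anyone unfamiliar with the two aggression stats, here is a minimal Python sketch of the formulas as written above. The action counts are invented for illustration, and returning `None` when a denominator is zero is my own assumption about how to handle players with no recorded calls or actions.

```python
def aggression_factor(bets, raises, calls):
    """(bets + raises) / calls; None when the player never called."""
    if calls == 0:
        return None
    return (bets + raises) / calls


def aggression_frequency(bets, raises, calls, folds):
    """Bets and raises as a percentage of all recorded actions."""
    total = bets + raises + calls + folds
    if total == 0:
        return None
    return (bets + raises) / total * 100


# Hypothetical action counts for one player.
af = aggression_factor(bets=120, raises=60, calls=90)
afq = aggression_frequency(bets=120, raises=60, calls=90, folds=130)
print(f"Aggression Factor: {af:.2f}, Aggression Frequency: {afq:.1f}%")
```

Note that Aggression Factor ignores folds, so two players with the same factor can have quite different frequencies.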

Additionally, I extracted these stats to support the key figures:
  • 3Bet - [a type of reraise, typically the 3rd bet]
  • 4Bet
  • F.3Bet - Faced a 3bet and folded
  • F.4Bet
  • CBet - Continuation Bet Flop - i.e. raise on the 1st street (preflop) and bet again on the 2nd street (flop)
  • Fold to Cbet - Faced Cbet and folded
  • ATS - Attempted to Steal Blinds
  • LimpCall - flat-called the BB, faced a raise and called (%)
  • FoldBB.Steal - % folded when faced a raise from late position
  • FoldSB.Steal

...factor analysis or cluster analysis. You could even try tinkering around with neural networks if you're feeling adventurous....
These sound seriously sexy; I'm happy to explore whatever is deemed most appropriate for my objective. I'm good with numbers but my exposure to statistical models is limited [i.e. 0-5%].

The objective is pretty much to use an "AutoRate" feature for each player type; ideally, the fewer player types the better.
Max capacity is 14; anywhere between 5 and 7 would be ideal. Obviously I'll have to create the rules so that some are more important than others.

I'd then have my database identify these player types based on the rules we develop from the analysis, so that different general strategies can be applied against the player pools until other players give me a reason to deviate from the general strategy.
I've uploaded the raw data - see below. I have obviously also removed names, etc. and replaced them with numerical values.

I am a bit lost about where to start with factor analysis or cluster analysis - I did do some reading but it went completely over my head!!!
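
As one possible starting point for the cluster analysis route, here is a minimal k-means sketch in plain Python. It is not a recommendation of k-means over factor analysis, just a small illustration of what "clustering players" means in practice. The six rows of [VPIP, PFR, Aggression Factor] are invented, k=2 is only there to keep the toy example readable (the real target would be somewhere in the 5 to 7 range), and the farthest-first choice of starting centroids is my own assumption to keep the result deterministic.

```python
import math


def standardize(rows):
    """Z-score each column so stats on different scales count equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # "or 1.0" guards against a zero spread in a constant column.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then the farthest remaining."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain k-means: assign to nearest centroid, recompute means, repeat."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical players as [VPIP, PFR, Aggression Factor]; numbers are made up.
raw = [
    [18.0, 15.0, 3.0], [20.0, 16.0, 2.8], [19.0, 14.0, 3.2],  # tight, aggressive
    [45.0, 8.0, 0.8], [50.0, 6.0, 0.7], [48.0, 7.0, 0.9],     # loose, passive
]

labels, centroids = kmeans(standardize(raw), k=2)
print(labels)
```

Standardizing first matters because VPIP runs from 0 to 100 while Aggression Factor is usually a small number; without it the clustering would be driven almost entirely by the larger-scale stats. Each resulting centroid can then be read as the "typical" player of that type and turned into AutoRate rules.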