Variance across different sample sizes

#1
Hello,

I want to know how to interpret variance when the sample sizes are not equal.

Here is my problem: I am analyzing daily playing data for poker players. Each player's playing statistics are recorded every day, and for each player I want to understand how consistent those statistics are over time. The difficulty is that players play a different number of hands each day (4000 one day, 5000 the next, and so on).

How can I control for the number of hands played so that I can compare the consistency of different players? I can only access the data as daily totals.

Thanks,
theyjamin
 
#2
Hi, what is the outcome? Winning / not winning? In other words, can you formulate the outcome as "number of successes out of a given number of hands"? If so, you can use a binomial regression model, in which the outcome consists of two variables: the number of trials (here, the number of hands) and the number of successes.
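As a minimal sketch of the simplest case, an intercept-only binomial model (no covariates), the maximum-likelihood estimate of the success probability is just total successes divided by total trials, and with a logit link the fitted intercept is its log-odds. The daily (trials, successes) pairs below are made-up illustrative numbers, and `pooled_binomial_rate` is a hypothetical helper name. A model with covariates would normally be fit with a GLM routine, for example statsmodels' `GLM` with a `Binomial` family.

```python
import math

# Hypothetical daily totals for one player: (trials, successes) per day,
# e.g. trials = hands facing a bet, successes = hands where the player called.
days = [(4000, 1200), (5000, 1450), (300, 120)]


def pooled_binomial_rate(days):
    """MLE of p in an intercept-only binomial model: total successes / total trials."""
    total_trials = sum(n for n, _ in days)
    total_successes = sum(k for _, k in days)
    return total_successes / total_trials


p_hat = pooled_binomial_rate(days)
# With a logit link, the fitted intercept is the log-odds of p_hat.
intercept = math.log(p_hat / (1 - p_hat))
print(round(p_hat, 4), round(intercept, 4))
```

Pooling this way weights each day by its number of trials automatically, which is why the binomial framing handles unequal daily sample sizes naturally.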
 
#3
Hi,

For a given stat, the outcome for each individual hand is either yes, no, or null. The stats are based on behavior in the hand, and often there is no opportunity to observe the behavior.

Here is the data structure for a given player.

Let n be the number of hands played on a given day
Let x be the value of the statistic for that day
Let the subscript denote the day number

So the data for a player look like this: (n1, x1), (n2, x2), (n3, x3), ..., (ni, xi).

I want to determine how x varies from day to day across my sample while accounting for the fact that n differs from day to day. My reasoning is that if only a few hands are played on a given day, the value of x for that day is less reliable and will therefore add noise to my estimate of the daily variance.
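This intuition can be made precise under a simple assumption: if each opportunity on a day is an independent trial with the same underlying probability p, then the day's proportion x_i has sampling variance p(1 - p)/n_i, so its standard error shrinks like 1/sqrt(n_i). A minimal sketch, where p = 0.3 and the helper name `binomial_se` are hypothetical:

```python
import math


def binomial_se(p, n):
    """Standard error of a daily proportion x when each of n opportunities
    is assumed to be an independent trial with fixed probability p."""
    return math.sqrt(p * (1 - p) / n)


p = 0.3  # hypothetical true rate for the stat
for n in (100, 1000, 5000):
    # Same true rate, but the noise in the observed daily x falls as n grows.
    print(n, round(binomial_se(p, n), 4))
```

A day with 100 opportunities is therefore roughly sqrt(50), about 7 times, noisier than a day with 5000, even if the player's true tendency never changes.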

I hope this makes sense.
 

hlsmith

#4
Which statistic of interest are you thinking of? How large is n, and how many days do you have?

Side question: is this a game against the dealer or against other players? Head-to-head competition may influence success. Also, to clarify: does "yes, no, or null" mean the outcome has two possible groups or three?
 
#5
For example, one stat of interest is the percentage of times a player calls when facing a bet. For an individual hand it is either 1 (they called), 0 (they folded), or N/A (at no point were they facing a bet). The game is played against other players.
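Because hands in which the player never faced a bet are N/A, the sample size that matters for this stat is the number of hands in which the player did face a bet, not the total number of hands. A minimal sketch, assuming per-hand outcomes coded as 1, 0, or `None` (the helper `daily_stat` is hypothetical):

```python
from typing import Optional, Sequence, Tuple


def daily_stat(outcomes: Sequence[Optional[int]]) -> Tuple[int, Optional[float]]:
    """Return (opportunities, rate) for one day.

    outcomes: per-hand values, 1 = called, 0 = folded, None = never faced a bet.
    Hands coded None are excluded from both the numerator and the denominator.
    """
    observed = [o for o in outcomes if o is not None]
    if not observed:
        return 0, None  # no opportunities that day, so the stat is undefined
    return len(observed), sum(observed) / len(observed)


print(daily_stat([1, 0, None, 1, None, 0, 1]))
```

If only daily sums are available, the same idea applies as long as those sums include the count of opportunities for the stat; that count, rather than total hands, is the n to carry into any variance calculation.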

n is typically between 0 and 10,000, and I am looking at roughly 50 days. I want to look at the variance of x over those 50 days while controlling for n, because outlier values of x are more likely when n is small.
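One way to control for n, offered here as a sketch rather than something proposed in the thread, is to compare the observed day-to-day scatter of x with the scatter that binomial sampling noise alone would produce. The function below (hypothetical name `day_to_day_variation`) computes a Pearson chi-square dispersion ratio and a method-of-moments estimate of the "true" between-day variance left after subtracting the average noise p(1 - p)/n_i. It assumes each opportunity is an independent trial within a day and that n_i counts opportunities for the stat; the example data are made up.

```python
import statistics


def day_to_day_variation(days):
    """days: list of (n_i, x_i), n_i = opportunities on day i, x_i = that day's rate.

    Returns (p_bar, dispersion, excess_var):
      p_bar: pooled rate, i.e. the n-weighted mean of x
      dispersion: Pearson chi-square / (k - 1); near 1 if the day-to-day
        scatter matches what binomial noise alone would produce
      excess_var: estimated true between-day variance of the rate, after
        subtracting the average binomial noise p(1 - p) / n_i
    """
    days = [(n, x) for n, x in days if n > 0]  # days with no opportunities carry no information
    k = len(days)
    if k < 2:
        raise ValueError("need at least two days with n > 0")
    total_n = sum(n for n, _ in days)
    p_bar = sum(n * x for n, x in days) / total_n
    if not 0 < p_bar < 1:
        raise ValueError("pooled rate must be strictly between 0 and 1")
    pq = p_bar * (1 - p_bar)
    # Each squared deviation is scaled by its binomial variance pq / n_i.
    chi2 = sum(n * (x - p_bar) ** 2 for n, x in days) / pq
    dispersion = chi2 / (k - 1)
    # Unweighted sample variance of x minus the average sampling-noise variance.
    observed_var = statistics.variance([x for _, x in days])
    expected_noise = statistics.fmean([pq / n for n, _ in days])
    excess_var = max(0.0, observed_var - expected_noise)
    return p_bar, dispersion, excess_var


days = [(4000, 0.30), (5000, 0.29), (300, 0.40)]  # hypothetical (n_i, x_i) pairs
p_bar, dispersion, excess_var = day_to_day_variation(days)
print(round(p_bar, 4), round(dispersion, 2), excess_var)
```

The dispersion ratio answers "is there more day-to-day variation than sampling noise explains?", but for a fixed amount of true variation it grows with the number of opportunities, so it is a poor yardstick for comparing players with very different volumes. The excess variance (or its square root, on the same scale as x) is closer to a volume-adjusted consistency measure, though with about 50 days it will itself be somewhat noisy.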

Thank you