Set Completion Problem for a Set of Collectable Objects


New Member
I'm trying to do some probability analysis for a digital collectable game called SolForge. There is a particular point in my analysis where I'm not sure if I'm correct, and everything past that point hinges on what I'm doing there. Since the game itself is irrelevant to the question at hand, I will try to be as generic as possible. If you want to see the problem in its entirety, you can find my forum post on the game's website here.

First, the groundwork for the problem: We have a set of collectable objects. Within this set, the objects are separated into four subsets based on rarity: Common, Uncommon, Rare, and Very Rare.

Objects can be acquired in either of two ways:

  • Purchasing packs of six random objects each. Each object has an \(R_c\%\) chance of being Common, \(R_u\%\) chance of being Uncommon, \(R_r\%\) chance of being Rare, and an \(R_v\%\) chance of being Very Rare. Each pack costs \(\$C_p\).
  • Purchasing objects individually for a fixed cost based on rarity. A Common costs \(\$C_c\), an Uncommon costs \(\$C_u\), a Rare costs \(\$C_r\), and a Very Rare costs \(\$C_v\).

Having a subset completion of \(X\%\) for a given rarity gives us a \(Y\%\) chance of opening some number of new objects of that rarity from a pack.

The question: What value of \(Y\) gives us an expected value of 1 new object of a given rarity after opening a number of packs equivalent in value to purchasing an individual object of the same rarity? (i.e., what value of \(Y\) allows us to break even by opening packs?)

The end goal of my actual analysis is to know \(X\) so that we know at what point it becomes more cost-efficient to buy individual objects than to buy packs, since packs are substantially cheaper than individual Rares or Very Rares. I think I need to calculate \(Y\) first, and that's where I'm having trouble. Here's what I tried to solve for \(Y\):

\(Y^1 + Y^2 + Y^3 + Y^4 + Y^5 + Y^6 = 1\)
(Since there are six objects in a pack, I am adding the odds of opening 1, 2, 3, 4, 5, and 6 objects of our desired rarity together and saying this is equal to one relevant object. The problem is I have no idea if this is correct. Solving this for \(Y\) gives a value of 0.504138, or ~50.4%)

Something interesting that I found while fiddling with this is the following:
\(Y = 1/2\), or 50%, when \((1-Y)^6 + Y^1 + Y^2 + Y^3 + Y^4 + Y^5 + Y^6 = 1\)
For that matter, \(Y = 1/2\) always when \((1-Y)^m + \sum\limits_{n=1}^m Y^n = 1\)

Should I be using this instead, since it also includes the odds of not opening any new objects? If so, does that mean the break-even point is always 50% for this sort of thing? I get the feeling I'm pretty far off the mark here. Any help you can give me here will be greatly appreciated.
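As a quick numerical check of the two equations above, here is a minimal Python sketch. It only verifies the arithmetic and says nothing about whether either equation is the right model. The function names and the bisection helper are illustrative, not taken from any library.

```python
def geometric_sum(y, m=6):
    """Return y^1 + y^2 + ... + y^m."""
    return sum(y ** n for n in range(1, m + 1))


def solve_sum_equals_one(m=6, tol=1e-12):
    """Bisection for the root of y + y^2 + ... + y^m = 1 on (0, 1)."""
    lo, hi = 0.0, 1.0  # the sum increases with y: 0 at y=0 and m at y=1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if geometric_sum(mid, m) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


root = solve_sum_equals_one(6)
print(round(root, 6))

# With the extra (1 - Y)^m term, Y = 1/2 satisfies the equation for any m,
# because (1/2)^m + (1 - (1/2)^m) = 1.
for m in range(1, 11):
    assert abs((1 - 0.5) ** m + geometric_sum(0.5, m) - 1) < 1e-12
```

The second check passes for every m because the sum \(\sum_{n=1}^m (1/2)^n\) equals \(1 - (1/2)^m\), so the 50% result is a property of the equation itself rather than of the pack size.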


TS Contributor
The very first question is how you model the number of objects of each rarity in a pack.

If you assume the 6 objects are independent of each other, a possible model is the multinomial model. Let \( Y_c, Y_u, Y_r, Y_v \) be the number of objects of the 4 rarities respectively. Then

\( (Y_c, Y_u, Y_r, Y_v) \sim \text{Multinomial}(6;p_c, p_u, p_r, p_v)\)

One good thing is that each of them has a binomial marginal distribution, so the expected value is simply, e.g.,

\( E[Y_c] = 6p_c \)
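A small Python sketch of that last step, assuming the binomial marginal above: summing \(k \cdot \Pr\{Y = k\}\) over the binomial probabilities reproduces \(6p\). The value of `p_c` is a hypothetical placeholder, not a figure from the game.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def expected_count(n, p):
    """Expected number of successes, computed directly from the pmf."""
    return sum(k * binomial_pmf(k, n, p) for k in range(n + 1))


p_c = 0.7  # hypothetical chance that a single object is Common
print(expected_count(6, p_c), 6 * p_c)
```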

I am not sure about the exact details, so perhaps you can elaborate more first, say by giving an example.


New Member
I guess I need to fill in some more details, then. Hopefully I won't leave anything important out this time.

The six objects in each pack are determined independently of one another, without replacement. (i.e., two different packs may have some overlap in what is opened, but a single pack will never contain multiples of the same object.) The rarity for each object (except the first) is determined as follows:
  • There is an \(X_v\) chance the object is Very Rare. If not -
  • There is an \(X_r\) chance the object is Rare. If not -
  • There is an \(X_u\) chance the object is Uncommon. If not -
  • There is a 100% chance it is Common
The first object in each pack is checked the same way but is always at least Uncommon.

In the particular example I am working with, the average number of objects within each rarity opened in each pack is as follows: (with a pack-opening sample size of 837)

  • 0.047 Very Rare
  • 0.262 Rare
  • 1.419 Uncommon
  • 4.272 Common
  • (6.000 Total)
From this, the odds of any one object being a given rarity during its check have been estimated to be: (I have not verified this - the person who collected the data made these calculations)

  • 1% Very Rare
  • 5% Rare
  • 25% Uncommon (100% if the first object)
  • 100% Common

And the number of unique objects within each rarity is as follows:
  • 60 Very Rare
  • 80 Rare
  • 84 Uncommon
  • 140 Common
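The rarity check described above can be turned into expected counts per pack. The following Python sketch assumes the checks are applied exactly as listed (Very Rare, then Rare, then Uncommon, else Common, with the first object always at least Uncommon) and uses the estimated 1% / 5% / 25% rates. The function names are illustrative.

```python
def per_slot_probabilities(p_v, p_r, p_u, first_slot=False):
    """Chance that one slot resolves to each rarity under the cascading check."""
    very_rare = p_v
    rare = (1 - p_v) * p_r
    remaining = (1 - p_v) * (1 - p_r)  # neither Very Rare nor Rare
    # The first slot is always at least Uncommon, so nothing falls through to Common.
    uncommon = remaining if first_slot else remaining * p_u
    common = remaining - uncommon
    return {"very_rare": very_rare, "rare": rare,
            "uncommon": uncommon, "common": common}


def expected_per_pack(p_v, p_r, p_u, pack_size=6):
    """Expected count of each rarity in a pack: one first slot plus the rest."""
    first = per_slot_probabilities(p_v, p_r, p_u, first_slot=True)
    other = per_slot_probabilities(p_v, p_r, p_u)
    return {k: first[k] + (pack_size - 1) * other[k] for k in first}


expected = expected_per_pack(0.01, 0.05, 0.25)
for rarity, count in expected.items():
    print(f"{rarity}: {count:.3f}")
```

Under this literal reading the expected counts come out to roughly 0.060 Very Rare, 0.297 Rare, 2.116 Uncommon, and 3.527 Common per pack, which differ from the observed averages. That suggests either the estimated rates or this reading of the procedure needs adjusting.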
I'm going to read up on Binomial and Multinomial distributions now to see if I can use those. If anyone comes up with something different in the meantime, please let me know.


TS Contributor
I am still not sure what exactly your difficulty is, although I understand each individual sentence.

Maybe a first clarifying question: do you mean there is a fixed, known number of objects in each rarity, and you want to calculate the probability of particular objects appearing in a pack? In that case you would use a (multivariate) hypergeometric distribution, as the population size is finite. The multinomial model can be thought of as the "with replacement" case, so it may not be exactly suitable, but the two are close when the population size is large.

Or, you have some given data and want to estimate the probability?
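To illustrate how close the two models are at this population size, here is a rough Python comparison of the hypergeometric ("without replacement") and binomial ("with replacement") probabilities for the number of Rares among 6 objects. It assumes, purely for illustration, that every one of the 364 unique objects is equally likely to be drawn, which is not how the game assigns rarity.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(k successes) when drawing without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def binom_pmf(k, draws, p):
    """P(k successes) when drawing with replacement."""
    return comb(draws, k) * p ** k * (1 - p) ** (draws - k)


population = 60 + 80 + 84 + 140  # unique-object counts quoted in the thread
rares = 80
draws = 6

# Largest pointwise gap between the two distributions over k = 0..6.
max_gap = max(
    abs(hypergeom_pmf(k, population, rares, draws)
        - binom_pmf(k, draws, rares / population))
    for k in range(draws + 1)
)
print(max_gap)
```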


New Member
Ah, I'm sorry I still haven't managed to make myself clear. Some of the information I just added may have been irrelevant now that I think about it.

The important thing is that I find out at what point in the collecting process it becomes more cost-efficient to buy individual objects instead of packs. Though the numbers for the costs are different in my actual example, let's say a pack costs $1 and an individual Rare costs $7. For the price of one Rare, I can buy 7 packs. Since, in the data I provided, there is an average of 0.262 Rares in a pack, we can expect to see 7 × 0.262 = 1.834 Rares in 7 packs. (i.e., buying packs is more cost-effective than buying individual Rares when my end goal is to acquire Rares.)

But what if I already have 50 of the 80 unique Rares in my collection? Duplicates are worthless to me, so I only care about opening Rares I don't already have. I want to know the odds of opening a new Rare in 7 packs such that my expected value is 1 new Rare per batch. (i.e., I get the same Rare value from buying 7 packs as I do buying 1 individual Rare). That way, I know that before this point it is more cost-effective to buy packs, and after that point it is more cost-effective to buy singles, with my end goal being to complete my collection of Rares.
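A minimal Python sketch of this break-even idea, using the example figures above ($1 packs, $7 Rares, 0.262 Rares per pack, 80 unique Rares). It assumes every unique Rare is equally likely when a Rare is opened, and it ignores the small chance that the same new Rare turns up more than once within the batch. The function names are illustrative.

```python
def expected_new_per_batch(avg_per_pack, packs, total_unique, owned):
    """Expected number of not-yet-owned objects of one rarity in a batch of packs.

    Each opened object of the rarity is new with probability
    (total_unique - owned) / total_unique under the equal-likelihood assumption.
    """
    return avg_per_pack * packs * (total_unique - owned) / total_unique


def break_even_owned(avg_per_pack, packs, total_unique):
    """Smallest owned count at which a batch yields fewer than 1 expected new object."""
    for owned in range(total_unique + 1):
        if expected_new_per_batch(avg_per_pack, packs, total_unique, owned) < 1:
            return owned


print(expected_new_per_batch(0.262, 7, 80, 50))
print(break_even_owned(0.262, 7, 80))
```

With these numbers, owning 50 of 80 Rares gives about 0.69 expected new Rares per 7 packs, and the expected count first drops below 1 at 37 owned Rares.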

I hope that clears things up.


Ambassador to the humans
I don't see why it would ever switch from being more cost effective to buy one way over the other based on what you currently have. Unless you can specify the actual object you're buying (so instead of just saying I'll pay $7 for a random rare you can say I'll pay $X for rare #32).


TS Contributor
Now the question makes more sense, but you still need to answer what Dason asks: if you purchase an object individually, can you specify the object so that it does not duplicate one you already own? Or can you only specify the rarity and then receive a random object in that category?

If it is the former case, please also consider the following question:
Is it possible to have duplicate objects inside a pack? i.e. to have the same object appearing twice or more inside a pack.

If no, the calculation is simpler and just follows a hypergeometric distribution argument to calculate the fair value of a pack, given the number of objects you currently own.

If yes, the calculation is slightly more complicated, as you need to consider the possibility of duplication within a pack, though that probability may be small enough to be negligible.


New Member
That's an excellent point. The packs are random, but the single purchases are specific—you get to choose which one you get every time. Also, you cannot get the same object more than once per pack.

Another detail which I left out in an attempt to simplify things is that we are actually interested in acquiring three of each object, and that extras can be sold for a portion of their individual cost. (This is a card game. You can have up to three of any given card in your deck. The game uses a currency called "silver" which is used to buy packs and "forge" specific single cards. Any card in excess of the third copy can be "smelted" for additional silver. I can provide these numbers if they are necessary for the analysis.)

I don't think this detail is necessary for the analysis, as we still want more of each card up until the third copy (so three cards equates to one generalized object), and you can't open more than one copy of a card in a given pack, but it does help to suggest that there is a tipping point where one acquisition method becomes outclassed by the other.

I'll read up on the hypergeometric distribution and try that one next.

EDIT: The hypergeometric distribution looks like it's exactly what I was looking for. Thanks so much!


TS Contributor

Let \( x_c, x_u, x_r, x_v \) be the number of objects in each rarity respectively for which you are already "full" in your inventory, i.e. another identical object can only be sold for a portion of its price.

Let \( n_c, n_u, n_r, n_v \) be the total number of objects in each rarity respectively.

Let \( Y_c, Y_u, Y_r, Y_v \) be the number of objects in a random 6-item pack, for each rarity respectively, for which you are not "full" yet.

Then the probability of observing \( Y_c = y_c, Y_u = y_u, Y_r = y_r, Y_v = y_v \) is

\( \Pr\{Y_c = y_c, Y_u = y_u, Y_r = y_r, Y_v = y_v\} \)

\( = \frac {\displaystyle \binom {x_c + x_u + x_r + x_v} {6 - y_c - y_u - y_r - y_v}
\binom {n_c - x_c} {y_c}\binom {n_u - x_u} {y_u}
\binom {n_r - x_r} {y_r}\binom {n_v - x_v} {y_v}}
{\displaystyle \binom {n_c + n_u + n_r + n_v} {6}} \)

The (random) value of such a pack is equal to
\( C_cY_c + C_uY_u + C_rY_r + C_vY_v + S \)

where \( S \) is the value you get from selling the items for which you are already "full". We can calculate the expected value as

\( 6 \times \frac {C_c(n_c - x_c) + C_u(n_u - x_u) + C_r(n_r - x_r) + C_v(n_v - x_v)} {n_c + n_u + n_r + n_v} + E[S]\)

The last term can also be calculated similarly if you provide more information.
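As a sanity check on the formulas above, the following Python sketch enumerates the pack probabilities and compares the resulting expected value with the closed form, leaving out the resale term \( S \). The totals per rarity are the ones quoted earlier in the thread. The owned counts `x` and the prices `costs` are hypothetical placeholders. Note that the formula as written treats every object as equally likely to be drawn, regardless of rarity.

```python
from itertools import product
from math import comb

PACK_SIZE = 6


def pack_pmf(y, n, x):
    """Probability of opening y[i] not-yet-full objects of rarity i in one pack."""
    needed = sum(y)
    if needed > PACK_SIZE:
        return 0.0
    ways = comb(sum(x), PACK_SIZE - needed)  # remaining slots hold "full" objects
    for y_i, n_i, x_i in zip(y, n, x):
        ways *= comb(n_i - x_i, y_i)  # choices among objects still wanted
    return ways / comb(sum(n), PACK_SIZE)


def expected_value_by_enumeration(costs, n, x):
    """Sum value * probability over every possible (y_c, y_u, y_r, y_v)."""
    total = 0.0
    for y in product(range(PACK_SIZE + 1), repeat=len(n)):
        total += pack_pmf(y, n, x) * sum(c * y_i for c, y_i in zip(costs, y))
    return total


def expected_value_closed_form(costs, n, x):
    """6 * sum(C_i * (n_i - x_i)) / sum(n_i), without the resale term S."""
    wanted_value = sum(c * (n_i - x_i) for c, n_i, x_i in zip(costs, n, x))
    return PACK_SIZE * wanted_value / sum(n)


n = (140, 84, 80, 60)          # Common, Uncommon, Rare, Very Rare totals
x = (100, 40, 50, 10)          # hypothetical counts already "full"
costs = (1.0, 2.0, 7.0, 20.0)  # hypothetical single prices

total_probability = sum(
    pack_pmf(y, n, x) for y in product(range(PACK_SIZE + 1), repeat=len(n))
)
ev_enum = expected_value_by_enumeration(costs, n, x)
ev_closed = expected_value_closed_form(costs, n, x)
print(total_probability, ev_enum, ev_closed)
```

The probabilities sum to 1 and the two expected values agree, which is what linearity of expectation predicts: each of the 6 slots contributes the average value of a wanted object.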