My goal is to create an app that calculates rarity scores of traits.
The rarity score is defined as:
`Rarity score = 1 / (chance of occurrence)`
Let's say I have a trait that has a 10% chance of occurring.
Its rarity score will be:
`1 / (10%) = 1 / 0.10 = 10`
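A minimal sketch of the unnormalized ("vanilla") score, assuming the chance is passed as a fraction (0.10 for 10%). The function name `vanilla_score` is hypothetical:

```python
def vanilla_score(chance: float) -> float:
    """Rarity score without normalization: 1 / chance of occurrence.

    `chance` is a fraction, e.g. 0.10 for a 10% chance.
    """
    if not 0 < chance <= 1:
        raise ValueError("chance must be in (0, 1]")
    return 1 / chance


print(vanilla_score(0.10))  # prints 10.0
```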
This score does not include trait normalization.
What I am trying to find out is how trait normalization (or rarity normalization) is done.
From my research, normalization takes into account the number of traits in a specific trait type.
Let's say we have two trait types:
- Trait_Type: Hair-Color
  - Value: Green, 1%, Score: 100
  - Value: Blue, 99%, Score: 1
- Trait_Type: Shirt-Color
  - 100 values, each with a 1% chance of occurrence.
When we use the rarity calculation above, every shirt-color value gets the same score of 100 as the green hair color.
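To make the tie concrete, here is a small sketch using the chances from the example, stored as fractions. The shirt-color names (`Shirt-0` to `Shirt-99`) are hypothetical placeholders, since the post only says there are 100 of them:

```python
# Chances (as fractions) for the two trait types in the example.
hair_color = {"Green": 0.01, "Blue": 0.99}
# 100 hypothetical shirt colors, each with a 1% chance.
shirt_color = {f"Shirt-{i}": 0.01 for i in range(100)}


def vanilla_scores(chances: dict[str, float]) -> dict[str, float]:
    """Map each trait value to its unnormalized score, 1 / chance."""
    return {value: 1 / chance for value, chance in chances.items()}


hair_scores = vanilla_scores(hair_color)
shirt_scores = vanilla_scores(shirt_color)

# Green hair and every single shirt color end up with the same score.
print(hair_scores["Green"], shirt_scores["Shirt-0"])
```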
This is not accurate: when a trait type has 100 (or many) values, each value naturally has a lower percentage, which gives each one a higher score.
In reality, no individual shirt color is especially valuable, because every shirt color has the same 1% chance of occurring.
On the other hand, the green hair color is genuinely valuable.
My goal is to capture this difference by taking the trait count of each trait_type into account, so that green hair scores much higher than any shirt color.
The information I have is:
- The chance of each trait occurring.
- Its rarity score.
- All the trait-count data (number of trait types, number of traits within each trait type, etc.).
The farthest I got is:
- `Vanilla_score = 1 / (%Chance of trait happening)`
- `Normalized_score = (Vanilla_score * Avg number of traits per trait_type) / traits in category`
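Applied to the Hair-Color / Shirt-Color example, that normalization can be sketched as follows. The function name is hypothetical; the average is taken as the total number of trait values divided by the number of trait types, i.e. (2 + 100) / 2 = 51:

```python
def normalized_score(chance: float, traits_in_category: int,
                     avg_traits_per_type: float) -> float:
    """(Vanilla_score * avg traits per trait_type) / traits in category."""
    vanilla = 1 / chance
    return vanilla * avg_traits_per_type / traits_in_category


# Number of distinct values in each trait type from the example.
trait_counts = {"Hair-Color": 2, "Shirt-Color": 100}
avg = sum(trait_counts.values()) / len(trait_counts)  # 51.0

green = normalized_score(0.01, trait_counts["Hair-Color"], avg)
shirt = normalized_score(0.01, trait_counts["Shirt-Color"], avg)
print(green, shirt)  # prints 2550.0 51.0
```

With this formula green hair does rank above any single shirt color; the problem is only that the absolute values do not match the target site.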
This does not produce an accurate enough score.
For example, take a trait_type called `Flair`:
- Value: `hijab`
- Avg Trait_count per category: `13.1875`
- Trait_category_count: `16`
- Trait_count_for_flair_category: `40`
- Chance of occurring: about 0.44%
Its vanilla score is `243.87`.
With my normalization method, the score is `80.4`.
On the site I want to replicate, the score is `35.87`.
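As a quick check, the numbers above can be reproduced from the listed data, and the factor the site seems to apply to the vanilla score can be compared with the factor from my formula. This only reruns the arithmetic; it does not recover the site's actual formula:

```python
vanilla = 243.87          # vanilla score for "hijab"
avg_traits = 13.1875      # Avg Trait_count per category
category_count = 16       # Trait_category_count
flair_traits = 40         # Trait_count_for_flair_category
target = 35.87            # score shown on the site

normalized = vanilla * avg_traits / flair_traits
print(round(normalized, 1))           # prints 80.4
print(avg_traits * category_count)    # prints 211.0 (total traits across categories)

# Multiplier applied to the vanilla score: my formula vs. the site.
print(round(avg_traits / flair_traits, 4))  # prints 0.3297
print(round(target / vanilla, 4))           # prints 0.1471
```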
What other calculations can be done to take the number of traits per trait_category into account?
(If any data is missing, let me know and I will add it.)
Reference links:
Trait Normalization (at the end of the website)
Explanation about current used formula