How to determine the cut-off point of your dataset?

#1
Hi,

I was wondering whether there is a way to determine a cut-off point for a dataset. Let's say I have the following dataset, which has two columns, text and emotion (negative, positive, or neutral):

| Text | Emotion |
| --- | --- |
| I like yellow. | Positive |
| I hate blue. | Negative |
| I love yellow. | Positive |
| I dislike orange. | Negative |
| I am ok with any colour. | Neutral |

(Let's presume that there are 1000 rows with similar text; I have just shown the first 5.) I want to calculate the mean frequency of each word in order to select the most important words. Let's say that the output (this is just an example) is:

| Word | Mean |
| --- | --- |
| I | 0.76 |
| like | 0.56 |
| hate | 0.45 |
| love | 0.03 |
| dislike | 0.34 |
| am | 0.33 |
| ok | 0.22 |
| with | 0.10 |
| any | 0.02 |
| colour | 0.05 |
| yellow | 0.20 |
| blue | 0.18 |
| orange | 0.76 |
| pink | 0.05 |

How can I determine the cut-off point for my dataset in order to select the most important words (i.e. the words that appear most frequently)?
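
To make the setup concrete, here is a minimal Python sketch of how such per-word means might be computed and how a cut-off could then be applied. It rests on several assumptions that are not part of the question: "mean frequency" is taken to be the average number of occurrences of a word per row, tokenization is a simple lowercase letters-only split, and the function names (`mean_frequency`, `select_by_threshold`, `select_top_k`) as well as the threshold `0.4` and `k = 3` are purely illustrative. It shows two common ways of expressing a cut-off (a fixed threshold and a top-k ranking), not a recommendation of where the cut-off should be.

```python
import re
from collections import Counter

# Toy rows mirroring the example table; the real dataset would have ~1000 rows.
rows = [
    ("I like yellow.", "Positive"),
    ("I hate blue.", "Negative"),
    ("I love yellow.", "Positive"),
    ("I dislike orange.", "Negative"),
    ("I am ok with any colour.", "Neutral"),
]


def tokenize(text):
    """Lowercase the text and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def mean_frequency(texts):
    """Average number of occurrences of each word per row (assumed definition)."""
    totals = Counter()
    for text in texts:
        totals.update(tokenize(text))
    n = len(texts)
    return {word: count / n for word, count in totals.items()}


def select_by_threshold(means, threshold):
    """Keep words whose mean is at least `threshold`, most frequent first."""
    kept = [w for w, m in means.items() if m >= threshold]
    return sorted(kept, key=lambda w: (-means[w], w))


def select_top_k(means, k):
    """Keep the k words with the highest mean; ties are broken alphabetically."""
    ranked = sorted(means, key=lambda w: (-means[w], w))
    return ranked[:k]


means = mean_frequency([text for text, _ in rows])
print(select_by_threshold(means, 0.4))  # illustrative threshold
print(select_top_k(means, 3))           # illustrative k
```

With a fixed threshold the number of selected words depends on the data, whereas top-k fixes the number of words and lets the implied threshold vary. Which of the two fits better depends on what the selected words are used for afterwards.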