I have two datasets, each with values between 0 and 100%. I was thinking of using Bayes' theorem to explain the data distribution. More info on Bayes' theorem:
http://en.wikipedia.org/wiki/Bayes%27_theorem
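For reference, the standard statement of the theorem, for two generic events A and B with P(B) not equal to zero (A and B are placeholder symbols, not quantities taken from my data), is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

It relates the conditional probability of A given B to the conditional probability of B given A.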
However, neither of my datasets is normally distributed.
1st dataset: 0-100% with a midpoint of 16.66%. (Highest value obtained is 85%, lowest value is 0%.)
2nd dataset: 0-100% with a midpoint of 16.66%. (Highest value is 35%, lowest value is 0%.)
The two scales/datasets are independent of each other. However, I'm not sure which kind of probability theory to use. I was thinking Bayesian, but I'm not sure whether Bayesian theory works with an uneven data distribution. Can anyone tell me which theory I should be looking at? Can Bayesian methods work for an uneven data distribution?
Thank you
Quote: "I was thinking of using the Bayes theorem of explaining the data distribution"
What do you mean by "explaining the data distribution"? What is your actual objective?
Do you mean the data is skewed ("uneven")?
Sorry, I don't understand "fit a Bayes theorem for a data distribution". Can you provide a concrete example of what you want to do? Either the wording is too vague or there is something wrong.
OK, an example. I collected test scores from different individuals.
Dataset 1:
#1 - 41%
#2 - 6% (failed)
#3 - 86%
#4 - 46%
#5 - 30%
#6 - 48%
#7 - 68%
#8 - 10% (failed)
For Dataset 2, I collected scores from 6 people.
#1 - 28%
#2 - 34%
#3 - 34%
#4 - 23%
#5 - 32%
#6 - 14% (failed)
Now, I want to sort the findings in datasets 1 and 2 into categories (distinction, above average, average, pass and fail). Since the two tests are different, I cannot use the same scale for both datasets.
For dataset 1 & 2:
Dataset1 - classification - Dataset2
66.67% and above - distinction - 31.67% and above
50 - 66.66% - above average - 26.67 - 31.66%
33.33 - 49.99% - average - 21.67 - 26.66%
16.67 - 33.32% - pass - 16.67 - 21.66%
<16.66% - fail - <16.66%
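As a minimal sketch, the cut-offs in the table can be applied to the two sets of scores like this. The names `CUTOFFS` and `classify` are made up for illustration, and treating each band's lower limit as an inclusive threshold is an assumption, since the table leaves small gaps between bands (for example between 33.32% and 33.33%). This is plain threshold binning, not a Bayesian method.

```python
# Lower cut-offs (in %) read from the table above, highest band first.
# Assumption: each lower limit is inclusive; anything below the "pass"
# limit counts as "fail".
CUTOFFS = {
    "dataset1": [(66.67, "distinction"), (50.0, "above average"),
                 (33.33, "average"), (16.67, "pass")],
    "dataset2": [(31.67, "distinction"), (26.67, "above average"),
                 (21.67, "average"), (16.67, "pass")],
}


def classify(score, dataset):
    """Return the category of a percentage score for the given dataset."""
    for lower, label in CUTOFFS[dataset]:
        if score >= lower:
            return label
    return "fail"


# Scores as listed in the post.
dataset1 = [41, 6, 86, 46, 30, 48, 68, 10]
dataset2 = [28, 34, 34, 23, 32, 14]

labels1 = [classify(s, "dataset1") for s in dataset1]
labels2 = [classify(s, "dataset2") for s in dataset2]

for s, label in zip(dataset1, labels1):
    print(f"Dataset 1: {s}% -> {label}")
for s, label in zip(dataset2, labels2):
    print(f"Dataset 2: {s}% -> {label}")
```

With these thresholds, scores #2 and #8 in Dataset 1 and score #6 in Dataset 2 come out as "fail", which matches the "(failed)" marks in the lists.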
That was the classification I was trying to fit a Bayesian classification to. However, Bayes' theorem has probabilities on both sides, so the classification I just gave does not work because I have not taken the negative scale into account. If I followed Bayes' theorem perfectly, I should end up with a scale like:
-distinction
-above average
-average
-pass
-borderline/midpoint
-just failed
-moderately failed
-failed spectacularly
-pathetic
Now, if the data were normally distributed over 0 - 100%, I could say that the midpoint is 50% and classify the groups into specific percentage (%) ranges, so that every 20% increment gives rise to one category.
As my data is not normally distributed (as shown by Dataset 1 and 2), I was wondering how to fit it to the second type of classification:
-distinction
-above average
-average
-pass
-borderline/midpoint
-just failed
-moderately failed
-failed spectacularly
-pathetic
Can I use Bayes' theorem for that kind of data? I have worked out the positive classification scales as shown here:
For dataset 1 & 2:
Dataset1 - classification - Dataset2
66.67% and above - distinction - 31.67% and above
50 - 66.66% - above average - 26.67 - 31.66%
33.33 - 49.99% - average - 21.67 - 26.66%
16.67 - 33.32% - pass - 16.67 - 21.66%
<16.66% - fail - <16.66%
But how do I find intervals for the negative portion? Can Bayesian methods work with skewed data such as this?
Is it clear now?