Bonferroni test

javedbtk

New Member
In my example, I have about 50 statistical analyses, so is it feasible to use the Bonferroni correction in this case? 0.05/50 will be a very small value, and it will be impossible for one type of algorithm to significantly outperform the other. Thanks
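For illustration, the arithmetic behind this concern can be sketched as follows (the p-values here are hypothetical, not from the study in question):

```python
# Bonferroni correction: with m tests, each test is run at alpha/m
# so that the family-wise error rate stays at alpha.
m = 50
alpha = 0.05
threshold = alpha / m  # per-test threshold of 0.001

# Hypothetical p-values from a few of the pairwise comparisons
p_values = [0.0001, 0.004, 0.03, 0.8]
significant = [p for p in p_values if p < threshold]

print(threshold)    # 0.001
print(significant)  # only the smallest p-value survives the correction
```

Note that 0.004 would count as "significant" at the uncorrected 0.05 level but not after the correction, which is exactly the situation being asked about.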

Karabiner

TS Contributor
In my example, I have about 50 statistical analyses, so is it feasible to use the Bonferroni correction in this case?
Well, it depends.
0.05/50 will be a very small value
It depends. In some genome studies, for example, this would be a big value.
and it will be impossible for one type of algorithm to significantly outperform the other. Thanks
Would this be a problem for you?

Maybe some information about your study (topic, research questions, study design, sample size, practical and/or theoretical relevance) would be useful.

With kind regards

Karabiner

javedbtk

New Member
Well, it depends.

It depends. In some genome studies, for example, this would be a big value.

Would this be a problem for you?

Maybe some information about your study (topic, research questions, study design, sample size, practical and/or theoretical relevance) would be useful.

With kind regards

Karabiner

Yes, it would be a problem. For example, if algorithm A performs significantly better than B with a p-value of 0.0001, it means algorithm A is quite better than B; but after the Bonferroni correction, we would end up with no algorithm performing better than the other.

GretaGarbo

Human
In my example, I have about 50 statistical analyses,
Here it seems to be about evaluating different treatments.

For example, algorithm A significantly perform better than B
But here it seems to be about evaluating different algorithms.

Algorithms and treatments are different things.

So, what is it? And what criteria do you want to use?

Karabiner

TS Contributor
Yes, it would be a problem. For example, if algorithm A performs significantly better than B with a p-value of 0.0001, it means algorithm A is quite better than B,
No, it just tells you that you can reject the Null hypothesis "the difference between A and B is = 0.00000000000000000000000000".
Small p values do not indicate a large effect. Usually, they are due to large sample sizes.
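To illustrate this point with a simulated sketch (the numbers below are made up, not from the thread): with a million observations per group, a true difference of only 0.02 standard deviations yields an extremely small p-value even though the effect is negligible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 1_000_000
a = rng.normal(0.00, 1.0, n)   # group A
b = rng.normal(0.02, 1.0, n)   # group B: true difference of only 0.02 SD
t_stat, p = stats.ttest_ind(a, b)
effect = b.mean() - a.mean()   # roughly 0.02 -- a trivial effect

print(p)       # vanishingly small p-value, driven by the huge n
print(effect)  # yet the difference itself is tiny
```

The p-value answers "is the difference exactly zero?", not "is the difference large enough to matter?".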

Maybe some information about your study (topic, research questions, study design, sample size, practical and/or
theoretical relevance) would be useful.

With kind regards

Karabiner


ondansetron

TS Contributor
No, it just tells you that you can reject the Null hypothesis "the difference between A and B is = 0.00000000000000000000000000".
Small p values do not indicate a large effect. Usually, they are due to large sample sizes.

Maybe some information about your studiy (topic, research questions, study design, sample size, practical and/or
theoretical relevance) would be useful.

With kind regards

Karabiner
Just reposting this because anyone who reads this in the future should see the emphasis that p-values tell you nothing about "A is quite better than B."

javedbtk

New Member
Just reposting this because anyone who reads this in the future should see the emphasis that p-values tell you nothing about "A is quite better than B."
Indeed, but at least it shows that A and B have a significant difference

ondansetron

TS Contributor
Indeed, but at least it shows that A and B have a significant difference
This is a pretty useless thing to explain, in general. P-values have limited information to convey and it's a misconception that "significance" is some targeted endpoint with tons of value.

It also sounds, from your OP, like your goal is to have something come out significant, since your concern is that [one won't be able to outperform the other] if you use a smaller alpha level per test. This should not be your goal.

javedbtk

New Member
This is a pretty useless thing to explain, in general. P-values have limited information to convey and it's a misconception that "significance" is some targeted endpoint with tons of value.

It also sounds, from your OP, like your goal is to have something come out significant, since your concern is that [one won't be able to outperform the other] if you use a smaller alpha level per test. This should not be your goal.
If not p-values, then what is the alternative? How can we perform an analysis for significant differences?

Karabiner

TS Contributor
Due to the nearly complete lack of information about the study, we don't know the research design, not even the measurement level of the dependent variable, or why and for what purpose the study is undertaken. It is difficult to suggest solutions when the problem is described so poorly.

Maybe you can perform all comparisons in one analysis (perhaps repeated measures ANOVA or mixed ANOVA or multilevel modeling, if the dependent variable is interval scaled), and attach 95% confidence intervals to the estimated parameters. Such confidence intervals will give you an impression about how reliable the estimations are.
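As a sketch of the confidence-interval idea (the paired differences below are made up for illustration; a t-based interval is used, assuming the differences are roughly interval-scaled):

```python
import math
import statistics
from scipy import stats

# Hypothetical paired differences between two conditions (A minus B)
diffs = [0.8, 1.2, 0.5, 1.0, 0.9, 1.1, 0.7, 1.3]
n = len(diffs)
mean = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(n)     # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)           # two-sided 95% critical value
ci = (mean - t_crit * se, mean + t_crit * se)

print(ci)  # the width of the interval shows how precise the estimate is
```

An interval like this conveys both the size of the estimated difference and its precision, which a bare p-value does not.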

What you then consider a "significant" difference (in the sense of important/relevant/remarkable, I suppose?) will be up to your own judgement. No statistical procedure can take this task off your hands.

With kind regards

Karabiner


ondansetron

TS Contributor
If not p-values, then what is the alternative? How can we perform an analysis for significant differences?
What do you believe "significance" means?

GretaGarbo

Human
Are you evaluating two treatments or are you evaluating two algorithms?

javedbtk

New Member
Are you evaluating two treatments or are you evaluating two algorithms?
Algorithms. I am working on software development effort estimation, where different algorithms like linear regression and support vector regression are applied to perform predictions. The data generated is not normally distributed, so I performed the Wilcoxon test to get p-values.
3 algorithms are compared with each other to find their predictive accuracy. These comparisons are repeated 4 times for 4 different datasets. So for each dataset the comparisons are 3, but overall the comparisons are 12, i.e. 3 algorithms * 4 datasets.
Now, is it possible to divide 0.05 by 3 (the comparisons for each dataset) rather than by 12 (the comparisons for all datasets and algorithms)?
Thanks for understanding
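The numerical difference between the two corrections being asked about can be sketched as follows (the prediction errors are hypothetical; which family of comparisons to correct over is a substantive question, not a computational one):

```python
from scipy import stats

alpha = 0.05
per_dataset = alpha / 3    # Bonferroni within one dataset (3 comparisons)
across_all = alpha / 12    # Bonferroni across all 4 datasets (12 comparisons)

# Hypothetical absolute prediction errors of two algorithms
# on the same projects of one dataset (paired observations)
errors_a = [120, 95, 200, 150, 80, 130, 175, 110]
errors_b = [140, 100, 230, 160, 95, 150, 190, 125]

# Wilcoxon signed-rank test: paired, non-parametric
stat, p = stats.wilcoxon(errors_a, errors_b)

print(per_dataset, across_all)
print(p, p < per_dataset, p < across_all)
```

With these made-up numbers the result would pass the per-dataset threshold (0.05/3 ≈ 0.0167) but not the global one (0.05/12 ≈ 0.0042), which shows why the choice of family matters.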

GretaGarbo

Human
Suppose you have two treatments, one algorithm that computes the median, and another algorithm that computes the mean. Would you say that, if the mean algorithm computes a smaller p-value (in comparing the two treatments), you have then "shown" that the mean algorithm is "better"?

I hope you agree that this is absurd.

Usually one creates a model, and the model should fit the data.

Then you choose an estimator that is appropriate, e.g. least squares or maximum likelihood.

Then you choose an algorithm that can compute the estimator.

Of course you can call all three steps an "algorithm" but the data must still fit the model and the estimator must be relevant.

- - -

Besides, four different data sets are not much. And you can't really define a population from which the data sets are drawn. So what are you doing inference about?