Default Bayesian alternatives to some simple models

CB

Super Moderator
This thread is more to kick off a discussion than ask questions exactly... maybe even a bit of debate?

Ok, so we've all heard time and again that null hypothesis significance testing is good for nothing but ravishing maidens. It tells us P(Data|Hypothesis) when we want to know P(Hypothesis|Data); it can't provide evidence for a null hypothesis; p values depend on the researcher's sampling plan, not just the actual data (see the Wagenmakers article below); etc., etc.

Several of the main complaints about NHST have been floating around for 60+ years now, and more and more methodological articles seem to be calling for a switch to Bayesian data analysis - especially in psychology, but also in other fields. Bayesian data analysis allows us to directly calculate the probability of particular hypotheses or models being true, and circumvents a bunch of the problems with NHST.

The thing is, so far that hasn't really changed how everyday researchers do their bidness. NHST still rules the roost. Part of that problem is probably just cultural inertia, but part of it is probably also because Bayesian analysis is perceived as harder, requiring more subjective decisions (especially about priors), and more programming skills than typical NHST tests.

But lately there have been a handful of articles coming out that are trying to suggest Bayesian alternatives to common statistical analyses, with the t-test sort of being used as a bit of a test case. The idea seems to be to develop alternatives that both:
1) Are easy to use
2) Have default, reasonably non-informative priors that can be used in a wide range of situations.

1) Model comparisons using the Bayesian Information Criterion (BIC).
A practical solution to the pervasive problems of p values (Wagenmakers, 2007)
Here the idea is that if you can get BICs for different models, or calculate them (and Wagenmakers shows how to do so for something like an ANOVA), you can then compare models in a Bayesian fashion. Specifically, the difference between the BICs of two models can be transformed into an approximate Bayes Factor, BF01 ≈ exp((BIC1 - BIC0)/2), indicating the weight of evidence in favour of the model with the lower BIC. Wagenmakers talks about this approximation being consistent with a unit information prior - I don't think I understand the prior very well here though. The nice thing about this method is that it can be applied very widely - you could do anything from a t-test (i.e. by comparing an ANOVA model with a dichotomous predictor to one with intercept only) to a comparison of SEM models. But it doesn't use Bayesian estimation at all.
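A minimal sketch of the BIC route for a two-group comparison, assuming normal errors fitted by least squares. The function names are hypothetical. The BIC here drops constants that cancel in the difference (the shared error variance counts the same in both models), BF01 ≈ exp((BIC1 - BIC0)/2) follows the approximation described above, and P(H0|D) = BF01/(1 + BF01) assumes equal prior odds.

```python
import numpy as np

def bic_linear(y, X):
    """BIC (up to an additive constant) for an OLS fit with normal errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

def bic_bayes_factor(a, b):
    """Approximate BF01 (intercept-only vs. two-group model) and P(H0|data)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    y = np.concatenate([a, b])
    n = len(y)
    X0 = np.ones((n, 1))                         # H0: one common mean
    group = np.concatenate([np.zeros(len(a)), np.ones(len(b))])
    X1 = np.column_stack([np.ones(n), group])    # H1: separate group means
    bf01 = np.exp((bic_linear(y, X1) - bic_linear(y, X0)) / 2)
    return bf01, bf01 / (1 + bf01)               # equal prior odds assumed

same = bic_bayes_factor([5.1, 4.9, 5.3, 5.0, 4.8, 5.2], [5.0, 5.2, 4.9, 5.1, 5.3, 4.8])
shifted = bic_bayes_factor([5.1, 4.9, 5.3, 5.0, 4.8, 5.2], [6.0, 6.2, 5.9, 6.1, 6.3, 5.8])
print(same, shifted)
```

With identical group means the parameter penalty alone gives BF01 = sqrt(12) ≈ 3.46, i.e. modest evidence *for* the null, which is exactly the kind of statement a p value cannot make.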

2) Model comparisons using Bayes Factors, and a "JZS" prior
Bayesian t tests for accepting and rejecting the null hypothesis (Rouder et al, 2009)
A default bayesian hypothesis test for ANOVA designs (Wetzels et al, 2012)
A default Bayesian hypothesis test for correlations and partial correlations (Wetzels & Wagenmakers, 2012)
Analyses for t-tests, regression and binomial observations implemented online here
In this setup the prior for coefficients under the alternative hypothesis is made with reference to effect size, using a Cauchy(0, 1) distribution. As in the above approach we can produce evidence for and against a null hypothesis that a variable has no effect, using Bayes Factors. This approach isn't quite as general as the first. Again, Bayesian reasoning is used for inference, but not model estimation.
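A minimal sketch of the one-sample JZS Bayes factor, coded from the integral given in Rouder et al. (2009) with prior scale 1. The numerical integration with scipy's `quad` is an assumption of this sketch, not necessarily what the online calculator does. For two independent groups, the paper substitutes the effective sample size n1*n2/(n1 + n2) for n and uses n1 + n2 - 2 degrees of freedom; the hypothetical function below handles only the one-sample case.

```python
import math
from scipy import integrate

def jzs_bf01(t, n):
    """JZS Bayes factor BF01 for a one-sample t statistic from n observations."""
    nu = n - 1
    # Under H0 (effect size exactly zero), up to a constant shared with H1
    null_lik = (1 + t**2 / nu) ** (-(nu + 1) / 2)

    def integrand(g):
        # The Cauchy(0, 1) prior on effect size is a normal prior with variance g,
        # mixed over g ~ inverse-gamma(1/2, 1/2); work on the log scale for stability
        if g <= 0:
            return 0.0
        log_val = (-0.5 * math.log(1 + n * g)
                   - (nu + 1) / 2 * math.log(1 + t**2 / ((1 + n * g) * nu))
                   - 0.5 * math.log(2 * math.pi) - 1.5 * math.log(g) - 1 / (2 * g))
        return math.exp(log_val)

    alt_lik, _ = integrate.quad(integrand, 0, math.inf)
    return null_lik / alt_lik

for t in (0.0, 2.0, 4.0):
    print(t, jzs_bf01(t, n=30))
```

Values above 1 favour the null and values below 1 favour the alternative, so a small t statistic can yield positive evidence for H0.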

3) Full blown Bayesian estimation
Bayesian estimation supersedes the t test (Kruschke, 2012)
Implemented beautifully online here
Kruschke suggests that mucking around with frequentist analysis and then calculating Bayes Factors is not the way to go. Proper Bayesian estimation gives us much more detailed information, especially in the form of a posterior distribution for each parameter, which shows us the relative plausibility of lots of different values of the parameter (not just a null and alternate hypothesis). For a t-test situation, Kruschke suggests a very wide normal distribution as an appropriate non-informative prior for the group means (technical details in the article). We combine this prior with the sample data using MCMC, and get a posterior distribution. This approach is probably the most informative and most general, but will also seem the most unfamiliar to a researcher used to OLS and NHST and SPSS.
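A minimal sketch of the estimation idea, under simplifying assumptions that differ from BEST: a normal likelihood rather than Kruschke's t likelihood, and a hand-written random-walk Metropolis sampler rather than JAGS. The priors are loosely modelled on the "very wide" defaults: normal priors on the means with an SD 1000 times the pooled SD, and uniform priors on each group's SD. All function names are hypothetical.

```python
import numpy as np

def norm_logpdf(x, mu, sd):
    """Log density of Normal(mu, sd) evaluated at x (works elementwise on arrays)."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2

def log_posterior(theta, y1, y2, m, s):
    """Unnormalised log posterior for two groups with separate means and SDs."""
    mu1, mu2, sd1, sd2 = theta
    # Uniform priors on each SD between s/1000 and 1000*s (zero density outside)
    if min(sd1, sd2) <= s / 1000 or max(sd1, sd2) >= 1000 * s:
        return -np.inf
    # Very wide normal priors on the means, centred on the pooled mean
    lp = norm_logpdf(mu1, m, 1000 * s) + norm_logpdf(mu2, m, 1000 * s)
    # Normal likelihood (BEST itself uses a t likelihood)
    lp += norm_logpdf(y1, mu1, sd1).sum() + norm_logpdf(y2, mu2, sd2).sum()
    return lp

def metropolis_two_groups(y1, y2, n_iter=20000, burn_in=5000, seed=0):
    """Random-walk Metropolis draws of (mu1, mu2, sd1, sd2) after burn-in."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    pooled = np.concatenate([y1, y2])
    m, s = pooled.mean(), pooled.std(ddof=1)
    rng = np.random.default_rng(seed)
    theta = np.array([y1.mean(), y2.mean(), y1.std(ddof=1), y2.std(ddof=1)])
    step = 0.5 * s / np.sqrt(min(len(y1), len(y2)))   # crude, untuned proposal scale
    current = log_posterior(theta, y1, y2, m, s)
    draws = np.empty((n_iter, 4))
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal(4)
        candidate = log_posterior(proposal, y1, y2, m, s)
        # Accept with probability min(1, posterior ratio); the proposal is symmetric
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        draws[i] = theta
    return draws[burn_in:]

rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 1.0, 40)   # simulated data, for illustration only
group_b = rng.normal(1.5, 1.0, 40)
draws = metropolis_two_groups(group_a, group_b)
diff = draws[:, 1] - draws[:, 0]     # posterior draws of mu2 - mu1
print(diff.mean(), np.mean(diff > 0))
```

Separate SDs per group are kept deliberately, so the sketch does not assume equal variances.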

One thing that jumps out at me about approach (3) is that it abandons the traditional obsession with a "point" null hypothesis. The method, at least in the default implementation, can't provide evidence for an exact point null hypothesis. I don't mind this - most of the time, a null hypothesis that an effect is exactly zero seems pretty implausible, unless we're talking about an experiment testing whether people have precognition or something like that. But the method can provide evidence that an effect is negligible (using a "region of practical equivalence"), and if I understand it correctly can tell us the posterior probability that an effect in a particular direction exists. That seems more useful - in the social sciences, researchers very often have an hypothesis that an effect exists in a particular direction.
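The posterior summaries mentioned above can be sketched as follows: the posterior probability of a positive effect, a 95% highest density interval (HDI), and Kruschke's HDI-versus-ROPE decision rule. The ROPE limits of ±0.1 and the normal "posterior" draws are placeholders. In practice the draws would come from an MCMC run and the ROPE from substantive considerations.

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws."""
    x = np.sort(np.asarray(samples, float))
    n = len(x)
    k = int(np.ceil(mass * n))           # number of draws inside the interval
    widths = x[k - 1:] - x[:n - k + 1]   # width of every candidate interval
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

def rope_decision(samples, rope=(-0.1, 0.1), mass=0.95):
    """Direction probability, HDI, and HDI-vs-ROPE verdict for one parameter."""
    lo, hi = hdi(samples, mass)
    if rope[0] <= lo and hi <= rope[1]:
        verdict = "HDI inside ROPE: effect practically equivalent to zero"
    elif hi < rope[0] or lo > rope[1]:
        verdict = "HDI outside ROPE: effect credibly non-negligible"
    else:
        verdict = "HDI overlaps ROPE: undecided"
    return {"p_positive": float(np.mean(np.asarray(samples) > 0)),
            "hdi": (lo, hi), "verdict": verdict}

rng = np.random.default_rng(0)
print(rope_decision(rng.normal(0.5, 0.1, 20000)))   # stand-in posterior draws
print(rope_decision(rng.normal(0.0, 0.02, 20000)))
```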

Anyway, that's enough babbling on from me, especially since I don't understand this stuff all that well anyway. But what do you guys think of these approaches, and of Bayesian data analysis in general? And will there come a day when our usual answer to a standard "what test should I use?" question will be to tell the poster about a default Bayesian analysis?

rasmusab

New Member
I think 3) is the way to go! But I'm the one that made the online version of "Bayesian estimation supersedes the t test", so I might be biased. In general I think Kruschke has a point when he emphasizes parameter estimation rather than model comparison.

One of the strengths of classical statistics is that the different tests and models are so "standardized". If I write that I've used a t-test, an ANOVA, Pearson's correlation or OLS regression, everybody knows (or should know, at least) what model I've been using; there's no need to explain it further. When using Bayesian statistics I would have to fully describe my model and justify the use of every distribution. This is of course a good thing, in one way, but it makes writing up a Bayesian analysis much more verbose. The cool thing with "Bayesian estimation supersedes the t test" (BEST) is that it presents a "standard" Bayesian model that I wouldn't have to explain every time I used it in an analysis; I could just cite Kruschke's paper and be done with it! If there were more of these kinds of Bayesian models for common cases of analysis with reasonable priors, I think it would both help the adoption and ease the presentation of Bayesian statistics. Why not "Bayesian Anova Rules the Traditional anova" (BART) or "Bayesian Estimation Truly Trumps OLS Regression" (BETTOR)?

BTW, does anybody know of more papers like BEST? That is, attempts to introduce "standard" Bayesian models for different types of analyses?

However, I think the single most important change that could increase the use of Bayesian statistics (at least in psychology) would be if it was easy to do in SPSS. :/

TheEcologist

Global Moderator
I see myself as a bit of a person on the sideline. I am neither a fanboy of null-hypothesis testing nor a born-again Bayesian.
People are raving about Bayesian analyses these days; I am not, even though I absolutely love them. I also don't think that Bayesian analyses are increasingly popular because people are finally seeing the light, or the "errors" of the old ways. The real reasons are practical and necessity driven. So here is my opinion.

Null-hypothesis testing, frequentists and Bayesians.

Let's start with a historical perspective on Frequentists vs Bayesians. It's an important one, so take note: please do not confuse frequentists with null-hypothesis testing. The notion of testing an alternative hypothesis against a null has nothing to do with frequentism, even though it has been used in classical stats for the past 100 years - in which, coincidentally, frequentist approaches dominated. So there is definitely a correlation, but sorry, no causality here, people. T.C. Chamberlin (in a Science paper in 1890, I believe) advocated a method of multiple working hypotheses as a superior way of conducting scientific inference. It was a superior idea, in my opinion: think of all plausible hypotheses that explain a pattern, then devise mathematical models for each hypothesis, continue testing these hypotheses, and establish the relative evidence for each. This can be done in a Frequentist context just as simply as null-hypothesis testing can be conducted under a snug Bayesian blanket. Nevertheless, null-hypothesis testing won the day back then. Why? Because it fitted the simpler nature of experiments (fertilizer or not, etc.). William Sealy Gosset certainly didn't care about multiple hypotheses; his invention of the t-test fitted his needs: an easy way to test the quality of stout. It fitted the assumptions as well. It also turned out to be something that could easily be applied to many fields. Its null-hypothesis approach was an easy concept to grasp and fitted the nature of studies at the time (answering the burning questions of the time).

However, I think the single most important change that could increase the use of Bayesian statistics (at least in psychology) would be if it was easy to do in SPSS. :/
The t-test and null hypothesis testing were the easy way to do it then, in many fields that were (and still are) a very long way from becoming quantitative. Multiple hypothesis testing was difficult; it required more rigorous knowledge of the system. You needed to translate hypotheses into mathematical models. You needed a way to rank them, which didn't really exist, so null-hypothesis testing won. Null-hypothesis tests could be "standardized". They were the easy, practical solution to necessity. People thought of ways to make them even easier, to get these advancements in statistical science better adopted in non-mathematical fields. They made normal tables and statistical cookbooks... and SPSS. The need to truly understand the tests, and their philosophy, was removed. So, sadly, null-hypothesis testing remained the dominant method even after it no longer fitted study designs (multiple comparison tests are my gold-standard example of this) - even after multiple hypothesis testing was put on a solid theoretical footing by the development of information theory (AICs and all that, people!).

Now it is being shunned: shun the null-hypothesis believer! I'm sorry, but it still has its place and use. I would not advise using Bayesian alternatives to the t-test just because they are Bayesian. I like, nay love, Bayesian approaches because I can tailor them to my situation, my experiment, my reality. Now I see them being "standardized" (not necessarily reflecting you guys or the authors of the papers above), and in general I do not agree with this. I advocate true understanding, even if it is perceived to be hard. It's the attitude that needs to change, not the label on the black box. Simply doing everything in a standardized "Bayesian" way (in B-SPSS?) is not and never will be a panacea for the advancement of science.

The real driver behind the Bayesian revolution? Another historical view.

Let me repeat myself: the real reasons are practical and necessity driven. Even though we are all hearing and reading that the hip new statistical approach is "Bayesian statistical inference" - a radical, revolutionary concept in which prior knowledge is combined with the information provided by the data to reach a consensus - the history of stats will show us that it was the dominant approach for over 150 years, from its introduction in the late 18th century. It did, however, begin to fall out of favour, mainly due to two distinct practical issues. First, researchers at the time became increasingly concerned about the subjectivity of the approach, which stemmed from the inclusion of the prior in the analysis. These were real concerns at the time; subjectivity was seen as a hindrance to progress (and this objection did not disappear magically with the computer revolution!). Second (and in my opinion most important), the computational complexity of the Bayesian approach quickly became too great as more and more complex problems were studied (I mean, have you tried calculating posteriors with pencil and paper? Where did you get stuck?). Very intelligent scientists of the time became staunch opponents of the Bayesian approach (see e.g. Kotz and Johnson 1992, Breakthroughs in Statistics). They were the Stephen Hawkings of their time, their methods began to dominate the field and, looking for alternatives, they (and we all know who they are!) developed the classical approach to statistical inference. Therefore, much of the 20th century was bleak for the Bayesians, and with good reason: you didn't get far with Bayesian analyses, and if you used them your methods were intractable. Then came the computer revolution, with the great invention of the MCMC algorithm, allowing the necessary numerical integration by sampling from the posterior distribution. Suddenly, this made the Bayesian approach tractable once more.
Much more importantly, though, suddenly much more complex models could be fit (and they were needed because of the simultaneous explosion of HUGE datasets automatically collected from space or deep-sea sensors). Suddenly Bayesian models were revived by the true gods of statistical revolution, the ugly twin sisters of practicality and necessity. Therefore I believe that MCMC wiped the second objection to Bayesian statistics off the table, but not the first: Bayesian analyses have their own pitfalls, as do all approaches.

So, here is my question to you. What is your black box, and why don't you rip off the (Bayesian or Frequentist) labels and take a look inside?
You may find it is actually easy to grasp.

Tonight I will raise my glass to the inventors of classical statistics, but now let me get back to my MCMCs with completely uninformative priors...

TE


CB

Super Moderator
I think 3) is the way to go! But I'm the one that made the online version of "Bayesian estimation supersedes the t test", so I might be biased.
Wow! I think the online version is brilliant. It quickly and easily shows the user the real advantages of the Bayesian estimation approach. Great job, and it's awesome to see you on TS!
The cool thing with "Bayesian estimation supersedes the t test" (BEST) is that it presents a "standard" Bayesian model that I wouldn't have to explain every time I used it in an analysis; I could just cite Kruschke's paper and be done with it! If there were more of these kinds of Bayesian models for common cases of analysis with reasonable priors, I think it would both help the adoption and ease the presentation of Bayesian statistics. Why not "Bayesian Anova Rules the Traditional anova" (BART) or "Bayesian Estimation Truly Trumps OLS Regression" (BETTOR)?
I like, nay love, Bayesian approaches because I can tailor them to my situation, my experiment, my reality. Now I see them being "standardized" (not necessarily reflecting you guys or the authors of the papers above), and in general I do not agree with this. I advocate true understanding, even if it is perceived to be hard. It's the attitude that needs to change, not the label on the black box. Simply doing everything in a standardized "Bayesian" way (in B-SPSS?) is not and never will be a panacea for the advancement of science.

I agree that the development of more default Bayesian priors and models would be really helpful. TheEcologist raises some good points: Researchers simply switching from thoughtlessly applying routine significance tests to thoughtlessly applying default Bayesian tests is no panacea for improving statistical practice.

But even if they aren't a panacea, I honestly believe that even a default Bayesian approach to hypothesis testing is still an improvement over significance testing; maybe even a gateway drug leading one day to more tailored, customised, Bayesian data analysis. Even a default Bayesian approach is actually capable of at least attempting to answer the questions that researchers are interested in; significance testing isn't. Who has ever started a data analysis saying "I'd like to know the probability of observing a statistic this or more extreme, if this particular hypothesis is true?" But that's what the significance test tells us - not the probability that the hypothesis is true (which is what we surely actually want to know!) More than that, the idea of using p values to make inferences about hypotheses relies on a logical fallacy: the probabilistic form of the modus tollens. The fact that the observation of some particular data is unlikely if a particular hypothesis is true does not imply that the hypothesis is unlikely given the data observed.
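A toy two-hypothesis example (numbers invented for illustration) of why that inversion fails: a fair coin (H0) is compared against a coin biased towards heads with p = 0.9 (H1), after observing 2 heads in 10 flips, with equal prior probabilities assumed. The data have probability below 0.05 under H0, yet H0 ends up with a posterior probability above 0.999, because the data are far less likely still under H1.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

heads, flips = 2, 10
lik_h0 = binom_pmf(heads, flips, 0.5)   # fair coin: the data are fairly unlikely
lik_h1 = binom_pmf(heads, flips, 0.9)   # biased coin: the data are far more unlikely
prior_h0 = 0.5                          # equal prior probabilities (assumed)

post_h0 = prior_h0 * lik_h0 / (prior_h0 * lik_h0 + (1 - prior_h0) * lik_h1)
print(f"P(data | H0) = {lik_h0:.4f}, P(H0 | data) = {post_h0:.6f}")
```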

I think that even the most basic of the three alternatives above (the BIC approach), which involves little more than a transformation of output from a conventional frequentist analysis, still carries an advantage over the NHST alternative. Particularly, it carries a communication advantage: It allows you to say directly how probable one hypothesis is in comparison to another. That's something that someone without advanced statistical expertise can intuitively grasp. Communicating statistical information to people with limited statistics knowledge is actually part of our job, and something we're not always good at. I think the controversy that erupted when the New York Times bravely tried to define p values shows just how hard it is for people without advanced stats knowledge to grasp how significance testing really works.

However, I think the single most important change that could increase the use of Bayesian statistics (at least in psychology) would be if it was easy to do in SPSS. :/
I agree! I think statisticians and data analysts may underestimate the power of SPSS defaults, because many of us use it rarely and feel in need of a hot shower afterwards on the rare occasions that we do... But it really does dictate practice for a lot of everyday researchers. Simply using SPSS defaults will never be the best way to do data analysis; but as long as some people are going about analysis that way, somehow pressuring IBM to improve the program's default options could really make a difference. The development of default Bayesian analyses does mean that SPSS could now realistically incorporate Bayesian alternatives to common tests such as ANOVA, t-tests, correlation, GzLMs, and so on.

BTW, does anybody know of more papers like BEST? That is, attempts to introduce "standard" Bayesian models for different types of analyses?
Other than the ones cited above the only other similar paper I know of is A Default Prior Distribution for Logistic and Other Regression Models by Gelman et al. Their approach is implemented in function bayesglm in package arm in R. Would also be interested to see any others that people know of!
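A rough sketch of the idea behind Gelman et al.'s default prior. This is not bayesglm itself, which fits the model with an approximate EM algorithm and handles many predictors. The hypothetical helper below just finds the posterior mode for a single predictor with scipy's BFGS optimizer. It follows the paper's recommendation as I understand it: rescale a continuous input to mean 0 and SD 0.5, then put Cauchy priors with scale 2.5 on the slope and scale 10 on the intercept. On perfectly separated data, where the maximum-likelihood slope diverges to infinity, the prior keeps the estimate finite.

```python
import numpy as np
from scipy.optimize import minimize

def cauchy_logistic_map(x, y, slope_scale=2.5, intercept_scale=10.0):
    """Posterior mode of a one-predictor logistic regression with Cauchy priors."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    z = (x - x.mean()) / (2 * x.std())          # rescaled input: mean 0, SD 0.5

    def neg_log_posterior(beta):
        eta = beta[0] + beta[1] * z
        log_lik = np.sum(y * eta - np.logaddexp(0, eta))        # Bernoulli-logit
        log_prior = (-np.log1p((beta[0] / intercept_scale) ** 2)
                     - np.log1p((beta[1] / slope_scale) ** 2))  # Cauchy, up to constants
        return -(log_lik + log_prior)

    result = minimize(neg_log_posterior, x0=np.zeros(2), method="BFGS")
    return result.x   # [intercept, slope] on the rescaled input

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]     # perfectly separated: the ML slope would be infinite
print(cauchy_logistic_map(x, y))
```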

TheEcologist

Global Moderator
But even if they aren't a panacea, I honestly believe that even a default Bayesian approach to hypothesis testing is still an improvement over significance testing; maybe even a gateway drug leading one day to more tailored, customised, Bayesian data analysis. Even a default Bayesian approach is actually capable of at least attempting to answer the questions that researchers are interested in; significance testing isn't. Who has ever started a data analysis saying "I'd like to know the probability of observing a statistic this or more extreme, if this particular hypothesis is true?" But that's what the significance test tells us - not the probability that the hypothesis is true (which is what we surely actually want to know!)
Bayesian analysis also doesn't give you the probability that a hypothesis is true (let's please not get into a philosophical debate about truth). It just gives a relative ranking of a given hypothesis within the set of hypotheses included, given the priors and the likelihood. In all practical cases you can't get the probability of a hypothesis without knowing the probability of the data (which is practically impossible). If you include two hypotheses, a null and your H1, Bayesian analysis is just about as useful as null hypothesis testing. You only get the relative ranking, as we never really have the full set of hypotheses. This is why I say that multiple hypothesis testing is an improvement, and this can be done just as validly in a frequentist setting (AIC) as in a Bayesian setting. It really is not the fact that it is "Bayesian" that makes it magically superior to null-hypothesis testing; it is the fact that you can test multiple plausible hypotheses, not just some default null that may or may not be suitable to your situation. So what is superior about Bayesian analyses in this context? Well, I feel it is a decent framework with strong theoretical support for multiple hypothesis testing. Whether it is better than the alternative (AIC), I can't say; that is a question for the lords of statistics. I wonder what others here at TS think?

More than that, the idea of using p values to make inferences about hypotheses relies on a logical fallacy: the probabilistic form of the modus tollens. The fact that the observation of some particular data is unlikely if a particular hypothesis is true does not imply that the hypothesis is unlikely given the data observed.
People who think that really didn't study their statistics textbooks well (or got the wrong textbooks!). My stats classes were full of these warnings (e.g. lack of evidence for the H1 is no evidence for the H0, and others). So this would stem from ignorance about statistics rather than an intrinsic failure of null hypothesis testing.

Really, Cowboybear, I love Bayesian analysis, but let's be fair. It's not better just 'cause it's Bayesian.

CB

Super Moderator
Thanks for your interesting arguments, TE. I guess I had been thinking in terms of Bayesian analysis vs NHST, and perhaps not acknowledging that there are obviously frequentist analyses other than NHST (some of which may be very useful).

If you include two hypotheses, a null and your H1, Bayesian analysis is just about as useful as null hypotheses testing.
I'm not sure about this bit though. In NHST with an H0 and H1, we aren't calculating the likelihood of the data (or any particular statistic) under two hypotheses: we're just looking at the likelihood of the test statistic under the null hypothesis. At least some of the Bayesian alternatives to NHST (particularly options 1 and 2 in my original post) force us to be a bit more specific about what we mean by the alternate hypothesis... surely a good thing? (tangent: probably under either approach, if we're testing a theory that can only tell us the direction of a relationship and nothing more, that theory isn't particularly impressive).

My stats classes were full of these warnings (e.g. lack of evidence for the H1 is no evidence for the H0, and others).
Sure, there's a knowledge problem; we can't blame poor old Ronald Fisher every time somebody misinterprets a p value! But when the method simply isn't capable of providing evidence for one of the hypotheses being tested, surely that's a legitimate problem with the method?

I wonder what others here at TS think?
Me too! Would love to hear from spunky, Lazar, Jake, etc.!

TheEcologist

Global Moderator
I may not be getting your point so bear with me...

Sure, there's a knowledge problem; we can't blame poor old Ronald Fisher every time somebody misinterprets a p value! But when the method simply isn't capable of providing evidence for one of the hypotheses being tested, surely that's a legitimate problem with the method?
It's all relative to each other. They are equivalent.

Let's go through the merits of testing two hypotheses in a Bayesian and a frequentist setting.

Starting with Bayes' rule:

$$P(Hypothesis \mid Data) = \frac{P(Hypothesis)P(Data \mid Hypothesis)}{P(Data)}$$

As the probability of the data is essentially constant and unknown, Bayes' rule reduces to:

$$P(Hypothesis \mid Data) \propto P(Hypothesis)P(Data \mid Hypothesis)$$

And when only using flat (constant) priors, it further reduces to:

$$P(Hypothesis \mid Data) \propto P(Data \mid Hypothesis)$$

(incidentally this is basically just ranking by the likelihood --> add parsimony checks and you've got AIC).
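To make the step from likelihood ranking to AIC concrete, here is a sketch with invented log-likelihoods for three hypotheses. The first function is exactly the flat-prior proportionality above, normalised so the values sum to one. It is valid when each hypothesis is a simple (point) hypothesis, so that P(Data | Hypothesis) is a single likelihood value; for hypotheses with free parameters a Bayesian would need the marginal likelihood instead. The second function applies AIC's parameter penalty, so the most complex model, which has the highest likelihood, no longer ranks first.

```python
import numpy as np

def flat_prior_posterior(log_liks):
    """Posterior model probabilities under equal priors: normalised likelihoods."""
    ll = np.asarray(log_liks, float)
    w = np.exp(ll - ll.max())          # subtract the max to avoid underflow
    return w / w.sum()

def akaike_weights(log_liks, n_params):
    """AIC = -2 log L + 2k; weights are exp(-delta/2), normalised to sum to 1."""
    aic = -2 * np.asarray(log_liks, float) + 2 * np.asarray(n_params, float)
    w = np.exp(-(aic - aic.min()) / 2)
    return aic, w / w.sum()

log_liks = [-120.0, -118.5, -118.2]    # hypothetical maximised log-likelihoods
n_params = [1, 2, 4]                   # hypothetical parameter counts
print(flat_prior_posterior(log_liks))
print(akaike_weights(log_liks, n_params))
```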

Now suppose we get a relative ranking of two hypotheses, H1 and H0, with equal numbers of parameters, as in e.g. a t-test, and we find that H1 is better supported than H0. Under your above statement we may now interpret this as evidence for H1, as H1 ranks higher than H0. We should now feel confident that H1 is better than H0 (I agree). Does that make H1 true? We don't know, as evidence for H1 over H0 is no evidence that H1 is the truth! We will never know until we include all possible hypotheses in the analysis, which essentially gives you P(Data). That essentially can't be done, in most if not all cases. So the analysis just told us H1 is better than H0.

In contrast, if we run a t-test we are likely to reject H0, and we then rank H1 as higher than H0. We should now feel confident that H1 is better than H0. Is H1 true? We don't know, as lack of evidence for H0 is no evidence for H1! So the analysis just told us H1 is better than H0.

Do you see my point? If you keep on testing two alternative hypotheses, Bayesian analyses are going to add about as much information above NHST as cutting out the entrails of a goat. Bayesian analyses in this setting are only an improvement if we can (convincingly) defend the inclusion of a strong prior. Otherwise these methods are equivalently good, and equivalently bad.

Alternatively, if you include the full set of all plausible hypotheses (multiple hypothesis testing), you get a ranking among all plausible hypotheses (in the above example only if the number of parameters is equal between hypotheses; otherwise we have to fool around with the principle of parsimony and Occam's razor in both Bayesian and frequentist approaches). If you have included the full set of real-life possibilities, with Bayes you can even approach the "true" probability of the hypothesis! [given the data, of course].

Multiple hypothesis testing however will never really work in a "p-value" setting (just look at the mess of multiple comparison testing),
it does however work very well in a Bayesian and information theoretic setting (AIC). With Bayesians having the powerful MCMC at their disposal!

Would love to hear from spunky, Lazar, Jake etc!
Indeed, why is it so quiet!

With my MCMCs running in the background, and my book on Fisher and Pearson snug on my bookshelf, I would like to say "Cogita ante salis" - before we jump from default NHST into default Bayesian.

Thanks for the interesting thoughts, Cowboybear.

TE


Dason

Multiple hypothesis testing however will never really work in a "p-value" setting (just look at the mess of multiple comparison testing),
it does however work very well in a Bayesian and information theoretic setting (AIC). With Bayesians having the powerful MCMC at their disposal!
Would you mind elaborating on what you mean by this?

TheEcologist

Global Moderator
Would you mind elaborating on what you mean by this?
I was talking about the increasing chance of type I errors with every hypothesis you add, and the messy corrections that would be required to prevent these and keep alpha at the desired level. I just don't see it working.
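A quick calculation of the inflation being described. It assumes independent tests (real pairwise comparisons are correlated, so this is only indicative) and a hypothetical scheme in which every pair among k hypotheses is tested at alpha = 0.05. The Bonferroni column shows the per-test level needed to keep the familywise rate at or below 0.05.

```python
from math import comb

def familywise_error(alpha, m):
    """P(at least one false positive) across m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m

for k in (2, 3, 5, 10):
    m = comb(k, 2)                 # all pairwise comparisons among k hypotheses
    print(k, m, round(familywise_error(0.05, m), 3),
          "Bonferroni per-test alpha:", 0.05 / m)
```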

Dason

I was talking about the increasing chance of type I errors with every hypothesis you add, and the messy corrections that would be required to prevent these and keep alpha at the desired level. I just don't see it working.
And the Bayesian approach avoids this how?

TheEcologist

Global Moderator
And the Bayesian approach avoids this how?
Okay, now I don't understand you. How does the type I error rate increase if you rank hypotheses with an appropriate ranking measure, e.g. DIC?

Compared to an analysis where you would need to test each hypothesis against the null and each respective combination of $$H_i$$?

Sure, the data might not contain enough information to discriminate between hypotheses, but a type I error?

TheEcologist

Global Moderator
You know, come to think of it, this is the analogue of type I errors that Bayesian and AIC analyses testing multiple hypotheses can suffer from. Let's say you have a set of hypotheses $$H_i$$ through $$H_p$$ that you test, and the ranking comes out as below:

$$\begin{aligned}
& H_{true} \\
& \vdots \\
& \left.\begin{array}{l} H_i \\ H_j \\ H_k \end{array}\right\}\ \text{indistinguishable} \\
& H_l \\
& H_m \\
& H_n \\
& H_o \\
& H_p \\
& \vdots
\end{aligned}$$

Your set of hypotheses will be ranked according to your data, priors and parsimony, as depicted above. We never know where exactly our set of hypotheses lies with regard to the "truth" $$H_{true}$$, but we know where our hypotheses/models are with regard to each other. Now, due to simple sampling effects, some of the hypotheses could have been mixed up, and will have ranking statistics that make them practically indistinguishable. If you choose the top one, you may be making an error akin to (but not exactly) a type I error, as a different sample may have yielded one of the other indistinguishable models as best. It is, in my opinion, far less serious than a type I error, and model averaging makes this considerably less of a problem.
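A sketch of the model averaging mentioned above. It assumes each candidate model supplies an estimate and standard error for a shared parameter, plus a weight (Akaike weights or posterior model probabilities). The numbers are invented. The standard-error formula is the unconditional estimator popularised by Burnham and Anderson, which adds the between-model spread to each model's own uncertainty; other averaging schemes exist.

```python
import numpy as np

def model_average(estimates, std_errors, weights):
    """Weighted average of a parameter across models plus an unconditional SE."""
    est = np.asarray(estimates, float)
    se = np.asarray(std_errors, float)
    w = np.asarray(weights, float)
    w = w / w.sum()
    avg = np.sum(w * est)
    # Within-model variance plus between-model spread, weighted across models
    unconditional_se = np.sum(w * np.sqrt(se**2 + (est - avg) ** 2))
    return avg, unconditional_se

# Hypothetical: three practically indistinguishable models, such as H_i, H_j, H_k
print(model_average([1.0, 1.2, 0.8], [0.1, 0.1, 0.1], [0.4, 0.35, 0.25]))
```

Because the between-model spread is included, the averaged standard error is larger than any single model's, which is the honest price of not knowing which indistinguishable model is best.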

Is this what you meant Dason?

Also, the forum does not agree with my latex... any suggestions?


rasmusab

New Member
But even if they aren't a panacea, I honestly believe that even a default Bayesian approach to hypothesis testing is still an improvement over significance testing; maybe even a gateway drug leading one day to more tailored, customised, Bayesian data analysis.
I like the gateway drug idea, or maybe the opposite: default Bayesian analyses can act as the methadone while in rehab from classical statistics. As a gateway drug I think the approach Kruschke takes is good: prepackage everything nicely, make the method easy to run with good defaults, but be explicit about the model and make it easy to change.

So which default classical methods could be the target of Bayesian models? Off the top of my head (and from the appendix of any psychology statistics textbook):

* T-test
* Pearson's correlation
* Binomial test
* Pearson's chi-squared test
* Factorial ANOVA
* OLS regression

What should a default Bayesian analysis include (apart from being Bayesian)? Maybe:

* Robustness. To save researchers from the tyranny of the normal distribution.
* Heterogeneous variances between groups. Why is homogeneity of variance always assumed?
* A default plot. Bayesian analysis lends itself to visual inspection of posterior distributions and MCMC traces; this could be built in from the start.
* A good, easy-to-use and easy-to-grasp implementation. Implement it in FORTRAN using a clever MCMC scheme? No! Better to use BUGS/JAGS/Stan, where the model is clearly expressed and easy to modify.

What do you think?

TheEcologist

Global Moderator
Is this what you meant Dason?
Ripley also has some wisdom to add:

If you really want to assess uncertainty you need to take into account that the
models are false and that several models may capture different aspects of the
data and so be false in different ways.
-- Brian D. Ripley

CB

Super Moderator
And when only using flat (constant) priors, it further reduces to:

$$P(Hypothesis \mid Data) \propto P(Data \mid Hypothesis)$$

(incidentally this is basically just ranking by the likelihood --> add parsimony checks and you've got AIC).

Now suppose we get a relative ranking of two hypotheses, H1 and H0, with equal numbers of parameters, as in e.g. a t-test, and we find that H1 is better supported than H0. Under your above statement we may now interpret this as evidence for H1, as H1 ranks higher than H0. We should now feel confident that H1 is better than H0 (I agree). Does that make H1 true? We don't know, as evidence for H1 over H0 is no evidence that H1 is the truth! We will never know until we include all possible hypotheses in the analysis, which essentially gives you P(Data). That essentially can't be done, in most if not all cases. So the analysis just told us H1 is better than H0.
Good point. I guess what I'm getting at, though, is that in a conventional NHST t-test we only consider P(Data|Hypothesis) under the null hypothesis. We don't calculate P(Data|H1). So all we can say in the end is that a particular result would be unlikely (or likely) if the null hypothesis was true. We're not able to say that the data/results would be more likely given the alternate than given the null. Even if you favour multiple hypothesis testing, at least the Bayesian t-tests actually test two hypotheses: the NHST version really just tests one.

One of the consequences of that is that there is no possible p value from an NHST t-test that would allow us to say "the null hypothesis is better supported than the alternate" (let alone better supported than all other hypotheses).

Obviously this problem relates to NHST as it's conventionally applied, rather than frequentist analysis in general. E.g., if one could pit a point null hypothesis against a point alternate hypothesis, we could calculate likelihoods under both hypotheses within a frequentist analysis. But that rarely happens (at least in my field), mainly because our theories aren't specific enough to make point predictions. The Bayesian route offers a way out of this dilemma, to some degree, by allowing the specification of "alternate" hypotheses that spread credibility over a range of values. Is that a good thing? I'm not sure - maybe what we should really be doing is focusing on developing theory that makes more specific predictions.
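A small sketch of the contrast being drawn, in the simplest possible setting: a normal sample mean with known sigma = 1 and H0: mu = 0, against either a point alternative mu = 0.5 or a "spread-out" alternative that places a Normal(0, 1) prior on mu. The specific values are hypothetical. The normal prior is chosen only because it gives a closed-form marginal likelihood; it is not the JZS/Cauchy default discussed earlier.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of Normal(mean, sd) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def likelihood_ratios(xbar, n, sigma=1.0, mu1=0.5, tau=1.0):
    """Evidence for H1 over H0: mu = 0, from a sample mean with known sigma."""
    se = sigma / math.sqrt(n)
    lik_h0 = normal_pdf(xbar, 0.0, se)
    # Point alternative mu = mu1: an ordinary likelihood ratio
    lr_point = normal_pdf(xbar, mu1, se) / lik_h0
    # Diffuse alternative mu ~ Normal(0, tau^2): xbar is marginally Normal(0, tau^2 + se^2)
    bf_diffuse = normal_pdf(xbar, 0.0, math.sqrt(tau**2 + se**2)) / lik_h0
    return lr_point, bf_diffuse

print(likelihood_ratios(xbar=0.3, n=50))
```

Here the point alternative gets a likelihood ratio of about 3.5, while spreading credibility over many values of mu dilutes the support to a Bayes factor of about 1.3.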

CB

Super Moderator
What should a default Bayesian analysis include (apart from being Bayesian)? Maybe:

* Robustness. To save researchers from the tyranny of the normal distribution.
* Heterogeneous variances between groups. Why is homogeneity of variance always assumed?
Ok, declaration of ignorance time: I really don't understand what role distributional assumptions play in Bayesian analyses. E.g., what assumptions might we make about the error term in a Bayesian regression?

rasmusab

New Member
E.g., what assumptions might we make about the error term in a Bayesian regression?
From my perspective, one of the benefits of using Bayesian statistics is that it is easy to use any distribution for the error terms (and for priors), and one of the problems with classical methods (apart from relying on a null hypothesis) is that they most often assume that the error is normally distributed (as in OLS and ANOVA). You are absolutely right that this is not unique to Bayesian statistics (it could be done in a frequentist framework too), but I was thinking that if you are already in the business of constructing default Bayesian models, why not spice them up by making them more robust to outliers and able to accommodate heterogeneous variances between groups? Kruschke does this in his BEST model by using a heavy-tailed t-distribution instead of a normal distribution.
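A quick frequentist illustration of why the heavy-tailed likelihood helps. BEST itself estimates the t distribution's normality parameter inside the Bayesian model; here it is simply fixed at df = 4 as an assumption. A single outlier drags the normal-model estimate of location, which is just the sample mean, well away from the bulk of the data. The maximum-likelihood location under the t model, fitted with `scipy.stats.t.fit`, barely moves.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
clean = rng.normal(0.0, 1.0, 30)              # simulated data, for illustration only
data = np.append(clean, 15.0)                 # plus one gross outlier

normal_mean = data.mean()                     # normal-likelihood location estimate
df, t_loc, t_scale = stats.t.fit(data, f0=4)  # t likelihood with df held fixed at 4

print(f"mean of clean data: {clean.mean():.3f}")
print(f"normal-model estimate: {normal_mean:.3f}, t-model estimate: {t_loc:.3f}")
```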