Ok, so we've all heard time and again that null hypothesis significance testing is good for nothing but ravishing maidens. It tells us P(Data|Hypothesis) when we want to know P(Hypothesis|Data); it can't provide evidence *for* a null hypothesis; p values depend on the researcher's sampling plan, not just the actual data (see the Wagenmakers article below); etc etc.

Several of the main complaints about NHST have been floating around for 60+ years now, and more and more methodological articles seem to be calling for a switch to Bayesian data analysis - especially in psychology, but also in other fields. Bayesian data analysis allows us to directly calculate the probability of particular hypotheses or models being true, and circumvents a bunch of the problems with NHST.

The thing is, so far that hasn't really changed how everyday researchers do their bidness. NHST still rules the roost. Part of that is probably just cultural inertia, but part of it is probably also that Bayesian analysis is perceived as harder - as requiring more subjective decisions (especially about priors) and more programming skill than typical NHST tests.

But lately there have been a handful of articles coming out that suggest Bayesian alternatives to common statistical analyses, with the t-test serving as something of a test case. The idea seems to be to develop alternatives that both:

1) Are easy to use

2) Have default, reasonably non-informative priors that can be used in a wide range of situations.

I've noticed three broad approaches:

1) Model comparisons using the Bayesian Information Criterion (BIC).

A practical solution to the pervasive problems of p values (Wagenmakers, 2007)

Here the idea is that if you can get BICs for different models, or calculate them (and Wagenmakers shows how to do so for something like an ANOVA), you can then compare models in a Bayesian fashion. Specifically, the difference between the BICs of two models approximates a Bayes Factor indicating the weight of evidence in favour of the model with the lower BIC. Wagenmakers talks about this approximation being consistent with a unit information prior - I don't think I understand that prior very well, though. The nice thing about this method is that it can be applied very widely - you could do anything from a t-test (i.e. by comparing an ANOVA model with a dichotomous predictor to an intercept-only model) to a comparison of SEM models. But it doesn't use Bayesian *estimation* at all.
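To make the recipe concrete, here's a rough sketch in Python (my own toy example, not from the article; the simulated data and the OLS-based BIC convention are assumptions). Wagenmakers' approximation is BF01 ≈ exp((BIC1 − BIC0)/2), so a model-comparison "t-test" looks like this:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated two-group data with a real effect (a t-test as model comparison)
n = 100
group = rng.integers(0, 2, n)            # dichotomous predictor
y = 0.8 * group + rng.normal(0, 1, n)    # true effect of 0.8 SDs

def bic_ols(y, X):
    """BIC for an OLS model, one common convention: n*ln(RSS/n) + k*ln(n)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n, k = X.shape
    return n * np.log(rss / n) + k * np.log(n)

bic_null = bic_ols(y, np.ones((n, 1)))                      # intercept only
bic_alt = bic_ols(y, np.column_stack([np.ones(n), group]))  # intercept + group

# Wagenmakers (2007): BF for null vs alternative ~ exp((BIC_alt - BIC_null)/2);
# values below 1 mean the evidence favours the model with the predictor
bf01 = np.exp((bic_alt - bic_null) / 2)
print(f"BIC null: {bic_null:.1f}, BIC alt: {bic_alt:.1f}, BF01: {bf01:.4f}")
```

With a genuine 0.8 SD effect and n = 100, BF01 comes out far below 1, i.e. strong evidence against the intercept-only model.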

2) Model comparisons using Bayes Factors, and a "JZS" prior

Bayesian t tests for accepting and rejecting the null hypothesis (Rouder et al, 2009)

A default Bayesian hypothesis test for ANOVA designs (Wetzels et al, 2012)

A default Bayesian hypothesis test for correlations and partial correlations (Wetzels & Wagenmakers, 2012)

Analyses for t-tests, regression and binomial observations implemented online here

In this setup the prior for coefficients *under the alternative hypothesis* is made with reference to effect size, using a Cauchy(0, 1) distribution. As in the approach above, we can produce evidence for and against a null hypothesis that a variable has no effect, using Bayes Factors. This approach isn't quite as general as the first. Again, Bayesian reasoning is used for inference, but not for model estimation.
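As a rough illustration of what this kind of test is doing (my own Monte Carlo sketch, not the closed-form integral in Rouder et al): with a Cauchy(0, 1) prior on the standardized effect δ, the marginal likelihood of an observed t statistic under H1 can be approximated by averaging noncentral-t densities over draws of δ, and BF10 is the ratio of that to the central-t density under H0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def bf10_cauchy(t, n, draws=200_000):
    """Monte Carlo sketch of a one-sample Bayes Factor with a Cauchy(0, 1)
    prior on the standardized effect size delta. Under H1 the t statistic
    follows a noncentral t with noncentrality delta * sqrt(n); under H0,
    a central t with n - 1 degrees of freedom."""
    df = n - 1
    delta = stats.cauchy.rvs(loc=0, scale=1, size=draws, random_state=rng)
    dens = stats.nct.pdf(t, df, delta * np.sqrt(n))
    m1 = np.mean(np.nan_to_num(dens))   # guard against underflow in extreme tails
    m0 = stats.t.pdf(t, df)             # likelihood under the point null
    return m1 / m0

print(bf10_cauchy(t=3.0, n=50))   # well above 1: evidence for an effect
print(bf10_cauchy(t=0.2, n=50))   # below 1: evidence favouring the null
```

Note that, unlike a p value, a tiny t statistic here actively supports the null rather than merely failing to reject it.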

3) Full blown Bayesian estimation

Bayesian estimation supersedes the t test (Kruschke, 2012)

Implemented beautifully online here

Kruschke suggests that mucking around with frequentist analysis and then calculating Bayes Factors is not the way to go. Proper Bayesian estimation gives us much more detailed information, especially in the form of a posterior distribution for each parameter, which shows us the relative plausibility of lots of different values of that parameter (not just a null and an alternative hypothesis). For a t-test situation, Kruschke suggests a very wide normal distribution as an appropriate non-informative prior for the group means (technical details in the article). We combine this prior with the sample data using MCMC and get a posterior distribution. This approach is probably the most informative and most general, but it will also seem the most unfamiliar to a researcher used to OLS and NHST and SPSS.
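To give a flavour of what "combine the prior with the data using MCMC" means in practice, here's a bare-bones random-walk Metropolis sampler for a two-group comparison. This is my own toy sketch, much simpler than Kruschke's actual model (which uses t-distributed likelihoods and per-group variances); everything in it is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two groups with a true mean difference of 1.0
g1 = rng.normal(0.0, 1.0, 40)
g2 = rng.normal(1.0, 1.0, 40)

def log_post(mu1, mu2, log_sigma, y1, y2):
    """Log posterior: normal likelihood with shared sigma, very wide
    Normal(0, 100) priors on the group means, flat prior on log(sigma)."""
    sigma = np.exp(log_sigma)
    ll = (np.sum(-0.5 * ((y1 - mu1) / sigma) ** 2 - np.log(sigma))
          + np.sum(-0.5 * ((y2 - mu2) / sigma) ** 2 - np.log(sigma)))
    lp = -0.5 * (mu1 / 100) ** 2 - 0.5 * (mu2 / 100) ** 2
    return ll + lp

# Random-walk Metropolis over (mu1, mu2, log_sigma)
theta = np.array([0.0, 0.0, 0.0])
cur = log_post(*theta, g1, g2)
samples = []
for i in range(20_000):
    prop = theta + rng.normal(0, 0.15, size=3)
    new = log_post(*prop, g1, g2)
    if np.log(rng.uniform()) < new - cur:   # accept with prob min(1, ratio)
        theta, cur = prop, new
    if i >= 5_000:                          # discard burn-in
        samples.append(theta.copy())

samples = np.array(samples)
diff = samples[:, 1] - samples[:, 0]        # posterior for mu2 - mu1
print(f"posterior mean difference: {diff.mean():.2f}")
print(f"95% credible interval: {np.percentile(diff, [2.5, 97.5])}")
```

The payoff is that `diff` is a full posterior distribution for the group difference, not just a single test statistic - you can read plausibility straight off it.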

One thing that jumps out at me about approach (3) is that it abandons the traditional obsession with a "point" null hypothesis. The method, at least in the default implementation, can't provide evidence *for* an exact point null hypothesis. I don't mind this - most of the time, a null hypothesis that an effect is exactly zero seems pretty implausible, unless we're talking about an experiment testing whether people have precognition or something like that. But the method can provide evidence that an effect is negligible (using a "region of practical equivalence"), and if I understand it correctly can tell us the posterior probability that an effect in a particular direction exists. That seems more useful - in the social sciences, researchers very often have a hypothesis that an effect exists in a particular direction.
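Once you have posterior samples, both of those quantities are one-liners. A quick sketch (the ROPE bounds and the stand-in "posterior" here are arbitrary choices of mine, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for MCMC output: posterior samples of a standardized effect size
posterior = rng.normal(0.4, 0.15, 10_000)

rope = (-0.1, 0.1)   # region of practical equivalence: "negligibly small" effects
p_in_rope = np.mean((posterior > rope[0]) & (posterior < rope[1]))
p_positive = np.mean(posterior > 0)      # posterior prob. the effect is positive

print(f"P(effect in ROPE): {p_in_rope:.3f}")
print(f"P(effect > 0):     {p_positive:.3f}")
```

With most of the posterior mass well above the ROPE, you'd conclude the effect is very probably positive and very probably non-negligible - which is usually the directional claim the researcher actually cares about.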

Anyway, that's enough babbling on from me, especially since I don't understand this stuff all that well anyway. But what do you guys think of these approaches, and of Bayesian data analysis in general? And will there come a day when our usual answer to a standard "what test should I use?" question will be to tell the poster about a default Bayesian analysis?