Revenue / Value (both in $ millions)

459 / 1835
563 / 323
369 / 1181
513 / 1000
406 / 990
304 / 822
276 / 800
285 / 656
340 / 646
276 / 413
175 / 384
186 / 372
196 / 333
206 / 329
205 / 308
161 / 274
187 / 262
145 / 261
143 / 258
142 / 198

Here are the questions:

1. Construct a 95% confidence interval estimate of the mean value of all franchises that generate $150 million of annual revenue.

2. Construct a 95% prediction interval of the value of an individual franchise that generates $150 million of annual revenue.

For the first one I thought I had the formula right, but I got a really strange number. I think it's because I'm getting the wrong value for SSX. Could anyone help explain the formula used for these problems? For the first one I had Y = (1/20) + ((150 - 276.85)^2 / SSX), and then I took the square root of all of that. Any help would be greatly appreciated.
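For anyone checking their arithmetic, here is a minimal Python sketch of the usual textbook calculation for simple linear regression. It uses the 20 revenue/value pairs exactly as typed above. Some things here are assumptions rather than part of the original post: the name h for the quantity under the square root, the name s_yx for the standard error of the estimate, and the hardcoded critical value t = 2.1009 (two-sided 95%, 18 degrees of freedom). SSX is the sum of squared deviations of revenue from its mean, not the sum of squared revenues, which is a common place to slip.

```python
import math

# (revenue, value) pairs in $ millions, copied as typed in the post
data = [
    (459, 1835), (563, 323), (369, 1181), (513, 1000), (406, 990),
    (304, 822), (276, 800), (285, 656), (340, 646), (276, 413),
    (175, 384), (186, 372), (196, 333), (206, 329), (205, 308),
    (161, 274), (187, 262), (145, 261), (143, 258), (142, 198),
]

n = len(data)
x = [rev for rev, _ in data]
y = [val for _, val in data]
x_bar = sum(x) / n
y_bar = sum(y) / n

# SSX: sum of squared deviations of X from its mean
ssx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in data)

# Least-squares slope and intercept
b1 = sxy / ssx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate, with n - 2 degrees of freedom
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in data)
s_yx = math.sqrt(sse / (n - 2))

# Assumed critical value: t(0.025, df = 18), taken from a t table
t_crit = 2.1009

x0 = 150                       # revenue of interest, $ millions
y_hat = b0 + b1 * x0           # predicted value at x0
h = 1 / n + (x0 - x_bar) ** 2 / ssx

# 95% confidence interval for the mean value at x0
ci_half = t_crit * s_yx * math.sqrt(h)
ci_low, ci_high = y_hat - ci_half, y_hat + ci_half

# 95% prediction interval for one individual franchise at x0
pi_half = t_crit * s_yx * math.sqrt(1 + h)
pi_low, pi_high = y_hat - pi_half, y_hat + pi_half

print(f"x_bar = {x_bar:.2f}, SSX = {ssx:.2f}")
print(f"y_hat = {y_hat:.2f}")
print(f"95% CI for the mean:      ({ci_low:.2f}, {ci_high:.2f})")
print(f"95% PI for an individual: ({pi_low:.2f}, {pi_high:.2f})")
```

The only difference between the two intervals is the extra 1 under the square root for the prediction interval, so it always comes out wider than the confidence interval at the same revenue.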

He has replicated a study and wants to compare it to the original study conducted by other researchers. He does not have access to the original study's data and knows only the mean, median, and standard deviation of each variable.

Is it possible to take a set of participant measures and make any sort of meaningful comparison with another data set for which you only know the mean, median, and SD?

-- kate

My reason for asking this is a recent kerfuffle over some stats reporting I did years ago for a paper in STEM education. We had several variables being tested, but these variables faced a few issues when it came to applying ANOVA and t-tests: some variation from normality, uneven n's, etc. Most of the normality problems were due to floor effects, since these were time measures and could not be negative. For consistency, we ended up doing nonparametric tests throughout the paper: 3-way Kruskal-Wallis, then post hoc Mann-Whitney (with adjustments). When it came to publishing decisions, we decided to report the means and SDs along with the test statistics.

Half a decade later, we've received some complaints that we should have reported medians instead. From what I've read and seen as a stats consultant working in education and STEM research, it's debatable which is better to report. Medians make sense because the tests are median-based. However, my argument was and still is that the data were mostly normal and that the mean and standard deviation give more information.

Of course, the sensible answer is to report mean, median, and SD. We didn't because the table was already at the point of illegibility and the authors refused to add a second table.

So, is there any consensus here? Published opinions? Or is this a horrible, unending debate like the vi versus emacs debates in computer science?

-- kate