# Stuck with my darn P value

#### noetsi

##### No cake for spunky
Don't assume they're wiser than you! Your time on this forum means you know more about stats than the people writing a lot of articles. Assuming something is valid just because everyone else is doing it doesn't work. That's part of the reason why we have a reproducibility crisis in science right now, where people are realising that a lot of supposedly well-established findings are complete bollocks.

If you only want to make conclusions about correlations between observed variables in your population, then inferential statistics aren't necessary. If you want to make inferences about causal effects in your population then you do need inferential statistics, because you can't directly observe causal effects and uncertainty applies to their estimates.

Good!
I am doubtful that I, a practitioner, am going to discover causal reality (if academics can ever do that) in a field I am not native to and which (judging by its journals) has little in the way of developed theory to start with. I work in vocational rehabilitation, not economics or psychology; the methods are not strong, and little empirical testing or theory exists as far as I can find (and I have looked a lot in the journals). That philosophical statement out of the way, I have doubts about your points in practice, CB. First, as I noted above, there is a limited set of data. I have access to everything that exists, so I cannot create a theory, test it, and gather new data. There is no other data.

Second, I think there is a trade-off between the risks of finding something by chance (in an entire population gathered over two decades) and basing decisions on no empirical analysis at all. That is what occurs if we do not run the types of analysis I do (and that is the norm, not the exception, in public programs; commonly decisions are made with no formal analysis at all). Saying you should not use inferential statistics raises questions in my mind as well. Regression, SEM, etc. are good ways to see patterns that simple crosstabs and the like will never show. For example, they do a much better job of controlling for multiple effects than non-inferential statistics can.

Again, the question is: if the choice is between 1) showing through regression and SEM (with no theory to build on initially) that certain predictors have direct and indirect effects, knowing this could be by chance, and 2) basing policy on no inferential analysis at all, where is the greater risk? It's not like you can do as you suggest: run an inferential analysis and, if you are wrong, go back and gather more data possibly months later. No one is going to wait months, and in any case there is no additional data that can be gathered.

#### CB

##### Super Moderator
Sometimes you have to choose between a simple, valid analysis and a possibly more powerful but much more complex analysis that is hard to understand and explain, particularly if the simpler analysis gives you the answer you want. You make using "multiple significance tests and having to jury-rig in a solution for familywise type 1 error" sound like something amateurish that should be avoided at all costs by researchers, but that is exactly how an ANOVA works.
The reported outcome of a Poisson regression in this case would be something along the lines of "the intervention was associated with an X% reduction in subsequent surgeries, confidence interval Y to Z". Is that really more complicated than the chi-square with follow-ups and collapsing of outcomes? I think you might be assuming that because one test is more familiar to users than another, it will produce results that are easier to understand and explain. That is not necessarily the case.
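As a sketch of how that "X% reduction" reporting falls out of the model (using the surgery counts that appear later in the thread; these numbers are from this illustrative fit, not anyone's actual write-up): the condition coefficient is on the log scale, so `exp()` turns it into a rate ratio, and one minus that ratio is the percentage reduction.

```r
# Surgery counts from later in the thread: 60 old-method, then 60 new-method patients
ops  <- c(rep(1,14), rep(2,25), rep(3,16), rep(4,5),
          rep(1,35), rep(2,18), rep(3,5),  rep(4,2))
cond <- rep(c(0, 1), each = 60)          # 1 = new method

fit <- glm(ops ~ cond, family = poisson) # Poisson regression of count on condition
b   <- coef(fit)["cond"]                 # log rate ratio for the new method

exp(b)                                   # rate ratio, new vs old: ~0.71
100 * (1 - exp(b))                       # the "X% reduction" figure: ~29%
exp(confint.default(fit)["cond", ])      # Wald 95% CI for the rate ratio
```

With a single binary predictor the fitted group rates reproduce the sample means, so the rate ratio is just mean(new)/mean(old).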

Also, again, it sounds from your answer above like your choice of analysis would depend on whether the one you initially try gives you the results you want, and that is p-hacking.

#### CB

##### Super Moderator
Second, I think there is a trade-off between the risks of finding something by chance (in an entire population gathered over two decades) and basing decisions on no empirical analysis at all.
Sure, but it's not like the only options are stepwise regression and a hunt for significant p values vs. perfect pre-registered experiments. There are better and worse ways to do exploratory analysis. I think we're probably wandering a bit far from the OP's original question, but at some stage I'd be more than happy to look at some specific research question of yours and see if we can make suggestions for how to deliver the best inferences possible given the constraints you work within. As a quick example: you say you can't get new data, which is fine, but you could improve your inferences by splitting your data into training and hold-out samples, selecting a model using the training data only, and then validating it on the hold-out sample. Anyway, maybe more on this in a separate thread sometime!
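A minimal sketch of those train/hold-out mechanics (the data frame here is simulated stand-in data, since noetsi's actual variables aren't shown in the thread):

```r
# Train / hold-out split for more honest exploratory model selection
set.seed(42)
dat <- data.frame(y  = rnorm(200),   # toy stand-in for the real outcome
                  x1 = rnorm(200),
                  x2 = rnorm(200))   # toy stand-in predictors

train_id <- sample(nrow(dat), size = floor(0.7 * nrow(dat)))  # 70% for exploration
train    <- dat[train_id, ]
holdout  <- dat[-train_id, ]

# Explore models freely on `train`, then fit the chosen model once...
fit <- lm(y ~ x1 + x2, data = train)

# ...and judge it only by its out-of-sample error on `holdout`
mean((holdout$y - predict(fit, newdata = holdout))^2)
```

The point is simply that whatever model hunting happens on `train`, the hold-out error is untouched by it and so gives a less optimistic assessment.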

#### katxt

##### Active Member
The philosophical discussion is great, but to return to PKrazda's original question: can we find a legitimate p value to go with the obviously superior surgery technique? The chi-square suggestion wasn't challenged for legitimacy, but it certainly didn't get universal support.
I'm sorry but I don't think this is at all an appropriate analysis. The study has a simple question - whether the intervention results in fewer subsequent surgeries. By trying to jam this into a chi-square framework you're ending up with multiple significance tests and having to jury-rig in a solution for familywise type 1 error. This just isn't the right way to do it - the OP needs a count-based regression model.
Well, the data is there. It would be interesting to see the results of a count-based regression model. In the meantime, here is another suggestion which is simple and hopefully free of the taint of jury-rigging or p-hacking.
The original data was two samples of 60, each looking something like this: 2 3 1 1 2 3 4 ... Each number is a measure of how much surgery was needed. Here is a parallel but more continuous version, where the measure of the amount of surgery is the total hours spent in surgery. The data now looks like 3.4 2.6 5.2 2.1 ... In the second case the appropriate analysis is simply a suitable two-sample test. There seems no good reason why the same analysis should not be appropriate for the original data too. The t test is out, as is the Mann-Whitney. However, a randomization test is fine. For the data given, a randomization test gives p < 0.0001. This method can also provide a confidence interval for the difference.
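For what it's worth, here is one way that randomization test could be sketched, using the counts posted later in the thread (10,000 reshuffles of the group labels; the exact p wobbles with the seed, but it comes out consistent with the p < 0.0001 quoted above):

```r
# Randomization (permutation) test on the difference in mean number of surgeries
old <- c(rep(1,14), rep(2,25), rep(3,16), rep(4,5))   # 60 old-method patients
new <- c(rep(1,35), rep(2,18), rep(3,5),  rep(4,2))   # 60 new-method patients
obs <- mean(old) - mean(new)                          # observed difference, ~0.63

pooled <- c(old, new)
set.seed(1)
perm <- replicate(10000, {
  shuffled <- sample(pooled)                          # reallocate labels at random
  mean(shuffled[1:60]) - mean(shuffled[61:120])       # difference under the null
})

mean(abs(perm) >= abs(obs))                           # two-sided permutation p value
```

Because the labels are shuffled under the null of no difference, no distributional assumption about the counts is needed.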

#### CB

##### Super Moderator
Statistically that works but:

1) Does OP even have the surgery time data?
2) I'm wary of changing the research question to fit a particular data analysis method. The question should drive the analysis.
3) Poisson regression is easy and simple to implement so I'm not sure why this is necessary.

#### PKrazda

##### New Member
Thanks everyone for getting back to me!

In particular, katxt, I'm really appreciative of this method, but I'm struggling to work out exactly how you got those p values for each line. Could you explain how you get the initial chi-squared and then how you apply the post hoc test at each level to get the p values? Sorry, I know for you guys this must be basic stuff, but despite my best efforts to learn it, I'm pretty lost here!

#### PKrazda

##### New Member
Also, I'm afraid I don't have the times of each operation to hand and would rather use the first method you suggested.

#### CB

##### Super Moderator
If you can advise which software you're using, I can suggest an appropriate function to use (for a Poisson GLM).

#### PKrazda

##### New Member
Excel's all I have! Is that ok?

#### CB

##### Super Moderator
It's hard to do a decent statistical analysis in Excel. Something like R or SPSS would allow you to do a proper analysis here.

#### PKrazda

##### New Member
Even if I just want a p value for each line of data? Are there sites online that can offer that kind of analysis?

#### CB

##### Super Moderator
Possibly, but rushing to get a p value without worrying about whether it answers the right question is a bad idea. So I have a question for you: what does a p value tell you?

#### PKrazda

##### New Member
I think I'd really struggle to get to grips with a program like that. What I really want is a p value for each row of data; are there any good online resources that can help me with this and explain the process?

#### PKrazda

##### New Member
oops sorry i missed your last post. A p value tells you the likelihood that the null hypothesis is not true, thus how likely you are to have proof of your theorised hypothesis?

#### CB

##### Super Moderator
A p value tells you the likelihood that the null hypothesis is not true, thus how likely you are to have proof of your theorised hypothesis?
I'm sorry but no, that isn't what a p value is. FAQ post on this here: http://www.talkstats.com/showthread.php/49475-FAQ-What-is-a-p-value

Look, I understand that statistical computing can be challenging, but I am worried that you just want a p value and don't seem to be so concerned with whether the analysis actually answers your research question. Our goal here is to help posters pick and apply appropriate analyses, not to just run an analysis that you think you want but that isn't actually appropriate. I'm happy to help if you want to apply an appropriate analysis here and are willing to take the time to do so, but comparing the numbers line by line is not appropriate here, and given the context running an inappropriate analysis could have harmful real-life effects.

#### PKrazda

##### New Member
Ok, I do completely take your point here; I'm just a bit concerned I'm too far away from fully understanding some of the alternative methods that have been proposed. It's difficult starting with very little background in this area and knowing which way to go. I think I'm so keen on a p value because (I thought, at least) I understood essentially what it means, despite of course not being able to apply it to my actual situation. For example, I spent a good old while trying to get to grips with Poisson regression but hit a brick wall, and I still have no idea how to use it, let alone explain it in my study. Balls.

#### GretaGarbo

##### Human
Is there anyone that can help me and explain how to do this? (in non-statistician language :rolleyes:)
@PKrazda,
I can understand that you are rolling your eyes, after this long discussion.

Since you returned I will give you my estimates, the confidence interval and "the darn p-value".

The measurements are numbers of operations. Four operations are twice as many as two, so no one can doubt that it is a ratio scale. Thus, it is appropriate to estimate means.

The mean of old is 2.2 and the mean of new is 1.57.

The original poster wants to estimate the difference between these means.

The difference of these means is 0.63 and its standard error (the "so to say uncertainty" in the means difference) is 0.1543.

The 95% Confidence interval is (0.33 ; 0.93). This is clearly away from zero, so it is statistically significant.

The "darn p-value" is about 4.072807e-05 which is very low, and much lower than 0.05, thus it is statistically significant.

Code:
### This is an R program

new <- c(rep(1,35),rep(2,18),rep(3,5),rep(4,2))
table(new)

old <- c(rep(1,14),rep(2,25),rep(3,16),rep(4,5))
table(old)

mean(new)
#  1.566667
var(new)
#0.6225989

mean(old)
# 2.2

var(old)
# 0.8067797

mean(old) - mean(new)
# 0.6333333

sqrt(var(new)/60 + var(old)/60 )
# 0.1543469

z_value <- (mean(old) - mean(new))/sqrt(var(new)/60 + var(old)/60 )
z_value
# 4.10331

2*(1 - pnorm(z_value))
# the darn p-value is 4.072807e-05

t.test(old,new)

# Welch Two Sample t-test
#
# data:  old and new
# t = 4.1033, df = 116.07, p-value = 7.598e-05
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  0.3276318 0.9390349
# sample estimates:
#  mean of x mean of y
# 2.200000  1.566667

(mean(old) - mean(new))
# 0.6333333

(mean(old) - mean(new)) + c(-1, +1)*1.96*sqrt(var(new)/60 + var(old)/60 )
# 95% CI for the difference of means:
# 0.3308133   0.9358533
I agreed with what CBear said about using the Poisson distribution. In a previous internal discussion (in the chatbox) I suggested using a likelihood ratio test with the Poisson distribution, conditional on the values being larger than zero. (The Poisson probabilities in the likelihood need to be divided by one minus the probability of zero, which of course is a function of the wanted parameter.) So it seems difficult to test with standard software.
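That truncated-Poisson likelihood ratio test can actually be coded by hand without special software; a sketch (my own construction, not standard package output): the zero-truncated pmf divides `dpois(k, lambda)` by `1 - exp(-lambda)`, and the LRT compares separate-rate and common-rate fits on 1 degree of freedom.

```r
# Likelihood ratio test under a zero-truncated Poisson model
old <- c(rep(1,14), rep(2,25), rep(3,16), rep(4,5))
new <- c(rep(1,35), rep(2,18), rep(3,5),  rep(4,2))

# Log-likelihood of counts under a Poisson truncated at zero:
# each dpois term is divided by P(X > 0) = 1 - exp(-lambda)
ll_trunc <- function(lambda, x) sum(dpois(x, lambda, log = TRUE) - log(1 - exp(-lambda)))

# Maximize the log-likelihood numerically over lambda
mle <- function(x) optimize(ll_trunc, c(0.01, 10), x = x, maximum = TRUE)

ll_alt  <- mle(old)$objective + mle(new)$objective  # separate rates per group
ll_null <- mle(c(old, new))$objective               # one common rate

lrt <- 2 * (ll_alt - ll_null)                       # LR statistic
pchisq(lrt, df = 1, lower.tail = FALSE)             # p value, chi-squared with 1 df
```

The p value comes out very small, in line with the z-test above.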

But then I saw that the variance was much lower than the mean, so I thought (incorrectly!) that it could not be Poisson distributed. But when I did some simulations and took away the zeros, I noticed that the variance would indeed be lower than the mean. So it can be Poisson distributed (truncated at zero). But I have not done any goodness-of-fit tests or any QQ plot.

But the sample size is large (120 = 60 + 60), so by the central limit theorem one can expect the difference in the means to be approximately normally distributed (although the original data are skewed). Do a bootstrap simulation if you doubt it.
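A quick sketch of that bootstrap check, resampling each group with replacement and looking at the distribution of the difference in means:

```r
# Bootstrap the difference in mean number of surgeries
old <- c(rep(1,14), rep(2,25), rep(3,16), rep(4,5))
new <- c(rep(1,35), rep(2,18), rep(3,5),  rep(4,2))

set.seed(1)
boot_diff <- replicate(10000,
  mean(sample(old, replace = TRUE)) - mean(sample(new, replace = TRUE)))

hist(boot_diff)                       # roughly bell-shaped, as the CLT suggests
quantile(boot_diff, c(0.025, 0.975))  # percentile 95% CI for the difference
```

The percentile interval should land close to the normal-approximation CI of (0.33, 0.93) given above.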

I like the method of using a chi-squared test. But how do we compare different suggested methods? Usually by comparing the power of the tests. I have no proof, but I believe that the z-test is more powerful. The likelihood ratio test would give a very similar, or slightly better, result (higher power), since it is said to give a better approximation.

#### ondansetron

##### TS Contributor
It's hard to do a decent statistical analysis in Excel. Something like R or SPSS would allow you to do a proper analysis here.
I haven't used SAS University Edition specifically (because I have access to a paid, licensed version at my program), but I've heard good things from people who have used it. You might want to look into it, because it is free from SAS (at least in some cases). Any objections to using SAS University? I would imagine functionality might be slightly reduced from the full paid version, but it's one heck of a program (so I can't imagine the free version being totally lacking).

#### katxt

##### Active Member
Statistically that works but:
3) Poisson regression is easy and simple to implement so I'm not sure why this is necessary.
I'm not clear what you are regressing on what. Are you assuming that the proportions for old and new follow the same pattern over classes 1 to 4 except for different parameters, and that you are looking for a significant interaction in a linear model?
Perhaps you could just stick the data through the Poisson regression and let us know what you get.

#### CB

##### Super Moderator
I'm not clear what you are regressing on what. Are you assuming that the proportions for old and new follow the same pattern over classes 1 to 4 except for different parameters, and that you are looking for a significant interaction in a linear model?
Perhaps you could just stick the data through the Poisson regression and let us know what you get.
You are just regressing number of surgeries on condition.

Code:
#create data as vectors showing values for individual cases
new_cond = c(rep(0, times = 14+25+16+5), rep(1, times = 35+18+5+2))
ops = c(rep(1, times = 14), rep(2, times =25), rep(3, times = 16), rep(4, times = 5),
rep(1, times = 35), rep(2, times =18), rep(3, times = 5), rep(4, times = 2))

#describe differences in number of surgeries by condition
aggregate(ops, by = list(new_cond), FUN = mean) #fewer surgeries with new method
aggregate(ops, by = list(new_cond), FUN = sd)

#Poisson regression
fit = glm(ops ~ new_cond, family= poisson)
summary(fit)
#Suggests significant negative "effect" of condition, i.e., new method results in fewer surgeries
#Note deviance < df which suggests underdispersion (less variability in the DV than
#predicted by the Poisson model). This is presumably due to the lack of zeroes in the data.
#It is an assumption violation, albeit unlikely to be very harmful.
confint(fit)

t.test(ops ~ new_cond) #conventional t-test says statistically significantly fewer surgeries in new method condition.
#But errors obviously not normal given count data.

library(lmPerm) #so let's use a permutation test instead
summary(aovp(ops ~ new_cond)) #same conclusion with permutation test
That's three different reasonable methods providing similar conclusions. Each directly tests the research question of interest (does the new method result in fewer surgeries?).

The next thing would be to control for baseline characteristics of patients, since treatment isn't randomised. But I don't know what data is available on that.