# Stuck with my darn P value

#### PKrazda

##### New Member
Hey guys

So I've been doing some medical research and I have a great result but no idea how to approach working out the P value.

Basically, with a new technique we can reduce the number of subsequent (revisional) operations a patient needs. The data is as follows:

Old method
1 operation - 14
2 operations - 25
3 operations - 16
4 operations - 5

New method
1 operation - 35
2 operations - 18
3 operations - 5
4 operations - 2

So we're really pleased that using the new method patients are needing fewer operations. I'm trying to write this up and get some sexy statistics with p values, CIs and all that, but I've been trying to teach myself how to do it for god knows how long and I'm real stumped...

Is there anyone who can help me and explain how to do this? (In non-statistician language.)

Thanks!!!

#### PKrazda

##### New Member
Or I guess my next question would be... If a p value isn't appropriate here, what's the best way to present this data statistically?

Thanks again!!!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
So you have 60 patients in the historic group and 60 in the new method group and the above represent the counts of number of repetitive interventions that they needed?

Was the method randomized? Can you be confident that there are no apparent or latent confounders?

I am thinking a Poisson regression would be your best bet given your basic description. It is not a simple procedure to just jump into if you have a limited statistical background. Also, it might not control for repeated events. Please tell us more about your study!
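To make the Poisson suggestion concrete, here is a minimal sketch using only the counts from the first post. It assumes the outcome is the number of revisional operations per patient (total operations minus one), that group is the only predictor, and that a Wald interval is acceptable. With a single binary predictor the fitted rate ratio is simply the ratio of the group means, so no model-fitting library is needed. The function names are illustrative, and nothing here deals with confounding or the non-randomised design.

```python
import math

# Counts from the original post: {total operations: number of patients}.
old = {1: 14, 2: 25, 3: 16, 4: 5}
new = {1: 35, 2: 18, 3: 5, 4: 2}


def revisions(table):
    """Return (patients, total revisional operations); revisions = operations - 1."""
    patients = sum(table.values())
    events = sum((ops - 1) * n for ops, n in table.items())
    return patients, events


def poisson_rate_ratio(old_table, new_table, z_crit=1.96):
    """Rate ratio (new vs old) with a Wald 95% CI and a two-sided p value.

    For a Poisson regression with one binary predictor, the estimated rate
    ratio equals the ratio of group means, and the standard error of its
    log is sqrt(1/events_old + 1/events_new).
    """
    n_old, e_old = revisions(old_table)
    n_new, e_new = revisions(new_table)
    rr = (e_new / n_new) / (e_old / n_old)
    se = math.sqrt(1 / e_old + 1 / e_new)
    z = math.log(rr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    lo = math.exp(math.log(rr) - z_crit * se)
    hi = math.exp(math.log(rr) + z_crit * se)
    return rr, lo, hi, p


rr, lo, hi, p = poisson_rate_ratio(old, new)
print(f"rate ratio = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}, p = {p:.4f}")
```

A rate ratio below 1 means fewer revisions per patient under the new method. Because no patient has more than three revisions, the Poisson assumption that the variance equals the mean is only approximate here.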

#### PKrazda

##### New Member
Thanks for getting back to me!

That's correct, we have 60 patients since a point in time at which we implemented the new technique. Fortunately we have a prospectively kept database of all patients, demographics and outcomes from which we collected data on the previous 60 patients having the same operation.

And yep, you're right about the repetitive operation aspect. 'Operation one' was the initial operation and the subsequent operations are each to revise the initial op. So those having 'four operations' have had their initial op and 3 subsequent ops to alter/adjust the outcome of the first.

It wasn't randomised as the surgeon changed practice at the beginning of 2014 and our study is focusing solely on operations done by them to eliminate operating surgeon as a variable.

The study itself is looking at the aesthetic outcomes of a reconstructive surgical procedure. Quite interesting and if things go well could possibly change practice!

I've had a look at Poisson regression and it looks a little out of my comfort zone, but if you think it's the way to go I'll read around it and see if I can grasp it well enough to use in our study. Any additional suggestions for ways to represent this kind of data? Was I right in that a p value wouldn't be much use here? I seem to see it used quite a lot representing this kind of data (i.e. 2 data sets, one pre and one post intervention), but can never really grasp what it's trying to demonstrate.

#### CB

##### Super Moderator
Can I strongly suggest that you pre-register your protocol for data collection and analysis ahead of time for your next clinical trial? When you collect data and only then start to think about the analysis, there is a major risk that (consciously or subconsciously) your data analytic decisions will be guided by whether a particular analysis method provides the findings you're hoping to see. It can be OK to figure out the analysis decisions as you go in an exploratory fashion in some contexts, but in clinical trials it's a really, really big no-no nowadays.

> Was I right in that a p value wouldn't be much use here? I seem to see it used quite a lot representing this kind of data (i.e. 2 data sets, one pre and one post intervention), but can never really grasp what it's trying to demonstrate.

I do suspect that if you're unsure about general statistical concepts, you would probably benefit from collaborating with a statistician on this project.

Is there any possibility of having a statistician join your project team here? Not knowing statistical concepts like p values is totally understandable, but clinical trials are important high-stakes research - it is important to get this stuff right. Having an expert on board could help a lot.

#### PKrazda

##### New Member
Hi CowboyBear, thanks for getting back to me too

I take your point and am fully aware of interpretation bias when looking at data in this fashion. Unfortunately we just don't have anything close to the kind of funding one would need for a randomised controlled trial. Retrospective data analyses such as this are used by a huge number of people who want to push forward with ideas but do not have the means to use what in essence would be the 'right' way to do it. When presenting our data we will of course be upfront about the nature of data collection and the methodology used throughout, and it's up to the reader to appreciate the inherent limitations in such a study.

Unfortunately whilst I would love the involvement of a statistician in our study, there is no chance that can happen as our institute just cannot stretch to it and we have experienced this with similar studies.

On that note, would you be able to put forward any final suggestions on how to present this data in the most statistically effective way? I know it may be tricky given the simplicity of our data presentation.

Thanks again for your responses so far!

#### CB

##### Super Moderator
> Hi CowboyBear, thanks for getting back to me too
>
> I take your point and am fully aware of interpretation bias when looking at data in this fashion. Unfortunately we just don't have anything close to the kind of funding one would need for a randomised controlled trial. Retrospective data analyses such as this are used by a huge number of people who want to push forward with ideas but do not have the means to use what in essence would be the 'right' way to do it.

Oh totally, I understand that not everyone can afford to run RCTs (my dept certainly can't!). But even if you have a design that has, say, a non-randomised control group, it's still possible to pre-register your study protocol and analysis plan. That is, if the study is planned before data collection takes place. It's just something to think about for next time.

> When presenting our data we will of course be upfront about the nature of data collection and the methodology used throughout and it's up to the reader to appreciate the inherent limitations in such a study.

This sounds great, but you can help the reader further by also reporting several different ways of analysing the data so that they can see how robust the results are to different analysis techniques. In this case a Poisson regression seems like a good starting point, but maybe a negative binomial could work too, and you could even have an OLS/linear regression for comparison.
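As one example of such a comparison analysis, the OLS/linear option reduces to a difference in mean revisions between the two groups when group is the only predictor. The sketch below assumes the outcome is revisions per patient (operations minus one), uses the pooled-variance standard error that OLS would give, and takes a normal approximation (1.96) for the 95% interval rather than the exact t quantile. The function names are made up for illustration.

```python
import math

# Counts from the original post: {total operations: number of patients}.
old = {1: 14, 2: 25, 3: 16, 4: 5}
new = {1: 35, 2: 18, 3: 5, 4: 2}


def expand(table):
    """Turn a {operations: patients} table into one revision count per patient."""
    return [ops - 1 for ops, n in table.items() for _ in range(n)]


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, variance


def mean_difference(old_table, new_table, z_crit=1.96):
    """Difference in mean revisions (new minus old) with an approximate 95% CI."""
    a, b = expand(old_table), expand(new_table)
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    diff = mean_b - mean_a
    # Pooled variance, as in OLS with a single group indicator.
    pooled = ((len(a) - 1) * var_a + (len(b) - 1) * var_b) / (len(a) + len(b) - 2)
    se = math.sqrt(pooled * (1 / len(a) + 1 / len(b)))
    return diff, diff - z_crit * se, diff + z_crit * se


diff, ci_low, ci_high = mean_difference(old, new)
print(f"difference = {diff:.2f} revisions per patient, "
      f"approx. 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

A negative difference means fewer revisions per patient under the new method. If this and the count-based models point the same way, that is some reassurance that the conclusion does not hinge on one modelling choice.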

#### katxt

##### Active Member
On the face of it, it is hard to see what is wrong with a Chi square test.
The data is independent and each patient appears in one cell only. It's true that the test doesn't take into account the ordered nature of the data, but if you get a significant p value (and you will) that doesn't matter. Probably you should accumulate the 3 and 4 op patients into one 3+ group because the counts are low.
I'm trying to imagine the objections a referee might bring up.
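The pooled test described above can be checked by hand. This is a sketch in plain Python, assuming the 3 and 4 operation groups are combined into "3+" as suggested. It relies on the fact that with 2 degrees of freedom the chi-square upper tail probability is exactly exp(-x/2), so no statistics library is required.

```python
import math

# Rows: old method, new method. Columns: 1, 2, 3+ operations (3 and 4 pooled).
observed = [
    [14, 25, 16 + 5],
    [35, 18, 5 + 2],
]


def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat


stat = chi_square_statistic(observed)
# df = (2 - 1) * (3 - 1) = 2, and for 2 df the upper tail is exp(-stat / 2).
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.2f}, df = 2, p = {p_value:.4f}")
```

This only says that the two distributions differ somewhere; it does not use the ordering of the categories.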

#### CB

##### Super Moderator
> It's true that the test doesn't take into account the ordered nature of the data, but if you get a significant p value (and you will) that doesn't matter.

It kinda sounds like you're saying that if the chi-square is significant then report that, but if it isn't then find something else? That's p-hacking, no?

It's possible to run a chi-square here yes, but it doesn't seem like an ideal choice in the sense that a chi-square test doesn't answer the question posed in this research (it looks at whether there is some difference between the two distributions, not whether the new technique was associated with fewer subsequent surgeries).

#### noetsi

##### No cake for spunky
> Can I strongly suggest that you pre-register your protocol for data collection and analysis ahead of time for your next clinical trial? When you collect data and then only start to think about the analysis there is a major risk that (consciously or subconsciously) your data analytic decisions will be guided by whether a particular analysis method provides the findings you're hoping to see.

Of course, that probably describes, oh, about 90 percent of social science analysis [can't speak for medical analysis]. Given that in most non-economic areas there is no theory to test, even academics are going to just keep playing with the data until they find something that is significant.

Even if you state your assumptions ahead of time, when they don't pan out you are going to run follow up theories....

#### katxt

##### Active Member
True, the Chi square test says there is some difference somewhere. Having established that, then you can do post hoc tests at each level, again using Chi square. (This is exactly the process you follow with an anova. Include Bonferroni if you like.)
So, here's a plan. Make three groups 1, 2, 3+ vs new, old, and test for some difference somewhere with a 3x2 Chi square. p = 0.0002 so yes, there is a difference somewhere.
At level 1, test old 14 vs new 35 with 1x2 Chi square. p = 0.003, so yes, new is significantly higher.
At level 2, test old 25 vs new 18 with 1x2 Chi square. p = 0.29, so can't really tell.
At level 3, test old 21 vs new 7 with 1x2 Chi square. p = 0.003, so yes, new is significantly lower.
It convinces me.
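For anyone wanting to reproduce the level-by-level tests, here is a rough sketch. It assumes each 1x2 test is a goodness-of-fit test against an equal split, which is reasonable only because both groups have 60 patients. With 1 degree of freedom the chi-square tail can be written with `math.erfc`, so no extra library is needed. The function name is made up for illustration.

```python
import math


def one_by_two_chi_square(count_a, count_b):
    """Chi-square test of two counts against an equal split (1 df)."""
    expected = (count_a + count_b) / 2
    stat = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail for 1 df
    return stat, p


# (old, new) patient counts at each level, with 3 and 4 operations pooled.
levels = {
    "1 operation": (14, 35),
    "2 operations": (25, 18),
    "3+ operations": (21, 7),
}

results = {name: one_by_two_chi_square(o, n) for name, (o, n) in levels.items()}
for name, (stat, p) in results.items():
    print(f"{name}: chi-square = {stat:.2f}, p = {p:.4f}")
```

Run on these counts, the first two comparisons agree with the figures quoted above, while the 21 vs 7 comparison appears to come out nearer 0.008 than 0.003. That is still below 0.05, including after a Bonferroni adjustment for three tests.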

#### CB

##### Super Moderator
> Of course that probably describes oh about 90 percent of the social science analysis [can't speak for medical analysis]. Given that in most non-economic areas there is no theory to test even academics are going to just keep playing with the data until they find something that is significant

You don't need sophisticated theory to do an effective pre-registered study. In the case of a clinical trial the theory isn't necessarily going to be any more complicated than "the intervention works". The point is to specify your protocol for data collection and analysis prior to data collection so that readers don't have to wonder about whether you tailored your analysis in such a way as to produce significant effects.

Exploratory research is fine, but exploratory research dressed up as a confirmatory work and supported by p-hacking is a waste of everyone's time and money.

> Even if you state your assumptions ahead of time, when they don't pan out you are going to run follow up theories....

Uh, no. There are a lot of people out there who are aware of the reproducibility crisis and doing strong pre-registered studies (and accepting the results if the theory isn't supported) - there's effectively something of a revolution underway in terms of improving scientific practice in medicine and the social sciences. Try googling "registered replication reports".

For those who want to hunt around and explore follow-up theories that's fine, but then you need to label the work as exploratory and use suitable methods (generally not statistical significance testing).

#### CB

##### Super Moderator
> True, the Chi square test says there is some difference somewhere. Having established that, then you can do post hoc tests at each level, again using Chi square. (This is exactly the process you follow with an anova. Include Bonferroni if you like.)
> So, here's a plan. Make three groups 1, 2, 3+ vs new, old, and test for some difference somewhere with a 3x2 Chi square. p = 0.0002 so yes, there is a difference somewhere.
> At level 1, test old 14 vs new 35 with 1x2 Chi square. p = 0.003, so yes, new is significantly higher.
> At level 2, test old 25 vs new 18 with 1x2 Chi square. p = 0.29, so can't really tell.
> At level 3, test old 21 vs new 7 with 1x2 Chi square. p = 0.003, so yes, new is significantly lower.
> It convinces me.

I'm sorry but I don't think this is at all an appropriate analysis. The study has a simple question - whether the intervention results in fewer subsequent surgeries. By trying to jam this into a chi-square framework you're ending up with multiple significance tests and having to jury-rig in a solution for familywise type 1 error. This just isn't the right way to do it - the OP needs a count-based regression model.

#### noetsi

##### No cake for spunky
To be clear I am not saying you should p hack. I think it is common. I know that at work when I test models that don't pan out I don't simply stop analyzing the data. I pull more variables in, transform the variables, create new theories, and use the same data I had, because it is the only data there is. In the world I live in you can't go out and gather new data because your original assumptions and theories were wrong. You have to try new theories you did not anticipate and you usually don't have the option of gathering new data. You either use what you have or you do no analysis. I suspect most use what they have even though that is problematic.


#### CB

##### Super Moderator
> I think academics who invest significant amount of effort in reports are not going to report they found nothing.

This is true of some or even many academics but there also exist many academics who can and do report null findings (in both replications and original studies). Here's a massive example of many researchers working on a project that resulted in lots of null findings: http://science.sciencemag.org/content/349/6251/aac4716

Honestly, look up search terms like "reproducibility" or "pre-registered" or "questionable research practices" and you're going to find that this is a massive area of effort and change at the moment. Lots of us care about good science.

> It never came up in one of my statistical classes

Pre-registration and reproducibility are covered in the one I teach. They are also covered in Daniel Lakens' MOOC, which had over 5000 students in its last iteration.

#### noetsi

##### No cake for spunky
For the record, what CWB is saying is arguably the correct way to do statistics, even if it's probably not done that way, IMHO. At the least I think, although I again don't think this is done commonly, that when you test many models you should adjust the nominal alpha level by some type of correction to address family-wise error. But I do not believe this happens, and in fact I have never seen it brought up.
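As a rough illustration of what such a correction looks like, here is a sketch of two common adjustments, Bonferroni and Holm, applied to a list of p values. The three inputs are just the post hoc p values quoted earlier in the thread, used as example numbers. The function names are made up for illustration.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment: never less powerful than Bonferroni."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values in the same order.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


raw = [0.003, 0.29, 0.003]  # example p values taken from the thread
print("Bonferroni:", [round(p, 3) for p in bonferroni(raw)])
print("Holm:      ", [round(p, 3) for p in holm(raw)])
```

An adjusted p value is compared against the usual alpha (e.g. 0.05). This is equivalent to shrinking alpha for each test instead.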

In my case, and I think this is true of most practitioners, I am not testing theory. There is no theory to test. I want to know what causes Y. So I pull in a wide range of variables based on a hunch and see what generates the best models. This is common in the academic literature. Stepwise regression (or the use of Rsquare change methods and similar approaches) is not only common in the literature, it is taught in classes (it certainly was in my graduate classes in statistics).

Your argument, which is correct of course, essentially rules out exploratory analysis. You test theory, and if it does not pan out you gather new data and start over. I don't think that is reasonable, particularly outside academia, where they are not going to wait months for results if your original guesses are wrong. I think you have to trade off getting some results against the danger that you found the results by chance.

It might be noted I commonly conduct data analysis on populations of tens of thousands which are arguably whole populations not samples. And that I understand CWB's point. I think there has to be a trade off between his concerns and the need to generate results in exploratory analysis when you need at least a theory of what is occurring. Once you create a theory then you can test it with new data, if new data can be obtained which it can not in my case.


#### CB

##### Super Moderator
> I want to know what causes Y. So I pull in a wide range of variables based on a hunch and see what generates the best models.

This is fine as long as you report clearly that this is how you found your models, and you then cross-validate them on new data (because they will fit more poorly in new data than on the data they were trained on).

There's also the correlation != causation issue there, but save that for another day.

> This is common in the academic literature.

Indeed, but whether or not something is common is irrelevant to the issue of whether it's good scientific practice.

> Stepwise regression (or the use of Rsquare change methods and similar approaches) are not only common in the literature they are taught in classes (my graduate classes in statistics certainly did).

Stepwise regression is a fundamentally flawed technique that produces biased estimates, and its shortcomings have been repeatedly discussed in the literature (e.g. here) and in my posts on the forum. It may have been taught in your classes, but that doesn't make it a sensible technique to use. Please don't advocate it on the forums unless you have an actual principled argument for why its shortcomings don't matter. Even if you want to use the data to select a model, there are much better ways than stepwise regression to do so (e.g., various information criteria, Bayesian variable selection, etc.).

> Your argument, which is correct of course, essentially rules out exploratory analysis.

I'm glad you agree with my argument. As I said in a post above, exploratory research is fine, and we've been over this point on the forums before too. It's also something that has been covered repeatedly by the researchers advocating pre-registration (e.g., here), because lots of people raise this objection when they hear about pre-registration for the first time. Exploratory research can be valuable: The point is simply to pick methods that are appropriate for exploratory work (i.e., typically not significance testing), and most especially not to report exploratory work as if it was confirmatory (e.g., hypothesising after the results are known).

#### noetsi

##### No cake for spunky
I don't consider myself a valid judge of what is right or not in statistics. I will point out, as someone who spent a significant portion of his life in academia, that there is a wide range of peer-reviewed articles that have stepwise regression and similar approaches in them. So their peers, who are wiser than I, thought they were legitimate. I would point out that if you throw out p values, that throws out most statistical approaches like GLM, unless you are arguing you should run those and ignore p values.

I do not work with samples. I work with whole populations of members in specific federal programs. There is no other data to collect. That is, no other people are in the federal programs I analyze, so no new data can be collected. I literally have access to everything that exists on such programs or can exist. So there is no way to go out and gather new data even if that was desired.

It is doubtful whether p values even apply to my analysis since I have population effects. And the non-statisticians I report to are not really interested in my research approach - commonly they tell me to take out what I put in on that. What they refer to as "esoteric statistical details" or "behind the curtain methods." They just want results, I worry about doing things right.

#### CB

##### Super Moderator
> I don't consider myself a valid judger of what is right or not in statistics. I will point out as someone who spent a significant portion of his life in academics that there are a wide range of peer reviewed articles that have stepwise regression, and similar approaches in them. So their peers, who are wiser than I, thought they were legitimate.

Don't assume they're wiser than you! Your time on this forum means you know more about stats than the people writing a lot of articles. Assuming something is valid just because everyone else is doing it doesn't work. That's part of the reason why we have a reproducibility crisis in science right now, where people are realising that a lot of supposedly well-established findings are complete bollocks.

> It is doubtful whether p values even apply to my analysis since I have population effects.

If you only want to make conclusions about correlations between observed variables in your population, then inferential statistics aren't necessary. If you want to make inferences about causal effects in your population then you do need inferential statistics, because you can't directly observe causal effects and uncertainty applies to their estimates.

> They just want results, I worry about doing things right.

Good!

#### katxt

##### Active Member
Sometimes you have to choose between a simple, valid analysis and a possibly more powerful but much more complex analysis that is hard to understand and explain, particularly if the simpler analysis gives you the answer you want. You make using "multiple significance tests and having to jury-rig in a solution for familywise type 1 error" sound like something amateurish that should be avoided at all costs by researchers, but that is exactly how an anova works.