Statistical Authority


mad.casual
09-04-2008, 03:43 PM
This question borders on the metaphysical, but metaphysics always makes for good general discussion. Anyway, how does one know, statistically speaking, when one is right, and what should one do about it? I'll give an example that has plagued me throughout my career and is, at this point, pretty general:

I'm a biochemist with a more quantitative/analytical bent. For the myriad bioanalytical tests that you encounter, there are regulating agencies (FDA, CAP, ISO, NCCLS,...) that set the standards you have to meet in order to have a 'valid' test. As you can imagine, entrepreneurial individuals will bend/break any statistical rule they can in order to meet these standards (running a hundred samples and reporting the best forty, controlling validation samples in ways that you couldn't possibly control customer samples, etc.). Humorously enough, these agencies often account for some of this, allowing for trial runs, asking for the best 95% of the data... and still the rules are skirted.
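To make the "run a hundred, report the best forty" practice concrete, here is a minimal simulation sketch. It assumes that "best" means "closest to the nominal target value"; the function name `best_subset_sd`, the target of 100, and the true SD of 10 are hypothetical choices for illustration, not figures from any agency or assay.

```python
import random
import statistics


def best_subset_sd(n_run=100, n_keep=40, target=100.0, true_sd=10.0, seed=0):
    """Simulate n_run measurements and keep only the n_keep closest to target.

    Returns (SD of all results, SD of the reported subset).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    results = [rng.gauss(target, true_sd) for _ in range(n_run)]
    # Assumption: "best" = smallest absolute deviation from the target value.
    kept = sorted(results, key=lambda x: abs(x - target))[:n_keep]
    return statistics.stdev(results), statistics.stdev(kept)


sd_all, sd_kept = best_subset_sd()
print(f"SD of all 100 runs:     {sd_all:.2f}")
print(f"SD of the 'best' forty: {sd_kept:.2f}")
```

The reported subset looks far more precise than the assay really is, even though nothing about the assay changed. Only the selection rule did.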

I only ask because I perceive a sort of glass ceiling between those who say to toss data (or actually do the tossing) and those who generate the data. I'm interested in advanced degrees in computing and statistics with a possible career switch, but I can't believe the heavens part and the data magically align just because you have an advanced degree or a degree in math/stats.

To use a metaphor, an Imperial Stormtrooper is looking for general advice from some statistics Jedi. Any around that would chip in their $.02?

JohnM
09-04-2008, 06:07 PM
Oh yeah, it's good to be on this side of the ceiling. :D

TheAnalysisFactor
09-04-2008, 08:31 PM
Ahh, a topic near and dear to my heart (or maybe gut).

I only did a brief stint in doing statistics in industry, but I encountered something similar when in academia. I started out in a PhD program in experimental psychology, and I saw pretty much the same thing. The only difference there is it was all about getting p<.05 so you could get published.

My advisor was particularly "creative" in data analysis, and suggested we do things, for example, like remove all the males from the study post-hoc because the effect was significant only in females. When I suggested we put this in a footnote of the paper, I was laughed at.
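Why post-hoc subgroup removal is a problem can be shown with a small simulation. The sketch below assumes a study with no true effect at all, then counts how often at least one of three analyses (everyone, females only, males only) reaches p < .05. The helper names `two_sample_p` and `simulate` and all sample sizes are hypothetical, and the p-value uses a normal approximation rather than an exact t distribution.

```python
import random
from statistics import NormalDist


def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / (va / na + vb / nb) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_sims=2000, n_per_cell=50, alpha=0.05, seed=1):
    """Return (false-positive rate of the planned test,
    false-positive rate when any of three tests may be reported)."""
    rng = random.Random(seed)
    hits_overall = hits_any = 0
    for _ in range(n_sims):
        # No true effect: all four cells come from the same distribution.
        cells = {(sex, arm): [rng.gauss(0, 1) for _ in range(n_per_cell)]
                 for sex in ("F", "M") for arm in ("treat", "ctrl")}
        p_all = two_sample_p(cells["F", "treat"] + cells["M", "treat"],
                             cells["F", "ctrl"] + cells["M", "ctrl"])
        p_f = two_sample_p(cells["F", "treat"], cells["F", "ctrl"])
        p_m = two_sample_p(cells["M", "treat"], cells["M", "ctrl"])
        hits_overall += p_all < alpha
        hits_any += min(p_all, p_f, p_m) < alpha
    return hits_overall / n_sims, hits_any / n_sims


rate_overall, rate_any = simulate()
print(f"planned test only:        {rate_overall:.3f}")
print(f"best of all/females/males: {rate_any:.3f}")
```

The planned test stays near its nominal error rate, while picking whichever subgroup "worked" pushes the false-positive rate well above it. That is the reason the subgroup choice at least belongs in a footnote.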

Another professor admitted that in a series of three studies, two made a nice theoretical story, but the third completely contradicted it. She dropped the third and published the other two.

I ended up leaving that program and switching to get a masters in statistics, pretty much in disgust. This wasn't the only reason, but it was part of a general package. Part of my motivation for taking more and more statistics was that I didn't feel like I understood enough to argue against these practices (even though some were pretty clear). I actually threw away all the data for one study just before I left, so it couldn't be published (it had been accepted with revision). I didn't want my name on it.

The only thing I can say in defense of their approach is that I don't think they really understood the statistical principles involved. I think they believed so much in their theory that the data seemed to get in the way, and statistical principles were just annoying roadblocks. They would really be horrified, or at least uncomfortable, with the idea that they were "cheating" or making up data.

All I can say is that once I really learned statistics, by getting the higher degree, I really got how important things like random sampling really, really are.

I have since worked as a statistical consultant at a university far, far away, and never once encountered this kind of thinking. Maybe they're doing better research there. Maybe they're smart enough not to mention creative data analysis to a statistician. But they're still getting good results. So I don't think everyone's so creative.

The other part is as the statistician, I don't care if p<.05. I just care that p is measuring what it is supposed to. I don't need the publications the way the researchers do. And I've been in their shoes--the pressure is incredible. Maybe when the pressure is high enough, regulations are just inconvenient, so stormtroopers just blast 'em.

Okay, I just thought of one other thing--now that I know statistics better, I have more tools available to me. Back in my psych program, everyone used ANOVA on everything. I mean everything. Maybe an occasional chi-square would pop up. No one knew much else, reviewers included. There are times you can look at data a different way, and find what you were looking for, or sometimes something more interesting. Not in a toss-data way, but in a transform-a-variable sort of way. Or you realize the relationship isn't linear, but if you categorize the IV, you find significance. Maybe when you don't have a light saber, you have to blast everything to get anything done?

My $.02.

Karen

JohnM
09-04-2008, 11:16 PM
All kidding aside (and please excuse my tongue-in-cheek response above - I come from a long lineage of kidders, so it's genetic), what they are doing by shaving off inconvenient data - let's call it what it is - is fraud. Period.

Maybe I'm being a bit naive, but no amount of pressure justifies squirreling away inconvenient data - I know I sleep well at night knowing that I haven't lied in a technical report or shaded the view on something to my boss.

I'm actually saddened by the way data is thrown out with the aim of getting a significant p-value. It corrupts the research process, and prevents us from reaching the truth as quickly as we could have...and I'm a firm believer that there is something to be learned from "insignificant" results....

...and as a professional statistician, that is the toughest part of it - you seek the truth, and others seek to further their agendas. It never ceases to amaze me how two people can look at the same data and see completely different things.

Unfortunately, things won't change much by getting a stats degree or changing careers - you won't all of a sudden have a magical insight - but you will be better armed - you will know what questions to ask, and you will help others to get better at asking questions of themselves, especially around articulating their research objectives, narrowing the questions, etc.

It's possible that a lot of the "inconvenient" data is merely a result of a poorly constructed design, a poorly controlled experiment (if I had a nickel for every "control" that really wasn't a "control"), or maybe a less-than-perfectly articulated research question - and this is where you can swoop in, help, and make a difference.

mad.casual
09-05-2008, 12:25 PM
Thanks to you both.

Karen, for some of the companies I've worked for and some of the work I've done, you're right. Too many bad guys to stick a light saber in each one. But I've definitely been in some situations where the time isn't taken to do it right, but rather again, and again, and again.

JohnM, I knew it! I can't see the ceiling, but I knew it had to be there. Anyway, in some of the positions I've worked, fraudulent numbers on a report to a boss is the snowball that leads to an avalanche of misdiagnoses and product recalls. Good to know others take/took these analyses as seriously as I'm starting to and still manage to work with people.

mad.casual
09-05-2008, 02:12 PM
There are times you can look at data a different way, and find what you were looking for, or sometimes something more interesting. Not in a toss-data way, but in a transform-a-variable sort of way. Or you realize the relationship isn't linear, but if you categorize the IV, you find significance.

Thinking about this a little bit more;

I don't know as much about the social sciences, but I think this happens because many of us (myself included) don't always know how to reconcile determinism with stochasticity, and we wind up staggering around between the two (the fog of research?). I've generally been accepting of this, but there is definitely some significant abuse that happens.

TheAnalysisFactor
09-06-2008, 10:16 PM
Thanks to you both.

Karen, for some of the companies I've worked for and some of the work I've done, you're right. Too many bad guys to stick a light saber in each one. But I've definitely been in some situations where the time isn't taken to do it right, but rather again, and again, and again.

Yeah, I've seen the do-it-again-and-again approach myself. Been stuck in it. Maybe I'm being too generous (and I've never been accused of optimism), but I really believe it isn't a matter of "bad guys." It's more about ignorance than intentional fraud. There are a LOT of people who know just enough statistics to be dangerous.

And I don't know about industry--my whole experience has been academia--but very few universities have quality statistical support for researchers. Even when there is a statistics department on campus, those faculty do not have the time to help other researchers, so those researchers are on their own.

And as for the pressure issue, I agree it's not an excuse for fraud. But a lot of pressure makes it really easy to become blind to your own bad behavior. You see only the greater good that comes out of the bad behavior. And when you don't really know what to do and there is no statistician to ask--just your colleagues who know about as much as you do--it's easy to not take statistical assumptions too seriously.

Who knows, maybe I was in a really awful department where it was actual fraud. Wouldn't surprise me. It was pretty dysfunctional in general. Or maybe it was that field. I've worked with a lot of other researchers since then who were interested in the truth. In 7 years as a consultant, I can only remember 1 or 2 consultations where the question was "how do I make my insignificant results significant?"

I don't know as much about the social sciences, but I think this happens because many of us (myself included) don't always know how to reconcile determinism with stochasticity, and we wind up staggering around between the two (the fog of research?). I've generally been accepting of this, but there is definitely some significant abuse that happens.

Yes, I agree there is significant abuse that happens. But my point here was simply that if you only know a few statistical methods, you're going to miss real results because you're taking the wrong approach. If you always do a median split on an IV because you know how to do ANOVA but not regression, you might miss something. If you always do linear regression because it's all you know, but the real relationship in the data is non-linear, you're going to miss it. And if you know you don't know what you're doing, you often give up too soon.
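The point about a purely linear analysis missing a non-linear relationship can be sketched with simulated data. The example below assumes a U-shaped relationship (y depends on x squared, plus noise); the data, the seed, and the helper `pearson_r` are all made up for illustration and do not come from any study mentioned in this thread.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2)
# Hypothetical data: the IV is symmetric around zero, the DV is U-shaped in it.
x = [rng.uniform(-3, 3) for _ in range(500)]
y = [xi ** 2 + rng.gauss(0, 1) for xi in x]

r_linear = pearson_r(x, y)                           # straight-line view
r_squared_term = pearson_r([xi ** 2 for xi in x], y)  # after transforming the IV

print(f"r(x, y)   = {r_linear:.2f}")
print(f"r(x^2, y) = {r_squared_term:.2f}")
```

A straight-line fit sees almost nothing here, while the transformed variable shows a strong relationship. No data were tossed; the same observations were simply modeled with a shape that matches them.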

On a personal level, a career change was the absolute best thing for me. Much more personally satisfying to have the purpose of seeking the truth and to help others find it than to be fighting for truth in a place that isn't interested. There is nothing more satisfying than helping someone understand something that was overwhelming them.

Karen

JohnM
09-07-2008, 08:12 AM
And I don't know about industry--my whole experience has been academia--but very few universities have quality statistical support for researchers. Even when there is a statistics department on campus, those faculty do not have the time to help other researchers, so those researchers are on their own.

It's the same in industry. I work for a very large company, and there is a stats support department, but there simply isn't enough help to go around - there never will be. I work in QA and occasionally get asked to consult because of my stats background, but I simply don't have enough time to help everyone.

Who knows, maybe I was in a really awful department where it was actual fraud. Wouldn't surprise me. It was pretty dysfunctional in general. Or maybe it was that field. I've worked with a lot of other researchers since then who were interested in the truth. In 7 years as a consultant, I can only remember 1 or 2 consultations where the question was "how do I make my insignificant results significant?"

It's rarely that blatant - most researchers, even if they don't know a ton of stats, are savvy enough to avoid making it look this obvious - generally they know how to "spin" it or better yet, re-adjust their theory / hypothesis to fit the data....

.....oh, and don't forget about the ugly "p" word (politics)

Yes, I agree there is significant abuse that happens. But my point here was simply that if you only know a few statistical methods, you're going to miss real results because you're taking the wrong approach. If you always do a median split on an IV because you know how to do ANOVA but not regression, you might miss something. If you always do linear regression because it's all you know, but the real relationship in the data is non-linear, you're going to miss it. And if you know you don't know what you're doing, you often give up too soon.

Yes, I agree - this isn't fraud, it's just ignorance, and that's not their fault.

Much more personally satisfying to have the purpose of seeking the truth and to help others find it than to be fighting for truth in a place that isn't interested. There is nothing more satisfying than helping someone understand something that was overwhelming them.

Karen, I totally agree. Very true from the perspective of working in QA, where we are expected to be independent and free of conflicts of interest - I feel like we can express our true opinion. And from the teaching perspective, yes, it's nice to have a positive impact on someone's effort to learn.

Silvanus
09-07-2008, 09:41 PM
Inaccurate stats are definitely a problem in the biological sciences, at least in my experience as a PhD student. Many researchers from my dept. who didn't completely understand what to do with their data or how statistical tests work tended not to seek help from our campus statisticians and instead made do with software like Prism or Excel. And the worst part is, this 'knowledge' was passed down from P.I. to grad student, and between grad students, creating a culture of statistical ignorance. Some even made their stats up altogether, for whatever reason. The most unconscionable example I can think of was a fellow PhD student who wasn't happy with the p-values his statistical test gave him, so he made up his own to suit his hypotheses. And this went undetected by his supervisor and thesis markers, and so now he's a postdoc contributing to 'cutting edge' international research. God help us. Still, there are QC checks. Respected peer-reviewed journals do have statisticians on board who would easily filter out submitted papers that were obviously questionable. What scares me is the proportion of those mistakes that aren't so obvious. Statistical methods definitely need more emphasis in grad-level teaching imho (at least in med sci).

I've often wondered what this sort of thing was like in industry though (from the point of view of someone considering a career shift to a pharmaceutical company). This is quite an interesting thread. Cheers.

IronMan
09-08-2008, 11:25 AM
Interesting subject. I have an undergrad degree in statistics and work in industry for an HR services firm. As the company statistical analyst, I have been asked by two different managers to take the same data and basically prove opposite conclusions... Or people saying there is significance in something that isn't there.

mad.casual, I think further statistical knowledge will help you use the appropriate tests/models for the type of data you work with. Unfortunately, superior knowledge also lets you cheat much more effectively.

JohnM
09-09-2008, 08:03 AM
As the company statistical analyst, I have been asked by two different managers to take the same data and basically prove opposite conclusions... Or people saying there is significance in something that isn't there.

Sort of supports my statement about adjusting hypotheses / theories to fit the data....

IronMan
09-10-2008, 02:10 PM
Yeah exactly.

I do some forecasting for the expected amount of work our clients submit to us in a given year... And because my forecasts didn't fit their expectations of what would happen, they got thrown out. Now because volume is lower than expected it is causing problems with subcontractors to whom we have legal obligations to give so much work.

Anyone have the experience of working with managers who don't trust statistical methods? Or are just dinosaurs?