Your Statistical Toolbox

bryangoodrich

Probably A Mammal
#1
In thinking about all the things I am learning and planning to learn, it occurred to me that I should get some insight from people with other perspectives and backgrounds into what they think belongs in their box of statistical tools. So, if you were to make a brief list of the skills, technologies, methodologies, etc., that you think every statistician should have on their utility belt, what would it be?

Part of the reason I'm thinking about this is that I want to improve my understanding of and facility with SAS and other technologies as I get the chance. However, it is pretty unrealistic to think I'll become nearly as good with them as I have become with R any time soon. Thus, I want to tailor my learning objectives to what I think would be optimal to know.

I think we can all agree on some basics. You should be able to do simple and multiple linear regressions including variations with response and independent variable transformations. You should be able to set up a one-way and two-way ANOVA and interpret the results. The list goes on.
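For anyone who wants the one-way ANOVA mentioned above spelled out, here is a minimal sketch in plain Python of the F statistic computed by hand. The function name `one_way_anova_f` and the toy data are made up for illustration; in practice you would lean on R, SAS, or a stats library rather than this.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)

    # Between-group sum of squares: how far each group mean sits
    # from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: spread of observations around
    # their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data: three groups of three observations each.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Interpreting the result means comparing F against an F distribution with those degrees of freedom, which is where a proper stats package earns its keep.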

The goal is ultimately to have a set of skills I can confidently say I know, and know how to apply with a variety of technologies. I want to have a strong resume or CV, but I don't want it to lose sight of the most important skills. Of course, that depends on the job and firm to which I would be applying. In general, though, a research analyst should have a set of skills that would impress anyone, and I want to pinpoint those!
 

spunky

Can't make spagetti
#2
bryangoodrich said: "Of course, that depends on the job and firm to which I would be applying"
if it's ok with you, i think if you could narrow this down a little bit more it would be very helpful... what kind of stuff would you be interested in doing in the future?

for instance, i can tell you that one of the things that's really hot in my area (educational/psychological measurement) is Item Response Theory and logit models, but if you're not in the social sciences or deal with a lot of survey design, it will certainly be entertaining to learn but it may or may not be very useful to you...
 

Dason

Ambassador to the humans
#3
I don't really know how to answer your question because there are very few things I think should be in every statistician's toolbox. I also don't know what level we're considering here - are we talking about a statistician with an undergraduate degree, a master's, or a PhD?

There are some basics that every statistician should know but I think you've covered those already in your post. After that though it gets really debatable about what everybody should know.

Now I could give a list of things I find incredibly useful and some topics I wouldn't be able to survive without. But I could walk down the hall* and find some people that would do perfectly well if I erased those topics from their memory.

* I'm not actually in the office right now so this isn't exactly true but you get the point.
 

jpkelley

TS Contributor
#4
I liked bryan's posting. It made me think about being very explicit about my own toolbox. For me, it's less about a statistical toolbox, so I had to think about this a bit.

Maybe a better way to put this: if you met someone at a dinner party who said they were a statistician at X company, what would you expect them to know? This should provide a basic list. Presumably, this would extend far beyond a list of statistical tests into the realm of data management/manipulation and data visualization.

Also, if enough people here give a list of top 5 or 10 statistical tests (or general skills related to statistics...like visualization), a consensus list could be generated.
 

Link

Ninja say what!?!
#5
I have to agree...though thinking about it, maybe he's just trying to get a feel for what tools are in our own boxes for what we do in our respective fields.

As a biostatistician and epidemiologist, a few tools completely essential to me are survival analysis (including logistic models, Cox proportional hazards, Kaplan-Meier, and life tables), GEEs, mixed models, SEMs, ANOVA, PCA, survival analysis, cluster analysis, spatial analysis, Poisson regression, bootstrapping, cross-validation, etc. (did I mention survival analysis?).

I'm sure I can probably think of more if I sit here and think about it further. Hope it gives you an idea though.
 

bryangoodrich

Probably A Mammal
#6
You all bring up good points, but I'd say jpkelley hit the nail on the head for this exercise, especially on the point that a statistician's toolkit should include data management and visualization. In fact, most of my skills are in data management, and I've got a strong background in visualization--I just lack the experience there.

It is true that the skill set a statistician requires will be shaped by the sort of work they do. Just compare Link's and spunky's examples: an epidemiologist with survival analysis versus a psychologist with survey design. With that said, there will also be commonalities between our tool sets. This intersection is important, and the overlap between fields will vary (e.g., economists and social scientists focus far more on linear regression than on ANOVA models). Nevertheless, my interest lies in the skill set that applies broadly. Certainly the few examples I gave were acceptable, but the list was far from complete. I didn't say anything about robust and weighted regressions, stepwise regression, and various basic design topics. If we get too advanced, we also risk getting too specific.

Another view might be, what sort of tasks do you find yourself repeating often? How much, say, bootstrapping and cross-validating are you doing in SAS? How much textual processing are you doing in SPSS? What sort of visualizations are you reporting regularly? These are just examples of the sort of questions we can ask. I'm leaving it to you to ask yourself some, to reflect on your experiences, and provide those answers.
 

Dason

Ambassador to the humans
#7
I'd like to point out that I'd like stepwise regression removed from the statistician's toolbox. I think there are better ways to accomplish the task it set out to do, and it is misused in a lot of situations.
 

Link

Ninja say what!?!
#8
lol. I still find it a good tool if, say, you have no idea about the topic and variables you are modelling (all your eyes see are Y's and X's). When using CV to find the best possible combination of models for prediction, I like adding stepwise regression as one of the libraries to be tested (just so I see how I'm doing with the other models in the library).
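To make the "use CV to compare a library of candidate models" idea concrete, here is a rough sketch in plain Python. It is not the actual workflow described above: the two candidates (a mean-only model and a straight-line fit), the helper names, and the toy data are all assumptions for illustration, and a stepwise procedure would simply be one more entry in the `candidates` dictionary.

```python
from statistics import fmean


def fit_mean(xs, ys):
    """Candidate 1: ignore x and always predict the training mean."""
    m = fmean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: simple least-squares line y = intercept + slope * x."""
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return lambda x: intercept + slope * x


def kfold_mse(fit, xs, ys, k=5):
    """Mean squared prediction error over k held-out folds."""
    n = len(xs)
    sq_err = 0.0
    for fold in range(k):
        # Deterministic folds: observation i belongs to fold i % k.
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - predict(xs[i])) ** 2 for i in test)
    return sq_err / n


# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]

candidates = {"mean only": fit_mean, "straight line": fit_line}
scores = {name: kfold_mse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

The point of the design is that every candidate is refit inside each fold, so the comparison is on held-out error rather than on how well each model fits the data it was trained on.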

 

bryangoodrich

Probably A Mammal
#9
Dason said: "I'd like to point out that I'd like stepwise regression removed from the statistician's toolbox. I think there are better ways to accomplish the task it set out to do, and it is misused in a lot of situations."
hahaha, I knew bringing that up would get a comment. I wasn't stating it as something that should be included, just an example of something that could be included.

Link said: "lol. I still find it a good tool if, say, you have no idea about the topic and variables you are modeling"
I would say there are probably other data mining and exploration techniques we could use to understand the relationships among the variables. Stepwise regression is really a way to sort of automate that exploration using a small set of parameters and conditions. At least, that's how I see the situation.
 

jpkelley

TS Contributor
#10
Re: stepwise. I agree with Dason...you get much clearer results (and much more conservative ones, in my opinion) by going the AICc model selection route. But stay away from automatic model averaging functions (say, in R).
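For reference, AICc is just AIC with a small-sample correction: AIC = 2k - 2 ln(L), and AICc = AIC + 2k(k + 1) / (n - k - 1), where k is the number of estimated parameters and n the sample size. A minimal sketch in plain Python follows; the function name, the log-likelihoods, and the parameter counts are invented for illustration and do not come from any real fit.

```python
def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC.

    log_likelihood: maximized log-likelihood of the fitted model
    k: number of estimated parameters
    n: number of observations
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidates: (maximized log-likelihood, parameter count).
candidates = {"model A": (-10.0, 2), "model B": (-9.5, 4)}
n_obs = 10

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # lower AICc is preferred
print(best, scores)
```

With so few observations the correction term dominates, so the slightly better likelihood of the larger model does not make up for its two extra parameters.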

I'll post my list later today. My girlfriend is making me take a break from TS so I can finish a draft of one of my dissertation chapters. Booooring.