Best programming languages for a statistician to know?

#1
Hello all,

I am contemplating branching out and learning a small handful of programming languages to supplement the M.S. in statistics I am currently working on. There seem to be a lot of such languages so I am interested in learning, at most, maybe 3 of the most commonly used ones (if nothing else, just to demonstrate to a possible employer that I am able to learn such languages).

I am already somewhat familiar with SQL since I took two courses on SAS (which sometimes uses SQL), but I have also read that statisticians sometimes make extensive use of C++, Perl, Python, MATLAB, etc. What would you all recommend I learn?
 

Jake

Cookie Scientist
#2
I see that R is not in your list. I hope that's because it is "so obvious" that you will want to learn R.
 
#3
Not all statisticians are great programmers, so in many professional organizations SPSS is more popular than R, as it is easier for everyone to work with. R is more for people not afraid to start with nothing. R is very popular also because it is free software. SAS and SPSS are commercial software packages, which makes them less attractive to students. R is used in many organizations along with SAS and SPSS.
 
#5
For implementing methods in data mining, Bayesian statistics, Markov Chain Monte Carlo and such, R is the best.

For asset pricing and other applications of statistical methods in Finance, Matlab and R are the two best packages. I would focus on them and try to stay away from SAS.

For my detailed comparison of five major statistical packages (R, Matlab, SAS, Stata, SPSS), see

http://stanfordphd.com/Statistical_Software.html

Cannot say much about pure IT tools, like SQL, Perl and such.
 
#6
I recommend the R programming language for statistics. It's nice and has good package support. You can also use Python: there are several packages that help with statistical learning in Python, such as pandas, numpy, scipy, scikit-learn, matplotlib, pylab, etc. There is also an interface to R from Python, rpy2 ( http://rpy.sourceforge.net/ ).

I am learning R myself at present, and all I can say is that it's awesome.
 

noetsi

Fortran must die
#7
It depends on whether you like to write code and where you will work. R has, according to its adherents (who include many in the statistical community), a number of nice features. But in practice it is tied entirely to writing code - lots of it. There is no effective GUI, unlike packages such as SPSS, SAS or Stata.

In a university it probably won't matter what software you use - although you probably won't work there without a PhD. In industry, be that government or the private sector, you will for the most part use what the organization uses. There is no agreement here, and no reliable nationwide data, to say which software is used most often. My guess is that among non-academics commercial packages such as SAS are more commonly used. I know that at the large state agency where I work, I had to appeal agency rules in order to get R allowed on my computer.
 

TheEcologist

Global Moderator
#8
http://stanfordphd.com/Statistical_Software.html

Cannot say much about pure IT tools, like SQL, Perl and such.
"On the flip side, Matlab has much better graphics, which you will not be ashamed to put in a paper or a presentation"

Sounds like someone with little experience in R graphics wrote that. A fairer comparison would be "some fancy graphics are easier to implement from Matlab out of the box", while R requires additional add-on packages / know-how to implement these.
 

trinker

ggplot2orBust
#9
StatisticsWorldwide.com said:
R is very popular also because it is free software.
I would disagree somewhat with this. Free is nice but that's more the icing on the cake. It is popular because of what it's capable of.
 

noetsi

Fortran must die
#10
Of course, the question then becomes what R can do, that you will actually use outside pure research, that other statistical packages can't - other than make you write lots of code :p It is extremely unlikely that outside a research environment (which means nearly all non-university jobs) you will need to do anything that commercial statistical software won't do, because that software is written specifically for what is in demand on the job. I suspect Minitab and Excel are used far more often in the private sector than any "statistical" package to do statistics, simply because they are well known to managers.

These are issues that probably won't come up here much, since a high percentage of the regular posters are essentially academics and therefore need capabilities uncommonly used outside academia :p The real key is to decide where you want to work and then ask people who work there what is desired. There are many job sites that list technical jobs (such as Indeed). Many postings will list software needs. It is worth looking at those and getting a sense of what is desired and what is not.
 
#11
"On the flip side, Matlab has much better graphics, which you will not be ashamed to put in a paper or a presentation"

Sounds like someone with little experience in R graphics wrote that. A fairer comparison would be "some fancy graphics are easier to implement from Matlab out of the box", while R requires additional add-on packages / know-how to implement these.
TheEcologist, almost everything is available on any platform. The question is how easily can you get it? You can praise very specific and narrow packages in R all day, but the truth is: on a day-to-day basis you generate less pleasant plots in R than in Matlab... And this is coming from an R fan.

The moment you mention the word "add-on", it is pointless to compare any platforms. You can find add-ons for almost anything online. The question is how transparent are they? Do you want to spend half a day parsing them, or do you have better things to do?... So when I say "R" I mean only standard and well-documented packages. And even then they are not as well-documented as the corresponding functions in Matlab, typically.
 

noetsi

Fortran must die
#12
I think this thread has become in large part about whether one likes R or not :p But I think the OP is really asking what will be useful to them. To truly answer that, it would be nice to know what he/she plans to do with it. Is it for research, for a job (my guess) or something else? If it is for a job, where do they intend to work? That determines in large part the answer to the question.
 

TheEcologist

Global Moderator
#15
So when I say "R" I mean only standard and well-documented packages.
Well so did I.

And even then they are not as well-documented as the corresponding functions in Matlab, typically
A matter of taste, I would say. Additionally, for those working in Windows, Matlab would be preferable. I use Matlab on a Linux server, and it leaves much to be desired.

No, I'm sorry, the comment on R graphics vs Matlab is really nonsense.

Matlab does have some really nice pattern-recognition tools that you will find much harder to match in R (though it is possible).
 

noetsi

Fortran must die
#16
I think the question about the survey is whether they surveyed, say, CIOs at Fortune 500 companies (and, more generally, line units in the average corporations where most people work), and who returned the survey. If you survey a specialty population, or you only get responses back from one, then your survey won't match the true population.

I think a better survey would be to content analyze job postings on Indeed, Monster, etc., and see how often they mention R, SAS, etc. as a requirement.
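
For what it's worth, the counting step of that idea can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the postings below are made up, the tool list and the regular expressions are hypothetical choices, and nothing here scrapes a real job site. The one wrinkle is matching "R" as a standalone token so that things like "R&D" are not counted.

```python
import re
from collections import Counter

# Hypothetical patterns, one per tool. "R" is matched case-sensitively as a
# standalone token, and the lookarounds reject cases such as "R&D".
PATTERNS = {
    "R": re.compile(r"(?<![\w&])R(?![\w&])"),
    "SAS": re.compile(r"\bSAS\b"),
    "SPSS": re.compile(r"\bSPSS\b"),
    "Stata": re.compile(r"\bStata\b", re.IGNORECASE),
    "Matlab": re.compile(r"\bMatlab\b", re.IGNORECASE),
    "Python": re.compile(r"\bPython\b", re.IGNORECASE),
}


def count_postings_mentioning(postings):
    """Count how many postings mention each tool at least once."""
    counts = Counter()
    for text in postings:
        for tool, pattern in PATTERNS.items():
            if pattern.search(text):
                counts[tool] += 1
    return counts


# Made-up example postings (assumption: real ones would come from a job site).
postings = [
    "Statistician: experience with SAS and SQL required; R a plus.",
    "Data analyst: Python or R, familiarity with SPSS helpful.",
    "Quant researcher: MATLAB and R&D background preferred.",
]

counts = count_postings_mentioning(postings)
for tool, n in counts.most_common():
    print(f"{tool}: {n} of {len(postings)} postings")
```

Counting postings rather than raw mentions keeps one long posting from dominating the tally. Whether the sample of postings represents the jobs you care about is a separate question.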
 

Jake

Cookie Scientist
#17
I think a better survey would be to content analyze job postings on Indeed, Monster, etc., and see how often they mention R, SAS, etc. as a requirement.
Interesting that you say that... because that's exactly the first thing reported in the article. Have you really still not read the article after all the many times we've shown it to you over the years?
 

trinker

ggplot2orBust
#18
noetsi said:
I think a better survey would be to content analyze job postings on Indeed, Monster, etc., and see how often they mention R, SAS, etc. as a requirement.
Good idea. I'll do it in R, you use SAS, and we'll see who gets done scraping and analyzing first. :)
 

noetsi

Fortran must die
#19
I am sure you will do everything faster (and better) than I, trinker. I don't think that will be a good way of evaluating how commonly specific statistical software is used in a non-academic setting.

It was fun having this thread brought up on the same day we had a thread entitled "R is slow". I thought of saying, "and ugly too." Plus Julia is replacing it :p

Strangely, my award showed up again after weeks of not being there. A clear sign the net agrees with my views of software (or Zola).