R vs SAS vs S-Plus vs Matlab vs others

#1
Can someone explain to me the pros and cons of all these software packages? Which one would you use if you could start all over again? Thanks.
 
#3
Here's my sense, not from experience but from parroting what I have heard.

Kill Matlab off your list. As a statistics tool, it would be the primary tool only of applied mathematicians who work primarily in modeling and only secondarily in statistics.

Replace it with SPSS, and combine R and S into one entry, for a new list. So now your list is:
R/S-Plus, SAS, SPSS

OK, S-Plus versus R. S-Plus is a proprietary system; R is an implementation of the same S language and is free. R has more support at this point from the intellectual community, while S-Plus still has an enterprise presence. There is no reason to use S-Plus ahead of R unless you are in or preparing for a specific job, or someone else is paying the bills. And S-Plus is not always "better". For example, at the time of publishing, the book MASS (Venables and Ripley's Modern Applied Statistics with S) noted that an operation took 15 minutes under S-Plus but only 90 seconds under R. The advantage of an open source project that people are actively working on is that users don't suffer glaring inefficiencies for very long.



R versus SAS. I've used both. In a nutshell, R gives you nothing that you do not know how to ask for; SAS gives you lots for asking for very little, but it will seem like nothing unless you know what to look for.

R evolves faster than SAS. SAS has higher standards.
R is more programmatic than SAS. SAS is a little more cookbook.
SAS has stronger support for huge databases and enterprise-level data management. It remains very popular in big business, but the license for the full version is incredibly expensive.

SPSS is largely a graphical, point-and-click data analysis environment; it can generate scripts, but they aren't often used. It has incredible worldwide penetration in the social sciences.

Personally I use R.
 

WeeG

#5
I think it depends on what you want to do.

R is the strongest software; it allows you to do nearly anything, as long as you know how.

SAS is also strong. I personally don't like it that much, but it's one of the strongest packages out there.
Speaking of SAS, if you want an easy life with nice graphics, you should try JMP (also from SAS Institute); it's superb software.

I am not a fan of SPSS, I prefer Minitab and Statistica over it.

If you work in biostatistics or longitudinal data analysis, you should use Stata, also a very powerful package.

If you are good at Excel, use XLStat.

About Matlab: just as R, a package for statisticians, has mathematical functions, Matlab, a package for mathematicians, has statistical models built in. You can do a lot with Matlab, but some statistical methods will be easier to use in other software.

I use JMP and Stata
 

Jodi

#6
I think R is definitely not for beginners or the faint of heart, as it involves learning a whole new computer language. I've worked a lot with SPSS v15. It's extremely easy to use, like Minitab. If you are totally green, check out InStat by GraphPad: it walks you through everything and pats you on the back afterwards. :)
 
#7
I use Matlab, and it's really powerful; MathWorks says it is the fastest there is (and I believe it). You can do anything, with many, many ready-to-use functions for almost any field. It's easy to use, and its graphs are fancier than those of other software I have used.
 
#8
If you had to start all over again with a new software package, the one you would choose would depend on your level and area of expertise and on what software you have used in the past.

If it were me, because I'm only a rudimentary statistician, I would want to start over with something simple but very capable, like Minitab.

I am learning R at the moment, but I certainly don't depend on it yet, so I would avoid starting over exclusively with something like it.

SigmaStat is too user-friendly for me now, but it provided quite a nice stepping stone to Minitab.

Instead of Minitab I'd probably use SPSS, but only if I weren't the one paying for the license.

Otherwise I'd make do with Excel (i.e., program my own tests) while I transitioned over to R.
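For anyone weighing the "program my own tests" route, here is a minimal sketch of what that involves. It is written in Python rather than as Excel formulas (an assumption for illustration only) and computes Welch's unequal-variance two-sample t statistic with its Welch-Satterthwaite degrees of freedom. The function name `welch_t` and the sample values are hypothetical, and turning t into a p-value would still need a t-distribution CDF, which the Python standard library does not provide.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.mean(x), statistics.mean(y)
    # statistics.variance uses the n - 1 (sample) denominator
    v1, v2 = statistics.variance(x), statistics.variance(y)
    se1, se2 = v1 / n1, v2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical measurements for two groups
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [4.2, 4.8, 4.4, 5.0, 4.6]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.1f}")
```

Hand-rolled tests like this are exactly where the "easy to make mistakes and not notice them" warning applies, so it is worth checking the output against a small case you can work out on paper.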
 
#9
R is certainly a very good statistical analysis package. It is very comprehensive, easy to learn and largely compatible with its proprietary counterpart, S-Plus. Its one drawback is that it becomes very slow, or may even fail, when dealing with extremely large data sets.
This, however, is unlikely to be a problem unless you have millions of data points to process.

Matlab is not really a stats package but a numerical analysis tool, although you can do statistical operations with it. If you need Matlab, however, you'd be better off with the free alternative, GNU Octave. It is largely compatible with Matlab, but you don't have to pay $$$$ to get a copy.

Similarly, you'd be crazy to pay the license fee for SPSS when there is PSPP, a free alternative to SPSS. Unlike the "student" versions, it has no limit on case counts and no expiry date. It's fast making SPSS obsolete, in the way that R has made S obsolete.

Don't be tempted to use Excel except for VERY simple work. It's just too easy to make mistakes and not notice them.

Personally I would avoid Minitab because it's very outdated these days.
 
#10
I use R in my line of work, and I did a lot of research on using it. While others have sung its praises, it can be very difficult to use the more advanced functions without explanation from someone else.

I personally despise using R, because doing anything interesting means spending 2-3 hours scouring mailing lists and the internet to get a basic clue of what the hell is going on. If I had a textbook that explained it (like my company would ever pay for that, dang a$$holes, and I personally don't have the money for it), maybe I wouldn't be so bitter about the language.

Also, it is almost all command line unless you install the Rcmdr package, which is still quite limited (but better than nothing when your company won't buy you software).
 
#11
It depends on your skills in statistics and on the time and money you want to spend.
If you are not very rich, avoid Matlab, SAS and SPSS. S-Plus and R are very similar, but R is free.
R is very complete, but you will have to spend some time understanding it. If you use Excel datasets, XLSTAT is a complete and not too expensive statistical tool (US$450) with the most important statistical methods; it can be a good solution.
Personally, I use R, SAS and XLSTAT for my research.
 

Dragan

#13

For me, it depends on what I am doing.

If I am using bootstrap techniques, then I would use SPlus.
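As a rough, language-neutral illustration of the bootstrap technique mentioned here (this is not S-Plus code; it is a Python sketch using only the standard library), the following computes a percentile bootstrap confidence interval for a mean. The function `bootstrap_ci`, its defaults (5000 resamples, a fixed seed) and the sample data are illustrative assumptions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_boot=5000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample the data with replacement n_boot times and recompute the statistic
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap distribution
    lo = boot_stats[int((alpha / 2) * n_boot)]
    hi = boot_stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical sample
sample = [2.3, 3.1, 2.8, 4.0, 3.6, 2.9, 3.3, 5.1, 2.5, 3.8]
print(bootstrap_ci(sample))
```

Passing a different `stat` (for example `statistics.median`) bootstraps that statistic instead; the resampling loop stays the same.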

If I want a variety of numerical integration techniques, to solve (large) systems of equations, or symbolic results, then I would use Mathematica.

If I want a (quick) empirical confirmation of analytical derivations that I make, then I would use Minitab.
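A quick empirical confirmation of an analytical derivation usually means simulating from a known model and comparing the simulated quantity with the formula. The sketch below does this in Python (the post uses Minitab; the language, seed and parameter values are assumptions for illustration) for the textbook result Var(sample mean) = sigma^2 / n for independent draws.

```python
import random
import statistics

# Analytical result being checked: for i.i.d. draws with variance sigma**2,
# Var(sample mean) = sigma**2 / n.
rng = random.Random(42)
sigma, n, reps = 2.0, 25, 20000

# Simulate reps samples of size n and record each sample mean
means = [
    statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
    for _ in range(reps)
]
empirical = statistics.variance(means)
theoretical = sigma ** 2 / n
print(f"empirical {empirical:.4f} vs theoretical {theoretical:.4f}")
```

The two numbers should agree to within simulation noise; a large gap would point to an error in either the derivation or the simulation.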

If I want to conduct a large Monte Carlo study where speed is of the essence, then I would program in Fortran.
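The structure of such a Monte Carlo study is the same in any language; only the speed differs, which is the reason given here for using Fortran. The Python sketch below (illustrative only, and far slower than compiled code) estimates how often a nominal 95% z-interval with known sigma actually covers the true mean. The function name and all parameter values are hypothetical.

```python
import math
import random
import statistics


def coverage_of_z_interval(mu=10.0, sigma=3.0, n=20, reps=10000, seed=7):
    """Estimate how often a nominal 95% z-interval (sigma known) covers mu."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw one sample, build its interval, and check whether it contains mu
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


print(coverage_of_z_interval())
```

The estimate should land close to 0.95; in a real study the inner loop is the part one would move to Fortran (or C) once the number of replications gets large.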

If I am teaching a basic course in inferential statistics, then I would use SPSS.
...
The list goes on....
 
#14
At the risk of sounding like a commercial (which is actually an impossibility) for R, I am going to make a not-so-complete list of the reasons I am a big fan of R.

- R is free
- R will perform virtually all common statistical methods without additional programming
- R has graphics far superior to those of SAS and Excel, and rivals most other packages
- R is extensible. If you want to write C code and interface it with R, there is a simple way to do it
- R code/data written by you can be shared with the rest of the statistics community as an R package
- R documentation, including manuals, tutorials and many books, is freely available
 
#15
I have only ever used R and S-Plus, and now use R most of the time in my work. For me the main advantages are:
1. R is free
2. It can be programmed to do anything not just statistical functions
3. The graphics are excellent
4. There is a lot of goodwill and support from the network of other R users
5. Programs can be written and saved which gives you an exact record of how you produced your results and enables you to repeat the process with another set of data.
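Point 5 is about reproducibility: the analysis lives in a saved script, so rerunning it on new data repeats exactly the same steps. The same idea, sketched in Python rather than R purely for illustration (the function `describe` and both datasets are hypothetical), looks like this:

```python
import statistics


def describe(data):
    """One saved, re-runnable analysis step: the same summary for any dataset."""
    return {
        "n": len(data),
        "mean": statistics.fmean(data),
        "sd": statistics.stdev(data),  # sample standard deviation (n - 1)
        "median": statistics.median(data),
    }


# Hypothetical datasets: rerunning the saved script on new data
# produces results from exactly the same steps.
site_2008 = [12.1, 13.4, 11.8, 14.0, 12.9]
site_2009 = [13.2, 12.7, 14.5, 13.9, 15.1, 12.4]
for name, data in [("2008", site_2008), ("2009", site_2009)]:
    print(name, describe(data))
```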

The speed and memory limitations in my experience are more a problem with the PC and 32 bit OS rather than with R.

Yes, it is a steep learning curve to understand the language, but there are plenty of examples, manuals and people who are willing to help.
 

TheEcologist

#16
Quoting #15: "The speed and memory limitations in my experience are more a problem with the PC and 32 bit OS rather than with R."
Using R on Windows does cause memory problems that are less severe on Linux, sure, but many of my friends who work in genetics, with datasets used to identify metabolic networks for instance, really need to revert to "old mainframe" era programs like SAS. Sometimes they even need to write their own programs. So the way R handles memory can be a real issue.

- Most of us will not run into this problem. I do occasionally, but then I reboot into Linux and I'm usually fine.

Second, many university labs refuse to use R because of the liability issue. You use R at your own risk, which is often unacceptable if you design bridges, aircraft or medication that people will use. In short: if your bridge fails because of an inherent error in R, you are liable, not the R developers or package developers.

- This really is only a problem for a select group of fields, not mine. I love the way contributing to R is built like contributing to science: you submit a package you created, and it gets peer reviewed by others before it is accepted. For this reason I feel there is a very low chance of any 'inherent error' in R. The chances of running into such a thing in a program like SPSS are much higher (no peer review, no huge and knowledgeable user community), so they NEED to back up their expensive programs, whereas I feel R does not (or does this in a different way).

These are the only two meaningful problems I know about.

The steep learning curve is not really a problem. If you really can't work with code, use R Commander. It's what I recommend to grad students who haven't taken R courses; I often see them learn the commands really well with it and then drop R Commander after some time anyway.
 
#17
Replying to TheEcologist (#16) on the liability issue:
I agree, this is not really a problem, but rather a marketing tool used by software companies. Having provided help to many novice statisticians, I can say with confidence that in any software package, whether proprietary or not, the user is much more likely to be the source of erroneous inferences than an 'inherent error' in the software. As statisticians, we should carefully check our work. Ultimately, our inferences are our responsibility.

BioStatMatt
 

vinux

#18
:) Why should I miss out on this thread? I agree with the opinions of most people here.


I use SAS for my job and R for my personal research, and I have had the chance to work with multiple stat packages.
My views on the stat packages are as follows:


Excel: This is the most widely used statistical package, but it has only limited features for analyzing data (it is not used for complex statistical analysis).
SAS: This is the package used in most analytics companies, and SAS skills have lots of opportunities in the job market. SAS has become a standard for clinical and analytics-related work. When handling huge data (above 50 million records), SAS is the most efficient one.

R (RKWard): I prefer R for my research and independent studies (previously I used C++/Java). It is a good choice for a good programmer. It is a semi-object-oriented language.

Matlab: I found programming in Matlab fun and easy.

Mathematica: used for complex integration and math-related problems.

Minitab/SPSS/Statistica/Statgraphics: These are mainly GUI-based stat packages. I found Minitab more user friendly and simple. (I haven't used the latest versions of any except Minitab.)

Knowledge Seeker: I use it for CHAID (because SAS Enterprise Miner is costly).

FI Model Builder (MBDT/MBPA) : for CART, GAM and score card development.

EViews: used for time series analysis. I tried it long ago and have no idea about the new versions.
===============================================
For newbies, use GUI stat packages such as Minitab or SPSS (most of them now have both a GUI and a programming part).

If you use a programming stat package, you must learn its data manipulation part; without that, it will not be fun.
In SAS -> one should be good at the Base SAS package.
In R -> one should know what class and mode are, and the differences between lists, data frames, matrices, etc.
 

terzi

#19
Stats for newbies

In my opinion, the best software is the one that helps you fill all your needs easily.

Software such as MATHEMATICA, GAUSS and MATLAB is good, but it is not devoted to statistics, so it is somewhat weak there. Its real strengths show in areas such as simulation or complex mathematics.

I also avoid certain software such as STATISTICA or EXCEL. I know they are widely used, but they still have many flaws in their procedures. I especially had problems with STATISTICA in time series analysis and with some multivariate tools (I used version 7, so I'm not sure whether these problems have been fixed). In fact, the use of Excel for statistics has been criticized in certain talks (Cryer, 2001).

I'd like to comment on the most popular packages. I personally use STATA at work, since it is one of the most complete packages I've worked with. It also has open contributions, so it grows fast. Besides, I think the modeling tools available in STATA are beyond those of any other multi-function package. I like SAS because programming it is fun :) Still, I'm not a huge fan of its interface. Along with STATA, it is one of the most powerful programs. I find SPSS really confusing! It is almost impossible to specify survey designs, and I haven't managed to produce graphics with the "amazing graph generator". Don't get me wrong, it can be an outstanding tool (I love the Multidimensional Scaling module); I just think the interface needs to improve a lot.

Finally, it is true that R is the Ultimate Weapon for a statistician, but to use this software requires full command of stats and also knowledge of programming. I think R should only be recommended to professionals with high knowledge in the area. I mean, it is pretty easy to obtain wrong results with it if you don't use it properly.

So, I'd like to add some recommendations that haven't been mentioned here yet, especially for those starting out in the world of randomness. These packages are, in my opinion, the best options for starting with stats:

MINITAB

It is the easiest software I've used and has most basic tools available. It's especially good for industrial statistics and quality control. I would recommend it for beginners because of the detailed help files, where you can easily learn the basics of statistical tools through examples.

INFOSTAT

This software was developed in Argentina and is quite popular in Latin America. One of its great advantages is the price, which is really cheap compared to the competition. You can also find many tools in this package, which uses a very clean menu interface. Previous versions lacked high-quality graphics, but I heard this is being improved.

OPENSTAT

OpenStat is a free statistical package oriented to the social sciences. I consider it an amazing tool for teaching statistics. It has an easy-to-use menu interface, along with a good number of tools that display results in a simple way.

NCSS

This is a very interesting menu-based package that I would also suggest to beginners. It has more statistical tools available than MINITAB and INFOSTAT, so it is a really useful alternative. I really like its friendly interface, which even has the advantage of suggesting alternatives for your analysis.

Well, I hope this helps someone choose a statistical package based not only on its capabilities but also on their own capabilities with stats :)
 
#20
multilevel models and multiple imputation

What about multilevel models with repeated measures analysis, and multiple imputation?

Which program is better to use? SAS or R?