# Which software to buy?

#### WeeG

Hi Guys,

Someone asked me today which statistical software he should buy for his work. The field is biostatistics, and there will be a need for complicated models. The work might also include some data mining, resampling, and more. He already uses Matlab, so the software he buys should complement it.

I was thinking about Stata along with Stat/Transfer (because he can also use R, which is free). Any other suggestions?

You can suggest SAS, STATA, JMP or STATISTICA.

#### SE_Lazic

I prefer JMP to Statistica. JMP is interactive and allows you to think in terms of models. Statistica requires too many mouse clicks to do even simple analyses, and leads to "what is the right test" type of thinking rather than "how can I best model the data" type of thinking. I can't comment on SAS or Stata.

But why pay for stats software!? If your friend is familiar with Matlab, R will be an easy transition. The playwith package (http://code.google.com/p/playwith/) and ggobi (http://www.ggobi.org/) can be used for interactive graphics. What more do you need?

#### bugman

Wee G

I'd use R over everything, but of the packages you mentioned, I agree with SE_Lazic about JMP. Statistica, IMO, is awful. I haven't used STATA.

#### Dason

In my opinion the only package that should even be considered other than R is SAS. I'd use R over SAS any day, but SAS is really nice for certain things. (I guess if you're going to do SEM there might be other stats packages you would consider, but that doesn't sound like the case.)

May I ask why they feel they need something other than R?

#### WeeG

I think they are looking for a package that will be easier to use, with less code writing. JMP sounds good, but isn't it necessary to learn the JMP scripting language to be able to use its real power? STATA is powerful and simple. STATISTICA was an option because of its data mining features, but I guess R must have those, doesn't it?

I use Matlab for coding and OxMetrics for quick point-and-click stuff. I think that's pretty optimal for my purposes.

#### Lazar

WeeG, if they don't want to do coding, have you considered Deducer? It gives R an SPSS-like interface (i.e. point and click).

EDIT: Ah, it depends on what you mean by complex analysis (sorry, I did not read that bit). Deducer might not be perfect for highly complex analyses or analyses that are not typical (however, it is relatively easy to code new point-and-click menu options).

#### WeeG

Thank you everyone for your comments. I was checking prices on the Internet and found something that might change the picture. The price of SAS is MUCH higher than the other packages. STATA is far cheaper (even including STAT/TRANSFER). For an even lower price I can get SYSTAT 13... what do you think about that? I've heard good reviews of it.

#### duskstar

I use Stata, SAS and R myself, though I am mainly a Stata user. I've never personally had a use for Stat/Transfer. If I ever need to move something into R, I just export it as a CSV file and then read it into R, though I suspect that might be an issue with larger datasets.
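
The CSV route described above can be sketched in a few lines. This is only an illustration in plain Python (standard library only), not the workflow of any particular package: the toy dataset, the file name `export.csv`, and the helper names `export_csv` and `mean_weight` are all made up for the example. The reading side goes through the file one row at a time, which is one common way to cope when a dataset is too large to load comfortably in one go.

```python
import csv
import os
import tempfile

# Hypothetical toy dataset standing in for data exported from a stats package.
rows = [
    {"id": "1", "group": "control", "weight": "70.5"},
    {"id": "2", "group": "treated", "weight": "68.2"},
    {"id": "3", "group": "treated", "weight": "72.9"},
]


def export_csv(path, records):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)


def mean_weight(path):
    """Average the 'weight' column while reading one row at a time."""
    total, count = 0.0, 0
    with open(path, newline="") as handle:
        for record in csv.DictReader(handle):
            total += float(record["weight"])
            count += 1
    return total / count


# Write the file to a temporary directory, then read it back.
csv_path = os.path.join(tempfile.mkdtemp(), "export.csv")
export_csv(csv_path, rows)
average = mean_weight(csv_path)
print(round(average, 2))
```

Note that everything comes back from a CSV file as text, so numeric columns have to be converted explicitly (here with `float`). Variable labels and value formats do not survive the round trip, which is the kind of detail a dedicated conversion tool is meant to preserve.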

I don't have SAS at home anymore because of issues installing it on Vista Home (my university gave us all a version to use at home, but I could never get it to work).

I suppose a lot of it is personal preference. Each package has its advantages, but if it's only basic stats you want, you might just want to look at which has the most understandable code for that person.

In my experience:

Stata - Easy to use, can use "point and click" (though I never have). I use Stata, my manager uses SAS (hence why I know a bit of SAS).

SAS - Expensive. But definitely superior at handling large datasets. I would say SAS is better for writing macros than Stata's option to write programs.

R - Can't really go wrong if it's free. The code is a bit different to get used to, but once you do, it's okay to use.

I'm afraid I've never heard of SYSTAT.

#### WeeG

I suppose that my dilemma exists because I know that SAS is relatively powerful and flexible. On the other hand, STATA is also powerful compared to other packages, and it's much cheaper. The question is: if I go for STATA and later need something very complicated, will R be able to handle anything that SAS does? If the answer to that is yes, I see no reason to choose SAS, since I am more familiar with R and STATA...

SYSTAT is like SPSS in that it is very easy to use, but I like it. It has many options, models and tests, and the output is useful: for every model you automatically get the information criteria and all that. I worked a bit with the demo version, and it surprised me in a good way. It's not bad, not bad at all, and cheap (relatively; none of these packages is actually cheap).

#### duskstar

I personally haven't ever found anything I can do in SAS which I can't do in Stata or R. Sometimes it may be a bit fiddly though; I have sometimes had to write twice as much code to make something run the same way as it does in SAS. Stata is superior in some areas (in my opinion), such as survival analysis. I'd be interested to hear if others have ever found something they can do in one program but not in another. I'm sure there will be some things.

#### WeeG

I suppose examples can be found both ways; I would also like to know if there is anything. I once wanted to bring data to longitudinal form: with SAS it took an entire procedure, and with STATA one line.

#### Link

In my education and professional experience, I have used R, SAS, Stata, and Stat/Transfer (from the list you've provided) quite a bit.

From what I've seen, R and SAS are predominantly the software used to analyze data. They are programming languages which make it really easy to handle data.

SAS (Cost = licensed at ~$1000 or so a year) has established itself as the software to use for large datasets. Most business professionals I have come across use SAS, and it has established itself among financial analysts. R (Cost = free) can also handle large datasets if you have plenty of memory. A disclaimer though is that R does not guarantee its algorithms, whereas SAS in a way does. That's why you'll usually see grant writers propose SAS (or Stata) in their grant proposals rather than R. Stata (Cost = ~$800, though you have to pay to upgrade when new versions are released) I've never really liked because of the point-and-click feature, though it makes things really easy for people who don't consider themselves good programmers. It seems to be used a lot among epidemiologists.

Stat/Transfer to me is sort of a waste of money. Most datasets can be converted to CSV format (or some other alternative) by the person giving you the data. Thus, you can import it using the software you already have. Also, most stats packages have started supporting each other's data formats. For example, SAS v9.2 now allows you to import Stata data.

I'd strongly recommend R since to me it's just as good as SAS (given that you have enough memory) and it's free. If you need the algorithms to be guaranteed to work though, I'd go with SAS.

Yes, R can do most things that SAS can. It has the same flexibility as SAS (IMO) so I think you should go that route instead.

Keep in mind that with SAS, you can also do it in one line using proc transpose. The extra options are only required if you want more flexibility or are using additional datasets simultaneously (which I don't think STATA allows).
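
For readers unsure what "bringing data to longitudinal form" involves, here is a rough sketch of the wide-to-long reshape in plain Python. It is not how SAS or Stata do it internally; the column names (`bp_visit1`, ...), the toy values, and the helper `wide_to_long` are all invented for the illustration. The idea is simply to turn one row per subject with one column per visit into one row per subject per visit.

```python
# Hypothetical wide-format data: one row per subject, one column per visit.
wide = [
    {"id": 1, "bp_visit1": 120, "bp_visit2": 118, "bp_visit3": 115},
    {"id": 2, "bp_visit1": 135, "bp_visit2": 130, "bp_visit3": 128},
]


def wide_to_long(records, id_key, stub, value_name):
    """Return one row per (subject, occasion) from columns named stub1, stub2, ..."""
    long_rows = []
    for record in records:
        for column, value in record.items():
            if column.startswith(stub):
                # The numeric suffix of the column name becomes the visit index.
                visit = int(column[len(stub):])
                long_rows.append(
                    {id_key: record[id_key], "visit": visit, value_name: value}
                )
    return long_rows


long_data = wide_to_long(wide, id_key="id", stub="bp_visit", value_name="bp")
for row in long_data:
    print(row)
```

Each subject ends up with three rows, one per visit, which is the layout most mixed-model and survival routines expect.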

#### Dason

I think Link gives a good summary of how I feel too. SAS really is the only alternative to R in my opinion, and I'd recommend Enterprise Guide if you were going to go the SAS route. It does a little bit of point-and-click driven stuff to make your life easier, but I mainly like it for the code completion and help with the options you can throw into a proc. SAS isn't cheap, but compared to the other packages I think it's the only one worth anything.

If you really desire point and click for R you can get it for a few things (R Commander and Deducer and the like), but there isn't a canonical nice GUI for R that allows menu-driven interaction. I'm not a fan of that so I don't really care. If what you really want is just a nice environment other than the command line to interact with R, then I'll recommend RStudio. It's pretty awesome.

#### JenB

Hi. I work for StatSoft. To clarify, STATISTICA Data Miner provides three different user interfaces for data mining: (1) Individual Analyses, (2) Recipes, and (3) Workspaces.

1. Individual Analyses: If you already know which algorithm to use, individual analyses provide a variety of algorithm-specific options. Algorithms include trees, boosted trees, random forests for classification and regression problems, automated neural networks, k-nearest neighbors, support vector machines, various clustering methods, Kohonen networks, partial least squares, generalized linear models, association rules, sequence analysis, etc.
2. Recipes: Easy-to-use wizard-like user interface that can be used by novices. Default algorithms include trees, boosted trees, random forests, neural networks, and support vector machines. See YouTube Data Mining Recipes video.
3. Workspaces: Analytic workflows composed of multiple algorithms, which display as icons/nodes. Use one of the provided workspace templates or create your own. Since STATISTICA integrates with R, you can also create R nodes. See YouTube Data Miner Workspace video.

#### WeeG

Thanks for the info. May I ask you a couple of questions, since you work for StatSoft?

1. I have tried to find the prices of the different packages. Sometimes it was easy to locate the info on the net (Stata, Systat), sometimes it was harder but I managed to find it (SAS). I couldn't find the price of STATISTICA anywhere... In addition, I can't even use a 30-day trial of the software without registration. Why is that, and how do you think this helps you attract new customers?

2. I am looking for a package for biostatistics, but over time I will need to use a bit of data mining and a bit of Bayesian modelling. Can you convince me why I should choose STATISTICA (along with R, which I'll use anyway) over SAS, STATA or SYSTAT?

Thanks!!

P.S. These questions aren't meant to hint at anything about any software. I am honestly curious and find it interesting. I am aware that people from other companies will say good things about their packages, but I don't know much about STATISTICA, so it can be informative!

#### trinker

[R] has a steep learning curve, but once you've got it, its flexibility and power make it well worth the time you spent learning the program.

#### JenB

Hi WeeG,

STATISTICA is scalable for different numbers of users, and we allow you to mix and match functionality. Our quote depends on what your site needs based on the needs and skills of individual users. Your local StatSoft sales office will help tailor a quote to your specific needs and that will ultimately save you money.

Reasons to consider STATISTICA are:
* Quality and accuracy of model performance
* Variety of available algorithms
* Strong graphical visualization of models
* Ease of use
* Ability to handle very large data sets

Best regards,
Jen

#### gianmarco

Hi!
I agree with JenB about STATISTICA.
Even though I have an old version (8) and use it for very basic needs (hypothesis tests, charts, correspondence analysis), I find it very user-friendly and easy to use.

Regards,
Gm