SAS is better than R :)

noetsi

Fortran must die
#1
In study of the statistical packages, simulation from probability distributions is one of the important aspects. This paper is based on simulation study from Bernoulli distribution conducted by various popular statistical packages like R, SAS, Minitab, MS Excel and PASW. The accuracy of generated random data is tested through Chi-Square goodness of fit test. This simulation study based on 8685000 random numbers and 27000 tests of significance shows that ability to simulate random data from Bernoulli distribution is best in SAS and is closely followed by R Language, while Minitab showed the worst performance among compared packages.
http://www.google.com/url?sa=t&rct=...GIgYAO&usg=AFQjCNE0t7dBQWruaa5nuDwUizff9cwHrw
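
For anyone who wants to see what this kind of study looks like in code, here is a minimal sketch, assuming a simple design: draw n Bernoulli(p) values, run a Chi-Square goodness of fit test on the 0/1 counts, repeat many times, and count the rejections ("poor fits"). This is not the paper's code. The function names, the sample size, the value of p, the number of replications and the 5% level are all made up for illustration, and the generator is Python's built-in one rather than any of the packages compared.

```python
import math
import random


def chi_square_gof_bernoulli(sample, p):
    """Chi-square goodness-of-fit test of a 0/1 sample against Bernoulli(p).

    Returns (statistic, p_value). With two categories the test has 1 degree
    of freedom, and the upper tail of a chi-square(1) variable at x equals
    erfc(sqrt(x / 2)).
    """
    n = len(sample)
    ones = sum(sample)
    zeros = n - ones
    expected_ones = n * p
    expected_zeros = n * (1 - p)
    stat = ((ones - expected_ones) ** 2 / expected_ones
            + (zeros - expected_zeros) ** 2 / expected_zeros)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def count_poor_fits(p, n, replications, alpha=0.05, seed=0):
    """Count how many simulated samples are rejected at level alpha.

    All arguments are hypothetical settings, not the paper's design.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(replications):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        _, p_value = chi_square_gof_bernoulli(sample, p)
        if p_value < alpha:
            rejections += 1
    return rejections


if __name__ == "__main__":
    print(count_poor_fits(p=0.3, n=500, replications=1000))
```

Keep in mind that even a perfectly good generator gets rejected some of the time: at level alpha, roughly a fraction alpha of the tests should come out as "poor fits" (only roughly, since the counts are discrete).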
 

Dason

Ambassador to the humans
#2
Didn't I already rip this paper apart at some point? I feel like you posted it and I provided arguments against their methodology. With that said, though, if you're going to make a claim like that, you probably don't want to use a 5-year-old article where the authors are:

Manash Pratim Kashyap
Department of Business Administration
Assam University, Silchar, India
kashayap.manashaus@gmail.com
Nadeem Shafique Butt
PIQC Institute of Quality
Lahore, Pakistan
nadeemshafique@piqc.com.pk
Dibyojyoti Bhattacharjee
Department of Business Administration
Assam University, Silchar, India
dibyojyoti.bhattacharjee@gmail.com

Their titles don't really scream to me that they are qualified to do this analysis.

I know you like to push R users' buttons but you really should pick your fights more wisely ;)
 

noetsi

Fortran must die
#3
No, I didn't post it, and I am not sure why you assume they are not qualified. A lot of the advanced analysis I have seen comes out of India these days. The dominance US universities once held has been eroding badly as funding dries up for them. Indian research is seen as a serious threat to US intellectual dominance.

Why do you assume that something has changed in the last five years? Usually when a case is made it is for those who dispute it to offer evidence that it does not remain correct.
 

hlsmith

Not a robit
#5
There was a paper just like this one posted a while back, probably 5 months ago or more. Dason would have a point if there were now a new package in R or procedural approach in SAS that would outdate the paper's content.

Though, I would say there is no way for me to truly evaluate the authors' ability to run the comparisons. They could be the greatest around or paid to do it. Who knows?
 

noetsi

Fortran must die
#6
I agree that there is no way to evaluate the authors' credentials. My point was that simply dismissing Indian schools is not reasonable anymore. My view is that it is essentially impossible to tell which software is better, in part because it changes constantly.
 

hlsmith

Not a robit
#7
Dason is able to defend himself, but I don't think he mentioned their nationalities - just titles. Yes, procedures are constantly changing, and who knows if they had the most concise approach.
 

noetsi

Fortran must die
#8
I don't see any titles - just the universities they are at. But it is possible I did miss them somewhere (I assume they all have PhDs although I don't see that).

Since dason is not here he can't defend himself. That is why I am pouring it on when he can't :p
 

spunky

Doesn't actually exist
#9
No I didn't post it
SURE YOU POSTED IT BEFORE!

PROOF: http://www.talkstats.com/showthread.php/56826-SAS-versus-R

(didn't you say you had like perfect memory or something? :p)

and Dason's argument against their qualifications is not because they come from India, noetsi. it's because of the departments they're associated with (Business Admin). i also tend to be suspicious when people from other areas of statistics make grandiose claims (including me, of course, because my area is rife with people who don't know what we're doing).

it would be a matter of figuring out what these people did their PhDs on and what other publications they have. for instance, Peter Bentler (one of the big names in Structural Equation Modeling) considers himself a big-shot statistician. but he is a Clinical Psychologist-turned-data-analyst... NOT a statistician. all his theory papers are authored with people who did PhDs/MScs in Statistics... and we both had a disagreement on SEMNET where he had to admit his linear algebra is not very good (he was trying to evaluate a proof i had been working on)... because he never trained as a Statistician.

so i do believe Dason has a point in being suspicious of what people do sometimes because you have no way of verifying whether they're qualified to say what they're saying. and there are not enough people out there to conscientiously read this stuff and make valid claims. i'd like to mention the experience i went through with my own article, for example. i got two reviewers: the first one was "publish as is". literally, a one-sentence review. the second one was from a person who tried but through his/her review i was able to see (s)he didn't quite get what i was doing. this guy/gal didn't even know how the Fundamental Theorem of Algebra worked, as (s)he evidenced by one of his/her suggestions. so all i ended up getting from those reviewers was that they either did not know enough to criticize my manuscript or didn't care (to be honest, i think the 2nd reviewer cared but didn't know enough and the 1st reviewer just didn't care).

you and i have been over this before. there is more data out there than people qualified to analyze it... so the rest of us have to half-ass our way to do the best we can. the problem is distinguishing the capable people from the rest of us.
 

noetsi

Fortran must die
#10
No I don't have a perfect memory.... :(

Spunky you are in education so if statistics done by non-statisticians is suspect...:p

I don't think the issue here is theory. It is how fast programs calculate. So how much you are an expert in stats, for this specific simulation, is likely less important than how good a coder you are. I would think business administration professors code as well as statistical professors (or their graduate students most likely since they are the ones doing the coding). In past research I have seen in a technical area commonly one department will contact another so you don't even know who did the coding.

The sense I get is that this is not really that complex an issue. They were just seeing how fast the software worked on a basic simulation. But maybe I am wrong about that. I misunderstood the point Dason made; I thought he was arguing that Indian universities are not good at technical research (a claim that is wrong, but made a lot in the West).
 

Dason

Ambassador to the humans
#11
It has nothing to do with "how fast programs calculate". This was an assessment of how well the programs simulate randomness. As I pointed out in the previous thread that spunky links to, I think the methodology wasn't appropriate/adequate.

And I wasn't pointing to the nationalities - I was mentioning the departments they're associated with. Like spunky says it's not a perfect indicator but that along with the poor methodology makes me question their authority (and ability) to do this type of assessment. I wonder, though, what would have made you think that the paper was some comparison of computation speed.
 

spunky

Doesn't actually exist
#12
of course i am suspect, noetsi! we're all suspect in programs like yours or mine because we don't have the privilege of going through the rigour of theory. as we've both discussed before (and i'm sure we agreed on this) the best we can aspire to is a 'cookbook' understanding of statistics where all we know is to recognize generic data types and choose the methods we learned according to that. but when there are subtleties in the data or it doesn't quite fit the 'recipe' we know, we start getting in trouble because we don't know enough theory to adapt the methods that we know.


i believe you are, indeed, misunderstanding Dason's point. Dason's criticism is on the specific analyses the authors made to assess the speed and accuracy of the algorithms. there is no point in me repeating them because he expanded on them in the post i linked, but it sums up to say that the analyses they did to conclude which algorithm is better are not the best. and if the analyses are not ideal, you cannot really conclude that one algorithm is better than the other because you're not evaluating them right. it's akin to saying something like (i admit it's a poor analogy but the gist of the idea is there): "i am going to evaluate which tool is better, the screwdriver or the hammer. my analysis will be the time it takes me to get a nail through the wall. since the hammer did it faster, the hammer is the best tool". of course the hammer is the best tool, according to this design! the way you gathered and analyzed your data is suspect!

data analysis works the same. there really is no use in running through simulations if you can't make sense of what the data tells you in the end. and, as Dason pointed out, the way they're making sense of the data does not match the research questions they posed. they think they do and they argue they do, but they do not. so yes, the complex issue here (and this is where credentials come in) is that their analyses are suspect and hence their conclusions are suspect. and, usually, you trying to figure out what you did wrong in day-to-day data analysis can be complicated.

and no, i'm pretty sure neither Dason nor anyone would argue that all Indian universities or Indian graduates are sub-par when compared to Western universities. many of the greatest statisticians the world has ever seen (Mahalanobis, CR Rao, etc.) are Indian. and i'm also pretty sure that India has its fair share of crappy universities and crappy graduates, like everywhere else in the world.
 

noetsi

Fortran must die
#13
Well some of us, many of us I think, are suspect because we really don't want to address the rigor of mathematical theory but use it for something very practical [or to push theory in our subfields]. I doubt you care if statistical theory is advanced, you want to address educational theory. You hope the statisticians have generated the right tool just as a biologist hopes the electrical engineer created the right electron microscope. You want to use statistics, not advance it [or rather I think most non-statisticians do].

data analysis works the same. there really is no use in running through simulations if you can't make sense of what the data tells you in the end.
Well there is if you have to have something done by 5 pm Friday, or get fired, and you know what you generate won't be used anyhow....

As I noted I misunderstood Dason's post.
 

spunky

Doesn't actually exist
#14
Well some of us, many of us I think, are suspect because we really don't want to address the rigor of mathematical theory but use it for something very practical [or to push theory in our subfields]. I doubt you care if statistical theory is advanced, you want to address educational theory. You hope the statisticians have generated the right tool just as a biologist hopes the electrical engineer created the right electron microscope. You want to use statistics, not advance it [or rather I think most non-statisticians do].
but this traps us in the cookbook approach to Statistics that we're trying to escape. i think the crux of the problem here is that you keep equating Statistics = tool. Statistics is not a 'tool'. it is a science and, as such, you rarely have the privilege of answering yes/no, right/wrong, hammer/screwdriver questions if you want to extract scientific knowledge out of something. if you don't try and learn some theory, your practice suffers. and if your practice suffers, your paycheque does as well :p. such is the nature of the beast and there is no way around it. now that science is mostly statistics-driven we all have to become, within varying degrees, 'statisticians'. (<-- maybe i should have said 'methodologists' or 'data analysts' here)



Well there is if you have to have something done by 5 pm Friday, or get fired, and you know what you generate won't be used anyhow....
i'll... pretend i didn't read this one :D
 

bryangoodrich

Probably A Mammal
#15
Simple critique. The results were

Code:
SAS  210
R    212
SPSS 241
Xls  252
Mtab 270
Okay, is 210 really significantly better than 212 for the rank difference to be meaningful? Is it different enough for it to be significant between SAS and Minitab? Best to worst, out of the sample of 5,400, we're looking at 3.9% to 5% "poor fits." I also wonder about the variation. Okay, SAS does 210 poor fits out of 5,400. What's the variation? Is that greater or less than R? Are they different?
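
A rough way to put numbers on those questions is a pooled two-proportion z-test on the counts above, plus a binomial standard error for a single package. This is only a sketch under assumptions: each package's 5,400 tests are treated as independent trials, nothing is corrected for making several pairwise comparisons, and the function names are made up for illustration.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def binomial_standard_error(x, n):
    """Standard error of the observed proportion x / n."""
    p = x / n
    return math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    n = 5400  # tests per package, taken from the figures quoted above
    print("SAS vs R:      ", two_proportion_z_test(210, n, 212, n))
    print("SAS vs Minitab:", two_proportion_z_test(210, n, 270, n))
    print("SE for SAS:    ", binomial_standard_error(210, n))
```

Under those assumptions, SAS vs R gives a two-sided p-value of about 0.92, so the 210 vs 212 ranking carries no information, while SAS vs Minitab gives about 0.005. The standard error on SAS's 3.9% is about 0.26 percentage points, or roughly 14 tests out of 5,400. Also, if the tests were run at the 5% level (I haven't checked that against the paper), a well-behaved generator would be expected to produce about 270 rejections out of 5,400 anyway.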

I also can't take a paper seriously that has typos like "form" for "from" and uses the made-up plural "softwares."