SAS v R

hlsmith

Omega Contributor
#2
Yup, seems about right. And I always forget that people in the business world use a dataminer/enterprise SAS version, which makes things even easier, given that minimal to no coding is needed.
 

noetsi

Fortran must die
#3
You really don't need to use code at all to use Enterprise Guide, although there are many statistical functions that cannot be reached that way. In those cases, which usually involve options, you can just open up the generated code, add the option, and paste it into a code window, which I do frequently.

People who are good at code like dason tend to not understand that those of us who are not great at it (like me) struggle a lot with R for that reason. :p
 

Dason

Ambassador to the humans
#4
It's not that we don't understand that you struggle. It's that we understand that *everybody* struggles. But you get better with practice and it's a much better way to do things so we think it's worth it.
 

noetsi

Fortran must die
#5
I think there is a trade-off between ease of use and ability to do cutting edge work. For most people, even if you get better, it is always going to be slower to work in R than in SAS. And for a wide range of procedures SAS almost certainly is correct.

An interesting point raised by the author, whose biases I don't know, is that R generates incorrect results at times because it relies on individuals to write many of its procedures and there is little oversight. So if an author gets it wrong, no one will catch it. With a large company and multiple layers of oversight that is probably less likely.

I know Dason will say that anyone can review the code and catch these mistakes. I wonder how often that really happens with the more esoteric procedures. One thing I have read, by someone who was not an R critic, was that those who create the code have their own views of what is correct or not, and that influences how they write the code. Given that statisticians disagree with each other, that would seem to be important.

One thing that is true is that some of the methods suggested by authors in books do not exist in SAS. So if you want to use them you have to either code them yourself in R or find someone who will (or I guess in S, although I do not know if that code is still used).
 

hlsmith

Omega Contributor
#6
That is a weak critique for the main packages and procedures in R, which cover the procedures in SAS.

In R you can make 'Pull Requests' to package authors to have them clean up code or make fixes. Plus so many people are using R now that things do get caught.

There may be a weakness in esoteric, minimally used packages or in those with dependencies on other packages, since updates are needed when the dependencies change.
 

noetsi

Fortran must die
#7
the question is about when authors disagree on issues. When there is no agreed on way, the code may do things the way one group feels is correct and others strongly disagree with. Which matters if you are not familiar with the disagreement or the way the code occurs.
 

hlsmith

Omega Contributor
#8
Yeah, that is why you reference the program version you used along with package release number for version control and posterity. I get most of those complaints, but it comes back to how analytic plans are analyst dependent.

But thanks for sharing.
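
For anyone wanting to follow the version-referencing advice above in R, here is a minimal sketch of pulling the details to cite. It uses only base R calls, and the choice of the stats package is just for illustration.

```r
# Version of R itself, as a string suitable for a methods section
r_version <- R.version.string
print(r_version)

# Release number of a specific package (stats ships with R; swap in your own)
pkg_version <- as.character(packageVersion("stats"))
print(pkg_version)

# Suggested citation for R; citation("pkgname") does the same for a package
print(citation())

# Full snapshot of the session: R version, OS, and loaded package versions
print(sessionInfo())
```

Saving the sessionInfo() output alongside the analysis is a cheap way to record exactly what was used.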
 

Dason

Ambassador to the humans
#9
noetsi said: "the question is about when authors disagree on issues. When there is no agreed on way, the code may do things the way one group feels is correct and others strongly disagree with. Which matters if you are not familiar with the disagreement or the way the code occurs."
How is this different from SAS though? The documentation will tell you what method or whatever is being used. At least with R, if you aren't sure about the details you can look at the code...
 

noetsi

Fortran must die
#10
I am not sure SAS does the cutting edge stuff that statisticians disagree on :) I think there is longer review at SAS (I don't think R has a formal review system at all, anyone who wants to post code can) as it is a larger organization, although I don't know this.

SAS documentation is pretty easy to find. I don't actually know where the R documentation for a given product is. The problem with looking at the code is that you have to understand what the code is doing. With SAS you read English explanations of the substance behind the code in the documentation - you don't inspect code.
 

hlsmith

Omega Contributor
#11
In R you typically just type a ? before the function name and it takes you to the documentation, which commonly provides an example with an available toy dataset or simulated data (with the data-generating code).
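
A rough sketch of what that looks like at the R console, using lm and the built-in mtcars dataset purely as examples:

```r
# Open the help page for a function; both forms are equivalent
?lm
help("lm")

# Run the examples that ship with a help page
example("mean")

# Toy datasets come with base R; mtcars is one of them
head(mtcars)

# Fit a small model on the toy data to try the function out
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))
```

The help page for a function in an add-on package works the same way once the package is loaded with library().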



P.S., @Dason - what is the vetting process to get something into CRAN? This may help noetsi to better understand. @noetsi - this isn't any different from when, say, I use a macro in SAS. I have to look at the code to understand the procedure - very comparable.
 

Dason

Ambassador to the humans
#12
@hlsmith - The base packages go through whatever review the R core team gives them. But it is true that a package submitted by a user doesn't have to prove correctness in any sort of way. Hell they've let me throw a few packages on CRAN.

But once again the code is available for all to see. The code maintainer is a lot of times the person that developed the new method of interest so they would be the most knowledgeable about the code in the first place.

So is there a slight risk to trusting code in an external package? Sure. But to me it's a question of if you trust the relatively unknown procedures SAS uses to guarantee their code correctness. You can see their documentation. You can assume that the procs meet the specifications set for them. But you can't look at their code to verify the correctness. They have people hired to look at the code but still you yourself can't view it to see what is going on under the hood. With R you can view the code. And quite a few people do. So sure if you're using a relatively obscure method that nobody cares about then you might want to check the code out because nobody else might have done that already. But it's not like that method was going to be included in SAS anyways so why are we picking on those methods in the first place?
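
To make "you can view the code" concrete, here is a minimal sketch using base R only. The functions sd and summary.lm are just convenient examples, not anything specific to this discussion.

```r
# Typing a function's name without parentheses prints its source
sd

# Capture the same source as text, e.g. to search through it
sd_source <- deparse(sd)

# Many functions are generics; list the methods, then fetch a specific one
methods(summary)
getAnywhere("summary.lm")
```

Functions that call compiled C or Fortran code only show the R wrapper this way; for those you would need the package's source tarball or repository.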

I will say that SAS tends to have much more complete documentation for their procs than R does for the packages. And that's because writing documentation is boring so package authors typically do enough to get by but nobody is paying them to write the help pages so they tend to do the more interesting things (like add features or add more robust unit tests...)

Note that I work for a *giant* company. We don't use SAS. We use R. We use Python. We use the tools we feel are best for data science. So noetsi can claim that R doesn't get used by companies but the two non-academic jobs I've had so far have used R.
 

noetsi

Fortran must die
#14
So the 1/100 of one percent of the people who understand r code can insure themselves it works. If they can go from that very complex code to the often highly esoteric statistical issues and understand what the code is doing on that. Only a handful of people on the planet I suspect are good enough at the code and understand the statistics well enough to do so (dason is one who can of course).

In SAS they tell you what they did. And they reference the statisticians... (although not the ones that built the code).
 

Dason

Ambassador to the humans
#15
How did you get that 1/100 of one percent can 'insure' themselves it works? That's complete bullshit and not what I said at all. For almost everything a typical R user would want to do there is going to be enough eyes on it and enough people using it that you can be guaranteed quality code. I said for the more cutting edge stuff (the **** SAS won't even have implemented...) that's when you'll need to do more work to ensure that it is valid. But like I said typically it's the person that developed the method that is implementing these packages so they're the ones that understand the code and method the most anyways.

You like shitting on R but I doubt you can point to a single relevant case to back up your claims here that R can't be trusted. You're basically appealing to a boogie-man argument that the R is out to get you and only the most elite know how to avoid being viciously destroyed by the scary codez.

Yet you have no problems with the fact that if I think something is wrong with the SAS output I can't check to see if the code is correct or not. Instead I'd have to contact their support. Wait a while to get contacted back. They'll put in an internal ticket or something to check out the issue and maybe one day it'll get fixed. Now the statisticians and programmers working at SAS are good at their jobs. But they make mistakes too. And it takes time to get through all that red tape. And then once they identify a problem... do I have to wait until their next release? Do I pay extra for that? I'm not even sure - I don't have to deal with the SAS licensing and updates. But I do know how fast fixes and feature requests can come for R packages.

So basically what I'm saying is that if you're coming across a bug or something is coded wrong in R... then you're probably one of the first people to use that package and SAS probably doesn't even have it implemented yet.
 

noetsi

Fortran must die
#16
What I said was 1/100 of one percent of the people who understand R code... which probably overstates the percent of the US public who know R code extensively, well enough to determine what we are talking about, not just run models themselves. :p What percent of the US public do you think knows R code, Dason (let alone the advanced use of it to determine how others generated advanced statistical elements in dispute by statisticians)? Think what percent of the US public has ever had a graduate course in statistics (and many of those courses will not be in R of course; almost none of mine were).

You misunderstood my point, I think. More to the point, the author of the article raised this issue, not me. And for what it's worth I think R code is awesome. My point is that most users (who are not going to be expert coders or have graduate degrees in statistics) are probably going to have an easier time understanding the English-only SAS documentation than inspecting code.
 

Dason

Ambassador to the humans
#17
And my point is that you don't need to inspect the code unless there is something wrong or you don't trust it. Which really only happens on the bleeding edge new stuff. So your point on users needing to inspect R code is somewhat irrelevant. R has human readable English only documentation too.
 

hlsmith

Omega Contributor
#18
@Dason, SAS releases maintenance versions regularly, where they make corrections and add new or experimental procedures. These cost nothing; you just have to request a link to the new install. Side note, which isn't such a big deal these days: SAS is installed in its entirety on your machine, so it is a beast, but it is accessible anywhere as long as you have power.

I will bring up my prior comment, macros in SAS are just like using R. A bunch of code you can review and which may have the same threats as R.

You could probably look somewhere for the number of downloads for an R package to get an idea of how many people MAY be using it, and base your confidence on that if the number is very large.
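
One place to look is the cranlogs package, which queries the download logs of the RStudio CRAN mirror. A hedged sketch: it assumes cranlogs is installed and that you have internet access, monthly_downloads is a made-up helper name, and the counts cover only that one mirror, so treat them as a rough indicator rather than a user count.

```r
# Hypothetical helper: total downloads over the last month for one package.
# Returns NA if cranlogs is not installed or the query fails.
monthly_downloads <- function(pkg) {
  if (!requireNamespace("cranlogs", quietly = TRUE)) {
    return(NA_integer_)
  }
  tryCatch({
    daily <- cranlogs::cran_downloads(packages = pkg, when = "last-month")
    sum(daily$count)
  }, error = function(e) NA_integer_)
}

monthly_downloads("ggplot2")
```

Downloads also include automated installs and dependency pulls, which is why the number only says how many people MAY be using a package.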
 

noetsi

Fortran must die
#19
That is good to know. I had not seen R documentation much.

I think it is true that this is only going to be a major issue on things that are in dispute by statisticians or very advanced. Although I have read that there are enough differences among software to generate different results on much less cutting edge research. Usually the differences are not large.
 

bryangoodrich

Probably A Mammal
#20
"Although they do not say this outright it suggests that R is more used by academics."
I didn't see them saying that at all. The article really doesn't say much about actual data about use in the wild. The difference isn't about business vs academia. It's about developers vs users. Some people can point-and-click their way through Excel. Then you got those that can build an entire application out of an Excel workbook (me). That's the difference. Which is more useful? That depends.

If the 60 year old guy can muddle his way through Excel printouts to bring business value, then that's useful. But a millennial who can code up that process to be repeated by everybody in the organization on an ongoing basis has scalability. However, if that product never gets used because the millennial can't communicate and distribute it, then it's basically useless. The tools don't matter in this case. It doesn't matter if one is more "academic" (meaningless term). The only thing that matters is value. No tool brings that. It's how you use those tools, in the right environment, among the right people, at the right time. It depends.

R is not SAS. It doesn't try to be SAS. That article is stupid from the beginning when it says "R is the Open source counterpart of SAS." It isn't. That statement doesn't even make sense. They're not even remotely doing the same thing (SAS has a database engine, security settings, GUI interface, extensions, and so on; it's an application that can be used for data processing and statistics). R is a programming language that can be used to do a myriad of things, obviously designed for stats. But you can also use Python or Java or Julia or Scala or Spark or the many SQL-based interpreters on the market. All of these get used to process data and compute stats or machine learning models.

Does SAS have a huge market share? Sure, but does it drive the business? That depends. Typically it's analysts and people that need to do "more than Excel" that use it. But like my example above, that doesn't scale. Businesses code. Engineers enable companies to do more. This is a space SAS utterly fails at beyond "hey we can integrate with X and Y" because they can't compete in that space. They simply want to give more people access to what that space has to offer. But no engineer is sitting there thinking "man, I'm so glad I know proc sql so I can integrate my data across our databases and AWS S3 storage to compute real-time statistics on our sales." SAS is only ever going to be at the end of that pipeline. R can be put anywhere into it.

My advice will always be the same for anyone wanting to know what to learn: it depends on wtf you want to do. Are you an analyst or an engineer? Do you like to code up your solutions or prefer to click your way around a spreadsheet? SAS provides a programming interface to its capabilities, but to me it's still just a spreadsheet in the end. If I need stats, sure there are "formulas" that SAS provides, but I'm a coder. Whether I'm in Python or R or JavaScript or Scala I'm only looking for an API to enable me to do those computations. If they don't have them, I'll build them myself (to the extent I know how). That's an engineer's life. We build ****. And that is the only dichotomy between SAS and R.