If you have access to SAS and R, which one would you learn?

#1
Suppose you work at a company that has both SAS and R installed. Which one would you be inclined to focus on for self-learning?
The obvious answer is to learn whichever one the company is using; however, we don't use either in my department, so all of this is for self-learning, gaining another skill set, further job security, etc.
My job right now is as an 'analyst' for an insurance company: data collection, seeing who is doing well and who is not, adjusting factors and pricing. A lot of it is done through Excel and basic arithmetic, so no statistical or predictive analytics concepts. However, I would like to use SAS or R to apply these concepts and see if I can get better and more concrete results than just eyeballing.

Would you start with R or SAS? My first thought was that since SAS costs so much money, I should 'milk it' while I still have it. SAS is still relevant in the industry and is not a bad skill to have. However, while R is free, its value seems to be growing.

Personally, I feel R is a bit friendlier just because I can find more resources on it, but I'm not sure whether the SAS name really trumps everything else.

As for my career goals, I'm not saying I want to be a statistician, but I do want to stay in the data analytics field and make myself more marketable.
 

noetsi

Fortran must die
#2
On this board R :p

I primarily use SAS, and I am slowly learning R. Both have advantages and disadvantages. If you don't have very good programming skills (I don't), R is a lot harder because it has no significant GUI, while SAS has Enterprise Guide. So it's easier to learn SAS. Second, as far as I know, you cannot combine SQL with statistics in R. SAS does do this. Much of my work as a data analyst is getting existing data into a form that can be used to run statistics, including joins and filters. SAS does this very well (through PROC SQL). I have not seen this in R, although it may exist. The result is you can run SQL and statistics seamlessly in SAS, but you probably cannot in R.
Finally, many corporations know SAS, which is itself a corporation. They may not have heard of R.

On the other hand, R does pretty much every type of statistics you can imagine and SAS does not (or at least it does not do certain cutting-edge elements of that). I am learning R to do multilevel and structural equation modeling, which R does very well, but which I have my doubts about in SAS. And it's completely free to use R.

For most corporate functions I would imagine SAS works best. For higher-end analysis, such as many do on this board, R is better. If you want to be a true statistician, which I am not, R is the way to go. SAS is for data analysts.
The many many R fanatics will have fits when they read this :(
 

Jake

Cookie Scientist
#3
Second, as far as I know, you cannot combine SQL with statistics in R. SAS does do this. Much of my work as a data analyst is getting existing data into a form that can be used to run statistics, including joins and filters. SAS does this very well (through PROC SQL). I have not seen this in R, although it may exist. The result is you can run SQL and statistics seamlessly in SAS, but you probably cannot in R.
http://lmgtfy.com/?q=r+sql
 

noetsi

Fortran must die
#4
From the link Jake amusingly posted:

The R community is unique as programming communities go. Many users of R come from academia and have a relatively extensive mathematical background. The R community has developed in relative isolation from some other areas of programming that have been widely adopted by business. To many business users, working with data is synonymous with dealing with a relational database system (RDBMS). Yet none of the R books that I read use relational databases at all (and online resources on the subject are limited).
Which suggests that what I said is accurate. R is not normally used with SQL. I never said it was impossible. There are over 6,000 R packages. But no discussion or presentation related to R that I have ever seen brought up SQL, let alone used it. With SAS it's a central element of the software. They have probably the best GUI of any SQL software.

This is more hopeful, although I wonder how many actually use or will soon use the 2016 server. At a large agency, we still use the 2008 server.

There is a great deal of excitement regarding Microsoft’s acquisition of Revolution Analytics that subsequently lead to R being integrated into SQLServer 2016. SQLServer 2016 is available as a preview, but is still subject to changes before its official release.
TSQL, which the MS server uses, is very good SQL, so if this occurs it will be a big step up for R. It is not clear to me from the article what SQL R uses and how ANSI compliant it is.

https://www.simple-talk.com/dotnet/software-tools/sql-and-r-/
 

bryangoodrich

Probably A Mammal
#5
TSQL, which the MS server uses, is very good SQL, so if this occurs it will be a big step up for R. It is not clear to me from the article what SQL R uses and how ANSI compliant it is.
R doesn't use SQL. R has several libraries that support database connectivity, RODBC and DBI in particular. While I used to use the latter for a number of conveniences, it's the former that supports basic ODBC compliance. However, with the odbcDriverConnect command you can specify a database driver that you have installed, such as {SQL Server}, which provides more native functionality than ODBC compliance requires. In addition, a Data Source Name (DSN), which you should set up anyway, makes connecting to a database, once it is established on the machine, as simple as

Code:
library(RODBC)  # provides odbcConnect, sqlQuery and sqlSave
conn <- odbcConnect("MyDSN")
From there you can do a great many things, most simply reading and writing:

Code:
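sql <- "SELECT * FROM whatever WHERE blah = 1"
x <- sqlQuery(conn, sql)      # Read: the result comes back as a data frame
sqlSave(conn, x, "TheTable")  # Write: save the data frame x as a table
odbcClose(conn)               # Close the connection when done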
Doesn't get easier than that.
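
For anyone without a DSN at hand, here is a minimal self-contained sketch of the same read/write pattern followed by a model fit, i.e. SQL and statistics in one R script. It assumes the DBI and RSQLite packages are installed and uses an in-memory SQLite database in place of a real server. The `policies` table, its columns, and all the numbers are made up purely for illustration.

```r
# Sketch only: the table, columns, and numbers are invented for illustration.
library(DBI)  # generic database interface; RSQLite supplies the SQLite driver

# An in-memory SQLite database stands in for a real server or DSN.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical policy-level data, written to the database as a table.
policies <- data.frame(
  policy_id = 1:6,
  region    = c("north", "north", "south", "south", "east", "east"),
  premium   = c(100, 120, 150, 170, 200, 220),
  claims    = c(20, 26, 28, 36, 42, 44),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "policies", policies)

# SQL does the filtering; the result comes back as an ordinary data frame.
sql <- "SELECT region, premium, claims FROM policies WHERE premium >= 120"
x <- dbGetQuery(con, sql)

# The data frame goes straight into a model fit.
fit <- lm(claims ~ premium, data = x)
slope <- unname(coef(fit)["premium"])

dbDisconnect(con)
```

Against a real database, only the `dbConnect()` call should need to change (or swap in the RODBC calls above); the query result is a plain data frame either way.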

In any case, Microsoft has already embedded R into SQL Server 2016. Versions from 2016 onward will simply be an encapsulation of their Azure cloud services, which makes for an excellent database product, because what 2016 offers already exists when you're using their cloud services. R has been available on Azure for some time now, so the integration already existed. But Microsoft didn't commercialize R. Revolution Analytics had already extended R with certain optimizations, especially for distributed big data computing, and provided enterprise-level support. Purchasing them simply allowed Microsoft to absorb their commercial user base, extend it to the SQL Server user base, and focus on moving further into big data computing and open source environments, which is what they've been doing for the last 2 years (SQL Server on Linux coming; Visual Studio on Linux; Hadoop on Azure, etc.).

It isn't a big step up for R. It's a step up for Microsoft and SQL Server. Otherwise, the only other paradigm for computing where the data resides is Hadoop, which, short of having a specific platform that supports it with R (e.g., Teradata), would require platform-specific means of computing on the data. SAS already follows this model. SAS has an embedded database system, so you can do your data access/management in SAS or compute in SAS, and it seems like "one thing." This is where it is important to discern the difference between a platform and the technologies that provide access to that platform.

In that sense, R allows SQL access as much as any other technology that supports SQL access. R is not a platform, though. It's a computing technology. To me, it's one of the easiest computing technologies to learn because it's built for dealing with data.
 

noetsi

Fortran must die
#6
I guess what I meant is whether there is a set of R commands similar to PROC SQL that allows you to do transactional data analysis. I think you said yes, bryan, although I am not sure. :p Is it ANSI compliant? Access, for example, really was not: way too many legacy commands that violated ANSI.

A key for the OP is to find out what the companies you work for, or will work for, use. They may or may not care what SQL you use, but you need to check. And check that your company will allow you to connect to one of the R links through which you run it; mine did not until I appealed. You can't use it if you can't access it.
 

bryangoodrich

Probably A Mammal
#7
R isn't implementing its own SQL language. You're literally just passing that SQL string from R over to the database itself. It's using whatever language the database uses.
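
To make that concrete, here is a small sketch, again assuming the DBI and RSQLite packages and an in-memory SQLite database. The `agents` and `sales` tables are hypothetical. The join and aggregation run inside the database engine, and the second query calls `sqlite_version()`, a function that belongs to SQLite rather than to R, which shows that the SQL string is handed to the engine as-is.

```r
# Sketch only: the tables and values are invented for illustration.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Two hypothetical tables to join.
dbWriteTable(con, "agents",
             data.frame(agent_id = 1:2, name = c("A", "B"),
                        stringsAsFactors = FALSE))
dbWriteTable(con, "sales",
             data.frame(agent_id = c(1L, 1L, 2L), amount = c(10, 20, 5)))

# The join and GROUP BY are executed by SQLite, not by R.
totals <- dbGetQuery(con, "
  SELECT a.name, SUM(s.amount) AS total
  FROM agents a
  JOIN sales s ON s.agent_id = a.agent_id
  GROUP BY a.name
  ORDER BY a.name
")

# sqlite_version() is a SQLite function; R just passes the string along.
engine <- dbGetQuery(con, "SELECT sqlite_version() AS version")

dbDisconnect(con)
```

Pointed at SQL Server instead, the same `dbGetQuery()` call would accept TSQL, because the dialect is whatever the connected database understands.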

Access requires its own SQL language because it's using its own database engine, but if you connect Access to another backend (and you can), then you're not using the JET engine SQL language (which is ANSI-89 level 1 compliant).

The same is true of SAS. It uses its own database engine, and thus it has to use its own SQL language, which implements most of the ANSI SQL rules and keywords, but also introduces some of its own (like the "calculated alias" keyword). However, when you make a connection to a different backend, as I've done in SAS, you have full access to the commands of that database (e.g., I can use TSQL statements in SAS running against a SQL Server instance).
 

rogojel

TS Contributor
#8
I would not make this decision dependent on one feature, like SQL. I would definitely pick R for two reasons: it is available everywhere, and it has way faster, shorter development cycles, so new features will arrive way faster. And, BTW, it is more or less the reproducible standard, readable and testable by everyone.

regards
 

hlsmith

Omega Contributor
#9
I've used SAS for years. Originally R was deemed difficult to use, though now, with all of the online resources, I think that statement needs updating. As long as you are learning the fundamentals, that is what's paramount. The actual answer: learn them both. Picking is silly and limiting.
 

rogojel

TS Contributor
#10
You might not have the time to do both. E.g., I never bothered to learn Pascal, Java, etc. once I learned C and C++. Life is too short, and I would prefer proficiency in one to acceptable performance in two :)
 

hlsmith

Omega Contributor
#11
If I were hiring, I would prefer someone who was more open. Think of all the countless programs out there, e.g., Spark, RevMan, HLM, WinBUGS, SAS, R, etc. Seeing how other programs do things and the options they provide is only going to broaden a person's expectations and knowledge. I am not writing about duplicating every procedure in each language, just optimizing your options.
 

rogojel

TS Contributor
#12
It really depends on the type of job you are hiring for. It rarely happens that the job needs several languages, and then you would want a specialist in the particular language/system you have. Actually I have never seen a case where a generalist was preferred - and I sat on both sides of a job interview table lots of times.