benefit of Proc SQL vs SQL vs learning SAS syntax

#1
I'm just starting to learn SAS and I'm wanting to know which direction I should take in my learning.

I work in a financial industry and our primary database work is in SQL Developer. My main objective in learning SAS is so I am able do statistical analysis on my datasets(I'm still new to SAS, so I do not know of any database benefits of SAS vs SQL).

I know that SAS has it's own syntax, and also it has the proc SQL which will let me code in SQL syntax.

My question is whether it is beneficial to start from the beginning in learning SAS (i.e code in the SAS interface), or can I use SAS just for it's statistical tools (I,e instead of coding in SAS, just import my SQL Data every time)?

What is the benefit if proc sql vs coding in SQL dev/server and then import the data?

Also, can SAS on it's own be used as a database management software like SQL, or does it really need proc sql to do all the joins, etc etc

Can I just learn the statistical functions without knowing the basic SAS syntax (since I would have already formatted and manipulated my data in SQL before outputting)?
 

bryangoodrich

Probably A Mammal
#2
Depending on what you're trying to do, you'll want to know some base SAS coding. SAS manages sas7bat files as if they were database tables, so in that sense, you can code SQL against your SAS data. Additionally, SAS can make external connections to your database sources to which your SAS SQL code will get translated to the native database language. The nice thing about SQL is at its core, it is universal. If something accepts full ANSI SQL, then you can code SQL against it. However, every database engine extends SQL with specific functions and ways to make things easier (e.g., pivoting data, doing row-based operations, different aggregate calculations or date functions).

When it comes to performing statistical analyses, data importing, data manipulation, you will want to know how to do those in the SAS language as well as SQL, as some of those operations are not able to be done from the SQL language. Some parts of that process may be, but in your SAS code you may switch from a data step to a proc sql step and back to a data step. Thus, if you're going to be a SAS user, you might want to get some feel for using the language. Myself, I do most of my work in SQL when I'm in SAS, but sometimes the proc SQL language doesn't have the nice things Microsoft T-SQL has, so I have to pause, do a data step (e.g., create unique row numbers), and then go back to SQL code. Or when I'm connecting to my SQL Server database, there's no reason to export from SQL Server, import into SAS, and then do code. I can just do my SQL code directly from SAS through a connection (or create a stored procedure on SQL Server and just call the procedure from SAS rather directly).
 

noetsi

Fortran must die
#3
I run statistics which involves drawing data out of a relational data base (about 95% of the time is SQL which I suspect is pretty common). Having seen old type SAS code used before SQL (commonly this involves merging statements) SQL is infinitely more useful/easy to do. Also PROC SQL is pretty similar to T SQL so if you learn one moving to the other is not that difficult.

If you are going to do stats than just use the EG GUI (you have to learn little coding to do so). Commonly you will need to add elements (say contrast statements) which EG won't do. But you can use the PREVIEW button in EG to find the base code and then just add the extra commands you need ( you just cut and past the commands from EG to a code window and add what you need).

SAS INC is not great at explaining how to code in its web site (they are statisticians not coders). But there are commonly places on line you can find what you need, notably the SUGI conference papers.
 

bryangoodrich

Probably A Mammal
#4
Well this day and age, most analytical products provide some form of SQL to interact with the data; if the data is relational (structured/tabular), then it's designed for SQL. You know SQL you can work on SAS, Microsoft, Oracle, etc. data products in some capacity. Even big data technology has ported a subset of the ANSI SQL standard to provide for using distributed computed systems (e.g., Hadoop) and data processing, such as HiveSQL and its variants Tez, Drill, and so on. Nothing beats also understanding relevant programming languages that can work against the systems you're working with. For SAS, obviously know base SAS. If Hadoop, knowing some Pig or ways to implement other languages (R, Python, Scala, Java) on it. Otherwise, general purpose languages like R and Python (or JavaScript or Closure, and others) can be used across many systems for different purposes.

In short: If you want to code against data, nothing beats knowing SQL. Then know some programming languages either specific or general to what you're doing.
 

noetsi

Fortran must die
#5
Personally I think learning base SAS is a waste of time for generating data for stats. All our old reports used it, and they are moving to PROC SQL and away from base SAS [for drawing in data] because its so much easier to code and understand. Its very hard to read through base sas data pulls, they tend to be convoluted because SAS was never really designed before SQL to deal with relational tables. You have to merge tables over and over again, which is very inefficient.