Is retrieving data using datastep really slow?

#1
I'm trying to get use to coding in SAS and it's syntax, because If I want to do more statistical analysis and data basic manipulation, I can do it all in SAS vs "Do it in SQL Server, paste it to a work book, and then import the data in SAS."

Am I doing it wrong or is DataStep really slow? Im trying to pull a basic table from one of our servers, but it's really slow.
Here's my code:

data Counts(firstobs= 1 obs=10);
set library.table;
run;

All I'm doing is I'm pulling the first 10 observations from table. We do house a lot of data, but I figured just pulling the first 10 observations shouldn't be too complicated. I also just tried obs = 10, but that pulled all my data, but took about 20 minutes.

*EDIT.. after about 20 minutes, the firstobs=1 did not work.

Am I missing something?

Also, I'm not using Proc SQL on purpose so I can learn another language.
 
Last edited:
#2
When drawing data from a SQL table using a “DATA” step, SAS converts the code into SQL in the background anyway because that’s the only language the database understands. You’ll probably find that “PROC SQL” is just as slow. I suspect that the bottleneck is either (1) a huge source table, or (2) your LAN connection (assuming the server is a different computer to the one on which you’re working). If you’re using a standard 10/100 Ethernet connection, make sure that it’s configured for 100 Mbps Full Duplex or Auto Negotiation. Also, don’t use network drives for SAS libraries in which to store datasets while drawing from a remote source because doing so increases your bandwidth requirements significantly and slows things down tremendously. Your Network and/or System Administrator should be able to help you get more speed.

Another possible way to speed things up is to specify which columns you want in a “KEEP” clause. This works well if the source table has lots of columns.
Code:
data Counts;
	set library.table (keep = Col6 Col17 Col23 Col34);
run;
What do you mean to achieve with the “FIRSTOBS” specification? The order of observations in a SQL table is not guaranteed to remain static. If you have the necessary disk space, it is often better to draw the whole SQL table across into a SAS dataset and refine your selections subsequently on your local machine because all of the table’s rows are normally passed to SAS anyway—that is, row selection processing usually happens on the local machine, not the server.
 
#3
Thanks Con-Tester,
Basically, what I'm trying to achieve with 'firstObs' is to simply get the first 10 rows of the data. As I query and join my data together, I want to periodically check that my cod and table are still functioning. Is there a better way to do this?
 
#4
Well, as I said, you need to be aware that the order in which a SQL table’s observations occur is not guaranteed to remain the same from one run to the next, so that you could end up with different “top 10” observations at different times. If that’s not an issue, you can use this SAS code:
Code:
data Counts;
	set library.table (keep = Col6 Col17 Col23 Col34);
	if (_N_ > 9) then stop;
run;
 

Stu

New Member
#5
I'm trying to get use to coding in SAS and it's syntax, because If I want to do more statistical analysis and data basic manipulation, I can do it all in SAS vs "Do it in SQL Server, paste it to a work book, and then import the data in SAS."

Am I doing it wrong or is DataStep really slow? Im trying to pull a basic table from one of our servers, but it's really slow.
Here's my code:

data Counts(firstobs= 1 obs=10);
set library.table;
run;

All I'm doing is I'm pulling the first 10 observations from table. We do house a lot of data, but I figured just pulling the first 10 observations shouldn't be too complicated. I also just tried obs = 10, but that pulled all my data, but took about 20 minutes.

*EDIT.. after about 20 minutes, the firstobs=1 did not work.

Am I missing something?

Also, I'm not using Proc SQL on purpose so I can learn another language.
It's generally very fast. You'll need to see what code is being sent by SAS/ACCESS in the background to understand where the slowdown is. Certain options and functions do not translate. If any are specified, SAS will pull everything from the database before filtering in order to prevent a process from failing.

In your case, you are using the obs= and firstobs= options on the output dataset, not the input dataset. Everything will be read in its entirety from the SQL server before being filtered.

Additionally, if you still have performance problems, you can check on exactly what is being sent. At the top of your program, add the following lines:

Code:
options sastrace=',,,db' sastraceloc=saslog;
When you run your code, you will see the log filled up with responses from your SQL server. If you see these statements in sequence:

ACCESS ENGINE: SQL statement was not passed to the DBMS, SAS will do the processing.
SAS_SQL: Unable to convert the query to a DBMS specific SQL statement due to an error.
ACCESS ENGINE: SELECT * FROM LIBRARY.TABLE

The SQL server will be doing minimal, if not zero, filtering on the data, and sending it all to your SAS server before any filtering is done. Since the firstobs= and obs= options are SAS-specific, they're probably not going to pass to the server. You'll want to filter your table with a WHERE statement, and definitely enable the SQL_FUNCTIONS=ALL libname option. For example:

Code:
libname library oracle schema=myschema sql_functions=all;
You can see a list of the supported and unsupported functions for various database types here:
http://support.sas.com/documentatio...t/viewer.htm#p0f514zivtwp5an1msqy0l0br8wc.htm

If you do need to use an unsupported function, it's best to first filter your data at the lowest level possible and bring it to the SAS server, then perform the SAS functions on the dataset. You can multithread the data manipulation using various techniques to speed it up as well.
 
Last edited: