chengjia
05-02-2009, 03:27 PM
We (beauthor.com (http://beauthor.com)) published a new SAS book. Here is the preface.
Preface
There is an old saying in Chinese: "Indigo comes from blue but is darker than blue; ice comes from water but is colder than water." This means, "The master is surpassed by the apprentice." However, this saying also describes the situation of my two "daughters": Understanding SAS and Selected Papers on SAS. This new book (Selected Papers on SAS) is derived from the older sister (Understanding SAS), but she is prettier and more charming than the older sister because I have concentrated on important and interesting parts rather than discussing every point. Actually, I should say that papers in this book are the essence of the older sister.
In the book Understanding SAS I included many new ideas, that is, my own research results. I have finished that book, but its labor and delivery are taking a long time. Meanwhile, I decided to write some papers. I took some ideas from the book and finished them as separate, independent papers. So now the younger sister makes her debut before her older sister.
The book has 17 papers.
There are three main kinds of content: basic, fundamental topics such as names, the order of statements and options, the end of a data set, and operators; hot topics such as CHKLOG, "page of", RTF files and special characters, and transferring data between SAS data sets and Excel files; and papers that are just for fun.
The SAS manual is a great resource for every SAS programmer. However, it is too general in some places. For example, when talking about variable names, it says:
You do not assign the names of special SAS automatic variables (such as _N_ and _ERROR_) or variable list names (such as _NUMERIC_, _CHARACTER_, and _ALL_) to variables.
This is far, far from enough. As we know, many computer languages, such as SQL, have a fixed set of reserved words. But SAS is different. "The rules are more flexible for SAS variable names than for other language elements. . . . SAS reserves a few names for automatic variables and variable lists, SAS data set, and librefs."
First, what are these "few names"? Second, this gives users great flexibility: they can use almost any word freely. On the other hand, it brings users some inconvenience, because you don't know how SAS will treat a word: as a user-defined name or as a keyword? That depends on the context and on how SAS interprets it. Sometimes you may think it is OK to use a particular word as a name, but SAS says: No, it is a keyword. Then your program will be messed up. The following program creates two printouts. Guess what these printouts are.
DATA s;
a=5;n=6;
PROC REPORT NOWD;
COLUMN a n;
RUN;
DATA s;
not=0;
yes=not+1;
PROC PRINT;RUN;
Therefore, if you don't know how SAS behaves, you may not get the results that you want. We need to know about individual names: which names are forbidden in SAS, and which names SAS interprets in its own way and in what situations. Then we can use these names correctly or simply avoid them.
Another example is subsetting. We all know that we can use either a WHERE statement or a subsetting IF statement to subset a data set, and that the two differ in some ways. In some situations they produce different data sets, as the SAS manual mentions:
The WHERE statement can produce a different data set from the subsetting IF when a BY statement accompanies a SET, MERGE, or UPDATE statement.
Then we may ask the question: If there is no BY statement, are there any differences? Some books discuss this, but not in enough detail. We need a comprehensive comparison between the two statements. Several papers in this book discuss this topic.
The end of a data set is a fundamental concept in the SAS language. Almost every programmer uses the END= option. In paper [IV] I discuss when we can use this option and how SAS processes it.
Relationships among the options KEEP= (DROP=), RENAME=, and WHERE= are basic. In paper [III] I discuss relationships among statements and options. You may not care about the relationship between the options OBS= and WHERE= because you never use them together. However, it is quite possible that you have used the options RENAME= and WHERE= together. The SAS online documentation talks about relationships among the options KEEP=, DROP=, and RENAME=, but says nothing about how they interact with the WHERE= option. So what is wrong with the following program?
DATA s;
a=3;
DATA t;
SET s(WHERE=(a>2) RENAME=(a=b));
RUN;
3 DATA t;
4 SET s(WHERE=(a>2) RENAME=(a=b));
ERROR: Variable a is not on file WORK.S.
5 RUN;
NOTE: The SAS System stopped processing this step because of errors.
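One way to read the log above, offered as a sketch rather than as the book's own explanation: on an input data set, SAS applies RENAME= before WHERE=, so by the time the WHERE= expression is evaluated the variable is already named b. Assuming that order of application, referring to the new name inside WHERE= should let the step run:

```sas
DATA s;
  a=3;
RUN;

DATA t;
  /* Assumption: RENAME= is applied before WHERE=,  */
  /* so the WHERE= expression uses the new name b.  */
  SET s(RENAME=(a=b) WHERE=(b>2));
RUN;
```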