Somewhere else on your spreadsheet (doesn't matter where) let's say, column E and F you can list the name in column E and the numerical equivalent in column F

So, e.g.

E F

Black 1

Caucasian 2

Hispanic 3

Asian 4

Or whatever you want.

You then put the following formula into column B

=VLOOKUP(A1,$E$1:$F$4,2)

This formula looks up the value from cell A1 in the array E1:F4 and returns the value from the second column in E1:F4

The $ are important so that when you drag the copy handle and copy the formula to the other cells the A1 reference changes to A2, A3, etc, but the E1:F4 stays constant.

Hope this helps.

Dason- I think a regression analysis on SAS necessarily requires you to have numerical values- so no other reason .

Plus just doing a direct coding into numbers isn't the right thing to do. See this thread at the stats stackexchange site for explanations why: http://stats.stackexchange.com/questions/20166/how-to-implement-dummy-variable-using-n-1-variables

SAS will do the conversion for you. You just tell SAS that your variable is categorical and it does the rest for you. Typically the software you're using will do the dummy coding for you.

Plus just doing a direct coding into numbers isn't the right thing to do. See this thread at the stats stackexchange site for explanations why: http://stats.stackexchange.com/questions/20166/how-to-implement-dummy-variable-using-n-1-variables

Plus just doing a direct coding into numbers isn't the right thing to do. See this thread at the stats stackexchange site for explanations why: http://stats.stackexchange.com/questions/20166/how-to-implement-dummy-variable-using-n-1-variables

When is coding acceptable? Is it okay to code when there are only two values, for example, yes/no, red/blue, male/female ? Asking 'cause I've done it already! Crap.

Code:

```
proc reg;
class Race;
model ...;
```

If you do not put the CLASS statement, then you are implying that there is a consistent trend in races and as the race value increases (2 to 3 to 4) the trend continues (linearly). Obviously not what you want to do with races.

Code:

```
proc reg;
class Race;
model ...;
```

If you do not put the CLASS statement, then you are implying that there is a consistent trend in races and as the race value increases (2 to 3 to 4) the trend continues (linearly). Obviously not what you want to do with races.

The class statement doesn't seem to be working with proc reg data=xyz; . The moment I put class underneath this, it shows up in red and SAS doesn't run eventually. I'd put : class racecode (ref='0')/param=ref; . Didn't work.

Plus just doing a direct coding into numbers isn't the right thing to do. See this thread at the stats stackexchange site for explanations why: http://stats.stackexchange.com/questions/20166/how-to-implement-dummy-variable-using-n-1-variables

model outcome=gender;

run;

?

(coded gender 1 and 2)

Also, if the outcome is dose and the gender-dose association comes out to be significant, how do you tell from the SAS output which gender gets the higher dose?

Thanks.

Last edited:

Also my previous question- must the comparison variable for dummy variables always be coded 'o'? I mean, should it always be 'else 0'?

Stats not my field so at sea. Pardon the naive questions. Thanks.

I am sure this is true in the actual coding (well it has to be) but I try to avoid such

Column: Race- has Blacks, White, Hispanics, Asians etc.

In the next column I need to give these individual races numerical values, for eg. Black is 1, White is 2 and so on.

How to do it?

Thanks.

In the next column I need to give these individual races numerical values, for eg. Black is 1, White is 2 and so on.

How to do it?

Thanks.

Procedure:

1) Insert an extra column.

2) In the Penn World Table, the first country is Angola (country iso code AGO) and assuming that you have 41 year observations for each country ranging from 1960 to 2000 - the observations for Angola will run from row 2 to row 42. The next country is Albania (country iso code ALB) and its observations run from row 43- row 83, etc. (observations start from the second row since the first row contains variable names)

3) In the new column, put a 1 in the second row (corresponding to the first observation for Angola). Then, use the "IF" function to specify that all remaining entries for Angola be coded "1", Albania "2", etc. Assuming that the iso_code identifier is in column B in Excel, the following formula should do it

"=IF(B3= B2,C2,C2+1)"

Recall that we put a 1 in the second row of the new column ( "C2" in this case). So in cell C3, we have instructed Excel to assign value 1 if the country identifier in column B3 is the same as the country identifier in column B2, ELSE - assign a value of C2+1 = 2.

4) Drag down this formula until the bottom and each country will now have a unique numerical identifier.

Andrew Musau

Faculty of Economics and Social Sciences

University of Agder