Two for one: Unique variable values

ondansetron

TS Contributor
#1
I used the search and did not find the answer, and I've been poking in SAS documentation without much luck.

Problem 1) I would like to tell SAS to randomly select x% of my data set, subject to the constraint that all values of a particular variable are unique within the sample.
For example, if I have a sheet of 500 people with birthdays, I want a 5% random sample of these observations. However, I want zero birthdays repeated in the sample.

proc surveyselect data=birthday method=srs rate=.05 seed=12345
    out=uniqueBD;
run;

This is the standard syntax, but I have not found a way to add the constraint that a birthday must be unique to end up in the sample. Neither Google nor the SAS documentation has given me what I'm looking for so far.
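One possible approach, as a minimal sketch rather than a definitive answer: reduce the data to one randomly chosen row per birthday first, then draw the simple random sample from that reduced set. The variable name birthday and the intermediate data set names (shuffled, one_per_bd) are assumptions for illustration. Note that this does not give every person the same selection probability, since people who share a birthday compete for a single slot.

```sas
/* Assumption: data set BIRTHDAY has a variable named birthday (hypothetical name). */

/* Attach a random sort key to every row. */
data shuffled;
    set birthday;
    if _n_ = 1 then call streaminit(12345);
    _u = rand('uniform');
run;

/* Within each birthday, order rows by the random key. */
proc sort data=shuffled;
    by birthday _u;
run;

/* Keep one randomly chosen person per distinct birthday. */
data one_per_bd;
    set shuffled;
    by birthday;
    if first.birthday;
    drop _u;
run;

/* Draw the sample from the deduplicated rows.
   n=25 is 5% of the original 500 rows; rate=.05 here would
   instead take 5% of the distinct birthdays. */
proc surveyselect data=one_per_bd method=srs n=25 seed=12345
    out=uniqueBD;
run;
```

Because one_per_bd holds each birthday exactly once, no birthday can repeat in uniqueBD.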

Problem 2) I want to assign a unique, randomly generated ID to each observation subject to the constraint that some other feature is constant for a given ID.
For example, I want to assign a randomly generated, unique ID for each observation such that observations with the same address receive the same unique code, but such that different addresses have different codes.

Here is how I was thinking of going about it:

Use a DATA step to create a new variable for the ID, with a random number function generating an 8-digit number. I'm not strong enough in SAS to tell it: for each distinct address, generate a unique value, but for matching addresses, assign the same value.
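A sketch of how that idea might look, under stated assumptions: the input data set is called have and the address variable is called address (both hypothetical names). It builds a table of distinct addresses, draws a random 8-digit ID for each one while using a hash object to redraw on collisions, and then merges the IDs back onto every observation.

```sas
/* Assumptions: input data set HAVE with a variable named address
   (both names are hypothetical). */

/* One row per distinct address, sorted by address. */
proc sort data=have(keep=address) out=addresses nodupkey;
    by address;
run;

/* Draw a random 8-digit ID per address; redraw if already used. */
data addr_ids;
    if _n_ = 1 then do;
        call streaminit(2024);
        declare hash used();      /* holds the IDs handed out so far */
        used.defineKey('id');
        used.defineDone();
    end;
    set addresses;
    do until (used.check() ne 0); /* nonzero return: id not yet in the hash */
        id = 10000000 + floor(rand('uniform') * 90000000);
    end;
    used.add();
run;

/* Attach the ID to every observation with that address. */
proc sort data=have;
    by address;
run;

data want;
    merge have addr_ids;
    by address;
run;
```

Rows with the same address share one row in addr_ids, so they receive the same ID; the collision check keeps IDs from being reused across addresses.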

Any guidance with one or both is appreciated. The first problem is the most important.
 

ondansetron

TS Contributor
#3
"If the values in the original table are unique to start with, can't you just sample without replacement?"
I could, but they are not unique.

Take for example measuring a child's height over time. When they enter observation, they are assigned a random number for ID. Each measurement on the child is entered into a data rectangle with the ID to specify which person the measurements belong to in the study.

ID  height  date
1   50      x
2   65      x
3   70      x
1   55      x + delta
2   70      x + delta
...

where x is some start date and delta is the change between dates.

Random sampling of the rows could pull ID 1 twice, but I want each ID at most once: the goal is to sample one person, rather than one observation.
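For this layout, one option (a sketch, not the only way) is to sample from a list of distinct IDs and then, if the measurements are needed, pull back the rows for the selected people. The data set name heights and the variable name id are assumptions for illustration.

```sas
/* Assumptions: long-format data set HEIGHTS with a person identifier
   named id (both names are hypothetical). */

/* One row per person. */
proc sort data=heights(keep=id) out=ids nodupkey;
    by id;
run;

/* Sample people, not observations: 5% of the distinct IDs. */
proc surveyselect data=ids method=srs rate=.05 seed=12345
    out=sampled_ids;
run;

/* Optional: bring back every measurement for the sampled people. */
proc sort data=sampled_ids;
    by id;
run;

proc sort data=heights;
    by id;
run;

data sampled_heights;
    merge heights sampled_ids(in=picked);
    by id;
    if picked;
run;
```

Each ID appears once in ids, so no person can be drawn twice; sampled_heights then contains all of that person's measurements.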