Hi, I have a variable that contains numbers and letters. Participants were supposed to enter their two-letter state abbreviation first (e.g. NY), followed by certain numbers and letters.

2 OH34789
3 FL3
4 45OHnmo56
5 ny1234

I would like to extract the first two letters of this string variable, which are supposed to be the state. How can I do this in Stata? Is there also a way to deal with participants who did not follow the correct format when entering their IDs, such as participant 4, whose ID does not have the state letters at the beginning, or participant 3, whose ID has two blank spaces at the beginning?

1. Strip out the numbers:
foreach n of numlist 0/9 {
    replace state=subinstr(state, "`n'", "", .)
}
2. Trim the blank spaces:
replace state=trim(state)
3. Extract the first two letters and convert to upper case:
replace state=upper(substr(state, 1, 2))
After that, run:
tab state, mis

This lets you check that the states are all valid and look at the missing ones to see why they're missing. It is probably best to work with a copy of the original variable rather than making all of these modifications to the original. That way you can look at the original value to see why the final one is missing (or wrong).
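If it helps, here is a rough sketch of the same idea done on a copy, with a flag for the IDs that did not follow the format. It assumes the raw IDs are in a string variable called id (a made-up name, substitute your own) and that you have Stata 14 or newer, since it uses the Unicode regex functions. The variable bad_format is also just an illustrative name.

```stata
* Assumption: the raw ID is stored in a string variable named id
* Keep id untouched and build the state code in a new variable

* Remove everything that is not a letter (digits, blanks, punctuation)
generate state = ustrregexra(id, "[^A-Za-z]", "")

* Keep the first two remaining letters, in upper case
replace state = upper(substr(state, 1, 2))

* Flag IDs that do not start with two letters once leading blanks are trimmed
generate byte bad_format = !ustrregexm(strtrim(id), "^[A-Za-z]{2}")

* Check the result and inspect the entries that broke the format
tab state, missing
list id state if bad_format
```

An ID like 45OHnmo56 would still get a state value from its first two letters, but it would also be flagged, so you can decide by hand whether to trust it.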


