Infile for non-standard data

#1
I have a messy file (does not line up in columns), where some of the columns are tab delimitated and some are comma.

My problem with the data set is reading the files with variable lengths

12 Stephen Cole, 33, Columbia, MO
5 Dave Anderson, 25*, Concord, OH

The first column is a ID (tab) the the name (comma, variable length) age (comma), active (presence of an asterisk after age), home (tab, variable length)

The * after the age indicates if they are inactive.

All the names start at column @19, but everything after that is variable lengths and column starts.

I want to read into a format where I finally get.

ID Name Age Active Home
12 Stephen Cole 33 Active Columbia, MO
5 Dave Anderson 25 Inactive Concord, OH

Thus far I have:

data marathon;
infile 'c:/file.txt' dlm=',' pad firstobs=12;
input @3 ID 3. @19 Name $CHAR13.;


Then I get stuck on how to read the rest. I am mostly thrown with how to read the asterisk next to the age as its own column. If I had that understood, I think I can handle the rest.
 
#2
What I would do is use a multi-step approach:
1. Import the source file as a fixed-length record type in four columns, namely ID, Name, RestOfRecord and RecordNo where RecordNo is a numeric type (assigned by a “RecordNo = _N_;” statement in the first import step) and RestOfRecord is a string type (make sure it’s long enough to take everything after the name). Call this dataset X1.
2. Use a data step to split X1 into X2 and X3 such that X2 keeps ID, Name and RecordNo, and X3 keeps RecordNo and RestOfRecord.
3. Export X3 to a temporary comma-delimited text file, say X3.txt.
4. Import X3.txt as a CSV file into dataset X3, expanding the appropriate variable names of the comma-separated RestOfRecord fields.
5. Sort both X2 and X3 by RecordNo and merge into your final dataset.

I hope the above is clear. Please ask if it isn’t. That’s how I would approach it, based on the info you have provided. There may be a more elegant solution but I’m not aware of any for such a mixed storage format. Concerning the Age variable with the potential for an inactive indicator following it, first import this field as a string and then add appropriate processing to turn it into a numeric variable, plus a separate active/inactive indicator.