View Full Version : matching obs in two colms
SAS_Novice
08-06-2010, 08:29 AM
Hi all,
I have got the data in the format below:-
A id1 id2
1 a f
2 b g
3 c k
4 d m
5 e n
6 f a
7 g p
What I want to do is if id1 is equal to one of the id2's then I want to create a new variable called WANTED.
if id1 matches to any one of the id2 observations , then WANTED='YES" else WANTED='NO'.
for example:-
Here a appears in id1 column and id2 column, hence WANTED=YES
but e appears in id1 but not in id2 hence WANTED=NO.
Is there anyway we could do this?
Thank you very much in advance.
I'm going to call your data set "original". See below.
data list;
set original (keep=a id2);
proc sql; create table wanted as select A from original, list
where original.id1=list.id2 order by A; run;quit;
data final;
merge original (in=b) wanted (in=c);
by a;
length wanted $3.;
if b*c then wanted="Yes";
else if b then wanted="No";
run;
SAS_Novice
08-06-2010, 11:23 AM
Thank you vey much Link for your prompt reply. But, seems there is a wee bit of trouble.
Here is SAS Log:
proc sql;
77 create table wanted as select A from original,list
78 where original.id1=list.id2 order by A;
ERROR: Ambiguous reference, column A is in more than one table.
ERROR: Ambiguous reference, column A is in more than one table.79 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
80 data final;
81 merge original (in=b) wanted (in=c);
ERROR: File WORK.WANTED.DATA does not exist.82 by a;
83 length wanted $3.;
84 if b*c then wanted="Yes";
85 else if b then wanted="No";
86 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.FINAL may be incomplete. When this step was stopped there were 0
observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.03 seconds
Hope you can help me out once again.. Meanwhile I will keep researching on Proc Sql and try to gain some confidence on it. Thank you very much once again.
data a1;
input A id1 id2;
datalines;
1 23 34
2 25 23
3 27 30
4 28 84
5 30 90
6 99 28
7 30 22
;
proc sql;
create table _temp_ as
select a.*,b.id2 as k
from a1 as a left join a1 as b
on a.id1=b.id2;
quit;
data a2;
set _temp_;
length wanted $3;
if k = . then wanted = "No";
else wanted = "Yes";
drop k;
run;
SAS_Novice
08-06-2010, 12:08 PM
Thank You Very Much Pygm . This is exactly the sort of thing what I was looking for. Thanks again.
Just a slight problem I have though.
The data that I made up above are all numeric but the data I have is all text data. SOrry for this, I should have explicitly said it above.
The data is like below:-
A id1 id2
1 a f
2 b g
3 h p
4 d m
5 e u
6 f l
7 g t
What I want to say is that Id1 and Id2 are text variables rather than a numeric variable. Is it possible to do similar things with text too?
Many Thanks again!
no problem,
sure u can
replace "if k=." with "if missing(k)"
:)
SAS_Novice
08-10-2010, 05:28 AM
no problem,
sure u can
replace "if k=." with "if missing(k)"
:)
Thanks Pygm. It works prefectly for a small toy data set like the one above.
I was repeating the drill for somewhat larger data set and it somehow failed to generate the desired output for me.
Let me give a more accurate picture..of the data...Here are the two columns of the data I have got
id1
R1-203B-00570
R5-203-00568
R9-203K-00569
R11-203-00431
and
id2
R7-203B-00570
R5-203-00568
R9-203K-00569
R15-203-00431
So, When I run the code that you provided,
So, Instead of generating a new column called 'K', the code genereates a new column called 'id2' which are all blanks instead of YES or NO.
Hope I have made myself clear.
Many Many Thanks for your time,effort and help again.
Powered by vBulletin™ Version 4.1.3 Copyright © 2013 vBulletin Solutions, Inc. All rights reserved.