Add price information based on reference number to different observation

#1
Hi

I have a large dataset that includes some observatios that are identified as either E or A.
Observation A and E have a common reference ID number.
Observation A incudes price information whereas E does not. However based on the reference number I would like to add the price information from observation A to observation E.

So something like if ref # in obs A equals ref # in obs E then add Price from obs A to obs E.

I have tried looking at the merge function, but as far as I can tell that won't work.

Is it a if-else-then function that needs to be applied?

Any help would be much appreciated.

Thanks
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Is this long or wide data?

Yes, this possibly seems like a basic if-else-then statement. Let us know if the multiple observations for the same id exist and are new rows or not. So perhaps describe the data a little bit more.
 

Janus

New Member
#3
Two thoughts:

1) Order the data by ref number and obs (A, then E) and use a retain statement.
2) Use proc sql. Call price information from A via an inline query and match it to E:

Code:
proc sql;
create table newdata as
select
[INDENT]d.*,[/INDENT]
[INDENT]i.priceinfo as newpriceinfo[/INDENT]
from
[INDENT]data d,[/INDENT]
[INDENT](select[/INDENT]
[INDENT][INDENT]ref,[/INDENT][/INDENT]
[INDENT][INDENT]priceinfo[/INDENT][/INDENT]
[INDENT]from[/INDENT]
[INDENT][INDENT]data) i[/INDENT][/INDENT]
where
[INDENT]d.ref=i.ref[/INDENT]
; quit;
I haven't tested but that would probably work. I would check to make sure both A and E are retained.
 
#4
The data for an observation identified as an A (added order) has a unique reference number with a price point attached.
The data for an observation identified as an E (executed order) has the unique reference number too, but with no price point, as this information is implicitly available.

The data occurs on seperate rows.
 

Janus

New Member
#5
I see that my query doesn't filter out the E observations. Add a where clause to the inline query that calls only A obs:

where obs = 'A'
or
where priceinfo ne .

Something like that. Does that work?
 
#6
I'm not sure i can follow your code properly. I get an error when running it.

proc sql;
create table newdata as
select ref price
from orderbook;
price as newprice from orderbook
(select ref, price from orderbook_ex) where ref = ref;

quit;

Can you point me to the error in the coding?
 

Janus

New Member
#7
That's not the same as the code I posted. Any semicolon use breaks a query and tells SAS to look for a new one. You can execute multiple queries within a proc sql statement, but after your first semicolon you list columns without a SQL command. My guess is you were trying to use an inline query, but an inline query exists entirely within the from statement. I'm assuming that orderbook is your data, but what is orderbook_ex? Also, if you are doing an equijoin on two tables, you need to be specific when it comes to matching columns. "ref = ref" is vague because proc sql doesn't know what ref goes where (in this case, it would probably work b/c it is an inner join, but still), so you should be using table aliases. Below is my code from before updated to include proper table and variable names as you have listed them. The only thing I'm guessing at is the column that indicates "A" or "E" (I call it "obs"). This is what is happening.

In the from clause, I am calling two tables: your original data (alias d) and one I am creating on the fly (alias i), which contains only data for obs='A'. From those two datasets, I'm pulling everything from the original as well as all the price information and then matching on ref. See if this works:

Code:
proc sql;
create table newdata as
select
	d.*,
	i.price as newprice
from
	orderbook d,
	(select
		ref,
		price
	from
		orderbook
	where
		obs = 'A') i
where
	d.ref=i.ref
; quit;