# Lag function in SAS not updating to reflect new values.

#### Dason

I have some SAS code that needs to reference some values in a previous row. The lag function is typically how one deals with that. The issue is that the value that I grab using lag then (possibly) gets changed and the change isn't reflected in the next row. I've provided a short example but basically think of 'x' as the starting value and 'y' as the ending value and I don't want any of the intervals to be overlapping so if the starting value is <= the previous ending value I want to shift the interval over and keep repeating this process.
Code:
data test;
input x y;
datalines;
1 2
1 4
5 7
;

data new;
set test;
diff = y - x;
lg = lag(y);
if x < lg then x = lg + 1;
* update x and y to be shifted;
y = x + diff;
run;
My output gives:
Code:
                                     Obs    x    y    diff    lg

1     1    2      1      .
2     3    6      3      2
3     5    7      2      4
I would have expected/desired

Code:
                                     Obs    x    y    diff    lg

1     1    2      1      .
2     3    6      3      2
3     7    9      2      6
So basically in the final row the lagged value for y is reading the original unmodified value of 4 instead of the updated value of 6. I think this is due to lag being implemented as a queue but I don't know how to tell lag to update after I change the value. I'm open to any suggestions - I'm not married to using lag here if there is another way around the issue.

#### hlsmith

##### Omega Contributor
Do you really want to be over-writing values instead of creating new values to use within data steps?

That always seems dangerous, like calling a now dataframe the same thing as the augmented prior dataframe.

#### Dason

If I have to make new columns that contain the desired output that is fine. So if there ends up being 'newx' and 'newy' columns that give me what I want I'd be fine with that.

#### Dason

Was able to make it work using retain.

Code:
data attempt2;
set initial;
diff = y - x;
if x <= oldy then x = oldy + 1;
y = x + diff;
oldy = y;
retain oldy;
run;
So the first time through when it is first referenced oldy will be missing since it hasn't been initialized. That is fine for what I'm doing. Then we set the value of oldy to whatever y is at the end of the data step. What 'retain' does is tells it that the value of oldy should be 'retained' for the next row so when I compare x to oldy it will be the value from the previous row. Perfect.

Last edited:

#### Buckeye

##### Member
I have to do something similar. I have observations sorted by date. I need to fill missing values based on previous observations but only if the date is within a specified range of the last non missing value. I hope that makes sense and that it can be easily visualized.

#### Con-Tester

##### Member
Here’s some code that should do what you need:
Code:
data out_data;
set in_data;
retain prev_date [DateSpec];
/* Check if current observation is missing; if not, it's the most recent one. */
if (obs_date ~= .) then prev_date = obs_date;
/* Current observation is missing; apply adjustment. */
else if (abs(obs_date-prev_date) < [Allowed_Range]) then
drop prev_date;
run;
The items in square brackets you will have to supply yourself. For example, the initial value [DateSpec] for “prev_date” could be an actual SAS date such as ‘"29FEB2012"D’ or ‘.’ if you don’t want to assign an initial date (Caution: in the latter case, no observations with missing dates will be adjusted prior to the first non-missing one).

#### Buckeye

##### Member
Thanks. I'll give this a shot. I found a nifty way to deal with the caution that you mentioned. I think I could just sort by descending in a separate data step and fill the remaining missing value. At least my data is such that this method works I think.