SAS Coding to find averages for every 60 observations


I am working on an analysis for research on premature baby health. I ran into a SAS coding issue which I hope some of you can help me with.

This is a repeated measures design. My outcome variable is SpO2. There are 20 subjects. Each subject has over 30,000 repeated measures (2 second intervals). The variables I have are ID, Time in seconds, and SpO2.

I have 1 million observations total. I would like to break this up into parts.
Instead of having observations for 2 second intervals, I would like to create hourly intervals. So I want to create one dataset that takes has the mean SpO2 for each hour for each subject. The dataset originally looks like this:

ID Seconds SpO2
1 0 100
1 2 99
1 4 100

But I want it to look like this:

ID Hours Mean_SpO2
1 0 98
1 1 100
1 2 99

I appreciate any help.
Thank you.
Last edited:


Less is more. Stay pure. Stay poor.
This isn't correct, right:

ID Seconds Mean_SpO2
1 0 100
1 2 99
1 4 100

You have ID Seconds and SpO2, correct. Not averages in your current data.


Less is more. Stay pure. Stay poor.
Two steps, creating a categorical variable 30,000/30 = 1,000 minute groups and getting the averages for those groups and getting rid of duplicates so final dataset has about 1,000 averages.

The second step can be done using proc means and a class option, then an output statement in that step. I don't have that code memorized but you tell SAS to calculate averages for strata and insert that back into the outputted dataset. Then you may need to run a proc sort with nodupkey option so then instead of 30,000 rows you have 1,000.

I will think about the first step as well.
That is a great idea. I can do the second part with proc means and using the output out= option. But as for the first part I am still stumped. How would I make a variable that can categorize the first n observations as "1" , the second n observations as "2" etc...
I appreciate your help!
No, there are some with 20,000 measurements and others with over 50,000. However, each subject has their own dataset meaning I will have to merge all their datasets for the analysis later on. To be clear, I have 20 different csv files and each file holds information on one unique subject in long format.

So I believe that whatever process we do, we can run it on the individual files and then merge the data.
Or we can merge first then clean and change but I believe that may be more complex.


Less is more. Stay pure. Stay poor.
Alright, I suck at datasteps, but I think I figured out your first step. Create a new variable by dividing seconds by 30 and rounding all decimals up, not sure what to do if you have a zero value for seconds, I guess add 1 to all Second values and tweak the formula. You will have to clean up my code below and double check that it works:

SpO2_min = round ((second / 30)), 1.0;
P.S., Feel free to post your final code for all three steps, I would be interested in seeing it all put together.

Let me know if you have questions on how to put the 3 steps together.

1. Create categorical variable
2. Get average
3. Get rid of duplicate values for a group
%Macro Import(Name,ID);
Proc Import file="&Name" dbms=EXCEL out=&ID replace;

Data &ID;
set &ID;
Hours = CEIL ((Seconds/3600));

Proc Means data=&ID NOPRINT;
class Hours;
var SpO2;
output out=M&ID mean=mean_SpO2;

Data MS6;
set MS6;
Drop _FREQ_ _TYPE_;


I ran this macro for every subject's file.
All that's left is for me to merge datasets.
Thanks a bunch!!!