Summing observations for two different variables by two variables

#1
Hi,

I have a data set that contains the variables: age sex n pop area period. So the initial dataset looks something like this:

sex age area n pop period
1 1 3 20 20 1995
1 1 2 5 275 1995
1 1 1 10 300 1995
1 2 3 8 310 1995
1 2 2 26 280 1995

Age goes up through '12' and then this is repeated for sex '1' and '2' and areas '1', '2', and '3' for each calendar year through 2011. So after importing the Excel sheet, I deleted age and sex because I first want to do analyses just by area over time (then later by age group). I was told that the formatting of the data in Excel might be why I am getting some errors when I go to do the negative binomial regression, so I wanted to try deleting age / sex and then summing n and pop for each area for each year (e.g., sum of n and sum of pop for areas 1, 2, and 3 respectively for 1995, 1996, and so on).

I tried sorting the data set by area and period and creating new variables 'nsum' and 'popsum' and counting by using:
'if area=1 and period=1995 then;
do;
nsum=0;
popsum=0;
end;

nsum=nsum+n
popsum=popsum+pop

However, then I still have one row for each age group / sex that has now been deleted and I only get area 1 and period 1995 back in the output. For example:

area period nsum popsum
1 1995 450 2890
1 1995 200 1200
1 1995 130 680
1 1995 222 1354

I saw some examples online for how to collapse observations by group, but I am not sure how to sum observations by area AND period so I need to figure out what that code should look like. Thanks!
 

hlsmith

Not a robit
#2
Got lost in your description. Can you provide an example of what you want the final set to resemble

Do you want the sumpop and sumn for the unique none sex and age groups?
 
#3
Sorry for the unclear description! I ended up getting it sorted out with:

proc sql;
create table want as
select period,area,sum(n) as nsum,sum(pop) as popsum
from have
group by period, area;
quit;

So that way I am left with just period, area, and a new variable of nsum and a new variable of popsum.