Creating families with both siblings and step-siblings

Hi. I have been having problems figuring out how to do this in Stata for a while now.

I am working with a dataset on disabled and non-disabled children that includes all of the children's siblings/step-siblings and their parents.
The goal is to group the identified children into groups of siblings and step-siblings, to see the average number of step-siblings etc. in families with a disabled child and in families with a non-disabled child. The problem is that I haven't got any id for the family, only for the parents.
Below is a small example; in total I have about 450,000 identified children.
As the illustration below shows, I tried grouping the children by mother and by father, using group(m_id), to give them a sort of family_id. But I can't quite move on from here. In short, I am looking to group all 450,000 children into groups of siblings/step-siblings. Each child should have the same new familyid as all of its siblings/step-siblings. But that would make some of the respondents appear in several family groupings. How do I deal with this kind of "cross-grouping"?
I would greatly appreciate your help!

id  m_id  f_id  disabl  gr_m  gr_f
 1    24    10       1     1     1
 2    12     8       2     2     2
 3     .     8       1     .     2
 4    10    14       1     3     3
 5     8     6       1     4     4
 6    18    45       1     5     5
 7    56     .       2     .     6
 8     1    94       1     7     7
 9    45    14       2     8     8
10    82     8       2     9     9
11    45    14       1    10     8
12    84     9       1    11    10
13    84     9       1    11    10
14    12    11       1    12    11
15    11     5       1    13    12
Which variable records how many siblings there are in a family? And which variable records whether a family has a disabled child? If you have those variables, you can create a dummy variable and then use that dummy as an identifier to calculate the average number of siblings in each of the two groups (1. families without a disabled child, 2. families with a disabled child).
I also wanted a variable for each child's relation to the other siblings (sibling/step-sibling), so, as you suggested, I ended up creating a dummy variable for each child indicating whether it was a sibling or a step-sibling, and in the end summarized those dummy variables. Thank you for your reply.
Dear tinnestef,

I had a similar problem in the past. Full siblings are rather easy to handle: create a grouping variable from the parental id variables using the -egen, group()- command. You can then identify (biological) only children with the -duplicates- command. If you remove these from your data set, the grouping variable will function as a sibling id variable, since it identifies each unique combination of m_id and f_id, that is, the biological parents.
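
To make the full-sibling logic concrete, here is a minimal sketch of the same idea. It is written in Python rather than Stata, purely as an illustration, and uses the fifteen example rows from the question. The function name full_sibling_groups is made up for this sketch, and I am assuming that a child with a missing m_id or f_id cannot be matched as a full sibling.

```python
from collections import defaultdict

# (id, m_id, f_id) rows copied from the example in the question.
# None stands for a missing parent id ("." in the listing).
children = [
    (1, 24, 10), (2, 12, 8), (3, None, 8), (4, 10, 14), (5, 8, 6),
    (6, 18, 45), (7, 56, None), (8, 1, 94), (9, 45, 14), (10, 82, 8),
    (11, 45, 14), (12, 84, 9), (13, 84, 9), (14, 12, 11), (15, 11, 5),
]

def full_sibling_groups(rows):
    """Group children by the (m_id, f_id) pair and drop only children."""
    by_parents = defaultdict(list)
    for child, mother, father in rows:
        # Assumption: a missing parent id means the pair cannot be matched.
        if mother is None or father is None:
            continue
        by_parents[(mother, father)].append(child)
    # Keep only parent pairs with at least two children.
    return {pair: kids for pair, kids in by_parents.items() if len(kids) > 1}

groups = full_sibling_groups(children)
print(groups)
```

On the example data this leaves two full-sibling groups: children 9 and 11 (mother 45, father 14) and children 12 and 13 (mother 84, father 9).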

I never got around to solving the half-sibling issue in Stata (remember, it gets complicated because a child can have both paternal and maternal half siblings), but it was quite straightforward using PROC SQL in SAS.
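
One possible way to handle the "cross-grouping" from the original question is to treat it as a connected-components problem: any two children who share a mother or a father are linked, and every chain of such links becomes one family. The sketch below is in Python rather than Stata or SAS and is only an illustration under several assumptions. The function name family_ids is made up; mother ids and father ids are assumed to be separate number spaces (so m_id 8 and f_id 8 are different people); and the family id is arbitrarily set to the smallest child id in the family. Note that this merges step-siblings of step-siblings into the same family, which may or may not be what you want.

```python
# (id, m_id, f_id) rows copied from the example in the question.
# None stands for a missing parent id ("." in the listing).
children = [
    (1, 24, 10), (2, 12, 8), (3, None, 8), (4, 10, 14), (5, 8, 6),
    (6, 18, 45), (7, 56, None), (8, 1, 94), (9, 45, 14), (10, 82, 8),
    (11, 45, 14), (12, 84, 9), (13, 84, 9), (14, 12, 11), (15, 11, 5),
]

def family_ids(rows):
    """Return {child id: family id}, linking children who share a parent."""
    parent = {}  # union-find forest over child ids

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as root so the family id is the
            # smallest child id in the family.
            parent[max(ra, rb)] = min(ra, rb)

    first_child = {}  # ("m" or "f", parent id) -> first child seen
    for child, mother, father in rows:
        find(child)  # register the child even if both parents are missing
        for key in (("m", mother), ("f", father)):
            if key[1] is None:
                continue  # a missing parent id links nobody
            if key in first_child:
                union(child, first_child[key])
            else:
                first_child[key] = child
    return {child: find(child) for child, _, _ in rows}

fam = family_ids(children)
print(fam)
```

On the example data, children 2, 3, 10 and 14 end up in one family (2, 3 and 10 share father 8; 2 and 14 share mother 12), children 4, 9 and 11 in another (all share father 14), and 12 and 13 in a third. Every child gets exactly one family id, so nobody appears in several groupings.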