I am currently doing a research internship in which I am investigating the driving factors (determinants) of differences in food security between households in Benin. I have access to a DHS dataset from a survey of over 60,000 households. Based on prior research and existing scientific knowledge, I have selected the variables I want to use in my regression analysis. I have started cleaning and preparing the dataset for modelling, but I came across an issue you might be able to help me with.

Every household has between 1 and 43 members, some of whom are children under 5 years old. I am planning to use a linear or logistic regression. Since there is no direct measure of food security, I will use stunting as the dependent variable, which according to the literature and other studies is a good proxy for food security. I have about 10 independent determinants that I would like to include. However, stunting is only measured in children (< 5 years), and leaving out all other household members would mean that independent variables such as literacy, education or marital status become useless (since they do not apply to children under 5, of course). Do you know of any other way in which I could still include all independent determinants, or should I just leave out the ones I cannot use?

Your research question is not clear with respect to which units you want to investigate.
You say that you investigate food security differences between households, but in the remainder
you seem to view individuals as the unit of observation ("literacy, education or marital status
would be useless with subjects < 5")?

If you stick to your initial formulation (food security differences between households), then you
will have to aggregate information at the household level. With respect to the dependent
variable, which is only measured by proxy (stunting in children < 5), this would seemingly
mean adjusting your target population ("food security in households with at least 1 member
< 5 years"), and you would have to find a sensible aggregation for your DV (any stunting yes/no?
% of children < 5 who are affected? or something like that). Variables such as literacy, education
or marital status can well be aggregated at the household level (e.g. highest educational level, or
median educational level; % of married household members; % of adults who are literate
etc.).
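
To illustrate the aggregation described above, here is a minimal sketch in Python with pandas. It assumes the data are in a person-level table with one row per household member. All column names (`hh_id`, `age`, `stunted`, `edu_level`, `literate`, `married`), the toy values, and the cut-off of 15 years for "adult" are made up for illustration; they are not actual DHS variable names or definitions, so adapt them to your recode file.

```python
import numpy as np
import pandas as pd

# Toy person-level data. Column names and values are hypothetical,
# NOT real DHS variables; they only illustrate the reshaping step.
persons = pd.DataFrame({
    "hh_id":     [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "age":       [34, 29, 3, 1, 41, 4, 2, 52, 47],
    "stunted":   [np.nan, np.nan, 1, 0, np.nan, 0, 0, np.nan, np.nan],
    "edu_level": [2, 1, np.nan, np.nan, 3, np.nan, np.nan, 0, 1],
    "literate":  [1, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, 1],
    "married":   [1, 1, np.nan, np.nan, 0, np.nan, np.nan, 1, 1],
})


def aggregate_households(persons, adult_age=15):
    """Collapse person-level rows into one row per household.

    adult_age is an assumed cut-off for who counts as an adult.
    """
    children = persons[persons["age"] < 5]
    adults = persons[persons["age"] >= adult_age]

    # Dependent variable: aggregated over children under 5 only.
    dv = children.groupby("hh_id").agg(
        any_stunting=("stunted", "max"),    # 1 if at least one child is stunted
        share_stunted=("stunted", "mean"),  # fraction of children who are stunted
    )

    # Independent variables: aggregated over adult household members.
    iv = adults.groupby("hh_id").agg(
        max_edu=("edu_level", "max"),
        median_edu=("edu_level", "median"),
        share_married=("married", "mean"),
        share_literate=("literate", "mean"),
    )

    # Left join on the DV table keeps only households with at least
    # one child under 5, i.e. the adjusted target population.
    return dv.join(iv, how="left")


households = aggregate_households(persons)
print(households)
```

The resulting table has one row per household with at least one child under 5. `any_stunting` would suit a logistic regression and `share_stunted` a linear (or fractional) model; which aggregation is sensible is a substantive choice, not something the code decides for you.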

