Analyzing multiple cases

In ten years I never thought of this before :p We have cases, which are the central element of our analysis. Some (I don't know what percent) go through part of the process and then return as a second (or third, fourth, etc.) case. I am not sure what the ramifications are of analyzing such repeat cases in regression models; when we analyze, we just analyze all cases. Note this is not an issue of effect size, p values, or standard errors. We have the entire population. I think it's an issue of statistical independence, although I am unsure if that applies.

One author noted this, although I am not sure how one accounts for it in practice:
"Dean et al (2015) found significantly different service impacts for individuals with prior case(s) versus those for whom the case was their first. Thus, this heterogeneity must be accounted for when including both in a single analysis."


Less is more. Stay pure. Stay poor.
Heterogeneity in estimates is what they are referring to.

Can you identify non-unique cases? This comes up in my work all the time: a patient is seen in the ED, then seen again. Depending on the overlap, either use the most recent case and control for the number of prior cases; ignore the independence issue and use robust SEs if the overlap is minimal; or use a multilevel model (MLM) if most people have multiple visits.
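A minimal sketch of the first option (keep each person's most recent case and carry the number of prior cases as a covariate). The field names `person`, `close_date`, and `outcome` and the records are hypothetical, chosen only for illustration; ISO-format date strings are assumed so that sorting them as text orders them chronologically.

```python
from collections import defaultdict

# Hypothetical records: "person" identifies the individual, "close_date" orders
# their cases (ISO strings sort chronologically), "outcome" stands in for any DV.
cases = [
    {"person": "A", "close_date": "2014-02-10", "outcome": 0},
    {"person": "A", "close_date": "2015-06-01", "outcome": 1},
    {"person": "B", "close_date": "2015-01-15", "outcome": 1},
    {"person": "C", "close_date": "2013-09-30", "outcome": 0},
    {"person": "C", "close_date": "2014-11-12", "outcome": 0},
    {"person": "C", "close_date": "2015-08-20", "outcome": 1},
]

def most_recent_with_priors(records):
    """Keep each person's most recent case and add a count of their earlier cases."""
    by_person = defaultdict(list)
    for rec in records:
        by_person[rec["person"]].append(rec)
    rows = []
    for recs in by_person.values():
        recs.sort(key=lambda r: r["close_date"])
        # Every case before the last one counts as a prior case for that person.
        rows.append(dict(recs[-1], n_priors=len(recs) - 1))
    return rows

for row in most_recent_with_priors(cases):
    print(row["person"], row["close_date"], row["outcome"], row["n_priors"])
```

The result has one row per person, so the independence problem goes away, and `n_priors` can enter the regression as a control. The cost is that earlier cases no longer contribute outcomes.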
It is easy to identify non-unique cases. Every case has a casenumber. If it is greater than one, the person has more than one case: casenumber = 1 is the first case, casenumber = 2 is the second, and so on.
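Because casenumber = 1 marks a first case, the amount of overlap (which decides among the options mentioned earlier in the thread) can be measured directly. This is a minimal sketch with made-up (customerid, casenumber) pairs. It reports two different quantities: the share of cases that are repeats and the share of customers who have any repeat.

```python
# Made-up (customerid, casenumber) pairs; casenumber == 1 marks a first case.
cases = [("A", 1), ("A", 2), ("B", 1), ("C", 1), ("C", 2), ("C", 3), ("D", 1)]

def overlap_summary(pairs):
    """Return (share of cases that are repeats, share of customers with any repeat)."""
    repeat_cases = sum(1 for _, casenum in pairs if casenum > 1)
    customers = {cid for cid, _ in pairs}
    repeat_customers = {cid for cid, casenum in pairs if casenum > 1}
    return repeat_cases / len(pairs), len(repeat_customers) / len(customers)

case_share, customer_share = overlap_summary(cases)
print(f"{case_share:.0%} of cases are repeats; {customer_share:.0%} of customers have repeats")
# prints "43% of cases are repeats; 50% of customers have repeats"
```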

The question is which cases we should exclude. Do you gain anything in your analysis if you exclude all cases but the first one? We want to know which spending and policies (essentially, interventions) have the right impact. We get assessed by the government on all these cases, so I am not sure we should exclude them from the regression (or the descriptives, for that matter). But of course, if I knew for sure I would not ask. :)

I don't really know what heterogeneity of estimates is; I will have to look it up. Robust standard errors really are not an issue for us because we have the entire population. I really only worry about bias, that is, the slopes being misleading.
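Heterogeneity of estimates, in the sense of the Dean et al. quote, means the slope itself differs between groups, which is exactly the "misleading slopes" worry. This is a minimal sketch with made-up numbers. It assumes a single hypothetical predictor (hours of service) and a continuous outcome (income), and it fits an ordinary least squares slope separately for first cases (casenumber = 1) and repeat cases, then for everything pooled.

```python
def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Made-up records: (casenumber, hours_of_service, income)
records = [
    (1, 10, 200), (1, 20, 400), (1, 30, 600),
    (2, 10, 150), (2, 20, 200), (3, 30, 250),
]

first = [(x, y) for c, x, y in records if c == 1]
repeat = [(x, y) for c, x, y in records if c > 1]
slope_first = ols_slope(*zip(*first))
slope_repeat = ols_slope(*zip(*repeat))
slope_pooled = ols_slope([x for _, x, _ in records], [y for _, _, y in records])

print(f"first: {slope_first}, repeat: {slope_repeat}, pooled: {slope_pooled}")
# prints "first: 20.0, repeat: 5.0, pooled: 12.5"
```

The pooled slope (12.5) describes neither group. It is biased in the sense that worries you, even with the whole population. In a single regression, one way to account for this is to add a repeat-case indicator plus its interaction with the predictor. A fully interacted model like that reproduces the two separate slopes.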


The robust SEs address the within-person correlation that the model otherwise ignores. So if I had more than one claim, could you link all of the claims to me?
Yes, although in this case it is primarily services that are involved. There is a customerid and a separate caseid, and a person can have many cases. Does this potentially bias the results? With a population, I don't care about standard errors.


Well, the example I always use is that if the observations are not independent, your results won't quite be what you think they are. So I usually ask: what if I allow a person to vote twice, three times, etc.? Their attributes are going to be over-represented. If that person is male and votes consistently, it is going to look like males really like a certain candidate, when in actuality one particular person liked that candidate. The same goes for claims: you may conclude that people from Podunk FA file a lot of claims when it is really a single person. If you can resolve that person's upstream issues, you get more bang for your buck. A real scenario: I have a person who goes to the ED 55 times a year to get recovery meds for migraines. So 5% of visits may be due to 1% of people. I need to target these habitual users and not conclude that there are a lot of migraine patients.
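The over-representation point can be checked by measuring how concentrated cases are among people. This is a minimal sketch with a hypothetical population of 99 people with one visit each plus one person with 55 visits, echoing the migraine example. It computes the share of all visits contributed by the most frequent 1% of people. The helper `share_from_top` is an illustrative name, not from any library.

```python
from collections import Counter

def share_from_top(person_ids, top_fraction):
    """Share of all cases contributed by the most frequent top_fraction of people."""
    counts = sorted(Counter(person_ids).values(), reverse=True)
    k = max(1, round(len(counts) * top_fraction))  # at least one person
    return sum(counts[:k]) / len(person_ids)

# Hypothetical: 99 people with one visit each, plus one person with 55 visits.
visits = [f"p{i}" for i in range(99)] + ["frequent"] * 55
print(f"{share_from_top(visits, 0.01):.0%} of visits come from the top 1% of people")
# prints "36% of visits come from the top 1% of people"
```

A large share from a small fraction of people is a sign that case-level descriptives mostly reflect a few individuals.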
The problem is that we are reviewed by the federal government at the case level. So if someone closes twice, we get rated on them twice. We want to improve our performance. I am not sure whether I should analyze the cases separately or at the customer level. For income, I assume this would mean averaging the DV. The problem would be with employment, which is either 1 for having a job or 0 for not. If the same customer was there twice, and once they had a job and once they did not, how would you evaluate their success in the regression at the customer level (when the customer has more than one case)?
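Collapsing a 0/1 outcome to the customer level forces a choice. This is a minimal sketch with hypothetical data that shows three common options side by side: the proportion of the customer's cases that ended employed, whether any case ended employed, and the outcome of the most recent case. These are illustrative alternatives, not a prescribed method. The sketch also assumes the input is already in case order, so the last entry per customer is the most recent.

```python
from collections import defaultdict

# Hypothetical closed cases in case order: (customerid, employed at closure 1/0)
cases = [("A", 1), ("A", 0), ("B", 1), ("C", 0), ("D", 0), ("D", 0), ("D", 0)]

def customer_level(pairs):
    """Collapse case-level 0/1 outcomes to one summary per customer."""
    outcomes = defaultdict(list)
    for cid, employed in pairs:
        outcomes[cid].append(employed)
    return {
        cid: {
            "share_employed": sum(ys) / len(ys),  # fraction of this customer's cases ending employed
            "ever_employed": int(any(ys)),        # 1 if any case ended employed
            "last_employed": ys[-1],              # most recent case (relies on input order)
        }
        for cid, ys in outcomes.items()
    }

case_rate = sum(y for _, y in cases) / len(cases)
summary = customer_level(cases)
customer_rate = sum(s["ever_employed"] for s in summary.values()) / len(summary)
print(f"case-level rate: {case_rate:.0%}; customer-level (ever employed): {customer_rate:.0%}")
# prints "case-level rate: 29%; customer-level (ever employed): 50%"
```

Customer A's averaged outcome is 0.5, so a customer-level model built on `share_employed` treats the DV as a proportion rather than a binary outcome. The case-level rate stays the figure that matches a case-level federal review.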


OK, I think it is a free article; that is why I listed it. They used a tree-partitioning model in lieu of regression.