Logistic regression data selection

Hello everyone,

I am currently researching how often and for which reasons non-profit organisations voluntarily appoint an auditor.
So I formulated some hypotheses, and I have now selected all the non-profit organisations in my country that are not classified as very big (4510 cases).
The reason for this is that only the very big non-profit organisations are required to appoint a statutory auditor.
Now I should be able to run my model.
The only problem is that in one of my hypotheses I presume that NPOs that are more dependent on grants and donations are more likely to voluntarily appoint an auditor.
But these NPOs are not required to disclose in their annual accounts how much they received in grants and donations. They can decide for themselves whether to disclose it.

So the problem is that only 1610 NPOs of my total sample of 4510 (small and big NPOs) disclosed how much they received in grants and donations.

Now my question is: is this a problem? And what are my options?
I presume I can't draw conclusions by running the logistic model for only these 1610 cases?
Should I run two models, one for the 4510 cases and one for the 1610 cases?

I'm open to any suggestions, and if by chance you know a decent book that can help me with my problem, I would like to hear about it.

Thanks in advance for your help!


Active Member
The two options you mention are possibilities. An alternative that might work with all the cases is this: I'm assuming that if an NPO discloses its grant funding, it does so in a way that can be treated as a continuous variable; that is, it discloses the amount of funding. My idea is to include a dichotomous variable Z for whether the NPO discloses grant funding or not; a continuous variable X for the amount of funding, to which you would assign 0 for the non-disclosers; and a term Z*X for their interaction. You should then be able to analyze the effect of disclosure versus non-disclosure as well as, among the disclosers, the effect of the amount of grant funding.
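To make the coding scheme concrete, here is a minimal sketch in plain Python. The function names, the example amounts, and the coefficient values are all made up for illustration; they are not estimates from any data. One thing worth noting: because X is set to 0 for every non-discloser, Z*X takes exactly the same values as X, so the sketch enters only Z and Z*X next to the intercept rather than all three terms.

```python
import math

def design_row(grants):
    """Build (intercept, Z, Z*X) for one NPO.

    grants is the disclosed amount, or None if the NPO did not disclose.
    Z = 1 for disclosers and 0 otherwise; X is set to 0 for non-disclosers,
    so Z*X equals X and only one of the two needs to enter the model.
    """
    z = 0 if grants is None else 1
    x = 0.0 if grants is None else float(grants)
    return (1.0, float(z), z * x)

def predicted_probability(row, coefs):
    """Logistic model: P(auditor) = 1 / (1 + exp(-(b0 + b1*Z + b2*Z*X)))."""
    log_odds = sum(b * v for b, v in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical disclosed amounts (say, in thousands); None = not disclosed.
grants_disclosed = [None, 50.0, None, 200.0]
rows = [design_row(g) for g in grants_disclosed]

# Hypothetical coefficients (b0, b1, b2), chosen only to show the arithmetic.
coefs = (-2.0, 0.5, 0.01)
probs = [predicted_probability(r, coefs) for r in rows]

for g, r, p in zip(grants_disclosed, rows, probs):
    print(g, r, round(p, 3))
```

Read this way, b1 compares a discloser reporting an amount of 0 with a non-discloser, and b2 is the change in log-odds per unit of funding among disclosers only. An NPO that discloses an amount of 0 is still distinguished from a non-discloser by Z. The rows could be passed to any logistic regression routine to estimate the coefficients.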

I've never seen anything like that idea in print, but a good book for understanding how to implement and interpret logistic regression, especially when the model includes interaction terms, is Logistic Regression: A Self-Learning Text by Kleinbaum and Klein.