Fundraising Evaluation Design

I work for a nonprofit that is hoping to determine whether or not certain corporate parameters are correlated with joining our organization as a partner.

Our population is a list of 700 companies that we have attempted to bring on as sponsors over the last couple of years, including those who became partners.

Independent Variables:
The data to which I have access include a company's prior giving to the organization, annual revenue, media spend, employee count, amount of philanthropic giving, giving in our philanthropic area, a binary variable indicating whether or not they've given to other organizations with our specific mission (this could potentially become actual amount given to our sister/rival charities), and a proxy variable that represents our connections to a company.

Dependent Variable:
A binary variable for whether or not a company joined as a partner at any point.

I know there is a lot of room for bias in these data (for instance, we're more closely connected to the companies that have been long-term partners, and some charities don't report the gift amounts they receive). I also know there's a lot of room for noise here.

I'm really trying to develop my stats knowledge, and this will be my first time attempting something like this on my own. I know just enough to know how little I know and how many opportunities I'll have to get something wrong, but I was wondering if anyone had any thoughts about how I might structure this.

Any thoughts?


TS Contributor
You could start by exploring the predictor variables (descriptive statistics, graphical displays like histograms, box-and-whisker plots). Then you could examine bivariate relationships between the predictors and the dependent variable (e.g. annual revenue of partners versus non-partners: descriptive statistics and plots for both groups). Calculating correlation coefficients between the predictors would probably also help to determine whether there's redundancy. And then you could build a binary logistic regression model with partner status as the dependent variable, including the other variables as predictors.
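The steps above could be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not a definitive analysis: the data are simulated, the variable names (`log_revenue`, `log_employees`, `connection`, `partner`) are made up for the example, and the logistic model is fitted with a plain gradient ascent loop as a stand-in for what a statistics package would normally do for you.

```python
import math
import random
import statistics

random.seed(0)

def make_company():
    """Simulate one company. Hypothetical, already-standardized predictors."""
    log_revenue = random.gauss(0, 1)
    # Employee count is made to correlate with revenue (an assumption).
    log_employees = 0.8 * log_revenue + 0.6 * random.gauss(0, 1)
    connection = random.gauss(0, 1)
    true_logit = -1.0 + 0.5 * log_revenue + 1.0 * connection
    partner = 1 if random.random() < 1 / (1 + math.exp(-true_logit)) else 0
    return {"log_revenue": log_revenue, "log_employees": log_employees,
            "connection": connection, "partner": partner}

companies = [make_company() for _ in range(700)]
predictors = ["log_revenue", "log_employees", "connection"]

# Step 1: descriptive statistics of each predictor, split by partner status.
def describe(values):
    return {"n": len(values), "mean": statistics.mean(values),
            "median": statistics.median(values), "sd": statistics.stdev(values)}

by_group = {
    name: {status: describe([c[name] for c in companies if c["partner"] == status])
           for status in (0, 1)}
    for name in predictors
}

# Step 2: Pearson correlations between pairs of predictors, to spot redundancy.
def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

correlations = {
    (a, b): pearson([c[a] for c in companies], [c[b] for c in companies])
    for i, a in enumerate(predictors) for b in predictors[i + 1:]
}

# Step 3: binary logistic regression, fitted by gradient ascent on the
# mean log-likelihood. The first column of X is the intercept.
X = [[1.0] + [c[name] for name in predictors] for c in companies]
y = [c["partner"] for c in companies]

def fit_logistic(X, y, learning_rate=1.0, iterations=500):
    coefs = [0.0] * len(X[0])
    n = len(X)
    for _ in range(iterations):
        gradient = [0.0] * len(coefs)
        for row, target in zip(X, y):
            z = sum(b * x for b, x in zip(coefs, row))
            p = 1 / (1 + math.exp(-z))          # predicted partner probability
            for j, x in enumerate(row):
                gradient[j] += (target - p) * x
        coefs = [b + learning_rate * g / n for b, g in zip(coefs, gradient)]
    return coefs

coefs = fit_logistic(X, y)
# exp(coefficient) is the odds ratio per one-unit increase in a predictor.
exp_coefs = dict(zip(["intercept"] + predictors, (math.exp(b) for b in coefs)))

for name in predictors:
    print(name, by_group[name])
print(correlations)
print(exp_coefs)
```

With real data, skewed variables such as revenue or media spend would probably need a log transform before a fixed step size like this behaves well, and a proper package would also report standard errors and confidence intervals, which this sketch does not.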

Just my 2pence



TS Contributor
No! Sometimes a statistical model requires normally distributed prediction errors (residuals), but never normally distributed raw variables (except, maybe, for the statistical significance test of a Pearson correlation). And, if sample size is large enough, even the assumption of normally distributed residuals can be violated without negative consequences; with n=700, your analyses will be robust. Besides, the most complicated analysis would be logistic regression, which doesn't require normality anywhere (AFAIK).
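To illustrate the difference between raw variables and residuals, here is a small simulated sketch for an ordinary linear regression (not the logistic model). Everything in it is an assumption made for the example: the predictor is drawn from an exponential distribution, the true intercept and slope are 2 and 3, and the errors are normal. The raw outcome comes out strongly skewed, while the residuals of the fitted line do not.

```python
import random
import statistics

random.seed(1)

def skewness(values):
    """Sample skewness: roughly 0 for symmetric data, positive for a long right tail."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

n = 700
x = [random.expovariate(1.0) for _ in range(n)]          # skewed predictor
y = [2.0 + 3.0 * xi + random.gauss(0, 1) for xi in x]    # normal errors

# Ordinary least squares for one predictor, computed by hand.
mx, my = statistics.mean(x), statistics.mean(y)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("skewness of raw y:    ", round(skewness(y), 2))
print("skewness of residuals:", round(skewness(residuals), 2))
```

So a histogram of the raw outcome (or of a predictor) looking non-normal says nothing by itself about whether the model's assumption holds; it is the residuals one would inspect.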

With kind regards