Fundraising Evaluation Design

#1
I work for a nonprofit that is hoping to determine whether or not certain corporate parameters are correlated with joining our organization as a partner.

Population:
Our population is a list of 700 companies that we have attempted to bring on as sponsors over the last couple of years, including those who became partners.

Independent Variables:
The data to which I have access include a company's prior giving to the organization, annual revenue, media spend, employee count, amount of philanthropic giving, giving in our philanthropic area, a binary variable indicating whether or not they've given to other organizations with our specific mission (this could potentially become actual amount given to our sister/rival charities), and a proxy variable that represents our connections to a company.

Dependent Variable:
A binary variable for whether or not a company joined as a partner at any point.

I know there is a lot of room for bias in these data (for instance, we're more closely connected to the companies that have been long-term partners, and some charities don't report the gift amounts they receive). I also know there's a lot of room for noise here.

I'm really trying to develop my stats knowledge, and this will be my first time attempting something like this on my own. I know just enough to know how little I know and how many opportunities I'll have to get something wrong, but I was wondering if anyone had any thoughts about how I might structure this.

Any thoughts?
 

Karabiner

TS Contributor
#2
You could start by exploring the predictor variables (descriptive statistics, graphical displays like histograms and box-and-whisker plots). Then you could examine bivariate relationships between the predictors and the dependent variable (e.g. annual revenue of partners versus non-partners: descriptive statistics and plots for both groups). Calculating correlation coefficients between the predictors would probably also help to determine whether there's redundancy. And then you could build a binary logistic regression model with partner status as the dependent variable, including the other variables as predictors.
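The first exploratory steps above could be sketched roughly like this in plain Python. Everything here is an assumption for illustration: the data are simulated (not your 700 companies), and the names `describe`, `pearson`, `log_revenue` and `log_employees` are made up. In practice a statistics package would do this for you.

```python
import math
import random
import statistics


def describe(values):
    """Basic descriptive statistics for one group of values."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Simulated stand-in for the company list (hypothetical, NOT real data).
rng = random.Random(1)
companies = []
for _ in range(700):
    log_revenue = rng.gauss(4.0, 1.0)
    # Employee count is built to overlap with revenue (a redundant predictor).
    log_employees = 0.8 * log_revenue + rng.gauss(0.0, 0.5)
    p_partner = 1.0 / (1.0 + math.exp(-(log_revenue - 5.0)))
    companies.append({
        "log_revenue": log_revenue,
        "log_employees": log_employees,
        "partner": 1 if rng.random() < p_partner else 0,
    })

# Bivariate look: one predictor, partners versus non-partners.
for status in (1, 0):
    group = [c["log_revenue"] for c in companies if c["partner"] == status]
    print("partner" if status else "non-partner", describe(group))

# Redundancy check: correlation between two predictors.
r = pearson(
    [c["log_revenue"] for c in companies],
    [c["log_employees"] for c in companies],
)
print("correlation between predictors:", round(r, 2))
```

A high correlation between two predictors, as in this simulated example, suggests they carry overlapping information, so you might not want both in the model.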

Just my 2pence

K.
 

Karabiner

TS Contributor
#4
No! Sometimes there is the requirement of normally distributed prediction errors (residuals) of a statistical model, but never of normally distributed raw variables (except, maybe, for the statistical significance test of a Pearson correlation). And, if sample size is large enough, even the assumption of normally distributed residuals can be violated without negative consequences; with n=700, your analyses will be robust. Besides, the most complicated analysis would be logistic regression, which doesn't require normality anywhere (AFAIK).
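To illustrate the last point, here is a minimal sketch that fits a logistic regression to a deliberately skewed predictor. It rests on several assumptions: the data are simulated with n=700, the "true" coefficients `TRUE_B0` and `TRUE_B1` are invented, and `fit_logistic` is a hand-rolled Newton-Raphson fit for a single predictor written only for this example. A statistics package would normally do the fitting.

```python
import math
import random
import statistics


def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the 2x2 information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system by inverting the matrix directly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


rng = random.Random(0)
TRUE_B0, TRUE_B1 = -1.0, 1.2  # invented values used to simulate outcomes

# Exponentially distributed predictor: strongly right-skewed, clearly not normal.
x = [rng.expovariate(1.0) for _ in range(700)]
y = [
    1 if rng.random() < 1.0 / (1.0 + math.exp(-(TRUE_B0 + TRUE_B1 * xi))) else 0
    for xi in x
]

b0, b1 = fit_logistic(x, y)
print("predictor mean:", round(statistics.mean(x), 2),
      "median:", round(statistics.median(x), 2))
print("estimated intercept:", round(b0, 2), "estimated slope:", round(b1, 2))
```

The mean sits above the median because the predictor is right-skewed, yet the estimated coefficients should still land near the values used for the simulation, since the model makes no normality assumption about the predictor.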

With kind regards

K.