Trying to evaluate/measure how much one variable affects another


I've been helping look at some data from a fundraising event my school ran. We used an app, and the school wants to evaluate whether it was worth it. From the event I have data on every donation (how much it was and which fundraiser it went to). For each fundraiser, the sum and the count of the donations they collected were calculated, so for any fundraiser we can say they collected $x across y donations.

I was told the value of the app could be found by splitting the fundraisers into those who used the app and those who didn't, calculating the average amount raised and the average number of donations for each group, taking the difference between the two groups, and multiplying that difference by the number of fundraisers who used the app. In effect, this says the difference between the two means is caused by the app, so multiplying it by the number of app users should tell us how much the app was worth. I don't trust that number, however: it seems too simple, it doesn't take into account outliers (or the shape of the data in general), and it seems too high to attribute entirely to the app.
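To make the calculation concrete, here is a minimal sketch of it as I understand it. It is in Python purely for illustration, the function name is my own, and the numbers are made up (they are not my real data). The second call shows why I'm worried about outliers: one large total moves the answer a lot.

```python
from statistics import mean

def naive_app_value(app_totals, non_app_totals):
    """Difference in group means, multiplied by the number of app users."""
    diff = mean(app_totals) - mean(non_app_totals)
    return diff * len(app_totals)

# Hypothetical per-fundraiser totals in dollars (invented for illustration).
app_totals = [120.0, 80.0, 300.0, 50.0]
non_app_totals = [60.0, 40.0, 100.0, 20.0, 30.0]

baseline = naive_app_value(app_totals, non_app_totals)

# Same data, but one app user's total is ten times larger (a single outlier).
with_outlier = naive_app_value([120.0, 80.0, 3000.0, 50.0], non_app_totals)

print(baseline)      # 350.0
print(with_outlier)  # 3050.0
```

The same function could be applied to donation counts instead of dollar totals.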

My knowledge of stats is fairly weak, and I'm using R to run the tests for me. I used a Shapiro-Wilk test and a histogram to check for normality, and my data is NOT normal. Since the data was not normal, I used Spearman's rank correlation and found that there does appear to be a correlation between app use and the amount fundraised (both sum and count). Finally, I used a Wilcoxon rank-sum test to test the hypothesis that the median for app users was higher than for non-app users (which it was). I'm not sure how valid these tests are, since my data has a lot of tied ranks :S. At this point, though, I'm pretty confident the app is making a difference and that the numbers for fundraisers who used the app are higher than for those who didn't.
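Because of the ties, one cross-check I could imagine is a permutation test, which does not rely on ranks or on normality. This is not what I actually ran; it is a rough sketch in Python with a hypothetical function name and invented data, testing one-sided whether the app group's mean is higher than you would expect if the app/non-app labels were assigned at random.

```python
import random
from statistics import mean

def permutation_p_value(app, non_app, n_perm=5000, seed=0):
    """One-sided permutation p-value for mean(app) > mean(non_app)."""
    rng = random.Random(seed)          # fixed seed so the result is repeatable
    observed = mean(app) - mean(non_app)
    pooled = list(app) + list(non_app)
    n_app = len(app)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # randomly reassign the group labels
        diff = mean(pooled[:n_app]) - mean(pooled[n_app:])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Invented totals with plenty of tied values, for illustration only.
app = [100, 120, 130, 150, 160, 170, 110, 140]
non_app = [10, 20, 30, 20, 10, 30, 20, 10]

p = permutation_p_value(app, non_app)
print(p)
```

A small p-value here only says the groups differ by more than random labelling would explain; it does not by itself show that the app caused the difference.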

What I want to do now is get a range I can be confident about that puts a value on the app. Unfortunately, I don't really know how to do this. I tried using regression to see how the fitted lines look, but my regression line had a huge error rate (line of best fit was something like 40%) and my residuals were not normal (they looked like they followed an x^3 function when I plotted them), so I don't think that is the right way. For fun, I also tried finding the difference between the expected amounts raised given the number of sponsors everyone had, which unfortunately gave me a negative number, contradicting everything else I've tried so far with my data. Thoughts?
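One idea I have seen for getting a range without assuming normality is a percentile bootstrap: resample each group with replacement many times and look at the spread of the resulting estimates. Below is a minimal sketch in Python under my own assumptions (hypothetical function name, invented data, 95% interval, difference in means scaled by the number of app users). I have not checked that this is the right approach for my data.

```python
import random
from statistics import mean

def bootstrap_value_ci(app, non_app, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for (mean(app) - mean(non_app)) * len(app)."""
    rng = random.Random(seed)          # fixed seed so the result is repeatable
    n_app = len(app)
    estimates = []
    for _ in range(n_boot):
        # Resample each group separately, with replacement, at its own size.
        app_s = rng.choices(app, k=len(app))
        non_s = rng.choices(non_app, k=len(non_app))
        estimates.append((mean(app_s) - mean(non_s)) * n_app)
    estimates.sort()
    lo_idx = round((alpha / 2) * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Invented per-fundraiser totals in dollars, for illustration only.
app = [120, 80, 300, 50, 90, 110, 200, 70]
non_app = [60, 40, 100, 20, 30, 50, 45, 35]

lo, hi = bootstrap_value_ci(app, non_app)
print(lo, hi)
```

The interval describes uncertainty in the observed difference between the groups. It would still overstate the app's value if the fundraisers who chose to use the app were already the more active ones.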

edit: Apologies for this being in the wrong forum section :(