Web Analyst: Been a While Since I Touched Stats

#1
Hey everyone,

I joined this site looking for some advice. I have been doing web analytics for just over 2 years now. I have an accounting degree, and I come from a corporate finance background. In my schooling, I took some stats classes. I think I went a tad farther statistically than most business school undergrads: I actually learned the proofs behind basic statistics.

But that was about 7 years ago, and I need a refresher. Also, I need to learn some new techniques, because a lot of what I learned won't apply (I think) to what we measure on a web site. Would anyone here be able to direct me to resources for these two issues:

1. A quick refresher on stats. I think it will come back quickly...
2. I need to know how to analyze data that can't go below zero: tool, method, the whole thing.

Thanks in advance to anyone that can help. :tup:
 
#2
well....that is a very difficult question. I hope you were not actually looking for a simple response :)

Tell us more about the type of analysis, data, experiments you run and maybe we can help more.

For example, data that does not go below zero... does this mean count data that includes zeros? Is it continuous data? What type of process generates the data: data mining (observational) or experimental? If it is count data, then Poisson and negative binomial regression (and their respective zero-inflated versions) may be what you need. If it is continuous, that would suggest generalized linear models, such as a gamma distribution with a log link (if the data does not include zero). Maybe categorical data is what you mainly have.....
 
#3
I thought my first request for a link to a stats refresher was simple. :D

Regarding the second question, I analyze a lot of traffic metrics generally associated with web pages, which are almost always counting metrics (clicks, visits, referrals, etc.), with the exception of time on site. The biggest question I'm asked is: Did this marketing campaign/site redesign/other change make a difference in traffic? The problem I have with the visit metrics is that it's not uncommon for the distribution to look normal... until it reaches zero. I'm not really sure how to go about conducting that analysis.
 

CB

Super Moderator
#4
I'm a big fan of David Garson's statnotes at North Carolina State University, lots of detail without confusing maths - but more focused on the social sciences perhaps.

http://faculty.chass.ncsu.edu/garson/PA765/statnote.htm

Stat Trek is good for simple, free tutorials on common stats questions.

http://stattrek.com/

You may not be getting super many suggestions because "a quick refresher on stats" is a bit vague - where to go depends on what you're trying to find out, what kind of stats you're after, etc!
 
#5
Ah yes, the old "did it work" question. Heard that one a lot. I would love to hear others' opinions, because I have grappled with it many, many times.

Now, if there had been an experiment in place, where treatments / interventions were randomized in a controlled setting, where some visitors/buyers got treatment A and others got treatment B, then this is fairly easy to analyze as independent groups, and it is typically done by comparing two proportions or two means (or multiple proportions or means). So, make sure you understand tests for two proportions (z tests) and for two means (t tests, nonparametric alternatives, permutation tests, etc.). For multiple categories, chi-square tests and ANOVA are your most basic methods for proportions and means respectively.
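As a rough illustration of the two-group comparisons described above, here is a minimal Python sketch using only the standard library. The function names and the sample numbers are made up for illustration; a statistics package would normally be used for real work.

```python
import math
import random
from statistics import NormalDist, mean


def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Two-sided z test for equal proportions, using the pooled standard error."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def permutation_test_means(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means between two groups."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical numbers: conversions out of visitors for treatments A and B.
    print(two_proportion_z_test(120, 2400, 150, 2350))
    # Hypothetical daily visit counts under each treatment.
    print(permutation_test_means([31, 28, 35, 40, 29], [38, 42, 37, 45, 41]))
```

The permutation test makes no normality assumption, which can be handy when the counts are small or piled up near zero.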

Without an experiment, when "we used to do X and now we do Y", I think all statistics break down. I think you are left with some really big assumptions - like comparing two groups before and after and assuming the *independent* groups are the same in all other regards.

Maybe you have the opportunity to do some sort of paired test (which for two groups is a paired t test or McNemar's test). These are two that you should read about, and both assume that the same or matched individuals have data before and after the intervention.
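For the paired case, a minimal sketch (standard library only, with hypothetical function names and data) might look like the following. It returns the paired t statistic and its degrees of freedom rather than a p-value, since the standard library has no t distribution; the McNemar version shown is the exact binomial form.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Paired t statistic and degrees of freedom for matched before/after values.

    Assumes the differences are not all identical (nonzero standard deviation).
    """
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from the two discordant pair counts.

    b: pairs that were "yes" before and "no" after.
    c: pairs that were "no" before and "yes" after.
    """
    n = b + c
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical weekly visits for the same four pages, before and after a change.
    print(paired_t_statistic([10, 12, 9, 11], [12, 14, 11, 14]))
    # Hypothetical: 3 users stopped converting, 12 started converting.
    print(mcnemar_exact(3, 12))
```

The t statistic would then be compared against a t table or passed to a stats package with the returned degrees of freedom.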

Some people try and fit a time series model to the "pre period" and see if the post period deviates. I am always skeptical of this...


Finally, the issue of zeros is a hard one. As a lot of your dependent variables sound like counts, I suggest you look into Poisson and negative binomial regression. Both are for counts and both can model treatments / interventions as indicator variables. Both allow for some zeros. For extra zeros over and above what the distribution expects, get a book on zero-inflated models. There is also a zero-inflated gamma model for continuous variables >= 0. Often I analyze them as two processes if I need something simple: the zeros, which I would use logistic regression for (probability of being zero or not), and the positive values, which would be a linear model (generalized or general).
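To make the count-model idea a bit more concrete, here is a minimal sketch under a strong simplifying assumption: the only predictor is a single 0/1 treatment indicator. In that special case the Poisson regression fit has a closed form, so no fitting library is needed. The function names and data are hypothetical, and anything with more predictors would need real statistical software.

```python
import math
from statistics import NormalDist, mean, variance


def poisson_indicator_fit(control, treated):
    """Poisson regression of counts on one 0/1 indicator, in closed form.

    Returns the estimated rate ratio, the Wald z statistic for the
    log rate ratio, and its two-sided p-value.
    """
    rate_ratio = mean(treated) / mean(control)
    se_log = math.sqrt(1 / sum(control) + 1 / sum(treated))
    z = math.log(rate_ratio) / se_log
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_ratio, z, p_value


def dispersion_ratio(counts):
    """Sample variance divided by mean; values well above 1 hint that a
    negative binomial model may fit better than a Poisson model."""
    return variance(counts) / mean(counts)


def two_part_summary(values):
    """Describe the two processes separately: share of zeros, mean of positives."""
    positives = [v for v in values if v > 0]
    share_zero = 1 - len(positives) / len(values)
    return share_zero, mean(positives)


if __name__ == "__main__":
    # Hypothetical daily referral counts before (control) and during a campaign.
    control = [0, 2, 1, 0, 3, 1, 2]
    treated = [2, 4, 0, 5, 3, 6, 1]
    print(poisson_indicator_fit(control, treated))
    print(dispersion_ratio(control + treated))
    print(two_part_summary(control + treated))
```

The two-part summary is only a descriptive stand-in for the logistic-plus-linear approach; it shows what each of the two models would be estimating.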


Hope this helps, at least a little, to get you looking around. You really need many methods in your arsenal, and you certainly need access to statistical software for all but the basic parametric tests.
 
#6
Thanks to the Berlin Wall between marketing and everyone else at our company, I rarely get the opportunity to control the experiment. Usually I'm racing through a pile of factors that include seasonal variation, multiple marketing campaigns, and possibly site design changes. Sometimes the site owner doesn't even have a goal. It's a mess.

However, I do occasionally get to jump in ahead of the curve, so I do appreciate your suggestions. I'll read up on what you mentioned. Can I ask follow-up questions here, too?