## Intuition about why sample covariance is unbiased (chatbox convo)

02/08 16:04 spunky: quick theoretical check: for sample r to be an unbiased estimate of population rho do we (a) need the assumption of bivariate normality or (b) r would eventually converge to rho if the sample size is big enough (even though the data may not be bivariate-normally distributed)?
02/08 16:05 spunky: small check in R seems to point towards option (a)
02/08 16:07 spunky: NO... wait... code error. scrap that
02/08 17:00 Dason: For standardized data in regression the slope is r. Regression slope estimates are unbiased and consistent.
02/08 18:38 spunky: ::facepalm:: never thought about it that way. thanks for pointing that out!
02/08 20:51 Jake: the thing about unbiased estimate of rho seems odd... i understand the argument about it being just a transformation of a regression coefficient, but do the bounds really not complicate things at all? e.g. if rho is -1
02/08 20:57 Jake: i guess if sample covariance is unbiased then it makes sense that sample correlation is too. but intuitively it seems like the bounds should complicate things...
02/08 20:59 Dason: Hmm. I think you're right. I probably messed up earlier - the sample covariance is unbiased - the sample variances are unbiased... but the correlation is a nonlinear function of those estimates...
02/08 21:01 Jake: well i have the same intuition about sample covariance. it's bounded at +/- the product of the SDs, right? it seems like in both cases (covariance and correlation) this should have consequences for whether you can get an unbiased estimate. although apparently not
02/08 21:06 Jake: found the following: "Since the sample correlation R(X,Y) is a nonlinear function of the sample covariance and sample standard deviations, it will not in general be an unbiased estimator of the distribution correlation ρ" http://www.math.uah.edu/stat/sample/Covariance2.html
02/08 21:06 Jake: evidently the thing about the bounds is just barking up the wrong tree. that is intuitively how i understand the issue of bias in R^2. i guess it is not a sound argument in general
02/08 21:12 Jake: i dont get it... if the population covariance is at the upper bound sd(x)*sd(y), so that sampling error can only happen in 1 direction, how does this not lead to a biased sample covariance?
02/08 21:12 Jake: maybe this is a CV question =S
02/08 21:14 Dason: but there is no upper bound for the sampling distribution
02/08 21:14 Dason: not marginally
02/08 21:15 Jake: oh... wait... the sample covariance could exceed the pop covariance which is at the maximum?
02/08 21:16 Dason: yeah
02/08 21:20 Jake: see if this intuition seems right to you... the two sample variances could possibly be both overestimated. so although there is a computational maximum to the sample covariance, this could exceed the pop covariance if both sample variances are overestimated (or really if just one is sufficiently overestimated)
02/08 21:21 Jake: this makes sense to me
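
A minimal simulation sketch of the two points the conversation settles on, written in Python rather than the R mentioned earlier. Everything here is an illustrative assumption and not from the chat: the helper names (`sample_cov`, `sample_corr`, `simulate`), normal data with sd(x) = 1, the sample size of 5, the number of replications, and the seed. It draws many small samples, then compares the average sample covariance and average sample r against their population values, and counts how often the sample covariance lands above the population covariance.

```python
import math
import random


def sample_cov(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def sample_corr(xs, ys):
    """Sample correlation: covariance over the product of sample SDs."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


def simulate(rho, sd_y, n=5, reps=20000, seed=1):
    """Repeatedly sample (x, y) with sd(x) = 1, sd(y) = sd_y, corr = rho."""
    rng = random.Random(seed)
    pop_cov = rho * 1.0 * sd_y  # population covariance = rho * sd(x) * sd(y)
    covs, cors = [], []
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        # y is a scaled mix of x and independent noise, giving correlation rho
        ys = [sd_y * (rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
              for x in xs]
        covs.append(sample_cov(xs, ys))
        cors.append(sample_corr(xs, ys))
    return {
        "pop_cov": pop_cov,
        "mean_cov": sum(covs) / reps,
        "mean_r": sum(cors) / reps,
        # share of samples whose covariance exceeds the population value
        "frac_cov_above_pop": sum(c > pop_cov for c in covs) / reps,
    }


if __name__ == "__main__":
    # Case 1: rho = 1, so the population covariance sits at its bound sd(x)*sd(y)
    print("rho = 1.0:", simulate(rho=1.0, sd_y=2.0))
    # Case 2: rho = 0.5, where the nonlinearity of r shows up as bias
    print("rho = 0.5:", simulate(rho=0.5, sd_y=1.0))
```

With rho = 1 the population covariance is at its maximum of sd(x)*sd(y) = 2, yet a sizable share of samples should still come out above 2, because within each sample the bound is the product of the *sample* SDs, and those can be overestimated. The mean sample covariance should stay close to the population value in both cases, while the mean r for rho = 0.5 should fall somewhat below 0.5 at this sample size.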