# Thread: Calculating variance for variable derived from two correlated variables

1. ## Calculating variance for variable derived from two correlated variables

Hello,

I am analyzing a dataset based on a survey of organic farmers. Among other things, I have data about the farmer's organic seed usage and about their farm size.

One statistic I'm reporting is an estimate of the overall average percent of acres planted to organic seed. I'm calculating this as SUM(acres*percent_seed used)/SUM(acres).

I would like to provide a confidence interval for this, but am not sure how. Farm size and organic seed usage are correlated (in general, as farm size increases, usage of organic seed goes down).

From this post, I have been able to see how to calculate the variance of the product of two correlated variables, but adding in the division step has stumped me.

Thank you so much,

Jared

Research and Education Assistant Director
Organic Seed Alliance
jared@seedalliance.org
http://seedalliance.org
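
One way to frame the question above: the statistic SUM(acres*percent)/SUM(acres) is a ratio of two sample means, so a first-order Taylor (delta-method) approximation can handle the division step. The sketch below assumes a simple random sample of $n$ farms with no finite-population correction. The symbols $a_i$ (acres on farm $i$), $p_i$ (percent organic seed on farm $i$), and $e_i$ are notation introduced here for illustration, not taken from the original post.

$$
\hat{R} = \frac{\sum_{i=1}^{n} a_i p_i}{\sum_{i=1}^{n} a_i}, \qquad e_i = a_i\,(p_i - \hat{R}), \qquad \bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i
$$

$$
\widehat{\operatorname{Var}}(\hat{R}) \approx \frac{1}{n\,\bar{a}^{2}} \cdot \frac{1}{n-1}\sum_{i=1}^{n} e_i^{2}
$$

Because each $e_i$ combines acreage and seed usage for the same farm, the correlation between the two is carried into the variance automatically. An approximate 95% interval would then be $\hat{R} \pm 1.96\sqrt{\widehat{\operatorname{Var}}(\hat{R})}$, although with strongly skewed acreage the normal approximation may be rough.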

2. ## Re: Calculating variance for variable derived from two correlated variables

Well, this is something I am not good at, so let me provide some advice.

This reminds me of the issue of unequal error variances (heteroscedasticity) in linear regression, and I am trying to remember the remedies. The confidence intervals you construct there are based on pooled data. So if we picture a 45-degree line with the actual data spread around it in a funnel (say, triangle) shape, you risk the smaller point estimates having confidence intervals that are too broad and the larger point estimates having confidence intervals that are too narrow.

Hmm, lacking the actual solution (because I am too applied), I wonder if you could construct a nonparametric bootstrap confidence interval, i.e., use a resampling technique. Let us see if anyone else has input.

3. ## Re: Calculating variance for variable derived from two correlated variables

Sorry, I was just thinking about your problem again and am not sure if I misled you. If you graph your data, is there a linear relationship? If you can get a linear relationship out of the data, perhaps after a transformation, you may not need to worry about a complex calculation for the CI.

4. ## The Following User Says Thank You to hlsmith For This Useful Post:

JaredZystro (08-07-2015)

5. ## Re: Calculating variance for variable derived from two correlated variables

Hello,

There tends to be a linear relationship, although the distribution of acreage is not normal: there were many more respondents with small acreage than with large acreage.

6. ## Re: Calculating variance for variable derived from two correlated variables

Hello again,

Based on hlsmith's suggestion, I attempted to use a bootstrapping technique. Below is my attempt in R. I haven't done this before and would love to hear if what I did has problems that I'm unaware of.

Code:
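```
library(boot)

# Statistic for boot(): D is the data frame, d the resampled row indices.
seedacreage <- function(D, d) {
  E <- D[d, ]
  # Acreage-weighted mean percent of organic seed
  sum(E$Acres * E$Seed) / sum(E$Acres)
}

# VegSeed is the survey data frame with columns Acres and Seed
b <- boot(VegSeed, seedacreage, R = 100000)

mean(b$t[, 1])
# [1] 23.43237

boot.ci(b, type = "basic")
# BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
# Based on 100000 bootstrap replicates
#
# CALL :
# boot.ci(boot.out = b, type = "basic")
#
# Intervals :
# Level      Basic
# 95%   (-1.92, 33.06 )
```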
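
As a cross-check on a bootstrap interval like the one above, a linearization (delta-method) standard error for the same acreage-weighted ratio can be computed directly. This is only a minimal sketch: it assumes a simple random sample of farms and ignores any finite-population correction. `ratio_ci` is a hypothetical helper name, and the data are simulated for illustration, not the survey data.

```r
# Linearization (delta-method) standard error for an acreage-weighted mean.
# Simulated data for illustration only; NOT the survey data from the thread.
set.seed(1)
n     <- 200
acres <- rlnorm(n, meanlog = 2, sdlog = 1)   # right-skewed farm sizes
# Percent organic seed, decreasing with farm size, clamped to [0, 100]
seed  <- pmin(100, pmax(0, 60 - 8 * log(acres) + rnorm(n, sd = 20)))

ratio_ci <- function(acres, seed, level = 0.95) {
  n     <- length(acres)
  r_hat <- sum(acres * seed) / sum(acres)    # acreage-weighted mean percent
  e     <- acres * (seed - r_hat)            # linearized residuals (sum to zero)
  # Var(r_hat) ~ s_e^2 / (n * mean(acres)^2), with s_e^2 = sum(e^2) / (n - 1)
  se    <- sqrt(sum(e^2) / (n - 1) / n) / mean(acres)
  z     <- qnorm(1 - (1 - level) / 2)        # normal critical value
  c(estimate = r_hat, se = se, lower = r_hat - z * se, upper = r_hat + z * se)
}

res <- ratio_ci(acres, seed)
print(round(res, 2))
```

With the thread's data frame this would presumably be called as `ratio_ci(VegSeed$Acres, VegSeed$Seed)`. Since the "basic" interval can extend below zero for a skewed bootstrap distribution, it may also be worth comparing against `boot.ci(b, type = "perc")`, whose endpoints are quantiles of the bootstrap replicates and so stay within their observed range.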
