
Thread: Calculating variance for variable derived from two correlated variables

  1. #1

    Calculating variance for variable derived from two correlated variables




    Hello,

    I am analyzing a dataset based on a survey of organic farmers. Among other things, I have data about the farmer's organic seed usage and about their farm size.

    One statistic I'm reporting is an estimate of the overall average percent of acres planted to organic seed. I'm calculating this as SUM(acres * percent_seed_used) / SUM(acres).
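    As a minimal sketch of that calculation in R, assuming a data frame with one row per farm and hypothetical columns `Acres` and `Seed` (percent of acres planted to organic seed). The numbers are made up for illustration and are not from the survey:

```r
# Hypothetical example data: one row per surveyed farm.
farms <- data.frame(
  Acres = c(2, 5, 10, 40, 120),   # farm size in acres
  Seed  = c(90, 60, 50, 20, 10)   # percent of acres planted to organic seed
)

# Acre-weighted average percent: SUM(acres * percent) / SUM(acres)
weighted_pct <- function(d) sum(d$Acres * d$Seed) / sum(d$Acres)

est <- weighted_pct(farms)
est
```

    This is the same quantity as `weighted.mean(farms$Seed, farms$Acres)`. In this toy data the large farms use little organic seed, so the acre-weighted value (about 16.8) sits well below the unweighted `mean(farms$Seed)` of 46.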

    I would like to provide a confidence interval for this, but am not sure how. Farm size and organic seed usage are correlated (in general, as farm size increases, usage of organic seed goes down).

    From this post, I have been able to see how to calculate the variance of the product of two correlated variables, but adding in the division step has stumped me.
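    One standard way to handle the division step is to treat the statistic as a ratio of two sample totals and use a first-order Taylor (delta-method) approximation. With y_i = acres_i * percent_i and x_i = acres_i, the ratio r = SUM(y) / SUM(x) has approximate variance SUM(e_i^2) / ((n - 1) * n * mean(x)^2), where e_i = y_i - r * x_i. The sketch below assumes the farms are a simple random sample, ignores any finite-population correction, and relies on a large-sample normal approximation. The function and argument names are hypothetical and the data are simulated:

```r
# Delta-method (Taylor linearization) standard error for a ratio estimator.
# Assumes a simple random sample and no finite-population correction;
# 'acres' and 'pct' are hypothetical argument names.
ratio_ci <- function(acres, pct, level = 0.95) {
  n  <- length(acres)
  y  <- acres * pct                # acres * percent for each farm
  r  <- sum(y) / sum(acres)        # the ratio estimate itself
  e  <- y - r * acres              # linearized residuals (they sum to zero)
  se <- sqrt(sum(e^2) / (n - 1) / n) / mean(acres)
  z  <- qnorm(1 - (1 - level) / 2)
  c(estimate = r, se = se, lower = r - z * se, upper = r + z * se)
}

# Simulated data for illustration only: usage declines with farm size.
set.seed(1)
acres <- rexp(200, rate = 1 / 20)
pct   <- pmin(100, pmax(0, 60 - 0.4 * acres + rnorm(200, sd = 15)))

res <- ratio_ci(acres, pct)
res
```

    Because the residuals are built from each farm's own pair of values, the correlation between farm size and seed usage is accounted for without a separate covariance term. With a small sample or a very skewed acreage distribution the normal approximation may be poor, which is one reason to consider resampling instead.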

    I appreciate any help you can provide.

    Thank you so much,

    Jared

    Research and Education Assistant Director
    Organic Seed Alliance
    jared@seedalliance.org
    http://seedalliance.org

  2. #2
    hlsmith

    Re: Calculating variance for variable derived from two correlated variables

    Well, this is something I am not good at, so let me provide some advice.


    This reminds me of the problem of unequal error variance (heteroscedasticity) in linear regression, and I am trying to remember the remedies. The confidence intervals you can construct are based on pooled data. So if we picture a 45-degree line with the actual data spread around it in a funnel (say, triangle) shape, you risk the smaller point estimates having confidence intervals that are too wide and the larger point estimates having intervals that are too narrow.


    Hmm, lacking the actual solution (because I am too applied), I wonder if you could compute a nonparametric bootstrap confidence interval, that is, use a resampling technique. Let us see if anyone else has input.

  3. #3
    hlsmith

    Re: Calculating variance for variable derived from two correlated variables

    Sorry, I was just thinking about your problem and am not sure if I misled you. If you graph your data, is there a linear relationship? If you can get a linear relationship out of the data, perhaps after a transformation, you may not need to worry about a complex calculation for the CI.


  5. #4

    Re: Calculating variance for variable derived from two correlated variables

    Hello,

    There tends to be a linear relationship, although the distribution of acreage is not normal: there were many more respondents with small acreage than with large acreage.


  6. #5

    Re: Calculating variance for variable derived from two correlated variables


    Hello again,

    Based on hlsmith's suggestion, I attempted a bootstrapping technique. Below is my attempt in R. I haven't done this before and would love to hear whether what I did has problems that I'm unaware of.

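    Code: 
    library(boot)
    
    # Statistic for boot(): acre-weighted average percent of organic seed.
    # D is the data frame; d is the vector of resampled row indices.
    seedacreage <- function(D, d) {
      E <- D[d, ]
      sum(E$Acres * E$Seed) / sum(E$Acres)
    }
    
    # VegSeed is the survey data frame (columns Acres and Seed).
    b <- boot(VegSeed, seedacreage, R = 100000)
    
    # Mean of the bootstrap replicates
    mean(b$t[, 1])
    [1] 23.43237
    
    boot.ci(b, type = "basic")
    
    BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
    Based on 100000 bootstrap replicates
    
    CALL : 
    boot.ci(boot.out = b, type = "basic")
    
    Intervals : 
    Level      Basic         
    95%   (-1.92, 33.06 )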
    Last edited by JaredZystro; 08-07-2015 at 05:50 PM.
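
    One thing that stands out in the output above is that the basic interval has a lower limit below zero, which is impossible for a percentage. The basic interval reflects the bootstrap quantiles around the original estimate (2 * t0 minus the quantiles), so it can fall outside the range of the statistic. A percentile interval uses the quantiles of the replicates directly and therefore stays within the range the statistic can take. A minimal sketch of the comparison, using simulated data as a stand-in for `VegSeed` (which is not shown in the thread) and a smaller `R` to keep it quick:

```r
library(boot)

# Simulated stand-in for the VegSeed data frame; not the survey data.
set.seed(42)
demo <- data.frame(Acres = rexp(150, rate = 1 / 15))
demo$Seed <- pmin(100, pmax(0, 50 - 0.5 * demo$Acres + rnorm(150, sd = 25)))

# Same statistic as in the post: acre-weighted average percent.
seedacreage <- function(D, d) {
  E <- D[d, ]
  sum(E$Acres * E$Seed) / sum(E$Acres)
}

b <- boot(demo, seedacreage, R = 2000)

b$t0                                    # estimate from the original sample
ci <- boot.ci(b, type = c("basic", "perc"))
ci$basic[4:5]                           # basic interval endpoints
ci$percent[4:5]                         # percentile interval endpoints
```

    Note that `b$t0` is the estimate computed from the original sample, whereas `mean(b$t[, 1])` is the average of the bootstrap replicates. The two are usually close, and a gap between them is the bootstrap estimate of bias. Whether the percentile interval is adequate for the real data depends on how skewed the replicates are, so plotting `b` with `plot(b)` is a sensible check.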
