
Thread: bootstrap regression question

  1. #1

    bootstrap regression question




    I have a slightly different twist on the usual application of bootstrapping and regression. Here’s the problem: I have a population of people within a particular group (e.g., all the people living within a census tract). I want to build a model to predict a continuous outcome at the group level (e.g., the % of people who will become infected with a disease). For independent variables I want to use, for example, % male, average age, and average income. The problem is that I have an n of 1; that is, I have only one census tract. Yet I want the model output to be expressed at the group level of the hierarchy, so that I can say things like: as the percentage of males in the census tract increases by 1 unit, the % infected increases or decreases by the coefficient x. Imagine we have only 10 census tracts in total, and I want a custom model for each one. With only 10 tracts, I can’t simply fit a model across them, because n=10 isn’t really any better than n=1.

    My thought was to bootstrap to create pseudo samples. That is, create lots of artificial smaller census tracts by randomly sampling maybe 10% of the census tract population, and for each pseudo tract compute group-level statistics: “% infected with disease” as the dependent variable, and % male, average age, and average income as the independent variables. Do this maybe 1000 times. Then build a regression model at this group (hierarchy) level to get the coefficients and predictions. I would sample with replacement, so the pseudo-tract statistics should form a sampling distribution centered around the real tract means.
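
    A minimal sketch of the pseudo-tract construction described above, in Python using only the standard library. Everything in it is an assumption made for illustration: the synthetic population from `make_population`, the invented infection mechanism, income measured in thousands of dollars, and the helper names `pseudo_tract` and `ols`. Each pseudo tract is a 10% subsample with no repeats inside a draw, and the full population is available again for the next draw. It shows the mechanics only and does not by itself settle whether the approach is valid.

```python
import random


def make_population(n, seed=0):
    """Synthetic individual records for ONE tract (hypothetical data)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        male = 1 if rng.random() < 0.5 else 0
        age = rng.uniform(18, 90)
        income = rng.uniform(10, 120)  # assumed unit: thousands of dollars
        # Invented infection mechanism, purely for this sketch.
        p = 0.05 + 0.10 * male + 0.002 * (age - 18)
        infected = 1 if rng.random() < p else 0
        people.append((male, age, income, infected))
    return people


def pseudo_tract(people, frac, rng):
    """Draw one pseudo tract and aggregate it to group-level statistics."""
    k = max(1, int(len(people) * frac))
    sample = rng.sample(people, k)  # no repeats within a single draw
    pct_male = 100 * sum(p[0] for p in sample) / k
    avg_age = sum(p[1] for p in sample) / k
    avg_income = sum(p[2] for p in sample) / k
    pct_infected = 100 * sum(p[3] for p in sample) / k
    return [pct_male, avg_age, avg_income], pct_infected


def ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    m = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(m):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][m] / A[i][i] for i in range(m)]


people = make_population(5000)
rng = random.Random(1)
draws = [pseudo_tract(people, 0.10, rng) for _ in range(1000)]
X = [features for features, _ in draws]
y = [outcome for _, outcome in draws]
intercept, b_male, b_age, b_income = ols(X, y)
print(b_male, b_age, b_income)
```

    Because every pseudo tract is drawn from the same population, the fitted coefficients describe how subsample aggregates co-vary inside this one tract.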

    Ignoring the spatial aspects of my example, is this a valid approach? Is there a better sampling idea? A better modeling idea? If the pseudo-sampling idea is OK, do I need to consider something like a fixed effects model?

    I’m struggling with how best to generate multiple samples in this scenario. We have only 10 census tracts, so I can’t reliably build a regression model using n=10 either. And I want a separate model for each tract, so we’re back to n=1.

    Any help you can provide would be greatly appreciated.

  2. #2

    Re: bootstrap regression question


    Correction: assume I sample without replacement. That is, I don’t replace observations between samples, so each pseudo group consists of a unique subsample of observations.
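
    One possible reading of this without-replacement variant is to shuffle the tract once and cut it into non-overlapping groups. A hedged sketch in Python, where `disjoint_pseudo_groups` is a hypothetical helper name and integer IDs stand in for individual records:

```python
import random


def disjoint_pseudo_groups(people, frac, seed=0):
    """Shuffle once, then cut into non-overlapping groups of size frac * N."""
    rng = random.Random(seed)
    shuffled = list(people)
    rng.shuffle(shuffled)
    k = max(1, int(len(shuffled) * frac))
    # Leftover records that do not fill a whole group are dropped.
    return [shuffled[i:i + k] for i in range(0, len(shuffled) - k + 1, k)]


# Integer IDs stand in for individual records (hypothetical data).
groups = disjoint_pseudo_groups(range(1000), frac=0.10)
print(len(groups), len(groups[0]))
```

    Under this reading, a 10% group size yields at most 10 disjoint pseudo groups per tract, rather than 1000.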
