Thread: Low-prevalence covariates in multiple linear regression analysis

  1. #1

    Is there any problem with adding low-prevalence covariates to a standard multiple linear regression (simultaneous entry)?

    For example, I have 1000+ subjects, 10 categorical variables, and 5 numerical variables. I want to predict one of the numerical variables, using the remaining variables as covariates. The categorical variables are dichotomous (0, 1). However, for some of them, the prevalence of "1" is 5% or less. I ran the model in SPSS, and some of these low-prevalence categorical variables turned out to be predictors of the numerical variable, with good 95% CIs and p-values. The overall model performed well (R = 0.6+). I'm not including interactions between categorical variables in the model.

    I did not find any article that lists covariate prevalence as an assumption of the linear regression model. I suppose I'm at risk of Type II errors, because I won't be able to detect subtle effects with such low prevalence, but am I at risk of another type of bias if the variable turned out to be a good predictor?

    Thank you so much for your input.
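
    The Type II error concern above can be made concrete with a rough sketch. It assumes the simplest case, a single dichotomous covariate with equal residual variance in both groups, where the coefficient's standard error is sigma * sqrt(1/n1 + 1/n0). The function name `se_binary_coef` and the value sigma = 1 are illustrative assumptions, not anything taken from the SPSS output; in a multiple regression, correlation with other covariates would inflate the standard error further.

```python
import math


def se_binary_coef(n, prevalence, sigma=1.0):
    """Approximate standard error of the coefficient of a 0/1 covariate.

    Assumes a simple regression with one dichotomous predictor and a
    common residual standard deviation `sigma` (hypothetical value).
    """
    n1 = round(n * prevalence)  # subjects coded 1
    n0 = n - n1                 # subjects coded 0
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups need at least one subject")
    return sigma * math.sqrt(1.0 / n1 + 1.0 / n0)


# Same sample size, different prevalence of "1"
se_rare = se_binary_coef(1000, 0.05)      # 50 vs 950 subjects
se_balanced = se_binary_coef(1000, 0.50)  # 500 vs 500 subjects
ratio = se_rare / se_balanced

print(f"SE at 5% prevalence:  {se_rare:.4f}")
print(f"SE at 50% prevalence: {se_balanced:.4f}")
print(f"ratio: {ratio:.2f}")
```

    Under these assumptions, the standard error at 5% prevalence is roughly 2.3 times the standard error at 50% prevalence, so only larger effects are detectable. The sketch says nothing about bias; it only illustrates the loss of precision.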

  2. #2
    hlsmith

    You can add them. The only risk is sparsity in high dimensions. For instance, to make up an example, you might have 5% women, 5% old people, and 1% minority. Some approaches have difficulty running or converging in these settings, but I don't think that would be such an issue in linear regression. The issue you will come across is generalizing your results well, because who is to say that your 22 people who are old, female, and minority represent comparable individuals in the world? But if the variables are of benefit, include them, and remember Occam's razor.
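
    One way to check for the sparsity described above is to count how many subjects fall into each combination of the dichotomous covariates. The following is a minimal sketch using only the standard library; the toy data, the column meaning (female, old, minority), the function names, and the threshold of 5 are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def joint_cell_counts(rows):
    """Count subjects in each observed combination of 0/1 covariates."""
    return Counter(tuple(r) for r in rows)


def sparse_cells(rows, min_count=5):
    """Return every 0/1 combination with fewer than `min_count` subjects.

    Combinations that never occur are included with a count of 0.
    """
    counts = joint_cell_counts(rows)
    k = len(rows[0])  # number of dichotomous covariates
    return {
        cell: counts.get(cell, 0)
        for cell in product((0, 1), repeat=k)
        if counts.get(cell, 0) < min_count
    }


# Hypothetical toy data: each tuple is (female, old, minority)
rows = (
    [(0, 0, 0)] * 90
    + [(1, 0, 0)] * 5
    + [(0, 1, 0)] * 4
    + [(1, 1, 1)] * 1
)

sparse = sparse_cells(rows, min_count=5)
for cell, count in sorted(sparse.items()):
    print(cell, count)
```

    Cells with very few subjects are the ones where generalization is most doubtful. Without interaction terms the model does not estimate a separate effect per cell, so this check is mainly a diagnostic for how thinly the rare categories overlap.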

  3. #3


    You can also test against something like a zero-corrected/inflated Poisson distribution rather than a normal distribution.

    The zero-corrected/inflated Poisson distribution works where the number of zeros is inflated, i.e. where a predictor has a low incidence (say, someone winning the lotto vs. those who don't).
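
    For reference, here is a minimal sketch of the zero-inflated Poisson probability mass function, assuming the usual mixture definition: with probability pi the value is a structural zero, otherwise it is drawn from a Poisson distribution with mean lam. Note that this distribution describes a count variable with excess zeros. The function name `zip_pmf` and the parameter values are illustrative assumptions.

```python
import math


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of observing the count k.

    lam: mean of the Poisson component (lam > 0)
    pi:  probability of a structural (extra) zero, 0 <= pi <= 1
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the extra-zero process and the Poisson part
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Hypothetical parameters: 30% structural zeros, Poisson mean of 2
p_zero_zip = zip_pmf(0, lam=2.0, pi=0.3)
p_zero_plain = zip_pmf(0, lam=2.0, pi=0.0)  # ordinary Poisson for comparison
total = sum(zip_pmf(k, lam=2.0, pi=0.3) for k in range(60))

print(f"P(0) zero-inflated: {p_zero_zip:.4f}")
print(f"P(0) plain Poisson: {p_zero_plain:.4f}")
```

    Setting pi to 0 recovers the ordinary Poisson distribution, which makes it easy to see how much of the zero mass is attributed to the inflation component.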
