
Thread: Type 1 error in regression with entire populations

  1. #1
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Type 1 error in regression with entire populations




    I have read that you are not supposed to keep changing which level of a categorical variable serves as the reference level, because this can inflate the familywise error rate: the true Type I error rate will be greater than your nominal alpha level. By this I mean that if you have 5 levels, you change the reference level five times to see what the results will be (although it can be substantively useful to do that).

    I am not sure the above is true. But even if it is, does it apply when you have an entire population, as I usually do? Can you even have a Type I error when you are analyzing a population? By that I mean there are 25,000 people in the population of interest and I have all of them; there is no sample involved.
    "Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995

  2. #2
    Omega Contributor
    Points: 38,392, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,000
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Type 1 error in regression with entire populations

    Why do you have a reference group? Did you run something on this population?


    Type I error is probably exclusive to statistical testing. Statistics are for making generalizations from collected samples. You don't have a sample and aren't conducting statistical tests. Your numbers are the truth, so if two groups are different, they are different; there are no threats from a sampling distribution. This is what I think.
    Stop cowardice, ban guns!

  3. #3
    Omega Contributor
    Points: 38,392, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,000
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Type 1 error in regression with entire populations

    I just saw this but have not actually looked at its content:


    Xiaoqin Wang, Yin Jin, and Li Yin
    Measuring and estimating treatment effect on dichotomous outcome of a population
    Stat Methods Med Res October 2016 25: 1779-1790, first published on September 3, 2013, doi:10.1177/0962280213502146

  4. #4
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: Type 1 error in regression with entire populations

    Quote Originally Posted by noetsi View Post
    I have read that you are not supposed to keep changing which level of a categorical variable serves as the reference level, because this can inflate the familywise error rate: the true Type I error rate will be greater than your nominal alpha level. By this I mean that if you have 5 levels, you change the reference level five times to see what the results will be (although it can be substantively useful to do that).

    I am not sure the above is true.
    Hi,
    I am pretty sure this is not true. If you are only changing the reference level, I think you are actually repeating the same test, only presented differently (kind of like writing up the same test results in different languages), so it is still the same test, not an independent one.
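
The point about re-presenting the same fit can be checked numerically. The following is a minimal sketch, assuming simulated data, ordinary least squares with treatment (dummy) coding, and a hypothetical helper named `fit_with_reference`; none of these come from the thread. It compares the fit with reference level A against the fit with reference level B.

```python
import numpy as np

def fit_with_reference(y, groups, levels, ref):
    """OLS of y on dummy variables, using `ref` as the reference level.

    Hypothetical helper for illustration only.
    """
    others = [lv for lv in levels if lv != ref]
    # Design matrix: intercept plus one indicator column per non-reference level.
    columns = [np.ones(len(y))] + [(groups == lv).astype(float) for lv in others]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Each slope estimates "level minus reference".
    contrasts = {(lv, ref): b for lv, b in zip(others, beta[1:])}
    return contrasts, X @ beta

# Simulated data (an assumption): five levels, 500 cases.
rng = np.random.default_rng(0)
levels = ["A", "B", "C", "D", "E"]
groups = rng.choice(levels, size=500)
y = rng.normal(size=500) + 0.5 * (groups == "C")

contrasts_a, fitted_a = fit_with_reference(y, groups, levels, ref="A")
contrasts_b, fitted_b = fit_with_reference(y, groups, levels, ref="B")

# The two parameterizations describe the same fitted model.
same_fit = np.allclose(fitted_a, fitted_b)
# "B vs A" under reference A is "A vs B" under reference B with the sign flipped.
flipped = np.isclose(contrasts_a[("B", "A")], -contrasts_b[("A", "B")])
# "C vs B" is not printed under reference A, but it is already implied by that fit.
implied_c_vs_b = contrasts_a[("C", "A")] - contrasts_a[("B", "A")]
recovered = np.isclose(implied_c_vs_b, contrasts_b[("C", "B")])

print(same_fit, flipped, recovered)
```

Changing the reference level therefore does not refit anything new, but it does put different pairwise contrasts (such as C vs B) on display, and each displayed contrast that gets tested is a comparison in its own right.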

  5. #5
    Omega Contributor
    Points: 38,392, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,000
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Type 1 error in regression with entire populations

    Whenever I do multiple testing like noetsi mentioned, I always correct my level of significance, unless it is the exact same test (a vs b, b vs a).

  6. #6
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: Type 1 error in regression with entire populations

    Quote Originally Posted by hlsmith View Post
    Whenever I do multiple testing like noetsi mentioned, I always correct my level of significance, unless it is the exact same test (a vs b, b vs a).
    But this is the same thing, right? a vs. b, c, d, e or b vs. a, c, d, e, etc.

  7. #7
    Omega Contributor
    Points: 38,392, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,000
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Type 1 error in regression with entire populations

    A vs b, a vs c, and b vs c are three hypothesis tests in my practice.

  8. #8
    TS Contributor
    Points: 14,811, Level: 78
    Level completed: 91%, Points required for next Level: 39
    Miner's Avatar
    Location
    Greater Milwaukee area
    Posts
    1,171
    Thanks
    34
    Thanked 405 Times in 363 Posts

    Re: Type 1 error in regression with entire populations

    Quote Originally Posted by noetsi View Post
    I have read that you are not supposed to keep changing which level of a categorical variable serves as the reference level, because this can inflate the familywise error rate: the true Type I error rate will be greater than your nominal alpha level. By this I mean that if you have 5 levels, you change the reference level five times to see what the results will be (although it can be substantively useful to do that).

    I am not sure the above is true. But even if it is, does it apply when you have an entire population, as I usually do? Can you even have a Type I error when you are analyzing a population? By that I mean there are 25,000 people in the population of interest and I have all of them; there is no sample involved.
    I believe that you do have type 1 error even though you have a population.

    Thinking it through, you have two types of statistics, descriptive and inferential. In descriptive statistics (i.e., mean, standard deviation) you no longer have sampling error, so your measures of mean and standard deviation are absolute. No confidence intervals around mean or standard deviation. However, when you use inferential statistics, you still have variation with which you must deal. True, the sampling variation is gone, but all of your other sources of variation still exist. Where there is variation there is uncertainty. I believe that all of the typical rules, assumptions, etc. still apply.
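
One way to make this position concrete is to treat each census as a single realization of an underlying data-generating process. The sketch below assumes exactly that: two groups with no true difference in the process, normally distributed outcomes, a census of 25,000 people split evenly, and a large-sample z-test. The function name `null_rejection_rate` and all parameter values are illustrative assumptions, not anything stated in the thread.

```python
import numpy as np

def null_rejection_rate(n_people=25_000, n_censuses=2_000, z_crit=1.96, seed=0):
    """Share of simulated censuses in which a two-group z-test rejects at |z| > z_crit.

    Every census is a complete enumeration, but its outcomes are random draws
    from a process in which the two groups do not differ.
    """
    rng = np.random.default_rng(seed)
    half = n_people // 2
    rejections = 0
    for _ in range(n_censuses):
        # Both groups come from the same distribution: the null is true in the process.
        a = rng.normal(size=half)
        b = rng.normal(size=n_people - half)
        # Standard error of the difference in means (unequal-variance form).
        se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
        z = (a.mean() - b.mean()) / se
        rejections += abs(z) > z_crit
    return rejections / n_censuses

rate = null_rejection_rate()
print(f"share of censuses rejecting a true null: {rate:.3f}")
```

Under these assumptions the observed group difference in any one census is a fact about those 25,000 people, yet a claim about the process behind them can still be a false positive at roughly the nominal rate. If the only target is the enumerated population itself, the simulation does not apply.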

  9. #9
    Omega Contributor
    Points: 38,392, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,000
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Type 1 error in regression with entire populations

    Hmmm. We need references that go either way. Does the traditional 1.96 disappear, though? It would seem natural not to have it.

  10. #10
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: Type 1 error in regression with entire populations

    Quote Originally Posted by hlsmith View Post
    A vs b, a vs c, and b vs c are three hypothesis tests in my practice.
    Yep, you are right. But if I understand the question correctly, it is about doing the same analysis and only changing the reference level. So, I agree, you have one test for each pair of levels, but you do not have more tests just because you changed the reference level from A to B.
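
The counting behind this exchange can be written out. This is a minimal sketch for the 5-level case from the opening post, assuming a nominal alpha of 0.05 and, for the familywise formula, independent tests. Pairwise comparisons from one model are not independent, so the familywise figure is only a rough guide.

```python
from math import comb

def familywise_error_rate(alpha, m):
    """P(at least one false rejection) across m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

k = 5          # number of levels, as in the opening post
alpha = 0.05   # nominal per-test level (an assumption)

# Cycling through all k reference levels prints k * (k - 1) slope coefficients...
coefficients_reported = k * (k - 1)
# ...but each unordered pair appears twice (a vs b and b vs a), so the number
# of distinct pairwise comparisons is k choose 2.
distinct_pairs = comb(k, 2)

fwer = familywise_error_rate(alpha, distinct_pairs)
bonferroni_alpha = alpha / distinct_pairs

print(coefficients_reported, distinct_pairs)
print(f"approximate familywise error rate: {fwer:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}")
```

So both positions fit together: rotating the reference level does not create 20 tests, but inspecting all 10 distinct pairs is still 10 comparisons, which is the case a correction such as Bonferroni is meant for.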

  11. #11
    TS Contributor
    Points: 14,811, Level: 78
    Level completed: 91%, Points required for next Level: 39
    Miner's Avatar
    Location
    Greater Milwaukee area
    Posts
    1,171
    Thanks
    34
    Thanked 405 Times in 363 Posts

    Re: Type 1 error in regression with entire populations

    I found the following: It appears that there are a lot of arguments either way. Andrew Gelman's makes the most sense to me.

  12. #12
    Fortran must die
    Points: 58,790, Level: 100
    Level completed: 0%, Points required for next Level: 0
    noetsi's Avatar
    Posts
    6,532
    Thanks
    692
    Thanked 915 Times in 874 Posts

    Re: Type 1 error in regression with entire populations

    I think a key issue raised by Miner is how you understand your population. If you think of your analysis as pertaining only to that population, then Type I error does not make sense to me: you know the true results in the population and no error is possible. However, if you think of your population as a sample of all possible populations (ones that might occur in the future, for example), then error does pertain, in terms of applying your analysis to those other macro-populations. I have gone back and forth on this issue; in this case I decided to ignore that future populations might be different (or ones in other states, etc.; this analysis is very focused).

    Honestly, I am not sure what sources of variation exist, other than sampling variation, that could cause error. I don't doubt they might exist; I just can't imagine what they are or how they would introduce error in the regression. All I really care about here are the slopes and odds ratios; I am not interested in other statistics.

    The literature I have seen comes down on hlsmith's side in terms of multiple tests (that is why post hoc tests are penalized). But that literature does not, I think, deal with populations. I am still not sure one way or the other whether familywise error applies, because I think Type I error itself is impossible in a population when all you care about is that population. It's a problem with a sample because you are interested in the larger population and you don't know whether what you find in the sample matches the real population.

    Obviously I remain uncertain (thanks Miner and hlsmith for the articles). I don't think the assumptions of regression, except for linearity, really apply when you are analyzing a large population (I have 25,000 cases). Even setting aside that you have the population, when you have that many cases I don't think heteroscedasticity or multicollinearity influence the results, because the results are asymptotically correct with that many cases even if those problems exist. Nor does normality matter, because of the CLT.

  13. #13
    Omega Contributor
    Points: 38,392, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,000
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Type 1 error in regression with entire populations

    I am also unsure of this additional variation.

    You could say measurement error, but that isn't usually addressed in your model. Sensitivity analysis can try to assume its direction and magnitude. You have dispersion of the variate, but that is the nature of a (stochastic) random variable.

    For example, how would one perform a two-sample t-test with a population?

    I get the attempt to ask about a future sample, but the moment after you take a measurement things are different. And what if you aren't predicting, just taking a cross-sectional measurement?

  14. #14
    TS Contributor
    Points: 12,227, Level: 72
    Level completed: 45%, Points required for next Level: 223
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,470
    Thanks
    160
    Thanked 332 Times in 312 Posts

    Re: Type 1 error in regression with entire populations

    It all depends on what the analysis will be used for, I guess. The moment somebody says that we have proven an effect, it is implicitly understood that we are talking about future samples, as an effect presupposes that it will not disappear after we stop the measurement. Also, I do not think anyone uses the very careful wording that would be necessary to avoid this, something like "we did not use the usual statistical methods because we have a full census and we have no intention of discussing the existence or non-existence of any effect whatsoever that might show up in our analysis". So it would probably be best to go with Gelman.

    regards

  15. #15
    Omega Contributor
    Points: 38,392, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    7,000
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: Type 1 error in regression with entire populations


    I accessed the paper I referenced in post #3; it is not relevant to your question. It covers using maximum likelihood to estimate risk differences, relative risks, and odds ratios from one model.
