
Thread: Alternatives for Non-normality and Inequality of Variance?

  1. #1
    Points: 3,593, Level: 37
    Level completed: 62%, Points required for next Level: 57

    Posts
    12
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Alternatives for Non-normality and Inequality of Variance?




    I've been working with a rather difficult data set for over a week with no real progress. I am trying to compare the effect of temperature (15, 20, 25, 28, 30 degrees) on development time. The problem is that the data are very non-normal and the variances are unequal despite many transformations. What I have observed is that as temperature decreases, the variance increases quite substantially. For example, at 30 degrees the organism basically develops in 7 or 8 days, but at 15 degrees development can take 22-28 days, etc. I've looked at running Kruskal-Wallis and Welch's ANOVA, but I am still too concerned about the assumptions. Any advice? Here is my SAS code if anyone wants to see what the issues are. Thanks!

    data dev1;
    input id duration temp;
    datalines;
    1 8 30
    2 8 30
    3 8 30
    4 8 30
    5 7 30
    6 8 30
    7 8 30
    8 8 30
    9 8 30
    10 8 30
    11 8 30
    12 8 30
    13 8 30
    14 8 30
    15 8 30
    16 8 30
    17 8 30
    18 8 30
    19 8 30
    20 8 30
    21 8 30
    22 7 30
    23 7 30
    24 7 30
    25 7 30
    26 7 30
    27 7 30
    28 7 30
    29 7 30
    30 7 30
    31 7 30
    32 6 30
    33 6 30
    34 7 30
    35 6 30
    36 6 30
    37 7 30
    38 7 30
    39 7 30
    40 7 30
    41 7 30
    42 7 30
    43 7 30
    44 7 30
    45 7 30
    46 7 30
    47 7 30
    48 7 30
    49 8 30
    50 7 30
    51 7 30
    52 7 30
    53 7 30
    54 7 30
    55 7 30
    56 7 30
    57 7 30
    58 7 30
    59 7 30
    60 7 30
    61 7 30
    62 7 30
    63 7 30
    64 7 30
    65 7 30
    66 7 30
    67 7 30
    68 7 30
    69 7 30
    70 7 30
    71 7 30
    72 7 30
    73 7 30
    74 7 30
    75 7 30
    76 7 30
    77 7 30
    78 7 30
    79 8 30
    80 8 30
    81 8 30
    82 7 30
    83 7 30
    84 7 30
    85 7 30
    86 7 30
    87 6 30
    88 6 30
    89 6 30
    90 6 30
    91 6 30
    92 6 30
    93 7 30
    94 6 30
    95 7 30
    96 6 30
    97 6 30
    98 7 28
    99 6 28
    100 6 28
    101 6 28
    102 6 28
    103 6 28
    104 6 28
    105 6 28
    106 6 28
    107 6 28
    108 6 28
    109 6 28
    110 6 28
    111 6 28
    112 6 28
    113 6 28
    114 6 28
    115 6 28
    116 6 28
    117 6 28
    118 6 28
    119 6 28
    120 7 28
    121 6 28
    122 6 28
    123 7 28
    124 6 28
    125 6 28
    126 6 28
    127 6 28
    128 6 28
    129 6 28
    130 7 28
    131 7 28
    132 6 28
    133 6 28
    134 6 28
    135 6 28
    136 6 28
    137 7 28
    138 7 28
    139 7 28
    140 7 28
    141 7 28
    142 7 28
    143 6 28
    144 7 28
    145 7 28
    146 7 28
    147 6 28
    148 6 28
    149 6 28
    150 6 28
    151 6 28
    152 6 28
    153 6 28
    154 7 28
    155 7 28
    156 7 28
    157 7 28
    158 7 28
    159 7 28
    160 6 28
    161 6 28
    162 6 28
    163 7 28
    164 6 28
    165 6 28
    166 6 28
    167 7 28
    168 7 28
    169 7 28
    170 7 28
    171 6 28
    172 7 28
    173 7 28
    174 7 28
    175 7 28
    176 7 28
    177 7 28
    178 7 28
    179 7 28
    180 7 28
    181 7 28
    182 8 28
    183 7 28
    184 7 28
    185 7 28
    186 7 28
    187 7 28
    188 8 28
    189 7 28
    190 6 28
    191 6 28
    192 7 28
    193 7 28
    194 6 28
    195 6 28
    196 6 28
    197 6 28
    198 6 28
    199 6 28
    200 6 28
    201 6 28
    202 6 28
    203 6 28
    204 6 28
    205 6 28
    206 6 28
    207 6 28
    208 6 28
    209 6 28
    210 7 28
    211 7 28
    212 6 28
    213 6 28
    214 6 28
    215 6 28
    216 6 28
    217 6 28
    218 6 28
    219 6 28
    220 6 28
    221 6 28
    222 6 28
    223 6 28
    224 6 28
    225 6 28
    226 6 28
    227 7 28
    228 7 28
    229 7 28
    230 7 28
    231 7 28
    232 6 28
    233 6 28
    234 6 28
    235 6 28
    236 6 28
    237 6 28
    238 6 28
    239 6 28
    240 6 28
    241 6 28
    242 6 28
    243 6 28
    244 6 28
    245 6 28
    246 6 28
    247 6 28
    248 6 28
    249 6 28
    250 6 28
    251 6 28
    252 6 28
    253 6 28
    254 6 28
    255 6 28
    256 6 28
    257 6 28
    258 6 28
    259 6 28
    260 6 28
    261 6 28
    262 6 28
    263 6 28
    264 6 28
    265 6 28
    266 6 28
    267 6 28
    268 6 28
    269 6 28
    270 6 28
    271 6 28
    272 6 28
    273 6 28
    274 6 28
    275 6 28
    276 6 28
    277 6 28
    278 6 28
    279 6 28
    280 6 28
    281 6 28
    282 6 28
    283 6 28
    284 6 28
    285 6 28
    286 6 28
    287 6 28
    288 6 28
    289 6 28
    290 6 28
    291 6 28
    292 8 25
    293 7 25
    294 7 25
    295 7 25
    296 7 25
    297 7 25
    298 7 25
    299 7 25
    300 7 25
    301 7 25
    302 7 25
    303 7 25
    304 7 25
    305 7 25
    306 7 25
    307 7 25
    308 8 25
    309 8 25
    310 8 25
    311 8 25
    312 8 25
    313 8 25
    314 8 25
    315 8 25
    316 8 25
    317 8 25
    318 8 25
    319 8 25
    320 8 25
    321 8 25
    322 8 25
    323 9 25
    324 7 25
    325 8 25
    326 8 25
    327 8 25
    328 7 25
    329 8 25
    330 8 25
    331 8 25
    332 8 25
    333 9 25
    334 8 25
    335 8 25
    336 8 25
    337 7 25
    338 9 25
    339 8 25
    340 8 25
    341 8 25
    342 8 25
    343 8 25
    344 7 25
    345 8 25
    346 8 25
    347 8 25
    348 8 25
    349 8 25
    350 8 25
    351 8 25
    352 8 25
    353 8 25
    354 8 25
    355 7 25
    356 7 25
    357 7 25
    358 7 25
    359 7 25
    360 8 25
    361 7 25
    362 7 25
    363 8 25
    364 8 25
    365 8 25
    366 8 25
    367 9 25
    368 8 25
    369 8 25
    370 8 25
    371 8 25
    372 9 25
    373 7 25
    374 8 25
    375 8 25
    376 8 25
    377 8 25
    378 8 25
    379 8 25
    380 8 25
    381 8 25
    382 8 25
    383 8 25
    384 8 25
    385 8 25
    386 8 25
    387 8 25
    388 8 25
    389 8 25
    390 8 25
    391 7 25
    392 7 25
    393 7 25
    394 7 25
    395 7 25
    396 7 25
    397 7 25
    398 7 25
    399 7 25
    400 7 25
    401 7 25
    402 7 25
    403 7 25
    404 7 25
    405 7 25
    406 7 25
    407 7 25
    408 7 25
    409 7 25
    410 7 25
    411 7 25
    412 7 25
    413 8 25
    414 8 25
    415 8 25
    416 8 25
    417 8 25
    418 8 25
    419 8 25
    420 7 25
    421 7 25
    422 7 25
    423 7 25
    424 7 25
    425 7 25
    426 7 25
    427 7 25
    428 7 25
    429 7 25
    430 7 25
    431 7 25
    432 7 25
    433 7 25
    434 7 25
    435 7 25
    436 7 25
    437 8 25
    438 8 25
    439 8 25
    440 7 25
    441 7 25
    442 7 25
    443 7 25
    444 7 25
    445 7 25
    446 7 25
    447 7 25
    448 7 25
    449 7 25
    450 7 25
    451 7 25
    452 7 25
    453 7 25
    454 7 25
    455 7 25
    456 7 25
    457 7 25
    458 7 25
    459 7 25
    460 7 25
    461 7 25
    462 7 25
    463 7 25
    464 7 25
    465 7 25
    466 7 25
    467 7 25
    468 7 25
    469 7 25
    470 7 25
    471 8 25
    472 8 25
    473 8 25
    474 8 25
    475 8 25
    476 8 25
    477 8 25
    478 8 25
    479 8 25
    480 7 25
    481 7 25
    482 7 25
    483 7 25
    484 7 25
    485 7 25
    486 7 25
    487 7 25
    488 7 25
    489 7 25
    490 7 25
    491 7 25
    492 7 25
    493 7 25
    494 7 25
    495 7 25
    496 7 25
    497 7 25
    498 7 25
    499 7 25
    500 7 25
    501 7 25
    502 7 25
    503 7 25
    504 7 25
    505 7 25
    506 7 25
    507 7 25
    508 7 25
    509 8 25
    510 8 25
    511 8 25
    512 8 25
    513 8 25
    514 8 25
    515 8 25
    516 8 25
    517 8 25
    518 8 25
    519 8 25
    520 11 20
    521 11 20
    522 12 20
    523 12 20
    524 11 20
    525 11 20
    526 12 20
    527 11 20
    528 11 20
    529 11 20
    530 11 20
    531 11 20
    532 12 20
    533 11 20
    534 11 20
    535 12 20
    536 10 20
    537 10 20
    538 11 20
    539 10 20
    540 10 20
    541 11 20
    542 11 20
    543 11 20
    544 11 20
    545 12 20
    546 12 20
    547 12 20
    548 12 20
    549 13 20
    550 12 20
    551 12 20
    552 13 20
    553 13 20
    554 12 20
    555 13 20
    556 12 20
    557 12 20
    558 12 20
    559 12 20
    560 13 20
    561 13 20
    562 12 20
    563 13 20
    564 12 20
    565 14 20
    566 12 20
    567 13 20
    568 12 20
    569 12 20
    570 11 20
    571 11 20
    572 11 20
    573 11 20
    574 11 20
    575 11 20
    576 11 20
    577 11 20
    578 11 20
    579 11 20
    580 11 20
    581 11 20
    582 12 20
    583 11 20
    584 12 20
    585 11 20
    586 11 20
    587 11 20
    588 12 20
    589 13 20
    590 12 20
    591 12 20
    592 12 20
    593 12 20
    594 12 20
    595 12 20
    596 13 20
    597 12 20
    598 13 20
    599 13 20
    600 13 20
    601 12 20
    602 12 20
    603 13 20
    604 12 20
    605 11 20
    606 11 20
    607 11 20
    608 10 20
    609 11 20
    610 11 20
    611 11 20
    612 12 20
    613 12 20
    614 13 20
    615 12 20
    616 13 20
    617 12 20
    618 12 20
    619 13 20
    620 12 20
    621 25 15
    622 24 15
    623 24 15
    624 23 15
    625 23 15
    626 23 15
    627 23 15
    628 24 15
    629 24 15
    630 23 15
    631 25 15
    632 25 15
    633 25 15
    634 25 15
    635 23 15
    636 25 15
    637 25 15
    638 23 15
    639 25 15
    640 25 15
    641 25 15
    642 26 15
    643 25 15
    644 26 15
    645 24 15
    646 26 15
    647 25 15
    648 25 15
    649 26 15
    650 26 15
    651 26 15
    652 24 15
    653 24 15
    654 23 15
    655 22 15
    656 23 15
    657 22 15
    658 22 15
    659 24 15
    660 24 15
    661 24 15
    662 23 15
    663 23 15
    664 25 15
    665 22 15
    666 24 15
    667 24 15
    668 25 15
    669 24 15
    670 24 15
    671 29 15
    672 25 15
    673 25 15
    674 24 15
    675 26 15
    676 26 15
    677 25 15
    678 26 15
    679 25 15
    680 24 15
    681 26 15
    682 25 15
    683 25 15
    684 26 15
    685 25 15
    686 26 15
    687 26 15
    688 26 15
    689 26 15
    690 26 15
    691 26 15
    692 26 15
    693 26 15
    694 26 15
    695 26 15
    696 27 15
    697 27 15
    698 27 15
    699 27 15
    700 27 15
    701 27 15
    702 27 15
    703 25 15
    704 25 15
    705 26 15
    706 26 15
    707 26 15
    708 26 15
    709 26 15
    710 26 15
    711 26 15
    712 26 15
    713 26 15
    714 26 15
    715 26 15
    716 26 15
    717 26 15
    718 27 15
    719 27 15
    720 27 15
    721 25 15
    722 25 15
    723 25 15
    724 25 15
    725 25 15
    726 25 15
    727 25 15
    728 25 15
    729 25 15
    730 25 15
    731 26 15
    732 26 15
    733 26 15
    734 26 15
    735 26 15
    736 26 15
    737 26 15
    738 27 15
    739 27 15
    740 26 15
    741 26 15
    742 26 15
    743 26 15
    744 26 15
    745 26 15
    746 26 15
    747 26 15
    748 26 15
    749 26 15
    750 26 15
    751 26 15
    752 26 15
    753 26 15
    754 26 15
    755 27 15
    756 27 15
    757 27 15
    758 27 15
    759 26 15
    760 26 15
    761 26 15
    762 26 15
    763 26 15
    764 26 15
    765 26 15
    766 26 15
    767 26 15
    768 26 15
    769 26 15
    770 26 15
    771 26 15
    772 26 15
    773 27 15
    774 27 15
    775 27 15
    776 27 15
    777 27 15
    ;

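    /* Descriptive statistics and normality tests for each temperature */
    proc univariate data=dev1 NORMALTEST;
    class temp;
    var duration;
    run;
    quit;

    /* One-way ANOVA with a homogeneity-of-variance test and Welch's ANOVA */
    proc glm data=dev1;
    class temp;
    model duration=temp;
    means temp / hovtest welch;
    run;
    quit;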

  2. #2
    TS Contributor
    Points: 10,123, Level: 67
    Level completed: 19%, Points required for next Level: 327
    rogojel's Avatar
    Location
    I work in Europe, live in Hungary
    Posts
    1,318
    Thanks
    148
    Thanked 304 Times in 285 Posts

    Re: Alternatives for Non-normality and Inequality of Variance?

    hi,
    it is nice that you showed us the data! The first, obvious, question is about the goal of the analysis: do you just want to prove the relationship between temp and duration or do you want to build a predictive model?

    Second: there seem to be two issues with your data. If this is in time order, then you have a strong grouping (high temperatures only in a short period), and you cannot exclude confounding factors, such as something other than temperature also being different at the time of the measurement. Also, you have a large gap in the temperatures between about 15 and 22. Obviously, for predictions this will be problematic.

    If you only want a generic proof that higher temperatures are linked to shorter durations, you could, for instance, group the temperatures into 3 classes (High, Med, Low) and run an ANOVA or some non-parametric variant (like Kruskal-Wallis). You have enough data that the lower power of the non-parametric test will not matter, and the effect is quite clear.

    If you want predictions you should take care of that gap first IMO.

    regards

  3. #3

    Re: Alternatives for Non-normality and Inequality of Variance?

    I was fitting a nonlinear model (Lactin/Briere) to describe the relationship between temperature and developmental rate, hence the clustering of high temperatures to capture the peak of the curve. With the analysis I posted, I simply want to show a difference in development time at each temperature through some sort of multiple comparison test (Dunn's/Games-Howell), but I can't run any ANOVA/KW/Welch test because of the data assumptions.

  4. #4
    rogojel

    Re: Alternatives for Non-normality and Inequality of Variance?

    Which assumptions are invalidated for a Kruskal-Wallis or a Mann-Whitney U test?

  5. #5

    Re: Alternatives for Non-normality and Inequality of Variance?

    Quote Originally Posted by rogojel View Post
    Which assumptions are invalidated for a Kruskal-Wallis or a Mann-Whitney U test?
    For KW, the variances are very unequal, and no transformation comes close to equalizing them. The Mann-Whitney test likewise assumes homogeneity of variances.

  6. #6
    rogojel

    Re: Alternatives for Non-normality and Inequality of Variance?

    Well, strictly speaking you are right. How about a simple permutation test?
    regards
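
A minimal sketch of the permutation test suggested above, written in plain Python rather than SAS so it is easy to follow. It compares two temperature groups on the difference in mean development time. The function name, the number of permutations, and the seed are illustrative choices, and the data are only the first ten posted rows of the 20- and 15-degree groups. Note that when variances differ, a permutation test of this kind tests whether the two distributions are the same, not strictly whether the means are equal.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means between two groups."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_perm + 1)


# First ten posted rows of the 20- and 15-degree groups (days).
days_at_20 = [11, 11, 12, 12, 11, 11, 12, 11, 11, 11]
days_at_15 = [25, 24, 24, 23, 23, 23, 23, 24, 24, 23]

p_value = permutation_test(days_at_20, days_at_15)
print(f"permutation p-value: {p_value:.4f}")
```

For all five temperatures, the same idea could be applied pairwise with a multiplicity correction, or to an overall statistic such as the between-group sum of squares.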

  7. #7
    Fortran must die
    Points: 43,368, Level: 100
    Level completed: 0%, Points required for next Level: 0
    noetsi's Avatar
    Posts
    6,368
    Thanks
    661
    Thanked 904 Times in 863 Posts

    Re: Alternatives for Non-normality and Inequality of Variance?

    Non-normality is not a major issue when you have at least 30 data points, because of the central limit theorem (some say 40, others higher). You can transform the data (Box-Cox transformations are sometimes useful) to make it normal if you like, or run a non-parametric test. If you mean heteroscedasticity, you can do transformations, you can do WLS (if you know the source of the problem), or you can use a robust SE (I think White is recommended).

    Neither of these affects the point estimate, only the statistical test. I do not think you can split the data into three levels of the dependent variable and run ANOVA, which requires a linear DV. You could use ordinal or multinomial logistic regression for that.
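
For reference, the Welch ANOVA requested by the `welch` option in the SAS code of the first post is one test that does not assume equal variances. Below is a rough sketch of its F statistic and degrees of freedom in plain Python. The function name is made up for illustration, the data are only the first ten posted rows of three groups, and no p-value is computed because the standard library has no F distribution.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA.

    Each group needs at least two values and a non-zero sample variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight each group by n / s^2, so noisy groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / total_w
    between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# First ten posted rows of the 30-, 20- and 15-degree groups (days).
days_at_30 = [8, 8, 8, 8, 7, 8, 8, 8, 8, 8]
days_at_20 = [11, 11, 12, 12, 11, 11, 12, 11, 11, 11]
days_at_15 = [25, 24, 24, 23, 23, 23, 23, 24, 24, 23]

f_stat, df1, df2 = welch_anova([days_at_30, days_at_20, days_at_15])
print(f"Welch F = {f_stat:.2f} on {df1} and {df2:.1f} df")
```

With two groups this reduces to the square of Welch's t statistic, which is a handy sanity check.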
    "The difference between genius and stupidity is that genius has its limits."

  8. #8
    TS Contributor
    Points: 13,337, Level: 75
    Level completed: 22%, Points required for next Level: 313
    Miner's Avatar
    Location
    Greater Milwaukee area
    Posts
    1,126
    Thanks
    33
    Thanked 393 Times in 352 Posts

    Re: Alternatives for Non-normality and Inequality of Variance?

    You can also transform your response variable (i.e., duration). Using Minitab, I applied a Box-Cox transform to the response, then analyzed the results using a one-way ANOVA followed by a Tukey post-hoc test. The transform corrected the heteroskedasticity issue in the residuals. You can also repeat the analysis using regression on the transformed response.

  9. The Following User Says Thank You to Miner For This Useful Post:

    noetsi (03-20-2017)

  10. #9
    noetsi

    Re: Alternatives for Non-normality and Inequality of Variance?

    How did you decide what the correct Box-Cox transformation was, Miner? This is the element of Box-Cox that always confuses me.
    "The difference between genius and stupidity is that genius has its limits."

  11. #10
    Miner

    Re: Alternatives for Non-normality and Inequality of Variance?

    Minitab allows you to set lambda at 0 (natural log), 0.5 (square root), any value between -5 and 5, or allow Minitab to find an optimal value.

    I started with Tukey's "Ladder of Powers" and Tukey and Mosteller's "Bulge Rules", focusing on transforming the response in order to correct for heteroskedasticity, but could not find a standard power transform that worked. Then I tried the Box-Cox and allowed Minitab to search for an optimal lambda, which worked.
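
To make the idea of searching for an optimal lambda concrete, here is a small sketch in plain Python of the textbook approach: evaluate the Box-Cox profile log-likelihood of a one-way (group means) model over a grid of lambda values and keep the best one. This is an assumption about how such a search can be done, not necessarily the criterion Minitab uses. The function names, the grid step, and the three ten-row excerpts of the posted data are illustrative only.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y (natural log when lam is 0)."""
    return math.log(y) if abs(lam) < 1e-12 else (y ** lam - 1) / lam


def profile_loglik(groups, lam):
    """Profile log-likelihood of lam for a model with one mean per group."""
    n = sum(len(g) for g in groups)
    rss = 0.0
    for g in groups:
        t = [box_cox(y, lam) for y in g]
        m = sum(t) / len(t)
        rss += sum((v - m) ** 2 for v in t)
    # Jacobian term makes likelihoods comparable across lambda values.
    log_jacobian = (lam - 1) * sum(math.log(y) for g in groups for y in g)
    return -0.5 * n * math.log(rss / n) + log_jacobian


def best_lambda(groups, lo=-5.0, hi=5.0, step=0.1):
    """Grid search over [lo, hi] for the lambda with the highest likelihood."""
    steps = round((hi - lo) / step)
    grid = [round(lo + i * step, 10) for i in range(steps + 1)]
    return max(grid, key=lambda lam: profile_loglik(groups, lam))


# First ten posted rows of the 30-, 20- and 15-degree groups (days).
groups = [
    [8, 8, 8, 8, 7, 8, 8, 8, 8, 8],
    [11, 11, 12, 12, 11, 11, 12, 11, 11, 11],
    [25, 24, 24, 23, 23, 23, 23, 24, 24, 23],
]
lam_hat = best_lambda(groups)
print(f"grid-search lambda: {lam_hat}")
```

On the full data set the chosen lambda may well differ from what this excerpt gives, and different software may optimize a different criterion.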

  12. The Following User Says Thank You to Miner For This Useful Post:

    noetsi (03-20-2017)

  13. #11
    noetsi

    Re: Alternatives for Non-normality and Inequality of Variance?

    Ok, letting it find the optimal value was what I wanted to know. I wonder which algorithm it uses to do that.
    "The difference between genius and stupidity is that genius has its limits."

  14. #12
    Miner

    Re: Alternatives for Non-normality and Inequality of Variance?

    I attached the information in Minitab Help.
    Attached thumbnails: BoxCox1.jpg, BoxCox2.jpg

  15. The Following User Says Thank You to Miner For This Useful Post:

    noetsi (03-20-2017)

  16. #13
    Miner

    Re: Alternatives for Non-normality and Inequality of Variance?

    Here is the Box-Cox optimal transform for the Duration response.
    Attached Images  

  17. The Following User Says Thank You to Miner For This Useful Post:

    nebulus (03-21-2017)

  18. #14

    Re: Alternatives for Non-normality and Inequality of Variance?

    Quote Originally Posted by Miner View Post
    Here is the Box-Cox optimal transform for the Duration response.
    Interesting. I had tried a Box-Cox transformation in SAS before posting my question here, and it suggested lambda = -1 (i.e., the reciprocal transformation), which essentially means using developmental rate (1/d) rather than time. I tried that transformation and it did not help with the heteroskedasticity. Since the transformation you found is not a standard power transformation, I will use Minitab and see if I get the same results you reached. Thank you very much, this will definitely help me in the future.

    Edit: What p-value did you get for the variance test you used? After transforming the data and using Levene's test, I got F = 2.75 and p = 0.02. This is much better than any transformation I had tried before, but it is still significant. I tried other values in the proposed range, and X^0.33 seemed to be best, with a p-value of 0.0327.
    Last edited by nebulus; 03-21-2017 at 10:39 PM.
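
For anyone wanting to see what goes into the Levene's test F value quoted above, here is a rough sketch of the statistic in plain Python. It runs a one-way ANOVA on each observation's deviation from its own group mean. Packages differ on whether they use absolute or squared deviations (and on the centre used), so the `squared` switch and the function name are illustrative, and the result need not match SAS or Minitab output exactly. No p-value is computed, since that needs an F distribution.

```python
from statistics import mean


def levene_statistic(groups, squared=False):
    """Levene's F statistic: one-way ANOVA on deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = []
    for g in groups:
        centre = mean(g)
        z.append([(y - centre) ** 2 if squared else abs(y - centre) for y in g])
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    # 'within' is zero only if deviations are constant inside every group.
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# First ten posted rows of the 30-, 20- and 15-degree groups (days).
days_at_30 = [8, 8, 8, 8, 7, 8, 8, 8, 8, 8]
days_at_20 = [11, 11, 12, 12, 11, 11, 12, 11, 11, 11]
days_at_15 = [25, 24, 24, 23, 23, 23, 23, 24, 24, 23]

w = levene_statistic([days_at_30, days_at_20, days_at_15])
print(f"Levene F (absolute deviations): {w:.3f}")
```

The statistic is referred to an F distribution with k - 1 and N - k degrees of freedom, where k is the number of groups and N the total sample size.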

  19. #15
    Points: 864, Level: 15
    Level completed: 64%, Points required for next Level: 36

    Posts
    141
    Thanks
    13
    Thanked 41 Times in 37 Posts

    Re: Alternatives for Non-normality and Inequality of Variance?


    Quote Originally Posted by noetsi View Post
    Non-normality is not a major issue when you have at least 30 data points because of the central limit theorem (some say 40 others higher).
    Just wanted to add that in cases of more than two groups in an ANOVA, for example, the CLT can't apply at any sample size, so you will always need the two assumptions of normally distributed DV among the groups and a common variance for the groups. Good points, though!
