+ Reply to Thread
Results 1 to 10 of 10

Thread: [R]: Mixed model for non-normal y with many 0s and some high values in nested design

  1. #1
    Points: 708, Level: 14
    Level completed: 8%, Points required for next Level: 92

    Location
    France
    Posts
    7
    Thanks
    3
    Thanked 0 Times in 0 Posts

    [R]: Mixed model for non-normal y with many 0s and some high values in nested design




    Dear all,

    I am currently writing my master thesis on the effect of a certain insecticide on bumble bee colonies. In particular I am testing whether the insecticide affects virus or bacterium concentrations. In many colonies I haven’t found any viruses, but if they are present they can be in high concentrations.
    I wanted to use some kind of mixed model reflecting the nested design of the study from which I got the bumble bees.
    The study design is hierarchical. 16 fields were paired according to landscape characteristics. In each pair one field was randomly assigned to be treated with the insecticide, while the other is the control field. In each field there are 2 boxes and in each box are 2 bumble bee hives.
    This is how my data looks like:
    > summary(adults.col)
    labno pair field box hive.label hive.real mass.mean
    1 : 1 P01 : 8 VR02 : 4 A:32 1:32 1:30 Min. :0.03763
    10 : 1 P02 : 8 VR03 : 4 B:32 3:32 2: 2 1st Qu.:0.12208
    11 : 1 P03 : 8 VR04 : 4 3:32 Median :0.15135
    12 : 1 P04 : 8 VR05 : 4 Mean :0.15637
    13 : 1 P05 : 8 VR06 : 4 3rd Qu.:0.18863
    14 : 1 P10 : 8 VR07 : 4 Max. :0.27110
    (Other):58 (Other):16 (Other):40
    mass.sem itd.mean itd.sem SBPV.SQ.mean SBPV.SQ.se
    Min. :0.00418 Min. :4.650 Min. :0.06751 Min. : 1.475 Min. : 0.2050
    1st Qu.:0.01305 1st Qu.:5.066 1st Qu.:0.14619 1st Qu.: 5.317 1st Qu.: 0.6325
    Median :0.01578 Median :5.319 Median :0.17602 Median : 6.553 Median : 2.3139
    Mean :0.01686 Mean :5.337 Mean :0.18367 Mean : 28.125 Mean : 5.0554
    3rd Qu.:0.01914 3rd Qu.:5.596 3rd Qu.:0.20829 3rd Qu.: 22.837 3rd Qu.: 9.0162
    Max. :0.03553 Max. :6.306 Max. :0.34589 Max. :121.000 Max. :14.1775
    NA's :58 NA's :58
    SBPV.detected.SBPV.detected ABPV.SQ.mean ABPV.SQ.se
    Min. :0.00000 Min. : 11.8 Min. : 4.91
    1st Qu.:0.00000 1st Qu.: 19.5 1st Qu.: 10.81
    Median :0.00000 Median : 33.1 Median : 19.55
    Mean :0.09375 Mean : 78668.0 Mean : 44832.98
    3rd Qu.:0.00000 3rd Qu.: 242.4 3rd Qu.: 137.59
    Max. :1.00000 Max. :393033.3 Max. :223992.04
    NA's :59 NA's :59
    ABPV.detected.ABPV.detected SBV.SQ.mean SBV.SQ.se SBV.detected.SBV.detected
    Min. :0.000000 Min. : 3.885 Min. : 0.095 Min. :0.00000
    1st Qu.:0.000000 1st Qu.: 6.848 1st Qu.: 1.917 1st Qu.:0.00000
    Median :0.000000 Median : 8.710 Median : 5.082 Median :0.00000
    Mean :0.078125 Mean : 34.028 Mean : 23.240 Mean :0.15625
    3rd Qu.:0.000000 3rd Qu.: 31.654 3rd Qu.: 22.671 3rd Qu.:0.00000
    Max. :1.000000 Max. :150.550 Max. :119.450 Max. :1.00000
    NA's :54 NA's :54
    box.nested hive.nested
    Min. : 1.00 Min. : 1.00
    1st Qu.: 8.75 1st Qu.:16.75
    Median :16.50 Median :32.50
    Mean :16.50 Mean :32.50
    3rd Qu.:24.25 3rd Qu.:48.25
    Max. :32.00 Max. :64.00

    I have been looking for any model that can deal with the zeros and the high concentrations but not really found anything, although I expect that is a common problem. I came across a zero inflated model but I was recommended not to use it because it was for count data and the relatively high SQ values (response variable) of the viruses (ABPV, SBPV, SBV) would not go well with this. I was also told not to use rank tests because of the high number of ties (zeros).

    Does anyone have a suggestion what I could use instead?

    Thanks in advance.

    N.B. I still have to determine bacteria concentrations or SQ values. Afterwards I will be told which fields were treated with the insecticide. For purposes of figuring out how to do this kind of analysis in R I created a vector pseudotreatment, which will be replaced by a vector specifying the real treatment information later on. Pseudotreatment has 2 levels C for control N stands for the insecticide.

  2. #2
    Human
    Points: 12,676, Level: 73
    Level completed: 57%, Points required for next Level: 174
    Awards:
    Master Tagger
    GretaGarbo's Avatar
    Posts
    1,362
    Thanks
    455
    Thanked 462 Times in 402 Posts

    Re: [R]: Mixed model for non-normal y with many 0s and some high values in nested des

    Why don't you post the data here? Then someone might suggest how to analyse the data.

    You can save the data with the command “dput”.

    Spoiler:


    And zero inflated models is not just for discrete data. Maybe look at COZIGAM.

  3. The Following User Says Thank You to GretaGarbo For This Useful Post:

    master student (08-11-2015)

  4. #3
    Points: 708, Level: 14
    Level completed: 8%, Points required for next Level: 92

    Location
    France
    Posts
    7
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: [R]: Mixed model for non-normal y with many 0s and some high values in nested des

    Thanks GretaGarbo for the tip with posting the data and for the link to package about Unconstrained and Constrained Zero-Inflated
    Generalized Additive Models.

    Code: 
    structure(list(pair = structure(c(2L, 6L, 6L, 6L, 7L, 7L, 7L, 
    7L, 7L, 3L, 7L, 3L, 7L, 2L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 2L, 
    5L, 5L, 5L, 5L, 5L, 5L, 5L, 3L, 3L, 2L, 6L, 6L, 6L, 6L, 1L, 1L, 
    1L, 1L, 3L, 2L, 3L, 3L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 2L, 8L, 4L, 
    4L, 4L, 2L, 2L, 6L, 7L, 5L, 8L, 4L), .Label = c("P01", "P02", 
    "P03", "P04", "P05", "P10", "P11", "P12"), class = "factor"), 
        field = structure(c(1L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 10L, 
        5L, 10L, 5L, 1L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 1L, 8L, 
        9L, 8L, 9L, 8L, 9L, 9L, 10L, 10L, 1L, 11L, 11L, 11L, 11L, 
        12L, 12L, 12L, 12L, 13L, 2L, 13L, 13L, 13L, 14L, 14L, 14L, 
        14L, 15L, 15L, 2L, 15L, 16L, 16L, 16L, 2L, 2L, 3L, 5L, 8L, 
        15L, 16L), .Label = c("VR02", "VR03", "VR04", "VR05", "VR06", 
        "VR07", "VR09", "VR12", "VR13", "VR14", "VR16", "VR17", "VR18", 
        "VR20", "VR21", "VR23"), class = "factor"), box = structure(c(1L, 
        2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 
        1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 
        1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 
        1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
        2L, 1L, 1L), .Label = c("A", "B"), class = "factor"), hive.label = structure(c(1L, 
        1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 
        2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 
        2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 
        1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 
        2L, 2L, 1L), .Label = c("1", "3"), class = "factor"), hive.real = structure(c(1L, 
        1L, 3L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 
        3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 
        3L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 
        2L, 1L, 3L, 3L, 1L, 2L, 1L, 3L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 
        3L, 3L, 1L), .Label = c("1", "2", "3"), class = "factor"), 
        SBPV.SQ.mean = c(NA, NA, 27.97, 5.2, NA, NA, NA, NA, NA, 
        NA, NA, NA, 5.67, NA, NA, NA, NA, NA, 1.475, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 7.43666666666667, 
        NA, NA, NA, NA, NA, NA, NA, 121, NA, NA, NA, NA, NA, NA, 
        NA), SBPV.SQ.se = c(NA, NA, 11.0322663129567, 1.66, NA, NA, 
        NA, NA, NA, NA, NA, NA, 0.29, NA, NA, NA, NA, NA, 0.205, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.96784059170599, 
        NA, NA, NA, NA, NA, NA, NA, 14.1774468787578, NA, NA, NA, 
        NA, NA, NA, NA), SBPV.detected = structure(c(0, 0, 1, 1, 
        0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 
        0, 0, 0), .Dim = c(64L, 1L), .Dimnames = list(NULL, "SBPV.detected")), 
        ABPV.SQ.mean = c(NA, NA, NA, 19.49, NA, NA, NA, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, 393033.333333333, 242.433333333333, 
        33.1, 11.8466666666667, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
        NA, NA, NA), ABPV.SQ.se = c(NA, NA, NA, 10.81, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 223992.041029239, 
        137.58798800929, 19.5517262664963, 4.90749540102801, NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), ABPV.detected = structure(c(0, 
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0), .Dim = c(64L, 1L), .Dimnames = list(NULL, 
            "ABPV.detected")), SBV.SQ.mean = c(NA, NA, NA, NA, NA, 
        87.15, NA, NA, 3.885, NA, NA, 7.695, NA, NA, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, 9.495, NA, NA, NA, NA, NA, NA, NA, 
        33.005, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 150.55, 
        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 6.565, NA, 7.925, 
        NA, NA, NA, 6.41, NA, 27.6), SBV.SQ.se = c(NA, NA, NA, NA, 
        NA, 60.85, NA, NA, 1.915, NA, NA, 1.925, NA, NA, NA, NA, 
        NA, NA, NA, NA, NA, NA, NA, 1.505, NA, NA, NA, NA, NA, NA, 
        NA, 27.095, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
        119.45, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.0950000000000002, 
        NA, 5.275, NA, NA, NA, 4.89, NA, 9.4), SBV.detected = structure(c(0, 
        0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 
        0, 0, 0, 1, 0, 1), .Dim = c(64L, 1L), .Dimnames = list(NULL, 
            "SBV.detected")), box.nested = c(1, 1, 2, 2, 3, 3, 4, 
        4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 
        13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 20, 
        20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 27, 
        28, 28, 29, 29, 30, 30, 31, 31, 32, 32), hive.nested = 1:64), .Names = c("pair", 
    "field", "box", "hive.label", "hive.real", "SBPV.SQ.mean", "SBPV.SQ.se", 
    "SBPV.detected", "ABPV.SQ.mean", "ABPV.SQ.se", "ABPV.detected", 
    "SBV.SQ.mean", "SBV.SQ.se", "SBV.detected", "box.nested", "hive.nested"
    ), class = "data.frame", row.names = c(NA, -64L))

  5. #4
    Human
    Points: 12,676, Level: 73
    Level completed: 57%, Points required for next Level: 174
    Awards:
    Master Tagger
    GretaGarbo's Avatar
    Posts
    1,362
    Thanks
    455
    Thanked 462 Times in 402 Posts

    Re: [R]: Mixed model for non-normal y with many 0s and some high values in nested des

    What is the response variable in this case? What is meant by SBPV.SQ and
    ABPV.SQ?

    I guess that you measured three different viruses. Did you measure the amount of virus per bumble bee? That is, many bumble bees per hive? Or were the measurement taken per hive? Or possibly per box? (Two hives within one box?)

    Was the “SBPV.SQ.mean” some kind of mean and “SBPV.SQ.se” a standard deviation or standard error?

    There are lots of 'NA':s. Why? (I can't see any zeros.) If there were no viruses then the response should be zero, right? Or did the measurement/experiment fail very frequently?

    Did you or the supervisor randomize among the field pairs which one should be treated with insecticide?

  6. The Following User Says Thank You to GretaGarbo For This Useful Post:

    master student (08-12-2015)

  7. #5
    Points: 708, Level: 14
    Level completed: 8%, Points required for next Level: 92

    Location
    France
    Posts
    7
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: [R]: Mixed model for non-normal y with many 0s and some high values in nested des

    Hi again, the response variables are the SBPV.SQ.mean, SBV.SQ.mean and ABPV.SQ.mean values (or actually a multiple of those). SQ stands for starting quantity (the quantity of viruses per sample) and SBPV, SBV and ABPV are viruses. 10 dead bees of one hive/colony present one sample as they were analyzed at once. Each sample was analyzed twice. Only if both analyses were positive the sample is considered positive. Otherwise it should be attributed the value 0. (Sorry I forgot to replace NAs with zeros. The SQ.mean is the average of the 2 measurements and SQ.se is the standard error of it.

    Yes it was randomized among the field pairs which one is treated with the insecticide.

  8. #6
    Human
    Points: 12,676, Level: 73
    Level completed: 57%, Points required for next Level: 174
    Awards:
    Master Tagger
    GretaGarbo's Avatar
    Posts
    1,362
    Thanks
    455
    Thanked 462 Times in 402 Posts

    Re: [R]: Mixed model for non-normal y with many 0s and some high values in nested des

    Try to make it easy for someone to help you.

    I suggest that master_student clean up the data and insert the zeros where they are supposed to be. Take away unnecessary variables.

    Use dput again. Insert a command that tell us how you would have analysed it if there was no zero-inflation. That will make it clear what is the dependent variable and random effects etc.

    To read long texts is boring. (That is why you lose readers and helpers.) To read a statistical model is interesting. Write down your statistical model. Use this link. Edit your first post. Take away your boring part with “summary(adolts.col)” and replace it with the model.

    Quote Originally Posted by master student View Post
    10 dead bees of one hive/colony present one sample as they were analyzed at once. Each sample was analyzed twice.
    They are analysed once and twice. I didn't understand that.

    Quote Originally Posted by master student View Post
    Only if both analyses were positive the sample is considered positive. Otherwise it should be attributed the value 0.
    This seem like an interesting selection mechanism. Is that the way it is usually recorded in your area? (If the sample had been analysed a third time, would that have further biased the measurement towards zero?)

    Is there any generally accessible published paper on how they measure and statistically analyse your type of data?

  9. #7
    Points: 708, Level: 14
    Level completed: 8%, Points required for next Level: 92

    Location
    France
    Posts
    7
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: [R]: Mixed model for non-normal y with many 0s and some high values in nested des

    Hi again,

    Thanks for the tips. Sorry that I have not expressed myself well. I mashed ten dead bees and suspended them in water. The volume of this suspension is much larger than I need for one analysis. I have taken two aliquots and analyzed them. I can however not analyze each bee individually but the values are later on converted to pathogen numbers per bee for each colony (response variables).

    These are found in the columns ABPV, SBPV, SBV and N.bombi.

    Code: 
    structure(list(ptreatment = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("control", 
    "insecticide"), class = "factor"), pair = structure(c(1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 
    5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 
    7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L), .Label = c("P01", 
    "P02", "P03", "P04", "P05", "P10", "P11", "P12"), class = "factor"), 
        field = structure(c(6L, 6L, 6L, 6L, 12L, 12L, 12L, 12L, 1L, 
        1L, 1L, 1L, 2L, 2L, 2L, 2L, 10L, 10L, 10L, 10L, 13L, 13L, 
        13L, 13L, 7L, 7L, 7L, 7L, 16L, 16L, 16L, 16L, 8L, 9L, 8L, 
        9L, 8L, 9L, 9L, 8L, 3L, 3L, 3L, 11L, 11L, 11L, 11L, 3L, 4L, 
        4L, 4L, 4L, 5L, 5L, 5L, 5L, 14L, 14L, 14L, 14L, 15L, 15L, 
        15L, 15L), .Label = c("VR02", "VR03", "VR04", "VR05", "VR06", 
        "VR07", "VR09", "VR12", "VR13", "VR14", "VR16", "VR17", "VR18", 
        "VR20", "VR21", "VR23"), class = "factor"), house = c(11, 
        12, 11, 12, 23, 24, 23, 24, 1, 2, 1, 2, 3, 4, 3, 4, 19, 20, 
        20, 19, 25, 26, 25, 26, 13, 14, 13, 14, 32, 31, 32, 31, 15, 
        17, 15, 18, 16, 17, 18, 16, 6, 6, 5, 21, 22, 21, 22, 5, 7, 
        8, 7, 8, 9, 10, 9, 10, 27, 28, 27, 28, 29, 30, 30, 29), beehive = c(21L, 
        23L, 22L, 24L, 45L, 47L, 46L, 48L, 1L, 3L, 2L, 4L, 5L, 7L, 
        6L, 8L, 37L, 40L, 39L, 38L, 49L, 51L, 50L, 52L, 25L, 27L, 
        26L, 28L, 63L, 62L, 64L, 61L, 30L, 33L, 29L, 35L, 31L, 34L, 
        36L, 32L, 11L, 12L, 10L, 41L, 43L, 42L, 44L, 9L, 13L, 15L, 
        14L, 16L, 17L, 19L, 18L, 20L, 53L, 55L, 54L, 56L, 57L, 59L, 
        60L, 58L), ABPV = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2816.305, 55024666.6666667, 
        3697.10833333333, 4269.9, 1782.92333333333, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), SBPV = c(0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 124.6375, 0, 0, 0, 0, 0, 22082.5, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 4573.095, 751.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        1278.585, 0, 0, 0, 1037.415, 0, 0, 0, 0, 0), SBV = c(0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 780.6125, 0, 0, 765.6525, 
        0, 3878.0875, 0, 0, 20700.625, 0, 0, 0, 0, 0, 0, 1076.66, 
        0, 4291.8, 911.52, 0, 0, 0, 0, 0, 0, 1147.39, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 13333.95, 0, 0, 652.68, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0), N.bombi = c(0, 0, 0, 0, 0, 0, 0, 0, 327635, 
        0, 2851960, 1938475, 0, 0, 32628750, 0, 0, 0, 3264800, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
        0, 0, 0, 0, 0, 0)), .Names = c("ptreatment", "pair", "field", 
    "house", "beehive", "ABPV", "SBPV", "SBV", "N.bombi"), row.names = c(15L, 
    16L, 17L, 18L, 38L, 39L, 40L, 41L, 1L, 14L, 23L, 33L, 43L, 53L, 
    58L, 59L, 10L, 12L, 31L, 32L, 42L, 44L, 45L, 46L, 19L, 20L, 21L, 
    22L, 55L, 56L, 57L, 64L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 62L, 
    2L, 3L, 4L, 34L, 35L, 36L, 37L, 60L, 5L, 6L, 7L, 8L, 9L, 11L, 
    13L, 61L, 47L, 48L, 49L, 50L, 51L, 52L, 54L, 63L), class = "data.frame")
    The model I would use if I had not so many zeros would look like this:

    Code: 
    library(nlme)
    ABPV.lme <- lme(fixed=ABPV~ptreatment,random = ~ 1|field/house,data=pathogens)
    summary(ABPV.lme)
    
    N.bombi.lme <- lme(fixed=N.bombi~ptreatment,random = ~ 1|field/house,data=pathogens)
    summary(ABPV.lme)

    N.B.: I do not know whether it is common practice to take B samples from positively tested colonies. We do it to reduce the probability of a false positive finding. A third sample would introduce a bias, the direction of the bias depends on whether we would accept all samples that had been twice positively tested or if we would neglect all samples that have one negative test.

  10. #8
    Human
    Points: 12,676, Level: 73
    Level completed: 57%, Points required for next Level: 174
    Awards:
    Master Tagger
    GretaGarbo's Avatar
    Posts
    1,362
    Thanks
    455
    Thanked 462 Times in 402 Posts

    Re: [R]: Mixed model for non-normal y with many 0s and some high values in nested des

    That selection mechanism seems strange. In most cases when one have two measurements of the same quantity one take the mean value of the two values, not the minimum value. It is good to try to avoid false positives, but what about false negatives? I suggest to use the mean of the two measurements (zero or not).

    With only 5 positive out of 64 it might not be so precise estimates. You could try that with a logit model, recoded the positives as 1 (but I don't have that high hope).

    You have two houses per field but also two hives within each house. What happened to those data? Did you take 5 bees per hive?


    There is a book that might interest you: “Zero Inflated Models and Generalized Linear Mixed Models with R” by Zuur, Saveliev, Ieno. (2012)

  11. #9
    Points: 708, Level: 14
    Level completed: 8%, Points required for next Level: 92

    Location
    France
    Posts
    7
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: [R]: Mixed model for non-normal y with many 0s and some high values in nested des

    Thanks a lot GretaGarbo,

    In fact we planned to do both estimating the prevalence (proportion of colonies being infested) using a binomial model and the loads (or concentrations) using a linear model. Based on these data it might not make much sense to evaluate the loads and I guess I have to restrict myself to the prevalences.

    My supervisor made another suggestion. Estimating conditional probabilities (in addition to the prevalence) i.e. removing all the colonies without infection from the data set and estimating if the insecticide increases the pathogen load of infested colonies. What do you think of that?

    We take the mean of the two values but only if both are positive. My supervisor is much more concerned about false positives than false negatives. I need to ask him again why that is. If I have a binomial model how would I take the mean of one negative and one positive measurement?

    I took ten bees per colony. So each measurement represents one colony/hive. I haven't included the hive effect probably due to a thinking bias. I thought because each hive stands for one response value I should not include it, but I guess I have to include it to consider the variability due to the hive (locations) as well, right?

    Best

  12. #10
    Human
    Points: 12,676, Level: 73
    Level completed: 57%, Points required for next Level: 174
    Awards:
    Master Tagger
    GretaGarbo's Avatar
    Posts
    1,362
    Thanks
    455
    Thanked 462 Times in 402 Posts

    Re: [R]: Mixed model for non-normal y with many 0s and some high values in nested des


    I think that you will get strange selection effects when deleting zero values and taking the min(0, measured value). Those selection effects will not be beneficial to the experiment.

+ Reply to Thread

           




Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts






Advertise on Talk Stats