# Logistic Regression

#### MrAnon9

##### New Member
Say that I have data on flour beetles, with different doses and the response being the probability of the beetle dying on day k, and the beetles are split between male and female.

How can I investigate whether a logistic regression model is an appropriate way to describe the endpoint mortality data (i.e. death by the last day) and whether there are any differences between males and females?

Here is the model:

P(death by day 13 on dose [MATH] d_{i}) = (1+\exp[-(\alpha_{1s} + \alpha_{2s}\log d_{i})])^{-1}[/MATH], where s denotes sex

#### MrAnon9

##### New Member
Hi again, can anybody help me out there? I don't want the answer, I just really need to know what it is I have to do..

e.g. If I draw a plot, it would be dose against what?

#### jrai

##### New Member
It'd be good if you can clearly state your question & also your data structure.

#### MrAnon9

##### New Member
Ok my apologies.

I have data that goes from day 1-13 for death of flour beetles under 4 different doses. Each observation is categorized into male and female, so I have 8 observations per day.

I am being asked to decide whether the logistic function above is a suitable way to describe the endpoint mortality data (i.e. probability of death by day 13 with dose i), and, as a second part, whether there is a difference between the responses of the two sexes.

#### jrai

##### New Member
This is a survival analysis case, and when time is discrete, as in your case, logistic regression is an appropriate choice. To quote professor Paul Allison, "The basic idea is simple. Each individual’s survival history is broken down into a set of discrete time units that are treated as distinct observations. After pooling these observations, the next step is to estimate a binary regression model predicting whether an event did or did not occur in each time unit. Covariates are allowed to vary over time from one time unit to another.

By specifying a logit link, you get estimates of the discrete-time proportional odds model proposed by Cox."

Here is the model:

P(death by day 13 on dose [MATH] d_{i}) = (1+\exp[-(\alpha_{1s} + \alpha_{2s}\log d_{i})])^{-1}[/MATH], where s denotes sex
Is the given model constructed by you or do you have to analyze this? I don't see time as one of the covariates which should be included in the model. This model gives you the difference in probabilities for different dose & sex combinations but the equation doesn't account for difference in probabilities based on the days.

#### MrAnon9

##### New Member
The model is given to me and I am asked to assess whether it is appropriate or not, but what kind of things can I do to assess this?

#### jrai

##### New Member
Show some effort as to how you would go ahead. What do you think a possible analysis could be?

#### MrAnon9

##### New Member
I could maybe fit the model in R and look at the AIC and Deviance analysis? And residual plot?

I could plot dose against ri/ni where ri is number of beetles killed on day (or dose? I don't know) i and ni is number tested on day i and see if it's decreasing? I know that as dose increases, mortality should decrease to satisfy monotonicity.

I'm not sure where the logistic link function log(pi/(1-pi)) comes into play here, where pi = probability of mortality, but I know it's equal to alpha + beta*logD or *D? where D is the dose.

I could plot dose against pi where pi goes from 0 - 1, but not sure here again what I am looking for.

I'm really not sure to be honest. Just throwing things out there.

#### jrai

##### New Member
Estimating the model (or at least analyzing the outline of the output) will be helpful. You can always check after estimating the model whether you're able to answer your question or not.

From the design it is clear that the model does not include time, so the probabilities can't vary by time. The ideal equation to answer all questions is:

P(death by day 13 on dose [MATH] d_{i}) = (1+\exp[-(\alpha_{1} + \alpha_{2}\log d_{i} + \alpha_{3}S + \alpha_{4}S\log d_{i} + \alpha_{5}t)])^{-1}[/MATH]

Deviance is a goodness-of-fit statistic: it only tells you how good the fit is, not whether your model answers your question.
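For readers who want to see what the deviance measures, here is a minimal sketch for grouped binomial data. The counts are made up for illustration, and `binomial_deviance` is a hypothetical helper, not part of any library. In practice the fitted probabilities would come from the estimated logistic model; here two extreme cases are used to show how the statistic behaves.

```python
import math

def binomial_deviance(killed, total, fitted_p):
    """Residual deviance of grouped binomial counts against fitted probabilities."""
    dev = 0.0
    for r, n, p in zip(killed, total, fitted_p):
        # Each term uses the convention 0 * log(0) = 0.
        if r > 0:
            dev += r * math.log(r / (n * p))
        if r < n:
            dev += (n - r) * math.log((n - r) / (n * (1 - p)))
    return 2.0 * dev

# Hypothetical counts for four dose groups of one sex (not the thread's data).
killed = [3, 11, 20, 27]
total = [30, 30, 30, 30]

# Fitted probabilities equal to the observed proportions give a deviance of zero.
saturated = [r / n for r, n in zip(killed, total)]
dev_saturated = binomial_deviance(killed, total, saturated)

# A single constant probability ignores dose and fits much worse.
pooled = sum(killed) / sum(total)
dev_constant = binomial_deviance(killed, total, [pooled] * len(killed))

print(dev_saturated, dev_constant)
```

With reasonably large groups, the residual deviance of a fitted model is usually compared with a chi-squared distribution whose degrees of freedom are the number of groups minus the number of estimated parameters.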

I'm not sure where the logistic link function log(pi/(1-pi)) comes into play here, where pi = probability of mortality, but I know it's equal to alpha + beta*logD or *D? where D is the dose.
The logistic link function, when rearranged to leave just pi on the left-hand side, is the same as your given equation. So the logistic model is no doubt appropriate for finding the probability of death by day 13 on dose di, and also for finding the difference in probabilities between males & females.
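The rearrangement can be written out step by step. The symbols [MATH]\eta_{is}[/MATH] (the linear predictor) and [MATH]p_{is}[/MATH] (the probability of death by day 13 for sex s on dose [MATH]d_{i}[/MATH]) are shorthand introduced here for the derivation only:

```latex
\begin{align*}
\eta_{is} &= \alpha_{1s} + \alpha_{2s}\log d_{i} \\
\log\frac{p_{is}}{1-p_{is}} &= \eta_{is} \\
\frac{p_{is}}{1-p_{is}} &= e^{\eta_{is}} \\
p_{is} &= \frac{e^{\eta_{is}}}{1+e^{\eta_{is}}}
        = \left(1+\exp[-\eta_{is}]\right)^{-1}
\end{align*}
```

The last line is the given model, so on the logit scale the model is a straight line in log dose with a sex-specific intercept and slope.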

The difference between male & female can be found by:
Marginal effect = Prob(Y = 1 | S = 1, x̄) - Prob(Y = 1 | S = 0, x̄), where x̄ denotes the means of all the other variables in the model.
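As a rough sketch of that calculation, the snippet below evaluates the model with the sex-by-dose interaction and time term at S = 1 and S = 0 and takes the difference. The coefficient values and the mean log dose are invented for illustration, and `death_probability` is a hypothetical helper. Holding the day at 13 rather than at its mean is a choice made here to match the original question.

```python
import math

def death_probability(log_dose, sex, day, coef):
    """Logistic model with dose, sex, sex-by-dose interaction and day."""
    a1, a2, a3, a4, a5 = coef
    eta = a1 + a2 * log_dose + a3 * sex + a4 * sex * log_dose + a5 * day
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients (alpha_1 .. alpha_5); real values come from the fit.
coef = (-3.0, 1.5, 0.5, 0.2, 0.1)
mean_log_dose = -1.0  # hypothetical mean of log dose
day = 13              # endpoint of the experiment

p_s1 = death_probability(mean_log_dose, 1, day, coef)
p_s0 = death_probability(mean_log_dose, 0, day, coef)
marginal_effect = p_s1 - p_s0

print(p_s1, p_s0, marginal_effect)
```

Because of the interaction term, the difference depends on the dose at which it is evaluated, so it can be worth repeating the calculation at each of the four doses.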

#### MrAnon9

##### New Member
I think I'm just not really understanding the question here. I am modelling the proportion so I treat that as my response? (killed/total)? and the dose as my predictor variable? Or am I meant to model the logistic function against the dose?

It's the interpretation I am having problems with. How could I fit the model in R (I'm assuming that the only way to assess the suitability of the model is to fit it in R and analyse AIC and Residual plot)? Or can it be assessed without using a computer package?

I don't really know if I am supposed to plot dose against killed/total, dose against the logistic function, or even killed/survived :/

Please if you could answer these questions, I would appreciate that a lot.

#### jrai

##### New Member
That's the reason I was asking you to specify the data structure. As I quoted in professor's statement, ideally you should model a binary response variable as your DV. This variable would equal 1 if beetle dies & 0 if it survives.

For each beetle & for each day you'll have 1 observation. Say on day 1 you have 5 beetles & the fifth beetle dies on day 1 itself. You'll have 5 observations for day 1, with DV=0 for the first four & DV=1 for the last one. For day 2 you'll have just 4 observations. Now estimating the equation given in my last response would give you the probability of death for the beetle. Once you've estimated the model, you can put in time=13, the value for dose (1/2/3/4) & the sex, & estimate the probability. You'll get a different probability for males & for females.
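The data layout described above can be sketched in a few lines. The function name `expand_person_period` and the toy data are hypothetical. Each beetle contributes one row per day it is still alive at the start of that day, with the binary DV equal to 1 only on the day it dies.

```python
def expand_person_period(beetle_id, death_day, last_day=13):
    """Return one (beetle_id, day, died) row per day the beetle was at risk.

    death_day is None for a beetle that survives the whole experiment.
    """
    final_day = last_day if death_day is None else min(death_day, last_day)
    rows = []
    for day in range(1, final_day + 1):
        died = 1 if death_day is not None and day == death_day else 0
        rows.append((beetle_id, day, died))
    return rows

# Toy example from the post: five beetles, the fifth dies on day 1.
# The other four are assumed (for illustration) to survive to day 13.
death_days = {1: None, 2: None, 3: None, 4: None, 5: 1}

rows = []
for beetle_id, death_day in death_days.items():
    rows.extend(expand_person_period(beetle_id, death_day))

day1 = [r for r in rows if r[1] == 1]
day2 = [r for r in rows if r[1] == 2]
print(len(day1), sum(r[2] for r in day1), len(day2))
```

In a real analysis each row would also carry the beetle's dose and sex, so that the binary regression can use them as covariates alongside the day.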

The issue to take care of is dependence between the different observations. I won't burden you with that & you should look into this once you're comfortable with your data structure & model.

From the initial question, it looks like an analytical response is needed rather than actually estimating the model. It is about thinking through the problem & seeing what will give you the answer at the end, rather than finding the answer itself. Do you agree?

#### MrAnon9

##### New Member
So that would be as simple as plotting the dose against the ratio for males and then females and seeing if it's an increasing function in a similar way to the logistic function, which would show that the model is suitable?

#### noetsi

##### Fortran must die
I would guess that within-subject ANOVA (or two-way ANOVA with a within- and a between-subject effect) or maybe Cox proportional hazards might be more suited to this.

#### Dason

ANOVA for binary response?

#### MrAnon9

##### New Member
Whoa, that totally confused me now lol. I don't think that's what I need to do; in fact I don't even need to do any actual model fitting, I just need to describe whether it's suitable. So I'm guessing this can only be done by plotting dose against the killed/total ratio and seeing if it's an increasing monotonic function like the logistic function? I'm not sure if log odds come into it, and if they do, then how.

#### Dason

Sorry for any confusion I brought - I realize now it might have sounded like a suggestion. I was questioning noetsi because I don't think it's a good idea.

Since you have multiple observations at each point you could try calculating the logit of the estimated probabilities and then plotting those against dose. If logistic regression is a good fit then these should be approximately linear.
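A minimal sketch of that check, using made-up counts for one sex (the doses, `killed` and `total` values are hypothetical, as are the helper names). It computes the empirical logits, fits a least-squares line against log dose, and reports the residuals; small residuals with no pattern would be consistent with the logistic model.

```python
import math

def empirical_logit(killed, total):
    """Logit of the observed proportion; undefined if killed is 0 or total."""
    p = killed / total
    return math.log(p / (1 - p))

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical endpoint counts for one sex at four doses.
doses = [0.2, 0.32, 0.5, 0.8]
killed = [3, 11, 20, 27]
total = [30, 30, 30, 30]

log_dose = [math.log(d) for d in doses]
logits = [empirical_logit(k, n) for k, n in zip(killed, total)]

intercept, slope = fit_line(log_dose, logits)
residuals = [y - (intercept + slope * x) for x, y in zip(log_dose, logits)]

print(intercept, slope, residuals)
```

Repeating this separately for males and females and comparing the two intercepts and slopes gives a first informal look at whether the sexes respond differently.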

#### noetsi

##### Fortran must die
ANOVA for binary response?
A good point. You would have to use one of the non-parametric equivalents. I was thinking of how to address the problem and not the form of the data.

#### jrai

##### New Member
Since you have multiple observations at each point you could try calculating the logit of the estimated probabilities and then plotting those against dose. If logistic regression is a good fit then these should be approximately linear.
This sounds like a good solution.

#### Dason

If there aren't that many observations though, it might make sense to smooth the estimated probabilities using some sort of Bayesian estimator. Something like (x+1)/(n+2) or (x+.5)/(n+1) would probably work well enough.
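Both suggested estimators have the form (x + a)/(n + 2a), with a = 1 and a = 0.5 respectively. The sketch below (function names are hypothetical) shows why the smoothing helps: the raw proportion gives an infinite logit when nobody or everybody dies, while the smoothed one stays finite.

```python
import math

def smoothed_proportion(x, n, a=0.5):
    """Shrunken estimate (x + a) / (n + 2a); a=1 gives (x+1)/(n+2)."""
    return (x + a) / (n + 2 * a)

def logit(p):
    return math.log(p / (1 - p))

# Hypothetical groups of 10 beetles, including the two extreme outcomes.
n = 10
for x in (0, 5, 10):
    p_half = smoothed_proportion(x, n, a=0.5)  # (x + .5) / (n + 1)
    p_one = smoothed_proportion(x, n, a=1.0)   # (x + 1) / (n + 2)
    print(x, p_half, logit(p_half), p_one, logit(p_one))
```

The smoothed logits can then be plotted against log dose in the same way as the raw ones.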

#### MrAnon9

##### New Member
Hi again, thanks for the help. I have done two plots from the data..

Are they relevant plots for deciding if the logistic model is suitable and if so, what can I deduce from the two graphs?

I think for plot 2, it needs to be a straight line relationship? Is this correct and what can I conclude from the first plot?

Should I plot against dose or log dose, does it matter?