Newbie Question

#1
Hello,
Just wondering if someone could help me. I have a degree in Maths and did second-year Uni Maths stats (quite a while ago), so I understand basic stats.

A medical condition (X) can be on or off on any particular day.

A number of Events (Ei), each of which can also be on or off on a given day, may have an impact on whether X is on or off.

For example, taking a particular type of pill on the day before could be E1 and may mean that X is more or less likely. Or X being off for the previous 2 days may be E2 and may result in a higher probability of it being on the next day.
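As a rough sketch of how an event like E2 could be encoded, the snippet below derives a 0/1 flag for "X was off on both of the previous 2 days" from a daily on/off record, then computes how often X is on when the flag is set. The series `x_daily` is made-up illustration data and the function name `off_previous_two_days` is hypothetical, not something from the analysis described here.

```python
def off_previous_two_days(x):
    """Return one 0/1 flag per day from day index 2 onward:
    1 if X was off on both of the two preceding days, else 0."""
    return [1 if x[t - 1] == 0 and x[t - 2] == 0 else 0 for t in range(2, len(x))]

# Made-up daily record of X (1 = on, 0 = off), for illustration only.
x_daily = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1]

e2 = off_previous_two_days(x_daily)
outcome = x_daily[2:]  # X on the same days the flags refer to

# Share of days with X on, among the days where the event flag is set.
on_after_event = [y for flag, y in zip(e2, outcome) if flag == 1]
rate = sum(on_after_event) / len(on_after_event)
print(f"X on after two off days: {rate:.0%}")
```

The same pattern (one 0/1 column per event, aligned with the day whose X value it is meant to explain) extends to the other events, such as a pill taken the day before.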

I have already done an analysis of the impact of particular individual events on X using binomial distributions, much the same way that a coin is determined to be fair or not.

https://en.wikipedia.org/wiki/Binomial_test

Thus for example, if X is off for the previous 2 days, there is an 85% chance it is on the next day (on average X is on 58% of the time and off 42% of the time).

Using the binomial test, the effect on X of X being off for the prior 2 days is significant at the 99.39% level.
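For anyone following along, a one-sided binomial test of this kind can be sketched with the standard library alone. The 58% baseline is taken from the figures above, but the counts (20 qualifying days, X on for 17 of them) are hypothetical, since the actual counts are not given, so the p-value printed here is not expected to reproduce the 99.39% figure.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(K >= k) for K ~ Binomial(n, p): a one-sided binomial test p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts: 20 days that followed two off days, X on for 17 of them (85%).
n_days, n_on = 20, 17
baseline = 0.58  # overall share of days with X on, as stated in the post

p_value = binomial_upper_tail(n_on, n_days, baseline)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value says that seeing X on this often would be unlikely if the chance of X being on were still the 58% baseline on those days.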

What I would like to do is analyse all Events E at once rather than individually. Thus if X is on after being off the prior 2 days, then it is highly likely that this was caused by X being off the prior 2 days rather than by some other event (and the influence of those other events should be discounted in this case).

I realise that what I am asking probably requires a significant stats background, but I am just hoping someone can point me in the correct direction so that I can follow up the relevant statistical technique.

Thanks!

Pinky
 

hlsmith

Omega Contributor
#2
Could logistic regression be used here? Predict X status (binary) from your binary predictors. It also sounds like you may have some potential covariates that you want to control for as well.


How many E variants do you have (that is, how many versions, such as day categories), and what is your sample size?
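To make the suggestion concrete, here is a minimal sketch of logistic regression with two binary predictors, fitted by plain gradient ascent using only the standard library. Everything in it is an assumption for illustration: the counts are made up, E1 is constructed to matter and E2 to have no effect, and `fit_logistic_gd` is a hypothetical helper rather than a library function. In practice a statistics package would do the fitting and also report standard errors.

```python
import math

def fit_logistic_gd(rows, labels, lr=0.5, steps=5000):
    """Fit logistic regression by gradient ascent on the mean log-likelihood.

    Returns weights as [intercept, b1, b2, ...]."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, y in zip(rows, labels):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(X on)
            grad[0] += y - p
            for j, xj in enumerate(row):
                grad[j + 1] += (y - p) * xj
        w = [wj + lr * gj / len(rows) for wj, gj in zip(w, grad)]
    return w

# Made-up data: (E1, E2) -> (days with X on, days with X off).
counts = {(0, 0): (4, 6), (1, 0): (7, 3), (0, 1): (4, 6), (1, 1): (7, 3)}
rows, labels = [], []
for events, (n_on, n_off) in counts.items():
    rows += [list(events)] * (n_on + n_off)
    labels += [1] * n_on + [0] * n_off

intercept, b_e1, b_e2 = fit_logistic_gd(rows, labels)
odds_ratio_e1 = math.exp(b_e1)  # odds of X on with E1 on vs E1 off, E2 held fixed
print(f"E1 coefficient {b_e1:.3f}, E2 coefficient {b_e2:.3f}")
```

Because both predictors are in the model together, each coefficient describes that event's effect with the other held fixed, which is the "all events at once" view asked about. A coefficient near zero, as for E2 in this made-up data, suggests that event adds little once the others are accounted for.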
 
#3
Thanks Hlsmith,

I looked up logistic regression on Wikipedia:

https://en.wikipedia.org/wiki/Logistic_regression

And it looks like it might be the right model.

Here is their example:

"A group of 20 students spend between 0 and 6 hours studying for an exam. How does the number of hours spent studying affect the probability that the student will pass the exam?"

My example is more like:

"I take an exam every day. I study between 0 and 6 hours the day before. How does the number of hours spent studying affect the probability I will pass."

I guess if I record the results over 20 days, this is exactly the same as their example.

However my example also differs because I have multiple inputs. My example is more like:

"A group of 20 students spend between 0 and 6 hours studying for an exam, stay up until some time between 8 pm and 2 am the night before the exam, have a vegetarian, meat or fish dinner the night before, and have 0, 1 or 2 beers on the evening prior to the exam. How do the number of hours spent studying, the time they stay up until on the night before, the type of dinner on the day before and the number of beers they have affect the probability that the student will pass the exam?"
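One way to feed mixed inputs like these into a regression is to turn each record into a numeric vector, with the dinner type split into 0/1 columns (dummy coding). The sketch below is only an illustration under assumed conventions: the function name `encode_student`, the choice of "vegetarian" as the reference level, and measuring bedtime as hours after 8 pm are all hypothetical choices, not part of the Wikipedia example.

```python
DINNERS = ["vegetarian", "meat", "fish"]  # "vegetarian" is the assumed reference level

def encode_student(hours, bedtime_hours_after_8pm, dinner, beers):
    """Turn one student's record into a numeric feature vector."""
    if dinner not in DINNERS:
        raise ValueError(f"unknown dinner type: {dinner}")
    # One 0/1 column per non-reference dinner type (dummy coding).
    dinner_flags = [1.0 if dinner == level else 0.0 for level in DINNERS[1:]]
    return [float(hours), float(bedtime_hours_after_8pm)] + dinner_flags + [float(beers)]

# Columns: hours studied, bedtime (hours after 8 pm), meat, fish, beers.
features = encode_student(hours=3.5, bedtime_hours_after_8pm=2, dinner="fish", beers=1)
print(features)
```

With this coding, the coefficients on the meat and fish columns are read relative to a vegetarian dinner, so "dinner doesn't matter" corresponds to both of those coefficients being close to zero.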

I also want to know if some factors don't have any impact, e.g. maybe it doesn't matter at all what dinner they have the night before.

For my particular case I have about 220 binary data points and 15 potential factors that could impact the result. Using my binomial distribution method, 8 of these 15 significantly impact the binary result and some others are close to significance. The most significant ones are the results of the previous tests (e.g. in the exam analogy, if I pass 2 exams in a row, I am highly likely to fail the next one, and that effect is highly significant).

I believe that a more sophisticated test such as this one will be more likely than the binomial method to show up the factors that are significant.

Anyway, it seems I will need to hit the books in order to use this method, as I never got this far in my Maths Stats courses!
 

hlsmith

Omega Contributor
#4
What proportion pass the exam? I still don't get the repeat exam reference. So do people contribute more than one observation?
 
#5
Hello Hlsmith,

The exam reference is fully described in the link.

https://en.wikipedia.org/wiki/Logistic_regression

I have replicated the tabular information from that link below:


The table shows the number of hours each student spent studying, and whether they passed (1) or failed (0).

Hours  0.50  0.75  1.00  1.25  1.50  1.75  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  4.00  4.25  4.50  4.75  5.00  5.50
Pass      0     0     0     0     0     0     1     0     1     0     1     0     1     0     1     1     1     1     1     1
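As a sketch of what fitting this table looks like, the snippet below estimates a one-predictor logistic regression on exactly these 20 values using Newton's method and the standard library only. The function name `fit_logistic_newton` and the fixed iteration count are my own illustrative choices. A statistics package would normally be used instead.

```python
import math

hours = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
         2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]

def fit_logistic_newton(x, y, iterations=25):
    """Maximum-likelihood fit of P(pass) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w              # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic_newton(hours, passed)

def pass_probability(h):
    """Fitted probability of passing after h hours of study."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * h)))

print(f"intercept {b0:.4f}, hours coefficient {b1:.4f}")
print(f"P(pass | 2 hours) = {pass_probability(2):.2f}")
```

For comparison, the Wikipedia article reports an intercept of about -4.1 and an hours coefficient of about 1.5 for this data, so the fitted values here should land close to those.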

Pinky