The question is: how can I predict the dependent variable using my data?

Thanks

- Thread starter Mykola
- Start date
- Tags confidence interval corrections logit nonparametric odds ratio. probability

(That is the most natural choice. But it is not non-parametric analysis.)

If you mean the dependent variable (DV): there is no need for a dependent variable to be normally distributed. For parametric models, it is the distribution of the model's prediction errors (residuals) that matters, not the distribution of the DV itself.

And if your sample size is large enough (say, n > 30 or 40 or so), then even normally distributed residuals are not necessary for "parametric" analyses.
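
To illustrate the point about residuals versus the DV, here is a minimal sketch on simulated data (the numbers, the seed, and the helper names `skewness` and `fit_line` are all made up for this example, not taken from the thread). The predictor is strongly skewed, so the DV is skewed too, yet the residuals of the fitted line are close to symmetric because the errors were generated as normal.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: third central moment divided by the sd cubed."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000
# Assumed data-generating process: skewed predictor, normal errors.
x = [rng.expovariate(1.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

skew_y = skewness(y)
skew_res = skewness(residuals)
print(f"skewness of DV:        {skew_y:.2f}")
print(f"skewness of residuals: {skew_res:.2f}")
```

The DV inherits its skewness from the predictor, while the residuals do not, which is why checking the DV's histogram says little about whether the model's error assumption is reasonable.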

With kind regards

Karabiner

The independent variables are not normally distributed. But I can transform them into quintiles.

In logistic regression the dependent variable is assumed to be binomially distributed. There is no assumption of a normal distribution and no need to try to transform to a normal distribution.

In general:

Some people seem to believe (after having taken an elementary course) that there are only two possibilities: either normal-distribution methods or non-parametrics. That is wrong. There are many parametric distributions (skewed and so on) that do not look like the normal distribution (e.g. the binomial, Poisson, and exponential distributions).

There are no (distributional) assumptions about the independent variables in regression. The independent variables are assumed to be fixed values (and thus have no distribution).

BUT when I compared the two groups (one with the pathology and one without) using the Mann-Whitney test, I got 3 independent variables that differed between the two groups, and the difference was statistically significant (in the beginning I had six independent variables).

So now I want to analyze the strength of the influence of each of those statistically significant independent variables on the dependent variable (or find the correlation coefficient, or represent it as odds ratios or some other mystic ****). And I think it will be rather small, because there are at least 10 more independent variables that can also influence the dependent variable I study. For example, as my study is connected with brain iron deposition, I have some patients (there were far fewer of them) who had a lot of iron in their brain but didn't have any signs of pathology, because of the other independent variables that I don't have (and I think that's the reason for the skewness).

So, if you can give me some advice or just a link that will help me dig some gems out of all the mud I'm digging in, I'll be very grateful.

DV <--- IV1, IV2, ...,IV6

So that pathology or non-pathology is explained by e.g. age and exercise etc. But if you do a Mann-Whitney test, then you investigate how the two groups (pathology or non-pathology) influence age. That does not make sense. Mann-Whitney is just irrelevant here. (By the way, it is sensitive to "spread", so it certainly has its assumptions, which are often violated.)

The correlation is, by the way, a parameter. If you want that, you are doing parametric estimation.

Go ahead and do a multiple logistic regression. Then you will also get an odds ratio.
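
As a rough sketch of what a multiple logistic regression does, the following fits the model by maximum likelihood (Newton-Raphson) and turns the coefficients into odds ratios. Everything here is an assumption for illustration: the data are simulated, the "true" coefficients are invented, and `solve` and `fit_logit` are hypothetical helper names. In practice you would use the logistic regression routine of a statistics package rather than hand-written code.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logit(rows, y, iterations=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    rows: list of predictor lists (no intercept column); y: list of 0/1.
    Returns [intercept, b1, b2, ...] on the log-odds scale.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(design, y):
            eta = sum(bj * xj for bj, xj in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    hess[j][l] += w * xi[j] * xi[l]
        step = solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    return beta


rng = random.Random(7)  # fixed seed so the run is repeatable
true_beta = [-1.0, 0.8, 0.5]  # invented values: intercept, IV1, IV2
rows, y = [], []
for _ in range(2000):
    iv1 = rng.expovariate(1.0)  # skewed IV; no normality is required
    iv2 = rng.gauss(0.0, 1.0)
    eta = true_beta[0] + true_beta[1] * iv1 + true_beta[2] * iv2
    p = 1.0 / (1.0 + math.exp(-eta))
    rows.append([iv1, iv2])
    y.append(1 if rng.random() < p else 0)

beta = fit_logit(rows, y)
odds_ratios = [math.exp(bj) for bj in beta[1:]]  # one odds ratio per IV
print("coefficients:", [round(bj, 2) for bj in beta])
print("odds ratios: ", [round(o, 2) for o in odds_ratios])
```

Each odds ratio is exp(coefficient): the multiplicative change in the odds of pathology for a one-unit increase in that IV, with the other IVs held fixed.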

In all the medical scientific articles I've read, the pathological state is presented as the dependent variable.

But if you do a Mann-Whitney test, then you investigate how the two groups (pathology or non-pathology) influence age.

E.g. one can conduct a randomized experiment with, say, 7 groups receiving different dosages of a toxic agent, and measure whether subjects (plants) are killed or not during the experiment. Then an M-W test can be used to investigate whether those plants which were killed had received higher dosages than the survivors. If yes, then the interpretation is straightforward (IMO): higher dosages here led to more deaths.

With kind regards

Karabiner

But if you do a Mann-Whitney test here, you just investigate whether the two groups differ with respect to age. I.e. whether there's an association. The test itself does not say anything about influences. That is a matter of design and of interpretation.

But the null hypothesis in Mann Whitney is P(x1 > x0) = 0.5, where x0 is the age of those who are not sick and x1 is the age of those who are sick. But the age (or the relevant IV in this case) is not normal according to OP and possibly skewed and heteroscedastic, and Mann Whitney is sensitive to that (Search for Fagerland-Sandvik).
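
The quantity P(x1 > x0) in that null hypothesis can be estimated directly from the data as U / (n1 * n0), where U is the Mann-Whitney statistic. A minimal sketch, using made-up ages and a hypothetical helper name `prob_superiority` (ties are counted as one half, which is a common convention but an assumption here):

```python
def prob_superiority(x1, x0):
    """Estimate P(x1 > x0) as U / (n1 * n0), counting ties as one half."""
    wins = 0.0
    for a in x1:
        for b in x0:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(x1) * len(x0))


# Hypothetical ages, invented for illustration only.
age_sick = [61, 58, 70, 66, 74, 59]
age_healthy = [52, 60, 49, 55, 63]

p_hat = prob_superiority(age_sick, age_healthy)
print(f"estimated P(x1 > x0) = {p_hat:.3f}")
```

A value near 0.5 is what the null hypothesis predicts; values far from 0.5 indicate that one group tends to have larger values. This is only the point estimate and says nothing about the test's sensitivity to unequal spread or skewness mentioned above.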

Note that the OP said:

I have one dependent variable that is categorical and binomial (the patient has the pathology or does not). And six independent variables

But this is about optimal inference. It is known that the dependent variable is binomial. Logit is estimated with maximum likelihood (ML). ML gives consistent and asymptotically efficient estimates. How could anything be better than maximum likelihood? And by the Neyman-Pearson lemma it would give the most powerful test.

THE QUESTION

1/ What's wrong with my analysis?

2/ Can I do something else besides this?

Thanks a lot.

Here is a general comment, not particular to your post. Medical literature doesn't exactly use the right methods at the right time or in the right way (nor do they recognize statistics as something that does not follow a cookbook approach). So, the argument that medical publications use one method or do something one way is a poor argument. There is a lot of "oh, this group published with this analysis, that must be the right way to do it."

Totally agree, and I've seen that stuff many times (like the patients in the first group had 20 +/- 25 teeth), but as examples I use only those articles where the main author's h-index is above 30, so I think they're trying to use statistics in a correct way.

If you are talking about an impact factor or something similar when you say "index", I can tell you that it doesn't matter as much. I've seen top journals with bad stats in articles by prominent universities and prominent researchers. Be very careful about equating publications with "good" statistical practice and interpretation.