model comparison, log likelihood score, P value.

bagdevi

New Member
I am working on bioinformatic analysis of genomic data. I am doing some statistical analysis involving model comparison and have some conceptual doubts. It is quite possible that this is a simple concept that I do not know, as I am not very thorough in statistics.

I am comparing two models. One is nested in the other. The nested one is the null model and the other one is the alternate model. The log likelihood score of the null is lnL0 and that of the alternate is lnL1. The difference in the number of parameters between these models is the degrees of freedom, df.

I am obtaining the P value by looking up 2*(lnL1 - lnL0) in the chi-square table with df degrees of freedom.

My doubt is: should I take the absolute difference between lnL1 and lnL0, or should I subtract lnL0 from lnL1?

If I subtract lnL0 from lnL1, in a few cases I get a negative number. How should I deal with cases where 2*(lnL1 - lnL0) < 0?

As the null model is nested inside the alternate one, the log likelihood score of the alternate should always be at least as large as that of the null. So how come I am getting a negative value in some of my cases? Does it mean that I am doing something wrong?
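
To make the test described above concrete, here is a minimal Python sketch using only the standard library. The function names `chi2_sf` and `lrt_p_value`, the tolerance `tol`, and the log likelihood values in the example are all hypothetical and chosen for illustration; they do not come from PAML. The sketch uses the signed difference rather than the absolute one, and it treats a clearly negative statistic as an error to investigate, which is one possible policy rather than an established rule.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting points for the regularized upper incomplete gamma Q(a, z):
    # Q(1/2, z) = erfc(sqrt(z)) for odd df, Q(1, z) = exp(-z) for even df.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return min(q, 1.0)


def lrt_p_value(lnl_null, lnl_alt, df, tol=1e-6):
    """Return (statistic, p_value) for a likelihood ratio test of nested models.

    Assumption: a statistic below -tol signals a problem (poor convergence,
    wrong settings, or models that are not nested), so it raises instead of
    silently taking the absolute value.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat < -tol:
        raise ValueError(
            "negative statistic %.6f: check convergence and nesting" % stat
        )
    stat = max(stat, 0.0)  # tiny negative values are treated as rounding noise
    return stat, chi2_sf(stat, df)


# Hypothetical scores, for illustration only.
stat, p = lrt_p_value(lnl_null=-1234.56, lnl_alt=-1230.12, df=2)
print("2*(lnL1 - lnL0) = %.2f, P = %.4f" % (stat, p))
```

Taking the absolute difference would hide exactly the cases that need attention, which is why the sketch raises an error instead.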

By the way, I am not calculating the log likelihood scores myself. I am using the positive selection package PAML, which calculates the log likelihood scores of the respective models.

It would really be a great help if somebody could clarify my doubts.

Thanking you.

Dason

I can think of 3 possible explanations.

1) You're doing something wrong in telling the program what you want
2) The program isn't doing a good job finding the actual maximum
3) The models aren't nested like you believe is the case.

bagdevi

New Member
Thank you for the response.

After checking the parameters I used, I do not see any mistake in how I ran the jobs.

Yes, the models I am using are known to have convergence problems. For example, they may get stuck at a corner of the parameter space, with some parameter estimates at 0, the lower bound set by the program.

So, I will now rerun the programs to see whether I get different maximum likelihood scores.
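
One way to use the reruns is sketched below in Python. Each run can only reach or fall short of the true maximum, so the highest log likelihood found across runs is the best available estimate for that model. The helper `best_lnl` and all the scores are hypothetical and made up for illustration; they are not PAML output.

```python
def best_lnl(run_scores):
    """Return the highest log likelihood among repeated runs of one model."""
    if not run_scores:
        raise ValueError("need at least one run")
    return max(run_scores)


# Hypothetical log likelihoods from three reruns of each model,
# e.g. started from different initial parameter values.
null_runs = [-2051.37, -2051.37, -2053.90]
alt_runs = [-2052.80, -2049.95, -2049.95]

# Using only the first run of each model gives a negative statistic,
# because the first alternate run stopped short of its maximum.
first_run_stat = 2.0 * (alt_runs[0] - null_runs[0])

# Using the best run of each model restores a non-negative statistic.
best_stat = 2.0 * (best_lnl(alt_runs) - best_lnl(null_runs))

print("first runs only: %.2f" % first_run_stat)
print("best of reruns:  %.2f" % best_stat)
```

If the statistic stays negative even after many reruns, the remaining explanations (settings or nesting) become more likely.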

About being nested, the program does not say anything about it, though the null model has fewer parameters than the alternate model. If the null model is not nested, can I still calculate the P value?

Or is it wrong to do a model comparison and calculate a P value when the models are not nested?

BGM

TS Contributor
This is a simple mathematical fact:

If $$\Theta_0 \subseteq \Theta_1$$, then

$$\sup_{\theta \in \Theta_0} L(\theta) \leq \sup_{\theta \in \Theta_1} L(\theta)$$

I am not sure whether your actual models are nested. I guess you can always take the union of the parameter spaces so that the above relationship is always satisfied.
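
In the notation of the original question, and assuming lnL0 and lnL1 are the true maximized log likelihoods over $$\Theta_0$$ and $$\Theta_1$$, the inequality carries over to the test statistic because the logarithm is increasing:

$$\ln L_0 = \sup_{\theta \in \Theta_0} \ln L(\theta) \leq \sup_{\theta \in \Theta_1} \ln L(\theta) = \ln L_1 \quad \Longrightarrow \quad 2(\ln L_1 - \ln L_0) \geq 0$$

So a negative value can only arise when the models are not nested or when a reported score is not the true maximum.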