# The most appropriate method in this case

#### Stat_member

##### New Member
Hello everyone,

Here is my problem:
I have a database of 1500 machines located in 8 different companies. The goal is to find the cause of death of the machines that lived the longest (I have about 10 explanatory variables). To do this, I focus on the machines that lived more than 1500 days. Among them, 900 died after more than 1500 days and 1 is still alive at more than 1500 days.
I wanted to use Cox regression, which would give me the risk of death associated with each variable in the model.
However, I only have one machine still alive at 1500 days, while 900 are dead, so I would have almost no censored data in my model. Is this a problem?
Is there another method that would be more appropriate for this problem?

Best regards,
Stat_member.

#### Miner

##### TS Contributor
I think using Cox regression is overkill. Your stated goal was to determine the cause of death for machines that have lived more than 1500 days. A simple Pareto chart of the 900 will provide this information. The one surviving machine will not change your conclusions.

Just because you can, doesn't mean that you should.
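
For illustration, the numbers behind a Pareto chart can be sketched in a few lines of Python. This assumes a cause of death has been recorded for each failed machine; the cause labels and counts below are invented for the example and are not from the actual database.

```python
from collections import Counter


def pareto_table(causes):
    """Return (cause, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(causes).most_common()
    total = sum(n for _, n in counts)
    rows, running = [], 0
    for cause, n in counts:
        running += n
        rows.append((cause, n, round(100.0 * running / total, 1)))
    return rows


# Hypothetical failure records (labels and counts invented for illustration).
records = ["overheating"] * 5 + ["corrosion"] * 3 + ["seal failure"] * 2

for cause, n, cum in pareto_table(records):
    print(f"{cause:<14}{n:>4}{cum:>8.1f}%")
```

The bars of the chart are the counts, and the cumulative percentage is the line drawn over them.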

#### Stat_member

##### New Member
Hello,

Thank you for your feedback, @Miner!
Can generalized models or penalized regression not be used in this case either?
I was expecting one of the statistical methods already mentioned...
So the answer is simply a chart.

I have a question about Cox regression. If I only look at tanks older than 1500 days, I have about 900 dead and one alive. If I add to this sample the still-living tanks that are less than 1500 days old (purely so that I can use Cox), might the results be distorted?

Thanks

#### Miner

##### TS Contributor
> Generalized models or penalized regression cannot be used in this case too?
> I was expecting one of the statistical methods already mentioned...

Why? Your stated goal was to determine the cause of death for machines that have lived more than 1500 days. Regression of any type will not provide this information.

> I have a question about Cox regression. If I deal with tanks older than 1500 days, I have about 900 dead and one alive. And if I add to this sample the still living tanks that are less than 1500 days old (all for the sake of using Cox), the results might be distorted?

Always include all of your available data when performing a reliability analysis. Excluding the data less than 1500 days will distort your results.
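
One way to see the effect of leaving units out is a small Kaplan-Meier sketch. This is a minimal pure-Python version with invented durations, not an analysis of the real machines; the names `kaplan_meier`, `durations` and `failed` are only for this example. It compares the estimate from all units (failed and still running) with the estimate from the failed units alone.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate.

    durations: time on test for each unit
    observed:  True if the unit failed at that time, False if it was
               still running when last seen (censored)
    Returns a list of (time, survival_probability), one per failure time.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every unit that leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Hypothetical data: three failures and three units still running.
durations = [100, 200, 300, 400, 500, 600]
failed = [True, False, True, False, True, False]

all_units = kaplan_meier(durations, failed)
failures_only = kaplan_meier(
    [d for d, f in zip(durations, failed) if f], [True, True, True]
)

print("all units:    ", all_units)
print("failures only:", failures_only)
```

With the running units dropped, the estimated survival at 500 days falls to zero, whereas with all units included it stays well above zero. The running units carry information even though they have not failed.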

#### Buckeye

##### Active Member
Where in the statement of the problem did you find an application to PCA? lol.

To Miner's point, we can't determine causal relationships from a regression analysis. Unless we...

I don't understand why you can't do logistic regression with alive or dead machines (all data) and use the age as an explanatory variable amongst others. This will tell you which variables increase or decrease the odds of death and by how much.
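
As a rough sketch of what that would look like, here is a tiny logistic regression fitted by plain gradient ascent in pure Python. Everything in it is an assumption for illustration: the single 0/1 predictor `high_temp`, the counts, and the helper `fit_logistic` are invented, and a real analysis would add age and the other variables as extra (scaled) columns and would normally use a statistics package.

```python
import math


def fit_logistic(X, y, lr=0.5, steps=2000):
    """Fit logistic regression by gradient ascent on the mean log-likelihood.

    X: list of feature rows (no intercept column); y: list of 0/1 outcomes.
    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
            p = 1.0 / (1.0 + math.exp(-z))
            err = target - p
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical data: high_temp = 1 for 40 machines (30 dead, 10 alive),
# high_temp = 0 for 40 machines (10 dead, 30 alive).
X = [[1]] * 40 + [[0]] * 40
y = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 30

coefs = fit_logistic(X, y)
odds_ratio = math.exp(coefs[1])
print(f"odds ratio for high_temp: {odds_ratio:.2f}")
```

The exponentiated coefficient is the factor by which the odds of death change when the variable goes from 0 to 1. With a single binary predictor it matches the odds ratio computed directly from the 2x2 table, here (30/10) / (10/30) = 9.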

#### Stat_member

##### New Member
> Why? Your stated goal was to determine the cause of death for machines that have live more than 1500 days. Regression of any type will not provide this information.
>
> Always include all of your available data when performing a reliability analysis. Excluding the data less than 1500 days will distort your results.

The problem is that I don't know the cause of death for any of the dead machines. That is exactly what I need to find out, so I don't see how a Pareto chart would help when no cause of death has been recorded. I don't know whether they died because the temperature was too high or because of some other factor...

#### Stat_member

##### New Member
> Where in the statement of the problem did you find an application to PCA? lol.
>
> To Miner's point, we can't determine causal relationships from a regression analysis. Unless we...