- Thread starter arkm25
- Tags categorical variable clustering continuous variable

Can you add more detail, such as whether this is a supervised or unsupervised problem? Also, are you looking to repeat the process three times? I am confused by the description. If you can provide real context, that would greatly help.

Since there is a dependent/response variable, this becomes supervised learning.

Some context: The response variable is the time until a factory machine breaks down. The categorical predictor/independent variables are machine type, size (small, medium, large) and location.

I am not looking to repeat a process. Currently I have one dataset, and I'm looking to extract just one cluster of data points.

Best regards

arkm25

What is your continuous variable, and how was it measured? Are there peculiarities regarding its distribution (e.g. markedly skewed, or uniform)? What are your categorical variables, and how many categories do they have? How large is your sample size?

With kind regards

Karabiner

The continuous variable is the time until a factory machine breaks down. It is simply measured as the time from when a machine is put into operation until it no longer functions. Its distribution seems to resemble an exponential distribution.
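Since the times look roughly exponential, here is a minimal sketch of how that impression could be quantified. For an exponential distribution the maximum-likelihood estimate of the rate is n divided by the total observed time (i.e. 1 / sample mean), and the coefficient of variation (standard deviation / mean) equals 1. The function names and the example times are hypothetical, and the sketch assumes every machine was observed until it broke down (no censored observations).

```python
import statistics

def fit_exponential_rate(times):
    """Maximum-likelihood rate of an exponential distribution: n / sum(t).

    Assumes complete (uncensored), strictly positive times to breakdown.
    """
    if not times or any(t <= 0 for t in times):
        raise ValueError("times must be a non-empty list of positive values")
    return len(times) / sum(times)

def coefficient_of_variation(times):
    """Sample standard deviation divided by sample mean.

    An exponential distribution has CV = 1, so a value near 1 is a rough
    (not conclusive) sign that the exponential shape is plausible.
    """
    return statistics.stdev(times) / statistics.mean(times)

# Hypothetical times to breakdown (e.g. in days); not the poster's data.
times = [12.0, 3.5, 40.2, 7.8, 25.1, 1.9, 15.4, 9.0]
rate = fit_exponential_rate(times)
print(f"rate = {rate:.4f} per day, mean time = {1 / rate:.2f} days")
print(f"CV = {coefficient_of_variation(times):.2f}")
```

If some machines were still running when data collection stopped, their times are censored, and a survival-analysis fit would be more appropriate than this simple estimate.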

The categorical variables with their respective number of levels are ...

Type: 4

Size: 3

Location: 5

Sample size: 83
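A quick arithmetic check on these numbers, in case the clustering is meant to work on combinations of the three categorical variables: 4 × 3 × 5 = 60 possible combinations for 83 machines, so on average there are fewer than two machines per combination, and many combinations will hold one machine or none. The variable names below are illustrative only.

```python
from math import prod

# Number of levels per categorical variable, as listed in the thread.
levels = {"type": 4, "size": 3, "location": 5}
sample_size = 83

n_cells = prod(levels.values())  # every possible type/size/location combination
print(n_cells)                           # 60
print(round(sample_size / n_cells, 2))   # 1.38 machines per combination on average
```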

Best regards

arkm25

Did you already check whether all three categorical variables are associated with time to breakdown? If e.g. there was no association between location and time, then you could possibly leave location out of the clustering process.
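The thread does not say how the association was checked. Because the times are skewed (roughly exponential), one rank-based option is the Kruskal-Wallis test, run once per categorical variable. The sketch below is an assumption, not the poster's method: it computes the Kruskal-Wallis H statistic with the standard library only (without the tie correction) and gets a p-value by shuffling the group labels instead of using the chi-square approximation. All function names and the example data are hypothetical.

```python
import random
from collections import defaultdict

def rank_with_ties(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of sorted positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups, times):
    """Kruskal-Wallis H (no tie correction) for times split by group label."""
    n = len(times)
    ranks = rank_with_ties(times)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for g, r in zip(groups, ranks):
        rank_sums[g] += r
        counts[g] += 1
    total = sum(rank_sums[g] ** 2 / counts[g] for g in counts)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

def permutation_p_value(groups, times, n_perm=5000, seed=0):
    """Share of label shuffles whose H is at least as large as the observed H."""
    rng = random.Random(seed)
    observed = kruskal_wallis_h(groups, times)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kruskal_wallis_h(shuffled, times) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical example: size labels and times to breakdown (not the poster's data).
size = ["small"] * 4 + ["medium"] * 4 + ["large"] * 4
time = [3.1, 5.0, 2.2, 4.4, 8.7, 6.1, 9.9, 7.3, 15.2, 12.8, 20.1, 11.6]
print(kruskal_wallis_h(size, time))
print(permutation_p_value(size, time))
```

With only 83 machines spread over 3 to 5 levels per variable, some levels may contain few machines, which is one reason a permutation p-value is used here rather than the large-sample chi-square approximation.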

Yes, they are all associated with time to breakdown. There were several other variables in the original dataset that were removed, and these are the ones remaining.