Which cluster analysis methods should I use to analyze purchase behavior over time?

Hello everyone,

I am quite new to cluster analysis. I have read through a fair amount of material, but I am still puzzled by the following task.

I want to perform cluster analysis on my customers, using a data set with the following layout:
SubjectN; Week1 Purchase; Week2 Purchase; ...; Week12 Purchase

These variables are not independent of each other, but most of the cluster analysis examples I have seen used largely independent variables in their demonstrations, so I suspect those approaches would not be suitable for my dataset.

What would you recommend?


TS Contributor
I do not think that independence is a requirement for cluster analysis. After all, no hypothesis test or other formal mathematical analysis is performed on the clusters afterwards. IMO you should just run the analysis and check whether the clusters make practical sense.
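
As a rough illustration of "run it and look at the result", here is a minimal sketch using plain k-means written with NumPy. Everything in it is an assumption made for the example: the data are simulated (two invented customer types, 12 weeks each), the helper name `kmeans` is hypothetical, and k = 2 is chosen only because the fake data were built that way. A library routine would normally replace the hand-written loop.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (hypothetical helper); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct customers chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every customer to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Simulated data (an assumption, not real purchases): 60 customers x 12 weeks.
rng = np.random.default_rng(42)
weeks = np.arange(12)
steady = 20 + rng.normal(0, 2, size=(30, 12))              # flat spenders
growing = 5 + 3 * weeks + rng.normal(0, 2, size=(30, 12))  # spending ramps up
X = np.vstack([steady, growing])

labels, centroids = kmeans(X, k=2)

# Each centroid is the average 12-week purchase profile of one cluster;
# reading these profiles is the "does it make practical sense" check.
for j, profile in enumerate(centroids):
    print(f"cluster {j}: n={np.sum(labels == j)}, weekly means={np.round(profile, 1)}")
```

If the printed weekly profiles correspond to customer types you can recognize and act on, the clustering has done its job, whether or not the weekly columns are correlated.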



Probably A Mammal
Agreed. Even if features are correlated, the learning algorithm isn't trying to identify the marginal impact of each feature on an outcome, which is what you might care about in a linear regression fitted for explanatory reasons. The goal here is typically predictive or otherwise pragmatic. Thus, if the clustering makes sense or "works" in whatever way the model is applied, then it has succeeded. That said, there are things you can do to mitigate the dependence. For instance, you can transform your features to their principal components and then cluster on the top n components. Other dimensionality reduction methods can serve the same purpose: retain only the minimal information required to get a successful model (success, again, being defined pragmatically by the application).
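
To make the principal-component idea concrete, here is a minimal sketch that computes the components with an SVD in NumPy. The data are simulated and the helper name `pca_scores` is hypothetical; keeping 2 components is an arbitrary choice for the example, not a recommendation. It only shows the transform step: the resulting scores are what you would hand to whatever clustering routine you prefer.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project rows of X onto the top principal components (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                      # center each weekly column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # share of variance per component
    scores = Xc @ Vt[:n_components].T            # customer coordinates in PC space
    return scores, explained[:n_components]

# Simulated, deliberately correlated data (an assumption for illustration):
# each customer has a typical spend level and a weekly drift, plus noise.
rng = np.random.default_rng(0)
weeks = np.arange(12)
level = rng.normal(20, 5, size=(100, 1))
trend = rng.normal(0, 1, size=(100, 1))
X = level + trend * weeks + rng.normal(0, 1, size=(100, 12))

scores, explained = pca_scores(X, n_components=2)

# Neighbouring weeks are correlated in the raw data, while the
# component scores are uncorrelated with each other by construction.
raw_corr = np.corrcoef(X, rowvar=False)
score_corr = np.corrcoef(scores, rowvar=False)
print("correlation, week 11 vs week 12:", round(raw_corr[10, 11], 3))
print("correlation, PC1 vs PC2:", round(score_corr[0, 1], 6))
print("variance share of the kept components:", np.round(explained, 3))
```

The `scores` array has one row per customer and one column per retained component, so it can replace the 12 weekly columns as clustering input. How many components to keep is a judgment call; the variance shares are one common guide.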