
Thread: Can you assign new observations into existing clusters?

  1. #1


    Can you assign new observations into existing clusters?




    I have never done cluster analysis. Is a scenario like this possible or recommended?

    1. Run 100 observations through cluster analysis and end up with 4 clusters.
    2. Now I have a "cluster model".
    3. I get 20 more observations.
    4. Run these 20 through my existing cluster model and have each of them assigned to one of the existing 4 clusters.

    So it's kind of a regression approach... you have a model that you use to score future observations. Or would those 4 clusters now be invalid and I'd have to "re-cluster" based on the complete 120 obs?
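A minimal sketch of the four steps above, assuming k-means with Euclidean distance and k fixed in advance. It is plain Python (no libraries), uses a simple deterministic farthest-first seeding instead of random starts, and the tiny data set is made up purely for illustration. `fit_kmeans` and `nearest` are hypothetical helper names, not any library's API.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point` (this is the 'scoring' step)."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def fit_kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-first seeding.

    Returns the trained model: a list of k cluster centers.
    """
    centers = [tuple(points[0])]
    while len(centers) < k:
        # Seed with the point that is farthest from every center chosen so far.
        centers.append(tuple(max(points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Toy stand-in for the "100 observations": four well-separated groups.
initial = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0),
           (0, 10), (1, 10), (0, 11), (10, 10), (11, 10), (10, 11)]
centers = fit_kmeans(initial, k=4)

# Toy stand-in for the "20 new observations": score them without refitting.
new_obs = [(0.5, 0.5), (10.5, 0.2), (0.2, 10.5), (10.2, 10.4), (1, 1)]
labels = [nearest(p, centers) for p in new_obs]
```

The new points are never used to move the centers; they are only measured against the centers learned from the initial data, which is the "score future observations" idea from the question.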

    Thank you!

  2. #2
    TS Contributor
    spunky
    Location: vancouver, canada

    Re: Can you assign new observations into existing clusters?

    Quote (originally posted by jawon):
    you have a model that you use to score future observations. Or would those 4 clusters now be invalid and I'd have to "re-cluster" based on the complete 120 obs?
    I would say that, just for the sake of exploring your data carefully, you should probably re-run the clustering with the added 20 observations. In all honesty, if 20 observations end up changing cluster membership a lot, maybe your 'cluster model' wasn't all that great to begin with.
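One way to put a number on "changing cluster membership a lot" (my assumption; the post doesn't name a measure) is the Rand index: label the original observations once with the old model and once with the model re-run on all 120, then count the share of observation pairs on which the two labelings agree. This is the plain, unadjusted Rand index in pure Python; the example labels are made up.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of observation pairs on which two clusterings agree.

    A pair agrees if both clusterings put the two observations together,
    or both keep them apart. 1.0 means identical partitions; the cluster
    numbering itself does not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same observations")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


old = [0, 0, 1, 1, 2, 2]          # labels from the original model
renumbered = [2, 2, 0, 0, 1, 1]   # same partition, different cluster numbers
shifted = [0, 0, 1, 2, 2, 2]      # one observation moved to another cluster

same_score = rand_index(old, renumbered)
shifted_score = rand_index(old, shifted)
```

Because cluster numbers are arbitrary after a re-run, comparing label values directly would be misleading; a pair-based measure like this sidesteps that. The adjusted Rand index, which corrects for chance agreement, is a common alternative.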

    Disclaimer: I prefer model-based clustering methods like finite mixture models over centroid-based clustering algorithms. Which method are you using to cluster your observations?
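With a finite mixture, "assigning" a new observation is a direct use of the fitted model: Bayes' rule gives the posterior probability that the observation came from each component, and the most probable component can serve as a hard label. A minimal sketch for a one-dimensional Gaussian mixture; the weights, means, and standard deviations below are hypothetical values standing in for parameters you would already have estimated.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior_membership(x, weights, means, sds):
    """P(component j | x) for a fitted univariate Gaussian mixture.

    Note: for x far from every component the densities can underflow to 0;
    a log-space version avoids that.
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [p / total for p in joint]


# Hypothetical parameters of an already-fitted 2-component mixture.
weights, means, sds = [0.6, 0.4], [0.0, 5.0], [1.0, 1.0]

probs = posterior_membership(0.2, weights, means, sds)
hard_label = max(range(len(probs)), key=probs.__getitem__)

# Halfway between two equal-spread components, the posterior falls back to the weights.
midpoint_probs = posterior_membership(2.5, weights, means, sds)
```

Unlike nearest-center assignment, this keeps the uncertainty: a point between components gets split probabilities rather than a single forced label.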

  3. #3


    Re: Can you assign new observations into existing clusters?

    Thank you for the response.

    The quantities were just to provide a concrete example. My dataset would actually have many more observations.

    My question is more about the concept of assigning NEW observations into EXISTING clusters. Admittedly I have more of a regression modeling lens on and I don't know if that works in the clustering world.

  4. #4
    Probably A Mammal
    bryangoodrich
    Location: Sacramento, California, United States

    Re: Can you assign new observations into existing clusters?

    Which clustering algorithm are you using? That determines how you apply your data mining model. For instance, if you're using k-means, training a model on a set of data with a given similarity measure produces a set of cluster centers. You apply that model by assigning each new observation to the center it is nearest to, using the same measure the model was trained with. That model can be thought of as a pair (k, +), where 'k' represents the centers and '+' represents the measure.

    It makes no sense to say you've added observations to the cluster. The cluster isn't the "thing" that you've modeled; what you've modeled is (k, +), and that was trained on the original data set. The model assigns each new observation to whichever center it is closest to, so under that model, what is most similar to a given center is just whatever the model determines. If you fit a new model, you might end up with some (k', +) with very different clusters. You could also change the number of centers or use a completely different similarity measure. The point being, you aren't creating clusters; in that case, you're finding centers.
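The "(centers, measure)" view can be made concrete as a small object that stores both and does nothing but assign. A minimal sketch; the class and function names are hypothetical, and the centers are made-up values rather than the output of a real fit. The example shows that keeping the same centers but swapping the measure can change which center a point is assigned to, which is why scoring must use the measure the model was trained with.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


@dataclass(frozen=True)
class CentroidModel:
    """A trained k-means-style model: the pair (centers, measure)."""
    centers: Sequence[Point]
    measure: Callable[[Point, Point], float]

    def assign(self, point: Point) -> int:
        """Index of the center nearest to `point` under this model's measure."""
        return min(range(len(self.centers)),
                   key=lambda j: self.measure(point, self.centers[j]))


centers = [(0.0, 0.0), (2.7, 1.0)]   # hypothetical trained centers
model_e = CentroidModel(centers, euclidean)
model_m = CentroidModel(centers, manhattan)

label_e = model_e.assign((1.0, 1.0))  # Euclidean: 1.414 vs 1.7, nearer center 0
label_m = model_m.assign((1.0, 1.0))  # Manhattan: 2.0 vs 1.7, nearer center 1
```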

    Of course, this only applies to what k-means produces. If you used hierarchical clustering, kNN, or some other approach, you'd have a different sort of model. A hierarchical clustering model produces a dendrogram that relates every observation, and you'd apply it to test data differently, but the point remains the same: the clusters you generate aren't the end result. They are products of the model. The model itself is the product of the clustering (cluster centers, dendrogram, etc.).
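The post doesn't say how a dendrogram is applied to new data. One common option, assumed here rather than taken from the thread, is to cut the tree at k clusters and give a new observation the label of its nearest training observation. A naive pure-Python sketch using single linkage; the helper names and the small data set are made up for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def single_linkage(points, k):
    """Agglomerative single-linkage clustering, stopped when k clusters remain.

    Equivalent to cutting the single-linkage dendrogram at k clusters.
    Naive O(n^3) version, for illustration only.
    """
    clusters = [[i] for i in range(len(points))]

    def gap(pair):
        a, b = pair
        return min(sq_dist(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > k:
        a, b = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))), key=gap)
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def assign_new(point, train_points, train_labels):
    """Give a new observation the label of its nearest training observation."""
    nearest = min(range(len(train_points)),
                  key=lambda i: sq_dist(point, train_points[i]))
    return train_labels[nearest]


train = [(0, 0), (0, 1), (5, 5), (5, 6), (5, 7)]
train_labels = single_linkage(train, k=2)
new_labels = [assign_new(p, train, train_labels) for p in [(0.4, 0.2), (6, 6)]]
```

Other choices (for example, nearest cluster centroid after the cut) are equally plausible; the point is that the rule for new data has to be decided on top of the dendrogram, not read off it.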

  5. #5


    Re: Can you assign new observations into existing clusters?

    Thank you. Despite my inaccurate language, I think I got the answer I was looking for.

    Sounds like using k-means, I can train a model that will result in X number of clusters. Then I could run new observations through this original model and these new observations would get assigned to one of the original clusters.

    Is this a common way of using clusters? I've seen plenty of examples where clusters are created in a one-time analysis to help describe the different segments in the universe. But what I've described is more of an ongoing process, where a model is built and then new observations are routinely added to the original clusters.
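For the ongoing version, the practical requirement is that the trained model outlives the analysis session, so later batches are scored against the same centers. A minimal sketch, assuming a k-means model stored as JSON; the file name and helper functions are hypothetical, the distance measure is fixed in code (squared Euclidean, which must match what training used), and the centers are made-up values.

```python
import json
import os
import tempfile


def sq_dist(a, b):
    """Squared Euclidean distance; must match the measure used in training."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def save_model(centers, path):
    """Persist the trained model (here, just the k-means centers) as JSON."""
    with open(path, "w") as f:
        json.dump({"centers": [list(c) for c in centers]}, f)


def load_model(path):
    with open(path) as f:
        return [tuple(c) for c in json.load(f)["centers"]]


def score_batch(batch, centers):
    """Assign each observation in a new batch to its nearest stored center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in batch]


path = os.path.join(tempfile.mkdtemp(), "kmeans_model.json")
save_model([(0.0, 0.0), (10.0, 10.0)], path)   # once, after training
centers = load_model(path)                      # later, at scoring time
labels = score_batch([(1, 2), (9, 8), (0, -1)], centers)
```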

  6. #6


    Re: Can you assign new observations into existing clusters?


    Bump. Would appreciate input from any of you clustering experts! Thank you.
