
Thread: Question About K-Means

  1. #1

    Question About K-Means

    I'm working on a project that clusters retail stores on the basis of their sales performance by item class.
    Items sold in these stores are classified into 18 groupings.
    In short, I want to cluster stores that share similar sales patterns across these classes.
    I have been looking into using a k-means clustering algorithm, but I'm not sure whether I should first use Principal Component Analysis (PCA) to reduce my 18 variables to 2-5 components.
    The research I have done has given me mixed results, and I have not been able to find anything specific. When I cluster the stores with PCA and without it, my results vary greatly. I am also not sure of the best way to evaluate my results. How can I tell which process is giving me the better result?

  2. #2
    Hehe, I ran into this on a take-home midterm project once.

    There is a really good chance that your k-means results vary wildly between the PCs and the natural variables because the PCA standardized the variables and you didn't standardize the natural observations.

    I had a situation where I ran k-means on the full PC projections and on the natural observations and got wildly different results. I thought: why should this be, if the PCs are just a rotation of the space? When I thought to standardize the natural variables, I had my answer.
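
    The point about rotations can be checked on a toy example. The sketch below is an illustration only, not anyone's actual analysis: the four 2-D points are made up, and `rotate`, `standardize` and `nearest` are hypothetical helpers. A pure rotation leaves every Euclidean distance unchanged, so a distance-based method like k-means sees the same geometry, while per-variable standardization rescales the axes and can change which points are closest.

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point about the origin by `angle` radians."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def standardize(points):
    """Rescale each coordinate to mean 0 and population standard deviation 1."""
    columns = list(zip(*points))
    means = [sum(col) / len(col) for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
           for col, m in zip(columns, means)]
    return [tuple((v - m) / sd for v, m, sd in zip(p, means, sds))
            for p in points]

def nearest(points, i):
    """Index of the point closest to points[i] in Euclidean distance."""
    others = [j for j in range(len(points)) if j != i]
    return min(others, key=lambda j: math.dist(points[i], points[j]))

# Made-up data: first variable in the thousands, second in single digits.
raw = [(1000.0, 1.0), (1010.0, 9.0), (1100.0, 1.5), (1400.0, 8.0)]
rotated = [rotate(p, 0.7) for p in raw]   # arbitrary rotation angle
scaled = standardize(raw)

# Nearest neighbour of point 0 in each version of the data.
print(nearest(raw, 0), nearest(rotated, 0), nearest(scaled, 0))  # → 1 1 2
```

    On the raw and rotated data, point 0 is closest to point 1, because the large-scale first variable dominates the distance. After standardization the second variable carries equal weight and point 2 becomes the nearest neighbour.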

  3. #3
    I had no idea!!! Your point was VERY helpful!!

    I've been using % of total sales instead of actual sales, since I don't want larger-volume stores to be clustered together. For example, Store X sells 5 products of class 1, 2 products of class 2 and 3 products of class 3. In my dataset, I've turned that into "50% class 1, 20% class 2 and 30% class 3".
    If I standardize the actual dollar amounts, will that essentially do the same thing? Or will it standardize by product?
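
    To make the two options in this question concrete, here is a small sketch. It assumes "standardize" means the usual z-score computed per class (per column) across stores. Stores Y and Z and the helpers `row_shares` and `column_zscores` are invented for illustration. Taking shares works within each store, so two stores with the same mix but different volume become identical. Z-scoring works within each class, so the volume difference between those stores remains.

```python
import math

# Hypothetical unit sales for three stores across three item classes.
sales = {
    "X": [5.0, 2.0, 3.0],       # the example from the post
    "Y": [50.0, 20.0, 30.0],    # same mix as X, ten times the volume
    "Z": [20.0, 60.0, 20.0],    # a different mix
}

def row_shares(row):
    """Each class as a fraction of that store's total sales."""
    total = sum(row)
    return [v / total for v in row]

def column_zscores(table):
    """Standardize each class (column) across stores: mean 0, sd 1."""
    names = list(table)
    out = {name: [] for name in names}
    for col in zip(*table.values()):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        for name, v in zip(names, col):
            out[name].append((v - mean) / sd)
    return out

shares = {name: row_shares(row) for name, row in sales.items()}
zscores = column_zscores(sales)

# Distance between X and Y under each transformation.
print(math.dist(shares["X"], shares["Y"]))    # zero: same sales pattern
print(math.dist(zscores["X"], zscores["Y"]))  # well above zero: volume still matters
```

    So, under that assumption, the two transformations are not equivalent: shares remove store volume, while per-class z-scores only put the classes on a common scale.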



  4. #4
    Well, point one is that you don't ~have~ to standardize to use PCA. It's just that some software does it by default. You can rerun the PCA without standardization to see whether that was the issue.

    But whether you should standardize is still a question to consider.

    So are all your variables total sales in Product A through Z, or are there other types of measurements in there too? Interesting. I don't have an answer.
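
    On the first point, one way to see what standardization changes: PCA on raw data works from the covariance matrix, and PCA on standardized data works from the correlation matrix. The sketch below uses made-up dollar figures and hypothetical helper names, and only compares how much of the total variance each class contributes before and after standardizing. It does not run a full PCA.

```python
from statistics import mean, pstdev, pvariance

# Made-up dollar sales for five stores (rows) in three classes (columns).
sales = [
    [52000.0, 310.0, 45.0],
    [48000.0, 290.0, 60.0],
    [61000.0, 350.0, 30.0],
    [39000.0, 270.0, 75.0],
    [55000.0, 330.0, 50.0],
]

def variance_shares(rows):
    """Fraction of the total variance contributed by each column."""
    variances = [pvariance(col) for col in zip(*rows)]
    total = sum(variances)
    return [v / total for v in variances]

def standardize(rows):
    """Rescale each column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

raw_shares = variance_shares(sales)
std_shares = variance_shares(standardize(sales))

print(raw_shares)  # the first class carries almost all of the variance
print(std_shares)  # each class carries about one third
```

    With raw dollars, the leading components would mostly describe the highest-variance class. After standardization every class has the same weight, which may or may not be what you want.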
    Last edited by Rounds; 06-03-2008 at 07:16 PM.

  5. #5

    So running PCA doesn't always standardize the dataset?
    The only type of measurement I'm using is sales by category.
    As for standardizing the dataset, do you suggest doing it? As I mentioned in my previous post, I'm worried that large-volume stores will be clustered together. I'm more interested in clustering stores that have similar sales patterns than in clustering stores with large sales volumes together.
    Do you have any ideas or suggestions as to how I should continue?
    Thanks!!

