clustering

  1. M

    Which distance metric should I use for county clustering?

    I'm trying to cluster U.S. counties based on the following characteristics: median wages, unemployment rate, average educational attainment, and population. Many clustering algorithms require the calculation of a distance matrix, but I'm having trouble evaluating the pros and cons of the different...
  2. T

    Clustering of behavior related data

    Dear Users, I'm quite a beginner in this field, but my research now requires some methodology, so I thought I'd create a topic in case somebody has had a similar issue before. I have some data on health-related features, including: - BMI (scale) - Current diseases (categorical) - Physical...
  3. A

    Which clustering method can I use?

    I have a data set that consists of 1 dependent continuous variable and 3 independent categorical variables. I need to find the cluster/group of data points with the smallest within-cluster variance of the independent variable. Any suggestions as to which clustering method I can use?
  4. F

    Density plot 3d for clustering

    Good morning everybody. I want to ask whether it is possible to build a 3D density plot to show the density of a cluster, given the scores on the first 2 principal components for each individual. I showed the density of each cluster using the package "ggplot2", but I've done it only in...
  5. L

    Principal component and clustering

    Hi! I have a question concerning principal component analysis and how to proceed. First of all, this is my case: I have a lot of data points containing a calculated Q and a measured T. Q is calculated as the sum of different q's. I would like to make the following diagonal clusters...
  6. A

    sure independence screening in clustering

    ------------
  7. B

    Clustering standard errors in AMOS

    Hi all, Do any of you know how to cluster standard errors in a SEM-model constructed in AMOS? I have a paper which focuses on individuals in organizations. I have about 400 individuals distributed over about 200 organizations. In 80 organizations, I have just one respondent, in the other...
  8. N

    Segmentation analysis: Correct test to use?

    As part of a larger study, we have collected a wealth of data on the interactions customers engage in when buying and using a service. We have tried to look at this relatively close to reality. Hence, we distinguish 8 phases that tend to be, but are not necessarily, sequential, 12 different...
  9. G

    Help with multidimensional scaling for multiple DVs

    Hello statistics experts! I couldn't find an answer to my question so I thought that one of you might know: I have 14 risk sources, like nuclear power plants (let's call them "objects"), and participants judged them on 14 characteristics (e.g., potential danger). Now, I would like to create...
  10. K

    Clustering heterogeneous groups based on similarity of heterogeneity

    Disclaimer: I am not a stats major, and would love it if people shred my question to bits if it contains any obvious logical flaws. I am not a native English speaker, but I try my best to be concise. The reason why I am writing here is to get to the proper statistical lingo/jargon to better...
  11. A

    Latent class cluster analysis with mixed data

    I have data that contains continuous and categorical variables, and I have to cluster that data using latent class analysis (LCA). I know that LCA sometimes means that the manifest variables are categorical, but I read that programs like LatentGold know how to handle both types of data when clustering...
  12. C

    Price Prediction

    I have a large set of search data from a particular website. A sample data set is attached here. Data set includes nearly 11,000 rows. What I want to do is to predict the price. I want to predict the price for a particular holiday id, particular Inhouse rating, particular star rating...
  13. C

    K-means clustering

    Is there any algorithm to find the value of 'k' in k-means clustering for a big data set?
  14. 8

    Measures of randomness

    Does anyone have any ideas on how to quantify randomness? More specifically, I need a measure of clustering. Let's say I have 2 groups within a dataset. I plot their values based on two measures and color-code by group. When I do the same using other measures, I get a different scatter of red...
  15. C

    Help with clustering based on shopping category transaction freq

    Hey there. Just wondering if anyone could lend their expertise on cluster analysis. Basically, I am trying to see if I can find 5-10 customer segments based on shopping habits. I have a table (single customer view) of 10,000 randomly selected customers along with a column for each category...
  16. L

    Dunn index and Davies-Bouldin index in Clustering

    Does SAS calculate Dunn Index or Davies-Bouldin Index? It relates to validation of different cluster algorithms... Thanks for any suggestions.
  17. M

    When do sample size calculations need to adjust for clustering?

    I am aware that sample size calculations for cluster RCTs must adjust for clustered data. I would like to understand better if such adjustment might be necessary for other study designs. For instance, I am working on a nested case-control study to identify clinical and socio-demographic factors...
  18. J

    Evaluating clusters of variables produced by varclus & hclustvar

    Background - I want to cluster analyze a mixed dataset, clustering the variables on the basis of correlational similarity. SPSS gives me this option, but doesn't allow me to evaluate the clustering solutions by providing statistical measures of heterogeneity change (e.g. pseudo F statistic) or...
  19. L

    Fixed effects versus standard errors

    I have experimental data from a correspondence test in the rental housing market. On what grounds should I choose between: 1) F.E. for the region where the apt is and cluster s.e. on the day I sent the message 2) cluster s.e. on region and F.E. on the day the message was sent (or...
  20. H

    cluster analysis and finding outliers

    This post is no longer up