1) My participants (approx. N = 80) produced written answers to a complex problem.

2) I content-coded these answers using an 11-code scheme. Some of these codes are binary (i.e. content present = 1, content absent = 0), while others are categorical with 3 to 5 categories.

4) I used the SPSS TwoStep cluster procedure and obtained solutions with 2 and 3 clusters.

5) I proceeded with t-tests / ANOVA to check whether these clusters differed on two continuous individual-difference variables, which is the substantive research question I am exploring.
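
To make the workflow above concrete, here is a rough Python sketch of the same pipeline. It is only an illustration under several assumptions: the data are simulated rather than my real codes, the split into 6 binary and 5 categorical codes is made up, and SPSS TwoStep is replaced by average-linkage hierarchical clustering on Hamming distances, which is a different algorithm.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
n = 80

# Simulated stand-in for the 11 codes (assumed split: 6 binary codes
# and 5 categorical codes with 3 to 5 levels each).
levels = [2] * 6 + [3, 3, 4, 5, 5]
codes = np.column_stack([rng.integers(0, k, size=n) for k in levels])

# Simulated continuous individual-difference variable.
trait = rng.normal(size=n)

# Hamming distance: proportion of codes on which two participants differ.
dist = pdist(codes, metric="hamming")

# Average-linkage hierarchical clustering, cut into at most 3 clusters
# (a substitute for TwoStep, not a reimplementation of it).
tree = linkage(dist, method="average")
labels = fcluster(tree, t=3, criterion="maxclust")

# One-way ANOVA of the continuous variable across the clusters.
groups = [trait[labels == c] for c in np.unique(labels)]
result = f_oneway(*groups)
print(np.bincount(labels)[1:], result.statistic, result.pvalue)
```

Since the simulated codes are pure noise, no cluster difference is expected here; the sketch only shows the sequence of steps.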

I did not obtain the expected differences in the IVs across clusters, and I believe this might be due, at least in part, to the heterogeneity of clusters formed from 11 variables, combined with the relatively small sample size.

My questions are:

1) Given the types of variables I have available for forming the clusters (binary and categorical), is two-step clustering the only possible approach? Is it the most adequate one? Are there any other suggestions?

2) Is there any way to select, a priori, which of the 11 available variables (i.e. codes) should be entered into the cluster analysis in order to obtain a cluster solution that maximises both intra-cluster homogeneity and inter-cluster heterogeneity? In other words, although I COULD use all 11 variables, perhaps doing so does not yield the best possible cluster solution.

3) Is there a statistic associated with a cluster solution that tells me "how good" that solution is? This is especially important if the answer to question 2 is negative, as in that case I might test by "brute force" (i.e. trying different combinations of the variables) which combination renders the "best" cluster solution.

Any guidance (including radical reframings of the problem at hand) will be welcome. Many thanks in advance!