Calculate values for all permutations of categorical variables

Hi all,

My particular approach may be completely wrong, so let me describe the high-level goal first:

The data I'm working with is visitor demographic features with a score. Currently I am using 14 demographic features, each with up to 5 levels. I want to determine which partition of the demographic features will yield the biggest difference in average score.

I don't want to get too granular, so I will likely omit partitions that contain less than 5% of the data. My end result would ideally be a partition of all visitors into 2-4 groups with different average scores.


So! I wanted to start small and begin by calculating the average score based on every combination of 2 distinct features. I got stuck building the script to accomplish this. Here is a sample input/output:

Code for Input:
data = matrix(c("Blue","Blue","Brown","M","M","F","Rep","Dem","Rep",1,0,1), ncol=4,
              dimnames=list(NULL, c("Eyes","Gender","Politic","Score")))


  Eyes    Gender Politic Score
1 "Blue"  "M"    "Rep"   "1"
2 "Blue"  "M"    "Dem"   "0"
3 "Brown" "F"    "Rep"   "1"


Desired output (average score for each pair of levels):
Blue, M : 0.5
Brown, F : 1
Blue, Rep : 1
Blue, Dem : 0
Brown, Rep : 1
M, Rep : 1
M, Dem : 0
F, Rep : 1

Right now I am stuck on the looping. To start, I am just trying to build a function that creates a matrix of all distinct pairs of questions (features) and answers (levels). When I loop through the questions and then through each answer to a question, I get various errors.

analyze = function(test_data) {

  categories = lapply(test_data, unique)  # list of all distinct categorical values
  category_names = names(categories)

  for (feature in category_names) {
    for (ans in categories$feature) {


*I know this isn't representative of the whole problem described at the beginning; I just tried to simplify it down to an easier problem whose answer will help the most.


Cookie Scientist
I think you want to check out the outer() and combn() functions. The former crosses all elements of a first vector with all elements of a second vector and applies a function that you define to each of the pairings. The latter takes a single vector and returns all unique combinations of a given size (for example, all pairs) that can be formed from its elements.
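
To make the combn() suggestion concrete, here is a minimal sketch, not a definitive solution. It assumes the sample data is stored as a data frame rather than a character matrix, so that Score stays numeric. The names visitors, features, pair_means and result are made up for this example. combn() walks over every pair of feature columns, and aggregate() computes the mean score for each combination of levels that actually occurs in the data:

```r
# Sample data from the question, as a data frame (an assumption of this
# sketch) so that Score is numeric instead of character.
visitors <- data.frame(
  Eyes    = c("Blue", "Blue", "Brown"),
  Gender  = c("M", "M", "F"),
  Politic = c("Rep", "Dem", "Rep"),
  Score   = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# Every column except the outcome is treated as a feature.
features <- setdiff(names(visitors), "Score")

# combn() enumerates every unordered pair of feature names and applies the
# function to each pair; simplify = FALSE keeps the results in a list.
pair_means <- combn(features, 2, simplify = FALSE, FUN = function(pair) {
  # Mean score for each combination of levels observed for this pair.
  res <- aggregate(visitors["Score"], by = visitors[pair], FUN = mean)
  data.frame(
    features = paste(pair, collapse = " x "),
    levels   = paste(res[[1]], res[[2]], sep = ", "),
    mean     = res$Score
  )
})

# Stack the per-pair tables into one data frame.
result <- do.call(rbind, pair_means)
print(result)
```

For the sample data this gives eight rows, one per observed pair of levels, including a mean of 0.5 for Blue, M. With 14 features there would be choose(14, 2) = 91 feature pairs to go through. As an aside on the loop in the question: categories$feature looks up an element literally named "feature" rather than using the value of the loop variable; categories[[feature]] does the latter.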


Another possibility is the following.

1. Create 4 single-column data frames, one each for Eyes, Gender, Politic and Score, and remove repeated values using duplicated().
2. Join these data frames using merge() 3 times without a match condition (a cross join, i.e. the Cartesian product).

The result is a data frame with every possible combination of values (ignoring order).
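
The two steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the data is a data frame (called visitors here, a made-up name), and Score is left out of the cross join because it is the outcome rather than a feature, which differs slightly from the steps as written. The names level_tables and all_combos are also just for this example.

```r
# Sample data from the question, as a data frame (an assumption of this sketch).
visitors <- data.frame(
  Eyes    = c("Blue", "Blue", "Brown"),
  Gender  = c("M", "M", "F"),
  Politic = c("Rep", "Dem", "Rep"),
  Score   = c(1, 0, 1),
  stringsAsFactors = FALSE
)

features <- setdiff(names(visitors), "Score")

# Step 1: one single-column data frame per feature, keeping each level once.
level_tables <- lapply(features, function(f) {
  visitors[!duplicated(visitors[[f]]), f, drop = FALSE]
})

# Step 2: merge() with by = NULL has no match condition, so each call
# returns the Cartesian product of its two inputs.
all_combos <- Reduce(function(x, y) merge(x, y, by = NULL), level_tables)

print(all_combos)
```

With two levels per feature in the sample data, this produces 2 x 2 x 2 = 8 rows. Note that the grid also contains combinations that never occur in the data (for example Brown, M), so it would still need to be matched back against the observed rows before averaging scores. expand.grid() can build the same kind of grid in a single call.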
