mishery
01-02-2008, 12:14 PM
Hope you can help me with this...
I want to get cross tabulation data for a large data set. I have 500 columns of binary data. I want to get the cross tabulation information for every possible combination. I can't see how to do this other than one by one, but I know there must be a way.
I just want to know for each of my column vectors the number of overlapping values for each other vector.
I am sure there must be a simple way of doing this other than
table(initdat$f1,initdat$f2)
..
..
table(initdat$f499,initdat$f500)
Thank you!
I think this will be useful. Skip to the analysis part and read on from there:
> http://www.nettakeaway.com/tp/R/264/analysis-with-r
Mike White
01-02-2008, 05:36 PM
Does the attached do what you want? However, for 500 columns you will have 500*(500-1)/2 = 124,750 tables, which is a lot! If you only want data on overlapping 1's or 0's you could extract the data as in the attached, but the results will still be very large.
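To make the pair count concrete, here is a minimal sketch. It assumes a small made-up data frame standing in for `initdat` (the real data and the attached script are not shown in the thread), with columns named f1..f6. `combn()` enumerates every pair of columns once and builds one table per pair.

```r
# Hypothetical stand-in for the poster's data: 6 binary columns f1..f6
set.seed(1)
initdat <- as.data.frame(matrix(rbinom(60, 1, 0.5), ncol = 6))
names(initdat) <- paste0("f", 1:6)

# Number of distinct column pairs is n * (n - 1) / 2
n_pairs <- choose(ncol(initdat), 2)

# One cross tabulation per pair of column names, collected in a list
tabs <- combn(names(initdat), 2,
              FUN = function(p) table(initdat[[p[1]]], initdat[[p[2]]]),
              simplify = FALSE)

length(tabs) == n_pairs  # one table per pair
choose(500, 2)           # pair count for the full 500-column data set
```

With 500 columns the list would hold 124,750 tables, which is why a single summary matrix is likely to be easier to work with.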
mishery
01-03-2008, 05:06 AM
Thank you for the advice. I will give these suggestions a go.
As Mike says, it will be a large amount of data.
What I really want is something like a correlation/covariance matrix, where each off-diagonal entry is the number of rows in which both vectors of the pair are equal to 1. The diagonal of this matrix would be the number of cells = 1 for each individual vector.
So for this small sample data....
vec1 vec2 vec3
   0    1    1
   1    0    1
   0    1    1
...What I would want would look like below...
     vec1 vec2 vec3
vec1    1    0    1
vec2    0    2    2
vec3    1    2    3
Perhaps there is a way to do this other than using crosstabs?
Thanks
Mike White
01-03-2008, 03:14 PM
If you add the following code to the previous R-Script it should give you the results you want.
ones<-as.matrix(ones)
diag(ones)<-colSums(dat)
colnames(ones)<-colnames(dat)
rownames(ones)<-colnames(dat)
ones
Note that this does not work if one of the variables is all 1's or all 0's, as in your example. This is because the table function only produces a 2x2 matrix if both variables contain both 0's and 1's, and my simple indexing method (x[2,2]) to extract the count of 1's for both variables relies on a 2x2 matrix.
I will have another look at it!
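The limitation described above can be reproduced with the three-column sample from earlier in the thread. This is only a sketch: wrapping each column in `factor()` with fixed levels is one possible workaround and is not taken from the attached script. It forces every table to be 2x2 so that the `[2, 2]` cell always exists.

```r
# Sample data from earlier in the thread; vec3 is all 1's
dat <- data.frame(vec1 = c(0, 1, 0),
                  vec2 = c(1, 0, 1),
                  vec3 = c(1, 1, 1))

# Plain table(): vec3 has only one observed value, so the table is 2x1
t_plain <- table(dat$vec1, dat$vec3)
dim(t_plain)

# Fixed levels 0 and 1 give a 2x2 table even when a value never occurs
t_fixed <- table(factor(dat$vec1, levels = 0:1),
                 factor(dat$vec3, levels = 0:1))
dim(t_fixed)
t_fixed[2, 2]  # number of rows where vec1 and vec3 are both 1
```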
Mike White
01-03-2008, 04:42 PM
If you only need the number of cells with paired 1's then to overcome the problem of a variable containing just 1's it is better to use the AND operation [& operator] as in the attached file.
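# Create the sample: 10 rows x 5 columns of random 0/1 values
colNum <- 5
dat <- as.data.frame(matrix((runif(50) > 0.5) * 1, ncol = colNum))
ans <- matrix(ncol = colNum, nrow = colNum)
# ans[i, j] = number of rows where columns i and j are both 1
# (for 0/1 data the product is 1 only when both values are 1)
for (i in 1:colNum) {
  for (j in 1:colNum) {
    ans[i, j] <- sum(dat[, i] * dat[, j])
  }
}
# The result
ans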
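For comparison, here is a sketch of the same count written with the `&` operator itself rather than multiplication. The seed and the name `both_ones` are illustrative choices, not part of the attached file. For 0/1 data, `sum(x & y)` and `sum(x * y)` give the same number.

```r
# Illustrative random 0/1 data, 10 rows x 5 columns
set.seed(42)
dat <- as.data.frame(matrix((runif(50) > 0.5) * 1, ncol = 5))
n <- ncol(dat)

# both_ones[i, j] = number of rows where columns i and j are both 1
both_ones <- sapply(seq_len(n), function(j) {
  sapply(seq_len(n), function(i) sum(dat[[i]] & dat[[j]]))
})
dimnames(both_ones) <- list(names(dat), names(dat))

both_ones
```

The diagonal holds the number of 1's in each column, since a column ANDed with itself is unchanged.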
Mike White
05-23-2008, 09:59 AM
Having thought about this, a more elegant method is:
ans<-crossprod(as.matrix(dat))
This is also considerably faster with a 500-column data.frame :)
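As a quick check, here is a sketch applying `crossprod()` to the three-column sample posted earlier in the thread. `crossprod(X)` computes t(X) %*% X, so for 0/1 data each entry is the number of rows where both columns are 1.

```r
# Sample data from earlier in the thread
dat <- data.frame(vec1 = c(0, 1, 0),
                  vec2 = c(1, 0, 1),
                  vec3 = c(1, 1, 1))

res <- crossprod(as.matrix(dat))
res
#      vec1 vec2 vec3
# vec1    1    0    1
# vec2    0    2    2
# vec3    1    2    3
```

This matches the matrix requested in the question, including the column that is all 1's, and the row and column names carry over from the data frame.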