How to split and cut (big data)

natus vincere

Hello,

I have a table with 3 columns and over 300k rows:

Col1 - ID (numeric ID, e.g. 123456)
Col2 - Numb (numeric from 1 to 25, e.g. 1)
Col3 - Yes/no (0 or 1)

I have 443 unique IDs. Each ID has more than 400 rows, and almost all of them have a different number of rows.
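Before reshaping, it helps to see how many rows each ID actually has, because that determines how far the table must be cut. A minimal sketch, using the 10-row example from the follow-up post as a stand-in for the real table and assuming the columns are named `ID` and `YN` as in that example:

```r
# Example data from the follow-up post, standing in for the real 300k-row table
dat <- data.frame(
  ID = c("23456", "23456", "23456", "12321", "12321", "12321", "12321",
         "33333", "33333", "33333"),
  YN = c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)
)

# Number of rows per ID
counts <- table(dat$ID)
print(counts)

# Rows to keep per ID: 400, or fewer if the shortest ID has fewer than 400
n_keep <- min(400, min(counts))
print(n_keep)
```

On the real data, `range(counts)` gives the shortest and longest ID at a glance.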

How can I split the initial table into a new table laid out like this:
Col1 - ID1
Col2 - ID2
...
Col443 - ID443

and fill it with the Yes/no values from the first table? ('Numb' is not important for the moment.)

The number of rows should be 400, or the smallest number of rows among the IDs. So how do I split the table and then cut off the excess rows?
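A minimal base-R sketch of the split-and-cut step, assuming the ID column is called `ID` and the Yes/no column `YN` (as in the example further down). `split_and_cut` is a hypothetical helper name, and `max_rows = 400` caps the height. It also assumes that "cut the excess" means keeping the first rows of each ID, which the thread never confirms. The values are split by ID, each group is cut to its first n values (n being the smaller of 400 and the shortest group), and the groups are bound side by side as one column per ID.

```r
# Hypothetical helper: one column per ID, each holding that ID's first n values
split_and_cut <- function(dat, id_col = "ID", value_col = "YN", max_rows = 400) {
  # Keep IDs in order of first appearance (split() on plain strings would sort them)
  ids <- factor(dat[[id_col]], levels = unique(dat[[id_col]]))
  groups <- split(dat[[value_col]], ids)

  # Height of the result: max_rows, or fewer if some ID is shorter
  n <- min(max_rows, lengths(groups))

  # Cut every group to its first n values, then bind them as columns
  cut_groups <- lapply(groups, head, n)
  out <- as.data.frame(cut_groups)
  names(out) <- paste0("id", names(groups))
  out
}

# Usage on the 10-row example from the thread
dat <- data.frame(
  ID = c("23456", "23456", "23456", "12321", "12321", "12321", "12321",
         "33333", "33333", "33333"),
  YN = c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)
)
res <- split_and_cut(dat)
print(res)
```

Setting the factor levels from `unique()` keeps the columns in the order the IDs first appear; without it, `split()` would order them by sorted ID. Rows within each ID keep their original order, so "first n" means the first n rows as they appear in the table.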

natus vincere

I realized that col2 'Numb' is not important to me at the moment, so here is a smaller example with 2 columns and 10 rows:

Code:
id <- c("23456","23456","23456","12321","12321","12321","12321","33333","33333","33333")
yn <- c(1,0,1,1,0,1,0,0,1,0)
dat <- data.frame(ID = id, YN = yn)
As you can see, there are 4 rows for "12321". This covers the situation where I need to cut the excess (in this example, the excess is the last row of "12321").

Desired output:
Code:
id23456 <- c(1,0,1)
id12321 <- c(0,0,1)
id33333 <- c(1,1,0)
output <- data.frame(id23456,id12321,id33333)

bryangoodrich

I have no idea how you derived your output. You want column id33333 to be (1, 1, 0) when we observe it in the data as (0, 1, 0) for the only three rows it has. It is entirely unclear how you want to "trim" off the extra observation from 12321 unless your logic is simply to say "only keep the first 3," in this case.
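The "only keep the first 3" reading can be checked directly against the posted output. A minimal sketch, assuming that rule and the example data above: it builds the first-n table from `dat` and compares it column by column with the posted `output`. Under that rule only id23456 matches; id33333 comes out as (0, 1, 0), and id12321 comes out as (1, 0, 1) rather than the posted (0, 0, 1).

```r
id <- c("23456", "23456", "23456", "12321", "12321", "12321", "12321",
        "33333", "33333", "33333")
yn <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0)
dat <- data.frame(ID = id, YN = yn)

# Output as posted in the question
output <- data.frame(id23456 = c(1, 0, 1), id12321 = c(0, 0, 1), id33333 = c(1, 1, 0))

# Assumed rule: keep the first n rows of each ID, n = size of the smallest ID
n <- min(table(dat$ID))
first_n <- sapply(unique(dat$ID), function(i) head(dat$YN[dat$ID == i], n))
colnames(first_n) <- paste0("id", unique(dat$ID))

# TRUE where a posted column equals the first-n column for that ID
same <- sapply(names(output), function(col) identical(output[[col]], first_n[, col]))
print(same)
```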