how to split one column in multiple columns in r

#1
hi,

I have the following data,
Code:
A2M	-0.313797611
A2M	0.74108697
A2M	-0.364874718
A2M	0.296491814
A2M	-0.387285569
A2M	0.58645548
A2M	-0.602393199
A2M	-1.028994508
A2M	-0.639690915
A2M	-0.200383697
A2M	0.716309005
A2M	-0.918101251
A2M	0.338130889
A2M	0.794278207
A2M	0.340621333
A2M	-0.164470116
A2M	-0.179730772
A2M	-0.282785167
A2M	0.111060937
A2M	0.323171455
A2M	-0.983622105
A2M	-0.703279954
A2M	0.446365681
A2M	-0.0790684
A2M	0.033109818
A2M	0.123290778
A2M	-0.353328667
A2M	0.222040441
A2M	0.009743741
A2M	1.259814381
A2M	0.126658822
A2M	0.079823417
A2M	1.187102777
A2M	0.705666413
A2M	0.349977506
A2M	0.370446498
A2M	-0.078942467
A2M	0.826843134
A2M	0.813806661
A2M	-0.169470638
A2M	0.335505461
A2M	-0.291090074
A2M	0.130483916
A2M	0.258792613
A2M	-0.206964672
A2M	-1.537886764
A2M	0.854443443
A2M	-1.23375232
A2M	-0.361646964
A2M	-0.417354449
A2M	-0.151858043
A2M	-0.595485683
A2M	-0.73087692
A2M	0.651077765
A2M	0.427290243
A2M	0.516756606
A2M	-0.074519507
A2M	-0.473491156
A2M	0.437234877
A2M	-0.789485322
A2M	0.590813423
A2M	0.688658755
A2M	-0.714729493
A2M	-0.6099811
A2M	0.295760274
A2M	0.588860795
A2M	-0.38344449
A2M	-0.843796526
A2M	0.215596617
A2M	-0.448474206
A2M	0.217355929
A2M	0.14365884
A2M	0.934852022
A2M	-0.322471338
A2M	0.385949847
A2M	1.17701988
A2M	-0.695221311
A2M	-0.009045642
A2M	1.113590106
A2M	-0.683036999
A2M	-0.795579495
A2M	0.268273176
A2M	-0.341739905
A2M	0.753384369
A2M	-0.186471306
A2M	-0.092962405
A2M	0.282793802
A2M	-0.483610412
A2M	0.561542691
A2M	0.313364097
A2M	-0.628507423
A2M	-0.549814535
A2M	0.153410275
A2M	-0.193158313
A2M	0.079435613
A2M	-0.602862062
A2M	-0.188641552
A2M	0.028923295
A2M	-0.145861501
A2M	0.13042126
A2M	0.555372579
A2M	-0.689738429
A2M	-0.064350801
A2M	-0.886627682
A2M	0.814279716
A2M	-0.941199359
A2M	-0.082379005
A2M	0.45676621
A2M	1.184100597
A2M	-1.345883539
A2M	-0.305608664
A2M	-0.360846747
A2M	1.004787999
A2M	-0.172306421
A2M	-0.867563256
A2M	0.782943453
A2M	-2.552063229
A2M	-0.633916987
A2M	0.023359565
A2M	0.401063589
A2M	0.0934295
A2M	0.65714858
A2M	0.034826908
A2M	0.713854053
A2M	0.229106261
A2M	0.749949745
A2M	-0.074935837
A2M	0.791341126
A2M	0.40757936
A2M	-0.155949177
A2M	0.314645211
A2M	-0.184689825
A1CF	-0.066518823
A1CF	-0.076164533
A1CF	-0.058999836
A1CF	-0.0679747
A1CF	-0.076860028
A1CF	-0.06135315
A1CF	-0.061624411
A1CF	-0.081115458
A1CF	-0.08523997
A1CF	0.211715126
A1CF	-0.08602535
A1CF	-0.042362118
A1CF	-0.056677192
A1CF	-0.063599964
A1CF	-0.095028642
A1CF	-0.082031521
A1CF	-0.084107425
A1CF	-0.031540488
A1CF	-0.068662465
A1CF	-0.013709687
A1CF	-0.068581723
A1CF	0.095439448
A1CF	0.293280537
A1CF	-0.07462896
A1CF	0.267898594
A1CF	-0.071840492
A1CF	-0.081406388
A1CF	-0.032998799
A1CF	-0.066942158
A1CF	-0.036771126
A1CF	-0.080686577
A1CF	-0.066789495
A1CF	-0.069306588
A1CF	-0.050656096
A1CF	-0.076479928
A1CF	-0.083887628
A1CF	-0.07586613
A1CF	-0.085638567
A1CF	-0.114187752
A1CF	0.170634834
A1CF	-0.071070307
A1CF	-0.046956267
A1CF	0.502321101
A1CF	-0.100290932
A1CF	0.69965247
A1CF	-0.070323226
A1CF	-0.092735523
A1CF	0.369057401
A1CF	-0.08532937
A1CF	-0.09352188
A1CF	-0.081690552
A1CF	-0.093774419
A1CF	-0.052445884
A1CF	-0.087612703
A1CF	-0.071129736
A1CF	0.257383552
A1CF	0.228446124
A1CF	-0.062003193
A1CF	-0.093154006
A1CF	-0.068835749
A1CF	-0.06276307
A1CF	-0.08584694
A1CF	-0.069152579
A1CF	0.198715668
A1CF	-0.071575836
A1CF	-0.083500603
A1CF	0.251047517
A1CF	-0.028920391
A1CF	-0.054123471
A1CF	-0.096850942
A1CF	-0.081389765
A1CF	-0.079794681
A1CF	0.092662751
A1CF	1.873492978
A1CF	-0.070063675
A1CF	-0.073719965
A1CF	-0.077648695
A1CF	-0.070695531
A1CF	-0.111552432
A1CF	-0.070776914
A1CF	-0.018577144
A1CF	0.201302145
A1CF	-0.092715077
A1CF	-0.077598728
A1CF	-0.081782482
A1CF	-0.086311525
A1CF	0.325538096
A1CF	-0.039729133
A1CF	-0.032485056
A1CF	-0.031482781
A1CF	0.173166573
A1CF	-0.04526459
A1CF	-0.07335161
A1CF	0.279482595
A1CF	-0.067194336
A1CF	-0.050690888
A1CF	0.498827196
A1CF	-0.018500578
A1CF	-0.081691457
A1CF	-0.071656961
A1CF	-0.082933316
A1CF	0.211077073
A1CF	-0.089698955
A1CF	-0.068877329
A1CF	-0.075181519
A1CF	-0.103403929
A1CF	-0.059945474
A1CF	-0.112948656
A1CF	-0.064598875
A1CF	-0.083215029
A1CF	-0.076862345
A1CF	-0.051212042
A1CF	-0.101639629
A1CF	-0.089998533
A1CF	-0.069779107
A1CF	0.603192736
A1CF	-0.071350544
A1CF	-0.064539386
A1CF	-0.09029606
A1CF	-0.058140787
A1CF	-0.083317915
A1CF	-0.058547979
A1CF	-0.066852463
A1CF	-0.055958555
A1CF	-0.067938668
A1CF	-0.059358528
A1CF	-0.084283403
A1CF	-0.074622995
A1CF	-0.05692619
A1CF	-0.043628531
A1CF	-0.075165947
A1CF	-0.038566069
I wanted to make it in 2 columns
Code:
A2M	-0.313797611	A1CF	-0.066518823
A2M	0.74108697	A1CF	-0.076164533
A2M	-0.364874718	A1CF	-0.058999836
A2M	0.296491814	A1CF	-0.0679747
A2M	-0.387285569	A1CF	-0.076860028
A2M	0.58645548	A1CF	-0.06135315
A2M	-0.602393199	A1CF	-0.061624411
A2M	-1.028994508	A1CF	-0.081115458
A2M	-0.639690915	A1CF	-0.08523997
A2M	-0.200383697	A1CF	0.211715126
A2M	0.716309005	A1CF	-0.08602535
A2M	-0.918101251	A1CF	-0.042362118
A2M	0.338130889	A1CF	-0.056677192
A2M	0.794278207	A1CF	-0.063599964
A2M	0.340621333	A1CF	-0.095028642
A2M	-0.164470116	A1CF	-0.082031521
A2M	-0.179730772	A1CF	-0.084107425
A2M	-0.282785167	A1CF	-0.031540488
A2M	0.111060937	A1CF	-0.068662465
A2M	0.323171455	A1CF	-0.013709687
A2M	-0.983622105	A1CF	-0.068581723
A2M	-0.703279954	A1CF	0.095439448
A2M	0.446365681	A1CF	0.293280537
A2M	-0.0790684	A1CF	-0.07462896
A2M	0.033109818	A1CF	0.267898594
A2M	0.123290778	A1CF	-0.071840492
A2M	-0.353328667	A1CF	-0.081406388
A2M	0.222040441	A1CF	-0.032998799
A2M	0.009743741	A1CF	-0.066942158
A2M	1.259814381	A1CF	-0.036771126
A2M	0.126658822	A1CF	-0.080686577
A2M	0.079823417	A1CF	-0.066789495
A2M	1.187102777	A1CF	-0.069306588
A2M	0.705666413	A1CF	-0.050656096
A2M	0.349977506	A1CF	-0.076479928
A2M	0.370446498	A1CF	-0.083887628
A2M	-0.078942467	A1CF	-0.07586613
A2M	0.826843134	A1CF	-0.085638567
A2M	0.813806661	A1CF	-0.114187752
A2M	-0.169470638	A1CF	0.170634834
A2M	0.335505461	A1CF	-0.071070307
A2M	-0.291090074	A1CF	-0.046956267
A2M	0.130483916	A1CF	0.502321101
A2M	0.258792613	A1CF	-0.100290932
A2M	-0.206964672	A1CF	0.69965247
A2M	-1.537886764	A1CF	-0.070323226
A2M	0.854443443	A1CF	-0.092735523
A2M	-1.23375232	A1CF	0.369057401
A2M	-0.361646964	A1CF	-0.08532937
A2M	-0.417354449	A1CF	-0.09352188
A2M	-0.151858043	A1CF	-0.081690552
A2M	-0.595485683	A1CF	-0.093774419
A2M	-0.73087692	A1CF	-0.052445884
A2M	0.651077765	A1CF	-0.087612703
A2M	0.427290243	A1CF	-0.071129736
A2M	0.516756606	A1CF	0.257383552
A2M	-0.074519507	A1CF	0.228446124
A2M	-0.473491156	A1CF	-0.062003193
A2M	0.437234877	A1CF	-0.093154006
A2M	-0.789485322	A1CF	-0.068835749
A2M	0.590813423	A1CF	-0.06276307
A2M	0.688658755	A1CF	-0.08584694
A2M	-0.714729493	A1CF	-0.069152579
A2M	-0.6099811	A1CF	0.198715668
A2M	0.295760274	A1CF	-0.071575836
A2M	0.588860795	A1CF	-0.083500603
A2M	-0.38344449	A1CF	0.251047517
A2M	-0.843796526	A1CF	-0.028920391
A2M	0.215596617	A1CF	-0.054123471
A2M	-0.448474206	A1CF	-0.096850942
A2M	0.217355929	A1CF	-0.081389765
A2M	0.14365884	A1CF	-0.079794681
A2M	0.934852022	A1CF	0.092662751
A2M	-0.322471338	A1CF	1.873492978
A2M	0.385949847	A1CF	-0.070063675
A2M	1.17701988	A1CF	-0.073719965
A2M	-0.695221311	A1CF	-0.077648695
A2M	-0.009045642	A1CF	-0.070695531
A2M	1.113590106	A1CF	-0.111552432
A2M	-0.683036999	A1CF	-0.070776914
A2M	-0.795579495	A1CF	-0.018577144
A2M	0.268273176	A1CF	0.201302145
A2M	-0.341739905	A1CF	-0.092715077
A2M	0.753384369	A1CF	-0.077598728
A2M	-0.186471306	A1CF	-0.081782482
A2M	-0.092962405	A1CF	-0.086311525
A2M	0.282793802	A1CF	0.325538096
A2M	-0.483610412	A1CF	-0.039729133
A2M	0.561542691	A1CF	-0.032485056
A2M	0.313364097	A1CF	-0.031482781
A2M	-0.628507423	A1CF	0.173166573
A2M	-0.549814535	A1CF	-0.04526459
A2M	0.153410275	A1CF	-0.07335161
A2M	-0.193158313	A1CF	0.279482595
A2M	0.079435613	A1CF	-0.067194336
A2M	-0.602862062	A1CF	-0.050690888
A2M	-0.188641552	A1CF	0.498827196
A2M	0.028923295	A1CF	-0.018500578
A2M	-0.145861501	A1CF	-0.081691457
A2M	0.13042126	A1CF	-0.071656961
A2M	0.555372579	A1CF	-0.082933316
A2M	-0.689738429	A1CF	0.211077073
A2M	-0.064350801	A1CF	-0.089698955
A2M	-0.886627682	A1CF	-0.068877329
A2M	0.814279716	A1CF	-0.075181519
A2M	-0.941199359	A1CF	-0.103403929
A2M	-0.082379005	A1CF	-0.059945474
A2M	0.45676621	A1CF	-0.112948656
A2M	1.184100597	A1CF	-0.064598875
A2M	-1.345883539	A1CF	-0.083215029
A2M	-0.305608664	A1CF	-0.076862345
A2M	-0.360846747	A1CF	-0.051212042
A2M	1.004787999	A1CF	-0.101639629
A2M	-0.172306421	A1CF	-0.089998533
A2M	-0.867563256	A1CF	-0.069779107
A2M	0.782943453	A1CF	0.603192736
A2M	-2.552063229	A1CF	-0.071350544
A2M	-0.633916987	A1CF	-0.064539386
A2M	0.023359565	A1CF	-0.09029606
A2M	0.401063589	A1CF	-0.058140787
A2M	0.0934295	A1CF	-0.083317915
A2M	0.65714858	A1CF	-0.058547979
A2M	0.034826908	A1CF	-0.066852463
A2M	0.713854053	A1CF	-0.055958555
A2M	0.229106261	A1CF	-0.067938668
A2M	0.749949745	A1CF	-0.059358528
A2M	-0.074935837	A1CF	-0.084283403
A2M	0.791341126	A1CF	-0.074622995
A2M	0.40757936	A1CF	-0.05692619
A2M	-0.155949177	A1CF	-0.043628531
A2M	0.314645211	A1CF	-0.075165947
A2M	-0.184689825	A1CF	-0.038566069
can the answer be general to more than 2 columns
 

hlsmith

Omega Contributor
#2
How do you decide which values get placed in the same row? I get you are just unconcatenating, but is there a rhyme or reason for the orderings when you have no ID number?
 
#3
I decide when starting a new gene name so in this example A2M is one gene, the next gene is A1CF
also you can say I can separate them every 132 rows so for example the first A2M start from 1 to 132 then the next one from 133 up to 264
 
Last edited:

bryangoodrich

Probably A Mammal
#4
You can cbind (column bind) tables together. They're expected to have the same number of rows.

Code:
x <- ... your data ...
x <- cbind(
    x[x[, 1] == "A2M", ], 
    x[x[, 1] == "A1CF", ])
Not sure why you would do this to your data though. Do you want to widen the table for display purposes? Then it might make more sense to have 2 columns with the header being the gene name (i.e., each column is a gene's vector of data). There is no reason I can think of to have 2 genes names on the same row in your table.

Code:
x <- data.frame(
    A2M = x[x[, 1] == "A2M", 2], 
    A1CF = x[x[, 1] == "A1CF", 2])
Here you filter out just the 2nd column (data values) for each of the genes and make a data frame (table) where each column is just those gene data values.

Also, put column names (headers) on your tables. This helps label what we call a table's schema (structure).

Your original data frame has the headers

(gene, value)

And you want us to make something like

(gene.A2M, value.A2M, gene.A1CF, value.A1Cf)

Which is a terrible structure. My solution would instead be

(A2M, A1CF)

Where the contents of the data frame are just the values. There are also functions that would take the original schema and "spread" the gene variable (which may have more than 2 genes) to the columns for us. This is efficient, since you wouldn't have to change your code every time your data changes and you have different genes in your data frame. Conceptually, my point still stands. The 2nd schema makes no sense to work with and there are no clean ways to take your original schema and generate the table if you have varying numbers of genes in your data frame. I'm talking about building a general solution, not one specific to this data set that has 2 genes with a 132 rows per gene. Yes, you can build a solution specific to that, but it wouldn't make any sense to do that programmatically. You might as well manually move things around in Excel.

So

Code:
library(tidyr)  # For spreading semantics
x <- ... read in your data ...
names(x) <- c("gene", "value")
x <- spread(x, gene, value)  # Spread the gene values to columns, filling with the value variable
# schema: Gene1, Gene2, ..., GeneN for N genes
# unit of measure (definition of row): gene value (whatever values here represent)
 
#5
Basically I have 10565 genes each gene with 132 rows , so as you said I want the header for each values be the gene names instead of having them in columns next to the values, how can I achieve this
 

bryangoodrich

Probably A Mammal
#6
I updated the code using tidyr's spread function to do that, but I'm still skeptical that this approach makes sense. It depends on what you're trying to do. I'm not against having a 132x10565 dimension table, but it still begs the question how you're going to work with this data. You want your data structure to efficiently be processed by what you're going to do with that data.

I can make arguments for why and when you would want to keep the original (gene, value) schema (visualization, for instance), or widening to the columns-of-genes data frame (such as per-gene computations), or converting this to a large matrix object whether 132x10565 or 10565x132. It just depends what you're trying to do with the data.
 

hlsmith

Omega Contributor
#7
Agreed BG. I would have to imagine that some machine just kicks out the data like this, and that no sane person would be so devious to just stack a bunch of data together.


OP, what do you now plan to do with these data? For example any vector arithmetic or comparative analyses? Do you still not care the ordering of the observations?
 
#8
I will do eQTL and the data takes this structure
That is why I want each column represent a gene then I will transpose it
So each row is gene name and then 132 values
 
#9
Code:
library(tidyr)  # For spreading semantics
x <- ... read in your data ...
names(x) <- c("gene", "value")
x <- spread(x, gene, value)  # Spread the gene values to columns, filling with the value variable
# schema: Gene1, Gene2, ..., GeneN for N genes
# unit of measure (definition of row): gene value (whatever values here represent)
it gave me error Error: Duplicate identifiers for rows (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,.....
 

bryangoodrich

Probably A Mammal
#10
I have no idea what you tried to run since my code wasn't designed to be copy-and-paste, much less on which command gave an error. It also makes no sense to project your data to column vectors by gene just to transpose it. Easy solution

Code:
x <- read.csv(...whatever...)
names(x) <- c("gene", "value")
x <- matrix(x$value, ncol = 132, byrow = TRUE)  # Assumes your data is in the correct order by gene