+ Reply to Thread
Results 1 to 10 of 10

Thread: how to split one column in multiple columns in r

  1. #1
    Points: 891, Level: 15
    Level completed: 91%, Points required for next Level: 9

    Posts
    17
    Thanks
    0
    Thanked 0 Times in 0 Posts

    how to split one column in multiple columns in r




    hi,

    I have the following data,
    Code: 
    A2M	-0.313797611
    A2M	0.74108697
    A2M	-0.364874718
    A2M	0.296491814
    A2M	-0.387285569
    A2M	0.58645548
    A2M	-0.602393199
    A2M	-1.028994508
    A2M	-0.639690915
    A2M	-0.200383697
    A2M	0.716309005
    A2M	-0.918101251
    A2M	0.338130889
    A2M	0.794278207
    A2M	0.340621333
    A2M	-0.164470116
    A2M	-0.179730772
    A2M	-0.282785167
    A2M	0.111060937
    A2M	0.323171455
    A2M	-0.983622105
    A2M	-0.703279954
    A2M	0.446365681
    A2M	-0.0790684
    A2M	0.033109818
    A2M	0.123290778
    A2M	-0.353328667
    A2M	0.222040441
    A2M	0.009743741
    A2M	1.259814381
    A2M	0.126658822
    A2M	0.079823417
    A2M	1.187102777
    A2M	0.705666413
    A2M	0.349977506
    A2M	0.370446498
    A2M	-0.078942467
    A2M	0.826843134
    A2M	0.813806661
    A2M	-0.169470638
    A2M	0.335505461
    A2M	-0.291090074
    A2M	0.130483916
    A2M	0.258792613
    A2M	-0.206964672
    A2M	-1.537886764
    A2M	0.854443443
    A2M	-1.23375232
    A2M	-0.361646964
    A2M	-0.417354449
    A2M	-0.151858043
    A2M	-0.595485683
    A2M	-0.73087692
    A2M	0.651077765
    A2M	0.427290243
    A2M	0.516756606
    A2M	-0.074519507
    A2M	-0.473491156
    A2M	0.437234877
    A2M	-0.789485322
    A2M	0.590813423
    A2M	0.688658755
    A2M	-0.714729493
    A2M	-0.6099811
    A2M	0.295760274
    A2M	0.588860795
    A2M	-0.38344449
    A2M	-0.843796526
    A2M	0.215596617
    A2M	-0.448474206
    A2M	0.217355929
    A2M	0.14365884
    A2M	0.934852022
    A2M	-0.322471338
    A2M	0.385949847
    A2M	1.17701988
    A2M	-0.695221311
    A2M	-0.009045642
    A2M	1.113590106
    A2M	-0.683036999
    A2M	-0.795579495
    A2M	0.268273176
    A2M	-0.341739905
    A2M	0.753384369
    A2M	-0.186471306
    A2M	-0.092962405
    A2M	0.282793802
    A2M	-0.483610412
    A2M	0.561542691
    A2M	0.313364097
    A2M	-0.628507423
    A2M	-0.549814535
    A2M	0.153410275
    A2M	-0.193158313
    A2M	0.079435613
    A2M	-0.602862062
    A2M	-0.188641552
    A2M	0.028923295
    A2M	-0.145861501
    A2M	0.13042126
    A2M	0.555372579
    A2M	-0.689738429
    A2M	-0.064350801
    A2M	-0.886627682
    A2M	0.814279716
    A2M	-0.941199359
    A2M	-0.082379005
    A2M	0.45676621
    A2M	1.184100597
    A2M	-1.345883539
    A2M	-0.305608664
    A2M	-0.360846747
    A2M	1.004787999
    A2M	-0.172306421
    A2M	-0.867563256
    A2M	0.782943453
    A2M	-2.552063229
    A2M	-0.633916987
    A2M	0.023359565
    A2M	0.401063589
    A2M	0.0934295
    A2M	0.65714858
    A2M	0.034826908
    A2M	0.713854053
    A2M	0.229106261
    A2M	0.749949745
    A2M	-0.074935837
    A2M	0.791341126
    A2M	0.40757936
    A2M	-0.155949177
    A2M	0.314645211
    A2M	-0.184689825
    A1CF	-0.066518823
    A1CF	-0.076164533
    A1CF	-0.058999836
    A1CF	-0.0679747
    A1CF	-0.076860028
    A1CF	-0.06135315
    A1CF	-0.061624411
    A1CF	-0.081115458
    A1CF	-0.08523997
    A1CF	0.211715126
    A1CF	-0.08602535
    A1CF	-0.042362118
    A1CF	-0.056677192
    A1CF	-0.063599964
    A1CF	-0.095028642
    A1CF	-0.082031521
    A1CF	-0.084107425
    A1CF	-0.031540488
    A1CF	-0.068662465
    A1CF	-0.013709687
    A1CF	-0.068581723
    A1CF	0.095439448
    A1CF	0.293280537
    A1CF	-0.07462896
    A1CF	0.267898594
    A1CF	-0.071840492
    A1CF	-0.081406388
    A1CF	-0.032998799
    A1CF	-0.066942158
    A1CF	-0.036771126
    A1CF	-0.080686577
    A1CF	-0.066789495
    A1CF	-0.069306588
    A1CF	-0.050656096
    A1CF	-0.076479928
    A1CF	-0.083887628
    A1CF	-0.07586613
    A1CF	-0.085638567
    A1CF	-0.114187752
    A1CF	0.170634834
    A1CF	-0.071070307
    A1CF	-0.046956267
    A1CF	0.502321101
    A1CF	-0.100290932
    A1CF	0.69965247
    A1CF	-0.070323226
    A1CF	-0.092735523
    A1CF	0.369057401
    A1CF	-0.08532937
    A1CF	-0.09352188
    A1CF	-0.081690552
    A1CF	-0.093774419
    A1CF	-0.052445884
    A1CF	-0.087612703
    A1CF	-0.071129736
    A1CF	0.257383552
    A1CF	0.228446124
    A1CF	-0.062003193
    A1CF	-0.093154006
    A1CF	-0.068835749
    A1CF	-0.06276307
    A1CF	-0.08584694
    A1CF	-0.069152579
    A1CF	0.198715668
    A1CF	-0.071575836
    A1CF	-0.083500603
    A1CF	0.251047517
    A1CF	-0.028920391
    A1CF	-0.054123471
    A1CF	-0.096850942
    A1CF	-0.081389765
    A1CF	-0.079794681
    A1CF	0.092662751
    A1CF	1.873492978
    A1CF	-0.070063675
    A1CF	-0.073719965
    A1CF	-0.077648695
    A1CF	-0.070695531
    A1CF	-0.111552432
    A1CF	-0.070776914
    A1CF	-0.018577144
    A1CF	0.201302145
    A1CF	-0.092715077
    A1CF	-0.077598728
    A1CF	-0.081782482
    A1CF	-0.086311525
    A1CF	0.325538096
    A1CF	-0.039729133
    A1CF	-0.032485056
    A1CF	-0.031482781
    A1CF	0.173166573
    A1CF	-0.04526459
    A1CF	-0.07335161
    A1CF	0.279482595
    A1CF	-0.067194336
    A1CF	-0.050690888
    A1CF	0.498827196
    A1CF	-0.018500578
    A1CF	-0.081691457
    A1CF	-0.071656961
    A1CF	-0.082933316
    A1CF	0.211077073
    A1CF	-0.089698955
    A1CF	-0.068877329
    A1CF	-0.075181519
    A1CF	-0.103403929
    A1CF	-0.059945474
    A1CF	-0.112948656
    A1CF	-0.064598875
    A1CF	-0.083215029
    A1CF	-0.076862345
    A1CF	-0.051212042
    A1CF	-0.101639629
    A1CF	-0.089998533
    A1CF	-0.069779107
    A1CF	0.603192736
    A1CF	-0.071350544
    A1CF	-0.064539386
    A1CF	-0.09029606
    A1CF	-0.058140787
    A1CF	-0.083317915
    A1CF	-0.058547979
    A1CF	-0.066852463
    A1CF	-0.055958555
    A1CF	-0.067938668
    A1CF	-0.059358528
    A1CF	-0.084283403
    A1CF	-0.074622995
    A1CF	-0.05692619
    A1CF	-0.043628531
    A1CF	-0.075165947
    A1CF	-0.038566069
    I wanted to make it in 2 columns
    Code: 
    A2M	-0.313797611	A1CF	-0.066518823
    A2M	0.74108697	A1CF	-0.076164533
    A2M	-0.364874718	A1CF	-0.058999836
    A2M	0.296491814	A1CF	-0.0679747
    A2M	-0.387285569	A1CF	-0.076860028
    A2M	0.58645548	A1CF	-0.06135315
    A2M	-0.602393199	A1CF	-0.061624411
    A2M	-1.028994508	A1CF	-0.081115458
    A2M	-0.639690915	A1CF	-0.08523997
    A2M	-0.200383697	A1CF	0.211715126
    A2M	0.716309005	A1CF	-0.08602535
    A2M	-0.918101251	A1CF	-0.042362118
    A2M	0.338130889	A1CF	-0.056677192
    A2M	0.794278207	A1CF	-0.063599964
    A2M	0.340621333	A1CF	-0.095028642
    A2M	-0.164470116	A1CF	-0.082031521
    A2M	-0.179730772	A1CF	-0.084107425
    A2M	-0.282785167	A1CF	-0.031540488
    A2M	0.111060937	A1CF	-0.068662465
    A2M	0.323171455	A1CF	-0.013709687
    A2M	-0.983622105	A1CF	-0.068581723
    A2M	-0.703279954	A1CF	0.095439448
    A2M	0.446365681	A1CF	0.293280537
    A2M	-0.0790684	A1CF	-0.07462896
    A2M	0.033109818	A1CF	0.267898594
    A2M	0.123290778	A1CF	-0.071840492
    A2M	-0.353328667	A1CF	-0.081406388
    A2M	0.222040441	A1CF	-0.032998799
    A2M	0.009743741	A1CF	-0.066942158
    A2M	1.259814381	A1CF	-0.036771126
    A2M	0.126658822	A1CF	-0.080686577
    A2M	0.079823417	A1CF	-0.066789495
    A2M	1.187102777	A1CF	-0.069306588
    A2M	0.705666413	A1CF	-0.050656096
    A2M	0.349977506	A1CF	-0.076479928
    A2M	0.370446498	A1CF	-0.083887628
    A2M	-0.078942467	A1CF	-0.07586613
    A2M	0.826843134	A1CF	-0.085638567
    A2M	0.813806661	A1CF	-0.114187752
    A2M	-0.169470638	A1CF	0.170634834
    A2M	0.335505461	A1CF	-0.071070307
    A2M	-0.291090074	A1CF	-0.046956267
    A2M	0.130483916	A1CF	0.502321101
    A2M	0.258792613	A1CF	-0.100290932
    A2M	-0.206964672	A1CF	0.69965247
    A2M	-1.537886764	A1CF	-0.070323226
    A2M	0.854443443	A1CF	-0.092735523
    A2M	-1.23375232	A1CF	0.369057401
    A2M	-0.361646964	A1CF	-0.08532937
    A2M	-0.417354449	A1CF	-0.09352188
    A2M	-0.151858043	A1CF	-0.081690552
    A2M	-0.595485683	A1CF	-0.093774419
    A2M	-0.73087692	A1CF	-0.052445884
    A2M	0.651077765	A1CF	-0.087612703
    A2M	0.427290243	A1CF	-0.071129736
    A2M	0.516756606	A1CF	0.257383552
    A2M	-0.074519507	A1CF	0.228446124
    A2M	-0.473491156	A1CF	-0.062003193
    A2M	0.437234877	A1CF	-0.093154006
    A2M	-0.789485322	A1CF	-0.068835749
    A2M	0.590813423	A1CF	-0.06276307
    A2M	0.688658755	A1CF	-0.08584694
    A2M	-0.714729493	A1CF	-0.069152579
    A2M	-0.6099811	A1CF	0.198715668
    A2M	0.295760274	A1CF	-0.071575836
    A2M	0.588860795	A1CF	-0.083500603
    A2M	-0.38344449	A1CF	0.251047517
    A2M	-0.843796526	A1CF	-0.028920391
    A2M	0.215596617	A1CF	-0.054123471
    A2M	-0.448474206	A1CF	-0.096850942
    A2M	0.217355929	A1CF	-0.081389765
    A2M	0.14365884	A1CF	-0.079794681
    A2M	0.934852022	A1CF	0.092662751
    A2M	-0.322471338	A1CF	1.873492978
    A2M	0.385949847	A1CF	-0.070063675
    A2M	1.17701988	A1CF	-0.073719965
    A2M	-0.695221311	A1CF	-0.077648695
    A2M	-0.009045642	A1CF	-0.070695531
    A2M	1.113590106	A1CF	-0.111552432
    A2M	-0.683036999	A1CF	-0.070776914
    A2M	-0.795579495	A1CF	-0.018577144
    A2M	0.268273176	A1CF	0.201302145
    A2M	-0.341739905	A1CF	-0.092715077
    A2M	0.753384369	A1CF	-0.077598728
    A2M	-0.186471306	A1CF	-0.081782482
    A2M	-0.092962405	A1CF	-0.086311525
    A2M	0.282793802	A1CF	0.325538096
    A2M	-0.483610412	A1CF	-0.039729133
    A2M	0.561542691	A1CF	-0.032485056
    A2M	0.313364097	A1CF	-0.031482781
    A2M	-0.628507423	A1CF	0.173166573
    A2M	-0.549814535	A1CF	-0.04526459
    A2M	0.153410275	A1CF	-0.07335161
    A2M	-0.193158313	A1CF	0.279482595
    A2M	0.079435613	A1CF	-0.067194336
    A2M	-0.602862062	A1CF	-0.050690888
    A2M	-0.188641552	A1CF	0.498827196
    A2M	0.028923295	A1CF	-0.018500578
    A2M	-0.145861501	A1CF	-0.081691457
    A2M	0.13042126	A1CF	-0.071656961
    A2M	0.555372579	A1CF	-0.082933316
    A2M	-0.689738429	A1CF	0.211077073
    A2M	-0.064350801	A1CF	-0.089698955
    A2M	-0.886627682	A1CF	-0.068877329
    A2M	0.814279716	A1CF	-0.075181519
    A2M	-0.941199359	A1CF	-0.103403929
    A2M	-0.082379005	A1CF	-0.059945474
    A2M	0.45676621	A1CF	-0.112948656
    A2M	1.184100597	A1CF	-0.064598875
    A2M	-1.345883539	A1CF	-0.083215029
    A2M	-0.305608664	A1CF	-0.076862345
    A2M	-0.360846747	A1CF	-0.051212042
    A2M	1.004787999	A1CF	-0.101639629
    A2M	-0.172306421	A1CF	-0.089998533
    A2M	-0.867563256	A1CF	-0.069779107
    A2M	0.782943453	A1CF	0.603192736
    A2M	-2.552063229	A1CF	-0.071350544
    A2M	-0.633916987	A1CF	-0.064539386
    A2M	0.023359565	A1CF	-0.09029606
    A2M	0.401063589	A1CF	-0.058140787
    A2M	0.0934295	A1CF	-0.083317915
    A2M	0.65714858	A1CF	-0.058547979
    A2M	0.034826908	A1CF	-0.066852463
    A2M	0.713854053	A1CF	-0.055958555
    A2M	0.229106261	A1CF	-0.067938668
    A2M	0.749949745	A1CF	-0.059358528
    A2M	-0.074935837	A1CF	-0.084283403
    A2M	0.791341126	A1CF	-0.074622995
    A2M	0.40757936	A1CF	-0.05692619
    A2M	-0.155949177	A1CF	-0.043628531
    A2M	0.314645211	A1CF	-0.075165947
    A2M	-0.184689825	A1CF	-0.038566069
    can the answer be general to more than 2 columns

  2. #2
    Omega Contributor
    Points: 38,374, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    6,998
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: how to split one column in multiple columns in r

    How do you decide which values get placed in the same row? I get you are just unconcatenating, but is there a rhyme or reason for the orderings when you have no ID number?
    Stop cowardice, ban guns!

  3. #3
    Points: 891, Level: 15
    Level completed: 91%, Points required for next Level: 9

    Posts
    17
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: how to split one column in multiple columns in r

    I decide when starting a new gene name so in this example A2M is one gene, the next gene is A1CF
    also you can say I can separate them every 132 rows so for example the first A2M start from 1 to 132 then the next one from 133 up to 264
    Last edited by Marwah Soliman; 10-05-2017 at 01:34 PM.

  4. #4
    Probably A Mammal
    Points: 31,087, Level: 100
    Level completed: 0%, Points required for next Level: 0
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,564
    Thanks
    398
    Thanked 618 Times in 551 Posts

    Re: how to split one column in multiple columns in r

    You can cbind (column bind) tables together. They're expected to have the same number of rows.

    Code: 
    x <- ... your data ...
    x <- cbind(
        x[x[, 1] == "A2M", ], 
        x[x[, 1] == "A1CF", ])
    Not sure why you would do this to your data though. Do you want to widen the table for display purposes? Then it might make more sense to have 2 columns with the header being the gene name (i.e., each column is a gene's vector of data). There is no reason I can think of to have 2 genes names on the same row in your table.

    Code: 
    x <- data.frame(
        A2M = x[x[, 1] == "A2M", 2], 
        A1CF = x[x[, 1] == "A1CF", 2])
    Here you filter out just the 2nd column (data values) for each of the genes and make a data frame (table) where each column is just those gene data values.

    Also, put column names (headers) on your tables. This helps label what we call a table's schema (structure).

    Your original data frame has the headers

    (gene, value)

    And you want us to make something like

    (gene.A2M, value.A2M, gene.A1CF, value.A1Cf)

    Which is a terrible structure. My solution would instead be

    (A2M, A1CF)

    Where the contents of the data frame are just the values. There are also functions that would take the original schema and "spread" the gene variable (which may have more than 2 genes) to the columns for us. This is efficient, since you wouldn't have to change your code every time your data changes and you have different genes in your data frame. Conceptually, my point still stands. The 2nd schema makes no sense to work with and there are no clean ways to take your original schema and generate the table if you have varying numbers of genes in your data frame. I'm talking about building a general solution, not one specific to this data set that has 2 genes with a 132 rows per gene. Yes, you can build a solution specific to that, but it wouldn't make any sense to do that programmatically. You might as well manually move things around in Excel.

    So

    Code: 
    library(tidyr)  # For spreading semantics
    x <- ... read in your data ...
    names(x) <- c("gene", "value")
    x <- spread(x, gene, value)  # Spread the gene values to columns, filling with the value variable
    # schema: Gene1, Gene2, ..., GeneN for N genes
    # unit of measure (definition of row): gene value (whatever values here represent)
    You should definitely use jQuery. It's really great and does all things.

  5. #5
    Points: 891, Level: 15
    Level completed: 91%, Points required for next Level: 9

    Posts
    17
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: how to split one column in multiple columns in r

    Basically I have 10565 genes each gene with 132 rows , so as you said I want the header for each values be the gene names instead of having them in columns next to the values, how can I achieve this

  6. #6
    Probably A Mammal
    Points: 31,087, Level: 100
    Level completed: 0%, Points required for next Level: 0
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,564
    Thanks
    398
    Thanked 618 Times in 551 Posts

    Re: how to split one column in multiple columns in r

    I updated the code using tidyr's spread function to do that, but I'm still skeptical that this approach makes sense. It depends on what you're trying to do. I'm not against having a 132x10565 dimension table, but it still begs the question how you're going to work with this data. You want your data structure to efficiently be processed by what you're going to do with that data.

    I can make arguments for why and when you would want to keep the original (gene, value) schema (visualization, for instance), or widening to the columns-of-genes data frame (such as per-gene computations), or converting this to a large matrix object whether 132x10565 or 10565x132. It just depends what you're trying to do with the data.
    You should definitely use jQuery. It's really great and does all things.

  7. #7
    Omega Contributor
    Points: 38,374, Level: 100
    Level completed: 0%, Points required for next Level: 0
    hlsmith's Avatar
    Location
    Not Ames, IA
    Posts
    6,998
    Thanks
    398
    Thanked 1,186 Times in 1,147 Posts

    Re: how to split one column in multiple columns in r

    Agreed BG. I would have to imagine that some machine just kicks out the data like this, and that no sane person would be so devious to just stack a bunch of data together.


    OP, what do you now plan to do with these data? For example any vector arithmetic or comparative analyses? Do you still not care the ordering of the observations?
    Stop cowardice, ban guns!

  8. #8
    Points: 891, Level: 15
    Level completed: 91%, Points required for next Level: 9

    Posts
    17
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: how to split one column in multiple columns in r

    I will do eQTL and the data takes this structure
    That is why I want each column represent a gene then I will transpose it
    So each row is gene name and then 132 values

  9. #9
    Points: 891, Level: 15
    Level completed: 91%, Points required for next Level: 9

    Posts
    17
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: how to split one column in multiple columns in r

    Code: 
    library(tidyr)  # For spreading semantics
    x <- ... read in your data ...
    names(x) <- c("gene", "value")
    x <- spread(x, gene, value)  # Spread the gene values to columns, filling with the value variable
    # schema: Gene1, Gene2, ..., GeneN for N genes
    # unit of measure (definition of row): gene value (whatever values here represent)
    it gave me error Error: Duplicate identifiers for rows (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,.....

  10. #10
    Probably A Mammal
    Points: 31,087, Level: 100
    Level completed: 0%, Points required for next Level: 0
    bryangoodrich's Avatar
    Location
    Sacramento, California, United States
    Posts
    2,564
    Thanks
    398
    Thanked 618 Times in 551 Posts

    Re: how to split one column in multiple columns in r


    I have no idea what you tried to run since my code wasn't designed to be copy-and-paste, much less on which command gave an error. It also makes no sense to project your data to column vectors by gene just to transpose it. Easy solution

    Code: 
    x <- read.csv(...whatever...)
    names(x) <- c("gene", "value")
    x <- matrix(x$value, ncol = 132, byrow = TRUE)  # Assumes your data is in the correct order by gene
    You should definitely use jQuery. It's really great and does all things.

+ Reply to Thread

           




Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts






Advertise on Talk Stats