+ Reply to Thread
Results 1 to 12 of 12

Thread: Display completely a Character object

  1. #1
    Points: 251, Level: 5
    Level completed: 2%, Points required for next Level: 49

    Posts
    13
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Display completely a Character object



    Hi! I'm new to R and I'm stuck in a problem with a particular package but perhaps someone can help me.

    I'm using the cosmo package to find repeated subsequences in sequences of letters
    http://www.bioconductor.org/packages...tml/cosmo.html
    To run it with the dataset below I just execute 3 commands:

    >library(cosmo)
    > seqal <- system.file("Exfiles/short1.FASTA", package="cosmo")
    > res <- cosmo(seqs=seqal, constraints="None", minW=10, maxW=10, models="TCM")

    As result I get a list of motifs that I paste below but I need ALL the motifs not only the first 16, as I understand from the documentation "motifs" is a Character object how could I display it completely?

    1 1146 91 1 TTTTATTTTT 0.9797432
    2 1145 89 1 TTTTTTTTTT 0.9762943
    3 1187 87 1 TTTTTTTTTA 0.8750529
    4 1145 26 1 TTTTTTTTTA 0.8741454
    5 1196 73 -1 CAAAAAAAAA 0.8727398
    6 1196 97 -1 CAAAAAAAAA 0.8628938
    7 1250 79 -1 TAAAAAAAAA 0.8587207
    8 1157 40 -1 TAAAATAAAA 0.7922084
    9 1256 92 -1 AAAACTAAAA 0.7366366
    10 1145 102 1 TTTTTTTTAA 0.6705112
    11 1250 13 -1 TAACCAAAAA 0.6118567
    12 1114 70 -1 AAACCAAAAA 0.5864355
    13 1088 86 -1 CAAACTAAAA 0.5631808
    14 1122 73 -1 AAACAAAAAA 0.5310371
    15 1088 124 1 TATTATTTTA 0.4966246
    16 1187 8 -1 CAACAAAAAA 0.4187826



    >1068
    CACCACAACCACCACCACCCCCCACTTCATATATCAATATACAACCAACATCCTCCCTCAAACTAATCCAAAACCTCCCCCCAAAAA

    >1073
    CCAAAAAACAAATCAAAAATCCAATAATTCCACTTTTTAAATTACCTTCTCTATATATAACTCAAACAACA

    >1088
    CCCACAATCACCTCCTCCTTCTCCCTAATATACACCCCCACACCACACATCCAATCAATCATCTTTTTTTCCAAAATTTTTCCCCCAAACTAAAAACATATCCCCATACCCCCCCCCCCCCCTTATTATTTTAA

    >1114
    ACTCACACAAAATTCCCAATCTAAAATCACATTCACTATTCACTTTCTAATATTTCCTTAAAATACATAAAACCAAAAATCCTATTTTCACTACACAC

    >1122
    CAAAAACCCCAAAAACCTTTCCTTTCTTCCCCTTTAAAATAAACTCCATCAACCCTAAATAAAAATCCTCCCAAACAAAAAA

    >1145
    TTTCCTCTTTTTTTTTTTTTTCTCCTTTTTTTTTATTATTTATTATATATCTTAACATTTAATAATTACTTAATTTTATATTCTCTCCTTTTTTTTTTCTCTTTTTTTTAAATTTCCTAATTTTAATTATTTCACAATTCTAATTTCCACATTTATA



    >1146
    CCCTACTCATACTCTCCACTACCCAACACTCCTTCAATTAAATTTAATTCATTACCTTATACTACCATTAACACCCAAAAAATTTCTCACTTTTATTTTTAATTTTTCTATCAAATTAAATCCATCCATTA

    >1157
    CTCTAATTTTCATCCATTTCCCACCTTTACAACAAAACCTAAAATAAAATTTCCCAATTCAACTTCCAAACAATCTCTTAATTATCTTCCAATAATACTCATTAAAAAAATTCTCCAACCAACC

    >1161
    ACACTCATATCCAATTTATATCTCCAAAAAACTCATATAATACCCTACATTACTTATTCCCAACCACCTACAACAAAATCTCATCATTTAAA

    >1164
    TCTCCTATCTTCTTATAA

    >1187
    AAAACCACAACAAAAAAACCAAAAAACCCAAAAACTCCCCTTCTCCTTCTCTCAATAAATCATAATTCTCCTTTTCTCCTTTCTTCTTTTTTTTTATACCATAAATATACCTAATAA

    >1196
    CTCCACACACTCTACAAATCATCAACTATCCCCCAACACTCTCCCTTCCCCTTCCCCCCCCTAAACTAATCCCAAAAAAAAACCCCACCACAACCCCAAAAAAAAA

    >1206
    CTTCAAACCCCACAAACCTACATATACCTACAAAATTCACCAACAA


    >1250
    CCAACTTCATCATAACCAAAAATCCATACCCACTTAACATCCACATTATATAACAACAACCCCCAATATATACCTAATTAAAAAAAAACCACTAACAAAA

    >1256
    ACCAAAAACCCTTACAAAACTCCATAACACCTCAACAAACAAAAAAAAAAAAACCCACAAAACCCCTCAAAACACAAAAAACAATCACAAAAAAACTAAAAAAA

    >1258
    CCCTATATAATTACTACTATACTAACCCTTTCATTTAACCCTCTAATCTCCAATCTTACTACCCTACTCACACCCTCTACCTACACCCTAACTAACTCCTAATCTAACACTCCCAAATCACTA

    >1293
    TCTATCATCATATTCCTACATTATACTCACTCTCCTCCCTTCCCCTTCAACCTATTCAAATCTTCTCCCACCCCCTCCCCTCTTATTCCATTANG

    Thanks
    Ana

  2. #2
    FormerlyKnownAsRaptor
    Points: 24,442, Level: 95
    Level completed: 10%, Points required for next Level: 908
    Awards:
    Activity Award
    trinker's Avatar
    Location
    Buffalo, NY
    Posts
    3,174
    Thanks
    883
    Thanked 552 Times in 500 Posts

    Re: Display completely a Character object

    Your example isn't reproducible (for me anyway). Try
    Code: 
    str(res)
    and see what information that tells you. I suspect it's a list you can access the pieces of and only part of it is printed.
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  3. The Following User Says Thank You to trinker For This Useful Post:

    anarichardson (09-29-2012)

  4. #3
    Points: 616, Level: 12
    Level completed: 32%, Points required for next Level: 34

    Location
    US
    Posts
    26
    Thanks
    1
    Thanked 8 Times in 8 Posts

    Re: Display completely a Character object

    Try accessing the slot named motif of the returned cosmo class using the @ symbol or slot function
    Code: 
    res@motifs
    slot(res, "motif")
    but there may not be more than 16

  5. The Following User Says Thank You to copeg For This Useful Post:

    anarichardson (09-29-2012)

  6. #4
    Point Mass at Zero
    Points: 5,855, Level: 49
    Level completed: 53%, Points required for next Level: 95
    ledzep's Avatar
    Location
    Berks,UK
    Posts
    635
    Thanks
    169
    Thanked 130 Times in 128 Posts

    Re: Display completely a Character object

    I tried your example, was not reproducible for me either.
    Instead, here's one example from the vignette. Str does the trick as trinker said.
    Code: 
    seqFile <- system.file("Exfiles/seq.fasta", package="cosmo")
    res <- cosmo(seqs=seqFile, constraints="None" , minW=7, maxW=8, models=c("OOPS", "TCM"))
    Code: 
    summary(res) ## Displays the motifs
         seq pos orient    motif      prob
    1   Seq1  69      1 CGCCAGCG 1.0000000
    2   Seq6  21      1 CCCATGGG 1.0000000
    3   Seq3  86      1 CGGACGCG 1.0000000
    4  Seq10  14      1 GGCACGCG 1.0000000
    5   Seq2  35      1 CGCGCGCG 1.0000000
    6   Seq4  15      1 CCGGAGCG 1.0000000
    7   Seq8  25      1 GGGCTGCC 1.0000000
    8   Seq7  69      1 CGGGCGGG 0.9241186
    9   Seq5  79      1 GGCCGGCG 0.9022922
    10  Seq9   7      1 CGCCCGCG 0.8581486
    Code: 
    
    str(res@seqs) ## displays the whole sequence
    
     $ :List of 2
      ..$ seq : chr "GCTTCTGGTTCTATTGAAAAAATTCAGGTGGGGATGTCCAGGTGAAAAATATTTTTAGAATAGAAAAACGCCAGCGTTACATATCTTTAGGGAGAAGCGG"
      ..$ desc: chr "Seq1"
     $ :List of 2
      ..$ seq : chr "GATTGTGTTTAAAATAGATGTTTAAAGTAGCTCTCGCGCGCGCACTTTAGATGAATTTAAATGATATACATTGAGGACGATATTCTTCAAAACCGCCGCA"
    ..... etc etc
    Oh Thou Perelman! Poincare's was for you and Riemann's is for me.

  7. The Following User Says Thank You to ledzep For This Useful Post:

    anarichardson (09-29-2012)

  8. #5
    Point Mass at Zero
    Points: 5,855, Level: 49
    Level completed: 53%, Points required for next Level: 95
    ledzep's Avatar
    Location
    Berks,UK
    Posts
    635
    Thanks
    169
    Thanked 130 Times in 128 Posts

    Re: Display completely a Character object

    Just an aside, you can paste your R codes and outputs using
    [CODE] [/CODE]
    For Example
    [CODE] a1<-lm(y~x, data=mydata) [/CODE] displays
    Code: 
     a1<-lm(y~x, data=mydata)
    Or simply select your code/output and hastag them.
    Oh Thou Perelman! Poincare's was for you and Riemann's is for me.

  9. The Following 2 Users Say Thank You to ledzep For This Useful Post:

    anarichardson (09-29-2012), trinker (09-29-2012)

  10. #6
    Points: 251, Level: 5
    Level completed: 2%, Points required for next Level: 49

    Posts
    13
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Re: Display completely a Character object

    Thanks so much for your comments, I've run

    Code: 
    > str(res@motifs)
    and got this

    Code: 
    Formal class 'align' [package "cosmo"] with 6 slots
      ..@ seq   : chr [1:16] "1250" "1145" "1145" "1122" ...
      ..@ pos   : int [1:16] 79 89 101 73 56 87 26 87 73 70 ...
      ..@ orient: num [1:16] -1 1 1 -1 -1 1 1 -1 -1 -1 ...
      ..@ motif : chr [1:16] "TAAAAAAAAA" "TTTTTTTTTT" "CTTTTTTTTA" "AAACAAAAAA" ...
      ..@ prob  : num [1:16] 0.91 0.908 0.863 0.827 0.794 ...
      ..@ eval  : num 2.31e-24
    I attach the short1.txt file with my sequences, the extension must be .FASTA to reproduce my case the .txt needs to be saved as .FASTA.
    But I can't find out how could I get more than 16 motifs.
    I also tried
    Code: 
    > str(res@motifs@motif)
    and got
    Code: 
     chr [1:16] "TAAAAAAAAA" "TTTTTTTTTT" "CTTTTTTTTA" "AAACAAAAAA" "TAAATAAAAA" "TTTTTTTTTA" ...
    and I also tried
    Code: 
    >  slot(res, "motifs")
    and got
    Code: 
       seq pos orient      motif      prob
    1  1250  79     -1 TAAAAAAAAA 0.9097117
    2  1145  89      1 TTTTTTTTTT 0.9075282
    3  1145 101      1 CTTTTTTTTA 0.8628318
    4  1122  73     -1 AAACAAAAAA 0.8273926
    5  1122  56     -1 TAAATAAAAA 0.7940559
    6  1187  87      1 TTTTTTTTTA 0.7787428
    7  1145  26      1 TTTTTTTTTA 0.7719197
    8  1088  87     -1 AAACTAAAAA 0.7601947
    9  1196  73     -1 CAAAAAAAAA 0.7511989
    10 1114  70     -1 AAACCAAAAA 0.7446275
    11 1196  97     -1 CAAAAAAAAA 0.7347967
    12 1146  91      1 TTTTATTTTT 0.6645410
    13 1088 123      1 TTATTATTTT 0.6589068
    14 1256  93     -1 AAACTAAAAA 0.6422225
    15 1250  13     -1 TAACCAAAAA 0.6247923
    16 1157  40     -1 TAAAATAAAA 0.5360757
    If I run
    Code: 
    > summary(res@pwm)
    I get this matrix with non zero values in the G row, so there should be some G in the motifs so there should be more motifs other than those 16 that I don't know how to see
    Code: 
    Position weight matrix:
           1 2      3      4     5      6      7 8 9     10
    A 0.0000 0 0.0566 0.0000 0.119 0.1931 0.0000 0 0 0.2833
    C 0.0941 0 0.0000 0.0000 0.000 0.0000 0.0000 0 0 0.0000
    G 0.0000 0 0.0000 0.0281 0.000 0.1656 0.3166 0 0 0.0908
    T 0.9059 1 0.9434 0.9719 0.881 0.6413 0.6834 1 1 0.6259
    Attached Files

  11. #7
    Points: 616, Level: 12
    Level completed: 32%, Points required for next Level: 34

    Location
    US
    Posts
    26
    Thanks
    1
    Thanked 8 Times in 8 Posts

    Re: Display completely a Character object

    The PWM defines a way to score similar motifs - some motifs can score really well, others relatively poorly, and often one prefers to look at only the top scoring motifs. The contribution of the G is small in the PWM, thus most likely won't show up in any top scoring motifs. I'm guessing the cosmo function has some scoring cutoff (hence my comment above). Don't know if it has a way to alter the threshold to show lower scoring motifs, but you can investigate your sequences directly through Biostrings
    Code: 
    library(Biostrings) #Load the biostrings library
    seqs = readFASTA ( "Exfiles/short1.FASTA");#read in the Sequences
    #Analyze the first sequence with the PWM, specifying a minimum score
    m = matchPWM(res@pwm@pwm, seqs[[1]]$seq, min.score="75%")

  12. The Following User Says Thank You to copeg For This Useful Post:

    anarichardson (10-03-2012)

  13. #8
    Points: 251, Level: 5
    Level completed: 2%, Points required for next Level: 49

    Posts
    13
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Re: Display completely a Character object

    Quote Originally Posted by copeg View Post
    The PWM defines a way to score similar motifs - some motifs can score really well, others relatively poorly, and often one prefers to look at only the top scoring motifs. The contribution of the G is small in the PWM, thus most likely won't show up in any top scoring motifs. I'm guessing the cosmo function has some scoring cutoff (hence my comment above). Don't know if it has a way to alter the threshold to show lower scoring motifs, but you can investigate your sequences directly through Biostrings
    Code: 
    library(Biostrings) #Load the biostrings library
    seqs = readFASTA ( "Exfiles/short1.FASTA");#read in the Sequences
    #Analyze the first sequence with the PWM, specifying a minimum score
    m = matchPWM(res@pwm@pwm, seqs[[1]]$seq, min.score="75%")
    Thanks copeg, I think that finding that cutoff value could be the key.. I reviewed all the input parameters and cannot find it..
    That's why I'm trying to debug it but it's also confusing..

    I'm a psychologist so my background in software is limited..
    If I get it right you mean that I could use Biostrings package as another way to find motifs?

  14. #9
    Points: 616, Level: 12
    Level completed: 32%, Points required for next Level: 34

    Location
    US
    Posts
    26
    Thanks
    1
    Thanked 8 Times in 8 Posts

    Re: Display completely a Character object

    Quote Originally Posted by anarichardson View Post
    If I get it right you mean that I could use Biostrings package as another way to find motifs?
    I mean that you can use the cosmo package to find the Position Weight Matrix (PWM), then you can use the Biostrings package to search a sequence for a particular motif using a PWM (it does not look like the cosmo function has a parameter to set a threshold). Using Biostrings gives you the liberty to not only manually set the score threshold, but also search any sequence (not just those in the training set)

  15. The Following User Says Thank You to copeg For This Useful Post:

    Dason (10-03-2012)

  16. #10
    Points: 251, Level: 5
    Level completed: 2%, Points required for next Level: 49

    Posts
    13
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Re: Display completely a Character object

    But I've already found this motifs
    Code: 
    1 1146 91 1 TTTTATTTTT 0.9797432
    2 1145 89 1 TTTTTTTTTT 0.9762943
    3 1187 87 1 TTTTTTTTTA 0.8750529
    4 1145 26 1 TTTTTTTTTA 0.8741454
    5 1196 73 -1 CAAAAAAAAA 0.8727398
    6 1196 97 -1 CAAAAAAAAA 0.8628938
    7 1250 79 -1 TAAAAAAAAA 0.8587207
    8 1157 40 -1 TAAAATAAAA 0.7922084
    9 1256 92 -1 AAAACTAAAA 0.7366366
    10 1145 102 1 TTTTTTTTAA 0.6705112
    11 1250 13 -1 TAACCAAAAA 0.6118567
    12 1114 70 -1 AAACCAAAAA 0.5864355
    13 1088 86 -1 CAAACTAAAA 0.5631808
    14 1122 73 -1 AAACAAAAAA 0.5310371
    15 1088 124 1 TATTATTTTA 0.4966246
    16 1187 8 -1 CAACAAAAAA 0.4187826
    How should I search a sequence for a particular motif? which particular motif? and which sequence?
    I think that the easier way is to try to make cosmo show all the motifs and not only 16.. but I don't know how to show variables in R, for sure that the motifs I'm searching for are in some of the variables in the cosmo.R script but I don't know how to show them..
    I tried printing some messages and rebuilding the package but didn't work.

  17. #11
    Points: 616, Level: 12
    Level completed: 32%, Points required for next Level: 34

    Location
    US
    Posts
    26
    Thanks
    1
    Thanked 8 Times in 8 Posts

    Re: Display completely a Character object

    How should I search a sequence for a particular motif? which particular motif? and which sequence?
    See post #7. Does the code make sense?

    I think that the easier way is to try to make cosmo show all the motifs and not only 16.. but I don't know how to show variables in R, for sure that the motifs I'm searching for are in some of the variables in the cosmo.R script but I don't know how to show them..
    I tried printing some messages and rebuilding the package but didn't work
    Why is this easier? Did you try the code I posted above? Again, there may be more than 16 sequences, but the probabilities are going to be low (and thus chance of false positives high)

    Begin my .02 - while I don't know what you are looking for (promoters?) I'd be skeptical of TTTTTTTTTT being amongst the top hits (it has low complexity and found in a great many sequences). You seem to be convinced there are more motifs than those that cosmo reported...and while there are (defined by the PWM) their probabilities are low (hence not reported by cosmo), and the harder you look - the less restrictive you are with scoring tolerances - your chances of false positives dramatically increases. A recommendation: DUST the sequences (see the NCBI toolkit) prior to analysis (presuming cosmo accepts dusted sequences), and do some post-hoc type of analysis in which you attempt to make sure the PWM you find is enriched in the training sequences relative to another list of similar sequences.
    End My .02

  18. The Following User Says Thank You to copeg For This Useful Post:

    anarichardson (10-04-2012)

  19. #12
    Points: 251, Level: 5
    Level completed: 2%, Points required for next Level: 49

    Posts
    13
    Thanks
    7
    Thanked 0 Times in 0 Posts

    Re: Display completely a Character object


    I think I should have explained the original problem before trying to solve it.. my sequences aren't biological data, they are not DNA sequences, so in my sequences I don't have the G letter, only A,C and T.. so when I run cosmo it crashes. Then I added one G in one of my sequences so cosmo at least does not crash and I get some motifs.
    I thought that I could find all the motifs that cosmo uses for building the pwm and then delete the motifs with G (they have low probabilities) and manually create a new pwm.. but I think that this is not the best solution.. I don't even know what a promoter is.. what I'm trying to find are short sequences that appear (with higher probability) in my sequences

+ Reply to Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts








Advertise on Talk Stats