
Thread: Patterns in DNA, separating signal from noise -- outliers?

  1. #1

    Hello all,

    I'm looking at patterns in fragments of RNA. What I need to do is
    separate the signals from noise in Exons (small regions of DNA/RNA).


    Consider multiple measurements of a single exon. Across that population,
    the pattern occurs once at position 84, 25 times at position 210,
    39 times at position 245, once each at positions 336, 624 and 846,
    and so on:

    Exon identifier, patternpos count
    00000002400000001265, 00084 1
    00000002400000001265, 00210 25
    00000002400000001265, 00245 39
    00000002400000001265, 00336 1
    00000002400000001265, 00624 1
    00000002400000001265, 00846 1
    00000002400000001265, 00998 7

    It seems that the single occurrences at positions 84, 336, 624 and 846
    are just noise, and that the signal is at positions 210 and 245, and
    potentially at position 998, which has 7 occurrences of the pattern.

    At this stage I'm only interested in the first two "valid" signals. What
    is a valid method for picking the first two signals or outliers? Are these
    considered outliers?

    I've seen Z-scores used to remove (or select) outliers, but is that valid
    here? I've read that it's not appropriate to remove one outlier and then
    re-run the test on what remains.

    My question in a nutshell is what method can I use to select the first two signals
    that I can find in the list of counts?


    In another exon:
    00000003800000001913, 00858 6
    00000003800000001913, 00863 1
    00000003800000001913, 01040 34
    00000003800000001913, 01154 3
    00000003800000001913, 01313 3
    00000003800000001913, 01314 5
    00000003800000001913, 01349 7
    00000003800000001913, 01502 1

    Here I would say the counts 6, 34 and 7 (positions 858, 1040 and 1349) are signals.

    I haven't much experience in statistics and would greatly appreciate advice on this.


    Thanks in advance,

    James

  2. #2

    For anyone who's interested: I found that a simple way was to calculate the first quartile, Q1, of the counts for an exon and then discard the counts below Q1.
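
    A minimal sketch of that quartile filter in Python, using the two exons from the first post. The function name and the choice of `statistics.quantiles` with its default "exclusive" method are assumptions; the post does not say how Q1 was computed, and other quartile conventions can give a slightly different cutoff.

```python
from statistics import quantiles

def keep_at_or_above_q1(counts):
    """Return (Q1, positions kept) after discarding counts strictly below Q1.

    `counts` maps pattern position -> number of occurrences.
    Hypothetical helper; the quartile convention is an assumption.
    """
    q1 = quantiles(counts.values(), n=4)[0]  # first of the three cut points
    kept = {pos: c for pos, c in counts.items() if c >= q1}
    return q1, kept

exon_1265 = {84: 1, 210: 25, 245: 39, 336: 1, 624: 1, 846: 1, 998: 7}
exon_1913 = {858: 6, 863: 1, 1040: 34, 1154: 3,
             1313: 3, 1314: 5, 1349: 7, 1502: 1}

q1_a, kept_a = keep_at_or_above_q1(exon_1265)
q1_b, kept_b = keep_at_or_above_q1(exon_1913)
print(q1_a, sorted(kept_a))
print(q1_b, sorted(kept_b))
```

    With this convention Q1 is 1.5 for the second exon, so only the two single-occurrence positions are dropped. For the first exon Q1 is itself 1, so no count falls below it and every position is kept; whether counts equal to Q1 should also be discarded is not stated above.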

  3. #3

    Hi,
    just by looking at your numbers, you could try fitting a Poisson distribution to the counts and see where you get large deviations (large chi-squared values). Those positions could be good candidates.
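
    One way this suggestion might be sketched in Python: estimate the Poisson rate as the mean count for the exon, then score each position by its chi-squared contribution, (observed - expected)^2 / expected. The function name, the use of a single rate per exon, and the restriction to counts above the mean (so that low counts are not flagged) are all assumptions made for illustration, not part of the suggestion itself.

```python
from statistics import mean

def poisson_deviations(counts):
    """Rank positions whose count exceeds the fitted Poisson mean.

    `counts` maps pattern position -> number of occurrences.
    Returns (position, count, chi2) tuples, largest chi2 first.
    Hypothetical helper, a sketch under the assumptions stated above.
    """
    lam = mean(counts.values())  # maximum-likelihood estimate of the rate
    scored = [
        (pos, c, (c - lam) ** 2 / lam)  # per-position chi-squared term
        for pos, c in counts.items()
        if c > lam                      # only excesses count as candidates
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)

exon_1265 = {84: 1, 210: 25, 245: 39, 336: 1, 624: 1, 846: 1, 998: 7}
exon_1913 = {858: 6, 863: 1, 1040: 34, 1154: 3,
             1313: 3, 1314: 5, 1349: 7, 1502: 1}

top_two = [pos for pos, _, _ in poisson_deviations(exon_1265)[:2]]
candidates_1913 = [pos for pos, _, _ in poisson_deviations(exon_1913)]
print(top_two)          # [245, 210]
print(candidates_1913)  # [1040]
```

    Because the large counts pull the mean upwards, the fitted rate is inflated by the very signals being looked for. On the second exon that leaves only position 1040 above the mean, so a more robust rate estimate (for example the median) may be worth trying.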

    regards
