I copied and pasted texts below that is found in a website (http://a-little-book-of-r-for-bioinf.../chapter7.html). But that didn't work. Instead I got "Error in .Method(..., na.last = na.last, decreasing = decreasing) : argument 1 is not a vector"
What do I need to do to get this one correct???
Code:
library("Biostrings")
s1 <- "aaaatgcagtaacccatgccc"
matchPattern("atg", s1) # Find all ATGs in the sequence s1
# Views on a 21-letter BString subject
#subject: aaaatgcagtaacccatgccc
#views:
# start end width
#[1] 4 6 3 [atg]
#[2] 16 18 3 [atg]
findPotentialStartsAndStops <- function(sequence)
{
# Define a vector with the sequences of potential start and stop codons
codons <- c("atg", "taa", "tag", "tga")
# Find the number of occurrences of each type of potential start or stop codon
for (i in 1:4)
{
codon <- codons[i]
# Find all occurrences of codon "codon" in sequence "sequence"
occurrences <- matchPattern(codon, sequence)
# Find the start positions of all occurrences of "codon" in sequence "sequence"
codonpositions <- attr(occurrences,"start")
# Find the total number of potential start and stop codons in sequence "sequence"
numoccurrences <- length(codonpositions)
if (i == 1)
{
# Make a copy of vector "codonpositions" called "positions"
positions <- codonpositions
# Make a vector "types" containing "numoccurrences" copies of "codon"
types <- rep(codon, numoccurrences)
}
else
{
# Add the vector "codonpositions" to the end of vector "positions":
positions <- append(positions, codonpositions, after=length(positions))
# Add the vector "rep(codon, numoccurrences)" to the end of vector "types":
types <- append(types, rep(codon, numoccurrences), after=length(types))
}
}
# Sort the vectors "positions" and "types" in order of position along the input sequence:
indices <- order(positions)
positions <- positions[indices]
types <- types[indices]
# Return a list variable including vectors "positions" and "types":
mylist <- list(positions,types)
return(mylist)
}
s1 <- "aaaatgcagtaacccatgccc"
findPotentialStartsAndStops(s1)
Re: Finding start and stop codons in a DNA sequence
Well whoever coded that originally isn't very good at coding "the R way" but I'll forget about that for now. The problem occurs with
Code:
codonpositions <- attr(occurrences, "start")
that doesn't work since an XStringViews object doesn't have a "start" attribute. It has a ranges attribute that has start values but really there's an easier way to get those start values... Just change that line to this:
Code:
codonpositions <- start(occurrences)
and everything should work fine. But yeah... that code could be much improved.
"His programming is malfunctioning. It begins! Get your weapons, he's going to become a killbot!!!" - bryangoodrich
Re: Finding start and stop codons in a DNA sequence
But really this would be a much nicer way to do the same task:
Code:
library("Biostrings")
findPotentialStartsAndStops <- function(sequence, codons = c("atg", "taa", "tag", "tga")){
# find the starting position of the matches for all the codons
startpositions <- sapply(codons, function(x){start(matchPattern(x, sequence))})
# combine them into a vector and sort by position
output <- sort(do.call(c, startpositions))
# If we really *need* a list
#output <- list(as.numeric(output), names(output))
return(output)
}
s1 <- "aaaatgcagtaacccatgccc"
findPotentialStartsAndStops(s1)
I personally like the results more as a named vector but if you really want the results as a list then you can just uncomment the list part of the function and you should get the same results as before - just faster... and the code is much more R like.
"His programming is malfunctioning. It begins! Get your weapons, he's going to become a killbot!!!" - bryangoodrich
Re: Finding start and stop codons in a DNA sequence
Thanks for your quick reply!
That worked
I encountered another problem with finding open reading frames using texts below. I got "numeric (0)". I would appreciate it if you could help me with this.
> findORFsinSeq <- function(sequence)
{
require(Biostrings)
# Make vectors "positions" and "types" containing information on the positions of ATGs in the sequence:
mylist <- findPotentialStartsAndStops(sequence)
positions <- mylist[[1]]
types <- mylist[[2]]
# Make vectors "orfstarts" and "orfstops" to store the predicted start and stop codons of ORFs
orfstarts <- numeric()
orfstops <- numeric()
# Make a vector "orflengths" to store the lengths of the ORFs
orflengths <- numeric()
# Print out the positions of ORFs in the sequence:
# Find the length of vector "positions"
numpositions <- length(positions)
# There must be at least one start codon and one stop codon to have an ORF.
if (numpositions >= 2)
{
for (i in 1numpositions-1))
{
posi <- positions[i]
typei <- types[i]
found <- 0
while (found == 0)
{
for (j in (i+1):numpositions)
{
posj <- positions[j]
typej <- types[j]
posdiff <- posj - posi
posdiffmod3 <- posdiff %% 3
# Add in the length of the stop codon
orflength <- posj - posi + 3
if (typei == "atg" && (typej == "taa" || typej == "tag" || typej == "tga") && posdiffmod3 == 0)
{
# Check if we have already used the stop codon at posj+2 in an ORF
numorfs <- length(orfstops)
usedstop <- -1
if (numorfs > 0)
{
for (k in 1:numorfs)
{
orfstopk <- orfstops[k]
if (orfstopk == (posj + 2)) { usedstop <- 1 }
}
}
if (usedstop == -1)
{
orfstarts <- append(orfstarts, posi, after=length(orfstarts))
orfstops <- append(orfstops, posj+2, after=length(orfstops)) # Including the stop codon.
orflengths <- append(orflengths, orflength, after=length(orflengths))
}
found <- 1
break
}
if (j == numpositions) { found <- 1 }
}
}
}
}
# Sort the final ORFs by start position:
indices <- order(orfstarts)
orfstarts <- orfstarts[indices]
orfstops <- orfstops[indices]
# Find the lengths of the ORFs that we have
orflengths <- numeric()
numorfs <- length(orfstarts)
for (i in 1:numorfs)
{
orfstart <- orfstarts[i]
orfstop <- orfstops[i]
orflength <- orfstop - orfstart + 1
orflengths <- append(orflengths,orflength,after=length(orflengths))
}
mylist <- list(orfstarts, orfstops, orflengths)
return(mylist)
}
Re: Finding start and stop codons in a DNA sequence
You should really make a separate thread for new questions. It really sounds like you just need some practice with R though so I would recommend checking out some of the resources in this thread. One other thing that would really help is if you would wrap your code in code tags like this:
[code]
a <- 2 + 2
[/code]
which gets changed to
Code:
a <- 2 + 2
which is much nicer because inside code tags indentation doesn't get stripped away.
"His programming is malfunctioning. It begins! Get your weapons, he's going to become a killbot!!!" - bryangoodrich