findORFs {ORFik} | R Documentation |
Find all Open Reading Frames (ORFs) on the simple input sequences
in ONLY 5'- 3' direction (+), but within all three possible reading frames.
Do not use findORFs for mapping to full chromosomes,
then use findMapORFs
!
For each sequence of the input vector IRanges
with START and
STOP positions (inclusive) will be returned as
IRangesList
. Returned coordinates are relative to the
input sequences.
findORFs( seqs, startCodon = startDefinition(1), stopCodon = stopDefinition(1), longestORF = TRUE, minimumLength = 0 )
seqs |
(DNAStringSet or character vector) - DNA/RNA sequences to search
for Open Reading Frames. Can be both uppercase or lowercase. Easiest call to
get seqs if you want only regions from a fasta/fasta index pair is:
seqs = ORFik:::txSeqsFromFa(grl, faFile), where grl is a GRanges/List of
search regions and faFile is a |
startCodon |
(character vector) Possible START codons to search for.
Check |
stopCodon |
(character vector) Possible STOP codons to search for.
Check |
longestORF |
(logical) Default TRUE. Keep only the longest ORF per
unique stopcodon: (seqname, strand, stopcodon) combination, Note: Not longest
per transcript! You can also use function
|
minimumLength |
(integer) Default is 0. Which is START + STOP = 6 bp. Minimum length of ORF, without counting 3bps for START and STOP codons. For example minimumLength = 8 will result in size of ORFs to be at least START + 8*3 (bp) + STOP = 30 bases. Use this param to restrict search. |
If you want antisence strand too, do:
#positive strands
pos <- findORFs(seqs)
#negative strands (DNAStringSet only if character)
neg <- findORFs(reverseComplement(DNAStringSet(seqs)))
relist(c(GRanges(pos, strand = "+"), GRanges(neg, strand = "-")),
skeleton = merge(pos, neg))
(IRangesList) of ORFs locations by START and STOP sites grouped by input sequences. In a list of sequences, only the indices of the sequences that had ORFs will be returned, e.g. 3 sequences where only 1 and 3 has ORFs, will return size 2 IRangesList with names c("1", "3"). If there are a total of 0 ORFs, an empty IRangesList will be returned.
Other findORFs:
findMapORFs()
,
findORFsFasta()
,
findUORFs()
,
startDefinition()
,
stopDefinition()
## Simple examples findORFs("ATGTAA") findORFs("ATGTTAA") # not in frame anymore findORFs("ATGATGTAA") # only longest of two above findORFs("ATGATGTAA", longestORF = FALSE) # two ORFs findORFs(c("ATGTAA", "ATGATGTAA")) # 1 ORF per transcript ## Get DNA sequences from ORFs seq <- DNAStringSet(c("ATGTAA", "AAA", "ATGATGTAA")) names(seq) <- c("tx1", "tx2", "tx3") orfs <- findORFs(seq, longestORF = FALSE) # you can get sequences like this: gr <- unlist(orfs, use.names = TRUE) gr <- GRanges(seqnames = names(seq)[as.integer(names(gr))], ranges(gr), strand = "+") # Give them some proper names: names(gr) <- paste0("ORF_", seq.int(length(gr)), "_", seqnames(gr)) orf_seqs <- getSeq(seq, gr) orf_seqs # Save as .fasta (orf_seqs must be of type DNAStringSet) # writeXStringSet(orf_seqs, "orfs.fasta") ## Reading from file and find ORFs #findORFs(readDNAStringSet("path/to/transcripts.fasta"))