scoreTranscripts {transite} | R Documentation |
This function counts the binding sites in a set of sequences for all or a subset of RNA-binding protein sequence motifs and returns the result in a data frame, which is subsequently used by calculateMotifEnrichment to obtain binding site enrichment scores.
scoreTranscripts(sequences, motifs = NULL, max.hits = 5, threshold.method = "p.value", threshold.value = 0.25^6, n.cores = 1, cache = paste0(tempdir(), "/sc/"))
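The two computed defaults in the signature above can be evaluated on their own in base R. This is only a quick check of what the expressions produce; the variable names are illustrative and not part of the package. Reading 0.25^6 as the probability of one specific 6-mer under uniform base frequencies is an arithmetic observation, not a rationale stated in this documentation.

```r
# Default value of threshold.value: 0.25^6 is exactly 1/4096
default.threshold <- 0.25^6
print(default.threshold)

# Default value of cache: a subdirectory "sc" inside the
# temporary directory of the current R session
default.cache <- paste0(tempdir(), "/sc/")
print(default.cache)
```

Because the default cache path lives under tempdir(), it changes from one R session to the next.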
sequences |
character vector of named sequences (only containing upper case characters A, C, G, T), where the names are RefSeq identifiers and sequence type qualifiers ( |
motifs |
a list of motifs that is used to score the specified sequences.
If |
max.hits |
maximum number of putative binding sites per mRNA that are counted |
threshold.method |
either |
threshold.value |
semantics of the |
n.cores |
the number of cores that are used |
cache |
either logical or path to a directory where scores are cached. The scores of each motif are stored in a separate file that contains a hash table with RefSeq identifiers and sequence type qualifiers as keys and the number of putative binding sites as values.
If |
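To illustrate what a per-sequence cap such as max.hits does, here is a minimal base R sketch. It is a simplification under stated assumptions: it counts exact occurrences of a single k-mer, whereas scoreTranscripts scores motifs against a threshold, and the helper count.kmer.hits is hypothetical and not part of transite.

```r
# Hypothetical helper (not part of transite): count exact,
# possibly overlapping occurrences of a k-mer in one sequence
# and cap the count at max.hits.
count.kmer.hits <- function(sequence, kmer, max.hits = 5) {
  k <- nchar(kmer)
  n <- nchar(sequence)
  if (n < k) {
    return(0)
  }
  # extract every window of length k
  starts <- seq_len(n - k + 1)
  windows <- substring(sequence, starts, starts + k - 1)
  # count matching windows, then apply the cap
  min(sum(windows == kmer), max.hits)
}

# Toy sequences named in the "identifier|region" style
toy.set <- c(
  "NM_1_DUMMY|3UTR" = "AUUUAUUUAUUUA",
  "NM_2_DUMMY|3UTR" = "CCAGUAA"
)

# Names of the input vector carry over to the result
capped.hits <- vapply(toy.set, count.kmer.hits, numeric(1),
                      kmer = "AUUUA", max.hits = 2)
print(capped.hits)
```

The first toy sequence contains three overlapping occurrences of the k-mer, so a cap of 2 reduces its count while leaving the second sequence at zero.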
A list with three entries:
(1) df: a data frame with the following columns:
motif.id | the motif identifier that is used in the original motif library |
motif.rbps | the gene symbol of the RNA-binding protein(s) |
absolute.hits | the absolute frequency of putative binding sites per motif in all transcripts |
relative.hits | the relative, i.e., absolute divided by total, frequency of binding sites per motif in all transcripts |
total.sites | the total number of potential binding sites |
one.hit, two.hits, ... | number of transcripts with one, two, three, ... putative binding sites |
(2) total.sites: a numeric vector with the total number of potential binding sites per transcript
(3) absolute.hits: a numeric vector with the absolute (not relative) number of putative binding sites per transcript
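The relationship between the columns described above can be sketched with made-up numbers for a single motif. All values below are invented for illustration and are not output of scoreTranscripts; the variable names mirror the documented column names, and the cap of five bins is an assumption matching the default max.hits.

```r
# Invented per-transcript values for one motif (illustration only)
hits.per.transcript <- c(0, 1, 1, 2, 0, 3)   # putative binding sites
sites.per.transcript <- c(9, 8, 6, 8, 4, 5)  # potential binding sites

# absolute.hits: sum of putative binding sites over all transcripts
absolute.hits <- sum(hits.per.transcript)

# total.sites: sum of potential binding sites over all transcripts
total.sites <- sum(sites.per.transcript)

# relative.hits: absolute divided by total
relative.hits <- absolute.hits / total.sites

# one.hit, two.hits, ...: number of transcripts with exactly
# 1, 2, ..., 5 hits (tabulate ignores transcripts with 0 hits)
hit.distribution <- tabulate(hits.per.transcript, nbins = 5)

print(absolute.hits)
print(total.sites)
print(relative.hits)
print(hit.distribution)
```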
Other matrix functions: calculateMotifEnrichment, runMatrixSPMA, runMatrixTSMA, scoreTranscriptsSingleMotif
foreground.set <- c(
  "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU", "UCAUUUUAUUAAA",
  "AAUUGGUGUCUGGAUACUUCCCUGUACAU", "AUCAAAUUA", "AGAU",
  "GACACUUAAAGAUCCU", "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA",
  "AUAGAC", "AGUUC", "CCAGUAA"
)
# names are used as keys in the hash table (cached version only)
# ideally sequence identifiers (e.g., RefSeq ids) and region labels
# (e.g., 3UTR for 3'-UTR)
names(foreground.set) <- c(
  "NM_1_DUMMY|3UTR", "NM_2_DUMMY|3UTR", "NM_3_DUMMY|3UTR",
  "NM_4_DUMMY|3UTR", "NM_5_DUMMY|3UTR", "NM_6_DUMMY|3UTR",
  "NM_7_DUMMY|3UTR", "NM_8_DUMMY|3UTR", "NM_9_DUMMY|3UTR",
  "NM_10_DUMMY|3UTR", "NM_11_DUMMY|3UTR", "NM_12_DUMMY|3UTR",
  "NM_13_DUMMY|3UTR", "NM_14_DUMMY|3UTR"
)

# specific motifs, uncached
motifs <- getMotifByRBP("ELAVL1")
scores <- scoreTranscripts(foreground.set, motifs = motifs, cache = FALSE)

## Not run:
# all Transite motifs, cached (writes scores to disk)
scores <- scoreTranscripts(foreground.set)

# all Transite motifs, uncached
scores <- scoreTranscripts(foreground.set, cache = FALSE)

foreground.df <- transite:::ge$foreground1
foreground.set <- foreground.df$seq
names(foreground.set) <- paste0(foreground.df$refseq, "|",
                                foreground.df$seq.type)
scores <- scoreTranscripts(foreground.set)
## End(Not run)