get_bkg {universalmotif}    R Documentation
For a set of input sequences, calculate the overall sequence background for
any k-let size. For very large DNA/RNA sequences, use the much faster and more
efficient Biostrings::oligonucleotideFrequency() instead; at that scale,
get_bkg() is only recommended for non-DNA/RNA sequences.
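To make "k-let background" concrete, the following is a minimal base R sketch of the idea, not the package's implementation. The helper name count_klets and its arguments are hypothetical. It assumes that k-lets are counted in overlapping windows with a step of 1 (as in the oligonucleotideFrequency(seqs, 3, 1) comparison in the examples), that counts from all sequences are pooled, and that probabilities are counts divided by the total.

```r
# Hypothetical helper, not part of universalmotif: count overlapping k-lets
# in a character vector of sequences, pooling all sequences together.
count_klets <- function(seqs, k, alphabet, as.prob = TRUE) {
  # Enumerate every possible k-let over the alphabet, in sorted order.
  grid <- expand.grid(rep(list(sort(alphabet)), k), stringsAsFactors = FALSE)
  klets <- sort(do.call(paste0, grid))
  counts <- setNames(numeric(length(klets)), klets)

  for (s in seqs) {
    n <- nchar(s)
    if (n < k) next                      # sequence too short for any k-let
    starts <- seq_len(n - k + 1)         # window start positions, step of 1
    windows <- substring(s, starts, starts + k - 1)
    # Windows containing letters outside the alphabet become NA and are dropped.
    counts <- counts + as.numeric(table(factor(windows, levels = klets)))
  }

  # Assumption: probabilities are simply counts divided by the pooled total.
  if (as.prob) counts / sum(counts) else counts
}

toy <- c("ACGTAC", "GGCA")
dna <- c("A", "C", "G", "T")
toy.counts <- count_klets(toy, k = 2, alphabet = dna, as.prob = FALSE)
toy.probs  <- count_klets(toy, k = 2, alphabet = dna, as.prob = TRUE)
```

Here toy.counts is a named numeric vector with one entry per possible 2-let, including those that never occur. For real DNA/RNA input, get_bkg() or Biostrings::oligonucleotideFrequency() should be preferred over a loop like this.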
get_bkg(sequences, k = 1:3, as.prob = TRUE, pseudocount = 0,
  alphabet = NULL, to.meme = NULL, RC = FALSE, list.out = TRUE,
  nthreads = 1)
sequences |
|
k |
|
as.prob |
|
pseudocount |
|
alphabet |
|
to.meme |
If not |
RC |
|
list.out |
|
nthreads |
|
If to.meme = NULL and list.out = TRUE: a list with each entry being a
named numeric vector for every element in k. If to.meme = NULL and
list.out = FALSE: a named numeric vector. Otherwise: NULL, invisibly.
Benjamin Jean-Marie Tremblay, b2tremblay@uwaterloo.ca
Bailey T, Elkan C (1994). “Fitting a mixture model by expectation maximization to discover motifs in biopolymers.” Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, 2, 28–36.
create_sequences(), scan_sequences(), shuffle_sequences()
## Compare to Biostrings version
library(Biostrings)
seqs.DNA <- create_sequences()
bkg.DNA <- get_bkg(seqs.DNA, k = 3, as.prob = FALSE, list.out = FALSE)
bkg.DNA2 <- oligonucleotideFrequency(seqs.DNA, 3, 1, as.prob = FALSE)
bkg.DNA2 <- colSums(bkg.DNA2)
all(bkg.DNA == bkg.DNA2)

## Create a MEME background file
get_bkg(seqs.DNA, k = 1:3, to.meme = stdout(), pseudocount = 1)

## Non-DNA/RNA/AA alphabets
seqs.QWERTY <- create_sequences("QWERTY")
bkg.QWERTY <- get_bkg(seqs.QWERTY, k = 1:2)