enrich_motifs {universalmotif}R Documentation

Enrich for input motifs in a set of sequences.

Description

Given a set of target and background sequences, test if the input motifs are significantly enriched in the targets sequences relative to the background sequences.

Usage

enrich_motifs(motifs, sequences, bkg.sequences, search.mode = "hits",
  max.p = 1e-05, max.q = 1e-05, max.e = 0.001, qval.method = "fdr",
  positional.test = "t.test", threshold = 0.001,
  threshold.type = "pvalue", verbose = 1, RC = FALSE, use.freq = 1,
  shuffle.k = 2, shuffle.method = "euler",
  shuffle.leftovers = "asis", return.scan.results = FALSE,
  progress = TRUE, BP = FALSE)

Arguments

motifs

See convert_motifs() for acceptable motif formats.

sequences

XStringSet Alphabet should match motif.

bkg.sequences

XStringSet Optional; if missing, shuffle_sequences() is used to create background sequences from the input sequences.

search.mode

character(1) One of c('hits', 'positional', 'both'). See details.

max.p

numeric(1) P-value threshold.

max.q

numeric(1) Adjusted P-value threshold. This is only useful if multiple motifs are being enriched for.

max.e

numeric(1). The E-value is calculated by multiplying the adjusted P-value with the number of input motifs times two (McLeay and Bailey 2010).

qval.method

character(1) See stats::p.adjust().

positional.test

character(1) One of c('t.test', 'wilcox.test', 'chisq.test', 'shapiro.test'). If using the Shapiro test for normality, then only the input sequences are tested for positionality; the background sequences are ignored. See stats::t.test(), stats::wilcox.test(), stats::chisq.test(), stats::shapiro.test().

threshold

numeric(1) Between 1 and 0. See scan_sequences().

threshold.type

character(1) One of c('logodds', 'pvalue'). See scan_sequences().

verbose

numeric(1) 0 for no output, 4 for max verbosity.

RC

logical(1) Whether to consider the reverse complement of the sequences. Only available for DNAStringSet, RNAStringSet sequences.

use.freq

numeric(1) If the multifreq slot of the motifs are filled, then they can be used to scan the sequences. See scan_sequences().

shuffle.k

numeric(1) The k-let size to use when shuffling input sequences. Only used if no background sequences are input. See shuffle_sequences().

shuffle.method

character(1) One of c('euler', 'markov', 'linear', 'random'). See shuffle_sequences(). The 'random' method is deprecated and will be removed in the next minor version.

shuffle.leftovers

character(1) One of c('asis', 'first', 'split', 'discard'). Only used if shuffle.method = 'random'. See shuffle_sequences().

return.scan.results

logical(1) Return output from scan_sequences(). For large jobs, leaving this as FALSE can save a small amount time by preventing construction of the complete results data.frame from scan_sequences().

progress

logical(1) Show progress. Note recommended if BP = TRUE. Set to FALSE if verbose = 0

BP

logical(1) Allows the use of BiocParallel within enrich_motifs(). See BiocParallel::register() to change the default backend. Setting BP = TRUE is only recommended for exceptionally large jobs (be wary of memory usage however, as enrich_motifs() does not try and limit itself in this regard). Furthermore, the behaviour of progress = TRUE is changed if BP = TRUE; the default BiocParallel progress bar will be shown (which unfortunately is much less informative).

Details

To find enriched motifs, scan_sequences() is run on both target and background sequences. If search.mode = 'hits', stats::fisher.test() is run to test for enrichment. If search.mode = 'positional', then the test as set by positional.test is run to check for positional differences between target and background sequences. However if positional.test = 'shapiro.test', then only target sequence hits are considered.

Value

data.frame Motif enrichment results. The resulting data.frame contains the following columns: * motif Motif name. * total.seq.hits Total number of matches across all target sequences. * num.seqs.hits Number of target sequences which contain matches. * num.seqs.total Number of target sequences. * total.bkg.hits Total number of matches across all background sequences. * num.bkg.hits Number of background sequences which contain matches. * num.bkg.total Number of background sequences. * Pval.hits P-value of enrichment. Only shown if search.mode = c('hits', 'both'). * Qval.hits Q-val of enrichment. Only shown if search.mode = c('hits', 'both'). * Eval.hits E-val of enrichment. Only shown if search.mode = c('hits', 'both'). * Pval.pos P-value of positional comparison. Only shown if search.mode = c('positional', 'both'). * Qval.pos Q-value of positional comparison. Only shown if search.mode = c('positional', 'both'). * Eval.pos E-value of positional comparison. Only shown if search.mode = c('positional', 'both').

Author(s)

Benjamin Jean-Marie Tremblay b2tremblay@uwaterloo.ca

References

McLeay R, Bailey T (2010). “Motif Enrichment Analysis: A unified framework and method evaluation.” BMC Bioinformatics, 11.

See Also

scan_sequences(), shuffle_sequences(), add_multifreq(), motif_pvalue()

Examples

data(ArabidopsisPromoters)
data(ArabidopsisMotif)
enrich_motifs(ArabidopsisMotif, ArabidopsisPromoters, threshold = 0.01)


[Package universalmotif version 1.2.1 Index]