makePWMEmpiricalBackground {PWMEnrich} | R Documentation |
Make a background appropriate for empirical P-value calculation. The provided set of background sequences is concatenated into a single long sequence, which is then scanned with the motifs, and the raw scores are saved. The resulting object can be very large.
makePWMEmpiricalBackground(bg.seq, motifs, bg.pseudo.count = 1, bg.source = "", verbose = TRUE, ...)
bg.seq |
a set of background sequences, either a list of DNAString objects or a DNAStringSet object |
motifs |
a set of motifs, either a list of frequency matrices, or a list of PWM objects. If frequency matrices are given, the background distribution is fitted from bg.seq. |
bg.pseudo.count |
the pseudo count which is shared between nucleotides when frequency matrices are given |
bg.source |
a free-form textual description of how the background was generated |
verbose |
whether to produce verbose output |
... |
currently unused (accepted for convenience when called from the makeBackground function) |
For reliable P-value calculation, the size of the background set needs to be at least seq.len / min.P.value. For instance, to get P-values at a resolution of 0.001 for a single sequence of 500bp, we would need a background of at least 500/0.001 = 500kb. This ensures that we can draw 1000 independent 500bp samples from the background to properly estimate the P-value. For a group of sequences, take seq.len to be the total length of all sequences in the group.
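The size rule above can be checked with a few lines of base R. This is only a sketch: the helper name min_background_size is hypothetical and is not part of PWMEnrich. It assumes sequence lengths are given in bp and that the lengths of a group are simply summed.

```r
# Hypothetical helper (not a PWMEnrich function): minimum background size in bp
# needed to resolve P-values down to min.p.value.
# seq.len may be a single length or a vector of lengths for a group of
# sequences; a group is treated as one sequence of the total length.
min_background_size <- function(seq.len, min.p.value) {
  stopifnot(is.numeric(seq.len), all(seq.len > 0),
            is.numeric(min.p.value), length(min.p.value) == 1,
            min.p.value > 0, min.p.value <= 1)
  sum(seq.len) / min.p.value
}

# Single 500bp sequence at a resolution of 0.001: 500,000 bp (500kb)
single <- min_background_size(500, 0.001)

# Group of three sequences (500, 300 and 200bp) at the same resolution:
# total length 1000bp, so 1,000,000 bp (1Mb)
group <- min_background_size(c(500, 300, 200), 0.001)
```

Comparing the result with the total width of bg.seq before building the background gives a quick indication of whether the requested P-value resolution is achievable.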
## Not run: 
if(require("PWMEnrich.Dmelanogaster.background")){
   data(MotifDb.Dmel.PFM)

   # make empirical background by saving raw scores for each bp in the sequence - this can be very large in memory!
   if(require("BSgenome.Dmelanogaster.UCSC.dm3"))
     makePWMEmpiricalBackground(Dmelanogaster$upstream2000[1:100], MotifDb.Dmel.PFM)
}
## End(Not run)