kmerKLPlot-methods {qrqc} | R Documentation |
kmerKLPlot calls calcKL, which calculates the Kullback-Leibler
divergence between the k-mer distribution at each position and the
k-mer distribution across all positions. kmerKLPlot then plots each
k-mer's contribution to the total K-L divergence as stacked bars, for
a subset of the k-mers. Since there are 4^k possible k-mers for a
given k, plotting all of them often obscures the interpretation;
however, one can set n.kmers to a number greater than the number of
possible k-mers to force kmerKLPlot to plot the entire K-L divergence
and all terms (one per k-mer) in the sum.
If x is a list, the K-L k-mer plots are faceted by sample; this
allows comparison to a FASTA file of random reads.
Again, please note that this is not the total K-L divergence,
but rather the K-L divergence calculated on a subset of the sample
space (the top n.kmers k-mers selected).
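The relationship between the per-k-mer terms, the total divergence, and a top-n subset can be sketched with toy numbers. This is a minimal illustration, not the calcKL implementation: the counts are hypothetical, and ranking terms by size is only an assumed stand-in for how the top n.kmers k-mers are chosen.

```r
# Toy k-mer counts at one read position and across all positions
# (hypothetical numbers for illustration; not taken from qrqc).
pos.counts <- c(AAA = 40, ACG = 25, GGT = 20, TTC = 10, CAT = 5)
all.counts <- c(AAA = 200, ACG = 220, GGT = 190, TTC = 210, CAT = 180)

# Convert counts to probability distributions.
p <- pos.counts / sum(pos.counts)   # positional distribution
q <- all.counts / sum(all.counts)   # distribution across all positions

# One term of the K-L sum per k-mer, in bits (log base 2).
kl.terms <- p * log2(p / q)

# The total divergence is the sum of all terms.
kl.total <- sum(kl.terms)

# Keep only the n largest terms (assumed selection rule, for illustration).
n.kmers <- 3
top.terms <- sort(kl.terms, decreasing = TRUE)[seq_len(n.kmers)]
kl.subset <- sum(top.terms)
```

With these numbers, kl.subset differs from kl.total because the two omitted terms are negative, which is why a subset plot should not be read as the total divergence.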
kmerKLPlot(x, n.kmers=20)
x |
an S4 object of a class that inherits from |
n.kmers |
an integer value indicating the number of top k-mers to include. |
signature(x = "SequenceSummary")
kmerKLPlot will plot the K-L divergence for a subset of k-mers for a
single object that inherits from SequenceSummary.
signature(x = "list")
kmerKLPlot will plot the K-L divergence for a subset of k-mers for
each of the objects that inherit from SequenceSummary in the list and
display them in a series of panels.
The K-L divergence calculation in calcKL uses base 2 in the log; the
units are bits.
Also, note that ggplot2 warns that "Stacking is not well defined when
ymin != 0". This occurs when some k-mers are less frequent in the
positional distribution than in the distribution across all positions,
so that the corresponding term of the K-L sum is negative (producing a
bar below zero). This does not appear to affect the plot much. In the
examples below, warnings are suppressed, but given that this is a valid
concern from ggplot2, warnings are not suppressed in the function
itself.
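Why some bars fall below zero can be seen from the sign of each term. The following is a small sketch with made-up 2-mer probabilities (assumptions for illustration only, not output from calcKL): a term is negative exactly when the positional probability is smaller than the overall probability, while the full sum remains non-negative.

```r
# Hypothetical 2-mer probabilities (illustrative values only).
p <- c(AA = 0.50, AC = 0.30, GT = 0.15, TT = 0.05)  # at one position
q <- c(AA = 0.25, AC = 0.25, GT = 0.25, TT = 0.25)  # across all positions

# Per-k-mer terms of the K-L sum, in bits.
terms <- p * log2(p / q)

# Terms are negative exactly where p < q; these become bars below zero.
negative <- terms < 0
below.zero <- names(terms)[negative]

# The sum over all k-mers is still non-negative.
total <- sum(terms)
```

Here below.zero contains the two k-mers that are rarer at this position than overall, and these are the bars that would trigger the stacking warning.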
Vince Buffalo <vsbuffalo@ucdavis.edu>
getKmer, calcKL, kmerEntropyPlot
## Load a somewhat contaminated FASTQ file
s.fastq <- readSeqFile(system.file('extdata', 'test.fastq',
    package='qrqc'), hash.prop=1)

## Load a really contaminated FASTQ file
s.contam.fastq <- readSeqFile(system.file('extdata',
    'test-contam.fastq', package='qrqc'), hash.prop=1)

## Load a random (equal base frequency) FASTA file
s.random.fasta <- readSeqFile(system.file('extdata',
    'random.fasta', package='qrqc'), type="fasta", hash.prop=1)

## Make K-L divergence plot - shows slight 5'-end bias. Note units
## (bits)
suppressWarnings(kmerKLPlot(s.fastq))

## Plot multiple K-L divergence plots
suppressWarnings(kmerKLPlot(list("highly contaminated"=s.contam.fastq,
    "less contaminated"=s.fastq,
    "random"=s.random.fasta)))