getClusters {wavClusteR} | R Documentation |
Identifies clusters using either the mini-rank norm (MRN) algorithm (the default,
recommended for highest sensitivity) or a continuous wavelet transform
(CWT)-based approach. The MRN algorithm thresholds background coverage
differences and finds the optimal cluster boundaries by exhaustively
evaluating all putative clusters with a rank-based approach. It has higher
sensitivity and runs approximately 10-fold faster than the CWT-based
algorithm. The CWT-based approach, maintained for compatibility with
wavClusteR, computes the CWT on a 1 kb window of the coverage function
centered at each high-confidence substitution site and identifies cluster
boundaries by extending away from peak positions.
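The MRN description above rests on two steps that come before any scoring: thresholding coverage differences to get candidate boundaries, and exhaustively enumerating the putative clusters that contain a substitution site. The toy base-R sketch below illustrates only those two steps. It is not the wavClusteR implementation: the helper names `candidate_boundaries()` and `putative_clusters()` are hypothetical, treating `threshold` as a minimum absolute coverage change between neighboring positions is an assumption, and the rank-based MRN score used to pick the optimal cluster is not reproduced.

```r
# Hypothetical helpers: a toy sketch, NOT the wavClusteR implementation.
# Assumption: a candidate boundary is a position where coverage changes by at
# least `threshold` between neighboring positions.
candidate_boundaries <- function(coverage_vec, threshold) {
  d <- diff(coverage_vec)                 # d[i] = coverage_vec[i + 1] - coverage_vec[i]
  list(
    starts = which(d >= threshold) + 1,   # coverage rises: a cluster may start at i + 1
    ends   = which(-d >= threshold)       # coverage falls: a cluster may end at i
  )
}

# Enumerate every (start, end) pair of candidate boundaries enclosing `site`.
# A real method would score each pair (wavClusteR uses a rank-based MRN score,
# not reproduced here) and keep the best one.
putative_clusters <- function(coverage_vec, site, threshold) {
  b <- candidate_boundaries(coverage_vec, threshold)
  starts <- b$starts[b$starts <= site]
  ends <- b$ends[b$ends >= site]
  if (length(starts) == 0 || length(ends) == 0) {
    return(data.frame(start = numeric(0), end = numeric(0)))
  }
  grid <- expand.grid(start = starts, end = ends)
  grid[order(grid$start, grid$end), , drop = FALSE]
}

# Toy coverage vector with a substitution site at position 5
toy_cov <- c(2, 2, 9, 10, 12, 11, 3, 2, 8, 8, 2)
print(putative_clusters(toy_cov, site = 5, threshold = 5))
# putative clusters [3, 6] and [3, 10]
```

Enumerating all boundary pairs grows quadratically with the number of candidates around a site, which is why a cheap per-pair score matters in practice.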
getClusters(highConfSub, coverage, sortedBam, method = 'mrn', cores = 1, threshold, step = 1, snr = 3)
| Argument | Description |
|---|---|
| highConfSub | GRanges object containing high-confidence substitution sites, as returned by the getHighConfSub function. |
| coverage | An RleList object containing the coverage at each genomic position, as returned by a call to coverage on the aligned reads. |
| sortedBam | A GRanges object containing all aligned reads, including read sequence (qseq) and MD tag (MD), as returned by the readSortedBam function. |
| method | A character string, either "mrn" or "cwt", to compute clusters using the mini-rank norm or the wavelet transform-based algorithm, respectively. Default is "mrn" (recommended). |
| cores | Integer, the number of cores to be used for parallel evaluation. Default is 1. |
| threshold | Numeric, used if method = "mrn": the threshold applied to background coverage differences. It has no default value. |
| step | Numeric, if … Default is 1. |
| snr | Numeric, used if method = "cwt": the signal-to-noise ratio required for peak detection on the CWT. Default is 3. |
GRanges object containing the identified cluster boundaries.
Clusters returned by this function must be further merged by the function
filterClusters, which also computes all relevant cluster statistics.
Federico Comoglio and Cem Sievers
Constantine W and Percival D (2011) wmtsa: Wavelet Methods for Time Series Analysis. R package, http://CRAN.R-project.org/package=wmtsa
Sievers C, Schlumpf T, Sawarkar R, Comoglio F and Paro R (2012) Mixture models and wavelet transforms reveal high confidence RNA-protein interaction sites in MOV10 PAR-CLIP data. Nucleic Acids Res 40(20):e160. doi: 10.1093/nar/gks697
Comoglio F, Sievers C and Paro R (2015) Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data. BMC Bioinformatics 16:32.
getHighConfSub, filterClusters
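library(wavClusteR)

# Read the example BAM file bundled with the package
filename <- system.file("extdata", "example.bam", package = "wavClusteR")
example <- readSortedBam(filename = filename)

# Count all substitutions (minimum coverage 10), then keep high-confidence T>C sites
countTable <- getAllSub(example, minCov = 10, cores = 1)
highConfSub <- getHighConfSub(countTable, supportStart = 0.2, supportEnd = 0.7, substitution = "TC")

# Coverage of the aligned reads, then cluster identification with the MRN algorithm
coverage <- coverage(example)
clusters <- getClusters(highConfSub = highConfSub, coverage = coverage, sortedBam = example,
                        method = 'mrn', cores = 1, threshold = 2)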