clusterCTSS {CAGEr} | R Documentation |
Clusters individual CAGE transcription start sites (CTSSs) along the genome into tag clusters (TCs) using a specified ab initio method, or assigns them to predefined genomic regions.
clusterCTSS(object, threshold = 1, nrPassThreshold = 1,
  thresholdIsTpm = TRUE, method = c("distclu", "paraclu", "custom"),
  maxDist = 20, removeSingletons = FALSE, keepSingletonsAbove = Inf,
  minStability = 1, maxLength = 500, reduceToNonoverlapping = TRUE,
  customClusters = NULL, useMulticore = FALSE, nrCores = NULL)

## S4 method for signature 'CAGEset'
clusterCTSS(object, threshold = 1, nrPassThreshold = 1,
  thresholdIsTpm = TRUE, method = c("distclu", "paraclu", "custom"),
  maxDist = 20, removeSingletons = FALSE, keepSingletonsAbove = Inf,
  minStability = 1, maxLength = 500, reduceToNonoverlapping = TRUE,
  customClusters = NULL, useMulticore = FALSE, nrCores = NULL)

## S4 method for signature 'CAGEexp'
clusterCTSS(object, threshold = 1, nrPassThreshold = 1,
  thresholdIsTpm = TRUE, method = c("distclu", "paraclu", "custom"),
  maxDist = 20, removeSingletons = FALSE, keepSingletonsAbove = Inf,
  minStability = 1, maxLength = 500, reduceToNonoverlapping = TRUE,
  customClusters = NULL, useMulticore = FALSE, nrCores = NULL)
object |
A |
threshold, nrPassThreshold |
Ignore CTSSs with signal |
thresholdIsTpm |
Logical indicating if |
method |
Method to be used for clustering: |
maxDist |
Maximal distance between two neighbouring CTSSs for them to be part of the
same cluster. Used only when |
removeSingletons |
Logical indicating whether tag clusters containing only
one CTSS should be removed. Ignored when |
keepSingletonsAbove |
Controls which singleton tag clusters will be removed.
When |
minStability |
Minimal stability of the cluster, where stability is defined as the ratio
between the maximal and minimal density values for which this cluster is maximal scoring.
For the definition of stability, refer to Frith et al., Genome Research, 2007.
Clusters with stability |
maxLength |
Maximal length of a cluster in base pairs. Clusters with length
|
reduceToNonoverlapping |
Logical indicating whether smaller clusters contained within a bigger
cluster should be removed so that the final set of tag clusters is non-overlapping. Used only
when |
customClusters |
Genomic coordinates of predefined regions to be used to
segment the CTSSs. The format is either a |
useMulticore |
Logical, should multicore be used. |
nrCores |
Number of cores to use when |
Two ab initio methods for clustering TSSs along the genome are supported: "distclu" and "paraclu". "distclu" is an implementation of simple distance-based clustering of data attached to sequences, where two neighbouring TSSs are joined together if they are closer than some specified distance (see distclu-functions for implementation details). "paraclu" is an implementation of the Paraclu algorithm for parametric clustering of data attached to sequences, developed by M. Frith (Frith et al., Genome Research, 2007, http://cbrc3.cbrc.jp/~martin/paraclu/). Since Paraclu finds clusters within clusters (unlike distclu), additional parameters (removeSingletons, keepSingletonsAbove, minStability, maxLength and reduceToNonoverlapping) can be specified to simplify the output by discarding clusters that are too small (singletons) or too big, and to reduce the clusters to a final set of non-overlapping clusters. Clustering is done separately for every CAGE dataset within the CAGEr object, resulting in a different set of tag clusters for every CAGE dataset. TCs from different datasets can further be aggregated into a single reference set of consensus clusters by calling the aggregateTagClusters function.
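The distance-based idea behind "distclu" can be illustrated with a few lines of base R. This is only a minimal sketch, not the CAGEr implementation: the function distance_cluster and its arguments pos, score and max_dist are hypothetical names introduced here, the input is assumed to come from a single chromosome and strand, and it is assumed that two neighbouring positions are joined when their distance is at most max_dist.

```r
# Simplified distance-based clustering (illustration only, not CAGEr code).
# pos:      hypothetical CTSS coordinates on one chromosome and strand
# score:    hypothetical signal (e.g. tag count) for each CTSS
# max_dist: assumed maximal gap allowed between neighbours in one cluster
distance_cluster <- function(pos, score, max_dist = 20) {
  if (length(pos) == 0) {
    return(data.frame(cluster = integer(0), start = numeric(0),
                      end = numeric(0), nr_ctss = integer(0),
                      signal = numeric(0)))
  }
  # Sort by position so that neighbours are adjacent in the vectors.
  ord <- order(pos)
  pos <- pos[ord]
  score <- score[ord]
  # A new cluster starts at the first CTSS and wherever the gap to the
  # previous CTSS exceeds max_dist.
  cluster_id <- cumsum(c(TRUE, diff(pos) > max_dist))
  data.frame(
    cluster = unique(cluster_id),
    start   = as.vector(tapply(pos, cluster_id, min)),
    end     = as.vector(tapply(pos, cluster_id, max)),
    nr_ctss = as.vector(tapply(pos, cluster_id, length)),
    signal  = as.vector(tapply(score, cluster_id, sum))
  )
}

pos   <- c(100, 105, 118, 200, 210, 500)
score <- c(5, 20, 3, 8, 2, 50)
clusters <- distance_cluster(pos, score, max_dist = 20)
print(clusters)
```

With these toy values the six positions fall into three clusters; the last one contains a single CTSS, which is the kind of cluster the removeSingletons argument refers to.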
The slots clusteringMethod, filteredCTSSidx and tagClusters of the provided CAGEset object will be occupied by the information on the method used for clustering, the CTSSs included in the clusters and the list of tag clusters per CAGE experiment, respectively. To retrieve tag clusters for an individual CAGE dataset, use the tagClusters function.
In CAGEexp objects, the results will be stored as a GRangesList of TagClusters objects in the metadata slot tagClusters. The TagClusters object will contain a filteredCTSSidx column if appropriate. The clustering method name is saved in the metadata slot of the GRangesList.
Vanja Haberle
Frith et al. (2007) A code for transcription initiation in mammalian genomes, Genome Research 18(1):1-12, (http://www.cbrc.jp/paraclu/).
tagClusters, aggregateTagClusters and CTSSclusteringMethod.
Other CAGEr object modifiers: CTSStoGenes, CustomConsensusClusters, aggregateTagClusters, annotateCTSS, cumulativeCTSSdistribution, getCTSS, normalizeTagCount, quantilePositions, summariseChrExpr
Other CAGEr clusters functions: CTSSclusteringMethod, CTSScumulativesTagClusters, CustomConsensusClusters, aggregateTagClusters, consensusClustersDESeq2, consensusClustersGR, cumulativeCTSSdistribution, plotInterquantileWidth, quantilePositions, tagClusters
head(tagClusters(exampleCAGEset, "sample1"))
clusterCTSS( object = exampleCAGEset, threshold = 50, thresholdIsTpm = TRUE
           , nrPassThreshold = 1, method = "distclu", maxDist = 20
           , removeSingletons = TRUE, keepSingletonsAbove = 100)
head(tagClusters(exampleCAGEset, "sample1"))

clusterCTSS( exampleCAGEexp, threshold = 50, thresholdIsTpm = TRUE
           , nrPassThreshold = 1, method = "distclu", maxDist = 20
           , removeSingletons = TRUE, keepSingletonsAbove = 100)
tagClustersGR(exampleCAGEexp, "Zf.30p.dome")