normOffsets {csaw} | R Documentation |
Calculate normalization factors or offsets using count data from multiple libraries.
normOffsets(object, type=NULL, ..., assay.id="counts", se.out=TRUE)
normFactors(object, method="TMM", weighted=FALSE, ..., assay.id="counts", se.out=TRUE)
object | A SummarizedExperiment object containing a count matrix.
type | Deprecated; a character string indicating what type of normalization is to be performed.
method | String specifying the type of scaling normalization method to use.
weighted | A logical scalar indicating whether precision weights should be used for TMM normalization.
... | Other arguments to be passed to
assay.id | An integer scalar or string specifying the assay values to use for normalization.
se.out | A logical scalar indicating whether or not a SummarizedExperiment object should be returned. Alternatively, a SummarizedExperiment object in which normalization factors are to be stored.
The normFactors function provides a convenience wrapper for the calcNormFactors function in the edgeR package.
This uses the trimmed mean of M-values (TMM) method to remove composition biases, typically in background regions of the genome.
Precision weighting is turned off by default so as to avoid upweighting high-abundance regions.
These are more likely to be bound and thus more likely to be differentially bound.
Assigning excessive weight to such regions will defeat the purpose of trimming when normalizing the coverage of background regions.
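The trimming idea described above can be sketched in base R. This is a simplified, unweighted illustration only: the helper tmm_factor and its quantile-based trimming are assumptions made for this sketch, not the edgeR implementation, and the trimming fractions (0.3 for M-values, 0.05 for A-values) are illustrative choices.

```r
# Hypothetical helper: unweighted TMM-style scaling factor of one library
# against a reference library. Not the edgeR code, just the core idea.
tmm_factor <- function(obs, ref, trim_m = 0.3, trim_a = 0.05) {
    n_obs <- sum(obs)
    n_ref <- sum(ref)
    keep <- obs > 0 & ref > 0    # log-ratios need positive counts
    m <- log2((obs[keep] / n_obs) / (ref[keep] / n_ref))        # M-values
    a <- 0.5 * log2((obs[keep] / n_obs) * (ref[keep] / n_ref))  # A-values
    # Discard the most extreme M- and A-values before averaging.
    in_m <- m >= quantile(m, trim_m) & m <= quantile(m, 1 - trim_m)
    in_a <- a >= quantile(a, trim_a) & a <= quantile(a, 1 - trim_a)
    2^mean(m[in_m & in_a])
}

set.seed(100)
counts <- matrix(rnbinom(4000, mu = 10, size = 20), ncol = 4)

# Use the first library as the reference for every comparison.
raw <- apply(counts, 2, tmm_factor, ref = counts[, 1])

# Rescale so that the factors multiply to 1.
norm_factors <- raw / exp(mean(log(raw)))
norm_factors
```

Because every retained M-value contributes equally to the mean, high-abundance regions receive no extra weight, which mirrors the behaviour described for weighted=FALSE.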
The normOffsets function performs non-linear normalization similar to the fast loess algorithm in normalizeCyclicLoess.
This aims to account for mean dependencies in the efficiency biases between libraries.
For each sample, a lowess curve is fitted to the log-counts against the log-average count.
The fitted value for each genomic window is used as an offset in a generalized linear model for that feature and sample.
The use of the average count provides more stability than the average log-count when low counts are present for differentially bound regions.
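The trend-fitting steps above can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the csaw code: it uses stats::lowess rather than loessFit, and the pseudo-count (prior), the smoother span (f = 0.3) and the final centering of each row are choices made for this sketch.

```r
set.seed(100)
counts <- matrix(rnbinom(400, mu = 10, size = 20), ncol = 4)

prior <- 0.5    # assumed pseudo-count to avoid log(0)

# Log of the average count (not the average of the log-counts).
log_avg <- log(rowMeans(counts) + prior)

# For each sample, fit a lowess curve to the log-counts against the
# log-average count, then read off the fitted value for every window.
offsets <- apply(counts, 2, function(y) {
    fit <- lowess(log_avg, log(y + prior), f = 0.3)
    approx(fit$x, fit$y, xout = log_avg, ties = mean)$y
})

# Center each row so that the offsets are relative between samples.
offsets <- offsets - rowMeans(offsets)
head(offsets)
```

The result has one row per window and one column per sample, matching the shape of the count matrix, so each entry can serve as the offset for that feature and sample.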
Both functions expect SummarizedExperiment objects as input.
The count matrix to be used for normalization will be extracted according to the specified assay.id field.
Library sizes are extracted from object$totals.
For normFactors, a numeric vector containing the relative normalization factors for each library is computed.
This is returned directly if se.out=FALSE; otherwise it is stored in the norm.factors field of the colData of the output object.
For normOffsets with type="loess", a numeric matrix of the same dimensions as counts is computed, containing the log-based offsets for use in GLM fitting.
This is returned directly if se.out=FALSE; otherwise it is stored in the offsets assay of the output object.
If se.out=TRUE, a SummarizedExperiment is returned that contains the computed normalization factors/offsets but is otherwise identical to object.
If se.out is a SummarizedExperiment object, the normalization factors and offsets will be stored in an object that is otherwise identical to se.out.
For normFactors, the normalization factors are always computed from object.
However, if se.out is a (different) SummarizedExperiment object, these factors are stored and returned in se.out.
This is useful when se.out contains counts for windows, but the normalization factors are computed using larger bins in object.
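As an illustration of the bins-versus-windows use case, the sketch below computes factors from one object and stores them in another. The object names (bins, windows), the simulated counts and the library sizes are all made up for this example; it assumes csaw is installed and that the stored factors can be read back with the $ accessor.

```r
library(csaw)
set.seed(100)

# Hypothetical library sizes, shared by both objects so that the
# consistency check on the totals passes.
lib.sizes <- c(1e6, 1.1e6, 0.9e6, 1.2e6)

# Simulated counts for large bins (used to compute the factors).
bins <- SummarizedExperiment(list(
    counts = matrix(rnbinom(400, mu = 100, size = 20), ncol = 4)))
bins$totals <- lib.sizes

# Simulated counts for small windows (where the factors are stored).
windows <- SummarizedExperiment(list(
    counts = matrix(rnbinom(4000, mu = 10, size = 20), ncol = 4)))
windows$totals <- lib.sizes

# Factors are computed from 'bins' but returned inside 'windows'.
windows <- normFactors(bins, se.out = windows)
windows$norm.factors
```

The window counts themselves are left untouched; only the per-library factors are carried over.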
For normOffsets with type="loess", the trend fits are always computed from object.
However, if se.out is a (different) SummarizedExperiment object, the trend fits will be used to compute offsets for each entry in se.out using spline interpolation.
This is useful when se.out contains counts for windows in an endogenous genome, but the trend fits are computed using spike-in chromatin regions.
In both functions, an error is raised if the library sizes in se.out$totals are not identical to object$totals.
This is because the normalization factors (for normFactors) and average abundances (for normOffsets) are only comparable when the library sizes are the same.
Consistent library sizes can be achieved by using the same readParam object in windowCounts and related functions.
Aaron Lun
Robinson MD, Oshlack A (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11, R25.
Ballman KV, Grill DE, Oberg AL, Therneau TM (2004). Faster cyclic loess: normalizing RNA arrays via linear models. Bioinformatics 20, 2778-86.
calcNormFactors, loessFit, normalizeCyclicLoess
counts <- matrix(rnbinom(400, mu=10, size=20), ncol=4)
data <- SummarizedExperiment(list(counts=counts))
data$totals <- colSums(counts)

# TMM normalization.
normFactors(data)

# Using loess-based normalization, instead.
offsets <- normOffsets(data)
head(offsets)
offsets <- normOffsets(data, span=0.4)
offsets <- normOffsets(data, iterations=1)