normOffsets {csaw}    R Documentation

Normalize counts across libraries

Description

Calculate normalization factors or offsets using count data from multiple libraries.

Usage

normOffsets(object, type=NULL, ..., assay.id="counts", se.out=TRUE)

normFactors(object, method="TMM", weighted=FALSE, ..., 
    assay.id="counts", se.out=TRUE)

Arguments

object

A SummarizedExperiment object containing a count matrix.

type

Deprecated; a character string indicating what type of normalization is to be performed.

method

A string specifying the scaling normalization method to use, passed to the method argument of calcNormFactors.

weighted

A logical scalar indicating whether precision weights should be used for TMM normalization.

...

Other arguments to be passed to calcNormFactors (for normFactors, or for normOffsets with the deprecated type="scaling"), or to loessFit (for normOffsets with type="loess").

assay.id

An integer scalar or string specifying the assay values to use for normalization.

se.out

A logical scalar indicating whether or not a SummarizedExperiment object should be returned. Alternatively, a SummarizedExperiment object in which normalization factors are to be stored.

Details

The normFactors function is a convenience wrapper around the calcNormFactors function in the edgeR package. By default, it uses the trimmed mean of M-values (TMM) method to remove composition biases, typically by normalizing the coverage of background regions of the genome. Precision weighting is turned off by default to avoid upweighting high-abundance regions. Such regions are more likely to be bound, and thus more likely to be differentially bound; assigning them excessive weight would defeat the purpose of trimming when normalizing the coverage of background regions.
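To make the trimming and the effect of weighted=FALSE concrete, here is a minimal base-R sketch of an unweighted TMM factor between two libraries. It is not the edgeR implementation: the helper tmm_factor is hypothetical, the trim fractions mirror the calcNormFactors defaults (30% of M-values, 5% of A-values), library 1 is assumed to be the reference (edgeR selects its reference library differently), and the composition bias is simulated.

```r
# Hypothetical helper, not edgeR's calcNormFactors: unweighted TMM factor of
# library `obs` relative to library `ref`.
tmm_factor <- function(obs, ref, lib.obs, lib.ref,
                       logratio.trim = 0.3, sum.trim = 0.05) {
  keep <- obs > 0 & ref > 0                # M and A are undefined for zero counts
  p.obs <- obs[keep] / lib.obs             # proportion of each library
  p.ref <- ref[keep] / lib.ref
  M <- log2(p.obs / p.ref)                 # log-ratio between libraries
  A <- 0.5 * log2(p.obs * p.ref)           # average log-abundance
  n <- length(M)
  lo.M <- floor(n * logratio.trim) + 1     # rank bounds for trimming M
  hi.M <- n + 1 - lo.M
  lo.A <- floor(n * sum.trim) + 1          # rank bounds for trimming A
  hi.A <- n + 1 - lo.A
  kept <- rank(M) >= lo.M & rank(M) <= hi.M & rank(A) >= lo.A & rank(A) <= hi.A
  2^mean(M[kept])                          # weighted=FALSE: plain mean of kept M-values
}

set.seed(1)
counts <- matrix(rnbinom(4000, mu = 10, size = 20), ncol = 4)
counts[1:100, 4] <- counts[1:100, 4] * 5   # simulated bound regions in library 4
totals <- colSums(counts)
f <- sapply(1:4, function(j) tmm_factor(counts[, j], counts[, 1], totals[j], totals[1]))
f <- f / exp(mean(log(f)))                 # rescale to a geometric mean of 1
```

The inflated windows in library 4 have large M-values and are removed by the trimming, so the remaining background windows pull the factor for library 4 below those of the other libraries, counteracting the inflation of its library size.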

The normOffsets function performs non-linear normalization similar to the fast loess algorithm in normalizeCyclicLoess. This aims to account for mean-dependent trends in the efficiency biases between libraries. For each sample, a lowess curve is fitted to the log-counts against the log-average count. The fitted value for each genomic window is used as an offset in a generalized linear model for that window and sample. Using the log of the average count, rather than the average of the log-counts, provides more stability when low counts are present in differentially bound regions.
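A rough base-R sketch of the per-sample trend fit is shown below. It assumes a pseudo-count of 0.5, uses stats::loess with degree=1 and robust iterations (family="symmetric") as a stand-in for limma's loessFit, and centres the offsets of each window across samples. These are illustrative assumptions, not csaw's exact code.

```r
set.seed(2)
counts <- matrix(rnbinom(4000, mu = 10, size = 20), ncol = 4)
totals <- colSums(counts)
prior <- 0.5                                   # assumed pseudo-count to avoid log(0)

# Covariate: log of the average count, after scaling each library to the mean size.
avg <- rowMeans(sweep(counts, 2, totals / mean(totals), "/"))
log.avg <- log(avg + prior)

offsets <- matrix(0, nrow(counts), ncol(counts))
for (j in seq_len(ncol(counts))) {
  y <- log(counts[, j] + prior)                # log-counts for sample j
  fit <- loess(y ~ log.avg, span = 0.3, degree = 1, family = "symmetric")
  offsets[, j] <- fitted(fit)                  # trend value for each window
}

# Centre each window's offsets across samples (an assumption of this sketch).
offsets <- offsets - rowMeans(offsets)
```

Subtracting a per-window constant from all samples only shifts the intercept of a GLM whose design contains one, so the centring does not alter the estimated differences between samples.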

Both functions expect SummarizedExperiment objects as input. The count matrix used for normalization is extracted according to the assay.id argument, and library sizes are extracted from object$totals.

Value

For normFactors, a numeric vector containing the relative normalization factor for each library is computed. This is returned directly if se.out=FALSE; otherwise, it is stored in the norm.factors field of the colData of the output object (accessible with $norm.factors).
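The factors are relative and only meaningful together with the library sizes. Under the edgeR convention, the effective library size of each library is its library size multiplied by its normalization factor, and the corresponding GLM offset is the natural log of that product. The totals and factors below are hypothetical values chosen to have a geometric mean of 1.

```r
totals <- c(10000, 12000, 9000, 15000)    # hypothetical object$totals
norm.factors <- c(1.25, 0.8, 1, 1)        # hypothetical normFactors(..., se.out=FALSE) output
eff.lib <- totals * norm.factors          # effective library sizes
lib.offsets <- log(eff.lib)               # one natural-log offset per library
```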

For normOffsets with type="loess", a numeric matrix with the same dimensions as the count matrix is computed, containing log-scale offsets for use in GLM fitting. This is returned directly if se.out=FALSE; otherwise, it is stored in the offsets assay of the output object.
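For intuition about how one row of the offset matrix enters a GLM, the sketch below fits a Poisson GLM for a single window with base R's glm. The counts, groups and offsets are hypothetical, the offsets are assumed to be on the natural-log scale, and edgeR itself fits negative binomial rather than Poisson GLMs.

```r
y <- c(12, 15, 30, 28)                     # hypothetical counts for one window
group <- factor(c("A", "A", "B", "B"))     # hypothetical experimental groups
off <- c(9.2, 9.4, 9.3, 9.5)               # hypothetical offsets for this window

# The offset enters the linear predictor with a fixed coefficient of 1.
fit <- glm(y ~ group, family = poisson(), offset = off)
coef(fit)[["groupB"]]                      # log-fold change of B over A after offsetting
```

With a single group factor, this coefficient has a closed form: the log of (total B count / sum of exp(B offsets)) divided by (total A count / sum of exp(A offsets)).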

If se.out=TRUE, a SummarizedExperiment is returned that contains the computed normalization factors or offsets but is otherwise identical to object. If se.out is a SummarizedExperiment object, the normalization factors or offsets are stored in an object that is otherwise identical to se.out.

Different SummarizedExperiment outputs

For normFactors, the normalization factors are always computed from object. However, if se.out is a (different) SummarizedExperiment object, the factors are stored in se.out, which is then returned. This is useful when se.out contains counts for windows, but the normalization factors are computed from counts in larger bins in object.

For normOffsets with type="loess", the trend fits are always computed from object. However, if se.out is a (different) SummarizedExperiment object, the trend fits are used to compute offsets for each window in se.out by spline interpolation. This is useful when se.out contains counts for windows in an endogenous genome, but the trend fits are computed from spike-in chromatin regions.
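The interpolation step can be sketched in base R as follows. The spike-in abundances and log-counts are simulated, stats::loess stands in for loessFit, and a natural cubic spline from splinefun is assumed as the interpolant; the exact spline used by csaw is not specified here.

```r
set.seed(3)
# Simulated spike-in regions: log-average abundance and log-counts for one sample.
spike.ab <- runif(300, 0, 6)
spike.y <- 0.1 * spike.ab + rnorm(300, sd = 0.2)
trend <- loess(spike.y ~ spike.ab, span = 0.3, degree = 1)

# Evaluate the trend at the sorted unique abundances and build an interpolant.
grid <- sort(unique(spike.ab))
fitted.grid <- predict(trend, newdata = data.frame(spike.ab = grid))
interp <- splinefun(grid, fitted.grid, method = "natural")

# Simulated endogenous windows: look up the spike-in trend at their abundances.
window.ab <- runif(1000, 0.5, 5.5)
window.offset <- interp(window.ab)
```

The trend is fitted once on the spike-in regions, so the windows never influence it; they only receive trend values at their own average abundances.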

In both functions, an error is raised if the library sizes in se.out$totals are not identical to object$totals. This is because the normalization factors (for normFactors) and average abundances (for normOffsets) are only comparable when the library sizes are the same. Consistent library sizes can be achieved by using the same readParam object in windowCounts and related functions.
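The library-size check described in this subsection can be sketched as below. The helper transfer_factors is hypothetical (not a csaw function), and all totals and factors are made-up values.

```r
# Hypothetical helper: hand over factors computed from `object` for storage in
# `se.out`, refusing when the two sets of library sizes are not identical.
transfer_factors <- function(factors, object.totals, se.out.totals) {
  if (!identical(object.totals, se.out.totals)) {
    stop("library sizes in 'se.out' are not identical to those in 'object'")
  }
  factors
}

bin.totals <- c(20000, 25000)    # hypothetical totals for the large bins
win.totals <- c(20000, 25000)    # hypothetical totals for the windows
factors <- c(1.1, 1 / 1.1)       # hypothetical factors computed from the bins

win.factors <- transfer_factors(factors, bin.totals, win.totals)

# Mismatched totals (e.g. from counting with a different readParam) are rejected.
res <- tryCatch(transfer_factors(factors, bin.totals, c(20000, 24000)),
                error = function(e) "rejected")
```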

Author(s)

Aaron Lun

References

Robinson MD, Oshlack A (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11, R25.

Ballman KV, Grill DE, Oberg AL, Therneau TM (2004). Faster cyclic loess: normalizing RNA arrays via linear models. Bioinformatics 20, 2778-86.

See Also

calcNormFactors, loessFit, normalizeCyclicLoess

Examples

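library(SummarizedExperiment)
library(csaw)
set.seed(100)
counts <- matrix(rnbinom(400, mu=10, size=20), ncol=4)
data <- SummarizedExperiment(list(counts=counts))
data$totals <- colSums(counts)

# TMM normalization; se.out=FALSE returns the factors directly.
normFactors(data, se.out=FALSE)

# Using loess-based normalization, instead; se.out=FALSE returns the offset matrix.
offsets <- normOffsets(data, se.out=FALSE)
head(offsets)

# Extra arguments are passed to loessFit.
offsets <- normOffsets(data, span=0.4, se.out=FALSE)
offsets <- normOffsets(data, iterations=1, se.out=FALSE)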

[Package csaw version 1.18.0 Index]