normOffsets {csaw}    R Documentation

Normalize counts between libraries

Description

Calculate normalization factors or offsets using count data from multiple libraries.

Usage

## S4 method for signature 'matrix'
normOffsets(object, lib.sizes=NULL, type=c("scaling", "loess"), 
    weighted=FALSE, ...)

## S4 method for signature 'SummarizedExperiment'
normOffsets(object, assay=1, type="scaling", ..., se.out=TRUE)

## S4 method for signature 'SummarizedExperiment'
normalize(object, ...)

Arguments

object

A numeric matrix of counts with one column per library and one row per genomic feature (e.g., window). Alternatively, a SummarizedExperiment object containing such a matrix in its assays.

lib.sizes

A numeric vector specifying the total number of reads per library. If NULL, this is computed as colSums(object) for matrices. For SummarizedExperiment inputs, it is extracted from object$totals.

type

A character string indicating the type of normalization to perform, either "scaling" or "loess".

weighted

A logical scalar indicating whether precision weights should be used for TMM normalization.

...

Other arguments to be passed to calcNormFactors for type="scaling", or loessFit for type="loess".

assay

An integer scalar or string specifying the assay values to use for normalization.

se.out

A logical scalar indicating whether or not a SummarizedExperiment object should be returned. Alternatively, a SummarizedExperiment object in which normalization factors are to be stored.

Details

If type="scaling", this function provides a convenience wrapper for the calcNormFactors function in the edgeR package. Specifically, it uses the trimmed mean of M-values (TMM) method to perform normalization. Precision weighting is turned off by default so as to avoid upweighting high-abundance regions, which are more likely to be bound and thus more likely to be differentially bound. Assigning excessive weight to such regions would defeat the purpose of trimming, which is to normalize on the coverage of background regions.
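
The following is a minimal base R sketch of an unweighted TMM calculation, included only to illustrate the idea of trimming M-values before averaging. It is not the edgeR or csaw implementation: the function name tmm_sketch and its arguments ref, logratio.trim and sum.trim are hypothetical, and the reference library is fixed to the first column by assumption. The trimming fractions of 0.3 and 0.05 are chosen to mirror the defaults of calcNormFactors.

```r
# Simplified, unweighted TMM sketch (illustrative only; use normOffsets or
# edgeR::calcNormFactors for real analyses).
tmm_sketch <- function(counts, lib.sizes=colSums(counts), ref=1,
                       logratio.trim=0.3, sum.trim=0.05) {
    one_factor <- function(obs, obs.n, refc, ref.n) {
        # M-values (log-ratios) and A-values (average log-abundances).
        M <- log2((obs / obs.n) / (refc / ref.n))
        A <- (log2(obs / obs.n) + log2(refc / ref.n)) / 2

        # Discard features with a zero count in either library.
        ok <- is.finite(M) & is.finite(A)
        M <- M[ok]
        A <- A[ok]

        # Trim the extremes of both M and A by rank.
        n <- length(M)
        lo.M <- floor(n * logratio.trim) + 1
        hi.M <- n + 1 - lo.M
        lo.A <- floor(n * sum.trim) + 1
        hi.A <- n + 1 - lo.A
        keep <- rank(M) >= lo.M & rank(M) <= hi.M &
            rank(A) >= lo.A & rank(A) <= hi.A

        # Unweighted mean of the remaining M-values, back on the raw scale.
        2^mean(M[keep])
    }

    f <- vapply(seq_len(ncol(counts)), function(j) {
        one_factor(counts[, j], lib.sizes[j], counts[, ref], lib.sizes[ref])
    }, numeric(1))

    # Rescale so that the factors multiply to unity.
    f / exp(mean(log(f)))
}

set.seed(1)
counts <- matrix(rnbinom(400, mu=10, size=20), ncol=4)
nf <- tmm_sketch(counts)

# Effective library sizes are the product of library sizes and factors.
eff.lib <- colSums(counts) * nf
```

Because every retained M-value receives equal weight in this sketch, high-abundance features do not dominate the average, which corresponds to the weighted=FALSE default described above.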

If type="loess", this function performs non-linear normalization similar to the fast loess algorithm in normalizeCyclicLoess. For each sample, a lowess curve is fitted to the log-counts against the log-average count. The fitted value for each genomic window is used as an offset in a generalized linear model for that feature and sample. The use of the average count provides more stability than the average log-count when low counts are present for differentially bound regions.
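
The loess procedure described above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the csaw implementation: it uses stats::lowess in place of loessFit, and the function name loess_offsets_sketch, the prior count of 0.5, the scaling of that prior by library size, the default span and the centering of offsets within each feature are all choices made for this sketch.

```r
# Illustrative loess-style offsets in base R (not the csaw implementation).
loess_offsets_sketch <- function(counts, lib.sizes=colSums(counts),
                                 prior=0.5, span=0.3) {
    # Log of the average count, rather than the average of the log-counts.
    ab <- log(rowMeans(counts) + prior)
    o <- order(ab)

    # Scale the prior by library size to avoid taking the log of zero
    # while keeping the added amount proportional across libraries.
    scaled.prior <- prior * lib.sizes / mean(lib.sizes)

    offs <- matrix(0, nrow(counts), ncol(counts))
    for (j in seq_len(ncol(counts))) {
        y <- log(counts[, j] + scaled.prior[j])
        # lowess() returns fitted values in order of increasing x, so the
        # input is sorted first and the fit is mapped back afterwards.
        fit <- lowess(ab[o], y[o], f=span)
        offs[o, j] <- fit$y
    }

    # Center the offsets within each feature.
    offs - rowMeans(offs)
}

set.seed(2)
counts <- matrix(rnbinom(400, mu=10, size=20), ncol=4)
offs <- loess_offsets_sketch(counts)
head(offs)
```

Each column of the result holds the fitted trend for one library, expressed on the natural log scale so that it could serve as an offset in a log-link GLM.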

If a SummarizedExperiment object is supplied, the values to be used for normalization will be extracted according to the specified assay field. If se.out=TRUE, the SummarizedExperiment method will return a modified version of object containing normalization information. Normalization factors, being one value per library, are stored in the "norm.factors" field in the colData, while the offset matrix is stored in the "offset" field in the assays. Otherwise, if se.out=FALSE, a vector of normalization factors or a matrix of offsets is returned directly.

Value

For type="scaling", a numeric vector containing the relative normalization factors for each library is returned.

For type="loess", a numeric matrix of the same dimensions as object is returned, containing the log-based offsets for use in GLM fitting.

If se.out=TRUE, a SummarizedExperiment is returned that contains the computed normalization factors/offsets but is otherwise identical to object.

Additional details for SummarizedExperiment inputs

If se.out is a SummarizedExperiment object and type="scaling", the function will calculate the normalization factors from object but return them in a modified version of se.out. This is useful when se.out contains counts for windows, but the normalization factors are computed using larger bins in object.

Note that the normalization factors can only be interpreted with respect to the library sizes used to calculate them. As such, the function will throw an error if the library sizes in se.out$totals are not identical to object$totals. Consistent library sizes can be achieved by using the same readParam object in windowCounts and related functions.
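
A small arithmetic sketch of why the library sizes must match is given below. All numbers are made up for illustration, and the variable names are hypothetical. The same pair of normalization factors implies a different relative scaling between libraries once it is applied to different totals.

```r
# Made-up normalization factors, as if computed from bin counts.
nf <- c(0.8, 1.25)

# Made-up library sizes: the second set differs in the first library,
# e.g., because reads were extracted with different settings.
bin.totals <- c(1e6, 2e6)
window.totals <- c(9e5, 2e6)

# Effective library sizes are the product of the totals and the factors.
eff.bin <- bin.totals * nf
eff.window <- window.totals * nf

# The implied scaling between the two libraries is no longer the same.
ratio.bin <- eff.bin[1] / eff.bin[2]
ratio.window <- eff.window[1] / eff.window[2]

# A check along these lines would flag the mismatch.
totals.match <- identical(bin.totals, window.totals)
```

Here ratio.bin and ratio.window differ, so factors derived from the bins would not carry their intended meaning if paired with the window totals.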

The use of a SummarizedExperiment object in se.out with type="loess" is not yet supported.

Author(s)

Aaron Lun

References

Robinson MD, Oshlack A (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11, R25.

Ballman KV, Grill DE, Oberg AL, Therneau TM (2004). Faster cyclic loess: normalizing RNA arrays via linear models. Bioinformatics 20, 2778-86.

See Also

calcNormFactors, loessFit, normalizeCyclicLoess

Examples

# A trivial example
counts <- matrix(rnbinom(400, mu=10, size=20), ncol=4)
normOffsets(counts)
normOffsets(counts, lib.sizes=rep(400, 4))

# Using loess-based normalization, instead.
offsets <- normOffsets(counts, type="loess")
head(offsets)
offsets <- normOffsets(counts, type="loess", span=0.4)
offsets <- normOffsets(counts, type="loess", iterations=1)

# Same for SummarizedExperiment objects.
bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
data <- windowCounts(bamFiles, width=100, filter=1)

normOffsets(data, se.out=FALSE)
normOffsets(data, se.out=TRUE)

another.data <- windowCounts(bamFiles, width=10)
normOffsets(data, se.out=another.data)

normOffsets(data, type="loess", se.out=TRUE)
head(normOffsets(data, type="loess", se.out=FALSE))

[Package csaw version 1.14.1 Index]