normalize {scater} R Documentation

Normalise a SingleCellExperiment object using pre-computed size factors

Description

Compute normalised expression values from count data in a SingleCellExperiment object, using the size factors stored in the object.

Usage

normalizeSCE(object, exprs_values = "counts", return_log = TRUE,
  log_exprs_offset = NULL, centre_size_factors = TRUE,
  size_factor_grouping = NULL)

## S4 method for signature 'SingleCellExperiment'
normalize(object, exprs_values = "counts",
  return_log = TRUE, log_exprs_offset = NULL, centre_size_factors = TRUE,
  size_factor_grouping = NULL)

normalise(...)

Arguments

object

A SingleCellExperiment object.

exprs_values

String indicating which assay contains the count data that should be used to compute log-transformed expression values.

return_log

Logical scalar indicating whether normalized values should be returned on the log2 scale.

log_exprs_offset

Numeric scalar specifying the offset to add when log-transforming expression values. If NULL, the value is taken from metadata(object)$log.exprs.offset if defined; otherwise it defaults to 1.

centre_size_factors

Logical scalar indicating whether size factors should be centred.

size_factor_grouping

Factor specifying groups of cells within which size factors should be centred. See centreSizeFactors for details.

...

Arguments passed to normalize when calling normalise.

Details

Normalized expression values are computed by dividing the counts for each cell by the size factor for that cell. This aims to remove cell-specific scaling biases, e.g., due to differences in sequencing coverage or capture efficiency. If return_log=TRUE, log-normalized values are calculated by adding log_exprs_offset to the normalized count and performing a log2 transformation.
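
The arithmetic described above can be sketched in base R, without scater. The matrix counts and the vector size_factors below are made-up illustrative values, and the offset of 1 is the documented default. This is a minimal sketch of the calculation, not the package's internal code.

```r
# Toy count matrix: 3 features (rows) x 2 cells (columns). Values are illustrative only.
counts <- matrix(c(10, 0, 4,
                   20, 2, 8),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("cellA", "cellB")))

# Hypothetical pre-computed size factors, one per cell.
size_factors <- c(cellA = 0.5, cellB = 2)

# Divide each column (cell) by that cell's size factor.
norm_counts <- sweep(counts, 2, size_factors, "/")

# Add the offset and apply a log2 transformation.
log_exprs_offset <- 1
log_norm <- log2(norm_counts + log_exprs_offset)
```

Here norm_counts plays the role of the "normcounts" assay and log_norm the role of "logcounts".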

Features marked as spike-in controls will be normalized with control-specific size factors, if these are available. This reflects the fact that spike-in controls are subject to different biases than those that are removed by gene-specific size factors (namely, total RNA content). If size factors for a particular spike-in set are not available, a warning will be raised.

Size factors will be centred to have a mean of unity if centre_size_factors=TRUE, prior to calculation of normalized expression values. This ensures that the computed log-expression values can be interpreted as being on the same scale as log-counts. It also standardizes the effect of the log_exprs_offset addition, and ensures that abundances are roughly comparable between features normalized with different sets of size factors.
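
As a rough illustration of centring, the sketch below assumes that it is done by dividing the size factors by their mean, which is the simplest way to obtain a mean of unity. The values are hypothetical.

```r
# Hypothetical size factors for four cells (illustrative values).
size_factors <- c(0.5, 1.0, 1.5, 3.0)

# Centre by dividing by the mean (an assumption about how centring is done),
# so that the centred factors average to 1.
centred <- size_factors / mean(size_factors)

# The mean of the centred factors is 1, up to floating-point error.
centred_mean <- mean(centred)

# Ratios between cells are unchanged by centring.
ratios_before <- size_factors / size_factors[1]
ratios_after <- centred / centred[1]
```

Because every size factor is divided by the same constant, relative differences between cells are preserved; only the overall scale changes.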

If size_factor_grouping is specified and centre_size_factors=TRUE, this is equivalent to subsetting the SingleCellExperiment; centering the size factors within each subset; normalizing within each subset; and then merging the subsets back together for output. This enables convenient normalization of multiple batches separately.
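
Group-wise centring can be sketched in base R with ave(). The size factors and the batch factor below are hypothetical, and centring is again assumed to mean division by the within-group mean.

```r
# Hypothetical size factors and a grouping factor with two batches.
size_factors <- c(0.5, 1.5, 2, 6)
batch <- factor(c("b1", "b1", "b2", "b2"))

# ave() applies the function separately within each level of the grouping
# factor and returns the results in the original cell order.
centred_by_batch <- ave(size_factors, batch, FUN = function(x) x / mean(x))

# Each batch now has a mean size factor of 1.
batch_means <- tapply(centred_by_batch, batch, mean)
```

After this step, cells in different batches are on separate scales, which is why comparisons should stay within a single level of the grouping factor.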

Note that normalize is exactly the same as normalise.

Value

A SingleCellExperiment object containing normalized expression values in "normcounts" if return_log=FALSE, and log-normalized expression values in "logcounts" if return_log=TRUE. All size factors will also be centred in the output object if centre_size_factors=TRUE.

Warning about centred size factors

Generally speaking, centering does not affect relative comparisons between cells in the same object, as all size factors are scaled by the same amount. However, if two different SingleCellExperiment objects are run separately through normalize, the size factors in each object will be rescaled differently. This means that the size factors and log-expression values will not be comparable between objects.

This lack of comparability is not always obvious. For example, if we subsetted an existing SingleCellExperiment object, and ran normalize separately on each subset, the resulting expression values in each subsetted object would not be comparable to each other. This is despite the fact that all cells were originally derived from a single SingleCellExperiment object.
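
The effect can be seen in a small base R sketch. All values are illustrative, and centring is assumed to divide by the mean. Cells 1 and 3 have the same count and the same raw size factor, yet they receive different log-expression values when their subsets are centred separately.

```r
# One gene, four cells with identical counts (illustrative values).
count <- c(cell1 = 8, cell2 = 8, cell3 = 8, cell4 = 8)
sf <- c(cell1 = 1, cell2 = 3, cell3 = 1, cell4 = 1)
subset_id <- c("s1", "s1", "s2", "s2")

# Centre within each subset, mimicking separate normalization of each subset.
sf_sep <- ave(sf, subset_id, FUN = function(x) x / mean(x))

# Centre once across all cells, mimicking a single normalization.
sf_all <- sf / mean(sf)

# Log-normalized values with an offset of 1 under both schemes.
sep <- log2(count / sf_sep + 1)
joint <- log2(count / sf_all + 1)

# cell1 and cell3 differ under separate centring but agree under joint centring.
differs_when_separate <- sep["cell1"] != sep["cell3"]
agrees_when_joint <- isTRUE(all.equal(unname(joint["cell1"]), unname(joint["cell3"])))
```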

In general, it is advisable to only compare size factors and expression values between cells in one SingleCellExperiment object, from a single normalize call with size_factor_grouping=NULL. If objects are to be combined, new size factors should be computed using all cells in the combined object, followed by a single normalize call. If size_factor_grouping is specified, expression values should only be compared within each level of the specified factor.

Author(s)

Davis McCarthy and Aaron Lun

Examples

data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts), 
    colData = sc_example_cell_info
)
keep_gene <- rowSums(counts(example_sce)) > 0
example_sce <- example_sce[keep_gene,]

## Apply TMM normalisation taking into account all genes
example_sce <- normaliseExprs(example_sce, method = "TMM")
## Scale counts relative to a set of control features (here the first 100 features)
example_sce <- normaliseExprs(example_sce, method = "none",
    feature_set = 1:100)

## normalize the object using the saved size factors
example_sce <- normalize(example_sce)


[Package scater version 1.8.4 Index]