sumCountsAcrossCells {scater}    R Documentation

Aggregate expression values across groups of cells

Description

Sum counts or average expression values for each feature across groups of cells. Also aggregate values in the colData and other metadata within each group.

Usage

sumCountsAcrossCells(x, ...)

aggregateAcrossCells(x, ...)

## S4 method for signature 'ANY'
sumCountsAcrossCells(
  x,
  ids,
  subset_row = NULL,
  subset_col = NULL,
  store_number = "ncells",
  average = FALSE,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
sumCountsAcrossCells(x, ..., exprs_values = "counts")

## S4 method for signature 'SummarizedExperiment'
aggregateAcrossCells(
  x,
  ids,
  ...,
  subset_row = NULL,
  subset_col = NULL,
  store_number = "ncells",
  coldata_merge = NULL,
  use_exprs_values = "counts"
)

## S4 method for signature 'SingleCellExperiment'
aggregateAcrossCells(
  x,
  ids,
  ...,
  subset_row = NULL,
  subset_col = NULL,
  coldata_merge = NULL,
  store_number = "ncells",
  use_exprs_values = "counts",
  use_altexps = TRUE,
  use_dimred = TRUE
)

Arguments

x

For sumCountsAcrossCells, a numeric matrix of expression values (usually counts) containing features in rows and cells in columns. Alternatively, a SummarizedExperiment object containing such a matrix.

For aggregateAcrossCells, a SingleCellExperiment or SummarizedExperiment containing one or more matrices of expression values to be aggregated, possibly along with colData, reducedDims and altExps elements.

...

For the generics, further arguments to be passed to specific methods.

For the sumCountsAcrossCells SummarizedExperiment method, further arguments to be passed to the ANY method.

For aggregateAcrossCells, further arguments to be passed to sumCountsAcrossCells.

ids

A factor (or a vector coercible into a factor) specifying the group to which each cell in x belongs.

Alternatively, a DataFrame of such vectors or factors, in which case each unique combination of levels defines a group.

subset_row

An integer, logical or character vector specifying the features to use. Defaults to all features.

For the SingleCellExperiment method, this argument will not affect alternative Experiments, where aggregation is always performed for all features (or not at all, depending on use_altexps).

subset_col

An integer, logical or character vector specifying the cells to use. Defaults to all cells with non-NA entries of ids.

store_number

String specifying the field of the output colData to store the number of cells in each group. If NULL, nothing is stored.

average

Logical scalar indicating whether the average should be computed instead of the sum.

BPPARAM

A BiocParallelParam object specifying whether summation should be parallelized.

exprs_values

A string or integer scalar specifying the assay of x containing the matrix of counts (or any other expression quantity that can be meaningfully summed).

coldata_merge

A named list of functions specifying how each column metadata field should be aggregated. Each function should be named according to the name of the column in colData to which it applies. Alternatively, a single function can be supplied, see below for more details.

use_exprs_values

A character or integer vector specifying the assay(s) of x containing count matrices.

use_altexps

Logical scalar indicating whether aggregation should be performed for alternative experiments in x. Alternatively, a character or integer vector specifying the alternative experiments to be aggregated.

use_dimred

Logical scalar indicating whether aggregation should be performed for dimensionality reduction results in x. Alternatively, a character or integer vector specifying the dimensionality reduction results to be aggregated.

Details

These functions provide a convenient method for summing or averaging expression values across multiple columns for each feature. A typical application would be to sum counts across all cells in each cluster to obtain “pseudo-bulk” samples for further analyses, e.g., differential expression analyses between conditions.

The behaviour of sumCountsAcrossCells is equivalent to that of colsum. However, this function can operate on any matrix representation in x, can do so in a parallelized manner for large matrices without resorting to block processing, and natively supports combinations of multiple factors in ids.
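As a rough illustration of the per-group summation described above, the following base R sketch reproduces the arithmetic on a toy matrix. It does not call scater or colsum; it uses rowsum on the transposed matrix instead, and the names counts, ids and summed are invented for this example.

```r
# Toy count matrix: 3 features (rows) by 6 cells (columns).
counts <- matrix(1:18, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))

# Group assignment for each cell.
ids <- factor(c("A", "B", "A", "C", "B", "A"))

# rowsum() sums rows by group, so transpose to sum over cells (columns),
# then transpose back to get features in rows and groups in columns.
summed <- t(rowsum(t(counts), group = ids))
summed
```

Each column of summed corresponds to one level of ids, which is the same layout as the assay returned by sumCountsAcrossCells.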

Any NA values in ids are implicitly ignored and will not be considered during summation. This may be useful for removing undesirable cells by setting their entries in ids to NA. Alternatively, we can explicitly select the cells of interest with subset_col.
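The effect of NA entries in ids can be sketched in base R as follows. This is only an illustration of the idea (cells with a missing group are dropped before summing), not the package's internal code; counts, ids and keep are hypothetical names.

```r
# Toy count matrix: 2 features by 6 cells.
counts <- matrix(1:12, nrow = 2,
                 dimnames = list(c("gene1", "gene2"), paste0("cell", 1:6)))

# Cells 2 and 5 are marked as undesirable by setting their group to NA.
ids <- c("A", NA, "A", "B", NA, "B")

# Drop the NA cells explicitly, then sum the remaining cells per group.
keep <- !is.na(ids)
summed <- t(rowsum(t(counts[, keep, drop = FALSE]), group = ids[keep]))
summed
```

Only groups A and B appear in the result, and the two NA cells contribute to neither.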

Setting average=TRUE will compute the average in each set rather than the sum. This is particularly useful if x contains expression values that have already been normalized in some manner, as computing the average avoids another round of normalization to account for differences in the size of each set.
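To see why averaging helps when groups differ in size, consider this small base R sketch. It assumes already-normalized toy values and computes the average as the per-group sum divided by the number of cells in that group; the variable names are made up for illustration.

```r
# Toy normalized values: 2 features by 4 cells.
normed <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 2,
                 dimnames = list(c("gene1", "gene2"), paste0("cell", 1:4)))

# Group A has three cells, group B has only one.
ids <- factor(c("A", "A", "A", "B"))

# Per-group sums are inflated for the larger group.
summed <- t(rowsum(t(normed), group = ids))

# Divide each column by its group size to obtain per-group averages.
ncells <- as.integer(table(ids))  # ordered by levels(ids), like the columns
averaged <- sweep(summed, 2, ncells, "/")
averaged
```

The sums for group A are larger simply because it contains more cells, whereas the averages are on the same scale as the input values for both groups.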

Note that, prior to version 1.16.0, sumCountsAcrossCells would return a raw matrix. This has now been wrapped in a SummarizedExperiment for consistency and to include per-group statistics.

Value

For sumCountsAcrossCells, a SummarizedExperiment is returned with one column per level of ids. Each entry of the assay contains the sum or average across all cells in a given group (column) for a given feature (row). Columns are ordered by levels(ids) and the number of cells per level is reported in the "ncells" column metadata. For DataFrame ids, each column corresponds to a unique combination of levels (recorded in the colData).

For aggregateAcrossCells, a SummarizedExperiment of the same class as x is returned, containing summed/averaged matrices generated by sumCountsAcrossCells on all assays specified in use_exprs_values. Column metadata and other available metadata (e.g., reduced dimensions) are also aggregated, see below.

Aggregation of additional metadata

The aggregateAcrossCells function sums the assay values in x using sumCountsAcrossCells while also aggregating metadata across cells in a sensible manner. This makes it useful for obtaining an aggregated SummarizedExperiment during an analysis session; in contrast, sumCountsAcrossCells is more lightweight and is better suited for use inside other functions.

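Aggregation of the colData is controlled using functions in coldata_merge. This can be either a single function or a named list of functions, where each function in the list is applied to the colData column after which it is named.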

For any unspecified field, we check if all cells of a group have the same value. If so, that value is reported; otherwise, NA is reported for the offending group.
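The default rule for unspecified fields can be sketched in base R. The helper merge_if_constant below is hypothetical and only mimics the rule as described (report the shared value, or NA if the group is not constant); it is not the function used inside scater.

```r
# Hypothetical helper: return the single shared value, or a typed NA
# if the values within the group are not all identical.
merge_if_constant <- function(values) {
  distinct <- unique(values)
  if (length(distinct) == 1L) distinct else values[NA_integer_]
}

# Toy column metadata field and group assignments for 5 cells.
donor <- c("d1", "d2", "d3", "d3", "d3")
ids <- c("A", "A", "B", "B", "B")

# Apply the rule separately to the cells of each group.
merged <- vapply(split(donor, ids), merge_if_constant, character(1))
merged
```

Group A mixes two donors and therefore yields NA, while group B is constant and keeps its value.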

If x is a SingleCellExperiment, the assay values in the altExps are subjected to a similar summation/averaging across cells. This uses the same arguments that were used for the main experiment. Values in the reducedDims are also averaged across cells (regardless of the value of average).

Users can tune the behavior of the function for these additional fields with use_altexps and use_dimred. Note that if the alternative experiments themselves are SingleCellExperiments, any further nested alternative experiment or reduced dimensions will always be aggregated regardless of the value of use_altexps or use_dimred.

If ids is a DataFrame, the combination of levels corresponding to each column is also reported in the column metadata. Otherwise, the level corresponding to each column is captured in the column names.
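How several factors combine into groups can be illustrated with base R's interaction. This is a sketch of the concept only, since the package may construct the combinations differently; label, batch and combo are illustrative names.

```r
# Two grouping factors for 5 cells.
label <- c("A", "A", "B", "B", "A")
batch <- c(1, 2, 1, 1, 1)

# Each observed combination of levels defines one group;
# drop = TRUE discards combinations that never occur (here, B in batch 2).
combo <- interaction(label, batch, drop = TRUE)

# Number of cells falling into each combination.
table(combo)
```

Three combinations are observed here, so an aggregation over these groups would produce three columns.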

Author(s)

Aaron Lun

See Also

numDetectedAcrossCells, which computes the number of expressing cells in each group.

Examples

example_sce <- mockSCE()
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)

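# The result is a SummarizedExperiment, so extract the summed matrix explicitly.
out <- sumCountsAcrossCells(example_sce, ids)
head(assay(out))
out$ncells  # number of cells in each group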

batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- sumCountsAcrossCells(example_sce,
      DataFrame(label=ids, batch=batches))
head(assay(out2))
colData(out2)  # combination of levels (and ncells) for each column

# Using another column metadata merge strategy.
example_sce$stuff <- runif(ncol(example_sce))
example_merged <- aggregateAcrossCells(example_sce, ids, 
     coldata_merge=list(stuff=sum))

[Package scater version 1.16.2 Index]