sumCountsAcrossCells {scater} | R Documentation |
Sum counts or average expression values for each feature across groups of cells.
Also aggregate values in the colData and other metadata within each group.
sumCountsAcrossCells(x, ...)

aggregateAcrossCells(x, ...)

## S4 method for signature 'ANY'
sumCountsAcrossCells(
  x,
  ids,
  subset_row = NULL,
  subset_col = NULL,
  store_number = "ncells",
  average = FALSE,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
sumCountsAcrossCells(x, ..., exprs_values = "counts")

## S4 method for signature 'SummarizedExperiment'
aggregateAcrossCells(
  x,
  ids,
  ...,
  subset_row = NULL,
  subset_col = NULL,
  store_number = "ncells",
  coldata_merge = NULL,
  use_exprs_values = "counts"
)

## S4 method for signature 'SingleCellExperiment'
aggregateAcrossCells(
  x,
  ids,
  ...,
  subset_row = NULL,
  subset_col = NULL,
  coldata_merge = NULL,
  store_number = "ncells",
  use_exprs_values = "counts",
  use_altexps = TRUE,
  use_dimred = TRUE
)
x | For sumCountsAcrossCells, a numeric matrix of counts containing features (rows) and cells (columns), or a SummarizedExperiment containing such a matrix. For aggregateAcrossCells, a SummarizedExperiment or SingleCellExperiment containing such a matrix. |
... | For the generics, further arguments to be passed to specific methods. For the SummarizedExperiment and SingleCellExperiment methods, further arguments to be passed to the ANY method of sumCountsAcrossCells (e.g., average or BPPARAM). |
ids | A factor specifying the group to which each cell in x belongs. Alternatively, a DataFrame of such vectors or factors, in which case each unique combination of levels defines a group. |
subset_row | An integer, logical or character vector specifying the features to use. Defaults to all features. For the SingleCellExperiment method, this argument does not affect alternative experiments, where aggregation is always performed for all features (or not at all, depending on use_altexps). |
subset_col | An integer, logical or character vector specifying the cells to use. Defaults to all cells with non-NA entries in ids. |
store_number | String specifying the field of the output colData in which to store the number of cells in each group. |
average | Logical scalar indicating whether the average should be computed instead of the sum. |
BPPARAM | A BiocParallelParam object specifying whether summation should be parallelized. |
exprs_values | A string or integer scalar specifying the assay of x containing the values to be summed. |
coldata_merge | A named list of functions specifying how each column metadata field should be aggregated. Each function should be named according to the name of the column in colData(x) to which it applies. Alternatively, a single function, NULL or FALSE, as described below. |
use_exprs_values | A character or integer vector specifying the assay(s) of x to be aggregated. |
use_altexps | Logical scalar indicating whether aggregation should be performed for alternative experiments in x. |
use_dimred | Logical scalar indicating whether aggregation should be performed for dimensionality reduction results in x. |
These functions provide a convenient method for summing or averaging expression values across multiple columns for each feature. A typical application would be to sum counts across all cells in each cluster to obtain “pseudo-bulk” samples for further analyses, e.g., differential expression analyses between conditions.
The behaviour of sumCountsAcrossCells is equivalent to that of colsum, i.e., the columns of x are summed within each group defined by ids. However, this function can operate on any matrix representation in x; can do so in a parallelized manner for large matrices without resorting to block processing; and can natively support combinations of multiple factors in ids.
Any NA values in ids are implicitly ignored and will not be considered during summation. This may be useful for removing undesirable cells by setting their entries in ids to NA. Alternatively, we can explicitly select the cells of interest with subset_col.
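A minimal base-R sketch of the two ways of excluding cells described above: dropping cells whose ids entry is NA, and restricting to an explicit subset of cells in the spirit of subset_col. The matrix, labels and the excluded cell are hypothetical; the point is only that excluded cells contribute nothing to any group's sum.

```r
# Toy count matrix: 3 features x 5 cells (invented values).
counts <- matrix(1:15, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))
ids <- c("A", NA, "A", "B", "B")   # cell2 has no group label

# Cells with NA labels are left out of every group. Base rowsum() would
# instead treat NA as a group of its own (with a warning), so drop them first.
keep <- !is.na(ids)

# Analogous to subset_col: additionally restrict to the cells of interest,
# here by excluding cell5.
keep <- keep & colnames(counts) != "cell5"

# Grouped column sums over the retained cells only (cell1, cell3, cell4).
summed <- t(rowsum(t(counts[, keep, drop = FALSE]), group = ids[keep]))
summed
```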
Setting average=TRUE will compute the average in each group rather than the sum. This is particularly useful if x contains expression values that have already been normalized in some manner, as computing the average avoids another round of normalization to account for differences in the size of each group.
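The arithmetic behind average=TRUE can be sketched as the grouped sum divided by the number of cells in each group. This base-R illustration uses an invented toy matrix and is not scater's internal code.

```r
# Toy expression matrix: 3 features x 5 cells (invented values).
counts <- matrix(1:15, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))
ids <- c("A", "B", "A", "C", "B")

# Grouped sums (features x groups) and the number of cells per group.
summed <- t(rowsum(t(counts), group = ids))
ncells <- table(ids)

# Divide each group's column by its cell count to obtain the group average.
averaged <- sweep(summed, 2, as.vector(ncells[colnames(summed)]), "/")
averaged
```

Because each column is divided by its own group size, a group of 2 cells and a group of 20 cells end up on the same scale.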
Note that, prior to version 1.16.0, sumCountsAcrossCells returned a raw matrix. The result is now wrapped in a SummarizedExperiment for consistency and to include per-group statistics.
For sumCountsAcrossCells, a SummarizedExperiment is returned with one column per level of ids. Each entry of the assay contains the sum or average across all cells in a given group (column) for a given feature (row). Columns are ordered by levels(ids), and the number of cells per level is reported in the "ncells" column metadata field (or the field named by store_number). For DataFrame ids, each column corresponds to a unique combination of levels (recorded in the colData).
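When ids is a DataFrame, every combination of levels that actually occurs forms its own group. The base-R sketch below mimics this with interaction(); the column names and ordering it produces are choices of this sketch and need not match those used by sumCountsAcrossCells. The per-group cell counts correspond to what the "ncells" field records.

```r
# Toy count matrix: 3 features x 6 cells (invented values).
counts <- matrix(1:18, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
label <- c("A", "A", "B", "B", "A", "B")
batch <- c(1, 2, 1, 1, 1, 2)

# One group per observed (label, batch) combination; drop = TRUE discards
# combinations that never occur, lex.order = TRUE makes label vary slowest.
key <- interaction(label, batch, drop = TRUE, lex.order = TRUE)

# Grouped column sums (features x groups).
summed <- t(rowsum(t(counts), group = key))

# Per-group metadata: the level of each factor plus the number of cells.
first <- match(colnames(summed), as.character(key))
group_info <- data.frame(
  label  = label[first],
  batch  = batch[first],
  ncells = as.vector(table(key)[colnames(summed)]),
  stringsAsFactors = FALSE
)
group_info
```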
For aggregateAcrossCells, an object of the same class as x (a SummarizedExperiment or SingleCellExperiment) is returned, containing summed/averaged matrices generated by sumCountsAcrossCells on all assays specified in use_exprs_values. Column metadata and other available metadata (e.g., reduced dimensions) are also aggregated, as described below.
The aggregateAcrossCells function sums (or averages) the assay values in x using sumCountsAcrossCells while also aggregating the per-cell metadata within each group, as described below. This makes it useful for obtaining an aggregated SummarizedExperiment during an analysis session; in contrast, sumCountsAcrossCells is more lightweight and is better suited for use inside other functions.
Aggregation of the colData is controlled using functions in coldata_merge. This can be one of:
- A function that takes a subset of entries for any given column metadata field and returns a single value. This can be set to, e.g., sum or median for numeric covariates, or a function that takes the most abundant level for categorical factors.
- A named list of such functions, where each function is applied to the column metadata field after which it is named. Any field that does not have an entry in coldata_merge is “unspecified” and handled as described below. A list element can also be set to FALSE, in which case no aggregation is performed for the corresponding field.
- NULL, in which case all fields are considered to be unspecified.
- FALSE, in which case no aggregation of column metadata is performed.
For any unspecified field, we check whether all cells of a group have the same value. If so, that value is reported; otherwise, NA is reported for the offending group.
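To make these merging rules concrete, the following base-R sketch applies a named list of merge functions (sum for a numeric field and a most-abundant-level helper for a categorical one) to a toy metadata table, and falls back to the "same value or NA" rule for the unspecified field. The helpers most_abundant and shared_or_na are hypothetical and are not part of scater; the NULL and FALSE modes are not shown.

```r
# Toy per-cell metadata standing in for colData(x); all values are invented.
meta <- data.frame(
  stuff = c(0.5, 1.5, 2.0, 3.0, 1.0),
  type  = c("T", "T", "B", "B", "T"),
  donor = c("d1", "d1", "d1", "d2", "d2"),
  stringsAsFactors = FALSE
)
ids <- c("A", "A", "B", "B", "B")

# Hypothetical helper: most abundant level of a categorical field
# (ties resolve to the first level in sorted order).
most_abundant <- function(v) names(which.max(table(v)))

# Hypothetical fallback for unspecified fields: the shared value, or NA.
shared_or_na <- function(v) if (length(unique(v)) == 1L) v[1] else NA

# Named list of merge functions; 'donor' is deliberately left unspecified.
merge_funs <- list(stuff = sum, type = most_abundant)

groups <- split(seq_len(nrow(meta)), ids)
merged <- do.call(rbind, lapply(names(groups), function(g) {
  rows <- meta[groups[[g]], , drop = FALSE]
  vals <- lapply(names(meta), function(field) {
    f <- if (field %in% names(merge_funs)) merge_funs[[field]] else shared_or_na
    f(rows[[field]])
  })
  names(vals) <- names(meta)
  data.frame(vals, row.names = g, stringsAsFactors = FALSE)
}))
merged
```

Group B mixes donors d1 and d2, so its unspecified donor field becomes NA, while group A keeps d1.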
If x is a SingleCellExperiment, the assay values in the altExps are subjected to a similar summation/averaging across cells, using the same arguments that were used for the main experiment (apart from subset_row). Values in the reducedDims are also averaged across cells, regardless of the value of average. Users can tune the behavior of the function for these additional fields with use_altexps and use_dimred. Note that if the alternative experiments are themselves SingleCellExperiments, any further nested alternative experiments or reduced dimensions will always be aggregated, regardless of the value of use_altexps or use_dimred.
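Because reducedDims store cells in rows, the averaging described above amounts to a grouped row mean. A base-R sketch on an invented two-component embedding, not scater's implementation:

```r
# Toy reduced-dimension matrix: cells in rows, components in columns (invented values).
pcs <- matrix(c(1, 3, 5, 7,
                2, 4, 6, 8), ncol = 2,
              dimnames = list(paste0("cell", 1:4), c("PC1", "PC2")))
ids <- c("A", "B", "A", "B")

# Grouped sums (groups x components, groups in sorted order) divided by group sizes.
group_sums <- rowsum(pcs, group = ids)
group_means <- group_sums / as.vector(table(ids)[rownames(group_sums)])
group_means
```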
If ids is a DataFrame, the combination of levels corresponding to each column is also reported in the column metadata. Otherwise, the level corresponding to each column is captured in the column names.
Aaron Lun
See also numDetectedAcrossCells, which computes the number of expressing cells in each group.
library(scater)

example_sce <- mockSCE()
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)

out <- sumCountsAcrossCells(example_sce, ids)
head(assay(out))
out$ncells   # number of cells per group, stored in the column metadata

batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- sumCountsAcrossCells(example_sce, DataFrame(label=ids, batch=batches))
head(assay(out2))
colData(out2)   # combination of levels for each column, plus "ncells"

# Using another column metadata merge strategy.
example_sce$stuff <- runif(ncol(example_sce))
example_merged <- aggregateAcrossCells(example_sce, ids, coldata_merge=list(stuff=sum))
head(colData(example_merged))