downsampleMatrix {DropletUtils} | R Documentation |
Downsample a count matrix to a desired proportion for each cell.
downsampleMatrix(x, prop, bycol=TRUE)
x | A numeric matrix of counts. |
prop | A numeric scalar or, if |
bycol | A logical scalar indicating whether downsampling should be performed on a column-by-column basis. |
Given multiple batches of very different sequencing depths, it can be beneficial to downsample the deepest batches to match the coverage of the shallowest batches. This avoids differences in technical noise that can drive clustering by batch.
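As a rough illustration of matching batch coverage, the sketch below derives one downsampling proportion per cell from the average total count of each batch. The simulated matrix, the batch labels and the use of the mean total count are assumptions made for this example only; they are not taken from the package.

```r
set.seed(42)

# Simulated data: 20 genes, 5 cells per batch.
# Batch "a" is sequenced roughly four times deeper than batch "b".
batch <- rep(c("a", "b"), each = 5)
x <- cbind(matrix(rpois(100, lambda = 20), nrow = 20),
           matrix(rpois(100, lambda = 5), nrow = 20))

# Average total count per cell within each batch.
avg_total <- tapply(colSums(x), batch, mean)

# Scale every batch down to the shallowest one, which keeps prop = 1.
batch_prop <- min(avg_total) / avg_total

# Expand to one proportion per cell (column).
cell_prop <- unname(batch_prop[batch])
```

A vector such as cell_prop could then be supplied as prop with bycol=TRUE, so that cells in the deeper batch are downsampled and cells in the shallowest batch are left unchanged.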
If bycol=TRUE, sampling without replacement is performed on the count vector for each cell. This yields a new count vector where the total is equal to prop times the original total count. Each count in the returned matrix is guaranteed to be no larger than the original value in x. Different proportions can be specified for different cells by setting prop to a vector.
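The per-cell behaviour can be sketched in base R as follows. This is only an illustration of sampling without replacement on a count vector, not the package's implementation; downsample_vector is a hypothetical helper, and rounding the target total to the nearest integer is an assumption of this sketch.

```r
# Hypothetical helper (not part of DropletUtils): downsample one count vector.
downsample_vector <- function(counts, prop) {
    # Expand counts into one gene label per molecule, e.g. c(2, 0, 1) -> 1, 1, 3.
    molecules <- rep(seq_along(counts), counts)
    # Number of molecules to keep; rounding is an assumption of this sketch.
    n_keep <- round(prop * length(molecules))
    # Sample molecules without replacement, then count them per gene.
    kept <- molecules[sample.int(length(molecules), n_keep)]
    tabulate(kept, nbins = length(counts))
}

set.seed(100)
x <- matrix(rpois(50, lambda = 5), nrow = 10)  # 10 genes, 5 cells
prop <- c(0.2, 0.4, 0.5, 0.8, 1)               # one proportion per cell

# Apply the helper column by column, mimicking bycol=TRUE with a vector prop.
y <- vapply(seq_len(ncol(x)),
            function(j) downsample_vector(x[, j], prop[j]),
            integer(nrow(x)))
```

Because each molecule can be drawn at most once, no entry of y can exceed the corresponding entry of x, and a proportion of 1 returns the original column unchanged.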
If bycol=FALSE, downsampling without replacement is performed on the entire matrix. This yields a new matrix where the total count across all cells is equal to prop times the original total. The new total count for each cell may not be exactly equal to prop times the original value, which may or may not be more appropriate than bycol=TRUE for particular applications.
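The matrix-wide case can be sketched in the same spirit: all molecules are pooled before sampling, so only the grand total is controlled. Again, this is an illustration under stated assumptions rather than the package's code; downsample_whole is a hypothetical helper, and the rounding of the target total is assumed.

```r
# Hypothetical helper (not part of DropletUtils): downsample a whole matrix.
downsample_whole <- function(x, prop) {
    # One label per molecule, identifying the matrix entry it came from.
    molecules <- rep(seq_along(x), as.vector(x))
    # Number of molecules to keep across the entire matrix (rounding assumed).
    n_keep <- round(prop * length(molecules))
    # Sample without replacement from the pooled molecules.
    kept <- molecules[sample.int(length(molecules), n_keep)]
    # Count the kept molecules per entry and restore the matrix shape.
    matrix(tabulate(kept, nbins = length(x)), nrow = nrow(x))
}

set.seed(200)
x <- matrix(rpois(50, lambda = 5), nrow = 10)  # 10 genes, 5 cells
y <- downsample_whole(x, prop = 0.5)

# Per-cell ratios vary around 0.5 rather than matching it exactly.
per_cell_ratio <- colSums(y) / colSums(x)
```

Here sum(y) matches the rounded target for the whole matrix, while per_cell_ratio shows how much each cell's total deviates from prop.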
Technically, downsampling on the reads with downsampleReads is more appropriate as it recapitulates the effect of differences in sequencing depth per cell.
However, in practice, the aim is to obtain cells that have similar total counts across batches, for which downsampling on the UMI counts is a more direct approach.
Note that this function was originally implemented in the scater package as downsampleCounts.
A numeric matrix of downsampled counts, of the same type as x.
Aaron Lun
example(read10xCounts)
downsampled <- downsampleMatrix(counts(sce10x), prop = 0.5)