downsampleReads {DropletUtils}    R Documentation
Generate a UMI count matrix after downsampling reads from the molecule information file produced by CellRanger for 10X Genomics data.
downsampleReads(sample, prop, barcode.length=NULL, bycol=FALSE)
sample
    A string containing the path to the molecule information HDF5 file.
barcode.length
    An integer scalar specifying the length of the cell barcode, see
prop
    A numeric scalar or, if
bycol
    A logical scalar indicating whether downsampling should be performed on a column-by-column basis.
This function downsamples the reads for each molecule by the specified prop, using the information in sample.
It then constructs a UMI count matrix based on the molecules with non-zero read counts.
The aim is to eliminate differences in technical noise that can drive clustering by batch, as described in downsampleMatrix.
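The idea can be illustrated with a small base R sketch. This is not the package's implementation; the molecule table, its column names and the sampling code are all made up for illustration, and it assumes that each molecule is described by its cell, its gene and a read count.

```r
# Toy molecule table: one row per molecule (UMI). All values are invented.
molecules <- data.frame(
    cell  = c("A", "A", "A", "B", "B"),
    gene  = c("g1", "g1", "g2", "g1", "g2"),
    reads = c(5L, 1L, 3L, 2L, 5L)
)

set.seed(100)
prop <- 0.5

# Expand to one entry per read, labelled by the molecule it came from.
read.owner <- rep(seq_len(nrow(molecules)), molecules$reads)

# Keep a proportion 'prop' of all reads, sampled without replacement.
n.keep <- round(prop * length(read.owner))
kept <- read.owner[sample.int(length(read.owner), n.keep)]

# Count the surviving reads for each molecule.
molecules$down <- tabulate(kept, nbins=nrow(molecules))

# A molecule contributes one UMI count only if it retains at least one read.
survivors <- molecules[molecules$down > 0, ]
umi.counts <- table(
    gene=factor(survivors$gene, levels=unique(molecules$gene)),
    cell=factor(survivors$cell, levels=unique(molecules$cell))
)
umi.counts
```

Molecules that lose all of their reads drop out of the matrix, so the UMI count for a gene can decrease even though no UMI counts are sampled directly.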
Subsampling the reads with downsampleReads recapitulates the effect of differences in sequencing depth per cell.
This provides an alternative to downsampling with the CellRanger aggr function or subsampling with the 10X Genomics R kit.
Note that this differs from directly subsampling the UMI count matrix with downsampleMatrix.
If bycol=TRUE, sampling without replacement is performed on the reads for each cell.
The total number of reads for each cell after downsampling is guaranteed to be prop times the original total.
Different proportions can be specified for different cells by setting prop to a vector.
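As a rough illustration of per-cell sampling, the base R sketch below draws a fixed number of reads from each cell separately. The helper downsample_cell and the toy read counts are hypothetical and are not part of DropletUtils; the totals are chosen so that prop times the original total is a whole number.

```r
# Reads per molecule in each of two cells (invented values).
reads.per.cell <- list(A = c(5L, 1L, 4L), B = c(2L, 6L))

# A different proportion for each cell.
prop <- c(A = 0.5, B = 0.25)

# Hypothetical helper: sample reads without replacement within one cell
# and return the number of reads retained by each molecule.
downsample_cell <- function(reads, prop) {
    owner <- rep(seq_along(reads), reads)
    n.keep <- round(prop * length(owner))
    kept <- owner[sample.int(length(owner), n.keep)]
    tabulate(kept, nbins=length(reads))
}

set.seed(42)
down <- Map(downsample_cell, reads.per.cell, prop)

# Per-cell totals after downsampling: exactly prop times the original totals.
vapply(down, sum, numeric(1))
```

Because each cell is sampled on its own, the retained total is fixed per cell; only the allocation of reads among that cell's molecules is random.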
If bycol=FALSE, downsampling without replacement is performed on all reads from the entire dataset.
The total number of reads for each cell after downsampling may not be exactly equal to prop times the original value.
Note that this is the more natural approach and is the default, which differs from the default used in downsampleMatrix.
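The variation in per-cell totals under dataset-wide sampling can be seen in a small base R sketch. The two cells and their read totals are invented for illustration, and the code is not taken from the package.

```r
# Label every read in the dataset by its cell: 10 reads in A, 8 in B.
cell.of.read <- rep(c("A", "B"), c(10, 8))
prop <- 0.5

set.seed(1)

# Sample from the pooled reads without replacement.
n.keep <- round(prop * length(cell.of.read))
kept <- cell.of.read[sample.int(length(cell.of.read), n.keep)]

# The overall total is fixed at n.keep, but the split between cells is random.
per.cell <- table(factor(kept, levels=c("A", "B")))
per.cell
```

Only the dataset-wide total is fixed here. The number of reads retained by each cell varies from draw to draw around prop times its original total.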
A numeric sparse matrix containing the downsampled UMI counts for each gene (row) and barcode (column).
Aaron Lun
downsampleMatrix, read10xMolInfo
# Mocking up some 10X HDF5-formatted data.
out <- DropletUtils:::sim10xMolInfo(tempfile(), nsamples=1)

# Downsampling by the reads.
downsampleReads(out, barcode.length=4, prop=0.5)