read10xCounts {DropletUtils} | R Documentation
Creates a SingleCellExperiment from the CellRanger output directories for 10X Genomics data.
read10xCounts(samples, col.names=FALSE, type=c("auto", "sparse", "HDF5"), group=NULL)
samples | A character vector containing one or more directory names, each corresponding to a 10X sample. Each directory should contain the "matrix.mtx", "genes.tsv" and "barcodes.tsv" files generated by 10X. Alternatively, strings may contain a path to an HDF5 file in the sparse matrix format generated by 10X. These can be mixed with directory names when type="auto".
col.names | A logical scalar indicating whether the columns of the output object should be named with the cell barcodes.
type | String specifying the type of 10X format to read data from.
group | String specifying the group name if samples contains HDF5 files.
This function was originally developed from the Read10X function from the Seurat package.
It was then taken from the read10xResults implementation in the scater package.
If type="auto", the format is automatically detected for each entry of samples based on whether it ends with ".h5".
If so, type is set to "HDF5"; otherwise it is set to "sparse".
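The detection rule described above can be sketched in base R. This is only an illustration of the stated rule, not the package's actual source: the helper name detect_type and the example paths are hypothetical.

```r
# Hypothetical helper mirroring the rule described above: an entry whose
# path ends in ".h5" is treated as HDF5, anything else as a sparse directory.
detect_type <- function(samples) {
    ifelse(grepl("\\.h5$", samples), "HDF5", "sparse")
}

# Made-up paths, for illustration only.
samples <- c("run1/filtered_matrices", "run2/filtered_matrices.h5")
types <- detect_type(samples)
```

Because the rule is applied to each entry separately, a vector that mixes directories and HDF5 files resolves to a different type per entry.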
If type="sparse", count data are loaded as a dgCMatrix object.
This is a conventional column-sparse compressed matrix format produced by the CellRanger pipeline.
If type="HDF5", count data are assumed to follow the 10X sparse HDF5 format for large data sets.
It is loaded as a TENxMatrix object, which is a stub object that refers back to the file in samples.
Users may need to set group if it cannot be automatically determined.
Matrices are combined by column if multiple samples were specified.
This will throw an error if the gene information is not consistent across samples.
If col.names=TRUE and length(samples)==1, each column is named by the cell barcode.
For multiple samples, the columns are unnamed to avoid problems with non-unique barcodes across samples.
Note that user-level manipulation of sparse matrices requires loading of the Matrix package.
Otherwise, calculation of rowSums, colSums, etc. will result in errors.
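As a minimal sketch of this point, the snippet below loads Matrix before summarizing a sparse matrix. The small matrix is made up for illustration and merely stands in for the counts of an object returned by read10xCounts.

```r
library(Matrix)  # provides rowSums/colSums methods for sparse matrices

# A small made-up dgCMatrix standing in for a count matrix (3 genes x 2 cells).
counts <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 2, 7),
                       dims = c(3, 2))

gene_totals <- rowSums(counts)  # total count per gene (row)
cell_totals <- colSums(counts)  # total count per cell (column)
```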
A SingleCellExperiment object containing count data for each gene (row) and cell (column) across all samples.
Row metadata will contain the fields "ID" and "Symbol".
The former is the gene identifier (usually Ensembl), while the latter is the gene name.
Column metadata will contain the fields "Sample" and "Barcode".
The former contains the value of samples from which each column was obtained.
The latter refers to the cell barcode sequence and GEM group for each library.
Rows are named with the gene identifier. Columns are named with the cell barcode only when col.names=TRUE and a single sample is supplied.
Davis McCarthy, with modifications from Aaron Lun
Zheng GX, Terry JM, Belgrader P, and others (2017). Massively parallel digital transcriptional profiling of single cells. Nat Commun 8:14049.
10X Genomics (2017). Gene-Barcode Matrices. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices
10X Genomics (2018). HDF5 Gene-Barcode Matrix Format. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices
# Mocking up some 10X genomics output.
example(write10xCounts)

# Reading it in.
sce10x <- read10xCounts(tmpdir)

# Column names are dropped with multiple 'samples'.
sce10x2 <- read10xCounts(c(tmpdir, tmpdir))