write10xCounts {DropletUtils}    R Documentation

Write count data in the 10x format

Description

Write a sparse matrix of UMI counts, together with its cell and gene annotation, to a directory (or to an HDF5 file) in the format produced by the CellRanger software suite.

Usage

write10xCounts(path, x, barcodes=colnames(x), gene.id=rownames(x),
    gene.symbol=gene.id, gene.type="Gene Expression", overwrite=FALSE, 
    type=c("auto", "sparse", "HDF5"), genome="unknown", version=c("2", "3"))

Arguments

x

A sparse numeric matrix of UMI counts.

path

A string containing the path to the output directory.

barcodes

A character vector of cell barcodes, one per column of x.

gene.id

A character vector of gene identifiers, one per row of x.

gene.symbol

A character vector of gene symbols, one per row of x.

gene.type

A character vector of gene types, expanded to one per row of x. Only used when version="3".

overwrite

A logical scalar specifying whether path should be overwritten if it already exists.

type

String specifying the type of 10X format in which to save x. This should be one of "auto", "sparse" or "HDF5".

genome

String specifying the genome for storage when type="HDF5". This can be a character vector with one genome per feature if version="3".

version

String specifying the version of the CellRanger format to produce, either "2" or "3".

Details

If type="auto", this function will try to detect the desired format based on whether path ends with ".h5". If so, it assumes that path specifies an HDF5 file path and sets type="HDF5". Otherwise, it sets type="sparse" under the assumption that path specifies a path to a directory.
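The detection rule described above can be sketched in base R. This is only an illustration of the documented behaviour, not the package's internal code: guess_type is a hypothetical helper introduced here, and the check is assumed to be a plain suffix match on path.

```r
# Hypothetical helper mirroring the documented rule for automatic detection:
# a path ending in ".h5" is treated as an HDF5 file, anything else as a directory.
guess_type <- function(path) {
    if (grepl("\\.h5$", path)) "HDF5" else "sparse"
}

guess_type("counts.h5")
guess_type("output/counts_dir")
```

Passing type="sparse" or type="HDF5" explicitly bypasses the detection, which may be preferable when the output path does not follow the ".h5" convention.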

Note that the output format changed substantially in CellRanger version 3.0 to account for non-gene features such as antibody or CRISPR tags. Users can switch to this new format using version="3". The new format is described in the “latest” pages listed under References; the older format is described in the pages for “2.2” or earlier.

Value

For type="sparse", a directory is produced at path. If version="2", this will contain the files "matrix.mtx", "barcodes.tsv" and "genes.tsv". If version="3", it will instead contain "matrix.mtx.gz", "barcodes.tsv.gz" and "features.tsv.gz".
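As a rough illustration of the version="3" directory output, the sketch below writes a small mock matrix with mixed feature types and lists the resulting files. The feature identifiers, barcodes and feature types are made up for this example, and the layout of the annotation file is simply inspected rather than assumed.

```r
library(DropletUtils)
library(Matrix)

# Mock counts: 20 features by 5 cells (all values are made up).
set.seed(100)
x <- Matrix(rpois(100, lambda=2), nrow=20, ncol=5, sparse=TRUE)
ids <- sprintf("FEATURE%02d", seq_len(nrow(x)))
cells <- paste0("CELL-", seq_len(ncol(x)))

# One feature type per row: 15 genes followed by 5 antibody tags.
types <- rep(c("Gene Expression", "Antibody Capture"), c(15, 5))

out.v3 <- tempfile()
write10xCounts(out.v3, x, barcodes=cells, gene.id=ids,
    gene.symbol=ids, gene.type=types, version="3")
list.files(out.v3)

# read.delim() decompresses the gzipped annotation transparently.
features <- read.delim(file.path(out.v3, "features.tsv.gz"), header=FALSE)
head(features)
```

Supplying gene.type as a full-length vector, rather than relying on the single default value, is what allows gene and non-gene features to be stored side by side.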

For type="HDF5", an HDF5 file is produced at path containing data in column-sparse format. If version="2", data are stored in the HDF5 group named by genome. If version="3", data are stored in the group "matrix".
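The HDF5 output can be inspected in a similar way. The sketch below assumes that the rhdf5 package is installed and uses its h5ls function to list the file contents; the genome name "mock_genome" and all identifiers are made up for this example.

```r
library(DropletUtils)
library(Matrix)

# Mock counts: 20 features by 5 cells (all values are made up).
set.seed(100)
x <- Matrix(rpois(100, lambda=2), nrow=20, ncol=5, sparse=TRUE)
ids <- sprintf("FEATURE%02d", seq_len(nrow(x)))
cells <- paste0("CELL-", seq_len(ncol(x)))

# The ".h5" suffix lets the default type="auto" select the HDF5 format.
out.h5 <- tempfile(fileext=".h5")
write10xCounts(out.h5, x, barcodes=cells, gene.id=ids,
    gene.symbol=ids, genome="mock_genome", version="2")

# List the groups and datasets in the file; with version="2" the
# top-level group should be named after the genome argument.
contents <- rhdf5::h5ls(out.h5)
contents
```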

A TRUE value is invisibly returned.

Author(s)

Aaron Lun

References

10X Genomics (2017). Gene-Barcode Matrices. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/2.2/output/matrices

10X Genomics (2018). Feature-Barcode Matrices. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices

10X Genomics (2018). HDF5 Gene-Barcode Matrix Format. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/2.2/advanced/h5_matrices

10X Genomics (2018). HDF5 Feature Barcode Matrix Format. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices

See Also

read10xCounts

Examples

# Mocking up some count data.
library(Matrix)
my.counts <- matrix(rpois(1000, lambda=5), ncol=10, nrow=100)
my.counts <- as(my.counts, "dgCMatrix")
cell.ids <- paste0("BARCODE-", seq_len(ncol(my.counts)))

ngenes <- nrow(my.counts)
gene.ids <- paste0("ENSG0000", seq_len(ngenes))
gene.symb <- paste0("GENE", seq_len(ngenes))

# Writing this to file:
tmpdir <- tempfile()
write10xCounts(tmpdir, my.counts, gene.id=gene.ids, 
    gene.symbol=gene.symb, barcodes=cell.ids)
list.files(tmpdir)
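
A round trip through read10xCounts (see See Also) can serve as a sanity check on the written files. This is a minimal sketch with made-up identifiers; it assumes that read10xCounts returns a SingleCellExperiment with a "counts" assay and that the SummarizedExperiment package is available for the assay accessor.

```r
library(DropletUtils)
library(Matrix)

# Mock counts: 20 features by 5 cells (all values are made up).
set.seed(100)
x <- Matrix(rpois(100, lambda=2), nrow=20, ncol=5, sparse=TRUE)
ids <- sprintf("FEATURE%02d", seq_len(nrow(x)))
cells <- paste0("CELL-", seq_len(ncol(x)))

# Write in the default (version="2") sparse format, then read it back.
out.dir <- tempfile()
write10xCounts(out.dir, x, barcodes=cells, gene.id=ids, gene.symbol=ids)
sce <- read10xCounts(out.dir)

# The dimensions and total count should survive the round trip.
dim(sce)
roundtrip <- SummarizedExperiment::assay(sce, "counts")
sum(roundtrip) == sum(x)
```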

[Package DropletUtils version 1.4.3 Index]