SingleR {SingleR} | R Documentation |
Returns the best annotation for each cell in a test dataset, given a labelled reference dataset in the same feature space.
SingleR( test, ref, labels, method = c("single", "cluster"), clusters = NULL, genes = "de", sd.thresh = 1, de.method = "classic", de.n = NULL, de.args = list(), aggr.ref = FALSE, aggr.args = list(), recompute = TRUE, quantile = 0.8, fine.tune = TRUE, tune.thresh = 0.05, prune = TRUE, assay.type.test = "logcounts", assay.type.ref = "logcounts", check.missing = TRUE, BNPARAM = KmknnParam(), BPPARAM = SerialParam() )
test |
A numeric matrix of single-cell expression values where rows are genes and columns are cells. Alternatively, a SummarizedExperiment object containing such a matrix. |
ref |
A numeric matrix of (usually log-transformed) expression values from a reference dataset,
or a SummarizedExperiment object containing such a matrix;
see Alternatively, a list or List of SummarizedExperiment objects or numeric matrices containing multiple references. Row names may be different across entries but only the intersection will be used, see Details. |
labels |
A character vector or factor of known labels for all samples in Alternatively, if |
method |
String specifying whether annotation should be performed on single cells in |
clusters |
A character vector or factor of cluster identities for each cell in |
genes, sd.thresh, de.method, de.n, de.args |
Arguments controlling the choice of marker genes used for annotation, see |
aggr.ref, aggr.args |
Arguments controlling the aggregation of the references prior to annotation, see |
recompute |
Logical scalar indicating whether to set up indices for later recomputation of scores,
when |
quantile, fine.tune, tune.thresh, prune |
Further arguments to pass to |
assay.type.test |
An integer scalar or string specifying the assay of |
assay.type.ref |
An integer scalar or string specifying the assay of |
check.missing |
Logical scalar indicating whether rows should be checked for missing values (and if found, removed). |
BNPARAM |
A BiocNeighborParam object specifying the algorithm to use for building nearest neighbor indices. |
BPPARAM |
A BiocParallelParam object specifying how parallelization should be performed, if any. |
If method="single"
, this function is effectively just a convenient wrapper around trainSingleR
and classifySingleR
.
If method="cluster"
, per-cell profiles are summed to obtain per-cluster profiles and annotation is performed on these clusters.
The function will automatically restrict the analysis to the intersection of the genes available in both ref
and test
.
If this intersection is empty (e.g., because the two datasets use different annotation in their row names), an error will be raised.
ref
can contain both single-cell or bulk data, but in the case of the former, read the Note in ?trainSingleR
.
A DataFrame is returned containing the annotation statistics for each cell or cluster (row).
This is identical to the output of classifySingleR
.
Aaron Lun, based on code by Dvir Aran.
Aran D, Looney AP, Liu L et al. (2019). Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunology 20, 163–172.
############################## ## Mocking up training data ## ############################## Ngroups <- 5 Ngenes <- 1000 means <- matrix(rnorm(Ngenes*Ngroups), nrow=Ngenes) means[1:900,] <- 0 colnames(means) <- LETTERS[1:5] g <- rep(LETTERS[1:5], each=4) ref <- SummarizedExperiment( list(counts=matrix(rpois(1000*length(g), lambda=10*2^means[,g]), ncol=length(g))), colData=DataFrame(label=g) ) rownames(ref) <- sprintf("GENE_%s", seq_len(nrow(ref))) ref <- scater::logNormCounts(ref) trained <- trainSingleR(ref, ref$label) ############################### ## Mocking up some test data ## ############################### N <- 100 g <- sample(LETTERS[1:5], N, replace=TRUE) test <- SummarizedExperiment( list(counts=matrix(rpois(1000*N, lambda=2^means[,g]), ncol=N)), colData=DataFrame(cluster=g) ) rownames(test) <- sprintf("GENE_%s", seq_len(nrow(test))) test <- scater::logNormCounts(test) ############################### ## Performing classification ## ############################### pred <- SingleR(test, ref, labels=ref$label) table(predicted=pred$labels, truth=g) pred2 <- SingleR(test, ref, labels=ref$label, method="cluster", clusters=test$cluster) table(predicted=pred2$labels, truth=rownames(pred2))