cms {CellMixS}    R Documentation

cms

Description

Calculates cell-specific mixing scores based on Euclidean distances within a subspace of integrated data.

Usage

cms(sce, k, group, dim_red = "PCA", assay_name = "logcounts",
  res_name = NULL, k_min = NA, smooth = TRUE, n_dim = 20,
  cell_min = 10, BPPARAM = SerialParam())

Arguments

sce

A SingleCellExperiment object with the combined data.

k

Numeric. Number of k-nearest neighbours (Knn) to use.

group

Character. Name of the group/batch variable. Must be one of names(colData(sce)).

dim_red

Character. Name of embeddings to use as subspace for distance distributions. Default is "PCA".

assay_name

Character. Name of the assay to use for PCA. Only relevant if no existing 'dim_red' is provided. Must be one of names(assays(sce)). Default is "logcounts".

res_name

Character. Suffix appended to the name of the result score (e.g. the method used to combine batches). Default is NULL.

k_min

Numeric. Minimum number of Knn to include. Default is NA (see Details).

smooth

Logical. Indicates whether cms results should be smoothed within each neighbourhood using the weighted mean. Default is TRUE.

n_dim

Numeric. Number of dimensions used to define the subspace. Default is 20.

cell_min

Numeric. Minimum number of cells from each group to be included in the AD test. Must be greater than 4 for the ad.test function to work. Default is 10.

BPPARAM

A BiocParallelParam object specifying whether cms scores should be calculated in parallel. Default is SerialParam().

Details

The cms function tests the hypothesis that group-specific distance distributions of knn cells have the same underlying, unspecified distribution. It performs Anderson-Darling tests as implemented in the kSamples package.

By default the function uses all distances and group labels defined in knn. If k_min is specified, the first local minimum of the overall distance distribution with at least k_min cells is used. This can be used to adapt to the local structure of the dataset, e.g. to prevent cells from a different cluster from being included. If 'dim_red' is not defined or is left at its default, cms will calculate a PCA using runPCA.

Results will be appended to colData(sce). Names can be specified using res_name. If multiple cores are available, cms scores can be calculated in parallel (does not work on Windows). Parallelization can be specified using BPPARAM.
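
The per-cell logic described above can be illustrated with a small sketch for a single neighbourhood. This is not the CellMixS implementation: the helper names filter_first_local_min and score_neighbourhood are hypothetical, the local-minimum search on a kernel density estimate is one assumed reading of k_min, and returning NA when fewer than two groups reach cell_min is an assumption made only for this example. The only external call is kSamples::ad.test, which is skipped when the package is not installed.

```r
# Illustrative sketch only; helper names are hypothetical, not CellMixS internals.

# Keep neighbours up to the first local minimum of the distance density
# that still retains at least k_min cells (assumed reading of `k_min`).
filter_first_local_min <- function(distances, groups, k_min) {
  ord <- order(distances)
  distances <- distances[ord]
  groups <- groups[ord]
  dens <- density(distances)
  # Grid points where the density slope turns from negative to positive
  min_idx <- which(diff(sign(diff(dens$y))) == 2) + 1
  for (cutoff in dens$x[min_idx]) {
    keep <- distances <= cutoff
    if (sum(keep) >= k_min) {
      return(list(distances = distances[keep], groups = groups[keep]))
    }
  }
  # No suitable minimum found: fall back to all neighbours
  list(distances = distances, groups = groups)
}

# Compare group-specific distance distributions within one neighbourhood.
score_neighbourhood <- function(distances, groups, cell_min = 10) {
  per_group <- split(distances, groups)
  # Drop groups with too few cells for the test
  per_group <- per_group[lengths(per_group) >= cell_min]
  # Assumption for this sketch: return NA if the test cannot be run
  if (length(per_group) < 2 || !requireNamespace("kSamples", quietly = TRUE)) {
    return(NA_real_)
  }
  res <- kSamples::ad.test(per_group)
  # Third column of the `ad` matrix holds the asymptotic p-value
  unname(res$ad[1, 3])
}

# Simulated distances: a near cluster of 30 cells and a far cluster of 20
set.seed(1)
d <- c(rnorm(30, mean = 1, sd = 0.1), rnorm(20, mean = 3, sd = 0.1))
g <- rep(c("batch1", "batch2"), times = 25)

nb <- filter_first_local_min(d, g, k_min = 10)
p <- score_neighbourhood(nb$distances, nb$groups, cell_min = 5)
```

In this toy neighbourhood the batch labels alternate, so both batches are mixed at every distance. The filter is meant to cut the neighbourhood at the gap between the two distance clusters before the test is applied.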

Value

A SingleCellExperiment with cms (and cms_smooth) within colData.

References

Scholz, F. W. and Stephens, M. A. (1987). K-Sample Anderson-Darling Tests. J. Am. Stat. Assoc., 82(399), 918-924.

See Also

.cmsCell, .smoothCms.

Examples

library(SingleCellExperiment)
library(CellMixS)
sim_list <- readRDS(system.file("extdata/sim50.rds", package = "CellMixS"))
sce <- sim_list[[1]][, c(1:50)]

sce_cms <- cms(sce, k = 20, group = "batch", n_dim = 2)
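
A slightly extended variant of the example may help show how res_name and BPPARAM fit together and where the scores end up. It is a sketch that assumes the same bundled sim50.rds data; the label "unaligned" is an arbitrary illustrative value, and the score columns are located by pattern matching rather than by an assumed exact name.

```r
library(SingleCellExperiment)
library(CellMixS)
library(BiocParallel)

sim_list <- readRDS(system.file("extdata/sim50.rds", package = "CellMixS"))
sce <- sim_list[[1]][, 1:50]

# Label the scores with a suffix; "unaligned" is an arbitrary example label.
# SerialParam() runs the calculation on a single core.
sce_cms <- cms(sce, k = 20, group = "batch", n_dim = 2,
               res_name = "unaligned", BPPARAM = SerialParam())

# Locate the score columns that cms appended to colData
score_cols <- grep("cms", names(colData(sce_cms)), value = TRUE)
head(colData(sce_cms)[, score_cols, drop = FALSE])
```

On non-Windows systems, SerialParam() could be replaced by MulticoreParam(workers = 2) from BiocParallel to calculate the scores in parallel.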


[Package CellMixS version 1.0.2 Index]