wasserstein.sc {waddR} | R Documentation |
Two-sample test for single-cell RNA-sequencing data to check for differences between two distributions (conditions) using the 2-Wasserstein distance: Semi-parametric implementation using a permutation test with a generalized Pareto distribution (GPD) approximation to estimate small p-values accurately
wasserstein.sc(x, y, method = c("TS", "OS"), permnum = 10000, seed = NULL)

## S4 method for signature 'matrix,vector'
wasserstein.sc(x, y, method = c("TS", "OS"), permnum = 10000, seed = NULL)

## S4 method for signature 'SingleCellExperiment,SingleCellExperiment'
wasserstein.sc(x, y, method = c("TS", "OS"), permnum = 10000, seed = NULL)
x |
matrix of single-cell RNA-sequencing expression data with genes in rows and samples (cells) in columns |
y |
vector of condition labels |
method |
method employed in the testing procedure: “OS” for the one-stage method (i.e. semi-parametric testing applied to all (zero and non-zero) expression values); “TS” for the two-stage method (i.e. semi-parametric testing applied to non-zero expression values only, combined with a separate test for differential proportions of zero expression using logistic regression). If this argument is not given, the two-stage method (“TS”) is used by default. |
permnum |
number of permutations used in the permutation testing procedure. If this argument is not given, 10000 is used as the default. |
seed |
number used to generate a L'Ecuyer-CMRG seed, which in turn seeds a nextRNGStream() for each gene to achieve reproducibility. By default, NULL is given and no seed is set. |
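The seeding scheme described for the seed argument can be illustrated with base R and the parallel package. This is a minimal sketch of the general technique (one L'Ecuyer-CMRG seed, then one stream per gene via nextRNGStream()), not the internal code of waddR; the names n_genes, streams and draw_for_gene are hypothetical.

```r
library(parallel)

# Switch to the L'Ecuyer-CMRG generator and set the user-supplied seed
RNGkind("L'Ecuyer-CMRG")
set.seed(123)

# Derive one independent RNG stream per gene (n_genes is an assumed value)
n_genes <- 3
streams <- vector("list", n_genes)
s <- .Random.seed
for (g in seq_len(n_genes)) {
  streams[[g]] <- s
  s <- nextRNGStream(s)   # advance to the next stream
}

# Hypothetical helper: install the stream of gene g, then draw a permutation
draw_for_gene <- function(g) {
  assign(".Random.seed", streams[[g]], envir = globalenv())
  sample(10)
}

# The same gene always gets the same permutation, whatever the processing order
first  <- draw_for_gene(2)
second <- draw_for_gene(2)
identical(first, second)
```

Because every gene owns its stream, results stay reproducible even if genes are processed in a different order or in parallel.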
Details concerning the permutation testing procedures for single-cell RNA-sequencing data can be found in Schefzik and Goncalves (2019). This function corresponds to the function .testWass: the argument method="OS" is equivalent to inclZero=TRUE in .testWass, and the argument method="TS" is equivalent to inclZero=FALSE in .testWass.
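The idea of a permutation test on the 2-Wasserstein distance can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the procedure of waddR: the helpers wass2 and perm_pvalue are hypothetical, the p-value uses a plain add-one empirical estimate, and the GPD approximation for small p-values is omitted entirely.

```r
# Hypothetical helper: squared 2-Wasserstein distance via equidistant quantiles
wass2 <- function(a, b, n_q = 1000) {
  probs <- (seq_len(n_q) - 0.5) / n_q
  qa <- quantile(a, probs, names = FALSE)
  qb <- quantile(b, probs, names = FALSE)
  mean((qa - qb)^2)
}

# Hypothetical helper: empirical permutation p-value (no GPD tail fit)
perm_pvalue <- function(a, b, permnum = 1000) {
  observed <- wass2(a, b)
  pooled <- c(a, b)
  n_a <- length(a)
  perm_stats <- replicate(permnum, {
    idx <- sample(length(pooled), n_a)      # random relabelling of the cells
    wass2(pooled[idx], pooled[-idx])
  })
  # Add-one estimate, so the p-value is never exactly zero
  (sum(perm_stats >= observed) + 1) / (permnum + 1)
}

set.seed(42)
p_shift <- perm_pvalue(rnorm(100, 42, 1), rnorm(100, 45, 3), permnum = 200)
p_shift
```

With an empirical estimate like this, the smallest attainable p-value is 1 / (permnum + 1), which is the limitation the GPD approximation is meant to address.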
A vector of test results; see Schefzik and Goncalves (2019) for details. The entries match the values described for the function .testWass, where inclZero=TRUE in .testWass corresponds to method="OS" and inclZero=FALSE in .testWass corresponds to method="TS". In case of inclZero=TRUE:
d.wass: 2-Wasserstein distance between the two samples computed by quantile approximation
d.wass^2: squared 2-Wasserstein distance between the two samples computed by quantile approximation
d.comp^2: squared 2-Wasserstein distance between the two samples computed by decomposition approximation
d.comp: 2-Wasserstein distance between the two samples computed by decomposition approximation
location: location term in the decomposition of the squared 2-Wasserstein distance between the two samples
size: size term in the decomposition of the squared 2-Wasserstein distance between the two samples
shape: shape term in the decomposition of the squared 2-Wasserstein distance between the two samples
rho: correlation coefficient in the quantile-quantile plot
pval: p-value of the semi-parametric 2-Wasserstein distance-based test
p.ad.gpd: in case the GPD fitting is performed: p-value of the Anderson-Darling test to check whether the GPD actually fits the data well (otherwise NA)
N.exc: in case the GPD fitting is performed: number of exceedances (starting with 250 and iteratively decreased by 10 if necessary) that are required to obtain a good GPD fit (i.e. p-value of Anderson-Darling test greater than or equal to 0.05) (otherwise NA)
perc.loc: fraction (in %) of the location term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation
perc.size: fraction (in %) of the size term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation
perc.shape: fraction (in %) of the shape term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation
decomp.error: relative error between the squared 2-Wasserstein distance computed by the quantile approximation and the squared 2-Wasserstein distance computed by the decomposition approximation
pval.adj: adjusted p-value of the semi-parametric 2-Wasserstein distance-based test according to the method of Benjamini-Hochberg
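How the distance-related entries above relate to each other can be illustrated with a short base R sketch. It assumes the commonly used decomposition of the squared 2-Wasserstein distance into location, size and shape terms, (mean difference)^2 + (sd difference)^2 + 2 * sd_a * sd_b * (1 - rho), with rho the correlation of the quantiles. The function wass_decomp, the number of quantiles and the exact estimators are assumptions for illustration and may differ from what waddR computes.

```r
# Hypothetical helper, not part of waddR: quantile approximation of the
# 2-Wasserstein distance and its location/size/shape decomposition
wass_decomp <- function(a, b, n_q = 1000) {
  probs <- (seq_len(n_q) - 0.5) / n_q           # equidistant quantile levels
  qa <- quantile(a, probs, names = FALSE)
  qb <- quantile(b, probs, names = FALSE)

  d2_quant <- mean((qa - qb)^2)                 # quantile approximation

  location <- (mean(a) - mean(b))^2             # difference in means
  size     <- (sd(a) - sd(b))^2                 # difference in spread
  rho      <- cor(qa, qb)                       # correlation in the Q-Q plot
  shape    <- 2 * sd(a) * sd(b) * (1 - rho)     # remaining difference in shape
  d2_comp  <- location + size + shape           # decomposition approximation

  c("d.wass" = sqrt(d2_quant), "d.wass^2" = d2_quant,
    "d.comp^2" = d2_comp, "d.comp" = sqrt(d2_comp),
    location = location, size = size, shape = shape, rho = rho,
    perc.loc   = 100 * location / d2_comp,
    perc.size  = 100 * size / d2_comp,
    perc.shape = 100 * shape / d2_comp,
    # assumed definition: error relative to the quantile approximation
    decomp.error = abs(d2_quant - d2_comp) / d2_quant)
}

set.seed(1)
res <- wass_decomp(rnorm(100, 42, 1), rnorm(100, 45, 3))
round(res, 3)
```

The three perc.* entries sum to 100 by construction, which makes them convenient for judging whether a difference is driven mainly by location, size or shape.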
In case of inclZero=FALSE:
d.wass: 2-Wasserstein distance between the two samples computed by quantile approximation
d.wass^2: squared 2-Wasserstein distance between the two samples computed by quantile approximation
d.comp^2: squared 2-Wasserstein distance between the two samples computed by decomposition approximation
d.comp: 2-Wasserstein distance between the two samples computed by decomposition approximation
location: location term in the decomposition of the squared 2-Wasserstein distance between the two samples
size: size term in the decomposition of the squared 2-Wasserstein distance between the two samples
shape: shape term in the decomposition of the squared 2-Wasserstein distance between the two samples
rho: correlation coefficient in the quantile-quantile plot
p.nonzero: p-value of the semi-parametric 2-Wasserstein distance-based test (based on non-zero expression only)
p.ad.gpd: in case the GPD fitting is performed: p-value of the Anderson-Darling test to check whether the GPD actually fits the data well (otherwise NA)
N.exc: in case the GPD fitting is performed: number of exceedances (starting with 250 and iteratively decreased by 10 if necessary) that are required to obtain a good GPD fit (i.e. p-value of Anderson-Darling test greater than or equal to 0.05) (otherwise NA)
perc.loc: fraction (in %) of the location term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation
perc.size: fraction (in %) of the size term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation
perc.shape: fraction (in %) of the shape term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation
decomp.error: relative error between the squared 2-Wasserstein distance computed by the quantile approximation and the squared 2-Wasserstein distance computed by the decomposition approximation
p.zero: p-value of the test for differential proportions of zero expression (logistic regression model)
p.combined: combined p-value of p.nonzero and p.zero obtained by Fisher’s method
p.adj.nonzero: adjusted p-value of the semi-parametric 2-Wasserstein distance-based test (based on non-zero expression only) according to the method of Benjamini-Hochberg
p.adj.zero: adjusted p-value of the test for differential proportions of zero expression (logistic regression model) according to the method of Benjamini-Hochberg
p.adj.combined: adjusted combined p-value of p.nonzero and p.zero obtained by Fisher’s method according to the method of Benjamini-Hochberg
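The combination and adjustment steps behind p.combined and the p.adj.* entries can be reproduced with base R. This is a minimal sketch using made-up p-values for three genes; the vectors p_nonzero and p_zero are hypothetical inputs, not output of wasserstein.sc.

```r
# Hypothetical per-gene p-values from the two stages (illustrative values only)
p_nonzero <- c(0.001, 0.20, 0.04)
p_zero    <- c(0.03, 0.50, 0.60)

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution with
# 2k degrees of freedom under the null; here k = 2 p-values per gene
fisher_stat <- -2 * (log(p_nonzero) + log(p_zero))
p_combined  <- pchisq(fisher_stat, df = 4, lower.tail = FALSE)

# Benjamini-Hochberg adjustment across genes
p_adj_nonzero  <- p.adjust(p_nonzero,  method = "BH")
p_adj_zero     <- p.adjust(p_zero,     method = "BH")
p_adj_combined <- p.adjust(p_combined, method = "BH")

data.frame(p_nonzero, p_zero, p_combined, p_adj_combined)
```

The adjustment is applied across genes, so adjusted values are only meaningful when the function is run on the full set of genes under study.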
Schefzik and Goncalves (2019).
# some data in two conditions
cond1 <- matrix(rnorm(100, 42, 1), nrow=1)
cond2 <- matrix(rnorm(100, 45, 3), nrow=1)

# call wasserstein.sc with a matrix
# and a vector denoting conditions
dat <- cbind(cond1, cond2)
condition <- c(rep(1, 100), rep(2, 100))
wasserstein.sc(dat, condition, "TS", 100)

# call wasserstein.sc with two SingleCellExperiment objects
sce1 <- SingleCellExperiment::SingleCellExperiment(
  assays=list(counts=cond1, logcounts=log10(cond1)))
sce2 <- SingleCellExperiment::SingleCellExperiment(
  assays=list(counts=cond2, logcounts=log10(cond2)))
wasserstein.sc(sce1, sce2, "TS", 100)

# for reproducible p-values
wasserstein.sc(sce1, sce2, seed=123)