runPCA {scater}    R Documentation

Perform PCA on cell-level data

Description

Perform a principal components analysis (PCA) on cells, based on the data in a SingleCellExperiment object.

Usage

runPCA(object, ncomponents = 2, method = c("prcomp", "irlba"),
  ntop = 500, exprs_values = "logcounts", feature_set = NULL,
  scale_features = TRUE, use_coldata = FALSE,
  selected_variables = NULL, detect_outliers = FALSE,
  rand_seed = NULL, ...)

Arguments

object

A SingleCellExperiment object.

ncomponents

Numeric scalar indicating the number of principal components to obtain.

method

String specifying how the PCA should be performed.

ntop

Numeric scalar specifying the number of most variable features to use for PCA.

exprs_values

Integer scalar or string indicating which assay of object should be used to obtain the expression values for the calculations.

feature_set

Character vector of row names, a logical vector or a numeric vector of indices indicating a set of features to use for PCA. This will override any ntop argument if specified.

scale_features

Logical scalar, should the expression values be standardised so that each feature has unit variance? This will also remove features with standard deviations below 1e-8.

use_coldata

Logical scalar specifying whether the column data should be used instead of expression values to perform PCA.

selected_variables

List of strings or a character vector indicating which variables in colData(object) to use for PCA when use_coldata=TRUE. If a list, each entry can take the form described in ?"scater-vis-var".

detect_outliers

Logical scalar, should outliers be detected based on PCA coordinates generated from column-level metadata?

rand_seed

Deprecated, numeric scalar specifying the random seed when using method="irlba".

...

Additional arguments to pass to prcomp_irlba when method="irlba".

Details

The function prcomp is used internally to do the PCA when method="prcomp". Alternatively, the irlba package can be used, which performs a fast approximation of PCA through the prcomp_irlba function. This is especially useful for large, sparse matrices.
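As a rough illustration of how ntop, scale_features and ncomponents fit together, the following sketch runs the same kind of analysis on a toy matrix using only base R. It is a simplified stand-in under stated assumptions, not the code that runPCA executes; the matrix logexprs and the variables feature_vars, keep and coords are hypothetical names introduced here.

```r
set.seed(1)
# Toy log-expression matrix: 1000 features (rows) x 40 cells (columns).
logexprs <- matrix(rnorm(1000 * 40), nrow = 1000,
                   dimnames = list(paste0("gene", 1:1000), paste0("cell", 1:40)))

ntop <- 500
ncomponents <- 2

# Keep the ntop features with the largest variance across cells.
feature_vars <- apply(logexprs, 1, var)
keep <- order(feature_vars, decreasing = TRUE)[seq_len(min(ntop, nrow(logexprs)))]

# prcomp expects observations in rows, so transpose to cells x features.
# scale. = TRUE standardises each feature to unit variance.
pca <- prcomp(t(logexprs[keep, ]), center = TRUE, scale. = TRUE)

# Retain the first ncomponents principal coordinates for each cell.
coords <- pca$x[, seq_len(ncomponents), drop = FALSE]
dim(coords)
```

The result has one row per cell and one column per retained component, which is the shape of the matrix that runPCA stores in reducedDims.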

Note that prcomp_irlba involves a random initialization, after which it converges towards the exact PCs. This means that the result will change slightly across different runs. For full reproducibility, users should call set.seed prior to running runPCA with method="irlba".
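The effect of the seed can be seen in a toy example. The function first_pc_power below is a hypothetical stand-in for a randomly initialised solver (a plain power iteration, not the algorithm used by prcomp_irlba); it is only meant to show that calling set.seed immediately before the computation makes the result repeatable.

```r
# Toy stand-in for a randomly initialised PCA solver (NOT prcomp_irlba):
# power iteration for the leading principal axis, started from a random vector.
first_pc_power <- function(x, iters = 20) {
    x <- scale(x, center = TRUE, scale = FALSE)  # centre each column
    v <- rnorm(ncol(x))                          # random initialisation
    for (i in seq_len(iters)) {
        v <- crossprod(x, x %*% v)               # multiply by t(x) %*% x
        v <- v / sqrt(sum(v^2))                  # renormalise to unit length
    }
    drop(v)
}

set.seed(1)
mat <- matrix(rnorm(200 * 10), nrow = 200)  # 200 observations, 10 features

# Resetting the seed before each run gives the same random start.
set.seed(100)
run1 <- first_pc_power(mat)
set.seed(100)
run2 <- first_pc_power(mat)
identical(run1, run2)
```

With a different seed for the second run, the two vectors would generally differ slightly, which is the behaviour described above for method="irlba".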

If use_coldata=TRUE, PCA will be performed on column-level metadata instead of the gene expression matrix. The selected_variables defaults to a vector containing:

This can be useful for identifying outlier cells based on QC metrics, especially when combined with detect_outliers=TRUE. If outlier identification is enabled, the outlier field of the output colData will contain the identified outliers.
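The idea of running PCA on per-cell QC metrics can be sketched with base R alone. Everything in this example is an assumption made for illustration: the QC column names are hypothetical, and the simple median/MAD rule on the first PC is a naive stand-in that is not claimed to be the method used when detect_outliers=TRUE.

```r
set.seed(1)
# Toy per-cell QC table for 50 cells; the column names are hypothetical.
qc <- data.frame(
    log10_total_counts = rnorm(50, mean = 5, sd = 0.2),
    total_features     = rnorm(50, mean = 3000, sd = 300),
    pct_counts_top_100 = rnorm(50, mean = 40, sd = 5)
)
# Plant one obviously aberrant cell in the last row.
qc[50, ] <- c(2, 200, 95)

# PCA on the column-level metadata, with each metric scaled to unit variance.
pca <- prcomp(qc, center = TRUE, scale. = TRUE)
coords <- pca$x[, 1:2]

# Naive illustrative rule (an assumption, not scater's method): flag cells
# whose PC1 score lies more than 3 MADs from the median PC1 score.
pc1 <- coords[, 1]
flagged <- abs(pc1 - median(pc1)) > 3 * mad(pc1)
which(flagged)
```

The planted cell in row 50 sits far from the others on the first PC and is flagged by this rule; depending on the random draw, a borderline cell may occasionally be flagged as well.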

Value

A SingleCellExperiment object containing the first ncomponents principal coordinates for each cell. If use_coldata=FALSE, these are stored in the "PCA" entry of the reducedDims slot. Otherwise, they are stored in the "PCA_coldata" entry.

The proportion of variance explained by each PC is stored as a numeric vector in the "percentVar" attribute of the reduced dimension matrix. When method="prcomp", this vector covers every PC that was computed, so it can be longer than ncomponents. For approximate methods such as method="irlba", its length equals ncomponents, because these methods do not compute singular values for all components.
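The relationship between the stored coordinates and the variance vector can be sketched with base R. This is an illustration under stated assumptions rather than the package's own code: mat, coords and percent_var are hypothetical names, and only the attribute name "percentVar" is taken from the description above.

```r
set.seed(1)
mat <- matrix(rnorm(50 * 6), nrow = 50)  # toy data: 50 cells x 6 features
pca <- prcomp(mat, center = TRUE, scale. = TRUE)

ncomponents <- 2
coords <- pca$x[, seq_len(ncomponents), drop = FALSE]

# Proportion of variance explained by each PC, from the PC standard deviations.
percent_var <- pca$sdev^2 / sum(pca$sdev^2)
attr(coords, "percentVar") <- percent_var

length(attr(coords, "percentVar"))  # 6: one entry per PC, not just ncomponents
```

Because an exact PCA yields the standard deviation of every component, the attribute here has six entries even though only two columns of coordinates are kept.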

Author(s)

Aaron Lun, based on code by Davis McCarthy

See Also

prcomp, plotPCA

Examples

## Load the package
library(scater)

## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)

example_sce <- runPCA(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))

[Package scater version 1.10.1 Index]