runPCA {scater} | R Documentation |
Perform a principal components analysis (PCA) on cells, based on the data in a SingleCellExperiment object.
runPCA(object, ncomponents = 2, method = c("prcomp", "irlba"), ntop = 500, exprs_values = "logcounts", feature_set = NULL, scale_features = TRUE, use_coldata = FALSE, selected_variables = NULL, detect_outliers = FALSE, rand_seed = NULL, ...)
object |
A SingleCellExperiment object. |
ncomponents |
Numeric scalar indicating the number of principal components to obtain. |
method |
String specifying how the PCA should be performed. |
ntop |
Numeric scalar specifying the number of most variable features to use for PCA. |
exprs_values |
Integer scalar or string indicating which assay of |
feature_set |
Character vector of row names, a logical vector or a numeric vector of indices indicating a set of features to use for PCA.
This will override any |
scale_features |
Logical scalar, should the expression values be standardised so that each feature has unit variance? This will also remove features with standard deviations below 1e-8. |
use_coldata |
Logical scalar specifying whether the column data should be used instead of expression values to perform PCA. |
selected_variables |
List of strings or a character vector indicating which variables in |
detect_outliers |
Logical scalar, should outliers be detected based on PCA coordinates generated from column-level metadata? |
rand_seed |
Deprecated, numeric scalar specifying the random seed when using |
... |
Additional arguments to pass to |
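The interaction between ntop, scale_features and ncomponents can be sketched in base R. This is a minimal illustration under stated assumptions, not the internal code of runPCA: the matrix logexprs and the exact order of the steps are hypothetical, and only the 1e-8 threshold and the argument names come from the descriptions above.

```r
# Hypothetical log-expression matrix: 50 features (rows) x 20 cells (columns).
set.seed(100)
logexprs <- matrix(rnorm(50 * 20), nrow = 50,
                   dimnames = list(paste0("gene", 1:50), paste0("cell", 1:20)))

ntop <- 10
ncomponents <- 2

# Keep the 'ntop' features with the largest variance across cells.
feature_vars <- apply(logexprs, 1, var)
keep <- order(feature_vars, decreasing = TRUE)[seq_len(min(ntop, nrow(logexprs)))]

# Transpose so that cells are rows, since prcomp() expects observations in rows.
x <- t(logexprs[keep, , drop = FALSE])

# Mimic scale_features=TRUE: drop near-constant features, then standardise
# each remaining feature to unit variance.
sds <- apply(x, 2, sd)
x <- x[, sds > 1e-8, drop = FALSE]
x <- scale(x, center = TRUE, scale = TRUE)

# The data are already centred and scaled, so prcomp() does neither again.
pca <- prcomp(x, center = FALSE, scale. = FALSE)
pcs <- pca$x[, seq_len(ncomponents), drop = FALSE]
dim(pcs)
```

The result has one row per cell and one column per requested component, which matches the shape of the reduced dimension matrix described in the Value section.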
The function prcomp is used internally to do the PCA when method="prcomp". Alternatively, the irlba package can be used, which performs a fast approximation of PCA through the prcomp_irlba function. This is especially useful for large, sparse matrices.
Note that prcomp_irlba involves a random initialization, after which it converges towards the exact PCs. This means that the result will change slightly across different runs. For full reproducibility, users should call set.seed prior to running runPCA with method="irlba".
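The effect of seeding can be checked directly on prcomp_irlba. The sketch below assumes the irlba package is installed and uses a made-up matrix x; it calls prcomp_irlba on its own rather than through runPCA, so it only illustrates the seeding pattern.

```r
# Hypothetical matrix with cells in rows and features in columns.
set.seed(1)
x <- matrix(rnorm(200 * 50), nrow = 200)

if (requireNamespace("irlba", quietly = TRUE)) {
    # Set the same seed immediately before each call, so that both runs
    # start from the same random initialization.
    set.seed(42)
    run1 <- irlba::prcomp_irlba(x, n = 2)
    set.seed(42)
    run2 <- irlba::prcomp_irlba(x, n = 2)
    same_result <- isTRUE(all.equal(run1$x, run2$x))
} else {
    same_result <- NA  # irlba is not installed, so there is nothing to compare
}
same_result
```

The same pattern applies to runPCA: call set.seed on the line directly before runPCA(..., method="irlba").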
If use_coldata=TRUE, PCA will be performed on column-level metadata instead of the gene expression matrix. The selected_variables argument defaults to a vector containing:
"pct_counts_top_100_features"
"total_features_by_counts"
"pct_counts_feature_control"
"total_features_feature_control"
"log10_total_counts_endogenous"
"log10_total_counts_feature_control"
This can be useful for identifying outlier cells based on QC metrics, especially when combined with detect_outliers=TRUE. If outlier identification is enabled, the outlier field of the output colData will contain the identified outliers.
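The idea of running PCA on QC metrics and flagging unusual cells can be sketched in base R. Everything here is an assumption for illustration: the data frame qc, its column names and the robust-distance rule with a cutoff of 5 are stand-ins, and the rule is not the outlier detection method that runPCA uses.

```r
# Hypothetical per-cell QC metrics; the column names are made up.
set.seed(10)
qc <- data.frame(
    log10_total_counts = rnorm(100, mean = 5, sd = 0.2),
    total_features     = rnorm(100, mean = 3000, sd = 300),
    pct_counts_control = rnorm(100, mean = 5, sd = 1)
)
# Turn the first cell into an obvious low-quality cell.
qc[1, ] <- c(3, 500, 60)

# PCA on the standardised metrics, with cells as rows.
pca <- prcomp(qc, center = TRUE, scale. = TRUE)
pcs <- pca$x[, 1:2]

# Stand-in outlier rule (an assumption, not runPCA's method): convert each PC
# to a robust z-score using the median and MAD, then flag cells that lie far
# from the bulk in the first two PCs.
robust_z <- apply(pcs, 2, function(v) (v - median(v)) / mad(v))
robust_dist <- sqrt(rowSums(robust_z^2))
outlier <- robust_dist > 5
which(outlier)
```

A logical vector such as outlier is the kind of per-cell flag that could be stored alongside the other column metadata.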
A SingleCellExperiment object containing the first ncomponents principal coordinates for each cell. If use_coldata=FALSE, this is stored in the "PCA" entry of the reducedDims slot. Otherwise, it is stored in the "PCA_coldata" entry.
The proportion of variance explained by each PC is stored as a numeric vector in the "percentVar" attribute of the reduced dimension matrix. Note that this will only be of length equal to ncomponents when method is not "prcomp". This is because approximate PCA methods do not compute singular values for all components.
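To illustrate why the attribute can be longer than ncomponents with an exact PCA, the sketch below computes the proportion of variance from a base R prcomp fit and attaches it to a truncated coordinate matrix. The matrix x is hypothetical, and whether runPCA stores the values as fractions or percentages is not shown here, so treat the scaling as an assumption.

```r
# Hypothetical data: 30 cells (rows) x 8 features (columns).
set.seed(5)
x <- matrix(rnorm(30 * 8), nrow = 30)
ncomponents <- 2

pca <- prcomp(x)

# An exact PCA yields a standard deviation for every component, so the
# proportion of variance can be computed for all of them.
percentVar <- pca$sdev^2 / sum(pca$sdev^2)

# Keep only the first 'ncomponents' coordinates, but attach the full vector.
pcs <- pca$x[, seq_len(ncomponents), drop = FALSE]
attr(pcs, "percentVar") <- percentVar

length(attr(pcs, "percentVar"))  # one value per component of the exact PCA
sum(attr(pcs, "percentVar"))     # proportions over all components sum to 1
```

For a matrix taken from the output object, the same attribute would be read with attr(reducedDim(object, "PCA"), "percentVar").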
Aaron Lun, based on code by Davis McCarthy
## Set up an example SingleCellExperiment
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info
)
example_sce <- normalize(example_sce)

example_sce <- runPCA(example_sce)
reducedDimNames(example_sce)
head(reducedDim(example_sce))