build*NNGraph {scran} | R Documentation |
Build a shared or k-nearest-neighbors graph for cells based on their expression profiles.
## S4 method for signature 'ANY'
buildSNNGraph(x, k=10, d=50, transposed=FALSE, pc.approx=FALSE,
    rand.seed=1000, irlba.args=list(), knn.args=list(),
    subset.row=NULL, BPPARAM=SerialParam())

## S4 method for signature 'SingleCellExperiment'
buildSNNGraph(x, ..., subset.row=NULL, assay.type="logcounts",
    get.spikes=FALSE, use.dimred=NULL)

## S4 method for signature 'ANY'
buildKNNGraph(x, k=10, d=50, directed=FALSE, transposed=FALSE,
    pc.approx=FALSE, rand.seed=1000, irlba.args=list(),
    knn.args=list(), subset.row=NULL, BPPARAM=SerialParam())

## S4 method for signature 'SingleCellExperiment'
buildKNNGraph(x, ..., subset.row=NULL, assay.type="logcounts",
    get.spikes=FALSE, use.dimred=NULL)
x |
A SingleCellExperiment object, or a matrix containing expression values for each gene (row) in each cell (column). If it is a matrix, it can also be transposed. |
k |
An integer scalar specifying the number of nearest neighbors to consider during graph construction. |
d |
An integer scalar specifying the number of dimensions to use for the k-NN search. |
directed |
A logical scalar indicating whether the output of buildKNNGraph should be a directed graph. |
transposed |
A logical scalar indicating whether x is transposed (i.e., rows are cells). |
pc.approx |
A logical scalar indicating whether approximate PCA should be performed. |
subset.row |
A logical, integer or character vector indicating the rows of x to use. |
irlba.args |
A named list of additional arguments to pass to prcomp_irlba when pc.approx=TRUE. |
knn.args |
A named list of additional arguments to pass to get.knn. |
rand.seed |
A numeric scalar specifying the seed for approximate PCA when pc.approx=TRUE. |
BPPARAM |
A BiocParallelParam object to use for parallel processing. |
... |
Additional arguments to pass to the ANY method. |
assay.type |
A string specifying which assay values to use. |
get.spikes |
A logical scalar specifying whether spike-in transcripts should be used. |
use.dimred |
A string specifying whether existing values in reducedDims(x) should be used. |
The buildSNNGraph method builds a shared nearest-neighbour graph using cells as nodes.
For each cell, its k nearest neighbours are identified based on Euclidean distances in their expression profiles.
An edge is drawn between all pairs of cells that share at least one neighbour.
The weight of the edge between two cells is determined by the ranking of the shared nearest neighbours.
More shared neighbours, or shared neighbours that are close to both cells, will yield larger weights.
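The weighting described above can be illustrated with a small base R sketch. It is not the scran implementation: the helper snn_weight is hypothetical, the brute-force dist() call stands in for a proper k-NN search, and the weight formula (k minus half the smallest rank sum over shared neighbours, with each cell at rank 0 in its own list) is an assumption in the spirit of the rank-based scheme of Xu and Su (2015).

```r
# Toy data in the same orientation as 'x': genes in rows, cells in columns.
set.seed(100)
x <- matrix(rnorm(200), nrow = 10, ncol = 20)
k <- 5

# Euclidean distances between cells (dist() works on rows, hence t()).
dmat <- as.matrix(dist(t(x)))
diag(dmat) <- Inf  # a cell is not counted among its own k neighbours

# nn[i, ] holds the indices of the k nearest neighbours of cell i, closest first.
nn <- t(apply(dmat, 1, function(d) order(d)[seq_len(k)]))

# Hypothetical helper (an assumption, not part of scran): rank-based weight.
# Each cell sits at rank 0 in its own list, so direct neighbours also count.
snn_weight <- function(i, j, nn, k) {
    ri <- c(i, nn[i, ])
    rj <- c(j, nn[j, ])
    shared <- intersect(ri, rj)
    if (length(shared) == 0L) return(0)
    rank.sum <- (match(shared, ri) - 1) + (match(shared, rj) - 1)
    max(k - 0.5 * rank.sum)  # best-ranked shared neighbour sets the weight
}

# Symmetric matrix of weights between all pairs of cells.
n <- ncol(x)
w <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
        w[i, j] <- w[j, i] <- snn_weight(i, j, nn, k)
    }
}
```

Under this scheme a shared neighbour that ranks highly in both lists gives a small rank sum and hence a large weight, which matches the qualitative behaviour described above.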
The aim is to use the SNN graph to perform community-based clustering, using various methods in the igraph package.
This is faster and more memory-efficient than hierarchical clustering for large numbers of cells.
In particular, it avoids the need to construct a distance matrix for all pairs of cells.
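To give a rough sense of scale, the following arithmetic compares the number of entries in a full pairwise distance matrix with the number of neighbour indices returned by a k-NN search. The cell count is purely illustrative, and the SNN graph itself may hold more edges than the raw k-NN output, so this is only an order-of-magnitude comparison.

```r
n <- 1e5   # number of cells (illustrative value)
k <- 10    # neighbours per cell, the default for 'k'

# Entries in a full pairwise distance matrix (lower triangle only).
n.pairs <- n * (n - 1) / 2

# Entries needed to store the k nearest neighbours of every cell.
n.knn <- n * k

# Ratio of the two storage requirements.
n.pairs / n.knn
```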
The choice of k can be roughly interpreted as the minimum cluster size.
Note that the setting of k here is slightly different from that used in SNN-Cliq.
The original implementation considers each cell to be its own first nearest neighbour, which counts towards k.
In buildSNNGraph, k refers to the number of nearest neighbours among the other cells, excluding the cell itself.
The buildKNNGraph method builds a simpler k-nearest neighbour graph.
Cells are again nodes, and edges are drawn between each cell and its k nearest neighbours.
No weighting of the edges is performed.
In theory, these graphs are directed as nearest neighbour relationships may not be reciprocal.
However, by default, directed=FALSE such that an undirected graph is returned.
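The difference between the directed and undirected forms can be seen in a small base R sketch using adjacency matrices. This is not how buildKNNGraph is implemented: dist() is a brute-force stand-in for the k-NN search, and collapsing to an undirected graph by keeping an edge when either cell lists the other is an assumption made here for illustration.

```r
set.seed(42)
x <- matrix(rnorm(150), nrow = 5, ncol = 30)  # 5 genes, 30 cells
k <- 3

dmat <- as.matrix(dist(t(x)))
diag(dmat) <- Inf  # exclude each cell from its own neighbour list

# Directed adjacency: directed[i, j] is TRUE if j is among the k nearest
# neighbours of i.
n <- ncol(x)
directed <- matrix(FALSE, n, n)
for (i in seq_len(n)) {
    directed[i, order(dmat[i, ])[seq_len(k)]] <- TRUE
}

# Undirected (assumed rule): keep an edge if either cell lists the other.
undirected <- directed | t(directed)

# Number of cell pairs where the neighbour relationship is not reciprocal.
sum(directed != t(directed)) / 2
```

Every row of the directed matrix has exactly k entries, whereas rows of the undirected matrix can have more, because a cell may be chosen as a neighbour by cells outside its own list.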
An igraph-type graph, where nodes are cells and edges represent connections between nearest neighbours.
For buildSNNGraph, these edges are weighted according to the ranking of the shared nearest neighbours, as described above.
For buildKNNGraph, edges are not weighted but may be directed if directed=TRUE.
In practice, PCA is performed on x to obtain the first d principal components.
This is necessary in order to perform the k-NN search (done using the get.knn function) in reasonable time.
By default, the first 50 components are chosen, which should retain most of the substructure in the data set.
If d is NA or is not less than the number of genes in x, no dimensionality reduction is performed.
If pc.approx=TRUE, prcomp_irlba will be used to quickly obtain the first d PCs.
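The reduce-then-search workflow can be sketched in base R as follows. This only mirrors the idea: prcomp() is used for the exact PCA, and a brute-force dist() call stands in for the get.knn search, so the sketch is an assumption about the general approach rather than the code that scran runs.

```r
set.seed(1)
x <- matrix(rnorm(200 * 60), nrow = 200, ncol = 60)  # 200 genes, 60 cells
d <- 10
k <- 5

# prcomp() expects observations (cells) in rows, so transpose first.
# 'rank.' limits the output to the first d principal components.
pc <- prcomp(t(x), rank. = d)
dim(pc$x)  # cells by components

# Neighbour search in the reduced space (brute force, for illustration only).
dmat <- as.matrix(dist(pc$x))
diag(dmat) <- Inf
nn <- t(apply(dmat, 1, function(v) order(v)[seq_len(k)]))
```

Distances are then computed over d columns rather than over every gene, which is what makes the search cheaper for large data sets.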
Expression values in x should typically be on the log-scale, e.g., log-transformed counts.
Ranks can also be used for greater robustness, e.g., from quickCluster with get.ranks=TRUE.
(Dimensionality reduction is still okay when ranks are provided; running PCA on ranks is equivalent to running MDS on the distance matrix derived from Spearman's rho.)
If the input matrix is already transposed, transposed=TRUE avoids an unnecessary internal transposition.
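As a rough illustration of the rank-based input, the base R sketch below ranks genes within each cell and runs PCA on the result. It does not reproduce what quickCluster returns with get.ranks=TRUE (any centring or scaling done there is not replicated), and the simulated counts are placeholders.

```r
set.seed(2)
counts <- matrix(rpois(50 * 20, lambda = 5), nrow = 50, ncol = 20)  # genes x cells

# Rank genes within each cell; tied values share the average rank.
ranks <- apply(counts, 2, rank, ties.method = "average")

# Cells in rows: the orientation that transposed=TRUE would describe.
ranks.t <- t(ranks)

# PCA on the ranks, keeping the first 5 components.
pc <- prcomp(ranks.t, rank. = 5)
```

A matrix laid out like ranks.t (cells in rows) is the kind of input for which transposed=TRUE is intended.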
By default, spike-in transcripts are removed from the expression matrix in buildSNNGraph,SingleCellExperiment-method.
However, any non-NULL setting of subset.row will override get.spikes.
If use.dimred is not NULL, existing PCs are used from the specified entry of reducedDims(x), and any settings of d, subset.row and get.spikes are ignored.
Aaron Lun
Xu C and Su Z (2015). Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics 31:1974-80
exprs <- matrix(rnorm(100000), ncol=100)
g <- buildSNNGraph(exprs)
clusters <- igraph::cluster_fast_greedy(g)$membership
table(clusters)