countCells {cydar} | R Documentation |
Count the number of cells from each sample lying inside hyperspheres in high-dimensional space.
countCells(x, tol=0.5, BPPARAM=SerialParam(), downsample=10, filter=10, naive=FALSE)
x |
A CyData object produced by |
tol |
A numeric scalar proportional to the hypersphere radius. |
BPPARAM |
A BiocParallelParam object specifying how parallelization is to be performed. |
downsample |
An integer scalar specifying the frequency with which cells are sampled to form hyperspheres. |
filter |
An integer scalar specifying the minimum count sum required to report a hypersphere. |
naive |
A logical scalar specifying whether a naive counting approach should be used. |
Consider that each cell defines a point in M-dimensional space (where M is the number of markers), based on its marker intensities. This function constructs hyperspheres and counts the number of cells from each sample lying within each hypersphere. In this manner, the distribution of cells across the space can be quantified. For each hypersphere, cell counts for all samples are reported along with the median intensity across the counted cells for each marker.
Each hypersphere is centered on a cell to ensure that only occupied spaces are counted.
However, for high-density spaces, this can result in many redundant hyperspheres.
To reduce computational work, only a subset of cells are used to define hyperspheres.
The downsampling frequency is specified by downsample
, e.g., only every 10th cell is used to make a hypersphere by default.
Each hypersphere also has a radius of tol*sqrt(M)
(this relationship avoids loss of counts as M increases).
tol
can be interpreted as the acceptable amount of deviation in the intensity of a single marker for a given subpopulation.
The default value of 0.5 means that, for any one marker, cells with +0.5 or -0.5 intensity will be counted into the same subpopulation.
This value is sensible as intensities are usually on a log-10 scale, such that a total of 10-fold variability in marker intensities is tolerated.
The coordinates are reported as (weighted) medians across all cells in each hypersphere. This better reflects the location of the hypersphere if the cells are not distributed around the centre. Each cell is weighted inversely proportional to the total number of cells in the corresponding sample. This ensures that large samples do not dominate the median calculation.
All hyperspheres with count sums below filter
are removed by default.
Such hyperspheres do not have enough counts (and thus, information) for downstream analyses.
Removing them reduces the amount of memory required to form the output matrix.
The default setting of naive=FALSE
will use a method similar to that described by Samusik et al. (2016) to speed up the search.
Here, the distance between the hypersphere and cluster centers is first computed to decide which, if any, cells in the cluster should be considered for counting.
If naive=TRUE
, a slower naive approach will be used where distances are computed between all pairs of cells.
This is generally only useful for testing.
If markerData(x)$used
is not all TRUE
, only the specified markers will be used in the distance calculations.
Median coordinates will not be calculated for any discarded markers, instead being reported as NA
in the utput intensities
.
Note that the used
field should not be set directly – prepareCellData
must be run again to change the set of used markers.
Users can also increase speed by setting BPPARAM
to use multiple cores.
Neither this or naive
will not affect the final results, only the speed with which they are obtained.
A CyData object containing the following information:
counts
An integer matrix of counts for each hypersphere (row) and sample (column) in the Assays
slot.
intensities
:A numeric matrix of median intensities for each hypersphere (row) and marker (column), stored in the intensities
slot.
cellAssignments
:A list of integer vectors specifying the cells belonging in each group.
This is stored in a compressed format, see packIndices
for details.
sample.id
:An integer vector containing an integer ID for each sample.
This corresponds to values in cellData(x)$sample.id
, to ensure that they can be properly interpreted regardless of column subsetting or combining.
totals
:An integer vector specifying the total number of cells in each sample, stored as a field in the colData
slot.
tol
:An integer scalar equal to tol
, stored in the metadata
slot.
center.cell
:An integer vector specifying the column of cellIntensities
containing the centre of each hypersphere, stored as a field in the rowData
slot.
Any existing rowData
, intensities
and assays
are discarded.
All other slots are left unchanged from the values supplied in x
.
Aaron Lun
Samusik N, Good Z, Spitzer MH et al. (2016). Automated mapping of phenotype space with single-cell data. Nat. Methods 13:493-496
example(prepareCellData, echo=FALSE) downsample <- 10L tol <- 0.5 cnt <- countCells(cd, filter=1, downsample=downsample, tol=tol) cnt