findNeighbors {BiocNeighbors} | R Documentation |
Find all neighboring data points within a certain distance with the KMKNN algorithm.
findNeighbors(X, threshold, get.index=TRUE, get.distance=TRUE, BPPARAM=SerialParam(), precomputed=NULL, subset=NULL, raw.index=FALSE, ...)
X |
A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions). |
threshold |
A positive numeric scalar specifying the maximum distance at which a point is considered a neighbor. |
get.index |
A logical scalar indicating whether the indices of the neighbors should be recorded. |
get.distance |
A logical scalar indicating whether distances to the neighbors should be recorded. |
BPPARAM |
A BiocParallelParam object indicating how the search should be parallelized. |
precomputed |
A KmknnIndex object from running |
subset |
A vector indicating the rows of |
raw.index |
A logical scalar indicating whether raw column indices to
... |
Further arguments to pass to |
This function uses the same algorithm described in findKmknn to identify all points in X that lie within threshold of each point in X.
For Euclidean distances, this is equivalent to identifying all points in a hypersphere of radius threshold centered around the point of interest.
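The hypersphere definition above can be illustrated with a brute-force search in base R. This is only a reference sketch of what the search returns, not the KMKNN algorithm itself; the object names (brute.index, brute.distance) are hypothetical, and treating the boundary as inclusive (distance <= threshold) is an assumption.

```r
# Brute-force reference: for each point, collect all points within `threshold`.
set.seed(100)
X <- matrix(runif(200), ncol=2)   # 100 points in 2 dimensions
threshold <- 0.2

# All pairwise Euclidean distances as a square matrix.
D <- as.matrix(dist(X))

# For each point i, the row indices of points inside its hypersphere.
# Assumption: the boundary is treated as inclusive here.
brute.index <- lapply(seq_len(nrow(X)), function(i) {
    unname(which(D[i,] <= threshold))
})

# The matching distances, in the same order as the indices.
brute.distance <- lapply(seq_len(nrow(X)), function(i) {
    unname(D[i, brute.index[[i]]])
})
```

This computes all pairwise distances, so it is only practical for small X; it is useful mainly as a check on the structure of the output.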
By default, a search is performed for each data point in X, but it can be limited to a specified subset of points with subset.
This yields the same result as (but is more efficient than) subsetting the output lists after running findNeighbors with subset=NULL.
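The equivalence between subset and subsetting the full output can be checked with a short sketch. The data and the choice of rows (chosen) are made up for illustration; the neighbor sets are sorted before comparison because points within each vector are not ordered.

```r
library(BiocNeighbors)

set.seed(200)
X <- matrix(runif(10000), ncol=10)   # 1000 points in 10 dimensions
chosen <- c(5, 10, 15)               # hypothetical rows of interest

# Search only for the chosen points.
sub.out <- findNeighbors(X, threshold=1, subset=chosen)

# Search for every point, then subset the output lists afterwards.
full.out <- findNeighbors(X, threshold=1)
full.index <- full.out$index[chosen]

# Compare neighbor sets after sorting, as the order within each vector is arbitrary.
same <- mapply(function(a, b) identical(sort(a), sort(b)),
               sub.out$index, full.index)
```

Every element of same should be TRUE if the two approaches agree.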
Turning off get.index or get.distance may provide a slight speed boost when these returned values are not of interest.
Using BPPARAM will also split the search by query points, which usually provides a roughly linear increase in speed with the number of workers.
If multiple queries are to be performed on the same X, it may be beneficial to use buildKmknn directly and pass the result to precomputed.
In such cases, it is also possible to set raw.index=TRUE to obtain indices of neighbors in the reordered data set in precomputed, though this will change both the nature of the output index and the interpretation of subset; see ?findKmknn for details.
A list is returned containing:
index, if get.index=TRUE. This is a list of integer vectors where each entry corresponds to a point (denoted here as i) in X. The vector for i contains the set of row indices of all points in X that lie within threshold of point i. Points in each vector are not ordered, and i will always be included in its own set.
distance, if get.distance=TRUE. This is a list of numeric vectors where each entry corresponds to a point (as above) and contains the distances of the neighbors from i. Elements of each vector in distance match to elements of the corresponding vector in index.
If subset is not NULL, each entry of the above lists refers to a point in the subset, in the same order as supplied in subset.
If raw.index=TRUE, the values in index refer to columns of KmknnIndex_clustered_data(precomputed).
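A minimal sketch of how the returned list might be inspected, assuming the default get.index=TRUE and get.distance=TRUE. The variable names (n.neighbors, nearest.first) are hypothetical; because neighbors are not ordered, they are sorted by distance explicitly.

```r
library(BiocNeighbors)

set.seed(300)
X <- matrix(runif(5000), ncol=5)     # 1000 points in 5 dimensions
out <- findNeighbors(X, threshold=0.5)

# Number of neighbors per point; each count includes the point itself.
n.neighbors <- lengths(out$index)

# Neighbors of the first point, reordered from closest to furthest.
# The distance vector is used to sort the matching index vector.
i <- 1
o <- order(out$distance[[i]])
nearest.first <- out$index[[i]][o]
```

Since index and distance are matched element by element, the same ordering vector can be applied to both.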
Aaron Lun
buildKmknn to build an index ahead of time.
Y <- matrix(runif(100000), ncol=20)
out <- findNeighbors(Y, threshold=1)