queryNeighbors {BiocNeighbors} R Documentation

Query neighbors

Description

Find all data points within a specified distance of each query point, using the k-means for k-nearest neighbors (KMKNN) algorithm.

Usage

queryNeighbors(X, query, threshold, get.index=TRUE, get.distance=TRUE, 
    BPPARAM=SerialParam(), precomputed=NULL, transposed=FALSE, subset=NULL,
    raw.index=FALSE, ...)

Arguments

X

A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions).

query

A numeric matrix of query points, with one query point per row and the same number and ordering of dimensions (columns) as X.

threshold

A positive numeric scalar specifying the maximum distance at which a point is considered a neighbor.

get.index

A logical scalar indicating whether the indices of the neighbors should be recorded.

get.distance

A logical scalar indicating whether distances to the neighbors should be recorded.

BPPARAM

A BiocParallelParam object indicating how the search should be parallelized.

precomputed

A KmknnIndex object from running buildKmknn on X.

transposed

A logical scalar indicating whether the query is transposed, in which case query is assumed to contain dimensions in the rows and data points in the columns.

subset

A vector indicating the rows of query (or columns, if transposed=TRUE) for which the neighbors should be identified.

raw.index

A logical scalar indicating whether raw column indices into the reordered data in precomputed should be returned.

...

Further arguments to pass to buildKmknn if precomputed=NULL.

Details

This function uses the same algorithm described in findKmknn to identify points in X that are neighbors (i.e., within a distance threshold) of each point in query. This requires both X and query to have the same number of dimensions.
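For reference, the neighbor sets can be defined by an exhaustive search. The base R sketch below uses a hypothetical helper, brute_range_search, which is not part of BiocNeighbors. It assumes Euclidean distance and keeps every point at or below threshold, following the "maximum distance" wording of the threshold argument. It does none of the pruning that makes KMKNN fast, so it is only practical for small inputs or for spot-checking results.

```r
# Hypothetical helper (not part of BiocNeighbors): exhaustive range search.
# Assumes Euclidean distance; this is NOT the KMKNN algorithm.
brute_range_search <- function(X, query, threshold) {
  stopifnot(ncol(X) == ncol(query))  # same number of dimensions required
  index <- vector("list", nrow(query))
  distance <- vector("list", nrow(query))
  tX <- t(X)  # dimensions in rows, so each column is one data point
  for (i in seq_len(nrow(query))) {
    # the query vector is recycled down each column of tX
    d <- sqrt(colSums((tX - query[i, ])^2))
    keep <- which(d <= threshold)  # neighbors at or below the threshold
    index[[i]] <- keep
    distance[[i]] <- d[keep]
  }
  list(index = index, distance = distance)
}
```

For example, with X <- rbind(c(0, 0), c(3, 4), c(1, 0)) and a single query point at the origin, threshold = 1 selects rows 1 and 3, at distances 0 and 1.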

By default, neighbors are identified for all data points within query. If subset is specified, neighbors are only detected for the query points in the subset. This yields the same result as (but is more efficient than) subsetting the output lists after running queryNeighbors on the full query (i.e., with subset=NULL).

If transposed=TRUE, this function assumes that query is already transposed, which saves a little time by avoiding an unnecessary transposition. Turning off get.index or get.distance may also provide a slight speed boost when these returned values are not of interest. Supplying a parallel BPPARAM splits the search across multiple processes, with each process handling a subset of the query points.
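Splitting by query points works because each query point's search is independent of the others. The sketch below uses a hypothetical helper, chunked_range_search, with plain lapply standing in for BiocParallel and a Euclidean exhaustive search standing in for KMKNN. It assigns query rows to contiguous chunks, searches each chunk separately and concatenates the results, which restores the original query order. It only illustrates the idea and is not how BiocParallel or queryNeighbors distribute work internally.

```r
# Hypothetical helper: search query rows in contiguous chunks and concatenate
# the per-chunk results. Each query point is handled independently, so the
# combined output matches a single serial pass over all query rows.
chunked_range_search <- function(X, query, threshold, n.chunks = 2L) {
  n <- nrow(query)
  tX <- t(X)  # each column is now one data point
  chunk.id <- ceiling(seq_len(n) * n.chunks / n)  # contiguous chunk labels
  chunks <- split(seq_len(n), chunk.id)
  search_rows <- function(rows) {
    lapply(rows, function(i) {
      d <- sqrt(colSums((tX - query[i, ])^2))
      which(d <= threshold)  # rows of X within threshold of query point i
    })
  }
  # Serial here; on Unix-alikes, parallel::mclapply(chunks, search_rows)
  # would run the chunks in separate forked processes instead.
  per.chunk <- lapply(chunks, search_rows)
  unlist(per.chunk, recursive = FALSE, use.names = FALSE)
}
```

Because the chunks are contiguous and their results are concatenated in chunk order, the number of chunks changes only how the work is divided, not the output.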

If multiple queries are to be performed against the same X, it may be beneficial to use buildKmknn directly to precompute the clustering. Advanced users can also set raw.index=TRUE, which returns indices of neighbors in the reordered data set in precomputed. This may be useful when dealing with multiple queries to a common precomputed object.
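The saving from precomputation comes from clustering once and reusing that work for every query. The base R sketch below imitates this split with two hypothetical helpers, build_cluster_index and query_cluster_index. The first runs kmeans once and stores each point's distance to its cluster center. The second uses the triangle inequality, |d(q, c) - d(x, c)| <= d(q, x), to skip clusters and points that cannot lie within threshold. It assumes Euclidean distance and is a simplified illustration of k-means-based pruning, not the buildKmknn implementation, which differs in detail.

```r
# Hypothetical helpers: a simplified k-means index with triangle-inequality
# pruning under Euclidean distance. Not the buildKmknn implementation.
build_cluster_index <- function(X, n.centers) {
  km <- kmeans(X, centers = n.centers, iter.max = 50)
  members <- split(seq_len(nrow(X)), km$cluster)  # row indices per cluster
  center.id <- as.integer(names(members))          # matching row of km$centers
  # distance from each member to its own cluster center, computed once
  dist.to.center <- lapply(seq_along(members), function(k) {
    rows <- members[[k]]
    sqrt(colSums((t(X[rows, , drop = FALSE]) - km$centers[center.id[k], ])^2))
  })
  list(X = X, centers = km$centers, center.id = center.id,
       members = unname(members), dist.to.center = dist.to.center)
}

query_cluster_index <- function(index, q, threshold) {
  hits <- integer(0)
  dists <- numeric(0)
  for (k in seq_along(index$members)) {
    center <- index$centers[index$center.id[k], ]
    d.qc <- sqrt(sum((q - center)^2))
    d.xc <- index$dist.to.center[[k]]
    # every member is at least d.qc - max(d.xc) from q, so skip the cluster
    if (d.qc - max(d.xc) > threshold) next
    # |d(q,c) - d(x,c)| is a lower bound on d(q,x); keep only possible hits
    cand <- index$members[[k]][abs(d.qc - d.xc) <= threshold]
    if (length(cand) == 0L) next
    d <- sqrt(colSums((t(index$X[cand, , drop = FALSE]) - q)^2))
    keep <- d <= threshold
    hits <- c(hits, cand[keep])
    dists <- c(dists, d[keep])
  }
  o <- order(hits)  # sorted by row index only to simplify comparisons
  list(index = hits[o], distance = unname(dists[o]))
}
```

In this sketch, build_cluster_index(Y, n.centers = 10) would be called once and its result passed to query_cluster_index for each new query point, loosely mirroring how a buildKmknn result is passed to precomputed.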

Value

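A list is returned containing:

index

A list of integer vectors, one per point in query, if get.index=TRUE. Each vector contains the row indices of X for all points lying within threshold of that query point.

distance

A list of numeric vectors, one per point in query, if get.distance=TRUE. Each vector contains the distances from the query point to its neighbors, matched element-wise to the corresponding vector in index.

If subset is not NULL, each entry of the above lists refers to a point in the subset, in the same order as supplied in subset.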

If raw.index=TRUE, the values in index refer to columns of KmknnIndex_clustered_data(precomputed).
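Because each query point can have a different number of neighbors, index and distance hold one vector per query point. When a flat table is more convenient, the hypothetical helper below (neighbors_to_df, not part of BiocNeighbors) expands them into one row per query-neighbor pair. The query column gives the position of the query point in the returned lists, so when subset is used it refers to positions within subset.

```r
# Hypothetical helper: flatten per-query neighbor lists into one data frame
# with one row per (query point, neighbor) pair.
neighbors_to_df <- function(index, distance) {
  n.hits <- lengths(index)  # number of neighbors for each query point
  data.frame(
    query = rep(seq_along(index), n.hits),   # repeat each query position
    neighbor = unlist(index, use.names = FALSE),
    distance = unlist(distance, use.names = FALSE)
  )
}
```

For instance, index <- list(c(4L, 9L), integer(0), 2L) with distance <- list(c(0.3, 0.8), numeric(0), 0.1) gives three rows; the second query point, which has no neighbors, contributes none.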

Author(s)

Aaron Lun

See Also

buildKmknn to build an index ahead of time.

Examples

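library(BiocNeighbors)
# 5000 reference points and 1000 query points in 20 dimensions
Y <- matrix(rnorm(100000), ncol=20)
Z <- matrix(rnorm(20000), ncol=20)
# find all points in Y within a distance of 1 of each row of Z
out <- queryNeighbors(Y, query=Z, threshold=1)
head(out$index)
head(out$distance)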

[Package BiocNeighbors version 1.0.0 Index]