BiocNeighbors 1.2.0
Another application of the KMKNN or VP tree algorithms is to identify all neighboring points within a certain (Euclidean) distance of the current point. We first mock up some data:
library(BiocNeighbors)
nobs <- 10000
ndim <- 20
data <- matrix(runif(nobs*ndim), ncol=ndim)
We apply the findNeighbors() function to data:
fout <- findNeighbors(data, threshold=1)
head(fout$index)
## [[1]]
## [1] 3132 4243 1
##
## [[2]]
## [1] 7619 9446 436 9943 2 8737 251 4291 2524
##
## [[3]]
## [1] 3743 3 7344 9087
##
## [[4]]
## [1] 5600 4 8732 2903 7791 1421 923 8526 5328 3048 9485
##
## [[5]]
## [1] 9976 5 6181 3987 9866 132 3126 1648 3779
##
## [[6]]
## [1] 87 1833 6 641
head(fout$distance)
## [[1]]
## [1] 0.7845169 0.9752670 0.0000000
##
## [[2]]
## [1] 0.9914915 0.9889466 0.9862341 0.9740887 0.0000000 0.9192367 0.8594453
## [8] 0.8947946 0.9789735
##
## [[3]]
## [1] 0.9387640 0.0000000 0.9998153 0.8960285
##
## [[4]]
## [1] 0.9681719 0.0000000 0.9766791 0.9823662 0.9182805 0.8812096 0.9873737
## [8] 0.9612648 0.9630822 0.9836869 0.9431111
##
## [[5]]
## [1] 0.9097279 0.0000000 0.9727490 0.9999831 0.7364990 0.9039447 0.9772604
## [8] 0.9763693 0.9773558
##
## [[6]]
## [1] 0.9523960 0.9519972 0.0000000 0.9818344
Each entry of the index list corresponds to a point in data and contains the row indices of all points in data that lie within threshold of that point. The point itself is always included, at a distance of zero.
For example, the 3rd point in data has the following neighbors:
fout$index[[3]]
## [1] 3743 3 7344 9087
… with the following distances to those neighbors:
fout$distance[[3]]
## [1] 0.9387640 0.0000000 0.9998153 0.8960285
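To make the meaning of this output concrete, the following base R sketch performs the same kind of range search by brute force on a small matrix. It is not how BiocNeighbors works internally. bruteNeighbors() is a hypothetical helper introduced only for illustration, and treating the threshold as inclusive is an assumption.

```r
# Brute-force range search in base R: for each row of 'x', list the rows
# (including itself) whose Euclidean distance is at most 'threshold'.
# 'bruteNeighbors' is a hypothetical helper, not a BiocNeighbors function.
bruteNeighbors <- function(x, threshold) {
    d <- as.matrix(dist(x))  # all pairwise Euclidean distances
    index <- lapply(seq_len(nrow(x)), function(i) {
        unname(which(d[i, ] <= threshold))  # inclusive boundary is an assumption
    })
    distance <- lapply(seq_along(index), function(i) {
        unname(d[i, index[[i]]])
    })
    list(index = index, distance = distance)
}

set.seed(100)
small <- matrix(runif(200 * 5), ncol = 5)  # small enough for an O(n^2) scan
brute <- bruteNeighbors(small, threshold = 0.5)
head(lengths(brute$index))  # number of neighbors found for the first few points
```

The quadratic cost of this approach is what the KMKNN and VP tree algorithms are designed to avoid on larger data sets.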
Note that, for this function, the reported neighbors are not sorted by distance. The order of the output is completely arbitrary and will vary depending on the random seed. However, the identity of the neighbors is fully deterministic.
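If a sorted order is needed, each pair of index and distance vectors can be reordered after the fact. The sketch below uses a mock-up list with the same structure as fout, filled with the values printed above for the 3rd point. sortByDistance() is a hypothetical helper and not part of the package.

```r
# Reorder each neighbor list by increasing distance.
# 'sortByDistance' is a hypothetical helper, not part of BiocNeighbors.
sortByDistance <- function(out) {
    ord <- lapply(out$distance, order)
    list(
        index = Map(function(i, o) i[o], out$index, ord),
        distance = Map(function(d, o) d[o], out$distance, ord)
    )
}

# Mock-up with the same structure as 'fout', using the values shown
# for the 3rd point.
mock <- list(
    index = list(c(3743L, 3L, 7344L, 9087L)),
    distance = list(c(0.9387640, 0.0000000, 0.9998153, 0.8960285))
)
sorted <- sortByDistance(mock)
sorted$index[[1]]
## [1]    3 9087 3743 7344

# Order-insensitive comparison of neighbor identities.
setequal(sorted$index[[1]], mock$index[[1]])
## [1] TRUE
```

Comparing neighbor sets with setequal() rather than identical() avoids spurious differences between runs that return the same neighbors in a different order.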
The queryNeighbors() function is also provided for identifying all points within a certain distance of a query point.
Given a query data set:
nquery <- 1000
ndim <- 20
query <- matrix(runif(nquery*ndim), ncol=ndim)
… we apply the queryNeighbors() function:
qout <- queryNeighbors(data, query, threshold=1)
length(qout$index)
## [1] 1000
… where each entry of qout$index corresponds to a row of query and contains its neighbors in data.
Again, the order of the output is arbitrary but the identity of the neighbors is deterministic.
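The query case can likewise be illustrated by brute force in base R, as a rough reference for what each entry of the output represents. This sketch computes all query-to-data distances at once using the expansion |q - x|^2 = |q|^2 + |x|^2 - 2 q.x. bruteQuery(), ref and qry are hypothetical names, and the inclusive threshold is an assumption.

```r
# Brute-force query range search: for each row of 'query', find the rows of
# 'data' within 'threshold'. 'bruteQuery' is a hypothetical helper.
bruteQuery <- function(data, query, threshold) {
    # Squared distances via |q - x|^2 = |q|^2 + |x|^2 - 2 q.x
    sq <- outer(rowSums(query^2), rowSums(data^2), "+") -
        2 * tcrossprod(query, data)
    d <- sqrt(pmax(sq, 0))  # clamp tiny negative values caused by rounding
    index <- lapply(seq_len(nrow(query)), function(i) {
        which(d[i, ] <= threshold)  # inclusive boundary is an assumption
    })
    distance <- lapply(seq_along(index), function(i) d[i, index[[i]]])
    list(index = index, distance = distance)
}

set.seed(200)
ref <- matrix(runif(300 * 4), ncol = 4)  # stands in for 'data'
qry <- matrix(runif(20 * 4), ncol = 4)   # stands in for 'query'
bq <- bruteQuery(ref, qry, threshold = 0.4)
length(bq$index)  # one entry per query row
## [1] 20
```

Unlike the self-search, a query point is not expected to appear among its own neighbors unless it also occurs as a row of the reference data.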
Most of the options described for findKNN() are also applicable here.
For example:
- subset to identify neighbors for a subset of points.
- get.distance to avoid retrieving distances when unnecessary.
- BPPARAM to parallelize the calculations across multiple workers.
- raw.index to return the raw indices from a precomputed index.
Note that the argument for a precomputed index is BNINDEX:
pre <- buildIndex(data, BNPARAM=KmknnParam())
fout.pre <- findNeighbors(BNINDEX=pre, threshold=1)
qout.pre <- queryNeighbors(BNINDEX=pre, query=query, threshold=1)
Users are referred to the documentation of each function for specific details.
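As a rough illustration of two of these options, the sketch below restricts the search to the first five points and skips the distances. It assumes that subset and get.distance are accepted by findNeighbors() as described above. The matrix demo and the threshold value are arbitrary choices for this example, and the code is skipped when the package is not installed.

```r
# Sketch only: requires the BiocNeighbors package; skipped when not installed.
if (requireNamespace("BiocNeighbors", quietly = TRUE)) {
    set.seed(300)
    demo <- matrix(runif(500 * 5), ncol = 5)  # arbitrary example data

    # Indices only, and only for the first five rows of 'demo'.
    sub <- BiocNeighbors::findNeighbors(demo, threshold = 0.5,
        subset = 1:5, get.distance = FALSE)
    length(sub$index)  # one entry per requested point
}
```

Skipping the distances saves memory when only the neighbor identities are needed, for example when counting neighbors with lengths(sub$index).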
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows Server 2012 R2 x64 (build 9600)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BiocParallel_1.18.0 BiocNeighbors_1.2.0 knitr_1.22
## [4] BiocStyle_2.12.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.1 bookdown_0.9 digest_0.6.18
## [4] stats4_3.6.0 magrittr_1.5 evaluate_0.13
## [7] stringi_1.4.3 S4Vectors_0.22.0 rmarkdown_1.12
## [10] tools_3.6.0 stringr_1.4.0 parallel_3.6.0
## [13] xfun_0.6 yaml_2.2.0 compiler_3.6.0
## [16] BiocGenerics_0.30.0 BiocManager_1.30.4 htmltools_0.3.6