TileDBArray 1.12.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray
backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
TileDBArray
Creating a TileDBArray
is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -2.76255396 0.25018412 -0.09447803 . -0.4951537 1.4188033
## [2,] 1.80111837 1.06835858 0.91679834 . 0.2015209 -1.1327626
## [3,] 1.58059395 0.85957675 1.43343773 . 1.2999757 1.4797679
## [4,] 0.53703490 -0.05966374 -1.51610121 . 2.3929684 1.7825732
## [5,] -0.55751350 -0.65382354 -0.63652367 . -0.5764152 0.8416369
## ... . . . . . .
## [96,] 0.608803969 1.029553458 0.254568444 . 1.14701805 0.16430067
## [97,] 0.237112681 -1.510721258 -0.006912663 . -0.55418746 -1.95763448
## [98,] 0.522278841 -0.120793789 0.179037861 . 0.45119546 -0.03422747
## [99,] -1.494202295 -0.920249763 -1.205330593 . 0.48813183 -1.76651033
## [100,] 0.833449789 0.008262907 -0.871029375 . -0.07645446 -0.09360263
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -2.76255396 0.25018412 -0.09447803 . -0.4951537 1.4188033
## [2,] 1.80111837 1.06835858 0.91679834 . 0.2015209 -1.1327626
## [3,] 1.58059395 0.85957675 1.43343773 . 1.2999757 1.4797679
## [4,] 0.53703490 -0.05966374 -1.51610121 . 2.3929684 1.7825732
## [5,] -0.55751350 -0.65382354 -0.63652367 . -0.5764152 0.8416369
## ... . . . . . .
## [96,] 0.608803969 1.029553458 0.254568444 . 1.14701805 0.16430067
## [97,] 0.237112681 -1.510721258 -0.006912663 . -0.55418746 -1.95763448
## [98,] 0.522278841 -0.120793789 0.179037861 . 0.45119546 -0.03422747
## [99,] -1.494202295 -0.920249763 -1.205330593 . 0.48813183 -1.76651033
## [100,] 0.833449789 0.008262907 -0.871029375 . -0.07645446 -0.09360263
This process works also for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0.0 0.0
## [2,] 0 0 0 . 0.0 0.0
## [3,] 0 0 0 . 0.0 0.0
## [4,] 0 0 0 . 1.9 -1.2
## [5,] 0 0 0 . 0.0 0.0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . TRUE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -2.76255396 0.25018412 -0.09447803 . -0.4951537 1.4188033
## GENE_2 1.80111837 1.06835858 0.91679834 . 0.2015209 -1.1327626
## GENE_3 1.58059395 0.85957675 1.43343773 . 1.2999757 1.4797679
## GENE_4 0.53703490 -0.05966374 -1.51610121 . 2.3929684 1.7825732
## GENE_5 -0.55751350 -0.65382354 -0.63652367 . -0.5764152 0.8416369
## ... . . . . . .
## GENE_96 0.608803969 1.029553458 0.254568444 . 1.14701805 0.16430067
## GENE_97 0.237112681 -1.510721258 -0.006912663 . -0.55418746 -1.95763448
## GENE_98 0.522278841 -0.120793789 0.179037861 . 0.45119546 -0.03422747
## GENE_99 -1.494202295 -0.920249763 -1.205330593 . 0.48813183 -1.76651033
## GENE_100 0.833449789 0.008262907 -0.871029375 . -0.07645446 -0.09360263
TileDBArray
sTileDBArray
s are simply DelayedArray
objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -2.7625540 1.8011184 1.5805939 0.5370349 -0.5575135 -0.7333186
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix
object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -2.76255396 0.25018412 -0.09447803 0.04503548 0.15236080
## GENE_2 1.80111837 1.06835858 0.91679834 1.02280627 -0.30283835
## GENE_3 1.58059395 0.85957675 1.43343773 -1.08421333 -1.33943363
## GENE_4 0.53703490 -0.05966374 -1.51610121 0.99355115 -0.57521883
## GENE_5 -0.55751350 -0.65382354 -0.63652367 0.01462782 -2.24394824
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -5.5251079 0.5003682 -0.1889561 . -0.9903074 2.8376067
## GENE_2 3.6022367 2.1367172 1.8335967 . 0.4030417 -2.2655252
## GENE_3 3.1611879 1.7191535 2.8668755 . 2.5999514 2.9595358
## GENE_4 1.0740698 -0.1193275 -3.0322024 . 4.7859369 3.5651464
## GENE_5 -1.1150270 -1.3076471 -1.2730473 . -1.1528305 1.6832739
## ... . . . . . .
## GENE_96 1.21760794 2.05910692 0.50913689 . 2.29403609 0.32860135
## GENE_97 0.47422536 -3.02144252 -0.01382533 . -1.10837492 -3.91526895
## GENE_98 1.04455768 -0.24158758 0.35807572 . 0.90239092 -0.06845494
## GENE_99 -2.98840459 -1.84049953 -2.41066119 . 0.97626365 -3.53302066
## GENE_100 1.66689958 0.01652581 -1.74205875 . -0.15290891 -0.18720527
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7 SAMP_8
## 2.630159 8.965379 3.832204 -1.510771 3.599369 3.352911 1.134673 -2.689144
## SAMP_9 SAMP_10
## -1.092709 -6.543993
out %*% runif(ncol(out))
## <100 x 1> DelayedMatrix object of type "double":
## y
## GENE_1 -0.5707955
## GENE_2 2.3459591
## GENE_3 2.5554252
## GENE_4 3.0061574
## GENE_5 -1.6150756
## ... .
## GENE_96 2.6940113
## GENE_97 -1.2440827
## GENE_98 0.4202578
## GENE_99 -3.1219278
## GENE_100 0.5251844
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray()
.
For example, the example below allows us to control the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.70511887 -0.42690197 -0.55485552 . -0.90747793 0.22097639
## [2,] 0.44719851 -0.64256273 -0.68818340 . -1.36429862 0.30861882
## [3,] -0.13352481 -0.20758877 0.45503852 . -1.23934257 -1.11949572
## [4,] -2.27189488 1.18842839 0.28084975 . 0.31320911 -0.80421345
## [5,] -0.77519871 -1.88059567 -0.08228658 . 1.10463688 -0.05519271
## ... . . . . . .
## [96,] -2.10188497 0.09824464 0.19988037 . 0.06077448 -0.89925660
## [97,] 0.57624637 -0.59909270 0.32777623 . -1.70959150 -2.03022629
## [98,] 0.69931611 0.57649554 -1.79446723 . 0.04316689 0.73002332
## [99,] -0.31381739 0.12742245 -0.22885635 . -0.00269354 0.35959225
## [100,] -0.74154836 0.66265278 -0.53792720 . -0.83579852 -1.35268708
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.70511887 -0.42690197 -0.55485552 . -0.90747793 0.22097639
## [2,] 0.44719851 -0.64256273 -0.68818340 . -1.36429862 0.30861882
## [3,] -0.13352481 -0.20758877 0.45503852 . -1.23934257 -1.11949572
## [4,] -2.27189488 1.18842839 0.28084975 . 0.31320911 -0.80421345
## [5,] -0.77519871 -1.88059567 -0.08228658 . 1.10463688 -0.05519271
## ... . . . . . .
## [96,] -2.10188497 0.09824464 0.19988037 . 0.06077448 -0.89925660
## [97,] 0.57624637 -0.59909270 0.32777623 . -1.70959150 -2.03022629
## [98,] 0.69931611 0.57649554 -1.79446723 . 0.04316689 0.73002332
## [99,] -0.31381739 0.12742245 -0.22885635 . -0.00269354 0.35959225
## [100,] -0.74154836 0.66265278 -0.53792720 . -0.83579852 -1.35268708
sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.14 TileDBArray_1.12.0 DelayedArray_0.28.0
## [4] SparseArray_1.2.0 S4Arrays_1.2.0 abind_1.4-5
## [7] IRanges_2.36.0 S4Vectors_0.40.0 MatrixGenerics_1.14.0
## [10] matrixStats_1.0.0 BiocGenerics_0.48.0 Matrix_1.6-1.1
## [13] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 jsonlite_1.8.7 compiler_4.3.1
## [4] BiocManager_1.30.22 crayon_1.5.2 Rcpp_1.0.11
## [7] jquerylib_0.1.4 yaml_2.3.7 fastmap_1.1.1
## [10] lattice_0.22-5 R6_2.5.1 RcppCCTZ_0.2.12
## [13] XVector_0.42.0 tiledb_0.21.1 knitr_1.44
## [16] bookdown_0.36 bslib_0.5.1 rlang_1.1.1
## [19] cachem_1.0.8 xfun_0.40 sass_0.4.7
## [22] bit64_4.0.5 cli_3.6.1 zlibbioc_1.48.0
## [25] spdl_0.0.5 digest_0.6.33 grid_4.3.1
## [28] data.table_1.14.8 evaluate_0.22 nanotime_0.3.7
## [31] zoo_1.8-12 rmarkdown_2.25 tools_4.3.1
## [34] htmltools_0.5.6.1