TileDBArray 1.4.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.18491141 -0.06644347 0.40935304 . -0.9053600 0.7126573
## [2,] -0.88541781 -1.46025298 -1.08259915 . 3.1804115 -0.3933446
## [3,] -2.16841022 1.55932205 0.15685977 . -0.2511989 0.6374381
## [4,] -1.41114250 -2.53518103 0.54661075 . -0.3529032 -1.3483216
## [5,] 0.67329350 1.08991768 1.40738942 . -0.8734643 1.2318889
## ... . . . . . .
## [96,] -0.7563603 1.2130494 2.0335353 . -0.27573070 1.88149548
## [97,] 1.6453879 0.7305726 1.1647541 . 0.09121694 0.29867557
## [98,] -0.3648721 -1.3261283 1.2246742 . 0.05896685 -1.51118489
## [99,] -0.4039592 0.3758424 0.3392724 . -0.17935501 0.17217728
## [100,] 1.1750179 -1.1119851 -0.6200221 . -1.18040691 0.33866812
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.18491141 -0.06644347 0.40935304 . -0.9053600 0.7126573
## [2,] -0.88541781 -1.46025298 -1.08259915 . 3.1804115 -0.3933446
## [3,] -2.16841022 1.55932205 0.15685977 . -0.2511989 0.6374381
## [4,] -1.41114250 -2.53518103 0.54661075 . -0.3529032 -1.3483216
## [5,] 0.67329350 1.08991768 1.40738942 . -0.8734643 1.2318889
## ... . . . . . .
## [96,] -0.7563603 1.2130494 2.0335353 . -0.27573070 1.88149548
## [97,] 1.6453879 0.7305726 1.1647541 . 0.09121694 0.29867557
## [98,] -0.3648721 -1.3261283 1.2246742 . 0.05896685 -1.51118489
## [99,] -0.4039592 0.3758424 0.3392724 . -0.17935501 0.17217728
## [100,] 1.1750179 -1.1119851 -0.6200221 . -1.18040691 0.33866812
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse matrix of class TileDBMatrix and type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
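One way to confirm that sparsity and value type are preserved by the backend is to query the resulting objects directly. The following is a minimal sketch, assuming that is_sparse() and type() from DelayedArray are available once TileDBArray is attached; the seed and the variable names tdb and tdb_lgl are illustrative only.

```r
library(TileDBArray)

# Arbitrary seed so the illustrative matrix is reproducible.
set.seed(42)
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)

# Write the numeric sparse matrix and its logical counterpart.
tdb <- writeTileDBArray(Y)
tdb_lgl <- writeTileDBArray(Y > 0)

# Query sparsity and the type of the stored values.
is_sparse(tdb)
type(tdb)
type(tdb_lgl)
```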
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> matrix of class TileDBMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.18491141 -0.06644347 0.40935304 . -0.9053600 0.7126573
## GENE_2 -0.88541781 -1.46025298 -1.08259915 . 3.1804115 -0.3933446
## GENE_3 -2.16841022 1.55932205 0.15685977 . -0.2511989 0.6374381
## GENE_4 -1.41114250 -2.53518103 0.54661075 . -0.3529032 -1.3483216
## GENE_5 0.67329350 1.08991768 1.40738942 . -0.8734643 1.2318889
## ... . . . . . .
## GENE_96 -0.7563603 1.2130494 2.0335353 . -0.27573070 1.88149548
## GENE_97 1.6453879 0.7305726 1.1647541 . 0.09121694 0.29867557
## GENE_98 -0.3648721 -1.3261283 1.2246742 . 0.05896685 -1.51118489
## GENE_99 -0.4039592 0.3758424 0.3392724 . -0.17935501 0.17217728
## GENE_100 1.1750179 -1.1119851 -0.6200221 . -1.18040691 0.33866812
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.1849114 -0.8854178 -2.1684102 -1.4111425 0.6732935 0.3938469
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.18491141 -0.06644347 0.40935304 -0.23563900 0.25026910
## GENE_2 -0.88541781 -1.46025298 -1.08259915 -0.48017577 0.10592945
## GENE_3 -2.16841022 1.55932205 0.15685977 0.60395587 1.49675010
## GENE_4 -1.41114250 -2.53518103 0.54661075 0.01733356 -0.36203344
## GENE_5 0.67329350 1.08991768 1.40738942 1.33691185 -0.21291843
out * 2
## <100 x 10> matrix of class DelayedMatrix and type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.3698228 -0.1328869 0.8187061 . -1.8107201 1.4253146
## GENE_2 -1.7708356 -2.9205060 -2.1651983 . 6.3608230 -0.7866891
## GENE_3 -4.3368204 3.1186441 0.3137195 . -0.5023979 1.2748763
## GENE_4 -2.8222850 -5.0703621 1.0932215 . -0.7058063 -2.6966431
## GENE_5 1.3465870 2.1798354 2.8147788 . -1.7469286 2.4637778
## ... . . . . . .
## GENE_96 -1.5127206 2.4260988 4.0670706 . -0.5514614 3.7629910
## GENE_97 3.2907758 1.4611451 2.3295083 . 0.1824339 0.5973511
## GENE_98 -0.7297442 -2.6522566 2.4493484 . 0.1179337 -3.0223698
## GENE_99 -0.8079184 0.7516848 0.6785448 . -0.3587100 0.3443546
## GENE_100 2.3500358 -2.2239702 -1.2400443 . -2.3608138 0.6773362
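To make the delayed behavior concrete, the sketch below forces the computation by coercing to an ordinary matrix. It assumes the standard as.matrix() coercion for DelayedArray objects; the names mat, tdb, doubled and realized are illustrative and not part of the package.

```r
library(TileDBArray)

# Small illustrative matrix; the seed is arbitrary.
set.seed(100)
mat <- matrix(rnorm(1000), ncol=10)
tdb <- as(mat, "TileDBArray")

# Arithmetic only records the operation; the backend is left untouched.
doubled <- tdb * 2
class(doubled)

# Coercing to an ordinary matrix is what actually runs the delayed operation.
realized <- as.matrix(doubled)
all.equal(realized, mat * 2)
```

Realization loads all values into memory, so for large arrays it is usually better to realize only the subset that is needed.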
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6 SAMP_7
## -5.006796 -16.867885 5.085463 -1.258486 -6.766969 -8.282045 6.722204
## SAMP_8 SAMP_9 SAMP_10
## 9.920361 -15.750143 7.407194
out %*% runif(ncol(out))
## <100 x 1> matrix of class DelayedMatrix and type "double":
## y
## GENE_1 0.05268971
## GENE_2 -0.05831130
## GENE_3 0.41017134
## GENE_4 -1.30239924
## GENE_5 1.97444756
## ... .
## GENE_96 1.2888682
## GENE_97 2.4139669
## GENE_98 -1.6666853
## GENE_99 -0.3068987
## GENE_100 -0.1538679
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.04120818 0.84895996 1.76368742 . 0.1423875 -0.6054166
## [2,] -1.08567052 -0.51237977 0.63169326 . 0.3529880 -0.4722672
## [3,] -0.25072349 -0.41770307 -0.09854563 . 1.2672261 0.4588575
## [4,] 1.09290244 -1.25842654 2.34666653 . 0.7537928 0.8035950
## [5,] -0.38850681 0.42129978 -0.09933754 . -1.2401123 0.2677466
## ... . . . . . .
## [96,] -0.16704577 1.05904172 1.68727410 . 0.41190337 0.07110572
## [97,] -0.72276454 -0.55365024 -0.42045708 . -0.78980321 1.55246615
## [98,] -0.09278692 1.04198060 -0.59953840 . -0.11887326 0.49778809
## [99,] 0.55117205 -0.48123158 -1.35055374 . 1.70057648 -1.40693715
## [100,] -0.27031656 -0.98789866 -1.61710447 . -1.29611135 1.49733074
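Knowing the path and attribute name also makes it possible to reopen the same backend later. The following is a sketch under the assumption that the TileDBArray() constructor accepts a path together with an attr argument; the names mat and reopened are illustrative.

```r
library(TileDBArray)

# Write a matrix to a known location under a custom attribute name.
mat <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(mat, path=path, attr="WHEE")

# Point the constructor at the existing backend instead of writing it again.
reopened <- TileDBArray(path, attr="WHEE")
dim(reopened)

# The values read back should match what was written.
all.equal(as.matrix(reopened), mat)
```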
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> matrix of class TileDBMatrix and type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 1.04120818 0.84895996 1.76368742 . 0.1423875 -0.6054166
## [2,] -1.08567052 -0.51237977 0.63169326 . 0.3529880 -0.4722672
## [3,] -0.25072349 -0.41770307 -0.09854563 . 1.2672261 0.4588575
## [4,] 1.09290244 -1.25842654 2.34666653 . 0.7537928 0.8035950
## [5,] -0.38850681 0.42129978 -0.09933754 . -1.2401123 0.2677466
## ... . . . . . .
## [96,] -0.16704577 1.05904172 1.68727410 . 0.41190337 0.07110572
## [97,] -0.72276454 -0.55365024 -0.42045708 . -0.78980321 1.55246615
## [98,] -0.09278692 1.04198060 -0.59953840 . -0.11887326 0.49778809
## [99,] 0.55117205 -0.48123158 -1.35055374 . 1.70057648 -1.40693715
## [100,] -0.27031656 -0.98789866 -1.61710447 . -1.29611135 1.49733074
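The global path can be inspected and unset again once it is no longer needed. This sketch assumes that getTileDBPath() returns the current setting and that passing NULL to setTileDBPath() clears it; the variable current is illustrative.

```r
library(TileDBArray)

# Set a global path and read it back.
path2 <- tempfile()
setTileDBPath(path2)
current <- getTileDBPath()
current

# Unset the global path so that later coercions choose their own location.
setTileDBPath(NULL)
is.null(getTileDBPath())
```

Unsetting the path after use avoids later coercions unintentionally targeting the same location.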
sessionInfo()
## R version 4.1.1 Patched (2021-08-22 r80813)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] TileDBArray_1.4.0 DelayedArray_0.20.0 IRanges_2.28.0
## [4] S4Vectors_0.32.0 MatrixGenerics_1.6.0 matrixStats_0.61.0
## [7] BiocGenerics_0.40.0 Matrix_1.3-4 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.7 bslib_0.3.1 compiler_4.1.1
## [4] BiocManager_1.30.16 jquerylib_0.1.4 tools_4.1.1
## [7] digest_0.6.28 bit_4.0.4 jsonlite_1.7.2
## [10] evaluate_0.14 lattice_0.20-45 nanotime_0.3.3
## [13] rlang_0.4.12 RcppCCTZ_0.2.9 yaml_2.2.1
## [16] xfun_0.27 fastmap_1.1.0 stringr_1.4.0
## [19] knitr_1.36 sass_0.4.0 bit64_4.0.5
## [22] grid_4.1.1 R6_2.5.1 rmarkdown_2.11
## [25] bookdown_0.24 tiledb_0.9.7 magrittr_2.0.1
## [28] htmltools_0.5.2 stringi_1.7.5 zoo_1.8-9