anndataR
allows users to work with .h5ad files and
.zarr stores, interact with AnnData objects
and convert to/from SingleCellExperiment or
Seurat objects. This enables users to move data easily
between the different programming languages and analysis ecosystems
needed to perform single-cell data analysis.
This package builds on our experience developing and using other
interoperability packages and aims to provide a first-class R
AnnData experience.
Existing packages provide similar functionality to anndataR but there are some important differences:
One package converts SingleCellExperiment objects to/from
AnnData and reads/writes .h5ad files.
This is facilitated via reticulate,
using basilisk
to manage Python environments (native reading of .h5ad is
also possible). In contrast, anndataR
provides a native R H5AD and Zarr interface, removing the need for
Python dependencies. Conversion to/from Seurat objects is
also supported.
anndataR also provides the AnnData data
structure with different back ends and conversion to more common R
objects.
There is significant overlap in functionality between these packages. We anticipate that over time they will become more specialised and work better together or be deprecated in favour of anndataR.
Install anndataR using BiocManager:
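A minimal sketch of the installation step, assuming the standard Bioconductor workflow. The `requireNamespace()` guards are an addition here so that the commands are skipped when the packages are already present:

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install anndataR from Bioconductor (skipped if already installed)
if (!requireNamespace("anndataR", quietly = TRUE)) {
  BiocManager::install("anndataR")
}
```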
These sections briefly show how to use anndataR.
First, we fetch an example .h5ad file included in the
package:
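As a sketch, the bundled file can be located with `system.file()`. The file name `example.h5ad` is an assumption, chosen by analogy with the `example_v2.zarr.zip` store used later in this document; adjust it if the installed package ships a different file.

```r
# Locate the example H5AD file shipped with the package.
# NOTE: "example.h5ad" is an assumed file name; system.file() returns ""
# when no such file exists in the installed package.
h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")
h5ad_path
```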
By default, an H5AD file is read into an in-memory AnnData
object. It can also be read as a SingleCellExperiment object
(see vignette("usage_singlecellexperiment")) or as a Seurat object (see
vignette("usage_seurat")).
An HDF5-backed AnnData object is also available.
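The H5AD reading modes described above might look as follows. This sketch mirrors the `read_zarr()` calls shown later in this document. It assumes that `read_h5ad()` accepts the same `as` argument, with `"HDF5AnnData"` selecting the HDF5-backed class, and that the bundled file is named `example.h5ad`.

```r
library(anndataR)

# Assumed file name for the bundled example (see the note above)
h5ad_path <- system.file("extdata", "example.h5ad", package = "anndataR")

# in-memory (default)
adata <- read_h5ad(h5ad_path)
# as SingleCellExperiment
sce <- read_h5ad(h5ad_path, as = "SingleCellExperiment")
# as Seurat
obj <- read_h5ad(h5ad_path, as = "Seurat")
# as HDF5-backed
adata <- read_h5ad(h5ad_path, as = "HDF5AnnData")
```

The last call overwrites `adata` with the HDF5-backed object, in the same way the Zarr example reuses the name for the Zarr-backed object.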
Similarly, these functionalities are provided for .zarr
stores too.
zarr_path <- system.file("extdata", "example_v2.zarr.zip", package = "anndataR")
td <- tempdir(check = TRUE)
unzip(zarr_path, exdir = td)
zarr_path <- file.path(td, "example_v2.zarr")
# in-memory
adata <- read_zarr(zarr_path)
# as SingleCellExperiment
sce <- read_zarr(zarr_path, as = "SingleCellExperiment")
# as Seurat
obj <- read_zarr(zarr_path, as = "Seurat")
# as Zarr-backed
adata <- read_zarr(zarr_path, as = "ZarrAnnData")
See the package documentation for interacting with a Python AnnData via reticulate.
AnnData objects
View structure:
adata
#> ZarrAnnData object with n_obs × n_vars = 50 × 100
#> obs: 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#> var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#> uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'hvg', 'Int', 'IntNA', 'IntScalar', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'umap'
#> obsm: 'X_pca', 'X_umap'
#> varm: 'PCs'
#> layers: 'counts', 'csc_counts', 'dense_counts', 'dense_X'
#> obsp: 'connectivities', 'distances'
#> varp: 'test_varp'
Access AnnData slots:
dim(adata$X)
#> [1] 50 100
adata$obs[1:5, 1:6]
#>         Float FloatNA Int IntNA  Bool BoolNA
#> Cell000 42.42     NaN   0    NA FALSE  FALSE
#> Cell001 42.42   42.42   1    42  TRUE     NA
#> Cell002 42.42   42.42   2    42  TRUE   TRUE
#> Cell003 42.42   42.42   3    42  TRUE   TRUE
#> Cell004 42.42   42.42   4    42  TRUE   TRUE
adata$var[1:5, 1:6]
#>          String n_cells_by_counts mean_counts log1p_mean_counts
#> Gene000 String0                44        1.94          1.078410
#> Gene001 String1                42        2.04          1.111858
#> Gene002 String2                43        2.12          1.137833
#> Gene003 String3                41        1.72          1.000632
#> Gene004 String4                42        2.06          1.118415
#>         pct_dropout_by_counts total_counts
#> Gene000                    12           97
#> Gene001                    16          102
#> Gene002                    14          106
#> Gene003                    18           86
#> Gene004                    16          103
Subsetting AnnData objects is
covered below. See ?`AnnData-usage` for more details on how
to work with AnnData objects.
Convert the AnnData object to a
SingleCellExperiment object (see
vignette("usage_singlecellexperiment")):
sce <- adata$as_SingleCellExperiment()
sce
#> class: SingleCellExperiment
#> dim: 100 50
#> metadata(18): Bool BoolNA ... StringScalar umap
#> assays(5): counts csc_counts dense_counts dense_X X
#> rownames(100): Gene000 Gene001 ... Gene098 Gene099
#> rowData names(11): String n_cells_by_counts ... dispersions
#> dispersions_norm
#> colnames(50): Cell000 Cell001 ... Cell048 Cell049
#> colData names(11): Float FloatNA ... log1p_total_counts leiden
#> reducedDimNames(2): X_pca X_umap
#> mainExpName: NULL
#> altExpNames(0):
Convert the AnnData object to a Seurat
object (see vignette("usage_seurat")):
obj <- adata$as_Seurat()
obj
#> An object of class Seurat
#> 100 features across 50 samples within 1 assay
#> Active assay: RNA (100 features, 0 variable features)
#> 5 layers present: counts, csc_counts, dense_counts, dense_X, X
#> 2 dimensional reductions calculated: X_pca, X_umap
Convert a SingleCellExperiment object to an
AnnData object (see
vignette("usage_singlecellexperiment")):
adata <- as_AnnData(sce)
adata
#> InMemoryAnnData object with n_obs × n_vars = 50 × 100
#> obs: 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#> var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#> uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'hvg', 'Int', 'IntNA', 'IntScalar', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'umap'
#> obsm: 'X_pca', 'X_umap'
#> varm: 'X_pca'
#> layers: 'counts', 'csc_counts', 'dense_counts', 'dense_X', 'X'
#> obsp: 'connectivities', 'distances'
#> varp: 'test_varp'
Convert a Seurat object to an AnnData
object (see vignette("usage_seurat")):
adata <- as_AnnData(obj)
adata
#> InMemoryAnnData object with n_obs × n_vars = 50 × 100
#> obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'Float', 'FloatNA', 'Int', 'IntNA', 'Bool', 'BoolNA', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'leiden'
#> var: 'String', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
#> uns: 'Bool', 'BoolNA', 'Category', 'DataFrameEmpty', 'hvg', 'Int', 'IntNA', 'IntScalar', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'Sparse1D', 'String', 'String2D', 'StringScalar', 'umap'
#> obsm: 'X_pca', 'X_umap'
#> layers: 'counts', 'csc_counts', 'dense_counts', 'dense_X', 'X'
#> obsp: 'connectivities', 'distances'
AnnData object
Write an AnnData object to disk:
tmpfile <- tempfile(fileext = ".h5ad")
adata$write_h5ad(tmpfile) # Alternatively, write_h5ad(adata, tmpfile)
Write a SingleCellExperiment object to disk:
tmpfile <- tempfile(fileext = ".h5ad")
write_h5ad(sce, tmpfile)
#> Warning: Could not write element 'obsp/connectivities' of type 'dgTMatrix': Unsupported
#> matrix format in 'obsp/connectivities' ℹ Supported matrices inherit from
#> <RsparseMatrix> or <CsparseMatrix>
#> Warning: Could not write element 'obsp/distances' of type 'dgTMatrix': Unsupported
#> matrix format in 'obsp/distances' ℹ Supported matrices inherit from
#> <RsparseMatrix> or <CsparseMatrix>
#> Warning: Could not write element 'varp/test_varp' of type 'dgTMatrix': Unsupported
#> matrix format in 'varp/test_varp' ℹ Supported matrices inherit from
#> <RsparseMatrix> or <CsparseMatrix>
Write a Seurat object to disk:
tmpfile <- tempfile(fileext = ".h5ad")
write_h5ad(obj, tmpfile)
#> Warning: Matrix column names cannot be written to a <HDF5AnnData> object, they will be
#> lost
#> ℹ To write column names for obsm[['X_pca']], store it as <data.frame> instead
#> of a double matrix
#> ℹ NOTE: obs_names and var_names are stored separately
#> Warning: Matrix column names cannot be written to a <HDF5AnnData> object, they will be
#> lost
#> ℹ To write column names for obsm[['X_umap']], store it as <data.frame> instead
#> of a double matrix
#> ℹ NOTE: obs_names and var_names are stored separately
Similarly, we can write AnnData and other objects to
.zarr stores too.
tmpfile <- tempfile(fileext = ".zarr")
adata$write_zarr(tmpfile) # Alternatively, write_zarr(adata, tmpfile)
tmpfile <- tempfile(fileext = ".zarr")
write_zarr(sce, tmpfile)
#> Warning: Could not write element 'obsp/connectivities' of type 'dgTMatrix': Unsupported
#> matrix format in 'obsp/connectivities' ℹ Supported matrices inherit from
#> <RsparseMatrix> or <CsparseMatrix>
#> Warning: Could not write element 'obsp/distances' of type 'dgTMatrix': Unsupported
#> matrix format in 'obsp/distances' ℹ Supported matrices inherit from
#> <RsparseMatrix> or <CsparseMatrix>
#> Warning: Could not write element 'varp/test_varp' of type 'dgTMatrix': Unsupported
#> matrix format in 'varp/test_varp' ℹ Supported matrices inherit from
#> <RsparseMatrix> or <CsparseMatrix>
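The warnings above state that supported sparse matrices inherit from <RsparseMatrix> or <CsparseMatrix>, and that elements of type 'dgTMatrix' are not written. One possible workaround, sketched here under the assumption that the affected element can be replaced before writing, is to coerce the triplet-format matrix to a compressed column format with the Matrix package. The matrix below is made-up illustrative data, not taken from the example object.

```r
library(Matrix)

# A small triplet-format sparse matrix (class dgTMatrix), standing in for an
# element such as 'obsp/connectivities' (illustrative values only)
triplet <- sparseMatrix(
  i = c(1, 2, 3),
  j = c(2, 3, 1),
  x = c(0.5, 1, 2),
  dims = c(3, 3),
  repr = "T"
)
class(triplet)

# Coerce to compressed sparse column format, which inherits from CsparseMatrix
csc <- as(triplet, "CsparseMatrix")
class(csc)
```

The coercion changes only the storage format; the matrix values are unchanged.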
tmpfile <- tempfile(fileext = ".zarr")
write_zarr(obj, tmpfile)
#> Warning: Matrix column names cannot be written to a <ZarrAnnData> object, they will be
#> lost
#> ℹ To write column names for obsm[['X_pca']], store it as <data.frame> instead
#> of a double matrix
#> ℹ NOTE: obs_names and var_names are stored separately
#> Warning: Matrix column names cannot be written to a <ZarrAnnData> object, they will be
#> lost
#> ℹ To write column names for obsm[['X_umap']], store it as <data.frame> instead
#> of a double matrix
#> ℹ NOTE: obs_names and var_names are stored separately
AnnData objects
anndataR
provides standard R subsetting methods that work with familiar bracket
notation. These methods return AnnDataView objects that
provide lazy evaluation for efficient memory usage.
Subset observations (rows) using logical conditions:
# Create some sample data
adata <- AnnData(
  X = matrix(rnorm(50), nrow = 10, ncol = 5),
  obs = data.frame(
    cell_type = factor(rep(c("A", "B", "C"), c(3, 4, 3))),
    score = runif(10)
  ),
  var = data.frame(
    gene_name = paste0("gene_", 1:5),
    highly_variable = c(TRUE, FALSE, TRUE, TRUE, FALSE)
  )
)
# Subset to cell type "A"
view1 <- adata[adata$obs$cell_type == "A", ]
view1
#> View of InMemoryAnnData object with n_obs × n_vars = 3 × 5
#> obs: 'cell_type', 'score'
#> var: 'gene_name', 'highly_variable'
Subset variables (columns) using logical conditions:
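A sketch of column subsetting, following the same bracket pattern as the row example. The name `view2` is chosen here for illustration, and the toy data is rebuilt so the block runs on its own:

```r
library(anndataR)

# Same toy data as in the previous example
adata <- AnnData(
  X = matrix(rnorm(50), nrow = 10, ncol = 5),
  obs = data.frame(
    cell_type = factor(rep(c("A", "B", "C"), c(3, 4, 3))),
    score = runif(10)
  ),
  var = data.frame(
    gene_name = paste0("gene_", 1:5),
    highly_variable = c(TRUE, FALSE, TRUE, TRUE, FALSE)
  )
)

# Subset to highly variable genes (3 of the 5 variables are flagged TRUE)
view2 <- adata[, adata$var$highly_variable]
view2
```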
Subset both observations and variables simultaneously:
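A sketch combining both conditions. The name `view3` matches the object inspected further down, where it has 3 observations and 3 variables, but the exact call is an assumption reconstructed from that output. The toy data is rebuilt so the block runs on its own:

```r
library(anndataR)

# Same toy data as in the earlier subsetting example
adata <- AnnData(
  X = matrix(rnorm(50), nrow = 10, ncol = 5),
  obs = data.frame(
    cell_type = factor(rep(c("A", "B", "C"), c(3, 4, 3))),
    score = runif(10)
  ),
  var = data.frame(
    gene_name = paste0("gene_", 1:5),
    highly_variable = c(TRUE, FALSE, TRUE, TRUE, FALSE)
  )
)

# Subset to cell type "A" and highly variable genes
view3 <- adata[adata$obs$cell_type == "A", adata$var$highly_variable]
view3
```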
# Numeric indices
view4 <- adata[1:5, 1:3]
view4
#> View of InMemoryAnnData object with n_obs × n_vars = 5 × 3
#> obs: 'cell_type', 'score'
#> var: 'gene_name', 'highly_variable'
# Character names (if available)
rownames(adata) <- paste0("cell_", 1:10)
colnames(adata) <- paste0("gene_", 1:5)
view5 <- adata[c("cell_1", "cell_2"), c("gene_1", "gene_3")]
view5
#> View of InMemoryAnnData object with n_obs × n_vars = 2 × 2
#> obs: 'cell_type', 'score'
#> var: 'gene_name', 'highly_variable'
Views maintain access to all original data slots:
# Access dimensions
dim(view3)
#> [1] 3 3
nrow(view3)
#> [1] 3
ncol(view3)
#> [1] 3
# Access names
rownames(view3)
#> [1] "cell_1" "cell_2" "cell_3"
colnames(view3)
#> [1] "gene_1" "gene_3" "gene_4"
# Access data matrices and metadata
view3$X
#>            gene_1     gene_3     gene_4
#> cell_1 -0.5613875  0.4959453 -1.7169359
#> cell_2 -0.7063544 -1.1328703 -0.9706762
#> cell_3  1.7351576  0.3617801  0.4077351
view3$obs
#>        cell_type     score
#> cell_1         A 0.7701659
#> cell_2         A 0.7618621
#> cell_3         A 0.7716168
view3$var
#>        gene_name highly_variable
#> gene_1    gene_1            TRUE
#> gene_3    gene_3            TRUE
#> gene_4    gene_4            TRUE
# Views can be converted to concrete implementations
concrete <- view3$as_InMemoryAnnData()
concrete
#> InMemoryAnnData object with n_obs × n_vars = 3 × 3
#> obs: 'cell_type', 'score'
#> var: 'gene_name', 'highly_variable'
If you use anndataR in your work, please cite “anndataR improves interoperability between R and Python in single-cell transcriptomics”:
citation("anndataR")
#> To cite package 'anndataR' in publications use:
#>
#> Deconinck L, Zappia L, Cannoodt R, Morgan M, scverse core, Virshup I,
#> Sang-aram C, Bredikhin D, Seurinck R, Saeys Y (2025). "anndataR
#> improves interoperability between R and Python in single-cell
#> transcriptomics." _bioRxiv_, 2025.08.18.669052.
#> doi:10.1101/2025.08.18.669052
#> <https://doi.org/10.1101/2025.08.18.669052>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Article{,
#> title = {{anndataR} improves interoperability between R and Python in single-cell transcriptomics},
#> author = {Louise Deconinck and Luke Zappia and Robrecht Cannoodt and Martin Morgan and {scverse core} and Isaac Virshup and Chananchida Sang-aram and Danila Bredikhin and Ruth Seurinck and Yvan Saeys},
#> journal = {bioRxiv},
#> year = {2025},
#> pages = {2025.08.18.669052},
#> doi = {10.1101/2025.08.18.669052},
#> }
sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] anndataR_1.1.3 SingleCellExperiment_1.33.2
#> [3] SummarizedExperiment_1.41.1 Biobase_2.71.0
#> [5] GenomicRanges_1.63.2 Seqinfo_1.1.0
#> [7] IRanges_2.45.0 S4Vectors_0.49.2
#> [9] BiocGenerics_0.57.1 generics_0.1.4
#> [11] MatrixGenerics_1.23.0 matrixStats_1.5.0
#> [13] SeuratObject_5.4.0 sp_2.2-1
#> [15] BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 sys_3.4.3 jsonlite_2.0.0
#> [4] magrittr_2.0.5 spatstat.utils_3.2-2 farver_2.1.2
#> [7] rmarkdown_2.31 vctrs_0.7.3 ROCR_1.0-12
#> [10] spatstat.explore_3.8-0 htmltools_0.5.9 S4Arrays_1.11.1
#> [13] curl_7.1.0 Rhdf5lib_1.99.6 SparseArray_1.11.13
#> [16] rhdf5_2.55.16 sass_0.4.10 sctransform_0.4.3
#> [19] parallelly_1.47.0 KernSmooth_2.23-26 bslib_0.10.0
#> [22] htmlwidgets_1.6.4 ica_1.0-3 httr2_1.2.2
#> [25] plyr_1.8.9 plotly_4.12.0 zoo_1.8-15
#> [28] cachem_1.1.0 buildtools_1.0.0 igraph_2.3.0
#> [31] mime_0.13 lifecycle_1.0.5 pkgconfig_2.0.3
#> [34] Matrix_1.7-5 R6_2.6.1 fastmap_1.2.0
#> [37] fitdistrplus_1.2-6 future_1.70.0 shiny_1.13.0
#> [40] digest_0.6.39 paws.storage_0.9.0 patchwork_1.3.2
#> [43] Seurat_5.5.0 tensor_1.5.1 RSpectra_0.16-2
#> [46] irlba_2.3.7 Rarr_1.99.44 progressr_0.19.0
#> [49] spatstat.sparse_3.1-0 httr_1.4.8 polyclip_1.10-7
#> [52] abind_1.4-8 compiler_4.5.3 S7_0.2.2
#> [55] fastDummies_1.7.6 R.utils_2.13.0 MASS_7.3-65
#> [58] rappdirs_0.3.4 DelayedArray_0.37.1 tools_4.5.3
#> [61] lmtest_0.9-40 otel_0.2.0 httpuv_1.6.17
#> [64] future.apply_1.20.2 goftest_1.2-3 R.oo_1.27.1
#> [67] glue_1.8.1 nlme_3.1-169 rhdf5filters_1.23.3
#> [70] promises_1.5.0 grid_4.5.3 Rtsne_0.17
#> [73] cluster_2.1.8.2 reshape2_1.4.5 gtable_0.3.6
#> [76] spatstat.data_3.1-9 R.methodsS3_1.8.2 tidyr_1.3.2
#> [79] data.table_1.18.2.1 XVector_0.51.0 spatstat.geom_3.7-3
#> [82] RcppAnnoy_0.0.23 ggrepel_0.9.8 RANN_2.6.2
#> [85] pillar_1.11.1 stringr_1.6.0 spam_2.11-3
#> [88] RcppHNSW_0.6.0 later_1.4.8 splines_4.5.3
#> [91] dplyr_1.2.1 lattice_0.22-9 deldir_2.0-4
#> [94] survival_3.8-6 paws.common_0.8.9 tidyselect_1.2.1
#> [97] maketools_1.3.2 miniUI_0.1.2 pbapply_1.7-4
#> [100] knitr_1.51 gridExtra_2.3 scattermore_1.2
#> [103] xfun_0.57 stringi_1.8.7 lazyeval_0.2.3
#> [106] yaml_2.3.12 evaluate_1.0.5 codetools_0.2-20
#> [109] tibble_3.3.1 BiocManager_1.30.27 cli_3.6.6
#> [112] uwot_0.2.4 xtable_1.8-8 reticulate_1.46.0
#> [115] jquerylib_0.1.4 Rcpp_1.1.1-1 spatstat.random_3.4-5
#> [118] globals_0.19.1 png_0.1-9 spatstat.univar_3.1-7
#> [121] parallel_4.5.3 ggplot2_4.0.3 dotCall64_1.2
#> [124] listenv_0.10.1 viridisLite_0.4.3 scales_1.4.0
#> [127] ggridges_0.5.7 crayon_1.5.3 purrr_1.2.2
#> [130] rlang_1.2.0 cowplot_1.2.0