This vignette demonstrates how to access data resources provided by CLAMPData, the companion ExperimentHub data package for CLAMP, through the ExperimentHub interface.
CLAMPData provides curated gene-set libraries, pathway priors, and example expression matrices used by the CLAMP software package for prior-informed latent variable modeling.
Install CLAMPData from Bioconductor with BiocManager:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("CLAMPData")
library(CLAMPData)
library(ExperimentHub)
CLAMPData provides three ExperimentHub resources:
| Resource ID | Title | Description |
|---|---|---|
| EH10279 | GSE164416_DP_htseq_counts_txt_gz | HTSeq gene counts for islet RNA-seq example |
| EH10280 | human_gene_v2_5_alz_h5 | HDF5 file with gene-set priors for CLAMP |
| EH10281 | islets_metadata_csv | Sample metadata for islet RNA-seq example |
You can also list these resources with list_clamp_data():
list_clamp_data()
##                               name                         accessor
## 1 GSE164416_DP_htseq_counts_txt_gz GSE164416_DP_htseq_counts_txt_gz
## 2           human_gene_v2_5_alz_h5           human_gene_v2_5_alz_h5
## 3              islets_metadata_csv              islets_metadata_csv
##                                                              description   type
## 1 HTSeq-counts (gene-level) text file from GSE164416 for CLAMP examples. TXT.gz
## 2            HDF5 file used in CLAMP vignettes (human_gene_v2.5_alz.h5).   HDF5
## 3               Sample metadata table for islet RNA-seq example (CLAMP).    CSV
##        species   eh_id version
## 1 Homo sapiens EH10279      v1
## 2 Homo sapiens EH10280      v1
## 3 Homo sapiens EH10281      v1
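The EH IDs listed above can presumably also be used to fetch a record through ExperimentHub directly, without the package accessors. The sketch below wraps this in `fetch_by_eh_id()`, a hypothetical helper that is not part of CLAMPData. It assumes network access on first use and that each record resolves to a cached file path, as the accessor functions do.

```r
# Hypothetical helper (not part of CLAMPData): fetch one record by its EH ID.
fetch_by_eh_id <- function(eh_id) {
  # Validate the ID before touching the network.
  stopifnot(
    is.character(eh_id),
    length(eh_id) == 1L,
    grepl("^EH[0-9]+$", eh_id)
  )
  if (!requireNamespace("ExperimentHub", quietly = TRUE)) {
    stop("Install ExperimentHub first: BiocManager::install('ExperimentHub')")
  }
  eh <- ExperimentHub::ExperimentHub()  # opens (and caches) the hub index
  eh[[eh_id]]                           # downloads on first use, then reads the cache
}

# Usage (requires network access on first call):
# p_counts <- fetch_by_eh_id("EH10279")
```

With ExperimentHub attached, `query(ExperimentHub(), "CLAMPData")` should list all records that belong to the package.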
p_counts <- GSE164416_DP_htseq_counts_txt_gz()
p_h5 <- human_gene_v2_5_alz_h5()
p_meta <- islets_metadata_csv()
cat("Counts path:", p_counts, "\n")
## Counts path: /home/biocbuild/.cache/R/ExperimentHub/dafcc67720882_10346
cat("H5 path:", p_h5, "\n")
## H5 path: /home/biocbuild/.cache/R/ExperimentHub/dafcc7a7d2398_10347
cat("Metadata path:", p_meta, "\n")
## Metadata path: /home/biocbuild/.cache/R/ExperimentHub/dafcc52db6a76_10348
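The cached paths printed above carry no file extension, so a reader cannot rely on the extension to tell that a file is compressed. As a minimal sketch, the example below writes a small gzipped, tab-delimited file to an extension-less temporary path and reads it back through an explicit `gzfile()` connection. The file contents and the presence of a header row are assumptions for illustration; the real counts file may be laid out differently, and `read_islet_counts()` remains the supported way to load it.

```r
# Create a small gzipped, tab-delimited file at a path without an extension,
# mimicking the shape of an ExperimentHub cache entry (synthetic contents).
tmp <- tempfile()
con <- gzfile(tmp, "w")
writeLines(
  c("ensembl\tS1\tS2",
    "ENSG00000000003\t5\t7",
    "ENSG00000000005\t0\t1"),
  con
)
close(con)

# Read it back through an explicit gzfile() connection; read.delim() opens
# and closes the connection itself.
demo <- read.delim(gzfile(tmp), header = TRUE, stringsAsFactors = FALSE)
print(dim(demo))

unlink(tmp)  # remove the temporary file
```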
# --- Counts Data (GSE164416) ---
counts <- read_islet_counts()
# The first column holds Ensembl gene IDs, so it is excluded from the sample count.
cat("Counts dimensions:", nrow(counts), "genes x", ncol(counts) - 1, "samples\n\n")
## Counts dimensions: 58336 genes x 133 samples
knitr::kable(counts[1:5, 1:6], caption = "Gene counts (first 5 genes)")
| ensembl | DP002 | DP003 | DP005 | DP007 | DP008 |
|---|---|---|---|---|---|
| ENSG00000000003 | 529 | 267 | 438 | 118 | 567 |
| ENSG00000000005 | 0 | 10 | 0 | 0 | 1 |
| ENSG00000000419 | 412 | 606 | 518 | 570 | 892 |
| ENSG00000000457 | 274 | 186 | 249 | 308 | 285 |
| ENSG00000000460 | 37 | 84 | 43 | 32 | 43 |
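Because the counts table keeps the gene identifiers in its first column, downstream tools that expect a numeric matrix need a small conversion step. The sketch below shows one way to do that on a synthetic data frame built from the first rows shown above. `to_count_matrix()` is a hypothetical helper, not a CLAMPData function, and it assumes the identifier column is named `ensembl` as in the table.

```r
# Synthetic stand-in for the first rows and columns of the counts table.
counts_demo <- data.frame(
  ensembl = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  DP002   = c(529L, 0L, 412L),
  DP003   = c(267L, 10L, 606L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: move the ID column into row names and return a
# numeric genes x samples matrix.
to_count_matrix <- function(df, id_col = "ensembl") {
  stopifnot(id_col %in% names(df), !anyDuplicated(df[[id_col]]))
  mat <- as.matrix(df[, setdiff(names(df), id_col), drop = FALSE])
  storage.mode(mat) <- "double"
  rownames(mat) <- df[[id_col]]
  mat
}

count_mat <- to_count_matrix(counts_demo)
print(dim(count_mat))
```

Applied to the real data, the call would be `to_count_matrix(counts)`, assuming the column layout matches the table above.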
# --- Sample Metadata ---
meta <- read_islet_metadata()
cat("Metadata dimensions:", nrow(meta), "samples x", ncol(meta), "variables\n")
## Metadata dimensions: 133 samples x 2 variables
cat("Sample types:", paste(unique(meta$type), collapse = ", "), "\n\n")
## Sample types: T3cD, T2D, IGT, ND, IFG
knitr::kable(head(meta), caption = "Sample metadata (first 6 rows)")
| id | type |
|---|---|
| DP002 | T3cD |
| DP003 | T2D |
| DP005 | IGT |
| DP007 | IGT |
| DP008 | IGT |
| DP010 | ND |
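Before modeling, the metadata rows usually need to be in the same order as the sample columns of the counts. The following sketch aligns the two by sample ID on synthetic data. `align_metadata()` is a hypothetical helper, and the `id` and `type` column names are taken from the table above.

```r
# Sample IDs in the order they appear as counts columns (synthetic subset).
sample_ids <- c("DP002", "DP003", "DP005")

# Metadata deliberately shuffled to show the reordering.
meta_demo <- data.frame(
  id   = c("DP005", "DP002", "DP003"),
  type = c("IGT", "T3cD", "T2D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reorder metadata rows to match the given sample IDs,
# failing loudly if any sample has no metadata row.
align_metadata <- function(sample_ids, meta, id_col = "id") {
  idx <- match(sample_ids, meta[[id_col]])
  if (anyNA(idx)) {
    stop("Samples missing from metadata: ",
         paste(sample_ids[is.na(idx)], collapse = ", "))
  }
  meta[idx, , drop = FALSE]
}

aligned <- align_metadata(sample_ids, meta_demo)
print(aligned$type)
```

With the real objects, the sample IDs would come from `names(counts)[-1]`, since the first column holds the gene identifiers.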
CLAMP HDF5 files follow a fixed layout, shown by clamp_h5_schema():
clamp_h5_schema()
##                          path                     type
## 1            /data/expression matrix (samples x genes)
## 2          /meta/genes/symbol                character
## 3 /meta/samples/geo_accession                character
##                                                description
## 1 Expression matrix; transposed to genes x samples at load
## 2                           Gene symbols, length = n_genes
## 3                   Sample identifiers, length = n_samples
validate_clamp_h5() checks that a file contains the datasets in this layout. It returns TRUE for a valid file and errors otherwise, so you can use it on your own HDF5 files:
if (requireNamespace("rhdf5", quietly = TRUE)) {
    print(validate_clamp_h5(human_gene_v2_5_alz_h5()))
}
## [1] TRUE
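To make the idea of a layout check concrete, here is a minimal sketch of how the required dataset paths could be compared against a file listing. It is not the implementation of `validate_clamp_h5()`, which is not shown in this vignette. `missing_clamp_paths()` is a hypothetical helper that works on a data frame with `group` and `name` columns, the shape returned by `rhdf5::h5ls()`, and the listing used here is synthetic.

```r
# Dataset paths from the layout printed by clamp_h5_schema().
clamp_required_paths <- c(
  "/data/expression",
  "/meta/genes/symbol",
  "/meta/samples/geo_accession"
)

# Hypothetical helper: return the required paths that are absent from a
# listing with `group` and `name` columns.
missing_clamp_paths <- function(listing, required = clamp_required_paths) {
  stopifnot(all(c("group", "name") %in% names(listing)))
  # Join group and name into full paths; strip a trailing "/" so the root
  # group "/" does not produce a double slash.
  found <- paste(sub("/$", "", listing$group), listing$name, sep = "/")
  setdiff(required, found)
}

# Synthetic listing that lacks the sample identifiers.
listing_demo <- data.frame(
  group = c("/", "/data", "/", "/meta", "/meta/genes"),
  name  = c("data", "expression", "meta", "genes", "symbol"),
  stringsAsFactors = FALSE
)
print(missing_clamp_paths(listing_demo))
```

For a file on disk, the listing could be obtained with `rhdf5::h5ls(path)` and passed to the helper in the same way.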
read_clamp_alz_expression() runs this same check, then returns a genes x samples matrix:
# --- HDF5 Expression Data (human_gene_v2_5_alz_h5) ---
if (requireNamespace("rhdf5", quietly = TRUE)) {
    expr <- read_clamp_alz_expression()
    cat("Expression matrix:", nrow(expr), "genes x", ncol(expr), "samples\n")
} else {
    message("Install 'rhdf5' from Bioconductor to inspect HDF5 contents: ",
            "BiocManager::install('rhdf5')")
}
## Expression matrix: 67186 genes x 54 samples
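The schema describes the stored matrix as samples x genes, transposed to genes x samples at load. The toy example below illustrates that reshaping step on a synthetic matrix, with gene symbols and sample identifiers attached as dimnames. It is only a sketch of the idea; how `read_clamp_alz_expression()` does this internally is not shown here, and the sample identifiers are made up.

```r
# Synthetic stored matrix: 2 samples (rows) x 3 genes (columns).
raw <- matrix(1:6, nrow = 2)

# Stand-ins for /meta/genes/symbol and /meta/samples/geo_accession.
genes   <- c("APP", "APOE", "MAPT")
samples <- c("GSM_A", "GSM_B")  # hypothetical identifiers

# Check that the labels match the stored orientation before transposing.
stopifnot(nrow(raw) == length(samples), ncol(raw) == length(genes))

# Transpose to genes x samples and attach the labels.
expr_demo <- t(raw)
dimnames(expr_demo) <- list(genes, samples)
print(expr_demo)
```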
sessionInfo()
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.24-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ExperimentHub_3.3.1 AnnotationHub_4.3.1 BiocFileCache_3.3.0
## [4] dbplyr_2.6.0 BiocGenerics_0.59.7 generics_0.1.4
## [7] CLAMPData_0.99.5 BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.4 sass_0.4.10 BiocVersion_3.24.0
## [4] RSQLite_3.53.2 digest_0.6.39 magrittr_2.0.5
## [7] evaluate_1.0.5 bookdown_0.47 fastmap_1.2.0
## [10] blob_1.3.0 jsonlite_2.0.0 AnnotationDbi_1.75.0
## [13] DBI_1.3.0 BiocManager_1.30.27 httr_1.4.8
## [16] purrr_1.2.2 Biostrings_2.81.3 httr2_1.2.2
## [19] jquerylib_0.1.4 cli_3.6.6 crayon_1.5.3
## [22] rlang_1.2.0 XVector_0.53.0 Biobase_2.73.1
## [25] bit64_4.8.2 withr_3.0.3 cachem_1.1.0
## [28] yaml_2.3.12 otel_0.2.0 tools_4.6.0
## [31] memoise_2.0.1 dplyr_1.2.1 Rhdf5lib_2.1.0
## [34] filelock_1.0.3 curl_7.1.0 png_0.1-9
## [37] vctrs_0.7.3 R6_2.6.1 rhdf5_2.57.1
## [40] stats4_4.6.0 lifecycle_1.0.5 Seqinfo_1.3.0
## [43] KEGGREST_1.53.4 S4Vectors_0.51.3 IRanges_2.47.2
## [46] bit_4.6.0 pkgconfig_2.0.3 pillar_1.11.1
## [49] bslib_0.11.0 glue_1.8.1 xfun_0.59
## [52] tibble_3.3.1 tidyselect_1.2.1 rhdf5filters_1.25.0
## [55] knitr_1.51 htmltools_0.5.9 rmarkdown_2.31
## [58] compiler_4.6.0