1 Introduction

This vignette demonstrates how to access data resources provided by CLAMPData, the companion ExperimentHub data package for CLAMP, through the ExperimentHub interface.

CLAMPData provides curated gene-set libraries, pathway priors, and example expression matrices used by the CLAMP software package for prior-informed latent variable modeling.

2 Installation

Install CLAMPData from Bioconductor with BiocManager:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("CLAMPData")

3 Load packages

library(CLAMPData)
library(ExperimentHub)

4 Available resources

CLAMPData provides three ExperimentHub resources:

Resource ID  Title                             Description
EH10279      GSE164416_DP_htseq_counts_txt_gz  HTSeq gene counts for islet RNA-seq example
EH10280      human_gene_v2_5_alz_h5            HDF5 file with gene-set priors for CLAMP
EH10281      islets_metadata_csv               Sample metadata for islet RNA-seq example

You can also list these resources with list_clamp_data():

list_clamp_data()
##                               name                         accessor
## 1 GSE164416_DP_htseq_counts_txt_gz GSE164416_DP_htseq_counts_txt_gz
## 2           human_gene_v2_5_alz_h5           human_gene_v2_5_alz_h5
## 3              islets_metadata_csv              islets_metadata_csv
##                                                              description   type
## 1 HTSeq-counts (gene-level) text file from GSE164416 for CLAMP examples. TXT.gz
## 2            HDF5 file used in CLAMP vignettes (human_gene_v2.5_alz.h5).   HDF5
## 3               Sample metadata table for islet RNA-seq example (CLAMP).    CSV
##        species   eh_id version
## 1 Homo sapiens EH10279      v1
## 2 Homo sapiens EH10280      v1
## 3 Homo sapiens EH10281      v1
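
The same records can also be looked up through ExperimentHub itself. The following is a minimal sketch, assuming network access and assuming that the hub metadata mentions the package name so that query() finds the records. The ID is taken from the listing above, and what `[[` returns depends on how each record is registered in the hub.

```r
library(ExperimentHub)

# Connect to the hub; the metadata database is downloaded or refreshed on first use
eh <- ExperimentHub()

# Search the hub metadata for records associated with CLAMPData
# (assumption: the package name appears in the record metadata)
clamp_records <- query(eh, "CLAMPData")
clamp_records$title

# Retrieve a single resource by its ExperimentHub ID.
# The file is downloaded once and afterwards served from the local cache.
res <- eh[["EH10280"]]
```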

5 Accessing data resources

Each resource has an accessor function named after its title. Calling an accessor downloads the file on first use and returns the path to the locally cached copy:

p_counts <- GSE164416_DP_htseq_counts_txt_gz()
p_h5     <- human_gene_v2_5_alz_h5()
p_meta   <- islets_metadata_csv()
cat("Counts path:", p_counts, "\n")
## Counts path: /home/biocbuild/.cache/R/ExperimentHub/dafcc67720882_10346
cat("H5 path:", p_h5, "\n")
## H5 path: /home/biocbuild/.cache/R/ExperimentHub/dafcc7a7d2398_10347
cat("Metadata path:", p_meta, "\n")
## Metadata path: /home/biocbuild/.cache/R/ExperimentHub/dafcc52db6a76_10348
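
Cached hub files carry no file extension, so it can help to peek at the content before choosing a reader. A minimal sketch using base R only; peek_lines() is a hypothetical helper, not part of CLAMPData, and the temporary file merely stands in for a cached resource.

```r
# Return the first `n` lines of a text file, gzip-compressed or not.
# gzfile() can also read uncompressed files, so the missing extension
# on cached hub files does not matter here.
peek_lines <- function(path, n = 3L) {
  stopifnot(file.exists(path))
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  readLines(con, n = n)
}

# Stand-in for a cached resource: a small gzip-compressed text file
# (the tab delimiter is an assumption made for this toy file only)
tmp <- tempfile()
con <- gzfile(tmp, open = "wt")
writeLines(c("ensembl\tDP002", "ENSG00000000003\t529"), con)
close(con)

first_line <- peek_lines(tmp, n = 1L)
first_line
```

With the package loaded, the same helper could be pointed at a real resource, for example peek_lines(GSE164416_DP_htseq_counts_txt_gz()).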

6 Reading example data

# --- Counts Data (GSE164416) ---
counts <- read_islet_counts()
cat("Counts dimensions:", nrow(counts), "genes x", ncol(counts) - 1, "samples\n\n")
## Counts dimensions: 58336 genes x 133 samples
knitr::kable(counts[1:5, 1:6], caption = "Gene counts (first 5 genes)")

Table 1: Gene counts (first 5 genes)
ensembl          DP002  DP003  DP005  DP007  DP008
ENSG00000000003    529    267    438    118    567
ENSG00000000005      0     10      0      0      1
ENSG00000000419    412    606    518    570    892
ENSG00000000457    274    186    249    308    285
ENSG00000000460     37     84     43     32     43

# --- Sample Metadata ---
meta <- read_islet_metadata()
cat("Metadata dimensions:", nrow(meta), "samples x", ncol(meta), "variables\n")
## Metadata dimensions: 133 samples x 2 variables
cat("Sample types:", paste(unique(meta$type), collapse = ", "), "\n\n")
## Sample types: T3cD, T2D, IGT, ND, IFG
knitr::kable(head(meta), caption = "Sample metadata (first 6 rows)")

Table 2: Sample metadata (first 6 rows)
id     type
DP002  T3cD
DP003  T2D
DP005  IGT
DP007  IGT
DP008  IGT
DP010  ND
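
Before modeling, the count columns and the metadata rows usually need to be in the same sample order. A minimal sketch on toy data that mirrors the structure shown above (an `ensembl` column followed by one column per sample, and metadata with `id` and `type`). align_counts_meta() is a hypothetical helper, not a CLAMPData function.

```r
# Toy stand-ins with the same layout as the tables above
toy_counts <- data.frame(
  ensembl = c("ENSG00000000003", "ENSG00000000005"),
  DP002   = c(529, 0),
  DP003   = c(267, 10),
  DP005   = c(438, 0)
)
toy_meta <- data.frame(
  id   = c("DP005", "DP002", "DP003"),  # deliberately shuffled
  type = c("IGT", "T3cD", "T2D")
)

# Hypothetical helper: build a genes x samples matrix and reorder the
# metadata rows so that they match the matrix columns.
align_counts_meta <- function(counts, meta) {
  mat <- as.matrix(counts[, -1, drop = FALSE])
  rownames(mat) <- counts[[1]]
  idx <- match(colnames(mat), meta$id)
  if (anyNA(idx)) {
    stop("Samples missing from metadata: ",
         paste(colnames(mat)[is.na(idx)], collapse = ", "))
  }
  list(counts = mat, meta = meta[idx, , drop = FALSE])
}

aligned <- align_counts_meta(toy_counts, toy_meta)
aligned$meta
```

Stopping on unmatched samples, rather than dropping them silently, keeps a mismatch between the two files visible.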

CLAMP HDF5 files follow a fixed layout, shown by clamp_h5_schema():

clamp_h5_schema()
##                          path                     type
## 1            /data/expression matrix (samples x genes)
## 2          /meta/genes/symbol                character
## 3 /meta/samples/geo_accession                character
##                                                description
## 1 Expression matrix; transposed to genes x samples at load
## 2                           Gene symbols, length = n_genes
## 3                   Sample identifiers, length = n_samples

validate_clamp_h5() checks that a file contains the datasets in this layout. It returns TRUE for a valid file and errors otherwise, so you can use it on your own HDF5 files:

if (requireNamespace("rhdf5", quietly = TRUE)) {
  print(validate_clamp_h5(human_gene_v2_5_alz_h5()))
}
## [1] TRUE
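
To illustrate what such a layout check involves, the sketch below lists the contents of an HDF5 file with rhdf5::h5ls() and reports which schema paths are absent. It is not the CLAMPData implementation; missing_h5_paths() is a hypothetical helper, and the toy file deliberately contains only part of the layout.

```r
library(rhdf5)

# Dataset paths taken from the schema shown above
required <- c("/data/expression",
              "/meta/genes/symbol",
              "/meta/samples/geo_accession")

# Hypothetical helper: return the required paths that are not in the file
missing_h5_paths <- function(path, required) {
  listing <- h5ls(path)  # one row per group or dataset
  prefix  <- ifelse(listing$group == "/", "", listing$group)
  present <- paste0(prefix, "/", listing$name)
  setdiff(required, present)
}

# Toy file holding only the expression dataset
tmp <- tempfile(fileext = ".h5")
h5createFile(tmp)
h5createGroup(tmp, "data")
h5write(matrix(1:6, nrow = 2), tmp, "data/expression")

absent <- missing_h5_paths(tmp, required)
absent
```

A validator built on this idea would return TRUE when nothing is missing and stop with the list of absent paths otherwise.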

read_clamp_alz_expression() runs this same check, then returns a genes x samples matrix:

# --- HDF5 Expression Matrix ---
if (requireNamespace("rhdf5", quietly = TRUE)) {
  expr <- read_clamp_alz_expression()
  cat("Expression matrix:", nrow(expr), "genes x", ncol(expr), "samples\n")
} else {
  message("Install 'rhdf5' from Bioconductor to inspect HDF5 contents: ",
          "BiocManager::install('rhdf5')")
}
## Expression matrix: 67186 genes x 54 samples
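
For your own files in this layout, a reader along these lines can be sketched with rhdf5::h5read(). This is an illustration under assumptions, not the code behind read_clamp_alz_expression(). The helper name and the toy gene and sample labels are hypothetical. The orientation seen from R can depend on how a file was written, so the sketch orients the matrix by matching the number of gene symbols instead of assuming one.

```r
library(rhdf5)

# Toy file following the schema above: 2 samples x 3 genes
tmp <- tempfile(fileext = ".h5")
h5createFile(tmp)
for (g in c("data", "meta", "meta/genes", "meta/samples")) {
  h5createGroup(tmp, g)
}
h5write(matrix(1:6, nrow = 2), tmp, "data/expression")
h5write(c("geneA", "geneB", "geneC"), tmp, "meta/genes/symbol")
h5write(c("sample_1", "sample_2"), tmp, "meta/samples/geo_accession")

# Hypothetical reader: returns a genes x samples matrix with dimnames
read_expression_sketch <- function(path) {
  m       <- h5read(path, "/data/expression")
  genes   <- as.character(h5read(path, "/meta/genes/symbol"))
  samples <- as.character(h5read(path, "/meta/samples/geo_accession"))
  # Transpose when rows do not correspond to genes
  # (ambiguous if the numbers of genes and samples are equal)
  if (nrow(m) != length(genes)) m <- t(m)
  stopifnot(nrow(m) == length(genes), ncol(m) == length(samples))
  dimnames(m) <- list(genes, samples)
  m
}

expr_toy <- read_expression_sketch(tmp)
dim(expr_toy)
```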

7 Session info

sessionInfo()
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.24-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ExperimentHub_3.3.1 AnnotationHub_4.3.1 BiocFileCache_3.3.0
## [4] dbplyr_2.6.0        BiocGenerics_0.59.7 generics_0.1.4     
## [7] CLAMPData_0.99.5    BiocStyle_2.41.0   
## 
## loaded via a namespace (and not attached):
##  [1] rappdirs_0.3.4       sass_0.4.10          BiocVersion_3.24.0  
##  [4] RSQLite_3.53.2       digest_0.6.39        magrittr_2.0.5      
##  [7] evaluate_1.0.5       bookdown_0.47        fastmap_1.2.0       
## [10] blob_1.3.0           jsonlite_2.0.0       AnnotationDbi_1.75.0
## [13] DBI_1.3.0            BiocManager_1.30.27  httr_1.4.8          
## [16] purrr_1.2.2          Biostrings_2.81.3    httr2_1.2.2         
## [19] jquerylib_0.1.4      cli_3.6.6            crayon_1.5.3        
## [22] rlang_1.2.0          XVector_0.53.0       Biobase_2.73.1      
## [25] bit64_4.8.2          withr_3.0.3          cachem_1.1.0        
## [28] yaml_2.3.12          otel_0.2.0           tools_4.6.0         
## [31] memoise_2.0.1        dplyr_1.2.1          Rhdf5lib_2.1.0      
## [34] filelock_1.0.3       curl_7.1.0           png_0.1-9           
## [37] vctrs_0.7.3          R6_2.6.1             rhdf5_2.57.1        
## [40] stats4_4.6.0         lifecycle_1.0.5      Seqinfo_1.3.0       
## [43] KEGGREST_1.53.4      S4Vectors_0.51.3     IRanges_2.47.2      
## [46] bit_4.6.0            pkgconfig_2.0.3      pillar_1.11.1       
## [49] bslib_0.11.0         glue_1.8.1           xfun_0.59           
## [52] tibble_3.3.1         tidyselect_1.2.1     rhdf5filters_1.25.0 
## [55] knitr_1.51           htmltools_0.5.9      rmarkdown_2.31      
## [58] compiler_4.6.0