hammers is a utilities suite for scRNA-seq data
analysis. It provides simple tools to address tasks such as retrieving
aggregate gene statistics, finding and removing rare genes, performing
representation analysis, computing the center of mass for the expression
of a gene of interest in low-dimensional space, and calculating
silhouette and cluster-normalized silhouette.
To install hammers, run the following commands in an R session:
This tutorial uses an scRNA-seq human pancreas dataset. After loading
the required packages, download the dataset using the
BaronPancreasData function from scRNAseq. The
dataset will be stored as a SingleCellExperiment object.
library(hammers)
library(scLang)
library(scRNAseq)
library(scater)
sceObj <- BaronPancreasData('human')We will first normalize and log-transform the data using the
logNormCounts function from scuttle (loaded
automatically with scater).
We will also need PCA and UMAP dimensions. These can be computed
using the runPCA and runUMAP functions from
scater:
The genePresence function extracts the
genes that appear in at least minCutoff and at
most maxCutoff cells. It returns a data frame listing the
number of cells in which each gene that meets these criteria is
expressed:
df <- genePresence(sceObj, rownames(sceObj)[seq(100)], 300, 2000)
dim(df)
#> [1] 40 2
head(df)
#> Gene nCells
#> AATF AATF 1755
#> A1CF A1CF 1408
#> AACS AACS 1385
#> AAGAB AAGAB 1328
#> ABHD3 ABHD3 1220
#> AAAS AAAS 1216The geneCellSets function lists all the cells in which
selected genes are expressed:
cellSets <- geneCellSets(sceObj, rownames(sceObj)[seq(100)])
head(cellSets[[1]])
#> [1] "human1_lib1.final_cell_0024" "human1_lib1.final_cell_0052"
#> [3] "human1_lib1.final_cell_0054" "human1_lib1.final_cell_0188"
#> [5] "human1_lib1.final_cell_0242" "human1_lib1.final_cell_0294"findRareGenes is a wrapper around
genePresence that can extract the genes that appear in
fewer than nCells cells:
df <- findRareGenes(sceObj, nCells=10)
#> Warning in asMethod(object): sparse->dense coercion: allocating vector of size
#> 1.3 GiB
dim(df)
#> [1] 5008 2
head(df)
#> Gene nCells
#> ADRA1D ADRA1D 9
#> ALAS2 ALAS2 9
#> ALPK2 ALPK2 9
#> ASPA ASPA 9
#> BDNF BDNF 9
#> BRINP2 BRINP2 9The removeRareGenes function calls
findRareGenes and subsequently removes the identified rare
genes from the object:
The repAnalysis function shows which pairs selected from
two categorical columns of a single-cell object are:
doOverrep to FALSE.df <- repAnalysis(sceObj, 'donor', 'label')
head(df)
#> donor label sharedCount donorCount labelCount
#> 29 GSM2230759 acinar 843 3605 958
#> 4 GSM2230757 beta 872 1937 2525
#> 17 GSM2230758 alpha 676 1724 2326
#> 7 GSM2230757 endothelial 130 1937 252
#> 48 GSM2230760 ductal 280 1303 1077
#> 12 GSM2230757 quiescent_stellate 92 1937 173
#> expectedCount ratio pval pvalAdj
#> 29 403.03303 2.091640 1.274205e-216 3.290536e-214
#> 4 570.76963 1.527762 3.187997e-62 4.116378e-60
#> 17 467.96872 1.444541 9.375316e-35 8.070344e-33
#> 7 56.96394 2.282146 2.054204e-24 1.326206e-22
#> 48 163.76835 1.709732 3.329598e-23 1.719686e-21
#> 12 39.10620 2.352568 9.889812e-19 4.256613e-17
df <- repAnalysis(sceObj, 'donor', 'label', doOverrep=FALSE)
head(df)
#> donor label sharedCount donorCount labelCount expectedCount ratio
#> 15 GSM2230758 acinar 3 1724 958 192.7403 0.01556498
#> 3 GSM2230757 alpha 236 1937 2326 525.7862 0.44885164
#> 43 GSM2230760 acinar 2 1303 958 145.6732 0.01372936
#> 32 GSM2230759 beta 787 3605 2525 1062.2739 0.74086354
#> 6 GSM2230757 ductal 120 1937 1077 243.4530 0.49290822
#> 37 GSM2230759 gamma 36 3605 255 107.2791 0.33557314
#> pval pvalAdj
#> 15 5.183525e-94 1.338605e-91
#> 3 1.948937e-71 2.516489e-69
#> 43 1.349905e-69 1.162009e-67
#> 32 8.059698e-41 5.203387e-39
#> 6 6.325634e-25 3.267092e-23
#> 37 1.107353e-22 4.766089e-21The output of repAnalysis can be visualized using the
pvalRiverPlot function, which creates an alluvial plot for
a data frame having a p-value column. Thicker connecting bands represent
lower p-values:
The two columns can also be visualized using the
distributionPlot function:
The function can also plot relative percentages instead of raw counts:
Given a list of genes, geneCenters can compute the
center of mass of each gene based on a dimensionality reduction of
choice. By default, the reduction used by geneCenters is
UMAP:
genes <- c('PRSS1', 'TTR', 'GJD2', 'MS4A8')
geneCenters(sceObj, genes)
#> UMAP1 UMAP2
#> PRSS1 -8.193966 6.424557
#> TTR 4.850861 3.807886
#> GJD2 5.358846 -5.913178
#> MS4A8 3.673467 3.819835The centers of mass can be visualized with genesDimPlot,
a function which uses geneCenters internally:
Note: The related colsDimPlot function
can plot the center of mass of numeric columns of the metadata of a
Seurat object. colsDimPlot uses colCenters
internally, which works similarly to geneCenters. Both
genesDimPlot and colsDimPlot are wrappers
around pointsDimPlot, which adds labeled points on a plot
generated with scLang::dimPlot.
The computeSilhouette computes the silhouette for an
input column in the metadata or coldata of the single-cell expression
object, returning the object with an appended silhouette column:
sceObj <- computeSilhouette(sceObj, 'label', 'labelSil')
head(scCol(sceObj, 'labelSil'))
#> [1] 0.8646612 0.8496185 0.8589548 0.8629454 0.8499232 0.8356816The normalizeSilhouette function returns a data frame
whose number of rows equals the number of cells and the number of
columns equals the number of identities in the single-cell expression
object. It requires silhouette information for the required column to be
provided in the metadata/coldata of the object. For each column in the
output data frame, cells outside of the corresponding identity get a
score of 0. All the other cells get scores in (0, 1] by adjusted min-max
normalization:
df <- normalizeSilhouette(sceObj, 'label', 'labelSil')
head(df)
#> acinar beta delta activated_stellate ductal
#> human1_lib1.final_cell_0001 0.9573036 0 0 0 0
#> human1_lib1.final_cell_0002 0.9423589 0 0 0 0
#> human1_lib1.final_cell_0003 0.9516344 0 0 0 0
#> human1_lib1.final_cell_0004 0.9555989 0 0 0 0
#> human1_lib1.final_cell_0005 0.9426616 0 0 0 0
#> human1_lib1.final_cell_0006 0.9285128 0 0 0 0
#> alpha epsilon gamma endothelial quiescent_stellate
#> human1_lib1.final_cell_0001 0 0 0 0 0
#> human1_lib1.final_cell_0002 0 0 0 0 0
#> human1_lib1.final_cell_0003 0 0 0 0 0
#> human1_lib1.final_cell_0004 0 0 0 0 0
#> human1_lib1.final_cell_0005 0 0 0 0 0
#> human1_lib1.final_cell_0006 0 0 0 0 0
#> macrophage schwann mast t_cell
#> human1_lib1.final_cell_0001 0 0 0 0
#> human1_lib1.final_cell_0002 0 0 0 0
#> human1_lib1.final_cell_0003 0 0 0 0
#> human1_lib1.final_cell_0004 0 0 0 0
#> human1_lib1.final_cell_0005 0 0 0 0
#> human1_lib1.final_cell_0006 0 0 0 0The addNormSilhouette function adds non-zero normalized
silhouette information to the single-cell expression object as a
metadata/coldata column. For visualization purposes, we will use the
featurePlot function from scLang:
sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] scater_1.39.4 ggplot2_4.0.2
#> [3] scuttle_1.21.6 scRNAseq_2.25.0
#> [5] SingleCellExperiment_1.33.2 SummarizedExperiment_1.41.1
#> [7] Biobase_2.71.0 GenomicRanges_1.63.2
#> [9] Seqinfo_1.1.0 IRanges_2.45.0
#> [11] S4Vectors_0.49.2 BiocGenerics_0.57.1
#> [13] generics_0.1.4 MatrixGenerics_1.23.0
#> [15] matrixStats_1.5.0 scLang_0.99.3
#> [17] hammers_0.99.11 BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] RcppAnnoy_0.0.23 prismatic_1.1.2 BiocIO_1.21.0
#> [4] bitops_1.0-9 filelock_1.0.3 tibble_3.3.1
#> [7] polyclip_1.10-7 XML_3.99-0.23 lifecycle_1.0.5
#> [10] httr2_1.2.2 globals_0.19.1 lattice_0.22-9
#> [13] ensembldb_2.35.0 MASS_7.3-65 alabaster.base_1.11.4
#> [16] magrittr_2.0.5 sass_0.4.10 rmarkdown_2.31
#> [19] jquerylib_0.1.4 yaml_2.3.12 text2vec_0.6.6
#> [22] spam_2.11-3 sp_2.2-1 DBI_1.3.0
#> [25] buildtools_1.0.0 RColorBrewer_1.1-3 abind_1.4-8
#> [28] purrr_1.2.2 AnnotationFilter_1.35.0 ggraph_2.2.2
#> [31] RCurl_1.98-1.18 tweenr_2.0.3 rappdirs_0.3.4
#> [34] float_0.3-3 ggrepel_0.9.8 irlba_2.3.7
#> [37] listenv_0.10.1 alabaster.sce_1.11.0 maketools_1.3.2
#> [40] RSpectra_0.16-2 LISTO_0.6.5 parallelly_1.47.0
#> [43] codetools_0.2-20 DelayedArray_0.37.1 ggforce_0.5.0
#> [46] tidyselect_1.2.1 UCSC.utils_1.7.1 farver_2.1.2
#> [49] ScaledMatrix_1.19.0 viridis_0.6.5 BiocFileCache_3.1.0
#> [52] GenomicAlignments_1.47.0 jsonlite_2.0.0 rsparse_0.5.3
#> [55] BiocNeighbors_2.5.4 tidygraph_1.3.1 progressr_0.19.0
#> [58] ggalluvial_0.12.6 tools_4.5.3 ggnewscale_0.5.2
#> [61] Rcpp_1.1.1-1 glue_1.8.1 gridExtra_2.3
#> [64] SparseArray_1.11.13 xfun_0.57 GenomeInfoDb_1.47.2
#> [67] dplyr_1.2.1 HDF5Array_1.39.1 gypsum_1.7.0
#> [70] withr_3.0.2 BiocManager_1.30.27 fastmap_1.2.0
#> [73] rhdf5filters_1.23.3 rsvd_1.0.5 digest_0.6.39
#> [76] R6_2.6.1 RSQLite_2.4.6 cigarillo_1.1.0
#> [79] h5mread_1.3.3 RhpcBLASctl_0.23-42 tidyr_1.3.2
#> [82] data.table_1.18.2.1 rtracklayer_1.71.3 graphlayouts_1.2.3
#> [85] httr_1.4.8 S4Arrays_1.11.1 uwot_0.2.4
#> [88] pkgconfig_2.0.3 gtable_0.3.6 blob_1.3.0
#> [91] S7_0.2.1-1 XVector_0.51.0 sys_3.4.3
#> [94] htmltools_0.5.9 dotCall64_1.2 ProtGenerics_1.43.0
#> [97] SeuratObject_5.4.0 scales_1.4.0 alabaster.matrix_1.11.0
#> [100] png_0.1-9 lgr_0.5.2 knitr_1.51
#> [103] reshape2_1.4.5 rjson_0.2.23 curl_7.0.0
#> [106] cachem_1.1.0 rhdf5_2.55.16 liver_1.28
#> [109] stringr_1.6.0 BiocVersion_3.23.1 vipor_0.4.7
#> [112] parallel_4.5.3 AnnotationDbi_1.73.1 mlapi_0.1.1
#> [115] restfulr_0.0.16 pillar_1.11.1 grid_4.5.3
#> [118] alabaster.schemas_1.11.0 vctrs_0.7.3 BiocSingular_1.27.1
#> [121] beachmat_2.27.5 dbplyr_2.5.2 statisfactory_1.0.4
#> [124] cluster_2.1.8.2 beeswarm_0.4.0 paletteer_1.7.0
#> [127] evaluate_1.0.5 henna_0.7.5 GenomicFeatures_1.63.2
#> [130] cli_3.6.6 compiler_4.5.3 Rsamtools_2.27.2
#> [133] rlang_1.2.0 crayon_1.5.3 abdiv_0.2.0
#> [136] future.apply_1.20.2 labeling_0.4.3 ggeasy_0.1.6
#> [139] rematch2_2.1.2 ggbeeswarm_0.7.3 plyr_1.8.9
#> [142] stringi_1.8.7 viridisLite_0.4.3 alabaster.se_1.11.0
#> [145] BiocParallel_1.45.0 Biostrings_2.79.5 lazyeval_0.2.3
#> [148] Matrix_1.7-5 ExperimentHub_3.1.0 primes_1.6.1
#> [151] bit64_4.8.0 future_1.70.0 Rhdf5lib_1.99.6
#> [154] KEGGREST_1.51.1 alabaster.ranges_1.11.0 AnnotationHub_4.1.0
#> [157] igraph_2.3.0 memoise_2.0.1 bslib_0.10.0
#> [160] bit_4.6.0