This tutorial demonstrates how to use the STADyUM package to perform comparative analysis of pause-escape kinetics and pause-site distributions across cell types using the likelihood-ratio tests (LRTs) developed in Zeng et al. (2026)
In this workflow, we will:
TranscriptionRates for human CD4+ T cells and human CD14 monocytes. To learn how to estimate transcription rates from PRO-seq data, refer to STADyUM_Rate_Estimation_Tutorial.Rmd.likelihoodRatioTestThis example uses human PRO-seq data from two immunologically distinct cell types to demonstrate comparative transcription rate estimation.
Overview of the comparative probabilistic framework for promoter-proximal pausing analysis across conditions, cell types, and species. The framework uses likelihood-ratio tests to quantify changes in pause-escape kinetics and pausing distributions from nascent RNA sequencing data
has_connectivity <- requireNamespace("BiocFileCache", quietly = TRUE) &&
curl::has_internet()
knitr::opts_chunk$set(dpi = 72, fig.width = 5, fig.height = 4,
eval = has_connectivity)
if (!has_connectivity)
message("No internet connection; vignette code not evaluated.")
library(STADyUM) # For transcription rate estimation
library(GenomicRanges) # For genomic data manipulation
library(dplyr)
library(plyranges)
#library(tidyverse) # For data manipulation and visualization
library(ggplot2) # For plotting
library(pracma)
library(BiocFileCache)bfc <- BiocFileCache(ask = FALSE)
zip_path <- bfcrpath(bfc, "https://zenodo.org/records/20618059/files/lrt_vignette_data.zip")
unzip(zip_path, exdir = tempdir())
data_dir <- file.path(tempdir(), "vignettes", "lrt_test_data")
# STADyUM `ExperimentTranscriptionRates` objects containing transcription rates estimated from human CD4+ T cell and human CD14+ monocyte PRO-seq data.
cd4_rate <- readRDS(file.path(data_dir, 'cd4_rate.RDS'))
cd14_rate <- readRDS(file.path(data_dir, 'cd14_rate.RDS'))
scale_factor <- 38359083 / 33294862The likelihoodRatioTest() function compares promoter-proximal pausing kinetics between two TranscriptionRates objects from estimateTranscriptionRates(). For experimental data, genes present in both conditions are matched, then three complementary likelihood-ratio tests are run per transcription unit:
spikeInFile argument).The scaleFactor argument accounts for sequencing-depth differences between samples when comparing pause-related statistics. Here we use the ratio of total mapped reads between CD4+ T cells and CD14 monocytes.
For each gene, the function returns log2 fold changes, LRT test statistics, and Benjamini-Hochberg adjusted p-values. Results are stored in a TranscriptionRatesLRT object with three per-gene tables (chiTbl, betaTbl, and fkTbl) and a merged summary table (lrtTbl) used by the plotting functions below.
lrt <- likelihoodRatioTest(cd4_rate, cd14_rate, scale_factor)
# The merged per-gene LRT summary table is built automatically and stored in
# the lrtTbl slot, ready to be passed straight to the plotting methods below.
head(lrtTbl(lrt))## # A tibble: 6 × 29
## geneId beta1 beta2 lfc fkMean1 fkMean2 fkVar1 fkVar2 tStats p
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ENSG0000… 2.34e-4 1.44e-4 -0.700 40.5 55.9 583. 606. 18.7 9.97e-10
## 2 ENSG0000… 2.21e-4 9.66e-5 -1.19 68.5 78.2 1585. 1874. 18.3 1.50e- 9
## 3 ENSG0000… 2.11e-4 5.89e-4 1.48 81.8 86.3 1636. 2659. 19.2 5.84e-10
## 4 ENSG0000… 2.42e-4 1.76e-4 -0.461 96.3 129. 2993. 1990. 4.98 1.60e- 3
## 5 ENSG0000… 4.46e-5 2.09e-5 -1.10 63.0 57.0 680. 548. 23.5 7.04e-12
## 6 ENSG0000… 6.95e-5 8.27e-5 0.251 40.5 54.6 588. 2846. 0.773 2.14e- 1
## # ℹ 19 more variables: padj <dbl>, CD4_chi <dbl>, CD4_fkSD <dbl>,
## # CD4_betaGroup <fct>, CD4_chiGroup <fct>, CD4_sdGroup <chr>, CD14_chi <dbl>,
## # CD14_fkSD <dbl>, CD14_betaGroup <fct>, CD14_chiGroup <fct>,
## # CD14_sdGroup <chr>, betaCategory <fct>, chi_mean <dbl>,
## # chi_mean_group <fct>, chi_lfc <dbl>, chi_lfc_group <fct>,
## # pause_change <chr>, deltaSD <dbl>, deltaMean <dbl>
# Each plotting method takes the TranscriptionRatesLRT object directly and
# reads the merged table from its lrtTbl slot internally.
p0 <- plotBetaQuantileHeatmap(lrt)
print(p0)Each cell shows the number of genes that fall into a given β quintile in CD4+ T cells (x-axis) and CD14+ monocytes (y-axis). Strong enrichment along the diagonal indicates that the relative ordering of genes by pause-escape rate is substantially preserved across cell types, even though widespread gene-level changes in β are detected by the LRT.
p1 <- plotBetaScatter(lrt,
label_x = -6,
label_y = -1,
x_lim = c(-6, -1),
y_lim = c(-6, -1))
print(p1) LRT analysis identified widespread gene-level changes in β between human CD4+ T cells and CD14+ monocytes, with 25.8% of genes showing significantly increased β values and 36.3% showing significantly decreased β values.
To compare kinetic and distributional changes directly, we mapped shifts in β against shifts in σ. Although individual genes display changes in both parameters, their joint distribution shows minimal overall association, indicating weak coupling between kinetic and distributional changes.
## Warning in geom_label(data = label_df, aes(x = .data$x, y = .data$y, label =
## .data$label), : Ignoring unknown parameters: `label.colour`
Although σ values show global alignment between the two cell types, substantial gene-level deviations from the diagonal are evident. Stratifying genes by sharp and broad pausing groups in each cell type reveals four classes of change. Notably, approximately 30.3% (1606 out of 5300) of genes switch between sharp and broad pausing groups between the two cell types, indicating that variation in pausing dispersion frequently involves changes between sharper and broader pause-site distributions.
Together, these analyses suggest that pause-escape kinetics and pause-site distribution exhibit distinct modes of variation across cellular contexts. While the relative ordering of genes by pause-escape rate remains largely preserved across cell types, pause-escape kinetics retain substantial regulatory plasticity. Variation in pausing dispersion more frequently involves changes between sharper and broader pause-site distributions. The weak association between changes in β and σ further supports the view that kinetic and distributional aspects of pausing can vary differently across cellular contexts.
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.24-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BiocFileCache_3.3.0 dbplyr_2.6.0 pracma_2.4.6
## [4] ggplot2_4.0.3 plyranges_1.33.0 dplyr_1.2.1
## [7] GenomicRanges_1.65.0 Seqinfo_1.3.0 IRanges_2.47.2
## [10] S4Vectors_0.51.3 BiocGenerics_0.59.7 generics_0.1.4
## [13] STADyUM_1.3.2
##
## loaded via a namespace (and not attached):
## [1] DBI_1.3.0 bitops_1.0-9
## [3] httr2_1.2.3 rlang_1.2.0
## [5] magrittr_2.0.5 otel_0.2.0
## [7] matrixStats_1.5.0 compiler_4.6.0
## [9] RSQLite_3.53.2 vctrs_0.7.3
## [11] pkgconfig_2.0.3 crayon_1.5.3
## [13] fastmap_1.2.0 backports_1.5.1
## [15] XVector_0.53.0 labeling_0.4.3
## [17] utf8_1.2.6 Rsamtools_2.29.0
## [19] rmarkdown_2.31 UCSC.utils_1.9.0
## [21] purrr_1.2.2 bit_4.6.0
## [23] xfun_0.59 cachem_1.1.0
## [25] cigarillo_1.3.0 GenomeInfoDb_1.49.1
## [27] jsonlite_2.0.0 blob_1.3.0
## [29] DelayedArray_0.39.3 BiocParallel_1.47.0
## [31] broom_1.0.13 parallel_4.6.0
## [33] R6_2.6.1 bslib_0.11.0
## [35] RColorBrewer_1.1-3 rtracklayer_1.73.0
## [37] car_3.1-5 jquerylib_0.1.4
## [39] Rcpp_1.1.1-1.1 SummarizedExperiment_1.43.0
## [41] knitr_1.51 BiocBaseUtils_1.15.1
## [43] Matrix_1.7-5 tidyselect_1.2.1
## [45] dichromat_2.0-0.1 abind_1.4-8
## [47] yaml_2.3.12 codetools_0.2-20
## [49] curl_7.1.0 lattice_0.22-9
## [51] tibble_3.3.1 Biobase_2.73.1
## [53] withr_3.0.3 S7_0.2.2
## [55] evaluate_1.0.5 isoband_0.3.0
## [57] Biostrings_2.81.3 pillar_1.11.1
## [59] ggpubr_0.6.3 filelock_1.0.3
## [61] MatrixGenerics_1.25.0 carData_3.0-6
## [63] ggpointdensity_0.2.1 RCurl_1.98-1.19
## [65] scales_1.4.0 glue_1.8.1
## [67] tools_4.6.0 BiocIO_1.23.3
## [69] data.table_1.18.4 ggsignif_0.6.4
## [71] GenomicAlignments_1.49.0 XML_3.99-0.23
## [73] cowplot_1.2.0 grid_4.6.0
## [75] tidyr_1.3.2 restfulr_0.0.17
## [77] Formula_1.2-5 cli_3.6.6
## [79] rappdirs_0.3.4 S4Arrays_1.13.0
## [81] viridisLite_0.4.3 gtable_0.3.6
## [83] rstatix_0.7.3 sass_0.4.10
## [85] digest_0.6.39 SparseArray_1.13.2
## [87] rjson_0.2.23 farver_2.1.2
## [89] memoise_2.0.1 htmltools_0.5.9
## [91] lifecycle_1.0.5 httr_1.4.8
## [93] bit64_4.8.2 MASS_7.3-65
Zeng, Xin, Gilad Barshad, Rebecca Hassett, Edward J. Rice, Charles G. Danko, Adam Siepel, and Yixin Zhao. 2026. “A Comparative Analysis of Promoter-Proximal Pausing Reveals Kinetic and Distributional Dimensions of Variation.” bioRxiv. https://doi.org/10.64898/2026.06.01.729264.