exomePeak2 provides technical independent peak detection and differential methylation analysis for Methylated RNA immunoprecipitation sequencing data (MeRIP-Seq). MeRIP-Seq is the primary sequencing technology for epi-transcriptomic assessment. The peak calling processes in MeRIP-Seq is sensitive to GC content biases, which are generally present in NGS-based assays. Besides, the antibody pull-down efficiency do often vary across different IP replicates, introducing another layer of unwanted technical variation. exomePeak2 addresses these challenges by introducing a robust set of computation tools tailored for MeRIP-Seq. With exomePeak2, users can perform peak calling and differential analysis through a straightforward single-step function.
To install exomePeak2 from bioconductor, start R (version >“3.6”) and enter:
if(!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("exomePeak2")
For order versions of R, please refer to the appropriate [Bioconductor release](https://www.bioconductor.org/about/release-announcements/}.
For the peak calling of the MeRIP-Seq experiment, exomePeak2
requires alignment results to be stored in BAM format.
User need to specify the corresponding BAM directories
of IP and input samples in the arguments bam_ip
and
bam_input
of the main function exomePeak2()
,
respectively.
The following example demonstrates the peak calling from
BAM and GFF files. In addition to
GFF files, transcript annotation can also be provided
by the bioconductor TxDb object. TxDb
will be automatically downloaded if the corresponding UCSC genome name
is filled in the genome
argument.
Note that the genome sequences are necessary for the correction of GC
content biases. If the genome
argument is missing (
= NULL
), exomPeak2 will perform peak calling without
correcting the PCR amplification bias.
library(exomePeak2)
## Warning in fun(libname, pkgname): Package 'exomePeak2' is deprecated and will be removed from
## Bioconductor version 3.20
set.seed(1)
GENE_ANNO_GTF = system.file("extdata", "example.gtf", package="exomePeak2")
f1 = system.file("extdata", "IP1.bam", package="exomePeak2")
f2 = system.file("extdata", "IP2.bam", package="exomePeak2")
f3 = system.file("extdata", "IP3.bam", package="exomePeak2")
f4 = system.file("extdata", "IP4.bam", package="exomePeak2")
IP_BAM = c(f1,f2,f3,f4)
f1 = system.file("extdata", "Input1.bam", package="exomePeak2")
f2 = system.file("extdata", "Input2.bam", package="exomePeak2")
f3 = system.file("extdata", "Input3.bam", package="exomePeak2")
INPUT_BAM = c(f1,f2,f3)
exomePeak2(bam_ip = IP_BAM,
bam_input = INPUT_BAM,
gff = GENE_ANNO_GTF,
genome = "hg19")
## Warning in call_fun_in_txdbmaker("makeTxDbFromGFF", ...): makeTxDbFromGFF() has moved to the txdbmaker package. Please call
## txdbmaker::makeTxDbFromGFF() to get rid of this warning.
## GRangesList object of length 29:
## $`1`
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## 1 chr7 150742912-150742936 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
##
## $`2`
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## 2 chr7 150742987-150743011 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
##
## $`3`
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## 3 chr7 150743062-150743136 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
##
## ...
## <26 more elements>
Besides the GRangesList object of peaks returned by the main
function, exomePeak2 will export significant peaks in
BED and CSV format; the files will be
automatically saved in a folder named
exomePeak2_output
.
In the peak calling mode, the peak statistics are calculated from the \({\beta}_{i,1}\) terms in the following Poisson GLMs:
\[K_{i,j} \sim Poisson(\lambda_{i,j})\]
\[log2(\lambda_{i,j}) = {\beta}_{i,0} + {\beta}_{i,1}1(\rho(j)=\text{IP}) + t_{i,j}\]
\(\lambda_{i,j}\) is the expected value of read abundance for sliding window (bin) \(i\) under sample \(j\). \({\beta}_{i,0}\) is the intercept coefficient; \({\beta}_{i,1}\) is the coefficient of IP/input log2 fold change; \(1(\rho(j)=\text{IP})\) is the regression covariate, which is an indicator (binary) variable of the sample \(j\) being IP sample. \(t_{i,j}\) is the bin-specific offset accounting for the sequencing depth variation, IP efficiency variation, and the GC content bias.
\[t_{i,j} = log2(s_j) + log2(f(GC_i))\]
By default, \(t_{i,j}\) is further
decomposed into the sample-specific size factor \(s_j\) and the GC content bias offset \(f(GC_i)\). \(\hat
s_j\) is estimated using the median read count on the background
bins, which are classified by the GMM (K = 2) from the bin’s POI values:
(IP count + 1)/(input count + 1). \(\hat
f(GC_i)\) is estimated using cubic splines expanded Poisson GLM
on bin’s read count with bin’s fragment GC content \(GC_i\) as covariate. Knots for the cubic
splines used are set at c(0, 0.4, 0.5, 0.6, 1)
. For
identifiability, the fitted values \(\hat
f(GC_i)\) are centered at 1 by dividing its sample mean.
For peak calling, Poisson GLMs using the above design are fitted to identify the significantly modified sliding windows; the default detection criteria is the Wald test p-values < 1e-10 under the alternative hypothesis of \({\beta}_{i,1} > 0\).
Annotations on the columns of the output table:
The following example performs differential methylation analysis (comparison of two biological conditions) on exon regions defined by transcript annotation.
f1 = system.file("extdata", "treated_IP1.bam", package="exomePeak2")
TREATED_IP_BAM = c(f1)
f1 = system.file("extdata", "treated_Input1.bam", package="exomePeak2")
TREATED_INPUT_BAM = c(f1)
exomePeak2(bam_ip = IP_BAM,
bam_input = INPUT_BAM,
bam_input_treated = TREATED_INPUT_BAM,
bam_ip_treated = TREATED_IP_BAM,
gff = GENE_ANNO_GTF,
genome = "hg19")
## Warning in call_fun_in_txdbmaker("makeTxDbFromGFF", ...): makeTxDbFromGFF() has moved to the txdbmaker package. Please call
## txdbmaker::makeTxDbFromGFF() to get rid of this warning.
## GRangesList object of length 2:
## $`1`
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## 1 chr7 150910255-150910629 -
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
##
## $`2`
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## 2 chr7 150934486-150935360 +
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
In the differential methylation mode, exomePeak2 will first perform peak calling on the exonic regions by pulling the control and treated samples. Next, it will conduct the differential methylation analysis over the read counts of significant peaks using an interactive Poisson GLM design.
The peak statistics calculated in the differential methylation setting are based on the interactive coefficient \({\beta}_{i,3}\) of the following regression equation:
\[log2(\lambda_{i,j}) = {\beta}_{i,0} + {\beta}_{i,1}1(\rho(j)=\text{IP}) + {\beta}_{i,2}1(\rho(j)=\text{Treated}) + {\beta}_{i,3}1(\rho(j)=\text{IP&Treated}) + t_{i,j}\]
In the output of differential methylation analysis, The diff.log2FC and pvalue are calculated from the coefficient estimates of the interactive term \({\beta}_{i,3}\). The differential log2FC here is interpreted as log( IP over input FC in treated group / IP over input FC in control group). The offset terms \(t_{i,j}\) used in the interactive GLM are calculated using the same method as the peak calling.
If you encounter any problems during use, please contact the maintainer of exomePeak2:
Zhen Wei : zhen.wei01@xjtlu.edu.cn
Hansen, K. D., Irizarry, R. A., & Wu, Z. (2012). Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics, 13(2), 204-216.
Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology, 15(12), 1-21.
Benjamini, Y., & Speed, T. P. (2012). Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic acids research, 40(10), e72-e72.
Meng, J., Lu, Z., Liu, H., Zhang, L., Zhang, S., Chen, Y., … & Huang, Y. (2014). A protocol for RNA methylation differential analysis with MeRIP-Seq data and exomePeak R/Bioconductor package. Methods, 69(3), 274-281.
sessionInfo()
## R version 4.4.0 alpha (2024-03-27 r86216)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.6.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] BSgenome.Hsapiens.UCSC.hg19_1.4.3 BSgenome_1.72.0
## [3] rtracklayer_1.64.0 BiocIO_1.14.0
## [5] Biostrings_2.72.0 XVector_0.44.0
## [7] exomePeak2_1.16.0 SummarizedExperiment_1.34.0
## [9] Biobase_2.64.0 GenomicRanges_1.56.0
## [11] GenomeInfoDb_1.40.0 IRanges_2.38.0
## [13] S4Vectors_0.42.0 BiocGenerics_0.50.0
## [15] MatrixGenerics_1.16.0 matrixStats_1.2.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.2 bitops_1.0-7 httr2_1.0.0
## [4] biomaRt_2.60.0 rlang_1.1.3 magrittr_2.0.3
## [7] compiler_4.4.0 RSQLite_2.3.5 GenomicFeatures_1.56.0
## [10] systemfonts_1.0.6 png_0.1-8 vctrs_0.6.5
## [13] txdbmaker_1.0.0 stringr_1.5.1 pkgconfig_2.0.3
## [16] crayon_1.5.2 fastmap_1.1.1 dbplyr_2.5.0
## [19] labeling_0.4.3 utf8_1.2.4 Rsamtools_2.20.0
## [22] rmarkdown_2.26 UCSC.utils_1.0.0 ragg_1.3.0
## [25] bit_4.0.5 xfun_0.43 zlibbioc_1.50.0
## [28] cachem_1.0.8 jsonlite_1.8.8 progress_1.2.3
## [31] blob_1.2.4 DelayedArray_0.30.0 BiocParallel_1.38.0
## [34] parallel_4.4.0 prettyunits_1.2.0 R6_2.5.1
## [37] biglm_0.9-2.1 bslib_0.6.2 stringi_1.8.3
## [40] jquerylib_0.1.4 Rcpp_1.0.12 knitr_1.45
## [43] Matrix_1.7-0 splines_4.4.0 tidyselect_1.2.1
## [46] abind_1.4-5 yaml_2.3.8 codetools_0.2-19
## [49] curl_5.2.1 lattice_0.22-6 tibble_3.2.1
## [52] withr_3.0.0 KEGGREST_1.44.0 evaluate_0.23
## [55] BiocFileCache_2.12.0 xml2_1.3.6 mclust_6.1
## [58] pillar_1.9.0 filelock_1.0.3 generics_0.1.3
## [61] RCurl_1.98-1.14 hms_1.1.3 ggplot2_3.5.0
## [64] munsell_0.5.0 scales_1.3.0 glue_1.7.0
## [67] tools_4.4.0 speedglm_0.3-5 locfit_1.5-9.9
## [70] GenomicAlignments_1.40.0 XML_3.99-0.16.1 grid_4.4.0
## [73] AnnotationDbi_1.66.0 colorspace_2.1-0 GenomeInfoDbData_1.2.12
## [76] restfulr_0.0.15 cli_3.6.2 rappdirs_0.3.3
## [79] textshaping_0.3.7 fansi_1.0.6 S4Arrays_1.4.0
## [82] dplyr_1.1.4 gtable_0.3.4 DESeq2_1.44.0
## [85] sass_0.4.9 digest_0.6.35 SparseArray_1.4.0
## [88] farver_2.1.1 rjson_0.2.21 memoise_2.0.1
## [91] htmltools_0.5.8 lifecycle_1.0.4 httr_1.4.7
## [94] bit64_4.0.5 MASS_7.3-60.2