Contents

1 Introduction

This package is designed for reactome pathway-based analysis. Reactome is an open-source, open access, manually curated and peer-reviewed pathway database.

2 Citation

If you use ReactomePA1 in published research, please cite:

G Yu, QY He*. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Molecular BioSystems 2016, 12(2):477-479. doi:[10.1039/C5MB00663E](http://dx.doi.org/10.1039/C5MB00663E)

3 Supported organisms

Currently ReactomePA supports several model organisms, including ‘celegans’, ‘fly’, ‘human’, ‘mouse’, ‘rat’, ‘yeast’ and ‘zebrafish’. The input gene ID should be Entrez gene ID. We recommend using clusterProfiler::bitr to convert biological IDs. For more detail, please refer to bitr: Biological Id TranslatoR.

4 Pathway Enrichment Analysis

Enrichment analysis is a widely used approach to identify biological themes. Here, we implement hypergeometric model to assess whether the number of selected genes associated with reactome pathway is larger than expected. The p values were calculated based the hypergeometric model2.

library(ReactomePA)
data(geneList)
de <- names(geneList)[abs(geneList) > 1.5]
head(de)
## [1] "4312"  "8318"  "10874" "55143" "55388" "991"
x <- enrichPathway(gene=de,pvalueCutoff=0.05, readable=T)
head(as.data.frame(x))
##                          ID                             Description
## R-HSA-68877     R-HSA-68877                    Mitotic Prometaphase
## R-HSA-2500257 R-HSA-2500257 Resolution of Sister Chromatid Cohesion
## R-HSA-5663220 R-HSA-5663220            RHO GTPases Activate Formins
## R-HSA-68882     R-HSA-68882                        Mitotic Anaphase
## R-HSA-2555396 R-HSA-2555396          Mitotic Metaphase and Anaphase
## R-HSA-2467813 R-HSA-2467813         Separation of Sister Chromatids
##               GeneRatio   BgRatio       pvalue     p.adjust       qvalue
## R-HSA-68877      25/307 128/10281 4.488951e-14 3.070443e-11 2.665020e-11
## R-HSA-2500257    23/307 120/10281 7.339950e-13 2.510263e-10 2.178806e-10
## R-HSA-5663220    21/307 133/10281 3.440692e-10 7.844777e-08 6.808948e-08
## R-HSA-68882      24/307 196/10281 3.839792e-09 5.822015e-07 5.053273e-07
## R-HSA-2555396    24/307 197/10281 4.255859e-09 5.822015e-07 5.053273e-07
## R-HSA-2467813    23/307 185/10281 6.108439e-09 6.963620e-07 6.044139e-07
##                                                                                                                                                              geneID
## R-HSA-68877   CDCA8/CDC20/CENPE/CCNB2/NDC80/NCAPH/SKA1/CENPM/CENPN/CDK1/ERCC6L/MAD2L1/KIF18A/BIRC5/NCAPG/AURKB/CCNB1/KIF2C/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/TAOK1
## R-HSA-2500257             CDCA8/CDC20/CENPE/CCNB2/NDC80/SKA1/CENPM/CENPN/CDK1/ERCC6L/MAD2L1/KIF18A/BIRC5/AURKB/CCNB1/KIF2C/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/TAOK1
## R-HSA-5663220                          CDCA8/CDC20/CENPE/NDC80/SKA1/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/BIRC5/AURKB/KIF2C/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/TAOK1/EVL
## R-HSA-68882        CDCA8/CDC20/CENPE/NDC80/UBE2C/SKA1/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/BIRC5/AURKB/PTTG1/LMNB1/KIF2C/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/ESPL1/TAOK1
## R-HSA-2555396      CDCA8/CDC20/CENPE/NDC80/UBE2C/SKA1/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/BIRC5/AURKB/PTTG1/LMNB1/KIF2C/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/ESPL1/TAOK1
## R-HSA-2467813            CDCA8/CDC20/CENPE/NDC80/UBE2C/SKA1/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/BIRC5/AURKB/PTTG1/KIF2C/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/ESPL1/TAOK1
##               Count
## R-HSA-68877      25
## R-HSA-2500257    23
## R-HSA-5663220    21
## R-HSA-68882      24
## R-HSA-2555396    24
## R-HSA-2467813    23

For calculation/parameter details, please refer to the vignette of DOSE3..

4.1 Pathway analysis of NGS data

Pathway analysis using NGS data (eg, RNA-Seq and ChIP-Seq) can be performed by linking coding and non-coding regions to coding genes via ChIPseeker package, which can annotates genomic regions to their nearest genes, host genes, and flanking genes respectivly. In addtion, it provides a function, seq2gene, that simultaneously considering host genes, promoter region and flanking gene from intergenic region that may under control via cis-regulation. This function maps genomic regions to genes in a many-to-many manner and facilitate functional analysis. For more details, please refer to ChIPseeker4.

4.2 Visualize enrichment result

We implement barplot, dotplot enrichment map and category-gene-network for visualization. It is very common to visualize the enrichment result in bar or pie chart. We believe the pie chart is misleading and only provide bar chart.

barplot(x, showCategory=8)

dotplot(x, showCategory=15)

Enrichment map can be viusalized by enrichMap:

enrichMap(x, layout=igraph::layout.kamada.kawai, vertex.label.cex = 1)

In order to consider the potentially biological complexities in which a gene may belong to multiple annotation categories, we developed cnetplot function to extract the complex association between genes and diseases.

cnetplot(x, categorySize="pvalue", foldChange=geneList)

4.3 Comparing enriched reactome pathways among gene clusters with clusterProfiler

We have developed an R package clusterProfiler5 for comparing biological themes among gene clusters. ReactomePA works fine with clusterProfiler and can compare biological themes at reactome pathway perspective.

require(clusterProfiler)
data(gcSample)
res <- compareCluster(gcSample, fun="enrichPathway")
plot(res)

5 Gene Set Enrichment Analysis

A common approach in analyzing gene expression profiles was identifying differential expressed genes that are deemed interesting. The enrichPathway function we demonstrated previously were based on these differential expressed genes. This approach will find genes where the difference is large, but it will not detect a situation where the difference is small, but evidenced in coordinated way in a set of related genes. Gene Set Enrichment Analysis (GSEA)6 directly addressed this limitation. All genes can be used in GSEA; GSEA aggregates the per gene statistics across genes within a gene set, therefore making it possible to detect situations where all genes in a predefined set change in a small but coordinated way. For algorithm details, please refer to the vignette of DOSE3.

y <- gsePathway(geneList, nPerm=1000,
                minGSSize=120, pvalueCutoff=0.2,
                pAdjustMethod="BH", verbose=FALSE)
res <- as.data.frame(y)
head(res)
##                          ID
## R-HSA-1474244 R-HSA-1474244
## R-HSA-69481     R-HSA-69481
## R-HSA-162909   R-HSA-162909
## R-HSA-5693532 R-HSA-5693532
## R-HSA-1428517 R-HSA-1428517
## R-HSA-69242     R-HSA-69242
##                                                                  Description
## R-HSA-1474244                              Extracellular matrix organization
## R-HSA-69481                                                 G2/M Checkpoints
## R-HSA-162909                                Host Interactions of HIV factors
## R-HSA-5693532                                 DNA Double-Strand Break Repair
## R-HSA-1428517 The citric acid (TCA) cycle and respiratory electron transport
## R-HSA-69242                                                          S Phase
##               setSize enrichmentScore       NES      pvalue   p.adjust
## R-HSA-1474244     258      -0.4577630 -1.943204 0.001338688 0.01601891
## R-HSA-69481       135       0.6871874  2.886236 0.003058104 0.01601891
## R-HSA-162909      121       0.5840669  2.403613 0.003095975 0.01601891
## R-HSA-5693532     129       0.4963430  2.065806 0.003105590 0.01601891
## R-HSA-1428517     132       0.3771694  1.577651 0.003105590 0.01601891
## R-HSA-69242       120       0.6701216  2.745101 0.003125000 0.01601891
##                  qvalues rank                   leading_edge
## R-HSA-1474244 0.01036599 1943 tags=33%, list=16%, signal=29%
## R-HSA-69481   0.01036599 1905 tags=48%, list=15%, signal=41%
## R-HSA-162909  0.01036599 3321 tags=57%, list=27%, signal=42%
## R-HSA-5693532 0.01036599 1990 tags=34%, list=16%, signal=29%
## R-HSA-1428517 0.01036599 3805 tags=48%, list=30%, signal=34%
## R-HSA-69242   0.01036599 1905 tags=50%, list=15%, signal=43%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                   core_enrichment
## R-HSA-1474244 11132/4017/1288/4811/3910/3371/1291/3791/831/1301/4238/7450/3685/80781/1280/1306/4314/3675/8425/977/4054/7042/3912/4322/1278/1511/4060/30008/1277/164656/22795/10516/81578/1293/2247/1295/58494/8076/5118/2192/1281/83700/50509/4319/1290/1513/11096/2202/4313/2199/3693/10536/1294/3339/1462/1289/1292/3908/4016/3909/4053/6678/1296/633/5654/2331/63923/7043/3913/1300/2200/1634/7177/1287/3679/4680/2006/7373/1307/1311/1308/652/4148/54829/4239
## R-HSA-69481                                                                                                              8318/55388/9133/983/1111/891/4174/4171/993/990/51512/9156/23594/4998/4175/4173/10926/5984/85236/5688/5709/641/5698/1763/8970/5693/8317/4176/5713/5982/5721/2810/5691/9088/995/5685/7468/4172/7336/5690/5684/83990/5686/5695/11200/10213/8345/7534/80010/23198/5983/7979/4683/63967/3018/5699/5714/5702/3014/5708/5692/8290/5704/580/6119
## R-HSA-162909                                                                                        3159/3932/5688/9688/3055/5709/5698/919/5693/5902/5713/5721/5691/292/1104/5685/1794/9631/5690/5684/5686/5695/11168/10213/940/23198/7979/23165/5699/5714/6921/55706/8815/5702/23636/9978/5905/5708/1174/5692/5704/5901/164/60489/4927/79902/5683/10762/1175/5694/8480/11097/5718/5682/5716/9972/904/81929/162/5707/57122/4869/926/5696/7514/79023/3837/920/9818
## R-HSA-5693532                                                                                                                                                                                                                     10635/890/1111/9156/5888/2237/54962/5984/3838/85236/5111/641/1763/8970/5427/5982/2070/7468/5424/7336/2140/83990/11200/2072/8345/80010/5983/2138/8914/4683/63967/3018/142/3014/5531/5425/54107/7517/8290/5536/580/6119/10721/672
## R-HSA-1428517                                                                                                                          5163/29078/3945/3948/1537/3418/9377/54205/9997/9123/4726/4704/50/1737/29796/7386/3419/6566/5467/5160/5165/4725/518/4723/9551/506/2108/1349/4708/2271/4700/509/3421/10131/54539/25874/1351/3939/1329/8050/4719/51204/4697/4702/516/4694/1340/1431/6389/4712/4722/4191/4711/522/1337/7381/4710/7384/10476/7351/55066/514/682
## R-HSA-69242                                                                                                                                       8318/890/9837/81620/51659/4174/4171/993/990/898/23594/4998/1163/9134/4175/4173/2237/6502/5984/994/84296/4609/5111/5688/64785/5709/5698/1763/5693/5427/23649/4176/5713/5982/5557/5721/5691/5685/1019/5558/4172/5424/5690/5684/5885/5686/5695/10213/23198/5983/7979/5699/5714/5702/5425/5708/5692/54107/5704/6119

5.1 Visualize GSEA result

enrichMap(y)

gseaplot(y, geneSetID = "R-HSA-69242")

6 Pathway Visualization

In ReactomePA, we also implemented viewPathway to visualized the pathway.

viewPathway("E2F mediated regulation of DNA replication", readable=TRUE, foldChange=geneList)

More documents can be found on the project website, https://guangchuangyu.github.io/ReactomePA.

7 Session Information

Here is the output of sessionInfo() on the system on which this document was compiled:

## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] ReactomePA_1.20.2    DOSE_3.2.0           org.Hs.eg.db_3.4.1  
## [4] AnnotationDbi_1.38.0 IRanges_2.10.1       S4Vectors_0.14.1    
## [7] Biobase_2.36.2       BiocGenerics_0.22.0  BiocStyle_2.4.0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.10        compiler_3.4.0      plyr_1.8.4         
##  [4] tools_3.4.0         digest_0.6.12       RSQLite_1.1-2      
##  [7] evaluate_0.10       memoise_1.1.0       tibble_1.3.1       
## [10] gtable_0.2.0        rlang_0.1           graph_1.54.0       
## [13] igraph_1.0.1        fastmatch_1.1-0     DBI_0.6-1          
## [16] yaml_2.1.14         gridExtra_2.2.1     fgsea_1.2.1        
## [19] stringr_1.2.0       knitr_1.15.1        rappdirs_0.3.1     
## [22] rprojroot_1.2       grid_3.4.0          qvalue_2.8.0       
## [25] data.table_1.10.4   BiocParallel_1.10.1 GOSemSim_2.2.0     
## [28] rmarkdown_1.5       reactome.db_1.59.1  reshape2_1.4.2     
## [31] GO.db_3.4.1         DO.db_2.9           ggplot2_2.2.1      
## [34] magrittr_1.5        splines_3.4.0       backports_1.0.5    
## [37] scales_0.4.1        htmltools_0.3.6     graphite_1.22.0    
## [40] colorspace_1.3-2    labeling_0.3        stringi_1.1.5      
## [43] lazyeval_0.2.0      munsell_0.4.3

References

1. Yu, G. & He, Q.-Y. ReactomePA: An r/bioconductor package for reactome pathway analysis and visualization. Mol. BioSyst. 12, 477–479 (2016).

2. Boyle, E. I. et al. GO::TermFinder–open source software for accessing gene ontology information and finding significantly enriched gene ontology terms associated with a list of genes. Bioinformatics (Oxford, England) 20, 3710–3715 (2004).

3. Yu, G., Wang, L.-G., Yan, G.-R. & He, Q.-Y. DOSE: An r/bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31, 608–609 (2015).

4. Yu, G., Wang, L.-G. & He, Q.-Y. ChIPseeker: An r/bioconductor package for chip peak annotation, comparison and visualization. Bioinformatics 31, 2382–2383 (2015).

5. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an r package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 16, 284–287 (2012).

6. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102, 15545–15550 (2005).