1 Overview

The toppgene package is a client for the ToppGene Suite web server. It takes a gene list as input and performs enrichment analysis on it.

To demonstrate the package, the examples below reproduce the two test cases from the publication (Chen et al. 2007): congenital heart disease (CHD) and diabetic retinopathy (DR).

2 Installation

To install this package, start R and enter:

if (! require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("toppgene")

3 Usage

3.1 Prepare the gene lists

A query may contain one or more genes. ToppGene’s enrich() requires integer Entrez gene IDs. Because symbol conversion with ToppGene is more permissive than with Bioconductor, use ToppGene’s lookup() function to convert gene symbols to Entrez IDs. The published example provides gene symbols for CHD (n = 28) and DR (n = 27), which are reused here.

genes_chd_sym <- c(
    "ADD1", "CITED2", "DTNA", "CKM", "GATA4", "GJA1", "HAND1", "HAND2", "HEY2",
    "HOXC4", "HOXC5", "ITGB3", "JARID2", "MTHFD1", "MTHFR", "MTRR", "NKX2-5",
    "NOS3", "NPPA", "NPPB", "RFC1", "SALL4", "TBX1", "TBX5", "TBX20",
    "TGFB1", "ZFPM2", "ZIC3")
genes_dr_sym <- c(
    "ACE", "ADRB3", "AGT", "AGTR2", "AKR1B1", "APOE", "AR", "CMA1", "EDN1",
    "GNB3", "HFE", "HLA-DPB1", "HLA-DRB1", "ICAM1", "ITGA2B", "ITGB2", "LTA",
    "NOS2A", "NOS3", "NPY", "PECAM1", "PON1", "RAGE", "SELE", "SERPINE1",
    "TIMP3", "TNF")

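Before sending the lists to the server, it can be worth confirming that they match the sizes quoted above. The following is a minimal base R sketch and not part of toppgene; the names n_genes, has_dups and shared are illustrative only, and the vectors are repeated so that the block runs on its own.

```r
## Same symbols as above, repeated so that this block is self-contained.
genes_chd_sym <- c(
    "ADD1", "CITED2", "DTNA", "CKM", "GATA4", "GJA1", "HAND1", "HAND2", "HEY2",
    "HOXC4", "HOXC5", "ITGB3", "JARID2", "MTHFD1", "MTHFR", "MTRR", "NKX2-5",
    "NOS3", "NPPA", "NPPB", "RFC1", "SALL4", "TBX1", "TBX5", "TBX20",
    "TGFB1", "ZFPM2", "ZIC3")
genes_dr_sym <- c(
    "ACE", "ADRB3", "AGT", "AGTR2", "AKR1B1", "APOE", "AR", "CMA1", "EDN1",
    "GNB3", "HFE", "HLA-DPB1", "HLA-DRB1", "ICAM1", "ITGA2B", "ITGB2", "LTA",
    "NOS2A", "NOS3", "NPY", "PECAM1", "PON1", "RAGE", "SELE", "SERPINE1",
    "TIMP3", "TNF")

## List sizes, to compare against the published n = 28 and n = 27.
n_genes <- c(CHD = length(genes_chd_sym), DR = length(genes_dr_sym))

## TRUE if a list contains the same symbol more than once.
has_dups <- c(
    CHD = anyDuplicated(genes_chd_sym) > 0,
    DR = anyDuplicated(genes_dr_sym) > 0)

## Symbols present in both lists.
shared <- intersect(genes_chd_sym, genes_dr_sym)
```

Neither list contains a duplicate, and NOS3 is the only symbol the two lists share.
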
3.2 Convert gene symbols to Entrez IDs

library(toppgene)

genes_chd <- lookup(genes_chd_sym)
genes_chd
#> DataFrame with 28 rows and 4 columns
#>     OfficialSymbol    Entrez   Submitted            Description
#>        <character> <integer> <character>            <character>
#> 1             ADD1       118        ADD1              adducin 1
#> 2           CITED2     10370      CITED2 Cbp/p300 interacting..
#> 3             DTNA      1837        DTNA     dystrobrevin alpha
#> 4              CKM      1158         CKM creatine kinase, M-t..
#> 5            GATA4      2626       GATA4 GATA binding protein 4
#> ...            ...       ...         ...                    ...
#> 24            TBX5      6910        TBX5 T-box transcription ..
#> 25           TBX20     57057       TBX20 T-box transcription ..
#> 26           TGFB1      7040       TGFB1 transforming growth ..
#> 27           ZFPM2     23414       ZFPM2 zinc finger protein,..
#> 28            ZIC3      7547        ZIC3    Zic family member 3
genes_dr <- lookup(genes_dr_sym)
genes_dr
#> DataFrame with 27 rows and 4 columns
#>     OfficialSymbol    Entrez   Submitted            Description
#>        <character> <integer> <character>            <character>
#> 1              ACE      1636         ACE angiotensin I conver..
#> 2            ADRB3       155       ADRB3    adrenoceptor beta 3
#> 3              AGT       183         AGT        angiotensinogen
#> 4            AGTR2       186       AGTR2 angiotensin II recep..
#> 5           AKR1B1       231      AKR1B1 aldo-keto reductase ..
#> ...            ...       ...         ...                    ...
#> 23            AGER       177        RAGE advanced glycosylati..
#> 24            SELE      6401        SELE             selectin E
#> 25        SERPINE1      5054    SERPINE1 serpin family E memb..
#> 26           TIMP3      7078       TIMP3 TIMP metallopeptidas..
#> 27             TNF      7124         TNF  tumor necrosis factor
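
The DR output above shows that lookup() can return an official symbol that differs from the submitted one (row 23: RAGE was submitted and AGER was returned). A minimal base R sketch for spotting such aliases follows. It uses a hand-built data.frame as a stand-in for three rows of the result, so lookup_rows, renamed and missing_syms are illustrative names, and no request is sent to the server.

```r
## Stand-in for three rows of the lookup() result printed above.
lookup_rows <- data.frame(
    OfficialSymbol = c("ACE", "AGER", "SELE"),
    Entrez = c(1636L, 177L, 6401L),
    Submitted = c("ACE", "RAGE", "SELE"))

## Rows where the submitted symbol was mapped to a different official symbol.
renamed <- lookup_rows[lookup_rows$OfficialSymbol != lookup_rows$Submitted, ]

## Submitted symbols with no matching row (none in this stand-in).
submitted <- c("ACE", "RAGE", "SELE")
missing_syms <- setdiff(submitted, lookup_rows$Submitted)
```

In the real results both lists came back with as many rows as symbols submitted (28 and 27), so a check like missing_syms would be expected to come back empty there as well.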

3.3 Run enrichment queries

enrich_chd <- enrich(genes_chd$Entrez)
enrich_chd
#> DataFrame with 1383 rows and 15 columns
#>                    Category                     ID                   Name
#>                 <character>            <character>            <character>
#> 1    GeneOntologyMolecula..             GO:0008134 transcription factor..
#> 2    GeneOntologyMolecula..             GO:0061629 RNA polymerase II-sp..
#> 3    GeneOntologyMolecula..             GO:0140297 DNA-binding transcri..
#> 4    GeneOntologyMolecula..             GO:0001228 DNA-binding transcri..
#> 5    GeneOntologyMolecula..             GO:0001216 DNA-binding transcri..
#> ...                     ...                    ...                    ...
#> 1379                Disease DOID:2841 (is_implic.. asthma (is_implicate..
#> 1380                Disease               C1449563 Cardiomyopathy, Fami..
#> 1381                Disease EFO_0006340, EFO_000.. mean arterial pressu..
#> 1382                Disease            EFO_0000612  myocardial infarction
#> 1383                Disease DOID:13550 (is_impli.. angle-closure glauco..
#>           PValue QValueFDRBH QValueFDRBY QValueBonferroni TotalGenes
#>        <numeric>   <numeric>   <numeric>        <numeric>  <integer>
#> 1    6.31374e-12 1.37639e-09 8.20882e-09      1.37639e-09      19978
#> 2    1.66715e-10 1.46986e-08 8.76626e-08      3.63440e-08      19978
#> 3    2.02275e-10 1.46986e-08 8.76626e-08      4.40959e-08      19978
#> 4    1.73749e-08 8.82533e-07 5.26343e-06      3.78774e-06      19978
#> 5    2.02416e-08 8.82533e-07 5.26343e-06      4.41267e-06      19978
#> ...          ...         ...         ...              ...        ...
#> 1379 1.42795e-05 0.000230108  0.00182283        0.0220904      29516
#> 1380 1.45430e-05 0.000231939  0.00183733        0.0224981      29516
#> 1381 1.63772e-05 0.000258526  0.00204794        0.0253356      29516
#> 1382 1.78933e-05 0.000279606  0.00221493        0.0276810      29516
#> 1383 1.81704e-05 0.000281097  0.00222674        0.0281097      29516
#>      GenesInTerm GenesInQuery GenesInTermInQuery           Source
#>        <integer>    <integer>          <integer>      <character>
#> 1            754           28                 13                 
#> 2            427           28                 10                 
#> 3            595           28                 11                 
#> 4            504           28                  9                 
#> 5            513           28                  9                 
#> ...          ...          ...                ...              ...
#> 1379         157           28                  4   AllianceGenome
#> 1380          50           28                  3 DisGeNET Curated
#> 1381          52           28                  3             GWAS
#> 1382         350           28                  5             GWAS
#> 1383           7           28                  2   AllianceGenome
#>                         URL          GenesEntrez            GenesSymbol
#>                 <character>        <IntegerList>        <CharacterList>
#> 1                           10370,2626,23493,...  CITED2,GATA4,HEY2,...
#> 2                           10370,2626,23493,...  CITED2,GATA4,HEY2,...
#> 3                           10370,2626,23493,...  CITED2,GATA4,HEY2,...
#> 4                             2626,1482,9421,... GATA4,NKX2-5,HAND1,...
#> 5                             2626,1482,9421,... GATA4,NKX2-5,HAND1,...
#> ...                     ...                  ...                    ...
#> 1379 https://fms.alliance..   7040,3690,4524,...  TGFB1,ITGB3,MTHFR,...
#> 1380                              1482,4878,4879       NKX2-5,NPPA,NPPB
#> 1381 http://www.ebi.ac.uk..       2626,3221,4524      GATA4,HOXC4,MTHFR
#> 1382 http://www.ebi.ac.uk..  7040,1482,57057,... TGFB1,NKX2-5,TBX20,...
#> 1383 https://fms.alliance..            4524,4846             MTHFR,NOS3
enrich_dr <- enrich(genes_dr$Entrez)
enrich_dr
#> DataFrame with 1353 rows and 15 columns
#>                    Category                     ID                   Name
#>                 <character>            <character>            <character>
#> 1    GeneOntologyMolecula..             GO:0042277        peptide binding
#> 2    GeneOntologyMolecula..             GO:0004888 transmembrane signal..
#> 3    GeneOntologyMolecula..             GO:0034617 tetrahydrobiopterin ..
#> 4    GeneOntologyMolecula..             GO:0030545 signaling receptor r..
#> 5    GeneOntologyMolecula..             GO:0042605 peptide antigen bind..
#> ...                     ...                    ...                    ...
#> 1349                Disease DOID:1070 (is_implic.. primary open angle g..
#> 1350                Disease DOID:224 (biomarker_.. transient cerebral i..
#> 1351                Disease               C0036690             Septicemia
#> 1352                Disease               C0243026                 Sepsis
#> 1353                Disease               C1719672          Severe Sepsis
#>           PValue QValueFDRBH QValueFDRBY QValueBonferroni TotalGenes
#>        <numeric>   <numeric>   <numeric>        <numeric>  <integer>
#> 1    1.07728e-07 2.97329e-05 0.000184327      2.97329e-05      19978
#> 2    8.08462e-06 9.69324e-04 0.006009253      2.23136e-03      19978
#> 3    1.05361e-05 9.69324e-04 0.006009253      2.90797e-03      19978
#> 4    2.11526e-05 1.45953e-03 0.009048234      5.83811e-03      19978
#> 5    3.43071e-05 1.89375e-03 0.011740162      9.46875e-03      19978
#> ...          ...         ...         ...              ...        ...
#> 1349 4.85723e-09 1.42681e-07 1.21591e-06      1.36974e-05      29516
#> 1350 5.55304e-09 1.61045e-07 1.37241e-06      1.56596e-05      29516
#> 1351 5.82504e-09 1.61045e-07 1.37241e-06      1.64266e-05      29516
#> 1352 5.82504e-09 1.61045e-07 1.37241e-06      1.64266e-05      29516
#> 1353 5.82504e-09 1.61045e-07 1.37241e-06      1.64266e-05      29516
#>      GenesInTerm GenesInQuery GenesInTermInQuery           Source
#>        <integer>    <integer>          <integer>      <character>
#> 1            299           27                  7                 
#> 2           1407           27                 10                 
#> 3              4           27                  2                 
#> 4            662           27                  7                 
#> 5             47           27                  3                 
#> ...          ...          ...                ...              ...
#> 1349          23           27                  4   AllianceGenome
#> 1350         157           27                  6   AllianceGenome
#> 1351          24           27                  4 DisGeNET Curated
#> 1352          24           27                  4 DisGeNET Curated
#> 1353          24           27                  4 DisGeNET Curated
#>                         URL        GenesEntrez           GenesSymbol
#>                 <character>      <IntegerList>       <CharacterList>
#> 1                            3077,348,3689,...    HFE,APOE,ITGB2,...
#> 2                            6401,7124,155,...    SELE,TNF,ADRB3,...
#> 3                                    4843,4846             NOS2,NOS3
#> 4                            4049,7124,348,...      LTA,TNF,APOE,...
#> 5                               3077,3115,3123 HFE,HLA-DPB1,HLA-DRB1
#> ...                     ...                ...                   ...
#> 1349 https://fms.alliance..  5444,7124,348,...     PON1,TNF,APOE,...
#> 1350 https://fms.alliance..  7124,348,4846,...     TNF,APOE,NOS3,...
#> 1351                        4049,7124,4843,...      LTA,TNF,NOS2,...
#> 1352                        4049,7124,4843,...      LTA,TNF,NOS2,...
#> 1353                        4049,7124,4843,...      LTA,TNF,NOS2,...
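
The four count columns (TotalGenes, GenesInTerm, GenesInQuery, GenesInTermInQuery) are the ingredients of a hypergeometric enrichment test. The sketch below applies a standard one-sided hypergeometric test in base R to the counts from row 1 of enrich_chd. That ToppGene computes PValue in exactly this way is an assumption made for illustration, not something stated above; the variable names are likewise illustrative.

```r
## Counts copied from row 1 of enrich_chd above (GO:0008134).
total_genes    <- 19978L  # TotalGenes: size of the background
genes_in_term  <- 754L    # GenesInTerm: background genes annotated to the term
genes_in_query <- 28L     # GenesInQuery: size of the query list
overlap        <- 13L     # GenesInTermInQuery: query genes annotated to the term

## Overlap expected by chance if the query were a random draw.
expected_overlap <- genes_in_query * genes_in_term / total_genes

## One-sided test: probability of seeing at least `overlap` annotated genes.
p_upper <- phyper(
    overlap - 1L,
    genes_in_term,
    total_genes - genes_in_term,
    genes_in_query,
    lower.tail = FALSE)
```

About one annotated gene would be expected by chance, against 13 observed, and p_upper should come out on the same order of magnitude as the reported PValue of 6.31374e-12.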

3.4 View enrichment results for the publication’s top-ranked genes

library(IRanges) # CharacterList
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> Loading required package: stats4
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname
library(DFplyr)  # (DataFrame support for various dplyr functions)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:IRanges':
#> 
#>     collapse, desc, intersect, setdiff, slice, union
#> The following objects are masked from 'package:S4Vectors':
#> 
#>     first, intersect, rename, setdiff, setequal, union
#> The following objects are masked from 'package:BiocGenerics':
#> 
#>     combine, intersect, setdiff, setequal, union
#> The following object is masked from 'package:generics':
#> 
#>     explain
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'DFplyr'
#> The following object is masked from 'package:dplyr':
#> 
#>     desc
#> The following object is masked from 'package:IRanges':
#> 
#>     desc
## Show all DataFrame rows of top_results().
orig <- options(showHeadLines = 20L)

top_results <- function(df) {
    df |>
        group_by(Category) |>
        slice(1) |>
        ungroup() |>
        ## Shorten GeneOntology to GO.
        mutate(Category = gsub("GeneOntology", "GO", Category)) |>
        select(Category, ID, Name, GenesSymbol)
}
enrich_chd |>
    filter(any(GenesSymbol %in% CharacterList("HAND2"))) |>
    top_results()
#> DataFrame with 18 rows and 4 columns
#>               Category                     ID                   Name
#>            <character>            <character>            <character>
#> 1         Coexpression                  M1938 MEISSNER_BRAIN_HCP_W..
#> 2    CoexpressionAtlas PCBC_ratio_EB_vs_SC_.. ratio_EmbryoidBody_v..
#> 3             Cytoband                   4q33                   4q33
#> 4              Disease               C0039685    Tetralogy of Fallot
#> 5               Domain            4.10.280.10                      -
#> 6                 Drug            ctd:D003474               Curcumin
#> 7           GeneFamily                    420 Basic helix-loop-hel..
#> 8  GOBiologicalProcess             GO:0048738 cardiac muscle tissu..
#> 9  GOCellularComponent             GO:0000785              chromatin
#> 10 GOMolecularFunction             GO:0008134 transcription factor..
#> 11          HumanPheno             HP:0025015 Abnormal vascular mo..
#> 12         Interaction             int:NKX2-5    NKX2-5 interactions
#> 13            MicroRNA       hsa-miR-105:PITA   hsa-miR-105:PITA_TOP
#> 14          MousePheno             MP:0005294 abnormal heart ventr..
#> 15             Pathway                 M48011 REACTOME_CARDIOGENESIS
#> 16              Pubmed               25336743 Arid3b is essential ..
#> 17                TFBS            V$FREAC7_01            V$FREAC7_01
#> 18            ToppCell ba7f7ce034c0f42742bf.. facs-Heart-LV-3m-Mes..
#>                GenesSymbol
#>            <CharacterList>
#> 1   GATA4,NKX2-5,HAND1,...
#> 2     GATA4,HAND1,NPPB,...
#> 3                    HAND2
#> 4  CITED2,GATA4,NKX2-5,...
#> 5         HEY2,HAND1,HAND2
#> 6      TGFB1,GATA4,CKM,...
#> 7         HEY2,HAND1,HAND2
#> 8   TGFB1,CITED2,GATA4,...
#> 9    CITED2,GATA4,HEY2,...
#> 10   CITED2,GATA4,HEY2,...
#> 11   CITED2,GATA4,HEY2,...
#> 12 GATA4,JARID2,NKX2-5,...
#> 13  CITED2,HEY2,JARID2,...
#> 14   CITED2,GATA4,HEY2,...
#> 15   GATA4,HEY2,NKX2-5,...
#> 16     GATA4,HEY2,GJA1,...
#> 17    GATA4,NPPB,HOXC4,...
#> 18    CKM,NKX2-5,SALL4,...
enrich_dr |>
    filter(any(GenesSymbol %in% CharacterList("HLA-DPB1"))) |>
    top_results()
#> DataFrame with 16 rows and 4 columns
#>               Category                     ID                   Name
#>            <character>            <character>            <character>
#> 1         Coexpression                 M10454 MCLACHLAN_DENTAL_CAR..
#> 2    CoexpressionAtlas      geo_heart_1000_K1 geo_heart_top-relati..
#> 3        Computational GAVISH_3CA_METAPROGR.. Genes upregulated in..
#> 4             Cytoband                 6p21.3                 6p21.3
#> 5              Disease DOID:2841 (is_implic.. asthma (is_implicate..
#> 6               Domain              IPR003006              Ig/MHC_CS
#> 7           GeneFamily                    591 C1-set domain contai..
#> 8  GOBiologicalProcess             GO:0002684 positive regulation ..
#> 9  GOCellularComponent             GO:0098552       side of membrane
#> 10 GOMolecularFunction             GO:0042277        peptide binding
#> 11          HumanPheno             HP:0100721 Mediastinal lymphade..
#> 12         Interaction               int:KNG1      KNG1 interactions
#> 13            MicroRNA           hsa-miR-4443                       
#> 14             Pathway                 M16476 KEGG_CELL_ADHESION_M..
#> 15              Pubmed               20668555 Extended LTA, TNF, L..
#> 16            ToppCell 2ae62c428728c1d9d447.. Transplant_Alveoli_a..
#>                   GenesSymbol
#>               <CharacterList>
#> 1         SELE,APOE,ITGB2,...
#> 2     ITGB2,HLA-DPB1,HLA-DRB1
#> 3  SELE,HLA-DPB1,HLA-DRB1,...
#> 4             HFE,LTA,TNF,...
#> 5             LTA,TNF,ACE,...
#> 6       HFE,HLA-DPB1,AGER,...
#> 7       HFE,HLA-DPB1,HLA-DRB1
#> 8            SELE,LTA,TNF,...
#> 9            SELE,HFE,TNF,...
#> 10         HFE,APOE,ITGB2,...
#> 11     APOE,HLA-DPB1,HLA-DRB1
#> 12         ITGB2,HLA-DPB1,AGT
#> 13    LTA,ITGA2B,HLA-DPB1,...
#> 14    SELE,ITGB2,HLA-DPB1,...
#> 15       LTA,TNF,HLA-DPB1,...
#> 16      SELE,ACE,HLA-DPB1,...

## Restore the previous setting; options() returned it as a list.
options(orig)
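
The filter() calls above keep the terms whose GenesSymbol entry contains the gene of interest. As a rough base R analogue, assuming a plain list in place of the CharacterList column, the same per-term membership test can be sketched as follows. The list genes_symbol is a hand-built stand-in made from three terms of the CHD output; has_hand2 and kept are illustrative names.

```r
## Stand-in for the GenesSymbol column: one character vector per term ID.
genes_symbol <- list(
    "4q33" = "HAND2",
    "4.10.280.10" = c("HEY2", "HAND1", "HAND2"),
    "C1449563" = c("NKX2-5", "NPPA", "NPPB"))

## One logical per term: does the term's gene set contain HAND2?
has_hand2 <- vapply(
    genes_symbol,
    function(genes) "HAND2" %in% genes,
    logical(1))

## IDs of the terms that would survive the filter.
kept <- names(genes_symbol)[has_hand2]
```

The first two terms are kept and the third is dropped, which mirrors what any(GenesSymbol %in% CharacterList("HAND2")) does row by row on the DataFrame.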

3.5 Convert drug database identifiers to PubChem CIDs

enrich_chd |>
    lookup_pubchem()
#> DataFrame with 101 rows and 3 columns
#>          Source           ID         CID
#>     <character>  <character> <character>
#> 1           CTD  ctd:C007095          NA
#> 2           CTD  ctd:C007350       15787
#> 3           CTD  ctd:C026116       41781
#> 4           CTD  ctd:C034587    56842157
#> 5           CTD  ctd:C041125          NA
#> ...         ...          ...         ...
#> 97       Stitch CID000070815       70815
#> 98       Stitch CID000168120      168120
#> 99       Stitch CID006450335     6450335
#> 100      Stitch CID005464096     5464096
#> 101      Stitch CID000023931       23931
enrich_dr |>
    lookup_pubchem()
#> DataFrame with 100 rows and 3 columns
#>          Source           ID         CID
#>     <character>  <character> <character>
#> 1           CTD  ctd:C001803      137994
#> 2           CTD  ctd:C003297        9677
#> 3           CTD  ctd:C004479          NA
#> 4           CTD  ctd:C007350       15787
#> 5           CTD  ctd:C044946          NA
#> ...         ...          ...         ...
#> 96       Stitch CID000001959        1959
#> 97       Stitch CID000003157        3157
#> 98       Stitch CID000071301       71301
#> 99       Stitch CID000000187         187
#> 100      Stitch CID000003715        3715
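
In the Stitch rows above, the CID column is the ID with its CID prefix and zero padding removed. The base R sketch below reproduces that visible pattern on the five Stitch IDs printed for CHD. It only illustrates the relationship between the two columns; it is not a description of how lookup_pubchem() works internally, and it does not apply to the CTD rows, some of which have no CID (NA).

```r
## Stitch IDs copied from rows 97 to 101 of the CHD output above.
stitch_ids <- c(
    "CID000070815", "CID000168120", "CID006450335",
    "CID005464096", "CID000023931")

## Drop the "CID" prefix together with any zero padding that follows it.
cids <- sub("^CID0*", "", stitch_ids)
```

The resulting character vector matches the CID column printed above for those rows.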

3.6 Change default limits of enrichment queries

The cut-offs of a query can be changed by passing a modified CategoriesDataFrame() to enrich(), which limits or expands the number of results.

## Default cut-offs.
cats <- CategoriesDataFrame()
cats
#> ToppGene CategoriesDataFrame with 19 categories
#>                               PValue MinGenes MaxGenes MaxResults Correction
#> Coexpression                    0.05        1     1500        100        FDR
#> CoexpressionAtlas               0.05        1     1500        100        FDR
#> Computational                   0.05        1     1500        100        FDR
#> Cytoband                        0.05        1     1500        100        FDR
#> Disease                         0.05        1     1500        100        FDR
#> Domain                          0.05        1     1500        100        FDR
#> Drug                            0.05        1     1500        100        FDR
#> GeneFamily                      0.05        1     1500        100        FDR
#> GeneOntologyBiologicalProcess   0.05        1     1500        100        FDR
#> GeneOntologyCellularComponent   0.05        1     1500        100        FDR
#> GeneOntologyMolecularFunction   0.05        1     1500        100        FDR
#> HumanPheno                      0.05        1     1500        100        FDR
#> Interaction                     0.05        1     1500        100        FDR
#> MicroRNA                        0.05        1     1500        100        FDR
#> MousePheno                      0.05        1     1500        100        FDR
#> Pathway                         0.05        1     1500        100        FDR
#> Pubmed                          0.05        1     1500        100        FDR
#> TFBS                            0.05        1     1500        100        FDR
#> ToppCell                        0.05        1     1500        100        FDR
#> ------------------------------
#> Values allowed by ToppGene are:
#>   PValue: [0, 1] <numeric>
#>   MinGenes: [1, 5000] <integer>
#>   MaxGenes: [2, 5000] <integer>
#>   MaxResults: [1, 5000] <integer>
#>   Correction: {None, FDR, Bonferroni} <character>

## Limit to 10 results for each category, and lower PValue for GeneOntology.
cats <-
    cats |>
    mutate(
        PValue = case_when(
            grepl("GeneOntology", rownames(cats)) ~ 1e-7,
            .default = PValue),
        MaxResults = 10L)
cats
#> ToppGene CategoriesDataFrame with 19 categories
#>                               PValue MinGenes MaxGenes MaxResults Correction
#> Coexpression                   5e-02        1     1500         10        FDR
#> CoexpressionAtlas              5e-02        1     1500         10        FDR
#> Computational                  5e-02        1     1500         10        FDR
#> Cytoband                       5e-02        1     1500         10        FDR
#> Disease                        5e-02        1     1500         10        FDR
#> Domain                         5e-02        1     1500         10        FDR
#> Drug                           5e-02        1     1500         10        FDR
#> GeneFamily                     5e-02        1     1500         10        FDR
#> GeneOntologyBiologicalProcess  1e-07        1     1500         10        FDR
#> GeneOntologyCellularComponent  1e-07        1     1500         10        FDR
#> GeneOntologyMolecularFunction  1e-07        1     1500         10        FDR
#> HumanPheno                     5e-02        1     1500         10        FDR
#> Interaction                    5e-02        1     1500         10        FDR
#> MicroRNA                       5e-02        1     1500         10        FDR
#> MousePheno                     5e-02        1     1500         10        FDR
#> Pathway                        5e-02        1     1500         10        FDR
#> Pubmed                         5e-02        1     1500         10        FDR
#> TFBS                           5e-02        1     1500         10        FDR
#> ToppCell                       5e-02        1     1500         10        FDR
#> ------------------------------
#> Values allowed by ToppGene are:
#>   PValue: [0, 1] <numeric>
#>   MinGenes: [1, 5000] <integer>
#>   MaxGenes: [2, 5000] <integer>
#>   MaxResults: [1, 5000] <integer>
#>   Correction: {None, FDR, Bonferroni} <character>
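
The same per-row update can be pictured on an ordinary data.frame. The sketch below is a base R stand-in with three of the categories and two of the columns, assuming nothing about the CategoriesDataFrame class itself; cats_df, is_go and in_range are illustrative names. It also checks the new values against the allowed ranges printed above.

```r
## Plain data.frame stand-in for three categories at their default cut-offs.
cats_df <- data.frame(
    PValue = rep(0.05, 3),
    MaxResults = rep(100L, 3),
    row.names = c("Disease", "GeneOntologyBiologicalProcess", "Pathway"))

## Lower the PValue cut-off for GeneOntology rows only.
is_go <- grepl("GeneOntology", rownames(cats_df))
cats_df$PValue[is_go] <- 1e-7

## Cap every category at 10 results.
cats_df$MaxResults <- 10L

## Check against the ranges listed under "Values allowed by ToppGene".
in_range <-
    all(cats_df$PValue >= 0 & cats_df$PValue <= 1) &&
    all(cats_df$MaxResults >= 1L & cats_df$MaxResults <= 5000L)
```

Only the GeneOntology row receives the stricter cut-off, while MaxResults changes for every row, as in the printed CategoriesDataFrame.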

enrich_chd_mod <-
    enrich(
        genes_chd$Entrez,
        cats)
enrich_chd_mod
#> DataFrame with 165 rows and 15 columns
#>                   Category                     ID                   Name
#>                <character>            <character>            <character>
#> 1   GeneOntologyMolecula..             GO:0008134 transcription factor..
#> 2   GeneOntologyMolecula..             GO:0061629 RNA polymerase II-sp..
#> 3   GeneOntologyMolecula..             GO:0140297 DNA-binding transcri..
#> 4   GeneOntologyBiologic..             GO:0048738 cardiac muscle tissu..
#> 5   GeneOntologyBiologic..             GO:0007507      heart development
#> ...                    ...                    ...                    ...
#> 161                Disease               C0018800           Cardiomegaly
#> 162                Disease               C1383860    Cardiac Hypertrophy
#> 163                Disease               C0019284   Diaphragmatic Hernia
#> 164                Disease DOID:5844 (is_implic.. myocardial infarctio..
#> 165                Disease          MONDO_0015263       Brugada syndrome
#>          PValue QValueFDRBH QValueFDRBY QValueBonferroni TotalGenes GenesInTerm
#>       <numeric>   <numeric>   <numeric>        <numeric>  <integer>   <integer>
#> 1   6.31374e-12 1.37639e-09 8.20882e-09      1.37639e-09      19978         754
#> 2   1.66715e-10 1.46986e-08 8.76626e-08      3.63440e-08      19978         427
#> 3   2.02275e-10 1.46986e-08 8.76626e-08      4.40959e-08      19978         595
#> 4   2.83683e-22 6.60698e-19 5.50403e-18      6.60698e-19      20557         326
#> 5   2.75671e-21 3.21019e-18 2.67429e-17      6.42039e-18      20557         764
#> ...         ...         ...         ...              ...        ...         ...
#> 161 1.11016e-12 2.45344e-10 1.94352e-09      1.71741e-09      29516          82
#> 162 1.11016e-12 2.45344e-10 1.94352e-09      1.71741e-09      29516          82
#> 163 1.80466e-12 3.48976e-10 2.76445e-09      2.79181e-09      29516          41
#> 164 4.30131e-12 7.39348e-10 5.85683e-09      6.65413e-09      29516          99
#> 165 6.06821e-12 9.38752e-10 7.43643e-09      9.38752e-09      29516          19
#>     GenesInQuery GenesInTermInQuery           Source                    URL
#>        <integer>          <integer>      <character>            <character>
#> 1             28                 13                                        
#> 2             28                 10                                        
#> 3             28                 11                                        
#> 4             28                 16                                        
#> 5             28                 19                                        
#> ...          ...                ...              ...                    ...
#> 161           28                  7 DisGeNET Curated                       
#> 162           28                  7 DisGeNET Curated                       
#> 163           28                  6 DisGeNET Curated                       
#> 164           28                  7   AllianceGenome https://fms.alliance..
#> 165           28                  5             GWAS http://purl.obolibra..
#>              GenesEntrez            GenesSymbol
#>            <IntegerList>        <CharacterList>
#> 1   10370,2626,23493,...  CITED2,GATA4,HEY2,...
#> 2   10370,2626,23493,...  CITED2,GATA4,HEY2,...
#> 3   10370,2626,23493,...  CITED2,GATA4,HEY2,...
#> 4    7040,10370,2626,... TGFB1,CITED2,GATA4,...
#> 5    7040,10370,2626,... TGFB1,CITED2,GATA4,...
#> ...                  ...                    ...
#> 161   2626,1482,4878,...  GATA4,NKX2-5,NPPA,...
#> 162   2626,1482,4878,...  GATA4,NKX2-5,NPPA,...
#> 163   7040,2626,2697,...   TGFB1,GATA4,GJA1,...
#> 164   7040,4878,4879,...    TGFB1,NPPA,NPPB,...
#> 165 2626,23493,57057,...   GATA4,HEY2,TBX20,...
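
The three Q-value columns correspond to the Bonferroni, Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) corrections. The sketch below checks this for the first three rows of the output above using base R's p.adjust(). The number of tests (218) is not documented anywhere above; it is inferred here from the ratio of QValueBonferroni to PValue in row 1, which is an assumption of this sketch.

```r
## P-values copied from rows 1 to 3 above (the three smallest in their category).
p_top3 <- c(6.31374e-12, 1.66715e-10, 2.02275e-10)

## Inferred number of tests: QValueBonferroni / PValue for row 1.
n_tests <- round(1.37639e-09 / 6.31374e-12)

## Apply each correction as if these were the 3 smallest of n_tests P-values.
q_bonf <- p.adjust(p_top3, method = "bonferroni", n = n_tests)
q_bh   <- p.adjust(p_top3, method = "BH", n = n_tests)
q_by   <- p.adjust(p_top3, method = "BY", n = n_tests)
```

Because the printed P-values are rounded, the results agree with the QValueBonferroni, QValueFDRBH and QValueFDRBY columns to about five significant digits rather than exactly. Rows 2 and 3 share one BH value because the procedure enforces monotonicity across ranks.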

## MaxResults limited to at most 10.
enrich_chd_mod |>
    count(Category)
#> DataFrame with 18 rows and 2 columns
#>              Category         n
#>              <factor> <integer>
#> 1   Coexpression             10
#> 2   CoexpressionAtlas        10
#> 3   Computational             2
#> 4   Cytoband                 10
#> 5   Disease                  10
#> ...               ...       ...
#> 14         MousePheno        10
#> 15         Pathway           10
#> 16         Pubmed            10
#> 17         TFBS              10
#> 18         ToppCell          10

## PValue limited to below 1e-7.
enrich_chd_mod |>
    arrange(desc(PValue)) |>
    filter(grepl("Onto", Category)) |>
    group_by(Category) |>
    slice(1)
#> DataFrame with 2 rows and 15 columns
#>                 Category          ID                   Name      PValue
#>              <character> <character>            <character>   <numeric>
#> 1 GeneOntologyBiologic..  GO:0003231 cardiac ventricle de.. 1.03508e-18
#> 2 GeneOntologyMolecula..  GO:0140297 DNA-binding transcri.. 2.02275e-10
#>   QValueFDRBH QValueFDRBY QValueBonferroni TotalGenes GenesInTerm GenesInQuery
#>     <numeric>   <numeric>        <numeric>  <integer>   <integer>    <integer>
#> 1 2.36401e-16 1.96936e-15      2.41071e-15      20557         162           28
#> 2 1.46986e-08 8.76626e-08      4.40959e-08      19978         595           28
#>   GenesInTermInQuery      Source         URL          GenesEntrez
#>            <integer> <character> <character>        <IntegerList>
#> 1                 12                          7040,10370,2626,...
#> 2                 11                         10370,2626,23493,...
#>              GenesSymbol
#>          <CharacterList>
#> 1 TGFB1,CITED2,GATA4,...
#> 2  CITED2,GATA4,HEY2,...

Session Info

sessionInfo()
#> R Under development (unstable) (2026-01-15 r89304)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] DFplyr_1.5.2        dplyr_1.2.0         IRanges_2.45.0     
#> [4] S4Vectors_0.49.0    BiocGenerics_0.57.0 generics_0.1.4     
#> [7] toppgene_0.99.1     BiocStyle_2.39.0   
#> 
#> loaded via a namespace (and not attached):
#>  [1] rappdirs_0.3.4      sass_0.4.10         xml2_1.5.2         
#>  [4] RSQLite_2.4.6       hms_1.1.4           digest_0.6.39      
#>  [7] magrittr_2.0.4      evaluate_1.0.5      bookdown_0.46      
#> [10] fastmap_1.2.0       blob_1.3.0          jsonlite_2.0.0     
#> [13] DBI_1.3.0           BiocManager_1.30.27 purrr_1.2.1        
#> [16] httr2_1.2.2         jquerylib_0.1.4     cli_3.6.5          
#> [19] crayon_1.5.3        rlang_1.1.7         dbplyr_2.5.2       
#> [22] bit64_4.6.0-1       withr_3.0.2         cachem_1.1.0       
#> [25] yaml_2.3.12         otel_0.2.0          parallel_4.6.0     
#> [28] tools_4.6.0         tzdb_0.5.0          memoise_2.0.1      
#> [31] filelock_1.0.3      curl_7.0.0          vctrs_0.7.1        
#> [34] R6_2.6.1            BiocFileCache_3.1.0 lifecycle_1.0.5    
#> [37] bit_4.6.0           vroom_1.7.0         pkgconfig_2.0.3    
#> [40] pillar_1.11.1       bslib_0.10.0        glue_1.8.0         
#> [43] xfun_0.56           tibble_3.3.1        tidyselect_1.2.1   
#> [46] knitr_1.51          htmltools_0.5.9     rmarkdown_2.30     
#> [49] readr_2.2.0         compiler_4.6.0

References

Chen, Jing, Huan Xu, Bruce J. Aronow, and Anil G. Jegga. 2007. “Improved Human Disease Candidate Gene Prioritization Using Mouse Phenotype.” BMC Bioinformatics 8 (1): 392. https://doi.org/10.1186/1471-2105-8-392.