1 Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# orthogene is only available on Bioconductor >= 3.14
if (BiocManager::version() < "3.14")
  BiocManager::install(update = TRUE, ask = FALSE)

BiocManager::install("orthogene")
library(orthogene)

data("exp_mouse")
# Setting to "homologene" for the purposes of quick demonstration.
# We generally recommend using method="gprofiler" (default).
method <- "homologene"  

2 Introduction

It’s not always clear whether a dataset is using the original species gene names, human gene names, or some other species’ gene names.

infer_species takes a list, matrix, or data.frame containing genes and infers the species they best match.

For the sake of speed, the genes extracted from gene_df are tested against the genomes of only the following 6 test_species by default:

- human
- monkey
- rat
- mouse
- zebrafish
- fly

However, you can supply your own list of test_species, which will automatically be mapped and standardised using map_species.
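To make the idea of "best match" concrete, the following is a minimal base-R sketch of an overlap test. It is not orthogene's actual implementation: the gene sets are tiny hypothetical examples, and the names reference_genes, query_genes, percent_match and top_species are introduced here purely for illustration.

```r
# Toy reference gene sets (hypothetical; real references hold thousands of genes).
reference_genes <- list(
  human = c("TP53", "BRCA1", "EGFR", "MYC"),
  mouse = c("Trp53", "Brca1", "Egfr", "Myc"),
  fly   = c("p53", "Egfr", "Myc")
)

# Query genes whose species we want to guess.
query_genes <- c("Trp53", "Brca1", "Egfr", "Gapdh")

# Percentage of query genes found in each reference set (case-sensitive match).
percent_match <- vapply(
  reference_genes,
  function(ref) 100 * mean(query_genes %in% ref),
  numeric(1)
)

# Order species from best to worst match and pick the top one.
percent_match <- sort(percent_match, decreasing = TRUE)
top_species <- names(percent_match)[1]
print(percent_match)
```

In this toy case three of the four query genes appear in the mouse set, so mouse ranks first. The matching is case-sensitive, which is why the human symbols do not overlap with the mouse-style query symbols.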

3 Examples

3.1 Mouse genes

3.1.1 Infer the species

matches <- orthogene::infer_species(gene_df = exp_mouse, 
                                    method = method)
## Preparing gene_df.
## sparseMatrix format detected.
## Extracting genes from rownames.
## 15,259 genes extracted.
## Testing for gene overlap with: human
## Retrieving all genes using: homologene.
## Retrieving all organisms available in homologene.
## Mapping species name: human
## Common name mapping found for human
## 1 organism identified from search: 9606
## Gene table with 19,129 rows retrieved.
## Returning all 19,129 genes from human.
## Testing for gene overlap with: monkey
## Retrieving all genes using: homologene.
## Retrieving all organisms available in homologene.
## Mapping species name: monkey
## Common name mapping found for monkey
## 1 organism identified from search: 9544
## Gene table with 16,843 rows retrieved.
## Returning all 16,843 genes from monkey.
## Testing for gene overlap with: rat
## Retrieving all genes using: homologene.
## Retrieving all organisms available in homologene.
## Mapping species name: rat
## Common name mapping found for rat
## 1 organism identified from search: 10116
## Gene table with 20,616 rows retrieved.
## Returning all 20,616 genes from rat.
## Testing for gene overlap with: mouse
## Retrieving all genes using: homologene.
## Retrieving all organisms available in homologene.
## Mapping species name: mouse
## Common name mapping found for mouse
## 1 organism identified from search: 10090
## Gene table with 21,207 rows retrieved.
## Returning all 21,207 genes from mouse.
## Testing for gene overlap with: zebrafish
## Retrieving all genes using: homologene.
## Retrieving all organisms available in homologene.
## Mapping species name: zebrafish
## Common name mapping found for zebrafish
## 1 organism identified from search: 7955
## Gene table with 20,897 rows retrieved.
## Returning all 20,897 genes from zebrafish.
## Testing for gene overlap with: fly
## Retrieving all genes using: homologene.
## Retrieving all organisms available in homologene.
## Mapping species name: fly
## Common name mapping found for fly
## 1 organism identified from search: 7227
## Gene table with 8,438 rows retrieved.
## Returning all 8,438 genes from fly.
## Top match:
##   - species: mouse 
##   - percent_match: 92%

3.2 Rat genes

3.2.1 Create example data

To create an example dataset, convert the mouse gene names to their rat orthologs.

exp_rat <- orthogene::convert_orthologs(gene_df = exp_mouse, 
                                        input_species = "mouse", 
                                        output_species = "rat",
                                        method = method)

3.2.2 Infer the species

matches <- orthogene::infer_species(gene_df = exp_rat, 
                                    method = method)

3.3 Human genes

3.3.1 Create example data

To create an example dataset, convert the mouse gene names to their human orthologs.

exp_human <- orthogene::convert_orthologs(gene_df = exp_mouse, 
                                          input_species = "mouse", 
                                          output_species = "human",
                                          method = method)

3.3.2 Infer the species

matches <- orthogene::infer_species(gene_df = exp_human, 
                                    method = method)

4 Additional test_species

You can even supply test_species with the name of one of the R packages that orthogene gets orthologs from. This will test against all species available in that particular R package.

For example, by setting test_species="homologene" we automatically test for the percentage of matching genes in each of the 20+ species available in homologene.

matches <- orthogene::infer_species(gene_df = exp_human, 
                                    test_species = method, 
                                    method = method)
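As noted in the Introduction, test_species can also be your own list of species rather than a package name. The sketch below assumes that a character vector of common names from the default list is accepted. The variable names my_species and matches_subset are hypothetical, and the call is guarded so that it only runs when orthogene is installed.

```r
# Hypothetical subset of species to test against (common names from the default list).
my_species <- c("human", "mouse", "rat")

# Only run the query if orthogene is available in this R installation.
if (requireNamespace("orthogene", quietly = TRUE)) {
  # Load the example mouse expression matrix shipped with orthogene.
  data("exp_mouse", package = "orthogene")
  # Test the mouse genes against the three chosen species only.
  matches_subset <- orthogene::infer_species(gene_df = exp_mouse,
                                             test_species = my_species,
                                             method = "homologene")
}
```

Restricting test_species to a few candidates may be useful when the plausible species are already known, since fewer reference gene tables need to be retrieved.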

5 Session Info

utils::sessionInfo()
R version 4.4.0 beta (2024-04-14 r86421)
Platform: x86_64-apple-darwin20
Running under: macOS Monterey 12.7.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] orthogene_1.10.0 BiocStyle_2.32.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.5              babelgene_22.9           
 [3] xfun_0.43                 bslib_0.7.0              
 [5] ggplot2_3.5.1             htmlwidgets_1.6.4        
 [7] rstatix_0.7.2             lattice_0.22-6           
 [9] vctrs_0.6.5               tools_4.4.0              
[11] generics_0.1.3            yulab.utils_0.1.4        
[13] parallel_4.4.0            tibble_3.2.1             
[15] fansi_1.0.6               highr_0.10               
[17] pkgconfig_2.0.3           Matrix_1.7-0             
[19] data.table_1.15.4         homologene_1.4.68.19.3.27
[21] ggplotify_0.1.2           lifecycle_1.0.4          
[23] farver_2.1.1              compiler_4.4.0           
[25] treeio_1.28.0             tinytex_0.50             
[27] munsell_0.5.1             carData_3.0-5            
[29] ggtree_3.12.0             ggfun_0.1.4              
[31] gprofiler2_0.2.3          htmltools_0.5.8.1        
[33] sass_0.4.9                yaml_2.3.8               
[35] lazyeval_0.2.2            plotly_4.10.4            
[37] pillar_1.9.0              car_3.1-2                
[39] ggpubr_0.6.0              jquerylib_0.1.4          
[41] tidyr_1.3.1               cachem_1.0.8             
[43] grr_0.9.5                 magick_2.8.3             
[45] abind_1.4-5               nlme_3.1-164             
[47] tidyselect_1.2.1          aplot_0.2.2              
[49] digest_0.6.35             dplyr_1.1.4              
[51] purrr_1.0.2               bookdown_0.39            
[53] labeling_0.4.3            fastmap_1.1.1            
[55] grid_4.4.0                colorspace_2.1-0         
[57] cli_3.6.2                 magrittr_2.0.3           
[59] patchwork_1.2.0           utf8_1.2.4               
[61] broom_1.0.5               ape_5.8                  
[63] withr_3.0.0               scales_1.3.0             
[65] backports_1.4.1           httr_1.4.7               
[67] rmarkdown_2.26            ggsignif_0.6.4           
[69] memoise_2.0.1             evaluate_0.23            
[71] knitr_1.46                viridisLite_0.4.2        
[73] gridGraphics_0.5-1        rlang_1.1.3              
[75] Rcpp_1.0.12               glue_1.7.0               
[77] tidytree_0.4.6            BiocManager_1.30.22      
[79] jsonlite_1.8.8            R6_2.5.1                 
[81] fs_1.6.4