Exploring a MgDb Object

Nathan D. Olson

2016-05-15

The MgDb class in the metagenomeFeatures package holds the sequences and taxonomic information for a 16S rRNA database. This vignette demonstrates the class methods for exploring and subsetting an MgDb-class object, using the demoMgDb object included in the metagenomeFeatures package. MgDb-class objects for full databases are distributed in separate packages, such as the greengenes13.5MgDb package.

Demonstration MgDb-class Object

library(metagenomeFeatures)
demoMgDb <- get_demoMgDb()
demoMgDb
## MgDb object:[1] "Metadata"
## |ACCESSION_DATE: 3/31/2015
## |URL: https://greengenes.microbio.me
## |DB_TYPE_NAME: GreenGenes-MgDb-Demo
## |DB_TYPE_VALUE: MgDb
## |DB_SCHEMA_VERSION: 1.0
## [1] "Sequence Data:"
##   A DNAStringSet instance of length 249
##       width seq                                        names               
##   [1]  1343 GACGAACGCTGGCGGCGTGC...TGAATACGTTCCCGGGCCT 1093016
##   [2]  1326 GACGAACGCTGGCGGCGTGC...TGAATACGTTCCCGGGCCT 1083934
##   [3]  1334 GATGAACGCTGGCGGCACGC...TGAATGCGTTCCCGGGCCT 1075456
##   [4]  1345 GATGAACGCTAGCGGGAGGC...TGAATACGTTCCCGGGCCT 1023948
##   [5]  1504 GACGAACGCTGGCGGCGCGC...GGGGTTGATGATTGGGGTG 983909
##   ...   ... ...
## [245]  1422 TCCGGTTGATCCTGCCGGAG...TCGAAACTGGGCCTCGCGA 4327819
## [246]  1419 CACTGCTATTGGAGTCCGAC...GGGGTTGCGTGAGGGGGGC 4344031
## [247]  1343 CGGTTGATCCTGCCGAAGGC...CCTTGCACACACCGCCCGT 4357608
## [248]  1270 TAACGTGAAGACCGGGATAA...CGAGCAGGTTTTAGGTGAG 4437875
## [249]  1554 TTTTTTCTGAGAATTTGATC...GGGCTGGATCACCTCCTTT 4485266
## [1] "Taxonomy Data:"
## Source: sqlite 3.8.6 [/private/tmp/Rtmp2ANki7/Rinst122f7237d7c49/metagenomeFeatures/extdata/demoTaxa.sqlite]
## From: taxa [249 x 8]
## 
##       Keys     Kingdom             Phylum             Class
##      (chr)       (chr)              (chr)             (chr)
## 1  4324716 k__Bacteria   p__Bacteroidetes               c__
## 2   246960 k__Bacteria  p__Planctomycetes c__028H05-P-BN-P5
## 3   222675 k__Bacteria p__Armatimonadetes       c__0319-6E2
## 4   156874 k__Bacteria            p__NC10          c__12-24
## 5  4383832 k__Bacteria            p__GN02         c__3BR-5F
## 6  4383502 k__Bacteria   p__Elusimicrobia           c__4-29
## 7   315344 k__Bacteria   p__Cyanobacteria         c__4C0d-2
## 8  2655590 k__Bacteria            p__GN04       c__5bav_B12
## 9   552241 k__Bacteria         p__SBR1093        c__A712011
## 10 4327819  k__Archaea   p__Crenarchaeota            c__AAG
## ..     ...         ...                ...               ...
## Variables not shown: Order (chr), Family (chr), Genus (chr), Species (chr)
## [1] "Tree Data:"
## 
## Phylogenetic tree with 203452 tips and 203451 internal nodes.
## 
## Tip labels:
##  1018666, 421164, 989926, 892241, 1046178, 854915, ...
## Node labels:
##  , k__Bacteria, , , , , ...
## 
## Rooted; includes branch lengths.

MgDb Methods

taxa_keytypes

taxa_keytypes(demoMgDb)
## [1] "Keys"    "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"  
## [8] "Species"
taxa_columns(demoMgDb)
## [1] "Keys"    "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"  
## [8] "Species"
head(taxa_keys(demoMgDb, keytype = c("Kingdom")))
## Source: local data frame [6 x 1]
## 
##       Kingdom
##         (chr)
## 1 k__Bacteria
## 2 k__Bacteria
## 3 k__Bacteria
## 4 k__Bacteria
## 5 k__Bacteria
## 6 k__Bacteria
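
The head() output above has one row per database entry, not one row per distinct value, so the same kingdom repeats. The following is a minimal base-R sketch of collapsing such a column to its distinct values and counting entries per value. The kingdom_keys data frame is a hypothetical stand-in for the one-column table shown above, not data pulled from demoMgDb.

```r
# Hypothetical stand-in for the one-column table returned for a key type.
kingdom_keys <- data.frame(
  Kingdom = c("k__Bacteria", "k__Bacteria", "k__Archaea", "k__Bacteria"),
  stringsAsFactors = FALSE
)

# Distinct values, in order of first appearance.
distinct_kingdoms <- unique(kingdom_keys$Kingdom)

# Number of entries per value.
kingdom_counts <- table(kingdom_keys$Kingdom)

print(distinct_kingdoms)
print(kingdom_counts)
```

The same two calls should apply to any key type column, since each is a character vector once extracted with `$`.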

Select Methods

The select method retrieves database entries for a specified taxonomic group or list of ids. It can return taxonomic information, sequence information, or both.
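
To make the idea concrete, here is a minimal base-R sketch of this kind of lookup: filter a taxonomy table on one column, then use the matching ids to pull sequences. It is an illustration under stated assumptions, not the package's implementation. The names toy_taxa, toy_seqs, and toy_select are hypothetical, the sequences are plain character strings and not a DNAStringSet, and the toy table stores bare genus names for simplicity.

```r
# Toy stand-ins for the taxonomy and sequence data (hypothetical values).
toy_taxa <- data.frame(
  Keys  = c("1", "2", "3"),
  Genus = c("Vibrio", "Escherichia", "Vibrio"),
  stringsAsFactors = FALSE
)
toy_seqs <- c("1" = "GACGAACG", "2" = "GATGAACG", "3" = "TCCGGTTG")

# Filter the taxonomy on one column, then use the matching Keys
# to pull the corresponding sequences.
toy_select <- function(taxa, seqs, type, keys, keytype) {
  hits <- taxa[taxa[[keytype]] %in% keys, , drop = FALSE]
  hit_seqs <- seqs[names(seqs) %in% hits$Keys]
  switch(type,
    taxa = hits,
    seq  = hit_seqs,
    all  = list(taxa = hits, seq = hit_seqs),
    stop("type must be 'taxa', 'seq', or 'all'")
  )
}

res <- toy_select(toy_taxa, toy_seqs,
                  type = "all", keys = "Vibrio", keytype = "Genus")
print(res$taxa)
print(res$seq)
```

A key with no match gives an empty table and an empty sequence vector, with no error raised.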

Selecting taxonomic information

select(demoMgDb, type = "taxa",
                keys = c("Vibrio", "Salmonella"),
                keytype = "Genus")
## Source: local data frame [0 x 8]
## 
## Variables not shown: Keys (chr), Kingdom (chr), Phylum (chr), Class (chr),
##   Order (chr), Family (chr), Genus (chr), Species (chr)
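
The query above returns a table with zero rows, so no entry in the demo database matched the requested keys. One way to catch this before querying is to compare the requested keys with the values present for that key type. The sketch below uses base R only. The available_genera vector is hypothetical and does not reproduce the contents of demoMgDb; in practice it might be built from the output of taxa_keys.

```r
# Hypothetical genus values standing in for those present in a database.
available_genera <- c("Escherichia", "Bacillus", "Streptomyces")
requested <- c("Vibrio", "Salmonella", "Bacillus")

# Split the request into keys that are present and keys that are absent.
found  <- intersect(requested, available_genera)
absent <- setdiff(requested, available_genera)

if (length(absent) > 0) {
  message("No entries for: ", paste(absent, collapse = ", "))
}
```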

Selecting sequence information

select(demoMgDb, type = "seq",
                keys = c("Vibrio", "Salmonella"),
                keytype = "Genus")
##   A DNAStringSet instance of length 0

Selecting all

select(demoMgDb, type = "all",
                keys = c("Vibrio", "Salmonella"),
                keytype = "Genus")
## Warning in ape::drop.tip(tree, drop_tips): drop all tips of the tree:
## returning NULL
## $taxa
## Source: local data frame [0 x 8]
## 
## Variables not shown: Keys (chr), Kingdom (chr), Phylum (chr), Class (chr),
##   Order (chr), Family (chr), Genus (chr), Species (chr)
## 
## $seq
##   A DNAStringSet instance of length 0

sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.9.5 (Mavericks)
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] magrittr_1.5             metagenomeSeq_1.14.2    
##  [3] RColorBrewer_1.1-2       glmnet_2.0-5            
##  [5] foreach_1.4.3            Matrix_1.2-6            
##  [7] limma_3.28.4             metagenomeFeatures_1.2.2
##  [9] Biobase_2.32.0           Biostrings_2.40.0       
## [11] XVector_0.12.0           IRanges_2.6.0           
## [13] S4Vectors_0.10.0         BiocGenerics_0.18.0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5                formatR_1.4               
##  [3] GenomeInfoDb_1.8.2         bitops_1.0-6              
##  [5] iterators_1.0.8            tools_3.3.0               
##  [7] zlibbioc_1.18.0            digest_0.6.9              
##  [9] nlme_3.1-128               RSQLite_1.0.0             
## [11] evaluate_0.9               lattice_0.20-33           
## [13] DBI_0.4-1                  yaml_2.1.13               
## [15] dplyr_0.4.3                stringr_1.0.0             
## [17] hwriter_1.3.2              knitr_1.13                
## [19] caTools_1.17.1             gtools_3.5.0              
## [21] grid_3.3.0                 R6_2.1.2                  
## [23] BiocParallel_1.6.2         rmarkdown_0.9.6           
## [25] gdata_2.17.0               latticeExtra_0.6-28       
## [27] gplots_3.0.1               matrixStats_0.50.2        
## [29] codetools_0.2-14           Rsamtools_1.24.0          
## [31] htmltools_0.3.5            GenomicRanges_1.24.0      
## [33] GenomicAlignments_1.8.0    ShortRead_1.30.0          
## [35] assertthat_0.1             SummarizedExperiment_1.2.2
## [37] ape_3.4                    KernSmooth_2.23-15        
## [39] stringi_1.0-1              lazyeval_0.1.10