Example 16S Annotation Workflow

Nathan D. Olson

2016-05-15

Overview

The metagenomeFeatures package and associated annotation packages (e.g. greengenes13.5MgDb) can be used to add taxonomic annotations to MRexperiment objects. The metagenomeSeq package includes the MRexperiment-class and has a number of methods for analyzing marker-gene metagenome datasets. This vignette demonstrates how to add taxonomic annotation to an MRexperiment-class object using metagenomeFeatures.

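To generate an MRexperiment object with taxonomic information you need count data (assayData), sample metadata (phenoData), and an MgDb object for the database used to classify the OTUs.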

For MRexperiment objects, the assayData and phenoData need to have the same sample order. Similarly, the assayData and featureData need to have the same OTU order.
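The ordering requirement can be checked, and fixed, by name before any object is built. The following is a minimal base R sketch; the tables `counts`, `pheno`, and `taxa` and their values are hypothetical stand-ins, not data from the msd16s study.

```r
# Toy stand-ins (hypothetical values) for the count, sample, and taxonomy tables.
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("otu1", "otu2", "otu3"), c("s1", "s2")))
pheno <- data.frame(Type = c("Case", "Control"), row.names = c("s2", "s1"))
taxa <- data.frame(Phylum = c("Firmicutes", "Bacteroidetes", "Proteobacteria"),
                   row.names = c("otu3", "otu1", "otu2"))

# Reorder by name so the rows of pheno follow the columns of counts,
# and the rows of taxa follow the rows of counts.
pheno <- pheno[colnames(counts), , drop = FALSE]
taxa <- taxa[rownames(counts), , drop = FALSE]

# Both checks should hold before the tables are combined.
samples_aligned <- identical(rownames(pheno), colnames(counts))
otus_aligned <- identical(rownames(taxa), rownames(counts))
stopifnot(samples_aligned, otus_aligned)
```

Indexing by name rather than by position means the reordering still works when the input files list samples or OTUs in different orders.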

Required Packages

library(Biostrings)
library(metagenomeFeatures)
library(metagenomeSeq)
library(magrittr)

The magrittr pipe operator %>% is used throughout this vignette. See the magrittr package vignette for more information about this handy operator (https://cran.r-project.org/web/packages/magrittr/index.html).
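For readers new to the pipe, a small illustration may help. The numbers below are arbitrary and unrelated to the vignette data.

```r
library(magrittr)

# x %>% f(y) is evaluated as f(x, y): the left-hand side becomes
# the first argument of the call on the right-hand side.
piped <- c(1, 4, 9) %>% sqrt() %>% sum()

# The same computation written as nested calls.
nested <- sum(sqrt(c(1, 4, 9)))

stopifnot(identical(piped, nested))
```

The piped form reads left to right in the order the steps are applied, which is why it is used for the file-loading steps in this vignette.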

Creating Initial MRexperiment Object

The assayData and phenoData we will use are in files included in the metagenomeFeatures package, with count data derived from the msd16s study.

dataDirectory <- system.file("extdata", package = "metagenomeFeatures")

assay_data <- file.path(dataDirectory, "msd16s_counts.csv") %>% load_meta(sep = ",")

pheno_data <- file.path(dataDirectory, "msd16s_sample_data.csv") %>% 
    read.csv(row.names = 1, stringsAsFactors = FALSE) %>% AnnotatedDataFrame()

Create an initial MRexperiment from the count data and phenoData.

demoMRexp <- newMRexperiment(assay_data$counts, phenoData = pheno_data)
demoMRexp
## MRexperiment (storageMode: environment)
## assayData: 13530 features, 992 samples 
##   element names: counts 
## protocolData: none
## phenoData
##   sampleNames: 100259 100262 ... 602385 (992 total)
##   varLabels: Type Country ... Dysentery (5 total)
##   varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation:
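The printed summary can also be checked programmatically. The sketch below builds a small toy MRexperiment (the counts and sample table are hypothetical values, not the msd16s data) and inspects it with `MRcounts()` from metagenomeSeq and `pData()` and `fData()` from Biobase.

```r
library(Biobase)
library(metagenomeSeq)

# Toy count matrix and sample table (hypothetical values).
toy_counts <- matrix(c(5, 0, 2, 1, 3, 7), nrow = 3,
                     dimnames = list(c("otu1", "otu2", "otu3"), c("s1", "s2")))
toy_pheno <- AnnotatedDataFrame(
    data.frame(Type = c("Case", "Control"), row.names = c("s1", "s2"))
)
toy_exp <- newMRexperiment(toy_counts, phenoData = toy_pheno)

toy_dim <- dim(MRcounts(toy_exp))     # features x samples
toy_samples <- pData(toy_exp)         # sample metadata, one row per sample
toy_features <- fData(toy_exp)        # feature annotation table
```

The same accessors can be applied to demoMRexp to confirm that the sample names in the phenoData match the columns of the count matrix.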

The featureData slot is currently empty. We will now use the annotateMRexp_fData function in the metagenomeFeatures package to load the featureData.

Loading featureData with metagenomeFeatures

You first need an MgDb object for the database used to classify the OTUs. A subset of the Greengenes database, msd16s_MgDb, is used here as the MgDb object for annotating the msd16s data. This subset is used for demonstration purposes only; the gg13.5MgDb database in the greengenes13.5MgDb package can also be used to generate the metagenomeAnnotation object.

annotateMRexp_fData

The annotateMRexp_fData function takes MgDb and MRexperiment objects as input and defines the featureData slot using the OTU ids from the assayData slot. The slot is defined with an mgFeatures object, which extends the AnnotatedDataFrame object with additional slots for the database reference sequences (refDbSeq), the phylogenetic tree (refDbtree), and metadata about the reference database used.

demoMRexp <- annotateMRexp_fData(mgdb = msd16s_MgDb, MRobj = demoMRexp)
demoMRexp
## MRexperiment (storageMode: environment)
## assayData: 13530 features, 992 samples 
##   element names: counts 
## protocolData: none
## phenoData
##   sampleNames: 100259 100262 ... 602385 (992 total)
##   varLabels: Type Country ... Dysentery (5 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 100023 100024 ... 99906 (7007 total)
##   fvarLabels: Keys Kingdom ... Species (8 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:

The featureData slot in the demoMRexp object now contains the taxonomic data for the OTUs, together with the database reference sequences and the phylogenetic tree.
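Conceptually, defining featureData from the OTU ids is a keyed lookup against the reference taxonomy. The sketch below illustrates that idea in base R with a hypothetical reference table (`ref_taxa`) and hypothetical OTU ids; it is not the implementation used by annotateMRexp_fData.

```r
# Hypothetical reference taxonomy keyed by OTU id (not the real database).
ref_taxa <- data.frame(
    Keys = c("101", "102", "103", "104"),
    Phylum = c("Firmicutes", "Bacteroidetes", "Proteobacteria", "Actinobacteria"),
    stringsAsFactors = FALSE
)

# OTU ids as they would appear in the row names of the count matrix.
otu_ids <- c("103", "101")

# Keep only the reference rows for OTUs present in the counts,
# ordered to match the count matrix.
feature_taxa <- ref_taxa[match(otu_ids, ref_taxa$Keys), ]
rownames(feature_taxa) <- feature_taxa$Keys
```

In this sketch, an OTU id that is absent from the reference table would produce a row of NA values, so checking `anyNA(match(otu_ids, ref_taxa$Keys))` first is a reasonable safeguard.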

TODO

demonstrate using mgFeatures accessors

sessionInfo()
## R version 3.3.0 (2016-05-03)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.9.5 (Mavericks)
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] magrittr_1.5             metagenomeSeq_1.14.2    
##  [3] RColorBrewer_1.1-2       glmnet_2.0-5            
##  [5] foreach_1.4.3            Matrix_1.2-6            
##  [7] limma_3.28.4             metagenomeFeatures_1.2.2
##  [9] Biobase_2.32.0           Biostrings_2.40.0       
## [11] XVector_0.12.0           IRanges_2.6.0           
## [13] S4Vectors_0.10.0         BiocGenerics_0.18.0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5                formatR_1.4               
##  [3] GenomeInfoDb_1.8.2         bitops_1.0-6              
##  [5] iterators_1.0.8            tools_3.3.0               
##  [7] zlibbioc_1.18.0            digest_0.6.9              
##  [9] nlme_3.1-128               RSQLite_1.0.0             
## [11] evaluate_0.9               lattice_0.20-33           
## [13] DBI_0.4-1                  yaml_2.1.13               
## [15] dplyr_0.4.3                stringr_1.0.0             
## [17] hwriter_1.3.2              knitr_1.13                
## [19] caTools_1.17.1             gtools_3.5.0              
## [21] grid_3.3.0                 R6_2.1.2                  
## [23] BiocParallel_1.6.2         rmarkdown_0.9.6           
## [25] gdata_2.17.0               latticeExtra_0.6-28       
## [27] gplots_3.0.1               matrixStats_0.50.2        
## [29] codetools_0.2-14           Rsamtools_1.24.0          
## [31] htmltools_0.3.5            GenomicRanges_1.24.0      
## [33] GenomicAlignments_1.8.0    ShortRead_1.30.0          
## [35] assertthat_0.1             SummarizedExperiment_1.2.2
## [37] ape_3.4                    KernSmooth_2.23-15        
## [39] stringi_1.0-1              lazyeval_0.1.10