getGenesets {EnrichmentBrowser}    R Documentation

Definition of gene sets according to different sources

Description

Functionality for retrieving gene sets for an organism under investigation from databases such as GO and KEGG. Parsing and writing a list of gene sets from/to a flat text file in GMT format is also supported.

The GMT (Gene Matrix Transposed) file format is a tab-delimited text format for describing gene sets. Each row represents one gene set and consists of a name, a description, and the genes in the set. See references.
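To make the row layout concrete, here is a minimal base R sketch of writing and re-reading GMT rows. The helper names `to_gmt_lines` and `from_gmt_lines` are hypothetical and not part of EnrichmentBrowser, the gene IDs are made up, and the description field is filled with the placeholder "NA". It only illustrates the format and is not how getGenesets or writeGMT are implemented.

```r
# Toy gene sets: a named list of character vectors (gene IDs are made up)
gs <- list(
    setA = c("1", "2", "3"),
    setB = c("4", "5")
)

# Hypothetical helper: one GMT row per gene set, laid out as
# name <TAB> description <TAB> gene1 <TAB> gene2 ...
to_gmt_lines <- function(gs, descr = "NA") {
    vapply(names(gs), function(n) {
        paste(c(n, descr, gs[[n]]), collapse = "\t")
    }, character(1), USE.NAMES = FALSE)
}

# Hypothetical helper: parse GMT rows back into a named list,
# dropping the description field (second column)
from_gmt_lines <- function(lines) {
    fields <- strsplit(lines, "\t", fixed = TRUE)
    sets <- lapply(fields, function(f) f[-(1:2)])
    names(sets) <- vapply(fields, `[`, character(1), 1)
    sets
}

# Round trip through a temporary file
gmt.lines <- to_gmt_lines(gs)
tmp <- tempfile(fileext = ".gmt")
writeLines(gmt.lines, tmp)
gs2 <- from_gmt_lines(readLines(tmp))
```

In this sketch the list read back in `gs2` has the same names and gene IDs as `gs`, because the description column is the only field that is discarded on parsing.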

Usage

getGenesets(org, db = c("go", "kegg"), cache = TRUE,
  go.onto = c("BP", "MF", "CC"), go.mode = c("GO.db", "biomart"))

writeGMT(gs, gmt.file)

Arguments

org

An organism in (KEGG) three-letter code, e.g. ‘hsa’ for ‘Homo sapiens’. Alternatively, this can also be a text file storing gene sets in GMT format. See details.

db

Database from which gene sets should be retrieved. Currently, either 'go' (default) or 'kegg'.

cache

Logical. Should a locally cached version be used if available? Defaults to TRUE.

go.onto

Character. Specifies one of the three GO ontologies: 'BP' (biological process), 'MF' (molecular function), 'CC' (cellular component). Defaults to 'BP'.

go.mode

Character. Determines how the gene sets are retrieved. This can be either 'GO.db' or 'biomart'. The 'GO.db' mode creates the gene sets based on BioC annotation packages, which is fast but does not necessarily represent the most up-to-date mapping. In addition, this option is only available for the model organisms currently supported in BioC. The 'biomart' mode downloads the mapping from BioMart, which can be time-consuming but allows selecting from a larger range of organisms and contains the latest mappings. Defaults to 'GO.db'.

gs

A list of gene sets (character vectors of gene IDs).

gmt.file

Gene set file in GMT format. See details.
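The vector-valued defaults shown in Usage (for example go.onto = c("BP", "MF", "CC")) follow a common R idiom in which the first element acts as the default. The sketch below illustrates that idiom with base R's match.arg. The function `pick_onto` is hypothetical, and whether getGenesets resolves its arguments in exactly this way is an assumption.

```r
# Hypothetical function illustrating the choice-vector default idiom;
# not part of EnrichmentBrowser
pick_onto <- function(go.onto = c("BP", "MF", "CC")) {
    # match.arg returns the first choice when the argument is missing
    # and raises an error for values outside the choice vector
    match.arg(go.onto)
}

onto.default <- pick_onto()      # argument omitted: first choice is used
onto.cc <- pick_onto("CC")       # explicit choice is passed through
```

Under this idiom, omitting the argument and passing its first listed value are equivalent, which matches the "Defaults to" statements above.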

Value

For getGenesets: a list of gene sets (vectors of gene IDs). For writeGMT: none, writes to file.

Author(s)

Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>

References

GO: http://geneontology.org/

KEGG organism codes: http://www.genome.jp/kegg/catalog/org_list.html

GMT file format: http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats

See Also

annFUN for general GO2gene mapping used in the 'GO.db' mode, and the biomaRt package for general queries to BioMart.

keggList and keggLink for accessing the KEGG REST server.

Examples


    # (1) Typical usage for gene set enrichment analysis with GO:
    # Biological process terms based on BioC annotation (for human)
    go.gs <- getGenesets(org="hsa", db="go")
    
    # equivalent to:
    # go.gs <- getGenesets(org="hsa", db="go", go.onto="BP", go.mode="GO.db")

    # Alternatively:
    # downloading from BioMart 
    # this may take a few minutes ...
    
    # note: the argument is named go.mode (see Usage), not mode
    go.gs <- getGenesets(org="hsa", db="go", go.mode="biomart")
    

    # (2) Defining gene sets according to KEGG  
    kegg.gs <- getGenesets(org="hsa", db="kegg")

    # (3) parsing gene sets from GMT
    gmt.file <- system.file("extdata/hsa_kegg_gs.gmt", package="EnrichmentBrowser")
    gs <- getGenesets(gmt.file)     
    
    # (4) writing gene sets to file
    writeGMT(gs, gmt.file)


[Package EnrichmentBrowser version 2.10.11 Index]