KeggPathwayView {MAGeCKFlute}    R Documentation

KEGG pathway view

Description

Plot a KEGG pathway and color specific genes.

Usage

KeggPathwayView(gene.data = NULL, cpd.data = NULL, pathway.id,
  species = "hsa", kegg.dir = ".", cpd.idtype = "kegg",
  gene.idtype = "ENTREZ", gene.annotpkg = NULL, min.nnodes = 3,
  kegg.native = TRUE, map.null = TRUE, expand.node = FALSE,
  split.group = FALSE, map.symbol = TRUE, map.cpdname = TRUE,
  node.sum = "sum", discrete = list(gene = FALSE, cpd = FALSE),
  limit = list(gene = 1, cpd = 1), bins = list(gene = 10, cpd = 10),
  both.dirs = list(gene = TRUE, cpd = TRUE), trans.fun = list(gene = NULL,
  cpd = NULL), low = list(gene = "deepskyblue1", cpd = "blue"),
  mid = list(gene = "gray", cpd = "gray"), high = list(gene = "red", cpd =
  "yellow"), na.col = "transparent", ...)

Arguments

gene.data

Either a vector (single sample) or matrix-like data (multiple samples). A vector should be numeric with gene IDs as names, or it may be a character vector of gene IDs. A character vector is treated as discrete or count data. A matrix-like data structure has genes as rows and samples as columns, and its row names should be gene IDs. Here gene ID is a generic concept that covers multiple types of gene, transcript and protein IDs uniquely mappable to KEGG gene IDs. KEGG ortholog IDs are also treated as gene IDs so that metagenomic data can be handled. Check Details for mappable ID types. Default gene.data=NULL.
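As a minimal sketch of the accepted input shapes, the following base R snippet builds a named numeric vector, a matrix and a character vector. The IDs and scores are made up for illustration and are assumptions, not data shipped with the package.

```r
# Hypothetical Entrez-style gene IDs with made-up scores (illustration only).
gene.vec <- c("1017" = 1.2, "1019" = -0.8, "1029" = 0.3)

# Multiple samples: genes as rows, samples as columns, gene IDs as row names.
gene.mat <- matrix(
  c(1.2, -0.8, 0.3,
    0.9, -1.1, 0.1),
  ncol = 2,
  dimnames = list(names(gene.vec), c("sample1", "sample2"))
)

# Character vector of gene IDs; per the description it is treated as
# discrete or count data.
gene.ids <- names(gene.vec)
```

Any of gene.vec, gene.mat or gene.ids could then be supplied as gene.data.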

cpd.data

The same as gene.data, except named with IDs mappable to KEGG compound IDs. Over 20 types of IDs included in the CHEMBL database can be used here. Check Details for mappable ID types. Default cpd.data=NULL. Note that gene.data and cpd.data can't both be NULL.

pathway.id

Character vector, the KEGG pathway ID(s), usually 5 digits; may also include the 3-letter KEGG species code.

species

Character, either the KEGG code, the scientific name or the common name of the target species. This applies to both the pathway and gene.data or cpd.data. When a KEGG ortholog pathway is considered, use species="ko". Default species="hsa", which is equivalent to "Homo sapiens" (scientific name) or "human" (common name).

kegg.dir

Character, the directory of the KEGG pathway data files (.xml) and image files (.png). Users may supply their own data files in this directory, in the same format and naming convention as KEGG's (species code + pathway id, e.g. hsa04110.xml, hsa04110.png etc). Default kegg.dir="." (current working directory).

cpd.idtype

Character, ID type used for cpd.data. Default cpd.idtype="kegg" (includes compound, glycan and drug accessions).

gene.idtype

Character, ID type used for gene.data, case insensitive. Default gene.idtype="ENTREZ", i.e. Entrez Gene, which is the primary KEGG gene ID for many common model organisms. For other species, gene.idtype should be set to "KEGG", as KEGG uses other types of gene IDs. For the common model organisms (to check the list, do: data(bods); bods), you may also specify other types of valid IDs. To check the ID list, do: data(gene.idtype.list); gene.idtype.list.

gene.annotpkg

Character, the name of the annotation package to use for mapping between other gene ID types including symbols and Entrez gene ID. Default gene.annotpkg=NULL.

min.nnodes

Integer, minimum number of nodes of type "gene", "enzyme", "compound" or "ortholog" for a pathway to be considered. Default min.nnodes=3.

kegg.native

Logical, whether to render pathway graph as native KEGG graph (.png) or using graphviz layout engine (.pdf). Default kegg.native=TRUE.

map.null

Logical, whether to map the NULL gene.data or cpd.data to pathway. When NULL data are mapped, the gene or compound nodes in the pathway will be rendered as actually mapped nodes, except with NA-valued color. When NULL data are not mapped, the nodes are rendered as unmapped nodes. This argument mainly affects native KEGG graph view, i.e. when kegg.native=TRUE. Default map.null=TRUE.

expand.node

Logical, whether multiple-gene nodes are expanded into single-gene nodes. Each expanded single-gene node inherits all edges from the original multiple-gene node. This option only affects the Graphviz graph view, i.e. when kegg.native=FALSE. This option is not effective for most metabolic pathways, where it conflicts with converting reactions to edges. Default expand.node=FALSE.

split.group

Logical, whether node groups are split into individual nodes. Each split member node inherits all edges from the node group. This option only affects the Graphviz graph view, i.e. when kegg.native=FALSE. This option also affects most metabolic pathways, even without group nodes defined originally. For these pathways, genes involved in the same reaction are grouped automatically when converting reactions to edges, unless split.group=TRUE. Default split.group=FALSE.

map.symbol

Logical, whether to map gene IDs to symbols for gene node labels or to use the graphic name from the KGML file. This option is only effective for kegg.native=FALSE, or for same.layer=FALSE when kegg.native=TRUE. For same.layer=TRUE when kegg.native=TRUE, the native KEGG labels will be kept. Default map.symbol=TRUE.

map.cpdname

Logical, whether to map compound IDs to formal names for compound node labels or to use the graphic name from the KGML file (KEGG compound accessions). This option is only effective for kegg.native=FALSE. When kegg.native=TRUE, the native KEGG labels will be kept. Default map.cpdname=TRUE.

node.sum

Character, the name of the method used to calculate the node summary when multiple genes or compounds are mapped to one node. Potential options include "sum", "mean", "median", "max", "max.abs" and "random". Default node.sum="sum".

discrete

A list of two logical elements with "gene" and "cpd" as the names. This argument tells whether gene.data or cpd.data should be treated as discrete. Default discrete=list(gene=FALSE, cpd=FALSE), i.e. both data should be treated as continuous.

limit

A list of two numeric elements with "gene" and "cpd" as the names. This argument specifies the limit values for gene.data and cpd.data when converting them to pseudo colors. Each element of the list could be of length 1 or 2. Length 1 suggests discrete data or 1 directional (positive-valued) data, or the absolute limit for 2 directional data. Length 2 suggests 2 directional data. Default limit=list(gene=1, cpd=1).

bins

A list of two integer elements with "gene" and "cpd" as the names. This argument specifies the number of levels or bins for gene.data and cpd.data when converting them to pseudo colors. Default bins=list(gene=10, cpd=10).
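To make limit and bins more concrete, here is a rough base R illustration of clamping values to an absolute limit and assigning each value to one of a fixed number of color bins. It is a sketch of the general idea only, not the package's internal code; the scores and variable names are hypothetical.

```r
# Illustration only: this is not how the package is implemented internally.
vals  <- c(-2.5, -0.4, 0, 0.7, 3.1)  # made-up scores
limit <- 1                            # absolute limit for 2 directional data
bins  <- 10                           # number of color levels

# Clamp values to the range [-limit, limit].
clamped <- pmax(pmin(vals, limit), -limit)

# Cut the range into `bins` equal-width intervals and record each value's bin.
breaks <- seq(-limit, limit, length.out = bins + 1)
idx    <- cut(clamped, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Interpolate a low-mid-high spectrum into `bins` colors and look them up.
spectrum <- colorRampPalette(c("deepskyblue1", "gray", "red"))(bins)
cols     <- spectrum[idx]
```

Values beyond the limit end up in the outermost bins, so a smaller limit saturates the colors sooner.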

both.dirs

A list of two logical elements with "gene" and "cpd" as the names. This argument specifies whether gene.data and cpd.data are 1 directional or 2 directional data when converting them to pseudo colors. Default both.dirs=list(gene=TRUE, cpd=TRUE).

trans.fun

A list of two function (not character) elements with "gene" and "cpd" as the names. This argument specifies whether and how gene.data and cpd.data are transformed. Examples are log, abs or users' own functions. Default trans.fun=list(gene=NULL, cpd=NULL).

low

A list of two colors with "gene" and "cpd" as the names, giving the color at the low end of the spectrum. See Details.

mid

A list of two colors with "gene" and "cpd" as the names, giving the color at the middle of the spectrum. See Details.

high

A list of two colors with "gene" and "cpd" as the names, giving the color at the high end of the spectrum. See Details.

na.col

Color used for NA's or missing values in gene.data and cpd.data. Default na.col="transparent".

...

Extra arguments passed to keggview.native or keggview.graph function.

Details

The function KeggPathwayView is a revised version of the pathview function in the pathview package. KeggPathwayView maps and renders user data on relevant pathway graphs. It is a stand-alone program for pathway-based data integration and visualization, and it also integrates with pathway and functional analysis tools for large-scale and fully automated analysis. KeggPathwayView provides strong support for data integration. It works with: 1) essentially all types of biological data mappable to pathways, 2) over 10 types of gene or protein IDs, and 20 types of compound or metabolite IDs, 3) pathways for over 2000 species as well as KEGG orthology, 4) various data attributes and formats, i.e. continuous/discrete data, matrices/vectors, single/multiple samples etc. To see mappable external gene/protein IDs, do: data(gene.idtype.list). To see mappable external compound-related IDs, do: data(rn.list); names(rn.list). KeggPathwayView generates both native KEGG views and Graphviz views for pathways. Currently only KEGG pathways are implemented. Pathways from Reactome, NCI and other databases may be supported in the future.

The arguments low, mid, and high specify the color spectra used to code gene.data and cpd.data. When data are 1 directional (FALSE value in both.dirs), only mid and high are used to specify the color spectra. The default spectra (low-mid-high) "deepskyblue1"-"gray"-"red" and "blue"-"gray"-"yellow" are used for gene.data and cpd.data respectively. The values for low, mid and high can be given as color names ("red"), plot color indices (2, which is red in the classic R palette), or HTML-style RGB strings ("#FF0000" = red).
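The equivalence of the color notations can be checked with base R. The sketch below uses only grDevices functions (col2rgb, palette); the variable names are hypothetical, and the spectra are copied from Usage. Note that a numeric index is resolved against the active palette, and the default palette changed in R 4.0.0, so an index is less portable than a name or a hex code.

```r
# A color name and its HTML-style hex code resolve to the same RGB triple.
same.red <- all(col2rgb("red") == col2rgb("#FF0000"))

# A numeric index is looked up in the active palette, so check what 2 means
# in the current session before relying on it.
index.two <- palette()[2]

# Spectra in the list form the arguments expect (values copied from Usage).
low  <- list(gene = "deepskyblue1", cpd = "blue")
mid  <- list(gene = "gray", cpd = "gray")
high <- list(gene = "red", cpd = "yellow")
```

Using names or hex codes in low, mid and high avoids any dependence on the session's palette.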

Value

The result returned by KeggPathwayView is a named list corresponding to the input pathway IDs. Each element (one per pathway) is itself a named list with 2 elements ("plot.data.gene", "plot.data.cpd"). Each of these is a data.frame or NULL, depending on the corresponding input data gene.data and cpd.data. These data.frames record the plot data for mapped gene or compound nodes: rows are mapped genes/compounds, columns are:

kegg.names

Standard KEGG IDs/names for mapped nodes: Entrez Gene IDs or KEGG Compound Accessions.

labels

Node labels to be used when needed.

all.mapped

All molecule (gene or compound) IDs mapped to this node.

type

Node type; currently 4 types are supported: "gene", "enzyme", "compound" and "ortholog".

x

x coordinate in the original KEGG pathway graph.

y

y coordinate in the original KEGG pathway graph.

width

node width in the original KEGG pathway graph.

height

node height in the original KEGG pathway graph.

other columns

Columns of the mapped gene/compound data and the corresponding pseudo-color codes for individual samples.
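The nesting described above can be navigated with ordinary list indexing. The sketch below builds a hand-made stand-in with the documented shape so that it runs without downloading anything; every value in it is made up, and a real return value would also carry the per-sample data and color columns.

```r
# Mock return value with the documented structure (all values are made up).
node.df <- data.frame(
  kegg.names = c("1017", "1019"),
  labels     = c("geneA", "geneB"),
  all.mapped = c("1017", "1019"),
  type       = c("gene", "gene"),
  x = c(100, 200), y = c(50, 80),
  width = c(46, 46), height = c(17, 17),
  stringsAsFactors = FALSE
)
pv.out <- list(hsa04110 = list(plot.data.gene = node.df, plot.data.cpd = NULL))

# One element per pathway ID; each holds the gene and compound plot data.
gene.nodes <- pv.out[["hsa04110"]]$plot.data.gene
node.ids   <- gene.nodes$kegg.names

# plot.data.cpd is NULL here because the mock has no compound data.
no.cpd <- is.null(pv.out[["hsa04110"]]$plot.data.cpd)
```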

Author(s)

Wubing Zhang

See Also

pathview

Examples

#load data
data(gse16873.d)
data(demo.paths)
#KEGG view: gene data only
## Not run: 
i <- 1
# out.suffix is not a named argument in Usage; it is passed on via `...`
pv.out <- KeggPathwayView(gene.data = gse16873.d[, 1],
                          pathway.id = demo.paths$sel.paths[i],
                          species = "hsa", out.suffix = "gse16873",
                          kegg.native = TRUE)

## End(Not run)


[Package MAGeCKFlute version 1.0.1 Index]