gsva {GSVA} | R Documentation |
Estimates GSVA enrichment scores.
## S4 method for signature 'ExpressionSet,list'
gsva(expr, gset.idx.list, annotation,
     method=c("gsva", "ssgsea", "zscore", "plage"),
     kcdf=c("Gaussian", "Poisson", "none"),
     abs.ranking=FALSE, min.sz=1, max.sz=Inf,
     parallel.sz=0, parallel.type="SOCK", mx.diff=TRUE,
     tau=switch(method, gsva=1, ssgsea=0.25, NA),
     ssgsea.norm=TRUE, verbose=TRUE)

## S4 method for signature 'ExpressionSet,GeneSetCollection'
gsva(expr, gset.idx.list, annotation,
     method=c("gsva", "ssgsea", "zscore", "plage"),
     kcdf=c("Gaussian", "Poisson", "none"),
     abs.ranking=FALSE, min.sz=1, max.sz=Inf,
     parallel.sz=0, parallel.type="SOCK", mx.diff=TRUE,
     tau=switch(method, gsva=1, ssgsea=0.25, NA),
     ssgsea.norm=TRUE, verbose=TRUE)

## S4 method for signature 'matrix,GeneSetCollection'
gsva(expr, gset.idx.list, annotation,
     method=c("gsva", "ssgsea", "zscore", "plage"),
     kcdf=c("Gaussian", "Poisson", "none"),
     abs.ranking=FALSE, min.sz=1, max.sz=Inf,
     parallel.sz=0, parallel.type="SOCK", mx.diff=TRUE,
     tau=switch(method, gsva=1, ssgsea=0.25, NA),
     ssgsea.norm=TRUE, verbose=TRUE)

## S4 method for signature 'matrix,list'
gsva(expr, gset.idx.list, annotation,
     method=c("gsva", "ssgsea", "zscore", "plage"),
     kcdf=c("Gaussian", "Poisson", "none"),
     abs.ranking=FALSE, min.sz=1, max.sz=Inf,
     parallel.sz=0, parallel.type="SOCK", mx.diff=TRUE,
     tau=switch(method, gsva=1, ssgsea=0.25, NA),
     ssgsea.norm=TRUE, verbose=TRUE)
expr |
Gene expression data which can be given either as an |
gset.idx.list |
Gene sets provided either as a |
annotation |
In the case of calling |
method |
Method to employ in the estimation of gene-set enrichment scores per sample. By default
this is set to |
kcdf |
Character string denoting the kernel to use during the non-parametric estimation of the
cumulative distribution function of expression levels across samples when |
abs.ranking |
Flag used only when |
min.sz |
Minimum size of the resulting gene sets. |
max.sz |
Maximum size of the resulting gene sets. |
parallel.sz |
Number of processors to use when doing the calculations in parallel.
This requires to previously load either the |
parallel.type |
Type of cluster architecture when using |
mx.diff |
Offers two approaches to calculate the enrichment statistic (ES)
from the KS random walk statistic. |
tau |
Exponent defining the weight of the tail in the random walk performed by both the |
ssgsea.norm |
Logical, set to |
verbose |
Gives information about each calculation step. Default: |
GSVA assesses the relative enrichment of gene sets across samples using a non-parametric approach. Conceptually, GSVA transforms a p-gene by n-sample gene expression matrix into a g-geneset by n-sample pathway enrichment matrix. This facilitates many forms of statistical analysis in the 'space' of pathways rather than genes, providing a higher level of interpretability.
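As a rough illustration of this change of shape, the base-R sketch below collapses a p-gene by n-sample matrix into a g-geneset by n-sample matrix. It uses a simple combined z-score per gene set as a stand-in summary. This is not the GSVA statistic, and the names toy_expr, toy_sets and toy_scores are hypothetical, chosen only for this example.

```r
## Illustration only: a stand-in summary, NOT the GSVA enrichment statistic.
set.seed(1)
p <- 10  ## number of genes
n <- 6   ## number of samples

## hypothetical p-gene by n-sample expression matrix
toy_expr <- matrix(rnorm(p * n), nrow = p, ncol = n,
                   dimnames = list(paste0("g", 1:p), paste0("s", 1:n)))

## hypothetical gene sets (g = 3)
toy_sets <- list(set1 = paste0("g", 1:3),
                 set2 = paste0("g", 4:6),
                 set3 = paste0("g", 7:10))

## standardize each gene across samples
z <- t(scale(t(toy_expr)))

## one score per gene set and sample: sum of z-scores over sqrt(set size)
toy_scores <- t(sapply(toy_sets, function(genes) {
  colSums(z[genes, , drop = FALSE]) / sqrt(length(genes))
}))

dim(toy_expr)    ## p x n
dim(toy_scores)  ## g x n
```

The resulting matrix has one row per gene set and one column per sample, so downstream tools that expect a feature-by-sample matrix can be applied to it unchanged.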
The gsva() function first maps the identifiers in the gene sets to the identifiers in the input expression data, which yields a filtered collection of gene sets. This collection can be filtered further by requiring a minimum and/or maximum size for the gene sets whose GSVA enrichment scores are to be calculated, using the arguments min.sz and max.sz.
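The mapping and size filtering described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code that gsva() runs internally; expr_ids, gene_sets, min_sz and max_sz are hypothetical names introduced for the example.

```r
## Illustration only: hypothetical identifiers and thresholds.
expr_ids <- paste0("g", 1:10)  ## identifiers present in the expression data

gene_sets <- list(setA = c("g1", "g2", "g3", "x1"),
                  setB = c("g4", "x2", "x3"),
                  setC = paste0("g", 5:10))

min_sz <- 2  ## plays the role of min.sz
max_sz <- 5  ## plays the role of max.sz

## step 1: keep only the identifiers found in the expression data
mapped <- lapply(gene_sets, function(ids) intersect(ids, expr_ids))

## step 2: keep only the gene sets whose mapped size is within bounds
sizes <- lengths(mapped)
kept <- mapped[sizes >= min_sz & sizes <= max_sz]

sizes
names(kept)
```

Here setB drops out because only one of its identifiers maps to the data, and setC drops out because its six mapped genes exceed the upper bound.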
A gene-set by sample matrix of GSVA enrichment scores.
J. Guinney and R. Castelo
Barbie, D.A. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(7269):108-112, 2009.
Hänzelmann, S., Castelo, R. and Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14:7, 2013.
Lee, E. et al. Inferring pathway activity toward precise disease classification. PLoS Comp Biol, 4(11):e1000217, 2008.
Tomfohr, J. et al. Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics, 6:225, 2005.
filterGeneSets
computeGeneSetsOverlap
library(GSVA)
library(limma)

p <- 10            ## number of genes
n <- 30            ## number of samples
nGrp1 <- 15        ## number of samples in group 1
nGrp2 <- n - nGrp1 ## number of samples in group 2

## consider three disjoint gene sets
geneSets <- list(set1=paste("g", 1:3, sep=""),
                 set2=paste("g", 4:6, sep=""),
                 set3=paste("g", 7:10, sep=""))

## sample data from a normal distribution with mean 0 and st.dev. 1
y <- matrix(rnorm(n*p), nrow=p, ncol=n,
            dimnames=list(paste("g", 1:p, sep=""),
                          paste("s", 1:n, sep="")))

## genes in set1 are expressed at higher levels in samples 'nGrp1+1' to 'n'
y[geneSets$set1, (nGrp1+1):n] <- y[geneSets$set1, (nGrp1+1):n] + 2

## build design matrix
design <- cbind(sampleGroup1=1,
                sampleGroup2vs1=c(rep(0, nGrp1), rep(1, nGrp2)))

## fit linear model
fit <- lmFit(y, design)

## estimate moderated t-statistics
fit <- eBayes(fit)

## genes in set1 are differentially expressed
topTable(fit, coef="sampleGroup2vs1")

## estimate GSVA enrichment scores for the three sets
gsva_es <- gsva(y, geneSets, mx.diff=TRUE)

## fit the same linear model now to the GSVA enrichment scores
fit <- lmFit(gsva_es, design)

## estimate moderated t-statistics
fit <- eBayes(fit)

## set1 is differentially expressed
topTable(fit, coef="sampleGroup2vs1")