evalRelevance {GSEABenchmarkeR} | R Documentation |
This function evaluates gene set rankings obtained from applying enrichment methods to multiple datasets, where each dataset investigates a certain phenotype such as a disease. Given pre-defined phenotype relevance scores for the gene sets, which indicate how important a gene set is for the investigated phenotype (e.g. as judged by evidence from the literature), this makes it possible to assess whether enrichment methods produce gene set rankings in which phenotype-relevant gene sets accumulate at the top.
evalRelevance(ea.ranks, rel.ranks, data2pheno, perc = TRUE, top = 0, rand = FALSE)

compOpt(rel.ranks, gs.ids, data2pheno = NULL, top = 0)

compRand(rel.ranks, gs.ids, data2pheno = NULL, perm = 1000)
ea.ranks |
Enrichment analysis rankings. A list with an entry for each
enrichment method applied. Each entry is a list that stores for each
dataset analyzed the resulting gene set ranking obtained from applying the
respective method to the respective dataset. Resulting gene set rankings
are assumed to be of class |
rel.ranks |
Relevance score rankings. A list with an entry for each
phenotype investigated. Each entry should be a
|
data2pheno |
A named character vector where the names correspond to dataset IDs and the elements of the vector to the corresponding phenotypes investigated. |
perc |
Logical. Should observed scores be returned as-is or as a
*perc*entage of the respective optimal score? Percentages of the optimal
score are typically easier to interpret and are comparable between datasets
/ phenotypes. Defaults to TRUE. |
top |
Integer. If |
rand |
Logical. Should gene set rankings be randomized to assess how
likely it is to observe a score equal to or greater than the respective
obtained score? Defaults to FALSE. |
gs.ids |
Character vector of gene set IDs on which enrichment analysis has been carried out. |
perm |
Integer. Number of permutations if |
The function evalRelevance evaluates the similarity of a gene set ranking
obtained from enrichment analysis and a gene set ranking based on phenotype
relevance scores. To this end, the function first transforms the ranks 'r'
from the enrichment analysis to weights 'w' in [0,1] via w = 1 - r / N,
where 'N' denotes the total number of gene sets on which the enrichment
analysis has been carried out. These weights are then multiplied by the
corresponding relevance scores and summed up.
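
The transformation and weighted sum described above can be sketched in base R. This is a toy illustration under stated assumptions, not the package's implementation: the gene set IDs, ranks, and relevance scores are made up, and gene sets without a relevance score are assumed to contribute nothing to the sum.

```r
# Hypothetical toy data: ranks of 4 gene sets from an enrichment analysis
# (1 = top-ranked); N is the total number of gene sets analyzed.
ea.rank <- c(gs1 = 1, gs2 = 2, gs3 = 3, gs4 = 4)
N <- length(ea.rank)

# Hypothetical relevance scores for the phenotype (higher = more relevant);
# gs3 has no relevance score and is therefore absent.
rel.score <- c(gs2 = 10, gs1 = 5, gs4 = 1)

# Transform ranks to weights in [0,1] via w = 1 - r / N
w <- 1 - ea.rank / N

# Multiply weights by the matching relevance scores and sum up
common <- intersect(names(w), names(rel.score))
obs.score <- sum(w[common] * rel.score[common])
obs.score
```

Top-ranked gene sets receive weights close to 1 and the last-ranked gene set receives weight 0, so a ranking scores high when gene sets with large relevance scores sit near the top.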
The function compOpt applies evalRelevance to the theoretically
optimal case in which the enrichment analysis ranking is identical to the
relevance score ranking. The ratio between observed and optimal score is
useful for comparing observed scores between datasets / phenotypes.
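
As a rough sketch of this idea in base R, the optimal score can be obtained by scoring a ranking that lists gene sets in order of decreasing relevance. The helper `score_ranking` and all data below are hypothetical and only mirror the description above; they are not taken from the package source.

```r
# Hypothetical helper: weighted relevance score sum for a ranking.
# `ranked.ids` are gene set IDs ordered from best to worst rank.
score_ranking <- function(ranked.ids, rel.score)
{
    N <- length(ranked.ids)
    w <- 1 - seq_len(N) / N              # weights from ranks 1..N
    names(w) <- ranked.ids
    common <- intersect(ranked.ids, names(rel.score))
    sum(w[common] * rel.score[common])
}

rel.score <- c(gs2 = 10, gs1 = 5, gs4 = 1)   # toy relevance scores
gs.ids <- c("gs1", "gs2", "gs3", "gs4")      # all gene sets analyzed

# Observed ranking from a (hypothetical) enrichment method
obs <- score_ranking(gs.ids, rel.score)

# Optimal ranking: gene sets sorted by decreasing relevance score,
# followed by the gene sets without a relevance score
opt.order <- names(sort(rel.score, decreasing = TRUE))
opt.order <- c(opt.order, setdiff(gs.ids, opt.order))
opt <- score_ranking(opt.order, rel.score)

# Observed score as a percentage of the optimal score
perc.score <- obs / opt * 100
perc.score
```

Expressing the observed score relative to the optimum removes the dependence on how many gene sets carry relevance scores and how large those scores are, which differs between phenotypes.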
The function compRand repeatedly applies evalRelevance to randomly
drawn gene set rankings to assess how likely it is to observe a score equal
to or greater than the one obtained.
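
The randomization can be illustrated with a small base R sketch. The helper `score_ranking`, the toy data, and the use of a simple proportion as the estimate are assumptions of this example and do not necessarily reflect how compRand is implemented.

```r
set.seed(42)  # make the toy example reproducible

# Hypothetical helper: weighted relevance score sum for a ranking.
# `ranked.ids` are gene set IDs ordered from best to worst rank.
score_ranking <- function(ranked.ids, rel.score)
{
    N <- length(ranked.ids)
    w <- 1 - seq_len(N) / N              # weights from ranks 1..N
    names(w) <- ranked.ids
    common <- intersect(ranked.ids, names(rel.score))
    sum(w[common] * rel.score[common])
}

rel.score <- c(gs2 = 10, gs1 = 5, gs4 = 1)   # toy relevance scores
gs.ids <- c("gs1", "gs2", "gs3", "gs4")      # all gene sets analyzed

# Score of the observed (hypothetical) ranking
obs <- score_ranking(gs.ids, rel.score)

# Scores of randomly drawn rankings of the same gene sets
perm <- 1000
rand.scores <- replicate(perm, score_ranking(sample(gs.ids), rel.score))

# Fraction of random rankings scoring at least as high as the observed one
p.rand <- mean(rand.scores >= obs)
p.rand
```

A small fraction suggests that the observed ranking places relevant gene sets near the top more often than expected by chance.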
A numeric matrix (rows = datasets, columns = methods) storing in each cell the relevance score sum obtained from applying the respective method to the respective dataset.
Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>
runEA to apply enrichment methods to multiple datasets;
readResults to read saved rankings as an input for the eval-functions.
#
# (1) simulated setup: 1 enrichment method applied to 1 dataset
#

# simulate gene set ranking
ea.ranks <- EnrichmentBrowser::makeExampleData("ea.res")
ea.ranks <- EnrichmentBrowser::gsRanking(ea.ranks, signif.only=FALSE)

# simulated relevance score ranking
rel.ranks <- ea.ranks
rel.ranks[,2] <- runif(nrow(ea.ranks), min=1, max=100)
colnames(rel.ranks)[2] <- "REL.SCORE"
rownames(rel.ranks) <- rel.ranks[,"GENE.SET"]
ind <- order(rel.ranks[,"REL.SCORE"], decreasing=TRUE)
rel.ranks <- rel.ranks[ind,]

# evaluate
evalRelevance(ea.ranks, rel.ranks)
compOpt(rel.ranks, ea.ranks[,"GENE.SET"])
compRand(rel.ranks, ea.ranks[,"GENE.SET"], perm=3)

#
# (2) simulated setup: 2 methods & 2 datasets
#
methods <- paste0("m", 1:2)
data.ids <- paste0("d", 1:2)

# simulate gene set rankings
ea.ranks <- sapply(methods, function(m)
    sapply(data.ids, function(d)
    {
        r <- EnrichmentBrowser::makeExampleData("ea.res")
        r <- EnrichmentBrowser::gsRanking(r, signif.only=FALSE)
        return(r)
    }, simplify=FALSE),
    simplify=FALSE)

# simulate a mapping from datasets to disease codes
d2d <- c("ALZ", "BRCA")
names(d2d) <- data.ids

# simulate relevance score rankings
rel.ranks <- lapply(ea.ranks[[1]], function(rr)
{
    rr[,2] <- runif(nrow(rr), min=1, max=100)
    colnames(rr)[2] <- "REL.SCORE"
    rownames(rr) <- rr[,"GENE.SET"]
    ind <- order(rr[,"REL.SCORE"], decreasing=TRUE)
    rr <- rr[ind,]
    return(rr)
})
names(rel.ranks) <- unname(d2d)

# evaluate
evalRelevance(ea.ranks, rel.ranks, d2d)