evalRelevance {GSEABenchmarkeR}                              R Documentation

Evaluating phenotype relevance of gene set rankings

Description

This function evaluates gene set rankings obtained from applying enrichment methods to multiple datasets, where each dataset investigates a certain phenotype such as a disease. Given pre-defined phenotype relevance scores for the gene sets, indicating how important a gene set is for the investigated phenotype (as judged, e.g., by evidence from the literature), this allows assessing whether enrichment methods produce gene set rankings in which phenotype-relevant gene sets accumulate at the top.

Usage

evalRelevance(ea.ranks, rel.ranks, data2pheno, perc = TRUE, top = 0,
  rand = FALSE)

compOpt(rel.ranks, gs.ids, data2pheno = NULL, top = 0)

compRand(rel.ranks, gs.ids, data2pheno = NULL, perm = 1000)

Arguments

ea.ranks

Enrichment analysis rankings. A list with an entry for each enrichment method applied. Each entry is a list that stores for each dataset analyzed the resulting gene set ranking obtained from applying the respective method to the respective dataset. Resulting gene set rankings are assumed to be of class DataFrame in which gene sets (required column named GENE.SET) are ranked according to a ranking measure such as a gene set p-value (required column named P.VALUE). See gsRanking for an example.
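
A minimal sketch of the expected nesting, assuming two hypothetical method labels and dataset IDs (illustrative only, not output of the package):

    gsr <- S4Vectors::DataFrame(GENE.SET = c("GO:0001", "GO:0002"),
                                P.VALUE = c(0.01, 0.2))
    ea.ranks <- list(ora  = list(GSE1 = gsr, GSE2 = gsr),
                     gsea = list(GSE1 = gsr, GSE2 = gsr))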

rel.ranks

Relevance score rankings. A list with an entry for each phenotype investigated. Each entry should be a DataFrame in which gene sets (rownames are assumed to be gene set IDs) are ranked according to a phenotype relevance score (required column REL.SCORE).
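
A minimal sketch of such a ranking for two hypothetical phenotype codes (illustrative only):

    rr <- S4Vectors::DataFrame(REL.SCORE = c(90, 40, 5))
    rownames(rr) <- c("GO:0001", "GO:0002", "GO:0003")
    rel.ranks <- list(ALZ = rr, BRCA = rr)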

data2pheno

A named character vector where the names correspond to dataset IDs and the elements of the vector to the corresponding phenotypes investigated.
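
For example (hypothetical dataset IDs and phenotype codes):

    data2pheno <- c(GSE1 = "ALZ", GSE2 = "BRCA")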

perc

Logical. Should observed scores be returned as-is or as a *perc*entage of the respective optimal score? Percentages of the optimal score are typically easier to interpret and are comparable between datasets / phenotypes. Defaults to TRUE.

top

Integer. If top is non-zero, the evaluation will be restricted to the first top gene sets of each enrichment analysis ranking. Defaults to 0, in which case the full ranking is evaluated.

rand

Logical. Should gene set rankings be randomized to assess how likely it is to observe a score equal to or greater than the respective observed score? Defaults to FALSE.

gs.ids

Character vector of gene set IDs on which enrichment analysis has been carried out.

perm

Integer. Number of permutations if rand is set to TRUE.

Details

The function evalRelevance evaluates the similarity of a gene set ranking obtained from enrichment analysis and a gene set ranking based on phenotype relevance scores. To this end, the function first transforms the ranks 'r' from the enrichment analysis to weights 'w' in [0,1] via w = 1 - r / N, where 'N' denotes the total number of gene sets on which the enrichment analysis has been carried out. These weights are then multiplied by the corresponding relevance scores and summed up.
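
A minimal sketch of this scoring scheme in plain R (illustrative only; the object names are hypothetical and this is not the package implementation):

    ranks <- 1:5                          # ranks from the enrichment analysis
    rel.scores <- c(80, 10, 60, 5, 20)    # relevance scores of the same gene sets
    N <- length(ranks)                    # total number of gene sets
    w <- 1 - ranks / N                    # transform ranks to weights in [0,1]
    obs.score <- sum(w * rel.scores)      # weighted relevance score sum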

The function compOpt applies evalRelevance to the theoretically optimal case in which the enrichment analysis ranking is identical to the relevance score ranking. The ratio between observed and optimal score is useful for comparing observed scores between datasets / phenotypes.
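
Continuing the sketch above, the optimal score corresponds to ranking the gene sets by their relevance scores (illustrative only):

    opt.ranks <- rank(-rel.scores)                      # ranking identical to the relevance ranking
    opt.score <- sum((1 - opt.ranks / N) * rel.scores)  # optimal weighted score sum
    100 * obs.score / opt.score                         # observed score as percentage of the optimum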

The function compRand repeatedly applies evalRelevance to randomly drawn gene set rankings to assess how likely it is to observe a score equal to or greater than the one obtained.
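
A corresponding permutation baseline, again continuing the sketch above (illustrative only):

    perm.scores <- replicate(1000,
        sum((1 - sample(ranks) / N) * rel.scores))      # scores of randomly permuted rankings
    mean(perm.scores >= obs.score)                      # fraction of random scores >= observed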

Value

A numeric matrix (rows = datasets, columns = methods) storing in each cell the relevance score sum obtained from applying the respective method to the respective dataset.

Author(s)

Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>

See Also

runEA to apply enrichment methods to multiple datasets; readResults to read saved rankings as input for the eval functions.

Examples


    #
    # (1) simulated setup: 1 enrichment method applied to 1 dataset
    #  

    # simulate gene set ranking
    ea.ranks <- EnrichmentBrowser::makeExampleData("ea.res")
    ea.ranks <- EnrichmentBrowser::gsRanking(ea.ranks, signif.only=FALSE)

    # simulated relevance score ranking
    rel.ranks <- ea.ranks
    rel.ranks[,2] <- runif(nrow(ea.ranks), min=1, max=100)
    colnames(rel.ranks)[2] <- "REL.SCORE"
    rownames(rel.ranks) <- rel.ranks[,"GENE.SET"]
    ind <- order(rel.ranks[,"REL.SCORE"], decreasing=TRUE)
    rel.ranks <- rel.ranks[ind,]

    # evaluate
    evalRelevance(ea.ranks, rel.ranks)    
    compOpt(rel.ranks, ea.ranks[,"GENE.SET"])
    compRand(rel.ranks, ea.ranks[,"GENE.SET"], perm=3)

    # 
    # (2) simulated setup: 2 methods & 2 datasets
    #
       
    methods <- paste0("m", 1:2)
    data.ids <- paste0("d", 1:2)

    # simulate gene set rankings
    ea.ranks <- sapply(methods, function(m) 
            sapply(data.ids, 
                function(d)
                {
                    r <- EnrichmentBrowser::makeExampleData("ea.res") 
                    r <- EnrichmentBrowser::gsRanking(r, signif.only=FALSE)
                    return(r)
                }, simplify=FALSE),
                simplify=FALSE)

    # simulate a mapping from datasets to disease codes
    d2d <- c("ALZ", "BRCA")
    names(d2d) <- data.ids

    # simulate relevance score rankings
    rel.ranks <- lapply(ea.ranks[[1]],
        function(rr)
        {
            rr[,2] <- runif(nrow(rr), min=1, max=100)
            colnames(rr)[2] <- "REL.SCORE"
            rownames(rr) <- rr[,"GENE.SET"]
            ind <- order(rr[,"REL.SCORE"], decreasing=TRUE)
            rr <- rr[ind,]
            return(rr)
        })
    names(rel.ranks) <- unname(d2d)

    # evaluate
    evalRelevance(ea.ranks, rel.ranks, d2d)


[Package GSEABenchmarkeR version 1.2.1]