empiricalFDR {csaw}    R Documentation

Control the empirical FDR

Description

Control the empirical FDR across clusters for comparisons to negative controls, based on tests that are significant in the wrong direction.

Usage

empiricalFDR(ids, tab, weight=NULL, pval.col=NULL, fc.col=NULL, neg.down=TRUE) 

Arguments

ids

an integer vector containing the cluster ID for each test

tab

a data frame of results with a PValue field and at least one logFC field for each test

weight

a numeric vector of weights for each test; defaults to 1 for every test

pval.col

an integer scalar or string specifying the column of tab containing the p-values

fc.col

an integer scalar or string specifying the column of tab containing the log-fold changes

neg.down

a logical scalar indicating if negative log-fold changes correspond to the “wrong” direction

Details

Some experiments involve comparisons to negative controls where there should be no signal/binding. In such cases, genuine differences should only occur in one direction, i.e., up in the non-control samples. Thus, the number of significant tests that change in the wrong direction can be used as an estimate of the number of false positives.

This function converts two-sided p-values in tab[,pval.col] to one-sided counterparts in the wrong direction. It combines the one-sided p-values for each cluster using combineTests. The number of significant clusters at some p-value threshold represents the estimated number of false positive clusters.
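The conversion described above can be sketched in base R. This is an illustration, not the csaw source: the helper name to_one_sided is hypothetical, and it assumes the usual relationship between a two-sided p-value and its one-sided counterparts when the test statistic is symmetric under the null.

```r
# Hypothetical helper (not part of csaw): convert two-sided p-values into
# one-sided p-values for a change in the requested direction.
to_one_sided <- function(p, logfc, direction=c("down", "up")) {
    direction <- match.arg(direction)
    agrees <- if (direction == "down") logfc < 0 else logfc > 0
    # Half the two-sided p-value if the observed change agrees with the
    # requested direction, and one minus that half otherwise.
    ifelse(agrees, p / 2, 1 - p / 2)
}

p <- c(0.01, 0.01, 0.5)
logfc <- c(-2, 2, -0.1)
wrong <- to_one_sided(p, logfc, "down")  # the "wrong" direction if neg.down=TRUE
right <- to_one_sided(p, logfc, "up")    # the "right" direction
```

For a non-zero log-fold change, the two one-sided p-values of a test sum to one, so a test that looks strongly significant in one direction is never significant in the other.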

The same approach is applied for one-sided p-values in the right direction, where the number of detected clusters at the threshold represents the total number of discoveries. Dividing the number of false positives by the number of discoveries yields the empirical FDR at each p-value threshold. Monotonicity is enforced (i.e., the empirical FDR only decreases with decreasing p-value), as is the constraint that the empirical FDR cannot exceed unity.
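A minimal sketch of this calculation, assuming that combined one-sided p-values are already available for each cluster in both directions. The helper name empirical_fdr and the toy p-values are hypothetical; the actual function may handle ties and ordering differently.

```r
# Hypothetical helper (not part of csaw): empirical FDR from per-cluster
# combined p-values in the right and wrong directions.
empirical_fdr <- function(p.right, p.wrong) {
    # Treat each right-direction p-value as a threshold.
    n.false <- vapply(p.right, function(thresh) sum(p.wrong <= thresh), integer(1))
    n.found <- vapply(p.right, function(thresh) sum(p.right <= thresh), integer(1))
    # Estimated false positives over discoveries, capped at 1.
    fdr <- pmin(1, n.false / n.found)
    # Enforce monotonicity: a smaller p-value never gets a larger FDR.
    o <- order(p.right)
    fdr[o] <- rev(cummin(rev(fdr[o])))
    fdr
}

p.right <- c(0.01, 0.02, 0.03)
p.wrong <- c(0.015, 0.5, 0.6)
fdr <- empirical_fdr(p.right, p.wrong)
```

In this toy input the raw ratios are 0, 1/2 and 1/3. The monotonicity step lowers the second value to 1/3, because relaxing the threshold to 0.03 yields a lower estimated FDR while still including that cluster.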

The p-values specified in pval.col are assumed to be originally computed from some two-sided test, where the direction of change is independent of the magnitude of the p-value under the null hypothesis. This rules out p-values computed from, e.g., ANODEV where multiple contrasts are tested at once.
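The independence assumption can be checked informally by simulation. The sketch below assumes a simple two-sided z-test, chosen purely for illustration: under the null, the sign of the statistic should be split roughly evenly regardless of how small the p-value is.

```r
set.seed(1)
z <- rnorm(10000)         # null test statistics, symmetric about zero
p <- 2 * pnorm(-abs(z))   # two-sided p-values
negative <- z < 0         # direction of change for each test

# Proportion of negative changes among small and large p-values.
prop.small <- mean(negative[p < 0.05])
prop.large <- mean(negative[p >= 0.05])
```

Both proportions should be close to one half. A test that combines several contrasts has no single direction of change, so this property does not hold for its p-values.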

Control of the empirical FDR is best used for very noisy data sets where the BH method is not adequate. While the BH method protects against statistical false positives, the empirical FDR also protects against experimental false positives, e.g., due to non-specific binding. The downside is that the empirical FDR calculation relies on the availability of a good estimate of the number of false positives. The BH method in combineTests is more statistically rigorous and should be preferred for routine analyses.

Value

A data frame containing one row per cluster, with various fields:

All other fields are the same as those returned by combineTests. The exception is the direction field, which is not returned as it is not informative for one-sided tests.

Author(s)

Aaron Lun

References

Zhang Y, Liu T, Meyer CA et al. (2008). Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 9, R137.

See Also

combineTests

Examples

library(csaw)
ids <- round(runif(100, 1, 10))  # cluster ID for each of 100 tests
tab <- data.frame(logFC=rnorm(100), logCPM=rnorm(100), PValue=rbeta(100, 1, 2))
empirical <- empiricalFDR(ids, tab)  # one row per cluster
head(empirical)

[Package csaw version 1.14.1 Index]