The nullranges package contains functions for generating feature sets (genomic regions) to explore the null hypothesis of overlap or colocalization between two observed feature sets.
The package has two branches of functionality for generating null feature sets: matching and bootstrapping. The decision about which approach to use is ultimately up to the bioinformatics analyst. Here we briefly describe the two approaches. For a listing of all the vignettes in the package, one can type:
vignette(package="nullranges")
Suppose we want to examine the significance of the overlap between two sets of genomic features, \(x\) and \(y\). To test it, we calculate the overlap expected under the null by generating a null feature set \(y'\) (potentially many times). The null features in \(y'\) may be characterized in one of two ways, corresponding to the two branches above: matching or bootstrapping.
We provide a number of vignettes to describe the different matching and bootstrapping use cases. In the matching case, the implemented options include nearest-neighbor matching and rejection-sampling-based matching. In the bootstrapping case, the implemented options include bootstrapping across or within chromosomes, and bootstrapping only within states of a segmented genome. We also provide a function to segment the genome by density of features. For example, supposing that \(x\) is a subset of genes, we may want to generate \(y'\) from \(y\) such that features are re-sampled in blocks from segments across the genome with similar gene density. In both cases, we provide a number of functions for performing quality control via visual inspection of diagnostic plots.
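To make the idea of a null overlap distribution concrete, the following is a minimal toy sketch in base R. It does not call any nullranges functions and is not the package's implementation. It assumes a single chromosome, reduces features to single positions, and uses hypothetical names throughout (`genome_len`, `block_len`, `count_near`, `block_boot`). The empirical p-value with a `+1` correction is one common convention, chosen here only for illustration.

```r
# Toy illustration in base R only; all names and values here are hypothetical.
set.seed(42)

genome_len <- 100000L   # assumed length of a single toy chromosome
block_len  <- 5000L     # assumed bootstrap block length
n_boot     <- 200L      # number of null feature sets y' to generate

# Features are simplified to single positions.
x <- sort(sample.int(genome_len, 300L))
y <- sort(sample.int(genome_len, 300L))

# Overlap statistic: number of features in `a` that have a feature
# of `b` within `maxgap` positions.
count_near <- function(a, b, maxgap = 50L) {
  sum(vapply(a, function(p) any(abs(b - p) <= maxgap), logical(1)))
}

# One block-bootstrap replicate: tile the genome with blocks drawn with
# replacement from random source locations, carrying along the features
# found in each source block so that local clustering is preserved.
block_boot <- function(y, genome_len, block_len) {
  n_blocks <- ceiling(genome_len / block_len)
  src  <- sample.int(genome_len - block_len + 1L, n_blocks, replace = TRUE)
  dest <- (seq_len(n_blocks) - 1L) * block_len  # 0-based destination offsets
  pos <- unlist(lapply(seq_len(n_blocks), function(i) {
    in_block <- y[y >= src[i] & y < src[i] + block_len]
    in_block - src[i] + dest[i] + 1L
  }))
  sort(pos[pos <= genome_len])
}

# Observed overlap and its null distribution over n_boot replicates of y'.
obs  <- count_near(x, y)
null <- vapply(seq_len(n_boot),
               function(r) count_near(x, block_boot(y, genome_len, block_len)),
               integer(1))

# Empirical one-sided p-value (with +1 correction).
p_emp <- (1 + sum(null >= obs)) / (n_boot + 1)
```

Because `x` and `y` are drawn independently here, `obs` should typically fall within the bulk of `null`. In a real analysis, the package's bootstrapping or matching functions would generate \(y'\) from genomic ranges rather than from point positions.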
Finally, we recommend incorporating a list of regions where artificial features should not be placed, including the ENCODE Exclusion List (Amemiya, Kundaje, and Boyle 2019). This and other excluded ranges are made available in the excluderanges Bioconductor package by Mikhail Dozmorov. Use of excluded ranges is demonstrated in the segmented block bootstrap vignette.