BadRegionFinder-package {BadRegionFinder} | R Documentation |
BadRegionFinder is a package for identifying regions with bad, acceptable and good coverage in sequence alignment data available as BAM files. Either the whole genome or a set of target regions may be analysed. Various visual and textual types of output are available.
Package: | BadRegionFinder |
Type: | Package |
Title: | BadRegionFinder: an R/Bioconductor package for identifying regions with bad coverage |
Version: | 1.16.0 |
Date: | 2016-03-07 |
Author: | Sarah Sandmann |
Maintainer: | Sarah Sandmann <sarah.sandmann@uni-muenster.de> |
Description: | BadRegionFinder is a package for identifying regions with a bad, acceptable and good coverage in sequence alignment data available as bam files. The whole genome may be considered as well as a set of target regions. Various visual and textual types of output are available. |
License: | LGPL-3 |
Imports: | VariantAnnotation, Rsamtools, biomaRt, GenomicRanges, S4Vectors, utils, stats, grDevices, graphics |
Suggests: | BSgenome.Hsapiens.UCSC.hg19 |
biocViews: | Coverage, Sequencing, Alignment, WholeGenome, Classification |
NeedsCompilation: | no |
git_url: | https://git.bioconductor.org/packages/BadRegionFinder |
git_branch: | RELEASE_3_11 |
git_last_commit: | 026495b |
git_last_commit_date: | 2020-04-27 |
Date/Publication: | 2020-04-27 |
In targeted sequencing, it is essential to design the primer set so that the targeted regions are sequenced with sufficient coverage. Yet, due to e.g. high GC content, the intended coverage may not always be obtained. A tool that performs a detailed coverage analysis comparing many samples at a time, rather than considering each available sample individually, is therefore useful. Furthermore, with regard to reads mapping off target, it is helpful to have a tool for investigating regions that show relatively high coverage but were not originally targeted.
BadRegionFinder classifies a selection of regions or the whole genome into the user-definable categories of bad, acceptable and good coverage in any sequence alignment data available as BAM files. Various visual and textual types of output are available, including detailed output files that consider every base that is or should be covered and an overview file that considers the coverage of the different genes that were targeted.
Index of help topics:
BadRegionFinder-package       BadRegionFinder: an R/Bioconductor package for identifying regions with bad coverage
determineCoverage             Determines the coverage (recommended for whole-genome analyses)
determineCoverageQuality      Classifies the determined coverage
determineQuantiles            Determines basewise user-defined quantiles
determineRegionsOfInterest    Determines the regions of interest
plotDetailed                  Plots a more detailed overview of the coverage quality
plotSummary                   Plots a summary of the coverage quality
plotSummaryGenes              Plots a summary of the coverage quality concerning the genes only
reportBadRegionsDetailed      Gives a detailed report on the coverage quality
reportBadRegionsGenes         Sums up the coverage quality on a gene basis
reportBadRegionsSummary       Sums up the coverage quality
The package contains a function that performs the coverage determination, determineCoverage (with a switch for whole-genome and target-region analyses). The actual classification of the coverage is performed by the function determineCoverageQuality. If any subsets of regions are of interest, these may be selected with the function determineRegionsOfInterest.
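To make the roles of the two thresholds and two percentages more concrete, the following base-R sketch classifies each base from a positions-by-samples coverage matrix. It is only an illustration and not the package's implementation: the helper classify_coverage, the toy matrix, and the exact rule (good if at least percentage2 of the samples reach threshold2, acceptable if at least percentage1 reach threshold1, bad otherwise) are assumptions made here. See the help page of determineCoverageQuality for the authoritative definition.

```r
# Hypothetical helper, NOT part of BadRegionFinder. The classification rule
# below is an assumption for illustration and presumes threshold1 <= threshold2.
classify_coverage <- function(cov, threshold1, threshold2,
                              percentage1, percentage2) {
  # Fraction of samples reaching each threshold at every position
  frac1 <- rowMeans(cov >= threshold1)
  frac2 <- rowMeans(cov >= threshold2)
  quality <- ifelse(frac2 >= percentage2, "good",
                    ifelse(frac1 >= percentage1, "acceptable", "bad"))
  factor(quality, levels = c("bad", "acceptable", "good"))
}

# Toy coverage values: three positions (rows), five samples (columns)
cov <- matrix(c(  5,  10,   8,  12,   3,
                 25,  30,  40,  22,  60,
                150, 120, 110, 200, 130),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("pos", 1:3), paste0("sample", 1:5)))

quality <- classify_coverage(cov, threshold1 = 20, threshold2 = 100,
                             percentage1 = 0.80, percentage2 = 0.90)
print(quality)
```

Working on a matrix keeps the sketch vectorised over positions. The package itself reads coverage from BAM files and also tracks whether a region is on or off target, which this sketch leaves out.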
There are three different forms of textual reports available: a summary variant (reportBadRegionsSummary), a detailed variant (reportBadRegionsDetailed) and a summary variant focussing on the coverage of the genes (reportBadRegionsGenes).
Furthermore, there exist three different forms of visual reports: a summary variant (plotSummary), a detailed variant (plotDetailed) and a summary variant visualizing the coverage of the genes as a barplot (plotSummaryGenes).
Additionally, BadRegionFinder may be used to determine user-definable, basewise quantiles over all samples at any position (determineQuantiles).
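As a rough illustration of what "basewise quantiles over all samples" means, the sketch below computes quantiles row by row on a positions-by-samples coverage matrix using base R only. The helper basewise_quantiles and the toy data are hypothetical; determineQuantiles itself takes the output of determineCoverage and may differ in its output format and details.

```r
# Hypothetical helper, NOT part of BadRegionFinder: one row per position,
# one column per requested quantile, computed across samples.
basewise_quantiles <- function(cov, probs) {
  q <- apply(cov, 1, stats::quantile, probs = probs, names = FALSE)
  # apply() yields a vector (one quantile) or a probs x positions matrix;
  # filling by row gives a positions x probs matrix in both cases.
  matrix(q, nrow = nrow(cov), ncol = length(probs), byrow = TRUE,
         dimnames = list(rownames(cov), paste0("q", probs)))
}

# Toy coverage values: three positions (rows), five samples (columns)
cov <- matrix(c(  5,  10,   8,  12,   3,
                 25,  30,  40,  22,  60,
                150, 120, 110, 200, 130),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("pos", 1:3), paste0("sample", 1:5)))

q <- basewise_quantiles(cov, probs = c(0.5))
print(q)
```

Passing several probabilities, e.g. probs = c(0.25, 0.5, 0.75), returns one column per quantile in the same order.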
Sarah Sandmann
Maintainer: Sarah Sandmann <sarah.sandmann@uni-muenster.de>
More information on the bam format can be found at: http://samtools.github.io/hts-specs/SAMv1.pdf
determineCoverage, determineCoverageQuality, determineRegionsOfInterest, reportBadRegionsSummary, reportBadRegionsDetailed, reportBadRegionsGenes, plotSummary, plotDetailed, plotSummaryGenes, determineQuantiles
library("BSgenome.Hsapiens.UCSC.hg19")

# Coverage thresholds and sample percentages defining the quality categories
threshold1 <- 20
threshold2 <- 100
percentage1 <- 0.80
percentage2 <- 0.90

# Sample names, BAM input folder, output folder and target regions
sample_file <- system.file("extdata", "SampleNames.txt",
                           package = "BadRegionFinder")
samples <- read.table(sample_file)
bam_input <- system.file("extdata", package = "BadRegionFinder")
output <- system.file("extdata", package = "BadRegionFinder")
target_regions <- system.file("extdata", "targetRegions.bed",
                              package = "BadRegionFinder")
targetRegions <- read.table(target_regions, header = FALSE,
                            stringsAsFactors = FALSE)

# Determine the coverage and classify it
coverage_summary <- determineCoverage(samples, bam_input, targetRegions,
                                      output, TRonly = FALSE)
coverage_indicators <- determineCoverageQuality(threshold1, threshold2,
                                                percentage1, percentage2,
                                                coverage_summary)

# Textual reports: summary, detailed and gene-based
badCoverageSummary <- reportBadRegionsSummary(threshold1, threshold2,
                                              percentage1, percentage2,
                                              coverage_indicators, "", output)
coverage_indicators_temp <- reportBadRegionsDetailed(threshold1, threshold2,
                                                     percentage1, percentage2,
                                                     coverage_indicators, "",
                                                     samples, output)
badCoverageOverview <- reportBadRegionsGenes(threshold1, threshold2,
                                             percentage1, percentage2,
                                             badCoverageSummary, output)

# Visual reports: summary, detailed and gene-based
plotSummary(threshold1, threshold2, percentage1, percentage2,
            badCoverageSummary, output)
plotDetailed(threshold1, threshold2, percentage1, percentage2,
             coverage_indicators_temp, output)
plotSummaryGenes(threshold1, threshold2, percentage1, percentage2,
                 badCoverageOverview, output)

# Basewise quantiles (here: the median) over all samples
quantiles <- c(0.5)
coverage_summary2 <- determineQuantiles(coverage_summary, quantiles, output)