preprocessIntervals {PureCN} | R Documentation |
Optimize intervals for copy number calling by tiling long intervals and by
including off-target regions. Uses scanFa from the Rsamtools package to
retrieve the GC content of intervals in a reference FASTA file. If mappability
or replication timing scores are provided, the intervals are annotated with them.
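To make the GC-content annotation concrete, the sketch below computes the GC fraction of a single sequence string in base R. It is only an illustration: `gc_fraction` is a hypothetical helper, not a PureCN or Rsamtools function, and ignoring non-ACGT characters such as N is an assumption made here, not documented behavior of preprocessIntervals.

```r
# Hypothetical helper (not part of PureCN): GC fraction of one sequence.
gc_fraction <- function(sequence) {
  # Split the sequence into single upper-case characters
  bases <- strsplit(toupper(sequence), "")[[1]]
  # Assumption: only unambiguous bases count toward the denominator
  bases <- bases[bases %in% c("A", "C", "G", "T")]
  if (length(bases) == 0) {
    return(NA_real_)
  }
  # Proportion of bases that are G or C
  mean(bases %in% c("G", "C"))
}

gc_example <- gc_fraction("ACGTGGCC")
```

In the real function, the sequences come from the reference FASTA file via scanFa, one per interval.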
preprocessIntervals(interval.file, reference.file, output.file = NULL,
  off.target = FALSE, average.target.width = 400,
  min.off.target.width = 20000, average.off.target.width = 2e+05,
  off.target.padding = -500, mappability = NULL,
  min.mappability = c(0.5, 0.1, 0.7), reptiming = NULL, exclude = NULL,
  off.target.seqlevels = c("targeted", "all"))
interval.file |
File specifying the intervals. Interval is expected in
first column in format CHR:START-END. Instead of a file, a |
reference.file |
Reference FASTA file. |
output.file |
Optionally, write GC content file. |
off.target |
Include off-target regions. |
average.target.width |
Split large targets to approximately this size. |
min.off.target.width |
Only include off-target regions of that size |
average.off.target.width |
Split off-target regions to that |
off.target.padding |
Pad off-target regions. |
mappability |
Annotate intervals with mappability scores. Scores are assumed to be on a
scale from 0 to 1, with each score being 1/(number of alignments). Expected as |
min.mappability |
|
reptiming |
Annotate intervals with replication timing score. Expected as
|
exclude |
Any target that overlaps with this |
off.target.seqlevels |
Controls how to deal with chromosomes/contigs
found in the |
Returns GC content by interval as a GRanges object.
Markus Riester
Talevich et al. (2016). CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol.
# Example files shipped with PureCN
reference.file <- system.file("extdata", "ex2_reference.fa",
    package = "PureCN", mustWork = TRUE)
interval.file <- system.file("extdata", "ex2_intervals.txt",
    package = "PureCN", mustWork = TRUE)
bed.file <- system.file("extdata", "ex2_intervals.bed",
    package = "PureCN", mustWork = TRUE)

# Intervals given as a file in CHR:START-END format
preprocessIntervals(interval.file, reference.file, output.file = "gc_file.txt")

# Intervals given as an object imported from a BED file
intervals <- rtracklayer::import(bed.file)
preprocessIntervals(intervals, reference.file, output.file = "gc_file.txt")
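
The effect of average.target.width (splitting large targets to approximately that size) can be pictured with a small base-R sketch. `tile_interval` is a hypothetical helper, not a PureCN function, and its rounding rule is an assumption; the splitting in preprocessIntervals itself may differ in detail.

```r
# Hypothetical helper (not part of PureCN): split one interval into
# roughly equal tiles whose width is close to average.width.
tile_interval <- function(start, end, average.width = 400) {
  width <- end - start + 1
  # Number of tiles; short intervals are kept as a single tile
  n <- max(1L, round(width / average.width))
  # Evenly spaced break points; end + 1 closes the last tile
  breaks <- round(seq(start, end + 1, length.out = n + 1))
  data.frame(start = head(breaks, -1), end = tail(breaks, -1) - 1)
}

# A 1200 bp target is split into tiles of about 400 bp each
tiles <- tile_interval(1001, 2200, average.width = 400)

# A 300 bp target is shorter than the requested width and is left whole
short <- tile_interval(1, 300, average.width = 400)
```

Tiles are contiguous and non-overlapping, so together they cover exactly the original interval.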