getDistrs {casper} R Documentation

Compute fragment start and fragment length distributions

Description

Compute fragment start distributions by using reads aligned to genes with only one annotated variant. Estimate fragment length distribution using fragments aligned to long exons (>1000nt). Fragment length is defined as the distance between the start of the left-end read and the end of the right-end read.
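As a minimal illustration of the fragment length definition above, the sketch below computes lengths from the leftmost alignment positions of each read and its mate. It assumes ungapped reads of fixed length and 1-based inclusive coordinates; the helper fragLength and the toy positions are hypothetical and are not part of casper.

```r
# Hypothetical helper (not part of casper): fragment length for paired reads,
# assuming ungapped reads of fixed length and 1-based inclusive coordinates.
fragLength <- function(pos, mpos, readLength) {
  left  <- pmin(pos, mpos)                    # start of the left-end read
  right <- pmax(pos, mpos) + readLength - 1   # end of the right-end read
  right - left + 1
}

# Toy pairs: leftmost position of each read and of its mate
pos  <- c(100, 500, 1200)
mpos <- c(250, 420, 1390)
len  <- fragLength(pos, mpos, readLength = 75)
print(len)
```

Whether the end coordinate is counted inclusively is an assumption of this sketch; the package may use a different convention.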

Usage

getDistrs(DB, bam, pbam, islandid=NULL, verbose=FALSE, nreads=4*10^6,
readLength, min.gt.freq = NULL, tgroups=5, mc.cores=1)

Arguments

DB

Annotated genome. Object of class knownGenome as returned by procGenome.

bam

Aligned reads, as returned by scanBam. It must be a list with elements 'qname', 'rname', 'pos' and 'mpos'. Ignored when argument pbam is specified.

pbam

Processed BAM object of class procBam, as returned by function procBam. Arguments bam and readLength are ignored when pbam is specified.

islandid

IDs of the islands to be used in the read start distribution calculations (defaults to genes with only one annotated variant).

verbose

Set to TRUE to print progress information.

nreads

To speed up computations, only the first nreads are used to obtain the estimates. The default value of 4 million usually gives highly precise estimates.

readLength

Read length in bp, e.g. in a paired-end experiment where 75bp are sequenced on each end one would set readLength=75.

min.gt.freq

(Only for genomes with gene type information) Minimum relative frequency a gene type must have to be given its own class. The target distributions cannot be estimated precisely for very infrequent gene types, so gene types with relative frequency below min.gt.freq are merged, e.g. min.gt.freq=0.05 means that gene types each making up less than 5% of the genes in DB are combined and a single read start and length distribution is estimated for all of them.

tgroups

(Only for genomes with gene type information) Maximum number of distinct gene types to consider, as an alternative to min.gt.freq. A separate estimate is obtained for the tgroups gene types with highest frequency; all others are combined.

mc.cores

Number of cores to use for parallel processing.
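The collapsing of rare gene types controlled by min.gt.freq and tgroups can be sketched in base R as follows. This is only an illustration of the idea, not casper's internal code: the helper collapseTypes, the label "other", the toy gene types, and the rule that min.gt.freq takes precedence when both are given are all assumptions.

```r
# Hypothetical helper (not casper's internal code): collapse rare gene types.
collapseTypes <- function(types, min.gt.freq = NULL, tgroups = 5) {
  # Relative frequency of each gene type, most frequent first
  freq <- sort(prop.table(table(types)), decreasing = TRUE)
  keep <- if (!is.null(min.gt.freq)) {
    # Keep types whose relative frequency reaches the threshold
    names(freq)[as.vector(freq) >= min.gt.freq]
  } else {
    # Otherwise keep the tgroups most frequent types
    names(freq)[seq_len(min(tgroups, length(freq)))]
  }
  # All remaining types share a single combined class
  ifelse(types %in% keep, types, "other")
}

# Toy annotation: 100 genes of five types
types <- rep(c("protein_coding", "lincRNA", "pseudogene", "snoRNA", "miRNA"),
             times = c(70, 20, 6, 3, 1))
byFreq <- collapseTypes(types, min.gt.freq = 0.05)
byRank <- collapseTypes(types, tgroups = 2)
print(table(byFreq))
print(table(byRank))
```

In this toy setting one distribution would be estimated per retained type plus one for the combined class.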

Value

An object of class readDistrs with slots:

lenDis

Table with number of fragments with a given length

stDis

Cumulative distribution function (object of type closure) for relative start position
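To give a feel for the shape of these two slots, the sketch below builds toy stand-ins with base R: a table of counts per fragment length and a cumulative distribution function returned as a closure. The simulated data and the use of table and ecdf are assumptions made for illustration; casper may construct the actual slots differently.

```r
# Illustration only: toy objects with the same shapes as the two slots,
# built with base R rather than casper.
set.seed(1)
fragLen  <- sample(150:300, 1000, replace = TRUE)  # simulated fragment lengths
relStart <- rbeta(1000, 1, 1.5)                    # simulated relative starts in (0, 1)

lenTab   <- table(fragLen)   # number of fragments per length, like lenDis
startCdf <- ecdf(relStart)   # closure returning P(start <= x), like stDis

print(head(lenTab))
print(startCdf(c(0.25, 0.5, 0.75)))
```

Since readDistrs is described as having slots, a fitted object would presumably be inspected with the usual S4 operator, e.g. distrs@lenDis and distrs@stDis(0.5).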

Author(s)

Camille Stephan-Otto Attolini, David Rossell

Examples

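library(casper)

# Load the example reads and annotated genome
data(K562.r1l1)
data(hg19DB)
# Remove read pairs with short inserts (isizeMin=100)
bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)

# Estimate fragment start and fragment length distributions
distrs <- getDistrs(hg19DB, bam=bam0, readLength=75)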

#Fragment length distribution
plot(distrs,'fragLength')

#Fragment start distribution (relative to transcript length)
plot(distrs,'readSt')

[Package casper version 2.16.1]