computeRPKM {RIPSeeker} | R Documentation |
Given a list of single-end or paired-end read alignment files in BAM/SAM/BED format, compute the read counts and normalized read counts as expression of annotated transcript in the unit of "reads per kilobase of exon per million mapped reads" (RPKM).
computeRPKM(bamFiles, RIPSeekerRead = TRUE, paired = FALSE, countMode = "IntersectionNotEmpty", featureGRanges, idType = "ensembl_transcript_id", featureType = "exon", ignore.strand = FALSE, txDbName = "biomart", moreGeneInfo = FALSE, saveData, justRPKM = TRUE, ...)
bamFiles |
A list of one or more BAM/SAM/BED alignment files. |
RIPSeekerRead |
Binary flag. If TRUE, then import and process the alignment files using the built-in function |
paired |
Binary to indicate whether the alignments files are paired-end. The alignments file must be either paired-end or single-end but not both. |
countMode |
An argument used to set the |
featureGRanges |
GRanges of features as an optional argument for function to compute RPKM/FPKM just for those features without retrieving online annotations. |
idType |
A character string that specifies the type of the annotations, which can "ensembl_transcript_id", "ensembl_gene_id", "ucsc", etc. Refer to |
featureType |
Features that will be groupped by genes/transcripts in a GRangesList. The available options are "exon" (Default), "intron", "fiveUTR", "threeUTR", and "CDS" corresponding to the functions |
ignore.strand |
Whether to ignore strand when counting the reads (Default: FALSE). |
txDbName |
Name of the transcript database to use to retreive the annotation. The available options are "biomart" (Default) or "UCSC" corresponding to the functions |
moreGeneInfo |
Binary indicator to indicate whether to download more information for each genes/transcripts rather than having only the gene/transcript IDs (Default: FALSE). |
saveData |
Path of output file. |
justRPKM |
Binary for whether to return only the |
... |
Extra arguments passed to functions |
The function is a wrapper function making use of several external functions from several well maintained and freely available Bioconductor packages including GenomicFeatures, GenomicRanges, biomaRt and Rsamtools packages. The paired-end alignments are converted into single-end using function galp2gal
and then subject to read count computation by summarizeOverlaps
, which does not yet directly support paired-end alignments.
rpkmSEobject |
A |
rpkmDF |
Data frame with or without the detailed gene information columns depending on whether |
featureGRanges |
The features in GRanges object that are used to compute the gene expression. |
Also works for RNA-seq alignments.
Yue Li
M. Carlson, H. Pages, P. Aboyoun, S. Falcon, M. Morgan, D. Sarkar and M. Lawrence. GenomicFeatures: Tools for making and manipulating transcript centric annotations. R package version 1.8.2.
P. Aboyoun, H. Pages and M. Lawrence (). GenomicRanges: Representation and manipulation of genomic intervals. R package version 1.8.9.
Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Steffen Durinck, Paul T. Spellman, Ewan Birney and Wolfgang Huber, Nature Protocols 4, 1184-1191 (2009).
BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Steffen Durinck, Yves Moreau, Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis Brazma and Wolfgang Huber, Bioinformatics 21, 3439-3440 (2005).
Martin Morgan and Herv\'e Pag\'es (). Rsamtools: Binary alignment (BAM), variant call (BCF), or tabix file import. R package version 1.8.5. http://bioconductor.org/packages/release/bioc/html/Rsamtools.html
makeTxDbFromBiomart, makeTxDbFromUCSC, useMart, exonsBy, cdsBy, intronsByTranscript, fiveUTRsByTranscript, threeUTRsByTranscript, cdsBy, combineAlignGals, summarizeOverlaps, ScanBamParam, readGAlignmentPairs, readGAlignments
if(interactive()) { # need internet connection # Retrieve system files extdata.dir <- system.file("extdata", package="RIPSeeker") bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE) bamFiles <- grep("PRC2", bamFiles, value=TRUE) # use biomart txDbName <- "biomart" biomart <- "ENSEMBL_MART_ENSEMBL" # use archive to get ensembl 65 dataset <- "mmusculus_gene_ensembl" host <- "dec2011.archive.ensembl.org" # use ensembl 65 for annotation resultlist <- computeRPKM(bamFiles=grep(pattern="SRR039214", bamFiles, value=TRUE, invert=TRUE), #featureGRanges=featureGRanges, dataset=dataset, moreGeneInfo=TRUE, justRPKM=FALSE, idType="ensembl_transcript_id", txDbName=txDbName, biomart=biomart, host=host, by="tx") }