cpm {edgeR} | R Documentation |
Compute counts per million (CPM) or reads per kilobase per million (RPKM).
## S3 method for class 'DGEList'
cpm(y, normalized.lib.sizes = TRUE, log = FALSE, prior.count = 0.25, ...)
## Default S3 method:
cpm(y, lib.size = NULL, log = FALSE, prior.count = 0.25, ...)

## S3 method for class 'DGEList'
rpkm(y, gene.length = NULL, normalized.lib.sizes = TRUE, log = FALSE, prior.count = 0.25, ...)
## Default S3 method:
rpkm(y, gene.length, lib.size = NULL, log = FALSE, prior.count = 0.25, ...)

## S3 method for class 'DGEList'
cpmByGroup(y, group = NULL, dispersion = NULL, ...)
## Default S3 method:
cpmByGroup(y, group = NULL, dispersion = 0.05, offset = NULL, weights = NULL, ...)

## S3 method for class 'DGEList'
rpkmByGroup(y, group = NULL, gene.length = NULL, dispersion = NULL, ...)
## Default S3 method:
rpkmByGroup(y, group = NULL, gene.length, dispersion = 0.05, offset = NULL, weights = NULL, ...)
y |
matrix of counts or a |
normalized.lib.sizes |
logical, use normalized library sizes? |
lib.size |
library size, defaults to |
log |
logical, if |
prior.count |
average count to be added to each observation to avoid taking log of zero. Used only if |
gene.length |
vector of length |
group |
factor giving group membership for columns of |
dispersion |
numeric vector of negative binomial dispersions. |
offset |
numeric matrix of same size as |
weights |
numeric vector or matrix of non-negative quantitative weights.
Can be a vector of length equal to the number of libraries, or a matrix of the same size as |
... |
other arguments are not used. |
CPM or RPKM values are useful descriptive measures for the expression level of a gene.
By default, the normalized library sizes are used in the computation for DGEList
objects but simple column sums for matrices.
If log-values are computed, then a small count, given by prior.count
but scaled to be proportional to the library size, is added to y
to avoid taking the log of zero.
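The computation described above can be sketched in base R without loading edgeR. This is a minimal illustration, not the package's implementation: the toy matrix and the names cpm.manual, prior.scaled, adj.lib.size and logcpm.manual are made up for the example, and the exact way the prior count is scaled (proportional to library size relative to the mean library size, with the library size inflated by twice the scaled prior) is an assumption that may differ from what edgeR does internally.

```r
# Toy count matrix: 3 genes (rows) x 2 libraries (columns), filled by column.
counts <- matrix(c(10, 0, 90,
                   20, 30, 150), nrow = 3, ncol = 2)
lib.size <- colSums(counts)   # simple column sums: 100 and 200

# CPM: divide each column by its library size, then scale to one million.
# t(counts) has libraries as rows, so the division recycles per library.
cpm.manual <- t(t(counts) / lib.size) * 1e6

# Log2-CPM with a prior count proportional to library size.
# ASSUMPTION: the scaling below is illustrative, not confirmed edgeR code.
prior.count <- 0.25
prior.scaled <- prior.count * lib.size / mean(lib.size)
adj.lib.size <- lib.size + 2 * prior.scaled
logcpm.manual <- log2(t((t(counts) + prior.scaled) / adj.lib.size) * 1e6)
```

Each column of cpm.manual sums to one million, and the zero count in the first library still yields a finite value in logcpm.manual because of the added prior.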
The rpkm method for DGEList objects will try to find the gene lengths in a column of y$genes called Length or length.
Failing that, it will look for any column name containing "length" in any capitalization.
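The column search order just described, followed by the RPKM calculation, might look like the sketch below. find.length.column is a hypothetical helper written for this illustration and is not an edgeR function; the GeneLength column name and the assumption that lengths are given in bases are also illustrative.

```r
# Hypothetical helper (not part of edgeR): pick the gene-length column
# from an annotation data frame, following the search order described above.
find.length.column <- function(genes) {
  # First preference: a column named exactly "Length" or "length".
  exact <- intersect(c("Length", "length"), names(genes))
  if (length(exact) > 0) return(exact[1])
  # Fallback: any column whose name contains "length", ignoring case.
  fuzzy <- grep("length", names(genes), ignore.case = TRUE, value = TRUE)
  if (length(fuzzy) > 0) return(fuzzy[1])
  NULL
}

counts <- matrix(c(10, 0, 90,
                   20, 30, 150), nrow = 3, ncol = 2)
genes <- data.frame(GeneLength = c(1000, 2000, 500))  # lengths in bases (assumed)

len.col <- find.length.column(genes)   # falls back to the "GeneLength" column
gene.length <- genes[[len.col]]

# RPKM = CPM divided by gene length in kilobases.
cpm.manual <- t(t(counts) / colSums(counts)) * 1e6
rpkm.manual <- cpm.manual / (gene.length / 1000)
```

Dividing the CPM matrix by a vector with one entry per gene recycles down the rows, so each gene is scaled by its own length.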
cpmByGroup and rpkmByGroup compute group average values on the unlogged scale.
A numeric matrix of CPM or RPKM values.
cpm and rpkm produce matrices of the same size as y.
cpmByGroup and rpkmByGroup produce matrices with a column for each level of group.
If log = TRUE, then the values are on the log2 scale.
aveLogCPM(y), rowMeans(cpm(y,log=TRUE)) and log2(rowMeans(cpm(y))) all give slightly different results.
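One reason the last two expressions differ is that averaging logs is not the same as taking the log of an average. The base R sketch below shows this with made-up values standing in for CPMs; it does not call edgeR, and it does not account for the prior count or for how aveLogCPM works, which are further sources of difference.

```r
# Made-up CPM-like values: row 1 varies across libraries, row 2 is constant.
cpm.values <- matrix(c(1, 4, 16, 64,
                       8, 8,  8,  8), nrow = 2, byrow = TRUE)

mean.of.logs <- rowMeans(log2(cpm.values))   # average on the log2 scale
log.of.means <- log2(rowMeans(cpm.values))   # log2 of the unlogged average

# The two agree only when values are constant across libraries (row 2).
difference <- log.of.means - mean.of.logs
```

For the variable row the log of the mean exceeds the mean of the logs, while for the constant row the difference is zero.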
Davis McCarthy, Gordon Smyth
y <- matrix(rnbinom(20,size=1,mu=10),5,4)
cpm(y)
d <- DGEList(counts=y, lib.size=1001:1004)
cpm(d)
cpm(d,log=TRUE)
d$genes <- data.frame(Length=c(1000,2000,500,1500,3000))
rpkm(d)