cpm {edgeR} R Documentation

Counts per Million or Reads per Kilobase per Million

Description

Compute counts per million (CPM) or reads per kilobase per million (RPKM).

Usage

## S3 method for class 'DGEList'
cpm(y, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## Default S3 method:
cpm(y, lib.size = NULL,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
rpkm(y, gene.length = NULL, normalized.lib.sizes = TRUE,
       log = FALSE, prior.count = 2, ...)
## Default S3 method:
rpkm(y, gene.length, lib.size = NULL,
       log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
cpmByGroup(y, group = NULL, dispersion = NULL, ...)
## Default S3 method:
cpmByGroup(y, group = NULL, dispersion = 0.05,
       offset = NULL, weights = NULL, log = FALSE, prior.count = 2, ...)
## S3 method for class 'DGEList'
rpkmByGroup(y, group = NULL, gene.length = NULL, dispersion = NULL, ...)
## Default S3 method:
rpkmByGroup(y, group = NULL, gene.length, dispersion = 0.05,
       offset = NULL, weights = NULL, log = FALSE, prior.count = 2, ...)

Arguments

y

matrix of counts or a DGEList object

normalized.lib.sizes

logical, use normalized library sizes?

lib.size

library size, defaults to colSums(y).

log

logical, if TRUE then log2 values are returned.

prior.count

average count to be added to each observation to avoid taking log of zero. Used only if log=TRUE.

gene.length

vector of length nrow(y) giving gene lengths in bases, or the name of the column of y$genes containing the gene lengths.

group

factor giving group membership for the columns of y. Defaults to y$samples$group for the DGEList method and to a single-level factor for the default method.

dispersion

numeric vector of negative binomial dispersions.

offset

numeric matrix of same size as y giving offsets for the log-linear models. Can be a scalar or a vector of length ncol(y), in which case it is expanded out to a matrix.

weights

numeric vector or matrix of non-negative quantitative weights. Can be a vector of length equal to the number of libraries, or a matrix of the same size as y.

...

other arguments are not used.

Details

CPM or RPKM values are useful descriptive measures of the expression level of a gene. By default, the DGEList methods use the normalized library sizes (lib.size multiplied by norm.factors), whereas the default methods for matrices use simple column sums unless lib.size is supplied.
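The arithmetic behind the unlogged values can be sketched in base R. This is an illustration only, not edgeR's implementation: the helpers cpm_sketch and rpkm_sketch, the toy counts and the normalization factors are all made up for the example.

```r
# Toy counts: 3 genes x 2 samples (made-up numbers for illustration).
counts <- matrix(c(10, 0, 90,
                   20, 5, 175), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Hypothetical helper: counts per million for the given library sizes.
cpm_sketch <- function(counts, lib.size = colSums(counts)) {
  # Divide each column by its library size, then scale to one million.
  t(t(counts) / lib.size) * 1e6
}

# Hypothetical helper: CPM divided by gene length in kilobases.
rpkm_sketch <- function(counts, gene.length, lib.size = colSums(counts)) {
  cpm_sketch(counts, lib.size) / (gene.length / 1000)
}

# Simple column sums, as for a plain matrix.
cpm_raw <- cpm_sketch(counts)

# Normalized library sizes: column sums times normalization factors.
norm.factors <- c(1.25, 0.8)   # assumed values, for illustration only
cpm_norm <- cpm_sketch(counts, lib.size = colSums(counts) * norm.factors)

# Gene lengths in bases, one per row of counts.
rpkm_raw <- rpkm_sketch(counts, gene.length = c(1000, 2000, 500))
```

With simple column sums every column of cpm_raw adds up to one million; with normalized library sizes that is no longer the case in general.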

If log-values are computed, then a small count, given by prior.count but scaled to be proportional to the library size, is added to y to avoid taking the log of zero.
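A minimal base R sketch of the log2 computation follows. It assumes that the prior count is scaled by each library size relative to the mean library size, and that the library sizes are enlarged by twice the scaled prior count; both details are assumptions of this sketch rather than statements from this page, and the toy counts are made up.

```r
# Toy counts: 3 genes x 2 samples, with a zero count in each sample.
counts <- matrix(c(0, 10, 90,
                   0, 25, 375), nrow = 3)
lib.size <- colSums(counts)          # 100 and 400
prior.count <- 2

# Scale the prior count in proportion to each library's size.
prior.scaled <- prior.count * lib.size / mean(lib.size)   # 0.8 and 3.2

# Assumption: library sizes are enlarged by twice the scaled prior count,
# so that the adjusted proportions stay strictly between zero and one.
lib.adj <- lib.size + 2 * prior.scaled

# Add the scaled prior to every count, then convert to log2 CPM.
log_cpm <- log2(t((t(counts) + prior.scaled) / lib.adj) * 1e6)
```

Under these assumptions a zero count maps to the same finite log2 value in every library, because both the added count and the adjusted library size are proportional to the original library size.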

The rpkm method for DGEList objects will try to find the gene lengths in a column of y$genes called Length or length. Failing that, it will look for any column name containing "length" in any capitalization.
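The lookup order described above can be sketched as a small base R helper. The function find_length_column is hypothetical and is not part of edgeR; in particular, returning the first match when several columns qualify is an assumption of this sketch.

```r
# Hypothetical helper mimicking the lookup order described above.
find_length_column <- function(genes) {
  nm <- colnames(genes)
  # First preference: a column called exactly "Length" or "length".
  exact <- intersect(c("Length", "length"), nm)
  if (length(exact) > 0) return(exact[1])
  # Fallback: any column name containing "length", ignoring case.
  hit <- grep("length", nm, ignore.case = TRUE, value = TRUE)
  if (length(hit) > 0) return(hit[1])
  stop("no gene length column found")
}

# Made-up annotation table with a non-standard column name.
genes <- data.frame(Symbol = c("A", "B"), GeneLength = c(1000, 2500))
length_col <- find_length_column(genes)
```

Passing the column name explicitly through gene.length avoids any ambiguity when the annotation contains more than one length-like column.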

cpmByGroup and rpkmByGroup compute group average values on the unlogged scale.
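As a rough illustration of a group average on the unlogged scale, the base R sketch below handles only the simplified case of zero dispersion with no extra offsets or weights. In that case the fitted rate of a one-way log-linear model with library-size offsets reduces to pooled counts divided by pooled library sizes. With a non-zero dispersion the negative binomial fit will generally give different values, so this is not a substitute for cpmByGroup. The toy data are made up.

```r
# Toy counts: 3 genes x 4 samples, two samples per group.
counts <- matrix(c(10,  0,  90,
                   20,  5, 175,
                   30, 10, 260,
                   15,  5,  80), nrow = 3)
group <- factor(c("a", "a", "b", "b"))
lib.size <- colSums(counts)          # 100, 200, 300, 100

# Pool counts within each group: genes in rows, groups in columns.
pooled_counts <- t(rowsum(t(counts), group))

# Pool library sizes within each group, in the same level order.
pooled_lib <- as.vector(tapply(lib.size, group, sum))   # 300 and 400

# Simplifying assumption: dispersion = 0, so the group average is the
# pooled count over the pooled library size, scaled to one million.
cpm_group <- t(t(pooled_counts) / pooled_lib) * 1e6
```

The result has one column per level of group, matching the shape described in the Value section.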

Value

A numeric matrix of CPM or RPKM values. cpm and rpkm produce matrices of the same size as y. cpmByGroup and rpkmByGroup produce matrices with a column for each level of group. If log = TRUE, then the values are on the log2 scale.

Note

aveLogCPM(y), rowMeans(cpm(y,log=TRUE)) and log2(rowMeans(cpm(y))) all give slightly different results.
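Part of the difference is purely arithmetic: the mean of log values is not the log of the mean. The base R sketch below uses made-up CPM values and omits the prior count and anything specific to aveLogCPM, so it illustrates only why the last two expressions disagree.

```r
# Made-up CPM values: 2 genes x 2 samples.
cpm_vals <- matrix(c(100, 400,
                      50,  50), nrow = 2, byrow = TRUE)

# Average after taking logs (the pattern of rowMeans(cpm(y, log=TRUE)),
# here without any prior count).
mean_of_logs <- rowMeans(log2(cpm_vals))

# Log of the average (the pattern of log2(rowMeans(cpm(y)))).
log_of_means <- log2(rowMeans(cpm_vals))
```

For the first gene the mean of logs equals log2(200) while the log of the mean equals log2(250); for the second gene, whose values are identical across samples, the two agree.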

Author(s)

Davis McCarthy, Gordon Smyth

See Also

aveLogCPM

Examples

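library(edgeR)

# Simulate a 5 x 4 matrix of negative binomial counts
y <- matrix(rnbinom(20,size=1,mu=10),5,4)
cpm(y)

# Supply the library sizes explicitly through a DGEList
d <- DGEList(counts=y, lib.size=1001:1004)
cpm(d)
cpm(d,log=TRUE)

# rpkm finds the gene lengths in the Length column of d$genes
d$genes <- data.frame(Length=c(1000,2000,500,1500,3000))
rpkm(d)

# Group averages for two groups of two samples each
cpmByGroup(d, group=c(1,1,2,2))

rpkmByGroup(d, group=c(1,1,2,2))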

[Package edgeR version 3.26.8 Index]