filterByExpr {edgeR}		R Documentation

Filter Genes By Expression Level
Description:

Determine which genes have sufficiently large counts to be retained in a statistical analysis.
Usage:

## S3 method for class 'DGEList'
filterByExpr(y, design = NULL, group = NULL, lib.size = NULL, ...)

## Default S3 method:
filterByExpr(y, design = NULL, group = NULL, lib.size = NULL,
             min.count = 10, min.total.count = 15, ...)
Arguments:

y                matrix of counts, or a DGEList object.

design           design matrix. Ignored if group is not NULL.

group            vector or factor giving group membership for a
                 oneway layout, if appropriate.

lib.size         library size, defaults to colSums(y).

min.count        numeric. Minimum count required for at least some
                 samples.

min.total.count  numeric. Minimum total count required.

...              any other arguments. For the DGEList method, other
                 arguments will be passed to the default method.
Details:

This function implements the filtering strategy that was intuitively described by Chen et al (2016).
Roughly speaking, the strategy keeps genes that have at least min.count
reads in a worthwhile number of samples.
More precisely, the filtering keeps genes that have count-per-million (CPM) above k in n samples, where k is determined by min.count
and by the sample library sizes and n is determined by the design matrix.
n is essentially the smallest group sample size or, more precisely, the minimum inverse leverage for any fitted value. If all the group sizes are large, then this is relaxed slightly, but with n always greater than 70% of the smallest group size.
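The leverage rule above can be sketched numerically. The following is a minimal illustration in Python (not edgeR's source code); numpy and the helper name min_inverse_leverage are assumptions of this sketch. For a one-way layout, the minimum inverse leverage over the hat-matrix diagonal recovers the smallest group size.

```python
import numpy as np

def min_inverse_leverage(X):
    """Minimum inverse leverage min(1 / h_ii), where h_ii is the
    diagonal of the hat matrix H = X (X'X)^{-1} X'."""
    X = np.asarray(X, dtype=float)
    # Diagonal of H without forming the full n x n matrix
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return (1.0 / h).min()

# One-way design: 3 samples in group A, 5 in group B
X = np.zeros((8, 2))
X[:3, 0] = 1  # group A indicator column
X[3:, 1] = 1  # group B indicator column
n = min_inverse_leverage(X)  # equals the smallest group size, 3
```

For this design each group-A sample has leverage 1/3 and each group-B sample 1/5, so min(1/h) = 3, the smallest group size.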
In addition, each kept gene is required to have at least min.total.count
reads across all the samples.
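Putting the details together, the heuristic can be sketched as follows. This is a simplified re-implementation in Python for illustration only: the function name filter_by_expr_sketch and the use of numpy are assumptions, the CPM cutoff is taken relative to the median library size, and the leverage-based relaxation of n described above is omitted (n is taken directly as the smallest group size).

```python
import numpy as np

def filter_by_expr_sketch(counts, group, lib_size=None,
                          min_count=10, min_total_count=15):
    """Simplified sketch of the filtering strategy: keep genes with
    CPM above a cutoff k in at least n samples, plus a minimum total
    count across all samples."""
    counts = np.asarray(counts, dtype=float)
    if lib_size is None:
        lib_size = counts.sum(axis=0)  # column sums, one per sample
    # k: min_count translated to the CPM scale via the median library size
    cpm_cutoff = min_count / np.median(lib_size) * 1e6
    # n: smallest group sample size (one-way layout)
    _, group_sizes = np.unique(group, return_counts=True)
    n = group_sizes.min()
    cpm = counts / lib_size * 1e6
    keep_cpm = (cpm >= cpm_cutoff).sum(axis=1) >= n
    keep_total = counts.sum(axis=1) >= min_total_count
    return keep_cpm & keep_total
```

For example, with two groups of two samples each, a gene expressed at roughly 100 counts everywhere is kept, while an all-zero gene or one with only a handful of reads in one group is filtered out.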
Value:

Logical vector of length nrow(y) indicating which rows of y to keep in the analysis.
Author(s):

Gordon Smyth
References:

Chen Y, Lun ATL, Smyth GK (2016). From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Research 5, 1438. http://f1000research.com/articles/5-1438
Examples:

## Not run: 
keep <- filterByExpr(y)
y <- y[keep,]
## End(Not run)