edgeRselection {ClassifyR} | R Documentation |
Performs a differential expression analysis between classes and chooses the features which have the best resubstitution performance. Overdispersion in the data is modelled.
## S4 method for signature 'matrix'
edgeRselection(counts, classes, ...)

## S4 method for signature 'DataFrame'
edgeRselection(counts, classes, datasetName, normFactorsOptions = NULL,
               dispOptions = NULL, fitOptions = NULL, trainParams, predictParams,
               resubstituteParams, selectionName = "edgeR LRT", verbose = 3)

## S4 method for signature 'MultiAssayExperiment'
edgeRselection(counts, targets = NULL, ...)
counts |
Either a |
classes |
A vector of class labels of class |
targets |
If |
... |
Variables not used by the |
datasetName |
A name for the data set used. Stored in the result. |
normFactorsOptions |
A named |
dispOptions |
A named |
fitOptions |
A named |
trainParams |
A container of class |
predictParams |
A container of class |
resubstituteParams |
An object of class |
selectionName |
A name to identify this selection method by. Stored in the result. |
verbose |
Default: 3. A number between 0 and 3 for the amount of progress messages to give. This function only prints progress messages if the value is 3. |
The differential expression analysis follows the standard edgeR
steps of estimating library size normalisation factors, calculating dispersion,
in this case robustly, and then fitting a generalised linear model followed by
a likelihood ratio test.
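The steps described above can be sketched directly with edgeR. This is a minimal illustration on simulated counts, not the internal code of edgeRselection: the page does not state which edgeR functions are called, so the choice of calcNormFactors, estimateDisp with robust = TRUE, glmFit and glmLRT is an assumption, as are the simulated data and all variable names.

```r
library(edgeR)

# Simulated negative binomial counts (hypothetical data, for illustration only).
set.seed(1)
nGenes <- 200
classes <- factor(rep(c("Control", "Treated"), each = 4))
counts <- matrix(rnbinom(nGenes * 8, mu = 500, size = 5), nrow = nGenes,
                 dimnames = list(paste0("gene", 1:nGenes), paste0("Sample", 1:8)))
counts[1:10, 5:8] <- counts[1:10, 5:8] * 4 # Make the first ten genes differ between classes.

design <- model.matrix(~ classes)

# 1. Library size normalisation factors.
countsList <- DGEList(counts = counts, group = classes)
countsList <- calcNormFactors(countsList)

# 2. Dispersion estimation, robust to outlier genes (assumed choice of function).
countsList <- estimateDisp(countsList, design, robust = TRUE)

# 3. Generalised linear model fit, then a likelihood ratio test on the class coefficient.
fit <- glmFit(countsList, design)
lrt <- glmLRT(fit, coef = 2)

# Features ranked from most to least differentially expressed.
ranked <- rownames(topTags(lrt, n = Inf, sort.by = "PValue")$table)
head(ranked)
```

A ranking of this kind is what the resubstitution step would then work through, trying the top features at each size given by nFeatures in resubstituteParams.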
Data tables which consist entirely of non-numeric data cannot be analysed. If counts
is an object of class MultiAssayExperiment, the factor of sample classes must be stored
in the DataFrame accessible by the colData function with column name "class".
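As a hedged sketch of the MultiAssayExperiment requirement, the code below builds a small object whose colData contains a "class" column. The assay name "RNA", the sample names and the simulated counts are all hypothetical, and the constructor is assumed to generate the sample map automatically because the column names of the assay match the row names of colData.

```r
library(S4Vectors)
library(MultiAssayExperiment)

# Hypothetical count matrix: features in rows, samples in columns.
set.seed(2)
counts <- matrix(rpois(20 * 6, lambda = 100), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("Sample", 1:6)))
classes <- factor(rep(c("Control", "Treated"), each = 3))

# The sample classes are stored in colData under the column name "class".
sampleInfo <- DataFrame(class = classes, row.names = colnames(counts))
dataset <- MultiAssayExperiment(experiments = ExperimentList(list(RNA = counts)),
                                colData = sampleInfo)

colData(dataset)[, "class"] # The factor of sample classes.
```

With an object like this, the targets argument would presumably name the assay to use, for example "RNA".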
An object of class SelectResult or a list of such objects, if the classifier which
was used for determining the specified performance metric made a number of prediction varieties.
Dario Strbenac
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, Mark D. Robinson, Davis McCarthy, and Gordon Smyth, 2010, Bioinformatics, Volume 26 Issue 1, https://academic.oup.com/bioinformatics/article/26/1/139/182458.
if(require(parathyroidSE) && require(PoiClaClu))
{
  data(parathyroidGenesSE)
  expression <- assays(parathyroidGenesSE)[[1]]
  sampleNames <- paste("Sample", 1:ncol(parathyroidGenesSE))
  colnames(expression) <- sampleNames

  # Keep only the control and DPN-treated samples, in that order.
  DPN <- which(colData(parathyroidGenesSE)[, "treatment"] == "DPN")
  control <- which(colData(parathyroidGenesSE)[, "treatment"] == "Control")
  expression <- expression[, c(control, DPN)]
  classes <- factor(rep(c("Control", "DPN"), c(length(control), length(DPN))))
  expression <- expression[rowSums(expression > 1000) > 8, ] # Make small data set.

  # Extracts the predicted classes from the classifier's result.
  getClasses <- function(result) result[["ytehat"]]
  selected <- edgeRselection(expression, classes, "DPN Treatment",
                   trainParams = TrainParams(classifyInterface),
                   predictParams = PredictParams(NULL, getClasses = getClasses),
                   resubstituteParams = ResubstituteParams(nFeatures = seq(10, 100, 10),
                                        performanceType = "balanced error", better = "lower"))

  head(selected@rankedFeatures[[1]])
  plotFeatureClasses(expression, classes, "ENSG00000044574", dotBinWidth = 500,
                     xAxisLabel = "Unnormalised Counts")
}