isContaminant {decontam} | R Documentation |
Identify contaminant sequences.
Description
The frequency of each sequence (or OTU) in the input feature table as a function of the concentration of
amplified DNA in each sample is used to identify contaminant sequences.
Usage
isContaminant(seqtab, conc = NULL, neg = NULL, method = c("auto",
"frequency", "prevalence", "combined", "minimum", "either", "both"),
batch = NULL, batch.combine = c("minimum", "product", "fisher"),
threshold = 0.1, normalize = TRUE, detailed = TRUE)
Arguments
seqtab |
(Required). Integer matrix or phyloseq object.
A feature table recording the observed abundances of each sequence variant (or OTU) in each sample.
Rows should correspond to samples, and columns to sequences (or OTUs).
If a phyloseq object is provided, the otu-table component will be extracted.
|
conc |
(Optional). numeric. Required if performing frequency-based testing.
A quantitative measure of the concentration of amplified DNA in each sample prior to sequencing.
All values must be greater than zero. Zero is assumed to represent the complete absence of DNA.
If seqtab was provided as a phyloseq object, the name of the appropriate sample-variable in that
phyloseq object can be provided.
|
neg |
(Optional). logical. Required if performing prevalence-based testing.
TRUE if sample is a negative control, and FALSE if not (NA entries are not included in the testing).
Extraction controls give the best results.
If seqtab was provided as a phyloseq object, the name of the appropriate sample-variable in that
phyloseq object can be provided.
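
To illustrate how neg partitions the samples, the following base-R sketch counts in how many negative controls and in how many true samples each sequence is present. The feature table, the neg vector, and the names prev_neg and prev_pos are made up for illustration; this is not the package's internal code, only the kind of tally that a prevalence-based test starts from.

```r
# Mock feature table: 4 samples (rows) x 3 sequences (columns).
st_mock <- matrix(c(10, 0, 5,
                    20, 0, 0,
                     0, 3, 1,
                     0, 4, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("S", 1:4), c("seqA", "seqB", "seqC")))

# TRUE = negative control, FALSE = true sample, NA = excluded from testing.
neg_mock <- c(FALSE, FALSE, TRUE, NA)

# Drop samples with NA in neg, then convert counts to presence/absence.
keep <- !is.na(neg_mock)
present <- st_mock[keep, , drop = FALSE] > 0
neg_kept <- neg_mock[keep]

# Number of negative controls and of true samples containing each sequence.
prev_neg <- colSums(present[neg_kept, , drop = FALSE])
prev_pos <- colSums(present[!neg_kept, , drop = FALSE])
```

A sequence such as seqB, seen only in the negative control, is the pattern the prevalence method is designed to flag.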
|
method |
(Optional). character. The method used to test for contaminants.
- auto
(Default). frequency, prevalence, or combined will be automatically selected based on whether
just conc, just neg, or both were provided.
- frequency
Contaminants are identified by frequency that varies inversely with sample DNA concentration.
- prevalence
Contaminants are identified by increased prevalence in negative controls.
- combined
The frequency and prevalence probabilities are combined with Fisher's method and used to identify contaminants.
- minimum
The minimum of the frequency and prevalence probabilities is used to identify contaminants.
- either
Contaminants are called if identified by either the frequency or prevalence methods.
- both
Contaminants are called if identified by both the frequency and prevalence methods.
|
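
As a rough illustration of the combined and minimum methods, the sketch below applies the textbook form of Fisher's method to two probabilities per sequence and, separately, takes their minimum. The probability values and variable names are invented for the example, and the package's internal computation may differ in detail (for instance in how missing values are handled).

```r
# Hypothetical per-sequence probabilities from the two tests.
p_freq <- c(seqA = 0.01, seqB = 0.50)
p_prev <- c(seqA = 0.30, seqB = 0.60)

# Fisher's method for two probabilities: -2 * sum(log(p)) is compared
# against a chi-squared distribution with 2 * 2 = 4 degrees of freedom.
p_combined <- pchisq(-2 * log(p_freq * p_prev), df = 4, lower.tail = FALSE)

# The "minimum" method simply keeps the smaller probability.
p_minimum <- pmin(p_freq, p_prev)
```

Either vector would then be compared against threshold to classify each sequence.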
batch |
(Optional). factor, or any type coercible to a factor. Default NULL.
If provided, should be a vector of length equal to the number of input samples which specifies which batch
each sample belongs to (e.g. sequencing run). Contaminant identification will be performed independently
within each batch.
If seqtab was provided as a phyloseq object, the name of the appropriate sample-variable in that
phyloseq object can be provided.
|
batch.combine |
(Optional). Default "minimum".
For each input sequence variant (or OTU) the probabilities calculated in each batch are combined into a
single probability that is compared to threshold to classify contaminants.
Valid values: "minimum", "product", "fisher".
|
threshold |
(Optional). Default 0.1.
The probability threshold below which (strictly less than) the null hypothesis (not a contaminant) should be rejected in favor of the
alternative hypothesis (contaminant). A length-two vector can be provided when using the either or both methods:
the first value is the threshold for the frequency test and the second for the prevalence test.
|
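
The strict comparison and the length-two form of threshold can be illustrated with a few invented probabilities. The values and variable names below are assumptions made for the example; only the rule itself (strictly less than, frequency threshold first, prevalence threshold second) comes from the description above.

```r
# Hypothetical probabilities from the frequency and prevalence tests.
p_freq <- c(seqA = 0.05, seqB = 0.10, seqC = 0.40)
p_prev <- c(seqA = 0.20, seqB = 0.30, seqC = 0.45)

# First value applies to the frequency test, second to the prevalence test.
threshold <- c(0.1, 0.5)

# Strictly less than: seqB (p_freq exactly 0.10) is NOT called by frequency.
freq_call <- p_freq < threshold[1]
prev_call <- p_prev < threshold[2]

either_call <- freq_call | prev_call  # method = "either"
both_call   <- freq_call & prev_call  # method = "both"
```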
normalize |
(Optional). Default TRUE.
If TRUE, the input seqtab is normalized so that each row sums to 1 (converted to frequency).
If FALSE, no normalization is performed (the data should already be frequencies or counts from equal-depth samples).
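
The row normalization described here amounts to dividing each sample's counts by that sample's total. A minimal base-R sketch, using a made-up count matrix (the name st_counts is an assumption, and samples with zero total reads are not handled):

```r
# Mock count table: 2 samples (rows) x 3 sequences (columns).
st_counts <- matrix(c(10, 30, 60,
                       5,  5, 10),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("S1", "S2"), c("seqA", "seqB", "seqC")))

# Divide every row by its sum so that each sample's frequencies sum to 1.
st_freq <- sweep(st_counts, 1, rowSums(st_counts), "/")
```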
|
detailed |
(Optional). Default TRUE.
If TRUE, the return value is a data.frame containing diagnostic information on the contaminant decision.
If FALSE, the return value is a logical vector containing the binary contaminant classifications.
|
Value
If detailed=TRUE, a data.frame with classification information is returned.
If detailed=FALSE, a logical vector is returned, with TRUE indicating contaminants.
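
One common follow-up is to drop the flagged sequences from the feature table. The sketch below uses a mock logical vector in place of a real isContaminant(..., detailed=FALSE) result, so the table, the vector, and the name st_clean are all illustrative assumptions.

```r
# Mock feature table: samples in rows, sequences in columns.
st_mock <- matrix(c(10, 0, 5,
                    20, 1, 0),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("S1", "S2"), c("seqA", "seqB", "seqC")))

# Stand-in for the logical vector returned when detailed=FALSE:
# one entry per sequence, TRUE marking a contaminant.
is_contam <- c(seqA = FALSE, seqB = TRUE, seqC = FALSE)

# Keep only the columns (sequences) not flagged as contaminants.
st_clean <- st_mock[, !is_contam, drop = FALSE]
```

Using drop = FALSE keeps the result a matrix even if only one sequence remains.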
Examples
st <- readRDS(system.file("extdata", "st.rds", package="decontam"))
# conc should be positive and non-zero
conc <- c(6413, 3581.0, 5375, 4107, 4291, 4260, 4171, 2765, 33, 48)
neg <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
# Use frequency or frequency and prevalence to identify contaminants
isContaminant(st, conc=conc, method="frequency", threshold=0.2)
isContaminant(st, conc=conc, neg=neg, method="both", threshold=c(0.1,0.5))
[Package decontam version 1.2.1]