mergePairsByID {dada2} | R Documentation |
This function attempts to merge each pair of denoised forward and reverse reads,
rejecting any which do not sufficiently overlap
or which contain too many (>0 by default) mismatches in the overlap region.
Note: This function does not assume that the fastq files
for the forward and reverse reads were in the same order.
If they are already in the same order, use mergePairs
.
mergePairsByID(dadaF, derepF, srF, dadaR, derepR, srR, minOverlap = 20, maxMismatch = 0, returnRejects = FALSE, idRegExpr = c("\\s.+$", ""), includeCol = character(0), justConcatenate = FALSE, verbose = FALSE)
dadaF |
(Required). A |
derepF |
(Required). A |
srF |
(Required). The trimmed and filtered forward reads
that you used as input for |
dadaR |
(Required). A |
derepR |
(Required). A |
srR |
(Required). See srF description, but in this case provide for the reverse reads. |
minOverlap |
(Optional). A |
maxMismatch |
(Optional). A |
returnRejects |
(Optional).
A |
idRegExpr |
(Optional).
A length 2 |
includeCol |
(Optional). |
justConcatenate |
(Optional).
NOT CURRENTLY SUPPORTED.
|
verbose |
(Optional). |
Not yet implemented: Use of the concatenate option will result in concatenating forward and reverse reads without attempting a merge/alignment step.
A data.frame
with a row for each unique pairing of forward/reverse denoised sequences,
and the following columns:
$abundance
: Number of reads corresponding to this forward/reverse combination.
$sequence
: The merged sequence.
$forward
: The index of the forward denoised sequence.
$reverse
: The index of the reverse denoised sequence.
$nmatch
: Number of matches nts in the overlap region.
$nmismatch
: Number of mismatches in the overlap region.
$nindel
: Number of indels in the overlap region.
$prefer
: The sequence used for the overlap region. 1=forward; 2=reverse.
$accept
: TRUE if overlap between forward and reverse denoised sequences was at least
minOverlap
and had at most maxMismatch
differences. FALSE otherwise.
$...
: Additional columns specified in propagateCol
.
# For the following example files, there are two ways to merge denoised directions. # Because the read sequences are in order, `mergePairs()` works. # `mergePairsByID` always works, # because it uses the read IDs to match denoised pairs. exFileF = system.file("extdata", "sam1F.fastq.gz", package="dada2") exFileR = system.file("extdata", "sam1R.fastq.gz", package="dada2") srF = ShortRead::readFastq(exFileF) srR = ShortRead::readFastq(exFileR) derepF = derepFastq(exFileF) derepR = derepFastq(exFileR) dadaF <- dada(derepF, err=tperr1, errorEstimationFunction=loessErrfun, selfConsist=TRUE) dadaR <- dada(derepR, err=tperr1, errorEstimationFunction=loessErrfun, selfConsist=TRUE) # Run and compare ex1time = system.time({ ex1 <- mergePairs(dadaF, derepF, dadaR, derepR, verbose = TRUE) ex1 <- data.table::data.table(ex1) }) ex1time # The new function, based on read IDs. ex2time = system.time({ ex2 = dada2:::mergePairsByID(dadaF = dadaF, derepF = derepF, srF = srF, dadaR = dadaR, derepR = derepR, srR = srR, verbose = TRUE) }) ex2time # Compare results (should be identical) ex2[(accept)] data.table::setkey(ex2, sequence) ex2[(accept), list(abundance = sum(abundance)), by = sequence] # Same sequence set (exactly) setequal(x = ex1$sequence, y = ex2[(accept)]$sequence) # Test concatenation functionality ex1cattime = system.time({ ex1cat <- mergePairs(dadaF, derepF, dadaR, derepR, justConcatenate = TRUE, verbose = TRUE) sapply(ex1cat, class) # need to convert to a character ex1cat$sequence <- unlist(ex1cat$sequence) ex1cat <- data.table::data.table(ex1cat) }) ex1cattime ex2cattime = system.time({ ex2cat <- dada2:::mergePairsByID(dadaF = dadaF, derepF = derepF, srF = srF, dadaR = dadaR, derepR = derepR, srR = srR, justConcatenate = TRUE, verbose = TRUE) }) ex2cattime ex2cat[(accept)] # Compare results (should be identical) data.table::setkey(ex1cat, sequence) ex1cat[(accept), list(abundance = sum(abundance)), by = sequence] data.table::setkey(ex2cat, sequence) ex2cat[(accept), list(abundance = sum(abundance)), by = sequence] # Same sequence set (exactly) setequal(x = ex1cat$sequence, y = ex2cat$sequence) intersect(x = ex1cat$sequence, y = ex2cat$sequence) ex1cat[, nchar(sequence)] ex2cat[, nchar(sequence)]