msaConsensusSequence {msa} | R Documentation |
This method computes a consensus sequence from a multiple alignment or a previously computed consensus matrix. Two ways of computing the consensus are currently available.
## S4 method for signature 'matrix'
msaConsensusSequence(x, type=c("Biostrings", "upperlower"),
                     thresh=c(80, 20), ignoreGaps=FALSE, ...)

## S4 method for signature 'MultipleAlignment'
msaConsensusSequence(x, ...)
x |
an object of class MultipleAlignment or a previously computed consensus matrix (class matrix)
type |
a character string specifying how to compute the consensus sequence. Currently, types "Biostrings" and "upperlower" are available.
thresh |
a two-element numeric vector of decreasing values between 0 and 100 corresponding to the two conservation thresholds. Only relevant for type="upperlower".
ignoreGaps |
a logical (default: FALSE) indicating whether gaps are ignored when computing the consensus sequence
... |
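when the method is called for a MultipleAlignment object, further arguments such as type, thresh, and ignoreGaps can be supplied here (see the examples)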
The method takes a MultipleAlignment object or a previously computed consensus matrix and computes a consensus sequence. For type="Biostrings", the method consensusString from the Biostrings package is used to compute the consensus sequence. For type="upperlower", two thresholds (argument thresh, see above) are used to compute the consensus sequence:
If the relative frequency of the most frequent letter at a given position is at least as large as the first threshold (default: 80%), this letter is used unchanged for the consensus sequence at this position.
If the relative frequency of the most frequent letter at a given position is smaller than the first threshold but at least as large as the second threshold (default: 20%), this letter is used for the consensus sequence at this position, converted to lower case.
If the relative frequency of the most frequent letter at a given position is smaller than the second threshold, a dot is used for the consensus sequence at this position.
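The three rules above can be sketched in a few lines of base R. This is only an illustration of the rules as described here, not the code used inside the msa package. The helper name consensusLetter and the example counts are hypothetical, and breaking ties by taking the first of several equally frequent letters is an assumption.

```r
## Hypothetical helper: apply the two-threshold rule to one alignment column.
## 'counts' is a named numeric vector of letter counts for that column;
## 'thresh' is given in percent, like the argument described above.
consensusLetter <- function(counts, thresh=c(80, 20))
{
    stopifnot(length(thresh) == 2, thresh[1] >= thresh[2])

    ## relative frequencies in percent
    relFreq <- 100 * counts / sum(counts)

    ## most frequent letter (assumption: the first one wins in case of ties)
    best <- which.max(relFreq)
    letter <- names(relFreq)[best]

    if (relFreq[best] >= thresh[1])
        letter                  # at least first threshold: letter as it is
    else if (relFreq[best] >= thresh[2])
        tolower(letter)         # between the thresholds: lower case
    else
        "."                     # below second threshold: dot
}

consensusLetter(c(A=9, C=1))                  # 90% 'A' gives "A"
consensusLetter(c(A=5, C=3, G=2))             # 50% 'A' gives "a"
consensusLetter(c(A=2, C=2, G=2, T=2, N=2),
                thresh=c(80, 30))             # 20% each gives "."
```

The last call uses a custom second threshold of 30% so that a column with five equally frequent letters falls below it.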
If ignoreGaps=FALSE (the default), gaps are treated like all other letters, except that no lowercase conversion takes place when the relative frequency of the gap lies between the two thresholds. If ignoreGaps=TRUE, gaps are ignored completely, and the consensus sequence is computed from the non-gap letters only.
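The effect of ignoreGaps can be illustrated with a small base R sketch that works on a toy consensus matrix. The function name upperLowerSketch and the counts are hypothetical, and the sketch makes two assumptions that the text does not state: ties go to the first letter, and a column without any counted letters yields a dot. It is not the package's own implementation.

```r
## Hypothetical sketch of gap handling for a consensus matrix 'mat'
## (rows = letters including "-", columns = alignment positions).
upperLowerSketch <- function(mat, thresh=c(80, 20), ignoreGaps=FALSE)
{
    ## with ignoreGaps=TRUE, the gap row is dropped before counting
    if (ignoreGaps)
        mat <- mat[rownames(mat) != "-", , drop=FALSE]

    perColumn <- apply(mat, 2, function(counts)
    {
        total <- sum(counts)

        ## assumption: a column without any counted letters yields a dot
        if (total == 0)
            return(".")

        relFreq <- 100 * counts / total
        best <- which.max(relFreq)
        letter <- rownames(mat)[best]

        if (relFreq[best] >= thresh[1])
            letter
        else if (relFreq[best] >= thresh[2])
            tolower(letter)     # leaves the gap symbol "-" unchanged
        else
            "."
    })

    paste(perColumn, collapse="")
}

## toy consensus matrix with three positions (made-up counts)
toyMat <- matrix(c(10, 0, 0,    # position 1: always 'A'
                    2, 0, 8,    # position 2: 80% gaps
                    3, 3, 4),   # position 3: mixed, 40% gaps
                 nrow=3,
                 dimnames=list(c("A", "C", "-"), NULL))

upperLowerSketch(toyMat)                    # "A--"
upperLowerSketch(toyMat, ignoreGaps=TRUE)   # "AAa"
```

In the first call, the gap is the most frequent symbol at positions 2 and 3 and appears as it is, since lowercase conversion has no effect on it. In the second call, only the non-gap letters are counted.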
If the consensus matrix of a multiple alignment of nucleotide sequences contains rows labeled ‘+’ and/or ‘.’, these rows are ignored.
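As a rough illustration of what ignoring these rows amounts to, the following base R sketch drops them from a toy matrix by hand. The matrix and the variable names are made up, and treating "ignored" as equivalent to removing the rows is an assumption; according to the text above, msaConsensusSequence takes care of this itself.

```r
## toy nucleotide consensus matrix with extra rows "+" and "."
## (made-up counts, two alignment positions)
conMat <- matrix(c(4, 0, 0, 0, 0, 0,
                   0, 3, 0, 0, 1, 0),
                 nrow=6,
                 dimnames=list(c("A", "C", "G", "T", "+", "."), NULL))

## keep only the rows that are not labeled "+" or "."
keep <- !(rownames(conMat) %in% c("+", "."))
conMatClean <- conMat[keep, , drop=FALSE]

rownames(conMatClean)
```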
The function returns a character string with the consensus sequence.
Ulrich Bodenhofer <msa@bioinf.jku.at>
http://www.bioinf.jku.at/software/msa
U. Bodenhofer, E. Bonatesta, C. Horejs-Kainrath, and S. Hochreiter (2015). msa: an R package for multiple sequence alignment. Bioinformatics 31(24):3997-3999. DOI: 10.1093/bioinformatics/btv494.
msa, MsaAAMultipleAlignment, MsaDNAMultipleAlignment, MsaRNAMultipleAlignment, MsaMetaData, MultipleAlignment, consensusString
## load the package (also attaches 'Biostrings')
library(msa)

## read sequences
filepath <- system.file("examples", "HemoglobinAA.fasta", package="msa")
mySeqs <- readAAStringSet(filepath)

## perform multiple alignment
myAlignment <- msa(mySeqs)

## regular consensus sequence using consensusString() method from the
## 'Biostrings' package
msaConsensusSequence(myAlignment)

## use the other method
msaConsensusSequence(myAlignment, type="upperlower")

## use the other method with custom parameters
msaConsensusSequence(myAlignment, type="upperlower", thresh=c(50, 20),
                     ignoreGaps=TRUE)

## compute a consensus matrix first
conMat <- consensusMatrix(myAlignment)
msaConsensusSequence(conMat)