BrowseSeqs {DECIPHER}    R Documentation
Opens an html file in a web browser to show the sequences in an XStringSet.
BrowseSeqs(myXStringSet, htmlFile = paste(tempdir(), "/myXStringSet.html", sep = ""), openURL = interactive(), colorPatterns = TRUE, highlight = NA, patterns = c("-", alphabet(myXStringSet, baseOnly=TRUE)), colors = substring(rainbow(length(patterns), v=0.8, start=0.9, end=0.7), 1, 7), colWidth = Inf, ...)
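myXStringSet: An XStringSet object of sequences.
htmlFile: Character string giving the location where the html file should be written.
openURL: Logical indicating whether the htmlFile should be opened in a web browser.
colorPatterns: Logical specifying whether to color matched patterns, or a numeric vector of start and end positions, given in pairs, that restricts coloring to those ranges (for example, c(100, 200, 250, 500) colors positions 100-200 and 250-500).
highlight: Numeric specifying which sequence in the set to use for comparison, 0 to use the consensus sequence, or NA (the default) to color all sequences.
patterns: Either an XStringSet or a character vector of patterns to match (as regular expressions) and color, a list of matrices (one per sequence) giving the fraction of red, green, and blue for every position, or NULL for no coloring.
colors: Character vector providing the color for each of the matched patterns. Ignored when patterns is a list of matrices.
colWidth: Integer giving the maximum number of nucleotides wide the display can be before starting a new page. Must be a multiple of … The default, Inf, shows the full width without page breaks.
...: Additional arguments to adjust the appearance of the consensus sequence at the base of the display. Passed directly to the function that builds the consensus sequence.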
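The default colors argument is an ordinary R expression, so it can be evaluated on its own to see which color each default pattern receives. A minimal base-R sketch, assuming DNA input, for which c("-", alphabet(myXStringSet, baseOnly=TRUE)) gives the gap character followed by the four bases; those patterns are written out by hand here so the snippet runs without DECIPHER:

```r
# Default patterns for DNA input, written out by hand: the gap character
# followed by the four bases (what c("-", alphabet(x, baseOnly=TRUE)) yields)
patterns <- c("-", "A", "C", "G", "T")

# Same expression as the default 'colors' argument: one rainbow hue per
# pattern, truncated to 7 characters ("#RRGGBB") to drop any alpha suffix
colors <- substring(rainbow(length(patterns), v=0.8, start=0.9, end=0.7), 1, 7)

# Pair each pattern with its color for inspection
print(data.frame(pattern=patterns, color=colors))
```

Because start (0.9) is greater than end (0.7), rainbow() wraps around the hue circle, giving five well-separated hues.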
BrowseSeqs converts an XStringSet into html format for viewing in a web browser. The sequences are colored according to the patterns that are provided, or left uncolored if colorPatterns is FALSE or patterns is NULL. Character or XStringSet patterns are matched as regular expressions and colored according to colors. If patterns is a list of matrices, then it must contain one element per sequence. Each matrix is interpreted as providing the fraction of red, green, and blue for each letter in the sequence; thus, colors is ignored when patterns is a list. (See the examples section below.)
Patterns are not matched across column breaks, so multi-character patterns should be chosen carefully when colWidth is less than the maximum sequence length. Patterns are matched sequentially in the order provided, so it is possible to use nested patterns such as c("ACCTG", "CC"). In this case the “CC” could be colored differently inside the previously colored “ACCTG”. Note that patterns overlapping the boundaries of a previously matched pattern will not be matched. For example, “ACCTG” would not be matched if patterns=c("CC", "ACCTG").
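The ordering and column-break rules can be explored with a small base-R model. This is a simplified illustration of the behavior described above, not DECIPHER's implementation; the helper simulate_coloring() and the toy column width of 4 are hypothetical.

```r
# Simplified model of the matching rules; NOT DECIPHER's implementation.
# A match is kept only if every position it covers belongs to the same
# earlier span (or to no span), i.e., it does not cross the boundary of a
# previously matched pattern.
simulate_coloring <- function(x, patterns) {
  n <- nchar(x)
  color <- integer(n)  # index of the pattern that colored each position (0 = none)
  span <- integer(n)   # id of the innermost accepted match covering each position
  id <- 0L
  for (p in seq_along(patterns)) {
    hits <- gregexpr(patterns[p], x)[[1]]
    if (hits[1] == -1) next              # pattern not found
    lens <- attr(hits, "match.length")
    for (k in seq_along(hits)) {
      if (lens[k] < 1) next              # skip empty regex matches
      pos <- hits[k]:(hits[k] + lens[k] - 1)
      if (all(span[pos] == span[pos[1]])) {  # no boundary is crossed
        id <- id + 1L
        span[pos] <- id
        color[pos] <- p
      }
    }
  }
  color
}

x <- "GACCTGA"
simulate_coloring(x, c("ACCTG", "CC"))  # 0 1 2 2 1 1 0: "CC" recolored inside "ACCTG"
simulate_coloring(x, c("CC", "ACCTG"))  # 0 0 1 1 0 0 0: "ACCTG" would cross "CC", so skipped

# Column breaks: with a toy width of 4, x is displayed as "GACC" and "TGA",
# so neither piece contains "ACCTG"
chunks <- substring(x, seq(1, nchar(x), by=4), pmin(seq(4, nchar(x) + 3, by=4), nchar(x)))
grepl("ACCTG", chunks)  # FALSE FALSE
```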
Some web browsers cannot quickly display a large amount of colored text, so it is recommended to use colorPatterns = FALSE or to highlight a sequence when viewing a large XStringSet. Highlighting shows every character of the highlighted sequence and converts each position in the other sequences that matches it into an uncolored dot. Also, note that some web browsers display small shifts between fixed-width characters that may become noticeable as color offsets between the ends of long sequences.
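Both recommendations for large sets can be applied without opening a browser. A sketch using the example database from the examples section; writing to tempfile() and setting openURL=FALSE are choices made here so the snippet only produces files:

```r
library(DECIPHER)

# Example alignment shipped with DECIPHER (bacterial rRNA gene sequences)
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
dna <- SearchDB(db, verbose=FALSE)

# Option 1: no coloring at all, the lightest page for a browser to render
plain <- BrowseSeqs(dna, colorPatterns=FALSE,
                    htmlFile=tempfile(fileext=".html"), openURL=FALSE)

# Option 2: show sequence 1 in full; positions in the other sequences
# that match it become uncolored dots
hl <- BrowseSeqs(dna, highlight=1,
                 htmlFile=tempfile(fileext=".html"), openURL=FALSE)
```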
Creates an html file containing sequence data and (if openURL is TRUE) opens it in a web browser for viewing. The layout has the sequence names on the left, a position legend on the top, the cumulative number of nucleotides on the right, and the consensus sequence on the bottom.
Returns htmlFile if the html file was written successfully.
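Because the file path is returned, a page can be written from a script and opened later. A minimal sketch with three made-up aligned sequences; the file name aln.html is arbitrary, and browseURL() comes from base R's utils package, not from DECIPHER:

```r
library(DECIPHER)

# A tiny aligned set (made-up sequences) so the example is self-contained
aln <- DNAStringSet(c(s1="ACGT-ACGTA", s2="ACGTTACGTA", s3="ACCT-ACGTA"))

# Write to a chosen location without opening a browser; the path is returned
path <- BrowseSeqs(aln, htmlFile=file.path(tempdir(), "aln.html"), openURL=FALSE)

# Later (e.g., in an interactive session) the saved page can be opened
if (interactive()) browseURL(path)
```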
Erik Wright eswright@pitt.edu
ES Wright (2016) "Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R". The R Journal, 8(1), 352-359.
library(DECIPHER)

# load the example DNA sequences
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
dna <- SearchDB(db) # non-coding ribosomal RNA gene sequences

# example of using the defaults with DNA sequences
BrowseSeqs(dna) # view the XStringSet

# color only "ACTG" and "CSC" patterns (where S is C or G)
BrowseSeqs(dna, patterns=DNAStringSet(c("ACTG", "CSC")))

# highlight (i.e., only fully-color) the first sequence
BrowseSeqs(dna, highlight=1) # other sequences are dots where matching

# highlight the consensus sequence at the bottom
BrowseSeqs(dna, highlight=0) # other sequences are dots where matching

# split the wide view into multiple vertical pages (e.g., for printing)
BrowseSeqs(dna, colWidth=100, highlight=1)

# specify an alternative color scheme for -, A, C, G, T
BrowseSeqs(dna, colors=c("#1E90FF", "#32CD32", "#9400D3", "black", "#EE3300"))

# only color the positions within certain positional ranges (100-200 & 250-500)
BrowseSeqs(dna, colorPatterns=c(100, 200, 250, 500))

# color according to base-pairing by supplying the fraction RGB for every position
dbn <- PredictDBN(dna, type="structures") # calculate the secondary structures
# dbn now contains the scores for whether a base is paired (left/right) or unpaired
dbn[[1]][, 1] # the scores for the first position in the first sequence
dbn[[2]][, 10] # the scores for the tenth position in the second sequence
# these positional scores can be used as shades of red, green, and blue:
BrowseSeqs(dna, patterns=dbn) # red = unpaired, green = left-pairing, blue = right
# positions in black are not part of the consensus secondary structure

# color all restriction sites
data(RESTRICTION_ENZYMES) # load dataset containing restriction enzyme sequences
sites <- RESTRICTION_ENZYMES
sites <- gsub("[^A-Z]", "", sites) # remove non-letters
sites <- DNAStringSet(sites) # convert the character vector to a DNAStringSet
rc_sites <- reverseComplement(sites)
w <- which(sites != rc_sites) # find non-palindromic restriction sites
sites <- c(sites, rc_sites[w]) # append their reverse complement
sites <- sites[order(nchar(sites))] # match shorter sites first
BrowseSeqs(dna, patterns=sites)

# load the example protein coding sequences
fas <- system.file("extdata", "50S_ribosomal_protein_L2.fas", package="DECIPHER")
dna <- readDNAStringSet(fas)

# example of using the defaults with amino acid sequences
aa <- unique(translate(dna)) # the unique amino acid sequences
BrowseSeqs(aa)

# example of highlighting the consensus amino acid sequence
AA <- AlignSeqs(aa)
BrowseSeqs(AA, highlight=0)

# example of highlighting positions that differ from the majority consensus
BrowseSeqs(AA, highlight=0, threshold=0.5)

# color amino acids according to their predicted secondary structure
hec <- PredictHEC(AA, type="probabilities") # calculate the secondary structures
# hec now contains the probability that a residue is in an alpha-helix or beta-sheet
hec[[3]][, 18] # the 18th position in sequence 3 is likely part of a beta-sheet (E)
# the positional probabilities can be used as shades of red, green, and blue:
BrowseSeqs(AA, patterns=hec) # red = alpha-helix, green = beta-sheet, blue = coil