read.bismark {bsseq} | R Documentation |
Parsing output from the Bismark alignment suite.
read.bismark(files, sampleNames, rmZeroCov = FALSE, strandCollapse = TRUE, fileType = c("cov", "oldBedGraph", "cytosineReport"), mc.cores = 1, verbose = TRUE, BACKEND = NULL)
files |
Input files. Each sample is in a different file. Input files are created by running Bismark's methylation extractor; see Note for details. |
sampleNames |
sample names, based on the order of |
rmZeroCov |
Should methylation loci that have zero coverage in all samples be removed. This will result in a much smaller object if the data originates from (targeted) capture bisulfite sequencing. |
strandCollapse |
Should strand-symmetric methylation loci, e.g., CpGs,
be collapsed across strands. This option is only available if
|
fileType |
The format of the input file; see Note for details. |
mc.cores |
The number of cores used. Note that setting
|
verbose |
Make the function verbose. |
BACKEND |
The backend used for the 'M' and 'Cov' matrices. The default,
|
An object of class BSseq
.
Input files can either be gzipped or not.
The user must specify the relevant file format via the fileType
argument. The format of the output of the Bismark alignment suite will depend
on the version of Bismark and on various user-specified options. Please
consult the Bismark documentation and the Bismark RELEASE NOTES
(http://www.bioinformatics.bbsrc.ac.uk/projects/bismark/RELEASE_NOTES.txt)
for the definitive list of changes between versions. When possible, it is
strongly recommended that you use the most recent version of Bismark.
The "cov
" and "oldBedGraph
" formats both have six columns
("chromosome
", "position
", "strand
",
"methylation percentage
", "count methylated
",
"count unmethylated
"). If you are using a recent version of Bismark
(v>=0.10.0
) then the standard file extension for this file is
".cov
". If, however, you are using an older version of Bismark
(v<0.10.0
) then the file extension will be ".bedGraph
". Please
note that the ".bedGraph
" file created in recent versions of Bismark
(v>=0.10.0
) is not suitable for analysis with bsseq because
it only contains the "methylation percentage
" and not
"count methylated
" nor "count unmethylated
".
The "cytosineReport
" format has seven columns
("chromosome
", "position
", "strand
",
"count methylated
", "count unmethylated
", "C-context
",
"trinucleotide context
").
There is no standard file extension for this file. The "C-context
" and
"trinucleotide context
" columns are not currently used by bsseq.
The following is a list of some issues to be aware of when using output from Bismark's methylation extractor:
The program to extract methylation counts was named
methylation_extractor
in older versions of Bismark (v<0.8.0
)
and re-named bismark_methylation_extractor
in recent versions of
Bismark (v>=0.8.0
). Furthermore, very old versions of Bismark
(v<0.7.7
) required that user run a separate script (called
something like genome_methylation_bismark2bedGraph
) to create the
six-column "cov
"/"oldBedGraph
" file.
The --counts
and --bedGraph
arguments must be supplied
to methylation_extractor
/bismark_methylation_extractor
in
order to use the output with bsseq::read.bismark()
.
The genomic co-ordinates of the Bismark output file may be zero-based
or one-based depending on whether the --zero_based
argument is used.
Furthermore, the default co-ordinate system varies by version of Bismark.
bsseq makes no assumptions about the basis of the genomic co-ordinates and
it is left to the user to ensure that the appropriate basis is used in the
analysis of their data. Since Bioconductor packages and
GRanges
use one-based co-ordinates, it
is recommended that your Bismark files are also one-based.
Peter Hickey peter.hickey@gmail.com
read.bsmooth
for parsing output from the BSmooth
alignment suite. read.umtab
for parsing legacy (old)
formats from the BSmooth alignment suite.
collapseBSseq
for collapse (merging or summing) the
data in two different directories.
infile <- system.file("extdata/test_data.fastq_bismark.bismark.cov.gz", package = 'bsseq') bismarkBSseq <- read.bismark(files = infile, sampleNames = "test_data", rmZeroCov = FALSE, strandCollapse = FALSE, fileType = "cov", verbose = TRUE) bismarkBSseq #----------------------------------------------------------------------------- # An example constructing a HDF5Array-backed BSseq object # library(HDF5Array) # See ?DelayedArray::setRealizationBackend for details hdf5_bismarkBSseq <- read.bismark(files = infile, sampleNames = "test_data", rmZeroCov = FALSE, strandCollapse = FALSE, fileType = "cov", verbose = TRUE, BACKEND = "HDF5Array")