testCpG {cobindR} | R Documentation
Diagnostic function: the GC content and CpG content of the sequences are clustered using 2D Gaussian mixture models (Mclust), with the number of clusters (at most max.clust) selected by the Bayesian information criterion (BIC). FALSE is returned if more than one subgroup is found. If do.plot=TRUE, the results are visualized.
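The BIC-based selection step can be illustrated with a small base-R sketch. It is not cobindR's implementation: instead of calling mclust, it assumes only unconstrained (full-covariance) bivariate Gaussian mixtures, fits G = 1, ..., max.clust components by EM starting from a k-means partition, and keeps the G with the largest BIC (mclust's sign convention, 2 log L minus the number of parameters times log n). The function names `dmvnorm_log`, `fit_gmm` and `test_cpg_sketch` and the synthetic GC/CpG values are hypothetical.

```r
# Sketch only (assumption): base-R stand-in for the Mclust step, using
# full-covariance 2D Gaussian mixtures fitted by EM and chosen by BIC.

# Log-density of a multivariate normal for each row of X (via Cholesky).
dmvnorm_log <- function(X, mu, S) {
  R <- chol(S)                                     # S = t(R) %*% R
  Z <- backsolve(R, t(X) - mu, transpose = TRUE)   # whitened residuals
  -0.5 * colSums(Z^2) - sum(log(diag(R))) - ncol(X) / 2 * log(2 * pi)
}

# Fit a G-component mixture by EM and return its BIC (larger is better).
fit_gmm <- function(X, G, n.iter = 200, tol = 1e-8) {
  n <- nrow(X); d <- ncol(X)
  # initial hard responsibilities from a k-means partition
  cl <- if (G == 1) rep(1L, n) else kmeans(X, centers = G, nstart = 10)$cluster
  W <- outer(cl, seq_len(G), "==") * 1
  loglik_old <- -Inf
  for (it in seq_len(n.iter)) {
    # M-step: proportions, means and full covariances per component
    nk <- colSums(W)
    logdens <- matrix(0, n, G)
    for (k in seq_len(G)) {
      mu <- colSums(W[, k] * X) / nk[k]
      Xc <- sweep(X, 2, mu)
      S <- crossprod(Xc * sqrt(W[, k])) / nk[k] + diag(1e-8, d)
      logdens[, k] <- log(nk[k] / n) + dmvnorm_log(X, mu, S)
    }
    # E-step with log-sum-exp for numerical stability
    m <- apply(logdens, 1, max)
    lse <- m + log(rowSums(exp(logdens - m)))
    W <- exp(logdens - lse)
    loglik <- sum(lse)
    if (abs(loglik - loglik_old) < tol * abs(loglik)) break
    loglik_old <- loglik
  }
  npar <- (G - 1) + G * d + G * d * (d + 1) / 2
  2 * loglik - npar * log(n)
}

# Hypothetical analogue of testCpG's decision: TRUE only if BIC picks G = 1.
test_cpg_sketch <- function(gc, max.clust = 4) {
  bic <- vapply(seq_len(max.clust), function(G) fit_gmm(gc, G), numeric(1))
  best <- which.max(bic)
  list(result = best == 1, n.clust = best, bic = bic)
}

# Synthetic (made-up) GC / CpG fractions: one group vs. two groups.
set.seed(1)
one <- cbind(rnorm(200, 0.45, 0.03), rnorm(200, 0.02, 0.005))
two <- rbind(one, cbind(rnorm(200, 0.65, 0.03), rnorm(200, 0.08, 0.005)))
res_one <- test_cpg_sketch(one, max.clust = 3)
res_two <- test_cpg_sketch(two, max.clust = 3)
```

mclust also compares several constrained covariance parameterizations, so on real data its choice of the number of clusters can differ from this simplified sketch.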
## S4 method for signature 'cobindr'
testCpG(x, max.clust = 4, do.plot = F, n.cpu = NA)
x | An object of class "cobindr" that holds all necessary information about the sequences and the hits. |
max.clust | Integer giving the maximum number of clusters considered when separating the data. |
do.plot | Logical flag; if do.plot=TRUE, a scatterplot of the GC and CpG content of each sequence is produced, with the clusters color-coded. |
n.cpu | Number of CPUs to use for parallelization. The default, NA, means that the number of available CPUs is detected and then used. |
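The NA default for n.cpu can be pictured with the following sketch. It is an assumption about the behavior, not cobindR's code: the hypothetical helper `resolve_n_cpu` falls back to `parallel::detectCores()` when n.cpu is NA, and the result is then passed to `parallel::mclapply` as an example of spreading work over the CPUs.

```r
# Hypothetical helper mirroring the documented n.cpu default.
resolve_n_cpu <- function(n.cpu = NA) {
  if (is.na(n.cpu)) {
    n.cpu <- parallel::detectCores()
    # detectCores() may return NA if the count cannot be determined
    if (is.na(n.cpu)) n.cpu <- 1L
  }
  as.integer(max(1, n.cpu))
}

n_auto <- resolve_n_cpu()    # number of detected CPUs
n_two  <- resolve_n_cpu(2)   # explicit value is kept

# mclapply forks processes, which is not supported on Windows,
# so a single core is used there.
cores <- if (.Platform$OS.type == "windows") 1L else n_two
res <- parallel::mclapply(1:4, sqrt, mc.cores = cores)
```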
result | Logical flag; FALSE is returned if more than one subgroup is found using the Bayesian information criterion (BIC). |
gc | Matrix with rows corresponding to sequences and columns corresponding to GC and CpG content. |
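A matrix of the shape described for gc can be built from plain sequence strings as sketched below. The definitions are assumptions: GC content is taken as the fraction of G and C bases, and CpG content as the fraction of dinucleotide positions reading "CG"; cobindR may instead use another measure such as the CpG observed/expected ratio. The helper name `gc_cpg_content` is hypothetical.

```r
# Hypothetical helper: one row per sequence, columns GC and CpG.
# Assumes every sequence has at least two bases.
gc_cpg_content <- function(seqs) {
  seqs <- toupper(seqs)                      # keeps names, ignores case
  t(vapply(seqs, function(s) {
    n <- nchar(s)
    chars <- strsplit(s, "")[[1]]
    gc <- sum(chars %in% c("G", "C")) / n    # fraction of G/C bases
    dinuc <- paste0(chars[-n], chars[-1])    # all n - 1 overlapping dinucleotides
    cpg <- sum(dinuc == "CG") / (n - 1)      # fraction of CpG steps
    c(GC = gc, CpG = cpg)
  }, numeric(2)))
}

m <- gc_cpg_content(c(s1 = "ACGTCGA", s2 = "aattaat"))
```

For s1, four of seven bases are G or C and two of the six dinucleotides are CG, giving GC = 4/7 and CpG = 1/3.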
Robert Lehmann <r.lehmann@biologie.hu-berlin.de>
The method uses clustering functions from the package "mclust" (http://www.stat.washington.edu/mclust/).
library(cobindR)
cfg <- cobindRConfiguration()
sequence_type(cfg) <- 'fasta'
sequence_source(cfg) <- system.file('extdata/example.fasta', package='cobindR')
# avoid complaints from the validation mechanism
pfm_path(cfg) <- system.file('extdata/pfms', package='cobindR')
pairs(cfg) <- ''
runObj <- cobindr(cfg)
testCpG(runObj, max.clust = 2, do.plot = TRUE)