biomvRseg {biomvRCNS} | R Documentation |
The function performs a two-stage segmentation on multi-sample genomic data from array experiments or high-throughput sequencing.
biomvRseg(x, maxk=NULL, maxbp=NULL, maxseg=NULL, xPos=NULL, xRange=NULL, usePos='start', family='norm', penalty='BIC', twoStep=TRUE, segDisp=FALSE, useMC=FALSE, useSum=TRUE, comVar=TRUE, maxgap=Inf, tol=1e-06, grp=NULL, cluster.m=NULL, avg.m='median', trim=0, na.rm=TRUE)
x |
input data matrix, or a |
maxk |
maximum length of a segment |
maxbp |
maximum length of a segment in bp, given positional information specified in |
maxseg |
maximum number of segments the function will try |
xPos |
a vector of positions for each |
xRange |
a |
usePos |
character value to indicate whether the 'start', 'end' or 'mid' point position should be used |
family |
family of |
penalty |
penalty method used to determine the optimal number of segments from the likelihood; possible values are 'none', 'AIC', 'AICc', 'BIC', 'SIC', 'HQIC', 'mBIC' |
twoStep |
TRUE if a second-stage merging should be performed after the initial group segmentation |
segDisp |
TRUE if the dispersion parameter should be estimated segment-wise rather than using an overall estimate |
useMC |
TRUE if |
useSum |
TRUE if using grand sum across sample / x columns, like in the |
comVar |
TRUE if assuming common variance across samples (x columns) |
maxgap |
maximum distance between neighbouring features to consider a split |
tol |
tolerance level for the change in likelihood, used to determine when the EM run terminates |
grp |
vector of group assignments for each sample, with the same length as the number of columns in the data matrix; samples within each group are processed simultaneously if a multivariate emission distribution is available |
cluster.m |
clustering method for prior grouping, possible values are 'ward','single','complete','average','mcquitty','median','centroid' |
avg.m |
method to calculate average value for each segment, 'median' or 'mean' possibly trimmed |
trim |
the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint. |
na.rm |
|
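The avg.m and trim arguments describe how a per-segment average is computed. The following is a minimal base R sketch of that idea, not the package's internal code: `seg_avg` is a hypothetical helper, and it assumes that na.rm simply drops missing values as in base R `mean()` and `median()`.

```r
# Hypothetical helper (not part of biomvRCNS): average the values of one segment.
seg_avg <- function(v, avg.m = c("median", "mean"), trim = 0, na.rm = TRUE) {
  avg.m <- match.arg(avg.m)
  if (avg.m == "median") {
    median(v, na.rm = na.rm)
  } else {
    # trim is the fraction of observations dropped from each end before averaging
    mean(v, trim = trim, na.rm = na.rm)
  }
}

v <- c(1, 2, 3, 4, 100, NA)          # one outlier and one missing value
seg_avg(v, "median")                 # robust to the outlier
seg_avg(v, "mean")                   # pulled upwards by the outlier
seg_avg(v, "mean", trim = 0.2)       # drops the lowest and highest value first
```

With an outlier in the segment, the median and the trimmed mean stay close to the bulk of the values, while the plain mean does not.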
A homogeneous segmentation algorithm, using dynamic programming as in tilingArray; however, it is also capable of handling count data from sequencing.
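To illustrate the general idea of segmentation by dynamic programming, here is a small base R sketch for a single numeric vector. It is not the biomvRCNS implementation: `seg_dp` is a hypothetical function, the cost is a plain least-squares (Gaussian) fit, and the number of segments is picked with a textbook BIC in which each segment is assumed to cost two parameters. Only the roles of maxk and maxseg are borrowed from the arguments above.

```r
# Hypothetical sketch (not part of biomvRCNS): least-squares segmentation by DP.
seg_dp <- function(y, maxk = length(y), maxseg = 5) {
  n <- length(y)
  maxseg <- min(maxseg, n)
  stopifnot(maxk >= 1, maxk * maxseg >= n)   # otherwise no valid segmentation exists
  cs  <- c(0, cumsum(y))
  cs2 <- c(0, cumsum(y^2))
  # residual sum of squares of y[i..j] around its own mean
  rss <- function(i, j) {
    tot <- cs[j + 1] - cs[i]
    max(0, (cs2[j + 1] - cs2[i]) - tot^2 / (j - i + 1))
  }
  # cost[s, j]: smallest RSS when y[1..j] is split into s segments of length <= maxk
  cost <- matrix(Inf, maxseg, n)
  back <- matrix(NA_integer_, maxseg, n)
  for (j in seq_len(min(maxk, n))) cost[1, j] <- rss(1, j)
  for (s in seq_len(maxseg)[-1]) {
    for (j in s:n) {
      for (i in max(s, j - maxk + 1):j) {     # i = start of the last segment
        cand <- cost[s - 1, i - 1] + rss(i, j)
        if (cand < cost[s, j]) {
          cost[s, j] <- cand
          back[s, j] <- i
        }
      }
    }
  }
  # textbook BIC; two parameters per segment is an assumption of this sketch
  total <- cost[, n]
  bic <- n * log(pmax(total, .Machine$double.eps) / n) + 2 * seq_len(maxseg) * log(n)
  best <- which.min(bic)
  # trace back the segment start indices
  starts <- integer(best)
  j <- n
  for (s in best:1) {
    i <- if (s == 1) 1L else back[s, j]
    starts[s] <- i
    j <- i - 1L
  }
  list(nseg = best, starts = starts, bic = bic)
}

# three plateaus with a small alternating perturbation
y <- rep(c(0, 3, 1), each = 20) + 0.05 * (-1)^(1:60)
fit <- seg_dp(y, maxk = 30, maxseg = 5)
fit$nseg
fit$starts
```

Limiting the segment length with maxk shrinks the inner loop, which is what keeps this kind of exhaustive search tractable on long vectors.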
A biomvRCNS-class object:
|
Object of class |
|
Object of class |
|
Object of class |
Piegorsch, W. W. (1990). Maximum likelihood estimation for the negative binomial dispersion parameter. Biometrics, 863-867.
Picard, F. et al. (2005). A statistical approach for array CGH data analysis. BMC Bioinformatics, 6, 27.
Huber, W. et al. (2006). Transcript mapping with high density oligonucleotide tiling arrays. Bioinformatics, 22, 1963-1970.
Zhang, N. R. and Siegmund, D. O. (2007). A Modified Bayes Information Criterion with Applications to the Analysis of Comparative Genomic Hybridization Data. Biometrics, 63, 22-32.
Robinson, M. D. and Smyth, G. K. (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9, 321-332.
data(coriell)
xgr <- GRanges(seqnames=paste('chr', coriell[,2], sep=''), IRanges(start=coriell[,3], width=1, names=coriell[,1]))
values(xgr) <- DataFrame(coriell[,4:5], row.names=NULL)
xgr <- xgr[order(xgr)]
resseg <- biomvRseg(x=xgr, maxbp=4E4, maxseg=10, family='norm', grp=c(1,2))