runEMclonalCN {TitanCNA} | R Documentation |
Function to run the Expectation Maximization Algorithm for inference of model parameters: cellular prevalence, normal proportion, tumour ploidy. This is a key function in the TitanCNA package and is the most computationally intense. This function makes calls to a C subroutine that allows the algorithm to be run more efficiently.
runEMclonalCN(data, params, txnExpLen = 1e15, txnZstrength = 5e05, maxiter = 15, maxiterUpdate = 1500, pseudoCounts = 1e-300, normalEstimateMethod = "map", estimateS = TRUE, estimatePloidy = TRUE, useOutlierState = FALSE, likChangeThreshold = 0.001, verbose = TRUE)
data |
|
params |
|
txnExpLen |
Influences prior probability of genotype transitions in the HMM. Smaller value have lower tendency to change state; however, too small and it produces underflow problems. |
txnZstrength |
Influences prior probability of clonal cluster transitions in the HMM. Smaller value have lower tendency to change clonal cluster state. |
pseudoCounts |
Small, machine precision values to add to probabilities to avoid underflow. For example, |
maxiter |
Maximum number of expectation-maximization iterations allowed. In practice, for TitanCNA, it will usually not exceed 20. |
maxiterUpdate |
Maximum number of coordinate descent iterations during the M-step (of EM algorithm) when parameters are estimated. |
normalEstimateMethod |
Specifies how to handle normal proportion estimation. Using |
estimateS |
Logical indicating whether to account for clonality and estimate subclonal events. See Details. |
estimatePloidy |
Logical indicating whether to estimate and account for tumour ploidy. |
useOutlierState |
Logical indicating whether an additional outlier state should be used. In practice, this is usually not necessary. |
likChangeThreshold |
EM convergence criteria - stop EM when complete log likelihood changes less than the proportion specified by this argument. |
verbose |
Set to FALSE to suppress program messages. |
This function is implemented with the "foreach"
package and therefore supports parallelization. See "doMC"
or "doMPI"
for some parallelization packages.
The forwards-backwards algorithm is used for the E-step in the EM algorithm. This is done using a call to a C subroutine for each chromosome. The maximization step uses maximum a posteriori (MAP) for estimation of parameters.
If the sample has absolutely no normal contamination, then assign nParams$n_0 <- 0
and use argument normalEstimateMethod="fixed"
.
estimateS
should always be set to TRUE
. If no subclonality is expected, then use loadDefaultParameters(numberClonalClusters=1)
. Using estimateS=FALSE
and loadDefaultParameters(numberClonalClusters=0)
is gives more or less the same results.
list
with components for results returned from the EM algorithm, including converged parameters, posterior marginal responsibilities, log likelihood, and original parameter settings.
n |
Converged estimate for normal contamination parameter. |
s |
Converged estimate(s) for cellular prevalence parameter(s). This value is defined as the proportion of tumour sample that does not contain the aberrant genotype. This will contrast what is output in |
var |
Converged estimates for variance parameter of the Gaussian mixtures used to model the log ratio data. |
phi |
Converged estimate for tumour ploidy parameter. |
piG |
Converged estimate for initial genotype state distribution. |
piZ |
Converged estimate for initial clonal cluster state distribution. |
muR |
Mean of binomial mixtures computed as a function of |
muC |
Mean of Gaussian mixtures computed as a function of |
loglik |
Posterior Log-likelihood that includes data likelihood and the priors. |
rhoG |
Posterior marginal probabilities for the genotype states computed during the E-step. Only the final iteration is returned as a |
rhoZ |
Posterior marginal probabilities for the clonal cluster states computed during the E-step. Only the final iteration is returned as a |
genotypeParams |
Original genotype parameters. See |
ploidyParams |
Original tumour ploidy parameters. See |
normalParams |
Original normal contamination parameters. See |
clonalParams |
Original subclonal parameters. See |
txnExpLen |
Original genotype transition expected length. See |
txnZstrength |
Original clonal cluster transition expected length. See |
useOutlierState |
Original setting indicating usage of outlier state. See |
Gavin Ha <gavinha@gmail.com>
Ha, G., Roth, A., Khattra, J., Ho, J., Yap, D., Prentice, L. M., Melnyk, N., McPherson, A., Bashashati, A., Laks, E., Biele, J., Ding, J., Le, A., Rosner, J., Shumansky, K., Marra, M. A., Huntsman, D. G., McAlpine, J. N., Aparicio, S. A. J. R., and Shah, S. P. (2014). TITAN: Inference of copy number architectures in clonal cell populations from tumour whole genome sequence data. Genome Research, 24: 1881-1893. (PMID: 25060187)
"foreach"
, "doMC"
, "doMPI"
, loadAlleleCounts
, loadDefaultParameters
, viterbiClonalCN
message('Running TITAN ...') #### LOAD DATA #### infile <- system.file("extdata", "test_alleleCounts_chr2.txt", package = "TitanCNA") data <- loadAlleleCounts(infile) #### LOAD PARAMETERS #### message('titan: Loading default parameters') numClusters <- 2 params <- loadDefaultParameters(copyNumber = 5, numberClonalClusters = numClusters, skew = 0.1) #### READ COPY NUMBER FROM HMMCOPY FILE #### message('titan: Correcting GC content and mappability biases...') tumWig <- system.file("extdata", "test_tum_chr2.wig", package = "TitanCNA") normWig <- system.file("extdata", "test_norm_chr2.wig", package = "TitanCNA") gc <- system.file("extdata", "gc_chr2.wig", package = "TitanCNA") map <- system.file("extdata", "map_chr2.wig", package = "TitanCNA") cnData <- correctReadDepth(tumWig, normWig, gc, map) logR <- getPositionOverlap(data$chr, data$posn, cnData) data$logR <- log(2^logR) #transform to natural log #### FILTER DATA FOR DEPTH, MAPPABILITY, NA, etc #### data <- filterData(data, 1:24, minDepth = 10, maxDepth = 200, map = NULL) #### EM (FWD-BACK) TO TRAIN PARAMETERS #### #### Can use parallelization packages #### K <- length(params$genotypeParams$alphaKHyper) params$genotypeParams$alphaKHyper <- rep(500, K) params$ploidyParams$phi_0 <- 1.5 convergeParams <- runEMclonalCN(data, params, maxiter = 3, maxiterUpdate = 500, txnExpLen = 1e15, txnZstrength = 5e5, useOutlierState = FALSE, normalEstimateMethod = "map", estimateS = TRUE, estimatePloidy = TRUE)