nmfLassoBootstrap {SparseSignatures} | R Documentation |
Evaluate different nmfLasso solutions by bootstrap for K (unknown) somatic mutational signatures, given a set of observations x. The estimation can be slow, because of memory usage and intensive computation, when a large number of bootstrap repetitions is requested or when the analysis covers a wide range of signatures (K). In that case, it may help to split the computation into multiple smaller runs.
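The advice to split the computation can be sketched as follows. This is a minimal illustration, not part of the package: `split_k_range` and `chunk_size` are hypothetical names, and how the per-chunk results should be combined afterwards is left open.

```r
# Hypothetical helper (not part of SparseSignatures): split a range of K
# values into smaller consecutive chunks of at most chunk_size values.
split_k_range <- function(K, chunk_size = 3) {
  split(K, ceiling(seq_along(K) / chunk_size))
}

# The default range 3:10 becomes three chunks: 3:5, 6:8 and 9:10.
k_chunks <- split_k_range(3:10, chunk_size = 3)

# Each chunk could then be run as a separate, smaller call (not executed here):
# results <- lapply(k_chunks, function(k) {
#   nmfLassoBootstrap(x = patients, K = k, background_signature = background)
# })
```

Running one chunk at a time keeps the memory footprint of each call smaller, at the cost of having to keep track of one result object per chunk.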
nmfLassoBootstrap(
  x,
  K = 3:10,
  starting_beta = NULL,
  background_signature = NULL,
  normalize_counts = TRUE,
  nmf_runs = 10,
  bootstrap_repetitions = 50,
  iterations = 30,
  max_iterations_lasso = 10000,
  num_processes = Inf,
  seed = NULL,
  verbose = TRUE,
  log_file = ""
)
x |
count matrix for a set of n patients and 96 trinucleotides. |
K |
a range of numeric values (each greater than 1) indicating the number of signatures to be discovered. |
starting_beta |
a list of starting beta values, one for each configuration of K. If it is NULL, starting betas are estimated by NMF. |
background_signature |
background signature to be used. If not provided, a warning is thrown and an initial value for it is estimated by NMF. If starting_beta is not NULL, this parameter is ignored. |
normalize_counts |
if true, the input count matrix x is normalized such that all patients have the same number of mutations. |
nmf_runs |
number of iterations (minimum 1) of NMF to be performed for a robust estimation of starting beta. If starting_beta is not NULL, this parameter is ignored. |
bootstrap_repetitions |
Number of times the bootstrap should be repeated. Higher values result in better estimates, but are computationally more expensive. |
iterations |
Number of iterations to be performed. Each iteration corresponds to a first step where beta is fitted and a second step where alpha is fitted. |
max_iterations_lasso |
Maximum number of iterations to be performed during the sparsification via Lasso. |
num_processes |
Number of processes to be used during parallel execution. To execute in single process mode, this parameter needs to be set to either NA or NULL. |
seed |
Seed for reproducibility. |
verbose |
boolean; whether to print all messages. |
log_file |
log file where outputs are written when running in parallel. If parallel execution is disabled, this parameter is ignored. |
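To illustrate what the normalize_counts option describes, the sketch below rescales a toy count matrix so that every patient (row) has the same total number of mutations. The matrix and the value of `target_total` are assumptions made for illustration; the constant the package uses internally is not stated here.

```r
# Toy count matrix: 3 patients (rows) x 4 mutation categories (columns).
# A real input would have 96 trinucleotide columns.
counts <- matrix(c(10,  0,  5, 5,
                    2,  2,  2, 2,
                    0, 30, 10, 0),
                 nrow = 3, byrow = TRUE)

# Assumed common total per patient; the value is illustrative only.
target_total <- 1000

# Dividing a matrix by a vector of length nrow recycles down the columns,
# so each row is divided by its own row sum before rescaling.
normalized <- counts / rowSums(counts) * target_total
```

After this step each row sums to `target_total`, while the relative proportions of the mutation categories within each patient are unchanged.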
A list of 3 elements: stability, RSS and evar. Here, stability reports the estimated cosine similarity for alpha and beta at each bootstrap repetition; RSS reports, for each configuration, the estimated residual sum of squares; finally, evar reports the explained variance.
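The three quantities named above can be illustrated on toy data. This is a sketch under stated assumptions, not the package's internal code: `cosine_similarity` is a hypothetical helper, the matrices are made up, and explained variance is computed with one common definition (1 minus RSS over the total sum of squares of x), which may differ from the one used internally.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy factorization: alpha (2 patients x 2 signatures) and
# beta (2 signatures x 3 mutation categories).
alpha <- matrix(c(1, 2,
                  3, 1), nrow = 2, byrow = TRUE)
beta <- matrix(c(1, 0, 2,
                 0, 1, 1), nrow = 2, byrow = TRUE)

# Toy observations: the reconstruction plus a small perturbation.
predicted <- alpha %*% beta
x <- predicted + matrix(c(0, 1, 0, -1, 0, 0), nrow = 2)

# Residual sum of squares between observations and reconstruction.
rss <- sum((x - predicted)^2)

# Explained variance under the assumed definition.
evar <- 1 - rss / sum(x^2)

# Similarity between the two toy signatures (rows of beta).
sim <- cosine_similarity(beta[1, ], beta[2, ])
```

A cosine similarity close to 1 means two vectors point in nearly the same direction, which is why it is a natural measure of how stable alpha and beta are across bootstrap repetitions.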
data(background)
data(patients)
res = nmfLassoBootstrap(x = patients[1:100,],
                        K = 3:5,
                        background_signature = background,
                        nmf_runs = 1,
                        bootstrap_repetitions = 2,
                        num_processes = NA,
                        seed = 12345)