nmf.LassoCV {SparseSignatures}    R Documentation

nmf.LassoCV

Description

Perform the discovery by cross-validation of K (unknown) somatic mutational signatures given a set of observations x. The estimation can slow down because of memory usage when a high number of cross-validation repetitions is requested and when the grid search is performed over many configurations. In this case, we advise splitting the computation into multiple smaller sets.
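The advice above about splitting the computation can be sketched as follows. This is a minimal illustration in base R, not part of the package: `split_into_chunks` is a hypothetical helper, and the chunk sizes are arbitrary. How to merge the results of the separate runs is left open, since the layout of the returned objects is not described on this page.

```r
# Hypothetical helper (not part of SparseSignatures): split a vector of
# candidate values into consecutive chunks of at most `chunk_size` elements.
split_into_chunks <- function(values, chunk_size) {
  stopifnot(chunk_size >= 1)
  split(values, ceiling(seq_along(values) / chunk_size))
}

# Default grids taken from the Usage section.
K_chunks <- split_into_chunks(3:10, chunk_size = 3)
lambda_chunks <- split_into_chunks(c(0.1, 0.2, 0.3), chunk_size = 1)

# Every (K chunk, lambda chunk) pair defines one smaller job.
jobs <- expand.grid(K_chunk = seq_along(K_chunks),
                    lambda_chunk = seq_along(lambda_chunks))

# Each job could then be run as a separate call, for example (not run here;
# `x` is a count matrix supplied by the user and `i` is a row of `jobs`):
# res <- nmf.LassoCV(x,
#                    K = K_chunks[[jobs$K_chunk[i]]],
#                    lambda_values = lambda_chunks[[jobs$lambda_chunk[i]]])
```

Each smaller call then covers only part of the grid, which should reduce the memory needed by any single run.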

Usage

nmf.LassoCV(x, K = 3:10, starting_beta = NULL,
  background_signature = NULL, nmf_runs = 10, lambda_values = c(0.1,
  0.2, 0.3), cross_validation_entries = 0.05,
  cross_validation_iterations = 5, cross_validation_repetitions = 10,
  iterations = 20, max_iterations_lasso = 10000, num_processes = Inf,
  seed = NULL, verbose = TRUE)

Arguments

x

count matrix.

K

a range of numeric values (each of them greater than 1) indicating the numbers of signatures to be discovered.

starting_beta

a list of starting beta values, one for each configuration of K. If it is NULL, starting betas are estimated by NMF.

background_signature

background signature to be used. If not provided, a warning is thrown.

nmf_runs

number of iterations of NMF to be performed for a robust estimation of starting beta. If starting_beta is not NULL, this parameter is ignored.

lambda_values

range of LASSO penalty values to be used. Each value should be greater than 0 and at most 1, where 1 is the value of LASSO that would shrink all the signatures to 0 within one step. The higher the lambda value is, the sparser are the resulting signatures, but too large values result in a poor fit of the counts.

cross_validation_entries

Percentage of cells in the count matrix to be replaced by 0s, given as a fraction (the default 0.05 corresponds to 5% of the cells).
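To make the masking step concrete, the sketch below replaces a random fraction of the cells of a toy count matrix by 0s. It is an illustration in base R only: `mask_cells` is a hypothetical helper and the toy matrix is simulated, so this is not necessarily how the package selects the cells internally.

```r
# Hypothetical helper (not part of SparseSignatures): set a random fraction
# of the cells of a count matrix to 0 and remember which cells were masked.
mask_cells <- function(x, fraction, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n_masked <- round(length(x) * fraction)
  masked_idx <- sample(length(x), size = n_masked, replace = FALSE)
  x_masked <- x
  x_masked[masked_idx] <- 0
  list(x = x_masked, masked = masked_idx)
}

# Simulated toy count matrix: 20 rows by 10 columns, strictly positive counts.
set.seed(1)
toy <- matrix(rpois(200, lambda = 5) + 1, nrow = 20, ncol = 10)

# Mask 5% of the 200 cells, i.e. 10 cells.
cv <- mask_cells(toy, fraction = 0.05, seed = 42)
length(cv$masked)
```

The masked positions are kept so that predictions for exactly those cells can later be compared with the original counts.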

cross_validation_iterations

For each configuration, the signatures are first discovered from a matrix with a percentage of values replaced by 0s. This may result in a poor estimate. This parameter is the number of restarts to be performed to improve this estimate.

cross_validation_repetitions

Number of times cross-validation should be repeated. Higher values result in better estimates, but are computationally expensive.

iterations

Number of iterations to be performed. Each iteration corresponds to a first step where the counts are fitted and a second step where sparsity is enhanced.

max_iterations_lasso

Maximum number of iterations to be performed during the sparsification.

num_processes

Number of processes to be used during parallel execution. If executing in single process mode, this is ignored.

seed

Seed for reproducibility.

verbose

boolean; should all messages be printed?

Value

A list with 3 elements: grid_search, starting_beta and mean_squared_error. Here, grid_search provides all the results of the executions within the grid search; starting_beta is the set of initial values of beta used for each configuration; and mean_squared_error is the mean squared error between the observed counts and the predicted ones for each configuration.
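As a rough illustration of how such an error could be computed and used to pick a configuration, consider the base R sketch below. The `mse` helper, the toy matrices and the `errors` table (rows for K, columns for lambda, with made-up values) are all assumptions for the example; the actual layout of mean_squared_error is not described on this page and should be inspected before adapting this.

```r
# Mean squared error between observed and predicted counts (hypothetical helper).
mse <- function(observed, predicted) {
  mean((observed - predicted)^2)
}

# Toy 2 x 2 matrices; squared differences are 4, 1, 0 and 4, so the mean is 2.25.
observed  <- matrix(c(10, 0, 3, 7), nrow = 2)
predicted <- matrix(c( 8, 1, 3, 9), nrow = 2)
mse(observed, predicted)

# Made-up error table, assumed layout: one row per K, one column per lambda.
errors <- matrix(c(5.1, 4.2, 4.8,   # lambda = 0.1, K = 3, 4, 5
                   4.9, 3.7, 4.0),  # lambda = 0.2, K = 3, 4, 5
                 nrow = 3,
                 dimnames = list(c("3", "4", "5"), c("0.1", "0.2")))

# Locate the configuration with the smallest error.
idx <- arrayInd(which.min(errors), dim(errors))
best_K <- rownames(errors)[idx[1, 1]]
best_lambda <- colnames(errors)[idx[1, 2]]
```

With these made-up values the smallest error is found at K = 4 and lambda = 0.2.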


[Package SparseSignatures version 1.2.0 Index]