M3C {M3C}    R Documentation

M3C: Monte Carlo Consensus Clustering

Description

This function runs M3C, a consensus clustering tool with hypothesis testing. The basic idea is to use a multi-core enabled Monte Carlo simulation to generate a null distribution of stability scores. The Monte Carlo simulation maintains the correlation structure of the input data. The real stability scores are then compared against this null distribution of reference scores, and an empirical p value is calculated for every value of K. We also use the relative cluster stability index (RCSI) as an alternative metric, which is based only on a comparison against the reference mean; its advantage is that it requires fewer iterations. Small p values are estimated cheaply using a beta distribution whose parameters are estimated from the Monte Carlo simulation.
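
As a rough illustration of the testing step described above, the following base R sketch computes an empirical p value, a relative cluster stability index, and a beta-approximated p value for a single value of K. The scores are made up and all variable names are hypothetical. The formulas (a +1 corrected empirical p value, the RCSI as the log ratio of the reference mean to the real score, and a method-of-moments beta fit) are assumptions about one reasonable way to carry out these steps, not a statement of what M3C does internally.

```r
set.seed(123)

# Hypothetical stability scores for one value of K, where lower = more stable.
ref_scores <- rbeta(100, shape1 = 20, shape2 = 30)  # stand-in for the null distribution
real_score <- 0.15                                  # stand-in for the real data score

# Empirical p value: share of reference scores at least as good as the real
# score, with a +1 correction so the estimate is never exactly zero.
p_emp <- (sum(ref_scores <= real_score) + 1) / (length(ref_scores) + 1)

# RCSI (assumed form): log ratio of the reference mean to the real score.
# Positive values mean the real data are more stable than the reference mean.
rcsi <- log(mean(ref_scores) / real_score)

# Method-of-moments beta fit to the reference scores (assumed approach).
m <- mean(ref_scores)
v <- var(ref_scores)
common  <- m * (1 - m) / v - 1
shape_a <- m * common
shape_b <- (1 - m) * common

# Beta-approximated p value: can go below the 1 / (iters + 1) floor of p_emp.
p_beta <- pbeta(real_score, shape_a, shape_b)
```

With only 100 reference scores the empirical p value cannot fall below roughly 0.01, which is why a fitted distribution is useful for estimating smaller p values without running many more iterations.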

Usage

M3C(mydata, montecarlo = TRUE, cores = 1, iters = 100, maxK = 10,
  des = NULL, ref_method = c("reverse-pca", "chol"), repsref = 100,
  repsreal = 100, clusteralg = c("pam", "km", "spectral"),
  distance = "euclidean", pacx1 = 0.1, pacx2 = 0.9, printres = FALSE,
  printheatmaps = FALSE, showheatmaps = FALSE, seed = NULL,
  removeplots = FALSE, dend = FALSE)

Arguments

mydata

Data frame or matrix: contains the data, with samples as columns and features as rows

montecarlo

Logical flag: whether to run the Monte Carlo simulation or not (recommended: TRUE)

cores

Numerical value: how many cores to split the Monte Carlo simulation over (default: 1)
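
One way to choose a value for cores is to query the machine first. The sketch below uses detectCores() from the parallel package that ships with R; the variable names are hypothetical, and leaving one core free is only a common convention, not an M3C requirement.

```r
library(parallel)

# detectCores() may return NA on some platforms, so fall back to a single core.
n_detected <- detectCores()
n_cores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)

# n_cores could then be passed as the cores argument.
```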

iters

Numerical value: how many Monte Carlo iterations to perform (default: 100, recommended: 100-1000)

maxK

Numerical value: the maximum number of clusters to test for, K (default: 10)

des

Data frame: contains annotation data for the input data for automatic reordering (optional)

ref_method

Character string: which reference method to use, either "reverse-pca" or "chol" (recommended: leaving as default)
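
To give a feel for what a correlation-preserving reference dataset looks like, here is a minimal base R sketch that simulates null data with the same feature covariance as a toy input, using a Cholesky factor. This illustrates the general technique only. Whether the "chol" option works this way, and which correlations it preserves, is an assumption, and the toy data and variable names are hypothetical.

```r
set.seed(123)

# Toy input in the documented orientation: 5 features (rows) x 50 samples (columns).
toy <- matrix(rnorm(5 * 50), nrow = 5)
toy[2, ] <- toy[1, ] + rnorm(50, sd = 0.3)  # make features 1 and 2 correlated

x     <- t(toy)      # samples x features
sigma <- cov(x)      # feature covariance to preserve
r     <- chol(sigma) # upper triangular factor with crossprod(r) equal to sigma

# Independent standard normal draws, then imprint the covariance structure.
z   <- matrix(rnorm(nrow(x) * ncol(x)), nrow = nrow(x))
sim <- z %*% r
sim <- sweep(sim, 2, colMeans(x), "+")  # restore the feature means

null_data <- t(sim)  # back to features x samples, no cluster structure by design
```

The simulated data have no built-in clusters, yet features 1 and 2 remain strongly correlated, so clustering such data gives stability scores that reflect correlation structure alone.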

repsref

Numerical value: how many reps to use for the Monte Carlo reference data (recommended: 100)

repsreal

Numerical value: how many reps to use for the real data (recommended: 100)

clusteralg

String: which clustering algorithm M3C should use: "pam", "km", or "spectral" (recommended: leaving as default)

distance

String: which distance metric M3C should use (default: "euclidean", recommended: leaving as default)

pacx1

Numerical value: the first x coordinate for calculating the PAC score from the CDF (default: 0.1)

pacx2

Numerical value: the second x coordinate for calculating the PAC score from the CDF (default: 0.9)
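
The role of pacx1 and pacx2 can be sketched in base R. The PAC (proportion of ambiguous clustering) score is the share of consensus values that fall between the two x coordinates of the empirical CDF. The function pac_score and the toy matrices below are hypothetical, and using only the lower triangle of the consensus matrix is an assumption of this sketch.

```r
set.seed(123)

# Hypothetical consensus matrix for 6 samples: entry [i, j] is the fraction of
# resampling runs in which samples i and j were placed in the same cluster.
cons <- matrix(runif(36), nrow = 6)
cons <- (cons + t(cons)) / 2
diag(cons) <- 1

pac_score <- function(consensus, x1 = 0.1, x2 = 0.9) {
  vals <- consensus[lower.tri(consensus)]  # each pair once, diagonal excluded
  cdf  <- ecdf(vals)
  cdf(x2) - cdf(x1)                        # mass in the ambiguous middle region
}

pac_noisy <- pac_score(cons)

# A perfectly stable clustering yields only 0s and 1s, so nothing is ambiguous.
perfect <- matrix(c(1, 1, 0, 0,
                    1, 1, 0, 0,
                    0, 0, 1, 1,
                    0, 0, 1, 1), nrow = 4)
pac_perfect <- pac_score(perfect)  # 0
```

Narrowing the interval between pacx1 and pacx2 counts fewer pairs as ambiguous, so scores from different settings are not directly comparable.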

printres

Logical flag: whether to print all results into the current directory

printheatmaps

Logical flag: whether to print all the heatmaps into the current directory

showheatmaps

Logical flag: whether to show the heatmaps on screen (can be slow)

seed

Numerical value: fixes the random seed so that results can be reproduced, for example seed = 123 (default: NULL, no seed is set)
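
The effect of fixing a seed can be shown with base R alone. This sketch only demonstrates the general principle with set.seed(); how M3C applies the seed internally, particularly across multiple cores, is not described here.

```r
# Two draws after the same seed are identical.
set.seed(123)
first_draw <- sample(1:100, 5)

set.seed(123)
second_draw <- sample(1:100, 5)

same_result <- identical(first_draw, second_draw)  # TRUE
```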

removeplots

Logical flag: whether to remove all plots (recommended: leaving as default)

dend

Logical flag: whether to compute the dendrogram and p values for the optimal K or not

Value

A list containing: 1) the stability results, 2) all the output data (another list), and 3) the reference stability scores (see the vignette for details on how to access these easily)

Examples

library(M3C)  # mydata and desx are assumed to be the example objects shipped with the package
res <- M3C(mydata, cores = 1, iters = 100, ref_method = "reverse-pca", montecarlo = TRUE,
  printres = FALSE, maxK = 10, showheatmaps = FALSE, repsreal = 100, repsref = 100,
  printheatmaps = FALSE, seed = 123, des = desx)

[Package M3C version 1.2.0 Index]