analyse_stability {TMixClust} | R Documentation |
analyse_stability
Performs multiple clustering runs with TMixClust, measures the agreement
between runs with the Rand index, and returns the clustering solution with
the largest likelihood.
It also produces a plot of the agreement probability between each run and
the run with the maximum likelihood.
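The Rand index between two clusterings is the fraction of item pairs on which both clusterings agree, that is, pairs placed together in both or apart in both. The following is a minimal base R sketch of that definition, not the package's internal implementation, which is not shown here. The names `rand_index`, `run_1` and `run_2` are hypothetical.

```r
# Rand index: share of item pairs treated the same way by both clusterings.
# Hypothetical helper for illustration; not part of TMixClust.
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  # All unordered pairs of item indices, one pair per column
  pairs <- utils::combn(length(a), 2)
  same_a <- a[pairs[1, ]] == a[pairs[2, ]]
  same_b <- b[pairs[1, ]] == b[pairs[2, ]]
  # A pair counts as an agreement when both clusterings put it together
  # or both put it apart
  mean(same_a == same_b)
}

# Two made-up cluster assignments for five time series
run_1 <- c(1, 1, 2, 2, 3)
run_2 <- c(2, 2, 1, 1, 1)
rand_index(run_1, run_2)  # 8 of the 10 pairs agree, so 0.8
```

The index does not depend on how clusters are labelled, so two runs that find the same partition under different cluster numbers still score 1.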
analyse_stability(time_series_df, time_points = seq_len(ncol(time_series_df)), nb_clusters = 2, em_iter_max = 1000, mc_em_iter_max = 10, em_ll_convergence = 0.001, nb_clustering_runs = 3, nb_cores = 1)
time_series_df |
data frame containing the time series. Each row is a time series comprised of the time series name which is also the row name, and the time series values at each time point. |
time_points |
vector containing numeric values for the time points.
Default: seq_len(ncol(time_series_df)). |
nb_clusters |
desired number of clusters. Default is 2. |
em_iter_max |
maximum number of iterations for the expectation-maximization (EM) algorithm. Default: 1000. |
mc_em_iter_max |
maximum number of iterations for Monte-Carlo resampling. Default is 10. |
em_ll_convergence |
convergence threshold for likelihood improvement. Default is 0.001. |
nb_clustering_runs |
number of times the clustering procedure is repeated on the input data. Default is 3. |
nb_cores |
number of cores to be used to run the separate clustering operations in parallel. Default is 1. |
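As an illustration of the expected input layout, the sketch below builds a small data frame with one time series per row, the series names as row names, and a matching `time_points` vector. The values are random and the names `series_1` to `series_4` are made up for the example.

```r
set.seed(1)

# Hypothetical, unevenly spaced measurement times
time_points <- c(0, 2, 4, 8, 12)

# Four made-up time series, one per row, one column per time point
values <- matrix(rnorm(4 * length(time_points)), nrow = 4)
time_series_df <- as.data.frame(values)
rownames(time_series_df) <- paste0("series_", seq_len(4))
colnames(time_series_df) <- paste0("t", time_points)

# time_points needs one entry per column of the data frame
stopifnot(length(time_points) == ncol(time_series_df))
```

A data frame built this way could then be passed as `time_series_df`, with `time_points` supplied explicitly because the measurements are not evenly spaced.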
The TMixClust object with the highest likelihood. As a side effect, the function renders a plot showing the overall distribution of the Rand index, which allows the user to assess clustering stability.
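The selection step can be pictured as taking the run with the maximum likelihood from a list of runs. The sketch below uses plain lists with made-up values. The element name `log_lik` is an assumption for illustration only; `em_cluster_assignment` is the only element name that appears in this page.

```r
# Three made-up clustering runs; `log_lik` is a hypothetical element name
runs <- list(
  list(log_lik = -120.5, em_cluster_assignment = c(1, 1, 2, 2)),
  list(log_lik = -98.2,  em_cluster_assignment = c(1, 2, 2, 2)),
  list(log_lik = -110.0, em_cluster_assignment = c(2, 2, 1, 1))
)

# Collect the likelihood of every run and keep the run with the largest one
log_liks <- vapply(runs, function(run) run$log_lik, numeric(1))
best_run <- runs[[which.max(log_liks)]]
best_run$em_cluster_assignment
```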
Monica Golumbeanu, monica.golumbeanu@bsse.ethz.ch
Golumbeanu M, Desfarges S, Hernandez C, Quadroni M, Rato S, Mohammadi P, Telenti A, Beerenwinkel N, Ciuffi A. (2017) Dynamics of Proteo-Transcriptomic Response to HIV-1 Infection.
# Load the toy time series data provided with the TMixClust package
data(toy_data_df)

# Identify the best clustering solution with 3 clusters
best_clust_obj = analyse_stability(toy_data_df, nb_clusters = 3,
                                   nb_clustering_runs = 4, nb_cores = 1)

# Plot the time series from each cluster
for (i in seq_len(3)) {
    # Extract the time series in the current cluster and plot them
    c_df = toy_data_df[which(best_clust_obj$em_cluster_assignment == i), ]
    plot_time_series_df(c_df, plot_title = paste("cluster", i))
}