xina_clustering {XINA} | R Documentation |
Clustering multiplexed time-series omics data to find co-abundance profiles
xina_clustering(f_names, data_column, out_dir = getwd(), nClusters = 20, norm = "sum_normalization", chosen_model = "")
f_names |
A vector containing input file (.csv) paths |
data_column |
A vector containing the column names (1st row of the input file) of the data matrix |
out_dir |
A directory path for saving clustering results. (default: out_dir=getwd()) |
nClusters |
The desired maximum number of clusters |
norm |
The normalization method. Default is "sum_normalization", which divides each row of the data matrix by its row sum. For more about sum-normalization, see https://www.ncbi.nlm.nih.gov/pubmed/19861354. "zscore" calculates a Z score for each protein. See scale. |
chosen_model |
A specific model to use rather than testing all the models that are available in mclust. See mclustModelNames for the model names. For k-means clustering instead of model-based clustering, use "kmeans" here. |
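The two `norm` options can be illustrated in base R without XINA. This is a minimal sketch, not the package's own code: the matrix `m` and its row and column names are made up for illustration, and the row-wise use of `scale` is an assumption based on the description "Z score for each protein".

```r
# Hypothetical data matrix: rows are proteins, columns are time points
m <- matrix(c( 1,  2,  3,  4,
               4,  3,  2,  1,
              10, 20, 30, 40),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("P1", "P2", "P3"),
                            c("0hr", "1hr", "2hr", "3hr")))

# Sum-normalization: divide every row by its row sum, so each row sums to 1
sum_norm <- m / rowSums(m)

# Z score per protein (assumed row-wise): scale() works on columns,
# so transpose, scale, and transpose back
z_norm <- t(scale(t(m)))

print(sum_norm)
print(z_norm)
```

After sum-normalization, P1 and P3 have the same profile even though their raw abundances differ tenfold, which is why normalized profiles are suitable for finding co-abundance patterns.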
A BIC plot in the current working directory and a list containing the information below:
Item | Description |
clusters | XINA clustering results |
aligned | XINA clustering results aligned by ID |
data_column | Data matrix column names |
out_dir | The directory path containing XINA results |
nClusters | The number of clusters desired by user |
max_cluster | The number of clusters optimized by BIC |
chosen_model | The covariance model used for model-based clustering |
optimal_BIC | BIC of the optimized covariance model |
condition | Experimental conditions of the user input data |
color_for_condition | Colors assigned to each experimental condition, which are used for the condition composition plot |
color_for_clusters | Colors assigned to each cluster, which are used for the XINA clustering plot |
norm_method | The normalization method used |
# Generate random multiplexed time-series data
random_data_info <- make_random_xina_data()

# Data files
data_files <- paste(random_data_info$conditions, ".csv", sep='')

# Time points of the data matrix
data_column <- random_data_info$time_points

# mclust requires a fixed random seed to reproduce the clustering results
set.seed(0)

# Run the model-based clustering to find co-abundance profiles
example_clusters <- xina_clustering(data_files, data_column=data_column, nClusters=30)

# Run k-means clustering to find co-abundance profiles
example_clusters <- xina_clustering(data_files, data_column=data_column, nClusters=30,
chosen_model="kmeans")
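To show what clustering sum-normalized profiles looks like outside XINA, the following sketch uses `kmeans` from base R's stats package on made-up data. It does not reproduce what `xina_clustering` does internally; the toy profiles, the choice of two centers, and `nstart = 10` are all assumptions for illustration.

```r
# Fix the seed so the random toy data and the k-means starts are reproducible
set.seed(0)

# Hypothetical toy data: 20 rising and 20 falling profiles over four time points
rising  <- t(replicate(20, c(1, 2, 3, 4) + runif(4, 0, 0.2)))
falling <- t(replicate(20, c(4, 3, 2, 1) + runif(4, 0, 0.2)))
toy <- rbind(rising, falling)

# Sum-normalize each profile (row) so that it sums to 1
toy_norm <- toy / rowSums(toy)

# k-means with two centers; nstart repeats the random initialization
fit <- kmeans(toy_norm, centers = 2, nstart = 10)

# Cluster sizes and the mean profile of each cluster
print(table(fit$cluster))
print(fit$centers)
```

Each row of `fit$centers` is the average normalized profile of one cluster, which is the toy analogue of a co-abundance profile.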