run_SMC {YAPSA} | R Documentation |
run_SMC takes as input a big dataframe constructed from a vcf-like file of a whole cohort. This wrapper function calls custom functions to construct a mutational catalogue and stratify it according to categories indicated by a special column in the input dataframe:
adjust_number_of_columns_in_list_of_catalogues
This stratification yields a collection of stratified mutational catalogues, which are reformatted and sent to the custom function SMC, and thus indirectly to LCD_SMC, to perform a signature analysis of the stratified mutational catalogues. The result is then handed over to plot_SMC for visualization.
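The stratification step described above can be illustrated with a minimal base R sketch. It is not YAPSA's implementation: the toy table, its column names (PID, motif, stratum) and the precomputed motif labels are all hypothetical, and in the real workflow the motif context is extracted from the reference genome rather than supplied by hand.

```r
# Toy stand-in for my_table: one row per variant (all values hypothetical).
toy_table <- data.frame(
  PID     = c("P1", "P1", "P2", "P2", "P2", "P3"),
  motif   = c("C>A ACA", "C>T TCG", "C>A ACA", "C>T TCG", "C>T TCG", "C>A ACA"),
  stratum = c("small", "big", "small", "small", "big", "big"),
  stringsAsFactors = FALSE
)

# Fix the factor levels so that every stratum yields a matrix of the same shape,
# even if a motif or a PID does not occur in that stratum.
toy_table$motif <- factor(toy_table$motif)
toy_table$PID   <- factor(toy_table$PID)

# Split the table by stratum, then count variants per motif (rows) and PID (columns).
strata_tables  <- split(toy_table, toy_table$stratum)
toy_catalogues <- lapply(strata_tables, function(df) {
  unclass(table(df$motif, df$PID))
})
```

Each element of toy_catalogues is a motif-by-PID count matrix for one stratum, and the counts of all strata together add up to the number of variants in the table.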
run_SMC(my_table, this_signatures_df, this_signatures_ind_df,
  this_subgroups_df, column_name, refGenome,
  cohort_method_flag = "all_PIDs",
  in_strata_order_ind = seq_len(length(unique(my_table[, column_name]))),
  wordLength = 3, verbose_flag = 1, target_dir = NULL,
  strata_dir = NULL, output_path = NULL, in_all_exposures_df = NULL,
  in_rownames = c(), in_norms = NULL, in_label_orientation = "turn",
  this_sum_ind = NULL)
my_table |
A big dataframe constructed from a vcf-like file of a whole cohort. The first columns are those of a standard vcf file, followed by an arbitrary number of custom or user-defined columns. One of these must carry a PID (patient or sample identifier) and one must be the category used for stratification. |
this_signatures_df |
A numeric data frame |
this_signatures_ind_df |
A data frame containing meta information about the signatures |
this_subgroups_df |
A data frame indicating which PID (patient or sample identifier) belongs to which subgroup |
column_name |
Name of the column in my_table that holds the category used for stratification |
refGenome |
FaFile of the reference genome to extract the motif context of the variants in my_table |
cohort_method_flag |
One or several of the supported flag values, such as "all_PIDs" (the default) or "norm_PIDs" |
in_strata_order_ind |
Index vector defining reordering of the strata |
wordLength |
Integer number defining the length of the features or motifs, e.g. 3 for triplets or 5 for pentamers |
verbose_flag |
Verbose if set to 1 (the default) |
target_dir |
Path to directory where the results of the stratification procedure are going to be stored if non-NULL. |
strata_dir |
Path to directory where the mutational catalogues of the different strata are going to be stored if non-NULL |
output_path |
Path to directory where the results, especially the figures produced by plot_SMC, are going to be stored. |
in_all_exposures_df |
Optional argument, if specified, |
in_rownames |
Optional parameter to specify rownames of the mutational catalogue |
in_norms |
If specified, vector of the correction factors for every motif due to differing trinucleotide content. If null, no correction is applied. |
in_label_orientation |
Whether or not to turn the labels on the x-axis. |
this_sum_ind |
Optional set of indices for reordering the PIDs |
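To make the default of in_strata_order_ind concrete, the following sketch evaluates the expression from the usage line on a toy table. The table and its values are hypothetical; how the indices are mapped onto strata internally is not stated on this page, so the reversed vector only shows what a non-default index vector could look like.

```r
# Hypothetical toy table; only the stratification column matters here.
toy_table <- data.frame(
  PID        = c("P1", "P2", "P3", "P4"),
  random_cat = c("small", "big", "intermediate", "small"),
  stringsAsFactors = FALSE
)
column_name <- "random_cat"

# Default from the usage line: the identity ordering 1, 2, ..., s,
# where s is the number of distinct strata in the chosen column.
default_order_ind <- seq_len(length(unique(toy_table[, column_name])))

# A custom index vector is a permutation of the same indices, e.g. reversed.
custom_order_ind <- rev(default_order_ind)
```

With three distinct categories in the column, the default is the integer vector 1, 2, 3.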
A list with entries exposures_list, catalogues_list, cohort and name_list.
exposures_list: The list of s strata-specific exposures H_i; all are numerical data frames with l rows and m columns, l being the number of signatures and m being the number of samples
catalogues_list: A list of s strata-specific cohort-wide (i.e. averaged over cohort) normalized exposures
cohort: subgroups_df adjusted for plotting
name_list: Names of the constructed strata.
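The documented shape of exposures_list can be checked with a few lines of base R. The sketch below builds a mock list rather than calling run_SMC, so the sizes, the signature and PID labels, and the use of stratum names as list names are all assumptions made for illustration.

```r
# Hypothetical sizes: s strata, l signatures, m samples.
s <- 3; l <- 4; m <- 5
stratum_names <- c("small", "intermediate", "big")

# Mock exposures_list: s numeric data frames, each with l rows and m columns.
mock_exposures_list <- lapply(seq_len(s), function(i) {
  as.data.frame(matrix(0, nrow = l, ncol = m,
                       dimnames = list(paste0("sig", seq_len(l)),
                                       paste0("PID", seq_len(m)))))
})
names(mock_exposures_list) <- stratum_names  # naming by stratum is an assumption

# Check that every stratum has the documented l x m shape.
dims_ok <- vapply(mock_exposures_list,
                  function(df) all(dim(df) == c(l, m)),
                  logical(1))
```

The same vapply check could be applied to the exposures_list entry of an actual run_SMC result to confirm that all strata share the same signatures and samples.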
create_mutation_catalogue_from_df
library(BSgenome.Hsapiens.UCSC.hg19)
data(sigs)
data(lymphoma_test)
data(lymphoma_cohort_LCD_results)
# Bin a numeric column into three categories to stratify by
strata_list <-
  cut_breaks_as_intervals(lymphoma_test_df$random_norm,
                          in_outlier_cutoffs=c(-4,4),
                          in_cutoff_ranges_list=list(c(-2.5,-1.5),
                                                     c(0.5,1.5)),
                          in_labels=c("small","intermediate","big"))
lymphoma_test_df$random_cat <- strata_list$category_vector
# Keep only the exposures of the PIDs present in the test data
choice_ind <- (names(lymphoma_Nature2013_COSMIC_cutoff_exposures_df)
               %in% unique(lymphoma_test_df$PID))
lymphoma_test_exposures_df <-
  lymphoma_Nature2013_COSMIC_cutoff_exposures_df[,choice_ind]
temp_subgroups_df <- make_subgroups_df(lymphoma_test_df,
                                       lymphoma_test_exposures_df)
# Run the stratified signature analysis on the new category column
mut_density_list <- run_SMC(lymphoma_test_df,
                            AlexCosmicValid_sig_df,
                            AlexCosmicValid_sigInd_df,
                            temp_subgroups_df,
                            column_name="random_cat",
                            refGenome=BSgenome.Hsapiens.UCSC.hg19,
                            cohort_method_flag="norm_PIDs",
                            in_rownames = rownames(AlexCosmicValid_sig_df))