fcbf {FCBF} | R Documentation |
This function selects variables from a feature table of discrete/categorical variables, given a target class. It is based on the algorithm described in Yu, L. and Liu, H., Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution, Proc. 20th Intl. Conf. Mach. Learn. (ICML-2003), Washington DC, 2003.
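The algorithm has two stages. It first keeps the variables whose symmetrical uncertainty (SU) with the class passes a threshold and ranks them. It then walks down that ranking and removes any variable that is at least as correlated with a higher-ranked variable as it is with the class. The sketch below illustrates both stages in base R under stated assumptions. The helper names (entropy, joint_entropy, symmetrical_uncertainty, fcbf_sketch) are hypothetical, and this is not the FCBF package's implementation. It assumes a matrix of discrete values with variables in rows, samples in columns, and variable names as row names.

```r
# Hypothetical base-R sketch of FCBF (Yu and Liu, 2003); not the FCBF package's code.
# Assumes `features` holds discrete values with variables in rows, samples in
# columns, and rownames holding the variable names.

entropy <- function(x) {
  p <- table(x) / length(x)          # empirical frequencies; empty levels are absent
  -sum(p * log2(p))
}

joint_entropy <- function(x, y) {
  p <- table(x, y) / length(x)
  p <- p[p > 0]                      # skip empty cells so 0 * log2(0) never occurs
  -sum(p * log2(p))
}

# SU(X, Y) = 2 * [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)], ranging from 0 to 1
symmetrical_uncertainty <- function(x, y) {
  hx <- entropy(x)
  hy <- entropy(y)
  if (hx + hy == 0) return(0)        # both variables constant: treat as uninformative
  2 * (hx + hy - joint_entropy(x, y)) / (hx + hy)
}

fcbf_sketch <- function(features, target, minimum_su = 0.25) {
  features <- as.matrix(features)
  # Stage 1 (relevance): SU of every variable with the class, thresholded and ranked
  su_class <- apply(features, 1, symmetrical_uncertainty, y = target)
  ranked <- names(sort(su_class[su_class >= minimum_su], decreasing = TRUE))
  selected <- character(0)
  # Stage 2 (redundancy): the top remaining variable is predominant; drop every
  # lower-ranked variable at least as correlated with it as with the class
  while (length(ranked) > 0) {
    fp <- ranked[1]
    selected <- c(selected, fp)
    rest <- ranked[-1]
    redundant <- vapply(rest, function(fq) {
      symmetrical_uncertainty(features[fp, ], features[fq, ]) >= su_class[[fq]]
    }, logical(1))
    ranked <- rest[!redundant]
  }
  su_class[selected]                 # named SU values of the selected variables
}

# Toy data: gene2 duplicates gene1 under other labels; gene3 is unrelated to the class
target <- factor(c("A", "A", "A", "A", "B", "B", "B", "B"))
features <- rbind(
  gene1 = c("hi", "hi", "hi", "hi", "lo", "lo", "lo", "lo"),
  gene2 = c("up", "up", "up", "up", "dn", "dn", "dn", "dn"),
  gene3 = c("x", "y", "x", "y", "x", "y", "x", "y")
)
result <- fcbf_sketch(features, target)
result   # one of the two duplicate genes, with SU 1; gene3 fails the threshold
```

Because gene1 and gene2 carry identical information, the redundancy stage keeps only one of them, which is the behaviour that distinguishes FCBF from a plain SU threshold.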
fcbf( feature_table, target_vector, minimum_su = 0.25, n_genes_selected_in_first_step = NULL, verbose = FALSE, samples_in_rows = FALSE, balance_classes = FALSE )
feature_table |
A table of discrete features, with one observation in each cell. By default (samples_in_rows = FALSE), variables/genes are in rows and samples in columns. |
target_vector |
A factor containing the class of each observation. Note: the observations must be in the same order as in feature_table. |
minimum_su |
A threshold for the minimum correlation (as measured by symmetrical uncertainty) between each variable and the class. Defaults to 0.25. Note: this value can drastically change the number of selected features. |
n_genes_selected_in_first_step |
Sets the number of genes to be selected in the first step of the algorithm. The final number of selected genes depends on this parameter, but also on the correlation structure of the data. When set, it overrides the minimum_su parameter. It defaults to NULL, in which case minimum_su is used. |
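The two ways of controlling the first step can be contrasted with a small sketch. It assumes, as an interpretation of the description above, that n_genes_selected_in_first_step keeps the n variables with the highest SU regardless of minimum_su. The helper select_first_step and its argument n_selected are hypothetical, not part of the FCBF package, and it works on SU values that have already been computed.

```r
# Hypothetical helper, not part of the FCBF package. `su` is a named numeric
# vector of precomputed SU values between each variable and the class.
select_first_step <- function(su, minimum_su = 0.25, n_selected = NULL) {
  if (is.null(n_selected)) {
    kept <- su[su >= minimum_su]                            # threshold rule
  } else {
    kept <- head(sort(su, decreasing = TRUE), n_selected)   # top-n rule ignores minimum_su
  }
  sort(kept, decreasing = TRUE)   # candidates ranked by SU for the redundancy step
}

su <- c(gene1 = 0.9, gene2 = 0.1, gene3 = 0.4, gene4 = 0.05)
select_first_step(su)                  # gene1 (0.9), gene3 (0.4)
select_first_step(su, n_selected = 3)  # gene1, gene3, gene2
```

Either way, the second step can only remove candidates, which is why the final number of selected genes is at most the number kept here.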
verbose |
If TRUE, prints additional information while the function runs. Defaults to FALSE. |
samples_in_rows |
A flag indicating that feature_table has samples in rows and variables/genes in columns. Defaults to FALSE. |
balance_classes |
If TRUE, balances the classes in target_vector: from each of the other classes, it samples as many instances as there are in the minority class. The number of samplings is controlled by resampling_number. Defaults to FALSE. |
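The balancing idea can be illustrated by a single round of downsampling: every class is reduced to the size of the minority class, so the minority class is kept whole. The sketch below is an assumption-based illustration only. The helper balanced_indices is hypothetical, it performs one sampling round, and it does not reproduce how the FCBF package repeats or combines samplings.

```r
# Hypothetical helper, not the FCBF package's code: returns sample indices in
# which every class is downsampled to the size of the smallest (minority) class.
balanced_indices <- function(target) {
  target <- droplevels(as.factor(target))     # ignore levels with no samples
  minority_size <- min(table(target))
  idx <- lapply(levels(target), function(lv) {
    members <- which(target == lv)
    # sample.int avoids sample()'s special behaviour on length-one vectors
    members[sample.int(length(members), minority_size)]
  })
  sort(unlist(idx))
}

set.seed(42)                                   # make the random draw reproducible
target <- factor(c("A", "A", "A", "A", "A", "B", "B"))
idx <- balanced_indices(target)
table(target[idx])                             # A: 2, B: 2
```

The returned indices would then subset both the samples of the feature table and the target vector, keeping them in the same order.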
Note: for gene expression data, run discretize_exprs first, since the function expects discrete values.
Returns a data frame with the index of each selected feature (first column) and its symmetrical uncertainty with the class (second column). Variable names are stored in the row names.
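library(FCBF)
data(scDengue)
exprs <- SummarizedExperiment::assay(scDengue, 'logcounts')
# Discretize expression values before running fcbf
discrete_expression <- as.data.frame(discretize_exprs(exprs))
head(discrete_expression[, 1:4])
# Class labels from the 'infection' column of the cell metadata
infection <- SummarizedExperiment::colData(scDengue)
target <- infection$infection
# Select features using a symmetrical uncertainty threshold
fcbf(discrete_expression, target, minimum_su = 0.05, verbose = TRUE)
# Alternatively, fix the number of genes kept in the first step
fcbf(discrete_expression, target, n_genes_selected_in_first_step = 100)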