Main Functions
KBoost(X, TFs, prior_weights, g, v, ite)
Function to infer gene regulatory network from gene expression data.
Input:
- X: an NxG matrix where N is the number of observations and G the number of genes.
- TFs: a vector of numerical indexes of the K genes in X that are TFs (default 1:G).
- prior_weights: a GxK matrix with the prior probabilities of each interaction (default is 0.5 for all values).
- g: a positive scalar that corresponds to the width parameter in the RBF Kernel (default 40).
- v: a positive scalar lower than 1 that is the shrinkage parameter for each boosting iteration (default 0.1).
- ite: an integer that represents the maximum number of iterations (default 3).
Output:
List with the following fields:
- GRN: A matrix with the gene regulatory network.
- GRN_UP: A matrix with the gene regulatory network before the heuristic step of multiplying each column by its variance.
- prior: The prior for the best model at each iteration.
- model: the transcription factors with the highest posteriors at each iteration per gene.
- prior_weights: a GxK matrix with the prior probabilities of each interaction.
- g: a positive scalar that corresponds to the width parameter in the RBF Kernel.
- v: a positive scalar lower than 1 that is the shrinkage parameter for each boosting iteration.
- ite: an integer that represents the maximum number of iterations.
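A minimal usage sketch, assuming the KBoost package is installed. The simulated data and the choice of the first five genes as TFs are illustrative assumptions, not part of the package; the call is skipped when the package is not available.

```r
# Simulated expression data: N = 50 observations, G = 10 genes (illustrative only).
set.seed(1)
N <- 50
G <- 10
X <- matrix(rnorm(N * G), nrow = N, ncol = G)

# Assumption for this sketch: the first 5 genes are the TFs.
TFs <- 1:5

if (requireNamespace("KBoost", quietly = TRUE)) {
  # Arguments follow the signature documented above;
  # g, v and ite are set to their stated defaults.
  res <- KBoost::KBoost(X, TFs = TFs, g = 40, v = 0.1, ite = 3)
  # Inspect the size of the inferred network.
  print(dim(res$GRN))
}
```

Leaving `prior_weights` unset relies on the documented default of 0.5 for every interaction.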
KBoost_human_symbol(X, gen_names, g, v, ite, pos_weight, neg_weight)
Function to infer gene regulatory network from human cell lines or patient samples. This function automatically builds a prior from Gerstein et al. (2012) and uses the list of TFs from Lambert et al. (2018). The gene expression data needs to be a numerical matrix.
Input:
- X: an NxG numeric matrix with the expression values of G genes and N observations. The gene names can be specified as column names.
- gen_names: a set of SYMBOL gene names that correspond to the names of the columns of X. Not required if column names of X are already gene names.
- g: a positive scalar with the width parameter for the RBF kernel. (default = 40).
- v: a number between 0 and 1 with the shrinkage parameter. (default = 0.1).
- ite: an integer with the number of iterations (default = 3).
- pos_weight: the prior weight for edges that were previously found in the Gerstein et al. network (default = 0.6).
- neg_weight: the prior weight for edges that were not found in the Gerstein et al. network (default = 0.5).
Output:
List with the following fields:
- GRN: A matrix with the gene regulatory network.
- GRN_UP: A matrix with the gene regulatory network before the heuristic step of multiplying each column by its variance.
- prior: The prior for the best model at each iteration.
- model: the transcription factors with the highest posteriors at each iteration per gene.
- prior_weights: a GxK matrix with the prior probabilities of each interaction.
- g: a positive scalar that corresponds to the width parameter in the RBF Kernel.
- v: a positive scalar smaller than 1 that is the shrinkage parameter for each boosting iteration.
- ite: an integer that represents the maximum number of iterations.
AUPR_AUROC_matrix(Net, G_mat, auto_remove, TFs, upper_limit)
Function to calculate the AUROC and AUPR of an inferred network against a known (gold standard) network.
Input:
- Net: An inferred network with the predictive probabilities that each transcription factor regulates each gene.
- G_mat: A matrix with the gold standard network.
- auto_remove: TRUE if the auto-regulation is to be discarded.
- TFs: the indexes of the rows of Net that are TFs.
- upper_limit: Max number of edges to use (default = all possible edges).
Output:
List with the following fields:
- AUPR: the area under the precision-recall (PR) curve.
- AUROC: the area under the receiver operating characteristic (ROC) curve.
- th: All the unique values of Net.
- Prec: The precision at each value of th.
- Rec: The recall at each value of th.
- FPR: The false positive rate at each value of th.
- TP: The true positives at each value of th.
- FP: The false positives at each value of th.
- TN: The true negatives at each value of th.
- FN: The false negatives at each value of th.
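The quantities listed above can be sketched in base R. This is an illustrative computation, not the package's implementation: the helper name `pr_roc_sketch`, the toy vectors and the trapezoidal integration are assumptions of this sketch.

```r
# Hypothetical helper: confusion counts, precision, recall and FPR at each
# unique threshold, with areas computed by the trapezoidal rule.
pr_roc_sketch <- function(scores, truth) {
  th <- sort(unique(scores), decreasing = TRUE)
  # An edge is predicted when its score is at least the threshold.
  TP <- sapply(th, function(t) sum(scores >= t & truth == 1))
  FP <- sapply(th, function(t) sum(scores >= t & truth == 0))
  FN <- sum(truth == 1) - TP
  TN <- sum(truth == 0) - FP
  Prec <- TP / (TP + FP)
  Rec <- TP / (TP + FN)
  FPR <- FP / (FP + TN)
  trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
  # Anchor the ROC curve at (0, 0); anchor the PR curve at recall 0
  # using the first observed precision (a choice made for this sketch).
  AUROC <- trapz(c(0, FPR), c(0, Rec))
  AUPR <- trapz(c(0, Rec), c(Prec[1], Prec))
  list(AUPR = AUPR, AUROC = AUROC, th = th, Prec = Prec, Rec = Rec,
       FPR = FPR, TP = TP, FP = FP, TN = TN, FN = FN)
}

# Toy example: six candidate edges, three of them in the gold standard.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4)
truth <- c(1, 1, 0, 1, 0, 0)
out <- pr_roc_sketch(scores, truth)
```

In this toy example eight of the nine positive-negative pairs are ranked correctly, so `out$AUROC` equals 8/9.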
d4_mfac(v, g, ite)
Function to produce the KBoost AUPR and AUROC results on the DREAM4 Multifactorial Challenge.
Input:
- g: a number larger than 0 that is the width parameter for the RBF Kernel
- v: a number between 0 and 1 that is the shrinkage parameter
- ite: an integer with number of iterations.
Output:
- auprs: a matrix with the AUPR per D4 multifactorial dataset.
- aurocs: a matrix with the AUROC per D4 multifactorial dataset.
get_prior_Gerstein(gen_names, TFs, pos_weight, neg_weight)
Function to build a prior from the ChIP-Seq network previously published by Gerstein et al. (2012).
Input:
- gen_names: the gene names of the G genes in the user’s subset in Symbol nomenclature.
- TFs: the indexes of the K genes in the user’s subset which are TFs.
- pos_weight: the prior weight for edges that were previously found in the Gerstein et al. network
- neg_weight: the prior weight for edges that were not found in the Gerstein et al. network
Output:
- prior_weights: a GxK matrix with prior weights that a TF regulates a gene given the network published by Gerstein et al.
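The construction of such a prior can be sketched as follows. This is not the package's code: the gene names and the `known_edges` table are hypothetical, and only the `pos_weight`/`neg_weight` logic is taken from the description above.

```r
# Hypothetical gene set: the first two genes are TFs.
gen_names <- c("TF_A", "TF_B", "GENE_C", "GENE_D")
TFs <- c(1, 2)
pos_weight <- 0.6   # default stated for KBoost_human_symbol
neg_weight <- 0.5   # default stated for KBoost_human_symbol

# Hypothetical table of previously reported TF -> target edges.
known_edges <- data.frame(tf = c("TF_A", "TF_B"),
                          target = c("GENE_C", "TF_A"),
                          stringsAsFactors = FALSE)

# Start with neg_weight everywhere: G rows (genes) by K columns (TFs).
prior_weights <- matrix(neg_weight,
                        nrow = length(gen_names), ncol = length(TFs),
                        dimnames = list(gen_names, gen_names[TFs]))

# Raise the prior to pos_weight for edges present in the reference network.
idx <- cbind(match(known_edges$target, gen_names),
             match(known_edges$tf, gen_names[TFs]))
prior_weights[idx] <- pos_weight
```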
grid_search_kboost(dataset, vs, gs, ite)
Function to perform a grid search and find the best hyperparameters.
Input:
- dataset: One of the three datasets in the package, 1 for IRMA, 2 for DREAM4 multifactorial and 3 for DREAM5.
- vs: The range of values of v. All values need to be between 0 and 1.
- gs: The range of values of g. All values need to be larger than 0.
- ite: An integer that is the number of iterations.
Output:
List with the following fields:
- aurocs: a 3-dimensional array with the AUROCs. Columns correspond to the gs, rows to the vs, and the last dimension to the different datasets within a dataset.
- auprs: a 3-dimensional array with the AUPRs. Columns correspond to the gs, rows to the vs, and the last dimension to the different datasets within a dataset.
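Once such an array is available, the best hyperparameters can be picked by averaging over the last dimension. The sketch below assumes that rows index the vs and columns index the gs; the array is filled with random placeholder numbers, not real results.

```r
vs <- c(0.1, 0.5, 0.9)
gs <- c(10, 40, 100)
n_nets <- 5

# Placeholder standing in for the aurocs array (random numbers, not real results).
set.seed(1)
aurocs <- array(runif(length(vs) * length(gs) * n_nets),
                dim = c(length(vs), length(gs), n_nets))

# Average over the last dimension to get one score per (v, g) pair.
mean_auroc <- apply(aurocs, c(1, 2), mean)

# Locate the cell with the highest mean AUROC.
best <- which(mean_auroc == max(mean_auroc), arr.ind = TRUE)[1, ]
best_v <- vs[best["row"]]
best_g <- gs[best["col"]]
```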
irma_check(g, v, ite)
Function to produce the AUPR and AUROC results on the IRMA datasets.
Input:
- g: a number larger than 0 that is the width parameter for the RBF Kernel
- v: a number between 0 and 1 that is the shrinkage parameter
- ite: an integer with number of iterations.
Output:
- auprs: a matrix with the AUPR per IRMA dataset.
- aurocs: a matrix with the AUROC per IRMA dataset.
net_dist_bin(GRN,TFs,thr)
Function to calculate the shortest distance between nodes.
Input:
- GRN: An inferred network with the predictive probabilities that a transcription factor regulates a gene.
- TFs: A vector with indexes of the rows of GRN which correspond to TFs.
- thr: A scalar between 0 and 1 that is used to select the edges with large posterior probabilities.
Output:
- dist_mat: A matrix with the shortest distances between TFs (columns) and all genes (rows).
Example:
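The idea can be sketched with a breadth-first search in base R. This is an illustration under stated assumptions, not the package's implementation: the helper name `bfs_dist_sketch` and the toy network are hypothetical, unreachable genes are given distance `Inf`, and a TF is only at a finite distance from itself if a self-loop or cycle passes the threshold.

```r
# Hypothetical helper: breadth-first search over the thresholded network.
# GRN is G x K (genes in rows, TFs in columns); TFs gives the row index of each TF.
bfs_dist_sketch <- function(GRN, TFs, thr) {
  G <- nrow(GRN)
  K <- ncol(GRN)
  A <- GRN >= thr                     # A[i, k] is TRUE for a retained edge TF k -> gene i
  dist_mat <- matrix(Inf, nrow = G, ncol = K)
  for (k in seq_len(K)) {
    d <- rep(Inf, G)
    frontier <- which(A[, k])         # genes reached in one step from TF k
    step <- 1
    while (length(frontier) > 0) {
      d[frontier] <- step
      # Only genes that are themselves TFs have outgoing edges.
      tf_cols <- match(frontier, TFs)
      tf_cols <- tf_cols[!is.na(tf_cols)]
      nxt <- integer(0)
      if (length(tf_cols) > 0) {
        nxt <- which(rowSums(A[, tf_cols, drop = FALSE]) > 0)
      }
      frontier <- nxt[is.infinite(d[nxt])]   # keep genes not visited yet
      step <- step + 1
    }
    dist_mat[, k] <- d
  }
  dist_mat
}

# Toy network: 4 genes, genes 1 and 2 are TFs.
GRN <- matrix(c(0.0, 0.9, 0.1, 0.2,    # column 1: TF 1 -> gene 2
                0.0, 0.0, 0.8, 0.1),   # column 2: TF 2 -> gene 3
              nrow = 4)
dist_mat <- bfs_dist_sketch(GRN, TFs = c(1, 2), thr = 0.5)
```

Here TF 1 reaches gene 2 in one step and gene 3 in two steps (through TF 2), while gene 4 is unreachable from either TF.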
net_summary_bin(GRN,TFs,thr,a,b)
Function to summarize the GRN filtered with a threshold.
Input:
- GRN: An inferred network with the predictive probabilities that a transcription factor regulates a gene.
- TFs: A vector with indexes of the rows of GRN which correspond to TFs.
- thr: a scalar between 0 and 1, edges with posterior probabilities lower than thr will be discarded.
- a: a scalar for the Katz and PageRank centrality measures. Default is the inverse of the largest eigenvalue of GRN.
- b: a scalar for the Katz and PageRank centrality measures. Default is 1.
Output:
List with the following fields:
- GRN_table: a sorted table version of the GRN.
- Outdegree: the outdegree of each TF.
- Indegree: the indegree of each gene.
- Close_centr: A matrix with the closeness centrality measure per TF.
Example:
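The degree fields and the edge table can be sketched in base R. This is an illustration, not the package's implementation: the toy network and the column names of the table are assumptions, and the centrality measures are not reproduced here.

```r
# Toy network: 4 genes (rows), 2 TFs (columns); genes 1 and 2 are the TFs.
GRN <- matrix(c(0.0, 0.9, 0.1, 0.7,
                0.6, 0.0, 0.8, 0.1), nrow = 4)
TFs <- c(1, 2)
thr <- 0.5

# Discard edges with posterior probability lower than thr.
A <- GRN >= thr
Outdegree <- colSums(A)   # number of retained targets per TF
Indegree <- rowSums(A)    # number of retained regulators per gene

# Long-format table of retained edges, sorted by decreasing probability.
edges <- which(A, arr.ind = TRUE)
GRN_table <- data.frame(TF = TFs[edges[, "col"]],
                        target = edges[, "row"],
                        weight = GRN[edges])
GRN_table <- GRN_table[order(GRN_table$weight, decreasing = TRUE), ]
```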
net_refine(Net)
Function to do a heuristic post-processing suggested by Slawek and Arodz that improves accuracy. Each column is multiplied by its variance.
Input:
- Net: a GRN with TFs in the columns.
Output:
- Net: a refined GRN.
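The column-wise scaling described above can be sketched in base R; the random matrix is an illustrative placeholder, and the variable names are not taken from the package.

```r
# Placeholder network: 4 genes (rows), 3 TFs (columns).
set.seed(1)
Net <- matrix(runif(12), nrow = 4, ncol = 3)

# Variance of each column (one value per TF).
col_var <- apply(Net, 2, var)

# Multiply every column by its own variance.
Net_refined <- sweep(Net, 2, col_var, `*`)
```

Columns whose scores vary more across genes are weighted up relative to columns with nearly constant scores.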
write_GRN_D4(GRN,TFs, filename)
Function to write output in DREAM4 Challenge Format.
Input:
- GRN: a GxK gene regulatory network.
- TFs: a vector with the indices of the K genes (out of G) that are TFs.
- filename: a string with the name of the file to store the GRN.
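A sketch of writing such a file in base R, assuming the DREAM4 prediction format is a tab-separated edge list (regulator, target, confidence) without a header and sorted by decreasing confidence. The gene labels of the form `G<index>`, the removal of self-regulation and the temporary file name are assumptions of this sketch, not confirmed behaviour of `write_GRN_D4`.

```r
# Toy network: 3 genes (rows), 2 TFs (columns); genes 1 and 2 are the TFs.
GRN <- matrix(c(0.0, 0.9, 0.1,
                0.6, 0.0, 0.8), nrow = 3)
TFs <- c(1, 2)

# One row per (TF, gene) pair; as.vector() unrolls GRN column by column.
edge_list <- data.frame(
  regulator = paste0("G", rep(TFs, each = nrow(GRN))),
  target = paste0("G", rep(seq_len(nrow(GRN)), times = ncol(GRN))),
  confidence = as.vector(GRN)
)

# Assumption: self-regulation is dropped and edges are sorted by confidence.
edge_list <- edge_list[edge_list$regulator != edge_list$target, ]
edge_list <- edge_list[order(edge_list$confidence, decreasing = TRUE), ]

out_file <- tempfile(fileext = ".txt")
write.table(edge_list, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```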