Preprocess_GeneExpression {MethylMix} | R Documentation |
Pre-processes gene expression data from TCGA.
Preprocess_GeneExpression(CancerSite, MAdirectories, MissingValueThresholdGene = 0.3, MissingValueThresholdSample = 0.1)
CancerSite |
character of length 1 with TCGA cancer code. |
MAdirectories |
character vector of directories containing the downloaded data. It can be the object returned by the Download_GeneExpression function. |
MissingValueThresholdGene |
threshold for missing values per gene. Genes with a proportion of NAs greater than this threshold are removed. Default is 0.3. |
MissingValueThresholdSample |
threshold for missing values per sample. Samples with a proportion of NAs greater than this threshold are removed. Default is 0.1. |
Pre-processing consists of removing samples and genes with too many NAs, imputing the remaining NAs, and performing batch correction.
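The missing-value filtering step can be illustrated with a minimal base R sketch. This is not the package's internal code: the toy matrix, the variable names expr, gene_na and sample_na, and the order of the two filters (genes first, then samples) are assumptions made for illustration, and imputation and batch correction are not shown.

```r
# Toy expression matrix: genes in rows, samples in columns (hypothetical data).
set.seed(1)
expr <- matrix(rnorm(40), nrow = 5, ncol = 8,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:8)))
expr["gene2", 1:4] <- NA  # gene2: 4 of 8 values missing
expr["gene4", 8]   <- NA  # gene4: 1 of 8 values missing

# Thresholds set to the documented defaults.
MissingValueThresholdGene   <- 0.3
MissingValueThresholdSample <- 0.1

# Remove genes whose proportion of NAs exceeds the gene threshold.
gene_na <- rowMeans(is.na(expr))
expr <- expr[gene_na <= MissingValueThresholdGene, , drop = FALSE]

# Remove samples whose proportion of NAs exceeds the sample threshold
# (assumed here to be computed on the genes that remain).
sample_na <- colMeans(is.na(expr))
expr <- expr[, sample_na <= MissingValueThresholdSample, drop = FALSE]

dim(expr)
```

In this toy case gene2 is dropped by the gene filter and sample8 by the sample filter, so any NAs left after filtering in real data would be the ones passed on to imputation.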
List with the pre-processed data matrix for cancer and normal samples.
## Not run: 
# Optional register cluster to run in parallel
library(doParallel)
cl <- makeCluster(5)
registerDoParallel(cl)

# Gene expression data for ovarian cancer
cancerSite <- "OV"
targetDirectory <- paste0(getwd(), "/")

# Downloading gene expression data
GEdirectories <- Download_GeneExpression(cancerSite, targetDirectory, TRUE)

# Processing gene expression data
GEProcessedData <- Preprocess_GeneExpression(cancerSite, GEdirectories)

# Saving gene expression processed data
saveRDS(GEProcessedData, file = paste0(targetDirectory, "GE_", cancerSite, "_Processed.rds"))

stopCluster(cl)

## End(Not run)