BatchCorrectedCounts {CountClust} | R Documentation |
This function first converts the count data to log CPM, then fits a linear model with the batch label as a factor. It takes the sum of the intercept, the residuals and the mean batch effect across all batches, then inverse-transforms that sum back to counts to remove the batch effects.
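The steps described above can be sketched in base R as follows. This is a minimal illustration, not the CountClust source: the helper name `batch_correct_sketch`, the log2 CPM convention with a 0.5 pseudo-count, and the reading of "mean batch effect" as the average of the per-batch coefficients (with the reference batch at 0) are all assumptions.

```r
# Hypothetical helper (not part of CountClust): a base-R sketch of the
# log CPM -> linear model -> inverse transform procedure.
batch_correct_sketch <- function(counts, batch_lab) {
  batch <- factor(batch_lab)
  lib_size <- rowSums(counts)  # one library size per sample (row)

  # Assumed convention: log2 counts per million with a 0.5 pseudo-count
  log_cpm <- log2((counts + 0.5) / (lib_size + 1) * 1e6)

  # Fit one linear model per feature (column) with batch as a factor
  corrected_log <- apply(log_cpm, 2, function(y) {
    fit <- lm(y ~ batch)
    # Batch effects relative to the reference level, whose effect is 0
    effects <- c(0, coef(fit)[-1])
    # Intercept + mean batch effect + residuals
    coef(fit)[1] + mean(effects) + residuals(fit)
  })

  # Invert the log CPM transform and round back to non-negative counts
  corrected <- 2^corrected_log * (lib_size + 1) / 1e6 - 0.5
  round(pmax(corrected, 0))
}
```

Because every sample receives the same intercept and the same averaged batch term, only the residuals differ between samples, so the fitted batch-specific shifts no longer appear in the corrected values.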
BatchCorrectedCounts(data, batch_lab, use_parallel = TRUE)
data |
count matrix, with samples along the rows and features along the columns. |
batch_lab |
vector of batch labels, one per sample (row of data). |
use_parallel |
if TRUE, the analysis is run in parallel over features; otherwise the features are processed serially. |
Returns a count matrix with the same dimensions as the input data, corrected for the batch effects given by batch_lab.
# Simulation example
library(CountClust)
library(gtools)  # provides rdirichlet()

N <- 500  # samples
K <- 4    # clusters
G <- 100  # features

# Four batches of equal size
Label.Batch <- c(rep(1, N/4), rep(2, N/4), rep(3, N/4), rep(4, N/4))

# Cluster-by-feature effects
alpha_true <- matrix(rnorm(K * G, 0.5, 1), nrow = K)

# Cluster membership proportions: five blocks of tt*10 samples each
tt <- 10
omega_true <- matrix(rbind(rdirichlet(tt*10, c(3, 4, 2, 6)),
                           rdirichlet(tt*10, c(1, 4, 6, 3)),
                           rdirichlet(tt*10, c(4, 1, 2, 2)),
                           rdirichlet(tt*10, c(2, 6, 3, 2)),
                           rdirichlet(tt*10, c(3, 3, 5, 4))), nrow = N)

# Batch-by-feature effects
B <- max(Label.Batch)
sigmab_true <- 2
beta_true <- matrix(0, B, G)
for (g in 1:G) {
  beta_true[, g] <- rnorm(B, mean = 0, sd = sigmab_true)
}

# Poisson read counts with cluster and batch effects
read_counts <- matrix(0, N, G)
for (n in 1:N) {
  for (g in 1:G) {
    read_counts[n, g] <- rpois(1, omega_true[n, ] %*% exp(alpha_true[, g] + beta_true[Label.Batch[n], g]))
  }
}

batchcorrect_counts <- BatchCorrectedCounts(read_counts, Label.Batch, use_parallel = FALSE)