forwsimDiffExpr {gaga} | R Documentation |
Forward simulation allows to evaluate the expected utility for sequential designs. Here the utility is the expected number of true discoveries minus a sampling cost. The routine simulates future data either from the prior predictive or using a set of pilot data and a GaGa or normal-normal model fit. At each future time point, it computes a summary statistic that will be used to determine when to stop the experiment.
forwsimDiffExpr(fit, x, groups, ngenes, maxBatch, batchSize, fdrmax = 0.05, genelimit, v0thre = 1, B = 100, Bsummary = 100, trace = TRUE, randomSeed, mc.cores=1)
fit |
Either GaGa or MiGaGa fit (object of type |
x |
|
groups |
If |
ngenes |
Number of genes to simulate data for. If |
maxBatch |
Maximum number of batches, i.e. the routine simulates
|
batchSize |
Batch size, i.e. number of observations per group to
simulate at each time point. Defaults to |
fdrmax |
Upper bound on FDR. |
genelimit |
Only the |
v0thre |
Only genes with posterior probability of being equally
expressed < |
B |
Number of forward simulations. |
Bsummary |
Number of simulations for estimating the summary statistic. |
trace |
For |
randomSeed |
Integer value used to set random number generator
seed. Defaults to |
mc.cores |
If |
To improve computational speed hyper-parameters are not re-estimated as new data is simulated.
A data.frame
with the following columns:
simid |
Simulation number. |
j |
Time (sample size). |
u |
Expected number of true positives if we were to stop experimentation at this time. |
fdr |
Expected FDR if we were to stop experimentation at this time. |
fnr |
Expected FNR if we were to stop experimentation at this time. |
power |
Expected power (as estimated by E(TP)/E(positives)) if we were to stop experimentation at this time. |
summary |
Summary statistic: increase in expected true positives if we were to obtain one more data batch. |
David Rossell.
Rossell D., Mueller P. Sequential sample sizes for high-throughput hypothesis testing experiments. http://sites.google.com/site/rosselldavid/home.
Rossell D. GaGa: a simple and flexible hierarchical model for microarray data analysis. Annals of Applied Statistics, 2009, 3, 1035-1051.
plotForwSim
to plot the simulated trajectories,
fitGG
for fitting a GaGa model,
fitNN
for fitting a normal-normal model,
seqBoundariesGrid
for finding the optimal design based
on the forwards simulation output.
powfindgenes
for fixed sample size calculations.
#Simulate data and fit GaGa model set.seed(1) x <- simGG(n=20,m=2,p.de=.5,a0=3,nu=.5,balpha=.5,nualpha=25) gg1 <- fitGG(x,groups=1:2,method='EM') gg1 <- parest(gg1,x=x,groups=1:2) #Run forward simulation fs1 <- forwsimDiffExpr(gg1, x=x, groups=1:2, maxBatch=2,batchSize=1,fdrmax=0.05, B=100, Bsummary=100, randomSeed=1) #Expected number of true positives for each sample size tapply(fs1$u,fs1$time,'mean') #Expected utility for each sample size samplingCost <- 0.01 tapply(fs1$u,fs1$time,'mean') - samplingCost*(0:2) #Optimal sequential design b0seq <- seq(0,20,length=200); b1seq <- seq(0,40,length=200) bopt <-seqBoundariesGrid(b0=b0seq,b1=b1seq,forwsim=fs1,samplingCost=samplingCost,powmin=0) bopt <- bopt$opt plot(fs1$time,fs1$summary,xlab='Additional batches',ylab='E(newly discovered DE genes)') abline(bopt['b0'],bopt['b1']) text(.2,bopt['b0'],'Continue',pos=3) text(.2,bopt['b0'],'Stop',pos=1)