seqFitNullGLMM_SPA {SAIGEgds} | R Documentation |
Fit the null model in the mixed model framework with genetic relationship matrix (GRM).
seqFitNullGLMM_SPA(formula, data, gdsfile, trait.type=c("binary", "quantitative"), sample.col="sample.id", maf=0.005, missing.rate=0.01, max.num.snp=1000000L, variant.id=NULL, inv.norm=TRUE, X.transform=TRUE, tol=0.02, maxiter=20L, nrun=30L, tolPCG=1e-5, maxiterPCG=500L, num.marker=30L, tau.init=c(0,0), traceCVcutoff=0.0025, ratioCVcutoff=0.001, geno.sparse=TRUE, num.thread=1L, model.savefn="", seed=200L, fork.loading=FALSE, verbose=TRUE)
formula |
an object of class |
data |
a data frame for the formulas |
gdsfile |
a SeqArray GDS filename, or a GDS object |
trait.type |
"binary" for binary outcomes, "quantitative" for continuous outcomes |
sample.col |
the column name of sample IDs corresponding to the GDS file |
maf |
minor allele frequency for imported genotypes (checking >= maf),
if |
missing.rate |
threshold of missing rate (checking <= missing.rate),
if |
max.num.snp |
the maximum number of SNPs used, or -1 for no limit |
variant.id |
a list of variant IDs, used to construct GRM |
inv.norm |
if |
X.transform |
if |
tol |
overall tolerance for model fitting |
maxiter |
the maximum number of iterations for model fitting |
nrun |
the number of random vectors in the trace estimation |
tolPCG |
tolerance of PCG iterations |
maxiterPCG |
the maximum number of PCG iterations |
num.marker |
the number of SNPs used to calculate the variance ratio |
tau.init |
a 2-length numeric vector, the initial values for variance
components, tau; for binary traits, the first element is always be set
to 1. if |
traceCVcutoff |
the threshold for coefficient of variation (CV) for the trace estimator, and the number of runs for trace estimation will be increased until the CV is below the threshold |
ratioCVcutoff |
the threshold for coefficient of variation (CV) for estimating the variance ratio, and the number of randomly selected markers will be increased until the CV is below the threshold |
geno.sparse |
if |
num.thread |
the number of threads |
model.savefn |
the filename of model output, R data file '.rda' or '.RData' |
seed |
an integer as a seed for random numbers |
fork.loading |
load genotypes via forking or not; forking processes in Unix can reduce loading time of genotypes, but may double the memory usage; not applicable on Windows |
verbose |
if |
Utilizing the sparse structure of genotypes could significantly improve the computational efficiency of model fitting, but it also increases the memory usage. For more details of SAIGE algorithm, please refer to the SAIGE paper [Zhou et al. 2018] (see the reference section).
Returns a list with the following components:
coefficients |
the beta coefficients for fixed effects; |
tau |
a numeric vector of variance components 'Sigma_E' and 'Sigma_G'; |
linear.predictors |
the linear fit on link scale; |
fitted.values |
fitted values from objects returned by modeling
functions using |
residuals |
residuals; |
cov |
covariance matrix of beta coefficients; |
converged |
whether the model is fitted or not; |
obj.noK |
internal use, returned object from the SPAtest package; |
var.ratio |
a data.frame with columns 'id' (variant.id), 'maf' (minor allele frequency), 'mac' (minor allele count), 'var1' (the variance of score statistic), 'var2' (a variance estimate without accounting for estimated random effects) and 'ratio' (var1/var2, estimated variance ratio for variance approximation); |
trait.type |
either "binary" or "quantitative"; |
sample.id |
the sample IDs used in the model fitting; |
variant.id |
the variant IDs used in the model fitting. |
Xiuwen Zheng
Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, Bastarache LA, Wei WQ, Denny JC, Lin M, Hveem K, Kang HM, Abecasis GR, Willer CJ, Lee S. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet (2018). Sep;50(9):1335-1341.
T Sofer, X Zheng, SM Gogarten, CA Laurie, etc. A fully adjusted two-stage procedure for rank-normalization in genetic association studies. 2019. Genetic Epidemiology 43(3), 263-275
# open a GDS file fn <- system.file("extdata", "grm1k_10k_snp.gds", package="SAIGEgds") gdsfile <- seqOpen(fn) # load phenotype phenofn <- system.file("extdata", "pheno.txt.gz", package="SAIGEgds") pheno <- read.table(phenofn, header=TRUE, as.is=TRUE) head(pheno) # fit the null model glmm <- seqFitNullGLMM_SPA(y ~ x1 + x2, pheno, gdsfile, trait.type="binary") glmm # close the GDS file seqClose(gdsfile)