impute {LEA} | R Documentation |
Impute missing genotypes in a genotype file (lfmm) by using ancestry and genotype frequency estimates from an snmf run. The function generates a new lfmm file. See lfmm.
impute (object, input.file, method, K, run)
object | An snmfProject object. |
input.file | A path (character string) to an input file in lfmm format with missing genotypes. The same input data must be used when generating the snmf object. |
method | A character string: "random" or "mode". With "random", imputation is performed by using the genotype probabilities. With "mode", the most likely genotype is used for matrix completion. |
K | An integer value. The number of ancestral populations. |
run | An integer value. A particular run used for imputation (usually the run number that minimizes the cross entropy criterion). |
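The difference between the two values of method can be illustrated with a minimal base-R sketch. It assumes a hypothetical matrix geno_prob of genotype probabilities for three missing entries (columns for genotypes 0, 1 and 2). The helpers impute_mode and impute_random are illustrative names only and are not part of LEA; how impute derives the probabilities from the snmf run is not shown.

```r
# Hypothetical genotype probabilities for three missing entries.
# Columns correspond to genotypes 0, 1 and 2; each row sums to 1.
geno_prob <- matrix(c(0.70, 0.20, 0.10,
                      0.10, 0.30, 0.60,
                      0.25, 0.50, 0.25),
                    ncol = 3, byrow = TRUE)
genotypes <- 0:2

# "mode": take the most likely genotype for each missing entry.
impute_mode <- function(p) genotypes[apply(p, 1, which.max)]

# "random": draw each genotype from its own probability distribution.
impute_random <- function(p) {
  apply(p, 1, function(row) sample(genotypes, size = 1, prob = row))
}

imputed_mode <- impute_mode(geno_prob)      # 0 2 1
set.seed(1)  # only to make the random draws reproducible
imputed_random <- impute_random(geno_prob)  # varies with the seed
```

With "mode" the result is deterministic for a given run, whereas "random" can return different completions on repeated calls.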
NULL |
The function writes the imputed genotypes in an output file having the "_imputed.lfmm" suffix. |
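As a rough sketch of how the output might be located and checked, the base-R code below builds the output name from a hypothetical input path and counts any remaining missing codes. To stay runnable without LEA it writes a small stand-in file instead of calling impute; the stand-in values, the temporary directory and the use of 9 as the missing-genotype code (taken from the example on this page) are assumptions.

```r
# Hypothetical input path; the output name adds the "_imputed.lfmm" suffix.
input_file <- "genotypes.lfmm"
output_file <- paste0(input_file, "_imputed.lfmm")

# Stand-in for an imputed file so the sketch runs without LEA:
# an lfmm file is a plain-text matrix with one individual per row.
out_path <- file.path(tempdir(), output_file)
write.table(matrix(c(0, 1, 2, 1, 0, 2), nrow = 2), out_path,
            row.names = FALSE, col.names = FALSE)

# Read the matrix back and count entries still coded as missing (9).
imputed <- as.matrix(read.table(out_path))
n_missing <- sum(imputed == 9)
```

In a real analysis the file would be produced by impute and could be read with read.lfmm, as in the example on this page.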
Olivier Francois
### Example of analysis ###

data("tutorial")
# creation of a genotype file with missing genotypes
# The data contain 400 SNPs for 50 individuals.
dat = as.numeric(tutorial.R)
dat[sample(1:length(dat), 100)] <- 9
dat <- matrix(dat, nrow = 50, ncol = 400 )
write.lfmm(dat, "genotypes.lfmm")

################
# running snmf #
################

project.snmf = snmf("genotypes.lfmm", K = 4,
                    entropy = TRUE, repetitions = 10,
                    project = "new")

# select the run with the lowest cross-entropy value
best = which.min(cross.entropy(project.snmf, K = 4))

# Impute the missing genotypes
impute(project.snmf, "genotypes.lfmm", method = 'mode', K = 4, run = best)

# Compare with truth
# Proportion of correct imputation results:
mean( tutorial.R[dat == 9] == read.lfmm("genotypes.lfmm_imputed.lfmm")[dat == 9] )