findgRNAs {CRISPRseek}    R Documentation

Find potential gRNAs

Description

Find potential gRNAs for an input file containing sequences in fasta format

Usage

findgRNAs(inputFilePath, format = "fasta", PAM = "NGG", PAM.size = 3, 
    findPairedgRNAOnly = FALSE, annotatePaired = TRUE, enable.multicore = FALSE,
    n.cores.max = 6, gRNA.pattern = "", gRNA.size = 20, 
    overlap.gRNA.positions = c(17,18), min.gap = 0, max.gap = 20,
    pairOutputFile, name.prefix = "", featureWeightMatrixFile = system.file("extdata", 
        "DoenchNBT2014.csv", package = "CRISPRseek"), baseBeforegRNA = 4, 
        baseAfterPAM = 3, calculategRNAEfficacy = FALSE, efficacyFile,
    PAM.location = "3prime", rule.set = c("Root_RuleSet1_2014", "Root_RuleSet2_2016"))

Arguments

inputFilePath

Sequence input file path or a DNAStringSet object that contains sequences to be searched for potential gRNAs

format

Format of the input file. fasta and fastq are supported; default fasta

PAM

Protospacer-adjacent motif (PAM) sequence after the gRNA, default NGG

PAM.size

PAM length, default 3
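
To illustrate what PAM, PAM.size and gRNA.size describe, the following base R sketch scans the plus strand of a toy sequence for 20-base protospacers followed by an NGG PAM. It is an illustration only, not how findgRNAs is implemented: the helper find_plus_strand_sites and the toy sequence are hypothetical, only the IUPAC code N is expanded, and the minus strand is ignored.

```r
# Hypothetical helper (not part of CRISPRseek): plus-strand scan for a
# protospacer of gRNA.size bases immediately followed by the PAM.
find_plus_strand_sites <- function(seq, PAM = "NGG", gRNA.size = 20) {
  # Expand the IUPAC code N only; other ambiguity codes are not handled here.
  pam_regex <- gsub("N", "[ACGT]", PAM, fixed = TRUE)
  # A zero-width lookahead lets overlapping candidate sites be reported.
  pattern <- sprintf("(?=[ACGT]{%d}%s)", gRNA.size, pam_regex)
  starts <- as.integer(gregexpr(pattern, seq, perl = TRUE)[[1]])
  if (starts[1] == -1L) {
    return(data.frame(start = integer(0), gRNA = character(0),
                      PAM = character(0), stringsAsFactors = FALSE))
  }
  data.frame(
    start = starts,
    gRNA  = substring(seq, starts, starts + gRNA.size - 1L),
    PAM   = substring(seq, starts + gRNA.size,
                      starts + gRNA.size + nchar(PAM) - 1L),
    stringsAsFactors = FALSE
  )
}

# Toy sequence: a 20-base protospacer, then TGG, then four trailing bases.
toy <- "ACGTACGTACGTACGTACGTTGGACGT"
sites <- find_plus_strand_sites(toy)
print(sites)
```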

findPairedgRNAOnly

Whether to search only for paired gRNAs, oriented so that the first gRNA is on the minus strand (the reverse gRNA) and the second is on the plus strand (the forward gRNA). TRUE or FALSE, default FALSE

annotatePaired

Indicate whether to output paired information, default TRUE

enable.multicore

Whether to enable parallel processing, default FALSE. Setting it to TRUE is suggested for very long sequences with many gRNAs.

n.cores.max

Maximum number of cores to use in multicore mode (parallel processing), default 6. Set it to 1 to disable multicore processing for small datasets.

gRNA.pattern

Regular expression or IUPAC Extended Genetic Alphabet string describing the gRNA pattern; the default is no restriction. For example, to require that the gRNA start with GG, set it to ^GG. See help(translatePattern) for the list of IUPAC Extended Genetic Alphabet codes.
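
As a rough illustration of the ^GG restriction, the base R sketch below applies the same regular expression to a few made-up candidate sequences. The candidate sequences are hypothetical, and the sketch only shows the regular-expression semantics; it does not call findgRNAs or translatePattern.

```r
# Made-up 20-base candidates, used only to show what "^GG" selects.
candidates <- c("GGACGTACGTACGTACGTAC",
                "ACGTACGTACGTACGTACGT",
                "GGTTTTACGTACGTACGTAA")

gRNA.pattern <- "^GG"                    # gRNA must start with GG
keep <- grepl(gRNA.pattern, candidates)  # TRUE where the pattern matches
print(candidates[keep])
```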

gRNA.size

The size of the gRNA, default 20

overlap.gRNA.positions

The required overlap positions of gRNA and restriction enzyme cut site, default 17 and 18

min.gap

Minimum distance between two oppositely oriented gRNAs to be valid paired gRNAs. Default 0

max.gap

Maximum distance between two oppositely oriented gRNAs to be valid paired gRNAs. Default 20
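
The min.gap and max.gap bounds can be pictured with the small base R sketch below. The helper is_valid_pair is hypothetical, and the gap is assumed here to be the number of bases strictly between the end of the reverse gRNA and the start of the forward gRNA; the exact coordinates CRISPRseek uses to measure the distance may differ.

```r
# Hypothetical helper (not part of CRISPRseek).
# Assumption: gap = number of bases strictly between the two gRNAs.
is_valid_pair <- function(reverse_end, forward_start,
                          min.gap = 0, max.gap = 20) {
  gap <- forward_start - reverse_end - 1L
  gap >= min.gap & gap <= max.gap
}

adjacent <- is_valid_pair(reverse_end = 50L, forward_start = 51L)  # gap of 0
close_by <- is_valid_pair(reverse_end = 50L, forward_start = 61L)  # gap of 10
too_far  <- is_valid_pair(reverse_end = 50L, forward_start = 80L)  # gap of 29
print(c(adjacent = adjacent, close_by = close_by, too_far = too_far))
```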

pairOutputFile

Output file to which paired gRNA information is written

name.prefix

Prefix used when assigning names to the gRNAs found (gRNA is short for guide RNA). The Usage signature sets the default to the empty string "".

baseBeforegRNA

Number of bases before the gRNA used for calculating gRNA efficiency, default 4 for spCas9. Note that for a PAM located on the 5 prime side, this must be the number of bases before the PAM sequence plus the PAM size.

baseAfterPAM

Number of bases after the PAM used for calculating gRNA efficiency, default 3 for spCas9. Note that for a PAM located on the 5 prime side, this must include the length of the gRNA plus the extended sequence on the 3 prime side.
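
The arithmetic behind baseBeforegRNA and baseAfterPAM can be sketched as follows. The variable names window_3prime, bases_before_PAM and extension_3prime are hypothetical, and the decomposition of the 5 prime values (8 and 23, as used in the cpf1 example in the Examples section) into 4 + 4 and 20 + 3 is an assumption that happens to be consistent with that example.

```r
# 3 prime PAM (spCas9 defaults from the Usage section)
gRNA.size      <- 20
PAM.size       <- 3
baseBeforegRNA <- 4
baseAfterPAM   <- 3
# Bases before the gRNA + gRNA + PAM + bases after the PAM
window_3prime <- baseBeforegRNA + gRNA.size + PAM.size + baseAfterPAM

# 5 prime PAM (TTTN, PAM.size = 4, as in the cpf1 example)
PAM.size.5prime  <- 4
bases_before_PAM <- 4   # assumption: 4 bases upstream of the PAM
extension_3prime <- 3   # assumption: 3 bases beyond the gRNA
baseBeforegRNA.5prime <- bases_before_PAM + PAM.size.5prime
baseAfterPAM.5prime   <- gRNA.size + extension_3prime

print(c(window_3prime, baseBeforegRNA.5prime, baseAfterPAM.5prime))
```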

featureWeightMatrixFile

Feature weight matrix file used for calculating gRNA efficiency. By default, the DoenchNBT2014 weight matrix is used. To use an alternative weight matrix, supply a csv file whose first column contains the significant features and whose second column contains the corresponding weights. See Doench et al., 2014 for details.
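
A minimal base R sketch of the two-column layout described above is shown below. The feature names and weights are placeholders, not real features from Doench et al., 2014, and the presence of a header row is an assumption; compare against the bundled DoenchNBT2014.csv before using a custom file.

```r
# Placeholder features and weights, for layout illustration only.
weights <- data.frame(
  feature = c("FEATURE_A", "FEATURE_B", "FEATURE_C"),
  weight  = c(0.5, -0.25, 0.1),
  stringsAsFactors = FALSE
)

# Write the two-column csv to a temporary file and read it back to check it.
custom_file <- tempfile(fileext = ".csv")
write.csv(weights, custom_file, row.names = FALSE)
check <- read.csv(custom_file, stringsAsFactors = FALSE)
print(check)
```

The resulting path (custom_file here) is what would be passed as featureWeightMatrixFile.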

calculategRNAEfficacy

Whether to calculate gRNA efficacy; default FALSE (efficacy is not calculated)

efficacyFile

File path to write gRNA efficacies

PAM.location

PAM location relative to the gRNA, default 3prime. For example, the spCas9 PAM is located on the 3 prime side, while the cpf1 PAM is located on the 5 prime side.
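
The difference between the two PAM locations can be sketched in base R as follows. The helper extract_guide and both toy sequences are hypothetical; the sketch only shows on which side of the PAM the gRNA lies on the plus strand and is not how findgRNAs is implemented.

```r
# Hypothetical helper (not part of CRISPRseek): return the gRNA adjacent to
# a PAM that starts at position pam_start on the plus strand.
extract_guide <- function(seq, pam_start, PAM.size, gRNA.size = 20,
                          PAM.location = c("3prime", "5prime")) {
  PAM.location <- match.arg(PAM.location)
  if (PAM.location == "3prime") {
    # PAM follows the gRNA, so the gRNA ends just before the PAM.
    substr(seq, pam_start - gRNA.size, pam_start - 1L)
  } else {
    # PAM precedes the gRNA, so the gRNA starts just after the PAM.
    substr(seq, pam_start + PAM.size, pam_start + PAM.size + gRNA.size - 1L)
  }
}

cas9_site <- "ACGTACGTACGTACGTACGTTGG"    # 20-base gRNA, then PAM TGG (NGG)
cpf1_site <- "TTTAACGTACGTACGTACGTACGT"   # PAM TTTA (TTTN), then 20-base gRNA

guide_3prime <- extract_guide(cas9_site, pam_start = 21L, PAM.size = 3L)
guide_5prime <- extract_guide(cpf1_site, pam_start = 1L, PAM.size = 4L,
                              PAM.location = "5prime")
print(c(guide_3prime, guide_5prime))
```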

rule.set

Rule set scoring system used for calculating gRNA efficacy: Root_RuleSet1_2014 or Root_RuleSet2_2016.

Details

If you already have a fasta file that contains a set of potential gRNAs, you can call filtergRNAs directly, although the easiest way is to call the one-stop-shopping function offTargetAnalysis with findgRNAs set to FALSE.

Value

A DNAStringSet of potential gRNAs that can be passed directly to the filtergRNAs function

Note

If the input sequence file contains multiple sequences longer than 300 bp, consider creating one input file per sequence and running offTargetAnalysis on each file separately.

Author(s)

Lihua Julie Zhu

See Also

offTargetAnalysis

Examples

    library(CRISPRseek)

    # Paired gRNAs only, using the default NGG PAM on the 3 prime side
    findgRNAs(inputFilePath = system.file("extdata",
        "inputseq.fa", package = "CRISPRseek"),
        pairOutputFile = "testpairedgRNAs.xls",
        findPairedgRNAOnly = TRUE)

    # cpf1 search: TTTN PAM on the 5 prime side, with efficacy calculation
    findgRNAs(inputFilePath = system.file("extdata",
        "cpf1.fa", package = "CRISPRseek"),
        findPairedgRNAOnly = FALSE,
        pairOutputFile = "testpairedgRNAs-cpf1.xls",
        PAM = "TTTN", PAM.location = "5prime", PAM.size = 4,
        overlap.gRNA.positions = c(19, 23),
        baseBeforegRNA = 8, baseAfterPAM = 23,
        calculategRNAEfficacy = TRUE,
        featureWeightMatrixFile = system.file("extdata",
            "DoenchNBT2014.csv", package = "CRISPRseek"),
        efficacyFile = "testcpf1Efficacy.xls")
 

[Package CRISPRseek version 1.24.0 Index]