collapse.fastq {ORFik} | R Documentation |
For each unique read sequence in a file, collapse all copies into a single record and state in the fasta header how many reads had that sequence. This is usually done after trimming and works best for reads shorter than 50 nt. It is less effective for longer reads, such as 150 bp mRNA-seq.
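The idea of collapsing can be sketched in a few lines of base R. This is only an illustration on a toy character vector, not ORFik's implementation; the names `reads`, `counts` and `collapsed` are made up for the sketch, and ordering by abundance is a choice made here, not something the description above states.

```r
# Toy reads, already trimmed; in practice these would come from a FASTQ file.
reads <- c("ACGTACGT", "TTGACCA", "ACGTACGT", "ACGTACGT", "TTGACCA", "GGGCATT")

# Count how often each distinct sequence occurs, most abundant first.
counts <- sort(table(reads), decreasing = TRUE)

# One record per unique sequence, with its copy number kept alongside it.
collapsed <- data.frame(
  sequence = names(counts),
  n_reads = as.integer(counts),
  stringsAsFactors = FALSE
)
print(collapsed)
```

Six input reads become three records, and the counts still sum to six. Short reads tend to repeat exactly far more often than long ones, which is presumably why the gain is larger for reads under 50 nt.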
collapse.fastq(
  files,
  outdir = file.path(dirname(files[1]), "collapsed"),
  header.out.format = "ribotoolkit",
  compress = FALSE,
  prefix = "collapsed_"
)
files |
paths to fasta / fastq files to collapse. The format is detected per file: a file that does not have a .fastq, .fastq.gz, .fq or .fq.gz extension is treated as a fasta file. |
outdir |
output directory to save files to, default: file.path(dirname(files[1]), "collapsed") |
header.out.format |
character, default "ribotoolkit", else must be "fastx". How the read headers of the output fasta should be formatted. ribotoolkit: ">seq1_x55", sequence 1 has 55 duplicated reads collapsed. fastx: ">1-55", sequence 1 has 55 duplicated reads collapsed. |
compress |
logical, default FALSE |
prefix |
character, default "collapsed_". Prefix added to the name of each output file. |
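The two header styles accepted by header.out.format can be illustrated with a small helper. `make_headers` is a hypothetical function written for this sketch and is not part of ORFik; it only reproduces the naming patterns quoted in the argument description.

```r
# Hypothetical helper, not part of ORFik: builds FASTA headers for
# collapsed reads in the two styles named in the argument description.
make_headers <- function(n_reads, format = c("ribotoolkit", "fastx")) {
  format <- match.arg(format)
  ids <- seq_along(n_reads)
  if (format == "ribotoolkit") {
    paste0(">seq", ids, "_x", n_reads)   # e.g. ">seq1_x55"
  } else {
    paste0(">", ids, "-", n_reads)       # e.g. ">1-55"
  }
}

n_reads <- c(55L, 12L, 3L)  # toy copy numbers for three unique sequences
make_headers(n_reads, "ribotoolkit")
make_headers(n_reads, "fastx")
```

Both styles carry the same two pieces of information, a running sequence index and the number of collapsed reads, so the choice mainly depends on which downstream tool will parse the headers.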
invisible(NULL). The collapsed files are saved to disk in fasta format.
fastq.folder <- tempdir() # <- Your fastq files
infiles <- dir(fastq.folder, "*.fastq", full.names = TRUE)
# collapse.fastq(infiles)
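To try the example without real data, a small FASTQ file can be written first. The sketch below is an assumption-laden illustration: the subdirectory `collapse_demo`, the file name `toy.fastq` and the read contents are invented, and the `grepl` pattern only approximates the extension rule described for the files argument. The call to collapse.fastq stays commented out because it needs ORFik installed.

```r
# Hypothetical scratch directory and file name, used only for this sketch.
fastq.folder <- file.path(tempdir(), "collapse_demo")
dir.create(fastq.folder, showWarnings = FALSE)
toy_file <- file.path(fastq.folder, "toy.fastq")

# Three 4-line FASTQ records; the first and third share a sequence.
writeLines(c(
  "@read1", "ACGTACGT", "+", "IIIIIIII",
  "@read2", "TTGACCA",  "+", "IIIIIII",
  "@read3", "ACGTACGT", "+", "IIIIIIII"
), toy_file)

# The pattern argument of dir() is a regular expression, so anchor it.
infiles <- dir(fastq.folder, "\\.fastq$", full.names = TRUE)

# Approximation of the extension rule from the 'files' argument:
# these suffixes count as FASTQ, anything else would be read as FASTA.
is_fastq <- grepl("\\.(fastq|fq)(\\.gz)?$", infiles)

# collapse.fastq(infiles)  # requires ORFik; commented out as in the example above
```

With the default outdir, the output would be written to a "collapsed" folder next to the input file.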