lineagespot 1.0.0
lineagespot is a framework written in R that aims to identify
SARS-CoV-2 related mutations from a single variant file or a list of
variant files (i.e., files in variant call format, VCF). The method can
facilitate the detection of SARS-CoV-2 lineages in wastewater samples
using next-generation sequencing, and attempts to infer the potential
distribution of the SARS-CoV-2 lineages.
lineagespot is distributed as a Bioconductor
package and requires R (version “4.1”), which can be installed on any
operating system from CRAN, and
Bioconductor (version “3.14”).
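Before installing, it can help to confirm that the running R session meets the stated requirement. The following is a minimal sketch using base R only; it assumes that the “4.1” figure above is a minimum version, and the variable names `required_r` and `r_ok` are illustrative rather than part of lineagespot.

```r
# Assumption: the R version quoted in the requirements is a minimum.
required_r <- "4.1"

# getRversion() returns a version object that can be compared
# directly against a version string.
r_ok <- getRversion() >= required_r

if (!r_ok) {
    message(
        "R ", as.character(getRversion()),
        " is older than ", required_r, "; consider upgrading."
    )
}
```

If `r_ok` is `FALSE`, upgrading R before running the installation commands avoids a failed `BiocManager::install()` call.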
To install the lineagespot package, enter the following commands in
your R session:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("lineagespot")
## Check that you have a valid Bioconductor installation
BiocManager::valid()
Example fastq files are provided through Zenodo. The bioinformatics analysis pipeline used for their pre-processing is provided here.
Once lineagespot is successfully installed, it can be loaded as follows:
library(lineagespot)
lineagespot can be run by calling one function that implements the overall
pipeline:
results <- lineagespot(
    vcf_folder = system.file("extdata", "vcf-files",
                             package = "lineagespot"),
    gff3_path = system.file("extdata", "NC_045512.2_annot.gff3",
                            package = "lineagespot"),
    ref_folder = system.file("extdata", "ref",
                             package = "lineagespot")
)
The function returns three tables:
# overall table
head(results$variants.table)
#>          CHROM POS                         ID   REF  ALT DP AD_ref AD_alt
#> 1: NC_045512.2 328   NC_045512.2;328;ACA;ACCA   ACA ACCA 36     34      1
#> 2: NC_045512.2 355        NC_045512.2;355;C;T     C    T 42     41      1
#> 3: NC_045512.2 366        NC_045512.2;366;C;T     C    T 42     28     14
#> 4: NC_045512.2 401 NC_045512.2;401;CTTAA;CTAA CTTAA CTAA 37     35      2
#> 5: NC_045512.2 406     NC_045512.2;406;AGA;AA   AGA   AA 35     34      1
#> 6: NC_045512.2 421        NC_045512.2;421;C;A     C    A 35     34      1
#>    Gene_Name  Nt_alt AA_alt         AF codon_num                sample
#> 1:     ORF1a  64dupC  Q22fs 0.02777778        21 SampleA_freebayes_ann
#> 2:     ORF1a   90C>T   G30G 0.02380952        30 SampleA_freebayes_ann
#> 3:     ORF1a  101C>T   S34F 0.33333333        34 SampleA_freebayes_ann
#> 4:     ORF1a 138delT  D48fs 0.05405405        46 SampleA_freebayes_ann
#> 5:     ORF1a 142delG  D48fs 0.02857143        47 SampleA_freebayes_ann
#> 6:     ORF1a  156C>A   G52G 0.02857143        52 SampleA_freebayes_ann
# lineages' hits
head(results$lineage.hits)
#>    Gene_Name AA_alt                sample   DP AD_alt        AF lineage
#> 1:         M   I82T SampleC_freebayes_ann 3984   2770 0.6952811    AY.1
#> 2:         N   D63G SampleC_freebayes_ann 2180    787 0.3610092    AY.1
#> 3:         N  R203M SampleC_freebayes_ann 4147   4125 0.9946950    AY.1
#> 4:         N  G215C SampleC_freebayes_ann 4477   2574 0.5749386    AY.1
#> 5:         N  D377Y SampleC_freebayes_ann 4271   1623 0.3800047    AY.1
#> 6:     ORF1a A1306S SampleC_freebayes_ann 2202   1267 0.5753860    AY.1
# lineagespot report
head(results$lineage.report)
#>    lineage                sample     meanAF meanAF_uniq minAF_uniq_nonzero N
#> 1:    AY.1 SampleA_freebayes_ann 0.08333333   0.0000000                 NA 1
#> 2:    AY.1 SampleB_freebayes_ann 0.08333333   0.0000000                 NA 1
#> 3:    AY.1 SampleC_freebayes_ann 0.43162568   0.0000000                 NA 6
#> 4:    AY.2 SampleA_freebayes_ann 0.07692308   0.0000000                 NA 1
#> 5:    AY.2 SampleB_freebayes_ann 0.07692308   0.0000000                 NA 1
#> 6:    AY.2 SampleC_freebayes_ann 0.33117826   0.1198191          0.1594335 4
#>    lineage N. rules lineage prop.
#> 1:               31    0.03225806
#> 2:               31    0.03225806
#> 3:               31    0.19354839
#> 4:               29    0.03448276
#> 5:               29    0.03448276
#> 6:               29    0.13793103
Here is the output of sessionInfo() on the system on which this document was
compiled running pandoc 2.5:
#> R version 4.2.0 RC (2022-04-19 r82224)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] lineagespot_1.0.0 RefManageR_1.3.0  BiocStyle_2.24.0 
#> 
#> loaded via a namespace (and not attached):
#>  [1] MatrixGenerics_1.8.0        Biobase_2.56.0             
#>  [3] httr_1.4.2                  sass_0.4.1                 
#>  [5] bit64_4.0.5                 jsonlite_1.8.0             
#>  [7] bslib_0.3.1                 assertthat_0.2.1           
#>  [9] BiocManager_1.30.17         stats4_4.2.0               
#> [11] BiocFileCache_2.4.0         blob_1.2.3                 
#> [13] BSgenome_1.64.0             GenomeInfoDbData_1.2.8     
#> [15] Rsamtools_2.12.0            yaml_2.3.5                 
#> [17] progress_1.2.2              pillar_1.7.0               
#> [19] RSQLite_2.2.12              lattice_0.20-45            
#> [21] glue_1.6.2                  digest_0.6.29              
#> [23] GenomicRanges_1.48.0        XVector_0.36.0             
#> [25] htmltools_0.5.2             Matrix_1.4-1               
#> [27] plyr_1.8.7                  XML_3.99-0.9               
#> [29] pkgconfig_2.0.3             biomaRt_2.52.0             
#> [31] bookdown_0.26               zlibbioc_1.42.0            
#> [33] purrr_0.3.4                 BiocParallel_1.30.0        
#> [35] tibble_3.1.6                KEGGREST_1.36.0            
#> [37] generics_0.1.2              IRanges_2.30.0             
#> [39] ellipsis_0.3.2              cachem_1.0.6               
#> [41] SummarizedExperiment_1.26.0 GenomicFeatures_1.48.0     
#> [43] BiocGenerics_0.42.0         cli_3.3.0                  
#> [45] magrittr_2.0.3              crayon_1.5.1               
#> [47] memoise_2.0.1               evaluate_0.15              
#> [49] fansi_1.0.3                 xml2_1.3.3                 
#> [51] tools_4.2.0                 data.table_1.14.2          
#> [53] prettyunits_1.1.1           hms_1.1.1                  
#> [55] BiocIO_1.6.0                lifecycle_1.0.1            
#> [57] matrixStats_0.62.0          stringr_1.4.0              
#> [59] S4Vectors_0.34.0            DelayedArray_0.22.0        
#> [61] AnnotationDbi_1.58.0        Biostrings_2.64.0          
#> [63] compiler_4.2.0              jquerylib_0.1.4            
#> [65] GenomeInfoDb_1.32.0         rlang_1.0.2                
#> [67] grid_4.2.0                  RCurl_1.98-1.6             
#> [69] rjson_0.2.21                rappdirs_0.3.3             
#> [71] VariantAnnotation_1.42.0    bitops_1.0-7               
#> [73] rmarkdown_2.14              restfulr_0.0.13            
#> [75] curl_4.3.2                  DBI_1.1.2                  
#> [77] R6_2.5.1                    GenomicAlignments_1.32.0   
#> [79] lubridate_1.8.0             rtracklayer_1.56.0         
#> [81] dplyr_1.0.8                 knitr_1.38                 
#> [83] utf8_1.2.2                  fastmap_1.1.0              
#> [85] bit_4.0.4                   filelock_1.0.2             
#> [87] stringi_1.7.6               parallel_4.2.0             
#> [89] Rcpp_1.0.8.3                vctrs_0.4.1                
#> [91] png_0.1-7                   tidyselect_1.1.2           
#> [93] dbplyr_2.1.1                xfun_0.30
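As a follow-up to the three result tables shown earlier, the sketch below re-derives two of the reported columns from the printed values. It assumes, based only on the numbers in this document and not on the package source, that AF equals AD_alt divided by DP and that `lineage prop.` equals N divided by `lineage N. rules`. The rows are typed in by hand from the output above, and the names `variants`, `report`, `n_rules`, and `prop` are illustrative.

```r
# Rows copied by hand from head(results$variants.table).
variants <- data.frame(
    POS    = c(328, 355, 366),
    DP     = c(36, 42, 42),
    AD_alt = c(1, 1, 14),
    AF     = c(0.02777778, 0.02380952, 0.33333333)
)

# Assumption: allele frequency = alternate-allele depth / total depth.
variants$AF_check <- variants$AD_alt / variants$DP

# Rows copied by hand from head(results$lineage.report).
report <- data.frame(
    lineage = c("AY.1", "AY.1", "AY.2", "AY.2"),
    sample  = c("SampleA_freebayes_ann", "SampleC_freebayes_ann",
                "SampleA_freebayes_ann", "SampleC_freebayes_ann"),
    N       = c(1, 6, 1, 4),
    n_rules = c(31, 31, 29, 29),
    prop    = c(0.03225806, 0.19354839, 0.03448276, 0.13793103)
)

# Assumption: proportion = matched rules / total rules for the lineage.
report$prop_check <- report$N / report$n_rules

# Compare with the printed values, allowing for print rounding.
af_matches <- isTRUE(all.equal(variants$AF, variants$AF_check,
                               tolerance = 1e-6))
prop_matches <- isTRUE(all.equal(report$prop, report$prop_check,
                                 tolerance = 1e-6))
```

Both comparisons agree with the printed output to within rounding, which supports reading AF as a per-variant read fraction and `lineage prop.` as the share of a lineage's rules detected in a sample. The meanAF columns are not reproduced here because their definition cannot be recovered from the printed rows alone.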