seq.hotSPOT 1.6.0
1 Department of Dermatology, Roswell Park Comprehensive Cancer Center, Buffalo, NY
2 Department of Cell Stress Biology, Roswell Park Comprehensive Cancer Center, Buffalo, NY
3 Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY
A detailed description of the methods and application of seq.hotSPOT
Next-generation sequencing is a powerful tool for assessing mutation burden in both healthy and diseased tissues. However, capturing mutation burden in clinically healthy tissues requires deep sequencing. While whole-exome and whole-genome sequencing are popular methods for sequencing cancer samples, it is not economically feasible to sequence large genomic regions at the high depth needed for healthy tissues. It is therefore important to identify the relevant genomic areas so that targeted sequencing panels can be designed around them.
Currently, few resources exist that enable researchers to design their own targeted sequencing panels based on specific biological questions and tissues of interest. seq.hotSPOT may be used in combination with the Bioconductor package RTCGA.mutations, which can pull mutation datasets from the TCGA database for use as input data in seq.hotSPOT functions. This allows users to identify highly mutated regions in a cancer of interest. In addition, the package RTCGA.clinical may be used to identify highly mutated regions in subsets of patients with specific clinical features of interest.
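As a rough sketch of that workflow, the snippet below reshapes a MAF-style mutation table into a data frame with `chr`, `pos`, and `gene` columns. The column names `Chromosome`, `Start_Position`, `Hugo_Symbol`, and `Tumor_Sample_Barcode` are standard MAF fields and are assumed here to be present in the table pulled from TCGA. The helper `maf_to_hotspot_input()`, its `keep_samples` argument, and the toy data are hypothetical and are not part of seq.hotSPOT or the RTCGA packages.

```r
# Hypothetical helper: convert a MAF-style data frame into a
# chr / pos / gene table. Column names are assumed MAF fields.
maf_to_hotspot_input <- function(maf, keep_samples = NULL) {
  # Optionally restrict to a subset of samples, for example barcodes
  # selected beforehand from clinical annotations.
  if (!is.null(keep_samples)) {
    maf <- maf[maf$Tumor_Sample_Barcode %in% keep_samples, , drop = FALSE]
  }
  data.frame(
    chr  = as.character(maf$Chromosome),
    pos  = as.numeric(maf$Start_Position),
    gene = as.character(maf$Hugo_Symbol),
    stringsAsFactors = FALSE
  )
}

# Toy MAF-like table (made-up sample barcodes, for illustration only).
toy_maf <- data.frame(
  Hugo_Symbol          = c("TP53", "TP53", "FAT4"),
  Chromosome           = c("17", "17", "4"),
  Start_Position       = c(7577538, 7577120, 126329603),
  Tumor_Sample_Barcode = c("S1", "S2", "S2"),
  stringsAsFactors     = FALSE
)

# All samples, then only the samples of interest.
all_patients    <- maf_to_hotspot_input(toy_maf)
subset_patients <- maf_to_hotspot_input(toy_maf, keep_samples = "S2")
```

With real TCGA data, matching sample barcodes in the mutation table to patient identifiers in the clinical table may need extra processing that this sketch does not cover.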
seq.hotSPOT provides a resource for designing effective sequencing panels to help improve mutation capture efficacy for ultradeep sequencing projects. Establishing efficient targeted sequencing panels can allow researchers to study mutation burden in tissues at high depth without the economic burden of whole-exome or whole-genome sequencing.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("seq.hotSPOT")
Load seq.hotSPOT:
library(seq.hotSPOT)
The mutation dataset should include two columns containing the chromosome and genomic position of each mutation. The columns should be named “chr” and “pos” respectively. Optionally, the gene names for each mutation may be included under a column named “gene”.
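For users supplying their own data, a table in this layout can be assembled by hand. The sketch below uses illustrative values and a hypothetical helper, `check_mutation_input()`, which is not a seq.hotSPOT function. It only confirms that the required columns are present. The requirement that `pos` be numeric is an assumption made for this example.

```r
# Hand-built mutation table with the required "chr" and "pos" columns
# and the optional "gene" column (illustrative values only).
my_mutations <- data.frame(
  chr  = c("17", "17", "4"),
  pos  = c(7577538, 7577539, 126329603),
  gene = c("TP53", "TP53", "FAT4"),
  stringsAsFactors = FALSE
)

# Hypothetical check (not part of seq.hotSPOT): stop with an error if a
# required column is missing or if positions are not numeric.
check_mutation_input <- function(df) {
  missing_cols <- setdiff(c("chr", "pos"), names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(df$pos)) {
    stop("Column 'pos' must be numeric")
  }
  invisible(TRUE)
}

check_mutation_input(my_mutations)
```

The package also ships an example dataset in the same layout, loaded next.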
data("mutation_data")
head(mutation_data)
#>    chr       pos gene
#> 22   4 126329603 FAT4
#> 23   4 126329653 FAT4
#> 24   4 126329608 FAT4
#> 25   4 126329651 FAT4
#> 26   4 126329633 FAT4
#> 27   4 126329653 FAT4
This algorithm searches the mutational dataset (input) for mutational hotspot regions on each chromosome:
amps <- amp_pool(data = mutation_data, amp = 100)
head(amps)
#>   lowerbound upperbound chromosome count id mut_lowerbound mut_upperbound
#> 1    1803511    1803610          4    17  x        1803553        1803564
#> 2    1806007    1806106          4    11  x        1806047        1806066
#> 3    1808912    1809011          4     6  x        1808958        1808970
#> 4  126329597  126329696          4    34  x      126329601      126329700
#> 5    7577035    7577134         17    10  x        7577058        7577127
#> 6    7577498    7577597         17    38  x        7577537        7577569
fw_bins <- fw_hotspot(bins = amps, data = mutation_data, amp = 100, len = 1000, include_genes = TRUE)
head(fw_bins)
#>     Lowerbound Upperbound Chromosome Mutation Count Cumulative Panel Length
#> 6      7577498    7577597         17             38                     100
#> 4    126329597  126329696          4             34                     200
#> 1      1803511    1803610          4             17                     300
#> 102  120512189  120512288          1             13                     400
#> 2      1806007    1806106          4             11                     500
#> 5      7577035    7577134         17             10                     600
#>     Cumulative Mutations   Gene
#> 6                     38   TP53
#> 4                     72   FAT4
#> 1                     89  FGFR3
#> 102                  102 NOTCH2
#> 2                    113  FGFR3
#> 5                    123   TP53
com_bins <- com_hotspot(fw_panel = fw_bins, bins = amps, data = mutation_data,
                        amp = 100, len = 1000, size = 3, include_genes = TRUE)
head(com_bins)
#>    Lowerbound Upperbound Chromosome Mutation Count Cumulative Panel Length
#> 6     7577498    7577597         17             38                     100
#> 4   126329597  126329696          4             34                     200
#> 1     1803511    1803610          4             17                     300
#> 2     1806007    1806106          4             11                     400
#> 5     7577035    7577134         17             10                     500
#> 47  120497611  120497710          1              8                     600
#>    Cumulative Mutations   Gene
#> 6                    38   TP53
#> 4                    72   FAT4
#> 1                    89  FGFR3
#> 2                   100  FGFR3
#> 5                   110   TP53
#> 47                  118 NOTCH2
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] seq.hotSPOT_1.6.0 BiocStyle_2.34.0 
#> 
#> loaded via a namespace (and not attached):
#>  [1] hash_2.2.6.3        digest_0.6.37       R6_2.5.1           
#>  [4] bookdown_0.41       fastmap_1.2.0       xfun_0.48          
#>  [7] cachem_1.1.0        R.utils_2.12.3      knitr_1.48         
#> [10] htmltools_0.5.8.1   rmarkdown_2.28      lifecycle_1.0.4    
#> [13] cli_3.6.3           R.methodsS3_1.8.2   sass_0.4.9         
#> [16] jquerylib_0.1.4     compiler_4.4.1      R.oo_1.26.0        
#> [19] tools_4.4.1         evaluate_1.0.1      bslib_0.8.0        
#> [22] yaml_2.3.10         BiocManager_1.30.25 jsonlite_1.8.9     
#> [25] rlang_1.1.4