TEKRABber 1.0.1
TEKRABber is used to estimate the correlations between genes and transposable elements (TEs) from RNA-seq data comparing between: (1) Two Species (2) Control vs. Experiment. In the following sections, we will use built-in data to demonstrate how to implement TEKRABber on you own analysis.
To use TEKRABber from your R environment, you need to install it using BiocManager:
install.packages("BiocManager")
BiocManager::install("TEKRABber")library(TEKRABber)
library(SummarizedExperiment) # load it if you are running this tutorialGene and TE expression data are generated from randomly picked brain regions FASTQ files from 10 humans and 10 chimpanzees (Khrameeva E et al., Genome Research, 2020). The values for the first column of gene and TE count table must be Ensembl gene ID and TE name:
# load built-in data
data(speciesCounts)
hmGene <- speciesCounts$hmGene
hmTE <- speciesCounts$hmTE
chimpGene <- speciesCounts$chimpGene
chimpTE <- speciesCounts$chimpTE
# the first column must be Ensembl gene ID for gene, and TE name for TE
head(hmGene)##            Geneid SRR8750453 SRR8750454 SRR8750455 SRR8750456 SRR8750457
## 1 ENSG00000000003        250        267        227        286        128
## 2 ENSG00000000005         13          2         15          9          5
## 3 ENSG00000000419        260        311        159        259        272
## 4 ENSG00000000457         86        131        100         94         80
## 5 ENSG00000000460         21         17         42         33         55
## 6 ENSG00000000938        162         75         95        252        195
##   SRR8750458 SRR8750459 SRR8750460 SRR8750461 SRR8750462
## 1        394        268        102        370        244
## 2          0          1          8          0          2
## 3        408        371        126        211        374
## 4        158        119         46         77         81
## 5         29         50         11         18         20
## 6        137         93        108        197         69In the first step, we use orthologScale() to get orthology information and
calculate the scaling factor between two species. The species name needs to be
the abbreviation of scientific species name used in Ensembl. (Note: (1)This
step queries information using biomaRt and it might
need some time or try different mirrors due to the connections to Ensembl
(2)It might take some time to calculate scaling factor based on
your data size).
# You can use the code below to search for species name
ensembl <- biomaRt::useEnsembl(biomart = "genes")
biomaRt::listDatasets(ensembl)# In order to save time, we have save the data for this tutorial.
data(fetchDataHmChimp)
fetchData <- fetchDataHmChimp
# Query the data and calculate scaling factor using orthologScale():
# fetchData <- orthologScale(
#     geneCountRef = hmGene,
#     geneCountCompare = chimpGene,
#     speciesRef = "hsapiens",
#     speciesCompare = "ptroglodytes"
# )correlation estimation
We use DECorrInputs() to return input files for downstream analysis.
inputBundle <- DECorrInputs(
    orthologTable = fetchData$orthologTable,
    scaleFactor = fetchData$scaleFactor,
    geneCountRef = hmGene,
    geneCountCompare = chimpGene,
    teCountRef = hmTE,
    teCountCompare = chimpTE
)In this step, we need to generate a metadata contain species name
(i.e., human and chimpanzee). The row names need to be same as the DE input
table and the column name must be species (see the example below). Then we
use DEgeneTE() to perform DE analysis. When you are comparing samples between
two species, the parameter expDesign should be TRUE (as default).
meta <- data.frame(
    species = c(rep("human", ncol(hmGene) - 1), 
    rep("chimpanzee", ncol(chimpGene) - 1))
)
meta$species <- factor(meta$species, levels = c("human", "chimpanzee"))
rownames(meta) <- colnames(inputBundle$geneInputDESeq2)
hmchimpDE <- DEgeneTE(
    geneTable = inputBundle$geneInputDESeq2,
    teTable = inputBundle$teInputDESeq2,
    metadata = meta,
    expDesign = TRUE
)Here we use corrOrthologTE() to perform correlation estimation comparing
each ortholog and TE. This is the most time-consuming step if you have large
data. For a quick demonstration, we use a relatively small data. You can
specify the correlation method and adjusted p-value method. The default
methods are Pearson’s correlation and FDR. Note:  For more efficient and
specific analysis, you can subset your data in this step to focus on only the
orthologs and TEs that you are interested in.
# load built-in data
data(speciesCorr)
hmGeneCorrInput <- assay_tekcorrset(speciesCorr, "gene", "human")
hmTECorrInput <- assay_tekcorrset(speciesCorr, "te", "human")
chimpGeneCorrInput <- assay_tekcorrset(speciesCorr, "gene", "chimpanzee")
chimpTECorrInput <- assay_tekcorrset(speciesCorr, "te", "chimpanzee")
hmCorrResult <- corrOrthologTE(
    geneInput = hmGeneCorrInput,
    teInput = hmTECorrInput,
    corrMethod = "pearson",
    padjMethod = "fdr"
)
chimpCorrResult <- corrOrthologTE(
    geneInput = chimpGeneCorrInput,
    teInput = chimpTECorrInput,
    corrMethod = "pearson",
    padjMethod = "fdr"
)
head(hmCorrResult)##          geneName      teName      pvalue        coef      padj
## 1 ENSG00000143125        L1MD 0.271964872  0.38497828 0.9990235
## 2 ENSG00000143125        MSTA 0.335873091  0.34036703 0.9990235
## 3 ENSG00000143125  MLT1N2-int 0.966658172  0.01524552 0.9990235
## 4 ENSG00000143125       LTR57 0.067870603  0.59794954 0.9990235
## 5 ENSG00000143125 HERVK11-int 0.001028058  0.87118988 0.3210294
## 6 ENSG00000143125        LTR5 0.855235258 -0.06647109 0.9990235appTEKRABber():TEKRABber provides an app function for you to
quickly view your result. First, you will need to assign the differentially
expressed orthologs/TEs results, correlation results and metadata as global
variables: appDE, appRef, appCompare and appMeta. See the following
example.
#create global variables for app-use
appDE <- hmchimpDE
appRef <- hmCorrResult
appCompare <- chimpCorrResult
appMeta <- meta # this is the same one in DE analysis
appTEKRABber()In the Expression tab page, (1) you can specify your input gene and
TE. The result will show in box plots with data points in normalized log2
expression level (2) DE analysis result will show in table including
statistical information (3) Correlation result will indicate if these
selected pairs are significantly correlated and the value of correlation
coefficients.
In the Correlation tab page (above figure), you can select your data in
scatter plots in three ways. (1) Specify the data point and it will turn
red in the distribution of your results (2) Show the distribution of all
the data from your correlation results based on their correlation
coefficients and adjusted p-value. The blue vertical dashed line indicates
the boundary
of adjusted p-value is 0.05, and the orange one is for adjusted p-value
0.01 (3) you can click the data points which you are interested in, and
it will be listed in the table. You can also drag a certain area to show
data points in it.
If you want to compare selected genes and TEs (1) from different tissue in same species or (2) control and drug treatment in same tissue in same species, please generate all the input files following the input format. Here we show an example data of prepared input files including expression counts from 10 control and 10 treatment samples. The format of input data: row names should be gene name or id, and column name is your sample id (please see details below).
# load built-in data
data(ctInputDE)
geneInputDE <- ctInputDE$gene
teInputDE <- ctInputDE$te
# you need to follow the input format as below
head(geneInputDE)##                 control_1 control_2 control_3 control_4 control_5 treatment_6
## ENSG00000180263      1470      2072      1864      2238      2246        2599
## ENSG00000185985      1599      1045       946      1642      2199         665
## ENSG00000144355       517       380      1211       812        48         388
## ENSG00000234003         4         4        14        10         5           9
## ENSG00000257342         1         1         1         2         3           3
## ENSG00000223953       259       830       133       258       850         504
##                 treatment_7 treatment_8 treatment_9 treatment_10
## ENSG00000180263        2679        2562        2532         2682
## ENSG00000185985        1023        2477        1423         1731
## ENSG00000144355         275         633          59          248
## ENSG00000234003           4          18          13           22
## ENSG00000257342           0           6           1            5
## ENSG00000223953        1143        1500         498          864For DE analysis in the same species, you also use DEgeneTE() function,
however, you need to set the parameter expDesign to FALSE. You also
need to provide a metadata which this time the column name must be
experiment. See demonstration below:
metaExp <- data.frame(experiment = c(rep("control", 5), rep("treatment", 5)))
rownames(metaExp) <- colnames(geneInputDE)
metaExp$experiment <- factor(
    metaExp$experiment, 
    levels = c("control", "treatment")
)
resultDE <- DEgeneTE(
    geneTable = geneInputDE,
    teTable = teInputDE,
    metadata = metaExp,
    expDesign = FALSE
)For a quick demonstration to perform correlation of genes and TEs in control and treatment sample, we use relatively small input tables which only include 10 genes and 10 TEs.
# load built-in data
data(ctCorr)
geneConCorrInput <- assay_tekcorrset(ctCorr, "gene", "control")
teConCorrInput <- assay_tekcorrset(ctCorr, "te", "control")
geneTreatCorrInput <- assay_tekcorrset(ctCorr, "gene", "treatment")
teTreatCorrInput <- assay_tekcorrset(ctCorr, "te", "treatment")
# you need to follow the input format as below
head(geneConCorrInput)##                 control_1 control_2 control_3 control_4 control_5
## ENSG00000180263      1470      2072      1864      2238      2246
## ENSG00000185985      1599      1045       946      1642      2199
## ENSG00000144355       517       380      1211       812        48
## ENSG00000234003         4         4        14        10         5
## ENSG00000257342         1         1         1         2         3
## ENSG00000223953       259       830       133       258       850controlCorr <- corrOrthologTE(
    geneInput = geneConCorrInput,
    teInput = teConCorrInput,
    corrMethod = "pearson",
    padjMethod = "fdr"
)
treatmentCorr <- corrOrthologTE(
    geneInput = geneTreatCorrInput,
    teInput = teTreatCorrInput,
    corrMethod = "pearson",
    padjMethod = "fdr"
)
head(treatmentCorr)##          geneName teName    pvalue       coef      padj
## 1 ENSG00000180263    ALU 0.7350708 -0.2096204 0.8750842
## 2 ENSG00000180263  AluJb 0.6463988 -0.2814802 0.8729353
## 3 ENSG00000180263  AluJo 0.5879945 -0.3296607 0.8399922
## 4 ENSG00000180263  AluJr 0.6080013 -0.3130670 0.8563399
## 5 ENSG00000180263 AluJr4 0.7158070 -0.2251209 0.8729353
## 6 ENSG00000180263  AluSc 0.7130150 -0.2273721 0.8729353appTEKRABber():appDE <- resultDE
appRef <- controlCorr
appCompare <- treatmentCorr
appMeta <- metaExp
appTEKRABber()sessionInfo()## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SummarizedExperiment_1.26.1 Biobase_2.56.0             
##  [3] GenomicRanges_1.48.0        GenomeInfoDb_1.32.2        
##  [5] IRanges_2.30.0              S4Vectors_0.34.0           
##  [7] BiocGenerics_0.42.0         MatrixGenerics_1.8.0       
##  [9] matrixStats_0.62.0          TEKRABber_1.0.1            
## [11] BiocStyle_2.24.0           
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7           bit64_4.0.5            RColorBrewer_1.1-3    
##  [4] httr_1.4.3             numDeriv_2016.8-1.1    tools_4.2.0           
##  [7] bslib_0.3.1            utf8_1.2.2             R6_2.5.1              
## [10] DBI_1.1.2              colorspace_2.0-3       apeglm_1.18.0         
## [13] tidyselect_1.1.2       DESeq2_1.36.0          bit_4.0.4             
## [16] compiler_4.2.0         cli_3.3.0              DelayedArray_0.22.0   
## [19] bookdown_0.26          sass_0.4.1             scales_1.2.0          
## [22] mvtnorm_1.1-3          genefilter_1.78.0      stringr_1.4.0         
## [25] digest_0.6.29          rmarkdown_2.14         XVector_0.36.0        
## [28] pkgconfig_2.0.3        htmltools_0.5.2        fastmap_1.1.0         
## [31] bbmle_1.0.25           rlang_1.0.2            RSQLite_2.2.14        
## [34] jquerylib_0.1.4        generics_0.1.2         jsonlite_1.8.0        
## [37] BiocParallel_1.30.3    dplyr_1.0.9            RCurl_1.98-1.7        
## [40] magrittr_2.0.3         GenomeInfoDbData_1.2.8 Matrix_1.4-1          
## [43] Rcpp_1.0.8.3           munsell_0.5.0          fansi_1.0.3           
## [46] lifecycle_1.0.1        stringi_1.7.6          yaml_2.3.5            
## [49] MASS_7.3-57            zlibbioc_1.42.0        plyr_1.8.7            
## [52] grid_4.2.0             blob_1.2.3             parallel_4.2.0        
## [55] bdsmatrix_1.3-6        crayon_1.5.1           lattice_0.20-45       
## [58] Biostrings_2.64.0      splines_4.2.0          annotate_1.74.0       
## [61] KEGGREST_1.36.2        locfit_1.5-9.5         knitr_1.39            
## [64] pillar_1.7.0           geneplotter_1.74.0     codetools_0.2-18      
## [67] XML_3.99-0.9           glue_1.6.2             evaluate_0.15         
## [70] BiocManager_1.30.18    png_0.1-7              vctrs_0.4.1           
## [73] gtable_0.3.0           purrr_0.3.4            assertthat_0.2.1      
## [76] cachem_1.0.6           ggplot2_3.3.6          emdbook_1.3.12        
## [79] xfun_0.31              xtable_1.8-4           coda_0.19-4           
## [82] survival_3.3-1         tibble_3.1.7           AnnotationDbi_1.58.0  
## [85] memoise_2.0.1          ellipsis_0.3.2