DTEG.analysis {ORFik}R Documentation

Run differential TE analysis

Description

Creates a total of 3 DESeq models (given x is design argument input (usually stage or condition) and libraryType is RNA-seq and Ribo-seq):
1. Ribo-seq model: design = ~ x (differences between the x groups in Ribo-seq)
2. RNA-seq model: design = ~ x (differences between the x groups in RNA-seq)
3. TE model: design = ~ x + libraryType + libraryType:x (differences between the x and libraryType groups and the interaction between them)
Using an equal reimplementation of the deltaTE algorithm (see reference). You need at least 2 groups and 2 replicates per group. The Ribo-seq counts will be over CDS and RNA-seq over mRNAs, per transcript.

Usage

DTEG.analysis(
  df.rfp,
  df.rna,
  output.dir = paste0(dirname(df.rfp$filepath[1]), "/QC_STATS/"),
  design = "stage",
  p.value = 0.05,
  RFP_counts = countTable(df.rfp, "cds", type = "summarized"),
  RNA_counts = countTable(df.rna, "mrna", type = "summarized"),
  batch.effect = FALSE,
  pairs = combn.pairs(unlist(df.rfp[, design])),
  plot.title = "",
  plot.ext = ".pdf",
  width = 6,
  height = 6,
  dot.size = 0.4,
  relative.name = paste0("DTEG_plot", plot.ext),
  complex.categories = FALSE
)

Arguments

df.rfp

a experiment of Ribo-seq or 80S from TCP-seq.

df.rna

a experiment of RNA-seq

output.dir

output.dir directory to save plots, plot will be named "TE_between". If NULL, will not save.

design

a character vector, default "stage". The columns in the ORFik experiment that represent the comparison contrasts. Usually found in "stage", "condition" or "fraction" column.

p.value

a numeric, default 0.05 in interval (0,1) or "" to not show. What p-value used for the analysis? Will be shown as a caption.

RFP_counts

a SummarizedExperiment, default: countTable(df.rfp, "cds", type = "summarized"), unshifted libraries, all transcripts. If you have pshifted reads and countTables, do: countTable(df.rfp, "cds", type = "summarized", count.folder = "pshifted") Assign a subset if you don't want to analyze all genes. It is recommended to not subset, to give DESeq2 data for variance analysis.

RNA_counts

a SummarizedExperiment, default: countTable(df.rna, "mrna", type = "summarized"), all transcripts. Assign a subset if you don't want to analyze all genes. It is recommended to not subset, to give DESeq2 data for variance analysis.

batch.effect,

logical, default FALSE. If you believe you might have batch effects, set to TRUE, will use replicate column to represent batch effects. Batch effect usually means that you have a strong variance between biological replicates. Check PCA plot on count tables to verify if you need to set it to TRUE.

pairs

list of character pairs, the experiment contrasts. Default: combn.pairs(unlist(df.rfp[, design])

plot.title

title for plots, usually name of experiment etc

plot.ext

character, default: ".pdf". Alternatives: ".png" or ".jpg".

width

numeric, default 6 (in inches)

height

numeric, default 6 (in inches)

dot.size

numeric, default 0.4, size of point dots in plot.

relative.name

character, Default: "DTEG_plot.pdf". Relative name of file to be saved in folder specified in output.dir. Change to .pdf if you want pdf file instead of png.

complex.categories

logical, default FALSE. Seperate into more groups, will add Inverse (opposite diagonal of mRNA abundance) and Expression (only significant mRNA-seq)

Details

The respective groups are defined as this (given a user defined p value, shown here as 0.05):
1. Translation - te.p.adj < 0.05 & rfp.p.adj < 0.05 & rna.p.adj > 0.05
2. mRNA abundance - te.p.adj > 0.05 & rfp.p.adj < 0.05 & rna.p.adj > 0.05
3. Buffering - te.p.adj < 0.05 & rfp.p.adj > 0.05 & rna.p.adj > 0.05

Buffering also adds some close group which are split up if you set complex.categories = TRUE (You will then get in addition) See Figure 1 in the reference article for a clear definition of the groups!
If you do not need isoform variants, subset to longest isoform per gene either before or in the returned object (See examples). If you do not have RNA-seq controls, you can still use DESeq on Ribo-seq alone.
The LFC values are shrunken by lfcShrink(type = "normal").

Remember that DESeq by default can not do global change analysis, it can only find subsets with changes in LFC!

Value

a data.table with 9 columns. (log fold changes, p.ajust values, group, regulation status and gene id)

References

doi: 10.1002/cpmb.108

See Also

Other TE: DTEG.plot(), te.table(), te_rna.plot()

Examples

## Simple example
#df.rfp <- read.experiment("Riboseq")
#df.rna <- read.experiment("RNAseq")
#dt <- DTEG.analysis(df.rfp, df.rna)
## If you want to use the pshifted libs for analysis:
#dt <- DTEG.analysis(df.rfp, df.rna,
#                    RFP_counts = countTable(df.rfp, region = "cds",
#                       type = "summarized", count.folder = "pshifted"))
## Restrict DTEGs by log fold change (LFC):
## subset to abs(LFC) < 1.5 for both rfp and rna
#dt[abs(rfp) < 1.5 & abs(rna) < 1.5, Regulation := "No change"]

## Only longest isoform per gene:
#tx_longest <- filterTranscripts(df.rfp, 0, 1, 0)
#dt <- dt[id %in% tx_longest,]
## Convert to gene id
#dt[, id := txNamesToGeneNames(id, df.rfp)]
## To get by gene symbol, use biomaRt conversion
## To flip directionality of contrast pair nr 2:
#design <- "condition"
#pairs <- combn.pairs(unlist(df.rfp[, design])
#pairs[[2]] <- rev(pars[[2]])
#dt <- DTEG.analysis(df.rfp, df.rna,
#                    RFP_counts = countTable(df.rfp, region = "cds",
#                       type = "summarized", count.folder = "pshifted"),
#                       pairs = pairs)

[Package ORFik version 1.12.13 Index]