SpliceImpactR links differential splicing to
protein-level outcomes, including sequence changes, domain gain/loss,
and potential PPI rewiring.
SpliceImpactR is a self-contained and efficient pipeline designed to identify how alternative RNA processing events affect resulting proteins. After initial alternative RNA processing quantification, the method runs entirely within R and does not require additional external software.
The pipeline begins by importing alternative RNA processing events that differ across conditions. It is built to support highly flexible input, requiring only inclusion and exclusion coordinates for each event isoform. This makes it compatible with both user-defined events and outputs from existing tools.
Custom input functions detect multiple classes of alternative RNA processing, including alternative first exons (AFE), hybrid first exons (HFE), retained introns (RI), skipped exons (SE), alternative 5′ and 3′ splice sites (A5SS and A3SS), mutually exclusive exons (MXE), hybrid last exons (HLE), and alternative last exons (ALE).
SpliceImpactR then performs differential inclusion analysis to identify condition-specific transcript pairs that differ by at least one alternative RNA processing event. These events are mapped to transcript and protein annotations to evaluate their effects on coding potential, sequence similarity, and frameshifts.
The pipeline next uses curated and user-provided protein feature sets to annotate domain-level changes, assess shifts in domain usage across conditions, and integrate domain-domain interaction data, domain-motif interaction data, and protein-protein interaction information to predict downstream effects on protein interaction networks.
Finally, SpliceImpactR performs a global analysis to detect co-regulated splicing events and quantify the relative contribution of each alternative RNA processing type. The pipeline is also available through an R Shiny interface to improve accessibility and ease of use.
SpliceImpactR is built to handle both S4
SpliceImpactResult object with custom accession functions
and standard data.table input / workflow.
SpliceImpactResult objects contain all relevant data along
with detailed sequence information for each event. This object allows
for maximal interoperability with any level of data from this
pipeline.
This vignette uses bundled test data so every core step is runnable.
Install with one of the following methods.
rMATS (replicate Multivariate Analysis of Transcript
Splicing) provides event-level splicing tables.HITindex quantifies first/last exon usage and labels
hybrid exons.PPIDM Protein-Protein Interaction Domain Miner for
domain-domain Interaction derived from PPI and 3did’s domain-domain
interactionsELM Eukaryotic Linear Motif Database for accessing
Short Linear Motif occurences and domain-motif interactionsBiomaRt accesses data from InterPro, PFAM, SignalP,
TMHMM, CDD, Mobidb-litesample_frame).get_protein_features() supports these feature sources: -
interpro: integrated domain/family/superfamily signatures.
- pfam: protein domain HMM families. - cdd:
NCBI Conserved Domain Database annotations. - gene3d:
CATH/Gene3D structural domain annotations. - signalp:
signal peptide calls. - tmhmm: transmembrane helix calls. -
ncoils: coiled-coil region predictions. - seg:
low-complexity segments. - mobidblite: intrinsically
disordered region calls. - elm: short linear motifs (SLiMs;
retrieved from ELM, not BioMart).
For BioMart-backed sources, each feature must provide three
attributes: {feature}, {feature}_start, and
{feature}_end (for example pfam,
pfam_start, pfam_end). You can add non-BioMart
custom features with get_manual_features() (explained
further down in the Custom Input Entry Points section) and merge
everything through get_comprehensive_annotations().
We load test annotations and three representative feature sources
(interpro, signalp, elm) and
combine them. Loading multiple databases in one call is possible, but
not recommended due to timeouts. These are automatically cached and
loaded from cache if the correct signature (organism, version, etc) is
found. BiomaRt can give difficulties in loading these, so this procedure
will sometimes take a few tries. If this is unsuccessful, restart R
and/or try suggested methods related to this issue (eg:
httr::reset_config())
library(SpliceImpactR)
annotation_df <- get_annotation(load = "test")
#> [INFO] Loading bundled test annotation data
interpro_features <- get_protein_features(
biomaRt_databases = c("interpro"),
gtf_df = annotation_df$annotations,
test = TRUE
)
signalp_features <- get_protein_features(
biomaRt_databases = c("signalp"),
gtf_df = annotation_df$annotations,
test = TRUE
)
elm_features <- get_protein_features(
biomaRt_databases = c("elm"),
gtf_df = annotation_df$annotations,
test = TRUE
)
protein_feature_total <- get_comprehensive_annotations(
list(interpro_features, signalp_features, elm_features)
)
exon_features <- get_exon_features(
annotation_df$annotations,
protein_feature_total
)Each sample directory contains event files exported by
rMATS and/or HITindex. The manifest below
points to the bundled test directories. Using this function,
SpliceImpactR will search each directory for the outputs provided by
rMATS and HIT Index output files ending in .AFEPSI,
.ALEPSI, and .exon. Label one condition
case and one condition control.
Required sample manifest columns: - path: per-sample
directory containing splicing output files. - sample_name:
unique sample identifier. - condition: case/control label
used in DI and downstream pairing.
Manifest expectations: - One row per sample. - path
values must point to readable sample directories. - Use exactly two
conditions for this standard workflow (case,
control). - Include replicate samples per condition for
robust differential modeling.
sample_frame <- data.frame(
path = c(
file.path(system.file("extdata", package = "SpliceImpactR"),
"rawData/control_S5/"),
file.path(system.file("extdata", package = "SpliceImpactR"),
"rawData/control_S6/"),
file.path(system.file("extdata", package = "SpliceImpactR"),
"rawData/control_S7/"),
file.path(system.file("extdata", package = "SpliceImpactR"),
"rawData/control_S8/"),
file.path(system.file("extdata", package = "SpliceImpactR"),
"rawData/case_S1/"),
file.path(system.file("extdata", package = "SpliceImpactR"),
"rawData/case_S2/"),
file.path(system.file("extdata", package = "SpliceImpactR"),
"rawData/case_S3/"),
file.path(system.file("extdata", package = "SpliceImpactR"),
"rawData/case_S4/")
),
sample_name = c("S5", "S6", "S7", "S8", "S1", "S2", "S3", "S4"),
condition = c(
"control", "control", "control", "control",
"case", "case", "case", "case"
),
stringsAsFactors = FALSE
)get_splicing_impact)If your inputs already follow the standard layout, run the full workflow with the top-level wrapper and choose table or S4 return mode.
# data.table-first return
out_dt <- get_splicing_impact(
sample_frame = sample_frame,
source_data = "both",
annotation_df = annotation_df,
protein_feature_total = protein_feature_total,
exon_features = exon_features,
return_class = "data.table"
)
# S4-first return (single object carries all slots)
out_s4 <- get_splicing_impact(
sample_frame = sample_frame,
source_data = "both",
annotation_df = annotation_df,
protein_feature_total = protein_feature_total,
exon_features = exon_features,
return_class = "S4"
)For the stepwise workflow in this vignette, load event data explicitly:
data <- get_rmats_hit(
sample_frame,
event_types = c("ALE", "AFE", "MXE", "SE", "A3SS", "A5SS", "RI")
)
#> [INFO] Filtered 0 unannotated ALE/AFE events (465 -> 465)
DT <- data.table::as.data.table(data)
DT[, .(
n_rows = .N,
n_events = data.table::uniqueN(event_id),
n_genes = data.table::uniqueN(gene_id)
)]
#> n_rows n_events n_genes
#> <int> <int> <int>
#> 1: 5863 605 41If rMATS and HITindex outputs are stored in
separate per-sample directories, load them independently and combine on
shared columns. In this test-data example, both manifests point to the
same bundled paths; in real analyses,
sample_frame_rmats$path and
sample_frame_hit$path can be different.
sample_frame_rmats <- sample_frame
sample_frame_hit <- sample_frame
rmats_only <- get_rmats(
load_rmats(
sample_frame_rmats,
use = "JCEC",
event_types = c("MXE", "SE", "A3SS", "A5SS", "RI")
)
)
hit_only <- get_hitindex(
sample_frame_hit,
keep_annotated_first_last = TRUE
)
#> [INFO] Filtered 0 unannotated ALE/AFE events (465 -> 465)
shared_cols <- intersect(names(rmats_only), names(hit_only))
data_separate <- data.table::rbindlist(
list(
rmats_only[, ..shared_cols],
hit_only[, ..shared_cols]
),
use.names = TRUE,
fill = TRUE
)
data_separate[, .(
n_rows = .N,
n_events = data.table::uniqueN(event_id),
n_genes = data.table::uniqueN(gene_id)
)]
#> n_rows n_events n_genes
#> <int> <int> <int>
#> 1: 5863 605 41This summary compares event-level mean HIT values across conditions.
hit_compare <- compare_hit_index(
sample_frame,
condition_map = c(control = "control", test = "case")
)
hit_compare$plotHITindex condition comparison.
head(hit_compare$results[order(fdr)], 6)
#> event_key control test diff_HIT
#> <char> <num> <num> <num>
#> 1: ENSG00000158286|chr1:6210370-6211118 -0.1890000 0.12766667 0.3166667
#> 2: ENSG00000158286|chr1:6211867-6212053 -0.1905000 -0.23400000 -0.0435000
#> 3: ENSG00000158286|chr1:6212218-6212416 -0.1783333 -0.14533333 0.0330000
#> 4: ENSG00000158286|chr1:6212682-6213183 0.4350000 0.04666667 -0.3883333
#> 5: ENSG00000158286|chr1:6218289-6218369 -0.2993333 -0.05766667 0.2416667
#> 6: ENSG00000142599|chr1:8355419-8355599 0.1010000 0.10800000 0.0070000
#> delta_HIT n_control n_test p_value fdr
#> <num> <int> <int> <num> <num>
#> 1: 0.3166667 2 3 1.0000000 1
#> 2: 0.0435000 2 3 1.0000000 1
#> 3: 0.0330000 3 3 0.8337307 1
#> 4: 0.3883333 2 3 1.0000000 1
#> 5: 0.2416667 3 3 0.2567179 1
#> 6: 0.0070000 4 4 0.9338296 1We run quasi-binomial GLMs and filter significant events. Verbose
filtering messages are line-wrapped so progress logs remain readable
without horizontal scrolling. For efficient processing, set
parallel_glm to TRUE and provide a
BiocParallel object to BPPARAM.
Default thresholds in keep_sig_pairs is FDR < 0.05
and |DELTA PSI| > 0.1.
res <- get_differential_inclusion(
data,
min_total_reads = 10,
parallel_glm = TRUE,
BPPARAM = BiocParallel::SerialParam()
)
#> [INFO] Input contains 41 genes, 605 events, and 2932 event instances.
#> [PROCESSING/INFO] Filtering out low-coverage rows removed 892 event instances; 4971
#> remain.
#> [PROCESSING/INFO] Completing AFE/ALE with zeros per sample (total strategy: event_max)
#> filled 407 AFE rows and 160 ALE rows (totals: AFE=608, ALE=424).
#> [PROCESSING/INFO] Filtering genes with no nonzero values per sample/event_type removed
#> 37 genes from specific sample groups; 40 genes remain overall.
#> [PROCESSING/INFO] Filtering by minimum condition presence removed 211 events; 351
#> events remain.
#> [PROCESSING/INFO] Removed 62 non-changing events; 289 events remain.
#> [PROCESSING/INFO] Filtering events not represented in both conditions removed 30
#> events; 259 events remain.
#> [PROCESSING] Fitting quasi-binomial GLMs per site...
#> | | | 0% | |======================================================================| 100%
#>
#> [DONE] Fitted 531 sites in 1 chunks.
#> [INFO] Done.
res_di <- keep_sig_pairs(res)
res[, .(
n_rows = .N,
n_sig = sum(padj <= 0.05 & abs(delta_psi) >= 0.1, na.rm = TRUE)
)]
#> n_rows n_sig
#> <int> <int>
#> 1: 533 73Differential inclusion volcano plot.
Significant DI events are mapped to annotation and sequence data, then collapsed into case/control transcript pairs. This matching is done through a strict hierarchy:
chr, strand, and
gene_id to keep only compatible annotation intervals.inc) coordinates to exons and require
all inclusion parts to be covered for a transcript candidate.exc)
coordinates.first, internal, last),
then reciprocal overlap and intersection width; protein-linked
transcripts are preferred when available.get_pairs(source = "multi")
by joining all positive delta_psi rows (case) with all
negative rows (control) for each event_id, then ordering by
strongest |delta_psi|.matched <- get_matched_events_chunked(
res_di,
annotation_df$annotations,
chunk_size = 2000
)
#> Chunk 1/1: rows 1..94
hits_sequences <- attach_sequences(matched, annotation_df$sequences)
pairs <- get_pairs(hits_sequences, source = "multi")
pairs[, .(
n_pairs = .N,
n_events = data.table::uniqueN(event_id)
)]
#> n_pairs n_events
#> <int> <int>
#> 1: 36 28Once pairing is done, we can probe whether there is a proximal/distal shift in AFE/ALE usage.
Proximal versus distal terminal exon summary.
compare_sequence_frame() computes protein-coding class,
alignment identity, frameshift/rescue status, and length deltas.
Key labels used downstream: - protein_coding: both
isoforms have protein IDs (both sides coding). - onePC:
only one side has a protein ID. - noPC: neither side has a
protein ID. - Match: protein sequences are identical. -
FrameShift: frame is disrupted between paired isoforms.
seq_compare <- compare_sequence_frame(pairs, annotation_df$annotations)
#> [1] "[INFO] Processing 36 transcript and protein sequence alignments, this may take a little..."
#> [1] "[Processing] Identifying frame shifts and rescues"
#> [1] "[INFO] 1 frameshifts (0 rescues) and 27 non-frameshifts were identified, "Protein alignment summary by event type.
Case versus control isoform length comparison.
Background transcript pairs are derived from annotations and compared against observed event-linked domain changes.
bg <- get_background(
source = "annotated",
annotations = annotation_df$annotations,
protein_features = protein_feature_total,
BPPARAM = BiocParallel::SerialParam()
)
hits_domain <- get_domains(seq_compare, exon_features)
hits_domain[, .(
n_hits = .N,
n_domain_changed = sum(diff_n > 0, na.rm = TRUE)
)]
#> n_hits n_domain_changed
#> <int> <int>
#> 1: 36 12enriched_domains <- enrich_domains_hypergeo(
hits = hits_domain,
background = bg,
db_filter = "interpro"
)
domain_plot <- plot_enriched_domains_counts(enriched_domains, top_n = 20)
domain_plotTop enriched InterPro domain differences.
You can also stratify enrichment by event type or by feature database.
enriched_afe_interpro <- enrich_by_event(
hits = hits_domain,
background = bg,
events = "AFE",
db_filter = "interpro"
)
enriched_interpro_only <- enrich_by_db(
hits = hits_domain,
background = bg,
dbs = "interpro"
)
list(
by_event_rows = nrow(enriched_afe_interpro),
by_db_rows = nrow(enriched_interpro_only)
)
#> $by_event_rows
#> [1] 1
#>
#> $by_db_rows
#> [1] 1PPI effects are inferred from domain and motif changes. Domain-domain and domain-motif interactions are sourced from 3did/PPDIM and ELM. PPI rewiring currently only works for human inputs.
ppi <- get_ppi_interactions()
hits_final <- get_ppi_switches(
hits_domain,
ppi,
protein_feature_total
)
#>
ppi_plot <- plot_ppi_summary(hits_final)
ppi_plotSummary of predicted case/control PPI changes.
get_gene_enrichment() extracts foreground genes for
enrichment.
mode = "di" expects a DI-like table
(res).mode = "ppi"/"domain" expects a hits-like
table (hits_final or hits_domain).fg_di <- get_gene_enrichment(
mode = "di",
res = res,
padj_threshold = 0.1,
delta_psi_threshold = 0.1
)
fg_domain <- get_gene_enrichment(mode = "domain", hits = hits_domain)
fg_ppi <- get_gene_enrichment(mode = "ppi", hits = hits_final)
lengths(list(di = fg_di, domain = fg_domain, ppi = fg_ppi))
#> di domain ppi
#> 40 11 6
intersect(fg_di, fg_ppi)
#> [1] "ENSG00000116171" "ENSG00000142599" "ENSG00000140983" "ENSG00000101391"
#> [5] "ENSG00000116406"The optional get_enrichment() step depends on additional
packages and may return no significant terms on small toy datasets. If
you see all NA statistics, foreground genes often map too
sparsely into the selected background and ontology.
p_adjust_cutoff and top_n_plot are set to
allow for example visualization, but should be adjusted to standard (eg:
0.05, 10) for non-toy data uses.
needed <- c("clusterProfiler", "msigdbr", "org.Hs.eg.db")
has_needed <- all(vapply(
needed,
requireNamespace,
FUN.VALUE = logical(1),
quietly = TRUE
))
#>
#>
if (has_needed) {
enrichment_di <- get_enrichment(
foreground = fg_domain,
background = bg$gene_id,
species = "human",
gene_id_type = "ensembl",
sources = "GO:BP",
p_adjust_cutoff = 1, ## CHANGE FOR NON-TOY DATA USAGE, eg: 0.05
top_n_plot = 1 ## CHANGE FOR NON-TOY DATA USAGE, eg: NULL or 10
)
enrichment_di$plot
} else {
message(
"Skipping get_enrichment(): install ",
paste(needed, collapse = ", ")
)
}
#> 'select()' returned 1:1 mapping between keys and columns
#> 'select()' returned 1:1 mapping between keys and columns
#> Warning: semData is not provided. It will be calculated automatically.plot_two_transcripts_with_domains_unified() shows
matched isoforms in either transcript (genome-coordinate) or protein
(amino-acid) view.
viz_pair <- hits_final[
!is.na(transcript_id_case) &
!is.na(transcript_id_control) &
transcript_id_case != "" &
transcript_id_control != ""
][1]
if (nrow(viz_pair) == 0) {
stop("No transcript pairs available for visualization.")
}
tx_pair <- c(viz_pair$transcript_id_case, viz_pair$transcript_id_control)
transcript_centric <- plot_two_transcripts_with_domains_unified(
transcripts = tx_pair,
gtf_df = annotation_df$annotations,
protein_features = protein_feature_total,
feature_db = c("interpro"),
combine_domains = TRUE,
view = "transcript"
)
protein_centric <- plot_two_transcripts_with_domains_unified(
transcripts = tx_pair,
gtf_df = annotation_df$annotations,
protein_features = protein_feature_total,
feature_db = c("interpro"),
combine_domains = TRUE,
view = "protein"
)
transcript_centricTranscript and protein views for one paired event.
Transcript and protein views for one paired event.
probe_individual_event() plots PSI for one event across
samples.
event_to_probe <- res$event_id[1]
probe <- probe_individual_event(data, event = event_to_probe)
probe$plotPSI distribution for one selected event.
head(probe$data)
#> event_id sample condition psi i.condition form
#> <char> <char> <char> <num> <char> <char>
#> 1: A3SS:26 S1 case 0.483 case INC
#> 2: A3SS:26 S2 case 0.483 case INC
#> 3: A3SS:26 S3 case 0.525 case INC
#> 4: A3SS:26 S4 case 0.345 case INC
#> 5: A3SS:26 S5 control 0.208 control INC
#> 6: A3SS:26 S6 control 0.085 control INCintegrated_event_summary() combines sequence, domain,
and PPI summaries.
Panel guide: - Top-left: event classification composition
(Match, FrameShift, etc.) by event type. -
Top-right: alignment score distributions by event type. - Middle-left:
domain-change prevalence (Any, case-only, control-only,
both). - Middle-center/right: fraction and counts of predicted PPI
rewiring. - Bottom-left: relative event retention from pre-filter DI to
final hits. - Bottom-right: gene-level event-type coordination (Jaccard
heatmap).
Integrated multi-layer summary.
hits_final, data,
res)The pipeline returns three core tables in data.table
mode: - data: sample-level event measurements before
differential modeling. - res: differential inclusion
results per tested event/site. - hits_final: paired
case/control isoform effects with sequence, domain, and PPI
annotations.
Suffix convention used throughout: - _case =
case-preferred isoform values. - _control =
control-preferred isoform values.
hits_final (integrated event-level output)Use this table for biological interpretation and downstream plotting.
event_id,
event_type, gene_id, chr,
strand, transcript_id_case,
transcript_id_control, protein_id_case,
protein_id_control, form_case,
form_control, exons_case,
exons_control.inc_case,
inc_control, exc_case,
exc_control, delta_psi_case,
delta_psi_control, p.value_case,
p.value_control, padj_case,
padj_control, n_samples_case,
n_samples_control, n_case_case,
n_case_control, n_control_case,
n_control_control.transcript_seq_case,
transcript_seq_control, protein_seq_case,
protein_seq_control, pc_class, length columns
(prot_len_*, tx_len_*,
exon_cds_len_*, exon_len_*) and their
*_diff / *_diff_abs derivatives.dna_pid,
dna_score, dna_width), protein alignment
(prot_pid, prot_score,
prot_width), frame diagnostics (frame_call,
rescue, frame_check_exon1,
frame_check_exon2), and
summary_classification.domains_exons_case,
domains_exons_control, case_only_domains,
control_only_domains, case_only_domains_list,
control_only_domains_list,
either_domains_list, case_only_n,
control_only_n, diff_n.case_ppi,
control_ppi, n_case_ppi,
n_control_ppi, n_ppi,
case_ppi_drivers, control_ppi_drivers.data (raw sample-level input table)Use data to inspect per-sample evidence feeding DI.
Core columns: event_id, event_type,
form, gene_id, chr,
strand, inc, exc,
inclusion_reads, exclusion_reads,
psi, sample, condition,
source_file.
Often present depending on import path: HITindex metadata such as
HITindex, class, nFE,
nLE, nUP, nDOWN,
nTXPT, psi_original, total_reads,
source.
res (differential inclusion output)Use res to rank significant events before
pairing/domain/PPI steps.
Core columns: site_id, event_id,
event_type, gene_id, chr,
strand, inc, exc,
form, n_samples, n_control,
n_case, mean_psi_ctrl,
mean_psi_case, delta_psi,
p.value, padj, cooks_max,
n, n_used.
Use SpliceImpactResult when you want all pipeline pieces
in one object, with consistent filtering across event-linked slots.
SpliceImpactResult is a custom S4 container that keeps
all major pipeline parts synchronized: - raw_events
(SummarizedExperiment): sample-level table + ranges/assays.
- di_events / res_di (GRanges):
differential inclusion rows. - matched
(DFrame): annotation-matched rows (and sequence-attached
rows). - paired_hits (GRanges) +
segments (GRangesList): final case/control
event impacts. - sample_frame (DFrame): sample
manifest metadata.
obj <- as_splice_impact_result(
data = data,
res = res,
res_di = res_di,
matched = hits_sequences,
sample_frame = sample_frame,
hits_final = hits_final
)get_hits_*() wrappers provide compact views for common
downstream tasks: - get_hits_core(): event IDs,
transcript/protein IDs, and key DI metadata. -
get_hits_domain(): domain gain/loss columns
(case_only_*, control_only_*,
diff_n, mapped domain lists). -
get_hits_ppi(): PPI partner changes (n_ppi,
case/control partner lists, PPI drivers). -
get_hits_sequence(): sequence/alignment/frame fields
(pc_class, frame_call, protein/transcript
length deltas, identity metrics).
raw_dt <- as_dt_from_s4(obj, "raw_events")
di_dt <- as_dt_from_s4(obj, "di_events")
hits_dt <- as_dt_from_s4(obj, "paired_hits")
hits_core <- get_hits_core(obj)
hits_domain_view <- get_hits_domain(obj)
hits_ppi_view <- get_hits_ppi(obj)
hits_sequence_view <- get_hits_sequence(obj)
c(
raw_rows = nrow(raw_dt),
di_rows = nrow(di_dt),
hits_rows = nrow(hits_dt),
hits_core_rows = nrow(hits_core),
hits_domain_rows = nrow(hits_domain_view),
hits_ppi_rows = nrow(hits_ppi_view),
hits_sequence_rows = nrow(hits_sequence_view)
)
#> raw_rows di_rows hits_rows hits_core_rows
#> 5863 533 36 36
#> hits_domain_rows hits_ppi_rows hits_sequence_rows
#> 36 36 36obj_focus <- filter_spliceimpact_hits(
obj,
diff_n > 0,
n_ppi > 0
)
focus_hits <- as_dt_from_s4(obj_focus, "paired_hits")
focus_hits[, .N, by = event_type][order(-N)]
#> event_type N
#> <char> <int>
#> 1: AFE 2
#> 2: RI 2
#> 3: SE 2fg_di_s4 <- get_gene_enrichment(
mode = "di",
x = obj,
padj_threshold = 0.05,
delta_psi_threshold = 0.1
)
fg_ppi_s4 <- get_gene_enrichment(mode = "ppi", x = obj)
fg_ppi_focus <- get_gene_enrichment(mode = "ppi", x = obj_focus)
list(
di_equal_table = identical(sort(fg_di), sort(fg_di_s4)),
ppi_equal_table = identical(sort(fg_ppi), sort(fg_ppi_s4)),
focused_ppi_genes = head(fg_ppi_focus)
)
#> $di_equal_table
#> [1] TRUE
#>
#> $ppi_equal_table
#> [1] TRUE
#>
#> $focused_ppi_genes
#> [1] "ENSG00000116171" "ENSG00000142599" "ENSG00000140983" "ENSG00000164692"
#> [5] "ENSG00000116406" "ENSG00000101391"spliceimpact_s4_schema()
#> $slots
#> $slots$raw_events
#> [1] "SummarizedExperiment: sample/form rows with rowRanges + rowData"
#>
#> $slots$di_events
#> [1] "GRanges: differential inclusion rows with mcols"
#>
#> $slots$res_di
#> [1] "GRanges: threshold-filtered differential rows with mcols"
#>
#> $slots$matched
#> [1] "S4Vectors::DataFrame: annotation-matched DI rows (and sequence-attached rows when present)"
#>
#> $slots$sample_frame
#> [1] "S4Vectors::DataFrame: sample manifest (`path`, `sample_name`, `condition`)"
#>
#> $slots$paired_hits
#> [1] "GRanges: paired case/control rows with mcols"
#>
#> $slots$segments
#> [1] "GRangesList: per-pair genomic segments (inc_case/inc_control/exc_case/exc_control)"
#>
#> $slots$metadata
#> [1] "list: provenance and pipeline metadata"
#>
#>
#> $keys
#> $keys$raw_events
#> [1] "raw_key (rowData)"
#>
#> $keys$di_events
#> [1] "di_key (mcols)"
#>
#> $keys$res_di
#> [1] "di_key (mcols)"
#>
#> $keys$paired_hits
#> [1] "pair_key (mcols)"
#>
#> $keys$segments
#> [1] "names(segments) == pair_key"
#>
#>
#> $assays
#> [1] "psi" "inclusion_reads" "exclusion_reads"This mirrors the table workflow, but keeps everything in a single
SpliceImpactResult object.
obj_flow <- as_splice_impact_result(
data = data,
sample_frame = sample_frame
)
obj_flow <- get_differential_inclusion(
obj_flow,
min_total_reads = 10,
parallel_glm = TRUE,
BPPARAM = BiocParallel::SerialParam(),
return_class = "S4"
)
obj_flow <- keep_sig_pairs(obj_flow, return_class = "S4")
obj_flow <- get_matched_events_chunked(
obj_flow,
annotation_df$annotations,
chunk_size = 2000,
return_class = "S4"
)
obj_flow <- attach_sequences(
obj_flow,
annotation_df$sequences,
return_class = "S4"
)
obj_flow <- get_pairs(obj_flow, source = "multi", return_class = "S4")
obj_flow <- compare_sequence_frame(
obj_flow,
annotation_df$annotations,
return_class = "S4"
)
obj_flow <- get_domains(obj_flow, exon_features, return_class = "S4")
obj_flow <- get_ppi_switches(
obj_flow,
ppi,
protein_feature_total,
return_class = "S4"
)
hits_core_flow <- get_hits_core(obj_flow)
hits_domain_flow <- get_hits_domain(obj_flow)
hits_ppi_flow <- get_hits_ppi(obj_flow)
hits_sequence_flow <- get_hits_sequence(obj_flow)These helpers support non-standard inputs at multiple stages of the workflow.
get_manual_features() lets you add user-defined
peptide-level features.
ann_dt <- data.table::as.data.table(annotation_df$annotations)
coding_tx <- unique(ann_dt[type == "exon" & cds_has == TRUE, transcript_id])
n_manual <- min(3L, length(coding_tx))
if (n_manual < 1L) {
stop("No coding transcripts found for manual feature example.")
}
manual_df <- data.frame(
ensembl_transcript_id = coding_tx[seq_len(n_manual)],
ensembl_peptide_id = rep(NA_character_, n_manual),
name = paste0("demo_feature_", seq_len(n_manual)),
start = c(20L, 45L, 80L)[seq_len(n_manual)],
stop = c(35L, 58L, 92L)[seq_len(n_manual)],
database = rep("manual", n_manual),
alt_name = rep(NA_character_, n_manual),
feature_id = rep(NA_character_, n_manual),
stringsAsFactors = FALSE
)
manual_features <- get_manual_features(
manual_features = manual_df,
gtf_df = annotation_df$annotations
)
#> 0 domain(s) not matched to genomic coords
head(manual_features)
#> ensembl_transcript_id feature_id start stop chr strand clean_name
#> <char> <char> <num> <num> <char> <char> <char>
#> 1: ENST00000337907 <NA> 20 35 chr1 - demo_feature_1
#> 2: ENST00000400908 <NA> 45 58 chr1 - demo_feature_2
#> 3: ENST00000476556 <NA> 80 92 chr1 - demo_feature_3
#> alt_name database ensembl_peptide_id method
#> <char> <char> <char> <char>
#> 1: <NA> manual <NA> manual
#> 2: <NA> manual <NA> manual
#> 3: <NA> manual <NA> manual
#> name
#> <char>
#> 1: demo_feature_1;chr1:8656030-8656077
#> 2: demo_feature_2;chr1:8656105-8656146
#> 3: demo_feature_3;chr1:8362842-8361798Use get_user_data() when you already have sample-level
reads/PSI.
example_df <- data.frame(
event_id = rep("A3SS:1", 8),
event_type = "A3SS",
form = rep(c("INC", "EXC"), each = 4),
gene_id = "ENSG00000158286",
chr = "chrX",
strand = "-",
inc = c(rep("149608626-149608834", 4), rep("149608626-149608829", 4)),
exc = c(rep("", 4), rep("149608830-149608834", 4)),
inclusion_reads = c(30, 32, 29, 31, 2, 3, 4, 3),
exclusion_reads = c(1, 1, 2, 1, 28, 27, 26, 30),
sample = c("S1", "S2", "S3", "S4", "S1", "S2", "S3", "S4"),
condition = rep(c("case", "case", "control", "control"), 2),
stringsAsFactors = FALSE
)
example_df$psi <- example_df$inclusion_reads /
(example_df$inclusion_reads + example_df$exclusion_reads)
user_data <- get_user_data(example_df)
head(user_data)
#> event_id event_type form gene_id chr strand inc
#> <char> <char> <char> <char> <char> <char> <char>
#> 1: A3SS:1 A3SS INC ENSG00000158286 chrX - 149608626-149608834
#> 2: A3SS:1 A3SS INC ENSG00000158286 chrX - 149608626-149608834
#> 3: A3SS:1 A3SS INC ENSG00000158286 chrX - 149608626-149608834
#> 4: A3SS:1 A3SS INC ENSG00000158286 chrX - 149608626-149608834
#> 5: A3SS:1 A3SS EXC ENSG00000158286 chrX - 149608626-149608829
#> 6: A3SS:1 A3SS EXC ENSG00000158286 chrX - 149608626-149608829
#> exc inclusion_reads exclusion_reads psi sample
#> <char> <num> <num> <num> <char>
#> 1: 30 1 0.96774194 S1
#> 2: 32 1 0.96969697 S2
#> 3: 29 2 0.93548387 S3
#> 4: 31 1 0.96875000 S4
#> 5: 149608830-149608834 2 28 0.06666667 S1
#> 6: 149608830-149608834 3 27 0.10000000 S2
#> condition source_file
#> <char> <char>
#> 1: case
#> 2: case
#> 3: control
#> 4: control
#> 5: case
#> 6: caseUse get_user_data_post_di() when you already have
event-level DI summaries.
example_user_data <- data.frame(
event_id = rep("A3SS:1", 8),
event_type = "A3SS",
gene_id = "ENSG00000158286",
chr = "chrX",
strand = "-",
form = rep(c("INC", "EXC"), each = 4),
inc = c(rep("149608626-149608834", 4), rep("149608626-149608829", 4)),
exc = c(rep("", 4), rep("149608830-149608834", 4)),
stringsAsFactors = FALSE
)
user_res <- get_user_data_post_di(example_user_data)
head(user_res)
#> site_id
#> <char>
#> 1: A3SS|ENSG00000158286|chrX|149608626-149608834||INC
#> 2: A3SS|ENSG00000158286|chrX|149608626-149608834||INC
#> 3: A3SS|ENSG00000158286|chrX|149608626-149608834||INC
#> 4: A3SS|ENSG00000158286|chrX|149608626-149608834||INC
#> 5: A3SS|ENSG00000158286|chrX|149608626-149608829|149608830-149608834|EXC
#> 6: A3SS|ENSG00000158286|chrX|149608626-149608829|149608830-149608834|EXC
#> event_type event_id gene_id chr strand inc
#> <char> <char> <char> <char> <char> <char>
#> 1: A3SS A3SS:1 ENSG00000158286 chrX - 149608626-149608834
#> 2: A3SS A3SS:1 ENSG00000158286 chrX - 149608626-149608834
#> 3: A3SS A3SS:1 ENSG00000158286 chrX - 149608626-149608834
#> 4: A3SS A3SS:1 ENSG00000158286 chrX - 149608626-149608834
#> 5: A3SS A3SS:1 ENSG00000158286 chrX - 149608626-149608829
#> 6: A3SS A3SS:1 ENSG00000158286 chrX - 149608626-149608829
#> exc n_samples n_control n_case mean_psi_ctrl mean_psi_case
#> <char> <num> <num> <num> <num> <num>
#> 1: -1 -1 -1 -1 -1
#> 2: -1 -1 -1 -1 -1
#> 3: -1 -1 -1 -1 -1
#> 4: -1 -1 -1 -1 -1
#> 5: 149608830-149608834 -1 -1 -1 -1 -1
#> 6: 149608830-149608834 -1 -1 -1 -1 -1
#> delta_psi p.value padj cooks_max form n n_used
#> <num> <num> <num> <num> <char> <num> <num>
#> 1: 1 0 0 -1 INC -1 -1
#> 2: 1 0 0 -1 INC -1 -1
#> 3: 1 0 0 -1 INC -1 -1
#> 4: 1 0 0 -1 INC -1 -1
#> 5: -1 0 0 -1 EXC -1 -1
#> 6: -1 0 0 -1 EXC -1 -1Use get_rmats_post_di() to ingest already-computed rMATS
result tables.
rmats_df <- data.frame(
ID = 1L,
GeneID = "ENSG00000182871",
geneSymbol = "COL18A1",
chr = "chr21",
strand = "+",
longExonStart_0base = 45505834L,
longExonEnd = 45505966L,
shortES = 45505837L,
shortEE = 45505966L,
flankingES = 45505357L,
flankingEE = 45505431L,
ID.2 = 2L,
IJC_SAMPLE_1 = "1,1,1",
SJC_SAMPLE_1 = "1,1,1",
IJC_SAMPLE_2 = "1,1,1",
SJC_SAMPLE_2 = "1,1,1",
IncFormLen = 52L,
SkipFormLen = 49L,
PValue = 0.6967562,
FDR = 1,
IncLevel1 = "0.0,0.0,0.0",
IncLevel2 = "1.0,1.0,1.0",
IncLevelDifference = 1.0,
stringsAsFactors = FALSE
)
user_res_rmats <- get_rmats_post_di(rmats_df, event_type = "A3SS")
head(user_res_rmats)
#> event_id event_type form gene_id chr strand inc
#> <char> <char> <char> <char> <char> <char> <char>
#> 1: A3SS:1 A3SS INC ENSG00000182871 chr21 + 45505834-45505966
#> 2: A3SS:1 A3SS EXC ENSG00000182871 chr21 + 45505837-45505966
#> exc p.value delta_psi padj
#> <char> <num> <num> <num>
#> 1: 0.6967562 1 1
#> 2: 45505834-45505836 0.6967562 -1 1Use compare_transcript_pairs() when you want to start
directly from chosen transcript pairs instead of event matching.
tx_ids <- unique(annotation_df$annotations$transcript_id)
tx_ids <- tx_ids[!is.na(tx_ids) & tx_ids != ""]
if (length(tx_ids) < 4L) {
stop("Need at least four transcripts for transcript-pair example.")
}
transcript_pairs <- data.frame(
transcript1 = tx_ids[1:2],
transcript2 = tx_ids[3:4],
stringsAsFactors = FALSE
)
user_matched <- compare_transcript_pairs(
transcript_pairs = transcript_pairs,
annotations = annotation_df$annotations
)
#> 2 out of 2 transcript pairs contained within annotations
head(user_matched)
#> event_id event_type form gene_id chr strand
#> <char> <char> <char> <char> <char> <char>
#> 1: TXCMP:1 TXCMP EXC ENSG00000158286 chr1 +
#> 2: TXCMP:1 TXCMP INC ENSG00000158286 chr1 +
#> 3: TXCMP:2 TXCMP EXC ENSG00000158286 chr1 +
#> 4: TXCMP:2 TXCMP INC ENSG00000158286 chr1 +
#> inc
#> <char>
#> 1: 6205475-6206101;6206196-6206302;6206536-6206726;6207379-6208307
#> 2: 6205475-6206101;6206196-6206302;6206536-6206726;6207379-6208307
#> 3: 6206129-6206302;6206536-6206726;6207379-6207564
#> 4: 6206129-6206302;6206536-6206726;6207379-6207564
#> exc
#> <char>
#> 1: 6208618-6209025;6209115-6209196;6209268-6209539;6209924-6209970;6210223-6210295;6210370-6210900
#> 2: 6208618-6209025;6209115-6209196;6209268-6209539;6209924-6209970;6210223-6210295;6210370-6210900
#> 3: 6208860-6209025;6209414-6209539;6209924-6209970;6210223-6210295;6210370-6210438;6211867-6212053;6212218-6212416;6212682-6213183;6218289-6218369;6219236-6220805
#> 4: 6208860-6209025;6209414-6209539;6209924-6209970;6210223-6210295;6210370-6210438;6211867-6212053;6212218-6212416;6212682-6213183;6218289-6218369;6219236-6220805
#> delta_psi p.value padj n_samples n_control n_case transcript_id
#> <num> <num> <num> <num> <num> <num> <char>
#> 1: -1 0 0 0 0 0 ENST00000485539
#> 2: 1 0 0 0 0 0 ENST00000466994
#> 3: -1 0 0 0 0 0 ENST00000496676
#> 4: 1 0 0 0 0 0 ENST00000484435
#> exons
#> <char>
#> 1: ENSE00001934791;ENSE00003528881;ENSE00001948578;ENSE00003560800;ENSE00003516232;ENSE00001921263
#> 2: ENSE00001843998;ENSE00001829482;ENSE00003686018;ENSE00001935169
#> 3: ENSE00001866172;ENSE00003662600;ENSE00003560800;ENSE00003516232;ENSE00003531305;ENSE00003488296;ENSE00001883823;ENSE00001858590;ENSE00003562079;ENSE00001844921
#> 4: ENSE00001551004;ENSE00003686018;ENSE00001828693sessionInfo()
#> R version 4.5.3 (2026-03-11)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] BiocParallel_1.45.0 SpliceImpactR_0.99.4 BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.5.3 BiocIO_1.21.0
#> [3] bitops_1.0-9 ggplotify_0.1.3
#> [5] tibble_3.3.1 polyclip_1.10-7
#> [7] enrichit_0.1.4 XML_3.99-0.23
#> [9] lifecycle_1.0.5 httr2_1.2.2
#> [11] pwalign_1.7.0 rstatix_0.7.3
#> [13] processx_3.8.7 lattice_0.22-9
#> [15] MASS_7.3-65 backports_1.5.1
#> [17] magrittr_2.0.5 sass_0.4.10
#> [19] rmarkdown_2.31 jquerylib_0.1.4
#> [21] yaml_2.3.12 otel_0.2.0
#> [23] ggtangle_0.1.1 cowplot_1.2.0
#> [25] DBI_1.3.0 buildtools_1.0.0
#> [27] RColorBrewer_1.1-3 abind_1.4-8
#> [29] GenomicRanges_1.63.2 purrr_1.2.1
#> [31] BiocGenerics_0.57.0 msigdbr_26.1.0
#> [33] RCurl_1.98-1.18 yulab.utils_0.2.4
#> [35] tweenr_2.0.3 rappdirs_0.3.4
#> [37] aisdk_1.1.0 gdtools_0.5.0
#> [39] IRanges_2.45.0 S4Vectors_0.49.1
#> [41] enrichplot_1.31.5 ggrepel_0.9.8
#> [43] tidytree_0.4.7 maketools_1.3.2
#> [45] codetools_0.2-20 DelayedArray_0.37.1
#> [47] DOSE_4.5.1 ggforce_0.5.0
#> [49] tidyselect_1.2.1 aplot_0.2.9
#> [51] farver_2.1.2 matrixStats_1.5.0
#> [53] stats4_4.5.3 Seqinfo_1.1.0
#> [55] GenomicAlignments_1.47.0 jsonlite_2.0.0
#> [57] Formula_1.2-5 systemfonts_1.3.2
#> [59] tools_4.5.3 ggnewscale_0.5.2
#> [61] treeio_1.35.0 PFAM.db_3.22.0
#> [63] Rcpp_1.1.1 glue_1.8.0
#> [65] gridExtra_2.3 SparseArray_1.11.13
#> [67] xfun_0.57 qvalue_2.43.0
#> [69] MatrixGenerics_1.23.0 dplyr_1.2.1
#> [71] withr_3.0.2 BiocManager_1.30.27
#> [73] fastmap_1.2.0 callr_3.7.6
#> [75] digest_0.6.39 R6_2.6.1
#> [77] gridGraphics_0.5-1 GO.db_3.23.1
#> [79] RSQLite_2.4.6 cigarillo_1.1.0
#> [81] tidyr_1.3.2 generics_0.1.4
#> [83] fontLiberation_0.1.0 data.table_1.18.2.1
#> [85] rtracklayer_1.71.3 httr_1.4.8
#> [87] htmlwidgets_1.6.4 S4Arrays_1.11.1
#> [89] scatterpie_0.2.6 pkgconfig_2.0.3
#> [91] gtable_0.3.6 blob_1.3.0
#> [93] S7_0.2.1 XVector_0.51.0
#> [95] sys_3.4.3 clusterProfiler_4.19.7
#> [97] htmltools_0.5.9 fontBitstreamVera_0.1.1
#> [99] carData_3.0-6 scales_1.4.0
#> [101] Biobase_2.71.0 png_0.1-9
#> [103] ggfun_0.2.0 knitr_1.51
#> [105] reshape2_1.4.5 rjson_0.2.23
#> [107] nlme_3.1-169 curl_7.0.0
#> [109] org.Hs.eg.db_3.23.1 cachem_1.1.0
#> [111] stringr_1.6.0 parallel_4.5.3
#> [113] AnnotationDbi_1.73.1 restfulr_0.0.16
#> [115] pillar_1.11.1 grid_4.5.3
#> [117] vctrs_0.7.2 ggpubr_0.6.3
#> [119] car_3.1-5 tidydr_0.0.6
#> [121] cluster_2.1.8.2 evaluate_1.0.5
#> [123] cli_3.6.6 compiler_4.5.3
#> [125] Rsamtools_2.27.2 rlang_1.2.0
#> [127] crayon_1.5.3 ggsignif_0.6.4
#> [129] labeling_0.4.3 ps_1.9.2
#> [131] plyr_1.8.9 fs_2.0.1
#> [133] ggiraph_0.9.6 stringi_1.8.7
#> [135] assertthat_0.2.1 babelgene_22.9
#> [137] Biostrings_2.79.5 lazyeval_0.2.3
#> [139] GOSemSim_2.37.2 fontquiver_0.2.1
#> [141] Matrix_1.7-5 patchwork_1.3.2
#> [143] bit64_4.6.0-1 ggplot2_4.0.2
#> [145] KEGGREST_1.51.1 SummarizedExperiment_1.41.1
#> [147] igraph_2.2.3 broom_1.0.12
#> [149] memoise_2.0.1 bslib_0.10.0
#> [151] ggtree_4.1.2 bit_4.6.0
#> [153] ape_5.8-1 gson_0.1.0hits_final, data, res)