1 Introduction

Plasma circulating cell-free DNA (cfDNA) analysis has transformed cancer care. While recent work has shown that ctDNA fragments have distinct features (size, end motifs) compared to healthy cfDNA, few tools offer a standardized framework for an in-depth analysis that links these features to the specific mutational status of each DNA fragment.

fRagmentomics addresses this need by providing a robust and flexible R package to integrate cfDNA fragment features with their mutational status. By standardizing the analysis of both SNVs and indels at the single-fragment level, fRagmentomics supports the interpretation of liquid biopsies to distinguish tumoral origin.

This vignette provides a step-by-step walkthrough of the main analysis workflow using the run_fRagmentomics() function and demonstrates the package’s core visualization capabilities thank to the visualisation functions: plot_size_distribution(), plot_ggseqlogo_meme(), plot_freq_barplot(), plot_motif_barplot().

2 Installation

2.1 Prerequisites

fRagmentomics is built and tested under R version 4.4.3.

2.2 System Dependencies

fRagmentomics optionally makes use bcftools (tested with version 1.21) for variant normalization. If using apply_bcftools_norm=TRUE argument, please, ensure it is installed and accessible in your system’s PATH.

We recommend using a conda environment with bcftools installed but you can also use any other environment manager or rely on tools already installed in your system.

# conda install -c bioconda mamba
mamba install -c bioconda bcftools=1.21

2.3 R Package Installation

You may install the package from github or from conda.

1. Install from conda

# To use bcftools normalisation (RECOMMENDED for indels)
mamba env create -n fRagmentomics-env -c elsab-lab -c conda-forge -c bioconda r-fragmentomics bcftools=1.21

2. Install from github

When installing from GitHub, pre-installing heavy Bioconductor deps via Conda can avoid slow source compilation.

Rscript -e 'if (!requireNamespace("remotes", quietly=TRUE))
                install.packages("remotes", repos="https://mirror.ibcp.fr/pub/CRAN/");
            remotes::install_github("ElsaB-Lab/fRagmentomics", build_vignettes=FALSE, upgrade="never")'

NOTE: If you hit compilation issues (e.g. with Rsamtools / BiocParallel), try one of the two approaches below.

  • With Conda/Mamba (preinstall Rsamtools)
mamba env create -n fRagmentomics-env -c conda-forge -c bioconda bioconductor-rsamtools bioconductor-variantannotation bcftools=1.21 # To use bcftools normalisation (RECOMMENDED for indels)
mamba activate fRagmentomics-env
Rscript -e 'if (!requireNamespace("BiocManager", quietly=TRUE))
                install.packages("BiocManager", repos="https://mirror.ibcp.fr/pub/CRAN/");
            if (!requireNamespace("remotes", quietly=TRUE))
                install.packages("remotes", repos="https://mirror.ibcp.fr/pub/CRAN/");
            remotes::install_github("ElsaB-Lab/fRagmentomics", build_vignettes=FALSE, upgrade="never")'
  • Without Conda/Mamba (use BiocManager to get Bioconductor deps)
Rscript -e 'if (!requireNamespace("BiocManager", quietly=TRUE))
                install.packages("BiocManager", repos="https://mirror.ibcp.fr/pub/CRAN/");
            if (!requireNamespace("remotes", quietly=TRUE))
                install.packages("remotes", repos="https://mirror.ibcp.fr/pub/CRAN/");
            BiocManager::install(c("Rsamtools","VariantAnnotation","GenomicAlignments","BiocParallel"), ask=FALSE, update=FALSE);
            remotes::install_github("ElsaB-Lab/fRagmentomics", build_vignettes=FALSE, upgrade="never")'

After these steps are complete, you can load the package into your R session with library(fRagmentomics).

3 A Complete Analysis Workflow

To make it user-friendly, the entire analysis is performed by a single function: run_fRagmentomics(). We will use the example data included with the package to demonstrate a typical workflow.

For a deeper dive into the package’s methodology, please refer to the README.md file on the project’s GitHub repository.

The README provides detailed explanations of:

  • The normalization for converting indel representations into a standardized VCF format.

  • The context-aware algorithm for determining read status.

  • The formulas used for precise fragment size calculation.

  • The logic for classifying fragments into different mutational groups.

3.1 1. Loading the Package and Example Data

We begin by loading the fRagmentomics library. To facilitate reproducibility, the package ships with a fully anonymized test dataset comprising a small BAM file, its corresponding FASTA reference, and a mutation file. This test sample originates from a lung cancer case harboring an EGFR exon 19 15 bp in-frame deletion — a well-characterized oncogenic driver associated with sensitivity to tyrosine kinase inhibitors.

The scripts used to generate, subset, and anonymize the test BAM and FASTA files are available in the package directory: inst/scripts/

suppressPackageStartupMessages({
  library(fRagmentomics)
})

# Locate the example files bundled with the package
mut_file <- system.file(
  "extdata/mutation",
  "cfdna-egfr-del_chr7_55241864_55243064_10k.mutations.tsv",
  package = "fRagmentomics"
)
bam_file <- system.file(
  "extdata/bam",
  "cfdna-egfr-del_chr7_55241864_55243064_10k.bam",
  package = "fRagmentomics"
)
fasta_file <- system.file(
  "extdata/fasta",
  "hg19_chr7_55231864_55253064.fa",
  package = "fRagmentomics"
)

# Print the file paths to confirm they were found
mut_file
## [1] "/tmp/RtmpmM4S46/Rinst18f6ed2f4c8f8e/fRagmentomics/extdata/mutation/cfdna-egfr-del_chr7_55241864_55243064_10k.mutations.tsv"
bam_file
## [1] "/tmp/RtmpmM4S46/Rinst18f6ed2f4c8f8e/fRagmentomics/extdata/bam/cfdna-egfr-del_chr7_55241864_55243064_10k.bam"
fasta_file
## [1] "/tmp/RtmpmM4S46/Rinst18f6ed2f4c8f8e/fRagmentomics/extdata/fasta/hg19_chr7_55231864_55253064.fa"
# Detect whether bcftools is available on the system PATH
has_bcftools <- nzchar(Sys.which("bcftools"))
if (!has_bcftools) {
  message(
    "bcftools not found on PATH; normalization examples will run without it."
  )
}
## bcftools not found on PATH; normalization examples will run without it.

To handle 0-based mutation coordinates, set the one_based = FALSE parameter. This will ensure the positions are correctly converted. Furthermore, fRagmentomics automatically standardizes indel representations to resolve ambiguities, as the same variant can often be described in multiple ways (e.g., left-aligned vs. right-aligned).

The following table summarizes the accepted formats:

Simple Format

To describe a… REF Column ALT Column POS Column
Deletion of “AT” AT "", -, ., _, NA Position of the first deleted base (A)
Insertion of “CT” "", -, ., _, NA CT Position of the base before the insertion

VCF-Style Padded Format

To describe a… REF Column ALT Column POS Column
Deletion of “AT” from “GAT GAT G Position of the anchor base (G)
Insertion of "CT" after “A” A ACT Position of the anchor base (A)

3.2 2. Running the Analysis

3.2.1 Basic Workflow

First, we will run run_fRagmentomics() with the default arguments. We provide the paths to our three input files, a sample_id for annotation, and specify the number of cores for parallel processing.

When apply_bcftools_norm = TRUE, the pipeline calls bcftools norm to left-align and normalize indels, removing representation/positional ambiguity. To use this option you must have bcftools installed and discoverable on your system (see the Installation section of the README). If bcftools is unavailable, you may set apply_bcftools_norm = FALSE, but indel positions will not be left-aligned or normalized, which may lead to discrepancies across callers and reduce comparability. In this vignette, we detect bcftools availability via has_bcftools and toggle normalization accordingly so the examples run on systems without it.

if (!has_bcftools) {
  message("bcftools not detected; running without external normalization.")
}
## bcftools not detected; running without external normalization.
# Run the full analysis pipeline with default settings
df_frag_basic <- run_fRagmentomics(
  mut = mut_file,
  bam = bam_file,
  fasta = fasta_file,
  sample_id = "cfdna-egfr-del",
  apply_bcftools_norm = has_bcftools,
  verbose = FALSE,
  n_cores = 1
)

# View the dimensions and the first few rows
cat("Dimensions of basic results:", dim(df_frag_basic), "\n")
## Dimensions of basic results: 3017 24
head(df_frag_basic)
## DataFrame with 6 rows and 24 columns
##        Sample_Id  Chromosome  Position              Ref         Alt
##      <character> <character> <integer>      <character> <character>
## 1 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 2 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 3 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 4 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 5 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 6 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
##           Input_Mutation Fragment_Id Fragment_QC Fragment_Status_Simple
##              <character> <character> <character>            <character>
## 1 chr7:10601:AGGAATTAA..  READ_04307          OK                     WT
## 2 chr7:10601:AGGAATTAA..  READ_02304          OK                     WT
## 3 chr7:10601:AGGAATTAA..  READ_01914          OK                     WT
## 4 chr7:10601:AGGAATTAA..  READ_00367          OK                     WT
## 5 chr7:10601:AGGAATTAA..  READ_02497          OK                     WT
## 6 chr7:10601:AGGAATTAA..  READ_01703          OK                     WT
##   Fragment_Status_Detail Fragment_Size Read_5p_Status Read_3p_Status
##              <character>     <integer>    <character>    <character>
## 1                     WT           436             NA             WT
## 2                     WT           371             NA             WT
## 3                     WT           410             NA             WT
## 4                     WT           347             NA             WT
## 5                     WT           382             NA             WT
## 6                     WT           442             NA             WT
##       BASE_5p     BASE_3p     BASQ_5p     BASQ_3p Position_5p Position_3p
##   <character> <character> <character> <character>   <integer>   <integer>
## 1          NA           A          NA           F       10175       10610
## 2          NA           A          NA           F       10243       10613
## 3          NA           A          NA           F       10255       10664
## 4          NA           A          NA           F       10257       10603
## 5          NA           A          NA           F       10267       10648
## 6          NA           A          NA           F       10268       10709
##   Fragment_Bases_5p Fragment_Bases_3p Fragment_Basqs_5p Fragment_Basqs_3p
##         <character>       <character>       <character>       <character>
## 1             CAGAA             TTAAG             FFFFF             FFFFF
## 2             TGATT             AGAGA             FFFFF             FF:FF
## 3             GGCCT             TGCTT             FFFFF             FFFFF
## 4             CCTAG             CAAGG             FFFFF             FFFFF
## 5             GCATC             CCTCG             FFFFF             FFFFF
## 6             CATCA             TTTTC             FFFF,             FFFF:
##         VAF
##   <numeric>
## 1   26.3755
## 2   26.3755
## 3   26.3755
## 4   26.3755
## 5   26.3755
## 6   26.3755

3.2.2 Advanced Workflow: Customizing the Analysis

The run_fRagmentomics() function can be customized for more specific use cases. In the example below, we will perform a more stringent analysis and request additional information in the output table.

Specifically, we will:

  • Narrow the search window to 1000 bp around the variant using neg_offset_mate_search and pos_offset_mate_search.

  • Apply stricter BAM filters by requiring reads to be in a “proper pair” using flag_bam_list.

  • Request additional output columns for bam information (report_bam_info), soft-clip counts (report_softclip), and the 3-base motifs at the fragment ends (report_5p_3p_bases_fragment).

  • Keep fragments that do not pass quality control using the parameter retain_fail_qc: e.g., reads with incorrect orientation, mapped to different chromosomes, unmapped mates.

if (!has_bcftools) {
  message(
    "bcftools not detected; running advanced workflow without external normalization."
  )
}
## bcftools not detected; running advanced workflow without external normalization.
# Run the analysis with custom parameters
df_frag_advanced <- run_fRagmentomics(
  mut = mut_file,
  bam = bam_file,
  fasta = fasta_file,
  sample_id = "cfdna-egfr-del",
  neg_offset_mate_search = -500,
  pos_offset_mate_search = 500,
  flag_bam_list = list(
    isProperPair = TRUE,
    isUnmappedQuery = FALSE,
    hasUnmappedMate = FALSE,
    isSecondaryAlignment = FALSE
  ),
  report_bam_info = TRUE,
  report_softclip = TRUE,
  retain_fail_qc = TRUE,
  report_5p_3p_bases_fragment = 3,
  apply_bcftools_norm = has_bcftools,
  verbose = FALSE,
  n_cores = 1
)

# Observe that stricter filtering may result in fewer fragments
cat("Dimensions of advanced results:", dim(df_frag_advanced), "\n")
## Dimensions of advanced results: 2730 35
# View the newly added columns
head(df_frag_advanced[, c(
  "Fragment_Id",
  "TLEN",
  "Nb_Fragment_Bases_Softclip_5p",
  "Fragment_Bases_5p"
)])
## DataFrame with 6 rows and 4 columns
##   Fragment_Id      TLEN Nb_Fragment_Bases_Softclip_5p Fragment_Bases_5p
##   <character> <integer>                     <integer>       <character>
## 1  READ_02195       226                             0               GCA
## 2  READ_03952       220                             0               CTG
## 3  READ_02111       217                             0               CGG
## 4  READ_03049       253                             0               GGC
## 5  READ_01000       232                             0               CCC
## 6  READ_03528       237                             0               CCC

3.3 3. Exploring the Output

Note: The following outputs and visualizations are generated from a small, simulated test dataset. They are intended to demonstrate the package’s functionality and are not representative of real biological results.

The run_fRagmentomics() function returns a DataFrame where each row represents the analysis of a single fragment. Let’s explore the key columns to understand the results.

3.3.1 A First Look at the Results Table

# Dataframe output
df_frag_basic
## DataFrame with 3017 rows and 24 columns
##           Sample_Id  Chromosome  Position              Ref         Alt
##         <character> <character> <integer>      <character> <character>
## 1    cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 2    cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 3    cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 4    cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 5    cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## ...             ...         ...       ...              ...         ...
## 3013 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 3014 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 3015 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 3016 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
## 3017 cfdna-egfr-del        chr7     10601 AGGAATTAAGAGAAGC           A
##              Input_Mutation Fragment_Id Fragment_QC Fragment_Status_Simple
##                 <character> <character> <character>            <character>
## 1    chr7:10601:AGGAATTAA..  READ_04307          OK                     WT
## 2    chr7:10601:AGGAATTAA..  READ_02304          OK                     WT
## 3    chr7:10601:AGGAATTAA..  READ_01914          OK                     WT
## 4    chr7:10601:AGGAATTAA..  READ_00367          OK                     WT
## 5    chr7:10601:AGGAATTAA..  READ_02497          OK                     WT
## ...                     ...         ...         ...                    ...
## 3013 chr7:10601:AGGAATTAA..  READ_04411          OK                     WT
## 3014 chr7:10601:AGGAATTAA..  READ_04737          OK                     WT
## 3015 chr7:10601:AGGAATTAA..  READ_04938          OK                     WT
## 3016 chr7:10601:AGGAATTAA..  READ_05052          OK                     WT
## 3017 chr7:10601:AGGAATTAA..  READ_05101          OK                     WT
##      Fragment_Status_Detail Fragment_Size Read_5p_Status Read_3p_Status
##                 <character>     <integer>    <character>    <character>
## 1                        WT           436             NA             WT
## 2                        WT           371             NA             WT
## 3                        WT           410             NA             WT
## 4                        WT           347             NA             WT
## 5                        WT           382             NA             WT
## ...                     ...           ...            ...            ...
## 3013                     WT           174             WT             NA
## 3014                     WT           183             WT             NA
## 3015                     WT           167             WT             NA
## 3016                     WT           152             WT             NA
## 3017                     WT           176             WT             NA
##          BASE_5p     BASE_3p     BASQ_5p     BASQ_3p Position_5p Position_3p
##      <character> <character> <character> <character>   <integer>   <integer>
## 1             NA           A          NA           F       10175       10610
## 2             NA           A          NA           F       10243       10613
## 3             NA           A          NA           F       10255       10664
## 4             NA           A          NA           F       10257       10603
## 5             NA           A          NA           F       10267       10648
## ...          ...         ...         ...         ...         ...         ...
## 3013           A          NA           F          NA       10601       10774
## 3014           A          NA           F          NA       10601       10783
## 3015           A          NA           F          NA       10601       10767
## 3016           A          NA           F          NA       10601       10752
## 3017           A          NA           F          NA       10601       10776
##      Fragment_Bases_5p Fragment_Bases_3p Fragment_Basqs_5p Fragment_Basqs_3p
##            <character>       <character>       <character>       <character>
## 1                CAGAA             TTAAG             FFFFF             FFFFF
## 2                TGATT             AGAGA             FFFFF             FF:FF
## 3                GGCCT             TGCTT             FFFFF             FFFFF
## 4                CCTAG             CAAGG             FFFFF             FFFFF
## 5                GCATC             CCTCG             FFFFF             FFFFF
## ...                ...               ...               ...               ...
## 3013             AGGAA             TTCTA             F,FFF             FFFFF
## 3014             AGGAA             TTTCC             FFFFF             FFFFF
## 3015             AGGAA             GTTCA             FFFFF             FFFF:
## 3016             AGGAA             CTCCA             FFFFF             FFFFF
## 3017             AGGAA             CTACG             FFFF:             FFFFF
##            VAF
##      <numeric>
## 1      26.3755
## 2      26.3755
## 3      26.3755
## 4      26.3755
## 5      26.3755
## ...        ...
## 3013   26.3755
## 3014   26.3755
## 3015   26.3755
## 3016   26.3755
## 3017   26.3755

3.3.2 Understanding the Fragment Status

The most important columns for interpreting the results are Fragment_Status_Simple and Fragment_Status_Detail.

The Fragment_Status_Simple column provides a high-level classification for each fragment. The table below explains each possible label:

Fragment_Status_Simple Represents… Typical Read Evidence (Read_5p_Status / Read_3p_Status)
MUT The fragment confidently supports the mutant allele. MUT/MUT, MUT/AMB or MUT/NA
WT The fragment confidently supports an allele that is not the target. WT/WT, WT/AMB or WT/NA
OTH The two reads give conflicting high-confidence information. OTH/OTH, OTH/AMB or OTH/NA
N/I The fragment is non-informative: either discordant or ambiguous. MUT/WT, MUT/OTH, WT/OTH, AMB/AMB or AMB/NA

You can quickly summarize the results for a given mutation by counting the occurrences of each status:

table(df_frag_advanced$Fragment_Status_Simple)
## 
##  MUT  N/I  OTH   WT 
##  665   18   10 2036

The Fragment_Status_Detail column concatenates the statuses of the two reads (e.g., "WT & AMB").

table(df_frag_advanced$Fragment_Status_Detail)
## 
##                                   AMB                              AMB & WT 
##                                     6                                     1 
## AMB & WT by CIGAR but potentially MUT                                   MUT 
##                                     5                                   638 
##                             MUT & OTH MUT & WT by CIGAR but potentially MUT 
##                                     1                                    27 
##                                   OTH                              OTH & WT 
##                                    10                                    11 
##                                    WT       WT by CIGAR but potentially MUT 
##                                  1968                                    62

3.3.3 Exploring Other Key Features

You can also quickly summarize other key fragmentomic features like the calculated fragment size:

summary(df_frag_advanced$Fragment_Size)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.     NAs 
##      70     146     163     161     174     259       1

And you can easily extract the final VAF calculated for each mutation:

unique(df_frag_advanced$VAF)
## [1] 24.52969

For a complete description of every column in the output, please refer to the README.md file on the package’s GitHub repository.

4 Visualizing Fragmentomic Features

The df_frag_advanced object, produced by run_fRagmentomics(), can be directly used with the package’s suite of plotting functions to visualize the results.

4.1 Fragment Size Distribution

One of the most common analyses is to compare the size distribution of fragments that carry the mutation (MUT) against those that do not (WT). The plot_size_distribution() function is designed for this purpose.

4.1.1 1. Comparing Groups with a Histogram

Let’s start by creating a histogram to compare our two main groups. We’ll set show_histogram = TRUE and show_density = FALSE. Using histo_args to set an alpha value adds transparency, which is helpful when bars overlap.

plot_size_distribution(
  df_fragments = df_frag_advanced,
  vals_z = c("MUT", "WT"),
  show_histogram = TRUE,
  show_density = FALSE,
  x_limits = c(100, 420),
  histo_args = list(alpha = 0.5)
)

4.1.2 2. Using Density Curves

Alternatively, we can represent the same data using smooth density curves, which can make it easier to see the shape and peaks of the distributions. This is the default behavior (show_density = TRUE). Here, we’ll also show all available groups in the data by leaving vals_z as the default NULL.

plot_size_distribution(
  df_fragments = df_frag_advanced,
  show_histogram = FALSE,
  show_density = TRUE,
  x_limits = c(100, 420)
)

4.1.3 3. Visualizing the Overall Distribution

Finally, to see the size distribution of all fragments combined into a single group, we can set col_z = NULL. This example also demonstrates how to overlay a histogram and a density curve in the same plot.

plot_size_distribution(
  df_fragments = df_frag_advanced,
  col_z = NULL,
  show_histogram = TRUE,
  show_density = TRUE,
  x_limits = c(100, 420),
  histo_args = list(alpha = 0.5)
)

4.2 Detailed 3-Base End Motif Proportions

For a deeper dive into the 3-base motifs at fragment ends, the package provides plot_motif_barplot(). This function has three visualization modes, controlled by the representation argument.

4.2.1 1. Hierarchical View (representation = "split_by_base")

This is the default view. It creates a detailed plot with nested facets for each base position, providing a complete landscape of all 64 possible motif proportions.

plot_motif_barplot(
  df_fragments = df_frag_basic,
  representation = "split_by_base",
  vals_z = c("MUT", "WT")
)

We can also customize this plot, for instance by filtering to show only motifs that start with ‘C’ or ‘G’ using motif_start, and passing additional arguments (like alpha for transparency) to the underlying geom_bar() via the ... parameter.

plot_motif_barplot(
  df_fragments = df_frag_basic,
  representation = "split_by_base",
  motif_start = c("C", "G"),
  vals_z = c("MUT", "WT"),
  alpha = 0.7 # Pass 'alpha' to geom_bar()
)

4.2.2 2. Differential Analysis (representation = "differential")

This mode is designed to directly compare exactly two groups. It calculates the log2 Fold Change (log2FC) of the proportion of each motif between the groups, making it easy to spot motifs that are enriched or depleted in one group relative to the other.

plot_motif_barplot(
  df_fragments = df_frag_advanced,
  representation = "differential",
  vals_z = c("MUT", "WT")
)

4.2.3 3. Side-by-side Comparison (representation = "split_by_motif")

This mode creates a more traditional bar chart. Each 3-base motif is placed on the x-axis, with side-by-side bars showing the proportion for each group. This is useful for directly comparing the frequency of specific, known motifs.

plot_motif_barplot(
  df_fragments = df_frag_basic,
  representation = "split_by_motif",
  vals_z = c("MUT", "WT")
)
## Warning in RColorBrewer::brewer.pal(length(group_keys), "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

4.2.4 End Motif Sequence Logos

To visualize the nucleotide frequency at each position of the fragment ends, the package provides plot_ggseqlogo_meme(). This function uses the popular ggseqlogo package to create sequence logo plots, which are excellent for identifying conserved patterns or sequence preferences.

4.2.4.1 1. Comparing Motifs Between Groups

A common use case is to compare the end-motif patterns of different fragment groups. Here, we’ll look at the first 6 bases (motif_size = 6) of the 5’ end (motif_type = "Start") for MUT versus WT fragments.

plot_ggseqlogo_meme(
  df_fragments = df_frag_basic,
  motif_type = "Start",
  motif_size = 6,
  vals_z = c("MUT", "WT"),
  colors_z = c("#F6BD60", "#84A59D", "#FD96A9", "#083D77")
)
## Warning: Requested 'motif_size' (6) exceeds shortest sequence (5). Using 5.
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the ggseqlogo package.
##   Please report the issue at <https://github.com/omarwagih/ggseqlogo/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

4.2.4.2 2. Visualizing Both Ends of a Single Group

You can also analyze a single group in more detail. By setting motif_type = "Both", the function will display the 5’-end motif and the 3’-end motif side-by-side. Let’s look at the 4-base motifs for just the MUT fragments.

plot_ggseqlogo_meme(
  df_fragments = df_frag_basic,
  motif_type = "Both",
  motif_size = 4,
  vals_z = "MUT",
  colors_z = c("#F6BD60", "#84A59D", "#FD96A9", "#083D77")
)

4.3 Overall Nucleotide Frequency

The plot_freq_barplot() function provides a summary of the nucleotide composition at fragment ends. Instead of looking at specific motifs, it calculates the overall proportion of each of the four bases (A, C, G, T) across the motif length. This is useful for spotting broad compositional biases between groups and includes a global Chi-squared test for statistical significance.

In this example, we will analyze the overall nucleotide frequency in the first 5 bases (motif_size = 5) of the 5’ end (motif_type = "Start") of the fragments. We will also pass the alpha argument via ... to make the bars transparent.

plot_freq_barplot(
  df_fragments = df_frag_basic,
  motif_size = 5,
  motif_type = "Start",
  alpha = 0.7
)

5 Session Information

This section provides details about the R session and package versions used to generate this vignette, ensuring reproducibility.

sessionInfo()
## R Under development (unstable) (2026-03-05 r89546)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] future_1.70.0         fRagmentomics_0.99.11 BiocStyle_2.39.0     
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.3.0                   bitops_1.0-9               
##   [3] rlang_1.1.7                 magrittr_2.0.4             
##   [5] otel_0.2.0                  matrixStats_1.5.0          
##   [7] compiler_4.6.0              RSQLite_2.4.6              
##   [9] GenomicFeatures_1.63.1      png_0.1-9                  
##  [11] systemfonts_1.3.2           vctrs_0.7.1                
##  [13] stringr_1.6.0               pkgconfig_2.0.3            
##  [15] crayon_1.5.3                fastmap_1.2.0              
##  [17] magick_2.9.1                XVector_0.51.0             
##  [19] labeling_0.4.3              Rsamtools_2.27.1           
##  [21] rmarkdown_2.30              tzdb_0.5.0                 
##  [23] UCSC.utils_1.7.1            ragg_1.5.1                 
##  [25] tinytex_0.58                purrr_1.2.1                
##  [27] bit_4.6.0                   xfun_0.56                  
##  [29] ggseqlogo_0.2.2             cachem_1.1.0               
##  [31] cigarillo_1.1.0             GenomeInfoDb_1.47.2        
##  [33] jsonlite_2.0.0              blob_1.3.0                 
##  [35] DelayedArray_0.37.0         BiocParallel_1.45.0        
##  [37] parallel_4.6.0              R6_2.6.1                   
##  [39] VariantAnnotation_1.57.1    bslib_0.10.0               
##  [41] stringi_1.8.7               RColorBrewer_1.1-3         
##  [43] rtracklayer_1.71.3          parallelly_1.46.1          
##  [45] GenomicRanges_1.63.1        jquerylib_0.1.4            
##  [47] Rcpp_1.1.1                  Seqinfo_1.1.0              
##  [49] bookdown_0.46               SummarizedExperiment_1.41.1
##  [51] knitr_1.51                  future.apply_1.20.2        
##  [53] readr_2.2.0                 IRanges_2.45.0             
##  [55] Matrix_1.7-4                tidyselect_1.2.1           
##  [57] dichromat_2.0-0.1           abind_1.4-8                
##  [59] yaml_2.3.12                 codetools_0.2-20           
##  [61] curl_7.0.0                  listenv_0.10.1             
##  [63] lattice_0.22-9              tibble_3.3.1               
##  [65] Biobase_2.71.0              withr_3.0.2                
##  [67] KEGGREST_1.51.1             S7_0.2.1                   
##  [69] evaluate_1.0.5              Biostrings_2.79.5          
##  [71] pillar_1.11.1               BiocManager_1.30.27        
##  [73] MatrixGenerics_1.23.0       stats4_4.6.0               
##  [75] generics_0.1.4              vroom_1.7.0                
##  [77] RCurl_1.98-1.17             S4Vectors_0.49.0           
##  [79] hms_1.1.4                   ggplot2_4.0.2              
##  [81] scales_1.4.0                globals_0.19.1             
##  [83] glue_1.8.0                  tools_4.6.0                
##  [85] BiocIO_1.21.0               data.table_1.18.2.1        
##  [87] BSgenome_1.79.1             GenomicAlignments_1.47.0   
##  [89] XML_3.99-0.22               grid_4.6.0                 
##  [91] tidyr_1.3.2                 AnnotationDbi_1.73.0       
##  [93] restfulr_0.0.16             cli_3.6.5                  
##  [95] textshaping_1.0.5           S4Arrays_1.11.1            
##  [97] dplyr_1.2.0                 gtable_0.3.6               
##  [99] ggh4x_0.3.1                 sass_0.4.10                
## [101] digest_0.6.39               BiocGenerics_0.57.0        
## [103] SparseArray_1.11.11         rjson_0.2.23               
## [105] farver_2.1.2                memoise_2.0.1              
## [107] htmltools_0.5.9             lifecycle_1.0.5            
## [109] httr_1.4.8                  bit64_4.6.0-1