Brings SummarizedExperiment to the tidyverse!
website: stemangiola.github.io/tidySummarizedExperiment/
tidySummarizedExperiment provides a bridge between Bioconductor SummarizedExperiment [@morgan2020summarized] and the tidyverse [@wickham2019welcome]. It creates an invisible layer that enables viewing the Bioconductor SummarizedExperiment object as a tidyverse tibble, and provides SummarizedExperiment-compatible dplyr, tidyr, ggplot and plotly functions. This allows users to get the best of both Bioconductor and tidyverse worlds.
| SummarizedExperiment-compatible Functions | Description |
|---|---|
| all | After all, a tidySummarizedExperiment is a SummarizedExperiment object, just better |

| tidyverse Packages | Description |
|---|---|
| dplyr | Almost all dplyr APIs, as for any tibble |
| tidyr | Almost all tidyr APIs, as for any tibble |
| ggplot2 | ggplot, as for any tibble |
| plotly | plot_ly, as for any tibble |

| Utilities | Description |
|---|---|
| as_tibble | Convert feature- and sample-wise information to a tbl_df |
From Bioconductor
if (!requireNamespace("BiocManager", quietly=TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("tidySummarizedExperiment")
From GitHub (development)
devtools::install_github("stemangiola/tidySummarizedExperiment")
Load libraries used in the examples.
library(ggplot2)
library(tidySummarizedExperiment)
tidySummarizedExperiment, the best of both worlds!
This is a SummarizedExperiment object, but it is evaluated as a tibble, so it is fully compatible with both the SummarizedExperiment and tidyverse APIs.
pasilla_tidy <- tidySummarizedExperiment::pasilla 
It looks like a tibble
pasilla_tidy
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
##    .feature    .sample counts condition type      
##    <chr>       <chr>    <int> <chr>     <chr>     
##  1 FBgn0000003 untrt1       0 untreated single_end
##  2 FBgn0000008 untrt1      92 untreated single_end
##  3 FBgn0000014 untrt1       5 untreated single_end
##  4 FBgn0000015 untrt1       0 untreated single_end
##  5 FBgn0000017 untrt1    4664 untreated single_end
##  6 FBgn0000018 untrt1     583 untreated single_end
##  7 FBgn0000022 untrt1       0 untreated single_end
##  8 FBgn0000024 untrt1      10 untreated single_end
##  9 FBgn0000028 untrt1       0 untreated single_end
## 10 FBgn0000032 untrt1    1446 untreated single_end
## # ℹ 40 more rows
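The 102,193 rows in the printout above are 14,599 features times 7 samples, one row per feature/sample pair. The base R sketch below illustrates that long layout on a toy matrix. The object names (counts, sample_info, long) and all values are made up for illustration, and this is only a conceptual picture, not how tidySummarizedExperiment is implemented internally.

```r
# Toy stand-ins for an assay matrix and its sample annotation (made-up values).
counts <- matrix(
    c(0L, 92L, 5L, 1L, 80L, 7L),
    nrow=3,
    dimnames=list(c("gene_a", "gene_b", "gene_c"), c("s1", "s2"))
)
sample_info <- data.frame(
    .sample=c("s1", "s2"),
    condition=c("untreated", "treated")
)

# One row per feature/sample pair; as.vector() walks the matrix column by column,
# so all features of the first sample come first, as in the printout above.
long <- data.frame(
    .feature=rep(rownames(counts), times=ncol(counts)),
    .sample=rep(colnames(counts), each=nrow(counts)),
    counts=as.vector(counts)
)

# Attach the sample-level annotation to every matching row.
long$condition <- sample_info$condition[match(long$.sample, sample_info$.sample)]

# The long table has features x samples rows.
nrow(long) == nrow(counts) * ncol(counts)
```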
But it is a SummarizedExperiment object after all
assays(pasilla_tidy)
## List of length 1
## names(1): counts
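Other standard SummarizedExperiment accessors should work on the same object in the same way. A minimal sketch, assuming the pasilla data set shipped with the package as used above; output is omitted here.

```r
library(tidySummarizedExperiment)

pasilla_tidy <- tidySummarizedExperiment::pasilla

# Dimensions as features x samples, matching the header of the tibble printout.
dim(pasilla_tidy)

# Sample-level annotation, the source of the condition and type columns.
colData(pasilla_tidy)

# First rows of the counts assay as an ordinary matrix.
head(assay(pasilla_tidy, "counts"))
```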
We can use tidyverse commands to explore the tidy SummarizedExperiment object.
We can use slice to choose rows by position, for example to choose the first row.
pasilla_tidy %>%
    slice(1)
## # A SummarizedExperiment-tibble abstraction: 1 × 5
## # Features=1 | Samples=1 | Assays=counts
##   .feature    .sample counts condition type      
##   <chr>       <chr>    <int> <chr>     <chr>     
## 1 FBgn0000003 untrt1       0 untreated single_end
We can use filter to choose rows by criteria.
pasilla_tidy %>%
    filter(condition == "untreated")
## # A SummarizedExperiment-tibble abstraction: 58,396 × 5
## # Features=14599 | Samples=4 | Assays=counts
##    .feature    .sample counts condition type      
##    <chr>       <chr>    <int> <chr>     <chr>     
##  1 FBgn0000003 untrt1       0 untreated single_end
##  2 FBgn0000008 untrt1      92 untreated single_end
##  3 FBgn0000014 untrt1       5 untreated single_end
##  4 FBgn0000015 untrt1       0 untreated single_end
##  5 FBgn0000017 untrt1    4664 untreated single_end
##  6 FBgn0000018 untrt1     583 untreated single_end
##  7 FBgn0000022 untrt1       0 untreated single_end
##  8 FBgn0000024 untrt1      10 untreated single_end
##  9 FBgn0000028 untrt1       0 untreated single_end
## 10 FBgn0000032 untrt1    1446 untreated single_end
## # ℹ 40 more rows
We can use select to choose columns.
pasilla_tidy %>%
    select(.sample)
## # A tibble: 102,193 × 1
##    .sample
##    <chr>  
##  1 untrt1 
##  2 untrt1 
##  3 untrt1 
##  4 untrt1 
##  5 untrt1 
##  6 untrt1 
##  7 untrt1 
##  8 untrt1 
##  9 untrt1 
## 10 untrt1 
## # ℹ 102,183 more rows
We can use count to count how many rows we have for each sample.
pasilla_tidy %>%
    count(.sample)
## # A tibble: 7 × 2
##   .sample     n
##   <chr>   <int>
## 1 trt1    14599
## 2 trt2    14599
## 3 trt3    14599
## 4 untrt1  14599
## 5 untrt2  14599
## 6 untrt3  14599
## 7 untrt4  14599
We can use distinct to see what distinct sample information we have.
pasilla_tidy %>%
    distinct(.sample, condition, type)
## # A tibble: 7 × 3
##   .sample condition type      
##   <chr>   <chr>     <chr>     
## 1 untrt1  untreated single_end
## 2 untrt2  untreated single_end
## 3 untrt3  untreated paired_end
## 4 untrt4  untreated paired_end
## 5 trt1    treated   single_end
## 6 trt2    treated   paired_end
## 7 trt3    treated   paired_end
We could use rename to rename a column. For example, we could rename the type column to sequencing.
pasilla_tidy %>%
    rename(sequencing=type)
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
##    .feature    .sample counts condition sequencing
##    <chr>       <chr>    <int> <chr>     <chr>     
##  1 FBgn0000003 untrt1       0 untreated single_end
##  2 FBgn0000008 untrt1      92 untreated single_end
##  3 FBgn0000014 untrt1       5 untreated single_end
##  4 FBgn0000015 untrt1       0 untreated single_end
##  5 FBgn0000017 untrt1    4664 untreated single_end
##  6 FBgn0000018 untrt1     583 untreated single_end
##  7 FBgn0000022 untrt1       0 untreated single_end
##  8 FBgn0000024 untrt1      10 untreated single_end
##  9 FBgn0000028 untrt1       0 untreated single_end
## 10 FBgn0000032 untrt1    1446 untreated single_end
## # ℹ 40 more rows
We could use mutate to create or modify a column. For example, we could overwrite the type column so that it contains single and paired instead of single_end and paired_end.
pasilla_tidy %>%
    mutate(type=gsub("_end", "", type))
## # A SummarizedExperiment-tibble abstraction: 102,193 × 5
## # Features=14599 | Samples=7 | Assays=counts
##    .feature    .sample counts condition type  
##    <chr>       <chr>    <int> <chr>     <chr> 
##  1 FBgn0000003 untrt1       0 untreated single
##  2 FBgn0000008 untrt1      92 untreated single
##  3 FBgn0000014 untrt1       5 untreated single
##  4 FBgn0000015 untrt1       0 untreated single
##  5 FBgn0000017 untrt1    4664 untreated single
##  6 FBgn0000018 untrt1     583 untreated single
##  7 FBgn0000022 untrt1       0 untreated single
##  8 FBgn0000024 untrt1      10 untreated single
##  9 FBgn0000028 untrt1       0 untreated single
## 10 FBgn0000032 untrt1    1446 untreated single
## # ℹ 40 more rows
We could use unite to combine multiple columns into a single column.
pasilla_tidy %>%
    unite("group", c(condition, type))
## # A SummarizedExperiment-tibble abstraction: 102,193 × 4
## # Features=14599 | Samples=7 | Assays=counts
##    .feature    .sample counts group               
##    <chr>       <chr>    <int> <chr>               
##  1 FBgn0000003 untrt1       0 untreated_single_end
##  2 FBgn0000008 untrt1      92 untreated_single_end
##  3 FBgn0000014 untrt1       5 untreated_single_end
##  4 FBgn0000015 untrt1       0 untreated_single_end
##  5 FBgn0000017 untrt1    4664 untreated_single_end
##  6 FBgn0000018 untrt1     583 untreated_single_end
##  7 FBgn0000022 untrt1       0 untreated_single_end
##  8 FBgn0000024 untrt1      10 untreated_single_end
##  9 FBgn0000028 untrt1       0 untreated_single_end
## 10 FBgn0000032 untrt1    1446 untreated_single_end
## # ℹ 40 more rows
We can also combine commands with the tidyverse pipe %>%.
For example, we could combine group_by and summarise to get the total counts for each sample.
pasilla_tidy %>%
    group_by(.sample) %>%
    summarise(total_counts=sum(counts))
## # A tibble: 7 × 2
##   .sample total_counts
##   <chr>          <int>
## 1 trt1        18670279
## 2 trt2         9571826
## 3 trt3        10343856
## 4 untrt1      13972512
## 5 untrt2      21911438
## 6 untrt3       8358426
## 7 untrt4       9841335
We could combine group_by, mutate and filter to keep only the features whose mean count across samples is greater than 0.
pasilla_tidy %>%
    group_by(.feature) %>%
    mutate(mean_count=mean(counts)) %>%
    filter(mean_count > 0)
## # A tibble: 86,513 × 6
## # Groups:   .feature [12,359]
##    .feature    .sample counts condition type       mean_count
##    <chr>       <chr>    <int> <chr>     <chr>           <dbl>
##  1 FBgn0000003 untrt1       0 untreated single_end      0.143
##  2 FBgn0000008 untrt1      92 untreated single_end     99.6  
##  3 FBgn0000014 untrt1       5 untreated single_end      1.43 
##  4 FBgn0000015 untrt1       0 untreated single_end      0.857
##  5 FBgn0000017 untrt1    4664 untreated single_end   4672.   
##  6 FBgn0000018 untrt1     583 untreated single_end    461.   
##  7 FBgn0000022 untrt1       0 untreated single_end      0.143
##  8 FBgn0000024 untrt1      10 untreated single_end      7    
##  9 FBgn0000028 untrt1       0 untreated single_end      0.429
## 10 FBgn0000032 untrt1    1446 untreated single_end   1085.   
## # ℹ 86,503 more rows
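Some of the verbs above (select, count, distinct, summarise) returned a plain tibble rather than a SummarizedExperiment-tibble abstraction. When a plain tibble is wanted from the start, the as_tibble utility listed earlier makes the conversion explicit. A minimal sketch, calling the generic through the tibble namespace on the assumption that tidySummarizedExperiment registers an as_tibble method for SummarizedExperiment objects; pasilla_tbl is an illustrative name.

```r
library(tidySummarizedExperiment)

pasilla_tidy <- tidySummarizedExperiment::pasilla

# Explicitly drop the SummarizedExperiment layer and keep only the long table.
pasilla_tbl <- tibble::as_tibble(pasilla_tidy)

# Inspect the class of the converted object.
class(pasilla_tbl)
```

After this step the object no longer carries assays or colData, so SummarizedExperiment accessors such as assays() do not apply to pasilla_tbl.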
# Custom theme and colour scales for the plot below.
my_theme <-
    list(
        scale_fill_brewer(palette="Set1"),
        scale_color_brewer(palette="Set1"),
        theme_bw() +
            theme(
                panel.border=element_blank(),
                axis.line=element_line(),
                # linewidth replaces size, which is deprecated for lines since ggplot2 3.4.0
                panel.grid.major=element_line(linewidth=0.2),
                panel.grid.minor=element_line(linewidth=0.1),
                text=element_text(size=12),
                legend.position="bottom",
                aspect.ratio=1,
                strip.background=element_blank(),
                axis.title.x=element_text(margin=margin(t=10, r=10, b=10, l=10)),
                axis.title.y=element_text(margin=margin(t=10, r=10, b=10, l=10))
            )
    )
We can treat pasilla_tidy as a normal tibble for plotting.
Here we plot the distribution of counts per sample.
pasilla_tidy %>%
    ggplot(aes(counts + 1, group=.sample, color=`type`)) +
    geom_density() +
    scale_x_log10() +
    my_theme
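Summaries computed with dplyr verbs can be plotted in the same way. The sketch below reuses the group_by and summarise step shown earlier to draw total counts per sample as a bar chart. The names library_sizes and library_size_plot are illustrative, and dplyr is attached explicitly in case it is not already on the search path.

```r
library(ggplot2)
library(tidySummarizedExperiment)
library(dplyr)

pasilla_tidy <- tidySummarizedExperiment::pasilla

# Total counts per sample; this step returns a plain tibble.
library_sizes <- pasilla_tidy %>%
    group_by(.sample) %>%
    summarise(total_counts=sum(counts))

# Bar chart with one bar per sample.
library_size_plot <- ggplot(library_sizes, aes(x=.sample, y=total_counts)) +
    geom_col() +
    theme_bw()
library_size_plot
```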
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] tidyr_1.3.1                     dplyr_1.1.4                    
##  [3] tidySummarizedExperiment_1.16.0 ttservice_0.4.1                
##  [5] SummarizedExperiment_1.36.0     Biobase_2.66.0                 
##  [7] GenomicRanges_1.58.0            GenomeInfoDb_1.42.0            
##  [9] IRanges_2.40.0                  S4Vectors_0.44.0               
## [11] BiocGenerics_0.52.0             MatrixGenerics_1.18.0          
## [13] matrixStats_1.4.1               ggplot2_3.5.1                  
## [15] knitr_1.48                     
## 
## loaded via a namespace (and not attached):
##  [1] plotly_4.10.4           utf8_1.2.4              generics_0.1.3         
##  [4] SparseArray_1.6.0       stringi_1.8.4           lattice_0.22-6         
##  [7] digest_0.6.37           magrittr_2.0.3          RColorBrewer_1.1-3     
## [10] evaluate_1.0.1          grid_4.4.1              fastmap_1.2.0          
## [13] jsonlite_1.8.9          Matrix_1.7-1            httr_1.4.7             
## [16] purrr_1.0.2             fansi_1.0.6             viridisLite_0.4.2      
## [19] UCSC.utils_1.2.0        scales_1.3.0            lazyeval_0.2.2         
## [22] abind_1.4-8             cli_3.6.3               rlang_1.1.4            
## [25] crayon_1.5.3            XVector_0.46.0          ellipsis_0.3.2         
## [28] munsell_0.5.1           withr_3.0.2             DelayedArray_0.32.0    
## [31] S4Arrays_1.6.0          tools_4.4.1             colorspace_2.1-1       
## [34] GenomeInfoDbData_1.2.13 vctrs_0.6.5             R6_2.5.1               
## [37] lifecycle_1.0.4         stringr_1.5.1           zlibbioc_1.52.0        
## [40] htmlwidgets_1.6.4       pkgconfig_2.0.3         pillar_1.9.0           
## [43] gtable_0.3.6            glue_1.8.0              data.table_1.16.2      
## [46] highr_0.11              xfun_0.48               tibble_3.2.1           
## [49] tidyselect_1.2.1        farver_2.1.2            htmltools_0.5.8.1      
## [52] labeling_0.4.3          compiler_4.4.1