DFplyr 1.0.0
DFplyr is an R package available via the
Bioconductor repository for packages and can be
downloaded via BiocManager::install():
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("DFplyr")
## Check that you have a valid Bioconductor installation
BiocManager::valid()

DFplyr is inspired by dplyr, which implements a
wide variety of common data manipulations (mutate, select, filter) but
operates only on objects of class data.frame or tibble (from the tibble package).
When working with S4Vectors DataFrames (which are frequently
used as components of, for example, SummarizedExperiment objects),
a common workaround is to convert the DataFrame to a tibble in order to then
use dplyr functions to manipulate the contents, before converting
back to a DataFrame.
This has several drawbacks, including the fact that tibble does not support
rownames (and dplyr frequently does not preserve them), does not
support S4 columns (e.g. IRanges vectors), and requires the back
and forth transformation any time manipulation is desired.
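The drawbacks above can be illustrated with a minimal sketch of the round trip. It assumes the tibble, dplyr, and IRanges packages are installed; the object names (d_small, df, tb, res) are hypothetical and are not used elsewhere in this document.

```r
# Sketch of the tibble round-trip workaround and what it loses along the way.
# Assumes S4Vectors, IRanges, tibble, and dplyr are installed.
library(S4Vectors)

# A small DataFrame with row names and one S4 column (toy data)
d_small <- as(mtcars[1:3, c("cyl", "hp")], "DataFrame")
d_small$ir <- IRanges::IRanges(1:3, width = 10)

# Step 1: DataFrame -> data.frame -> tibble
df <- as.data.frame(d_small)
tb <- tibble::as_tibble(df)

rownames(d_small) # the car names
rownames(tb)      # the car names are gone, only "1", "2", "3" remain

# The S4 column does not survive as a single IRanges column
is(d_small$ir, "IRanges")
colnames(tb)

# Step 2: manipulate with dplyr, then convert back to a DataFrame
res <- as(as.data.frame(dplyr::mutate(tb, newvar = cyl + hp)), "DataFrame")
res
```

Both conversions have to be repeated for every manipulation, and neither the row names nor the S4 column come back in their original form.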
library("DFplyr")

To begin with, we create an S4Vectors DataFrame, including some
S4 columns:
library(S4Vectors)
m <- mtcars[, c("cyl", "hp", "am", "gear", "disp")]
d <- as(m, "DataFrame")
d$grX <- GenomicRanges::GRanges("chrX", IRanges::IRanges(1:32, width = 10))
d$grY <- GenomicRanges::GRanges("chrY", IRanges::IRanges(1:32, width = 10))
d$nl <- IRanges::NumericList(lapply(d$gear, function(n) round(rnorm(n), 2)))
d
#> DataFrame with 32 rows and 8 columns
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Mazda RX4                 6       110         1         4       160  chrX:1-10
#> Mazda RX4 Wag             6       110         1         4       160  chrX:2-11
#> Datsun 710                4        93         1         4       108  chrX:3-12
#> Hornet 4 Drive            6       110         0         3       258  chrX:4-13
#> Hornet Sportabout         8       175         0         3       360  chrX:5-14
#> ...                     ...       ...       ...       ...       ...        ...
#> Lotus Europa              4       113         1         5      95.1 chrX:28-37
#> Ford Pantera L            8       264         1         5     351.0 chrX:29-38
#> Ferrari Dino              6       175         1         5     145.0 chrX:30-39
#> Maserati Bora             8       335         1         5     301.0 chrX:31-40
#> Volvo 142E                4       109         1         4     121.0 chrX:32-41
#>                          grY                      nl
#>                    <GRanges> <CompressedNumericList>
#> Mazda RX4          chrY:1-10      1.23,1.53,0.83,...
#> Mazda RX4 Wag      chrY:2-11   -2.38, 0.43,-0.83,...
#> Datsun 710         chrY:3-12   -0.79,-1.08,-0.31,...
#> Hornet 4 Drive     chrY:4-13        0.88,-1.02, 0.75
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83
#> ...                      ...                     ...
#> Lotus Europa      chrY:28-37    1.45,-1.94, 0.11,...
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...
#> Ferrari Dino      chrY:30-39    0.30, 2.11,-0.01,...
#> Maserati Bora     chrY:31-40    1.91, 1.34,-1.76,...
#> Volvo 142E        chrY:32-41   -1.50, 0.40, 0.26,...

This will appear in RStudio’s environment pane as a
Formal class DataFrame (dplyr-compatible) when using DFplyr. No modification of the actual object is required, but this helps identify that dplyr compatibility is available.
DataFrames can then be used in dplyr-like calls in the same way as
data.frame or tibble objects. S4 columns are supported
provided the functions applied to them are defined for their class. When several
columns are added in one call, the new columns are created in alphabetical order.
For example, adding a new column newvar which is the sum of the cyl and hp columns:
mutate(d, newvar = cyl + hp)
#> DataFrame with 32 rows and 9 columns
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Mazda RX4                 6       110         1         4       160  chrX:1-10
#> Mazda RX4 Wag             6       110         1         4       160  chrX:2-11
#> Datsun 710                4        93         1         4       108  chrX:3-12
#> Hornet 4 Drive            6       110         0         3       258  chrX:4-13
#> Hornet Sportabout         8       175         0         3       360  chrX:5-14
#> ...                     ...       ...       ...       ...       ...        ...
#> Lotus Europa              4       113         1         5      95.1 chrX:28-37
#> Ford Pantera L            8       264         1         5     351.0 chrX:29-38
#> Ferrari Dino              6       175         1         5     145.0 chrX:30-39
#> Maserati Bora             8       335         1         5     301.0 chrX:31-40
#> Volvo 142E                4       109         1         4     121.0 chrX:32-41
#>                          grY                      nl    newvar
#>                    <GRanges> <CompressedNumericList> <numeric>
#> Mazda RX4          chrY:1-10      1.23,1.53,0.83,...       116
#> Mazda RX4 Wag      chrY:2-11   -2.38, 0.43,-0.83,...       116
#> Datsun 710         chrY:3-12   -0.79,-1.08,-0.31,...        97
#> Hornet 4 Drive     chrY:4-13        0.88,-1.02, 0.75       116
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83       183
#> ...                      ...                     ...       ...
#> Lotus Europa      chrY:28-37    1.45,-1.94, 0.11,...       117
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...       272
#> Ferrari Dino      chrY:30-39    0.30, 2.11,-0.01,...       181
#> Maserati Bora     chrY:31-40    1.91, 1.34,-1.76,...       343
#> Volvo 142E        chrY:32-41   -1.50, 0.40, 0.26,...       113

or doubling the nl column as nl2:
mutate(d, nl2 = nl * 2)
#> DataFrame with 32 rows and 9 columns
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Mazda RX4                 6       110         1         4       160  chrX:1-10
#> Mazda RX4 Wag             6       110         1         4       160  chrX:2-11
#> Datsun 710                4        93         1         4       108  chrX:3-12
#> Hornet 4 Drive            6       110         0         3       258  chrX:4-13
#> Hornet Sportabout         8       175         0         3       360  chrX:5-14
#> ...                     ...       ...       ...       ...       ...        ...
#> Lotus Europa              4       113         1         5      95.1 chrX:28-37
#> Ford Pantera L            8       264         1         5     351.0 chrX:29-38
#> Ferrari Dino              6       175         1         5     145.0 chrX:30-39
#> Maserati Bora             8       335         1         5     301.0 chrX:31-40
#> Volvo 142E                4       109         1         4     121.0 chrX:32-41
#>                          grY                      nl                     nl2
#>                    <GRanges> <CompressedNumericList> <CompressedNumericList>
#> Mazda RX4          chrY:1-10      1.23,1.53,0.83,...      2.46,3.06,1.66,...
#> Mazda RX4 Wag      chrY:2-11   -2.38, 0.43,-0.83,...   -4.76, 0.86,-1.66,...
#> Datsun 710         chrY:3-12   -0.79,-1.08,-0.31,...   -1.58,-2.16,-0.62,...
#> Hornet 4 Drive     chrY:4-13        0.88,-1.02, 0.75        1.76,-2.04, 1.50
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83        3.74, 0.98,-3.66
#> ...                      ...                     ...                     ...
#> Lotus Europa      chrY:28-37    1.45,-1.94, 0.11,...    2.90,-3.88, 0.22,...
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...    2.74, 0.80,-0.16,...
#> Ferrari Dino      chrY:30-39    0.30, 2.11,-0.01,...    0.60, 4.22,-0.02,...
#> Maserati Bora     chrY:31-40    1.91, 1.34,-1.76,...    3.82, 2.68,-3.52,...
#> Volvo 142E        chrY:32-41   -1.50, 0.40, 0.26,...   -3.00, 0.80, 0.52,...

or calculating the lengths() of the nl column cells as length_nl:
mutate(d, length_nl = lengths(nl))
#> DataFrame with 32 rows and 9 columns
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Mazda RX4                 6       110         1         4       160  chrX:1-10
#> Mazda RX4 Wag             6       110         1         4       160  chrX:2-11
#> Datsun 710                4        93         1         4       108  chrX:3-12
#> Hornet 4 Drive            6       110         0         3       258  chrX:4-13
#> Hornet Sportabout         8       175         0         3       360  chrX:5-14
#> ...                     ...       ...       ...       ...       ...        ...
#> Lotus Europa              4       113         1         5      95.1 chrX:28-37
#> Ford Pantera L            8       264         1         5     351.0 chrX:29-38
#> Ferrari Dino              6       175         1         5     145.0 chrX:30-39
#> Maserati Bora             8       335         1         5     301.0 chrX:31-40
#> Volvo 142E                4       109         1         4     121.0 chrX:32-41
#>                          grY                      nl length_nl
#>                    <GRanges> <CompressedNumericList> <integer>
#> Mazda RX4          chrY:1-10      1.23,1.53,0.83,...         4
#> Mazda RX4 Wag      chrY:2-11   -2.38, 0.43,-0.83,...         4
#> Datsun 710         chrY:3-12   -0.79,-1.08,-0.31,...         4
#> Hornet 4 Drive     chrY:4-13        0.88,-1.02, 0.75         3
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83         3
#> ...                      ...                     ...       ...
#> Lotus Europa      chrY:28-37    1.45,-1.94, 0.11,...         5
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...         5
#> Ferrari Dino      chrY:30-39    0.30, 2.11,-0.01,...         5
#> Maserati Bora     chrY:31-40    1.91, 1.34,-1.76,...         5
#> Volvo 142E        chrY:32-41   -1.50, 0.40, 0.26,...         4

Transformations can involve S4-related functions, such as extracting the
seqnames(), strand(), and end() of the grX column:
mutate(d,
    chr = GenomeInfoDb::seqnames(grX),
    strand_X = BiocGenerics::strand(grX),
    end_X = BiocGenerics::end(grX)
)
#> DataFrame with 32 rows and 11 columns
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Mazda RX4                 6       110         1         4       160  chrX:1-10
#> Mazda RX4 Wag             6       110         1         4       160  chrX:2-11
#> Datsun 710                4        93         1         4       108  chrX:3-12
#> Hornet 4 Drive            6       110         0         3       258  chrX:4-13
#> Hornet Sportabout         8       175         0         3       360  chrX:5-14
#> ...                     ...       ...       ...       ...       ...        ...
#> Lotus Europa              4       113         1         5      95.1 chrX:28-37
#> Ford Pantera L            8       264         1         5     351.0 chrX:29-38
#> Ferrari Dino              6       175         1         5     145.0 chrX:30-39
#> Maserati Bora             8       335         1         5     301.0 chrX:31-40
#> Volvo 142E                4       109         1         4     121.0 chrX:32-41
#>                          grY                      nl   chr     end_X strand_X
#>                    <GRanges> <CompressedNumericList> <Rle> <integer>    <Rle>
#> Mazda RX4          chrY:1-10      1.23,1.53,0.83,...  chrX        10        *
#> Mazda RX4 Wag      chrY:2-11   -2.38, 0.43,-0.83,...  chrX        11        *
#> Datsun 710         chrY:3-12   -0.79,-1.08,-0.31,...  chrX        12        *
#> Hornet 4 Drive     chrY:4-13        0.88,-1.02, 0.75  chrX        13        *
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83  chrX        14        *
#> ...                      ...                     ...   ...       ...      ...
#> Lotus Europa      chrY:28-37    1.45,-1.94, 0.11,...  chrX        37        *
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...  chrX        38        *
#> Ferrari Dino      chrY:30-39    0.30, 2.11,-0.01,...  chrX        39        *
#> Maserati Bora     chrY:31-40    1.91, 1.34,-1.76,...  chrX        40        *
#> Volvo 142E        chrY:32-41   -1.50, 0.40, 0.26,...  chrX        41        *

The object returned remains a standard DataFrame, and further calls can be
piped with %>%, in this case extracting the newly created newvar column:
mutate(d, newvar = cyl + hp) %>%
    pull(newvar)
#>  [1] 116 116  97 116 183 111 253  66  99 129 129 188 188 188 213 223 238  70  56
#> [20]  69 101 158 158 253 183  70  95 117 272 181 343 113

Some variants of the dplyr verbs also work, such as transforming the
numeric columns using a formula-style lambda function, in this case squaring
them:
mutate_if(d, is.numeric, ~ .^2)
#> DataFrame with 32 rows and 8 columns
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Mazda RX4                36     12100         1        16     25600  chrX:1-10
#> Mazda RX4 Wag            36     12100         1        16     25600  chrX:2-11
#> Datsun 710               16      8649         1        16     11664  chrX:3-12
#> Hornet 4 Drive           36     12100         0         9     66564  chrX:4-13
#> Hornet Sportabout        64     30625         0         9    129600  chrX:5-14
#> ...                     ...       ...       ...       ...       ...        ...
#> Lotus Europa             16     12769         1        25   9044.01 chrX:28-37
#> Ford Pantera L           64     69696         1        25 123201.00 chrX:29-38
#> Ferrari Dino             36     30625         1        25  21025.00 chrX:30-39
#> Maserati Bora            64    112225         1        25  90601.00 chrX:31-40
#> Volvo 142E               16     11881         1        16  14641.00 chrX:32-41
#>                          grY                      nl
#>                    <GRanges> <CompressedNumericList>
#> Mazda RX4          chrY:1-10      1.23,1.53,0.83,...
#> Mazda RX4 Wag      chrY:2-11   -2.38, 0.43,-0.83,...
#> Datsun 710         chrY:3-12   -0.79,-1.08,-0.31,...
#> Hornet 4 Drive     chrY:4-13        0.88,-1.02, 0.75
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83
#> ...                      ...                     ...
#> Lotus Europa      chrY:28-37    1.45,-1.94, 0.11,...
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...
#> Ferrari Dino      chrY:30-39    0.30, 2.11,-0.01,...
#> Maserati Bora     chrY:31-40    1.91, 1.34,-1.76,...
#> Volvo 142E        chrY:32-41   -1.50, 0.40, 0.26,...

or extracting the start of all of the "GRanges" columns:
mutate_if(d, ~ isa(., "GRanges"), BiocGenerics::start)
#> DataFrame with 32 rows and 8 columns
#>                         cyl        hp        am      gear      disp       grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric> <integer>
#> Mazda RX4                 6       110         1         4       160         1
#> Mazda RX4 Wag             6       110         1         4       160         2
#> Datsun 710                4        93         1         4       108         3
#> Hornet 4 Drive            6       110         0         3       258         4
#> Hornet Sportabout         8       175         0         3       360         5
#> ...                     ...       ...       ...       ...       ...       ...
#> Lotus Europa              4       113         1         5      95.1        28
#> Ford Pantera L            8       264         1         5     351.0        29
#> Ferrari Dino              6       175         1         5     145.0        30
#> Maserati Bora             8       335         1         5     301.0        31
#> Volvo 142E                4       109         1         4     121.0        32
#>                         grY                      nl
#>                   <integer> <CompressedNumericList>
#> Mazda RX4                 1      1.23,1.53,0.83,...
#> Mazda RX4 Wag             2   -2.38, 0.43,-0.83,...
#> Datsun 710                3   -0.79,-1.08,-0.31,...
#> Hornet 4 Drive            4        0.88,-1.02, 0.75
#> Hornet Sportabout         5        1.87, 0.49,-1.83
#> ...                     ...                     ...
#> Lotus Europa             28    1.45,-1.94, 0.11,...
#> Ford Pantera L           29    1.37, 0.40,-0.08,...
#> Ferrari Dino             30    0.30, 2.11,-0.01,...
#> Maserati Bora            31    1.91, 1.34,-1.76,...
#> Volvo 142E               32   -1.50, 0.40, 0.26,...

tidyselect helpers can only be used within vars()
calls, together with the _at variants:
mutate_at(d, vars(starts_with("c")), ~ .^2)
#> DataFrame with 32 rows and 8 columns
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Mazda RX4                36       110         1         4       160  chrX:1-10
#> Mazda RX4 Wag            36       110         1         4       160  chrX:2-11
#> Datsun 710               16        93         1         4       108  chrX:3-12
#> Hornet 4 Drive           36       110         0         3       258  chrX:4-13
#> Hornet Sportabout        64       175         0         3       360  chrX:5-14
#> ...                     ...       ...       ...       ...       ...        ...
#> Lotus Europa             16       113         1         5      95.1 chrX:28-37
#> Ford Pantera L           64       264         1         5     351.0 chrX:29-38
#> Ferrari Dino             36       175         1         5     145.0 chrX:30-39
#> Maserati Bora            64       335         1         5     301.0 chrX:31-40
#> Volvo 142E               16       109         1         4     121.0 chrX:32-41
#>                          grY                      nl
#>                    <GRanges> <CompressedNumericList>
#> Mazda RX4          chrY:1-10      1.23,1.53,0.83,...
#> Mazda RX4 Wag      chrY:2-11   -2.38, 0.43,-0.83,...
#> Datsun 710         chrY:3-12   -0.79,-1.08,-0.31,...
#> Hornet 4 Drive     chrY:4-13        0.88,-1.02, 0.75
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83
#> ...                      ...                     ...
#> Lotus Europa      chrY:28-37    1.45,-1.94, 0.11,...
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...
#> Ferrari Dino      chrY:30-39    0.30, 2.11,-0.01,...
#> Maserati Bora     chrY:31-40    1.91, 1.34,-1.76,...
#> Volvo 142E        chrY:32-41   -1.50, 0.40, 0.26,...

This also works with other verbs:
select_at(d, vars(starts_with("gr")))
#> DataFrame with 32 rows and 2 columns
#>                          grX        grY
#>                    <GRanges>  <GRanges>
#> Mazda RX4          chrX:1-10  chrY:1-10
#> Mazda RX4 Wag      chrX:2-11  chrY:2-11
#> Datsun 710         chrX:3-12  chrY:3-12
#> Hornet 4 Drive     chrX:4-13  chrY:4-13
#> Hornet Sportabout  chrX:5-14  chrY:5-14
#> ...                      ...        ...
#> Lotus Europa      chrX:28-37 chrY:28-37
#> Ford Pantera L    chrX:29-38 chrY:29-38
#> Ferrari Dino      chrX:30-39 chrY:30-39
#> Maserati Bora     chrX:31-40 chrY:31-40
#> Volvo 142E        chrX:32-41 chrY:32-41

Importantly, grouped operations are supported. DataFrame does not
natively support groups (the same way that data.frame does not), so these
are implemented specifically for DFplyr, with group information shown at the
top of the printed output:
group_by(d, cyl, am)
#> DataFrame with 32 rows and 8 columns
#> Groups:  cyl, am 
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Mazda RX4                 6       110         1         4       160  chrX:1-10
#> Mazda RX4 Wag             6       110         1         4       160  chrX:2-11
#> Datsun 710                4        93         1         4       108  chrX:3-12
#> Hornet 4 Drive            6       110         0         3       258  chrX:4-13
#> Hornet Sportabout         8       175         0         3       360  chrX:5-14
#> ...                     ...       ...       ...       ...       ...        ...
#> Lotus Europa              4       113         1         5      95.1 chrX:28-37
#> Ford Pantera L            8       264         1         5     351.0 chrX:29-38
#> Ferrari Dino              6       175         1         5     145.0 chrX:30-39
#> Maserati Bora             8       335         1         5     301.0 chrX:31-40
#> Volvo 142E                4       109         1         4     121.0 chrX:32-41
#>                          grY                      nl
#>                    <GRanges> <CompressedNumericList>
#> Mazda RX4          chrY:1-10      1.23,1.53,0.83,...
#> Mazda RX4 Wag      chrY:2-11   -2.38, 0.43,-0.83,...
#> Datsun 710         chrY:3-12   -0.79,-1.08,-0.31,...
#> Hornet 4 Drive     chrY:4-13        0.88,-1.02, 0.75
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83
#> ...                      ...                     ...
#> Lotus Europa      chrY:28-37    1.45,-1.94, 0.11,...
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...
#> Ferrari Dino      chrY:30-39    0.30, 2.11,-0.01,...
#> Maserati Bora     chrY:31-40    1.91, 1.34,-1.76,...
#> Volvo 142E        chrY:32-41   -1.50, 0.40, 0.26,...

Other verbs are similarly implemented, and preserve row names where possible. For example, selecting a limited set of columns using non-standard evaluation (NSE):
select(d, am, cyl)
#> DataFrame with 32 rows and 2 columns
#>                          am       cyl
#>                   <numeric> <numeric>
#> Mazda RX4                 1         6
#> Mazda RX4 Wag             1         6
#> Datsun 710                1         4
#> Hornet 4 Drive            0         6
#> Hornet Sportabout         0         8
#> ...                     ...       ...
#> Lotus Europa              1         4
#> Ford Pantera L            1         8
#> Ferrari Dino              1         6
#> Maserati Bora             1         8
#> Volvo 142E                1         4

Arranging rows according to the ordering of a column:
arrange(d, desc(hp))
#> DataFrame with 32 rows and 8 columns
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Maserati Bora             8       335         1         5       301 chrX:31-40
#> Ford Pantera L            8       264         1         5       351 chrX:29-38
#> Duster 360                8       245         0         3       360  chrX:7-16
#> Camaro Z28                8       245         0         3       350 chrX:24-33
#> Chrysler Imperial         8       230         0         3       440 chrX:17-26
#> ...                     ...       ...       ...       ...       ...        ...
#> Fiat 128                  4        66         1         4      78.7 chrX:18-27
#> Fiat X1-9                 4        66         1         4      79.0 chrX:26-35
#> Toyota Corolla            4        65         1         4      71.1 chrX:20-29
#> Merc 240D                 4        62         0         4     146.7  chrX:8-17
#> Honda Civic               4        52         1         4      75.7 chrX:19-28
#>                          grY                      nl
#>                    <GRanges> <CompressedNumericList>
#> Maserati Bora     chrY:31-40    1.91, 1.34,-1.76,...
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...
#> Duster 360         chrY:7-16        1.11,-1.34, 0.60
#> Camaro Z28        chrY:24-33        1.87, 0.32,-1.05
#> Chrysler Imperial chrY:17-26        0.16, 1.44,-1.18
#> ...                      ...                     ...
#> Fiat 128          chrY:18-27   -1.87,-0.44,-0.80,...
#> Fiat X1-9         chrY:26-35      0.18,0.38,0.45,...
#> Toyota Corolla    chrY:20-29   -0.05,-0.01, 1.11,...
#> Merc 240D          chrY:8-17    0.83,-0.54,-0.48,...
#> Honda Civic       chrY:19-28   -1.30,-1.33,-1.45,...

Filtering to only specific values appearing in a column:
filter(d, am == 0)
#> DataFrame with 19 rows and 8 columns
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Hornet 4 Drive            6       110         0         3     258.0  chrX:4-13
#> Hornet Sportabout         8       175         0         3     360.0  chrX:5-14
#> Valiant                   6       105         0         3     225.0  chrX:6-15
#> Duster 360                8       245         0         3     360.0  chrX:7-16
#> Merc 240D                 4        62         0         4     146.7  chrX:8-17
#> ...                     ...       ...       ...       ...       ...        ...
#> Toyota Corona             4        97         0         3     120.1 chrX:21-30
#> Dodge Challenger          8       150         0         3     318.0 chrX:22-31
#> AMC Javelin               8       150         0         3     304.0 chrX:23-32
#> Camaro Z28                8       245         0         3     350.0 chrX:24-33
#> Pontiac Firebird          8       175         0         3     400.0 chrX:25-34
#>                          grY                      nl
#>                    <GRanges> <CompressedNumericList>
#> Hornet 4 Drive     chrY:4-13        0.88,-1.02, 0.75
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83
#> Valiant            chrY:6-15        0.02, 0.15,-1.65
#> Duster 360         chrY:7-16        1.11,-1.34, 0.60
#> Merc 240D          chrY:8-17    0.83,-0.54,-0.48,...
#> ...                      ...                     ...
#> Toyota Corona     chrY:21-30        0.01, 0.70,-1.00
#> Dodge Challenger  chrY:22-31       -1.38,-0.17,-1.12
#> AMC Javelin       chrY:23-32       -1.99, 1.33, 0.86
#> Camaro Z28        chrY:24-33        1.87, 0.32,-1.05
#> Pontiac Firebird  chrY:25-34       -0.77,-1.11,-0.92

Selecting specific rows by index:
slice(d, 3:6)
#> DataFrame with 4 rows and 8 columns
#>                         cyl        hp        am      gear      disp       grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric> <GRanges>
#> Datsun 710                4        93         1         4       108 chrX:3-12
#> Hornet 4 Drive            6       110         0         3       258 chrX:4-13
#> Hornet Sportabout         8       175         0         3       360 chrX:5-14
#> Valiant                   6       105         0         3       225 chrX:6-15
#>                         grY                      nl
#>                   <GRanges> <CompressedNumericList>
#> Datsun 710        chrY:3-12   -0.79,-1.08,-0.31,...
#> Hornet 4 Drive    chrY:4-13        0.88,-1.02, 0.75
#> Hornet Sportabout chrY:5-14        1.87, 0.49,-1.83
#> Valiant           chrY:6-15        0.02, 0.15,-1.65

These also work for grouped objects, and also preserve the rownames, e.g.
selecting the first two rows from each group of gear:
group_by(d, gear) %>%
    slice(1:2)
#> DataFrame with 6 rows and 8 columns
#> Groups:  gear 
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Hornet Sportabout         8       175         0         3     360.0  chrX:5-14
#> Merc 450SL                8       180         0         3     275.8 chrX:13-22
#> Mazda RX4                 6       110         1         4     160.0  chrX:1-10
#> Mazda RX4 Wag             6       110         1         4     160.0  chrX:2-11
#> Porsche 914-2             4        91         1         5     120.3 chrX:27-36
#> Ford Pantera L            8       264         1         5     351.0 chrX:29-38
#>                          grY                      nl
#>                    <GRanges> <CompressedNumericList>
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83
#> Merc 450SL        chrY:13-22          2.73,0.43,0.08
#> Mazda RX4          chrY:1-10      1.23,1.53,0.83,...
#> Mazda RX4 Wag      chrY:2-11   -2.38, 0.43,-0.83,...
#> Porsche 914-2     chrY:27-36   -0.31,-0.91, 0.41,...
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...

rename is itself renamed to rename2 due to conflicts between
dplyr and S4Vectors, but works in the
dplyr sense of taking new = old replacements with NSE syntax:
select(d, am, cyl) %>%
    rename2(foo = am)
#> DataFrame with 32 rows and 2 columns
#>                         foo       cyl
#>                   <numeric> <numeric>
#> Mazda RX4                 1         6
#> Mazda RX4 Wag             1         6
#> Datsun 710                1         4
#> Hornet 4 Drive            0         6
#> Hornet Sportabout         0         8
#> ...                     ...       ...
#> Lotus Europa              1         4
#> Ford Pantera L            1         8
#> Ferrari Dino              1         6
#> Maserati Bora             1         8
#> Volvo 142E                1         4

Row names are not preserved when there may be duplicates or when they don’t make
sense; otherwise the first label is kept, as determined by the current de-duplication
method (in the case of distinct, this is BiocGenerics::duplicated). This
may have complications for S4 columns.
distinct(d)
#> DataFrame with 32 rows and 8 columns
#>                         cyl        hp        am      gear      disp        grX
#>                   <numeric> <numeric> <numeric> <numeric> <numeric>  <GRanges>
#> Mazda RX4                 6       110         1         4       160  chrX:1-10
#> Mazda RX4 Wag             6       110         1         4       160  chrX:2-11
#> Datsun 710                4        93         1         4       108  chrX:3-12
#> Hornet 4 Drive            6       110         0         3       258  chrX:4-13
#> Hornet Sportabout         8       175         0         3       360  chrX:5-14
#> ...                     ...       ...       ...       ...       ...        ...
#> Lotus Europa              4       113         1         5      95.1 chrX:28-37
#> Ford Pantera L            8       264         1         5     351.0 chrX:29-38
#> Ferrari Dino              6       175         1         5     145.0 chrX:30-39
#> Maserati Bora             8       335         1         5     301.0 chrX:31-40
#> Volvo 142E                4       109         1         4     121.0 chrX:32-41
#>                          grY                      nl
#>                    <GRanges> <CompressedNumericList>
#> Mazda RX4          chrY:1-10      1.23,1.53,0.83,...
#> Mazda RX4 Wag      chrY:2-11   -2.38, 0.43,-0.83,...
#> Datsun 710         chrY:3-12   -0.79,-1.08,-0.31,...
#> Hornet 4 Drive     chrY:4-13        0.88,-1.02, 0.75
#> Hornet Sportabout  chrY:5-14        1.87, 0.49,-1.83
#> ...                      ...                     ...
#> Lotus Europa      chrY:28-37    1.45,-1.94, 0.11,...
#> Ford Pantera L    chrY:29-38    1.37, 0.40,-0.08,...
#> Ferrari Dino      chrY:30-39    0.30, 2.11,-0.01,...
#> Maserati Bora     chrY:31-40    1.91, 1.34,-1.76,...
#> Volvo 142E        chrY:32-41   -1.50, 0.40, 0.26,...

Behaviours are intended to match those of dplyr wherever possible, for example a grouped tally, here weighted by gear:
group_by(d, cyl, am) %>%
    tally(gear)
#> DataFrame with 6 rows and 3 columns
#>         cyl        am         n
#>   <numeric> <numeric> <numeric>
#> 1         4         0        11
#> 2         4         1        34
#> 3         6         0        14
#> 4         6         1        13
#> 5         8         0        36
#> 6         8         1        10

or a count of the rows in each combination of gear, am, and cyl:
count(d, gear, am, cyl)
#> DataFrame with 10 rows and 4 columns
#>        gear    am   cyl         n
#>    <factor> <Rle> <Rle> <integer>
#> 1         3     0     4         1
#> 2         3     0     6         2
#> 3         3     0     8        12
#> 4         4     0     4         2
#> 5         4     0     6         2
#> 6         4     1     4         6
#> 7         4     1     6         2
#> 8         5     1     4         2
#> 9         5     1     6         1
#> 10        5     1     8         2

We hope that DFplyr will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
citation("DFplyr")
#> To cite package 'DFplyr' in publications use:
#> 
#>   Carroll J (2024). _DFplyr: A `DataFrame` (`S4Vectors`) backend for
#>   `dplyr`_. doi:10.18129/B9.bioc.DFplyr
#>   <https://doi.org/10.18129/B9.bioc.DFplyr>, R package version 1.0.0,
#>   <https://bioconductor.org/packages/DFplyr>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {DFplyr: A `DataFrame` (`S4Vectors`) backend for `dplyr`},
#>     author = {Jonathan Carroll},
#>     year = {2024},
#>     note = {R package version 1.0.0},
#>     url = {https://bioconductor.org/packages/DFplyr},
#>     doi = {10.18129/B9.bioc.DFplyr},
#>   }

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       Ubuntu 24.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2024-10-29
#>  pandoc   2.7.3 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package          * version date (UTC) lib source
#>  BiocGenerics     * 0.52.0  2024-10-29 [2] Bioconductor 3.20 (R 4.4.1)
#>  BiocManager        1.30.25 2024-08-28 [2] CRAN (R 4.4.1)
#>  BiocStyle        * 2.34.0  2024-10-29 [2] Bioconductor 3.20 (R 4.4.1)
#>  bookdown           0.41    2024-10-16 [2] CRAN (R 4.4.1)
#>  bslib              0.8.0   2024-07-29 [2] CRAN (R 4.4.1)
#>  cachem             1.1.0   2024-05-16 [2] CRAN (R 4.4.1)
#>  cli                3.6.3   2024-06-21 [2] CRAN (R 4.4.1)
#>  DFplyr           * 1.0.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.1)
#>  digest             0.6.37  2024-08-19 [2] CRAN (R 4.4.1)
#>  dplyr            * 1.1.4   2023-11-17 [2] CRAN (R 4.4.1)
#>  evaluate           1.0.1   2024-10-10 [2] CRAN (R 4.4.1)
#>  fansi              1.0.6   2023-12-08 [2] CRAN (R 4.4.1)
#>  fastmap            1.2.0   2024-05-15 [2] CRAN (R 4.4.1)
#>  generics           0.1.3   2022-07-05 [2] CRAN (R 4.4.1)
#>  GenomeInfoDb       1.42.0  2024-10-29 [2] Bioconductor 3.20 (R 4.4.1)
#>  GenomeInfoDbData   1.2.13  2024-10-01 [2] Bioconductor
#>  GenomicRanges      1.58.0  2024-10-29 [2] Bioconductor 3.20 (R 4.4.1)
#>  glue               1.8.0   2024-09-30 [2] CRAN (R 4.4.1)
#>  htmltools          0.5.8.1 2024-04-04 [2] CRAN (R 4.4.1)
#>  httr               1.4.7   2023-08-15 [2] CRAN (R 4.4.1)
#>  IRanges            2.40.0  2024-10-29 [2] Bioconductor 3.20 (R 4.4.1)
#>  jquerylib          0.1.4   2021-04-26 [2] CRAN (R 4.4.1)
#>  jsonlite           1.8.9   2024-09-20 [2] CRAN (R 4.4.1)
#>  knitr              1.48    2024-07-07 [2] CRAN (R 4.4.1)
#>  lifecycle          1.0.4   2023-11-07 [2] CRAN (R 4.4.1)
#>  magrittr           2.0.3   2022-03-30 [2] CRAN (R 4.4.1)
#>  pillar             1.9.0   2023-03-22 [2] CRAN (R 4.4.1)
#>  pkgconfig          2.0.3   2019-09-22 [2] CRAN (R 4.4.1)
#>  R6                 2.5.1   2021-08-19 [2] CRAN (R 4.4.1)
#>  rlang              1.1.4   2024-06-04 [2] CRAN (R 4.4.1)
#>  rmarkdown          2.28    2024-08-17 [2] CRAN (R 4.4.1)
#>  S4Vectors        * 0.44.0  2024-10-29 [2] Bioconductor 3.20 (R 4.4.1)
#>  sass               0.4.9   2024-03-15 [2] CRAN (R 4.4.1)
#>  sessioninfo        1.2.2   2021-12-06 [2] CRAN (R 4.4.1)
#>  tibble             3.2.1   2023-03-20 [2] CRAN (R 4.4.1)
#>  tidyselect         1.2.1   2024-03-11 [2] CRAN (R 4.4.1)
#>  UCSC.utils         1.2.0   2024-10-29 [2] Bioconductor 3.20 (R 4.4.1)
#>  utf8               1.2.4   2023-10-22 [2] CRAN (R 4.4.1)
#>  vctrs              0.6.5   2023-12-01 [2] CRAN (R 4.4.1)
#>  withr              3.0.2   2024-10-28 [2] CRAN (R 4.4.1)
#>  xfun               0.48    2024-10-03 [2] CRAN (R 4.4.1)
#>  XVector            0.46.0  2024-10-29 [2] Bioconductor 3.20 (R 4.4.1)
#>  yaml               2.3.10  2024-07-26 [2] CRAN (R 4.4.1)
#>  zlibbioc           1.52.0  2024-10-29 [2] Bioconductor 3.20 (R 4.4.1)
#> 
#>  [1] /home/biocbuild/bbs-3.20-bioc/tmpdir/RtmpEzPksP/Rinst3cfffc58977b36
#>  [2] /home/biocbuild/bbs-3.20-bioc/R/site-library
#>  [3] /home/biocbuild/bbs-3.20-bioc/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
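
The grouped tally and the multi-column count shown earlier can be cross-checked without DFplyr, dplyr or S4Vectors. The following is a minimal sketch in base R, assuming only the built-in mtcars data; the names tally_check and count_check are hypothetical and do not come from the package. It checks the numbers in the n columns only, not the DataFrame classes or column types that DFplyr returns.

```r
# Base R only: DFplyr, dplyr and S4Vectors are not required for this check.
m <- mtcars[, c("cyl", "hp", "am", "gear", "disp")]

# A tally of `gear` within (cyl, am) groups is a weighted tally:
# the sum of `gear` in each group.
tally_check <- aggregate(gear ~ cyl + am, data = m, FUN = sum)
names(tally_check)[names(tally_check) == "gear"] <- "n"
# Sort by cyl, then am, to match the row order of the grouped output.
tally_check <- tally_check[order(tally_check$cyl, tally_check$am), ]
rownames(tally_check) <- NULL
print(tally_check)

# A count over (gear, am, cyl) is the number of rows for each
# combination that actually occurs in the data.
count_check <- as.data.frame(
    table(gear = m$gear, am = m$am, cyl = m$cyl),
    responseName = "n"
)
count_check <- count_check[count_check$n > 0, ]
count_check <- count_check[
    order(count_check$gear, count_check$am, count_check$cyl),
]
rownames(count_check) <- NULL
print(count_check)
```

If the sketch is right, the n column of tally_check should agree with the six-row tally output, and count_check should have the same ten combinations and counts as the count(d, gear, am, cyl) output.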