colsum,HDF5Matrix-method {DelayedMatrixStats}    R Documentation

Give Column and Row Sums of an HDF5Matrix Based on a Grouping Variable

Description

Sum the columns (colsum()) or rows (rowsum()) of a numeric HDF5Array::HDF5Matrix object within each level of a grouping variable.
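
As a rough illustration of the grouping semantics, the sketch below uses base R on an ordinary in-memory matrix rather than an HDF5Matrix. It assumes that colsum() on a plain matrix agrees with t(rowsum(t(x), group)); the names x, group, and col_sums are illustrative only and are not part of this package.

```r
# Illustrative only: an ordinary 2 x 3 integer matrix, not an HDF5Matrix.
x <- matrix(1:6, nrow = 2)

# One grouping label per column of x, as colsum() expects.
group <- c("a", "b", "a")

# Assumed equivalent of colsum(): transpose, sum rows by group, transpose back.
col_sums <- t(rowsum(t(x), group))

# One column per group level; columns 1 and 3 of x are summed under "a".
col_sums
```

The result keeps the number of rows of x and has one column per level of group.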

Usage

## S4 method for signature 'HDF5Matrix'
colsum(x, group, reorder = TRUE, na.rm = FALSE,
  filepath = NULL, name = NULL, chunkdim = NULL, level = NULL,
  type = c("double", "integer"), BPPARAM = bpparam())

## S4 method for signature 'HDF5Matrix'
rowsum(x, group, reorder = TRUE, na.rm = FALSE,
  filepath = NULL, name = NULL, chunkdim = NULL, level = NULL,
  type = c("double", "integer"), BPPARAM = bpparam())

Arguments

x

An HDF5Array::HDF5Matrix object.

group

A vector or factor giving the grouping, with one element per row of x for rowsum() or one element per column of x for colsum(). Missing values will be treated as another group and a warning will be given.

reorder

If TRUE, then the result will be in order of sort(unique(group)). If FALSE, it will be in the order that groups are encountered.

na.rm

logical (TRUE or FALSE). Should NA (including NaN) values be discarded?

filepath

NULL or the path (as a single string) to the (new or existing) HDF5 file to which the dataset will be written. If NULL, the dataset is written to the current HDF5 dump file, i.e., the path returned by HDF5Array::getHDF5DumpFile().

name

NULL or the name of the HDF5 dataset to write. If NULL, then the name returned by HDF5Array::getHDF5DumpName() will be used.

chunkdim

The dimensions of the chunks to use for writing the data to disk. By default, HDF5Array::getHDF5DumpChunkDim(dim(ans)) will be used, where ans is the returned object. See ?HDF5Array::getHDF5DumpChunkDim() for more information.

level

The compression level to use for writing the data to disk. By default, HDF5Array::getHDF5DumpCompressionLevel() will be used. See ?HDF5Array::getHDF5DumpCompressionLevel() for more information.

type

The type of the data written to the HDF5 dataset that holds the result. If the result is known a priori to be integer, then it is recommended to set type = "integer".

BPPARAM

An optional BiocParallel instance determining the parallel back-end to be used during evaluation, or a list of BiocParallel instances, to be applied in sequence for nested calls to BiocParallel functions.
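
The effect of reorder and na.rm can be sketched with base::rowsum() on an ordinary matrix. This assumes that the methods documented here follow the base R semantics for these two arguments, as the argument descriptions suggest; the objects x, group, sorted, unsorted, and without_na are illustrative names only.

```r
# Illustrative 4 x 1 matrix with one missing value in group "b".
x <- matrix(c(1, 2, NA, 4), ncol = 1)
group <- c("b", "a", "b", "a")

# reorder = TRUE (default): rows follow sort(unique(group)).
sorted <- rowsum(x, group)

# reorder = FALSE: rows follow the order in which groups are first encountered.
unsorted <- rowsum(x, group, reorder = FALSE)

# na.rm = TRUE: NA (and NaN) values are discarded before summing.
without_na <- rowsum(x, group, na.rm = TRUE)

rownames(sorted)
rownames(unsorted)
without_na
```

With the default na.rm = FALSE, the missing value propagates into the sum for its group.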

Details

NOTE: Unlike base::rowsum(), the result is of type double unless type = "integer" is specified. Consequently, with the default type = "double", sums of integer input are not subject to the integer over/underflow issues that affect base::rowsum().
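
The over/underflow concern can be illustrated with base R alone. The sketch below contrasts integer and double accumulation in base::rowsum() on an ordinary matrix; it does not call the HDF5Matrix methods, and the names x_int, x_dbl, sum_int, and sum_dbl are illustrative only.

```r
# Two integer rows in the same group whose sum exceeds .Machine$integer.max.
x_int <- matrix(c(.Machine$integer.max, 1L), ncol = 1)
group <- c(1, 1)

# Integer accumulation: base::rowsum() documents that overflow results in NA.
sum_int <- suppressWarnings(rowsum(x_int, group))

# Double accumulation: convert the storage mode first, then sum.
x_dbl <- x_int
storage.mode(x_dbl) <- "double"
sum_dbl <- rowsum(x_dbl, group)

typeof(sum_int)
typeof(sum_dbl)
```

Setting type = "integer" is therefore best reserved for cases where the group sums are known to fit in the integer range.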

Examples

# A DelayedMatrix with an 'HDF5ArraySeed' seed
# NOTE: Requires that the HDF5Array package is installed
library(HDF5Array)
dm_HDF5 <- writeHDF5Array(matrix(c(rep(1L, 5),
                                   as.integer((0:4) ^ 2),
                                   seq(-5L, -1L, 1L)),
                                 ncol = 3))
group <- c(1, 1, 2)

# Compute the sums and store them in an HDF5-backed DelayedMatrix.
xsum <- colsum(dm_HDF5, group)
class(seed(xsum))


[Package DelayedMatrixStats version 1.4.0 Index]