Streamer-package {Streamer} | R Documentation
Large data files can be difficult to work with in R, where data
generally resides in memory. This package encourages a style of
programming where data is 'streamed' from disk into R through a series
of components that, typically, reduce the original data to a
manageable size. The package provides useful Producer and Consumer
components for operations such as data input, sampling, indexing, and
transformation.
The central paradigm in this package is a Stream composed of a
Producer and zero or more Consumer components. The Producer is
responsible for input of data, e.g., from the file system. A Consumer
accepts data from a Producer and performs transformations on it. The
Stream function is used to assemble a Producer and zero or more
Consumer components into a single stream.
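The composition described above can be illustrated without the package itself. The following is a minimal base-R sketch, not the Streamer implementation: make_producer and make_stream are hypothetical helpers invented for this illustration, and the argument order (producer first, then consumers in the order they are applied) is an assumption of the sketch that need not match Stream.

```r
## Hypothetical helpers for illustration only; NOT part of the Streamer API.

## A producer is a function that returns the next chunk of 'x', or a
## zero-length vector once the input is exhausted.
make_producer <- function(x, chunk_size) {
    pos <- 0L
    function() {
        if (pos >= length(x))
            return(x[0])
        idx <- seq(pos + 1L, min(pos + chunk_size, length(x)))
        pos <<- pos + length(idx)
        x[idx]
    }
}

## Compose one producer with zero or more consumers. Each consumer is a
## function applied to the chunk, in the order given.
make_stream <- function(producer, ...) {
    consumers <- list(...)
    function() {
        chunk <- producer()
        for (consumer in consumers)
            chunk <- consumer(chunk)
        chunk
    }
}

p <- make_producer(letters, chunk_size = 5L)
s <- make_stream(p, toupper, rev)   # upper-case each chunk, then reverse it
first <- s()
first                               # "E" "D" "C" "B" "A"
```

Each call to s() pulls one chunk from the producer and passes it through every consumer, so only one chunk is held in memory at a time.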
The yield function can be applied to a stream to generate one ‘chunk’
of data. The definition of chunk depends on the stream and its
components. A common paradigm repeatedly invokes yield on a stream,
retrieving chunks of the stream for further processing.
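The repeated-invocation paradigm can be sketched in base R as follows. This is an illustration under stated assumptions rather than Streamer code: next_chunk is a hypothetical stand-in for calling yield on a stream, the input is an in-memory text connection instead of a file, and a zero-length result is taken to signal the end of the input.

```r
## Hypothetical stand-in for a stream; NOT part of the Streamer API.
## Each call returns up to 'n' lines from an open connection, and
## character(0) once the input is exhausted.
con <- textConnection(sprintf("record %d", 1:23))
next_chunk <- function(n = 10L) readLines(con, n = n)

chunk_sizes <- integer()
while (length(chunk <- next_chunk()) != 0L) {
    ## reduce each chunk to a small summary instead of keeping the data
    chunk_sizes <- c(chunk_sizes, length(chunk))
}
close(con)

chunk_sizes   # 10 10 3
```

Only the per-chunk summary accumulates across iterations; the chunks themselves are discarded as soon as they have been processed.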
Martin Morgan mtmorgan@fhcrc.org
Producer and Consumer are the main types of stream components. Use
Stream to connect components, and yield to iterate a stream.
## About this package
packageDescription("Streamer")

## Existing stream components
getClass("Producer")            # Producer classes
getClass("Consumer")            # Consumer classes

## An example
fl <- system.file("extdata", "s_1_sequence.txt", package="Streamer")
b <- RawInput(fl, 100L, reader=rawReaderFactory(1e4))
s <- Stream(RawToChar(), Rev(), b)
s
head(yield(s))                  # First chunk
close(b)

b <- RawInput(fl, 5000L, verbose=TRUE)
d <- Downsample(sampledSize=50)
s <- Stream(RawToChar(), d, b)
s
s[[2]]

## Processing the first ten chunks of the file
i <- 1
while (10 >= i && 0L != length(chunk <- yield(s))) {
    cat("chunk", i, "length", length(chunk), "\n")
    i <- i + 1
}
close(b)