getPeaklist {IPPD} | R Documentation |
Peak pattern extraction by non-negative ls/lad template matching
Description
Generates a candidate list of isotopic peak patterns
present in a protein mass spectrum. This is achieved by matching
templates calculated according to the so-called Averagine model to the raw
spectrum using either non-negative least squares (ls) or non-negative
least absolute deviation (lad) estimation. The presence of multiple charge
states is supported. In particular, the approach is capable of
deconvolving overlapping patterns. The method can be applied with two
different kind of peak shapes, Gaussians and Exponentially Modified Gaussians (EMG).
Usage
getPeaklist(mz, intensities, model = c("Gaussian", "EMG"),
model.parameters = list(alpha = function(){},
sigma = function(){},
mu = function(){}),
averagine.table = NULL,
loss = c("L2", "L1"), binning = FALSE,
postprocessing = TRUE, trace = TRUE, returnbasis = TRUE,
control.basis = list(charges = c(1,2,3,4),
eps = 1e-05),
control.localnoise = list(quantile = 0.5,
factor.place = 1.5,
factor.post = 0,
window = NULL,
subtract = FALSE),
control.postprocessing = list(mzfilter = FALSE,
prune = FALSE,
factor.prune = NULL,
ppm = NULL,
goodnessoffit = FALSE),
control.binning = list(tol = 0.01))
Arguments
mz |
A numeric vector of m/z (mass/charge) values (in
Thomson), ordered increasingly.
|
intensities |
A numeric vector of intensities
corresponding to mz .
|
model |
Basic model for the shape of a single peak. Must be
"Gaussian" or "EMG" (exponentially modified
Gaussian). See fitModelParameters for further information on
these models.
|
averagine.table |
If averagine.table = NULL the usual Averagine model (Senko et al. (1995): Determination of Monoisotopic Masses and Ion Populations for
Large Biomolecules from Resolved Isotopic Distributions, J. Am. Soc. Mass Spect., 6, 229-233) is used. Otherwise, avergine.table has to be a matrix or data.frame of the following form: each row encodes the
isotope distribution at a certain mass. The first entry of each row contains such mass while the remaining entries contain the relative abundances of the isotopes.
|
loss |
The loss function to be used. The choice loss =
"L2" yield a nonnegative least squares fit, loss = "L1" a
nonnegative least absolute deviation fit. The second choice is
more robust when deviations from model assumptions (peak model,
Averagine model,... ) frequently occur in the data. Note,
however, that computation time is much higher for the second choice
(at least by a factor two).
|
model.parameters |
A list of functions with precisely one
argument representing mz . The parameters of a single peak are
typically modeled as a function of m/z. If model = "Gaussian" ,
the peak shape depends on the parameter sigma (a function
sigma(mz) ). If model = "EMG" , the peak shape
additionally depends on two parameters alpha and mu
(two functions alpha(mz) and mu(mz) ). Note that
constant functions are usually specified by using
a construction of the form parameter(mz) <- function(mz)
rep(constant, length(mz)) . Moreover, note that a valid function
has to be vectorized. For the automatic generation
of those functions from a raw spectrum and further details on the
meaning of the parameters, see fitModelParameters. The output
of a call to fitModelParameters can directly be plugged into
getPeaklist via the argument model.parameters .
|
binning |
A logical indicating whether the fitting process
should be done sequentially in 'bins'. If TRUE , the spectrum is cut into pieces
(bins). Each bin is then fitted separately, and the results of all
bins are combined in the end. Division into bins may be configured using control.binning . See also
the 'Details' section below.
|
postprocessing |
A logical indicating whether a
post-processing correction should be applied to the raw
peaklist. See also the argument control.postprocessing and the 'Details' section below.
|
trace |
A logical indicating whether information tracing the different steps of the
fitting process should be displayed.
|
returnbasis |
A logical indicating whether the matrix of basis functions (template
functions evaluated at mz ) should be returned. Note that this may be expensive in terms of storage.
|
control.basis |
A list of arguments controlling the computation of
the matrix of basis functions:
charges The set of charge states present in the
spectrum.
eps Function values below eps are set equal to
precisely zero in order to make the basis function matrix sparse.
|
control.localnoise |
A list of arguments controlling the placement
and selection of basis functions on the basis of a 'local noise
level':
quantile A value from 0.1, 0.2, ..., 0.9 , specifying
the quantile of the intensities residing in a sliding
m/z window (s. below) to be used as 'local noise level'.
factor.place Controls the placement of basis
functions. A basis function is placed at an element of mz
if and only if the intensity at that position exceeds the 'local
noise level' by a factor at least equal to factor.place .
factor.post Controls which basis functions
enter the postprocessing step. A basis function is
discarded before the postprocessing step if its estimated
amplitude does not exceed the factor.post times the 'local
noise level'. By default factor.post = 0 . The pre-filtering
step before postprocessing is mainly done for computational
speed-up, and factor.post = 0 is supposed to yield the
qualitatively best solution, though it may take additional time.
window The length of the sliding window used to compute
quantile , to be specified in Thomson. By default,
window is chosen in such a way that it equals the length of
the support of an 'average' basis function for charge state one.
subtract A logical indicating whether the
'local noise level' should be subtracted
from the observed intensities. Setting subtract = TRUE is
typically beneficial in the sense that fitting of noise is reduced.
|
control.postprocessing |
A list of arguments controlling the
postprocessing step (provided postprocessing = TRUE ):
mzfilter Setting mzfilter = TRUE removes basis
functions at positions where peak patterns are highly
improbable to occur, thereby removing peaks from the list which are
likely to be noise peaks. This filter is sometimes called 'peptide
mass rule', see Zubarev et al. (1996): Accuracy Requirements
for Peptide Characterization by Monoisotopic Molecular
Measurements, Anal. Chem.,88,4060-4063.
prune , factor.prune Setting prune = TRUE activates a crude
scheme that removes low-intensity peaks (likely to be noise
peaks), as frequently occurring in regions with extremely intense
peaks. According to this scheme, a peak is removed from the peak
list if its amplitude is less than factor.prune times the
locally most intense amplitude, where factor.prune
typically ranges from 0.01 to 0.1 .
ppm A ppm (= parts per million) tolerance value within
which basis functions at different m/z positions are considered to
be merged, s. 'Details' below. By default, that value is computed
from the spacing of the first two m/z positions.
goodnessoffit A logical indicating whether a local goodness-of-fit
adjustment of the signa-to-noise ratio should be computed. Yields
usually more reliable evaluation of the detected patterns, but is
computationally more demanding.
|
control.binning |
Controls the division of the spectrum into bins
(if binning = TRUE ). Based on the 'local noise level'
described in control.localnoise , if within
a range of (1+tol) Thomson no further significant position occurs, a bin
is closed, and a new one is not opened as long as a new significant
position occurs..
|
Details
While setting binning = TRUE
yields a procedure which
is less memory consuming than fitting the whole
spectrum simultaneously (binning = FALSE
), it may be inferior from a quality aspect, since division
into bins has to be done with care. Otherwise, peak patterns might be
split up into different bins, which would result into erroneous
fitting.
Postprocessing of the raw list usually yields a
considerably improved result by counteracting the 'peak-splitting
phenomenon': due to a limited sampling rate and discrepancies
between theoretical and observed peak patterns, several templates at
adjacent positions are used to fit the same peak pattern.
Value
An object of class peaklist.
Warning
Although we have tried to choose default values
expected to produce sensible results, the user should carefully
examine all options.
Warning
Depending on the length and the resolution of the
raw spectrum, fitting the whole spectrum simultaneously as recommended
is expensive from a computational point of view, and may take up to several
minutes per spectrum.
See Also
fitModelParameters, peaklist
Examples
### load data
data(toyspectrum)
data(toyspectrumsolution)
mz <- toyspectrum[,"x"]
intensities <- toyspectrum[,"yyy"]
### select mz range
filter <- mz >= 2800 & mz <= 3200
### Extract peak patterns with model = "Gaussian"
sigmafun <- function (mz) -8.5e-07 * mz + 6.09e-10 * mz^2 + 0.00076
gausslist <- getPeaklist(mz = mz[filter], intensities = intensities[filter],
model = "Gaussian",
model.parameters = list(sigma = sigmafun,
alpha = function(mz){},
mu = function(mz){}),
control.localnoise = list(quantile = 0.5, factor.place = 3))
show(gausslist)
### threshold list at signal-to-noise ratio = 2
peaklist <- threshold(gausslist, threshold = 2)
### Extract peak patterns with model = "EMG" and loss = "L1"
alpha0 <- function(mz) 0.00001875 * 0.5 * 4/3 * mz
sigma0 <- function(mz) 0.00001875 * 0.5 * mz
mu0 <- function(mz) return(rep(-0.06162891, length(mz)))
EMGlist <- getPeaklist(mz = mz[filter], intensities = intensities[filter],
model = "EMG", loss = "L1",
model.parameters = list(sigma = sigma0,
alpha = alpha0,
mu = mu0),
control.localnoise = list(quantile = 0.5, factor.place = 3))
show(EMGlist)
peaklist2 <- threshold(EMGlist, threshold = 2)
### plot results of the 1st list and compare vs. 'truth'
### 'ground truth'
solution <- toyspectrumsolution[toyspectrumsolution[,1] >= 2800 & toyspectrumsolution[,1] <= 3200,]
visualize(gausslist, mz[filter], intensities[filter], lower = 3150, upper = 3170,
truth = TRUE,
signal = TRUE,
fitted = TRUE,
postprocessed = TRUE,
booktrue = as.matrix(toyspectrumsolution),
cutoff.eps = 0.2)
[Package
IPPD version 1.35.0
Index]