naiveBayesKernel {ClassifyR}    R Documentation

Classification Using A Bayes Classifier with Kernel Density Estimates

Description

Kernel density estimates are fitted to the training data and a naive Bayes classifier is used to classify samples in the test data.

Usage

## S4 method for signature 'matrix'
naiveBayesKernel(measurements, classes, test, ...)
## S4 method for signature 'DataFrame'
naiveBayesKernel(measurements, classes, test,
                 densityFunction = density,
                 densityParameters = list(bw = "nrd0", n = 1024,
                                          from = expression(min(featureValues)),
                                          to = expression(max(featureValues))),
                 weighted = c("both", "unweighted", "weighted"),
                 weight = c("all", "height difference", "crossover distance", "sum differences"),
                 minDifference = 0, returnType = c("class", "score", "both"), verbose = 3)
## S4 method for signature 'MultiAssayExperiment'
naiveBayesKernel(measurements, test, targets = names(measurements), ...)

Arguments

measurements

Either a matrix, DataFrame or MultiAssayExperiment containing the training data. For a matrix, the rows are features, and the columns are samples.

classes

Either a factor of class labels with the same length as the number of samples in measurements or, if measurements is a DataFrame, a character vector of length 1 containing the name of the column in measurements that holds the classes. Not used if measurements is a MultiAssayExperiment object.

test

An object of the same class as measurements, with no samples in common with measurements and the same number of features.

targets

If measurements is a MultiAssayExperiment, the names of the data tables to be used. "clinical" is also a valid value and specifies that integer variables from the clinical data table will be used.

...

Variables not used by the three top-level methods, which are passed on to the internal method that does the classification.

densityFunction

Default: density. A function which will return a probability density, which is essentially a list with x and y coordinates.

densityParameters

A list of options for densityFunction. Default: list(bw = "nrd0", n = 1024, from = expression(min(featureValues)), to = expression(max(featureValues))).

weighted

Default: "both". Either "both", "unweighted" or "weighted". In weighted mode, the difference in densities is summed over all features. In unweighted mode, each feature's vote is worth the same. Both can be calculated simultaneously.

weight

Default: "all". Either "all", "height difference", "crossover distance" or "sum differences". The type of weight to calculate. For "height difference", the weight of each prediction is equal to the vertical distance between the two densities at a particular value of x. For "crossover distance", the x positions where the two densities cross are first calculated. The predicted class is the class with the highest density at the particular value of x, and the weight is the distance of x from the nearest density crossover point. For "sum differences", the weight is the sum of the weights calculated by both types of distances.

minDifference

Default: 0. The minimum difference in densities for a feature to be allowed to vote. Can be a vector of cutoffs. If no feature has a large enough difference for a particular sample, the predicted class is simply the largest class.

returnType

Default: "class". Either "class", "score" or "both". Sets the return value from the prediction to a vector of class labels, a score for each sample belonging to the second class (as determined by the factor levels), or both labels and scores in a data.frame.

verbose

Default: 3. A number between 0 and 3 for the amount of progress messages to give. This function only prints progress messages if the value is 3.

Details

If weighted is "weighted", a sample's predicted class is the class with the largest sum of weights, each scaled for the number of samples in the training data of each class. If weighted is "unweighted", each feature has an equal vote and votes for the class with the largest weight, scaled for class sizes in the training set. If weighted is "both", both kinds of prediction are calculated.
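The difference between the two voting modes can be sketched with a few made-up numbers. The heights below are hypothetical, already-scaled density values for one test sample at three features. The object names are illustrative and not part of ClassifyR, and the package's internal calculation may differ in detail.

```r
classLevels <- c("Good", "Poor")

# Hypothetical scaled density heights for one test sample at three features.
scaledHeights <- rbind(Good = c(0.30, 0.10, 0.05),
                       Poor = c(0.05, 0.12, 0.07))

# Each feature votes for the class with the larger height.
featureVotes <- factor(classLevels[apply(scaledHeights, 2, which.max)],
                       levels = classLevels)

# Unweighted mode: every vote counts equally.
voteCounts <- table(featureVotes)
unweightedClass <- names(voteCounts)[which.max(voteCounts)]

# Weighted mode: each vote counts in proportion to the height difference,
# and the differences are summed per class over all features.
heightDifferences <- abs(scaledHeights["Good", ] - scaledHeights["Poor", ])
weightSums <- tapply(heightDifferences, featureVotes, sum)
weightedClass <- names(weightSums)[which.max(weightSums)]
```

In this toy case two of the three features vote for "Poor", but the single feature voting for "Good" has a much larger difference, so the two modes disagree.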

The variable name of each feature's measurements in the iteration over all features is featureValues. This is important to know if each feature's measurements need to be referred to in the specification of densityParameters, such as for specifying the range of x values of the density function to be computed. For example, see the default value of densityParameters above.
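As a rough sketch of why the variable name matters, the code below evaluates the expressions in densityParameters in a context where featureValues is defined and then calls the density function. The helper fitFeatureDensity is hypothetical and only illustrates the idea; it is not the package's internal code.

```r
densityParameters <- list(bw = "nrd0", n = 1024,
                          from = expression(min(featureValues)),
                          to = expression(max(featureValues)))

# Hypothetical helper: fit a density to the measurements of one feature.
fitFeatureDensity <- function(featureValues, densityFunction, densityParameters) {
  # Evaluate each option where featureValues is visible. Plain values such as
  # bw and n evaluate to themselves; expressions are resolved to numbers.
  evaluated <- lapply(densityParameters, function(parameter)
    eval(parameter, envir = list(featureValues = featureValues)))
  do.call(densityFunction, c(list(featureValues), evaluated))
}

set.seed(1)
oneFeature <- rnorm(20, mean = 8, sd = 2)  # Simulated measurements.
fitted <- fitFeatureDensity(oneFeature, density, densityParameters)

# The result is a list with x and y coordinates covering the observed range.
range(fitted$x)
length(fitted$y)
```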

If weight is "crossover distance", the crossover points are computed by considering the difference between the y values of the two densities at every x value. The x values at which the sign of the difference changes, compared to the difference at the closest lower value of x, are used as the crossover points.
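The sign-change rule can be illustrated on two known curves. The sketch below uses normal density curves from dnorm in place of kernel density estimates, and assumes both curves are evaluated on the same grid of x values. The names are illustrative only.

```r
# A shared grid of x values, and two curves with equal spread whose
# true crossover lies midway between the means, at x = 10.5.
xGrid <- seq(0, 20, length.out = 1000)
densityA <- dnorm(xGrid, mean = 8, sd = 2)
densityB <- dnorm(xGrid, mean = 13, sd = 2)

# Sign of the difference between the curves at every x value.
differenceSigns <- sign(densityA - densityB)

# A crossover is an x value whose sign differs from that of the
# closest lower x value.
crossoverIndices <- which(diff(differenceSigns) != 0) + 1
crossoverPoints <- xGrid[crossoverIndices]

# Weight for a measurement: distance to the nearest crossover point.
testValue <- 12
crossoverWeight <- min(abs(testValue - crossoverPoints))
```

The detected crossover is the first grid point past the true crossing, so its accuracy is limited by the grid spacing.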

Setting weight to "sum differences" is intended to find a mix of features which are strongly differentially expressed and differentially variable.
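A minimal sketch of how the two kinds of weight might be combined for a single feature is shown below. It assumes both class densities are estimated on a shared grid and that the two weights are simply added, as described for "sum differences". The data are simulated and all names are illustrative rather than taken from the package.

```r
set.seed(3)
goodValues <- rnorm(40, mean = 8, sd = 2)
poorValues <- rnorm(40, mean = 13, sd = 2)
valueRange <- range(c(goodValues, poorValues))

# Estimate both class densities over the same range of x values.
goodDensity <- density(goodValues, from = valueRange[1], to = valueRange[2])
poorDensity <- density(poorValues, from = valueRange[1], to = valueRange[2])
xGrid <- goodDensity$x

testValue <- 12

# Height difference: vertical distance between the densities at testValue.
goodHeight <- approx(xGrid, goodDensity$y, xout = testValue)$y
poorHeight <- approx(xGrid, poorDensity$y, xout = testValue)$y
heightWeight <- abs(poorHeight - goodHeight)

# Crossover distance: horizontal distance to the nearest sign change.
signChanges <- which(diff(sign(goodDensity$y - poorDensity$y)) != 0) + 1
crossoverWeight <- min(abs(testValue - xGrid[signChanges]))

# Sum differences: both weights added together.
sumWeight <- heightWeight + crossoverWeight
predictedClass <- if (poorHeight > goodHeight) "Poor" else "Good"
```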

Value

A vector of class prediction information (or, depending on returnType, scores or a data.frame of both) with one entry per sample in the test data. If more than one variety of prediction is generated, a list of such results is returned.

Author(s)

Dario Strbenac, John Ormerod

Examples

  library(ClassifyR)

  trainMatrix <- matrix(rnorm(1000, 8, 2), ncol = 10)
  classes <- factor(rep(c("Poor", "Good"), each = 5))
  
  # Make first 30 genes increased in value for poor samples.
  trainMatrix[1:30, 1:5] <- trainMatrix[1:30, 1:5] + 5
  
  testMatrix <- matrix(rnorm(1000, 8, 2), ncol = 10)
  
  # Make first 30 genes increased in value for sixth to tenth samples.
  testMatrix[1:30, 6:10] <- testMatrix[1:30, 6:10] + 5
  
  naiveBayesKernel(trainMatrix, classes, testMatrix)

[Package ClassifyR version 2.0.10 Index]