$debug
$debug : integer
Flag used to control level of debug messages for now 0 == no messages, anything else causes messages to be output
Implements the Naive Bayes text classification algorithm.
This class also provides a method to sample a beta vector from a dataset, making it easy to generate several slightly-different classifiers for the same dataset in order to form classifier committees.
train(object $X, array $y)
Computes the beta vector from the given examples and labels. The examples are represented as a sparse matrix where each row is an example and each column a feature, and the labels as an array where each value is either 1 or -1, corresponding to a positive or negative example. Note that the first feature (column 0) corresponds to an intercept term, and is equal to 1 for every example.
object | $X | SparseMatrix of training examples |
array | $y | example labels |
sampleBeta(object $features)
Constructs beta by sampling from the Gamma distribution for each feature, parameterized by the number of times the feature appears in positive examples, with a scale/rate of 1. This function is used to construct classifier committees.
object | $features | Features instance for the training set, used to determine how often a given feature occurs in positive and negative examples |
classify(array $x)
Returns the pseudo-probability that a new instance is a positive example of the class the beta vector was trained to recognize. It only makes sense to try classification after at least some training has been done on a dataset that includes both positive and negative examples of the target class.
array | $x | feature vector represented by an associative array mapping features to their weights |
logit(integer $pos, integer $neg) : float
Computes the log odds of a numerator and denominator, corresponding to the number of positive and negative examples exhibiting some feature.
integer | $pos | count of positive examples exhibiting some feature |
integer | $neg | count of negative examples |
log odds of seeing the feature in a positive example
sampleGammaDeviate(integer $alpha) : float
Computes a Gamma deviate with beta = 1 and integral, small alpha. With these assumptions, the deviate is just the sum of alpha exponential deviates. Each exponential deviate is just the negative log of a uniform deviate, so the sum of the logs is just the negative log of the products of the uniform deviates.
integer | $alpha | parameter to Gamma distribution (in practice, a count of occurrences of some feature) |
a deviate from the Gamma distribution parameterized by $alpha