\seekquarry\yioop\library\classifiers\NaiveBayes

Implements the Naive Bayes text classification algorithm.

This class also provides a method to sample a beta vector from a dataset. This makes it easy to generate several slightly different classifiers for the same dataset and combine them into classifier committees.

Summary

Public methods: log(), train(), sampleBeta(), classify(), logit(), sampleGammaDeviate()
Public properties: $debug, $gamma, $epsilon, $beta
Constants: none
Protected methods and properties: none
Private methods and properties: none

Properties

$debug

$debug : integer

Flag used to control the level of debug messages. For now, 0 means no messages; any other value causes messages to be output.

Type

integer

$gamma

$gamma : float

Parameter used to weight positive examples.

Type

float

$epsilon

$epsilon : float

Parameter used to weight negative examples.

Type

float

$beta

$beta : array

Beta vector of feature weights produced by the training phase. The dot product of this vector with a feature vector yields the log odds that the feature vector describes a document belonging to the class the classifier was trained to recognize.
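Written out, and using the convention from train() that feature 0 is an intercept term equal to 1 for every example, the dot product for a feature vector x is as follows (the symbol \ell is introduced here only for illustration):

```latex
\ell(x) = \beta \cdot x = \beta_0 + \sum_{j \ge 1} \beta_j \, x_j
```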

Type

array

Methods

log()

log(string $message)

Writes a message to the log file, depending on the debug level for this subpackage.

Parameters

string $message

what to write to the log

train()

train(object $X, array $y)

Computes the beta vector from the given examples and labels. The examples are represented as a sparse matrix where each row is an example and each column a feature, and the labels as an array where each value is either 1 or -1, corresponding to a positive or negative example. Note that the first feature (column 0) corresponds to an intercept term, and is equal to 1 for every example.

Parameters

object $X

SparseMatrix of training examples

array $y

example labels
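A minimal Python sketch of a training step of this shape, not the Yioop PHP implementation. It assumes the sparse matrix can be iterated as (row, column, value) triples and that $gamma and $epsilon act as additive smoothing pseudo-counts for positive and negative examples; both are assumptions made for illustration. Each weight beta_j is the log ratio of the smoothed rates at which feature j occurs in positive and negative examples, and beta_0 (the intercept column) becomes the prior log odds of the class.

```python
import math
from collections import defaultdict


def train(entries, y, gamma=1.0, epsilon=1.0):
    """Return a list of feature weights (beta) for a binary class.

    entries: iterable of (row, col, value) triples of a sparse matrix whose
             column 0 is an intercept equal to 1 in every row.
    y:       sequence mapping row -> 1 (positive) or -1 (negative).
    gamma, epsilon: additive smoothing pseudo-counts for positive and
             negative examples (assumed role, for illustration only).
    """
    pos = defaultdict(int)  # pos[j]: positive examples containing feature j
    neg = defaultdict(int)  # neg[j]: negative examples containing feature j
    n_cols = 0
    for i, j, v in entries:
        n_cols = max(n_cols, j + 1)
        if v == 0:
            continue  # an explicit zero means the feature is absent
        if y[i] == 1:
            pos[j] += 1
        else:
            neg[j] += 1
    # Column 0 is set in every row, so its counts are the class sizes.
    n_pos, n_neg = pos[0], neg[0]
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")
    beta = [0.0] * n_cols
    beta[0] = math.log(n_pos / n_neg)  # prior log odds of the class
    for j in range(1, n_cols):
        p = (pos[j] + gamma) / (n_pos + 2 * gamma)      # smoothed P(j | +)
        q = (neg[j] + epsilon) / (n_neg + 2 * epsilon)  # smoothed P(j | -)
        beta[j] = math.log(p / q)
    return beta
```

For example, with two positive rows that both contain feature 1 and two negative rows of which one contains feature 2, beta_0 is 0 (balanced classes), beta_1 is log 3, and beta_2 is log 0.5, which is negative.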

sampleBeta()

sampleBeta(object $features)

Constructs beta by sampling from the Gamma distribution for each feature, parameterized by the number of times the feature appears in positive examples, with a scale/rate of 1. This function is used to construct classifier committees.

Parameters

object $features

Features instance for the training set, used to determine how often a given feature occurs in positive and negative examples
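A possible Python sketch of committee sampling, not the Yioop code. It assumes the Features instance can be reduced to a hypothetical mapping from feature index to its (positive, negative) occurrence counts. For each feature it draws two Gamma deviates with scale 1, one from the positive count and one from the negative count; using both counts and the +1 offsets are assumptions. When X ~ Gamma(a, 1) and Y ~ Gamma(b, 1), X / (X + Y) is Beta(a, b)-distributed, so log(X / Y) is the log odds of a Beta sample. The sketch uses Python's random.gammavariate in place of the integer-alpha method described under sampleGammaDeviate().

```python
import math
import random


def sample_beta(counts, rng=None):
    """Sample one committee member's weights.

    counts: dict feature -> (pos, neg), the number of positive and negative
            examples in which the feature occurs (hypothetical input format).
    rng:    random.Random instance; pass a seeded one for reproducibility.
    """
    if rng is None:
        rng = random.Random()
    beta = {}
    for j, (pos, neg) in counts.items():
        # Gamma deviates with scale 1; +1 keeps alpha > 0 for zero counts.
        x = rng.gammavariate(pos + 1, 1.0)
        z = rng.gammavariate(neg + 1, 1.0)
        # log(x / z) is the log odds of a Beta(pos + 1, neg + 1) sample.
        beta[j] = math.log(x / z)
    return beta
```

Calling it with differently seeded random.Random instances yields slightly different weight vectors for the same counts, which is what a committee needs, while a fixed seed makes a member reproducible.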

classify()

classify(array $x)

Returns the pseudo-probability that a new instance is a positive example of the class the beta vector was trained to recognize. It only makes sense to try classification after at least some training has been done on a dataset that includes both positive and negative examples of the target class.

Parameters

array $x

feature vector represented by an associative array mapping features to their weights
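A Python sketch of the classification step, assuming beta and the feature vector are both dicts from feature to weight and that features absent from beta contribute nothing. Mapping the log-odds score to a pseudo-probability with the logistic (sigmoid) function 1 / (1 + e^(-score)) is an assumption here, though it is the standard way to turn log odds into a probability.

```python
import math


def classify(beta, x):
    """Pseudo-probability that feature vector x is a positive example.

    beta: dict feature -> trained weight (log-odds contribution).
    x:    dict feature -> weight, with the intercept at feature 0.
    """
    # Dot product; features never seen in training get weight 0.
    score = sum(w * beta.get(j, 0.0) for j, w in x.items())
    # Logistic function, written so that exp() cannot overflow.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)
```

A score of 0 maps to 0.5, and a score of log 3 maps to 0.75, since odds of 3 to 1 correspond to a probability of 3/4.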

logit()

logit(integer $pos, integer $neg) : float

Computes the log odds of a numerator and denominator, corresponding to the number of positive and negative examples exhibiting some feature.

Parameters

integer $pos

count of positive examples exhibiting some feature

integer $neg

count of negative examples exhibiting the same feature

Returns

float —

log odds of seeing the feature in a positive example
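In Python, the log odds described above reduce to a single log ratio. This sketch, which is not the Yioop code, rejects zero counts instead of returning an infinite value; smoothing zero counts beforehand is assumed to be the caller's job.

```python
import math


def logit(pos, neg):
    """Log odds log(pos / neg) of seeing a feature in a positive example."""
    if pos <= 0 or neg <= 0:
        # log(0) is undefined; callers should smooth zero counts first
        raise ValueError("pos and neg must both be positive")
    return math.log(pos) - math.log(neg)
```

For example, a feature seen in 30 positive and 10 negative examples gets logit(30, 10) = log 3, about 1.10. Equal counts give 0, and a negative value means the feature is more common in negative examples.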

sampleGammaDeviate()

sampleGammaDeviate(integer $alpha) : float

Computes a Gamma deviate with scale beta = 1 (the distribution's parameter, not the $beta property) and a small integer alpha. Under these assumptions, the deviate is the sum of alpha exponential deviates. Each exponential deviate is the negative log of a uniform deviate, so the sum of these negative logs equals the negative log of the product of the uniform deviates.

Parameters

integer $alpha

parameter to Gamma distribution (in practice, a count of occurrences of some feature)

Returns

float —

a deviate from the Gamma distribution parameterized by $alpha
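The method described above translates directly into Python. In this sketch (not the Yioop code), 1 - random() is used so that each uniform deviate lies in (0, 1] and log() is never applied to 0. Because the product shrinks roughly like e^(-alpha), it underflows to 0 once alpha reaches several hundred, which is one reason the method is limited to small alpha.

```python
import math
import random


def sample_gamma_deviate(alpha, rng=None):
    """Gamma(alpha, scale=1) deviate for a small non-negative integer alpha.

    A Gamma(alpha, 1) variable is the sum of alpha Exp(1) variables, and
    -log(U) is Exp(1) for U uniform on (0, 1], so the sum of the -log(U_i)
    equals -log of the product of the U_i.
    """
    if rng is None:
        rng = random.Random()
    product = 1.0
    for _ in range(alpha):
        product *= 1.0 - rng.random()  # uniform on (0, 1], never 0
    return -math.log(product)
```

With alpha = 3, the average of many deviates approaches 3, the mean of a Gamma(3, 1) distribution.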