\seekquarry\yioop\library\classifiers\NaiveBayes

Implements the Naive Bayes text classification algorithm.

This class also provides a method to sample a beta vector from a dataset, making it easy to generate several slightly-different classifiers for the same dataset in order to form classifier committees.

Summary

Methods: log(), train(), sampleBeta(), classify(), logit(), sampleGammaDeviate()

Properties: $debug, $gamma, $epsilon, $beta

Constants: none

Protected methods and properties: none

Private methods and properties: none

File: src/library/classifiers/NaiveBayes.php
Package: Default
Class hierarchy:
\seekquarry\yioop\library\classifiers\ClassifierAlgorithm
\seekquarry\yioop\library\classifiers\NaiveBayes

Properties

$debug

$debug : integer

Flag that controls the level of debug messages. For now, 0 means no messages; any other value causes messages to be output.

Type

integer

$gamma

$gamma : float

Parameter used to weight positive examples.

Type

float

$epsilon

$epsilon : float

Parameter used to weight negative examples.

Type

float

$beta

$beta : array

Beta vector of feature weights resulting from the training phase. The dot product of this vector with a feature vector yields the log likelihood that the feature vector describes a document belonging to the trained-for class.

Type

array

Methods

log()

log(string  $message)

Writes a message to the log file, depending on the debug level for this subpackage.

Parameters

string	$message	what to write to the log

train()

train(object  $X, array  $y)

Computes the beta vector from the given examples and labels. The examples are represented as a sparse matrix where each row is an example and each column a feature, and the labels as an array where each value is either 1 or -1, corresponding to a positive or negative example. Note that the first feature (column 0) corresponds to an intercept term, and is equal to 1 for every example.

Parameters

object	$X	SparseMatrix of training examples
array	$y	example labels
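The counting step described above can be sketched roughly as follows. This is not the library's code: the examples are passed as plain [row, column, value] triples instead of a SparseMatrix, the name trainSketch is hypothetical, and the sketch assumes that $gamma and $epsilon are added to the positive and negative counts as smoothing terms and that each non-intercept weight is the feature's log odds minus the intercept's.

```php
<?php
// Sketch only: the examples are given as [row, column, value] triples rather
// than the library's SparseMatrix, whose API is not documented here.
function trainSketch(array $entries, array $y, int $numFeatures,
    float $gamma = 1.0, float $epsilon = 1.0): array
{
    $pos = array_fill(0, $numFeatures, 0); // positive examples per feature
    $neg = array_fill(0, $numFeatures, 0); // negative examples per feature
    foreach ($entries as [$i, $j, $value]) {
        // Only the presence of a feature is counted; $value is ignored.
        if ($y[$i] == 1) {
            $pos[$j]++;
        } else {
            $neg[$j]++;
        }
    }
    // Assumed smoothing: $gamma and $epsilon are added to the counts.
    $logOdds = fn($p, $n) => log(($p + $gamma) / ($n + $epsilon));
    // Column 0 is the intercept, so its counts are the class totals.
    $beta = [$logOdds($pos[0], $neg[0])];
    for ($j = 1; $j < $numFeatures; $j++) {
        // Assumed form: feature log odds relative to the intercept.
        $beta[$j] = $logOdds($pos[$j], $neg[$j]) - $beta[0];
    }
    return $beta;
}

// Three examples: rows 0 and 1 are positive, row 2 is negative.
$entries = [
    [0, 0, 1], [0, 1, 1],
    [1, 0, 1], [1, 1, 1],
    [2, 0, 1], [2, 2, 1],
];
$beta = trainSketch($entries, [1, 1, -1], 3);
print_r($beta);
```

In this toy run, feature 1 appears only in positive examples and ends up with a positive weight, while feature 2 appears only in the negative example and ends up with a negative one.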

sampleBeta()

sampleBeta(object  $features)

Constructs beta by sampling from the Gamma distribution for each feature, parameterized by the number of times the feature appears in positive examples, with a scale/rate of 1. This function is used to construct classifier committees.

Parameters

object	$features	Features instance for the training set, used to determine how often a given feature occurs in positive and negative examples
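A minimal sketch of the sampling idea, under several assumptions: per-feature counts are passed as plain arrays rather than a Features instance, both the positive and the negative count of each feature are replaced by Gamma deviates, 1 is added to each count so the shape parameter stays positive, and the smoothed log odds are taken relative to the intercept. The name sampleBetaSketch is hypothetical.

```php
<?php
// Sketch only: per-feature counts are passed as plain arrays instead of the
// library's Features instance, whose API is not documented here.
function sampleBetaSketch(array $posCounts, array $negCounts): array
{
    // Gamma(alpha, 1) deviate for integer alpha >= 1.
    $deviate = function (int $alpha): float {
        $product = 1.0;
        for ($i = 0; $i < $alpha; $i++) {
            $product *= mt_rand(1, mt_getrandmax()) / mt_getrandmax();
        }
        return -log($product);
    };
    // Index 0 holds the class totals and is not resampled (assumption).
    $beta = [log(($posCounts[0] + 1) / ($negCounts[0] + 1))];
    for ($j = 1; $j < count($posCounts); $j++) {
        // Adding 1 keeps the Gamma shape positive for unseen features.
        $p = $deviate(1 + $posCounts[$j]);
        $n = $deviate(1 + $negCounts[$j]);
        $beta[$j] = log(($p + 1) / ($n + 1)) - $beta[0];
    }
    return $beta;
}

mt_srand(7);
$pos = [50, 40, 2];
$neg = [50, 3, 45];
// Two committee members drawn from the same counts get different weights.
$first = sampleBetaSketch($pos, $neg);
$second = sampleBetaSketch($pos, $neg);
printf("%.3f vs %.3f\n", $first[1], $second[1]);
```

Because each call draws fresh deviates, repeated calls on the same counts yield slightly different beta vectors, which is what makes the resulting classifiers usable as a committee.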

classify()

classify(array  $x)

Returns the pseudo-probability that a new instance is a positive example of the class the beta vector was trained to recognize. It only makes sense to try classification after at least some training has been done on a dataset that includes both positive and negative examples of the target class.

Parameters

array	$x	feature vector represented by an associative array mapping features to their weights
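As an illustration, classification can be sketched as a sparse dot product with the beta vector followed by a squashing step. The use of the logistic function to turn the score into a pseudo-probability is an assumption of this sketch, and classifySketch is a hypothetical name, not the library's method.

```php
<?php
// Sketch only: score a sparse feature vector against a beta vector.
function classifySketch(array $beta, array $x): float
{
    $score = 0.0;
    foreach ($x as $j => $weight) {
        // Features absent from $beta contribute nothing.
        $score += $weight * ($beta[$j] ?? 0.0);
    }
    // Assumed mapping from the dot product to a pseudo-probability in (0, 1).
    return 1.0 / (1.0 + exp(-$score));
}

$beta = [0 => 0.0, 1 => 2.0, 2 => -2.0];
echo classifySketch($beta, [0 => 1, 1 => 1]), "\n"; // above 0.5
echo classifySketch($beta, [0 => 1, 2 => 1]), "\n"; // below 0.5
```

Index 0 of the feature vector plays the role of the intercept term and is set to 1, matching the convention described for train().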

logit()

logit(integer  $pos, integer  $neg) : float

Computes the log odds from a numerator and a denominator, which correspond to the numbers of positive and negative examples exhibiting some feature.

Parameters

integer	$pos	count of positive examples exhibiting some feature
integer	$neg	count of negative examples

Returns

float	log odds of seeing the feature in a positive example
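A rough sketch of the calculation is given below. How $gamma and $epsilon enter the formula is not spelled out on this page; the sketch assumes they are added to the positive and negative counts as smoothing terms, with default values of 1.0 chosen only for illustration. The name logOddsSketch is hypothetical.

```php
<?php
// Hypothetical helper, not the library's method: log odds of a feature,
// assuming $gamma and $epsilon act as additive smoothing on the counts.
function logOddsSketch(int $pos, int $neg, float $gamma = 1.0,
    float $epsilon = 1.0): float
{
    // Smoothing keeps the ratio finite and positive when a count is zero.
    return log(($pos + $gamma) / ($neg + $epsilon));
}

echo logOddsSketch(3, 1), "\n"; // log(4 / 2) = log(2)
```

With equal smoothing terms, a feature that was never seen in either class gets log odds of 0, so it does not push the score in either direction.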

sampleGammaDeviate()

sampleGammaDeviate(integer  $alpha) : float

Computes a Gamma deviate with beta = 1 and a small, integral alpha. Under these assumptions, the deviate is the sum of alpha exponential deviates. Each exponential deviate is the negative log of a uniform deviate, so their sum is the negative log of the product of the uniform deviates.

Parameters

integer	$alpha	parameter to Gamma distribution (in practice, a count of occurrences of some feature)

Returns

float	a deviate from the Gamma distribution parameterized by $alpha
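The product-of-uniforms trick described above can be sketched as follows. The name gammaDeviateSketch and the choice of mt_rand() as the uniform source are assumptions of this sketch, not details taken from the library.

```php
<?php
// Sketch only: Gamma(alpha, 1) deviate for a small positive integer alpha,
// computed as the negative log of a product of alpha uniform deviates.
function gammaDeviateSketch(int $alpha): float
{
    $product = 1.0;
    for ($i = 0; $i < $alpha; $i++) {
        // Uniform deviate in (0, 1]; excluding 0 avoids log(0).
        $product *= mt_rand(1, mt_getrandmax()) / mt_getrandmax();
    }
    return -log($product);
}

mt_srand(42);
$sum = 0.0;
$n = 20000;
for ($k = 0; $k < $n; $k++) {
    $sum += gammaDeviateSketch(3);
}
echo $sum / $n, "\n"; // sample mean; should be near alpha = 3
```

The approach is only practical for small alpha: the product of many uniform deviates underflows toward zero, which is why the description restricts it to small, integral values.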