$debug : integer
Level of detail to be used for logging. Higher values mean more detail.
Implements a logistic regression text classifier using lasso (L1-regularized) regression and a cyclic coordinate descent optimization step.
This algorithm converges slowly for large datasets or a large number of features, but it provides regularization to combat over-fitting, and it outperforms Naive Bayes in tests on the same data set. The algorithm augments standard cyclic coordinate descent by "sleeping" features that do not change significantly during a single step: each time an optimization step for a feature fails to change that feature's weight beyond some threshold, the feature is forced to sit out the next optimization round. The threshold increases over successive rounds, effectively placing an upper limit on the number of iterations over all features while simultaneously limiting the number of features updated in each round. This optimization speeds up convergence, but at the cost of some accuracy.
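The sleeping schedule described above can be sketched as follows. This is a Python sketch (the actual class is PHP), and the `update_feature` callback, the growth factor, and the round counter are all assumptions for illustration; the real code interleaves this schedule with the coordinate updates described under train().

```python
def cyclic_cd_with_sleeping(update_feature, num_features, threshold=1e-4,
                            growth=2.0, max_rounds=50):
    """Cyclic coordinate descent in which a feature whose weight barely
    moved this round is forced to sit out the next round.  The change
    threshold grows each round, bounding the total number of rounds.
    Returns the number of rounds actually performed."""
    asleep = set()                 # features sitting out the current round
    rounds = 0
    for _ in range(max_rounds):
        awake = [j for j in range(num_features) if j not in asleep]
        if not awake:
            break                  # every feature slept: treat as converged
        rounds += 1
        next_asleep = set()
        for j in awake:
            delta = update_feature(j)      # |change in beta_j| this step
            if abs(delta) < threshold:
                next_asleep.add(j)         # sleep through the next round
        asleep = next_asleep               # slept features wake after one round
        threshold *= growth                # coarser tolerance each round
    return rounds
```

Because a slept feature wakes automatically after one round, sleeping only delays, rather than permanently excludes, slowly-moving features.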
train(object $X, array $y)
An adaptation of the Zhang-Oles 2001 CLG algorithm by Genkin et al. that uses the Laplace prior for parameter regularization. On completion, the beta vector has been optimized to maximize the likelihood of the data set.
object | $X | SparseMatrix representing the training dataset
array | $y | array of known labels corresponding to the rows of $X
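The distinctive part of the Genkin et al. adaptation is how a single coordinate step handles the Laplace (lasso) prior: the step is tried under the sign of the current weight, and if it would cross zero it is clamped there, which is what produces sparse beta vectors. A Python sketch of that one step, under the assumption that `num` and `den` are the numerator/denominator pair produced by computeApproxLikelihood() and `lam` is the prior's hyperparameter:

```python
def laplace_cd_step(beta_j, num, den, lam):
    """One lasso coordinate step in the style of Genkin et al. (sketch).
    num/den approximate the first/second derivative of the log-likelihood
    restricted to feature j; lam is the Laplace prior hyperparameter.
    Returns the new value of beta_j."""
    if beta_j == 0.0:
        # at zero the prior is non-differentiable: try both subgradients
        step = (num - lam) / den           # assume the weight goes positive
        if step <= 0.0:
            step = (num + lam) / den       # assume it goes negative
            if step >= 0.0:
                return 0.0                 # neither works: zero is optimal
    else:
        s = 1.0 if beta_j > 0 else -1.0
        step = (num - s * lam) / den
        if s * (beta_j + step) < 0.0:
            return 0.0                     # step crossed zero: clamp (sparsity)
    return beta_j + step
```

In the real algorithm the step would additionally be clipped to the feature's trust region before being applied.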
classify(array $x)
Returns the pseudo-probability that a new instance is a positive example of the class the beta vector was trained to recognize. It only makes sense to try classification after at least some training has been done on a dataset that includes both positive and negative examples of the target class.
array | $x | feature vector represented by an associative array mapping features to their weights |
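Classification itself is just the logistic function applied to the sparse dot product of the beta vector and the feature vector. A Python sketch (the real method takes PHP associative arrays; dict-of-weights is the analogue assumed here):

```python
import math

def classify(x, beta):
    """Pseudo-probability that x is a positive example of the class:
    the logistic function of the sparse dot product beta . x."""
    dot = sum(w * beta.get(f, 0.0) for f, w in x.items())
    return 1.0 / (1.0 + math.exp(-dot))
```

An untrained (all-zero) beta vector yields 0.5 for every instance, which is why classification is only meaningful after training on both positive and negative examples.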
computeApproxLikelihood(object $Xj, array $y, array $r, float $d) : array
Computes the approximate likelihood of y given a single feature, and returns it as a pair <numerator, denominator>.
object | $Xj | iterator over the non-zero entries in column j of the data |
array | $y | labels corresponding to entries in $Xj; each label is 1 if example i has the target label, and -1 otherwise |
array | $r | cached dot products of the beta vector and feature weights for each example i |
float | $d | trust region for feature j |
two-element array containing the numerator and denominator of the likelihood
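A sketch of what this pair typically looks like in the CLG algorithm: the numerator is the log-likelihood's first derivative restricted to feature j, and the denominator is an upper bound on its second derivative within the trust region $d (the bound F below is the standard choice from Genkin et al.). This assumes `Xj` yields `(i, x_ij)` pairs and that `r[i]` caches the dot product of beta with example i, matching the parameter descriptions above:

```python
import math

def trust_bound(r, d):
    """Upper bound F(r, d) on the logistic loss's second derivative
    within a trust region of half-width d (Genkin et al. style)."""
    if abs(r) <= d:
        return 0.25
    return 1.0 / (2.0 + math.exp(abs(r) - d) + math.exp(d - abs(r)))

def approx_likelihood(Xj, y, r, d):
    """Sketch of the <numerator, denominator> pair for feature j.
    Xj iterates over (i, x_ij) non-zero entries in column j."""
    num = 0.0
    den = 0.0
    for i, x_ij in Xj:
        num += x_ij * y[i] / (1.0 + math.exp(y[i] * r[i]))
        den += x_ij * x_ij * trust_bound(y[i] * r[i], d)
    return num, den
```

Iterating only over non-zero column entries is what makes each coordinate step cheap on sparse text data.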
score(array $r, array $y, array $beta) : float
Computes an approximate score that can be used to get an idea of how much a given optimization step improved the likelihood of the data set.
array | $r | cached dot products of the beta vector and feature weights for each example i |
array | $y | labels for each example |
array | $beta | beta vector of feature weights (used to penalize large weights) |
value proportional to the likelihood of the data, penalized by the magnitude of the beta vector
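A plausible form of such a score is the log-likelihood of the labels under the logistic model minus an L1 penalty on beta. This Python sketch assumes the penalty weight `lam` and the exact penalty form, neither of which is specified above:

```python
import math

def score(r, y, beta, lam=1.0):
    """Penalized log-likelihood sketch: larger is better.  r[i] caches
    beta . x_i for example i; y[i] is +1 or -1; lam is an assumed
    penalty weight on the magnitude of beta."""
    log_likelihood = -sum(math.log(1.0 + math.exp(-y[i] * r[i]))
                          for i in range(len(y)))
    penalty = lam * sum(abs(b) for b in beta)
    return log_likelihood - penalty
```

Comparing this value before and after an optimization round gives a cheap convergence check without recomputing the exact likelihood.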