BinaryFeatures |
A concrete Features subclass that represents a document as a binary
vector, where a one indicates that a feature is present in the document and
a zero indicates that it is absent. Absent features are not stored, so the
vector is sparse: it contains only the indices of features whose value
is one. |
ChiSquaredFeatureSelection |
A subclass of FeatureSelection that implements chi-squared feature
selection. |
Classifier |
The primary interface for building and using classifiers. An instance of
this class represents a single classifier in memory, but the class also
provides static methods to manage classifiers on disk. |
ClassifierAlgorithm |
An abstract class shared by classification algorithms that implement a
common interface. |
Features |
Manages a dataset's features, providing a standard interface for converting
documents to feature vectors, and for accessing feature statistics. |
FeatureSelection |
An abstract class that specifies an interface for selecting the top
features from a dataset. |
InvertedData |
Stores a data matrix in an inverted index on columns with non-zero entries. |
LassoRegression |
Implements logistic regression for text classification, using lasso
(L1-regularized) regression optimized by cyclic coordinate descent. |
NaiveBayes |
Implements the Naive Bayes text classification algorithm. |
SparseMatrix |
A sparse matrix implementation based on an associative array of associative
arrays. |
WeightedFeatures |
A concrete Features subclass that represents a document as a
vector of feature weights, where weights are computed using a modified form
of TF * IDF. This feature mapping is experimental, and may not work
correctly. |
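
The descriptions of BinaryFeatures and ChiSquaredFeatureSelection above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the library's own code or API: the function names `binary_features` and `chi_squared`, the `vocab` mapping, and the contingency-count parameters are all hypothetical. The first function builds a sparse binary vector by keeping only the indices of features that are present. The second computes the standard chi-squared statistic for a 2x2 term/class contingency table, which is one common way to score features for selection; the table does not say which exact formula the class uses.

```python
def binary_features(tokens, vocab):
    """Return the sorted indices of vocabulary features present in a document.

    `vocab` is a hypothetical mapping from term to feature index. Only the
    indices whose value would be one are kept, so absent features cost nothing.
    """
    return sorted({vocab[t] for t in tokens if t in vocab})


def chi_squared(n11, n10, n01, n00):
    """Chi-squared statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denominator == 0:
        # A row or column of the table is empty, so the term carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denominator


if __name__ == "__main__":
    vocab = {"spam": 0, "ham": 1, "eggs": 2}
    print(binary_features(["eggs", "spam", "spam", "foo"], vocab))
    print(chi_squared(10, 0, 0, 10))
```

A feature selector built along these lines would score every term against the class labels and keep the highest-scoring ones; terms that occur independently of the class score zero.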