\seekquarry\yioop\library\classifiers\WeightedFeatures

A concrete Features subclass that represents a document as a vector of feature weights, where weights are computed using a modified form of TF * IDF. This feature mapping is experimental, and may not work correctly.

Each document in the training set is expected to be fed through an instance of a subclass of this abstract class in order to convert it to a feature vector. Terms are replaced with feature indices (e.g., 'Pythagorean' => 1, 'theorem' => 2, and so on), which are contiguous. The value at a feature index is determined by the subclass; one might weight terms according to how often they occur in the document, while another might use a simple binary representation. The feature index 0 is reserved for an intercept term, which always has a value of one.
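The indexing scheme described above can be sketched in a few lines. The Python below is illustrative only (Yioop itself is written in PHP), and the names `build_vocab`, `count_vector`, and `binary_vector` are hypothetical rather than part of the library. It shows two of the possible weightings mentioned: raw counts and a binary representation.

```python
def build_vocab(docs):
    """Assign contiguous feature indices to terms, starting at 1.

    Index 0 is left free for the intercept term.
    """
    vocab = {}
    for doc in docs:
        for term in doc:
            if term not in vocab:
                vocab[term] = len(vocab) + 1
    return vocab


def count_vector(doc, vocab):
    """Weight each known term by how often it occurs in the document."""
    vec = {0: 1}  # intercept term always has a value of one
    for term, count in doc.items():
        if term in vocab:
            vec[vocab[term]] = count
    return vec


def binary_vector(doc, vocab):
    """Record only the presence of each known term (plus the intercept)."""
    return {j: 1 for j in count_vector(doc, vocab)}


docs = [{"Pythagorean": 2, "theorem": 1}, {"theorem": 3, "proof": 1}]
vocab = build_vocab(docs)
```

Under these assumptions, 'Pythagorean' receives index 1 and 'theorem' index 2, matching the example in the description.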

Summary

Public methods: addExample(), updateExampleLabel(), numFeatures(), labelStats(), varStats(), restrict(), mapToRestrictedFeatures(), mapTrainingSet(), mapDocument()

Public properties: $vocab, $var_freqs, $label_freqs, $feature_map, $top_terms, $D, $n

Constants: No constants found

Protected: No protected methods found. No protected properties found.

Private: No private methods found. No private properties found.

Properties

$vocab

$vocab : array

Maps terms to their feature indices, which start at 1.

Type

array

$var_freqs

$var_freqs : array

Maps terms to how often they occur in documents by label.

Type

array

$label_freqs

$label_freqs : array

Maps labels to the number of documents they're assigned to.

Type

array

$feature_map

$feature_map : array

Maps old feature indices to new ones when a feature subset operation has been applied to restrict the number of features.

Type

array

$top_terms

$top_terms : array

A list of the top terms according to the last feature subset operation, if any.

Type

array

$D

$D : integer

Number of training examples.

Type

integer

$n

$n : integer

Number of terms in the vocabulary.

Type

integer

Methods

addExample()

addExample(array  $terms, integer  $label) : array

Maps a new example to a feature vector, adding any new terms to the vocabulary, and updating term and label statistics. The example should be an array of terms and their counts, and the output simply replaces terms with feature indices.

Parameters

array $terms

array of terms mapped to the number of times they occur in the example

integer $label

label for this example, either -1 or 1

Returns

array —

input example with terms replaced by feature indices
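The bookkeeping that addExample() is described as doing might look roughly like the following Python sketch. The class `FeatureStats` is a hypothetical stand-in, not the library's code, and keying the term statistics by feature index (rather than by term) is an assumption made so that they can be looked up by index later.

```python
class FeatureStats:
    """Hypothetical stand-in for the statistics described above."""

    def __init__(self):
        self.vocab = {}                    # term -> feature index (from 1)
        self.var_freqs = {}                # feature index -> {label: count}
        self.label_freqs = {-1: 0, 1: 0}   # label -> number of examples

    def add_example(self, terms, label):
        """Replace terms with feature indices and update the statistics."""
        self.label_freqs[label] += 1
        features = {}
        for term, count in terms.items():
            if term not in self.vocab:
                # New term: give it the next contiguous index.
                j = len(self.vocab) + 1
                self.vocab[term] = j
                self.var_freqs[j] = {-1: 0, 1: 0}
            j = self.vocab[term]
            features[j] = count
            # Count one occurrence per example, whatever the term count.
            self.var_freqs[j][label] += 1
        return features


stats = FeatureStats()
first = stats.add_example({"spam": 3, "offer": 1}, 1)
second = stats.add_example({"offer": 2, "meeting": 1}, -1)
```

The returned vectors keep the per-example counts; only the keys change from terms to indices.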

updateExampleLabel()

updateExampleLabel(array  $features, integer  $old_label, integer  $new_label) 

Updates the label and term statistics to reflect a label change for an example from the training set. A new label of 0 indicates that the example is being removed entirely. Note that term statistics only count one occurrence of a term per example.

Parameters

array $features

feature vector from when the example was originally added

integer $old_label

old example label in {-1, 1}

integer $new_label

new example label in {-1, 0, 1}, where 0 indicates that the example should be removed entirely
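As a rough illustration of the update described above, the following Python sketch moves one example's contribution from the old label to the new one, or removes it when the new label is 0. The function name and the dictionary layouts (`var_freqs` keyed by feature index, then label) are assumptions for the sake of the example, not the library's internals.

```python
def update_example_label(var_freqs, label_freqs, features, old_label, new_label):
    """Move one example's counts from old_label to new_label.

    A new_label of 0 removes the example from the statistics entirely.
    """
    label_freqs[old_label] -= 1
    if new_label != 0:
        label_freqs[new_label] += 1
    for j in features:
        # Each term counts once per example, regardless of its frequency.
        var_freqs[j][old_label] -= 1
        if new_label != 0:
            var_freqs[j][new_label] += 1


label_freqs = {-1: 1, 1: 1}
var_freqs = {1: {-1: 0, 1: 1}, 2: {-1: 1, 1: 1}, 3: {-1: 1, 1: 0}}

# Relabel the positive example {1: 3, 2: 1} as negative.
update_example_label(var_freqs, label_freqs, {1: 3, 2: 1}, 1, -1)
```

The feature vector passed in must be the one produced when the example was first added, since it determines which term counts are adjusted.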

numFeatures()

numFeatures() : integer

Returns the number of features, not including the intercept term represented by feature zero. For example, if we had features 0..10, this function would return 10.

Returns

integer —

the number of features in the training set

labelStats()

labelStats() : array

Returns the positive and negative label counts for the training set.

Returns

array —

positive and negative label counts indexed by label, either 1 or -1

varStats()

varStats(integer  $j, integer  $label) : array

Returns the statistics for a particular feature and label in the training set. The statistics are counts of how often the term appears or fails to appear in examples with or without the target label. They are returned in a flat array, in the following order:

0 => # examples where feature present, label matches
1 => # examples where feature present, label doesn't match
2 => # examples where feature absent, label matches
3 => # examples where feature absent, label doesn't match

Parameters

integer $j

feature index

integer $label

target label

Returns

array —

feature statistics in 4-element flat array
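The four counts can be derived from per-feature and per-label totals. The Python sketch below shows one way to do so, assuming labels are restricted to -1 and 1 and assuming hypothetical `var_freqs` and `label_freqs` dictionaries shaped as in the comments. It is not the library's implementation.

```python
def var_stats(var_freqs, label_freqs, j, label):
    """Return [present+match, present+other, absent+match, absent+other].

    var_freqs:   feature index -> {label: number of examples containing it}
    label_freqs: label -> number of examples with that label
    """
    other = -label  # assumes labels are exactly -1 and 1
    present_match = var_freqs[j][label]
    present_other = var_freqs[j][other]
    return [
        present_match,
        present_other,
        label_freqs[label] - present_match,   # absent, label matches
        label_freqs[other] - present_other,   # absent, label doesn't match
    ]


label_freqs = {1: 6, -1: 4}
var_freqs = {7: {1: 5, -1: 1}}
stats = var_stats(var_freqs, label_freqs, 7, 1)
```

The four entries always sum to the total number of training examples, which gives a quick sanity check on the statistics.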

restrict()

restrict(object  $fs) : object

Given a FeatureSelection instance, return a new clone of this Features instance using a restricted feature subset. The new Features instance is augmented with a feature map that it can use to convert feature indices from the larger feature set to indices for the reduced set.

Parameters

object $fs

FeatureSelection instance to be used to select the most informative terms

Returns

object —

new Features instance using the restricted feature set
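How the feature map is built is not specified here. One plausible construction, shown purely as an assumption, is to renumber the selected features contiguously from 1. In the Python sketch below, `build_feature_map` and its `selected` argument (the old indices chosen by the selection step) are hypothetical.

```python
def build_feature_map(selected):
    """Map each kept old index to a new contiguous index starting at 1."""
    return {old: new for new, old in enumerate(sorted(selected), start=1)}


feature_map = build_feature_map([9, 2, 5])
```

Here old indices 2, 5, and 9 would become 1, 2, and 3 in the restricted set; the real ordering may differ (for instance, by informativeness rather than by old index).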

mapToRestrictedFeatures()

mapToRestrictedFeatures(array  $features) : array

Maps the indices of a feature vector to those used by a restricted feature set, dropping any features that aren't in the map. If this Features instance isn't restricted, then the passed-in features are returned unmodified.

Parameters

array $features

feature vector mapping feature indices to frequencies

Returns

array —

original feature vector with indices mapped according to the feature_map property, and any features that don't occur in feature_map dropped
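The mapping behavior can be illustrated with a short Python sketch. The function name and the use of `None` to signal an unrestricted instance are assumptions made for the example.

```python
def map_to_restricted_features(features, feature_map=None):
    """Re-index a feature vector, dropping features missing from the map."""
    if feature_map is None:
        # Unrestricted: return the features unmodified.
        return features
    return {feature_map[j]: v for j, v in features.items() if j in feature_map}


feature_map = {2: 1, 5: 2}  # old index -> new index
original = {1: 4, 2: 1, 5: 3}
mapped = map_to_restricted_features(original, feature_map)
```

In this example feature 1 is dropped because it has no entry in the map, while features 2 and 5 keep their values under the new indices 1 and 2.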

mapTrainingSet()

mapTrainingSet(array  $docs) : object

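Transforms an array of training examples, given as feature vectors of per-example counts, into a SparseMatrix whose rows are the transformed feature vectors. (Description inherited from the parent class.)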

Parameters

array $docs

array of training examples represented as feature vectors where the values are per-example counts

Returns

object —

SparseMatrix instance whose rows are the transformed feature vectors

mapDocument()

mapDocument(array  $tokens) : array

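Maps an associative array of terms and their within-document counts to a feature vector. (Description inherited from the parent class.)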

Parameters

array $tokens

associative array of terms mapped to their within-document counts

Returns

array —

feature vector corresponding to the tokens, mapped according to the implementation of a particular Features subclass
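This class is described as using a modified form of TF * IDF, but the modification is not documented here. For orientation only, the Python sketch below applies the plain textbook weighting, term count times log(D / document frequency), which is a stand-in and not the class's actual formula. The names `tfidf_vector`, `doc_freqs`, and `num_docs` are hypothetical, and including the intercept at index 0 is an assumption based on the class description.

```python
import math


def tfidf_vector(features, doc_freqs, num_docs):
    """Weight per-document counts by textbook TF * IDF.

    features:  feature index -> count within this document
    doc_freqs: feature index -> number of training documents containing it
    num_docs:  total number of training documents
    """
    vec = {0: 1.0}  # intercept term (assumed to be included here)
    for j, tf in features.items():
        df = doc_freqs.get(j, 0)
        if df > 0:  # skip features never seen in training
            vec[j] = tf * math.log(num_docs / df)
    return vec


doc_freqs = {1: 1, 2: 4}
vec = tfidf_vector({1: 2, 2: 3, 3: 1}, doc_freqs, 4)
```

A term that appears in every training document (feature 2 here) gets weight zero under this textbook form, which is one reason implementations often smooth or otherwise modify the formula.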