train()
train(mixed $text_files, string $term_tag_separator = "/", float $learning_rate = 0.1, integer $num_epoch = 1200, \seekquarry\yioop\library\function $term_callback = null, \seekquarry\yioop\library\function $tag_callback = null)
Uses text files containing sentences to create a matrix
so that, from a context of the two chars before a term and the two chars
after it, together with a context of the two tags before the term and the
term itself, the odds that a named entity has been found can be calculated.
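The context described above can be illustrated with a minimal sketch. This is not Yioop's implementation: the helper names (`contextFeatures`, `entityProbability`, `updateWeights`), the `<pad>` marker, the flat weight array, and the `ne`/`o` tags are all assumptions made for illustration. It shows one plausible way to turn the neighbouring terms and previous tags into features, score them with a logistic function, and take a single gradient step on the cross-entropy loss using a learning rate.

```php
<?php
// Hypothetical helpers for illustration only; not part of Yioop's API.

/**
 * Builds feature keys for the term at position $i: the two terms before,
 * the two terms after, the two previous tags, and the term itself.
 */
function contextFeatures(array $terms, array $tags, int $i): array
{
    $features = [];
    foreach ([-2, -1, 1, 2] as $offset) {
        // Positions outside the sentence get a padding marker.
        $features[] = "c$offset=" . ($terms[$i + $offset] ?? "<pad>");
    }
    $features[] = "t-2=" . ($tags[$i - 2] ?? "<pad>");
    $features[] = "t-1=" . ($tags[$i - 1] ?? "<pad>");
    $features[] = "w=" . $terms[$i];
    return $features;
}

/** Logistic estimate that the term belongs to a named entity. */
function entityProbability(array $weights, array $features): float
{
    $score = 0.0;
    foreach ($features as $feature) {
        $score += $weights[$feature] ?? 0.0; // unseen features weigh 0
    }
    return 1.0 / (1.0 + exp(-$score));
}

/** One stochastic gradient step on the cross-entropy loss. */
function updateWeights(array &$weights, array $features, int $label,
    float $learning_rate): void
{
    // For a logistic model the loss gradient is (prediction - label).
    $error = entityProbability($weights, $features) - $label;
    foreach ($features as $feature) {
        $weights[$feature] = ($weights[$feature] ?? 0.0)
            - $learning_rate * $error;
    }
}

$terms = ["visit", "new", "york", "today"];
$tags = ["o", "ne", "ne", "o"]; // assumed tag names
$features = contextFeatures($terms, $tags, 2);
$weights = [];
$before = entityProbability($weights, $features); // 0.5 with no weights
updateWeights($weights, $features, 1, 0.1);
$after = entityProbability($weights, $features); // moves toward 1
```

Repeating such updates over every term of every sentence, `$num_epoch` times, is the general shape of the training loop the parameters below describe.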
Parameters
mixed |
$text_files |
file or files with training data. This can be a single file name or
an array of file names. |
string |
$term_tag_separator |
separator between a term and its tag
in each input sentence |
float |
$learning_rate |
learning rate when cycling over data trying
to minimize the cross-entropy loss in the prediction of the tag of the
middle term. |
integer |
$num_epoch |
number of times to cycle through the
complete data set. The default value of 1200 seems to avoid overfitting. |
\seekquarry\yioop\library\function |
$term_callback |
callback function applied to a term
before adding term to sentence term array as part of processing and
training with a sentence. |
\seekquarry\yioop\library\function |
$tag_callback |
callback function applied to a
part-of-speech tag before adding the tag to the sentence tag array as part of
processing and training with a sentence. |
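
How `$term_tag_separator`, `$term_callback`, and `$tag_callback` might interact can be sketched as follows. The function `splitTaggedSentence` is hypothetical and not part of Yioop. The sketch assumes each training sentence is a whitespace-separated list of term-and-tag tokens, and it splits each token on the last occurrence of the separator, which is a design choice made here so that terms may themselves contain the separator.

```php
<?php
// Hypothetical helper for illustration only; not part of Yioop's API.

/**
 * Splits a tagged sentence into parallel term and tag arrays, applying
 * the optional callbacks to each term and tag before storing them.
 */
function splitTaggedSentence(string $sentence,
    string $term_tag_separator = "/", ?callable $term_callback = null,
    ?callable $tag_callback = null): array
{
    $terms = [];
    $tags = [];
    $tokens = preg_split('/\s+/u', trim($sentence), -1,
        PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        // Use the last separator so a term may contain the separator.
        $pos = strrpos($token, $term_tag_separator);
        if ($pos === false) {
            continue; // skip tokens that carry no tag
        }
        $term = substr($token, 0, $pos);
        $tag = substr($token, $pos + strlen($term_tag_separator));
        $terms[] = $term_callback ? $term_callback($term) : $term;
        $tags[] = $tag_callback ? $tag_callback($tag) : $tag;
    }
    return [$terms, $tags];
}

[$terms, $tags] = splitTaggedSentence("Paris/NE is/O big/O", "/",
    'strtolower', 'strtolower');
// $terms is ["paris", "is", "big"]; $tags is ["ne", "o", "o"]
```

Passing `null` for either callback leaves the corresponding values unchanged, which matches the `null` defaults in the signature.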