
Machine-learning-based named entity recognizer.

NamedEntityContextTagger is used by StochasticTermSegmenter to help segment sentences that contain no term separators, such as spaces.


No public properties found
No constants found
No protected methods found
No protected properties found
No private methods found
No private properties found



__construct(string $lang)

Constructor for the NamedEntityContextTagger.

Sets the language this tagger tags for and sets up the path where the tagger should be stored.


string $lang

locale tag of the language this tagger is for


train(mixed $text_files, string $term_tag_separator = "/", float $learning_rate = 0.1, integer $num_epoch = 1200, \seekquarry\yioop\library\function $term_callback = null, \seekquarry\yioop\library\function $tag_callback = null)

Uses text files containing tagged sentences to build a matrix of weights. Given a term together with its context, namely the two characters before the term, the two characters after it, and the tags of the two preceding terms, the matrix can be used to calculate the odds that a named entity has been found.
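To make the context window concrete, here is a minimal Python sketch (the class itself is PHP) of turning one training line into parallel term and tag lists and extracting the features for a single position. It assumes each whitespace-separated token has the form term/tag, mirrors the term_callback and tag_callback parameters of train(), and pads sentence boundaries with the hypothetical symbols <s> and <t>. The function names and feature strings are illustrative only and are not Yioop's actual encoding.

```python
from typing import Callable, List, Optional, Tuple

PAD_TERM = "<s>"  # hypothetical padding for positions outside the sentence
PAD_TAG = "<t>"   # hypothetical padding tag for positions before the sentence start


def parse_tagged_line(line: str, sep: str = "/",
                      term_callback: Optional[Callable[[str], str]] = None,
                      tag_callback: Optional[Callable[[str], str]] = None
                      ) -> Tuple[List[str], List[str]]:
    """Split a line such as '郑/nr 振/nr 铎/nr 说/v' into term and tag lists."""
    terms: List[str] = []
    tags: List[str] = []
    for token in line.split():
        # rpartition splits on the last separator, so a term may itself contain sep
        term, found, tag = token.rpartition(sep)
        if not found:
            continue  # skip tokens that carry no tag
        if term_callback:
            term = term_callback(term)
        if tag_callback:
            tag = tag_callback(tag)
        terms.append(term)
        tags.append(tag)
    return terms, tags


def context_features(terms: List[str], tags: List[str], i: int) -> List[str]:
    """Features for position i: the term, two terms on each side, two previous tags."""
    def term_at(j: int) -> str:
        return terms[j] if 0 <= j < len(terms) else PAD_TERM

    def tag_at(j: int) -> str:
        # only positions before i are ever requested, so j < len(tags)
        return tags[j] if j >= 0 else PAD_TAG

    return [
        "term=" + term_at(i),
        "prev2=" + term_at(i - 2), "prev1=" + term_at(i - 1),
        "next1=" + term_at(i + 1), "next2=" + term_at(i + 2),
        "tag2=" + tag_at(i - 2), "tag1=" + tag_at(i - 1),
    ]
```

Each position then contributes one training example: its feature list paired with its gold tag.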


mixed $text_files

the training data: either a single file name or an array of file names.

string $term_tag_separator

separator placed between a term and its tag in each token of an input sentence

float $learning_rate

learning rate used when cycling over the data to minimize the cross-entropy loss of predicting the tag of the middle term.

integer $num_epoch

number of times to cycle through the complete data set. The default value of 1200 seems to avoid overfitting.

\seekquarry\yioop\library\function $term_callback

callback function applied to each term before the term is added to the sentence's term array while processing and training on a sentence.

\seekquarry\yioop\library\function $tag_callback

callback function applied to each part-of-speech tag before the tag is added to the sentence's tag array while processing and training on a sentence.
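The learning_rate and num_epoch parameters describe a gradient-descent loop over the training data. As a rough illustration only, here is a minimal Python sketch of stochastic gradient descent on the softmax cross-entropy loss, assuming each training position has already been reduced to a list of active (binary) feature names plus its gold tag. The function names, the sparse dictionary representation, and the per-example update schedule are assumptions and do not reproduce Yioop's matrix code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# One training position: (names of its active context features, gold tag)
Example = Tuple[List[str], str]
Weights = Dict[str, Dict[str, float]]


def predict_proba(weights: Weights, features: List[str],
                  tagset: List[str]) -> Dict[str, float]:
    """Softmax over tags of the summed weights of the active features."""
    scores = {t: sum(weights[f][t] for f in features if f in weights)
              for t in tagset}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def train_softmax(examples: List[Example], tagset: List[str],
                  learning_rate: float = 0.1, num_epoch: int = 1200) -> Weights:
    """Stochastic gradient descent on the softmax cross-entropy loss."""
    # weights[feature][tag], with unseen entries defaulting to 0.0
    weights: Weights = defaultdict(lambda: defaultdict(float))
    for _ in range(num_epoch):  # one epoch = one pass over every example
        for features, gold in examples:
            probs = predict_proba(weights, features, tagset)
            for tag in tagset:
                # gradient of -log p(gold) w.r.t. the score of tag: p(tag) - [tag == gold]
                grad = probs[tag] - (1.0 if tag == gold else 0.0)
                for f in features:
                    weights[f][tag] -= learning_rate * grad
    return weights
```

Because the features are binary, each update touches only the rows of features active at that position, which keeps an epoch cheap even with a large character vocabulary.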


predict(mixed $sentence) : array

Predicts the named entities that exist in a sentence.


mixed $sentence

either an array of segmented words/terms or a string, which will be split on whitespace


array —

all predicted named entities, each paired with a tag indicating the kind of named entity, e.g. [["郑振铎","nr"],["国民党","nt"]]
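The following Python sketch shows one plausible way to get from either input form to the [entity, tag] output above, under several assumptions. tag_fn is a hypothetical stand-in for the trained model that returns the best tag for position i given the tags predicted so far (greedy left-to-right decoding, so earlier predictions supply the two-previous-tags context). Adjacent terms sharing a tag from a hypothetical entity tag set, here {"nr", "nt"} taken from the example output, are merged into one entity. Yioop's actual tag set and merging rule may differ.

```python
from typing import AbstractSet, Callable, List, Sequence, Union

# Hypothetical model interface: (terms, tags predicted so far, position) -> tag
TagFn = Callable[[Sequence[str], List[str], int], str]


def predict_entities(sentence: Union[str, Sequence[str]], tag_fn: TagFn,
                     entity_tags: AbstractSet[str] = frozenset({"nr", "nt"})
                     ) -> List[List[str]]:
    # A string is split on whitespace; an array of terms is used as given
    terms = sentence.split() if isinstance(sentence, str) else list(sentence)
    tags: List[str] = []
    for i in range(len(terms)):
        # Greedy decoding: already-predicted tags form the previous-tag context
        tags.append(tag_fn(terms, tags, i))
    entities: List[List[str]] = []
    for i, (term, tag) in enumerate(zip(terms, tags)):
        if tag not in entity_tags:
            continue  # not part of a named entity
        if i > 0 and tags[i - 1] == tag:
            entities[-1][0] += term  # same tag as the previous term: extend it
        else:
            entities.append([term, tag])  # start a new entity
    return entities
```

With a toy lookup-table tag_fn that tags 郑, 振, 铎 as nr and 国, 民, 党 as nt, the input "郑 振 铎 加 入 国 民 党" yields the two entities shown in the return description.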