\seekquarry\yioop\library\NamedEntityContextTagger

Machine learning based named entity recognizer.

NamedEntityContextTagger is used by StochasticTermSegmenter to help segment sentences in which no term separators, such as spaces, are used.

Summary

Methods: __construct(), train(), predict() (all public; no protected or private methods)
Properties: none (public, protected, or private)
Constants: none

Methods

__construct()

__construct(string $lang)

Constructor for the NamedEntityContextTagger.

Sets the language this tagger tags for and sets up the path where it should be stored.

Parameters

string $lang

locale tag of the language this tagger is for

train()

train(mixed $text_files, string $term_tag_separator = "/", float $learning_rate = 0.1, integer $num_epoch = 1200, callable $term_callback = null, callable $tag_callback = null)

Uses text files containing sentences to create a matrix. From a context made up of the two chars before a term, the two chars after it, the two tags before it, and the term itself, this matrix lets the odds that a named entity has been found be calculated.
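The context described above can be pictured as a small feature map built around one position in a sentence. The following is a minimal sketch under stated assumptions, not the tagger's actual code: the function name contextFeatures, the feature key names, and the empty-string padding at sentence boundaries are all hypothetical.

```php
<?php
/**
 * Hypothetical helper (not part of Yioop): collects the context around
 * position $i. Each element of $chars is one char or term of the sentence.
 * Positions outside the sentence are padded with "" (an assumption of
 * this sketch).
 */
function contextFeatures(array $chars, array $tags, int $i): array
{
    $features = [];
    // two items before and two items after the current position
    foreach ([-2, -1, 1, 2] as $offset) {
        $features["w$offset"] = $chars[$i + $offset] ?? "";
    }
    // the two tags already assigned before the current position
    foreach ([-2, -1] as $offset) {
        $features["t$offset"] = $tags[$i + $offset] ?? "";
    }
    // the current item itself
    $features["w0"] = $chars[$i];
    return $features;
}

$chars = ["郑", "振", "铎", "说"];
$tags = ["nr", "nr"]; // tags decided so far, for positions 0 and 1
$features = contextFeatures($chars, $tags, 2);
```

A trained model would then look up a weight for each feature value and combine the weights into a score for every candidate tag.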

Parameters

mixed $text_files

files with training data. This can be a single file name or an array of file names.

string $term_tag_separator

separator between a term and its tag in each input sentence

float $learning_rate

learning rate when cycling over data trying to minimize the cross-entropy loss in the prediction of the tag of the middle term.

integer $num_epoch

number of times to cycle through the complete data set. The default value of 1200 seems to avoid overfitting.

callable $term_callback

callback function applied to a term before adding term to sentence term array as part of processing and training with a sentence.

callable $tag_callback

callback function applied to a part of speech tag before adding tag to sentence tag array as part of processing and training with a sentence.
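To illustrate how the separator and the two callbacks could interact, here is a minimal sketch that splits one training sentence into parallel term and tag arrays. It is an assumption about the general shape of this step, not the library's implementation: splitTaggedSentence is a hypothetical helper, and the sample sentence and the choice of strtolower as a tag callback are made up for the example.

```php
<?php
/**
 * Hypothetical helper (not part of Yioop): splits one training sentence
 * of term/tag tokens into parallel term and tag arrays.
 */
function splitTaggedSentence(string $sentence,
    string $term_tag_separator = "/", ?callable $term_callback = null,
    ?callable $tag_callback = null): array
{
    $terms = [];
    $tags = [];
    $tokens = preg_split('/\s+/u', trim($sentence), -1,
        PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        // split at the last separator so a term may contain the separator
        $pos = strrpos($token, $term_tag_separator);
        if ($pos === false) {
            continue; // token has no tag; skipped in this sketch
        }
        $term = substr($token, 0, $pos);
        $tag = substr($token, $pos + strlen($term_tag_separator));
        // apply the optional callbacks before storing term and tag
        $terms[] = $term_callback ? $term_callback($term) : $term;
        $tags[] = $tag_callback ? $tag_callback($tag) : $tag;
    }
    return [$terms, $tags];
}

// Normalize tags to lower case; leave terms unchanged.
[$terms, $tags] = splitTaggedSentence("郑振铎/NR 在/P 上海/NS", "/",
    null, 'strtolower');
```

Callbacks of this kind are a convenient place to normalize a corpus, for example to map several tag spellings onto one.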

predict()

predict(mixed $sentence) : array

Predicts the named entities that exist in a sentence.

Parameters

mixed $sentence

an array of segmented words/terms, or a string that will be split on white space

Returns

array —

all predicted named entities, each paired with a tag indicating the kind of named entity, e.g. [["郑振铎","nr"],["国民党","nt"]]
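The two accepted input forms and the shape of the return value can be illustrated without the library. The sketch below is hedged accordingly: normalizeSentence and groupEntitiesByTag are hypothetical helpers, the input sentence is invented, and $entities is hard-coded from the example above rather than produced by a real call to predict().

```php
<?php
/**
 * Hypothetical helper (not part of Yioop): accepts either an array of
 * terms or a string, mirroring the two documented input forms.
 */
function normalizeSentence($sentence): array
{
    if (is_array($sentence)) {
        return array_values($sentence);
    }
    // a string is split on white space
    return preg_split('/\s+/u', trim((string)$sentence), -1,
        PREG_SPLIT_NO_EMPTY);
}

/**
 * Hypothetical helper: groups [entity, tag] pairs, in the shape of the
 * documented return value, by their tag.
 */
function groupEntitiesByTag(array $entities): array
{
    $grouped = [];
    foreach ($entities as [$entity, $tag]) {
        $grouped[$tag][] = $entity;
    }
    return $grouped;
}

$terms = normalizeSentence("郑振铎 与 国民党");
// sample value copied from the documented return example
$entities = [["郑振铎", "nr"], ["国民党", "nt"]];
$by_tag = groupEntitiesByTag($entities);
```

Grouping by tag is only one way to consume the result; callers that need the original order can iterate over the pairs directly.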