$lang
$lang : string
Current language. Only tested on Simplified Chinese; might be extensible to other languages in the future.
Machine-learning-based named entity recognition (NER) tagger. Typically, ContextWeightedNERTagger.php trains on a tagged dataset for a language and then predicts the tag of each word in a given list of words.
processTexts(mixed $text_files, $term_tag_splier = "/", $term_process = null, $tag_process = null) : array
Processes the training data.
mixed | $text_files | can be a file or an array of file names |
$term_tag_splier | ||
$term_process | ||
$tag_process |
Array of separated sentences; each sentence has the format [[words...],[tags...]].
Data format (MSRA): 我们/o 是/o 受到/o 郑振铎/nr 先生/o 、/o 阿英/nr 先生/o 著作/o 的/o 启示/o ,/o 从/o 个人/o 条件/o 出发/o ,/o 瞄准/o 现代/o 出版/o 史/o 研究/o 的/o 空白/o ,/o 重点/o 集/o 藏/o 解放区/o 、/o 国民党/nt 毁/o 禁/o 出版物/o 。/o
To adapt to other languages, some modifications are needed.
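The mapping from one MSRA-style line to the [[words...],[tags...]] shape can be sketched as follows. This is only an illustration of the data format, not the library's own code: `parseTaggedLine` is a hypothetical helper, and splitting each pair at its last separator is an assumption.

```php
<?php
/**
 * Hypothetical helper (not part of the library): splits one line of
 * "term/tag" pairs into [[terms...], [tags...]].
 */
function parseTaggedLine(string $line, string $term_tag_splier = "/"): array
{
    $terms = [];
    $tags = [];
    // Pairs are separated by whitespace; the /u flag keeps UTF-8 intact.
    $pairs = preg_split('/\s+/u', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($pairs as $pair) {
        // Assumption: split at the last separator, so a term may contain one.
        $pos = strrpos($pair, $term_tag_splier);
        if ($pos === false) {
            continue; // skip tokens that carry no tag
        }
        $terms[] = substr($pair, 0, $pos);
        $tags[] = substr($pair, $pos + strlen($term_tag_splier));
    }
    return [$terms, $tags];
}

// Excerpt of the MSRA sample shown above.
$sentence = parseTaggedLine("郑振铎/nr 先生/o 、/o 阿英/nr 先生/o");
// $sentence[0] holds the terms, $sentence[1] the matching tags.
```

Keeping terms and tags in two parallel arrays means the tag at index i always belongs to the term at index i.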
train(mixed $text_files, float $learning_rate = 0.1, integer $max_epoch = 1200, \seekquarry\yioop\library\function $term_process = null, \seekquarry\yioop\library\function $tag_process = null)
Trains the tagger on the given data. Notice: this function might run for a very long time, depending on the size of the training set.
mixed | $text_files | the training data; can be a file or an array of file names |
float | $learning_rate | |
integer | $max_epoch | 1200 might be a good value; the weights may overfit if it is greater than this number |
\seekquarry\yioop\library\function | $term_process | a preprocessing step applied to each term before training |
\seekquarry\yioop\library\function | $tag_process | a preprocessing step applied to each tag before training |
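As a rough illustration of what the two preprocessing callbacks might look like, the sketch below defines one closure for terms and one for tags and applies them to a sentence in the [[words...],[tags...]] format. Both closures are hypothetical, and applying them with `array_map` is an assumption about how such callbacks are used, not a description of the library's internals.

```php
<?php
// Hypothetical preprocessors; neither is shipped with the library.
$term_process = function ($term) {
    return trim($term);       // strip stray ASCII whitespace around a term
};
$tag_process = function ($tag) {
    return strtolower($tag);  // normalize tag case, e.g. "NR" becomes "nr"
};

// One sentence in the [[words...],[tags...]] format.
$sentence = [["郑振铎 ", "先生"], ["NR", "O"]];

// Assumption: each callback is applied to every element independently.
$terms = array_map($term_process, $sentence[0]);
$tags = array_map($tag_process, $sentence[1]);
```

Closures like these would be passed as the $term_process and $tag_process arguments; leaving them null skips preprocessing.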
predict(mixed $sentence, $delimiter = "", \seekquarry\yioop\library\function $splitter = null) : \seekquarry\yioop\library\@array
The primary function for predicting tags.
mixed | $sentence | an array of segmented words/terms, or a string that needs to be split by $splitter |
$delimiter | ||
\seekquarry\yioop\library\function | $splitter | to process $sentence if $sentence is a string |
All predicted named entities with their tags, e.g. [["郑振铎","nr"],["国民党","nt"]]
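The return shape can be illustrated by pairing each term with its tag and keeping only those that are named entities. The sketch below is not the library's implementation: `collectEntities` is a hypothetical helper, and it assumes that the tag "o" marks a non-entity, as in the MSRA sample.

```php
<?php
/**
 * Hypothetical helper (not part of the library): keeps the [term, tag]
 * pairs whose tag differs from the "outside" tag.
 */
function collectEntities(array $terms, array $tags, string $outside_tag = "o"): array
{
    $entities = [];
    foreach ($terms as $i => $term) {
        // Treat a missing tag as "outside" so mismatched lengths are safe.
        $tag = $tags[$i] ?? $outside_tag;
        if ($tag !== $outside_tag) {
            $entities[] = [$term, $tag];
        }
    }
    return $entities;
}

$terms = ["郑振铎", "先生", "、", "国民党", "毁"];
$tags = ["nr", "o", "o", "nt", "o"];
$entities = collectEntities($terms, $tags);
// $entities is [["郑振铎", "nr"], ["国民党", "nt"]]
```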