\seekquarry\yioop\library\ContextWeightedNamedEntityRecognizer

Machine-learning-based NER tagger. Typically, ContextWeightedNERTagger.php trains a model for a language on a dataset and then predicts tags for a given list of words.

Summary

Public methods: __construct(), processTexts(), train(), predict()
Public properties: $lang, $word_feature, $tag_feature, $bias
Constants: none
Protected methods: none
Protected properties: none
Private methods: getIndex(), save_weight(), loadWeight(), pack_b(), unpack_b(), pack_t(), unpack_t(), pack_w(), unpack_w(), getB(), getT(), getW()
Private properties: $tag_set

Properties

$lang

$lang : string

Current language. Only tested on Simplified Chinese; might be extensible to other languages in the future.

Type

string

$word_feature

$word_feature : array

The word weight feature in y = wx + b. Generated by the training method.

Type

array

$tag_feature

$tag_feature : array

The tag weight feature in y = wx + b. Generated by the training method.

Type

array

$bias

$bias : array

The bias b in y = wx + b. Generated by the training method.

Type

array

$tag_set

$tag_set : array

The set of all possible tags. Generated by the training method.

Type

array (associative, [tag => tag index])
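
The [tag => tag index] layout can be illustrated with a minimal sketch. The helper name buildTagSet() and the rule of numbering tags in order of first appearance are assumptions made for illustration; the page does not say how the training method assigns indices.

```php
<?php
/**
 * Hypothetical helper (not part of the library): gives each distinct tag
 * the next free index, in order of first appearance.
 */
function buildTagSet(array $tag_sequences): array
{
    $tag_set = [];
    foreach ($tag_sequences as $tags) {
        foreach ($tags as $tag) {
            // isset() is true for index 0, so no tag is numbered twice
            if (!isset($tag_set[$tag])) {
                $tag_set[$tag] = count($tag_set);
            }
        }
    }
    return $tag_set;
}

$tag_set = buildTagSet([["o", "nr", "o"], ["nt", "o"]]);
// $tag_set is ["o" => 0, "nr" => 1, "nt" => 2]
```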

Methods

__construct()

__construct(string  $lang) 

The constructor of the tagger. Extending it to other languages requires some work: define $this->getKeyImpl and $this->rule_defined_key. See the Chinese example.

Parameters

string $lang

The current language.

processTexts()

processTexts(mixed  $text_files,   $term_tag_splier = "/",   $term_process = null,   $tag_process = null) : array

Processes the training data.

Parameters

mixed $text_files

A file name or an array of file names.

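$term_tag_splier

The separator between a term and its tag in the training data ("/" by default).

$term_process

An optional preprocessing step applied to each term.

$tag_process

An optional preprocessing step applied to each tag.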

Returns

array —

The separated sentences; each sentence has the format [[words...],[tags...]].

Data format (MSRA): 我们/o 是/o 受到/o 郑振铎/nr 先生/o 、/o 阿英/nr 先生/o 著作/o 的/o 启示/o ,/o 从/o 个人/o 条件/o 出发/o ,/o 瞄准/o 现代/o 出版/o 史/o 研究/o 的/o 空白/o ,/o 重点/o 集/o 藏/o 解放区/o 、/o 国民党/nt 毁/o 禁/o 出版物/o 。/o

Adapting this to other languages requires some modifications.
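
As a rough illustration of the return format, the sketch below turns one line of term/tag text into [[words...],[tags...]]. The function parseTaggedLine() is hypothetical and is not the library's implementation; splitting on whitespace and at the last separator are assumptions.

```php
<?php
/**
 * Hypothetical helper (not part of the library): converts one line of
 * "term/tag term/tag ..." text into [[terms...], [tags...]].
 */
function parseTaggedLine(string $line, string $separator = "/"): array
{
    $terms = [];
    $tags = [];
    // Split on runs of whitespace; the u flag treats the input as UTF-8
    $tokens = preg_split('/\s+/u', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        // Split at the last separator so a term may itself contain one
        $pos = strrpos($token, $separator);
        if ($pos === false) {
            continue; // skip tokens that carry no tag
        }
        $terms[] = substr($token, 0, $pos);
        $tags[] = substr($token, $pos + strlen($separator));
    }
    return [$terms, $tags];
}

$sentence = parseTaggedLine("郑振铎/nr 先生/o 、/o 阿英/nr 先生/o");
// $sentence is [["郑振铎", "先生", "、", "阿英", "先生"],
//               ["nr", "o", "o", "nr", "o"]]
```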

train()

train(mixed  $text_files, float  $learning_rate = 0.1, integer  $max_epoch = 1200, callable  $term_process = null, callable  $tag_process = null) 

Trains the model on the given data. Note: this function may run for a very long time, depending on the size of the training set.

Parameters

mixed $text_files

The training data; a file name or an array of file names.

float $learning_rate
integer $max_epoch

1200 may be a good value; the weights will overfit if it is greater than this number.

callable $term_process

A preprocessing step applied to each term before training.

callable $tag_process

A preprocessing step applied to each tag before training.
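
To show the roles of $learning_rate and $max_epoch, here is a minimal perceptron-style sketch over a toy model that scores each tag as weight[term][tag] + bias[tag]. This is not the library's training algorithm, which this page does not describe; trainToyTagger(), the update rule, and the early stop when an epoch makes no mistakes are all assumptions.

```php
<?php
/**
 * Hypothetical trainer (not part of the library). Each sentence is
 * [[terms...], [tags...]]; $tags lists every possible tag.
 */
function trainToyTagger(array $sentences, array $tags,
    float $learning_rate = 0.1, int $max_epoch = 1200): array
{
    $weight = [];
    $bias = array_fill_keys($tags, 0.0);
    for ($epoch = 0; $epoch < $max_epoch; $epoch++) {
        $mistakes = 0;
        foreach ($sentences as [$terms, $gold_tags]) {
            foreach ($terms as $i => $term) {
                // Pick the highest scoring tag under the current weights
                $best = null;
                $best_score = -INF;
                foreach ($tags as $tag) {
                    $score = ($weight[$term][$tag] ?? 0.0) + $bias[$tag];
                    if ($score > $best_score) {
                        $best_score = $score;
                        $best = $tag;
                    }
                }
                $gold = $gold_tags[$i];
                if ($best !== $gold) {
                    // Move weights toward the gold tag, away from the guess
                    $weight[$term][$gold] =
                        ($weight[$term][$gold] ?? 0.0) + $learning_rate;
                    $weight[$term][$best] =
                        ($weight[$term][$best] ?? 0.0) - $learning_rate;
                    $bias[$gold] += $learning_rate;
                    $bias[$best] -= $learning_rate;
                    $mistakes++;
                }
            }
        }
        if ($mistakes === 0) {
            break; // every training term is already tagged correctly
        }
    }
    return [$weight, $bias, $epoch];
}

[$weight, $bias, $epochs_run] = trainToyTagger(
    [[["郑振铎", "先生"], ["nr", "o"]]], ["o", "nr"]);
```

A larger $learning_rate makes each correction bigger, and $max_epoch caps the number of passes over the data even if the model never fits it perfectly.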

predict()

predict(mixed  $sentence,   $delimiter = "", callable  $splitter = null) : array

The primary function for predicting tags.

Parameters

mixed $sentence

An array of segmented words/terms, or a string to be split by $splitter.

$delimiter
callable $splitter

Used to split $sentence when $sentence is a string.

Returns

array

All predicted named entities with their tags, e.g. [["郑振铎","nr"],["国民党","nt"]]
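
The shape of the returned list can be sketched as follows, assuming one predicted tag per term and "o" as the tag for terms that are not entities (as in the MSRA sample data). extractEntities() is a hypothetical helper; whether predict() merges adjacent terms into one entity is not stated here, so this sketch does not.

```php
<?php
/**
 * Hypothetical post-processing step (not part of the library): keeps
 * only the terms whose predicted tag differs from the "outside" tag.
 */
function extractEntities(array $terms, array $tags,
    string $outside = "o"): array
{
    $entities = [];
    foreach ($terms as $i => $term) {
        if (isset($tags[$i]) && $tags[$i] !== $outside) {
            $entities[] = [$term, $tags[$i]];
        }
    }
    return $entities;
}

$entities = extractEntities(
    ["郑振铎", "先生", "国民党"], ["nr", "o", "nt"]);
// $entities is [["郑振铎", "nr"], ["国民党", "nt"]]
```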

getIndex()

getIndex(  $index,   $terms) 

The first of a list of private helper functions. Given a sentence ($terms), finds the key at position $index.

Parameters

$index
$terms

save_weight()

save_weight() 

Saves the trained weights to disk.

loadWeight()

loadWeight(  $training_load = false) 

Loads the trained weights from disk.

Parameters

$training_load

pack_b()

pack_b() 

Pack the bias

unpack_b()

unpack_b() 

Unpack the bias
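
The on-disk format used by pack_b() and unpack_b() is not documented on this page. As one possible approach, the sketch below stores each bias value as a machine-order 64-bit double using PHP's pack() and unpack(); the function names packBias() and unpackBias() and the "d*" format are assumptions.

```php
<?php
// Hypothetical helpers (not part of the library).

/** Packs a list of floats into a binary string, 8 bytes per value. */
function packBias(array $bias): string
{
    return pack("d*", ...array_values($bias));
}

/** Restores the list of floats from the binary string. */
function unpackBias(string $packed): array
{
    // unpack() returns a 1-indexed array; array_values() re-bases it at 0
    return array_values(unpack("d*", $packed));
}

$packed = packBias([0.1, -0.25, 3.0]);
$restored = unpackBias($packed);
// $restored is [0.1, -0.25, 3.0] and strlen($packed) is 24
```

Doubles round-trip exactly in this format, whereas a 32-bit "f*" format would halve the size at the cost of precision.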

pack_t()

pack_t(  $key) 

Pack the tag_feature

Parameters

$key

unpack_t()

unpack_t(  $key) 

Unpack the tag_feature

Parameters

$key

pack_w()

pack_w(  $key) 

Pack the word_feature

Parameters

$key

unpack_w()

unpack_w(  $key) 

Unpack the word_feature

Parameters

$key

getB()

getB(  $tag_index) 

Get the bias value for tag

Parameters

$tag_index

getT()

getT(  $key,   $tag_index) 

Get the tag_feature weight for $key and tag

Parameters

$key
$tag_index

getW()

getW(  $term,   $position,   $tag_index) 

Get the weight value for term at position for tag

Parameters

$term
$position
$tag_index
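
To show how the three getters could combine under y = wx + b, the sketch below sums a bias, a tag weight and the word weights in a context window. The function scoreTag(), the array layouts, the window size and the use of the previous tag as the tag_feature key are all assumptions; the actual feature keys come from getIndex(), which is not specified here.

```php
<?php
/**
 * Hypothetical scorer (not part of the library). Assumed layouts:
 *   $bias[tag_index]
 *   $tag_feature[key][tag_index]
 *   $word_feature[term][position][tag_index]
 */
function scoreTag(array $terms, int $index, string $prev_tag,
    int $tag_index, array $word_feature, array $tag_feature,
    array $bias, int $window = 1): float
{
    $score = $bias[$tag_index] ?? 0.0;                      // b
    $score += $tag_feature[$prev_tag][$tag_index] ?? 0.0;   // tag weight
    for ($offset = -$window; $offset <= $window; $offset++) {
        $term = $terms[$index + $offset] ?? null;
        if ($term !== null) {
            // word weight for this term at this relative position
            $score += $word_feature[$term][$offset][$tag_index] ?? 0.0;
        }
    }
    return $score;
}

$word_feature = [
    "郑振铎" => [0 => [1 => 0.5]],
    "先生" => [1 => [1 => 0.25]],
];
$tag_feature = ["start" => [1 => 0.125]];
$bias = [1 => 0.0625];
$y = scoreTag(["郑振铎", "先生"], 0, "start", 1,
    $word_feature, $tag_feature, $bias);
// $y is 0.9375 (0.0625 + 0.125 + 0.5 + 0.25)
```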