\seekquarry\yioop\library\ContextWeightedNamedEntityRecognizer

Machine-learning-based NER tagger. Typically, ContextWeightedNERTagger.php can be trained on a dataset for a language and can then predict the tags for a given list of words.

Summary

	Methods	Properties	Constants
Public	__construct(), processTexts(), train(), predict()	$lang, $word_feature, $tag_feature, $bias	No constants found
Protected	No protected methods found	No protected properties found	N/A
Private	getIndex(), save_weight(), loadWeight(), pack_b(), unpack_b(), pack_t(), unpack_t(), pack_w(), unpack_w(), getB(), getT(), getW()	$tag_set	N/A

File: src/library/ContextWeightedNamedEntityRecognizer.php
Package: Default
Class hierarchy: \seekquarry\yioop\library\ContextWeightedNamedEntityRecognizer

Properties

$lang

$lang : string

Current language. Only tested on Simplified Chinese; might be extensible to other languages in the future.

Type

string

$word_feature

$word_feature : array

The word weight feature in y = wx + b. Generated by the training method.

Type

array

$tag_feature

$tag_feature : array

The tag weight feature in y = wx + b. Generated by the training method.

Type

array

$bias

$bias : array

The bias in y = wx + b. Generated by the training method.

Type

array

$tag_set

$tag_set : array

The set of all possible tags. Generated by the training method.

Type

array — associative array [tag => tag index]

Methods

__construct()

__construct(string  $lang)

The constructor of the tagger. Extending it to other languages requires some work: define $this->getKeyImpl and $this->rule_defined_key. See the Chinese example.

Parameters

string	$lang	describes the current language

processTexts()

processTexts(mixed  $text_files,   $term_tag_splier = "/",   $term_process = null,   $tag_process = null) : array

Processes the training data.

Parameters

mixed	$text_files	can be a file or an array of file names
	$term_tag_splier
	$term_process
	$tag_process

Returns

array —

of separated sentences, each sentence having the format [[words...],[tags...]].

Data format (MSRA): 我们/o 是/o 受到/o 郑振铎/nr 先生/o 、/o 阿英/nr 先生/o 著作/o 的/o 启示/o ，/o 从/o 个人/o 条件/o 出发/o ，/o 瞄准/o 现代/o 出版/o 史/o 研究/o 的/o 空白/o ，/o 重点/o 集/o 藏/o 解放区/o 、/o 国民党/nt 毁/o 禁/o 出版物/o 。/o

To adapt to other languages, some modifications are needed.
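The [[words...],[tags...]] layout can be illustrated with a minimal sketch. It is not Yioop's implementation: parseTaggedSentence() and its $separator parameter are hypothetical names, and the sketch assumes that tokens are separated by whitespace and that the tag follows the last "/" in each token.

```php
<?php
/**
 * Hypothetical helper (not part of Yioop): splits one MSRA-style line
 * such as "郑振铎/nr 先生/o" into [[words...], [tags...]].
 */
function parseTaggedSentence(string $line, string $separator = "/"): array
{
    $words = [];
    $tags = [];
    // Assumption: tokens are whitespace-separated "term/tag" pairs.
    $tokens = preg_split('/\s+/u', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $token) {
        // Split on the last separator so a term may itself contain "/".
        $pos = strrpos($token, $separator);
        if ($pos === false) {
            continue; // skip tokens that carry no tag
        }
        $words[] = substr($token, 0, $pos);
        $tags[] = substr($token, $pos + strlen($separator));
    }
    return [$words, $tags];
}

$sentence = parseTaggedSentence("郑振铎/nr 先生/o 、/o 阿英/nr 先生/o");
// $sentence[0] holds the five terms, $sentence[1] the five tags.
```

Splitting on the last separator is a design choice of this sketch only; whether processTexts() does the same is not stated in the documentation above.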

train()

train(mixed  $text_files, float  $learning_rate = 0.1, integer  $max_epoch = 1200, callable  $term_process = null, callable  $tag_process = null)

Trains the tagger on the given data. Note: this function might run for a very long time, depending on the training set.

Parameters

mixed	$text_files	the training data; can be a file or an array of file names
float	$learning_rate
integer	$max_epoch	1200 might be a good value; the weights will overfit if it is greater than this number
callable	$term_process	is a preprocessing step applied to each term before training
callable	$tag_process	is a preprocessing step applied to each tag before training

predict()

predict(mixed  $sentence,   $delimiter = "", callable  $splitter = null) : array

The primary function for predicting tags.

Parameters

mixed	$sentence	is an array of segmented words/terms, or a string that needs to be split by $splitter
	$delimiter
callable	$splitter	used to process $sentence if $sentence is a string

Returns

array —

all predicted named entities, each with its tag, e.g. [["郑振铎","nr"],["国民党","nt"]]
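The shape of this return value can be sketched as follows. This is an illustration under assumptions, not the library's code: collectEntities() is a hypothetical helper, and it assumes, based on the MSRA sample, that the tag "o" marks a term outside any named entity.

```php
<?php
/**
 * Hypothetical helper (not part of Yioop): pairs each term with its
 * predicted tag and drops terms tagged as outside any entity.
 */
function collectEntities(array $words, array $tags, string $outside = "o"): array
{
    $entities = [];
    foreach ($words as $i => $word) {
        // Assumption: a missing tag is treated as "outside".
        $tag = $tags[$i] ?? $outside;
        if ($tag !== $outside) {
            $entities[] = [$word, $tag];
        }
    }
    return $entities;
}

$words = ["郑振铎", "先生", "、", "国民党"];
$tags = ["nr", "o", "o", "nt"];
$entities = collectEntities($words, $tags);
// $entities is [["郑振铎", "nr"], ["国民党", "nt"]]
```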

getIndex()

getIndex(  $index,   $terms)

One of a list of private helper functions. Given a sentence ($terms), finds the key at position $index.

Parameters

	$index
	$terms

save_weight()

save_weight()

Saves the trained weights to disk.

loadWeight()

loadWeight(  $training_load = false)

Loads the trained weights from disk.

Parameters

$training_load

pack_b()

pack_b()

Pack the bias

unpack_b()

unpack_b()

Unpack the bias

pack_t()

pack_t(  $key)

Pack the tag_feature

Parameters

$key

unpack_t()

unpack_t(  $key)

Unpack the tag_feature

Parameters

$key

pack_w()

pack_w(  $key)

Pack the word_feature

Parameters

$key

unpack_w()

unpack_w(  $key)

Unpack the word_feature

Parameters

$key

getB()

getB(  $tag_index)

Get the bias value for tag

Parameters

$tag_index

getT()

getT(  $key,   $tag_index)

Get the tag_feature weight value for key and tag

Parameters

	$key
	$tag_index

getW()

getW(  $term,   $position,   $tag_index)

Get the weight value for term at position for tag

Parameters

	$term
	$position
	$tag_index
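
The signatures of getB(), getT(), and getW(), together with the y = wx + b description of the properties, suggest a linear score per tag. The following is a minimal sketch of such a scorer, not the class's actual code. The function bestTag(), the array layouts, the context window of relative positions, and the use of the previous tag as the tag_feature key are all assumptions made for illustration.

```php
<?php
/**
 * Hypothetical scoring sketch (not Yioop's code): for each tag, add the
 * bias, the weights of the terms in a context window, and the weight
 * keyed by the previous tag, then return the best-scoring tag index.
 */
function bestTag(array $context, string $prev_tag, array $word_feature,
    array $tag_feature, array $bias): int
{
    $best = 0;
    $best_score = -INF;
    foreach ($bias as $tag_index => $b) {
        $score = $b; // the "b" in y = wx + b
        // Assumption: $context maps a relative position (-1, 0, 1) to a term.
        foreach ($context as $position => $term) {
            $score += $word_feature[$term][$position][$tag_index] ?? 0.0;
        }
        // Assumption: the tag feature is keyed by the previous tag.
        $score += $tag_feature[$prev_tag][$tag_index] ?? 0.0;
        if ($score > $best_score) {
            $best_score = $score;
            $best = $tag_index;
        }
    }
    return $best;
}

// Toy weights: tag index 0 stands for "o", tag index 1 for "nr".
$bias = [0.0, -0.5];
$word_feature = ["郑振铎" => [0 => [0 => -1.0, 1 => 2.0]]];
$tag_feature = ["o" => [0 => 0.3, 1 => 0.1]];
$context = [-1 => "是", 0 => "郑振铎", 1 => "先生"];
$best = bestTag($context, "o", $word_feature, $tag_feature, $bias);
// Tag 0 scores -0.7 and tag 1 scores 1.6, so $best is 1.
```

Unknown terms and keys contribute a weight of 0.0 in this sketch, so the bias alone decides the tag when nothing in the context has been seen during training.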