$stop_words
$stop_words :
A list of frequently occurring terms for this locale which should be excluded from certain kinds of queries. This list is also used for language detection.
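As an illustration of how such a list might be applied to a query, here is a minimal sketch. The function name sketchRemoveStopWords and the sample word list are hypothetical and are not part of this class; the sketch also assumes the terms have already been lowercased.

```php
<?php
// Hypothetical helper, not part of the documented class.
// Drops every term that appears in the stop word list.
function sketchRemoveStopWords(array $terms, array $stop_words): array
{
    // Flip the list so membership checks are hash lookups.
    $lookup = array_flip($stop_words);
    $kept = [];
    foreach ($terms as $term) {
        if (!isset($lookup[$term])) {
            $kept[] = $term;
        }
    }
    return $kept;
}

// Illustrative sample only; the real $stop_words list is much longer.
$sample_stop_words = ['de', 'a', 'o', 'que', 'e'];
$query_terms = ['historia', 'de', 'portugal'];
echo implode(' ', sketchRemoveStopWords($query_terms, $sample_stop_words)), "\n";
```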
This class has a collection of methods for Portuguese locale-specific tokenization. In particular, it has a stemmer implementing the Snowball stemming algorithm presented at http://snowball.tartarus.org/algorithms/portuguese/stemmer.html
segment(string $pre_segment) : string
Stub function which could be used for a word segmenter.
Such a segmenter, given the input thisisabunchofwords, would output this is a bunch of words
string | $pre_segment | before segmentation |
Should return a string with words separated by spaces; in this case it does nothing.
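For illustration only, the following sketch shows one way a real segmenter could behave, using greedy longest-match against a word list. The function sketchSegment and its $dictionary parameter are hypothetical; the documented segment() method is a stub. The sketch works on bytes, so it is only suitable for ASCII input.

```php
<?php
// Hypothetical greedy longest-match segmenter, not the documented stub.
function sketchSegment(string $pre_segment, array $dictionary): string
{
    $lookup = array_flip($dictionary);
    // Longest dictionary entry bounds how far ahead we need to look.
    $max_len = 0;
    foreach ($dictionary as $entry) {
        $max_len = max($max_len, strlen($entry));
    }
    $words = [];
    $pos = 0;
    $len = strlen($pre_segment);
    while ($pos < $len) {
        // Fallback when nothing longer matches: emit a single byte.
        $match = substr($pre_segment, $pos, 1);
        for ($n = min($max_len, $len - $pos); $n > 1; $n--) {
            $candidate = substr($pre_segment, $pos, $n);
            if (isset($lookup[$candidate])) {
                $match = $candidate;
                break;
            }
        }
        $words[] = $match;
        $pos += strlen($match);
    }
    return implode(' ', $words);
}

$dictionary = ['this', 'is', 'a', 'bunch', 'of', 'words'];
echo sketchSegment('thisisabunchofwords', $dictionary), "\n";
// prints "this is a bunch of words"
```

Greedy matching is the simplest choice and can pick a wrong split when one dictionary word is a prefix of another; it is shown here only to make the intended input and output concrete.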
step2(string $word) : \seekquarry\yioop\locale\pt\resources\processed
Verb suffix removal step. If step 1 does not change anything, then this function is called.
It checks for the longest suffix from the suffix set and removes it if found.
string | $word | the string to remove the suffix from |
string
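The longest-suffix rule can be sketched as below. This is a simplified illustration, not this class's code: the function name sketchRemoveLongestSuffix, its parameters, and the three-item suffix list are hypothetical. It also assumes, following the Snowball description of the Portuguese stemmer, that the suffix must lie inside the RV region.

```php
<?php
// Hypothetical helper, not the documented step2().
// Removes the longest suffix from $suffixes that ends $word and
// fits entirely inside $rv (assumed to be the RV region of $word).
function sketchRemoveLongestSuffix(string $word, string $rv, array $suffixes): string
{
    // Try longer suffixes first so the longest match wins.
    usort($suffixes, function (string $a, string $b): int {
        return strlen($b) <=> strlen($a);
    });
    foreach ($suffixes as $suffix) {
        $n = strlen($suffix);
        if ($n <= strlen($rv) && substr($rv, -$n) === $suffix) {
            return substr($word, 0, strlen($word) - $n);
        }
    }
    // No suffix found: the word is returned unchanged.
    return $word;
}

// Tiny illustrative subset; the real verb suffix set is much larger.
$sample_suffixes = ['ar', 'ando', 'ou'];
// For "cantando" the RV region is "tando" (region after the third letter).
echo sketchRemoveLongestSuffix('cantando', 'tando', $sample_suffixes), "\n";
// prints "cant"
```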
findRV(string $word) : string
This method finds the RV region in $word. If the second letter is a consonant, RV is the region after the next following vowel; if the first two letters are vowels, RV is the region after the next consonant; otherwise (the consonant-vowel case) RV is the region after the third letter.
string | $word |
the RV region of $word
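The three cases of the RV rule can be sketched as follows. This is a minimal illustration rather than this class's implementation: the function name sketchFindRV is hypothetical, the vowel list is the one given in the Snowball description of the Portuguese stemmer, and returning an empty string when no position is found is likewise taken from that description, not from the text above.

```php
<?php
// Hypothetical helper, not the documented findRV().
function sketchFindRV(string $word): string
{
    // Vowels as listed in the Snowball Portuguese stemmer description.
    $vowels = ['a', 'e', 'i', 'o', 'u', 'á', 'é', 'í', 'ó', 'ú', 'â', 'ê', 'ô'];
    // Split into UTF-8 characters so accented letters count as one letter.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
    $len = count($chars);
    $is_vowel = function (int $i) use ($chars, $vowels): bool {
        return in_array($chars[$i], $vowels, true);
    };
    // Assumed default: RV is empty when no position is found.
    $start = $len;
    if ($len >= 2) {
        if (!$is_vowel(1)) {
            // Second letter is a consonant: RV starts after the next vowel.
            for ($i = 2; $i < $len; $i++) {
                if ($is_vowel($i)) {
                    $start = $i + 1;
                    break;
                }
            }
        } elseif ($is_vowel(0)) {
            // First two letters are vowels: RV starts after the next consonant.
            for ($i = 2; $i < $len; $i++) {
                if (!$is_vowel($i)) {
                    $start = $i + 1;
                    break;
                }
            }
        } elseif ($len >= 3) {
            // Consonant-vowel case: RV starts after the third letter.
            $start = 3;
        }
    }
    return implode('', array_slice($chars, $start));
}

echo sketchFindRV('trabajo'), "\n"; // prints "bajo"
echo sketchFindRV('macho'), "\n";   // prints "ho"
```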