\seekquarry\yioop\locale\ar\resources

Tokenizer

Arabic-specific tokenization code. In particular, it has a stemmer. The stemmer is my stab at porting Ljiljana Dolamic's (University of Neuchatel, www.unine.ch/info/clef/) C stemming algorithm (http://members.unine.ch/jacques.savoy/clef). That algorithm maps all stems to ASCII; instead, I tried to leave everything in Arabic characters.

Summary

            Methods                        Properties                      Constants
public      segment()                      $no_stem_list                   No constants found
            stopwordsRemover()             $stop_words
            stem()
protected   No protected methods found     No protected properties found   N/A
private     removeModifiersAndArchaic()    No private properties found     N/A
            removeSuffix()
            removePrefix()

Properties

$no_stem_list

$no_stem_list : array

Words we don't want to be stemmed

Type

array

$stop_words

$stop_words : 

A list of frequently occurring terms for this locale which should be excluded from certain kinds of queries

Type

Methods

segment()

segment(string  $pre_segment) : string

Stub function which could be used for a word segmenter.

Such a segmenter, given the input "thisisabunchofwords", would output "this is a bunch of words".

Parameters

string $pre_segment

the string before segmentation

Returns

string —

should return the string with words separated by spaces; in this case, it does nothing
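Because segment() is documented as a stub, a minimal sketch of it just hands its input back. The second function below, greedySegment(), is a hypothetical illustration of what a dictionary-driven segmenter might look like. Its name, its greedy longest-match strategy, and the word list are assumptions for illustration and are not part of Yioop. It relies on PHP's mbstring extension.

```php
<?php
/**
 * Stub in the spirit of segment(): returns the input unchanged.
 */
function segmentStub(string $pre_segment): string
{
    return $pre_segment;
}

/**
 * Hypothetical greedy longest-match segmenter (not part of Yioop).
 * $dictionary is an assumed list of known words.
 */
function greedySegment(string $text, array $dictionary): string
{
    $words = [];
    $length = mb_strlen($text);
    $pos = 0;
    while ($pos < $length) {
        // Fall back to a single character if no dictionary word matches
        $match = mb_substr($text, $pos, 1);
        // Try the longest remaining candidate first, then shrink
        for ($len = $length - $pos; $len > 1; $len--) {
            $candidate = mb_substr($text, $pos, $len);
            if (in_array($candidate, $dictionary, true)) {
                $match = $candidate;
                break;
            }
        }
        $words[] = $match;
        $pos += mb_strlen($match);
    }
    return implode(" ", $words);
}

$dictionary = ["this", "is", "a", "bunch", "of", "words"];
echo greedySegment("thisisabunchofwords", $dictionary), "\n";
```

Greedy matching is the simplest choice here; it can pick a wrong split when one dictionary word is a prefix of another, which is one reason real segmenters tend to use scoring rather than first match.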

stopwordsRemover()

stopwordsRemover(mixed  $data) : mixed

Removes the stop words from the page (used for Word Cloud generation)

Parameters

mixed $data

either a string or an array of strings to remove stop words from

Returns

mixed —

$data with no stop words
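The string-or-array behavior described above can be sketched as follows. This is not Yioop's implementation: removeStopWords() is a hypothetical stand-alone function, the three-word stop list is illustrative only, and splitting on whitespace is an assumed simplification.

```php
<?php
/**
 * Sketch of stop-word removal for a string or an array of strings.
 * $stop_words is passed in explicitly; the real class keeps its own list.
 */
function removeStopWords($data, array $stop_words)
{
    if (is_array($data)) {
        // Recurse so each string in the array is filtered independently
        return array_map(function ($item) use ($stop_words) {
            return removeStopWords($item, $stop_words);
        }, $data);
    }
    // Split on whitespace, drop stop words, and rejoin with single spaces
    $terms = preg_split('/\s+/u', $data, -1, PREG_SPLIT_NO_EMPTY);
    $kept = array_filter($terms, function ($term) use ($stop_words) {
        return !in_array($term, $stop_words, true);
    });
    return implode(" ", $kept);
}

// Illustrative stop list only: "in", "from", "on"
$stop_words = ["في", "من", "على"];
echo removeStopWords("الكتاب في البيت", $stop_words), "\n";
```

Returning the same shape that was passed in (string for string, array for array) matches the mixed parameter and return types in the signature.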

stem()

stem(string  $word) : string

Computes the stem of an Arabic word

Parameters

string $word

the string to stem

Returns

string —

the stem of $word
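One plausible way the pieces listed on this page fit together is sketched below: skip words on the no-stem list, then pass the word through a sequence of cleanup steps. The order of the steps, the function names stemSketch() and exampleSteps(), and the single prefix and suffix handled here are all assumptions made for illustration; the page itself does not state how stem() combines the private helpers.

```php
<?php
/**
 * Sketch of a stemming pipeline (hypothetical, not Yioop's code).
 * $steps is a list of callables, each mapping a word to a word.
 */
function stemSketch(string $word, array $no_stem_list, array $steps): string
{
    // Words on the no-stem list are returned untouched
    if (in_array($word, $no_stem_list, true)) {
        return $word;
    }
    foreach ($steps as $step) {
        $word = $step($word);
    }
    return $word;
}

/**
 * Toy stand-ins for the three private helpers, in an assumed order.
 */
function exampleSteps(): array
{
    return [
        // Strip short-vowel marks (U+064B..U+0652)
        function ($w) { return preg_replace('/[\x{064B}-\x{0652}]/u', '', $w); },
        // Strip a leading alef + lam (the definite article)
        function ($w) { return preg_replace('/^\x{0627}\x{0644}/u', '', $w); },
        // Strip a trailing alef + teh (a feminine plural ending)
        function ($w) { return preg_replace('/\x{0627}\x{062A}$/u', '', $w); },
    ];
}

$no_stem_list = ["الله"]; // illustrative entry only
echo stemSketch("المعلمات", $no_stem_list, exampleSteps()), "\n";
```

Checking the no-stem list before any normalization keeps protected words byte-for-byte identical to the input.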

removeModifiersAndArchaic()

removeModifiersAndArchaic(string  $word) : string

Removes common letter modifiers as well as some archaic characters

Parameters

string $word

Returns

string —

the $word after letter modifiers are removed
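A normalization step of this kind might look like the sketch below. Which characters count as "modifiers" or "archaic" is not specified on this page, so the choices here (short-vowel marks, tatweel, and alef variants) are assumptions, and normalizeLetters() is a hypothetical name.

```php
<?php
/**
 * Sketch of letter normalization (hypothetical, not Yioop's code).
 */
function normalizeLetters(string $word): string
{
    // Drop tatweel (U+0640) and the marks U+064B..U+0652
    $word = preg_replace('/[\x{0640}\x{064B}-\x{0652}]/u', '', $word);
    // Fold alef with madda or hamza (U+0622, U+0623, U+0625)
    // to bare alef (U+0627)
    $word = preg_replace('/[\x{0622}\x{0623}\x{0625}]/u', "\u{0627}", $word);
    return $word;
}

// kaf + kasra + teh + fatha + alef + beh, written with escapes
// so the combining marks are visible in the source
$vocalized = "\u{0643}\u{0650}\u{062A}\u{064E}\u{0627}\u{0628}";
echo normalizeLetters($vocalized), "\n";
```

Working with code-point escapes and the /u flag keeps the result in Arabic characters, in line with the stated goal of not mapping stems to ASCII.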

removeSuffix()

removeSuffix(string  $word) : string

Removes Arabic suffixes to get the root

Parameters

string $word

word to remove suffixes from

Returns

string —

the $word after suffix removal

removePrefix()

removePrefix(string  $word) : string

Removes Arabic prefixes to get the root

Parameters

string $word

word to remove prefixes from

Returns

string —

the $word after prefix removal
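The two affix-stripping helpers, removeSuffix() and removePrefix(), can be sketched with one shared routine. Everything specific here is an assumption: the stripAffix() name, the short affix lists, the longest-first matching, and the minimum stem length of three letters are illustrative and are not taken from Yioop's tables. The sketch relies on PHP's mbstring extension.

```php
<?php
/**
 * Sketch of affix stripping (hypothetical, not Yioop's code).
 * Strips at most one affix from the start or the end of $word.
 */
function stripAffix(
    string $word,
    array $affixes,
    bool $from_end,
    int $min_stem = 3
): string {
    // Try longer affixes first so a short one does not shadow a long one
    usort($affixes, function ($a, $b) {
        return mb_strlen($b) - mb_strlen($a);
    });
    $word_len = mb_strlen($word);
    foreach ($affixes as $affix) {
        $affix_len = mb_strlen($affix);
        if ($word_len - $affix_len < $min_stem) {
            continue; // stripping would leave too short a stem
        }
        $edge = $from_end
            ? mb_substr($word, -$affix_len)
            : mb_substr($word, 0, $affix_len);
        if ($edge === $affix) {
            return $from_end
                ? mb_substr($word, 0, $word_len - $affix_len)
                : mb_substr($word, $affix_len);
        }
    }
    return $word;
}

$prefixes = ["ال", "وال", "بال"]; // illustrative only
$suffixes = ["ات", "ون", "ها"];   // illustrative only
$word = stripAffix("والمعلمات", $prefixes, false);
$word = stripAffix($word, $suffixes, true);
echo $word, "\n";
```

The minimum-length guard is a common safeguard in light stemmers: without it, short words that merely begin or end with an affix-like sequence would be cut down to one or two letters.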