$stop_words
A list of frequently occurring terms for this locale which should be excluded from certain kinds of queries. This is also used for language detection.
This class has a collection of methods for Dutch locale-specific tokenization. In particular, it has a stemmer.
segment(string $pre_segment) : string
Stub function which could be used for a word segmenter.
Such a segmenter, on input thisisabunchofwords, would output this is a bunch of words.
string | $pre_segment | before segmentation |
should return a string with words separated by spaces; in this case it does nothing
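Since segment() is only a stub, here is a minimal sketch of what a dictionary-based segmenter could look like. The function name sketchSegment and its $dictionary parameter are hypothetical and not part of this class; the greedy longest-match strategy is an assumption chosen for brevity, and the code works on bytes, so it is only suitable for ASCII input.

```php
<?php
// Hypothetical helper, not part of the documented class.
// Greedy longest-match segmentation against a caller-supplied word list.
function sketchSegment(string $pre_segment, array $dictionary): string
{
    $words = [];
    $length = strlen($pre_segment);
    $pos = 0;
    while ($pos < $length) {
        // Fall back to a single character when no dictionary word matches.
        $best = $pre_segment[$pos];
        foreach ($dictionary as $entry) {
            if (strlen($entry) > strlen($best) &&
                substr($pre_segment, $pos, strlen($entry)) === $entry) {
                $best = $entry;
            }
        }
        $words[] = $best;
        $pos += strlen($best);
    }
    return implode(" ", $words);
}

$dictionary = ["this", "is", "a", "bunch", "of", "words"];
echo sketchSegment("thisisabunchofwords", $dictionary), "\n";
// prints "this is a bunch of words"
```

Greedy matching can pick the wrong split when one dictionary word is a prefix of another, so a real segmenter would likely need backtracking or a statistical model.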
step3b(string $word, integer $R2) : string
Search for the longest among the following suffixes, and perform the action indicated.
If in R2 and ends with eigend, eiging, igend or iging, remove it. If in R2 and ends with ig preceded by an e, remove it. If in R2 and ends with lijk, baar or bar, then remove it.
string | $word | the string to stem |
integer | $R2 | the R index |
the string with the various endings removed if they exist
substituteIAndY(string $word) : string
Put initial y, y after a vowel, and i between vowels into upper case.
string | $word | the string in which to put an initial y, a y after a vowel, and an i between vowels into upper case |
the string with an initial y, a y after a vowel, and an i between vowels converted to upper case
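The substitution rule can be sketched as follows. The function name sketchSubstituteIAndY is hypothetical, and the vowel set "aeiouy" is an assumption (accented vowels are left out to keep the example byte-safe). Vowel checks are made against the original word, not the partially rewritten one.

```php
<?php
// Hypothetical helper illustrating the rule; not the class's own code.
function sketchSubstituteIAndY(string $word): string
{
    $vowels = "aeiouy"; // assumed vowel set, ASCII only
    $result = $word;
    $length = strlen($word);
    for ($i = 0; $i < $length; $i++) {
        $after_vowel = $i > 0 &&
            strpos($vowels, $word[$i - 1]) !== false;
        $before_vowel = $i + 1 < $length &&
            strpos($vowels, $word[$i + 1]) !== false;
        if ($word[$i] === "y" && ($i === 0 || $after_vowel)) {
            // initial y, or y after a vowel
            $result[$i] = "Y";
        } elseif ($word[$i] === "i" && $after_vowel && $before_vowel) {
            // i between two vowels
            $result[$i] = "I";
        }
    }
    return $result;
}

echo sketchSubstituteIAndY("yoyo"), "\n";   // prints "YoYo"
echo sketchSubstituteIAndY("naaien"), "\n"; // prints "naaIen"
```

Marking these letters in upper case lets later steps treat them as consonants without a separate lookup table.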
getRIndex(string $word, integer $start) : integer
Get the R index. The R index is the first consonant that follows a vowel after the $start index.
string | $word | the string to search for the R index |
integer | $start | the index to start searching for the R index in the string |
the R index if found, otherwise -1
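A minimal sketch of this search, reading the description literally: it returns the position of the first non-vowel whose preceding character is a vowel, looking only at characters from $start onward. The name sketchGetRIndex and the ASCII vowel set "aeiouy" are assumptions; the actual method may define the returned position differently (for example, one past the consonant).

```php
<?php
// Hypothetical helper; follows the prose description, not the real source.
function sketchGetRIndex(string $word, int $start): int
{
    $vowels = "aeiouy"; // assumed vowel set, ASCII only
    $length = strlen($word);
    for ($i = max($start, 0) + 1; $i < $length; $i++) {
        $is_vowel = strpos($vowels, $word[$i]) !== false;
        $prev_is_vowel = strpos($vowels, $word[$i - 1]) !== false;
        if (!$is_vowel && $prev_is_vowel) {
            return $i; // first consonant that follows a vowel
        }
    }
    return -1; // no such position
}

echo sketchGetRIndex("lichamelijk", 0), "\n"; // prints 2
echo sketchGetRIndex("lichamelijk", 2), "\n"; // prints 5
echo sketchGetRIndex("aa", 0), "\n";          // prints -1
```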
step1(string $word, integer $R1) : string
Remove a valid en-ending, where an en-ending is valid if it is preceded by a non-vowel and not by gem.
string | $word | the string to stem |
integer | $R1 | the int that represents the R index |
the string with any valid en-ending removed
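The en-ending check might be sketched like this. The name sketchStep1 is hypothetical, and two assumptions are made: "in R1" is taken to mean that the suffix starts at or after index $R1, and a negative $R1 means no R1 region exists. Only the en suffix is handled here.

```php
<?php
// Hypothetical helper; a sketch of the en-ending rule under the
// assumptions stated above, not the class's implementation.
function sketchStep1(string $word, int $R1): string
{
    $stem_length = strlen($word) - 2;
    if ($R1 < 0 || $stem_length < 1 || substr($word, -2) !== "en") {
        return $word; // no R1 region, or no en suffix to consider
    }
    if ($stem_length < $R1) {
        return $word; // the suffix does not lie inside R1
    }
    $stem = substr($word, 0, $stem_length);
    $ends_in_vowel = strpos("aeiouy", $stem[$stem_length - 1]) !== false;
    $ends_in_gem = substr($stem, -3) === "gem";
    // The ending is valid only after a non-vowel that is not part of gem.
    return ($ends_in_vowel || $ends_in_gem) ? $word : $stem;
}

echo sketchStep1("fietsen", 3), "\n"; // prints "fiets"
echo sketchStep1("koeien", 3), "\n";  // prints "koeien" (en follows a vowel)
```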
step2(string $word) : string
Delete the suffix e if in R1 and preceded by a non-vowel, and then undouble the ending.
string | $word | the string from which to delete the suffix e if in R1 and preceded by a non-vowel, and whose ending is then undoubled |
the string with the suffix e deleted (if in R1 and preceded by a non-vowel) and the ending then undoubled
step3a(string $word, integer $R2) : string
Delete the letters heid if in R2 and not preceded by a c, and treat a preceding en as in step 1.
string | $word | the string from which to delete the letters heid if in R2 and not preceded by a c, treating a preceding en as in step 1 |
integer | $R2 | the R index |
the string with the letters heid deleted if in R2 and not preceded by a c, and with a preceding en treated as in step 1
step4(string $word) : string
If the word ends with CVD, where C is a non-vowel, D is a non-vowel other than I, and V is a doubled a, e, o or u, remove one of the vowels from V (for example, moom -> mom, weed -> wed).
string | $word | the string to check for the CVD combination |
the string with one vowel removed from the CVD combination if it is present, otherwise the original string
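The CVD rule maps naturally onto a single anchored regular expression. The sketch below uses the hypothetical name sketchStep4 and assumes the ASCII vowel set "aeiouy"; it is an illustration of the rule as described, not the method's actual source.

```php
<?php
// Hypothetical helper illustrating the CVD rule.
function sketchStep4(string $word): string
{
    // C: non-vowel, V: doubled a/e/o/u, D: non-vowel other than I
    $pattern = '/([^aeiouy])(aa|ee|oo|uu)([^aeiouyI])$/';
    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Keep only the first letter of the doubled vowel.
            return $m[1] . $m[2][0] . $m[3];
        },
        $word
    );
    return $result ?? $word; // fall back to the input on a regex error
}

echo sketchStep4("weed"), "\n"; // prints "wed"
echo sketchStep4("maan"), "\n"; // prints "man"
echo sketchStep4("zee"), "\n";  // prints "zee" (no final consonant D)
```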
replace(string $word, string $regex, string $replace, integer $offset) : string
Replace part of a string based on a regular expression.
string | $word | the string to search for regex replacement |
string | $regex | the regex used to find the text to replace |
string | $replace | the replacement string to use if the pattern is matched |
integer | $offset | the index at which to start looking for a match |
the string with the characters replaced if the regex matches, otherwise the original string
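One way such an offset-aware replacement could work is to leave everything before $offset untouched and run the regex only on the remainder. This is a sketch under that assumption; the name sketchReplace is hypothetical, and the real method may instead match against the whole word and check the match position.

```php
<?php
// Hypothetical helper; assumes matching is restricted to the part of
// the word that starts at $offset.
function sketchReplace(
    string $word,
    string $regex,
    string $replace,
    int $offset
): string {
    $head = substr($word, 0, $offset); // left unchanged
    $tail = substr($word, $offset);    // searched for the pattern
    $new_tail = preg_replace($regex, $replace, $tail);
    // preg_replace returns null on error; keep the original in that case.
    return $new_tail === null ? $word : $head . $new_tail;
}

echo sketchReplace("lichamelijk", '/lijk$/', '', 5), "\n"; // prints "lichame"
echo sketchReplace("lichamelijk", '/lijk$/', '', 9), "\n"; // prints "lichamelijk"
```

In the second call the suffix starts before the offset, so nothing matches and the word is returned as is.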
endsWith(string $haystack, string $needle, boolean $case = true) : boolean
Checks to see if a string ends with a certain string.
string | $haystack | the string to check |
string | $needle | the string to match at the end |
boolean | $case | whether the check should be case insensitive or not |
true if it ends with $needle, otherwise false
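A suffix check of this shape can be written with PHP's built-in substr_compare. The sketch below uses the hypothetical name sketchEndsWith and assumes that $case = true means a case-sensitive comparison; the parameter description above is ambiguous on this point, so treat that mapping as an assumption.

```php
<?php
// Hypothetical helper; assumes $case === true means case-sensitive.
function sketchEndsWith(
    string $haystack,
    string $needle,
    bool $case = true
): bool {
    $length = strlen($needle);
    if ($length === 0) {
        return true; // every string ends with the empty string
    }
    if ($length > strlen($haystack)) {
        return false; // the needle cannot fit
    }
    // The last argument of substr_compare is "case insensitive".
    return substr_compare($haystack, $needle, -$length, $length, !$case) === 0;
}

var_dump(sketchEndsWith("vriendelijk", "lijk"));        // bool(true)
var_dump(sketchEndsWith("vriendelijk", "LIJK"));        // bool(false)
var_dump(sketchEndsWith("vriendelijk", "LIJK", false)); // bool(true)
```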