src/locale/hi/resources/Tokenizer.php

SeekQuarry/Yioop -- Open Source Pure PHP Search Engine, Crawler, and Indexer

Copyright (C) 2009 - 2020 Chris Pollett chris@pollett.org

LICENSE:

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Classes

Tokenizer

Hindi-specific tokenization code. In particular, it has a stemmer. The stemmer is my stab at porting Ljiljana Dolamic's (University of Neuchatel, www.unine.ch/info/clef/) Java stemming algorithm: http://members.unine.ch/jacques.savoy/clef/HindiStemmerLight.java.txt. Here, given a word, its stem is the part of the word that is common to all of its inflected variants. For example, tall is common to tall, taller, and tallest. A stemmer takes a word and tries to produce its stem.
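To make the stemming idea concrete, here is a minimal sketch of light suffix stripping: remove the longest matching inflectional suffix, but only if a minimum-length stem remains. It is written in Python for brevity. The suffix list, the MIN_STEM_LEN value, and the function name stem are hypothetical, chosen for illustration only. They are not taken from HindiStemmerLight.java or from Yioop's PHP Tokenizer class, whose actual rules may differ.

```python
# Hypothetical, illustrative suffix list; NOT the list used by
# HindiStemmerLight.java or by Yioop's Tokenizer class.
SUFFIXES = sorted(
    ["ियों", "ाओं", "ों", "ें", "ा", "े", "ी", "ि", "ो"],
    key=len,
    reverse=True,  # try longer suffixes first (longest match wins)
)

# Hypothetical guard: never leave a stem shorter than this many code points.
MIN_STEM_LEN = 2


def stem(word: str) -> str:
    """Strip the longest listed suffix, if a long enough stem remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    # No suffix matched (or stripping would leave too little): keep the word.
    return word


if __name__ == "__main__":
    for w in ["कमरा", "कमरे", "कमरों", "बहन", "बहनों", "की"]:
        print(w, "->", stem(w))
```

With this toy list, the inflected forms कमरा, कमरे, and कमरों of the word for "room" all reduce to the common part कमर, just as tall is common to tall, taller, and tallest. The length guard keeps a short word such as की from being cut down to a single consonant.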