Class used to reorder the last 10 links computed by PhraseModel based on
thesaurus semantic information. For English, thesaurus semantic information
can be provided by WordNet, an English lexical database
available at http://wordnet.princeton.edu/.
To enable this, you have to define WORDNET_EXEC in your local_config file.
The idea behind thesaurus reordering is as follows. A given query
is tagged for parts of speech. Each term is then looked up in the thesaurus for
those parts of speech. Representative phrases for those term senses are
extracted from the ranked thesaurus output, and a set of rewrites of the
original query is created. By looking up the number
of times these rewrites occur in the searched index, the top two phrases
that represent the original query are computed. The BM25 similarity of these
phrases is then scored against each of the 10 output summaries of
PhraseModel and used to reorder the results.
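The last step, reordering results by their scores, can be illustrated with a short Python sketch. It is not Yioop's PHP code: the function name reorder_by_scores and the assumption that each summary ends up with a single combined score are hypothetical, since the text does not say how the scores for the two phrases are combined.

```python
def reorder_by_scores(results, scores):
    """Return results sorted by descending score.

    Assumes scores[i] is the (already combined) BM25 score of results[i].
    Python's sort is stable, so results with equal scores keep their
    original relative order.
    """
    order = sorted(range(len(results)), key=lambda i: -scores[i])
    return [results[i] for i in order]
```

For example, with three results scored 0.1, 0.9, and 0.1, the second result moves to the front and the other two keep their original order.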
To add thesaurus reordering for a different locale, two methods need to be
written in that locale's tokenizer.php file:
tagPartsOfSpeechPhrase($phrase), which on an input phrase returns a string
where each term_i in the phrase has been replaced with term_i~pos,
where pos is a two-character part of speech: NN, VB, AJ, AV, or NA (if
none of the previous apply).
scoredThesaurusMatches($term, $word_type, $whole_query), which takes
a term from the original $whole_query that has been tagged as
one of the types VB (for verb), NN (for noun), AJ (for adjective),
AV (for adverb), or NA (for anything else). It outputs
a sequence of (score => array of thesaurus terms) associations,
each score representing one word sense of the term.
Once these methods have been implemented, the thesaurus will be used if the
use_thesaurus field of that language's tokenizer is set to true.
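To make the term_i~pos output format concrete, here is a minimal Python sketch of a tagger. It is only an illustration: TOY_LEXICON is a hypothetical hard-coded lookup table (a real tokenizer would use a proper part-of-speech tagger), and joining the tagged terms with single spaces is an assumption.

```python
# Hypothetical toy lexicon mapping lower-cased terms to two-character tags.
TOY_LEXICON = {"dog": "NN", "runs": "VB", "fast": "AV", "brown": "AJ"}


def tag_parts_of_speech_phrase(phrase):
    """Replace each term in phrase with term~pos, using NA when unknown."""
    tagged = []
    for term in phrase.split():
        pos = TOY_LEXICON.get(term.lower(), "NA")
        tagged.append(f"{term}~{pos}")
    return " ".join(tagged)
```

With this toy lexicon, the phrase "brown dog runs quickly" becomes "brown~AJ dog~NN runs~VB quickly~NA".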
Extracts phrases similar to the input query using thesaurus results.
Part-of-speech tagging is performed on the input and the output is
looked up in the thesaurus. Using this, a ranked list of alternate
query phrases is created.
For those phrases, counts in the Yioop index are calculated
and the top two phrases are selected.
Parameters
string
$orig_query
input query from user
string
$index_name
selected index for search engine
string
$lang
locale tag for the query
integer
$threshold
once the count in the posting list for any word
reaches this threshold, return that number
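The final selection step described above could look like the following Python sketch. The names are hypothetical: count_fn stands in for whatever looks up a phrase's document count in the index, and breaking ties by original order is an assumption.

```python
def top_two_phrases(phrases, count_fn):
    """Return the two phrases with the highest counts, best first.

    count_fn(phrase) is assumed to return how many documents in the
    index contain the phrase.
    """
    counted = [(count_fn(p), p) for p in phrases]
    # Stable sort on the count only, so ties keep their input order.
    counted.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in counted[:2]]
```

For instance, passing a dictionary's get method as count_fn selects the two phrases with the largest stored counts.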
Gets array of BM25 scores for given input array of summaries
and thesaurus generated queries
Parameters
array
$similar_phrases
an array of thesaurus generated queries
array
$summaries
an array of summaries generated
at crawl time
Returns
array
—
of BM25 scores for each document based on the thesaurus
similar phrases
Computes suggested related phrases from the thesaurus based on the
part-of-speech tagging done on each query term.
Parameters
string
$query
query entered by user
string
$lang
locale tag for the query
Returns
string
—
array $suggestion consisting of phrases suggested to
be similar in meaning to some sense of the query
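One simple way to build such suggested phrases is to substitute one query term at a time with a thesaurus alternative. The Python sketch below illustrates that idea only; the single-term substitution strategy, the function name query_rewrites, and the alternatives dictionary are assumptions, not a description of the actual method.

```python
def query_rewrites(query_terms, alternatives):
    """Build rewrites of a query by swapping in one alternative at a time.

    query_terms is a list of terms; alternatives maps a term to a list of
    thesaurus terms judged similar to it (hypothetical input shape).
    """
    rewrites = []
    for i, term in enumerate(query_terms):
        for alt in alternatives.get(term, []):
            rewritten = query_terms[:i] + [alt] + query_terms[i + 1:]
            rewrites.append(" ".join(rewritten))
    return rewrites
```

For the query terms "fast" and "car", with alternatives "quick" and "rapid" for the first and "automobile" for the second, this yields "quick car", "rapid car", and "fast automobile".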
Returns the number of documents in an index that a phrase occurs in.
If it occurs in more than threshold documents then cut off search.
Parameters
string
$phrase
to look up in index
integer
$threshold
once the count in the posting list for any word
reaches this threshold, return that number
string
$index_name
selected index for search engine
string
$lang
locale tag for the query
Returns
integer
—
number of documents phrase occurs in
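The threshold cutoff can be sketched in Python as an early exit from the counting loop. This is an illustration under simplifying assumptions: documents is a plain in-memory list of strings rather than an index with posting lists, and matching is a case-insensitive substring test.

```python
def count_docs_with_cutoff(phrase, documents, threshold):
    """Count documents containing phrase, stopping once threshold is reached."""
    needle = phrase.lower()
    count = 0
    for doc in documents:
        if needle in doc.lower():
            count += 1
            if count >= threshold:
                break  # enough matches found; skip the remaining documents
    return count
```

The returned value is therefore never larger than the threshold, which bounds the work done for very common phrases.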
changeCaseOfStringArray()
changeCaseOfStringArray(array $summaries) : array
Lower cases an array of strings
Parameters
array
$summaries
strings to put into lower case
Computes the BM25 of an array of documents given that the idf and
tf scores for these documents have already been computed
Parameters
array
$idf
inverse doc frequency for given query array
array
$tf
term frequency for given query array
$num_terms
number of terms that make up input query
$num_summaries
count for input summaries
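In the standard BM25 formulation, a document's score is the sum over query terms of the term's IDF times its normalized term frequency in that document. The Python sketch below assumes that combination and assumes tf is indexed as tf[term][summary]; neither detail is confirmed by the description above.

```python
def bm25_scores(idf, tf, num_terms, num_summaries):
    """Combine precomputed idf and tf values into one score per summary.

    idf[i] is the inverse document frequency of term i and tf[i][j] is
    the normalized term frequency of term i in summary j (assumed layout).
    """
    scores = [0.0] * num_summaries
    for i in range(num_terms):
        for j in range(num_summaries):
            scores[j] += idf[i] * tf[i][j]
    return scores
```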
Calculates the BM25 normalized term frequency of a set of terms in
a collection of text summaries
Parameters
array
$summaries
list of summary strings to compute the BM25 term frequency with respect to
array
$terms
terms we want the term frequency computation for
Returns
array
—
$tfbm25 a 2d array with rows being indexed by terms and
columns indexed by summaries and the values of an entry being
the tfbm25 score for that term in that document
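For reference, the usual BM25 normalization of a raw term count f in a document of length len is f * (k1 + 1) / (f + k1 * (1 - b + b * len / avg_len)). The Python sketch below applies that textbook formula. The constants k1 = 1.2 and b = 0.75 are common defaults chosen here as assumptions, and whitespace splitting stands in for real tokenization.

```python
def bm25_term_frequencies(summaries, terms, k1=1.2, b=0.75):
    """Return rows indexed by term, columns by summary, of BM25 tf values."""
    docs = [s.lower().split() for s in summaries]
    lengths = [len(d) for d in docs]
    avg_len = sum(lengths) / len(lengths) if lengths else 0.0
    result = []
    for term in terms:
        row = []
        for doc, length in zip(docs, lengths):
            f = doc.count(term.lower())  # raw occurrences of the term
            if f == 0:
                row.append(0.0)
                continue
            # Length normalization: longer-than-average documents are damped.
            norm = k1 * (1 - b + b * (length / avg_len))
            row.append(f * (k1 + 1) / (f + norm))
        result.append(row)
    return result
```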
Computes a 2D array of the number of occurrences of term i in document j
Parameters
array
$summaries
documents to compute frequencies in
array
$terms
terms to compute frequencies for
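A minimal Python sketch of such a count matrix follows. It assumes whitespace tokenization and case-insensitive matching, which may differ from how the actual method splits and compares terms.

```python
def term_frequencies(summaries, terms):
    """Return counts[i][j]: occurrences of terms[i] in summaries[j]."""
    docs = [s.lower().split() for s in summaries]
    return [[doc.count(t.lower()) for doc in docs] for t in terms]
```

For the summaries "the cat sat" and "the the dog" with terms "the" and "cat", the rows are [1, 2] and [1, 0].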
Gets the inverse document frequencies for a collection of terms in
a set of documents.
IDF(term_i) = log_10(# of documents / # of documents containing term_i)
Parameters
array
$summaries
documents to use in calculating IDF score
array
$terms
terms to compute IDF score for
Returns
array
—
$idf 1D array giving the inverse document frequency for
each term
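The IDF formula above can be worked through with a short Python sketch. Two details are assumptions: whitespace tokenization with case-insensitive matching, and returning 0.0 for a term that occurs in no document (the formula itself is undefined in that case).

```python
import math


def inverse_document_frequencies(summaries, terms):
    """Return log10(N / df) for each term, where df counts containing docs."""
    docs = [set(s.lower().split()) for s in summaries]
    n = len(docs)
    idf = []
    for term in terms:
        df = sum(1 for d in docs if term.lower() in d)
        # Assumed convention: a term found nowhere gets an IDF of 0.0.
        idf.append(math.log10(n / df) if df else 0.0)
    return idf
```

With 10 documents, a term appearing in exactly one of them gets log_10(10 / 1) = 1, while a term appearing in all 10 gets log_10(1) = 0.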