$stop_words
A list of frequently occurring terms for this locale which should be excluded from certain kinds of queries. This is also used for language detection.
Chinese-specific tokenization code. Typically, tokenizer.php either contains a stemmer for the language in question or it specifies how many characters are in a char gram.
segment(string $pre_segment, string $method = "STS") : string
A word segmenter.
Such a segmenter, on input "thisisabunchofwords", would output "this is a bunch of words".
string | $pre_segment | before segmentation |
string | $method | indicates which method to use |
string with words separated by spaces
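The segmentation idea can be illustrated with a minimal sketch. This is not Yioop's implementation and does not show what the "STS" method does: it is a greedy forward longest-match against a tiny dictionary. The function name segmentSketch, the $dictionary argument, and the $max_len limit are all hypothetical.

```php
<?php
// Hypothetical helper, not part of Yioop: greedy forward longest-match
// segmentation against a small illustrative dictionary.
function segmentSketch(string $pre_segment, array $dictionary,
    int $max_len = 5): string
{
    // Split into UTF-8 characters so multi-byte text is handled as well
    $chars = preg_split('//u', $pre_segment, -1, PREG_SPLIT_NO_EMPTY);
    $words = [];
    $i = 0;
    $n = count($chars);
    while ($i < $n) {
        $match = $chars[$i]; // fall back to a single character
        $step = 1;
        // Try the longest candidate first, then shorter ones
        for ($len = min($max_len, $n - $i); $len > 1; $len--) {
            $candidate = implode("", array_slice($chars, $i, $len));
            if (isset($dictionary[$candidate])) {
                $match = $candidate;
                $step = $len;
                break;
            }
        }
        $words[] = $match;
        $i += $step;
    }
    return implode(" ", $words);
}

// Dictionary keys are the known words (assumed data for the example)
$dictionary = array_flip(["this", "is", "a", "bunch", "of", "words"]);
echo segmentSketch("thisisabunchofwords", $dictionary), "\n";
// prints: this is a bunch of words
```

Greedy matching is simple but can pick the wrong split when words overlap, which is one reason a segmenter may offer more than one method.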
extractTripletsPhrases(array $word_and_phrase_list) : array
Scans a word list for phrases. For each phrase found, generates a list of question and answer pairs at two levels of granularity: CONCISE (using all terms in the original phrase) and RAW (removing adjectives, etc.).
array | $word_and_phrase_list | of statements |
with two fields: QUESTION_LIST consisting of triplets (SUBJECT, PREDICATES, OBJECT) where one of the components has been replaced with a question marker.
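As a rough illustration of replacing one component of a triplet with a question marker, the sketch below builds one question per component and maps it to the replaced value as its answer. The function name questionPairsSketch, the "?" marker, and the flat string format of the questions are assumptions made for this example, not the structure Yioop actually returns.

```php
<?php
// Hypothetical helper: for each component of a (subject, predicate,
// object) triplet, replace it with a marker and record the answer.
function questionPairsSketch(string $subject, string $predicate,
    string $object, string $marker = "?"): array
{
    $triplet = [
        "subject" => $subject,
        "predicate" => $predicate,
        "object" => $object,
    ];
    $pairs = [];
    foreach ($triplet as $part => $answer) {
        $question = $triplet;
        $question[$part] = $marker; // hide exactly one component
        $pairs[implode(" ", $question)] = $answer;
    }
    return $pairs;
}

print_r(questionPairsSketch("dog", "chased", "cat"));
```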
tagTokenizePartOfSpeech(string $text) : array
Splits the input text into terms and outputs an array with one element per term, each element being an array with the term's token and its part-of-speech tag.
string | $text | string to tag and tokenize |
array of pairs of the form ("token" => token_for_term, "tag" => part_of_speech_tag_for_term), one for each token in $text
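The shape of the tagged output can be sketched as follows. The tagger here is a toy lexicon lookup on whitespace-separated terms, with "NN" as an assumed default tag. The function name tagTokensSketch and the $lexicon argument are hypothetical and say nothing about how the real tagger decides on tags.

```php
<?php
// Hypothetical lexicon-based tagger used only to show the output shape.
// The lexicon contents and the "NN" default are assumptions.
function tagTokensSketch(string $text, array $lexicon): array
{
    $tagged = [];
    $terms = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        $tagged[] = [
            "token" => $term,
            "tag" => $lexicon[$term] ?? "NN",
        ];
    }
    return $tagged;
}

$lexicon = ["the" => "DT", "small" => "JJ", "dog" => "NN", "barks" => "VB"];
print_r(tagTokensSketch("the small dog barks", $lexicon));
```

Each element of the result is one ("token", "tag") pair, which is the form the parse methods in this class take as $tagged_phrase.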
parseTypeList(array &$cur_node, array $tagged_phrase, string $type) : string
Starting at the $cur_node in a $tagged_phrase parse tree for an English sentence, create a phrase string for each of the next nodes which belong to part of speech group $type.
array | $cur_node | node within parse tree (passed by reference) |
array | $tagged_phrase | parse tree for phrase |
string | $type | self::$noun_type, self::$verb_type, etc |
phrase string involving only terms of that $type
parseAdjective(array $tagged_phrase, array $tree) : array
Takes a part-of-speech tagged phrase and pre-tree with a parse-from position and builds a parse tree for an adjective if possible
array | $tagged_phrase | an array of pairs of the form ("token" => token_for_term, "tag"=> part_of_speech_tag_for_term) |
array | $tree | that consists of ["cur_node" => current parse position in $tagged_phrase] |
array with fields "cur_node", the index of how far we parsed $tagged_phrase, and "JJ", a subarray with a token node for the adjective that was parsed
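The single-tag parse methods in this class share one pattern: look at the token at "cur_node", and if its tag matches, record it and advance. A minimal sketch of that pattern, assuming exact tag matches and a simple ["token" => ...] node, is shown below. The function name parseTagSketch and the node layout are hypothetical.

```php
<?php
// Hypothetical helper showing the "consume one tagged token" pattern.
// Real tag matching and node layout may differ from this sketch.
function parseTagSketch(array $tagged_phrase, array $tree,
    string $tag): array
{
    $cur = $tree["cur_node"];
    if (isset($tagged_phrase[$cur])
        && $tagged_phrase[$cur]["tag"] === $tag) {
        // Matched: store the token under the tag and move forward
        return [
            "cur_node" => $cur + 1,
            $tag => ["token" => $tagged_phrase[$cur]["token"]],
        ];
    }
    // No match: position is unchanged and no subtree is added
    return ["cur_node" => $cur];
}

$phrase = [
    ["token" => "small", "tag" => "JJ"],
    ["token" => "dog", "tag" => "NN"],
];
print_r(parseTagSketch($phrase, ["cur_node" => 0], "JJ"));
```

Returning the unchanged position on failure lets a caller try the next grammar rule at the same token.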
parseDeterminer(array $tagged_phrase, array $tree) : array
Takes a part-of-speech tagged phrase and pre-tree with a parse-from position and builds a parse tree for a determiner if possible
array | $tagged_phrase | an array of pairs of the form ("token" => token_for_term, "tag"=> part_of_speech_tag_for_term) |
array | $tree | that consists of ["cur_node" => current parse position in $tagged_phrase] |
array with fields "cur_node", the index of how far we parsed $tagged_phrase, and "DT", a subarray with a token node for the determiner that was parsed
parseNoun(array $tagged_phrase, array $tree) : array
Takes a part-of-speech tagged phrase and pre-tree with a parse-from position and builds a parse tree for a noun if possible
array | $tagged_phrase | an array of pairs of the form ("token" => token_for_term, "tag"=> part_of_speech_tag_for_term) |
array | $tree | that consists of ["cur_node" => current parse position in $tagged_phrase] |
array with fields "cur_node", the index of how far we parsed $tagged_phrase, and "NN", a subarray with a token node for the noun string that was parsed
parseVerb(array $tagged_phrase, array $tree) : array
Takes a part-of-speech tagged phrase and pre-tree with a parse-from position and builds a parse tree for a verb if possible
array | $tagged_phrase | an array of pairs of the form ("token" => token_for_term, "tag"=> part_of_speech_tag_for_term) |
array | $tree | that consists of ["cur_node" => current parse position in $tagged_phrase] |
array with fields "cur_node", the index of how far we parsed $tagged_phrase, and "VB", a subarray with a token node for the verb string that was parsed
parsePrepositionalPhrases(array $tagged_phrase, array $tree, integer $index = 1) : array
Takes a part-of-speech tagged phrase and pre-tree with a parse-from position and builds a parse tree for a sequence of prepositional phrases if possible
array | $tagged_phrase | an array of pairs of the form ("token" => token_for_term, "tag"=> part_of_speech_tag_for_term) |
array | $tree | that consists of ["cur_node" => current parse position in $tagged_phrase] |
integer | $index | which term in $tagged_phrase to start to try to parse a preposition from |
array with field "cur_node", the index of how far we parsed $tagged_phrase, followed by additional possible fields (here i represents the ith clause found): "IN_i" with value a preposition subtree, "DT_i" with value a determiner subtree, "JJ_i" with value an adjective subtree, and "NN_i" with value an additional noun subtree
parseNounPhrase(array $tagged_phrase, array $tree) : array
Takes a part-of-speech tagged phrase and pre-tree with a parse-from position and builds a parse tree for a noun phrase if possible
array | $tagged_phrase | an array of pairs of the form ("token" => token_for_term, "tag"=> part_of_speech_tag_for_term) |
array | $tree | that consists of ["cur_node" => current parse position in $tagged_phrase] |
array with fields "cur_node", the index of how far we parsed $tagged_phrase, and "NP", a subarray with possible fields "DT" with value a determiner subtree, "JJ" with value an adjective subtree, and "NN" with value a noun tree
parseVerbPhrase(array $tagged_phrase, array $tree) : array
Takes a part-of-speech tagged phrase and pre-tree with a parse-from position and builds a parse tree for a verb phrase if possible
array | $tagged_phrase | an array of pairs of the form ("token" => token_for_term, "tag"=> part_of_speech_tag_for_term) |
array | $tree | that consists of ["cur_node" => current parse position in $tagged_phrase] |
array with fields "cur_node", the index of how far we parsed $tagged_phrase, and "VP", a subarray with possible fields "VB" with value a verb subtree and "NP" with value a noun phrase subtree
parseWholePhrase(array $tagged_phrase, $tree, $tree_np_pre = array()) : array
Given a part-of-speech tagged phrase array, generates a parse tree for the phrase using a recursive descent parser.
array | $tagged_phrase | an array of pairs of the form ("token" => token_for_term, "tag"=> part_of_speech_tag_for_term) |
$tree | that consists of ["cur_node" => current parse position in $tagged_phrase] |
$tree_np_pre | subject found from previous sub-sentence |
array used to represent a tree. The array has up to three fields: $tree["cur_node"], the index of how far we parsed $tagged_phrase; $tree["NP"], a subtree for a noun phrase; and $tree["VP"], a subtree for a verb phrase
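A whole-phrase parse in the recursive descent style, with one function per grammar rule, might look like the sketch below. It assumes a toy grammar (sentence = noun phrase + verb phrase, noun phrase = optional DT, JJ, NN in that order) and exact tag matches. All function names are hypothetical, and the sketch ignores prepositional phrases and the $tree_np_pre argument.

```php
<?php
// Hypothetical sketch of a descent parser over a tagged phrase.
// Returns a token node if the token at $cur has tag $tag, else null.
function takeTagSketch(array $phrase, int $cur, string $tag): ?array
{
    if (isset($phrase[$cur]) && $phrase[$cur]["tag"] === $tag) {
        return ["token" => $phrase[$cur]["token"]];
    }
    return null;
}

// Noun phrase: optional determiner, adjective, noun, in that order
function nounPhraseSketch(array $phrase, int $cur): array
{
    $np = [];
    foreach (["DT", "JJ", "NN"] as $tag) {
        $node = takeTagSketch($phrase, $cur, $tag);
        if ($node !== null) {
            $np[$tag] = $node;
            $cur++;
        }
    }
    return ["cur_node" => $cur, "NP" => $np];
}

// Verb phrase: a verb followed by an optional object noun phrase
function verbPhraseSketch(array $phrase, int $cur): array
{
    $vp = [];
    $verb = takeTagSketch($phrase, $cur, "VB");
    if ($verb !== null) {
        $vp["VB"] = $verb;
        $object = nounPhraseSketch($phrase, $cur + 1);
        $cur = $object["cur_node"];
        if (!empty($object["NP"])) {
            $vp["NP"] = $object["NP"];
        }
    }
    return ["cur_node" => $cur, "VP" => $vp];
}

// Whole phrase: subject noun phrase, then verb phrase
function wholePhraseSketch(array $phrase): array
{
    $subject = nounPhraseSketch($phrase, 0);
    $predicate = verbPhraseSketch($phrase, $subject["cur_node"]);
    return [
        "cur_node" => $predicate["cur_node"],
        "NP" => $subject["NP"],
        "VP" => $predicate["VP"],
    ];
}

$phrase = [
    ["token" => "the", "tag" => "DT"], ["token" => "dog", "tag" => "NN"],
    ["token" => "chased", "tag" => "VB"], ["token" => "a", "tag" => "DT"],
    ["token" => "small", "tag" => "JJ"], ["token" => "cat", "tag" => "NN"],
];
print_r(wholePhraseSketch($phrase));
```

For this input the result has "cur_node" equal to 6, an "NP" subtree for "the dog", and a "VP" subtree holding the verb and a nested "NP" for "a small cat".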
extractTripletsParseTree(array $tree) : array
Takes a parse tree of a phrase and computes subject, predicate, and object arrays. Each of these arrays consists of two components, CONCISE and RAW: CONCISE corresponds to something closer to the words in the original phrase, and RAW to the case where extraneous words have been removed.
array | $tree | a parse tree for a sentence |
triplet array
extractDeepestSpeechPartPhrase(array $tree, string $pos) : string
Takes a phrase tree $tree and a part of speech $pos and returns the deepest $pos-only path in the tree.
array | $tree | phrase to extract type from |
string | $pos | the part of speech to extract |
the label of the deepest $pos-only path in $tree
extractObjectParseTree($tree) : array
Takes a parse tree of a phrase or statement and returns an array with two fields, CONCISE and RAW, the former having the object of the original phrase (as a string) and the latter having the important parts of the object.
$tree |
array with two fields, CONCISE and RAW, as described above
extractPredicateParseTree($tree) : array
Takes a parse tree of a phrase or statement and returns an array with two fields, CONCISE and RAW, the former having the predicate of the original phrase (as a string) and the latter having the important parts of the predicate.
$tree |
array with two fields, CONCISE and RAW, as described above
extractSubjectParseTree($tree) : array
Takes a parse tree of a phrase or statement and returns an array with two fields, CONCISE and RAW, the former having the subject of the original phrase (as a string) and the latter having the important parts of the subject.
$tree |
array with two fields, CONCISE and RAW, as described above
rearrangeTripletsByType(array $sub_pred_obj_triplets) : array
Takes a triplets array with subject, predicate, and object fields, each with CONCISE and RAW subfields, and rearranges it to have two fields, CONCISE and RAW, each with subject, predicate, object, and QUESTION_ANSWER_LIST subfields.
array | $sub_pred_obj_triplets | in format described above |
$processed_triplets in format described above
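The rearrangement amounts to swapping the two levels of nesting. The sketch below shows only that swap and leaves out QUESTION_ANSWER_LIST, whose construction is not described here. The function name rearrangeSketch and the lowercase keys "subject", "predicate", and "object" are assumptions for illustration.

```php
<?php
// Hypothetical sketch: turn [part][type] nesting into [type][part].
// QUESTION_ANSWER_LIST is intentionally not built here.
function rearrangeSketch(array $sub_pred_obj_triplets): array
{
    $processed_triplets = [];
    foreach (["CONCISE", "RAW"] as $type) {
        foreach ($sub_pred_obj_triplets as $part => $by_type) {
            // Missing entries become empty strings (an assumed default)
            $processed_triplets[$type][$part] = $by_type[$type] ?? "";
        }
    }
    return $processed_triplets;
}

$triplets = [
    "subject" => ["CONCISE" => "the small dog", "RAW" => "dog"],
    "predicate" => ["CONCISE" => "chased", "RAW" => "chased"],
    "object" => ["CONCISE" => "a cat", "RAW" => "cat"],
];
print_r(rearrangeSketch($triplets));
```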
extractTripletByType(array $sub_pred_obj_triplets, string $type) : array
Takes a triplets array with subject, predicate, and object fields, each with CONCISE and RAW subfields, and produces a triplets array with a $type subfield (where $type is one of CONCISE and RAW) and with subject, predicate, object, and QUESTION_ANSWER_LIST subfields.
array | $sub_pred_obj_triplets | in format described above |
string | $type | either CONCISE or RAW |
$triplets in format described above
parseQuestion(string $tagged_question, integer $index, string $question_word) : array
Takes a tagged question string that starts with "Who" and returns the question triplet from the question string.
string | $tagged_question | part-of-speech tagged question |
integer | $index | current index in statement |
string | $question_word | the question word that needs to be replaced |
parsed triplet