$rule_trees
$rule_trees : array
Used to store parse trees that this parser executes
Has methods to parse user-defined page rules and apply them to documents to be indexed.
There are two types of statements that a user can define: command statements and assignment statements.
A command statement takes a key field argument for the page associative array and does a function call to manipulate that page. These have the syntax:
addMetaWords(field) ;add the field and field value to the META_WORD array for the page
addKeywordLink(field) ;split the field on a comma, view this as a search keywords => link text association, and add this to the KEYWORD_LINKS array
setStack(field) ;set which field value should be used as a stack
pushStack(field) ;add the field value for field to the top of the stack
popStack(field) ;pop the top of the stack into the field value for field
setOutputFolder(dir) ;if auxiliary output, rather than just output to a Yioop index, is being done, then set the folder for this output to be dir
setOutputFormat(format) ;format of auxiliary output, either CSV or SQL; SQL means that writeOutput will write an insert statement
setOutputTable(table) ;if output is SQL, then what table to use for the insert statements
toArray(field) ;split the field value for field on a comma and assign the field value to be the resulting array
toString(field) ;if the field value is an array, then implode that array using comma and store the result in the field value
unset(field) ;unset that field value
writeOutput(field) ;use the contents of the field value, viewed as an array, to fill in the columns of a SQL insert statement or CSV row
Assignments can either be straight assignments with '=' or concatenation assignments with '.='. There are the following kinds of values that one can assign:
field = some_other_field ;sets $page['field'] = $page['some_other_field']
field = "some_string" ;sets $page['field'] to "some_string"
field = /some_regex/replacement_where_dollar_vars_allowed/ ;computes the results of replacing matches to some_regex in $page['field'] with replacement_where_dollar_vars_allowed
field = /some_regex/g ;sets $page['field'] to the array of all matches of some_regex in $page['field']
For each of the above assignments we could have used ".=" instead of "=".
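The four kinds of assignment values can be pictured with plain PHP. This is only a sketch of the semantics described above, not the parser's own code; the sample field names and values are invented for illustration, and only string concatenation is shown for ".=" because the source does not spell out how it behaves for other value kinds.

```php
<?php
// Illustrative page array; the keys and values here are made up.
$page = ['title' => 'Yioop 9.5 release', 'note' => 'v1, v2'];

// field = some_other_field
$page['heading'] = $page['title'];

// field = "some_string"
$page['source'] = "some_string";

// field = /(\d+)\.(\d+)/$1_$2/  (replace matches within the field itself)
$page['title'] = preg_replace('/(\d+)\.(\d+)/', '$1_$2', $page['title']);

// field = /v\d/g  (field becomes the array of all matches)
preg_match_all('/v\d/', $page['note'], $matches);
$page['note'] = $matches[0];

// field .= "_suffix"  (concatenation assignment)
$page['source'] .= "_suffix";
```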
executeRuleTrees(array& $page_data, array $rule_trees = null)
Executes either the internal $rule_trees or the passed $rule_trees on the provided $page_data associative array
array& | $page_data | an associative array containing summary info of a web page/record (will be changed by this operation) |
array | $rule_trees | an array of annotated syntax trees for the rules used to update $page_data |
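The source does not describe the layout of an annotated syntax tree, so the following sketch assumes a hypothetical one (the 'type', 'name', 'arg', 'field', 'op' and 'value' keys are all invented) just to show how a dispatcher of this kind could route each tree to a command or assignment handler. All function names ending in Sketch are illustrative.

```php
<?php
// Hypothetical tree shapes, assumed for illustration only:
//   ['type' => 'function',   'name' => 'toArray', 'arg' => 'tags']
//   ['type' => 'assignment', 'field' => 'x', 'op' => '=', 'value' => 'text']
function executeRuleTreesSketch(array &$page_data, array $rule_trees): void
{
    foreach ($rule_trees as $tree) {
        if ($tree['type'] === 'function') {
            executeFunctionSketch($tree, $page_data);
        } elseif ($tree['type'] === 'assignment') {
            executeAssignmentSketch($tree, $page_data);
        }
    }
}

// Only two commands are modelled here: unset and toArray.
function executeFunctionSketch(array $tree, array &$page_data): void
{
    $field = $tree['arg'];
    if ($tree['name'] === 'unset') {
        unset($page_data[$field]);
    } elseif ($tree['name'] === 'toArray'
        && is_string($page_data[$field] ?? null)) {
        $page_data[$field] =
            array_map('trim', explode(',', $page_data[$field]));
    }
}

// Only string literal values are modelled here.
function executeAssignmentSketch(array $tree, array &$page_data): void
{
    $field = $tree['field'];
    if ($tree['op'] === '.=') {
        $page_data[$field] = ($page_data[$field] ?? '') . $tree['value'];
    } else {
        $page_data[$field] = $tree['value'];
    }
}

$page = ['tags' => 'a, b', 'tmp' => 'x'];
$rules = [
    ['type' => 'function', 'name' => 'toArray', 'arg' => 'tags'],
    ['type' => 'function', 'name' => 'unset', 'arg' => 'tmp'],
    ['type' => 'assignment', 'field' => 'title', 'op' => '=', 'value' => 'Hello'],
    ['type' => 'assignment', 'field' => 'title', 'op' => '.=', 'value' => ' World'],
];
executeRuleTreesSketch($page, $rules);
```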
executeFunctionRule(array $tree, array& $page_data)
Used to execute a single command rule on $page_data
array | $tree | annotated syntax tree of a function call rule |
array& | $page_data | an associative array containing summary info of a web page/record (will be changed by this operation) |
executeAssignmentRule(array $tree, array& $page_data)
Used to execute a single assignment rule on $page_data
array | $tree | annotated syntax tree of an assignment rule |
array& | $page_data | an associative array containing summary info of a web page/record (will be changed by this operation) |
addMetaWord($field, array& $page_data)
Adds a meta word u:$field:$page_data[$field] to the array of meta words for this page
$field | the key in $page_data to use |
array& | $page_data | an associative array containing summary info of a web page/record |
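As a rough illustration of the meta word format described above, the sketch below builds the u:field:value string and appends it to a list. The function name addMetaWordSketch and the plain string key 'META_WORDS' are assumptions made for this example; the real method presumably uses a crawl constant as the key.

```php
<?php
// Hypothetical stand-in for addMetaWord; 'META_WORDS' is an assumed key.
function addMetaWordSketch(string $field, array &$page_data): void
{
    if (!isset($page_data[$field])) {
        return; // nothing to add when the field is missing
    }
    $page_data['META_WORDS'][] = "u:" . $field . ":" . $page_data[$field];
}

$page = ['author' => 'knuth'];
addMetaWordSketch('author', $page);
```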
addKeywordLink($field, array& $page_data)
Adds a $keywords => $link_text pair to the KEYWORD_LINKS array for this page based on the value of $field on the page. The pair is extracted by splitting on comma. The KEYWORD_LINKS array can be used when a cached version of a page is displayed to show a list of links from the cached page in the header. These links correspond to searches in Yioop. For example, the value "madonna, rock star" would add a link to the top of the cache page with text "rock star" which, when clicked, would perform a Yioop search on madonna.
$field | the key in $page_data to use |
array& | $page_data | an associative array containing summary info of a web page/record |
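The "madonna, rock star" example can be traced with a small sketch. The function name addKeywordLinkSketch and the plain string key 'KEYWORD_LINKS' are assumptions, as is the choice to split only on the first comma; the source says only that the pair is extracted by splitting on comma.

```php
<?php
// Hypothetical stand-in for addKeywordLink; 'KEYWORD_LINKS' is an assumed key.
function addKeywordLinkSketch(string $field, array &$page_data): void
{
    if (!isset($page_data[$field])) {
        return;
    }
    // Assumption: split on the first comma only, giving keywords and link text.
    $parts = explode(',', $page_data[$field], 2);
    if (count($parts) < 2) {
        return; // no comma, so no keywords => link text pair
    }
    [$keywords, $link_text] = array_map('trim', $parts);
    $page_data['KEYWORD_LINKS'][$keywords] = $link_text;
}

$page = ['celebrity' => 'madonna, rock star'];
addKeywordLinkSketch('celebrity', $page);
```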
pushStack($field, array& $page_data)
Pushes an element or items in an array stored in field onto the current stack
$field | what field to get data to push onto the current stack |
array& | $page_data | an associative array containing summary info of a web page/record |
popStack($field, array& $page_data)
Pops the top element off the current stack and stores it in the field value for field
$field | what field receives the value popped from the current stack |
array& | $page_data | an associative array containing summary info of a web page/record |
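The interaction of setStack, pushStack and popStack might look roughly like the following. The StackSketch class, its default stack field name and its handling of missing fields are assumptions for illustration; only the push and pop behaviour described above is taken from the source.

```php
<?php
// Hypothetical sketch of the stack commands; not the parser's own class.
final class StackSketch
{
    /** Name of the field currently used as the stack (assumed default). */
    private string $stack_field = 'stack';

    public function setStack(string $field): void
    {
        $this->stack_field = $field;
    }

    public function pushStack(string $field, array &$page_data): void
    {
        if (!isset($page_data[$field])) {
            return;
        }
        // A scalar is pushed as one element; an array is pushed item by item.
        $items = is_array($page_data[$field])
            ? $page_data[$field] : [$page_data[$field]];
        foreach ($items as $item) {
            $page_data[$this->stack_field][] = $item;
        }
    }

    public function popStack(string $field, array &$page_data): void
    {
        $stack = $page_data[$this->stack_field] ?? null;
        if (!is_array($stack) || $stack === []) {
            return; // nothing to pop
        }
        $page_data[$field] = array_pop($page_data[$this->stack_field]);
    }
}

$page = ['crumbs' => ['home', 'docs'], 'title' => 'API'];
$stack = new StackSketch();
$stack->setStack('path');
$stack->pushStack('crumbs', $page);
$stack->pushStack('title', $page);
$stack->popStack('last', $page);
```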
setOutputFolder($dir, array& $page_data)
Sets the output folder
$dir | output directory in which to write data.txt files containing the contents of some fields after writeOutput commands |
array& | $page_data | an associative array containing summary info of a web page/record |
toArray($field, array& $page_data)
If $page_data[$field] is a string, splits it into an array on comma, trims leading and trailing spaces from each item, and stores the result back into $page_data[$field]
$field | the key in $page_data to use |
array& | $page_data | an associative array containing summary info of a web page/record |
toString($field, array& $page_data)
If $page_data[$field] is an array, implodes it into a string on comma and stores the result back into $page_data[$field]
$field | the key in $page_data to use |
array& | $page_data | an associative array containing summary info of a web page/record |
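The toArray and toString conversions described above are close to inverses, which a short sketch makes concrete. The function names toArraySketch and toStringSketch are illustrative, and leaving non-matching types untouched is an assumption drawn from the "if ... is a string" and "if ... is an array" wording.

```php
<?php
// Hypothetical stand-ins for toArray and toString.
function toArraySketch(string $field, array &$page_data): void
{
    if (is_string($page_data[$field] ?? null)) {
        // Split on comma, then trim spaces around each item.
        $page_data[$field] =
            array_map('trim', explode(',', $page_data[$field]));
    }
}

function toStringSketch(string $field, array &$page_data): void
{
    if (is_array($page_data[$field] ?? null)) {
        $page_data[$field] = implode(',', $page_data[$field]);
    }
}

$page = ['tags' => 'php , search,  crawler'];
toArraySketch('tags', $page);
$as_array = $page['tags'];
toStringSketch('tags', $page);
$as_string = $page['tags'];
```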
unsetVariable($field, array& $page_data)
Unsets the key $field (or the crawl constant it corresponds to) in $page_data. If it is a crawl constant, it is not unset; it is just set to the empty string
$field | the key in $page_data to use |
array& | $page_data | an associative array containing summary info of a web page/record |
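The two behaviours of unsetVariable can be sketched as below. The PROTECTED_KEYS_SKETCH list is a made-up stand-in for the set of crawl constants, and the sketch skips the mapping from a field name to its crawl constant, which the source mentions but does not detail.

```php
<?php
// Hypothetical stand-in for the crawl constants that are blanked, not removed.
const PROTECTED_KEYS_SKETCH = ['TITLE', 'DESCRIPTION'];

function unsetVariableSketch(string $field, array &$page_data): void
{
    if (in_array($field, PROTECTED_KEYS_SKETCH, true)) {
        $page_data[$field] = ''; // keep the key, clear the value
    } else {
        unset($page_data[$field]); // ordinary fields are removed entirely
    }
}

$page = ['TITLE' => 'Home', 'scratch' => 'temp'];
unsetVariableSketch('TITLE', $page);
unsetVariableSketch('scratch', $page);
```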
writeOutput($field, array& $page_data)
Writes the value of a field to the output folder in the current format. If the field is not set, nothing is written
$field | the key in $page_data to use |
array& | $page_data | an associative array containing summary info of a web page/record |
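To show the difference between the CSV and SQL output formats, the sketch below formats one record as a string instead of writing it to a file. The function name formatOutputSketch, the quoting rules and the exact shape of the insert statement are all assumptions; the source says only that the field value, viewed as an array, fills the columns of a CSV row or SQL insert statement.

```php
<?php
// Hypothetical formatter; the quoting and statement layout are assumed.
function formatOutputSketch(
    array $values, string $format, string $table = ''): string
{
    if ($format === 'SQL') {
        // Quote each value, doubling any embedded single quotes.
        $quoted = array_map(
            fn($v) => "'" . str_replace("'", "''", (string)$v) . "'",
            $values);
        return "INSERT INTO " . $table . " VALUES ("
            . implode(", ", $quoted) . ");";
    }
    // CSV: wrap each value in double quotes, doubling embedded ones.
    $quoted = array_map(
        fn($v) => '"' . str_replace('"', '""', (string)$v) . '"',
        $values);
    return implode(",", $quoted);
}

$row = ['Yioop', "it's open"];
$csv = formatOutputSketch($row, 'CSV');
$sql = formatOutputSketch($row, 'SQL', 'pages');
```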