\seekquarry\yioop\library\PageRuleParser

Has methods to parse user-defined page rules and apply them to documents that are to be indexed.

There are two types of statements that a user can define: command statements and assignment statements.

A command statement takes a key field argument for the page associative array and does a function call to manipulate that page. These have the syntax:

addMetaWords(field) ; add the field and field value to the META_WORD array for the page
addKeywordLink(field) ; split the field on a comma, view this as a search keywords => link text association, and add this to the KEYWORD_LINKS array
setStack(field) ; set which field value should be used as a stack
pushStack(field) ; add the field value for field to the top of the stack
popStack(field) ; pop the top of the stack into the field value for field
setOutputFolder(dir) ; if auxiliary output, rather than just output to a Yioop index, is being done, then set the folder for this output to be dir
setOutputFormat(format) ; format of auxiliary output, either CSV or SQL; SQL means that writeOutput will write an insert statement
setOutputTable(table) ; if output is SQL, then what table to use for the insert statements
toArray(field) ; split the field value for field on a comma and assign the field value to be the resulting array
toString(field) ; if the field value is an array, then implode that array using a comma and store the result in the field value
unset(field) ; unset that field value
writeOutput(field) ; use the contents of the field value, viewed as an array, to fill in the columns of a SQL insert statement or CSV row

Assignments can either be straight assignments with '=' or concatenation assignments with '.='. There are the following kinds of values that one can assign:

field = some_other_field ; sets $page['field'] = $page['some_other_field']
field = "some_string" ; sets $page['field'] to "some_string"
field = /some_regex/replacement_where_dollar_vars_allowed/ ; computes the result of replacing matches of some_regex in $page['field'] with replacement_where_dollar_vars_allowed
field = /some_regex/g ; sets $page['field'] to the array of all matches of some_regex in $page['field']

For each of the above assignments we could have used ".=" instead of "=".
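The four kinds of assignment above can be sketched in plain PHP. This is only an illustration under stated assumptions, not Yioop's implementation: the function name sketchAssign and its $kind argument are hypothetical, and the sketch assumes ".=" means string concatenation when both sides are strings (the description above does not say how arrays are concatenated).

```php
<?php
// Hypothetical helper, not part of Yioop: applies one assignment to $page.
// $op is "=" or ".="; $kind is "field", "string", "replace" or "match_all".
function sketchAssign(array &$page, string $field, string $op, string $kind,
    string $arg, string $replacement = ""): void
{
    $current = $page[$field] ?? "";
    switch ($kind) {
        case "field":      // field = some_other_field
            $value = $page[$arg] ?? "";
            break;
        case "string":     // field = "some_string"
            $value = $arg;
            break;
        case "replace":    // field = /some_regex/replacement/
            $value = preg_replace($arg, $replacement, $current);
            break;
        case "match_all":  // field = /some_regex/g
            preg_match_all($arg, $current, $matches);
            $value = $matches[0];
            break;
        default:
            throw new InvalidArgumentException("Unknown kind: $kind");
    }
    // Assumption: ".=" only concatenates when both sides are strings.
    if ($op === ".=" && is_string($value) && is_string($current)) {
        $page[$field] = $current . $value;
    } else {
        $page[$field] = $value;
    }
}

$page = ["title" => "Hello World", "description" => "a b a"];
sketchAssign($page, "copy", "=", "field", "title");
sketchAssign($page, "copy", ".=", "string", "!");
sketchAssign($page, "title", "=", "replace", "/World/", "Yioop");
sketchAssign($page, "description", "=", "match_all", "/a/");
```

After these calls, $page['copy'] holds "Hello World!", $page['title'] holds "Hello Yioop", and $page['description'] holds the array of the two matches of /a/.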

Summary

Methods: __construct(), parseRules(), executeRuleTrees(), executeFunctionRule(), executeAssignmentRule(), getVarField(), addMetaWord(), addKeywordLink(), setStack(), pushStack(), popStack(), setOutputFolder(), setOutputFormat(), setOutputTable(), toArray(), toString(), unsetVariable(), writeOutput()

Properties: $rule_trees, $output_folder, $output_format, $output_table, $stack

Constants: none

All members listed above are public; the class has no protected or private methods or properties.

Properties

$rule_trees

$rule_trees : array

Used to store parse trees that this parser executes

Type

array

$output_folder

$output_folder : string

If output to an auxiliary file is being done, the current folder to use for such output

Type

string

$output_format

$output_format : string

If output to an auxiliary file is being done, the current file format to output with (either SQL or CSV)

Type

string

$output_table

$output_table : string

If output to an auxiliary file is being done and the current file format is SQL, the table to output insert statements for

Type

string

$stack

$stack : string

Name of the field that will be used as a stack for pushing and popping other fields' values

Type

string

Methods

__construct()

__construct(string $page_rules = "")

Constructs a PageRuleParser using the supplied page_rules

Parameters

string $page_rules

a sequence of lines with page rules as described in the class comments

parseRules()

parseRules(string $page_rules) : array

Parses a string of page rules into parse trees that can be executed later

Parameters

string $page_rules

a sequence of lines with page rules as described in the class comments

Returns

array —

of parse trees which can be executed in sequence
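As a rough illustration of what turning rule lines into trees could look like, the sketch below classifies each line as a command or an assignment. It is not Yioop's parser: the function name sketchParseRules, the regular expressions, and the shape of the returned arrays are all assumptions made for this example.

```php
<?php
// Hypothetical sketch, not Yioop's parser: classifies each rule line as a
// command or an assignment and returns one small array ("tree") per rule.
function sketchParseRules(string $page_rules): array
{
    // name(arg) with an optional trailing ;comment
    $command = '/^\s*(\w+)\s*\(\s*([\w\/]+)\s*\)\s*(?:;.*)?$/';
    // field = value or field .= value, where value is a quoted string,
    // a /regex/.../ form, or a bare field name
    $assignment = '/^\s*(\w+)\s*(\.?=)\s*(".*?"|\/.+\/g?|\w+)\s*(?:;.*)?$/';
    $trees = [];
    foreach (explode("\n", $page_rules) as $line) {
        if (preg_match($command, $line, $m)) {
            $trees[] = ["type" => "command", "name" => $m[1],
                "arg" => $m[2]];
        } elseif (preg_match($assignment, $line, $m)) {
            $trees[] = ["type" => "assignment", "field" => $m[1],
                "op" => $m[2], "value" => $m[3]];
        }
        // Blank, comment-only, and unrecognized lines are skipped.
    }
    return $trees;
}

$rules = <<<'RULES'
summary = description ; copy one field into another
summary .= " (crawled)"
toArray(keywords)
; a comment-only line
RULES;
$trees = sketchParseRules($rules);
```

Here $trees would contain three entries, two assignments followed by one command; the comment-only line produces no tree.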

executeRuleTrees()

executeRuleTrees(array &$page_data, array $rule_trees = null)

Executes either the internal $rule_trees or the passed $rule_trees on the provided $page_data associative array

Parameters

array &$page_data

an associative array containing summary info of a web page/record (will be changed by this operation)

array $rule_trees

an array of annotated syntax trees for rules used to update $page_data

executeFunctionRule()

executeFunctionRule(array $tree, array &$page_data)

Used to execute a single command rule on $page_data

Parameters

array $tree

annotated syntax tree of a function call rule

array &$page_data

an associative array containing summary info of a web page/record (will be changed by this operation)

executeAssignmentRule()

executeAssignmentRule(array $tree, array &$page_data)

Used to execute a single assignment rule on $page_data

Parameters

array $tree

annotated syntax tree of an assignment rule

array &$page_data

an associative array containing summary info of a web page/record (will be changed by this operation)

getVarField()

getVarField(string $var_name) : string

Either returns $var_name or the value of the CrawlConstant with name $var_name.

Parameters

string $var_name

field to look up

Returns

string —

looked up value
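The lookup described here can be sketched as follows. The interface SketchConstants and its values are hypothetical stand-ins for Yioop's crawl constants, whose real names and values are not given in this page; the sketch only shows the "constant if one exists, otherwise the name itself" behavior.

```php
<?php
// Hypothetical stand-in for a set of crawl constants; names and values
// are invented for this example.
interface SketchConstants
{
    const TITLE = "q";
    const DESCRIPTION = "r";
}

// Returns the constant's value when $var_name names one, else $var_name.
function sketchGetVarField(string $var_name): string
{
    $name = "SketchConstants::" . $var_name;
    return defined($name) ? constant($name) : $var_name;
}

$known = sketchGetVarField("TITLE");       // a constant name
$custom = sketchGetVarField("my_field");   // an ordinary field name
```

With these assumed constants, $known holds the constant's value "q" while $custom is returned unchanged as "my_field".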

addMetaWord()

addMetaWord($field, array &$page_data)

Adds a meta word u:$field:$page_data[$field] to the array of meta words for this page

Parameters

$field

the key in $page_data to use

array &$page_data

an associative array containing summary info of a web page/record
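A minimal sketch of building such a meta word is shown below. The function name sketchAddMetaWord and the "META_WORDS" array key are assumptions for illustration; the key Yioop actually uses, and what it does when the field is missing, are not stated in this page.

```php
<?php
// Hypothetical sketch; "META_WORDS" is an assumed key, not necessarily
// the one Yioop uses.
function sketchAddMetaWord(string $field, array &$page_data): void
{
    if (!isset($page_data[$field])) {
        return; // assumption: nothing is added when the field is missing
    }
    $page_data["META_WORDS"][] = "u:$field:" . $page_data[$field];
}

$page = ["author" => "knuth"];
sketchAddMetaWord("author", $page);
sketchAddMetaWord("missing", $page);
```

After these calls the assumed META_WORDS array holds the single entry "u:author:knuth".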

addKeywordLink()

addKeywordLink($field, array &$page_data)

Adds a $keywords => $link_text pair to the KEYWORD_LINKS array for this page based on the value of $field on the page. The pair is extracted by splitting on a comma. The KEYWORD_LINKS array can be used when a cached version of a page is displayed to show a list of links from the cached page in the header. These links correspond to searches in Yioop. For example, the value "madonna, rock star" would add a link to the top of the cache page with the text "rock star" which, when clicked, would perform a Yioop search on madonna.

Parameters

$field

the key in $page_data to use

array &$page_data

an associative array containing summary info of a web page/record
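The madonna example can be traced with a short sketch. The function name sketchAddKeywordLink is hypothetical, and the sketch assumes the value is split on the first comma only and that both halves are trimmed; the real method may differ in those details.

```php
<?php
// Hypothetical sketch of the "keywords, link text" split described above.
function sketchAddKeywordLink(string $field, array &$page_data): void
{
    if (!isset($page_data[$field])) {
        return;
    }
    // Assumption: split on the first comma only.
    $parts = explode(",", $page_data[$field], 2);
    if (count($parts) < 2) {
        return; // no comma, so no keywords => link text pair
    }
    [$keywords, $link_text] = array_map('trim', $parts);
    $page_data["KEYWORD_LINKS"][$keywords] = $link_text;
}

$page = ["promo" => "madonna, rock star"];
sketchAddKeywordLink("promo", $page);
```

Here $page["KEYWORD_LINKS"] ends up mapping the search keywords "madonna" to the link text "rock star".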

setStack()

setStack($field, array &$page_data)

Sets the field variable to be used as a stack

Parameters

$field

what field variable to use for the current stack

array &$page_data

an associative array containing summary info of a web page/record

pushStack()

pushStack($field, array &$page_data)

Pushes an element or items in an array stored in field onto the current stack

Parameters

$field

what field to get data from to push onto the current stack

array &$page_data

an associative array containing summary info of a web page/record

popStack()

popStack($field, array &$page_data)

Pops the top of the current stack into the field value for field

Parameters

$field

what field to store the data popped from the current stack in

array &$page_data

an associative array containing summary info of a web page/record
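The interplay of setStack, pushStack, and popStack can be sketched with a small class. This is an illustration only: the class name StackSketch is hypothetical, and it assumes the stack lives as an array inside $page_data under the chosen field name, which this page does not confirm.

```php
<?php
// Hypothetical sketch of the three stack commands; not Yioop's code.
class StackSketch
{
    /** Name of the field currently used as the stack */
    public $stack = "";

    public function setStack(string $field, array &$page_data): void
    {
        $this->stack = $field;
        // Assumption: the stack field is kept as an array in $page_data.
        if (!isset($page_data[$field]) || !is_array($page_data[$field])) {
            $page_data[$field] = [];
        }
    }

    public function pushStack(string $field, array &$page_data): void
    {
        if (!isset($page_data[$field])) {
            return;
        }
        // A single value is pushed as one item; an array item by item.
        $items = is_array($page_data[$field]) ? $page_data[$field]
            : [$page_data[$field]];
        foreach ($items as $item) {
            $page_data[$this->stack][] = $item;
        }
    }

    public function popStack(string $field, array &$page_data): void
    {
        if (empty($page_data[$this->stack])) {
            return; // nothing to pop
        }
        $page_data[$field] = array_pop($page_data[$this->stack]);
    }
}

$page = ["a" => "x", "b" => ["y", "z"]];
$sketch = new StackSketch();
$sketch->setStack("work", $page);
$sketch->pushStack("a", $page);
$sketch->pushStack("b", $page);
$sketch->popStack("top", $page);
```

After this sequence the "work" stack holds x and y, and the popped value z has been stored in $page["top"].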

setOutputFolder()

setOutputFolder($dir, array &$page_data)

Sets the output folder

Parameters

$dir

output directory in which to write data.txt files containing the contents of some fields after writeOutput commands

array &$page_data

an associative array containing summary info of a web page/record

setOutputFormat()

setOutputFormat($format, array &$page_data)

Sets the output format

Parameters

$format

can be either csv or sql

array &$page_data

an associative array containing summary info of a web page/record

setOutputTable()

setOutputTable($table, array &$page_data)

Sets the output table

Parameters

$table

table to use if the output format is sql

array &$page_data

an associative array containing summary info of a web page/record

toArray()

toArray($field, array &$page_data)

If $page_data[$field] is a string, splits it into an array on commas, trims leading and trailing spaces from each item, and stores the result back into $page_data[$field]

Parameters

$field

the key in $page_data to use

array &$page_data

an associative array containing summary info of a web page/record

toString()

toString($field, array &$page_data)

If $page_data[$field] is an array, implodes it into a comma-separated string and stores the result back into $page_data[$field]

Parameters

$field

the key in $page_data to use

array &$page_data

an associative array containing summary info of a web page/record
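The toArray and toString behavior described above amounts to a split-and-trim followed by an implode. The sketch below uses hypothetical function names (sketchToArray, sketchToString) and standard PHP calls; it is a reading of the descriptions, not Yioop's source.

```php
<?php
// Hypothetical sketches of the toArray and toString commands.
function sketchToArray(string $field, array &$page_data): void
{
    if (isset($page_data[$field]) && is_string($page_data[$field])) {
        // Split on commas, then trim whitespace around each item.
        $page_data[$field] = array_map('trim',
            explode(",", $page_data[$field]));
    }
}

function sketchToString(string $field, array &$page_data): void
{
    if (isset($page_data[$field]) && is_array($page_data[$field])) {
        $page_data[$field] = implode(",", $page_data[$field]);
    }
}

$page = ["tags" => "php , search,  crawler"];
sketchToArray("tags", $page);
$as_array = $page["tags"];
sketchToString("tags", $page);
$as_string = $page["tags"];
```

Note that the round trip is not an exact inverse: the spaces around items in the original string are lost when the array is imploded again.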

unsetVariable()

unsetVariable($field, array &$page_data)

Unsets the key $field (or the crawl constant it corresponds to) in $page_data. If it is a crawl constant, it is not unset; it is just set to the empty string.

Parameters

$field

the key in $page_data to use

array &$page_data

an associative array containing summary info of a web page/record

writeOutput()

writeOutput($field, array &$page_data)

Writes the value of a field to the output folder in the current format. If the field is not set, nothing is written.

Parameters

$field

the key in $page_data to use

array &$page_data

an associative array containing summary info of a web page/record
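To make the two output formats concrete, the sketch below turns an array of column values into either a CSV row or a SQL insert statement. The function name sketchFormatRow and the quoting rules are assumptions; this page does not describe how Yioop quotes or escapes values, and the file-writing step is left out. Real code that produces SQL should rely on the database driver's own escaping.

```php
<?php
// Hypothetical sketch: formats one row for auxiliary output.
// The quoting shown is an assumption, not Yioop's documented behavior.
function sketchFormatRow(array $values, string $format,
    string $table = ""): string
{
    if (strtolower($format) === "sql") {
        // Single-quote each value, doubling embedded single quotes.
        $quoted = array_map(function ($value) {
            return "'" . str_replace("'", "''", (string)$value) . "'";
        }, $values);
        return "INSERT INTO $table VALUES (" . implode(", ", $quoted) . ");";
    }
    // CSV: double-quote each value, doubling embedded double quotes.
    $quoted = array_map(function ($value) {
        return '"' . str_replace('"', '""', (string)$value) . '"';
    }, $values);
    return implode(",", $quoted);
}

$sql = sketchFormatRow(["O'Reilly", "42"], "sql", "books");
$csv = sketchFormatRow(['say "hi"', "x"], "csv");
```

In this sketch the table name is only used for the SQL format, mirroring the role of $output_table described in the properties section.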