$index_archive
$index_archive : object
The IndexArchiveBundle object that this indexing plugin may modify in its postProcessing method.
Used to extract emails, phone numbers, and addresses from a web page.
These are extracted into the EMAILS, PHONE_NUMBERS, and ADDRESSES fields of the page's summary.
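As a rough illustration of how extracted values could end up in a summary field, here is a minimal sketch for the email case only. The function name `sketchExtractEmails`, the regular expression, and the way the result is stored are assumptions made for this example, not the plugin's actual code.

```php
<?php
/**
 * Hypothetical helper: collects distinct email-like strings from page text.
 * The pattern is a deliberately simple approximation, not a full RFC 5322 matcher.
 */
function sketchExtractEmails(string $page): array
{
    preg_match_all(
        '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/',
        $page,
        $matches
    );
    // Drop duplicates and reindex so the result is a plain list
    return array_values(array_unique($matches[0]));
}

// Assumed usage: store the matches under the EMAILS field of a summary array
$summary = [
    "EMAILS" => sketchExtractEmails(
        '<p>Write to info@example.com or info@example.com</p>'
    ),
];
```

Phone numbers and addresses would need their own, locale-dependent heuristics; they are left out here to keep the sketch short.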
pageProcessing(string $page, string $url) : array
This method is called by a PageProcessor in its handle() method just after it has processed a web page. It allows an indexing plugin to do additional processing on the page, such as adding sub-documents, before the page summary is handed back to the fetcher.
string | $page | web-page contents |
string | $url | the url where the page contents came from, used to canonicalize relative links |
An array consisting of a sequence of subdoc arrays found on the given page.
pageSummaryProcessing(array $summary, string $url)
Adjusts the document summary of a page after the page processor's process method has been called so that the subdoc's fields associated with the addresses plugin get copied as fields of the whole page summary. Then it deletes the subdoc fields.
array | $summary | summary of the current document. It will be adjusted by this method |
string | $url | the url where the summary contents came from |
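The copy-then-delete behavior described for pageSummaryProcessing can be sketched as follows. The `SUBDOCS` key, the array layout, the by-reference parameter, and the name `sketchPromoteSubdocFields` are all hypothetical choices for this illustration; the real summary structure may differ.

```php
<?php
/**
 * Hypothetical sketch: moves the plugin's fields from each subdoc up to the
 * page summary, then removes them from the subdoc.
 */
function sketchPromoteSubdocFields(array &$summary): void
{
    // Field names taken from the class description; "SUBDOCS" is assumed
    $fields = ["EMAILS", "PHONE_NUMBERS", "ADDRESSES"];
    if (!isset($summary["SUBDOCS"])) {
        return;
    }
    foreach ($summary["SUBDOCS"] as $i => $subdoc) {
        foreach ($fields as $field) {
            if (isset($subdoc[$field])) {
                // Copy the field to the whole-page summary ...
                $summary[$field] = $subdoc[$field];
                // ... then delete it from the subdoc
                unset($summary["SUBDOCS"][$i][$field]);
            }
        }
    }
}
```

If several subdocs carried the same field, this sketch would keep only the last one; merging them is a design decision the description above does not settle.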
postProcessing(string $index_name)
This method is called by the queue_server with the name of a completed index. This allows the indexing plugin to perform searches on the index and, using the results, inject new page/index data into the index before it becomes available for end use.
string | $index_name | the name/timestamp of an IndexArchiveBundle to do post processing for |
checkCandidate(array $pre_address) : mixed
Checks if the passed sequence of lines has enough features of a postal address to call it an address. If so, returns the address as a single string.
array | $pre_address | an array of potential address lines |
false if the lines do not form an address; otherwise the lines imploded together with spaces
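The return contract above (false, or the lines joined by spaces) might look like the sketch below. The two features tested here, a street keyword and a US-style ZIP code, are assumptions chosen for illustration; the plugin's real feature set is not given in this documentation, and `sketchCheckCandidate` is a hypothetical name.

```php
<?php
/**
 * Hypothetical sketch of an address-candidate check.
 *
 * @param array $pre_address potential address lines
 * @return mixed false if not an address, otherwise the lines joined by spaces
 */
function sketchCheckCandidate(array $pre_address)
{
    $has_street = false;
    $has_postal_code = false;
    foreach ($pre_address as $line) {
        // Assumed feature 1: a street keyword appears as a whole word
        if (preg_match('/\b(WAY|AVENUE|STREET)\b/i', $line)) {
            $has_street = true;
        }
        // Assumed feature 2: a 5-digit ZIP code, optionally in ZIP+4 form
        if (preg_match('/\b\d{5}(-\d{4})?\b/', $line)) {
            $has_postal_code = true;
        }
    }
    if ($has_street && $has_postal_code) {
        return implode(" ", $pre_address);
    }
    return false;
}
```

Callers should compare the result with `=== false`, since the function returns either a boolean or a string.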
checkStreet(string $line) : boolean
Used to check if a given line in an address candidate has features associated with being a street address.
string | $line | address line to check |
whether or not the line contains a word associated with street addresses, such as WAY, AVENUE, or STREET
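A keyword test of this kind could be sketched as below. Only WAY, AVENUE, and STREET come from the description; the whole-word matching strategy and the name `sketchCheckStreet` are assumptions for this example.

```php
<?php
/**
 * Hypothetical sketch: reports whether a line contains a street keyword.
 */
function sketchCheckStreet(string $line): bool
{
    // Illustrative keyword list; a real list would likely be longer
    $street_words = ["WAY", "AVENUE", "STREET"];
    // Uppercase, then split on non-letters so punctuation and digits
    // do not hide a keyword
    $words = preg_split(
        '/[^A-Z]+/',
        strtoupper($line),
        -1,
        PREG_SPLIT_NO_EMPTY
    );
    return count(array_intersect($words, $street_words)) > 0;
}
```

Matching whole words rather than substrings avoids false positives such as "Subway" or "always", at the cost of missing abbreviations like "St." unless they are added to the list.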