$iterate_timestamp
$iterate_timestamp : integer
Timestamp of the archive that is being iterated over
Class used to iterate over the documents indexed in a WebArchiveBundle, typically for the purpose of re-indexing those documents.
weight( $site) : boolean
Estimates the importance of the site according to the weighting of the particular archive iterator
array | $site | an associative array containing info about a web page |
false; files are assumed to have been crawled roughly in order of page importance, so the default estimate of doc rank is used
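The fallback described above can be illustrated with a small sketch. It assumes a caller that treats a false return from weight() as "no iterator-specific estimate" and substitutes a default rank. The function effectiveWeight() and its $default_rank parameter are hypothetical names introduced for illustration; they are not part of the documented class.

```php
<?php
// Hypothetical helper, not part of the documented API: chooses between an
// iterator-specific weight and a default doc rank estimate.
function effectiveWeight($iterator_weight, float $default_rank): float
{
    // A strict false means the iterator offers no estimate of its own.
    if ($iterator_weight === false) {
        return $default_rank;
    }
    return (float) $iterator_weight;
}

// weight() on this iterator is documented to return false, so the
// default rank is what a caller like this would end up using.
$rank = effectiveWeight(false, 5.0);
```

The strict comparison with false matters here: a loose comparison would also treat a legitimate weight of 0 as "no estimate".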
nextPages(integer $num, boolean $no_process = false) : array
Gets the next $num docs from the iterator
integer | $num | number of docs to get |
boolean | $no_process | this flag is inherited from base class but does not do anything in this case |
an array of associative arrays, one for each of the next $num pages
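A minimal sketch of how a caller might drain an iterator in fixed-size batches using nextPages(). The StubArchiveIterator class is a hypothetical stand-in that mimics only the nextPages($num, $no_process) signature documented above; it is not the real iterator. The 'url' field name and the drainIterator() helper are likewise assumptions made for illustration.

```php
<?php
// Hypothetical stand-in that mimics only the documented nextPages() signature.
class StubArchiveIterator
{
    private array $pages;
    private int $position = 0;

    public function __construct(array $pages)
    {
        $this->pages = $pages;
    }

    // Returns up to $num page arrays; $no_process is accepted but ignored,
    // matching the documented behaviour of the flag.
    public function nextPages(int $num, bool $no_process = false): array
    {
        $batch = array_slice($this->pages, $this->position, $num);
        $this->position += count($batch);
        return $batch;
    }
}

// Collects every page by requesting batches until an empty batch comes back.
function drainIterator(StubArchiveIterator $iterator, int $batch_size): array
{
    $all = [];
    while (($batch = $iterator->nextPages($batch_size)) !== []) {
        foreach ($batch as $page) {
            $all[] = $page;
        }
    }
    return $all;
}

$iterator = new StubArchiveIterator([
    ['url' => 'http://example.com/a'],
    ['url' => 'http://example.com/b'],
    ['url' => 'http://example.com/c'],
]);
$pages = drainIterator($iterator, 2);
```

Requesting pages in batches keeps memory use bounded by the batch size rather than by the size of the whole archive, which is the usual reason for a nextPages-style interface.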
__construct(string $prefix, string $iterate_timestamp, string $result_timestamp)
Creates a web archive iterator with the given parameters.
string | $prefix | fetcher number this bundle is associated with |
string | $iterate_timestamp | timestamp of the web archive bundle whose pages are iterated over |
string | $result_timestamp | timestamp of the web archive bundle results are being stored in |