seek_quarry

Class: ArchiveBundleIterator

Source Location: /lib/archive_bundle_iterators/archive_bundle_iterator.php

Class Overview


Abstract class used to model iterating over documents indexed in a WebArchiveBundle or a set of such bundles.


Author(s):

  • Chris Pollett


Child classes:

DatabaseBundleIterator
Used to iterate through the records that result from an SQL query to a database
MixArchiveBundleIterator
Used to do an archive crawl based on the results of a crawl mix.
TextArchiveBundleIterator
Used to iterate through the records of a collection of text or compressed text-oriented records
WebArchiveBundleIterator
Class used to model iterating over documents indexed in a WebArchiveBundle. This would typically be for the purpose of re-indexing these documents.

Class Details

[line 49]
Abstract class used to model iterating over documents indexed in a WebArchiveBundle or a set of such bundles.



Tags:

author:  Chris Pollett
see:  WebArchiveBundle
abstract:  




Class Variables

$end_of_iterator

[line 65]

Whether or not the iterator still has more documents


Type:   bool



$iterate_timestamp

[line 55]

Timestamp of the archive that is being iterated over


Type:   int



$result_dir

[line 70]

The path to the directory where the iteration status is stored.


Type:   string



$result_timestamp

[line 60]

Timestamp of the archive that is being used to store results in


Type:   int





Class Methods


method nextPages [line 138]

array nextPages( int $num, [bool $no_process = false])

Gets the next $num many docs from the iterator



Tags:

return:  associative arrays for $num pages
abstract:  


Overridden in child classes as:

DatabaseBundleIterator::nextPages()
Gets at most the next $num docs from the iterator. It might return fewer than $num documents if the partition changes or the end of the bundle is reached.
MixArchiveBundleIterator::nextPages()
Gets the next $num many docs from the iterator
TextArchiveBundleIterator::nextPages()
Gets at most the next $num docs from the iterator. It might return fewer than $num documents if the partition changes or the end of the bundle is reached.
WebArchiveBundleIterator::nextPages()
Gets the next $num many docs from the iterator

Parameters:

int   $num   number of docs to get
bool   $no_process   do not do any processing on page data
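
A minimal sketch of how a caller might drain an iterator in batches using nextPages() and $end_of_iterator. DemoArrayIterator and drainIterator are hypothetical stand-ins written for this example; they are not part of seek_quarry and only mimic the contract documented here.

```php
<?php
// Hypothetical stand-in for a concrete ArchiveBundleIterator subclass.
// It serves pages from an in-memory array instead of an archive bundle.
class DemoArrayIterator
{
    public $end_of_iterator = false;
    private $pages;
    private $offset = 0;

    public function __construct(array $pages)
    {
        $this->pages = $pages;
        $this->end_of_iterator = (count($pages) === 0);
    }

    // Returns at most $num pages; $no_process is accepted but unused here.
    public function nextPages($num, $no_process = false)
    {
        $batch = array_slice($this->pages, $this->offset, $num);
        $this->offset += count($batch);
        $this->end_of_iterator = ($this->offset >= count($this->pages));
        return $batch;
    }
}

// Hypothetical helper: keeps requesting batches until the iterator is done.
function drainIterator($iterator, $batch_size)
{
    $all = [];
    while (!$iterator->end_of_iterator) {
        foreach ($iterator->nextPages($batch_size) as $page) {
            $all[] = $page;
        }
    }
    return $all;
}

$iterator = new DemoArrayIterator([
    ['url' => 'http://example.com/a'],
    ['url' => 'http://example.com/b'],
    ['url' => 'http://example.com/c'],
]);
$pages = drainIterator($iterator, 2);
echo count($pages), "\n";
```

Because some child classes may return fewer than $num documents per call, the loop tests $end_of_iterator rather than the size of the returned batch.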


method reset [line 143]

void reset( )

Resets the iterator to the start of the archive bundle



Tags:

abstract:  


Overridden in child classes as:

DatabaseBundleIterator::reset()
Resets the iterator to the start of the archive bundle
MixArchiveBundleIterator::reset()
Resets the iterator to the start of the archive bundle
TextArchiveBundleIterator::reset()
Resets the iterator to the start of the archive bundle
WebArchiveBundleIterator::reset()
Resets the iterator to the start of the archive bundle


method restoreCheckpoint [line 98]

array restoreCheckpoint( )

Restores the internal state from the file iterate_status.txt in the result dir such that the next call to nextPages will pick up from just after the last checkpoint. Each iterator should make a call to restoreCheckpoint at the end of the constructor method after the instance members have been initialized.




Tags:

return:  the data serialized when saveCheckpoint was called


Overridden in child classes as:

MixArchiveBundleIterator::restoreCheckpoint()
Restores state from a previous instantiation, after the last batch of pages extracted.
WebArchiveBundleIterator::restoreCheckpoint()
Restores state from a previous instantiation, after the last batch of pages extracted.


method saveCheckpoint [line 80]

void saveCheckpoint( [array $info = array()])

Stores the current progress to the file iterate_status.txt in the result dir such that a new instance of the iterator could be constructed and return the next set of pages without having to process all of the pages that came before. Each iterator should make a call to saveCheckpoint after extracting a batch of pages.




Overridden in child classes as:

MixArchiveBundleIterator::saveCheckpoint()
Saves the current state so that a new instantiation can pick up just after the last batch of pages extracted.
WebArchiveBundleIterator::saveCheckpoint()
Saves the current state so that a new instantiation can pick up just after the last batch of pages extracted.

Parameters:

array   $info   any extra info a subclass wants to save
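
The saveCheckpoint() / restoreCheckpoint() round trip can be sketched roughly as follows. This is an illustration under stated assumptions, not the class's actual code: the function names, the use of serialize()/unserialize(), and the 'offset' key are all assumptions made for the example. Only the file name iterate_status.txt and the idea of merging subclass-supplied $info come from the descriptions above.

```php
<?php
// Hypothetical sketch: write iterator state plus subclass extras to
// iterate_status.txt inside the result directory.
function saveCheckpointSketch($result_dir, array $state, array $info = [])
{
    $data = array_merge($info, $state);
    file_put_contents($result_dir . "/iterate_status.txt", serialize($data));
}

// Hypothetical sketch: read the saved state back, or return an empty
// array when no checkpoint has been written yet.
function restoreCheckpointSketch($result_dir)
{
    $status_file = $result_dir . "/iterate_status.txt";
    if (!file_exists($status_file)) {
        return [];
    }
    $data = unserialize(file_get_contents($status_file));
    return is_array($data) ? $data : [];
}

// Use a throwaway directory so the example does not touch real crawl data.
$result_dir = sys_get_temp_dir() . "/iterator_demo_" . uniqid();
mkdir($result_dir);

$fresh = restoreCheckpointSketch($result_dir);
saveCheckpointSketch($result_dir, ['end_of_iterator' => false],
    ['offset' => 40]);
$restored = restoreCheckpointSketch($result_dir);
echo $restored['offset'], "\n";

// Clean up the demo files.
unlink($result_dir . "/iterate_status.txt");
rmdir($result_dir);
```

Returning the restored array lets a subclass pull its own extra keys back out after the base state has been applied.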


method seekPage [line 115]

void seekPage( $limit)

Advances the iterator to the $limit page, with as little additional processing as possible



Overridden in child classes as:

DatabaseBundleIterator::seekPage()
Advances the iterator to the $limit page, with as little additional processing as possible

Parameters:

$limit   page to advance to
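
One plausible way to seek "with as little additional processing as possible" is to reset and then skip pages by calling nextPages() with $no_process set to true. The sketch below assumes that approach and assumes $limit counts the pages to skip from the start; DemoSeekIterator is a hypothetical class written for this example and may differ from what the real method does.

```php
<?php
// Hypothetical in-memory iterator used only to illustrate seeking.
class DemoSeekIterator
{
    public $end_of_iterator = false;
    public $processed = 0; // counts pages that went through "processing"
    private $pages;
    private $offset = 0;

    public function __construct(array $pages)
    {
        $this->pages = $pages;
        $this->reset();
    }

    public function reset()
    {
        $this->offset = 0;
        $this->end_of_iterator = (count($this->pages) === 0);
    }

    public function nextPages($num, $no_process = false)
    {
        $batch = array_slice($this->pages, $this->offset, $num);
        if (!$no_process) {
            $this->processed += count($batch);
        }
        $this->offset += count($batch);
        $this->end_of_iterator = ($this->offset >= count($this->pages));
        return $batch;
    }

    // Skips $limit pages from the start without processing them.
    public function seekPage($limit)
    {
        $this->reset();
        $remaining = $limit;
        while ($remaining > 0 && !$this->end_of_iterator) {
            $skipped = $this->nextPages(min($remaining, 100), true);
            if (count($skipped) === 0) {
                break;
            }
            $remaining -= count($skipped);
        }
    }
}

$iterator = new DemoSeekIterator(range(0, 9));
$iterator->seekPage(4);
$next = $iterator->nextPages(2);
echo implode(",", $next), "\n";
```

In this sketch only the two pages fetched after the seek are counted as processed, which is the point of passing $no_process while skipping.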


method weight [line 130]

mixed weight( array &$site)

Estimates the importance of the site according to the weighting of the particular archive iterator




Tags:

return:  a 4-bit number, or false if the iterator doesn't use the default ranking method
abstract:  


Overridden in child classes as:

DatabaseBundleIterator::weight()
Estimates the importance of the site according to the weighting of
MixArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of
TextArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of
MediaWikiArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of
OdpRdfArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of
WebArchiveBundleIterator::weight()
Estimates the importance of the site according to the weighting of

Parameters:

array   &$site   an associative array containing info about a web page
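
Since weight() may return either a 4-bit number or false, callers need a strict comparison to tell a weight of 0 apart from false. The sketch below illustrates that return contract only. The 'depth' field and the 15 minus depth formula are invented for the example and are not the weighting any seek_quarry iterator is known to use.

```php
<?php
// Hypothetical helper: clamps a score into the 4-bit range 0..15.
function clampToFourBits($score)
{
    return max(0, min(15, (int) $score));
}

// Hypothetical weight(): returns a 4-bit number when the (assumed)
// 'depth' field is present, and false otherwise.
function weightSketch(array &$site)
{
    if (!isset($site['depth'])) {
        return false;
    }
    return clampToFourBits(15 - $site['depth']);
}

$shallow = ['url' => 'http://example.com/', 'depth' => 1];
$deep = ['url' => 'http://example.com/a/b', 'depth' => 40];
$unknown = ['url' => 'http://example.com/x'];

$weights = [
    weightSketch($shallow),
    weightSketch($deep),
    weightSketch($unknown),
];

foreach ($weights as $weight) {
    // 0 is a valid weight, so compare against false with ===.
    echo ($weight === false) ? "no weight\n" : "weight $weight\n";
}
```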



Documentation generated by phpDocumentor 1.4.3