\seekquarry\yioop\library\archive_bundle_iterators

Classes

ArcArchiveBundleIterator Used to iterate through the records of a collection of arc files stored in a WebArchiveBundle folder. Arc is the file format of the Internet Archive (http://www.archive.org/web/researcher/ArcFileFormat.php). Iteration would be for the purpose of making an index of these records.
ArchiveBundleIterator Abstract class used to model iterating over documents indexed in a WebArchiveBundle or a set of such bundles.
DatabaseBundleIterator Used to iterate through the records that result from an SQL query to a database.
MediaWikiArchiveBundleIterator Used to iterate through a collection of .xml.bz2 MediaWiki files stored in a WebArchiveBundle folder. These MediaWiki files contain the kinds of documents used by Wikipedia. Iteration would be for the purpose of making an index of these records.
MixArchiveBundleIterator Used to do an archive crawl based on the results of a crawl mix.
OdpRdfArchiveBundleIterator Used to iterate through the records of a collection of one or more Open Directory RDF files stored in a WebArchiveBundle folder. Open Directory files can be found at http://rdf.dmoz.org/. Iteration would be for the purpose of making an index of these records.
TextArchiveBundleIterator Used to iterate through the records of a collection of text or compressed text-oriented records.
WarcArchiveBundleIterator Used to iterate through the records of a collection of warc files stored in a WebArchiveBundle folder. Warc is the newer file format used by the Internet Archive and others for digital preservation; see http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml and http://archive-access.sourceforge.net/warc/. Iteration is done for the purpose of making an index of these records.
WebArchiveBundleIterator Class used to model iterating over documents indexed in a WebArchiveBundle. This would typically be for the purpose of re-indexing these documents.