src/library/archive_bundle_iteratorsWarcArchiveBundleIterator.php

SeekQuarry/Yioop -- Open Source Pure PHP Search Engine, Crawler, and Indexer

Copyright (C) 2009 - 2020 Chris Pollett chris@pollett.org

LICENSE:

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

END LICENSE

Classes

WarcArchiveBundleIterator Used to iterate through the records of a collection of warc files stored in a WebArchiveBundle folder. Warc is the newer file format of the Internet Archive and other for digital preservation: http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml http://archive-access.sourceforge.net/warc/ Iteration is done for the purpose making an index of these records