Procedural File: arc_tool.php
Source Location: /bin/arc_tool.php
Classes:
ArcTool
Command line program that allows one to examine the content of the WebArchiveBundles and IndexArchiveBundles of Yioop crawls.
Page Details:
SeekQuarry/Yioop -- Open Source Pure PHP Search Engine, Crawler, and Indexer
Copyright (C) 2009 - 2013 Chris Pollett chris@pollett.org
LICENSE:
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
END LICENSE
Includes:
require_once(BASE_DIR."/lib/index_bundle_iterators/word_iterator.php") [line 68]
To be able to determine info about a word in an index dictionary
require_once(BASE_DIR."/lib/url_parser.php") [line 80]
Used for manipulating URLs
require_once($filename) [line 76]
Load the iterator classes for non-Yioop archives
require_once(BASE_DIR."/lib/web_queue_bundle.php") [line 62]
Load the class that maintains our URL queue
require_once(BASE_DIR."/lib/index_manager.php") [line 71]
Used by word_iterator.php
require_once(BASE_DIR."/lib/utility.php") [line 83]
For crawlHash function
require_once(BASE_DIR."/models/datasources/".DBMS."_manager.php") [line 86]
Get the database library based on the current database type
require_once(BASE_DIR."/lib/fetch_url.php") [line 89]
Load FetchUrl, used by the MediaWiki archive iterator
require_once(BASE_DIR.'/configs/config.php') [line 48]
Load in global configuration settings
require_once(BASE_DIR."/lib/crawl_constants.php") [line 92]
Loads common constants for web crawling
require_once(BASE_DIR."/lib/index_archive_bundle.php") [line 65]
Load word->{array of docs with word} index class
Constants:
BASE_DIR [line 37]
LOG_TO_FILES [line 45]
NO_CACHE [line 56]
USE_CACHE [line 59]