\seekquarry\yioop\library\CrawlConstants

Shared constants and enums used by components involved in the crawling process.
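
As a minimal sketch of how shared constants like these might be consumed, the following defines a small stand-in and a class that implements it, so the constants can be used as array keys through `self::`. Everything named `Demo...` and every constant value below is a hypothetical placeholder for illustration; this page does not list the real values, and declaring the stand-in as an interface is an assumption of the example.

```php
<?php
// Hypothetical stand-in for a few of the constants listed on this page.
// The string values are placeholders, not the library's real values.
interface DemoCrawlConstants
{
    const URL = "demo_url";
    const TITLE = "demo_title";
    const HTTP_CODE = "demo_http_code";
}

// A class that implements the interface can refer to the constants via self::
class DemoPageSummary implements DemoCrawlConstants
{
    /** Builds an associative array keyed by the shared constants. */
    public static function make(string $url, string $title, int $code): array
    {
        return [
            self::URL => $url,
            self::TITLE => $title,
            self::HTTP_CODE => $code,
        ];
    }
}

$summary = DemoPageSummary::make("https://www.example.com/", "Example", 200);
// Any other component sharing the constants can read the same fields.
echo $summary[DemoCrawlConstants::TITLE], "\n";
```

Keying records by shared constants rather than literal strings means a misspelled field name fails loudly as an undefined constant instead of silently creating a new key.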

Summary

Methods
No public methods found

Constants
BOTH
INDEXER
SCHEDULER
queue_base_name
archive_base_name
name_archive_iterator
fetch_archive_iterator
save_point
schedule_data_base_name
schedule_name
robot_data_base_name
etag_expires_data_base_name
index_data_base_name
feed_index_data_base_name
double_index_base_name
network_base_name
network_crawllist_base_name
statistics_base_name
index_closed_name
fetch_batch_name
fetch_crawl_info
fetch_closed_name
data_base_name
schedule_start_name
robot_table_name
mirror_table_name
local_ip_cache_file
ASCENDING
DESCENDING
FEED_CRAWL_TIME
MAX
MIN
STOP_STATE
CONTINUE_STATE
NO_DATA_STATE
WAITING_START_MESSAGE_STATE
REDO_STATE
STATUS
CRAWL_TIME
HTTP_CODE
TIMESTAMP
TYPE
ENCODING
SEEN_URLS
MACHINE
INVERTED_INDEX
SAVED_CRAWL_TIMES
SCHEDULE_TIME
URL
WEIGHT
ROBOT_PATHS
HASH
PAGE
DOC_INFO
TITLE
DESCRIPTION
THUMB
CRAWL_DELAY
LINKS
ROBOT_TXT
TO_CRAWL
INDEX
DESCRIPTION_SCORES
HEIGHT
WIDTH
ROBOTS_TXT
DEBUG
DIRECTION
PINNED
SLEEP_START
SLEEP_DURATION
DOC_DEPTH
DOC_RANK
URL_WEIGHT
INLINKS
NEW_CRAWL
OFFSET
PATHS
HASH_URL
SUMMARY_OFFSET
DUMMY
SITES
SCORE
CRAWL_ORDER
RESTRICT_SITES_BY_URL
ALLOWED_SITES
DISALLOWED_SITES
BREADTH_FIRST
PAGE_IMPORTANCE
MACHINE_URI
SITE_INFO
FILETYPE
SUMMARY
URL_INFO
HASH_SEEN_URLS
RECENT_URLS
MEMORY_USAGE
DOC_ID
RELEVANCE
PAGE_RULES
CACHE_PAGE_PARTITION
GENERATION
HASH_SUM_SCORE
HASH_URL_COUNT
IS_DOC
IP_ADDRESSES
CLD_IN_COMMON
JUST_METAS
WEB_CRAWL
ARCHIVE_CRAWL
CRAWL_TYPE
CRAWL_INDEX
HEADER
SERVER
SERVER_VERSION
OPERATING_SYSTEM
MODIFIED
LANG
ROBOT_INSTANCE
DOC_LEN
SUBDOCS
SUBDOCTYPE
INDEXING_PLUGINS
DOMAIN_WEIGHTS
POSITION_LIST
PROXIMITY
LOCATION
INDEXED_FILE_TYPES
PAGE_RANGE_REQUEST
PAGE_RECRAWL_FREQUENCY
DATA
QUEUE_SERVERS
CURRENT_SERVER
SIZE
TOTAL_TIME
DNS_TIME
AGENT_LIST
ROBOT_METAS
ARC_DIR
ARC_TYPE
ARC_DATA
KEY
MACHINE_ID
IS_VIDEO
IS_FEED
SOURCE_NAME
LINK_SEEN_URLS
POST_MAX_SIZE
LOGGING
META_WORDS
CACHE_PAGES
WARC_ID
START_PARTITION
INI
UI_FLAGS
KEYWORD_LINKS
END_ITERATOR
ACTIVE_CLASSIFIERS
ACTIVE_CLASSIFIERS_DATA
MAX_DESCRIPTION_LEN
CACHE_PAGE_VALIDATORS
CACHE_PAGE_VALIDATION_DATA
NUM_PARTITIONS
PARTITION_NUM
ACTIVE_RANKERS
USER_RANKS
INDEXING_PLUGINS_DATA
REPOSITORY_TYPE
FILE_NAME
SHA_HASH
TOR_PROXY
PROXY_SERVERS
NEEDS_OFFSET_FLAG
BASIC_SUMMARIZER
CENTROID_SUMMARIZER
SUMMARIZER_OPTION
WORD_CLOUD
THESAURUS_SCORE
IS_GOPHER_URL
MINIMUM_FETCH_LOOP_TIME
IMAGE_LINK
GRAPH_BASED_SUMMARIZER
CENTROID_WEIGHTED_SUMMARIZER
SCRAPER_LABEL
SCRAPERS
QUESTION_ANSWERS
CONTENT_SIZE
NO_RANGE
MAX_DEPTH
REPEAT_TYPE
CHANNEL
THUMB_URL
IS_VR
DURATION
PUBDATE
SLOW_START
IS_SAFE

No protected methods found
No private methods found

Constants

BOTH

Used to say what kind of queue_server this is

INDEXER

Used to say what kind of queue_server this is

SCHEDULER

Used to say what kind of queue_server this is
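
The three constants above all identify the kind of queue server. A minimal sketch of branching on them follows, assuming that `BOTH` means a server acts as indexer and scheduler at once (suggested by the names, not stated on this page). The `DemoServerKinds` interface, the `demoDutiesFor` function, and the string values are hypothetical.

```php
<?php
// Hypothetical placeholder values; the real ones are not shown on this page.
interface DemoServerKinds
{
    const BOTH = "both";
    const INDEXER = "indexer";
    const SCHEDULER = "scheduler";
}

/** Returns the duties a queue server of the given kind would perform. */
function demoDutiesFor(string $kind): array
{
    $duties = [];
    if ($kind === DemoServerKinds::BOTH || $kind === DemoServerKinds::INDEXER) {
        $duties[] = "index";
    }
    if ($kind === DemoServerKinds::BOTH || $kind === DemoServerKinds::SCHEDULER) {
        $duties[] = "schedule";
    }
    return $duties;
}

echo implode(",", demoDutiesFor(DemoServerKinds::BOTH)), "\n"; // index,schedule
```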

queue_base_name

archive_base_name

name_archive_iterator

fetch_archive_iterator

save_point

schedule_data_base_name

schedule_name

robot_data_base_name

etag_expires_data_base_name

index_data_base_name

feed_index_data_base_name

double_index_base_name

network_base_name

network_crawllist_base_name

statistics_base_name

index_closed_name

fetch_batch_name

fetch_crawl_info

fetch_closed_name

data_base_name

schedule_start_name

robot_table_name

mirror_table_name

local_ip_cache_file

ASCENDING

Used for word iterator direction

DESCENDING

FEED_CRAWL_TIME

Media feed index archive bundle timestamp

MAX

Used in priority queue

MIN
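
`MAX` and `MIN` are described only as being used in a priority queue. One plausible reading, sketched below, is that they select whether the largest or the smallest weight is served first. The values `1` and `-1`, the `DemoQueueKinds` interface, and the `demoPopOrder` function are assumptions for illustration, and PHP's built-in `SplPriorityQueue` stands in for the library's own queue.

```php
<?php
// Hypothetical placeholder values chosen so they can act as sign multipliers.
interface DemoQueueKinds
{
    const MAX = 1;
    const MIN = -1;
}

/** Returns the keys of $weights in the order a queue of the given kind pops them. */
function demoPopOrder(array $weights, int $kind): array
{
    $queue = new SplPriorityQueue(); // serves the highest priority first
    foreach ($weights as $key => $weight) {
        // Multiplying by MIN (-1) flips the ordering into a min-queue.
        $queue->insert($key, $kind * $weight);
    }
    $order = [];
    while (!$queue->isEmpty()) {
        $order[] = $queue->extract();
    }
    return $order;
}

$weights = ["a" => 3, "b" => 1, "c" => 2];
echo implode(", ", demoPopOrder($weights, DemoQueueKinds::MAX)), "\n"; // a, c, b
echo implode(", ", demoPopOrder($weights, DemoQueueKinds::MIN)), "\n"; // b, c, a
```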

STOP_STATE

States of daemon processes

CONTINUE_STATE

NO_DATA_STATE

WAITING_START_MESSAGE_STATE

REDO_STATE

STATUS

CRAWL_TIME

HTTP_CODE

TIMESTAMP

TYPE

ENCODING

SEEN_URLS

MACHINE

INVERTED_INDEX

SAVED_CRAWL_TIMES

SCHEDULE_TIME

URL

WEIGHT

ROBOT_PATHS

HASH

PAGE

DOC_INFO

TITLE

DESCRIPTION

THUMB

CRAWL_DELAY

LINKS

ROBOT_TXT

TO_CRAWL

INDEX

DESCRIPTION_SCORES

HEIGHT

WIDTH

ROBOTS_TXT

DEBUG

DIRECTION

PINNED

SLEEP_START

SLEEP_DURATION

DOC_DEPTH

DOC_RANK

URL_WEIGHT

INLINKS

NEW_CRAWL

OFFSET

PATHS

HASH_URL

SUMMARY_OFFSET

DUMMY

SITES

SCORE

CRAWL_ORDER

RESTRICT_SITES_BY_URL

ALLOWED_SITES

DISALLOWED_SITES

BREADTH_FIRST

PAGE_IMPORTANCE

MACHINE_URI

SITE_INFO

FILETYPE

SUMMARY

URL_INFO

HASH_SEEN_URLS

RECENT_URLS

MEMORY_USAGE

DOC_ID

RELEVANCE

PAGE_RULES

CACHE_PAGE_PARTITION

GENERATION

HASH_SUM_SCORE

HASH_URL_COUNT

IS_DOC

IP_ADDRESSES

CLD_IN_COMMON

JUST_METAS

WEB_CRAWL

ARCHIVE_CRAWL

CRAWL_TYPE

CRAWL_INDEX

HEADER

SERVER

SERVER_VERSION

OPERATING_SYSTEM

MODIFIED

LANG

ROBOT_INSTANCE

DOC_LEN

SUBDOCS

SUBDOCTYPE

INDEXING_PLUGINS

DOMAIN_WEIGHTS

POSITION_LIST

PROXIMITY

LOCATION

INDEXED_FILE_TYPES

PAGE_RANGE_REQUEST

PAGE_RECRAWL_FREQUENCY

DATA

QUEUE_SERVERS

CURRENT_SERVER

SIZE

TOTAL_TIME

DNS_TIME

AGENT_LIST

ROBOT_METAS

ARC_DIR

ARC_TYPE

ARC_DATA

KEY

MACHINE_ID

IS_VIDEO

IS_FEED

SOURCE_NAME

LINK_SEEN_URLS

POST_MAX_SIZE

LOGGING

META_WORDS

CACHE_PAGES

WARC_ID

START_PARTITION

INI

UI_FLAGS

KEYWORD_LINKS

END_ITERATOR

ACTIVE_CLASSIFIERS

ACTIVE_CLASSIFIERS_DATA

MAX_DESCRIPTION_LEN

CACHE_PAGE_VALIDATORS

CACHE_PAGE_VALIDATION_DATA

NUM_PARTITIONS

PARTITION_NUM

ACTIVE_RANKERS

USER_RANKS

INDEXING_PLUGINS_DATA

REPOSITORY_TYPE

FILE_NAME

SHA_HASH

TOR_PROXY

PROXY_SERVERS

NEEDS_OFFSET_FLAG

BASIC_SUMMARIZER

CENTROID_SUMMARIZER

SUMMARIZER_OPTION

WORD_CLOUD

THESAURUS_SCORE

IS_GOPHER_URL

MINIMUM_FETCH_LOOP_TIME

IMAGE_LINK

GRAPH_BASED_SUMMARIZER

CENTROID_WEIGHTED_SUMMARIZER

SCRAPER_LABEL

SCRAPERS

QUESTION_ANSWERS

CONTENT_SIZE

NO_RANGE

MAX_DEPTH

REPEAT_TYPE

CHANNEL

THUMB_URL

IS_VR

DURATION

PUBDATE

SLOW_START

IS_SAFE